
Talking Agentic Engineering with Giovanni Barillari



00:00:00.080 | Hello.
00:00:01.080 | Hello, Armin.
00:00:02.080 | Nice to see you again.
00:00:03.080 | Nice to see you.
00:00:04.080 | I'm Giovanni.
00:00:05.080 | I'm currently working at Sentry as a Site Reliability Engineer.
00:00:10.440 | But let's say I kinda worked a lot with Python and web development, some stuff like that.
00:00:17.200 | And yeah, I kinda contributed to open source. I think I started like 15 years ago
00:00:23.680 | with Web2py, the web framework from Massimo Di Pierro.
00:00:28.680 | So yeah, that's kinda me, I guess, I'm actually a physicist, so I might have some weird opinions
00:00:38.900 | sometimes on software and the industry, but yeah, I guess that's part of me, being me.
00:00:44.880 | Yeah.
00:00:45.880 | So we had a discussion, when was it, like almost two weeks ago, I think briefly in a coffee place
00:00:51.240 | on AI, agentic programming.
00:00:55.360 | And I think like maybe, I feel like there were two themes we discussed.
00:00:59.080 | One was the impact that it has on open source and sort of what it could be versus what
00:01:04.120 | it is today.
00:01:04.980 | I think this is like probably, I think the main things.
00:01:09.580 | So one of the weird things at the moment is that I use agentic coding a lot and
00:01:14.600 | I think it works for me.
00:01:16.580 | But at the same time, obviously I know that a lot of people who are having a completely
00:01:20.440 | opposite opinion to that.
00:01:22.300 | I started going into like, why is that?
00:01:24.300 | And I think that at least my realization at the moment is that it actually takes quite a
00:01:28.440 | bit of time to learn, which is ironic because it sort of feels like it shouldn't.
00:01:32.760 | But I feel like there's, in a weird way, there's a little bit of a skill to it.
00:01:36.800 | So one of the things that I did recently, for instance, was I built this, I mean, I built
00:01:40.520 | a lot of stuff for the company.
00:01:41.520 | But what I open sourced was this queue thing, like a durable execution system, Absurd, yeah.
00:01:47.800 | And the reason I was even willing to build it, in a way, was because from a purely philosophical,
00:01:54.520 | software engineering point of view,
00:01:56.520 | I believe in two things very strongly: as few dependencies as possible, and that a system should
00:02:02.460 | crash really well and recover. Like, the durability should
00:02:07.280 | come from the fact that it doesn't matter how it sort of terminates itself, it should come
00:02:11.060 | back up in a way.
00:02:13.060 | And so durable execution for me is an interesting thing, but I also didn't want to combine 20 different
00:02:17.040 | sort of startup-y kind of solutions out there together and I felt like it should be easier
00:02:20.920 | to do.
00:02:21.920 | So the only thing is that I was able to do this with Claude and Codex rather efficiently.
00:02:28.340 | And otherwise I probably might not have built it at this stage of the company because it
00:02:32.500 | is sort of a side quest.
00:02:36.880 | So that's one of my recent lessons of like, you can actually kind of build open source software
00:02:40.320 | with it and actually feel kind of good about the quality that it has.
00:02:43.800 | And I feel like there's more leverage for me to maybe not over-engineer things quite as
00:02:47.400 | badly because I can do it with just the agent.
00:02:51.720 | I feel like there's a little bit of a pushback against sort of the madness that was maybe
00:02:55.700 | the last, at least the last 10 years of software engineering that happens in a lot of companies,
00:02:59.620 | which is like all of a sudden it's like an army of third-party services that you don't
00:03:04.440 | really understand.
00:03:05.440 | So what's your take on that?
00:03:07.180 | So like, I think like I have a lot of point of views on this, like, and in general, like
00:03:12.480 | about AI for coding, I guess, like, I think I have like a draft for a blog post, like sitting
00:03:19.300 | there like a couple of years.
00:03:21.280 | And once in a while, like I try to add stuff there and never publish it.
00:03:26.200 | But anyways, I digress.
00:03:27.800 | I think it really depends on the context, right?
00:03:31.180 | Like in which you're operate because like, I guess we are like on the opposite sides of
00:03:37.380 | things, like regarding almost everything, like on the work side of things, right?
00:03:42.020 | Like I work for a company, like an established company.
00:03:44.180 | You're like starting with something new as a startup.
00:03:47.200 | I did a bunch of startup in the past, but like, yeah, it was 15 years ago.
00:03:52.080 | So there was nothing like this. So I don't know if, at the time, if I'd had the chance,
00:03:58.480 | I would have used something like that.
00:04:00.160 | But specifically like for open source, I think it depends.
00:04:04.080 | Like, first of all, like I'd say in the last 10 years, like even in a community like the Python
00:04:11.140 | one, which I'd say it's like, it has like a slow pace compared to something like JavaScript
00:04:17.320 | or, or, or, or stuff like that.
00:04:19.060 | And I'd say things that have changed quite, quite a lot in terms of even just like in terms of the
00:04:29.060 | amount of libraries out there that you can, you know, just put it in as an, as a dependency.
00:04:35.060 | So I'd say like the amount of code and projects out there, it grew a lot and like the quality
00:04:43.820 | of those projects, like it's very different, right?
00:04:47.120 | Like, like we have some like really long standing projects, which are, I'm not saying like they're
00:04:52.620 | good quality, like by, by definition, we have plenty of new stuff.
00:04:57.320 | Like some of that is good, some of that is bad, but the role of agentic coding for
00:05:02.440 | managing an open source project to me is weird, because I guess my main concern
00:05:09.500 | when I work on one of my open source projects is the long-term maintenance.
00:05:14.760 | Like, I rarely put something out there and just never care about it anymore.
00:05:22.680 | Like I, I, I'm not really into like source available rather than open source, right?
00:05:28.680 | Like, like there's very few stuff that I don't touch anymore.
00:05:32.920 | Like after I put them available for the, for the people to use.
00:05:36.920 | So yeah, to my perspective, like in your specific case, like I see how it, it works because it's
00:05:44.280 | something like very self-contained quite small in terms of features or, or, or scope or, or stuff like that.
00:05:52.680 | But yeah, I guess my question for you would be: how does that project look
00:05:57.880 | in two years? Like, do you care at all?
00:06:00.600 | Like, like yeah.
00:06:01.800 | I mean, I think, so I, I, first of all, I think you're right that there's, um, there's a lot more
00:06:05.960 | source code now and I actually always found that to be a problem.
00:06:09.160 | Take curl as an example; it's not that I love curl.
00:06:11.320 | I think there's a lot of stuff wrong with curl, but what I love about curl is that it commits itself
00:06:17.080 | to very, very long-term stability. It's sort of like a rock on the shore,
00:06:23.880 | if you want to call it that. I guess it's just there, it's going to run everywhere.
00:06:27.800 | It doesn't change.
00:06:28.840 | It's a very, very reliable piece of software.
00:06:30.440 | It has its weird behaviors.
00:06:32.040 | And I have learned a lot of them over the years, but, but it's there, it works.
00:06:35.800 | And it means that a lot of improvements can land in one piece of infrastructure that
00:06:40.680 | many of us are using.
00:06:41.800 | Same with SQLite and many other projects.
00:06:43.720 | There's a completely different vibe than, for instance, the way,
00:06:47.960 | um, what do you call this,
00:06:50.760 | so many more projects are sprawling in the JavaScript ecosystem.
00:06:54.200 | And I think the reason here is actually less that people are combining efforts together and
00:06:59.000 | more it's like, Hey, I also want to build an open source project.
00:07:01.080 | They see the act of creating it and not the act of maintaining it.
00:07:04.040 | The maintaining part is the hard one; creating it is the easy one.
00:07:07.160 | Yeah.
00:07:07.400 | I think that there's because maintaining is hard.
00:07:10.040 | There's a question like how well does that work?
00:07:12.680 | I think in general, my theory is that because AI makes writing new code much easier,
00:07:17.880 | that code should not be open source.
00:07:19.400 | Most of the time it should be whatever you need for your code base.
00:07:22.760 | And then maybe the entire code base goes public.
00:07:24.600 | I don't know, but, but, but we don't need more open source libraries.
00:07:28.280 | We need less.
00:07:28.840 | We need more consolidation.
00:07:30.920 | We need more people working together to solve problems, not 20 different
00:07:34.360 | JavaScript front end frameworks or whatever.
00:07:36.200 | But I actually think that if you, if you want to maintain things, if you really want to
00:07:42.440 | commit yourself to maintenance, then AI will actually help a lot.
00:07:46.200 | Um, because you can, if you want, because the thing is like, you need to commit yourself to
00:07:52.040 | maintenance.
00:07:52.440 | What does it mean?
00:07:53.000 | There's a lot of stuff that comes with it.
00:07:54.200 | Some of which are really, really crappy, like writing changelogs or finding repro cases.
00:07:59.400 | This is one of my most favorite things that AI can do: here is a weird description of a
00:08:05.400 | problem that someone might've run into, and, like, Claude, make me a repro case.
00:08:09.400 | And it's like, I hate doing that, but it can do that.
00:08:12.120 | Right.
00:08:12.360 | So I think it depends.
00:08:13.640 | How do you, how do you wield the sword?
00:08:15.320 | And so I think that would be a different thing if you sort of say, like,
00:08:19.160 | hey, the way I'm going to maintain my libraries is I'm just going to AI-slop commit everything.
00:08:23.240 | Right.
00:08:23.480 | There will be a way of doing that, but I don't think it's going to be one that has like a lot
00:08:27.080 | of users at the end of the day.
00:08:28.200 | I don't know.
00:08:29.240 | I think it's complicated.
00:08:30.280 | Yeah.
00:08:30.520 | But to that point, my take would be, from my experience, and again, maybe
00:08:36.680 | that also depends on the scale of the project, in terms of popularity of
00:08:41.400 | the projects I maintain, but even at the scale I am at, I rarely find the act of producing
00:08:50.280 | code for my open source projects to be the hard part of the maintenance.
00:08:54.760 | Like, to me, the major burden comes from issue management or the release schedule.
00:09:01.640 | Something that Claude is capable of assisting with.
00:09:05.160 | No, I think you're right.
00:09:06.200 | But I give you those two examples, but I think like, I also, I feel like I'm quite fast at writing
00:09:11.480 | code and yet I very rarely write code now.
00:09:14.760 | I mostly sort of delegate to the agent.
00:09:17.240 | And one of the reasons why I actually found this to be, so for instance, absurd is a good example.
00:09:23.800 | If I, if I would have written it by hand all the time, everything of it,
00:09:28.680 | I wouldn't have built it like I did because I hate SQL.
00:09:32.760 | I really, really do.
00:09:33.960 | It's not my favorite language.
00:09:35.080 | Most of the people.
00:09:35.560 | So I would have, I would have erred on the side of just writing most of it in Python,
00:09:40.600 | writing a Python SDK and then do the least amount of SQL necessary.
00:09:44.360 | But actually that's precisely what I didn't want to do.
00:09:46.840 | I knew that the right way of doing this is the same way as PGMQ does it, which is you write a bunch of
00:09:52.440 | stored functions because then you do everything from the database and the SDKs are very, very tiny.
00:09:57.480 | And so now I can have a Go SDK, a Python SDK, a JavaScript SDK, but I would not have enjoyed
00:10:03.560 | writing that because it would have involved writing one of my least favorite programming languages.
00:10:07.800 | It's good for queries, but the moment you do more complicated stuff with it, it's not enjoyable.
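
(A minimal sketch of the pattern Armin describes above: the logic lives in a Postgres stored function and the language SDK is just a thin wrapper around one call. The function name, arguments, and connection details here are hypothetical illustrations of the approach, not Absurd's actual API.)

    # Hypothetical sketch: the queue logic is owned by a stored function in the
    # database, so every language SDK reduces to a thin call like this one.
    import json
    import psycopg  # psycopg 3

    # The SQL side would define something like (names are made up):
    #   CREATE FUNCTION enqueue_task(task_name text, params jsonb) RETURNS uuid ...
    # and all behavior (queueing, retries, state) stays in the database.

    def enqueue_task(dsn: str, task_name: str, params: dict) -> str:
        """Thin client wrapper: the real behavior lives in the stored function."""
        with psycopg.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT enqueue_task(%s, %s::jsonb)",
                    (task_name, json.dumps(params)),
                )
                (task_id,) = cur.fetchone()
        return str(task_id)

    # A Go or JavaScript SDK would be equally small: connect, call the function,
    # return the id.
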
00:10:11.240 | And so I think it changes a little bit the perspective of how you do things because all
00:10:17.560 | of a sudden I don't quite mind as much anymore.
00:10:20.280 | Doing the things that I always knew were right to do, but it's just not enjoyable.
00:10:25.160 | I had this thing also recently where I basically had this small little utility script,
00:10:30.600 | which one of the quote unquote requirements that I made myself was that it should work on Linux and Mac.
00:10:36.040 | And it just didn't, it just didn't.
00:10:40.120 | Right.
00:10:40.600 | And it became so annoying to maintain because it basically depends on different
00:10:44.920 | parameters of sed and stuff like this.
00:10:46.520 | I just went to Claude and said, look, let's just rewrite this in Perl.
00:10:50.200 | It's not that I like Perl, but it runs everywhere and it's very good with regular expressions.
00:10:54.600 | Right.
00:10:54.840 | Yeah.
00:10:55.240 | And it's one of the things that you can throw in any machine and you don't have to deal with
00:10:58.920 | like dependencies and for what it was doing, which is basically just parsing a little bit of stuff
00:11:03.720 | in a build step.
00:11:05.000 | It's perfect.
00:11:05.640 | Right.
00:11:05.880 | Yeah.
00:11:07.240 | I don't know.
00:11:07.720 | Like, I guess maybe I'm just lucky.
00:11:12.440 | Like, if I think about, in general, my, I don't know,
00:11:19.880 | daily life in programming, both on the work side of things
00:11:24.120 | and on the open source side, the amount of actually boring stuff, or things that annoy me,
00:11:32.280 | is very little in general, right?
00:11:35.240 | Like, like, I don't know.
00:11:36.600 | Like, probably most people find it annoying to write Bash scripts, but I learned
00:11:41.800 | that, I don't know, ages ago.
00:11:43.880 | So like, I don't know.
00:11:45.080 | It's just hard for me to do that step
00:11:48.520 | where I see a problem and maybe, yes, I don't like Bash, but maybe I need to write,
00:11:54.760 | I don't know, those 10 lines of Bash.
00:11:56.120 | It's hard for me to do the switch and say, instead of just typing, switch the
00:12:03.320 | context, right?
00:12:03.720 | Yeah.
00:12:03.720 | And then say, oh, okay, let's ask.
00:12:05.720 | I felt the same at one point.
00:12:07.240 | And now I'm like, it has switched in my head somehow.
00:12:10.120 | Yeah.
00:12:11.400 | I have a different version of this, which is like, for many years, I was sort of in my mind,
00:12:17.400 | I had this idea that I would not drive an automatic car.
00:12:19.720 | It's just, it was like, I will only ever buy stick shift cars.
00:12:24.760 | But at one point I was like, this just doesn't make a ton of sense to me because I really like,
00:12:28.520 | what's it called?
00:12:30.360 | Adaptive cruise control, which works most better if a team goes to zero and you don't have to
00:12:34.920 | shift like a maniac.
00:12:35.880 | So it was sort of this, I had to open myself up to the idea that I was just like,
00:12:40.840 | there's a car for fun.
00:12:41.960 | And then there's a car that just has to get me through the day.
00:12:44.120 | And that works better for an automatic.
00:12:45.560 | And I feel like maybe sort of I had made the same shift in my mind at one point about
00:12:50.360 | like the act of like punching it into the keyboard.
00:12:53.720 | It is stimulating, but it's not necessarily the important one.
00:13:00.760 | Yes, it's definitely not the important one.
00:13:03.320 | Like I spent a lot of time thinking before actually typing.
00:13:07.880 | But I guess like if I have to put like on, on like the balance, right?
00:13:15.160 | Like, like the thing to say, okay, now I, I, I've done all of my thought process,
00:13:22.280 | which I will do like in any case, like, like to me, it's very hard to delegate the thinking to
00:13:28.120 | whatever LLM like exists out there.
00:13:30.760 | Like at least for now, mostly because of the lack of predictability in the output.
00:13:37.080 | Like, like, it's very hard for me to, to delegate like the thinking to something like,
00:13:41.400 | which is not really, um, repeatable.
00:13:44.200 | Like can the same input can give me like 10 different answers.
00:13:47.960 | Do you still find that to be a problem now that it's very unpredictable?
00:13:51.880 | I'd say like, like, again, like in general, no, but talking about software, like it's,
00:13:58.680 | it's not the way I approach software, right?
00:14:01.480 | Like, like, it's not like, like my way in general of, of architecting something
00:14:06.520 | is not like there are a hundred possible ways of doing this.
00:14:10.920 | Like my, my, my general approach is probably there are like two, three way correct ways of doing this.
00:14:18.920 | And, and I need to pick the optimal one for my use case or context.
00:14:24.040 | Like, of course there are like probably a hundred ways to do the same ship, but like, realistically
00:14:29.400 | speaking, like probably three out of those a hundred are worth, you know, investigating.
00:14:34.440 | And, and the point is like, if I don't have, if I cannot like reproduce like this, those three ways,
00:14:43.160 | like with a certain amount of certainty, sorry.
00:14:46.360 | Yeah.
00:14:46.760 | I, I, I, I find it very hard to, you know, like, like leave the control to, to, to that.
00:14:51.960 | Right.
00:14:52.360 | I guess like for me, I think I understand this to me, like mentally, I never feel like I don't
00:14:58.600 | have to control in a way.
00:15:00.280 | And, and this is actually a little bit weird because like very clearly I let it do really
00:15:03.880 | crazy stuff.
00:15:04.520 | Right.
00:15:04.840 | Like it connects to my production system and, and checks the database and stuff like this.
00:15:08.520 | Right.
00:15:08.680 | Yeah.
00:15:09.560 | I will never, never do that.
00:15:11.160 | Well, I'm like, I'm, I'm very close to my escape button if it does something stupid.
00:15:15.800 | But, but, but like ignoring those sort of like more extreme versions of it, it runs tests,
00:15:20.440 | it writes code, it does some stuff, but at the end it presents me with a diff.
00:15:24.440 | And so it is me that commits.
00:15:27.080 | It is me that reviews, it's me that sort of interacts with it.
00:15:31.400 | Right.
00:15:31.800 | So I don't feel like my agency is not there.
00:15:34.760 | I can sort of fall down into a path where I'm like, this is sort of a pure
00:15:40.600 | slop project where I don't really care.
00:15:42.200 | Absurd has like the main SQL part and the driver, which is really good, I think.
00:15:46.120 | But then it has this UI, which is called Habitat, which I just, I just wanted to see my tasks.
00:15:50.120 | And I didn't care.
00:15:51.320 | This is pure slop, but I would have not written the UI in the first place before.
00:15:55.800 | Right.
00:15:56.120 | Yeah.
00:15:56.280 | Yeah.
00:15:56.680 | And so.
00:15:57.080 | But you still didn't write it, like, that didn't change.
00:15:59.800 | Like, something else did, like you said, someone else.
00:16:02.920 | Yeah.
00:16:02.920 | Like something else.
00:16:03.720 | But the thing is that now I have the UI and, and, and I feel, I feel really happy about it
00:16:07.800 | because I can like, look, I run this on one of my agents and basically every single step that it
00:16:12.600 | does, I can now click through, I can, I can see it.
00:16:15.320 | And it even does some nice things for like, if I click on a string, it sort of changes from
00:16:18.680 | JSON rendering with the escapes to inline, which is much easier to debug.
00:16:22.840 | It gives me like, it gives me pure joy using this UI and, and debugging my shit better.
00:16:28.920 | That if I didn't have the agent at this point, I wouldn't have committed myself to doing it.
00:16:33.800 | So I think it changes the calculus in a way.
00:16:36.440 | And because you mentioned earlier, is it different for an open source project or
00:16:40.520 | some new startup versus an established company: I think there are certain projects that you run
00:16:44.840 | into at the company where the actual hard part is not writing the code.
00:16:49.000 | In fact, it is very possible that you have a monumental task where the only actual change
00:16:54.040 | is one configuration parameter.
00:16:55.480 | And what is actually really hard is validating that it doesn't break anything, right?
00:16:59.880 | So you often create these crazy harnesses around it to validate that what you're changing right now
00:17:06.040 | actually is going to work or you're creating this massive migration strategy.
00:17:11.080 | Right.
00:17:11.560 | And I think what AI helps with now is helping you create the tools to then do those changes in a way,
00:17:17.640 | because a lot of these tools are like throw away anyways, you're going to need them for like a week
00:17:21.480 | or maybe 90 days or however long it takes you to run the migration.
00:17:25.240 | And then afterwards they're like, just dev null, delete it.
00:17:28.440 | So I feel like there's like, even for, I think that can work in a very small company.
00:17:37.560 | I don't know.
00:17:38.040 | Up to 50 people, I'd say. But for me, it's harder to imagine, even if I were,
00:17:49.320 | I don't know, officially building a company today, right?
00:17:52.120 | Like, and, and, and scaling that company to 500 people, not necessarily all of them working on code,
00:17:57.320 | of course, but let's say like, I don't know, half of them, whatever, 250 people working on code.
00:18:02.200 | And like, even, even if I can picture like working with AI, especially for that kind of thing,
00:18:08.600 | you mentioned might work like really good, like in the first, I don't know, year.
00:18:14.040 | I think like the major time consuming activity in big companies is, doesn't have
00:18:20.040 | anything to do with the code.
00:18:21.320 | Like, like, but even, even like, like thinking about features.
00:18:25.400 | But I don't think the AI makes it harder.
00:18:25.400 | No, it just makes it cheaper to make experiments, as long as those experiments are
00:18:36.120 | self-contained enough.
00:18:37.480 | But again, like if we're talking about the big company is the same.
00:18:40.360 | But let's take Sentry, for instance, right?
00:18:42.520 | I used to work there.
00:18:43.400 | One of my least favorite tools, one that I appreciate exists
00:18:47.880 | but that I absolutely hated interacting with, was the Snuba admin.
00:18:50.600 | Never had the chance to interact with it, but I heard a lot of stuff.
00:18:54.680 | And it's not because this tool was badly built or something like this,
00:18:58.520 | but it was like, it was built with about as much time as you have in addition
00:19:02.120 | to all the other shit that you have to do.
00:19:03.320 | But then everybody else has to use this tool, right?
00:19:06.120 | And so if, if, if we were to take like the, the very sort of simplistic view
00:19:12.040 | of it and it's like, what's the quality of that tool?
00:19:14.200 | And what would the, what would be the quality of that tool if it was written with AI?
00:19:18.920 | Now I'm not going to judge if it's going to do better or worse.
00:19:20.840 | That's my point.
00:19:21.720 | My point is that it actually took a really long time to write the tool too, right?
00:19:25.080 | Because reasons you need to deploy it and then, and then you can maintain it.
00:19:29.640 | And then they're going to be all the people that are going to complain
00:19:31.640 | about all this stuff that it doesn't do.
00:19:32.760 | And then someone else is going to throw some stuff into it.
00:19:34.840 | And so I, from my, from, from my recollection of actually even trying
00:19:39.720 | to look at the source of existing, it wasn't amazing.
00:19:43.000 | Yeah, yeah.
00:19:43.960 | But it wasn't built to be amazing.
00:19:45.720 | Yeah, of course.
00:19:46.760 | So, so these tools exist if you want them or not.
00:19:49.720 | A lot of the stuff within companies exists that, that is this internal thing.
00:19:54.280 | But I think like it's a very tiny portion, like on big companies, right?
00:19:58.360 | Maybe, but it is, is still in my mind.
00:20:01.080 | All of these tools in my mind are where all my frustration is, in a way.
00:20:05.320 | Okay.
00:20:05.640 | It's not, it's not the like, it's not that curl or it's not the Linux kernel,
00:20:10.120 | or it's not these things that are actually built for purpose.
00:20:13.160 | And, and like, and they're really, really rock solid pieces of software engineering.
00:20:17.640 | It's all the other shit because like, that's the one that you run into, right?
00:20:21.640 | Like, the only way for me to debug an issue is to go through the Snuba admin.
00:20:24.680 | Again, I don't want to pick on this thing, but it's like, this is like,
00:20:27.320 | there was a significant part of my day in every once in a while.
00:20:30.920 | I was going there and running queries and it was just never great.
00:20:33.640 | Yeah.
00:20:34.040 | And then I could look at like, here's this other tool I could use,
00:20:36.440 | where like actual people put ridiculous amount of effort into compared to the
00:20:40.120 | internal tooling that we built.
00:20:41.320 | And it was better, right?
00:20:42.440 | And I can tell you objectively, my internal debugging tools now are better.
00:20:46.760 | But built in a fraction of the time.
00:20:49.160 | And, and I think that problem will not necessarily scale any worse to a large company.
00:20:54.360 | No, I completely agree.
00:20:55.960 | My point was like, I think like the, the amount of time spent on those tools, like on a bit, like,
00:21:02.280 | like once the company like becomes bigger and bigger is, is less and less relevant in a way.
00:21:08.760 | Like, I don't see like the majority of discussion happening in Sentry, like during meetings about
00:21:14.520 | the Snuba admin, like, it's just, you know, it's there.
00:21:17.640 | So like, so yeah, to me, to me, like, that's, that's the kind of the, the barrier in between,
00:21:24.360 | you know, like, I totally agree.
00:21:25.960 | Like every company out there could use like any coding, like AI coding tool to, to build like UIs over
00:21:35.080 | stuff or, or, or yeah, again, like internal tools for making queries or checking whatever the state it is.
00:21:43.080 | My point is like talking about real software, like.
00:21:46.200 | Okay.
00:21:48.520 | Here's a question.
00:21:49.960 | How good would AI have to be?
00:21:52.360 | Like, like how would, would it have to work?
00:21:56.440 | So do you actually think it would work for real software?
00:21:58.360 | Like, from my perspective, like the main, the, again, like the main point for me would be to have
00:22:05.080 | something that is predictable, like, like the predictability to me, like, is the, like, probably
00:22:11.960 | like the reason number one, like, I, I, I really find really hard, like working with AI in general.
00:22:19.960 | And is it predictability in the code that it generates,
00:22:23.640 | or predictability in being able to tell ahead of time if it's going to be able to do this task,
00:22:27.880 | kind of, or a combination thereof?
00:22:29.720 | Like, which part do you find most unpredictable?
00:22:32.760 | I mean, at the end of the day, it's kind of the same, right?
00:22:35.160 | Because like, if you have to reroll 10 times, right?
00:22:42.600 | Like, like, and, and, and five, like, and even, even if like five, five out of those 10 time is
00:22:48.680 | decent and it produced something I can like work on, right?
00:22:55.240 | Like patch and, and, and, and adapt?
00:22:57.080 | For me it's different because if I feel like there's like, yes.
00:23:00.120 | But if I can know ahead of time, if the coin flip, like basically, let's say this is in fact a
00:23:06.760 | randomized thing and it would work a little bit like a role play game.
00:23:10.600 | And I was like, this is a, I don't know, this is a task that requires throwing 3d6 or something.
00:23:14.600 | Whereas this is one that requires, I don't know, one and everything over two sort of is, is success.
00:23:20.200 | Like you have ahead of time, you have an indication of, is it's going to be a problem
00:23:24.520 | worth throwing the AI to or not, right?
00:23:26.440 | And I think that the reason why I feel like I feel a little bit more confident about AI now is that I
00:23:31.080 | have a better understanding, at least for the models that I'm using, which problems are problems that the
00:23:35.960 | AI at least is going to make some progress versus which are the problems where I don't even have to
00:23:40.200 | try because it will not go anywhere or the output is going to be too random.
00:23:44.040 | So that to me is a different thing because it gives me the opportunity to sort of opt out of it
00:23:48.920 | before even wasting like coin tosses in that sense.
00:23:53.480 | I'd say like probably, okay, a few things here.
00:23:57.400 | Like, I guess like if some condition were different, like about the current state of, you know, AI.
00:24:05.480 | Things like if the media exposure was less, if the pace of everything was like slower in terms of
00:24:15.880 | investment, in terms of Twitter discussions, in terms of like, like in terms of a lot of company,
00:24:23.560 | like pushing hard on, on, on this, these kinds of things.
00:24:29.320 | And the fact that there are, practically speaking, two companies doing the vast majority of the
00:24:36.600 | heavy lifting here on AI and the topic.
00:24:39.000 | I'd say if all those conditions were different, right?
00:24:44.600 | Like if we were in a condition in which I had, first of all, more open-weights models or
00:24:54.280 | open source stuff in terms of, you know, LLMs, like how to do the training.
00:25:04.120 | You can say, yes, I can read papers, but papers don't tell the whole story.
00:25:08.440 | Right?
00:25:08.680 | I think what's interesting about this is if we go with this open point for a second, we actually do have
00:25:14.600 | at least from a, from a pure perspective of like how it works.
00:25:17.720 | I'm not going to say it's easy to train a model, but it's actually not really all that hard.
00:25:21.640 | What is actually really hard is to source all the data, particularly to source it in a way that's
00:25:26.840 | actually at the end of the day, ethically correct.
00:25:29.080 | But at least I'd have the chance, right?
00:25:31.080 | Like to, to my point, like I don't want, in general, if the AI model I could use, like if the LLM I could use
00:25:38.840 | was not trained on like all the JavaScript code, which is in, in, in GitHub nowadays,
00:25:45.240 | which I would say, even when they started, even before the AI slop, the quality
00:25:50.040 | wasn't good, right?
00:25:51.080 | Like, on average, it wasn't good.
00:25:54.920 | Like, I don't want, like, I don't want something like that.
00:25:57.160 | I would rather prefer to invest time into sourcing out my own source of,
00:26:04.200 | you know, learning.
00:26:05.400 | Like, yes, it would require me a bunch of time to do that, and it would probably produce
00:26:11.000 | something closer to what I would like to see it spit out now.
00:26:15.080 | I think you probably might not produce enough corpus data yourself to be able to train an LLM just
00:26:19.960 | on your own code, right?
00:26:21.480 | Yeah, no, no, I'm not saying just my code, but I kind of have a few ideas on
00:26:27.080 | which code, that I didn't write, to train an AI on.
00:26:30.840 | But yeah, I think that you could actually probably get away with like a well-selected
00:26:34.760 | set of open source, particularly if people sort of opt into.
00:26:38.680 | I don't think you need necessarily all of GitHub and all of the terrible influencer blog posts with
00:26:44.360 | little code snippets that don't really do much.
00:26:46.200 | I think like we are not going to get rid of AI.
00:26:47.960 | So I think it's here to stay.
00:26:49.160 | And I actually do think it is a little bit like a steam engine in the sense that the basic idea of
00:26:53.800 | what the transformer looks like is simple enough that people will always be able to reproduce it.
00:26:58.440 | So then there's just a question of data.
00:27:00.280 | Seemingly data will be easy to get.
00:27:02.440 | Good quality data, maybe less so, I don't know, but you can even run to one of the existing models
00:27:06.920 | and just use it as a teacher model and sort of generate more slop out of it.
00:27:10.680 | And you're going to get, you're going to get a model that's at least sufficiently decent.
00:27:14.760 | If you, if it tumbled five times, then open weights.
00:27:18.120 | Who knows?
00:27:18.600 | So the crazy thing is if you do actually do that, like if you produce from one model to another
00:27:27.320 | and you do it a couple of times, the first iterations immediately gets better.
00:27:30.600 | Better under what?
00:27:32.520 | Better in whatever evaluations they're doing.
00:27:35.720 | Yeah, but again, I think we also don't have that many evaluation methods that make
00:27:43.640 | sense nowadays, but whatever.
00:27:45.720 | Like I would agree with you that AI won't go away.
00:27:51.240 | I'd say sadly, because from my perspective, I'm not sad about the fact that LLMs will
00:27:59.560 | still be there 10 years from now.
00:28:01.560 | My main argument is, I would rather prefer to see a shift in how we produce models,
00:28:14.520 | especially speaking for coding stuff.
00:28:19.560 | I would like to see a direction in which we try to produce smaller models, to be able
00:28:27.000 | to run them on normal hardware without, you know, the need of I don't know how many
00:28:33.720 | clusters of super huge NVIDIA GPUs, which is why I would like to see the conversation
00:28:42.360 | in some places settle down at some point, because again, yes, AI won't go away.
00:28:48.920 | But at the same time, it's very hard for me to even make, today, an investment
00:28:55.720 | in a company like Anthropic, or even OpenAI. Even if it's worth, I don't know, however
00:29:02.840 | many billions of dollars, I don't know if, in five years, those companies will still be there.
00:29:09.160 | Are they going to be profitable? Like, like, I don't know, like, there's a bunch of questions,
00:29:12.760 | right? Like, so I'm not saying like, I also don't know. I don't know. I really don't know if they're
00:29:17.960 | going to stick around. I don't even know if the tech necessarily is worth too much money that seemingly
00:29:22.760 | the market sort of puts on it right now. And clearly we have a bunch of energy problems too,
00:29:27.160 | right? So there's like, all of this I think is unsolved. But I also think like, if we, even if
00:29:31.560 | you take like one of the open weights models that exist today, they're actually not that terrible.
00:29:36.120 | I mean, like I still can't run them without sort of renting some super like hopper set up somewhere
00:29:41.560 | in some data center. And maybe even will take a couple of years to get to the point. And I think
00:29:45.640 | the models will have to get smaller. And I think there is some argument to be made that it can be
00:29:49.160 | smaller because you don't need all the world knowledge to write programs. But I feel like we're
00:29:53.800 | at least on a trajectory where it still makes sense to me, right? It's, I don't really like, to be fair,
00:29:59.720 | I think like most like 90% or more of all of those AI startups are going to fail in one form or another
00:30:04.760 | because the market is way too frothy. Like it doesn't make a ton of sense to me. But the underlying
00:30:09.400 | reality to me is that even if we get stuck with transformers and even if we get stuck with what
00:30:14.840 | we have today, that shit's pretty amazing still. And I think like that the quality of the models that we
00:30:20.120 | have right now, we can probably get away with a fraction of the parameters that we can actually run it
00:30:25.320 | on smaller hardware. And there is some indication that you can actually fine-tune larger models.
00:30:29.880 | Sorry, fine-tune is the wrong term here, but you can distill larger models for more narrow use cases
00:30:34.760 | into smaller models and still have decent quality, right?
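
(For readers unfamiliar with the term: distillation trains a smaller student model to match a larger teacher's output distribution. The snippet below is a generic, minimal formulation of that loss, assuming PyTorch and a shared vocabulary between teacher and student; it is an illustration of the idea, not a recipe from the conversation.)

    # Generic distillation loss: blend the teacher's soft targets with the usual
    # hard-label cross-entropy. T is the softmax temperature, alpha the mix weight.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Soft targets: make the student match the teacher's distribution at temperature T.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        # Hard targets: standard next-token cross-entropy against the real labels.
        hard = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
        )
        return alpha * soft + (1 - alpha) * hard
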
00:30:40.680 | So I feel like it's at least lined up in a way where I can totally imagine that this will be okay. And that's why, for me, I'm looking at this
00:30:45.480 | mostly from like, what can it technically do right now? Because I do think we're going to have these
00:30:49.560 | things self-running or we are going to have them running on maybe alternative implementations,
00:30:54.360 | alternative trainings, and maybe selecting better code rather than whatever has the highest entropy
00:30:59.720 | on GitHub will also lead to better results. Hopefully. I mean, I don't think they're
00:31:03.480 | onto this. I don't think we're in a position where we have AGI or anything like
00:31:07.560 | that. I just, my suspicion is that if you actually have higher quality code in there, you might actually
00:31:12.520 | get higher quality output. That's my assumption. Might be wrong, but I don't know if anyone's actually
00:31:17.400 | training it on that front. I mean, it's also hard to like, it's very hard for me to make like, you know,
00:31:24.200 | a prediction. Like, because, and this is probably why like the vast majority of people out there like
00:31:31.400 | see me as like a negative person in terms of AI, but I'm really not. Like the thing for me is like hard to
00:31:39.960 | make, you know, to, to, to, to expect these things getting better because like, I don't know, like if we
00:31:48.520 | make a parallel to stable diffusion, like diffusion models for images, videos, like, like, okay, videos has a lot more
00:31:56.760 | going on other than diffusion, but like, yes, like in there, like in the last, I don't know, like four to
00:32:04.280 | five years. Yes. The quality improved, but like, if you take that, like the last two years, they didn't
00:32:10.120 | improve much. Like it feels more like we're already plateauing in some way in the curve.
00:32:16.360 | It also looks terrible. I mean, I think the problem is like, I really never care about the images because
00:32:20.520 | like the images to me is like sort of like, it's novel and it's interesting,
00:32:26.120 | but I don't have a use for that. And all the users I can imagine are terrible. Like I don't
00:32:30.040 | want to watch advertisements made from AI or watch cinema movies made with the help of diffusion models.
00:32:37.240 | Like, I understand this is probably where we're going to go. And I will complain about it because
00:32:41.080 | I hate it, but I don't have the same reaction of text generation and code generation, text generation a
00:32:48.760 | little bit. If someone sends AI slop to my email, I'll get angry, but there is a responsible use of it.
00:32:54.840 | And I don't really see the responsible use of the diffusion AI. Right. So I think like maybe the
00:32:58.280 | diffusion for... I don't know. Like picture creation.
00:33:01.960 | The Granian logo is made with AI. Like, straight with flaps.
00:33:05.960 | My Absurd logo is also made with AI.
00:33:07.800 | You see? Like, I think there are cases, like I'm not, I'm also like not against like the use of that,
00:33:14.280 | like for advertisement. Like, I mean, like, I think it really depends on what you think about,
00:33:21.000 | like the business in general of advertisement, but, but yeah, like my point is like, if that's the working
00:33:27.720 | example for like fundamentally, like there, there are a lot of similarities, like, like in how an LLM work
00:33:35.480 | and how as the diffusion model work, at least mathematically speaking. So it's very hard for me to
00:33:41.160 | say yes. Like in, in two years, LLMs will be like amazing compared to today. Right.
00:33:47.800 | The thing is like, for me, the difference is that the LLM for code generation is already amazing today.
00:33:53.080 | So I don't need it to get better. I just need it to be running cheaper, faster, maybe have less
00:33:59.480 | shit in it. Like there's, there's some shit in it that I don't need, but it's not like I need a
00:34:05.480 | breakthrough in computing technology or in model architecture or even like the data input necessarily.
00:34:11.800 | It feels to me, it's already there. And I think that's the big difference because I, I don't like,
00:34:18.520 | I feel like getting so much utility out of it. Like it's expensive right now, for sure. Like
00:34:23.480 | I'm paying too much money for the inference, but I can't imagine the money getting to a 10th or less
00:34:28.200 | of what it costs. How? Well, I mean, in part because you could theoretically
00:34:33.320 | right now run, I don't know, Kimi or any of those models at, I don't know, 300 billion parameters
00:34:41.720 | with reasonable output, on GPUs you can buy. It's not GPUs you can cheaply buy, but,
00:34:47.000 | but you can buy them. So if you just play this out for four years, we're going to be there where
00:34:52.040 | many more people are going to have a GPU that is capable of running models at that size. And I
00:34:56.040 | already know that that's good enough. Is it perfect? Probably not because there will be better things
00:34:59.720 | coming down the line and people will like invest more shit into it. And like eventually the market
00:35:04.520 | is going to like collapse. I think there is a bubble in it. But, but, but if we ignore that, if we ignore the
00:35:11.640 | absurd interest of investors to get a slice of this already sort of overstuffed pie, I think the
00:35:17.240 | fundamentals are, this is actually going to scale down so that it can realistically imagine this to
00:35:21.480 | be in my computer. And even my five-year-old M1 Max thingy, with the 64 gigabytes of RAM that it can
00:35:28.760 | utilize largely for a GPU. I can actually run some pretty good models. And it's like, am I doing it a lot?
00:35:34.200 | No, but, but every once in a while I do it just to see what's possible. And it's like, I can see it
00:35:40.600 | being not entirely unrealistic. I don't know because the amount of time I'm pissed off compared to the
00:35:46.680 | ones I'm amazed are probably like, and I guess like the vast majority, like, like, yeah, the number one
00:35:57.720 | cause of that, for my workflows, is that I ideally work on big repositories, like
00:36:05.160 | a big amount of repos. Like at Sentry, we have this ops monorepo with everything inside there,
00:36:11.640 | like, and, and, and, and the amount of times, like any LLM out there gets confused about like the context
00:36:19.320 | in which it operates, like the amount of times it just says nonsensical stuff about like terraform
00:36:25.720 | modules, which is like a really good, like way, like, okay. So one annoying thing like about
00:36:31.960 | terraform is like every single provider, like if you, if you use like the GCP or the AWS provider,
00:36:38.120 | like the amount of changes they push out, they push a new version like once a week, right? Like that's
00:36:43.880 | the ratio. This is why people should commit to backwards compatibility. But again, like that,
00:36:50.120 | that would be something where, for me, an AI would help, but they cannot, because they get so
00:36:56.760 | fucking confused about that amount of changes and that amount of context that they just spit out
00:37:03.560 | nonsense. Like, Hey, you used like a string here, but it's supposed to be an array of string. And you
00:37:09.960 | check the documentation of the terraform module. Oh, that's not true. So like things like that, right?
00:37:16.040 | Like, so I'll also say that my experience with using Claude for Terraform was so bad that I
00:37:21.560 | moved to Pulumi. You see, that's, but I mean, from my perspective, as soon as
00:37:28.520 | Pulumi gets the same amount of usage as Terraform, we'll end up with the same. Unless they
00:37:34.920 | learn and not change everything all the time. No, because I mean, like the providers are like APIs,
00:37:41.800 | like the cloud providers API, like they can do whatever they want because like, they're like,
00:37:47.000 | that's the reality. But maybe if people start complaining that the agents don't work, if they
00:37:50.360 | change the stuff all the time, they will change. This is, I think, a question that I have in
00:37:53.960 | general: nobody gave a shit about humans, but with agents, I think people give a
00:38:00.360 | shit about it. Do we give a shit about machines? Like, I don't know. I feel like we care about the
00:38:05.160 | machine a lot more than about the human, honestly. Because the machine is measurable.
00:38:10.040 | But yeah, back to my point, like, I guess like if something like, at least for my use case, my daily
00:38:17.000 | use case, if something doesn't like fundamentally change in the way we treat like the latent space,
00:38:23.560 | you know, because like, I think like 99% of my issues with AI are because of the context. And so like,
00:38:29.800 | to my perspective, even if we go like to 10 billion tokens context, like, won't change anything. Like,
00:38:35.400 | I don't think like scaling the context for my use cases is the answer to the problem. I think like
00:38:41.480 | the answer to the problem is, yeah, changing the architecture around how we loop into the latent space.
00:38:49.960 | Right. And so, yeah, I guess like if some company at some point starts producing like a foundational
00:39:00.200 | model that works differently in terms of context and how like it treats tokens in inputs and outputs,
00:39:07.960 | yes, then probably at that point, I will be like optimist in terms of like this thing getting better.
00:39:13.400 | So for you it's about learning and remembering, right? Like a lot of like,
00:39:17.480 | that it doesn't start from zero and has, has a way to...
00:39:21.240 | Yes. Especially when I cannot control it, again, because if I use Anthropic models or
00:39:26.920 | OpenAI models, like I cannot control the learning, like in general, like, because they put out like
00:39:32.040 | a new version of the model, but it's still called like GPT-5 codex or whatever. And maybe everything
00:39:37.560 | changed for like my use case. And I have no idea.
00:39:40.520 | This is a real problem. Like even, even a move from, from, I have this one,
00:39:44.440 | like the move from 4 to 4.5 change things dramatically. And I was not able to evaluate
00:39:49.640 | because like, I have no idea what changed. It's just like, it's different now.
00:39:52.040 | So I share that problem. I share that opinion. I share that problem. I don't know what the solution is.
00:39:58.280 | I'm just maybe a little bit too optimistic that the open-weights models are a counterbalance
00:40:02.200 | to sort of the OpenAI-Anthropic duopoly, because we do have a bunch of Chinese models and I think they are actually on par.
00:40:09.000 | Yeah. But again, like, even if you like, if you look at the stable diffusion ones or like,
00:40:15.800 | if you take Qwen Image or if you take Qwen for just text generation, like...
00:40:20.760 | I don't think that they're better. I just think that they, they at least are in the point...
00:40:25.320 | because they exist, if Anthropic and OpenAI decide to go even more nuclear on not sharing anything,
00:40:32.040 | then I feel like, okay, off to China. I am, I don't really care. And you can see this now,
00:40:36.920 | like the couple of American companies actually want to train their own models. They are just doing it
00:40:40.520 | on top of the Chinese models. Like, the Cursor model clearly is trained on one of those Chinese
00:40:44.280 | models. I think Windsurf did the same thing. So yeah, it's a little bit ridiculous that we have to sort of
00:40:49.720 | go to the other side of the world and to another political system to get open-weight
00:40:55.000 | models, but I will take it because nobody else has them. But it is to me, it creates this counterpoint,
00:40:59.640 | at least for me, makes it feel like there's a possibility that we are not... Like if we didn't
00:41:05.160 | have the Chinese models, I would actually be much, much more negative on the whole thing because then
00:41:09.080 | it would imply to me that you can only run this as a very large corporation and there's no chance that
00:41:14.360 | this will trickle down. It's just right now, the existence of those models implies that it's easy
00:41:21.640 | enough, it's hard to say, but it's like, it's in the realm of possibility that we will see more of
00:41:27.080 | those. And even sort of this absolute failure of a European company, this Mistral thingy,
00:41:31.480 | does create a model. So like this, it is what it is, but they're capable of producing something.
00:41:39.640 | Is it the most amazing? No, not at all. And I think they could do much better if they had a
00:41:44.280 | little bit more ambition, but they at least have a model. And so it shows that there is...
00:41:49.240 | Like this is not just Google runs the world. I think it's going to be, there's going to be some
00:41:53.480 | competition. Yeah, that's for sure. Yeah. I don't know. Like I also plan to like give another try
00:41:58.600 | to Cursor as they release like version two with Composer and that kind of thing to see. Because again,
00:42:06.440 | like I saw some videos like of people trying it and it's like at least like their
00:42:14.200 | model is very fast. So at least like that could, I don't know, like change my perception probably like,
00:42:23.480 | like, okay, maybe I still have to do like 10 re-rolls, but like, like the amount of time I need
00:42:29.240 | to wait is different. So maybe, yes. I don't know. Like, like in terms of perspective, again,
00:42:35.320 | I see some tiny things somewhere like that, that makes me wonder about the future. Like I see some
00:42:43.640 | very little few things here and there. But yeah, I think like to, like for me to actually say, okay,
00:42:51.160 | I can trust whatever, like I can trust in general what it spits out or what it gets from, from, from,
00:43:00.040 | from my thoughts. Yeah. I think it would still be hard for me to use. At least, again, for my
00:43:06.120 | fields and scopes, even on my open source projects. Like, I think I could try to do some
00:43:13.800 | annoying stuff for Granian, but it still feels risky. Try some repro cases. They are fun.
00:43:18.360 | They're really fun. Or just see what it does if you just let it loose on an open source project. I did
00:43:24.360 | this a bunch of mini changes. Like, I just want to see how adventurous that it gets. And it, it gets
00:43:29.880 | bloody adventurous. This is interesting. I don't know. Like the vast majority of things,
00:43:33.960 | like I get annoyed on my projects, like Windows related stuff because I had like this terrible,
00:43:39.320 | terrible idea to support Windows where I just could say like, no, but anyways, like, and, and,
00:43:44.920 | and I tried once. But you know what? So, fun thing: I had this situation very recently.
00:43:50.200 | I have a Windows computer, but I don't want to boot it for one stupid bug fix. So one of the
00:43:55.800 | things that I did is, I told Claude to fix the problem. It knows it's on a Mac, and I told it, look,
00:44:01.240 | you can commit to a branch, you can push to GitHub, and then you can run the gh command to pull the
00:44:07.720 | status report. And so it actually iterated while committing to GitHub and used the GitHub runner as
00:44:13.480 | a Windows machine. It was marvelous. It took it like three hours, but it was working. And I think
00:44:19.160 | it's kind of fun.
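
(A rough sketch of the kind of loop described here: push a branch, let GitHub Actions' Windows runner test it, and read the failing logs back locally. It assumes the repository already has a Windows CI workflow; the branch name and polling details are illustrative, not the exact commands that were used.)

    # Iterate on a Windows-only bug from a Mac by using the CI runner as the test machine.
    import json
    import subprocess
    import time

    BRANCH = "fix/windows-bug"  # hypothetical branch name

    def sh(*args: str) -> str:
        """Run a command and return its stdout."""
        return subprocess.run(args, check=True, capture_output=True, text=True).stdout

    # Push the current attempt.
    sh("git", "push", "origin", f"HEAD:{BRANCH}")

    # Poll the latest workflow run for that branch until it finishes.
    while True:
        out = sh("gh", "run", "list", "--branch", BRANCH, "--limit", "1",
                 "--json", "databaseId,status,conclusion")
        run = json.loads(out)[0]
        if run["status"] == "completed":
            break
        time.sleep(30)

    # On failure, pull the failing logs back so the agent can read them and retry.
    if run["conclusion"] != "success":
        print(sh("gh", "run", "view", str(run["databaseId"]), "--log-failed"))
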
00:44:26.840 | Yeah, I mean, I could try that. But yeah, it was nice talking. Yes, absolutely. Thanks. Yeah. Thank you. It was nice.