Talking Agentic Engineering with Giovanni Barillari

I'm currently working at Sentry as a Site Reliability Engineer. But let's say I've worked a lot with Python and web development, stuff like that. And I've contributed to open source for a long time; I think I started about 15 years ago with Web2Py, the web framework from Massimo Di Pierro. So that's kind of me, I guess. I'm actually a physicist, so I might have some weird opinions sometimes on software and the industry, but I guess that's part of being me.
So we had a discussion, when was it, almost two weeks ago I think, briefly in a coffee place. And I feel like there were two themes we discussed. One was the impact this has on open source, and sort of what it could be versus what it is. I think those were probably the main things.
So one of the weird things at the moment is that I use agentic coding a lot. But at the same time, obviously I know that a lot of people are having a completely different experience. And I think my realization at the moment, at least, is that it actually takes quite a bit of time to learn, which is ironic, because it sort of feels like it shouldn't. But I feel like, in a weird way, there's a little bit of a skill to it.
So one of the things that I did recently, for instance, was that I built, and then open-sourced, this queue thing, a durable execution system called Absurd.
And the reason I was even willing to build it is that, from a purely philosophical software engineering point of view, I believe in two things very strongly: as few dependencies as possible, and that a system should crash well and recover. The durability should come from the fact that it doesn't matter how the thing terminates; it should come back and carry on. And so durable execution is an interesting thing for me, but I also didn't want to combine 20 different startup-y kinds of solutions out there, and I felt like it should be easier.
The thing is that I was able to do this with Claude and Codex rather efficiently, and otherwise I probably would not have built it at this stage of the company. So that's one of my recent lessons: you can actually build open source software with it and actually feel kind of good about the quality it has.
And I feel like it gives me more leverage to maybe not over-engineer things quite as much. I feel like there's a little bit of a pushback against the madness of at least the last 10 years of software engineering in a lot of companies, where all of a sudden there's an army of third-party services that you don't really need.
So, I have a lot of points of view on this, and about AI for coding in general. I think I have a draft for a blog post sitting around somewhere, and once in a while I try to add stuff to it and never publish it.
I think it really depends on the context in which you operate, because I guess we're on opposite sides of things regarding almost everything, at least on the work side. I work for a company, an established company; you're starting something new as a startup. I did a bunch of startups in the past, but that was 15 years ago, so I don't know what I would have done back then if I'd had the chance.
But specifically for open source, I think it depends. First of all, I'd say that in the last 10 years, even in a community like the Python one, which I'd say has a slow pace compared to something like JavaScript, things have changed quite a lot, even just in terms of the amount of libraries out there that you can just pull in as a dependency. So the amount of code and projects out there grew a lot, and the quality of those projects is very different. We have some really long-standing projects, which I'm not saying are good quality by definition, and we have plenty of new stuff; some of it is good, some of it is bad. But the role of agentic coding in managing an open source project is weird to me, because my main concern when I work on one of my open source projects is the long-term maintenance. I rarely put something out there and just never care about it anymore; I'm not really into "source available" rather than open source. There's very little stuff that I don't touch anymore after I make it available for people to use. So, from my perspective, in your specific case I see how it works, because it's something very self-contained, quite small in terms of features or scope or stuff like that.
But yeah, I guess my question for you would be: what does that project look like down the road?
I mean, first of all, I think you're right that there's a lot more source code now, and I actually always found that to be a problem. I think there's a lot of stuff wrong with curl, but what I love about curl is that it commits itself to very, very long-term stability. It's sort of like a rock on the shore, if you want to call it that: it's just there, it's going to run everywhere, it's a very, very reliable piece of software. And I've learned a lot from it over the years, but it's there, it works.
And it means that a lot of improvements can land in one piece of infrastructure that everybody relies on. There's a completely different vibe than, for instance, the JavaScript ecosystem, where there are so many more projects sprawling. And I think the reason here is actually less that people are combining efforts and more that it's: hey, I also want to build an open source project. They see the act of creating it and not the act of maintaining it. The maintaining part is the hard one; creating it is the easy one.
And because maintaining is hard, there's a question of how well that works. In general, my theory is that because AI makes writing new code much easier, most of the time you should just write whatever you need inside your own code base, and then maybe the entire code base goes public, I don't know. But we don't need more open source libraries. We need more people working together to solve problems, not 20 different variations of the same thing.
But I actually think that if you want to maintain things, if you really want to commit yourself to maintenance, then AI will actually help a lot. Because the thing is, you need to commit yourself to a lot of tasks, some of which are really, really crappy, like writing changelogs or finding repro cases. That is one of my most favorite things that AI can do: here is a weird description of a problem that someone might have run into, and, Claude, make me a repro case. I hate doing that, but it can do that.
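A minimal sketch of what such a generated repro case can look like, with everything hypothetical: the bug report, the sort_records stand-in, and the pytest file are made up for illustration.

```python
# Hypothetical repro case distilled from a vague report like
# "sorting blows up when some records are missing the sort key".
import pytest

def sort_records(records, key):
    # Stand-in for the real library function under test.
    return sorted(records, key=lambda r: r[key])

def test_repro_missing_sort_key():
    records = [{"id": 2, "name": "b"}, {"name": "a"}]  # second record lacks "id"
    with pytest.raises(KeyError):
        sort_records(records, key="id")
```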
So I think it would be a different thing if you said: hey, the way I'm going to maintain my libraries is that I'm just going to AI-slop commit everything. There will be a way of doing that, but I don't think it's going to be one that produces a lot of quality.
But to that point, my take would be, from my experience, and again maybe that also depends on the scale of the project in terms of the popularity of the projects I maintain, but even at the scale I'm at, I rarely find the act of producing code for my open source projects to be the hard part of the maintenance. To me, the major burden comes from things like issue management or the release schedule.
Something that Claude is capable of assisting with.
I'll give you those two examples, but I also feel like I'm quite fast at writing code.
And one of the reasons I actually found this valuable: Absurd, for instance, is a good example. If I had written all of it by hand, I wouldn't have built it the way I did, because I hate SQL. I would have erred on the side of writing most of it in Python, writing a Python SDK, and doing the least amount of SQL necessary. But that's precisely what I didn't want to do. I knew that the right way of doing this is the same way PGMQ does it, which is that you write a bunch of stored functions, because then you do everything from the database and the SDKs are very, very tiny. And so now I can have a Go SDK, a Python SDK, a JavaScript SDK, but I would not have enjoyed writing that, because it would have involved writing in one of my least favorite programming languages.
It's good for queries, but the moment you do more complicated stuff with it, it's not enjoyable.
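For illustration, a minimal sketch of that "logic in stored functions, thin SDK" pattern. The stored function name absurd.enqueue and its signature are assumptions made up for this example, not the actual Absurd API.

```python
# The database owns the queueing logic; the client SDK is a thin call.
import json
import psycopg  # psycopg 3

def enqueue(conn: psycopg.Connection, queue: str, payload: dict) -> str:
    with conn.cursor() as cur:
        # absurd.enqueue is a hypothetical stored function holding all the logic.
        cur.execute(
            "SELECT absurd.enqueue(%s, %s::jsonb)",
            (queue, json.dumps(payload)),
        )
        (task_id,) = cur.fetchone()
    conn.commit()
    return str(task_id)

# Any language with a Postgres driver can expose the same one-liner,
# which is why the Go, Python, and JavaScript SDKs can stay tiny.
```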
And so I think it changes the perspective of how you do things a little bit, because all of a sudden I don't mind as much anymore doing the things that I always knew were right to do, but that just aren't enjoyable. I also had this thing recently where I had a small little utility script, and one of the quote-unquote requirements I set myself was that it should work on Linux and Mac. And it became so annoying to maintain, because it basically depends on different tools on each platform. So I just went to Claude and said: look, let's just rewrite this in Perl. It's not that I like Perl, but it runs everywhere and it's very good with regular expressions. It's one of those things you can throw on any machine without having to deal with dependencies, and for what it was doing, which is basically just parsing a little bit of stuff, that's fine.
The amount of things, if I think about my daily life in programming in general, both on the work side and on the open source side, the amount of actually boring stuff, or things that annoy me, is very little in general. Probably most people find it annoying to write Bash scripts, but I learned to live with it. I'm just saying it's hard for me to take that step, where I see a problem and, yes, I don't like Bash, but maybe I need to write Bash; it's hard for me to make the switch and say, instead of just typing it myself, hand it off.
And now, somehow, it has switched in my head.
I have a different version of this. For many years, I had this idea in my mind that I would not drive an automatic car; I would only ever buy stick-shift cars. But at one point I realized this just doesn't make a ton of sense to me, because I really like adaptive cruise control, which works much better if it can go all the way to zero and you don't have to shift yourself. So I had to open myself up to the idea that a car is just something that has to get me through the day. And I feel like maybe I made the same shift in my mind at one point about the act of punching it into the keyboard. It is stimulating, but it's not necessarily the important part.
I spend a lot of time thinking before actually typing. But I guess if I have to put it on the balance, the thing is to say: okay, now I've done all of my thought process, which I will do in any case. To me, it's very hard to delegate the thinking, at least for now, mostly because of the lack of predictability in the output. It's very hard for me to delegate the thinking to something where the same input can give me 10 different answers.
Do you still find that to be a problem now, that it's very unpredictable?
I'd say, again, in general no, but talking about software, yes. My way of architecting something, in general, is not that there are a hundred possible ways of doing this. My general approach is that there are probably two or three correct ways of doing this, and I need to pick the optimal one for my use case or context. Of course there are probably a hundred ways to do the same shit, but realistically speaking, maybe three out of those hundred are worth investigating. And the point is, if I cannot reproduce those three ways with a certain amount of certainty, I find it very hard to leave the control to it.
I guess for me, I think I understand this. Mentally, I never feel like I don't have control. And this is actually a little bit weird, because very clearly I let it do really scary things: it connects to my production system and checks the database and stuff like that. Well, I'm very close to my escape button if it does something stupid. But ignoring those more extreme versions of it: it runs tests, it writes code, it does some stuff, but at the end it presents me with a diff. It is me that reviews, it's me that interacts with it.
I can fall down a path where I'm like: this is sort of pure slop territory. Absurd has the main SQL part and the driver, which is really good, I think. But then it has this UI, called Habitat, because I just wanted to see my tasks. That part is pure slop, but I would not have written the UI in the first place before.
But you still didn't write it; that didn't change. Something else did, someone else, you could say.
But the thing is that now I have the UI, and I feel really happy about it, because, look, I run this on one of my agents and basically every single step it does, I can now click through and see. And it even does some nice things: if I click on a string, it changes from JSON rendering with the escapes to rendering it inline, which is much easier to debug. It gives me pure joy using this UI and debugging my shit better. And if I didn't have the agent, I wouldn't have committed myself to doing it at this point.
And because you mentioned earlier whether it's different for an open source project or some new startup versus another company: I think there are certain projects that you run into at a company where the actual hard part is not writing the code. In fact, it's very possible that you have a monumental task where the actual change is small, and what's actually really hard is validating that change, right? So you often create these crazy harnesses around it to validate that what you're changing right now is actually going to work, or you're creating this massive migration strategy. And I think what AI helps with now is creating the tools to then do those changes, because a lot of these tools are throwaway anyway; you're going to need them for a week, or maybe 90 days, or however long it takes you to run the migration. And then afterwards it's just: dev null, delete it.
So I think that can work in a very small company, up to 50 people, I'd say. But for me it's harder to imagine even if, I don't know, I were hypothetically building a company today and scaling that company to 500 people; not necessarily all of them working on code, of course, but let's say half of them, 250 people, working on code. Even if I can picture working with AI, especially for the kind of thing you mentioned, working really well in the first year or so, I think the major time-consuming activity in big companies doesn't have much to do with writing code.
But even, say, thinking about features?
No, it just makes it cheaper to run experiments, as long as those experiments stay small. But again, if we're talking about a big company, it's the same thing.
But let's take Sentry, for instance, right? One of my least favorite tools, one I appreciate exists but absolutely hated interacting with, was the Snuba admin.
I never had the chance to interact with it, but I heard a lot of stuff.
And it's not because this tool was badly built or something like that, but it was built with about as much time as you have on top of everything else. But then everybody else has to use this tool, right? And so if we were to take the very simplistic view of it: what's the quality of that tool, and what would the quality of that tool be if it was written with AI? Now, I'm not going to judge whether it would be better or worse. My point is that it actually took a really long time to write that tool too, right? Because you need to deploy it, and then you have to maintain it, and then there are going to be all the people who complain about it, and then someone else is going to throw some stuff into it. And from my recollection of actually trying to look at the existing source, it wasn't amazing.
So these tools exist whether you want them or not. A lot of the stuff within companies is this kind of internal thing.
But I think it's a very tiny portion at big companies, right?
All of these tools, in my mind, are where all my frustration is, in a way. It's not curl, it's not the Linux kernel, it's not these things that are actually built for purpose and are really, really rock-solid pieces of software engineering. It's all the other shit, because that's the stuff you run into, right?
The only way for me to debug an issue was to go through the Snuba admin. Again, I don't want to pick on this thing, but there was a period when it was a significant part of my day every once in a while: I was going in there and running queries, and it was just never great. And then I could look at some other tool I could use, one that actual people had put a ridiculous amount of effort into, compared to this internal thing. And I can tell you objectively, my internal debugging tools now are better. And I think that problem will not necessarily scale any worse at a large company.
My point was that the amount of time spent on those tools, once the company becomes bigger and bigger, is less and less relevant in a way. I don't see the majority of discussion happening at Sentry during meetings being about the Snuba admin; it's just there, you know. So yeah, to me that's kind of the barrier in between. Every company out there could use any AI coding tool to build UIs over stuff, or internal tools for making queries or checking whatever the state is. My point is about real software.
So do you actually think it would work for real software?
From my perspective, the main point for me would be to have something that is predictable. The lack of predictability is probably reason number one why I find it really hard to work with AI in general.
Is it predictability in the code that it generates, or is it predictability in being able to tell ahead of time if it's going to be able to do the task? Which part do you find most unpredictable?
I mean, at the end of the day it's kind of the same, right? Because if you have to re-roll 10 times, and even if five out of those 10 times it's decent and produces something I can work on, right?
For me it's different. If I can know ahead of time about the coin flip, let's say this is in fact a randomized thing and it works a little bit like a role-playing game: this is a task that requires throwing 3d6 or something, whereas this is one where you throw a single die and anything above a two is a success. Then you have, ahead of time, an indication of whether it's going to be a problem. And I think the reason why I feel a little bit more confident about AI now is that I have a better understanding, at least for the models that I'm using, of which problems the AI is at least going to make some progress on, versus which problems I don't even have to try, because it will not go anywhere or the output is going to be too random. So that to me is a different thing, because it gives me the opportunity to opt out of it before even wasting the coin tosses, in that sense.
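A back-of-the-envelope version of that framing, with made-up numbers: if you can estimate the per-attempt odds ahead of time, you also know roughly how many re-rolls a task will cost before you even start.

```python
# "3d6 task" vs. "easy roll": estimate p(success) per attempt,
# then the expected number of attempts is 1/p.
from itertools import product

def p_3d6_at_least(target: int) -> float:
    rolls = list(product(range(1, 7), repeat=3))
    return sum(sum(r) >= target for r in rolls) / len(rolls)

hard = p_3d6_at_least(15)  # ~0.09: a task the model rarely nails in one go
easy = 4 / 6               # a 3 or higher on a single d6
for name, p in (("hard", hard), ("easy", easy)):
    print(f"{name}: p={p:.2f}, expected attempts ~{1 / p:.1f}")
```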
Okay, a few things here. I guess if some conditions were different about the current state of AI: if the media exposure were less, if the pace of everything were slower in terms of investment, in terms of Twitter discussions, in terms of a lot of companies pushing hard on these kinds of things, and the fact that there are, practically speaking, two companies doing the vast majority of it. I'd say if all those conditions were different, if we were in a situation in which, first of all, I had more open-weights models or open source stuff in terms of LLMs, like how to do the training. You can say, yes, I can read papers, but papers don't tell the whole story.
I think what's interesting about this, if we go with this openness point for a second, is that we actually do have that, at least from a pure perspective of how it works. I'm not going to say it's easy to train a model, but it's actually not really all that hard. What is actually really hard is to source all the data, particularly to source it in a way that is, at the end of the day, ethically correct.
To my point: in general, I wish the LLM I could use was not trained on all the JavaScript code that is on GitHub nowadays, which, even when they started, even before the AI slop, I'd say on average wasn't good quality. I don't want something like that. I would rather invest the time to source my own training data. Yes, it would take me a bunch of time, but it would probably produce something closer to what I would like to see it spit out.
I think you probably might not produce enough corpus data yourself to be able to train an LLM on just that.
Yeah, no, I'm not saying just my code; I have a few ideas about which code, code that I didn't write, to train an AI on.
But yeah, I think you could actually probably get away with a well-selected set of open source, particularly if people opt into it. I don't think you necessarily need all of GitHub and all of the terrible influencer blog posts with little code snippets that don't really do much.
I think we are not going to get rid of AI. And I actually do think it is a little bit like the steam engine, in the sense that the basic idea of what the transformer looks like is simple enough that people will always be able to reproduce it. Good quality data, maybe less so, I don't know, but you can even run one of the existing models, use it as a teacher model, and generate more slop out of it. And you're going to get a model that's at least sufficiently decent, even if it's a few steps behind, and then it's open weights.
And the crazy thing is, if you actually do that, if you produce data from one model for another and do it a couple of times, the first iterations immediately get better.
Better in whatever evaluations they're doing.
Yeah, but again, I think we also don't have that many evaluation methods that make sense. I would agree with you that AI won't go away. I'd say sadly, though from my perspective I'm not sad about the fact that LLMs will stay. My main argument is that I would rather see a shift in how we produce models, especially for coding. I would like to see a direction in which we try to produce smaller models, to be able to run them on normal hardware without the need for I don't know how many clusters of super huge GPUs. That's why I would like to see the conversation in some places shift at some point, because again, yes, AI won't go away.
But at the same time, it's very hard for me, even today, to make an investment bet on a company like Anthropic or even OpenAI. Even if it's worth I don't know how many billions of dollars, I don't know whether in five years those companies will still be there. Are they going to be profitable? I don't know; there's a bunch of questions, right?
So, I also don't know. I really don't know if they're going to stick around. I don't even know if the tech is necessarily worth the money that the market seemingly puts on it right now. And clearly we have a bunch of energy problems too, right? So all of this, I think, is unsolved. But I also think that even if you take one of the open-weights models that exist today, they're actually not that terrible.
I mean, I still can't run them without renting some super Hopper setup somewhere in some data center, and it will maybe even take a couple of years to get to that point. And I think the models will have to get smaller, and there is some argument to be made that they can be smaller, because you don't need all the world's knowledge to program. But I feel like we're
at least on a trajectory where it still makes sense to me, right? To be fair, I think 90% or more of all of those AI startups are going to fail in one form or another, because the market is way too frothy. It doesn't make a ton of sense to me. But the underlying reality to me is that even if we get stuck with transformers, and even if we get stuck with what we have today, that shit's pretty amazing still. And I think that, given the quality of the models we have right now, we can probably get away with a fraction of the parameters, so that we can actually run them on smaller hardware. And there is some indication that you can actually fine-tune larger models, sorry, fine-tune is the wrong term here, distill larger models for more narrow use cases onto smaller models and still have decent quality, right? So I feel like it's at least lined up in a way
where I can totally imagine that this will be okay. And that's why, for me, I'm looking at this mostly from the angle of what it can technically do right now. Because I do think we're going to have these things running on our own, or we're going to have them running on maybe alternative implementations, alternative trainings, and maybe selecting better code, rather than whatever has the highest entropy on GitHub, will also lead to better results. Hopefully. I mean, I don't think they're on this; I don't think they're in a position where they think we have AGI or anything like that. My suspicion is just that if you actually have higher-quality code in there, you might actually get higher-quality output. That's my assumption. Might be wrong, but I don't know if anyone's actually training on that front.
I mean, it's also very hard for me to make a prediction, and this is probably why the vast majority of people out there see me as a negative person in terms of AI, but I'm really not. The thing is, it's hard for me to expect these things to keep getting better, because if we make a parallel to Stable Diffusion, to diffusion models for images and videos, okay, video has a lot more going on than just diffusion, but in the last four to five years, yes, the quality improved. But if you take just the last two years, they didn't improve much. It feels more like we're already plateauing somewhere on the curve.
It also looks terrible. I mean, I think the problem is that I never really cared about the images, because the images to me are novel and interesting, but I don't have a use for them, and all the uses I can imagine are terrible. I don't want to watch advertisements made with AI, or watch cinema movies made with the help of diffusion models. I understand this is probably where we're going to go, and I will complain about it because I hate it, but I don't have the same reaction to text generation and code generation. Text generation a little bit: if someone sends me AI slop to my email, I will get angry. But there is a responsible use of it, and I don't really see the responsible use of diffusion AI. So I think maybe diffusion for...
I don't know. Like picture creation.
The Granian logo is made with AI, straight from a diffusion model.
You see? I think there are cases. I'm also not against the use of it for advertisement; I think it really depends on what you think about the business of advertisement in general. But my point is, if that's the working example: fundamentally there are a lot of similarities in how an LLM works and how a diffusion model works, at least mathematically speaking. So it's very hard for me to say that, yes, in two years LLMs will be amazing compared to today.
The thing is, for me the difference is that the LLM for code generation is already amazing today. So I don't need it to get better; I just need it to run cheaper and faster, and maybe have less shit in it. There's some shit in it that I don't need. But it's not like I need a breakthrough in computing technology or in model architecture or even in the data input, necessarily. It feels to me like it's already there. And I think that's the big difference, because I feel like I'm getting so much utility out of it. It's expensive right now, for sure; I'm paying too much money for the inference, but I can imagine the cost getting to a tenth or less
of what it costs.
How?
Well, in part because you could theoretically right now run, I don't know, Kimi or any of those models at around 300 billion parameters with reasonable output on GPUs you can buy. They're not GPUs you can cheaply buy, but you can buy them. So if you just play this out for four years, we're going to be at a point where many more people have a GPU that is capable of running models at that size. And I already know that that's good enough. Is it perfect? Probably not, because there will be better things coming down the line and people will invest more shit into it. And eventually the market
is going to collapse; I think there is a bubble in it. But if we ignore that, if we ignore the absurd interest of investors in getting a slice of this already overstuffed pie, I think the fundamentals are that this is actually going to scale down, so that I can realistically imagine it on my computer. Even my five-year-old M1 Max thingy with 64 gigabytes of RAM, which it can largely utilize for the GPU, can actually run some pretty good models. Am I doing it a lot? No, but every once in a while I do it just to see what's possible. And I can see it being not entirely unrealistic.
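For context, the rough arithmetic behind running a model of that size locally; this is weights-only and ignores the KV cache and activations, so treat it as a lower bound.

```python
# Weights-only memory for a ~300B-parameter model at common precisions.
PARAMS = 300e9

for label, bytes_per_param in (("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)):
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{label}: ~{gb:.0f} GB for the weights alone")
# fp16 ~600 GB, 8-bit ~300 GB, 4-bit ~150 GB: hence aggressive quantization
# plus large unified memory, or a small GPU cluster, is what makes
# "hardware you can buy" plausible at all.
```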
I don't know, because the number of times I'm pissed off compared to the times I'm amazed is probably not in its favor. And I guess the number one cause of that, for my workflows, is that I work on big repositories, a big amount of repos; at Sentry we have this ops monorepo with everything inside it. And the amount of times any LLM out there gets confused about the context in which it operates, the amount of times it just says nonsensical stuff about Terraform modules, is a really good example. Okay, so one annoying thing about Terraform is that every single provider, if you use the GCP or the AWS provider, pushes out a huge amount of changes; they push a new version about once a week, right? That's the rate.
This is why people should commit to backwards compatibility.
But again, that would be something where, for me, an AI would help, but it cannot, because it gets so fucking confused by that amount of changes and that amount of context that it just spits out nonsense. Like: hey, you used a string here, but it's supposed to be an array of strings. And you check the documentation of the Terraform module: oh, that's not true. Things like that, right?
So I'll also say that my experience with using Claude for Terraform was so bad that I moved to Pulumi.
You see? But I mean, from my perspective, as soon as Pulumi gets the same amount of usage as Terraform, we'll end up with the same thing.
Unless they learn and don't change everything all the time.
No, because the providers follow the cloud providers' APIs, and those can do whatever they want; that's the reality.
But maybe if people start complaining that the agents don't work if they change stuff all the time, they will change. A question that I have in general is: nobody gave a shit about humans, but with agents, I think people will give a shit.
Do we give a shit about machines? I don't know.
I feel like we care about the machine a lot more than about the human, honestly.
Because the machine is measurable.
But yeah, back to my point: at least for my daily use case, I think 99% of my issues with AI are because of the context, and from my perspective, even if we go to a 10-billion-token context, that won't change anything unless something fundamentally changes in the way we treat the latent space. I don't think scaling the context is the answer to the problem for my use cases; I think the answer is changing the architecture around how we loop into the latent space. And so, I guess, if some company at some point starts producing a foundational model that works differently in terms of context and how it treats tokens in inputs and outputs, yes, then probably at that point I will be optimistic about this thing getting better.
So for you it's about learning and remembering, right? That it doesn't start from zero and has a way to...
Yes. Especially because I cannot control it: if I use Anthropic models or OpenAI models, I cannot control the learning in general, because they put out a new version of the model but it's still called GPT-5 Codex or whatever, and maybe everything changed for my use case, and I have no idea.
This is a real problem. Even a move like the one from 4 to 4.5 changed things dramatically, and I was not able to evaluate it, because I have no idea what changed. It's just different now.
So I share that opinion, and I share that problem. I don't know what the solution is. I'm just maybe a little bit too optimistic that the open-weights models are a counterbalance to the OpenAI-Anthropic duopoly, because we do have a bunch of Chinese models and I think they are actually on par.
Yeah. But again, even if you look at the Stable Diffusion ones, or if you take Qwen-Image, or if you take Qwen for just text generation...
I don't think that they're better. I just think that they are at least at the point where, because they exist, if Anthropic and OpenAI decide to go even more nuclear on not sharing anything, then I feel like: okay, off to China, I don't really care. And you can see this now: the couple of American companies that actually want to train their own models are just doing it on top of the Chinese models. The Cursor model clearly is trained on one of those Chinese models; I think Windsurf did the same thing. So yeah, it's a little bit ridiculous that we have to go to the other side of the world and to another political system to get open-weights models, but I will take it, because nobody else has them. But to me, it creates this counterpoint;
at least for me, it makes it feel like there's a possibility that we are not locked in. If we didn't have the Chinese models, I would actually be much, much more negative on the whole thing, because then it would imply to me that you can only run this as a very large corporation, and there's no chance that this will trickle down. Right now, the existence of those models implies that it's, well, "easy" is hard to say, but it's in the realm of possibility that we will see more of those. And even this absolute failure of a European company, this Mistral thingy, does create a model. It is what it is, but they're capable of producing something. Is it the most amazing? No, not at all. And I think they could do much better if they had a little bit more ambition, but they at least have a model. And so it shows that this is not just Google running the world. I think there's going to be some
competition.
Yeah, that's for sure. I don't know, I also plan to give Cursor another try now that they've released version two with Composer and that kind of thing, to see. Because again, I saw some videos of people trying it, and at least their model is very fast. So at least that could change my perception: okay, maybe I still have to do 10 re-rolls, but the amount of time I need to wait is different. So maybe, yes. In terms of perspective, again, I see some tiny things here and there that make me wonder about the future. But for me to actually say, okay, I can trust in general what it spits out, or what it makes from my thoughts, that would still be hard, at least within my own fields and scope, even on my open source projects. I think I could try to have it do some of the annoying stuff for Granian, but it still feels risky.
Try some repro cases. They are fun.
They're really fun. Or just see what it does if you let it loose on an open source project. I did this for a bunch of mini changes; I just wanted to see how adventurous it gets, and it gets bloody adventurous. This is interesting.
I don't know. The vast majority of things I get annoyed about on my projects are Windows-related, because I had this terrible, terrible idea to support Windows when I could just have said no. But anyway, I tried it once.
But you know what? Fun thing: I had this situation very recently. I have a Windows computer, but I don't want to boot it for one stupid bug fix. So one of the things that I did is, I told Claude to fix the problem. It knows it's on a Mac, and I told it: look, you can commit to a branch, you can push to GitHub, and then you can run the gh command to poll the status report. And so it actually iterated by committing to GitHub and using the GitHub runner as a Windows machine. It was marvelous. It took it about three hours, but it was working. And I think it's kind of fun.
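A sketch of that loop, assuming a GitHub Actions workflow runs a Windows job on every push. The branch name and commit message are made up, and the agent drives git and the gh CLI directly rather than through a script; this just shows the shape of the iteration.

```python
# Commit, push, then poll GitHub Actions via the gh CLI until the run finishes.
import subprocess
import time

def run(*cmd: str) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

run("git", "commit", "-am", "attempt windows fix")  # hypothetical change
run("git", "push", "origin", "HEAD")

while True:
    out = run("gh", "run", "list", "--branch", "fix-windows",  # hypothetical branch
              "--limit", "1", "--json", "status,conclusion")
    if '"status":"completed"' in out:
        print(out)  # read the conclusion (and failing logs) and iterate
        break
    time.sleep(30)
```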
Yeah, I mean, I could try that. But yeah, it was nice talking.
Yes, absolutely. Thanks. Yeah, thank you. It was nice.