Talking Agentic Engineering with Giovanni Barillari


Transcript

Hello. Hello, Armin. Nice to see you again. Nice to see you. I'm Giovanni. I'm currently working at Sentry as a Site Reliability Engineer. But let's say I worked a lot with Python and web development, stuff like that. And yeah, I contributed to open source, like, I think I started like 15 years ago with Web2Py, the web framework from Massimo Di Pierro.

So yeah, that's kinda me, I guess. I'm actually a physicist, so I might have some weird opinions sometimes on software and the industry, but yeah, I guess that's part of me being me. Yeah. So we had a discussion, when was it, almost two weeks ago, I think, briefly in a coffee place, on AI and agentic programming.

And I feel like there were two themes we discussed. One was the impact that it has on open source, and sort of what it could be versus what it is today. I think those were probably the main things. So one of the weird things at the moment is that I use agentic coding a lot and I think it works for me.

But at the same time, obviously I know that a lot of people have a completely opposite opinion to that. I started going into: why is that? And I think that at least my realization at the moment is that it actually takes quite a bit of time to learn, which is ironic because it sort of feels like it shouldn't.

But I feel like, in a weird way, there's a little bit of a skill to it. So one of the things that I did recently, for instance, was I built this, I mean, I built a lot of stuff for the company, but one thing I open sourced was this queue thing, like a durable execution system, Absurd, yeah.

And the reason I was even willing to build it, in a way, was because, from a purely philosophical, software engineering point of view, I believe in two things very strongly: as few dependencies as possible, and for a system to crash really well and recover. The durability should come from the fact that it doesn't matter how it terminates itself; it should come back from it in a way.

And so durable execution for me is an interesting thing, but I also didn't want to combine 20 different sort of startup-y kind of solutions out there together, and I felt like it should be easier to do. The thing is that I was able to do this with Claude and Codex rather efficiently.

And otherwise I probably would not have built it at this stage of the company, because it is sort of a side quest. So that's one of my recent lessons: you can actually build open source software with it and actually feel kind of good about the quality that it has.

And I feel like there's more leverage for me to maybe not over-engineer things quite as badly, because I can do it with just the agent. I feel like there's a little bit of a pushback against sort of the madness of maybe the last 10 years of software engineering that happens in a lot of companies, which is, all of a sudden there's an army of third-party services that you don't really understand.

So what's your take on that? So, I think I have a lot of points of view on this, and in general about AI for coding, I guess. I think I have a draft for a blog post sitting there for a couple of years.

And once in a while I try to add stuff there and never publish it. But anyways, I digress. I think it really depends on the context, right? In which you operate, because I guess we are on opposite sides of things regarding almost everything on the work side, right?

Like, I work for an established company. You're starting with something new as a startup. I did a bunch of startups in the past, but that was 15 years ago, so there was nothing like this. I don't know if, at the time, had I had the chance, I would have used something like that.

But specifically for open source, I think it depends. First of all, I'd say in the last 10 years, even in a community like the Python one, which I'd say has a slow pace compared to something like JavaScript or stuff like that,

things have changed quite a lot, even just in terms of the amount of libraries out there that you can, you know, just pull in as a dependency. So the amount of code and projects out there grew a lot, and the quality of those projects is very different, right?

Like, we have some really long-standing projects, and I'm not saying they're good quality by definition; we have plenty of new stuff, some of that is good, some of that is bad. But the role of agentic coding for managing an open source project to me is weird, because I guess my main concern when I work on one of my open source projects is the long-term maintenance.

I rarely put something out there and just never care about it anymore. I'm not really into source-available rather than open source, right? There's very little stuff that I don't touch anymore after I put it out there for people to use.

So yeah, from my perspective, in your specific case I see how it works, because it's something very self-contained, quite small in terms of features or scope or stuff like that. But I guess my question for you would be: how does that project look in two years? Do you care at all?

Yeah. I mean, I think, so first of all, I think you're right that there's a lot more source code now, and I actually always found that to be a problem. Take curl as an example. It's not that I love curl; I think there's a lot of stuff wrong with curl, but what I love about curl is that it commits itself to very, very long-term stability. It's sort of like a rock on the shore,

if you want to call it that, I guess. It's just there, it's going to run everywhere, it doesn't change. It's a very, very reliable piece of software. It has its weird behaviors, and I have learned a lot of them over the years, but it's there, it works. And it means that a lot of improvements can land in one piece of infrastructure that many of us are using.

Same with SQLite and many other projects. There's a completely different vibe than, for instance, the JavaScript ecosystem, where there are so many more projects sprawling. And I think the reason here is actually less that people are combining efforts together, and more like: hey, I also want to build an open source project.

They see the act of creating it and not the act of maintaining it. The maintaining part is the hard one; creating it is the easy one. Yeah. And because maintaining is hard, there's a question of how well that works. I think in general, my theory is: because AI makes writing new code much easier,

that code should not be open source most of the time. It should be whatever you need for your code base, and then maybe the entire code base goes public, I don't know. But we don't need more open source libraries. We need fewer. We need more consolidation, more people working together to solve problems, not 20 different JavaScript front-end frameworks or whatever.

But I actually think that if you want to maintain things, if you really want to commit yourself to maintenance, then AI will actually help a lot. Because the thing is, you need to commit yourself to maintenance. What does that mean?

There's a lot of stuff that comes with it, some of which is really, really crappy, like writing changelogs or finding repro cases. This is one of my most favorite things that AI can do: here's a weird description of a problem that someone might have run into, and, Claude, make me a repro case.

I hate doing that, but it can do that, right? So I think it depends on how you wield the sword. It would be a different thing if you sort of say: hey, the way I'm going to maintain my libraries is I'm just going to AI-slop-commit everything.

Right. There will be a way of doing that, but I don't think it's going to be one that has a lot of users at the end of the day. I don't know, I think it's complicated. Yeah. But to that point, my take would be, from my experience, and again, maybe that also depends on the scale of the project, in terms of popularity of the projects I maintain, but even at the scale I am at, I rarely find the act of producing code for my open source projects the hard part of the maintenance.

To me, the major burden comes from issue management or release schedules. Something that Claude is capable of assisting with. No, I think you're right. I gave you those two examples, but I also feel like I'm quite fast at writing code, and yet I very rarely write code now.

I mostly sort of delegate to the agent. And one of the reasons I actually found this valuable... so for instance, Absurd is a good example. If I had written it by hand, all of it, I wouldn't have built it like I did, because I hate SQL.

I really, really do. It's not my favorite language, like for most people. So I would have erred on the side of writing most of it in Python, writing a Python SDK, and then doing the least amount of SQL necessary. But actually that's precisely what I didn't want to do.

I knew that the right way of doing this is the same way PGMQ does it, which is: you write a bunch of stored functions, because then you do everything from the database and the SDKs are very, very tiny. And so now I can have a Go SDK, a Python SDK, a JavaScript SDK, but I would not have enjoyed writing that, because it would have involved writing one of my least favorite programming languages.
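To make that concrete, here is a minimal sketch of the thin-SDK idea: all of the queueing logic lives in Postgres stored functions (the PGMQ approach), and the client just forwards parameters. The function name `absurd.enqueue_task` and its signature are hypothetical illustrations, not Absurd's actual API.

```python
# A thin-SDK sketch: the database owns all the logic, the client only
# calls a stored function. The schema/function names are made up here.
import json
import psycopg  # psycopg 3

def enqueue(conn: psycopg.Connection, queue: str, payload: dict) -> int:
    """Enqueue a task by calling a stored function in Postgres."""
    with conn.cursor() as cur:
        # Locking, validation, and state transitions all happen inside
        # the stored function, so this wrapper stays tiny and is easy
        # to replicate in Go, JavaScript, or any other language.
        cur.execute(
            "SELECT absurd.enqueue_task(%s, %s::jsonb)",
            (queue, json.dumps(payload)),
        )
        return cur.fetchone()[0]

with psycopg.connect("dbname=app") as conn:
    task_id = enqueue(conn, "emails", {"to": "user@example.com"})
    print("enqueued task", task_id)
```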

SQL is good for queries, but the moment you do more complicated stuff with it, it's not enjoyable. And so I think it changes the perspective of how you do things a little bit, because all of a sudden I don't mind as much anymore doing the things that I always knew were right to do but just weren't enjoyable.

I had this thing recently where I basically had this small little utility script, and one of the quote-unquote requirements that I made for myself was that it should work on Linux and Mac. And it just didn't, it just didn't, right? And it became so annoying to maintain, because it basically depends on different parameters of sed and stuff like this.

I just went to Claude and said: look, let's just rewrite this in Perl. It's not that I like Perl, but it runs everywhere and it's very good with regular expressions, right? Yeah. It's one of the things that you can throw on any machine and you don't have to deal with dependencies, and for what it was doing, which is basically just parsing a little bit of stuff in a build step,

it's perfect, right? Yeah. I don't know, I guess maybe I'm just lucky. If I think about, in general, my daily life in programming, both on the work side of things and the open source side, the amount of actually boring stuff, or things that annoy me,

is very little, in general, right? Like, I don't know. Probably most people find it annoying to write Bash scripts, but I learned that, I don't know, ages ago. So it's hard for me to make that step.

Where I see a problem and maybe, yes, I don't like Bash, but I need to write, I don't know, those 10 lines of Bash. It's hard for me to do the switch and say, instead of just typing, switch the context, right? Yeah.

And then say: oh, okay, let's ask. I felt the same at one point, and now it has somehow switched in my head. Yeah. I have a different version of this, which is: for many years, in my mind I had this idea that I would not drive an automatic car.

It was like: I will only ever buy stick-shift cars. But at one point I was like, this just doesn't make a ton of sense to me, because I really like, what's it called, adaptive cruise control, which works much better if it can go down to zero and you don't have to shift like a maniac.

So I had to open myself up to the idea that there's a car for fun, and then there's a car that just has to get me through the day, and that one works better as an automatic. And I feel like maybe I had made the same shift in my mind at one point about the act of punching it into the keyboard.

It is stimulating, but it's not necessarily the important part. Yes, it's definitely not the important part. I spend a lot of time thinking before actually typing. But I guess if I have to put it on the balance, right, the thing to say: okay, now I've done all of my thought process, which I will do in any case. To me, it's very hard to delegate the thinking to whatever LLM exists out there.

At least for now, mostly because of the lack of predictability in the output. It's very hard for me to delegate the thinking to something which is not really repeatable. The same input can give me 10 different answers. Do you still find that to be a problem now, that it's very unpredictable?

I'd say, again, in general, no, but talking about software, it's not the way I approach software, right? My way of architecting something in general is not: there are a hundred possible ways of doing this. My general approach is: there are probably two or three correct ways of doing this,

and I need to pick the optimal one for my use case or context. Of course there are probably a hundred ways to do the same shit, but realistically speaking, probably three out of those hundred are worth, you know, investigating. And the point is, if I cannot reproduce those three ways with a certain amount of certainty, sorry,

yeah, I find it very hard to, you know, leave the control to that, right? I guess for me, I think I understand this. Mentally, I never feel like I don't have the control, in a way. And this is actually a little bit weird, because very clearly I let it do really crazy stuff,

right? Like, it connects to my production system and checks the database and stuff like this, right? Yeah, I would never, never do that. Well, I'm very close to my escape button if it does something stupid. But ignoring those sort of more extreme versions of it: it runs tests, it writes code, it does some stuff, but at the end it presents me with a diff.

And so it is me that commits, it is me that reviews, it's me that sort of interacts with it, right? So I don't feel like my agency is gone. I can sort of fall down into a path where this is sort of a pure slop project where I don't really care.

Absurd has the main SQL part and the driver, which is really good, I think. But then it has this UI, which is called Habitat, where I just wanted to see my tasks. And I didn't care. This is pure slop, but I would not have written the UI in the first place before.

Right. Yeah. But you still didn't write it, like, that didn't change. Something else did, like you said, someone else. Yeah, something else. But the thing is that now I have the UI, and I feel really happy about it, because, look, I run this on one of my agents, and basically every single step that it does I can now click through, I can see it.

And it even does some nice things: if I click on a string, it changes from the JSON rendering with the escapes to inline, which is much easier to debug. It gives me pure joy, using this UI and debugging my shit better.

If I didn't have the agent, at this point I wouldn't have committed myself to doing it. So I think it changes the calculus in a way. And because you mentioned earlier, is it different for an open source project or some new startup versus another company: I think there are certain projects that you run into at a company where the actual hard part is not writing the code.

In fact, it is very possible that you have a monumental task where the only actual change is one configuration parameter. And what actually is really hard is validating that change, right? So you often create these crazy harnesses around it, to validate that what you're changing right now is actually going to work, or you're creating this massive migration strategy,

right. And I think what AI helps with now is helping you create the tools to then do those changes, in a way, because a lot of these tools are throwaway anyways. You're going to need them for a week, or maybe 90 days, or however long it takes you to run the migration.
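As an illustration of the kind of throwaway harness being described here, a sketch that replays sampled requests against a control deployment and a candidate one (the one with the changed configuration parameter) and diffs the responses. The hostnames and the sample file are hypothetical placeholders, not anything from an actual migration.

```python
# A disposable validation harness: replay recorded request paths against
# a control and a candidate deployment and report any response mismatches
# before flipping the real configuration parameter. All names are made up.
import json
import urllib.request

CONTROL = "http://control.internal:8080"      # current config
CANDIDATE = "http://candidate.internal:8080"  # changed config

def fetch(base: str, path: str) -> bytes:
    with urllib.request.urlopen(base + path, timeout=10) as resp:
        return resp.read()

mismatches = 0
with open("sampled_requests.jsonl") as f:
    for line in f:
        path = json.loads(line)["path"]
        if fetch(CONTROL, path) != fetch(CANDIDATE, path):
            mismatches += 1
            print("mismatch:", path)

print(f"{mismatches} mismatching responses")
```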

And then afterwards it's like: just /dev/null, delete it. So I feel like, even for... I think that can work in a very small company, I don't know, up to 50 people, I'd say. But for me it's harder to imagine, even if I, I don't know, picture building a company today, right,

and scaling that company to 500 people. Not necessarily all of them working on code, of course, but let's say half of them, whatever, 250 people working on code. Even if I can picture working with AI, especially for that kind of thing you mentioned, working really well in the first, I don't know, year,

I think the major time-consuming activity in big companies doesn't have anything to do with the code, not even thinking about features. But I don't think the AI makes it harder. No, it just makes it cheaper to experiment, as long as those experiments are self-contained enough.

But again, if we're talking about a big company, it's the same. But let's take Sentry, for instance, right? I used to work there. One of my least favorite tools, and I appreciate that it exists, but I absolutely hated interacting with it, was the Snuba admin. I never had the chance to interact with it, but I heard a lot of stuff.

And it's not because this tool was badly built or something like this, but it was built with about as much time as you have in addition to all the other shit that you have to do. But then everybody else has to use this tool, right? And so if we were to take the very simplistic view of it: what's the quality of that tool,

and what would the quality of that tool be if it was written with AI? Now, I'm not going to judge if it's going to be better or worse; that's not my point. My point is that it actually took a really long time to write the tool too, right?

Because you need to deploy it, and then you have to maintain it, and then there are going to be all the people complaining about all the stuff that it doesn't do, and then someone else is going to throw some stuff into it. And from my recollection of actually trying to look at the source of it, it wasn't amazing.

Yeah, yeah. But it wasn't built to be amazing. Yeah, of course. So these tools exist whether you want them or not. A lot of the stuff within companies exists that is this internal thing. Yes, but I think it's a very tiny portion at big companies, right?

Maybe, but it is still, in my mind... all of these tools, in my mind, are where all my frustration is, in a way. Okay. It's not curl, it's not the Linux kernel, it's not these things that are actually built for purpose

and are really, really rock-solid pieces of software engineering. It's all the other shit, because that's the stuff you run into, right? Like, the only way for me to debug an issue is to go through Snuba. Yes. Again, I don't want to pick on this thing, but this was a significant part of my day every once in a while.

I was going there and running queries, and it was just never great. Yeah. And then I could look at this other tool I could use, where actual people put a ridiculous amount of effort in, compared to the internal tooling that we built. And it was better, right?

And I can tell you objectively, my internal debugging tools now are better. Yes. But built in a fraction of the time. And I think that problem will not necessarily scale any worse at a large company. No, I completely agree. My point was, I think the amount of time spent on those tools, once the company becomes bigger and bigger, is less and less relevant, in a way.

I don't see the majority of discussion happening at Sentry during meetings being about the Snuba admin; it's just, you know, it's there. So yeah, to me, that's kind of the barrier in between, you know. I totally agree, every company out there could use an AI coding tool to build UIs over stuff, or, again, internal tools for making queries or checking whatever the state is.

My point is about real software. Okay, here's a question. How good would AI have to be? How would it have to work, so that you actually think it would work for real software? From my perspective, the main point for me would be to have something that is predictable. Predictability, to me, is probably reason number one why I find it really hard working with AI in general.

Is it predictability in the code that it generates, or predictability in being able to tell ahead of time if it's going to be able to do a task, or a combination thereof? Which part do you find most unpredictable? I mean, at the end of the day, it's kind of the same, right?

Because if you have to reroll 10 times, right? Even if five out of those 10 times it's decent and it produced something I can work on, patch and adapt? For me it's different, because...

If I can know ahead of time... if the coin flip, basically, let's say this is in fact a randomized thing, and it would work a little bit like a role-playing game. And this is, I don't know, a task that requires throwing 3d6 or something,

whereas this is one that requires one die, and everything over two is a success. You have, ahead of time, an indication of whether it's a problem worth throwing the AI at or not, right? And I think the reason why I feel a little bit more confident about AI now is that I have a better understanding, at least for the models that I'm using, of which problems are problems where the AI is at least going to make some progress, versus which are the problems where I don't even have to try, because it will not go anywhere or the output is going to be too random.
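The dice analogy can be made concrete: the point is not the randomness itself but knowing the odds before you roll. A toy calculation, with made-up success thresholds, since the conversation doesn't pin down what counts as a success for the 3d6 case:

```python
# Toy odds for the role-playing analogy: if you know a task's "dice model"
# ahead of time, you can decide whether it's worth handing to the AI at all.
# The success thresholds here are invented for illustration.
from itertools import product

# Hard task: roll 3d6, succeed on a total of 15 or more.
rolls = list(product(range(1, 7), repeat=3))
hard = sum(1 for r in rolls if sum(r) >= 15) / len(rolls)

# Easy task: roll one d6, succeed on anything over 2.
easy = 4 / 6

print(f"hard task success: {hard:.0%}")  # ~9%: probably not worth trying
print(f"easy task success: {easy:.0%}")  # ~67%: worth a shot
```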

So that to me is a different thing, because it gives me the opportunity to opt out of it before even wasting coin tosses, in that sense. I'd say, okay, a few things here. I guess, if some conditions were different about the current state of, you know, AI.

Things like: if the media exposure was less, if the pace of everything was slower in terms of investment, in terms of Twitter discussions, in terms of a lot of companies pushing hard on these kinds of things. And the fact that there are, practically speaking, two companies doing the vast majority of the heavy lifting here on AI and the topic.

I'd say, if all those conditions were different, right? If we were in a condition in which, first of all, I had more open-weights models or open source stuff in terms of, you know, LLMs; I don't know, like how to do the training. You can say, yes, I can read papers, but papers don't tell the whole story,

right? I think what's interesting about this, if we go with this openness point for a second: we actually do have, at least from a pure perspective of how it works... I'm not going to say it's easy to train a model, but it's actually not really all that hard.

What is actually really hard is to source all the data, particularly to source it in a way that is, at the end of the day, ethically correct. But at least I'd have the chance, right? To my point: in general, I don't want... if the LLM I could use was not trained on all the JavaScript code which is on GitHub nowadays, which I would say, even when they started, even before the AI slop, the quality wasn't good, right?

On average, it wasn't good. I don't want something like that. I would rather prefer to invest time into sourcing out my own source of, you know, learning. Yes, it would require me a bunch of time to do that, and probably produce something closer to what I would like to see the AI spit out.

I think you probably might not produce enough corpus data yourself to be able to train an LLM just on your own code, right? Yeah, no, no, I'm not saying just my code, but I kind of have a few ideas on which code, that I didn't write, to train an AI on.

But yeah, I think you could actually probably get away with a well-selected set of open source, particularly if people sort of opt in. I don't think you necessarily need all of GitHub and all of the terrible influencer blog posts with little code snippets that don't really do much.

I think we are not going to get rid of AI. It's here to stay. And I actually do think it is a little bit like the steam engine, in the sense that the basic idea of what the transformer looks like is simple enough that people will always be able to reproduce it.

So then there's just a question of data. Seemingly data will be easy to get. Good-quality data, maybe less so, I don't know, but you can even go to one of the existing models and just use it as a teacher model and sort of generate more slop out of it.

And you're going to get a model that's at least sufficiently decent, even if it tumbled five times through open weights. Who knows? So the crazy thing is, if you actually do that, if you produce from one model to another and you do it a couple of times, the first iterations immediately get better.

Better under what? Better in whatever evaluations they're doing. Yeah, but again, I think we also don't have that many evaluation methods that make sense nowadays, but whatever. I would agree with you that AI won't go away. I'd say sadly, though from my perspective I'm not sad about the fact that LLMs will still be there 10 years from now.

My main argument is, I would rather prefer to see a shift in how we produce models, especially speaking for coding stuff. I would like to see a direction in which we try to produce smaller models, to be able to run them on normal hardware, without, you know, the need of I don't know how many clusters of super huge NVIDIA GPUs. Which is why I'd like to see the conversation in some places settle down at some point, because again, yes, AI won't go away.

But at the same time, it's very hard for me to even make, today, an investment in a company like Anthropic or even OpenAI. Even if it's worth I don't know how many billions of dollars, I don't know if in five years those companies will still be there.

Are they going to be profitable? I don't know, there's a bunch of questions, right? So I'm not saying... I also don't know. I really don't know if they're going to stick around. I don't even know if the tech is necessarily worth the amount of money that the market seemingly puts on it right now.

And clearly we have a bunch of energy problems too, right? So all of this, I think, is unsolved. But I also think, even if you take one of the open-weights models that exist today, they're actually not that terrible. I mean, I still can't run them without renting some super Hopper setup somewhere in some data center,

and maybe it will even take a couple of years to get to that point. And I think the models will have to get smaller. And I think there is some argument to be made that they can be smaller, because you don't need all the world's knowledge to write programs. But I feel like we're at least on a trajectory where it still makes sense to me, right?

To be fair, I think most, like 90% or more, of all of those AI startups are going to fail in one form or another, because the market is way too frothy. It doesn't make a ton of sense to me. But the underlying reality to me is that even if we get stuck with transformers, and even if we get stuck with what we have today, that shit's pretty amazing still.

And I think, given the quality of the models that we have right now, we can probably get away with a fraction of the parameters, so that we can actually run them on smaller hardware. And there is some indication that you can actually fine-tune larger models... sorry, fine-tune is the wrong term here, but that you can distill larger models for more narrow use cases onto smaller models and still have decent quality, right?

So I feel like it's at least lined up in a way where I can totally imagine that this will be okay. And that's why, for me, I'm looking at this mostly from: what can it technically do right now? Because I do think we're going to be running these things ourselves, or we are going to have them running on maybe alternative implementations, alternative trainings. And maybe selecting better code, rather than whatever has the highest entropy on GitHub, will also lead to better results.

Hopefully. I mean, I don't think they're on this. I don't think we're in a position where we have AGI or anything like that. My suspicion is just that if you actually have higher-quality code in there, you might actually get higher-quality output.

That's my assumption. Might be wrong, but I don't know if anyone's actually training it on that front. I mean, it's also very hard for me to make, you know, a prediction. And this is probably why the vast majority of people out there see me as a negative person in terms of AI, but I'm really not.

The thing for me is that it's hard to expect these things getting better, because, I don't know, if we make a parallel to Stable Diffusion, like diffusion models for images and videos... okay, videos have a lot more going on other than diffusion, but in there, in the last, I don't know, four to five years,

yes, the quality improved, but if you take the last two years, they didn't improve much. It feels more like we're already plateauing in some way on the curve. It also looks terrible. I mean, I think the problem is, I really never cared about the images, because the images to me are sort of novel and interesting, but I don't have a use for that.

And all the uses I can imagine are terrible. I don't want to watch advertisements made with AI or watch cinema movies made with the help of diffusion models. I understand this is probably where we're going to go, and I will complain about it because I hate it, but I don't have the same reaction to text generation and code generation. Text generation a little bit:

if someone sends me AI slop in my email, I'll get angry. But there is a responsible use of it, and I don't really see the responsible use of the diffusion AI, right? So I think maybe the diffusion for... I don't know. Like picture creation. The Granian logo is made with AI.

Like, straight with flaps. My Absurd logo is also made with AI. You see? I think there are cases... I'm also not against the use of that for advertisement. I mean, I think it really depends on what you think about the business of advertisement in general. But yeah, my point is, if that's the working example... fundamentally, there are a lot of similarities in how an LLM works and how a diffusion model works, at least mathematically speaking.

So it's very hard for me to say: yes, in two years LLMs will be amazing compared to today. Right. The thing is, for me, the difference is that the LLM for code generation is already amazing today. So I don't need it to get better. I just need it to be running cheaper, faster, maybe have less shit in it.

There's some shit in it that I don't need, but it's not like I need a breakthrough in computing technology, or in model architecture, or even necessarily in the data input. It feels to me it's already there. And I think that's the big difference, because I feel like I'm getting so much utility out of it.

It's expensive right now, for sure. I'm paying too much money for the inference, but I can imagine the cost getting to a tenth or less of what it is. How? Well, in part because you could theoretically right now run, I don't know, Kimi or any of those models with, I don't know, 300 billion parameters with reasonable output, on GPUs you can buy.

They're not GPUs you can cheaply buy, but you can buy them. So if you just play this out for four years, we're going to be at a point where many more people have a GPU that is capable of running models at that size. And I already know that that's good enough.

Is it perfect? Probably not, because there will be better things coming down the line, and people will invest more shit into it, and eventually the market is going to collapse. I think there is a bubble in it. But if we ignore that, if we ignore the absurd interest of investors to get a slice of this already overstuffed pie, I think the fundamentals are: this is actually going to scale down so that I can realistically imagine this to be on my computer.

Even my five-year-old M1 Max thingy, with the 64 gigabytes of RAM that it can utilize largely for the GPU, can actually run some pretty good models. Am I doing it a lot? No, but every once in a while I do it just to see what's possible.

And it's like, I can see it being not entirely unrealistic. I don't know, because the number of times I'm pissed off compared to the times I'm amazed is probably... I guess the number one cause of that for my workflows is that I work on big repositories, a big amount of repos. At Sentry, we have this ops monorepo with everything inside there, and the amount of times any LLM out there gets confused about the context in which it operates, the amount of times it just says nonsensical stuff about Terraform modules... which is a really good example, okay.

So one annoying thing about Terraform is that every single provider, if you use the GCP or the AWS provider, the amount of changes they push out: they push a new version like once a week, right? That's the rate. This is why people should commit to backwards compatibility.

But again, that would be something an AI would help me with, but they cannot, because they get so fucking confused about that amount of changes and that amount of context that they just spit out nonsense. Like: hey, you used a string here, but it's supposed to be an array of strings.

And you check the documentation of the Terraform module: oh, that's not true. So things like that, right? I'll also say that my experience with using Claude for Terraform was so bad that I moved to Pulumi. You see? But from my perspective, as soon as Pulumi gets the same amount of usage as Terraform, we'll end up with the same thing.

Unless they learn and don't change everything all the time. No, because the providers are like the cloud providers' APIs: they can do whatever they want, because that's the reality. But maybe if people start complaining that the agents don't work if they change the stuff all the time, they will change.

I think a question that I have in general is: nobody gave a shit about humans, but with agents, I think people give a shit. Do we give a shit about machines? I don't know. I feel like we care about the machine a lot more than about the human, honestly.

Because the machine is measurable. But yeah, back to my point: at least for my use case, my daily use case, if something doesn't fundamentally change in the way we treat the latent space, you know... because I think 99% of my issues with AI are because of the context.

And so, from my perspective, even if we go to a 10-billion-token context, it won't change anything. I don't think scaling the context is, for my use cases, the answer to the problem. I think the answer to the problem is changing the architecture around how we loop into the latent space.

Right. And so, yeah, I guess if some company at some point starts producing a foundational model that works differently in terms of context and how it treats tokens in inputs and outputs, yes, then probably at that point I will be optimistic in terms of this thing getting better.

So for you it's about learning and remembering, right? That it doesn't start from zero and has a way to... Yes. Especially when, again, I cannot control it, because if I use Anthropic models or OpenAI models, I cannot control the learning in general, because they put out a new version of the model, but it's still called GPT-5-Codex or whatever.

And maybe everything changed for my use case, and I have no idea. This is a real problem. I had this one: the move from 4 to 4.5 changed things dramatically, and I was not able to evaluate it, because I have no idea what changed.

It's just different now. So I share that problem. I share that opinion, I share that problem. I don't know what the solution is. I'm just maybe a little bit too optimistic that the open-weights models are a counterbalance to sort of the OpenAI-Anthropic duopoly, because we do have a bunch of Chinese models and I think they are actually on par.

Yeah. But again, even if you look at the stable diffusion ones, or if you take Qwen image, or if you take Qwen for just text generation... I don't think that they're better. I just think that they at least are at the point...

Because they exist, if Anthropic and OpenAI decide to go even more nuclear on not sharing anything, then I feel like: okay, off to China, I don't really care. And you can see this now: the couple of American companies that actually want to train their own models

are just doing it on top of the Chinese models. The Cursor model clearly is trained on one of those Chinese models; I think Windsurf did the same thing. So yeah, it's a little bit ridiculous that we have to go to the other side of the world and to another political system to get open-weights models, but I will take it, because nobody else has them.

But to me it creates this counterpoint; at least for me it makes it feel like there's a possibility that we are not... If we didn't have the Chinese models, I would actually be much, much more negative on the whole thing, because then it would imply to me that you can only run this as a very large corporation, and there's no chance that this will trickle down.

Right now, the existence of those models implies that it's easy enough... it's hard to say, but it's in the realm of possibility that we will see more of those. And even sort of this absolute failure of our European company, this Mistral thingy, does create a model.

So, it is what it is, but they're capable of producing something. Is it the most amazing? No, not at all. And I think they could do much better if they had a little bit more ambition, but they at least have a model. And so it shows that there is...

This is not just Google runs the world. I think there's going to be some competition. Yeah, that's for sure. Yeah. I don't know, I also plan to give Cursor another try now that they released version two with Composer and that kind of thing, to see.

Because again, I saw some videos of people trying it, and at least their model is very fast. So at least that could, I don't know, change my perception: okay, maybe I still have to do 10 rerolls, but the amount of time I need to wait is different.

So maybe, yes. I don't know. In terms of perspective, again, I see some tiny things here and there that make me wonder about the future. I see some very few little things here and there. But yeah, for me to actually say: okay, I can trust in general what it spits out, or what it gets from my thoughts...

Yeah, I think it would still be hard for me to use. At least, again, within my fields and scopes, even on my open source projects. I think I could try to do some of the annoying stuff for Granian, but it still feels risky.

Try some repro cases. They are fun. They're really fun. Or just see what it does if you just let it loose on an open source project. I did this with a bunch of mini changes. I just wanted to see how adventurous it gets. And it gets bloody adventurous.

This is interesting. I don't know. The vast majority of things I get annoyed about on my projects are Windows-related stuff, because I had this terrible, terrible idea to support Windows, where I could just have said no. But anyways, I tried once. But you know what?

Fun thing: I had this situation very recently. I have a Windows computer, but I don't want to boot it for one stupid bug fix. So one of the things that I did was tell Claude to fix the problem. It knows it's on a Mac, and I told it: look, you can commit to a branch, you can push to GitHub, and then you can run the gh command to poll for the status of the run.

And so it actually iterated while committing to GitHub, and used the GitHub runner as a Windows machine. It was marvelous. It took it like three hours, but it was working. And I think it's kind of fun. Yeah, I mean, I could try that, but yeah.
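For reference, a sketch of what that feedback loop can look like when scripted: push the branch, wait for the workflow run on the GitHub-hosted Windows runner, then read the logs back through the gh CLI. The branch name is a hypothetical placeholder; the gh subcommands (run list, run watch, run view) are real, but exactly how the agent drove them in this story isn't specified.

```python
# A sketch of the CI-as-test-machine loop described above: push a branch,
# let the Windows workflow run on a GitHub runner, then pull its status
# and logs back down. The branch name is a made-up placeholder.
import json
import subprocess
import time

BRANCH = "fix-windows-bug"

def run(*args: str) -> str:
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout

run("git", "push", "origin", BRANCH)
time.sleep(10)  # give GitHub a moment to register the workflow run

# Find the most recent workflow run for the branch.
runs = json.loads(run("gh", "run", "list", "--branch", BRANCH,
                      "--limit", "1", "--json", "databaseId"))
run_id = str(runs[0]["databaseId"])

# Block until the run finishes; --exit-status makes a failed run raise.
try:
    run("gh", "run", "watch", run_id, "--exit-status")
    print("Windows CI passed")
except subprocess.CalledProcessError:
    # On failure, pull the logs down so the agent can iterate on the fix.
    print(run("gh", "run", "view", run_id, "--log"))
```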

It was nice talking. Yes, absolutely. Thanks. Yeah, thank you. It was nice.