Talking AI and Agentic Coding with Yury Selivanov

00:00:07.240 |
So we're going to discuss agentic coding and how useful AI is for us. 00:00:12.080 |
And potentially we're taking opposing sides here, is my guess. 00:00:18.920 |
So maybe, also, to set some sort of level here. 00:00:22.320 |
Right now I'm basically alone with a co-founder, so I don't have a team more 00:00:26.120 |
or less, and so my agentic coding experience is a lot of just the two of us. 00:00:30.480 |
And so you're probably in a different position too. 00:00:34.320 |
Well, first of all, I'm joining this to play devil's advocate. 00:00:42.520 |
I'm using these tools myself, sometimes productively, sometimes less. 00:00:46.320 |
That said, the stuff that I want to discuss, it's real. 00:00:49.480 |
I think those problems are real and it's good. 00:00:53.280 |
I think ultimately it will help if people talk about those issues. 00:00:58.040 |
So I'm not going to be aggressive, but I'm going to be aggressively making points. 00:01:04.000 |
Even maybe sometimes sort of focusing too much on things that ultimately I care less about. 00:01:10.280 |
But that's, that's my role in this conversation. 00:01:13.160 |
I don't want pro-AI people to find where I live and, like, take this to a different kind of discussion. 00:01:20.440 |
I love AI, but yeah, there are some issues with it that we need to discuss. 00:01:29.080 |
I've been a Python core developer since 2013. 00:01:33.120 |
I added a lot of stuff to Python, like the async/await syntax and the infamous asynchronous generators. 00:01:42.520 |
And I'm a co-founder and CEO of Gel Data, where we try to fix some things in Postgres. 00:01:52.720 |
I've been a software engineer for a long, long time. 00:01:57.040 |
And I'm taking full advantage of AI. 00:02:02.160 |
At least I want to think of myself this way. 00:02:05.040 |
Like, I'm super excited about this technology. 00:02:07.440 |
I think this is something new, and that's why it's exciting. 00:02:11.160 |
And sometimes I have some wins with it. 00:02:16.520 |
That said, Armin invited me to be devil's advocate. 00:02:22.360 |
And so, I have been basically unemployed since April this year. 00:02:33.640 |
And in the last couple of months, I fell into a hole of going really deep on this stuff. 00:02:39.640 |
And I started a company just basically two months ago, where I'm now taking 00:02:46.040 |
full advantage of agentic coding, maybe to a degree that's slightly unhealthy. 00:02:53.760 |
Do you want to play devil's advocate for a change? 00:02:56.520 |
So let's maybe start with, I mean, maybe one extra piece of context here, 00:03:00.800 |
which is: because I did a lot of Python over the years, I've also sort of decided that 00:03:04.960 |
this time around, I'm just going to subject myself to the choices of the computer. 00:03:09.760 |
And I had the AI sort of run evals a couple of months ago, and I was trying to figure out 00:03:13.840 |
which language I should use if I want to end up doing a lot of agentic coding. 00:03:21.040 |
I think your chances of finding a co-founder, like, drop the more things you do this way. 00:03:26.640 |
I already have a co-founder, so I'm done, I'm past that, past that point. 00:03:36.000 |
Wasn't there this thing, I think it was Y Combinator, where, like, the goal 00:03:40.320 |
is the single-person, billion-dollar company? 00:03:44.800 |
So that doesn't count if AI is the co-founder. 00:03:47.440 |
Well, you have a few, you have GPT-5, and you have three co-founders. 00:03:52.000 |
Let's start with building prototypes with AI. 00:03:54.640 |
I think that's probably the part where there's the most agreement that there's some value there, 00:03:59.120 |
I'm guessing, because seemingly Lovable exists and they have a bunch of money that they make. 00:04:05.200 |
Maybe we start with: is it possible to take what seemingly we all agree on, which is that 00:04:11.520 |
you can actually prototype really quickly with this thing, and apply that also to prototyping in the 00:04:16.080 |
context of an existing code base, or at the very least using it there. 00:04:21.520 |
And then maybe, to which degree does the same apply, from your perspective, also for just building actual products? 00:04:29.200 |
So this, so this is basically where probably you and I agree. 00:04:32.560 |
I think that the coding tools, all of them, not necessarily Lovable, but Claude Code as well, 00:04:37.040 |
and Cursor, obviously, even, like, vanilla ChatGPT in the chat editor, if you want to copy- 00:04:42.880 |
paste code, they work amazingly for prototyping. 00:04:45.840 |
My problem with that is that when you attempt to prototype something or build something within 00:04:51.520 |
the existing, within the context of an existing code base, suddenly it's much harder. 00:04:58.000 |
Suddenly you might find yourself spending a lot of time unproductively trying to replicate your recent 00:05:05.440 |
success with prototyping something that worked, and basically just face a wall. 00:05:11.360 |
And this is kind of the start, I guess, of the conversation, which is that I think that 00:05:17.360 |
everybody agrees that AI is absolutely amazing for creating prototypes. 00:05:21.680 |
Productizing them sometimes is much, much harder. 00:05:25.040 |
And if you have a team of people, this is where things start slowly sort of falling apart. 00:05:35.920 |
Yeah, I had a lot of success with building prototypes, but again... 00:05:44.240 |
And why does it not work for you in the context of a code base? 00:05:47.520 |
Because I can actually very simply take the opposing view here. 00:05:51.600 |
I swear to you, like more than 90% of my infrastructure code right now is like a really... 00:05:57.680 |
It basically is a piece that gives agents mailboxes for emails, and that's all. 00:06:06.560 |
But my point is mostly, like, this is no longer a prototype. 00:06:12.640 |
And sort of, like, every iterative change that I'm doing to it is like writing to an existing code base. 00:06:19.440 |
I actually, in some way, I'm actually impressed how well it still works, 00:06:23.440 |
despite it being, basically, at this point, more than 40,000 lines of code, which is maybe not 00:06:28.640 |
the largest code base in the world, but it's definitely past the context size. 00:06:31.840 |
I kind of wonder, like, where, why does it not work for you? 00:06:43.920 |
So I'm building a new tool, actually, for AI, to have it better integrated in my workflow. 00:06:50.800 |
And to do that, we're... this is like a big spoiler of something that we want to launch. 00:06:56.080 |
But that thing plugs into your terminal; like, literally, it acts as a transparent proxy. 00:07:02.240 |
And I wanted to understand how to build this thing first. 00:07:07.680 |
One approach is to basically take tmux, for example, and just instrument around it. 00:07:12.800 |
And the other approach would be to tap into the PTY and whatnot, and then actually do the 00:07:18.960 |
proxy where you sort of intercept stdin, stdout and do all of that. 00:07:22.880 |
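A minimal sketch of that second approach in Python, using only the standard library: spawn a shell behind a pseudo-terminal and forward every byte in both directions, observing it along the way. The log file and callback names here are illustrative assumptions, not the tool being discussed.

```python
import os
import pty

LOG = open("session.log", "ab")  # illustrative sink for intercepted bytes

def read_shell_output(fd: int) -> bytes:
    """Called whenever the child shell writes output."""
    data = os.read(fd, 1024)
    LOG.write(b"OUT " + data)
    return data  # forwarded unchanged to the real terminal

def read_user_input(fd: int) -> bytes:
    """Called whenever the user types; a real tool could intercept shortcuts here."""
    data = os.read(fd, 1024)
    LOG.write(b"IN  " + data)
    return data  # forwarded unchanged to the shell

if __name__ == "__main__":
    shell = os.environ.get("SHELL", "/bin/sh")
    # pty.spawn connects our terminal to the child's pseudo-terminal and
    # pumps data through the two callbacks until the shell exits.
    pty.spawn([shell], read_shell_output, read_user_input)
```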
So my first task was to understand to what degree I can push tmux. 00:07:28.480 |
I've used it a couple of times, but this is where, I think, Claude Code shined. 00:07:33.200 |
At that time there was Sonnet 3.5, I think, and Opus 4.1. 00:07:42.240 |
So I had to use the expensive one, but the expensive one was amazing. 00:07:44.720 |
I had one day where I built six prototypes, uh, of this thing. 00:07:48.160 |
The first one was using T-Max, the other was like a slightly different thing. 00:07:52.240 |
And then I realized, let me try to tap into those things at a more fundamental 00:07:57.520 |
sort of IO level, because with tmux I just couldn't see how I could have a decent user experience. 00:08:01.920 |
Again, thanks to AI, the prototype worked quickly. 00:08:04.400 |
Started using Python and obviously hit the performance wall immediately. 00:08:07.520 |
So I just realized that, hey, I'm going to have milliseconds of latency. 00:08:15.120 |
And Claude and I managed to improve the performance to be decent, like, 00:08:19.840 |
in a pure shell; but then you run Vim inside that thing. 00:08:22.560 |
And, like, suddenly you can just see it: you press a key and it takes some time. 00:08:31.120 |
But I'm not an active... I'm not a good Rust engineer. 00:08:38.880 |
My first vibe coding, like, actual proper experiment, I did during this sort of night of 00:08:43.760 |
not sleeping with Mario Zechner and Peter Steinberger, where we basically built this thing called 00:08:50.480 |
VibeTunnel, which was a PTY interceptor that streams the keystrokes to the internet. 00:09:06.240 |
And, uh, there are like a lot of things that I wanted to tap into. 00:09:09.120 |
Like for example, I wanted it to react to certain shortcuts. 00:09:12.160 |
I wanted it to be able to sort of multiplex different terminals on one screen, kind of like 00:09:18.400 |
tmux is doing, but do it on a specific shortcut. 00:09:21.840 |
And amazingly, I built all that, all those things in one day. 00:09:26.080 |
But then I showed it to the Rust engineer at the company, who is actually extremely qualified. 00:09:32.160 |
And, like, first of all, it was, oh my God, how much did you pay for this? 00:09:35.200 |
And I'm like 600 bucks in a day, which is insane, obviously. 00:09:38.880 |
But then, uh, his reaction was like, first he was shocked, but then, hmm, this probably sounds 00:09:44.960 |
about right because it would take us probably a couple of weeks to build all those prototypes. 00:09:51.280 |
But then we started looking at the code, and obviously it's just bad code. 00:09:58.880 |
Like, I'm an expert Python engineer and I'm, like, decent, I guess, at Rust, but this is where the gap 00:10:05.120 |
was for me, basically: the Rust looked fine to me, but not to him. 00:10:11.360 |
If I had to do this myself and I would just be vibe coding this thing further 00:10:17.840 |
and further and further, building a product, I would 100% hit the wall where I would not be able 00:10:24.720 |
to progress much without building myself a deeper understanding of the programming 00:10:31.040 |
language and the problem, of everything, essentially. 00:10:34.880 |
And we're going to chat about that specific trap in a little bit, 00:10:39.520 |
but that's my complaint about building prototypes: at some point in time you have to evolve them into a product. 00:10:47.440 |
And this is where currently there are limits. 00:10:50.480 |
If you are an expert engineer and you know the domain well and you know the 00:10:56.000 |
programming language well, then you can harness the power, I guess, of the tool. 00:11:09.280 |
For me, the interesting thing is like, I think you're right. 00:11:12.080 |
Like you can definitely, um, I call this once, like you can, we can Vibe code a code base to death, 00:11:19.040 |
So my co-founder built a really great prototype for what it's worth. 00:11:22.480 |
For like, for, for, for a particular thing that we're exploring, which is basically like a machine 00:11:29.680 |
And it does interesting things in the context of email, but, but it was built on top of a combination 00:11:34.800 |
of Mailgun and Cloudflare workers and TypeScript and D1. 00:11:39.360 |
And there was very, very little input from me. 00:11:41.760 |
And after three weeks, it had created quite the code base. 00:11:44.960 |
It had, like, multiple tables, duplicates, and all kinds of stuff in it. 00:11:50.880 |
One view of which is, well, that's irresponsible; but it's not really, because the whole point 00:11:55.840 |
is to figure out: do we even want to build this in the first place? 00:11:58.160 |
And so my version of this was then to say, okay, so what is it that we're actually building? 00:12:04.560 |
And I also used a combination of Codex and Claude for it. 00:12:10.800 |
But I think like in a responsible way where, where like there's certain things in, in software 00:12:15.040 |
engineering where it's really, really critical that you get it right. 00:12:19.600 |
Is the problem that you're solving even scaling in the right way? 00:12:22.720 |
And not everything is a scaling problem, but, like, you can definitely build a bunch of things that won't scale. 00:12:28.320 |
And then obviously the code, the code can't look like complete trash. 00:12:31.680 |
What I found, at least, is that there's a huge difference in my experience between the kind 00:12:36.160 |
of code that you get when you get an AI to program Go, as an example, or even Java or PHP, versus, for 00:12:41.920 |
instance, when I get an AI to write JavaScript or, to some degree, TypeScript. 00:12:48.800 |
And I think it has a lot to do with the complexity of the abstractions that are typically in place. 00:12:53.920 |
Because one of the reasons I think the TypeScript code base is just in a terrible state is because 00:12:58.240 |
it just created the same abstractions over and over in different files and never found them again. 00:13:02.560 |
And in Go it didn't even get the idea to write the abstraction in the first place, because Go is just simpler that way. 00:13:07.200 |
So I think that, to some degree, you definitely have to use your brain. 00:13:11.760 |
Like, even in the Go code base, I think if I hadn't refactored the hell out of it, 00:13:16.080 |
it would probably have made a terrible mess too. 00:13:18.720 |
But I think for as long as you commit yourself to wanting to uphold a standard and a quality bar, it works. 00:13:25.600 |
It doesn't come for free, though, but I'm still quicker than if I had to write everything myself. 00:13:32.720 |
That's the main problem. I think we should move to my next talking 00:13:38.960 |
point; I have, like, a little list of things. 00:13:41.440 |
My next talking point is directly related to this. 00:13:44.880 |
This is time and expectation management, which is that you have no idea whether it will work for you on a given day or not. 00:13:53.280 |
Sometimes you want to do something complex, you prompt it, and it just one-shots it for you. 00:14:03.520 |
You maybe do a little bit of cleanup, write the commit message to sort of make it not look like AI wrote it. 00:14:11.040 |
Then, like, go solve your next task or go walk in the park. 00:14:14.000 |
But sometimes something easy can take hours of work. 00:14:19.040 |
Like it doesn't work this time, but it's like almost there. 00:14:21.600 |
So it's like, it feels like 15 more minutes and you're done. 00:14:25.200 |
So you spend the time and suddenly like three hours pass and it's still like where it started 00:14:30.800 |
And then you can end up in a situation where the whole day is wasted and you haven't done it yourself. 00:14:35.200 |
Like maybe you would solve this task in two hours of coding. 00:14:40.640 |
I sometimes look at AI coding tools like this perfect dopamine cycle. 00:14:47.200 |
It gives you a kick when it works and then it doesn't work. 00:14:50.400 |
So it gets you slightly depressed, but then it works again. 00:14:52.720 |
And I'm not sure that ultimately it saves time. 00:14:54.960 |
Like again, like there are clear examples when it does. 00:14:57.840 |
I think AI perfectly replaces Stack Overflow a lot of the time. 00:15:01.920 |
Like seven times out of 10, like it will give you the correct result. 00:15:06.000 |
And when it doesn't, it's kind of easy to see and check. 00:15:15.200 |
Perceptually, I have a feeling that I can save a lot of time using AI. 00:15:18.960 |
But when I'm objectively looking back at that time, I'm no longer sure that that was the case. 00:15:24.800 |
So I think it saves me time and I think objectively so. 00:15:28.960 |
But I think my strong suspicion of how to make it work, and this is greatly 00:15:34.880 |
extrapolated from a data point of me... so, first of all, my opinion has dramatically changed. 00:15:40.880 |
And I've told people this before: prior to this, I was like, all right, this is just a bunch of nonsense. 00:15:45.440 |
But what I think, like, the reason why I feel like I can read the machine, like I have an understanding 00:15:50.320 |
of what the hell it's doing, is because, the weird thing is, there is no learning curve. 00:15:57.120 |
There's a hill to walk up on, a very, very, very long hill. 00:16:02.400 |
And you walk up this hill for, like, two months, and then you feel like there's enlightenment, 00:16:06.000 |
because now you know what the machine sort of does and doesn't do. 00:16:09.040 |
So it's not about you; there's a learning curve and you have to learn the machine. 00:16:14.960 |
It sounds so ridiculous, but the whole point is that you need to understand 00:16:18.720 |
which tasks it will not be able to do, because otherwise you're going to run into 00:16:22.000 |
a situation where you spend three hours on a goddamn thing that would have taken you 15 minutes by hand. 00:16:27.120 |
And because of this dopamine thing, you don't really feel it, because there's progress all the time. 00:16:33.600 |
And, like, if you actually measure the time that you iterate on the damn pull request, it's much longer than you think. 00:16:38.720 |
But if you actually spent the time to sort of onboard yourself 00:16:43.280 |
into that properly, the number of cases where you're going to run into this greatly 00:16:48.000 |
diminishes, because you recognize ahead of time what's going to work or not. 00:16:52.080 |
And actually, I had the same experience at one point with Rust, where I felt the only way in which 00:16:56.960 |
I could be productive at all in Rust was to figure out what I couldn't do in the language. 00:17:00.560 |
And whenever I had a problem, it was like: you have to self-borrow, 00:17:03.120 |
you have to, I don't know, do a bunch of stuff that the language can't do. 00:17:06.400 |
You just have to recognize very early on that you're doing something that will not lead to 00:17:10.400 |
success because otherwise you just, you're grinding out there with no progress. 00:17:14.480 |
And so I think, for as long as you steer away from the shit that doesn't work, you're fine. 00:17:19.520 |
My problem is that, again, you walk up that hill, and then a new model is released. 00:17:24.240 |
And then the hill suddenly is slightly different for you. 00:17:31.120 |
Like, with any tool that we've had before, be that a programming language or whatever else, 00:17:38.000 |
All the knowledge that you have accumulates and you build on that knowledge. 00:17:44.080 |
Sometimes, if you forget to, like, reset context, then the context of Claude overflows 00:17:48.960 |
and it just starts mumbling and doing stupid things, and you have to actively 00:17:54.160 |
pull yourself out and realize, hmm, I should probably just start a new chat. 00:18:03.280 |
Like, it just gives up, and, like, you should do that. 00:18:11.680 |
And it's, it's, it's kind of ridiculous and cool. 00:18:14.480 |
And sometimes even refreshing to find yourself in this, but oftentimes it's also just frustrating. 00:18:21.280 |
And you can end up in this situation pretty easily. 00:18:22.400 |
Again, my main point here is that it's not clearly a hill. 00:18:26.320 |
It's some surface that you see, like, 30 meters in front of you. 00:18:32.160 |
And you have no idea if it's uphill or downhill. 00:18:39.920 |
The irony is that, I think, I only figured this out because I actually had the time, because I didn't 00:18:44.160 |
have pressure. I mean, I had pressure to start a company, right? 00:18:46.560 |
That was an internal pressure, but it wasn't like it has to happen by day X. 00:18:49.760 |
And so I basically had a bunch of time to deal with a bunch of shit and to figure this out. 00:18:55.520 |
And what I realized when I talked to a bunch of people at companies, they were like, well, 00:18:59.040 |
my leadership wants me to use AI tools and it doesn't work for me. 00:19:02.800 |
And I was like, yeah, you would basically have to use, like, 20% time, 00:19:06.000 |
whatever they call it at Google, to figure out how this shit works. 00:19:08.400 |
Otherwise you wouldn't make progress, because it just requires trial and error. 00:19:15.600 |
There's no guarantee that your problem tomorrow is going to be 00:19:19.440 |
within sort of your feeling of what the machine can do or not. 00:19:22.640 |
And the model might change and they might, I don't know, regress something. 00:19:26.480 |
Like, there was a time when Anthropic had a bunch of server errors where, like, 00:19:29.840 |
the quality of the model went down, and you wouldn't know. 00:19:32.320 |
You just got a post more than a month after, 00:19:34.800 |
like, yeah, your shit didn't work quite as well as it used to. 00:19:37.840 |
And individually, you feel gaslit by this thing all the time, 00:19:42.560 |
because, like, I feel like this kind of problem I did successfully before. 00:19:51.040 |
But, um, but man, I think the reason it's called vibe coding to some degree is because 00:19:56.320 |
there are these, these feelings where you get a sense of like, does it work or not? 00:20:00.640 |
And it's, it's, it's mind bending in a way because as a programmer, you're used to determinism 00:20:05.360 |
or you want to, you chase determinism as much as you can. 00:20:08.080 |
And now we have like, yeah, fuck determinism. 00:20:15.360 |
Particularly if you try to use these things for like, also like building an agent or something. 00:20:21.200 |
I'm with you on the, on the, like, there's this, this part about it. 00:20:24.960 |
But I think the question is like, what's the percentage of it? 00:20:26.880 |
For me, the feeling of the percentage of like the shittiness and the weirdness, it's like 20%. 00:20:31.520 |
And it's not 90% because it's, it's only that small thing. 00:20:36.400 |
I still feel like I get a lot of value out of it. 00:20:38.720 |
But I heard a lot of people say, like, well, 00:20:40.880 |
this 90% of my problem is that the shit doesn't work like I want. 00:20:44.240 |
And so the 10% improvement is just not worth it. 00:20:46.640 |
Like, yeah, if that's how you feel about it, I can see it. 00:20:49.360 |
It's just doesn't really, I don't feel like that. 00:20:52.400 |
By the way, I have a feeling that by the end of this conversation, you might stop using AI 00:21:07.680 |
I do have a thing here, which is a time management and expectation management. 00:21:13.280 |
And I think like the main way in which it saves me time is concurrency and parallelism. 00:21:17.680 |
I feel like I'm solving multiple problems simultaneously in one way or another. 00:21:22.080 |
I think a pretty big way in which I do actually save a lot of time is that 00:21:25.360 |
there's a bunch of problems that are going on which are fully solvable in the background. 00:21:32.000 |
And for someone who runs a very lean organization right now, that's huge. 00:21:37.040 |
And it's like, even, even Mitchell Hashimoto mentioned this recently. 00:21:42.320 |
It's like, you can spend time with your kids and you feel like productive still. 00:21:48.560 |
This idea that even when I'm doing something else... maybe it's unhealthy, too. 00:21:51.920 |
Like, there's probably some version of this where you should really turn off your brain sometimes. 00:21:56.960 |
But the fact that I can sort of multitask is, for me, really the thing that gives me the biggest win. 00:22:08.240 |
I've heard multiple times that people who multitask have the perception of 00:22:12.880 |
being productive when in reality they are not. 00:22:15.120 |
So I think this is one of the traps, again, where you can be fully convinced that you're saving time. 00:22:23.760 |
And the problem here is that it's really, really hard to really benchmark it and get 00:22:27.920 |
a definitive answer if it works for you or not. 00:22:33.040 |
Trying one week with AI and another week without AI is just not going to work, right? 00:22:36.800 |
Finding equivalent problems and then sort of benchmarking against that is also almost impossible. 00:22:44.480 |
I guess, I guess what will happen is that those tools will keep improving. 00:22:47.680 |
And at some point of time, the advantage of using them is going to be so clear. 00:22:52.320 |
That's going to be hard for people like me to even make this argument. 00:22:55.200 |
But like, at least right now, I find myself as a senior engineer, software engineer, 00:23:00.400 |
to be not fully convinced that it saves me time. 00:23:02.960 |
I'm still doing it because I'm enjoying part of this process. 00:23:06.080 |
And sometimes I want to be this dopamine junkie and play with the stack. 00:23:10.960 |
And sometimes I have a feeling that I 100% know that AI will work here. 00:23:17.680 |
But definitively, I can't say yet that it's a 100% improvement over me writing code myself. 00:23:27.360 |
Sometimes, yes, like this example with prototype, 100% AI helped. 00:23:31.440 |
Day-to-day in a more complex code base, I'm not so sure. 00:23:36.320 |
Like, I had plenty of examples where it felt like I wasted days of work. 00:23:42.800 |
I got something in return, so it's not completely like a loss. 00:23:46.400 |
But this is why I feel like it's hard, because this is a problem that's also hard for an engineer. 00:23:53.120 |
Because I, for instance, like I don't work at Sentry anymore, but I did play a lot. 00:23:58.320 |
The last couple of months it was like, how well can I do agentic coding on the Sentry code base? 00:24:01.600 |
But since it's sort of fair source out there on GitHub, so I can still fuck around with it. 00:24:05.040 |
And I found it equally frustrating to work on that code base with the agent as myself. 00:24:10.960 |
Because we have created a monster of like a multi-service complex thing. 00:24:21.360 |
If you build a large thing with lots of components at scale, a bunch of stuff is not fun. 00:24:29.920 |
I'm just pointing out that it is actually hard. 00:24:32.160 |
Like production services, yeah, it's complicated. 00:24:34.640 |
And I think, if I were to take a lens on it, I'd say, well, 00:24:37.360 |
the problems that the agent runs into are the problems every engineer runs into. 00:24:41.440 |
And one of the ways to become productive at a large company is to 00:24:45.600 |
figure out which things not to do, because there's pain on the other side of it. 00:24:49.680 |
But in some ways, the reason we have developer experience teams at 00:24:53.280 |
companies is just to reduce the total amount of, I don't know, potential code 00:24:59.280 |
changes that you have to do where you'll run into a wall, and the wall is just terrible developer experience. 00:25:06.400 |
So I wondered to which degree what you run into is also, in part, that it's actually 00:25:11.440 |
not great for a human either to work on this. 00:25:13.600 |
I'm not saying you have a shitty code base, just to be clear. 00:25:15.520 |
I just wondered to which degree this is limited. 00:25:21.360 |
But I want to shift gears a little bit and talk about the implementation of those tools. 00:25:27.120 |
And I think it's a little bit related to this problem. 00:25:34.560 |
And even if the context is huge, or advertised to be huge, like millions of tokens, in reality 00:25:40.800 |
we know that the performance degrades after, like, 32,000 tokens or something like that. 00:25:45.520 |
So this is the problem of context management. 00:25:47.920 |
And this, to me, is the weirdest part of this whole AI revolution, because we have these 00:25:54.400 |
amazing LLMs and it really feels like the future, but under the hood it's powered by principles from decades ago. 00:26:03.680 |
Like, you as an engineer pick what you want to feed that beast. 00:26:07.920 |
And that feeding happens on a lot of assumptions, a lot of hard-coding and luck. 00:26:15.360 |
You can't trust an LLM to form the context itself. 00:26:19.040 |
So you have to feed it with something, with your prompt. 00:26:22.560 |
And then your IDE adds some context around it. 00:26:27.760 |
Maybe some settings in your, I don't know, pyproject.toml or Cargo.toml or something like that. 00:26:34.720 |
Like you just, just, just wait for magic to happen. 00:26:37.360 |
But it's an important bit where it actually feels like we don't have any progress at all: 00:26:42.400 |
this notion of forming the context. 00:26:45.280 |
And when you are just building a prototype, the context is empty. 00:26:48.640 |
All you have is your prompt and maybe a couple of desires, like write it to me and go or whatever. 00:26:54.560 |
Or when the project is small, it might all fit there. 00:26:57.200 |
In a complex code base, context forming is extremely hard. 00:27:01.760 |
Those files that you have open in your IDE might not necessarily be relevant. 00:27:05.680 |
And asking you to just drag and drop tabs meticulously for every prompt when you select 00:27:10.960 |
which files it should focus on also feels like a drag, right? 00:27:17.120 |
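To make the "you pick what to feed the beast" point concrete, here is a rough sketch in Python of the manual context-forming step; the function name, the file list, and the character budget are all illustrative assumptions, not how any particular tool does it.

```python
from pathlib import Path

# Rough heuristic: quality tends to drop well before the advertised window,
# so budget for ~32k tokens (assuming ~4 characters per token).
CONTEXT_BUDGET_CHARS = 32_000 * 4

def build_context(task: str, files: list[str]) -> str:
    """Assemble a prompt from the files the engineer believes are relevant."""
    parts = [f"# Task\n{task}\n"]
    used = len(parts[0])
    for name in files:
        text = Path(name).read_text(errors="replace")
        chunk = f"\n# File: {name}\n{text}"
        if used + len(chunk) > CONTEXT_BUDGET_CHARS:
            break  # everything past the budget simply never reaches the model
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)

# The engineer, not the model, decides what the model gets to see:
# prompt = build_context("Refactor the mailbox router", ["router.py", "models.py"])
```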
I mean, the whole context management thing feels completely out of sync with the rest of the experience. 00:27:23.600 |
So I guess my question to some degree is like, how do you work? 00:27:26.240 |
Because I'm the kind of person that really only makes progress in anything that I'm 00:27:32.480 |
doing when I talk to another person, in some bizarre way. 00:27:34.880 |
And maybe I talk to, like, another version of me, but I have to talk through a problem. 00:27:39.600 |
Like, sometimes there are weird ideas where I think, this shit is going to work. 00:27:43.520 |
And then as I'm sort of working my way through it, 00:27:45.760 |
I actually realize there's a bunch of holes in it. 00:27:47.840 |
And so for me, almost a natural part of solving any problem is to work my way through it. 00:27:53.760 |
And that lends itself really well to agentic coding, because that turns out to feed all the context. 00:28:00.000 |
And now I just need to get the machine to do it too. 00:28:02.240 |
So, like, I just talk to my computer a lot, and as a byproduct, 00:28:09.520 |
the context problem is almost, in quotes, solved, because, I don't know, 00:28:14.480 |
to me, in a way, it feels like I'm almost always working in a way where I just feed the 00:28:20.000 |
context, either for myself or for the machine. 00:28:22.160 |
So it's not such a huge departure, I guess, to be a little bit more descriptive to the machine. 00:28:30.400 |
And so maybe that also is why I encountered a little bit fewer problems, because I don't... 00:28:34.160 |
I think you called it a drag, providing all the files in context. 00:28:37.520 |
I don't know, for me that has always been natural. 00:28:39.360 |
I used to maintain these files where I write down all the steps that I want to take. 00:28:45.840 |
Because I feel like that gives me a better understanding of what my change is, or how to sequence it. 00:28:50.720 |
Like, a lot of refactoring is always like, you've got to do this first, because otherwise the rest doesn't work. 00:28:56.640 |
And so that, to me, is context engineering that you do even without AI. 00:29:04.880 |
And I don't know if I even want a machine that figures the shit out itself because then like, 00:29:09.680 |
I feel like my role is to provide the context. 00:29:13.920 |
So I don't know, does that not resonate with you in a way? I feel like that's the natural way of working. 00:29:22.880 |
Like what you're saying: for example, I have an ultra-wide monitor, 00:29:25.920 |
because when I'm writing software, I want to have multiple columns open. 00:29:30.960 |
I sometimes have the same file open side by side at different locations, because I need to see both at once. 00:29:37.840 |
So just like you, I also have to have context before I'm writing code. 00:29:42.560 |
I have to understand what files it will touch and what files are related in the project and 00:29:48.400 |
Like, like naturally, I think any engineer does that. 00:29:51.760 |
Maybe like one of the failure points for me is that I'm not talking to the computer. 00:29:56.240 |
I just recently started experimenting with that. 00:29:58.400 |
I just, like, bound a double tap on the Fn key to capture audio and transcribe it. 00:30:06.160 |
So I don't yet have definitive answer if it works or not. 00:30:09.360 |
But still, sometimes you might have a file that has 5,000 lines of code in it, which, for whatever reason, is what it is. 00:30:18.080 |
It's also weird, but I can see Claude losing the thread often. 00:30:28.080 |
Sometimes the problem is just that you have to touch 20 files to solve it. 00:30:34.160 |
You have to make the context smaller, like subdivide this problem into smaller problems. 00:30:38.320 |
Sometimes I realize, hmm, it will take me like three times longer to write this prompt, 00:30:43.200 |
to explain it, everything in nitty gritty details, like what it's supposed to be doing. 00:30:47.600 |
But if I do all of that work, maybe just easier for me to just go and do it myself 00:30:52.080 |
at this point when I'm explaining everything. 00:30:54.480 |
So it's, it's like really, really hard, at least for me. 00:30:59.520 |
Did you, did you see a difference between, I know that you also work on CPython, right? 00:31:02.800 |
So did you see a difference from working on CPython versus like on GEL, for instance, 00:31:08.320 |
So I'm not actively working on CPython, but recently I rewrote the UUID implementation 00:31:14.800 |
from Python to C. It definitely helped. Like, I have a feeling that it helped, 00:31:19.200 |
but also it was a clean-slate problem. And also, LLMs are amazing at translating 00:31:27.200 |
a thing from one language to another language. So this is where it helped. 00:31:31.040 |
It still did a lot of things that are non-idiomatic for CPython. So ultimately, I doubt that 00:31:37.520 |
there is a single line of code that survived. I rewrote the whole thing, but it did 00:31:41.920 |
save time, because a lot of boilerplate was reduced for me. I didn't have to copy, 00:31:47.040 |
paste and change things as I usually do. The LLM can do that for me and then I can edit. So that's my 00:31:52.160 |
recent contribution. And again, it's a bad example because it felt like a new thing, I should say. 00:31:58.320 |
I found one kind of interesting failure case. And I talked with Łukasz at EuroPython about this, 00:32:04.880 |
which is that I actually found it to be much harder to work on code which is in the training set 00:32:11.680 |
than on code that's not in the training set. Interesting. 00:32:14.080 |
And I think it's sort of counterintuitive, but it might also make 00:32:20.400 |
sense, because one of the problems is that you basically work on something that's overrepresented 00:32:24.320 |
in the training set. Like the CPython code base: probably there's a lot of it in the training 00:32:28.480 |
set, but the one that you're working on is actually not the one that it has seen. 00:32:33.200 |
It's, like, the nine-months-newer version of it. And I've also encountered this with 00:32:37.520 |
the Sentry code base. Like, there are millions of forks of Sentry out there, 00:32:41.280 |
some of which are really, really old. So it just hallucinates code together 00:32:45.200 |
which has not been in the code base for a long time, because it thinks it's still there. So that I 00:32:51.280 |
found really interesting, because that's an entire class of problems where, like, I don't 00:32:57.200 |
know if it's going to be representative going forward, but, for instance, you get this idea 00:33:00.800 |
that if you have this huge context and all of your shit is in the training data, then it 00:33:05.840 |
should be so much better. But the reality, for me at least, is that unless it's sort of day-to-day 00:33:10.320 |
up-to-date, which it's unlikely to be, it's actually not helpful. So now I actually remember there was one 00:33:15.760 |
problem with my UUID work at the Python core sprint, writing it in C. The problem is that the Python C API 00:33:22.000 |
has an old-style way of declaring a Python module in C, and there is a new way. The new way is 00:33:30.640 |
two-phase initialization. There is a notion of module state where you sort of put your global 00:33:37.680 |
variables and whatnot. It's something that's required if you want to use free-threaded Python 00:33:42.880 |
in the future, or subinterpreters, stuff like that. So it's required. And obviously, if you are working on the 00:33:50.160 |
Python code base, anything new that you do must follow that new thing. And it's quite a different API 00:33:56.400 |
internally, just a different arrangement of the code and type declarations. And there are some 00:34:00.400 |
gotchas there. And this is where it failed miserably. So essentially it was insisting on doing things the 00:34:05.040 |
old way. And yeah, I had to actively fight it. And ultimately, I think I gave up and just focused 00:34:11.200 |
on little things, like: write me the body of this function. That's the part where I feel like it actually helps. 00:34:14.960 |
I kind of want to talk about open source a little bit and the impact that this whole thing has on it. 00:34:18.640 |
Because that's the part where I might actually take the opposing view and say, it's going to be terrible. 00:34:23.120 |
Yeah, yeah, yeah. I think we might align. And again, you might stop using AI after this conversation. 00:34:28.800 |
I have a lot of thoughts on that. I'm not sure. But I actually, I worry about this quite a bit because 00:34:32.880 |
I actually feel like, I actually could take the view here where it's like, unless we're really 00:34:39.280 |
careful, we're going to make a huge mess of it. The whole thing of open source for me has never been 00:34:44.560 |
that we need more open source. I always felt like what you need is like open source libraries are sort 00:34:51.120 |
of common problems that a lot of people are sort of banding together and they're delivering the best 00:34:55.280 |
quality code that can be so that we also overall can build better companies. So that to me was like 00:35:01.120 |
the idea is like, if there's a really hard problem, you get the best people together, you cooperate 00:35:05.280 |
together, you build this thing, and then everybody is hopefully going to leverage this. With the cost of 00:35:10.000 |
code generation going down and seemingly everybody loving the idea that GitHub stars is all that 00:35:14.560 |
matters. And we're going to have millions of modules on NPM and like having thousands of dependencies 00:35:19.280 |
of an application is actually the way to build shit. Actually, I think like because it is such a 00:35:24.880 |
predominant view, this, this can only end terribly with so much more actual real slop going around. 00:35:33.200 |
So I don't know, do I have a counter to that? Because my view on that is actually, unless. 00:35:37.040 |
I have a lot of thoughts on that. First, let's just get that elephant out of the room. 00:35:42.080 |
I think there could be a more fundamental problem with all of this, because it's proven now that if AI 00:35:49.600 |
is trained on content generated by AI, the quality degrades significantly, and open source is just 00:35:56.560 |
this perfect mixer of things. Because now you have pull requests partially written by AI. It's 00:36:01.680 |
really hard to separate which part of it is written by AI and which part of it is written by a human, even for 00:36:06.320 |
humans. That's part of my complaints in a minute. But it's also hard for AI, and for all those training 00:36:13.120 |
pipelines or whatever they have to create those models. So I'm not an AI researcher, but I 00:36:19.920 |
have enough in my context to know that this is a problem. So I'm a little bit worried about this. 00:36:24.560 |
We're going to see a lot of new code written, potentially 10x, maybe 100x new code written. There's 00:36:32.160 |
going to be a sea of new stuff and like who knows which part of it is AI, which part of it is human brilliance. 00:36:37.440 |
So, with that out of the way, problems that I see. Let's get back, for example, to this comfortable 00:36:43.440 |
example of UUID. It's sort of required by the CPython contribution policy that you notify people 00:36:49.920 |
whether you used AI or not. And here I am. I don't know. Like, I really don't know how to answer this 00:36:54.720 |
question. So it's 2,100 lines of C code, which is significant. Most of this code is kind of mundane. 00:37:02.000 |
This is not, I don't know, dictobject.c, where we have complicated pointer arithmetic, 00:37:07.280 |
magic and whatnot. It's mundane C, but it's still C. So it's sharp. You can die there easily, and take 00:37:14.560 |
the interpreter with you easily. So it has to be reviewed. And I wrote that code 00:37:19.840 |
responsibly. I don't have a single line that's just AI-generated where I haven't 00:37:24.560 |
touched it, rewrote it, or reviewed it really carefully. But the instant I say I used AI for this, 00:37:31.200 |
the whole thing is dismissed, because people think: no way that's 2,200 lines of C by hand. He's not an 00:37:36.080 |
insane person. He probably generated half of it. So it dismisses my work now. But also, I myself, in a 00:37:42.480 |
similar situation, even, like, I know Łukasz, for example: if he submitted that and he said, I used AI, 00:37:47.280 |
I would also be dismissive in this case. So how can I even trust people now with this kind of stuff? 00:37:52.960 |
Like, what is the social dynamic here? The social dynamic is really, really hard. And I think this is 00:37:59.680 |
one of the reasons why I feel like I don't even want to take a side on anything when 00:38:05.120 |
it comes to the social dynamic, because this is just going to have to play out somehow. 00:38:08.560 |
But one of the areas where I definitely noticed this is all kind of wonky: 00:38:12.320 |
I definitely have released source code out there which is a hundred percent AI-generated. 00:38:17.600 |
One of which was this Vite plugin. It's a very simple Vite plugin. It just forwards the console 00:38:22.080 |
log to the Vite logger output, so that you can see the browser's console log in the terminal. 00:38:28.160 |
A hundred percent AI generated. And I, and then I was like, okay, so I'm going to publish this now. 00:38:31.840 |
And then I put a license on it, Apache 2. And I also said, like, "if that applies," because, quite frankly, 00:38:37.760 |
courts in the US have already said that's not human-generated output. So there's also the 00:38:43.520 |
question of, if you actually have a significant amount of code being created by AI, 00:38:49.840 |
does it still cross the threshold of what we say is actually genuine human creation worthy 00:38:54.880 |
of copyright. So, even on that level, I'm now starting to look at a lot of source 00:38:59.840 |
code out there. Like, I ran into a company where they have code on GitHub. I think it might be... 00:39:06.720 |
I don't know if it's an open source license or if it just happens to be on GitHub, but 00:39:09.600 |
I found, like, an uppercase IMPLEMENTATION_SUMMARY.md full of emojis 00:39:16.240 |
somewhere in the code base. And I looked at the code and was like, someone is vibing hard 00:39:20.960 |
here. And if you hang around sort of the startup ecosystem right now, 00:39:27.280 |
there's so much of people throwing shit on GitHub which is probably just pure AI output. 00:39:33.440 |
And how are we going to respond to that? It's one thing to see this as a pull request against 00:39:38.240 |
an established open source project, where maybe there's at least something in place; 00:39:41.280 |
but there's going to be a whole range of people creating this amalgamation of 00:39:46.240 |
different kinds of things, which is just regurgitated human output in some way. 00:39:52.880 |
Yeah. It erodes trust. That's the problem. 00:39:56.880 |
Because I might know that, hey, well, I'm not going to be using code written by you anymore. 00:40:01.920 |
But, for example, let's say Łukasz. I know Łukasz well; he's a brilliant engineer. 00:40:06.880 |
So here's a library written by him. And let's assume 00:40:12.240 |
it's something to do with, like, high-performance IO. And I know that Łukasz would look at 00:40:16.640 |
everything that touches the core logic, and that will be solid. But there is a lot 00:40:21.200 |
of code on the outskirts of it. So if there is a disclaimer, "I used AI here," can I trust that 00:40:26.400 |
that code is fine? Maybe he didn't even review it. I don't know. Maybe it just appears to be working. 00:40:31.840 |
So this whole problem of me trusting someone personally, or some organization, 00:40:36.560 |
and just because it's that organization, I trust that the code 00:40:41.680 |
is good... I feel that no longer applies, and it's huge. 00:40:44.640 |
No, I share that concern a lot, actually. Because, that's the thing, 00:40:50.800 |
there's one thing where I know what the machine does for me, right? But it's a data point of one, 00:40:55.920 |
you know, in a rather narrow set of things, right? And so I cannot even argue, like, I trust the AI, 00:41:01.680 |
I don't trust the AI. The only view I can take is: whatever I create together with the AI, that's on me. 00:41:08.320 |
But I don't think that's the view that most people take. And I don't, and there's like no social standard 00:41:12.800 |
for it, nor anything. And it would be irresponsible for me to say like, well, because my experience is 00:41:17.360 |
this, that's sort of like what everybody's experience is going to be. And, and we're also very early in 00:41:22.720 |
this, because in some ways we've been doing this version of open source agentic coding for, like, 00:41:27.280 |
six months, give or take; even less, I think, in some ways. Because, for sure, what I see is that 00:41:32.560 |
after Claude Code and after the new Codex, there's a lot more... like, we're way past this sort of little bit 00:41:38.480 |
of autocomplete where you were at least very actively paying attention to every single thing that 00:41:43.040 |
sort of autocompletes out. Now we're like, well, let it run for 15 minutes and then let's do some 00:41:48.160 |
code review here. So I'm with you on the trust part. And I also think that one of the 00:41:52.640 |
problems with this now... look, there are also positive things, right? I think 00:41:56.160 |
creating repro cases for bug reports is a great time saver in the context of open source. Because 00:42:01.680 |
you used to get these bug reports of, like, well, shit doesn't work. And then the only thing 00:42:06.000 |
I could do is, like, works on my machine, give me some more details. And now I get to, like, 00:42:10.160 |
copy-paste this, try to make a repro case, and it usually gets one. So those are positive things, 00:42:15.440 |
but I'm definitely suspicious of pull requests now. And it has made a lot of things much cheaper 00:42:20.160 |
where it was actually pretty good that they were not cheap before. Because some of the issue 00:42:24.240 |
reports are clearly not even human-generated; they're also AI-generated. And security reports 00:42:29.040 |
against libraries, they're, like, partially just Claude inventing some security issues, so they can 00:42:34.160 |
annoy some people into getting a CVE. Like, all of it's just insane now. It's objectively insane. 00:42:40.560 |
It's insane that a well-written issue is a red flag. Or code with comments: no, I'm not 00:42:48.240 |
going to touch it. I used to... if I got a long issue against one of my projects, 00:42:54.960 |
I was like, oh, I'm going to spend some time reading this. Like, this is great. And now I 00:42:58.240 |
get a long issue and it's like, oh my God, someone went hard on Claude here. But this is, 00:43:04.160 |
this is the trust-eroding part I really, really hate. Because that's not so much about 00:43:09.280 |
what the machine does with you. It's about what we as a group of engineers do to each other. Like, I find 00:43:14.400 |
it irresponsible when people sort of shit some AI stuff into an issue tracker and don't declare it. 00:43:19.920 |
Or into my mailbox for what it's worth. I get so much email now that looks like slop. And I, I don't 00:43:26.400 |
even want to engage with that person. Well, the idea is that you don't. You ask AI to do that and just 00:43:31.840 |
never check your email again. No, but that's really trust eroding and I hate it. I really, really hate it. 00:43:41.520 |
We're going to talk about it in a second. That's the thing: 00:43:46.640 |
trust erodes, not just between me and somebody who I know or know about. 00:43:54.880 |
Trust erodes within the team. That's the thing. Like, it's hard to understand. It's hard for people to 00:44:00.480 |
sort of know what works and what doesn't work. And sometimes you can see a pull request, and I know that 00:44:06.320 |
the code in that pull request is written by a human, and I know those humans, so that is fine. 00:44:10.640 |
I can review it. But then, tests. And I see that the tests are generated by AI, and it's such a common 00:44:16.800 |
sentiment online: hey, just make it write tests. But you know that having a big test 00:44:22.400 |
suite might not necessarily be a good thing. Like, if you have duplicate tests, or if you have 00:44:28.080 |
too many mock tests, it's worse than not having tests at all. 00:44:32.400 |
Sometimes tests are extremely expensive to maintain, to evolve, to do anything. And if you 00:44:37.600 |
heavily rely on mocking, for example, you might not have a test suite at all. You might have an illusion 00:44:42.400 |
of having a test suite. And AI is really good at creating this illusion of being able to write tests. 00:44:48.080 |
And because tests are always treated like a second-class citizen, a lot of engineers say, 00:44:52.240 |
I'm not going to be investing much time into reviewing that part, and let's just hope that it 00:44:56.000 |
gets it right. I can see myself just really being scared of this situation: we're 00:45:01.200 |
just not going to write high-quality software, because high-quality software demands having nice tests. 00:45:05.920 |
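A minimal sketch in Python of that illusion, with hypothetical module and function names: the mock replaces the payment-gateway call, which is exactly the part most likely to break, so the first test passes regardless of what the real code does.

```python
from unittest import mock

import billing  # hypothetical module under test


def test_charge_customer_with_everything_mocked():
    # The gateway call is patched away, so this "test" only verifies that
    # the mock returns what we told it to return.
    with mock.patch("billing.gateway_charge", return_value="ok"):
        assert billing.charge_customer(user_id=42, cents=1999) == "ok"


def test_charge_customer_against_a_real_boundary():
    # The honest alternative is heavier: it needs real test infrastructure,
    # e.g. a local fake gateway the test can start and inspect.
    gateway = billing.FakeGateway()          # hypothetical test fixture
    result = billing.charge_customer(user_id=42, cents=1999, gateway=gateway)
    assert result == "ok"
    assert gateway.received == [(42, 1999)]  # the charge actually went through
```

The second style only exists if someone builds and maintains that fake gateway, which is the infrastructure work an AI-generated suite of mock tests quietly skips.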
So I think this is sort of a meta point for me in general. I've written about this on Twitter at some point. 00:45:11.600 |
The quality of AI-generated tests is so bad. That's the part that's real slop. 00:45:17.920 |
But I also think that this is to a large degree because 00:45:22.000 |
we as engineers also suck at writing tests, like really, really bad, because I've seen many 00:45:26.880 |
more bad tests than good ones. Mocks, for instance: I hate them. There are so many situations where it mocks out the 00:45:32.160 |
part that's actually the one that's most likely to fail, and then it fails, right? Or so many 00:45:38.160 |
tests where people just write integration tests that rely on bizarre side effects in the 00:45:43.840 |
system somewhere. Like, let's put some sleep here, let's do some... all of this, right? And you're 00:45:49.120 |
going to get a lot of these tests generated by AI now, but they were already pretty bad test patterns 00:45:54.640 |
before. And yeah, we treat them as second-class citizens. I think, as engineers, 00:45:58.640 |
we don't know at scale, we don't know how to write good tests and we're even rewarding people for writing 00:46:03.760 |
bad tests. That's the problem with tests: the more you progress in your software 00:46:10.720 |
engineering career, the more you understand that tests are hard and, yes, mock tests are horrible. That 00:46:15.680 |
knowledge internalizes over time. Usually companies go down because of the lack of that 00:46:20.480 |
knowledge. But tests are also extremely demanding, as you said, and to not have mocks means building 00:46:25.920 |
infrastructure for tests. And sometimes you might end up building more infrastructure 00:46:31.200 |
for your tests than infrastructure for the actual production code. And then, 00:46:36.480 |
once you build that infrastructure, you need tests for that infrastructure. It's a 00:46:40.240 |
fractal of a problem to have a good, reliable test suite. Running it, parallelizing it, making sure that 00:46:46.000 |
it doesn't run hours on GitHub CI and slows everything down. This is really hard. And yes, AIs are not trained 00:46:53.520 |
on that. There are not too many good examples of good test suites. And even if there are, usually those 00:46:58.960 |
test suites are so highly specialized to that specific project, they're an integral part of that 00:47:04.160 |
code base. You can't just separate them out. So it takes deep knowledge 00:47:10.880 |
to get any good insights that you can just apply to another project after that. It's really hard. 00:47:15.680 |
And I have a feeling that it will take a couple of years of like LLM progress before, like they can sort of 00:47:20.400 |
extract that information and reapply it to your domain. 00:47:23.600 |
I think the only way that is going to get better is 00:47:28.080 |
if we maybe also solve the other problem, which is that maybe the quality of AI goes down 00:47:34.560 |
because they're trained on LLM output, right? So at one point, I think someone has to find a way 00:47:40.320 |
to judge the quality of a code base and be more selective on the learning part. And I think 00:47:45.520 |
for tests it will be necessary to some degree, because there's just so many bad tests out there. And 00:47:49.360 |
there are, like, entire test frameworks which encourage shitty tests. And there has been a 00:47:53.520 |
generation of programmers that really believe that those shitty tests are exactly the gold standard 00:47:57.040 |
of a test that you should write. And, and I think like that's, that's actually a problem for open 00:48:00.800 |
source to a large degree. Whole books have been written about writing tests in a horrible 00:48:05.200 |
way, by prominent publishers. I mean, this is the weird thing: that's actually why I feel 00:48:09.680 |
like my job is going to be secure for generations to come, because eventually you realize 00:48:16.000 |
that you get really good by being very countercultural, because the culture is sort 00:48:21.200 |
of going to the median of software engineering, which is where not-good quality is 00:48:26.240 |
being created. Yeah. But yeah. So I have a feeling that the problem here is even more fundamental, 00:48:33.600 |
and it probably has to do with the current technology of LLMs. Again, I'm not an AI researcher, 00:48:37.520 |
but I'm hearing here and there from prominent AI researchers that LLMs are, like, either at a dead end, 00:48:42.960 |
or we need the next big revolution in LLMs to happen. And to me, personally, it all boils back to this active 00:48:49.280 |
context management, because LLMs don't have memory. All they have is context. Every new task is 00:48:54.480 |
completely new for them. A person can learn. A person can internalize the information 00:48:59.760 |
about this project, about its mission, and some meta 00:49:05.040 |
understanding of the domain that you are trying to solve. But an LLM doesn't have that. And writing 00:49:11.120 |
good tests requires all of that. And CLAUDE.md will not help you capture all that knowledge. 00:49:16.400 |
So until we see the next generation of these AIs, either LLMs or, God knows, maybe it's going to be 00:49:23.200 |
something else, that has this part of context management inside that loop, inside the model, 00:49:29.360 |
somehow, I don't know how, it will continue to be this problem where there are just some areas which 00:49:34.240 |
appear to be easy or appear to be unimportant, like good tests for your production application, that are not going to be 00:49:39.440 |
solved without intense human input. And also a deep culture adjustment needs to happen. 00:49:44.720 |
I think my only counterargument here is that we have as an industry created so much complex, 00:49:51.680 |
over-engineered shit and really bad tests that now the question is just, is 00:49:56.720 |
it going to get worse in a way? Because the correct solution here is actually, in a team, to push back 00:50:01.200 |
on slop, either human-generated or machine-generated. And at least I have found over the years, both in a 00:50:07.520 |
company and on other projects, in open source or elsewhere, that it is actually very, very hard to take 00:50:13.120 |
a principled stance on things that most people think are actually a good thing. That's actually 00:50:18.160 |
really hard. And I think in a lot of projects, you end up with the kind of code that you know, after a while, 00:50:24.160 |
you should never accept, but other contributors on the project will accept it, or it's sort of the 00:50:29.600 |
industry standard and they need this stuff. I don't know to which degree that is necessarily an AI problem. 00:50:33.760 |
That is, I guess, a little bit my point here. If we concentrate on tests as an example, are you going 00:50:40.480 |
to get worse tests now with AI than you got before? I think you get overall more tests, but the percentage of 00:50:47.680 |
the shitty ones probably stays the same. All things equal, you're just going to get more tests, so you're going 00:50:53.280 |
to get more shitty ones, but the percentage might be the same. Because I actually think that the AI 00:50:58.160 |
pretty much recreates the standard crap that we have created. The reason it's so hard to 00:51:04.800 |
work on large code bases, at least what has come out of Silicon Valley in the last 15 years, is that they are 00:51:12.800 |
super complex systems, overly complicated. Everything needs to be outsourced to some 00:51:20.000 |
third party, like an infrastructure startup. It's just insane. I don't know. I feel like 00:51:26.960 |
that is at the root of that evil to some degree. And I actually found AI to be at least some 00:51:32.640 |
sort of pinnacle of hope here, where rather than me having to go and use this infrastructure component 00:51:39.360 |
that some random company might give me, because we need to do it this way because otherwise 00:51:43.040 |
it doesn't work, now it's like, okay, you know what? I get an 80% solution to this. It's 500 lines of 00:51:47.920 |
code somewhere in my utility module. It's tweaked for exactly the size of my problem. I'm going 00:51:53.040 |
to use that. And as a result, I have less crazy stuff going on. 00:51:58.720 |
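As a purely hypothetical illustration of the kind of 80% solution Armin is describing, not code from his actual project: instead of adopting an external job-queue service, a tiny in-process worker sized to exactly one service's needs might look roughly like this.

```python
# Hypothetical sketch of a "500-line utility module" style replacement for an
# external queueing product: a minimal in-process background job queue.
import queue
import threading
import traceback


class TinyJobQueue:
    """Run submitted callables on a single background worker thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn, *args, **kwargs):
        # Queue a callable to run later on the worker thread.
        self._jobs.put((fn, args, kwargs))

    def _run(self):
        while True:
            fn, args, kwargs = self._jobs.get()
            try:
                fn(*args, **kwargs)
            except Exception:
                # Good enough for one process: no retries, no persistence.
                traceback.print_exc()
            finally:
                self._jobs.task_done()

    def join(self):
        # Block until everything submitted so far has finished.
        self._jobs.join()


if __name__ == "__main__":
    q = TinyJobQueue()
    q.submit(print, "sending welcome email...")
    q.join()
```

The point of a sketch like this is not that it is production grade; it is that it is sized to exactly the problem at hand and carries no extra infrastructure dependency.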
So I feel like there's also... It's going to be so funny if it's bad mocked unit tests that break the AI's back. 00:52:04.320 |
And people just say, no, this is the end. Let's not use AI after all. It might well 00:52:10.160 |
just happen. Yeah. Because bad tests will slow down things, not only for people, but for AI as well. 00:52:15.040 |
Testing incorrect things, codifying incorrect behavior in a test, 00:52:20.400 |
which creates a bad feedback loop in development for everyone, 00:52:25.040 |
especially for AI. Yeah. All of those things are unsolved. I'm with you that part of this 00:52:29.120 |
problem is coming from humans. Established projects where you already have a good functional test suite 00:52:34.240 |
with a good harness to run it will likely benefit. But even those things will degrade 00:52:42.000 |
if people just blindly allow the LLM to do it or outsource the task to the LLM. 00:52:48.480 |
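To make the contrast concrete, here is a hypothetical sketch, not code from either speaker's projects: a mock-heavy test that codifies whatever the implementation currently does, next to a minimal functional test that checks the intended behavior. The apply_discount function, its double-discount bug, and the fake gateway are all invented for illustration.

```python
# Hypothetical sketch: a test that codifies a bug via a mock, versus a
# functional test that asserts the intended behavior.
from unittest import mock


def apply_discount(cart, gateway):
    # Invented production code with a deliberate bug: the ten-dollar promo
    # is subtracted twice before charging the payment gateway.
    total = sum(item["price"] for item in cart)
    total -= 10
    total -= 10  # the bug
    gateway.charge(total)
    return total


def test_mock_heavy_codifies_the_bug():
    # Asserts the exact (buggy) amount passed to the mock, so the bug is now
    # "expected behavior" and any fix will break this test.
    gateway = mock.Mock()
    apply_discount([{"price": 100}], gateway)
    gateway.charge.assert_called_once_with(80)


def test_functional_checks_the_intent():
    # A tiny fake plus an assertion about the intended outcome: a single
    # ten-dollar discount. This test fails against the buggy code, which is
    # exactly what you want it to do.
    class FakeGateway:
        def __init__(self):
            self.charged = None

        def charge(self, amount):
            self.charged = amount

    gateway = FakeGateway()
    apply_discount([{"price": 100}], gateway)
    assert gateway.charged == 90
```

Fixing the bug would break the first test and satisfy the second, which is the feedback loop being described here: the mocked test rewards whatever the code already does, for humans and for an AI reading the test suite alike.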
So I think some education must happen, and it might happen just naturally, because people will start observing these problems more 00:52:55.600 |
and more and finally understand the value of a properly written, minimal functional 00:53:02.720 |
test suite. But again, it's definitely going to be a learning curve, not just for AIs and LLMs 00:53:08.880 |
and everybody, but for software engineers themselves, 100%. And ultimately, this is good. 00:53:13.600 |
Like we can actually say that the net positive effect out of all of this might be 00:53:19.440 |
that... Look, there's optimism. Yeah. There is some optimism that people will actually understand the 00:53:24.560 |
value of this, because there is also a lot of misalignment on tests in general. Should we have them 00:53:29.760 |
or not and whatnot. Well, if you want to use AI, then you have to. There is just no way around it. We should come to 00:53:35.840 |
the question of how hyped it is. But before that, let's quickly chat about the other 00:53:40.240 |
thing, which is laziness. Laziness and brain numbness, or the way I wanted to say it at 00:53:46.400 |
first, brain smoothness, is my other complaint with AI. So as I said before, I can write Rust, 00:53:52.560 |
I can understand it, but I'm not good at Rust. Not to the degree that I'm good at C or Python or some 00:53:58.560 |
other programming languages like TypeScript or whatever. Rust to me is still new. And what I 00:54:02.480 |
observed is that there are a couple of code bases where I contribute and where it's socially 00:54:07.840 |
acceptable to contribute some code that, say, AI generated; people can review it and I'm open 00:54:12.240 |
about it. That's fine. But I'm not learning. That's the thing. I'm not exercising my brain. 00:54:17.440 |
It generates Rust code, I'm making some fixes here and there, but it doesn't compare to me writing that code 00:54:24.560 |
at all. So it feels like my progress is stalled. So I have to make a conscious decision to 00:54:30.560 |
not use AI in order to get better. But there is significant pushback against that, because the kind of 00:54:36.400 |
Rust code that I need to write is actually extremely simple: an RPC method here and an RPC method 00:54:42.480 |
there. AI excels at that kind of stuff. I'm just harnessing the powerful code already written by humans 00:54:47.840 |
and exposing it. That's easy, but I'm not the one doing it. So it's like I'm on a Smith machine doing bench press and 00:54:54.160 |
my personal trainer just lifts it for me. I'm doing the movement, but I'm not getting stronger 00:55:00.160 |
from it at all. And it's such a deep trap. So you say that you're better at Go now. And I'm 00:55:06.080 |
curious, like, are you? Because maybe you have a perception of being a good Go engineer now, 00:55:10.720 |
but in reality, you're not. I don't know. So I'm definitely a better Go engineer than I was 00:55:14.560 |
six months ago, for sure. And that is very objective, because I basically couldn't write any Go code. 00:55:21.040 |
Like, I could, but not really. There's a lot of stuff in the standard library I would have had to go to the 00:55:28.240 |
Go docs for all the time. And now I'm like, okay, I know how this shit works. Like, I definitely learned 00:55:32.160 |
a lot. I think it is very easy to fall into the trap where you don't learn. And I learned this the 00:55:38.560 |
hard way, for sure. And not even just in the context of programming. I feel like the thing 00:55:46.480 |
that I learned the most over the last two years of working with AI is understanding 00:55:51.040 |
that if you turn off your brain, really bad things can happen. And so it is in the same sense as 00:55:56.960 |
your gym example: it doesn't help you just to know that you have to lift. 00:56:01.840 |
You have to make it a habit of going there, doing it regularly, increasing 00:56:06.880 |
weights. That's not something that comes naturally. So neither does working responsibly 00:56:11.280 |
with AI. You have to go into this reinforcement part. But then once you 00:56:18.880 |
understand the dangers and how to work with it, you can make it more of a habit. 00:56:24.560 |
So I think that's what makes that work. I said this before, but I really liked the 00:56:28.640 |
Anthropic marketing campaign, on the other hand. It's like, there was never a better time to have 00:56:34.080 |
problems. Because that's how I feel: I have a problem now, I work on it and I use it 00:56:39.840 |
like a better search engine, I talk myself through it. I learned a bunch of things where I feel 00:56:45.360 |
like I wanted to learn that before, but now it can dumb it down and transpose the 00:56:51.120 |
problem into a space where I actually feel engaged. So if you want to learn, it's great to learn with 00:56:55.680 |
it, but you have to make a conscious effort of wanting to do that. And if, I don't know, 00:56:59.680 |
if it's just vibes, if you just don't even try, you feel like you're making progress, 00:57:04.400 |
but in reality you will feel bad about it, or you should feel bad about it. That's what I think. 00:57:10.560 |
That's the thing. Logically I hear you, you're making a good argument, 00:57:14.640 |
but then I'm looking back at myself and I have two modes of operation. Well, it's more of a spectrum, 00:57:19.920 |
but at one end of the spectrum I'm a beast. I'm writing code, 00:57:25.200 |
I'm debugging it. I'm in the loop. I'm wired to the keyboard, I know the problem, and I'm laser focused 00:57:30.560 |
on it. I will solve this problem. We'll fix this bug eventually. In the other mode, I'm just 00:57:34.880 |
like jelly in my chair, lazily typing prompts. We are going in circles with Claude, 00:57:42.320 |
we both understand that this is unproductive. Claude hates me. I hate Claude. And we're stuck in this thing. 00:57:47.840 |
And I know that the wall it can't cross, I could cross easily. I just don't want to, 00:57:53.840 |
because I'm in this weird sort of regime now where I'm just not actively engaged 00:57:59.360 |
with it. I'm just wasting time. And that's my problem: when you use AI, 00:58:04.560 |
you are not as focused or alert. The TikTok of programming. Yes, but it is. 00:58:11.360 |
This is the laziness. And when you are in that mode, it's really hard to learn, 00:58:16.720 |
because you learn when you are alert and when you are focused. And when you are not focused, and 00:58:21.280 |
this thing is doing it for you, and you're tired, and suddenly it works, you're just like, okay, the PR is ready. 00:58:28.160 |
You don't want to learn anymore. To my point from earlier, actually, from a societal point of view, 00:58:33.280 |
I have this deep-rooted concern that this will turn into yet another social network problem, where 00:58:38.320 |
people were like, this is going to be great for humanity, we're all going to talk more to each 00:58:42.640 |
other. And now all we have learned is that we're smart zombies with a lot of psychological 00:58:48.000 |
problems and everything. Right. So that is my concern on this, but individually, I feel 00:58:52.320 |
like I have solved this for now, for me. But I hear you there. There's definitely a 00:58:57.440 |
version of this where, man, it just really feels like if you don't catch yourself in the 00:59:02.800 |
moment where you just give in to the machine and you turn off your brain, it stops being great. 00:59:09.280 |
Yeah, I mean, for sure. I think part of the problem here, and 00:59:16.160 |
this is what I'm trying to solve now, hopefully I will finish that project with my team soon, is that 00:59:22.080 |
the AI is a little bit disjointed from your actual day-to-day 00:59:27.600 |
workflow as a software engineer. It's like this chat box where you type things, but it doesn't affect 00:59:32.960 |
how you use tools yourself. So I want to fix that problem. Hence the whole terminal 00:59:38.480 |
magic. I want to be part of that, but I think something like that is required: AI 00:59:43.200 |
actually augmenting tools instead of replacing them. So I'm not saying that I will solve this problem by 00:59:48.480 |
any means, but I think that's the turn the direction must take eventually, that AI should augment 00:59:55.760 |
your workflow and not replace it. I feel that one of the biggest problems right now with AI is that it 01:00:00.560 |
attempts, and the marketing and the social pressure attempt, to replace you 01:00:05.840 |
instead of making you ten times more productive in what you already know how to do, just making you more 01:00:12.160 |
powerful. So that's, I guess, my biggest complaint about the current 01:00:16.880 |
landscape. And we might actually have enough AI technology to make what I'm talking about happen. It's 01:00:22.720 |
just that as an industry, we're not focused on that yet. We're just drinking this Kool-Aid of 01:00:27.280 |
generating new stuff in the chat window. And I think one of the problems that I have is that because we're so 01:00:31.680 |
focused on this idea now that this is the revolution and everything, any attempt at bringing 01:00:37.120 |
some sort of nuance to anything is immediately either ridiculed or there's 01:00:43.280 |
real pushback against it. One thing, for instance, and it's maybe not so much for 01:00:48.080 |
programmers, but I notice it a lot on, well, I'm still on this social network called X: there's so 01:00:53.760 |
much pushback against Anthropic, for instance, because they keep talking about how it's going to replace 01:00:58.160 |
jobs and stuff like this. And they sort of come out of this, what is called the 01:01:02.320 |
EA movement. But I actually think it's good that people are talking about this a little 01:01:07.440 |
bit in the AI companies. It's like, yeah, this might have some impact on society, and maybe you 01:01:13.120 |
should think about this. Because I think that unless we actually start thinking about some of the 01:01:18.560 |
consequences of this, maybe being responsible about how we use it, we're not going to be responsible in 01:01:23.760 |
any way, shape, or form. Look at us. We so smoothly transitioned to the comfortable topic of overhype 01:01:29.760 |
in AI. Overhype is what I'm concerned with. And like what you said, 01:01:38.320 |
essentially the whole thing, part of it, is that I have a deep concern that AI is just 01:01:43.360 |
being overhyped. The whole thing of replacing humans, maybe not software engineers, 01:01:48.560 |
but, I don't know, support people and whatnot. So far there are not too many examples where 01:01:53.040 |
that is actually successful, but it's being touted as the next big thing by everyone, by CEOs of those 01:01:58.960 |
shops and labs, for sure. It's always three months until we have full AGI. And 01:02:05.520 |
it's not even funny anymore. I'm just exhausted. I'm exhausted about reading that. 01:02:11.360 |
I'm exhausted about reading reactions to it, or people talking about it, at this point in time. 01:02:16.400 |
And I think I personally get an enormous amount of value from AI, from ChatGPT, from Claude, 01:02:24.400 |
from everything. I'm worried that this overhype will actually create a bubble, 01:02:30.720 |
inflate it artificially, and to a degree it probably already is. And then this beautiful thing might 01:02:36.000 |
collapse, and I won't have had an opportunity to actually even 01:02:40.320 |
enjoy it, because it got overhyped. So I'm a little bit worried about that. 01:02:44.480 |
I think people have unreasonable expectations about the effectiveness of AI, and they 01:02:50.960 |
make plans around it and they talk about it without actually building proper expertise. And it feels like 01:02:57.840 |
a lot of wrong decisions and opinions are made about AI. At least this is what it feels like to me. 01:03:04.320 |
So I just wish that we would be like, whoa, whoa, whoa, let's calm down. Let's 01:03:08.960 |
not try to replace everyone with AI tomorrow. Let's build better tools, better 01:03:14.320 |
tests, not the tests that we were talking about, but tests for AI. Understand how you can actually 01:03:19.520 |
replace anyone with AI, because you need tests for that. You need to make sure that whatever agent you 01:03:25.280 |
create actually works. That's an open area of research right now. How do they do that? 01:03:29.520 |
I think the problem with the hype in particular... I still think it's going to change the 01:03:34.240 |
world, 100%. And I don't necessarily 01:03:38.800 |
know the exact shape of it, but I think there will be more programmers. That's generally my view, 01:03:42.400 |
because I think more people will program, but I think the definition of a programmer 01:03:46.880 |
might change and stuff like that. But what is really, really frustrating is this ridiculous discourse 01:03:54.000 |
around AI, where anything in any shape or form is engagement bait, and these ridiculous bets which are going 01:04:01.200 |
on about when we're going to have superintelligence and AGI and whatever. Someone 01:04:07.120 |
has to get burned on that somewhere. And I'm pretty sure it's going to be the wrong people, because 01:04:11.760 |
statistically it most likely is. Exactly. Exactly. 01:04:14.080 |
So there's definitely some bubble going on here, for sure. I don't think that if it pops, 01:04:22.640 |
it's going to be, oh, we didn't want to do AI, so we're going to do something else instead. 01:04:26.160 |
I don't think we're going to end up like Dune, where we said, this is too powerful, 01:04:29.120 |
let's just say no to AI. I just don't see that. So my view is that even with the existing 01:04:35.760 |
models, even with the existing technology, we're going to end up somewhere that's actually 01:04:39.120 |
going to be pretty cool. But some of the bets going on are just pretty insane. 01:04:44.960 |
Are we going to need all that energy? Maybe. Maybe it's also just a little bit insane what 01:04:49.440 |
we're doing. I don't know. At one point, I think, we have to look at this and say 01:04:54.800 |
that some of the people who are currently paying someone's bill are doing this because they actually 01:04:59.200 |
think there's some value in it, but they might themselves at one point feel like that was not 01:05:05.360 |
sustainable. And then they stop paying for that, and then the service that they paid money for 01:05:08.800 |
stops paying for the services it depends on. All that money that goes to 01:05:13.440 |
different kinds of companies up and down the AI stack right now, I don't think it's going to 01:05:17.200 |
be sustainable long term. Someone made a bad bet somewhere. I don't know who it is, and maybe I'm 01:05:21.760 |
wrong and maybe this is the forever bubble, but some of it looks so unsustainable. 01:05:27.520 |
Yeah, I agree. I don't have much to add on top of that. It's just my deep concern. I don't have any answers 01:05:32.720 |
to it. I'm concerned about this technology being impacted in a negative way 01:05:40.240 |
by the consequences of it being overhyped. I also have another, just fundamental, problem 01:05:44.960 |
with the whole thing. How can private companies at this point be so fucking valuable? If you 01:05:50.160 |
wanted to say capitalism is all great because people can participate in the public markets, 01:05:54.400 |
there's nothing here that will grow for me, at least in a reasonable way, because 01:05:59.840 |
we have created these behemoths of private capital that hold all of it. That's a problem on 01:06:05.040 |
its own, but it's just insane the way that they grow like that. What was Google worth when it 01:06:13.600 |
went public? A fraction of what all of those companies are worth, even adjusted for inflation and the 01:06:17.920 |
development of the S&P 500. To me, this insanity of creating these supranational, large private companies 01:06:26.720 |
is just... It's the pump phase for sure. Maybe it will collapse. I really hope there's not going to be a dump 01:06:32.640 |
phase anytime soon. That's going to be massive. Anyways, they will learn something here. 01:06:37.600 |
I don't know what they will learn. It seems like the conversation has converted you into an AI pessimist. 01:06:45.920 |
No, I am super optimistic. I love it. It's great. Am I optimistic about what society will do with 01:06:52.240 |
it? I don't know. I don't want to deal with that right now. Individually, I feel supercharged. 01:06:57.120 |
Yeah. Well, me too. Ultimately I'm using AI. I think that a lot of our own practices might actually 01:07:05.120 |
improve because of AI, which will ultimately allow us to build faster and better. I also think that there are 01:07:10.800 |
going to be a lot more software engineers in the end, because somebody will have to fix the AI mess and 01:07:15.600 |
actually make something good out of it. So I'm not worried about engineers being replaced by AI at all, 01:07:22.800 |
despite what a lot of people on the street think. So yeah, I think the future is bright. It just feels 01:07:29.680 |
like we are really in a lot of uncharted territory right now and we don't have good answers to a lot of 01:07:35.680 |
actually hard and imminent questions. So I'm not worried about AGI happening anytime soon, because 01:07:41.520 |
what I see right now is that it's barely able to consume Claude.md reliably, let alone launch 01:07:49.120 |
nukes at us. So I am not worried about any of that AGI-ness, but I am worried about the impact on the 01:07:56.000 |
field, and ultimately about AI slowing down some parts of software engineering as opposed to accelerating it. 01:08:03.040 |
Unless we get better at using it and instrumenting it, and just better at writing software, actually. 01:08:09.680 |
So it's an interesting intersection of problems for sure. 01:08:18.560 |
Let's see the feedback on this. I'm kind of curious. 01:08:21.440 |
Well, I don't think that we said anything controversial.