Agentic Coding with Claude Code

00:00:18.640 |
Okay, I have no idea if this works. This is an experiment. 00:00:29.280 |
If you can hear me, write something into the stream chat. 00:00:42.400 |
Actually, let me open the stream so there's some sort of live update on 00:00:53.840 |
what I'm doing, some sort of feedback. 00:00:58.560 |
So yeah, I have created a small test project here just to 00:01:08.240 |
show what I usually do. I'm using Claude Code. There are a bunch of other options too. 00:01:14.800 |
There's OpenCode, there's Amp, and a few others. What all of these have in common is that they 00:01:22.960 |
currently run primarily on the command line, detached from your editor. So I'm actually using 00:01:32.160 |
VS Code for the most part again. I used to use Cursor, but the way I work is that I have Claude running here 00:01:40.080 |
and an editor here. I use the editor primarily to review and to 00:01:48.880 |
make some small changes, right? So I have created a small project here, and I just 00:01:55.440 |
want to show the agentic setup. So what am I doing for agents to work at all, right? 00:02:02.320 |
The most important part here is to have a CLAUDE.md file. This one was actually auto-generated with 00:02:10.960 |
/init, and I made some minor modifications to it. For almost every single project that I'm doing, 00:02:19.520 |
I use a Makefile. In fact, I would say 100% of my projects have a Makefile. And the Makefile 00:02:25.760 |
acts as the main entry point for the agent to run things. The agent, Claude Code in particular, 00:02:33.920 |
will use Bash, it will use Python, it will write some code, it will run the code, and it can dig itself 00:02:40.240 |
out of a bunch of nasty situations this way. But I want to steer it in certain directions, right? 00:02:47.280 |
So what I usually have first is a project overview. It just tells it what the project is. I don't actually know 00:02:53.440 |
if it's necessary, since it will read the README too, but it gives some context. Right after that, I give it 00:02:58.720 |
the commands that it should use. The most important command here is make dev. 00:03:04.000 |
The reason this one is so important is that I actually don't want it to run this command. 00:03:11.920 |
I usually run this command myself. This brings up all the services. 00:03:16.720 |
This particular project has two services, a frontend service and a backend service. And 00:03:22.720 |
I tell it here that this brings up the server: it starts both the frontend and the backend. 00:03:30.400 |
I also tell it that it auto-reloads and auto-compiles, and that it should never 00:03:36.640 |
stop the server. That's actually one of the first important things here, 00:03:42.160 |
because it can do anything. In particular, because I run it in YOLO mode, it will just do anything, 00:03:47.040 |
so I want to encourage it never to run off in the wrong direction. 00:03:55.040 |
It's kind of annoying to explain, but it can, for instance, stop the server and restart it. 00:04:08.720 |
Okay. If the audio doesn't quite work, let me see if I can fix something here. 00:04:13.840 |
I might have overdone this. Let me see. 00:04:24.480 |
Let me remove this here. Let me know if the audio is a little better now. 00:04:32.480 |
Okay, I made the change here. Let me see if it works. But 00:04:38.720 |
basically, I want it to stay on the path of most success. And 00:04:49.840 |
for this to work, I basically have to juggle myself and the agent. What I want is 00:04:56.560 |
the dev environment in front of me. I always want to see it, right? So I have Claude up 00:05:02.880 |
here doing stuff, and I always have the dev server running things so I can see what it's doing, right? 00:05:07.600 |
So if, for instance, I now go to this website and load it, well, so far nothing has happened 00:05:14.560 |
because it doesn't log requests. But if I were to hit the backend service, let's say, 00:05:20.560 |
go to /api/health, I can see that a request was made, right? So I want to see 00:05:26.800 |
how the server runs. The server is always there for me. 00:05:29.120 |
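A minimal sketch of that backend behavior, assuming Go's standard `net/http` mux rather than whatever router the project actually uses: a `/api/health` handler wrapped in middleware that logs one line per request, which is why hitting the backend shows up in the dev-server terminal. The JSON body and the names `logRequests`, `newRouter` and `serveOnce` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"time"
)

// healthHandler answers /api/health with a tiny JSON body (shape assumed).
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
}

// logRequests logs one line per request via the standard logger (stderr),
// which is what ends up in the dev-server terminal.
func logRequests(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("%s %s (%s)", r.Method, r.URL.Path, time.Since(start))
	})
}

// newRouter wires the routes; a real server would hand this to http.ListenAndServe.
func newRouter() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/health", healthHandler)
	return logRequests(mux)
}

// serveOnce runs a single request in-process and returns status and body.
func serveOnce(h http.Handler, method, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := serveOnce(newRouter(), http.MethodGet, "/api/health")
	fmt.Print(code, " ", body)
}
```

Wrapping the whole mux means every route gets a log line for free, including 404s for routes that don't exist.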
So that's the first and most important thing: I want the server to run in my terminal. I don't want... 00:05:41.440 |
The next thing is that I want to have a consistent log of all the things that are happening. I've 00:05:46.080 |
talked about this a couple of times, but this basically allows me to see 00:05:49.040 |
frontend and backend requests simultaneously, even though they're from different services, right? 00:05:53.200 |
One is the Vite server; the other one is my Go server. 00:05:56.160 |
But I also want to give this visibility to the agent, right? So one of the tools in here 00:06:02.560 |
is this make tail-log command. If I run it here, I basically see the same thing as over here. 00:06:11.040 |
Why is this important? Because I want the agent to always understand how to 00:06:20.800 |
establish the context. If I'm talking about a bug with the agent, I want the agent to understand 00:06:26.000 |
what's going on, right? I will show in a minute how I'm doing this, but this is one of the reasons 00:06:32.240 |
why this tail-log command is here. Then there are the other commands you would expect: how to 00:06:37.040 |
run the linter, how to format, how to clean. That's the entry point to the tools that it 00:06:43.520 |
should use. And then here is really the most important part of this: I'm telling it 00:06:49.760 |
where the log files are, because I want to prevent it from guessing a bunch of other ways to establish 00:06:56.240 |
the context that it needs. Again, I reaffirm that this is the command it should 00:07:01.120 |
use to read the log file. I also tell it never to stop the server, and that this 00:07:07.840 |
server auto-compiles and auto-reloads, right? That's the most important part. So how do I have the 00:07:13.920 |
auto-reload and auto-compile set up? Well, there's a program called Foreman. 00:07:20.080 |
I'm not using that. And yeah, I think the problem is I probably don't have the right 00:07:29.280 |
setup here for the webcam. This might be confusing, because I noticed earlier that the audio 00:07:39.120 |
desyncs a little bit. I need to get the setup working better, but I hope it's not distracting. 00:07:45.840 |
Otherwise I can just disable the webcam. Maybe that makes it less awkward. I'll just turn it off for now, 00:07:51.680 |
because I think the webcam is trailing a little bit here. Okay. So I'm using a fork of Foreman, or rather a 00:07:59.920 |
re-implementation of Foreman, called shoreman. You can find it online if you search for "Foreman clone". 00:08:08.800 |
It's this one here. The reason I'm using it is that it's basically a very small shell script 00:08:16.560 |
that does the same thing as Foreman, but I had to make some changes to it, and it was easier for me 00:08:21.040 |
to make them in shoreman. So what does it do? I have a file here called Procfile, and it says: 00:08:29.360 |
this is the command to run for the frontend, and this is the command to run for the backend. 00:08:34.240 |
The frontend is a basic Vite application, which auto-reloads all the time, right? So as I'm 00:08:42.160 |
making changes to the frontend, they appear automatically because it recompiles. And I have set up 00:08:47.200 |
something very similar for the backend. I basically told it to use this watchexec tool, which is 00:08:55.600 |
this one here. All it does is watch these files, specifically Go and SQL files, 00:09:03.680 |
recursively across this entire folder, and then it runs this go run command 00:09:12.880 |
to compile the server. So when I make a change here on my server, let's say... 00:09:21.520 |
I think this might actually be unused at the moment, so let me delete this file completely. I 00:09:26.480 |
don't want to use it. Right, it's gone. The server has recompiled, right? That's all there is to it. 00:09:34.560 |
Then, going back to this: I have this very basic setup where at least 00:09:48.560 |
these things log into it. What is not set up yet in this project is that the 00:09:56.000 |
frontend also logs into it. So if I go here and issue a console.log of whatever, 00:10:03.440 |
I don't see it here, right? That's actually one of the changes that I want to make, and I want 00:10:09.040 |
to show why this is useful. So the next thing that I want to set up is getting this 00:10:15.440 |
to log into the server as well. I want to say one last thing about the 00:10:23.520 |
make command here. I give it instructions in the CLAUDE.md file, right? I'm telling it: 00:10:29.520 |
these are the commands that you should run. There's some extra stuff down here about how I want 00:10:34.480 |
the project to be structured, but this alone is not enough to actually get it 00:10:41.840 |
to work reliably and correctly, right? The reason I made changes to shoreman is that 00:10:50.480 |
I discovered (and it doesn't take long to discover this) that the tools work 00:10:58.720 |
better if their error messages are more descriptive about what actually 00:11:07.600 |
happened. Let's put it this way: one of the things that I changed in shoreman is that when 00:11:12.560 |
shoreman runs, it writes this file called shoreman.pid, which basically holds the PID of this 00:11:20.000 |
master process, which runs as a shell script. If I run it a second time, shoreman checks if it's already 00:11:27.040 |
running and errors out. But many tools do that, right? Many tools error when they're already running. What 00:11:34.240 |
I changed here is that it errors in a way where, if the agent reads the error, it is more likely to 00:11:42.000 |
understand what happened, right? Every once in a while, say, the agent tries to make an HTTP request, 00:11:50.720 |
but it makes the request right at the moment the server restarts, right? And then it might 00:11:54.960 |
see, oh, the server's not running, right? And then it gets the idea to run make dev. But the reality 00:12:00.000 |
is that the server was actually running, right? And so when the agent now goes in and intentionally 00:12:04.960 |
tries to start the server, it gets a slightly different error message than it would otherwise 00:12:08.320 |
get, right? It now gets the error message: "Service is already running. That's good. We auto-reload. 00:12:12.400 |
No need to do anything." So it reinforces to the agent that it shouldn't stop the 00:12:19.600 |
server. It shouldn't kill a bunch of processes, right? Because the agent will start killing a 00:12:24.480 |
bunch of stuff and restarting it, and I don't want that, right? I want this service here to be very 00:12:29.360 |
reliable. And when it tries to start it again, I want it to get exactly the error message that it 00:12:34.000 |
needs so that it doesn't go off the beaten path, right? If it does accidentally run it, it 00:12:38.960 |
will realize, oh, it's actually running, right? And you can see this: if I 00:12:45.360 |
tell the agent "I want you to start the dev services", it will run make dev. 00:12:56.720 |
It gets an error, but the error says no action is needed, right? This works better 00:13:05.040 |
than if it just errors out and says "shoreman already running" or whatever the default is, right? 00:13:11.200 |
So getting better error messages specifically for the agentic loop is one key part here. 00:13:17.520 |
The second thing, of course, is that I changed shoreman to write these log files correctly. So if we 00:13:23.440 |
look at shoreman... but actually, I can show this in a different way, right? So there's 00:13:29.520 |
a dev log. This is the one that it writes, and as you can see, it only contains the messages 00:13:36.800 |
from the current run. If I restart this here, it starts out fresh. This is one of the changes 00:13:46.000 |
that I found to work really, really well: I'm hiding away information that it will not need, 00:13:50.960 |
because otherwise, with an ever-growing log file, it sometimes picks up new work and 00:13:58.640 |
sees unrelated changes from yesterday, for instance, right? So that's one of the other 00:14:06.640 |
changes that I landed in shoreman. The other thing, since there was already a question about 00:14:14.640 |
Docker Compose: I do not actually use Docker Compose. I don't use any Docker here; this is all 00:14:20.640 |
running on my machine. If I were to put anything into Docker, then maybe I would care about Docker 00:14:24.720 |
Compose, but this is just the most basic setup that I can have. Okay. So let's make a change, 00:14:31.920 |
right? Let's put the console forward plugin in. The reason I want this plugin is that it 00:14:38.400 |
makes iteration generally much easier. I just haven't set it up here yet. So this is the plugin. 00:14:43.520 |
I want it to set this up, so let's say: "Please set up this plugin from npm." 00:14:58.000 |
And my English just sucks. "See its README." Okay, let's see if it can do that, right? 00:15:04.720 |
So it should hopefully read this. I can optimize a lot of these things, right? I probably 00:15:15.680 |
could also have done it manually; I just want to see if it works. One of the things, for instance, 00:15:20.320 |
that slows Claude down a lot right now is that it always tries to figure out from scratch 00:15:25.520 |
what we're using here. So one of the things we should put into this project 00:15:30.320 |
is that we're using npm. I hadn't written this yet, so let's do it here: "We always use npm." 00:15:40.560 |
Right? In theory, this should prevent it from using pnpm or something else that it might try. 00:15:47.760 |
It will only pick up on that when we start from scratch, but 00:15:55.040 |
for future iterations, maybe it will improve things slightly. Another thing: it probably 00:16:05.280 |
read the instructions incorrectly. I noticed the other day that I'm not documenting this correctly. I'm 00:16:10.960 |
actually importing... or maybe I do document it correctly, but I think I've done this once before, and it always 00:16:17.200 |
imports this incorrectly. So I will manually fix this now, because this should actually be like this. 00:16:27.760 |
This should work now. I noticed before that it always gets this wrong. 00:16:41.760 |
But in theory, what should happen now is that if we log an error here, 00:16:46.960 |
we see it in the log, right? That's what we want to accomplish here. And 00:16:56.320 |
the whole point of this is that in future iterations, when we're running 00:17:04.720 |
into issues on the frontend, they will also show up in the same log, right? So now we have at least this 00:17:10.000 |
running. And let's see, there are some changes here. 00:17:16.400 |
Let's commit this update. So what do I want to say here? 00:17:25.680 |
"Set up console forward plugin and remove old pagination." 00:17:30.160 |
I've gotten very lazy and use a lot of dictation now. Okay. So 00:17:39.760 |
let's try to write some code here, right? That's really what we're here for. 00:17:48.560 |
agentic from the start to work particularly well. 00:17:51.840 |
So I did actually bootstrap this with Claude Code, but I already made some changes so that it has 00:17:56.480 |
an infrastructure that I like. In particular, 00:18:00.560 |
at the very least, I picked the web framework, or rather the router, that I want to use. 00:18:06.240 |
I set up some utilities in here to respond with errors. It's just the most basic kind of infrastructure 00:18:14.560 |
that I wanted to use for building our API. 00:18:19.440 |
The second thing I did is create a plan, and this is basically all the things 00:18:24.000 |
that I want to implement here, right? I want to build a small bulletin board that's modeled after 00:18:28.160 |
phpBB mostly, but also 4chan. So I don't want user authentication. The idea is just that I'm using 00:18:35.680 |
what are called tripcodes to authenticate; admins can fill in boards. So basically I have this 00:18:40.960 |
whole plan here that I want to implement. I will not tell it to do this in one go; 00:18:48.960 |
what I want is for it to look at the plan and tell me if it needs something else. 00:18:55.920 |
"I created a plan in plan.md. I want you to ultrathink about it and see if there are omissions in 00:19:04.000 |
the plan that we need to fill in." So it will now read this file, and 00:19:13.920 |
"ultrathink" is basically a hard-coded keyword in Claude Code that also increases the thinking budget. 00:19:24.880 |
So it will use more tokens to reason. One question that came up is which 00:19:33.760 |
dictation tool I'm using. I'm using two different ones: I'm trying Flow, and the other one I'm using 00:19:42.240 |
is called VoiceInk. I use them both for different things; I'm just trialing different things right now. 00:19:48.400 |
Yeah, that's basically the answer to that. 00:19:54.080 |
The reason I wanted it to read through my plan is that it's actually quite good at telling me if there are 00:20:01.920 |
omissions, and filling those in helps it later. I don't usually do this with Claude, though. Instead, what I usually 00:20:12.560 |
do is copy-paste this entire thing into o3. So let's do this here: "I have a plan here. I want 00:20:19.760 |
you to think hard about this plan and tell me if there are omissions in it that we should 00:20:24.800 |
look into before we implement it." Let's paste the plan and see what o3 does. 00:20:32.960 |
So let's see what it came up with. Several omissions: admin authentication, so how admin 00:20:41.520 |
privilege is granted and verified. That's actually a good point; we didn't mention that. So let's do 00:20:49.600 |
both here. Actually, we already have a section up here. So: admin_scan_well_reports, admin_commissions 00:20:59.200 |
are hardcoded in nflr. Okay, we don't really have authentication. Maybe we just use 00:21:10.560 |
very, very basic HTTP Basic auth. Well, we'll see. 00:21:17.920 |
Most of this we will not actually do. This doesn't really matter. 00:21:26.320 |
The indexes, I think, like a lot of this stuff, it will figure out along the way anyway. Mostly I want 00:21:34.080 |
to see if there is some very clear omission that we should clarify. 00:21:42.000 |
So far this looks good. Let's see if this one came up with something. Thank you for giving me 00:21:52.480 |
two. Do I have to pick one? I just want to quickly look through. Okay. This one tells me that 00:22:02.560 |
there are no deletions in it. That's a good point. We will not do this for now. Okay. 00:22:11.520 |
So far, if we go to this bulletin board, there's nothing, right? We haven't set up anything 00:22:21.680 |
yet. I get a warning here that we're using outdated packages for 00:22:32.400 |
the dev tools. I will leave this for now; I just don't want to spend too much time on the 00:22:35.360 |
stream on the wrong things. But yeah, the idea is basically that we are going to implement... 00:22:42.000 |
The only endpoint that I have right now is actually the health check endpoint, so we don't have much. 00:22:49.440 |
We don't have any database code yet, other than setting up the database once in the server. I think 00:22:54.560 |
it's here somewhere. So let's see. We'd like one API endpoint, and I think it's best to start with 00:23:02.640 |
listing all the boards that exist. And since you cannot create a board right now, because there's no 00:23:10.480 |
admin panel, we're just going to hard-code a bunch of boards in the database. So here we have migrations. 00:23:18.480 |
So we ask it to make a new migration with two boards that it can just make up, and then we're going to 00:23:25.040 |
list the boards. "I want you to make an API endpoint that lists all the boards that exist. Because we do 00:23:31.840 |
not yet have an API to create boards, I want you to make a migration and create two default boards, 00:23:38.000 |
one called general and one called water cooler." Let's see if this is good enough. 00:23:46.480 |
I think I already wrote what the API responses should largely look like. I think this is all... 00:24:00.640 |
Yeah, so let's add this: "Each board in the response should also contain 00:24:08.640 |
the most recent topic and the most recent post, in addition to the title and description." 00:24:19.280 |
So let's see what the questions are in the meantime. 00:24:23.200 |
"Have I tried using Claude to generate a file mapping, and using VoiceInk's AI post-processing for prompt 00:24:32.400 |
generation?" I have tried that. So far... a lot of the things I'm doing at the moment are 00:24:40.640 |
basically based on whether it actually makes things work better. And I know that a lot of these 00:24:45.760 |
AI tools can do quite impressive things, but very often they don't make me any more productive. So I don't 00:24:52.480 |
really like using VoiceInk or something like it to generate prompts that then feed another prompt. I'd 00:24:58.560 |
much rather have commands set up. But yeah, I haven't tried that much. 00:25:06.080 |
So it came up with a migration here for the defaults, and it will run this. 00:25:14.720 |
One of the things you will notice is that in this project, and in fact in all of the code I'm writing now, 00:25:19.360 |
I'm asking it to write custom SQL. I do not use an ORM. This is really because I always 00:25:27.280 |
liked writing SQL manually. In fact, I just like SQL. It's not that I enjoy writing it, but I like having 00:25:33.600 |
as little indirection as possible between me and the database. The main reason I don't do it when I'm not 00:25:41.520 |
using agentic coding as much is that SQL is annoying to write, but now I have a machine write it for me. 00:25:46.800 |
To me, this beats having another indirection in place. So let's see what it does here. 00:25:55.360 |
I think it already does some things I don't like, but let's see. Most likely what it's going 00:26:03.840 |
to spit out is code I don't like. And rather than having it make more code like that, I want to 00:26:10.400 |
stick with the initial one and fix it up, because the more code exists that looks like what 00:26:15.600 |
I want, the more likely it is that future API generations will fit into it, right? 00:26:20.640 |
That's sort of the idea. And, as you know, I basically gave it all the permissions. 00:26:29.840 |
I just let it write. I don't do anything here, right? I just let it go. 00:26:37.200 |
It has all the permissions to do everything on the system, which in parts could be a terrible idea, 00:26:41.760 |
but seemingly Claude Code does really well. Right, so it managed to run the API. 00:26:50.800 |
It sees that there's a response coming back from the API, so it is working. 00:26:58.560 |
We can also go to the browser now and test this. I think it called it boards. Right. 00:27:04.720 |
And so we see that there is a board, and it actually has test posts in it. I'm assuming it has 00:27:12.000 |
test posts in it because it just went to the database and created some. That's my guess. 00:27:16.880 |
I didn't actually see where it did it, but this might be an interesting moment to look into the database. 00:27:22.800 |
So we have a database here called miniDB. This was empty when we started earlier. 00:27:34.800 |
I wonder when it created them; I didn't look. So when did it create them? Did it make a test? 00:27:43.760 |
So let's do this: let's quickly check which files we have here. It must have created 00:27:51.520 |
these manually. At which point did it create them? So it created some handlers. 00:27:59.600 |
When did it create them? This is one of the reasons why the terminal interface is not great: 00:28:06.320 |
I don't have search here. I have to quickly go through this and look. 00:28:12.400 |
I don't actually know when it made the posts, but it clearly created some 00:28:21.200 |
content in the database here. We'll just leave it for now; this works well enough. Let's check 00:28:26.240 |
the changes, right? Now you can see how I do that. I know these files changed, 00:28:31.280 |
because they're all marked as modified. So we have a new route here, boards. This is okay. 00:28:39.360 |
It's fine. There's a list boards handler, and all of this is new, right? We only had the health check before. 00:28:45.600 |
And now we have this. It calls this get boards, which is down here. I really don't 00:28:55.440 |
like this, right? It should not do this. All the database code should go into a separate module. 00:29:00.480 |
So let's start with this here, right? We need some changes. Let's see. 00:29:07.680 |
"All the database queries should go into models/boards.go." Actually, we'll do board.go. 00:29:17.520 |
So that's the first thing we want: we want this to go somewhere else. 00:29:25.280 |
And this is okay. The boards response is okay: it will be a list of boards. Each board will be a 00:29:32.000 |
database model. But this kind of thing here is weird, because I want the model to represent 00:29:38.480 |
a single row only. "The models should only represent a single row, not any joined records." 00:29:49.680 |
So we need to figure out how best to query the boards so that we still get this. 00:29:59.920 |
What is it doing here anyway? It's running another query, so this is an N+1 query anyway. 00:30:13.920 |
So let's just say that it should move this over there. "The topic should go into models/topic.go." 00:30:26.160 |
And then we have the post: "The post should go..." 00:30:37.280 |
Let's just see if it manages to refactor this a little bit. 00:30:41.040 |
And then we'll see from there what we need to do. 00:30:44.880 |
"Does the Claude Code VS Code extension work if you're outside of the VS Code terminal?" 00:30:50.000 |
Yes. If the integration is set up correctly, it works even if Claude is running 00:30:56.800 |
on the side, right? I can also start Claude in here, but I don't really like that. I prefer this terminal 00:31:01.600 |
on the outside. But yeah, these changes still show up, although I think that this 00:31:07.760 |
actually comes from the Git plugin. But we'll see. What are the questions there? 00:31:16.720 |
Maybe I should explain this, because I didn't: "claude yolo", 00:31:26.480 |
which I write here, is just an alias for this impossible-to-pronounce argument, --dangerously-skip-permissions. 00:31:33.200 |
Basically, I run this all the time. Is it a good idea? 00:31:38.560 |
I don't know. I'm not strongly advocating for it, but I can tell you that I'm using it this way all the 00:31:45.040 |
time. That's why it doesn't ask me for anything; it just edits. "What are my 00:31:53.840 |
thoughts on Gemini CLI?" I will reevaluate it. The last time I was using it, the problem basically was that 00:32:02.880 |
any model other than the Anthropic family of models was not overly amazing at tool usage. So 00:32:10.240 |
I want to see these agentic loops work, and that's why I'm playing with it. I have had the most success with 00:32:18.240 |
Claude. I also think that Claude is the cheapest option, because the 100 euro, sorry, 100 dollar a 00:32:25.280 |
month package in Sonnet-only mode is enough, and it's kind of hard to beat for the price right now, 00:32:34.880 |
right? I don't know how long this price is going to stick around, but that's really why I'm not trying 00:32:41.600 |
Gemini much. I have Gemini on the system, and I sometimes give Claude access to Gemini to read through a code base. 00:32:50.080 |
I'm going to get this working well first, and then I will try other tools again. I also tried Amp. I tried a bunch of 00:32:58.960 |
other ones, but this is the one that 00:33:03.760 |
I think has the highest chance of sticking around, also in part because the people that write 00:33:07.920 |
the tool are also the people that create the model, so they go hand in hand. 00:33:12.160 |
Okay. So now we have a board.go with get all boards. This looks quite a bit better. We don't 00:33:20.720 |
need pagination here because we don't expect that many boards, so that's fine. Now it uses 00:33:26.880 |
Scan to fill this in, and we have a board by ID. This is also quite okay. And the board by slug. 00:33:41.760 |
One consequence now is that all of these methods can return nil or a board. So if the board 00:33:52.000 |
doesn't exist, it returns nil. Do I like this? I don't know. So we have this most recent... 00:34:11.360 |
What is not amazing is that it looks like the board doesn't have a pointer to the most recent 00:34:21.120 |
topic, but the topic has one to the most recent post. Maybe this is okay. I will not 00:34:29.920 |
judge the database structure too much right now. Okay. So I think we can stick with this. In theory, 00:34:36.560 |
if we now go here, it should look more or less the same. So we have the most recent topic 00:34:40.880 |
and the most recent post. Let's actually remove the most recent post from... 00:34:57.440 |
I wonder what... okay, let's leave it for now. But I 00:35:04.000 |
think we will throw it away. What I usually do when I program with this is create myself a TODO 00:35:08.880 |
file, where I basically keep track of all the stuff that I still need to do. So, 00:35:15.920 |
I'll call this section nits: "We should remove the most recent post from the board listing." 00:35:24.800 |
Okay, so we'll think about this later. Let's have a look at what the API response looks like so far. We 00:35:32.320 |
have a list boards route, which is hooked up to the router, and it creates this boards response 00:35:44.960 |
and then responds with JSON. Get boards with recent is what it calls, which now gets all the 00:35:52.640 |
boards and then the most recent topic and post. 00:36:02.880 |
Yeah, not overly amazing, but kind of okay. But one of the things I do not like is this part 00:36:08.400 |
here, right? It calls http.Error, and we have this utility here called InternalServerError, so we'll 00:36:17.760 |
actually use it. We'll call utils.InternalServerError(w, err) and then remove the other one. 00:36:28.080 |
So we want to do this, and hopefully, going forward, we will actually start using this utility instead. 00:36:36.240 |
Why do I want to use this utility? Well, on the one hand, because it logs the error and it returns 00:36:41.280 |
with a standardized message. So that's why I want this. And, um, and then this is okay. And so board with recent 00:36:52.000 |
is an extended struct that has the board in it plus the extra things here. So this is, this is okay. Um, 00:36:59.040 |
so let's say we commit this, we'll leave this for later. 00:37:04.080 |
Um, so let's say "add basic board API response". So the next thing we want to do is we want to hook up 00:37:15.920 |
the front end, right? So if we go here, we don't see anything. So let's say we want to have the board 00:37:23.760 |
show up. I want you to change the front end to show all the boards. Um, for now, I want you to make sure 00:37:31.520 |
that we create components for each row on the listing so that we can reuse this later. 00:37:39.120 |
These rows should be reused for topics in a board as well as for the board listing overall. 00:37:48.880 |
Um, we might need a parameter to change the, actually, I don't want this. Let me, let me do this differently. Um, 00:37:55.760 |
I want you to now show all the boards with the most recent topic on the, uh, index page. 00:38:14.000 |
So the problem is that whenever it creates a front end from nothing, 00:38:20.880 |
it turns into a mess. Since there's basically no real front end, this might be incredibly messy. 00:38:26.960 |
And I'm a little bit afraid that it doesn't even manage to set up the router. 00:38:29.920 |
Um, so I'll see what it does. If I can, we can watch it. In the meantime, 00:38:37.680 |
Yeah. So for how to get the browser logs into a terminal, I use a Vite plugin that I wrote. 00:38:45.280 |
Um, you can also do this yourself from an API endpoint. The Vite plugin was this one here. 00:38:54.960 |
Uh, one other question was about the font: the font I'm using is MonoLisa. 00:39:02.000 |
I think all the time. This one here. Uh, that's the font. What other question? 00:39:10.880 |
Yeah. So one question is if you manually edit the code like that, do you have the problem that the 00:39:17.760 |
model has unedited versions in the context? And yes, this is a problem. One of the problems with this 00:39:23.760 |
is that it will recall things that you have already thrown away. This is actually a pretty big problem. 00:39:28.800 |
Um, this is one of the reasons why I clear the context all the time. Um, the same problem, 00:39:34.560 |
by the way, also comes up if you do code formatting. It's quite often that the linter and the formatter 00:39:39.360 |
edit the file in a certain way, and sometimes it goes back and forth. 00:39:44.000 |
I don't have a good solution for this, but it is a problem. Um, 00:39:50.960 |
I can't really recommend anything here other than I do want to do these commits. 00:39:56.080 |
Then I want to clear the context. Sometimes I maintain a to-do list. So before I run out of context, 00:40:03.120 |
for instance, I tell the agent to summarize everything that it did into a file and I can look at this file 00:40:08.880 |
later and then continue from there. Um, so let's see what it did. It probably has created something here. 00:40:17.440 |
Um, so this is actually an interesting thing. It has not managed to run this, right? And so now, 00:40:22.160 |
now we can probably see that our tooling comes in handy, hopefully. 00:40:26.800 |
When I navigate to the page, I get a bunch of errors. Please check the log and see what's going on. 00:40:35.040 |
Right? So it should now read the log, which it does. And hopefully see what it broke. Um, okay, cool. So it managed. 00:40:50.880 |
Probably it wouldn't have needed the log, but having the log now means that it was just able to go back 00:40:57.680 |
there and figure this out. And at least we have something now, right? So I, not that I like how 00:41:04.000 |
this looks at the moment. You can't even click on it or anything, but, um, 00:41:07.520 |
yeah, we, we see something. Um, let's make two changes here. 00:41:14.320 |
I want these to be rows. So one below the other, not next to each other. And I also want to not show 00:41:21.680 |
the most recent post. I only want to show the most recent topic. So I just want to make this change 00:41:28.400 |
and then we're going to figure out how to make it less crappy. Um, cause it probably doesn't look very 00:41:33.600 |
nice. Um, the way I do front end code at the moment is I let it write a whole bunch of stuff and then I 00:41:41.120 |
ask it to extract components that usually sort of works. Um, but front end code, unfortunately, 00:41:47.760 |
turns out to be very sloppy very quickly. Okay. So, um, okay, this at least is getting somewhere. 00:42:11.520 |
Um, so this is already, we're already sort of down. Um, so: please lint everything. 00:42:18.320 |
This is, it's already going to be annoying because it clearly left a bunch of nonsense behind. And so 00:42:26.800 |
the linter will immediately complain, hopefully that, um, there's unused stuff. So let's see. 00:42:33.840 |
Um, by the way, in this project, I'm not using any hooks. Um, I do use some hooks in other ones, 00:42:40.320 |
but I want to start with the basics here. Uh, okay. So we, we got rid of some unused stuff. 00:42:47.760 |
Um, I don't know what page this is. We're throwing this away for now. 00:42:56.080 |
And this is already messy. I don't, this is already too big. So the API client, I'm actually okay with, 00:43:01.840 |
it can leave that, but I don't like the types in the same file. So, um, let's do this. 00:43:09.200 |
Move the types from api.ts into a separate file. 00:43:31.200 |
Um, let's see. Just kind of want to move this out. 00:43:36.320 |
So the types are here now. The API is here. One of the most important things is to make sure that the 00:43:44.000 |
files don't grow too large. The larger the files, the harder it is for the, for the system to work with them. 00:43:50.080 |
Um, so this is, this is, this is okay for now. So we're going to just have 00:43:55.040 |
not the nicest thing here. I'm going to manually remove this welcome thing, which I think, 00:44:07.920 |
Um, let's throw all of this away. So we have only the boards. 00:44:17.520 |
The frontend so far, probably a little bit messy, but, um, 00:44:24.320 |
initial display of the boards in the frontend. 00:44:32.560 |
The problem immediately here now is going to be that, um, 00:44:35.920 |
we don't have the router. We have the router set up, but we don't have query set up. So I think 00:44:45.680 |
Okay. It does use query, which is, that's good. 00:44:54.640 |
It might be okay. Oh, well, we'll see. We'll see how messy it gets as we continue. 00:45:01.280 |
Um, okay. So what should we do next? I think next we're going to show 00:45:04.880 |
each individual board. So the next thing we need to do is we need to create these boards. 00:45:16.240 |
I wonder if I should continue the session or not. Maybe we'll continue the session. Might be a bad idea, but 00:45:23.920 |
might actually help. Now, please add an endpoint to show all the topics in one board. 00:45:31.120 |
We will also actually, no, I will, I will, I will do it from scratch here because I want to set up 00:45:37.280 |
pagination. Now we need an API endpoint to list all the topics in a board. Note for this, we will need 00:45:43.520 |
a pagination helper. We want to use cursor-based pagination. That means not offset, but to continue 00:45:50.080 |
from a specific starting point. And we want to take the cursor to continue to the next page from 00:46:00.640 |
the URL parameter, because it's going to be a get request. 00:46:03.840 |
It's actually going to be a shitty user experience because it means you can't jump to a specific page. 00:46:25.280 |
We want to take the, take the page and per page parameter from the URL. 00:46:37.040 |
Um, or else it's going to be a get request. Default per page 250. 00:46:42.880 |
Um, also return the total number of pages that exist. 00:46:52.480 |
Okay. In the meantime, I have more questions. 00:46:56.240 |
I do not use compact at all. Never, ever use compact. If you run out of context, 00:47:02.880 |
compact is basically a command that just screws up everything. 00:47:06.880 |
Um, I don't know what happens if you compact. It's going to be a gamble. It's, it is already 00:47:14.080 |
random enough as it is out of the box, but I never, never, never run compact. Instead, if I 00:47:19.360 |
notice that I'm running out of context, I'm asking Claude to summarize what it did into a markdown file. 00:47:25.760 |
I review the markdown file, start a new session, and then read it back from the markdown file. 00:47:32.080 |
Um, because then at the very least, I know what it pulls into the context. Compact, I have no idea 00:47:38.720 |
what it does. Um, I don't think the tool even shows you what it did after compacting. So it's, it's, 00:47:44.480 |
it's a gamble. It's a pure gamble. Um, never do that. 00:47:48.720 |
Basically, um, auto-compact is, I'd rather have Claude stop than auto-compact because it's just so, 00:47:59.920 |
so random. It's, it's absolutely random. I also only use Sonnet. Um, I kind of wish I could use Opus more, 00:48:09.840 |
but even with a $200 subscription, I run out of Opus. So I just stick to Sonnet and I use o3 for 00:48:19.920 |
planning a lot where I basically go on. Um, I take what I'm working on, copy-paste that into just ChatGPT, 00:48:28.640 |
pick o3 and have a conversation about my architecture there. Um, and I can maybe show this later. I kind of 00:48:34.880 |
want to hook up the tripcodes to, to post something and then we'll see, um, how well this works. So 00:48:41.920 |
what I should have received here now in theory is, uh, what did we get here? We got the pagination. 00:48:49.200 |
So these are the parameters for the pagination. We get an offset. Interesting. Why do we have an offset? 00:48:58.720 |
Um, okay. So we're pulling this from the query page and per page, the offset is calculated. Um, 00:49:14.400 |
So what did it do? So it lists the topics and this is a list of topics and the pagination meta. 00:49:24.960 |
Um, do I like this? Do I like this? I don't know if I like this. Um, 00:49:31.520 |
do I like this? I don't know. We'll figure out if I like this. Um, I think I might 00:49:39.760 |
want to rename this to pagination probably, but maybe this is good enough. Uh, so let's see. So in theory, 00:49:46.800 |
there should be an API now for me to hit, uh, API boards, which one test. This has stuff in it. 00:49:55.120 |
Uh, what the slash? No. What's, what's the API? 00:50:00.000 |
Um, board. What's a board? Board ID. Is it a board ID or is it a board slug? Let's try. 00:50:12.640 |
So that's boards. Uh, board ID one is a test board. 00:50:18.560 |
You get something here? No, we don't get anything. Oh, slash topics. 00:50:24.560 |
With this. Okay. Board not found. So we need a test probably. Okay. Um, 00:50:35.840 |
So we get this total one, total pages one per page 10. So if we do, uh, page equals two, 00:50:42.960 |
we get an empty list. Three empty list. Okay. This, this is okay. Um, 00:50:51.120 |
I do want to change the meta to pagination though. So I think we're just going to do it manually. 00:51:03.520 |
I'm going to call this pagination. Um, okay. So that's up. I changed meta to pagination. 00:51:16.640 |
So I just gave it this context immediately, just so that hopefully, um, 00:51:23.440 |
it doesn't get confused later, uh, through a manual edit. Um, there are some questions. I will quickly 00:51:31.120 |
go to them. Have I used Context7? Yes. Um, I don't have good experiences with it. I don't use 00:51:37.600 |
any MCP servers other than Playwright. And I try to not use Playwright either. Um, 00:51:42.560 |
then the other question, uh, yeah, in general, I don't like it looking up docs. I'd much rather give it 00:51:52.320 |
the docs myself, uh, if it needs them. I'm very conservative on context usage. I don't like any tools 00:51:58.640 |
that pull anything in automatically. I optimize everything for low context usage. Um, 00:52:03.680 |
I want you to now register a URL for the board. So if a user goes to slash b slash slug, 00:52:13.200 |
then we will show the most recent topics there and add a pagination for previous and next page and 00:52:20.560 |
a basic overview of how many pages exist and a quick jump to a particular page. 00:52:24.720 |
Did manage, um, the like slash b. What do we have here? Slash 00:52:41.760 |
yeah, let's do a slash b slash slug. I like this. Um, and show the, so the topics there. 00:52:57.120 |
Note, we can always rely on monotonically increasing primary key integers for board order. 00:53:06.800 |
Um, because we use SQLite here and I want to avoid, uh, it's using dates right now. 00:53:13.440 |
Um, also please use the link component to link from the index page to the board. 00:53:28.400 |
Okay. Let's see if it manages. Maybe in the meantime, there's some questions. 00:53:32.080 |
Uh, again, there was another question about what I use for speech-to-text. Um, 00:53:36.720 |
right now this is using Wispr Flow. I also use VoiceInk, which is open source. Uh, both of them work. 00:53:43.360 |
Um, I just, I'm trialing Wispr Flow this week. Um, normally I use VoiceInk. Um, 00:53:55.600 |
Um, are there any other questions that I can answer in the meantime? Because I'm pretty sure 00:54:01.920 |
this is going to take like four or five minutes for it to produce something reasonable. 00:54:08.400 |
Earlier there was a question about whether I stream; this is the first stream I've done in three years 00:54:19.840 |
probably. So we'll see if I will do this again, but, um, yeah, that's kind of lazy. 00:54:27.360 |
Um, yeah. So how do I write the logs automatically to the dev log file? This is what I'm doing with 00:54:39.600 |
shoreman. So if you, let's do this in the meantime, let's put this on GitHub. Then you can sort of steal 00:54:49.760 |
don't want to put the whole thing up there, mini db. Let's make a repo mini db. Create new repository. 00:55:05.120 |
This is right through that, I put it in port or screen first. 00:55:20.320 |
So, um, in here in scripts, this is my shoreman fork, I should probably, 00:55:31.680 |
now that this is on GitHub, I should probably make sure that the licenses 00:55:53.680 |
It's always kind of funny if you have a bunch of black pullet stuff. 00:56:06.720 |
Because, um, you cannot really copyright any of this stuff. 00:56:10.880 |
Um, very little. There's going to be a bunch of court cases. 00:56:18.000 |
Okay. Um, but yeah, shoreman is, is what I'm using for the, for the logs. 00:56:36.720 |
Um, I navigated to a board and it didn't work. 00:56:43.920 |
Right. And this is again why the unified logging is so functional. 00:56:53.120 |
So, it should hopefully figure out what it did wrong. 00:56:59.520 |
Because I, I'm assuming that this wasn't too wrong that it did, right? 00:57:16.720 |
Could it be that we don't run the TanStack plugin for Vite? 00:57:28.240 |
Because the plugin for TanStack should do all of this. 00:57:36.080 |
So, you probably don't have this plugin in there. 00:58:00.320 |
I didn't even have to figure out what's going on. 00:58:02.160 |
Like, okay, I did give it a hint that it has to set up the plugin. 00:58:18.400 |
So, it couldn't figure out how to make the tail. 00:58:20.880 |
And then it immediately ran and went for the, for the log itself. 00:58:24.720 |
And that's one of the big problems right now. 00:58:26.960 |
Why I'm so careful about giving it, um, the right context. 00:58:34.000 |
But it should really have cd'd into the right folder. 00:58:40.960 |
And, and this, this is basically contributing to context rot, right? 00:58:45.440 |
Now, now it has remembered that this didn't work. 00:58:53.440 |
And I'm, I'm, I'm trying to nudge it in the right direction. 00:58:56.560 |
Um, so one of the things I can try here now is that, um, on the make command, maybe we can. 00:59:02.240 |
Um, honestly, I, I only have partially good results with this. 00:59:08.080 |
If you fail to run the Makefile, um, you have to remember that you have to run it from the top level. 00:59:24.320 |
Right, so, so maybe this will, oops, maybe this will nudge it in the right direction. 00:59:52.880 |
But it added the link component as instructed. 01:00:13.520 |
To test this better, I want you to generate 120 different posts across 10 different topics. 01:00:23.120 |
And put them into all the different boards that exist already. 01:00:30.400 |
To make this easier, please write yourself a little test script. 01:00:38.640 |
Do I want it to write a test script or figure it out itself? 01:01:03.040 |
Um, but please use inference to generate a bunch of real sounding conversations. 01:01:10.080 |
And pipe them into an input file that this script will then use. 01:01:17.040 |
So, basically, I want to get it into a situation where it now generates me out an entire board. 01:01:24.720 |
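A sketch of the kind of seeding helper this prompt asks the agent for, assuming the generated conversations land in a JSON input file shaped as a list of topics, each with a title and its post bodies. The file format, the type names, and the round-robin spread across existing boards are assumptions; a real script would then insert each planned topic through the project's own create functions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// seedTopic is one generated conversation: a topic title plus its posts.
type seedTopic struct {
	Title string   `json:"title"`
	Posts []string `json:"posts"`
}

// assignment pairs a generated topic with the board it will be inserted into.
type assignment struct {
	BoardID int64
	Topic   seedTopic
}

// planSeed decodes the generated input and spreads the topics round-robin
// across the boards that already exist.
func planSeed(input []byte, boardIDs []int64) ([]assignment, error) {
	if len(boardIDs) == 0 {
		return nil, errors.New("no boards to seed")
	}
	var topics []seedTopic
	if err := json.Unmarshal(input, &topics); err != nil {
		return nil, fmt.Errorf("decode seed input: %w", err)
	}
	plan := make([]assignment, 0, len(topics))
	for i, t := range topics {
		plan = append(plan, assignment{BoardID: boardIDs[i%len(boardIDs)], Topic: t})
	}
	return plan, nil
}
```

Keeping generation (LLM output written to a file) separate from insertion (this script) means the test data can be reviewed and re-seeded without calling the model again.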
Mario, since you're writing: back to Wispr Flow. 01:01:31.360 |
But the problem with VoiceInk at the moment is that the AI integration just adds too much latency. 01:01:40.160 |
Um, so for the screencast, I opted for Wispr Flow. 01:01:47.760 |
And Wispr Flow is, is, is just the lowest latency thing I found. 01:01:56.480 |
For me, one of the really big benefits of agentic coding is actually test data generation. 01:02:07.200 |
Because I'm actually struggling a lot with traditional applications that all of my test data just looks 01:02:14.240 |
Um, and now you can just get an LLM to create really good-looking test data. 01:02:22.400 |
And it makes it much easier to see the product, to feel the product, uh, and to experience what it looks 01:02:27.680 |
Um, so that's just such a nice, uh, nice aspect of it. 01:02:36.960 |
Try using tmux for better log tailing and running servers. 01:02:51.040 |
I just don't load MCPs into my context in the first place. 01:02:54.560 |
Um, one question is, do you have any experience with the amount of usage you get out of a $20 01:03:08.960 |
I think that you don't get that much out of it. 01:03:20.240 |
Claude subscription and you only use one agent at a time with Sonnet, you're not going to hit the limits. 01:03:26.240 |
With two or three simultaneously, you can hit the limits. 01:03:34.080 |
With the $200 subscription on Sonnet, I don't think you can run into the limits. 01:03:39.360 |
Uh, but with the $20 one, I'm pretty sure that you can run out very quickly. 01:03:48.880 |
So because I use --dangerously-skip-permissions, I don't really use the plan mode explicitly. 01:03:55.280 |
And the problem with this is that it actually disables a bunch of things. 01:04:00.960 |
So when it plans, it constantly asks for permissions for all the tools. 01:04:04.880 |
So I basically ask it to plan without plan mode. 01:04:08.880 |
Because the plan mode, as far as I can tell, at least in parts, auto-activates unprompted. 01:04:15.040 |
But that's really why I don't use the plan mode. 01:04:28.640 |
I think there are probably like 10, 15 different pretty decent voice-to-text things at the moment. 01:04:37.840 |
And I think it's a little bit ridiculous to pay for Wispr Flow. 01:04:41.360 |
And I don't really like that because it is... 01:04:46.720 |
On the Whisper model, which is the open source one. 01:04:52.960 |
Something like Wispr Flow in an open source way, um... 01:06:29.680 |
Next thing is we're going to enumerate the topics. 01:06:40.000 |
So basically we are going to add an endpoint to see all the posts on a topic. 01:06:52.400 |
And we also want to add the front-end component and the front-end page to... 01:07:17.840 |
Paying monthly for basically local Whisper models is nonsense. 01:07:26.000 |
I'm actually quite okay with paying for the API inference. 01:07:30.960 |
I think it could actually fix up a lot of the little issues with 01:07:38.160 |
voice input on a very well-trained local model too. 01:07:59.520 |
It was an alias on Mercurial, which I was using before git. 01:08:02.640 |
And I got so used to it that when I moved to git, I set up this alias and never went back. 01:08:22.880 |
Pretty good on OpenCode if you use it through... 01:08:41.120 |
Very slowly this will start working at one point. 01:08:48.480 |
I mean this is not a very interesting screencast in many ways. 01:08:54.560 |
Because this doesn't really show agentic coding all that much. 01:09:01.680 |
If I still have some time, I think I have 20 minutes left. 01:09:05.920 |
If I still have some time, I will try to add some tests. 01:09:13.360 |
What do I usually do while waiting for Claude? 01:09:15.840 |
So this is the moment when I'm going to pitch VibeTunnel. 01:09:20.480 |
Or actually I think I was barely involved at this point in this project. 01:09:25.120 |
I think this is primarily now Mario's and Peter's project. 01:09:29.840 |
Basically run all of your Claude instances through the browser. 01:09:35.760 |
So I can go in for a coffee and see what it's doing. 01:09:44.000 |
But the answer of like what do you do while... 01:09:46.720 |
While waiting for Claude is you go to Twitter and you write stuff, I guess. 01:10:48.080 |
Maybe now that I'm sort of writing it more indirectly, I don't mind it quite as much. 01:10:55.040 |
But I kind of want to show why Go works so well for agentic coding. 01:11:08.960 |
This you would not write in any other programming language than Go, right? 01:11:16.160 |
if err != nil, return internal server error. 01:11:26.480 |
And I know that there are ways in which I could do this differently. 01:11:29.520 |
And then return an error and like handle some of it on a higher level. 01:11:35.600 |
In Python, you wouldn't write it because it's ugly code. 01:11:38.560 |
In Rust, you wouldn't write it because it's ugly code. 01:11:48.000 |
So the bar of error handling in Go is exactly that bar. 01:11:58.320 |
And one of the consequences of this, like all the handling is local. 01:12:02.000 |
So it's very easy for the agent to understand what's going on. 01:12:07.920 |
Because it doesn't have to look through so many layers of abstraction, right? 01:12:11.200 |
It sees basically everything that's going on in this function is going on in this function. 01:12:18.720 |
It doesn't have to understand complicated error handling patterns elsewhere. 01:12:22.720 |
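To make the point concrete, here is a hypothetical handler in that style: every failure is handled right where it happens with an `if err != nil` (or a specific check) and an early return, so everything the handler does is visible in one function. The `boardNames` map stands in for the database and `serve` is only a demo helper; neither comes from the project.

```go
package main

import (
	"errors"
	"log"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

var errNotFound = errors.New("not found")

// boardNames stands in for the database layer in this sketch.
var boardNames = map[int64]string{1: "Test"}

func lookupBoard(id int64) (string, error) {
	name, ok := boardNames[id]
	if !ok {
		return "", errNotFound
	}
	return name, nil
}

// getBoard handles each error locally; nothing is deferred to a
// framework-level error mechanism the agent would have to go and find.
func getBoard(w http.ResponseWriter, r *http.Request) {
	id, err := strconv.ParseInt(strings.TrimPrefix(r.URL.Path, "/api/boards/"), 10, 64)
	if err != nil {
		http.Error(w, "invalid board id", http.StatusBadRequest)
		return
	}
	name, err := lookupBoard(id)
	if errors.Is(err, errNotFound) {
		http.Error(w, "board not found", http.StatusNotFound)
		return
	}
	if err != nil {
		log.Printf("lookup board %d: %v", id, err)
		http.Error(w, "internal server error", http.StatusInternalServerError)
		return
	}
	w.Write([]byte(name))
}

// serve runs one request through getBoard and returns status and body.
func serve(path string) (int, string) {
	rec := httptest.NewRecorder()
	getBoard(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}
```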
That's why Go is so good from a code writing perspective. 01:12:27.120 |
All of the meta shenanigans that this language has are pretty standard too. 01:12:34.160 |
Like there's not a lot of complexity you need to understand. 01:12:40.240 |
And the last part is if you run the Go tests, then it caches them. 01:12:48.160 |
With Go, you can basically tell it to run all the tests at all times. 01:12:56.720 |
Because it means it never accidentally tests too narrow. 01:12:59.440 |
So in Python, I have the problem that it tests one function only. 01:13:02.960 |
Because it explicitly only tests that function. 01:13:05.440 |
And it completely forgets that it five minutes ago broke another function. 01:13:10.000 |
And only at the very end, it discovers that it made a huge mess. 01:13:22.880 |
But I'm actually not sure if that is correct. 01:13:32.640 |
Well, I can click on something and nothing happens. 01:13:47.760 |
I will stay on the board page what's going on. 01:13:51.120 |
But we can in the meantime look at the code that it generated. 01:15:37.840 |
I think it's completely wrong on what it's trying to debug here. 01:15:42.000 |
But look, it's checking the route tree if it's regenerated. 01:17:38.240 |
So not only am I confused by TanStack Router. 01:17:40.800 |
It's also that the LLM is confused by TanStack Router. 01:18:09.120 |
So let's commit this and then try to make a test. 01:18:23.440 |
And maybe one last thing we could do is like actually add support for. 01:18:41.280 |
So once more, the frontend log to Claude Code is basically a frontend. 01:18:52.720 |
Like even the Cursor agent is nothing like this. 01:18:54.960 |
Like this is a completely different experience. 01:19:05.440 |
But the problem with tests is that agents are not very good at writing tests. 01:19:18.160 |
Actually, before we write a test, we will write a way to create posts. 01:19:26.800 |
I want you to add an internal function to create posts, 01:19:32.720 |
which we will then hook up to an API later, but we don't hook it up yet. 01:19:36.480 |
And we also want the function to create topics. 01:19:42.960 |
So that is basically creating a post plus a topic in one go. 01:19:50.320 |
And then I want you to write a singular test that creates... 01:19:59.040 |
Let's do these APIs and I will make a test plan here. 01:20:05.680 |
Because the thing with the test plan is that... 01:20:26.000 |
Because the way it wrote the test right now is it wrote it against the underlying SQLite code. 01:20:33.280 |
And the problem with this is that this doesn't have enough abstraction to allow you to... 01:20:41.440 |
The way I really like my code to work is that you can do something like this. 01:20:45.440 |
The way I like it is that you can write tests that... 01:20:55.680 |
Insert, insert, insert, insert, insert, but then the rollback. 01:21:08.880 |
Right, this here, for instance, it takes a SQLite DB. 01:21:48.720 |
Because my parameter here can be either a database... 01:21:58.480 |
We basically have to refactor the entire code base. 01:22:00.480 |
And there are just not amazing ways, I think, to do that. 01:22:21.120 |
And now for this to work with my intended rollback strategy, this also has to use savepoints. 01:22:33.200 |
So this is going to be the point where maybe we're going to ask Gemini. 01:22:58.960 |
Actually, I think that Sonnet might not be able to do this in a good way. 01:23:40.000 |
But the way we're doing database transactions right now doesn't work for how I want tests to work. 01:23:45.280 |
Please ultrathink how to re-architect the code to support this better. 01:24:08.720 |
But there's just one way to test databases and that's rollbacks. 01:24:26.960 |
At least I think there are no other questions. 01:25:05.600 |
The other thing is like, for this agentic coding 01:25:19.440 |
I don't quite work like I work normally because 01:25:26.000 |
But I also don't stay engaged with the agent as much as I do right now. 01:26:27.440 |
Does this actually work with nested transactions and savepoints correctly? 01:26:33.360 |
Because I don't think it works because it will have to 01:26:41.760 |
How do we distinguish between Claude and Gemini? 01:27:04.400 |
and creating architecture and back and forth for this. 01:27:09.840 |
It's not amazing, and Gemini, the model, is not very good at tool usage. 01:27:15.840 |
So, for the agentic loop, I still haven't found anything better than Sonnet and Opus. 01:27:22.720 |
But this is also why, and I mentioned this earlier, I use o3, and sometimes I use Gemini 01:27:29.280 |
to plan larger changes, and then I give the output of that to Sonnet. 01:27:33.920 |
And I just do the planning of larger changes in the ChatGPT UI or in AI Studio for the most part. 01:27:45.600 |
I tried OpenTelemetry and a bunch of other things. 01:27:48.320 |
It creates too much nonsense, too much output with all of the spans that it produces. 01:27:55.360 |
And it didn't work quite as well as just a simple thing of logging everything into one file. 01:28:09.440 |
And also, sorry, not Gemini: Claude just doesn't fully understand how OpenTelemetry works. 01:28:16.720 |
So, right now at least, simple log files work incredibly well. 01:28:28.800 |
I would actually love to see someone show how to use OTel successfully for agentic workflows. 01:28:37.680 |
Just, it didn't work for me is all I can say. 01:29:06.640 |
My current approach has fundamental flaws with nested transaction savepoints. 01:29:30.560 |
I, the problem is like, I know how I set up this to normally work. 01:29:35.840 |
And I don't know if the AI can actually one-shot this. 01:29:55.920 |
Okay, so it creates a SQLite memory database. 01:30:22.560 |
that it's now no longer going to give good code. 01:30:26.240 |
And it doesn't even have that much stuff in the context, 01:30:29.360 |
but it's already, it's already making mistakes that it doesn't do on a smaller context. 01:30:33.280 |
Like, it's, it's, it went too narrow on one specific problem. 01:30:38.880 |
And this is the point where I no longer expect good output from this, actually. 01:31:13.520 |
Uh, now it's, now it's turning into full slop. 01:31:27.440 |
So, what I will do now is I will, I will make a branch. 01:31:55.760 |
So, we're, we're going to, we're going to, we're going to go back to the drawing board here. 01:32:10.400 |
Um, this, this might be really, really awful. 01:32:23.440 |
So, the strategy now will be to unsloppify this and to get it to do something. 01:32:34.960 |
And, maybe the way of doing this will be to get the tests to run. 01:32:41.760 |
So, the board test, we're going to make this not terrible. 01:32:46.480 |
Um, so, run migrations for testing should just be run migrations. 01:33:12.320 |
So, um... 01:33:20.640 |
So, run migrations for testing does this instead. 01:33:36.240 |
Run all, add a test setup function, which takes a callback, 01:34:07.120 |
between test runs, so we don't waste quite as much. 01:34:13.120 |
Um, actually, I don't want to explain it, but I don't want to migrate all the time, basically. 01:34:23.440 |
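The test setup function being asked for could be sketched like this: migrations run once per test binary (guarded by `sync.Once`), and every test body receives its own transaction that is always rolled back afterwards. It is generic so it can be shown without a database; with `database/sql`, `Begin` would wrap `db.Begin()` and return `tx.Rollback`. Note that an in-memory SQLite database exists per connection, so the pool would likely need to be limited to one connection. `Harness` and its fields are hypothetical.

```go
package main

import "sync"

// Harness runs migrations once and wraps each test in a rolled-back
// transaction. T is the transaction type, for example *sql.Tx.
type Harness[T any] struct {
	Migrate func() error                    // applies the schema; called at most once
	Begin   func() (T, func() error, error) // opens a transaction and returns its rollback

	once       sync.Once
	migrateErr error
}

// Run hands fn a fresh transaction and always rolls it back afterwards, so
// no test can leak rows into the next one.
func (h *Harness[T]) Run(fn func(tx T) error) error {
	h.once.Do(func() { h.migrateErr = h.Migrate() })
	if h.migrateErr != nil {
		return h.migrateErr
	}
	tx, rollback, err := h.Begin()
	if err != nil {
		return err
	}
	defer rollback() // result ignored: the transaction is discarded either way
	return fn(tx)
}
```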
Um, and then we should test if this actually works, so create, let's review this. 01:34:36.480 |
Um, and so where, let's see what we have to begin, dot begin. 01:34:50.960 |
In transaction goal, libigo, migration, word test. 01:35:07.120 |
Tests are the worst because it doesn't understand how to create a test harness. 01:35:24.720 |
Um, this entire thing should go, this transaction manager. 01:35:50.960 |
But I would love it to at least set up the harness correctly. 01:35:56.800 |
Um, so here is, here's my best recommendation here at this moment. 01:36:08.880 |
Because it just doesn't understand how good tests should look like. 01:36:13.920 |
And I don't know what it says about us as programmers, but the way it sets up tests is just bad. 01:36:20.720 |
I can only assume that the bulk of people out there are writing horrible tests. 01:36:30.080 |
What, what it should actually do is we should set up like a really good, um, 01:36:36.160 |
Um, the, the, the, the pattern I like to use here is actually from Django. 01:36:41.520 |
They work quite well and they hide save points and transactions properly. 01:36:46.400 |
Get this in a good spot and only then start writing tests. 01:36:49.040 |
Because everything that is done here so far is really, really bad. 01:37:10.160 |
I might let this run, um, and actually set up the tests correctly in a way that I like. 01:37:15.920 |
And then I will show either in a future stream or just in another, um, like a video or, or just 01:37:23.040 |
like a follow-up post of how to run the tests. 01:37:25.120 |
Because I don't think we're going to get to a reasonable point in the next 20 minutes. 01:37:32.480 |
I, I gave myself an hour and a half and I'm already over time. 01:37:35.120 |
So maybe I will do two more minutes of last questions. 01:37:46.320 |
And then, uh, thank you so much for watching.