Agentic Coding with Claude Code

I think this should be live now. Okay, I have no idea if this works. This is an experiment. If you can hear me, write something into the stream chat, because I don't know. Actually, let me open the stream so there is some sort of live feedback on what I'm doing.

So yeah, I have created a small test project here to show a little bit of how I'm doing this. What I'm usually using is Claude Code. There are a bunch of other options too: there's OpenCode, there is Amp, and a few others.

What all of these have in common is that they currently run primarily on the command line, detached from your editor. So I'm actually using VS Code for the most part again. I used to use Cursor, but the way I'm working is that I have Claude running here and I have an editor here.

I use the editor to review, and primarily to make some small changes, right? So I have created a small project here, and I just want to show the agentic setup. So what am I doing for agents to work at all?

The most important part here is to have a CLAUDE.md file. This one is actually auto-generated with /init, and I just made some minor modifications to it. For almost every single project that I'm doing, I use a Makefile. In fact, I would say for 100% of the projects I'm doing, I have a Makefile.

The Makefile acts as the main entry point for the agent to run things. The agent, Claude Code in particular, will use bash, it will use Python, it will write some code, it will run the code, and it can dig itself out of a bunch of nasty situations this way.

But I want to steer it in certain directions, right? So what I usually have is a project overview. It just tells it what the project is. I don't actually know if it's necessary, since it will read the README too, but it gives it some context. And immediately after that I give it the commands that it should use.

The most important command here is make dev. And the reason this is so important is that I actually don't want the agent to run this command. I usually run it myself. It brings up all the services. This particular project has two: a front-end service and a back-end service.

I tell it here that this brings up the server, that it starts both the front end and the back end. I also tell it that it auto-reloads and auto-compiles, and that it should never stop the server. That's actually one of the first important things here, because it can do anything.

In particular, because I run it in YOLO mode and it will just do anything, I want to encourage it never to run in the wrong direction. It's kind of annoying to explain, but it can, for instance, stop the server and restart it. And I really don't want it to do that.

Okay. If the audio doesn't quite work, let me see if I can fix something here. I might have overdone this. Let me remove this here. Let me know if the audio fixes itself a little bit, if this is better.

I made the change here. Let me see if it works. But basically, I want it to stay on the path of most success. And for this to work, I have to juggle between me and the agent. What I want is for the dev environment to be in front of me.

I always want to see it, right? So I have Claude up here doing stuff, and I always have the dev server running and I can see what it's doing. If I, for instance, go to this website now and load it, well, so far nothing has happened because it doesn't log those requests.

But if I were to hit the backend service, let's say go to /api/health, I can see that a request was made, right? I want to see how the server runs. The server is always there for me. So that's the first, most important thing: I want the server to run in my terminal.

I don't want Claude to run it in the background. The next thing is that I want a consistent log of everything that is happening. I've talked about this a couple of times, but this basically allows me to see frontend and backend requests simultaneously, even though they come from different services, right?

One is the Vite server. The other one is my Go server. But I also want to give this visibility to the agent, right? So one of the tools in here is this make tail-log command. If I run it here, I basically see the same thing as here.

Why is this important? It is important because I want the agent to always understand how it establishes context. If I'm talking about a bug with the agent, I want the agent to understand what's going on, right? I will show in a minute how I'm doing this, but this is one of the reasons why this tail-log command is here.

And then the other commands are as you would expect: how to run the linter, how to run the formatter, how to run clean. That's just the entry point of the tools that it should use. And then here is really the most important part: I'm telling it where the log files are.

I want to prevent it from guessing a bunch of other ways to establish the context that it needs. Again, I reaffirm that this is the command it should use to read the log file. I also tell it to never stop the server, and that the server auto-compiles and auto-reloads, right?

That's the most important part. So how do I have the auto-reload and auto-compile set up? Well, there's a program called Foreman. I'm not using that. And yeah, I think the problem is I probably don't have the right setup here for the webcam.

This might be confusing, because I noticed earlier that the audio is desynced a little bit. I need to get the setup working better, but I hope it's not distracting. Otherwise I can just disable the webcam; maybe that makes it less awkward. I'll just turn it off for now because I think the webcam is trailing a little bit.

Okay. So I'm using a fork, or rather a re-implementation, of Foreman called shoreman. You can find it on the internet if you search for a Foreman clone. It's this one here. The reason I'm using it is that it is basically a very small shell script that does the same thing as Foreman, and I had to make some changes to it.

It was easier for me to make those changes in shoreman. So what does it do? I have a file here called Procfile. It says: this is the command to run for the front end, and this is the command to run for the backend. The front end is a basic Vite application, which auto-reloads all the time, right?

So as I'm making changes to the front end, they automatically appear because it recompiles. And I have set up something very similar for the backend. I basically told it to use this watchexec tool, this one here. All it does is watch these files, specifically Go and SQL files, recursively across this entire folder, and then it runs this go run command to recompile.

So when I make a change here on my server, let's say, I think this file might actually be unused at the moment, so let me completely delete it. I don't want to use it. Right, it's gone, and the server has recompiled. That's all there is to it.
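To make the "watch Go and SQL files recursively, then rerun" idea concrete, here is a minimal sketch in Python. It is not how watchexec works internally (real watchers usually rely on OS file notifications rather than polling), and the function names are made up for illustration. It only shows the change-detection step: take a snapshot of matching files, take another one later, and compare.

```python
import os
import tempfile


def snapshot(root, extensions=(".go", ".sql")):
    """Map every matching file under root (recursively) to its modification time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.stat(path).st_mtime_ns
    return mtimes


def changed_paths(before, after):
    """Return paths that were added, removed, or modified between two snapshots."""
    return sorted(
        path
        for path in before.keys() | after.keys()
        if before.get(path) != after.get(path)
    )
```

A supervising loop would call snapshot periodically and restart the go run command whenever changed_paths returns a non-empty list. Deleting a file, as in the demo above, counts as a change too.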

Back to this. So I have this very basic setup where at least these services log into one place. What is not set up yet in this project is that the front end also logs into it.

So if I go here and issue a console.log of whatever, I don't see it here, right? That's actually one of the changes that I want to make, and I want to show why this is useful. So that's the next thing I want to set up: getting the browser to log into the server log as well.
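One way to picture forwarding browser console output into the shared log: the page sends each console entry to a dev-only endpoint, and the server appends it to the same log file as everything else. The sketch below shows only the server-side formatting step in Python. The payload shape (a JSON object with "level" and "args"), the "[browser]" prefix, and both function names are assumptions for illustration; the actual plugin may work quite differently.

```python
import json
import os
import tempfile


def format_browser_entry(payload):
    """Turn one forwarded console entry (a JSON string) into a single log line."""
    entry = json.loads(payload)
    level = str(entry.get("level", "log")).upper()
    # console.log accepts several arguments; join them the way a terminal would show them.
    message = " ".join(str(arg) for arg in entry.get("args", []))
    return f"[browser] {level}: {message}"


def append_to_dev_log(log_path, payload):
    """Append the formatted entry to the shared dev log."""
    with open(log_path, "a") as log:
        log.write(format_browser_entry(payload) + "\n")
```

With something like this in place, a front-end error lands next to the backend request that caused it, so the agent only has one place to look.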

I want to say one last thing about the make command here. I give it instructions in the CLAUDE.md file, right? I'm telling it, these are the commands that you should run. There's some extra stuff down here about how I want the project to be structured. But this alone is not enough.

It is not enough to get it to work reliably and correctly, right? The reason I made changes to shoreman is that I discovered, and it doesn't take long to discover this, that the tools work better if their error messages are more descriptive about what actually happened.

Let's put it this way. One of the things that I changed in shoreman is that when it runs, it writes this file called shoreman.pid, which basically tracks the master process running as a shell script. And if I run it a second time, shoreman checks if it's already running, and then errors.

But many tools do that, right? Many tools error when they're already running. What I changed here is that I error in a way where, if the agent reads the error, it is more likely to understand what happened. Every once in a while, say, the agent tries to do an HTTP request, but it makes the request right at the moment the server restarts.

Then it might conclude, oh, the server's not running, and it gets the idea to run make dev. But the reality is that the server was actually running. So when the agent now goes in and intentionally tries to start the server, it gets a slightly different error message than it would otherwise get, right?

It now gets the error message: services are already running, that's good, we auto-reload, no need to do anything. So it reinforces to the agent that it shouldn't stop the server and it shouldn't kill a bunch of processes. Because otherwise the agent will start killing a bunch of stuff and restart it.

And I don't want that. I want this service to be very reliable. And when the agent tries to start it again, I want it to get exactly the error message that keeps it from going off the beaten path. If it does accidentally run it, it will now realize, oh, it's actually running.

And you can see this, right? If I tell the agent, I want you to start the dev services, it will run make dev. It gets an error, but the error says no action needed. This works better than if it just errors out and says shoreman is already running, or whatever the default is.
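The pid-file check with an agent-friendly message can be sketched roughly like this. shoreman itself is a shell script, so this Python version is only an illustration of the idea; the exact message wording and the function names are assumptions, and the liveness check is POSIX-only.

```python
import os
import tempfile

PID_FILE = "shoreman.pid"  # file name from the talk; its location is an assumption


def process_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def startup_check(pid_file=PID_FILE):
    """Return (ok, message); ok is False when another instance is already running."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return True, "starting services"
    if process_alive(pid):
        # Hypothetical wording: say what is going on AND what to do about it.
        return False, (
            "error: services are already running. "
            "They auto-reload on file changes, so no action is needed. "
            "Do not stop or restart them."
        )
    return True, "stale pid file found, starting services"
```

The design choice is in the message rather than the check: a bare "already running" leaves the agent to guess the next step, while spelling out "no action is needed" tells it to carry on with its actual task.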

So getting better error messages specifically for the agentic loop is one key part here. The second thing is, of course, that I changed shoreman to write these log files correctly. We could look into shoreman, but I can actually show this in a different way.

There's a dev log. This is the one that it writes. And as you can see, it only contains the messages from the current run. If I restart this here, it starts out fresh. This is one of the changes that I found to work really, really well: I'm hiding information that the agent will not need. Otherwise, with an ever-growing log file, it sometimes picks up new work and sees unrelated output from yesterday, for instance.
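A minimal sketch of the "fresh log per run" behavior, assuming a single log file and a per-line service prefix. The function names and the "[service]" prefix format are hypothetical, and the real change lives in a shell script rather than in Python.

```python
import os
import tempfile


def open_run_log(path):
    """Open the log for a new run, discarding anything left from earlier runs."""
    # Mode "w" truncates the file. buffering=1 means line buffering, so someone
    # tailing the file sees each line as soon as it is written.
    return open(path, "w", buffering=1)


def write_line(log, service, message):
    """Prefix each line with the service name so interleaved output stays readable."""
    log.write(f"[{service}] {message}\n")
```

Truncating on startup trades away history for relevance: whatever the agent reads from the log is guaranteed to come from the processes that are running right now.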

So that's one of the other changes that I landed in shoreman. The other thing, since there was already a question about Docker Compose: I do not actually use Docker Compose. I don't use any Docker here; this is all running on my machine.

If I were to put anything into Docker, then maybe I would care about Docker Compose, but this is just the most basic setup that I can have. Okay, so let's make a change. Let's put the console forward plugin in. The reason I want this plugin is that it generally makes iteration much easier.

I just haven't set it up here yet. So that is this plugin, and I want the agent to set it up. Let's say: please set up this plugin via npm. My English just sucks, so it doesn't understand me. Okay, let's see if it can do that.

It should hopefully read this. I can optimize a lot of these things, right? I probably also could have done this manually; I just want to see if it works. One of the things that slows Claude down a lot right now, for instance, is that it always tries to figure out from scratch what we're using here.

So this might be one of the things we should put into this project: that we're using npm. I didn't write this yet, so we can do it here. We always use npm. In theory this should prevent it from using pnpm or something else that it might otherwise pick.

It will only pick up on that when we start from scratch, but for future iterations maybe it will improve things slightly. Another thing: it probably read the instructions incorrectly. I noticed the other day that I might not be documenting this correctly, or maybe I am, but I think I've done this once before and it always imports this incorrectly.

So I will manually fix this now, because it should actually look like this. In theory the plugin should work now. I noticed before that it always gets this wrong. But what should happen now is that if we log an error here, we see it in the log, right?

That's what we want to accomplish here. The whole point is that in future iterations, issues we run into on the front end will also show up in the same log. So now we have at least this running. And let's see, there are some changes here.

Let's do this update. What do I want to say here? Set up console forward plugin and remove old pagination. I got very lazy and use a lot of dictation now. Okay, so let's try to write some code here, right? That's really what we're here for.

How can we make some changes to this? I actually don't find going agentic from the very start to work particularly well. I did bootstrap this with Claude Code, but I already made some changes so that it has an infrastructure that I like. In particular, at the very least, I picked the web framework, or rather the router, that I want to use.

I set up some utilities in here to respond with errors. It's just the most basic kind of infrastructure that I want to use for building our API. The second thing that I did is create a plan, and this is basically all the things that I want to implement here.

I want to build a small bulletin board that's modeled mostly after phpBB, but also 4chan. So I don't want to have user authentication. The idea is that I'm using what's called tripcodes to authenticate, and admins can create boards. So I have this whole plan here that I want to implement.

I will not tell it to do this in one go. Instead, I want it to look at the plan and tell me if it needs something else. So: I created a plan in plan.md. I want you to ultrathink about it and see if there are omissions in the plan that we need to fill in.

It will now read this file. Ultrathink is basically a hard-coded keyword in Claude Code that extends the thinking budget, so it will use more tokens to reason. One question that came up is which dictation tool I'm using. I'm using two different ones.

I'm trying Flow. The other one I'm using is called VoiceInk. I use them both for different things; I'm just trialing different things right now. That's basically the answer to that. The reason I want it to read through my plan is that it's actually quite good at telling me if there are omissions, which will help it later.

I don't usually do this with Claude. Instead, what I usually do is copy-paste the entire thing into o3. So let's do this here: I have a plan here. I want you to think hard about this plan and tell me if there are omissions that we should look into before we implement it.

Let's paste the plan and see what o3 is doing. So let's see what it came up with. Several omissions. Admin authentication: how admin privileges are granted and verified. That's actually a good point; we didn't mention that. So let's do both here. Actually, we already have a section up here.

So: admins can [unclear] reports, and admin permissions are hardcoded in [unclear]. Okay, we don't really have authentication. Maybe we just use very, very basic HTTP basic auth. Well, we'll see. Most of this we will not actually do. This doesn't really matter. The indexes, and a lot of this stuff, it will figure out along the way anyway.

Mostly I want to see if there is some very clear omission that we should clarify. So far this looks good. Let's see if this came up with something. Thank you for giving me two. Do I have to pick one?

I just want to quickly look through. Okay, this one tells me that there are no deletions in the plan. That's a good point, but we will not do this for now. Okay. So far, if we go to this bulletin board, there's nothing, right? We haven't set up anything yet.

I get a warning here that we're using outdated packages for the dev tools. I will leave this for now. I just don't want to spend too much stream time on the wrong things. The idea is basically that we are going to implement a feature now.

The only endpoint that I have right now is the health check endpoint, so we don't have much. We don't have any database code yet, other than setting up the database once in the server. I think it's here somewhere. So we want one API endpoint, and I think it's best to start with listing all the boards that exist.

And because you cannot create a board right now, since there's no admin panel, we're just going to hard-code a bunch of boards in the database. Here we have migrations. So we ask it to make a new migration with two boards that it can just make up.

And then we're going to list the boards: I want you to make an API endpoint that lists all the boards that exist. Because we do not yet have an API to create boards, I want you to make a migration and create two default boards, one called general and one called water cooler.

Let's see if this is good enough. I think I already wrote down what the API responses should largely look like. Yeah, so let's add this: each board in the response should contain the most recent topic and the most recent post in addition to just the title and description.

Okay, let's see what it does, and let's see what the questions are in the meantime. Have I tried using Claude to generate a file mapping, and using VoiceInk's AI post-processing for prompt generation? I have tried that. A lot of the things I'm doing at the moment are basically judged by whether they actually make anything work better.

I know that a lot of these AI tools can do quite impressive things, but very often that doesn't make me any more productive. So I don't really like using VoiceInk or something like it to generate prompts that then feed another prompt. I would much rather have commands set up.

But yeah, I haven't tried that so much. So it came up with a migration here for the defaults, and it will run it. One of the things you will notice is that in this project, and in fact in all of the code I'm writing now, I'm asking it to write custom SQL.

I do not use an ORM. This is really because I always liked writing SQL manually. In fact, I really just like SQL. It's not that I enjoy writing it, but I like having as little indirection as possible between me and the database. The main reason I don't do it as much when I'm not using agentic coding is that it is annoying to write SQL, but now I have a machine write it for me.

To me this beats having another layer of indirection in place. So let's see what it does here. I think it already does some things I don't like, but let's see. Most likely what it's going to spit out is code I don't like. And rather than letting it make more code like this, I want to stick with the initial version.

I want to fix it up, because the more code exists that looks like what I want, the more likely it is that future generations of API code will fit into this. That's sort of the idea. And as you know, I basically gave it all the permissions.

I just let it write. I don't do anything here; I just let it go. It has all the permissions to do everything on the system, which in parts could be a terrible idea, but seemingly Claude Code does really well. Right, so it managed to call the API.

It sees that there is a response coming back from the API, so it is working. We can also go to the browser now and test this. I think it called it boards. And so we see that there is a board, and it actually has test posts in it.

I'm assuming it has test posts in it because it just went to the database and created some. That's my guess. I didn't actually see where it did that, but this might be an interesting moment to look into the database. So we have a database here called miniDB.

This was empty when we started earlier, and it has created some posts here. I wonder when it created them; I didn't look. Did it make a test? Let's quickly check which files we have here.

It must have created these manually, but at which point? It created some handlers. When did it create the posts? This is one of the reasons why the terminal interface is not very great: I don't have search here, so I have to quickly scroll through this and see.

I don't actually know when it made the posts, but it clearly created some content in the database. We'll just leave it for now. This works well enough. Let's check the changes, right? So now we can see how I do that.

I know which files were changed, because they're all marked as modified. So we have a new route here, boards. This is okay, it's fine. I have a list boards handler, and all of this is new. We only had the health check before, and now we have this.

It calls this get boards function, which is down here. I really don't like this. It should not do this: all the database code should go into a separate module. So let's start with this. We need some changes: all the database queries should go into models/boards.go.

Actually, we'll do boards.go. So that's the first thing we want: this should go somewhere else. The boards response is okay. It will be a list of boards, and each board will be a database model. But this kind of thing here will be weird, because I want the model to represent a single row only.

So the models should only represent a single row, not any joined records. That means we need to figure out how best to query the board with this data attached. What is it doing here anyway? It's running another query. So this is an N+1 query anyway.

That's probably good enough for now. So let's just say that it should move this over there. The topic code should go into models/topic.go. And then we have the post; the post code should go somewhere else. Let's just see if it manages to refactor this a little bit.

Then we'll see from there what we need to do. "Does the Claude Code Visual Studio Code extension work if you're outside of the Visual Studio Code terminal?" Yes. If the integration is set up correctly, it works even if Claude is running on the side. I can also start Claude in here, but I don't really like it.

I prefer this terminal on the outside. But yeah, these changes still show up, although I think that actually comes from the Git plugin. We'll see. What are the other questions? Maybe I should explain this, because I didn't: I type claude-yolo here.

It's just an alias for this impossible-to-pronounce argument called --dangerously-skip-permissions. Basically I run this all the time. Is it a good idea? I don't know. I'm not strongly advocating for it, but I can tell you that I'm using it this way all the time.

So that's why it doesn't ask me for anything. It just edits. What are my thoughts on Gemini CLI? I will reevaluate it. Last time I was using it, the problem was basically that any model other than the Anthropic family of models is not overly amazing at tool usage.

I want to see that these agentic loops work, so that's why I'm playing with it. I have the most success with Claude. I also think that Claude is the cheapest option, because the 100 euro, sorry, 100 dollars a month package in Sonnet-only mode is enough.

It's kind of hard to beat for the price right now. I don't know how long this price is going to stick, but that's really why I'm not trying Gemini much. I have Gemini on the system, and I sometimes give Claude access to Gemini to read through a code base. But I'm going to get this working well first, and then I will try other tools again.

I also tried Amp, and a bunch of other ones, but I think this is the one with the highest chance of sticking around, in part because the people who write the tool are also the people who create the model.

So they go hand in hand. Okay. Now we have a board.go with a function to get all boards. This looks quite a bit better. We don't need pagination here because we don't expect that many boards, so that will be fine. It now uses scan to fill this, and we have a board by ID.

This is also quite okay. And the board by slug. I am quite okay with all of these. One consequence is that all of these methods can now return nil or a board. So if the board doesn't exist, it returns nil. Do I like this?

I don't know. Then we have this most recent post by board ID. Okay. One thing that for sure is not amazing is that it looks like the board doesn't have a pointer to the most recent topic, as opposed to the most recent post.

Maybe this is okay. I will not judge the database structure too much right now. I think we can stick with this. In theory, if we go here now, it should look more or less the same. So we have the most recent topic and the most recent post.

Let's actually remove the most recent post from the API, because I don't think we need it. I wonder what [unclear] here. Okay, let's leave it for now, but I think we will throw it away. What I usually do when I program like this is create a to-do file for myself, where I basically keep track of all the stuff that I still need to do.

So one item is, I'll call this nets: we should remove the most recent post from the board listing. Okay. We'll think about this later. Let's have a look at what the API response looks like so far.

So we have a list boards route, which is hooked up to the router. It creates this boards response and then responds with JSON. What it calls is get boards with recent, which now gets all the boards and then gets the most recent topic and post.

Not overly amazing, but kind of okay. One of the things I do not like, though, is this part here: it calls http.Error. We have this utility here called internal server error, so we'll actually use that. We'll call utils.InternalServerError with w and err.

Then we remove the other one. We want to do this, and hopefully going forward it will actually start using this utility instead. Why do I want to use this utility? For one, because it logs the error and it returns a standardized message. So that's why I want this.

And then this is okay. Board with recent is an extended struct that has the board in it plus the extra things here. So this is okay. Let's say we commit this; the rest we'll leave for later. Let's say: Added basic board API response.

The next thing we want to do is hook up the front end, right? If we go here, we don't see anything. So let's say we want the boards to show up: I want you to change the front end to show all the boards.

For now, I want you to make sure that we create components for each row in the listing so that we can reuse them later. These rows should be reused for topics in a board as well as for the board listing overall. We might need a parameter to change the... actually, I don't want this.

Let me do this differently. I want you to now show all the boards with the most recent topic on the index page. The problem is that whenever it creates a front end from nothing, it turns into a mess.

Since there's basically no real front end yet, this might get incredibly messy. And I'm a little bit afraid that it doesn't even manage to set up the router. So I'll see what it does; we can watch it. In the meantime, I can look at the questions here.

Yeah. So for how to put the browser logs in the terminal: I used a Vite plugin that I wrote. You can also do this yourself with an API endpoint. The Vite plugin is this one here, and this is what it does. Another question is about the font. The font I'm using is MonoLisa.

I use it all the time, I think. This one here; that's the font. What other questions? One question is: if you manually edit the code like that, do you have the problem that the model has the unedited versions in its context? And yes, this is a problem. It will recall things that you have already thrown away.

This is actually a pretty big problem, and it is one of the reasons why I clear the context all the time. The same problem, by the way, also comes up with code formatting. Quite often the linter and the formatter edit the file in a certain way, and sometimes they go back and forth.

I don't have a good solution for this, but it is a problem. I can't really recommend anything here other than that I make these commits and then clear the context. Sometimes I maintain a to-do list. Before I run out of context, for instance, I tell the agent to summarize everything that it did into a file, and I can look at this file later and continue from there.

So let's see what it did. It has probably created something here. This is actually interesting: it has not managed to get this running. So now we can hopefully see our tooling come in handy. When I navigate to the page, I get a bunch of errors.

Please check the log and see what's going on. It should now read the log, which it does, and hopefully see what it broke. Okay, cool, it managed. It probably wouldn't have needed the log, but having the log means that it was able to just go back there and figure this out.

And at least we have something now. Not that I like how this looks at the moment; you can't even click on anything. But we see something. Let's make two changes here. I want these to be rows, one below the other, not next to each other.

And I also want to not show the most recent post. I only want to show the most recent topic. So I just want to make this change and then we're going to figure out how to make it less crappy. Um, cause it probably doesn't look very nice. Um, the way I do front end code at the moment is I let it write a whole bunch of stuff and then I ask it to extract components that usually sort of works.

But front-end code, unfortunately, turns out to get very sloppy very quickly. Okay, so this at least is getting somewhere. Let's see what it wrote. So it created an index route. So we're already sort of done. So I say: "Please lint everything."

This is, it's already going to be annoying because it clearly left a bunch of nonsense behind. And so the linter will immediately complain, hopefully that, um, there's unused stuff. So let's see. Um, by the way, in this project, I'm not using any hooks. Um, I do use some hooks in other ones, but I want to start with the basics here.

Uh, okay. So we, we got rid of some unused stuff. Um, I don't know what page is this. We're throwing this away for now. And then it created this API.ts. And this is already messy. I don't, this is already too big. So the API client, I'm actually okay with, it can leave that, but I don't like the types on the same file.

So let's do this: "Move the types from api.ts into a separate file." Actually, everything other than the API client itself. Let's see. I just kind of want to move this out. So the types are here now. The API is here.

One of the most important things is to make sure that the files don't grow too large. The larger the files, the harder it is for the, for the system to work with it. Um, so this is, this is, this is okay for now. So we're going to just have not the nicest thing here.

I'm going to manually remove this welcome thing, which I think, where do we have this? Where is this? Let's see here. Um, let's throw all of this away. So we have only the boards. Um, okay. So we have a starting point. The frontend so far, probably a little bit messy, but, um, initial display of the boards in the frontend.

The problem immediately here now is going to be that, um, we don't have the router. We have the router set up, but we don't have query set up. So I think it uses, no, it does use query. Okay. It does use query, which is, that's good. Um, then it uses this get boards function.

It might be okay. Oh, well, we'll see. We'll see how messy it gets as we continue. Um, okay. So what should we do next? I think next we're going to show each individual board. So the next thing we need to do is we need to create these boards. So let's do this.

Um, um, I wonder if I should continue the session or not. Maybe we'll continue the session. Might be a bad idea, but might actually help. Now, please add an endpoint to show all the topics in one board. We will also actually, no, I will, I will, I will do it from scratch here because I want to set up pagination.

Now we need an API endpoint to list all the topics in a board. Note for this, we will need a pagination helper. We want to use cursor-based pagination. That means not offset, but to continue from a specific starting point. And we want to take the cursor to continue to the next page from the URL parameter, because it's going to be a get request.

It's actually going to be a shitty user experience, because it means you can't jump to a specific page. So we will not use cursors here. I'm going to use offset-based pagination. We want to take the page and per page parameters from the URL, because it's going to be a GET request.

Default per page 250. Um, also return the total number of pages that exist. Don't hook this up to the front end yet. Okay. In the meantime, I have more questions. I do not use compact at all. Never, ever use compact. If you run out of context, compact is basically a command that just screws up everything.

Um, I don't know what happens if you compact. It's going to be a gamble. It's, it is already random enough. It happens out of the box, but I never, never, never run compact. Instead, if I notice that I'm running out of context, I'm asking Claude to summarize what it did into a markdown file.

I review the markdown file, start a new session, and then read it back from the markdown file. Because then, at the very least, I know what it pulls into the context. With compact, I have no idea what it does. I don't think the tool even shows you what it did after compacting.

So it's a gamble. It's a pure gamble. Never do that, basically. As for auto-compact, I'd rather have Claude stop than auto-compact, because it's just so random. I also only use Sonnet. I kind of wish I could use Opus more, but even with a $200 subscription, I run out of Opus.

So I just stick to Sonnet, and I use o3 for planning a lot: I take what I'm working on, copy-paste that into ChatGPT, pick o3, and have a conversation about my architecture there. I can maybe show this later.

I kind of want to hook up the trip codes to, to post something and then we'll see, um, how well this works. So what I should have received here now in theory is, uh, what did we get here? We got the pagination. So these are the parameters for the pagination.

We get an offset. Interesting. Why do we have an offset? Um, okay. So we're pulling this from the query page and per page, the offset is calculated. Um, and then this pagination meta is probably used in the API response. So what did it do? So it lists the topics and this is a list of topics and the pagination meta.
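To make the offset calculation concrete, here is a minimal sketch of offset-based pagination. The project's backend is Go; Python is used here only to keep the example short. The names `parse_page_params`, `paginate`, and `Pagination`, as well as the default and maximum page sizes, are assumptions for illustration and not what the agent generated.

```python
import math
from dataclasses import dataclass


@dataclass
class Pagination:
    page: int
    per_page: int
    total: int
    total_pages: int


def parse_page_params(query, default_per_page=10, max_per_page=250):
    """Read `page` and `per_page` from URL query parameters, with safe fallbacks."""
    def to_int(value, fallback):
        try:
            return int(value)
        except (TypeError, ValueError):
            return fallback

    page = max(1, to_int(query.get("page"), 1))
    per_page = to_int(query.get("per_page"), default_per_page)
    per_page = min(max_per_page, max(1, per_page))
    return page, per_page


def paginate(total, page, per_page):
    """Return the SQL OFFSET for this page plus the metadata for the response."""
    offset = (page - 1) * per_page
    total_pages = math.ceil(total / per_page)
    return offset, Pagination(page, per_page, total, total_pages)
```

The offset then goes into a `LIMIT ? OFFSET ?` query. Requesting a page past the end is not an error in this sketch: it simply produces an offset beyond the last row, so the list comes back empty.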

Do I like this? I don't know if I like this. We'll figure out whether I like this. I think I might want to rename this to pagination, but maybe this is good enough.

Uh, so let's see. So in theory, there should be an API now for me to hit, uh, API boards, which one test. This has stuff in it. Uh, what the slash? No. What's, what's the API? Um, board. What's a board? Board ID. Is it a board ID or is it a board slug?

Let's try. What was the other API? So that's boards. Uh, board ID one is a test board. You get something here? No, we don't get anything. Oh, slash topics. With this. Okay. Board not found. So we need a test probably. Okay. Um, So we get this total one, total pages one per page 10.

So if we do, uh, page equals two, we get an empty list. Three empty list. Okay. This, this is okay. Um, I do want to change the meta to pagination though. So I think we're just going to do it manually. I'm going to call this pagination. Um, okay. So that's up.

"I changed meta to pagination." So I just give it this context immediately, interrupting it, so that hopefully it doesn't get confused later by a manual edit. There are some questions; I will quickly go to them. Have I used Context7? Yes. I don't have good experiences with it.

I don't use any MCP servers other than Playwright, and I try to not use Playwright either. Then the other question: yeah, in general, I don't like it looking up docs. I'd much rather give it the docs myself if it needs them. I'm very conservative on context usage.

I don't like any tools that pull anything in automatically. I optimize everything for low context usage. Um, I want you to now register a URL for the board. So if a user goes to slash b slash slug, then we will show the most recent topics there and add a pagination for previous and next page and a basic overview of how many pages exist and a quick jump to a particular page.

Did manage, um, the like slash b. What do we have here? Slash slash test. Maybe just do, um, yeah, let's do a slash b slash slug. I like this. Um, and show the, so the topics there. Most recent first. Um, and this is good. Note, we can always rely on monotonic increasing primary key integers for board order.

Because we use SQLite here, and I want to avoid... it's using dates right now. Also, please use the Link component to link from the index page to the board. Let's see. Okay, let's see if it manages. Maybe in the meantime, there are some questions. Again, there was another question about what I use for speech to text.

Right now this is using Wispr Flow. I also use VoiceInk, which is open source. Both of them work. I'm just trialing Wispr Flow this week; normally I use VoiceInk. They're give or take equivalent on a Mac. Are there any other questions that I can answer in the meantime?

Because I'm pretty sure this is going to take like four or five minutes for it to produce something reasonable. Um, Earlier there was a question if I'm streaming, this is the first stream I've been doing in three years probably. So we'll see if I will do this again, but, um, yeah, that's kind of lazy.

Yeah. So how do I write the logs automatically to the dev log file? This is what I'm doing with shoreman. So let's do this in the meantime: let's put this on GitHub. Then you can sort of steal the shoreman fork that I have. I don't want to put the whole thing up there, mini db.

Let's make a repo mini db. Create new repository. Mini db. This is right through that, I put it in port or screen first. Bum, bum, create repo. And then put this up there. So in here, in scripts, this is my shoreman fork. Now that this is on GitHub, I should probably make sure that the license is attributed correctly.

Because I did not put this in. Let's edit this quickly, shoreman. Oh. It's always kind of funny if you have a bunch of black pullet stuff. Because you cannot really copyright any of this stuff. Very little. There's going to be a bunch of court cases. "Added license to shoreman."

Okay. But yeah, shoreman is what I'm using for the logs. Hey, it's this. Okay. So, let's see. Do we have a frontend now? So, I can click on this and I get not found. So, clearly it doesn't work. "I navigated to a board and it didn't work.

Check the logs. Right. And this is again why the unified logging is so functional. It sees my browser logs, right? So, it doesn't just see the server. So, it should hopefully figure out what it did wrong. Um, I don't even know what happened. Because I, I'm assuming that this wasn't too wrong that it did, right?

Because there's a B board TSX. This should work. But maybe it doesn't. So, hold on. Could it be that we don't run the TanStack plugin for Vite? That would be a problem, because the plugin for TanStack should do all of this.

Um. So, you probably don't have this plugin in there. Does it work now? Not so far. Ah, because I didn't plug it in yet. I actually think that this is. Ah, ah, look at this. But isn't this nice? I didn't even have to figure out what's going on. Like, okay, I did give it a hint that it has to set up the plugin.

But, I mean, I love this. This is just so nice. So, here you see one of the problems, right? It was in the wrong folder, so it couldn't figure out how to run the make tail. And then it immediately went for the log itself.

And that's one of the big problems right now. Why I'm so careful about giving it, um, the right context. Because it actually went the wrong way. It still managed to succeed. But it should really have cd'd into the right folder. And run make tail log there. And it just didn't do it.

And this is basically contributing to context rot, right? Now it has remembered that this didn't work, but this worked. And it shouldn't, right? It should not make these mistakes. And I'm trying to nudge it in the right direction. So one of the things I can try here now is that, on the make command, maybe we can...

Um, honestly, I, I only have partially good results with this. If you fail to run the make file, um, you have to remember that you have to run it from top level. Let's put it here. Um, the make commands. Right, so, so maybe this will, oops, maybe this will nudge it in the right direction.

But I can't guarantee it. But still, I mean, pretty cool. Okay, so we have this now. So the board roughly works. So let's double check quickly what it did. So we have... this is auto-generated. We don't care. Now we have here a link. So we can check this again.

But it added the Link component as instructed. It goes to the board. And the board itself has pagination somewhere here, but we don't really see it, because we don't have enough topics. And now let's do this. Now let's be creative. To test this better, I want you to generate 120 different posts across 10 different topics.

And put them into all the different boards that exist already. To make this easier, please write yourself a little test script. Um, and just... Actually. Do I want it to write a test script or figure it out itself? Um, just use Python for this. Use UV. And put it into scripts.

We might need this later again. Right. So, um, and basically, um, actually hold on. One, one last thing. Um, but please use inference to generate a bunch of real sounding conversations. And pipe them into an input file that this script will then use. Okay. So, basically, I want to get it into a situation where it now generates me out an entire board.

So I can test this better. Mario, since you're writing: back to Wispr Flow. I'm trialing them both. But the problem with VoiceInk at the moment is that the AI integration just adds too much latency, and I want a little bit of fix-up. So for the screencast, I opted for Wispr Flow.

It's all about latency for me, and Wispr Flow is just the lowest-latency thing I found. So that's really why. For me, one of the really big benefits of agentic coding is actually test data generation. Because I struggle a lot with traditional applications in that all of my test data just doesn't look good enough.

And now you can just get an LLM to create pretty good-looking test data for you. It makes it much easier to see the product, to feel the product, and to experience what it looks like. So that's just such a nice aspect of it.
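A minimal sketch of what such a populate script could look like, assuming the LLM-generated conversations were saved as a JSON list of topics. The table and column names, the JSON shape, and the function names are hypothetical; the script the agent actually wrote is not shown in the stream.

```python
import json
import sqlite3


def populate(conn, topics):
    """Insert generated topics and posts into boards that already exist.

    Each topic is assumed to look like:
    {"board": "<slug>", "title": "...", "posts": [{"author": "...", "body": "..."}]}
    """
    posts_inserted = 0
    with conn:  # one transaction: commit on success, roll back on any error
        for topic in topics:
            board = conn.execute(
                "SELECT id FROM boards WHERE slug = ?", (topic["board"],)
            ).fetchone()
            if board is None:
                raise ValueError(f"unknown board: {topic['board']}")
            topic_id = conn.execute(
                "INSERT INTO topics (board_id, title) VALUES (?, ?)",
                (board[0], topic["title"]),
            ).lastrowid
            for post in topic["posts"]:
                conn.execute(
                    "INSERT INTO posts (topic_id, author, body) VALUES (?, ?, ?)",
                    (topic_id, post["author"], post["body"]),
                )
                posts_inserted += 1
    return posts_inserted


def populate_from_file(db_path, json_path):
    """Load the generated conversations from a JSON file and insert them."""
    with open(json_path, encoding="utf-8") as fh:
        topics = json.load(fh)
    conn = sqlite3.connect(db_path)
    try:
        return populate(conn, topics)
    finally:
        conn.close()
```

Keeping the generated text in a separate input file means the inference step runs once, and the script can be re-run against a fresh database later.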

This is going to take a while, so maybe we go to questions. "Try using tmux for better log tailing and running servers." I don't know. I like what I have. Works well enough for me. What else is here? How do you disallow MCPs? I just don't load MCPs into my context in the first place.

One question is, do you have any experience with the amount of usage you get out of a $20 Claude subscription? "I don't know" is the short answer. I think that you don't get that much out of it, but I'm not sure. You can try it and see.

I can tell you that with a $100 Claude subscription, if you only use one agent at a time with Sonnet, you're not going to hit the limits. With two or three simultaneously, you can hit the limits. With the $200 subscription on Sonnet, I don't think you can run into the limits.

I don't think it's possible. Uh, but with the $20 one, I'm pretty sure that you can run out very quickly. Um... Do you use the plan mode? So because I use dangerously bypassing permissions, I don't really use the plan mode explicitly. And the problem for this is that it actually disables a bunch of things.

So when it plans, it permanently asks for permissions for all the tools. So I basically ask it to plan without plan mode. Because planning, as far as I can tell, at least in parts, activates just through prompting. But that's really why I don't use the plan mode.

And that's sort of the answer. So... it's still generating. Yeah, I think there are probably like 10 or 15 different pretty decent voice-to-text things at the moment, for all kinds of different setups. And I think it's a little bit ridiculous to pay for Wispr Flow. I don't really like that, because...

The magic is happening on device anyways, on the Whisper model, which is the open source one. So, yeah, I hope we just get to the point where something like Wispr Flow, in an open source way, becomes a thing that everybody contributes to.

Okay. So it's now generating words. Cool. So look at this. I have... We have stuff to look at. Is it not nice? It just auto generates all of it. Nice. Best setup for home office. Coffee setup for home office. Look at this. Um... Cool. So... We have... We have content.

Which is cool. And it wrote me this little script here to populate the forum, right? I will not even look at the script. Don't have to. I don't care. It did its job. So what I will do now is, I will commit this. First we do...

Web. And we'll format this quickly. And then we check in: "Added board listing." No, "topic listing in boards." And now we add the scripts: "Added populator script." So now that we have this, we can do one last thing. Actually, I should probably have...

Checked that we had here. Um... Anyways. It doesn't matter. Next thing is we're going to enumerate the topics. Now I want you to make... Um... Um... A way to look at all the topics. So basically we are going to add an endpoint to see all the posts on a topic.

With pagination. Same general API flow as we had for the... Board index page. And we also want to add the front-end component and the front-end page to... Show that too. And again also support pagination. I don't know if this will work. Let's see. And I'll go to the questions at any time.

Um... Yeah. This monthly paying for basically local whisper models is nonsense. Um... I'm... I'm actually quite okay for paying for the API inference. But I also don't think that... I think it could actually fix up a lot of the little issues with voice input on a very well-trained local model too.

So, yeah. ci is not a built-in alias for commit. That's an alias that I set up. So I have an alias in my git config. This... git config, ci. So I have a bunch of these ones here. It was an alias in Mercurial, which I was using before Git.

And I got so used to it that when I moved to Git, I set up this alias and never went back. I have no idea what Kimi is. You mean like Kimi K2, this new model? Is that the new huge model? Is that Kimi?

I haven't tried it. I heard that it's pretty good in opencode if you use it through, I guess, OpenRouter or something. But I haven't tried it. Didn't have the time. Okay. So, very slowly, this will start working at some point. We'll see.

I mean, this is not a very interesting screencast in many ways, because this doesn't really show agentic coding all that much. There's really not that much to see; I'm just adding more of the same now. If I still have some time, I think I have 20 minutes left.

If I still have some time, I will try to add some tests, which I think is more interesting. The next question: what do I usually do while waiting for Claude? So this is the moment when I'm going to pitch VibeTunnel. This is a thing we built.

Or actually I think I was barely involved at this point in this project. This is a... I think this is primarily now Mario's and Peter's project. Um... But it's a way to... Basically run all of your Claude instances through the browser. So I can go in for a coffee and see what it's doing.

That's the... That's the general idea here. Um... But the answer of like what do you do while... Um... While waiting for Claude is you go to Twitter and you write stuff, I guess. Um... See the problem with the music is... Let's see. Let me see. I need to... I need to turn on the screen capture sound.

So now... Hold on. Can you... Can you now hear the terrible music? No, it doesn't work. It doesn't work. Can you hear it now? Anyways, that's the music. Let's see if we see our topic. It's still been in front. Yeah, so maybe... Maybe here's an interesting thing. Why am I using Go?

Go is a language I don't like, as a human writing code. Maybe now that I'm sort of writing it more indirectly, I don't mind it quite as much. But I kind of want to show why Go works so well for agentic coding.

Um... Look at this. Okay, maybe this is the bad one. Maybe we are looking at the handlers. I mean, look at this. This you would not write in any other programming language than Go, right? You wouldn't say... If error not nil, return internal server error. Like I have one, two, three...

Three branches just to handle server errors. And I know that there are ways in which I could do this differently. And then return an error and like handle some of it on a higher level. But... My point mostly is... In Python, you wouldn't write it because it's ugly code.

In Rust, you wouldn't write it because it's ugly code. In Go, a lot of Go code looks like this, and it's perfectly fine. So the bar for error handling in Go is exactly that bar, and agents write exactly that code. So it is not any worse. And one of the consequences of this is that all the handling is local.

So it's very easy for the agent to understand what's going on. Because it doesn't have to look through so many layers of abstraction, right? It sees basically everything that's going on in this function is going on in this function. And not anywhere else, right? It doesn't have to understand complicated error handling patterns elsewhere.

It's pretty straightforward. That's why Go is so good from a code writing perspective. The other thing is that... All of the meta shenanigans that this language has is pretty standard too. Like there's not a lot of complexity you need to understand. Yes, there's some attributes on it. But it's good enough at comprehending this.

And the last part is that if you run the Go tests, it caches them. I don't have the test setup yet, but with Go, you can basically tell it to run all the tests at all times without slowing down the agentic loop.

And that is so good, because it means it never accidentally tests too narrowly. In Python, I have it that it tests one function only, because it explicitly only runs that function's test. And it completely forgets that five minutes ago it broke another function. And only at the very end, it discovers that it made a huge mess.

And with Go it just doesn't happen. And I will show this in a bit. But now supposedly I can look at the topic. But I'm actually not sure if that is correct. Because it doesn't seem to work. So... Well, I can click on something and nothing happens. What's going on?

So: "When I click on a topic, nothing happens. I don't actually see the topic. I still stay on the board page. What's going on?" But we can in the meantime look at the code that it generated. So, we got more API to return posts. Then we get... What's here?

What do we have here? Get posts for topic, with pagination. I'm assuming this is probably okay. What did it do here? What did it do here? parseInt. Why? Why do I have a parseInt all this time? That's the kind of slop that should go. So, this will go into our to-do list.

Get rid of parseInt. This slop should go away. Okay. So, what do we have here? What's going on? VoidT. I don't actually like to depict T. Why did it make this folder? I can delete this folder. Did it find the problem? Well, I don't think it...

I think it's completely wrong on what it's trying to debug here. But look, it's checking the route tree if it's regenerated. So that's positive. Wait, I think it's the issue now. What's the issue now? Is that blah, blah, blah, blah. This is all wrong. Actually, I think the issue here is like...

Let me try. I think the issue here is that you need to create $board here. Then this has to become index.tsx. Always. I think it has to go here. Move. And then this has to be... e.topic. I think that is how this works. Is this how it works?

Or did I fuck up everything now? What's going on? Compare. Okay. What's going on? All right. Okay, I broke everything. Classic. But what did I break? I think I broke something. I'm always confused. So not only am I confused by TanStack Router, the LLM is also confused by TanStack Router.

But I've been in this situation before. And I think it's related to that. It has to be in this right structure here. Yeah, look at this. Now it works. Okay, cool. So welcome to the new forum. We see stuff here now. This works. Nice. Still slop though, but slightly better slop.

So let's commit this and then try to make a test. Let's finish it off by adding a test. make format, web, "Added topic listing." And maybe one last thing we could do is actually add support for... But I want to write a test. I think I want to write this.

Let's see what else was written there. "How do you feed from it?" Yeah. So once more, the frontend log to Claude Code is basically a frontend thing: I have a plugin that forwards this. Yeah. This is nothing like Cursor. Even the Cursor agent is nothing like this. This is a completely different experience.

Okay. So let's write some tests. This is what we're here for. Let's write some tests. So we want to write some tests. But the problem with tests is that agents are not very good at writing tests. That's really the reality of all of this. So we're going to write one test.

And I think we're going to... Actually, before we write a test, we will write a way to create posts. I want you to add an internal function to create posts, which we will then hook up to an API later, but we don't hook it up yet. And we also want the function to create topics.

So that is basically creating a post plus a topic in one go. And then I want you to write a singular test that creates... No, no, no. I don't want it yet. Okay. Let's do these APIs and I will make a test plan here. Test plan. Because the thing with the test plan is that...

Here's how usually I want tests to work. All the database tests should use rollbacks. That's actually the biggest problem. Because the way it wrote the test right now is it wrote it against the underlying SQLite code. And the problem with this is that this doesn't have enough abstraction to allow you to...

Basically, have implicit rollbacks. The way I really like my code to work is that you can do something like this. That you can do... The way I like it is that you can write tests that... Insert, insert, insert, insert, insert, but then the rollback. And for this to work...

We need to change too much. Because we need to basically... If you go with post... Right, this here, for instance, it takes a SQLite DB. But when you do a transaction, when you basically do tx, err equals db.Begin, I think. Right. If err is not nil, return nil, err.

Right. It must be this. Right. This here is a different type. Yeah, I also want... I also want this different. Come on. Yeah, there we go. So this here... This is not going to be a problem now. Because my parameter here can be either a database... Or it can be a transaction.

Right. So for my test setup to work, we basically have to refactor the entire code base. And there are just no amazing ways, I think, to do that. So this might be annoying. Let's see. So, create topic. Right, so here we have this one. It creates a transaction.

And now for this to work with my intended rollback strategy, this also has to be save points. So... So this might be annoying. So this is going to be the point where maybe we're going to ask Gemini. Because I think... Actually, I think that Sonnet might not be able to do this in a good way.
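The rollback strategy can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not the Go refactoring attempted on stream: the `Database` wrapper, `rolled_back`, and the table layout are hypothetical. The point is that application code opens a real transaction only at the outermost level and uses savepoints when it is already inside one, so a test can wrap everything in an outer transaction and throw it away.

```python
import sqlite3
from contextlib import contextmanager


class Database:
    """Hypothetical wrapper: one SQLite connection plus a nesting counter."""

    def __init__(self, path=":memory:"):
        # isolation_level=None turns off the sqlite3 module's implicit
        # transaction handling, so BEGIN and SAVEPOINT are issued only by us.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.depth = 0

    @contextmanager
    def transaction(self):
        """Outermost call opens a real transaction; nested calls use savepoints."""
        name = f"sp_{self.depth}"
        self.conn.execute("BEGIN" if self.depth == 0 else f"SAVEPOINT {name}")
        self.depth += 1
        try:
            yield self.conn
        except BaseException:
            self.depth -= 1
            if self.depth == 0:
                self.conn.execute("ROLLBACK")
            else:
                # Undo only this level, then drop the savepoint.
                self.conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
                self.conn.execute(f"RELEASE SAVEPOINT {name}")
            raise
        else:
            self.depth -= 1
            self.conn.execute(
                "COMMIT" if self.depth == 0 else f"RELEASE SAVEPOINT {name}"
            )


@contextmanager
def rolled_back(db):
    """Test helper: run the body inside a transaction that is always rolled back."""
    db.conn.execute("BEGIN")
    db.depth += 1  # application code now sees itself as nested
    try:
        yield db
    finally:
        db.depth -= 1
        db.conn.execute("ROLLBACK")


def create_topic(db, title, body):
    """Application code: a topic and its first post, created atomically."""
    with db.transaction() as conn:
        topic_id = conn.execute(
            "INSERT INTO topics (title) VALUES (?)", (title,)
        ).lastrowid
        conn.execute(
            "INSERT INTO posts (topic_id, body) VALUES (?, ?)", (topic_id, body)
        )
        return topic_id
```

Inside `with rolled_back(db):` a call to `create_topic` issues a SAVEPOINT instead of BEGIN, and everything disappears when the block exits. This is also why the code has to know how deep it is: a plain BEGIN inside an open transaction would fail.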

So let's see. How do we do this? How do we do this? So let's commit these creators. Added functionality... I'm going to add up here.

See, I can't type them down. Do create posts and topics. So now we should... We should come up with this test plan. Um... Let's... Let's see if Sonnet can do it. Um... I want to write some tests. But the way we're doing database transactions right now doesn't work for how I want tests to work.

Please ultrathink how to re-architect the code to support this better. Let's let it do its thing. I just don't want it any more complicated. But there's just one way to test databases, and that's rollbacks. And I think we might need to do this refactor.

Any other questions in the meantime? No. No other questions. At least I think there are no other questions. So the git repo is on GitHub already. It's here: mini db. Why did I call it mini db? It should be mini bb.

It's mini bb. There you go. Yeah, yeah, yeah, yeah. There you go. The other thing is, for this agentic coding with streaming, I don't quite work like I work normally, because first of all, I don't talk all the time. But I also don't stay engaged with the agent as much as I do right now.

There's a lot of waiting involved, so I try to parallelize work. I try to do other things in the meantime. So it's a little bit different. So let's see what it did. It decided that we are going to use an interface called Querier. Huh. Really?

That's what we're going to do? I don't think it's going to work. I don't think it's going to work, man. Ah, come on, come on, come on. Think hard for this problem. Does this actually work with nested transactions and save points correctly? Question mark. Because I don't think it works because it will have to know how deep it is.

How do we distinguish between Claude and Gemini? What do we use Gemini for? Gemini, the model, is excellent at programming. It is also excellent at thinking, if you can call it that, and at creating architecture and going back and forth on it. Gemini CLI, the command line tool, is not amazing, and Gemini, the model, is not very good at tool usage.

So, for the agentic loop, I still haven't found anything better than Sonnet and Opus. But this is also why, and I mentioned this earlier, I use o3, and sometimes I use Gemini, to plan larger changes, and then I give the output of that to Sonnet. And I do the planning of larger changes just in the UI, in ChatGPT or in AI Studio for the most part.

Then there was a question of vector. I tried OpenTelemetry and a bunch of other things. It creates too much nonsense, too much output with all of the spans that it produces. And it didn't work quite as well as just the simple thing of logging everything into one file. I actually struggled to make this work.

And I find it also to be quite involved. And I also found Gemini, sorry, not Gemini, Claude, to just not fully understand how OpenTelemetry works. So, right now at least, simple log files work incredibly well. The complicated OTel stuff doesn't work well enough for me. I would actually love to see someone show how to use OTel successfully for agentic workflows.

It just didn't work for me, is all I can say. So, what did it say? What did it say about my interjection? Did it say something? So: "You're right to question. My current approach has fundamental flaws with nested transaction savepoints." What? How did you fix it then? The problem is, I know how I normally set this up to work.

And I don't know if the AI can actually one-shot this. Okay, so it has a nesting level now. It has save points. Maybe, maybe, maybe. Okay, so we have a board test. So, we're going to set up test DB. Okay, so it creates a SQLite memory database. Ah, this is slow, pure slow.

Why do we do this? Why, why? Great. Should run the real migrations. So, you can already see at this point that this is it. I can already see that it's now no longer going to give good code. And it doesn't even have that much stuff in the context, but it's already, it's already making mistakes that it doesn't do on a smaller context.

Like, it went too narrow on one specific problem. And this is the point where I no longer expect good output from this, actually. Does it even manage to run the migrations? Run migrations for testing is just run migrations. Like, why? Why are we doing this? And what is begin tx here?

Now it's turning into full slop. And this is all just for the test setup. So, what I will do now is I will make a branch. Testing setup. Because I don't like any of this. Make format. So, we're going to commit this as a pretty initial test setup.

It doesn't quite work. So, we're going to go back to the drawing board here. I think this is awful. This might be really, really awful. So, let's do a diff to main. So, where did the slop start?

This might still be okay. So, the strategy now will be to unsloppify this and to get it to do something. And maybe the way of doing this will be to get the tests to run. So, the board test, we're going to make this not terrible. So, run migrations for testing should just be run migrations.

Why is run migrations here in lowercase? Because we have init. Okay. So, we have init, which runs migrations. And that is what the server is doing. Right. So, run migrations for testing does this instead.

So, let's start with this. "Take this and make a test utils package, which creates the test harness. Add a test setup function, which takes a callback and which handles migrations, in-memory database setup and teardown. Then update the tests to use this. I also want you to reuse the test database between test runs, so we don't waste quite as much."
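To make the shape of that request concrete, here is a minimal sketch of such a harness, written in Python with the standard `sqlite3` module purely for illustration (the project on stream is not Python). Everything named here is hypothetical: the `MIGRATIONS` list, `run_migrations`, `get_test_db`, `with_test_db` and the `posts` table are stand-ins, not the project's real code. The idea is that migrations run once against a shared in-memory database, and each test runs inside a transaction that is rolled back afterwards, so the database can be reused without migrating again.

```python
import sqlite3

# Hypothetical stand-in for the project's real migrations.
MIGRATIONS = [
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)",
]

_shared_conn = None  # reused between tests so migrations run only once


def run_migrations(conn):
    for statement in MIGRATIONS:
        conn.execute(statement)


def get_test_db():
    """Create the in-memory database on first use, then keep reusing it."""
    global _shared_conn
    if _shared_conn is None:
        # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
        _shared_conn = sqlite3.connect(":memory:", isolation_level=None)
        run_migrations(_shared_conn)
    return _shared_conn


def with_test_db(callback):
    """Run callback(conn) inside a transaction that is always rolled back."""
    conn = get_test_db()
    conn.execute("BEGIN")
    try:
        return callback(conn)
    finally:
        conn.execute("ROLLBACK")  # teardown: leave the schema, drop the data


def create_post(conn):
    conn.execute("INSERT INTO posts (title) VALUES ('hello')")
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


first = with_test_db(create_post)
second = with_test_db(create_post)  # sees an empty table again
```

Both calls see exactly one row, because the insert from the first call is rolled back before the second one starts.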

Actually, I don't want to explain it, but I don't want to migrate all the time, basically. And then we should test if this actually works, so let's review this. Create post now uses q exec. And so, let's see where we have dot begin.

Where do we do dot begin? In transaction goal, libigo, migration, word test. What? This is just all nonsense. Tests are the worst because it doesn't understand how to create a test harness. This entire thing should go, this transaction manager. So, set up test DB. Like, what is it doing now?

What's it doing? All right, I might actually have to defer this to next time. But I would love it to at least set up the harness correctly. So, here's my best recommendation at this moment. Don't set up tests initially with Claude. Because it just doesn't understand what good tests should look like.

And I don't know what it says about us as programmers, but the way it sets up tests is just bad. I can only assume that the bulk of people out there are writing horrible tests. All of this is wrong. Like, all of this is wrong. What we should actually do is set up a really good transaction infrastructure in the beginning.

The pattern I like to use here is actually from Django. Django has these atomic blocks. They work quite well and they hide savepoints and transactions properly. So, I should actually do that first. Get this into a good spot and only then start writing tests.
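For readers who have not seen the Django pattern, here is a minimal sketch of the idea in plain Python on top of `sqlite3`. This is not Django's implementation and not the code from the stream. The `Database` class, its `depth` counter and the savepoint naming are assumptions made up for this example. The outermost block opens a real transaction, nested blocks become savepoints, and the caller never has to know which of the two it got.

```python
import sqlite3
from contextlib import contextmanager


class Database:
    """Hypothetical wrapper that hides transactions and savepoints."""

    def __init__(self, path=":memory:"):
        # isolation_level=None: no implicit transactions, we manage them.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.depth = 0  # current nesting level of atomic blocks

    @contextmanager
    def atomic(self):
        name = f"sp_{self.depth}"
        if self.depth == 0:
            self.conn.execute("BEGIN")  # outermost block: real transaction
        else:
            self.conn.execute(f"SAVEPOINT {name}")  # nested block: savepoint
        self.depth += 1
        try:
            yield self.conn
        except BaseException:
            self.depth -= 1
            if self.depth == 0:
                self.conn.execute("ROLLBACK")
            else:
                # Undo only this block's work, then discard the savepoint.
                self.conn.execute(f"ROLLBACK TO SAVEPOINT {name}")
                self.conn.execute(f"RELEASE SAVEPOINT {name}")
            raise
        else:
            self.depth -= 1
            if self.depth == 0:
                self.conn.execute("COMMIT")
            else:
                self.conn.execute(f"RELEASE SAVEPOINT {name}")


db = Database()
db.conn.execute("CREATE TABLE posts (title TEXT)")

with db.atomic():
    db.conn.execute("INSERT INTO posts VALUES ('kept')")
    try:
        with db.atomic():  # nested: becomes a savepoint
            db.conn.execute("INSERT INTO posts VALUES ('discarded')")
            raise ValueError("boom")
    except ValueError:
        pass  # the inner block was rolled back, the outer one continues

titles = [row[0] for row in db.conn.execute("SELECT title FROM posts")]
```

After this runs, only the row from the outer block remains. A test harness can then wrap every test in one outer atomic block and roll it back at the end, while application code keeps using atomic blocks as usual.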

Because everything that has been done here so far is really, really bad. So, I might do the following. I might let this run and actually set up the tests correctly in a way that I like. And then I will show how to run the tests, either in a future stream, in a video, or just in a follow-up post.

Because I don't think we're going to get to a reasonable point in the next 20 minutes. And I don't have that much time. I gave myself an hour and a half and I'm already over time. So maybe I will do two more minutes of last questions. But I think I will leave it here.

And then, thank you so much for watching. See you next time.
