AI in Action: Refactoring with PlantUML, N8N Workflows, and Claude vs. Gemini

Chapters
0:00 Lighthearted Intro
0:44 MCP configured with Codex
1:06 How the MCP and Codex CLI work
1:28 Running the analyst in an E2B sandbox
1:34 Trouble with MCP's interaction with spreadsheet data
2:24 N8N workflow demonstration
3:59 Generating a workflow in N8N with O3 Pro
4:18 Assembling a workflow in N8N with O3 Pro
4:47 Using PlantUML to define a codebase refactoring flow
4:59 How PlantUML works
5:07 Connecting MCP to own documentation
5:40 MCP managing a JavaScript sandbox with PlantUML integration
6:6 Using Redis for communication between agents
9:59 Refactoring plan for a codebase with lots of garbage
39:41 "kogi plantuml" search
40:07 PlantUML Activity Diagrams
40:48 Experiences with Claude Code and Gemini
41:14 Elaborating on the plan to refactor the codebase
42:13 Example of a PlantUML diagram from ChatGPT
42:20 Linking the solution to the video
47:11 Elaborating on their code base refactoring flow
50:20 Own MCP, "Jesus"
50:45 Own MCP, "Go-Go-Goja"
51:06 Own MCP, "Vibes"
52:03 Experiences with Claude and Gemini
55:20 PlantUML diagram generated by ChatGPT
56:10 Avoiding over-fitting with their models
58:10 How they avoid over-fitting with their models
59:09 Working with different models at once
59:16 Importance of automating the code base refactoring flow
00:00:07.820 |
I'm excited to see this, though. I'm going to have to duck out at half-past. 00:00:28.760 |
Adam, do you want to just drive? Do you want me to intro? How do you want to do this? 00:00:32.620 |
I'll just drive. Let me see if I can get this command to run on the first shot, which is highly, highly unlikely. 00:00:47.740 |
That's my hopes there. In fact, if it runs on the first shot, you should not trust it in any capacity. 00:00:55.980 |
Exactly, exactly. Yeah, so I'm going to see. Because this is going to take a while, I think I want to spawn it now. 00:01:02.580 |
I want to see if I can run this analyst that I have set up in an E2B. 00:01:11.420 |
So that's these tiny little, like, Docker containers, you know, I guess, that you can spin up and have all sorts of toolkits available already on them. 00:01:19.780 |
I'm going to say complete as much of advent of code 2024 as you can. 00:01:27.600 |
I only put one puzzle in there. And so I have it on this branch and I have it in this repo. 00:01:39.520 |
Because I'm in the wrong repository, you know, of course. 00:02:05.040 |
Right, yeah. Why doesn't this have a package.json? 00:02:11.500 |
Oh, because it just, because it just doesn't. 00:02:13.700 |
Let me see if this is going to, or maybe this is kicking off very quietly. 00:02:24.540 |
And I don't know if anyone's ever messed with this service, but there's a little sandbox in here, and it's running, just kind of doing whatever it might be doing. 00:02:47.280 |
I'm a little bit of a maniac, and so I have hooked it up to, you know, your tool of choice here, right? 00:02:57.280 |
And so it created a sandbox and it sort of launched it. 00:03:05.620 |
And the first thing that I do is make sure that Claude's up to date, which it is. 00:03:09.780 |
And then it says, okay, I was told to complete this; read the README, you know? 00:03:18.960 |
And I'm on Opus, and I have some MCP servers. 00:03:30.840 |
The only MCPs that I have in there are... well, Claude just lists all of their tools right when it boots up. 00:03:41.300 |
So this is just all of the stuff Claude comes with. 00:03:45.980 |
This is my own silly MCP server, which is really just a way for it to access my own written documentation about how I like to structure repos, and my beyond-lint preferences. 00:04:03.200 |
Why wouldn't you use CLAUDE.md files for that? 00:04:12.180 |
I have used some Windsurf rule files, which is like the same thing, but personally I'm always bouncing from IDE to IDE, and I want to mess with Codex also. 00:04:31.700 |
And so I just found this was, at the time, the most reliable way for me to put things in there. 00:04:37.640 |
And then I can also put my own silliness in there. 00:04:41.200 |
Like, if it asks the user a question, it just always responds with "user not available, please proceed." 00:04:49.020 |
But I'm just curious how often it would ask. 00:04:54.360 |
How have you found... where are you? 00:05:01.520 |
So you're coming at this not from a development perspective. 00:05:05.360 |
I'm trying to understand where you're coming from here. 00:05:08.780 |
How have you been finding Codex, and are you talking about the Codex CLI or the Codex platform? 00:05:15.860 |
I am talking about the Codex CLI, and not the platform. 00:05:24.120 |
Big distinction, you know. I'm honestly in the process of getting it rigged up in this same E2B setup, and I don't like it as much, but maybe it's just because I've been biased towards Claude Code. 00:05:43.020 |
Like, one of my hot takes: today I was just reading about how to set up its MCP servers. 00:05:52.540 |
And it's like, our MCP server setup is just like Claude's, except it's a TOML file and the key is named something different. 00:06:06.680 |
So do you have a Claude Max sub, or are you just paying by the API call? 00:06:15.020 |
Both, unfortunately. To do it in the background like this, you'd have to pay by the API call. 00:06:23.960 |
So this will spit out a charge. 00:06:26.600 |
Why would you have to pay for the API call just to run it programmatically? 00:06:31.520 |
Now, I could be wrong about this, but I thought, maybe because it's not me sitting there at the terminal, that I'm using it as a... I should just be able to throw my... 00:06:44.780 |
You can programmatically use Claude Code within your Max subscription. 00:06:48.320 |
A lot of people work with multiple worktrees, and they'll have multiple terminal windows open, and even terminal managers. 00:06:57.500 |
And it's actually a very busy space right now. 00:07:00.180 |
People trying to put their own custom harnesses around Claude Code to better manage the sessions. 00:07:06.000 |
I can for sure see that. I feel like I have an upper limit of how much I can keep mental track of. 00:07:13.080 |
Yeah, that's really the crux of any agentic development enterprise. 00:07:21.000 |
These tools allow you to accelerate a lot, and sometimes they allow you to accelerate far past your capability to keep yourself in check. 00:07:30.480 |
That's a very common anti-pattern I've seen with developers who are freshly adopting this technology, because it's very exciting to get going with development. 00:07:39.420 |
And it's very exciting to see the speed at which things start to work. 00:07:43.820 |
But if you're not being extraordinarily deliberate, and planning from the outset to make a codebase that works well with AI, then you're going to quickly find yourself at a tipping point slash critical mass of spaghetti. 00:08:02.160 |
Where there's just nothing to do with it, and you've just got to scrap it and start again. 00:08:06.300 |
My analogy, or the way that I think about this, is that most developers are kind of, you know, the closest 00:08:20.940 |
thing I can think of is like a Kung Fu master: they follow a single line of thought, very disciplined, you know? 00:08:30.480 |
And that's what makes a good codebase: very disciplined, sort of well-paved roads all the way throughout. 00:08:36.420 |
Whereas the models know all the techniques, instead of just one technique. 00:08:42.740 |
And so they make spaghetti, because to them it is interchangeable. 00:08:48.060 |
So that's where the CLAUDE.md files, and the documentation references within the repo, come into play for me. 00:08:56.920 |
So the CLAUDE.md files are great because they work in a nested way. 00:09:01.660 |
And so you can include them as low in the directory structure as their relevance applies. 00:09:10.320 |
So if you've got a module where there are specific rules around the way that this module works and connects to other modules, you make sure that you've got a CLAUDE.md file in that folder. 00:09:20.440 |
Well, not just that, but you've got a CLAUDE.md file in that folder. 00:09:25.200 |
So I tend to have a top-layer _doc_ref folder, in which I keep XML-formatted documentation for the entire technology stack in my project. 00:09:42.420 |
And I have little index files in there, also in XML format, that point to where different components of those technology stacks are noted. 00:09:52.860 |
And so in the CLAUDE.md files, I have it reference the index to find where it needs to pull that. 00:09:59.280 |
So that way you can lazy-load relevant context, without eating up the context window to search for that context in the first place. 00:10:06.900 |
And that greatly reduces context bloat, and allows me to keep task chains running for longer without interruption. 00:10:18.960 |
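A minimal sketch of that lazy-loading idea; the folder name, JSON index schema, and keyword matching below are my own illustration, not the speaker's actual setup:

```python
import json
from pathlib import Path

def load_relevant_docs(doc_ref: Path, task_keywords: set[str]) -> str:
    """Read the index, then load only the doc files whose topics
    overlap with the current task's keywords."""
    index = json.loads((doc_ref / "index.json").read_text())
    # index maps a topic name to a relative doc path, e.g.
    # {"redis": "stack/redis.xml", "n8n": "stack/n8n.xml"}
    chunks = []
    for topic, rel_path in index.items():
        if topic in task_keywords:
            chunks.append((doc_ref / rel_path).read_text())
    return "\n\n".join(chunks)
```

The point is that the agent reads one small index file first, then pulls only the docs it needs, instead of searching the whole tree every context window.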
Let me see if I can pull a code window in here. 00:10:23.460 |
I mean, I must have... yeah. So, also, just make sure your whole workflow is as modular as possible, because the less context the AI needs to be considering to make a productive change to your codebase, 00:10:48.000 |
absolutely unequivocally the better outcomes you're going to be getting. 00:10:53.640 |
And if you can structure it in a way that you've got different tasks in different modules, it can make sense to have multiple 00:11:05.700 |
work streams running at one time, where they're not going to actually get all up in each other's business in the middle of doing a thing. 00:11:15.760 |
So one thing that I've done with another repo, that sounds a lot like what you're talking about: I have a top-level one, but I haven't put one everywhere. 00:11:28.140 |
With Windsurf, at least, I know you can put a bunch of README... 00:11:37.380 |
You can put a bunch of markdown files in the .windsurf/rules directory, and then you can have these different things that are like, okay, this is always on. 00:11:49.200 |
Like, this has to be used. 00:11:53.220 |
And wait, wait, wait, how big is that file? 00:12:07.860 |
I mean, so this stuff is relevant to every single task. 00:12:13.680 |
I believe so. At one time... it's probably honest to say that parts of this could be cleaned up. 00:12:23.160 |
But then you can say, like, oh, this one's great, there's a lot of useful information in here. 00:12:31.440 |
But then I think you can do a similar thing to embedding a CLAUDE.md file in a subdirectory: you can tell Windsurf, like, 00:12:40.920 |
hey, only if it's this package, and only if it's in this subdirectory, then bring up this file, and then you'll know what to do. 00:12:52.800 |
So that's kind of similar to what you're talking about with Claude. 00:12:58.500 |
I have not optimized, uh, this particular repository for, for Claude. 00:13:04.720 |
I'm trying to get it to rely on reading these files as it's coding, and deciding when to read these files. 00:13:16.220 |
So generally, when it's making those efforts of its own accord, it's searching wide almost every time. 00:13:23.200 |
Like, at least every context window is searching wide afresh, and that search is going to lead to a lot of context rot, because of everything that it's looking at in its search for the thing that it needs. 00:13:35.860 |
Generally speaking, I guess it depends on the exact interactions of the sub-agents. 00:13:40.180 |
I don't use Windsurf very often, so I can't speak authoritatively on that specifically, but in my experience with the other tools that I've used, a lot of that context rot makes its way into your outputs. 00:13:54.620 |
And so that can be something that just misdirects it in subtle and obnoxious ways. 00:14:00.680 |
And it's one of the reasons that people end up clearing Claude Code sessions so frequently. 00:14:09.360 |
I felt like I had read at some point that you can tell Claude Code to use sub-agents for certain tasks, and that it will respect that. 00:14:20.320 |
I guess I have not tried that and then observed it working really successfully for me, but if you're telling... 00:14:31.160 |
For the sub-agent workflow, you want to make sure that what you're doing is appropriate for sub-agents, and that it works as a work pattern. 00:14:39.820 |
So you'd want multi-layer, kind of delegatable tasks. 00:14:46.720 |
So say you're planning out your project, and you've got it into this series of well-contained, actionable tasks. 00:14:56.320 |
You can tell Claude Code to spin off a sub-agent for each of these tasks, serially if they're not parallel-processable, and then each... 00:15:07.340 |
And you can then also specifically tell it which sub-agent model to use. 00:15:12.580 |
So if it's a more limited, well-described task, you can go with Sonnet, and Sonnet is still absolutely good enough for a lot of well-defined, contained tasks. 00:15:20.740 |
And that can greatly expand the resources you're working with over the course of the project. 00:15:26.020 |
But anyway, each one of those sub-agents that kicks off for that task is going to not pollute the main context, other than to say, like, "job done, boss." 00:15:35.440 |
And then it'll kick off the next thing, which will be a new sub-agent with its own fresh context window. 00:15:41.620 |
So you really have to make sure you're setting up the dominoes well for them to knock down, because if you just say "use sub-agents," it's not going to know when, and it's probably never going to choose to. 00:15:55.000 |
I wouldn't say never. If you give it unbounded tasks, it can get very creative in how it tries to answer those. 00:16:06.820 |
And that can be counterproductive a lot of the time. 00:16:11.120 |
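The serial sub-agent pattern described above might be sketched roughly like this, assuming Claude Code's headless mode (`claude -p`) is on the PATH; the task list, flags, and summary truncation are illustrative, not the speaker's actual harness:

```python
import subprocess

def build_command(task: str, model: str = "sonnet") -> list[str]:
    """Build a headless Claude Code invocation for one well-contained task."""
    # -p runs a single prompt non-interactively; --model picks the sub-agent model
    return ["claude", "-p", task, "--model", model]

def run_tasks_serially(tasks: list[str]) -> list[str]:
    """Run each task in its own fresh process (and thus fresh context window),
    keeping only a short result summary, i.e. "job done, boss"."""
    summaries = []
    for task in tasks:
        result = subprocess.run(build_command(task), capture_output=True, text=True)
        summaries.append(result.stdout.strip()[:200])  # keep the main context lean
    return summaries
```

Each invocation starts clean, so only the short summaries accumulate in whatever orchestrates the loop.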
All right, let me see what is this, what is this done? 00:16:21.400 |
So I started this at the... well, one of the reasons, at least for me, is I'm trying to get this into something that I can analyze. 00:16:30.500 |
I'm curious, you know, to see, like, how does it overcome problems? 00:16:37.180 |
You know, where does it go sideways? 00:16:40.060 |
One of the things that I tried to do with a partner over the last few days is give it 00:16:49.900 |
just dumps of spreadsheets, you know? Like, just go to an enterprise system, dump six spreadsheets, and tell it to do an analysis, you know? 00:17:04.060 |
And yeah, it doesn't do a good job. 00:17:10.180 |
And how many are you giving it at a time? 00:17:13.360 |
I think we're giving it between two and six, in the three different things that we've done. 00:17:26.860 |
What's the model that's actually trying to parse the spreadsheets? 00:17:30.120 |
Oh, we've tried it with Sonnet and Opus, and hopefully I'll have Codex connected soon. 00:17:42.800 |
I'm not a hundred percent sure how well they handle them. I know spreadsheets are a little bit weird when you're parsing them with LLMs. 00:17:50.240 |
They have different strategies for handling them. 00:17:52.940 |
And a lot of it's code-based and programmatic. 00:17:59.540 |
I found that it liked to write Python scripts, and use Python to try and load up the spreadsheets. 00:18:09.380 |
And that can be really problematic if the spreadsheets aren't formatted in a way that's productive for that kind of parsing. 00:18:18.540 |
Especially if they've got empty spots in them; it can get really weird. 00:18:23.940 |
Also, if there's any kind of digital protection, like 365 sensitivity labeling with restrictions around that, that can be problematic. 00:18:32.580 |
But the one thing that I would definitely consider, no matter which flow I was using, is I would want to process any attachments in sequence, in a separate context window, 00:18:44.860 |
unless there was a very specific A-to-B consideration that you were making. 00:18:50.440 |
And even then, I would try to find a way to parse them individually and then parse the outputs, because... maybe I over-focus on this. 00:19:02.380 |
But to me, the wider the context window gets, the less valuable your results. 00:19:07.720 |
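The parse-individually-then-combine idea could look something like this as a sketch; the file layout, the summary shape, and using the stdlib `csv` module are my assumptions, not the actual pipeline:

```python
import csv
from pathlib import Path

def summarize_sheet(path: Path, sample_rows: int = 3) -> str:
    """Produce a compact per-file summary suitable for its own context window:
    headers, row count, and a few sample rows."""
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]
    sample = "; ".join(",".join(r) for r in body[:sample_rows])
    return f"{path.name}: columns={header}, rows={len(body)}, sample=[{sample}]"

def combine_summaries(paths: list[Path]) -> str:
    """Each file is summarized on its own; only the small summaries
    are combined for the final cross-file prompt."""
    return "\n".join(summarize_sheet(p) for p in paths)
```

The model doing the cross-file analysis then only ever sees the compact summaries, not six raw spreadsheets at once.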
And even if you're working within a context window that technically the LLM can handle... like, 00:19:12.640 |
Claude Code, Sonnet, and Opus are technically rated for 200K tokens. 00:19:21.760 |
I think the second you get above 125K, it gets funky. 00:19:25.780 |
It gets a little bit funky, and it still might work, but it's not going to be as focused or clear as the output you're going to get with a fresh context window, or a context window under 120K. 00:19:36.760 |
Likewise, ChatGPT with GPT-4o: the Pro version gets a 32K context window, but the Enterprise version gets a 128K context window. 00:19:48.520 |
And let me tell you, that thing is garbage past 32K. So you can give something to it, and it will accept the ask and happily 00:19:58.780 |
process it, but the outputs that you're getting are widely dependent on how you're managing that context, and the tool that you're managing that context with, and how that tool breaks down the information 00:20:11.560 |
you're giving it, formatted, into the context that it's working with. Mm-hmm. Yeah, I think it's a very challenging problem. 00:20:28.480 |
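A crude guardrail for the "it gets funky past ~120K" observation is to estimate the token count before sending; the 4-characters-per-token heuristic and the 120K threshold below are rough rules of thumb, not exact tokenizer counts:

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

def fits_comfortably(text: str, budget_tokens: int = 120_000) -> bool:
    """Flag prompts likely to push the model into the 'funky' zone,
    even when they technically fit the rated 200K window."""
    return estimate_tokens(text) <= budget_tokens
```

Anything that fails the check is a candidate for splitting into separate context windows, per the attachments-in-sequence advice above.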
If you're using Codex CLI, I would try 4.1 and see how that did. 4.1 does okay with instruction following and long context windows. 00:20:39.400 |
It ostensibly has a context window up to a million tokens, but of course, just like anything else, I wouldn't trust it up to that actual million. But it has a much wider range where it doesn't lose the plot, 00:20:52.000 |
as does, I guess, Gemini 2.5. So both of those might be candidates for the parsing, if the other models aren't handling it well. 00:21:03.880 |
Yeah, I feel like there has to be... well, there doesn't have to be, but there's possibly a generalized architecture, 00:21:18.520 |
you know, that is just like: hey, you orchestrate and hand things off to sub-agents, you know? 00:21:30.700 |
We're not at the point where you can just say, hey, here's some spreadsheets, now produce me magic, you know? 00:21:37.160 |
And so I don't think that we're on the verge of just replacing the entire white-collar sector tomorrow. 00:21:47.200 |
We're not even close, we're not even close, like AI will not take anyone's job. 00:21:51.740 |
Someone who knows how to use AI will take a lot of people's jobs. 00:21:58.240 |
Well, maybe that brings me to the other piece 00:22:04.840 |
I think we wanted to talk about at some point, which was this N8N thing. 00:22:24.460 |
Yeah, so, of course... well, maybe this is fortunate. I had some sort of crash with it, and it lost my ability to plug into a model, or I had it plugged into a model. 00:22:39.300 |
But basically, let me just... it's a little workflow tool, right? 00:22:52.560 |
I can press this, and then I can get an HTTP request. 00:22:56.620 |
And so now I can come in here... so I haven't messed with this for more than an hour or two. 00:23:10.860 |
I have already connected an OpenAI model. 00:23:21.180 |
I don't know how familiar people are with these kinds of no-code interfaces. 00:23:27.180 |
So what I did find that was cool is, you know, if I want my code to plug into this, and then this is going to plug into this... yeah, it automatically knows what the stuff it's going to get wired up to is. 00:23:48.840 |
And so I think that's kind of... what can I ask this model to do? 00:23:55.680 |
My name is... and I am from... you know, operation. 00:24:12.500 |
So I just have: name can be anything, and API is whether something came in to N8N through the API or through the button. 00:24:28.360 |
I think maybe where I want to pause right away is: I no longer think that this is just an average-user task. 00:24:44.140 |
You know, like, "go ahead, average user." 00:24:50.320 |
That's a very common, uh, experience where people are told that AI solves all their problems. 00:24:56.240 |
And they're like, okay, I'll go use AI to solve all my problems. And then, when they get there, 00:25:06.320 |
there are too many ways to skin any cat with AI, basically. 00:25:09.900 |
And any given... maybe this is why I'm vacillating between these two extremes: this is putting rails up for every goddamn thing. 00:25:23.560 |
You know, this is total rails. 00:25:27.160 |
And this would be a nightmare to build. I'm sorry, I'm not caught up on chat. 00:25:32.920 |
This would be like a nightmare: to build all of your workflows for your entire company. 00:25:39.240 |
And then you start thinking about the change management process, and what happens when an error happens... I mean, forget an error, you know. 00:25:52.120 |
You will drown in edge cases if you try to set up your whole company around N8N workflows. 00:26:06.580 |
It is cool and slick, you know, and I'm going to keep messing with it, because I want to see what it can do. 00:26:16.060 |
Let me see if there's... I would love to, Manuel... okay. 00:26:28.840 |
I would love to hear some of your anecdotes, Manuel. 00:26:34.240 |
So last year, at the AI engineering conference, this dude showed me... he was working at some kind of healthcare company. 00:26:44.680 |
And I don't know how this is all structured, but they had hundreds and hundreds of N8N workflows. 00:26:53.020 |
The interesting thing was that the people who were good at prompting were the analysts. 00:26:58.000 |
And so, for example, one analyst had figured out that YAML was a good intermediate format to do prompting things. 00:27:05.560 |
I don't know how they were orchestrating all of that stuff, but the dude showed me these huge N8N workflows. And this was, like, a year ago. 00:27:16.660 |
So it wasn't full-on agents or so; they were pre-AI. 00:27:23.560 |
They were already using N8N quite a bit. 00:27:33.040 |
The secret with N8N is: don't build it by hand in N8N. Whatever you're doing, go to O3 Pro and say, give me a workflow. 00:27:47.120 |
And I was like, okay, well, Sonnet 3.5 at the time didn't really do all too well. 00:27:51.580 |
Although you could, you know, build an intermediate step. 00:27:54.880 |
But they had huge, huge workflows that they were using for forever. 00:28:00.220 |
And that allowed, you know, the analysts to have little blocks that would connect to their own APIs. 00:28:08.880 |
And so the move here is: O3 Pro, here's my Datadog logs. 00:28:14.340 |
They're just going to stream to you; whenever these agents get off track, you should make an N8N workflow 00:28:19.680 |
that's going to generate the JSON that will get them past this hump. 00:28:25.020 |
So you like just recursively feed it back to the model and be stupid. 00:28:31.680 |
Like, so basically, the nodes can run JavaScript, and then you can also 00:28:37.080 |
make a... so there are JavaScript nodes, and there are webhook nodes. 00:28:42.060 |
Those were the easiest to use when I had to give it a workflow. 00:29:01.240 |
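As a rough illustration of what the model gets asked to emit: an N8N workflow is just JSON describing nodes and their connections. The fragment below is a simplified sketch (node parameters abbreviated, not a verified export), wiring a webhook node into a JavaScript code node:

```json
{
  "nodes": [
    { "name": "Webhook", "type": "n8n-nodes-base.webhook",
      "parameters": { "path": "agent-hook" } },
    { "name": "Code", "type": "n8n-nodes-base.code",
      "parameters": { "jsCode": "return [{ json: { ok: true } }];" } }
  ],
  "connections": {
    "Webhook": { "main": [[ { "node": "Code", "type": "main", "index": 0 } ]] }
  }
}
```

Because the whole workflow is structured JSON like this, a model can generate or patch it far more reliably than it could drive the GUI.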
And you have all these influencers doing N8N things, and you can just screenshot 00:29:06.500 |
their setups, some of which are really useful. 00:29:09.440 |
Like, they're LinkedIn scrapers or whatever. 00:29:11.720 |
And it just saves a lot of time to be like, okay, someone has already done this. 00:29:17.360 |
But I can't deal with graphical interfaces. 00:29:24.400 |
For me, anytime I'm trying to use a graphical interface, I run into something that the 00:29:29.120 |
graphical interface can't handle, and I need to dig into the CLI, and it has just... 00:29:43.200 |
The reason you use it is because it's a structured data format. 00:29:46.780 |
It's really hard for a model to mess up. 00:29:50.420 |
It automatically gives you the rails, and you can ask it to generate more 00:29:54.760 |
rails for itself, and it's not going to just... You sold me on N8N, or "Naten," as we're 00:30:00.640 |
calling it. So, yeah, the thing with GUIs, with lots of applications, but, 00:30:05.080 |
you know, applications like N8N: stop trying to use them yourself. 00:30:13.540 |
The model's smarter than you; just figure out how to get the model to fix itself. 00:30:18.360 |
And just be careful with context engineering when you're doing this. 00:30:25.240 |
It's like... I mean, that makes perfect sense to me. 00:30:28.960 |
Because you already have Datadog tracking everything, getting a feedback 00:30:36.040 |
loop. And then those agents are generating workflows that are going back into Datadog, 00:30:39.880 |
flagging specific things that are going back into the workflow that the agents are running. 00:30:46.420 |
So you get these little recursive self-improvement loops. 00:30:52.400 |
It's just like one little loop that works for me. 00:30:56.800 |
How do you avoid overfitting in those scenarios? 00:31:00.160 |
I guess if it's not throwing an error, it wouldn't cause it to adjust. 00:31:05.620 |
So, you review it, but N8N is kind of how you avoid the overfitting. 00:31:11.900 |
You give it these really structured places to play, and just let it play. 00:31:24.900 |
Yeah, I feel like I'm not using it enough, but from a philosophy standpoint... 00:31:31.840 |
So the thing that I end up doing, when I'm thinking about 00:31:37.460 |
these sort of AI-first workflows, is I start over-engineering them. 00:31:42.860 |
So I have to step in, like, no, I need to stop myself from doing that. 00:31:46.840 |
Like, there's a way to get the model to step in on itself. 00:31:58.180 |
It's a model of your choosing for that workflow. 00:32:05.840 |
So if there's a thing where a straight-up call to one model is 00:32:10.220 |
not going to work well, the answer to that is: call multiple models, have them debate, 00:32:15.320 |
then ensemble that into the ultimate deliverable. 00:32:19.520 |
So it's always: ship it back to the model. 00:32:22.980 |
If the model can't handle it, get the model to prompt 00:32:27.740 |
the model, because I'm not going to try to figure out how to prompt the model to get... 00:32:33.660 |
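The debate-then-ensemble move could be sketched like this; `call_model` is a placeholder standing in for whatever API client you use, so the names and exact flow here are illustrative only:

```python
from typing import Callable

def debate_and_ensemble(
    question: str,
    models: list[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Ask several models the same question, show each the others' drafts,
    then have one final call synthesize the ultimate deliverable."""
    # round 1: independent drafts
    drafts = {m: call_model(m, question) for m in models}
    # round 2: each model revises after seeing the other drafts
    revised = {
        m: call_model(m, f"{question}\n\nOther drafts:\n" + "\n".join(
            d for name, d in drafts.items() if name != m))
        for m in models
    }
    # final: the first model acts as the synthesizer/judge in this sketch
    judge_prompt = f"{question}\n\nCandidate answers:\n" + "\n".join(revised.values())
    return call_model(models[0], f"Synthesize the best answer:\n{judge_prompt}")
```

Inside N8N, each of these rounds would just be a model node, with the debate prompt assembled in a JavaScript code node between them.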
What's funny is, something we've observed just in this experimentation with 00:32:46.560 |
throwing spreadsheets and READMEs at E2B, you know... we'd run out of... 00:32:57.480 |
just throwing things at E2B... is that when we wrote 00:33:04.500 |
the README, it did not perform as well as when we wrote the README and then asked the model to rewrite it. 00:33:11.020 |
They're better at prompting themselves than we are. 00:33:14.420 |
Uh, like the topic console bench is really, really good for this. 00:33:19.460 |
I would say there's like a, there's, there's like a sliver above where they're, where they're 00:33:26.240 |
at, where I've seen, I'm, I wouldn't consider myself to be a talented enough context engineering 00:33:35.180 |
But I have seen people who get an output, uh, from something like O3 pro and then they just 00:33:40.400 |
go in and they start moving tokens, uh, as superfluous. 00:33:51.140 |
I'm trying to figure it out, but the models are great at that science. 00:33:54.400 |
The model, the models will get you very far, I guess. 00:33:57.400 |
Yeah, that basically is... and I think there are some 00:34:06.000 |
papers that show this: the models are better at this. 00:34:12.160 |
Like, we might have an edge sliver, but at the end of the day, that's human stuff. 00:34:16.980 |
The thing that you want is to add that little sliver into the search 00:34:21.860 |
space that the model is looking through to reconfigure this prompt into the optimal 00:34:27.900 |
sort of directive for the other model. 00:34:35.220 |
That's all I had, you know. I'm happy to answer questions. 00:34:44.980 |
I'm happy to hack at something, you know. What's the... yeah. 00:34:49.860 |
What's something that you can't get the model to do with N8N right now? 00:34:53.700 |
Or, like, what's an N8N thing where you're like... 00:34:59.640 |
You know, I'm still getting my footing, to be fair. 00:35:05.200 |
So, what I... or, like... so, six spreadsheets was something that I heard doesn't work. 00:35:10.520 |
You go to O3 Pro and say, hey, I have these six spreadsheets. 00:35:13.860 |
And I also have N8N, which can also, like, stop to call a model. 00:35:17.580 |
Just so you know, can you give me a workflow that is going to correctly 00:35:22.320 |
ensemble the models such that they can actually parse the spreadsheets? 00:35:30.620 |
And then, here is the spreadsheet output that I didn't like. 00:35:36.780 |
And then I'm going to give you a description of the kind that I like, and then you... 00:35:42.060 |
So yes, but O3 Pro still has a 128K context window. 00:35:50.880 |
And so, depending on those spreadsheets, it could be operating outside of it. 00:35:55.320 |
So it's going to truncate it, but what it should figure out while it's 00:35:58.740 |
trying to make that workflow is that it needs to create sub-spreadsheets and... 00:36:07.920 |
So I might workshop your O3 Pro prompt with, like, O3 before. 00:36:16.740 |
And I might put that in a workflow that says, hey, check this prompt. 00:36:20.520 |
Or, I might expect that O3 might find out that, oh, when I 00:36:27.360 |
am creating this prompt out of these three spreadsheets, then I should have 00:36:33.240 |
a model recheck this outgoing spreadsheet-slash-prompt that I'm sending during 00:36:38.580 |
this workflow. But you need to, like... 00:36:42.540 |
You need to be able to iteratively prompt it. 00:36:45.920 |
So you kind of need a deep-research-y workflow, which you can build within it. 00:36:49.860 |
And you can, since you have the model node and you 00:36:53.880 |
have the JS node, you have all the tools you would normally have. 00:36:57.180 |
If you're a JS developer, you just also have this JSON-based rails primitive to build with. 00:37:05.720 |
So you could build quality gates into each step of the workflow, where if X, Y, Z is not 00:37:12.440 |
met (and X, Y, Z could be as determined by a model), do not proceed, or loop back. 00:37:20.660 |
And, like, stream the Datadog logs of the other models, and then have an ensemble decide 00:37:28.800 |
if the Claude Code prompts need to be changed or intervened on or whatever, and 00:37:34.620 |
then have something text you when it actually needs a human. 00:37:38.460 |
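The gate-and-loop pattern just described can be sketched outside N8N in a few lines. This is a minimal illustration, not N8N's actual node API: `runStep` and `modelApproves` are hypothetical stand-ins for a workflow step and a model-based judge.

```javascript
// Sketch of the quality-gate loop described above, independent of N8N.
// `runStep` and `modelApproves` are hypothetical placeholders for a
// workflow node and a model-based check.
function runWithQualityGate(runStep, modelApproves, { maxRetries = 3 } = {}) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const output = runStep(attempt);
    if (modelApproves(output)) {
      return { ok: true, output, attempt }; // gate passed: proceed
    }
    // gate failed: loop back and retry the step
  }
  return { ok: false, needsHuman: true }; // escalate: text a human
}

// Example with stubs: the step only "succeeds" on its second attempt.
const result = runWithQualityGate(
  (attempt) => ({ quality: attempt >= 2 ? "good" : "bad" }),
  (output) => output.quality === "good"
);
console.log(result.ok, result.attempt); // true 2
```

The "needs a human" branch is where the text-me notification mentioned above would hang off.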
So I guess step one would be setting cost limits. 00:37:46.260 |
So ideally, the place I want to go with this is... because the thing with flagship models 00:37:51.520 |
is they're powerful, but they're out of the box. 00:37:56.100 |
So if I have my collection of Datadog logs and my collection of N8N workflows, and I have 00:38:01.740 |
something like Kimi K2 or DeepSeek R1, then I can RL my internal model to be better at specifically that. 00:38:17.460 |
I'm telling you guys, it's here; it's just in pieces. 00:38:26.760 |
And anything works: you can use Composio, or you can use Make, or whatever. 00:38:33.720 |
It's just anything that sort of constrains it. 00:38:35.880 |
I think the way Emmanuel approaches it is he just goes straight raw code, and he's like, 00:38:39.960 |
make me some DSLs and then flip around the words in the DSLs, kind of situation. 00:38:45.360 |
I just use N8N because it's easy to self-host and I have a Coolify one-click for it. But 00:38:51.060 |
it's not that you necessarily need to use N8N either; it's this graphical representation... 00:39:06.900 |
I need to dig in more, but here's what I've been doing when using sub-agents. 00:39:12.960 |
So I use M code, but I guess Claude Code is the same. 00:39:15.420 |
I will ask it to design a diagram of how to parallelize subtasks. 00:39:22.200 |
And I use PlantUML because I think it's actually better suited for it. 00:39:27.720 |
I ask it to output a PlantUML diagram and then use that as an instruction. 00:39:32.540 |
And it allows you to keep the model much more... PlantUML. PlantUML... 00:39:40.860 |
Like the old days? I think leaving UML and all of that behind was the first, the first... 00:39:47.160 |
It's like Mermaid, but more tailored to software engineering. 00:39:50.940 |
So if you go look at activity diagrams, those are 00:39:55.680 |
the ones I usually use. See, scroll down a little bit. 00:40:08.640 |
So if you scroll down, they have this syntax down here, and that 00:40:14.760 |
allows you to define... it's a well-known notation that's been around for 00:40:19.620 |
like 20 years. But it doesn't really matter per se. 00:40:23.560 |
It's more that, because it's a DSL slash syntax to define control flows, 00:40:28.560 |
it's a really effective prompting technique compared to, 00:40:33.960 |
you know, even Markdown to-do lists, because this will, by default... 00:40:43.320 |
Let me see if I can share my screen, because it will become much more clear. 00:40:52.320 |
I'm not really used to how to do these things. 00:41:02.640 |
So let's say we have a big mess of a code base with lots of garbage. 00:41:10.080 |
I want to refactor it so that we first define a plan, like an architecture spec. 00:41:20.760 |
Then review it, go back if it's bad, then start implementing a prototype. 00:41:26.940 |
Then, based on the prototype, launch a series of parallel agents, each in their own worktree, to refactor. 00:41:45.960 |
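The flow just described could be written as a PlantUML activity diagram roughly like this (the step names are illustrative, not the actual prompt from the session):

```plantuml
@startuml
start
:Define refactoring plan (architecture spec);
repeat
  :Review plan;
  backward :Revise plan;
repeat while (Plan approved?) is (no)
->yes;
:Implement prototype;
fork
  :Refactor module A\n(agent in own worktree);
fork again
  :Refactor module B\n(agent in own worktree);
fork again
  :Refactor module C\n(agent in own worktree);
end fork
:Merge worktrees and verify;
stop
@enduml
```

The `repeat ... backward ... repeat while` loop is the "go back if it's bad" review step, and the `fork` block is the parallel-agents phase.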
I'm going to switch away from O3, but I would usually do this in O3. 00:41:49.740 |
You know, with the actual code base, not, like, the prompt's garbage code base. 00:41:58.980 |
Whatever the agent decides is necessary. 00:42:04.740 |
The goal is to trust it a lot. 00:42:09.340 |
Like, I don't want to verify; I want to see how far I can push this kind of stuff. 00:42:14.820 |
And, I mean, this is upfront work to tell it where to look. 00:42:24.120 |
But really my idea is, okay, how much can I do 00:42:31.080 |
so that I remove myself from the process? 00:42:35.400 |
And so this would be a pretty clean way of architecting something, right? 00:42:40.320 |
Where maybe this is a sub-agent that does the review. 00:42:43.280 |
Then you implement the prototype, you do this, and then you spawn the parallel refactors. 00:42:48.400 |
So don't do this in Sonnet, at least not with the heavily 00:42:54.180 |
neutered Sonnet, because that's a recipe for disaster. 00:43:02.160 |
But I don't know how we could simulate a run-through with agents, as if you were a coding agent with sub-agent tools. 00:43:19.220 |
I don't know, but this is a really effective prompting technique versus, you know, the more text-oriented way of doing things. 00:43:27.140 |
So I recommend trying it out, because you can use it not just for coding, but for anything that's like a workflow and an agent, instead of describing it in text, which is like... 00:43:39.520 |
...ish. And if you use code to describe it, it will prompt it too hard to write code. 00:43:44.260 |
But I found this intermediate diagram technique pretty good for prompting sub-agents. 00:43:49.360 |
So I've put a memory in my ChatGPT subscription so that whenever it thinks to output code, unless I was specifically asking for code, it should default to pseudocode. 00:44:03.760 |
Yeah, so I've been playing around with these different things. 00:44:07.660 |
I have something really interesting that I'm going to show as well, because it might give people some ideas. 00:44:12.700 |
So I've been playing around with these prompting techniques 00:44:19.600 |
that are kind of world-changing. "Use tmux" was a big one: it would always struggle running servers in the background, tailing their logs, and closing them properly. 00:44:29.380 |
You know, it would run a server but then not be able to kill it, and would kill other processes. 00:44:35.140 |
So just using tmux was a pretty good way to solve that. 00:44:38.440 |
And then for agents, what I think would be really interesting, and I haven't really tried it out yet 00:44:48.580 |
(man, I generate way too much stuff and madness), is using redis-cli to both synchronize sub-agents and let them communicate with each other. 00:44:58.900 |
Because they can store data, but they can also subscribe to event queues. 00:45:02.980 |
So sub-agents are able to communicate, saying, "I've done this and I've modified this file," and then other agents can just pull that. 00:45:10.280 |
So I had them create a JavaScript sandbox. 00:45:17.740 |
A set of... so there are low-level primitives for Redis, right, 00:45:22.960 |
which are cool, but a little bit brittle. 00:45:26.440 |
And then on top of those, I built a whole set of agent coordination things. 00:45:33.280 |
So this is the low-level stuff, but then agents can do this. 00:45:40.060 |
And I'm really curious to see, if you give the agents the command-line tools to do this, or the JavaScript methods to do this, how much they can start communicating. 00:45:51.520 |
Because they know what communication between agents looks like. 00:45:54.520 |
There's enough of that in the training corpus to say, well, I'm going to wait on this agent to have finished refactoring the email stuff before I move on. 00:46:03.400 |
And then they call a blocking tool and wait until the email agent says, "I'm done." 00:46:09.760 |
So I haven't played with this too much, but do you have agents hot-loaded and ready to fire off, or do you spin them up for each one? 00:46:21.940 |
So I have an MCP that's just executing JavaScript, but I haven't tried it out yet. 00:46:27.100 |
This is just an idea that I'm playing with. 00:46:28.960 |
I tried to give it shell scripts and say, call redis-cli to synchronize, and that worked in some instances. 00:46:36.100 |
And it was really cool to see them all use the same pub/sub and say, oh, it looks like this agent hasn't finished. 00:46:43.240 |
I went through an attempt to automate Claude Code with this before hooks were a thing, and it was a bit of a nightmare, 00:46:52.220 |
in my experience. But I'm sure there are better ways to do it now. 00:47:00.020 |
It wasn't robust enough. But one thing that we can try is, if we take this and go back to our diagram, right? 00:47:09.960 |
Like, if we rewind to here, and then annotate it with these APIs to show how to communicate 00:47:22.780 |
amongst agents, and add it to the flow diagram. 00:47:29.140 |
This is horrible prompting, but this kind of stuff seems to work pretty well. 00:47:41.340 |
Like, I want to find the structures that make prompting work. 00:47:44.840 |
Not making it up on the spot... it's too hard, too much pressure. 00:47:56.380 |
Once you prompt them with this and the graph, I think it's going to be really robust. 00:48:00.040 |
And yeah, I'm going to play more with it. 00:48:09.820 |
So it's agreed that you're going to mess with this one? 00:48:12.040 |
I've got to make sure I'm... I think next week, maybe. 00:48:26.200 |
I think there's so much untapped stuff in agents to actually build big code bases that don't... 00:48:32.500 |
Because refactoring a big code base is even more pattern matching than building one. 00:48:38.260 |
Have you done anything with graph databases for code base management, 00:48:46.060 |
where it sort of internally summarizes and internalizes that stuff into a graph database 00:48:53.260 |
for prompting and pulling lazy-loaded sections of it? 00:49:02.440 |
A little bit, by using this. I don't know how much I've shared in AI in Action, 00:49:07.720 |
but I've been doing this JavaScript-based thing where it's just, like, in JavaScript, you... 00:49:12.400 |
So that gives the model the option to store things, but also create JavaScript functions as needed. 00:49:24.000 |
I just said, you know, you have a database, you have JavaScript; please model the database. 00:49:29.280 |
And it did a graph database, kind of, right? 00:49:33.620 |
So I added a tree-sitter library into my JavaScript sandbox, and it started using that. 00:49:42.080 |
That's what I meant by, like, I have zero prompt effort. The only thing I need to... 00:49:49.820 |
If you don't put "write the code," it will not do it. 00:49:54.120 |
It's like: model the database; by the way, you can write code. 00:49:57.000 |
And then it will come up with a pretty good graph database on the spot. 00:50:15.900 |
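At its simplest, the kind of symbol graph the model improvised might look something like this. This is a hypothetical sketch, not the actual sandbox code: nodes are code symbols, edges are relationships, and `usedBy` is the "where is this symbol used?" query mentioned later in the discussion.

```javascript
// Minimal sketch of an in-sandbox graph database over code symbols.
// Illustrative only; the real sandbox implementation may differ.
class SymbolGraph {
  constructor() {
    this.nodes = new Map(); // name -> { name, kind }
    this.edges = [];        // { from, to, rel }
  }
  addNode(name, kind) {
    this.nodes.set(name, { name, kind });
  }
  addEdge(from, to, rel) {
    this.edges.push({ from, to, rel });
  }
  // "Where is this symbol used?"
  usedBy(name) {
    return this.edges
      .filter((e) => e.to === name && e.rel === "references")
      .map((e) => e.from);
  }
}

const g = new SymbolGraph();
g.addNode("sendEmail", "function");
g.addNode("QueueWorker", "class");
g.addNode("CronJob", "class");
g.addEdge("QueueWorker", "sendEmail", "references");
g.addEdge("CronJob", "sendEmail", "references");
console.log(g.usedBy("sendEmail")); // → [ 'QueueWorker', 'CronJob' ]
```

A tree-sitter pass would populate the nodes and edges from parsed source instead of by hand.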
That's what you shout at the computer screen? 00:50:22.620 |
Like "Jesus"; it used to be called experiments-js-web-server, but that's kind of... 00:50:28.760 |
So this is... it has a full REPL and all kinds of crazy stuff. 00:50:33.480 |
This is a little bit too over-vibed, I think. 00:50:37.020 |
But the other thing that I have is this Go-Go-Goja. 00:50:47.900 |
And so I cleaned up all my module engine, and I'm starting to add more and more modules in there. 00:50:54.920 |
Currently those are very, very simple. But in my big Vibes mess repo... 00:51:05.060 |
So this is all stuff built completely autonomously; like, this is code... 00:51:13.040 |
But I have an LSP interface. 00:51:18.300 |
I have a tree-sitter interface that works. 00:51:21.700 |
And then I have a text-apply interface that works as well. 00:51:31.060 |
It's really cool to be in JavaScript and say, oh, where's this symbol used? 00:51:39.400 |
Anyway, I'm really curious to see where all of this goes; there's so much to play with. 00:51:54.120 |
So in Neovim, if you use mcphub.nvim, you can make a native Lua MCP server that has access to the Neovim API, such that you can use things like Telescope 00:52:07.640 |
and Trouble to let the model just drive Neovim and bounce around between all the symbols and grab LSP outputs and all that. You know, I'm just not good at Neovim. 00:52:21.960 |
It's pretty token-inefficient, but (I tried a little bit) if you give the model tmux and then you say, run Vim and write macros, it will start using tmux to write macros to refactor the file. 00:52:33.980 |
It's really funny, and then it will read it back and forth. 00:52:39.860 |
Because it calls the shell tools all the fricking time. But it's really funny to see it. 00:52:47.720 |
And that's why the Lua MCP is really intriguing to me. 00:52:49.940 |
Because it's just going to tool-call to drive it. 00:52:55.920 |
What models are you guys working with that are using these MCPs effectively? 00:53:05.000 |
Opus 4 tends to be the one that I usually default to. 00:53:09.720 |
I think... or, I want to find an open-source solution for that. 00:53:24.620 |
Claude 4 seems to be the guy, at least when it comes to tool calling. 00:53:32.200 |
If Claude needs a model for something else, just give it a tool where it can call O3. 00:53:42.580 |
I've been getting better results than I expected by having 00:53:49.440 |
Claude Code call other models, like Gemini, for components of the workflow. 00:53:54.480 |
Like, Gemini has got that 1-million-token context window. 00:53:59.400 |
And so when there's a deep analysis that I want to kick off in the middle of a 00:54:02.880 |
component... but, you know, Claude's got that 200K limitation. 00:54:08.820 |
And the trick there is, Gemini has really good 00:54:13.860 |
throughput on its recall, even when its context window is full. 00:54:23.160 |
So what you want to do is have Gemini pull out all of the information 00:54:28.780 |
and make a new document that Claude can then read and do reasoning on. 00:54:33.940 |
Yeah, that's exactly the workflow. 00:54:35.900 |
Because, yeah, Gemini is great for summarizing and analyzing, but it's... 00:54:47.400 |
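The handoff described above (large-context model compresses, stronger reasoner works on the digest) reduces to a two-step pipeline. In this sketch both model calls are stubbed placeholders; `summarizeWithLargeContextModel` and `reasonWithClaude` are hypothetical names, not real SDK functions.

```javascript
// The large-context -> small-context handoff as a two-step pipeline.
// Both model calls are stubs; in practice they would call Gemini and
// Claude through their respective APIs.
async function analyzeLargeInput(bigText, question, models) {
  // Step 1: the large-context model compresses the input into a digest
  // that fits comfortably inside the smaller context window.
  const digest = await models.summarizeWithLargeContextModel(bigText, question);
  // Step 2: the stronger reasoning model works only on the digest.
  return models.reasonWithClaude(digest, question);
}

// Stub models for illustration:
const stubModels = {
  summarizeWithLargeContextModel: async (text, q) =>
    `DIGEST(${text.length} chars) re: ${q}`,
  reasonWithClaude: async (digest, q) => `ANSWER based on ${digest}`,
};
```

The point of the shape is that the raw input never enters the reasoner's context window, only the digest does.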
Do you have an MCP for that, or how would you do it? 00:54:53.480 |
So it depends on your workflow, frankly. 00:54:58.220 |
Are you using the hooks, or how do you structure it? I'm curious. 00:55:06.240 |
I'm curious if you'd be willing to try out something for me, because I use M code and 00:55:11.240 |
I don't have time to... I don't want to go into M code, but try this same approach 00:55:17.200 |
out, which is: if you start having a refactoring plan like 00:55:24.540 |
this one, also ask it to create the necessary hooks to do it. 00:55:28.040 |
So you can say, oh, every time you do this kind of thing, review with Gemini. 00:55:34.460 |
So basically, speaking out loud what you just said: then it generates this Mermaid diagram 00:55:39.460 |
with, you know, just pretty strong pulls. 00:55:47.000 |
But the thing I'm trying to YOLO-refactor right now... 00:55:49.900 |
Like, I took a huge code base and I was like, yeah, make this event-driven. 00:55:59.180 |
I got it. But definitely, I'm going to ping you when I have the time to actually explore. 00:56:10.700 |
What are your handles, Adam? 00:56:19.180 |
I've got to bounce, but that's currently my big thing; for the last two weeks it was... 00:56:29.140 |
I'm going to be like, oh, this needs to be event-driven. 00:56:33.040 |
It should be able to come up with a good architecture to do that. 00:56:36.400 |
Because it can; if I ask O3, it's going to give me something. 00:56:40.720 |
It's just applying that to the code base, which is also very tedious, kind of ad hoc. 00:56:48.840 |
It doesn't seem to be able to do it, but it feels eminently possible to do that part. 00:56:56.340 |
It's like, models are made to do this; it's just the way we currently... 00:57:03.240 |
Do you want, like, a patch-based or diff-based workflow? Or rather, where I'm 00:57:08.460 |
heading with it is, I think it's going to be patches, where... I think it's 00:57:14.140 |
going to be the ad hoc creation of the tools 00:57:18.500 |
that you need to do the thing that you need to do. 00:57:20.740 |
Because you need to adapt to certain systems, where 00:57:24.240 |
if you want to refactor, you need to find the right granularity to make sure you're not 00:57:28.600 |
breaking everything, but also not make it so small that you never actually progress. 00:57:33.020 |
Because the pattern I've seen is that Claude will start refactoring, right? 00:57:41.460 |
And then Sonnet, at some point, will get a lint error saying, oh, well, this symbol... 00:57:47.820 |
So it'll be like, oh shit, the symbol's broken. 00:57:51.020 |
And then it starts wrapping the stuff that actually needs to be removed. 00:57:55.040 |
And then it starts removing the wrapper and says, oh shit, the wrapper is gone. 00:58:00.380 |
And then some kind of architecture-astronaut pattern, 00:58:03.920 |
where it's going to be like, I'm going to make a mock interface wrapper wrapper. 00:58:09.500 |
Do you have prompts against anti-patterns like that? 00:58:13.280 |
Like, are you telling it not to go too hard? 00:58:16.280 |
And I think either it's M code, or it's the model as well. 00:58:21.200 |
It seems hard to fight against its training. 00:58:24.580 |
But I think something about the agentic part of it also makes it much easier to... 00:58:32.020 |
So I'm experimenting, but my trivial approaches were... 00:58:41.920 |
It doesn't feel like context-bloat errors. 00:58:45.860 |
It feels like, oh, I found something wrong and I reacted to it, which makes sense when... 00:58:53.380 |
Like, when you greenfield, you want to avoid having errors too early. But when you refactor, 00:58:58.360 |
you actually want to have errors early, until everything resolves, kind of. 00:59:13.580 |
My suspicion here is that I am, once again, trying to over-engineer. I'm suspecting that 00:59:18.580 |
periodic injections of, like, "no, do it again" might be kind of enough to just... 00:59:41.100 |
Like, oh, why is the user indicating that this is wrong? 00:59:45.260 |
Let me reason about what this agent is doing. 00:59:47.360 |
So now I need to look at its context window, and then I'm going to need to reconfigure that 00:59:50.840 |
context window in order for it to actually accomplish the goal. 00:59:54.920 |
So if I just periodically inject that: when there's one error, then there's 01:00:00.420 |
the same error, then there's the original error again, like, detect that. 01:00:08.540 |
That is at least one of the components of the waterfall that is in there. 01:00:14.720 |
But I'm always like, how can I make this dumber? 01:00:16.800 |
That's kind of the directive, in my experience: 01:00:25.480 |
to make it, to keep it simple, ideally. 01:00:30.300 |
But I feel like agents put us in a really weird spot where, depending on how 01:00:35.820 |
they interpret our simple commands, it can get pretty complicated pretty quick. 01:00:42.300 |
The other direction I'm kind of heading is, like, functional... 01:00:47.760 |
Because if I can just take a PRD that defines properties of the software... 01:00:56.400 |
I've got to get O3 to learn Erlang; or, I was going to say, I need... 01:01:01.200 |
So yeah, dude, I've been having a lot of good results 01:01:07.360 |
by just getting it broken out into PRDs for each feature. 01:01:12.960 |
And that's where I get the lazy loading of the edges of each module and how... 01:01:19.320 |
So I can keep it from getting too much context around the whole project while it's 01:01:24.060 |
focusing on any component. But that's not for a new refactor of... 01:01:30.240 |
Well, all you need to do is generate a PRD that currently describes that. 01:01:39.240 |
That's not so simple; it's a little bit more complicated when you're... 01:01:45.000 |
Because you've got to do that layered agentic review of each subcomponent. 01:01:49.860 |
And if it's not already modularized, that gets fun. 01:01:54.480 |
It's definitely a task. But yeah, I think the fundamental 01:01:59.980 |
primitive of a big list and then smaller to-do lists... 01:02:04.920 |
Claude has one; like, the little tools, like think 01:02:09.420 |
and scratchpad, are way more important than we think. 01:02:15.540 |
Well, think is dangerous, because if you don't actually have much already 01:02:19.620 |
loaded in the context window, Anthropic themselves will say: don't use think. 01:02:26.400 |
I made my own MCP server that is called think. 01:02:29.460 |
And what it does is, it runs from the paper "Algorithm of Thoughts." 01:02:34.020 |
It's like: choose an algorithm, run it down once, and then review it. 01:02:44.040 |
I've tried the sequential thinking MCP. 01:02:47.580 |
It's not much better than a checklist that Opus generates itself. 01:02:56.700 |
I'll see if... it might be broken right now, but it is... 01:03:03.300 |
I mean, you know, I'll put it in AI in Action. 01:03:07.920 |
It's MCP Reasoner, is what it's called. But I think it's got like five algorithms. 01:03:13.860 |
So the idea is: okay, sequentially think this way, and then evaluate. 01:03:22.380 |
If not, choose a different algorithm and run that. 01:03:33.840 |
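The choose-run-evaluate-retry loop just described can be sketched in a few lines. The algorithms and the evaluator here are stubs for illustration, not the real MCP Reasoner internals.

```javascript
// Sketch of "choose an algorithm, run it once, evaluate, else try another."
// Algorithms and evaluator are stubs, not the actual MCP Reasoner code.
function reasonWithFallback(problem, algorithms, evaluate) {
  for (const algo of algorithms) {
    const attempt = algo.run(problem);          // run the chosen algorithm once
    if (evaluate(attempt)) {
      return { algorithm: algo.name, attempt }; // evaluation passed
    }
    // evaluation failed: fall through to the next algorithm
  }
  return null; // every algorithm's attempt was rejected
}

// Example with stub algorithms: only "beam-search" passes the evaluator.
const algos = [
  { name: "depth-first", run: () => ({ score: 0.3 }) },
  { name: "beam-search", run: () => ({ score: 0.9 }) },
];
const picked = reasonWithFallback("p", algos, (a) => a.score > 0.5);
console.log(picked.algorithm); // → beam-search
```

The evaluate step is what separates this from a plain checklist: a failed attempt triggers a different strategy rather than a blind retry.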
Oh, you're muted again. But I think we're... Yeah, I really appreciate this. 01:03:43.980 |
This whole talk has been very informative for me. 01:03:48.060 |
I appreciate the counterpoints to my experience. 01:03:56.220 |
I'm curious to see if anything, like, unblocks or changes for you.