
of time. So it's going to be interesting in the context of a 45-minute demo to see how far we get with GPT-5. And then it's actually a good opportunity, though, to show this multi-tooled approach that I often use to explore different ideas. So next week, I'm doing a live stream with Zed of all teams showing my live coding workflow.
So this is kind of like a run-through of building the same thing that I want to build next week. And the thing I want to build is a Doodle clone that also has AI. So Doodle is a way to coordinate different people to have a meeting together. So you put in your availability, and then at the end, you can see the availability of all the people.
And it's kind of a pain in the ass to fill things out because I want to do things like, oh, every Sunday from three to five, but then not on this Sunday. And also during the week, I don't have time, like stuff like that is what I want to put in and have it populate like the calendar.
So that's kind of what I want to do. And usually I start that workflow. Well, one way of doing it, let's see if I can share my screen really nicely. Since we are going to show all of these different things. Can you, are you seeing my iPad? That's not what I wanted to show.
Let me see. I want to show my Linux. There we go. So one thing I would do is just use Manus, and Manus is kind of like starting it on auto-build. Since I want to build something with an LLM backend, I am going to use something that I had Manus build in a previous project, which is a mock OpenAI server.
That way I can test the API — I have it implement the real chat API. So what I'm going to prompt is: build a Doodle clone with AI functionality, in Golang, with a simplistic HTML interface, and use the mock server to test the messages backend. And then I think that's about it, right?
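For flavor, here's a minimal sketch of what that kind of mock OpenAI chat endpoint can look like in Go. The endpoint path and response shape follow the public chat completions API; the port and the canned reply are my own assumptions, not what Manus generated.

```go
// Minimal mock of the OpenAI chat completions endpoint, handy for testing a
// chat backend without burning real tokens. A sketch, not the Manus output.
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

type chatRequest struct {
	Model    string `json:"model"`
	Messages []struct {
		Role    string `json:"role"`
		Content string `json:"content"`
	} `json:"messages"`
}

func main() {
	http.HandleFunc("/v1/chat/completions", func(w http.ResponseWriter, r *http.Request) {
		var req chatRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Return a canned assistant reply in the same shape the real API uses.
		resp := map[string]any{
			"id":      "chatcmpl-mock",
			"object":  "chat.completion",
			"created": time.Now().Unix(),
			"model":   req.Model,
			"choices": []map[string]any{{
				"index":         0,
				"message":       map[string]string{"role": "assistant", "content": "mock reply"},
				"finish_reason": "stop",
			}},
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```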
Like, I'll just try to do something like this, and I'll let this run. This will cost me $10. It will give me some insight into what's possible to build. So in this case, this is not really useful, right? The Doodle clone — I already know that it needs design.
I need to think a little bit about it. But something I did just 30 minutes ago was: oh, I want a Go AST linting plugin for something — make me a self-contained project. It takes me about 30 seconds to put the prompt together, and then I can just let it run in the background, the way we'll do it here.
So we'll come back like in a little bit and then, um, we'll see where it goes. And so that's step one of my workflow, every kind of idea that I want to have, I just like put it into Manus. So when I'm like walking, I'm like, oh, wouldn't it be cool to have like a linting plugin for go?
I'll just like put it into my phone and then, um, let it go. So that was my, that's the first technique is just like put ideas into Manus. The output is usually like pretty fucking janky. Um, but sometimes there's like useful things to be gotten. So for example, the linting plugin that I showed before.
It kind of didn't build the first time because it didn't have a C compiler installed. So I installed the C compiler and tried it again. At the end, I got something that kind of worked, and I asked it to write a tutorial on how to set things up correctly, so that it tells you: oh, you need to install a C compiler.
You need to do X, Y, Z. Which is useful, right? That's a useful output for me. Even if I'm not going to use the plugin, I get a nice tutorial markdown thing that I can then put into Codex. So one way you can think about it is: you have an idea.
And then I put the idea into Manus. It's not that I want finished software to come out of Manus, because that never really works, but I want to get different things out of it — what I call useful outputs. And a useful output doesn't even need to compile.
Even how it failed is usually interesting information, because it tells you: oh yeah, if you let an agent run on its own, that's how it will fail. What's important is that you can further modify these useful outputs. So, say we have this useful output, and then I'll continue working on it with something else.
Like, say, Codex CLI, and then build the real output. So that's one way to basically use parallel sub-agents, right? If I have multiple ideas on how to do something, I'll just put all of them in there, or I'll try rerunning it. It costs me the time it takes to put a prompt in.
However, for projects where I know a little bit more — like the Doodle project, for example — the cool thing is that I think the training corpus already knows what Doodle is. It also knows what a web chat is. So I will not need to put too much prompting in to explain what I'm about to do, which is useful in this case.
Sometimes I do need to put a significant amount of effort into explaining what I want to do. In those cases, I will open ChatGPT and I will start talking through things. So actually, let me show how this kind of works, where I use ChatGPT — it's kind of an agent these days, right?
GPT-5 — I consider it an agent that takes like 15 minutes as well. So what is, for example, something that I wanted to do? Let me see. Something that doesn't have to do with secrets that I use at work. Oh, yeah: local Firecracker-style VMs.
This would be the kind of idea I have where I'm like, oh, how does this work? I'll literally put that in, and it will do some research and tell me how to do it. And then I will ask more on how to do certain things. I don't necessarily look at this too closely, but this is output that I can then copy-paste into another agent.
Similarly, I'll sometimes use ChatGPT with GPT-5 Pro. In this case I kind of use it more like Manus, except it doesn't have a VM — at least that's not how I understand it works. But here, for example, it's: oh, design a little app server that gets triggered when a new Google Meet transcript is published, that analyzes the transcript, blah, blah, blah.
It thought for 26 minutes, and then I get this kind of output. I don't know if it actually compiles it — I don't know if under the hood it actually has a VM, because this actually did compile. But now I have code to start with, and I can just put it into Codex and say, you know, this has been vetted by this big-ass model that has access to the web, that has access to all kinds of things.
And then I'll let these run in the background. I have tons of these. So that's a second way of doing things, where I parallelize work into what are basically sub-agents. And while it's doing these things, I will actually get my sketchbook out and just start thinking about what I want to build.
So in the case of Doodle, I do want to store my times in a database, so I do need a database. For me, that's SQLite. I do have multiple users; for the sake of brevity I'm not going to do login, I'm just going to say: oh, you have a user ID as a field.
And then we have a chatbot functionality, right? The chatbot functionality does need to execute some actions — it needs access to tools to modify this. So it's going to be: add available time, I don't know, add details.
I'm not sure how much I want to flesh this out on paper right now. I don't really want to flesh this out. I think I have everything in here. Oh, and then I have a UI, right, where it shows a calendar view with some meeting info.
So this would be the UI. Um, once I have like worked out kind of what I want to do, this can be like more or less complex. Um, it's, it's fun to do it on paper. You have time. Cause you know, the agents are taking like 15 minutes. Anyway, no need to rush or be stressed.
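While those agents run, the paper sketch translates into roughly this persistence shape. A minimal sketch in Go with SQLite, assuming a plain user_id field and per-event time slots — the table and column names are my guesses, not output from any of the agents.

```go
// Rough persistence sketch for the Doodle clone, matching the paper design:
// events, time slots, and a free-form user_id instead of real login.
// Table and column names are assumptions, not generated code.
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // needs cgo, i.e. a C compiler installed
)

const schema = `
CREATE TABLE IF NOT EXISTS events (
	id         INTEGER PRIMARY KEY AUTOINCREMENT,
	title      TEXT NOT NULL,
	created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS time_slots (
	id       INTEGER PRIMARY KEY AUTOINCREMENT,
	event_id INTEGER NOT NULL REFERENCES events(id),
	user_id  TEXT NOT NULL, -- no auth, just an identifier the user types in
	start_ts TIMESTAMP NOT NULL,
	end_ts   TIMESTAMP NOT NULL
);
`

func main() {
	db, err := sql.Open("sqlite3", "doodle.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if _, err := db.Exec(schema); err != nil {
		log.Fatal(err)
	}
	log.Println("schema ready")
}
```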
I'm giving you a presentation, so I'm going to leave it at about that-ish. I'm going to go back to my Linux. And I don't have visibility into the chat, so if someone wants to chime in from time to time, if there's something important, feel free to take over the mic.
So what I'll often do, especially when there's a UI involved, is go to Claude and say: make a minimal web UI for a Doodle clone with chatbot AI functionality to create and schedule and mark available times. And I'll just wait. Now I also, again, have like five minutes to do something.
The concept behind what I'm doing right now is that I will have the UI, which is kind of its own self-contained component. I will open another Claude — actually, I'll open ChatGPT, because it's not as insane — to tie everything together. I like using my standard pattern of: make a YAML DSL to represent AI-enhanced Doodle functionality.
Yeah, whatever. I'll just do that. Often I'll try two different models just to see: Claude is going to add like 800 things that I didn't ask for, which is fine. GPT-5 is going to be a little bit more structured. In this case, both are good, right?
I don't need much — I just want to get some ideas. Oh, it misunderstood Doodle. So: Doodle calendar scheduling functionality. I hope that Manus actually understood this. I didn't think about this, but — yeah, calendar options. Okay, so that's good. I didn't even think that this would not work.
So yeah, this one misunderstood it as well. Doodle calendar scheduling functionality. Why is it not cancelable? Doodle calendar scheduling functionality — with GPT-5 you actually have to be pretty careful about the prompts, which I'm not being here. Yeah. So Sonnet is barfing out this crazy stuff.
Right, because: location type hybrid with different options — I don't need hybrid locations right now. Time zones is a pretty good idea. I don't need tags. But you can tell this is the kind of stuff that Sonnet likes to add, or I'm like, wait, buffer time? Max meetings? All of this stuff is just crazy.
So often I'll just make it more minimal: design a DSL for actions that a chatbot would output to effect those actions. And — there's a voting system, there are scheduling rules... Sonnet is just, like... Jesus Christ. So yeah, this is good.
This is kind of the actions that I want, right? Create poll, add participants, suggest times, add time slot. Then this one I really don't need — like, what the heck. So: focus on purely creation plus time slot reservation. I'll gather ideas like this. The GPT-5 one is, I think, going to be much more useful.
So we have participants — I don't know if we want that or not. There are constraints, I don't know. Make it more minimal for an MVP. If I didn't feel so rushed — I feel a bit rushed because of the Discord stuff — at these points I would go back to my sketching on the iPad and just, you know, think a little bit about these things.
So this looks good enough, right? So now I would have this little DSL here. One thing that I would do is: oh, build a Doodle clone, API only, with a CLI tool, in Golang, using this representation; use SQLite for persistence; allow users to add their name.
Which is what I drew out on the iPad earlier. And then, let me think: document the API, allow for AI-triggered actions to be executed. And then here I'll just paste this minimal list of actions — with batching, notify, sync calendar... like, oh my God, Sonnet.
Yeah, so that is really... I don't know, it's getting worse and worse. Like, 3.5 was already bad, but this is just getting worse and worse. So: submit response, we want to have that — I'm going to paste this. Then we want to have add time slot.
And then we want to have add participant and create poll. There we go. So here again I'm going to use Manus in this case, but this is also something I could use Codex for.
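To make the DSL idea concrete: the minimal action set I kept — create poll, add participant, add time slot, submit response — could be parsed on the backend with Go structs along these lines. The YAML shape and field names are my own sketch, not what GPT-5 or Sonnet actually produced.

```go
// Sketch of the minimal chatbot action DSL: the model emits a YAML list of
// actions, the backend parses and dispatches them. Field names are my own
// guesses, not generated output.
package main

import (
	"fmt"
	"log"

	"gopkg.in/yaml.v3"
)

type Action struct {
	Type        string `yaml:"type"` // create_poll | add_participant | add_time_slot | submit_response
	Title       string `yaml:"title,omitempty"`
	Participant string `yaml:"participant,omitempty"`
	Start       string `yaml:"start,omitempty"`
	End         string `yaml:"end,omitempty"`
}

const example = `
- type: create_poll
  title: Team sync
- type: add_participant
  participant: manuel
- type: add_time_slot
  participant: manuel
  start: "2025-09-21T15:00:00Z"
  end: "2025-09-21T17:00:00Z"
`

func main() {
	var actions []Action
	if err := yaml.Unmarshal([]byte(example), &actions); err != nil {
		log.Fatal(err)
	}
	for _, a := range actions {
		fmt.Printf("dispatch %s\n", a.Type) // hand off to the real handler here
	}
}
```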
And for the sake of completeness, let's see what this guy's doing here. So this is looking nice. Yeah. See, this is probably going to be useful in the sense that we'll have a UI that we can just plop in and then potentially transform to have a proper backend — because obviously I didn't tell it any of the libraries that I want to use or anything.
Which is the next step. In this first phase, where I'm doing some ideation — just to show you — it's really just about gathering a lot of ideas, and then what I call these useful outputs.
I'm not trying to get like some meaningful software out of it. I'm just trying to get like useful outputs that I can then transform into, into proper software. Um, and as you can tell, I'm going to have like a ton of stuff and I don't really care about any of it.
If it doesn't work, who cares? Because in the second step, right here, I'm going to start curating and assessing things, maybe thinking a little bit more about some of these features. So, right, there's a feedback cycle here, until I transform this into more useful outputs, right?
Like something that's going to be a little bit more cleaned up, maybe. So this could be cleaner code — that's one type of transformation I could do; I won't get into what cleaner code looks like. It could be documents, so this could be tutorials on how to build something.
Because Manus, for example, will do a lot of trial and error and iterate on its own errors, but at the end I can say: okay, now that you've figured something out, make a clean document that we can then one-shot in the future.
We can have specs, obviously. We can have, say, a database schema, or an API, or even a little library — a little Go library. I guess that falls under cleaner code, so I'm going to remove that. We can have UI design screenshots.
Um, not sure what else we can get, but that's one kind of output that we can get out of this. Um, I mean, I guess they're all documents because we're on a computer. Um, and then the third step, so this is where the proper engineering comes in. So here, uh, again, out of these documents, I can feed them back into the first step, right?
Which is just generating stuff. But once I get into the third step, which I'm going to call engineering here, I'm going to spend a little bit more time asking: okay, what does it mean to transform this into a code base that I want to keep clean over longer periods of time?
The way I do it — because I have a lot of libraries in Go — is I will start using my existing libraries. So these are existing libraries, and then I'm going to take, on top of that, the code that we generated, these useful inputs,
as I'm going to call them. And out of this, I'm actually going to generate real code, right? Like the final, final code. This part is going to be much more document-driven, a more traditional workflow. But as you've seen, there's a lot of little design in it.
So I can take, for example, one part of the cleaner code — let me go a little bit deeper into what you can do with cleaner code, or how you transform these into reusable pieces. I consider three types of output. One output is a library.
So for example, imagine that I'm working on this Doodle thing and I want to integrate with Google Calendar. Manus is going to build some kind of Google Calendar integration. What I will do afterwards is say: oh, extract the Google Calendar part that Manus built into a clean, reusable library.
And that's something that Sonnet was really struggling with, but GPT-5 can do really well. So I will just give GPT-5 the big mess that Manus made and say: please extract a reusable Google Calendar library. That's one really nice way of doing things. Let's see if one of these is actually done.
Yeah. So we have a Doodle AI clone. I don't know if it works; for the sake of this demo, we're not going to look at it too closely. I'm going to create a new workspace with my libs: glazed, geppetto, pinocchio. I think those are the repos I want to have.
And, oh right, I'm going to call this doodle AI. In action — so this is just going to check out a couple of git repos into a workspace, right? Let me make this a little bit bigger. If we look here, I have my different repos checked out, and then I have a go.work file that is going to be useful.
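The go.work file is just Go's workspace mechanism: it stitches the checked-out repos into one module workspace so local changes in the libraries are picked up immediately. Something like this, with illustrative paths and Go version:

```go
// go.work — one workspace over the checked-out repos, so the vibes code can
// use local, unreleased versions of the libraries. Paths are illustrative.
go 1.23

use (
	./glazed
	./geppetto
	./pinocchio
	./vibes
)
```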
All the stuff that comes out of Manus, I'm going to put into the vibes repo, under today's date — we're the 19th. And then I'm going to look at the output of Manus. It's all contained in the same directory. I have no idea what's in there.
There's probably a bunch of junk; there are binaries in there. Manus is just, like, total "leave Claude alone for a little bit and then see what kind of mess it makes" territory. So, we have a database in here. I have no idea how this all works.
Actually, maybe I shouldn't have deleted the OpenAI mock server. But if we look at the README — yeah, so it's building that. I guess we can redo this. So let's see if it actually works. With the go.work set up, we're going to build this into the OpenAI mock server.
I guess that didn't work. Well, what kind of useful thing could we get out of this? I'm going to open Codex — hold on — I'm going to open Codex in my workspace, and I'm going to allow Codex to work in here. "We can't see your screen at the moment."
Oh, sorry. Yeah. Oh man. Thanks for saying that. I've been working on Linux for a bit. Uh, all right, here we are. So I downloaded the result from Manus. So if we look at the result from Manus, it did like a bunch of stuff in here. It has some pretty funky screenshots.
It looks like it also had an API. So actually the screenshots are pretty nice. So what I'm going to do here: I've checked it out into my vibes repository, into a doodle-ai-clone directory, and what I'm going to do now is add this.
So I'm just going to commit this and say, you know: initial import, Doodle AI clone. I have no idea if it even compiles or what it does, but it's in here. It has a nice index HTML template, it looks like. I'm skeptical that that is what we've seen, but it could be.
I mean, it seems pretty minimal, but maybe it's just that good at things. Maybe it's just that good at things, huh? Yeah. All right. So one thing I can do, for example, is use my LLM framework to actually do the LLM calls on this one, because I don't know which LLM functionality it implemented, or how.
It probably has an AI client that it built all on its own, but I already have a full setup to do these things. I actually wonder if it's worth just testing it. Yeah, actually, let's test that.
Does someone know the base URL of OpenAI? I guess Codex knows: update vibes/doodle-ai-clone to use the default OpenAI URL. Yeah, so I'll do that, and we'll come back to this one when it's done — and also copy some environment variables over.
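(For reference, the default base is https://api.openai.com/v1.) The nice thing about having the mock server around is that switching between it and the real API is one setting. A tiny sketch using github.com/sashabaranov/go-openai as an example client — the client library and env-var names are illustrative choices, not what the generated code uses:

```go
// Point the LLM client at either the real OpenAI API or the local mock by
// overriding the base URL. Library and env-var names here are assumptions.
package main

import (
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func newClient() *openai.Client {
	cfg := openai.DefaultConfig(os.Getenv("OPENAI_API_KEY"))
	// Default is https://api.openai.com/v1; set OPENAI_BASE_URL to something
	// like http://localhost:8080/v1 to hit the mock server instead.
	if base := os.Getenv("OPENAI_BASE_URL"); base != "" {
		cfg.BaseURL = base
	}
	return openai.NewClientWithConfig(cfg)
}

func main() {
	client := newClient()
	_ = client // wire this into the chat handler
}
```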
So that was one part, right? This is obviously not the finished product that I want, because I do want this forcing of the output, calling the actions, which I don't think Manus built. I never trust Manus — it will tell you it works great,
and then you're like, uh... But we can still look at how this is built and how it does its prompts and responses. So that was one solution. This one is the GPT-5 output, which came back with — oh, this was the Google Meet transcript one, sorry. So here we have our own little web app where we can create a new event.
This was the first — Monday at noon? I don't know, I'm going to try things out Monday night. There's no AI functionality here — oh, there's an AI assistant. Okay. "How can I assist you today?" Make a meeting. Hold on, hold on, what's going on — I can apparently only type one letter at a time.
So that's problematic. Make a... oh my God. What happens when I press this? I suppose this is broken; I'm not sure what's going on here. Because I know that Claude artifacts actually has real functionality — not this "I can only type one letter at a time" thing —
I can also implement a real Claude backend where I can use my key, since you can now do apps that call Claude for real. So I like the UI here; it gives me some nice ideas on how to do things. Here we have our YAML DSL —
we already looked at that one. Make it more minimal, with attendees. All right. And then here we have our crazy DSL. And then here we have the Manus thing — what did we ask Manus to do in here? Build Google clone... no — oh, I misspelled it.
Let's launch another one where I don't misspell it: make a Doodle clone — Doodle calendar scheduling. All right. Is this interesting so far? Is this getting a little bit too confusing? I only have 10 minutes left, so I definitely don't have enough time to get to a working solution.
Um, but this is kind of how I work, uh, day to day. We'll have these different ideas running. But actually, because I only input like a couple of lines into things, I will most of the time actually spend a pretty chill time just thinking things out on paper usually.
So one thing I would do, for example, is take these YAML DSLs and print them out and just draw on them: yeah, I like this part, I don't like this part. I don't like doing that on screen, because you have to scroll all the time and it's hard to do annotations.
So I'll build that, and then I'll have these little components that I then, at the end, push together. We'll have a library; another really, really useful piece of output you can have is the CLI tool —
because the CLI tool, as bad as it might be, is something that an agent will be able to use to iterate on the final product, as long as it has something to start and inspect itself with, right? So for example, a nice little CLI tool to start scheduling messages, or maybe a little CLI tool to trigger an LLM response, parse the output, and print out the parsed output. Those things will really allow the agent to interact with the system as we're building it, instead of, you know, using a Playwright MCP or having me input stuff into the web UI.
I'd rather expose an endpoint and present it so that it can be driven directly. For example, if I have a conversation with history, make a CLI tool that the model can call in one shot, but that can still be used to conduct a full history. There are many ways to build tools like that.
When the tool needs to keep track of things over time, you can either use tmux to let it run in the background, or you can store the state in SQLite and resume it each time, without having to burn all the tokens representing that state.
So if, for example, I, I want to test something with like a chat bot history, I would make a CLI tool where you start a session and then you just like append messages to this session. And you say like, oh, please do an inference on this conversation, right? And then that way the model can call it.
And each call finishes pretty quickly — it only outputs the tokens that it cares about, but it's still driving a long-running system.
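Something like this is what I mean — the conversation lives in SQLite, each invocation is one shot, and the agent never has to carry the history in its own context. Command names and schema are assumptions, not my actual tooling; the inference step is left as a placeholder.

```go
// One-shot CLI over a persistent chat session: history lives in SQLite, so an
// agent can append a message or dump the session in a single short call.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	if len(os.Args) < 3 {
		log.Fatal("usage: chatsession append <session> <role> <text> | show <session>")
	}
	db, err := sql.Open("sqlite3", "sessions.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS messages (
		session TEXT, role TEXT, content TEXT,
		created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)`); err != nil {
		log.Fatal(err)
	}

	switch os.Args[1] {
	case "append":
		if len(os.Args) < 5 {
			log.Fatal("usage: chatsession append <session> <role> <text>")
		}
		if _, err := db.Exec(`INSERT INTO messages (session, role, content) VALUES (?, ?, ?)`,
			os.Args[2], os.Args[3], os.Args[4]); err != nil {
			log.Fatal(err)
		}
		// A real tool would now load the full history, run one inference, and
		// print only the new assistant message.
	case "show":
		rows, err := db.Query(`SELECT role, content FROM messages WHERE session = ? ORDER BY rowid`, os.Args[2])
		if err != nil {
			log.Fatal(err)
		}
		defer rows.Close()
		for rows.Next() {
			var role, content string
			if err := rows.Scan(&role, &content); err != nil {
				log.Fatal(err)
			}
			fmt.Printf("%s: %s\n", role, content)
		}
	default:
		log.Fatalf("unknown command %q", os.Args[1])
	}
}
```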
Another part is obviously REST APIs, or, if you have something that's event-driven, you can do event-driven APIs — like event definitions. That's a big part of my terminal UI framework. You can also build what I call plugins: basically, expose your functionality to a third-party API. For example, an MCP tool is something you could add.
And to do that, for example, I have a library where you literally give it a Go function and it makes an MCP server out of it. So you can take whatever random garbage Manus gave you and say, you know, make an MCP server out of it.
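That library isn't shown here, but with a general-purpose Go MCP library — for example github.com/mark3labs/mcp-go — the same idea looks roughly like this. The tool name and parameters are invented, and the option/handler signatures are from memory and may differ between library versions, so treat it as a shape, not a reference.

```go
// Expose one plain function (add a time slot) as an MCP tool over stdio, so an
// editor or agent can call it without any CLI. Sketch only; API details may
// differ from the mcp-go version you actually use.
package main

import (
	"context"
	"log"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

func main() {
	s := server.NewMCPServer("doodle-tools", "0.0.1")

	tool := mcp.NewTool("add_time_slot",
		mcp.WithDescription("Add an available time slot for a user"),
		mcp.WithString("user_id", mcp.Required()),
		mcp.WithString("start", mcp.Required()),
		mcp.WithString("end", mcp.Required()),
	)

	s.AddTool(tool, func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		// This is where the wrapped function would run, e.g. writing the slot
		// into the SQLite database from the earlier sketch.
		return mcp.NewToolResultText("time slot recorded"), nil
	})

	if err := server.ServeStdio(s); err != nil {
		log.Fatal(err)
	}
}
```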
With my library it takes like two lines of code, and then suddenly you can use it in Cursor to do things without even building a CLI tool. Yeah, this is interesting. Oh, I'm switching back to Linux — I'm going to keep it on the side. I think I'm starting to be at the end of what I wanted to show, which is, you know, that each of these little nuggets here now is kind of something that is starting to look like finished software that I can reuse in the future.
But if you look at the way we got there, from here to here to here, all of this is kind of greenfield. It doesn't need to really understand an existing code base; it just needs to have a document explaining how to write in a certain style. And I can show some of these documents that I use over and over.
So let me show one document that I use over and over and over. If I go to glazed, which is my library for command-line tools, one document that I use all the time is called build-first-command.
This is basically a full run-through of how to build a CLI tool using my framework. So if I go to Codex, look at this, and say: make a CLI tool (see build-first-command) to interact with the SQLite database to store and read doodles,
and save the CLI tool under doodle-ai-clone as the doodle CLI command — it now has enough information in it to not only make a CLI tool, but to actually start using my library to do things like structured output, defining parameters, defining flags, all of these things.
And once this has been packaged as a CLI tool using my framework, I can then suddenly use it to build further things that use my framework. Right. Like I can start integrating it. Um, so step-by-step I just transform things, but as soon as I have the CLI tool, it's able to inspect its own doodle database.
So I can say: oh, add an endpoint to do X, Y, Z, and it will be able to make the calls, but it's also going to be able to use the CLI to double-check that things worked well. So step by step I kind of build my whole end product out.
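The shape of that CLI — not my glazed framework, just a generic stand-in — is something like: dump what's in the doodle database as JSON, so the agent that just added an endpoint can immediately verify that its write actually landed. The schema follows the earlier sketch, not the generated code.

```go
// Minimal stand-in for the doodle CLI's "inspect" path: list stored time slots
// as JSON so an agent can verify its own writes. Generic sketch, not a glazed
// command; table names follow the earlier schema sketch.
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

type slot struct {
	EventID int    `json:"event_id"`
	UserID  string `json:"user_id"`
	Start   string `json:"start"`
	End     string `json:"end"`
}

func main() {
	db, err := sql.Open("sqlite3", "doodle.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(`SELECT event_id, user_id, start_ts, end_ts FROM time_slots`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	var slots []slot
	for rows.Next() {
		var s slot
		if err := rows.Scan(&s.EventID, &s.UserID, &s.Start, &s.End); err != nil {
			log.Fatal(err)
		}
		slots = append(slots, s)
	}
	// Structured output on stdout is easy for an agent to parse and check.
	if err := json.NewEncoder(os.Stdout).Encode(slots); err != nil {
		log.Fatal(err)
	}
}
```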
And that's why the planning step is, I think, so important: I need to have a sense of all the little nuggets that I want to have to get my final product going. Part of it is just software architecture experience, but the other part is also just playing with these things all the time.
Cause you need to see where the agent capabilities are at, um, in terms of what they do. So for example, transforming a library was impossible with Sonnet. I just like, couldn't get reliable results cause Sonnet would just like add 15,000 features all the time, would run into an issue, create like 15 test scripts.
And at the end I just ended up with three half-done libraries, which was not really useful. But now, with GPT-5, I actually can take three half-done libraries and say, please make a nice one, and trust that in 15 minutes I'll have a nice library.
I'm going to stop here to see if there are questions, if there are people who have similar workflows, or if you just want me to continue showing some things. — Well, I guess the kind of problem that I'm running into is: in an ideal world, if I'm onboarding to a code base that's already brownfield,
then the code base has been built kind of with this sort of structure in mind, and, you know, we kind of have a somewhat common tool set. But we do not live in an ideal world. So if I'm in a code base that's sort of normally structured, that's kind of sloppy — basically somebody's been driving Claude Code in it, but is not a super Claude Code power user —
and I'm trying to grok the good onboarding flow: okay, how do I kind of re-architect this thing, at least to some degree? — Do you have an example of a code base? Do you have something concrete in mind? Because I can only think of Wanderers right now; I'm in a different brain mode.
— Yeah, unfortunately I don't have a good OSS one in mind, but we could probably just pick a random one. Oh, I got a good one. I got a good one. What about this one?
I don't know who that is, but — it's like four, yeah. I've got a big giant ancient Python thing that my bot runs. — All right, that sounds great. Give us that. — It's just a standard Red bot, a Discord bot. — Okay. Let's see if I can get my browser up.
There we go. Okay, here — I'll just drop it in the Zoom chat. — Yeah, just paste the link. So it's Cog-Creators — Cog-Creators, Red-DiscordBot. — Yeah, you got it. — Yeah, so this one is well documented and not super sloppy, but it is big and giant and full of ancient Python.
So I'm going to try some YOLO here, because this is something I've never tried, but I'm like, okay, can you just do something with this? Let's do it. And the other thing I would do is: oh, analyze this and write a nice onboarding book. This is something I love doing.
It's unclear whether deep research is able to read source code or not — I never really got that to work, so I wasn't going to try it this time around. But I'm going to try deep research again, because last time it wasn't able to read through GitHub; it got blocked or something.
And then I'm going to try the same with ChatGPT 5 Pro. — One thing to throw in on this one: the code base that I'm trying to get into specifically has a bunch of old MD files in it that reference the old structure of the code base.
So when I'm sending LLMs into it, they're getting tripped up by all these little spec docs with old scripts and stuff. It's a very in-progress kind of thing, as opposed to this one, where, as far as I know, the docs are fairly accurate.
— So I'm suspecting that GPT-5 will be able to figure that out and then just read the docs and give them to you, but we will see. Yeah. So this is the next way of doing it, but it really plays into the concept of what I'm saying, right?
We have some kind of input that obviously is useful, because it's a project — you could imagine that, in an ideal world, this was generated by an AI and you're like, oh, I want to do something with this. I'm actually not really sure of the quality.
I'm not really sure how to get into it. It's missing a nice onboarding thing; I don't fully trust it necessarily. So what I will do here is YOLO this now. I mean, we could run it, but we could also say, you know: analyze this code base, looking for AI slop; identify entry points for a new intern joining the team.
Right — write a full analysis document. This is going to be pretty terse. And the way I usually store those is with a date, 2025-09-19: full analysis of the code base for quality and for an intern to start, adding all the context necessary to get started and understand what this is about.
I'm not great at this prompting, but this has worked fairly well for me. GPT-5 is really terse in the documentation, so usually once GPT-5 has done a skeleton, I'll switch to Sonnet and say: please, Sonnet, fill this up with all the details you can find. — I go in the opposite direction, where I will send Sonnet out to write all the documentation,
and then I'll point GPT-5 at it and be like, hey, this is what Sonnet gave us, can you fix it? — Yeah, I think your way might actually be the better way; I'm not fully sure. So we'll see. So now I have — how many did I start?
I started like five or so. One, two — then GPT-5 Pro. Did it even start? One of those didn't start. So this is the deep research. Oh, and this is Pro, and it hasn't updated its title yet. So we have one, two, we have Manus, and then we have this.
And I know all of these are going to take like 15 to 20 minutes. Right. Um, so at this point I can actually chill out and go either read the code by myself or actually think about what I want to build with this bot overall. And just like take my time and design things out.
Because this is going to burn my brain when I go back to it, right? I'll have like four interns telling me: oh look, I analyzed the code base, here's some documentation. So I don't need to start more agents — I don't need to start things in parallel or start editing code or whatever.
It's more about: I want these useful documents to understand what this is all about. — Yeah, that makes sense. I've been using a similar approach; to me, this is reading as an ensemble-agent approach.
Right. I have CodeCompanion in Neovim and I just have an ensemble hotkey, but the way I usually use that is: I'll have had a few attempts from coding agents and it's still very broken, so I'll be like, okay, now I need to pull out the big guns — because ensembling is expensive.
But I guess ensembling is cheap now, if you put enough of them in enough agent harnesses. The deep research angle is interesting. — Yeah, I mean, I have the $200 per month plan, right? I've never hit a limit with it in anything that I do.
So I'm just like... And Sonnet is expensive, right? Like, Manus plus Sonnet is going to be 10 bucks per run. — For this kind of general flow, which makes sense to me, because in both cases, brownfield and greenfield, essentially what you do is just YOLO mode at first:
give me a bunch of stuff — I know it's going to be slop, but maybe there's going to be some good stuff in it — collate it, then pull the good stuff out of it. What I'm curious about is how much benefit you get out of GPT-5 Pro in this process.
Is it "I couldn't do this without Pro", or is it more "Pro is a nice-to-have, it's going to take 30 minutes, I'm just throwing it in there anyway" kind of thing? — Good question. I haven't played too much with GPT-5 Pro. Usually, when I have something where I know it needs to read the docs quite a bit —
because Codex CLI is not great at reading the internet, even with the search functionality — where I know, oh, I want this thing to go back and forth with the online docs and all of that, I've started using GPT-5. I also use it a lot for critiquing things,
but usually less code stuff. Let me see... Yeah, so this was a good example of 5 Pro, where it just compiled from the start — which required, right, analyzing some API and understanding how to do these things and whatever.
I've had really, really good results having it analyze scientific papers. So if I do have something that requires that — actually, I could try the dust detection algorithms. Let's try that with GPT-5 Pro; that's something I would use Pro for: research dust cleanup algorithms to clean film photo scans and implement them,
do research on the literature, find implementations online, and write me a full report. I don't know exactly how to prompt GPT-5 Pro either, I must say, and it's so opaque that I don't know — it's going to do stuff. If someone here is using GPT-5 Pro, I'd love to hear more takes.
This one is finished, right? So this would be a good time to — I will often print this out, to be honest, because I like paper. But if we don't print it out and just look at it... oh man... there we go. This will give me inputs to the architecture.
What's really useful about this is that it's kind of a starting point for everything in the future. Every time I prompt an LLM now, it actually mentions which file paths, which commands to run. It's a really useful starting document to always put into your context.
It's a more useful AGENTS.md, in a way, right? Because it has more intent behind it. Compare this document — which shows you what the entry points are, what the files are, which commands you use for X, Y, Z — with AGENTS.md: AGENTS.md is kind of a very boring "oh, you've got a folder that contains...", right?
It explains less about what the things are for. And because you prompt with what you want to have, the prompt isn't going to match AGENTS.md very strongly, but it will match much more strongly against the intern document, because that document explains the project in terms of using the project.
It's not explaining the project in terms of a generic AGENTS.md prompt. So that's a really useful technique as the first step when you have a big code base: just be like, oh, explain this code base for someone who's new to the project — which usually does a better job than whatever the built-in prompt for this stuff is.
So, now that we have it, you would get to the second step, which is: oh, can you run it? Can you build this? Which is often not that easy. — I'm curious, do you clear context between those two? Where it's like, hey... — Oh no. Interesting, interesting.
The thing that I would think — or how I might approach that — is... oh my God, I hate OBS. We're at the end. Sorry, I'm not a good host. I guess, to wrap up, I'll land the plane real quick.
The way that I might approach that is: if I'm writing an intern onboarding doc, the way I might test that doc is, after the model writes doc number one, you clear it out and then see if model number two can figure it out. — Okay, yeah. So in this case there's no clearing; we're going to chat all the time.
And in this case I would take it a step further, which is: oh, run the project. Because at this point it still doesn't run; it probably needs more information on what to do. I'll let it burn all its tokens failing to run itself, installing libraries, doing all of that stuff.
And then I'll have the final version of the thing, right? And then you can kind of test it with a fresh agent: look, is this enough? And probably what I would do in your case is, once I have the thing running, I would say: build a bunch of CLI tools so that the next agent can actually interact with the system.
Right. And once I have these two things — documentation of the code base, at least where to look for things, and getting it running — plus a bunch of tools that allow me to interact with things without having to finesse a web UI or whatever,
Um, then I can start like either refactoring the thing or starting to add functionality. And I'm kind of on my way at that point. Right. Um, all right. Uh, I mean, I guess that's it. Uh, it's a small group today. Right. I'm switching off of OBS after this. I don't know how much of the audio I lost, but I definitely lost some of it.
So — when are you doing yours? I think this is being recorded. I think on Tuesday. Yeah, for sure, I think on Tuesday. So I'll prepare a little bit better so that I have a timeline of what to show for the Doodle clone.
— Yeah, well, if that's the real deal, then I'll see if I can swing by and catch that, just so I have continuity in my non-continuous upload schedule. But yeah, I guess — the frickin' GG, everybody. Good stuff. Cool.
Thanks. Cool. Later. Bye. Bye. Thank you.