
[AI in Action 12 Sept] Building a Scalable Discord Scheduling Bot with AI Agents


Chapters

00:00 Introduction to Agent-Driven Development and the Discord Bot
00:54 The Concept of Agent Chassis and Challenges
01:21 Designing for Testability and Incremental Builds
02:26 User Experience and Interaction Flows in Discord
03:07 Addressing Security and Permissions with Agent Roles
03:39 The Role of T-MUX in Managing Agent Execution and State
04:00 The Importance of Well-Defined APIs and API Contracts
04:25 Orchestration and Monitoring of Agent Workflows
05:10 Demonstrating Real-time Git Repository Tracking with a Web UI
06:20 Challenges with Shell Scripts vs. Structured Code Generation
07:10 Towards a More Robust and Maintainable Agent Architecture

Transcript

Yeah, welcome. So Flo, you were proposing a Codex CLI day. I was figuring we could try doing that, hacking on the bot here, though then I realized that if I want to hack on it, I need to spend the time to set up all the API keys and whatever.

Yeah, it's definitely a pain, for sure. I think there's like a lot of unappreciated work that went on on behalf of David to get that stuff set up, because like I tried myself and it's definitely a pain. But I think Codex CLI could help. I also think it's just been getting so much praise lately.

It's funny because like Oliver called it. He was like, yeah, in a couple of months, like all of this scaffolding that I created around Claude Code won't matter. And at the moment, I didn't really see how. But now it's like GPT-5 is actually really good. I think a lot of the scaffolding can still be useful.

Like I built out a little app, I can show this maybe, a little app using Codex to pull down a bunch of resumes from Greenhouse and then spawn off sub agents to go and run them. It would run those through either Codex or, later, I set it up to use Cursor, because I had more budget there. But yeah, it's pretty good.

How do you how do you like the results like the output of the models that they go through that scaffolding? I mean, GPT-5 is great. I don't know how anybody would say it. Like it's great. It's a it's a step function improvement when it comes to writing software. Yeah, there's I realized I was putting up with so many little weird imperfections in the last generation of models.

I mean, the current generation, like it's still state of the art, I guess, but there's so many little things that you don't. I spent like maybe the last day and a half, it's like hardcore, Cursor open for like 16 hours a day type stuff. And yeah, GPT-5 is really good.

Like I didn't I didn't realize it was that good, but it's it's really actually. Yeah, no, it's it's like I hate on it so much. I can't like I see a freaking emoji. I'm like. Well, which is amazing because I used to love Sonnet. Yeah, it was like, I mean, it's still good for some things, but it's like a traumatic response at this point of like dealing with Sonnet Slop.

I think it's easy to forget how fast the progress is, but I mean, I've been on GPT-5 high in Cursor for like three weeks. I've never changed. Are you guys doing the Claude Code sub agent where the sub agent's only tool is bash and the only command it's allowed to run is like Cursor agent or Codex or whatever?
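The nesting idea described here, a sub agent whose only permitted action is launching another agent CLI, can be sketched as a small allowlist check. This is an illustrative sketch, not how Claude Code enforces tool permissions; the `ALLOWED_COMMANDS` set and the `check_command` helper are hypothetical names introduced for the example.

```python
import shlex

# Hypothetical allowlist: the only executables the nested agent may launch.
ALLOWED_COMMANDS = {"codex", "cursor-agent"}


def check_command(command_line: str) -> list[str]:
    """Return the parsed argv if the executable is allowlisted, else raise."""
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {command_line!r}")
    # The caller should run argv without a shell, so characters such as
    # ';' or '&&' are passed as plain arguments rather than interpreted.
    return argv
```

For example, `check_command("codex exec 'write a plan'")` returns the argv list, while `check_command("rm -rf /")` raises `PermissionError`.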

You take the other chassis and you nest them. Actually, I do use Sonnet to write plans for GPT-5, which is like the best of both worlds for many things. Yeah, because I'm, yeah, I'm like currently thinking about how to. Because I'm like. I think Claude Code's chassis is the most full-featured, so I'm like, okay, I can have.

If I have like my nested CLAUDE.md files, they're going to work the way that I expect them to work. If I'm running these other ones, if I'm running all the other ones headless, then I only have to worry about like one config kind of methodology, I guess.

So I'm like, but I'm like, is this the right chassis? Like should Claude Code be the top level here, or something else? I'm not sure. So unless somebody else is excited to drive. I have not gotten to code all day because I've been stuck in catch-up mode meetings.

I'd be happy to just throw Codex at it. I pulled down the repo and I told it to set up an AGENTS.md. I've set up Mongo and an OpenRouter key. And what I was thinking actually is, since I don't want to mess with the whole Discord stuff.

My first thing would be to tell it to build me a CLI that lets me interact with it without Discord. And we'll just see where it goes. So maybe we can do that. So what this is doing right now is it's building the AGENTS.md, and that's kind of slow. And that is a sort of challenge when it comes to live coding these things.

GPT-5 is great, but it will do this where it just like sits here for a long time. And we're hanging out, and then, you know, 10 minutes later, it'll come back with something brilliant. But it's not the most demo-friendly model. Well, great discussion point, right?

What do you guys do when you're waiting for GPT-5 to finish its shtick? Actually, that's something I could share. I would need to set up my Zoom so I can share screen. But I use the time. Like often you can actually let some run for like 10, 15 minutes, right?

Which is like the length of a Pomodoro, if you know the concept. So I'll usually like open a sketchbook and just like start thinking for myself for 15 minutes. Cause I can trust GPT-5 to come back with like useful results, right? It's not like Claude or Sonnet that you need to watch a little bit more carefully, where you're like, what are you, what are you doing?

Um, so I really find this like 10, 15 minutes of deep thinking for myself is really valuable. Interesting. The, um, and I I'm presuming, so if what you're doing there is I'm presuming you're thinking about the project that you're working on. Yeah. Um, so I'm curious what, yeah, what at that point do your specs look like if you're still sort of noodling?

Um, I, I mean, I've always done the noodling, like building software on paper. Well, mostly, I don't know. I'll just like sketch out different subsystems or ideas. And by doing that on paper is like how I get like maybe a better understanding and also, right? Like the different, basically with these agents, what you want to build is, is you want to design your software such that it can be built incrementally.

in blocks of 15 minutes' worth of code. Yeah. Because, and then building these little 15-minute things so that they can be built one-shot by a model, right? Like you get a sense of how much you can ask out of a model. And then one of those 15-minute jobs can be integrating two of these prepared blocks.

So often it mirrors like existing software. But for example, building a CLI tool to examine something, right? Like that's kind of a 15-minute task for a model. And this tool might not even be that important in the bigger scheme of things, but it is an important building block for the LLM to do another type of 15-minute job.

Like for example, integrating two APIs: if it has a good CLI tool to do that, then it's able to do it in 15 minutes. If it doesn't have the CLI tool to test the API integration, then you're kind of stuck in a human loop. So often I'll just sketch out these like smaller sub-features.

And that's actually what I want to show when, once I figure out my screen sharing. Yeah. So like, while Claude is working, the idea is like a thing, you know, you can think about general software architecture, but you can also kind of think about the, so like, okay. In order for Claude to do this, it's going to need a tool.

So I should have, you know, current Claude or a sub agent start building this little CLI tool. That's going to help the next agent build this part of the architecture kind of thing. You're muted, Manuel. I see you're talking. Yeah. Something, something like that. Let me, let me, I'll, I'll come back.

Sure. So I'll go while he's doing that. So I've got it. Oh, it did it finished. So I'm going to actually look at what it did. You've adopted the T temp nomenclature. Oh, yeah. I mean, I, Manuel's on my team. So I just like follow his lead on like everything.

But here's what we do. So we want, let's just scan what this is looking like. So we want to give a CLI access. Reuses existing schedule history, et cetera. Um, this looks reasonable. Okay. So it's got some domains. Build out some adapters. Okay. Sure. Uh, proposed files and scripts.

Um. I mostly just want it to do. I don't actually want these structured interactions because I want, I mean, these are fine, but I actually want it to really focus on mirroring the same interaction style because I want to be able to test it. So I'm going to course-correct it.

Um, so adjust the implementation. I don't, uh, care as much about the structured, um, as I want to be able to test the same way the Discord user would. So prioritize the chat interactions, allow, uh, threading, um, or expose threading, uh, and maybe expose @mentions or something like that.
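A chat-style test harness along the lines being requested here might start by turning a typed line into a message with a thread and a list of mentions. The sketch below is an assumption about what such a simulator could look like, not the bot's actual code; the `#thread` prefix convention, `SimMessage`, and `parse_line` are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

MENTION = re.compile(r"@(\w+)")


@dataclass
class SimMessage:
    """A simulated chat message, loosely shaped like a Discord message event."""
    author: str
    content: str
    thread: Optional[str] = None
    mentions: list = field(default_factory=list)


def parse_line(author: str, line: str) -> SimMessage:
    """Parse '#thread text with @mentions' into a SimMessage."""
    thread = None
    if line.startswith("#"):
        # Hypothetical convention: a leading '#name' selects the thread.
        head, _, line = line.partition(" ")
        thread = head[1:]
    return SimMessage(author, line, thread, MENTION.findall(line))
```

Typing `#standup @bot schedule me for Friday` would then produce a message in the `standup` thread that mentions `bot`, which is the same information a Discord event would carry to the handler.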

Um, essentially just trying to guide the thing into what, what do I actually want here? Which is, I want to be able to fix a bug I ran into in discord without me having to figure out the bullshit that is discord tokens. So go ahead. Um, what prompted that prompt?

Like, what did you see that? Cause I missed that part. What did you see in the. Yep. Yeah. So I was digging through here and it said proposed. I was kind of going through and it says proposed files and scripts and it's got this. Oh, here's my name of my CLI.

And it's like, Oh, schedule this way, schedule this way. Um, and it's got, uh, these examples. And I'm like, Oh, this is, this is providing very structured, um, CLI options, which makes sense. If what I was trying to do was build programming, building blocks, which often is what I'm trying to do in a CLI.

Uh, but it's not actually what I'm trying to do in this particular case. So like, I see why the model went there because that's how I would normally design a CLI. But here I'm much more interested in how do I replicate discord without discord? Um, that would might actually push us to should we build a TUI, but, um, you know, a terminal UI that actually, you know, replicates the different thread.

Um, things, but okay. So it claims it updated it. So let's take a look at this. And, uh, with regard to the TUI, we could do the classic Manuel: use SQLite and use tmux panes to give us a little dashboard thing. Um, so actually here, interestingly, it is already giving me a REPL, which is halfway to a TUI.

Uh, that's what it's looking for. Um, so sure. Like this might not be terrible. Maybe this is a good start. So we'll have a REPL entry. It's got a Discord sim. Yeah, sure. Let's see how it does. Um, so I tend to, now that I've got a version that I think is nice.

Uh, my habit is then I will, um, commit that because that is actually the fundamental building block. And if I want to throw things away and come back to that, I will, um, oops. I hate the way that the zoom screen share is actually covering some of my thing.

Um, Um, okay. And then I let it go. Um, so that's going to take a while. Yeah, that's my, that's my cue to take over while this is running. Awesome. And then once mine, mine is running, you'll take back. Sounds great. Uh, do I need to stop sharing for you to do that?

Uh, yeah. So just to show, this is kind of what my sketchbooks look like when I built my stuff, but that's, I don't know. I've always done that. So this allows me to, I don't know, come up with ideas. Um, but let me see, let me share my Linux.

All right. Can you read that? Is that readable? Yeah. So one thing I'll often do, similar to like the Codex CLI runs or whatever, is I just throw stuff into Manus. Um, right. So I'll, I don't know. I had like an idea about a streaming Go YAML parser.

I'll just like literally just ask it, like, please build this like idea that I have. And then I'll come back and I'll see if it works or not. So I'll often do that. And those are like more like little ideas of stuff that I, that I'm interested in. Like, I don't know, SQLite vector extension.

I've tried this like 15 times. Um, another thing that I'll do while I'm, you know, writing in my sketchbook and I like to have ideas or while anytime I have like some kind of idea, I'll be looking for things. Um, and this morning I had, I was building a terminal tool and I wanted to have like pretty output.

Right. So what I did here, um, I was like, Oh, I want to design a little go library to help building CLI tools with colorful output to make developer tools. Um, and then I just asked it to make like a couple of API sketches. Um, and out of those three designs, I just took that straight output from chatgpt.com.

Cause it has like web search and all that kind of stuff. Um, and I pasted it into, um, into Cursor to build, right? And then it will run for 15 minutes. I have time to do something else. Um, I did that during standup this morning, Kevin. Um, and because I know that it will like usually finish with success.

I'll just like start iterating on it. And then like, this is the kind of stuff that I get now, which is like, you know, like a fully. Async kind of thing that can do sub tasks. And that's a good building block to now be integrated into a bigger construct.

Um, so I'll have all these like experiments where I just, I don't care about the code. Right. I just want to see, does it work? Does it like look like something I want? And now that I have this little tool here, I will actually use GPT-5 for a 15-minute run to kind of integrate it into a separate library.

So let's do that. And then it's back to Kevin. Um, so it's, uh, make design A, which is like my first design, um, a package in Bobatea, and Bobatea is like my library where I have all kinds of different little terminal UI features. Um, so usually I'll do this: first analyze Bobatea and figure out how the other components are

designed and where the docs are, then follow those guidelines and style. I don't know. Pretty messy, but usually this is pretty good. Okay. Yeah. I'll just show you what Bobatea is. Yikes. Those are the kinds of things that I build in 15-minute increments. Um, there's a bunch of components here, like a file picker or a REPL, right?

Cause a REPL is something I use over and over. So now I have a REPL library that comes in really handy. Cause I can say like, oh, make a REPL. Like, by the way, look at this package. It already exists. Um, and the REPL over time has become like pretty sick.

Um, I don't even know, uh, what the sickest version of it is. Is this something that, is this like a process that you kind of like repeat as kind of a new thing in every repo or project that you're working on? Or do you have like a, like a little tools folder kind of thing.

That's like sort of separate, that you kind of refer back to, that has things like the REPL and stuff in it? Uh, I have these packages that grow over time for different types of functionality. Um, so for example, one of the things that I did was I started building this like agent framework, and I have in it like a timeline of a chat, right?

Where you have like the different messages of the bot and then potentially like log messages. Uh, so that was a component. It was called timeline. But one day I had the idea, hey, wouldn't it be cool to have a REPL that uses these timeline components? And that I just gave to Codex.

So like, Oh, integrate the two show me, you know, what the results are and then concentrate on my actual work. But when I came back, it was like, Oh yeah, that's pretty good. Um, I actually don't know how this works anymore, but, um, see, this is kind of now it's using these timeline components.

Um, and now I have a reusable thing cause I asked it to make a documentation. So now if I want to do, uh, you know, why not, why not actually start a second codex here. Let's start a codex CLI. Actually, I don't use this enough. Um, which project is this console output library.

And I'm going to start Codex and I'm going to say, use the timeline.md, repl.md. Um, the, um, .md for the REPL and timeline. I don't think I have documentation yet. So that's going to be a good, uh, good use case: build a REPL to interact with a REST API. And YOLO, right?

Like that. I'll come back. I probably just doesn't, I forgot to set the approvals, but. Um, and most certainly because I know kind of how it interacts with these components of mine, this is going to be fairly useful. It will probably have like slash commands to talk to a URL and like add a timeline.

Widget for HTTP results, like that kind of stuff. And then it's cool. Like, it's going to be a little bit messy. I'll ask it to clean it up and then I'll have like an HTTP REPL, um, which I can then reuse on my next project. Because the way the API for this like timeline thing is built, it allows me to basically greenfield the HTTP REPL.
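The kind of slash command predicted here could be parsed into a small request description before anything touches the network. The sketch below is a guess at the shape, not the generated REPL's code; the `/get <url>` syntax, `Request`, and `parse_command` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of verbs the REPL accepts as slash commands.
METHODS = {"get", "post", "put", "delete"}


@dataclass
class Request:
    method: str
    url: str


def parse_command(line: str) -> Request:
    """Turn '/get https://example.org' into a Request; raise on bad input."""
    parts = line.strip().split()
    if len(parts) != 2 or not parts[0].startswith("/"):
        raise ValueError(f"expected '/<method> <url>', got {line!r}")
    verb = parts[0][1:].lower()
    if verb not in METHODS:
        raise ValueError(f"unknown method: {verb}")
    return Request(verb.upper(), parts[1])
```

Keeping parsing separate from execution means the result widget only needs a `Request` and a response to render, which fits the plugin style described for the timeline.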

Cause it's, it has like these plugin API kind of APIs that allow me to just like all these little atoms are like green fields on their own, but they kind of connect together. Uh, and then sometimes it turns into a big, big mess where you're like, I have too many little connected things.

Now I need to clean up the space API. So those are, those are the times where I really need to think for a few days on how to do that. Um, Kevin, you're, you're good. You're. I mean, it's still going. It's still going. Yeah. So it's amazing. I love, I love GPT five.

I mean, uh, do you want to look into what it's doing on your side? Otherwise we can look on all this side here. Right. I mean, we can, um, or I think, yeah, if you have something interesting to show go, cause mine. Oh, wait, it looks like it's, it's wrapping up.

Okay. Maybe I'll take back over and, uh, Then, uh, mine's doing interesting thing, but like, it's your, it's your turn. Oh, I mean, if yours is doing something interesting, you can show. I can show. So the, this, um, this little console output thing that I was showing that shows like progress.

And like, it is able to show like individual tasks being executed in a group. And then when the group is finished and like collapses it, it does all these little things. And it's all event driven, which is one of my ways of connecting things. And the timeline is also event driven.

You can map a component to an event. So this is what it's been doing here. It like identified all the events used for the console. Right. And then it started creating a renderer for the individual events. And so it's starting to like basically copy over all the structure of this console library, but matching it to this like event style that is in Bobatea. That way it starts to be incorporated into my ecosystem.
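Mapping a component to an event, as described, is essentially a registry keyed by event type. Here is a minimal sketch of that pattern; the event names, the `renders` decorator, and the output format are invented for illustration and are not taken from the speaker's library.

```python
from typing import Callable

Renderer = Callable[[dict], str]

# Registry from event type to the function that renders it.
RENDERERS: dict = {}


def renders(event_type: str):
    """Decorator that registers a renderer for one event type."""
    def register(fn: Renderer) -> Renderer:
        RENDERERS[event_type] = fn
        return fn
    return register


@renders("task.started")
def render_started(event: dict) -> str:
    return f"[..] {event['name']}"


@renders("task.finished")
def render_finished(event: dict) -> str:
    return f"[ok] {event['name']}"


def render(event: dict) -> str:
    """Dispatch an event to its renderer, with a fallback for unknown types."""
    fn = RENDERERS.get(event["type"])
    return fn(event) if fn else f"[??] {event['type']}"
```

Porting a library into this style then amounts to listing its events and writing one renderer per event, which matches the steps the agent is described as taking.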

And, and my thinking is like, you know, one-off YOLO thing in Manus or in codex. And the second task is going to be like, okay, now shape what exists into this form that allows it to be integrated. So it's like a transformer step, right? Like LLMs are great at transforming things from one to the other.

And then after that I'll transform the result into a test program because I want to test the thing. And then I'll transform the whole thing into a document to be reused in the future. So it's this like kind of four-step process, right? Like Manus YOLO transform to something that fits my framework, then transform it to a demo and then transform it to a document.

And I've, I've done this like dozens of times, like multiple dozens of times now. So I know that just like four-step kind of process works really well. So yeah. Okay. So you have your, you have your, your baseline kind of, you know, separate folder that's got like general tools and scaffolds and stuff.

So we go Manus YOLO, then we go adapt, like kind of human steps in, adapts it to our little kind of external tool folder, whichever ones are relevant. And then we take, we go doc and then from doc, we get to code. Yeah. I'll post some examples in the chat, like concrete examples where it goes from Manus to finished code.

I'm curious. Do you, or I guess, yeah, either you or cable have any favorite slash commands or favorite sub agents that you've used, or do you guys not use those? Don't use it much. I don't use any besides the documentation for my projects. I don't use anything premade.

I just like rock, correct prompt. Nice. Nice. So coming back, it did a first implementation. And then it asked me, it says, do you want me to, if you want, do you want zero code duplication? And so I was looking into it and it's actually embedding all of these like responses of, oh, you're doing.

Here's the prompting here's the, this all in this AI chat set, which is totally not what we want. Right. The goal of this is to be able to do it. So I'm coming back. I'm course correcting it. Right. And saying, oh yeah, please refactor. We want, and I'm trying, I don't know the extent to which this is actually helpful, but I try to tell the intent to it because I think that will then help.

So like, I want to be able to test the logic behind the discord response. All actual things should be shared. So we're going to. I'll do that too, but I'm, I'm not sure. I'm always one worried that I'll confuse it. You know what I mean? It's like, I should only talk to you about what you're actually doing right now and not tell you about this other stuff.

Cause then you'll run around doing it. Especially bad with Sonnet, but maybe GPT-5 is a little better about it, but I do that too. I've noticed where it's like, I'll give them a heads up in hopes that they will drive in the right direction, knowing that what we need to do next.

But I feel like I need to do more like data collection around it. Yeah. My personal experience like Sonnet sucks at that, but GPT-5 is really good at that. Like I had to go task by task in a lot of situations, but with GPT-5, like I can give it two, two things and it'll like do both correctly.

Sometimes three, I'm feeling lucky. Yeah, so I mean, I think it might actually be interesting to go back to our initial doc and see, like, where did we go astray? Cause if I were being curious about this, instead of having it correct, what I'd do is I'd come back and edit my doc and then have it re-implement again.

I think this is just not specified at all in a lot of ways. It looks like it, yeah, I don't see anything. We don't specify one way or another. We did say like a thin chat adapter that mirrors the Discord message events. So if I as a human was interpreting this, I would have interpreted it differently, but we didn't specify one way or another.

And it kind of like got to the point where it was like, oh, do you want me to do this? Like it maybe actually inferred that this was a possibility, but it did ask. So it'll be interesting to see where this course correction gets it to.

Sometimes when I'm waiting, then I'll spin off another thing. Often I like work. We have multiple repos, so I'll usually do it in a different repo. But here I was actually thinking I might do it in this same repo. The only question is how much refactoring is it doing?

Do we need the new refactored code in place for it to be able to figure out the right approach to it? I think we could probably get it started. So I'm going to do another Codex here and I'll say like, you know, I want to add some functionality to the bot. We allow scheduling on behalf of others, but not canceling or rescheduling on behalf of others.
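The rule as stated in this prompt (scheduling for others is allowed, canceling or rescheduling for others is not) fits in a few lines. This is only a sketch of the rule as spoken, not the bot's real permission logic; `DELEGATABLE_ACTIONS` and `is_allowed` are hypothetical names.

```python
# Actions a user may perform on someone else's behalf, per the rule as described.
DELEGATABLE_ACTIONS = {"schedule"}


def is_allowed(action: str, actor: str, owner: str) -> bool:
    """Owners may do anything to their own slot; others only delegatable actions."""
    if actor == owner:
        return True
    return action in DELEGATABLE_ACTIONS
```

Under this framing, the feature being specced would come down to deciding who may extend that set, which is where the later questions about roles and audit trails come from.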

I'm going to look at what would be needed to implement it and write a spec. So interesting, or yeah, I wonder, did the thought occur to you, cable, that this might be a good place to use sub agents? What occurred to me was, if we have one agent that's in the middle of working on a refactor.

If we tell that agent, hey, deploy a sub agent to go do this, then it should be able to send prompts to that sub agent, but also including the refactor. Yeah, in its window that there is a refactor ongoing in another spot. Well, that's a good question. So let's instead of doing it here, how would I do that over here?

So how do I do a sub agent while it's working? I know in Claude Code, as long as you have the agents defined, you can just tell it, use sub agents to do X or Y. I don't know how sub agents, if they even work in Codex, how they work.

Because if it were me, the way I would have designed the sub agent system would be to allow the initial agent to wait if it needs to. All right, so I just took the prompt that I hadn't yet entered in the other one copied it over here and I'm trying this so let's see what happens.

Do you do slash agents? Do you see anything in there? Nope. Interesting, interesting. Does Codex even have sub agents? I'm not sure. I'm Googling it now. I don't think so. Everything I think about code. Hmm, interesting. All right. All right, so this is just queued.

Not doing anything. Maybe I'll actually just kick it off in my new thing over here. Now, I had a conversation with somebody from Augment Code, which is another one of these tools, and his stance was, so long as you have a CLI implementation, you have sub agents. Because you expose the CLI as a tool, you tell it it can use the CLI, and now you have sub agents, and that's all you need.

The asterisk I would put there is, do you have asynchronous tools, right? Can this be happening? Can I kick it off and manage it without having to wait for it? Use tmux. All right, so we should try that. Let's see, this is already doing work. So, the "use tmux."
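The asynchronous caveat raised here is the difference between calling a CLI and waiting versus starting it and collecting the result later. A minimal sketch with the standard library follows; the "agent" is a stand-in Python one-liner so the example runs anywhere, and `start_agent` and `run_agents` are hypothetical helpers, not part of any agent tool.

```python
import subprocess
import sys

# Stand-in for a real agent CLI: a Python one-liner that prints the
# prompt it was given. A real setup would invoke the agent binary here.
FAKE_AGENT = [sys.executable, "-c", "import sys; print('done: ' + sys.argv[1])"]


def start_agent(prompt: str) -> subprocess.Popen:
    """Start one agent process and return immediately, without waiting."""
    return subprocess.Popen(FAKE_AGENT + [prompt], stdout=subprocess.PIPE, text=True)


def run_agents(prompts: list) -> dict:
    """Start every agent first, then collect each one's output."""
    jobs = {p: start_agent(p) for p in prompts}
    # All processes are already running; communicate() only gathers results.
    return {p: job.communicate()[0].strip() for p, job in jobs.items()}
```

Because every process is started before any result is read, the agents overlap in time; a blocking tool call would instead run them one after another.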

So with "build yourself a CLI and use tmux," is tmux the thing that lets us kind of jury-rig our async, kind of blocking stuff, or am I not reading that correctly? Yeah, what do we want to build with multiple agents? Yeah, but what should we do?

Let's try it. So one thing that I found works well to do with multiple agents. If you have an idea, remove that prompt for now, Kevin, just to do a little bit of 3D work. But if you guys have a feature that's like parallelizable, like a web app with a back end and a database and whatever.

Okay, well, let's let's do a web app expose. I'm still in our Discord app. So let's build a web front end. Our AI in action. Give us a give us the dashboard for our CLI that we're building kind of thing, maybe? So, Kevin, let's try this prompt, make a mermaid diagram for the, I don't know how to split up work for a parallel team of agents to build a web front end for XYZ.

Let's see, split up work for a team of agents, build a web front end for our bot. And probably we would need to analyze the requirements a little bit up front. I don't know if it's going to do it yet. Let's YOLO. Well, let's see. Maybe set the model to high.

It would take to build a web front end for this bot. Make a mermaid diagram for how you'd split up the work for a team of agents. Let's just try that. See what happens. Yeah. Meanwhile, going back over here, it's implementing these different things. Let's see what's happening in our git repo.

So the refactor is continuing. We now have lib shared. It's got a message handler in it. This is doing some of that logic. Okay. Let's see. Let's see, does that. So this is actually, the logic is still duplicated here. So it's still in the process. This one is still trying to do something here.
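The goal of this refactor, one shared message handler with thin adapters for Discord and for the CLI, can be sketched as follows. The handler logic and function names are invented for illustration; the repo's actual shared module is not shown in the session.

```python
def handle_message(author: str, text: str) -> str:
    """Shared logic: both the Discord bot and the CLI call this."""
    if text.lower().startswith("schedule "):
        topic = text[len("schedule "):]
        return f"Scheduled '{topic}' for {author}."
    return "Sorry, I did not understand that."


def discord_adapter(event: dict) -> str:
    """Thin adapter: unpack a Discord-style event and delegate."""
    return handle_message(event["author"], event["content"])


def cli_adapter(line: str, user: str = "local") -> str:
    """Thin adapter: treat a typed line as a message from a local user."""
    return handle_message(user, line)
```

With this split, a bug reproduced through `cli_adapter` is by construction the same code path a Discord user would hit, which is the property being asked for.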

And here we're starting. And this is my experience a lot is I will put these threads in and then I will wait and we will see what happens. Yeah, we should have stored this in a document. I think we can ask it still. Oh yeah. I'll say write this into a document.

I, I find it. Right. The spec and the diagram into documents. I mean, it queues, it queues requests, right? I think. Yeah. I think it queued earlier when he made a request. Okay. So what does the subgraph look like? I mean, I guess it doesn't render it. Yeah.

Cause here you can see it. Like there's a bunch of like sequential dependencies, I guess, in the graph that we were just seeing. And then, then the parallelization starts, I think. I'm not sure. I'm not sure exactly what this is. Some of these actually parallelization looks like starts at a two.
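Reading parallelism off a dependency diagram like this one can be done mechanically: tasks whose prerequisites are all finished form a wave that can run concurrently. A minimal sketch using the standard library's `graphlib` follows; the task names in the usage note are made up and do not come from the generated diagram.

```python
from graphlib import TopologicalSorter


def parallel_waves(deps: dict) -> list:
    """Group tasks into waves; every task in a wave can run at the same time.

    `deps` maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For a hypothetical plan where `api_contract` depends on `requirements`, `backend` and `frontend` both depend on `api_contract`, and `integration` depends on both, the function yields four waves, with `backend` and `frontend` sharing the third. That mirrors the observation that a run of sequential steps comes first and the parallel part starts later.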

Yeah. So let's see. Meanwhile, my rescheduling change finished its spec. So we can take a look at that. Oh, and I have a good idea for another one to kick off that's directly related to this. Okay. So let's take a look at that.

Let's see. Oh, do we have a permission model and an audit trail? I don't know if we have that in our. Um, do we have a scheduling admin role?

That's fascinating. Yeah. Do those roles. My suspicion is we have none of those things. Yeah. You're referencing a lot of permissions and roles. Do those exist? I could pass. This one is still going. You can start Codex with --search, and then it has web search, by the way.

It doesn't by default. Its edit tool stuff goes wrong more often than not. That's a bit annoying. Yes. Okay. So it doesn't have any roles or whatever. So let's. Discard. Discard. Discard the roles, et cetera. Let the scheduler. Well, the speaker. Anything. I don't talk.

And so at this point we have three things going on and that maxes my brain out. I can't handle more than that. Um, so we can see that the front end spec just finished. So we can take a look at that. Um, I actually liked when we were using Amp instead of Codex, because, yeah, let me say my brain can handle three amps, and you know, it sounded like current, which is great.

Um, I don't think three codexes has the same ring to it. Um, all right. So this talks about a minimal API. It exposes these things. Uh, this is once again, going to structured. Structured. It's not, um, doing unstructured, but that might actually be okay in a web UI, right?

The web UI. We might actually want it more structured. I don't know. Uh, all right. And then maybe now use, uh, use tmux, named tabs, capture-pane. Oh, I would put a comma after team. Oh, yeah, no, that's shit. That's fine. That's fine. capture-pane and the Codex, uh, CLI to build this.

I don't know if that's gonna be enough. Right. But, uh, Max, according, according to the diagram. Yeah. With maximum parallelism is good. Um, check on agents every 20 seconds or something like that. Is that input field in Vim mode? Is that how you're able to backspace so fast? Um, how am I able to what?
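The prompt being dictated here asks the agent to run workers in named tmux windows and read their screens with `capture-pane`. The commands involved could be assembled roughly as below. `new-session`, `new-window`, and `capture-pane` are standard tmux subcommands; the session name, window names, and worker commands are placeholders, and the functions only build argument lists rather than invoking tmux.

```python
# Interval taken from the spoken prompt ("every 20 seconds or something").
POLL_SECONDS = 20


def tmux_commands(session: str, agents: dict) -> list:
    """Build argv lists that start a detached session with one named window per agent."""
    cmds = [["tmux", "new-session", "-d", "-s", session]]
    for name, command in agents.items():
        # "session:" targets the session and lets tmux pick the window index.
        cmds.append(["tmux", "new-window", "-t", f"{session}:", "-n", name, command])
    return cmds


def capture_command(session: str, window: str) -> list:
    """Build the argv that prints the visible contents of one agent's window."""
    return ["tmux", "capture-pane", "-p", "-t", f"{session}:{window}"]
```

An orchestrator would run each argv list with `subprocess.run`, then loop: sleep `POLL_SECONDS`, run `capture_command` for every window, and decide from the captured text whether an agent is finished or stuck.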

Uh, yeah. Um, so, well, um, within each agent, it's just doing its own thing. They don't let me do Vim mode there, but I'm in tmux, which I have total control over. And I have that with Vim-like bindings. So I'm hopping around between different. So I, I am myself in tmux.

Um, and I use it all over the place for that type of thing. Um, this one is still going 800 seconds in. Um, here we have the logic and minimal checks. I'm going to wait on the scheduling on top of it until this refactor is done. Um, and then we'll see what this one does.

Um, let's see how many codex processes I have running. I mean, I've got a bunch of these running in other places too. So, I mean, you can attach to the T mux, right? That's true. I haven't created it yet, but I will. Um, the, the one thing I could say about permissions is that I do have that.

I set up the discord, uh, when we did the last debugging session with literally nobody in there. Like nobody's in there and I can give you permissions, right? Like give you the tokens or whatever for this discord bot. Uh, you know, if we make enough progress, I don't know.

We have like 20 minutes left, but, but, um, the one thing I'm, I'm not sure if, if that bot will have all like already will have all the permissions that this code needs. Cause I, I never could get to the bottom of that. When I was looking at the, the code, like what all the permissions that actually needs, but it has enough permissions to be able to talk to people and thread stuff.

It just doesn't have admin, like the bot itself doesn't have admin permission over the Discord server. Yeah. But I'm more aware of that stuff. Uh, cause it's already like. Yeah. I mean, my whole goal. It's building its whole agent framework. Like you don't need to prompt very much and it will build like the thing at hand.

It's pretty, it's pretty cool. Yeah. Including it's building a DevOps agent and QA agent and all of that fun stuff. So we will see what happens. Uh, that'll be interesting. But, uh, Flo, to your point. The whole reason I started with like build me a CLI or REPL for this is I didn't want.

I wanted to be able to debug my bug and fix it without actually having to mess with anything about how I access Discord. Um, so. Oh, it's creating a, an npm thing for, for us to run it. Um, this is fascinating. Yeah. I'm very curious what it's going to end up with.

Me too. Uh, my stuff is finished, so I'm, I'm good to switch back if you want. Plus I have, I have something that I'll throw in Manus and then, uh, probably. Maybe by the end of the session we'll have it. Okay. So it, it built me a set of scripts to do this, but it didn't run it itself.

So I'm just, I guess I'm going to tell it to run and see what happens. But then, yeah, if you want to, uh, I want. You run it with. Yeah. Run it in the background. Cool. Yeah. And with SQLite, you basically get a database that you can then like, you know, poll in a different tmux with a dashboard.
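One way to picture the tmux-plus-SQLite setup: each agent writes its latest state into a small table, and a dashboard in another tmux pane reads it back. The table name `agent_status`, its columns, and the agent names below are assumptions for illustration; nothing in the session shows the actual schema.

```python
import sqlite3
import time
from typing import List

# Hypothetical schema: one row per agent, overwritten on every status report.
SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_status (
    agent      TEXT PRIMARY KEY,
    state      TEXT NOT NULL,
    detail     TEXT NOT NULL DEFAULT '',
    updated_at REAL NOT NULL
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the status database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def report(conn: sqlite3.Connection, agent: str, state: str, detail: str = "") -> None:
    """Record the latest state for one agent, replacing any earlier row."""
    conn.execute(
        "INSERT OR REPLACE INTO agent_status (agent, state, detail, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (agent, state, detail, time.time()),
    )
    conn.commit()


def dashboard(conn: sqlite3.Connection) -> List[str]:
    """Format one line per agent, suitable for printing in a separate tmux pane."""
    rows = conn.execute(
        "SELECT agent, state, detail FROM agent_status ORDER BY agent"
    ).fetchall()
    return [f"{agent:<12} {state:<10} {detail}" for agent, state, detail in rows]


if __name__ == "__main__":
    db = open_db()
    report(db, "devops", "running", "building scripts")
    report(db, "qa", "waiting")
    print("\n".join(dashboard(db)))
```

With a file path instead of `:memory:`, the agents and the dashboard can run as separate processes against the same database file.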

Like that's why the, there's like tmux plus SQLite and it says just like two stop agents and we'll figure it out. It's pretty cool. All right. Uh, let me stop sharing and then you can share, show your thing. Okay. All right. So, um, the one thing we were building, if you remember, is the REPL for HTTP calls.

So that finished here, right? It was like the Codex run. So if I test it, uh, I guess it's command REPL test demo. I have a feeling it's going to be, it's going to be useful. Not that super featured, but so we have the REPL now. And then it shows me an example, GET httpbin.org/get.

So we're going to try that. And then, yeah, I knew that it would do these like, uh, formatted, um, widgets, and then you can kind of scroll through them. Um, oh, this is not the fixed version yet. And I don't know if it can do much more. I don't think it can do much more.

Um, but that's like a, right. That's like kind of a useful starting point. And that's where this like, um, planning with Sonnet comes in. If I had asked Sonnet, like, oh, design like a web, like a REPL to interact with HTTP. It would add like 15,000 features to it.

I would then remove, you know, like the, the nonsense. And then you give it to GPT-5 and it will build like all of them. Right. So suddenly you have kind of the best of both worlds. If you ask GPT-5, like make me a webpage to, you know, manage users, which I asked it a while ago.

It made a webpage. There was one text area. I could paste in JSON and I could say update user. And I was like, okay, it's a webpage to manage users, but it's like not really what I wanted. Um, but that's exactly the kind of behavior I want when I want to fix a bug.

Right. It's like, I, it's like, it's going to fix the bug and it's not going to do anything else. So that was one thing that we started. And then the other one that we started was moving the console into Boba T as a reusable package. So this is also what we have here.

We have pkg/console. We have a document, like a documentation about it. So we can look at the documentation, um, which is written in the style of documentation that I, that it does anyway. And it will tell me, so this is important in documentation files. I find it's just like mentioning file names, mentioning package names, mentioning, uh, symbol names.

Cause that way it's kind of like an index file for models to know what code to write. So if I point it at this, it pretty much has all the info needed to write a minimal example, but also to look at where this is actually built. So that way, if it needs additional information, it will already have starting points without necessarily having to grep things or, what happens more often, finding confusing information in the repo.

And then it will look at the wrong side of the repo and it will build like nonsense. You're like, no, no, no, not this console, this console. Um, but if we now test this examples console. Yeah. So basically now I have this as a reusable library and I can just, the next time I need, like the next time I need some kind of, um, of output framework.

Cause if I look at the demo now, the demo is going to use the, the public API on how to do this. So now I have like a perfect example on how to add this to an existing tool. I can't think of a tool right now that does something useful that I could reuse.

Um, but I'm sure I can share something up in, in the Discord. And then the other idea that I had, which is what Kevin was doing, is like polling to get status while the agents are running. It's like make a web app that shows real-time update information on the changes to a git repository, including new files.

So that as a developer, I can track what my coding agents are doing in a nice web UI. So this is pretty much what I want. And then I've got like a bunch of, um, Oh wow. They added shit. That's great. Um, and then, uh, Yeah. Yeah. Yeah. Yeah.
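The core of such a git-tracking web app could be a function that snapshots `git status --porcelain` and flags new files, which a web UI would then poll or stream. The sketch below covers only that piece; the `Change` type, the function names, and the sample paths are hypothetical, and it assumes the default (v1) porcelain format where each line is a two-character status code, a space, and a path.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    code: str     # two-character status code from git, e.g. " M", "??", "A "
    path: str     # path relative to the repository root
    is_new: bool  # True for untracked or newly added files


def parse_porcelain(output: str) -> List[Change]:
    """Parse the output of `git status --porcelain` (v1 format) into Change records."""
    changes: List[Change] = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        code, path = line[:2], line[3:]
        # Renames and copies are reported as "old -> new"; keep the new path.
        # (Paths with unusual characters may be quoted by git; not handled here.)
        if code[0] in "RC" and " -> " in path:
            path = path.split(" -> ", 1)[1]
        changes.append(Change(code=code, path=path, is_new=(code == "??" or "A" in code)))
    return changes


def snapshot(repo: str) -> List[Change]:
    """Run git in `repo` and return its current uncommitted changes."""
    result = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)


if __name__ == "__main__":
    for change in snapshot("."):
        marker = "NEW" if change.is_new else "   "
        print(marker, change.code, change.path)
```

Calling `snapshot` on a timer and diffing consecutive results would give the "what did my agents just touch" feed described above.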

That's what I was going to ask you. The pre-saved prompts that you have that you put at the bottom. Like you find these useful, obviously, cause I see you using them a bunch. Oh, no, this is from them. This is like community stuff. I don't reuse prompts. Like sometimes I'll paste, paste stuff, but, um, I'm too lazy to do reusable stuff.

Like in one of the other Manus conversations you had this, like the thing you're about to do now, you had that in one of them. That was like, um, make sure that the tests that you, or make sure that this thing actually works. And you put it just at the bottom like that.

Yeah. Cause I mean, Manus is really like, I, I call it like, like, uh, like LLM roulette. You really don't know what's going to come out. Cause like it's Sonnet and it will like, it will always succeed. Right. And then you can track what it does, which is like, it writes like 99% of the program perfectly.

And it doesn't compile because the third party library wasn't installed or something. And then it will say like, let me write a simpler version, which is like echo success. And then it will say like success. And then you're like, well, that was right. That was like a useless run.

Uh, and often it will self destroy the stuff. You can tell from the history. It's like, oh, you got something really useful. And then it just destroyed it. And you'd like lost it, which is, which is really annoying. Um, take screenshots at different steps. Uh, and then at the end, make a zip file and write a report.

Um, make sure that the app really works by testing it twice. I don't know. I don't know if that makes sense. I never tried that. Let's try it. And then I'll let it run. And it looks like sometimes amazing stuff comes out. Sometimes it's like complete nonsense.

Sometimes I learned something from its failures. Like the YAML streaming one was like a big failure, but I learned quite a bit. And then sometimes it's just like jank and you can do something useful with it. Um, but I, I have good hopes for this one. Like, I think something useful will come out of this.

If just for getting a feel for what such an app could be, could be like. Um, I don't know if Kevin made any progress on that last one, but, but I am curious for this one, cause this is something that's like extremely interesting to me. Like making a web app so you can keep real-time updates on like all the changes, especially if you're, if you're kicking off agents and subagents and subagents of subagents and all that good stuff.

What, um, have you like explored this idea of like using git worktrees? Cause that was like all the rage a couple months ago or weeks ago, like using git worktrees to like have agents in different, um, versions of your repo at the same time. Is that something like that?

Or is that not something anyone else has explored? I don't give it to agents, although I could, but I've got this workspace manager that I use a lot, right. For like each feature that I do, I'll just have a project. And a lot of these things that I built have like multiple, multiple repositories.

Like there's, they're spread across many repositories. So that allows me to like quickly check out the same branch from like many repos and then work in them. And I think I could just tell the agent, hey, use this tool to manage your worktrees. Right. Um, that would probably work because, and this is, the workspace manager is something that is entirely vibe-coded.

I have no idea how it works. And it got to a point where like, I was trying to add more features. And then I was like, okay, I reached the end point of this. I tried to refactor it like three times with Sonnet and I was getting nowhere. Um, but now I think I have a refactor with GPT-5 somewhere, but I haven't tested it out yet.

Um, and I started building things like, okay, what, let's build features to like attach tmux sessions to a specific worktree. So you can quickly like execute commands in a certain workspace. Like I tried playing with some ideas that I think are worth exploring. Um, Kevin, you, you have something to, to show.
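For the workspace-manager idea of checking out the same branch across many repos, a rough sketch using `git worktree add` might look like this. It is not the speaker's tool, whose internals were not shown; the function name, the directory layout (one worktree per repo under a shared workspace folder), and the example paths are all assumptions.

```python
from pathlib import Path
from typing import List


def worktree_commands(repos: List[str], branch: str, workspace: str) -> List[List[str]]:
    """Build one `git worktree add` command per repo, all on the same existing branch.

    Each worktree lands in <workspace>/<repo name>, so a feature spanning several
    repositories lives together in one directory.
    """
    commands: List[List[str]] = []
    for repo in repos:
        dest = Path(workspace) / Path(repo).name
        commands.append(["git", "-C", str(repo), "worktree", "add", str(dest), branch])
    return commands


if __name__ == "__main__":
    # Hypothetical repos and branch; print the commands instead of running them.
    for cmd in worktree_commands(["/src/bot", "/src/console"], "feature-x", "/work/feature-x"):
        print(" ".join(cmd))
```

The commands are only printed here; handing each list to `subprocess.run` would actually create the worktrees, and an agent or tmux session could then be pointed at the workspace folder.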

Yeah. It could, it could not figure out the tmux and subagent and whatever. It just like, it's totally. I figured it. I figured that there might have been a little bit more, uh, to it. How did it barf down? Like you want to go back. Um, yeah.

So. Yeah. First. Okay. Oh, a little bit. So one, it's really resistant to actually using it. So what it did, if I go up. So first it set up a thing and then it's like, oh, they won't actually let me try it. I'm running it. It's, oh, it doesn't actually support this flag that I thought it did.

It's failing. It's erroring. Okay. I'm like, well, fix it. Like, figure it out. You go. And then it tried and then it stops again. And it's like, okay, you have all this great stuff now, but it's there. And I'm like, no, I want you to use it to build it.

Go. And then it starts just trying to build it on its own is not using it. So. Yeah. It's what I would do. Yeah. Go ahead. It's just, I think it's a, this is a place where you have to fall back to much more prescriptive again, and just be like, this is what I want you to do.

Do this, do this, do this. We're close to time. So I'm not going to go down that rabbit hole, but yeah. So what I would do now, like, like one of the ways I think about this. Cause like Manus is something that will do something like that quite a bit.

Right. It'll be like, I can't figure out how to install this library. So I did this nonsense. And then this nonsense. So you're like, okay. Um, is this, however, instead of thinking like, oh, it didn't achieve the thing that I wanted it to do, it's like, the question is more like what's useful about this output.

Yeah. And one of the prompts I use very often is like, okay, write a full report on what you wanted to do, what you did, what you tried, what you learned, what didn't work. And then I have this like full report. That's a little bit more readable than a huge session.

And that one I can use next time to say like, you know, we tried stuff in the past. Here's how far we got, uh, use this information to, and then you input your own thinking, right? Like I, I'd probably go grab my piece of paper and be like, okay, how could I build a prompt that makes it work with Codex properly after having read that, read that document.

Cause it'll probably just require like, you know, walking around a little bit and just like thinking, why, why did it get confused? Um, yeah, it did figure out quite a few things, right? Like it got its Codex profile. Maybe a trick here is to not tell it to use Codex, but just to, to, to make an agent.sh and then use that.

So that it doesn't know that it's actually calling Codex underneath. Um, it's like doing these kinds of shenanigans. Um, Codex, by the way, has like, if you do codex proto --help, it has a JSON stdio protocol, which I started playing with. Yeah. Um, I didn't do it here, but I remember.

So I, I did a variation of this, a simpler version. I was just doing a simple, a single subagent that I was trying to manage and do it. Um, for this like resume reviewer thing that I was doing. And one of the things that was very interesting is Codex really wanted to build this all using shell scripts and it would do shell script and shell script.

And it kept running into error handling issues. It does this, whatever. And eventually I was like, F it, go build it in Ruby. Uh, I chose Ruby because we use Ruby for a bunch of stuff. I didn't want to do whatever it did. And instantly it was more reliable.

And now we, it felt like we were building on something stable where it was like, oh, there's this issue deal with it. It has the ability to do real error handling in a way that shell scripts just barfed on. Right. And then you like shell scripts. I mean, you can always transform them, but it's like hard to build a package for shell scripts, right?
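To make the "real error handling" point concrete, here is a small sketch of running an agent step from a general-purpose language rather than a shell script. The speaker used Ruby; Python stands in here, and the `run_step` helper, the `StepFailed` exception, and the retry policy are illustrative assumptions rather than what was built.

```python
import subprocess
import sys
import time
from typing import List


class StepFailed(Exception):
    """Raised when a step still fails after all retries."""


def run_step(cmd: List[str], retries: int = 2, timeout: float = 60.0, backoff: float = 0.0) -> str:
    """Run one command, retrying on non-zero exit or timeout, and return its stdout."""
    last_error = ""
    for attempt in range(1, retries + 2):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout, check=True
            )
            return result.stdout
        except subprocess.CalledProcessError as exc:
            # Non-zero exit: keep the exit code and stderr for the final error message.
            last_error = f"exit {exc.returncode}: {exc.stderr.strip()}"
        except subprocess.TimeoutExpired:
            last_error = f"timed out after {timeout}s"
        except FileNotFoundError:
            # Retrying will not help if the program itself is missing.
            raise StepFailed(f"command not found: {cmd[0]}") from None
        if attempt <= retries:
            time.sleep(backoff * attempt)
    raise StepFailed(f"{cmd!r} failed after {retries + 1} attempts ({last_error})")


if __name__ == "__main__":
    print(run_step([sys.executable, "-c", "print('agent step ok')"]))
```

Each failure mode (bad exit code, timeout, missing binary) gets its own branch and a readable message, which is the part that tends to get lost when everything is chained shell scripts.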

Like packaging shell scripts to be reused is always like fraught. Um, while if you move to that, that's where the Dagger stuff comes in. So Dagger is a, is a tool that allows you to build pipelines on top of Docker. And it has APIs for Python, JavaScript, and Go.

So you can basically build like kind of a self-contained Go binary that will run Docker containers and run scripts in there. And you can build all of these bigger, bigger systems without having to write shell scripts. Um, which is always kind of a two-edged sword because shell scripts are nice because they're so flexible.

Right. Um, but that's something to explore as well, which is like just a building block to more easily transform one thing to the other. Yep. So I'm having it remove all the things that it wrote other than the report, leave the report. Might keep that, um, for now.

Uh, cause the thing that I really cared about was this chat REPL, which I'll like figure out later today or this weekend, um, getting it there. So I can then finally debug the issue that I ran into in the Discord bot that they actually wanted to. Um, actually, Manus is pretty good at memories.

So I think that's also, it helps it not having to use templates and stuff very often. So I do use memories quite a bit, actually. Um, the question there was around social media content. I wasn't actually doing any social media content generation. I'm interested in content generation. Cause my wife is doing some other type of content generation.

Uh, but I wasn't doing that. Somebody else I think was doing social media. Um, and then I think, uh, Baruch was going to, oh, I've just hit my usage limit. Oh, that's fine. Um, so I've blasted through all my Codex usage with y'all. I've never had it. It's worth getting Pro.

Yeah. Yeah. I mean, this is on the business. Right. So business, I think I've got, oh, maybe it's using mine. I don't know. That's an interesting question. Um, No, but business is not pro. No, it's just like 200. Yeah. Yeah. Also maybe starting like 12 at the same time.

It's maybe not. And then having to try to do a bunch of subagents to do things and all those things. Yeah. Um, is that still up? The tmux is up. Uh, so anyway, I think we're at time. Hopefully this was interesting, at least to folks. Um, to kind of, to see as we're going down this, um, and yeah, in four hours and three minutes.

I'll see if I can get Codex to actually finish the thing we want to develop. All right. Cheers, everyone. Um, hopefully the, the topic that we originally had for this week, I think he's going to come in two weeks. He got sick. Uh, we still need a topic for next week.

So if you want to bring something, as you can see, it doesn't have to be very well thought out ahead of time. You can just kind of show how you're using the tools. Um, thank you all. Cheers. Happy Friday. Peace. I'm, I'm nominating Kishore for next week, whether he wants to or not.

No, I'm just kidding. Oh, he failed. Oh, sorry. I don't have anything. - Aww.