Containing Agent Chaos — Solomon Hykes, Dagger

Hello, hello. Okay, my slides are up. You can see them, right? That's me. Okay. Well, this is a very special moment for me because I just realized yesterday walking in, this is the exact same spot, the same stage, actually, that I stepped on almost exactly day for day 10 years ago to kick off DockerCon 2015.

I thought it was pretty funny. I don't know if anyone was there for that. Maybe this audience is too young, maybe. I don't know. Okay, well, I'm here to talk about chaos, specifically the kind of chaos that emerges when you try to use coding agents. And I want to talk about chaos from the perspective of our community at Dagger, which is platform engineers.

I don't know if there's any platform engineers in the room. Okay, it's you and me, ma'am. Okay, well, it is known sometimes as other things, but basically platform engineers have a really tough job because they don't get to build and ship cool software. They get to enable all of you to build and ship cool software in the most productive way possible, right?

It's a really tough job. It takes range. It takes experience. It takes a lot of patience. But we do it for the endless gratification, you know, just the gratitude we get from developers. Just kidding. No one ever says, "Thank you," but it's okay. Someone has to do it. Tough job.

Speaking of enabling, anyone here use coding agents? We are outnumbered. Okay, well, I want to say to you congratulations and welcome to platform engineering. Yeah. I mean, your job now is to enable robots to ship awesome software while you spend more and more of your time enabling them to do that productively, right?

Tough job. I applaud you for giving up really the most fun and rewarding part of the job, you know? Very selfless. Yeah, so, of course, this is not completely a reality yet. I mean, we don't have quite yet the team of agents just kind of, you know, humming along, doing the job while we sit back and fix environments for them.

But you can kind of see it coming, right? I mean, some of you are definitely doing that, hacking that together. There's a lot of cool posts out there and scripts and tools. So we know it's coming. The question is, how do we enable this to happen, not just for this incredibly cool and bleeding-edge crowd, but for everyone else?

Like, everyone shipping software everywhere just sort of creating maximum value by enabling agents to do the work for them, ultimately taking their jobs. That is the dream, right? Okay. So, yeah. How do we do that and make it not too painful? Well, I want to go back to basics.

What is an agent? The famous definition, of course, is it's an LLM that's wrecking everything in a loop on behalf of a human. The diagram is from Anthropic. Thank you, Anthropic. I tweaked the explanation just a little bit. In the context of coding agents, it looks like this. Oh, man, that was supposed to be animated.

It's even better when it's animated. It's okay. Yeah, you've got one agent, and it's doing stuff in the environment, which is your computer. And it can do great work. It can also do very crazy things. So you have to kind of watch it closely, right, and approve, approve. No, no, don't do that.

That's crazy. Yes, that's good. That's kind of the status quo today. But, of course, we want to scale it, right? We want a team. So how do we do that? Well, right now, I would say there are two options, both equally wonderful and fun. The first one I call YOLO mode, you know, I'll just run 10.

What can happen? Amazingly, this diagram is not the worst case scenario, but, yeah, you know, you get the idea. So the whole methodology of watching it closely just kind of falls apart really quickly because they're all stepping on each other's toes. They're sharing an environment, right? Okay. Enter option two.

Oh, don't worry about that. We'll run the agents. Right? We'll take care of everything. We've got the background mode. We've got the model. We've got the tools. We've got the environment. We've got the compute. We've got the secrets. We've got everything. You know, just open an issue, wait for the PR, relax, until, of course, it doesn't work, and then you're like, no, that's not what I meant.

These actually work really well. I think, like, 10 of those launched just today and yesterday. And they're great. It's just that, you know, sometimes you just want to get in there. Like, okay, give me the keyboard. You know? And sometimes you just want to run it on your machine or on your favorite compute provider.

Right? Use your favorite model. You want to mix and match. There are limitations to this all-in-one model. So the question is, is there something better? Is there just a scenario where I just got a team and they're working and, you know, I can step in or leave them alone and we're just kind of getting stuff done together?

So this is how I would summarize it, what I would want. There's really four things. First, I want background work. You know, I don't want to be in there just watching every action. That's obvious. Second, I want rails. And that means I want to be able to constrain the agent so it doesn't do things that I already know are not necessary.

So obvious things like context of the project. What's, you know, what's our coding style? What tools to use? But also, here's how to build. Here's how to test. Here's the base image we use, right? You can access this secret. You can access that. So there's an easy way to do that.

Because otherwise, I'm going to waste so many tokens just correcting as I go, right? The third is, inevitably, when I do need to step in, I want a really efficient and seamless way to do that. And it can't be watch every action and it can't be just wait for the PR and do code review.

You know, I need a middle ground here. And the fourth thing is I want optionality because, like I was saying before, it's a crazy market. You know, there's awesome models, awesome compute, awesome infrastructure. Agents are really cool. And as cool as they are now, I mean, one of you is probably, like, launching one right now.

And then there's another one tomorrow. So I don't really want to lock myself into a whole package today and say no in advance to whatever's coming out tomorrow, right? Not in this market. So to get that, I need an environment that has properties that match this. It needs to be isolated, right?

So background work works. It needs to be customizable so I can set up those rails. It needs to be multiplayer so I can, you know, go, all right, give me that. Let me fix this. Or let me check. Did you do it? You know, when the model says, I did it.

Did you do it? And then, you know, it should be open. No shade on making money and scaling a huge cloud service. That's great. You know, we have one. They're great. But I just want choice, right? I want to be able to choose and get the best commodity. I'll just use this word.

It's okay. It's okay to use it. It's the best commodity component for each job. And, you know, it could even be open source. Who knows? We could collaborate on this. Anyway, so unsurprisingly, maybe, I'm going to talk about containers now. Someone actually said, you know, you should check that they know Docker.

They know containers. Okay. Who knows what containers are? Who's used containers? Okay. Cool. Cool. All right. Boost my confidence a little bit. But the point here is we have the technology. And it's not just about containers. But they do play a crucial role. Because it's a foundational technology. And it is underutilized.

We don't fully leverage what this technology can do. Because we're used to the first incarnation of the tools. Made for humans. Same thing for Git. I see a lot of hacks involving Git worktrees. Anyone playing with Git worktrees to get stuff done? Okay. You know what I'm talking about.

This is about that. And of course we have models that are incredibly smart. Getting smarter. And they can exercise these technologies really fully. We just need to integrate them in a native way. So that we really tackle the problem at hand. Which is giving great environments to these agents.

Anyway. So if we built that native integration, what would it look like? Well, we have a take. Sorry. We at Dagger. I forgot completely to mention my company. That's okay. It's great. Check it out. We have a take on that. Something we call container use. You know there's computer use.

Browser use. These agents need container use. They need a way to use containers to create environments and work inside of them. This is not the same thing as sandboxing. Right? There are a lot of ways to execute the output of the agent in a secure sandbox. Very useful. Very cool.

But that's not the same thing as the agent developing inside of containers entirely. Right? That's what we're talking about here. So. I asked my team. Hey. We've been developing this thing. Oh. It's open source. But it's not yet open source. Like it's not finished. But I asked the team.

I should show it. Right? And they said absolutely not. It's not ready. So anyway. You want to demo? Okay. Just so we're clear. This is you agreeing to watch me stumble through a broken demo of unfinished software. Yes? Okay. So much could go wrong right now. Okay. This is my terminal.

Can you see it? Okay. For technical reasons, I'm not going to go to full screen. You just got to stop me when I reach the edge. Oh, actually I can see it. Never mind. Okay. Yeah. Old school. Old school. Okay. We used to do this all the time. In the old days.

Okay. So here's what I'm going to do. I'm going to just try to develop something very simple here. I've got an empty directory. I'm going to try and make a little homepage for my awesome container use project. And I'm going to use Claude. Claude Code. I'm going to try and use a bunch of them.

Hopefully I made something very clear. This is not a coding agent. It's environments that are portable that you can attach to any coding agent. That's the idea. So you like Claude? Use Claude. You like Codex? Use Codex. Et cetera, et cetera, et cetera. In an IDE. In the command line.

Whatever. And also in the cloud. Right? In CI. Lots of cool things you can do once you're async. So one of the reasons the team said don't do a demo is I'm actually terrible at using Claude. So I have an alias for remembering the flag to disable all permissions.

I can never remember it. And I have a prompt here. I'll read it to you in a minute. But it's basically make me a homepage. Make it a Go web app so I can know what's going on because I'm not a cool kid writing TypeScript. And run the app when you're done.

So while this runs, while this maybe runs, hopefully. Okay. Okay, cool. So what's happening here is I configured Claude Code to use, you know, with container use. To use containers, literally. Yeah, MCP. So it was an MCP integration. There are other integrations that we're working on. But MCP is the obvious place to start.

And so now it has, you know, all its usual tools. This is vanilla Claude Code. But now it can create an environment for itself. And now it's editing files in that environment, like a little sandbox. And it can also run commands to build it and test it and, of course, run it in ephemeral containers.

This is not one Docker container sitting there. Every time an action needs to be taken, there's an ephemeral container running and then being snapshotted and returning. So it's just doing its thing. What would I want to show here? Okay. So here I'm going to first show that nothing has been polluting my workspace.

It's happening in a little sandbox. And the way the sandbox works, the state of these files and the containers that are being run is actually persisted in Git, in a bunch of special Git objects that are kind of living alongside the repo. So it's right there if I need it.
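The snapshot behavior described here can be pictured with a small toy model. This is only a sketch under stated assumptions, not how container use is implemented: `SnapshotStore`, `run_action`, and `snapshot_id` are hypothetical names, plain dictionaries stand in for container filesystems, and a hash stands in for a Git object id. It shows each action running against a throwaway copy of the previous state, the result being recorded as a new snapshot with a parent pointer, and the user's own workspace never being touched.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Files = Dict[str, str]


def snapshot_id(files: Files, parent: Optional[str]) -> str:
    """Content-address a snapshot from its files and its parent id."""
    payload = json.dumps({"files": files, "parent": parent}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


@dataclass
class SnapshotStore:
    """Hypothetical stand-in for the Git objects that hold environment state."""

    snapshots: Dict[str, dict] = field(default_factory=dict)

    def commit(self, files: Files, parent: Optional[str]) -> str:
        sid = snapshot_id(files, parent)
        self.snapshots[sid] = {"files": dict(files), "parent": parent}
        return sid

    def history(self, sid: Optional[str]) -> List[str]:
        """Walk parent pointers, newest first, like reading a log."""
        log = []
        while sid is not None:
            log.append(sid)
            sid = self.snapshots[sid]["parent"]
        return log


def run_action(
    store: SnapshotStore,
    parent: Optional[str],
    action: Callable[[Files], None],
) -> str:
    """Run one action on a throwaway copy of the parent state, then snapshot it."""
    files = dict(store.snapshots[parent]["files"]) if parent else {}
    action(files)  # plays the role of the ephemeral container: only the copy changes
    return store.commit(files, parent)
```

Calling `run_action` twice in a row yields two snapshot ids, and `store.history(second_id)` returns them newest first, which is the shape of history the demo later reads back.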

This is all local. But it's not polluting my workspace by default. So hopefully it's going to produce something soon. While it does that, I'm going to use this little command line. Is this readable? Okay. Little command line: cu. Like, go work, see you later. But no, really, it's for container use.

And I can list environments. And you can see there's a new environment that's been created here with a little random name here. And so there's a few things I can do. One thing I can do is open a terminal. And here -- okay, this part is powered by Dagger, right?

But we use Dagger as a sort of a toolbox. It has all the primitives you need. And so here I can see exactly what the agent sees. The files, but also the tools. So I can see, okay, what Go version did you configure for yourself? All right, because the agent is given the ability to figure out what environment it needs and then configure that, but in a repeatable containerized way.
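One way to picture "repeatable" is to treat the environment definition as plain data, so the same definition always describes the same environment. The sketch below is an illustration only: `EnvironmentSpec`, its fields, and `with_setup` are hypothetical names, and the image tag and commands in the usage note are example values, not anything shown in the demo.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical declarative description of an agent's environment."""

    base_image: str
    setup_commands: Tuple[str, ...] = ()

    def fingerprint(self) -> str:
        """Same definition in, same fingerprint out."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]

    def with_setup(self, command: str) -> "EnvironmentSpec":
        """Return a new spec with one more setup step; the original is unchanged."""
        return EnvironmentSpec(self.base_image, self.setup_commands + (command,))
```

For example, `EnvironmentSpec("golang:1.24", ("go mod download",))` built twice gives the same fingerprint, while `with_setup("go build ./...")` gives a different one. A human opening a terminal and an agent running a build could then both refer to the same definition.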

So here I can see -- okay, does it build? Okay, it builds. Okay, so you're done? What's going on? Okay. While we do that, I'm also going to show you -- actually, I have two more things to say. One, a really cool feature of this that I'm not going to show is secrets.

So you can just plug in secrets from things like 1Password. I use 1Password. I don't want to use a separate password manager from an AI company. No offense. I just want to use my password manager. So I can just plug in and say, this environment gets this secret. And boom, it can use it, right?
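A rough sketch of that idea, assuming the environment stores only a reference to each secret and resolves it at the moment a command runs. Everything here is hypothetical: the `Environment` class, `grant`, and `command_env` are illustrative names, and the resolver is a caller-supplied function standing in for a real password manager lookup. The `op://vault/item/field` string in the usage note follows 1Password's secret reference format, used purely as an example value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Environment:
    """Hypothetical environment that holds secret references, never secret values."""

    secret_refs: Dict[str, str] = field(default_factory=dict)

    def grant(self, name: str, reference: str) -> None:
        """Record that this environment may use the secret behind `reference`."""
        self.secret_refs[name] = reference

    def command_env(self, resolve: Callable[[str], str]) -> Dict[str, str]:
        """Resolve references just before a command runs; values are not stored."""
        return {name: resolve(ref) for name, ref in self.secret_refs.items()}

    def snapshot(self) -> Dict[str, Dict[str, str]]:
        """What would be persisted: references only."""
        return {"secret_refs": dict(self.secret_refs)}
```

With `env.grant("API_TOKEN", "op://dev/api/token")`, the persisted snapshot contains the reference string but not the token, and the token only appears in the dictionary returned by `command_env`.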

And then the team said, please don't show that. That's just -- that's going to break for sure. So I won't. And the other thing I want to say is that because it's all powered by Dagger, and the point here, it's containers and it's open source. That's what you should know.

It's running on my machine. Actually, no. It's not running on my machine because we're at a conference, and there's a lot of things that can go wrong if you run containers and download images. So instead, I just have it running on my home server in my basement, about one mile this way.

And it just kind of works seamlessly. It's streaming files up, streaming files down. It all just kind of works. Okay. This is the part that I cannot control, as you know. Okay. One more thing I'll show you. You can watch. So here, I can see the history. So behind the scenes, every snapshot of the state is like a Git log.

It's actually using Git under the hood. So if I'm happy with the results, I can go and get it. So it's like a happy medium between the -- it's like a loop, a collaboration loop that's just right. It's not watching every tool and wrecking a shared environment, but it's not waiting for a pull request and, you know, having these long back and forth.

It's right in the middle. I can see everything going on, and I can say, okay, give me the history of that. I want that. Okay. It says it's live. It's running. Ooh. Pretty nice. Cool. Okay. So now -- Okay. I appreciate it, but you guys can be honest. It's a little boring.

So this design is boring. Make it really pop. Trying to impress AI engineers. There. Okay. Okay. Okay. So the reason I'm doing that is trying to create the circumstances where I would need a lot of parallel experiments, right? Make it pop. What does that mean? It means anything. What if I want to try several experiments in parallel, right?

So I'm just going to say -- Oh! Well, hold on one second. Stop. Before I do that, I'm going to merge this, right? There's still nothing here, but I'm saying I like it. So I'm going to say, merge that environment. And I have it. It's my history. I can open a pull request.

I can clean it up. Whatever. So that's a loop that I can work with, right? And now I can say, nah, boring. And then I can say, since the environment is now in this state, I can ask for help from a few other agents, right? I can say, okay, hey, Claude Yolo.

Nope, that's not right. Claude Yolo, this web app looks a bit boring. Can you make it pop, please? Okay. And go. And go. And go. Okay. So this is where things start really going wrong. But as the team pointed out, they said, well, something's going to go wrong, right?

They said, yeah, but you were kind of showing that if things go wrong, you can throw away the environment, and you're good. You can restart. I said, okay, that's cool. So let's say I don't like this one. I'm like, nope. Goodbye. That's it. I don't have to go clean up the mess, right?

That's the whole point. Okay. So this is getting a little messy. Oh, I wanted to show Goose also. So Goose is a really cool open source agent. Whoops. All right. Hold on a second. Goose Yolo. Same thing. Everyone has complicated flags for disabling all these safeties that I don't need anymore, right?

Because it's -- Okay. Sorry. Okay. Well, I'm really taking a chance here. So while this is happening, one thing we've been working on, but it's still a work in progress, is there's a watch command. Oh, I showed you that already. But as -- So as -- This is a git command, right?

Thinly wrapped git commands. Our UX is really -- I cannot -- Words cannot express how unfinished this is. But -- But it'll evolve rapidly because the bones are strong. It's Git, it's Dagger, and, you know, it's your existing agent. Right? And then a little bit of glue. So, for example, here is literally -- It's a git command that you can copy/paste.

But as the agents work, you're going to see state snapshotting, and you're going to see these branches just kind of diverging. And then I can diff them and apply them, merge them, whatever I want. And what I really wanted to show, and then I'm done, is I just want to see one of them run.

So you can see when the agent runs a service, like in this case, go run, npm run, whatever, it's doing it in its containerized environment. And that's going to seamlessly be tunneled to my machine here on a different port without any conflicts, right? So when I say the environment's isolated, it's the files, it's context, it's configuration, and it's execution, right?
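The "different port without any conflicts" part can be illustrated with a small sketch. This is an assumption about one reasonable way to do it, not a description of how the tunneling in the demo works: `TunnelTable` and `expose` are hypothetical names, and only the port bookkeeping is modeled, not the tunnel itself. Binding to port 0 asks the operating system for any unused port, so two environments that both serve on 8080 inside their containers end up on different host ports.

```python
import socket
from typing import Dict, Tuple


def free_host_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 means "pick any free port"
        return s.getsockname()[1]


class TunnelTable:
    """Hypothetical map from (environment, container port) to a host port."""

    def __init__(self) -> None:
        self._ports: Dict[Tuple[str, int], int] = {}

    def expose(self, env_name: str, container_port: int) -> int:
        """Return a stable host port for this environment's service."""
        key = (env_name, container_port)
        if key not in self._ports:
            port = free_host_port()
            # The probe socket is already closed, so guard against reuse in this table.
            while port in self._ports.values():
                port = free_host_port()
            self._ports[key] = port
        return self._ports[key]
```

A real implementation would keep the listener open rather than probing and releasing, since another process could claim the port between the probe and the actual bind.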

And the cool extra thing is all of this is actually technically this here is running in my basement. So you can go crazy on the infrastructure side. Like you can run this on a cluster. We like to run this stuff from CI. It's just a lot of fun stuff you can do.

And I'm getting at 30 seconds. Oh, Goose is running. Great. Okay. We did not solve prompt engineering. Do it. Okay, not done, not done. Oh, man. Okay. Well, just imagine-- Okay, well, while this happens, because I've got 30 seconds left, I'm just going to say thank you. And there's one last thing I want to say about DockerCon.

Ten years ago, we used to open source stuff on stage all the time. So if you want, I can go and open source it right now. Okay. You have been warned, though, about the not finished part, right? Okay. Okay. Oh, I think my-- It would be funny if the demo failed at the clicking on GitHub part.

Okay. All right. Goodbye. Goodbye. Next time. I promise it works. Okay. Haven't done this in a while. Wait. Oh. I'm almost done. I promise. Come on. You did so well. Change visibility. Yes. I want-- Yes. I have read and understand. Oh, God. Oh, God. Yes. At Dagger, we take security very seriously.

Okay. All right. I think it's-- Wait. I think it's done. Yes. Okay. So, yeah. Thank you very much. And it's github.com/dagger/container-use. Come say hi. Come participate. And thank you so much for having me.

