Containing Agent Chaos — Solomon Hykes, Dagger

00:00:35.880 |
because I just realized yesterday walking in, 00:00:38.380 |
this is the exact same spot, the same stage, actually, 00:00:41.580 |
that I stepped on almost exactly day for day 10 years ago 00:01:11.900 |
from the perspective of our community at Dagger, 00:01:16.880 |
I don't know if there's any platform engineers in the room. 00:01:26.520 |
Okay, well, it is known sometimes as other things, 00:01:32.960 |
but basically platform engineers have a really tough job 00:01:35.660 |
because they don't get to build and ship cool software. 00:01:39.240 |
They get to enable all of you to build and ship cool software 00:01:54.960 |
you know, just the gratitude we get from developers. 00:01:58.020 |
No one ever says, "Thank you," but it's okay. 00:02:05.800 |
Speaking of enabling, anyone here use coding agents? 00:02:13.900 |
Okay, well, I want to say to you congratulations 00:02:22.120 |
I mean, your job now is to enable robots to ship awesome software 00:02:28.080 |
while you spend more and more of your time enabling them 00:02:35.220 |
I applaud you for giving up really the most fun 00:02:46.260 |
Yeah, so, of course, this is not completely a reality yet. 00:02:50.760 |
I mean, we don't have quite yet the team of agents just kind of, 00:02:55.900 |
you know, humming along, doing the job while we sit back 00:03:02.640 |
I mean, some of you are definitely doing that, 00:03:07.280 |
There's a lot of cool posts out there and scripts and tools. 00:03:13.280 |
The question is, how do we enable this to happen, 00:03:17.780 |
not just for this incredibly cool and bleeding-edge crowd, 00:03:26.620 |
Like, everyone shipping software everywhere just sort of creating maximum value 00:03:31.500 |
by enabling agents to do the work for them, ultimately taking their jobs. 00:03:41.840 |
How do we do that and make it not too painful? 00:03:47.200 |
The famous definition, of course, is it's an LLM that's wrecking everything in a loop on behalf of a human. 00:04:00.140 |
In the context of coding agents, it looks like this. 00:04:10.420 |
Yeah, you've got one agent, and it's doing stuff in the environment, and the environment is your computer. 00:04:18.780 |
So you have to kind of watch it closely, right, and approve, approve. 00:04:35.160 |
Well, right now, I would say there are two options, both equally wonderful and fun. 00:04:41.220 |
The first one I call YOLO mode, you know, I'll just run 10. 00:04:50.340 |
Amazingly, this diagram is not the worst case scenario, but, yeah, you know, you get the idea. 00:04:56.540 |
So the whole methodology of watching it closely just kind of falls apart really quickly because 00:05:16.580 |
You know, just open an issue, wait for the PR, relax, until, of course, it doesn't work, 00:05:23.340 |
and then you're like, no, that's not what I meant. 00:05:28.460 |
I think, like, 10 of those launched just today and yesterday. 00:05:34.460 |
It's just that, you know, sometimes you just want to get in there. 00:05:41.460 |
And sometimes you just want to run it on your machine or on your favorite compute provider. 00:05:48.460 |
There are limitations to this all-in-one model. 00:05:51.100 |
So the question is, is there something better? 00:05:54.520 |
Is there just a scenario where I just got a team and they're working and, you know, I can 00:06:00.720 |
step in or leave them alone and we're just kind of getting stuff done together? 00:06:05.120 |
So this is how I would summarize it, what I would want. 00:06:12.120 |
You know, I don't want to be in there just watching every action. 00:06:19.120 |
And that means I want to be able to constrain the agent to not just do things that I already 00:06:27.120 |
So obvious things like context of the project. 00:06:40.120 |
Because otherwise, I'm going to waste so many tokens just correcting as I go, right? 00:06:46.120 |
The third is, inevitably, when I do need to step in, I want a really efficient and seamless 00:06:53.120 |
And it can't be watch every action and it can't be just wait for the PR and do code review. 00:07:01.120 |
And the fourth thing is I want optionality because, like I was saying before, it's a 00:07:07.120 |
You know, there's awesome models, awesome compute, awesome infrastructure. 00:07:15.120 |
And as cool as they are now, I mean, one of you is probably, like, launching one right 00:07:22.120 |
So I don't really want to lock myself into a whole package today and say no in advance 00:07:32.120 |
So to get that, I need an environment that has properties that match this. 00:07:43.120 |
It needs to be customizable so I can set up those rails. 00:07:46.120 |
It needs to be multiplayer so I can, you know, go, all right, give me that. 00:08:00.120 |
No shade on making money and scaling a huge cloud service. 00:08:11.120 |
I want to be able to choose and get the best commodity. 00:08:18.120 |
It's the best commodity component for each job. 00:08:27.120 |
Anyway, so unsurprisingly, maybe, I'm going to talk about containers now. 00:08:33.120 |
Someone actually said, you know, you should check that they know Docker. 00:08:47.120 |
But the point here is we have the technology. 00:08:56.120 |
We don't fully leverage what this technology can do. 00:09:00.120 |
Because we're used to the first incarnation of the tools. 00:09:06.120 |
I see a lot of hacks involving Git worktrees. 00:09:09.120 |
Anyone playing with Git worktrees to get stuff done? 00:09:17.120 |
And of course we have models that are incredibly smart. 00:09:20.120 |
And they can exercise these technologies really fully. 00:09:25.120 |
We just need to integrate them in a native way. 00:09:27.120 |
So that we really tackle the problem at hand. 00:09:32.120 |
Which is giving great environments to these agents. 00:09:35.120 |
So if we built that native integration what would it look like? 00:09:56.120 |
They need a way to use containers to create environments and work inside of them. 00:10:05.120 |
There are a lot of ways to execute the output of the agent in a secure sandbox. 00:10:11.120 |
But that's not the same thing as the agent developing inside of containers entirely. 00:10:45.120 |
This is you agreeing to watch me stumble through a broken demo of unfinished software. 00:11:02.120 |
For technical reasons, I'm not going to go to full screen. 00:11:05.120 |
You just got to stop me when I reach the edge. 00:11:25.120 |
I'm going to just try to develop something very simple here. 00:11:32.120 |
I'm going to try and make a little homepage for my awesome container-use project. 00:11:45.120 |
It's environments that are portable that you can attach to any coding agent. 00:12:03.120 |
Lots of cool things you can do once you're async. 00:12:05.120 |
So one of the reasons the team said don't do a demo is I'm actually terrible at using Claude. 00:12:12.120 |
So I have an alias for remembering the flag to disable all permissions. 00:12:27.120 |
Make it a Go web app so I can know what's going on because I'm not a cool kid writing TypeScript. 00:12:35.120 |
So while this runs, while this maybe runs, hopefully. 00:12:42.120 |
So what's happening here is I configured Claude Code to use, you know, with container-use. 00:12:51.120 |
There are other integrations that we're working on. 00:12:56.120 |
And so now it has, you know, all its usual tools. 00:13:03.120 |
But now it can create an environment for itself. 00:13:05.120 |
And now it's editing files in that environment, like a little sandbox. 00:13:09.120 |
And it can also run commands to build it and test it and, of course, run it in ephemeral containers. 00:13:15.120 |
This is not one Docker container sitting there. 00:13:18.120 |
Every time an action needs to be taken, there's an ephemeral container running and then being snapshotted and returning. 00:13:31.120 |
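A minimal Go sketch of the "run, snapshot, return" loop described above. This is an in-memory model, not container-use's or Dagger's actual implementation: `Snapshot`, `Action`, and `Run` are hypothetical names, and a copied map stands in for an ephemeral container's filesystem.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Snapshot is an immutable view of the environment's files.
// Hypothetical type, for illustration only.
type Snapshot struct {
	Files map[string]string
}

// Action is one step the agent wants to take. It receives a private,
// throwaway copy of the files and may mutate it freely.
type Action func(files map[string]string) error

// Run executes the action against a fresh copy of the snapshot (standing in
// for the ephemeral container) and returns the result as a new snapshot.
// If the action fails, the original snapshot is returned unchanged.
func Run(s Snapshot, a Action) (Snapshot, error) {
	scratch := make(map[string]string, len(s.Files))
	for name, body := range s.Files {
		scratch[name] = body
	}
	if err := a(scratch); err != nil {
		return s, err
	}
	return Snapshot{Files: scratch}, nil
}

// List returns the file names in sorted order, joined by commas.
func (s Snapshot) List() string {
	names := make([]string, 0, len(s.Files))
	for name := range s.Files {
		names = append(names, name)
	}
	sort.Strings(names)
	return strings.Join(names, ",")
}

func main() {
	base := Snapshot{Files: map[string]string{"go.mod": "module demo"}}

	next, err := Run(base, func(files map[string]string) error {
		files["main.go"] = "package main"
		return nil
	})
	if err != nil {
		fmt.Println("action failed:", err)
		return
	}
	fmt.Println("before:", base.List())
	fmt.Println("after: ", next.List())
}
```

The point of the shape is that no action ever edits a long-lived environment in place: each one produces a new snapshot, so a bad step can simply be dropped.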
So here I'm going to first show that nothing has been polluting my workspace. 00:13:38.120 |
And the way the sandbox works, the state of these files and the containers that are being run is actually persisted in Git. 00:13:47.120 |
And a bunch of special Git objects that are kind of living alongside the repo. 00:13:56.120 |
But it's not polluting my workspace by default. 00:13:59.120 |
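For context on why Git can hold this state without touching the checked-out files: Git stores content as objects addressed by a hash of their bytes, independent of the working tree. The Go sketch below computes the ID Git gives a blob in a SHA-1 repository. How container-use actually names and organizes its objects is not shown in the talk, so this only illustrates the underlying mechanism; `gitBlobID` is a hypothetical helper.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// gitBlobID returns the object ID Git assigns to a blob with this content
// in a SHA-1 repository: sha1("blob <size>\x00<content>").
func gitBlobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// The same bytes always map to the same ID, whether or not a file
	// with that content exists in the working tree.
	fmt.Println(gitBlobID([]byte("hello\n")))
}
```

The result can be compared against `git hash-object` on a file with the same content.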
So hopefully it's going to produce something soon. 00:14:01.120 |
While it does that, I'm going to use this little command line. 00:14:16.120 |
And you can see there's a new environment that's been created here with a little random name here. 00:14:26.120 |
And here -- okay, this part is powered by Dagger, right? 00:14:34.120 |
And so here I can see exactly what the agent sees. 00:14:42.120 |
So I can see, okay, what Go version did you configure for yourself? 00:14:46.120 |
All right, because the agent is given the ability to figure out what environment it needs 00:14:51.120 |
and then configure that, but in a repeatable containerized way. 00:15:05.120 |
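One way to picture "repeatable" configuration is a declarative spec that always renders to the same build instructions. The Go sketch below is an assumption for illustration, not the format container-use uses: `EnvSpec` and its fields are hypothetical, and the image tag is just an example.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvSpec is a hypothetical, declarative description of an agent environment.
type EnvSpec struct {
	BaseImage string   // container image to start from, e.g. "golang:1.22"
	Workdir   string   // directory the agent works in
	Setup     []string // shell commands run once when building the environment
}

// Dockerfile renders the spec as Dockerfile text. The same spec always
// yields the same instructions, which is what makes the setup repeatable.
func (e EnvSpec) Dockerfile() string {
	var b strings.Builder
	fmt.Fprintf(&b, "FROM %s\n", e.BaseImage)
	fmt.Fprintf(&b, "WORKDIR %s\n", e.Workdir)
	for _, cmd := range e.Setup {
		fmt.Fprintf(&b, "RUN %s\n", cmd)
	}
	return b.String()
}

func main() {
	spec := EnvSpec{
		BaseImage: "golang:1.22",
		Workdir:   "/src",
		Setup:     []string{"go version"},
	}
	fmt.Print(spec.Dockerfile())
}
```

Because the spec is data rather than a sequence of ad hoc shell commands, a human can inspect exactly which Go version the agent picked, as in the demo.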
While we do that, I'm also going to show you -- actually, I have two more things to say. 00:15:09.120 |
One, a really cool feature of this that I'm not going to show is secrets. 00:15:13.120 |
So you can just plug in secrets from things like 1Password. 00:15:18.120 |
I don't want to use a separate password manager from an AI company. 00:15:24.120 |
So I can just plug in and say, this environment gets this secret. 00:15:29.120 |
And then the team said, please don't show that. 00:15:32.120 |
That's just -- that's going to break for sure. 00:15:35.120 |
And the other thing I want to say is that because it's all powered by Dagger, 00:15:39.120 |
and the point here, it's containers and it's open source. 00:15:47.120 |
It's not running on my machine because we're at a conference, 00:15:51.120 |
and there's a lot of things that can go wrong if you run containers 00:15:55.120 |
So instead, I just have it running on my home server in my basement, 00:16:11.120 |
This is the part that I cannot control, as you know. 00:16:19.120 |
So behind the scenes, every snapshot of the state is like a Git log. 00:16:25.120 |
So if I'm happy with the results, I can go and get it. 00:16:28.120 |
So it's like a happy medium between the -- it's like a loop, 00:16:35.120 |
It's not watching every tool and wrecking a shared environment, 00:16:40.120 |
but it's not waiting for a pull request and, you know, 00:16:46.120 |
I can see everything going on, and I can say, 00:17:24.120 |
So the reason I'm doing that is trying to create the circumstances 00:17:27.120 |
where I would need a lot of parallel experiments, right? 00:17:32.120 |
What if I want to try several experiments in parallel, right? 00:17:40.120 |
Before I do that, I'm going to merge this, right? 00:17:47.120 |
There's still nothing here, but I'm saying I like it. 00:18:00.120 |
So that's a loop that I can work with, right? 00:18:07.120 |
And then I can say, since the environment is now in this state, 00:18:11.120 |
I can ask for help from a few other agents, right? 00:18:21.120 |
Claude Yolo, this web app looks a bit boring. 00:18:42.120 |
So this is where things start really going wrong. 00:18:51.120 |
They said, yeah, but you were kind of showing that if things go wrong, 00:18:54.120 |
you can throw away the environment, and you're good. 00:19:20.120 |
Everyone has complicated flags for disabling all these safeties 00:19:36.120 |
So while this is happening, one thing we've been working on, 00:19:42.120 |
but it's still a work in progress, is there's a watch command. 00:19:59.120 |
But it'll evolve rapidly because the bones are strong. 00:20:03.120 |
It's Git, it's Dagger, and, you know, it's your existing agent. 00:20:15.120 |
But as the agents work, you're going to see state snapshotting, 00:20:19.120 |
and you're going to see these branches just kind of diverging. 00:20:22.120 |
And then I can diff them and apply them, merge them, whatever I want. 00:20:27.120 |
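The diverge, diff, and merge workflow can be sketched in Go with plain maps standing in for environments. This is a toy model under stated assumptions: `Files`, `fork`, `diff`, and `runExperiments` are hypothetical names, and "merging" is reduced to choosing one branch as the new base, which is much simpler than a real Git merge.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Files maps a path to its content. Hypothetical stand-in for an environment.
type Files map[string]string

// fork returns an independent copy, like a branch starting from a shared base.
func fork(base Files) Files {
	out := make(Files, len(base))
	for k, v := range base {
		out[k] = v
	}
	return out
}

// diff lists, in sorted order, the paths added, changed, or removed in branch.
func diff(base, branch Files) []string {
	var changed []string
	for path, body := range branch {
		if old, ok := base[path]; !ok || old != body {
			changed = append(changed, path)
		}
	}
	for path := range base {
		if _, ok := branch[path]; !ok {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}

// runExperiments applies each experiment to its own fork, concurrently.
// The base is only read, so experiments cannot interfere with each other.
func runExperiments(base Files, experiments []func(Files)) []Files {
	results := make([]Files, len(experiments))
	var wg sync.WaitGroup
	for i, exp := range experiments {
		wg.Add(1)
		go func(i int, exp func(Files)) {
			defer wg.Done()
			branch := fork(base)
			exp(branch)
			results[i] = branch
		}(i, exp)
	}
	wg.Wait()
	return results
}

func main() {
	base := Files{"main.go": "v1", "style.css": "plain"}
	branches := runExperiments(base, []func(Files){
		func(f Files) { f["style.css"] = "dark" },
		func(f Files) { f["logo.svg"] = "<svg/>" },
	})
	for i, b := range branches {
		fmt.Printf("experiment %d changed %v\n", i, diff(base, b))
	}
	// "Merging" here just means adopting one branch as the new base.
	base = branches[0]
	fmt.Println("merged style:", base["style.css"])
}
```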
And what I really wanted to show, and then I'm done, 00:20:33.120 |
So you can see when the agent runs a service, 00:20:38.120 |
like in this case, go run, npm run, whatever, 00:20:42.120 |
it's doing it in its containerized environment. 00:20:44.120 |
And that's going to seamlessly be tunneled to my machine here 00:20:47.120 |
on a different port without any conflicts, right? 00:20:53.120 |
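A rough Go sketch of how a service can be exposed on a different local port without conflicts: ask the OS for a free port by listening on port 0, then relay traffic to the real service. This uses only the standard library and is not how Dagger implements its tunnels. `forward`, `relay`, `startEcho`, and `roundTrip` are hypothetical helpers, and the echo server stands in for the agent's `go run` process.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// forward listens on a free local port chosen by the OS and relays every
// connection to target. The caller reads the chosen address from the listener.
func forward(target string) (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0: let the OS pick
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			client, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go relay(client, target)
		}
	}()
	return ln, nil
}

// relay copies bytes in both directions between client and target.
func relay(client net.Conn, target string) {
	defer client.Close()
	upstream, err := net.Dial("tcp", target)
	if err != nil {
		return
	}
	defer upstream.Close()
	go func() {
		io.Copy(upstream, client)
		upstream.Close() // client is done: unblock the copy below
	}()
	io.Copy(client, upstream)
}

// startEcho stands in for the service running inside the environment.
func startEcho() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return
			}
			go func() { defer c.Close(); io.Copy(c, c) }()
		}
	}()
	return ln, nil
}

// roundTrip sends msg through the forwarded port and returns the reply.
func roundTrip(msg string) (string, error) {
	svc, err := startEcho()
	if err != nil {
		return "", err
	}
	defer svc.Close()
	fwd, err := forward(svc.Addr().String())
	if err != nil {
		return "", err
	}
	defer fwd.Close()

	conn, err := net.Dial("tcp", fwd.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	reply, err := roundTrip("ping")
	fmt.Println(reply, err)
}
```

Letting the OS assign the port is what avoids collisions when several environments each run a service that expects the same port internally.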
it's the files, it's context, it's configuration, 00:20:59.120 |
And the cool extra thing is all of this is actually technically 00:21:05.120 |
So you can go crazy on the infrastructure side. 00:21:34.120 |
Okay, well, while this happens, because I've got 30 seconds left, 00:21:42.120 |
And there's one last thing I want to say about DockerCon. 00:21:48.120 |
Ten years ago, we used to open source stuff on stage all the time. 00:21:51.120 |
So if you want, I can go and open source it right now. 00:22:04.120 |
You have been warned, though, about the not finished part, right? 00:22:12.120 |
It would be funny if the demo failed at the clicking on GitHub part.