Stateful environments for vertical agents

00:00:00.040 | All right, hi. I'm Josh, founder of Synth. I help people make their agents a lot better,

00:00:21.400 | and over the last few months I found some patterns around structuring people's agent code,

00:00:29.560 | that I think they've found very helpful and I've found very helpful for thinking about how to build

00:00:35.860 | effective agents, especially for vertical applications like finance, accounting, health,

00:00:41.800 | and so on and so forth. So I like to call these stateful environments because they're environments

00:00:48.760 | that capture state for the agent. So let's define terms. What is an environment? It feels like a loose

00:00:57.100 | term, but it actually has quite a long history. People working on reinforcement learning tasks,

00:01:02.980 | which are really just tasks where you're trying to get the AI to do something without stipulating how

00:01:08.140 | to do it, have been using environments to kind of containerize the logic behind the task away from

00:01:14.560 | their AI algorithm for quite a while. So the first implementation was RL Glue, then OpenAI, back when

00:01:20.380 | OpenAI was an RL company and not really a language model company, came out with the OpenAI Gym. And then

00:01:26.260 | most recently, probably the first kind of vertical-ish application academic papers, SWE Bench and SWE Agent,

00:01:34.420 | kind of coined the term of Agent Computer Interface. So people have been thinking about containerizing a kind of stateful

00:01:42.300 | workspace for AI's to have for quite a while. This is not reinventing the wheel. We're just building on top of what

00:01:50.300 | what people have already thought about. OK, so why are we adding on this abstraction of statefulness now? Well, two years ago,

00:02:03.180 | people mostly gave their LMs tools to calculate simple sums or maybe search the internet for the weather. You really didn't need to have a lot of clean, heavy-duty abstractions

00:02:16.180 | solutions for pretty simple logic like that. As models got better and people wanted them to use more effective tools, they moved to API-based tool use. And, you know, maybe you see that with some people getting excited about MCP.

00:02:29.060 | And it wasn't really until models got a lot better with Sonnet 3.5 that people started kind of thinking about a product or an artifact that the AI works on, iterates on, improves step over step over a long horizon.

00:02:46.940 | And I think when Claude Artifacts came out is probably when a lot of people started thinking about having some abstractions to help agents like Claude work on product and artifacts like Claude Artifacts in the web app.

00:03:02.820 | So that's kind of the impetus, the why now. So what are we contributing? Well, a stateful environment is an engine that computes results external to the agent implementation.

00:03:15.700 | The agent manipulates the environment somehow, but there's a lot of logic underneath that might involve accessing an API working on an Excel document or some kind of external source of truth that gets computed on and goes into a system of record.

00:03:34.580 | It can be a lot for an agent to interact with Excel though, like the entire application. So a stateful environment exposes a kind of representation or a version of that environment that the agent can make sense of, can observe and manipulate usefully.

00:03:51.460 | So you don't have to show the agent the whole OS. You kind of just show it what it needs to see in the terminal. And then often, and this is important for people doing RL training, but it can also be really handy in multi-agent settings.

00:04:03.460 | Network boundaries so that your agent doesn't have to run in the same process as whatever your stateful environment is.

00:04:12.460 | Okay. So what does this get us? I help people improve their agents. If you containerize the logic of your vertical app into code that never changes, it's a lot easier to completely revamp your agent when the new model comes out.

00:04:26.460 | It's a lot harder to do that when all the logic is kind of just clumped together. What else does it give you?

00:04:32.340 | Well, if you have a separate process determining your environment that has standard network boundaries, you can easily have multi-agent and spin up new models or spin up new agents to work on this single product together across time and there's really no problems.

00:04:51.340 | People have thought about how to do asynchronous work. People have thought about how to do asynchronous work. And the answer to that question of how to do asynchronous work in a reliable way in production is network boundaries.

00:04:59.340 | And then I think the most exciting thing is once you have this boundary, you can start doing things like resetting the state of the thing that your agent is working on.

00:05:08.340 | You can do rollbacks. I think a lot of people working with agents in a code setting know how valuable it is to just be able to roll back the agent after it's kind of gotten derailed.

00:05:18.340 | And if you have stateful environments, that's really easy to implement. And so in particular, a few years ago, there was a paper called Language Agent Tree Search that was really impressive and it got really good results, but it's almost impossible to implement in production because just nobody had really good abstractions for it.

00:05:38.340 | And techniques like this are really useful in a long horizon setting like a lot of builders care about today. If you have a resettable environment, you sort of get Language Agent Tree Search for free.

00:05:50.340 | And so here's kind of a screenshot of a step in the tree search. The agent branched out in two directions while playing Minecraft. One of those branches did a lot better. And then it's really easy to kind of just converge, pick the best branch and go from there.

00:06:06.340 | And in a game like Minecraft where you have hundreds or thousands of steps, avoiding derailing and resetting like that can be really handy.

00:06:14.340 | But maybe not just in Minecraft, also in kind of a lot of other settings where people are having their agents do a lot of work over long horizons.

00:06:23.340 | So if you'd like to see some implementations of stateful environments, you can go to our GitHub. There's an open source repository that captures a lot of these abstractions.

00:06:34.340 | Implementations across a lot of academic benchmarks. How do you find that? Look for synth AI environments. And that's the talk.

00:06:44.340 | I'm going to go to the next one.

00:06:46.340 | We'll see you next time.

Stateful environments for vertical agents — Josh Purtell, Synth Labs