back to indexBreakthrough Agents: Building Reliable Agentic Systems

00:00:00.000 | 
Hey everybody, my name is Eno, co-founder and CTO of a company called Factory. 00:00:19.760 | 
At Factory, we believe that the way we build software is radically changing. 00:00:26.240 | 
We are transitioning from the era of human-driven software development to agent-driven software 00:00:33.380 | 
You can see glimpses of that today, however, it seems like we are trying to get to that 00:00:42.700 | 
The current zeitgeist is to take the IDE, a tool that was designed first and foremost 00:00:48.260 | 
for human beings to write lines of code by hand, and add AI on top of that. 00:00:54.480 | 
And then you keep adding AI until kind of something changes. 00:00:59.940 | 
But what we've seen is that in order for organizations, both small and large, to unlock changes in productivity 00:01:08.840 | 
beyond just 10%, 15%, there is more of a workflow change that is demanded. 00:01:16.500 | 
And so, we've noticed that in order to truly get 5, 10, 20x performance, you need to start 00:01:24.140 | 
shifting your mindset from, "We are collaborating with AI systems," to, "We are delegating tasks 00:01:33.680 | 
In order to make that transition, you start to realize that you need a really different 00:01:38.140 | 
type of tool, a different type of platform, one that has an interface that lets you manage 00:01:49.020 | 
You need infrastructure that lets you scale up and parallelize these agents at the same time. 00:01:55.880 | 
You need context from across your entire engineering system that integrates more than just GitHub, 00:02:02.540 | 
source code, or Jira, but also your observability tools, right? 00:02:10.280 | 
And you need to connect all of this with agents that can actually reliably execute on the task 00:02:18.040 | 
We think that this deserves an entirely new sort of platform. 00:02:25.580 | 
Most of the enterprise organizations that we work with have 5, 10,000-plus engineers, right? 00:02:32.260 | 
And so, this transition doesn't just come from kind of switching your editor or from switching 00:02:38.920 | 
kind of to watching some tutorials about Vibe coding. 00:02:42.200 | 
Instead, it kind of requires an active investment. 00:02:45.340 | 
And so, I'm going to kind of talk about some of the things we've done in our platform that 00:02:50.240 | 
enable this human-computer interaction with agents and also talk about some of the things 00:02:55.100 | 
that we've done to enable reliable agents that can actually execute on some of the tasks 00:03:03.120 | 
So I'm going to start with a quick video of what's possible. 00:03:06.780 | 
This video showcases a quick glimpse of what it's like to trigger an agentic system. 00:03:12.160 | 
We call them droids from your project management system. 00:03:15.920 | 
And then you watch, in a platform, watch it go from ticket to pull request. 00:03:22.300 | 
These types of tasks have quickly become table stakes for agentic systems. 00:03:27.600 | 
Tasks with a clear goal, clear success criteria, they're now quite solvable with AI. 00:03:34.540 | 
And the agent accomplishes these tasks with a combination of tools, reasoning, search, right, 00:03:40.920 | 
computer use, and then it can take all of this and bring it into one of your existing 00:03:45.780 | 
platforms, your source code manager, maybe GitHub. 00:03:54.240 | 
These are all questions that plenty of people have spoken about. 00:04:00.020 | 
And so, before we go into super deep detail about how you get something like this, going 00:04:05.040 | 
from a plan all the way to a mergeable code, I'm going to just talk a bit about what we think 00:04:14.020 | 
So we found that there's lots of different workflows, some people call them cognitive architectures, 00:04:20.020 | 
that we've experimented with, plenty of other teams have experimented with. 00:04:24.360 | 
And it's hard to draw an explicit boundary of what an agent is. 00:04:28.860 | 
But there are three characteristics that we think are pretty consistent amongst the systems 00:04:37.780 | 
Your agentic system needs to make plans that decide one or more futures actions in the system. 00:04:44.260 | 
That can be very simple, like outlining a single kind of edit call and then returning back 00:04:50.080 | 
to the user, or it could be quite complex, like saying, "We're going to go search through 00:04:54.960 | 
the code base, plan out a couple of different edits, create tickets in our project management 00:04:59.860 | 
system, execute on each of those tickets, and come back with a write-up sent in Slack to 00:05:09.840 | 
But in addition to that, you also need decision-making, right? 00:05:13.840 | 
And so when you are executing on a plan, there are tons of different data and interactions 00:05:20.400 | 
that your agent needs to be able to make, right? 00:05:22.980 | 
Look at the existing state and make a call based on that. 00:05:26.860 | 
A lot of the time, reasoning is referred to as an agent's ability to make these decisions 00:05:35.160 | 
And then finally, agents need environmental grounding, right? 00:05:39.020 | 
So, agentic systems read and write information to their external environment. 00:05:45.480 | 
Ideally, they also use this information to react and adapt to changes in that environment. 00:05:54.580 | 
When I think about agentic systems, most of the challenge in making them reliable exists 00:06:03.420 | 
But there's kind of a meta problem that exists, which is, no matter how reliable your agent 00:06:09.880 | 
system is, you have to answer the question, where does the human fit in? 00:06:15.960 | 
We think that we're currently at the point in time in history with the least number of developers 00:06:23.840 | 
Humans are here to stay in building software. 00:06:27.300 | 
But there's an outer loop and there's an inner loop of software development. 00:06:31.780 | 
In the outer loop, you are reasoning about what needs to get done. 00:06:37.180 | 
You're listening to customers and translating those into requirements. 00:06:40.780 | 
You're iterating on different architectural decisions. 00:06:44.440 | 
And in that inner loop, you are writing lines of code. 00:06:52.700 | 
We think that the inner loop is going away very soon. 00:06:56.320 | 
And agents will take up the vast majority of that. 00:06:59.120 | 
And so how do you create an AI UX that blends delegation so that you can stay in flow in 00:07:09.760 | 
When an agent ultimately can't accomplish something because of the technology, because of the current 00:07:14.320 | 
state of where we're at, you need to be able to steer that with precision. 00:07:18.720 | 
So that's kind of a background thing that I want to reference as we deep dive into those 00:07:28.480 | 
We like to say a droid is only as good as its plan. 00:07:32.760 | 
One of the biggest challenges we face is ensuring that a droid creates a good plan and sticks to 00:07:39.780 | 
We were inspired by robotics and control systems in thinking about what are the techniques 00:07:46.160 | 
we can use to improve the ability of the system to make high quality plans. 00:07:51.800 | 
If a droid goes off and executes on something and it's totally the wrong idea, a customer's 00:08:00.100 | 
And so there's a couple of things that we'd call out. 00:08:05.540 | 
Plans don't need to be just a high level do the thing, right? 00:08:14.040 | 
Your plans should be continuously updated by the environment as it's executed on. 00:08:23.000 | 
Certain tasks have a certain sort of template or workflow, right, that they can be executed 00:08:30.080 | 
Now, if you're too rigid with your plan templates and how your system wants to do that, you know, 00:08:33.320 | 
do something, you're going to reduce creativity. 00:08:35.820 | 
But if you know that at the end of the day, when your agent codes, that it's going to create 00:08:39.900 | 
a pull request or a commit, then you can sort of start to reason through what the beginning, 00:08:45.760 | 
intermediate, and end steps of your plan might look like, right? 00:08:48.960 | 
And so we built all these systems into droids. 00:08:51.860 | 
And I have an example here, a real world example of a droid that is making a plan. 00:08:57.200 | 
You'll see it is searching through information, it's reasoning, it is taking that information 00:09:04.200 | 
and breaking it down into a long form plan that has multiple steps, one for front end, one 00:09:10.480 | 
for the back end, tests that it's going to do, and how it will actually execute commands to 00:09:18.180 | 
This type of planning is really hard to get right. 00:09:21.480 | 
But when you do get it right, it makes your system far more reliable at actually executing 00:09:35.080 | 
This is probably the hardest thing to control in your agents. 00:09:40.980 | 
When you're building software, when you are doing tasks across other domains, human beings 00:09:46.960 | 
are making hundreds, thousands of micro decisions around what to name the variable, the scale 00:09:53.500 | 
of the change to make, where should this change go in the code base, should we imitate the patterns 00:09:59.600 | 
of the code base, or is the code base filled with tech debt and we should instead innovate 00:10:05.460 | 
Agents need to be able to assess these trade-offs and take action decisively and correctly in order 00:10:14.040 | 
There's a few factors you can introduce to improve your agents' ability to make decisions. 00:10:22.200 | 
And when you're thinking about how to actually introduce these changes, you kind of have to 00:10:31.280 | 
think, what sort of decisions am I going to face? 00:10:34.980 | 
And do I have the ability to explicitly select, or criteria, by which my agent should make 00:10:43.440 | 
If you are an agent for travel planning or an agent for code review, right, you have a very 00:10:50.540 | 
limited set of things that the agent actually will need to decide, right? 00:10:56.140 | 
You can say, I need to decide on what the price points are. 00:10:59.800 | 
I need to decide about if this code fits a certain set of standards, right? 00:11:04.300 | 
If you're more open-ended, like you have an agent that can do a lot of different things, or write a bunch of different 00:11:11.640 | 
code paths, there's less explicit decision-making criteria that you can introduce, right? 00:11:17.360 | 
A lot of this stuff happens in the reasoning layer of models. 00:11:21.460 | 
So you really have to think about how your agent is instructed and the context that it 00:11:28.920 | 
Factory customers often ask questions like, how do I structure an API for this project, right? 00:11:34.020 | 
They expect droids to be able to evaluate requirements from the user and from their organization, assess 00:11:40.700 | 
the existing code base, reason through different performance implications, and then ultimately 00:11:48.340 | 
So this is really tricky, but powerful when done right. 00:11:51.280 | 
And so if you take some of these decision-making criteria, context from the environment, you bring 00:11:56.540 | 
it together, that is what gives an agent the ability to actually make a proper decision. 00:12:02.500 | 
And the last thing is environmental grounding, right? 00:12:05.280 | 
This is the connection your agent has with the real world. 00:12:09.020 | 
Things interact through the world in AI computer interfaces, right? 00:12:14.140 | 
Dedicated tools and context injection from other systems to the agent. 00:12:19.840 | 
This is actually really different and not so simple as saying, let's take an API or let's 00:12:27.220 | 
The entire internet and the last 40 years of computing on top of it has basically existed 00:12:36.060 | 
And so we have to build new AI computer interfaces that let our agents naturally interact with 00:12:43.580 | 
This is where we spend most of our time at Factory. 00:12:46.500 | 
We found that control over the tools your agent uses is the single most important differentiator 00:12:54.680 | 
In addition to just the tools themselves, there is also a sense that in order to ground yourself 00:13:02.000 | 
in your environment, you need to properly process information that comes in. 00:13:06.340 | 
An example of this might be if I take a CLI command and I get 100,000 lines of response 00:13:14.560 | 
If we just pass that to the agent, it's going to go off the rails, right? 00:13:21.400 | 
We need to take the important information and hand it to the agent and the unimportant information 00:13:27.120 | 
These types of decisions are make or break for complex systems that interact with huge volumes 00:13:33.060 | 
So think about how you process information, not just the tools that you're actually calling. 00:13:39.120 | 
And so here's an example of a Droid being handed a Sentry error alert. 00:13:47.760 | 
It's going to search through repositories to find the candidate, search using a couple of 00:13:52.520 | 
different strategies, semantic, glob, grep, APIs for Sentry, view that relevant information, 00:13:59.760 | 
and access GitHub PRs that happened around the time of the merged error, and then validate 00:14:07.240 | 
And so it's going to use all that knowledge to then write an RCA based on all of that information. 00:14:13.760 | 
This is the type of grounding that your systems need to do in order to go beyond just coding 00:14:18.380 | 
and enter into the world of full software development work. 00:14:24.920 | 
So just to recap, a couple of main takeaways. 00:14:28.580 | 
Start with a clear plan and start with clear boundaries. 00:14:33.360 | 
Show the work of your systems as they reason and make decisions, and that helps build trust, 00:14:41.380 | 
And iterate as your agent works, allow it to reason, search, think through hard problems, 00:14:50.880 | 
And finally, from that beginning point, design for human collaboration. 00:14:58.400 | 
Shipping high-quality software is climbing Mount Everest. 00:15:03.020 | 
A couple of notes that, you know, want to add. 00:15:08.540 | 
We're always thinking about deeper integrations, memory, and if you are thinking about working 00:15:14.340 | 
on these types of hard problems, or working on agentic systems in general, we are always 00:15:20.480 | 
And if your team is not delegating more than 50% of its engineering tasks to AI agents,