back to index

LangChain Interrupt 2025 Factory AI Building Reliable Agentic Systems – Eno Reyes


Whisper Transcript | Transcript Only Page

00:00:00.800 | The IDE, a tool that was designed first and foremost
00:00:03.960 | for human beings to write lines of code by hand
00:00:07.760 | and add AI on top of that.
00:00:10.000 | And then you keep adding AI until something changes.
00:00:15.360 | But what we've seen is that in order for organizations,
00:00:20.360 | both small and large, to unlock changes in productivity
00:00:24.560 | beyond just 10 and 15%, there is more of a workflow change
00:00:29.720 | that is demanding.
00:00:32.040 | And so we've noticed that in order to truly get 5, 10, 20x
00:00:38.160 | performance, you need to start shifting your mindset
00:00:41.080 | from we are collaborating with AI systems
00:00:44.480 | to we are delegating tasks entirely to AI systems.
00:00:49.240 | In order to make that transition, you
00:00:51.440 | start to realize that you need a really different type
00:00:54.280 | of tool, a different type of platform, one that has an interface
00:00:59.440 | that lets you manage and delegate these AI systems.
00:01:04.120 | You need infrastructure that lets you scale up and parallelize
00:01:07.880 | these agents at the same time.
00:01:10.800 | You need context from across your entire engineering system that
00:01:15.760 | integrates more than just GitHub, source code, or Jira, but also your observability
00:01:21.240 | tools, right?
00:01:22.160 | Your knowledge bases, the internet, and you need to connect all of this with agents that can
00:01:28.600 | actually reliably execute on the task in an end-to-end way.
00:01:32.240 | We think that this deserves an entirely new sort of platform.
00:01:37.240 | And so at Factory, we're building that.
00:01:40.680 | most of the enterprise organizations that we work with have 5, 10,000 plus engineers, right?
00:01:47.360 | And so this transition doesn't just come from kind of switching your editor or from switching
00:01:54.480 | kind of to watching some tutorials about live coding.
00:01:57.360 | So instead, it kind of requires an active investment.
00:02:00.440 | And so I'm going to kind of talk about some of the things we've done in our platform that
00:02:05.880 | enable this human computer interaction with agents and also talk about some of the things
00:02:11.040 | that we've done to enable reliable agents that can actually execute on some of the tasks
00:02:16.040 | that we are talking about.
00:02:18.040 | So I'm going to start with a quick video of what's possible.
00:02:22.360 | This video showcases a quick glimpse of what it's like to trigger an agentic system.
00:02:27.760 | We call them droids from your project management system.
00:02:31.520 | And if you watch in a platform, watch it go from ticket to pull request.
00:02:37.920 | These types of tasks have quickly become table stakes for agentic systems.
00:02:43.240 | These tasks with a clear goal, clear success criteria, they're now quite solvable with AI.
00:02:49.920 | And the agent accomplishes these tasks with a combination of tools, reasoning, search, right,
00:02:56.760 | computer use, and then it can take all of this and bring it into one of your existing platforms,
00:03:01.960 | your source code management, and your GitHub.
00:03:04.360 | But what really is an agentic system?
00:03:08.080 | How does it use tools?
00:03:09.720 | These are all questions that plenty of people have spoken about.
00:03:12.760 | The definition remains quite fuzzy, right?
00:03:15.520 | And so before we go into super deep detail about how you get something like this, going
00:03:20.760 | from a plan all the way to a mergeable code, I'm going to just talk a bit about what we think
00:03:27.320 | an agentic system is, right?
00:03:29.520 | So we found that there's lots of different workflows, some people call them cognitive architectures,
00:03:35.520 | that we've experimented with, plenty of other teams have experimented with.
00:03:39.720 | And it's hard to draw an explicit boundary of what an agent is.
00:03:44.320 | But there are three characteristics that we think are pretty consistent amongst the systems
00:03:48.720 | that most people refer to as being agentic.
00:03:51.720 | That first is planning, right?
00:03:53.480 | Your agentic system needs to make plans that decide one or more future actions in the system.
00:03:59.720 | That can be very simple, like outlining a single kind of edit call and then returning back to the user, where it can be quite complex.
00:04:07.720 | Like saying, we're going to go search through the code base, plan out a couple of different edits, create tickets in our project management system, execute on each of those tickets, and come back with a write-up sentence slap to our manager manager.
00:04:20.720 | Plans can be small, they can be simple, but in addition to that, you also need decision making, right?
00:04:27.720 | And so when you are executing on a plan, there are tons of different data and interactions that your agent needs to be able to make, right?
00:04:37.320 | Look at the existing state and make a call based on that.
00:04:41.320 | A lot of the time, reasoning is referred to as an agent's ability to make these decisions in an intermediate way.
00:04:50.320 | And then finally, agents need environmental grounding, right?
00:04:54.320 | So, agentic systems read and write information to their external environment.
00:05:00.320 | Ideally, they also use this information to react and adapt to changes in that environment.
00:05:06.320 | When I think about agentic systems, most of the challenge in making them reliable exists in the details of the agent itself.
00:05:18.320 | But there's kind of a meta problem that exists, which is no matter how reliable your agent system is, you have to answer the question, where does the human fit in?
00:05:31.320 | We think that we're currently at the point in time in history with the least number of developers that there will ever be.
00:05:38.320 | Humans are here to stay in building software.
00:05:42.320 | But there's an outer loop and there's an inner loop of software development.
00:05:46.320 | In the outer loop, you are reasoning about what needs to get done.
00:05:50.320 | You're working with your colleagues.
00:05:52.320 | You're listening to customers and translating those into requirements.
00:05:55.320 | You're iterating on different architectural decisions.
00:05:59.320 | And in that inner loop, you are writing lines of code.
00:06:02.320 | You're testing that code.
00:06:04.320 | You're building.
00:06:05.320 | You're checking up on that.
00:06:06.320 | You're doing code review.
00:06:07.320 | We think that the inner loop is going away very soon.
00:06:11.320 | And agents will take up the vast majority of that.
00:06:14.320 | And so how do you create an AI/UFs that blends delegation so that you can stay and flow in that outer loop with control when an agent ultimately can't accomplish something because of the technology, because of the current state of where we're at?
00:06:30.320 | You need to be able to steer that with precision, right?
00:06:33.320 | So this is kind of a background thing that I want to reference as we deep dive into those three principles of agenting systems.
00:06:42.320 | So planning.
00:06:43.320 | We like to say a droid is only as good as its plan.
00:06:47.320 | One of the biggest challenges we face is ensuring that a droid creates a good plan and sticks to it.
00:06:55.320 | We were inspired by robotics and control systems in thinking about what are the techniques we can use to improve the ability of the system to make high-quality plans.
00:07:07.320 | If a droid goes off and executes on something and it's totally the wrong idea, a customer's time and money is heavily wasted, right?
00:07:15.320 | And so there's a couple of things that we call out.
00:07:18.320 | Sub-pass decomposition, right?
00:07:20.320 | Plans don't need to be just a high-level do-the-thing, right?
00:07:24.320 | We can break plans down into sub-tasks.
00:07:27.320 | Model predictive control, right?
00:07:29.320 | Your plans should be continuously updated by the environment as it's executed on.
00:07:34.320 | And explicit plan templating, right?
00:07:37.320 | Certain tasks have a certain sort of template or workflow, right, that they can be executed on.
00:07:44.320 | Now, if you're too rigid with your plan templates and how your system wants to do something, you're going to reduce creativity.
00:07:50.320 | But if you know that at the end of the day when you're engaging codes that it's going to create a pull request or a commit, then you can sort of start to reason through what the beginning, intermediate, and end steps of your plan might look like.
00:08:03.320 | Right?
00:08:04.320 | And so we built all these systems into droids.
00:08:07.320 | And I have an example here, a real-world example of a droid that is making a plan.
00:08:12.320 | You'll see it is searching through information, it's reasoning, it is taking that information and breaking it down into a long-form plan that has multiple steps.
00:08:24.320 | One for front-end, one for back-end tests that it's going to do, and how it will actually execute commands to roll this change out with feature flags.
00:08:33.320 | This type of planning is really hard to get right, but when you do get it right, it makes your system far more reliable at actually executing on that end of tasks.
00:08:43.320 | And now this droid will keep going.
00:08:47.320 | The second thing is decision-making, right?
00:08:50.320 | This is probably the hardest thing to control in your agents.
00:08:55.320 | When you're building software, when you are doing tasks across other domains, human beings are making hundreds, thousands of micro decisions around what's named the variable, the scale of the change to make, where should this change go in the codebase, should we imitate the patterns of the codebase, or is the codebase filled with tech debt and we should instead innovate on the codebase, right?
00:09:20.320 | Agents need to be able to assess these trade-offs and take action decisively and correctly in order to be effective.
00:09:29.320 | There's a few factors you can introduce to improve your agent's ability to make decisions, right?
00:09:35.320 | And when you're thinking about how to actually introduce these changes, you kind of have to think what sort of decisions am I going to face and do I have the ability to explicitly select or criteria by which my agent should make these decisions, right?
00:09:58.320 | If you are an agent for travel planning or an agent for code review, right?
00:10:05.320 | You have a very limited set of things that the agent actually will need to decide, right?
00:10:11.320 | You can say, I need to decide on what the price points are.
00:10:14.320 | I need to decide about if this code fits a certain set of standards, right?
00:10:19.320 | If you're more open-ended, like you have an agent that can do a lot of different things or write a bunch of different code paths, there's less explicit decision-making criteria that you can introduce, right?
00:10:32.320 | A lot of this stuff happens in the reasoning layer of models.
00:10:36.320 | So you really have to think about how your agent is instructed and the context that it has about the environment around it.
00:10:43.320 | Factory customers often ask questions like, how do I structure an API for this project, right?
00:10:49.320 | They expect droids to be able to evaluate requirements from the user and from their organization, assess the existing code base, reason through different performance implications, and then ultimately come up with a final decision of what to do.
00:11:03.320 | So this is really tricky, but powerful when done right.
00:11:06.320 | And so if you take some of these decision-making criteria, context in the environment, you bring it together, that is what gives an agent the ability to actually make a proper decision.
00:11:17.320 | And the last thing is environmental grounding, right?
00:11:20.320 | This is the connection your agent has with the real world.
00:11:23.320 | Agents interact through the world in AI computer interfaces, right?
00:11:28.320 | Dedicated tools and context injection from other systems to the agent.
00:11:34.320 | This is actually really different and not so simple as saying, let's take an API or let's just build a simple tool, right?
00:11:41.320 | The entire internet and the last 40 years of computing on top of it has basically existed primarily for human beings.
00:11:50.320 | And so we have to build new AI computer interfaces that let our agents naturally interact with the world.
00:11:58.320 | This is where we spend most of our time in factory.
00:12:01.320 | We found that control over the tools your agent uses is the single most important differentiator in your agent's reliability.
00:12:08.320 | In addition to just the tools themselves, there is also a sense that in order to ground yourself in your environment, you need to properly process information that comes in.
00:12:21.320 | An example of this might be if I take a CLI command and I get 100,000 lines of response from that CLI tool call.
00:12:29.320 | If we just pass that to the agent, it's going to go off the rails, right?
00:12:33.320 | We need to process it.
00:12:34.320 | We need to find out what's important.
00:12:36.320 | We need to take the important information and hand it to the agent and the unimportant information and hide it.
00:12:42.320 | These types of decisions are make or rate for complex systems that interact with huge volumes of data.
00:12:48.320 | So think about how you process information, not just the tools that you're actually using.
00:12:53.320 | And so here's an example of a droid being called a Sentry error.
00:12:58.320 | And it's being told, we need an RCA.
00:13:01.320 | We need to figure out what happened, right?
00:13:03.320 | It's going to search through repositories to find the candidate, search using a couple of different strategies, semantic, law, breadth, APIs, entry, view that relevant information, and access Gitmo PRs that happened around the time of the merged error, and then validate that with some additional searches.