Ship Production Software in Minutes, Not Months

Hi everybody, my name is Eno. I really appreciate that introduction. And maybe I can start with a bit of background. I started working on LLMs about two and a half years ago when GPT-3.5 was coming out. And it became increasingly clear that agentic systems were going to be possible with the help of LLMs.

At Factory, we believe that the way that we use agents, in particular to build software, is going to radically change the field of software development. We're transitioning from the era of human-driven software development to agent-driven development. You can see glimpses of that today. You guys have already heard a bunch of great talks about different ways that agents can help with coding in particular.

However, it seems like right now we're still trying to find what that interaction pattern, what that future looks like. And a lot of what's publicly available is more or less an incremental improvement. The current zeitgeist is to take tools that were developed 20 years ago for humans to write every individual line of code, tools that were designed first and foremost for human beings, and sprinkle AI on top. And then you keep adding layers of AI. And then at some point, maybe there's some step function change that happens. But there's not a lot of clarity there in exactly what that means.

You know, there's a quote that is attributed to Henry Ford: "If I had asked people what they wanted, they would have said faster horses."

Now, we believe that there are some fundamentally hard problems blocking organizations from accessing the true power of AI. This power can only be found when your team is delegating the majority of their tasks across the software lifecycle to agents.

To do that, you need a platform that has an intuitive interface for managing and delegating tasks, centralized context from across all your engineering tools and data sources, agents that consistently produce reliable, high-quality outputs, and infrastructure that supports thousands of agents working in parallel. These are all hard problems to solve.

But our team has spent the last two years partnering with large organizations to build towards this future. This talk is going to serve as sort of a deep dive into agent-native development, and I'll share some of the lessons that we've learned helping enterprise organizations make that transition.

When Andrej Karpathy said, "English is the new programming language," he captured this very exciting moment. And if you were to judge AI progress based on Twitter, you'd think that you can basically vibe code your way to anything. But vibe coding isn't the way to solve hard problems. You can't vibe code a legacy Java 7 app that runs 5% of the world's global bank transactions.

You need a little bit more software engineering. So agents really should not be thought of as a replacement for human ingenuity. Agents are climbing gear. And building production software is like scaling Mount Everest. And so while better tools have made this climb more accessible, we still need to think about how to leverage them and use our existing expertise in order to drive this transformation.

I want to start with a quick video of what's possible today. And so in this, you'll see a quick glimpse of what it's like to delegate a task to an agentic system. You can watch the droid, as we call them, ingest the task, and start grounding itself in the environment.

It uses tools to search through the code base, determine the git branch, check out what the machine has available to it. It looks through recent changes to the code base. It looks at memories of its recent interactions with users, as well as memories from its interactions across the entire organization.

And then the droid comes back with a plan and says, here's exactly what I'm going to do, but I'd like you to clarify a couple of things. We need to expect our agents to not just take what we say at face value, but instead question it and make us better software developers.

And so after the user comes back with that info, the droid executes on that task. It leverages its tools to write code, runs pre-commit hooks and lints, and ultimately generates a pull request that passes CI. But how can you achieve outcomes like this on a regular basis, right?

"It's nice when it works, but what about when it fails?" At the heart of effective AI-assisted development lies a very fundamental truth. AI tools are only as good as the context that they receive. So much of what people are calling prompt engineering is really mentally modeling this alien intelligence that has a slice of context of the real world.

And if you start thinking about your AI tools this way, you're going to start to get a lot better at interacting with them. We've investigated thousands of droid-assisted development sessions. And you see this sort of heuristic emerge, where AI is most likely failing to solve the problem, not because the LLMs aren't good enough, but because it's missing crucial context that's required to truly solve it.

And better models are going to make this happen less often, but the real solution is not just making the AI smarter. It's going to be getting better at providing these systems with that missing context. LLMs don't know about your morning stand-up. They don't know about the meeting that you had ad hoc and the whiteboard that you did, right?

But you can give those things to the LLM if you transcribe your notes, if you take a photo and you upload it, right? You have to start thinking about these things not as tools, but as something in between a co-worker and a platform, right? And if you can get that context that lies in the cracks between systems, you use platforms that integrate natively with all of your data sources, and you have agents that can actually make use of those things, you can start actually driving this transition to agent-native development.

I want to talk a bit as well about planning and design. When your organization is doing agent-native development, then you are using agents at every stage. Droids don't just write code. They can help with that part, but the hardest thing about software development is not the code. It's about figuring out exactly what to build.

Here you can watch a droid as it's tasked with trying to find the most up-to-date information about a new model release and integrate that into an existing chat application. It's going to leverage internet search, its knowledge of your code base, its understanding of your product goals from its org memory, and its understanding of your technical architecture from the design doc you wrote last week.

Planning with AI is fundamentally different from planning alone. It's not necessarily just asking, "Please build this thing for me, or give me the design doc." But instead, it's about delegating the groundwork and the research to AI agents, then using a collaborative platform to interact and explore possibilities together. That is how you get better at planning with agents.

Now you can see here we have a nice document, a nice plan. You could export that to Notion, Confluence, JIRA, any of your integrations, with no setup. Because MCP is great, but having every developer install a bunch of servers, click a bunch of things, and pass around the API key is not necessarily ideal.

And so platforms are going to evolve and solve a lot of these problems, but in the meantime, you do have droids. And now, a little bit more on this. The real unlock for AI, transforming your organization with respect to planning, is going to be when you start standardizing the way that your organization thinks.

Right? And so there's a bit of an example that we just had a couple of weeks ago while we were planning out a feature related to our cloud development environments. We got a lot of feedback from users, and so we had about three months of user transcripts: people from enterprises, individuals that we knew. We transcribe every single interaction and meeting at Factory.

We take those notes and we combine them with a droid that has access to our architecture. We take an ad hoc meeting that one of our engineers recorded with Granola, if you guys use Granola, I love that tool, and we throw that all to the knowledge droid. And we don't say, "Let's plan the feature out."

We say, could you find any patterns in the customer feedback that map up to our assumptions? Can you highlight any technical constraints with what we have today that might help us make this better? And then we take all of that output, those documents, there's maybe four or five intermediate results here, and that's what we use to start iterating on a final PRD that helps us outline the full feature.

You can take that PRD, and if you have a droid that has access to Linear and JIRA with tools to create tickets, create epics, and modify those things, then that PRD can be turned into a roadmap. Eight tickets, where this ticket is dependent on that ticket, but ultimately work that can be parallelized amongst a group of eight code droids.
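As a rough illustration of that idea, here is a minimal Python sketch of turning ticket dependencies into waves of work that could run in parallel. The ticket IDs, the dependency map, and the `plan_waves` helper are all hypothetical and are not Factory's implementation; the sketch only uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

def plan_waves(dependencies):
    """Group tickets into waves; tickets in the same wave have no
    unfinished dependencies and could be worked on in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unblocking dependents
    return waves

# Hypothetical roadmap: each ticket maps to the tickets it depends on.
roadmap = {
    "T1": set(),
    "T2": set(),
    "T3": {"T1"},
    "T4": {"T1", "T2"},
    "T5": {"T3"},
    "T6": {"T3", "T4"},
    "T7": set(),
    "T8": {"T5", "T6"},
}

waves = plan_waves(roadmap)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

With this made-up roadmap, three tickets can start immediately and the last ticket waits on everything else, so the dependency structure, not the number of droids, sets the minimum number of rounds.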

Right? And so this is how software is going to evolve. We're going to move from executing to orchestrating systems that work on our behalf. I talked about a couple of these: PRDs, eng design docs, RCA templates, quarterly eng and product roadmaps, transcriptions of your meetings. Normally, you might see this stuff as a burden, but when your company is doing agent-native software development, your process and your documentation are a knowledge base and a map for your droids to learn and imitate the way that your team thinks.

This documentation and process is a conversation with both future developers as well as future AI systems. And so if you can communicate the why behind a decision, that context, to those future developers and agents, then you'll start to see a huge lift in their ability to natively work the way that your team actually works.

I want to talk about agent-driven development with respect to site reliability engineering. There is a lot that goes into a real incident response. It would be crazy for me to go up here and say you could actually just automate all of SRE and RCA work today. But there is a difference in the AI agent-driven approach, right?

Here we're watching a droid take a Sentry incident and convert it into a full RCA and mitigation plan. Traditional incident response is effectively solving a puzzle. The pieces are scattered across dozens of systems. Logs in one place, metrics in another, historical context somewhere else. There's knowledge in your team's heads.

Droids in your organization fundamentally change this, right? When an alert triggers, you can pull in context from relevant system logs, past incidents, runbooks in Notion or Confluence, and team discussions from Slack. And you can see that a droid that has the tools and the ability to access this can condense that search effort from hours to minutes.
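To make the "pull in context" step concrete, here is a small Python sketch, under the assumption that each system (logs, runbooks, chat) can be wrapped as a function that takes an alert ID and returns text snippets. The function names, the alert ID, and the stubbed sources are all hypothetical; real integrations would call the actual services.

```python
from typing import Callable, Dict, List

# A source takes an alert identifier and returns relevant text snippets.
Source = Callable[[str], List[str]]

def gather_context(alert_id: str, sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Query every source for the alert and collect results in one bundle.
    A failing source is recorded as unavailable instead of aborting the run."""
    context: Dict[str, List[str]] = {}
    for name, fetch in sources.items():
        try:
            context[name] = fetch(alert_id)
        except Exception as exc:
            context[name] = [f"unavailable: {exc}"]
    return context

# Hypothetical stand-ins for real integrations.
def fetch_logs(alert_id: str) -> List[str]:
    return [f"log lines mentioning {alert_id}"]

def fetch_runbooks(alert_id: str) -> List[str]:
    return ["runbook: restart the worker pool"]

def fetch_chat(alert_id: str) -> List[str]:
    raise TimeoutError("chat search timed out")

incident_context = gather_context(
    "ALERT-1",
    {"logs": fetch_logs, "runbooks": fetch_runbooks, "chat": fetch_chat},
)
for source_name, snippets in incident_context.items():
    print(source_name, snippets)
```

The design choice worth noting is that one slow or broken system should not block the rest of the context from reaching whoever, or whatever, is writing the RCA.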

And so really, the acceptable time to act for a standard enterprise organization, it's really going to be zero, right? The moment that an incident happens, you should have a droid that's telling you exactly what happened, exactly how to fix it. And the thing that gets interesting is when you have user and organization level memory, you really start to build a model of what your team's response patterns and common issues are.

And so it's not just generating runbooks or generating a mitigation for one incident, right, but creating new processes that help solve some of these issues. And once you've written that RCA, right, you can move on to generate runbooks for those new learned patterns and update existing responses.

So, we're seeing teams that are able to cut incident response time in half because context is immediate. They're able to reduce repeat incidents because the third time something happens, the droid starts to say, maybe we should fix this, and they're able to improve team collaboration because when a new engineer joins the team and says, how do we do this, it's already in memory.
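The "third time something happens" heuristic can be sketched in a few lines of Python. This is an assumption-laden toy, not how droid memory actually works: the `IncidentMemory` class, the fingerprint strings, and the threshold of three are all made up for illustration.

```python
from collections import Counter

class IncidentMemory:
    """Counts incidents by fingerprint and flags the ones that keep recurring."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # occurrences before suggesting a permanent fix
        self.counts = Counter()

    def record(self, fingerprint: str) -> bool:
        """Record one occurrence; return True once the incident has
        recurred often enough to suggest fixing the root cause."""
        self.counts[fingerprint] += 1
        return self.counts[fingerprint] >= self.threshold

memory = IncidentMemory()
flags = [memory.record("db-connection-pool-exhausted") for _ in range(3)]
other = memory.record("cache-eviction-storm")
print(flags, other)
```

Here the first two occurrences pass quietly, the third is flagged, and an unrelated incident starts its own count.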

So, most importantly, what we're seeing in general is a shift from reactive to predictive operations because you can now start to really see the patterns across the entire operational history, and agentic systems turn each of these incidents into an opportunity to make the entire system far more reliable. AI agents are not replacing software engineers.

They're significantly amplifying their individual capabilities. The best developers I know are spending far less time in the IDE writing lines of code. It's just not high leverage. They're managing agents that can do multiple things at once, that are capable of organizing the systems, and they're building out patterns that supersede the inner loop of software development, and they're moving to the outer loop of software development.

They aren't worried about agents taking their jobs. They're too busy using the agents to become even better at what they do. The future belongs to developers who understand how to work with agents, not those who hope that AI will just do the work for them. And in that future, the skill that matters most is not technical knowledge or your ability to optimize a specific system, but your ability to think clearly and communicate effectively with both humans and AI.

Now, if you find any of this interesting, and you want to try the droids, I'm happy to share that everyone here at this talk can use this QR code to sign up for an account. Our mobile experience is not optimized yet, but the droids are on that, and so I'd recommend trying this on a laptop, but you will get 20 million free tokens credited to your account.

And I also want to add that, you know, first and foremost, Factory is an enterprise platform, right? And so if you're thinking about security, if you're thinking about where are the audit logs, whose responsibility is it when an agent goes and runs rm -rf on your code base, right?

Droids don't do that. But if one were to, right? Whose responsibility is that? These are the types of questions that we're interested in and that we're helping large organizations solve today. And so if you're a security professional, if you're thinking about ownership, auditability, indemnification if you're a lawyer, right?

These are the types of questions that you should start asking today because YOLO mode is probably not the best thing to be running inside your enterprise, right? And so give it a scan, give it a try, check out some of the controls we have, and if you have any questions, feel free to reach out via email.

Thanks.

Ship Production Software in Minutes, Not Months — Eno Reyes, Factory

Transcript