From Pilot to Platform: Agentic Developer Products with LangGraph

00:00:00.000 |
All right. Hello, everyone. Thanks for being here and joining us on this nice Wednesday afternoon. 00:00:18.000 |
My name is Matas Ristanis, and this is my colleague. 00:00:23.000 |
And today we're going to present how we built AI developer tools at Uber using LangGraph. 00:00:29.000 |
So to start off a little bit of context, Uber is a massive company serving 33 million trips a day across 15,000 cities. 00:00:36.000 |
This is enabled by a massive code base with hundreds of millions of lines of code. 00:00:41.000 |
And it is our job, the job of developer platform, to make sure that code keeps churning smoothly. 00:00:48.000 |
Now, all you really need to know is that we have 5,000 hard-to-please developers that we have to keep happy. 00:00:58.000 |
To accomplish that, we built out a large corpus of dev tools for our engineers. 00:01:03.000 |
And today we'll present a few of them to you and some of the key insights that we found out while building them. 00:01:13.000 |
All right. So we'll dive right in by talking about the 10,000-foot view of the AI developer tools landscape at Uber. 00:01:21.000 |
As part of that, we'll highlight a couple of products we've built. 00:01:24.000 |
We'll actually show you what the user experience is like. 00:01:26.000 |
And then we'll tell you what are the reusable tools and agents that power them. 00:01:31.000 |
After that, you know, we can only focus on a couple. 00:01:34.000 |
But we'll quickly blow through a couple more products we've built, just to show you how this has proliferated all through Uber. 00:01:40.000 |
And finally, we'll just, you know, tell you what we learned and hopefully there's something reusable there for you. 00:01:49.000 |
So our AI DevTools strategy at Uber is built primarily on three pillars, right? 00:01:54.000 |
The first one is these products, or bets, that we have to take. 00:01:58.000 |
So we've picked things that directly improve developer workflow. 00:02:02.000 |
So these are things that our developers perform today. 00:02:08.000 |
It is things like writing tests and reviewing code, which can be laborious. 00:02:11.000 |
And we're like, okay, how do we make this better? 00:02:19.000 |
It's based on, you know, where we think we can make the most impact, but we're also always learning. 00:02:25.000 |
We see what everyone else is up to and see what else we can target. 00:02:29.000 |
The second pillar of our strategy is we've got to build the right, what we call like cross-cutting primitives. 00:02:35.000 |
These are foundational AI technologies that show up in pretty much, you know, all of our solutions; you'll probably feel it too. 00:02:42.000 |
And having the right, you know, abstractions in place, the right frameworks, the right tooling helps us build more solutions and build them faster. 00:02:52.000 |
And lastly, what I'd say is probably the cornerstone of this strategy is what we call intentional tech transfer. 00:03:03.000 |
But we do stop and be deliberate about, hey, what here is reusable? 00:03:08.000 |
What can be spun out into something that reduces the barrier for the next problem we want to solve? 00:03:13.000 |
And so LangFX is the opinionated framework we built that wraps LangGraph and LangChain and makes them work better with Uber systems. 00:03:26.000 |
We had the first couple of products emerge and they wanted to solve problems in an agentic manner. 00:03:32.000 |
They wanted to build reusable nodes, and LangGraph was the perfect fit to do it, because we saw it proliferating across the organization. 00:03:51.000 |
So the first product we'll showcase today is called Validator. 00:03:55.000 |
Now, what it is is an in-IDE experience that automatically flags best-practice violations and security issues in engineers' code. 00:04:03.000 |
So it is effectively a LangGraph agent that we built a nice IDE UX around. 00:04:09.000 |
And, you know, let's take a look at how it works. 00:04:11.000 |
So we have a screenshot here that shows a user opening a Go file. 00:04:18.000 |
And what they have there is they're notified of a violation in this case. 00:04:21.000 |
So they have a little bit of a diagnostic that they can mouse over. 00:04:24.000 |
And they got a nice modal saying, hey, in this case, you're using the incorrect method to create a temporary test file. 00:04:35.000 |
You want to have them automatically cleaned up for you. 00:04:42.000 |
They can apply a pre-computed fix that we have prepared for them in the background. 00:04:46.000 |
Or, if they prefer, they can ship off the fix to their IDE agentic assistant. 00:04:51.000 |
So that's what we have in the next slide, actually: the fix request has been shipped off. 00:05:04.000 |
Here are some of the key ideas that we found while building this. 00:05:09.000 |
The main thing is that the agent abstraction allows us to actually compose multiple sub-agents 00:05:14.000 |
under a central validator agent for now, for example. 00:05:17.000 |
So we have, for instance, a sub-agent under Validator that calls into the LLM with a list of practices and sort of gets those points of feedback returned. 00:05:29.000 |
But there's also a deterministic bit where, for example, we want to discover lint issues from static linters. 00:05:35.000 |
So there's nothing stopping us from running a lint tool and then passing the findings on through the rest of the graph, which allows us to, you know, pre-compute a fix even for those. 00:05:44.000 |
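To make that composition concrete, here is a minimal LangGraph sketch of the pattern: a deterministic lint node and an LLM-backed best-practices node run in parallel, and their findings are joined before a fix pre-computation step. The node names, state fields, and helper stubs are illustrative assumptions, not Uber's internal code.

```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END


class ValidatorState(TypedDict):
    file_path: str
    source: str
    findings: Annotated[list[dict], operator.add]  # merged from all sub-agents
    fixes: list[dict]


def run_linter(path: str) -> list[dict]:
    """Stand-in for a real static-linter invocation (e.g. a subprocess call)."""
    return [{"rule": "lint/unused-var", "line": 12}]


def llm_check(source: str) -> list[dict]:
    """Stand-in for an LLM call that reviews code against curated best practices."""
    return [{"rule": "best-practice/temp-test-file", "line": 7}]


def lint_node(state: ValidatorState) -> dict:
    # Deterministic sub-agent: static analysis, no LLM involved.
    return {"findings": run_linter(state["file_path"])}


def best_practices_node(state: ValidatorState) -> dict:
    # LLM sub-agent: checks the source against the curated practice list.
    return {"findings": llm_check(state["source"])}


def precompute_fix_node(state: ValidatorState) -> dict:
    # Prepare a fix for every finding so the IDE can offer it instantly later.
    return {"fixes": [{**f, "patch": "..."} for f in state["findings"]]}


graph = StateGraph(ValidatorState)
graph.add_node("lint", lint_node)
graph.add_node("best_practices", best_practices_node)
graph.add_node("precompute_fix", precompute_fix_node)
graph.add_edge(START, "lint")
graph.add_edge(START, "best_practices")
# Join: fixes are computed only after both sub-agents have reported their findings.
graph.add_edge(["lint", "best_practices"], "precompute_fix")
graph.add_edge("precompute_fix", END)
validator = graph.compile()

print(validator.invoke({"file_path": "uploader.go", "source": "...", "findings": [], "fixes": []}))
```

The point is that deterministic and LLM-driven sub-agents write into the same state, so everything downstream treats their findings uniformly.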
And in terms of impact, you know, we're seeing thousands of fix interactions a day from satisfied engineers who fix problems in their code before they come back later to bite them. 00:05:56.000 |
And I think, you know, we think we've built a compelling experience here, right? 00:05:59.000 |
We've met developers where they are in the IDE. 00:06:04.000 |
It can combine, you know, deterministic capabilities; for example, we use AST parsing tools. 00:06:09.000 |
We find out where each of the test boundaries lie. 00:06:12.000 |
We're able to evaluate each one of these against a set of curated best practices, flag up violations, figure out the most expressive way to deliver this back to the user, show it in the IDE, and give them a way of applying fixes. 00:06:33.000 |
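Uber's tooling does this over Go with AST parsers; as a rough illustration of the idea only, here is a small Python sketch that walks a syntax tree and records the boundaries of each test function so each one can be evaluated against the curated practices separately.

```python
import ast


def test_boundaries(source: str) -> list[tuple[str, int, int]]:
    """Return (test name, first line, last line) for each test function in a file."""
    tree = ast.parse(source)
    spans = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            spans.append((node.name, node.lineno, node.end_lineno))
    return spans


example = """
def test_upload_succeeds():
    assert upload("a.txt")

def helper():
    pass

def test_upload_rejects_empty_file():
    assert not upload("")
"""
print(test_boundaries(example))
# [('test_upload_succeeds', 2, 3), ('test_upload_rejects_empty_file', 8, 9)]
```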
Let's help engineers by authoring their tests from the get-go. 00:06:37.000 |
Now, you know, the second tool we're showing off here is called AutoCover. 00:06:40.000 |
And it is a tool to help engineers build -- or generate, rather -- tests that build, pass, raise coverage, test business cases, and, you know, are validated and mutation-tested. 00:06:51.000 |
So, like, really high-quality tests is what we're shooting for here. 00:06:57.000 |
They want to get their tests quickly and move on to the next business feature that they want to implement. 00:07:00.000 |
So the way we got to this is actually we took a bunch of domain expert agents. 00:07:06.000 |
We actually threw Validator in there as well, and more on that later. 00:07:09.000 |
And then we arrive at a test generation tool. 00:07:14.000 |
We have a screenshot of, you know, a Go source file, as an example. 00:07:19.000 |
And the user can, you know, invoke AutoCover in multiple ways. 00:07:22.000 |
If they want to invoke it for the whole file and sort of bulk generate, they can do a right-click, as shown in the screenshot, and then just invoke it. 00:07:28.000 |
And then once the user clicks the button, what happens next is a whole bunch of stuff happens in the background. 00:07:33.000 |
So we start with adding a new target to the build system. 00:07:39.000 |
We run an initial coverage check to get a sort of a target space for us to operate on. 00:07:43.000 |
All while that is being done, we also analyze the surrounding source to get the business context out, so that we know what to test against. 00:07:52.000 |
And what the user sees really is just they get switched to an empty test file in this case. 00:07:57.000 |
And then because we did all that stuff in the background, we're starting to already generate tests. 00:08:02.000 |
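As a hedged sketch of that flow, the three prep steps are independent, so they can run concurrently while the editor switches the user to the empty test file; the function names below are illustrative stand-ins, not Uber's APIs.

```python
import asyncio


async def add_build_target(pkg: str) -> None:
    """Register a new test target with the build system (stub)."""
    await asyncio.sleep(0.1)


async def baseline_coverage(pkg: str) -> float:
    """Run an initial coverage check to establish the target space (stub)."""
    await asyncio.sleep(0.2)
    return 0.42


async def extract_business_context(src_path: str) -> str:
    """Analyze the surrounding source for the business behavior to test (stub)."""
    await asyncio.sleep(0.1)
    return "upload handler: validates size, retries on transient errors"


async def prepare(pkg: str, src_path: str):
    # All three prep steps run concurrently, so generation can start almost as
    # soon as the user lands on the empty test file.
    return await asyncio.gather(
        add_build_target(pkg),
        baseline_coverage(pkg),
        extract_business_context(src_path),
    )


print(asyncio.run(prepare("//src/uploader", "uploader.go")))
```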
And what the user will see is, they'll see a stream of tests come in. 00:08:16.000 |
Some tests might get removed because they're redundant. 00:08:18.000 |
You might see benchmark tests, like concurrency tests, come in later. 00:08:22.000 |
And so, you know, the user is sort of watching this experience. 00:08:26.000 |
And then at the end, they arrive at a nice set of validated, vetted tests. 00:08:36.000 |
Let's dive a bit deeper into the graph here to see how it actually functions. 00:08:41.000 |
On the bottom right, you can actually see Validator, which is the same agent that we just talked about. 00:08:48.000 |
So you can already see some of the composability learnings that we found useful. 00:08:56.000 |
We looked at the sort of heuristics that an engineer would use while writing tests. 00:09:02.000 |
And so, for example, you want to prepare your test environment. 00:09:05.000 |
You want to think about which business cases to test. 00:09:10.000 |
And then you want to think up new test cases, whether it be for extending existing tests or creating new ones. 00:09:18.000 |
And then you want to run your builds, your tests. 00:09:20.000 |
And then if those are passing, you want to run a coverage check to see what you missed. 00:09:25.000 |
And so we go on to complete the graph this way. 00:09:29.000 |
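Here is a rough, hypothetical sketch of that loop as a LangGraph graph: prepare the environment, generate candidate tests, build and run them, measure coverage, and loop until a target is hit. The real graph has more nodes (including Validator embedded as a sub-agent), and the node bodies below are stubs.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class AutoCoverState(TypedDict):
    source: str
    tests: str
    coverage: float
    target: float


def prepare_environment(state: AutoCoverState) -> dict:
    # Build target, baseline coverage, business context (stubbed).
    return {"tests": "", "coverage": 0.0}


def generate_tests(state: AutoCoverState) -> dict:
    # Think up new cases, extending existing tests or adding new ones (stubbed).
    return {"tests": state["tests"] + "\n// another generated test"}


def build_and_run(state: AutoCoverState) -> dict:
    # Build and execute the candidate tests; drop the ones that fail (stubbed).
    return {}


def measure_coverage(state: AutoCoverState) -> dict:
    # Re-run coverage to see what is still missed (stubbed).
    return {"coverage": min(state["coverage"] + 0.3, 1.0)}


def done_or_loop(state: AutoCoverState) -> str:
    return "done" if state["coverage"] >= state["target"] else "loop"


g = StateGraph(AutoCoverState)
g.add_node("prepare_environment", prepare_environment)
g.add_node("generate_tests", generate_tests)
g.add_node("build_and_run", build_and_run)
g.add_node("measure_coverage", measure_coverage)
g.add_edge(START, "prepare_environment")
g.add_edge("prepare_environment", "generate_tests")
g.add_edge("generate_tests", "build_and_run")
g.add_edge("build_and_run", "measure_coverage")
g.add_conditional_edges("measure_coverage", done_or_loop, {"loop": "generate_tests", "done": END})
autocover = g.compile()

print(autocover.invoke({"source": "...", "tests": "", "coverage": 0.0, "target": 0.5}))
```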
And then because we no longer have a human involved, we can actually supercharge the graph 00:09:33.000 |
and sort of juice it up so that we can do 100 iterations of code generation at the same 00:09:38.000 |
time, and then 100 executions at the same time. 00:09:41.000 |
We've seen, you know, for a sufficiently large source file, you can do that. 00:09:44.000 |
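One way to express that kind of fan-out in LangGraph is the Send API: a planning node emits one task per candidate case, each task runs as its own parallel branch, and a reducer merges the results back into shared state. This is a minimal illustrative sketch, not the production graph.

```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send


class FanoutState(TypedDict):
    cases: list[str]
    tests: Annotated[list[str], operator.add]  # results from all branches are appended


def plan(state: FanoutState) -> dict:
    # In the real graph this node reasons about the source to pick business cases.
    return {"cases": ["EmptyInput", "Timeout", "HappyPath"]}


def fan_out(state: FanoutState):
    # Launch one generation branch per planned case -- effectively "100 at a time".
    return [Send("generate_one", {"case": c}) for c in state["cases"]]


def generate_one(task: dict) -> dict:
    # Each branch would call the LLM and/or execute a test in isolation (stubbed).
    return {"tests": [f"func Test{task['case']}(t *testing.T) {{ /* ... */ }}"]}


g = StateGraph(FanoutState)
g.add_node("plan", plan)
g.add_node("generate_one", generate_one)
g.add_edge(START, "plan")
g.add_conditional_edges("plan", fan_out, ["generate_one"])
g.add_edge("generate_one", END)
fanout = g.compile()

print(fanout.invoke({"cases": [], "tests": []}))
```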
And that's sort of where our key learning comes in is we found that having these super-capable 00:09:50.000 |
domain expert agents gives us unparalleled performance, sort of exceptional performance. 00:09:57.000 |
So we benchmarked it against, you know, the industry agentic coding tools that are available 00:10:02.000 |
And we get about two to three times more coverage in about half the time compared to them 00:10:08.000 |
because of the speed-ups that we did in creating this graph here and sort of the custom 00:10:13.000 |
bespoke knowledge that we built into our agents. 00:10:16.000 |
And in terms of impact, we have -- the tool has helped raise developer platform coverage 00:10:24.000 |
So that maps to about 21,000 dev hours saved, which we're super happy about. 00:10:28.000 |
And we're seeing continued use of thousands of tests generated monthly. 00:10:36.000 |
Yeah, so we didn't want to stop at 5,000 tests a week. 00:10:40.000 |
Just wanted to give you a sneak peek of what else we've been able to do in the organization with this. 00:10:45.000 |
So what you see on screen right now is our Uber Assistant Builder. 00:10:48.000 |
Think of it like our internal custom GPT store where you can build chatbots that are, you know, steeped in Uber knowledge. 00:10:54.000 |
So, like, one of them you see on the screen is the Security Scorebot. 00:10:57.000 |
And it has access to some of the same tools that we showcased earlier. 00:11:01.000 |
So it knows; it's steeped in Uber's best practices. 00:11:05.000 |
So even before I get to the point of I'm in my IDE writing code, I can ask questions about architecture and figure out whether my implementation is secure or not. 00:11:19.000 |
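As a sketch of the "chatbot with access to the same reusable tools" idea, LangGraph's prebuilt ReAct agent can be handed tool functions directly; the tool bodies and model identifier below are placeholders rather than Uber's internal services.

```python
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


@tool
def check_best_practices(code: str) -> str:
    """Check a code snippet against curated security and style practices."""
    return "no violations found"  # stand-in for the shared Validator tooling


@tool
def search_security_guidelines(query: str) -> str:
    """Look up internal security guidance relevant to the question."""
    return "never hard-code credentials; use the internal secrets service"


# Recent LangGraph versions accept a model id string or a chat-model instance here;
# "openai:gpt-4o" is a placeholder, not the model the internal bots actually use.
scorebot = create_react_agent(
    "openai:gpt-4o",
    tools=[check_best_practices, search_security_guidelines],
)

result = scorebot.invoke(
    {"messages": [("user", "Is it OK to store the API key in a config file?")]}
)
print(result["messages"][-1].content)
```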
Another one is for Picasso, which is our internal workflow management platform. 00:11:30.000 |
And it can give you feedback grounded in product truth, like, aware of what the product does. 00:11:36.000 |
Third thing I want to show you, and this is not an exhaustive list, right, is our tool called uReview. 00:11:45.000 |
We tried to flag anti-patterns earlier in the process. 00:11:48.000 |
But sometimes things still slip through the cracks. 00:11:51.000 |
You know, why not reinforce and make sure quality is enforced before, you know, code gets landed, before your PR gets merged. 00:11:58.000 |
So, again, powered with some of the same tools that you saw earlier that power, like, Validator and Test Generator, 00:12:04.000 |
we're able to flag, you know, both code review comments and code suggestions that developers can apply during review time. 00:12:12.000 |
I think with that, we'll just jump over to the learnings. 00:12:16.000 |
So, in terms of the learnings, we already sort of talked about this. 00:12:20.000 |
But we found that building domain expert agents that are super capable is actually the way to go to get outsized results. 00:12:32.000 |
And then, you know, the end result is much better. 00:12:35.000 |
So, an example that I already talked about is the executor agent. 00:12:39.000 |
So, we were able to finagle our build system to allow us to execute 100 tests on the same test file without colliding. 00:12:48.000 |
That's an example of a domain expert that's super capable and gives us that performance that we want. 00:12:53.000 |
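A hedged illustration of that isolation idea: give each candidate run its own scratch workspace and a bounded concurrency slot, so many executions of the same test file cannot collide. The build invocation below is a placeholder for the real build system.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def run_one(candidate_id: int, test_source: str, sem: asyncio.Semaphore) -> bool:
    async with sem:
        # Each candidate gets its own throwaway workspace, so identical test files
        # never step on each other's build outputs.
        workspace = Path(tempfile.mkdtemp(prefix=f"exec-{candidate_id}-"))
        try:
            (workspace / "candidate_test.go").write_text(test_source)
            # Placeholder for invoking the build system against this workspace.
            proc = await asyncio.create_subprocess_exec("true", cwd=workspace)
            return await proc.wait() == 0
        finally:
            shutil.rmtree(workspace, ignore_errors=True)


async def run_all(candidates: list[str]) -> list[bool]:
    sem = asyncio.Semaphore(100)  # "100 executions at the same time"
    return await asyncio.gather(
        *(run_one(i, src, sem) for i, src in enumerate(candidates))
    )


print(asyncio.run(run_all(["func TestA(t *testing.T) {}"] * 5)))
```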
Secondly, we found that when possible, composing agents with deterministic sub-agents, or just making the whole agent deterministic, 00:13:01.000 |
makes a lot of sense if you can solve the problem in a deterministic way. 00:13:05.000 |
So, you know, one example of that was the lint agent under Validator. 00:13:09.000 |
We want to have reliable output, and if we have a deterministic tool that can give us that intelligence, 00:13:17.000 |
we can have that reliable output, pass the learnings on to the rest of the graph, and have the issues fixed. 00:13:23.000 |
And then third, we found that we can scale up our dev efforts quite a bit by solving a bounded problem, 00:13:29.000 |
by creating an agent, and then reusing it in multiple applications. 00:13:32.000 |
So you already saw it with Validator: the standalone experience, and Validator within AutoCover for test generation validation. 00:13:38.000 |
But I'm going to give you one more lower-level example, and that's the build system agent. 00:13:42.000 |
That's actually used through both of those products. 00:13:45.000 |
That's an even lower-level abstraction that is required for us to be able to, you know, 00:13:50.000 |
have the agents be able to, like, execute builds and, like, execute tests in our build system. 00:13:55.000 |
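In LangGraph terms, that kind of reuse can look like compiling a small build-system graph once and adding it as a node inside each parent graph that needs it; the state fields and stubbed build step below are made up for illustration.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class BuildState(TypedDict):
    target: str
    build_ok: bool


def run_build(state: BuildState) -> dict:
    # Placeholder for driving the real build system: registering targets,
    # executing builds, running tests.
    return {"build_ok": True}


build_graph = StateGraph(BuildState)
build_graph.add_node("run_build", run_build)
build_graph.add_edge(START, "run_build")
build_graph.add_edge("run_build", END)
build_agent = build_graph.compile()  # solve the bounded problem once, reuse everywhere


class ValidatorState(TypedDict):
    target: str
    build_ok: bool
    findings: list[str]


def flag_violations(state: ValidatorState) -> dict:
    return {"findings": [] if state["build_ok"] else ["build is broken"]}


parent = StateGraph(ValidatorState)
parent.add_node("build", build_agent)  # the compiled subgraph is just another node
parent.add_node("flag_violations", flag_violations)
parent.add_edge(START, "build")
parent.add_edge("build", "flag_violations")
parent.add_edge("flag_violations", END)
validator = parent.compile()
# An AutoCover-style parent graph would embed the exact same compiled build_agent node.

print(validator.invoke({"target": "//src/uploader", "build_ok": False, "findings": []}))
```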
So, Sourabh, take us through some of the strategic learnings now. 00:13:59.000 |
Sourabh talked us through some of the tech benefits, but this is the one I'm probably most excited to share. 00:14:04.000 |
Like, you can set up your organization for success if you want to build agentic AI. 00:14:09.000 |
And I think we've done a pretty good job of it at Uber. 00:14:14.000 |
We're all building in collaboration, and I think these are our biggest takeaways. 00:14:18.000 |
The first being just, you know, that encapsulation boosts collaboration. 00:14:23.000 |
When there are well-thought-out abstractions, like LangGraph, and there are opinions on how to do things like handle state management, how to deal with concurrency, 00:14:34.000 |
it really allows us to scale development horizontally. 00:14:37.000 |
It lets us tackle more problems and more complex problems without creating this operational bottleneck, right? 00:14:45.000 |
An example I'll give you is our security team was able to write rules for Validator, like the product we showcased earlier. 00:14:52.000 |
It's able to detect security anti-patterns, but the security team knew nothing about-- 00:14:56.000 |
Well, this part of the security team knew nothing about AI agents and how the graph was constructed, 00:15:00.000 |
but they were still able to add value and improve the lives of our developers. 00:15:04.000 |
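One way to picture that encapsulation: contributed rules are plain data, and the node that renders them into the LLM prompt never has to change when a new team adds rules. The rules and rendering below are invented for illustration.

```python
RULES = [
    {
        "id": "sec-001",
        "owner": "security",
        "description": "Never log raw authentication tokens.",
    },
    {
        "id": "test-007",
        "owner": "devplatform",
        "description": "Create temporary test files with helpers that clean up automatically.",
    },
]


def build_prompt(source: str, rules: list[dict]) -> str:
    """Render contributed rules into the prompt used by the best-practices node."""
    rendered = "\n".join(f"- [{r['id']}] {r['description']}" for r in rules)
    return (
        "Review the following code against these practices and report violations "
        f"by rule id:\n{rendered}\n\nCode:\n{source}"
    )


print(build_prompt('log.Printf("token=%s", token)', RULES))
```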
And so, like, a natural segue from that is if you're able to encapsulate, you know, work into these well-defined nodes, 00:15:12.000 |
then, like, graphs are the next thing you think about, right? 00:15:15.000 |
Like, graphs help us model these interactions perfectly. 00:15:19.000 |
They oftentimes mirror how developers already interact with the system. 00:15:24.000 |
So, when we do the classic process engineering and identify process bottlenecks and inefficiencies, 00:15:31.000 |
it doesn't just help accelerate or boost the AI workloads. 00:15:34.000 |
It also helps improve the experience for people not even interacting with the AI tools, right? 00:15:39.000 |
So, it's not, like, an arms race either, of should we build agentic systems or should we improve our existing systems? 00:15:45.000 |
It usually segues into, like, helping each other. 00:15:48.000 |
Like, just, you know, we spoke about our agentic test generation, 00:15:51.000 |
and we found multiple inefficiencies through, like, how do you do mock generation quickly? 00:15:57.000 |
How do you modify build files, invoke, like, interact with the build system? 00:16:04.000 |
And in the process of, like, fixing all these paper cuts, 00:16:09.000 |
we improved the experience for just, like, non-agentic applications, 00:16:13.000 |
just for developers interacting directly with our systems. 00:16:18.000 |
And, you know, with that, I want to bring this talk to an end. 00:16:26.000 |
Hopefully, you all learned something and you'll take something back to your companies.