
From Pilot to Platform: Agentic Developer Products with LangGraph



00:00:00.000 | All right. Hello, everyone. Thanks for being here and joining us on this nice Wednesday afternoon.
00:00:18.000 | My name is Matas Ristanis, and this is my colleague.
00:00:21.000 | Hey, folks. I'm Sourabh Shirhatti.
00:00:23.000 | And today we're going to present how we built AI developer tools at Uber using LangGraph.
00:00:29.000 | So to start off a little bit of context, Uber is a massive company serving 33 million trips a day across 15,000 cities.
00:00:36.000 | This is enabled by a massive code base with hundreds of millions of lines of code.
00:00:41.000 | And it is our job, the job of developer platform, to make sure that all that code keeps churning along smoothly.
00:00:48.000 | Now, all you really need to know is that we have 5,000 hard-to-please developers that we have to keep happy.
00:00:55.000 | So that's not so easy for us here.
00:00:58.000 | To accomplish that, we built out a large corpus of dev tools for our engineers.
00:01:03.000 | And today we'll present a few of them to you and some of the key insights that we found out while building them.
00:01:09.000 | So, Saurabh, take us through the agenda.
00:01:13.000 | All right. So we'll dive right in by talking about the 10,000-foot view of the AI developer tool landscape at Uber.
00:01:21.000 | As part of that, we'll highlight a couple of products we've built.
00:01:24.000 | We'll actually show you what the user experience is like.
00:01:26.000 | And then we'll tell you about the reusable tools and agents that power them.
00:01:31.000 | After that, since we can only focus on a couple in depth,
00:01:34.000 | we'll quickly blow through a few more products we've built, just to show you how this has proliferated all through Uber.
00:01:40.000 | And finally, we'll just, you know, tell you what we learned and hopefully there's something reusable there for you.
00:01:45.000 | So let's do it.
00:01:47.000 | Let's do it.
00:01:48.000 | Okay.
00:01:49.000 | So our AI DevTools strategy at Uber is built primarily on three pillars, right?
00:01:54.000 | The first one is the products, or bets, that we've chosen to take.
00:01:58.000 | So we've picked things that directly improve developer workflow.
00:02:02.000 | So these are things that our developers perform today.
00:02:05.000 | It could be writing tests.
00:02:07.000 | Yes, I know it's boring.
00:02:08.000 | It is reviewing code, which can also be laborious.
00:02:11.000 | And we're like, okay, how do we make this better?
00:02:13.000 | How do we make this faster?
00:02:14.000 | How do we eliminate toil for developers?
00:02:17.000 | We've taken a bet on a few areas.
00:02:19.000 | It's based on, you know, where we think we can make the most impact, but we're also always learning.
00:02:24.000 | You know, that's why we're here.
00:02:25.000 | See what everyone else is up to and see what else we can target.
00:02:29.000 | The second pillar of our strategy is that we've got to build the right, what we call, cross-cutting primitives.
00:02:35.000 | These are foundational AI technologies that show up in pretty much all of your solutions; you've probably felt this too.
00:02:42.000 | And having the right abstractions in place, the right frameworks, the right tooling, helps us build more solutions and build them faster.
00:02:52.000 | And lastly, what I'd say is probably the cornerstone of this strategy is what we call intentional tech transfer.
00:02:58.000 | We've taken a bet on a few product areas.
00:03:01.000 | We want to build them.
00:03:02.000 | We want to build them as fast as possible.
00:03:03.000 | But we do stop and be deliberate about, hey, what here is reusable?
00:03:08.000 | What can be spun out into something that reduces the barrier for the next problem we want to solve?
00:03:13.000 | And so LangFX is the opinionated framework we built that wraps LangGraph and LangChain and makes them work better with Uber systems.
00:03:23.000 | And it was born out of a necessity, right?
00:03:26.000 | We had the first couple of products emerge and they wanted to solve problems in an agentic manner.
00:03:32.000 | They wanted to build reusable nodes, and LangGraph was the perfect fit to do it, because we saw it was proliferating across the organization.
00:03:40.000 | We made it available.
00:03:41.000 | We built an opinionated framework around it.
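
LangFX itself isn't public, so the following is purely a speculative sketch of what "an opinionated framework around LangGraph" can look like: a thin factory that compiles any product team's graph with company defaults pre-wired. Every name here is an assumption, not Uber's actual API.

```python
# Speculative sketch only: LangFX is Uber-internal and its API is not
# public. This shows the general shape of an opinionated wrapper that
# bakes org-wide defaults into LangGraph graph compilation.
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph


def compile_with_defaults(builder: StateGraph):
    """Compile any product graph with org-standard defaults pre-wired."""
    # A real internal framework would plug in durable checkpointing,
    # tracing, evaluation hooks, and model routing here; a MemorySaver
    # checkpointer stands in for all of that in this sketch.
    return builder.compile(checkpointer=MemorySaver())
```
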
00:03:44.000 | So, you know, I think that's enough of the overview.
00:03:47.000 | Let's just dive into one of the products.
00:03:48.000 | Matas, can you walk us through Validator?
00:03:50.000 | Yeah, absolutely.
00:03:51.000 | So the first product we'll showcase today is called Validator.
00:03:55.000 | Now, what it is is an IDE experience that automatically flags best-practice violations and security issues in code for engineers.
00:04:03.000 | So it is effectively a LangGraph agent that we built a nice IDE UX around.
00:04:09.000 | And, you know, let's take a look at how it works.
00:04:11.000 | So we have a screenshot here that shows a user opening a Go file.
00:04:18.000 | And what they have there is they're notified of a violation in this case.
00:04:21.000 | So they have a little bit of a diagnostic that they can mouse over.
00:04:24.000 | And they get a nice modal saying, hey, in this case, you're using the incorrect method to create a temporary test file.
00:04:33.000 | You know, this will leak onto the host.
00:04:35.000 | You want it to be automatically cleaned up for you.
00:04:38.000 | So what do you do about it?
00:04:39.000 | What can the user do?
00:04:40.000 | Well, they have multiple choices.
00:04:42.000 | They can apply a pre-computed fix that we have prepared for them in the background.
00:04:46.000 | Or, if they prefer, they can ship off the fix to their IDE agentic assistant.
00:04:51.000 | So that's what we have in the next slide, actually: the fix request has been shipped off,
00:04:55.000 | and we got back a fix from the IDE assistant.
00:04:57.000 | And so the issue is no longer present.
00:04:59.000 | And the user is happy.
00:05:00.000 | The issue is resolved.
00:05:01.000 | They no longer have a code smell.
00:05:03.000 | So that's super.
00:05:04.000 | Now, some of the key insights we found while building this.
00:05:09.000 | The main thing is that the agent abstraction allows us to compose multiple sub-agents
00:05:14.000 | under a central validator agent, for example.
00:05:17.000 | So we have a best-practices sub-agent for Validator that calls into the LLM with a list of practices and returns those points of feedback.
00:05:29.000 | But there's also a deterministic bit where, for example, we want to discover lint issues from static linters.
00:05:35.000 | So there's nothing stopping us from running a lint tool and then passing on the learnings through the rest of the graph, which allows us to pre-compute a fix even for those.
00:05:43.000 | So that's the learning.
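
As a rough, hypothetical sketch of that composition (the state shape, node names, linter choice, and model are all assumptions, not Uber's actual Validator), a deterministic lint node and an LLM best-practices node can fan out from the start of a LangGraph graph and append to a shared findings list:

```python
# Hypothetical sketch of the composition pattern described above; the
# state shape, node names, and helpers are illustrative, not Uber's code.
import operator
import subprocess
from typing import Annotated, TypedDict

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END


class ValidatorState(TypedDict):
    file_path: str
    source: str
    findings: Annotated[list[str], operator.add]  # nodes append findings


def lint_node(state: ValidatorState) -> dict:
    """Deterministic sub-agent: run a static linter, no LLM involved."""
    result = subprocess.run(
        ["golangci-lint", "run", state["file_path"]],  # example linter
        capture_output=True, text=True,
    )
    return {"findings": [l for l in result.stdout.splitlines() if l.strip()]}


def best_practices_node(state: ValidatorState) -> dict:
    """LLM sub-agent: check source against a curated list of practices."""
    llm = ChatOpenAI(model="gpt-4o")  # placeholder model choice
    prompt = (
        "Flag violations of these practices (e.g. use t.TempDir() for "
        "temporary test files):\n\n" + state["source"]
    )
    return {"findings": [llm.invoke(prompt).content]}


builder = StateGraph(ValidatorState)
builder.add_node("lint", lint_node)
builder.add_node("best_practices", best_practices_node)
builder.add_edge(START, "lint")            # both branches run from the
builder.add_edge(START, "best_practices")  # start, in parallel
builder.add_edge("lint", END)
builder.add_edge("best_practices", END)
validator = builder.compile()
```
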
00:05:44.000 | And in terms of impact, you know, we're seeing thousands of fix interactions a day from satisfied engineers who fix their problems in code before they come back to bite them later.
00:05:56.000 | And I think, you know, we think we've built a compelling experience here, right?
00:05:59.000 | We've met developers where they are in the IDE.
00:06:02.000 | We have tooling that runs in the background.
00:06:04.000 | It can combine, you know, deterministic capabilities like we use AST parsing tools.
00:06:09.000 | We find out where each of the test boundaries lie.
00:06:12.000 | We're able to evaluate each one of these against a set of curated best practices, flag up violations, figure out the most expressive way to deliver this back to the user, show it in the IDE, and give them a way of applying fixes.
00:06:27.000 | But we thought, why stop there?
00:06:30.000 | For sure.
00:06:31.000 | So why stop at validating?
00:06:33.000 | Let's help engineers by authoring their tests from the get-go.
00:06:37.000 | Now, you know, the second tool we're showing off here is called AutoCover.
00:06:40.000 | And it is a tool to help engineers generate building, passing, coverage-raising, business-case-testing, validated, and mutation-tested tests.
00:06:51.000 | So, like, really high-quality tests is what we're shooting for here.
00:06:54.000 | And the intent is to save the engineer time.
00:06:56.000 | So they're developing code.
00:06:57.000 | They want to get their tests quickly and move on to the next business feature that they want to implement.
00:07:00.000 | So the way we got to this is actually we took a bunch of domain expert agents.
00:07:06.000 | We actually threw Validator in there as well, and more on that later.
00:07:09.000 | And then we arrive at a test generation tool.
00:07:12.000 | So let's take a look at how it works.
00:07:14.000 | We have a screenshot of, you know, a Go source file, as an example.
00:07:19.000 | And the user can, you know, invoke AutoCover in multiple ways.
00:07:22.000 | If they want to invoke it for the whole file and sort of bulk-generate, they can do a right-click, as shown in the screenshot, and then just invoke it.
00:07:28.000 | And then once the user clicks the button, what happens next is a whole bunch of stuff happens in the background.
00:07:33.000 | So we start with adding a new target to the build system.
00:07:36.000 | We, you know, we set up a test file.
00:07:39.000 | We run an initial coverage check to get a sort of a target space for us to operate on.
00:07:43.000 | All while that is being done, we also analyze the surrounding source to get the business context out, so that we know what to test against.
00:07:52.000 | And what the user really sees is that they get switched to a test file, empty in this case.
00:07:56.000 | It can also be pre-populated.
00:07:57.000 | And then because we did all that stuff in the background, we're starting to already generate tests.
00:08:02.000 | And what the user will see is, they'll see a stream of tests come in.
00:08:06.000 | And the file will be in constant flux.
00:08:09.000 | There will be tests coming in at fast speed.
00:08:11.000 | We'll do a build.
00:08:12.000 | This test didn't pass.
00:08:13.000 | We'll take it out.
00:08:14.000 | Some tests might get merged.
00:08:16.000 | Some tests might get removed because they're redundant.
00:08:18.000 | You might see benchmarks or, like, concurrency tests come in later.
00:08:22.000 | And so, you know, the user is sort of watching this experience.
00:08:26.000 | And then at the end, they arrive at a nice set of validated, vetted tests.
00:08:30.000 | That's what we want.
00:08:31.000 | That's the magic we want for our users here.
00:08:34.000 | Yeah.
00:08:35.000 | And that's what we want.
00:08:36.000 | Let's dive a bit deeper into the graph here to see how it actually functions.
00:08:40.000 | So here's the graph.
00:08:41.000 | On the bottom right, you can actually see Validator, which is the same agent that we just talked
00:08:47.000 | about previously.
00:08:48.000 | So you can already see some of the composability learnings that we found useful.
00:08:53.000 | So how do we arrive at this graph?
00:08:56.000 | We looked at the sort of heuristics that an engineer would use while writing tests.
00:09:02.000 | And so, for example, you want to prepare your test environment.
00:09:05.000 | You want to think about which business cases to test.
00:09:08.000 | That's the job of the scaffolder.
00:09:10.000 | And then you want to think up new test cases, whether it be for extending existing tests
00:09:15.000 | or just writing new tests altogether.
00:09:17.000 | That's the job of the generator.
00:09:18.000 | And then you want to run your builds, your tests.
00:09:20.000 | And then if those are passing, you want to run a coverage check to see what you missed.
00:09:24.000 | That's the job of the executor.
00:09:25.000 | And so we go on to complete the graph this way.
00:09:29.000 | And then because we no longer have a human involved, we can actually supercharge the graph
00:09:33.000 | and sort of juice it up so that we can do 100 iterations of code generation at the same
00:09:38.000 | time, and then 100 executions at the same time.
00:09:41.000 | We've seen, you know, for a sufficiently large source file, you can do that.
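
A hedged sketch of that scaffolder / generator / executor shape follows. The helper functions are stubs and the state schema is an assumption, not Uber's AutoCover; it uses LangGraph's Send API for the parallel fan-out over coverage gaps:

```python
# Illustrative sketch, not Uber's implementation: the helpers are stubs
# and the state schema is assumed. It shows the scaffolder -> generator
# -> executor shape with a Send-based parallel fan-out.
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send


def find_coverage_gaps(source_file: str) -> list[str]:
    return ["gap-1", "gap-2"]  # stub: initial coverage check goes here


def write_test_for(gap: str) -> str:
    return f"func TestFor_{gap}..."  # stub: the LLM call goes here


def builds_and_passes(test: str) -> bool:
    return True  # stub: the build-system invocation goes here


class CoverState(TypedDict):
    source_file: str
    uncovered: list[str]
    candidates: Annotated[list[str], operator.add]  # parallel appends
    validated: list[str]


def scaffolder(state: CoverState) -> dict:
    # Set up the test target/file, then compute the target space to cover.
    return {"uncovered": find_coverage_gaps(state["source_file"])}


def fan_out(state: CoverState) -> list[Send]:
    # One generator run per coverage gap, all executed concurrently.
    return [Send("generator", {"gap": g}) for g in state["uncovered"]]


def generator(payload: dict) -> dict:
    return {"candidates": [write_test_for(payload["gap"])]}


def executor(state: CoverState) -> dict:
    # Build and run candidates; keep only tests that build and pass.
    return {"validated": [t for t in state["candidates"] if builds_and_passes(t)]}


builder = StateGraph(CoverState)
builder.add_node("scaffolder", scaffolder)
builder.add_node("generator", generator)
builder.add_node("executor", executor)
builder.add_edge(START, "scaffolder")
builder.add_conditional_edges("scaffolder", fan_out, ["generator"])
builder.add_edge("generator", "executor")
builder.add_edge("executor", END)
autocover = builder.compile()
```
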
00:09:44.000 | And that's sort of where our key learning comes in: we found that having these super-capable
00:09:50.000 | domain expert agents gives us unparalleled performance, sort of exceptional performance
00:09:55.000 | compared to other agentic coding tools.
00:09:57.000 | So we benchmarked it against, you know, the industry agentic coding tools that are available
00:10:01.000 | for test generation.
00:10:02.000 | And we get about two to three times more coverage in about half the time compared to them
00:10:08.000 | because of the speed-ups that we did in creating this graph here and sort of the custom
00:10:13.000 | bespoke knowledge that we built into our agents.
00:10:16.000 | And in terms of impact, the tool has helped raise developer platform coverage
00:10:23.000 | by about 10%.
00:10:24.000 | So that maps to about 21,000 dev hours saved, which we're super happy about.
00:10:28.000 | And we're seeing continued use of thousands of tests generated monthly.
00:10:31.000 | So, yeah, we're very happy about that.
00:10:34.000 | So, take us through some more products.
00:10:36.000 | Yeah, so we didn't want to stop at 5,000 tests a week.
00:10:39.000 | Like, we've built these primitives, right?
00:10:40.000 | Just wanted to give you a sneak peek of what else we've been able to do in the organization with this.
00:10:45.000 | So what you see on screen right now is our Uber Assistant Builder.
00:10:48.000 | Think of it like our internal custom GPT store where you can build chatbots that are, you know, steeped in Uber knowledge.
00:10:54.000 | So, like, one of them you see on the screen is the Security Scorebot.
00:10:57.000 | And it has access to some of the same tools that we showcased earlier.
00:11:01.000 | So it's steeped in Uber's best practices.
00:11:03.000 | It can detect security anti-patterns.
00:11:05.000 | So even before I get to the point of I'm in my IDE writing code, I can ask questions about architecture and figure out whether my implementation is secure or not.
00:11:14.000 | Right?
00:11:15.000 | Same primitives,
00:11:16.000 | powering a different experience.
00:11:17.000 | So, next up we have Picasso.
00:11:19.000 | Picasso is our internal workflow management platform.
00:11:23.000 | And we built a conversational AI on top of that.
00:11:25.000 | We call it Genie.
00:11:27.000 | It understands workflow automation.
00:11:28.000 | It understands the source of truth.
00:11:30.000 | And it can give you feedback grounded in product truth, like, aware of what the product does.
00:11:36.000 | Third thing I want to show you, and this is not an exhaustive list, right, is our tool called uReview.
00:11:43.000 | Obviously, we built stuff in the IDE.
00:11:45.000 | We tried to flag anti-patterns earlier in the process.
00:11:48.000 | But sometimes things still slip through the cracks.
00:11:51.000 | You know, why not reinforce that and make sure quality is enforced before code gets landed, before your PR gets merged?
00:11:58.000 | So, again, powered by some of the same tools that you saw earlier, the ones that power Validator and the test generator,
00:12:04.000 | we're able to flag both code review comments and code suggestions that developers can apply during review time.
00:12:12.000 | I think with that, we'll just jump over to the learnings.
00:12:15.000 | Yeah, sounds good.
00:12:16.000 | So, in terms of the learnings, we already sort of talked about this.
00:12:20.000 | But we found that building domain expeditions that are super capable are actually the way to go to get outsized results.
00:12:27.000 | So, they use context better.
00:12:29.000 | You can encode things in rich state.
00:12:31.000 | They hallucinate less.
00:12:32.000 | And then, you know, the outgoing result is much better.
00:12:35.000 | So, an example that I already talked about is the executor agent.
00:12:39.000 | So, we're able to finagle our build system to allow us to execute 100 tests on the same test file without colliding,
00:12:46.000 | and then also get separate coverage reports.
00:12:48.000 | That's an example of a domain expert that's super capable and gives us that performance that we want.
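
To make the "no collisions, separate coverage reports" idea concrete, here's an illustrative sketch. The sandboxing approach and the go test flags are assumptions about the general technique, not Uber's build-system integration: each candidate test variant runs in its own copy of the package with its own coverage profile.

```python
# Illustrative only: runs N candidate test variants concurrently, each
# in its own sandbox copy of the package with its own coverage profile,
# so executions never collide. Not Uber's build-system integration.
import asyncio
import shutil
import tempfile
from pathlib import Path


async def run_variant(pkg_dir: str, test_code: str, idx: int) -> Path | None:
    # Isolated working copy per variant prevents file collisions.
    sandbox = Path(tempfile.mkdtemp(prefix=f"variant{idx}-"))
    shutil.copytree(pkg_dir, sandbox / "pkg")
    (sandbox / "pkg" / f"variant_{idx}_test.go").write_text(test_code)
    profile = sandbox / "coverage.out"
    proc = await asyncio.create_subprocess_exec(
        "go", "test", f"-coverprofile={profile}", "./...",
        cwd=sandbox / "pkg",
    )
    await proc.wait()
    # A separate coverage report per passing variant.
    return profile if proc.returncode == 0 else None


async def run_all(pkg_dir: str, variants: list[str]) -> list[Path]:
    profiles = await asyncio.gather(
        *(run_variant(pkg_dir, code, i) for i, code in enumerate(variants))
    )
    return [p for p in profiles if p is not None]
```
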
00:12:53.000 | Secondly, we found that, when possible, composing agents with deterministic sub-agents, or just making the whole agent deterministic,
00:13:01.000 | makes a lot of sense if you can solve the problem in a deterministic way.
00:13:05.000 | So, you know, one example of that was the lint agent under Validator.
00:13:09.000 | We want to have reliable output, and if we have a deterministic tool that can give us that intelligence,
00:13:16.000 | we don't need to rely on an LLM.
00:13:17.000 | We can have that reliable output and pass on the learnings to the rest of the graph and have them fixed.
00:13:23.000 | And then third, we found that we can scale up our dev efforts quite a bit by solving a bounded problem,
00:13:29.000 | by creating an agent, and then reusing it in multiple applications.
00:13:32.000 | So you already saw it with Validator, the standalone experience, and Validator within AutoCover for test-generation validation.
00:13:38.000 | But I'm going to give you one more lower-level example, and that's the build system agent.
00:13:45.000 | That's actually used by both of those products.
00:13:50.000 | That's an even lower-level abstraction that is required for us to be able to, you know,
00:13:55.000 | have the agents execute builds and execute tests in our build system.
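
As a small, assumed sketch of that reuse pattern (the node name, state shape, and the bazel call are illustrative, not Uber's build system agent), the same build-system node function can be registered in more than one product graph:

```python
# Hypothetical sketch of the reuse pattern: one low-level build-system
# node, defined once, registered in multiple product graphs.
import subprocess
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class BuildState(TypedDict):
    target: str
    build_ok: bool
    build_log: str


def build_system_node(state: BuildState) -> dict:
    """Shared low-level agent: run a build for the given target."""
    proc = subprocess.run(
        ["bazel", "build", state["target"]],  # example build tool
        capture_output=True, text=True,
    )
    return {"build_ok": proc.returncode == 0, "build_log": proc.stderr}


# Both a Validator-style and an AutoCover-style graph register the same
# node; each product then adds its own nodes and edges around it.
for _product in ("validator", "autocover"):
    b = StateGraph(BuildState)
    b.add_node("build", build_system_node)
    b.add_edge(START, "build")
    b.add_edge("build", END)
    graph = b.compile()
```
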
00:13:55.000 | So, Sourabh, take us through some of the strategic learnings now.
00:13:58.000 | Yeah.
00:13:59.000 | Matas talked us through some of the tech learnings, but this is the one I'm probably most excited to share.
00:14:04.000 | Like, you can set up your organization for success if you want to build agentic AI.
00:14:09.000 | And I think we've done a pretty good job of it at Uber.
00:14:12.000 | We haven't devolved into an AI arms race.
00:14:14.000 | We're all building in collaboration, and I think these are our biggest takeaways.
00:14:18.000 | The first being just, you know, encapsulation boosts collaboration.
00:14:23.000 | When there are well-thought-out abstractions, like LangGraph, and there are opinions on how to do things like handle state management, how to deal with concurrency,
00:14:34.000 | it really allows us to scale development horizontally.
00:14:37.000 | It lets us tackle more problems and more complex problems without creating this operational bottleneck, right?
00:14:45.000 | An example I'll give you is our security team was able to write rules for Validator, like the product we showcased earlier.
00:14:52.000 | It's able to detect security anti-patterns, but the security team knew nothing about--
00:14:56.000 | Well, this part of the security team knew nothing about AI agents and how the graph was constructed,
00:15:00.000 | but they were still able to add value and improve the lives of our developers.
00:15:04.000 | And so, like, a natural segue from that is if you're able to encapsulate, you know, work into these well-defined nodes,
00:15:12.000 | then, like, graphs are the next thing you think about, right?
00:15:15.000 | Like, graphs help us model these interactions perfectly.
00:15:19.000 | They oftentimes mirror how developers already interact with the system.
00:15:24.000 | So, when we do the classic process engineering and identify process bottlenecks and inefficiencies,
00:15:31.000 | it doesn't just help accelerate or boost the AI workloads.
00:15:34.000 | It also helps improve the experience for people not even interacting with the AI tools, right?
00:15:39.000 | So, it's not, like, an arms race either, a question of should we build agentic systems or should we improve our existing systems.
00:15:45.000 | It usually segues into, like, helping each other.
00:15:48.000 | Like, just, you know, we spoke about our agentic test generation,
00:15:51.000 | and we found multiple inefficiencies, like: how do you do mock generation quickly?
00:15:57.000 | How do you modify build files and, like, interact with the build system?
00:16:02.000 | How do you even execute the tests?
00:16:04.000 | And in the process of, like, fixing all these paper cuts,
00:16:09.000 | we improved the experience for just, like, non-agentic applications,
00:16:13.000 | just for developers interacting directly with our systems.
00:16:16.000 | And it's been hugely beneficial.
00:16:18.000 | And, you know, with that, I want to bring this talk to an end.
00:16:22.000 | We really enjoyed presenting here.
00:16:24.000 | Thank you for the opportunity.
00:16:26.000 | Hopefully, you all learned something and will take something back to your companies.
00:16:30.000 | Thank you.
00:16:31.000 | Thank you.