LangChain Interrupt 2025 Uber Agentic Developer Products with LangGraph – Sourabh Shirhatti

00:00:00.000 |
The most important thing for us is always learning; that's why we're here, to see what everyone else is up to and see what else we can adopt. 00:00:12.000 |
And another pillar of our strategy is that we don't want to keep rebuilding the same things; we call them cross-cutting primitives. 00:00:19.000 |
There's a lot of common functionality that underlies all of these solutions, and having the right abstractions in place, the right frameworks and tooling, helps us build more solutions and build them faster. 00:00:36.000 |
And lastly, what I was going to say is, probably the cornerstone of this strategy is what we call intentional platformization. 00:00:42.000 |
We've taken a bet on a few product areas; we want to build them, and we want to build them as fast as possible, but we do stop and get deliberate about, hey, what here is reusable? 00:00:52.000 |
What can be spun out into something that provides value for the next problem we want to solve? 00:00:58.000 |
And so, LangFX is the integration framework we built on top of LangGraph and LangChain that makes them work better with Uber's systems. 00:01:10.000 |
We had the first couple of products emerge, and they all wanted this. 00:01:14.000 |
The teams behind them expanded on it; they wanted to build new agentic systems, and LangGraph was the right thing to do it with. 00:01:21.000 |
Because we saw this proliferating through the organization, we made it available, and we built a framework around it. 00:01:28.000 |
So, you know, I think that's enough of the overview; let's dive into the products. 00:01:35.000 |
So, the first product showcase is Validator. 00:01:38.000 |
Now, what it is, is an IDE-integrated experience that flags best-practice violations and security issues for engineers, in code, automatically. 00:01:47.000 |
So, it is effectively a LangGraph agent that plugs into the IDE. 00:01:53.000 |
And, you know, let's take a look at how it works. 00:01:55.000 |
So, we have a demo here that shows a user opening a file in the IDE. 00:02:01.000 |
And what happens is, they're notified of a violation, in this case. 00:02:05.000 |
So, they have a little squiggle that they can mouse over. 00:02:08.000 |
And they get a nice tooltip saying, "Hey, in this case, you're using the incorrect method to create a temporary test file. 00:02:16.000 |
You know, this will leak without cleanup, and you want to have it automatically cleaned up for you." 00:02:25.000 |
They can apply a pre-computed fix that was generated for them in the background. 00:02:30.000 |
Or, if they choose so, they can ship the fix off to their IDE's agentic system. 00:02:34.000 |
So, that's what we're showing on the next slide, actually. 00:02:37.000 |
The fix prompt gets shipped out, and we loop back with the fix from the IDE. 00:02:41.000 |
So, the issue is no longer present, and the user can see that the issue is resolved. 00:02:49.000 |
Here are some of the key learnings we found while building this. 00:02:53.000 |
The main thing is that the agent abstraction allows us to compose multiple sub-agents under a central validator, for example. 00:03:01.000 |
So, we have, you know, a best-practices sub-agent under Validator that calls into a list of curated practices and gets those pieces of feedback. 00:03:13.000 |
But there's also an interesting fit where, for example, we want to discover lint issues deterministically. 00:03:19.000 |
There's nothing stopping us from just running a lint tool and then passing its results on to the rest of the graph. 00:03:28.000 |
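A minimal sketch of what that composition could look like in LangGraph. The state shape, node names, and stubbed helpers here are illustrative assumptions, not Uber's actual code:

```python
# Hypothetical sketch: a central validator composing a deterministic lint
# node with an LLM-backed best-practices sub-agent (all names illustrative).
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END

class ValidatorState(TypedDict):
    source: str                                   # file under review
    findings: Annotated[list[str], operator.add]  # merged across sub-agents

def run_linter(state: ValidatorState) -> dict:
    # Deterministic node: a real implementation would shell out to a linter.
    issues = ["temp file is never cleaned up"] if "TempFile" in state["source"] else []
    return {"findings": [f"lint: {i}" for i in issues]}

def check_best_practices(state: ValidatorState) -> dict:
    # LLM sub-agent: a real implementation would call a model with the
    # curated best-practice list; stubbed here for brevity.
    return {"findings": ["prefer an API that cleans up temp files automatically"]}

builder = StateGraph(ValidatorState)
builder.add_node("linter", run_linter)
builder.add_node("best_practices", check_best_practices)
builder.add_node("report", lambda state: {})  # deliver findings to the IDE
builder.add_edge(START, "linter")             # both branches run in parallel
builder.add_edge(START, "best_practices")
builder.add_edge("linter", "report")
builder.add_edge("best_practices", "report")
builder.add_edge("report", END)
validator = builder.compile()

print(validator.invoke({"source": "f, _ := ioutil.TempFile(...)", "findings": []}))
```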
And, in terms of impact, you know, we've seen thousands of fix interactions to date from engineers who fix problems in their code before those problems come back to find them later. 00:03:39.000 |
And, I think, you know, we've built a compelling experience today. 00:03:43.000 |
Like, we met developers where they are, in the IDE. 00:03:48.000 |
We combined, you know, agentic capabilities with deterministic tooling. 00:03:56.000 |
We're able to evaluate issues against a set of curated best practices and flag violations. 00:04:02.000 |
And, importantly, deliver this back to the user in the most expressive way. 00:04:17.000 |
Next, let's help engineers by writing their tests from the get-go. 00:04:20.000 |
Now, you know, the second thing we're showing you up here is called AutoCover. 00:04:24.000 |
And it is a tool to help engineers build, or generate, rather, 00:04:27.000 |
building, passing, coverage-raising, business-case-testing, 00:04:32.000 |
implementation-validating tests. 00:04:35.000 |
So, like, really high-quality tests is what we're going for here. 00:04:37.000 |
And, the intent is to save the engineer time. 00:04:40.000 |
And, you want to get there quickly and move on to the next business feature that you want to ship. 00:04:44.000 |
So, the way we go about doing this is, we actually composed a bunch of domain expert agents. 00:04:50.000 |
We actually threw Validator in there as well; more on that later. 00:04:58.000 |
We have a screenshot of, you know, a source file, as an example. 00:05:02.000 |
And, the user can, you know, invoke it in a lot of other ways. 00:05:06.000 |
If they want coverage for the whole file and want to bulk-generate, they can do a right-click, 00:05:09.000 |
as shown in the screenshot, and just invoke it. 00:05:14.000 |
What happens next is that a whole bunch of stuff kicks off in the background. 00:05:17.000 |
So, we start with adding a new test target to the build system. 00:05:22.000 |
We run an initial coverage check to get a sort of target space for us to operate on. 00:05:27.000 |
All while that is being done, we also analyze the surrounding source to pull the business cases out. 00:05:36.000 |
And what the user sees, really, is just that they get switched to an empty test file. 00:05:41.000 |
And then, because we did all that stuff in the background, we're starting to already generate tests. 00:05:46.000 |
And what the user will see is a stream of tests coming in. 00:05:52.000 |
There will be tests coming in at a fast speed. 00:05:59.000 |
Some tests might get removed because they're redundant. 00:06:02.000 |
You might see benchmark or, like, concurrency tests come in later. 00:06:06.000 |
And so, you know, the user is sort of watching this experience. 00:06:10.000 |
And then, at the end, they're left with a nice set of validated, passing tests. 00:06:20.000 |
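As a rough illustration of that experience, here is what the consuming side of a streaming graph can look like; `autocover` is assumed to be a compiled LangGraph graph with a node named "generate", and the input shape and editor hook are invented for the example:

```python
# Hypothetical sketch: stream node outputs as they land so tests appear
# in the editor incrementally rather than after the whole run finishes.
def render_tests_in_editor(tests: list[str]) -> None:
    for t in tests:
        print("new test:", t)  # stub for the real editor integration

for update in autocover.stream(
    {"source_file": "service.go"},  # illustrative input shape
    stream_mode="updates",          # yield each node's output when it finishes
):
    for node_name, output in update.items():
        if node_name == "generate":
            render_tests_in_editor(output["tests"])
```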
Let's dive a bit deeper into the graph here to see how it actually functions. 00:06:25.000 |
Now, on the bottom right, you can actually see Validator there, which is the same agent that we showed you earlier. 00:06:31.000 |
So, you can already see some of the composability learnings that we found useful. 00:06:41.000 |
We look at the sort of heuristics that an engineer would use while writing tests. 00:06:46.000 |
And, so for example, you want to prepare a test environment. 00:06:49.000 |
You want to think about which business cases to test, 00:06:56.000 |
whether that be extending existing tests or just writing new tests altogether. 00:07:01.000 |
And then you want to run your builds, your tests. 00:07:03.000 |
And then if, you know, those are passing, you want to run a coverage check to see what's still missing. 00:07:09.000 |
And so, we go on to, you know, complete the graph this way. 00:07:13.000 |
And then, because we no longer have anyone involved, we can actually supercharge the graph. 00:07:17.000 |
Sort of juice it up so that we can do a hundred iterations of code generation at the same time. 00:07:22.000 |
And another hundred executions at the same time. 00:07:24.000 |
We've seen, you know, for a sufficiently large source file, you can do that. 00:07:28.000 |
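A minimal sketch of that kind of fan-out using LangGraph's Send API; the state shape and the stubbed generation step are illustrative assumptions, not Uber's actual graph:

```python
# Hypothetical sketch: with no human in the loop, one planning node can
# dispatch many generate-and-run iterations that execute concurrently.
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.types import Send

class CoverState(TypedDict):
    cases: list[str]                           # business cases to cover
    tests: Annotated[list[str], operator.add]  # results merged as they land

def plan(state: CoverState) -> dict:
    return {}  # a real planner would derive the cases from source analysis

def fan_out(state: CoverState):
    # One Send per case; LangGraph runs these branches in parallel.
    return [Send("generate_and_run", {"case": c}) for c in state["cases"]]

def generate_and_run(payload: dict) -> dict:
    # Stub: a real node would generate a test, build it, and execute it.
    return {"tests": [f"Test_{payload['case']}"]}

builder = StateGraph(CoverState)
builder.add_node("plan", plan)
builder.add_node("generate_and_run", generate_and_run)
builder.add_edge(START, "plan")
builder.add_conditional_edges("plan", fan_out, ["generate_and_run"])
builder.add_edge("generate_and_run", END)
autocover_fanout = builder.compile()

print(autocover_fanout.invoke({"cases": ["happy_path", "timeout", "retry"], "tests": []}))
```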
And that's sort of where our key learning comes in. 00:07:30.000 |
We found that having these super capable domain expert agents gives us unparalleled performance. 00:07:37.000 |
Sort of exceptional performance compared to other agentic coding tools. 00:07:40.000 |
We benchmarked against the industry agentic coding tools that are available for test generation. 00:07:45.000 |
And we get about two to three times more coverage in about half the time compared to them, 00:07:51.000 |
because of the speed-ups that we built in when creating this graph, 00:07:55.000 |
and sort of the custom, bespoke knowledge that we built into our agents. 00:07:59.000 |
And in terms of impact, this tool has helped raise coverage across the whole developer platform. 00:08:08.000 |
So that maps to about 21,000 developer hours saved, which we're super happy about. 00:08:11.000 |
And we're seeing continued use, with thousands of tests generated monthly. 00:08:18.000 |
Sourabh, take us through some more products. 00:08:21.000 |
Yeah, so we didn't want to stop there, because, I mean, we built these primitives, right? 00:08:24.000 |
We're going to give you a sneak peek of what else we've been able to do when we build with these. 00:08:28.000 |
So what you see on the screen right now is our Uber assistant builder. 00:08:31.000 |
Think of it like our internal custom GPT store where you can build chatbots that are, you know, steeped in Uber knowledge. 00:08:38.000 |
So, like, one of them you see on the screen is the security scorecard bot. 00:08:41.000 |
And it has access to some of the same tools that we showed you. 00:08:45.000 |
You know, it's aware of Uber's best practices. 00:08:50.000 |
So even before I get to the point where I'm writing code in my IDE, I can ask questions about architecture, 00:08:55.000 |
figure out whether my implementation is secure or not, right? 00:08:58.000 |
Same primitives powering a different experience. 00:09:03.000 |
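As a rough sketch of how the same tooling can power a chat experience, here is a hypothetical assistant built with LangGraph's prebuilt ReAct agent (recent versions); the tool body, model choice, and prompt are assumptions for illustration, not Uber's code:

```python
# Hypothetical sketch: expose the shared best-practices tooling to a chatbot.
from langchain.chat_models import init_chat_model
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def check_best_practices(code: str) -> str:
    """Evaluate a code snippet against curated security best practices."""
    return "no violations found"  # stub for the shared Validator tooling

assistant = create_react_agent(
    init_chat_model("openai:gpt-4o"),  # any tool-calling chat model
    tools=[check_best_practices],
    prompt="You are a security advisor grounded in internal best practices.",
)
reply = assistant.invoke(
    {"messages": [("user", "Is it safe to log request bodies here?")]}
)
```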
Picasso is our internal workflow management platform. 00:09:06.000 |
And we built a conversational AI experience on top of it as well. 00:09:14.000 |
And it can give you feedback grounded in the product; it's, like, aware of what the product does. 00:09:20.000 |
The other thing I want to show you, and this is not an exhaustive list, right, is our code review experience. 00:09:29.000 |
We try and flag the violations earlier in the process. 00:09:35.000 |
You know, why not reinforce the same best practices before, you know, 00:09:39.000 |
your code gets landed, before your PR gets merged? 00:09:42.000 |
So, again, powered by some of the same tools that you saw earlier, like Validator, 00:09:48.000 |
we're able to flag, you know, both review comments and suggestions that developers can apply. 00:09:56.000 |
I think with that, we'll just jump over to the learnings. 00:10:00.000 |
So, in terms of the learnings, we already sort of talked about this, 00:10:04.000 |
but we found that building super-capable domain expert agents 00:10:07.000 |
is actually the way to go to get outsized results. 00:10:16.000 |
And then, you know, the resulting output is much better. 00:10:19.000 |
So, an example that I already talked about is the execution agent. 00:10:23.000 |
So, we're able to connect to our build system to allow us to, on the same source file, execute 00:10:28.000 |
hundreds of tests against the same test file, without conflicts, and then also get separate coverage data back. 00:10:33.000 |
That's an example of a domain expert agent that's super capable and gives us that performance edge. 00:10:37.000 |
Secondly, we found that, when possible, composing agents with deterministic sub-agents, or just 00:10:44.000 |
having the whole agent be deterministic, makes a lot of sense. 00:10:46.000 |
If you can solve the problem in a deterministic way, do it that way. 00:10:48.000 |
So, you know, one example of that is the lint agent under Validator. 00:10:55.000 |
And if we have deterministic tools that can give us that intelligence, we can 00:11:00.000 |
rely on that output and pass the learnings on to the rest of the graph. 00:11:06.000 |
And then, third, we found that we can scale up our AI efforts quite a bit by solving a 00:11:11.000 |
problem once, creating an agent, and then using it in multiple applications. 00:11:16.000 |
So, you already saw it with the standalone experience and Validator being part of our test generation graph. 00:11:23.000 |
But I'm going to give you one more lower-level example. 00:11:26.000 |
That's actually used for both of the products. 00:11:29.000 |
That's the lower-level abstraction that is required for us to be able to, you know, have 00:11:34.000 |
the agents execute builds and execute tests in our build system. 00:11:39.000 |
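A hypothetical sketch of what such a shared lower-level abstraction could look like; the class name, the Bazel-style commands, and the node shape are illustrative assumptions, not Uber's actual internals:

```python
# Hypothetical sketch: one small build/test primitive that graph nodes in
# any product (Validator, AutoCover, and so on) can call.
import subprocess
from dataclasses import dataclass

@dataclass
class RunResult:
    ok: bool
    output: str

class BuildSystem:
    """Thin wrapper around the build tool, callable from agent nodes."""

    def _run(self, *cmd: str) -> RunResult:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return RunResult(proc.returncode == 0, proc.stdout + proc.stderr)

    def build(self, target: str) -> RunResult:
        return self._run("bazel", "build", target)  # assumed Bazel-style tool

    def test(self, target: str, coverage: bool = False) -> RunResult:
        return self._run("bazel", "coverage" if coverage else "test", target)

# Any graph node, in any product, can depend on the same primitive:
def run_tests_node(state: dict) -> dict:
    result = BuildSystem().test(state["target"], coverage=True)
    return {"passing": result.ok, "log": result.output}
```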
So, the last one, I think, is supposed to be a strategic learning. 00:11:43.000 |
So, we already talked about some of the tech learnings, but this is the one I'm most excited about. 00:11:48.000 |
Like, you can set up your organization for success if you want to build agentic AI. 00:11:53.000 |
I think we've done a pretty good job of it at Uber. 00:11:58.000 |
We're all building in collaboration, and I think this is our biggest takeaway. 00:12:02.000 |
The key thing is, you know, encapsulation enables collaboration. 00:12:07.000 |
So, when there are well-thought-out abstractions like LangGraph, and there are opinions on how 00:12:13.000 |
to do things, like how to handle state management and how to deal with concurrency, it really allows 00:12:21.000 |
us to tackle more problems, and more complex problems, without creating all this operational overhead. 00:12:29.000 |
An example I'll give you is, our security team was able to write tools for Validator, like the best-practices checks. 00:12:37.000 |
We knew nothing about this part of security. 00:12:40.000 |
They knew nothing about AI agents and how they're constructed. 00:12:44.000 |
But they were still able to add value to the larger body of work. 00:12:48.000 |
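A hypothetical sketch of that division of labor: a domain team contributes plain functions as tools, and the platform team wires them into the agent, so neither needs the other's expertise. The check itself is invented for illustration:

```python
# Hypothetical sketch of the encapsulation boundary.
from langchain_core.tools import tool

# Owned by the security team: ordinary domain logic, no AI knowledge needed.
@tool
def check_temp_file_usage(code: str) -> str:
    """Flag temp-file APIs that leak because they are never cleaned up."""
    if "ioutil.TempFile" in code and "defer os.Remove" not in code:
        return "violation: temporary file is never removed"
    return "ok"

# Owned by the platform team: bind contributed tools into the agent graph.
security_tools = [check_temp_file_usage]  # grows as teams contribute checks
```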
And so, like, a natural segue from that is, if you're able to encapsulate, you know, work into these well-defined units, 00:12:55.000 |
Then, like graphs are the next thing to think about, right? 00:12:58.000 |
Like graphs model these interactions perfectly. 00:13:02.000 |
They often naturally mirror how developers already interact with the system. 00:13:07.000 |
So, when we do the process engineering and identify process bottlenecks and inefficiencies, 00:13:14.000 |
it doesn't just help accelerate or boost the AI workflows. 00:13:18.000 |
It also helps improve the experience for people not even interacting with the AI tools, right? 00:13:23.000 |
So, it's not, like, a question of should we invest in this or should we improve our existing systems; 00:13:29.000 |
it usually ends up, like, each helping the other. 00:13:32.000 |
Like, you know, we talked about our agentic test generation, and we found multiple inefficiencies, 00:13:38.000 |
like, how are you doing mock generation quickly? 00:13:41.000 |
How do you modify build files and, like, interact with the build system? 00:13:48.000 |
And in the process of, like, fixing all these paper cuts, we improved the experience for, just, like, 00:13:55.000 |
non-AI applications and developers interacting with our systems. 00:14:03.000 |
And, you know, with that, I want to bring this to an end. 00:14:10.000 |
Hopefully, you all learned something and will take something back to your companies.