
LangChain Interrupt 2025: Agents at Scale with LangGraph – David Tag


Transcript

It's really easy for teams to build production-ready Python services, and I think we've been pretty successful so far. Over 20 teams use this framework, and I expect that number to keep growing. Over 30 services have been created to support generative AI experiences in Python. And finally, at an entire-architecture, system level, we've invested in new distributed infrastructure specifically to support agent-to-agent modes of communication.

So let's start from the ground up, and I'll talk about why we chose Python. At a 10,000-foot view, these are the languages that we use at LinkedIn. Up until, I would say, late 2022, we used Python mostly just for internal tooling and productivity tools, but also for big data applications, kind of large-scale offline jobs.

But really, Java was used to build the vast, overwhelming majority of our business logic. And so, come late 2022, which was really LinkedIn's first foray into generative AI, we said, hey, we're already using Java for the non-generative-AI use cases, let's just use Java for GenAI as well. Some of you might be wincing, and I can already anticipate that.

So at that time, we built some really basic apps. They were just simple prompts with basic prompt engineering, little to no RAG, and no conversational memory to speak of. And this was okay, and you probably predicted this part, but it was okay until it really wasn't. What we saw was that a lot of teams wanted to experiment with Python.

They wanted to use Python for prompt engineering and for evaluations, but because of our stack, they were forced to use Java to build their services. And beyond that, the more fundamental problem we faced was: how do you experiment with the open-source libraries and the open-source community if your stack is fundamentally non-Python?

And as soon as there is that kind of language gap, it's really difficult for teams, honestly, to innovate and iterate on the various techniques. It seemed like every month, every week, there would be a new model released, a new library, a new prompt engineering technique, a new protocol.

And I'm sure all of you are acutely aware of this problem: it's just, frankly, pretty hard to keep track of all the developments happening in the industry. It seems like there's something new being released all the time. So we took a step back and made a couple of key observations.

First, there's undeniable interest across the company, from teams across different verticals, in using generative AI. Second, we noticed that for generative AI specifically, this Java-plus-Python setup really wasn't working out. And finally, we realized we have to use Python no matter what, because at the end of the day we have to stay on top of the latest industry trends and bring those benefits into our own stack.

And so we said, hey, let's make a bold bet. Let's use Python for the business logic, the prompt engineering, the evals, and everything else that you would need to build a functional production application. And we even took it a step further: let's build a framework to make this easy for people, and specifically a framework so that teams don't have to guess about what's the right way to do things.

Instead, they can use this framework and take some of the guesswork out. So let me talk about exactly that: what was the service framework that we ended up building? At a high level, we use Python gRPC services, and we use LangChain and LangGraph to model the business logic. There's a whole separate talk in why we chose gRPC, so I won't go too much into detail here.

But at a high level, I've listed just some of the features here that we really appreciated: the built-in streaming support is really awesome, binary serialization is a big performance boost, and there's native cross-language support, which, if you recall from a couple of slides earlier, matters a lot for a mixed Java and Python stack.

So cross-language support was critical. Here I've built a sample ReAct agent using our application framework. As you see here, there are standard utilities for tool calling, highlighted in yellow there, and standard utilities for large language model inference using our internally built inference stack.

There are standard utilities for conversational memory and checkpointing. And the key thing that I'd like to note here is that we use LangChain and LangGraph to tie all of this together. Really, LangChain and LangGraph form the core of the framework that we have. And this leads to the question of the hour: why did we end up choosing LangChain and LangGraph over the various alternatives to build our large language model applications?

The first thing I'd like to say is that it just plainly is really easy to use, and I'm sure everyone in this room would agree. What we saw was that even Java engineers were able to pick it up really easily. If you look at the syntax, you can pretty easily identify what's happening.

And furthermore, through the various community integrations, like the community FAISS integration or the pre-built ReAct agent from the official LangGraph repo, we were able to build functional apps in days instead of weeks. If you think about the number of teams that work on generative AI across the company, that's weeks, even months, of engineering time saved.
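As a rough illustration of that prebuilt path, here is a minimal sketch using the open-source LangGraph create_react_agent with an in-memory checkpointer. The tool, the Azure deployment name, and the thread ID are illustrative placeholders standing in for LinkedIn's internal inference and checkpointing utilities.

```python
# Minimal sketch of a ReAct-style agent: LangGraph's prebuilt agent loop,
# a single tool, and an in-memory checkpointer for conversational memory.
from langchain_core.tools import tool
from langchain_openai import AzureChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

@tool
def search_profiles(query: str) -> str:
    """Stand-in tool; internally this would be a gRPC call to another service."""
    return f"profiles matching '{query}'"

# Credentials and endpoint come from the usual Azure OpenAI environment variables.
llm = AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-06-01")

agent = create_react_agent(llm, tools=[search_profiles], checkpointer=MemorySaver())

# The thread_id keys the checkpointed conversation, so follow-up turns share memory.
result = agent.invoke(
    {"messages": [("user", "Find senior ML engineers in Seattle")]},
    config={"configurable": {"thread_id": "demo-session-1"}},
)
```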

And the second thing that we really appreciated, and I think we've heard similar things from the speakers today, is that the LangChain and LangGraph packages have really sensible interfaces. So we were able to build our internal infrastructure on top of these interfaces. For example, if we look at this chat model interface here, LinkedIn uses Azure OpenAI, but also uses on-premise language models as well.
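As a rough illustration of why that interface matters, here is a sketch (not LinkedIn's actual code) of a provider factory: everything downstream depends only on BaseChatModel, so the Azure-hosted and self-hosted paths are interchangeable. The endpoint URL, deployment, and model names are made up, and the on-premise path assumes an OpenAI-compatible serving endpoint.

```python
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_openai import AzureChatOpenAI, ChatOpenAI

def build_chat_model(provider: str) -> BaseChatModel:
    """Return a chat model behind the common BaseChatModel interface."""
    if provider == "azure-openai":
        # Credentials and endpoint come from the standard Azure OpenAI env vars.
        return AzureChatOpenAI(azure_deployment="gpt-4o", api_version="2024-06-01")
    if provider == "on-prem":
        # Assumes the self-hosted model sits behind an OpenAI-compatible endpoint;
        # the URL, model name, and dummy key are placeholders.
        return ChatOpenAI(base_url="http://inference.internal/v1", model="internal-llm", api_key="unused")
    raise ValueError(f"unknown provider: {provider}")

# Agent code only sees BaseChatModel, so switching providers is a config change.
llm: BaseChatModel = build_chat_model("azure-openai")
```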

So if teams want to switch between model providers, they can do so with a couple of lines of code, and that's that. Now let's zoom out on this diagram here. Essentially, what I've been describing lets us build one of these boxes. But then you might ask, how do we actually tie all these agents together?

The two problems that we identified, specifically with agentic setups, were that, one, agents can take a long time to process data. This leads into the whole ambient agent idea: how can we model long-running asynchronous flows within our infrastructure stack? And second, agents can execute in parallel.

Furthermore, the output of one agent might depend on the output of another agent. So how can we make sure that things are done in the right order? This leads me to the next section of my talk, which is the new infrastructure that we built, and I'll call it our agent platform.

So the first part of our solution was to model long-running asynchronous flows, and we modeled this precisely as a messaging problem. The reason we chose messaging is that LinkedIn already has a really robust messaging service that serves countless members every day. So we extended it to also cover agent-oriented communication, mainly agent-to-agent messaging, but user-to-agent messaging as well.

We even built a nice near-line flow so that messages which fail are automatically picked up and retried by the same system. So that covers how agents can talk to each other: they can send messages to one another. But it doesn't cover how agents can actually make sure that tasks are done in the right order.
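The talk doesn't show the messaging internals, so the following is only an illustrative sketch of the idea: agent-to-agent and user-to-agent traffic share one message envelope, and a near-line worker retries failed deliveries. All names here are invented.

```python
from dataclasses import dataclass

@dataclass
class AgentMessage:
    sender: str        # e.g. "user:12345" or "agent:sourcing"
    recipient: str     # the agent (or user) that should handle the message
    body: dict         # task payload, conversation turn, etc.
    attempts: int = 0

def deliver(msg: AgentMessage) -> None:
    """Placeholder for the actual queue/RPC delivery to the recipient."""
    ...

def nearline_retry(failed: list[AgentMessage], max_attempts: int = 3) -> list[AgentMessage]:
    """Re-deliver failed messages; return the ones that still need another pass."""
    still_failing = []
    for msg in failed:
        msg.attempts += 1
        try:
            deliver(msg)
        except Exception:
            if msg.attempts < max_attempts:
                still_failing.append(msg)
            # otherwise the message would go to a dead-letter queue in a real system
    return still_failing
```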

The second part of the solution we developed was to explicitly build memory catered for agents. With agentic memory, and I think you've seen things like this from the booths outside as well, there are different forms of memory that an agent should be able to access.

For us, memory is both scoped and layered: working memory, long-term memory, and collective memory, and these each provide different functions that the agent can utilize. For example, a new interaction will probably just fill out working memory, but over time, as the agent has more interactions with a particular user, more of that information gets populated into long-term memory.
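Here is a rough sketch of what "scoped and layered" memory could look like; the class and field names are illustrative, not LinkedIn's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    working: list[dict] = field(default_factory=list)         # current task / conversation turns
    long_term: dict[str, str] = field(default_factory=dict)   # per-user facts learned over time
    collective: dict[str, str] = field(default_factory=dict)  # knowledge shared across agents

    def remember_turn(self, turn: dict) -> None:
        # New interactions land in working memory first.
        self.working.append(turn)

    def promote(self, key: str, value: str) -> None:
        # Repeated signals about a particular user graduate into long-term memory.
        self.long_term[key] = value
```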

So now we've covered the arrows and the boxes. How do agents actually do things? Well, this leads to what I was talking about before: we've built this notion of skills, which is like function calling in a lot of senses, but it deviates from that notion in a couple of different ways.

The first is that a function call is usually local to your process or your box. Here, skills can be nearly anything: RPC calls, database queries, prompts, and, more importantly, other agents. The second thing I'd like to emphasize is the how. Specifically, we let agents invoke skills synchronously, but also asynchronously, which ties back to the long-running flows we talked about.

The asynchronous part is absolutely critical. And I'd like to think that we did a good job with the design here, because overall the setup is pretty similar to MCP, well before MCP actually existed. And finally, we took this skill concept and made it centralized. We implemented a registry so that teams' services could expose skills and register them in a central place. That way, if an agent wants to use a skill owned by another team, it can discover and invoke that skill via this central registry.
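To illustrate the shape of such a registry, here is a small sketch with invented names (this is not LinkedIn's API): teams register skills, agents discover them by description, and skills can be invoked synchronously or asynchronously.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Skill:
    name: str
    description: str        # used for discovery / matching against a task
    owner_team: str
    fn: Callable[..., Any]  # could wrap an RPC call, a database query, a prompt, or another agent

class SkillRegistry:
    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def find(self, query: str) -> list[Skill]:
        # Naive keyword match; a real registry would use semantic search over descriptions.
        return [s for s in self._skills.values() if query.lower() in s.description.lower()]

    def invoke(self, name: str, **kwargs: Any) -> Any:
        return self._skills[name].fn(**kwargs)

    async def invoke_async(self, name: str, **kwargs: Any) -> Any:
        # Asynchronous invocation so long-running skills don't block the calling agent.
        return await asyncio.to_thread(self._skills[name].fn, **kwargs)
```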

Let me walk you through a sample flow. In this case, the supervisor agent tells the sourcing agent that it needs help searching for an engineer at a certain level. The sourcing agent will contact the skill registry, and the skill registry will respond with the skill that it thinks is a good fit for the task.

And finally, the sourcing agent will execute this skill. And like I said before, this builds on the core components that we've already put in place. So again, the emphasis is on making it really easy to develop agents. And lastly, I could spend an entire hour on this topic, but we also built custom observability, because agentic modes of execution require particular kinds of observability.
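Continuing the hypothetical registry sketch above, the flow just described might look roughly like this; the skill, team, and query names are made up.

```python
registry = SkillRegistry()
registry.register(Skill(
    name="search_candidates",
    description="Search for engineering candidates matching a role profile",
    owner_team="talent-search",
    fn=lambda role, level: [f"candidate matching {level} {role}"],  # stand-in for a real RPC
))

# Supervisor -> sourcing agent: "I need help searching for an engineer."
# Sourcing agent -> registry: discover a skill that fits the task.
matches = registry.find("engineering candidates")

# Sourcing agent executes the skill the registry suggested.
candidates = registry.invoke(matches[0].name, role="engineer", level="senior")
```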

To wrap up, I'd like to cover two things. First, a key lesson that we've learned these past couple of years is to really invest in developer productivity. This space is moving really fast, and if you want to make sure that you're following best practices, but are also able to adapt to different conditions or changes in the product, you need to make sure that it's really easy for developers to build.

And you can do this by standardizing patterns and making it as easy as possible for any developer to contribute. The second thing I'd like to emphasize is that, at the end of the day, we're still building production software. You should still consider the usual things, availability, reliability, and so on, but also, again, observability is paramount.

You can't fix what you can't observe. And really, to do this you need robust evaluations, the things that other speakers have talked about already, and you need to know how you're going to act on them. So that's all from me. Thank you. Thank you, David.