
From Pilot to Platform: Agents at Scale with LangGraph


Transcript

Hi, everyone. Thanks for coming. This is my talk, Agents at Scale Using LangGraph. And today, I'd like to focus on thinking about scale in two different ways. The first kind of scale, I think, we're all familiar with. You know, as engineers, product builders, there's a sense of scale in which we want to make sure our services, our apps, are able to process tons of data and stay performant.

The second kind of scale, I think, is a little bit more subtle. And what I'd like to focus a lot on in this presentation is: how can we scale the level of agentic adoption within our organizations, and make sure that everyone can contribute their best ideas?

So without further ado, let's get started. I'd like to start by showcasing LinkedIn's first-ever production agent. We built an agent called LinkedIn Hiring Assistant, and it's an agent that's specifically designed to help recruiters automate various parts of their process, and really to let them spend more time having meaningful conversations with candidates.

So I'm going to play a video. It's going to go quite fast, but I promise I'll break it down step-by-step right after it finishes playing. The key thing I'd like to point out is precisely what we just saw, and again, I know it was fast: the agent does something in the background, and after some period of time it lets the recruiter, the user, know that it's finished processing and can return some data.

So like I said before, let's go through this step-by-step. What's highlighted here in the box is the recruiter starting off the process by describing the kind of job that they'd like to fill. In this case, they're looking for an experienced growth marketer, and they've attached various documents that describe more about the position they're actually trying to fill.

Next, the agent will automatically generate different qualifications based on what the recruiter initially inputted, but also based on the supplementary documents they provided. Then the agent will take its time, let the recruiter know, hey, I'm going to be working on this and I'll come back to you after some period of time, until finally the recruiter gets notified that there are candidates they can review. And if they click on that, they'll be taken to a detailed list view page where they can review the candidates that the agent has sourced.

And so this really is, if we look at the different topics that have been discussed at Interrupt so far, following the ambient agent pattern that different speakers and companies have adopted. Under the hood, what this looks like is a pretty traditional supervisor multi-agent architecture, where a supervisor agent coordinates between different sub-agents, and each sub-agent is allowed to interact with existing LinkedIn services and systems via tool calling.
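To make that concrete, here is a minimal sketch of that supervisor pattern in LangGraph. The node names, routing logic, and messages are illustrative stand-ins, not LinkedIn's actual code:

```python
from langchain_core.messages import AIMessage
from langgraph.graph import StateGraph, MessagesState, START, END

class State(MessagesState):
    next: str  # which node the supervisor has delegated to; END when done

def supervisor(state: State):
    # In production this would be an LLM call that plans the next step;
    # here we just delegate once to the sourcing agent and then finish.
    done = any(m.name == "sourcing_agent" for m in state["messages"])
    return {"next": END if done else "sourcing_agent"}

def sourcing_agent(state: State):
    # A real sub-agent would call existing LinkedIn services via tools here.
    return {"messages": [AIMessage("Found 3 candidates.", name="sourcing_agent")]}

builder = StateGraph(State)
builder.add_node("supervisor", supervisor)
builder.add_node("sourcing_agent", sourcing_agent)
builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", lambda s: s["next"], ["sourcing_agent", END])
builder.add_edge("sourcing_agent", "supervisor")
graph = builder.compile()

print(graph.invoke({"messages": [("user", "Find me a growth marketer")]}))
```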

As I'll share a bit more later in this presentation, it's not just tool calling; we have something called skills, but I'll get to that when we get there. So this is the overall plan for this talk. First, I'd like to talk about how LinkedIn standardized on using Python.

Traditionally, we used Java to build all of our apps, but we decided with GenAI, we would make the bold move to just build everything in Python. Next, at the library level, I'll talk about how we did a bunch of research, vetted and finally incorporated different open source libraries, and built standard utils on top of the ones that we really liked.

For example, LangChain and LangGraph. Next, if we zoom out a bit, I'll talk about how we built an app framework to make it really easy for teams to build production-ready Python services. And I think we've been pretty successful so far: over 20 teams have used this framework, and honestly, that's probably an underestimate at this point.

Over 30 services have been created to support GenAI product experiences on LinkedIn. And finally, if we zoom out to an entire architecture level, a system level, we invested in a new distributed architecture to specifically support agentic modes of communication. So, let's start from the ground up, and I'll talk about why we chose Python.

So, at a 10,000-foot view, these are the languages that we use here at LinkedIn. Up until, I would say, late 2022, Python was used mostly just for internal tooling, different internal productivity tools, but also big data applications, your kind of PySpark offline jobs.

But really, Java was used to build a vast, overwhelming majority of our business logic. And so, come late 2022, which was really LinkedIn's first foray into GenAI, we saw that, hey, we're already using Java for non-GenAI use cases, so let's just use Java for GenAI also. And some of you might be wincing.

I can already anticipate that. So, at that time, we built some really basic apps. They were just simple prompts with basic prompt engineering, little to no RAG, and no conversational memory to speak of. And this was okay, and you've probably predicted this part, but it was okay until it really wasn't.

What we saw was that a lot of teams wanted to experiment with Python. They wanted to use Python for prompt engineering and for evaluations, but because of our stack, they were forced to build their services in Java. So, not only was that a problem, the more fundamental problem we faced was: how do you experiment with open source libraries and the open source community if your stack is fundamentally non-Python?

And as soon as there is that kind of language gap, then it becomes really difficult for teams, honestly, to innovate and iterate with the latest and greatest techniques. It seemed like, you know, every month, every week, there would be a new model being released, a new library, a new prompt engineering technique, a new protocol.

And I'm sure all of you are acutely aware of this problem, that it's just, frankly, pretty hard to keep track of all the developments that are happening in the industry. It seems like there's something new being developed all the time. So, we took a step back, and we made a couple of key observations.

First, there's undeniable interest across the company, from teams across different verticals, in using generative AI. We noticed that for generative AI specifically, this Java/Python setup really wasn't working out. And finally, we realized we had to use Python no matter what.

Because at the end of the day, we have to be on top of the latest industry trends and make sure we're bringing those benefits into our stack internally. And so, we said, hey, let's make a bold bet. Let's use Python for the business logic, the engineering, the evals, and pretty much everything else that you would need to build a functional production application.

And I would say we took it even a step further: let's build a framework and make it the default. And specifically, make a framework so that teams don't have to guess about the right way to do things. Instead, they can use this framework, and it takes some of the guesswork out for them.

So, let me talk about exactly that. What was the service framework that we ended up building? At a high level, we used Python gRPC, and we used LangChain and LangGraph to model the core business logic. This isn't a talk about gRPC, so I won't go into too much detail on why we chose it.

But at a high level, I've listed just some of the features here that we really appreciated. Namely, its built-in streaming support was really awesome, binary serialization is a big performance boost, and then there are the native cross-language features, which, if you recall from a couple slides earlier, matter because we use a ton of languages. So, having cross-language support was critical.
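As a rough illustration of that streaming support, here is what a server-streaming agent endpoint can look like in Python gRPC. The agent.proto service, the generated agent_pb2 / agent_pb2_grpc modules, and the run_agent helper are all hypothetical stand-ins; LinkedIn's actual service definitions are internal.

```python
# Illustrative only: assumes a hypothetical agent.proto compiled into
# agent_pb2 / agent_pb2_grpc, with a server-streaming Chat RPC.
from concurrent import futures

import grpc
import agent_pb2       # hypothetical generated module
import agent_pb2_grpc  # hypothetical generated module

def run_agent(prompt: str):
    # Hypothetical helper wrapping the LangGraph app; yields output chunks.
    yield from prompt.split()

class AgentServicer(agent_pb2_grpc.AgentServiceServicer):
    def Chat(self, request, context):
        # Server-side streaming: yield each chunk as the agent produces it.
        for chunk in run_agent(request.prompt):
            yield agent_pb2.ChatResponse(delta=chunk)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=8))
    agent_pb2_grpc.add_AgentServiceServicer_to_server(AgentServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```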

So, having cross-language support was critical. So, here I've built a sample React agent using our application framework. So, as we see here, there's standard utilities for tool calling, highlighted in kind of the yellow there. Standard utilities for large language model inference using our internal inferencing stack. Standard utilities for conversational memory and checkpointing.

And the key thing that I'd like to note here is that we use LangChain and LangGraph to tie all this stuff together. Really, LangChain and LangGraph form the core of each of our GenAI apps. So, then, this leads to the question of the hour: why did we end up choosing LangChain and LangGraph over the sea of alternatives to build and model our applications?

The first thing that I'd like to say is that it's just plainly really easy to use, and I'm sure everyone in this room would agree. What we saw was that even Java engineers were able to pick it up really easily. If you look at the syntax, you can pretty easily identify what's happening in this most basic construct here.

And furthermore, through various community integrations, like the community FAISS integration or the prebuilt ReAct agent from the official LangGraph repo, we were able to build non-trivial apps in days, down from weeks. If you think about the number of teams that work on generative AI across the company, that's weeks, months of time being saved across the board.
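For a sense of what those community integrations buy you, here's the kind of thing the FAISS integration makes possible in a few lines. The example text and query are made up, and this assumes the faiss-cpu and langchain-openai packages are installed.

```python
# A community integration example: a FAISS vector store as a retriever.
# OpenAIEmbeddings reads OPENAI_API_KEY from the environment.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

store = FAISS.from_texts(
    ["LinkedIn Hiring Assistant sources candidates for recruiters."],
    OpenAIEmbeddings(),
)
retriever = store.as_retriever()
print(retriever.invoke("What does the hiring assistant do?"))
```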

And the second thing that we really appreciated, and I think we've heard similar themes from different speakers today, is that the LangChain and LangGraph packages have really sensible interfaces. And so, we were able to model our internal infrastructure using these interfaces. For example, if we look at this chat model interface here: LinkedIn uses Azure OpenAI, but also uses on-premise large language models as well.
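Here's a minimal sketch of what that swap can look like; OnPremChatModel is a hypothetical stand-in for an internal BaseChatModel subclass.

```python
# Both providers implement the same chat model interface, so the swap is
# confined to construction; everything downstream stays untouched.
from langchain_openai import AzureChatOpenAI

def get_chat_model(provider: str):
    if provider == "azure":
        # Reads Azure endpoint/credentials from the environment.
        return AzureChatOpenAI(azure_deployment="gpt-4o")
    if provider == "on_prem":
        # return OnPremChatModel(...)  # hypothetical internal subclass
        raise NotImplementedError("internal model stack")
    raise ValueError(provider)

llm = get_chat_model("azure")
print(llm.invoke("Draft an outreach message to a candidate.").content)
```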

So, if teams want to switch between model providers, they can do so with a couple lines of code, and that's that. Now, if we zoom out, recall this diagram here. Essentially, what I've described so far allows us to build one of these boxes. But then, you might ask, how did we actually tie all these agents together?

And so, the two problems that we identified specifically with agentic setups were that, one, agents can take a lot of time to process data. This ties into the whole ambient agent idea: how can we model long-running asynchronous flows within our infrastructure stack? And second, agents can execute in parallel.

And furthermore, the outputs of one agent might depend on the outputs of another agent. So, how can we make sure that things are done in the right order? This leads me into the penultimate section of my talk, which is the new infrastructure that we built, which we call our agent platform.

So, the first part of our solution, to model long-running asynchronous flows, was to treat this precisely as a messaging problem. And the reason we chose messaging is that LinkedIn already has a really robust messaging service that serves countless members every day. So, we extended it to also include agentic forms of communication, namely agent-to-agent messaging, but also user-to-agent messaging as well.

We even built some nice things like a nearline flow, where if message deliveries fail, they'll be automatically picked up and retried via a queuing system.
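LinkedIn's messaging service is internal, so this is only a toy illustration of that nearline retry idea: failed deliveries land on a queue, and a worker re-attempts them until they go through. The function names and the delivery stub are made up.

```python
import queue
import time

retry_queue: queue.Queue = queue.Queue()

def deliver(message: dict) -> bool:
    """Stand-in for the RPC that drops a message in the recipient's inbox."""
    return True  # pretend delivery succeeded; it may fail transiently in reality

def send(message: dict) -> None:
    if not deliver(message):
        retry_queue.put(message)  # hand off to the nearline flow below

def nearline_worker() -> None:
    while True:
        message = retry_queue.get()
        if not deliver(message):
            time.sleep(1.0)           # crude backoff for the sketch
            retry_queue.put(message)  # re-enqueue until delivery succeeds
```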

So, then, that covers how agents can talk to each other, they can send messages to one another, but it doesn't cover how agents can make sure that tasks are done in the right order. The second part of the solution we developed was to build memory specifically catered to agents. And with our agentic memory, and I think you've seen themes of this from other speakers as well, the idea is that there are different forms of memory that the agent should be able to access.

For us, memory is both scoped and layered. There's working memory, long-term memory, and collective memory, and these each provide different functions that the agent can utilize to do the things it needs to do. So, for example, for a new interaction, you'll probably just fill out working memory.

But over time, as the agent has more interactions with a particular user, more of that context will be promoted into long-term memory.
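As a toy illustration of that scoped, layered structure (the real implementation is internal, and the field names here are made up):

```python
# Working memory is per-interaction, long-term is per-user, and
# collective memory is shared across agents.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    working: dict = field(default_factory=dict)     # current interaction
    long_term: dict = field(default_factory=dict)   # persisted per user
    collective: dict = field(default_factory=dict)  # shared across agents

    def end_interaction(self) -> None:
        # Promote durable facts from working memory into long-term memory.
        for key, value in self.working.items():
            self.long_term.setdefault(key, value)
        self.working.clear()

memory = AgentMemory()
memory.working["preferred_seniority"] = "mid-level"
memory.end_interaction()  # now available to future interactions
```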

So, now we've covered the arrows and the boxes. How do agents actually do things? Well, this leads to what I was talking about before, where we've developed this notion of skills, which is like function calling in a lot of senses. But it deviates from that notion in a couple of ways. The first is that function calling is usually local to your box, whereas skills can really be anything: RPC calls, database queries, prompts, and, more importantly, other agents. And the second thing that I'd like to emphasize here is the how.

Specifically, we let agents invoke skills synchronously, but also asynchronously, and given the different themes we've talked about, the asynchronicity part is absolutely critical. And I'd like to think that we did a good job with the design here, because overall, this setup is pretty similar to MCP, well before MCP was actually around.

And finally, we took this skill concept and centralized it. We implemented a registry so that teams' services could expose skills and register them in a central place. Then, if an agent wants to access skills developed by another team, it can discover and invoke them via this central registry.

Let me walk you all through what a sample flow might look like. In this case, the supervisor agent tells the sourcing agent, I need help searching for a mid-level engineer. The sourcing agent will contact the skill registry, and the skill registry will respond with the skill that it thinks is a good fit for the task. And then finally, the sourcing agent will execute that skill.
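In code, that flow might look something like the sketch below. The registry, skill names, and agent function are all illustrative, and the real system resolves skills by description rather than a hard-coded key; the point is just the lookup-then-async-invoke shape.

```python
# Toy sketch: a central registry mapping skill names to callables
# (RPCs, database queries, prompts, or other agents), looked up and
# invoked asynchronously by the sourcing agent.
import asyncio
from typing import Callable

SKILL_REGISTRY: dict[str, Callable] = {}

def register_skill(name: str):
    def wrap(fn):
        SKILL_REGISTRY[name] = fn
        return fn
    return wrap

@register_skill("candidate_search")
async def candidate_search(query: str) -> list[str]:
    # In reality, an RPC into an existing candidate search service.
    return [f"profile matching {query!r}"]

async def sourcing_agent(task: str) -> list[str]:
    skill = SKILL_REGISTRY["candidate_search"]  # registry picks the best fit
    return await skill(task)                    # asynchronous invocation

print(asyncio.run(sourcing_agent("mid-level engineer")))
```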

And like I said before, this links back to the core components that we've built, too. So, again, the emphasis being to make it really easy to develop agents. And lastly, I could spend an entire hour on this topic, but we also built custom observability, because agentic modes of execution require particular observability solutions.

So, to wrap up here, I'd like to cover two things. First, a key lesson we've learned over these past couple of years is to really invest in productivity. This space is moving incredibly fast, and if you want to make sure that you're following the best practices, but also able to adapt to different conditions or changes in the product, you need to make sure that it's really easy for developers to build.

And so, you can do this by standardizing patterns, making sure it's as easy for any developer to contribute as possible. And the second thing I'd like to emphasize is, at the end of the day, we're still building production software. You should still consider the usual things, availability, reliability, but also, again, observability is paramount.

You can't fix what you can't observe. And really, to do this, you need robust evaluations, like the ones people have talked about already, and you should account for non-deterministic workloads. So, that's all, and thank you.