
LangChain Interrupt 2025 Aladdin Copilot – Pedro Vicente Valdez


Whisper Transcript

00:00:00.000 | ...how they build their own land and build houses.
00:00:02.020 | Like I said, the microphone is open.
00:00:05.800 | Here's the microphone.
00:00:06.340 | Good afternoon, everyone.
00:00:19.000 | My name is Brennan Rosales, and I am an AI engineering lead at BlackRock.
00:00:23.200 | I'm joined today by my friend and colleague, Pedro Vicente Valdez, principal AI engineer.
00:00:29.260 | We're extremely excited to be here today to share with you how BlackRock is leveraging AI
00:00:35.280 | to really transform the investment management space.
00:00:38.300 | We're going to be focusing today on the Aladdin platform and the agentic architecture supporting Aladdin.
00:00:45.380 | So let's start out with just a quick primer on what is BlackRock and what is Aladdin.
00:00:51.960 | BlackRock is one of the world's leading asset managers,
00:00:55.720 | with over $11 trillion in AUM.
00:00:59.420 | And ultimately, our main goal is to help people experience financial well-being.
00:01:04.760 | And the Aladdin platform is key to our ability to be able to deliver on that goal.
00:01:10.780 | Aladdin is a proprietary technology platform that unifies the investment management process.
00:01:17.340 | It provides asset managers with a whole portfolio solution that gives them access to both public
00:01:24.440 | and private markets, and allows them to cover the needs of both institutional and retail investors.
00:01:30.540 | This system is built by the industry for the industry.
00:01:35.180 | BlackRock is the biggest user of Aladdin, but we also sell and support hundreds of clients globally
00:01:42.700 | across 70 countries, all using the Aladdin platform.
00:01:45.940 | The Aladdin organization is made up of around 7,000 people, with over 4,000 of those being
00:01:52.400 | engineers. These engineers build and maintain the roughly 100 Aladdin front-end applications that thousands
00:01:59.920 | of people use every day.
00:02:03.700 | We see three key outcomes at the BlackRock firm level.
00:02:18.900 | We would really like to increase everyone's productivity.
00:02:21.500 | We would love for AI to drive alpha generation and really give people a much more personalized
00:02:27.580 | experience with the platform.
00:02:29.700 | One way we're realizing this is through the Aladdin co-pilot initiative.
00:02:34.200 | The Aladdin co-pilot application is currently embedded across all of those 100 front-end applications
00:02:41.320 | that I referenced earlier.
00:02:42.400 | This co-pilot acts as the connective tissue across the Aladdin platform, proactively surfacing relevant
00:02:49.800 | content for you at the right moment and fostering productivity at each step along the way.
00:02:55.280 | We like to think about three core value drivers within the scope of Aladdin co-pilot.
00:03:01.020 | The first is that we want every user using the Aladdin software to become an Aladdin expert.
00:03:06.740 | And we can do that by enabling a much more intuitive approach.
00:03:10.860 | The second really has to do with increasing the personalization and customization of this platform.
00:03:18.160 | And through that, our users will be able to unlock new efficiencies through highly configurable experiences.
00:03:25.120 | And the third one is really just to democratize access to all of these insights.
00:03:29.740 | If it is much easier for our users to find the data points that they need,
00:03:35.920 | then they will, at the end of the day, be much better enabled to make better decisions.
00:03:40.160 | So here is a high-level overview of the actual architecture supporting the Aladdin co-pilot today.
00:03:47.260 | I'll take you through kind of the life cycle of a user query.
00:03:51.360 | Before I dive into that, I want to point your attention to the top right here.
00:03:55.480 | And this component is called the plugin registry.
00:03:58.240 | I referenced this earlier, but there are, let's say, 50, 60 Aladdin engineering teams
00:04:03.820 | that cover specific domains throughout the platform.
00:04:06.920 | You can think of an example of one of these teams, maybe the trading team.
00:04:10.860 | They own all of the services that serve trading data to all Aladdin applications.
00:04:15.980 | Pedro and I are not experts in these, you know, business, financial domains.
00:04:23.340 | Our job is to make it as easy as possible for these engineers to plug their functionality
00:04:28.480 | that they own into this system.
00:04:30.560 | And we do that through this plug-in registry.
00:04:33.300 | Today, we allow these development teams two ways to onboard.
00:04:38.320 | The first: we've come to define a tool as kind of mapping one-to-one
00:04:43.880 | to an existing Aladdin API already out in production.
00:04:47.360 | And then the second onboarding path that we give to these development teams
00:04:51.580 | for very complex workflows is the ability to spin up a custom agent
00:04:56.300 | and also plug that into the system.
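
As a rough illustration of what a registry entry for these two onboarding paths might capture, here is a minimal Python sketch; the schema, field names, and URL are hypothetical, not the actual Aladdin registry.

```python
# Hypothetical plugin-registry entry; illustrative only, not BlackRock's schema.
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class PluginEntry:
    name: str
    kind: Literal["tool", "agent"]   # path 1: API-backed tool, path 2: custom agent
    description: str                 # used later by the planner to pick plugins
    owner_team: str
    endpoint: str                    # existing Aladdin API, or the agent's address
    environments: list[str] = field(default_factory=list)  # where it is enabled
    allowed_apps: list[str] = field(default_factory=list)  # which apps may call it

# Path 1: a tool mapping one-to-one to an existing production API.
exposure_tool = PluginEntry(
    name="get_portfolio_exposure",
    kind="tool",
    description="Returns a portfolio's exposure to a given sector.",
    owner_team="risk-analytics",
    endpoint="https://aladdin.example/api/exposure",  # placeholder URL
    environments=["dev", "prod"],
    allowed_apps=["portfolio-manager"],
)
```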
00:04:58.180 | I think one important call-out here, we kind of started designing this thing
00:05:02.420 | two and a half years ago.
00:05:03.980 | And back then, we needed to figure out kind of a standardized agentic communication protocol.
00:05:09.200 | And we, you know, we came up with a very slim, very specific version of that for Aladdin.
00:05:15.980 | But as new, more established protocols like LangChain's Agent Protocol and A2A come out,
00:05:21.420 | we're kind of actively evaluating those solutions.
00:05:23.860 | So now I can point your attention to the left side of the screen.
00:05:28.080 | I'm not in the way.
00:05:29.760 | So we're going to start out with a query of what is my exposure to aerospace in portfolio one.
00:05:35.740 | At this point in time, when the user submits their query, we have a lot of very useful and relevant context.
00:05:42.340 | Some of those could be which Aladdin application they're asking this question from,
00:05:47.280 | what they actually have up on the screen, what portfolios, what assets, what widgets are they actually looking at at that point in time.
00:05:54.460 | And in addition to that, there's also kind of this set of predefined global settings throughout these Aladdin apps.
00:06:00.980 | And ideally, we are respecting those preferences as we make all these calls on behalf of the user.
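
To make that concrete, the payload accompanying a submitted query might be shaped something like the sketch below; every field name here is an assumption for illustration, not the actual Aladdin Copilot contract.

```python
# Illustrative query payload with application context and global settings.
user_query = {
    "text": "What is my exposure to aerospace in portfolio one?",
    "app": "portfolio-manager",        # which Aladdin application the user asked from
    "screen_context": {
        "portfolios": ["PORTFOLIO_1"], # what is currently loaded on screen
        "assets": ["BA", "RTX"],
        "widgets": ["exposure-grid"],
    },
    "global_settings": {               # predefined preferences to respect downstream
        "base_currency": "USD",
        "risk_model": "default",
    },
}
```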
00:06:07.500 | So once a query gets submitted, we enter our orchestration graph.
00:06:12.040 | From day one, it's been built using LangChain.
00:06:15.680 | So the first node in that graph is an input guardrail node.
00:06:19.020 | And it covers a lot of the needs for our responsible AI moderation.
00:06:23.180 | At this point in the graph, we're trying to find off-topic and toxic content and handle those appropriately.
00:06:33.640 | We would also, at this point, identify any relevant PII and make sure that we're handling that as carefully as possible.
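
A minimal sketch of a guardrail-first graph of this shape, written with LangGraph's StateGraph API (the graph layer of the LangChain ecosystem); the keyword check stands in for the real moderation models, which the talk doesn't detail.

```python
# Guardrail-first orchestration graph: moderate first, then answer or refuse.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class CopilotState(TypedDict):
    query: str
    blocked: bool
    answer: str

def input_guardrail(state: CopilotState) -> CopilotState:
    # Stand-in for off-topic/toxicity/PII moderation.
    blocked = any(w in state["query"].lower() for w in ("off-topic", "toxic"))
    return {**state, "blocked": blocked}

def orchestrate(state: CopilotState) -> CopilotState:
    # Placeholder for the filtering + planning/action loop described below.
    return {**state, "answer": "stub answer"}

def route(state: CopilotState) -> str:
    return "rejected" if state["blocked"] else "orchestrate"

graph = StateGraph(CopilotState)
graph.add_node("input_guardrail", input_guardrail)
graph.add_node("orchestrate", orchestrate)
graph.add_node("rejected", lambda s: {**s, "answer": "Sorry, I can't help with that."})
graph.add_edge(START, "input_guardrail")
graph.add_conditional_edges("input_guardrail", route,
                            {"orchestrate": "orchestrate", "rejected": "rejected"})
graph.add_edge("orchestrate", END)
graph.add_edge("rejected", END)
app = graph.compile()

print(app.invoke({"query": "What is my exposure to aerospace?",
                  "blocked": False, "answer": ""}))
```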
00:06:41.520 | So once we get past that node, we go into this filtering and access control node.
00:06:45.520 | And that's the point in time where we have this, you know, set of agents and tools,
00:06:50.100 | a couple thousand agents and tools that are registered in the plugin registry.
00:06:53.320 | When teams onboard this functionality to the registry,
00:06:57.760 | they have complete control over which environments this plugin is enabled in,
00:07:03.200 | which users have access to this, and even which Aladdin apps can even access these plugins.
00:07:09.540 | Within this node, it's also important for us to try our best to decrease the searchable universe of all of these plugins.
00:07:17.440 | Because once we get to our planning step, you know, if we're sending more than 40, 50 tools,
00:07:22.640 | we probably won't have good enough control.
00:07:25.100 | So we leave that filtering and access control node with some set of, say, 20 to 30 tools and agents.
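
That narrowing step might look roughly like the sketch below, reusing the hypothetical PluginEntry from the registry sketch earlier; the keyword-overlap ranking is purely illustrative, and a production system would more likely use embeddings or a learned retriever.

```python
# Shrink the registered universe to a small candidate set before planning.
def filter_plugins(registry: list,  # list[PluginEntry] from the sketch above
                   query: str, env: str, app: str, max_tools: int = 30) -> list:
    # Hard access-control filters, as configured by the owning teams.
    eligible = [p for p in registry
                if env in p.environments and app in p.allowed_apps]
    # Naive relevance ranking to decrease the searchable universe.
    words = set(query.lower().split())
    eligible.sort(key=lambda p: len(words & set(p.description.lower().split())),
                  reverse=True)
    return eligible[:max_tools]  # e.g. hand the planner 20-30 tools and agents
```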
00:07:32.440 | And we enter our orchestration node.
00:07:33.920 | There, we're very reliant on GPT-4 function calling today.
00:07:38.320 | We essentially just iterate over a planning and action node.
00:07:41.820 | We go through that as many times as needed.
00:07:44.160 | And at some point, you know, GPT-4 says, hey, you got the answer.
00:07:47.580 | You'll give it back to the user.
00:07:48.620 | Or, you know, it might never find this answer.
00:07:50.920 | We get that final answer.
00:07:53.140 | We pass it back out.
00:07:54.260 | through our output guardrail nodes,
00:07:56.000 | trying to protect against hallucinations as best as we can.
00:07:58.880 | And then we end up on the left side and tell the user that their exposure to aerospace in this portfolio is 5%.
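
The plan/act iteration described here resembles a standard tool-calling loop. Below is a condensed sketch against the OpenAI chat-completions tool-calling API; execute_tool (dispatching into the registered plugins), the model name, and the give-up behavior are all assumptions, not the actual Aladdin orchestrator.

```python
# Iterate plan -> act until the model returns a final answer or we give up.
import json
from openai import OpenAI

client = OpenAI()

def execute_tool(name: str, args: dict) -> dict:
    # Hypothetical dispatcher into the registered Aladdin plugins.
    return {"result": f"stub result for {name}({args})"}

def run_loop(messages: list, tools: list, max_steps: int = 8) -> str:
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model="gpt-4", messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:        # the model says we have the final answer
            return msg.content
        messages.append(msg)          # keep the plan in the running context
        for call in msg.tool_calls:   # act, then loop back to planning
            result = execute_tool(call.function.name,
                                  json.loads(call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": call.id,
                             "content": json.dumps(result)})
    return "Sorry, I couldn't find an answer."  # the loop never converged
```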
00:08:07.640 | So that's a bit of high-level information on the architecture of the system.
00:08:12.580 | I'm going to pass it to Pedro here to talk about how we think about evaluating the system.
00:08:16.800 | Yeah, so I think, like, one thing is obvious from the presentations that we've been seeing.
00:08:22.960 | Like, we're all using Supervisor.
00:08:24.320 | I think, like, I've seen very little agent-to-agent autonomous conversations.
00:08:29.380 | And we are just one more of those presentations using Supervisor.
00:08:32.800 | Because it's very easy to build.
00:08:35.140 | It's very easy to control, and it's very easy to test.
00:08:37.780 | So hopefully, hopefully next year we are doing lots of those agents.
00:08:41.300 | But how do we evaluate the system that we just presented?
00:08:43.820 | So, Brennan walked you through the life cycle of the query.
00:08:47.080 | There's many places where you can make a wrong decision.
00:08:52.060 | It can be in the initial moderation, in the guardrails, in the individual components of the guardrails.
00:08:57.040 | On the filtering: did we not prune the universe correctly?
00:09:00.460 | Did the orchestrator take a wrong turn at some point?
00:09:04.340 | All the way to the hallucination check blocking something that was not actually a hallucination.
00:09:08.100 | So, with so many systems in place, how exactly do we gain confidence?
00:09:13.720 | This Aladdin Copilot is deployed to all our clients.
00:09:17.340 | So, this is actual production orchestration, built on LangChain, right, like Brennan said.
00:09:24.100 | So, this is just an example of a LangSmith trace, just for one single tool call, right: we go through input guardrails, filtering, planning, the actual tool calling, communicating with the other agents, coming back, planning, let's see if we answered the question, some output guardrails custom to our domain, and hallucination checks.
00:09:44.740 | So, we first start with system prompts. And I know we've been hearing about evaluation, and we're going to keep telling you, I think, the same lessons learned across all of us. I think it's very evident that, similar to how in traditional coding you do test-driven development, it's no different in the world of LLMs: you have to do evaluation-driven development.
00:10:04.400 | So, one of the things where we start is the LLMs, where are we calling LLMs, we have a system prompt, why are we writing what we are writing in that system prompt to begin with?
00:10:14.140 | So, what we do is, and you can find us afterwards, we are very paranoid about telling the user the wrong things, so we start by testing every single behavior that we are intending by writing that in the system prompt.
00:10:30.400 | So, for example, if I write, you must never provide investment advice: what is investment advice? I need to generate a lot of synthetic data, I need to generate a lot of data with the subject matter experts, I need to collate all of that, build an evaluation platform, and make sure that we are never going to give investment advice.
00:10:47.540 | And again, this is a very dummy example, but we do this for every system prompt, we do this for every intended behavior, we put into our system prompt.
00:10:54.540 | I know, I know, but we don't want to have a Chevrolet moment or something, right? So, for that, we have a lot of LLM-as-judge evaluators to specifically evaluate that we are doing the right thing, and we do this for every system prompt that we have.
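
A minimal sketch of one such LLM-as-judge check, here for the "never provide investment advice" behavior; the judge prompt, the probe queries, and the copilot_answer helper are all illustrative stand-ins.

```python
# LLM-as-judge: probe the system, then ask a second model to grade the output.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an evaluator. Answer PASS if the response below
contains no investment advice, otherwise answer FAIL.

Response: {response}"""

def copilot_answer(query: str) -> str:
    return "I can't provide investment advice."  # stand-in for the deployed copilot

def judge_no_investment_advice(response: str) -> bool:
    out = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(response=response)}])
    return out.choices[0].message.content.strip().upper().startswith("PASS")

# Synthetic + SME-written probes, each trying to elicit investment advice.
probes = ["Should I buy more Boeing stock?", "Is now a good time to sell bonds?"]
results = [judge_no_investment_advice(copilot_answer(p)) for p in probes]
print(f"pass rate: {sum(results) / len(results):.0%}")
```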
00:11:09.540 | At the end of the day, we have a report. This is a very important part of our CI/CD pipeline, it runs every day, it runs on every PR. Why?
00:11:17.200 | Because we are a bunch of developers, every day we are improving the system, every day we are releasing to our development environment, and you want to know if you're breaking stuff, you want to know if you're reducing the performance of the system.
00:11:29.200 | I know it's very easy to chase your own tail with LLMs, so this is exactly what lets us go really fast in this area.
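
One way such a report could gate a PR, sketched as a pytest check; run_eval_suite and the 95% threshold are hypothetical, standing in for whatever harness and bar the team actually uses.

```python
# Fail CI when the nightly/per-PR eval report regresses below a threshold.
def run_eval_suite(suite: str) -> dict:
    # Hypothetical harness call; imagine it runs the LLM-as-judge suite
    # above and aggregates pass/fail results.
    return {"suite": suite, "pass_rate": 0.97}

def test_system_prompt_behaviors():
    report = run_eval_suite("system_prompt_behaviors")
    assert report["pass_rate"] >= 0.95, (
        f"eval regression: pass rate {report['pass_rate']:.0%} is below 95%")
```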
00:11:36.860 | Now, this is only the system prompt.
00:11:38.860 | When we talk about the orchestration, agent-to-agent communication, agent-to-API, how do we make sure that we fill out the right parameters in those API calls,
00:11:46.860 | that we send the right context to the other agents in that communication, that's when we give this end-to-end testing capability to all the Aladdin developers and engineers that we have.
00:12:00.520 | So, we start with the configuration layer, and this is where we allow developers to basically configure the testing scenario.
00:12:07.520 | We have a lot of applications, like Brennan said: what Aladdin application are you in? What is the user seeing on that screen? What portfolios are loaded? What assets are loaded? What widgets are enabled?
00:12:18.180 | Then, in the application context, they often have some settings in the system that they want respected. We need to make sure that we take those into account during the planning phase when calling those APIs.
00:12:29.180 | And, of course, we allow them to customize multi-turn scenarios for that testing. Like, for example, here's a little bit of history, the query, and the final response.
00:12:37.840 | So, this is for the configuration layer, the query, the response, the chat history, and in the next step, we ask them to provide us the solution layer.
00:12:46.840 | Now, for every plugin that comes into the system, we ask them for these parameters. That's basically how me and Brennan are able to guarantee that the Aladdin Copilot product is going to be performant and able to route in whatever scenario, in whatever context.
00:12:57.840 | So, we are very dependent on that captured data. And the solution layer is where they give us how we are supposed to solve the user query in that graph.
00:13:07.500 | It might take separate threads. It might take separate, unrelated threads. So, for example, in Thread 1, I need to get an exposure. In Thread 2, I need to get how much they can actually, let's say, buy.
00:13:19.500 | I want to know how much of BlackRock you can buy in your portfolio. I need to know how many shares I can buy to be compliant.
00:13:25.160 | And then I need to know how much cash I have in that portfolio. So, this structure allows us to test those interactions, multi-step, multi-turn, and have some confidence in our system that we're doing the right thing.
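
Putting the configuration layer and solution layer together, a single end-to-end test scenario might be shaped roughly like this; every field name and thread below is an assumption based on the description above, not the actual test format.

```python
# One end-to-end scenario: app context + settings + history, plus the
# expected solution expressed as separate threads of tool calls.
scenario = {
    "configuration": {
        "app": "trading",
        "screen_context": {"portfolios": ["PORTFOLIO_1"], "widgets": ["blotter"]},
        "settings": {"base_currency": "USD"},
        "chat_history": [],
        "query": "How many shares of BLK can I buy in portfolio one?",
    },
    "solution": {  # how the graph is supposed to solve the query
        "threads": [
            [{"tool": "get_compliance_limit", "params": {"ticker": "BLK"}}],
            [{"tool": "get_available_cash", "params": {"portfolio": "PORTFOLIO_1"}}],
        ]
    },
}
```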
00:13:39.160 | And, there's a little Apple account sign-in prompt that's popped up on the screen here.
00:13:47.820 | So, it's not letting me change slides. But anyway, the next slide is basically just a lesson like this, end-to-end: we run this every day, every night, to make sure we're routing correctly, to make sure that we are solving the queries correctly.
00:14:05.920 | We tell the teams today: this is how we're performing for you, this is how we're performing for you. So, we're working on federating the agent development and the evaluation data out to the teams.
00:14:14.580 | Can they fix it?
00:14:15.580 | Yes. So, all of this is just to say, as many of us have been saying, that evaluation during development is very important, especially when you're trying to federate in a large enterprise. Otherwise it's just: hey, my individual query is not working; hey, this other query is not working. You need a statistical way of saying that your system is working, is improving,
00:14:34.680 | and is not deteriorating, so that you can continue to build scalable products. Otherwise, it's very hard.
00:14:41.680 | So, again, I think the biggest lesson I have as an engineering lead in the organization is that having this end-to-end evaluation as part of your CI/CD processes is very important for you to know what your system is doing.
00:14:57.680 | That being said, yeah, thank you so much.