How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock

Chapters
0:30 Introduction to BlackRock's AI Initiatives
1:31 Classifying AI Applications
2:22 Use Case: New Issue Operations
3:59 Challenges with Scaling AI Knowledge Apps
7:02 Architecture of BlackRock's AI Framework
8:32 Demonstration of the Sandbox
15:52 Key Takeaways from the Discussion
I'm Infant, director of engineering at BlackRock. This is my colleague Vaibhav, principal engineer, and we both work for the data teams at BlackRock. We're here to talk about how we can scale building custom applications at BlackRock, specifically AI applications.
Just to level set before I get into the details: our portfolio managers and analysts get a torrent of information on a daily basis, which ultimately results in a particular trade that the investment managers perform. These teams are responsible for everything from acquiring the data they need, to executing a trade, to running through compliance, all the way to the post-trading activities. So all of these teams have to build internal tools that are fairly complex for each of their domains.
Building these apps and pushing them out relatively quickly is the core problem. If you classify what kind of apps we are talking about, you'll see they fall into a few buckets. One is everything to do with document extraction: "I want to extract entities out of this document" goes in that bucket. Another is "I want to define a complex workflow or an automation": say I want to run through some number of steps and then integrate with my downstream systems. And then you have the normal Q&A-type systems, which are your chat interfaces. In each of these domains, we see a big opportunity to leverage models and LLMs to either augment our existing systems or automate parts of them. That is the domain we are speaking about.
I'll move quickly to one particular use case that came to us from a team within the investment operations space. This team is responsible for setting up securities: a new issue comes in, or there is a stock split for a particular security, and the team has to take that security and set it up in our internal systems before our portfolio managers or traders can act on it. So we had to build this tool for the investment operations team. At a super high level, it is an app that has to ingest a prospectus or a term sheet. The domain experts here are your business teams, your equity teams; they know how to set up these complex instruments. That team then works with the engineering teams to build the transformation logic and the like, and to integrate it with our downstream applications.
You can see that this process takes a long time. You're building an app, introducing new model providers, trying to put in new strategies; there are a lot of challenges to get a single app out. Full automation doesn't quite work right now because of the complexity and the domain knowledge that lives in people's heads.
So the big challenges with scale fall into three categories. One is that you spend a lot of time with your domain experts. In the first phase, where we have to extract from these documents, the prompt itself in our simplest case started as a couple of lines; before you knew it, you were trying to describe a financial instrument and the prompt was three paragraphs long. So there's this challenge of having to iterate over prompts, and, as the previous speaker mentioned, you need evals and a data set to measure how good your prompt actually is. That's the first set of challenges: creating and iterating on prompts.
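As a rough illustration of that prompt-eval loop (this is a sketch, not BlackRock's actual tooling; `extract_fields` stands in for whatever model call an app makes), scoring can be as simple as comparing extracted fields against a small labeled set:

```python
# Minimal prompt-eval sketch. extract_fields is a stand-in for an
# LLM-backed extraction function; labeled_docs is a small hand-built
# data set of (document_text, expected_fields) pairs.

def evaluate_prompt(extract_fields, labeled_docs):
    """Return per-field accuracy so you can see which prompts need work."""
    hits, totals = {}, {}
    for doc, expected in labeled_docs:
        got = extract_fields(doc)
        for field, want in expected.items():
            totals[field] = totals.get(field, 0) + 1
            if got.get(field) == want:
                hits[field] = hits.get(field, 0) + 1
    return {f: hits.get(f, 0) / totals[f] for f in totals}


# Toy usage with a fake extractor standing in for a model call.
def fake_extractor(doc):
    return {"issuer": "ACME", "callable": "callable" in doc.lower()}

labeled = [
    ("ACME Corp 5% notes, callable 2027", {"issuer": "ACME", "callable": True}),
    ("ACME Corp 3% notes", {"issuer": "ACME", "callable": False}),
]
print(evaluate_prompt(fake_extractor, labeled))  # per-field accuracy
```

Each prompt revision gets re-scored against the same labeled set, which turns "is the new prompt better?" into a measurable question.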
The second set of challenges is around LLM strategies. What I mean is that when you're building an AI app, you have to choose a strategy: am I going to use a RAG-based approach, or a chain-of-thought-based approach? Even for a simple task like data extraction, this varies a lot depending on the instrument. For a plain corporate bond, I can do this in context: pass the document to the model and get my fields back, as long as the document is small. But some documents are thousands of pages long, even 10,000 pages, and suddenly you can't pass more than about a million tokens into, say, the OpenAI models, so you need a different strategy. Often we have to mix different strategies with different prompts, in an iterative process where I play around with my prompts and with the LLM strategies, and we want to make that loop as quick as possible. On top of that you have context limitations, model limitations, and different vendors to juggle.
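The strategy choice described above can be pictured as a simple dispatch on document size; the token threshold, heuristic, and strategy names below are invented for illustration, not BlackRock's actual rules:

```python
# Hypothetical strategy selector: route a document to an extraction
# strategy based on an approximate token count. The 100k threshold and
# the strategy names are illustrative assumptions.

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def choose_strategy(text: str, context_limit: int = 100_000) -> str:
    if approx_tokens(text) <= context_limit:
        # Small document: stuff it directly into the model's context.
        return "in_context"
    # Too big for one call: chunk, embed, and retrieve relevant passages.
    return "rag"

print(choose_strategy("short term sheet"))   # in_context
print(choose_strategy("x" * 1_000_000))      # rag
```

In practice the dispatch would also consider the instrument type and the chosen model's real context window, but the shape of the decision is the same.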
Then the biggest challenge is deployment. You have your traditional challenges that come with deploying any app, but in the AI space there's a new question: what type of cluster am I going to deploy this to? Our equity team might come and say, hey, I need to analyze 500 research reports. For that, I probably need a GPU-based inference cluster that I can spin up. For the use case I described earlier, though, I don't want a dedicated cluster like that; instead I use a burstable cluster. All of that has to be defined so that our app deployment phase is as close to a CI/CD pipeline as we can get.
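One way to picture such a declarative deployment definition (the schema, field names, and thresholds are invented for illustration; the talk does not show the real operator's format):

```python
# Illustrative app definitions a deployment operator could consume.
from dataclasses import dataclass

@dataclass(frozen=True)
class AppDefinition:
    name: str
    workload: str        # "batch" | "interactive"
    cluster: str         # "gpu_inference" | "burstable"
    max_replicas: int

def pick_cluster(expected_docs_per_day: int, needs_local_models: bool) -> str:
    # Heavy local inference (e.g. analyzing hundreds of research reports)
    # justifies a GPU cluster; sporadic API-backed extraction can burst.
    if needs_local_models or expected_docs_per_day > 200:
        return "gpu_inference"
    return "burstable"

research_app = AppDefinition(
    name="research-report-analysis",
    workload="batch",
    cluster=pick_cluster(expected_docs_per_day=500, needs_local_models=True),
    max_replicas=8,
)
issuance_app = AppDefinition(
    name="new-issue-setup",
    workload="interactive",
    cluster=pick_cluster(expected_docs_per_day=20, needs_local_models=False),
    max_replicas=2,
)
print(research_app.cluster, issuance_app.cluster)  # gpu_inference burstable
```

The point is that once the cluster choice is expressed as data rather than as a manual decision, the deployment step can be driven by the same kind of automation as a CI/CD pipeline.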
Again, this is not an exhaustive list; what I'm trying to highlight is the set of challenges that come with scale. So here's what we did at BlackRock. I'll give you a high-level architecture, and then we'll dive into the details and mechanics of how this works and how we are able to build apps relatively quickly. A single app for a complex use case used to take us somewhere between three and eight months to build; we were able to compress that down to a couple of days. We achieved that by building this framework.
What I want to focus on is the top two boxes that you see: the sandbox and the app factory. The data platform and the developer platform are, as the names suggest, for ingesting data and so on. You have an orchestration layer with a pipeline that transforms the data, brings it into some new format, and then distributes it as an app or a report. What accelerates app development is federating out the pain points, the bottlenecks: prompt creation, extraction templates, extraction plans, and then the logic pieces, which we call transformers and executors. If you can get that sandbox into the hands of the domain experts, your iteration speed becomes really fast. You're saying, hey, I have these modular components, I can move across iterations really quickly, and then pass the result along to the app factory, which is our cloud-native operator that takes a definition and deploys it.
What I'm going to show you is a pretty slimmed-down version of the actual tool we use internally. To start with, we have two different components: one is the sandbox, the other is the factory. Think of the sandbox as a playground where operators can quickly build and refine extraction templates, run extraction on a set of documents, and then compare and contrast the results.
To get started with the extraction template itself: you might have seen a similar concept in other tools, both closed and open source, under prompt template management, where you have certain fields you want to extract out of the documents, their corresponding prompts, and some metadata you can associate with them, such as the data type you expect. But when these operators actually try to run extractions on these documents, they need far greater configuration capabilities than just configuring prompts and data types. They need multiple QC checks on the fields, lots of validations and constraints, and inter-field dependencies. As Infant mentioned, with the new issue operations use case, there can be a case where the security or the bond is callable, and then other fields, such as call date and call price, now need to have a value. So there are these inter-field dependencies that operators need to take into account.
Here's what a sample extraction template looks like. In this example template we have issuer, callable, call price, and call date fields. To add a new field, we define the field name, the data type that is expected, and the prompt. Not every field needs an extraction run; there might be a derived field that the operator expects, which is populated through some transformation logic. You also define whether the field is required, along with its constraints, and this is where you define what dependencies the field has.
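A toy version of such a template with an inter-field dependency check. The schema below (a `required_if` rule making call date and call price mandatory when the bond is callable) is invented to illustrate the idea, not the template format shown in the demo:

```python
# Toy extraction template with an inter-field dependency:
# if "callable" is true, "call_date" and "call_price" must be present.
# Field names and schema are illustrative.

TEMPLATE = {
    "issuer":     {"type": str,   "required": True},
    "callable":   {"type": bool,  "required": True},
    "call_date":  {"type": str,   "required": False,
                   "required_if": ("callable", True)},
    "call_price": {"type": float, "required": False,
                   "required_if": ("callable", True)},
}

def validate(record: dict, template: dict = TEMPLATE) -> list:
    """Return a list of validation errors for an extracted record."""
    errors = []
    for field, spec in template.items():
        value = record.get(field)
        required = spec["required"]
        dep = spec.get("required_if")
        if dep is not None and record.get(dep[0]) == dep[1]:
            required = True  # dependency makes this field mandatory
        if required and value is None:
            errors.append(f"{field}: missing")
        elif value is not None and not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
    return errors

print(validate({"issuer": "ACME", "callable": False}))
# []  (non-callable bond: call fields may be absent)
print(validate({"issuer": "ACME", "callable": True}))
# ['call_date: missing', 'call_price: missing']
```

Running the checks at template level, rather than in each app, is what lets domain experts encode rules like this without touching application code.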
The next thing is document management itself. This is where documents are ingested from the data platform; they are tagged according to their business category, labeled, embedded, and so on. While Vaibhav brings that up: in essence what we're saying is that we have built a tool, with a UI component and a framework, that takes these different modular components and puts them in the hands of the domain experts so they can build out their app really quickly.
Let me walk you through what happens next. Once you have set up the extraction templates and document management, the operators run the extraction. That's where they see the values they expect from these documents and review them. The thing we have seen with these operators is that most of the tools they have used in the past stop at presenting the results. When it comes to "I need to take this result and pass it to the downstream processes," the process is very manual: they have to download a CSV or JSON file, run manual or ad hoc transformations, and then push it to the downstream process. So what we have done, and again I can't show you all of it, is build a low-code, no-code framework where the operators can build these transformation and execution workflows and have an end-to-end pipeline running.
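Conceptually, such a workflow chains extraction, transformation, and execution steps. A minimal sketch, with invented step names and payloads (a real executor would call the downstream system rather than just tagging the record):

```python
# Minimal sketch of a low-code-style pipeline: each step is a plain
# function, and the pipeline is just their composition.
from functools import reduce

def extract(doc: str) -> dict:
    # Stand-in for the LLM extraction step.
    return {"issuer": doc.split()[0], "callable": "callable" in doc}

def transform(record: dict) -> dict:
    # Example "transformer": normalize values for a downstream system.
    return {**record, "issuer": record["issuer"].upper()}

def execute(record: dict) -> dict:
    # Example "executor": mark the record as submitted; a real executor
    # would push it to the downstream API instead.
    return {**record, "status": "submitted"}

def run_pipeline(doc: str, steps=(extract, transform, execute)) -> dict:
    return reduce(lambda value, step: step(value), steps, doc)

print(run_pipeline("acme 5% notes, callable 2027"))
```

Because each step is a self-contained component, an operator can swap a transformer or add a QC step without rebuilding the rest of the pipeline, which is the point of the low-code framing.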
We'll conclude with the key takeaways; I would say there are three. First, invest heavily in prompt engineering skills for your domain experts, especially in the financial space; defining and describing these documents is really hard. Second, educate the firm on what an LLM strategy means and how to fit these different pieces to your use case. Third, all of this is great in experimentation and prototyping mode, but if you want to take it further, you have to really evaluate your ROI: is spinning up an AI app actually more expensive than an off-the-shelf product that does it quicker? Those are the three key takeaways for building apps at scale.
One more thing I'll add: human in the loop is super important. We're all tempted to go fully agentic with this, but in the financial space, with compliance and regulations, you need that four-eyes check and you need the human in the loop. So design for human in the loop first if you are in a highly regulated environment.
And as Infant said, one thing we couldn't show is the app factory component. Operators take everything from their iteration cycles in the sandbox, the extraction templates, the transformers and executors built through the workflow pipeline, and, through our app ecosystem within BlackRock, they build custom applications that are then exposed to users. Users of these apps don't have to worry about configuring templates or figuring out how to integrate the result values into the final downstream processes. They are presented with an end-to-end app where they can just upload documents, run extraction, and have the whole pipeline set up and running. With that, we'll open up for questions.
Good morning. I have a question which may or may not be directly related to the architecture you developed; you can tell me, and we can discuss later. One of your key takeaways was to invest heavily in prompt engineering. You have essentially automated the process from the leaf level, for example a company coming to an IPO, all the way through cataloging via ETL processes and finally to the data analytics. Now, your CEO, who looks at the balance sheet, assets and liabilities, will be using your AI the most. At the lowest level there are features like term, maturity, and duration; there are so many metrics at the leaf level. How are you transforming those features from the lowest level to the highest level? I'm looking for an answer in reference to decentralized data.
I can give you a quick answer and then we can discuss in detail offline. The framework we built specifically targets the investment operations domain experts who are trying to build applications. To your question of what the CEO cares about, say constructing a memo that gives me my assets and liabilities, X, Y, Z: those would be different initiatives, which may or may not use our particular framework. But yes, there are many reusable components here that people can use.
I do a lot of document products for an insurance company, and we run into pretty much the same problems you do. So I wonder, how do you build walls around your information extraction from the documents? There are so many things that can go wrong, starting from OCR. The LLM doesn't understand what all these terms actually mean, no matter how you prompt it.
We had material on all of that that we wanted to show. The short answer, in terms of information security and the boundaries we put in place against data leakage and errors: you can think of it as different layers, all the way from your infra, to the platform, to the application, to the user level, with different controls and policies in place at each. There are policies across the stack, which we can get into in detail later, that address your concerns. And to your point, we use different strategies based on the use case at hand; it's not just one RAG approach versus another. There are multiple model providers we use, multiple strategies, different engineering tweaks. So it's quite a complex process.