How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock

Chapters
0:30 Introduction to BlackRock's AI Initiatives
1:31 Classifying AI Applications
2:22 Use Case: New Issue Operations
3:59 Challenges with Scaling AI Knowledge Apps
7:02 Architecture of BlackRock's AI Framework
8:32 Demonstration of the Sandbox
15:52 Key Takeaways from the Discussion
I'm Infant, director of engineering at BlackRock. This is my colleague Vaibhav, principal engineer, and we both work for the data teams at BlackRock. We're here to talk about how we can scale building custom applications at BlackRock, specifically AI applications.
Just to level set before I get into the details: our portfolio managers and analysts get a torrent of information on a daily basis, which ultimately results in a particular trade that the investment managers perform. These teams are responsible for everything from acquiring the data they need, to executing a trade, to running through compliance, all the way to the post-trading activities. So all of these teams have to build internal tools that are fairly complex for each of their domains.
Building these apps and pushing them out relatively quickly is the core problem. If you classify what kind of apps we are talking about, you'll see they fall into a few buckets. One is everything to do with document extraction: "I want to extract entities out of this document" goes in that bucket. Another is "I want to define a complex workflow or an automation": say I want to run through some number of steps and then integrate with my downstream systems. And then you have the normal Q&A-type systems, which are your chat interfaces. In each of these domains, we see a big opportunity to leverage models and LLMs to either augment our existing systems or automate parts of them. That is the domain we are speaking about.
I'll move quickly to one particular use case that came to us from a team within the investment operations space. This team is responsible for setting up securities: a new issue comes in, or there is a stock split for a particular security, and the team has to take that security and set it up in our internal systems before our portfolio managers or traders can act on it. So we had to build this tool for the investment operations team. At a super high level, it is an app that has to ingest a prospectus or a term sheet. The domain experts here are your business teams, your equity teams; they know how to set up these complex instruments. That team then works with the engineering teams to build the transformation logic and the like, and to integrate it with our downstream applications.
You can see that this process takes a long time. You're building an app, introducing new model providers, trying to put in new strategies; there are a lot of challenges to get a single app out. Full automation doesn't quite work right now because of the complexity and the domain knowledge that lives in people's heads.
So the big challenges with scale fall into three categories. One is that you spend a lot of time with your domain experts. In the first phase, where we have to extract from these documents, the prompt itself in our simplest case started as a couple of lines; before you knew it, you were trying to describe a financial instrument and the prompt was three paragraphs long. So there's this challenge of having to iterate over prompts, and, as the previous speaker mentioned, you need evals and a data set to measure how good your prompt actually is. That's the first set of challenges: creating and iterating on prompts.
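As a rough illustration of that prompt-eval loop (this is a sketch, not BlackRock's actual tooling; `extract_fields` stands in for whatever model call an app makes), scoring can be as simple as comparing extracted fields against a small labeled set:

```python
# Minimal prompt-eval sketch. extract_fields is a stand-in for an
# LLM-backed extraction function; labeled_docs is a small hand-built
# data set of (document_text, expected_fields) pairs.

def evaluate_prompt(extract_fields, labeled_docs):
    """Return per-field accuracy so you can see which prompts need work."""
    hits, totals = {}, {}
    for doc, expected in labeled_docs:
        got = extract_fields(doc)
        for field, want in expected.items():
            totals[field] = totals.get(field, 0) + 1
            if got.get(field) == want:
                hits[field] = hits.get(field, 0) + 1
    return {f: hits.get(f, 0) / totals[f] for f in totals}


# Toy usage with a fake extractor standing in for a model call.
def fake_extractor(doc):
    return {"issuer": "ACME", "callable": "callable" in doc.lower()}

labeled = [
    ("ACME Corp 5% notes, callable 2027", {"issuer": "ACME", "callable": True}),
    ("ACME Corp 3% notes", {"issuer": "ACME", "callable": False}),
]
print(evaluate_prompt(fake_extractor, labeled))  # per-field accuracy
```

Each prompt revision gets re-scored against the same labeled set, which turns "is the new prompt better?" into a measurable question.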
The second set of challenges is around LLM strategies. What I mean is that when you're building an AI app, you have to choose a strategy: am I going to use a RAG-based approach, or a chain-of-thought-based approach? Even for a simple task like data extraction, this varies a lot depending on the instrument. For a plain corporate bond, I can do this in context: pass the document to the model and get my fields back, as long as the document is small. But some documents are thousands of pages long, even 10,000 pages, and suddenly you can't pass more than about a million tokens into, say, the OpenAI models, so you need a different strategy. Often we have to mix different strategies with different prompts, in an iterative process where I play around with my prompts and with the LLM strategies, and we want to make that loop as quick as possible. On top of that you have context limitations, model limitations, and different vendors to juggle.
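The strategy choice described above can be pictured as a simple dispatch on document size; the token threshold, heuristic, and strategy names below are invented for illustration, not BlackRock's actual rules:

```python
# Hypothetical strategy selector: route a document to an extraction
# strategy based on an approximate token count. The 100k threshold and
# the strategy names are illustrative assumptions.

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def choose_strategy(text: str, context_limit: int = 100_000) -> str:
    if approx_tokens(text) <= context_limit:
        # Small document: stuff it directly into the model's context.
        return "in_context"
    # Too big for one call: chunk, embed, and retrieve relevant passages.
    return "rag"

print(choose_strategy("short term sheet"))   # in_context
print(choose_strategy("x" * 1_000_000))      # rag
```

In practice the dispatch would also consider the instrument type and the chosen model's real context window, but the shape of the decision is the same.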
Then the biggest challenge is deployment. You have your traditional challenges that come with deploying any app, but in the AI space there's a new question: what type of cluster am I going to deploy this to? Our equity team might come and say, hey, I need to analyze 500 research reports. For that, I probably need a GPU-based inference cluster that I can spin up. For the use case I described earlier, though, I don't want a dedicated cluster like that; instead I use a burstable cluster. All of that has to be defined so that our app deployment phase is as close to a CI/CD pipeline as we can get.
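One way to picture such a declarative deployment definition (the schema, field names, and thresholds are invented for illustration; the talk does not show the real operator's format):

```python
# Illustrative app definitions a deployment operator could consume.
from dataclasses import dataclass

@dataclass(frozen=True)
class AppDefinition:
    name: str
    workload: str        # "batch" | "interactive"
    cluster: str         # "gpu_inference" | "burstable"
    max_replicas: int

def pick_cluster(expected_docs_per_day: int, needs_local_models: bool) -> str:
    # Heavy local inference (e.g. analyzing hundreds of research reports)
    # justifies a GPU cluster; sporadic API-backed extraction can burst.
    if needs_local_models or expected_docs_per_day > 200:
        return "gpu_inference"
    return "burstable"

research_app = AppDefinition(
    name="research-report-analysis",
    workload="batch",
    cluster=pick_cluster(expected_docs_per_day=500, needs_local_models=True),
    max_replicas=8,
)
issuance_app = AppDefinition(
    name="new-issue-setup",
    workload="interactive",
    cluster=pick_cluster(expected_docs_per_day=20, needs_local_models=False),
    max_replicas=2,
)
print(research_app.cluster, issuance_app.cluster)  # gpu_inference burstable
```

The point is that once the cluster choice is expressed as data rather than as a manual decision, the deployment step can be driven by the same kind of automation as a CI/CD pipeline.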
Again, this is not an exhaustive list; what I'm trying to highlight is the set of challenges that come with scale. So here's what we did at BlackRock. I'll give you a high-level architecture, and then we'll dive into the details and mechanics of how this works and how we are able to build apps relatively quickly. A single app for a complex use case used to take us somewhere between three and eight months to build; we were able to compress that down to a couple of days. We achieved that by building this framework.
What I want to focus on is the top two boxes that you see: the sandbox and the app factory. The data platform and the developer platform are, as the names suggest, for ingesting data and so on. You have an orchestration layer with a pipeline that transforms the data, brings it into some new format, and then distributes it as an app or a report. What accelerates app development is federating out the pain points, the bottlenecks: prompt creation, extraction templates, extraction plans, and then the logic pieces, which we call transformers and executors. If you can get that sandbox into the hands of the domain experts, your iteration speed becomes really fast. You're saying, hey, I have these modular components, I can move across iterations really quickly, and then pass the result along to the app factory, which is our cloud-native operator that takes a definition and deploys it.
What I'm going to show you is a pretty slimmed-down version of the actual tool we use internally. To start with, we have two different components: one is the sandbox, the other is the factory. Think of the sandbox as a playground where operators can quickly build and refine extraction templates, run extraction on a set of documents, and then compare and contrast the results.
To get started with the extraction template itself: you might have seen a similar concept in other tools, both closed and open source, under prompt template management, where you have certain fields you want to extract out of the documents, their corresponding prompts, and some metadata you can associate with them, such as the data type you expect. But when these operators actually try to run extractions on these documents, they need far greater configuration capabilities than just configuring prompts and data types. They need multiple QC checks on the fields, lots of validations and constraints, and inter-field dependencies. As Infant mentioned, with the new issue operations use case, there can be a case where the security or the bond is callable, and then other fields, such as call date and call price, now need to have a value. So there are these inter-field dependencies that operators need to take into account.
Here's what a sample extraction template looks like. In this example template we have issuer, callable, call price, and call date fields. To add a new field, we define the field name, the data type that is expected, and the prompt. Not every field needs an extraction run; there might be a derived field that the operator expects, which is populated through some transformation logic. You also define whether the field is required, along with its constraints, and this is where you define what dependencies the field has.
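A toy version of such a template with an inter-field dependency check. The schema below (a `required_if` rule making call date and call price mandatory when the bond is callable) is invented to illustrate the idea, not the template format shown in the demo:

```python
# Toy extraction template with an inter-field dependency:
# if "callable" is true, "call_date" and "call_price" must be present.
# Field names and schema are illustrative.

TEMPLATE = {
    "issuer":     {"type": str,   "required": True},
    "callable":   {"type": bool,  "required": True},
    "call_date":  {"type": str,   "required": False,
                   "required_if": ("callable", True)},
    "call_price": {"type": float, "required": False,
                   "required_if": ("callable", True)},
}

def validate(record: dict, template: dict = TEMPLATE) -> list:
    """Return a list of validation errors for an extracted record."""
    errors = []
    for field, spec in template.items():
        value = record.get(field)
        required = spec["required"]
        dep = spec.get("required_if")
        if dep is not None and record.get(dep[0]) == dep[1]:
            required = True  # dependency makes this field mandatory
        if required and value is None:
            errors.append(f"{field}: missing")
        elif value is not None and not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
    return errors

print(validate({"issuer": "ACME", "callable": False}))
# []  (non-callable bond: call fields may be absent)
print(validate({"issuer": "ACME", "callable": True}))
# ['call_date: missing', 'call_price: missing']
```

Running the checks at template level, rather than in each app, is what lets domain experts encode rules like this without touching application code.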
The next thing is document management itself. This is where documents are ingested from the data platform; they are tagged according to their business category, labeled, embedded, and so on. While Vaibhav brings that up: in essence what we're saying is that we have built a tool, with a UI component and a framework, that takes these different modular components and puts them in the hands of the domain experts so they can build out their app really quickly.
Let me walk you through what happens next. Once you have set up the extraction templates and document management, the operators run the extraction. That's where they see the values they expect from these documents and review them. The thing we have seen with these operators is that most of the tools they have used in the past stop at presenting the results. When it comes to "I need to take this result and pass it to the downstream processes," the process is very manual: they have to download a CSV or JSON file, run manual or ad hoc transformations, and then push it to the downstream process. So what we have done, and again I can't show you all of it, is build a low-code, no-code framework where the operators can build these transformation and execution workflows and have an end-to-end pipeline running.
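Conceptually, such a workflow chains extraction, transformation, and execution steps. A minimal sketch, with invented step names and payloads (a real executor would call the downstream system rather than just tagging the record):

```python
# Minimal sketch of a low-code-style pipeline: each step is a plain
# function, and the pipeline is just their composition.
from functools import reduce

def extract(doc: str) -> dict:
    # Stand-in for the LLM extraction step.
    return {"issuer": doc.split()[0], "callable": "callable" in doc}

def transform(record: dict) -> dict:
    # Example "transformer": normalize values for a downstream system.
    return {**record, "issuer": record["issuer"].upper()}

def execute(record: dict) -> dict:
    # Example "executor": mark the record as submitted; a real executor
    # would push it to the downstream API instead.
    return {**record, "status": "submitted"}

def run_pipeline(doc: str, steps=(extract, transform, execute)) -> dict:
    return reduce(lambda value, step: step(value), steps, doc)

print(run_pipeline("acme 5% notes, callable 2027"))
```

Because each step is a self-contained component, an operator can swap a transformer or add a QC step without rebuilding the rest of the pipeline, which is the point of the low-code framing.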
We'll conclude with the key takeaways; I would say there are three. First, invest heavily in prompt engineering skills for your domain experts, especially in the financial space; defining and describing these documents is really hard. Second, educate the firm on what an LLM strategy means and how to fit these different pieces to your use case. Third, all of this is great in experimentation and prototyping mode, but if you want to take it further, you have to really evaluate your ROI: is spinning up an AI app actually more expensive than an off-the-shelf product that does it quicker? Those are the three key takeaways for building apps at scale.
One more thing I'll add: human in the loop is super important. We're all tempted to go fully agentic with this, but in the financial space, with compliance and regulations, you need that four-eyes check and you need the human in the loop. So design for human in the loop first if you are in a highly regulated environment.
And as Infant said, one thing we couldn't show is the app factory component. Operators take everything from their iteration cycles in the sandbox, the extraction templates, the transformers and executors built through the workflow pipeline, and, through our app ecosystem within BlackRock, they build custom applications that are then exposed to users. Users of these apps don't have to worry about configuring templates or figuring out how to integrate the result values into the final downstream processes. They are presented with an end-to-end app where they can just upload documents, run extraction, and have the whole pipeline set up and running. With that, we'll open up for questions.
Good morning. I have a question which may or may not be directly related to the architecture you developed; you can tell me, and we can discuss later. One of your key takeaways was to invest heavily in prompt engineering. You have essentially automated the process from the leaf level, for example a company coming to an IPO, all the way through cataloging via ETL processes and finally to the data analytics. Now, your CEO, who looks at the balance sheet, assets and liabilities, will be using your AI the most. At the lowest level there are features like term, maturity, and duration; there are so many metrics at the leaf level. How are you transforming those features from the lowest level to the highest level? I'm looking for an answer in reference to decentralized data.
I can give you a quick answer and then we can discuss in detail offline. The framework we built specifically targets the investment operations domain experts who are trying to build applications. To your question of what the CEO cares about, say constructing a memo that gives me my assets and liabilities, X, Y, Z: those would be different initiatives, which may or may not use our particular framework. But yes, there are many reusable components here that people can use.
I do a lot of document products for an insurance company, and we run into pretty much the same problems you do. So I wonder, how do you build walls around your information extraction from the documents? There are so many things that can go wrong, starting from OCR. The LLM doesn't understand what all these terms actually mean, no matter how you prompt it.
We had material on all of that that we wanted to show. The short answer, in terms of information security and the boundaries we put in place against data leakage and errors: you can think of it as different layers, all the way from your infra, to the platform, to the application, to the user level, with different controls and policies in place at each. There are policies across the stack, which we can get into in detail later, that address your concerns. And to your point, we use different strategies based on the use case at hand; it's not just one RAG approach versus another. There are multiple model providers we use, multiple strategies, different engineering tweaks. So it's quite a complex process.