Good morning everyone. I'm really thrilled to be here today at the Interrupt conference. A big thank you to the organizers for putting together such a great event and for inviting us to share our journey with you today. My name is David, and I'm here with my colleague Jane from the J.P. Morgan Private Bank.
So at the private bank, we're part of the investment research team. And this is the team that's responsible for curating and managing lists of investment products and opportunities for our clients. Now, when I talk about lists, we're not talking about a few dozen or a few hundred products; we're talking about thousands of products, each backed by many years of very valuable data.
So when you have such extensive lists of diverse products, questions are inevitable. And when there's a question, our small research team needs to go find the answers. So we go digging around databases and files of materials, and we piece together answers to the questions that come across our desks each and every day.
Now, not only is this a very manual and time-consuming process, but it limits our ability to scale and makes it difficult for us to provide real insight into the products we have on our platform. So as a group, we got together and challenged ourselves: let's come up with a way to automate the investment research process, aiming to deliver precise and accurate results.
Today when you have a question you come to me. You come to David and David will give you an answer. But tomorrow you'll be able to go Ask David, our AI-powered solution designed to transform the way we answer investment questions. With Ask David, we're aiming to provide curated answers, insights, and analytics delivered to you as quickly as you can ask a question.
Now, I know you're all probably asking yourselves: David, are you just going to put yourself out of a job? Not quite. What we're doing is building a tool to make our jobs easier and much more efficient. The stakes here are high: billions of dollars of assets are at risk, and we're committed to building a tool that not only meets but exceeds the expectations of all of our stakeholders.
Looking to the future, we're really excited about all the possibilities that Ask David potentially brings to the table. Now, to dive deeper into the technical magic behind Ask David, I'll turn it over to Jane, who will walk you through the nuts and bolts of how we're making our vision a reality.
She might even let you in on what Ask David stands for, because I promise you, I didn't name this after myself. Thank you. Thank you, David. Ask David is a domain-specific QA agent. Let's start with some terminology. First of all, we have decades of structured data. That is the backbone of many production systems already up and running.
Prior to the introduction of an agent, users had access to the same data, but they had to navigate different systems and manually synthesize the information. An agent can introduce efficiency and an integrated user experience. Next, we have unstructured data. As a bank, we maintain a vast amount of documentation, including emails, meeting notes, and presentations.
With the rise of virtual meetings, we also have an increasing amount of video and audio recordings. How do we make full use of that information? The advancement of LLMs is bringing tremendous opportunity in this area. Lastly, as a research team, we have a suite of proprietary models and analytics, which are designed to derive insights and visualizations to support decision-making.
Previously, it would require a human expert to conduct this kind of analysis and offer white-glove service. With the help of an agent, we can scale the insight generation and make our service available to more of our clients. Now, imagine being a financial advisor in a client meeting, and your client suddenly brings up a fund and asks you why it was terminated.
Believe me, it's actually a very loaded question. In the past, you would reach out to our investment research team, talk with the real David, and figure out: what's the status and history of the fund, what's the reason behind the termination, what does our research say about this fund, what are similar funds, and how do I curate this answer for this specific client? And then you would put together a presentation yourself, manually.
With the help of an agent, you can get access to the same data, analytics, insights, and visualization right in your meeting, enabling real-time decision-making. That is our vision of Ask David, and you probably guessed it: David stands for Data, Analytics, Visualization, Insights, and Decision-making system. So this is our approach to building Ask David, which is a multi-agent system.
Starting from our supervisor agent, which acts as our orchestrator: it talks with our end user, understands their intent, and delegates the task to one or more of the sub-agents on the team. The supervisor agent has access to both short-term and long-term memory so that it can customize the user experience. It also knows when to invoke a human in the loop to ensure the highest level of accuracy and reliability.
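To make the pattern concrete, here is a minimal sketch of what such a supervisor could look like, assuming a LangGraph-style implementation (the talk does not name the framework); the node names and the keyword-based router standing in for the supervisor's LLM call are purely illustrative.

```python
from typing import Literal

from langgraph.graph import StateGraph, MessagesState, START, END

def route(state: MessagesState) -> Literal["structured_data", "doc_search", "analytics"]:
    """Stand-in for the supervisor's LLM call that classifies the user's intent."""
    question = state["messages"][-1].content.lower()
    if "why" in question or "document" in question:
        return "doc_search"
    if "chart" in question or "compute" in question:
        return "analytics"
    return "structured_data"

def structured_data(state: MessagesState) -> dict:
    return {"messages": [("ai", "answer derived from SQL/API data")]}

def doc_search(state: MessagesState) -> dict:
    return {"messages": [("ai", "answer derived from document retrieval")]}

def analytics(state: MessagesState) -> dict:
    return {"messages": [("ai", "answer derived from proprietary analytics")]}

builder = StateGraph(MessagesState)
for name, fn in [("structured_data", structured_data),
                 ("doc_search", doc_search),
                 ("analytics", analytics)]:
    builder.add_node(name, fn)
    builder.add_edge(name, END)
builder.add_conditional_edges(START, route)  # the supervisor's delegation step
graph = builder.compile()

print(graph.invoke({"messages": [("user", "Why was fund ABC terminated?")]})["messages"][-1].content)
```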
Next, we have our structured data agent. It translates natural language into either SQL queries or API calls, and it uses a large language model to summarize the resulting data on top.
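As a rough illustration of that translate-then-summarize loop (a sketch only: the schema, prompts, and the `llm` callable are assumptions, not the bank's actual setup):

```python
import sqlite3

SCHEMA = "CREATE TABLE funds (fund_id TEXT, name TEXT, status TEXT, terminated_on TEXT)"

def answer_structured(question: str, llm, conn: sqlite3.Connection) -> str:
    # 1. Ask the model to translate the natural-language question into SQL.
    sql = llm(f"Schema:\n{SCHEMA}\n\nWrite a single SQLite query answering: {question}\nSQL:")
    # 2. Execute the generated query (a production system would validate it
    #    and run it read-only).
    rows = conn.execute(sql).fetchall()
    # 3. Use the model again to summarize the raw rows, as described above.
    return llm(f"Question: {question}\nRows: {rows}\nAnswer in one sentence.")
```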
Unstructured data is a little bit different from structured data: usually it requires some kind of preprocessing. But as long as you clean it, vectorize it, and put it into a database, we can employ a RAG agent on top to effectively retrieve the information.
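A bare-bones sketch of that retrieve-then-answer step; the in-memory store and cosine ranking stand in for a real vector database, and `embed` and `llm` are assumed callables:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, store, k: int = 3) -> list[str]:
    """store holds (vector, chunk) pairs built offline by chunking and embedding documents."""
    ranked = sorted(store, key=lambda pair: -cosine(query_vec, pair[0]))
    return [chunk for _, chunk in ranked[:k]]

def answer_unstructured(question: str, embed, llm, store) -> str:
    context = "\n".join(retrieve(embed(question), store))
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```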
Lastly, we have the analytics agent. We talked about our proprietary models and analytics; they are usually exposed as APIs or programming libraries. For a simple query that can be directly answered by API calls, we use a ReAct agent with the APIs as tools. But for more complex queries, we use text-to-code generation and keep human supervision over the execution.
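For the simple path, a hedged sketch using LangGraph's prebuilt ReAct helper; `fund_performance` is an invented stand-in for the proprietary analytics APIs, and the string model identifier assumes a recent LangGraph version with an OpenAI key configured:

```python
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def fund_performance(fund_id: str) -> str:
    """Return trailing returns for a fund (stub for a proprietary analytics API)."""
    return f"{fund_id}: 1y -12.4%, 3y -3.1% annualized"

# The ReAct loop lets the model decide when to call the API-as-tool.
agent = create_react_agent("openai:gpt-4o-mini", tools=[fund_performance])
result = agent.invoke({"messages": [("user", "How has fund ABC performed lately?")]})
print(result["messages"][-1].content)
```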
This graph is our end-to-end workflow, and it starts with a planning node. You probably noticed there are two subgraphs here. One is a general QA flow: any general question, for example "how do I invest in gold," goes to the left-hand-side subgraph. And if the question is about a specific fund, it goes to the right-hand-side flow.
Each flow, as you can see, is equipped with one supervisor agent and a team of specialized agents. Once we retrieve the answer, you probably noticed there is one node to personalize the answer and another node to do a reflection check; I will explain both in detail in the example to follow.
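Wiring-wise, the planning node's split into two subgraphs might look roughly like this, again assuming LangGraph; the keyword check stands in for the planning node's LLM classification, and the trivial subgraphs stand in for the supervisor-plus-specialists teams:

```python
from typing import Literal

from langgraph.graph import StateGraph, MessagesState, START, END

def plan(state: MessagesState) -> Literal["general_qa", "fund_flow"]:
    # In reality an LLM decides whether the question targets a specific fund.
    question = state["messages"][-1].content.lower()
    return "fund_flow" if "fund" in question else "general_qa"

def make_flow(label: str):
    """Build a trivial stand-in for a supervisor-plus-specialists subgraph."""
    sub = StateGraph(MessagesState)
    sub.add_node("answer", lambda s, label=label: {"messages": [("ai", f"{label} answer")]})
    sub.add_edge(START, "answer")
    sub.add_edge("answer", END)
    return sub.compile()

builder = StateGraph(MessagesState)
builder.add_node("general_qa", make_flow("general QA"))   # compiled subgraphs
builder.add_node("fund_flow", make_flow("fund-specific"))  # attached as nodes
builder.add_conditional_edges(START, plan)
builder.add_edge("general_qa", END)
builder.add_edge("fund_flow", END)
graph = builder.compile()
```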
The whole flow ends with summarization. So now, back to our client question: why was this fund terminated? This is how our agent handles it. As you can see on the right-hand side, the agent's answer was that the fund was terminated due to a performance issue, and you can actually click into the reference link to see more about the fund's performance and the reason behind it.
What really happened behind the scenes? From the planning node, we understand that this user inquiry is related to a specific fund, so it goes to the specific-fund flow. The supervisor agent inside extracts the fund information as context and determines that the doc search agent is the right one to solve the problem.
Once the doc search agent gets that information, it triggers the tools underneath to get the data from MongoDB.
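One of those underlying tools might look like this minimal pymongo sketch; the connection string, database, and collection names are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
docs = client["research"]["fund_documents"]

def fetch_fund_history(fund_id: str, limit: int = 5) -> list[dict]:
    """Pull the most recent research notes for a fund (a production system
    might use vector search over the same collection instead)."""
    return list(docs.find({"fund_id": fund_id}).sort("date", -1).limit(limit))
```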
Once we retrieve that information, it gets personalized. The same information can be presented in different ways; it depends on who is asking. Take the fund termination reason we're discussing here: a due diligence specialist may demand a very detailed answer, while an advisor may just need a general one. So this personalization node tailors the answer based on the user's role.
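A sketch of such a role-aware rewrite; the role prompts and the `llm` callable are illustrative assumptions:

```python
ROLE_STYLES = {
    "due_diligence": "Give a detailed answer citing dates, metrics, and sources.",
    "advisor": "Give a brief, client-friendly summary in two sentences.",
}

def personalize(answer: str, role: str, llm) -> str:
    # Re-render the same retrieved facts in the register the user role expects.
    style = ROLE_STYLES.get(role, ROLE_STYLES["advisor"])
    return llm(f"{style}\n\nFacts:\n{answer}")
```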
Next, we have the reflection node. It uses an LLM as judge to make sure the answer we generated makes sense. If it doesn't, what do we do? We try again. And the last one: the whole flow ends with summarization. In the summarization node, we do several things: we summarize the conversation, we update the memory, and we return the final answer.
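The reflect-and-retry loop could be wired like this, assuming LangGraph; the judge here is stubbed to accept the second attempt, where a real system would call an LLM:

```python
from typing import Literal, TypedDict

from langgraph.graph import StateGraph, START, END

class QAState(TypedDict):
    question: str
    answer: str
    attempts: int

def draft(state: QAState) -> dict:
    # Stand-in for the retrieval and generation steps upstream.
    return {"answer": f"draft #{state['attempts'] + 1}", "attempts": state["attempts"] + 1}

def reflect(state: QAState) -> Literal["draft", "summarize"]:
    # LLM-as-judge in reality; here we simply accept the second attempt.
    return "summarize" if state["attempts"] >= 2 else "draft"

def summarize(state: QAState) -> dict:
    # Summarize the conversation, update memory, return the final answer.
    return {"answer": state["answer"] + " (final)"}

builder = StateGraph(QAState)
builder.add_node("draft", draft)
builder.add_node("summarize", summarize)
builder.add_edge(START, "draft")
builder.add_conditional_edges("draft", reflect)
builder.add_edge("summarize", END)
graph = builder.compile()

print(graph.invoke({"question": "Why was the fund terminated?", "answer": "", "attempts": 0}))
```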
So it was quite a journey working on this multi-agent application, and we are very excited to share the lessons learned. Number one: start simple and refactor often. I know I showed you a fairly complex diagram earlier, but we didn't build that diagram from day one. Day one, as you can see, was a plain-vanilla ReAct agent.
We tried to understand how it works. From there, we worked on the specialized agent in picture two, which is actually a RAG agent, and we started to customize the flow. Once we got comfortable with the performance of the specialized agent, we integrated it into our multi-agent flow in picture three, with the supervisor.
And picture four is our current state: we have a subgraph generated for each specific kind of intent. Right now we only have two intents, but we can scale easily with this architecture. So I talked about fast iteration. But how do we know that with every iteration we're moving in the right direction?
The answer is evaluation-driven development. Everyone knows that, compared to traditional AI projects, GenAI projects actually have a shorter development phase but a longer evaluation phase. So our suggestion is to start early: think about the metrics and what kind of goal you want to achieve.
As we are in the financial industry, accuracy is obviously one of the most important things, and continuous evaluation helps you gain confidence that you are improving day by day. Here are some additional tips based on our own experience with evaluation. As you can see on the screen, the dark blue bars are the evaluation metrics for our main flow.
And the green one is an example from one of our sub-agents. So my tip number one is: make sure you independently evaluate your sub-agents. The key of any evaluation is to find places to improve, right? This helps you figure out the weak link that's holding back your accuracy. The second point is that it depends on how you design your agents.
Make sure you pick the right metrics. If you have a summarization step, you may want to check whether your summarization is concise or not, so conciseness is one of the metrics you'd pick. If you are doing tool calls, maybe you use trajectory evaluation instead.
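A tiny harness in that spirit: it checks both answer content and the tool-call trajectory. The test cases and the `(answer, trace)` agent interface are invented for illustration:

```python
def trajectory_score(expected: list[str], actual: list[str]) -> float:
    """Fraction of expected tool calls that appear, in order, in the actual trace."""
    remaining = iter(actual)
    return sum(1 for step in expected if step in remaining) / len(expected)

CASES = [
    {"question": "Why was fund ABC terminated?",
     "expected_tools": ["extract_fund", "doc_search"],
     "must_mention": "performance"},
]

def evaluate(agent) -> float:
    hits = 0
    for case in CASES:
        answer, trace = agent(case["question"])  # agent returns (text, list of tool names)
        ok = case["must_mention"] in answer.lower()
        ok = ok and trajectory_score(case["expected_tools"], trace) == 1.0
        hits += ok
    return hits / len(CASES)
```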
There is a common myth here. Especially if you're a developer: when we talk about TDD, a lot of people say, no, I just don't do that, it's a lot of work, right? But it's not the same with evaluation. You can actually start evaluating with or without ground truth.
It doesn't matter. There are so many metrics beyond just accuracy, and each one of them will give you some insight. And you know what? Once you start doing evaluation, you will have reviews; and once you start doing reviews, you will actually accumulate more ground-truth examples. Lastly, we use a large language model itself as judge, in combination with human review.
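A hedged sketch of that combination: the judge scores every answer, and only low scorers land in the human review queue. The prompt, threshold, and `llm` callable are assumptions:

```python
import re

JUDGE_PROMPT = (
    "Rate from 1 to 5 how well the ANSWER addresses the QUESTION and is "
    "supported by the CONTEXT. Reply with the number only.\n"
    "QUESTION: {q}\nCONTEXT: {c}\nANSWER: {a}"
)

def judge(q: str, c: str, a: str, llm) -> int:
    reply = llm(JUDGE_PROMPT.format(q=q, c=c, a=a))
    match = re.search(r"[1-5]", reply)
    return int(match.group()) if match else 1  # unparseable replies fail closed

def sme_queue(records: list[dict], llm, threshold: int = 4) -> list[dict]:
    """Return only the answers a human SME still needs to review."""
    return [r for r in records if judge(r["q"], r["c"], r["a"], llm) < threshold]
```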
This automatic solution really helps us scale without adding too much burden on our human SMEs, who would otherwise review large amounts of AI-generated answers by hand. Speaking of SMEs, our last lesson learned is about keeping the human SME in the loop. When you apply a general model to a specific domain, you will usually get less than 50% accuracy.
But you can make quick improvements: chunking strategies, changing your search algorithms, proper prompt engineering. That can get you to the 80% mark. From 80 to 90, we use the workflow chains: we create the subgraphs so that we can fine-tune certain kinds of questions without them impacting each other.
Between 90% and 100% is what we call the last mile, and the last mile is always the hardest mile. For GenAI applications, it may not be achievable to hit that 100% mark, right? So what do we do? The human SME in the loop is very important to us, because we have billions of dollars at stake and we cannot afford inaccuracy.
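One way to wire that SME gate, assuming a recent LangGraph with its `interrupt` API; the node names and review payload are illustrative:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.types import Command, interrupt

def draft_answer(state: MessagesState) -> dict:
    # Stand-in for the full Ask David flow producing a draft answer.
    return {"messages": [("ai", "Draft: terminated due to performance issues.")]}

def sme_review(state: MessagesState) -> dict:
    draft = state["messages"][-1].content
    # Pause the graph here and wait for a human SME's verdict.
    verdict = interrupt({"draft": draft, "ask": "Approve or correct this answer."})
    return {"messages": [("ai", verdict.get("corrected", draft))]}

builder = StateGraph(MessagesState)
builder.add_node("draft_answer", draft_answer)
builder.add_node("sme_review", sme_review)
builder.add_edge(START, "draft_answer")
builder.add_edge("draft_answer", "sme_review")
builder.add_edge("sme_review", END)
graph = builder.compile(checkpointer=MemorySaver())  # interrupts need a checkpointer

config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"messages": [("user", "Why was this fund terminated?")]}, config)
# The run is now paused; the SME resumes it with their verdict:
graph.invoke(Command(resume={"corrected": "Terminated after a strategy review."}), config)
```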
In other words, Ask David still consults with real David whenever needed. In conclusion, three takeaways for you: iterate fast, evaluate early, and keep humans in the loop. Thank you so much.