LangChain Interrupt 2025 JP Morgan Building Ask D A V I D – Zheng Xue

00:00:00.040 |
We're really excited about all the possibilities that Ask David 00:00:03.480 |
can potentially bring to the table. So now we dive deeper into the technical 00:00:08.100 |
magic behind Ask David. I'll turn it over to Zheng, who will walk through the nuts and bolts of 00:00:13.020 |
how we are making our vision a reality. And she might even let you know what Ask David 00:00:17.520 |
stands for, because I promise you I didn't name this after myself. Thank you. 00:00:22.280 |
Thank you, David. So Ask David is a domain-specific QA agent. So let's start with 00:00:29.000 |
a domain analysis. First of all, we have databases of structured data. Those are the 00:00:35.000 |
backbones of many of our running production systems. Prior to the introduction of an agent, 00:00:40.000 |
users had access to the same data, but they had to navigate through different 00:00:44.000 |
systems and manually aggregate the information. An agent can introduce 00:00:49.000 |
efficiency and an integrated user experience. Next, we have unstructured data. As a bank, we 00:00:58.000 |
have a vast amount of documentation, including emails and presentations. With the rise 00:01:05.000 |
of virtual meetings, we also have an increasing amount of video and audio recordings. How do we make 00:01:12.000 |
full use of this information? The advancement of LLMs really brings tremendous opportunity in this area. Lastly, as a research team, we have proprietary models and analytics, which are designed to provide insights and visualizations to help decision-making. Previously, it would require a human expert to conduct this kind of analysis and offer a white-glove service. With the help of an agent, we can scale the 00:01:40.000 |
insight generation and make our service available to more of our clients. 00:01:47.000 |
Now imagine being a financial advisor in a planning meeting, and your client suddenly brings up a fund and asks you why it was terminated. 00:01:57.000 |
Believe me, it's actually a very loaded question. In the past, you would reach out to our investment research team, talk with the real David, and then figure out the strategy and the history of the fund and the reason behind the termination. What's the research about this fund? What are similar funds? How do I curate these answers 00:02:16.000 |
specifically for this fund? And you would come up with that presentation yourself, manually. With the help of an agent, you can get access to the same data, analytics, insights, and visualizations right in the meeting, enabling real-time decision-making. That is our vision of Ask David. And you probably guessed it: David stands for Data, Analytics, Visualization, Insights, and Decision-making. 00:02:43.000 |
So, this is our approach to building Ask David, which is a multi-agent system. 00:02:50.000 |
So, we have our supervisor agent, which acts as an orchestrator. It talks with the users, understands their intention, and delegates the task to one or more of the sub-agents in the team. The supervisor agent has access to both short-term and long-term memory, so that it can customize the user experience. It also knows when to invoke a human in the loop to ensure the highest level of accuracy and automation. 00:08:20.000 |
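The supervisor pattern described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the team's implementation: the three sub-agents, the keyword rules (standing in for an LLM-based intent classifier), and the `needs_human` flag are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical sub-agents: each takes a question and returns an answer string.
def structured_data_agent(question: str) -> str:
    return f"[structured data] lookup for: {question}"


def document_agent(question: str) -> str:
    return f"[documents] search for: {question}"


def analytics_agent(question: str) -> str:
    return f"[analytics] analysis for: {question}"


@dataclass
class Supervisor:
    sub_agents: Dict[str, Callable[[str], str]]
    # Toy keyword rules standing in for an LLM intent classifier (assumption).
    keywords: Dict[str, List[str]]
    # Short-term memory here is just the list of questions seen in this session.
    short_term_memory: List[str] = field(default_factory=list)

    def classify(self, question: str) -> List[str]:
        """Return the names of every sub-agent whose keywords match."""
        q = question.lower()
        return [name for name, words in self.keywords.items()
                if any(w in q for w in words)]

    def handle(self, question: str) -> dict:
        """Delegate to one or more sub-agents, or escalate to a human."""
        self.short_term_memory.append(question)
        targets = self.classify(question)
        if not targets:
            # No confident route: hand over to a human instead of guessing.
            return {"needs_human": True, "answers": {}}
        answers = {name: self.sub_agents[name](question) for name in targets}
        return {"needs_human": False, "answers": answers}


supervisor = Supervisor(
    sub_agents={
        "structured": structured_data_agent,
        "documents": document_agent,
        "analytics": analytics_agent,
    },
    keywords={
        "structured": ["performance", "holdings"],
        "documents": ["email", "presentation"],
        "analytics": ["compare", "similar"],
    },
)
```

Calling `supervisor.handle("Compare performance of similar funds")` fans out to both the structured-data and analytics sub-agents, while a question that matches no rule comes back with `needs_human` set so that a person can take over.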
So, everyone knows that compared to traditional AI projects, 00:08:26.000 |
GenAI projects actually have a shorter development phase. 00:08:40.000 |
Obviously, accuracy is one of the most important things. 00:08:45.000 |
And the continuous evaluation helps you get that confidence. 00:08:49.000 |
So, there are additional tips I have over here based on our own experience of evaluation. 00:08:55.000 |
So, the dark blue bars over here come from the evaluation metrics on the main flow. 00:09:01.000 |
And the big one is actually one example of our sub-agent. 00:09:05.000 |
So, my tip number one is make sure you independently evaluate your sub-agents. 00:09:10.000 |
Well, the key for evaluation is to find places to improve, right? 00:09:14.000 |
This helps you figure out the weakest link to improve your accuracy. 00:09:20.000 |
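As a rough illustration of evaluating the main flow and each sub-agent separately, the sketch below scores every component on its own set of (predicted, expected) pairs and reports the lowest scorer. The component names and the toy pairs are made up for the example, and exact-match accuracy is just one possible metric.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (predicted answer, expected answer)


def accuracy(pairs: List[Pair]) -> float:
    """Fraction of pairs where the prediction equals the expected answer."""
    if not pairs:
        return 0.0
    return sum(pred == exp for pred, exp in pairs) / len(pairs)


def weakest_component(runs: Dict[str, List[Pair]]) -> Tuple[str, float]:
    """Score each component independently and return the lowest scorer."""
    scores = {name: accuracy(pairs) for name, pairs in runs.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Toy evaluation runs (hypothetical data, for illustration only).
runs = {
    "main_flow": [("a", "a"), ("b", "b"), ("c", "x"), ("d", "d")],
    "document_agent": [("a", "a"), ("b", "x"), ("c", "x"), ("d", "d")],
    "analytics_agent": [("a", "a"), ("b", "b"), ("c", "c"), ("d", "d")],
}

print(weakest_component(runs))  # ('document_agent', 0.5)
```

Looking only at the end-to-end number (0.75 here) would hide the fact that one sub-agent is pulling the result down.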
Second point is, depending on how you design your agents, make sure you pick the right metrics. 00:09:27.000 |
So, if you are doing summarization, you may want to check whether your summary is concise or not. 00:09:32.000 |
So, conciseness is one of the metrics you want to pick. 00:09:36.000 |
If you are doing a tool call, maybe you can have a trajectory evaluation instead. 00:09:45.000 |
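The two metric choices mentioned here can be sketched as follows. Both functions are simplified stand-ins written for this example: conciseness is approximated by a word-count ratio, and a trajectory is modeled as a plain list of tool names, which is an assumption about how an agent's steps might be logged.

```python
from typing import List


def conciseness(summary: str, source: str) -> float:
    """Toy conciseness score: 1 minus the summary/source word ratio, floored at 0."""
    source_words = len(source.split())
    if source_words == 0:
        return 0.0
    return max(0.0, 1.0 - len(summary.split()) / source_words)


def trajectory_exact(actual: List[str], expected: List[str]) -> bool:
    """Strict check: the agent called exactly the expected tools, in order."""
    return actual == expected


def trajectory_in_order(actual: List[str], expected: List[str]) -> bool:
    """Relaxed check: expected tools appear in order, extra steps are allowed."""
    remaining = iter(actual)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in expected)
```

For instance, `trajectory_in_order(["search", "rerank", "summarize"], ["search", "summarize"])` passes even though the agent took an extra step, whereas the strict check would fail it. Which of the two is appropriate depends on how much freedom the agent is meant to have.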
I mean, especially if you are a developer, you talk about TDD. 00:09:49.000 |
I think a lot of people say that I just don't do that. 00:09:55.000 |
You can actually start evaluation with or without ground truth. 00:10:01.000 |
There are so many metrics beyond just accuracy. 00:10:04.000 |
And each one of them will provide you some insight. 00:10:08.000 |
Once you start doing evaluation, you will have review. 00:10:11.000 |
Once you start doing review, you can actually accumulate more ground-truth examples. 00:10:16.000 |
Lastly, we have large language models help us judge, in combination with human review. 00:10:25.000 |
These automated solutions really help us scale without adding too much burden on our human SMEs, who would otherwise review a large amount of AI-generated answers. 00:10:35.000 |
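One way to combine an automated judge with human review is to let the judge score every answer and send only the low-scoring ones to a reviewer. The sketch below assumes that setup. The `Verdict` type, the 0.8 threshold, and `stub_judge` (a placeholder for a real model call) are hypothetical and only illustrate the routing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    score: float      # judge's quality score in [0, 1]
    rationale: str    # short explanation from the judge


def triage(answers: List[str],
           judge: Callable[[str], Verdict],
           threshold: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split answers into auto-accepted ones and ones queued for human review."""
    auto_pass: List[str] = []
    human_queue: List[str] = []
    for answer in answers:
        if judge(answer).score >= threshold:
            auto_pass.append(answer)
        else:
            human_queue.append(answer)
    return auto_pass, human_queue


def stub_judge(answer: str) -> Verdict:
    """Placeholder for an LLM judge: rewards answers that carry a citation tag."""
    if "[source]" in answer:
        return Verdict(score=0.9, rationale="cites a source")
    return Verdict(score=0.4, rationale="no citation")


accepted, needs_review = triage(
    ["Answer A [source]", "Answer B"], stub_judge
)
```

Answers that a reviewer corrects in the human queue can then be stored as new ground-truth examples for later evaluation runs.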
Speaking of SMEs, our last lesson learned here is about the human SME in the loop. 00:10:41.000 |
When you apply a general model to a specific domain, usually you will get less than 50% accuracy. 00:10:49.000 |
But you can make quick improvements, like chunking strategies. 00:10:54.000 |
You can change your search algorithms, and you can make improvements with prompt engineering. 00:11:03.000 |
From 80% to 90%, we are using workflow chains. 00:11:06.000 |
We are creating subgraphs so that we can fine-tune for certain kinds of questions without them impacting each other. 00:11:12.000 |
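The isolation idea can be illustrated with a small router in plain Python, where each question type gets its own handler and anything unregistered falls back to a default flow. This is a stand-in for subgraphs rather than any graph framework's API. The `Router` class, `toy_classify`, and the "termination" question type are assumptions made up for the example.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class Router:
    """Routes each question type to its own handler, with a default fallback."""

    def __init__(self, classify: Callable[[str], str], default: Handler):
        self._classify = classify
        self._default = default
        self._handlers: Dict[str, Handler] = {}

    def register(self, question_type: str, handler: Handler) -> None:
        # Adding or replacing one handler leaves all other types untouched.
        self._handlers[question_type] = handler

    def answer(self, question: str) -> str:
        handler = self._handlers.get(self._classify(question), self._default)
        return handler(question)


def toy_classify(question: str) -> str:
    """Toy classifier (assumption): keyword match instead of a model."""
    return "termination" if "terminat" in question.lower() else "general"


router = Router(toy_classify, default=lambda q: f"general flow: {q}")
router.register("termination", lambda q: f"termination flow: {q}")
```

Tuning the handler registered for one question type changes only the answers routed to it, so improvements for one kind of question cannot regress another.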
Between 90% and 100%, that's what we call the last mile. 00:11:17.000 |
And the last mile is always the hardest mile. 00:11:20.000 |
In terms of GenAI applications, it may not be achievable to get to that 100% mark, right? 00:11:27.000 |
So human SME in the loop is very important to us. 00:11:30.000 |
Because we have billions of dollars at stake and we cannot afford embarrassment. 00:11:36.000 |
In other words, "Ask David" still consults with the real David whenever it needs to.