back to index

The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)


Chapters

0:0 Introduction
1:37 Agenda
2:16 Refresher
4:47 Examples
8:36 Summary
9:3 multimodality
10:34 instruction tuning
11:26 agent native database
12:15 outro

Whisper Transcript | Transcript Only Page

00:00:00.040 | Welcome everybody. I want to thank you for coming to this session today. Today, I want to talk about
00:00:19.560 | AI-powered search and retrieval. And for those of you who don't know me, my name is Frank. I am
00:00:25.800 | actually a part of the Voyage AI team, and we recently joined MongoDB. I want to say probably
00:00:29.680 | about three to four months ago. Just a quick introduction to Voyage AI. We build the most
00:00:34.800 | accurate cost-effective embedding models and re-rankers for RAG and semantic search. A lot of
00:00:40.240 | the applications I think that we've seen that I've seen in particular actually go beyond that. So some
00:00:44.080 | folks use it for classification, others for, you know, a variety of different applications about
00:00:47.920 | clustering, so on and so forth. Voyage is available via the Voyage AI API, the Azure and AWS
00:00:54.240 | marketplaces as well. And now we are a part of MongoDB. So representing MongoDB,
00:00:59.360 | yeah, thank you. So we're representing MongoDB here. And, you know, we're all really excited
00:01:03.760 | to see what the future will hold. And the reason, you know, I think in the past when I would give
00:01:08.480 | presentations like this, I'd actually talk very, very specifically about things like evaluating,
00:01:13.840 | evaluation for embeddings, right? I would talk about things like, you know, how do we rankers
00:01:18.560 | play a role in your ultimate retrieval stack? And today I want to go a little bit higher level.
00:01:24.080 | I want to talk a little bit more about what I like to call AI search. I know it's a very, very broad term, has a
00:01:29.040 | lot of different meanings to different folks. But I want to go -- I want to start -- use that as a
00:01:32.480 | starting point and then talk about where we are today and also to a secondary extent where we're
00:01:36.560 | going as well. So a quick agenda here for the next 10 or so minutes. The first I want to give a quick
00:01:43.520 | refresher, you know, a little bit about embeddings, about search and retrieval more broadly in this day
00:01:50.480 | and age. Then I'll talk about some real world applications. I think each of the applications -- there's
00:01:54.640 | going to be three. Each of the applications I'm going to talk about -- I'll probably spend about a minute
00:01:58.080 | each there -- there's going to be a lesson or something to learn or a key fact to take away from that
00:02:05.760 | application. And then what's most exciting, I think, to me and hopefully to the rest of you as well, is what's
00:02:10.160 | to come, right? Where's the future of AI-powered search and retrieval? And where are we going from here?
00:02:15.680 | So a quick refresher. I'm going to blow through these pretty quickly. AI-powered search, at least how I define it, and how I hope this will continue to be defined moving forward.
00:02:20.640 | is a search system that finds related concepts even without identical wording. I think this is very important, right?
00:02:29.760 | So a lot of folks -- you may be familiar with things like TF-IDF or BM25. You know, AI-powered search goes way beyond that, right?
00:02:37.120 | Not only -- it can understand keywords. It can help you retrieve based on some of these more traditional information retrieval algorithms. But it can also help you find related concepts.
00:02:46.640 | To that point, it also understands the user's intent. So, for example, if I am in a -- you know, I'm trying
00:02:55.760 | to search for some products to buy, something to buy, and I say my best friend is sick, perhaps it can recommend me some, you know, get well baskets or something like that, right?
00:03:03.760 | It really should be able to understand what my ultimate meaning is, rather than just saying, hey, okay, my best friend is sick. Maybe, you know, I'm going to try to find something that's good for a best friend more generically.
00:03:13.760 | The last thing that I'll mention is that it can perform some level of reasoning and some level of instruction following. And I'll get to that a little bit later in that last section.
00:03:24.880 | But to get right into it, you know, a really popular use case of AI-powered search and retrieval is RAG. And I'm going to go through this pretty quickly.
00:03:32.880 | The idea is that without retrieval, you either get probably some sort of hallucination, or in some cases, your LLM is just going to flat out refuse to respond to you, or it's going to give you a really, really generic answer like the one you see up here above.
00:03:44.880 | But with retrieval, with AI-powered search built into it, you get a much, much more grounded response.
00:03:51.040 | Again, this use case, I think, is pretty common. These are actually slides that I took from 10U's talk that's going to be happening at 12:15.
00:03:59.360 | So you guys should definitely go to that if you're around. And the idea is that you generate embeddings, use that for search, and then you give it to your LLM.
00:04:07.360 | Last thing I want to mention is that embedding quality here is a very, very core component of AI-powered search and retrieval.
00:04:14.240 | And I think pretty much 95% to 99% of the systems I've seen out there, from what I gather, use embeddings in some way, shape, or form.
00:04:24.560 | The idea is that, again, you have embedding, you have this unstructured data, usually it's text, PDFs, Word documents, Google Drive files, PowerPoints, etc.
00:04:35.040 | And you're able to embed them into a same space, such that when you do a search or you try to find a prompt, you have a prompt, you search for the most relevant documents,
00:04:43.120 | you're able to pull that information all the way up to the top.
00:04:46.400 | So I'm going to go through these pretty quickly. I'll try to spend about a minute each here.
00:04:50.560 | Some real-world applications of AI-powered search and retrieval. The first one is chatting with your code base.
00:04:58.800 | And this is actually, if folks have heard of continue.dev, this is actually their application. A lot of their code is open source.
00:05:04.480 | So first, you do, this is a classic RAG plus re-ranking approach.
00:05:09.360 | I think the lesson from this particular application is that there is no one-size-fits-all embedding model.
00:05:17.760 | There is no one-size-fits-all LLM.
00:05:20.160 | Always do evaluations to see which one is best for your application.
00:05:24.560 | Now, in this case, continue did theirs, and they found that Voyage Code 3 actually performs the best.
00:05:28.960 | Again, the reason is because for a lot of chat with your code base applications, you want to have an
00:05:34.880 | embedding model and also to a secondary extent an LLM that is really, really good at understanding,
00:05:40.880 | well, code, documentation, and developers, so on and so forth. So this is the first lesson from the first
00:05:46.000 | application. The second one is that there's, you know, when it comes to the second one, again, it's also a
00:05:51.840 | very domain-specific application here. But the second one that I want to mention is if you see this
00:05:55.200 | build box here where there's some filtering and then there's some other structured data that's also
00:06:01.280 | passed to the search system, if this is the thing that I want to highlight for this particular application
00:06:07.040 | is that oftentimes and, you know, coming from a company that builds embedding models, it's hard for
00:06:13.040 | me to say this. I think oftentimes embeddings alone are not enough, right? If you want to build a really
00:06:17.920 | powerful search and retrieval system, you need to have a lot of that structured data that's a part of it.
00:06:23.680 | So to give you an example, beyond just this particular domain, you know, if I have,
00:06:28.000 | let's say, some legal documents, I embed those legal documents using Voyage Law 2, but then when I do my
00:06:36.240 | search, or if I'm building my agent, perhaps I want to understand, I want to know, hey, I only want to find
00:06:42.080 | documents that are from a particular state, or maybe I want to find only official legal documents, or I want to find
00:06:48.160 | documents that have a very, very particular set of details inside of them. This is all -- these are all
00:06:54.320 | things that can be done at the filtering stage. So this is typically done directly inside the vector
00:06:58.480 | store. In some cases, it's done after that, right? So I want to say -- I want to be very, very clear,
00:07:02.720 | there is oftentimes -- and the second lesson here is that there's oftentimes other sources of data,
00:07:06.880 | other pieces of structured data that you need to include inside of your search system as well,
00:07:11.760 | right? And then really, you know, just go beyond that. And then the last thing, you know, the last
00:07:17.600 | sort of application -- these are real world, these are actually built today -- the last thing that I want
00:07:22.320 | to say is, when it comes to a lot of agentic retrieval, oftentimes, it's a feedback loop.
00:07:28.800 | So your AI search system is no longer just input, output, right? Sometimes, if you get a query,
00:07:36.080 | you might want your LLM to, you know, do some searches, and then you might want your LLM to expand
00:07:41.360 | that query, or you might want your LLM to decompose that query. I'll give you a quick example. If I asked
00:07:46.320 | for something like, you know, what are Twitter's -- give me Twitter -- or I guess X now. You know,
00:07:51.680 | give me X's Q4 -- you know, or 2024 earnings, right? You could -- your LLM could decompose that into Q1,
00:07:59.120 | Q2, Q3, Q4 earnings, and then send that as four separate queries to these different vector stores,
00:08:05.200 | to the different vector databases, so on and so forth. On top of this as well, you also see that
00:08:10.880 | we're in the era of agents, right? 2025, I would say, is the year of the agents, maybe even 2026 as well.
00:08:15.920 | And a lot of agents, a lot of these agentic applications, they're gonna need to be really,
00:08:20.160 | really powerful at conversational data. So, you really want embeddings, you really want a search
00:08:25.680 | system that's built around that. Now, there's a lot of details that's missing in this particular
00:08:29.840 | block diagram, but I think hopefully that goes to show you sort of the lessons to take away from that.
00:08:35.280 | All right. So, this is the most -- so, you know, I just covered three existing applications. And hopefully,
00:08:43.840 | these applications, I think they give you a window into where we are today, you know, some of the tips,
00:08:48.960 | some of the tricks, just three of them that's being used today in these AI search systems. There's many,
00:08:54.560 | many out there, right? There's, I think, a lot to cover here. But I think what's more exciting to me
00:08:59.680 | is what's to come in AI-powered search and AI-powered retrieval. I don't know what happened to my arrows here.
00:09:06.480 | I apologize. I seem to sort of disappear in the background. But the future is 100% multimodal.
00:09:13.680 | That is the case for large language models. That is the case for embeddings. For AI search and
00:09:18.480 | retrieval overall, that's going to be the case as well. I think there's no doubt about that. And when
00:09:22.880 | I say multimodal, I really -- I don't mean multimodal in the sense of, oh, you know, I'm an agent that's
00:09:27.520 | operating in the real world. I can connect all these different modalities like site and touch and taste
00:09:33.600 | together. I'm talking more about multimodality just from a pure sort of like a foundational perspective.
00:09:40.480 | The ability to understand images and text together or the ability to understand images, text, and audio
00:09:45.680 | together. And it's going to be really important for search systems. It's going to be really important
00:09:49.600 | for embedding models just as we have a lot of VLMs out there today. This particular example is Voyage
00:09:55.040 | multimodal 3. Again, I apologize. I don't know what happened to the arrows here. But the idea here is that
00:10:01.120 | it can take text, it can take images, or it can take a combination of text, interleave text and images,
00:10:08.800 | and really embed all those into a single really powerful semantic space. So it might be a little
00:10:14.160 | bit hard to understand exactly what's going on here, but the query being strong LLMs and the nearest sort of
00:10:23.040 | document being the Claude 3.5 blog post. So I hope that's a little bit clear here. I know, again,
00:10:28.880 | I know it might be a little bit harder to understand, but this is one of the things that I want to get to.
00:10:34.000 | The second thing that I also think is particularly exciting is instruction tuning. Instruction tuning
00:10:40.480 | into a secondary extent reasoning as well. So right now, if you look at embeddings and you look at embedding models,
00:10:47.520 | they just take a query or they take a document and they give you a vector. I think moving forward,
00:10:52.160 | we're going to see situations where in addition to that query or in addition to that document,
00:10:57.440 | we want to be able to steer the vector in a particular direction or in a particular way.
00:11:03.520 | So to give an example, perhaps I want to, you know, in addition to my query, in addition to my prompt,
00:11:09.840 | in addition to my search, I also ask, I also give an instruction to say, find documents for me that only
00:11:16.320 | dive into detail about this particular aspect. And that is where I think instruction tuning is really
00:11:23.200 | going to play a huge role moving forward for AI search and retrieval. Last thing that I want to talk
00:11:28.160 | about is, and this is sort of a buzz wordy kind of term out there, is sort of the agent native database.
00:11:34.560 | And I think this is where Voyage joining forces with MongoDB is super exciting for a lot of us.
00:11:39.600 | The idea that today, a lot of search and retrieval, there's many, many multiple different components
00:11:44.480 | that you have to put together. And what you see on the left is actually already a really, really
00:11:48.240 | simplified version of that. And the capability to move directly to something that is just a single
00:11:55.200 | piece of infrastructure that does the embedding for you, it does the re-ranking for you, perhaps it does
00:11:59.840 | some of that query augmentation or query decomposition for you. All of that inside a single data platform,
00:12:05.760 | inside a single database, I think that is super exciting. So I think this is something that you'll
00:12:10.160 | see more and more of this year, and also next year as well, and something to look out for hopefully as
00:12:14.800 | well. So with that being said, I've got about three minutes left. Would love to take any questions,
00:12:19.440 | if you have them. But I'll also leave this up for a couple more seconds. Feel free to
00:12:24.400 | scan those QR codes, follow us, and I hope to see you sometime else at this conference.