The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)

Welcome, everybody, and thank you for coming to this session. Today I want to talk about AI-powered search and retrieval. For those of you who don't know me, my name is Frank. I'm part of the Voyage AI team, and we recently joined MongoDB, I want to say about three to four months ago.

Just a quick introduction to Voyage AI. We build the most accurate, cost-effective embedding models and re-rankers for RAG and semantic search. A lot of the applications that I've seen in particular actually go beyond that.

So some folks use it for classification, others for clustering, and so on. Voyage is available via the Voyage AI API and the Azure and AWS marketplaces as well. And now we are a part of MongoDB.

So we're representing MongoDB here, and we're all really excited to see what the future will hold. In the past, when I gave presentations like this, I'd talk very specifically about things like evaluation for embeddings, or how re-rankers play a role in your retrieval stack.

And today I want to go a little bit higher level. I want to talk about what I like to call AI search. I know it's a very broad term that means different things to different folks. But I want to use that as a starting point and then talk about where we are today and, to a secondary extent, where we're going as well.

So a quick agenda for the next 10 or so minutes. First, I want to give a quick refresher on embeddings and on search and retrieval more broadly in this day and age. Then I'll talk about some real-world applications. There are going to be three.

For each application, and I'll probably spend about a minute on each, there's going to be a lesson or a key fact to take away. And then what's most exciting, I think, to me and hopefully to the rest of you as well, is what's to come, right?

Where's the future of AI-powered search and retrieval? And where are we going from here? So a quick refresher. I'm going to blow through these pretty quickly. AI-powered search, at least how I define it, and how I hope it will continue to be defined moving forward, is a search system that finds related concepts even without identical wording.

I think this is very important, right? A lot of you may be familiar with things like TF-IDF or BM25. AI-powered search goes way beyond that. Not only can it understand keywords and help you retrieve based on some of these more traditional information retrieval algorithms,

but it can also help you find related concepts. To that point, it also understands the user's intent. So, for example, if I'm trying to search for something to buy, and I say my best friend is sick, perhaps it can recommend me some get-well baskets or something like that, right?

It really should be able to understand what my ultimate meaning is, rather than just taking "my best friend is sick" and trying to find something that's good for a best friend more generically. The last thing that I'll mention is that it can perform some level of reasoning and some level of instruction following.

And I'll get to that a little bit later in that last section. But to get right into it, you know, a really popular use case of AI-powered search and retrieval is RAG. And I'm going to go through this pretty quickly. The idea is that without retrieval, you either get probably some sort of hallucination, or in some cases, your LLM is just going to flat out refuse to respond to you, or it's going to give you a really, really generic answer like the one you see up here above.

But with retrieval, with AI-powered search built into it, you get a much, much more grounded response. Again, this use case, I think, is pretty common. These are actually slides that I took from Tengyu's talk, which is happening at 12:15. So you guys should definitely go to that if you're around.

And the idea is that you generate embeddings, use that for search, and then you give it to your LLM. Last thing I want to mention is that embedding quality here is a very, very core component of AI-powered search and retrieval. And I think pretty much 95% to 99% of the systems I've seen out there, from what I gather, use embeddings in some way, shape, or form.

The idea is that, again, you have this unstructured data, usually text, PDFs, Word documents, Google Drive files, PowerPoints, etc., and you embed all of it into the same space. Then, when you have a prompt and you search for the most relevant documents, you're able to pull that information all the way up to the top.
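As a rough illustration of the embed-then-search step just described, here is a minimal sketch that ranks documents by cosine similarity to a query. Everything in it is an assumption made for illustration: the 3-dimensional vectors are hand-written stand-ins for what an embedding model would return, and the document IDs and the `top_k` helper are hypothetical, not taken from the talk.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors (assumed); a real system would obtain these from an embedding model.
DOC_VECS = {
    "get_well_basket": [0.9, 0.1, 0.0],
    "birthday_card": [0.1, 0.9, 0.0],
    "laptop_stand": [0.0, 0.1, 0.9],
}
QUERY_VEC = [0.8, 0.2, 0.1]  # stands in for the query "my best friend is sick"

results = top_k(QUERY_VEC, DOC_VECS, k=2)
print(results)
```

In a RAG pipeline, the text of the top-ranked documents would then be placed into the LLM prompt as context.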

So I'm going to go through these pretty quickly. I'll try to spend about a minute each here. Some real-world applications of AI-powered search and retrieval. The first one is chatting with your code base. And this is actually, if folks have heard of continue.dev, this is actually their application. A lot of their code is open source.

So this is a classic RAG plus re-ranking approach. I think the lesson from this particular application is that there is no one-size-fits-all embedding model. There is no one-size-fits-all LLM. Always do evaluations to see which one is best for your application. Now, in this case, Continue did theirs, and they found that Voyage Code 3 actually performs the best.

Again, the reason is that for a lot of chat-with-your-code-base applications, you want an embedding model, and to a secondary extent an LLM, that is really, really good at understanding, well, code, documentation, developers, and so on. So this is the first lesson from the first application.
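One way to act on the "always do evaluations" lesson is to compare candidate models on your own labeled queries using a retrieval metric such as recall@k. The sketch below assumes you already have, for each candidate, a ranked list of document IDs per query; the model names, query IDs, and relevance labels are made up for illustration and say nothing about any real model.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(run, qrels, k):
    """Average recall@k over all labeled queries."""
    return sum(recall_at_k(run[q], qrels[q], k) for q in qrels) / len(qrels)


# Hypothetical relevance labels: query ID -> set of relevant document IDs.
QRELS = {"q1": {"d1", "d4"}, "q2": {"d2"}}

# Hypothetical ranked results from two candidate embedding models.
RUNS = {
    "model_a": {"q1": ["d1", "d3", "d4"], "q2": ["d5", "d2", "d1"]},
    "model_b": {"q1": ["d3", "d5", "d1"], "q2": ["d2", "d4", "d1"]},
}

scores = {name: mean_recall_at_k(run, QRELS, k=2) for name, run in RUNS.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The same harness works for comparing re-rankers: feed it the re-ranked lists instead of the first-stage results.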

The second one is also a very domain-specific application. What I want to highlight is this box here, where there's some filtering and some other structured data that's also passed to the search system. And, you know, coming from a company that builds embedding models, it's hard for me to say this.

I think oftentimes embeddings alone are not enough, right? If you want to build a really powerful search and retrieval system, you need structured data as a part of it. To give you an example beyond just this particular domain, let's say I have some legal documents, and I embed them using Voyage Law 2. When I do my search, or if I'm building my agent, perhaps I only want documents from a particular state, or only official legal documents, or only documents that have a very particular set of details inside of them.

These are all things that can be done at the filtering stage. This is typically done directly inside the vector store; in some cases, it's done after that, right? So the second lesson here, and I want to be very clear, is that there are oftentimes other sources of data, other pieces of structured data, that you need to include inside of your search system as well.
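To make the filtering idea concrete, here is a minimal sketch of pre-filtering on structured metadata before ranking by vector similarity. The `Doc` fields (`state`, `official`), the toy 2-dimensional vectors, and the `filtered_search` helper are all assumptions for illustration; a production vector store would apply the filter inside its own index rather than in a Python loop.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    state: str        # structured metadata (assumed field)
    official: bool    # structured metadata (assumed field)
    vector: list      # toy embedding; a real one comes from an embedding model


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def filtered_search(query_vec, docs, k, **filters):
    """Keep only docs whose metadata matches every filter, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(getattr(d, field) == value for field, value in filters.items())
    ]
    candidates.sort(key=lambda d: dot(query_vec, d.vector), reverse=True)
    return [d.doc_id for d in candidates[:k]]


DOCS = [
    Doc("ca_statute", "CA", True, [0.9, 0.1]),
    Doc("ca_blog", "CA", False, [0.95, 0.05]),   # most similar, but not official
    Doc("ny_statute", "NY", True, [0.9, 0.2]),   # similar, but wrong state
    Doc("ca_ruling", "CA", True, [0.2, 0.9]),
]

hits = filtered_search([1.0, 0.0], DOCS, k=2, state="CA", official=True)
print(hits)
```

Without the filter, the unofficial `ca_blog` entry would rank first on similarity alone, which is the failure mode the structured data is there to prevent.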

And you can really go beyond that. Then the last application, and these are all real-world, these are actually built today, is about agentic retrieval. When it comes to a lot of agentic retrieval, oftentimes it's a feedback loop.

So your AI search system is no longer just input, output, right? Sometimes, if you get a query, you might want your LLM to, you know, do some searches, and then you might want your LLM to expand that query, or you might want your LLM to decompose that query. I'll give you a quick example.

If I asked for something like, give me Twitter's, or I guess X's now, 2024 earnings, your LLM could decompose that into Q1, Q2, Q3, and Q4 earnings, and then send those as four separate queries to the different vector stores, the different vector databases, and so on.
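The decomposition example above can be sketched as a fan-out over sub-queries. In this sketch, a rule-based function stands in for the LLM that would normally do the decomposition, and a keyword-overlap search over a tiny in-memory corpus stands in for the vector stores. The function names, corpus, and document IDs are all hypothetical.

```python
def decompose_annual_query(company, year, metric):
    """Stand-in for an LLM call: split an annual question into four quarterly ones."""
    return [f"{company} Q{q} {year} {metric}" for q in range(1, 5)]


# Tiny hypothetical corpus standing in for one or more vector stores.
CORPUS = {
    "x_q1_2024": "X Q1 2024 earnings report",
    "x_q2_2024": "X Q2 2024 earnings report",
    "x_q3_2024": "X Q3 2024 earnings report",
    "x_q4_2024": "X Q4 2024 earnings report",
    "x_2023_annual": "X 2023 annual earnings summary",
}


def toy_search(query, k=1):
    """Rank documents by shared lowercase tokens (a stand-in for embedding search)."""
    q_tokens = set(query.lower().split())

    def overlap(doc_id):
        return len(q_tokens & set(CORPUS[doc_id].lower().split()))

    ranked = sorted(CORPUS, key=lambda d: (-overlap(d), d))
    return [d for d in ranked[:k] if overlap(d) > 0]


def fan_out(sub_queries, search_fn, k=1):
    """Run every sub-query and merge the results, dropping duplicates in order."""
    seen, merged = set(), []
    for sub_query in sub_queries:
        for doc_id in search_fn(sub_query, k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


sub_queries = decompose_annual_query("X", 2024, "earnings")
merged = fan_out(sub_queries, toy_search, k=1)
print(merged)
```

In an agentic loop, the LLM could inspect the merged results and decide whether to expand or decompose the query again before answering.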

On top of this as well, you also see that we're in the era of agents, right? 2025, I would say, is the year of the agents, maybe even 2026 as well. And a lot of agents, a lot of these agentic applications, they're gonna need to be really, really powerful at conversational data.

So you really want embeddings, you really want a search system, that's built around that. Now, there are a lot of details missing from this particular block diagram, but hopefully that shows you the lessons to take away. All right. So I just covered three existing applications.

And hopefully these applications give you a window into where we are today: some of the tips and tricks, just three of them, that are being used today in these AI search systems. There are many, many more out there, right? There's a lot to cover here.

But I think what's more exciting to me is what's to come in AI-powered search and AI-powered retrieval. I don't know what happened to my arrows here. I apologize. They seem to have sort of disappeared into the background. But the future is 100% multimodal. That is the case for large language models.

That is the case for embeddings. For AI search and retrieval overall, that's going to be the case as well. I think there's no doubt about that. And when I say multimodal, I really -- I don't mean multimodal in the sense of, oh, you know, I'm an agent that's operating in the real world.

I can connect all these different modalities like sight and touch and taste together. I'm talking more about multimodality from a purely foundational perspective. The ability to understand images and text together, or the ability to understand images, text, and audio together. And it's going to be really important for search systems.

It's going to be really important for embedding models, just as we have a lot of VLMs out there today. This particular example is Voyage multimodal 3. Again, I apologize. I don't know what happened to the arrows here. But the idea here is that it can take text, it can take images, or it can take a combination, interleaved text and images, and really embed all of those into a single, really powerful semantic space.

So it might be a little bit hard to understand exactly what's going on here, but the query is "strong LLMs" and the nearest document is the Claude 3.5 blog post. So I hope that's a little bit clear here. Again, I know it might be a little bit harder to understand, but this is one of the things that I want to get to.

The second thing that I also think is particularly exciting is instruction tuning. Instruction tuning and, to a secondary extent, reasoning as well. So right now, if you look at embeddings and you look at embedding models, they just take a query or they take a document and they give you a vector.

I think moving forward, we're going to see situations where in addition to that query or in addition to that document, we want to be able to steer the vector in a particular direction or in a particular way. So to give an example, perhaps I want to, you know, in addition to my query, in addition to my prompt, in addition to my search, I also ask, I also give an instruction to say, find documents for me that only dive into detail about this particular aspect.

And that is where I think instruction tuning is really going to play a huge role moving forward for AI search and retrieval. The last thing that I want to talk about, and this is sort of a buzzwordy term out there, is the agent-native database.

And I think this is where Voyage joining forces with MongoDB is super exciting for a lot of us. The idea is that today, for a lot of search and retrieval, there are many different components that you have to put together. And what you see on the left is actually already a really, really simplified version of that.

And the capability to move directly to something that is just a single piece of infrastructure that does the embedding for you, does the re-ranking for you, perhaps does some of that query augmentation or query decomposition for you, all of that inside a single data platform, inside a single database, I think that is super exciting.

So I think this is something that you'll see more and more of this year, and also next year as well, and something to look out for hopefully as well. So with that being said, I've got about three minutes left. Would love to take any questions, if you have them.

But I'll also leave this up for a couple more seconds. Feel free to scan those QR codes, follow us, and I hope to see you sometime else at this conference.
