Building a Smarter AI Agent with Neural RAG

00:00:00.120 | All right, so I was going to do some live demo coding, and I will, but I know you all are

00:00:23.680 | actually here to hear a cool story. So I'll tell you a story about web search built for AI,

00:00:28.720 | and then we do some coding at the end. This story will end with this slide, one API

00:00:35.620 | to get any information from the web, and you'll know what this means by the end, but

00:00:41.080 | the story starts in 1998, and what you're looking at is the state-of-the-art in

00:00:47.500 | information retrieval in 1998. You type a word, Australia, into this new search

00:00:53.080 | engine called Google, and it magically finds you all the documents that contain

00:00:57.160 | the word Australia from the web. It's crazy. And the big insight of Google was

00:01:02.060 | they had this PageRank algorithm, so the results are ranked by authority based on

00:01:06.700 | the graph structure of the web. And this was a clever algorithm, and it was really

00:01:09.800 | cool. I was two years old at the time, so if I was conscious, I would have thought

00:01:14.020 | this was cool. Okay, and now our story skips 23 years to 2021. By this point, I was conscious,

00:01:23.560 | barely. And I noticed that GPT-3 had recently come out, and it was this magical thing that

00:01:32.920 | you could input a whole paragraph explaining exactly what you want, and it would really understand

00:01:37.680 | the subtleties of your language and give you an output that exactly matched. And it's hard to remember how

00:01:42.580 | magical this was, but it was really magical in 2021. And at the same time, I noticed there was Google,

00:01:47.220 | which, you know, you type in a simple query, like shirts without stripes, and it would give you shirts with stripes,

00:01:53.580 | which is crazy. It like doesn't understand the word without, because it's doing a keyword comparison algorithm.

00:02:00.280 | And so I decided that for the next at least 10 years, I'm going to devote myself to building a search engine that combines the technology of GPT-3

00:02:08.880 | with a search engine to make a search engine that actually understands what you're saying at a deep level, and understands all the documents on the web at a deep level, and gives you exactly what you asked for.

00:02:19.120 | This is a very big idea, and we've worked on it for four years and made a lot of progress, but it would change the world if you actually solved this problem.

00:02:28.120 | And so in 2021, we joined YC, summer 2021. We raised a couple million dollars, and we did what every YC startup should do.

00:02:37.120 | We spent half of it on a GPU cluster. I'm joking. You shouldn't do that.

00:02:41.120 | And then we also followed YC's advice, where we didn't talk to any users or customers for a year and a half, and we just did research.

00:02:52.120 | Again, you shouldn't do that. You should talk to users. But in our case, it made sense, because we were trying to solve a really hard problem,

00:02:56.120 | which is like redesign search from scratch, using the same technology as GPT-3, this like next token prediction idea with transformers.

00:03:03.120 | What if you could apply the same thing to search? And this is actually one of our W&B (Weights & Biases) training runs.

00:03:10.120 | The purple one, I believe, was a breakthrough where it really learned. There were a few breakthroughs along the way involving random data sets

00:03:17.120 | and different transformer architectures that we were trying, and this purple one really started to work well.

00:03:23.120 | And the general idea we had was like, okay, so what is a search engine? You have like a trillion documents on the web.

00:03:30.120 | And traditional search engines on a very high level will create like a keyword index of those documents.

00:03:35.120 | So for each document, you ask what are the words in those documents, and you create this big inverted index where you map from like words like brown

00:03:43.120 | to all the documents that contain that word. And then at search time, you know, when a search like shirts without stripes comes in,

00:03:49.120 | you do some crazy keyword comparison algorithm and get the top results.

00:03:54.120 | That's obviously a simplification of what Google does, but at a fundamental level, it's doing a keyword comparison.
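
Editor's sketch: the inverted index described above can be illustrated with a few lines of Python. This is a toy under stated assumptions, not how Google or any production engine works: the two documents, the whitespace tokenizer, and the "count matching words" score are all made up for illustration. It shows why a pure keyword match can rank a striped shirt first for the query shirts without stripes.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def keyword_scores(index, query):
    """Score each document by how many distinct query words it contains."""
    scores = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical two-document "web".
docs = {
    "d1": "blue shirts with stripes",
    "d2": "plain white shirts",
}
index = build_inverted_index(docs)
ranked = keyword_scores(index, "shirts without stripes")
print(ranked)  # -> [('d1', 2), ('d2', 1)]
```

The striped shirt wins because it shares two words with the query; the word "without" matches nothing and carries no meaning for the index.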

00:03:59.120 | But the idea was like, what if you could actually, so with transformers, like the big thing is like,

00:04:04.120 | what if you could turn each document not into a set of keywords, but into embeddings?

00:04:08.120 | And these embeddings can be arbitrarily powerful, right? Like it's a list of, an embedding is just a list of numbers,

00:04:13.120 | and it could represent lots of information. So an embedding, it doesn't just capture the words in the document,

00:04:18.120 | but also the meaning, the ideas in the document, and the way people refer to that document on the web.

00:04:23.120 | And you know, embedding can be arbitrarily big, and so it like, of course, in the limit, it would just destroy keywords.

00:04:28.120 | And so you have this like, arbitrarily powerful representation. And now the fundamental idea was just like,

00:04:33.120 | the bitter lesson, what if we could like, you know, train transformers to output embeddings for documents,

00:04:37.120 | and if we keep getting more and more data, and that's high quality, we could make a search engine that actually understands you.

00:04:42.120 | And the way it would work at inference, at search time is like, a search comes in, a query comes in like shirts without stripes.

00:04:48.120 | Traditional search engines would use the above thing, where they would do a very fancy keyword comparison,

00:04:53.120 | and a bunch of other things. And then instead, we would just embed the shirts without stripes, and compare it to the embeddings of all the trillion documents.
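
Editor's sketch: the embed-and-compare step described above, in miniature. In a real system the vectors would come from a trained transformer and the comparison would run over an approximate nearest-neighbor index; here the two-dimensional vectors are hand-made stand-ins (first axis roughly "is about shirts", second axis roughly "has stripes"), and cosine similarity is one common choice of comparison, assumed here rather than confirmed by the talk.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, k=3):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hand-made toy embeddings (hypothetical values, not model output).
doc_vecs = {
    "striped_shirt": [1.0, 1.0],
    "plain_shirt": [1.0, -1.0],
    "striped_sofa": [-1.0, 1.0],
}
# Stand-in embedding for the query "shirts without stripes".
query_vec = [1.0, -0.8]

print(nearest(query_vec, doc_vecs, k=1))  # -> ['plain_shirt']
```

Because the negation is encoded in the vector rather than in matching words, the plain shirt comes out on top.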

00:04:59.120 | And you know, after a year and a half, we actually had a new search engine that worked in a very different way.

00:05:05.120 | And you search shirts without stripes on Google, I'm sorry, on Exa, and you get a list of results that actually do not have stripes.

00:05:13.120 | It's a simple example, but it could handle way more complex queries, like paragraph-long queries.

00:05:20.120 | And when we launched this in November 2022, we got a lot of excitement on Twitter.

00:05:25.120 | This is a very new paradigm for search. You could do all sorts of interesting queries that you couldn't do before.

00:05:28.120 | And then, two weeks later, this happened. It was a small tweet.

00:05:35.120 | And this is a visual depiction of San Francisco at the time.

00:05:40.120 | You guys probably all remember this.

00:05:43.120 | And then this is a visual depiction of the Exa team at the time.

00:05:47.120 | Because ChatGPT completely changed the way we interact with the world's information.

00:05:52.120 | You know, like, everyone can now use an LLM to just, like, talk to their computer,

00:05:57.120 | and get information.

00:05:58.120 | And we were thinking, wait, is there even a role for search in this world?

00:06:01.120 | Like, these LLMs are so powerful.

00:06:02.120 | And then, very quickly, we realized, yes, there is a role.

00:06:05.120 | Because LLMs don't know everything on the web.

00:06:08.120 | So, for example, if you ask an LLM like GPT-4, find me cool personal sites of engineers in San Francisco.

00:06:13.120 | It can't.

00:06:14.120 | Like, it just doesn't have that in the weights.

00:06:16.120 | It'll apologize, whatever.

00:06:18.120 | And, you know, there's a very simple information theory argument here, where it's like, there literally

00:06:23.120 | isn't enough information in the weights of GPT-4 to store the whole web.

00:06:26.120 | GPT-4 is, well, like, we don't know exactly how many parameters.

00:06:29.120 | I think someone leaked it on YouTube once.

00:06:30.120 | But it's like, oh, you know, a couple trillion parameters.

00:06:32.120 | You could call it, like, less than 10 terabytes in the weights of GPT-4.

00:06:36.120 | And then the internet is, like, over a million terabytes.

00:06:39.120 | And that's just the documents on the web.

00:06:41.120 | There's also images and video.

00:06:42.120 | And that's way more.

00:06:44.120 | Actually, the web, if you look, I did a tweet recently about the size of the web.

00:06:48.120 | And it's in the exabyte range.

00:06:50.120 | And our name is Exa.

00:06:51.120 | It's not a coincidence.

00:06:53.120 | Anyway, so, like, LLMs need to search the web, just from this simple argument.
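
Editor's sketch: the back-of-envelope argument above, written out. The numbers are the speaker's rough figures plus two assumptions of ours: "a couple trillion parameters" is taken as exactly 2 trillion (unconfirmed), and each parameter is assumed to take 2 bytes (16-bit weights). The web figure is the "over a million terabytes" from the talk, which is about one exabyte.

```python
TB = 10**12  # bytes in a terabyte (decimal)

params = 2 * 10**12        # assumption: "a couple trillion" parameters
bytes_per_param = 2        # assumption: 16-bit weights
weights_tb = params * bytes_per_param / TB

web_text_tb = 1_000_000    # speaker's lower bound for documents on the web

ratio = web_text_tb / weights_tb
print(f"weights: {weights_tb} TB")          # weights: 4.0 TB
print(f"web is at least {ratio:,.0f}x larger")  # web is at least 250,000x larger
```

Under these assumptions the weights come to about 4 TB, consistent with the "less than 10 terabytes" quoted in the talk, and the text of the web is at least five orders of magnitude larger.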

00:06:58.120 | And they're going to need to do that for a long time.

00:06:59.120 | Which, if you talk to ML researchers, they'll say the same thing.

00:07:01.120 | It's just, like, it's too hard.

00:07:03.120 | Also, the web is constantly updating.

00:07:04.120 | That's another problem.

00:07:05.120 | It's not just the size of the web.

00:07:06.120 | It's the constant updatingness of the web that makes it very tricky.

00:07:08.120 | So LLMs always will need search.

00:07:10.120 | That's great.

00:07:11.120 | And so when you combine an LLM with a search engine like Exa,

00:07:14.120 | you can handle these queries.

00:07:16.120 | So, like, find me cool personal sites of engineers in SF.

00:07:19.120 | The LLM will search Exa, get a list of personal sites,

00:07:23.120 | and then, like, use that information to output the perfect thing for the user.

00:07:26.120 | You're all very familiar with this.

00:07:28.120 | Like, LLMs plus search, it's obvious now, right?

00:07:30.120 | Like, everyone knows about it.

00:07:32.120 | But now let me tell you a secret about search that most people don't know.

00:07:37.120 | And the secret is that traditional search engines were not built for this world of AI.

00:07:43.120 | Traditional search engines were built for humans.

00:07:45.120 | And humans are very different from AI.

00:07:48.120 | So every search engine, like Google, Bing, you name it, was built in a different era for this kind of creature.

00:07:55.120 | This slow flesh human that's typing keywords and wants to read a few links and really cares about UI of the page and all these things.

00:08:04.120 | Like, it's a lazy human.

00:08:05.120 | They type simple keywords.

00:08:06.120 | Google is great for this creature.

00:08:08.120 | Google was optimized for this creature.

00:08:10.120 | It gives you exactly the kinds of things you would click on.

00:08:13.120 | But AIs are very different.

00:08:15.120 | Like, an AI can gobble up information like crazy.

00:08:18.120 | This is a much slowed down version of what our AIs probably feel like inside.

00:08:23.120 | And so AIs are very different.

00:08:24.120 | They want to use complex queries, not simple ones, to find not a couple links, but just tons of knowledge, as much knowledge as they could get.

00:08:31.120 | Because they actually have the patience to just analyze it all extremely fast.

00:08:34.120 | And so the search algorithm that's optimal for this type of creature is not the same algorithm that's optimal for the human.

00:08:42.120 | Like, that would be crazy if the same algorithm that was optimal for humans was optimal for AIs.

00:08:47.120 | And so, like, a lot of the tools, the search tools that we're talking about these days on Twitter and stuff like that,

00:08:53.120 | they're still using, like, the old traditional search combined with AIs.

00:08:57.120 | It's just not the right puzzle fit.

00:08:59.120 | So, actually, we're really trying to think of, like, what is the right search engine for this AI world?

00:09:04.120 | And so just a few examples we could dive deep into of how AIs are different.

00:09:09.120 | Well, AIs want precise, controllable information.

00:09:12.120 | So, oh, by the way, when I say AI, I'm usually talking about, like, an AI product.

00:09:16.120 | So imagine, like, in this case, like, a VC that's using an AI system to find a list of companies because they want to invest.

00:09:22.120 | So, you know, they're looking for something -- what's the next big thing?

00:09:24.120 | What's the next big thing that feels like Bell Labs?

00:09:26.120 | Well, when they tell their AI what they want, the AI will then go search a search engine, right?

00:09:30.120 | And if it searches a search engine like Google, it'll get a list of results that humans like to click on.

00:09:35.120 | But it's not very information dense, and it doesn't even match what the person asked for -- what the AI asked for.

00:09:39.120 | The AI asked for startups working on something huge that feels like Bell Labs.

00:09:43.120 | It should get a list of startups.

00:09:44.120 | It's kind of a crazy idea, but what if search engines actually returned exactly what you asked of them and not what Google knows you'll click on?

00:09:52.120 | And so, with AIs especially, they just want a search engine that returns exactly what they asked for.

00:09:56.120 | Because what really the world's going to look like is you're going to interact with your AI agent, and you're going to ask for something, and then it's going to make tons of searches.

00:10:02.120 | Like, okay, maybe they want startups working on something similar to Bell Labs.

00:10:06.120 | Maybe they want startups working only in New York City that have this quality and that quality.

00:10:10.120 | And they'll do all sorts of searches, and it just wants a search API that just does what it asks.

00:10:14.120 | And so you need a search engine like that.

00:10:16.120 | So Exa is like that.

00:10:18.120 | Another difference between AIs and humans is AIs want to search with lots of context.

00:10:21.120 | Again, if you have an AI assistant, and you talk to it all day, and then you ask for restaurants or apartments or what have you, the AI has lots of context on you.

00:10:30.120 | So it should be able to search with this large multi-paragraph thing, saying like, you know, my human is a software engineer, and it likes these types of things, and I like these types of things.

00:10:38.120 | And like, can you give me, you know, restaurants that match those preferences?

00:10:42.120 | And so you need a search engine that could literally handle multiple paragraphs of text.

00:10:46.120 | But traditional search engines like Google were not meant to do that because humans would never type in multiple paragraphs because they're too lazy.

00:10:52.120 | So Google was optimized for like simple keyword queries.

00:10:54.120 | So Google, I think, has like a few dozen keyword limit, whereas Exa can handle like multiple paragraphs of text.

00:11:02.120 | Another big one where AIs are different than humans is AIs want comprehensive knowledge.

00:11:06.120 | Like, if you give a human 10,000 links or 10,000 pages, it doesn't know what to do with that.

00:11:11.120 | Like, it would take 10 days of extreme patience to process all that.

00:11:15.120 | But AIs can do it in three seconds if it's parallelized, right?

00:11:18.120 | And so if I'm a VC and I want to report on like all the companies in a space, I want literally all the companies.

00:11:24.120 | And there's a huge amount of value to getting truly all of them and not just like the 10 or 20 that Google is able to find.

00:11:30.120 | And so you need a search engine that exposes the ability to return 1,000, 10,000, whatever it is,

00:11:35.120 | and also has this semantic ability to like, you know, when you say like every startup funded by YC working on AI, you actually can get all of them.

00:11:41.120 | So like Google literally just can't do this at all.

00:11:44.120 | Okay, I hope that through these examples, we see that the space of possible queries is actually like way larger than people realize.

00:11:52.120 | And until like 2022, we were kind of in this like top left blue world.

00:11:57.120 | So this circle is like the space of possible queries and the blues are like, you know, specific subsets of that space.

00:12:03.120 | And so like, we were all in that top left corner of blue for a long time where you could, you know, search engines can handle like basic keyword queries like Stripe pricing or someone's GitHub page or Taylor Swift boyfriend or whatever it is.

00:12:18.120 | After 2022, everyone started to want the top right blue circle where it was like, hey, actually, I want to make queries like explain this concept to me like I'm a five year old or here's my code.

00:12:28.120 | Can you like debug it?

00:12:29.120 | This form of query doesn't require search, but it's another type of query that was introduced to the world in 2022.

00:12:35.120 | And then like, there's other types of queries like the semantic queries like people in San Francisco who know assembly.

00:12:41.120 | As far as I'm aware, Exa kind of introduced this kind of query and does really well on those queries.

00:12:49.120 | And then there's these really complex queries like find me every article that argues X and not Y from an author like Z.

00:12:55.120 | And we're starting to now have systems like Exa's Websets product that could handle these things.

00:12:59.120 | And I think this is actually a huge space because this like turns the web into like a database you could filter however you want.

00:13:05.120 | And that's really what AIs want.

00:13:06.120 | They want this like full control database like query system that they could just get whatever they need for their user.

00:13:12.120 | And then there are the queries that no one has thought of yet.

00:13:14.120 | Like every week we get tons of queries and we're like, oh wait, that's a really interesting type of query that no search engine could do right now.

00:13:21.120 | And eventually we'll try to, you know, handle all the queries that are possible.

00:13:25.120 | But there's so many new types of queries now because we have these AI systems and the stakes like the expectations have just gotten way higher.

00:13:32.120 | So now we end our story with the same slide, one API to get any information from the web.

00:13:39.120 | So again, like Exa is trying to, if you go back, like handle not just like the keyword queries, but also the semantic queries and also the super complex queries and eventually all queries.

00:13:49.120 | We want one API that could like give these AI systems whatever knowledge they want.

00:13:54.120 | You have the AI and you have Exa providing the knowledge.

00:13:58.120 | Oh, I only have four minutes.

00:13:59.120 | Okay.

00:14:00.120 | Okay.

00:14:01.120 | So that's, let's see.

00:14:04.120 | Oop.

00:14:05.120 | How do I go to a different part of my computer?

00:14:09.120 | Hmm.

00:14:14.120 | If I change to the code editor, how do I do that?

00:14:16.120 | Let's see.

00:14:17.120 | What?

00:14:18.120 | Oh, it's there.

00:14:19.120 | Oh, but I can't see it.

00:14:20.120 | That's so weird.

00:14:24.120 | Oh, cool.

00:14:25.120 | Okay.

00:14:26.120 | Okay.

00:14:27.120 | There we go.

00:14:30.120 | Okay.

00:14:31.120 | Cool.

00:14:32.120 | Well, first of all, just a very quick exploration: this is our search dashboard.

00:14:35.120 | We could try different queries.

00:14:36.120 | I'll just point out like in the search API endpoint, you know, we expose lots of different toggles.

00:14:43.120 | So first of all, you just try out a query, and it shows you the code and gets you a list of results.

00:14:50.120 | And it exposes tons of different types of filters that you might want to do.

00:14:53.120 | For example, like number of results, 10, 100, 1,000, whatever it is.

00:14:56.120 | You could have like date ranges or, you know, I only want to search over these domains.

00:14:59.120 | And it's a lot of toggles, but I think the point is actually you want the toggles because your AI is actually going to be calling this.

00:15:04.120 | You want a search engine that gives you full control.

00:15:06.120 | And we have like neural and keyword search.

00:15:09.120 | So you could try different ones.
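
Editor's sketch: one way an agent might assemble the toggles mentioned above (search type, number of results, date range, domain filter) into a request. The class and every field name here are hypothetical and chosen for illustration; they are not taken from Exa's actual API schema, so check the official documentation for the real parameter names.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class SearchRequest:
    """Hypothetical request shape; field names are illustrative only."""
    query: str
    search_type: str = "neural"          # "neural" or "keyword"
    num_results: int = 10
    start_date: Optional[str] = None     # e.g. "2024-01-01"
    end_date: Optional[str] = None
    include_domains: List[str] = field(default_factory=list)

    def to_payload(self) -> dict:
        """Validate the request and drop any filters that were left unset."""
        if self.search_type not in ("neural", "keyword"):
            raise ValueError(f"unknown search type: {self.search_type}")
        if self.num_results < 1:
            raise ValueError("num_results must be positive")
        return {k: v for k, v in asdict(self).items() if v not in (None, [])}

request = SearchRequest(
    query="personal site of engineer in San Francisco who likes information retrieval",
    num_results=100,
)
payload = request.to_payload()
print(payload)
```

Keeping every toggle as an explicit field is what lets an AI caller, rather than a human clicking through a UI, control the search precisely.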

00:15:11.120 | Okay.

00:15:12.120 | Let me quickly jump to the code.

00:15:15.120 | Okay.

00:15:16.120 | So I prepared this like code, agent.py.

00:15:19.120 | So we made this agent, agent Mark.

00:15:22.120 | And Mark loves to make markdown out of things.

00:15:24.120 | Anything you give it, it will make markdown.

00:15:26.120 | And Mark will make markdown.

00:15:27.120 | And so in this case, we're going to here.

00:15:31.120 | Well, I guess in this case, let's try this query.

00:15:36.120 | Personal site of engineer in San Francisco who likes information retrieval.

00:15:40.120 | Well, this is the kind of query that neural would be a lot better at.

00:15:49.120 | Okay.

00:15:50.120 | Save it.

00:15:55.120 | Oh, running the wrong agent.

00:16:00.120 | Okay.

00:16:01.120 | So it's just, it's making a query to get like a list of personal sites of engineers in San Francisco

00:16:04.120 | who like information retrieval.

00:16:05.120 | And Mark, the agent is just making a markdown output of that.

00:16:08.120 | That's a very neural type query.

00:16:10.120 | You also might want to do a different type of query, which is like a more keyword heavy one.

00:16:15.120 | Let's see, like, my GitHub.

00:16:31.120 | So, okay.

00:16:32.120 | So here I would want to make a keyword query.

00:16:34.120 | So you just change the type to keyword

00:16:40.120 | search.

00:16:41.120 | So it's going to get information from my GitHub using keyword search, because this is a very

00:16:44.120 | typical, like, Google-like search that would work well, right?

00:16:46.120 | Oh, God.

00:16:47.120 | I'm running the wrong one.

00:16:49.120 | Okay.

00:16:50.120 | Cool.

00:16:51.120 | That's information about Will Bryk's GitHub.

00:16:54.120 | And then, okay, so when you're actually building an agent, you're going to be combining lots of different types of searches.

00:16:59.120 | So neural searches and keyword searches and all sorts of other searches that Exa exposes.

00:17:04.120 | So, like, the right agent in the future is going to be this system that decides what type of search it needs for whatever the user says.

00:17:12.120 | Like, it'll be like, oh, okay, I'm going to make, like, a neural search to get a list of things.

00:17:15.120 | And then for each one, I'm going to do a keyword search, right?

00:17:17.120 | You want to give the agent, like, just full access to the world's information in whatever way it wants.

00:17:23.120 | Not just keyword search, but also all these other things.

00:17:27.120 | And so here, I one-shotted with o3 a GitHub agent, which combines these two queries.

00:17:33.120 | So, first, it'll -- because, you know, I want to get the GitHub of every engineer in San Francisco who likes information retrieval.

00:17:40.120 | So the agent will make a neural search to get a list of people, extract the names, and then search those using a keyword search to get their GitHubs.
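
Editor's sketch: the two-stage flow just described (a neural search for people, then a keyword search per person) can be outlined as below. This is not the agent shown in the talk. The search functions are injected as plain callables and replaced with in-memory stubs so the sketch runs offline; the result fields `author` and `url`, the names, and the URLs are all hypothetical, and in a real agent the name extraction would likely be done by an LLM.

```python
from typing import Callable, Dict, List, Optional

# A search function takes (query, num_results) and returns a list of result dicts.
SearchFn = Callable[[str, int], List[dict]]

def find_githubs(topic_query: str, neural_search: SearchFn,
                 keyword_search: SearchFn, num_people: int = 10) -> Dict[str, Optional[str]]:
    """Stage 1: neural search for people. Stage 2: keyword search per name."""
    pages = neural_search(topic_query, num_people)
    githubs: Dict[str, Optional[str]] = {}
    for page in pages:
        name = page["author"]  # assumption: each result carries an author name
        hits = keyword_search(f"{name} GitHub", 1)
        githubs[name] = hits[0]["url"] if hits else None
    return githubs

# Offline stubs standing in for real search calls (made-up data).
def fake_neural_search(query: str, num_results: int) -> List[dict]:
    people = [
        {"author": "Ada Example", "url": "https://ada.example"},
        {"author": "Grace Sample", "url": "https://grace.example"},
    ]
    return people[:num_results]

def fake_keyword_search(query: str, num_results: int) -> List[dict]:
    known = {"Ada Example GitHub": [{"url": "https://github.example/ada"}]}
    return known.get(query, [])[:num_results]

result = find_githubs(
    "personal site of engineer in San Francisco who likes information retrieval",
    fake_neural_search,
    fake_keyword_search,
)
print(result)
```

Passing the search functions in as arguments keeps the orchestration logic separate from any particular search provider, so the stubs can be swapped for real API calls without changing `find_githubs`.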

00:17:48.120 | And then if you run that -- here, it's just getting 10 results.

00:17:54.120 | But we could, you know, with Exa, we could do 100 or 1,000 if you're on an enterprise plan.

00:18:00.120 | So now it's getting all the GitHub info.

00:18:07.120 | Cool.

00:18:08.120 | So that's just an example.

00:18:10.120 | And, yeah, I mean, there are lots of other things that you could do with Exa.

00:18:13.120 | Like, we actually just today launched this research endpoint where it will actually do, like,

00:18:19.120 | as many searches and LLM calls as it needs in the background to get you that perfect report or that perfect structured output for the thing you asked for.

00:18:26.120 | So it's kind of like a deep research API.

00:18:28.120 | And it's a state-of-the-art deep research API.

00:18:31.120 | Cool.

00:18:32.120 | That is the talk.

00:18:33.120 | I hope that was interesting.

00:18:34.120 | Thank you.

Building a Smarter AI Agent with Neural RAG - Will Bryk, Exa.ai