GraphRAG: The Marriage of Knowledge Graphs and RAG: Emil Eifrem

I basically dedicated my professional life to getting developers to build better applications, and to build applications better, by leveraging not just individual data points, retrieved one at a time or summed up or aggregated into averages, but individual data points connected by relationships. And today I'm going to talk about that applied in the world of LLMs and Gen AI. Before I do that, though, I'm going to take a little bit of a detour and talk about search, the evolution of search. Everyone here in this room knows that the vast majority of web searches today are handled by Google, but some of you know that it didn't start that way. It started this way. Who here recognizes this web page?
Right, yeah. Who here recognizes AltaVista as a name? A few people, right? Back in the mid-90s, there were dozens of web search companies, dozens plural, like 30, 40, 50 of them, and they all used basically the same technology: keyword-based text search over an inverted index, BM25-like, for those of you who know what that means. And it worked really, really well, until it didn't. The AltaVista effect kicked in, which was the notion that you searched for something, got a thousand or thousands of hits back, and had to look through page after page until you found the result that was relevant to you. The AltaVista effect: you got too much back from the internet.
That wasn't a problem in the beginning, because most of the things I searched for when I first went onto the internet got zero results back; there was simply no content about them yet. But the AltaVista effect, too many search results, was solved by Google. This is Google's press release from mid-2000. They talk about the billion URLs they've indexed, but they also talk about the technology they use behind the scenes, a technology called PageRank, that surfaces the most important search results early on, in fact in the top ten blue links on that first page. That technology, PageRank, is actually a graph algorithm, a variant of what's called eigenvector centrality, and Google's innovation was applying it at the scale of the web. PageRank ushered in and created, honestly, the most valuable company on the planet for quite a while: the PageRank era. That lasted for about a decade, about a dozen years, until in 2012 Google wrote an amazing blog post introducing the knowledge graph: things, not strings. They said: you know what, guys? We've done an upgrade on the back end of our search technology, the biggest one since we invented PageRank. We're moving from storing just the text and the links between documents to also storing the concepts embedded in those documents. Things, not just strings. And we all know what the knowledge graph looks like visually. When you search on Google today for, say, the Moscone Center, just around the corner from here, you get a little panel on the right-hand side. That panel combines unstructured text, in this case from Wikipedia, with structured data: the address, the owner of the Moscone Center, that kind of stuff.
On the back end, this is backed by a data structure that looks like this. It has these concepts, the rings, which we call nodes, connected to other nodes through relationships, and both the nodes and the relationships can have key-value properties. You can attach two, three, a thousand, ten thousand properties to the nodes and, very importantly, also to the relationships. This is a knowledge graph, and that powered the next decade or so, 12 years, of Google's dominance. Until a few months ago. A few months ago, at Google I/O, they took the next step, ushered in by the AI Engineer conference a year ago.
Well, not quite, but of course the entire craze around Gen AI played its part. And this is one of the examples they showed: the classic travel itinerary. Help me plan out this trip. Everyone here in this room knows that this is backed by an LLM, and it is backed by an LLM in combination with this knowledge graph data structure: GraphRAG. This is ushering in the next era of web search, the GraphRAG era. What I'm going to talk to you about today is how you can use, well, first of all whether you should, and if so, how you can use GraphRAG for your own RAG-based applications. So what is GraphRAG? It is very, very simple.
GraphRAG is RAG where, on the retrieval path, you use a knowledge graph. Very, very simple. It doesn't say you use only a knowledge graph, but you use a knowledge graph, maybe in combination with other technologies like vector search. Let's take the classic example of a customer service bot. Say you're working at a company that builds Wi-Fi routers, and you have a bunch of support articles stored in text files. You're tasked with building a bot that gives either direct end users or your own customer service agents access to this information. And you know how to do this, because you live in the LLM world, in the Gen AI world, so you're going to use RAG. So you have that data, text documents. You've added the text onto the properties of particular nodes, one node per article. But you've also said: you know what, this article is about this particular Wi-Fi product, so you have a relationship to that product. That product sits in a hierarchy of other Wi-Fi products. The article was written by this particular customer service engineer, that kind of stuff. And then the end user has a question: hey, my Wi-Fi lights are flashing yellow and my connection drops, what should I do? Something like that.
I think we all know how we start. We vectorize the question, get a vector embedding back, and use vector search to get the core documents. But here's where the GraphRAG part kicks in. You get those core articles back, which are the nodes; the text is stored on the nodes. Then you use the graph to traverse from there and retrieve more context around them. Maybe it's not just that particular article for that particular Wi-Fi product, but something else in the same product family. Maybe you use the fact that this particular engineer writes highly ranked content, and you rank their articles higher. You retrieve more context than what you get out of the ANN-based search from your vector store, you pass that on to the LLM along with the question, you get an answer back, and you hand it to the user. So the core pattern is actually really, really simple, but really, really powerful.
All right. You start by doing a vector search. I think of this almost as a primary key lookup into the graph; it's of course not a primary key, but it behaves almost like one. That vector search gives you an initial set of nodes. Then you walk the graph, expanding from those nodes to find relevant content based on the structure of the graph. Then you return that to the LLM. Or optionally, maybe the expansion gives you 1,000 or 10,000 nodes back, and then you do what Google did: you rank them and take the top k based on the structure of the graph, maybe you even use PageRank. You take that result and pass it on to the LLM.
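The three steps just described can be sketched in a few lines of Python. Everything here is a toy stand-in: hand-made two-dimensional "embeddings" and a plain dict as the graph, where a real system would use an embedding model, a vector index, and a graph database such as Neo4j.

```python
import math

# Toy GraphRAG retrieval sketch. All data and numbers are made up for
# illustration; node names mirror the Wi-Fi support-bot example.

# "Vector store": article id -> embedding (hand-made 2-d vectors here).
EMBEDDINGS = {
    "article_router_lights": [0.9, 0.1],
    "article_firmware":      [0.8, 0.3],
    "article_billing":       [0.1, 0.9],
}

# "Knowledge graph": node id -> related node ids (product family, etc.).
GRAPH = {
    "article_router_lights": ["article_firmware", "product_wifi6_family"],
    "article_firmware":      ["product_wifi6_family"],
    "article_billing":       [],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def graph_rag_retrieve(query_vec, k=1, hops=1):
    # Step 1: vector search gives the entry points ("primary key" into the graph).
    ranked = sorted(EMBEDDINGS, key=lambda n: cosine(query_vec, EMBEDDINGS[n]),
                    reverse=True)
    frontier = ranked[:k]
    # Step 2: walk the graph to expand context around the entry points.
    context = set(frontier)
    for _ in range(hops):
        context |= {nbr for node in frontier for nbr in GRAPH.get(node, [])}
        frontier = list(context)
    # Step 3: return the expanded context; a real system would rank it
    # (possibly with PageRank) and pass the top results to the LLM.
    return context

hits = graph_rag_retrieve([1.0, 0.0])
```

The point of the sketch is the shape of the flow: the vector search finds one good entry node, and the graph walk pulls in the related firmware article and the product family that plain ANN search would have missed.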
It's really, really simple, but really, really powerful. There are a number of more advanced graph retrieval patterns, but that's the next talk, the one I'll do in a year. The core one is very, very simple. Okay. So if that's what GraphRAG is, what are the benefits? When should you use it, and when shouldn't you? The first and most stark benefit is accuracy; it's directly correlated with the quality of the answer. There have been a ton of research articles about this in the last six months or so, and I picked out three that I like. I believe the first one was by data.world, a data cataloging company built on a knowledge graph. They showed, across I think 43 different questions, that on average the response quality, the accuracy, was three times higher when they used a knowledge graph in combination with vector search.
I love this paper by LinkedIn. It shows a very similar result, I think a 75% or 77% increase in accuracy. But it also has a great architecture view, so you can take the QR code right there and look at the paper, which lays out the various components and the flow through them in a way I thought was really pedagogical. By and large it shows the same thing, with slightly different numbers: significantly higher accuracy when using the graph in combination with vector search. And then Microsoft had a fantastic blog post and subsequently, I think, two academic papers. The blog post, from February of this year, also talks about the increased quality of responses, but goes beyond that: GraphRAG enables us to answer another important class of questions that we couldn't handle with vector search alone, with baseline RAG alone. So, first benefit: higher-quality responses.
The second benefit is easier development, and this one is interesting because there's an asterisk on it. What we hear very clearly from our users is that it's easier to build RAG applications with GraphRAG than with baseline RAG. But we also hear that it's actually hard. What's the nuance? The nuance is that it's easier if you already have a knowledge graph up and running. There's a learning curve where people need to learn how to create the knowledge graph in the first place. Once you have that, it's a lot easier. But how do you create that knowledge graph? Let's put a little pin in that; if I rush through the next few slides quickly enough, I'll hopefully show you a demo of exactly that. So here's an example. This is from a very high-growth fintech company that is very cutting-edge in AI. They started playing around with GraphRAG about six months ago, took an existing application, and said: we're going to port this from a vector database to Neo4j. And most of the operations yield a better result. They can calculate the embeddings at the database level. Getting related actions is as simple as following the relationships between nodes.
And this one I love: the cache, which is what they call their application, can be visualized. This is an extremely valuable debugging tool, and in the parentheses they actually said they've already fixed a couple of bugs just thanks to this. Amazing. Once you've been able to create that graph, it's a lot easier to build your RAG application.
And why is that? Let's talk a little bit about representation. Say we have the phrase "apples and oranges are both fruit," and we want to represent it in vector space and in graph space. In graph space, we already talked about this: Apple is a fruit, Orange is a fruit. Pretty easy. That's the representation in graph space. In vector space, it looks like this. Maybe. Or maybe it's something else; we actually don't know. Two different ways of representing that phrase.
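To make the contrast concrete, here is the same phrase in both forms. The triples are the actual content of the graph view; the embedding numbers are made-up stand-ins for whatever an embedding model would produce.

```python
# The same phrase, "apples and oranges are both fruit", in two representations.

# Graph space: explicit, human-readable triples you can see and touch.
graph_repr = [
    ("Apple",  "IS_A", "Fruit"),
    ("Orange", "IS_A", "Fruit"),
]

# Vector space: an opaque list of floats (made-up values here; a real
# embedding typically has hundreds or thousands of dimensions).
vector_repr = [0.12, -0.83, 0.44, 0.07]

# The graph answers "what is an Apple?" deterministically by pattern match;
# the vector can only answer "what is nearby?" by similarity.
kinds = [obj for (subj, rel, obj) in graph_repr
         if subj == "Apple" and rel == "IS_A"]
```

The asymmetry is the point the talk makes next: you can read, debug, and audit `graph_repr` directly, while `vector_repr` only becomes meaningful through similarity calculations.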
We can run similarity calculations in different ways using both of these representations, which I'm not going to go through right now, and we can search in different ways. These are not competing approaches; they're complementary. One is not better than the other. Except I will make one statement: when you sit down and build your application, one of them is superior. The vector-space representation is completely opaque to a human being, but the graph representation is very, very clear. It is explicit, it's deterministic, it's visual. You can see it, you can touch it as you build your application. That's the "we've already fixed a couple of bugs thanks to this." Just by porting from a vector-only store to GraphRAG, they were able to see and work with the data, and that is really freaking powerful. It shows up in development time as you build your applications. It also shows up for our friends in IT, who worry about things that maybe aren't directly related to building the application: explainability, auditability, governance. That explicit data structure has knock-on effects over there that are really, really powerful. Once you're up and running in production, you need to be able to explain why something happened.
So: higher accuracy and better answers; easier development once you're through the hump of creating the knowledge graph; and increased explainability and governance for IT and the business. Those are the three benefits. So how do you get started with GraphRAG? Well, I've circled this question already: how do you create the knowledge graph in the first place? A little bit of nuance here. Basically, there are three types of data out in the world that I care about when I think about knowledge graph creation. The first is structured data: your data in Snowflake or Postgres, something like that. The second is unstructured data: PDF files, raw text from a web page. The third is mixed. People tend to call this semi-structured, but it's not; hit me up afterwards and I'll tell you why. Basically, it's structured data where some of the fields are long-form text. We're great at the first bucket in the graph world; it's very easy to go from Snowflake or Postgres or MySQL or Oracle into a property graph model. The unstructured one is really freaking hard. It's hard in theory, and the tooling was immature for a long time. The middle one is actually where the majority of, at least, enterprise production use cases are in the real world.
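A minimal sketch of why the structured bucket is the easy one: each row becomes a node, and each foreign key becomes an explicit relationship. The table and column names here are hypothetical, not from any system mentioned in the talk.

```python
# Toy mapping from relational rows to a property graph model.
# Hypothetical tables: products with a foreign key into product families.

rows = [
    {"id": 1, "name": "WiFi Router X200", "family_id": 10},
    {"id": 2, "name": "WiFi Router X300", "family_id": 10},
]
families = [{"id": 10, "name": "X Series"}]

nodes = []          # (label, id, properties)
relationships = []  # (src_label, src_id, type, dst_label, dst_id)

for fam in families:
    nodes.append(("ProductFamily", fam["id"], {"name": fam["name"]}))

for row in rows:
    nodes.append(("Product", row["id"], {"name": row["name"]}))
    # The foreign key becomes an explicit, traversable relationship.
    relationships.append(
        ("Product", row["id"], "IN_FAMILY", "ProductFamily", row["family_id"])
    )
```

Because the schema already encodes the structure, the translation is mechanical; that is exactly what's missing in the unstructured case.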
So, man, two and a half minutes. This is rough. There are two types of graphs, lexical graphs and domain graphs, and they're really relevant, but I'm not going to talk about them because I really want to get to this demo. I've talked about creating graphs from unstructured information. We just built and launched a new tool a few weeks ago called the Knowledge Graph Builder, and you see it here. Can you see the screen? Okay. Basically, you can drag and drop your PDF files, put in YouTube links or Wikipedia links, or point it at your cloud storage buckets, and it's going to extract the data from there and create the graph.
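Under the hood, the flow from unstructured text to a graph can be sketched roughly like this. It is only a toy: real builders chunk text far more carefully and use an LLM to extract entities, whereas this stand-in simply promotes capitalized words to concept nodes so the example runs.

```python
# Toy sketch of a knowledge-graph-builder pipeline for unstructured text:
# split a document into chunks, then attach extracted concepts to each chunk
# (a "lexical graph" shape: document -> chunks -> mentioned concepts).

def build_lexical_graph(doc_id, text, chunk_size=40):
    # Naive fixed-size chunking; real tools split on sentences/tokens.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    edges = []
    for n, chunk in enumerate(chunks):
        chunk_id = f"{doc_id}#chunk{n}"
        edges.append((doc_id, "HAS_CHUNK", chunk_id))
        # Stand-in "entity extraction": capitalized words become concepts.
        # A real builder would ask an LLM for entities and relationships.
        for word in chunk.split():
            token = word.strip(".,")
            if token[:1].isupper():
                edges.append((chunk_id, "MENTIONS", token))
    return edges

edges = build_lexical_graph("doc1", "OpenAI trains models. Neo4j stores graphs.")
```

The resulting edge list is exactly the kind of structure the demo visualizes: document nodes linked to chunk nodes, which are linked to the concepts they mention.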
So, here I added a few things: a PDF of Andrew Ng's newsletter, The Batch; the Wikipedia page for OpenAI; and the YouTube video of swyx and Alessio's "Four Wars" episode of the Latent Space podcast. I uploaded all of that into the Knowledge Graph Builder, and when I do that, it automagically creates a little knowledge graph. Let's see here; I knew the Ethernet connection was going to do this. It says one minute, so it had better render pretty soon. All right, let me do this again. Please work. Okay, I think we are here. Now show me the graph. Come on, you can do it. All right. Yes.
So, what we have here, check this out. I would love to sit here and just drink in your applause, but we need to look at this data. This is the document, the Four Wars document. Here are the various chunks, and you can take a chunk and expand it; this part is the embedding. I'll zoom out here, and you can see that it takes the logical concepts out of that chunk: machine learning, something developed in a similar fashion, some company there. You get that entire graph of all this information. On top of that, and I really don't have time to show it, there's a chatbot in here that you can use, and you can introspect the results it gets back. One more second: take out your phones. If you think this looks cool, take a photo of this QR code, and you'll get a landing page with access to all of this information so you can get up and running yourself. Thank you for the additional minutes.
Thank you, Emil. Thanks, everyone, for paying attention.