
GraphRAG: The Marriage of Knowledge Graphs and RAG: Emil Eifrem



00:00:00.000 | I basically dedicated my professional life towards getting developers to be able to build
00:00:19.440 | better applications and build applications better by leveraging not just individual data points,
00:00:26.700 | kind of retrieved at once, like one at a time, or summed up or group calculated averages,
00:00:33.360 | but individual data points connected by relationships, right? And today I'm going to
00:00:38.520 | talk about that applied in the world of LLMs and Gen AI. So before I do that, though, I'm going to take
00:00:45.820 | a little bit of a detour. I'm going to talk about search, the evolution of search. Everyone here in
00:00:49.920 | this room knows that the vast majority of web searches today are handled with Google, but some
00:00:54.720 | of you know that it didn't start that way. It started this way. Who here recognizes this web page?
00:00:59.440 | Right, yeah. Who here recognizes AltaVista as a name? Like a few people, right? Back in the mid-90s,
00:01:07.440 | there were dozens of web search companies, dozens plural, like 30, 40, 50 web search companies,
00:01:12.600 | and they all used basically the same technology. They used keyword-based text search, inverted index
00:01:19.020 | type search, BM25 like, for those of you who know what that means. And it worked really, really well
00:01:23.740 | until it didn't. And the AltaVista effect kicked in, which was the notion that you searched for something,
00:01:31.740 | you got a thousand or thousands of hits back, and you had to look through page after page
00:01:37.660 | until you found the result that was relevant to you. The AltaVista effect, you got too much back from the internet.
00:01:43.740 | That wasn't a problem in the beginning because most of the things you search for when I went onto the
00:01:48.460 | internet in the beginning got zero results back because there was no content about that on the
00:01:52.860 | internet, right? But the AltaVista effect, too many search results, was solved by Google. This is
00:02:00.220 | Google's press release mid-2000. They talk about a billion URLs they've indexed, right? But they also
00:02:07.260 | talk about the technology that they use behind the scenes, the technology called PageRank that delivers
00:02:13.180 | the most important search results really early on. In fact, the first, the top 10 blue links on that
00:02:19.740 | first page, right? That technology, PageRank, is actually a graph algorithm,
00:02:27.100 | a form of what's called eigenvector centrality, and the innovation that Google did was applying that
00:02:32.140 | to the scale of the internet and the scale of the web, right? PageRank. That ushered in and created,
00:02:39.820 | honestly, the most valuable company on the planet for quite some while, the PageRank era, right? That
00:02:46.620 | lasted for about a decade, about a dozen years, until in 2012, Google wrote this blog post, which is an
00:02:53.740 | amazing blog post introducing the knowledge graph, things, not strings, where they said, you know
00:03:00.460 | what, guys? We've done an upgrade on the back end of our search technology, the biggest one since we
00:03:06.060 | invented PageRank, where we're moving from storing just the text and the links between the
00:03:12.620 | documents to also storing the concepts embedded in those documents, things, not just strings. And we all know
00:03:20.700 | what the knowledge graph looks like visually. When you search for something on Google today, Moscone Center,
00:03:27.100 | just around the corner from here, you're going to get this little panel right on the right hand side.
00:03:32.460 | If you look at that panel, it has a combination of unstructured text, in this case from Wikipedia,
00:03:38.860 | with structured text, it has the address, the owner of the Moscone building, you know, that kind of stuff.
00:03:45.100 | This thing is backed on the back end by the data structure looking like this, right? It has these
00:03:52.060 | concepts, the rings, that we call nodes, that are connected to other nodes through relationships,
00:03:57.900 | and both the nodes and the relationships have key value properties. You can attach two, three, a thousand,
00:04:04.140 | ten thousand on both the nodes, and very importantly, also on the relationships. This is a knowledge graph,
00:04:11.340 | and that was the next decade or so, 12 years, of Google's dominance. Until a few months ago. A few
00:04:19.580 | months ago, at Google I/O, they took the next step, ushered in by the AI engineers conference a year ago.
00:04:27.340 | Well, not quite, but of course, the entire craze around Gen AI. And this is one of the examples that they did,
00:04:32.860 | the classic travel itinerary. They helped me plan out this travel. Everyone here in this room knows
00:04:38.380 | that this is backed by an LLM. And it is backed by an LLM in combination with this knowledge graph data
00:04:45.660 | structure, GraphRAG. This is ushering in the next era of web search, the GraphRAG era. What I'm going to
00:04:55.100 | talk to you about today is how can you use, well, first of all, should you, and if so, how can you
00:05:01.020 | use GraphRAG for your own RAG-based applications? So what is GraphRAG, right? It is very, very simple.
00:05:09.980 | GraphRAG is RAG where on the retrieval path you use a knowledge graph. Very, very simple. It doesn't say
00:05:18.620 | you only use a knowledge graph, but you use a knowledge graph. Maybe in combination with other
00:05:23.020 | technologies like vector search. So let's take the classic example of a customer service bot,
00:05:29.020 | right? And let's say that you are working at a company that is building Wi-Fi routers, for example,
00:05:36.460 | right? And you have a bunch of support articles, right? And they've been stored in text files, right? And then
00:05:43.340 | you are tasked with building a bot that either gives direct end users access to it or your own
00:05:49.100 | customer service agents, employees, like access to this information. And you know how to do this because
00:05:55.340 | you live in the LLM world, in the Gen AI world, so you're going to use RAG for this, right? And so you
00:05:59.900 | have that data. It's text documents. You've added that text onto the properties of particular nodes,
00:06:07.500 | right? So you have a node per article. But then you've also said that, you know what,
00:06:11.660 | this article is about this particular Wi-Fi product, right? So you have a relation to that Wi-Fi product.
00:06:17.500 | And that Wi-Fi product sits in a hierarchy of other Wi-Fi products. And it's written by this particular
00:06:23.500 | customer service engineer, you know, that kind of stuff. And then the end user has a question.
00:06:28.380 | Hey, my Wi-Fi lights are flashing yellow and my connection drops. Like, what should I do? Something like that.
00:06:35.340 | I think we all know how we do this. We vectorize the search, right? We get some kind of vector embedding back.
00:06:41.340 | We use vector search to get the core documents. But here's where the GraphRAG part kicks in.
00:06:46.780 | You get those core articles back, which are linked to the nodes. Actually, the text is on the nodes.
00:06:52.140 | But then you use the graph to traverse from there and retrieve more context around it. Maybe it's not
00:06:58.380 | just that particular article for that particular Wi-Fi, but something else in that family. Maybe you use the
00:07:05.180 | fact that this particular engineer has very highly ranked content. And then you rank that higher,
00:07:10.620 | right? You retrieve more context than what you get out of the ANN-based search from your vector store.
00:07:17.900 | And you pass that on to the LLM. Along with the question, you get an answer back, and you hand it
00:07:22.620 | to the user. So the core pattern is actually really, really simple, but really, really powerful.
00:07:30.060 | All right? You start with doing a vector search. I think of this almost as a primary key. It's,
00:07:36.220 | of course, not a primary key, but almost like a primary key lookup into the graph. You use that vector
00:07:40.940 | search. You get an initial set of nodes. Then you walk the graph, and you expand that and find relevant
00:07:48.780 | content based on the structure of the graph. Then you take that and you return it to the LLM. Or optionally,
00:07:55.100 | maybe that gives you 1,000 or 10,000 nodes back. And then you do what Google did. You rank that. You get the top
00:08:02.460 | K based on the structure of the graph. Maybe you even use PageRank, right? You get that. You pass it on to the LLM.
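
The core pattern described above (vector search for seed nodes, graph expansion, optional structural ranking, then hand-off to the LLM) can be sketched in plain Python. This is a minimal illustration, not the speaker's or Neo4j's implementation: the toy nodes, edges, embeddings, and every function name here are hypothetical, and a real system would use a graph database and an embedding model rather than in-memory dictionaries.

```python
import math

# Hypothetical toy property graph for the Wi-Fi support example.
# Article nodes carry their text and a made-up 3-dimensional embedding.
NODES = {
    "article:1": {"label": "Article", "text": "Yellow flashing light: the router lost its uplink.",
                  "embedding": [0.9, 0.1, 0.0]},
    "article:2": {"label": "Article", "text": "Updating firmware on the X200 family.",
                  "embedding": [0.2, 0.9, 0.1]},
    "article:3": {"label": "Article", "text": "Setting up parental controls.",
                  "embedding": [0.0, 0.1, 0.9]},
    "product:x200": {"label": "Product", "name": "X200"},
    "engineer:ana": {"label": "Engineer", "name": "Ana"},
}
EDGES = [
    ("article:1", "ABOUT", "product:x200"),
    ("article:2", "ABOUT", "product:x200"),
    ("article:1", "WRITTEN_BY", "engineer:ana"),
    ("article:3", "WRITTEN_BY", "engineer:ana"),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_embedding, k):
    """Step 1: brute-force nearest neighbours stand in for an ANN index."""
    scored = [(cosine(query_embedding, props["embedding"]), node_id)
              for node_id, props in NODES.items() if "embedding" in props]
    scored.sort(reverse=True)
    return [node_id for _, node_id in scored[:k]]

def neighbors(node_id):
    """Neighbours of a node, ignoring relationship direction."""
    out = set()
    for src, _rel, dst in EDGES:
        if src == node_id:
            out.add(dst)
        elif dst == node_id:
            out.add(src)
    return out

def expand(seeds, hops):
    """Step 2: walk the graph outward from the seed nodes."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        frontier = {n for f in frontier for n in neighbors(f)} - seen
        seen |= frontier
    return seen

def pagerank(node_ids, damping=0.85, iterations=50):
    """Step 3 (optional): PageRank by power iteration over the retrieved subgraph."""
    order = sorted(node_ids)
    adj = {v: sorted(neighbors(v) & node_ids) for v in order}
    n = len(order)
    rank = {v: 1.0 / n for v in order}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in order}
        for v in order:
            targets = adj[v] or order  # a node with no links spreads its rank evenly
            share = damping * rank[v] / len(targets)
            for u in targets:
                nxt[u] += share
        rank = nxt
    return rank

def graph_rag_retrieve(query_embedding, seed_k=1, hops=2, top_k=2):
    """Return (node_id, text) pairs to place in the LLM prompt as context."""
    seeds = vector_search(query_embedding, seed_k)
    subgraph = expand(seeds, hops)
    rank = pagerank(subgraph)
    articles = [n for n in subgraph if NODES[n]["label"] == "Article"]
    # Seed hits first, then the remaining articles by structural rank.
    articles.sort(key=lambda n: (n not in seeds, -rank[n], n))
    return [(n, NODES[n]["text"]) for n in articles[:top_k]]

context = graph_rag_retrieve([1.0, 0.0, 0.0])
```

Here the query embedding is closest to the "yellow flashing light" article, and the two-hop walk pulls in other articles about the same product or by the same engineer, which plain vector search would not have returned.
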
00:08:08.700 | It's really, really simple, but really, really powerful. And then there's a number of more advanced
00:08:14.380 | patterns, but that's kind of the next talk I'll do in a year. It's more sophisticated graph retrieval
00:08:20.780 | patterns, right? But the core one, very, very simple. Okay. So if that's what GraphRAG is,
00:08:29.420 | what are the benefits of GraphRAG? When should you use it? When should you not use it? The first and most
00:08:35.340 | stark benefit is accuracy. It's directly correlated to the quality of the answer. There's been a ton
00:08:42.300 | of research articles about this in the last six months, something like that. I believe the first
00:08:47.900 | one was this one by data.world. I just picked out three at random here that I like. This is the first
00:08:54.460 | one that I know of by data.world, which is a data cataloging company based on a knowledge graph. And they
00:09:00.300 | proved out across, I think, 43 different questions that on average, the response quality, the accuracy,
00:09:08.060 | was three times higher if they use a knowledge graph in combination with vector search.
00:09:14.780 | I love this paper by LinkedIn. It shows a very similar type. I think it's like 75% or 77%
00:09:22.540 | increase in accuracy. But it also has a great architecture view. So you can take the QR code
00:09:28.940 | right there and look at that paper, which combines various components and also the flow through that,
00:09:34.620 | that I thought was just really pedagogical. But by and large, it's showing the same thing,
00:09:40.220 | a little bit of different numbers, but significantly higher accuracy when it used graph in combination
00:09:45.900 | with vector search. And then Microsoft had a fantastic blog post and subsequently, I think,
00:09:51.900 | two academic papers. The blog post was in February of this year, where they also talk about the increased
00:09:58.300 | quality of response. But also beyond that, hey, you know what? GraphRAG enables us to answer another
00:10:05.900 | important class of questions that we couldn't even do with vector search alone or baseline
00:10:11.260 | vector search or baseline RAG alone. So first benefit, higher quality response back.
00:10:17.740 | The second one is easier development. And this one is a little bit interesting because there's an
00:10:24.540 | asterisk in there. Because what we hear very clearly from our users is that it's easier to build RAG
00:10:30.220 | applications with GraphRAG compared to baseline RAG. But we also hear it's like, it's actually hard.
00:10:36.380 | And what's the nuance there? Well, the nuance is if you already have a knowledge graph up and running.
00:10:42.220 | So there's a learning curve where people need to learn, how do I create the knowledge graph in the
00:10:46.300 | first place? Once you have that, it's a lot easier. But how do you create that knowledge graph,
00:10:52.380 | right? So let's put a little pin in that. If I rush through the next few slides quickly enough,
00:10:57.580 | I'm going to show you hopefully a demo on that. But let's put a little pin in that. So this is an
00:11:02.860 | example. This is from a very high growth stage fintech company that is very cutting edge in AI.
00:11:12.300 | And they started playing around with GraphRAG about six months ago. And they took an existing
00:11:17.500 | application and they said, you know what? We are going to port this from a vector database to Neo4j,
00:11:23.500 | and most of the operations yield a better result. They can calculate the embeddings on a database
00:11:28.460 | level. Getting related actions is as simple as following the relationships between nodes.
00:11:33.260 | And this one I love. The cache, and the cache here is their application, they call it the cache,
00:11:39.100 | can be visualized. This is an extremely valuable debugging tool. And in the parentheses, it actually
00:11:45.420 | says, I've already fixed a couple of bugs just thanks to this. Right? Amazing. Like once you've
00:11:51.180 | been able to create that graph, it's a lot easier to build your RAG application.
00:11:55.180 | And why is that? Right? So let's talk a little bit about representation. Let's say we have the phrase
00:12:03.980 | in there, apples and oranges are both fruit. And we want to represent that in vector space and in graph
00:12:10.060 | space. In graph space, we already talked about this. Apple is a fruit. Orange is a fruit. Pretty easy.
00:12:16.700 | That's the representation in graph space. In vector space, it looks like this. Maybe. Or maybe this is
00:12:24.780 | something else like we actually don't know. Two different ways of representing that phrase.
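
To make the contrast concrete, here is a small sketch of the same phrase in both representations. The relationship name IS_A, the helper functions, and the four-number vectors are all made up for illustration; a real embedding would come from a model and have hundreds or thousands of dimensions.

```python
import math

# Graph space: explicit, human-readable facts stored as
# (subject, relationship, object) triples.
triples = [
    ("Apple", "IS_A", "Fruit"),
    ("Orange", "IS_A", "Fruit"),
]

def things_that_are(kind):
    """Answer 'what is a <kind>?' by reading the relationships directly."""
    return sorted(s for s, rel, o in triples if rel == "IS_A" and o == kind)

# Vector space: the phrase becomes a list of numbers. These values are
# invented; nothing about them tells a human what the phrase says.
phrase_vector = [0.12, -0.48, 0.33, 0.80]
similar_phrase_vector = [0.10, -0.50, 0.30, 0.79]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

fruits = things_that_are("Fruit")
similarity = cosine_similarity(phrase_vector, similar_phrase_vector)
```

The graph lookup returns names you can read and check, while the vector comparison returns a single score whose inputs cannot be inspected by eye.
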
00:12:29.500 | And then we can run similarity calculations in different ways using both of these representations that
00:12:36.540 | I'm not going to go through right now. We can search in different ways. These are not competing ways of
00:12:42.940 | doing it. They're complementary ways of doing it. Right? One is not better than the other. Except I will
00:12:47.980 | make one statement. Which is, when you sit down and you write your application, when you build your
00:12:54.860 | application, I'm actually going to make the statement that one of them is superior. This vector space
00:13:00.540 | representation is completely opaque to a human being. But the graph representation is very, very clear.
00:13:08.220 | It is explicit. It's deterministic. It's visual. You can see it. You can touch it as you build your
00:13:14.620 | applications. This is the, I already fixed a couple of bugs thanks to this. Just by porting it from a
00:13:20.460 | vector-only store to GraphRAG, they were able to see and work with the data. And that is really freaking
00:13:27.100 | powerful. That shows up in development time as you're building your applications. It's also showing up for
00:13:34.540 | our friends in IT who worry about things maybe that is not directly related to building the application,
00:13:41.580 | which is explainability, which is auditability, which is governance. That explicit data structure
00:13:50.060 | has knock-on effects over there that are really, really powerful. Once you're up and running in
00:13:55.740 | production, you need to be able to explain why something happened.
00:13:59.340 | So, higher accuracy, better answers, easier to build once you're through the hump of creating the
00:14:08.140 | knowledge graph, and then increased explainability and governance for IT and the business, right?
00:14:13.900 | Those are the three things. So, how do you get started with GraphRAG? Well, I've talked a lot about
00:14:20.940 | this already. Like, how do you create the knowledge graph in the first place? So, a little bit of nuance here.
00:14:26.220 | So, basically, there are three types of data out in the world that I care about when I think about
00:14:31.260 | knowledge graph creation. The first one is structured data. So, this is your data in your Snowflake or
00:14:36.780 | something like that or Postgres, right? The other one is unstructured data, PDF files, raw text from a web
00:14:43.500 | page. And the other one, the third one is mixed. People tend to call this semi-structured, but it's
00:14:49.180 | not. Hit me up afterwards and I'll tell you why it's not. But basically, what this one is is structured data
00:14:54.060 | where some of the fields are long form text, right? Basically, we're great in the first bucket in the
00:15:01.980 | graph world. It's very easy to go from Snowflake or Postgres or MySQL or Oracle into a property graph
00:15:08.540 | model. The unstructured one is really freaking hard, right? It's hard to do in theory. It's
00:15:16.700 | also had immature tooling for a long time. The middle one is actually where the majority of at least
00:15:24.300 | enterprise production use cases are in the real world.
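
As a rough sketch of the mixed case, the snippet below turns rows that combine structured columns with a long-form text field into a small property graph. The column names, labels, and relationship types are hypothetical and chosen to match the Wi-Fi support example from earlier in the talk; this is plain Python, not the API of any particular graph database.

```python
# Hypothetical rows, as they might come out of a relational table:
# structured columns plus one long-form text column.
rows = [
    {"article_id": 1, "product": "X200", "author": "Ana",
     "body": "If the light flashes yellow, check the uplink cable first..."},
    {"article_id": 2, "product": "X200", "author": "Ben",
     "body": "To update the firmware, open the admin page and..."},
]

def rows_to_property_graph(rows):
    """Structured columns become nodes and relationships; the long text
    stays as a property on the article node, ready to be embedded later."""
    nodes, edges = {}, []
    for row in rows:
        article = f"article:{row['article_id']}"
        product = f"product:{row['product']}"
        author = f"engineer:{row['author']}"
        nodes[article] = {"label": "Article", "text": row["body"]}
        # setdefault keeps one node per distinct product or author.
        nodes.setdefault(product, {"label": "Product", "name": row["product"]})
        nodes.setdefault(author, {"label": "Engineer", "name": row["author"]})
        edges.append((article, "ABOUT", product))
        edges.append((article, "WRITTEN_BY", author))
    return nodes, edges

nodes, edges = rows_to_property_graph(rows)
```

The two rows share one product node, so the graph ends up with five nodes and four relationships, and both articles become reachable from each other through the product.
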
00:15:29.420 | So, man, two and a half minutes. This is rough. There are two types of graphs and I'm not going
00:15:35.180 | to talk about them. I want to talk about them. Lexical graphs and domain graphs is actually really
00:15:39.260 | relevant, but I really want to get to this demo. So, I've talked about creating graphs with unstructured
00:15:46.060 | information. So, we just built this new tool that we launched just a few weeks ago called the Knowledge
00:15:51.340 | Graph Builder. And you see it here. Can you see the screen? Okay. So, basically, here you can drag
00:15:58.380 | and drop your PDF files. You can put in YouTube links, Wikipedia links. You can point it to your
00:16:04.220 | kind of cloud service buckets, right? And it's going to extract the data from there and create the graph.
00:16:09.500 | So, here I added a few things. I added a PDF of Andrew Ng's newsletter, The Batch. I added the
00:16:17.740 | Wikipedia page for OpenAI. And I added the YouTube from swyx and Alessio's, you know, the Four Wars Latent
00:16:24.860 | Space Podcast. So, I added all that and I uploaded it into this Knowledge Graph Builder. And when I do
00:16:33.100 | that, it creates if -- let's see here. I knew the Ethernet connection was going to do it. It automagically
00:16:42.380 | created a little Knowledge Graph. If it renders -- wait for it. It says one minute here.
00:16:50.620 | So, it better render pretty soon. All right. Let me do this again. Please work.
00:17:00.220 | And I was like trying to keep it alive in the -- in the thing, too. All right.
00:17:18.700 | Okay. Let's see. I think we are here. And then it says show me a graph. And it's not going to show me the
00:17:36.940 | graph. Oh, yeah, it will. Come on. You can do it. All right. Yes.
00:17:45.180 | So, what we have here -- check this shit out. I would love to sit here and just drink in your applause,
00:17:51.340 | but we need to look at this data. So, check this out. This is the document, the Four Wars document.
00:17:57.100 | Here are the various chunks. And then you can take a chunk and you can expand that. This, I put in the
00:18:02.700 | embedding. And you can -- I'll zoom out here. And you can see that it takes the logical concept
00:18:12.060 | elements out of that chunk. Right? Machine learning. They talk about something that is developed in a
00:18:17.340 | similar fashion. I don't even know. There's some company there. Right? And you get that entire graph
00:18:23.180 | of all this information. On top of that, I really don't have time to show it. But there's also -- I really
00:18:30.540 | don't have time to show it. There's a chat bot in here that you can use. And you can introspect the
00:18:35.660 | result that gets back. I'll -- one more second. Take up your phones. If you think this looks cool,
00:18:41.740 | take a photo of this QR code. And you're going to have an amazing landing page where you have access to
00:18:48.460 | all of this information. You can get up and running yourself. Thank you for the additional minutes.
00:18:52.620 | Thank you, Emil. Thanks, Emil. Thanks, everyone, for paying attention.