back to index

Agentic GraphRAG: AI’s Logical Edge — Stephen Chin, Neo4j


Whisper Transcript | Transcript Only Page

00:00:00.040 | My name is Steven Chin. I run the developer relations team at Neo4j. Actually, I just
00:00:20.280 | started writing a new book on GraphRag with O'Reilly. So that was my weekend, was writing
00:00:27.360 | chapters. First chapter done. So I think we'll actually put the pre-release version of it
00:00:32.980 | up soon. And what I am going to talk about for the next 10 minutes is agentic GraphRag
00:00:40.240 | and a little bit about how you can accomplish this. Okay? So, first of all, who here is familiar
00:00:46.040 | with graph databases? Okay. That's really good. Okay. You are well above and ahead of the curve.
00:00:53.340 | How many folks have given GraphRag a try to build systems using graphs? Okay. So one
00:01:02.220 | guy in the back. He is the expert. If you have any questions, ask him. And just to set the
00:01:08.240 | context on why, is we basically have a problem with a lot of agentic systems where they are
00:01:15.340 | not meeting use cases. This was Gartner's prediction of doom and utter failure. And they have a lot
00:01:20.920 | of hallucinations in them. And so this is an example. I am going to kind of breeze through
00:01:27.120 | this a little bit. But I basically asked the OpenA3 reasoning API to solve a question about
00:01:33.040 | biases, reasoning, and math. And the basic problem is it doesn't do a good job if you don't give
00:01:40.220 | it enough deep information sources of answering things correctly. And the question is how many
00:01:45.520 | girls you can fit in a classroom? So like it's a tech computer science bias. It's a little
00:01:49.760 | bit about like math and reasoning because I asked it about a grid. It inaccurately anchors. These
00:01:56.760 | are the grid sizes which you could choose on the number of girls you can fit. It inaccurately anchors
00:02:01.960 | in the number of people you can fit in the number of people. And if you don't have a problem, you can
00:02:06.200 | have a problem. And if you don't have a problem, you can fit in the number of people. But if you don't
00:02:06.200 | have a problem, you can fit in the number of people. You can fit in the number of people. And you
00:02:06.200 | can fit in the number of people. And if you don't have a problem, you can fit in the number of people.
00:02:08.160 | It's a square grid. So like the audience, you guys always win against the reasoning AI, even though
00:02:13.920 | it takes 40 seconds to noodle on this. And it further, so I ask like, are the girls and boys going
00:02:21.680 | to go to home economics or to, whatchamacallit, to sports? And again, this is both a bias and also I
00:02:31.280 | misled it by giving information about the ratio of girls and boys. And it comes up with an answer,
00:02:37.360 | calculating that all the girls will go to cooking class. Which is horrible. I mean, I'm the chef at
00:02:43.360 | home and I love cooking. So I think that these are the biases. Now this is kind of funny because we can
00:02:50.560 | reason about the situation and it's like a problem that we can think about. But imagine if this was
00:02:55.360 | like in life sciences about drug discovery. Or if you're solving a supply chain issue. So the fact the LLM
00:03:02.480 | has gone and inserted biases. It's done incorrect reasoning along the way. This means you're going
00:03:09.120 | to get the wrong business results. And it's very hard to figure this out. So basically the problem is
00:03:14.720 | the LLM is good at extrapolating information like doing language tasks, figuring out things. And it gives the
00:03:21.840 | impression of intelligence where there's no real intelligence. There's no real kind of human
00:03:27.840 | reasoning behind it. And so we over ascribe things that can do. And there's a bunch of things that
00:03:34.640 | can't do well. Now those are things that knowledge graphs are actually really good at.
00:03:40.160 | So one way we can solve this problem is by throwing more computers at it, more agents. And agentic
00:03:47.280 | systems are good at improving the quality of results because you have LLMs talking to each other and
00:03:52.800 | reasoning with each other about the problem. So basically you have agents who observe, they think,
00:03:57.920 | and they take actions. So you have different types of agents which are doing different things in the
00:04:02.160 | workflow. An agentic runtime might look something like this where you're building out and you have a
00:04:08.800 | orchestration layer which is working on the agents. You have some Gen AI models hooked up. You have some
00:04:14.720 | tools. And then these all collaborate to give you better results than one LLM can give you together.
00:04:22.800 | Now the challenge with this is that it's a very monolithic architecture. It's hard to maintain. It's hard
00:04:28.800 | to swap out the tools and it also kind of puts you in a situation where you can't secure the system. You
00:04:35.200 | can't do a bunch of things. So a good way to solve this is using MCP. You kind of use MCP as your tools
00:04:42.000 | where your agents are talking to them. With MCP now you have your servers and you have your data sources
00:04:49.040 | which are now talking to each other. You have your client and server so now you can give it files or
00:04:55.360 | database records. You can give it a graph database as the system of record and we built a bunch of
00:05:01.120 | tools at Neo4j on top of MCP. So we built a Cypher tool which Cypher is the query language for graph
00:05:08.640 | databases. So basically it will use -- when you ask the MCP server it gives capabilities to generate
00:05:15.920 | Cypher queries off of prompts or questions or other things you want to pass it. We have a memory module so
00:05:21.760 | this gives you some agent memory you can use to plug into agentic systems. And then we also have
00:05:27.120 | MCP on top of our cloud APIs if you want to provision databases or do different things on top of it.
00:05:33.520 | And I think this is a pattern you'll see with a lot of people who are the vendors or building things is
00:05:37.760 | now you can plug these tools into your agent architecture and you can use them together with your
00:05:43.520 | graph. Typically agents are represented in some sort of graph. This is a picture of a lang graph agent.
00:05:49.120 | And you can layer memory on top of it. So these are all folks who just spoke in our panel. So Zepp, Cogni,
00:05:57.680 | Memzero, we're all talking about their approach to agents. Actually they all run on top of Neo4j.
00:06:04.720 | So they use Neo4j as the core graph database. Some of them are pluggable so you can choose your graph
00:06:09.600 | database of choice. But they're a really good way of giving memory to your agents which is graph based
00:06:15.280 | and matches the way LLMs want to store, communicate, and retrieve information. And also one of our other
00:06:23.120 | speakers on the graph track showed his architecture for doing video search and summarization. And if you
00:06:29.120 | notice, they do a bunch of this already, right? So they do short term memory, they have a whole bunch
00:06:36.080 | of lookup information, and they give you a choice of, they have both a graph rag pipeline and a vector
00:06:42.160 | pipeline. So I highlighted the graph rag in red. And the reason they're doing this is because when you
00:06:48.400 | use graph rag, you get some advantages in terms of the results coming back and typically a lower rate
00:06:54.800 | of hallucinations. This would be your, like, direct LLMs are kind of, you know, they give you very generic
00:06:59.760 | responses to a healthcare question. Baseline rag, you get better results back, but it's, it's incomplete
00:07:06.560 | because it's doing, basically it's doing vector similarity. So similarity is not relevance. It doesn't mean it
00:07:13.680 | actually understands the problem. This would be a system which does graph rag, and the typical pattern that would
00:07:19.360 | get you this is, um, first do your search in a, in a vector search. You can use a vector database. We also
00:07:27.680 | support vector search on top of Neo4j, and it gives you back results. And that's a good way of translating the
00:07:33.120 | question into, like, vectors. And then you have mappings from the vector embeddings to your graph, and you, you pull
00:07:41.120 | back the nodes which are relevant. So in this case, like, you're asking about a, um, emphysema, you'd get back the, the node for emphysema, you'd pull back all the
00:07:49.200 | related nodes and diagnosis and conditions. And you see it's just, it lists them all, right? So this is a very effective way to get really good responses back when you're dealing with
00:07:57.520 | something where you, you kind of have this mix of structured and unstructured data. Here's a quick architecture of how you could put this together using, you know, you're using traversal and vector similarity.
00:08:09.520 | Um, you take in the question, you do the query against either vectors or knowledge graphs. Um, graph data science or graph analytics are helpful as well to do community algorithms.
00:08:19.040 | algorithms and groupings and things like this. Um, some of this is in the Microsoft Graphrag paper and other research which is going on in this area.
00:08:26.320 | And, um, you feed that back as context to the LLM to improve the quality of the answers. Um, some of the patterns I've talked about quickly. So text to cipher is what our MCP server does.
00:08:38.160 | it's, it's, it's, it can be good, but sometimes it can be really bad because the generation of cipher by LLMs is not as good as you need it for some cases.
00:08:48.160 | Um, the one I was talking about is, as, um, which one was it? Um, basically vector search with graph context, right?
00:08:55.600 | So you do a vector search and then you use that to pull back related nodes and graph context. That's a really good pattern to start with.
00:09:02.880 | And, um, you can also do pre and post filtering of vector results to bubble things which are more relevant to the top of the context.
00:09:09.040 | Um, certain systems, this is quite good as well because all you want to do is make sure the LLM gets things higher up in the buffer for context windows.
00:09:16.320 | Even if LLMs now have larger context windows, basically what, what the evals show is they ignore most of it and they look at the stuff at the top.
00:09:24.320 | Okay. So a quick example of a company which is doing this. So Klarna basically replaced all of their SaaS systems with a great graph rag project.
00:09:33.360 | Um, they're one of our customers. Um, they took an enterprise wiki's HR systems internal documentation.
00:09:40.400 | 250k employee questions asked for the first year. 2,000 daily queries processed and 85% employee adoption.
00:09:46.720 | So like a really good adoption of this technology. Um, and I'll give you a couple resources and I think we're at time.
00:09:54.640 | Is that about right, crew? Oh, okay. Oh, I see. There's five minutes between sessions.
00:10:01.280 | I was, I was rushing to do this even quicker. Okay. So we have time for questions, which is great.
00:10:05.760 | Um, so one resource I'd recommend is the, um, Neo4j certified developer program. Um, so with the number
00:10:14.480 | of hands in the room here for folks who said they know graphs, I'm pretty sure you could all pass the
00:10:20.320 | Neo4j certified exam if you just took it today. And we basically will, we'll mail you a Neo4j certified
00:10:26.880 | t-shirt. You get a little LinkedIn badge to put on your profile. And it's a nice way to just show the world
00:10:32.400 | that, like, you actually know this stuff. Like, you know graph technology, you know stuff. We'll probably
00:10:36.800 | add an additional certifications for, like, you know, graph reg and other stuff in the future. But
00:10:41.280 | the base certified developer class is a good way just to get base knowledge in this. And we do have
00:10:46.320 | classes in Graph Academy on building chat bots, using LLMs, using all of this stuff as well.
00:10:51.680 | Um, second resource is our nodes conference. So the Neo4j nodes conference is an annual conference. It runs in
00:10:59.040 | three different time zones, 24 hours, all free content, free sessions. You know, come, come out
00:11:04.400 | and join and check out some of this content. Okay. I'll let people finish who want to do the QR code.
00:11:10.480 | And thank you very much for coming. And we have a few minutes for questions. Okay. So what we'll do is,
00:11:17.040 | anyone who has questions, just raise your hand and shout it out. If anyone wants to leave the room,
00:11:22.560 | feel free to do that as well. I don't want to keep you trapped in here. Okay. So in the back.
00:11:26.640 | Yeah. Yeah. Okay. So the question is, like, what the pattern is for doing this type of search.
00:11:45.120 | And that's exactly right. So basically, what you're doing is you're using the LLM for what it's good at,
00:11:51.520 | which is language translation. So the user can enter whatever convoluted question they want,
00:11:56.800 | which would never translate to a beautiful cipher query. And then you, you first tell the LLM, well,
00:12:03.600 | do a vector search on that, like finds vector similarity in this. And then because you've
00:12:09.600 | generated like the graph and the embeddings to point to each other. Now you can go to the graph and you can
00:12:15.440 | say, well, this, these, these embeddings all point to this node in the graph. So it's probably about,
00:12:21.280 | in the, in the case emphysema. Now I want to pull back the nodes, which are either like you could use
00:12:26.560 | cosine similarity or you could use, um, um, community grouping algorithms or different algorithms to figure
00:12:33.040 | out what's relevant and then pass that as context.
00:12:35.760 | Speaker 2: So in a simple architecture, the chunks are the same.
00:12:38.400 | Speaker 2: What? Speaker 2: The chunks you embed in your nodes are the same pieces.
00:12:43.360 | Speaker 2: Yeah. So, so, so like what, what we typically do, and this is what'll happen if you,
00:12:47.920 | um, import unstructured data to Neo4j. Well, we, we, we can create a node structure out of it using LLMs.
00:12:57.360 | And then you hang your embeddings, your text embeddings off as properties off of the nodes.
00:13:01.600 | Speaker 2: Um, so we have a couple plugins for this. We have a Neo4j Python library,
00:13:08.320 | which that has, will do a lot of this. Uh, we also have an integration with Langchain.
00:13:12.880 | So you can use Langchain or, um, llama index or haystack. Yeah. So whatever framework you want to do,
00:13:21.760 | you can choose. And we pretty much have integrations with all of them to, to help with the, um, yeah,
00:13:27.600 | the associations. Okay. And there were a bunch of hands, but I don't know who's first. So you, you, you can go.
00:13:35.120 | Speaker 2: Yeah, I think you said that you have an MCP server for memory.
00:13:39.600 | Speaker 2: Yeah. Speaker 2: Okay. Does that know also the logic, like the update and so on,
00:13:44.080 | Speaker 2: Okay. So, so the question's about like how the MCP agent for memory works. Um, so that,
00:13:57.440 | that is a great question, but I actually don't know the answer. It's a, it's an open source MCP server.
00:14:04.080 | Now, now if, if you want the answer, um, the session, which I'm giving with Michael Hunger and Jesus,
00:14:11.920 | Speaker 2: Um, in, I don't know, in a, in a bit, um, Michael's team built all the MCP servers. So he,
00:14:19.200 | he actually will know the answer. And yes, it's, it's today. I think it's right after this or shortly
00:14:24.400 | after this and Michael will go on for hours if you ask him that. So that's a great question. Okay. And
00:14:30.320 | Speaker 2: We're at time. You got the last question. Yeah. So I want to ask about your opinion between
00:14:35.760 | a Lang chain and Lang graph frameworks. Uh, are they like complimentary or they're like,
00:14:40.320 | what's your perspective of that? Um, okay. So the question is like,
00:14:46.320 | like Lang chain versus Lang graph. I thought, I thought Lang graph was like the agent thing that
00:14:51.920 | the Lang chain folks built. No, did you, did you use any of that? Like I'm also new to that. So
00:14:57.520 | Speaker 2: Yeah. So we, we have a bunch of experiments and prototypes with, um, Lang graph for,
00:15:04.000 | for doing agents and things. Um, we have integrations with Lang chain. We also integrate with all the other
00:15:11.360 | memory vendors. I mean, like I would say from our perspective, use the tool that's best for you and
00:15:15.840 | we'll integrate with everything. Yeah. Okay. Thanks for coming. Thanks.