Stop Using RAG as Memory — Daniel Chalef, Zep

00:00:00.000 |
I'm here today to tell you that there's no one-size-fits-all memory and why you 00:00:23.440 |
need to model your memory after your business domain. So if you saw me a 00:00:29.240 |
little bit earlier, and I was talking about Graphiti, Zep's open-source temporal 00:00:34.200 |
graph framework, you might have seen me just speak to how you can build custom 00:00:40.880 |
entities and edges in the Graphiti graph for your particular business domain, so 00:00:48.560 |
business objects from your business domain. What I'm going to demo today is 00:00:52.760 |
actually how Zep implements that and how easy it is to use from Python, TypeScript, 00:00:58.480 |
or Go. And what we've done here is we've solved the fundamental problem 00:01:03.440 |
plaguing memory. And we're enabling developers to build out memory that is 00:01:12.440 |
far more cogent and capable for many different use cases. So I'm going to 00:01:18.720 |
just show you a quick example of where things go really wrong. So many of you 00:01:25.240 |
might have used ChatGPT before. It generates facts about you in memory. And 00:01:30.200 |
you might have noticed that it really struggles with relevance. Sometimes it 00:01:34.960 |
just pulls out all sorts of arbitrary facts about you. And unfortunately, when you 00:01:40.120 |
store arbitrary facts and retrieve them as memory, you get inaccurate responses or 00:01:45.800 |
hallucinations. And the same problem happens when you're building your own agents. So 00:01:52.200 |
here we go. We have an example media assistant. And it should remember things 00:01:57.160 |
about jazz music, NPR podcasts, The Daily, et cetera, all the things that I like to 00:02:02.720 |
listen to. But unfortunately, because I'm in conversation with the agent or it's 00:02:07.160 |
picking up my voice when it's a voice agent, it's learning all sorts of irrelevant 00:02:13.160 |
things. Like I wake up at 7:00 AM, my dog's name is Melody, et cetera. And the point here 00:02:21.000 |
is that irrelevant facts pollute memory. They're not specific to the media player 00:02:27.000 |
business domain. And the technical reality here is that many frameworks take this 00:02:34.280 |
really simplistic approach to generating facts. If you're using an agent framework that has memory capabilities, 00:02:40.680 |
it's generating facts and throwing them into a vector database. And unfortunately, 00:02:45.960 |
the facts dumped into the vector database or Redis mean that when you're recalling that memory, 00:02:51.480 |
it's difficult to differentiate what should be returned. We're going to return what is semantically 00:02:57.160 |
similar. And here we have a bunch of facts that are semantically similar to my request for my favorite 00:03:04.680 |
tunes. We have some good things. And unfortunately, Melody is there as well, because Melody is my 00:03:10.520 |
dog's name, and that might look like it has something to do with tunes. 00:03:17.160 |
And so we get a bunch of irrelevant stuff. So basically semantic similarity is not 00:03:26.520 |
business relevance. And this is not unexpected. I was speaking a little bit earlier about how vectors 00:03:34.920 |
are just basically projections into an embedding space. There are no causal or relational links between 00:03:43.400 |
them. And so we need a solution. We need domain-aware memory, not better semantic search. 00:03:51.800 |
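To make the Melody problem concrete, here is a minimal sketch in plain Python. It is not Zep's or any framework's implementation: the `Fact` class, the `entity_type` labels, and the two-dimensional "embeddings" are all made up for illustration. It only shows that ranking by cosine similarity alone lets a near-in-vector-space fact through, while restricting candidates to a domain type keeps it out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    entity_type: str   # hypothetical domain label, not a real SDK field
    embedding: tuple   # made-up toy vector, not a real embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy memory store. The vectors are chosen by hand so that the dog's
# name sits close to the music facts, mimicking the talk's example.
FACTS = [
    Fact("Likes jazz music", "MediaPreference", (0.9, 0.1)),
    Fact("Listens to The Daily", "MediaPreference", (0.8, 0.3)),
    Fact("Dog is named Melody", "Other", (0.85, 0.2)),
    Fact("Wakes up at 7:00 AM", "Other", (0.1, 0.9)),
]


def search(query_vec, top_k, allowed_types=None):
    """Rank facts by similarity, optionally keeping only some types."""
    candidates = [
        f for f in FACTS
        if allowed_types is None or f.entity_type in allowed_types
    ]
    ranked = sorted(
        candidates,
        key=lambda f: cosine(query_vec, f.embedding),
        reverse=True,
    )
    return [f.text for f in ranked[:top_k]]


QUERY = (1.0, 0.15)  # toy stand-in for "my favorite tunes"

similarity_only = search(QUERY, top_k=3)
domain_filtered = search(QUERY, top_k=3, allowed_types={"MediaPreference"})
```

With these toy vectors, `similarity_only` includes the dog fact, while `domain_filtered` contains only the two media preferences.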
So with that, I am going to unfortunately be showing you a video because the Wi-Fi has been absolutely terrible. 00:04:06.200 |
Okay. So I built a little application here, and it is a finance coach. And I've told it I want to buy a house. 00:04:20.840 |
And it's asking me, well, how much do I earn a year? It's asking me about what student loan debt I might have. 00:04:32.040 |
And we'll see that on the right-hand side, what is stored in Zep's memory are some very explicit 00:04:41.000 |
business objects. We have financial goals, debts, income sources, etc. These are defined by the developer. 00:04:53.400 |
And they're defined in a way which is really simple to understand. We can use Pydantic or 00:05:01.640 |
Zod or Go structs. And we can apply business rules. So let's go take a look at some of the code here. 00:05:12.280 |
Using Zep's underlying SDK, we can define these entity types. We can give a description to the entity type. 00:05:19.800 |
We can even define fields, the business rules for those fields, so the values that they take on. 00:05:26.920 |
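As a rough illustration of what an entity type with a description, fields, and field-level business rules can look like, here is a sketch using only the Python standard library. The talk names Pydantic for this; stdlib dataclasses are used here as a stand-in so the example runs anywhere. The class names, field names, and allowed values are assumptions for illustration and are not taken from the Zep SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DebtType(str, Enum):
    """Hypothetical set of values the debt_type field may take."""
    STUDENT_LOAN = "student_loan"
    CREDIT_CARD = "credit_card"
    MORTGAGE = "mortgage"
    RENT = "rent"


@dataclass
class Debt:
    """A debt or recurring obligation the user has mentioned.

    The docstring plays the role of the entity type's description.
    """
    debt_type: DebtType
    monthly_payment_usd: Optional[float] = None

    def __post_init__(self):
        # Business rule 1: debt_type must be one of the allowed values.
        # DebtType(...) raises ValueError for anything else.
        self.debt_type = DebtType(self.debt_type)
        # Business rule 2: a payment, when known, cannot be negative.
        if self.monthly_payment_usd is not None and self.monthly_payment_usd < 0:
            raise ValueError("monthly_payment_usd must be non-negative")


rent = Debt(debt_type="rent", monthly_payment_usd=5000)
```

Constructing `Debt(debt_type="lottery")` raises `ValueError`, which is the point of putting the rules on the type: values outside the domain are rejected before they reach memory.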
And then we can build tools for our agent to retrieve a financial snapshot, which runs multiple Zep searches 00:05:34.840 |
at the same time concurrently, and filters by specific node types. 00:05:43.640 |
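The financial snapshot tool described above can be sketched with `asyncio.gather`. This is only an outline of the pattern, assuming an async search call per node type: `search_nodes`, the in-memory `GRAPH`, and the type names are hypothetical stand-ins, not the Zep SDK.

```python
import asyncio

# Toy stand-in for a user's knowledge graph; the real data lives in Zep.
GRAPH = [
    {"type": "FinancialGoal", "summary": "Buy a house"},
    {"type": "Debt", "summary": "Student loan"},
    {"type": "Debt", "summary": "Rent of $5,000 a month"},
    {"type": "IncomeSource", "summary": "Annual salary"},
    {"type": "Pet", "summary": "Dog named Melody"},
]


async def search_nodes(node_type):
    """Hypothetical search that returns only nodes of one type."""
    await asyncio.sleep(0)  # stands in for a network round trip
    return [n["summary"] for n in GRAPH if n["type"] == node_type]


async def get_financial_snapshot():
    """Run one filtered search per node type concurrently."""
    node_types = ["FinancialGoal", "Debt", "IncomeSource"]
    # gather() schedules all searches at once and preserves input order.
    results = await asyncio.gather(*(search_nodes(t) for t in node_types))
    return dict(zip(node_types, results))


snapshot = asyncio.run(get_financial_snapshot())
```

Because each search is filtered by node type, the `Pet` node never appears in `snapshot`, no matter how similar its text might be to the user's question.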
And when we start our Zep application, what we're going to do is we're going to register 00:05:47.640 |
these particular goals, sorry, objects with Zep, so it knows to build this ontology in the graph. 00:05:58.200 |
So let's do a quick little addition here. I'm going to say that I have $5,000 a month rent. 00:06:07.240 |
I think it's rent. And in a few seconds, we see that Zep's already parsed that new message 00:06:16.520 |
and has captured that $5,000. And we can go look at the chart, the graph. This is the Zep front end. 00:06:23.480 |
And we can see the knowledge graph for this user has got a debt account entity. It's got fields on it 00:06:32.120 |
that we've defined as a developer. And so again, we can get really tight about what we retrieve 00:06:39.320 |
from Zep by filtering. Okay, so we're at time. So just very quickly, we wrote a paper about how all of this works. 00:06:47.000 |
You can get to it by that link below. And I appreciate your time today. You can look me up afterwards.