Stop Using RAG as Memory — Daniel Chalef, Zep

00:00:00.000 | I'm here today to tell you that there's no one-size-fits-all memory and why you

00:00:23.440 | need to model your memory after your business domain. So if you saw me a

00:00:29.240 | little bit earlier, and I was talking about Graphiti, Zep's open-source temporal

00:00:34.200 | graph framework, you might have seen me just speak to how you can build custom

00:00:40.880 | entities and edges in the Graphiti graph for your particular business domain, so

00:00:48.560 | business objects from your business domain. What I'm going to demo today is

00:00:52.760 | actually how Zep implements that and how easy it is to use from Python, TypeScript,

00:00:58.480 | or Go. And what we've done here is we've solved the fundamental problem

00:01:03.440 | plaguing memory. And we're enabling developers to build out memory that is

00:01:12.440 | far more cogent and capable for many different use cases. So I'm going to

00:01:18.720 | just show you a quick example of where things go really wrong. So many of you

00:01:25.240 | might have used ChatGPT before. It generates facts about you in memory. And

00:01:30.200 | you might have noticed that it really struggles with relevance. Sometimes it

00:01:34.960 | just pulls out all sorts of arbitrary facts about you. And unfortunately, when you

00:01:40.120 | store arbitrary facts and retrieve them as memory, you get inaccurate responses or

00:01:45.800 | hallucinations. And the same problem happens when you're building your own agents. So

00:01:52.200 | here we go. We have an example media assistant. And it should remember things

00:01:57.160 | about jazz music, NPR podcasts, The Daily, et cetera, all the things that I like to

00:02:02.720 | listen to. But unfortunately, because I'm in conversation with the agent or it's

00:02:07.160 | picking up my voice when it's a voice agent, it's learning all sorts of irrelevant

00:02:13.160 | things. Like I wake up at 7:00 AM, my dog's name is Melody, et cetera. And the point here

00:02:21.000 | is that irrelevant facts pollute memory. They're not specific to the media player

00:02:27.000 | business domain. And the technical reality here is that many frameworks take this

00:02:34.280 | really simplistic approach to generating facts. If you're using a framework that has memory capabilities,

00:02:40.680 | an agent framework, it's generating facts and throwing them into a vector database. And unfortunately,

00:02:45.960 | the facts dumped into the vector database or Redis mean that when you're recalling that memory,

00:02:51.480 | it's difficult to differentiate what should be returned. We're going to return what is semantically

00:02:57.160 | similar. And here we have a bunch of facts that are semantically similar to my request for my favorite

00:03:04.680 | tunes. We have some good things. And unfortunately, Melody is there as well, because my

00:03:10.520 | dog is named Melody, and that might have something to do with tunes.

00:03:17.160 | So there's a bunch of irrelevant stuff. So basically semantic similarity is not

00:03:26.520 | business relevance. And this is not unexpected. I was speaking a little bit earlier about how vectors

00:03:34.920 | are just basically projections into an embedding space. There are no causal or relational links between

00:03:43.400 | them. And so we need a solution. We need domain-aware memory, not better semantic search.
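
The gap between semantic similarity and business relevance can be sketched in a few lines of Python. This is a toy illustration, not Zep's retrieval code: the `Fact` class, the `entity_type` labels, and the three-dimensional vectors are all assumptions written by hand, standing in for real embeddings and a real store.

```python
import math
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    entity_type: str               # hypothetical domain label
    embedding: tuple[float, ...]   # toy hand-written vector, not a real embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


FACTS = [
    Fact("Likes jazz music", "MediaPreference", (0.9, 0.1, 0.0)),
    Fact("Listens to NPR podcasts", "MediaPreference", (0.8, 0.2, 0.1)),
    Fact("Dog is named Melody", "Other", (0.7, 0.1, 0.6)),
    Fact("Wakes up at 7:00 AM", "Other", (0.0, 0.2, 0.9)),
]

# Stands in for the embedded request "play my favorite tunes".
QUERY = (1.0, 0.0, 0.0)


def top_k(query, facts, k=3, allowed_types=None):
    """Rank facts by cosine similarity, optionally keeping only some entity types."""
    candidates = [
        f for f in facts
        if allowed_types is None or f.entity_type in allowed_types
    ]
    ranked = sorted(candidates, key=lambda f: cosine(query, f.embedding), reverse=True)
    return [f.text for f in ranked[:k]]


similarity_only = top_k(QUERY, FACTS)
domain_aware = top_k(QUERY, FACTS, allowed_types={"MediaPreference"})
```

With these toy vectors, the similarity-only ranking includes the fact about the dog because its vector sits close to the query, while the type-filtered ranking returns only media preferences.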

00:03:51.800 | So with that, I am going to unfortunately be showing you a video because the Wi-Fi has been absolutely terrible.

00:04:04.360 | And let me bring up the video.

00:04:06.200 | Okay. So I built a little application here, and it is a finance coach. And I've told it I want to buy a house.

00:04:20.840 | And it's asking me, well, how much do I earn a year? It's asking me about what student loan debt I might have.

00:04:32.040 | And we'll see that on the right-hand side, what is stored in Zep's memory are some very explicit

00:04:41.000 | business objects. We have financial goals, debts, income sources, etc. These are defined by the developer.

00:04:53.400 | And they're defined in a way which is really simple to understand. We can use Pydantic or

00:05:01.640 | Zod or Go structs. And we can apply business rules. So let's go take a look at some of the code here.

00:05:09.000 | We have a TypeScript financial goal schema

00:05:12.280 | using Zep's underlying SDK. We can define these entity types. We can give a description to the entity type.

00:05:19.800 | We can even define fields, the business rules for those fields, so the values that they take on.
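
As a rough sketch of what such an entity type might look like, here is a financial goal with a description, typed fields, and a couple of business rules. The demo uses TypeScript with Zep's SDK, and Pydantic is the Python option named in the talk; this version deliberately uses only standard-library dataclasses so it runs anywhere. The names `GoalType`, `FinancialGoal`, and `target_amount_usd`, along with the allowed values, are assumptions for illustration and not Zep's schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GoalType(str, Enum):
    """Hypothetical set of values the goal_type field may take."""
    HOME_PURCHASE = "home_purchase"
    RETIREMENT = "retirement"
    DEBT_PAYOFF = "debt_payoff"


@dataclass
class FinancialGoal:
    """A goal the user wants to reach with their money, such as buying a house."""
    goal_type: GoalType
    target_amount_usd: Optional[float] = None

    def __post_init__(self):
        # Business rule: goal_type must be one of the known values.
        # GoalType(...) raises ValueError for anything else.
        self.goal_type = GoalType(self.goal_type)
        # Business rule: a target amount, when given, must be positive.
        if self.target_amount_usd is not None and self.target_amount_usd <= 0:
            raise ValueError("target_amount_usd must be positive")


goal = FinancialGoal(goal_type="home_purchase", target_amount_usd=500_000)
```

The docstring plays the role of the entity description, and the enum and range check play the role of the field-level business rules.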

00:05:26.920 | And then we can build tools for our agent to retrieve a financial snapshot, which runs multiple Zep searches

00:05:34.840 | concurrently and filters by specific node types.
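
The shape of such a snapshot tool can be sketched with `asyncio.gather`, which runs one search per node type concurrently. Everything below is a stand-in: `search_memory` is a hypothetical function over an in-memory list, not a Zep SDK call, and the node type names are assumed from the business objects shown in the demo.

```python
import asyncio

# Hypothetical in-memory stand-in for the nodes in one user's graph.
NODES = [
    {"type": "FinancialGoal", "summary": "Wants to buy a house"},
    {"type": "Debt", "summary": "Has student loan debt"},
    {"type": "IncomeSource", "summary": "Earns a yearly salary"},
    {"type": "Other", "summary": "Dog is named Melody"},
]


async def search_memory(node_type: str) -> list[str]:
    """Hypothetical search: return summaries of nodes of a single type."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [n["summary"] for n in NODES if n["type"] == node_type]


async def get_financial_snapshot() -> dict[str, list[str]]:
    """Run one filtered search per node type concurrently and collect the results."""
    node_types = ["FinancialGoal", "Debt", "IncomeSource"]
    results = await asyncio.gather(*(search_memory(t) for t in node_types))
    return dict(zip(node_types, results))


snapshot = asyncio.run(get_financial_snapshot())
```

Because each search is restricted to one node type, nodes outside the financial domain never reach the agent, however similar they might look to the query.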

00:05:43.640 | And when we start our Zep application, what we're going to do is we're going to register

00:05:47.640 | these particular objects with Zep, so it knows to build this ontology in the graph.

00:05:58.200 | So let's do a quick little addition here. I'm going to say that I have $5,000 a month rent.

00:06:07.240 | I think it's rent. And in a few seconds, we see that Zep's already parsed that new message

00:06:16.520 | and has captured that $5,000. And we can go look at the graph. This is the Zep front end.

00:06:23.480 | And we can see the knowledge graph for this user has got a debt account entity. It's got fields on it

00:06:32.120 | that we've defined as a developer. And so again, we can get really tight about what we retrieve

00:06:39.320 | from Zep by filtering. Okay, so we're at time. So just very quickly, we wrote a paper about how all of this works.

00:06:47.000 | You can get to it by that link below. And I appreciate your time today. You can look me up afterwards.