Stop Using RAG as Memory — Daniel Chalef, Zep

I'm here today to tell you that there's no one-size-fits-all memory and why you need to model your memory after your business domain. So if you saw me a little bit earlier, when I was talking about Graphiti, Zep's open-source temporal graph framework, you might have seen me speak to how you can build custom entities and edges in the Graphiti graph for your particular business domain, that is, business objects from your business domain.

What I'm going to demo today is actually how Zep implements that and how easy it is to use from Python, TypeScript, or Go. And what we've done here is we've solved the fundamental problem plaguing memory. And we're enabling developers to build out memory that is far more cogent and capable for many different use cases.

So I'm going to just show you a quick example of where things go really wrong. So many of you might have used ChatGPT before. It generates facts about you in memory. And you might have noticed that it really struggles with relevance. Sometimes it just pulls out all sorts of arbitrary facts about you.

And unfortunately, when you store arbitrary facts and retrieve them as memory, you get inaccurate responses or hallucinations. And the same problem happens when you're building your own agents. So here we go. We have an example media assistant. And it should remember things about jazz music, NPR podcasts, The Daily, et cetera, all the things that I like to listen to.

But unfortunately, because I'm in conversation with the agent or it's picking up my voice when it's a voice agent, it's learning all sorts of irrelevant things. Like I wake up at 7:00 AM, my dog's name is Melody, et cetera. And the point here is that irrelevant facts pollute memory.

They're not specific to the media player business domain. And the technical reality here is that many frameworks take a really simplistic approach to generating facts. If you're using an agent framework that has memory capabilities, it's generating facts and throwing them into a vector database. And unfortunately, because the facts are just dumped into a vector database or Redis, when you're recalling that memory it's difficult to differentiate what should be returned.

We're going to return what is semantically similar. And here we have a bunch of facts that are semantically similar to my request for my favorite tunes. We have some good things. And unfortunately, Melody is there as well, because my dog is named Melody, and that might have something to do with tunes.

And so we get a bunch of irrelevant stuff. So basically, semantic similarity is not business relevance. And this is not unexpected. I was speaking a little bit earlier about how vectors are just projections into an embedding space. There are no causal or relational connections between them.
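The point about similarity versus relevance can be sketched with a toy retrieval example. This is only an illustration, not how any particular framework or Zep works: the three-dimensional vectors are hand-picked stand-ins for real embeddings, and the `Fact` class, the `entity_type` labels, and `top_k` are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    entity_type: str   # hypothetical domain label attached at ingestion time
    embedding: tuple   # hand-picked toy vector, NOT from a real embedding model


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def top_k(facts, query, k):
    """Return the k facts whose embeddings are most similar to the query."""
    return sorted(facts, key=lambda f: cosine(f.embedding, query), reverse=True)[:k]


# Toy axes: (music-ness, daily-routine-ness, pet-ness). Assumed values.
FACTS = [
    Fact("Likes jazz music", "MusicPreference", (0.9, 0.0, 0.0)),
    Fact("Listens to NPR podcasts", "PodcastPreference", (0.7, 0.2, 0.0)),
    Fact("Dog's name is Melody", "Pet", (0.6, 0.1, 0.6)),
    Fact("Wakes up at 7:00 AM", "Routine", (0.0, 1.0, 0.0)),
]
QUERY = (1.0, 0.1, 0.0)  # stands in for "play my favorite tunes"

# Pure similarity search: the dog fact ranks in the top three.
similar = top_k(FACTS, QUERY, k=3)

# Domain-aware search: only consider types the media assistant cares about.
MEDIA_TYPES = {"MusicPreference", "PodcastPreference"}
relevant = top_k([f for f in FACTS if f.entity_type in MEDIA_TYPES], QUERY, k=3)

print([f.text for f in similar])
print([f.text for f in relevant])
```

In this sketch the fix is not a better similarity function. The second query simply never sees facts outside the media domain, so the word "Melody" cannot leak in.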

And so we need a solution. We need domain-aware memory, not better semantic search. So with that, I am unfortunately going to be showing you a video, because the Wi-Fi has been absolutely terrible. And let me bring up the video. Okay. So I built a little application here, and it is a finance coach.

And I've told it I want to buy a house. And it's asking me, well, how much do I earn a year? It's asking me about what student loan debt I might have. And we'll see that on the right-hand side, what is stored in Zep's memory are some very explicit business objects.

We have financial goals, debts, income sources, etc. These are defined by the developer. And they're defined in a way which is really simple to understand. We can use Pydantic or Zod or Go structs. And we can apply business rules. So let's go take a look at some of the code here.
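A developer-defined entity type with business rules might look roughly like the following. This is a minimal sketch that uses standard-library dataclasses as a stand-in for Pydantic so that it runs without dependencies; it is not Zep's SDK. The class names, field names, and allowed values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowed values; a real application would choose its own.
GOAL_TYPES = {"home_purchase", "retirement", "emergency_fund"}
DEBT_TYPES = {"student_loan", "credit_card", "mortgage"}


@dataclass
class FinancialGoal:
    """A savings or purchase target the user has stated."""
    goal_type: str
    target_amount: Optional[float] = None  # may be unknown early in a conversation

    def __post_init__(self):
        # Business rule: only goal types the finance coach understands.
        if self.goal_type not in GOAL_TYPES:
            raise ValueError(f"unknown goal_type: {self.goal_type!r}")
        # Business rule: a stated target must be a positive amount.
        if self.target_amount is not None and self.target_amount <= 0:
            raise ValueError("target_amount must be positive")


@dataclass
class Debt:
    """Money the user owes."""
    debt_type: str
    balance: float

    def __post_init__(self):
        if self.debt_type not in DEBT_TYPES:
            raise ValueError(f"unknown debt_type: {self.debt_type!r}")
        if self.balance < 0:
            raise ValueError("balance cannot be negative")


goal = FinancialGoal(goal_type="home_purchase")
debt = Debt(debt_type="student_loan", balance=20000.0)
print(goal, debt)
```

Keeping the rules on the type itself means a value that does not fit the domain is rejected when the object is built, rather than sitting in memory as an arbitrary fact.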

We have a TypeScript financial goal schema using Zep's underlying SDK. We can define these entity types. We can give a description to the entity type. We can even define fields and the business rules for those fields, so the values that they take on. And then we can build tools for our agent to retrieve a financial snapshot, which runs multiple Zep searches concurrently and filters by specific node types.
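The financial snapshot tool described here, several type-filtered searches run concurrently, can be sketched with `asyncio`. The in-memory `GRAPH`, the `search_nodes` function, the user ID, and the node contents are hypothetical stand-ins, not Zep's API; a real tool would call the actual search client where `search_nodes` is.

```python
import asyncio

# Hypothetical in-memory stand-in for one user's knowledge graph.
GRAPH = {
    "user-1": [
        {"type": "FinancialGoal", "summary": "Wants to buy a house"},
        {"type": "Debt", "summary": "Has student loan debt"},
        {"type": "IncomeSource", "summary": "Earns a yearly salary"},
        {"type": "Preference", "summary": "Likes jazz music"},
    ]
}


async def search_nodes(user_id, node_type):
    """Hypothetical search restricted to a single node type."""
    await asyncio.sleep(0)  # yields control, standing in for a network round trip
    return [n for n in GRAPH.get(user_id, []) if n["type"] == node_type]


async def financial_snapshot(user_id):
    """Run one filtered search per entity type concurrently and group the results."""
    node_types = ("FinancialGoal", "Debt", "IncomeSource")
    results = await asyncio.gather(*(search_nodes(user_id, t) for t in node_types))
    return dict(zip(node_types, results))


snapshot = asyncio.run(financial_snapshot("user-1"))
for node_type, nodes in snapshot.items():
    print(node_type, [n["summary"] for n in nodes])
```

Because each search names the node type it wants, unrelated nodes such as the jazz preference never reach the agent's context, whatever their similarity to the question.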

And when we start our Zep application, what we're going to do is register these particular objects with Zep, so it knows to build this ontology in the graph. So let's do a quick little addition here. I'm going to say that I have $5,000 a month rent.

I think it's rent. And in a few seconds, we see that Zep has already parsed that new message and has captured that $5,000. And we can go look at the graph. This is the Zep front end. And we can see that the knowledge graph for this user has a debt account entity.

It's got fields on it that we've defined as a developer. And so again, we can get really tight about what we retrieve from Zep by filtering. Okay, so we're at time. So just very quickly, we wrote a paper about how all of this works. You can get to it by that link below.

And I appreciate your time today. You can look me up afterwards.
