In the next 10 to 15 minutes, here's my promise to you. I'm going to give you some high-level information, with a practical component to it, and over the next six months this information will be very relevant. It will put you in the best position to build the best AI applications, to build the best agents: agents that are believable, capable, and reliable.
I know. We're going to get there. You know what? Just for you. There we go. You're welcome. So we're going to be talking about memory. We're going to be talking about the stateless applications that we're building today and how we can make them stateful. We're going to be talking about the prompt engineering that we're doing today and how we can reduce that by focusing on persistence.
We're going to be taking the responses in our AI applications and making our agents build relationships with our customers. And all of it is going to be centered around memory. So I'm going to do a very quick run through the evolution of what we've been seeing for the past two to three years.
We started off with chatbots, LLM-powered chatbots. They were great. ChatGPT came out in November 2022, and yeah, it exploded. Then we went into RAG. We gave these chatbots more domain-specific, relevant knowledge, and they gave us more personalized responses. Then we began to scale the compute and the data we're giving to the LLMs.
And it gave us emergent capabilities: reasoning, tool use. Now we're in the world of AI agents and agentic systems. And the big debate is: what is an agent? What is an AI agent? I don't like to go into that debate, because that's like asking what consciousness is. It's a spectrum.
The agenticity, and that's a word now, agenticity, of an agent is a spectrum. So there are different levels. I came here and I saw Waymo, and to me it was pure sorcery. We don't have that in the UK. And there are different levels of self-driving, so you can look at the agentic spectrum in that respect.
We have a minimal agent, where there's an LLM running in a loop. Great. Then at level four you have autonomous agents: a bunch of agents that have access to tools and can do whatever they want. They're prompted minimally, if at all. But this is how I see things.
It's a spectrum. So what is an AI agent? It's a computational entity with awareness of its environment through perception, cognitive abilities through an LLM, and the ability to take action through tool use. But the most important bit is that there is some form of memory, short-term or long-term. Memory is important.
It's important because we're trying to make our agents reflective, interactive, proactive, reactive, and autonomous. And most of this, if not all, can be solved with memory. I work at MongoDB, and we're going to connect the dots, don't worry. So this is all nice and good. This is what you see if you double-click into what an AI agent is.
But the most important bit to me... I'll hold this slide, people are taking pictures. Sorry. All right. Let's go. The most important bit is memory. And when we talk about memory, the easy way to think about it is short-term versus long-term. But there are other distinct forms, right? Conversational memory, entity memory, knowledge, data store, cache, working memory.
We're going to be talking about all of that today. So these are the high-level concepts. But let me go a little bit meta. Why are we all here at this conference today? It's because of AI, right? We're all architects of intelligence. The whole point of AI is to build some form of computational entity that surpasses human intelligence or mimics it.
Then with AGI, we're focused on making that intelligence surpass humans at all the tasks we can think of. And if you think about the most intelligent humans you know, what determines their intelligence is their ability to recall. It's their memory. So if AI or AGI is meant to mimic human intelligence, it's a no-brainer, no pun intended, that we need memory within the agents we're building today.
Does anyone disagree? Good. I would have kicked you out. Okay, let's go. So humans, in your brain right now, you have this. This is not what it looks like, but it's close enough. You have different forms of memory. And that's what makes you intelligent. That's what makes you retain some of the information I'm going to be giving you today.
There is short-term, long-term, working memory, semantic, episodic, and procedural memory. In your brain right now there is something called the cerebellum. I always get the word wrong, but that's where you store most of the routines and skills you can do. Can anyone here do a backflip? Really? Wow. You can see my excitement.
The information, the knowledge of that backflip, is actually stored in that part of your brain. So I heard it's 90% confidence, by the way. It is, right? I'm not going to do one. But it's stored in that part of your brain. Now, you can actually mimic this in agents.
I'm going to show you how. But now we're talking about agent memory. Agent memory is the set of mechanisms we implement to make state persist in our AI applications, so that our agents are able to accumulate information, turn data into memory, and have it inform the next execution step.
But the goal is to make them more reliable, believable, and capable. Those are the key things. And the core topic we're going to be working on as AI memory engineers is memory management. We are going to be building memory management systems. And memory management is the systematic process of organizing all the information that you're putting into the context window.
Yes, we have large context windows, but that's not for you to stuff all your data into. That's for you to pull in the relevant memory and structure it in a way that is effective, that allows the response to be relevant. So these are the core components of memory management: generation, storage, retrieval, integration, updating, deletion.
There's a lie there, because you don't delete memories. Humans don't delete their memories, except maybe a traumatic one you want to forget. But we really should be looking at implementing forgetting mechanisms within the memory management systems we're building. You don't want to delete memories, and there are different research papers looking at how to implement some form of forgetting within agents.
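A forgetting mechanism along those lines can be sketched as a salience decay over recall recency. This is a minimal illustration, not something from the talk: the half-life, threshold, and field names are made-up parameters, and the soft "inactive" flag stands in for whatever forgetting policy a real memory system would use.

```python
def salience(importance, age_seconds, half_life=86400.0):
    """Exponential decay: a memory's salience halves every `half_life` seconds."""
    return importance * 0.5 ** (age_seconds / half_life)

def apply_forgetting(memories, now, threshold=0.1):
    """Soft-forget: mark low-salience memories inactive rather than deleting them,
    so they can still be resurfaced or audited later."""
    for m in memories:
        m["active"] = salience(m["importance"], now - m["last_accessed"]) >= threshold
    return [m for m in memories if m["active"]]
```

The point of the soft flag is exactly the one above: forgetting is a retrieval policy, not a destructive delete.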
But the most important bit is retrieval. And I'm getting to the MongoDB part. Moving on, this is RAG. It's very simple, right? Because we've been doing it as AI engineers. MongoDB is that one database that's core to RAG pipelines, because it gives you all the retrieval mechanisms. RAG is not just vectors.
Vector search is not all you need. You need other types of search, and we have that with MongoDB, anything you can think of. You're going to be hearing a lot about MongoDB at this conference today. But this is what RAG is. And then you level up, and you go into the world of agentic RAG, right?
You give the retrieval capability to the agent as a tool, and now it can choose when to call on information. There's a lot going on. I'll send this to you guys somehow. Or you can come to me and I'll send it to you on LinkedIn. Add me on LinkedIn, just ask for the slides, and I'll send them to you.
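Retrieval-as-a-tool can be sketched like this. Everything here is illustrative: `embed()` is a toy bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a MongoDB collection, and in production the tool body would run a real search (for example an Atlas vector search) instead.

```python
VOCAB = ("memory", "agent", "mongodb", "retrieval")

def embed(text):
    """Toy embedding: count occurrences of a tiny fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

DOCS = [
    "MongoDB stores agent memory documents",
    "Bananas are yellow",
    "Retrieval gives the agent relevant context",
]

def search_knowledge_base(query, k=2):
    """The tool the agent can choose to call when it decides it needs context."""
    qv = embed(query)
    return sorted(DOCS, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

# The agent's tool registry: the LLM picks a tool name, we dispatch to it.
TOOLS = {"search_knowledge_base": search_knowledge_base}
```

The shift from plain RAG to agentic RAG is exactly that last line: retrieval stops being a fixed pipeline stage and becomes one entry in the agent's tool registry.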
Richmond Alake on LinkedIn. This is memory. MongoDB is the memory provider for agentic systems. And when you understand that we provide the developer, the AI memory engineer, the AI engineer, with all the features they need to turn data into memory, to make agents believable, capable, and reliable, you begin to understand the importance of having a technology partner like MongoDB in your AI stack.
So this is the same image, but a bit more focused on all the different memories. I'm going to skip through this slide because it goes into a bit of detail. I'm also going to give you a library. I'm working on an open-source library. I'm ashamed of the name.
I was trying to be cool when I came up with it. It's called MemoRizz. You can type that into Google and you'll find it. It has the design patterns for all of these memory types that I'm showing you, and the ones I will show you as well. There are different forms of memory in AI agents, and this is how we make them work.
So let's start with persona. Is anyone here from OpenAI? Leave. I'm joking. Well, a couple of months ago, they gave ChatGPT a bit of personality, right? And they didn't do a good job, but they are going in the right direction, which is: we are trying to make our systems more believable, right?
We're trying to make them more human. We're trying to make them create relationships with the consumers, the users of our systems. Persona memory helps with that. And you can model that in MongoDB, right? This is MemoRizz: if you spin up the library, it helps you stand up all of these different memory types.
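A persona-memory document along these lines might be modeled as below. This is a hypothetical sketch, not the library's actual schema: the field names and the agent are invented for illustration, and in a real system the document would live in a MongoDB collection rather than a Python dict.

```python
# Hypothetical persona-memory document, shaped like a MongoDB document.
persona_memory = {
    "agent_id": "support_agent_01",
    "name": "Ava",
    "role": "customer support agent",
    "traits": ["patient", "concise", "friendly"],
    "tone": "warm and professional",
}

def render_system_prompt(persona):
    """Turn the stored persona into a system prompt for the LLM call."""
    return (
        f"You are {persona['name']}, a {persona['role']}. "
        f"Your tone is {persona['tone']}. "
        f"You are {', '.join(persona['traits'])}."
    )
```

Because the persona lives in the database rather than hard-coded in a prompt, it persists across sessions, which is what lets the agent present a consistent character to the user.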
So this is persona memory. I have a little demo if we have time. This is what it would look like in MongoDB. Then there's toolbox. The guidance from OpenAI is that you should only put the schemas of maybe 10 to 20 tools in the context window.
But when you use your database as a toolbox, storing the JSON schemas of your tools in MongoDB, you can scale. Because just before you hit the LLM, you can fetch the relevant tools using any form of search. So that's toolbox, that's toolbox memory. And that's what it would look like, right?
This is how you model it in MongoDB: you store your tools' JSON schemas. Now you begin to understand that MongoDB gives you that flexible data model. The document data model is very flexible; it can adapt to whatever shape you want your data to take, whatever structure. And you have all of the retrieval capabilities (graph, vector, text, geospatial queries) in one database.
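The toolbox idea can be sketched like this. The tool schemas and the keyword-overlap scoring are illustrative stand-ins: in a real setup the schemas would sit in a MongoDB collection and the scoring would be a text or vector search, but the shape of the flow is the same, select a few relevant schemas, then hand only those to the LLM.

```python
# Toolbox memory sketch: tool schemas live in storage, and only the most
# relevant few are pulled into the context window before the LLM call.
toolbox = [
    {"name": "get_weather", "description": "get the current weather for a city"},
    {"name": "send_email", "description": "send an email to a recipient"},
    {"name": "book_meeting", "description": "book a meeting on the calendar"},
]

def select_tools(query, tools, k=2):
    """Score each tool by word overlap with the query; return the top k schemas."""
    q = set(query.lower().split())
    scored = sorted(
        tools,
        key=lambda t: len(q & set(t["description"].split())),
        reverse=True,
    )
    return scored[:k]
```

This is why the toolbox scales past the context-window guidance: the LLM only ever sees the handful of schemas the search step returns, not the whole catalog.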
Conversation memory is a bit obvious, right? The back-and-forth conversation with ChatGPT, with Claude. You can store that in MongoDB as well, as conversational memory. And this is what that would look like: timestamps, a conversation ID. And you can see something there called recall recency, and associated conversation IDs.
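A conversation-memory document with those fields might be sketched as follows. The field names mirror the ones on the slide, but the implementation is a hypothetical illustration: the in-memory list stands in for a MongoDB collection, and refreshing `recall_recency` on read is one simple way to implement a memory signal.

```python
import time

conversation_store = []  # stands in for a MongoDB collection

def record_turn(conversation_id, role, content, now=None):
    """Append one conversation turn as a document."""
    now = time.time() if now is None else now
    conversation_store.append({
        "conversation_id": conversation_id,
        "role": role,
        "content": content,
        "timestamp": now,
        "recall_recency": now,              # refreshed whenever the turn is recalled
        "associated_conversation_ids": [],  # links to related conversations
    })

def recall_turns(conversation_id, n=5, now=None):
    """Fetch the last n turns; touching a memory refreshes its recall recency."""
    now = time.time() if now is None else now
    turns = sorted(
        (t for t in conversation_store if t["conversation_id"] == conversation_id),
        key=lambda t: t["timestamp"],
    )
    for t in turns[-n:]:
        t["recall_recency"] = now
    return turns[-n:]
```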
And that's my attempt at implementing some memory signals. And that goes into the forgetting mechanism I'm trying to implement in my very famous library, MemoRizz. I'm going to go through the next slides a bit quicker because I want to get to the end. Workflow memory is very important.
You build your agentic system. It executes a series of steps: step one, step two, step three, and it fails. But one thing you can do is treat the failure as experience. It's a learning experience. You can store that in your database. I see you nodding, you're like, yeah. You can store that in your database.
And you can then pull that in on the next execution to inform the LLM not to take that step, or to explore other paths. You can store that in MongoDB as well. You can model it. Because what you have in MongoDB is that memory provider for your agentic system.
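Workflow memory of that kind can be sketched like this. It is an illustrative sketch, not the library's implementation: the in-memory list stands in for a MongoDB collection, and the field names are invented for the example.

```python
# Workflow memory sketch: failed execution steps are stored as experience,
# then pulled in before a retry so the planner can avoid known-bad paths.
workflow_memory = []

def log_step(workflow_id, step, action, outcome, error=None):
    """Record the result of one execution step as a document."""
    workflow_memory.append({
        "workflow_id": workflow_id,
        "step": step,
        "action": action,
        "outcome": outcome,  # "success" or "failure"
        "error": error,
    })

def known_failures(workflow_id, step):
    """Actions that already failed at this step; feed these to the LLM
    so the next attempt explores a different path."""
    return [
        m["action"] for m in workflow_memory
        if m["workflow_id"] == workflow_id
        and m["step"] == step
        and m["outcome"] == "failure"
    ]
```

Before retrying a step, the output of `known_failures` would be injected into the prompt, which is the "failure as learning experience" loop described above.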
And this is what that looks like when you model it, an example of it anyway. So we have episodic memory. We have long-term memory. We have an agent registry: you can store your agents' information as well. And this is how I do it. You can see the agent's tools, persona, all the good stuff.
There's entity memory as well. So there are different forms of memory. And the MemoRizz library is very experimental and educational, but it encapsulates some of the memory implementations and design patterns that I'm thinking about on an everyday basis, that we're thinking about at MongoDB. So, MongoDB, you probably get the point now.
It's the memory provider for agentic systems. There are tools out there that focus on memory management: MemGPT, Mem0, Zep. They're great tools. But after speaking to some of you folks, and some of our partners and customers here, there is not one way to solve memory. And you need a memory provider to build your custom solution, to make sure the memory management systems you implement are effective.
So we really understand the importance of managing data and managing memory. And that's why earlier this year we acquired Voyage AI. They create the best embedding models on the market today, no offense, OpenAI. With Voyage AI we have text and multimodal embedding models, and we have rerankers. And this allows you to really solve, or at least reduce, AI hallucination within your RAG and agentic systems.
And what we're doing, what we're focused on, the mission for MongoDB, is to make the developer more productive by taking away the considerations and concerns around managing different data: all the processes of chunking and retrieval strategies. We pull that into the database. We are redefining the database.
And that's why in a few months we're going to be pulling Voyage AI's embedding models and rerankers into MongoDB Atlas, and you will not have to be writing chunking strategies for your data. I see a lot of people nodding. Yeah, that's good. So MongoDB is a household name, to be honest.
I watched MongoDB IPO back when I was in university. I bought the stocks when I was in university, free, just free. I only had about 100 pounds. I was broke. But we are very focused and we take it very seriously, making sure that you guys can build the best AI products, AI features very quickly in a secure way.
So MongoDB is built for the change that we are going to experience now, tomorrow, and in the next couple of years. I want to end with this. Do you know who these two guys are? Damn. Okay. This is Hubel and Wiesel. They won a Nobel Prize in 1981. They did some research on the visual cortex of cats, experiments on cats.
This probably wouldn't fly now, but back in the 50s and 60s things were a bit more relaxed. They found that the visual cortex, in both cats and humans, works by learning different hierarchies of representation: edges, contours, and abstract shapes. Now, people who are into deep learning will know that this is how convolutional neural networks work.
And the research these guys did inspired and informed convolutional neural networks. That's face detection, object detection. It all comes from neuroscience. So we are architects of intelligence, but there is a better architect of intelligence: nature. Nature created our brains, and, well, for most humans anyway, the brain is the most effective form of intelligence that we have today.
And we can look inwards to build these agentic systems. So last week, on Saturday, myself and Tengyu Ma, the chief AI scientist at MongoDB and also the founder of Voyage AI, sat with these three guys in the middle, who are neuroscientists. Kenneth has been exploring the human brain and memory for over 20 years.
And over here is Charles Packer. He's the creator of MemGPT, now Letta. And we were having this conversation. Once again, we're bringing neuroscientists and application developers together to solve this and push us along the path to AGI. So that's my talk done. Check out MemoRizz. And you can come talk to me about memory.
Add me on LinkedIn if you want this presentation. Thank you for your time.