
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB



00:00:00.000 | In the next 10 to 15 minutes, here's, I guess, my promise to you. I'm going to give you some
00:00:21.200 | information that will be high level. There'll be some practical component to it, but this
00:00:29.420 | information I'll give you will be very relevant within the next six months. And it will
00:00:34.360 | put you in the best position to build the best AI applications, to build the best agents
00:00:40.240 | that are believable, capable, and reliable. I know. We're going to get there. You know
00:00:48.280 | what? Just for you. There we go. You're welcome. So we're going to be talking about memory. We're
00:00:56.340 | going to be talking about the stateless applications that we're building today and how we can make
00:01:02.580 | them stateful. We're going to be talking about the prompt engineering that we're doing today
00:01:07.280 | and how we can reduce that by focusing on persistence. We're going to be turning the responses in
00:01:13.520 | our AI applications into relationships between our agents and our customers. And all of
00:01:19.220 | it is going to be centered around memory. So I'm going to do a very quick evolution of what we've been
00:01:29.340 | seeing for the past two to three years. We started off with chatbots, LLM-powered chatbots. They were great.
00:01:36.100 | ChatGPT came out in November 2022. And yeah, it exploded. Then we went into RAG. We gave these
00:01:44.340 | chatbots more domain-specific, relevant knowledge, and they gave us more personalized responses.
00:01:48.980 | Then we began to scale the compute and the data we're giving to the LLMs. And that gave us emergent capabilities:
00:01:56.420 | reasoning, tool use. Now we're in the world of AI agents and agentic systems.
00:02:02.660 | And the big debate is what is an agent? What is an AI agent? I don't like to go into that debate because
00:02:11.300 | that's like asking what consciousness is. It's a spectrum. The agenticity, and that's a word now,
00:02:19.540 | agenticity, of an agent is a spectrum. So there are different levels. I came here and I saw Waymo,
00:02:29.540 | and to me it was pure sorcery. We don't have that in the UK. And there are different levels of self-driving,
00:02:35.380 | so you can look at the agentic spectrum in that respect. We have a minimal agent, where there's an LLM
00:02:40.660 | running in a loop. Great. Then level four is an autonomous agent: a bunch of agents that have
00:02:47.060 | access to tools. They can do whatever they want. They're not prompted in any way, or only in a minimal way.
00:02:52.820 | But this is how I see things. It's a spectrum. So what is an AI agent? It's a computational entity
00:03:00.420 | with awareness of its environment through perception, cognitive abilities through an LLM, and the ability to take
00:03:07.540 | action through tool use. But the most important bit is that there is some form of memory, short-term or long-term.
00:03:14.180 | Memory is important. It's important because we're trying to make our agents reflective, interactive,
00:03:20.580 | proactive, reactive, and autonomous. And most of this, if not all of it, can be solved with memory.
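For a concrete sense of that shape, here is a minimal sketch of an agent loop with perception, an LLM, tools, and short-term memory; `call_llm` and `run_tool` are hypothetical stubs, not any specific vendor API:

```python
# Minimal sketch of the agent shape described above: perception (user input),
# cognition (an LLM), action (tools), and memory (accumulated state).
# Both stubs below are hypothetical placeholders, not a real API.

def call_llm(messages: list[dict]) -> dict:
    raise NotImplementedError  # plug in your LLM client here

def run_tool(tool_call: dict) -> str:
    raise NotImplementedError  # plug in your tool dispatch here

short_term_memory: list[dict] = []  # working context for the current session

def agent_step(user_input: str) -> str:
    short_term_memory.append({"role": "user", "content": user_input})
    response = call_llm(short_term_memory)      # the LLM sees accumulated state
    if response.get("tool_call"):               # action through tool use
        result = run_tool(response["tool_call"])
        short_term_memory.append({"role": "tool", "content": result})
        response = call_llm(short_term_memory)  # re-plan with the tool result
    short_term_memory.append({"role": "assistant", "content": response["content"]})
    return response["content"]
```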
00:03:30.420 | I work at MongoDB, and we're going to connect the dots. Don't worry. So this is all nice and good.
00:03:36.980 | This is what you see if you double-click into what an AI agent is. But the most important bit to me...
00:03:42.660 | I'll go back a slide. People are taking pictures. Sorry.
00:03:45.220 | All right. Let's go. The most important bit is memory. And when we talk about memory,
00:03:51.300 | the easy way you can think about it is short-term, long-term. But there are other distinct forms,
00:03:56.100 | right? Conversational, entity memory, knowledge, data store, cache, working memory. We're going to be
00:04:01.940 | talking about all of that today. So these are the high-level concepts. But let me go a little bit
00:04:07.060 | meta. Why are we all here today at this conference? It's because of AI, right? We're all architects of
00:04:15.940 | intelligence. The whole point of AI is to build some form of computational entity that surpasses
00:04:20.660 | human intelligence or mimics it. Then with AGI, we're focused on making that intelligence surpass humans
00:04:28.900 | in all tasks we can think of. And if you think about the most intelligent humans you know,
00:04:34.980 | what determines the intelligence is their ability to recall. It's their memory. So if AI or AGI is meant
00:04:42.580 | to mimic human intelligence, it's a no-brainer, no pun intended, that we need memory within the
00:04:48.980 | agents that we're building today. Does anyone disagree? Good. I would have kicked you out.
00:04:55.060 | Okay, let's go. So humans, in your brain right now, you have this. This is not what it looks like,
00:05:02.180 | but it's close enough. You have different forms of memory. And that's what makes you intelligent.
00:05:06.740 | That's what makes you retain some of the information I'm going to be giving you today.
00:05:10.500 | There is short-term, long-term, working memory, semantic, episodic, procedural memory.
00:05:15.060 | In your brain right now, there is something called a cerebellum. I always get the word wrong,
00:05:20.340 | but that's where you store most of the routines and skills you can do. Can anyone here do a backflip?
00:05:25.620 | Really? Wow. You can see my excitement. The information or the knowledge of that backflip is
00:05:35.460 | actually stored in that part of your brain. So I heard it's 90% confidence, by the way.
00:05:40.100 | It is, right? I'm not going to do one. But it's stored in that part of your brain. Now,
00:05:47.940 | you can actually mimic this in agents. I'm going to show you how. But now we're talking about agent memory.
00:05:56.500 | Agent memory is the set of mechanisms that we implement to make sure that state persists in our AI applications.
00:06:06.580 | Our agents are able to accumulate information, turn data into memory, and have it inform the next
00:06:14.340 | execution step. But the goal is to make them more reliable, believable, and capable.
00:06:22.580 | Those are the key things. And the core topic that we are going to be working on as AI memory
00:06:30.500 | engineers is memory management. We are going to be building memory management systems.
00:06:36.260 | And memory management is a systematic process of organizing all the information that you're putting
00:06:42.260 | into the context window. Yes, we have large context windows, but those are not for you to stuff
00:06:47.540 | all your data in. They're for you to pull in the relevant memories and structure them in a way that
00:06:53.300 | is effective, that allows the response to be relevant. So these are the core components of memory
00:07:00.820 | management: generation, storage, retrieval, integration, updating, deletion. There's a lie here, because
00:07:07.380 | you don't delete memories. Humans don't delete their memories, unless it's a traumatic one that you want to
00:07:12.980 | forget. But we really should be looking at implementing forgetting mechanisms within the memory management
00:07:20.100 | systems that we're building. You don't want to delete memories. And there are different research
00:07:23.700 | papers that are looking at how to implement some form of forgetting within agents.
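As an illustration of the idea, one hedged sketch of a forgetting mechanism (not the talk's implementation): decay each memory's relevance by recency and recall count, and stop retrieving low-scoring memories instead of deleting them. All field names are assumptions:

```python
import math
import time

# Illustrative forgetting mechanism: rather than deleting a memory, decay a
# relevance score over time and simply stop retrieving memories whose score
# falls below a threshold. Assumes each memory dict carries "last_recalled_at"
# (epoch seconds) and "recall_count"; both names are made up for this sketch.

def memory_score(memory: dict, half_life_days: float = 30.0) -> float:
    age_days = (time.time() - memory["last_recalled_at"]) / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    # Frequently recalled memories decay more slowly.
    reinforcement = 1.0 + math.log1p(memory.get("recall_count", 0))
    return recency * reinforcement

def retrievable(memories: list[dict], threshold: float = 0.05) -> list[dict]:
    # "Forgotten" memories are filtered out, not deleted.
    return [m for m in memories if memory_score(m) >= threshold]
```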
00:07:30.260 | But the most important bit is retrieval. And I'm getting to the MongoDB part. Moving on, this is RAG. It's very simple,
00:07:40.260 | right? Because we've been doing it as AI engineers. MongoDB is that one database that gets called in RAG
00:07:49.060 | pipelines, because it gives you all the retrieval mechanisms. RAG is not just vectors. Vector search
00:07:53.860 | is not all you need. You need other types of search. And we have that with MongoDB, anything you can think of.
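As a rough sketch of what that retrieval looks like with pymongo and Atlas Vector Search (the connection string, index name, collection names, and `embed()` helper are placeholders for your own setup):

```python
from pymongo import MongoClient

def embed(text: str) -> list[float]:
    # Placeholder: swap in your embedding model of choice.
    raise NotImplementedError

# Hypothetical connection string and collection names.
client = MongoClient("mongodb+srv://...")
docs = client["app"]["knowledge"]

def retrieve(query_vector: list[float], k: int = 5) -> list[dict]:
    # Atlas Vector Search stage; assumes a vector index named "vector_index"
    # defined on the "embedding" field of this collection.
    return list(docs.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 20 * k,
            "limit": k,
        }},
        # Vector search is not all you need: the same pipeline can combine
        # query-language filters, $search (text), $graphLookup, or geo stages.
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]))
```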
00:07:59.380 | You're going to be hearing a lot about MongoDB at this conference today. But this is what RAG is. And when
00:08:05.940 | you level up, you go into the world of agentic RAG, right? You give the retrieval capability
00:08:12.260 | to the agent as a tool. And now it can choose when to call on information. There's a lot going on. I'll send
00:08:20.980 | this somehow to you guys. Or you can come to me and I'll LinkedIn it to you. Add me on LinkedIn.
00:08:27.540 | And just ask for the slides and I'll send it to you. Richmond Alake on LinkedIn. This is memory.
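Before moving on, a hedged sketch of that agentic RAG pattern: the same retrieval exposed as a tool the agent can choose to call. The schema shape follows the common JSON-schema function-calling convention; all names here are illustrative:

```python
# Agentic RAG sketch: expose retrieval to the agent as a tool it can decide
# to call, rather than always stuffing retrieved context into the prompt.
# Reuses retrieve() and embed() from the retrieval sketch above.
retrieval_tool = {
    "name": "search_knowledge_base",
    "description": "Look up domain documents relevant to a query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for."}
        },
        "required": ["query"],
    },
}

def handle_tool_call(name: str, args: dict) -> list[dict]:
    if name == "search_knowledge_base":
        return retrieve(embed(args["query"]))
    raise ValueError(f"unknown tool: {name}")
```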
00:08:35.220 | MongoDB is the memory provider for agentic systems. And when you understand that we provide the developer,
00:08:45.460 | the AI memory engineer, the AI engineer, all the features that they need to turn data
00:08:52.020 | into memory to make the agents believable, capable and reliable, you begin to understand the
00:08:56.900 | importance of having a technology partner like MongoDB on your AI stack.
00:09:03.540 | So this is the same image, but just a bit more focused on all the different memories. I'm going
00:09:10.020 | to skip through this slide because it goes into a bit of detail. I'm also going to give you a library.
00:09:15.940 | I'm working on an open-source library. I'm ashamed of the name. I was trying to be cool when I came up
00:09:20.980 | with it. It's called Memorizz. You can type that into Google. You'll find it. But it has the design patterns
00:09:29.380 | for all of this memory that I'm showing you, all these memory types that I will show you as well.
00:09:33.780 | But there are different forms of memory in AI agents, and this is how we make them work. So let's start with
00:09:38.340 | Persona. Is anyone here from OpenAI? Leave. I'm joking.
00:09:44.980 | Well, a couple of months ago, right? So they gave ChatGPT a bit of personality, right?
00:09:53.620 | And they didn't do a good job, but they are going in the right direction, which is we are trying to make
00:10:01.060 | our systems more believable, right? We're trying to make them more human. We're trying to make them
00:10:05.460 | create relationships with the consumers, with the users of our systems. Persona memory helps with that.
00:10:12.900 | And you can model that in MongoDB, right? This is Memorizz. If you spin up the library,
00:10:19.220 | it helps you set up all these different memory types. So this is Persona. I have a little
00:10:25.780 | demo if we have time. But this is Persona memory. This is what it would look like in MongoDB.
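As an illustration, a persona memory document might look roughly like this with pymongo, reusing the `client` and `embed()` placeholders from the retrieval sketch above; the fields are assumptions, not Memorizz's exact schema:

```python
# Illustrative persona memory document in MongoDB; field names are a guess
# at the shape, not the actual schema from the talk or the library.
personas = client["app"]["persona_memory"]

personas.insert_one({
    "agent_id": "support-agent-01",
    "name": "Ava",
    "tone": "warm, concise, lightly humorous",
    "values": ["accuracy", "empathy"],
    "background": "Support specialist persona for retail customers.",
    "embedding": embed("warm, concise support specialist persona"),  # for retrieval
})
```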
00:10:32.660 | Then there's Toolbox. The guidance from OpenAI is you should only put the schemas of maybe 10 to 20 tools
00:10:43.060 | in the context window. But when you use your database as a toolbox where you're storing the JSON schema
00:10:50.260 | of your tools in MongoDB, you can scale. Because just before you hit the LLM, you can just get the
00:10:58.420 | relevant tools using any form of search. So that's Toolbox. That's toolbox memory. And that's what
00:11:05.540 | it would look like, right? This is how you model it in MongoDB. You store the information of your JSON
00:11:12.020 | schema. Now you'll begin to understand that MongoDB gives you that flexible data model. The document data
00:11:19.140 | model is very flexible. It can adapt to whatever model you want your data to take, whatever structure.
00:11:25.460 | And you have all of the retrieval capabilities, graph, vector, text, and geospatial queries, in one database.
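A hedged sketch of the toolbox pattern just described: store each tool's JSON schema alongside an embedding of its description, then pull in only the relevant schemas just before the LLM call. Collection, index, and field names are assumptions:

```python
# Toolbox memory sketch, reusing `client` and embed() from the earlier sketch.
toolbox = client["app"]["toolbox_memory"]

toolbox.insert_one({
    "name": "get_order_status",
    "description": "Fetch the shipping status of a customer order.",
    "json_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    "embedding": embed("fetch the shipping status of a customer order"),
})

def relevant_tools(task: str, k: int = 5) -> list[dict]:
    # Any form of search works here; this uses a vector index like the one
    # sketched earlier, so only k tool schemas reach the context window.
    return list(toolbox.aggregate([
        {"$vectorSearch": {"index": "vector_index", "path": "embedding",
                           "queryVector": embed(task),
                           "numCandidates": 10 * k, "limit": k}},
        {"$project": {"name": 1, "description": 1, "json_schema": 1}},
    ]))
```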
00:11:32.660 | Conversation memory is a bit obvious, right? Back and forth conversation with ChatGPT, with Claude. You
00:11:39.300 | can store that in your database as well in MongoDB as conversational memory. And this is what that would
00:11:45.140 | look like. Timestamps. And you have a conversation ID. And you can see something there
00:11:50.820 | called recall recency and associated conversation IDs. And that's my attempt at implementing some memory
00:11:56.900 | signals. And that goes into the forgetting mechanism that I'm trying to implement in my very famous library,
00:12:04.580 | Memorizz. I'm going to go through the next slides a bit quicker because I want to get to the end of this.
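For illustration, a conversation memory document carrying those recall and recency signals might look roughly like this; the field names loosely mirror the slide and are assumptions:

```python
import datetime

# Conversation memory sketch: one document per turn, reusing `client` and
# embed() from the retrieval sketch. Field names are illustrative guesses.
conversations = client["app"]["conversation_memory"]

now = datetime.datetime.now(datetime.timezone.utc)
conversations.insert_one({
    "conversation_id": "conv-001",        # groups turns into one session
    "role": "user",
    "content": "I'd prefer email over phone for updates.",
    "timestamp": now,
    "recall_recency": now,                # last time this memory was pulled in
    "recall_count": 0,                    # feeds the forgetting mechanism
    "associated_conversation_ids": [],    # links to related sessions
    "embedding": embed("prefers email over phone for updates"),
})
```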
00:12:12.260 | Workflow memory is very important. You build your agentic system. It executes certain steps. Step one,
00:12:16.980 | step two, step three, and it fails. But one thing you could do is treat the failure as experience. It's a learning
00:12:22.580 | experience. You can store that in your database. I see you nodding. You're like, yeah. You can store that in
00:12:28.020 | your database. And you can then pull that in on the next execution to inform the LLM to not take this step or to
00:12:35.140 | explore all the paths. You can store that in MongoDB as well. You can model that, because what you have
00:12:41.220 | in MongoDB is that memory provider for your agentic system. And this is what that looks like when you
00:12:45.780 | model it. An example of it, anyway.
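A hedged sketch of that workflow memory idea: persist the failure as experience, then retrieve the lesson before the next run. The shapes are illustrative, not the library's actual schema:

```python
# Workflow memory sketch, reusing `client` and embed() from earlier sketches.
workflows = client["app"]["workflow_memory"]

workflows.insert_one({
    "task": "provision staging environment",
    "steps": ["create vm", "attach storage", "deploy app"],
    "failed_step": "attach storage",
    "error": "quota exceeded in region eu-west-2",
    "lesson": "Check storage quota before provisioning in eu-west-2.",
    "embedding": embed("provision staging environment storage quota failure"),
})

def past_lessons(task: str, k: int = 3) -> str:
    hits = workflows.aggregate([
        {"$vectorSearch": {"index": "vector_index", "path": "embedding",
                           "queryVector": embed(task),
                           "numCandidates": 10 * k, "limit": k}},
    ])
    # Inject into the system prompt so the agent learns from prior failures.
    return "\n".join(f"- {h['lesson']}" for h in hits)
```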
00:12:52.020 | So we have episodic memory. We have long-term memory. We have an agent registry; you can store the information
00:12:58.260 | of your agents as well. And this is how I do it. You can see the agent's tools, persona, all the good stuff.
00:13:03.860 | There's entity memory as well. So there are different forms of memory. And the Memorizz library is very experimental and educational. But it encapsulates
00:13:11.300 | some of the memory and implementation and design patterns that I'm thinking of on an everyday basis
00:13:17.940 | that we're thinking of in MongoDB. So MongoDB, you probably get the point now. The memory provider for
00:13:24.420 | agentic systems. There are tools out there that focus on memory management. MemGPT, MemZero, Zep.
00:13:31.940 | They're great tools. But after speaking to some of you folks and some of our partners and customers here,
00:13:38.900 | there is not one way to solve memory. And you need a memory provider to build your custom solution
00:13:48.260 | to make sure the memory management systems that you're able to implement are effective.
00:13:53.220 | So we really understand the importance of managing data and managing memory. And that's why earlier this
00:14:01.140 | year, we acquired Voyage AI. Now, they create the best embedding models on the market today, no offense, OpenAI.
00:14:09.460 | With Voyage AI's embedding models, we have text, we have multimodal, and we have re-rankers. And this allows you to
00:14:19.060 | really solve the problem, or at least reduce AI hallucination, within your RAG and agentic systems.
00:14:24.500 | And what we're doing and what we're focused on, the mission for MongoDB,
00:14:28.900 | is to make the developer more productive by taking away the considerations and all the concerns around
00:14:35.860 | managing different data and all the processes of chunking and retrieval strategies. We pull that into the database. We are redefining the database.
00:14:46.100 | And that's why, in a few months, we're going to be pulling Voyage AI, the embedding models and the re-rankers, into
00:14:51.940 | MongoDB Atlas. And you will not have to be writing chunking strategies for your data. I see a lot of people nodding. Yeah.
00:15:02.500 | That's good. So MongoDB is a household name, to be honest. I watched MongoDB IPO back when I was in
00:15:09.860 | university. I bought the stock when I was in university, three, just three. I only had about 100
00:15:16.420 | pounds. I was broke. But we are very focused, and we take it very seriously, making sure that
00:15:24.740 | you guys can build the best AI products, AI features very quickly in a secure way.
00:15:30.180 | So MongoDB is built for the change that we are going to experience
00:15:33.140 | now, tomorrow, in the next couple of years. I want to end with this. You know who these two guys are?
00:15:39.380 | Damn. Okay. These are Hubel and Wiesel. They won a Nobel Prize in 1981. They did some research on
00:15:48.740 | the visual cortex of cats, experiments on cats. This probably wouldn't fly now, but back in the 50s
00:15:55.940 | and 60s, things were a bit more relaxed. They found out that the visual cortex in the brains of both
00:16:02.260 | cats and humans actually works by learning different hierarchies of representation. So edges, contours, and
00:16:10.020 | abstract shapes. Now, people that are in deep learning will know that this is how convolutional neural networks
00:16:15.700 | work. And the research that these guys did inspired and informed convolutional neural networks. That's face
00:16:24.900 | detection, object detection. It all comes from neuroscience. So we are architects of intelligence,
00:16:31.700 | but there is a better architect of intelligence: it's nature. Nature created our brains. It's the most
00:16:38.260 | effective form of intelligence, well, in some humans at least, but it's the most effective form of
00:16:43.140 | intelligence that we have today. And we can look inwards to build these agentic systems. So last week,
00:16:49.060 | Saturday, myself and Tengyu Ma, the chief AI scientist at MongoDB and also the founder of Voyage AI, sat with
00:16:56.500 | these three guys in the middle, the neuroscientists. Kenneth has been exploring the human brain and memory for
00:17:03.220 | over 20 years. And over here is Charles Packer. He's the creator of MemGPT, now Letta.
00:17:09.140 | And we were having this conversation. And once again, we're bringing neuroscientists and
00:17:15.380 | application developers together to solve this and push us along the path to AGI. So that's my talk done.
00:17:24.020 | Check out Memorizz. And you can come talk to me about memory. Add me on LinkedIn if you want this presentation. Thank you for your time.