Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB

In the next 10 to 15 minutes, here's my promise to you: I'm going to give you information that is high level, with some practical components, and it will still be relevant six months from now. It will put you in the best position to build the best AI applications, to build agents that are believable, capable, and reliable. I know. We're going to get there. You know what? Just for you. There we go. You're welcome.

So we're going to be talking about memory. We're going to be talking about the stateless applications we're building today and how we can make them stateful. We're going to be talking about the prompt engineering we're doing today and how we can reduce it by focusing on persistence. We're going to be taking the responses in our AI applications and making our agents build relationships with our customers. And all of it is going to be centered around memory.
Let me start with a very quick evolution of what we've been seeing for the past two to three years. We started off with LLM-powered chatbots. They were great. ChatGPT came out in November 2022 and, yeah, exploded. Then we went into RAG: we gave these chatbots more domain-specific, relevant knowledge, and they gave us more personalized responses. Then we began to scale the compute and the data we're giving to the LLMs, and that gave us emergent capabilities: reasoning, tool use. Now we're in the world of AI agents and agentic systems.
And the big debate is: what is an agent? What is an AI agent? I don't like to go into that debate, because that's like asking what consciousness is. It's a spectrum. The agenticity, and that's a word now, agenticity, of an agent is a spectrum. So there are different levels. I came here and I saw Waymo, and to me it was pure sorcery. We don't have that in the UK. There are different levels of self-driving, and you can look at the agentic spectrum the same way. We have a minimal agent, where there's an LLM running in a loop. Great. Then level four is an autonomous agent: a bunch of agents that have access to tools, can do whatever they want, and are not prompted in any way, or only minimally. But this is how I see things: it's a spectrum.

So what is an AI agent? It's a computational entity with awareness of its environment through perception, cognitive abilities through an LLM, and the ability to take action through tool use. But the most important bit is that there is some form of memory, short-term or long-term. Memory is important. It's important because we're trying to make our agents reflective, interactive, proactive, reactive, and autonomous. And most of this, if not all, can be addressed with memory.
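To make that anatomy concrete, here is a minimal sketch in Python. Every name here is illustrative, not from the talk: the llm callable and the tools dict stand in for whatever model wrapper and tool integrations you actually use.

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class MinimalAgent:
        """Toy agent: perception, cognition via an LLM, action via tools, memory."""
        llm: Callable[[str], str]                    # cognition (any LLM wrapper)
        tools: dict[str, Callable[[], str]]          # actions the agent can take
        short_term: list[str] = field(default_factory=list)  # working context
        long_term: list[str] = field(default_factory=list)   # persists across runs

        def step(self, observation: str) -> str:
            # Perception: take in the environment and recall related memories.
            self.short_term.append(observation)
            recalled = [m for m in self.long_term if observation.lower() in m.lower()]
            # Cognition: the LLM decides, given the observation plus recalled memory.
            decision = self.llm(f"context: {recalled}\nobservation: {observation}")
            # Action: if the decision names a tool, invoke it.
            result = self.tools[decision]() if decision in self.tools else decision
            # Memory: accumulate state to inform the next execution step.
            self.long_term.append(observation)
            return result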
I work at MongoDB, and we're going to connect the dots, don't worry. So this is all nice and good. This is what you see if you double-click into what an AI agent is. But the most important bit to me... I'll stay on this slide; people are taking pictures. Sorry.

All right, let's go. The most important bit is memory. And when we talk about memory, the easy way to think about it is short-term versus long-term. But there are other distinct forms, right? Conversational memory, entity memory, knowledge, data store, cache, working memory. We're going to be talking about all of that today.

So these are the high-level concepts, but let me go a little bit meta. Why are we all here at this conference? It's because of AI, right? We're all architects of intelligence. The whole point of AI is to build some form of computational entity that mimics human intelligence or surpasses it. With AGI, we're focused on making that intelligence surpass humans in every task we can think of. And if you think about the most intelligent humans you know, what determines their intelligence is their ability to recall. It's their memory. So if AI or AGI is meant to mimic human intelligence, it's a no-brainer, no pun intended, that we need memory within the agents we're building today. Does anyone disagree? Good. I would have kicked you out.

Okay, let's go. So, humans: in your brain right now, you have this. This is not exactly what it looks like, but it's close enough. You have different forms of memory, and that's what makes you intelligent. That's what makes you retain some of the information I'm going to give you today. There is short-term, long-term, working memory, semantic, episodic, and procedural memory. In your brain right now there is something called the cerebellum, I always get the word wrong, and that's where you store most of the routines and skills you can perform. Can anyone here do a backflip? Really? Wow. You can see my excitement. The information, the knowledge of that backflip, is actually stored in that part of your brain. I heard it's 90% confidence, by the way. It is, right? I'm not going to do one. But it's stored in that part of your brain. Now, you can actually mimic this in agents, and I'm going to show you how.
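One way to mimic that taxonomy, sketched here with pymongo and collection names of my own choosing rather than anything prescribed in the talk, is to give each human memory type its own store:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection
    db = client["agent_memory"]

    memory_stores = {
        "semantic": db["facts"],         # general knowledge about the world
        "episodic": db["episodes"],      # specific past events and interactions
        "procedural": db["procedures"],  # routines and skills (the cerebellum)
        "working": db["scratchpad"],     # short-lived state for the current task
    }

    # A procedural memory, cerebellum-style:
    memory_stores["procedural"].insert_one({
        "skill": "backflip",
        "steps": ["crouch", "jump", "tuck", "rotate", "land"],
        "confidence": 0.9,
    })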
But now we're talking about agent memory. Agent memory is the set of mechanisms we implement to make state persist in our AI applications, so that our agents can accumulate information, turn data into memory, and have it inform the next execution step. The goal is to make them more reliable, believable, and capable. Those are the key things. And the core topic we are going to be working on as AI memory engineers is memory management. We are going to be building memory management systems. Memory management is the systematic process of organizing all the information you put into the context window. Yes, we have large context windows now, but that's not for you to stuff all your data into. That's for you to pull in the relevant memory and structure it in a way that is effective, that allows the response to be relevant.
So these are the core components of memory management: generation, storage, retrieval, integration, updating, and deletion. There's a lie in there, because you don't delete memories. Humans don't delete their memories, unless it's a traumatic one you want to forget. What we really should be looking at is implementing forgetting mechanisms within the memory management systems we're building. You don't want to delete memories. And there are different research papers looking at how to implement some form of forgetting within agents.
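As one hedged example of what such a forgetting mechanism could look like, the sketch below never deletes a memory; instead a recall score decays over time and is reinforced by use, so stale memories simply stop being retrieved. The field names, the scoring formula, and the half-life are my own choices, not a design from the talk:

    import time
    from pymongo import MongoClient

    coll = MongoClient("mongodb://localhost:27017")["agent_memory"]["memories"]

    def store(content: str, importance: float = 0.5) -> None:
        coll.insert_one({
            "content": content,
            "importance": importance,
            "created_at": time.time(),
            "last_recalled": time.time(),
            "recall_count": 0,
        })

    def recall_score(doc: dict, half_life_s: float = 86_400.0) -> float:
        """Memory signal: importance decays with time, reinforced by recalls."""
        age = time.time() - doc["last_recalled"]
        decay = 0.5 ** (age / half_life_s)
        return doc["importance"] * decay * (1 + doc["recall_count"])

    def retrieve(query: str, k: int = 5) -> list:
        # Text match stands in for vector search; rank by the forgetting signal.
        candidates = list(coll.find({"content": {"$regex": query, "$options": "i"}}))
        candidates.sort(key=recall_score, reverse=True)
        for doc in candidates[:k]:  # recalling a memory reinforces it
            coll.update_one({"_id": doc["_id"]},
                            {"$inc": {"recall_count": 1},
                             "$set": {"last_recalled": time.time()}})
        return candidates[:k]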
But the most important bit is retrieval. And I'm getting to the MongoDB part. Moving on, this is RAG. It's very simple, right? Because we've been doing it as AI engineers. MongoDB is the one database that gets called into RAG pipelines, because it gives you all the retrieval mechanisms. RAG is not just vectors. Vector search is not all you need. You need other types of search, and we have that with MongoDB, anything you can think of.
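For example, vector retrieval in MongoDB Atlas is a single aggregation stage. The index name, collection name, and query vector below are placeholders:

    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://<cluster-uri>")  # Atlas URI (placeholder)
    db = client["agent_memory"]
    query_embedding = [0.12, -0.07]  # stand-in; use a real embedding vector

    results = db["knowledge"].aggregate([
        {"$vectorSearch": {
            "index": "vector_index",        # assumed Atlas Vector Search index
            "path": "embedding",            # document field holding the vector
            "queryVector": query_embedding,
            "numCandidates": 100,           # breadth of the ANN candidate pool
            "limit": 5,
        }},
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ])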
You're going to be hearing a lot about MongoDB at this conference today. But this is what RAG is. And when you level up, you go into the world of agentic RAG, right? You give the retrieval capability to the agent as a tool, and now it can choose when to call on information.
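In code, that shift to agentic RAG is small: the same search becomes a tool schema the LLM can choose to invoke. The OpenAI-style function format below is one common convention, and search_knowledge and embed are hypothetical wrappers around the vector search shown above:

    from pymongo import MongoClient

    db = MongoClient("mongodb+srv://<cluster-uri>")["agent_memory"]

    retrieval_tool = {
        "type": "function",
        "function": {
            "name": "search_knowledge",
            "description": "Retrieve domain documents relevant to a query.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }

    def embed(text: str) -> list[float]:
        """Hypothetical helper; swap in a real embedding model call."""
        raise NotImplementedError

    def search_knowledge(query: str) -> list[dict]:
        """Embed the query, then run the $vectorSearch pipeline shown earlier."""
        return list(db["knowledge"].aggregate([
            {"$vectorSearch": {"index": "vector_index", "path": "embedding",
                               "queryVector": embed(query),
                               "numCandidates": 100, "limit": 5}},
        ]))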
There's a lot going on here, so I'll send these slides to you somehow. Or you can come to me and I'll send them over LinkedIn. Add me, Richmond Alake on LinkedIn, and just ask for the slides. This is memory.
MongoDB is the memory provider for agentic systems. And when you understand that we provide the developer, the AI memory engineer, the AI engineer, all the features they need to turn data into memory, to make agents believable, capable, and reliable, you begin to understand the importance of having a technology partner like MongoDB in your AI stack.
So this is the same image, but focused a bit more on all the different memories. I'm going to skip through this slide because it goes into a lot of detail. I'm also going to give you a library. I'm working on an open-source library. I'm ashamed of the name; I was trying to be cool when I came up with it. It's called MemoRizz. You can type that into Google and you'll find it. It has the design patterns for all of the memory I'm showing you, all these memory types I'll show you as well. So there are different forms of memory in AI agents, and here is how we make them work. Let's start with persona. Is anyone here from OpenAI? Leave. I'm joking.
Well, a couple of months ago, right, they gave ChatGPT a bit of personality. They didn't do a great job, but they're going in the right direction, which is that we're trying to make our systems more believable. We're trying to make them more human. We're trying to make them build a relationship with the consumer, with the users of our systems. Persona memory helps with that. And you can model that in MongoDB. This is MemoRizz: if you spin up the library, it helps you stand up all of these different memory types. So this is persona memory. I have a little demo if we have time. This is what it would look like in MongoDB.
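A persona-memory document might look like the following; the fields are illustrative rather than the exact MemoRizz schema:

    from pymongo import MongoClient

    db = MongoClient("mongodb+srv://<cluster-uri>")["agent_memory"]

    db["persona_memory"].insert_one({
        "agent_id": "support-agent-01",
        "name": "Ada",
        "role": "customer support assistant",
        "tone": "warm, concise, lightly humorous",
        "values": ["accuracy", "empathy"],
        "greeting": "Hey, good to see you again!",
    })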
Then there's the toolbox. The guidance from OpenAI is that you should only put the schemas of maybe 10 to 20 tools in the context window. But when you use your database as a toolbox, storing the JSON schemas of your tools in MongoDB, you can scale, because just before you hit the LLM you can fetch the relevant tools using any form of search. So that's toolbox memory, and this is what it would look like, right? This is how you model it in MongoDB: you store the JSON schema of each tool. Now you begin to see that MongoDB gives you a flexible data model. The document model is very flexible; it can adapt to whatever shape you want your data to take, whatever structure. And you have all of the retrieval capabilities, graph, vector, text, and geospatial queries, in one database.
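Sketching that toolbox pattern: each tool's JSON schema lives as a document with an embedding over its description, and only the top matches get pulled into the prompt. The collection and index names are assumptions:

    from pymongo import MongoClient

    db = MongoClient("mongodb+srv://<cluster-uri>")["agent_memory"]

    db["toolbox"].insert_one({
        "name": "get_weather",
        "json_schema": {  # the schema the LLM ultimately sees
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {"type": "object",
                               "properties": {"city": {"type": "string"}},
                               "required": ["city"]},
            },
        },
        "description_embedding": [0.03, 0.41],  # stand-in vector
    })

    def relevant_tools(task_embedding: list[float], k: int = 5) -> list[dict]:
        """Fetch only the k tool schemas worth placing in the context window."""
        return [doc["json_schema"] for doc in db["toolbox"].aggregate([
            {"$vectorSearch": {"index": "tool_index",
                               "path": "description_embedding",
                               "queryVector": task_embedding,
                               "numCandidates": 50, "limit": k}},
        ])]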
Conversation memory is a bit obvious, right? The back-and-forth conversation with ChatGPT or Claude. You can store that in your database as well, in MongoDB, as conversational memory. And this is what that would look like: timestamps, a conversation ID. You can also see something there called recall recency, and an associated conversation ID. That's my attempt at implementing some memory signals, and it feeds into the forgetting mechanism I'm trying to implement in my very famous library, MemoRizz.
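Here is what such a conversational-memory document could look like, with recall recency and associated conversation IDs as the memory signals; the exact field names in MemoRizz may differ:

    import datetime
    from pymongo import MongoClient

    db = MongoClient("mongodb+srv://<cluster-uri>")["agent_memory"]

    db["conversation_memory"].insert_one({
        "conversation_id": "conv-042",
        "role": "user",
        "content": "My order arrived damaged.",
        "timestamp": datetime.datetime.now(datetime.timezone.utc),
        "recall_recency": 0.97,                      # decays until recalled again
        "associated_conversation_ids": ["conv-017"], # links to related threads
    })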
I'm going to go through the next slides a bit quicker because I want to get to the end of this.
Workflow memory is very important. You build your agentic systems; they execute a series of steps. Step one, step two, step three, and it fails. But one thing you can do is treat the failure as experience. It's a learning experience. You can store that in your database. I see you nodding, you're like, yeah. You can store that in your database, and you can then pull it in on the next execution to inform the LLM not to take that step, or to explore other paths. You can store and model that in MongoDB as well, because what you have in MongoDB is that memory provider for your agentic system. And this is what it looks like when you model it, an example of it, anyway.
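A sketch of such a workflow-memory document, storing a failed trace as experience for the next run (the fields are illustrative):

    from pymongo import MongoClient

    db = MongoClient("mongodb+srv://<cluster-uri>")["agent_memory"]

    db["workflow_memory"].insert_one({
        "workflow_id": "invoice-processing",
        "steps": [
            {"n": 1, "action": "fetch_invoice", "status": "ok"},
            {"n": 2, "action": "parse_pdf", "status": "ok"},
            {"n": 3, "action": "post_to_erp", "status": "failed",
             "error": "timeout after 30s"},
        ],
        "outcome": "failure",
        "lesson": "Retry step 3 with backoff, or batch the ERP posts.",
    })

    # On the next execution, surface prior failures to the LLM before planning:
    prior_failures = list(db["workflow_memory"].find(
        {"workflow_id": "invoice-processing", "outcome": "failure"}))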
So we have episodic memory. We have long-term memory. We have an agent registry: you can store the information about your agents as well, and this is how I do it. You can see the agent's tools, persona, all the good stuff. There's entity memory as well. So there are different forms of memory. And the MemoRizz library is very experimental and educational, but it encapsulates some of the memory implementations and design patterns that I'm thinking about on an everyday basis, that we're thinking about at MongoDB.
So MongoDB, you probably get the point by now: the memory provider for agentic systems. There are tools out there that focus on memory management: MemGPT, Mem0, Zep. They're great tools. But after speaking to some of you and some of our partners and customers here, it's clear there is not one way to solve memory. You need a memory provider to build your custom solution, to make sure the memory management systems you implement are effective.
So we really understand the importance of managing data and managing memory. That's why, earlier this year, we acquired Voyage AI. They create the best embedding models on the market today, no offense, OpenAI. Voyage AI's models include text and multimodal embeddings, and we have re-rankers too. This allows you to really solve, or at least reduce, AI hallucination within your RAG and agentic systems.
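For illustration, embedding and re-ranking with the public voyageai Python client look roughly like this; the model names and exact parameters are best checked against the current documentation:

    import voyageai

    vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

    # Embed documents for storage alongside your memory records.
    result = vo.embed(["MongoDB as the memory provider for agents"],
                      model="voyage-3", input_type="document")
    embedding = result.embeddings[0]

    # Re-rank retrieved candidates against the user's query.
    reranking = vo.rerank("how should agents forget?",
                          ["doc about decay scores", "doc about TTL indexes"],
                          model="rerank-2")
    best = reranking.results[0].document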
And what we're doing, what we're focused on, the mission for MongoDB, is to make the developer more productive by taking away the considerations and concerns around managing different kinds of data, and the whole process of chunking and retrieval strategies. We're pulling that into the database. We are redefining the database. That's why, in a few months, we're going to be pulling Voyage AI's embedding models and re-rankers into MongoDB Atlas, and you will not have to write chunking strategies for your data. I see a lot of people nodding. Yeah.
That's good. So MongoDB is a household name, to be honest. I watched MongoDB IPO back when I was in university. I bought a little stock when I was in university; I only had about 100 pounds, I was broke. But we are very focused, and we take it very seriously, making sure that you can build the best AI products and AI features very quickly, in a secure way. MongoDB is built for the change that we are going to experience now, tomorrow, and in the next couple of years. I want to end with this. Do you know who these two guys are?
Damn. Okay. These are Hubel and Wiesel. They won a Nobel Prize in 1981 for research on the visual cortex, experiments on cats. That probably wouldn't fly now, but back in the 50s and 60s things were a bit more relaxed. They found that the visual cortex, in cats and in humans, works by learning different hierarchies of representation: edges, contours, then abstract shapes. People who are into deep learning will know that this is how convolutional neural networks work. The research these two did inspired and informed convolutional neural networks; that's face detection, object detection. It all comes from neuroscience. So we are architects of intelligence, but there is a better architect of intelligence: nature. Nature created our brains. Well, some humans aside, it's the most effective form of intelligence we have today. And we can look inwards to build these agentic systems.
So last Saturday, myself and Tengyu, the chief AI scientist at MongoDB and founder of Voyage AI, sat down with the three guys in the middle here, who are neuroscientists. Kenneth has been exploring the human brain and memory for over 20 years. And over here is Charles Packer, the creator of MemGPT, now Letta. We were having this conversation, and once again we're mirroring that pattern: bringing neuroscientists and application developers together to push us along the path to AGI. So that's my talk done. Check out MemoRizz, and come talk to me about memory. Add me on LinkedIn if you want this presentation. Thank you for your time.