Memory Masterclass: Make Your AI Agents Remember What They Do! — Mark Bain, AIUS

00:00:00.000 |
I'm super excited to be here with you. This is my first time speaking at AI Engineer. 00:00:22.560 |
We have an amazing group of guest speakers: Vasilija Markovic from Cognee. Vasilija, oh, there 00:00:34.460 |
is Vasilija. Daniel Chalef from Graphiti and Zep, and Alex Gilmore from Neo4j. The plan looks 00:00:46.360 |
like this. I will do a very quick power talk about a topic that I'm super passionate about: 00:00:52.120 |
the AI memory. Next, we'll have four live demos, and we'll move on to some new solution 00:01:01.760 |
that we are proposing, a GraphRAG chat arena, that I will be able to demonstrate, and I would 00:01:09.440 |
like you to follow along once it's being demonstrated. And at the very end, we'll have a very short 00:01:17.520 |
Q&A session. There is a Slack channel that I would like you to join. So, please scan the 00:01:28.160 |
QR code right now before we begin. And let's make sure that everyone has access to these materials. 00:01:37.160 |
There is a walkthrough sheet on the channel that we will go through closer to the end of our 00:01:46.800 |
workshop. But I would like you to start setting it up, if you may, on your laptops if you want. 00:01:54.760 |
All right. It's workshop-graphrag-chat. You can also find it on Slack. And you can join 00:02:10.760 |
the channel. So, a little bit about myself. So, hi, everyone. Again, I'm Mark Bain. And I'm very passionate about memory: what memory is, the deep 00:02:19.120 |
physics and applications of memory across different technologies. You can find me at Mark and Bain on social media or on my website. And let me tell you a little bit of a story about myself. So, when I was 16 years old, I was very good at maths and I did math olympiads with 00:02:48.760 |
many brilliant minds, including Wojciech Zaremba, the co-founder of OpenAI. And thanks to that deep understanding of maths and physics, I did have many great opportunities to be exposed to the problem of AI memory. So, first of all, I would like to recall two conversations that I had with Wojciech and Ilya in 2014 00:03:18.400 |
in September. When I came here to study at Stanford, at one party, we met with Ilya and Wojciech, who back then worked at Google. And they were kind of trying to pitch me that there will be a huge revolution in AI. I was a little bit unimpressed back then. Right now, I kind of take it with very big excitement 00:03:48.040 |
when I look back to those times. And I was really wishing good luck to the guys who were doing deep learning, because back then, I didn't really see this prospect of GPUs giving that huge edge in compute. 00:04:05.040 |
However, during that conversation, it was like 20 minutes. At the very end, I asked Ilya, all right, so there is going to be a big AI revolution. But how will these AI systems communicate with each other? 00:04:22.040 |
And the answer was very perplexing and kind of sets the stage to what's happening right now. Ilya simply answered, I don't know. I think they will invent their own language. So that was 11 years ago. Fast forward to now. 00:04:40.040 |
The last two years I have spent doing very deep research on physics of AI. And kind of, like, dove into all of these most modern AI architectures, including attention, diffusion models, VAEs, and many other ones. And I realized that there is something critical. Something missing. And this power talk is about this missing thing. 00:05:09.680 |
So over the last two years, I followed on from my earlier years of doing a lot of research in physics, computer science, information science. And I came to this conclusion that memory, AI memory, in fact, is any data in any format, and this is important, including code, 00:05:39.320 |
algorithms, and hardware, and any causal changes that affect them. That was something very mind-blowing to reach that conclusion. And that conclusion sets the tone for this whole track, the GraphRAG track. 00:06:00.320 |
In fact, I was also perplexed by how biological systems use memory and how different cosmological structures or quantum structures, they, in fact, have a memory. They kind of remember. 00:06:15.320 |
And let's get back to maths and to physics and geometry. When I was doing science olympiads, I was really focused on two or three things: geometry, trigonometry, and algebra. And I realized, in the last year, that more or less the volume of 00:06:43.320 |
laws in physics perfectly matches the volume of laws in mathematics. And also, the constants in mathematics, if you really think deeply through geometry, match the constants in physics. And if you really think even deeper, they kind of transcend all the other disciplines. 00:07:08.320 |
That made me think a lot. And I found out that the principles that govern LLMs are the exact same principles that govern neuroscience. And they are the exact same principles that govern mathematics. 00:07:27.320 |
And I studied the papers of Perelman. I don't know if you've heard who Perelman is. 00:07:36.320 |
Perelman is this mathematician who refused to take a $1 million award for proving one of the most important conjectures, the Poincaré conjecture. 00:07:54.320 |
I studied the symmetries of 3-spheres. And I realized that this deep math of spheres and circles is very much linked with how attention and diffusion models work. 00:08:18.320 |
So basically the formulas that Perelman reached link entropy with curvature. And curvature, 00:08:25.320 |
basically, if you think of it, is attention. It's gravity. So in a sense, there are multiple disciplines where the same things are appearing multiple times. 00:08:46.320 |
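For the curious: the entropy-curvature formulas being invoked here are, I believe, Perelman's functionals from his Ricci flow papers. The talk doesn't write them out, so this is quoted from the literature rather than from the speaker. The simplest one is the F-functional,

$$ \mathcal{F}(g, f) = \int_M \left( R + |\nabla f|^2 \right) e^{-f} \, dV, $$

where $R$ is the scalar curvature of the metric $g$ and $f$ is a potential function. Ricci flow is, up to diffeomorphism, the gradient flow of $\mathcal{F}$, and Perelman's related $\mathcal{W}$-entropy is monotone along the flow, which is the precise sense in which entropy and curvature are linked.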
And I will be publishing a series of papers with some amazing supervisors, co-authors of two of these methodologies: transformers and VAEs. And I came to this realization that this equation governs everything. 00:09:08.320 |
It governs math, it governs physics, it governs our AI memory, neuroscience, biology, chemistry, and so on and so forth. 00:09:20.320 |
So I came to this equation that memory times compute would like to be a squared imaginary unit circle. 00:09:34.320 |
If that ever existed, we would have perfect symmetries and we would kind of not exist. Because for us to exist, these asymmetries need to show up. And in a sense, every single LLM works through 00:09:47.320 |
weights and biases. The weights give the structure: the compute comes and transforms the data from its raw format into weights. And the weights, if you take these billions of parameters, are the sort of matrix structure of how this data looks when you really find relationships in the raw data. 00:10:16.320 |
All right. And then there are these biases, these tiny shifts that, in a robust way, adapt the model so that it doesn't break apart but still reflects reality very well. So something is missing. When we take weights and biases and we apply scaling laws and we keep adding more data, more compute, we kind of get a better and better understanding of the data. 00:10:45.320 |
In a sense, if we had infinite data, we wouldn't have any biases. And this understanding is again the principle of this GraphRAG track. The disappearance of biases is what we are looking for when we are scaling our models. So in a sense, the amount of memory and compute should be exactly the same. 00:11:14.320 |
It's just slightly expressed in a different way. But if there are some imbalances, then something important happens. And I came to another conclusion that our universe is basically a network database. It has a graph structure and it's a temporal structure. So it keeps on moving, following some certain principles and rules. 00:11:43.320 |
And these principles and rules have to be fuzzy, because otherwise everything would be completely predictable. But if it were completely predictable, it would mean that I would know everything about every single one of you, about myself from the past and myself from the future. So in a sense, it's impossible. 00:11:48.320 |
And that's why we have this sort of heat, diffusion, entropy models. 00:11:55.320 |
They allow us to exist. But something is preserved. 00:12:07.320 |
Any single tiny asymmetry that happens at the quantum level preserves causal links. 00:12:44.960 |
And these causal links are the exact thing that I would like you to have as a takeaway from this talk. 00:12:55.260 |
The difference between simple RAG, hybrid RAG, any type of RAG, and GraphRAG is that 00:13:02.980 |
with GraphRAG we have the ability to keep these causal links in our memory systems. 00:13:11.520 |
Basically the relationships are what preserves causality. 00:13:24.840 |
That's why we can optimize hypothesis generation and testing. 00:13:31.680 |
So we will be able to do amazing research in biosciences, chemical sciences, just because 00:13:40.000 |
of understanding that this causality is preserved within the relationships. 00:13:46.700 |
And these relationships, when these needed asymmetries show up, kind of create this curvature. 00:13:55.800 |
So we intuitively feel it: every single one of you chose some specific workshops and talks to attend. 00:14:04.540 |
Right now all of you are attending the talk and workshop that we are giving. 00:14:21.120 |
And this value, this information, transcends space and time. 00:14:27.960 |
So it's very subjective to you or any other object. 00:14:33.100 |
And I think we really need to understand this. 00:14:39.600 |
So LLMs are basically these weights and biases or correlations that give us this opportunity 00:14:45.800 |
to be fuzzy. You know, actually, one thing that I learned from Wojciech some 11 years ago was 00:14:55.280 |
that hallucinations are the exact necessary thing to be able to solve a problem where you have too little 00:15:02.760 |
memory or too little compute for the combinatorial space of the problem you're solving. 00:15:07.640 |
So you're basically imagining, you're taking some hypothesis based on your history and you're 00:15:15.120 |
kind of trying to project it into the future. 00:15:17.520 |
But you have too little memory, too little compute to do that, so it can only be as good as the amount of memory and compute you have. 00:15:24.320 |
So it means that the missing part is something that you kind of can curve thanks to all of these 00:15:31.760 |
causal relationships and this fuzziness, and reasoning is the reading of these asymmetries. 00:15:47.440 |
Hence, I really believe that agentic systems are sort of the next big thing right now because 00:15:59.880 |
they are following the network database principle. 00:16:05.260 |
But to be causal, to recover this causality from our fuzziness, we need graph databases. 00:16:16.780 |
And that's the major thing in this emerging trend of GraphRAG that we are here to talk about. 00:16:26.560 |
And I would like to at this moment invite on stage our three amazing guest speakers. 00:16:49.500 |
So Vasilija will show us how to load, search, and optimize memory for a certain use case. 00:17:26.560 |
And I'm Vasilija. I'm originally from Montenegro, a small country in the Balkans. 00:17:30.560 |
It is beautiful, so if you want to go there, my cousins Igor and Milos are going to welcome you. 00:17:36.900 |
So, you know, in case you're just curious about memory, I'm building a memory tool on top of graphs. 00:17:44.880 |
My background is in business, big data engineering, and clinical psychology. 00:17:48.800 |
So a lot of what Mark talked about kind of connects to that. 00:17:55.140 |
The demo is a Mexican standoff between two developers, where we are analyzing their 00:18:01.940 |
GitHub data. And this data from the GitHub repositories is in the graph, and this Mexican standoff means 00:18:07.820 |
that we will let a crew of agents go, analyze, look at their data, and try to compare them against 00:18:13.940 |
each other and give us a result that should represent who we should hire, let's say, ideally. 00:18:21.060 |
So what we're seeing here currently is how Cognify works in the background. 00:18:25.460 |
So Cognify is working by adding some data, turning that into a semantic graph, and then 00:18:31.060 |
we can search it with a wide variety of options. 00:18:33.380 |
We plugged CrewAI in on top of it, so we can pretty much do this on the fly. 00:18:37.020 |
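For those following along in the Slack walkthrough, the loop just described maps onto a few calls in Cognee's Python SDK. This is a minimal sketch based on Cognee's public docs, not the demo's actual code; exact signatures vary by version, and the CrewAI and GitHub-ingestion wiring is omitted:

```python
# Minimal add -> cognify -> search loop; assumes an LLM provider is configured
# (e.g. via environment variables) and Cognee's default local stores.
import asyncio
import cognee

async def main():
    # Add raw data (the live demo ingested GitHub data via the GitHub API).
    await cognee.add("Laszlo: PhD, contributed to repo X with 120 commits ...")
    # Turn everything added into a semantic knowledge graph.
    await cognee.cognify()
    # Search the graph with one of Cognee's search options.
    results = await cognee.search(query_text="Which developer should we hire?")
    for result in results:
        print(result)

asyncio.run(main())
```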
So here in the background, I have a client running. 00:18:43.180 |
So it's now currently searching the data sets and starting to build the graphs. 00:18:52.140 |
But in the background, we are effectively ingesting the GitHub data from the GitHub API, building 00:18:58.880 |
the semantic structure, and then letting the agents actually search it and make decisions 00:19:04.560 |
So as always with live demos, things might go wrong, so I have a video version in case. 00:19:21.040 |
And as you can see, we have an activity log where the graph is being continuously updated on the fly. 00:19:28.880 |
And then data is being enriched, and the agents are going and making decisions on top. 00:19:33.120 |
So what you can see here on the side is effectively the logic that is reading, writing, analyzing, 00:19:41.720 |
and using all of this, let's say, preconfigured set of weights and benchmarks to analyze any person. 00:19:51.320 |
You can ingest from any type of a data source, 30-plus data sources supported now. 00:19:57.380 |
You can build graphs from relational databases, semi-structured data, and we also have these 00:20:02.020 |
memory association layers inspired by the cognitive science approach. 00:20:05.920 |
And then effectively, as we kind of build and enrich this graph on the fly, 00:20:14.640 |
we're storing the data back into the graph. 00:20:17.020 |
So this is the stateful, temporal aspect of it. 00:20:20.560 |
We can build the graph in a way that we can add the data back, that we can analyze these 00:20:25.120 |
reports, that we can search them, and that we can let other agents access them on the fly. 00:20:29.460 |
The idea for us was: let's have a place where agents can write and continuously add the data. 00:20:39.760 |
So if we click on any node, we can see the details about the commits, about the information 00:20:45.920 |
from the developers, the PRs, whatever they did in the past, and which repos they contributed to. 00:20:52.820 |
And then at the end, as the graph is pretty much filled, we will see the final report come together. 00:21:02.000 |
So it's preparing now the final output for the hiring decision task. 00:21:07.960 |
So let's have a look at that when it gets loaded. 00:21:13.420 |
I hoped to have a hosted version for you all today, but it didn't work out. 00:21:30.980 |
So I will just show you the video with the end so we don't wait for it. 00:21:39.520 |
So here you can see that towards the end, we can see the graph and we can see the final result. 00:21:53.840 |
And in the green node, we can see that we decided to hire Laszlo, our developer who has a PhD. 00:22:01.280 |
So it's not really difficult to make that call. 00:22:03.200 |
And we see why and we see the numbers and the benchmarks. 00:22:07.520 |
Again, a very fast three-minute demo, so I hope you enjoyed it. 00:22:10.560 |
And if you have some questions, I'm here afterwards. 00:22:13.680 |
So happy to see new users and if you're interested, try it. 00:22:24.960 |
So Vasilija showed us something I call semantic memory. 00:22:29.560 |
So basically you take your data, you load it and cognify it, as they like to say. 00:22:45.480 |
And next up, Alex will show us the Neo4j MCP server. 00:22:53.400 |
Test, test, test, test, test, test, test, five, four, three, two, one, we're good. 00:23:14.320 |
I'm going to demo the memory MCP server that we have available. 00:23:22.440 |
So there is this walkthrough document that I have. 00:23:26.200 |
We'll make this available in the Slack or by some means so that you can do this on your own. 00:23:32.600 |
And what we're going to showcase today is really like the foundational functionality that 00:23:35.880 |
we would like to see in an agentic memory sort of application. 00:23:40.520 |
Primarily, we're going to take a look at semantic memory in this MCP server, but we are currently expanding it. 00:23:46.980 |
And we're going to add additional memory types as well, which we'll discuss probably later. 00:23:53.380 |
So in order to do this, we will need a Neo4j database. 00:23:57.280 |
Neo4j is a graph-native database that we'll be using to store the knowledge graph that we're going to build. 00:24:02.460 |
They have an Aura option, which is hosted in the cloud, or we can just do this locally. 00:24:09.620 |
Additionally, we're going to do this via Claude Desktop. 00:24:16.300 |
And then we can just add this config to the MCP configuration file in Claude. 00:24:22.180 |
And this will just connect to the Neo4j instance that you create. 00:24:26.300 |
And what's happening here is we're going to -- Claude will pull down the memory server from 00:24:32.100 |
PyPI and it will host it in the back end for us. 00:24:34.340 |
And then it will be able to use the tools that are accessible via the MCP server. 00:24:39.980 |
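If you're setting this up on your laptop, the config entry looks roughly like the following, sketched here as Python that prints the JSON. The server key, package name, and environment variable names are my assumptions based on Neo4j's published mcp-neo4j-memory package, so defer to the walkthrough doc for the exact values:

```python
# Prints a Claude Desktop MCP config entry for the Neo4j memory server.
# All values are illustrative; substitute your own Neo4j URI and credentials.
import json

config = {
    "mcpServers": {
        "neo4j-memory": {
            "command": "uvx",                 # fetches and runs the server from PyPI
            "args": ["mcp-neo4j-memory"],
            "env": {
                "NEO4J_URL": "bolt://localhost:7687",
                "NEO4J_USERNAME": "neo4j",
                "NEO4J_PASSWORD": "<your-password>",
            },
        }
    }
}
print(json.dumps(config, indent=2))
```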
And the final thing that we're going to do before we can actually have the conversation 00:24:42.500 |
is we're just going to use this brief system prompt. 00:24:45.580 |
And what this does is just ensure that we are properly recalling and then logging memories throughout the conversation. 00:24:52.540 |
So with that, we can take a look at a conversation that I had in Claude Desktop using this memory server. 00:25:00.680 |
And so this is a conversation about starting an agentic AI memory company. 00:25:09.000 |
And so initially, we have nothing in our memory store, which is as expected. 00:25:13.980 |
Now, as we kind of progress through this conversation, we can see that at each interaction, it tries 00:25:20.080 |
to recall memories that are related to the user prompt. 00:25:24.180 |
And then at the end of this interaction, it will create new entities in our knowledge graph. 00:25:31.660 |
And so in this case, an entity is going to have a name, a type, and then a list of observations. 00:25:37.820 |
And these are just facts that we know about this entity. 00:25:40.600 |
And this is what is going to be updated as we learn more. 00:25:43.920 |
In terms of the relationships, these are just identifying how these entities relate to one another. 00:25:51.020 |
And this is really the core piece of why using a graph database as sort of the context layer 00:25:57.020 |
is so important because we can identify how these entities are actually related to each other. 00:26:06.420 |
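As a rough illustration, the two shapes being described look something like this. The field names follow the common knowledge-graph memory-server convention (entities with observations, typed relations between them) and are not necessarily this server's exact schema:

```python
# Hypothetical payload shapes for the two memory primitives described above.
entity = {
    "name": "AgenticMemoryCo",        # unique identifier for the node
    "type": "Company",                # entity label
    "observations": [                 # facts learned so far; appended over time
        "Founded to build an agentic AI memory product",
    ],
}

relation = {
    "source": "AgenticMemoryCo",      # how one entity relates to another
    "target": "Neo4j",
    "relationType": "BUILDS_ON",
}
```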
And so as this goes on, we can see that we have quite a few interactions. 00:26:10.480 |
We are adding observations, creating more entities. 00:26:14.580 |
And at the very end here, we can see we have quite a lengthy conversation. 00:26:17.620 |
We can say, let's review what we have so far. 00:26:21.180 |
And so we can read the entire knowledge graph back as context, and Claude can then summarize it. 00:26:27.460 |
And so we have all of the entities we found, all the relationships that we've identified, 00:26:31.480 |
and all the facts that we know about these entities based on our conversation. 00:26:35.660 |
And so this provides a nice review of what we discussed about this company and our ideas. 00:26:45.560 |
This is available both in Aura and locally, and we can actually visualize this knowledge graph. 00:26:49.720 |
We can see that we discussed Neo4j, we discussed MCP, and LangGraph. 00:26:54.380 |
And if we click on one of these nodes, we can see that there is a list of observations on it. 00:26:59.680 |
And this is all of the information that we've tracked throughout that conversation. 00:27:02.880 |
And so it's important to know that even though this knowledge graph was created with 00:27:06.180 |
a single conversation, we can also take this and use it in additional conversations. 00:27:10.400 |
We can use this knowledge graph with other clients such as Cursor IDE or Windsurf. 00:27:15.560 |
And so this is really a powerful way to create a memory layer for all of your applications. 00:27:38.380 |
I will just share my personal beliefs about MCPs. 00:27:43.420 |
I was testing the MCPs of Neo4j, Graphiti, Cognee, and Mem0 just before the workshop. 00:27:50.380 |
And I'm a strong believer that this is our future. 00:27:56.140 |
And in a second, I will be showing a mini GraphRAG chat arena. 00:28:01.320 |
And next up, something very, very important that Daniel does is temporal graphs. 00:28:09.720 |
They have 10,000 stars on GitHub and growing very fast. 00:28:26.980 |
So I'm here today to tell you that there's no one-size-fits-all memory. 00:28:36.820 |
And why you need to model your memory after your business domain. 00:28:42.580 |
So if you saw me a little bit earlier and I was talking about Graphiti, Zep's open source 00:28:48.020 |
temporal graph framework, you might have seen me speak to how you can build custom entities 00:28:56.020 |
and edges in the Graphiti graph for your particular business domain. 00:29:02.700 |
So business objects from your business domain. 00:29:05.640 |
What I'm going to demo today is actually how Zep implements that and how easy it is to use. 00:29:13.740 |
And what we've done here is we've solved a fundamental problem plaguing memory. 00:29:18.660 |
And we're enabling developers to build out memory that is far more cogent and capable for many use cases. 00:29:31.580 |
So I'm going to just show you a quick example of where things go really wrong. 00:29:38.940 |
So many of you might have used ChatGPT before. 00:29:44.780 |
And you might have noticed that it really struggles with relevance. 00:29:49.060 |
Sometimes it just pulls out all sorts of arbitrary facts about you. 00:29:53.060 |
And unfortunately, when you store arbitrary facts and retrieve them as memory, you get inaccurate responses. 00:30:02.420 |
And the same problem happens when you're building your own agents. 00:30:10.460 |
Say I'm building a media player agent. It should remember things about jazz music, NPR podcasts, The Daily, et cetera, all the things I listen to. 00:30:18.420 |
But unfortunately, because I'm in conversation with the agent or it's picking up my voice when 00:30:22.700 |
I'm, you know, it's a voice agent, it's learning all sorts of irrelevant things, like I wake 00:30:28.380 |
up at 7:00 a.m., my dog's name is Melody, et cetera. 00:30:33.220 |
And the point here is that irrelevant facts pollute memory. 00:30:39.260 |
They're not specific to the media player business domain. 00:30:43.100 |
And so the technical reality here is as well that many frameworks take this really simplistic approach. 00:30:52.340 |
If you're using a framework that has memory capabilities, agent framework, it's generating 00:30:56.860 |
facts and throwing it into a vector database. 00:30:59.020 |
And unfortunately, the facts dumped into the vector database or Redis mean that when you're 00:31:04.240 |
recalling that memory, it's difficult to differentiate what should be returned. 00:31:08.100 |
We're going to return what is semantically similar. 00:31:12.020 |
And here we have a bunch of facts that are semantically similar to my request for my favorite tunes. 00:31:22.200 |
And unfortunately, Melody is there as well, because Melody is a dog named Melody. 00:31:27.720 |
And that might be something to do with tunes. 00:31:37.880 |
So basically, semantic similarity is not business relevance. 00:31:45.960 |
I was speaking a little bit earlier about how vectors are just basically projections into a semantic space. 00:31:52.220 |
There are no causal or relational links between them. 00:32:00.720 |
We need domain-aware memory, not better semantic search. 00:32:06.200 |
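To make that failure mode concrete, here is a small self-contained sketch with an off-the-shelf embedding model (sentence-transformers with a common default model; the facts are the ones from the example above). It shows how pure cosine similarity can rank the dog's name near a music query:

```python
# Semantic-similarity retrieval pulling in a domain-irrelevant fact.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

facts = [
    "User likes jazz music and NPR podcasts.",
    "User's dog is named Melody.",
    "User wakes up at 7:00 a.m.",
]
query = "Play my favorite tunes."

fact_embeddings = model.encode(facts, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, fact_embeddings)[0]

# "Melody" can score close to a music query on semantics alone,
# even though it has nothing to do with the media-player domain.
for fact, score in sorted(zip(facts, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {fact}")
```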
So with that, I am going to, unfortunately, be showing you a video, because the Wi-Fi has been unreliable. 00:32:37.860 |
In the video, a financial assistant agent is asking me, well, how much do I earn a year? 00:32:42.760 |
It's asking me about what student loan debt I might have. 00:32:46.500 |
And you'll see that on the right-hand side, what is stored in Zep's memory are some very 00:32:59.260 |
specific business objects: we have financial goals, debts, income sources, et cetera. 00:33:08.780 |
And they're defined in a way which is really simple to understand. 00:33:21.020 |
So let's go take a look at some of the code here. 00:33:23.920 |
We have a TypeScript financial goal schema using Zep's underlying SDK. 00:33:32.160 |
We can give a description to the entity type. 00:33:35.120 |
We can even define fields, the business rules for those fields, the values that they take on. 00:33:41.320 |
And then we can build tools for our agent to retrieve a financial snapshot which runs multiple 00:33:47.780 |
Zep searches concurrently and filters by specific node types. 00:33:58.100 |
And when we start our Zep application, what we're going to do is we're going to register 00:34:02.600 |
these particular objects with Zep so it knows to build this ontology in the graph. 00:34:15.880 |
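The demo's schema is in TypeScript; to keep one language across this writeup, here is the same idea sketched in Python with Pydantic models, which is how Graphiti (the framework underneath Zep) accepts custom entity types. Field names are illustrative, not Zep's actual schema:

```python
# Illustrative custom entity types for a financial-coaching domain.
# Registering an ontology like this is what keeps extraction domain-aware:
# only business objects get captured, not arbitrary facts.
from pydantic import BaseModel, Field

class FinancialGoal(BaseModel):
    """A goal the user is saving or planning toward."""
    goal_type: str | None = Field(None, description="e.g. 'retirement', 'house deposit'")
    target_amount: float | None = Field(None, description="Target amount in USD")
    target_date: str | None = Field(None, description="ISO date the user is aiming for")

class DebtAccount(BaseModel):
    """A debt or recurring obligation the user holds."""
    debt_kind: str | None = Field(None, description="e.g. 'student loan', 'rent'")
    monthly_payment: float | None = Field(None, description="Payment per month in USD")

# With Graphiti, models like these are passed at ingestion time, e.g.
# entity_types={"FinancialGoal": FinancialGoal, "DebtAccount": DebtAccount}.
```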
I'm going to say that I have $5,000 a month rent. 00:34:25.720 |
And in a few seconds, we see that Zep has already parsed that new message and has captured that fact. 00:34:38.780 |
And we can see the knowledge graph for this user has got a debt account entity. 00:34:45.240 |
It's got fields on it that we've defined as a developer. 00:34:50.220 |
And so again, we can get really precise about what we retrieve from Zep by filtering. 00:34:57.620 |
So just very quickly, we wrote a paper about how all of this works. 00:35:01.400 |
You can get to it by that link below and appreciate your time today. 00:35:14.720 |
So while I'm getting ready, I would appreciate it if you could confirm with me whether you have access. 00:35:22.520 |
Is the Slack working for you, the Slack channel? 00:35:27.420 |
So I'd appreciate it if you send in any questions you have for any of the speakers. 00:35:36.700 |
And we are happy to answer more of these questions just after the workshop. 00:35:41.160 |
I'll right now move on to a use case that I developed, and to this GraphRAG chat arena. 00:35:51.720 |
To be specific, before delving into agentic memory, into knowledge graphs, I led a private 00:36:04.860 |
cyber security lab and worked for defence clients. 00:36:10.180 |
A very big client with very serious problems on the security side. 00:36:16.320 |
And in one project, I had to navigate between something like 27 or 29 different terminals. 00:36:33.840 |
Like, if you think of different Linux distros, every firewall and networking device usually 00:36:42.740 |
has its own shell. So you need to know lots of languages to communicate with these machines to work with such clients. 00:36:48.060 |
And I realized that LLMs are not only amazing to translate these languages, but they are also 00:36:54.000 |
very good to kind of create a new type of shell, a human language shell. 00:37:00.600 |
But such shells, they would really be excellent if they have episodic memory, the sort of temporal 00:37:10.860 |
memory of what was happening in this shell historically. 00:37:14.440 |
And if we have access to this temporal history, the events, we kind of know what the users were doing. 00:37:23.780 |
We kind of can control every single code execution function that's running, including the ones executed by agents. 00:37:30.420 |
So, together with some investors and advisers of mine, I spotted a niche. 00:37:38.780 |
And I wanted to do a super quick demo of how it would work. 00:37:45.020 |
So basically, you would run commands and type pwd. 00:37:51.380 |
And in a sense, I suppose lots of us had computer science classes or we worked in shell. 00:37:58.980 |
And we have to remember all of these commands, like, show me running Docker containers. 00:38:08.200 |
But if you go for more advanced commands, 00:38:39.540 |
in general, I would need to know right now some command that can extract, for instance, 00:38:45.200 |
the name of the container that's running and its status. 00:38:53.720 |
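Since the live demo doesn't reproduce well on paper, here is a toy sketch of the idea: a shell loop that translates English into commands and keeps an episodic log of every request, the command it became, and the output. The translate() step is a canned placeholder where a real system would call an LLM conditioned on the history; nothing here is the actual product:

```python
# A toy "human-language shell" with episodic memory of what was run and when.
import datetime
import subprocess

HISTORY = []  # episodic log spanning the session

def translate(request: str) -> str:
    # Placeholder for an LLM call (English -> shell), ideally using HISTORY
    # as context. Two canned examples stand in for the model:
    canned = {
        "show me running docker containers": "docker ps",
        "name and status of running containers":
            'docker ps --format "{{.Names}}: {{.Status}}"',
    }
    return canned.get(request.lower().strip(), "echo 'not understood'")

def run(request: str) -> str:
    command = translate(request)
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    HISTORY.append({                      # the temporal, episodic record
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "request": request,
        "command": command,
        "output": result.stdout,
    })
    return result.stdout

print(run("show me running docker containers"))
```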
I can make mistakes, like human language fuzzy mistakes. 00:39:14.920 |
So basically, if you plug in the agentic memory into things like 00:39:24.380 |
that -- I think it got it wrong, but you get me right. 00:39:29.000 |
So if I go through, like, different shells and terminals, and I have this textual context 00:39:36.060 |
of what was done and the context of the certain machine, of what is happening there, 00:39:43.860 |
and it kind of spans across all the machines, all the users, and all the sessions 00:39:49.200 |
in PTYs and TTYs, I think that we can really have a very good context, also for security. 00:39:57.140 |
So that space, the temporal logs, the episodic logs, is something that I see will boom. 00:40:04.500 |
So I believe that all of our agents that will be executing code in terminals will be executing 00:40:13.100 |
it through -- maybe not all, but the enterprise-grade ones -- 00:40:19.020 |
They will be going through agentic firewalls. 00:40:27.880 |
And now let's move on to the GraphRAG chat arena. 00:40:36.880 |
And this doc is allowing you to set up a repo that we've created for this workshop. 00:40:45.360 |
So about a year ago, I met with Jerry Liu from LlamaIndex and we were chatting quite a while 00:40:50.740 |
about how to evolve this conversational memory. 00:41:01.820 |
Data abstractions, I kind of quickly solved within like two months. 00:41:05.440 |
Evals, I realized that there won't be any evals in form of a benchmark. 00:41:11.020 |
All of these hot potatoes and all of that, it's fun. 00:41:13.520 |
I know that there are great papers written by our guest speakers and other folks about these hot topics. 00:41:20.900 |
You can't do a benchmark for a thing that doesn't exist. 00:41:25.060 |
Basically, the agentic GraphRAG memory will be this type of memory that evolves. 00:41:33.780 |
So if you don't know how it will evolve, you will need a simulation arena. 00:41:42.920 |
So one year fast-forward, and we've created a prototype of such an agentic memory arena. 00:41:50.340 |
Think about it like WebArena, but for memory. 00:42:04.320 |
One approach will be sort of the repo, the library itself, and the other is through MCPs. 00:42:12.300 |
Because we don't really know what will work out better. 00:42:15.100 |
So whether repos or the MCPs will work out better. 00:42:17.400 |
So we need to test these different approaches. 00:42:28.420 |
So we get this nice chat where you can talk to these agents. 00:42:39.600 |
And there is a Neo4j agent running behind the scenes. 00:42:43.160 |
There is a Cypher graph agent running behind the scenes. 00:42:47.020 |
And I can kind of for now switch between these agents. 00:42:50.520 |
Maybe I'll increase the font size a little bit. 00:42:52.800 |
So the Neo agent is basically answering the questions about this amazing technology, the graphs, specifically Neo4j. 00:43:04.440 |
And then an agent that is excellent at running Cypher queries talks with me. 00:43:10.660 |
And I'm writing: add to graph that I'm Mark and I'm passionate about memory architectures. 00:43:16.520 |
And basically what it does is it runs these layers that are created by Cognee, by Mem0, by Graphiti, and all the other vendors of semantic and temporal memory solutions. 00:43:29.520 |
Or, specifically, created by the MCP server that Alex was demonstrating, the Neo4j MCP server. 00:43:40.180 |
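As a rough idea of what the Cypher agent might execute for that request, here is an equivalent write using the official Neo4j Python driver. The node labels and relationship type are my guesses; the arena's actual prompts and generated queries aren't shown in the talk:

```python
# Writes "Mark is passionate about memory architectures" into Neo4j.
# Connection details are placeholders; MERGE keeps the write idempotent.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run(
        """
        MERGE (p:Person {name: $name})
        MERGE (t:Topic {name: $topic})
        MERGE (p)-[:PASSIONATE_ABOUT]->(t)
        """,
        name="Mark",
        topic="memory architectures",
    )
driver.close()
```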
So I'm really looking forward to how this technology evolves. 00:43:46.260 |
But what I quickly wanted to show you is that it already works. 00:43:50.460 |
It already has the essence of this agentic memory arena. 00:43:54.920 |
So I can ask my graph questions, and the agent goes through the connection. 00:44:07.660 |
It's just one Neo4j graph on the backend, and all of these technologies can be tested: 00:44:13.360 |
how the graphs are being created and retrieved. 00:44:16.600 |
It's like -- when I think of that, it's like the most brilliant idea that we can do with agentic memory. 00:44:29.240 |
I can basically rerun the commands to see what's happening on this graph. 00:44:38.640 |
And the next thing is, I would like to add to the graph that Vasilija will show how to integrate. 00:44:55.060 |
But then I transfer it to Graphiti, and I can repeat the exact same process. 00:45:00.460 |
So I can right now, using Graphiti, search for what I just added. 00:45:04.960 |
And I can switch between these different memory solutions. 00:45:10.800 |
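The Graphiti side of that round-trip looks roughly like this, with class and method names taken from graphiti-core's public README; exact signatures may differ by version, and LLM/embedding credentials are assumed to be configured:

```python
# Searches the temporal graph for the fact that was just added.
import asyncio
from graphiti_core import Graphiti

async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    # Hybrid semantic + graph search over edges (facts).
    results = await graphiti.search("What is Mark passionate about?")
    for edge in results:
        print(edge.fact)  # e.g. "Mark is passionate about memory architectures"
    await graphiti.close()

asyncio.run(main())
```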
And we do not have time to practice it together and do the workshop, but I'm sure we will write it up. 00:45:20.040 |
And I would appreciate, if you have any questions, pass them on to Slack. 00:45:25.800 |
I will ask Andreas whether we have time for a short Q&A. 00:45:30.480 |
Or we need to move it to the breakout or outside of the room. 00:45:41.680 |
I really would like Vasilija, Daniel, and Alex to come back to stage so you can ask any of 00:45:47.480 |
us, please direct the questions to any of us, and we'll try to answer them. 00:46:00.180 |
How do you decide what is a bad memory over time? 00:46:04.880 |
Because you could, like, as a developer and as a person, we evolve the line of thought. 00:46:10.480 |
So one thing that you thought was good, like, three years, ten years ago, may not be good now. 00:46:21.260 |
So I will answer first -- maybe you guys can help. 00:46:27.740 |
So basically, a bad memory is the one that causes a lot of noise. 00:46:33.740 |
So you decrease noise by redundancy and by relationships. 00:46:39.640 |
So the fewer the relationships and the more the noisiness -- in a sense, a not-well-connected 00:46:47.820 |
node has the potential of not being correct, but there are other ways to validate that. 00:46:57.340 |
A practical way: we let you model the data with Pydantic, so you can kind of pull the data 00:47:03.180 |
you need and add weights to the edges and nodes. 00:47:06.420 |
So you can do something like temporal weighting, you can add your custom, let's say, logic and 00:47:10.600 |
then effectively you would know how your data is kind of evolving in time and how it's becoming 00:47:15.260 |
less or more relevant and what is the set of algorithms you would need to apply. 00:47:19.600 |
So this is the idea, not solve it for you, but help you solve it with tooling. 00:47:23.880 |
But yeah, there is -- depends on the use case, I would say. 00:47:28.780 |
I think what I would add is that there are missing causal links. 00:47:33.740 |
Missing causal links are most probably a good indicator of fuzziness. 00:47:44.960 |
How would you embed in security or privacy into the network or the application layer? 00:47:51.620 |
If there's a corporate, they have top secret data, or I have personal data that is a graph -- how do you protect it? 00:48:04.380 |
So basically, you do have to have that context. 00:48:07.100 |
You do have to have these decisions, intentions of colonels, of majors, and anyone in the 00:48:13.280 |
enterprise -- like CISOs and anyone in the enterprise stack. 00:48:17.640 |
And in a sense, it also gets kind of fuzzy and complex, so I expect this to be a very challenging area. 00:48:25.140 |
But I'm sure that applying ontologies, the right ontologies, first of all, to this enterprise 00:48:30.400 |
cybersecurity stack really kind of provides these guardrails for navigating this challenging 00:48:38.040 |
problem and decreasing this fuzziness and errors. 00:48:43.040 |
I would also just add, like, all these applications are built on Neo4j. 00:48:46.740 |
And so in Neo4j, you can, like, do role-based access controls, and so you can prevent users 00:48:53.200 |
from accessing data that they're not allowed to see. 00:48:55.580 |
So it's something that you can configure with that. 00:49:10.460 |
Like, we also noticed that if you isolate a graph per user or kind of keep it, like, 00:49:11.460 |
very physically separate, for us, it really works well. 00:49:19.840 |
Mark, in your earlier presentation, you mentioned this equation that relates to gravity, entropy, and curvature. 00:49:28.220 |
Could you show those two again and explain them again? 00:49:33.720 |
Other than that, it's probably for a series of papers to properly explain that. 00:49:41.780 |
The other one is that if you take all the attention, diffusion, and VAEs, which are doing 00:49:45.840 |
the smoothing, it preserves the sort of asymmetries. 00:49:50.000 |
So very briefly speaking, let's set up the vocabulary. 00:49:53.060 |
So first of all, curvature equals attention equals gravity. 00:49:57.280 |
This is the very simple, most important principle here. 00:50:00.520 |
When writing these papers, we are really trying to define these three tightly. 00:50:11.640 |
And if it's not the exact same thing, if there are other definitions, we need to show what's different. 00:50:17.040 |
And now, if you think about attention, it kind of shows the sort of, like, pathways toward the answer. 00:50:25.360 |
If you take a sphere, if you start bending that sphere and make it like, you know, you kind 00:50:30.360 |
of try to extend it, two things happen, entropy increases and curvature increases, in a sense. 00:50:37.560 |
And what Perelman did: he proved that you can, like, bend these spheres in any way, in 3D; 00:50:43.360 |
4D and 5D and higher-dimensional spheres were already solved. 00:50:49.520 |
And these equations are proving that basically there won't be any other architectures for LLMs. 00:50:55.360 |
It will be just attention, diffusion models, and VAEs. 00:50:58.400 |
Maybe not just VAEs, but, like, kind of, something that smooths -- that leaves room for biases. 00:51:12.400 |
And we'll answer the questions outside of the room.