
Conversational Memory in LangChain for 2025


Chapters

0:00 Conversational Memory in LangChain
1:12 LangChain Chat Memory Types
4:26 LangChain ConversationBufferMemory
8:23 Buffer Memory with LCEL
13:14 LangChain ConversationBufferWindowMemory
16:01 Buffer Window Memory with LCEL
22:32 LangChain ConversationSummaryMemory
25:17 Summary Memory with LCEL
30:12 Token Usage in LangSmith
32:8 Conversation Summary Buffer Memory
34:36 Summary Buffer with LCEL

Whisper Transcript

00:00:00.000 | in this chapter we're going to be taking a look at conversational memory in langchain we're going
00:00:06.160 | to be taking a look at the core chat memory components that have been in langchain
00:00:13.760 | since the start but are essentially no longer in the library and we'll be seeing how we actually
00:00:20.560 | implement those historic conversational memory utilities in the new versions of langchain so
00:00:29.920 | 0.3 now as a pre-warning this chapter is fairly long but that is because conversational memory is
00:00:37.360 | just such a critical part of chatbots and agents conversational memory is what allows them to
00:00:44.320 | remember previous interactions and without it our chatbots and agents would just be responding to
00:00:51.200 | the most recent message without any understanding of previous interactions within a conversation
00:00:57.360 | so they would just not be conversational and depending on the type of conversation we might
00:01:03.920 | want to go with various approaches to how we remember those interactions within a conversation now throughout
00:01:12.960 | this chapter we're going to be focusing on these four memory types we'll be referring to these and i'll
00:01:19.200 | be showing you actually how each one of these works but what we're really focusing on is rewriting these
00:01:25.600 | for the latest version of langchain using what's called the runnable with message
00:01:33.840 | history so we're going to be essentially taking a look at the original implementations for each
00:01:42.080 | of these four original memory types and then we'll be rewriting them with the runnable with message history
00:01:48.320 | class so just taking a look at each of these four very quickly conversation buffer memory is i think
00:01:56.560 | the simplest most intuitive of these memory types it is literally just you have your messages they come
00:02:05.520 | in to this object they are stored in this object as essentially a list and when you need them again
00:02:12.640 | it will return them to you there's nothing else to it it's super simple the conversation buffer
00:02:18.880 | window memory okay so there's a new word in the middle window this works in pretty much the same way
00:02:25.760 | but those messages that it has stored it's not going to return all of them for you instead it's just going to
00:02:31.680 | return the most recent let's say the most recent three for example okay and that is defined by a parameter
00:02:39.280 | k conversation summary memory rather than keeping track of the entire interaction history directly what
00:02:47.680 | it's doing is as those interactions come in it's actually going to take them and it's going to compress
00:02:53.600 | them into a smaller little summary of what has been within that conversation and as every new interaction
00:03:01.200 | comes in it's going to do that and keep iterating on that summary and then that is going
00:03:06.400 | to return to us when we need it and finally we have the conversational summary buffer memory so the
00:03:14.000 | buffer part of this is actually referring to a very similar thing to the buffer window memory but
00:03:21.040 | rather than it being the most recent k messages it's looking at the number of tokens within your memory
00:03:27.360 | and it's returning the most recent k tokens that's what the buffer part is there for and then it's also
00:03:36.480 | merging that with the summary memory here so essentially what you're getting is almost like a list of the most
00:03:44.080 | recent messages based on the token length rather than the number of interactions plus a summary which would
00:03:50.960 | you know come at the top here so you get kind of both the idea is that obviously this summary here
00:03:57.760 | would maintain all of your interactions in a very compressed form so you're losing less
00:04:05.360 | information and you're still maintaining you know maybe the very first interaction the user might have
00:04:10.240 | introduced themselves giving you their name hopefully that would be maintained within the summary and it
00:04:16.560 | would not be lost and then you have almost like a higher resolution on the most recent k tokens
00:04:24.000 | from your memory okay so let's jump over to the code we're going into the zero four chat memory notebook
00:04:30.880 | okay open that in colab okay now here we are let's go ahead and install the prerequisites run all
00:04:37.520 | we again can choose whether or not to use langsmith it is up to you enter that and let's come down and start
00:04:47.360 | so first we just initialize our llm using gpt-4o mini in this example again with a low temperature
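as a rough sketch, that initialization looks something like this (assuming the langchain-openai package and an openai api key set in the environment):

```python
# sketch of the llm setup described above
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
```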
00:04:55.920 | and we're going to start with conversation buffer memory okay so this is the original version of
00:05:03.760 | this memory type so we're here with conversation buffer memory and we're
00:05:12.320 | returning messages that needs to be set to true so the reason that we set return messages to true as it
00:05:19.040 | mentions up here is if you do not do this it's going to be returning your chat history as a string to an llm
00:05:27.280 | whereas chat llms nowadays expect message objects so you just want to be returning these
00:05:37.440 | as messages rather than as strings okay otherwise you're going to get some kind of strange
00:05:43.200 | behavior out of your llms if you return them strings so you do want to make sure that it's true i think
00:05:48.400 | by default it might not be true but this is deprecated right it does tell you here
00:05:54.240 | with a deprecation warning this is coming from older langchain but it's a good place to start just to
00:06:00.000 | understand this and then we're going to rewrite this with the runnables which is the recommended way of
00:06:04.880 | doing so nowadays okay so adding messages to our memory we're going to write this okay so it's just a
00:06:13.040 | conversation user ai user ai and so on a random chat the main things to note here are that i provide my name
00:06:20.480 | and we have the model name right towards the start of those interactions okay so i'm just going to
00:06:25.760 | add all of those we do it like this okay then we can just see we can load our history like so so let's just
00:06:37.840 | see what we have there okay so we have a human message ai message human message right this is
00:06:42.960 | exactly what i showed you just here it's just in that message format from langchain okay so we can
00:06:50.640 | do that alternatively we can actually do this so we can get our memory we initialize the conversation
00:06:56.960 | buffer memory as we did before and we can add each message directly into our memory like that
00:07:03.600 | so we can use this add user message add ai message so on and so on load again and it's going to give
00:07:08.800 | us the exact same thing again there's multiple ways to do the same thing
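put together, the deprecated pattern just described would look roughly like this (a sketch using the legacy langchain.memory api; the example messages follow the video):

```python
# sketch of the deprecated conversationbuffermemory pattern described above
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("hi, my name is james")
memory.chat_memory.add_ai_message("hey james, i'm an ai model called zeta, how can i help?")

# returns {"history": [HumanMessage(...), AIMessage(...)]}
print(memory.load_memory_variables({}))
```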
00:07:15.920 | cool so we have that now to pass all of this into our llm again this is all deprecated stuff we're going to learn the proper way in a moment
00:07:21.440 | but this is how langchain did things in the past so to pass all of this into our llm we'd be using this
00:07:28.720 | conversation chain right again this is deprecated nowadays we would be using lcel for this so i just
00:07:37.520 | want to show you okay how this would all go together and then we would invoke okay what is my name again
00:07:42.240 | let's run that and we'll see what we get it's remembering everything remember so this conversation
00:07:49.120 | buffer memory it doesn't drop messages it just remembers everything right and honestly with the sort of high
00:07:56.960 | context windows of many llms that might be what you do it depends on how long you expect a conversation
00:08:02.160 | to go on for but you probably in most cases would get away with this okay so let's
00:08:09.360 | see what we get i say what is my name again okay let's see what it gives me it says your name is james
00:08:15.520 | great thank you that works now as i mentioned all of this i just showed you is actually deprecated that's
00:08:22.240 | the old way of doing things let's see how we actually do this in modern or up-to-date langchain
00:08:28.080 | so we're going to be using this runnable with message history to implement that we will need to use lcel
00:08:34.720 | and for that we will need to define our prompt templates and our llm as we usually would okay so we're
00:08:41.520 | going to set up our system prompt which is just you are a helpful assistant called zeta okay we're going to put in this
00:08:48.640 | messages placeholder okay so that's important essentially that is where our messages are coming
00:08:56.320 | from our conversation buffer memory is going to be inserted right so it's going to be that chat history
00:09:02.880 | is going to be inserted after our system prompt but before our most recent query which is going to be
00:09:08.880 | inserted last here okay so messages placeholder item that's important and we use that throughout the
00:09:16.160 | course as well so we use it both for chat history and we'll see later on we also use it for the
00:09:21.120 | intermediate thoughts that an agent would go through as well so important to remember that little thing
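as a sketch, the prompt setup described here would be something like the following (the exact system prompt wording is assumed; the variable names history and query follow the video):

```python
# sketch of the prompt template with a placeholder for chat history
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant called Zeta."),
    MessagesPlaceholder(variable_name="history"),  # chat history is inserted here
    ("human", "{query}"),  # the most recent user query comes last
])
```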
00:09:28.000 | we'll link our prompt template to our llm again if we would like we could also add in more here i think we only
00:09:36.800 | have the query here we would probably also want our history as well but i'm not going to do that right
00:09:44.240 | now okay so we have our pipeline and we can go ahead and actually define our runnable with message history
00:09:51.840 | now this class or object when we are initializing it does require a few items we can see them here
00:09:57.440 | okay so we see that we have our pipeline with history you can
00:10:03.520 | see here right we have that history messages key right this here has to align with what we provided
00:10:10.480 | as the messages placeholder in our pipeline right so we have our pipeline prompt template here and here
00:10:19.600 | right so that's where it's coming from it's coming from messages placeholder variable name is history right
00:10:24.240 | that's important that links to this then for the input messages key here we have query that again links to this
00:10:34.960 | okay so both important to have that the other thing that is important is obviously we're passing in that
00:10:42.240 | pipeline from before but then we also have this get session history basically what this is doing
00:10:47.840 | is it's saying okay i need to get the list of messages that make up my chat history that are going to be
00:10:53.280 | inserted into this variable so that is a function that we define okay and within this function
00:10:59.920 | what we're trying to do here is actually replicate what we have with the previous conversation buffer
00:11:08.240 | memory okay so that's what we're doing here so it's very simple right so we have this in memory chat
00:11:17.040 | message history okay so that's just the object that we're going to be returning what this will do is
00:11:22.320 | it will set up a session id the session id is essentially a unique identifier so that each
00:11:28.240 | interaction within a single conversation is mapped to that specific
00:11:32.880 | conversation so you don't have overlapping let's say of multiple users using the same system you want
00:11:37.520 | to have a unique session id for each one of those okay and what it's doing is saying okay if session id is
00:11:43.120 | not in the chat map which is this empty dictionary we defined here we are going to initialize that
00:11:50.240 | session with an in memory chat message history okay and that's it and we return okay and all that's going
00:11:59.280 | to do is it's going to basically append our messages they will be appended within this chat map session id
00:12:06.400 | and they're going to get returned there's really nothing else to it to be honest
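in code, that wiring looks roughly like this (a sketch; pipeline is the prompt-plus-llm chain from above):

```python
# sketch of runnable with message history over the lcel pipeline
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_map = {}  # maps session_id -> that session's message history

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    # create a fresh history the first time we see a session_id
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_session_history,
    input_messages_key="query",     # matches the "{query}" prompt variable
    history_messages_key="history", # matches the messagesplaceholder name
)

# each invoke passes a session_id so conversations stay separate
pipeline_with_history.invoke(
    {"query": "hi, my name is james"},
    config={"configurable": {"session_id": "id_123"}},
)
```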
00:12:12.960 | so we invoke our runnable let's see what we get oh i need to run this
00:12:17.840 | okay note that we do have this config so we have a session id that's to again as i mentioned keep
00:12:26.560 | different conversations separate okay so we've run that now let's run a few more so what is my name again
00:12:33.360 | let's see if it remembers your name is james how can i help you today james okay so what we've just
00:12:41.520 | done there is literally conversation buffer memory but for up-to-date langchain with lcel with runnables
00:12:51.360 | so you know the recommended way of doing it nowadays so that's a very simple example okay there's really
00:12:59.520 | not that much to it it gets a little more complicated as we start thinking about the
00:13:04.720 | different types of memory although with that being said it's not massively complicated we're only really
00:13:09.760 | going to be changing the way that we're getting our interactions so let's dive into that and
00:13:17.440 | see how we will do something similar with the conversation buffer window memory but first let's
00:13:22.480 | actually just understand okay what is the conversation buffer window memory so as i mentioned near the start
00:13:27.840 | it's going to keep track of the last k messages so there's a few things to keep in mind here more
00:13:34.240 | messages does mean more tokens sent with each request and if we have more tokens in each request
00:13:40.160 | it means that we're increasing the latency of our responses and also the cost so with the previous
00:13:46.240 | memory type we're just sending everything and because we're sending everything that is going to be
00:13:50.960 | increasing our costs going to be increasing our latency for every message especially as a conversation
00:13:55.440 | gets longer and longer and we might not necessarily want to do that so with this conversation
00:14:01.360 | buffer window memory we're going to just say okay just return me the most recent messages so let's
00:14:09.600 | see how that would work here we're going to return the most recent four messages okay we again
00:14:16.640 | make sure return messages is set to true again this is deprecated this is just the old way of doing it
00:14:22.800 | and in a moment we'll see the updated way of doing this
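for reference, the deprecated window memory setup would be something like this (a sketch using the legacy langchain.memory api):

```python
# sketch of the deprecated window memory, keeping only the last k messages
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=4, return_messages=True)
```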
00:14:31.680 | then we add all of our messages and just see here all right so we've added in all these messages there's more than four messages here
00:14:38.640 | and we can actually see that here so we have human message ai human ai human ai human ai right so we've
00:14:47.680 | got four pairs of human ai interactions there but there's more than four pairs in the conversation so
00:14:54.080 | four pairs will take us back all the way to here i'm researching different types of conversational
00:15:02.000 | memory okay and if we take a look here the first message we have is i'm researching different
00:15:07.280 | types of conversational memory so it's cut off these two here which will be a bit problematic when we ask
00:15:13.280 | it what our name is okay so let's just see i'm going to be using the conversation chain object again
00:15:19.040 | just remember that is deprecated and i want to say what is my name again let's see what it says
00:15:26.880 | i'm sorry but i don't know your name or any personal information if you like you can tell
00:15:31.440 | me your name right so it doesn't actually remember so that's kind of a negative of the conversation
00:15:39.360 | buffer window memory of course to fix that in this scenario we might just want to increase k
00:15:45.680 | maybe we set it to the previous eight interaction pairs and it will actually remember so what's my name
00:15:52.400 | again your name is james so now it remembers we've just modified how much it is remembering but of course
00:15:58.240 | you know there's pros and cons to this it really depends on what you're trying to build so let's
00:16:03.440 | take a look at how we would actually implement this with the runnable with message history
00:16:09.520 | okay so it's getting a little more complicated here although it's not really complicated
00:16:18.560 | as we'll see okay so we have a buffer window message history we're creating a class here
00:16:23.520 | this class is going to inherit from the base chat message history object from langchain okay and all of
00:16:32.080 | our other message history objects will do the same thing before with the in-memory message history that
00:16:39.040 | was basically replicating the buffer memory we didn't actually need to
00:16:46.480 | define our own class in this case we do so we follow the same pattern that langchain
00:16:53.520 | follows with this base chat message history and you can see a few of the functions here that are
00:16:58.720 | important so add messages and clear are the ones that we're going to be focusing on we also need to
00:17:03.760 | have messages which is this object attribute here okay so we're just implementing the synchronous
00:17:11.360 | methods here if we want this to support async we would have to add aadd_messages
00:17:18.720 | aget_messages and aclear as well so let's go ahead and do that we have messages we have k again
00:17:26.000 | we're looking at remembering the top k messages or most recent k messages only so it's important that we
00:17:31.520 | have that variable we are adding messages through this class this is going to be used by langchain within our
00:17:38.560 | runnable so we need to make sure that we do have this method and all we're going to be doing is
00:17:43.200 | extending the self messages list here and then we're actually just going to be trimming that down so
00:17:48.800 | that we're not remembering anything beyond those you know most recent k messages that we have set from
00:17:56.720 | here and then we also have the clear method as well so we need to include that that's just going to clear the history
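a minimal sketch of the class being described, assuming langchain_core's base chat message history:

```python
# sketch of a custom window history that trims to the most recent k messages
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory):
    def __init__(self, k: int = 4):
        self.messages: list[BaseMessage] = []
        self.k = k  # number of most recent messages to keep

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # extend the history, then trim so only the most recent k survive
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        # wipe the stored history
        self.messages = []
```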
00:18:03.520 | okay so this isn't complicated right it just gives us this nice
00:18:08.240 | default standard interface for message history and we just need to make sure we're following that pattern
00:18:15.440 | okay i've included this print here just so we can see what's happening okay so we have that
00:18:20.560 | and now for that get chat history function that we defined earlier rather than using the built-in method
00:18:29.120 | we're going to be using our own object which is a buffer window message history which we defined just
00:18:34.880 | here okay so if session id is not in the chat map as we did before we're going to be initializing our
00:18:42.400 | buffer window message history we're setting k up here with a default value of 4 and then we just return it
00:18:48.640 | okay and that's it so let's run this we have our runnable with message history we have all of these
00:18:55.920 | variables which are exactly the same as before but then we also have these variables here with this history
00:19:01.920 | factory config and this is where if we have new variables that we've added to our message history
00:19:12.480 | in this case k that we have down here we need to provide that to langchain to tell it this is a new
00:19:18.560 | configurable field okay and we've also added it for the session id here as well so we're just being
00:19:24.560 | explicit and have everything in there so we have that and we run
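the configurable-fields wiring described here would look roughly like this (a sketch reusing the bufferwindowmessagehistory class and chat_map dict from the earlier sketches; the default values are assumptions):

```python
# sketch of exposing session_id and k as configurable fields
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(id="session_id", annotation=str, default="id_default"),
        ConfigurableFieldSpec(id="k", annotation=int, default=4),
    ],
)

# both session_id and k can now be set per invocation
pipeline_with_history.invoke(
    {"query": "hi, my name is james"},
    config={"configurable": {"session_id": "id_123", "k": 4}},
)
```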
00:19:33.600 | okay now let's go ahead and invoke and see what we get so important here this history factory config is being fed through
00:19:41.200 | into our invoke so that we can actually modify those variables from here okay so we have config configurable
00:19:47.760 | session id okay we'll just put whatever we want in here and then we also have the number k
00:19:52.480 | okay so remember the previous four interactions i think in this one we're doing something slightly
00:20:00.320 | different i think we're remembering the four interactions rather than the previous four
00:20:04.800 | interaction pairs okay so my name is james we're going to go through i'm just going to actually
00:20:10.480 | clear this and now i'm going to start again and we're going to use the exact same add user message
00:20:16.000 | and ai message that we used before we're just manually inserting all that into our history
00:20:20.080 | so that we can then just see okay what is the result and you can see that k equals four is actually
00:20:27.360 | unlike before where we were saving the most recent four interaction pairs we're now saving the
00:20:35.680 | most recent four interactions not pairs just interactions and honestly i just think that's
00:20:41.920 | clearer i think it's weird that the number four for k would actually save the most recent eight messages
00:20:48.800 | right i think that's odd so i'm just not replicating that weirdness
00:20:54.400 | we could if we wanted to i just don't like it so i'm not doing that and anyway we can see from messages
00:21:01.920 | that we're returning just the four most recent messages okay it should be these four okay cool so
00:21:08.960 | just using the runnable we've replicated the old way of having a window memory and okay i'm going
00:21:17.680 | to say what is my name again as before it's not going to remember so we can come to here i'm sorry
00:21:22.880 | but i don't have access to personal information and so on if you'd like to tell me your name
00:21:27.120 | it doesn't know now let's try a new one where we initialize a new session okay so we're going with idk14
00:21:36.320 | so that's going to create a new conversation there and we're going to set k to 14
00:21:43.360 | okay great i'm going to manually insert the other messages as we did before okay and we can see
00:21:50.160 | all of those you can see at the top here we are still maintaining that hi my name is james message
00:21:54.880 | now let's see if it remembers my name your name is james okay there we go cool so that is working and
00:22:03.520 | we can also see so we just added this what is my name again let's just see if that got added to our
00:22:10.320 | list of messages right what is my name again nice and then we also have the response your name is
00:22:15.920 | james so just by invoking this because we're using the runnable with message history it's just
00:22:23.040 | automatically adding all of that into our message history which is nice cool all right so that is the
00:22:31.600 | buffer window memory and now we are going to take a look at how we might do something a little more
00:22:37.600 | complicated which is the summaries okay so when you think about the summary you know what are we
00:22:42.720 | doing we're actually taking the messages and using an llm call to summarize them to compress them
00:22:50.160 | and then we're storing them within messages so let's see how we would actually do that so to start with
00:22:58.720 | let's just see how it was done in old langchain so you have conversation summary memory
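that deprecated setup would be along these lines (a sketch; the summary memory needs an llm to write its summaries):

```python
# sketch of the deprecated summary memory described above
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=llm, return_messages=True)
```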
00:23:06.320 | we go through that and see what we get so again same interactions right i'm just invoking
00:23:15.680 | repeatedly i'm not adding these directly to the messages because it actually needs to go through
00:23:20.320 | that summarization process and if we have a look we can see it happening okay current
00:23:28.000 | conversation hello there my name is james the ai is generating current
00:23:34.560 | conversation the human introduces himself as james ai greets james warmly and expresses its readiness to
00:23:40.640 | chat and assist inquiring about how his day is going right so it's summarizing the previous
00:23:46.880 | interactions and then we have you know after that summary we have the most recent human message and
00:23:53.120 | then the ai is going to generate its response okay and that continues going continues going and you
00:23:58.800 | see that the final summary here is going to be a lot longer okay and it's different from that first
00:24:03.520 | summary of course asking about his day james mentions he's researching different types of conversational
00:24:08.080 | memory the ai responds enthusiastically explaining that conversational memory includes short-term
00:24:13.200 | memory long-term memory contextual memory personalized memory and then inquires if james is focused on a
00:24:17.680 | specific type of memory okay cool so essentially the summary is just getting longer and
00:24:25.200 | longer as we go but at some point the idea is that it's not going to keep just growing and it should
00:24:30.400 | actually be shorter than if you were saving every single interaction whilst maintaining as much of the
00:24:36.400 | information as possible but of course you're not going to maintain all of the information that you would with
00:24:43.520 | for example the buffer memory right with the summary you are going to lose information but hopefully
00:24:50.720 | less information than if you're just cutting interactions so you're trying to reduce your token count
00:24:57.520 | whilst maintaining as much information as possible now let's go and ask what is my name again it should be able to
00:25:05.520 | answer because we can see in the summary here that i introduced myself as james okay response your name
00:25:13.360 | is james how is your research going okay cool let's see how we'd implement that so again
00:25:20.880 | as before we're going to go with that conversation summary message history we're going to be importing a
00:25:27.600 | system message we're going to be using that not for the llm that we're chatting with but for the llm that
00:25:32.320 | will be generating our summary actually that is not quite correct there it says create a summary not
00:25:40.880 | that it matters it's just the docstring so we have our messages and we also have the llm so a
00:25:45.600 | difference here to what we had before is when we initialize the conversation summary message history
00:25:51.920 | we need to pass in our llm we have the same methods as before we have add messages and clear and what we're
00:25:59.280 | doing is as messages come in we extend with our current messages but then we're modifying those okay so we
00:26:07.120 | construct our instructions to make a summary okay so that is here we have the system prompt
00:26:15.360 | given the existing conversation summary and the new messages generate a new summary of the conversation
00:26:19.840 | ensuring to maintain as much relevant information as possible okay then we have a human message here
00:26:26.000 | through that we're passing the existing summary okay and then we're passing in the new messages
00:26:32.480 | okay cool so we format those and invoke the llm
00:26:39.120 | here and then what we're doing is in the messages we're actually replacing the existing history that we had
00:26:47.680 | before with a new history which is just a single system summary message
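pulling that together, the class described here looks roughly like this (a simplified sketch; the prompt wording follows the video and the x.content fix shown a little later is already applied):

```python
# sketch of a summary-based history that replaces all messages with one summary
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate

class ConversationSummaryMessageHistory(BaseChatMessageHistory):
    def __init__(self, llm):
        self.messages: list[BaseMessage] = []
        self.llm = llm  # llm used to generate the rolling summary

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # after the first call, the whole history is one system summary message
        existing_summary = self.messages[0].content if self.messages else ""
        summary_prompt = ChatPromptTemplate.from_messages([
            ("system", "Given the existing conversation summary and the new "
                       "messages, generate a new summary of the conversation, "
                       "ensuring to maintain as much relevant information as possible."),
            ("human", "Existing conversation summary:\n{existing_summary}\n\n"
                      "New messages:\n{messages}"),
        ])
        new_summary = self.llm.invoke(summary_prompt.format_messages(
            existing_summary=existing_summary,
            messages=[m.content for m in messages],  # content only, not full objects
        ))
        # replace the stored history with the fresh summary
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        self.messages = []
```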
00:26:56.080 | okay let's see what we get as before we have that and get chat history exactly the same as before the only real difference is that
00:27:01.520 | we're passing in the llm parameter here and of course as we're passing in the llm parameter here it does
00:27:07.840 | also mean that we're going to have to include that in the configurable field spec and that we're going to need to
00:27:14.160 | include that when we're invoking our pipeline okay so we run that passing the llm
00:27:21.440 | now of course one side effect of generating summaries for everything is that
00:27:28.640 | we're generating more so you are actually using quite a lot of tokens whether or not you are
00:27:35.600 | saving tokens actually depends on the length of a conversation as a conversation gets longer if you're
00:27:41.120 | storing everything after a little while the token usage is actually going to increase so if in your
00:27:48.560 | use case you expect to have shorter conversations you would be saving money and tokens by just using this
00:27:55.920 | standard buffer memory whereas if you're expecting very long conversations you would be saving tokens
00:28:02.800 | and some money by using the summary history okay so let's see what we got from there we have a summary of the conversation
00:28:10.080 | james introduced himself by saying hi my name is james i responded warmly asking hi james and no
00:28:14.960 | interaction included details about token usage okay so we actually included everything here which we probably should not have done
00:28:23.920 | why did we do that so in here we're including all of the
00:28:32.560 | message objects rather than just the content from the messages so i think if we just do
00:28:42.320 | x.content for x in messages that should resolve that
00:28:49.760 | okay there we go so we quickly fixed that so yeah before we were passing in the entire messages object
00:28:57.920 | which obviously includes all of this information whereas actually we just want to be passing in the content
00:29:03.760 | so we modified that and now we're getting what we expect okay cool and then we can keep going all right
00:29:11.920 | so as we keep going the summary should get more abstract as we just saw here it's
00:29:18.560 | literally just giving us the messages directly almost okay so we're getting the summary there
00:29:24.720 | and we can keep going we're going to add more messages to that we'll
00:29:30.400 | send those get a response send again get a response we're just adding all of that
00:29:36.560 | invoking all of that and it will of course be adding everything into our message history okay cool so
00:29:43.440 | we've run that let's see what the latest summary is
00:29:46.480 | okay and then we have this so this is the summary that we have instead of our chat history okay cool
00:29:57.360 | now finally let's see what's my name again we can just double check you know has my name in there
00:30:02.800 | so it should be able to tell us
00:30:04.320 | okay cool so your name is james pretty interesting so let's have a quick look over at langsmith so the
00:30:17.760 | reason i want to do this is just to point out the different token usage that we're getting
00:30:23.120 | with each one of these okay so we can see that we have these runnable with message history runs whose
00:30:27.360 | naming could probably be improved but we can see okay how long each one of these has taken how many
00:30:34.960 | tokens are they using come back to here we have this runnable with message history and we'll go
00:30:42.240 | through a few of these i think we can see here this is that first interaction where we're using
00:30:48.880 | the buffer memory and we can see how many tokens were used here so 112 tokens when we're asking what
00:30:56.080 | is my name again okay then we modified this to include i think it was like 14 interactions or something along
00:31:04.480 | those lines which obviously increases the number of tokens that we're using right so we can see that actually
00:31:09.440 | happening all in langsmith which is quite nice and we can compare okay how many tokens is each one of
00:31:14.160 | these using now this is looking at the buffer window and then if we come down to here and look at
00:31:20.640 | this one so this is using our summary okay so our summary with what is my name again actually uses more
00:31:27.600 | tokens in this scenario right which is interesting because we're trying to compress information the reason
00:31:33.200 | it uses more is because there haven't been that many interactions yet as the conversation length
00:31:39.440 | increases with the summary this total number of tokens especially if we prompt it correctly to keep that
00:31:46.080 | low that should remain relatively small whereas with the buffer memory that will just keep increasing
00:31:54.480 | and increasing as the conversation gets longer so a useful little way of using langsmith there to
00:32:02.560 | just figure out okay in terms of tokens and costs what we're looking at for each of these
00:32:07.840 | memory types okay so our final memory type acts as a mix of the summary memory and the buffer memory
00:32:17.040 | so what it's going to do is keep the buffer up until an n number of tokens and then once a message exceeds
00:32:25.920 | the n number of token limit for the buffer it is actually going to be added into our summary so this
00:32:33.280 | memory has the benefit of remembering in detail the most recent interactions whilst also not having the limitation
00:32:43.840 | of using too many tokens as a conversation gets longer and even potentially exceeding context windows if you
00:32:51.600 | try super hard so this is a very interesting approach now as before let's try the original way of
00:32:59.280 | implementing this then we will go ahead and use our updated method for implementing this so we come down to
00:33:07.280 | here and we're going to do from langchain memory import conversation summary buffer memory okay a few things
00:33:15.040 | here the llm for the summary we have the n number of tokens that we can keep before they get added to the summary
00:33:23.440 | and then return messages of course okay you can see again this is deprecated we use the conversation chain
00:33:29.760 | and then we just pass in our memory there and then we can chat okay so super straightforward
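that deprecated setup would look something like this (a sketch; the 300 token limit matches the figure mentioned below):

```python
# sketch of the deprecated summary buffer memory described above
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,              # llm used to write the rolling summary
    max_token_limit=300,  # tokens kept verbatim before summarizing
    return_messages=True,
)
```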
00:33:37.520 | for the first message we'll add a few more here again we have to invoke because the memory type here is using the llm to create
00:33:46.560 | those summaries as it goes and let's see what they look like okay so we can see for the first message here
00:33:52.560 | we have a human message and then an ai message then we come a little bit lower down again same thing human
00:34:00.080 | message is the first thing in our history here then it's a system message so this is at the point where
00:34:06.560 | we've exceeded that 300 token limit and the memory type here is generating those summaries so that summary
00:34:14.560 | comes in as a system message and we can see okay the human named james introduces himself and mentions he's
00:34:20.240 | researching different types of conversation memory and so on right okay cool so we have that
00:34:26.320 | then let's come down a little bit further we can see okay so the summary there okay so
00:34:35.120 | that's what we have that is the implementation for the old version of this memory again we can see it's
00:34:43.520 | deprecated so how do we implement this for our more recent versions of langchain and specifically 0.3 well
00:34:52.720 | again we're using that runnable with message history and it looks a little more complicated than we were
00:34:59.360 | getting before but it's actually nothing too complex we're just creating a summary as we
00:35:06.800 | did with the previous memory type but the decision for adding to that summary is based on in this case
00:35:14.320 | actually the number of messages so i didn't go with the langchain version where it's a number of tokens
00:35:21.040 | i don't like that i prefer to go with messages so what i'm doing is saying okay the last k messages okay
00:35:27.520 | once we exceed k messages the messages beyond that are going to be added to the summary okay cool so
00:35:36.160 | let's see we first initialize our conversation summary buffer message history class with llm and k
00:35:47.040 | okay so these two here so llm of course to create summaries and k is just the limit on the number of messages
00:35:53.360 | that we want to keep before adding them to the summary or dropping them from our messages and
00:35:58.160 | adding them to the summary okay so we will begin with okay do we have an existing summary so the reason
00:36:07.840 | we set this to none is that we can't extract the existing summary unless it already exists and the
00:36:16.560 | only way we can do that is by checking okay do we have any messages if yes we want to check if within
00:36:22.880 | those messages we have a system message because we're doing the same structure as what we
00:36:27.840 | have up here where the system message that first system message is actually our summary so that's
00:36:32.720 | what we're doing here we're checking if there is a summary message already stored within our messages
00:36:37.760 | okay so we're checking for that if we find it we have this little print statement so we
00:36:46.080 | can see that we found something and then we just set our existing summary i should actually
00:36:52.400 | move this to the first instance here okay so that existing summary will be set to the first message
00:37:04.560 | okay and this would be a system message rather than a string
00:37:09.600 | cool so we have that then we want to add any new messages to our history okay so we're extending the history
00:37:20.240 | there and then we're saying okay if the length of our history exceeds the k value that we set
00:37:26.720 | we're going to say okay we found that many messages we're going to be dropping the oldest
00:37:30.800 | two messages i will say here one problem with this
00:37:37.360 | is that we're not going to be saving that many tokens if we're summarizing every two messages so
00:37:44.480 | what i would probably do in an actual production setting is say let's go up
00:37:52.560 | to 20 messages and once we hit 20 messages let's take the previous 10 we're going to summarize them
00:38:00.160 | and put them into our summary alongside any previous summary that already existed but you
00:38:05.600 | know this is also fine as well okay so we say we found those messages we're going to drop the oldest
00:38:14.880 | two messages okay so we pull the oldest messages out
00:38:24.640 | we want to keep the latest and drop the oldest so we pull out the oldest messages and keep
00:38:31.440 | only the most recent messages okay then i'm saying okay if we don't have any old messages to
00:38:39.520 | summarize we don't do anything we just return okay so this indicates that this has not been triggered
00:38:46.560 | we would hit this but in the case this has been triggered and we do have old messages we're going
00:38:54.560 | to come to here okay so this is where we can see our system message prompt template saying given the
00:39:02.640 | existing conversation summary and the new messages generate a new summary of the conversation ensuring
00:39:07.920 | to maintain as much relevant information as possible so if we want to be more conservative with tokens we
00:39:13.200 | could modify this prompt here to say keep the summary to within the length of a single paragraph for
00:39:20.560 | example and then we have our human message prompt template which can say okay here's the existing
00:39:25.840 | conversation summary and here are our new messages now new messages here is actually the old messages
00:39:31.280 | but the way that we're framing it to the llm here is that we want to summarize the whole
00:39:36.880 | conversation right it doesn't need to have the most recent messages that we're storing within
00:39:41.680 | our buffer it doesn't need to know about those that's irrelevant to the summary so we just tell
00:39:46.640 | it that we have these new messages and as far as this llm is concerned this is the full set of
00:39:52.160 | interactions okay so then we would format those and invoke our llm and then we'll print out our new
00:39:59.840 | summary so we can see what's going on there and we would prepend that new summary to our conversation
00:40:07.520 | history okay and this will work we can just prepend it like this because we've already popped
00:40:17.200 | it up here if we have an existing summary we already popped that from the list so it's already
00:40:24.880 | been pulled out of that list so it's okay for us to just prepend it we don't need to do anything
00:40:31.440 | else because we've already dropped that initial system message if it existed okay and then we have
00:40:37.120 | the clear method as before so that's all of the logic for our conversational summary buffer memory
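pulling all of that together, a simplified sketch of the class walked through above (the prompt wording follows the video, including the concise-summary instruction added later in the chapter):

```python
# sketch of the summary buffer history: last k messages verbatim plus a summary
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate

class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory):
    def __init__(self, llm, k: int = 4):
        self.messages: list[BaseMessage] = []
        self.llm = llm  # llm used to write the rolling summary
        self.k = k      # max messages kept verbatim in the buffer

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # if the first stored message is a system message, it's our summary
        existing_summary = None
        if self.messages and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)

        self.messages.extend(messages)

        # nothing to summarize until the buffer overflows k messages
        if len(self.messages) <= self.k:
            if existing_summary:
                self.messages.insert(0, existing_summary)
            return

        # pull the oldest messages out, keep only the most recent k
        old_messages = self.messages[:-self.k]
        self.messages = self.messages[-self.k:]

        summary_prompt = ChatPromptTemplate.from_messages([
            ("system", "Given the existing conversation summary and the new "
                       "messages, generate a new summary of the conversation, "
                       "ensuring to maintain as much relevant information as "
                       "possible. However, keep the summary concise: the limit "
                       "is a single short paragraph."),
            ("human", "Existing conversation summary:\n{existing_summary}\n\n"
                      "New messages:\n{old_messages}"),
        ])
        new_summary = self.llm.invoke(summary_prompt.format_messages(
            existing_summary=existing_summary.content if existing_summary else "",
            old_messages=[m.content for m in old_messages],
        ))
        # prepend the fresh summary as a single system message
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        self.messages = []
```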
00:40:46.640 | okay so we can see what's going on here we define our get chat history
00:40:53.600 | function with the llm and k parameters there and then we'll also want to set the configurable fields
00:40:55.280 | again so that is just going to be session id llm and k
00:41:04.320 | okay so now we can invoke the k value to begin with is going to be four okay so we can see no
00:41:12.800 | old messages to update summary with that's good let's invoke this a few times and let's see what we get
00:41:19.280 | okay so no old messages to update summary with
00:41:26.560 | found six messages dropping the oldest two and then we have the new summary in the conversation james
00:41:31.760 | introduced himself and is interested in researching different types of conversational
00:41:35.760 | memory right so you can see there's quite a lot in here at the moment so we would definitely want to
00:41:40.880 | prompt the summary llm to keep that short otherwise we're just getting a ton of stuff
00:41:48.800 | right but we can see that that is you know it's working it's functional so let's go back
00:41:56.480 | and see if we can prompt it to be a little more concise so we come to here ensuring to maintain
00:42:01.680 | as much relevant information as possible however we need to keep our summary concise the limit
00:42:13.920 | is a single short paragraph okay something like this let's try it and see what we get
00:42:26.400 | okay so message one again nothing to update see this so new summary you can see it's a bit shorter it
00:42:32.720 | doesn't have all those bullet points
00:42:34.480 | okay so that seems better let's see so you can see the first summary is a bit shorter
00:42:45.120 | but then as soon as we get to the second and third summaries the second summary is actually
00:42:50.720 | slightly longer than the third one okay so we're going to be losing a bit of
00:42:56.000 | information in this case more than we were before but we're saving a ton of tokens so that's of course
00:43:02.480 | a good thing and of course we could keep going and adding many interactions here and we should see that
00:43:08.080 | this conversation summary should maintain that sort of length of around one short paragraph
00:43:16.080 | so that is it for this chapter on conversation memory we've seen a few different memory types we've
00:43:24.080 | implemented their old deprecated versions so we can see what they were like and then we've
00:43:29.520 | re-implemented them for the latest versions of langchain and to be honest using logic where we are
00:43:36.640 | getting much more into the weeds in some ways that complicates things that is true but in
00:43:44.080 | other ways it gives us a ton of control so we can modify those memory types as we did with that final
00:43:50.080 | summary buffer memory type we can modify those to our liking which is incredibly useful when you're
00:43:58.080 | actually building applications for the real world so that is it for this chapter
00:44:03.840 | we'll move on to the next one we'll see you next time