Conversational Memory in LangChain for 2025

Chapters
0:00 Conversational Memory in LangChain
1:12 LangChain Chat Memory Types
4:26 LangChain ConversationBufferMemory
8:23 Buffer Memory with LCEL
13:14 LangChain ConversationBufferWindowMemory
16:01 Buffer Window Memory with LCEL
22:32 LangChain ConversationSummaryMemory
25:17 Summary Memory with LCEL
30:12 Token Usage in LangSmith
32:08 Conversation Summary Buffer Memory
34:36 Summary Buffer with LCEL
In this chapter we're going to be taking a look at conversational memory in LangChain. We'll look at the core chat memory components that have been in LangChain since the start but are essentially no longer in the library, and we'll see how we actually implement those historic conversational memory utilities in the new versions of LangChain, i.e. 0.3.

Now, as a pre-warning, this chapter is fairly long, but that is because conversational memory is such a critical part of chatbots and agents. Conversational memory is what allows them to remember previous interactions; without it, our chatbots and agents would just be responding to the most recent message without any understanding of previous interactions within a conversation, so they would simply not be conversational. And depending on the type of conversation, we might want to go with various approaches to how we remember those interactions.
Now, throughout this chapter we're going to be focusing on four memory types. We'll be referring to these, and I'll show you how each one works, but what we're really focusing on is rewriting them for the latest version of LangChain using what's called the RunnableWithMessageHistory. So we're essentially going to look at the original implementations of each of these four memory types and then rewrite them with the RunnableWithMessageHistory class.

Taking a look at each of the four very quickly: ConversationBufferMemory is, I think, the simplest and most intuitive of these memory types. Your messages come into this object, they are stored in it as essentially a list, and when you need them again it returns them to you. There's nothing else to it; it's super simple.

ConversationBufferWindowMemory (note the new word in the middle, "window") works in pretty much the same way, but it's not going to return all of the stored messages; instead it returns only the most recent ones, let's say the most recent three, for example. That is defined by a parameter k.

ConversationSummaryMemory, rather than keeping track of the entire interaction history directly, takes the interactions as they come in and compresses them into a small summary of what has happened in the conversation. As each new interaction comes in, it keeps iterating on that summary, and that summary is what it returns to us when we need it.

Finally, we have ConversationSummaryBufferMemory. The "buffer" part refers to something very similar to the buffer window memory, but rather than keeping the most recent k messages, it looks at the number of tokens within your memory and returns the most recent k tokens; that's what the buffer part is. It then merges that with the summary memory. So essentially what you're getting is a list of the most recent messages, based on token length rather than the number of interactions, plus a summary that comes at the top. You get both. The idea is that the summary maintains all of your interactions in a very compressed form, so you lose less information: maybe in the very first interaction the user introduced themselves and gave you their name, and hopefully that would be maintained within the summary and not lost. And then you have almost a higher resolution view of the most recent k tokens from your memory.
OK, so let's jump over to the code. We're going into the 04 chat memory notebook; open that in Colab. Now, here we are. Let's go ahead and install the prerequisites with Run All. Again, we can use LangSmith or not; it's up to you. Enter that, and let's come down and start. First, we just initialize our LLM, using GPT-4o mini in this example, again with a low temperature.
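As a rough sketch of that setup, assuming the langchain-openai integration (the exact model string and temperature are whatever the notebook uses):

```python
from langchain_openai import ChatOpenAI

# GPT-4o mini with a low temperature for more deterministic responses.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
```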
We're going to start with ConversationBufferMemory. This is the original version of this memory type: memory is a ConversationBufferMemory, and return_messages needs to be set to True. The reason we set return_messages=True, as it mentions up here, is that if you don't, it's going to return your chat history to the LLM as a single string, whereas chat LLMs nowadays expect message objects. So you want to be returning these as messages rather than as strings; otherwise you're going to get some strange behavior out of your LLM. Make sure it's True; I think by default it might not be. Now, this is deprecated. It does tell you here with a deprecation warning; this is coming from older LangChain. But it's a good place to start just to understand things, and then we're going to rewrite it with the runnables, which is the recommended way of doing it nowadays.
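A minimal sketch of that deprecated original:

```python
from langchain.memory import ConversationBufferMemory

# Deprecated in recent LangChain, shown for reference.
# return_messages=True returns the history as message objects
# rather than one long string, which is what chat LLMs expect.
memory = ConversationBufferMemory(return_messages=True)
```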
OK, so now we add messages to our memory. It's just a conversation: user, AI, user, AI, and so on, a random chat. The main thing to note here is that I do provide my name, and we have the model say it right towards the start of those interactions. So I'm just going to add all of those, and then we can load our history and see what we have: a human message, an AI message, a human message, and so on. It's exactly what I showed you just here, just in that message format from LangChain. Alternatively, we can initialize the ConversationBufferMemory as we did before and add each message directly into our memory, using add_user_message, add_ai_message, and so on. Loading again gives us the exact same thing; there are multiple ways to do the same thing.
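Something like the following; the actual messages are just illustrative stand-ins for the notebook's conversation:

```python
# One way: save input/output pairs to the memory.
memory.save_context(
    {"input": "Hi, my name is James"},
    {"output": "Hey James, how can I help you today?"},
)

# Alternatively (on a fresh memory object), add each message directly
# to the underlying chat history.
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, how can I help you today?")

# Load the stored history back as a list of message objects.
memory.load_memory_variables({})
```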
Cool. So, to pass all of this into our LLM (again, this is all deprecated stuff, and we'll learn how to do it properly in a moment, but this is how LangChain did it in the past) we'd use this ConversationChain. Again, this is deprecated; nowadays we would use LCEL for this, but I just want to show you how it all goes together. Then we invoke with "what is my name again", run that, and see what we get. Remember, this ConversationBufferMemory doesn't drop messages; it just remembers everything. And honestly, with the high context windows of many LLMs, that might be exactly what you do. It depends on how long you expect a conversation to go on for, but in most cases you could probably get away with this. So I ask "what is my name again", and it says "your name is James". Great, that works.
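Wired together, the deprecated approach looks roughly like this, continuing from the snippets above:

```python
from langchain.chains import ConversationChain

# Deprecated chain that feeds the stored memory into the LLM on each call.
chain = ConversationChain(llm=llm, memory=memory)

chain.invoke({"input": "What is my name again?"})
```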
Now, as I mentioned, all of what I just showed you is deprecated; that's the old way of doing things. Let's see how we actually do this in modern, up-to-date LangChain. We're going to use RunnableWithMessageHistory to implement it. For that we'll need LCEL, and so we need to define our prompt templates and our LLM as we usually would. We set up our system prompt, which is just "you are a helpful assistant called Zeta", and we put in this MessagesPlaceholder. That's important: it is where our messages, the contents of our conversation buffer memory, are going to be inserted. The chat history goes in after our system prompt but before our most recent query, which is inserted last. So remember that MessagesPlaceholder item; we use it throughout the course, both for chat history and, as we'll see later on, for the intermediate thoughts that an agent works through. Then we link our prompt template to our LLM. If we wanted, we could also add in... I think we only have the query here; we'd probably also want our history, but I'm not going to do that right now. So we have our pipeline.
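A sketch of that prompt and pipeline; the exact system prompt wording is an approximation:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# The MessagesPlaceholder is where the chat history gets injected:
# after the system prompt, before the most recent user query.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant called Zeta."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{query}"),
])

# LCEL: pipe the prompt template into the LLM.
pipeline = prompt_template | llm
```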
Now we can go ahead and actually define our RunnableWithMessageHistory. This class requires a few items when we initialize it. We have our pipeline with history, and you can see this history_messages_key: it has to align with what we provided as the MessagesPlaceholder in our pipeline. That's where it comes from; the MessagesPlaceholder variable name is "history", and that's what links the two. Then for the input_messages_key we have "query", which likewise links to the query variable in our prompt. Both of those are important. The other important thing, besides passing in the pipeline from before, is this get_session_history function. What it's doing is saying: I need to get the list of messages that make up my chat history, the ones that will be inserted into that history variable. It's a function that we define, and within it what we're trying to do is replicate what we had with the previous ConversationBufferMemory.

It's very simple. We have this InMemoryChatMessageHistory, which is just the object we're going to return. We also set up a session ID, essentially a unique identifier, so that each interaction within a single conversation is mapped to that specific conversation. That way you don't get overlap when, say, multiple users are using the same system; you want a unique session ID for each one. The function says: if the session ID is not in the chat_map, the empty dictionary we defined here, we initialize that session with an InMemoryChatMessageHistory, and then we return it. All that's going to do is append our messages within chat_map[session_id] and return them. There's really nothing else to it, to be honest.
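A sketch of that function and the runnable, using the chat_map dictionary name from the walkthrough:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Map each session ID to its own message history. This replicates
# ConversationBufferMemory: store everything, return everything.
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        # First time we see this session: create a fresh history for it.
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",      # must match the "{query}" variable
    history_messages_key="history",  # must match the MessagesPlaceholder name
)
```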
So we invoke our runnable and see what we get; oh, I need to run this first. Note that we do have this config with a session ID, which is, as I mentioned, to keep different conversations separate. So we've run that; now let's run a few more. "What is my name again?" Let's see if it remembers: "your name is James, how can I help you today, James?" What we've just done there is literally ConversationBufferMemory, but for up-to-date LangChain, with LCEL and runnables: the recommended way of doing it nowadays.
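Invocation then looks something like this; the session ID string is arbitrary:

```python
# The session_id in the config keeps separate conversations separate.
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)

pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123"}},
)
```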
So that's a very simple example; there's really not that much to it. It gets a little more complicated as we start thinking about the different types of memory, although, with that being said, it's not massively complicated; we're really only changing the way we get our interactions. So let's dive in and see how we'd do something similar with the ConversationBufferWindowMemory.

First, though, let's understand what the ConversationBufferWindowMemory actually is. As I mentioned near the start, it keeps track of the last k messages. There are a few things to keep in mind here. More messages means more tokens sent with each request, and more tokens in each request means increased latency and increased cost. With the previous memory type we were sending everything, and because of that, costs and latency keep growing with every message, especially as a conversation gets longer and longer. We might not necessarily want that.
So with the ConversationBufferWindowMemory we just say: only return the most recent messages. Let's see how that works. Here we're going to return the most recent four messages, and again we make sure return_messages is set to True. Again, this is deprecated, just the old way of doing it, and in a moment we'll see the updated way.
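A sketch of that deprecated API:

```python
from langchain.memory import ConversationBufferWindowMemory

# Deprecated: keeps only the most recent k interactions. Note that the
# old implementation counts human/AI pairs, so k=4 keeps eight messages.
memory = ConversationBufferWindowMemory(k=4, return_messages=True)
```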
We'll add all of our messages; there are more than four pairs of them here. Looking at what gets returned, we have human message, AI, human, AI, human, AI, human, AI: four pairs of human/AI interactions, while the full conversation contains more than four pairs. Four pairs takes us back only as far as "I'm researching different types of conversational memory", and indeed the first message returned is exactly that. So it has cut off the first two messages, which will be a bit problematic when we ask what our name is.

Let's see. Using the ConversationChain object again (remember, it's deprecated) I ask "what is my name again?" It says: "I'm sorry, but I don't know your name or any personal information. If you like, you can tell me your name." So it doesn't actually remember; that's one of the downsides of the ConversationBufferWindowMemory. Of course, to fix that in this scenario we might just increase k, say to the previous eight interaction pairs, and then it actually remembers: "what's my name again" gives "your name is James". We've just modified how much it remembers. There are pros and cons to this, of course; it really depends on what you're trying to build. So let's take a look at how we'd implement this with RunnableWithMessageHistory.
OK, so it's getting a little more complicated here, although it really isn't complicated, as we'll see. We have a BufferWindowMessageHistory: we're creating a class that inherits from the BaseChatMessageHistory object from LangChain, and all of our other message history objects will do the same thing. Before, the InMemoryChatMessageHistory object basically replicated the buffer memory, so we didn't need to define anything of our own; in this case we do. We follow the same pattern that LangChain follows with BaseChatMessageHistory, and you can see a few of the methods that are important: add_messages and clear are the ones we'll focus on, and we also need the messages attribute on the object. We're only implementing the synchronous methods here; if we wanted to support async, we'd also have to add aadd_messages, aget_messages, and aclear.

So let's do that. We have messages, and we have k; again, we're remembering only the most recent k messages, so it's important that we have that variable. We add messages through this class (this is going to be used by LangChain within our runnable, so we need to make sure we have this method), and all we're doing is extending the self.messages list and then trimming it down so we don't keep anything beyond the most recent k messages. Then we also have the clear method, which just clears the history. So this isn't complicated; it just gives us a nice default, standard interface for message history, and we need to make sure we follow that pattern. I've also included a print statement so we can see what's happening.
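A sketch of that class, written as a plain-Python variant; the notebook's version may differ in details such as pydantic fields:

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage

class BufferWindowMessageHistory(BaseChatMessageHistory):
    """Keep only the most recent k messages, like the old buffer window
    memory but counting individual messages rather than human/AI pairs."""

    def __init__(self, k: int):
        self.messages: list[BaseMessage] = []
        self.k = k

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add new messages, then trim the history to the most recent k."""
        print(f">> Adding {len(messages)} messages, keeping last {self.k}")
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Wipe the history."""
        self.messages = []
```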
Now, for the get_chat_history function we defined earlier: rather than using the built-in object, we're going to use our own BufferWindowMessageHistory, which we defined just above. If the session ID is not in the chat_map, as before, we initialize our BufferWindowMessageHistory, setting k with a default value of 4, and then we just return it. That's it. So let's run this. We have our RunnableWithMessageHistory with all the same variables as before, but we also have this history_factory_config. This is where, if we have new variables that we've added to our message history, in this case k, we need to tell LangChain that this is a new configurable field. We've also added one for the session ID, so we're just being explicit and including everything there.
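Putting that together; the field names mirror the walkthrough, while the descriptions and defaults here are illustrative:

```python
from langchain_core.runnables import ConfigurableFieldSpec

chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    # Any extra parameters of get_chat_history must be declared here so
    # they can be set through the invoke config.
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for the conversation session",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep",
            default=4,
        ),
    ],
)

# Both fields can now be set per invoke:
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```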
So we have that, and we run it. Now let's go ahead and invoke and see what we get. Important here: the history_factory_config is essentially fed through into our invoke, so we can modify those variables from there. We have config, configurable, session_id (we'll put whatever we want in there), and then we also have the number k. Remember, previously k=4 meant the previous four interaction pairs; here we're doing something slightly different and remembering the four most recent interactions rather than the previous four interaction pairs. So, "my name is James"... I'm actually going to clear this and start again, using the exact same add_user_message and add_ai_message calls we used before, manually inserting everything into our history so we can inspect the result. And you can see that with k=4, unlike before where we were saving the most recent four interaction pairs, we're now saving the most recent four interactions: not pairs, just interactions. Honestly, I just think that's clearer. I think it's weird that k=4 would actually save the most recent eight messages; that seems odd, so I'm not replicating that weirdness. We could if we wanted to, I just don't like it, so I'm not doing it. Anyway, we can see from messages that we're returning just the four most recent messages; it should be these four.

Cool, so using the runnable we've replicated the old way of having a window memory. Now I ask "what is my name again?", and as before it doesn't remember: "I'm sorry, but I don't have access to personal information", and so on; "if you'd like, you can tell me your name".
It doesn't know. Now let's try a new one where we initialize a new session. We're going with the session ID id_k14, which creates a new conversation, and we're going to set k to 14. Then I manually insert the other messages as we did before, and we can see all of them; at the top we're still maintaining that "hi, my name is James" message. Now let's see if it remembers my name: "your name is James". There we go; that's working. We can also check whether "what is my name again" got added to our list of messages, and it did, along with the response "your name is James". So just by invoking this, because we're using the RunnableWithMessageHistory, it automatically adds all of that into our message history, which is nice.
Cool. All right, so that's the buffer window memory. Now we're going to look at something a little more complicated: the summaries. When you think about the summary approach, what are we doing? We're taking the messages, using an LLM call to summarize and compress them, and then storing the result within messages. So let's see how we'd actually do that. To start, let's see how it was done in old LangChain, with ConversationSummaryMemory, and run through that.
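A minimal sketch of the deprecated version:

```python
from langchain.memory import ConversationSummaryMemory

# Deprecated: compresses the conversation into a rolling summary,
# calling the LLM on every new interaction to update it.
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)
```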
Let's see what we get. Again, same interactions; I'm just invoking repeatedly. I'm not adding these directly to the messages, because each one actually needs to go through that summarization process, and if we have a look we can see it happening. "Current conversation: hello there, my name is James", and the AI is generating. Then: "Current conversation: the human introduces himself as James. The AI greets James warmly and expresses its readiness to chat and assist, inquiring about how his day is going." So it's summarizing the previous interactions, and after that summary we have the most recent human message, and then the AI generates its response. That keeps going, and you can see the final summary is a lot longer, and different from that first summary, of course: asking about his day, James says he's researching different types of conversational memory, the AI responds enthusiastically, explaining that conversational memory includes short-term memory, long-term memory, contextual memory, personalized memory, and then inquires whether James is focused on a specific type of memory.

So the summary just gets longer and longer as we go, but at some point the idea is that it stops growing, and it should end up shorter than if you were saving every single interaction, whilst maintaining as much of the information as possible. Of course, you're not going to maintain all of the information that you would with, for example, the buffer memory. With the summary you are going to lose information, but hopefully less than if you were just cutting interactions. You're trying to reduce your token count whilst maintaining as much information as possible.

Now let's ask "what is my name again?" It should be able to answer, because we can see in the summary that I introduced myself as James. The response: "your name is James. How is your research going?" Cool. Let's see how we'd implement that.
As before, we're going to go with a ConversationSummaryMessageHistory. We're importing SystemMessage; we'll be using that not for the LLM that we're chatting with, but for the LLM that will be generating our summary... actually, that's not quite correct: it's for the summary we create. Not that it matters, it's just the docstring. So we have our messages, and we also have the LLM, which is a different attribute to what we had before: when we initialize the ConversationSummaryMessageHistory, we need to pass in our LLM. We have the same methods as before, add_messages and clear, and what we're doing is: as messages come in, we extend our current messages, but then we modify them. We construct our instructions for making a summary. The system prompt is "given the existing conversation summary and the new messages, generate a new summary of the conversation, ensuring to maintain as much relevant information as possible". Then we have a human message through which we pass the existing summary, and then we pass in the new messages. We format those and invoke the LLM, and then in messages we replace the existing history with a new history, which is just a single system summary message.
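A sketch of that class as described. Note the messages=messages line near the end: it formats the full message objects into the prompt, which is exactly the quirk we run into, and fix, just below:

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

class ConversationSummaryMessageHistory(BaseChatMessageHistory):
    """Compress the whole conversation into a single rolling summary,
    stored as one SystemMessage, using the provided LLM."""

    def __init__(self, llm):
        self.messages: list[BaseMessage] = []
        self.llm = llm  # the LLM that generates the summaries

    def add_messages(self, messages: list[BaseMessage]) -> None:
        self.messages.extend(messages)
        # Instructions for the summary LLM.
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring to "
                "maintain as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{messages}"
            ),
        ])
        # Format the prompt and invoke the LLM to produce the new summary.
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=self.messages,  # holds the previous summary, if any
                messages=messages,  # full message objects: see the fix below
            )
        )
        # Replace the entire history with a single system summary message.
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        self.messages = []
```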
Let's see what we get. As before, we have that, and get_chat_history is exactly the same as before; the only real difference is that we're passing in the llm parameter. And since we're passing the llm parameter in here, it also means we have to include it in the ConfigurableFieldSpec and include it when we're invoking our pipeline. So we run that, passing in the LLM.

Now, one side effect of generating summaries of everything is that we're generating more; you are actually using quite a lot of tokens. Whether or not you save tokens overall depends on the length of the conversation. As a conversation gets longer, if you're storing everything, the token usage keeps increasing after a while. So if in your use case you expect shorter conversations, you'd save money and tokens by just using the standard buffer memory, whereas if you're expecting very long conversations, you'd save tokens and money by using the summary history.

OK, let's see what we got. We have a summary of the conversation: "James introduced himself by saying 'hi, my name is James', I responded warmly..." and then the interaction includes details about token usage. So we actually included everything here, which we probably should not have done.
Why did we do that? In here we're including all of the... oh, in here: we're including all of the content from the full message objects. So I think if we just do x.content for x in messages, that should resolve it. There we go; a quick fix. Before, we were passing in the entire message objects, which obviously include all of that metadata, whereas we just want to pass in the content. We've modified that, and now we're getting what we expect.
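To make the quirk concrete, a tiny self-contained illustration (the exact formatted output will vary by version):

```python
from langchain_core.messages import AIMessage

msg = AIMessage(content="Hello James!")

# Formatting the full message object into a prompt drags in metadata,
# something like: content='Hello James!' additional_kwargs={} ...
print(str(msg))

# The text content alone is what we want in the summary prompt:
print(msg.content)  # Hello James!

# So inside add_messages we format with contents only:
#   messages=[x.content for x in messages]
```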
Cool, and then we can keep going. As we do, the summary should get more abstract; as we just saw, at first it was almost literally giving us the messages back directly. So we're getting the summary there, and we can keep going, adding more messages: send one, get a response, send again, get a response, invoking all of that, which of course adds everything into our message history. We've run that; let's see what the latest summary is. And there it is: this summary is what we have instead of our chat history. Now, finally, let's ask "what's my name again?", just to double-check, since the summary has my name in there.
Cool: "your name is James". Pretty interesting. Now let's have a quick look over at LangSmith. The reason I want to do this is just to point out the different token usage we're getting with each of these approaches. We can see these RunnableWithMessageHistory runs (the naming could probably be improved there), and we can see how long each one took and how many tokens each used. Coming back here, we'll go through a few of them. This one, I think, is the first interaction where we're using the buffer memory, and we can see how many tokens were used: 112 tokens when asking "what is my name again". Then we modified this to include, I think, around 14 interactions or thereabouts, which obviously increases the number of tokens we're using. We can see all of that happening in LangSmith, which is quite nice, and compare how many tokens each one uses. That was the buffer window; if we come down and look at this one, which uses our summary, the "what is my name again" call actually used more tokens in this scenario, which is interesting, because we're trying to compress information. The reason it used more is that there haven't been that many interactions yet. As the conversation length increases, the total number of tokens with the summary should remain relatively small, especially if we prompt it correctly to keep the summary short, whereas with the buffer memory it will just keep increasing and increasing as the conversation gets longer. So LangSmith is a useful little way of figuring out what we're looking at, in terms of tokens and costs, for each of these memory types.
OK, so our final memory type acts as a mix of the summary memory and the buffer memory. What it does is keep the buffer up to n tokens, and once messages exceed that token limit for the buffer, they get added into our summary. So this memory has the benefit of remembering the most recent interactions in detail, whilst also not using ever more tokens as a conversation gets longer, or even potentially exceeding context windows if you try super hard. It's a very interesting approach.

As before, let's try the original way of implementing this first, and then we'll go ahead and implement it with our updated method. We come down to here, and from langchain.memory we import ConversationSummaryBufferMemory. A few things here: the LLM for the summary; the number of tokens we keep before messages get added to the summary; and then return_messages, of course. You can see again that this is deprecated. We use the ConversationChain, pass in our memory, and then we can chat. Super straightforward.
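A sketch of that deprecated setup, assuming the 300-token limit used here:

```python
from langchain.memory import ConversationSummaryBufferMemory

# Deprecated: keeps recent messages verbatim up to max_token_limit and
# folds anything beyond that into an LLM-generated summary.
memory = ConversationSummaryBufferMemory(
    llm=llm,              # the LLM used to write the summaries
    max_token_limit=300,  # token budget for the verbatim buffer
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
```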
First message, and we'll add a few more here. Again we have to invoke, because this memory type is using the LLM to create those summaries as it goes. Let's see what they look like. For the first message we have a human message and then an AI message. Then, a little lower down, same thing: a human message is the first thing in our history, but then there's a system message. This is the point where we've exceeded that 300-token limit and the memory type is generating summaries, so the summary comes in as a system message: "the human, named James, introduces himself and mentions he's researching different types of conversational memory", and so on. Coming down a little further, we can see the summary there. So that's the implementation for the old version of this memory; again, we can see it's deprecated.
So how do we implement this for more recent versions of LangChain, specifically 0.3? Again, we're using that RunnableWithMessageHistory. It looks a little more involved than what we had before, but it's nothing too complex: we're creating a summary as we did with the previous memory type, but the decision for adding to that summary is based, in this case, on the number of messages. I didn't go with the LangChain version where it's a number of tokens; I don't like that, I prefer to go with messages. So what I'm saying is: keep the last k messages, and once we exceed k messages, the messages beyond that get added to the summary.

Let's see. We first initialize our ConversationSummaryBufferMessageHistory class with llm and k: the LLM, of course, to create summaries, and k as the limit on the number of messages we keep before dropping them from our messages and adding them to the summary. We begin by asking: do we have an existing summary? The reason we initialize this to None is that we can't extract the existing summary unless it already exists, and the only way to check is to see whether we have any messages and, if so, whether the first of those is a system message. We're using the same structure as above, where the first system message is actually our summary. So we check for that; if we find it, there's a little print statement so we can see that we found something, and then we set our existing summary (I should actually move this up to the first instance here), so that existing summary is set to that first message. Note that it will be a SystemMessage rather than a string.
Cool, so we have that. Then we want to add any new messages to our history, so we extend the history with them. Then we say: if the length of our history exceeds the k value that we set, we print that we found that many messages and that we'll be dropping two of them. One thing I will say here, one problem with this, is that we're not going to be saving many tokens if we're summarizing every two messages. In an actual production setting, I'd probably say: let's go up to 20 messages, and once we hit 20, take the oldest 10, summarize them, and fold them into our summary alongside any previous summary that already existed. But this is fine for our purposes too. So we say we found those messages, and we drop the oldest two (I should say the oldest, not the latest; we want to keep the latest and drop the oldest), so we pull the oldest messages out and keep only the most recent ones. Then, if we don't have any old messages to summarize, we don't do anything; we just return. Hitting that branch indicates the drop logic hasn't been triggered.

In the case where it has been triggered and we do have old messages, we come down to here. We can see our SystemMessagePromptTemplate: "given the existing conversation summary and the new messages, generate a new summary of the conversation, ensuring to maintain as much relevant information as possible". If we wanted to be more conservative with tokens, we could modify this prompt to say, for example, keep the summary within the length of a single paragraph. Then we have our HumanMessagePromptTemplate, which provides the existing conversation summary and the new messages. "New messages" here are actually the old messages, but the way we frame it to the LLM is that we want it to summarize the whole conversation. It doesn't need to know about the most recent messages we're keeping in the buffer; those are irrelevant to the summary. So we just tell it these are new messages, and as far as this LLM is concerned, that's the full set of interactions.

Then we format those, invoke our LLM, print out the new summary so we can see what's going on, and prepend the new summary to our conversation history. And this works: we can just prepend it like this because, up where we checked for an existing summary, we already popped it from the list. It's already been pulled out, so we don't need to drop that initial system message again. And then we have the clear method as before. That's all of the logic for our conversational summary buffer memory.
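Putting all of that logic together, a sketch of the class; the prompt and print wording are approximate, and the prints are there just so we can watch what's happening:

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory):
    """Keep the most recent k messages verbatim; fold anything older into
    a single SystemMessage summary kept at the front of the history."""

    def __init__(self, llm, k: int):
        self.messages: list[BaseMessage] = []
        self.llm = llm  # the LLM used to generate the summaries
        self.k = k      # number of verbatim messages to keep

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # If the first stored message is a SystemMessage, it is our running
        # summary: pop it out of the list before doing anything else.
        existing_summary = None
        if self.messages and isinstance(self.messages[0], SystemMessage):
            print(">> Found existing summary")
            existing_summary = self.messages.pop(0)
        # Add the new messages to the history.
        self.messages.extend(messages)
        # If we now exceed k messages, pull out the oldest ones to summarize
        # and keep only the most recent k verbatim.
        if len(self.messages) > self.k:
            print(f">> Found {len(self.messages)} messages, dropping the oldest")
            old_messages = self.messages[:-self.k]
            self.messages = self.messages[-self.k:]
        else:
            old_messages = None
        if old_messages is None:
            print(">> No old messages to update summary with")
            # Put the unchanged summary back at the front, if there was one.
            if existing_summary is not None:
                self.messages = [existing_summary] + self.messages
            return
        # Merge the dropped messages into the summary.
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring to "
                "maintain as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            ),
        ])
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary.content if existing_summary else "",
                old_messages=[x.content for x in old_messages],
            )
        )
        print(f">> New summary: {new_summary.content}")
        # Prepend the new summary; the old one was already popped above,
        # so we are not stacking summaries.
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        """Wipe the history, summary included."""
        self.messages = []
```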
OK, so we can see what's going on here. We define our get_chat_history function with the llm and k parameters, and then we also set the configurable fields again: session_id, llm, and k.

Now we can invoke; the k value to begin with is going to be 4. We see "no old messages to update summary with", which is good. Let's invoke this a few times and see what we get: "no old messages to update summary with", then "found 6 messages, dropping the oldest 2", and then we have the new summary: "in the conversation, James introduced himself and says he is interested in researching different types of conversational memory", and so on. You can see there's quite a lot in here at the moment, so we'd definitely want to prompt the summary LLM to keep it short; otherwise we're just getting a ton of text. But we can see that it's working; it's functional. So let's go back and see if we can prompt it to be a little more concise. We come to here, after "ensuring to maintain as much relevant information as possible", and add: "however, we need to keep our summary concise; the limit is a single short paragraph". Something like that; let's try it and see what we get.
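The amended prompt might look roughly like this:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

summary_prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(
        "Given the existing conversation summary and the new messages, "
        "generate a new summary of the conversation, ensuring to maintain "
        "as much relevant information as possible. However, we need to "
        "keep our summary concise. The limit is a single short paragraph."
    ),
    HumanMessagePromptTemplate.from_template(
        "Existing conversation summary:\n{existing_summary}\n\n"
        "New messages:\n{old_messages}"
    ),
])
```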
OK, so message one: again, nothing to update. Now the new summary: you can see it's a bit shorter. That seems better. The first summary is a bit shorter, but then as we get to the second and third summaries, the second is actually slightly longer than the third. So we are going to be losing a bit of information in this case, more than we were before, but we're saving a ton of tokens, which is of course a good thing. And we could keep going, adding many more interactions, and we should see that the conversation summary maintains roughly that length of around one short paragraph.
So that is it for this chapter on conversational memory. We've seen a few different memory types; we've implemented their old, deprecated versions so we can see what they were like, and then we've re-implemented them for the latest versions of LangChain, using logic where we get much more into the weeds. In some ways that complicates things, that's true, but in other ways it gives us a ton of control: we can modify those memory types to our liking, as we did with that final summary buffer memory type, which is incredibly useful when you're actually building applications for the real world. That's it for this chapter; we'll move on to the next one. We'll see you next time.