Chatbot Memory for ChatGPT, Davinci + other LLMs - LangChain #4
Chapters
0:00 Conversational memory for chatbots
0:28 Why we need conversational memory for chatbots
1:45 Implementation of conversational memory
4:05 LangChain's Conversation Chain
12:00 Conversation Summary Memory in LangChain
19:06 Conversation Buffer Window Memory in LangChain
21:35 Conversation Summary Buffer Memory in LangChain
24:33 Other LangChain Memory Types
25:25 Final thoughts on conversational memory
Conversational memory is how a chatbot responds in context: each response 00:00:22.560 |
depends almost entirely on the previous parts 00:00:26.600 |
of the conversation, the previous interactions. 00:00:32.880 |
That is not the default behavior of a large language model or other type of language model. 00:00:40.000 |
Conversational memory is the type of memory that we would see in chatbots; without it, 00:00:50.920 |
we would not be able to have a coherent conversation. 00:01:01.280 |
By default, these models are stateless. That means that every incoming query to the model 00:01:07.760 |
is processed independently. It doesn't remember previous queries or anything like that. 00:01:16.500 |
So how can we make a stateless large language model 00:01:23.960 |
remember past interactions? Well, that is what we're going to describe in this video. 00:01:29.380 |
We're going to be using the LangChain library, 00:01:37.760 |
and we'll introduce some of the essential types of conversational memory 00:01:40.880 |
that you need to know in order to build chatbots. 00:01:47.000 |
Throughout the video, we will work through this notebook here. 00:01:49.800 |
If you'd like to follow along and run this code as well, 00:01:52.680 |
there will be a link at the top of the video right now. 00:02:09.640 |
Now, a quick note: the vast majority of this notebook is by Francisco. 00:02:15.120 |
That is the same Francisco that we saw in the previous video 00:02:18.120 |
and he'll be joining us for future videos as well. 00:02:20.840 |
So let's go ahead and import everything in here. 00:02:27.080 |
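The import cell looks something like this (a minimal sketch of the classic early-2023 LangChain API used in this series; exact module paths may differ in newer LangChain versions):

```python
# LLM wrapper and the conversation chain
from langchain import OpenAI
from langchain.chains import ConversationChain

# the memory types we'll be comparing in this video
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
)
```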
So we have this ConversationBufferMemory along with the other memory types. And because 00:02:38.620 |
we're going to be using OpenAI's large language models, 00:02:41.840 |
we want to go ahead and actually set the OpenAI API key. 00:02:46.840 |
To get that, you'll need to go over to platform.openai.com. 00:02:52.660 |
You log into your account in the top right corner. 00:02:56.340 |
Once you have logged in, you can click on View API keys, 00:03:06.380 |
create a new secret key, and paste it into the little prompt that we get here. 00:03:24.460 |
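That prompt is typically set up with something like this (a sketch; the exact cell in the notebook may differ):

```python
import os
from getpass import getpass

# paste your key from platform.openai.com when prompted;
# LangChain's OpenAI wrapper reads it from this environment variable
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
```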
Now, you may have seen that there is a new ChatGPT model that is available to us, 00:03:31.920 |
so whilst we're running through this first example, 00:03:34.360 |
I will also show you the examples that we get from that model. 00:03:44.020 |
We're going to be sticking with the text-davinci-003 model here, though 00:03:48.980 |
we can use any large language model with this, 00:03:54.740 |
including one like ChatGPT that has been trained specifically for dialogue. 00:03:59.580 |
So later on, we're going to be using this function here, 00:04:01.980 |
count_tokens; don't worry about it right now, we'll come back to it. 00:04:05.260 |
The first thing we want to have a look at is this, LangChain's ConversationChain, 00:04:09.740 |
because everything that we are about to talk about builds on top of it. 00:04:19.300 |
We initialize it with a large language model, which is going to be the text-davinci-003 model. 00:04:26.100 |
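That initialization looks roughly like this (variable names are illustrative):

```python
# the LLM that will power the conversation
llm = OpenAI(
    temperature=0,
    model_name="text-davinci-003",
)

# a bare ConversationChain with its default memory
conversation = ConversationChain(llm=llm)
```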
Now, the core part of this conversation chain is its prompt. 00:04:33.740 |
It says the following is a friendly conversation between a human and an AI, the AI is talkative and provides lots of specific details 00:04:39.580 |
from its context, and if the AI does not know the answer to the question, 00:04:44.180 |
it truthfully says it does not know. So this is a prompt that essentially primes the model to behave like a conversational AI. 00:04:57.420 |
And it does this first by having this current conversation section; 00:05:02.420 |
this is the history that we're going to pass in. 00:05:04.580 |
So, you know, I said before, these models are stateless. 00:05:09.100 |
The only reason this works is because we're actually packing all of the past interactions into the prompt itself. 00:05:17.620 |
So in that single prompt, we will have our current input, 00:05:23.140 |
and all of our past interactions with the chatbot. 00:05:37.180 |
And then at the end, the prompt tells the model, okay, now it's your turn to respond, with this AI part. 00:05:41.420 |
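You can inspect that template yourself with `print(conversation.prompt.template)`, which gives something like:

```
The following is a friendly conversation between a human and an AI. The AI is
talkative and provides lots of specific details from its context. If the AI
does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:
```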
And beyond that, there's not really anything special about 00:05:50.520 |
the actual code for the conversation chain here, right? 00:05:55.900 |
The code that calls these is exactly the same 00:06:00.340 |
as the code that calls the standard large language model chain. 00:06:02.900 |
All right, there's not actually anything else changing. 00:06:07.380 |
Okay, so let's move on to the different memory types. 00:06:17.380 |
Now, these different types of memory in LangChain 00:06:20.700 |
are essentially just going to change the history part of that prompt. 00:06:35.100 |
Each one builds the past interactions in its own way and just places the result into that history parameter. 00:06:43.100 |
And there are pros and cons to each one of these methods. 00:06:46.220 |
So the simplest of those is the conversation buffer memory. 00:06:50.760 |
Now, the conversation buffer memory is very simple. 00:06:53.860 |
It basically takes all of your past interactions 00:06:58.660 |
and it just passes them into that history parameter in their raw form. 00:07:10.220 |
Now, to initialize that, we just do this, super simple. 00:07:16.540 |
This time we're specifying the memory parameter, 00:07:19.100 |
and we're using the ConversationBufferMemory. 00:07:21.500 |
We run that, and then we can just pass in some input. 00:07:28.380 |
Good morning, it's a beautiful day today, isn't it? 00:07:34.380 |
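In code, that looks something like this (a sketch reusing the `llm` defined above; the variable name `conversation_buf` is illustrative):

```python
# conversation chain whose history is the raw transcript of the chat
conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),
)

# pass in the first input
conversation_buf("Good morning, it's a beautiful day today, isn't it?")
```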
Now, one thing we also want to do is actually count the number of tokens that we're using, 00:07:52.860 |
so we have this get_openai_callback from LangChain here. 00:08:00.300 |
With that, we are actually going to get the total number of tokens 00:08:02.940 |
that we just used in our most recent request to OpenAI. 00:08:14.660 |
To use it, all we're going to do is pass our conversation chain and our query, as in the snippet below. 00:08:35.180 |
Then we just continue the conversation, exploring the different possibilities, so on and so on, right? 00:08:38.160 |
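The helper from earlier looks something like this (a sketch of the pattern the video describes; the exact query string is taken from the conversation shown on screen):

```python
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    # run the chain while the callback tallies token usage for the request
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f"Spent a total of {cb.total_tokens} tokens")
    return result

count_tokens(
    conversation_buf,
    "My interest here is to explore the potential of integrating "
    "Large Language Models with external knowledge",
)
```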
And because we're saving the raw interactions, the token count grows with every query, 00:08:51.140 |
and we just see this keep increasing and increasing. 00:08:55.300 |
But each one of these queries that we're making 00:08:59.220 |
is considering all the previous interactions, 00:09:26.560 |
so it has the full context. At the end, we ask the model what our aim is, which was to explore the potential of integrating large language models 00:09:35.340 |
with external knowledge. And it says your aim is to explore the potential 00:09:54.060 |
of integrating large language models with external knowledge, so it remembers. And all that is happening here is we have the conversation chain that we initialized, 00:10:03.420 |
and the buffer memory controls what it is that we're feeding into that history parameter. 00:10:13.220 |
We're not summarizing, we're not modifying it in any way. 00:10:16.980 |
It's just the conversation that we had from above. 00:10:20.540 |
Okay, so that is the first one of our memory types. 00:10:23.500 |
I think the pros and cons to this are relatively clear, 00:10:29.120 |
So the pros are that we're storing everything, all of the past interactions, right? 00:10:36.260 |
We're not modifying or shortening them in any way. 00:10:45.740 |
And to add to that, just storing previous interactions is as simple as it gets. 00:10:50.980 |
It's intuitive, it's not complicated in any way. 00:10:59.680 |
And naturally, if we're storing all these tokens, 00:11:02.680 |
it means that the response times of the model get slower 00:11:08.280 |
and the queries that we're sending get bigger and more expensive. 00:11:16.880 |
And beyond that, it eventually limits us completely: text-davinci-003 can only handle around 4,000 tokens in a single prompt, 00:11:38.060 |
but a conversation might go on for longer than this. 00:11:41.320 |
So as soon as we hit that limit with this type of memory, 00:11:46.320 |
we're actually just going to hit an error and that's it. 00:12:00.940 |
First, we have the conversation summary memory. 00:12:04.700 |
This allows us to avoid excessive token usage. 00:12:12.020 |
Rather than storing everything, we just summarize it. 00:12:14.940 |
So in between us getting our previous interactions 00:12:19.500 |
and passing them into the history parameter of our prompt, 00:12:32.060 |
we summarize them. So again we initialize a conversation chain, and this time our memory is a conversation summary memory. 00:12:45.080 |
That class, just as a quick reminder, is coming from the import cell at the top, 00:12:50.980 |
where we have all of our memory types imported. 00:12:54.360 |
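Initializing it looks something like this (a sketch; note the memory itself needs an LLM, because it uses one to write the summaries):

```python
conversation_sum = ConversationChain(
    llm=llm,
    # the summary memory uses an LLM of its own to compress the history
    memory=ConversationSummaryMemory(llm=llm),
)
```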
Okay, and let's have a look at what the prompt is for that summarization step, 00:13:11.300 |
because it is separate from the prompt that goes to the chatbot or the conversational AI component, 00:13:21.900 |
which is just this conversation chain prompt that we saw before. 00:13:42.340 |
The summarization prompt contains an example where the human asks what the AI thinks of artificial intelligence, and the AI thinks artificial intelligence is a force for good. 00:13:45.320 |
And then it's showing new lines of conversation, where the human asks why. 00:13:50.780 |
The AI answers: because AI will help humans reach their full potential. 00:13:58.220 |
The new summary then says the AI thinks artificial intelligence is a force for good 00:14:03.100 |
because it will help humans reach their full potential. 00:14:08.420 |
So each new summary is an extension of the previous summary, with a little bit more information added. 00:14:25.740 |
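You can view that template with `print(conversation_sum.memory.prompt.template)`; it reads roughly as follows:

```
Progressively summarize the lines of conversation provided, adding onto the
previous summary returning a new summary.

EXAMPLE
Current summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks
artificial intelligence is a force for good.

New lines of conversation:
Human: Why do you think artificial intelligence is a force for good?
AI: Because artificial intelligence will help humans reach their full potential.

New summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks
artificial intelligence is a force for good because it will help humans reach
their full potential.
END OF EXAMPLE

Current summary:
{summary}

New lines of conversation:
{new_lines}

New summary:
```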
Now let's see how we would actually run through all of this. 00:14:28.820 |
So we have our conversational summary memory here. 00:14:33.680 |
We're going to go through the same conversation again. 00:14:39.740 |
And we'll just run through these very quickly. 00:14:46.500 |
And at the end, is the model able to answer that final question about our aim again? 00:15:05.460 |
Now remember, the whole point of this was 'cause we want to reduce the number of tokens, 00:15:09.740 |
yet we're spending a total of almost 750 tokens here. 00:15:20.140 |
Compare that to the earlier result for the one where we're just saving everything, 00:15:27.820 |
which actually used less than half the number of tokens. 00:15:43.020 |
So why? Well, we can look at the conversation_sum.memory.buffer attribute to see what is being stored. 00:15:47.660 |
Okay, I'm not gonna read it all, but you can go through. 00:16:07.420 |
Then we're using the tiktoken tokenizer here, which is essentially OpenAI's tokenizer, 00:16:10.780 |
and we're using it for text-davinci-003, 00:16:13.260 |
which is the current large language model that we're using. 00:16:15.860 |
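Counting the tokens held in each memory looks something like this (a sketch; the notebook's exact tokenizer setup may differ):

```python
import tiktoken

# tokenizer matched to the model we're querying
tokenizer = tiktoken.encoding_for_model("text-davinci-003")

# compare raw history vs. summarized history
print(f"Buffer memory: {len(tokenizer.encode(conversation_buf.memory.buffer))} tokens")
print(f"Summary memory: {len(tokenizer.encode(conversation_sum.memory.buffer))} tokens")
```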
Now, if we look specifically at the memory buffer 00:16:19.420 |
and look at the number of tokens that we have in there, the summary is reasonably compact. 00:16:42.580 |
The reason the total usage is still high is because first we're doing that summarization, which is an extra call to the model, 00:16:46.820 |
and you can also see that we have two prompts. 00:16:49.820 |
This prompt here is already quite a lot of text. 00:16:57.260 |
So I understand that the summary itself is shorter, 00:17:01.260 |
but the actual total here is still longer. 00:17:13.580 |
That changes over time, though, which we can see in this chart, which I have calculated using a longer conversation; 00:17:18.460 |
there's a link you can see at the top of the video right now. 00:17:30.300 |
So this first line, that is our conversation buffer memory, okay? 00:17:41.980 |
Its token count keeps growing until we actually hit the token limit of the model. 00:17:50.220 |
The summary memory starts off higher, but then it kind of, it doesn't grow quite as quickly. 00:17:53.300 |
It grows quite quickly towards the end there, but stays lower overall. 00:18:01.100 |
So for short conversations, it's actually better to use the direct buffer memory, 00:18:05.980 |
but for longer conversations the summary memory, you can see here, it works better. 00:18:09.700 |
It reduces the number of tokens that you're using overall. 00:18:12.340 |
So naturally, which one of those you would use depends on your use case. 00:18:16.940 |
So I think we can kind of summarize the pros and cons here. 00:18:20.740 |
The main pro for summary memory is it shortens the number of tokens used in long conversations. 00:18:29.220 |
And it's all a relatively straightforward implementation. 00:18:48.260 |
The cons are that the memorization of those previous interactions relies entirely on the summaries, so information can get lost, 00:18:57.460 |
particularly if you're not using a particularly advanced summarization model, 00:19:02.340 |
and the summarization step itself costs extra tokens. So both those things we also just need to consider. 00:19:06.700 |
So moving on to conversational buffer window memory, 00:19:09.500 |
this one acts in a very similar way to a buffer memory, 00:19:11.900 |
but there is now a window on the number of interactions that we keep. 00:19:20.300 |
Here we set k equal to one, so we're gonna save the one most recent interaction only: 00:19:27.380 |
one human interaction and one AI interaction, as in the snippet below. 00:19:39.060 |
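Which looks like this (a sketch, continuing from the setup above):

```python
conversation_bufw = ConversationChain(
    llm=llm,
    # k=1 keeps only the single most recent human/AI exchange
    memory=ConversationBufferWindowMemory(k=1),
)
```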
You know, it's pretty straightforward, right? 00:19:43.140 |
We can see we go through the same conversation again, 00:19:45.500 |
and we get to the end and we say, what is my aim again? 00:19:48.620 |
All right, and it says your aim is to use data sources, which is not actually what we said, 00:19:59.980 |
because the model only sees the previous interactions, well, the one last interaction. 00:20:10.180 |
To confirm that, we go to the memory attribute and call load_memory_variables, 00:20:15.980 |
passing in an empty dictionary, 'cause we don't actually wanna pass anything in. 00:20:21.180 |
Then we can actually see the history that we have there. 00:20:27.860 |
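That inspection looks something like this (a sketch):

```python
# pass an empty dict because this memory type needs no inputs here
bufw_history = conversation_bufw.memory.load_memory_variables({})["history"]
print(bufw_history)  # only the single most recent exchange survives
```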
And the previous interaction was actually this, right? 00:20:34.220 |
Now, obviously, naturally with this approach, 00:20:37.620 |
you're probably going to be using a much larger number for k than just one. 00:20:42.580 |
And you can actually see the effect of that here. 00:20:45.020 |
So adding to the previous two visuals that we saw, 00:20:48.140 |
we now have conversational buffer window memory 00:20:50.500 |
for that longer conversation with K equals 12. 00:20:56.740 |
And you can see, okay, the token count for these, 00:20:59.500 |
it's kind of on par with the conversational buffer memory 00:21:03.260 |
up until you get to the number of interactions that exceeds the k equals 12 window, 00:21:06.700 |
and then it kind of flattens out a little bit. 00:21:20.380 |
The pro here is that we're saving the raw input of the most recent interactions, 00:21:27.020 |
and the con is that we are not remembering distant interactions. 00:21:30.180 |
So if we do want to remember distant interactions 00:21:39.220 |
while still keeping recent ones in raw form, there is one more option, and that is the conversation summary buffer memory. 00:21:43.820 |
So let's import that quickly from LangChain. 00:21:58.820 |
Now to initialize this, we will use this code here. 00:22:11.900 |
We specify the memory parameter like we did before, this time with the conversation summary buffer memory. 00:22:15.340 |
And in there, we are passing a large language model. 00:22:21.940 |
The reason we need that is because, as you can see here, we are summarizing, okay? 00:22:26.100 |
And then we also pass in this max token limit. 00:22:28.580 |
Now, this is kind of equivalent to the k window you saw before, 00:22:37.340 |
where we were saving the most recent interactions in raw form; 00:22:51.700 |
here, we keep the most recent interactions raw up to that token limit and summarize everything older than that. 00:22:59.660 |
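The initialization looks something like this (a sketch; the 650 limit shown is the value used for the comparison chart, so treat it as an example):

```python
conversation_sum_bufw = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(
        llm=llm,              # needed to summarize the older turns
        max_token_limit=650,  # recent turns kept raw up to this many tokens
    ),
)
```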
And we would just use it in the exact same way as before. 00:23:02.740 |
Okay, so the main pros and cons with this one are 00:23:05.740 |
kind of like a mix of the previous two methods we looked at. 00:23:09.100 |
So we have a little bit of the buffer window in there and a little bit of the summarizer, 00:23:19.100 |
and the buffer window means that we are not misrepresenting the most recent interactions, 00:23:27.140 |
because we're storing them in their raw form. 00:23:28.860 |
So we're keeping as much information there as possible. 00:23:34.940 |
And then the cons are kind of similar to the other ones again. 00:23:52.340 |
The summaries don't really contain all of the information from the older interactions. 00:23:57.980 |
And we're storing the raw recent interactions as well as the summarized interactions from a while back, which still adds up in tokens. 00:24:04.580 |
But this method does allow us quite a lot of flexibility. 00:24:23.780 |
In the chart, we used a max token limit which is roughly equivalent to the k equals 12 00:24:29.820 |
buffer window. And we see that the tokens are not too excessive. 00:24:46.140 |
Now, there are other memory types as well, like the knowledge graph memory, which builds a graph of the entities and facts 00:24:50.020 |
that have been mentioned throughout the conversation. For example, we say, "My name is human and I like mangoes," and it stores those as facts about us. 00:25:16.660 |
So we have that knowledge graph memory in there, 00:25:20.100 |
but we're not going to dive into that any further for now. 00:25:23.140 |
We're going to cover that more in a future video. 00:25:28.980 |
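Just as a quick glimpse, it follows the same pattern as the others (a sketch, assuming the classic import path):

```python
from langchain.chains.conversation.memory import ConversationKGMemory

conversation_kg = ConversationChain(
    llm=llm,
    # builds a small knowledge graph of entities mentioned in the chat
    memory=ConversationKGMemory(llm=llm),
)
```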
So that is our introduction to conversational memory types with LangChain. 00:25:38.220 |
I think these memory types are more than enough to actually build chatbot applications, 00:25:58.100 |
maybe even something like ChatGPT, although we really have no idea how that works.