
Chatbot Memory for Chat-GPT, Davinci + other LLMs - LangChain #4


Chapters

0:00 Conversational memory for chatbots
0:28 Why we need conversational memory for chatbots
1:45 Implementation of conversational memory
4:05 LangChain's Conversation Chain
12:00 Conversation Summary Memory in LangChain
19:06 Conversation Buffer Window Memory in LangChain
21:35 Conversation Summary Buffer Memory in LangChain
24:33 Other LangChain Memory Types
25:25 Final thoughts on conversational memory

Whisper Transcript

00:00:00.000 | conversational memory has become
00:00:02.300 | a very important topic recently.
00:00:04.840 | It essentially describes how a chatbot
00:00:08.760 | or some other type of AI agent
00:00:12.480 | can respond to queries
00:00:14.800 | in kind of like a conversational manner.
00:00:17.320 | So if you think about conversation,
00:00:19.500 | the current point of the conversation
00:00:22.560 | depends almost entirely on the previous parts
00:00:26.600 | of the conversation, the previous interactions.
00:00:29.300 | And conversational memory is how we allow
00:00:32.880 | a large language model or other type of language model
00:00:36.440 | to remember those previous interactions.
00:00:40.000 | It is the type of memory that we would see in chatbots
00:00:44.380 | like OpenAI's ChatGPT or Google's LaMDA.
00:00:49.100 | Without conversational memory,
00:00:50.920 | we would not be able to have a coherent conversation
00:00:54.400 | with any of these chatbots.
00:00:56.340 | And that is because by default,
00:00:58.360 | these large language models are stateless.
00:01:01.280 | That means that every incoming query to the model
00:01:04.980 | is treated independently of everything else.
00:01:07.760 | It doesn't remember previous queries or anything like that.
00:01:11.560 | It just looks at what you are giving it
00:01:14.000 | at this current moment in time.
00:01:16.500 | So how can we make a stateless large language model
00:01:21.100 | remember things that happened before?
00:01:23.960 | Well, that is what we're going to describe in this video.
00:01:27.240 | We're going to jump straight into it.
00:01:29.380 | We're going to be using the LangChain library
00:01:32.580 | to work through these different types
00:01:34.680 | of conversational memory,
00:01:35.800 | because there are a few different types
00:01:37.760 | and we'll introduce some of the essential types
00:01:40.880 | that you need to know in order to build chatbots
00:01:43.560 | or other conversational agents.
00:01:45.500 | So to get started,
00:01:47.000 | we will work through this notebook here.
00:01:49.800 | If you'd like to follow along and run this code as well,
00:01:52.680 | there will be a link at the top of the video right now
00:01:54.840 | that will take you to this notebook.
00:01:56.700 | So libraries that we're going to be using
00:01:58.280 | are LangChain, OpenAI, and tiktoken.
00:02:00.560 | All right, we will install those.
00:02:03.120 | And once you've installed those,
00:02:04.120 | we'll move on to our imports here.
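(For reference, the install cell is likely something along these lines; the exact version pins in the notebook may differ.)

```python
!pip install -qU langchain openai tiktoken
```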
00:02:06.960 | Now, before we do move on any further,
00:02:09.640 | the vast majority of this notebook is by Francisco.
00:02:12.840 | So thanks a lot for putting that together.
00:02:15.120 | That is the same Francisco that we saw in the previous video
00:02:18.120 | and he'll be joining us for future videos as well.
00:02:20.840 | So let's go ahead and import everything in here.
00:02:23.840 | You can see a few different memory types
00:02:26.240 | that we're going to be using.
00:02:27.080 | So we have this conversational buffer memory,
00:02:28.900 | summary memory, and so on.
00:02:31.100 | We'll be taking a look at all of these
00:02:32.500 | and actually maybe some other ones as well.
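As a sketch, the imports would look roughly like this in the LangChain version used at the time; the exact class list here is inferred from the memory types used later in the video:

```python
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
)
```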
00:02:35.680 | First thing we want to do is,
00:02:37.660 | well, throughout this notebook,
00:02:38.620 | we're going to be using OpenAI's large language models.
00:02:41.840 | So we want to go ahead and actually save the OpenAI API key.
00:02:46.840 | To get that, you'll need to go over to platform.openai.com.
00:02:52.660 | You log into your account in the top right corner.
00:02:56.340 | Once you have logged in, you can click on view API keys
00:02:59.900 | and then you'll just create a new secret key
00:03:02.620 | and just copy that and then paste it
00:03:06.380 | into the little prompt that we get here.
00:03:11.100 | Okay, once we've done that,
00:03:14.300 | we have our OpenAI API key in there
00:03:17.300 | and what we can do is run this first step.
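A minimal sketch of that step, assuming the notebook prompts for the key with getpass and stores it in the OPENAI_API_KEY environment variable:

```python
import os
from getpass import getpass

# paste the secret key into the prompt rather than hard-coding it
os.environ['OPENAI_API_KEY'] = getpass('OpenAI API key: ')
```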
00:03:21.020 | So obviously recently you may have heard
00:03:24.460 | that there is a new ChatGPT model that is available to us
00:03:28.220 | and we can actually just put that in here.
00:03:31.020 | And what I'll do is,
00:03:31.920 | whilst we're running through this first example,
00:03:34.360 | I will also show you the examples that we get
00:03:42.120 | when we run with the GPT-3.5 Turbo model.
00:03:42.120 | But for the majority of this notebook,
00:03:44.020 | we're going to be sticking with the text-davinci-003 model
00:03:47.460 | just to show that, you know,
00:03:48.980 | we can use any large language model with this.
00:03:52.020 | It doesn't have to be a large language model
00:03:54.740 | like ChatGPT that has been trained specifically
00:03:57.220 | as a chatbot.
00:03:58.060 | It can be any large language model.
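A sketch of the LLM initialization, assuming the OpenAI wrapper and parameter names from that LangChain version:

```python
from langchain import OpenAI

# text-davinci-003 is a plain completion model, not a chat-tuned one;
# swapping model_name for 'gpt-3.5-turbo' is also possible
llm = OpenAI(
    temperature=0,
    model_name='text-davinci-003',
)
```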
00:03:59.580 | So later on, we're going to be using this function here,
00:04:01.980 | count tokens, don't worry about it right now,
00:04:04.420 | we'll skip that.
00:04:05.260 | First thing we want to have a look at is this,
00:04:08.320 | the conversation chain.
00:04:09.740 | So everything that we are about to talk about
00:04:12.900 | is built on top of the conversation chain.
00:04:15.780 | And the conversation chain,
00:04:17.380 | we just pass it a large language model.
00:04:19.300 | So this is going to be the text-davinci-003 model
00:04:22.140 | we just created.
00:04:23.620 | And all it's going to do,
00:04:26.100 | or the core part of this conversation chain does
00:04:28.980 | is what you can see here.
00:04:30.500 | So the following is a friendly conversation
00:04:32.380 | between a human and AI.
00:04:33.740 | AI is talkative and provides lots of specific details
00:04:38.020 | from its context.
00:04:39.580 | If the AI does not know the answer to the question,
00:04:41.900 | it truthfully says it does not know, right?
00:04:44.180 | So this is a prompt that essentially primes the model
00:04:48.240 | to behave in a particular way.
00:04:50.460 | So in this case, it's going to behave
00:04:53.900 | like a friendly AI assistant.
00:04:57.420 | And it does this first by having this current conversation,
00:05:02.420 | this is the history that we're going to pass.
00:05:04.580 | So, you know, I said before, these models are stateless.
00:05:07.700 | The reason we can pass a history
00:05:09.100 | is because we're actually taking over the past interactions
00:05:12.480 | and passing them into the model
00:05:15.380 | at this current point in time.
00:05:17.620 | So in that single prompt, we will have our current input,
00:05:22.020 | our current query,
00:05:23.140 | and all of our past interactions with the chatbot.
00:05:26.620 | All right, so all of those past interactions
00:05:28.220 | get passed to history here,
00:05:30.100 | and our current query or current input
00:05:33.340 | is passed to the input.
00:05:35.300 | And then we tell the model,
00:05:37.180 | okay, now it's your time to respond with this AI part.
00:05:41.420 | And beyond that, there's not really anything special
00:05:44.580 | going on with this conversation chain.
00:05:46.220 | So we can see here in these code examples
00:05:48.820 | that Francisco pulled up,
00:05:50.520 | the actual code for the conversation chain here, right?
00:05:54.260 | So which we initialized here,
00:05:55.900 | the code that calls these is exactly the same
00:06:00.340 | as the code that calls the large language model chain.
00:06:02.900 | All right, there's not actually anything else changing
00:06:05.520 | other than this prompt here.
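Putting that together, a minimal sketch of the chain, plus one way to inspect the prompt described above (attribute names assumed from this LangChain version):

```python
conversation = ConversationChain(llm=llm)

# the template containing the {history} and {input} placeholders discussed above
print(conversation.prompt.template)
```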
00:06:07.380 | Okay, so let's move on to the different memory types.
00:06:12.060 | So as I mentioned before,
00:06:13.500 | there are a few different memory types
00:06:15.420 | that we are able to use.
00:06:17.380 | Now, these different types of memory in LangChain
00:06:20.700 | are essentially just going to change the history part
00:06:24.860 | of that prompt that you saw before.
00:06:26.860 | So these go in and they will essentially
00:06:30.220 | take the conversation history,
00:06:32.480 | format it in a particular way,
00:06:35.100 | and just place it into that history parameter.
00:06:38.300 | But as you may have guessed,
00:06:39.460 | they format things differently.
00:06:41.940 | And because of that,
00:06:43.100 | there are pros and cons to each one of these methods.
00:06:46.220 | So the simplest of those is the conversation buffer memory.
00:06:50.760 | Now, the conversation buffer memory is very simple.
00:06:53.860 | It basically takes all of your past interactions
00:06:56.140 | between you and the AI,
00:06:58.660 | and it just passes them into that history parameter
00:07:02.740 | as the raw text.
00:07:04.280 | There's no processing done.
00:07:05.460 | It is literally just a raw conversation
00:07:08.000 | that you've had up to this point.
00:07:10.220 | Now, to initialize that, we just do this, super simple.
00:07:13.180 | So we have the memory here.
00:07:15.060 | So we have our conversation chain.
00:07:16.540 | This time we're specifying the memory parameter,
00:07:19.100 | and we're using conversational buffer memory.
00:07:21.500 | We run that, and then we can just pass in some input.
00:07:24.260 | So we're going to go with good morning AI.
00:07:27.080 | And we get this response here.
00:07:28.380 | Good morning, it's a beautiful day today, isn't it?
00:07:30.380 | How can I help you?
00:07:32.220 | Cool.
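That buffer-memory chain and the opening message would look roughly like this:

```python
conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)

# first turn of the conversation
conversation_buf("Good morning AI!")
```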
00:07:33.060 | Now, one thing that I do want to do
00:07:34.380 | is actually count the number of tokens that we're using,
00:07:36.420 | 'cause that's a very big part
00:07:37.740 | of which one of these methods
00:07:39.260 | we might want to use over the others.
00:07:41.460 | Now, to count those tokens that we're using,
00:07:44.940 | we actually need to refer back
00:07:47.060 | to the count tokens function up here, right?
00:07:50.660 | So this is just a,
00:07:52.860 | so we have this get_openai_callback from LangChain here.
00:07:57.080 | And within that callback,
00:08:00.300 | we are actually going to get the total number of tokens
00:08:02.940 | that we just used in our most recent request to OpenAI.
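A sketch of what that count_tokens helper presumably looks like, wrapping the chain call in get_openai_callback so we can read the token usage of that one request:

```python
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    # run the chain inside the callback context and report the tokens used
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')
    return result
```

It is then called as count_tokens(chain, query), which is how we use it below.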
00:08:08.680 | So let's come down here.
00:08:12.580 | To use that count tokens function,
00:08:14.660 | all we're going to do is pass our conversational chain,
00:08:18.100 | and we're going to pass in our input.
00:08:19.700 | So our next input is going to be,
00:08:20.980 | my interest here is to explore the potential
00:08:22.900 | of integrating large language models
00:08:24.700 | with external knowledge, okay?
00:08:27.420 | We run this, and we see, okay,
00:08:29.620 | we spent a total of 179 tokens so far.
00:08:33.000 | We keep going, I just want to analyze
00:08:35.180 | the different possibilities, so on and so on, right?
00:08:38.160 | And because we're saving the raw interactions
00:08:42.460 | that have happened up to this point,
00:08:44.300 | naturally with every new interaction,
00:08:46.460 | the number of tokens that we're using
00:08:48.380 | with each call increases.
00:08:51.140 | And we just see this keep increasing and increasing.
00:08:55.300 | But each one of these queries that we're making
00:08:59.220 | is considering all the previous interactions,
00:09:02.300 | and we can see that, right?
00:09:04.220 | There is a common thread throughout
00:09:06.860 | each of these interactions,
00:09:08.720 | and the responses that we're getting,
00:09:10.360 | like it's clearly a conversation.
00:09:12.320 | And if you come down to here,
00:09:15.200 | we ask the final question, which is,
00:09:17.360 | what is my aim again?
00:09:18.620 | Now, earlier on, we specified my interest,
00:09:22.400 | we didn't specifically say aim,
00:09:24.200 | we said my interest here is to explore
00:09:26.560 | the potential of integrating large language models
00:09:29.040 | with external knowledge.
00:09:30.800 | And if we come down,
00:09:32.120 | that's basically what we're asking here.
00:09:34.160 | What is my aim?
00:09:35.340 | And it says your aim is to explore the potential
00:09:37.860 | of integrating large language models
00:09:39.100 | with external knowledge.
00:09:40.140 | Okay, so that's just to confirm that, yes,
00:09:42.460 | the model does in fact remember
00:09:45.340 | the start of the conversation.
00:09:47.020 | So clearly that works, right?
00:09:49.820 | And let's wait for those to run.
00:09:51.780 | Okay, cool, and so what you can see here
00:09:54.060 | is we have the conversation chain that we initialize,
00:09:56.540 | and then we have this memory attribute,
00:09:59.060 | and within that, the buffer attribute.
00:10:01.540 | This is literally going to show us
00:10:03.420 | what it is that we're feeding into that history parameter.
00:10:07.420 | And you can just see that it is literally
00:10:09.940 | just the conversation.
00:10:11.540 | Like there is nothing else there.
00:10:13.220 | We're not summarizing, we're not modifying it in any way.
00:10:16.980 | It's just the conversation that we had from above.
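Inspecting that raw history is just a matter of reading the buffer attribute, e.g.:

```python
print(conversation_buf.memory.buffer)
```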
00:10:20.540 | Okay, so that is the first one of our memory types.
00:10:23.500 | I think the pros and cons to this are relatively clear,
00:10:27.340 | but let's go through them anyway.
00:10:29.120 | So the pros are that we're storing everything, right?
00:10:33.740 | We're taking the raw interactions.
00:10:36.260 | We're not modifying or shortening them in any way.
00:10:38.860 | And that means that we're storing
00:10:40.220 | the maximum amount of information,
00:10:42.020 | which means we're not losing any information
00:10:44.180 | from the previous interactions.
00:10:45.740 | And to add to that, just storing previous interactions
00:10:49.540 | is a very simple approach.
00:10:50.980 | It's intuitive, it's not complicated in any way.
00:10:54.060 | So that's also a nice benefit.
00:10:56.220 | But they kind of come with a few cons.
00:10:59.680 | And naturally, if we're storing all these tokens,
00:11:02.680 | it means that the response times of the model
00:11:05.160 | are going to be slower,
00:11:06.240 | especially as the conversation continues
00:11:08.280 | and the queries that we're sending get bigger.
00:11:11.560 | And it also means that our costs
00:11:14.920 | are going to increase as well.
00:11:16.880 | And beyond that, it's even completely limiting us.
00:11:20.280 | So right now, the text-davinci-003 model
00:11:24.560 | and the GPT 3.5 Turbo model
00:11:28.260 | both have a max token limit of,
00:11:32.620 | I think it's 4,096 tokens.
00:11:36.620 | That's pretty big,
00:11:38.060 | but a conversation might go on for longer than this.
00:11:41.320 | So as soon as we hit that limit with this type of memory,
00:11:46.320 | we're actually just going to hit an error and that's it.
00:11:50.100 | Like we can't continue the conversation.
00:11:52.260 | So that's a pretty big downside.
00:11:54.300 | So are there any other types of memory
00:11:57.500 | that can help us remedy these issues?
00:11:59.940 | Yes, there are.
00:12:00.940 | First, we have the conversation summary memory.
00:12:04.700 | This allows us to avoid excessive token usage
00:12:08.080 | by summarizing the previous interactions.
00:12:12.020 | Rather than storing everything, we just summarize it.
00:12:14.940 | So in between us getting our previous interactions
00:12:19.500 | and passing them into the history parameter of our prompt,
00:12:23.900 | they're summarized and obviously shortened
00:12:25.940 | or hopefully shortened.
00:12:27.580 | Now to use this, we run this.
00:12:30.300 | Okay, so we have conversation chain
00:12:32.060 | and our memory is a conversation summary memory.
00:12:35.260 | And to actually do this summarization,
00:12:37.780 | we also need a large language model.
00:12:39.540 | So we're actually just going to use
00:12:40.920 | the same large language model here.
00:12:43.140 | And this conversation summary memory,
00:12:45.080 | just as a quick reminder, is coming from this.
00:12:48.020 | So langchain.chains.conversation.memory,
00:12:50.980 | and then we have all of our memory types imported there.
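A sketch of that setup; note the memory itself takes an LLM because it is the one doing the summarization:

```python
conversation_sum = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm)
)

# the summarization prompt discussed next
print(conversation_sum.memory.prompt.template)
```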
00:12:54.360 | Okay, and let's have a look at what the prompt is
00:12:59.180 | for this summarization component, right?
00:13:01.580 | Because we're performing two calls here.
00:13:03.460 | We are performing the call
00:13:05.220 | to the summarization large language model,
00:13:08.220 | and then we will be performing the call
00:13:11.300 | to the chatbot or the conversational AI component.
00:13:16.300 | So the first call is that summarization
00:13:19.020 | and it looks like this, okay?
00:13:20.180 | So conversation of some,
00:13:21.900 | so that is just this conversation chain that we have here.
00:13:25.380 | We're looking at the memory
00:13:26.340 | and we're looking at the prompt template.
00:13:28.340 | It is progressively summarizing the lines
00:13:30.860 | of conversation provided,
00:13:32.240 | adding on to the previous summary,
00:13:33.820 | returning a new summary, okay?
00:13:36.560 | So we're going to, this is an example.
00:13:39.120 | Current summary, the human asks
00:13:40.220 | what AI thinks of artificial intelligence.
00:13:42.340 | The AI thinks artificial intelligence is a force for good.
00:13:45.320 | And then it's saying new lines of conversation.
00:13:47.660 | Why do you think AI is a force for good?
00:13:50.780 | Because AI will help humans reach their full potential.
00:13:53.980 | And then it creates a new summary.
00:13:56.140 | The human asks what AI thinks of this.
00:13:58.220 | AI thinks artificial intelligence is a force for good
00:14:03.100 | because it will help humans reach their full potential.
00:14:05.660 | So it's basically just added on to the end
00:14:08.420 | of the previous summary, a little bit more information.
00:14:11.420 | And it will basically keep doing that
00:14:13.940 | with each interaction, right?
00:14:16.060 | So from there, we have the current summary,
00:14:20.060 | we pass in new lines of conversation,
00:14:22.980 | and then we will create a new summary.
00:14:25.740 | Now let's see how we would actually run through all of this.
00:14:28.820 | So we have our conversational summary memory here.
00:14:33.680 | We're going to go through the same conversation again.
00:14:35.500 | So good morning, AI.
00:14:36.820 | And you'll notice that the responses
00:14:38.340 | are going to be slightly different.
00:14:39.740 | And we'll just run through these very quickly.
00:14:42.300 | And what we really want to see is,
00:14:44.900 | although we're summarizing,
00:14:46.500 | is the model able to remember that final question again?
00:14:50.020 | What is my aim again?
00:14:52.180 | And fortunately, we can see that, yeah,
00:14:54.820 | it does have that same correct answer again.
00:14:58.660 | So that's pretty cool.
00:15:00.820 | Now, the only issue I see here is,
00:15:03.700 | okay, we're summarizing
00:15:05.460 | 'cause we want to reduce the number of tokens,
00:15:06.920 | but just take a look at this.
00:15:09.740 | We're spending a total of almost 750 tokens here.
00:15:13.980 | That's the second-to-last input.
00:15:16.420 | Let's compare that to up here.
00:15:18.340 | Okay, second-to-last input
00:15:20.140 | for the one where we're just saving everything,
00:15:23.260 | like the raw interactions, is 360,
00:15:27.820 | which is actually less than half the number of tokens.
00:15:32.820 | So what is going on there?
00:15:35.900 | Well, the summaries are generated text,
00:15:39.620 | and they can actually be pretty long.
00:15:43.020 | So we have this conversation, sum, memory, buffer.
00:15:47.660 | Okay, I'm not gonna read it all, but you can go through.
00:15:51.100 | There's clearly a lot of text there.
00:15:53.340 | So is this actually helping us at all?
00:15:56.460 | Well, it can do.
00:15:57.900 | It just requires us to get
00:16:00.060 | to a certain length of conversation.
00:16:02.300 | And we can see here that the actual,
00:16:04.100 | so we're using this TIC token tokenizer,
00:16:07.420 | which is essentially the opening eyes tokenizer,
00:16:10.780 | and we're using it for text of interest 0x03,
00:16:13.260 | which is the current large language model that we're using.
00:16:15.860 | Now, if we look at specifically the memory buffer
00:16:19.420 | and look at the number of tokens that we have in there
00:16:21.900 | versus the actual conversation,
00:16:23.740 | so this is a conversational buffer memory
00:16:26.340 | where we're storing everything,
00:16:27.940 | and this is a summary memory
00:16:29.780 | where we're just storing a summary.
00:16:31.580 | If we compare both of those,
00:16:33.620 | we can actually see that the summary memory
00:16:36.420 | is actually a lot shorter, right?
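That comparison can be reproduced with tiktoken directly; the chain names here (conversation_buf, conversation_sum) are the ones assumed from the earlier examples:

```python
import tiktoken

# the encoding used by text-davinci-003
tokenizer = tiktoken.encoding_for_model('text-davinci-003')

for name, chain in [('buffer memory', conversation_buf),
                    ('summary memory', conversation_sum)]:
    n_tokens = len(tokenizer.encode(chain.memory.buffer))
    print(f'{name} conversation length: {n_tokens} tokens')
```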
00:16:38.860 | The only issue is the reason
00:16:40.580 | that we're actually spending more tokens
00:16:42.580 | is because first we're doing that summarization
00:16:45.340 | in the first place,
00:16:46.820 | and you can also see that we have two prompts.
00:16:49.820 | This prompt here is already quite a lot of text.
00:16:53.060 | So, you know, okay, that's great.
00:16:55.780 | It seems like, you know,
00:16:57.260 | I understand that the summary itself is shorter,
00:17:00.140 | but it doesn't matter
00:17:01.260 | because the actual total here is still longer.
00:17:05.300 | And yes, that's true for this conversation,
00:17:07.500 | but for longer conversations,
00:17:09.180 | this is not usually the case.
00:17:12.060 | So we have this visual here,
00:17:13.580 | which I have calculated using a longer conversation.
00:17:16.780 | The link for the code for that,
00:17:18.460 | you can see at the top of the video right now.
00:17:21.180 | But in this, we can see the comparison
00:17:23.700 | between these two methods.
00:17:25.380 | So the line that you see
00:17:27.580 | just kind of growing linearly right here,
00:17:30.300 | that is our conversation buffer memory, okay?
00:17:34.780 | So the first one we looked at.
00:17:36.260 | And you see that we get to a certain level,
00:17:38.580 | like around 25 interactions.
00:17:41.060 | And at that point,
00:17:41.980 | we actually hit the token limit of the model.
00:17:45.260 | Whereas the summary memory that we're using,
00:17:48.100 | initially, yes, is higher,
00:17:50.220 | but then it kind of, it doesn't grow quite as quickly.
00:17:53.300 | It grows quite quickly towards the end there,
00:17:55.500 | but the overall growth rate is much slower.
00:17:59.180 | So for shorter conversations,
00:18:01.100 | it's actually better to use the direct buffer memory.
00:18:03.900 | But for longer conversations,
00:18:05.980 | summary memory, as you can see here, works better.
00:18:09.700 | It reduces the number of tokens that you're using overall.
00:18:12.340 | So naturally, which one of those you would use
00:18:15.260 | just depends on your use case.
00:18:16.940 | So I think we can kind of summarize the pros and cons here.
00:18:20.740 | For summary memory, the pros are that it shortens the number of tokens
00:18:24.580 | for long conversations,
00:18:26.140 | and it also enables much longer conversations.
00:18:29.220 | And it's a relatively straightforward implementation,
00:18:32.100 | super easy to understand.
00:18:34.100 | But on the cons, naturally,
00:18:36.420 | for shorter conversations
00:18:38.180 | it doesn't help at all.
00:18:39.420 | It's actually less efficient.
00:18:41.380 | The memorization of the prior chats,
00:18:45.660 | because we're not saving everything
00:18:46.860 | like we did with the buffer memory,
00:18:48.260 | the memorization of those previous interactions
00:18:50.740 | is wholly reliant on the summarization,
00:18:53.900 | including that information,
00:18:55.620 | which it might not always do,
00:18:57.460 | particularly if you're not using a particularly advanced
00:19:00.260 | large language model for that summarization.
00:19:02.340 | So both those things we also just need to consider
00:19:05.620 | with this approach.
00:19:06.700 | So moving on to conversational buffer window memory,
00:19:09.500 | this one acts in a very similar way to a buffer memory,
00:19:11.900 | but there is now a window on the number of interactions
00:19:15.540 | that we remember or save.
00:19:18.140 | So we're gonna set that equal to one.
00:19:20.300 | So we're gonna save the one most recent interaction
00:19:23.620 | from the AI and also the human, right?
00:19:27.380 | So one human interaction, one AI interaction.
00:19:31.540 | Usually it would be much larger,
00:19:33.580 | but it's just for the sake of this example.
00:19:36.900 | So running through these again,
00:19:39.060 | you know, it's pretty straightforward, right?
00:19:40.940 | I'm not gonna rerun everything here,
00:19:43.140 | but we can see we go through the same conversation again,
00:19:45.500 | and we get to the end and we say, what is my aim again?
00:19:48.620 | All right, and your aim is to use data sources
00:19:50.420 | to give context to the model, right?
00:19:53.260 | You know, that's wrong.
00:19:54.420 | But the reason that the model is saying this
00:19:58.740 | is because it actually only remembers
00:19:59.980 | the previous interaction, just the one last interaction,
00:20:03.300 | because we set K equal to one.
00:20:05.220 | And we can actually see that here.
00:20:07.140 | So if we go into our conversation chain,
00:20:10.180 | we go to the memory attribute, load memory variables.
00:20:13.860 | We set inputs equal to nothing now,
00:20:15.980 | 'cause we don't actually wanna pass anything in.
00:20:18.100 | And we load the history item.
00:20:21.180 | Then we can actually see the history that we have there.
00:20:24.300 | So we have just the previous interaction.
00:20:27.860 | And the previous interaction was actually this, right?
00:20:30.420 | So, because we asked this question.
00:20:32.300 | So we actually just have that.
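That inspection step, as a sketch (an empty inputs dict is passed since nothing needs to be filled in):

```python
bufw_history = conversation_bufw.memory.load_memory_variables(inputs={})['history']
print(bufw_history)
```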
00:20:34.220 | Now, obviously, naturally with this approach,
00:20:37.620 | you're probably going to be using a much larger number
00:20:40.620 | than K equals one.
00:20:42.580 | And you can actually see the effect of that here.
00:20:45.020 | So adding to the previous two visuals that we saw,
00:20:48.140 | we now have conversational buffer window memory
00:20:50.500 | for that longer conversation with K equals 12.
00:20:52.780 | So remembering the previous 12 interactions,
00:20:55.540 | and also K equals six.
00:20:56.740 | And you can see, okay, the token count for these,
00:20:59.500 | it's kind of on par with the conversational buffer memory
00:21:03.260 | up until you get to the number of interactions
00:21:05.860 | that you're saving.
00:21:06.700 | And then it kind of flattens out a little bit
00:21:08.900 | because you're just saving the previous 12
00:21:12.260 | or six interactions in this case.
00:21:15.340 | So naturally, I think the main pro is here,
00:21:18.940 | kind of similar to the buffer memory,
00:21:20.380 | we're saving the raw input of the most recent interactions.
00:21:25.140 | And the con is obviously
00:21:27.020 | that we are not remembering distant interactions.
00:21:30.180 | So if we do want to remember distant interactions,
00:21:32.380 | we just can't do that with this example.
00:21:35.820 | Now, there's one other memory type
00:21:37.380 | that I wanted to very quickly cover here.
00:21:39.220 | And that is a conversation summary buffer memory.
00:21:43.820 | So let's import that quickly from
00:21:48.220 | langchain.chains.conversation.memory.
00:21:54.340 | I'm going to be importing
00:21:55.460 | ConversationSummaryBufferMemory, okay?
00:21:58.820 | Now to initialize this, we will use this code here.
00:22:03.820 | So we have the conversation chain again,
00:22:07.140 | we're passing in the large language model,
00:22:09.860 | and then we're also passing in the memory
00:22:11.900 | like we did before, conversation summary buffer memory.
00:22:15.340 | And in there, we are passing in a large language model.
00:22:20.340 | Now, the reason we're doing this
00:22:21.940 | is because you can see here that we are summarizing, okay?
00:22:26.100 | And then we also pass in this max token limit.
00:22:28.580 | Now, this is kind of equivalent to what you saw before,
00:22:33.220 | where we had K equals six.
00:22:36.300 | But rather than saying
00:22:37.340 | we're going to save the previous six interactions,
00:22:40.500 | we're actually saying we're going to save
00:22:42.140 | the previous 650 tokens.
00:22:45.940 | So we're kind of doing both.
00:22:49.500 | So we're summarizing here,
00:22:51.700 | and we are also saving the most recent interactions
00:22:55.740 | in their raw form.
00:22:57.900 | So we can run that,
00:22:59.660 | and we would just use it in the exact same way.
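The import and setup for this one, with the 650-token limit mentioned above:

```python
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

conversation_sum_bufw = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=650)
)
```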
00:23:02.740 | Okay, so main pros and cons with this one as well,
00:23:05.740 | kind of like a mix of the previous two methods we looked at.
00:23:09.100 | So we have a little bit of the buffer window in there,
00:23:12.260 | and also the summaries in there.
00:23:14.580 | So the summarizer means that we can remember
00:23:17.220 | those distant interactions,
00:23:19.100 | and the buffer window means that we are not misrepresenting
00:23:24.100 | the most recent interactions,
00:23:27.140 | because we're storing them in their raw form.
00:23:28.860 | So we're keeping as much information there as possible,
00:23:31.940 | but only from the most recent interactions.
00:23:34.940 | And then the cons, kind of similar to the other ones again.
00:23:38.180 | For the summarizer,
00:23:39.420 | we naturally increase the token count
00:23:42.700 | for shorter conversations.
00:23:44.580 | And with the summarizer,
00:23:46.100 | we're not necessarily going to remember
00:23:49.020 | distant interactions that well.
00:23:52.340 | They don't really contain all of the information.
00:23:55.380 | And naturally, storing the raw interactions,
00:23:57.980 | as well as the summarized interactions from a while back,
00:24:01.380 | all of this increases the token count.
00:24:04.580 | But this method does give us
00:24:08.580 | quite a few parameters that we can tweak
00:24:10.180 | in order to get what it is that we want.
00:24:13.540 | And in this little visualization here,
00:24:15.380 | we've added the summary buffer memory
00:24:18.820 | with a max token limit of 1,300,
00:24:21.300 | and also 650 there,
00:24:23.780 | which is roughly equivalent to the K equals 12
00:24:27.700 | and K equals six that we had before.
00:24:29.820 | And we see that the tokens are not too excessive.
00:24:33.500 | Now, there are also other memory types.
00:24:36.380 | Very quickly, you can see this one here
00:24:38.260 | is the conversation knowledge graph memory,
00:24:41.260 | which essentially keeps a knowledge graph
00:24:44.140 | of all of the entities
00:24:46.140 | that have been mentioned throughout the conversation.
00:24:48.060 | So you can kind of see that here.
00:24:50.020 | We say, "My name is human and I like mangoes."
00:24:53.260 | And then we see the memory here.
00:24:54.900 | And we see that the entity human,
00:24:56.740 | referring to the person,
00:24:58.580 | and the entity human,
00:24:59.940 | I think this is referring to the name.
00:25:01.740 | They're both connected
00:25:02.700 | because the human is the name of human.
00:25:06.780 | And then here we have the entity human
00:25:09.700 | and the entity mangoes.
00:25:11.540 | And we can see that they are connected
00:25:13.300 | because the human likes mangoes, right?
00:25:16.660 | So we have that knowledge graph memory in there.
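A quick sketch of that knowledge-graph memory; the get_triples call is an assumption about how the stored graph can be inspected:

```python
from langchain.chains.conversation.memory import ConversationKGMemory

conversation_kg = ConversationChain(
    llm=llm,
    memory=ConversationKGMemory(llm=llm)
)

conversation_kg("My name is human and I like mangoes!")

# entities and relations extracted so far, e.g. (human, mangoes, likes)
print(conversation_kg.memory.kg.get_triples())
```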
00:25:19.260 | But beyond that,
00:25:20.100 | we're not going to dive into that for now any further.
00:25:23.140 | We're going to cover that more in a future video.
00:25:25.980 | So that's actually it for this introduction
00:25:28.980 | to conversational memory types with LangChain.
00:25:33.660 | We've covered quite a few there.
00:25:35.900 | And I think the ones that we have covered
00:25:38.220 | are more than enough to actually build chatbots
00:25:41.620 | or conversational AI agents
00:25:44.020 | using what seem to be the same methods
00:25:46.780 | as those that are being used
00:25:49.140 | by some of the state-of-the-art
00:25:51.420 | conversational AI agents out there today,
00:25:53.860 | like OpenAI's ChatGPT
00:25:56.140 | and possibly Google's LaMDA,
00:25:58.100 | although we really have no idea how that works.
00:26:00.900 | But for now, we'll leave it there.
00:26:03.340 | As I mentioned,
00:26:04.180 | there's going to be a lot more
00:26:05.100 | on this type of stuff in the future.
00:26:06.980 | Thank you very much for watching.
00:26:09.260 | I hope this has been interesting and useful.
00:26:11.500 | And I will see you again in the next one.