
Chatbot Memory for Chat-GPT, Davinci + other LLMs - LangChain #4


Chapters

0:00 Conversational memory for chatbots
0:28 Why we need conversational memory for chatbots
1:45 Implementation of conversational memory
4:05 LangChain's Conversation Chain
12:00 Conversation Summary Memory in LangChain
19:06 Conversation Buffer Window Memory in LangChain
21:35 Conversation Summary Buffer Memory in LangChain
24:33 Other LangChain Memory Types
25:25 Final thoughts on conversational memory

Whisper Transcript

00:00:00.000 | conversational memory has become
00:00:02.300 | a very important topic recently.
00:00:04.840 | It essentially describes how a chatbot
00:00:08.760 | or some other type of AI agent
00:00:12.480 | can respond to queries
00:00:14.800 | in kind of like a conversational manner.
00:00:17.320 | So if you think about conversation,
00:00:19.500 | the current point of the conversation
00:00:22.560 | depends almost entirely on the previous parts
00:00:26.600 | of the conversation, the previous interactions.
00:00:29.300 | And conversational memory is how we allow
00:00:32.880 | a large language model or other type of language model
00:00:36.440 | to remember those previous interactions.
00:00:40.000 | It is the type of memory that we would see in chatbots
00:00:44.380 | like OpenAI's ChatGPT or Google's LaMDA.
00:00:49.100 | Without conversational memory,
00:00:50.920 | we would not be able to have a coherent conversation
00:00:54.400 | with any of these chatbots.
00:00:56.340 | And that is because by default,
00:00:58.360 | these large language models are stateless.
00:01:01.280 | That means that every incoming query to the model
00:01:04.980 | is treated independently of everything else.
00:01:07.760 | It doesn't remember previous queries or anything like that.
00:01:11.560 | It just looks at what you are giving it
00:01:14.000 | at this current moment in time.
00:01:16.500 | So how can we make a stateless large language model
00:01:21.100 | remember things that happened before?
00:01:23.960 | Well, that is what we're going to describe in this video.
00:01:27.240 | We're going to jump straight into it.
00:01:29.380 | We're going to be using the LangChain library
00:01:32.580 | to work through these different types
00:01:34.680 | of conversational memory,
00:01:35.800 | because there are a few different types
00:01:37.760 | and we'll introduce some of the essential types
00:01:40.880 | that you need to know in order to build chatbots
00:01:43.560 | or other conversational agents.
00:01:45.500 | So to get started,
00:01:47.000 | we will work through this notebook here.
00:01:49.800 | If you'd like to follow along and run this code as well,
00:01:52.680 | there will be a link at the top of the video right now
00:01:54.840 | that will take you to this notebook.
00:01:56.700 | So libraries that we're going to be using
00:01:58.280 | are LangChain, OpenAI, and tiktoken.
00:02:00.560 | All right, we will install those.
00:02:03.120 | And once you've installed those,
00:02:04.120 | we'll move on to our imports here.
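(For reference, the install cell is likely something along these lines; the exact version pins in the notebook may differ.)

```python
!pip install -qU langchain openai tiktoken
```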
00:02:06.960 | Now, before we do move on any further,
00:02:09.640 | the vast majority of this notebook is by Francisco.
00:02:12.840 | So thanks a lot for putting that together.
00:02:15.120 | That is the same Francisco that we saw in the previous video
00:02:18.120 | and he'll be joining us for future videos as well.
00:02:20.840 | So let's go ahead and import everything in here.
00:02:23.840 | You can see a few different memory types
00:02:26.240 | that we're going to be using.
00:02:27.080 | So we have this conversational buffer memory,
00:02:28.900 | summary memory, and so on.
00:02:31.100 | We'll be taking a look at all of these
00:02:32.500 | and actually maybe some other ones as well.
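As a sketch, the imports would look roughly like this in the LangChain version used at the time; the exact class list here is inferred from the memory types used later in the video:

```python
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import (
    ConversationBufferMemory,
    ConversationSummaryMemory,
    ConversationBufferWindowMemory,
)
```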
00:02:35.680 | First thing we want to do is,
00:02:37.660 | well, throughout this notebook,
00:02:38.620 | we're going to be using OpenAI's large language models.
00:02:41.840 | So we want to go ahead and actually save the OpenAI API key.
00:02:46.840 | To get that, you'll need to go over to platform.openai.com.
00:02:52.660 | You log into your account in the top right corner.
00:02:56.340 | Once you have logged in, you can click on view API keys
00:02:59.900 | and then you'll just create a new secret key
00:03:02.620 | and just copy that and then paste it
00:03:06.380 | into the little prompt that we get here.
00:03:11.100 | Okay, once we've done that,
00:03:14.300 | we have our OpenAI API key in there
00:03:17.300 | and what we can do is run this first step.
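A minimal sketch of that step, assuming the notebook prompts for the key with getpass and stores it in the OPENAI_API_KEY environment variable:

```python
import os
from getpass import getpass

# paste the secret key into the prompt rather than hard-coding it
os.environ['OPENAI_API_KEY'] = getpass('OpenAI API key: ')
```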
00:03:21.020 | So obviously recently you may have heard
00:03:24.460 | that there is a new ChatGPT model that is available to us
00:03:28.220 | and we can actually just put that in here.
00:03:31.020 | And what I'll do is,
00:03:31.920 | whilst we're running through this first example,
00:03:34.360 | I will also show you the examples that we get
00:03:42.120 | when we run with the GPT-3.5 Turbo model.
00:03:42.120 | But for the majority of this notebook,
00:03:44.020 | we're going to be sticking with the text-davinci-003 model
00:03:47.460 | just to show that, you know,
00:03:48.980 | we can use any large language model with this.
00:03:52.020 | It doesn't have to be a large language model
00:03:54.740 | like ChatGPT that has been trained specifically
00:03:57.220 | as a chatbot.
00:03:58.060 | It can be any large language model.
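A sketch of the LLM initialization, assuming the OpenAI wrapper and parameter names from that LangChain version:

```python
from langchain import OpenAI

# text-davinci-003 is a plain completion model, not a chat-tuned one;
# swapping model_name for 'gpt-3.5-turbo' is also possible
llm = OpenAI(
    temperature=0,
    model_name='text-davinci-003',
)
```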
00:03:59.580 | So later on, we're going to be using this function here,
00:04:01.980 | count tokens, don't worry about it right now,
00:04:04.420 | we'll skip that.
00:04:05.260 | First thing we want to have a look at is this,
00:04:08.320 | the conversation chain.
00:04:09.740 | So everything that we are about to talk about
00:04:12.900 | is built on top of the conversation chain.
00:04:15.780 | And the conversation chain,
00:04:17.380 | we just pass it a large language model.
00:04:19.300 | So this is going to be the text-davinci-003 model
00:04:22.140 | we just created.
00:04:23.620 | And all it's going to do,
00:04:26.100 | or the core part of this conversation chain does
00:04:28.980 | is what you can see here.
00:04:30.500 | So the following is a friendly conversation
00:04:32.380 | between a human and AI.
00:04:33.740 | AI is talkative and provides lots of specific details
00:04:38.020 | from its context.
00:04:39.580 | If the AI does not know the answer to the question,
00:04:41.900 | it truthfully says it does not know, right?
00:04:44.180 | So this is a prompt that essentially primes the model
00:04:48.240 | to behave in a particular way.
00:04:50.460 | So in this case, it's going to behave
00:04:53.900 | like a friendly AI assistant.
00:04:57.420 | And it does this first by having this current conversation,
00:05:02.420 | this is the history that we're going to pass.
00:05:04.580 | So, you know, I said before, these models are stateless.
00:05:07.700 | The reason we can pass a history
00:05:09.100 | is because we're actually taking over the past interactions
00:05:12.480 | and passing them into the model
00:05:15.380 | at this current point in time.
00:05:17.620 | So in that single prompt, we will have our current input,
00:05:22.020 | our current query,
00:05:23.140 | and all of our past interactions with the chatbot.
00:05:26.620 | All right, so all of those past interactions
00:05:28.220 | get passed to history here,
00:05:30.100 | and our current query or current input
00:05:33.340 | is passed to the input.
00:05:35.300 | And then we tell the model,
00:05:37.180 | okay, now it's your time to respond with this AI part.
00:05:41.420 | And beyond that, there's not really anything special
00:05:44.580 | going on with this conversation chain.
00:05:46.220 | So we can see here in these code examples
00:05:48.820 | that Francisco pulled up,
00:05:50.520 | the actual code for the conversation chain here, right?
00:05:54.260 | So which we initialized here,
00:05:55.900 | the code that calls these is exactly the same
00:06:00.340 | as the code that calls the large language model chain.
00:06:02.900 | All right, there's not actually anything else changing
00:06:05.520 | other than this prompt here.
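Putting that together, a minimal sketch of the chain, plus one way to inspect the prompt described above (attribute names assumed from this LangChain version):

```python
conversation = ConversationChain(llm=llm)

# the template containing the {history} and {input} placeholders discussed above
print(conversation.prompt.template)
```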
00:06:07.380 | Okay, so let's move on to the different memory types.
00:06:12.060 | So as I mentioned before,
00:06:13.500 | there are a few different memory types
00:06:15.420 | that we are able to use.
00:06:17.380 | Now, these different types of memory in LangChain
00:06:20.700 | are essentially just going to change the history part
00:06:24.860 | of that prompt that you saw before.
00:06:26.860 | So these go in and they will essentially
00:06:30.220 | take the conversation history,
00:06:32.480 | format it in a particular way,
00:06:35.100 | and just place it into that history parameter.
00:06:38.300 | But as you may have guessed,
00:06:39.460 | they format things differently.
00:06:41.940 | And because of that,
00:06:43.100 | there are pros and cons to each one of these methods.
00:06:46.220 | So the simplest of those is the conversation buffer memory.
00:06:50.760 | Now, the conversation buffer memory is very simple.
00:06:53.860 | It basically takes all of your past interactions
00:06:56.140 | between you and the AI,
00:06:58.660 | and it just passes them into that history parameter
00:07:02.740 | as the raw text.
00:07:04.280 | There's no processing done.
00:07:05.460 | It is literally just a raw conversation
00:07:08.000 | that you've had up to this point.
00:07:10.220 | Now, to initialize that, we just do this, super simple.
00:07:13.180 | So we have the memory here.
00:07:15.060 | So we have our conversation chain.
00:07:16.540 | This time we're specifying the memory parameter,
00:07:19.100 | and we're using conversational buffer memory.
00:07:21.500 | We run that, and then we can just pass in some input.
00:07:24.260 | So we're going to go with good morning AI.
00:07:27.080 | And we get this response here.
00:07:28.380 | Good morning, it's a beautiful day today, isn't it?
00:07:30.380 | How can I help you?
00:07:32.220 | Cool.
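That buffer-memory chain and the opening message would look roughly like this:

```python
conversation_buf = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)

# first turn of the conversation
conversation_buf("Good morning AI!")
```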
00:07:33.060 | Now, one thing that I do want to do
00:07:34.380 | is actually count the number of tokens that we're using,
00:07:36.420 | 'cause that's a very big part
00:07:37.740 | of which one of these methods
00:07:39.260 | we might want to use over the others.
00:07:41.460 | Now, to count those tokens that we're using,
00:07:44.940 | we actually need to refer back
00:07:47.060 | to the count tokens function up here, right?
00:07:50.660 | So this is just a,
00:07:52.860 | so we have this get_openai_callback from LangChain here.
00:07:57.080 | And within that callback,
00:08:00.300 | we are actually going to get the total number of tokens
00:08:02.940 | that we just used in our most recent request to OpenAI.
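A sketch of what that count_tokens helper presumably looks like, wrapping the chain call in get_openai_callback so we can read the token usage of that one request:

```python
from langchain.callbacks import get_openai_callback

def count_tokens(chain, query):
    # run the chain inside the callback context and report the tokens used
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')
    return result
```

It is then called as count_tokens(chain, query), which is how we use it below.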
00:08:08.680 | So let's come down here.
00:08:12.580 | To use that count tokens function,
00:08:14.660 | all we're going to do is pass our conversational chain,
00:08:18.100 | and we're going to pass in our input.
00:08:19.700 | So our next input is going to be,
00:08:20.980 | my interest here is to explore the potential
00:08:22.900 | of integrating large language models
00:08:24.700 | with external knowledge, okay?
00:08:27.420 | We run this, and we see, okay,
00:08:29.620 | we spent a total of 179 tokens so far.
00:08:33.000 | We keep going, I just want to analyze
00:08:35.180 | the different possibilities, so on and so on, right?
00:08:38.160 | And because we're saving the raw interactions
00:08:42.460 | that have happened up to this point,
00:08:44.300 | naturally with every new interaction,
00:08:46.460 | the number of tokens that we're using
00:08:48.380 | with each call increases.
00:08:51.140 | And we just see this keep increasing and increasing.
00:08:55.300 | But each one of these queries that we're making
00:08:59.220 | is considering all the previous interactions,
00:09:02.300 | and we can see that, right?
00:09:04.220 | There is a common thread throughout
00:09:06.860 | each of these interactions,
00:09:08.720 | and the responses that we're getting,
00:09:10.360 | like it's clearly a conversation.
00:09:12.320 | And if you come down to here,
00:09:15.200 | we ask the final question, which is,
00:09:17.360 | what is my aim again?
00:09:18.620 | Now, earlier on, we specified my interest,
00:09:22.400 | we didn't specifically say aim,
00:09:24.200 | we said my interest here is to explore
00:09:26.560 | the potential of integrating large language models
00:09:29.040 | with external knowledge.
00:09:30.800 | And if we come down,
00:09:32.120 | that's basically what we're asking here.
00:09:34.160 | What is my aim?
00:09:35.340 | And it says your aim is to explore the potential
00:09:37.860 | of integrating large language models
00:09:39.100 | with external knowledge.
00:09:40.140 | Okay, so that's just to confirm that, yes,
00:09:42.460 | the model does in fact remember
00:09:45.340 | the start of the conversation.
00:09:47.020 | So clearly that works, right?
00:09:49.820 | And let's wait for those to run.
00:09:51.780 | Okay, cool, and so what you can see here
00:09:54.060 | is we have the conversation chain that we initialize,
00:09:56.540 | and then we have this memory attribute,
00:09:59.060 | and within that, the buffer attribute.
00:10:01.540 | This is literally going to show us
00:10:03.420 | what it is that we're feeding into that history parameter.
00:10:07.420 | And you can just see that it is literally
00:10:09.940 | just the conversation.
00:10:11.540 | Like there is nothing else there.
00:10:13.220 | We're not summarizing, we're not modifying it in any way.
00:10:16.980 | It's just the conversation that we had from above.
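Inspecting that raw history is just a matter of reading the buffer attribute, e.g.:

```python
print(conversation_buf.memory.buffer)
```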
00:10:20.540 | Okay, so that is the first one of our memory types.
00:10:23.500 | I think the pros and cons to this are relatively clear,
00:10:27.340 | but let's go through them anyway.
00:10:29.120 | So the pros are that we're storing everything, right?
00:10:33.740 | We're taking the raw interactions.
00:10:36.260 | We're not modifying or shortening them in any way.
00:10:38.860 | And that means that we're storing
00:10:40.220 | the maximum amount of information,
00:10:42.020 | which means we're not losing any information
00:10:44.180 | from the previous interactions.
00:10:45.740 | And to add to that, just storing previous interactions
00:10:49.540 | is a very simple approach.
00:10:50.980 | It's intuitive, it's not complicated in any way.
00:10:54.060 | So that's also a nice benefit.
00:10:56.220 | But they kind of come with a few cons.
00:10:59.680 | And naturally, if we're storing all these tokens,
00:11:02.680 | it means that the response times of the model
00:11:05.160 | are going to be slower,
00:11:06.240 | especially as the conversation continues
00:11:08.280 | and the queries that we're sending get bigger.
00:11:11.560 | And it also means that our costs
00:11:14.920 | are going to increase as well.
00:11:16.880 | And beyond that, it's even completely limiting us.
00:11:20.280 | So right now, the text-davinci-003 model
00:11:24.560 | and the GPT 3.5 Turbo model
00:11:28.260 | both have a max token limit of,
00:11:32.620 | I think it's 4,096 tokens.
00:11:36.620 | That's pretty big,
00:11:38.060 | but a conversation might go on for longer than this.
00:11:41.320 | So as soon as we hit that limit with this type of memory,
00:11:46.320 | we're actually just going to hit an error and that's it.
00:11:50.100 | Like we can't continue the conversation.
00:11:52.260 | So that's a pretty big downside.
00:11:54.300 | So are there any other types of memory
00:11:57.500 | that can help us remedy these issues?
00:11:59.940 | Yes, there are.
00:12:00.940 | First, we have the conversation summary memory.
00:12:04.700 | This allows us to avoid excessive token usage
00:12:08.080 | by summarizing the previous interactions.
00:12:12.020 | Rather than storing everything, we just summarize it.
00:12:14.940 | So in between us getting our previous interactions
00:12:19.500 | and passing them into the history parameter of our prompt,
00:12:23.900 | they're summarized and obviously shortened
00:12:25.940 | or hopefully shortened.
00:12:27.580 | Now to use this, we run this.
00:12:30.300 | Okay, so we have conversation chain
00:12:32.060 | and our memory is a conversation summary memory.
00:12:35.260 | And to actually do this summarization,
00:12:37.780 | we also need a large language model.
00:12:39.540 | So we're actually just going to use
00:12:40.920 | the same large language model here.
00:12:43.140 | And this conversation summary memory,
00:12:45.080 | just as a quick reminder, is coming from this.
00:12:48.020 | So langchain.chains.conversation.memory,
00:12:50.980 | and then we have all of our memory types imported there.
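A sketch of that setup; note the memory itself takes an LLM because it is the one doing the summarization:

```python
conversation_sum = ConversationChain(
    llm=llm,
    memory=ConversationSummaryMemory(llm=llm)
)

# the summarization prompt discussed next
print(conversation_sum.memory.prompt.template)
```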
00:12:54.360 | Okay, and let's have a look at what the prompt is
00:12:59.180 | for this summarization component, right?
00:13:01.580 | Because we're performing two calls here.
00:13:03.460 | We are performing the call
00:13:05.220 | to the summarization large language model,
00:13:08.220 | and then we will be performing the call
00:13:11.300 | to the chatbot or the conversational AI component.
00:13:16.300 | So the first call is that summarization
00:13:19.020 | and it looks like this, okay?
00:13:20.180 | So conversation of some,
00:13:21.900 | so that is just this conversation chain that we have here.
00:13:25.380 | We're looking at the memory
00:13:26.340 | and we're looking at the prompt template.
00:13:28.340 | It is progressively summarizing the lines
00:13:30.860 | of conversation provided,
00:13:32.240 | adding on to the previous summary,
00:13:33.820 | returning a new summary, okay?
00:13:36.560 | So we're going to, this is an example.
00:13:39.120 | Current summary, the human asks
00:13:40.220 | what AI thinks of artificial intelligence.
00:13:42.340 | The AI thinks artificial intelligence is a force for good.
00:13:45.320 | And then it's saying new lines of conversation.
00:13:47.660 | Why do you think AI is a force for good?
00:13:50.780 | Because AI will help humans reach their full potential.
00:13:53.980 | And then it creates a new summary.
00:13:56.140 | The human asks what AI thinks of this.
00:13:58.220 | AI thinks artificial intelligence is a force for good
00:14:03.100 | because it will help humans reach their full potential.
00:14:05.660 | So it's basically just added on to the end
00:14:08.420 | of the previous summary, a little bit more information.
00:14:11.420 | And it will basically keep doing that
00:14:13.940 | with each interaction, right?
00:14:16.060 | So from there, we have the current summary,
00:14:20.060 | we pass in new lines of conversation,
00:14:22.980 | and then we will create a new summary.
00:14:25.740 | Now let's see how we would actually run through all of this.
00:14:28.820 | So we have our conversational summary memory here.
00:14:33.680 | We're going to go through the same conversation again.
00:14:35.500 | So good morning, AI.
00:14:36.820 | And you'll notice that the responses
00:14:38.340 | are going to be slightly different.
00:14:39.740 | And we'll just run through these very quickly.
00:14:42.300 | And what we really want to see is,
00:14:44.900 | although we're summarizing,
00:14:46.500 | is the model able to remember that final question again?
00:14:50.020 | What is my aim again?
00:14:52.180 | And fortunately, we can see that, yeah,
00:14:54.820 | it does have that same correct answer again.
00:14:58.660 | So that's pretty cool.
00:15:00.820 | Now, the only issue I see here is,
00:15:03.700 | okay, we're summarizing
00:15:05.460 | 'cause we want to reduce the number of tokens,
00:15:06.920 | but just take a look at this.
00:15:09.740 | We're spending a total of almost 750 tokens here.
00:15:13.980 | That's the second-to-last input.
00:15:16.420 | Let's compare that to up here.
00:15:18.340 | Okay, second-to-last input
00:15:20.140 | for the one where we're just saving everything,
00:15:23.260 | like the raw interactions, is 360,
00:15:27.820 | which is actually less than half the number of tokens.
00:15:32.820 | So what is going on there?
00:15:35.900 | Well, the summaries are generated text,
00:15:39.620 | and they can actually be pretty long.
00:15:43.020 | So we have this conversation, sum, memory, buffer.
00:15:47.660 | Okay, I'm not gonna read it all, but you can go through.
00:15:51.100 | There's clearly a lot of text there.
00:15:53.340 | So is this actually helping us at all?
00:15:56.460 | Well, it can do.
00:15:57.900 | It just requires us to get
00:16:00.060 | to a certain length of conversation.
00:16:02.300 | And we can see here that the actual,
00:16:04.100 | so we're using this TIC token tokenizer,
00:16:07.420 | which is essentially the opening eyes tokenizer,
00:16:10.780 | and we're using it for text of interest 0x03,
00:16:13.260 | which is the current large language model that we're using.
00:16:15.860 | Now, if we look at specifically the memory buffer
00:16:19.420 | and look at the number of tokens that we have in there
00:16:21.900 | versus the actual conversation,
00:16:23.740 | so this is a conversational buffer memory
00:16:26.340 | where we're storing everything,
00:16:27.940 | and this is a summary memory
00:16:29.780 | where we're just storing a summary.
00:16:31.580 | If we compare both of those,
00:16:33.620 | we can actually see that the summary memory
00:16:36.420 | is actually a lot shorter, right?
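That comparison can be reproduced with tiktoken directly; the chain names here (conversation_buf, conversation_sum) are the ones assumed from the earlier examples:

```python
import tiktoken

# the encoding used by text-davinci-003
tokenizer = tiktoken.encoding_for_model('text-davinci-003')

for name, chain in [('buffer memory', conversation_buf),
                    ('summary memory', conversation_sum)]:
    n_tokens = len(tokenizer.encode(chain.memory.buffer))
    print(f'{name} conversation length: {n_tokens} tokens')
```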
00:16:38.860 | The only issue is the reason
00:16:40.580 | that we're actually spending more tokens
00:16:42.580 | is because first we're doing that summarization
00:16:45.340 | in the first place,
00:16:46.820 | and you can also see that we have two prompts.
00:16:49.820 | This prompt here is already quite a lot of text.
00:16:53.060 | So, you know, okay, that's great.
00:16:55.780 | It seems like, you know,
00:16:57.260 | I understand that the summary itself is shorter,
00:17:00.140 | but it doesn't matter
00:17:01.260 | because the actual total here is still longer.
00:17:05.300 | And yes, that's true for this conversation,
00:17:07.500 | but for longer conversations,
00:17:09.180 | this is not usually the case.
00:17:12.060 | So we have this visual here,
00:17:13.580 | which I have calculated using a longer conversation.
00:17:16.780 | The link for the code for that,
00:17:18.460 | you can see at the top of the video right now.
00:17:21.180 | But in this, we can see the comparison
00:17:23.700 | between these two methods.
00:17:25.380 | So the line that you see
00:17:27.580 | just kind of growing linearly right here,
00:17:30.300 | that is our conversation buffer memory, okay?
00:17:34.780 | So the first one we looked at.
00:17:36.260 | And you see that we get to a certain level,
00:17:38.580 | like around 25 interactions.
00:17:41.060 | And at that point,
00:17:41.980 | we actually hit the token limit of the model.
00:17:45.260 | Whereas the summary memory that we're using,
00:17:48.100 | initially, yes, is higher,
00:17:50.220 | but then it kind of, it doesn't grow quite as quickly.
00:17:53.300 | It grows quite quickly towards the end there,
00:17:55.500 | but the overall growth rate is much slower.
00:17:59.180 | So for shorter conversations,
00:18:01.100 | it's actually better to use the direct buffer memory.
00:18:03.900 | But for longer conversations,
00:18:05.980 | summary memory, as you can see here, works better.
00:18:09.700 | It reduces the number of tokens that you're using overall.
00:18:12.340 | So naturally, which one of those you would use
00:18:15.260 | just depends on your use case.
00:18:16.940 | So I think we can kind of summarize the pros and cons here.
00:18:20.740 | For summary memory, the pros are that it shortens the number of tokens
00:18:24.580 | for long conversations,
00:18:26.140 | and it also enables much longer conversations.
00:18:29.220 | And it's a relatively straightforward implementation,
00:18:32.100 | super easy to understand.
00:18:34.100 | But on the cons, naturally,
00:18:36.420 | for shorter conversations
00:18:38.180 | it doesn't help at all.
00:18:39.420 | It's actually less efficient.
00:18:41.380 | The memorization of the prior chats,
00:18:45.660 | because we're not saving everything
00:18:46.860 | like we did with the buffer memory,
00:18:48.260 | the memorization of those previous interactions
00:18:50.740 | is wholly reliant on the summarization,
00:18:53.900 | including that information,
00:18:55.620 | which it might not always do,
00:18:57.460 | particularly if you're not using a particularly advanced
00:19:00.260 | large language model for that summarization.
00:19:02.340 | So both those things we also just need to consider
00:19:05.620 | with this approach.
00:19:06.700 | So moving on to conversational buffer window memory,
00:19:09.500 | this one acts in a very similar way to a buffer memory,
00:19:11.900 | but there is now a window on the number of interactions
00:19:15.540 | that we remember or save.
00:19:18.140 | So we're gonna set that equal to one.
00:19:20.300 | So we're gonna save the one most recent interaction
00:19:23.620 | from the AI and also the human, right?
00:19:27.380 | So one human interaction, one AI interaction.
00:19:31.540 | Usually it would be much larger,
00:19:33.580 | but it's just for the sake of this example.
00:19:36.900 | So running through these again,
00:19:39.060 | you know, it's pretty straightforward, right?
00:19:40.940 | I'm not gonna rerun everything here,
00:19:43.140 | but we can see we go through the same conversation again,
00:19:45.500 | and we get to the end and we say, what is my aim again?
00:19:48.620 | All right, and your aim is to use data sources
00:19:50.420 | to give context to the model, right?
00:19:53.260 | You know, that's wrong.
00:19:54.420 | But the reason that the model is saying this
00:19:58.740 | is because it actually only remembers
00:19:59.980 | the previous interaction, just the one last interaction,
00:20:03.300 | because we set K equal to one.
00:20:05.220 | And we can actually see that here.
00:20:07.140 | So if we go into our conversation chain,
00:20:10.180 | we go to the memory attribute, load memory variables.
00:20:13.860 | We set inputs equal to nothing now,
00:20:15.980 | 'cause we don't actually wanna pass anything in.
00:20:18.100 | And we load the history item.
00:20:21.180 | Then we can actually see the history that we have there.
00:20:24.300 | So we have just the previous interaction.
00:20:27.860 | And the previous interaction was actually this, right?
00:20:30.420 | So, because we asked this question.
00:20:32.300 | So we actually just have that.
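That inspection step, as a sketch (an empty inputs dict is passed since nothing needs to be filled in):

```python
bufw_history = conversation_bufw.memory.load_memory_variables(inputs={})['history']
print(bufw_history)
```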
00:20:34.220 | Now, obviously, naturally with this approach,
00:20:37.620 | you're probably going to be using a much larger number
00:20:40.620 | than K equals one.
00:20:42.580 | And you can actually see the effect of that here.
00:20:45.020 | So adding to the previous two visuals that we saw,
00:20:48.140 | we now have conversational buffer window memory
00:20:50.500 | for that longer conversation with K equals 12.
00:20:52.780 | So remembering the previous 12 interactions,
00:20:55.540 | and also K equals six.
00:20:56.740 | And you can see, okay, the token count for these,
00:20:59.500 | it's kind of on par with the conversational buffer memory
00:21:03.260 | up until you get to the number of interactions
00:21:05.860 | that you're saving.
00:21:06.700 | And then it kind of flattens out a little bit
00:21:08.900 | because you're just saving the previous 12
00:21:12.260 | or six interactions in this case.
00:21:15.340 | So naturally, I think the main pro is here,
00:21:18.940 | kind of similar to the buffer memory,
00:21:20.380 | we're saving the raw input of the most recent interactions.
00:21:25.140 | And the con is obviously
00:21:27.020 | that we are not remembering distant interactions.
00:21:30.180 | So if we do want to remember distant interactions,
00:21:32.380 | we just can't do that with this example.
00:21:35.820 | Now, there's one other memory type
00:21:37.380 | that I wanted to very quickly cover here.
00:21:39.220 | And that is a conversation summary buffer memory.
00:21:43.820 | So let's import that quickly from
00:21:48.220 | langchain.chains.conversation.memory.
00:21:54.340 | I'm going to be importing
00:21:55.460 | ConversationSummaryBufferMemory, okay?
00:21:58.820 | Now to initialize this, we will use this code here.
00:22:03.820 | So we have the conversation chain again,
00:22:07.140 | we're passing in the large language model,
00:22:09.860 | and then we're also passing in the memory
00:22:11.900 | like we did before, conversation summary buffer memory.
00:22:15.340 | And in there, we are passing in a large language model.
00:22:20.340 | Now, the reason we're doing this
00:22:21.940 | is because you can see here that we are summarizing, okay?
00:22:26.100 | And then we also pass in this max token limit.
00:22:28.580 | Now, this is kind of equivalent to what you saw before,
00:22:33.220 | where we had K equals six.
00:22:36.300 | But rather than saying
00:22:37.340 | we're going to save the previous six interactions,
00:22:40.500 | we're actually saying we're going to save
00:22:42.140 | the previous 650 tokens.
00:22:45.940 | So we're kind of doing both.
00:22:49.500 | So we're summarizing here,
00:22:51.700 | and we are also saving the most recent interactions
00:22:55.740 | in their raw form.
00:22:57.900 | So we can run that,
00:22:59.660 | and we would just use it in the exact same way.
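The import and setup for this one, with the 650-token limit mentioned above:

```python
from langchain.chains.conversation.memory import ConversationSummaryBufferMemory

conversation_sum_bufw = ConversationChain(
    llm=llm,
    memory=ConversationSummaryBufferMemory(llm=llm, max_token_limit=650)
)
```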
00:23:02.740 | Okay, so main pros and cons with this one as well,
00:23:05.740 | kind of like a mix of the previous two methods we looked at.
00:23:09.100 | So we have a little bit of the buffer window in there,
00:23:12.260 | and also the summaries in there.
00:23:14.580 | So the summarizer means that we can remember
00:23:17.220 | those distant interactions,
00:23:19.100 | and the buffer window means that we are not misrepresenting
00:23:24.100 | the most recent interactions,
00:23:27.140 | because we're storing them in their raw form.
00:23:28.860 | So we're keeping as much information there as possible,
00:23:31.940 | but only from the most recent interactions.
00:23:34.940 | And then the cons, kind of similar to the other ones again.
00:23:38.180 | For the summarizer,
00:23:39.420 | we naturally increase the token count
00:23:42.700 | for shorter conversations.
00:23:44.580 | And with the summarizer,
00:23:46.100 | we're not necessarily going to remember
00:23:49.020 | distant interactions that well.
00:23:52.340 | They don't really contain all of the information.
00:23:55.380 | And naturally, storing the raw interactions,
00:23:57.980 | as well as the summarized interactions from a while back,
00:24:01.380 | all of this increases the token count.
00:24:04.580 | But this method does give us
00:24:08.580 | quite a few parameters that we can tweak
00:24:10.180 | in order to get what it is that we want.
00:24:13.540 | And in this little visualization here,
00:24:15.380 | we've added the summary buffer memory
00:24:18.820 | with a max token limit of 1,300,
00:24:21.300 | and also 650 there,
00:24:23.780 | which is roughly equivalent to the K equals 12
00:24:27.700 | and K equals six that we had before.
00:24:29.820 | And we see that the tokens are not too excessive.
00:24:33.500 | Now, there are also other memory types.
00:24:36.380 | Very quickly, you can see this one here
00:24:38.260 | is the conversation knowledge graph memory,
00:24:41.260 | which essentially keeps a knowledge graph
00:24:44.140 | of all of the entities
00:24:46.140 | that have been mentioned throughout the conversation.
00:24:48.060 | So you can kind of see that here.
00:24:50.020 | We say, "My name is human and I like mangoes."
00:24:53.260 | And then we see the memory here.
00:24:54.900 | And we see that the entity human,
00:24:56.740 | referring to the person,
00:24:58.580 | and the entity human,
00:24:59.940 | I think this is referring to the name.
00:25:01.740 | They're both connected
00:25:02.700 | because the human is the name of human.
00:25:06.780 | And then here we have the entity human
00:25:09.700 | and the entity mangoes.
00:25:11.540 | And we can see that they are connected
00:25:13.300 | because the human likes mangoes, right?
00:25:16.660 | So we have that knowledge graph memory in there.
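A quick sketch of that knowledge-graph memory; the get_triples call is an assumption about how the stored graph can be inspected:

```python
from langchain.chains.conversation.memory import ConversationKGMemory

conversation_kg = ConversationChain(
    llm=llm,
    memory=ConversationKGMemory(llm=llm)
)

conversation_kg("My name is human and I like mangoes!")

# entities and relations extracted so far, e.g. (human, mangoes, likes)
print(conversation_kg.memory.kg.get_triples())
```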
00:25:19.260 | But beyond that,
00:25:20.100 | we're not going to dive into that for now any further.
00:25:23.140 | We're going to cover that more in a future video.
00:25:25.980 | So that's actually it for this introduction
00:25:28.980 | to conversational memory types with LangChain.
00:25:33.660 | We've covered quite a few there.
00:25:35.900 | And I think the ones that we have covered
00:25:38.220 | are more than enough to actually build chatbots
00:25:41.620 | or conversational AI agents
00:25:44.020 | using what seem to be the same methods
00:25:46.780 | as those that are being used
00:25:49.140 | by some of the state-of-the-art
00:25:51.420 | conversational AI agents out there today,
00:25:53.860 | like OpenAI's ChatGPT
00:25:56.140 | and possibly Google's LaMDA,
00:25:58.100 | although we really have no idea how that works.
00:26:00.900 | But for now, we'll leave it there.
00:26:03.340 | As I mentioned,
00:26:04.180 | there's going to be a lot more
00:26:05.100 | on this type of stuff in the future.
00:26:06.980 | Thank you very much for watching.
00:26:09.260 | I hope this has been interesting and useful.
00:26:11.500 | And I will see you again in the next one.