Chat with OpenAI in LangChain - #5
Chapters
0:00 LangChain's new Chat modules
2:09 New LangChain chat in Python
3:14 Using LangChain's ChatOpenAI object
4:36 Chat messages in LangChain
6:43 New chat prompt templates
9:05 LangChain human message prompt template
13:18 Using multiple chat prompt templates
17:42 F-strings vs. LangChain prompt templates
19:23 Where to use LangChain chat features?
With the introduction of OpenAI's new ChatGPT endpoint, the LangChain library has, unsurprisingly, moved very quickly to support it. Unlike previous large language model endpoints, the new ChatGPT endpoint is slightly different: it takes multiple inputs, and therefore in LangChain this new approach to calling large language models has been supported with its own set of objects and functions.
So the new ChatCompletion endpoint from OpenAI differs from the typical large language model endpoints in that you can essentially pass in three types of input: system messages, user messages, and assistant messages.
The system, or system message, acts as the initial prompt that sets up how the model should behave. So for example, with ChatGPT, what you would find is that OpenAI have already passed a system message in to ChatGPT.
User messages are what we write, okay? So in ChatGPT, when we write something, that's a user message. And then the other one is the assistant message. Those are the responses we get back from ChatGPT, okay? So the assistant is what ChatGPT is producing.
With each call we're feeding in a history of previous interactions as well. So we're always going to have that system message at the top, followed by a user message, followed by an assistant message, followed by another user message, and so on. So there is some difference with this new endpoint.
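For context, this is roughly what a raw request to the new endpoint looks like with the openai Python client (pre-1.0 style); the message contents here are just illustrative:

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

# each message has a role (system / user / assistant) and some content
res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi AI, how are you today?"},
        {"role": "assistant", "content": "I'm great, thank you. How can I help?"},
        {"role": "user", "content": "I'd like to understand string theory."},
    ],
)
print(res["choices"][0]["message"]["content"])
```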
And therefore, how we interact with ChatGPT via LangChain is also a little different. Okay, so we start by installing the latest versions of OpenAI and LangChain, and there's a link so you can follow along at the top of the video.
But this will take us across to this page here. If you're on Colab, it will appear just below the cell.
So for this, we're going to be using the ChatGPT model, gpt-3.5-turbo. Now, by using this, we're essentially going to default to the latest version of that model. So right now, the latest version is actually this here. Basically, as they release new versions of this model, that default will change. We're also setting the temperature to zero, which should make the completions essentially deterministic.
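As a rough sketch (the exact import path and parameter names can vary a little between langchain versions, and the API key is a placeholder), the setup looks something like this:

```python
from langchain.chat_models import ChatOpenAI

# temperature=0 keeps the completions as deterministic as possible
chat = ChatOpenAI(
    openai_api_key="YOUR_OPENAI_API_KEY",  # placeholder, use your own key
    temperature=0,
    model_name="gpt-3.5-turbo",
)
```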
So chats with ChatGPT are structured like this: we have system, user, assistant, user, assistant, and so on. And the way that we format that is like this, okay?
In LangChain, they mirror this format with a system message object, a human message object, and an AI message object. So to create the conversation up here, we would write this, okay? We just put those message objects into a list, in the order that they appear in the conversation.
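A minimal sketch of building and sending that list, assuming the message classes from langchain.schema:

```python
from langchain.schema import SystemMessage, HumanMessage, AIMessage

# the conversation so far, in order: system first, then alternating turns
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great, thank you. How can I help?"),
    HumanMessage(content="I'd like to understand string theory."),
]

# calling the chat model on the list returns a new AIMessage
res = chat(messages)
print(res.content)
```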
Now, if we take a look up here at the initial response and come to the start, we can see that it's an AI message. So it's the same type of object as this here. That means we can actually just append this AI message, our response, directly to messages here, and then from there we can just continue the conversation. So we will create a new human message prompt, and then we'll send all of those to ChatGPT.
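Continuing the conversation is then just a matter of appending to the list; the follow-up question below is the one asked in the video:

```python
# the response is an AIMessage, so it slots straight into the history
messages.append(res)

# add the next user turn
prompt = HumanMessage(
    content="Why do physicists believe it can produce a unified theory?"
)
messages.append(prompt)

# send the full history to ChatGPT again
res = chat(messages)
print(res.content)
```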
Okay, so what was the next question I asked? Why do physicists believe it can produce a unified theory? And the response we get back says that they believe string theory has the potential to produce a unified theory, because... so on and so on.
Now, that is, I suppose, the core functionality here. Beyond that, there are also these new prompt templates: we have an AI message, human message, and system message prompt template. These are essentially chat versions of the original prompt templates in LangChain.
But when you use them, you have a couple of functions that allow you to create your prompt template and output it as a system message, AI message, or human message object. And you can also link them all together to create a list of messages that you then just pass to the chat model.
Now, I'm not aware of a huge number of reasons to use these over simpler approaches, but we'll go through them anyway. And if it seems like something that would actually help you with whatever it is you're building, then that's great: you'll know how to use them.
So we're going to set up our first system message. I'm saying I want the responses to be no more than 100 characters long, including whitespace, and to sign off every message with a random name, like Robot or Barbara, okay? We'll see how well it follows these instructions.
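A sketch of that system message, where the exact wording and the bot names are assumptions:

```python
from langchain.schema import SystemMessage, HumanMessage

messages = [
    SystemMessage(content=(
        "You are a helpful assistant. You keep responses to no more than "
        "100 characters long (including whitespace), and sign off every "
        "message with a random name like 'Robot McRobot' or 'Bot Rob'."  # placeholder names
    )),
    HumanMessage(content="Hi AI, how are you? What is quantum physics?"),
]
res = chat(messages)
print(res.content)
```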
So run this, and now we make our first completion and see how it does with those instructions. It didn't stick to the 100-character limit, and it also didn't give us a sign-off there as well. It's not very good at following system messages, apparently. It's kind of better to pass these instructions in via the user messages instead, but we might not want a user to have to specify these things themselves.
So what we're going to do is, for every human message, use a human message prompt template. And what I'm going to do is, after the question, I'm going to say: can you keep the response to no more than 100 characters, including whitespace. And we also need to use this chat prompt template.
I feel like this is a little bit convoluted at the moment. The human message prompt template itself is just like a typical prompt template in LangChain, but to turn it into an actual chat message we need to pass it to this chat prompt template. That chat prompt template can take multiple templates, like system message, human message, AI message, and so on, which I found some way of using later on. So, I mean, I think that's kind of interesting at least.
If we format this chat prompt, we get a chat prompt value back, and it has messages, a list of messages, in there. The human message contains our input, "hi AI, how are you, what is quantum physics", right, followed by the added instruction to keep the response to no more than 100 characters, including whitespace. So that is our template being applied, based on this.
We can convert that into a list of message objects, or we can just create a string out of it, okay? So this would, I suppose, be pretty much the same as formatting the text ourselves. Here, I'm just kind of throwing it all together, which is essentially going to give us this human message.
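Putting that together, a sketch of the human message prompt template plus chat prompt template flow; the template wording and the variable name `input` are assumptions based on the walkthrough:

```python
from langchain.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

# template that appends the instructions after whatever the user asks
human_template = HumanMessagePromptTemplate.from_template(
    "{input} Can you keep the response to no more than 100 characters "
    "(including whitespace), and sign off with a random name?"
)

# the chat prompt template wraps one or more message templates
chat_prompt = ChatPromptTemplate.from_messages([human_template])

# formatting gives us a chat prompt value...
chat_prompt_value = chat_prompt.format_prompt(
    input="Hi AI, how are you? What is quantum physics?"
)

# ...which we can turn into message objects, or into a plain string
print(chat_prompt_value.to_messages())
print(chat_prompt_value.to_string())
```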
All right, so you can see now it's listening. We said 100 characters here, which didn't really work, but we do have the sign-off from the bot at the end. So that is partly working: by adding those instructions into the user message, we're getting better results.
And okay, so we've set the temperature to zero here, and because of that, we would expect the output to be the same every time; as far as I can tell, it is outputting the same thing each time. Okay, cool, and then let's continue with this.
But maybe in that case, we might need to give some examples. So this time we're going to have the character limit and the sign-off as input variables, and in the human message we're just going to pass in the input there, right? We're not going to pass in those instructions, because we're actually going to create this human message, and we're also going to create the following AI message, as an example to the chatbot of how it should respond.
Note that we're using the AI message prompt template, the human message prompt template, and the system message prompt template for each of those. So it goes, obviously, the system message first, the human message second, and the AI message third.
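A sketch of that setup; the variable names character_limit, sign_off, input, and response are assumptions:

```python
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    AIMessagePromptTemplate,
)

# system template with the constraints as input variables
system_template = SystemMessagePromptTemplate.from_template(
    "You are a helpful assistant. You keep responses to no more than "
    "{character_limit} characters long (including whitespace), and sign "
    "off every message with {sign_off}."
)

# the user's question is passed straight through as the human message
human_template = HumanMessagePromptTemplate.from_template("{input}")

# an example response, so the model can see the style it should copy
ai_template = AIMessagePromptTemplate.from_template("{response}")

# system first, human second, AI third
chat_prompt = ChatPromptTemplate.from_messages(
    [system_template, human_template, ai_template]
)
```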
So what we then do is take our chat prompt and format it with a character limit of 50, so half of what we had before, making it harder. I'm going to say the sign-off has to be "Robot McRobot", the input is going to be the same as before, and then we're giving an example response, right? That example response is going to automatically be formatted into an AI message.
So: system message, "You are a helpful assistant. You keep responses to no more than 50 characters long... You sign off every message with Robot McRobot", so we can see where those values are being added there. Human message: "Hi AI, what is quantum physics?", because we're just passing the input in there. And the AI message: "Good, it's the physics of small things. Robot McRobot", okay?
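Formatting that prompt and sending it on might look something like this; the new follow-up question is hypothetical, since the video only shows the answer:

```python
# fill in the template variables
chat_prompt_value = chat_prompt.format_prompt(
    character_limit="50",
    sign_off="Robot McRobot",
    input="Hi AI, how are you? What is quantum physics?",
    response="Good! It's the physics of small things. - Robot McRobot",
)
messages = chat_prompt_value.to_messages()

# add a new question and ask for the next (hopefully short) answer
messages.append(HumanMessage(content="How small?"))  # hypothetical follow-up
res = chat(messages)
print(res.content)
```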
So we run this, and we get "atoms, electrons, photons...", okay. Now let's also use the approach that we used before, where we add in an instruction to answer in less than the character limit, including whitespace. Okay, we're going to add that to our human message. So we're going to create the human message like this, through the chat prompt template and so on, and the formatted message ends with "answering in less than 50 characters, including whitespace".
So I'm going to remove the most recent message in messages and replace it with this new human prompt value. Let's have a look and make sure we have the right format: that's the last response we got from the AI, and now we have the new, modified human message. So it's like this, ending with "answering in less than 50 characters".
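A sketch of that swap, assuming a template built the same way as before with the extra instruction appended:

```python
# template that tacks the character-limit instruction onto the question
human_template = HumanMessagePromptTemplate.from_template(
    "{input} Answer in less than {character_limit} characters "
    "(including whitespace)."
)
chat_prompt = ChatPromptTemplate.from_messages([human_template])

human_prompt_value = chat_prompt.format_prompt(
    input="How small?",  # hypothetical, the same question as before
    character_limit="50",
)

# swap the most recent message for the new, modified human message
messages.pop(-1)
messages.extend(human_prompt_value.to_messages())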
And now we pass that through our chat model again; we're telling it, in the most recent query, that it needs to answer in less than 50 characters. Now, the thing I will say is that maybe this is all a little bit convoluted, and that's not to say that there aren't use cases for this, but there may be simpler ways of doing it, or simpler use cases, or something along those lines.
It's worth asking whether all of the above that we just did couldn't just be done with a plain f-string, where we format the character limit, "characters, including whitespace", straight into the message ourselves, right? And the result of that is basically the same.
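For comparison, a minimal f-string version of the same thing (the variable names are illustrative):

```python
character_limit = 50
user_input = "How small?"  # hypothetical, the same question as above

# the same message content, built with a plain f-string
prompt = HumanMessage(content=(
    f"{user_input} Answer in less than {character_limit} characters "
    "(including whitespace)."
))

res = chat(messages[:-1] + [prompt])
print(res.content)
```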
So really it just comes down to what you're doing and how you prefer to write this. Okay, so I just wanted to make you aware of that. That's pretty much it for this look at the new chat features within LangChain. The prompt templates you can use, of course, but you don't have to if you have a simpler approach to doing these things.
But yeah, it's cool to see this being implemented in LangChain, and although I haven't been through it yet, I'm hoping that there will be good integrations with the other components, like their conversation memory and their retrieval augmentation. Right now, it's kind of like a simple wrapper around the chat endpoint, but as this comes together with the conversation memory and retrieval augmentation components that LangChain offers, we'll get a tight integration between those, and that's where this will be useful. I hope all of this has been useful and interesting. But for now, thank you very much for watching.