
Prompt Templates for GPT 3.5 and other LLMs - LangChain #2


Chapters

0:00 Why prompts are important
2:42 Structure of prompts
4:10 LangChain code setup
5:56 LangChain's PromptTemplates
8:34 Few-shot learning with LLMs
13:04 Few-shot prompt templates in LangChain
16:09 Length-based example selectors
21:19 Other LangChain example selectors
22:12 Final notes on prompts + LangChain

Transcript

Today we're taking another look at the LangChain library, and we're going to be focusing on what are called prompt templates, which are a very core component of the library. This mirrors the usefulness of prompts for large language models in general. Now, although prompts maybe don't seem as interesting as the models themselves, they're actually a very critical component, particularly in today's world of large language models.

The reason I say that is that in the past, when we considered different tasks within language, we would use different models for each of them. So for named entity recognition or question answering, there were different models trained for each of those purposes. Over time, the separation between these tasks has decreased. With the introduction of transformer models, it became the case that you would pre-train a single big language model like BERT and then perform transfer learning, changing just a couple of layers at the end of the network in order to adapt it to different tasks.

And with the more recent adoption of large language models, the separation between different use cases has decreased even more. So now, rather than using different layers at the end of the same model, as we did with transformer models, or using completely different models, as we may have done before transformers, we actually use the same model for completely different tasks.

Now, things like named entity recognition, question answering, summarization, even translation are all done by the same models. The only thing that actually changes now is the prompt, the input that we feed into the model. We literally just say, can you do this? Or can you do something else, right?

That is all we're changing now. So the models themselves no longer need changing, it's actually just the inputs to those models that change in order for us to modify the task that we are performing. So with large language models, it turns out that prompts are the most important thing for us to learn.

The large language models themselves have already been trained. Sure, we can fine tune them. Sure, we can add an external knowledge base. Sure, we can do all these other things. But in the end, one of the key components for us to learn when we're using large language models is how to do prompts correctly.

The LangChain library recognizes the importance of prompts and has built an entire object class, or a few object classes, for them. In this video, that's what we're going to talk about. We're going to talk about prompt templates and a few of the other things that sit alongside prompt templates.

So we're going to get started with just taking a look at this simple prompt here, and we're going to break it down. At the top here, we have the instructions of the prompt. Here, we have context or external information. Here, a query. And here is what we call an output indicator.

Now, each one of these components serves a purpose. But actually, when we look at what we would usually put into a large language model, or what a user would put into a large language model, it's only this little bit here, okay? This little bit: "Which libraries and model providers offer large language models?"

That's all we're actually expecting our users to put in there. So that would be our query. And what we're doing here is, considering that that's the only thing our user is going to be inputting, providing all this other information in order to guide the large language model to answer the question in the way that we think our user would like it to be answered.

In this case, we're doing what I would call factual Q&A, which is what you can see here: answer the question based on the context below, so based on this information here. If it can't be answered, I want you to say "I don't know", okay? That's what I would call factual Q&A.

So we basically don't want to answer the question if we can't find the information behind that answer. And the reason we might want to do that is because large language models have this very strong tendency to make things up and make it seem super convincing. So it can be good to do this sort of thing in order to avoid that.
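To make that structure concrete, here is a rough sketch of what such a prompt could look like as a Python string. The context passage is illustrative rather than the exact text from the notebook:

# A rough sketch of the four-part prompt: instructions, context, query, output indicator.
# The context text here is illustrative, not the exact passage from the video.
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: Large language models are the latest models used in NLP.
Libraries such as Hugging Face's transformers, OpenAI's openai, and
Cohere's cohere offer access to these models.

Question: Which libraries and model providers offer large language models?

Answer: """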

Now, let's go to our code. There are a few things that we need to install here, so we pip install langchain and openai. Of course, you can do this with other frameworks as well. It doesn't have to be OpenAI. You can use Cohere, you can use Hugging Face. It's completely up to you.

But for what we're doing here, the OpenAI model is very good. So here's our prompt. This is exactly the same as what I showed you before. So I'm going to run this, and you can see it just contains what I explained there: instructions, context, question, output indicator.

So using LangChain, the first thing we want to do is actually initialize a model. So I'm going to use this OpenAI API key variable here. I've already created this variable; it just contains my API key. There will be a link to this notebook either at the top of the screen now or in the video description.

To get your API key, head over to this web address here: platform.openai.com/account/api-keys. That may change in the future, but for now, that is where you would go. So we initialize our model. I'm going to run this, and then what we can do is make a generation from the prompt that we just created.
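In code, the setup looks roughly like this. The model name text-davinci-003 is an assumption; any completion model available to your key would work:

# pip install langchain openai

from langchain.llms import OpenAI

OPENAI_API_KEY = "<your-api-key>"  # from platform.openai.com/account/api-keys

# Initialize the LLM; the model name here is an assumed example
openai = OpenAI(
    model_name="text-davinci-003",
    openai_api_key=OPENAI_API_KEY,
)

# Generate directly from the hard-coded prompt string above
print(openai(prompt))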

So we generate from the prompt up here, and okay, that looks good. The only problem is that we wouldn't typically want to write all of this within our prompt in this format, right? This here is the user's query, so the user should be inputting whatever is here.

And as well as that, we have this context here. This would actually come in from an external source of information, so we wouldn't hard-code that into our code either. LangChain has something called prompt templates, which will help us handle this. For now, I'm just going to keep the context in there.

But what I am going to do is replace the query, the user's query, with this. It kind of looks like an f-string in Python, but it's not an f-string; otherwise, we would have the f prefix up here. In fact, it is actually just a plain string, but it will be interpreted like that by the prompt template.

So what we need to do here is just replace where we would expect the query to go with the query placeholder. And we need to make sure that, within the input variables of this prompt template object, which we have imported here, this aligns with our f-string-like placeholder here.

And then after that, we just have our template, which is obviously just this here. And that will create our prompt template. Now, if we would like, we can just insert a query here. Okay, so you can see what I'm doing. We have prompt template.format. Now we have this query, which is just going to be the question that we had before, right?

And we can run this, print it, and we can see that we now have the same text, but now we have our query in there instead. And we can change this, like what is a large language model or something? What is a large language model? Right, we could put that in there and it would change our query here.
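As a sketch, the prompt template code looks roughly like this; the context passage is abbreviated here, but in practice it would be the same text as before:

from langchain.prompts import PromptTemplate

# Same instructions and context as before, but the question is now a placeholder
template = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: ...the same context passage as above...

Question: {query}

Answer: """

prompt_template = PromptTemplate(
    input_variables=["query"],  # must match the {query} placeholder in the template
    template=template,
)

# Fill in the user's query; format() returns a plain string
print(prompt_template.format(
    query="Which libraries and model providers offer large language models?"
))

# Pass the formatted prompt to the LLM initialized earlier
print(openai(prompt_template.format(
    query="Which libraries and model providers offer large language models?"
)))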

Now, in this case, we don't actually have an external knowledge base set up, so the context doesn't change. That's fine; this is just an example, and we don't need to worry about that right now. So what I'm going to do is take the first example, where we have the prompt template and the actual question, and feed it into this OpenAI object here.

This here is actually our large language model that we just initialized. And if we run that, we should get this here. Okay, so within these few lines of code, we've basically just replicated what we did up here, but a little more dynamically. So let's come back down, and what I'm going to do is show you why we would actually use this, because honestly, right now, it seems kind of pointless.

For example, we could just put this as an f-string and write some little code around it. It wouldn't be that hard. So what is the point of actually using this? Well, one, it's just nice, it's easy, it's clean; but two, this isn't the only thing it does. So if we come down, we can also do something like this.

So this is called a few-shot prompt template. This few-shot prompt template object is ideal for doing what we would call few-shot learning with our large language models. Few-shot learning refers to the idea of feeding a few examples into an already trained model and essentially training it on those few examples so that it can perform well on a slightly different domain.

Now, the approach to few-shot learning can vary. In the more traditional sense, it would mean feeding a few items to the model and training on those few items as you usually would train an ML model. In this case, we're actually feeding these examples into the model via the prompt.

Now, this might seem weird, but it makes sense, because with large language models there are two primary sources of knowledge: parametric knowledge and source knowledge. Parametric knowledge is the knowledge that the large language model learned during training and stored within the model's weights.

So something like, who was the first man on the moon? The model is going to be able to answer Neil Armstrong because it's already learned that information during training and it's managed to store that information within the model weights. The other type of knowledge, source knowledge, is different. That is where you're actually feeding the knowledge into the model at inference time via the model input, i.e.

via the prompt. So considering all of this, the idea behind LangChain's few-shot prompt template object is to provide few-shot learning via the source knowledge, via the prompt. And to do this, we just add a few examples to the prompt that the model will then read as it's reading everything else.

So you remember earlier on, we had the instructions, context, query, and output indicator. In this case, it would be like we have instructions, examples, query, and output indicator. Now, let's take a look at where we might want to use this. Now, in this prompt, I'm saying the following is a conversation with an AI system.

It is typically sarcastic and witty, producing creative and amusing responses to the questions. Here are some examples. Actually, we're not doing that yet, so let's remove that. So this is all we have, right? So we have the instruction, and then we have what would be the user's query, and then we have the output indicator.

We set the temperature here to one, which just increases the randomness of the output, i.e. it will make it more creative. Then we can run this, right? And we get "the meaning of life is whatever you make of it". I mean, to me, that's not sarcastic. It's not witty or creative.

It's not funny. So it's not really doing what I want it to do. So what we can do here is do few-shot learning. So this is the same. I've just added here are some examples onto the end there, and then I'm just adding a couple of examples. So kind of like sarcastic responses to our user's questions.

How are you? I can't complain. What time is it? It's time to get a watch. And then I'm going to ask the same question again at the end, and then we'll see what the model outputs. And it's not perfect, but we are more likely to get kind of like a less serious answer by putting in these less serious responses.
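The hand-written few-shot version of that prompt looks roughly like this (the wording is approximate, and the LLM is re-created here with temperature set to one, as described above):

# Re-create the LLM with temperature=1.0 for more random, creative output
openai = OpenAI(model_name="text-davinci-003", temperature=1.0,
                openai_api_key=OPENAI_API_KEY)

prompt = """The following is a conversation with an AI assistant.
The assistant is typically sarcastic and witty, producing creative
and amusing responses to the users' questions. Here are some examples:

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """

print(openai(prompt))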

Now we can probably fine tune this. Like we can say the assistant is always sarcastic and witty. Here are some examples, like we can cut this bit out. And that might help us produce more precise answers. I need to edit this bit. And here we get quite a sarcastic answer of you need to ask someone who's actually living it, which I think is quite good.

Try a few more. Okay, somewhere between 42 and a double cheeseburger. It's good. 42 again, 42 again, and so on. So we're getting pretty good answers. I think we should have gone with this prompt from the start. Now we come down here. What we can do is just show you how these few-shot prompt templates work.

So we import the few-shot prompt template, and we create these examples. Each one of the examples is going to have a query and an answer. Okay, so you can see that here: this here would be our query, and this would be our answer. So we initialize that, and then we create what is called an example template.

Same thing as before: it looks like an f-string, but it actually isn't, or at least not yet. So we use the example template, and we actually create a new prompt template based on this example template. So we're creating, like, an example prompt. It's going to take in the query and an answer this time.
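Sketched in code, under the same assumptions as before, that looks something like this:

from langchain.prompts import PromptTemplate, FewShotPromptTemplate

# A couple of query/answer pairs to use as few-shot examples
examples = [
    {"query": "How are you?",
     "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?",
     "answer": "It's time to get a watch."},
]

# How a single example should be rendered inside the final prompt
example_template = """
User: {query}
AI: {answer}
"""

example_prompt = PromptTemplate(
    input_variables=["query", "answer"],  # must match the keys of each example dict
    template=example_template,
)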

Then we need to break apart our previous prompt into smaller components. So the prefix here contains the instruction, the "here are some examples" part; we're going to use the same one as we used before. And then the suffix here is essentially, well, actually two things.

We have the query itself that the user is going to put in, and then we have the output indicator. Then we go ahead and actually initialize our few-shot prompt template. We have our examples, which is this list up here. One thing we should note is that the keys in every single example need to line up with this, okay?

We have our example prompt, which we initialized here. We have the prefix, suffix, and input variables, right? The input variables here are not the same as what goes into the examples, because this is actually just the query from the user, so it needs to satisfy this part here. And then we have this example separator.

So the example separator is just what it's going to use to separate each one of those examples within the prompt that we're building. So let's run this, and we're going to say, what is the meaning of life again? And we'll just print this out so we can see. So, "the following are excerpts", and so on; this is the same as before.

And we see that we've separated each one of these with two new lines, and we have all those examples that we fed in through that list, okay? Then, to generate with this, we do the same again: we have our few-shot prompt template, and we call format with our query to run this.
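Putting those pieces together, a rough sketch of the few-shot prompt template and the generation step, with wording approximate:

# The prefix holds the instructions that come before the examples
prefix = """The following are excerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative and amusing responses to the users' questions. Here are some
examples:
"""

# The suffix holds the user's query and the output indicator
suffix = """
User: {query}
AI: """

few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,              # the list of example dicts above
    example_prompt=example_prompt,  # how each example is rendered
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],      # only the user's query is filled at runtime
    example_separator="\n\n",       # inserted between rendered examples
)

query = "What is the meaning of life?"
print(few_shot_prompt_template.format(query=query))          # inspect the built prompt
print(openai(few_shot_prompt_template.format(query=query)))  # and generate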

Okay, running it again, it doesn't like whatever I've done to the prompt. So let me come up here, to "here are some examples", and change this to "some examples"; I don't think that should make a big difference. And I'll just change the separator a little bit as well. Okay, and then we get our sort of joke answers, 42 again.

Okay, so we get a few good responses. Again, it's not perfect, but it's just an example. Now, what I actually want to show you is why we would use this over just feeding things in with an f-string. Well, there's also a little bit more logic that we can use.

So in a lot of cases, as with typical machine learning models, it's naturally better to feed in more examples for training than fewer. And we should try to do that here as well, even when we're feeding the examples in via the prompt. So what I've done here is created a lot of these kinds of examples, and we can just run these.

Now, we're going to want to feed in as many of these examples as possible, but at the same time, we might want to limit the number of examples we're actually feeding in. There are a few reasons for this. One, we don't want to create excessive text separating the instructions from the query itself.

Sometimes that can be distracting for the model. On the other hand, we could add in so many examples that we exceed the maximum context window that the model allows. The context window is basically the number of tokens from your prompt plus the number of tokens from your generation; add those together and that is what has to fit inside the context window.

Every model, including the OpenAI model we're using here, has a maximum context window, and we can't exceed that, otherwise we're going to throw an error. So we definitely don't want to go over that limit. And another thing we might want to consider is that we don't want to use too many tokens because it costs money to run this.

So we might also want to limit the number of examples we're bringing through because of that. And we might want to limit the number of examples based on how long the user's query is. So if the user just has like a small three-word query, we can include more of our examples.

If the user is like kind of writing us a little bit of a poem, then we might want to limit the number of examples we're bringing through. And that is where we would use something like this. So there are a few of these, what we'd call example selectors. The most basic of those is called the length-based example selector.

With a length-based example selector, we feed in our list of examples, we feed in the example prompt that we created earlier, and then we also set the maximum length. The default way it measures length is super simple: all it's doing is splitting based on newline characters and whitespace.

So for example, with this text here, in this first bit, we have eight words, and then here we have another six words. So we can split based on new lines and spaces and we will get this, okay? And here is the number of words that we have there. That is all that this is doing.

So when we set max length, we're setting the maximum number of these whitespace- and newline-separated tokens. From here, we're going to initialize what I'm going to call a dynamic prompt template. Now, this is just a dynamic version of our few-shot prompt template.

So in here, before, we just put in the examples; we had examples equals examples, which just says feed in all the examples every time. This time, we've already fed our list of examples into the example selector up here, so we can use this example selector to select a certain number of those examples based on whatever prompt this few-shot prompt template receives later on.
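As a sketch, with the max length value chosen arbitrarily here, and reusing the short examples list from above (in the video this is a much longer list):

from langchain.prompts.example_selector import LengthBasedExampleSelector

# Selects examples until a rough word budget (split on whitespace/newlines) is used up
example_selector = LengthBasedExampleSelector(
    examples=examples,              # the list of example dicts
    example_prompt=example_prompt,
    max_length=50,                  # word budget shared by the examples and the query
)

dynamic_prompt_template = FewShotPromptTemplate(
    example_selector=example_selector,  # replaces the fixed examples= argument
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n",
)

# A short query leaves room for more examples; a long query squeezes them out
print(dynamic_prompt_template.format(query="How do birds fly?"))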

So let's run this. And actually I need to run up here as well. So, run this. What we're going to do is pass in quite a small prompt here, which would be four tokens, and run this. And we can see there are a few examples here. We have four examples in total before we get to our final part here, right?

And then if we wanted to run that, we again just pass it through OpenAI, right? And we get this kind of sarcastic, jokey answer. Now, let's try and ask a longer question. So this is what I mean when I'm saying occasionally maybe someone is going to write you a poem when they're querying something.

So we have, they're kind of just rambling on, right? It's much longer. So what happens if we query with this? Okay, we can see straight away that we actually get just one example being pulled through. So because this is a much longer question, we're not including as many examples.

And of course we can modify this to whatever makes sense for us. So we can increase the max length here and just rerun everything: we recreate the prompt template and then run that again with the same long question. Okay, here. And we can see that we're actually now including five examples, because we've just doubled the number of example words that are allowed through.

Now, this is just a small example of what we can do with prompt templates. For example, if we wanted to use different example selectors, we can. I showed you the very simple length-based example selector here, but we can do what I think are better things with this as well.

So we can actually base the examples that we include on similarity. We embed our examples as vector embeddings, and then we calculate the similarity between them and the query, so that when we're asking a question, we always try to include relevant examples rather than just filling up with examples that are maybe not so relevant to the current query.
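As a rough sketch of how that could look in LangChain, assuming OpenAI embeddings and a Chroma vector store (which also needs chromadb installed); other embedding models and vector stores work too:

from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma

# Embed the examples and pick the k most similar ones to each incoming query
similarity_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY),
    Chroma,   # vector store class used to index the example embeddings
    k=2,      # number of examples to pull through
)

similar_prompt_template = FewShotPromptTemplate(
    example_selector=similarity_selector,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
)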

And then there are a few other ones as well. This one is very new, the n-gram overlap example selector. We're going to cover all of these at some point in a future video. But for now, that's it for this video. As you've seen, we've just gone through the basics of prompt templates and few-shot prompt templates with a very simple example selector.

And for a lot of use cases, that's probably all you're going to need. So with that in mind, I'm going to leave it there for this video. So thank you very much for watching. I hope this has been useful and interesting and I will see you again in the next one, bye.
