
Prompt Engineering with OpenAI's GPT-3 and other LLMs


Chapters

0:00 What is Prompt Engineering?
2:15 Anatomy of a Prompt
7:03 Building prompts with OpenAI GPT-3
8:35 Generation / completion temperature
13:50 Few-shot training with examples
16:08 Adding external information
22:55 Max context window
27:18 Final thoughts on Gen AI and prompts

Transcript

Prompt engineering is an emerging discipline within the world of generative AI, and it describes the art of writing good, intentional prompts that produce the output we actually want from a generative AI model. To a degree it is an art; it's very hard to explain exactly what makes a good prompt. But to a larger extent there's a logical process for creating prompts that can be described and easily applied to produce better output from large language models, and of course from generative art tools as well.

Good prompts are the key to producing good outputs from these models. Using different types of prompts we can modify the mode or type of task being performed, and we can even use prompts to train models to some degree, with surprisingly good results.

Now there are a few things to learn with prompt engineering, and I think one of the best ways to think about this discipline is as a more abstract version of programming. Over the last decades we've seen programming languages become more and more abstract, and prompts for AI models are almost like the next step: a super abstract way of programming an AI model. That's exactly how I want to approach this here.

I want to discuss building good prompts, the different parts of a prompt, and how we apply them to large language models. When we think of large language models there are a lot of different use cases: creative writing, question answering, text summarization, data extraction, a ton of completely different things. Yet with each of these different tasks we're not actually doing anything different in terms of the model.

The models are all the same for each one of these tasks; the difference is the prompts themselves. Now, what do these prompts look like? Well, we can typically break them apart into a few components: the instructions of a prompt, and any external information, which we also commonly call context.

We would also have the user input or query, and we can prime our prompt with what we call an output indicator, which is usually just a little word at the end. Not all prompts require all of these components, but often a good prompt will use one or more of them.

So, starting with instructions. Instructions tell the model what to do, and they are a key part of instruction-based models like OpenAI's text-davinci-003. Through these instructions we define what we would like the model to do: how it should use its inputs, how it should format its outputs, and what it should consider while going through that process. We would always put these instructions at the very top of our prompt.

I'll explain a little more about that soon. Following this we have our external information, or contexts: additional pieces of information that we feed into the model via the prompt. These can be things that we manually insert into the prompt, information that we pull in from a long-term memory component such as a vector database, or information we get through other means like a web search API or a calculator API, something along those lines.

Following that we have our user input. That's pretty obvious: it's just the input from a particular user, and it depends on what you're doing. If you have a text summarization use case, they might input a two-page chunk of text that we want to summarize into a paragraph. On the other side, maybe it's a question answering tool, and in that case the user might just type in a few words and a question mark, and that is their user input.

So of course that can vary as well. Then finally we have our output indicator, which is essentially the start of what we would like the model to begin generating. It's kind of a way of indicating to the model: okay, now it's time for you to start writing something, and I want you to continue from this first little chunk of text.

A good, or at least very clear, example in my view is a code generation model. You want it to generate Python code, so you give it instructions to do so, and then your output indicator will just be the word import, all in lowercase, because most Python scripts begin with the word import: you're going to be importing your libraries, like import numpy and so on. A tiny illustration of that kind of prompt is sketched below.
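The wording of this prompt is my own, just to illustrate the idea of an output indicator; it is not from the notebook:

```python
# a prompt for code generation, ending with the output indicator "import"
prompt = (
    "Write a Python script that loads a CSV file and prints the mean of "
    "each numeric column.\n"
    "\n"
    "import"  # primes the model to continue from the imports onward
)
```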

On the other hand, if you were building a conversational chatbot, this output indicator might be the name of the chatbot followed by a colon, as if you're in a chat log. Okay, so those are the four main components of a prompt that we're going to talk about and use to construct our prompts throughout this video.

Okay, so let's have a look at an example. Here we have our prompt, and this is the instruction, right at the top: "Answer the question based on the context below. If the question cannot be answered using the information provided, answer with 'I don't know'."

This is a form of conservative Q&A: given a user question, we want the model to answer based only on information that we can verify. That verified information is our external information, the context, which is also fed into the prompt. We're saying that if the answer is not contained within this context (usually we'll have a list of contexts), we want the model to say "I don't know".

If the model makes something up here it can lead to pretty bad results, and we really don't want that, because these models tend to do it quite often. So we have our instructions, we have our external information or context, then we have the user query, which is the question "Which libraries and model providers offer LLMs?", and then the final bit is the output indicator, which is like saying: okay, now you can start answering the question.
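To make the layout concrete, here is roughly what that prompt looks like as a single Python string. The context text is a placeholder standing in for the notebook's actual context, so treat this as a sketch of the structure rather than the verbatim prompt:

```python
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Libraries and model providers such as Hugging Face, OpenAI, and Cohere
make LLMs available through their libraries and APIs.

Question: Which libraries and model providers offer LLMs?

Answer: """
```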

Now, this is a pretty good prompt: it's clear, we have our instructions, we have some external information, we have a question, and we have the output indicator at the end. So let's have a look at how we actually implement these things. We're going to work through a notebook; if you'd like to follow along and run it yourself, there'll be a link at the top of the video right now and also in the video description.

The first thing we need to do is pip install the openai library. Initially that's the only library we need; there'll be another one a little later on, which I'll explain when we get to it. Coming down to the first code block, we have this prompt, the same one I just went through, so I'm going to run that and then initialize OpenAI using my OpenAI API key.

If you need an API key you can get it from beta.openai.com, under Account and then API Keys: log into your account, create a new secret key, and copy that into the notebook. Once you have authenticated, we're going to generate from the prompt that we just created.

All we do is call the Completion endpoint's create method, using the text-davinci-003 model, one of the most recent instructional models, and then print out the response text from the path where it sits in the returned JSON. Now, you can see that the output stops pretty suddenly. The reason is that our max token length isn't very long; we'll explain that a little more later on, but for now we just need to increase that length, so we set max_tokens to 256.
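Putting those steps together, the generation code looks roughly like this. This is a minimal sketch assuming the pre-1.0 openai Python client (where completions are created via openai.Completion.create) and a placeholder API key; the prompt variable is the one built above:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # secret key from your OpenAI account

response = openai.Completion.create(
    model="text-davinci-003",  # recent instruction-tuned GPT-3 model
    prompt=prompt,             # the conservative Q&A prompt built earlier
    max_tokens=256,            # give the completion enough room so it doesn't cut off
)

# the generated text lives under choices[0].text in the returned JSON
print(response["choices"][0]["text"].strip())
```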

Let's see if that answers the question "Which libraries and model providers offer large language models?", and it's exactly right: we get Hugging Face, OpenAI, and Cohere. Alternatively, if we do not have the correct information within the context, the model should say "I don't know", because of that instruction at the top. So I'm just going to swap in the context "Libraries are a place full of books", and hopefully the model outputs "I don't know".
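As a quick check, the same call with the irrelevant context swapped in might look like this (same hedges as before: the pre-1.0 openai client, and a prompt matching the earlier sketch rather than the notebook's exact text):

```python
prompt_no_answer = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: Libraries are a place full of books.

Question: Which libraries and model providers offer LLMs?

Answer: """

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt_no_answer,
    max_tokens=256,
)
print(response["choices"][0]["text"].strip())  # the model should now say "I don't know"
```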

Let me just copy the max_tokens setting in again, and we see that the model follows our instructions: it says "I don't know". Great, so that's a simple prompt. What's next? Let's come down and talk about the temperature parameter of the completion endpoint.

We can think of the temperature parameter as controlling how random, or how creative, the model can be. It essentially governs the probability of the model choosing a word that is not its first choice. This works because, when the model is predicting tokens (or words), it is actually assigning a probability distribution over all the possible tokens it could output.

So let's say we have a set of different tokens; in reality there are tens of thousands of these, not just the handful shown here. The model goes through them and assigns a probability distribution: maybe this one has a high probability, this one is low, this one is fairly big, and this one up here is the most likely. If we have the temperature set to zero, the token that gets chosen is the most likely one, because there's no randomness: the model will always choose the highest-probability token.

If instead we turn the temperature up to one, there is a lot more randomness. The model may still choose the highest-probability token, but it will also consider the next most likely token, and to a lesser extent the ones after that, and so on. By increasing the temperature we increase the weighting of these other possible tokens in the model's generation, which generally leads to more creative, or more random, outputs. Considering this, for our conservative fact-based Q&A we might want to turn the temperature down towards zero, because we don't want the model to make anything up: we want it to be factual rather than creative. Whereas if the idea is to produce some creative writing or an interesting chatbot conversation, then we might turn the temperature up, because it will usually produce something more entertaining, more interesting, and to some degree surprising, in a good way.
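In code, temperature is just another parameter on the completion call. Here is a minimal sketch comparing the two settings, again assuming the pre-1.0 openai client; the chat prompt and the user line are paraphrased stand-ins for the one used in the video:

```python
chat_prompt = (
    "The following is a conversation with a funny chatbot. The chatbot's "
    "responses are amusing and entertaining.\n"
    "\n"
    "User: Hi, what are you up to?\n"
    "Chatbot: "  # output indicator: the chatbot's name followed by a colon
)

for temperature in (0.0, 1.0):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=chat_prompt,
        max_tokens=256,
        temperature=temperature,  # 0.0 = always pick the top token, 1.0 = more random
    )
    print(f"temperature={temperature}:", response["choices"][0]["text"].strip())
```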

So let's take a look at what that might look like. Here we're going to create a conversation with an amusing chatbot. Again, I need to add in the max_tokens setting, and we're going to start with the temperature very low: the default with the OpenAI completion endpoint is one, and here we're setting it to zero. The prompt is: "The following is a conversation with a funny chatbot. The chatbot's responses are amusing and entertaining." That's the instruction; then down here we have the style of conversation, the user's input, and then our output indicator. Let's run this, and we get: "Oh, just hanging out and having a good time, what about you?" It's pretty predictable, it's not that interesting, and it's definitely not funny, amusing, or entertaining.

Now let's run this again. Last time I got a good answer here, but it doesn't always provide one, so we can try. Let me put the max_tokens in again, and it's a little more interesting: "hang out with my electronic friends, it's always a good time", a bit better, and "hang out contemplating the meaning of life". So a few better answers. I don't think any of these are as good as the first one I got previously, but they're not bad, and definitely much better than the first one here, which is just a bit plain and boring.

So let's move on to what we would call few-shot training for our model. What we'll often find is that sometimes these models don't quite get what we are looking for, and we can see that in this example. The prompt is similar again: "The following is a conversation with an AI assistant. The assistant is sarcastic and witty, producing creative and funny responses to the user's questions. Here are some examples." In there we can put some examples, but before we do that I just want to remove them and show you what we get to begin with. Let's run this. We've turned the temperature up, so it should come up with something creative to a degree, but we'll see that it's not particularly interesting. Maybe if you want a serious answer this is what you're looking for, but I'm not asking for anything serious: I want something sarcastic, witty, creative, and amusing.

So what if we add a few examples to our prompt? This is what we would refer to as few-shot training: we're adding a few training examples into the prompt. We're going to say: the user asks "How are you?", and the AI, being sarcastic, answers "I can't complain, but sometimes I still do." Then the user asks "What time is it?" and the AI says "It's time to get a watch." Let's see if we get a less serious answer to "What is the meaning of life?". Again, let me put in the max_tokens. The previous answer I got was pretty good, and I don't know if we're going to get one like that again, but let's try. And we get something good again: "As the great philosopher Shrek once said: Fiona, the meaning in life is to find your passion." Kind of useful, but also pretty amusing. So this is a much better response, and we got it just by providing a few examples beforehand: we did some few-shot training, we showed a few training examples to the model, and all of a sudden it produces much better output. The few-shot version of the prompt is sketched below.
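Here is roughly what that few-shot prompt looks like in code, using the example exchanges quoted above; the exact instruction wording in the notebook may differ slightly, and we keep the temperature up for creativity:

```python
few_shot_prompt = """The following is a conversation with an AI assistant.
The assistant is sarcastic and witty, producing creative and funny
responses to the user's questions. Here are some examples:

User: How are you?
AI: I can't complain, but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=few_shot_prompt,
    max_tokens=256,
    temperature=1.0,  # keep some randomness so the answers stay creative
)
print(response["choices"][0]["text"].strip())
```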
Now, the next thing I want to talk about is adding multiple contexts. In the first example we had a context in there, but we manually wrote it in ourselves. In reality, what we do is something slightly different. Let's consider the use case of question answering: we want the model to be factual, we don't want it to make things up, and ideally we would also like the model to be able to source where it's getting its information from. So what we want is some form of external source of information that we can feed into the model, and that we can also use to fact-check that what the model says is actually true, or at least comes from somewhere reliable.

When we feed this type of information into a model via the prompt, we refer to it as source knowledge. Source knowledge, as you might expect, is just any type of knowledge that is fed into the model via the prompt. This is, I don't want to say the opposite, but an alternative to what we would call parametric knowledge: knowledge that the model has learned during the training process and stores within the model weights themselves. That means if you ask the model who was the first man on the moon, it's probably going to say Neil Armstrong, because it has remembered that from training, having seen tons and tons of human information and data. But if you ask more specific, pointed questions, it can sometimes make things up or provide an answer that is generic and not actually that useful. That's where we would like to use source knowledge to feed in more useful information.

In this example we're just going to feed in a list of dummy external information. In reality we'd probably use something like a search engine API or a long-term memory component rather than relying on a hardcoded list, but for the sake of simplicity this is all we're going to do. So we have a few contexts here: one talks about large language models being the latest models used in NLP, another talks about getting your API key from OpenAI, another about OpenAI's API being accessible via the openai library, with some bits of code, and down here we also talk about accessing it via the LangChain library. The model is going to use all of this information to build a better prompt and create a better output.

What we do is put our instructions at the top, as before, then our external information, our contexts. For GPT-3 in particular, the recommendation is that you separate your external information or context from the rest of the prompt using three hash characters (###) or triple quotes ("""); we're going to stick with the hashes. Then, for the contexts themselves, we separate each individual context with two of those characters (##). After that we have our question, and then our output indicator. A sketch of how we might assemble this is shown below.
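Here is a rough sketch of that assembly; the context strings are placeholders standing in for the notebook's dummy external information, and the exact separator layout is my reading of what is described above rather than the notebook's verbatim formatting:

```python
contexts = [
    "Large Language Models (LLMs) are the latest models used in NLP, "
    "offered by providers such as Hugging Face, OpenAI, and Cohere.",
    "To use OpenAI's Completion endpoint you first need an API key, "
    "created from your OpenAI account, and the openai Python library.",
    "GPT-3 can also be accessed through the LangChain library, which wraps "
    "the OpenAI API.",
]

# join the individual contexts, separating each one with "##"
context_str = "\n\n##\n\n".join(contexts)

query = ("Give me two examples of how to use OpenAI's GPT-3 model "
         "using Python, from start to finish.")

prompt = (
    'Answer the question based on the context below. If the question cannot '
    'be answered using the information provided, answer with "I don\'t know".\n'
    "\n"
    "###\n"  # "###" separates the contexts from the instructions
    "\n"
    f"Context:\n{context_str}\n"
    "\n"
    f"Question: {query}\n"
    "Answer: "
)
print(prompt)
```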
Now, let me copy this and print it out so we can see what we're actually building. It's a bit of a mess; it works, but it also points out to me that I've missed a little bit. We have the instructions, then our separator, then our contexts, but the thing separating them isn't quite what we want: we actually want some newline characters in there as well. So let's add those. To do that we're going to separate out this bit as well, build a context string, and then insert that context string directly into the prompt. Now we get a much nicer format. By the way, it will work even without this nice formatting, but we should try to format it like this: we have our contexts, each one separated, then further down a question, and the answer indicator at the end. That is what we want. I'm going to replace this with the context string, and then down here I'm also going to add in our max_tokens. Let's run that.

So the question, if we go up to the top, after "answer the question based on the context below, and answer 'I don't know' if it doesn't know the answer", same as before, is: "Give me two examples of how to use OpenAI's GPT-3 model using Python, from start to finish." What we get back is two options: we can either use it via the openai library or go via LangChain, and both of these are correct.

Now, one question here is: we added in these contexts, but did the model actually need them? Can we run this prompt without context and still get a decent answer? We can try: just the instruction to answer the question, no context, same question, same output indicator. Let's run that; one thing we need again is the max_tokens. And we get this: "using OpenAI's GPT-3 model with Python to generate text". That is true, but it's not very useful, and then it suggests using GPT-3 to generate images, which isn't even possible. So not really a good answer, and this is where we see that source knowledge being pretty useful.

Now, considering how big our prompts got with those contexts, and that wasn't even much information being fed in, how big is too big? At what point are our prompts too large for us to actually use the model? This is obviously important, because if we go too big we're going to start throwing errors, so let's take a look at that. For text-davinci-003, what we call the context window, the maximum number of tokens the model can handle across both the prompt and the completion, is 4097 tokens. Tokens, not words. We can set the maximum completion length, as we saw before with max_tokens, but that cannot cover the full 4097, because we need to account for the number of tokens in our prompt as well. The only problem is: how do we measure the number of tokens within our prompt? For that we need to use OpenAI's tiktoken tokenizer, so you'll need to pip install tiktoken. Taking the earlier prompt (we'll just stick with the one where I haven't added in the newlines), let's look at how we measure the number of tokens in it. We import tiktoken, we create our prompt, we have our encoder name, which I'll explain in a moment, and then we have our tokenizer: a tiktoken tokenizer that we initialize using tiktoken's get_encoding with that encoder name, and the encoder name differs for each model. Then we just encode our prompt and check the length of the result, and we get 412 tokens.
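That token-counting step looks roughly like this; p50k_base is the encoder name used for text-davinci-003 (more on encoder names just below):

```python
import tiktoken  # pip install tiktoken

encoder_name = "p50k_base"  # encoder used by text-davinci-003
tokenizer = tiktoken.get_encoding(encoder_name)

# encode the prompt and count its tokens
prompt_tokens = tokenizer.encode(prompt)
print(len(prompt_tokens))  # in the video this came out to 412
```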
So we're feeding this prompt into our text-davinci-003 model, and it's going to use 412 tokens of that 4097-token context window. That leaves us with 3685 tokens for the completion, so we can set max_tokens to that number, but no higher.

One important thing to note is that not all OpenAI models use the p50k_base encoder. There is a link in the notebook that takes you through to a table of the different encoders, but in short, as of recording this video: for most GPT-3 models, and also GPT-2, you'll use the r50k_base (or gpt2) encoder; for the code models and the recent instructional models we use p50k_base; and for the text-embedding-ada-002 embedding model we use cl100k_base.

If we now set our max_tokens to 3685 and generate, let's take a look at what we get. One thing you'll notice straight away is that the completion takes longer; that's because we have more maximum tokens, and even if the model doesn't fill that entire space, the computation takes longer. What we get back is pretty much the same as before, even though we've increased the number of tokens, because the model doesn't need to use all of that space. Now, what happens if we increase max_tokens by one more, to 3686? Let's run that, and we get an invalid request error, because we've exceeded the maximum context length. So we need to be very cautious with that, and if you're using this in an environment where you might expect the maximum context length to be exceeded, you should probably consider implementing something like the check sketched below into your pipeline.
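A minimal version of that check might look like this; the 4097 constant is specific to text-davinci-003, so treat it as something you'd look up for whichever model you're using:

```python
CONTEXT_WINDOW = 4097  # total tokens text-davinci-003 can handle (prompt + completion)

# whatever is left after the prompt is the most we can ask the model to generate
max_completion_tokens = CONTEXT_WINDOW - len(prompt_tokens)  # 4097 - 412 = 3685

if max_completion_tokens <= 0:
    raise ValueError("Prompt is too long for the model's context window")

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=max_completion_tokens,  # one token more and the API rejects the request
)
print(response["choices"][0]["text"].strip())
```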

Now, that is actually everything for this video. We've been through a fair few things when it comes to building better prompts and handling some of the key parameters of your completion requests. As I mentioned, prompts are super important, and if you don't get them right your output is not going to be that great, to be honest. So it's worth spending some time learning more about prompts, not just what we've covered here, and especially, more than anything, experimenting with your prompts and considering other things depending on what you're doing. Do you need to be pulling in more information from external sources? Do you need to be modifying the temperature? Are there other variables in the completion endpoint that you could be modifying? All of these things are important to consider if you're actually building something of real value with large language models.

Another thing to point out is that this is not specific to GPT-3 models. If you want to use Cohere's generation or completion endpoints, or open-source Hugging Face models like BLOOM, you should also consider these prompt engineering rules of thumb and tips. Beyond that, I think we've covered everything, so that's it for this video. I hope it's been useful and interesting, but for now we'll leave it there. Thank you very much for watching, and I will see you again in the next one.

Bye!