Prompt Engineering with OpenAI's GPT-3 and other LLMs
Chapters
0:00 What is Prompt Engineering?
2:15 Anatomy of a Prompt
7:03 Building prompts with OpenAI GPT-3
8:35 Generation / completion temperature
13:50 Few-shot training with examples
16:08 Adding external information
22:55 Max context window
27:18 Final thoughts on Gen AI and prompts
00:00:00.000 |
Prompt engineering is an emerging discipline within the world of generative AI, and it describes 00:00:07.200 |
the art of writing good intentional prompts that produce an output from a generative AI model 00:00:15.520 |
that we actually want. And to a degree it is an art; it's very hard to explain exactly how to create a 00:00:24.720 |
good prompt. But to a larger extent there is a logical process we can follow when 00:00:33.840 |
creating prompts, one that can be described and easily applied to produce better output from large 00:00:41.520 |
language models and, of course, the generative art tools as well. Good prompts are the key to producing 00:00:48.320 |
good outputs for these models. Using different types of prompts we can modify the mode or type of task 00:00:54.880 |
that is being performed and we can even use prompts to train models to some degree and the performance 00:01:03.120 |
of doing that is actually surprisingly good. Now there's a few things to learn with prompt 00:01:09.920 |
engineering and I think one of the best ways to maybe think about this discipline is to think of 00:01:17.280 |
it as a more abstract version of programming. So throughout the last decades we've seen programming 00:01:23.760 |
languages become more and more abstract. Prompts for AI models are almost like the next step: a 00:01:29.200 |
super abstract programming of an AI model and that's exactly how I want to approach this here. I 00:01:36.080 |
want to discuss prompts and building good prompts, the different parts of a prompt and how we apply 00:01:43.120 |
them to large language models. Now when we think of large language models there are a lot of 00:01:50.080 |
different use cases that they are used for. We see things like creative writing, question answering, 00:01:56.000 |
text summarization, data extraction, like a ton of these completely different things and with each of 00:02:02.960 |
these different tasks we're not actually doing anything different in terms of the model. The 00:02:08.960 |
models are all the same for each one of these tasks. The difference is the prompts themselves. 00:02:15.120 |
Now what do these prompts look like? Well we can typically break them apart into a few components. 00:02:21.680 |
So we have the instructions of a prompt, and any external information, which we also quite 00:02:28.800 |
commonly call contexts. We would also have the user input or a query, and we can also 00:02:36.720 |
prime our prompt with what we call an output indicator and this is usually just a little word 00:02:42.640 |
at the end. Now not all prompts require all of these components but often a good prompt will 00:02:48.560 |
use one or more of them. So starting with instructions. Instructions tell the model 00:02:54.480 |
what to do, and this is a very key part of the more instruction-based models like OpenAI's 00:03:02.240 |
text-davinci-003. Through these instructions we try to define what we would like the model to do, 00:03:09.520 |
and that means how it should use some inputs and how it should format outputs and what it should 00:03:15.520 |
consider whilst it's going through that process and we would always put these instructions at 00:03:20.560 |
the very top of our prompt. I'll explain a little bit more about that pretty soon. Following this 00:03:25.920 |
we have our external information, or contexts, and these are additional pieces of information that we 00:03:32.960 |
feed into the model via the prompt. These can be things that we manually insert into the prompt, 00:03:39.920 |
information that we pull in through a long-term memory component, a vector database or we get it 00:03:46.880 |
through other means like a web search API or a calculator API, something along those lines. 00:03:53.120 |
Following that we have our user input. Now that's pretty obvious that's just the input from a 00:03:58.480 |
particular user. It depends on what you're doing like if you have a text summarization use case 00:04:04.160 |
they might input a two-page chunk of text and we might want to summarize that into a paragraph 00:04:10.400 |
or on the other side maybe it's a question answering tool and in that case the user might 00:04:16.000 |
just type in a few words and question mark and that is their question, that is their user input. 00:04:21.520 |
So of course that can vary as well. Then finally we have our output indicator this is essentially 00:04:28.320 |
the start of what we would like the model to begin generating. So it's kind of like a way of 00:04:34.560 |
indicating to the model hey okay now it's time for you to start you know writing something and 00:04:39.440 |
I want you to start writing something based on this first little chunk of text. So a good example 00:04:46.480 |
or a very clear example, at least in my view, is when you have a code generation model and you want it to 00:04:52.160 |
generate Python code; you give instructions to do so, and then your output indicator will just be 00:04:57.680 |
the word import all in lowercase because most python scripts will actually begin with the word 00:05:04.880 |
import because you're going to be importing your libraries like import numpy and so on. 00:05:09.840 |
On the other hand if you were building like a conversational chatbot this output indicator 00:05:16.560 |
might be like the name of the chatbot followed by a colon as if you're sort of in a chat log. 00:05:23.280 |
Okay so they're the four main components of a prompt that we're going to talk about and that 00:05:29.120 |
we're going to actually use to construct our prompts throughout this video. Okay so let's 00:05:34.160 |
have a look at an example. Here we have our prompt; up here, this is the instruction, 00:05:41.200 |
right at the top of the prompt: 'Answer the question based on the context below. If the 00:05:46.080 |
question cannot be answered using the information provided, answer with "I don't know".' Okay, so this 00:05:51.040 |
is a form of conservative Q&A: given a user question, we want the model to answer based 00:05:59.040 |
only on information that we can verify. Okay and that verified information is something that 00:06:05.600 |
is our external information here our context that is also fed into the prompt right and we're saying 00:06:12.720 |
if that information is not contained within this context here (and we'll usually have a list 00:06:18.320 |
of contexts), we want it to say 'I don't know'. In this case, if the model makes something up, 00:06:24.240 |
it can lead to pretty bad results, so we really don't want it to make anything up, 00:06:28.720 |
because these models tend to do that pretty often. So we have our instructions, we have our external 00:06:34.480 |
information or the context, then we have the user query, which is down here, so the question: 00:06:40.080 |
'Which libraries and model providers offer LLMs?' And then we have that final bit, the output 00:06:47.440 |
indicator, which is like saying: okay, now you can start answering the question. Now this is a pretty good 00:06:53.920 |
prompt: it's clear, we have our instructions, we have some external information, we have a question, 00:07:00.720 |
and we have the output indicator at the end there. 00:07:06.240 |
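To make that concrete, here is a minimal sketch of that prompt as a Python string; the context passage is paraphrased from what appears on screen, not copied exactly:

```python
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP-enabled applications. These models can
be accessed via Hugging Face's transformers library, via OpenAI using the
openai library, and via Cohere using the cohere library.

Question: Which libraries and model providers offer LLMs?

Answer: """
```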
Okay, so let's have a look at how we will actually implement these things. So we're going to work through this notebook here; if you'd like to follow 00:07:11.600 |
along and run the notebook yourself, you can do so; there'll be a link at the top of the video right 00:07:15.920 |
now and also in the video description. The first thing we need to do is pip install the OpenAI 00:07:21.600 |
library; initially that's the only library we need. There'll be another one a little bit later on, 00:07:25.520 |
which I'll explain when we get to it. Coming down to the first code block, we see we have 00:07:30.960 |
this prompt; this is the same one I just went through, so I'm going to run that, and then 00:07:36.560 |
initialize my OpenAI instance using my OpenAI API key. So if you need this you can get it from 00:07:46.320 |
here, which is just at beta.openai.com, under Account, API keys, which you can access when you 00:07:54.400 |
log into your account; you create a new secret key and just copy that into the notebook. 00:08:00.400 |
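As a rough sketch, the setup looks something like this (the key string is just a placeholder):

```python
# pip install openai
import openai

# create a new secret key at beta.openai.com (Account -> API keys) and paste it here
openai.api_key = "YOUR_API_KEY"  # placeholder, use your own key
```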
Okay, so once you have authenticated with that, we're going to generate from the prompt that we just created. 00:08:05.360 |
Okay, so all we do is call the completion endpoint's create method, using the text-davinci-003 00:08:10.800 |
model, one of the most recent instructional models, and then we print out the response, which 00:08:18.000 |
sits at this path of the returned JSON. Now here you can see that it stops pretty suddenly 00:08:25.760 |
here; now the reason for that is that our max token length isn't very long, and we'll explain 00:08:31.520 |
that a little bit more later on. What we first need to do is just increase that length right now, 00:08:37.840 |
and we're going to do that by setting max_tokens to 256. 00:08:47.520 |
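Putting that together, the call looks roughly like this (a sketch based on the older openai Python library, version 0.x, which is what the video uses; the exact notebook code may differ slightly):

```python
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,  # allow a longer completion so the answer isn't cut off
)

# the generated text sits at this path in the returned JSON
print(response["choices"][0]["text"].strip())
```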
Okay, and let's just see if that does answer the question: which libraries and model providers offer large language 00:08:53.040 |
models? And that's exactly right. Okay, so we have Hugging Face, OpenAI and Cohere. And then, 00:09:00.720 |
alternatively, if we do not have the correct information within the context, the model should 00:09:06.240 |
say 'I don't know', because we have that instruction up here. So I'm just going to put in the context 00:09:12.000 |
'libraries are a place full of books', and we would hopefully expect that the model is going 00:09:17.200 |
to output 'I don't know'. Now let me just copy the max tokens again, put that in here, and we see 00:09:27.440 |
that it follows our instructions it says I don't know. Okay great so that's just a simple prompt 00:09:33.120 |
now what's next let's come down here and what we're going to talk about is the temperature within 00:09:40.160 |
our completion endpoint. So we can think of the temperature parameter as telling us how random 00:09:50.640 |
the model can be, or how creative the model can be; it essentially controls the probability of 00:09:57.360 |
the model choosing a word that is not actually its first choice, and this works 00:10:03.440 |
because when the model is predicting tokens or words it is actually assigning a probability 00:10:10.880 |
distribution over all possible words or tokens that it could output. So you know let's say we 00:10:17.280 |
have all of these different words here, or tokens I should say, and there are tens of thousands 00:10:22.960 |
of these it's not six of these but what we're essentially doing is the model is going through 00:10:27.840 |
these and it's kind of assigning a probability distribution so maybe this one is a high 00:10:31.600 |
probability this one's low again this one's kind of big and this one up here is the most likely one 00:10:38.240 |
right so this is the probability here and in this case if we have the temperature set to zero 00:10:45.040 |
the word that is going to be chosen is this one here okay because there's no randomness in the 00:10:51.440 |
model it's just going to always choose the highest probability token. If instead we turn the temperature 00:10:58.320 |
up to one there is a lot more randomness it may still choose this token here because it has the 00:11:05.200 |
highest probability but it will also consider this because there's still a decent probability there 00:11:11.360 |
okay and to a lesser extent it will also consider this bit here this token to a lesser degree this 00:11:17.440 |
one to a lesser degree this one and so on right so by increasing the temperature we increase 00:11:25.520 |
the weighting of these other possible tokens being selected within the 00:11:33.520 |
generation of the model, and this will generally lead to more creative or more random outputs. 00:11:41.040 |
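As a quick sketch (not the exact notebook code), the temperature is just another parameter on the same completion call:

```python
# temperature=0: no randomness, always pick the most probable token,
# a good fit for conservative, fact-based Q&A
factual = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=256, temperature=0.0
)

# temperature=1: much more randomness, better suited to creative writing or chatbots
creative = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=256, temperature=1.0
)
```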
So considering this, if we have our conservative, fact-based Q&A, we might actually want to 00:11:50.000 |
turn the temperature down okay more towards the zero because we don't want the model to make 00:11:55.280 |
anything up we want it to be not creative and just factual whereas if the idea is we want to produce 00:12:04.560 |
some creative writing or some interesting chatbot conversations then we might turn the temperature 00:12:11.760 |
up because it will usually produce something more entertaining and interesting and to some degree 00:12:17.360 |
kind of surprising to see in a good way. So let's take a look at what that might look like so here 00:12:23.120 |
we're going to create a conversation with an amusing chatbot. So again, first 00:12:29.840 |
I need to add in the max tokens here, okay, and I'll add that in. So we're going to start 00:12:36.960 |
with the temperature very low; the default is 1 with the OpenAI 00:12:42.800 |
completion endpoint, and here we're setting it to zero. So: 'The following is a conversation with a funny chatbot. 00:12:48.160 |
The chatbot's responses are amusing and entertaining.' Now this is like the instructions, 00:12:53.920 |
okay, and then down here we have the style of conversation, we have the user's input here, and 00:13:00.080 |
then we have our next output indicator. 00:13:06.160 |
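Laid out as a Python string, that chatbot prompt has roughly this shape (a sketch; the exact wording in the notebook may differ):

```python
chat_prompt = """The following is a conversation with a funny chatbot. The
chatbot's responses are amusing and entertaining.

User: Hi, how are you today?
Chatbot: """
```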
So let's run this, and we get: 'Oh, just hanging out and having a good time, what about you?' You know, it's pretty predictable, it's not that 00:13:09.920 |
interesting, it's definitely not funny, it's not amusing or entertaining. Now let's run this again; 00:13:16.000 |
last time I got a good answer to this but it doesn't always provide a good answer so 00:13:21.120 |
you know, we can try. So let me put in the max tokens again. This one is a little more interesting: 00:13:26.560 |
'Hang out with my electronic friends, it's always a good time.' A bit better. 'Hang out contemplating 00:13:32.720 |
the meaning of life.' Okay, so a few better answers. I don't think any of these are as good as the first 00:13:39.360 |
one I got but they're not bad and definitely much better than the you know the first one we have 00:13:46.720 |
here which is just a bit kind of plain and boring so let's move on to what we would call few shot 00:13:56.560 |
training for our model now what we'll often find is that sometimes these models don't quite get 00:14:04.880 |
what we are looking for, and we can actually see that in this example here. So: 'The following is a 00:14:11.280 |
conversation with an AI assistant', similar thing again; 'the assistant is sarcastic, witty, producing creative 00:14:16.960 |
and funny responses to the user's questions. Here are some examples:' and then in here what we can do 00:14:22.880 |
is actually put in some examples but before we do that I just want to remove that and I want to show 00:14:28.640 |
you what we get to begin with so let's run this we've turned the temperature up so it should come 00:14:36.960 |
up with something kind of creative, to a degree, but we'll see that it's not particularly 00:14:42.560 |
interesting. Okay, so maybe if you want a serious answer this is what you're looking for, 00:14:49.040 |
but I'm not asking for anything serious; I want something sarcastic, witty, creative and amusing. 00:14:54.720 |
right so what if we come down here and we actually add a few examples to our prompt okay so this is 00:15:03.520 |
what we would refer to as few-shot training; we're essentially adding a few training examples into 00:15:09.600 |
the prompt. So we're going to say, user: 'How are you?' and the AI, just being sarcastic: 'I can't 00:15:15.120 |
complain, but sometimes I still do.' The user will ask 'What time is it?' and the AI says 'It's time to get 00:15:20.480 |
a watch.' Okay, and let's see if we get a less serious answer to 'What is the meaning of life?' 00:15:26.640 |
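As a sketch, the few-shot version of the prompt simply prepends those example exchanges before the real question (wording approximate):

```python
few_shot_prompt = """The following is a conversation with an AI assistant.
The assistant is sarcastic, witty, producing creative and funny responses
to the user's questions. Here are some examples:

User: How are you?
AI: I can't complain, but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """
```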
Again, let me put in the max tokens. The previous answer was pretty good; I don't know if we're going 00:15:32.160 |
to get a good one like that again, but let's try. Okay, and we get something 00:15:39.760 |
good again as a great philosopher Shrek once said Fiona the meaning in life is to find your passion 00:15:46.640 |
so kind of useful but also pretty amusing so this is a much better response and we got that by just 00:15:55.840 |
providing a few examples beforehand. So we did some few-shot training: we showed a few training 00:16:03.600 |
examples to the model and all of a sudden it can produce a much better output. Now the next thing I 00:16:09.600 |
want to talk about is adding multiple contexts. In, I think it was, the first example 00:16:15.520 |
we had a context in there, but we wrote it manually and added it in ourselves; 00:16:20.720 |
in reality, what we do is something slightly different. So let's consider the use case of 00:16:26.080 |
question answering. For question answering we want the model to be factual; we don't want it to make 00:16:32.080 |
things up and ideally we would also like the model to be able to source where it's kind of getting 00:16:39.360 |
this information from so what we essentially want here is some form of external source of information 00:16:46.320 |
that we can feed into the model and we can also use to kind of fact check that what the model 00:16:52.800 |
is saying is actually true or is at least coming from somewhere that is reliable now when we are 00:17:00.240 |
feeding this type of information into a model via the prompt, we refer to it as source knowledge, 00:17:06.960 |
and source knowledge as you might expect is just any type of knowledge that is fed into the model 00:17:14.400 |
via the prompt and this is kind of the I don't want to say the opposite but this is an alternative 00:17:21.760 |
to what we would call parametric knowledge which is knowledge that the model has learned during 00:17:27.600 |
the training process and stores within the model weights themselves. That means, if you ask 00:17:33.040 |
the model who's the first man on the moon it's probably going to be able to say Neil Armstrong 00:17:38.160 |
because it's kind of remembered that from during its training when it's seen you know tons and tons 00:17:43.120 |
of human information and data but if you ask more specific pointed questions it can sometimes make 00:17:50.720 |
things up or can provide an answer which is kind of generic and not actually that useful and that's 00:17:57.200 |
where we would like to use this source knowledge to feed in more useful information now in this 00:18:04.080 |
example we're just going to feed in a list of dummy external information. So in reality we'd 00:18:10.560 |
probably use like a search engine api or a long-term memory component rather than just relying 00:18:16.880 |
on a list like we're doing here but for the sake of simplicity this is all we're going to do so we 00:18:22.480 |
have a few contexts here: one is talking about large language models being the latest models used in NLP, 00:18:28.160 |
and so on; another talks about getting your API key from OpenAI; another about OpenAI's 00:18:36.000 |
API being accessible via the openai library, with some bits of code; and down here we also talk about 00:18:41.600 |
accessing it via the LangChain library. Now it's going to use all this information, and it's going 00:18:46.880 |
to use all that to build a better prompt and create a better output so what we do is we have 00:18:53.280 |
our instructions at the top here as we did before then we have our external information our context 00:18:59.520 |
Now, for GPT-3 in particular, what OpenAI recommend is that you separate your external information, 00:19:06.960 |
or your context, from the rest of the prompt using something like three hash characters (###), or you can 00:19:14.480 |
also use three quote characters (\"\"\"); we're going to stick with one of those here. And then, 00:19:23.760 |
when it comes to the contexts themselves, you also separate each one of the unique contexts, so 00:19:29.280 |
you know, we have one here, one here, and so on, and we're separating each one of those with a couple 00:19:34.560 |
of characters as well. Then we have our question, and then we have our output indicator. 00:19:42.720 |
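A minimal sketch of how that assembly might look in Python; the separator characters and the context text here are illustrative, not necessarily exactly what the notebook uses:

```python
contexts = [
    "Large Language Models (LLMs) are the latest models used in NLP...",
    "To use OpenAI's API you first need to create an API key...",
    "The OpenAI API can be accessed via the official openai Python library...",
    "LLMs from OpenAI, Cohere and Hugging Face can also be used via LangChain...",
]

# join the individual contexts with a short divider, then fence the whole
# block off from the instructions and the question with separators
context_str = "\n\n---\n\n".join(contexts)

prompt = f"""Answer the question based on the context below. If the question
cannot be answered using the information provided, answer with "I don't know".

###

Contexts:
{context_str}

###

Question: Give me two examples of how to use OpenAI's GPT-3 model using Python, from start to finish.

Answer: """
```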
Now let me actually just copy this; I'm going to put it up here so we can just see 00:19:49.040 |
what we're actually building here. Oh, we need to run this contexts cell first; now let's run this again. 00:19:56.320 |
Okay, and I need to print it, because it's a mess. Okay, cool, so it's still kind of messy but it 00:20:06.880 |
works, and it also points out to me that I've missed a little bit. So we have the instructions here, 00:20:13.680 |
and then we have our separator, and then we have our context. Now, actually, here I've got the 00:20:20.160 |
thing to separate them, but that's not exactly what we want; we actually want to have some new 00:20:25.840 |
line characters in there as well. So add those, and to actually do that we're going to need to 00:20:36.800 |
separate this bit as well, so put the context string in there, and then here we also just put in the 00:20:46.960 |
context string directly. So come to here, and then we get this sort of nicer format. Just by the way, 00:20:59.280 |
it will work even if you don't have this nice format but we should try and format it like this 00:21:06.560 |
And so we have our contexts, and each one of them is separated. Okay, and then you go down, you have 00:21:12.960 |
a question and the answer at the end there; that is what we want. I'm going to replace this with 00:21:18.080 |
the context string, and then if we come down to here I'm also going to add in our max tokens. 00:21:24.880 |
Great, and let's run that. Okay, cool. So the question, we'll just go up to the top here: 00:21:35.040 |
answer the question based on the context below, and answer 'I don't know' if it doesn't know 00:21:40.800 |
the answer, same as before. Give me two examples of how to use OpenAI's GPT-3 model using 00:21:48.240 |
Python, from start to finish. Okay, so what we have is two options: we can either use it via openai 00:21:56.080 |
or we can go via LangChain; both of these are correct. Now the one question here is: okay, we added in 00:22:02.560 |
these contexts, but did it actually need them? Can we do this prompt without the context and still get a 00:22:08.320 |
decent answer? We can try. So: answer the question, and we don't have any context here, same question, 00:22:15.120 |
same output indicator. Let's run that; oh, one thing we still need is the max tokens again. 00:22:22.000 |
Okay, and we get this: 'using OpenAI's GPT-3 model with Python to generate text...' 00:22:31.200 |
You know, that is true, but it's not very useful, and then here it's saying 'using 00:22:39.040 |
GPT-3 to generate images', which isn't even possible. So, not really what we want, not a good 00:22:45.600 |
answer essentially. So this is where we see that source 00:22:52.640 |
knowledge actually being pretty useful. Now, considering how big our prompts got with those 00:22:59.200 |
contexts and you know that wasn't even that much information being fed into our prompts 00:23:05.200 |
how big is too big like at what point are our prompts too large for us to actually use the 00:23:12.640 |
model? This is obviously an important thing, because if we go too big we're going to start 00:23:17.680 |
throwing errors, so let's take a look at that. For text-davinci-003, what we 00:23:26.800 |
call the context window, which is the maximum number of tokens that text-davinci-003 can handle 00:23:35.040 |
across both the prompt and the completion, is 4097 tokens. Okay, not words 00:23:46.720 |
but tokens. We can set the maximum completion length of our model, as we saw before, where we're 00:23:54.400 |
setting the max tokens, but that cannot take up the full 4097, because we need to account for 00:24:01.120 |
the number of tokens within our prompt as well. The only problem is: how do we measure the number 00:24:06.560 |
of tokens within our prompt? For that we need to use OpenAI's tiktoken tokenizer, 00:24:13.200 |
so you'll need to pip install tiktoken. Taking the earlier prompt, and we'll just stick 00:24:18.640 |
with the one where I haven't added in the new lines, let's take a look at how we can 00:24:23.440 |
actually measure the number of tokens in there. So we need to import tiktoken, we create our prompt, 00:24:30.960 |
then here we have our encoding name, which I will explain in a moment, 00:24:37.040 |
and then we have our tokenizer. This is a tiktoken tokenizer which we initialize by calling 00:24:45.600 |
tiktoken.get_encoding with that encoding name, and the encoding name matters for each model. 00:24:50.320 |
Then we just encode our prompt, check the length of that, and we get 412. So we're 00:24:59.360 |
feeding this prompt into our text-davinci-003 model, and it is going to use 412 of our context 00:25:06.480 |
window tokens, out of the 4097 that you see here. That leaves us with 3685 tokens for the 00:25:18.080 |
completion, so we can set the max tokens to that number, but no higher. 00:25:25.120 |
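A rough sketch of that token counting with the tiktoken library (4097 is the context window figure quoted for text-davinci-003):

```python
# pip install tiktoken
import tiktoken

# text-davinci-003 uses the p50k_base encoding
tokenizer = tiktoken.get_encoding("p50k_base")

prompt_tokens = len(tokenizer.encode(prompt))
print(prompt_tokens)  # 412 for the prompt used here

context_window = 4097                            # prompt + completion limit for text-davinci-003
max_completion = context_window - prompt_tokens  # 3685 tokens left for the completion
```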
Now, one important thing to note here is that not all OpenAI models use the p50k_base encoding; there is 00:25:33.760 |
this link here which will take you through to a table of the different encoders but in short 00:25:40.160 |
as of recording this video they were as follows: for most GPT-3 models, and also GPT-2, you're 00:25:46.880 |
going to be using the r50k_base (GPT-2) encoding; for the code models and recent instructional 00:25:54.400 |
models we use p50k_base; and for the embedding model text-embedding-ada-002 we use cl100k_base. 00:26:04.880 |
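If you'd rather not memorise that table, tiktoken can also look the encoding up for you by model name, something like this:

```python
import tiktoken

# look up the right encoding for a given model name
enc = tiktoken.encoding_for_model("text-davinci-003")
print(enc.name)  # p50k_base
```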
So if we now set our maximum tokens to this and generate, let's take a look at what we get. Now, one thing you 00:26:11.680 |
will note straight away is that the completion time is longer it's because we have more maximum 00:26:18.080 |
tokens; even if it doesn't fill that entire space with tokens, the computational time does take 00:26:24.240 |
longer now let's see what we have here it's pretty much the same as what we got before 00:26:32.720 |
even though we've increased the number of tokens because it doesn't need to use all of that space 00:26:37.920 |
so the model doesn't. Now, what happens if we increase max tokens by one more? Well, you can 00:26:45.440 |
kind of see already: if we go to 3,686 and run that, we get this invalid 00:26:53.520 |
request error; that's because we've exceeded the maximum context length, so we just need to be very 00:27:01.200 |
cautious with that. And obviously, if you're using this in an environment where you might expect 00:27:08.800 |
the maximum context length to be exceeded, you should probably consider implementing something 00:27:15.360 |
like this check into your pipeline. 00:27:24.080 |
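For example, a simple version of that check might look like this (a sketch; 4097 is the text-davinci-003 context window, and how you handle the failure case is up to you):

```python
import tiktoken

CONTEXT_WINDOW = 4097  # prompt + completion limit for text-davinci-003
tokenizer = tiktoken.get_encoding("p50k_base")

def safe_max_tokens(prompt: str, desired_completion: int = 256) -> int:
    """Clamp the completion length so prompt + completion fits the context window."""
    prompt_tokens = len(tokenizer.encode(prompt))
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the model's context window")
    return min(desired_completion, remaining)
```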
Now, that is actually everything for this video. We've been through a fair few things when it comes to building better prompts and how to handle 00:27:30.880 |
some of the key parameters within your prompt completion inputs so as I mentioned prompts are 00:27:39.680 |
super important, and if you don't get them right your output is not going to be that great, to 00:27:46.400 |
be honest so it's worth spending some time just learning more about prompts not just what we've 00:27:52.640 |
covered here but you know maybe more and just especially more than anything experimenting 00:27:58.480 |
with your prompts and considering other things depending on what you're doing like do you need 00:28:03.920 |
to be pulling in more information for your prompts from external sources of information 00:28:08.800 |
do you need to be modifying the temperature are there other variables in the completion endpoint 00:28:13.200 |
that you could be modifying? All these things are pretty important to consider if you're, you know, 00:28:19.360 |
actually building something with any real value using large language models and another thing to 00:28:26.000 |
point out is that this is not specific to GPT-3 models. If you want to use Cohere's generation 00:28:34.640 |
endpoint or completion endpoint, or you want to use open-source Hugging Face models, like you want 00:28:41.280 |
to use BLOOM or something for completion, you should also consider these prompt engineering 00:28:48.320 |
rules of thumb and tips. Beyond that, I think we've covered everything, so that's it for this 00:28:56.000 |
video I hope it's been useful and interesting but for now we'll leave it there so thank you 00:29:03.280 |
very much for watching and I will see you again in the next one. Bye!