Prompt Engineering with OpenAI's GPT-3 and other LLMs
Chapters
0:00 What is Prompt Engineering?
2:15 Anatomy of a Prompt
7:03 Building prompts with OpenAI GPT-3
8:35 Generation / completion temperature
13:50 Few-shot training with examples
16:08 Adding external information
22:55 Max context window
27:18 Final thoughts on Gen AI and prompts
00:00:00.000 |
Prompt engineering is an emerging discipline within the world of generative AI, and it describes 00:00:07.200 |
the art of writing good intentional prompts that produce an output from a generative AI model 00:00:15.520 |
that we actually want. And to a degree it is an art; it's very hard to explain exactly how to create a 00:00:24.720 |
good prompt. But to a larger extent there is a logical process we can follow when 00:00:33.840 |
creating prompts, one that can be described and easily applied to produce better output from large 00:00:41.520 |
language models and, of course, the generative art tools as well. Good prompts are the key to producing 00:00:48.320 |
good outputs for these models. Using different types of prompts we can modify the mode or type of task 00:00:54.880 |
that is being performed and we can even use prompts to train models to some degree and the performance 00:01:03.120 |
of doing that is actually surprisingly good. Now there's a few things to learn with prompt 00:01:09.920 |
engineering and I think one of the best ways to maybe think about this discipline is to think of 00:01:17.280 |
it as a more abstract version of programming. So throughout the last decades we've seen programming 00:01:23.760 |
languages become more and more abstract. Prompts for AI models are almost like the next step: a 00:01:29.200 |
super abstract programming of an AI model and that's exactly how I want to approach this here. I 00:01:36.080 |
want to discuss prompts and building good prompts, the different parts of a prompt and how we apply 00:01:43.120 |
them to large language models. Now when we think of large language models there are a lot of 00:01:50.080 |
different use cases that they are used for. We see things like creative writing, question answering, 00:01:56.000 |
text summarization, data extraction, like a ton of these completely different things and with each of 00:02:02.960 |
these different tasks we're not actually doing anything different in terms of the model. The 00:02:08.960 |
models are all the same for each one of these tasks. The difference is the prompts themselves. 00:02:15.120 |
Now what do these prompts look like? Well we can typically break them apart into a few components. 00:02:21.680 |
So we have the instructions of a prompt, and any external information, which we also quite 00:02:28.800 |
commonly call contexts. We would also have the user input or a query, and we can also 00:02:36.720 |
prime our prompt with what we call an output indicator and this is usually just a little word 00:02:42.640 |
at the end. Now not all prompts require all of these components but often a good prompt will 00:02:48.560 |
use one or more of them. So starting with instructions. Instructions tell the model 00:02:54.480 |
what to do, and this is a very key part of the more instruction-based models like OpenAI's 00:03:02.240 |
text-davinci-003. Through these instructions we try to define what we would like the model to do, 00:03:09.520 |
and that means how it should use some inputs and how it should format outputs and what it should 00:03:15.520 |
consider whilst it's going through that process and we would always put these instructions at 00:03:20.560 |
the very top of our prompt. I'll explain a little bit more about that pretty soon. Following this 00:03:25.920 |
we have our external information, or contexts, and these are additional pieces of information that we 00:03:32.960 |
feed into the model via the prompt. These can be things that we manually insert into the prompt, 00:03:39.920 |
information that we pull in through a long-term memory component, a vector database or we get it 00:03:46.880 |
through other means like a web search API or a calculator API, something along those lines. 00:03:53.120 |
Following that we have our user input. Now that's pretty obvious that's just the input from a 00:03:58.480 |
particular user. It depends on what you're doing like if you have a text summarization use case 00:04:04.160 |
they might input a two-page chunk of text and we might want to summarize that into a paragraph 00:04:10.400 |
or on the other side maybe it's a question answering tool and in that case the user might 00:04:16.000 |
just type in a few words and question mark and that is their question, that is their user input. 00:04:21.520 |
So of course that can vary as well. Then finally we have our output indicator this is essentially 00:04:28.320 |
the start of what we would like the model to begin generating. So it's kind of like a way of 00:04:34.560 |
indicating to the model hey okay now it's time for you to start you know writing something and 00:04:39.440 |
I want you to start writing something based on this first little chunk of text. So a good example 00:04:46.480 |
or a very clear example, at least in my view, is when you have a code generation model and you want it to 00:04:52.160 |
generate Python code; you give instructions to do so, and then your output indicator will just be 00:04:57.680 |
the word import all in lowercase because most python scripts will actually begin with the word 00:05:04.880 |
import because you're going to be importing your libraries like import numpy and so on. 00:05:09.840 |
On the other hand if you were building like a conversational chatbot this output indicator 00:05:16.560 |
might be like the name of the chatbot followed by a colon as if you're sort of in a chat log. 00:05:23.280 |
Okay so they're the four main components of a prompt that we're going to talk about and that 00:05:29.120 |
we're going to actually use to construct our prompts throughout this video. Okay so let's 00:05:34.160 |
have a look at an example. Here we have our prompt; up here, this is the instruction, 00:05:41.200 |
right at the top of the prompt: 'Answer the question based on the context below. If the 00:05:46.080 |
question cannot be answered using the information provided, answer with "I don't know".' Okay, so this 00:05:51.040 |
is a form of conservative Q&A: given a user question, we want the model to answer based 00:05:59.040 |
only on information that we can verify. Okay and that verified information is something that 00:06:05.600 |
is our external information here our context that is also fed into the prompt right and we're saying 00:06:12.720 |
if that information is not contained within this context here (and we'll usually have a list 00:06:18.320 |
of contexts), we want it to say 'I don't know'. In this case, if the model makes something up, 00:06:24.240 |
it can lead to pretty bad results, so we really don't want it to make anything up, 00:06:28.720 |
because these models tend to do that pretty often. So we have our instructions, we have our external 00:06:34.480 |
information or the context, then we have the user query, which is down here, so the question: 00:06:40.080 |
'Which libraries and model providers offer LLMs?' And then we have that final bit, the output 00:06:47.440 |
indicator, which is like saying: okay, now you can start answering the question. Now this is a pretty good 00:06:53.920 |
prompt: it's clear, we have our instructions, we have some external information, we have a question, 00:07:00.720 |
and we have the output indicator at the end there. 00:07:06.240 |
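To make that concrete, here is a minimal sketch of that prompt as a Python string; the context passage is paraphrased from what appears on screen, not copied exactly:

```python
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP-enabled applications. These models can
be accessed via Hugging Face's transformers library, via OpenAI using the
openai library, and via Cohere using the cohere library.

Question: Which libraries and model providers offer LLMs?

Answer: """
```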
Okay, so let's have a look at how we will actually implement these things. So we're going to work through this notebook here; if you'd like to follow 00:07:11.600 |
along and run the notebook yourself, you can do so; there'll be a link at the top of the video right 00:07:15.920 |
now and also in the video description. The first thing we need to do is pip install the OpenAI 00:07:21.600 |
library; initially that's the only library we need. There'll be another one a little bit later on, 00:07:25.520 |
which I'll explain when we get to it. Coming down to the first code block, we see we have 00:07:30.960 |
this prompt; this is the same one I just went through, so I'm going to run that, and then 00:07:36.560 |
initialize my OpenAI instance using my OpenAI API key. So if you need this you can get it from 00:07:46.320 |
here, which is just at beta.openai.com, under Account, API keys, which you can access when you 00:07:54.400 |
log into your account; you create a new secret key and just copy that into the notebook. 00:08:00.400 |
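As a rough sketch, the setup looks something like this (the key string is just a placeholder):

```python
# pip install openai
import openai

# create a new secret key at beta.openai.com (Account -> API keys) and paste it here
openai.api_key = "YOUR_API_KEY"  # placeholder, use your own key
```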
Okay, so once you have authenticated with that, we're going to generate from the prompt that we just created. 00:08:05.360 |
Okay, so all we do is call the completion endpoint's create method, using the text-davinci-003 00:08:10.800 |
model, one of the most recent instructional models, and then we print out the response, which 00:08:18.000 |
sits at this path of the returned JSON. Now here you can see that it stops pretty suddenly 00:08:25.760 |
here; now the reason for that is that our max token length isn't very long, and we'll explain 00:08:31.520 |
that a little bit more later on. What we first need to do is just increase that length right now, 00:08:37.840 |
and we're going to do that by setting max_tokens to 256. 00:08:47.520 |
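Putting that together, the call looks roughly like this (a sketch based on the older openai Python library, version 0.x, which is what the video uses; the exact notebook code may differ slightly):

```python
response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,  # allow a longer completion so the answer isn't cut off
)

# the generated text sits at this path in the returned JSON
print(response["choices"][0]["text"].strip())
```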
Okay, and let's just see if that does answer the question: which libraries and model providers offer large language 00:08:53.040 |
models? And that's exactly right. Okay, so we have Hugging Face, OpenAI and Cohere. And then, 00:09:00.720 |
alternatively, if we do not have the correct information within the context, the model should 00:09:06.240 |
say 'I don't know', because we have that instruction up here. So I'm just going to put in the context 00:09:12.000 |
'libraries are a place full of books', and we would hopefully expect that the model is going 00:09:17.200 |
to output 'I don't know'. Now let me just copy the max tokens again, put that in here, and we see 00:09:27.440 |
that it follows our instructions it says I don't know. Okay great so that's just a simple prompt 00:09:33.120 |
now what's next let's come down here and what we're going to talk about is the temperature within 00:09:40.160 |
our completion endpoint. So we can think of the temperature parameter as telling us how random 00:09:50.640 |
the model can be, or how creative the model can be; it essentially controls the probability of 00:09:57.360 |
the model choosing a word that is not actually its first choice, and this works 00:10:03.440 |
because when the model is predicting tokens or words it is actually assigning a probability 00:10:10.880 |
distribution over all possible words or tokens that it could output. So you know let's say we 00:10:17.280 |
have all of these different words here, or tokens I should say, and there are tens of thousands 00:10:22.960 |
of these it's not six of these but what we're essentially doing is the model is going through 00:10:27.840 |
these and it's kind of assigning a probability distribution so maybe this one is a high 00:10:31.600 |
probability this one's low again this one's kind of big and this one up here is the most likely one 00:10:38.240 |
right so this is the probability here and in this case if we have the temperature set to zero 00:10:45.040 |
the word that is going to be chosen is this one here okay because there's no randomness in the 00:10:51.440 |
model it's just going to always choose the highest probability token. If instead we turn the temperature 00:10:58.320 |
up to one there is a lot more randomness it may still choose this token here because it has the 00:11:05.200 |
highest probability but it will also consider this because there's still a decent probability there 00:11:11.360 |
okay and to a lesser extent it will also consider this bit here this token to a lesser degree this 00:11:17.440 |
one to a lesser degree this one and so on right so by increasing the temperature we increase 00:11:25.520 |
the weighting of these other possible tokens being selected within the 00:11:33.520 |
generation of the model, and this will generally lead to more creative or more random outputs. 00:11:41.040 |
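As a quick sketch (not the exact notebook code), the temperature is just another parameter on the same completion call:

```python
# temperature=0: no randomness, always pick the most probable token,
# a good fit for conservative, fact-based Q&A
factual = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=256, temperature=0.0
)

# temperature=1: much more randomness, better suited to creative writing or chatbots
creative = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=256, temperature=1.0
)
```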
So considering this, if we have our conservative, fact-based Q&A, we might actually want to 00:11:50.000 |
turn the temperature down okay more towards the zero because we don't want the model to make 00:11:55.280 |
anything up we want it to be not creative and just factual whereas if the idea is we want to produce 00:12:04.560 |
some creative writing or some interesting chatbot conversations then we might turn the temperature 00:12:11.760 |
up because it will usually produce something more entertaining and interesting and to some degree 00:12:17.360 |
kind of surprising to see in a good way. So let's take a look at what that might look like so here 00:12:23.120 |
we're going to create a conversation with an amusing chatbot. So again, first 00:12:29.840 |
I need to add in the max tokens here, okay, and I'll add that in. So we're going to start 00:12:36.960 |
with the temperature very low; the default is 1 with the OpenAI 00:12:42.800 |
completion endpoint, and here we're setting it to zero. So: 'The following is a conversation with a funny chatbot. 00:12:48.160 |
The chatbot's responses are amusing and entertaining.' Now this is like the instructions, 00:12:53.920 |
okay, and then down here we have the style of conversation, we have the user's input here, and 00:13:00.080 |
then we have our next output indicator. 00:13:06.160 |
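Laid out as a Python string, that chatbot prompt has roughly this shape (a sketch; the exact wording in the notebook may differ):

```python
chat_prompt = """The following is a conversation with a funny chatbot. The
chatbot's responses are amusing and entertaining.

User: Hi, how are you today?
Chatbot: """
```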
So let's run this, and we get: 'Oh, just hanging out and having a good time, what about you?' You know, it's pretty predictable, it's not that 00:13:09.920 |
interesting, it's definitely not funny, it's not amusing or entertaining. Now let's run this again; 00:13:16.000 |
last time I got a good answer to this but it doesn't always provide a good answer so 00:13:21.120 |
you know, we can try. So let me put in the max tokens again. This one is a little more interesting: 00:13:26.560 |
'Hang out with my electronic friends, it's always a good time.' A bit better. 'Hang out contemplating 00:13:32.720 |
the meaning of life.' Okay, so a few better answers. I don't think any of these are as good as the first 00:13:39.360 |
one I got but they're not bad and definitely much better than the you know the first one we have 00:13:46.720 |
here which is just a bit kind of plain and boring so let's move on to what we would call few shot 00:13:56.560 |
training for our model now what we'll often find is that sometimes these models don't quite get 00:14:04.880 |
what we are looking for, and we can actually see that in this example here. So: 'The following is a 00:14:11.280 |
conversation with an AI assistant', similar thing again; 'the assistant is sarcastic, witty, producing creative 00:14:16.960 |
and funny responses to the user's questions. Here are some examples:' and then in here what we can do 00:14:22.880 |
is actually put in some examples but before we do that I just want to remove that and I want to show 00:14:28.640 |
you what we get to begin with so let's run this we've turned the temperature up so it should come 00:14:36.960 |
up with something kind of creative, to a degree, but we'll see that it's not particularly 00:14:42.560 |
interesting. Okay, so maybe if you want a serious answer this is what you're looking for, 00:14:49.040 |
but I'm not asking for anything serious; I want something sarcastic, witty, creative and amusing. 00:14:54.720 |
right so what if we come down here and we actually add a few examples to our prompt okay so this is 00:15:03.520 |
what we would refer to as few-shot training; we're essentially adding a few training examples into 00:15:09.600 |
the prompt. So we're going to say, user: 'How are you?' and the AI, just being sarcastic: 'I can't 00:15:15.120 |
complain, but sometimes I still do.' The user will ask 'What time is it?' and the AI says 'It's time to get 00:15:20.480 |
a watch.' Okay, and let's see if we get a less serious answer to 'What is the meaning of life?' 00:15:26.640 |
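As a sketch, the few-shot version of the prompt simply prepends those example exchanges before the real question (wording approximate):

```python
few_shot_prompt = """The following is a conversation with an AI assistant.
The assistant is sarcastic, witty, producing creative and funny responses
to the user's questions. Here are some examples:

User: How are you?
AI: I can't complain, but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """
```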
Again, let me put in the max tokens. The previous answer was pretty good; I don't know if we're going 00:15:32.160 |
to get a good one like that again, but let's try. Okay, and we get something 00:15:39.760 |
good again as a great philosopher Shrek once said Fiona the meaning in life is to find your passion 00:15:46.640 |
so kind of useful but also pretty amusing so this is a much better response and we got that by just 00:15:55.840 |
providing a few examples beforehand. So we did some few-shot training: we showed a few training 00:16:03.600 |
examples to the model and all of a sudden it can produce a much better output. Now the next thing I 00:16:09.600 |
want to talk about is adding multiple contexts. In, I think it was, the first example 00:16:15.520 |
we had a context in there, but we wrote it manually and added it in ourselves; 00:16:20.720 |
in reality, what we do is something slightly different. So let's consider the use case of 00:16:26.080 |
question answering. For question answering we want the model to be factual; we don't want it to make 00:16:32.080 |
things up and ideally we would also like the model to be able to source where it's kind of getting 00:16:39.360 |
this information from so what we essentially want here is some form of external source of information 00:16:46.320 |
that we can feed into the model and we can also use to kind of fact check that what the model 00:16:52.800 |
is saying is actually true or is at least coming from somewhere that is reliable now when we are 00:17:00.240 |
feeding this type of information into a model via the prompt, we refer to it as source knowledge, 00:17:06.960 |
and source knowledge as you might expect is just any type of knowledge that is fed into the model 00:17:14.400 |
via the prompt and this is kind of the I don't want to say the opposite but this is an alternative 00:17:21.760 |
to what we would call parametric knowledge which is knowledge that the model has learned during 00:17:27.600 |
the training process and stores within the model weights themselves. That means, if you ask 00:17:33.040 |
the model who's the first man on the moon it's probably going to be able to say Neil Armstrong 00:17:38.160 |
because it's kind of remembered that from during its training when it's seen you know tons and tons 00:17:43.120 |
of human information and data but if you ask more specific pointed questions it can sometimes make 00:17:50.720 |
things up or can provide an answer which is kind of generic and not actually that useful and that's 00:17:57.200 |
where we would like to use this source knowledge to feed in more useful information now in this 00:18:04.080 |
example we're just going to feed in a list of dummy external information. So in reality we'd 00:18:10.560 |
probably use like a search engine api or a long-term memory component rather than just relying 00:18:16.880 |
on a list like we're doing here but for the sake of simplicity this is all we're going to do so we 00:18:22.480 |
have a few contexts here: one is talking about large language models being the latest models used in NLP, 00:18:28.160 |
and so on; another talks about getting your API key from OpenAI; another about OpenAI's 00:18:36.000 |
API being accessible via the openai library, with some bits of code; and down here we also talk about 00:18:41.600 |
accessing it via the LangChain library. Now it's going to use all this information, and it's going 00:18:46.880 |
to use all that to build a better prompt and create a better output so what we do is we have 00:18:53.280 |
our instructions at the top here as we did before then we have our external information our context 00:18:59.520 |
Now, for GPT-3 in particular, what OpenAI recommend is that you separate your external information, 00:19:06.960 |
or your context, from the rest of the prompt using something like three hash characters (###), or you can 00:19:14.480 |
also use three quote characters (\"\"\"); we're going to stick with one of those here. And then, 00:19:23.760 |
when it comes to the contexts themselves, you also separate each one of the unique contexts, so 00:19:29.280 |
you know, we have one here, one here, and so on, and we're separating each one of those with a couple 00:19:34.560 |
of characters as well. Then we have our question, and then we have our output indicator. 00:19:42.720 |
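A minimal sketch of how that assembly might look in Python; the separator characters and the context text here are illustrative, not necessarily exactly what the notebook uses:

```python
contexts = [
    "Large Language Models (LLMs) are the latest models used in NLP...",
    "To use OpenAI's API you first need to create an API key...",
    "The OpenAI API can be accessed via the official openai Python library...",
    "LLMs from OpenAI, Cohere and Hugging Face can also be used via LangChain...",
]

# join the individual contexts with a short divider, then fence the whole
# block off from the instructions and the question with separators
context_str = "\n\n---\n\n".join(contexts)

prompt = f"""Answer the question based on the context below. If the question
cannot be answered using the information provided, answer with "I don't know".

###

Contexts:
{context_str}

###

Question: Give me two examples of how to use OpenAI's GPT-3 model using Python, from start to finish.

Answer: """
```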
Now let me actually just copy this; I'm going to put it up here so we can just see 00:19:49.040 |
what we're actually building here. Oh, we need to run this contexts cell first; now let's run this again. 00:19:56.320 |
Okay, and I need to print it, because it's a mess. Okay, cool, so it's still kind of messy but it 00:20:06.880 |
works, and it also points out to me that I've missed a little bit. So we have the instructions here, 00:20:13.680 |
and then we have our separator, and then we have our context. Now, actually, here I've got the 00:20:20.160 |
thing to separate them, but that's not exactly what we want; we actually want to have some new 00:20:25.840 |
line characters in there as well. So add those, and to actually do that we're going to need to 00:20:36.800 |
separate this bit as well, so put the context string in there, and then here we also just put in the 00:20:46.960 |
context string directly. So come to here, and then we get this sort of nicer format. Just by the way, 00:20:59.280 |
it will work even if you don't have this nice format but we should try and format it like this 00:21:06.560 |
And so we have our contexts, and each one of them is separated. Okay, and then you go down, you have 00:21:12.960 |
a question and the answer at the end there; that is what we want. I'm going to replace this with 00:21:18.080 |
the context string, and then if we come down to here I'm also going to add in our max tokens. 00:21:24.880 |
Great, and let's run that. Okay, cool. So the question, we'll just go up to the top here: 00:21:35.040 |
answer the question based on the context below, and answer 'I don't know' if it doesn't know 00:21:40.800 |
the answer, same as before. Give me two examples of how to use OpenAI's GPT-3 model using 00:21:48.240 |
Python, from start to finish. Okay, so what we have is two options: we can either use it via openai 00:21:56.080 |
or we can go via LangChain; both of these are correct. Now the one question here is: okay, we added in 00:22:02.560 |
these contexts, but did it actually need them? Can we do this prompt without the context and still get a 00:22:08.320 |
decent answer? We can try. So: answer the question, and we don't have any context here, same question, 00:22:15.120 |
same output indicator. Let's run that; oh, one thing we still need is the max tokens again. 00:22:22.000 |
Okay, and we get this: 'using OpenAI's GPT-3 model with Python to generate text...' 00:22:31.200 |
You know, that is true, but it's not very useful, and then here it's saying 'using 00:22:39.040 |
GPT-3 to generate images', which isn't even possible. So, not really what we want, not a good 00:22:45.600 |
answer essentially. So this is where we see that source 00:22:52.640 |
knowledge actually being pretty useful. Now, considering how big our prompts got with those 00:22:59.200 |
contexts and you know that wasn't even that much information being fed into our prompts 00:23:05.200 |
how big is too big like at what point are our prompts too large for us to actually use the 00:23:12.640 |
model? This is obviously an important thing, because if we go too big we're going to start 00:23:17.680 |
throwing errors, so let's take a look at that. For text-davinci-003, what we 00:23:26.800 |
call the context window, which is the maximum number of tokens that text-davinci-003 can handle 00:23:35.040 |
across both the prompt and the completion, is 4097 tokens. Okay, not words 00:23:46.720 |
but tokens. We can set the maximum completion length of our model, as we saw before, where we're 00:23:54.400 |
setting the max tokens, but that cannot take up the full 4097, because we need to account for 00:24:01.120 |
the number of tokens within our prompt as well. The only problem is: how do we measure the number 00:24:06.560 |
of tokens within our prompt? For that we need to use OpenAI's tiktoken tokenizer, 00:24:13.200 |
so you'll need to pip install tiktoken. Taking the earlier prompt, and we'll just stick 00:24:18.640 |
with the one where I haven't added in the new lines, let's take a look at how we can 00:24:23.440 |
actually measure the number of tokens in there. So we need to import tiktoken, we create our prompt, 00:24:30.960 |
then here we have our encoding name, which I will explain in a moment, 00:24:37.040 |
and then we have our tokenizer. This is a tiktoken tokenizer which we initialize by calling 00:24:45.600 |
tiktoken.get_encoding with that encoding name, and the encoding name matters for each model. 00:24:50.320 |
Then we just encode our prompt, check the length of that, and we get 412. So we're 00:24:59.360 |
feeding this prompt into our text-davinci-003 model, and it is going to use 412 of our context 00:25:06.480 |
window tokens, out of the 4097 that you see here. That leaves us with 3685 tokens for the 00:25:18.080 |
completion, so we can set the max tokens to that number, but no higher. 00:25:25.120 |
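A rough sketch of that token counting with the tiktoken library (4097 is the context window figure quoted for text-davinci-003):

```python
# pip install tiktoken
import tiktoken

# text-davinci-003 uses the p50k_base encoding
tokenizer = tiktoken.get_encoding("p50k_base")

prompt_tokens = len(tokenizer.encode(prompt))
print(prompt_tokens)  # 412 for the prompt used here

context_window = 4097                            # prompt + completion limit for text-davinci-003
max_completion = context_window - prompt_tokens  # 3685 tokens left for the completion
```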
Now, one important thing to note here is that not all OpenAI models use the p50k_base encoding; there is 00:25:33.760 |
this link here which will take you through to a table of the different encoders but in short 00:25:40.160 |
as of recording this video they were as follows: for most GPT-3 models, and also GPT-2, you're 00:25:46.880 |
going to be using the r50k_base (GPT-2) encoding; for the code models and recent instructional 00:25:54.400 |
models we use p50k_base; and for the embedding model text-embedding-ada-002 we use cl100k_base. 00:26:04.880 |
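If you'd rather not memorise that table, tiktoken can also look the encoding up for you by model name, something like this:

```python
import tiktoken

# look up the right encoding for a given model name
enc = tiktoken.encoding_for_model("text-davinci-003")
print(enc.name)  # p50k_base
```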
So if we now set our maximum tokens to this and generate, let's take a look at what we get. Now, one thing you 00:26:11.680 |
will note straight away is that the completion time is longer it's because we have more maximum 00:26:18.080 |
tokens; even if it doesn't fill that entire space with tokens, the computational time does take 00:26:24.240 |
longer now let's see what we have here it's pretty much the same as what we got before 00:26:32.720 |
even though we've increased the number of tokens because it doesn't need to use all of that space 00:26:37.920 |
so the model doesn't. Now, what happens if we increase max tokens by one more? Well, you can 00:26:45.440 |
kind of see already: if we go to 3,686 and run that, we get this invalid 00:26:53.520 |
request error; that's because we've exceeded the maximum context length, so we just need to be very 00:27:01.200 |
cautious with that. And obviously, if you're using this in an environment where you might expect 00:27:08.800 |
the maximum context length to be exceeded, you should probably consider implementing something 00:27:15.360 |
like this check into your pipeline. 00:27:24.080 |
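For example, a simple version of that check might look like this (a sketch; 4097 is the text-davinci-003 context window, and how you handle the failure case is up to you):

```python
import tiktoken

CONTEXT_WINDOW = 4097  # prompt + completion limit for text-davinci-003
tokenizer = tiktoken.get_encoding("p50k_base")

def safe_max_tokens(prompt: str, desired_completion: int = 256) -> int:
    """Clamp the completion length so prompt + completion fits the context window."""
    prompt_tokens = len(tokenizer.encode(prompt))
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the model's context window")
    return min(desired_completion, remaining)
```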
Now, that is actually everything for this video. We've been through a fair few things when it comes to building better prompts and how to handle 00:27:30.880 |
some of the key parameters within your prompt completion inputs so as I mentioned prompts are 00:27:39.680 |
super important, and if you don't get them right your output is not going to be that great, to 00:27:46.400 |
be honest so it's worth spending some time just learning more about prompts not just what we've 00:27:52.640 |
covered here but you know maybe more and just especially more than anything experimenting 00:27:58.480 |
with your prompts and considering other things depending on what you're doing like do you need 00:28:03.920 |
to be pulling in more information for your prompts from external sources of information 00:28:08.800 |
do you need to be modifying the temperature are there other variables in the completion endpoint 00:28:13.200 |
that you could be modifying? All these things are pretty important to consider if you're, you know, 00:28:19.360 |
actually building something with any real value using large language models and another thing to 00:28:26.000 |
point out is that this is not specific to GPT-3 models. If you want to use Cohere's generation 00:28:34.640 |
endpoint or completion endpoint, or you want to use open-source Hugging Face models, like you want 00:28:41.280 |
to use BLOOM or something for completion, you should also consider these prompt engineering 00:28:48.320 |
rules of thumb and tips. Beyond that, I think we've covered everything, so that's it for this 00:28:56.000 |
video I hope it's been useful and interesting but for now we'll leave it there so thank you 00:29:03.280 |
very much for watching and I will see you again in the next one. Bye!