
Prompt Engineering with OpenAI's GPT-3 and other LLMs


Chapters

0:00 What is Prompt Engineering?
2:15 Anatomy of a Prompt
7:03 Building prompts with OpenAI GPT-3
8:35 Generation / completion temperature
13:50 Few-shot training with examples
16:08 Adding external information
22:55 Max context window
27:18 Final thoughts on Gen AI and prompts

Whisper Transcript

00:00:00.000 | Prompt engineering is an emerging discipline within the world of generative AI and it describes
00:00:07.200 | the art of writing good intentional prompts that produce an output from a generative AI model
00:00:15.520 | that we actually want. And to a degree it is an art, it's very hard to explain how to create a
00:00:24.720 | good prompt, but to a larger extent there's a very logical process and way that we can go about
00:00:33.840 | creating prompts that can be described and easily applied to produce better output from large
00:00:41.520 | language models and of course the generative art tools as well. Good prompts are the key to producing
00:00:48.320 | good outputs for these models. Using different types of prompts we can modify the mode or type of task
00:00:54.880 | that is being performed and we can even use prompts to train models to some degree and the performance
00:01:03.120 | of doing that is actually surprisingly good. Now there's a few things to learn with prompt
00:01:09.920 | engineering and I think one of the best ways to maybe think about this discipline is to think of
00:01:17.280 | it as a more abstract version of programming. So throughout the last decades we've seen programming
00:01:23.760 | languages become more and more abstract. Prompts for AI models are almost like the next step, it's a
00:01:29.200 | super abstract programming of an AI model and that's exactly how I want to approach this here. I
00:01:36.080 | want to discuss prompts and building good prompts, the different parts of a prompt and how we apply
00:01:43.120 | them to large language models. Now when we think of large language models there are a lot of
00:01:50.080 | different use cases that they are used for. We see things like creative writing, question answering,
00:01:56.000 | text summarization, data extraction, like a ton of these completely different things and with each of
00:02:02.960 | these different tasks we're not actually doing anything different in terms of the model. The
00:02:08.960 | models are all the same for each one of these tasks. The difference is the prompts themselves.
00:02:15.120 | Now what do these prompts look like? Well we can typically break them apart into a few components.
00:02:21.680 | So we have the instructions of a prompt, any external information or we can also call these
00:02:28.800 | contexts quite commonly as well. We would also have the user input or a query and we can also
00:02:36.720 | prime our prompt with what we call an output indicator and this is usually just a little word
00:02:42.640 | at the end. Now not all prompts require all of these components but often a good prompt will
00:02:48.560 | use one or more of them. So starting with instructions. Instructions tell the model
00:02:54.480 | what to do and this is a very key part of more instruction-based models like OpenAI's
00:03:02.240 | text-davinci-003, and through these instructions we try to define what we would like the model to do
00:03:09.520 | and that means how it should use some inputs and how it should format outputs and what it should
00:03:15.520 | consider whilst it's going through that process and we would always put these instructions at
00:03:20.560 | the very top of our prompt. I'll explain a little bit more about that pretty soon. Following this
00:03:25.920 | we have our external information or contexts, and these are additional pieces of information that we
00:03:32.960 | feed into the model via the prompt. These can be things that we manually insert into the prompt,
00:03:39.920 | information that we pull in through a long-term memory component, a vector database or we get it
00:03:46.880 | through other means like a web search API or a calculator API, something along those lines.
00:03:53.120 | Following that we have our user input. Now that's pretty obvious that's just the input from a
00:03:58.480 | particular user. It depends on what you're doing like if you have a text summarization use case
00:04:04.160 | they might input a two-page chunk of text and we might want to summarize that into a paragraph
00:04:10.400 | or on the other side maybe it's a question answering tool and in that case the user might
00:04:16.000 | just type in a few words and question mark and that is their question, that is their user input.
00:04:21.520 | So of course that can vary as well. Then finally we have our output indicator this is essentially
00:04:28.320 | the start of what we would like the model to begin generating. So it's kind of like a way of
00:04:34.560 | indicating to the model hey okay now it's time for you to start you know writing something and
00:04:39.440 | I want you to start writing something based on this first little chunk of text. So a good example
00:04:46.480 | or very clear example, at least in my view, is when you have a code generation model and you want it to
00:04:52.160 | generate Python code; you give it instructions to do so, then your output indicator will just be
00:04:57.680 | the word import, all in lowercase, because most Python scripts will actually begin with the word
00:05:04.880 | import because you're going to be importing your libraries like import numpy and so on.
00:05:09.840 | On the other hand if you were building like a conversational chatbot this output indicator
00:05:16.560 | might be like the name of the chatbot followed by a colon as if you're sort of in a chat log.
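As a small illustration of those two kinds of output indicator (illustrative strings, not copied from the notebook):

```python
# Output indicator for code generation: end the prompt with "import" so the
# model continues from a typical Python script opening.
code_prompt = (
    "Write a Python script that loads a CSV file and prints its shape.\n\n"
    "import"
)

# Output indicator for a chatbot: the bot's name and a colon, as in a chat log.
chat_prompt = "The following is a conversation with a chatbot.\n\nUser: Hello!\nChatbot:"
```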
00:05:23.280 | Okay so they're the four main components of a prompt that we're going to talk about and that
00:05:29.120 | we're going to actually use to construct our prompts throughout this video. Okay so let's
00:05:34.160 | have a look at an example. Here we have our prompt; up here, this is the instruction, okay,
00:05:41.200 | so that's right at the top of the prompt: answer the question based on the context below, if the
00:05:46.080 | question cannot be answered using the information provided, answer with "I don't know". Okay so this
00:05:51.040 | is a form of conservative Q&A so given a question a user question we want the model to answer based
00:05:59.040 | only on information that we can verify. Okay and that verified information is something that
00:06:05.600 | is our external information here our context that is also fed into the prompt right and we're saying
00:06:12.720 | if that information is not contained within this context here (or we'll usually have a list
00:06:18.320 | of contexts), I want you to say I don't know. In this case if the model answers or makes something up
00:06:24.240 | it can lead on to pretty bad results so we really don't want it to make anything up
00:06:28.720 | because they tend to do that pretty often. So we have our instructions we have our external
00:06:34.480 | information or the context then we have user query which is down here okay so the question
00:06:40.080 | which libraries and model providers offer LLMs and then we have that final bit is that output
00:06:47.440 | indicator so this is like okay now you can start answering the question. Now this is a pretty good
00:06:53.920 | prompt it's clear we have our instructions we have some more external information we have a question
00:07:00.720 | we have the output indicator at the end there.
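Before moving into the notebook, here's a rough sketch (assembled from the description above, not copied verbatim from the video) of those four components as Python strings:

```python
# The four prompt components from the example above; the context text is an
# illustrative placeholder.
instructions = (
    'Answer the question based on the context below. If the question cannot '
    'be answered using the information provided, answer with "I don\'t know".\n\n'
)
context = "Context: Large Language Models (LLMs) are the latest models used in NLP.\n\n"
query = "Question: Which libraries and model providers offer LLMs?\n\n"
output_indicator = "Answer: "

prompt = instructions + context + query + output_indicator
print(prompt)
```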
00:07:06.240 | Okay, so let's have a look at how we will actually implement these things. So we're going to work through this notebook here; if you'd like to follow
00:07:11.600 | along and run the notebook yourself you can do there'll be a link in the top of the video right
00:07:15.920 | now and also in the video description. The first thing we need to do is pip install the OpenAI
00:07:21.600 | library initially that's the only library we need there'll be another one a little bit later on
00:07:25.520 | which I'll explain when we get to it and we've come down to the first code block we see we have
00:07:30.960 | this prompt this is the same one I just went through so I'm going to run that and then
00:07:36.560 | initialize my OpenAI instance using my OpenAI API key. So if you need this you can get it from
00:07:46.320 | here, which is just at beta.openai.com/account/api-keys, which you can access if you
00:07:54.400 | log into your account; you create a new secret key and just copy that into the notebook. Okay so once
00:08:00.400 | you have authenticated with that we're going to generate from that prompt that we just created.
00:08:05.360 | Okay so all we do is we go to the completion endpoint's create method, we're using the
00:08:10.800 | text-davinci-003 model, it's one of the most recent instructional models, and then we print out the response, which
00:08:18.000 | will be in this path of the JSON returned. Now here you can see that it stops pretty suddenly here,
00:08:25.760 | and the reason for that is that our max token length isn't very long, and we'll explain
00:08:31.520 | that a little bit more later on but what we first need to do is just increase that length right now
00:08:37.840 | and what we're going to do is just set that to 256 so max tokens equals 256. Okay and let's just
00:08:47.520 | see if that does answer the question, which libraries and model providers offer large language
00:08:53.040 | models, and that's exactly right. Okay so we have HuggingFace, OpenAI and Cohere.
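For reference, the generation call described here, using the pre-v1 openai Python library, looks roughly like this (a sketch; the parameters shown are the ones mentioned in the video):

```python
# pip install openai
import openai

openai.api_key = "YOUR_API_KEY"  # created at beta.openai.com/account/api-keys

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,  # without this the completion gets cut off early
)
# The generated text sits at this path in the returned JSON.
print(response["choices"][0]["text"].strip())
```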
00:09:00.720 | Then alternatively, if we do not have the correct information within the context, the model should
00:09:06.240 | say I don't know, because we have that little bit in the instructions. So I'm just going to put in the context
00:09:12.000 | "libraries are a place full of books", and we would hopefully expect that the model is going
00:09:17.200 | to output I don't know. Now let me just copy the max tokens again, put that in here, and we see
00:09:27.440 | that it follows our instructions, it says I don't know. Okay great, so that's just a simple prompt.
00:09:33.120 | now what's next let's come down here and what we're going to talk about is the temperature within
00:09:40.160 | our completion endpoint. So we can think of the temperature parameter as telling us how random
00:09:50.640 | the model can be, or how creative the model can be, and it simply represents the probability of
00:09:57.360 | the model choosing a word that is not actually its first choice, and this works
00:10:03.440 | because when the model is predicting tokens or words it is actually assigning a probability
00:10:10.880 | distribution over all possible words or tokens that it could output. So you know let's say we
00:10:17.280 | have all of these different words here, or tokens I should say, and there are tens of thousands
00:10:22.960 | of these (it's not six), but what we're essentially doing is the model is going through
00:10:27.840 | these and it's kind of assigning a probability distribution so maybe this one is a high
00:10:31.600 | probability this one's low again this one's kind of big and this one up here is the most likely one
00:10:38.240 | right so this is the probability here and in this case if we have the temperature set to zero
00:10:45.040 | the word that is going to be chosen is this one here okay because there's no randomness in the
00:10:51.440 | model it's just going to always choose the highest probability token. If instead we turn the temperature
00:10:58.320 | up to one there is a lot more randomness it may still choose this token here because it has the
00:11:05.200 | highest probability but it will also consider this because there's still a decent probability there
00:11:11.360 | okay and to a lesser extent it will also consider this bit here this token to a lesser degree this
00:11:17.440 | one to a lesser degree this one and so on right so by increasing the temperature we increase
00:11:25.520 | the weighting, almost, of these other possible tokens as being the selected tokens within the
00:11:33.520 | generation of the model, and this will generally lead to more creative or kind of random outputs.
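To make that concrete, here's a small illustrative sketch (not from the video) of how a temperature parameter rescales a toy distribution over candidate tokens before sampling:

```python
import numpy as np

def sample_token(logits: np.ndarray, temperature: float) -> int:
    """Sample a token index from raw scores after temperature scaling."""
    if temperature == 0:
        # No randomness: always pick the highest-probability token.
        return int(np.argmax(logits))
    scaled = np.exp(logits / temperature)
    probs = scaled / scaled.sum()
    return int(np.random.choice(len(logits), p=probs))

logits = np.array([4.0, 3.5, 2.0, 0.5])  # toy scores for four candidate tokens
print(sample_token(logits, temperature=0))    # always token 0
print(sample_token(logits, temperature=1.0))  # usually token 0, sometimes others
```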
00:11:41.040 | So considering this, if we have our conservative fact-based Q&A, we might actually want to
00:11:50.000 | turn the temperature down, more towards zero, because we don't want the model to make
00:11:55.280 | anything up, we want it to be not creative and just factual. Whereas if the idea is we want to produce
00:12:04.560 | some creative writing or some interesting chatbot conversations, then we might turn the temperature
00:12:11.760 | up, because it will usually produce something more entertaining and interesting, and to some degree
00:12:17.360 | kind of surprising in a good way. So let's take a look at what that might look like. So here
00:12:23.120 | we're going to create a conversation with an amusing chatbot. So again, I just want to
00:12:29.840 | add in the max tokens here, okay, and I'll add that in. So we're going to start,
00:12:36.960 | and the temperature is going to be very low; the default is one with the OpenAI
00:12:42.800 | completion endpoint, and we're running at zero. So: the following is a conversation with a funny chatbot,
00:12:48.160 | the chatbot's responses are amusing and entertaining. Now this is like the instructions,
00:12:53.920 | okay, and then down here we have the style of conversation, we have the user's input here, and
00:13:00.080 | then we have our next output indicator.
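The call being run is along these lines (a sketch based on the description; the exact user input in the notebook may differ):

```python
chat_prompt = """The following is a conversation with a funny chatbot. The chatbot's responses are amusing and entertaining.

User: Hi, what are you up to?
Chatbot: """

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=chat_prompt,
    temperature=0.0,  # very low temperature: predictable, "safe" answers
    max_tokens=256,
)
print(response["choices"][0]["text"].strip())
```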
00:13:06.160 | So let's run this, and we get: "oh, just hanging out and having a good time, what about you". You know, it's pretty predictable, it's not that
00:13:09.920 | interesting it's definitely not funny it's not amusing or entertaining now let's run this again
00:13:16.000 | last time I got a good answer to this but it doesn't always provide a good answer so
00:13:21.120 | you know we can try. So let me put in the max tokens again. It's a little more interesting:
00:13:26.560 | hang out with my electronic friends it's always a good time a bit better hang out contemplating
00:13:32.720 | the meaning of life okay so a few better answers I don't think any of these are as good as the first
00:13:39.360 | one I got but they're not bad and definitely much better than the you know the first one we have
00:13:46.720 | here which is just a bit kind of plain and boring so let's move on to what we would call few shot
00:13:56.560 | training for our model now what we'll often find is that sometimes these models don't quite get
00:14:04.880 | what we are looking for and we can actually see that in this example here so the following is
00:14:11.280 | a conversation, similar thing again, with an AI assistant; the assistant is sarcastic and witty, producing creative
00:14:16.960 | and funny responses to the user's questions, here are some examples. And then in here what we can do
00:14:22.880 | is actually put in some examples but before we do that I just want to remove that and I want to show
00:14:28.640 | you what we get to begin with so let's run this we've turned the temperature up so it should come
00:14:36.960 | up with something kind of creative, to a degree, but we'll see that it's not particularly
00:14:42.560 | interesting. Okay so yeah, maybe if you want a serious answer this is what you're looking for,
00:14:49.040 | but I'm not asking for anything serious, I want something sarcastic, witty, creative and amusing,
00:14:54.720 | right so what if we come down here and we actually add a few examples to our prompt okay so this is
00:15:03.520 | what we would refer to as few shot training we're adding a few essentially training examples into
00:15:09.600 | the prompt. So we're going to say, okay, user: how are you? The AI is just kind of being sarcastic: I can't
00:15:15.120 | complain, but sometimes I still do. The user will ask what time is it, and the AI says it's time to get
00:15:20.480 | a watch. Okay, and let's see if we get a less serious answer to "what is the meaning of life".
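That few-shot prompt looks roughly like this (reconstructed from the description, so treat the exact wording as approximate):

```python
few_shot_prompt = """The following is a conversation with an AI assistant. The assistant is sarcastic and witty, producing creative and funny responses to the user's questions. Here are some examples:

User: How are you?
AI: I can't complain, but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=few_shot_prompt,
    temperature=1.0,  # keep the temperature up for more creative output
    max_tokens=256,
)
print(response["choices"][0]["text"].strip())
```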
00:15:26.640 | Again, let me put in the max tokens. The previous answer was pretty good; I don't know if we're going
00:15:32.160 | to get a good one like that again but let's try okay and you know we get something we get something
00:15:39.760 | good again: as the great philosopher Shrek once said, "Fiona, the meaning of life is to find your passion",
00:15:46.640 | so kind of useful but also pretty amusing so this is a much better response and we got that by just
00:15:55.840 | providing a few examples beforehand so we did some few shot training we showed a few training
00:16:03.600 | examples to the model and all of a sudden it can produce a much better output. Now the next thing I
00:16:09.600 | want to talk about is adding multiple contexts. I think it was in the
00:16:15.520 | first example that we had a context in there, but we manually wrote that, okay, we added that in there;
00:16:20.720 | in reality what we do is something slightly different so let's consider the use case of
00:16:26.080 | question answering. With question answering, we want the model to be factual, we don't want it to make
00:16:32.080 | things up, and ideally we would also like the model to be able to source where it's getting
00:16:39.360 | this information from so what we essentially want here is some form of external source of information
00:16:46.320 | that we can feed into the model and we can also use to kind of fact check that what the model
00:16:52.800 | is saying is actually true or is at least coming from somewhere that is reliable now when we are
00:17:00.240 | feeding this type of information into a model via the prompt, we would refer to it as source knowledge,
00:17:06.960 | and source knowledge, as you might expect, is just any type of knowledge that is fed into the model
00:17:14.400 | via the prompt and this is kind of the I don't want to say the opposite but this is an alternative
00:17:21.760 | to what we would call parametric knowledge which is knowledge that the model has learned during
00:17:27.600 | the training process and stores within the model weights themselves. So that means, if you ask
00:17:33.040 | the model who was the first man on the moon, it's probably going to be able to say Neil Armstrong
00:17:38.160 | because it's kind of remembered that from during its training when it's seen you know tons and tons
00:17:43.120 | of human information and data but if you ask more specific pointed questions it can sometimes make
00:17:50.720 | things up or can provide an answer which is kind of generic and not actually that useful and that's
00:17:57.200 | where we would like to use this source knowledge to feed in more useful information now in this
00:18:04.080 | example we're just going to feed in a list of dummy external information. So in reality we'd
00:18:10.560 | probably use something like a search engine API or a long-term memory component rather than just relying
00:18:16.880 | on a list like we're doing here but for the sake of simplicity this is all we're going to do so we
00:18:22.480 | have a few contexts here. They're talking about large language models, the latest models used in NLP,
00:18:28.160 | so on and so on; it also talks about getting your API key from OpenAI, talks about OpenAI's
00:18:36.000 | API being accessible via the openai library, or some bits of code, and down here we also talk about
00:18:41.600 | accessing it via the LangChain library. Now it's going to use all this information, and it's going
00:18:46.880 | to use all that to build a better prompt and create a better output so what we do is we have
00:18:53.280 | our instructions at the top here as we did before then we have our external information our context
00:18:59.520 | Now for GPT-3 in particular, what they recommend is that you separate your external information
00:19:06.960 | or your context from the rest of the prompt using three hash characters (###), or you can also
00:19:14.480 | use three quote characters ("""). Okay, we're going to stick with the hashes. And then
00:19:23.760 | when it comes to the contexts themselves, you also separate them, each one of the unique contexts, so
00:19:29.280 | you know we have one here, one here, and so on; we're separating each one of those with just two
00:19:34.560 | hash characters as well. Then we have our question, and then we have our output indicator.
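Put together, the prompt construction described here can be sketched like this (the context texts are abbreviated placeholders standing in for the notebook's list):

```python
contexts = [
    "Large Language Models (LLMs) are the latest models used in NLP...",
    "To use OpenAI's models you first need an API key...",
    "OpenAI's API is accessible via the openai Python library...",
    "LLMs can also be accessed via the LangChain library...",
]

# Join the individual contexts with a two-hash separator, then wrap everything
# with the instructions, the ### delimiters, the question, and the output indicator.
context_str = "\n\n##\n\n".join(contexts)

qa_prompt = f"""Answer the question based on the context below. If the question cannot be answered using the information provided, answer with "I don't know".

Context:
###
{context_str}
###

Question: Give me two examples of how to use OpenAI's GPT-3 model using Python, from start to finish.

Answer: """
print(qa_prompt)
```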
00:19:42.720 | Now let me actually just copy this, I'm going to put it up here so we can just see
00:19:49.040 | what we're actually building here. Oh, we need to run this context here; now let's run this again,
00:19:56.320 | okay, and I need to print it because it's a mess. Okay cool, so still kind of messy, but it
00:20:06.880 | works, and it also points out to me that I've missed a little bit. So we have the instructions here,
00:20:13.680 | and then we have our separator, and then we have our context. Now actually here I've got the
00:20:20.160 | thing to separate them, but that's not exactly what we want, we actually want to have some new
00:20:25.840 | line characters in there as well, so add those, and to actually do that we're going to need to
00:20:36.800 | separate this bit as well, so put the context string in there, and then we also here just put in the
00:20:46.960 | context string directly. So come to here and then we get this sort of nicer format. Just by the way,
00:20:59.280 | it will work even if you don't have this nice format, but we should try and format it like this,
00:21:06.560 | and so we have our context and each one of them is separated, okay, and then you go down, you have
00:21:12.960 | a question and the answer at the end there, that is what we want. I'm going to replace this with
00:21:18.080 | the context string, and then if we come down to here I'm also going to add in our max tokens.
00:21:24.880 | Great, and let's run that. Okay cool, so the question, we'll just go up to the top here:
00:21:35.040 | answer the question based on the context below, and then answer I don't know if it doesn't know
00:21:40.800 | the answer, you know, same as before. Give me two examples of how to use OpenAI's GPT-3 model using
00:21:48.240 | Python from start to finish. Okay so what we have is two options, we can either use it via the openai
00:21:56.080 | library or we can go via LangChain; both of these are correct. Now the one question here is, okay, we added in
00:22:02.560 | these contexts, but did it actually need those? Can we do this prompt without context and still get a
00:22:08.320 | decent answer? We can try. So: answer the question, and we don't have any context here, same question,
00:22:15.120 | same output indicator. Let's run that. Oh, one thing we still need is the max tokens again.
00:22:22.000 | Okay, and yeah, we get this: using OpenAI's GPT-3 model with Python to generate text.
00:22:31.200 | You know, yeah, that is true, but it's not very useful, and then here it's saying using
00:22:39.040 | GPT-3 to generate images, which isn't even possible. So yeah, not really, you know, not a good
00:22:45.600 | answer essentially. So this is where we see that external information, that source
00:22:52.640 | knowledge, as actually being pretty useful. Now considering how big our prompts got with those
00:22:59.200 | contexts and you know that wasn't even that much information being fed into our prompts
00:23:05.200 | how big is too big like at what point are our prompts too large for us to actually use the
00:23:12.640 | model? This is obviously a pretty important thing, because if we go too big we're going to start
00:23:17.680 | throwing errors. So let's begin taking a look at that. So for text-davinci-003, what we
00:23:26.800 | call the context window, which is the maximum number of tokens that text-davinci-003 can handle
00:23:35.040 | within both the prompt and also the completion creation, that is 4097 tokens, okay, not words
00:23:46.720 | but tokens. So we can set the maximum completion length of our model, as we saw before where we were
00:23:54.400 | setting the max tokens, but that cannot cover the whole 4097, because we need to consider
00:24:01.120 | the number of tokens within our prompt as well the only problem is how do we measure the number
00:24:06.560 | of tokens within our prompt. Okay, so for that we need to use OpenAI's tiktoken tokenizer,
00:24:13.200 | right, so for that you'll need to pip install tiktoken. Taking the earlier prompt, and we'll just stick
00:24:18.640 | with the one where I haven't added in the new lines here, let's take a look at how we can
00:24:23.440 | actually measure the number of tokens in there. So we need to import tiktoken, we create our prompt,
00:24:30.960 | then we have our encoding name, which I will explain in a moment,
00:24:37.040 | and then we have our tokenizer. So this is a tiktoken tokenizer which we initialize using
00:24:45.600 | tiktoken.get_encoding with that encoding name, and the encoding name is important, it varies for each model,
00:24:50.320 | and then we just tokenize, i.e. encode, our prompt, check the length of that, and we get 412.
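That counting step, in code (the 412 figure is specific to the video's prompt; yours will differ):

```python
import tiktoken

# text-davinci-003 uses the p50k_base encoding.
tokenizer = tiktoken.get_encoding("p50k_base")

prompt_tokens = tokenizer.encode(prompt)
print(len(prompt_tokens))  # 412 for the prompt in the video

# Whatever is left of the 4097-token context window can go to the completion.
max_completion = 4097 - len(prompt_tokens)  # 3685 here
```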
00:24:59.360 | So we're feeding this prompt into our text-davinci-003 model, which is going to use 412 of our context
00:25:06.480 | window tokens, okay, of that 4097 that you see here; that leaves us with 3685 tokens for the
00:25:18.080 | completion. So that max tokens, we can set it to that number, but no higher. Now one
00:25:25.120 | important thing to note here is that not all OpenAI models use the p50k_base encoding. There is
00:25:33.760 | this link here which will take you through to a table of the different encodings, but in short,
00:25:40.160 | as of recording this video they were as follows: for most GPT-3 models and also GPT-2 you're
00:25:46.880 | going to be using the r50k_base (or gpt2) encoding; for the code models and recent instructional
00:25:54.400 | models we use p50k_base; and for the embedding model text-embedding-ada-002 we use cl100k_base.
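In code form, that mapping (as of the video's recording; check tiktoken's own table for current models) is roughly:

```python
# Encoding name by model family, per OpenAI's table at the time of recording.
MODEL_ENCODINGS = {
    "gpt2": "r50k_base",                      # GPT-2 and most GPT-3 models
    "text-davinci-003": "p50k_base",          # code + recent instructional models
    "text-embedding-ada-002": "cl100k_base",  # the ada-002 embedding model
}
```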
00:26:04.880 | So if we now set our max tokens to this and generate, let's take a look at what we get. Now, one thing you
00:26:11.680 | will note straight away is that the completion time is longer it's because we have more maximum
00:26:18.080 | tokens; even if it doesn't fill that entire space with tokens, the computation does take
00:26:24.240 | longer. Now let's see what we have here. It's pretty much the same as what we got before,
00:26:32.720 | even though we've increased the number of tokens because it doesn't need to use all of that space
00:26:37.920 | so the model doesn't. Now, what happens if we increase max tokens by one more? Right, so if we
00:26:45.440 | go to, you can kind of see already, if we go to 3,686, let's run that, and we get this invalid
00:26:53.520 | request error. That's because we've exceeded the maximum context length, so we need to be very
00:27:01.200 | cautious with that, and obviously if you're using this in an environment where you might expect
00:27:08.800 | the maximum context length to be exceeded, you should probably consider implementing something
00:27:15.360 | like this check into your pipeline.
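Such a check might be sketched like this (a hypothetical helper, not from the video, reusing the tokenizer from above):

```python
CONTEXT_WINDOW = 4097  # text-davinci-003's context window (prompt + completion)

def safe_max_tokens(prompt: str, requested: int) -> int:
    """Cap the requested completion length so prompt + completion always fit."""
    used = len(tokenizer.encode(prompt))
    available = CONTEXT_WINDOW - used
    if available <= 0:
        raise ValueError("Prompt alone exceeds the model's context window")
    return min(requested, available)
```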
00:27:24.080 | Now that is actually everything for this video. We've been through a fair few things when it comes to building better prompts and how to handle
00:27:30.880 | some of the key parameters within your prompt completion inputs so as I mentioned prompts are
00:27:39.680 | super important, and if you don't get them right your output is not going to be that great, to
00:27:46.400 | be honest. So it's worth spending some time just learning more about prompts, not just what we've
00:27:52.640 | covered here but you know maybe more and just especially more than anything experimenting
00:27:58.480 | with your prompts and considering other things depending on what you're doing like do you need
00:28:03.920 | to be pulling in more information for your prompts from external sources of information
00:28:08.800 | do you need to be modifying the temperature are there other variables in the completion endpoint
00:28:13.200 | that you could be modifying? All these things are pretty important to consider if you're, you know,
00:28:19.360 | actually building something with any real value using large language models and another thing to
00:28:26.000 | point out is that this is not specific to GPT-3 models; if you want to use Cohere's generation
00:28:34.640 | endpoints or completion endpoints, or you want to use open-source Hugging Face models, like you want
00:28:41.280 | to use BLOOM or something for completion, you should also consider these prompt engineering
00:28:48.320 | rules of thumb or tricks. Beyond that, I think we've covered everything, so that's it for this
00:28:56.000 | video I hope it's been useful and interesting but for now we'll leave it there so thank you
00:29:03.280 | very much for watching and I will see you again in the next one. Bye!