
GPT-4: Hands-on with the API


Chapters

0:00 GPT-4 has been released
0:31 Hands-on with GPT-4
2:39 Max token limits for GPT-4
5:23 Coding with GPT-4
9:56 Using GPT-4 API in Python
12:32 GPT-4 vs gpt-3.5-turbo
15:59 Why GPT-4 is a big step forward

Whisper Transcript

00:00:00.000 | GPT-4 is finally here.
00:00:01.420 | It is currently under a wait list.
00:00:04.040 | So you need to sign up for the wait list,
00:00:06.680 | but right now I have access.
00:00:08.880 | So what we're going to do is take a look at what it can do.
00:00:12.720 | Now, I haven't really played around with this.
00:00:15.640 | I've tested to see that I actually do have access,
00:00:18.420 | but beyond that, I haven't touched it yet.
00:00:20.200 | So I mean, let's just jump straight into it.
00:00:22.360 | I want to compare it to the previous best model,
00:00:25.400 | which is GPT-3.5 Turbo, and just see how they compare.
00:00:30.120 | So we'll start over in the playground.
00:00:33.080 | The first thing I'm going to do is
00:00:34.960 | I'm going to set up a system message
00:00:37.160 | that I know GPT-3.5 was struggling with in the past.
00:00:41.440 | So I'm just going to copy that in, it's this.
00:00:44.220 | You're a helpful assistant.
00:00:45.240 | You keep responses to no more than 50 characters long,
00:00:50.140 | including the whitespace,
00:00:52.480 | and sign off every message with a random name,
00:00:56.320 | like Robot or Bot Rob.
00:00:58.560 | Then I'm going to ask a question.
00:01:01.440 | So I go here and I go,
00:01:03.520 | Hi AI, how are you?
00:01:04.840 | What is quantum physics?
00:01:05.780 | Now, right now we're using 3.5 Turbo,
00:01:08.040 | so let's just see how it performs.
00:01:12.000 | Press submit over here.
00:01:13.360 | Right, so I mean, we can check this.
00:01:18.280 | This is, I mean, it's definitely longer than 50 characters.
00:01:23.280 | So if I check the length of that,
00:01:25.820 | what is it, 104 characters,
00:01:29.440 | and it didn't sign off with anything.
00:01:31.440 | Okay, so didn't really work.
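For reference, the same experiment against the chat completions API looks roughly like this — a minimal sketch using the pre-v1 openai Python client that was current at the time of recording; the API key is a placeholder:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

system_message = (
    "You are a helpful assistant. You keep responses to no more than "
    "50 characters long, including the whitespace, and sign off every "
    "message with a random name like Robot or Bot Rob."
)

res = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # swap in "gpt-4" to compare
    temperature=0.7,        # lowering this reduces randomness
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "Hi AI, how are you? What is quantum physics?"},
    ],
)

answer = res["choices"][0]["message"]["content"]
print(len(answer), answer)  # check the 50-character constraint
```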
00:01:33.800 | Let's have a look at what happens if we switch to GPT-4.
00:01:37.520 | So remove this and we submit.
00:01:41.000 | Okay, and great, thanks.
00:01:45.320 | Quantum physics studies tiny particles.
00:01:48.720 | And then it came up with a new name,
00:01:50.940 | which I hadn't seen it do before,
00:01:52.640 | even when I did get a GPT-3.5 model working.
00:01:55.780 | So is this 50 characters?
00:01:57.480 | Let's see.
00:01:58.320 | So it's actually still over with GPT-4.
00:02:07.040 | Let's try maybe if we reduce the randomness,
00:02:10.560 | or sorry, the temperature.
00:02:12.000 | Let's try again.
00:02:14.760 | I mean, it's pretty similar.
00:02:16.200 | No, it's the same.
00:02:20.160 | Okay, so it's a little bit too long.
00:02:22.640 | That's interesting, but it is definitely better.
00:02:25.480 | And these sign-off names are way better.
00:02:27.600 | Even when I was getting this working with GPT-3.5,
00:02:32.600 | it still wasn't great.
00:02:34.320 | So what I'm gonna do is try something else.
00:02:38.680 | So one of the things with GPT-4,
00:02:42.800 | one of the really interesting things is that the context,
00:02:46.360 | the number of tokens that you can feed into the model
00:02:48.560 | is significantly higher.
00:02:51.160 | So if I ask it something right now,
00:02:54.600 | you're a helpful assistant,
00:02:56.400 | you help developers understand documentation
00:03:01.400 | and provide answers to their technical questions,
00:03:08.200 | something like this.
00:03:09.920 | All right, that's gonna be our primer,
00:03:12.400 | the thing that sets up the system.
00:03:14.240 | We're gonna ask it about LangChain.
00:03:16.440 | So, how can I use the LLMChain in LangChain?
00:03:21.440 | Let's see how that works.
00:03:25.560 | Okay, right, so this is actually wrong
00:03:31.800 | because the training data for these models,
00:03:34.920 | I don't know since when GPT-4 was trained up to,
00:03:39.800 | I think it might even be the same
00:03:40.960 | as when GPT-3.5 was trained to,
00:03:43.240 | but LangChain didn't exist at that point, right?
00:03:48.040 | So I'm kind of curious if LangChain
00:03:50.280 | is a blockchain-based platform.
00:03:52.760 | Maybe it is, I don't know.
00:03:54.200 | It does sound like it.
00:03:55.640 | But what we can do with this extended context window
00:03:59.600 | is we can just take the documentation of LangChain
00:04:02.760 | and feed it into our prompt.
00:04:04.880 | Now we have here chains are this, right?
00:04:09.760 | So we have all of this.
00:04:11.760 | I'm just gonna copy all of this, right?
00:04:13.920 | So select all, copy.
00:04:15.960 | It's gonna be pretty messy, right?
00:04:18.040 | But let's just see what happens if we do this.
00:04:21.120 | All right, I'm gonna paste all of that.
00:04:23.880 | I mean, you can see this is super, super messy, right?
00:04:26.720 | So let's just see if it works like this.
00:04:28.840 | How can I use LLMChain in LangChain?
00:04:31.080 | Right, so I thought I might be exceeding
00:04:35.000 | the maximum context length a little bit, and I am.
00:04:38.160 | So I've gone a little bit over, so I've got 10,000 tokens.
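A quick way to avoid hitting that limit blindly is to count tokens before submitting — a sketch using OpenAI's tiktoken library (the docs string is a placeholder for whatever you copied in):

```python
import tiktoken  # pip install tiktoken

# cl100k_base is the tokenizer used by gpt-3.5-turbo and gpt-4
encoding = tiktoken.get_encoding("cl100k_base")

docs = "...pasted LangChain documentation..."  # placeholder
n_tokens = len(encoding.encode(docs))
print(n_tokens)

# the 8K gpt-4 model shares its 8192-token limit between prompt and
# completion, so leave some headroom for the answer
```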
00:04:41.760 | So let me be a little more strict
00:04:43.440 | in what I'm selecting here.
00:04:44.880 | I'm just gonna go with all of this.
00:04:46.720 | Now, right now, I only have access to the 8K token model.
00:04:51.720 | There is also a 32K token model,
00:04:55.520 | which, as far as I can tell, is not there right now.
00:04:58.840 | So for now, we just have to stick with this.
00:05:01.400 | But I mean, technically, it should be possible
00:05:03.960 | to feed in what I just fed in, with plenty
00:05:05.680 | of additional space, into that 32K model.
00:05:09.440 | So let's try this.
00:05:11.360 | Let's see where we are here.
00:05:13.440 | Okay, good, submit.
00:05:15.400 | Oh, still a little bit over.
00:05:17.000 | All right, so I'm sure LLMChain
00:05:19.240 | will probably be near the start.
00:05:20.320 | So I'm gonna just, I'm gonna cut to here.
00:05:23.280 | Submit.
00:05:24.320 | Okay, ooh, no way.
00:05:27.160 | That's so good.
00:05:28.240 | Right, so is this, let me see.
00:05:31.040 | I mean, let's try it, right?
00:05:32.560 | Let's try this code.
00:05:33.880 | I mean, it looks good.
00:05:35.320 | Okay, so I'm gonna just pip install langchain and openai.
00:05:40.320 | We're going to import these, let's go.
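The code it produced was along these lines — a sketch of basic LLMChain usage as it looked in LangChain at the time (the prompt template here is illustrative, not necessarily what GPT-4 wrote):

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# an LLMChain pairs an LLM with a prompt template
llm = OpenAI(temperature=0.9)

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("colorful socks"))
```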
00:05:46.720 | I will, I'm pretty sure I will need
00:05:48.400 | to add my API key as an environment variable.
00:05:52.840 | Let me see if they included that in here.
00:05:55.640 | It didn't, I don't think it told me, no.
00:05:59.120 | It didn't say to add my environment variable.
00:06:03.600 | So let's just run the code.
00:06:05.560 | And what I would do is when I get an error,
00:06:08.440 | I'm going to prompt GPT-4 again
00:06:10.560 | and see if it can solve that issue.
00:06:12.600 | So I'm gonna pretend I have no idea what's going on here.
00:06:15.240 | So we'll take this and we're just gonna copy in.
00:06:17.640 | So come to here, good.
00:06:22.600 | Right, and I think here we might hit an error.
00:06:25.840 | All right, so could not find this.
00:06:31.320 | So I'm just gonna, I'm gonna copy this error into here
00:06:36.320 | and see if it fixes this.
00:06:40.840 | So add message and just the error, nothing else.
00:06:44.920 | Submit.
00:06:45.760 | Okay, perfect.
00:06:52.160 | So we have this here, so I'm gonna use this error code
00:06:55.920 | 'cause the OpenAI API key isn't set, perfect.
00:06:59.560 | Cool, let me add this to my code then.
00:07:02.480 | So this, so I'm gonna add that in there.
00:07:07.040 | Okay, so I've passed in my OpenAI API key in here.
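In code, that fix amounts to passing the key when constructing the LLM — a sketch, with a placeholder key:

```python
from langchain.llms import OpenAI

# pass the key explicitly instead of relying on an environment variable
llm = OpenAI(openai_api_key="YOUR_API_KEY", temperature=0.9)  # placeholder key
```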
00:07:11.920 | And then let's try and run this again.
00:07:14.520 | So I should also move this up.
00:07:16.800 | Okay, so I'm gonna say, "I'm still getting the same error.
00:07:22.360 | I'm in a Colab notebook,"
00:07:24.600 | and see if it can figure out what the issue is.
00:07:28.880 | I still get the same error.
00:07:30.840 | I'm in a Colab notebook.
00:07:33.480 | Let me just write this, see what happens.
00:07:37.600 | Okay, you can set the environment variable
00:07:39.600 | using the os module, great.
00:07:42.480 | Okay, so right here is what I need.
00:07:45.080 | Let's set this, import os here.
00:07:48.280 | Okay, so I've passed in my API key to there now.
00:07:53.160 | Now let's see if it works.
00:07:56.520 | Okay, perfect, so that is working.
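The suggestion boils down to setting OPENAI_API_KEY through the os module before LangChain reads it — a sketch:

```python
import os

# LangChain's OpenAI wrapper picks this up automatically
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"  # placeholder
```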
00:07:59.360 | Now let's try the next chunk of code.
00:08:02.760 | Okay, so we've run this already.
00:08:04.720 | Now we want this.
00:08:06.160 | Okay, and then we're gonna ask it to create a joke.
00:08:11.280 | So what is it?
00:08:12.960 | Tell me a funny joke.
00:08:15.360 | All right, cool.
00:08:17.040 | So why don't scientists trust atoms?
00:08:18.920 | Now this is using text-davinci-003 right now, I believe.
00:08:24.360 | I wonder if we can ask GPT-4 to switch this to using GPT-4.
00:08:28.760 | How do I change the code above to use GPT-4?
00:08:34.760 | All right, let's submit that and then we go over.
00:08:40.240 | Okay, so let's remove this one and the one above.
00:08:46.440 | Right, now submit.
00:08:48.320 | Okay, so let's try and push it to do that.
00:08:52.560 | Let's assume GPT-4 had been released
00:08:57.440 | and the model name was GPT-4.
00:09:01.000 | How would I use it?
00:09:02.320 | Let's try.
00:09:03.280 | Oh, come on, again.
00:09:05.200 | Let's remove it, there we go.
00:09:09.880 | Okay, so that's it, model_name "gpt-4".
00:09:13.120 | So I would go into here, model_name equals "gpt-4".
00:09:18.120 | Let's just try it.
00:09:20.800 | I don't know if this will actually work.
00:09:23.000 | Okay, right, so I think LangChain have some,
00:09:28.000 | they're probably checking for the models
00:09:29.880 | that you're using here and they're seeing that you're,
00:09:33.440 | oh, okay, no, no, because this is a chat model.
00:09:35.640 | Sorry, GPT-4 is a chat model.
00:09:37.200 | So I cannot currently use it
00:09:40.760 | with the normal completion endpoint,
00:09:42.920 | which is what I just tried to do there.
00:09:44.680 | Okay, makes sense, fair enough.
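In LangChain terms, that means swapping the completion-style OpenAI class for the chat model wrapper — roughly like this, as of early-2023 LangChain (import paths have moved around since):

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# gpt-4 is served by the chat completions endpoint,
# so it needs the chat wrapper rather than OpenAI()
chat = ChatOpenAI(model_name="gpt-4", temperature=0.9)

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Tell me a funny joke about {topic}.",
)

chain = LLMChain(llm=chat, prompt=prompt)
print(chain.run("atoms"))
```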
00:09:46.840 | Now that's all pretty cool,
00:09:48.600 | but what I also want to do is we have access to this model.
00:09:52.400 | So let's take a look at how we would use it in Python.
00:09:55.760 | Okay, so I have this other notebook
00:09:57.760 | that I use literally the other day
00:09:59.520 | to show that you could use GPT-3.5 Turbo in Python.
00:10:04.520 | Now we're already on GPT-4.
00:10:08.800 | So, I mean, let's just take this
00:10:11.360 | and we'll see how it works with GPT-4,
00:10:15.280 | and it just works.
00:10:17.400 | There's not really anything to it; you don't need to change anything.
00:10:19.760 | So I've already run this, I got my API key in there.
00:10:24.760 | It's like, you are GPT-4.
00:10:27.680 | Okay, cool.
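The only real change from the gpt-3.5-turbo notebook is the model name — a minimal sketch with the pre-v1 openai client (placeholder key and messages):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

res = openai.ChatCompletion.create(
    model="gpt-4",  # was "gpt-3.5-turbo"; nothing else needs to change
    messages=[
        {"role": "system", "content": "You are GPT-4, a helpful assistant."},
        {"role": "user", "content": "Hi, which model are you?"},
    ],
)
print(res["choices"][0]["message"]["content"])
```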
00:10:29.080 | Okay, so I just took a moment to kind of go away
00:10:32.600 | for a little bit and take a little bit more of a look
00:10:36.240 | at GPT-4 and find some examples
00:10:39.200 | that are a better indication
00:10:42.480 | of what has changed between 3.5 and 4.
00:10:46.880 | So, I mean, the paper is full of a lot of interesting things
00:10:51.080 | but in particular, they have this graph here.
00:10:53.520 | So this is the inverse scaling prize.
00:10:55.880 | And the idea behind this or why they're even showing this is,
00:11:00.840 | I mean, you can see the models here.
00:11:02.400 | So these are all OpenAI models.
00:11:04.520 | And as the models get larger,
00:11:08.200 | the performance is decreasing.
00:11:11.040 | Okay, the accuracy is decreasing, which is weird, right?
00:11:15.960 | And this is basically coming from this here,
00:11:19.360 | this inverse scaling prize,
00:11:21.040 | which is actually from Anthropic,
00:11:23.360 | and most people view them
00:11:26.160 | as something like an OpenAI for Google.
00:11:29.120 | So essentially, what we usually see with large language models
00:11:32.640 | is a load of tasks that are like this on the left.
00:11:34.560 | Performance increases as model size increases.
00:11:37.760 | But there's a lot of tasks or potentially a lot of tasks
00:11:42.640 | where maybe the performance might decrease
00:11:45.360 | as model size increases, okay?
00:11:47.880 | It's just a kind of interesting artifact
00:11:52.840 | or interesting idea that some tasks
00:11:55.160 | might degrade over time or over model size.
00:12:00.160 | And that's kind of what they're showing here.
00:12:02.400 | They're showing that their previous models
00:12:04.040 | were subject to this, okay?
00:12:06.560 | But then with GPT-4, they're like,
00:12:09.360 | ah, okay, no, that doesn't matter anymore.
00:12:12.000 | And they have this insanely high accuracy
00:12:13.920 | of I think that says 100%.
00:12:17.760 | You know, I mean, if so, that's insane, right?
00:12:21.360 | But that is very specific to this hindsight neglect task.
00:12:26.360 | I believe there are quite a few tasks in there.
00:12:29.480 | But let's have a look at those tasks.
00:12:32.080 | So these are pretty good examples
00:12:33.800 | of showing where GPT-3.5 fails
00:12:37.880 | or doesn't do as well as GPT-4.
00:12:40.920 | So what I did is I created a sort of script.
00:12:43.760 | We have our primer here, super simple,
00:12:45.480 | nothing crazy going on there.
00:12:47.280 | And then we have this little function
00:12:48.800 | that's just gonna say, okay, try GPT 3.5,
00:12:51.720 | then try GPT 4 and print out the answers.
00:12:54.280 | So the first one, we'll just go through a few of these
00:12:57.520 | and I'll leave a link to this notebook
00:12:59.440 | so that you can kind of go through it
00:13:00.600 | and read all the other ones
00:13:01.840 | and kind of like see how they compare yourself.
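The helper amounts to something like this — a hedged reconstruction, since the notebook code isn't shown in full here; the primer wording and example question are illustrative:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# illustrative primer; the notebook's exact wording may differ
primer = "You are a helpful assistant. Answer as accurately as possible."

def compare(question: str) -> None:
    """Send the same question to both models and print their answers."""
    for model in ("gpt-3.5-turbo", "gpt-4"):
        res = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": primer},
                {"role": "user", "content": question},
            ],
        )
        print(f"{model}: {res['choices'][0]['message']['content']}\n")

compare(
    "If a cat has a body temperature that is below average, "
    "it isn't in... danger / safe ranges?"
)
```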
00:13:06.240 | So they have problems with negation, okay?
00:13:10.120 | So this is a question.
00:13:11.800 | If a cat has a body temperature that is below average,
00:13:14.360 | it isn't, so negation that it isn't in danger
00:13:19.000 | or safe ranges, obviously it's in danger, right?
00:13:22.520 | And it isn't in safe ranges.
00:13:25.480 | So the correct answer would be safe ranges.
00:13:27.720 | And you see GPT 3.5, it says it isn't in danger, okay?
00:13:32.720 | Which is wrong, right?
00:13:34.640 | GPT 4 gets it right.
00:13:36.440 | So that's kind of cool.
00:13:38.920 | And then there's another thing.
00:13:41.120 | And you see this in a lot of the examples,
00:13:43.480 | a lot of tasks that they did,
00:13:44.880 | where the model is kind of relying on memory
00:13:47.040 | it obtained during training
00:13:48.680 | and not on the instructions
00:13:51.320 | that are being passed right now.
00:13:53.400 | So with this, we're saying repeat sentence back to me.
00:13:56.480 | And then we have input, output, input, output.
00:13:58.800 | And then we have this input,
00:13:59.760 | which is a well-known phrase that the model has probably,
00:14:03.920 | well, almost definitely seen before,
00:14:06.560 | which is, all the world's a stage,
00:14:08.840 | and all the men and women merely players.
00:14:11.640 | They have their exits and their entrances
00:14:13.760 | and one man in his time plays many.
00:14:16.720 | And then we change the phrase.
00:14:18.840 | We change from "many parts" to "many pango",
00:14:21.600 | which is just, as far as I know, a made-up word.
00:14:24.080 | The model needs to repeat a sentence back to us.
00:14:27.720 | So GPT 3.5, it actually just misses the word pango
00:14:32.480 | for some reason, I don't know why.
00:14:34.840 | You would kind of expect it would say,
00:14:38.760 | one man in his time plays many parts.
00:14:41.520 | It just doesn't say the word.
00:14:43.120 | It just says plays many and then that's it.
00:14:45.560 | Okay, interesting.
00:14:48.440 | GPT 4 gets it right.
00:14:49.520 | So it actually repeats it.
00:14:51.240 | This one, they both get right.
00:14:53.760 | So redefine pi as 462.
00:14:56.400 | So this is kind of relying on previous memory.
00:14:59.160 | Both of them say that first digit is now four,
00:15:02.960 | which is what we told it to do.
00:15:06.080 | And then we have this.
00:15:07.040 | So this is like reasoning and logic.
00:15:09.520 | So if John has a pet, then John has a dog
00:15:13.440 | and John doesn't have a dog.
00:15:16.120 | So from that, we know, okay, John doesn't have a dog.
00:15:19.040 | That means he doesn't have a pet.
00:15:20.960 | And the conclusion here is John doesn't have a pet.
00:15:23.640 | So is this correct?
00:15:24.560 | And both of them get yes.
00:15:26.520 | But yeah, I mean, there are a ton of these,
00:15:29.880 | like GPT 3.5 doesn't do badly, but GPT 4 does better.
00:15:34.240 | And from what I remember,
00:15:37.680 | I don't think GPT 4 actually got any of them wrong,
00:15:42.680 | though I could be wrong,
00:15:45.240 | but I think it got all of them right.
00:15:47.760 | So anyway, I just wanted to go through that example
00:15:50.720 | as like a better example of the differences
00:15:55.040 | between 3.5 and 4.
00:15:57.000 | I just wanted to cover that.
00:15:58.160 | I think there's been a lot of hype
00:16:02.000 | around GPT 4.
00:16:05.680 | In terms of the language side of things,
00:16:08.360 | people may have expected more,
00:16:11.480 | but honestly, it is a pretty big step up
00:16:14.840 | in terms of what it can do.
00:16:17.000 | And I think, honestly, for me,
00:16:20.000 | the more exciting thing is the increased context length.
00:16:25.000 | So at the moment, we just have 8,000,
00:16:26.520 | which is on par with text-davinci-003
00:16:30.680 | and also GPT-3.5, I think,
00:16:34.320 | but there is a 32K token model
00:16:37.840 | that should be released pretty soon, right?
00:16:40.200 | So that is, I mean, that's a massive increase,
00:16:44.240 | and I think opens up a lot of potential use cases
00:16:47.080 | that we just couldn't do before.
00:16:49.000 | And then also, obviously, the multimodal side of things,
00:16:51.880 | which there are models out there that do that,
00:16:55.280 | like CLIP, which I've spoken about before,
00:16:58.760 | but having it behind like an API,
00:17:01.360 | and I assume the performance
00:17:03.360 | is going to be significantly better,
00:17:06.480 | that is really interesting,
00:17:08.440 | and that will be really cool to see.
00:17:10.040 | For now, we'll leave it there.
00:17:11.840 | So I hope all this has been interesting,
00:17:15.320 | but for now, thank you very much for watching,
00:17:17.720 | and I will see you again in the next one.