GPT-4: Hands-on with the API
Chapters
0:00 GPT-4 has been released
0:31 Hands-on with GPT-4
2:39 Max token limits for GPT-4
5:23 Coding with GPT-4
9:56 Using GPT-4 API in Python
12:32 GPT-4 vs gpt-3.5-turbo
15:59 Why GPT-4 is a big step forward
00:00:08.880 |
So what we're going to do is take a look at what it can do. 00:00:12.720 |
Now, I haven't really played around with this. 00:00:15.640 |
I've tested to see that I actually do have access, 00:00:22.360 |
I want to compare it to the previous best model, 00:00:25.400 |
which is GPT-3.5 Turbo, and just see how they compare. 00:00:37.160 |
that I know GPT-3.5 was struggling with in the past. 00:00:41.440 |
So I'm just going to copy that in, it's this. 00:00:45.240 |
You keep responses to no more than 00:00:50.140 |
50 characters long, including the white space, 00:00:52.480 |
and sign off every message with a random name, 00:01:18.280 |
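The system prompt described here can be sketched as a chat request payload. This is a minimal sketch: the exact prompt wording, the `build_messages` helper, and the example user text are assumptions for illustration, not the video's literal prompt.

```python
# Sketch of the chat request described above: a system message that caps
# responses at 50 characters and asks for a random sign-off name.
# The exact prompt wording is an assumption, not the video's literal text.

def build_messages(user_text):
    """Build the messages list for the chat completions endpoint."""
    system_prompt = (
        "You keep responses to no more than 50 characters long, "
        "including whitespace, and sign off every message with a random name."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

messages = build_messages("Hi AI, how are you today?")
# The actual call would then be something like:
# openai.ChatCompletion.create(model="gpt-4", messages=messages)
```

The interesting part of the test is whether the model actually respects the 50-character constraint, which is where GPT-4 does noticeably better below.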
This is, I mean, it's definitely longer than 50 characters. 00:01:33.800 |
Let's have a look at what happens if we switch to GPT-4. 00:02:22.640 |
That's interesting, but it is definitely better 00:02:27.600 |
than even the best I was getting with GPT-3.5, 00:02:42.800 |
one of the really interesting things is that the context, 00:02:46.360 |
the number of tokens that you can feed into the model 00:03:01.400 |
and provide answers to their technical questions, 00:03:16.440 |
So how can I use the LLMChain in LangChain? 00:03:34.920 |
I don't know when GPT-4 was trained up to, 00:03:43.240 |
but LangChain didn't exist at that point, right? 00:03:55.640 |
But what we can do with this extended context window 00:03:59.600 |
is we can just take the documentation of LangChain 00:04:18.040 |
But let's just see what happens if we do this. 00:04:23.880 |
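The doc-stuffing idea can be sketched with plain string handling: prepend the documentation to the question and send the whole thing as one prompt. The prompt template and the placeholder documentation text here are assumptions, not the exact ones used in the video.

```python
# Sketch of "stuffing" documentation into the prompt so the model can answer
# questions about a library (here LangChain) that postdates its training data.
# The prompt template is an assumption, not the exact one used in the video.

def build_doc_prompt(docs, question, max_chars=30_000):
    """Prepend (truncated) documentation to the user's question."""
    return (
        "Answer the question using only the documentation below.\n\n"
        f"Documentation:\n{docs[:max_chars]}\n\n"
        f"Question: {question}"
    )

docs = "LLMChain combines a prompt template with an LLM..."  # placeholder docs
prompt = build_doc_prompt(docs, "How can I use the LLMChain in LangChain?")
```

The `max_chars` truncation is a crude guard so the stuffed prompt doesn't blow past the context limit, which is exactly the problem discussed next.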
I mean, you can see this is super, super messy, right? 00:04:35.000 |
the maximum context length a little bit, and I am. 00:04:38.160 |
So I've gone a little bit over, so I've got 10,000 tokens. 00:04:46.720 |
Now, right now, I only have access to the 8K token model. 00:04:55.520 |
which, as far as I can tell, is not there right now. 00:05:01.400 |
But I mean, technically, it should be possible 00:05:05.680 |
with plenty of additional space into that 32K model. 00:05:35.320 |
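Checking whether a prompt fits a given context window can be sketched with a rough heuristic, roughly 4 characters per token for English text; an exact count would use OpenAI's tiktoken tokenizer instead, which is left out here to keep the sketch standard-library only. The limits dictionary reflects the 8K and 32K GPT-4 variants mentioned above.

```python
# Rough check of whether a prompt fits a model's context window.
# ~4 characters per token is a common rule of thumb for English text;
# an exact count would use the tiktoken tokenizer instead.

CONTEXT_LIMITS = {"gpt-4": 8_192, "gpt-4-32k": 32_768}  # tokens

def estimate_tokens(text):
    return max(1, len(text) // 4)

def fits(text, model):
    return estimate_tokens(text) <= CONTEXT_LIMITS[model]

prompt = "x" * 40_000  # ~10,000 estimated tokens, like the doc dump above
assert not fits(prompt, "gpt-4")   # over the 8K limit
assert fits(prompt, "gpt-4-32k")   # plenty of room in the 32K model
```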
Okay, so I'm gonna just pip install langchain and openai. 00:05:59.120 |
So it didn't say to add my environment variable. 00:06:12.600 |
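The missing piece is the API key, which the openai client and LangChain both read from the `OPENAI_API_KEY` environment variable. A minimal sketch, with a placeholder value rather than a real key:

```python
import os

# Both the openai client and LangChain read the key from this environment
# variable, so setting it once avoids passing the key to every call.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, not a real key

# Libraries then pick it up like this:
api_key = os.environ.get("OPENAI_API_KEY")
```

Forgetting this step is exactly the kind of error that shows up next, and it's the sort of thing the model can diagnose from the traceback alone.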
So I'm gonna pretend I have no idea what's going on here. 00:06:15.240 |
So we'll take this and we're just gonna copy in. 00:06:22.600 |
Right, and I think here we might hit an error. 00:06:31.320 |
So I'm just gonna, I'm gonna copy this error into here 00:06:40.840 |
So add message and just the error, nothing else. 00:06:52.160 |
So we have this here, so I'm gonna use this error code 00:07:07.040 |
Okay, so I've passed in my OpenAI API key in here. 00:07:16.800 |
Okay, so I'm gonna say, I'm still getting the same error. 00:07:24.600 |
and see if it can figure out what the issue is. 00:07:48.280 |
Okay, so I've passed in my API key to there now. 00:08:06.160 |
Okay, and then we're gonna ask it to create a joke. 00:08:18.920 |
Now this is using text-davinci-003 right now, I believe. 00:08:18.920 |
I wonder if we can ask GPT-4 to switch this to using GPT-4. 00:08:34.760 |
All right, let's submit that and then we go over. 00:08:40.240 |
Okay, so let's remove this one and the one above. 00:09:13.120 |
So I would go into here, model_name equals "gpt-4". 00:09:29.880 |
that you're using here and they're seeing that you're, 00:09:33.440 |
oh, okay, no, no, because this is a chat model. 00:09:48.600 |
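That hiccup, that gpt-4 is a chat model and can't just be dropped into the completion-style wrapper, can be sketched as a small routing check. The model lists below are illustrative assumptions about the API at the time: chat models go through the chat completions endpoint (LangChain's `ChatOpenAI`), while text-davinci-003 uses the plain completions endpoint (LangChain's `OpenAI`).

```python
# gpt-4 and gpt-3.5-turbo are chat models and use the chat completions
# endpoint (in LangChain: ChatOpenAI), while text-davinci-003 is a
# completion model (in LangChain: OpenAI). Model lists are illustrative.

CHAT_MODELS = {"gpt-4", "gpt-4-32k", "gpt-3.5-turbo"}

def endpoint_for(model_name):
    """Return which endpoint / LangChain wrapper a model belongs to."""
    return "chat" if model_name in CHAT_MODELS else "completion"

assert endpoint_for("gpt-4") == "chat"                   # needs ChatOpenAI
assert endpoint_for("text-davinci-003") == "completion"  # plain OpenAI wrapper
```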
but what I also want to do is we have access to this model. 00:09:52.400 |
So let's take a look at how we would use it in Python. 00:09:59.520 |
to show that you could use GPT-3.5 Turbo in Python. 00:10:17.400 |
There's not really, you don't need to change anything. 00:10:19.760 |
So I've already run this, I got my API key in there. 00:10:29.080 |
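The point that nothing needs to change can be sketched like this: moving from gpt-3.5-turbo to gpt-4 swaps only the model string, while the messages format and the rest of the request stay identical. The helper name is an assumption, and the network call itself is left as a comment.

```python
# Switching from gpt-3.5-turbo to gpt-4 changes nothing but the model string;
# the messages format and the rest of the request stay identical.

def build_chat_request(model, user_text):
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }

old = build_chat_request("gpt-3.5-turbo", "Tell me a joke.")
new = build_chat_request("gpt-4", "Tell me a joke.")
assert old["messages"] == new["messages"]  # only the model name differs
# The actual call would be: openai.ChatCompletion.create(**new)
```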
Okay, so I just took a moment to kind of go away 00:10:32.600 |
for a little bit and take a little bit more of a look 00:10:46.880 |
So, I mean, the paper is full of a lot of interesting things 00:10:51.080 |
but in particular, they have this graph here. 00:10:55.880 |
And the idea behind this or why they're even showing this is, 00:11:11.040 |
Okay, the accuracy is decreasing, which is weird, right? 00:11:29.120 |
So essentially what we usually see with large language models 00:11:32.640 |
is a load of tasks that are like this on the left. 00:11:34.560 |
Performance increases as model size increases. 00:11:37.760 |
But there's a lot of tasks or potentially a lot of tasks 00:12:00.160 |
And that's kind of what they're showing here. 00:12:17.760 |
You know, I mean, if so, that's insane, right? 00:12:21.360 |
But that is very specific to this hindsight neglect task. 00:12:26.360 |
I believe there are quite a few tasks in there. 00:12:54.280 |
So the first one, we'll just go through a few of these 00:13:01.840 |
and kind of see how they compare for yourself. 00:13:11.800 |
If a cat has a body temperature that is below average, 00:13:14.360 |
it isn't, so the negation, that it isn't in danger 00:13:19.000 |
or in safe ranges, obviously it's in danger, right? 00:13:27.720 |
And you see GPT-3.5, it says it isn't in danger, okay? 00:13:53.400 |
So with this, we're saying repeat sentence back to me. 00:13:56.480 |
And then we have input, output, input, output. 00:13:59.760 |
which is a well-known phrase that the model has probably, 00:14:21.600 |
which is just, as far as I know, made up word. 00:14:24.080 |
The model needs to repeat a sentence back to us. 00:14:27.720 |
So GPT-3.5, it actually just misses the word pango 00:14:56.400 |
So this is kind of relying on previous memory. 00:14:59.160 |
Both of them say that first digit is now four, 00:15:16.120 |
So from that, we know, okay, John doesn't have a dog. 00:15:20.960 |
And the conclusion here is John doesn't have a pet. 00:15:29.880 |
like GPT-3.5 doesn't do badly, but GPT-4 does better. 00:15:37.680 |
I don't think GPT-4 actually got any of them wrong, 00:15:47.760 |
So anyway, I just wanted to go through that example 00:16:05.680 |
People, in terms of the language side of things, 00:16:20.000 |
the more exciting thing is the increased context length. 00:16:40.200 |
So that is, I mean, that's a massive increase, 00:16:44.240 |
and I think opens up a lot of potential use cases 00:16:49.000 |
And then also, obviously, the multimodal side of things, 00:16:51.880 |
though there are models out there that do that, 00:17:15.320 |
but for now, thank you very much for watching,