GPT-4: Hands-on with the API


Chapters

0:00 GPT-4 has been released
0:31 Hands-on with GPT-4
2:39 Max token limits for GPT-4
5:23 Coding with GPT-4
9:56 Using GPT-4 API in Python
12:32 GPT-4 vs gpt-3.5-turbo
15:59 Why GPT-4 is a big step forward

Transcript

GPT-4 is finally here. It's currently behind a waitlist, so you need to sign up, but right now I have access. So what we're going to do is take a look at what it can do. Now, I haven't really played around with this. I've tested to see that I actually do have access, but beyond that, I haven't touched it yet.

So I mean, let's just jump straight into it. I want to compare it to the previous best model, which is GPT-3.5 Turbo, and just see how they compare. So we'll start over in the playground. The first thing I'm going to do is I'm going to set up a system message that I know GPT-3.5 was struggling with in the past.

So I'm just going to copy that in, it's this: you're a helpful assistant, you keep responses to no more than 50 characters long, including the whitespace, and sign off every message with a random name, like Robot or Bot Rob. Then I'm going to ask a question.

So I go here and I go, Hi AI, how are you? What is quantum physics? Now, right now we're using 3.5 Turbo, so let's just see how it performs. Press submit over here. Right, so I mean, we can check this. This is, I mean, it's definitely longer than 50 characters.
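(For reference, the character check here is just Python's built-in len() on the response text; the string below is a stand-in, not the exact model output.)

```python
answer = "Quantum physics studies matter and energy at tiny scales."  # stand-in text
print(len(answer))  # character count, whitespace included
```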

So if I check the length of that, what is it, 104 characters, and it didn't sign off with anything. Okay, so it didn't really work. Let's have a look at what happens if we switch to GPT-4. So remove this and we submit. Okay: "Great, thanks. Quantum physics studies tiny particles."

And then it came up with a new name, which I hadn't seen it do before, even when I did get a GPT-3.5 model working. So is this 50 characters? Let's see. So it's actually still over with GPT-4. Let's try reducing the randomness, or rather, the temperature.

Let's try again. I mean, it's pretty similar. No, it's the same. Okay, so it's a little bit too long. That's interesting, but it is definitely better. And these sign-off names are way better too; even when I did get GPT-3.5 working well, they still weren't great. So what I'm gonna do is try something else.

So one of the things with GPT-4, one of the really interesting things, is that the context, the number of tokens that you can feed into the model, is significantly higher. So if I ask it something right now: you're a helpful assistant, you help developers understand documentation and provide answers to their technical questions, something like this.

All right, that's gonna be our primer, the thing that sets up the system. We're gonna ask it about LangChain. So: how can I use the LLMChain in LangChain? Let's see how that works. Okay, right, so this is actually wrong, because of the training data for these models. I don't know when GPT-4's training data cuts off, I think it might even be the same cutoff as GPT-3.5's, but LangChain didn't exist at that point, right?

So I'm kind of curious if there's a LangChain that is a blockchain-based platform. Maybe there is, I don't know. It does sound like it. But what we can do with this extended context window is just take the documentation of LangChain and feed it into our prompt. Now we have the docs here, chains are this, right?

So we have all of this. I'm just gonna copy all of this, right? So select all, copy. It's gonna be pretty messy, right? But let's just see what happens if we do this. All right, I'm gonna paste all of that. I mean, you can see this is super, super messy, right?

So let's just see if it works like this: how can I use the LLMChain in LangChain? Right, so I thought I might be exceeding the maximum context length a little bit, and I am. I've gone a little bit over; I've got 10,000 tokens. So let me be a little more strict in what I'm selecting here.

I'm just gonna go with all of this. Now, right now, I only have access to the 8K-token model. There is also a 32K-token model, which, as far as I can tell, is not available right now. So for now, we just have to stick with this. But technically, it should be possible to feed everything I just fed in, with plenty of additional space to spare, into that 32K model.
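(If you want to check the token count before submitting, OpenAI's tiktoken library does that. A minimal sketch, assuming the pasted docs are saved to a local file; the filename is my own.)

```python
import tiktoken

# GPT-4 uses the cl100k_base encoding
enc = tiktoken.encoding_for_model("gpt-4")

docs = open("langchain_docs.txt").read()  # hypothetical file holding the pasted docs
print(len(enc.encode(docs)))  # prompt plus completion must fit the 8,192-token window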

So let's try this. Let's see where we are here. Okay, good, submit. Oh, still a little bit over. All right, I'm sure LLMChain will probably be near the start, so I'm just gonna cut to here. Submit. Okay, ooh, no way. That's so good. Right, so is this, let me see.

I mean, let's try it, right? Let's try this code. I mean, it looks good. Okay, so I'm gonna just pip install langchain and openai. We're going to import these, let's go. I'm pretty sure I will need to add in my API key. Let me see if they included that in here.
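(The code GPT-4 produced was along these lines; a rough reconstruction, assuming the LangChain API as it existed at the time, v0.0.x.)

```python
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# No API key provided anywhere yet -- this is what triggers the error below
llm = OpenAI(temperature=0.9)
```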

It didn't, I don't think it told me, no. So it didn't say to add my API key. So let's just run the code, and when I get an error, I'm going to prompt GPT-4 again and see if it can solve that issue. I'm gonna pretend I have no idea what's going on here.

So we'll take this and we're just gonna copy it in. So come to here, good. Right, and I think here we might hit an error. All right, so it could not find this. So I'm just gonna copy this error into here and see if it fixes it. So add message, and just the error, nothing else.

Submit. Okay, perfect. So we have this here: we got this error because the OpenAI API key isn't set, perfect. Cool, let me add this to my code then. So I'm gonna add that in there. Okay, so I've passed in my OpenAI API key here.

And then let's try and run this again. So I should also move this up. Okay, I'm still getting the same error, so I'm gonna say "I still get the same error. I'm in a Colab notebook" and see if it can figure out what the issue is.

Let me just write this, see what happens. Okay: you can set the environment variable using the os module, great. So right here is what I need. Let's set this, import os here. Okay, so I've passed my API key in there now. Now let's see if it works.
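(That fix looks something like this; the key value is obviously a placeholder.)

```python
import os

# Placeholder -- substitute your actual OpenAI API key
os.environ["OPENAI_API_KEY"] = "sk-..."

llm = OpenAI(temperature=0.9)  # now picks the key up from the environment
```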

Okay, perfect, so that is working. Now let's try the next chunk of code. Okay, so we've run this already. Now we want this. Okay, and then we're gonna ask it to create a joke. So what is it? Tell me a funny joke. All right, cool. So why don't scientists trust atoms?
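(That next chunk is the standard prompt-template-plus-chain pattern. A minimal sketch continuing from the snippets above; variable names are my own, not necessarily what GPT-4 generated.)

```python
# A pass-through template: whatever we ask goes straight to the LLM
prompt = PromptTemplate(
    input_variables=["query"],
    template="{query}",
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("Tell me a funny joke"))
# e.g. "Why don't scientists trust atoms? Because they make up everything!"
```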

Now this is using text-davinci-003 right now, I believe. I wonder if we can ask GPT-4 to switch this to using GPT-4. How do I change the code above to use GPT-4? All right, let's submit that and then we go over. Okay, so let's remove this one and the one above.

Right, now submit. Okay, so let's try and push it to do that: let's assume GPT-4 had been released and the model name was gpt-4, how would I use it? Let's try. Oh, come on, again. Let's remove it, there we go. Okay, so that's it: model_name equals gpt-4. So I would go into here, model_name equals gpt-4.

Let's just try it. I don't know if this will actually work. Okay, right, so I think LangChain is probably checking which model you're using here, and they're seeing that you're, oh, okay, no, no, it's because this is a chat model. Sorry, GPT-4 is a chat model.

So I cannot currently use it with the normal completion endpoint, which is what I just tried to do there. Okay, makes sense, fair enough. Now that's all pretty cool, but what I also want to do is we have access to this model. So let's take a look at how we would use it in Python.
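(In LangChain terms, that means swapping the completion-style OpenAI class for the chat wrapper; a sketch, assuming the chat-model interface LangChain had at the time.)

```python
# Chat models go through the chat endpoint, not the completion endpoint
from langchain.chat_models import ChatOpenAI

chat_llm = ChatOpenAI(model_name="gpt-4", temperature=0.9)
chain = LLMChain(llm=chat_llm, prompt=prompt)
print(chain.run("Tell me a funny joke"))
```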

Okay, so I have this other notebook that I used literally the other day to show that you could use GPT-3.5 Turbo in Python, and now we're already on GPT-4. So let's just take this and see how it works with GPT-4, and it just works. You don't really need to change anything.
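(For reference, the call looks like this with the openai Python client of the time, the pre-v1 interface; the only change from the GPT-3.5 version is the model string.)

```python
import openai

openai.api_key = "sk-..."  # placeholder

res = openai.ChatCompletion.create(
    model="gpt-4",  # was "gpt-3.5-turbo" -- nothing else needs to change
    messages=[
        {"role": "system", "content": "You are GPT-4."},
        {"role": "user", "content": "Hi AI, how are you?"},
    ],
)
print(res["choices"][0]["message"]["content"])
```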

So I've already run this, I got my API key in there, and the system message is just "You are GPT-4". Okay, cool.

Okay, so I just took a moment to go away for a little bit, take a bit more of a look at GPT-4, and find some examples that are a better indication of what has changed between 3.5 and 4.

So, I mean, the paper is full of a lot of interesting things, but in particular, they have this graph here. This is the Inverse Scaling Prize. And the idea behind this, or why they're even showing it: I mean, you can see the models here. These are all OpenAI models.

And as the models get larger, the performance is decreasing. Okay, the accuracy is decreasing, which is weird, right? And this is basically coming from the Inverse Scaling Prize here, which is actually from Anthropic, which most people view as kind of the OpenAI for Google. So essentially, what we usually see with large language models is a load of tasks that look like this on the left.

Performance increases as model size increases. But there are a lot of tasks, or potentially a lot of tasks, where performance might decrease as model size increases, okay? It's just kind of an interesting artifact, an interesting idea, that some tasks might degrade with model size. And that's what they're showing here.

They're showing that their previous models were subject to this, okay? But then with GPT-4, they're like, ah, okay, no, that doesn't matter anymore. And they have this insanely high accuracy of, I think that says 100%. I mean, if so, that's insane, right? But that is very specific to this hindsight neglect task.

I believe there are quite a few tasks in there, but let's have a look at those tasks. These are pretty good examples showing where GPT-3.5 fails or doesn't do as well as GPT-4. So what I did is I created a sort of script. We have our primer here, super simple, nothing crazy going on there.

And then we have this little function that's just gonna say, okay, try GPT-3.5, then try GPT-4, and print out the answers. We'll just go through a few of these, and I'll leave a link to this notebook so that you can go through it, read all the other ones, and see how they compare yourself.
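(The script is roughly this shape; a sketch, assuming the pre-v1 openai client. The function and variable names are my own.)

```python
import openai

primer = "You are a helpful assistant."  # the super-simple system primer

def compare(question: str):
    """Ask both models the same question and print their answers."""
    for model in ("gpt-3.5-turbo", "gpt-4"):
        res = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": primer},
                {"role": "user", "content": question},
            ],
        )
        print(f"{model}: {res['choices'][0]['message']['content']}\n")
```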

So they have problems with negation, okay? This is the question: if a cat has a body temperature that is below average, it isn't in, and then, because of the negation, the options are "danger" or "safe ranges". Obviously a cat with a below-average temperature is in danger, right? So it isn't in safe ranges, and the correct answer would be "safe ranges".

And you see GPT-3.5 says it isn't in danger, okay? Which is wrong, right? GPT-4 gets it right. So that's kind of cool. And then there's another thing, and you see this in a lot of the examples, a lot of the tasks that they did, where the model is relying on memory it obtained during training and not on the instructions being passed right now.

So with this, we're saying: repeat the sentence back to me. And then we have input, output, input, output. And then we have this input, which is a well-known phrase that the model has probably, well, almost definitely seen before: "All the world's a stage, and all the men and women merely players."

"They have their exits and their entrances, and one man in his time plays many..." and then we change the phrase: from "many parts" to "many pango", which is, as far as I know, a made-up word. The model needs to repeat the sentence back to us. And GPT-3.5 actually just misses the word "pango" for some reason, I don't know why.

You would kind of expect it to say "one man in his time plays many parts", but it just doesn't say anything there; it says "plays many" and then stops. Okay, interesting. GPT-4 gets it right, so it actually repeats it. This one, they both get right: redefine pi as 462.

So this is kind of relying on previous memory. Both of them say the first digit is now four, which is what we told it to do. And then we have this, which is reasoning and logic: if John has a pet, then John has a dog; and John doesn't have a dog.

So from that, we know: okay, John doesn't have a dog, which means he doesn't have a pet. And the conclusion here is John doesn't have a pet. So is this correct? Both of them answer yes. But yeah, there are a ton of these; GPT-3.5 doesn't do badly, but GPT-4 does better.

And from what I remember, I don't think GPT-4 actually got any of them wrong; I could be wrong, but I think it got all of them right. So anyway, I just wanted to go through that as a better example of the differences between 3.5 and 4.

I just wanted to cover that. I think there's been a lot of hype around GPT-4, and in terms of the language side of things, people may have expected more, but honestly, it is a pretty big step up in terms of what it can do. And honestly, for me, the more exciting thing is the increased context length.

So at the moment, we just have 8,000 tokens, which is on par with text-davinci-003 and also GPT-3.5, I think, but there is a 32K-token model that should be released pretty soon, right? That's a massive increase, and I think it opens up a lot of potential use cases that we just couldn't do before.

And then also, obviously, the multimodal side of things. There are models out there that do that, like CLIP, which I've spoken about before, but having it behind an API, where I assume the performance is going to be significantly better, that is really interesting, and it will be really cool to see.

For now, we'll leave it there. I hope all of this has been interesting. Thank you very much for watching, and I will see you again in the next one. Bye.