
GPT-5: Everything You Need to Know So Far


Transcript

It seems quite likely that yesterday was the day that OpenAI launched the full training run of GPT-5. I've gone through every source I can find to bring you the most reliable information on what that means, including possibly every public comment on the topic from OpenAI, an exclusive interview with a hardware CEO, and tons of my own analysis.

Plus, I'm going to find time to throw in a practical tip that you can use literally every time you open ChatGPT, and a bonus DALL-E 3 discovery that I really enjoy. But let's start with these two tweets. And the first clue that the full-scale GPT-5 is being trained comes from the president and co-founder of OpenAI, Greg Brockman.

Now first, a little bit of context. OpenAI typically trains smaller models, about a thousandth the size of the final one, and gathers insights from them before committing to the full training run. So that's the backdrop to OpenAI, in Brockman's words, scientifically predicting and understanding the resulting systems.
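And that prediction is not hand-waving: OpenAI's GPT-4 technical report described fitting scaling laws on much smaller runs and extrapolating the final model's loss. Here is a minimal sketch of that kind of extrapolation; every number in it is invented for illustration, not taken from any real run.

```python
import numpy as np

# Hypothetical results from small training runs, each a tiny fraction
# of the target scale. Both columns are invented for illustration.
compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs
loss = np.array([3.20, 2.75, 2.41, 2.13])     # final training loss

# Fit loss ~ a * compute^b, i.e. a straight line in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)

# Extrapolate to the full-scale run's compute budget (also invented).
full_compute = 1e25
predicted_loss = np.exp(log_a) * full_compute**b
print(f"Predicted loss at {full_compute:.0e} FLOPs: {predicted_loss:.2f}")
```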

With that done, what they're now building maximally harnesses all of their computing resources. They're gathering all of their ideas together and scaling beyond precedent. Translated: they are training their biggest model yet. We'll get to parameters, data, and capabilities in a moment. But first, what's this other tweet? It comes from Jason Wei, a top OpenAI researcher.

A few hours after Brockman's tweet, he said there's no adrenaline rush like launching a massive GPU training. And this got plenty of salutes in the replies from other OpenAI employees. Now for context, this does not mean that we're imminently going to get GPT-5. GPT-4 took around three months to train, and then there was the safety testing.

I'm actually going to end this video with my exact prediction of when I think they're going to release GPT-5. But first, here is a little bit more supporting evidence that they are currently training GPT-5. OpenAI updated their blog to say that applications for the red teaming network have closed, and that those red teamers would know about the status of their applications by the end of last year.

What that means is that the red teamers are now in place to start safety testing the new model. Now, you might say, what's the point of having those red teamers in place if it's still going to be training for two to three months? Well, before a model is fully trained, it goes through various checkpoints.

Think of them a bit like a video game save. What that also means is that in effect, OpenAI will have a GPT-4.2 before they have a GPT-5. Indeed, Greg Brockman, going back to April of last year, said that it might be one of those checkpoints that OpenAI release first.
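To make the save-game analogy concrete, here is a minimal sketch of what a training checkpoint amounts to in PyTorch. This is a toy version under my own assumptions, not OpenAI's stack; the function names are mine.

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Freeze everything needed to resume the run (or ship the snapshot):
    # model weights, optimizer state, and how far training has progressed.
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path):
    # Loading a mid-run checkpoint is what would give you an
    # intermediate "GPT-4.2"-style model before training finishes.
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]
```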

Brockman said it's easy to create a continuum of incrementally better AIs, such as by deploying subsequent checkpoints of a given training run, and he said that would be very unlike their historical approach of infrequent major model upgrades. But remember that even before those checkpoints, OpenAI would have already gotten a glimpse of GPT-5's capabilities from those smaller, earlier versions of the model.

Indeed, Sam Altman, back in November, said that he was privileged to be in the room when they pushed back the veil of ignorance. I'm super excited. I can't imagine anything more exciting to work on. And on a personal note, like four times now in the history of OpenAI, the most recent time was just in the last couple of weeks.

I've gotten to be in the room when we sort of like push the front, the sort of the veil of ignorance back and the frontier of discovery forward. And getting to do that is like the professional honor of a lifetime. It all points to OpenAI in November and December having trained those smaller versions of GPT-5.

Again, with the purpose of scientifically predicting and understanding the resulting GPT-5 system. So they know it's going to be good, but just how good and how big? And what are these new or old ideas that they're going to incorporate? Well, one thing that seems almost certain is that they're going to incorporate a way to let GPT-5 think for longer.

In other words, it's going to lay out its reasoning steps before solving a challenge and have each of those reasoning steps checked internally or externally. Here's Sam Altman a few days ago at Davos. What it means to verify or understand what's going on is going to be a little bit different than people think right now.

I actually can't look in your brain and look at the hundred trillion synapses and try to understand what's happening in each one and say, okay, I really understand why he's thinking what he's thinking. You're not a black box to me. But what I can ask you to do is explain to me your reasoning.

I can say, you know, you think this thing, why? And you can explain first this, then this, then there's this conclusion, then that one, and then there's this. And I can decide if that sounds reasonable to me or not. And I think our AI systems will also be able to do the same thing.

They'll be able to explain to us in natural language, the steps from A to B, and we can decide whether we think those are good steps. And a few days before that, Sam Altman told Bill Gates that that might involve asking GPT-4 or GPT-5 the same question 10,000 times.

You know, when you look at the next two years, what do you think some of the key milestones will be? Multimodality will definitely be important. Which means speech in, speech out? Speech in, speech out, images, eventually video. Clearly, people really want that. We launched images and audio, and it had a much stronger response than we expected.

We'll be able to push that much further. But maybe the most important areas of progress will be around reasoning ability. Right now, GPT-4 can reason in only extremely limited ways. And also reliability. If you ask GPT-4 most questions 10,000 times, one of those 10,000 is probably pretty good, but it doesn't always know which one.

And you'd like to get the best response of 10,000 each time. That increase in reliability will be important. And at this point, watchers of my channel will know exactly what he's referring to. Both of those approaches, checking reasoning steps and sampling up to 10,000 times, are incorporated into OpenAI's Let's Verify Step by Step paper.

Now, I'm not going to dive into the details of Let's Verify in this video, because I've got at least two previous videos on the topic. But notice in the paper how many times they sample GPT-4. The chart shows what happens when you sample the model over a thousand times and pick out the best responses.

And notice something about this process-supervised way of doing things: the results continue to go up. I can't resist showing you a quick example of the reasoning steps broken down into separate lines, with a verifier checking which steps are accurate or inaccurate. The answers for which each step in the reasoning process got a thumbs up were the ones submitted, and the results were dramatic.

Essentially, this process of sampling the model thousands of times and taking the answer that had the highest rated reasoning steps doubled the performance in mathematics. And no, this didn't just work for mathematics. It had dramatic results across these STEM fields. And remember, this was using GPT-4 as a base model, not GPT-5.

And that was with only around 2,000 samples, not the 10,000 Sam Altman talked about. So this is the evidence I would present to someone who said that LLMs have peaked. If OpenAI can use parallelization to get the model to generate and evaluate 10,000 responses, the results could be truly dramatic.
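To make that recipe concrete, here's a rough sketch of best-of-N sampling with a step-level verifier, the core idea behind Let's Verify. The generate and score_step functions are placeholders I've invented, not OpenAI's actual interfaces.

```python
import math

def best_of_n(question, generate, score_step, n=10_000):
    """Sample n solutions and keep the one whose reasoning scores highest.

    generate(question)      -> list of reasoning-step strings (placeholder)
    score_step(question, s) -> verifier's probability that step s is correct
    """
    best_answer, best_score = None, -math.inf
    for _ in range(n):
        steps = generate(question)  # one sampled chain of reasoning
        # Let's Verify scores a solution as the product of per-step
        # correctness probabilities; summing logs for numerical stability.
        score = sum(math.log(score_step(question, s)) for s in steps)
        if score > best_score:
            best_answer, best_score = steps[-1], score
    return best_answer
```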

Indeed, the Let's Verify paper from OpenAI repeatedly cited this earlier DeepMind paper on solving math problems with process-based feedback. And for coding, we know AlphaCode 2 from Google DeepMind used the mass-sampling approach to get an 87th percentile score on a coding contest. In other words, it beat 87% of participants in the Codeforces coding challenge.

For context, the GPT-4 that we got scored around the 5th percentile, a rating of about 400, in the Codeforces challenge. These numbers are a little out of date, but AlphaCode 2 would have scored around here, at expert or candidate master level. Or to translate everything: if they find a way to let GPT-5 think, it could be night and day in terms of performance for coding, mathematics, and STEM.

But just how big will the GPT-5 doing all of this parallel thinking be? Well, for AI Insiders, I interviewed Gavin Uberti, the CEO and co-founder of Etched, which I've also talked about on this channel. He is a 21-year-old Harvard dropout whose LinkedIn profile says he's building the hardware for superintelligence.

In the interview, he guessed that GPT-5 would have around 10 times the parameter count of GPT-4. According to leaks, GPT-4 has around 1.5 to 1.8 trillion parameters. But just quickly, what did he mean when he said that he expects that to come from a combination of a larger embedding dimension, more layers, and double the number of experts?

Well, think of the embedding dimension as controlling how much the model can capture about each token and its context: a bigger embedding dimension means more granularity and nuance per token. More layers let the model develop deeper pattern recognition, seeing more complex patterns within patterns within patterns. And doubling the number of experts grows the model's total capacity while only a few experts activate per token, which keeps the compute cost of each token down.
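For intuition on how those three knobs multiply out, here's a back-of-the-envelope parameter count for a decoder-only transformer. The per-layer approximations are standard rules of thumb, and every concrete number below is my own illustrative assumption, not a leaked spec.

```python
def approx_params(d_model, n_layers, n_experts):
    # Standard rules of thumb for a decoder-only transformer:
    # attention weights ~ 4*d^2 per layer, a dense MLP ~ 8*d^2,
    # and each expert in a mixture-of-experts carries its own MLP.
    attention = 4 * d_model**2
    mlp = 8 * d_model**2 * n_experts
    return n_layers * (attention + mlp)

# All numbers invented for illustration, not leaked specs.
today = approx_params(d_model=12_288, n_layers=96, n_experts=16)
next_gen = approx_params(d_model=16_384, n_layers=128, n_experts=32)
print(f"{today / 1e12:.1f}T -> {next_gen / 1e12:.1f}T total parameters")
# Only a couple of experts fire per token, so total parameters can grow
# much faster than the compute spent on any single token.
```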

More highlights from that interview will be coming on AI Insiders on Patreon in the coming weeks. But during this video on GPT-5, I promised you two interludes: one focused on DALL-E 3 and one a practical tip for using ChatGPT anytime. Well, here is the first of those two interludes, focused on a particular quirk of DALL-E 3.

I say that, but this trick also works for Midjourney. Now, many of you might have noticed a trend on TikTok, Reddit, Twitter, and YouTube Shorts, where people post an image and then make it progressively more intense, let's say. Well, here's something arguably even more quirky. I got the original idea from Peter Wildeford on Twitter and decided to make it more intense.

But first I asked: "Draw an image of a London scene, but don't use lampposts in the image." And lo and behold, we get dozens of lampposts. I mean, in this one, we have a lamppost coming down from the sky. And what does GPT-4 say in analyzing these? It says that these are two images of a London street scene without lampposts.

So then I said: "Now make these images with even fewer lampposts, stripping any lamppost references completely." And here were the results. As you can see on the right, there's barely a lamppost in sight. And as GPT-4 says, these images were created with complete omission of any lamppost references. Then I said: "Delete absolutely everything that pertains to a lamppost."

And I don't know about you, but I can't see any lampposts left. And finally, I said: "Take this to the max and give me an image that someone could not picture a lamppost existing within, even in their wildest imagination." Now, I think it's pretty cute that the lampposts persisted throughout these images.

And I suspect that the deeper reason is that the caption training DALL-E 3 got didn't have many examples of omission. They used web captions and synthetic captions, but I doubt there were many examples of people saying this image does not contain X. But speaking of modalities, the first thing they apparently want to fix for GPT-5 is the real-time nature of the voice interaction.
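The complaint here is about time to first token: how long the model takes before it starts replying. If you want to put a number on it yourself, here's a minimal sketch using the OpenAI Python SDK's streaming mode; the model name is just a placeholder for whichever model you're measuring.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
stream = client.chat.completions.create(
    model="gpt-4",  # placeholder: use whichever model you're measuring
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    # The first content-bearing chunk marks time to first token: the
    # silence a voice assistant has to cover before it can start talking.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"Time to first token: {time.perf_counter() - start:.2f}s")
        break
```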

At the moment, that latency is very noticeable; it simply takes too long for a reply to begin. Here's Sam Altman speaking last week. I think there's all sorts of the current stuff that people complain about, like the voice is too slow and, you know, it's not real time, and that'll get better this year.

I think where we're headed, and then I'll talk about this year, is that the way you use the computer is to talk to it. The operating system of a computer in some sense is close to this idea that you're like working inside of a chat experience. When he mentioned using an LLM as an operating system, he was drawing on Andrej Karpathy's vision.

I've talked about it before, but notice at the top that it's video in and out, audio in and out. And it's not like OpenAI are hiding it. They want as much text, image, audio and video data as they can get their hands on. They also want what I'm going to call reasoning data.

"Data that expresses human intention," they call it. Now, I didn't notice that phrase at the time of this blog post in November, but it fits in clearly with what I was saying earlier about Let's Verify. Think about it: how would you make a model agentic, able to solve more complex challenges?

Well, if GPT-5 gets loads of data about people laying out plans full of human intention, GPT-5 could learn to imitate those schemes and plans and maybe have a verifier internally or externally judging those reasoning steps. Now, as for the question of whether those reasoning steps faithfully represent what the model is internally calculating, that will have to be for another day.

This paper from Anthropic back in July said that as models become larger and more capable, they actually produce less faithful reasoning on most tasks we study. That doesn't mean getting the answer wrong more often. It means outputting reasoning steps that don't actually reflect what it's internally calculating. So, GPT-5 may end up being an excellent productivity assistant while still being somewhat inscrutable on a deeper level.

Just quickly before we leave data, I think there's one thing that we can safely say, which is that there will be much more multilingual data in GPT-5's training set. OpenAI have formed so many data partnerships, including with the Icelandic government, and so many more multilingual datasets are being open-sourced, that I think a dramatic step forward in GPT-5's multilingual abilities is almost inevitable.

Partly, this is a safety thing too, with OpenAI wanting its red teamers to be fluent in more than one language. Models are notoriously easier to jailbreak in lower-resource languages, and it looks like OpenAI are working hard on that front. But there is one language that I bet you didn't know GPT-4 can already speak, and that's the language of gobbledygook.

According to this fascinating paper from Tokyo, GPT-4 can almost perfectly handle unnatural scrambled text. Now, you might already know that humans have a version of this ability: if the first and last letters of a word stay in place, you can often still recognize the word. But GPT-4, and presumably GPT-5, can go a step further.

Even if the first and last letters are different, if a word is completely scrambled, it can recover the sentence. I tested it out, and indeed it works. Just look at how utterly scrambled this sentence is. For me and you, that would be almost complete gobbledygook, but GPT-4 was able to recognize what I was saying.
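If you want to reproduce the test, here's a small scrambler along the lines of the paper's setup; paste its output into ChatGPT and see whether the model recovers the original sentence.

```python
import random

def scramble(text, keep_edges=True):
    # Shuffle each word's interior letters; with keep_edges=False,
    # shuffle the whole word, which is the harder condition.
    words = []
    for word in text.split():
        if len(word) > 3:
            if keep_edges:
                inner = list(word[1:-1])
                random.shuffle(inner)
                word = word[0] + "".join(inner) + word[-1]
            else:
                letters = list(word)
                random.shuffle(letters)
                word = "".join(letters)
        words.append(word)
    return " ".join(words)

sentence = "The model can recover almost perfectly scrambled sentences"
print(scramble(sentence))                    # first and last letters kept
print(scramble(sentence, keep_edges=False))  # fully scrambled
```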

So that is the practical tip that I also wanted to give you in this video. If you have a quick request of GPT-4, don't bother going back and spending 30 seconds correcting all your typos. Trust me, I've been guilty of this in the past because I love perfect English, but if you have a letter or two out of place, don't worry, it will understand.

To be honest, if it can unscramble this, it can understand your typo of the word "there", your missing comma, and all the rest of it. So save yourself 30 seconds and don't even bother correcting your typos. But now it's time for me to finally give my prediction for when GPT-5 is going to be released.

For the last few weeks, I've honestly thought it would be around September of this year. But now I think it will be toward the end of November of 2024. And no, that's not just because that would be the two-year anniversary of the release of ChatGPT, the original version. First, let me clarify that I don't think they will release the full capabilities of GPT-5 on the first go.

As mentioned, I think they'll release different checkpoints, different functionalities as we head into 2025. But what explains the delay from now when they're training it all the way to November? Well, first of all, as mentioned, it does take a few months to train a model the size of GPT-5.

Yes, they might be able to use, say, 100,000 H100 GPUs from NVIDIA. But training runs have hiccups, and of course the model will be much larger than GPT-4. But let's say that takes around two months. That would bring us to the end of March. Now here's the key point. Sam Altman has boasted many times in the past about how they tested GPT-4 for six to eight months before releasing it.

It would be pretty awkward for OpenAI to have even less safety testing for GPT-5. So add six months to the end of March and you get to the end of September. Of course, add eight months and you get to the end of November. So why end of November rather than end of September?

Well, I think OpenAI will want to steer clear of what will be an incredibly contentious American election. If they release even an alpha version of GPT-5 with, say, video and audio before the election, they could come under incredible flak. As they say on their website, they're still working to understand how effective their current tools might be for personalized persuasion.

That would be stepping into a minefield. In the recent New Hampshire primary, we already saw robocalls imitating Joe Biden. So November 30th, as well as being a symbolic date, would steer clear of that election. You might say, what about 2025? But I think the incentives of Moloch will prevent them from delaying too long.

Before too long, we might be getting the release of Gemini Ultra, not to mention Gemini 2 Ultra and, of course, Llama 3 from Meta, announced by Zuckerberg. When everyone else has caught up, they might feel compelled to release the next model. And Anthropic might choose a similar time to release Claude 3.

Here is the CEO of Anthropic, Dario Amodei, giving his rough predictions for GPT-5 and Claude 3. What do you think happens on the next major training run for LLMs? My guess would be, you know, nothing truly insane happens, say, in any training run that, you know, happens in 2024.

You know, to really invent new science, the ability to cure diseases, the ability to make bioweapons, and maybe someday the Dyson spheres. The least impressive of those things, I think, you know, will happen, I would say, no sooner than 2025, maybe 2026.

I think we're just going to see, in 2024, crisper, more commercially applicable versions of the models that exist today. Like, you know, we've seen a few of these generations of jumps. I think in 2024, people are certainly going to be surprised. Like, they're going to be surprised at how much better these things have gotten.

But it's not going to quite bend reality yet. Of course, I have to say, at the end of this video, that no one truly knows, not even OpenAI, what GPT-5 will be like. As Sam Altman recently said, "Until we go train that model, GPT-5, it's like a fun guessing game for us." I can't tell you, here's exactly what it's going to do that GPT-4 didn't.

And here's Greg Brockman with a similar message. Right, that is the biggest theme in the history of AI, is that it's full of surprises. Every time you think you know something, you scale it up 10x, turns out you knew nothing. And so I think that we, as a humanity, as a species, are really exploring this together.

And then we get cryptic messages like this from senior members of OpenAI. Ben Newhouse says he's hiring at OpenAI, "building what I think could be an industry-defining 0-to-1 product that leverages the latest and greatest from our upcoming models" (i.e., GPT-5). And two other OpenAI employees replied: "This product will change everything" and "what they said." Of course, it would be pure speculation to guess what they mean by this.

So those were my predictions. I try to base them on evidence rather than idle speculation, and I do try not to be lazy, as GPT-4 has in the past been accused of being. If you like this kind of thing, I would invite you to see more exclusive premium content on AI Insiders on Patreon. But for everyone watching to the end, thank you so much for being here, and I wish you a wonderful day.