GPT-5: Everything You Need to Know So Far
It seems quite likely that yesterday was the day that OpenAI launched the full training run of GPT-5. I've gone through every source I can find to bring you the most reliable information on what that means, including possibly every public comment on the topic from OpenAI, an exclusive interview with a hardware CEO, and tons of my own analysis. Plus, I'm going to find time to throw in a practical tip that you can use literally every time you open ChatGPT, and a bonus DALL-E discovery that I really enjoy. But let's start with these two tweets. The first clue that the full-scale GPT-5 is being trained comes from the president and co-founder of OpenAI, Greg Brockman.
Now first, a little bit of context. OpenAI typically trains smaller models, about a thousandth of the final size, and gathers insights from them before committing to the full training run. So that's the backdrop to OpenAI, in Brockman's words, "scientifically predicting and understanding the resulting systems."
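To make that concrete, here is a minimal sketch of the kind of scaling-law extrapolation this enables: fit a power law to the losses of small runs, then predict the loss of the full run. Every number here is an illustrative stand-in of mine, not anything OpenAI has published.

```python
# Minimal sketch: extrapolating a compute scaling law from small runs.
# All data points and constants are illustrative stand-ins.
import numpy as np
from scipy.optimize import curve_fit

def loss_curve(compute, a, b, floor):
    """Power law: loss falls as compute^-b toward an irreducible floor."""
    return a * compute ** -b + floor

# Hypothetical (relative compute, final loss) pairs from small runs.
compute = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
losses  = np.array([2.90, 2.71, 2.52, 2.38, 2.25])

params, _ = curve_fit(loss_curve, compute, losses, p0=(1.0, 0.1, 1.0))
full_run = 1e5  # a run ~1000x bigger than the largest small run
print(f"Predicted full-run loss: {loss_curve(full_run, *params):.2f}")
```

The point isn't the exact numbers: it's that a handful of cheap runs can pin down the curve well enough to predict, before spending the compute, roughly how good the big model will be.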
That being done, what they're now building is maximally harnessing all their computing resources: they're gathering all of their ideas together and "scaling beyond precedent". Translated, they are training their biggest model yet. We'll get to parameters, data, and capabilities in a moment. But first, what's this other tweet?
This comes from Jason Wei, a top OpenAI researcher. A few hours after Brockman's tweet, he said there's no adrenaline rush like launching a massive GPU training run. And this got plenty of salutes in the replies from other OpenAI employees. Now for context, this does not mean that we're imminently going to get GPT-5. GPT-4 took around three months to train, and then there was the safety testing. I'm actually going to end this video with my exact prediction of when I think they're going to release GPT-5. But first, here is a little more supporting evidence that they are currently training GPT-5. OpenAI updated their blog to say that applications for the red teaming network have closed, and that those red teamers would know the status of their applications by the end of last year. What that means is that the red teamers are now in place to start safety testing the new model. Now, you might say: what's the point of having those red teamers in place if the model is still going to be training for two to three months? Well, before a model is fully trained, it goes through various checkpoints. Think of them a bit like a video game save. What that also means is that, in effect, OpenAI will have a GPT-4.2 before they have a GPT-5.
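Since checkpoints carry so much of the argument here, a quick sketch of what one actually is, in toy PyTorch form (the model, interval, and file names are illustrative only):

```python
# Toy sketch of training checkpoints, the "video game saves" of a run.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

CHECKPOINT_EVERY = 100  # a frontier run would checkpoint far less often

for step in range(1, 501):
    x = torch.randn(32, 16)
    y = x.sum(dim=1, keepdim=True)
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % CHECKPOINT_EVERY == 0:
        # Everything needed to resume training -- or to evaluate and even
        # ship this intermediate model -- is frozen into a single file.
        torch.save({
            "step": step,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        }, f"ckpt_step_{step}.pt")
```

Each saved file is a fully usable model from partway through training, which is why releasing an intermediate "GPT-4.2" is technically trivial.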
Indeed, Greg Brockman, going back to April of last year, said that it might be one of those checkpoints that OpenAI release first. He said that it's easy to create a continuum of incrementally better AIs, such as by deploying subsequent checkpoints of a given training run, and he explicitly contrasted that approach with what he called their historical approach of infrequent major model upgrades. But remember that even before those checkpoints, OpenAI would have already gotten a glimpse of GPT-5's capabilities from those smaller, earlier versions of the model.
Indeed, Sam Altman, back in November, said that he was privileged to be in the room when they pushed back the veil of ignorance. "I'm super excited. I can't imagine anything more exciting to work on. And on a personal note, like four times now in the history of OpenAI, the most recent time was just in the last couple of weeks, I've gotten to be in the room when we sort of push the veil of ignorance back and the frontier of discovery forward. And getting to do that is like the professional honor of a lifetime."
It all points to OpenAI in November and December having trained those smaller versions of GPT-5, again with the purpose of "scientifically predicting and understanding" the resulting GPT-5 system. So they know it's going to be good. But just how good, and how big? And what are these new or old ideas that they're going to incorporate? Well, one thing that seems almost certain is that they're going to incorporate a way to let GPT-5 think for longer. In other words, it will lay out its reasoning steps before solving a challenge and have each of those reasoning steps checked, internally or externally. Here's Sam Altman a few days ago at Davos.
"What it means to verify or understand what's going on is going to be a little bit different than people think right now. I actually can't look in your brain and look at the hundred trillion synapses and try to understand what's happening in each one and say, okay, I really understand why he's thinking what he's thinking. You're not a black box to me. But what I can ask you to do is explain to me your reasoning. I can say, you know, you think this thing, why? And you can explain first this, then this, then there's this conclusion, then that one, and then there's this. And I can decide if that sounds reasonable to me or not. And I think our AI systems will also be able to do the same thing. They'll be able to explain to us, in natural language, the steps from A to B, and we can decide whether we think those are good steps."
And a few days before that, Sam Altman told Bill Gates that that might involve asking GPT-4 or GPT-5 the same question 10,000 times. "You know, when you look at the next two years, what do you think some of the key milestones will be?"
"Speech in, speech out. Images. Eventually video. Clearly, people really want that. We launched images and audio, and it had a much stronger response than we expected. We'll be able to push that much further. But maybe the most important areas of progress will be around reasoning ability. Right now, GPT-4 can reason in only extremely limited ways. And also reliability. If you ask GPT-4 most questions 10,000 times, one of those 10,000 is probably pretty good, but it doesn't always know which one. And you'd like to get the best response of 10,000 each time. That increase in reliability will be important."
And at this point, watchers of my channel will know exactly what he's referring to. Both of those approaches, checking your reasoning steps and sampling up to 10,000 times, are incorporated into OpenAI's Let's Verify Step by Step paper. Now, I'm not going to dive into the details of Let's Verify in this video, because I've got at least two previous videos on the topic.
But notice in the paper how many times they sample GPT-4: the chart shows what happens when you sample the model over a thousand times. And notice something about this process-supervised way of doing things. I can't resist showing you a quick example of the reasoning steps broken down into separate lines, with a verifier essentially looking in and checking which steps are accurate or inaccurate. The answers for which each step in the reasoning process got a thumbs up were the ones submitted, and the results were dramatic. Essentially, sampling the model thousands of times and taking the answer with the highest-rated reasoning steps dramatically outperformed simply taking the majority answer. And no, this didn't just work for mathematics: it had dramatic results across these STEM fields. And remember, this was using GPT-4 as a base model, not GPT-5. And it was only around 2,000 samples, not 10,000 like Sam Altman talked about.
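For readers who want the mechanics, here is a minimal sketch of that verifier-guided best-of-n selection. The two callables are hypothetical stand-ins: in the actual paper, the generator is an LLM and the scorer is a trained process reward model.

```python
# Sketch of process-supervised best-of-n, in the spirit of
# "Let's Verify Step by Step". `generate_solution` and `score_step`
# are hypothetical stand-ins for an LLM sampler and a trained
# process reward model (PRM).
from typing import Callable, List

def best_of_n(
    question: str,
    generate_solution: Callable[[str], List[str]],  # one solution = list of steps
    score_step: Callable[[str, str], float],        # per-step score in [0, 1]
    n: int = 2000,
) -> List[str]:
    best_steps, best_score = [], float("-inf")
    for _ in range(n):
        steps = generate_solution(question)
        # Score a solution by multiplying per-step scores, so a single
        # weak step sinks the whole chain of reasoning.
        solution_score = 1.0
        for step in steps:
            solution_score *= score_step(question, step)
        if solution_score > best_score:
            best_steps, best_score = steps, solution_score
    return best_steps
```

The selection rule is the whole trick: instead of trusting any one sample, you trust the sample whose every reasoning step a verifier liked.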
So this is the evidence I would present to someone who said that LLMs have peaked. If OpenAI can incorporate, through parallelization, a way to get the model to submit and analyze 10,000 responses, the gains seen at 2,000 samples should only grow.
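And parallelization is the easy part, since the samples are completely independent of one another. A trivial sketch, with a hypothetical `sample_model` standing in for one API call:

```python
# The n samples don't depend on each other, so they fan out trivially.
import random
from concurrent.futures import ThreadPoolExecutor

def sample_model(question: str) -> str:
    # Hypothetical stand-in for a single model call.
    return f"candidate answer #{random.randint(1, 5)}"

def parallel_samples(question: str, n: int = 10_000, workers: int = 64) -> list[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda _: sample_model(question), range(n)))

candidates = parallel_samples("What is 17 * 24?")
print(len(candidates), "candidates collected")
```

Wall-clock time scales with n divided by the number of workers, which is why "ask it 10,000 times" is an engineering problem rather than a research one.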
Indeed, the Let's Verify paper from OpenAI repeatedly cited this earlier DeepMind paper on solving math problems with process-based feedback. And for coding, we know AlphaCode 2 from Google DeepMind used the mass-sampling approach to get an 87th percentile score in a coding contest. In other words, it beat 87% of participants in this Codeforces coding challenge. For context, the GPT-4 that we got scored around the 5th percentile. These numbers are a little out of date, but AlphaCode 2 would have scored around here. Or, to translate all of this: if they find a way to let GPT-5 think, it could be night and day in terms of performance for coding, mathematics, and STEM.
But just how big will the GPT-5 that's doing all of this parallel thinking be? Well, for AI Insiders, I interviewed Gavin Uberti, the CEO and co-founder of Etched AI, which I've also talked about on this channel. He is a 21-year-old dropout from Harvard University, and his LinkedIn profile says he's building the hardware for superintelligence. In the interview, he guessed that GPT-5 would have around 10 times the parameter count of GPT-4, which, according to leaks, has around 1.5 to 1.8 trillion parameters. But just quickly, what did he mean when he said that he expects that to come from a combination of a larger embedding dimension, more layers, and double the number of experts? Well, think of the embedding dimension as governing how granular the model's representation of each token can be: a bigger embedding dimension means more granularity and nuance about each token. And doubling the number of layers allows a model to develop deeper pattern recognition: it can see more complex patterns within patterns within patterns.
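To see how those three levers could compound to roughly 10x, here's a back-of-the-envelope sketch using the standard dense-transformer approximation of about 12 × layers × d_model² weights. Every concrete number below is my own illustrative assumption, not a leak.

```python
# Back-of-the-envelope: wider embeddings, more layers, and 2x experts
# compounding to ~10x the parameters. All numbers are illustrative.

def dense_params(n_layers: int, d_model: int) -> float:
    # Standard approximation for a dense transformer stack:
    # ~12 * n_layers * d_model^2 (attention plus MLP weights).
    return 12 * n_layers * d_model**2

base   = dense_params(n_layers=120, d_model=12_288)  # GPT-4-scale guess
bigger = dense_params(n_layers=160, d_model=24_576)  # deeper and wider

# In a mixture-of-experts model, doubling the expert count roughly
# doubles the expert parameters without doubling per-token compute.
with_experts = 2 * bigger

print(f"dense growth:    {bigger / base:.1f}x")        # ~5.3x
print(f"with 2x experts: {with_experts / base:.1f}x")  # ~10.7x
```

Doubling the width alone quadruples the weight matrices, which is why a modest-sounding combination of changes multiplies out to an order of magnitude.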
More highlights from that interview will be coming soon on AI Insiders. But during this video on GPT-5, I promised you two interludes: one focused on DALL-E 3, and one a practical tip for using ChatGPT any time. Well, here is the first of those two interludes, focused on a particular quirk of DALL-E 3. I say that, but this trick also works for Midjourney.
Now, many of you might have noticed a trend on TikTok, Reddit, Twitter, and YouTube Shorts, where people post an image and then make it progressively more intense, let's say. Well, here's something arguably even more quirky. I got the original idea from Peter Wildeford on Twitter and decided to make it more intense. But first I asked: draw an image of a London scene, but don't use lampposts in the image. And lo and behold, we get dozens of lampposts. I mean, in this one, we have a lamppost coming down from the sky. It says that these are two images of a London street scene without lampposts. So then I said: now make these images with even fewer lampposts, stripping any lamppost references completely. As you can see on the right, there's barely a lamppost in sight. And as GPT-4 says, these images were created with complete omission of any lamppost references. Then I said: delete absolutely everything that pertains to a lamppost. And I don't know about you, but I can't see any lampposts left. And finally, I said: take this to the max and give me an image that someone could not picture a lamppost existing within, even in their wildest imagination. Now, I think it's pretty cute that the lampposts persisted throughout these images. And I suspect that the deeper reason is that the caption training that DALL-E 3 got didn't have many examples of omission. They used web captions and synthetic captions, but I doubt there were many examples of people saying "this image does not contain X".
But speaking of modality, the first thing they apparently want to fix for GPT-5 is the real-time nature of the voice interaction. At the moment, there is quite a bit of time-to-first-token latency; in other words, it takes a bit too long to reply.
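(If you want to put a number on that, time-to-first-token is easy to measure against any streaming chat endpoint. `stream_chat` below is a generic placeholder of mine, not a specific SDK's method.)

```python
# Sketch: measuring time-to-first-token (TTFT) on a streaming API.
# `stream_chat` is a hypothetical placeholder for any streaming client
# that yields tokens as they arrive.
import time
from typing import Callable, Iterable

def measure_ttft(stream_chat: Callable[[str], Iterable[str]], prompt: str) -> float:
    start = time.perf_counter()
    for _token in stream_chat(prompt):
        # Latency to the very first token is what makes voice feel laggy.
        return time.perf_counter() - start
    return float("inf")  # the stream produced nothing
```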
"I think there's all sorts of the current stuff that people complain about, like the voice is too slow and, you know, it's not real time, and that'll get better this year. I think where we're headed, and then I'll talk about this year, is we're headed towards: the way you use the computer is to talk to it. The operating system of a computer in some sense is close to this idea that you're, like, working inside of a chat experience."
When he mentioned using an LLM as an operating system, that brought to mind this diagram. I've talked about it before, but notice at the top that it's video in and out. They want as much text, image, audio, and video data as they can get their hands on. They also want what I'm going to call reasoning data: "data that expresses human intention", as they call it. Now, I didn't notice that phrase at the time of this blog post in November, but it fits in clearly with what I was saying earlier about Let's Verify.
How would you make a model agentic, able to solve more complex challenges? Well, if GPT-5 gets loads of data about people laying out plans full of human intention, it could learn to imitate those schemes and plans, and maybe have a verifier, internal or external, judging those reasoning steps.
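As a sketch of what that could look like operationally, here is a minimal plan-then-verify agent loop. The structure and all three callables are my own illustration, not a described OpenAI design.

```python
# Illustrative plan-then-verify agent loop; not a described OpenAI design.
from typing import Callable, List

def run_agent(
    goal: str,
    propose_plan: Callable[[str], List[str]],  # model drafts a list of steps
    verify_step: Callable[[str, str], bool],   # verifier approves each step
    execute_step: Callable[[str], str],        # carry out one approved step
    max_attempts: int = 3,
) -> List[str]:
    for _ in range(max_attempts):
        plan = propose_plan(goal)
        if all(verify_step(goal, step) for step in plan):
            # Only a plan whose every step passed verification gets executed.
            return [execute_step(step) for step in plan]
    return []  # no verifiable plan found within the attempt budget
```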
Now, as for the question of whether those reasoning steps faithfully represent what the model is internally calculating, that will have to wait for another day. This paper from Anthropic, back in July, found that as models become larger and more capable, they actually produce less faithful reasoning on most tasks studied. That doesn't mean getting the answer wrong more often; it means outputting reasoning steps that don't actually reflect what the model is internally calculating. So GPT-5 may end up being an excellent productivity assistant while still being somewhat inscrutable on a deeper level.
Just quickly, before we leave data, I think there's one thing that we can safely say, which is that there will be much more multilingual data in GPT-5's training set. OpenAI have formed so many data partnerships, including with the likes of the Icelandic government, and there are so many more multilingual datasets being open-sourced, that I think a dramatic step forward in multilingual performance is almost inevitable. That also fits with OpenAI wanting its red teamers to be fluent in more than one language. Models are notoriously easier to jailbreak in different languages, and it looks like OpenAI are working hard on that front.
But there is one language that I bet you didn't know GPT-4 can already speak: scrambled text. According to this fascinating paper from Tokyo, GPT-4 can almost perfectly handle unnatural, scrambled text. Now, you might already know that humans have a version of this ability: if the first and last letter of each word stay in place, we can usually still read a sentence whose middle letters are scrambled. But GPT-4, and presumably GPT-5, can go a step further: even if the first and last letters are different, even if a word is completely scrambled, it can recover the sentence. Just look at how utterly garbled this sentence is. For me and you, it would be almost complete gobbledygook, but GPT-4 was able to recognize what it says.
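If you want to reproduce this yourself, here's a small script that produces both kinds of scrambling: the human-readable kind (first and last letters fixed) and the fully scrambled kind the paper tested.

```python
# Scramble text two ways: keeping first/last letters (humans can still
# read this) and fully shuffling each word (the harder case the Tokyo
# paper reports GPT-4 can largely recover).
import random

def scramble_word(word: str, keep_ends: bool) -> str:
    if len(word) <= 3:
        return word  # nothing meaningful to shuffle
    if keep_ends:
        middle = list(word[1:-1])
        random.shuffle(middle)
        return word[0] + "".join(middle) + word[-1]
    letters = list(word)
    random.shuffle(letters)
    return "".join(letters)

def scramble_sentence(sentence: str, keep_ends: bool = True) -> str:
    return " ".join(scramble_word(w, keep_ends) for w in sentence.split())

text = "the quick brown fox jumps over the lazy dog"
print(scramble_sentence(text))                   # readable-ish for humans
print(scramble_sentence(text, keep_ends=False))  # gobbledygook for humans
```

Paste the fully scrambled output into ChatGPT and ask it to restore the sentence.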
So that is the practical tip that I also wanted to give you in this video: when you're prompting ChatGPT, don't bother going back and spending 30 seconds correcting all your typos. Trust me, I've been guilty of this in the past because I love perfect English, but if you have a letter or two out of place, don't worry, it will understand. To be honest, if it can unscramble this, it can understand your typo of the word "there". So save yourself 30 seconds and don't even bother correcting your typos.
But now it's time for me to finally give my prediction for when GPT-5 is going to be released. For the last few weeks, I've honestly thought it would be around September of this year. But now I think it will be toward the end of November of 2024. And no, that's not just because that would be the two-year anniversary of the release of the original ChatGPT. First, let me clarify that I don't think they will release the full capabilities of GPT-5 on the first go. As mentioned, I think they'll release different checkpoints and different functionalities as we head into 2025.
But what explains the delay from now, when they're training it, all the way to November? Well, first of all, as mentioned, it does take a few months to train a model the size of GPT-5. Yes, they might be able to use, say, 100,000 H100 GPUs from NVIDIA, but training a model has hiccups and, of course, the model will be much larger. So let's say that takes around two months; that would bring us to the end of March. Sam Altman has boasted many times in the past about how they tested GPT-4 for six to eight months before releasing it, and it would be pretty awkward for OpenAI to do even less safety testing for GPT-5. So add six months to the end of March and you get to the end of September. Of course, add eight months and you get to the end of November.
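As a quick sanity check of that date arithmetic (the start date is my own assumption, taken from the video's premise that the full run began yesterday):

```python
# Sanity-checking the release-window arithmetic. The start date is an
# assumption based on the premise that the full run began "yesterday".
from datetime import date

def add_months(d: date, months: int) -> date:
    month_index = d.month - 1 + months
    return date(d.year + month_index // 12, month_index % 12 + 1, d.day)

training_start = date(2024, 1, 26)              # assumed
training_done  = add_months(training_start, 2)  # ~end of March
six_months     = add_months(training_done, 6)   # ~end of September
eight_months   = add_months(training_done, 8)   # ~end of November
print(training_done, six_months, eight_months)
```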
So why the end of November rather than the end of September? Well, I think OpenAI will want to steer clear of what will be an incredibly contentious American election. If they release even an alpha version of GPT-5 with, say, video and audio before the election, they could come under incredible flak. As they say on their website, they're still working to understand how effective their current tools might be for personalized persuasion. Around the recent New Hampshire primary, we already saw robocalls imitating Joe Biden. So November 30th, as well as being a symbolic date, would steer clear of that election.
But I think the incentives of Moloch will prevent them from delaying too long. Before too long, we might be getting the release of Gemini Ultra, not to mention Gemini 2 Ultra and, of course, Llama 3 from Meta, announced by Zuckerberg. When everyone else has caught up, they might feel compelled to release the next model. And Anthropic might choose a similar time to release Claude 3. Here is the CEO of Anthropic, Dario Amodei, giving his rough predictions for GPT-5 and Claude 3.
"What do you think happens on the next major training run for LLMs?" "My guess would be, you know, nothing truly insane happens, say, in any training run that... You know, to really invent new science, the ability to cure diseases, the ability to make bio... yeah, the ability to make bioweapons, yeah, and maybe someday the Dyson spheres. The least impressive of those things, I think, you know, will happen, I would say, no sooner than 2025, maybe 2026. [In the meantime, expect] crisper, more commercially applicable versions of the models that exist today. Like, you know, we've seen a few of these generations of jumps. I think in 2024, people are certainly going to be surprised. They're going to be surprised at how much better these things have gotten. But it's not going to quite bend reality yet."
Of course, I have to say, at the end of this video, that no one truly knows, not even OpenAI, what GPT-5 will be like. As Sam Altman recently said: "Until we go train that model, GPT-5, I can't tell you, here's exactly what it's going to do that GPT-4 didn't." And here's Greg Brockman with a similar message: "Right, that is the biggest theme in the history of AI: it's full of surprises. Every time you think you know something, you scale it up 10x and it turns out you knew nothing. And so I think that we, as a humanity, as a species, are really exploring this together."
And then we get cryptic messages like this from senior members of OpenAI: "And we're building what I think could be an industry-defining 0-to-1 product that leverages the latest and greatest from our upcoming models, i.e. GPT-5." And two other OpenAI employees replied like this: "This product will change everything" and "What they said." Of course, it would be pure speculation to guess what they mean by this. When I do make predictions, I try to base them on evidence rather than idle speculation; I do try not to be lazy, as GPT-4 has in the past been accused of being. I would invite you to see more exclusive premium content on AI Insiders on Patreon. I want to thank you so much for being here, and I want to wish you a wonderful day.