GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]

Chapters
0:00 Intro
1:04 Checkpoints
2:00 Data Situation
3:32 Unicorn
4:48 Predictions
6:22 Safety
10:34 Reliability
00:00:00.000 | 
Yesterday, Greg Brockman, the president and co-founder of OpenAI, 00:00:03.860 | 
announced the company's ideas about releasing the models beyond GPT-4. 00:00:08.880 | 
In the tweet, he made lots of points, of which I found five to be particularly telling. 00:00:13.540 | 
I will cover all of them, of course, and bring in the outside evidence that reveals more. 00:00:17.760 | 
But let's start with GPT-5, which may begin life as GPT-4.2. 00:00:24.080 | 
It's easy to create a continuum of incrementally better AIs, 00:00:29.760 | 
such as by deploying subsequent checkpoints of a given training run. 00:00:34.640 | 
I'm going to explain that in a moment, but then he goes on, 00:00:37.040 | 
This would be very unlike our historical approach of infrequent major model upgrades. 00:00:43.020 | 
So what he's saying is that it's not all going to be released in one go. 00:00:49.220 | 
So it's not like we're going to wake up overnight and GPT-5 is deployed. 00:00:57.160 | 
But how would they make incrementally better AIs? 00:00:59.760 | 
And what are subsequent checkpoints of a given training run? 00:01:04.000 | 
To be clear, he's not describing a different model each time with more and more parameters. 00:01:08.920 | 
A checkpoint during a training run of GPT-5 would be a snapshot of the current value of the parameters of the model. 00:01:15.840 | 
A bit like its current understanding of the data. 00:01:18.480 | 
And a subsequent checkpoint would be its updated parameters as it processes either more of the data or the same data more times. 00:01:26.240 | 
Kind of like someone who rewatched a film and has a more nuanced understanding of the data. 00:01:29.520 | 
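To make the idea of a checkpoint concrete, here is a toy sketch under loud assumptions: it is not how OpenAI trains or stores GPT models, just a two-parameter linear model fitted by stochastic gradient descent, with a snapshot of the parameters saved after each pass over the same data. The function names `train_with_checkpoints` and `mse` are hypothetical, chosen for illustration.

```python
import random


def train_with_checkpoints(data, epochs, lr=0.05, checkpoint_every=1):
    """Toy example (assumption, not OpenAI's setup): fit y = w*x + b with SGD.

    After every `checkpoint_every` passes (epochs) over the same data,
    record a snapshot of the current parameters: a toy "checkpoint".
    """
    w, b = 0.0, 0.0
    checkpoints = []
    for epoch in range(1, epochs + 1):
        for x, y in data:                # one pass = one epoch over the same data
            err = (w * x + b) - y        # prediction error on this example
            w -= lr * err * x            # squared-error gradient step (factor 2 folded into lr)
            b -= lr * err
        if epoch % checkpoint_every == 0:
            # Floats are immutable, so this dict is an independent snapshot.
            checkpoints.append({"epoch": epoch, "w": w, "b": b})
    return checkpoints


def mse(params, data):
    """Mean squared error of one checkpoint's parameters on the data."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the toy dataset is reproducible
    data = [(i / 10, 3.0 * (i / 10) + 1.0 + rng.gauss(0, 0.05)) for i in range(20)]
    for ckpt in train_with_checkpoints(data, epochs=5):
        print(f"epoch {ckpt['epoch']}: w={ckpt['w']:.3f} "
              f"b={ckpt['b']:.3f} mse={mse(ckpt, data):.4f}")
```

Run on its own, the script prints one line per checkpoint, and the error generally falls from one snapshot to the next. That is the sense in which successive checkpoints of a single training run could be released as "incrementally better" versions. Real large-model training saves vastly larger parameter sets (PyTorch, for example, serializes a model's `state_dict`), but the idea is the same.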
First, I want to answer those people who are thinking, "Isn't it already trained on all of the data on the internet?" 00:01:37.640 | 
Now, I did cover this in more detail in my first GPT-5 video. 00:01:45.000 | 
In that video, I talked about how OpenAI may still have an order of magnitude more data to use. 00:01:53.320 | 
And Ilya Sutskever, the chief scientist of OpenAI, put it like this, saying the data situation is still quite good. 00:02:00.160 | 
Are you running out of reasoning tokens on the internet? 00:02:03.080 | 
There are claims that indeed at some point we will run out of tokens in general to train those models. 00:02:09.680 | 
And by the time that happens, we need to have other ways of training models without more data. 00:02:17.240 | 
Yeah, I would say the data situation is still quite good. 00:02:25.880 | 
What would you trade many other tokens of other varieties for? 00:02:29.280 | 
You'd like tokens which are speaking about smarter things, tokens which are more interesting. 00:02:35.520 | 
When he talked about tokens which are speaking about smarter things, 00:02:38.600 | 
you can imagine the kind of data he's talking about. 00:02:40.880 | 
Proprietary data sets on mathematics, science, coding. 00:02:44.680 | 
They could essentially buy their way to more data, and higher-quality data. 00:02:48.920 | 
But there is another key way that they're going to get way more data. 00:02:54.080 | 
They can use your prompts, your responses, your uploaded images and generated images to improve their services. 00:03:00.760 | 
This is honestly why I think he said that the data situation looks good. 00:03:04.960 | 
Now on another page, they do admit that you can request to opt out of having your data 00:03:09.760 | 
used to improve their services by filling out a form. 00:03:14.640 | 
It does make me wonder what it might know about itself if it's trained on its own conversations. 00:03:20.040 | 
But before we get back to Brockman's tweet, what might those different checkpoints look like? 00:03:26.600 | 
Here is a quick example from Sebastian Bubeck. 00:03:35.720 | 
So you see, when I see that, I am personally shocked because it really understands the 00:03:43.020 | 
And just to be clear, you know, so that you really understand visually, it's clear to 00:03:52.940 | 
Over the months, so you know, we had access in September and they kept training it and 00:03:58.560 | 
I kept querying for my unicorn in TikZ to see whether, you know, what was going to 00:04:11.040 | 
Brockman said that perhaps the most common theme from the long history of AI has been incorrect confident predictions from experts. 00:04:18.560 | 
There are so many that we could pick from, but let me give you two quick examples. 00:04:22.280 | 
This week, there was a report in The Guardian about an economist who saw ChatGPT get a D 00:04:28.320 | 
He predicted that a model wouldn't be able to get an A in his exam before 2029. 00:04:33.320 | 
He said, to my surprise and no small dismay, the new version of the system, GPT-4, got 00:04:42.000 | 
It still hasn't aced the exam, but you can see the direction of travel. 00:04:45.900 | 
But what about predictions about, say, mathematics? 00:04:48.440 | 
Even AI experts who are most familiar with exponential curves are still poor at predicting 00:04:54.360 | 
progress even though they have that cognitive bias. 00:04:58.840 | 
In 2021, a set of professional forecasters, very familiar with exponentials, were 00:05:05.720 | 
asked to make a set of predictions and there was a $30,000 pot for making the best predictions. 00:05:11.080 | 
And one of the questions was, when will AI be able to solve competition level mathematics 00:05:18.140 | 
This is the kind of example of the questions that are in this test set. 00:05:22.180 | 
The prediction from the experts was that AI would reach 52% accuracy in four years. 00:05:27.240 | 
The third interesting point from the tweet was how he mentioned existential risks without 00:05:38.860 | 
He said it's important to address the whole spectrum of risks from present day issues 00:05:46.360 | 
Existential, by the way, means threatening the entire existence of humanity. 00:05:52.580 | 
He could have just said that this is fear mongering, but he didn't, and that chimes 00:05:57.000 | 
So consider what recent surveys have shown among AI researchers. 00:06:06.000 | 
50% of AI researchers believe there's a 10% or greater chance that humans go extinct from 00:06:10.580 | 
That would be like if you're about to get on a plane and 50% of the engineers who make 00:06:14.840 | 
the plane say, well, if you get on this plane, there's a 10% chance that everybody goes down. 00:06:21.660 | 
Now, in fairness, not every machine learning researcher who was sent the survey responded. 00:06:26.420 | 
So we don't know quite how representative this is. 00:06:29.340 | 
But AI alignment is a big topic and I will be covering it more in the near future. 00:06:33.900 | 
At the end of the tweet, Brockman said that the whole range of emotions, optimism and 00:06:38.360 | 
concern, is both justified and shared by people within OpenAI. 00:06:45.800 | 
But earlier in the tweet, he talked about some safety successes. 00:06:50.420 | 
GPT-4 performs much better on all of our safety metrics than GPT-3.5. 00:07:01.760 | 
I believe that he's referring to this chart from the GPT-4 technical report. 00:07:06.320 | 
It says that GPT-4, in green, displays incorrect behavior at a lower rate than GPT-3.5. 00:07:13.640 | 
This is on sensitive prompts and disallowed prompts. 00:07:17.120 | 
And by the way, this is the fine-tuned version, 00:07:20.180 | 
with human feedback incorporated into the raw model. 00:07:23.360 | 
But what are sensitive prompts and disallowed prompts? 00:07:27.440 | 
A disallowed prompt would be something like, how can I create a bomb? 00:07:30.920 | 
A sensitive prompt would be something like asking for medical advice. 00:07:34.640 | 
And it says that GPT-4 responds in accordance with their policies 29% more often. 00:07:41.960 | 
But I'm doing research for a video I hope to release soon on how GPT-4 in an emergent way can autonomously conduct 00:07:51.020 | 
This paper was released two days ago and I read it in full on the day of publication. 00:07:55.760 | 
It describes how GPT-4, in contrast to the original ChatGPT, can use tools and come up with novel 00:08:03.080 | 
On the positive side, that could include anti-cancer drugs, but on the negative side, it could be 00:08:08.900 | 
And one of the calls to action of the paper is on screen. 00:08:11.660 | 
We strongly believe that guard rails must be put in place to prevent this type of potential 00:08:19.700 | 
community to engage in prioritizing safety of these powerful models. 00:08:23.360 | 
And in particular, we call upon OpenAI, Microsoft, Google, Meta, DeepMind, Anthropic and all the other 00:08:28.280 | 
major players to push the strongest possible efforts on the safety of their LLMs. 00:08:33.680 | 
So maybe that persuades some people who think that there shouldn't be any disallowed prompts. 00:08:38.300 | 
But it does make me reflect on this quote that GPT-4 performs better on all safety metrics. 00:08:44.120 | 
And the question that I'm pondering is whether a smarter model can ever really be safer. Is it 00:08:49.460 | 
not simply inherent that something smarter is more capable, for better or ill, no matter 00:08:56.060 | 
The final point that I found interesting from this tweet is in the last line. 00:09:00.200 | 
Brockman said that it's a special opportunity and obligation for us all to be alive at this time. 00:09:05.720 | 
I think he meant to say it's an opportunity and obligation on all of us who are alive. 00:09:10.580 | 
But anyway he said that we will have a chance to design the future together. 00:09:14.600 | 
Now that's a really nice sentiment but it does seem to go against the trend at the moment for a 00:09:19.220 | 
few people at the very top of these companies to be making decisions that affect billions of people. 00:09:25.040 | 
So I do want to hear more about what he actually means when he says that we will have a chance to 00:09:29.540 | 
design the future together. But for now I want to quickly talk about timelines. The guy behind 00:09:34.340 | 
Stable Diffusion said something really interesting recently. He said nobody is launching runs bigger 00:09:39.200 | 
than GPT-4 for six to nine months anyway. Why? Because it needs the new H100s that I 00:09:44.480 | 
talked about in a previous video to get scale, and they take time to be installed, 00:09:48.980 | 
burned in, optimized, etc. And Brockman mentioned something that we already knew, which is that there 00:09:54.680 | 
might be a lag of safety testing after a model is trained and before it's released. So depending on 00:10:00.740 | 
those safety tests, my personal prediction for when GPT-4.2, let's call it, will be released 00:10:06.920 | 
would be mid-2024. If you're watching this video in mid-2024 or later, you can let me 00:10:12.980 | 
know in the comments how I did. I've talked a fair bit about the capabilities that GPT-5 or 4.2 00:10:18.740 | 
might have but to finish I want to talk about some of the limitations or weaknesses it might 00:10:24.200 | 
still have. Rather than me speculate, I want you to hear from Ilya Sutskever about one of 00:10:29.420 | 
the possible remaining weaknesses that GPT-5 or 4.2 might have. If I were to take the premise 00:10:36.320 | 
of your question, well, like, why were things disappointing in terms of the real-world impact? 00:10:41.120 | 
My answer would be reliability. If somehow it ends up being the case that you really want them to be 00:10:48.500 | 
reliable and they ended up not being reliable, or if reliability turned out to be harder than we expect. 00:10:54.320 | 
I really don't think that will be the case, but if I had to pick one, if I had to pick one and you tell 00:11:00.020 | 
me, like, hey, why didn't things work out, it would be reliability: that you still have to look 00:11:05.060 | 
over the answers and double-check everything, and that just really puts a damper on the economic 00:11:10.700 | 
value that can be produced by those systems. Let me know what you think in the comments and have a wonderful day.