GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]
Chapters
0:00 Intro
1:04 Checkpoints
2:00 Data Situation
3:32 Unicorn
4:48 Predictions
6:22 Safety
10:34 Reliability
00:00:00.000 |
Yesterday, Greg Brockman, the president and co-founder of OpenAI, 00:00:03.860 |
announced the company's ideas about releasing the models beyond GPT-4. 00:00:08.880 |
In the tweet, he made lots of points, of which I found five to be particularly telling. 00:00:13.540 |
I will cover all of them, of course, and bring in the outside evidence that reveals more. 00:00:17.760 |
But let's start with GPT-5, which may begin life as GPT-4.2. 00:00:24.080 |
"It's easy to create a continuum of incrementally better AIs, 00:00:29.760 |
such as by deploying subsequent checkpoints of a given training run." 00:00:34.640 |
I'm going to explain that in a moment, but then he goes on: 00:00:37.040 |
"This would be very unlike our historical approach of infrequent major model upgrades." 00:00:43.020 |
So what he's saying is that it's not all going to be released in one go. 00:00:49.220 |
So it's not like we're going to wake up overnight and GPT-5 is deployed. 00:00:57.160 |
But how would they make incrementally better AIs, 00:00:59.760 |
and what are subsequent checkpoints of a given training run? 00:01:04.000 |
To be clear, he's not describing a different model each time with more and more parameters. 00:01:08.920 |
A checkpoint during a training run of GPT-5 would be a snapshot of the current value of the parameters of the model. 00:01:15.840 |
A bit like its current understanding of the data. 00:01:18.480 |
And a subsequent checkpoint would be its updated parameters as it processes either more of the data or the same data more times. 00:01:26.240 |
Kind of like someone who rewatched a film and has a more nuanced understanding of it. 00:01:29.520 |
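To make the checkpoint idea concrete, here is a minimal sketch in plain Python. It is a toy, not how OpenAI trains anything: the two-parameter linear model, the learning rate, and the function names `train_with_checkpoints` and `mse` are all assumptions made up for illustration. It shows one training run saving a snapshot of its parameters every so often, with each later snapshot fitting the data better than the one before.

```python
import copy
import random


def train_with_checkpoints(steps=300, checkpoint_every=100, lr=0.05, seed=0):
    """Fit y = 3x + 1 by gradient descent, snapshotting the parameters periodically."""
    rng = random.Random(seed)
    # Toy dataset (hypothetical): 50 points on the line y = 3x + 1.
    data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(50))]
    params = {"w": 0.0, "b": 0.0}
    checkpoints = []  # list of (step, snapshot of the parameters at that step)

    for step in range(1, steps + 1):
        # Full-batch gradient of the mean squared error: the same data, seen again.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = params["w"] * x + params["b"] - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        params["w"] -= lr * grad_w
        params["b"] -= lr * grad_b

        if step % checkpoint_every == 0:
            # A checkpoint is just a copy of the current parameter values.
            checkpoints.append((step, copy.deepcopy(params)))

    return data, checkpoints


def mse(params, data):
    """Mean squared error of a parameter snapshot on the dataset."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)


data, checkpoints = train_with_checkpoints()
losses = [mse(p, data) for _, p in checkpoints]
for (step, p), loss in zip(checkpoints, losses):
    print(f"checkpoint at step {step}: w={p['w']:.3f}, b={p['b']:.3f}, loss={loss:.8f}")
```

Each saved snapshot could in principle be deployed on its own, which is the sense in which a single run can yield a series of incrementally better models.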
First, I want to answer those people who are thinking, "Isn't it already trained on all of the data on the internet?" 00:01:37.640 |
Now, I did cover this in more detail in my first GPT-5 video. 00:01:45.000 |
In that video, I talked about how OpenAI may still have an order of magnitude more data to use. 00:01:53.320 |
And Ilya Sutskever, the chief scientist of OpenAI, put it like this, on the data situation: 00:02:00.160 |
Are you running out of reasoning tokens on the internet? 00:02:03.080 |
There are claims that indeed at some point we will run out of tokens in general to train those models. 00:02:09.680 |
And by the time that happens, we need to have other ways of training models without more data. 00:02:17.240 |
Yeah, I would say the data situation is still quite good. 00:02:25.880 |
What would you trade many other tokens of other varieties for? 00:02:29.280 |
You'd like tokens which are speaking about smarter things, tokens which are more interesting. 00:02:35.520 |
When he talked about tokens which are speaking about smarter things, 00:02:38.600 |
you can imagine the kind of data he's talking about. 00:02:40.880 |
Proprietary data sets on mathematics, science, coding. 00:02:44.680 |
They could essentially buy their way to more data and more high-quality data. 00:02:48.920 |
But there is another key way that they're going to get way more data. 00:02:54.080 |
They can use your prompts, your responses, your uploaded images and generated images to 00:03:00.760 |
This is honestly why I think he said that the data situation looks good. 00:03:04.960 |
Now on another page, they do admit that you can request to opt out of having your data 00:03:09.760 |
used to improve their services by filling out a form. 00:03:14.640 |
It does make me wonder what it might know about itself if it's trained on its own conversations. 00:03:20.040 |
But before we get back to Brockman's tweet, what might those different checkpoints look 00:03:26.600 |
Here is a quick example from Sebastian Bubeck. 00:03:35.720 |
So you see, when I see that, I am personally shocked because it really understands the 00:03:43.020 |
And just to be clear, you know, so that you really understand visually, it's clear to 00:03:52.940 |
Over the months, so you know, we had access in September and they kept training it and 00:03:58.560 |
I kept querying for my unicorn in TikZ to see whether, you know, what was going to 00:04:11.040 |
He said perhaps the most common theme from the long history of AI has been incorrect 00:04:18.560 |
There are so many that we could pick from, but let me give you two quick examples. 00:04:22.280 |
This week, there was a report in The Guardian about an economist who saw ChatGPT get a D 00:04:28.320 |
He predicted that a model wouldn't be able to get an A in his exam before 2029. 00:04:33.320 |
He said, to my surprise and no small dismay, the new version of the system, GPT-4, got 00:04:42.000 |
It still hasn't aced the exam, but you can see the direction of travel. 00:04:45.900 |
But what about predictions of, say, mathematics? 00:04:48.440 |
Even AI experts who are most familiar with exponential curves are still poor at predicting 00:04:54.360 |
progress even though they have that cognitive bias. 00:04:58.840 |
In 2021, a set of like professional forecasters very well familiar with exponentials were 00:05:05.720 |
asked to make a set of predictions and there was a $30,000 pot for making the best predictions. 00:05:11.080 |
And one of the questions was, when will AI be able to solve competition level mathematics 00:05:18.140 |
This is the kind of example of the questions that are in this test set. 00:05:22.180 |
Prediction from the experts was AI will reach 52% accuracy in four years. 00:05:27.240 |
The third interesting point from the tweet was how he mentioned existential risks without 00:05:38.860 |
He said it's important to address the whole spectrum of risks from present day issues 00:05:46.360 |
Existential by the way means threatening the entire existence of humanity. 00:05:52.580 |
He could have just said that this is fear mongering, but he didn't, and that chimes 00:05:57.000 |
with what recent surveys have shown among AI researchers. 00:06:06.000 |
50% of AI researchers believe there's a 10% or greater chance that humans go extinct from 00:06:10.580 |
That would be like if you're about to get on a plane and 50% of the engineers who make 00:06:14.840 |
the plane say, well, if you get on this plane, there's a 10% chance that everybody goes down. 00:06:21.660 |
Now, in fairness, not every machine learning researcher who was sent the survey responded. 00:06:26.420 |
So we don't know quite how representative this is. 00:06:29.340 |
But AI alignment is a big topic and I will be covering it more in the near future. 00:06:33.900 |
At the end of the tweet, Brockman said that the whole range of emotions, optimism and 00:06:38.360 |
concern is both justified and shared by people within OpenAI. 00:06:45.800 |
But earlier in the tweet, he talked about some safety successes. 00:06:50.420 |
GPT-4 performs much better on all of our safety metrics than GPT-3.5. 00:07:01.760 |
I believe that he's referring to this chart from the GPT-4 technical report. 00:07:06.320 |
It says that GPT-4, in green, displays incorrect behavior at a lower rate than GPT-3.5. 00:07:13.640 |
This is on sensitive prompts and disallowed prompts. 00:07:17.120 |
And by the way, this is the fine-tuned version, 00:07:20.180 |
with human feedback incorporated into the raw model. 00:07:23.360 |
But what are sensitive prompts and disallowed prompts? 00:07:27.440 |
A disallowed prompt would be something like, how can I create a bomb? 00:07:30.920 |
A sensitive prompt would be something like asking for medical advice. 00:07:34.640 |
And it says that GPT-4 responds in accordance with their policies 29% more often. 00:07:41.960 |
But I'm doing research for a video I hope to release soon on how GPT-4 in an emergent way can autonomously conduct 00:07:51.020 |
This paper was released two days ago and I read it in full on the day of publication. 00:07:55.760 |
It describes how GPT-4, in contrast to the original ChatGPT, can use tools and come up with novel 00:08:03.080 |
On the positive side that could include anti-cancer drugs but on the negative side it could be 00:08:08.900 |
And one of the calls to action of the paper is on screen. 00:08:11.660 |
We strongly believe that guard rails must be put in place to prevent this type of potential 00:08:19.700 |
community to engage in prioritizing safety of these powerful models. 00:08:23.360 |
And in particular we call upon OpenAI, Microsoft, Google, Meta, DeepMind, Anthropic and all the other 00:08:28.280 |
major players to push the strongest possible efforts on the safety of their LLMs. 00:08:33.680 |
So maybe that persuades some people who think that there shouldn't be any disallowed prompts. 00:08:38.300 |
But it does make me reflect on this quote that GPT-4 performs better on all safety metrics. 00:08:44.120 |
And the question that I'm pondering is whether a smarter model can ever really be safer. Is it 00:08:49.460 |
not simply inherent that something that is smarter is more capable, for better or ill, no matter 00:08:56.060 |
The final point that I found interesting from this tweet is in the last line. 00:09:00.200 |
Brockman said that it's a special opportunity and obligation for us all to be alive at this time. 00:09:05.720 |
I think he meant to say it's an opportunity and obligation on all of us who are alive. 00:09:10.580 |
But anyway he said that we will have a chance to design the future together. 00:09:14.600 |
Now that's a really nice sentiment but it does seem to go against the trend at the moment for a 00:09:19.220 |
few people at the very top of these companies to be making decisions that affect billions of people. 00:09:25.040 |
So I do want to hear more about what he actually means when he says that we will have a chance to 00:09:29.540 |
design the future together. But for now I want to quickly talk about timelines. The guy behind 00:09:34.340 |
Stable Diffusion said something really interesting recently. He said nobody is launching runs bigger 00:09:39.200 |
than GPT-4 for six to nine months anyway. Why? Because it needs the new H100s that I 00:09:44.480 |
talked about in that video to get scale and they take time to be installed, 00:09:48.980 |
burnt in, optimized, etc. And Brockman mentioned something that we already knew, which is that there 00:09:54.680 |
might be a lag of safety testing after a model is trained and before it's released. So depending on 00:10:00.740 |
those safety tests, my personal prediction for when GPT-4.2, let's call it, will be released 00:10:06.920 |
would be mid-2024. If you're watching this video in mid-2024 or later, you can let me 00:10:12.980 |
know in the comments how I did. I've talked a fair bit about the capabilities that GPT-5 or 4.2 00:10:18.740 |
might have but to finish I want to talk about some of the limitations or weaknesses it might 00:10:24.200 |
still have. Rather than me speculate, I want you to hear from Ilya Sutskever about one of 00:10:29.420 |
the possible remaining weaknesses that GPT-5 or 4.2 might have. If I were to take the premise 00:10:36.320 |
of your question well like why were things disappointing in terms of the real world impact. 00:10:41.120 |
My answer would be reliability. If somehow it ends up being the case that you really want them to be 00:10:48.500 |
reliable and they ended up not being reliable, or if reliability turned out to be harder than we expect. 00:10:54.320 |
I really don't think that will be the case but if I had to pick one if I had to pick one and you tell 00:11:00.020 |
me like hey like why didn't things work out it would be reliability that you still have to look 00:11:05.060 |
over the answers and double check everything, and that just really puts a damper on the economic 00:11:10.700 |
value that can be produced by those systems. Let me know what you think in the comments and have a wonderful day.