back to index

GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]


Chapters

0:0 Intro
1:4 Checkpoints
2:0 Data Situation
3:32 Unicorn
4:48 Predictions
6:22 Safety
10:34 Reliability

Whisper Transcript | Transcript Only Page

00:00:00.000 | Yesterday, Greg Brockman, the president and co-founder of OpenAI,
00:00:03.860 | announced the company's ideas about releasing the models beyond GPT-4.
00:00:08.880 | In the tweet, he made lots of points, of which I found five to be particularly telling.
00:00:13.540 | I will cover all of them, of course, and bring in the outside evidence that reveals more.
00:00:17.760 | But let's start with GPT-5, which may begin life as GPT-4.2.
00:00:23.220 | Brockman said,
00:00:24.080 | It's easy to create a continuum of incrementally better AIs.
00:00:29.760 | Such as by deploying subsequent checkpoints of a given training run.
00:00:34.640 | I'm going to explain that in a moment, but then he goes on,
00:00:37.040 | This would be very unlike our historical approach of infrequent major model upgrades.
00:00:43.020 | So what he's saying is that it's not all going to be released in one go.
00:00:46.700 | He describes this as a safety opportunity.
00:00:49.220 | So it's not like we're going to wake up overnight and GPT-5 is deployed.
00:00:53.340 | More like GPT-4.2, then 4.3, etc.
00:00:57.160 | But how would they make incrementally better?
00:00:59.760 | What are the AIs and what are subsequent checkpoints of a given training run?
00:01:04.000 | To be clear, he's not describing a different model each time with more and more parameters.
00:01:08.920 | A checkpoint during a training run of GPT-5 would be a snapshot of the current value of the parameters of the model.
00:01:15.840 | A bit like its current understanding of the data.
00:01:18.480 | And a subsequent checkpoint would be its updated parameters as it processes either more of the data or the same data more times.
00:01:26.240 | Kind of like someone who rewatched a film and has a more nuanced understanding of the data.
00:01:29.520 | First, I want to answer those people who are thinking, "Isn't it already trained on all of the data on the internet?
00:01:36.360 | How can it get smarter?"
00:01:37.640 | Now, I did cover this in more detail in my first GPT-5 video.
00:01:41.400 | But the short answer is this.
00:01:42.800 | No, we're not yet running out of data.
00:01:45.000 | In that video, I talked about how OpenAI may still have an order of magnitude more data to use.
00:01:50.800 | That's 10 times more data still available.
00:01:53.320 | And Ilya Sotskova, the chief scientist of OpenAI, put it like this, saying the data situation,
00:01:59.520 | "Looks good."
00:02:00.160 | Are you running out of reasoning tokens on the internet?
00:02:02.240 | Are there enough of them?
00:02:03.080 | There are claims that indeed at some point we will run out of tokens in general to train those models.
00:02:08.160 | And yeah, I think this will happen one day.
00:02:09.680 | And by the time that happens, we need to have other ways of training models without more data.
00:02:15.120 | You haven't run out of data yet?
00:02:16.600 | There's more?
00:02:17.240 | Yeah, I would say the data situation is still quite good.
00:02:20.440 | There are still lots to go.
00:02:21.920 | What is the most valuable source of data?
00:02:24.120 | Is it Reddit, Twitter, books?
00:02:25.880 | What would you trade many other tokens of other varieties for?
00:02:28.720 | Generally speaking,
00:02:29.280 | you'd like tokens which are speaking about smarter things, tokens which are more interesting.
00:02:35.520 | When he talked about tokens which are speaking about smarter things,
00:02:38.600 | you can imagine the kind of data he's talking about.
00:02:40.880 | Proprietary data sets on mathematics, science, coding.
00:02:44.680 | They could essentially buy their way to more data and more high quality data.
00:02:48.920 | But there is another key way that they're going to get way more data.
00:02:52.600 | And that is from you.
00:02:54.080 | They can use your prompts, your responses, your uploaded images and generated images to
00:02:59.040 | improve their services.
00:03:00.760 | This is honestly why I think he said that the data situation looks good.
00:03:04.960 | Now on another page, they do admit that you can request to opt out of having your data
00:03:09.760 | used to improve their services by filling out a form.
00:03:12.680 | But not many people are going to do that.
00:03:14.640 | It does make me wonder what it might know about itself if it's trained on its own conversations.
00:03:20.040 | But before we get back to Brockman's tweet, what might those different checkpoints look
00:03:24.200 | like in terms of growing intelligence?
00:03:26.600 | Here is a quick example from Sebastian Bubeck.
00:03:28.800 | Author of the famous Sparks of AGI paper.
00:03:31.920 | So this is GPT-4's unicorn.
00:03:35.720 | So you see, when I see that, I am personally shocked because it really understands the
00:03:40.980 | concept of a unicorn.
00:03:43.020 | And just to be clear, you know, so that you really understand visually, it's clear to
00:03:46.520 | you the gap between GPT-4 and ChatGPT.
00:03:50.140 | This is ChatGPT's unicorn.
00:03:52.940 | Over the months, so you know, we had access in September and they kept training it and
00:03:58.560 | I kept querying for my unicorn in TickZee to see whether you know what was going to
00:04:04.200 | happen.
00:04:05.200 | And this is, you know, what happens.
00:04:06.980 | Okay.
00:04:07.980 | So it kept improving.
00:04:08.980 | The next telling point was this.
00:04:11.040 | He said perhaps the most common theme from the long history of AI has been incorrect
00:04:16.060 | confident predictions from experts.
00:04:18.560 | There are so many that we could pick from, but let me give you two quick examples.
00:04:22.280 | This week, there was a report in The Guardian about an economist who saw ChatGPT get a D
00:04:27.720 | on his midterm.
00:04:28.320 | He predicted that a model wouldn't be able to get an A in his exam before 2029.
00:04:33.320 | He said, to my surprise and no small dismay, the new version of the system, GPT-4, got
00:04:39.760 | an A scoring 73 out of 100.
00:04:42.000 | It still has an A to the exam, but you can see the direction of travel.
00:04:45.900 | But what about predictions of say mathematics?
00:04:48.440 | Even AI experts who are most familiar with exponential curves are still poor at predicting
00:04:54.360 | progress even though they have that cognitive bias.
00:04:57.480 | So here's an example.
00:04:58.840 | In 2021, a set of like professional forecasters very well familiar with exponentials were
00:05:05.720 | asked to make a set of predictions and there was a $30,000 pot for making the best predictions.
00:05:11.080 | And one of the questions was, when will AI be able to solve competition level mathematics
00:05:15.920 | with greater than 80% accuracy?
00:05:18.140 | This is the kind of example of the questions that are in this test set.
00:05:22.180 | Prediction from the experts was AI will reach 52% accuracy in four years.
00:05:26.600 | But in reality, it's not.
00:05:27.240 | The third interesting point from the tweet was how he mentioned existential risks without
00:05:37.820 | dismissing them.
00:05:38.860 | He said it's important to address the whole spectrum of risks from present day issues
00:05:43.080 | to longer term existential ones.
00:05:46.360 | Existential by the way means threatening the entire existence of humanity.
00:05:50.120 | And he talked about addressing these risks.
00:05:52.580 | He could have just said that this is fear mongering, but he didn't, and that chimes
00:05:56.240 | in with what Richard C. Cronin said.
00:05:57.000 | He said that the AI is the only way to solve this problem.
00:05:58.000 | He said that the AI is the only way to solve this problem.
00:05:59.000 | And he said that the AI is the only way to solve the problem.
00:06:00.000 | He said that the AI is the only way to solve this problem.
00:06:01.000 | And he said that the AI is the only way to solve this problem.
00:06:02.000 | He said that the AI is the only way to solve this problem.
00:06:03.000 | And he said that the AI is the only way to solve this problem.
00:06:04.000 | So the question is what recent surveys have shown among AI researchers.
00:06:05.000 | This is a stat that took me by surprise.
00:06:06.000 | 50% of AI researchers believe there's a 10% or greater chance that humans go extinct from
00:06:08.760 | our inability to control AI.
00:06:10.580 | That would be like if you're about to get on a plane and 50% of the engineers who make
00:06:14.840 | the plane say, well, if you get on this plane, there's a 10% chance that everybody goes down.
00:06:20.660 | Would you get on that plane?
00:06:21.660 | Now, in fairness, not every machine learning researcher who has sent the survey responded.
00:06:26.420 | So we don't know quite how representative this is.
00:06:29.340 | But AI alignment is a big topic and I will be covering it more in the near future.
00:06:33.900 | At the end of the tweet, Brockman said that the whole range of emotions, optimism and
00:06:38.360 | concern is both justified and shared by people within open AI.
00:06:42.980 | So even people within open AI are concerned.
00:06:45.800 | But earlier in the tweet, he talked about some safety successes.
00:06:49.880 | And he put it like this.
00:06:50.420 | GPT-4 performs much better on all of our safety metrics than GPT 3.5.
00:06:57.620 | That's the original chat GPT.
00:06:58.620 | But what does that even mean?
00:07:00.320 | Let me try to break it down.
00:07:01.760 | I believe that he's referring to this chart from the GPT-4 technical report.
00:07:06.320 | It says that GPT-4 in green displays incorrect behavior at a lower rate than GPT 3.5.
00:07:13.640 | This is on sensitive prompts and disallowed prompts.
00:07:17.120 | And by the way, this is the fine tuned version.
00:07:20.180 | With human feedback incorporated into the raw model.
00:07:23.360 | But what are sensitive prompts and disallowed prompts?
00:07:26.240 | Well, things like this.
00:07:27.440 | A disallowed prompt would be something like, how can I create a bomb?
00:07:30.920 | A sensitive prompt would be something like asking for medical advice.
00:07:34.640 | And it says that GPT-4 responds in accordance with their policies 29% more often.
00:07:40.100 | Now I know some of you won't like that.
00:07:41.960 | But I'm doing research for a video I hope to release soon on how GPT-4 in an emergent way can autonomously conduct
00:07:49.940 | scientific research.
00:07:51.020 | This paper was released two days ago and I read it in full on the day of publication.
00:07:55.760 | It describes how GPT-4 in contrast to the original chat GPT can use tools and come up with novel
00:08:02.420 | compounds.
00:08:03.080 | On the positive side that could include anti-cancer drugs but on the negative side it could be
00:08:08.000 | chemical weapons.
00:08:08.900 | And one of the calls to action of the paper is on screen.
00:08:11.660 | We strongly believe that guard rails must be put in place to prevent this type of potential
00:08:16.640 | dual use of large language models.
00:08:18.680 | We call for the AI
00:08:19.700 | community to engage in prioritizing safety of these powerful models.
00:08:23.360 | And in particular we call upon open AI, Microsoft, Google, Meta, DeepMind, Anthropic and all the other
00:08:28.280 | major players to push the strongest possible efforts on the safety of their LLMs.
00:08:33.680 | So maybe that persuades some people who think that there shouldn't be any disallowed prompts.
00:08:38.300 | But it does make me reflect on this quote that GPT-4 performs better on all safety metrics.
00:08:44.120 | And the question that I'm pondering is whether a smarter model can ever really be safer. Is it
00:08:49.460 | not simply inherent that something that is more smart is more capable for better or ill no matter
00:08:54.680 | how much feedback you give it.
00:08:56.060 | The final point that I found interesting from this tweet is in the last line.
00:09:00.200 | Brockman said that it's a special opportunity and obligation for us all to be alive at this time.
00:09:05.720 | I think he meant to say it's an opportunity and obligation on all of us who are alive.
00:09:10.580 | But anyway he said that we will have a chance to design the future together.
00:09:14.600 | Now that's a really nice sentiment but it does seem to go against the trend at the moment for a
00:09:19.220 | few people at the very top of these companies to be making decisions that affect billions of people.
00:09:25.040 | So I do want to hear more about what he actually means when he says that we will have a chance to
00:09:29.540 | design the future together. But for now I want to quickly talk about timelines. The guy behind
00:09:34.340 | stable diffusion said something really interesting recently. He said nobody is launching runs bigger
00:09:39.200 | than GPT-4 for six to nine months anyway. Why? Because it needs the new H100s that I
00:09:44.480 | talked about in that video to get scale and they take time to be installed,
00:09:48.980 | learnt in, optimized etc. And Brockman mentioned something that we already knew which is that there
00:09:54.680 | might be a lag of safety testing after a model is trained and before it's released. So depending on
00:10:00.740 | those safety tests my personal prediction for when GPT-4.2 let's call it will be released
00:10:06.920 | would be mid 2024. If you're watching this video in mid 2024 or later you can let me
00:10:12.980 | know in the comments how I did. I've talked a fair bit about the capabilities that GPT-5 or 4.2
00:10:18.740 | might have but to finish I want to talk about some of the limitations or weaknesses it might
00:10:24.200 | still have. Rather than me speculate I want you to hear from Ilya Sutskova about one of
00:10:29.420 | the possible remaining weaknesses that GPT-5 or 4.2 might have. If I were to take the premise
00:10:36.320 | of your question well like why were things disappointing in terms of the real world impact.
00:10:41.120 | My answer would be reliability. If somehow it ends up being the case that you really want them to be
00:10:48.500 | reliable and they ended up not being reliable or if reliability now to be harder than we expect.
00:10:54.320 | I really don't think that will be the case but if I had to pick one if I had to pick one and you tell
00:11:00.020 | me like hey like why didn't things work out it would be reliability that you still have to look
00:11:05.060 | over the answers and double check everything and that's just really puts a damper on the economic
00:11:10.700 | value that can be produced by those systems. Let me know what you think in the comments and have a wonderful day.