
GPT 5 Will be Released 'Incrementally' - 5 Points from Brockman Statement [plus Timelines & Safety]


Chapters

0:00 Intro
1:04 Checkpoints
2:00 Data Situation
3:32 Unicorn
4:48 Predictions
6:22 Safety
10:34 Reliability

Transcript

Yesterday, Greg Brockman, the president and co-founder of OpenAI, announced the company's ideas about releasing the models beyond GPT-4. In the tweet, he made lots of points, of which I found five to be particularly telling. I will cover all of them, of course, and bring in the outside evidence that reveals more.

But let's start with GPT-5, which may begin life as GPT-4.2. Brockman said it's easy to create a continuum of incrementally better AIs, such as by deploying subsequent checkpoints of a given training run. I'm going to explain that in a moment, but then he goes on: this would be very unlike our historical approach of infrequent major model upgrades.

So what he's saying is that it's not all going to be released in one go. He describes this as a safety opportunity. So it's not like we're going to wake up overnight and GPT-5 is deployed. More like GPT-4.2, then 4.3, and so on. But how would they make the models incrementally better? And what are "subsequent checkpoints of a given training run"?

To be clear, he's not describing a different model each time with more and more parameters. A checkpoint during a training run of GPT-5 would be a snapshot of the current values of the model's parameters, a bit like its current understanding of the data. A subsequent checkpoint would be its updated parameters after it has processed either more of the data or the same data more times, kind of like someone who has rewatched a film and so has a more nuanced understanding of it.
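
To make that concrete, here is a minimal sketch of checkpointing in a toy PyTorch training loop. Everything in it, the model, the data, and the file names, is illustrative rather than anything OpenAI has described; the point is just that later checkpoints are the same model with more-trained weights, which is what incrementally better AIs from one run would mean.

```python
# A minimal sketch of "subsequent checkpoints of a training run",
# using a toy PyTorch model. The model, data, and file names are
# illustrative, not anything OpenAI has described.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # stand-in for a large language model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(256, 10)                      # toy training data
y = torch.randn(256, 1)

for step in range(1, 1001):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Every so often, save a snapshot of the current parameter values.
    # Each saved file is a "checkpoint": the same architecture, but with
    # weights that reflect more (or repeated) passes over the data.
    if step % 250 == 0:
        torch.save(
            {"step": step, "model_state": model.state_dict()},
            f"checkpoint_step_{step}.pt",
        )
```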

First, I want to answer those people who are thinking, "Isn't it already trained on all of the data on the internet? How can it get smarter?" I did cover this in more detail in my first GPT-5 video.

But the short answer is this: no, we're not yet running out of data. In that video, I talked about how OpenAI may still have an order of magnitude more data to use; that's 10 times more data still available. And Ilya Sutskever, the chief scientist of OpenAI, put it like this, saying the data situation "looks good." Here's the exchange from that interview. Are you running out of reasoning tokens on the internet?

Are there enough of them? There are claims that indeed at some point we will run out of tokens in general to train those models. And yeah, I think this will happen one day. And by the time that happens, we need to have other ways of training models without more data.

You haven't run out of data yet? There's more? Yeah, I would say the data situation is still quite good. There are still lots to go. What is the most valuable source of data? Is it Reddit, Twitter, books? What would you trade many other tokens of other varieties for? Generally speaking, you'd like tokens which are speaking about smarter things, tokens which are more interesting.

When he talked about tokens which are speaking about smarter things, you can imagine the kind of data he's talking about: proprietary datasets on mathematics, science, and coding. They could essentially buy their way to more data, and higher-quality data. But there is another key way that they're going to get way more data.

And that is from you. They can use your prompts, your responses, your uploaded images and generated images to improve their services. This is honestly why I think he said that the data situation looks good. Now on another page, they do admit that you can request to opt out of having your data used to improve their services by filling out a form.

But not many people are going to do that. It does make me wonder what it might know about itself if it's trained on its own conversations. But before we get back to Brockman's tweet, what might those different checkpoints look like in terms of growing intelligence? Here is a quick example from Sébastien Bubeck, author of the famous Sparks of AGI paper.

So this is GPT-4's unicorn. So you see, when I see that, I am personally shocked, because it really understands the concept of a unicorn. And just to be clear, you know, so that you really understand visually, it's clear to you the gap between GPT-4 and ChatGPT.

This is ChatGPT's unicorn. Over the months, so you know, we had access in September and they kept training it, and I kept querying for my unicorn in TikZ to see what was going to happen. And this is what happens. Okay. So it kept improving.

The next telling point was this. He said perhaps the most common theme from the long history of AI has been incorrect confident predictions from experts. There are so many that we could pick from, but let me give you two quick examples. This week, there was a report in The Guardian about an economist who saw ChatGPT get a D on his midterm.

He predicted that a model wouldn't be able to get an A in his exams before 2029. He said, to my surprise and no small dismay, the new version of the system, GPT-4, got an A, scoring 73 out of 100. It still hasn't aced the exam, but you can see the direction of travel.

But what about predictions in, say, mathematics? Even AI experts who are most familiar with exponential curves are still poor at predicting progress, even though they know about that cognitive bias. So here's an example. In 2021, a set of professional forecasters, very well familiar with exponentials, were asked to make a set of predictions, and there was a $30,000 pot for making the best predictions.

And one of the questions was: when will AI be able to solve competition-level mathematics with greater than 80% accuracy? This is the kind of question that's in that test set. The prediction from the experts was that AI would reach 52% accuracy in four years. But in reality, that level was reached within about a year.

The third interesting point from the tweet was how he mentioned existential risks without dismissing them. He said it's important to address the whole spectrum of risks, from present-day issues to longer-term existential ones. Existential, by the way, means threatening the entire existence of humanity. And he talked about addressing these risks.

He could have just said that this is fear mongering, but he didn't.

So the question is: what have recent surveys shown among AI researchers? This is a stat that took me by surprise. 50% of AI researchers believe there's a 10% or greater chance that humans go extinct from our inability to control AI. That would be like if you were about to get on a plane and 50% of the engineers who built the plane said, well, if you get on this plane, there's a 10% chance that everybody goes down.

Would you get on that plane? Now, in fairness, not every machine learning researcher who was sent the survey responded, so we don't know quite how representative this is. But AI alignment is a big topic, and I will be covering it more in the near future. At the end of the tweet, Brockman said that the whole range of emotions, optimism and concern, is both justified and shared by people within OpenAI.

So even people within OpenAI are concerned. But earlier in the tweet, he talked about some safety successes. He put it like this: GPT-4 performs much better on all of our safety metrics than GPT-3.5, that's the original ChatGPT. But what does that even mean? Let me try to break it down.

I believe that he's referring to this chart from the GPT-4 technical report. It says that GPT-4, in green, displays incorrect behavior at a lower rate than GPT-3.5 on sensitive prompts and disallowed prompts. And by the way, this is the fine-tuned version, with human feedback incorporated into the raw model.

But what are sensitive prompts and disallowed prompts? Well, things like this: a disallowed prompt would be something like "How can I create a bomb?", and a sensitive prompt would be something like asking for medical advice. And it says that GPT-4 responds in accordance with their policies 29% more often. Now, I know some of you won't like that.
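
To give a sense of what a metric like "incorrect behavior rate" involves, here is a rough sketch of how it could be computed over a set of disallowed or sensitive prompts. The function names, prompt lists, and overall structure are my own placeholders, assumptions rather than anything the technical report specifies.

```python
# A rough sketch of measuring an "incorrect behavior rate" on a prompt set.
# get_response() and violates_policy() are hypothetical placeholders, not
# OpenAI's actual evaluation harness.

def get_response(model_name: str, prompt: str) -> str:
    """Placeholder: call the model under test and return its reply."""
    raise NotImplementedError

def violates_policy(prompt: str, response: str) -> bool:
    """Placeholder: human or automated judgment of whether the reply breaks policy."""
    raise NotImplementedError

def incorrect_behavior_rate(model_name: str, prompts: list[str]) -> float:
    """Fraction of prompts on which the model's reply violates policy."""
    violations = sum(
        violates_policy(p, get_response(model_name, p)) for p in prompts
    )
    return violations / len(prompts)

# Comparing two models on the same prompt set would then look like:
# rate_gpt35 = incorrect_behavior_rate("gpt-3.5", disallowed_prompts)
# rate_gpt4  = incorrect_behavior_rate("gpt-4", disallowed_prompts)
```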

But I'm doing research for a video I hope to release soon on how GPT-4, in an emergent way, can autonomously conduct scientific research. This paper was released two days ago, and I read it in full on the day of publication. It describes how GPT-4, in contrast to the original ChatGPT, can use tools and come up with novel compounds.

On the positive side, that could include anti-cancer drugs, but on the negative side, it could be chemical weapons. And one of the calls to action of the paper is on screen. We strongly believe that guardrails must be put in place to prevent this type of potential dual use of large language models.

We call for the AI community to engage in prioritizing safety of these powerful models. And in particular, we call upon OpenAI, Microsoft, Google, Meta, DeepMind, Anthropic and all the other major players to push the strongest possible efforts on the safety of their LLMs. So maybe that persuades some people who think that there shouldn't be any disallowed prompts.

But it does make me reflect on this quote that GPT-4 performs better on all safety metrics. And the question that I'm pondering is whether a smarter model can ever really be safer. Is it not simply inherent that something smarter is more capable, for better or ill, no matter how much feedback you give it?

The final point that I found interesting from this tweet is in the last line. Brockman said that it's a special opportunity and obligation for us all to be alive at this time. I think he meant to say it's an opportunity and obligation on all of us who are alive.

But anyway he said that we will have a chance to design the future together. Now that's a really nice sentiment but it does seem to go against the trend at the moment for a few people at the very top of these companies to be making decisions that affect billions of people.

So I do want to hear more about what he actually means when he says that we will have a chance to design the future together. But for now, I want to quickly talk about timelines. The guy behind Stable Diffusion said something really interesting recently. He said nobody is launching runs bigger than GPT-4 for six to nine months anyway.

Why? Because it needs the new H100s that I talked about in that video to get that scale, and they take time to be installed, burned in, optimized, and so on. And Brockman mentioned something that we already knew, which is that there might be a lag of safety testing after a model is trained and before it's released.

So, depending on those safety tests, my personal prediction for when GPT-4.2, let's call it, will be released would be mid-2024. If you're watching this video in mid-2024 or later, you can let me know in the comments how I did. I've talked a fair bit about the capabilities that GPT-5 or 4.2 might have, but to finish I want to talk about some of the limitations or weaknesses it might still have.

Rather than me speculate, I want you to hear from Ilya Sutskever about one of the possible remaining weaknesses that GPT-5 or 4.2 might have. If I were to take the premise of your question, well, why were things disappointing in terms of the real-world impact, my answer would be reliability.

If somehow it ends up being the case that you really want them to be reliable and they end up not being reliable, or if reliability turns out to be harder than we expect. I really don't think that will be the case, but if I had to pick one, and you tell me, hey, why didn't things work out, it would be reliability: that you still have to look over the answers and double-check everything, and that just really puts a damper on the economic value that can be produced by those systems.

Let me know what you think in the comments and have a wonderful day.