
GPT-5: Everything You Need to Know So Far



00:00:00.000 | It seems quite likely that yesterday was the day that OpenAI launched the full training run of GPT-5.
00:00:07.600 | I've gone through every source I can find to bring you the most reliable information on what
00:00:12.640 | that means, including possibly every public comment on the topic from OpenAI, an exclusive
00:00:19.040 | interview with a hardware CEO, and tons of my own analysis. Plus, I'm going to find time to throw in
00:00:25.760 | a practical tip that you can use literally every time you open ChatGPT, and a bonus DALL-E discovery
00:00:33.120 | that I really enjoy. But let's start with these two tweets. And the first clue that the full-scale
00:00:39.520 | GPT-5 is being trained comes from the president and co-founder of OpenAI, Greg Brockman. Now first,
00:00:46.000 | a little bit of context. OpenAI typically trains smaller models of about a thousandth the size
00:00:51.440 | before they do a full training run. They then gather insights from these smaller models before
00:00:56.320 | doing the full training run. So that's the backdrop to OpenAI, in Brockman's words, scientifically
00:01:02.640 | predicting and understanding the resulting systems. That being done, what they're now building
00:01:09.040 | is maximally harnessing all their computing resources. They're gathering all of their ideas
00:01:14.640 | together and scaling beyond precedent. Translated, they are training their biggest model yet. We'll
00:01:20.640 | get to parameters, data, and capabilities in a moment. But first, what's this other tweet?
00:01:25.040 | This comes from Jason Wei, a top OpenAI researcher. A few hours after Brockman's tweet,
00:01:30.160 | he said there's no adrenaline rush like launching a massive GPU training. And this got plenty of
00:01:36.720 | salutes in the replies from other OpenAI employees. Now for context, this does not mean that we're
00:01:42.720 | imminently going to get GPT-5. GPT-4 took around three months to train, and then there was the
00:01:48.880 | safety testing. I'm actually going to end this video with my exact prediction of when I think
00:01:53.040 | they're going to release GPT-5. But first, here is a little bit more supporting evidence that
00:01:58.080 | they are currently training GPT-5. OpenAI updated their blog to say that applications for the red
00:02:04.400 | teaming network have closed, and that those red teamers would know about the status of their
00:02:09.200 | applications by the end of last year. What that means is that the red teamers are now in place
00:02:14.720 | to start safety testing the new model. Now, you might say, what's the point of having those red
00:02:19.520 | teamers in place if it's still going to be training for two to three months? Well, before a
00:02:24.560 | model is fully trained, it goes through various checkpoints. Think of them a bit like a video game
00:02:30.160 | save. What that also means is that in effect, OpenAI will have a GPT-4.2 before they have a GPT-5.
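
As a side note, here is a minimal sketch of what checkpointing looks like in a typical training loop. It is purely illustrative: the model, optimizer, and save interval are my own placeholder assumptions, not anything OpenAI has described.

```python
# A minimal checkpointing sketch (illustrative; all names and the interval are assumed).
import torch

def train(model, optimizer, data_loader, num_steps, save_every=10_000):
    for step, batch in zip(range(num_steps), data_loader):
        loss = model(batch)            # forward pass (placeholder model returns a loss)
        loss.backward()                # compute gradients
        optimizer.step()               # update weights
        optimizer.zero_grad()

        # The "video game save": snapshot the run so it can be resumed,
        # evaluated, or even deployed as an intermediate model.
        if step % save_every == 0:
            torch.save(
                {"step": step,
                 "model": model.state_dict(),
                 "optimizer": optimizer.state_dict()},
                f"checkpoint_{step:08d}.pt")
```
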
00:02:38.400 | Indeed, Greg Brockman, going back to April of last year, said that it might be one of those
00:02:43.520 | checkpoints that OpenAI release first. He said that it's easy to create a continuum of incrementally
00:02:50.640 | better AIs, such as by deploying subsequent checkpoints of a given training run. And he
00:02:55.920 | explicitly contrasted that approach and said it would be very unlike our historical approach
00:03:01.440 | of infrequent major model upgrades. But remember that even before those checkpoints, OpenAI would
00:03:07.360 | have already gotten a glimpse of GPT-5 capabilities from those smaller, earlier versions of the model.
00:03:13.920 | Indeed, Sam Altman, back in November, said that he was privileged to be in the room
00:03:18.080 | when they pushed back the veil of ignorance. I'm super excited. I can't imagine anything
00:03:23.040 | more exciting to work on. And on a personal note, like four times now in the history of OpenAI,
00:03:27.680 | the most recent time was just in the last couple of weeks. I've gotten to be in the room when we
00:03:32.880 | sort of like push the front, the sort of the veil of ignorance back and the frontier of
00:03:38.000 | discovery forward. And getting to do that is like the professional honor of a lifetime.
00:03:42.480 | It all points to OpenAI in November and December having trained those smaller versions of GPT-5.
00:03:49.120 | Again, with the purpose of scientifically predicting and understanding the resulting
00:03:54.560 | GPT-5 system. So they know it's going to be good, but just how good and how big? And what are these
00:04:01.120 | new or old ideas that they're going to incorporate? Well, one thing that seemed almost certain
00:04:06.240 | is that they're going to incorporate a way to let GPT-5 think for longer. In other words,
00:04:11.280 | it's going to lay out its reasoning steps before solving a challenge and have each of those
00:04:16.000 | reasoning steps checked internally or externally. Here's Sam Altman a few days ago at Davos.
00:04:22.240 | What it means to verify or understand what's going on is going to be a little bit different
00:04:26.560 | than people think right now. I actually can't look in your brain and look at the hundred trillion
00:04:32.080 | synapses and try to understand what's happening in each one and say, okay, I really understand
00:04:36.880 | why he's thinking what he's thinking. You're not a black box to me. But what I can ask you to do
00:04:42.720 | is explain to me your reasoning. I can say, you know, you think this thing, why? And you can
00:04:48.080 | explain first this, then this, then there's this conclusion, then that one, and then there's this.
00:04:52.240 | And I can decide if that sounds reasonable to me or not. And I think our AI systems will also be
00:04:58.400 | able to do the same thing. They'll be able to explain to us in natural language, the steps
00:05:03.040 | from A to B, and we can decide whether we think those are good steps.
00:05:09.200 | And a few days before that, Sam Altman told Bill Gates that that might involve
00:05:14.000 | asking GPT-4 or GPT-5 the same question 10,000 times.
00:05:19.200 | You know, when you look at the next two years, what do you think some of the key milestones
00:05:24.080 | will be?
00:05:24.880 | Multimodality will definitely be important.
00:05:27.040 | Which means speech in, speech out.
00:05:29.840 | Speech in, speech out, images, eventually video. Clearly, people really want that.
00:05:34.880 | We launched images and audio, and it had a much stronger response than we expected.
00:05:39.600 | We'll be able to push that much further. But maybe the most important areas of progress
00:05:45.280 | will be around reasoning ability. Right now, GPT-4 can reason in only extremely limited ways.
00:05:50.800 | And also reliability. If you ask GPT-4 most questions 10,000 times,
00:05:56.400 | one of those 10,000 is probably pretty good, but it doesn't always know which one.
00:06:00.320 | And you'd like to get the best response of 10,000 each time.
00:06:03.440 | That increase in reliability will be important.
00:06:05.760 | And at this point, watchers of my channel will know exactly what he's referring to.
00:06:09.520 | Both of those approaches, checking your reasoning steps and sampling up to 10,000 times,
00:06:15.040 | are incorporated into OpenAI's Let's Verify step-by-step paper.
00:06:19.440 | Now, I'm not going to dive into the details of Let's Verify in this video,
00:06:23.360 | because I've got at least two previous videos on the topic.
00:06:26.480 | But notice in the paper how many times they sample GPT-4.
00:06:30.160 | The chart shows what happens when you sample the model over a thousand times
00:06:34.720 | and pick out the best responses.
00:06:36.640 | And notice something about this process-supervised way of doing things.
00:06:40.240 | The results are continuing to go up.
00:06:42.640 | I can't resist showing you a quick example of the reasoning steps broken down into separate lines
00:06:48.240 | and essentially a verifier looking in and checking which steps are accurate or inaccurate.
00:06:53.760 | The answers for which each step in the reasoning process got a thumbs up
00:06:58.000 | were the ones submitted and the results were dramatic.
00:07:01.440 | Essentially, this process of sampling the model thousands of times
00:07:04.640 | and taking the answer that had the highest rated reasoning steps
00:07:08.320 | doubled the performance in mathematics.
00:07:10.480 | And no, this didn't just work for mathematics.
00:07:12.720 | It had dramatic results across these STEM fields.
00:07:15.840 | And remember, this was using GPT-4 as a base model, not GPT-5.
00:07:20.240 | And it was only 2000 samples, not 10,000 like Sam Altman talked about.
00:07:24.480 | So this is the evidence I would present to someone who said that LLMs have peaked.
00:07:29.760 | If OpenAI can incorporate through parallelization
00:07:33.360 | a way to get the model to submit and analyze 10,000 responses,
00:07:38.240 | the results could be truly dramatic.
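
For intuition, here is a minimal sketch of that sample-and-verify loop. The `generate` and `score_step` callables are placeholders I am assuming; in Let's Verify, the scoring role is played by a trained process reward model.

```python
import math

def best_of_n(question, generate, score_step, n=10_000):
    """Sample n candidate solutions and return the one whose reasoning
    steps the verifier rates most highly.

    generate(question)  -> list of reasoning-step strings (assumed)
    score_step(q, step) -> probability in (0, 1] that the step is correct (assumed)
    """
    best_steps, best_score = None, -math.inf
    for _ in range(n):
        steps = generate(question)
        # Process supervision: rate every step, not just the final answer.
        # A solution is only as strong as its weakest step, so combine
        # per-step probabilities (sum of logs for numerical stability).
        score = sum(math.log(score_step(question, s)) for s in steps)
        if score > best_score:
            best_steps, best_score = steps, score
    return best_steps
```
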
00:07:40.480 | Indeed, the Let's Verify paper from OpenAI repeatedly cited this earlier DeepMind paper
00:07:46.400 | on solving math problems with process-based feedback.
00:07:49.920 | And for coding, we know AlphaCode 2 from Google DeepMind used the mass sampling approach
00:07:55.760 | to get an 87th percentile score on a coding contest.
00:08:00.320 | In other words, it beat 87% of participants in this Codeforces coding challenge.
00:08:05.440 | For context, the GPT-4 that we got scored around the 5th percentile,
00:08:09.680 | a score of 400 in the Codeforces challenge.
00:08:12.640 | These numbers are a little out of date, but AlphaCode 2 would have scored around here,
00:08:16.960 | expert or candidate master level.
00:08:19.040 | Or to just translate everything, if they find a way to let GPT-5 think,
00:08:24.480 | it could be night and day in terms of performance for coding, mathematics, and STEM.
00:08:30.160 | But just how big will GPT-5 be that's doing all of this parallel thinking?
00:08:34.720 | Well, for AI Insiders, I interviewed Gavin Uberti,
00:08:38.240 | the CEO and co-founder of EtchedAI, which I've also talked about on this channel.
00:08:43.440 | He is a 21-year-old Harvard dropout, and his LinkedIn profile
00:08:47.920 | says he's building the hardware for superintelligence.
00:08:51.120 | In the interview, he guessed that GPT-5 would have around 10 times the parameter count of GPT-4.
00:08:57.040 | According to leaks, GPT-4 has around 1.5 to 1.8 trillion parameters.
00:09:01.920 | But just quickly, what did he mean when he said that he expects that to come from a combination
00:09:05.920 | of a larger embedding dimension, more layers, and double the number of experts?
00:09:10.560 | Well, think of the embedding dimension as being about how granular the training can be
00:09:15.040 | about each token and its context.
00:09:17.360 | A bigger embedding dimension means more granularity and nuance about each token.
00:09:22.240 | And doubling the number of layers allows a model to develop deeper pattern recognition.
00:09:26.960 | It allows it to see more complex patterns within patterns within patterns.
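
Here is some back-of-envelope arithmetic showing how those three levers could multiply out to roughly 10x. Every number below is an illustrative guess, not a leaked spec: transformer parameters scale roughly as layers times the square of the embedding dimension, and in a mixture-of-experts model the feed-forward share also scales with the expert count.

```python
# Rough transformer parameter count (all numbers are guesses, not specs).
# Per layer: ~4*d^2 for attention projections; ~8*d^2 per expert for the
# feed-forward block (two matrices with a 4x expansion).

def rough_params(d_model: float, n_layers: int, n_experts: int) -> float:
    attn = 4 * d_model**2
    ffn = 8 * d_model**2 * n_experts
    return n_layers * (attn + ffn)

gpt4_guess = rough_params(12_288, 120, 16)          # ~2.4e12, same ballpark as the leaks
gpt5_guess = rough_params(12_288 * 1.75, 180, 32)   # wider embeddings, more layers, 2x experts

print(f"{gpt5_guess / gpt4_guess:.1f}x")  # ~9x: roughly an order of magnitude
```
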
00:09:31.040 | More highlights from that interview will be coming on
00:09:33.600 | AI Insiders on Patreon in the coming weeks.
00:09:36.720 | But during this video on GPT-5, I promised you two interludes.
00:09:41.280 | One focused on DALL-E 3 and the other a practical tip for using ChatGPT anytime.
00:09:47.120 | Well, here is the first of those two interludes, focused on a particular quirk of DALL-E 3.
00:09:52.400 | I say that, but this trick also works for Midjourney.
00:09:55.040 | Now, many of you might have noticed a trend on TikTok, Reddit, Twitter, and YouTube shorts,
00:10:00.160 | where people post an image and then make it progressively more intense, let's say.
00:10:04.720 | Well, here's something arguably even more quirky.
00:10:07.680 | I got the original idea from Peter Wildeford on Twitter and decided to make it more intense.
00:10:13.520 | But first I asked, draw an image of a London scene, but don't use lampposts in the image.
00:10:18.960 | And lo and behold, we get dozens of lampposts.
00:10:22.400 | I mean, in this one, we have a lamppost coming down from the sky.
00:10:25.840 | And what does GPT-4 say in analyzing these?
00:10:28.960 | It says that these are two images of a London street scene without lampposts.
00:10:33.440 | So then I said, now make these images with even fewer lampposts,
00:10:37.760 | stripping any lamppost references completely.
00:10:40.240 | And here were the results.
00:10:42.080 | As you can see on the right, there's barely a lamppost in sight.
00:10:45.840 | And as GPT-4 says, these images were created with complete omission of any lamppost references.
00:10:52.000 | Then I said, delete absolutely everything that pertains to a lamppost.
00:10:55.920 | And I don't know about you, but I can't see any lampposts left.
00:10:58.400 | And finally, I said, take this to the max and give me an image that
00:11:01.760 | someone could not picture a lamppost existing within, even in their wildest imagination.
00:11:06.480 | Now, I think it's pretty cute that the lamppost persisted throughout these images.
00:11:10.800 | And I suspect that the deeper reason is that the caption training
00:11:15.040 | that DALL-E 3 got didn't have many examples of omission.
00:11:18.800 | They used web captions and synthetic captions,
00:11:21.360 | but I doubt there were many examples of people saying this image does not contain X.
00:11:26.000 | But speaking of modalities, the first thing they apparently want to fix for GPT-5
00:11:31.120 | is the real-time nature of the voice interaction.
00:11:34.400 | At the moment, there is quite a bit of time-to-first-token latency.
00:11:38.560 | In other words, it takes a bit too long to reply.
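
Time to first token is simple to measure yourself. In this sketch, `stream_tokens` is a stand-in for whatever streaming API you are calling, an assumption rather than a real endpoint.

```python
import time

def time_to_first_token(stream_tokens, prompt):
    """Latency from sending a prompt to receiving the first streamed token.
    stream_tokens(prompt) is assumed to yield tokens as they arrive."""
    start = time.perf_counter()
    for _ in stream_tokens(prompt):
        return time.perf_counter() - start  # stop at the very first token
    return None  # the stream produced nothing
```
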
00:11:41.040 | Here's Sam Altman speaking last week.
00:11:42.960 | I think there's all sorts of the current stuff that people complain about,
00:11:45.440 | like the voice is too slow and, you know, it's not real time and that'll get better this year.
00:11:49.360 | I think where we're headed, and then I'll talk about this year,
00:11:51.680 | is we're headed towards the way you use the computers to talk to it.
00:11:54.800 | The operating system of a computer in some sense is close to this idea
00:11:58.560 | that you're like working inside of a chat experience.
00:12:02.400 | When he mentioned using an LLM as an operating system,
00:12:05.840 | he was drawing on Andrej Karpathy's vision.
00:12:08.320 | I've talked about it before, but notice at the top that it's video in and out,
00:12:12.640 | audio in and out.
00:12:14.320 | And it's not like OpenAI are hiding it.
00:12:16.640 | They want as much text, image, audio and video data as they can get their hands on.
00:12:21.680 | They also want what I'm going to call reasoning data.
00:12:24.560 | Data that expresses human intention, they call it.
00:12:27.440 | Now, I didn't notice that phrase at the time of this blog post in November,
00:12:31.440 | but it fits in clearly with what I was saying earlier about Let's Verify.
00:12:35.440 | Think about it.
00:12:36.000 | How would you make a model agentic, able to solve more complex challenges?
00:12:40.240 | Well, if GPT-5 gets loads of data about people laying out plans full of human intention,
00:12:45.760 | GPT-5 could learn to imitate those schemes and plans
00:12:49.120 | and maybe have a verifier internally or externally judging those reasoning steps.
00:12:53.840 | Now, as for the question of whether those reasoning steps faithfully represent
00:12:57.840 | what the model is internally calculating, that will have to be for another day.
00:13:02.240 | This paper from Anthropic back in July said that as models become larger and more capable,
00:13:07.520 | they actually produce less faithful reasoning on most tasks we study.
00:13:11.360 | That doesn't mean getting the answer wrong more often.
00:13:14.000 | It means outputting reasoning steps that don't actually reflect what it's internally calculating.
00:13:19.040 | So, GPT-5 may end up being an excellent productivity assistant
00:13:23.280 | while still being somewhat inscrutable on a deeper level.
00:13:26.720 | Just quickly before we leave data, I think there's one thing that we can safely say,
00:13:30.320 | which is that there will be much more multilingual data in GPT-5's training set.
00:13:35.440 | OpenAI have formed so many data partnerships,
00:13:38.240 | including with people like the Icelandic government,
00:13:40.720 | and there are so many more multilingual data sets being open sourced
00:13:44.720 | that I think it's almost inevitable that there will be a dramatic forward step
00:13:49.120 | in GPT-5's multilingual abilities.
00:13:51.520 | Partly, this is a safety thing too,
00:13:53.360 | with OpenAI wanting its red teamers to be fluent in more than one language.
00:13:57.440 | Models are notoriously easier to jailbreak in languages other than English,
00:14:01.360 | and it looks like OpenAI are working hard on that front.
00:14:04.240 | But there is one language that I bet you didn't know GPT-4 can already speak,
00:14:08.480 | and that's the language of gobbledygook.
00:14:10.400 | According to this fascinating paper from Tokyo,
00:14:13.280 | GPT-4 can almost perfectly handle unnatural scrambled text.
00:14:17.360 | Now you might already know that humans have this ability:
00:14:20.000 | if the first and last letters of a word stay in place while the middle is scrambled,
00:14:23.200 | you can often still recognize the word.
00:14:25.280 | But GPT-4, and obviously GPT-5, will be able to go a step further.
00:14:29.440 | Even if the first and last letters are moved,
00:14:32.320 | if a word is completely scrambled, it can recover the sentence.
00:14:36.400 | I tested it out, and indeed it works.
00:14:38.720 | Just look at how utterly gobbled this sentence is.
00:14:42.480 | For me and you, that would be almost complete gobbledygook.
00:14:45.920 | But GPT-4 was able to recognize what I'm saying.
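
If you want to reproduce the test, here is a quick sketch that fully scrambles every word, first and last letters included, which is the hardest condition in the paper. The fixed seed and splitting on whitespace are my own simplifications.

```python
import random

def scramble_sentence(sentence: str, seed: int = 0) -> str:
    """Shuffle the letters of every word, first and last included."""
    rng = random.Random(seed)  # fixed seed so the scramble is repeatable
    words = []
    for word in sentence.split():
        letters = list(word)
        rng.shuffle(letters)
        words.append("".join(letters))
    return " ".join(words)

print(scramble_sentence("The quick brown fox jumps over the lazy dog"))
# e.g. "hTe kciuq wnrob ofx spmuj vreo eht lzay odg" -- paste it into ChatGPT
```
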
00:14:49.120 | So that is the practical tip that I also wanted to give you in this video.
00:14:53.040 | If you have a quick request of GPT-4,
00:14:56.000 | don't bother going back and spending 30 seconds correcting all your typos.
00:15:00.800 | Trust me, I've been guilty of this in the past because I love perfect English,
00:15:05.680 | but if you have a letter or two out of place, don't worry, it will understand.
00:15:09.520 | To be honest, if it can unscramble this, it can understand your typo of the word "there",
00:15:14.080 | your missing comma, and all the rest of it.
00:15:16.400 | So save yourself 30 seconds and don't even bother correcting your typos.
00:15:20.080 | But now it's time for me to finally give my prediction for when GPT-5 is going to be released.
00:15:25.600 | For the last few weeks, I've honestly thought it would be around September of this year.
00:15:30.160 | But now I think it will be toward the end of November of 2024.
00:15:35.760 | And no, that's not just because that would be the two-year anniversary
00:15:39.440 | of the release of ChatGPT, the original version.
00:15:42.400 | First, let me clarify that I don't think they will release
00:15:45.040 | the full capabilities of GPT-5 on the first go.
00:15:48.640 | As mentioned, I think they'll release different checkpoints,
00:15:50.880 | different functionalities as we head into 2025.
00:15:54.080 | But what explains the delay from now when they're training it all the way to November?
00:15:58.880 | Well, first of all, as mentioned, it does take a few months to train a model the size of GPT-5.
00:16:05.120 | Yes, they might be able to use, say, 100,000 H100 GPUs from NVIDIA.
00:16:09.520 | But training a model has hiccups and, of course, the model will be much larger.
00:16:13.680 | But let's say that takes around two months. That would bring us to the end of March.
00:16:17.840 | Now here's the key point.
00:16:19.040 | Sam Altman has boasted many times in the past
00:16:22.080 | about how they tested GPT-4 for six to eight months before releasing it.
00:16:26.800 | It would be pretty awkward for OpenAI to have even less safety testing for GPT-5.
00:16:33.120 | So add six months to the end of March and you get to the end of September.
00:16:37.280 | Of course, add eight months and you get to the end of November.
00:16:40.640 | So why end of November rather than end of September?
00:16:43.680 | Well, I think OpenAI will want to steer clear of
00:16:47.360 | what will be an incredibly contentious American election.
00:16:50.960 | If they release even an alpha version of GPT-5
00:16:54.080 | with, say, video and audio before the election, they could come under incredible flak.
00:16:59.200 | As they say on their website, they're still working to understand
00:17:02.240 | how effective their current tools might be for personalized persuasion.
00:17:06.160 | That would be stepping into a minefield.
00:17:08.880 | In the recent New Hampshire election, we already saw robocalls imitating Joe Biden.
00:17:13.520 | So November 30th, as well as being a symbolic date, would steer clear of that election.
00:17:26.880 | You might say, what about 2025?
00:17:28.480 | But I think the incentives of Moloch will prevent them from delaying too long.
00:17:32.160 | Before too long, we might be getting the release of Gemini Ultra,
00:17:35.040 | not to mention Gemini 2 Ultra and, of course, Llama 3 from Meta, announced by Zuckerberg.
00:17:40.240 | When everyone else has caught up, they might feel compelled to release the next model.
00:17:44.320 | And Anthropic might choose a similar time to release Claude 3.
00:17:48.240 | Here is the CEO of Anthropic, Dario Amodei, giving his rough predictions for GPT-5 and Claude 3.
00:17:55.040 | What do you think happens on the next major training run for LLMs?
00:17:58.880 | My guess would be, you know, nothing truly insane happens, say, in any training run that,
00:18:05.680 | you know, happens in 2024.
00:18:07.680 | You know, to really invent new science, the ability to cure diseases, the ability to make
00:18:14.160 | bio, yeah, the ability to make bioweapons, yeah, and maybe someday the Dyson spheres.
00:18:18.480 | The least impressive of those things, I think, you know, will happen,
00:18:22.000 | I would say, no sooner than 2025, maybe 2026.
00:18:25.360 | I think we're just going to see, in 2024,
00:18:28.640 | crisper, more commercially applicable versions of the models that exist today.
00:18:34.000 | Like, you know, we've seen a few of these generations of jumps.
00:18:37.360 | I think in 2024, people are certainly going to be surprised.
00:18:40.560 | Like, they're going to be surprised at how much better these things have gotten.
00:18:43.680 | But it's not going to quite bend reality yet.
00:18:47.120 | Of course, I have to say, at the end of this video,
00:18:49.520 | that no one truly knows, not even OpenAI, what GPT-5 will be like.
00:18:54.160 | As Sam Altman recently said, "Until we go train that model, GPT-5,
00:18:58.000 | it's like a fun guessing game for us."
00:19:00.400 | I can't tell you, here's exactly what it's going to do that GPT-4 didn't.
00:19:04.480 | And here's Greg Brockman with a similar message.
00:19:07.040 | Right, that is the biggest theme in the history of AI, is that it's full of surprises.
00:19:10.400 | Every time you think you know something, you scale it up 10x, turns out you knew nothing.
00:19:14.000 | And so I think that we, as a humanity, as a species, are really exploring this together.
00:19:18.560 | Mm-hmm.
00:19:19.360 | And then we get cryptic messages like this from senior members of OpenAI.
00:19:23.840 | Ben Newhouse says he's hiring at OpenAI,
00:19:26.800 | "And we're building what I think could be an industry-defining 0-to-1 product
00:19:32.640 | that leverages the latest and greatest from our upcoming models, i.e. GPT-5."
00:19:37.680 | And two other OpenAI employees replied like this,
00:19:40.640 | "This product will change everything," and, "What they said."
00:19:44.160 | Of course, it would be pure speculation to guess what they mean with this.
00:19:48.560 | So those were my predictions.
00:19:50.080 | I try to base them on evidence rather than idle speculation.
00:19:54.080 | I do try not to be lazy, as GPT-4 has in the past been accused of being.
00:19:58.720 | If you like this kind of thing,
00:19:59.760 | I would invite you to see more exclusive premium content on AI Insiders on Patreon,
00:20:05.040 | but for everyone watching to the end,
00:20:07.440 | I want to thank you so much for being here and I want to wish you a wonderful day.