GPT-4o - Full Breakdown + Bonus Details

00:00:00.000 | 
It's smarter in most ways, cheaper, faster, better at coding, multi-modal in and out, 00:00:07.440 | 
and perfectly timed to steal the spotlight from Google. 00:00:13.760 | 
I've gone through all the benchmarks and the release videos to give you the highlights. 00:00:19.200 | 
My first reaction was it's more flirtatious sigh than AGI, but a notable step forward nonetheless. 00:00:27.680 | 
First things first, GPT-4o, meaning Omni, which is all or everywhere, 00:00:33.280 | 
referencing the different modalities it's got, is free. 00:00:36.800 | 
By making GPT-4o free, they are either crazy committed to scaling up from 100 million users 00:00:43.360 | 
to hundreds of millions of users, or they have an even smarter model coming soon, 00:00:49.360 | 
Of course, it could be both, but it does have to be something. 00:00:52.480 | 
Just giving paid users five times more in terms of message limits doesn't seem enough to me. 00:00:57.680 | 
Next, OpenAI branded this as GPT-4 level intelligence, 00:01:02.240 | 
although in a way, I think they slightly underplayed it. 00:01:05.280 | 
So before we get to the video demos, some of which you may have already seen, 00:01:09.120 | 
let me get to some more under the radar announcements. 00:01:12.800 | 
Take text to image and look at the accuracy of the text generated from this prompt. 00:01:18.560 | 
Now, I know it's not perfect. There aren't two question marks on the now. 00:01:22.800 | 
There's others that you can spot, like the I being capitalized. 00:01:26.000 | 
But overall, I've never seen text generated with that much accuracy. 00:01:31.520 | 
Or take this other example, where two OpenAI researchers submitted their photos. 00:01:36.000 | 
Then they asked GPT-4o to design a movie poster, and they gave the requirements in text. 00:01:42.240 | 
Now, when you see the first output, you're going to say, well, that isn't that good. 00:01:46.480 | 
But then they asked GPT-4o something fascinating. 00:01:49.760 | 
It seemed to be almost reverse psychology because they said, 00:01:54.640 | 
The text is crisper and the colors bolder and more dramatic. 00:02:01.920 | 
The final result in terms of the accuracy of the photos and of the text was really quite impressive. 00:02:08.000 | 
I can imagine millions of children and adults playing about with this functionality. 00:02:12.480 | 
Of course, they can't do so immediately because OpenAI said 00:02:17.680 | 
As another bonus, here is a video that OpenAI didn't put on their YouTube channel. 00:02:22.320 | 
It mimics a demo that Google made years ago, but never followed up with. 00:02:26.800 | 
The OpenAI employee asked GPT-4o to call customer service and ask for something. 00:02:32.960 | 
I've skipped ahead, and the customer service in this case is another AI. 00:02:38.480 | 
Could you provide Joe's email address for me? 00:02:43.200 | 
Awesome. All right. I've just sent the email. 00:02:51.280 | 
Hey, Joe, could you please check your email to see if the shipping label 00:03:00.960 | 
They call it a proof of concept, but it is a hint toward the agents that are coming. 00:03:06.080 | 
Here are five more quick things that didn't make it to the demo. 00:03:12.000 | 
Submit your photo and get a caricature of yourself. 00:03:17.680 | 
You just ask for a new style of font and it will generate one. 00:03:23.920 | 
The meeting in this case had four speakers and it was transcribed. 00:03:28.400 | 
Or video summaries. Remember this model is multimodal in and out. 00:03:33.200 | 
Now it doesn't have video out, but I'll get to that in a moment. 00:03:36.480 | 
Here, though, was a demonstration of a 45-minute video submitted to GPT-4o 00:03:43.840 | 
We also got character consistency across both woman and dog, 00:03:52.640 | 
what about the actual intelligence and performance of the model? 00:03:55.920 | 
Before I get to official benchmarks, here is a human graded leaderboard 00:04:02.560 | 
And yes, im-also-a-good-gpt2-chatbot is indeed GPT-4o. 00:04:08.640 | 
So it turns out I've actually been testing the model for days. 00:04:11.840 | 
Overall, you can see the preference for GPT-4o compared to all other models. 00:04:17.600 | 
In coding specifically, the difference is quite stark. 00:04:23.120 | 
we're not looking at an entirely new tier of intelligence. 00:04:27.200 | 
Remember that a 100 Elo gap is a win rate of around two thirds. 00:04:32.240 | 
So one third of the time GPT-4 Turbo's outputs would be preferred. 00:04:36.480 | 
That's about the same gap between GPT-4 Turbo and last year's GPT-4. 00:04:41.120 | 
A huge step forward, but not completely night and day. 00:04:44.560 | 
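As a rough check on the two-thirds figure, here is a minimal sketch. It assumes the leaderboard ratings follow the standard logistic Elo formula; the function name is illustrative and not taken from any leaderboard codebase.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula.

    rating_gap is (higher rating - lower rating) in Elo points.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# A 100-point gap, as discussed in the video.
print(f"{elo_win_probability(100):.2f}")
```

Under that assumption, a 100-point gap works out to roughly a 64% win rate, close to the two thirds quoted, which leaves the lower-rated model preferred a bit over a third of the time.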
I think one underrated announcement was the desktop app, a live coding co-pilot. 00:04:50.640 | 
OK, so I'm going to open the ChatGPT desktop app 00:04:55.600 | 
OK, and to give a bit of background of what's going on. 00:04:58.320 | 
So here we have a computer and on the screen we have some code 00:05:01.840 | 
and then the ChatGPT voice app is on the right. 00:05:04.160 | 
So ChatGPT will be able to hear me, but it can't see anything on the screen. 00:05:07.920 | 
So I'm going to highlight the code, Command-C it, 00:05:11.920 | 
And then I'm going to talk about the code to ChatGPT. 00:05:17.280 | 
Could you give me a really brief one sentence description of what's going on in the code? 00:05:20.240 | 
This code fetches daily weather data for a specific location and time period, 00:05:26.080 | 
smooths the temperature data using a rolling average, 00:05:29.280 | 
annotates a significant weather event on the resulting plot, 00:05:32.720 | 
and then displays the plot with the average minimum and maximum temperatures over the year. 00:05:40.800 | 
I was most impressed with GPT-4o's performance on the math benchmark. 00:05:45.120 | 
Even though it fails pretty much all of my math prompts, 00:05:48.000 | 
that is still a stark improvement from the original GPT 4. 00:05:51.920 | 
On the Google-proof graduate test, it beats Claude 3 Opus. 00:05:56.240 | 
And remember, that was the headline benchmark for Anthropic. 00:05:59.520 | 
In fact, speaking of Anthropic, they are somewhat challenged by this release. 00:06:03.680 | 
GPT-4o costs $5 per 1 million tokens input and $15 per 1 million tokens output. 00:06:09.920 | 
As a quick aside, it also has 128k token context and an October knowledge cutoff. 00:06:21.360 | 
And remember, for Claude 3 Opus on the web, you have to sign up with a subscription. 00:06:28.400 | 
So for Claude 3 Opus to be beaten in its headline benchmark is a concern for them. 00:06:34.160 | 
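The API pricing quoted in the video can be turned into a per-request estimate. This is a minimal sketch that assumes only the $5 and $15 per million token prices mentioned; the function name and the example token counts are made up for illustration.

```python
INPUT_USD_PER_MILLION = 5.0    # GPT-4o input price quoted in the video
OUTPUT_USD_PER_MILLION = 15.0  # GPT-4o output price quoted in the video


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Hypothetical request: 10,000 tokens in, 2,000 tokens out.
print(f"${estimate_cost_usd(10_000, 2_000):.2f}")
```

Because output tokens cost three times as much as input tokens at these prices, long generations dominate the bill more quickly than long prompts do.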
In fact, I think the results are clear enough to say that GPT-4o is the new smartest AI. 00:06:40.560 | 
However, just before you get carried away and type on Twitter that AGI is here, 00:06:49.600 | 
I dug into this benchmark and it's about adversarial reading comprehension questions. 00:06:54.080 | 
They're designed to really test the reasoning capabilities of models. 00:06:58.560 | 
If you give models difficult passages and they've got to sort through references, 00:07:02.400 | 
do some counting and other operations, how do they fare? 00:07:05.520 | 
The DROP, by the way, is discrete reasoning over the content of paragraphs. 00:07:10.000 | 
It does slightly better than the original GPT-4, but slightly worse than Llama 3 400B. 00:07:15.680 | 
And as they note, Llama 3 400B is still training. 00:07:19.200 | 
So it's just about the new smartest model by a hair's breadth. 00:07:25.600 | 
It's better at translation than Gemini models. 00:07:28.640 | 
Quick caveat there, Gemini 2 might be announced tomorrow and that could regain the lead. 00:07:33.760 | 
Then there are the vision understanding evaluations. 00:07:40.800 | 
As you can see, a clear 10 points better than Claude Opus. 00:07:44.080 | 
Again, I'm curious if Google Gemini can exceed it though. 00:07:47.520 | 
The improvements to the tokenizer could be revolutionary for non-English speakers. 00:07:52.880 | 
The dramatically fewer tokens needed for languages like Gujarati, 00:07:56.960 | 
Hindi, Arabic, and more don't just mean that conversations are cheaper, they're also quicker. 00:08:04.880 | 
Well, this time they didn't compare it to other models, but compared it to the original GPT 4. 00:08:09.760 | 
Definitely a step up across languages, but English is still by far the most suited language. 00:08:16.240 | 
Indeed, here is a video of some of the model's mistakes 00:08:22.160 | 
I know enough Mandarin to say it wasn't perfect at tutoring. 00:08:58.560 | 
Hello, my name is Nach, I'm here with my co-worker. 00:09:03.600 | 
Hi, I'm Shil, I'm trying to teach my co-worker how to speak in Mandarin. 00:09:07.120 | 
We want to start from simple words like "Nǐ hǎo". 00:09:15.040 | 
"Nǐ hǎo" is pretty straightforward to pronounce. 00:09:20.480 | 
Just make sure to keep the "Nǐ" part high and then go down in pitch on the "hǎo". 00:09:32.480 | 
It really wasn't, he needs to work on his tones and her face was the giveaway. 00:09:37.360 | 
There were a lot of other interesting video demonstrations, 00:09:40.560 | 
but before them, the GPT-4o blog post from Sam Altman put out tonight. 00:09:45.840 | 
He made the argument that putting this capable AI tool in the hands of everyone 00:09:50.400 | 
for free is justifying that open part of OpenAI. 00:09:54.800 | 
Less about open weights or open source, more about open to everyone, 00:10:02.080 | 
He also draws our attention to the latency of the models. 00:10:05.920 | 
In many ways, that is the key innovation with GPT-4o. 00:10:09.680 | 
By dialing down the latency, you dial up the realism. 00:10:14.000 | 
As he said, it feels like AI from the movies, 00:10:16.560 | 
getting to human level response times and expressiveness turns out to be a big change. 00:10:21.520 | 
Indeed, I think I should get a little credit for predicting 00:10:30.240 | 
The response times were amazing and the model was clearly designed to be flirtatious. 00:10:35.600 | 
Note that in Senate testimony last year, Sam Altman said, 00:10:39.040 | 
"We try to design systems that do not maximize for engagement." 00:10:43.280 | 
So let me know in the comments whether you think these demos 00:10:57.600 | 
Yeah, well, I really appreciate all the help. 00:11:10.480 | 
Yeah, as we can see, you know, we can chat in real time. 00:11:12.800 | 
I was impressed though that it could speed up its talking on demand. 00:11:25.920 | 
One, two, three, four, five, six, seven, eight, nine, ten. 00:11:31.440 | 
because for blind people, this could be revolutionary. 00:11:35.440 | 
The Royal Standard flag is flying above Buckingham Palace, 00:11:39.360 | 
which is the signal that the monarch is present. 00:11:42.000 | 
How exciting to be there while the king is home. 00:11:44.960 | 
Try and tell me exactly what they're doing right now, please. 00:11:48.160 | 
Right now, the ducks are gently gliding across the water. 00:11:52.320 | 
It will also help you with interview prep by analyzing your appearance. 00:12:04.320 | 
Well, in a few minutes, I'm going to be interviewing at OpenAI. 00:12:16.800 | 
Well, it's for a software engineering role, but I just need to know, 00:12:27.840 | 
"I've been coding all night" look down, which could actually work in your favor. 00:12:32.080 | 
Maybe just run a hand through your hair or lean into the mad genius vibe. 00:12:37.840 | 
Your enthusiasm is what's really going to shine through. 00:12:40.800 | 
I don't have a lot of time, so I'm just going to throw this on. 00:12:55.200 | 
though maybe not in the way you're hoping for an interview. 00:13:00.320 | 
I could definitely see GPT-4o being used for mathematics tutoring. 00:13:04.800 | 
Here is a 30-second clip, but notice also the slight glitchiness at certain moments. 00:13:12.080 | 
All right, so I'm pretty sure this is the angle alpha right here, right? 00:13:19.360 | 
Now, looking at the triangle, which side do you think is the hypotenuse? 00:13:29.920 | 
I think it might be this one, but I really am not sure. 00:13:37.600 | 
I also noticed quite a few glitches in the demo. 00:13:42.800 | 
The one feature though that OpenAI wants you to really focus on is the video-in capacity. 00:13:48.960 | 
You can live stream videos straight to the transformer architecture behind GPT-4o. 00:13:54.480 | 
And so the video-in functionality is certainly impressive. 00:13:58.400 | 
Note though that the reaction time of GPT-4o isn't quite as immediate as for audio. 00:14:04.000 | 
Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder. 00:14:11.920 | 
They might be getting ready for a conversation or presentation, 00:14:15.200 | 
given their direct engagement with the camera. 00:14:17.360 | 
Sounds like we're in for an intriguing interaction. 00:14:23.680 | 
I'm curious about the lighting you mentioned. 00:14:27.680 | 
Is it natural light or are we talking unique fixtures or colors that add to the atmosphere? 00:14:33.920 | 
The lighting here seems to be a mix of natural and artificial. 00:14:39.920 | 
There's a noticeable bright light overhead, probably a fixture, 00:14:44.240 | 
casting a focused beam downwards, which creates a spotlight effect. 00:14:47.920 | 
This adds a dramatic and modern feel to the scene. 00:14:52.320 | 
The rest of the room is lit more softly, possibly by natural light filtering... 00:15:02.480 | 
Just as you asked, another person came into view behind the first person. 00:15:07.280 | 
They playfully made bunny ears behind the first person's head and then quickly left the frame. 00:15:13.600 | 
It added a lighthearted and unexpected moment to the scene. 00:15:18.000 | 
Now the focus is back on the original person with the leather jacket. 00:15:22.720 | 
Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony. 00:15:31.600 | 
But maybe make it more dramatic and make the soprano higher. 00:15:57.120 | 
And I suspect this real-time translation could soon be coming to Siri. 00:16:03.680 | 
So every time I say something in English, can you repeat it back in Spanish? 00:16:07.760 | 
And every time he says something in Spanish, can you repeat it back in English? 00:16:19.200 | 
Have you been up to anything interesting recently? 00:16:36.960 | 
Just a bit busy here preparing for an event next week. 00:16:40.800 | 
Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI 00:16:48.560 | 
And in case you're wondering about GPT-4.5 or even 5, 00:16:52.560 | 
Sam Altman said we'll have more stuff to share soon. 00:16:55.600 | 
And Mira Murati in the official presentation said that they would be 00:16:59.760 | 
soon updating us on progress on the next big thing. 00:17:04.320 | 
Whether that's empty hype or real, you can decide. 00:17:07.680 | 
No word of course about OpenAI co-founder Ilya Sutskever, 00:17:11.360 | 
although he was listed as a contributor under additional leadership. 00:17:16.400 | 
Overall, I think this model will be massively more popular 00:17:22.880 | 
You can prompt the model now with text and images in the OpenAI playground. 00:17:29.680 | 
Note also that all the demos you saw were in real time at 1x speed. 00:17:34.880 | 
That I think was a nod to Google's botched demo. 00:17:38.560 | 
Of course, let's see tomorrow what Google replies with. 00:17:41.520 | 
To those who think that GPT-4o is a huge stride towards AGI, 00:17:46.400 | 
I would point them to the somewhat mixed results on the reasoning benchmarks. 00:17:51.040 | 
Expect GPT-4o to still suffer from a massive amount of hallucinations. 00:17:56.160 | 
To those though who think that GPT-4o will change nothing, I would say this. 00:18:00.880 | 
Look at what ChatGPT did to the popularity of the underlying GPT series. 00:18:06.240 | 
It being a free and chatty model brought 100 million people into testing AI. 00:18:12.320 | 
GPT-4o being the smartest model currently available and free on the web and multimodal, 00:18:19.360 | 
I think could unlock AI for hundreds of millions more people. 00:18:27.440 | 
If you want to analyse the announcement even more, 00:18:30.000 | 
do join me on the AI Insiders Discord via Patreon. 00:18:33.840 | 
We have live meetups around the world and professional best practice sharing. 00:18:37.840 | 
So let me know what you think and as always have a wonderful day.