GPT-4o - Full Breakdown + Bonus Details
00:00:00.000 |
It's smarter in most ways, cheaper, faster, better at coding, multimodal in and out, 00:00:07.440 |
and perfectly timed to steal the spotlight from Google. 00:00:13.760 |
I've gone through all the benchmarks and the release videos to give you the highlights. 00:00:19.200 |
My first reaction was it's more flirtatious sci-fi than AGI, but a notable step forward nonetheless. 00:00:27.680 |
First things first, GPT-4o, where the "o" stands for omni, meaning all or everywhere, 00:00:33.280 |
referencing the different modalities it handles, is free. 00:00:36.800 |
By making GPT-4o free, they are either crazy committed to scaling up from 100 million users 00:00:43.360 |
to hundreds of millions of users, or they have an even smarter model coming soon. 00:00:49.360 |
Of course, it could be both, but it does have to be something. 00:00:52.480 |
Just giving paid users five times the message limits doesn't seem like enough to me. 00:00:57.680 |
Next, OpenAI branded this as GPT-4 level intelligence, 00:01:02.240 |
although in a way, I think they slightly underplayed it. 00:01:05.280 |
So before we get to the video demos, some of which you may have already seen, 00:01:09.120 |
let me get to some more under the radar announcements. 00:01:12.800 |
Take text-to-image, and look at the accuracy of the text generated from this prompt. 00:01:18.560 |
Now, I know it's not perfect. There aren't two question marks on the now. 00:01:22.800 |
There are others you can spot, like the "I" being capitalized. 00:01:26.000 |
But overall, I've never seen text generated with that much accuracy. 00:01:31.520 |
Or take this other example, where two OpenAI researchers submitted their photos. 00:01:36.000 |
Then they asked GPT-4o to design a movie poster, and they gave the requirements in text. 00:01:42.240 |
Now, when you see the first output, you're going to say, well, that isn't that good. 00:01:46.480 |
But then they asked GPT-4o something fascinating. 00:01:49.760 |
It seemed to be almost reverse psychology because they said, 00:01:54.640 |
The text is crisper and the colors bolder and more dramatic. 00:02:01.920 |
The final result in terms of the accuracy of the photos and of the text was really quite impressive. 00:02:08.000 |
I can imagine millions of children and adults playing about with this functionality. 00:02:12.480 |
Of course, they can't do so immediately because OpenAI said 00:02:17.680 |
As another bonus, here is a video that OpenAI didn't put on their YouTube channel. 00:02:22.320 |
It mimics a demo that Google made years ago, but never followed up with. 00:02:26.800 |
The OpenAI employee asked GPT-4o to call customer service and ask for something. 00:02:32.960 |
I've skipped ahead, and the customer service in this case is another AI. 00:02:38.480 |
Could you provide Joe's email address for me? 00:02:43.200 |
Awesome. All right. I've just sent the email. 00:02:51.280 |
Hey, Joe, could you please check your email to see if the shipping label 00:03:00.960 |
They call it a proof of concept, but it is a hint toward the agents that are coming. 00:03:06.080 |
Here are five more quick things that didn't make it to the demo. 00:03:12.000 |
Submit your photo and get a caricature of yourself. 00:03:17.680 |
You just ask for a new style of font and it will generate one. 00:03:23.920 |
The meeting in this case had four speakers and it was transcribed. 00:03:28.400 |
Or video summaries. Remember this model is multimodal in and out. 00:03:33.200 |
Now it doesn't have video out, but I'll get to that in a moment. 00:03:36.480 |
Here, though, was a demonstration of a 45-minute video submitted to GPT-4o 00:03:43.840 |
We also got character consistency across both woman and dog, 00:03:52.640 |
what about the actual intelligence and performance of the model? 00:03:55.920 |
Before I get to official benchmarks, here is a human graded leaderboard 00:04:02.560 |
And yes, "im-also-a-good-gpt2-chatbot" is indeed GPT-4o. 00:04:08.640 |
So it turns out I've actually been testing the model for days. 00:04:11.840 |
Overall, you can see the preference for GPT-4o compared to all other models. 00:04:17.600 |
In coding specifically, the difference is quite stark. 00:04:23.120 |
we're not looking at an entirely new tier of intelligence. 00:04:27.200 |
Remember that a 100 Elo gap is a win rate of around two thirds. 00:04:32.240 |
So one third of the time, GPT-4 Turbo's outputs would be preferred. 00:04:36.480 |
That's about the same gap as between GPT-4 Turbo and last year's GPT-4. 00:04:41.120 |
A huge step forward, but not completely night and day. 00:04:44.560 |
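As a rough check on the Elo arithmetic above: under the standard Elo model, the expected win rate for a rating gap d is 1 / (1 + 10^(-d/400)). Treating the leaderboard's scores as standard Elo is an assumption, since its exact fitting method may differ. A minimal Python sketch:

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected score of the higher-rated model under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400.0))

for gap in (0, 50, 100, 200):
    # p is the higher-rated model's expected win rate; 1 - p is the other model's share.
    p = elo_win_probability(gap)
    print(f"gap {gap:>3}: higher-rated wins {p:.2f}, other preferred {1 - p:.2f}")
```

A 100-point gap works out to roughly 0.64, which is close to the "around two thirds" figure.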
I think one underrated announcement was the desktop app, a live coding co-pilot. 00:04:50.640 |
OK, so I'm going to open the ChatGPT desktop app 00:04:55.600 |
OK, and to give a bit of background of what's going on. 00:04:58.320 |
So here we have a computer and on the screen we have some code 00:05:01.840 |
and then the ChatGPT voice app is on the right. 00:05:04.160 |
So ChatGPT will be able to hear me, but it can't see anything on the screen. 00:05:07.920 |
So I'm going to highlight the code, Command-C it, 00:05:11.920 |
And then I'm going to talk about the code to ChatGPT. 00:05:17.280 |
Could you give me a really brief one sentence description of what's going on in the code? 00:05:20.240 |
This code fetches daily weather data for a specific location and time period, 00:05:26.080 |
smooths the temperature data using a rolling average, 00:05:29.280 |
annotates a significant weather event on the resulting plot, 00:05:32.720 |
and then displays the plot with the average minimum and maximum temperatures over the year. 00:05:40.800 |
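The demo's code isn't shown in the transcript. What follows is a hypothetical stand-in for just the smoothing step described: a trailing rolling average over daily temperatures. The function name, window size and sample values are illustrative assumptions, not the demo's actual code. It uses only the standard library:

```python
from collections import deque

def rolling_average(values: list[float], window: int) -> list[float]:
    """Trailing rolling mean; emits one value per full window (hypothetical helper)."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf: deque = deque()
    running_sum = 0.0
    smoothed: list[float] = []
    for v in values:
        buf.append(v)
        running_sum += v
        if len(buf) > window:
            # Drop the oldest value so the sum covers only the last `window` days.
            running_sum -= buf.popleft()
        if len(buf) == window:
            smoothed.append(running_sum / window)
    return smoothed

daily_max = [12.0, 15.0, 11.0, 14.0, 18.0, 16.0, 13.0]  # made-up sample temperatures
print(rolling_average(daily_max, window=3))
```

If the data lives in pandas instead, `Series.rolling(window).mean()` gives the same trailing mean (with NaN for the first incomplete windows).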
I was most impressed with GPT-4o's performance on the math benchmark. 00:05:45.120 |
Even though it fails pretty much all of my math prompts, 00:05:48.000 |
that is still a stark improvement over the original GPT-4. 00:05:51.920 |
On the Google-proof graduate test (GPQA), it beats Claude 3 Opus. 00:05:56.240 |
And remember, that was the headline benchmark for Anthropic. 00:05:59.520 |
In fact, speaking of Anthropic, they are somewhat challenged by this release. 00:06:03.680 |
GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens. 00:06:09.920 |
As a quick aside, it also has 128k token context and an October knowledge cutoff. 00:06:21.360 |
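To make the quoted prices concrete, here is a back-of-the-envelope cost calculator. It uses the $5 and $15 per million token figures and the 128k context mentioned above. Reading "128k" as exactly 128,000 tokens shared between input and output is an assumption for illustration. OpenAI's current pricing and model docs are the authority on the real limits.

```python
INPUT_USD_PER_MILLION = 5.0    # GPT-4o input price quoted at launch
OUTPUT_USD_PER_MILLION = 15.0  # GPT-4o output price quoted at launch
CONTEXT_TOKENS = 128_000       # assumption: "128k" read as 128,000 tokens in total

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the launch prices above."""
    if input_tokens + output_tokens > CONTEXT_TOKENS:
        raise ValueError("request exceeds the assumed 128k context window")
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000

print(f"${request_cost(100_000, 10_000):.2f}")  # → $0.65
```

Because output tokens cost three times as much as input tokens, long answers dominate the bill faster than long prompts.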
And remember, for Claude 3 Opus on the web, you have to sign up with a subscription. 00:06:28.400 |
So for Claude 3 Opus to be beaten on its headline benchmark is a concern for them. 00:06:34.160 |
In fact, I think the results are clear enough to say that GPT-4o is the new smartest AI. 00:06:40.560 |
However, just before you get carried away and type on Twitter that AGI is here, 00:06:49.600 |
I dug into this benchmark and it's about adversarial reading comprehension questions. 00:06:54.080 |
They're designed to really test the reasoning capabilities of models. 00:06:58.560 |
If you give models difficult passages and they've got to sort through references, 00:07:02.400 |
do some counting and other operations, how do they fare? 00:07:05.520 |
DROP, by the way, stands for Discrete Reasoning Over Paragraphs. 00:07:10.000 |
It does slightly better than the original GPT-4, but slightly worse than Llama 3 400B. 00:07:15.680 |
And as they note, Llama 3 400B is still training. 00:07:19.200 |
So it's just about the new smartest model by a hair's breadth. 00:07:25.600 |
It's better at translation than Gemini models. 00:07:28.640 |
Quick caveat there, Gemini 2 might be announced tomorrow and that could regain the lead. 00:07:33.760 |
Then there are the vision understanding evaluations. 00:07:40.800 |
As you can see, a clear 10 points better than Claude Opus. 00:07:44.080 |
Again, I'm curious if Google Gemini can exceed it though. 00:07:47.520 |
The improvements to the tokenizer could be revolutionary for non-English speakers. 00:07:52.880 |
The dramatically smaller number of tokens needed for languages like Gujarati, 00:07:56.960 |
Hindi, Arabic, and more means conversations are not just cheaper, they're also quicker. 00:08:04.880 |
Well, this time they didn't compare it to other models, but to the original GPT-4. 00:08:09.760 |
Definitely a step up across languages, but English is still by far the best-suited language. 00:08:16.240 |
Indeed, here is a video of some of the model's mistakes 00:08:22.160 |
I know enough Mandarin to say it wasn't perfect at tutoring. 00:08:58.560 |
Hello, my name is Nach, I'm here with my co-worker. 00:09:03.600 |
Hi, I'm Shil, I'm trying to teach my co-worker how to speak in Mandarin. 00:09:07.120 |
We want to start from simple words like "Nǐ hǎo". 00:09:15.040 |
"Nǐ hǎo" is pretty straightforward to pronounce. 00:09:20.480 |
Just make sure to keep the "Nǐ" part high and then go down in pitch on the "hǎo". 00:09:32.480 |
It really wasn't, he needs to work on his tones and her face was the giveaway. 00:09:37.360 |
There were a lot of other interesting video demonstrations, 00:09:40.560 |
but before those, let's look at the blog post Sam Altman put out tonight. 00:09:45.840 |
He made the argument that putting this capable AI tool in the hands of everyone 00:09:50.400 |
for free justifies the "open" part of OpenAI. 00:09:54.800 |
Less about open weights or open source, more about open to everyone, 00:10:02.080 |
He also draws our attention to the latency of the models. 00:10:05.920 |
In many ways, that is the key innovation with GPT-4o. 00:10:09.680 |
By dialing down the latency, you dial up the realism. 00:10:14.000 |
As he said, it feels like AI from the movies, 00:10:16.560 |
getting to human level response times and expressiveness turns out to be a big change. 00:10:21.520 |
Indeed, I think I should get a little credit for predicting 00:10:30.240 |
The response times were amazing and the model was clearly designed to be flirtatious. 00:10:35.600 |
Note that in Senate testimony last year, Sam Altman said, 00:10:39.040 |
"We try to design systems that do not maximize for engagement." 00:10:43.280 |
So let me know in the comments whether you think these demos 00:10:57.600 |
Yeah, well, I really appreciate all the help. 00:11:10.480 |
Yeah, as we can see, you know, we can chat in real time. 00:11:12.800 |
I was impressed though that it could speed up its talking on demand. 00:11:25.920 |
One, two, three, four, five, six, seven, eight, nine, ten. 00:11:31.440 |
because for blind people, this could be revolutionary. 00:11:35.440 |
The Royal Standard flag is flying above Buckingham Palace, 00:11:39.360 |
which is the signal that the monarch is present. 00:11:42.000 |
How exciting to be there while the king is home. 00:11:44.960 |
Try and tell me exactly what they're doing right now, please. 00:11:48.160 |
Right now, the ducks are gently gliding across the water. 00:11:52.320 |
It will also help you with interview prep by analyzing your appearance. 00:12:04.320 |
Well, in a few minutes, I'm going to be interviewing at OpenAI. 00:12:16.800 |
Well, it's for a software engineering role, but I just need to know, 00:12:27.840 |
"I've been coding all night" look down, which could actually work in your favor. 00:12:32.080 |
Maybe just run a hand through your hair or lean into the mad genius vibe. 00:12:37.840 |
Your enthusiasm is what's really going to shine through. 00:12:40.800 |
I don't have a lot of time, so I'm just going to throw this on. 00:12:55.200 |
though maybe not in the way you're hoping for an interview. 00:13:00.320 |
I could definitely see GPT-4o being used for mathematics tutoring. 00:13:04.800 |
Here is a 30-second clip, but notice also the slight glitchiness at certain moments. 00:13:12.080 |
All right, so I'm pretty sure this is the angle alpha right here, right? 00:13:19.360 |
Now, looking at the triangle, which side do you think is the hypotenuse? 00:13:29.920 |
I think it might be this one, but I really am not sure. 00:13:37.600 |
I also noticed quite a few glitches in the demo. 00:13:42.800 |
The one feature, though, that OpenAI wants you to really focus on is the video-in capability. 00:13:48.960 |
You can livestream video straight to the transformer architecture behind GPT-4o. 00:13:54.480 |
And so the video-in functionality is certainly impressive. 00:13:58.400 |
Note, though, that the reaction time of GPT-4o isn't quite as immediate as for audio. 00:14:04.000 |
Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder. 00:14:11.920 |
They might be getting ready for a conversation or presentation, 00:14:15.200 |
given their direct engagement with the camera. 00:14:17.360 |
Sounds like we're in for an intriguing interaction. 00:14:23.680 |
I'm curious about the lighting you mentioned. 00:14:27.680 |
Is it natural light or are we talking unique fixtures or colors that add to the atmosphere? 00:14:33.920 |
The lighting here seems to be a mix of natural and artificial. 00:14:39.920 |
There's a noticeable bright light overhead, probably a fixture, 00:14:44.240 |
casting a focused beam downwards, which creates a spotlight effect. 00:14:47.920 |
This adds a dramatic and modern feel to the scene. 00:14:52.320 |
The rest of the room is lit more softly, possibly by natural light filtering... 00:15:02.480 |
Just as you asked, another person came into view behind the first person. 00:15:07.280 |
They playfully made bunny ears behind the first person's head and then quickly left the frame. 00:15:13.600 |
It added a lighthearted and unexpected moment to the scene. 00:15:18.000 |
Now the focus is back on the original person with the leather jacket. 00:15:22.720 |
Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony. 00:15:31.600 |
But maybe make it more dramatic and make the soprano higher. 00:15:57.120 |
And I suspect this real-time translation could soon be coming to Siri. 00:16:03.680 |
So every time I say something in English, can you repeat it back in Spanish? 00:16:07.760 |
And every time he says something in Spanish, can you repeat it back in English? 00:16:19.200 |
Have you been up to anything interesting recently? 00:16:36.960 |
Just a bit busy here preparing for an event next week. 00:16:40.800 |
Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI 00:16:48.560 |
And in case you're wondering about GPT-4.5 or even 5, 00:16:52.560 |
Sam Altman said we'll have more stuff to share soon. 00:16:55.600 |
And Mira Murati in the official presentation said that they would be 00:16:59.760 |
soon updating us on progress on the next big thing. 00:17:04.320 |
Whether that's empty hype or real, you can decide. 00:17:07.680 |
No word, of course, about OpenAI co-founder Ilya Sutskever, 00:17:11.360 |
although he was listed as a contributor under additional leadership. 00:17:16.400 |
Overall, I think this model will be massively more popular 00:17:22.880 |
You can prompt the model now with text and images in the OpenAI playground. 00:17:29.680 |
Note also that all the demos you saw were in real time at 1x speed. 00:17:34.880 |
That I think was a nod to Google's botched demo. 00:17:38.560 |
Of course, let's see tomorrow what Google replies with. 00:17:41.520 |
To those who think that GPT-4o is a huge stride towards AGI, 00:17:46.400 |
I would point them to the somewhat mixed results on the reasoning benchmarks. 00:17:51.040 |
Expect GPT-4o to still suffer from a massive number of hallucinations. 00:17:56.160 |
To those, though, who think that GPT-4o will change nothing, I would say this. 00:18:00.880 |
Look at what ChatGPT did to the popularity of the underlying GPT series. 00:18:06.240 |
It being a free and chatty model brought 100 million people into testing AI. 00:18:12.320 |
GPT-4o being the smartest model currently available, free on the web, and multimodal, 00:18:19.360 |
I think could unlock AI for hundreds of millions more people. 00:18:27.440 |
If you want to analyse the announcement even more, 00:18:30.000 |
do join me on the AI Insiders Discord via Patreon. 00:18:33.840 |
We have live meetups around the world and professional best practice sharing. 00:18:37.840 |
So let me know what you think and as always have a wonderful day.