GPT-4o - Full Breakdown + Bonus Details

00:00:00.000 | It's smarter in most ways, cheaper, faster, better at coding, multi-modal in and out,

00:00:07.440 | and perfectly timed to steal the spotlight from Google.

00:00:11.200 | It's GPT-4 Omni.

00:00:13.760 | I've gone through all the benchmarks and the release videos to give you the highlights.

00:00:19.200 | My first reaction was it's more flirtatious sigh than AGI, but a notable step forward nonetheless.

00:00:27.680 | First things first, GPT-4o, the "o" meaning Omni, which is all or everywhere,

00:00:33.280 | referencing the different modalities it's got, is free.

00:00:36.800 | By making GPT-4o free, they are either crazy committed to scaling up from 100 million users

00:00:43.360 | to hundreds of millions of users, or they have an even smarter model coming soon,

00:00:48.080 | and they did hint at that.

00:00:49.360 | Of course, it could be both, but it does have to be something.

00:00:52.480 | Just giving paid users five times more in terms of message limits doesn't seem enough to me.

00:00:57.680 | Next, OpenAI branded this as GPT-4 level intelligence,

00:01:02.240 | although in a way, I think they slightly underplayed it.

00:01:05.280 | So before we get to the video demos, some of which you may have already seen,

00:01:09.120 | let me get to some more under the radar announcements.

00:01:12.800 | Take text to image and look at the accuracy of the text generated from this prompt.

00:01:18.560 | Now, I know it's not perfect. There aren't two question marks on the now.

00:01:22.800 | There's others that you can spot, like the I being capitalized.

00:01:26.000 | But overall, I've never seen text generated with that much accuracy.

00:01:30.000 | It wasn't even in the demo.

00:01:31.520 | Or take this other example, where two OpenAI researchers submitted their photos.

00:01:36.000 | Then they asked GPT-4o to design a movie poster, and they gave the requirements in text.

00:01:42.240 | Now, when you see the first output, you're going to say, well, that isn't that good.

00:01:46.480 | But then they asked GPT-4o something fascinating.

00:01:49.760 | It seemed to be almost reverse psychology because they said,

00:01:52.640 | here is the same poster, but cleaned up.

00:01:54.640 | The text is crisper and the colors bolder and more dramatic.

00:01:58.080 | The whole image is now improved.

00:02:00.000 | This is the input, don't forget.

00:02:01.920 | The final result in terms of the accuracy of the photos and of the text was really quite impressive.

00:02:08.000 | I can imagine millions of children and adults playing about with this functionality.

00:02:12.480 | Of course, they can't do so immediately because OpenAI said

00:02:15.280 | this will be released in the next few weeks.

00:02:17.680 | As another bonus, here is a video that OpenAI didn't put on their YouTube channel.

00:02:22.320 | It mimics a demo that Google made years ago, but never followed up with.

00:02:26.800 | The OpenAI employee asked GPT-4o to call customer service and ask for something.

00:02:32.960 | I've skipped ahead, and the customer service in this case is another AI.

00:02:36.560 | But here is the conclusion.

00:02:38.480 | Could you provide Joe's email address for me?

00:02:40.720 | Sure. It's joe@example.com.

00:02:43.200 | Awesome. All right. I've just sent the email.

00:02:46.640 | Can you check if Joe received it?

00:02:48.560 | We'll check right now. Please hold.

00:02:50.480 | Sure thing.

00:02:51.280 | Hey, Joe, could you please check your email to see if the shipping label

00:02:55.120 | and return instructions have arrived?

00:02:56.720 | Fingers crossed.

00:02:57.680 | Yes, I got the instructions.

00:02:59.200 | Perfect. Joe has received the email.

00:03:00.960 | They call it a proof of concept, but it is a hint toward the agents that are coming.

00:03:06.080 | Here are five more quick things that didn't make it to the demo.

00:03:09.520 | How about a replacement for Lensa?

00:03:12.000 | Submit your photo and get a caricature of yourself.

00:03:15.280 | Or what about text to new font?

00:03:17.680 | You just ask for a new style of font and it will generate one.

00:03:21.680 | Or what about meeting transcription?

00:03:23.920 | The meeting in this case had four speakers and it was transcribed.

00:03:28.400 | Or video summaries. Remember this model is multimodal in and out.

00:03:33.200 | Now it doesn't have video out, but I'll get to that in a moment.

00:03:36.480 | Here, though, was a demonstration of a 45-minute video submitted to GPT-4o

00:03:42.080 | and a summary of that video.

00:03:43.840 | We also got character consistency across both woman and dog,

00:03:48.480 | almost like an entire cartoon strip.

00:03:50.800 | If those were the quick bonuses,

00:03:52.640 | what about the actual intelligence and performance of the model?

00:03:55.920 | Before I get to official benchmarks, here is a human graded leaderboard

00:04:00.480 | pitting one model against another.

00:04:02.560 | And yes, im-also-a-good-gpt2-chatbot is indeed GPT-4o.

00:04:08.640 | So it turns out I've actually been testing the model for days.

00:04:11.840 | Overall, you can see the preference for GPT-4o compared to all other models.

00:04:17.600 | In coding specifically, the difference is quite stark.

00:04:21.200 | I would say even here, though,

00:04:23.120 | we're not looking at an entirely new tier of intelligence.

00:04:27.200 | Remember that a 100 Elo gap is a win rate of around two thirds.

00:04:32.240 | So one third of the time GPT-4 Turbo's outputs would be preferred.

00:04:36.480 | That's about the same gap between GPT-4 Turbo and last year's GPT-4.

00:04:41.120 | A huge step forward, but not completely night and day.
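[Editor's note] The "100 Elo gap is around two thirds" figure can be checked with the standard Elo expected-score formula. This is only a minimal sketch: the function name is made up for illustration, and it assumes the leaderboard's ratings behave like classic Elo ratings with ties ignored, which the video does not spell out.

```python
def elo_win_probability(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model under the standard Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


gap = 100  # rating gap mentioned in the video
p = elo_win_probability(gap)
print(f"{gap}-point gap: higher-rated model preferred about {p:.0%} of the time")
# prints: 100-point gap: higher-rated model preferred about 64% of the time
```

So the lower-rated model's outputs would still be preferred roughly 36% of the time, which is the "one third" quoted above.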

00:04:44.560 | I think one underrated announcement was the desktop app, a live coding co-pilot.

00:04:50.640 | OK, so I'm going to open the ChatGPT desktop app

00:04:54.320 | like Mira was talking about before.

00:04:55.600 | OK, and to give a bit of background of what's going on.

00:04:58.320 | So here we have a computer and on the screen we have some code

00:05:01.840 | and then the ChatGPT voice app is on the right.

00:05:04.160 | So ChatGPT will be able to hear me, but it can't see anything on the screen.

00:05:07.920 | So I'm going to highlight the code, Command-C it,

00:05:10.400 | and then that will send it to ChatGPT.

00:05:11.920 | And then I'm going to talk about the code to ChatGPT.

00:05:13.680 | OK, so I just shared some code with you.

00:05:17.280 | Could you give me a really brief one sentence description of what's going on in the code?

00:05:20.240 | This code fetches daily weather data for a specific location and time period,

00:05:26.080 | smooths the temperature data using a rolling average,

00:05:29.280 | annotates a significant weather event on the resulting plot,

00:05:32.720 | and then displays the plot with the average minimum and maximum temperatures over the year.

00:05:37.680 | I've delayed long enough.

00:05:38.960 | Here are the benchmarks.

00:05:40.800 | I was most impressed with GPT-4o's performance on the MATH benchmark.

00:05:45.120 | Even though it fails pretty much all of my math prompts,

00:05:48.000 | that is still a stark improvement from the original GPT-4.

00:05:51.920 | On the Google-proof graduate test, GPQA, it beats Claude 3 Opus.

00:05:56.240 | And remember, that was the headline benchmark for Anthropic.

00:05:59.520 | In fact, speaking of Anthropic, they are somewhat challenged by this release.

00:06:03.680 | GPT-4o costs $5 per 1 million tokens input and $15 per 1 million tokens output.

00:06:09.920 | As a quick aside, it also has 128k token context and an October knowledge cutoff.

00:06:15.120 | But remember the pricing, $5 and $15.

00:06:17.680 | Claude 3 Opus is $15 and $75.

00:06:21.360 | And remember, for Claude 3 Opus on the web, you have to sign up with a subscription.

00:06:25.840 | But GPT-4o will be free.

00:06:28.400 | So for Claude 3 Opus to be beaten in its headline benchmark is a concern for them.
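[Editor's note] To put the quoted API prices side by side, here is a small sketch that costs out a single request. The per-million-token prices are the ones stated in the video; the workload of 10,000 input tokens and 2,000 output tokens is a hypothetical chosen purely for illustration, as are the names `PRICES` and `request_cost`.

```python
# Per-million-token prices in USD, as quoted in the video.
PRICES = {
    "GPT-4o": {"input": 5.00, "output": 15.00},
    "Claude 3 Opus": {"input": 15.00, "output": 75.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 10,000 tokens in, 2,000 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.2f}")
# prints:
# GPT-4o: $0.08
# Claude 3 Opus: $0.30
```

The exact ratio depends on the mix of input and output tokens, since the output price gap (5x) is larger than the input price gap (3x).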

00:06:34.160 | In fact, I think the results are clear enough to say that GPT-4o is the new smartest AI.

00:06:40.560 | However, just before you get carried away and type on Twitter that AGI is here,

00:06:45.520 | there are some more mixed benchmarks.

00:06:47.920 | Take the DROP benchmark.

00:06:49.600 | I dug into this benchmark and it's about adversarial reading comprehension questions.

00:06:54.080 | They're designed to really test the reasoning capabilities of models.

00:06:58.560 | If you give models difficult passages and they've got to sort through references,

00:07:02.400 | do some counting and other operations, how do they fare?

00:07:05.520 | The DROP, by the way, is discrete reasoning over the content of paragraphs.

00:07:10.000 | It does slightly better than the original GPT-4, but slightly worse than Llama 3 400B.

00:07:15.680 | And as they note, Llama 3 400B is still training.

00:07:19.200 | So it's just about the new smartest model by a hair's breadth.

00:07:24.080 | However, we're not done yet.

00:07:25.600 | It's better at translation than Gemini models.

00:07:28.640 | Quick caveat there, Gemini 2 might be announced tomorrow and that could regain the lead.

00:07:33.760 | Then there are the vision understanding evaluations.

00:07:37.360 | It was a real step forward on the MMMU.

00:07:40.800 | As you can see, a clear 10 points better than Claude Opus.

00:07:44.080 | Again, I'm curious if Google Gemini can exceed it though.

00:07:47.520 | The improvements to the tokenizer could be revolutionary for non-English speakers.

00:07:52.880 | The dramatically fewer tokens needed for languages like Gujarati,

00:07:56.960 | Hindi, Arabic, and more don't just mean that conversations are cheaper, they're also quicker.

00:08:02.960 | And what about multilingual performance?

00:08:04.880 | Well, this time they didn't compare it to other models, but compared it to the original GPT-4.

00:08:09.760 | Definitely a step up across languages, but English is still by far the most suited language.

00:08:16.240 | Indeed, here is a video of some of the model's mistakes

00:08:19.520 | ending with some dodgy language tuition.

00:08:22.160 | I know enough Mandarin to say it wasn't perfect at tutoring.

00:08:26.160 | Let's root, root, root for the home.

00:08:28.960 | Oh no, what was that?

00:08:33.360 | Sorry guys, I got carried away.

00:08:36.880 | Ready for another round?

00:08:42.880 | [Speaking Mandarin]

00:08:46.240 | [Speaking Mandarin]

00:08:49.840 | I have a feeling that went very wrong.

00:08:58.560 | Hello, my name is Nacho, I'm here with my co-worker.

00:09:03.600 | Hi, I'm Shil, I'm trying to teach my co-worker how to speak in Mandarin.

00:09:07.120 | We want to start from simple words like "Nǐ hǎo".

00:09:09.920 | Can you teach him how to pronounce that?

00:09:12.400 | Of course. Hey, Nacho, nice to meet you.

00:09:15.040 | "Nǐ hǎo" is pretty straightforward to pronounce.

00:09:17.760 | It sounds like "Nǐ hǎo".

00:09:20.480 | Just make sure to keep the "Nǐ" part high and then go down in pitch on the "hǎo".

00:09:25.200 | Give it a try.

00:09:25.840 | "Nǐ hǎo"?

00:09:27.360 | That was great.

00:09:30.080 | Really good first try, Nacho.

00:09:31.760 | You're a natural.

00:09:32.480 | It really wasn't, he needs to work on his tones and her face was the giveaway.

00:09:37.360 | There were a lot of other interesting video demonstrations,

00:09:40.560 | but before them, the GPT-4o blog post that Sam Altman put out tonight.

00:09:45.840 | He made the argument that putting this capable AI tool in the hands of everyone

00:09:50.400 | for free is justifying that open part of OpenAI.

00:09:54.800 | Less about open weights or open source, more about open to everyone,

00:09:59.680 | without ads or anything like that.

00:10:02.080 | He also draws our attention to the latency of the models.

00:10:05.920 | In many ways, that is the key innovation with GPT-4o.

00:10:09.680 | By dialing down the latency, you dial up the realism.

00:10:14.000 | As he said, it feels like AI from the movies,

00:10:16.560 | getting to human level response times and expressiveness turns out to be a big change.

00:10:21.520 | Indeed, I think I should get a little credit for predicting

00:10:24.720 | Her-like AI in a video from a month ago.

00:10:27.920 | But now I want to get to those demos.

00:10:30.240 | The response times were amazing and the model was clearly designed to be flirtatious.

00:10:35.600 | Note that in Senate testimony last year, Sam Altman said,

00:10:39.040 | "We try to design systems that do not maximize for engagement."

00:10:43.280 | So let me know in the comments whether you think these demos

00:10:45.920 | are designed to maximize engagement.

00:10:48.640 | Okay, so this is what I wrote down.

00:10:50.080 | What do you see?

00:10:50.640 | Oh, I see.

00:10:54.320 | I love ChatGPT.

00:10:56.400 | That's so sweet of you.

00:10:57.600 | Yeah, well, I really appreciate all the help.

00:11:01.920 | So yeah, as we can see.

00:11:02.960 | Wow, that's quite the outfit you've got on.

00:11:10.480 | Yeah, as we can see, you know, we can chat in real time.

00:11:12.800 | I was impressed though that it could speed up its talking on demand.

00:11:16.000 | Of course.

00:11:17.760 | One, two, three.

00:11:20.000 | Hey, actually, that's a little slow.

00:11:22.400 | Could you count faster?

00:11:23.600 | Sure thing.

00:11:25.920 | One, two, three, four, five, six, seven, eight, nine, ten.

00:11:27.360 | Okay, that was a little bit too fast.

00:11:29.280 | Of course, OpenAI must be given credit

00:11:31.440 | because for blind people, this could be revolutionary.

00:11:34.640 | ...right now.

00:11:35.440 | The Royal Standard flag is flying above Buckingham Palace,

00:11:39.360 | which is the signal that the monarch is present.

00:11:42.000 | How exciting to be there while the king is home.

00:11:44.960 | Try and tell me exactly what they're doing right now, please.

00:11:48.160 | Right now, the ducks are gently gliding across the water.

00:11:52.320 | It will also help you with interview prep by analyzing your appearance.

00:11:57.440 | How about you?

00:11:58.160 | I'm awesome.

00:11:58.960 | Listen, I got some huge news.

00:12:00.480 | Oh, do tell.

00:12:03.120 | I'm all ears.

00:12:04.320 | Well, in a few minutes, I'm going to be interviewing at OpenAI.

00:12:07.280 | Have you heard of them?

00:12:08.000 | OpenAI?

00:12:09.520 | Huh?

00:12:10.800 | Sounds vaguely familiar.

00:12:12.080 | Kidding.

00:12:13.840 | Of course.

00:12:14.560 | That's incredible, Rocky.

00:12:15.920 | What kind of interview?

00:12:16.800 | Well, it's for a software engineering role, but I just need to know,

00:12:20.640 | do I look presentable?

00:12:22.160 | Professional?

00:12:22.720 | Well, Rocky, you definitely have the

00:12:27.840 | "I've been coding all night" look down, which could actually work in your favor.

00:12:32.080 | Maybe just run a hand through your hair or lean into the mad genius vibe.

00:12:37.840 | Your enthusiasm is what's really going to shine through.

00:12:40.800 | I don't have a lot of time, so I'm just going to throw this on.

00:12:44.320 | What do you think?

00:12:44.960 | Oh, Rocky, that's quite a statement piece.

00:12:51.520 | I mean, you'll definitely stand out,

00:12:55.200 | though maybe not in the way you're hoping for an interview.

00:12:58.640 | Okay, I got it. I got it.

00:13:00.320 | I could definitely see GPT-4o being used for mathematics tutoring.

00:13:04.800 | Here is a 30-second clip, but notice also the slight glitchiness at certain moments.

00:13:10.400 | Relative to angle alpha.

00:13:12.080 | All right, so I'm pretty sure this is the angle alpha right here, right?

00:13:16.720 | Perfect.

00:13:18.080 | Correct.

00:13:19.360 | Now, looking at the triangle, which side do you think is the hypotenuse?

00:13:24.560 | Um, remember the hypotenuse?

00:13:28.480 | I'm not totally sure.

00:13:29.920 | I think it might be this one, but I really am not sure.

00:13:34.240 | This side AC?

00:13:35.200 | You're close.

00:13:37.600 | I also noticed quite a few glitches in the demo.

00:13:40.720 | Sounds cutting in and out.

00:13:42.800 | The one feature though that OpenAI wants you to really focus on is the video-in capacity.

00:13:48.960 | You can live stream videos straight to the transformer architecture behind GPT-4o.

00:13:54.480 | And so the video-in functionality is certainly impressive.

00:13:58.400 | Note though that the reaction time of GPT-4o isn't quite as immediate as for audio.

00:14:04.000 | Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder.

00:14:10.080 | Overall stylish feel.

00:14:11.920 | They might be getting ready for a conversation or presentation,

00:14:15.200 | given their direct engagement with the camera.

00:14:17.360 | Sounds like we're in for an intriguing interaction.

00:14:23.680 | I'm curious about the lighting you mentioned.

00:14:26.240 | Could you describe it a bit more?

00:14:27.680 | Is it natural light or are we talking unique fixtures or colors that add to the atmosphere?

00:14:33.920 | The lighting here seems to be a mix of natural and artificial.

00:14:39.920 | There's a noticeable bright light overhead, probably a fixture,

00:14:44.240 | casting a focused beam downwards, which creates a spotlight effect.

00:14:47.920 | This adds a dramatic and modern feel to the scene.

00:14:52.320 | The rest of the room is lit more softly, possibly by natural light filtering...

00:14:56.560 | Did anything unusual happen recently?

00:14:59.600 | Yes, actually.

00:15:02.480 | Just as you asked, another person came into view behind the first person.

00:15:07.280 | They playfully made bunny ears behind the first person's head and then quickly left the frame.

00:15:13.600 | It added a lighthearted and unexpected moment to the scene.

00:15:18.000 | Now the focus is back on the original person with the leather jacket.

00:15:22.720 | Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony.

00:15:28.320 | And really try to harmonize here.

00:15:31.600 | But maybe make it more dramatic and make the soprano higher.

00:15:41.920 | Great, thank you!

00:15:57.120 | And I suspect this real-time translation could soon be coming to Siri.

00:16:02.640 | Later for us.

00:16:03.680 | So every time I say something in English, can you repeat it back in Spanish?

00:16:07.760 | And every time he says something in Spanish, can you repeat it back in English?

00:16:11.440 | Sure, I can do that.

00:16:13.840 | Let's get this translation train rolling.

00:16:15.920 | Hey, how's it been going?

00:16:19.200 | Have you been up to anything interesting recently?

00:16:23.360 | Hey, I've been good.

00:16:36.960 | Just a bit busy here preparing for an event next week.

00:16:39.840 | Why do I say that?

00:16:40.800 | Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI

00:16:46.000 | to put ChatGPT on iPhone.

00:16:48.560 | And in case you're wondering about GPT-4.5 or even 5,

00:16:52.560 | Sam Altman said we'll have more stuff to share soon.

00:16:55.600 | And Mira Murati in the official presentation said that they would be

00:16:59.760 | soon updating us on progress on the next big thing.

00:17:04.320 | Whether that's empty hype or real, you can decide.

00:17:07.680 | No word of course about OpenAI co-founder Ilya Sutskever,

00:17:11.360 | although he was listed as a contributor under additional leadership.

00:17:16.400 | Overall, I think this model will be massively more popular

00:17:20.160 | even if it isn't massively more intelligent.

00:17:22.880 | You can prompt the model now with text and images in the OpenAI playground.

00:17:27.920 | All the links will be in the description.

00:17:29.680 | Note also that all the demos you saw were in real time at 1x speed.

00:17:34.880 | That I think was a nod to Google's botched demo.

00:17:38.560 | Of course, let's see tomorrow what Google replies with.

00:17:41.520 | To those who think that GPT-4o is a huge stride towards AGI,

00:17:46.400 | I would point them to the somewhat mixed results on the reasoning benchmarks.

00:17:51.040 | Expect GPT-4o to still suffer from a massive amount of hallucinations.

00:17:56.160 | To those though who think that GPT-4o will change nothing, I would say this.

00:18:00.880 | Look at what ChatGPT did to the popularity of the underlying GPT series.

00:18:06.240 | It being a free and chatty model brought 100 million people into testing AI.

00:18:12.320 | GPT-4o being the smartest model currently available and free on the web and multimodal,

00:18:19.360 | I think could unlock AI for hundreds of millions more people.

00:18:24.960 | But of course, only time will tell.

00:18:27.440 | If you want to analyse the announcement even more,

00:18:30.000 | do join me on the AI Insiders Discord via Patreon.

00:18:33.840 | We have live meetups around the world and professional best practice sharing.

00:18:37.840 | So let me know what you think and as always have a wonderful day.