
GPT-4o - Full Breakdown + Bonus Details



00:00:00.000 | It's smarter in most ways, cheaper, faster, better at coding, multi-modal in and out,
00:00:07.440 | and perfectly timed to steal the spotlight from Google.
00:00:11.200 | It's GPT-4 Omni.
00:00:13.760 | I've gone through all the benchmarks and the release videos to give you the highlights.
00:00:19.200 | My first reaction was it's more flirtatious PSI than AGI, but a notable step forward nonetheless.
00:00:27.680 | First things first, GPT-4o, meaning omni, which is all or everywhere,
00:00:33.280 | referencing the different modalities it's got, is free.
00:00:36.800 | By making GPT-4o free, they are either crazy committed to scaling up from 100 million users
00:00:43.360 | to hundreds of millions of users, or they have an even smarter model coming soon,
00:00:48.080 | and they did hint at that.
00:00:49.360 | Of course, it could be both, but it does have to be something.
00:00:52.480 | Just giving paid users five times more in terms of message limits doesn't seem enough to me.
00:00:57.680 | Next, OpenAI branded this as GPT-4 level intelligence,
00:01:02.240 | although in a way, I think they slightly underplayed it.
00:01:05.280 | So before we get to the video demos, some of which you may have already seen,
00:01:09.120 | let me get to some more under the radar announcements.
00:01:12.800 | Take text to image and look at the accuracy of the text generated from this prompt.
00:01:18.560 | Now, I know it's not perfect. There aren't two question marks on the now.
00:01:22.800 | There's others that you can spot, like the I being capitalized.
00:01:26.000 | But overall, I've never seen text generated with that much accuracy.
00:01:30.000 | It wasn't even in the demo.
00:01:31.520 | Or take this other example, where two OpenAI researchers submitted their photos.
00:01:36.000 | Then they asked GPT-4o to design a movie poster, and they gave the requirements in text.
00:01:42.240 | Now, when you see the first output, you're going to say, well, that isn't that good.
00:01:46.480 | But then they asked GPT-4o something fascinating.
00:01:49.760 | It seemed to be almost reverse psychology because they said,
00:01:52.640 | here is the same poster, but cleaned up.
00:01:54.640 | The text is crisper and the colors bolder and more dramatic.
00:01:58.080 | The whole image is now improved.
00:02:00.000 | This is the input, don't forget.
00:02:01.920 | The final result in terms of the accuracy of the photos and of the text was really quite impressive.
00:02:08.000 | I can imagine millions of children and adults playing about with this functionality.
00:02:12.480 | Of course, they can't do so immediately because OpenAI said
00:02:15.280 | this will be released in the next few weeks.
00:02:17.680 | As another bonus, here is a video that OpenAI didn't put on their YouTube channel.
00:02:22.320 | It mimics a demo that Google made years ago, but never followed up with.
00:02:26.800 | The OpenAI employee asked GPT-4o to call customer service and ask for something.
00:02:32.960 | I've skipped ahead, and the customer service in this case is another AI.
00:02:36.560 | But here is the conclusion.
00:02:38.480 | Could you provide Joe's email address for me?
00:02:40.720 | Sure. It's joe@example.com.
00:02:43.200 | Awesome. All right. I've just sent the email.
00:02:46.640 | Can you check if Joe received it?
00:02:48.560 | We'll check right now. Please hold.
00:02:50.480 | Sure thing.
00:02:51.280 | Hey, Joe, could you please check your email to see if the shipping label
00:02:55.120 | and return instructions have arrived?
00:02:56.720 | Fingers crossed.
00:02:57.680 | Yes, I got the instructions.
00:02:59.200 | Perfect. Joe has received the email.
00:03:00.960 | They call it a proof of concept, but it is a hint toward the agents that are coming.
00:03:06.080 | Here are five more quick things that didn't make it to the demo.
00:03:09.520 | How about a replacement for Lensa?
00:03:12.000 | Submit your photo and get a caricature of yourself.
00:03:15.280 | Or what about text to new font?
00:03:17.680 | You just ask for a new style of font and it will generate one.
00:03:21.680 | Or what about meeting transcription?
00:03:23.920 | The meeting in this case had four speakers and it was transcribed.
00:03:28.400 | Or video summaries. Remember this model is multimodal in and out.
00:03:33.200 | Now it doesn't have video out, but I'll get to that in a moment.
00:03:36.480 | Here, though, was a demonstration of a 45-minute video submitted to GPT-4o
00:03:42.080 | and a summary of that video.
00:03:43.840 | We also got character consistency across both woman and dog,
00:03:48.480 | almost like an entire cartoon strip.
00:03:50.800 | If those were the quick bonuses,
00:03:52.640 | what about the actual intelligence and performance of the model?
00:03:55.920 | Before I get to official benchmarks, here is a human graded leaderboard
00:04:00.480 | pitting one model against another.
00:04:02.560 | And yes, "im-also-a-good-gpt2-chatbot" is indeed GPT-4o.
00:04:08.640 | So it turns out I've actually been testing the model for days.
00:04:11.840 | Overall, you can see the preference for GPT-4o compared to all other models.
00:04:17.600 | In coding specifically, the difference is quite stark.
00:04:21.200 | I would say even here, though,
00:04:23.120 | we're not looking at an entirely new tier of intelligence.
00:04:27.200 | Remember that a 100 Elo gap is a win rate of around two thirds.
00:04:32.240 | So one third of the time, GPT-4 Turbo's outputs would be preferred.
00:04:36.480 | That's about the same gap as between GPT-4 Turbo and last year's GPT-4.
00:04:41.120 | A huge step forward, but not completely night and day.
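As a side note on that Elo arithmetic: under the standard Elo model, the expected win rate for a rating gap of d points is 1/(1 + 10^(-d/400)), which is where the "around two thirds" figure comes from. A quick sketch to check it:

```python
def elo_win_rate(gap: float) -> float:
    """Expected win rate of the higher-rated model for a given Elo gap."""
    return 1 / (1 + 10 ** (-gap / 400))

print(round(elo_win_rate(100), 2))  # 0.64 -- roughly the "two thirds" quoted
```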
00:04:44.560 | I think one underrated announcement was the desktop app, a live coding co-pilot.
00:04:50.640 | OK, so I'm going to open the ChatGPT desktop app,
00:04:54.320 | like Mira was talking about before.
00:04:55.600 | OK, and to give a bit of background of what's going on,
00:04:58.320 | so here we have a computer, and on the screen we have some code,
00:05:01.840 | and then the ChatGPT voice app is on the right.
00:05:04.160 | So ChatGPT will be able to hear me, but it can't see anything on the screen.
00:05:07.920 | So I'm going to highlight the code, Command-C it,
00:05:10.400 | and then that will send it to ChatGPT.
00:05:11.920 | And then I'm going to talk about the code to ChatGPT.
00:05:13.680 | OK, so I just shared some code with you.
00:05:17.280 | Could you give me a really brief one sentence description of what's going on in the code?
00:05:20.240 | This code fetches daily weather data for a specific location and time period,
00:05:26.080 | smooths the temperature data using a rolling average,
00:05:29.280 | annotates a significant weather event on the resulting plot,
00:05:32.720 | and then displays the plot with the average minimum and maximum temperatures over the year.
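The demo's code isn't shown in full on screen, but the description above maps onto a standard pipeline: fetch data, smooth with a rolling average, annotate, plot. Here is a minimal stdlib-only sketch of just the smoothing step, with invented temperatures standing in for the fetched weather data (the real demo presumably used something like pandas and matplotlib):

```python
def rolling_average(values, window=3):
    """Smooth a series with a trailing rolling average over `window` points."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical daily max temperatures (deg C); the real demo fetched a year of data.
temps = [12.0, 15.0, 11.0, 18.0, 20.0, 17.0]
smoothed = rolling_average(temps, window=3)
# Early entries average fewer points because the trailing window is still filling.
print([round(t, 1) for t in smoothed])
```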
00:05:37.680 | I've delayed long enough.
00:05:38.960 | Here are the benchmarks.
00:05:40.800 | I was most impressed with GPT-4o's performance on the math benchmark.
00:05:45.120 | Even though it fails pretty much all of my math prompts,
00:05:48.000 | that is still a stark improvement from the original GPT-4.
00:05:51.920 | On the Google-proof graduate test (GPQA), it beats Claude 3 Opus.
00:05:56.240 | And remember, that was the headline benchmark for Anthropic.
00:05:59.520 | In fact, speaking of Anthropic, they are somewhat challenged by this release.
00:06:03.680 | GPT-4o costs $5 per 1 million tokens input and $15 per 1 million tokens output.
00:06:09.920 | As a quick aside, it also has a 128k-token context and an October knowledge cutoff.
00:06:15.120 | But remember the pricing, $5 and $15.
00:06:17.680 | Claude 3 Opus is $15 and $75.
00:06:21.360 | And remember, for Claude 3 Opus on the web, you have to sign up with a subscription.
00:06:25.840 | But GPT-4o will be free.
00:06:28.400 | So for Claude 3 Opus to be beaten in its headline benchmark is a concern for them.
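Those per-million-token prices make the cost gap easy to quantify. A small sketch using the prices as quoted (the workload numbers are made up for illustration):

```python
# $ per 1M tokens, as quoted in the announcement: (input, output)
PRICES = {"gpt-4o": (5, 15), "claude-3-opus": (15, 75)}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the quoted per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical workload: 2M tokens in, 500k tokens out.
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 2_000_000, 500_000):.2f}")
# GPT-4o comes out roughly 4x cheaper on this output-heavy workload.
```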
00:06:34.160 | In fact, I think the results are clear enough to say that GPT-4o is the new smartest AI.
00:06:40.560 | However, just before you get carried away and type on Twitter that AGI is here,
00:06:45.520 | there are some more mixed benchmarks.
00:06:47.920 | Take the DROP benchmark.
00:06:49.600 | I dug into this benchmark and it's about adversarial reading comprehension questions.
00:06:54.080 | They're designed to really test the reasoning capabilities of models.
00:06:58.560 | If you give models difficult passages and they've got to sort through references,
00:07:02.400 | do some counting and other operations, how do they fare?
00:07:05.520 | DROP, by the way, stands for Discrete Reasoning Over the content of Paragraphs.
00:07:10.000 | It does slightly better than the original GPT-4, but slightly worse than Llama 3 400B.
00:07:15.680 | And as they note, Llama 3 400B is still training.
00:07:19.200 | So it's just about the new smartest model, by a hair's breadth.
00:07:24.080 | However, we're not done yet.
00:07:25.600 | It's better at translation than Gemini models.
00:07:28.640 | Quick caveat there, Gemini 2 might be announced tomorrow and that could regain the lead.
00:07:33.760 | Then there are the vision understanding evaluations.
00:07:37.360 | It was a real step forward on the MMMU.
00:07:44.080 | As you can see, a clear 10 points better than Claude Opus.
00:07:44.080 | Again, I'm curious if Google Gemini can exceed it though.
00:07:47.520 | The improvements to the tokenizer could be revolutionary for non-English speakers.
00:07:52.880 | The dramatically fewer tokens needed for languages like Gujarati,
00:07:56.960 | Hindi, Arabic, and more don't just mean that conversations are cheaper, they're also quicker.
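The "cheaper and quicker" point follows directly from token counts: API pricing is per token, and the model decodes one token at a time, so both cost and generation time scale roughly linearly with the count. A sketch with illustrative numbers (not OpenAI's exact per-language compression figures, which vary by language):

```python
def tokens_after(old_tokens: int, compression: float) -> float:
    """Token count if the new tokenizer compresses this text by `compression`x."""
    return old_tokens / compression

# Illustrative: a Hindi passage that took 300 tokens with the old tokenizer,
# at a hypothetical 3x compression from the new tokenizer.
old, factor = 300, 3.0
new = tokens_after(old, factor)
# Fewer tokens means proportionally lower cost and faster generation.
print(f"{new:.0f} tokens instead of {old}: ~{factor:.0f}x cheaper and quicker for that text")
```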
00:08:02.960 | And what about multilingual performance?
00:08:04.880 | Well, this time they didn't compare it to other models, but compared it to the original GPT 4.
00:08:09.760 | Definitely a step up across languages, but English is still by far the most suited language.
00:08:16.240 | Indeed, here is a video of some of the model's mistakes
00:08:19.520 | ending with some dodgy language tuition.
00:08:22.160 | I know enough Mandarin to say it wasn't perfect at tutoring.
00:08:26.160 | Let's root, root, root for the home.
00:08:28.960 | Oh no, what was that?
00:08:33.360 | Sorry guys, I got carried away.
00:08:36.880 | Ready for another round?
00:08:42.880 | [Speaking Mandarin]
00:08:46.240 | [Speaking Mandarin]
00:08:49.840 | I have a feeling that went very wrong.
00:08:58.560 | Hello, my name is Nacho, I'm here with my co-worker.
00:09:03.600 | Hi, I'm Shil, I'm trying to teach my co-worker how to speak in Mandarin.
00:09:07.120 | We want to start from simple words like "Nǐ hǎo".
00:09:09.920 | Can you teach him how to pronounce that?
00:09:12.400 | Of course. Hey, Nacho, nice to meet you.
00:09:15.040 | "Nǐ hǎo" is pretty straightforward to pronounce.
00:09:17.760 | It sounds like "Nǐ hǎo".
00:09:20.480 | Just make sure to keep the "Nǐ" part high and then go down and pitch on the "hǎo".
00:09:25.200 | Give it a try.
00:09:25.840 | "Nǐ hǎo"?
00:09:27.360 | That was great.
00:09:30.080 | Really good first try, Nacho.
00:09:31.760 | You're a natural.
00:09:32.480 | It really wasn't, he needs to work on his tones and her face was the giveaway.
00:09:37.360 | There were a lot of other interesting video demonstrations,
00:09:40.560 | but before them, the GPT-4o blog post that Sam Altman put out tonight.
00:09:45.840 | He made the argument that putting this capable AI tool in the hands of everyone
00:09:50.400 | for free justifies the "open" part of OpenAI.
00:09:54.800 | Less about open weights or open source, more about open to everyone,
00:09:59.680 | without ads or anything like that.
00:10:02.080 | He also draws our attention to the latency of the models.
00:10:05.920 | In many ways, that is the key innovation with GPT-4o.
00:10:09.680 | By dialing down the latency, you dial up the realism.
00:10:14.000 | As he said, it feels like AI from the movies,
00:10:16.560 | getting to human level response times and expressiveness turns out to be a big change.
00:10:21.520 | Indeed, I think I should get a little credit for predicting
00:10:24.720 | Her-like AI in a video from a month ago.
00:10:27.920 | But now I want to get to those demos.
00:10:30.240 | The response times were amazing and the model was clearly designed to be flirtatious.
00:10:35.600 | Note that in Senate testimony last year, Sam Altman said,
00:10:39.040 | "We try to design systems that do not maximize for engagement."
00:10:43.280 | So let me know in the comments whether you think these demos
00:10:45.920 | are designed to maximize engagement.
00:10:48.640 | Okay, so this is what I wrote down.
00:10:50.080 | What do you see?
00:10:50.640 | Oh, I see.
00:10:54.320 | I love ChatGPT.
00:10:56.400 | That's so sweet of you.
00:10:57.600 | Yeah, well, I really appreciate all the help.
00:11:01.920 | So yeah, as we can see.
00:11:02.960 | Wow, that's quite the outfit you've got on.
00:11:10.480 | Yeah, as we can see, you know, we can chat in real time.
00:11:12.800 | I was impressed though that it could speed up its talking on demand.
00:11:16.000 | Of course.
00:11:17.760 | One, two, three.
00:11:20.000 | Hey, actually, that's a little slow.
00:11:22.400 | Could you count faster?
00:11:23.600 | Sure thing.
00:11:25.920 | One, two, three, four, five, six, seven, eight, nine, ten.
00:11:27.360 | Okay, that was a little bit too fast.
00:11:29.280 | Of course, OpenAI must be given credit
00:11:31.440 | because for blind people, this could be revolutionary.
00:11:34.640 | ...right now.
00:11:35.440 | The Royal Standard flag is flying above Buckingham Palace,
00:11:39.360 | which is the signal that the monarch is present.
00:11:42.000 | How exciting to be there while the king is home.
00:11:44.960 | Try and tell me exactly what they're doing right now, please.
00:11:48.160 | Right now, the ducks are gently gliding across the water.
00:11:52.320 | It will also help you with interview prep by analyzing your appearance.
00:11:57.440 | How about you?
00:11:58.160 | I'm awesome.
00:11:58.960 | Listen, I got some huge news.
00:12:00.480 | Oh, do tell.
00:12:03.120 | I'm all ears.
00:12:04.320 | Well, in a few minutes, I'm going to be interviewing at OpenAI.
00:12:07.280 | Have you heard of them?
00:12:08.000 | OpenAI?
00:12:10.800 | Sounds vaguely familiar.
00:12:12.080 | Kidding.
00:12:13.840 | Of course.
00:12:14.560 | That's incredible, Rocky.
00:12:15.920 | What kind of interview?
00:12:16.800 | Well, it's for a software engineering role, but I just need to know,
00:12:20.640 | do I look presentable?
00:12:22.160 | Professional?
00:12:22.720 | Well, Rocky, you definitely have the
00:12:27.840 | "I've been coding all night" look down, which could actually work in your favor.
00:12:32.080 | Maybe just run a hand through your hair or lean into the mad genius vibe.
00:12:37.840 | Your enthusiasm is what's really going to shine through.
00:12:40.800 | I don't have a lot of time, so I'm just going to throw this on.
00:12:44.320 | What do you think?
00:12:44.960 | Oh, Rocky, that's quite a statement piece.
00:12:51.520 | I mean, you'll definitely stand out,
00:12:55.200 | though maybe not in the way you're hoping for an interview.
00:12:58.640 | Okay, I got it. I got it.
00:13:00.320 | I could definitely see GPT-4o being used for mathematics tutoring.
00:13:04.800 | Here is a 30-second clip, but notice also the slight glitchiness at certain moments.
00:13:10.400 | Relative to angle alpha.
00:13:12.080 | All right, so I'm pretty sure this is the angle alpha right here, right?
00:13:16.720 | Perfect.
00:13:18.080 | Correct.
00:13:19.360 | Now, looking at the triangle, which side do you think is the hypotenuse?
00:13:24.560 | Um, remember the hypotenuse?
00:13:28.480 | I'm not totally sure.
00:13:29.920 | I think it might be this one, but I really am not sure.
00:13:34.240 | This side AC?
00:13:35.200 | You're close.
00:13:37.600 | I also noticed quite a few glitches in the demo.
00:13:40.720 | Sounds cutting in and out.
00:13:42.800 | The one feature, though, that OpenAI wants you to really focus on is the video-in capacity.
00:13:48.960 | You can live-stream video straight to the transformer architecture behind GPT-4o.
00:13:54.480 | And so the video-in functionality is certainly impressive.
00:13:58.400 | Note, though, that the reaction time of GPT-4o isn't quite as immediate as for audio.
00:14:04.000 | Greg Brockman, the co-founder of OpenAI, has to prompt it to react to the intruder.
00:14:10.080 | Overall stylish feel.
00:14:11.920 | They might be getting ready for a conversation or presentation,
00:14:15.200 | given their direct engagement with the camera.
00:14:17.360 | Sounds like we're in for an intriguing interaction.
00:14:23.680 | I'm curious about the lighting you mentioned.
00:14:26.240 | Could you describe it a bit more?
00:14:27.680 | Is it natural light or are we talking unique fixtures or colors that add to the atmosphere?
00:14:33.920 | The lighting here seems to be a mix of natural and artificial.
00:14:39.920 | There's a noticeable bright light overhead, probably a fixture,
00:14:44.240 | casting a focused beam downwards, which creates a spotlight effect.
00:14:47.920 | This adds a dramatic and modern feel to the scene.
00:14:52.320 | The rest of the room is lit more softly, possibly by natural light filtering...
00:14:56.560 | Did anything unusual happen recently?
00:14:59.600 | Yes, actually.
00:15:02.480 | Just as you asked, another person came into view behind the first person.
00:15:07.280 | They playfully made bunny ears behind the first person's head and then quickly left the frame.
00:15:13.600 | It added a lighthearted and unexpected moment to the scene.
00:15:18.000 | Now the focus is back on the original person with the leather jacket.
00:15:22.720 | Nevertheless, GPT-4o can produce multiple voices that can sing almost in harmony.
00:15:28.320 | And really try to harmonize here.
00:15:31.600 | But maybe make it more dramatic and make the soprano higher.
00:15:41.920 | Great, thank you!
00:15:57.120 | And I suspect this real-time translation could soon be coming to Siri.
00:16:02.640 | Later for us.
00:16:03.680 | So every time I say something in English, can you repeat it back in Spanish?
00:16:07.760 | And every time he says something in Spanish, can you repeat it back in English?
00:16:11.440 | Sure, I can do that.
00:16:13.840 | Let's get this translation train rolling.
00:16:15.920 | Hey, how's it been going?
00:16:19.200 | Have you been up to anything interesting recently?
00:16:23.360 | Hey, I've been good.
00:16:36.960 | Just a bit busy here preparing for an event next week.
00:16:39.840 | Why do I say that?
00:16:40.800 | Because Bloomberg reported two days ago that Apple is nearing a deal with OpenAI
00:16:46.000 | to put ChatGPT on iPhone.
00:16:48.560 | And in case you're wondering about GPT-4.5 or even 5,
00:16:52.560 | Sam Altman said we'll have more stuff to share soon.
00:16:55.600 | And Mira Murati in the official presentation said that they would be
00:16:59.760 | soon updating us on progress on the next big thing.
00:17:04.320 | Whether that's empty hype or real, you can decide.
00:17:07.680 | No word, of course, about OpenAI co-founder Ilya Sutskever,
00:17:11.360 | although he was listed as a contributor under additional leadership.
00:17:16.400 | Overall, I think this model will be massively more popular
00:17:20.160 | even if it isn't massively more intelligent.
00:17:22.880 | You can prompt the model now with text and images in the OpenAI playground.
00:17:27.920 | All the links will be in the description.
00:17:29.680 | Note also that all the demos you saw were in real time at 1x speed.
00:17:34.880 | That I think was a nod to Google's botched demo.
00:17:38.560 | Of course, let's see tomorrow what Google replies with.
00:17:41.520 | To those who think that GPT-4o is a huge stride towards AGI,
00:17:46.400 | I would point them to the somewhat mixed results on the reasoning benchmarks.
00:17:51.040 | Expect GPT-4o to still suffer from a massive amount of hallucinations.
00:17:56.160 | To those, though, who think that GPT-4o will change nothing, I would say this.
00:18:00.880 | Look at what ChatGPT did to the popularity of the underlying GPT series.
00:18:06.240 | It being a free and chatty model brought 100 million people into testing AI.
00:18:06.240 | GPT-4o being the smartest model currently available, and free on the web, and multimodal,
00:18:19.360 | I think could unlock AI for hundreds of millions more people.
00:18:24.960 | But of course, only time will tell.
00:18:27.440 | If you want to analyse the announcement even more,
00:18:30.000 | do join me on the AI Insiders Discord via Patreon.
00:18:33.840 | We have live meetups around the world and professional best practice sharing.
00:18:37.840 | So let me know what you think and as always have a wonderful day.