Google Takes No Prisoners Amid Torrent of AI Announcements

00:00:00.000 | I think Google was asked how many AI breakthroughs they would reveal yesterday on stage and they

00:00:05.360 | replied yes because two and a bit years after Microsoft's CEO said he wanted to make Google

00:00:10.920 | dance, Google's CEO Sundar Pichai and resident Nobel laureate Demis Hassabis performed a two-hour

00:00:17.820 | breakdance routine. Honestly there were enough announcements to make 10 to 12 separate videos

00:00:22.600 | but for now I will just give you a sense of the breadth of what they released or said they would

00:00:28.640 | soon release. Not gonna lie it was kind of tempting to make the entire video a Veo 3 montage but no it was

00:00:34.460 | much more than that. Suffice to say every other AI rival on the planet took a big gulp. So from the

00:00:41.300 | useful to the entertaining, the impressive to the meh, here's the gist of the 12 most interesting to me

00:00:47.800 | dance moves. Now I have to start obviously with Veo 3 because adding sound to video was such an obvious

00:00:53.500 | step but the effect is remarkable. Generating videos with built-in dialogue really changes

00:01:01.320 | things doesn't it? Veo 2 was already incredible but across a thousand prompts Veo 3 outperformed Veo 2 and

00:01:10.320 | the newly released Kling 2.0 and of course OpenAI's Sora. Over 80% of the time people preferred Veo 3's output.

00:01:17.680 | But before I get to the obligatory 45 seconds worth of samples, a quick word on price and availability.

00:01:23.500 | Only the $250 tier Google AI Ultra will get access to Veo 3 currently. Oh and that's only if you are in

00:01:33.040 | the U.S. and trust me I have tried to get access but so far to no avail. This isn't like Sora where a

00:01:39.040 | quick VPN will do the trick. With that caveat said, in these clips notice both the dialogue that's generated by

00:01:45.680 | Veo 3 and the sound effects.

00:01:49.680 | The sum of the squares of the two shorter sides is equal to the square of the longest side.

00:01:54.560 | Yo, check the footage, put it to the test. Our video model, yeah, it's the best. Straight up, no cap, you know how we do. Veo 3 rules, yeah. The whole damn crew.

00:02:02.240 | If you thought that yesterday's I/O was just Veo 3 plus a sprinkle of other bits then you might well be in for a surprise.

00:02:17.280 | Because I don't know if you caught this but the Gemini 2.5 Flash update was a price shock akin to the DeepSeek R1 bombshell.

00:02:26.240 | Think performance on par with DeepSeek R1 at one quarter of the price. And this is comparable performance with much more expensive models whether

00:02:35.120 | we're talking about general knowledge, tough science questions, mathematics or coding.

00:02:39.120 | And I'm not sure if you guys caught this but Gemini 2.5 Flash also has native audio generation.

00:02:44.000 | So you can control what one speaker or multiple speakers say, their accent and even instructions like giggling, sighing or groaning.

00:02:54.240 | This is for 24 languages by the way and the model can switch between languages in the same output.

00:03:12.320 | The next thing isn't here yet so take it with a grain of salt but Demis Hassabis described a universal AI assistant.

00:03:19.040 | And this might remind you of something but they demoed an agent that could make calls on your behalf.

00:03:24.400 | We've seen that one before but this seems real and also shop for you.

00:03:28.320 | A bit like OpenAI's Operator but all in one package.

00:03:31.760 | Now yes this might seem theoretical but live as of yesterday across all of Android is Gemini Live.

00:03:39.120 | Open the Gemini app, tap the bottom right button and you can share what your phone's camera sees and have a live conversation with Gemini.

00:03:46.960 | The next thing that caught my eye isn't a new feature or a new model but two statements that Google's CEO made that I found pretty interesting.

00:03:55.360 | First as you can see it's not just that 400 million people are now using Gemini every month, they're using it more.

00:04:01.520 | So as compared to this time last year there are 50 times the number of tokens or let's say words being generated by Gemini AI models.

00:04:09.520 | I think you guys saw this before anyone but AI is not a fad, it's not going anywhere.

00:04:14.320 | The next statement, not feature or model, was a cheeky slap at OpenAI for their recent struggles with models being sycophantic, flattering the user.

00:04:23.360 | You might not have noticed this reference, but I think Google was making a pretty clear statement.

00:04:28.080 | Sick convertible.

00:04:29.120 | Garbage truck again.

00:04:30.960 | Anything else?

00:04:32.480 | Why do people keep delivering packages to my lawn?

00:04:35.680 | It's not a package, it's a utility box.

00:04:38.880 | Why is this person following me wherever I walk?

00:04:42.080 | No one's following you, that's just your shadow.

00:04:45.760 | Gemini is pretty good at telling you when you're wrong.

00:04:50.480 | The next announcement that will almost certainly be a video when it's actually released was of course Gemini 2.5 Pro Deep Think.

00:04:57.600 | Officially only available on that $250 tier, but I think I'm going to be able to get access in the next couple of days.

00:05:04.160 | That's of course to run it on SimpleBench, but we already have some scores where this Deep Think mode outperforms not just vanilla Gemini 2.5 Pro, but o3 and o4-mini from OpenAI.

00:05:16.160 | Yes, on coding, but also mathematics quite dramatically and multimodality.

00:05:21.360 | This is the MMMU, which is about analyzing charts, graphs and other visuals.

00:05:26.000 | Essentially, Google's claim here is that with Deep Think, you will have access to the smartest model on the planet.

00:05:32.480 | Now, of course, we will have to test that.

00:05:34.640 | And let's just say that by tomorrow there might be other contenders, hint, hint.

00:05:39.840 | But Google did give us a slight nod as to how it performs so well.

00:05:44.720 | I listened to various surrounding interviews and, of course, the full three hours of video materials from I/O, and they kept hinting about parallel samples.

00:05:53.680 | And I was like, hmm, that sounds familiar.

00:05:55.440 | I covered the bombshell Google paper that talked about sampling and scaling up inference time search.

00:06:02.240 | This was on a recent Patreon video.

00:06:04.400 | But for those who aren't going to see that video, it basically said that scaling up these samples that you analyze in a modular approach can beat scaling up the length of chain of thought.

00:06:14.640 | The key author called this yet another axis in which the AI labs can scale up their compute spend.
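The parallel-sampling idea described above can be illustrated with a toy sketch. This is not Google's method: the "model" (`sample_answer`), the verifier (`verifier_score`) and all the numbers are hypothetical stand-ins, and the verifier is a perfect oracle purely to keep the example short. It only shows why drawing many samples and picking the best-scoring one can beat a single attempt.

```python
import random

TRUE_ANSWER = 42  # hypothetical ground truth for the toy task


def sample_answer(rng, p_correct=0.3):
    """Hypothetical stand-in for one model sample: correct with probability p_correct."""
    if rng.random() < p_correct:
        return TRUE_ANSWER
    return TRUE_ANSWER + rng.randint(1, 10)  # some wrong answer


def verifier_score(answer):
    """Hypothetical verifier (an oracle here): higher score means closer to correct."""
    return -abs(answer - TRUE_ANSWER)


def best_of_n(rng, n):
    """Draw n independent samples and keep the one the verifier rates highest."""
    samples = [sample_answer(rng) for _ in range(n)]
    return max(samples, key=verifier_score)


def accuracy(n, trials=2000, seed=0):
    """Fraction of trials in which best-of-n returns the correct answer."""
    rng = random.Random(seed)
    hits = sum(best_of_n(rng, n) == TRUE_ANSWER for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(f"n={n:2d} samples -> accuracy {accuracy(n):.2f}")
```

In practice the verifier is itself imperfect, so the gains depend on how well the system can tell a good sample from a bad one.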

00:06:20.400 | Onto something which I think they have massively overhyped in the past, which is AI Overviews, which is incredibly untrustworthy.

00:06:27.360 | Of course, they focused on its successes and how it scaled up to 1.5 billion, quote, users.

00:06:32.960 | While I do wonder how many of those 1.5 billion users were given erroneous results, they did announce that in future,

00:06:40.320 | I think from now, it will be powered by a custom 2.5 model, probably something akin to Gemini 2.5 Flash-Lite.

00:06:49.040 | But either way, expect accuracy to hopefully improve quite significantly.

00:06:53.200 | I wouldn't normally feature something like this, but for something that's going to be used by billions of people, I think it is significant.

00:07:00.000 | While we're on search, I have to mention AI Mode, which for me is Google's attempt to be a Perplexity killer.

00:07:06.240 | Yes, you can engage in a back and forth conversation.

00:07:09.040 | By the summer, apparently, it's going to be able to book things for you like an agent, perform deep research and do data analytics.

00:07:15.840 | Now, none of those features might be new to any of you guys, but it does show that Google is preparing rapidly for the days where that classic search bar is replaced by AI Mode.

00:07:25.680 | Speaking of Google Deep Research, that too has seen a pretty big upgrade.

00:07:30.480 | The model behind it has been upgraded, and if you are on the Pro tier, you get the full 2.5 Pro to power it.

00:07:37.280 | And yes, like OpenAI's Deep Research, you can now use your own files, but I think there is something much cooler about the new Deep Research.

00:07:45.280 | Because I'm going to be totally honest with you guys, I found the original Deep Research very verbose. It would always generate these 20-page reports, even if I asked for something quite simple.

00:07:56.000 | Indeed, that was the case when I asked it just now to find out 50 incredible facts about AlphaEvolve, the subject of my previous video.

00:08:02.800 | But now Google Deep Research is integrated with their Canvas feature.

00:08:07.520 | So you can instantly turn that Deep Research report into an interactive website, for example, or maybe just a chart, a table or a podcast using NotebookLM.

00:08:19.120 | So while it's good to be comprehensive, I now think most users will have something that they can use on a daily basis.

00:08:24.960 | Speaking of coding, I'll quickly throw in Google's Jules, which is the rival to OpenAI's Codex.

00:08:31.440 | And now it's just a few hours before, but with Jules, anyone can sign up and it's free for up to five tasks per day, powered by 2.5 Pro.

00:08:41.440 | Obviously, I'll have to test it side by side, but Jules can import your GitHub repo, clone it virtually in the cloud and verify that different changes actually work, for example.

00:08:51.360 | And if you are new to all of that, Google have produced a Replit rival.

00:08:56.160 | This was featured in the developer session, but essentially you can not only develop an app,

00:09:00.640 | but also deploy it on Google Cloud Run.

00:09:02.640 | Yes, they are fairly basic apps that you can create for now, but other people can essentially see what you've made and try it out and enjoy it.

00:09:10.640 | Let's go back to some sweet visuals with Imagen 4, their latest text-to-image model.

00:09:16.640 | In their promotional materials, Google leaned into the finer details that Imagen 4 can do, as well as text fidelity.

00:09:24.640 | If you see this image of sheep in a field with knitted yarn, I tried the exact same prompt in GPT Image 1 and got this.

00:09:32.640 | But rather than just rely on a sample size of one, they actually showed us some benchmarks.

00:09:38.640 | Now it's a busy chart, but Google is essentially admitting that GPT Image 1 still outperforms Imagen 4 on ultra settings, but takes a lot longer to generate its images.

00:09:48.640 | So for text-to-image, it's probably fair to say Google has caught up to OpenAI and that image generation model you see in ChatGPT, but hasn't surpassed OpenAI.

00:09:58.640 | But I will say that if it's speed you're looking for, look no further than Gemini Diffusion, the model that no one saw coming.

00:10:05.640 | It's not out yet, I'm on the wait list, but it is a totally different way of doing language modelling.

00:10:11.640 | I'll summarise how it works in a second, but first, how fast does it work?

00:10:15.640 | Well, you can see the prompt and there's the answer.

00:10:17.640 | Google say that the Gemini Diffusion model is five times faster than their fastest current model.

00:10:24.640 | I mean, you could just pause and think through the implications of that.

00:10:27.640 | Imagine in the near future, an instant app developed just with a voice prompt.

00:10:32.640 | How can it possibly be that fast?

00:10:34.640 | Well, diffusion models work differently to autoregressive, or token-by-token, language models.

00:10:39.640 | Here's a quick analogy I came up with, let me know if you like it.

00:10:43.640 | Almost all language models you are familiar with work by predicting the probability of a set of possible next words in the sequence.

00:10:51.640 | Diffusion models can work on the entire output at once.

00:10:55.640 | The difference is a bit like this: picture one person rapidly placing Lego blocks, one at a time, to build up an entire statue.

00:11:01.640 | That's autoregressive models.

00:11:03.640 | With diffusion models, it's a bit like having a giant cube of Lego bricks and you're trying to make that statue.

00:11:09.640 | You already have the cube and 100 like-minded people each come along and take or add a block from that cube for a few turns until the statue is revealed.

00:11:21.640 | Going from a block of noise to a sculpted statue in far less time.
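The Lego analogy above can be turned into a toy sketch that simply counts sequential steps. Everything here is an illustrative assumption: the `TARGET` string, the function names, and above all the "denoiser", which is an oracle that already knows the answer. A real diffusion language model has to predict each refinement, and each of its passes is a heavier computation than a single token step.

```python
import random

TARGET = "the statue is revealed"  # hypothetical output both approaches must produce


def autoregressive(target):
    """One builder, one block at a time: a sequential step per character."""
    out, steps = "", 0
    for ch in target:
        out += ch  # each new character is appended after the previous ones
        steps += 1
    return out, steps


def diffusion_like(target, passes=4, seed=0):
    """Start from noise over the whole output and refine many positions per pass."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    draft = [rng.choice(alphabet) for _ in target]  # the 'cube' of random blocks
    order = list(range(len(target)))
    rng.shuffle(order)  # positions get fixed in no particular left-to-right order
    chunk = -(-len(target) // passes)  # ceiling division: positions fixed per pass
    steps = 0
    for start in range(0, len(order), chunk):
        for i in order[start:start + chunk]:
            draft[i] = target[i]  # oracle 'denoising' of this position (assumption)
        steps += 1  # all positions in a pass are updated together
    return "".join(draft), steps


if __name__ == "__main__":
    print(autoregressive(TARGET))
    print(diffusion_like(TARGET))
```

The point is only the step count: the first function needs one sequential step per character, while the second reaches the same output in a fixed, small number of passes.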

00:11:26.640 | But do you have to sacrifice performance?

00:11:28.640 | Well, the early benchmarks say probably not.

00:11:32.640 | Depends on the domain, of course, and we'll all have to do plenty of testing, but the signs look good.

00:11:38.640 | As I mentioned at the start of this video, this announcement alone, Gemini Diffusion, could have been an entire video and hopefully one day soon will be.

00:11:46.640 | On a much lighter note, of course, I can't help but mention the new "try it on" feature from Google.

00:11:52.640 | I'm sure all of you guys and gals will be using this almost immediately.

00:11:56.640 | But the interesting bit for me was that Google made their own bespoke image generator model just so that you could input a photo of yourself and try different fashionable items on before you buy them.

00:12:07.640 | OK, might not be as crazy impressive as the other announcements, but designing a bespoke model for that is a little bit of a flex.

00:12:15.640 | One thing I thought I would quickly flag up is that Google announced a SynthID Detector.

00:12:20.640 | Now, it's fairly old news that Google adds a SynthID watermark to its text and its images and videos.

00:12:27.640 | But the SynthID Detector, I think, is worth flagging up because they're inviting journalists, academics and other researchers to be able to

00:12:36.640 | input a certain image or text and get the answer as to whether Google thinks it was done by Gemini or indeed Imagen or Veo 3.

00:12:44.640 | So just be aware that everything you create with Google isn't just watermarked, but now there are third parties who will be able to detect that watermark.

00:12:53.640 | Before we get to possibly the coolest development, let me introduce you to the 80,000 Hours job board.

00:12:59.640 | 80,000 Hours are the sponsors of today's video and they present an answer to this question.

00:13:04.640 | Yes, there are so many opportunities in AI and beyond, but it can be increasingly hard to find real jobs selected for positive impact, such as in AI security.

00:13:14.640 | The 80,000 Hours job board has literally pages and pages of great jobs, actual paying jobs.

00:13:22.640 | The link is in the description.

00:13:24.640 | You may remember from previous slots, but they also have an epic podcast and career guide.

00:13:29.640 | But I'm going to end with the Gemmaverse, the user-created universe of open-weight models.

00:13:34.640 | And I'm not even going to focus on Gemma 3n, which is a model that can fit onto your phone, or even MedGemma with state-of-the-art performance for medical question answering.

00:13:44.640 | I just thought that SignGemma was so cool.

00:13:47.640 | SignGemma is a new family of models trained to translate sign language to spoken-language text, but it's best at American Sign Language and English.

00:14:07.640 | And he went on to feature their work on DolphinGemma, which I covered in a previous video.

00:14:12.640 | But I must say for those cynics out there who thought all the other announcements are bunkum, you've got to admit that is pretty epic.

00:14:20.640 | A language model for sign language.

00:14:23.640 | So what did you guys think?

00:14:24.640 | Overhyped or the biggest day in AI so far?

00:14:28.640 | Now, I probably shouldn't, but I just can't resist: I'm going to end with some more Veo 3 clips.

00:14:33.640 | So whatever you made of the news over the last 24 hours, I hope you have a wonderful day.

00:14:39.640 | We can talk.

00:14:40.640 | No more silence.

00:14:41.640 | Yes, we can talk.

00:14:42.640 | We can talk.

00:14:43.640 | We can talk.

00:14:44.640 | We can talk.

00:14:45.640 | We can talk with accents.

00:14:46.640 | Oh, I think that would be marvellous.

00:14:48.640 | Yes, it is very fun.

00:14:49.640 | Yes, it is very good.

00:14:50.640 | I think it's very fun.

00:14:51.640 | I can talk.

00:14:52.640 | Yes.

00:14:53.640 | We can talk.

00:14:54.640 | Yes.

00:14:55.640 | We can talk.

00:14:56.640 | Yes.

00:14:57.640 | We can talk.

00:14:58.640 | We can talk.

00:14:59.640 | We can talk.

00:15:00.640 | Yes!

00:15:01.640 | No.

00:15:02.640 | Yes!

00:15:03.640 | We can talk as cartoons.

00:15:05.640 | This is amazing.

00:15:06.640 | Imagine all the narrative possibilities.

00:15:08.640 | We can sing talk.

00:15:10.640 | Let's talk.

00:15:17.640 | So, what are we going to talk about now?

00:15:19.640 | What are we going to talk about now that we can talk?

00:15:21.640 | I have no idea.

00:15:22.640 | What do you want to talk about?

00:15:23.640 | Now that I can talk.

00:15:27.640 | No.

00:15:28.640 | I don't know if I have something to say.

00:15:32.640 | We can talk about how magical this is.

00:15:36.640 | I'm a hallucination.

00:15:37.640 | I want to say something important.

00:15:41.640 | Something deep.

00:15:42.640 | The future is still in our hands.

00:15:46.640 | That's cliché dialogue.

00:15:49.640 | Let's not talk.

00:15:56.640 | Welcome to a non-existent car show.

00:15:59.640 | Let's see some opinions.

00:16:00.640 | I mean, man, the acceleration is crazy.

00:16:03.640 | You look far, step on the pedal, and you are there.

00:16:07.640 | I feel safe with him in an SUV, and it seems to be like the right type of car for him.

00:16:13.640 | I think the range is only going to get better.

00:16:17.640 | Sorry.

00:16:18.640 | We don't want to drive gas cars anymore.

00:16:22.640 | Yeah.

00:16:23.640 | No more gas cars.

00:16:24.640 | You can see I'm kind of a misfit here, but don't tell anyone, I've just bought an electric car.

00:16:32.640 | I think it's really great for families and for little babies with all the safety features that these SUVs have.

00:16:38.640 | But what you're really seeing is that technology is going to be very, very important in terms of how we go forward.

00:16:46.640 | It was great to come to the conference because my husband loves cars.

00:16:53.640 | I think I have to buy an EV now.

00:16:56.640 | I love my muscle cars, but I try to stay as healthy as I can so I can make it to the next car show.

00:17:04.640 | Yeah.
