A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind

…across the model side, across the Gemini app side, and also, of course, across the developer platform.
New Gemini model: this is hopefully the final update about the changes, and I think my slide has an animation. A bunch of increases across benchmarks people care about. It's SOTA on Aider and SOTA on HLE, among other benchmarks. I think it closes the gap on a bunch of the stuff that folks gave us feedback on from the previous versions of the model, so hopefully it has great performance across the board.
It also, I think, is setting the stage for the future of Gemini. I think 2.5 Pro, for us internally and in the perception of the developer ecosystem, was the turning point, which was super exciting. We've got a bunch of other great models coming as well. Send us feedback if things don't work, and we'll continue to push the rock up the hill. You can go to ai.dev if you want to try it out. It's also available in the Gemini app and all that other stuff. And if you need anything, email us, and we'll make it happen.
I don't know if folks tuned in to Google I/O, but Sundar showed this slide on stage, which was a great reminder for me of just how much has happened: it feels like 10 years of Gemini stuff packed into the last 12 months. And it's actually interesting, just to opine on one of the points, to see all of these different research bets across DeepMind coming together to build this incredible mainline Gemini model. I have this conversation with people all the time: what's the DeepMind strategy? What's the advantage for us building models, all that good stuff?
And I think the interesting thing to me is just this breadth of research happening across science and Gemini and all these other areas, and all of it actually ends up upstreaming into the mainline models. So you see AlphaProof and AlphaGeometry and a bunch of the stuff we did with custom models in those areas actually improving the performance of our models for those domains. Jack will talk about that in a little bit. The other thing is not just the pace of innovation but the scale of it: a 50x increase in the amount of AI inference being processed through Google servers from one year ago to last month. It's just remarkable to see that increase in demand for Gemini models.
The other question, which gets talked about a little bit: I think one of the critical pieces, and it's, you know, not super fun, but worth thinking about for folks building companies here, is an organizational thing, truthfully. Google historically had lots of different teams doing lots of different AI research. And in 2023, Google brought a bunch of those teams together and charted a new direction for the DeepMind team: not only to do theoretical foundational research, but also to build models and deliver them to the rest of Google and the external world. Then we took the second step of that journey earlier this year, which was bringing the product teams into DeepMind. So now DeepMind creates the models, does the research, but also builds products and delivers them to the world. We have the Gemini app, which is our consumer product, and then the developer side of that with the Gemini API. And this has been, personally for me, super fun: getting to collaborate with our research team, actually being on the frontier with them, and bringing new models and capabilities to the world.
This is a collaboration that works incredibly well. The most fun part is that there's so much innovation happening inside of Google, and it's incredible to get to bring that to the world and to developers. And I think we're actually very early in that journey, as we'll see in a couple of minutes.
I don't know if folks have played around with Veo or not, but it's been incredible to see the reception to it. It's burning all the TPUs down; there's been lots of demand and lots of interest on the Veo front. So hopefully folks get a chance to play around with it.
The Gemini app piece is interesting just because people talk about it a lot; it's a fun product and it's cool to think about. And for folks building stuff, I think it's interesting to hear what our strategy is from the app perspective. The Gemini app is trying to be a universal assistant. What that means in practice: I'm sure people don't think about this all the time, but I think a lot about what Google's products do and how we show up in the world. One interesting observation I had was to ask what the thing was that historically carried individuals through all of Google's products. The thing that comes to mind is your Google account, which wasn't super stateful: you would sign into lots of different Google products with it, but it didn't really do anything other than get you signed into that individual product. Now we're seeing with Gemini that it's actually a thread that unifies all of Google. And I think the future for Google is going to look a lot like that: Gemini as the thread that brings all of our stuff together, which is really interesting.
And then, hitting on the trends, which I'm sure folks are also excited about building on, the one I'm most excited about is proactivity. Most AI products today still require you, the user, to go and do all the work. This proactive next step, where AI systems and models come into play on your behalf, is going to be awesome to see.
If you have complaints, please do not tag me on Twitter; please tag Josh. He is the person who can make stuff happen on the Gemini app side, not me.
From a model perspective, again, there's so much. When Gemini was originally created, it was built to be a single multimodal model that could do audio, image, video, et cetera. We've made a lot of progress on that: at I/O this year, we announced native audio capabilities in Gemini. So I think we're going to get to that omnimodal model, which is awesome. We have Veo, which is SOTA across a bunch of stuff, so hopefully we'll get video into the mainline Gemini model. Folks may have seen some of our early experiments with diffusion, which means you can get crazy levels of tokens per second. That's definitely a research exploration area; it's not mainline yet.
The agentic-by-default thread is something I've been thinking a lot about recently. Historically, as a developer, I've thought about models as just a thing that takes tokens in and gives tokens out, and there was lots of scaffolding in the ecosystem to let me build with those models. It's becoming very clear to me that the models themselves are becoming more systematic; they're doing more and more. And I think the reasoning step is this really interesting place where a lot of that's going to happen. Jack's going to talk about the scaling up of reasoning. But I do think it'll be interesting to see how much of the scaffolding work from the past ends up just being part of that reasoning step, and what that means for people who are building products.
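The "scaffolding" being described here is, at its simplest, an external loop that parses tool calls out of model output and feeds results back in. Here's a minimal sketch of that pattern; the model is a hypothetical stub (no real API is called), and the tool and message shapes are illustrative, not the Gemini API's:

```python
# A minimal sketch of the classic agent "scaffolding" pattern: an
# external loop around a tokens-in/tokens-out model. The stub below
# stands in for a real chat model; agentic-by-default models
# increasingly fold this loop into their own reasoning step.

def stub_model(messages):
    """Hypothetical stand-in for a chat model: emits one tool call,
    then a final answer that uses the tool result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"The sum is {result}."}

TOOLS = {"add": lambda a, b: a + b}  # the tools the loop can dispatch to

def agent_loop(user_msg, max_steps=5):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = stub_model(messages)
        if "answer" in reply:           # model produced a final answer
            return reply["answer"]
        fn = TOOLS[reply["tool"]]       # otherwise, execute the tool call
        result = fn(**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")

print(agent_loop("What is 2 + 3?"))  # The sum is 5.
```

The point of the passage is that the dispatch-and-feed-back logic in `agent_loop` is exactly the kind of work that may migrate into the model's reasoning step itself.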
We'll also have more small models soon, which I'm excited about, and big models. And the last one is continuing to push the frontier on infinite context. I think the current model paradigm doesn't work for infinite context; it's just impossible to scale up. So I think there will be some new innovations to help people continue to scale up the amount of context they're bringing in.
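One way to see why the current paradigm strains at very long context: standard self-attention compares every token with every other token, so compute grows quadratically with context length. The sketch below uses illustrative dimensions (not real Gemini numbers) to show how fast that blows up:

```python
# Rough sketch of why "infinite context" breaks quadratic attention:
# per layer, the attention-score computation Q @ K^T costs on the
# order of 2 * n^2 * d multiply-adds for n tokens of width d.
# d_model and n_layers below are illustrative, not real model numbers.

def attention_cost(n_tokens, d_model=4096, n_layers=48):
    """Approximate FLOPs for the attention scores alone."""
    return 2 * n_tokens**2 * d_model * n_layers

for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} tokens: {attention_cost(n):.2e} FLOPs")

# Growing context 10,000x (10K to 100M tokens) multiplies the
# attention cost by 100,000,000x, which is why new approaches
# (not just bigger hardware) are needed to keep scaling context.
```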
And Tulsi is the person who drives all of our model stuff. So if you want to talk about Gemini models, or you have ideas about things that don't work well, she's the person running the show on the Gemini model product side.
So we have lots of things coming, which I'm excited about. I'll highlight maybe three that I think people are super excited about. Embeddings: it feels like early AI stuff, but I think it's still super important. Embeddings power most people's applications using RAG. We have a Gemini embeddings model, which is state of the art, so I'm excited to be rolling that out to developers more broadly in the next couple of weeks.
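For context on the "embeddings power RAG" point, here is a minimal sketch of the retrieval step: embed the corpus and the query, then return the passages whose vectors are most similar to the query's. The tiny hand-written vectors stand in for the output of an embeddings model such as the one mentioned above (whose actual API and dimensions are not shown here):

```python
# Minimal sketch of embedding-based retrieval for RAG. The pretend
# 3-dimensional vectors below stand in for real embedding-model
# output; a real system would call an embeddings API instead.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# (passage, pretend-embedding) pairs making up a toy corpus.
corpus = [
    ("Gemini supports long context windows.", [0.9, 0.1, 0.0]),
    ("Veo generates video from text prompts.", [0.1, 0.9, 0.1]),
    ("Embeddings map text to vectors.",        [0.2, 0.1, 0.9]),
]

def retrieve(query_vec, k=1):
    """Return the k passages most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda p: cosine(query_vec, p[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query whose pretend embedding points "toward vectors" retrieves
# the matching passage, which would then be stuffed into the prompt.
print(retrieve([0.1, 0.0, 1.0]))  # ['Embeddings map text to vectors.']
```

The retrieved passages are what get prepended to the model prompt in a RAG application; better embeddings mean better retrieval, which is why a state-of-the-art embeddings model matters.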
The deep research API I'm super interestedted in. There are so many interesting products built around these sorts of research tasks, so we're finding ways to bring a bunch of that together into a bespoke deep research API, which will be awesome. And then Veo 3 and Imagen 4 in the API as well; hopefully we'll see that very, very soon.
And as we work to scale and make that possible from a developer platform side, I'll make one other quick comment, which is about the AI Studio product positioning, which I also think is interesting. AI Studio, just to be very clear, is being built as a developer platform. So we'll move away from this kind of consumer-y feel and much more towards being a developer platform, which I'm personally very excited about, because I think that's what developers want from us. It'll be awesome to see that come to life with many new iterations of our developer experience, with agents built in, and hopefully things like Jules and some of our developer coding agents natively in that experience.
I appreciate all the people who send lots of great feedback about Gemini stuff. We'll keep pushing the rock up the hill, and I'll be around. So if you have more feedback, come find me, and we'll keep making Gemini great for everyone.