Unveiling the latest Gemma model advancements: Kathleen Kenealy

My name is Kathleen Kenealy. I'm a research engineer at Google DeepMind. And as was just mentioned, I'm the technical lead of the Gemma team. Before I get started, I just wanted to say how awesome it is to get to be here with you all today. When we were building Gemma, our North Star, the thing we were most excited about, was building something to empower and accelerate the amazing work being done by the open source community. And since we launched our first models in February, I have been absolutely blown away by the incredible projects and research and innovations that have already been built on top of Gemma. So I'm particularly excited to be here with so many developers today, and especially delighted to unveil the latest advancements and additions to the Gemma model family. So without further ado, we'll get started.
As many of you probably know, Google has been a pioneer in publications of AI and ML research for the past decade, including publishing some of the key research that has sparked recent innovations we've seen in AI: research like the Transformer, SentencePiece, and BERT, to name a few. Google DeepMind has really continued this tradition and is actively working to share our research for the world to validate and examine and build upon. But Google's support of the open community for AI and ML is not just limited to publishing research. We've also been doing work to support ML across the entire technical stack for a long time, from hardware breakthroughs like TPUs, which I imagine is especially relevant for this crowd and this track, all the way to an evolution in ML frameworks from TensorFlow to JAX. Throughout all of this, open development has been especially critical for Google. Our ability to collaborate with the open source community has helped us all discover more, innovate faster, and really push the limits of what AI is capable of. So this long history of support of the open source community leads us to today and to Google's latest investment in open models, Gemma.
Gemma is Google DeepMind's family of open source, lightweight, state-of-the-art models, which we build from the same research and technology used to create the Gemini models. I'm so sorry, I think that's my phone going off during this talk. Please feel free to rummage through that bag. Wow, lesson learned that even the speaker needs to remember to silence her cell phone. All right, back to Gemma.
There are a couple of key advantages of the Gemma models that I want to highlight today. The first is that Gemma models were built to be responsible by design. I can tell you from personal experience that from day zero of developing a Gemma model, safety is a top priority. That means we are manually inspecting data sets to make sure that we are not only training on the highest quality data, but also the safest data we can. It means that we are evaluating our models for safety, starting with our earliest experimentation and ablations, so that we are selecting training methodologies that we know will result in a safer model. And at the end of our development, our final models are evaluated against the same rigorous, state-of-the-art safety evaluations that we evaluate Gemini models against. We really do this to make sure that no matter where or how you deploy a Gemma model, you can count on the fact that you will have a trustworthy and responsible AI application. No matter how you've customized Gemma models, you can trust that the result will be a responsible model.

Gemma models also achieve breakthrough performance for models of their scale, including outperforming significantly larger models. But we'll get to more on that very shortly.
We also designed the Gemma models to be highly extensible so that you can use a Gemma model wherever and however you want. This means they're optimized for TPUs and GPUs, as well as for use on your local device. They're supported across many frameworks: TensorFlow, JAX, Keras, PyTorch, Ollama, Transformers, you name it, Gemma is probably there.
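As a quick illustration, here is a minimal sketch of loading a Gemma checkpoint with Hugging Face Transformers; the checkpoint id and generation settings below are assumptions for the example, not specifics from this talk:

```python
# Minimal sketch: load a Gemma checkpoint via Hugging Face Transformers.
# Assumes `transformers`, `torch`, and `accelerate` are installed; swap in
# whichever Gemma variant you prefer -- "google/gemma-7b-it" is one example id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-7b-it"  # assumed instruction-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps memory use manageable
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("Explain open models in one sentence.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```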
And finally, the real power of the Gemma models comes from their open access and open license. Period. That's what's powerful about Gemma. We put state-of-the-art technology into your hands so you can decide what the next wave of innovation looks like.

When we decided to launch the Gemma models, we wanted to make sure that we could meet developers exactly where they are, which is why Gemma models are available anywhere and everywhere you can find an open model. I will not list all of the frameworks on this slide, but this is only a fraction of the places where you can find Gemma models today. This means you can use Gemma how you need it, when you need it, with the tools that you prefer for development.
Since our initial launch back in February, we've added a couple of different variants to the Gemma model family. We, of course, have our initial models, Gemma 1.0, which are our foundational LLMs. We also released, shortly after that, CodeGemma, which are the Gemma 1.0 models fine-tuned for improved performance on code generation and code evaluation. And one variant that I am particularly excited about is RecurrentGemma, which is a novel architecture, a state-space model that's designed for faster and more efficient inference, especially at long contexts. We've also updated all of these models since their initial release. We now have Gemma 1.1, which is better at instruction following and chat. We've updated CodeGemma to have even better code performance. And we now have RecurrentGemma at not only the original 2B size, but also at a 9 billion parameter size.
So there's a lot going on in the Gemma model family, and I'm especially excited to tell you about our two most recent launches. The first one is actually our most highly requested feature since day zero of launch, and that was multimodality. So we launched PaliGemma. PaliGemma -- oh, thank you. I appreciate it. This is why I love the open source community, truly the most passionate developers that there are. PaliGemma is a combination of the SigLIP vision encoder with the Gemma 1.0 text decoder. This combination allows us to do a variety of image-text tasks and capabilities, including question answering, image and video captioning, object detection, and object segmentation.
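To make that concrete, here is a hedged sketch of image captioning with PaliGemma through Transformers; the checkpoint id, task prefix, and file name are assumptions for illustration:

```python
# Hedged sketch: image captioning with PaliGemma via Transformers.
# Assumes a recent `transformers` release with PaliGemma support;
# "example.jpg" is a placeholder for any local image.
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # assumed fine-tuned "mix" checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")
prompt = "caption en"  # PaliGemma is prompted with short task prefixes like this
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=32)

# Strip the prompt tokens before decoding so only the caption is printed.
prompt_len = inputs["input_ids"].shape[-1]
print(processor.decode(generated[0][prompt_len:], skip_special_tokens=True))
```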
The model comes in a couple of different variants. It's currently only available at the 2B size, but we have pre-trained weights available that can be fine-tuned for specific tasks. We have a couple of different fine-tuned variants as well that are already targeted towards things like object detection and object segmentation. And we also have transfer checkpoints, which are models specialized to target a couple of academic benchmarks.
Up until this morning, that was our latest release, but I'm very excited to be here today with you guys because it is Gemma 2 launch day! We have been working very hard on these models since the Gemma 1.0 launch. We tried to do as much as we could to gather feedback from the community, to learn where the 1.0 and 1.1 models fell short and what we could do to make them better, and so we created Gemma 2. Gemma 2 comes in both a 9 billion parameter size and a 27 billion parameter size. Both models are without a doubt the most performant of their size, and both models also outperform models that are even two to three times larger.

But Gemma 2 isn't just powerful. It's designed to easily integrate into the workflows you already have. Gemma 2 uses all of the same tools and all of the same frameworks as Gemma 1, which means if you've already started developing with Gemma 1, you can, with only a couple of lines of code, switch to using the Gemma 2 models and have increased performance and more power behind your applications. We also have the same broad framework compatibility. Again, TensorFlow, JAX, Transformers, Ollama, all of the ones I previously named, we have them for Gemma 2 as well.
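In Transformers, for example, that swap can be as small as changing the checkpoint id; the ids below are assumed Hugging Face names used for illustration:

```python
# Sketch of the "couple of lines of code" swap from Gemma 1 to Gemma 2.
# The same Transformers code path serves both; only the checkpoint id changes.
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_id = "google/gemma-7b-it"   # before: a Gemma 1 checkpoint
model_id = "google/gemma-2-9b-it"   # after: the corresponding Gemma 2 checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```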
We also have significantly improved documentation. We have more guides and more tutorials, so that we can coach you through how to get started not only with inference, but with advanced and efficient fine-tuning from day zero. And finally, we really wanted to target fine-tuning as one of the key capabilities of these models. We did extensive research into how our core modeling decisions impact users' ability to do downstream fine-tuning. So we believe these models are going to be incredibly easy to fine-tune, so you can customize them to whatever your use case may be.
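As one common pattern, not the specific recipe from this talk, parameter-efficient fine-tuning with LoRA adapters via the `peft` library looks roughly like this; the checkpoint id and target modules are assumptions:

```python
# Hedged sketch: parameter-efficient fine-tuning of Gemma 2 with LoRA adapters
# via the `peft` library -- one common approach, not the method from the talk.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b")  # assumed id
lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter layers will train
# From here, train with your usual Trainer or a custom loop on your dataset.
```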
In addition, to make it especially easy to get started using Gemma 2 models, we have made the 27B model available in Google AI Studio. This means you can go to the AI Studio homepage and select Gemma 2 right now, if you wanted to, and start playing around with prompts right away. You shouldn't have to do anything except come up with an idea for how you want to push the limits of our model. I am especially excited to see what you all end up doing with AI Studio and Gemma, and we have a couple of different ways for you to let us know what you're building, which I'll get to down the road. But if you have ideas, I'll be here all day and want to hear what you're doing with the Gemma models.
But let's dive a little bit more into performance. We are incredibly proud of the models that we've made. As I mentioned, they are without a doubt the best, most performant models of their size, and are also competitive with models two to three times larger. Our 27B model has performance in the same ballpark as Llama 3 70B and outperforms Grok models on many benchmarks, by a fairly significant margin in some cases. But academic benchmarks are only part of the way that we evaluate Gemma models. Sometimes these benchmarks are not indicative of how a model will perform once it's in your hands. So we've done extensive human evaluations as well, where we find that the Gemma models are consistently, heavily preferred to other open models, including larger open models. And I'm also proud to say that the Gemma 2 27B model is currently the number one open model of its size. It currently outranks Llama 3 70B, Nemotron 340B, Grok, Claude 3, and many, many other models as well. Thank you. Wow, you guys are very supportive. I appreciate it.
The only other open model of any size that outperforms the Gemma 2 27B model is the Yi-Large model on LMSYS.
So we expect that you should have some fun playing around with this, especially for chat applications. We found in our evaluations that the Gemma 2 models are even better at instruction following. They're even more creative. They're better at factuality, better all around than the Gemma 1.0 and 1.1 models.
The other important thing that I want to make sure to highlight from our most recent launch is the Gemma cookbook. The Gemma cookbook is available on GitHub now and contains 20 different recipes, ranging from easy to very advanced applications, showing how to use the Gemma models. And the thing that I am most excited about is that the Gemma cookbook is currently accepting pull requests. So this is a great opportunity to share with us what you're building with the Gemma models so we can help share it with the rest of the world. And of course, I have to say, we also wouldn't mind if you starred the repository. Go take a look and tell us what you're building with Gemma.
So there are a couple of different ways you can get started with the Gemma 2 models. Of course, I just mentioned the cookbook. You can also apply to get GCP credits to accelerate your research using Gemma 2. We have a lot of funding available to support research, and I would really encourage you to fill out an application regardless of how small or big your project is. We also, as I mentioned, have significantly improved documentation. We have many guides, tutorials, and Colabs across every framework so you can get started doing inference, fine-tuning, and evaluation with Gemma 2 models. You can download them anywhere open models are available. And please chat with us on Discord or other social media channels so we can learn more about what you're building.
And that's about all from me today. I am so excited to see what you all build with Gemma. I have been working on this project for almost two years now, and I started working on this project because I, as a researcher in academia, was disappointed to see how far behind open foundational LLMs were compared to the rapid improvements we were seeing in proprietary models. So this is something that's very near and dear to my heart, and that I wish I had had when I was actively part of the open source community. So I'm very excited to see the projects and the research that you all do with these models. Please engage with us on social media, on GitHub, on Hugging Face, here at the event, and let us know what you think of the models. Let us know what you think we can do better for next time. And thank you all very much. Really appreciate your time.