
Unveiling the latest Gemma model advancements: Kathleen Kenealy


Transcript

My name is Kathleen Kenealy. I'm a research engineer at Google DeepMind. And as was just mentioned, I'm the technical lead of the Gemma team. Before I get started, I just wanted to say how awesome it is to get to be here with you all today. When we were building Gemma, our North Star, the thing we were most excited about, was building something to empower and accelerate the amazing work being done by the open source community.

And since we launched our first models in February, I have been absolutely blown away by the incredible projects and research and innovations that have already been built on top of Gemma. So I'm particularly excited to be here with so many developers today and especially delighted to unveil the latest advancements and additions to the Gemma model family.

So without further ado, we'll get started. As many of you probably know, Google has been a pioneer in publishing AI and ML research for the past decade, including publishing some of the key research that has sparked the recent innovations we've seen in AI. Research like the Transformer, SentencePiece, BERT, to name a few.

Google DeepMind has really continued this tradition and is actively working to share our research for the world to validate and examine and build upon. But Google's support of the open community for AI and ML is not just limited to publishing research. We've also been doing work to support ML across the entire technical stack for a long time, from hardware breakthroughs like TPUs, which I imagine is especially relevant for this crowd and this track, all the way to an evolution in ML frameworks, from TensorFlow to JAX.

Throughout all of this, open development has been especially critical for Google. Our ability to collaborate with the open source community has helped us all discover more, innovate faster, and really push the limits of what AI is capable of. So this long history of support of the open source community leads us to today and to Google's latest investment in open models, Gemma.

Gemma is Google DeepMind's family of open source, lightweight, state-of-the-art models, which we build from the same research and technology used to create the Gemini models. I'm so sorry, I think that's my phone going off during this talk. Please feel free to rummage through that bag. Wow, lesson learned that even the speaker needs to remember to silence her cell phone.

All right, back to Gemma. There are a couple of key advantages of the Gemma models that I want to highlight today. The first is that Gemma models were built to be responsible by design. I can tell you from personal experience that from day zero of developing a Gemma model, safety is a top priority.

That means we are manually inspecting data sets to make sure that we are not only training on the highest quality data, but also the safest data we can. This means that we are evaluating our models for safety, starting with our earliest experimentation and ablations, so that we are selecting training methodologies that we know will result in a safer model.

And at the end of our development, our final models are evaluated against the same rigorous, state-of-the-art safety evaluations that we run on Gemini models. And we really do this to make sure that no matter where or how you deploy a Gemma model, you can count on the fact that you will have a trustworthy and responsible AI application.

No matter how you've customized a Gemma model, you can trust that it will be a responsible model. Gemma models also achieve unparalleled, breakthrough performance for models of their scale, including outperforming significantly larger models. But we'll get to more on that very shortly. We also designed the Gemma models to be highly extensible, so that you can use a Gemma model wherever and however you want.

This means they're optimized for TPUs and GPUs, as well as for use on your local device. They're supported across many frameworks, TensorFlow, JAX, Keras, PyTorch, Ollama, Transformers, you name it, Gemma is probably there. And finally, the real power of the Gemma models comes from their open access and open license.
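To make that concrete, here is a minimal sketch of loading a Gemma checkpoint through Hugging Face Transformers. The model ID, prompt, and generation settings are illustrative only, and this assumes you have the transformers and accelerate packages installed and have accepted the Gemma terms on the Hub.

```python
# Minimal sketch: running a Gemma checkpoint with Hugging Face Transformers.
# The model ID and prompt below are examples; any Gemma variant on the Hub
# can be loaded the same way.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"  # instruction-tuned 2B checkpoint (example)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Write a haiku about open models.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```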

That, period, is what's powerful about Gemma. We put state-of-the-art technology into your hands so you can decide what the next wave of innovation looks like. When we decided to launch the Gemma models, we wanted to make sure that we could meet developers exactly where they are, which is why Gemma models are available anywhere and everywhere you can find an open model.

I will not list all of the frameworks on this slide, but this is only a fraction of the places where you can find Gemma models today. This means you can use Gemma how you need it, when you need it, with the tools that you prefer for development. Since our initial launch back in February, we've added a couple of different variants to the Gemma model family.

We, of course, have our initial models, Gemma 1.0, which are our foundational LLMs. We also released, shortly after that, CodeGemma, which is the Gemma 1.0 models fine-tuned for improved performance on code generation and code completion. And one variant that I am particularly excited about is RecurrentGemma, which is a novel architecture, a state-space model that's designed for faster and more efficient inference, especially at long contexts.

We've also updated all of these models since their initial release. We now have Gemma 1.1, which is better at instruction following and chat. We've updated CodeGemma to have even better code performance. And we now have RecurrentGemma at not only the original 2B size, but also at a 9 billion parameter size.

So there's a lot going on in the Gemma model family, and I'm especially excited to tell you about our two most recent launches. The first one is actually our most highly requested feature since day zero of launch, and that was multimodality. So we launched PaliGemma. PaliGemma -- oh, thank you.

I appreciate it. This is why I love the open source community, truly the most passionate developers that there are. PaliGemma combines the SigLIP vision encoder with the Gemma 1.0 text decoder. This combination allows us to do a variety of image-text tasks and capabilities, including question answering, image and video captioning, object detection, and object segmentation.
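As a rough illustration of what that looks like in practice, the sketch below runs an image-captioning prompt through PaliGemma with Hugging Face Transformers. The checkpoint ID, image path, and prompt are placeholders, and this assumes a recent transformers release with PaliGemma support.

```python
# Rough sketch: image captioning with a PaliGemma checkpoint via Transformers.
# The checkpoint ID and image path are placeholders.
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image

model_id = "google/paligemma-3b-mix-224"  # example fine-tuned "mix" checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

image = Image.open("example.jpg")   # any local image
prompt = "caption en"               # PaliGemma uses short task-style prompts
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=30)
# Strip the prompt tokens before decoding so only the caption is printed.
caption = processor.decode(
    generated[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```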

The model comes in a couple of different variants. It's currently only available at the 2B size, but we have pre-trained weights available that can be fine-tuned for specific tasks. We also have a couple of different fine-tuned variants that are already targeted towards things like object detection and object segmentation.

And we also have transfer checkpoints, which are models specialized to target a couple of academic benchmarks. Up until this morning, that was our latest release, but I'm very excited to be here today with you guys, because it is Gemma 2 launch day! Woo-hoo! Wow, thanks. We have been working very hard on these models since the Gemma 1.0 launch.

We tried to do as much as we could to gather feedback from the community to learn where the 1.0 and 1.1 models fell short and what we could do to make them better, and so we created Gemma 2. Gemma 2 comes in both a 9 billion parameter size and a 27 billion parameter size.

Both models are without a doubt the most performant of their size, and both also outperform models that are two to three times larger. But Gemma 2 isn't just powerful. It's designed to easily integrate into the workflows you already have. So Gemma 2 uses all of the same tools, all of the same frameworks as Gemma 1, which means if you've already started developing with Gemma 1, you can, with only a couple of lines of code, switch to the Gemma 2 models and get increased performance and more power behind your applications.
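The sketch below shows roughly what that switch looks like with Transformers: the surrounding loading and generation code stays the same, and only the checkpoint ID moves from a Gemma 1 model to a Gemma 2 one. Both IDs here are just examples.

```python
# Sketch of the "couple of lines" upgrade: keep the same loading code,
# and point it at a Gemma 2 checkpoint instead of a Gemma 1 checkpoint.
from transformers import AutoTokenizer, AutoModelForCausalLM

# model_id = "google/gemma-7b-it"   # previous Gemma 1 checkpoint (example)
model_id = "google/gemma-2-9b-it"   # Gemma 2 replacement (example)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```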

We also have the same broad framework compatibility. Again, TensorFlow, JAX, Transformers, Ollama, all of the ones I previously named, we have them for Gemma 2 as well. We also have significantly improved documentation. We have more guides, more tutorials, so that we can coach you through how to get started not only with inference, but with advanced and efficient fine-tuning from day zero.

And finally, we really wanted to target fine-tuning as one of the key capabilities of these models. We did extensive research into how our core modeling decisions impact users' ability to do downstream fine-tuning. So we believe these models are going to be incredibly easy to fine-tune, so you can customize them to whatever your use case may be.
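One common way to do that kind of efficient fine-tuning, shown as a minimal sketch below, is to attach LoRA adapters with the peft library. This is just one reasonable setup, not the Gemma team's own recipe, and the checkpoint ID, target modules, and hyperparameters are placeholders.

```python
# Minimal sketch of parameter-efficient fine-tuning setup with LoRA via the
# peft library (one reasonable approach, not the official Gemma recipe).
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-9b-it"  # illustrative checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Attach low-rank adapters to the attention projections; only these small
# matrices are trained, which keeps memory and compute requirements modest.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights

# From here, the wrapped model can be passed to any standard training loop
# or to the Hugging Face Trainer with your own dataset.
```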

In addition, to make it especially easy to get started using Gemma 2 models, we have made the 27B model available in Google AI Studio. This means you can go to the AI Studio homepage and select Gemma 2 right now, if you wanted to, and start playing around with prompts right away.

You shouldn't have to do anything except come up with an idea for how you want to push the limits of our model. I am especially excited to see what you all end up doing with AI Studio and Gemma, and we have a couple of different ways for you to let us know what you're building, which I'll get to down the road.

But if you have ideas, I'll be here all day and want to hear what you're doing with the Gemma models. But let's dive a little bit more into performance. We are incredibly proud of the models that we've made. As I mentioned, they are without a doubt the best, most performant models of their size and are also competitive with models two to three times larger.

So our 27B model has performance in the same ballpark as Llama 3 70B and outperforms Grok on many benchmarks, by a fairly significant margin in some cases. But I think academic benchmarks are only part of the way that we evaluate Gemma models. Sometimes these benchmarks are not always indicative of how a model will perform once it's in your hands.

So we've done extensive human evaluations as well, where we find that the Gemma models are consistently heavily preferred over other open models, including larger open models. And I'm also proud to say that the Gemma 2 27B model is currently the number one open model of its size. And it currently outranks Llama 3 70B, Nemotron 340B, Grok, Claude 3, and many, many other models as well.

Thank you. Wow, you guys are very supportive. I appreciate it. The only other open model of any size that outperforms the Gemma 2 27B model is the Yi-Large model on LMSYS. So we expect that you should have some fun playing around with this, especially for chat applications. We found in our evaluations that the Gemma 2 models are even better at instruction following.

They're even more creative. They're better at factuality, better all around than the Gemma 1.0 and 1.1 models. The other important thing that I want to make sure to highlight from our most recent launch is the Gemma cookbook. The Gemma cookbook is available on GitHub now and contains 20 different recipes, ranging from easy to very advanced applications of the Gemma models.

And the thing that I am most excited about is that the Gemma cookbook is currently accepting pull requests. So this is a great opportunity to share with us what you're building with the Gemma models so we can help share it with the rest of the world. And of course, I have to say, we also wouldn't mind if you starred the repository.

Go take a look and tell us what you're building with Gemma. So there are a couple of different ways you can get started with the Gemma 2 models. Of course, I just mentioned the cookbook. You can also apply to get GCP credits to accelerate your research using Gemma 2.

We have a lot of funding available to support research. I would really encourage you to fill out an application, regardless of how small or big your project is. We also, as I mentioned, have significantly improved documentation. We have many guides, tutorials, and Colabs across every framework, so you can get started doing inference, fine-tuning, and evaluation with Gemma 2 models.

You can download them anywhere open models are available. And please chat with us on Discord or other social media channels so we can learn more about what you're building. And that's about all from me today. I am so excited to see what you all build with Gemma. I have been working on this project for almost two years now, and I started working on it because, as a researcher in academia, I was disappointed to see how far behind open foundational LLMs were compared to the rapid improvements we were seeing in proprietary models.

So this is something that's very near and dear to my heart and that I wish I had had when I was actively part of the open source community. So I'm very excited to see the projects and the research that you all do with these models. Please engage with us on social media, on GitHub, on Hugging Face, here at the event, and let us know what you think of the models.

Let us know what you think we can do better for next time. And thank you all very much. Really appreciate your time. Thank you.