
Unveiling the latest Gemma model advancements: Kathleen Kenealy



00:00:00.000 | My name is Kathleen Kenealy. I'm a research engineer at Google DeepMind. And as was just
00:00:19.280 | mentioned, I'm the technical lead of the Gemma team. Before I get started, I just wanted to say
00:00:26.400 | how awesome it is to get to be here with you all today. When we were building Gemma, our North Star,
00:00:33.600 | the thing we were most excited about was building something to empower and accelerate
00:00:39.680 | the amazing work being done by the open source community. And since we launched our first models
00:00:46.000 | in February, I have been absolutely blown away by the incredible projects and research and
00:00:54.000 | innovations that have already been built on top of Gemma. So I'm particularly excited to be here with
00:01:01.200 | so many developers today and especially delighted to unveil the latest advancements and additions to
00:01:09.200 | the Gemma model family. So without further ado, we'll get started. As many of you probably know, Google has
00:01:18.480 | been a pioneer in publications of AI and ML research for the past decade, including publishing some of
00:01:26.800 | the key research that has sparked recent innovations we've seen in AI. Research like the Transformer,
00:01:34.320 | SentencePiece, BERT, to name a few. Google DeepMind has really continued this tradition and is actively working to share
00:01:43.680 | our research for the world to validate and examine and build upon. But Google's support of the open
00:01:50.880 | community for AI and ML is not just limited to publishing research. We've also been doing work to
00:01:57.760 | support ML across the entire technical stack for a long time, from hardware breakthroughs like TPUs, which I
00:02:05.360 | imagine is especially relevant for this crowd and this track, all the way to an evolution in ML frameworks from
00:02:13.200 | TensorFlow to JAX. Throughout all of this, open development has been especially critical for Google. Our ability to
00:02:22.480 | collaborate with the open source community has helped us all discover more, innovate faster,
00:02:29.760 | and really push the limits of what AI is capable of. So this long history of support of the open source
00:02:37.440 | community leads us to today and to Google's latest investment in open models, Gemma. Gemma is Google
00:02:46.400 | DeepMind's family of open source, lightweight, state-of-the-art models, which we build from the same research and
00:02:54.640 | technology used to create the Gemini models. I'm so sorry, I think that's my phone going off during this
00:03:00.800 | talk. Please feel free to rummage through that bag. Wow, lesson learned that even the speaker needs to
00:03:07.680 | remember to silence her cell phone. All right, back to Gemma. There are a couple of key advantages of the
00:03:15.200 | Gemma models that I want to highlight today. The first is that Gemma models were built to be responsible
00:03:21.360 | by design. I can tell you from personal experience that from day zero of developing a Gemma model,
00:03:30.000 | safety is a top priority. That means we are manually inspecting data sets to make sure that we are not
00:03:37.120 | only training on the highest quality data, but also the safest data we can. This means that we are evaluating
00:03:44.720 | our models for safety, starting with our earliest experimentation and ablations, so that we are
00:03:50.880 | selecting training methodologies that we know will result in a safer model. And at the end of our
00:03:58.080 | development, our final models are evaluated against the same rigorous state-of-the-art safety evaluations
00:04:04.960 | that we evaluate Gemini models against. And we really do this to make sure that no matter where or how you
00:04:13.600 | deploy a Gemma model, you can count on the fact that you will have a trustworthy and responsible AI
00:04:20.160 | application. No matter how you've customized a Gemma model, you can trust that it will be a responsible
00:04:25.440 | model. Gemma models also achieve unparalleled breakthrough performance for models of their scale,
00:04:33.280 | including outperforming significantly larger models. But we'll get to more on that very shortly.
00:04:41.520 | We also designed the Gemma models to be highly extensible so that you can use a Gemma model wherever
00:04:49.600 | and however you want. This means they're optimized for TPUs and GPUs, as well as for use on your local
00:04:56.080 | device. They're supported across many frameworks, TensorFlow, JAX, Keras, PyTorch, Ollama, Transformers,
00:05:04.320 | you name it, Gemma is probably there. And finally, the real power of the Gemma models comes from
00:05:11.280 | their open access and open license. Period. That's what's powerful about Gemma. We put state-of-the-art
00:05:19.200 | technology into your hands so you can decide what the next wave of innovation looks like.
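
For readers following along, here is a minimal sketch of what using Gemma through one of the frameworks named above looks like. This is not from the talk: the KerasNLP preset name is an assumption based on the public Keras releases, so substitute whichever Gemma preset you have access to.

```python
# A minimal sketch using KerasNLP, one of the frameworks named in the talk.
# The preset name "gemma_2b_en" is an assumption about your setup.
import keras_nlp

# Download the pretrained 2B checkpoint and build the causal LM around it.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# Generate a short completion from a plain-text prompt.
print(gemma_lm.generate("Why is open-source AI important?", max_length=64))
```
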
00:05:24.320 | When we decided to launch the Gemma models, we wanted to make sure that we could meet developers
00:05:31.440 | exactly where they are, which is why Gemma models are available anywhere and everywhere you can find an
00:05:38.640 | open model. I will not list all of the frameworks on this slide, but this is only a fraction of the
00:05:45.280 | places where you can find Gemma models today. This means you can use Gemma how you need it, when you
00:05:51.600 | need it, with the tools that you prefer for development. Since our initial launch back in February,
00:05:59.920 | we've added a couple of different variants to the Gemma model family. We, of course, have our initial models,
00:06:05.760 | Gemma 1.0, which are our foundational LLMs. We also released, shortly after that, CodeGemma,
00:06:13.040 | which are the Gemma 1.0 models fine-tuned for improved performance on code generation and code
00:06:19.040 | evaluation. And one variant that I am particularly excited about is RecurrentGemma, which is a novel
00:06:26.240 | architecture, a state-space model that's designed for faster and more efficient inference, especially at long
00:06:33.520 | contexts. We've also updated all of these models since their initial release. We now have Gemma 1.1, which is
00:06:43.040 | better at instruction following and chat. We've updated CodeGemma to have even better code
00:06:48.560 | performance. And we now have RecurrentGemma at not only the original 2B size, but also at a 9 billion
00:06:55.280 | parameter size.
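
Since the talk shows no code, here is a hedged sketch of what RecurrentGemma inference looks like through Hugging Face Transformers. The checkpoint ID is the public 2B instruction-tuned release and is an assumption about your setup, not something named on stage.

```python
# A hedged sketch: RecurrentGemma inference via Hugging Face Transformers.
# The model ID below is an assumption (the public 2B instruction-tuned release).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The recurrent blocks keep a fixed-size state rather than a KV cache that
# grows with sequence length, which is why long-context inference is cheaper.
inputs = tokenizer("Summarize the benefits of efficient inference.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
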
00:07:04.800 | So there's a lot going on in the Gemma model family, and I'm especially excited to tell you about our two most recent launches. The first one is actually our most highly requested feature since day zero
00:07:14.720 | of launch, and that was multimodality. So we launched PaliGemma. PaliGemma -- oh, thank you. I appreciate it.
00:07:24.640 | This is why I love the open source community, truly the most passionate developers that there are.
00:07:31.200 | PaliGemma combines the SigLIP vision encoder with the Gemma 1.0 text decoder. This
00:07:40.960 | combination allows us to do a variety of image-text tasks and capabilities, including
00:07:48.480 | question answering, image and video captioning, object detection, and object segmentation.
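
As a concrete illustration of those image-text tasks, here is a hedged sketch of visual question answering with PaliGemma via Hugging Face Transformers. The "mix" checkpoint ID, the placeholder image URL, and the task-prefix prompt format are assumptions based on the public release, not details given in the talk.

```python
# A hedged sketch of PaliGemma visual question answering.
# Model ID and prompt format are assumptions based on the public release.
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Any RGB image works here; the URL is a placeholder.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)

# Mix checkpoints take short task prefixes, e.g. "answer en <question>",
# "caption en", or "detect <object>".
inputs = processor(text="answer en What is in this photo?", images=image,
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(outputs[0], skip_special_tokens=True))
```
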
00:07:54.000 | The model comes in a couple of different variants. It's currently only available at the 2B size,
00:08:00.560 | but we have pre-trained weights that are available that can be fine-tuned for specific tasks. We have a
00:08:06.160 | couple of different fine-tuned variants as well that are already targeted towards things like object
00:08:11.040 | detection and object segmentation. And we also have transfer checkpoints that are models that are
00:08:17.280 | specialized to target a couple of academic benchmarks. Up until this morning, that was our latest release,
00:08:27.360 | but I'm very excited to be here today with you guys because it is Gemma 2 launch day!
00:08:33.600 | Woo-hoo!
00:08:35.680 | Wow, thanks.
00:08:38.080 | We have been working very hard on these models since the Gemma 1.0 launch. We tried to do as much as we
00:08:47.520 | could to gather feedback from the community to learn where the 1.0 and 1.1 models fell short and what we
00:08:55.040 | could do to make them better, and so we created Gemma 2. Gemma 2 comes in both a 9 billion parameter size
00:09:03.120 | and a 27 billion parameter size. Both models are without a doubt the most performant of their size,
00:09:11.600 | and both models also outperform models that are even two to three times larger than these base models.
00:09:20.800 | But Gemma 2 isn't just powerful. It's designed to easily integrate into the workflows that you already
00:09:27.280 | have. So Gemma 2 uses all of the same tools, all of the same frameworks as Gemma 1, which means
00:09:35.120 | if you've already started developing with Gemma 1, you can, with only a couple of lines of code,
00:09:40.960 | automatically switch to using the Gemma 2 models and have increased performance and more power behind
00:09:48.160 | your applications. We also have the same broad framework compatibility. Again, TensorFlow,
00:09:55.440 | JAX, Transformers, Ollama, all of the ones I previously named, we have them for Gemma 2 as well.
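
To make the "couple of lines of code" point concrete, here is a minimal sketch of that swap in Hugging Face Transformers. The Hub model IDs are assumptions about your setup; in the common case, only the model ID string changes.

```python
# A minimal sketch of swapping Gemma 1 for Gemma 2 in Hugging Face
# Transformers: typically only the model ID changes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # previously e.g. "google/gemma-1.1-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Write a haiku about open models.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
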
00:10:01.120 | We also have significantly improved documentation. We have more guides, more tutorials, so that we can
00:10:09.040 | coach you through how to get started not only with inference, but with advanced and efficient fine-tuning from
00:10:15.200 | day zero. And finally, we really wanted to target fine-tuning as one of the key capabilities of these
00:10:23.040 | models. We did extensive research into how our core modeling decisions impact users' ability to do
00:10:31.760 | downstream fine-tuning. So we believe these models are going to be incredibly easy to fine-tune, so you can
00:10:38.240 | customize them to whatever your use case may be.
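
As a sketch of the kind of lightweight customization described here, this is what parameter-efficient fine-tuning with LoRA might look like via Hugging Face PEFT. The hyperparameters, target modules, and model ID are illustrative assumptions, not settings from the talk, and the training loop itself is omitted.

```python
# A hedged sketch of parameter-efficient fine-tuning with LoRA via
# Hugging Face PEFT; all settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-9b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Train only low-rank adapter matrices on the attention projections,
# leaving the base weights frozen -- a common recipe for cheap customization.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters

# From here, train with your dataset of choice (e.g. via transformers.Trainer).
```
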
00:10:45.600 | In addition, to make it especially easy to get started using Gemma 2 models, we have made the 27B model available in Google AI Studio. This means you
00:10:53.520 | can go to the AI Studio homepage and select Gemma 2 now, if you wanted to, and start playing around with
00:11:00.800 | prompts right away. You shouldn't have to do anything except come up with an idea for how you want to push
00:11:07.040 | the limits of our model. I am especially excited to see what you all end up doing with AI Studio and
00:11:14.960 | Gemma, and we have a couple of different ways for you to let us know what you're building, which I'll get
00:11:20.720 | to down the road. But if you have ideas, I'll be here all day and want to hear what you're doing with
00:11:26.480 | the Gemma models. But let's dive a little bit more into performance. We are incredibly proud of the
00:11:35.200 | models that we've made. As I mentioned, they are without a doubt the best, most performant models of
00:11:41.600 | their size and are also competitive with models two to three times larger. So our 27B model has
00:11:50.000 | performance in the same ballpark as Llama 3 70B and outperforms Grok models on many benchmarks by a
00:11:58.480 | fairly significant margin in some cases. But I think academic benchmarks are only part of the way that we
00:12:07.040 | evaluate Gemma models. These benchmarks are not always indicative of how a model will perform
00:12:14.000 | once it's in your hands. So we've done extensive human evaluations as well, where we find that the Gemma
00:12:20.400 | models are consistently heavily preferred to other open models, including larger open models. And I'm also
00:12:30.080 | proud to say that the Gemma 27B model is currently the number one open model of its size. And it
00:12:38.800 | currently outranks Llama 3 70B, Nemotron 340B, Grok, Claude 3, many, many other models as well.
00:12:49.120 | Thank you. Wow, you guys are very supportive. I appreciate it.
00:12:56.560 | The only other open model of any size that outperforms the Gemma 27B model is the Yi Large model on LMSYS.
00:13:05.840 | So we expect that you should have some fun playing around with this, especially for chat applications.
00:13:12.880 | We found in our evaluations that the Gemma 2 models are even better at instruction following. They're even
00:13:19.200 | more creative. They're better at factuality, better all around than the Gemma 1.0 and 1.1 models.
00:13:25.760 | The other important thing that I want to make sure to highlight from our most recent launch is the
00:13:33.600 | Gemma cookbook. The Gemma cookbook is available on GitHub now and contains 20 different recipes,
00:13:40.720 | ranging from easy to very advanced, showing how to use the Gemma models. And the thing that I am
00:13:47.200 | most excited about is the Gemma cookbook is currently accepting pull requests. So this is a great
00:13:53.280 | opportunity to share with us what you're building with the Gemma models so we can help share it with
00:14:00.480 | the rest of the world. And of course, I have to say, we also wouldn't mind if you started the repository.
00:14:07.760 | Go take a look and tell us what you're building with Gemma. So there are a couple of different ways you
00:14:13.520 | can get started with the Gemma 2 models. Of course, I just mentioned the cookbook. You can also apply to
00:14:20.240 | get GCP credits to accelerate your research using Gemma 2. We have a lot of funding available to support
00:14:29.200 | research. I would really encourage you to fill out an application regardless of how small or big your
00:14:36.080 | project is. We also, as I mentioned, have significantly improved documentation. We have many guides, tutorials,
00:14:43.440 | Colabs across every framework so you can get started doing inference, fine-tuning, and evaluation with
00:14:49.520 | Gemma 2 models. You can download them anywhere open models are available. And please chat with us on
00:14:56.720 | Discord or other social media channels so we can learn more about what you're building.
00:15:00.960 | And that's about all from me today. I am so excited to see what you all build with Gemma. I have been
00:15:12.000 | working on this project for almost two years now and started working on this project because I, as a
00:15:20.480 | researcher in academia, was disappointed to see how far behind open foundational LLMs were compared to
00:15:29.520 | the rapid improvements we were seeing in proprietary models. So this is something that's very near and
00:15:35.920 | dear to my heart and that I wish I had had when I was actively part of the open source community. So I'm
00:15:42.880 | very excited to see the projects and the research that you all do with these models. Please engage with us on
00:15:49.120 | social media, on GitHub, on Hugging Face, here at the event, and let us know what you think of the models. Let
00:15:56.720 | us know what you think we can do better for next time. And thank you all very much. Really appreciate your time.
00:16:07.760 | Thank you.