Enter PaLM 2 (New Bard): Full Breakdown - 92 Pages Read and Gemini Before GPT 5? Google I/O


00:00:00.000 | Less than 24 hours ago Google released the Palm 2 technical report. I have read all 92 pages,
00:00:06.740 | watched the Palm 2 presentation, read the release notes and have already tested the model in a
00:00:12.340 | dozen ways. But before getting into it all my four main takeaways are these. First Palm 2 is
00:00:18.300 | competitive with GPT-4 and while it is probably less smart overall it's better in certain ways
00:00:25.200 | and that surprised me. Second Google is saying very little about the data it used to train the
00:00:30.500 | model or about parameters or about compute although we can make educated guesses on each.
00:00:36.680 | Third Gemini was announced to be in training and will likely rival GPT-5 while arriving earlier
00:00:43.160 | than GPT-5. As you probably know Sam Altman said that GPT-5 isn't in training and won't be for a
00:00:49.680 | long time. Fourth while dedicating 20 pages to bias, toxicity and misgendering,
00:00:55.180 | there wasn't a single page on AI impacts more broadly. Google boasted of giving Gemini
00:01:01.480 | planning abilities in a move that, surprised as I am to say it, makes OpenAI look like
00:01:07.420 | paragons of responsibility. So a lot to get to but let's look at the first reason that Palm 2
00:01:13.740 | is different from GPT-4. On page 3 they say we designed a more multilingual and diverse
00:01:19.640 | pre-training mixture extending across hundreds of languages and domains like programming,
00:01:24.160 | mathematics etc.
00:01:25.160 | So because the text that they train Palm 2 on is different to the text that OpenAI trained GPT-4 on
00:01:31.940 | it means that those models have different abilities and I would say Palm 2 is better at translation
00:01:37.900 | and linguistics and in certain other areas which I'll get to shortly. If that's data what about
00:01:43.540 | parameter count? Well Google never actually say. They only use words like it's significantly
00:01:48.760 | smaller than the largest Palm model which was 540 billion parameters. Sometimes they say
00:01:55.140 | significantly, other times dramatically. Despite this it significantly outperforms Palm on a variety
00:02:02.140 | of tasks. So all the references you may have seen to imminent 100 trillion parameter models were bogus.
00:02:08.360 | Skipping ahead to page 91 out of 92 in the model summary they say further details of model size and
00:02:15.040 | architecture are withheld from external publication. But earlier on they did seem to want to give hints
00:02:20.800 | about the parameter count inside Palm 2 which OpenAI never did. Here they show
00:02:25.120 | the optimal number of parameters given a certain amount of compute flops. Scaling this up to the
00:02:31.780 | estimated number of flops used to train Palm 2 that would give an optimal parameter count of between
00:02:37.580 | 100 and 200 billion. That is a comparable parameter count to GPT-3 while getting competitive performance
00:02:45.120 | with GPT-4. BARD is apparently now powered by Palm 2 and the inference speed is about 10 times faster
00:02:52.840 | than GPT-4 for the exact same prompt.
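To make the compute-to-parameters reasoning concrete, here is a rough sketch of how a compute-optimal parameter count can be estimated from a FLOP budget. It does not use the fit from the Palm 2 report. It assumes the widely used Chinchilla-style rule of thumb (training compute of roughly 6 x parameters x tokens, with about 20 training tokens per parameter). The function name and the example budget are hypothetical, and the transcript does not state the actual FLOP estimate for Palm 2.

```python
import math

# Assumed rule-of-thumb constants (Chinchilla-style heuristic, not from the Palm 2 report).
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ~ 6 * N * D
TOKENS_PER_PARAM = 20.0       # compute-optimal D ~ 20 * N


def compute_optimal_size(flops: float) -> tuple[float, float]:
    """Return (parameters, tokens) that roughly balance a training FLOP budget.

    Solves flops = 6 * N * D with D = 20 * N, i.e. N = sqrt(flops / 120).
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    # Hypothetical budgets, purely for illustration.
    for budget in (1e23, 1e24, 5e24):
        n, d = compute_optimal_size(budget)
        print(f"{budget:.1e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Under these assumptions the optimal parameter count grows with the square root of compute, which is why even a very large training budget can point to a model in the low hundreds of billions of parameters rather than trillions.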
00:02:55.100 | And I know there are other factors that influence inference speed but that would
00:02:59.160 | broadly fit with an order of magnitude fewer parameters. This has other implications of course
00:03:04.860 | and they say that Palm 2 is dramatically smaller, cheaper and faster to serve. Not only that Palm 2
00:03:11.580 | itself comes in different sizes as Sundar Pichai said. Palm 2 models deliver excellent foundational
00:03:17.980 | capabilities across a wide range of sizes. We have affectionately named them Gecko, Otter,
00:03:25.080 | Bison and Unicorn. Gecko is so lightweight that it can work on mobile devices fast enough for great
00:03:33.660 | interactive applications on device even when offline. I would expect Gecko to soon be inside
00:03:40.280 | the Google Pixel phones. Going back to data Google cryptically said that their pre-training corpus
00:03:46.000 | is composed of a diverse set of sources, documents, books, code, mathematics and conversational data.
00:03:55.060 | This is a very common data issue that these companies face but suffice to say they're not saying anything about where the data comes from.
00:04:00.960 | Next they don't go into detail but they do say that Palm 2 was trained to increase the context
00:04:06.240 | length of a model significantly beyond that of Palm. As of today you can input around 10,000
00:04:11.240 | characters into BARD but they end this paragraph with something a bit more interesting. They say
00:04:16.000 | without demonstrating "Our results show that it is possible to increase the context length of the
00:04:21.040 | model without hurting its performance on generic benchmarks."
00:04:25.040 | The bit about not hurting performance is interesting because in this experiment published a few weeks ago
00:04:30.020 | about extending the input size in tokens up to around 2 million tokens the performance did drop
00:04:35.820 | off. If Google had found a way to increase the input size in tokens and not affect performance
00:04:41.820 | that would be a breakthrough. On multilingual benchmarks notice how the performance of Palm 2
00:04:47.700 | in English is not dramatically better than in other languages. In fact in many other languages
00:04:53.420 | it does better than in English.
00:04:55.020 | This is very different to GPT-4 which was noticeably better in English than in all other languages.
00:05:01.340 | As Google hinted earlier this is likely due to the multilingual text data that Google trained Palm 2 with.
00:05:08.140 | In fact on page 17 Google admit that the performance of Palm 2 exceeds Google Translate for certain languages.
00:05:15.420 | And they show on page 4 that it can pass the mastery exams across a range of languages like Chinese, Japanese, Italian, French, Spanish, German etc.
00:05:25.000 | Look at the difference between Palm 2 and Palm in red.
00:05:28.940 | Now before you rush off and try BARD in all of those languages I tried that and apparently
00:05:34.020 | you can only use BARD at the moment in the following languages: English, US English what a pity,
00:05:40.020 | and Japanese and Korean.
00:05:41.620 | But I was able to test BARD in Korean on a question translated via Google Translate from
00:05:48.260 | the MMLU dataset. It got the question right in each of its drafts. In contrast GPT-4 not only
00:05:54.980 | got the question wrong in Korean when I originally tested it for my Smart GPT video it got the
00:06:01.000 | question wrong in English.
00:06:02.480 | In case any of my regular viewers are wondering I am working very hard on Smart GPT to understand
00:06:08.460 | what it's capable of and getting it benchmarked officially.
00:06:11.800 | And thank you so much for all the kind offers of help in that regard.
00:06:15.480 | I must admit it was very interesting to see on page 14 a direct comparison between Palm
00:06:21.360 | 2 and GPT-4. And Google do admit for the Palm 2 and GPT-4
00:06:24.960 | results they use chain of thought prompting and self consistency. Reading the self consistency
00:06:30.880 | paper did remind me quite a lot actually of Smart GPT where it picks the most consistent
00:06:36.440 | answer of multiple outputs. So I do wonder if this comparison is totally fair if Palm
00:06:42.540 | 2 used this method and GPT-4 didn't.
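For readers unfamiliar with self-consistency, the idea of picking the most consistent answer out of multiple outputs can be sketched as a simple majority vote. This is a minimal illustration, not Google's evaluation code: `sample_fn` is a hypothetical stand-in for whatever call returns one model answer per invocation (ideally sampled with some randomness after chain of thought prompting).

```python
from collections import Counter
from typing import Callable, List


def majority_answer(answers: List[str]) -> str:
    """Return the most frequent answer after light normalization.

    Ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [a.strip().lower() for a in answers]
    return Counter(normalized).most_common(1)[0][0]


def self_consistency(sample_fn: Callable[[str], str], prompt: str, n_samples: int = 5) -> str:
    """Sample several answers for the same prompt and keep the majority one.

    sample_fn is a hypothetical callable wrapping a model call; it is assumed
    to return only the final answer extracted from a reasoning chain.
    """
    answers = [sample_fn(prompt) for _ in range(n_samples)]
    return majority_answer(answers)


if __name__ == "__main__":
    # Fake sampler standing in for a real model, for demonstration only.
    canned = iter(["B", "A", "B", "C", "B"])
    print(self_consistency(lambda _prompt: next(canned), "Which option is correct?"))
```

This is why the comparison matters: a model scored with several samples and a vote gets more attempts per question than a model scored on a single output.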
00:06:44.900 | I'll have to talk about these benchmarks more in another video otherwise this one would
00:06:48.840 | be too long. But a quick hint is that WinoGrande is about identifying what the pronoun in a
00:06:54.940 | sentence refers to.
00:06:56.060 | Google also weighed into the emerging abilities debate saying that Palm 2 does indeed demonstrate
00:07:02.220 | new emerging abilities. They say it does so in things like multi-step arithmetic problems,
00:07:07.600 | temporal sequences and hierarchical reasoning. Of course I'm going to test all of those and
00:07:12.440 | have begun to do so already.
00:07:14.320 | And in my early experiments I'm getting quite an interesting result. Palm 2 gets a lot of
00:07:18.780 | questions wrong that GPT-4 gets right but it can also get questions right that GPT-4
00:07:24.920 | gets wrong. And I must admit it's really weird to see Palm 2 getting really advanced
00:07:29.340 | college level math questions right that GPT-4 gets wrong and yet also when I ask it a basic
00:07:34.840 | question about prime numbers it gets it kind of hilariously wrong. Honestly I'm not certain
00:07:39.960 | what's going on there but I do have my suspicions.
00:07:42.920 | Remember though that recent papers have claimed that emergent abilities are a mirage so Google
00:07:48.560 | begs to differ. When Google put Palm 2 up against GPT-4 in high school mathematics problems it did
00:07:54.900 | outperform GPT-4. But again it was using an advanced prompting strategy not 100% different
00:08:01.340 | from SmartGPT so I wonder if the comparison is quite fair. What about coding? Well again it's
00:08:06.820 | really hard to find a direct comparison that's fair between the two models. Overall I would
00:08:12.600 | guess that the specialized coding model of Palm, what they call Palm 2-S*, is worse than GPT-4.
00:08:19.400 | It says its pass@1 accuracy, as in pass first time, is 37.6%.
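As a reference for what pass@1 measures, here is a short sketch of the commonly used unbiased pass@k estimator for code benchmarks: generate n samples per problem, count the c that pass the unit tests, and estimate the chance that at least one of k drawn samples passes. The sample counts in the example are made up for illustration and are not taken from the Palm 2 report.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem given n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every possible draw of k samples contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-problem results: (samples generated, samples passing).
    fake_results = [(10, 3), (10, 0), (10, 10), (10, 5)]
    print(f"pass@1 = {benchmark_pass_at_k(fake_results, 1):.3f}")
```

For k = 1 the estimator reduces to the fraction of samples that pass, averaged over problems, which matches the "pass first time" reading.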
00:08:24.880 | Remember the Sparks of AGI paper? Well that gave GPT-4 as having an 82% zero-shot pass@1
00:08:32.860 | accuracy level. However as I talked about in the Sparks of AGI video the paper admits that it could
00:08:38.540 | be that GPT-4 has seen and memorized some or all of HumanEval. There is one thing I will give
00:08:45.620 | Google credit on which is that their code now sometimes references where it came from. Here is
00:08:50.680 | a brief extract from the Google keynote presentation.
00:08:54.860 | So let's take a look at the first example.