Google Gemini: AlphaGo-GPT?
Chapters
0:00 Intro
0:15 Context
4:00 AlphaGo-GPT
6:30 Tree of Thoughts Paper
8:49 Implications
In a somewhat provocative new interview with Wired Magazine, Demis Hassabis, head of Google DeepMind, is quoted as saying that Gemini, which could be released as soon as this winter, will be more capable than OpenAI's ChatGPT. He reveals that they are attempting to combine some of the strengths of AlphaGo-type systems with the amazing language capabilities of large models. Before we look into how that might work, here is the context of the Gemini announcement from Sundar Pichai.
They are focused on building more capable systems safely and responsibly. This includes our next-generation foundation model, Gemini, which is still in training. While still early, we are already seeing impressive multi-modal capabilities not seen in prior models. Hassabis promises that we also have some new innovations that are going to be pretty interesting. And I know many people will dismiss this as all talk, but remember, DeepMind was behind not just AlphaGo but also AlphaZero, which can play any two-player full-information game from scratch. They were also behind AlphaStar, which conquered StarCraft 2 with, quote, long-term planning. And let's remember that for later. And most famously, perhaps, Hassabis led them to the incredible breakthrough of AlphaFold and AlphaFold 2, which are already impacting the fight against plastic pollution and antibiotic resistance. So let's not underestimate DeepMind.
On to Gemini: we hear from The Information recently that the multi-modality of Gemini will be helped in part by training on YouTube videos. And apparently YouTube was also mined by OpenAI. Of course, that's not just the text transcripts, but also the audio, imagery, and probably comments. I wonder if Google DeepMind might one day use YouTube for more than that.
A few days ago, they released this paper on RoboCat, which they call a self-improving foundation agent for robotic manipulation. And the paper says that with RoboCat, we demonstrate the ability to generalize to new tasks and robots, both zero-shot as well as through adaptation using only a hundred to a thousand examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. Notice that part about using the model itself to generate data.
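As a rough sketch of what that kind of loop could look like in code (this is my own illustration, not RoboCat's actual implementation; the Model class, the is_successful check, and the task names are all hypothetical stand-ins):

```python
# A minimal sketch of an autonomous improvement loop in the spirit of
# RoboCat: the current model generates new episodes, a success check
# filters them, and the survivors become training data for the next
# iteration. Model, is_successful, and the task names are hypothetical.

import random

class Model:
    """Stand-in for a trained robotic-manipulation policy."""

    def generate_episode(self, task):
        # Pretend to attempt the task; return a trajectory with a score.
        return {"task": task, "score": random.random()}

    def fine_tune(self, dataset):
        print(f"fine-tuning on {len(dataset)} self-generated episodes")

def is_successful(episode, threshold=0.8):
    # The filtering step: keep only demonstrations judged good enough.
    return episode["score"] >= threshold

model = Model()
dataset = []  # human seed demonstrations would go here

for iteration in range(3):
    attempts = [model.generate_episode(task)
                for task in ("stack", "insert", "lift")
                for _ in range(100)]
    dataset.extend(ep for ep in attempts if is_successful(ep))
    model.fine_tune(dataset)  # the next model trains on its own data
```

The key property is that the dataset grows without new human demonstrations, which is what makes it a basic building block for self-improvement.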
That reminded me of a conversation I had with one of the authors of the Textbooks Are All You Need paper, Ronan Eldan from Microsoft. I'm making a video on their new phi-1 model for coding. We had a really great chat, and at one point we were discussing AGI timelines. I said this: when you get elite math papers with proofs and elite scientific research, if you train on much more of those for way more epochs, I don't think we're that far away from AGI. I personally can't see any barrier within the next five years. Ronan said this: as you said, I also don't see any barrier to AGI. My intuition is that there's probably a lot more improvement we can do with the data we have, and maybe a little bit more synthetic data. And this is even without starting to talk about self-improving mechanisms like AlphaZero, where the more you train models with some verification process, the more data you generate; this can be done in math and other things, as we see here with RoboCat. So, you know, there are just so many directions where we can still go that I don't think we're going to hit a ceiling anytime soon.
Can't wait to show you guys the rest of that paper and what else I learned from Ronan, who is also, by the way, the author of the TinyStories paper. But back to Gemini. If you remember the planning bit from DeepMind's earlier systems, that reminded me of something else from Gemini's introduction: Gemini was created from the ground up to be multi-modal, highly efficient at tool and API integrations, and built to enable future innovations like memory and planning. This is echoed in the article, in which Hassabis says his team will combine a language model like GPT-4 with techniques used in AlphaGo, aiming to give the system new capabilities such as planning or the ability to solve problems.
Interestingly, this comes just a few weeks after DeepMind's extreme risks paper, which identified long-horizon planning as a dangerous capability: for example, adapting its plans in the light of unexpected obstacles or adversaries, and generalizing to novel or new settings. For me, this is a bit like when a model can predict what humans would do in reaction to its own output. Back to the article: it's interesting, though, that Hassabis is tasked with accelerating Google's AI efforts while also managing unknown and potentially grave risks.
So what's his take? Hassabis cites the extraordinary potential benefits of AI, such as for scientific discovery in areas like health or climate, and the ability to develop new technologies that will help humanity. He also believes that mandating a pause is impractical, as it would be near impossible to enforce. "If done correctly, it will be the most beneficial technology for humanity ever," he says of AI. "We've got to boldly and bravely go after those things."
Hassabis described the basic approach behind AlphaGo in two of his recent talks.

So what's going on here, then? Well, effectively, if one thinks of a Go tree as the tree of all possibilities, and you imagine each node in this tree is a Go position, then what we're basically doing is guiding the search with the model. The model is coming up with the most probable moves and therefore guiding the tree search to be very efficient. And then when it runs out of time, of course, it outputs the best move that it's found up to that point.

We've learned that from data or from simulated data; ideally, you have both in many cases. So in games, obviously, we have this; it's effectively simulated data. And then what you do is you take that model, and you use that model to guide a search process according to some objective function. I think this is a general way to think about a lot of problems. I'm not saying every problem can fit into that, I mean, maybe. And I'll give you an example from drug discovery, which is what we're trying to do at Isomorphic. So this is the tree I showed you earlier, finding the best Go move, right? You're trying to find a near-optimal or close-to-optimal Go move and Go strategy. Well, what happens if we just change those nodes to chemical compounds?
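To make that concrete, here is a minimal sketch of model-guided tree search, assuming a toy state representation and a stand-in model_score function in place of AlphaGo's actual policy and value networks (the real system uses Monte Carlo tree search rather than this simple best-first loop):

```python
# Minimal sketch of model-guided tree search: a learned scoring function
# decides which node to expand next, so compute is concentrated on the
# most promising branches. All names here are illustrative stand-ins.

import heapq
import itertools

tie_breaker = itertools.count()  # keeps heapq from comparing states

def model_score(state):
    """Stand-in for a policy/value network: higher means more promising."""
    return -(sum(state) - 5) ** 2  # toy objective: best when moves sum to 5

def expand(state):
    """Stand-in for legal-move generation: the children of this state."""
    return [state + (move,) for move in (1, 2, 3)]

def guided_search(root, budget=1000):
    best, best_score = root, model_score(root)
    frontier = [(-best_score, next(tie_breaker), root)]  # max-heap via negation
    for _ in range(budget):  # "runs out of time" = fixed expansion budget
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        for child in expand(state):
            score = model_score(child)
            if score > best_score:
                best, best_score = child, score
            heapq.heappush(frontier, (-score, next(tie_breaker), child))
    return best  # the best position found when the budget runs out

print(guided_search(root=()))  # finds a path whose moves sum to 5
```

The core idea Hassabis describes is exactly this division of labour: the model proposes and prioritises, and the search verifies against the objective.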
Now, let me know in the comments if that reminded anyone else of the Tree of Thoughts paper, in which multiple plans are sampled and results were exponentially better on tasks that GPT-4 finds impossible, like creating workable crosswords, or mathematical problems that require a bit of planning, like creating the greatest integer from a set of four integers using operations like multiplication and addition. Well, I think my theory might have some legs, because look at where many of the authors of this paper work.
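For a feel of the mechanism, here is a minimal sketch of the breadth-first variant of Tree of Thoughts: sample several candidate "thoughts" per branch, score them, and keep only the most promising few at each step, so the search can abandon weak paths instead of committing greedily to one. The propose and evaluate functions are stand-ins for LLM calls, not the paper's actual code:

```python
# Minimal sketch of Tree-of-Thoughts-style search (breadth-first
# variant): sample several candidate "thoughts" per branch, score them,
# keep the best few, repeat. propose() and evaluate() stand in for LLM
# calls; in the paper, both are prompts to the same language model.

def propose(path, k=3):
    """Stand-in for sampling k candidate next steps from a model."""
    return [path + [f"step{len(path)}.{i}"] for i in range(k)]

def evaluate(path):
    """Stand-in for an LLM self-evaluation: higher = more promising."""
    return -sum(int(step.split(".")[1]) for step in path)

def tree_of_thoughts(depth=3, beam_width=2):
    frontier = [[]]  # start from an empty partial solution
    for _ in range(depth):
        candidates = [c for path in frontier for c in propose(path)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam_width]  # prune unpromising branches
    return frontier[0]

print(tree_of_thoughts())  # e.g. ['step0.0', 'step1.0', 'step2.0']
```

Because several branches stay alive at every depth, the search can effectively back out of a path that stops looking promising, which a single greedy chain of thought cannot do.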
And just yesterday, as I was researching for this video, the Tree of Thoughts paper was also cited in this paper on using language models to prove mathematical theorems. As you can see, at the moment GPT-4 doesn't do a great job. But my point in bringing this up was this: they say towards the end of the paper that another key limitation of ChatGPT was its inability to search systematically in a large space. Remember, that's what AlphaGo is really good at. We frequently found that it stuck to an unpromising path when the correct solution could be found by backtracking, à la Tree of Thoughts, and exploring alternative paths. This behavior is consistent with the general observation that LLMs are weak at search and planning. Addressing this weakness is an active area of research, and then they reference the Tree of Thoughts paper.

It could well be that Gemini, let alone Gemini 2, reaches state of the art for mathematical theorem proving. And to be honest, once we can prove theorems, we won't be as far from generating new ones. And in my opinion, fusing this AlphaGo-style branching mechanism with a large language model could work for other things. We've all seen models like GPT-4 sometimes give a bad initial answer, picking just the most probable output in a way that's sometimes called "greedy decoding". But methods like SmartGPT and self-consistency demonstrate that the initial, most probable output doesn't always reflect the best that a model can do.
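Self-consistency, for example, is simple to sketch: sample several answers at a non-zero temperature instead of trusting the single greedy output, then take a majority vote over the final answers. Here sample_answer is a hypothetical stand-in for a call to a language model:

```python
# Minimal sketch of self-consistency decoding: sample several answers
# instead of taking the single most probable (greedy) one, then return
# the majority vote. sample_answer() stands in for a temperature > 0
# call to a language model, reduced to its final answer.

import random
from collections import Counter

def sample_answer(question):
    """Stand-in for one sampled completion's final answer."""
    return random.choice(["42", "42", "42", "41"])  # noisy but usually right

def self_consistency(question, n_samples=9):
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins

print(self_consistency("What is 6 * 7?"))  # almost always "42"
```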
And this is just one of the reasons, as I said to Ronan, that I honestly think we could see a model hit 100% on the MMLU in less than five years. The MMLU, which I talked about in my SmartGPT video, is a famous machine learning benchmark, testing everything from formal logic to physics and politics. And I know that predicting 100% performance within five years is a very bold prediction, but that is my prediction.
But if those are the growing capabilities, what does Demis Hassabis think about the implications of the sheer power of such a model? One of the biggest challenges right now, Hassabis says, is to determine what the risks of a more capable AI are likely to be. "I think more research by the field needs to be done very urgently on things like evaluation tests," he says, to determine how capable and controllable new AI models are. He later mentions giving academia early access to these frontier models. And they do seem to be following through on this, with DeepMind, OpenAI, and Anthropic giving early access to their foundation models to the UK AI Task Force. This Foundation Model Task Force is led by Ian Hogarth, who was actually the author of "We Must Slow Down the Race to God-like AI", which I did a video on back in April. Do check that video out. But in the article, Hogarth mentioned a practical plan to transform these companies into a CERN-like organisation.
And somewhat unexpectedly, this idea was echoed this week by none other than Satya Nadella, who had earlier called on Google to "dance".

Satya Nadella: Essentially, the biggest unsolved problem is how do you ensure, both at sort of a scientific understanding level and then the practical engineering level, that you can make sure that the AI never goes out of control. And that's where I think there needs to be a CERN-like project where both the academics, along with corporations and governments, all come together to…
But back to the article: the interview with Hassabis ended with this somewhat chilling response to the question "How worried should you be?" Hassabis says that no one really knows for sure that AI will become a major danger, but he is certain that if progress continues at its current pace, there isn't much time to develop safeguards. "I can see the kind of things we're building into the Gemini series, and we have no reason to believe they won't work."

My own thoughts on this article are twofold. First, that we might not want to underestimate Google and Hassabis, and that adding AlphaGo-type systems
probably will work. And second, based on his comments, I do think there needs to be more clarity on just how much of Google DeepMind's workforce is working on these evaluations and pre-emptive measures. This article from a few months ago estimates that there may be fewer than 100 researchers focused on those areas. Out of 1,000, so is it even 5% of the total? And if not, why take too seriously the commitments at any AI summit, such as the one happening this autumn in the UK on safety? On the other hand, if Hassabis revealed that half or more of his workforce were on the case, then we could be more confident that the creators of AlphaGo, and my fellow Londoners, had a good chance of tree-searching to safety and success.
As always, thank you so much for watching and have a wonderful day.