
Google Gemini: AlphaGo-GPT?


Chapters

0:00 Intro
0:15 Context
4:00 AlphaGo-GPT
6:30 Tree of Thoughts Paper
8:49 Implications

Transcript

In a somewhat provocative new interview with Wired, Demis Hassabis, head of Google DeepMind, is quoted as saying that Gemini, which could be released as soon as this winter, will be more capable than OpenAI's ChatGPT. He reveals that they are attempting to combine some of the strengths of AlphaGo-type systems with the amazing language capabilities of large models.

Before we look into how that might work, here is the context of the Gemini announcement from Sundar Pichai: they are focused on building more capable systems safely and responsibly. "This includes our next generation foundation model, Gemini, which is still in training. While still early, we are already seeing impressive multimodal capabilities not seen in prior models."

Hassabis promises that "we also have some new innovations that are going to be pretty interesting." And I know many people will dismiss this as all talk, but remember that DeepMind was behind not just AlphaGo, but also AlphaZero, which can play any two-player, full-information game from scratch. They were also behind AlphaStar, which conquered StarCraft II with, quote, "long-term planning".

And let's remember that for later. And most famously, perhaps, Hassabis led them to the incredible breakthrough of AlphaFold and AlphaFold 2, which are already impacting the fight against plastic pollution and antibiotic resistance. So let's not underestimate DeepMind. Turning to Gemini, we heard recently from The Information that the multimodality of Gemini will be helped in part by training on YouTube videos.

And apparently YouTube was also mined by OpenAI. Of course, that's not just the text transcripts, but also the audio, imagery, and probably comments. I wonder if Google DeepMind might one day use YouTube for more than that. A few days ago, they released this paper on RoboCat, which they call a self-improving foundation agent for robotic manipulation.

And the paper says that with RoboCat, we demonstrate the ability to generalize to new tasks and robots, both zero-shot as well as through adaptation using only a hundred to a thousand examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop.

Notice that part about using the model itself to generate data. That reminded me of a conversation I had with one of the authors of the "Textbooks Are All You Need" paper, Ronen Eldan from Microsoft. I'm making a video on their new phi-1 model for coding. We had a really great chat, and at one point we were discussing AGI timelines.

And I said this: when you get elite math papers with proofs and elite scientific research, if you train on much more of those for way more epochs, I don't think we're that far away from AGI. I personally can't see any barrier within the next five years. Ronen said this: as you said, I also don't see any barrier to AGI.

My intuition is that there's probably a lot more improvement we can do with the data we have and maybe a little bit more synthetic data. And this is even without starting to talk about self-improving mechanisms like AlphaZero, where the more you train models with some verification process and you generate more data, this can be done in math and other things as we see here with RoboCat.

So you know, there are just so many directions where we can still go that I don't think we're going to hit a ceiling anytime soon. Can't wait to show you guys the rest of that paper and what else I learned from Ronen, who is also, by the way, the author of the TinyStories paper.
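To make that loop a little more concrete, here is a minimal Python sketch of the kind of verified self-improvement cycle Ronen is describing: generate candidate solutions, keep only those that pass some verification step, and train on them. The model, verifier and training step below are all hypothetical placeholders, not anything from RoboCat or AlphaZero.

```python
import random

def generate_candidate(model, problem):
    """Ask the current model for a candidate solution (a proof, a plan, a trajectory)."""
    return model(problem)

def verify(problem, candidate):
    """Placeholder verification step: in practice a proof checker, simulator or test suite."""
    return candidate is not None and random.random() > 0.3

def train(model, examples):
    """Placeholder fine-tuning step: a real system would update the model's weights here."""
    return model

def self_improvement_loop(model, problems, iterations=3):
    for _ in range(iterations):
        verified = []
        for problem in problems:
            candidate = generate_candidate(model, problem)
            if verify(problem, candidate):      # keep only outputs that pass verification
                verified.append((problem, candidate))
        model = train(model, verified)          # the model's own outputs become training data
    return model

if __name__ == "__main__":
    toy_model = lambda problem: f"candidate solution for {problem}"
    self_improvement_loop(toy_model, ["problem A", "problem B"])
```

The key design point is the verifier: it is what lets the model's own outputs safely become the next round's training data.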

But back to Gemini. If you remember the planning bit from DeepMind's earlier systems, that reminded me of something else from Gemini's introduction: Gemini was created from the ground up to be multimodal, highly efficient at tool and API integrations, and built to enable future innovations like memory and planning. This is echoed in the article, in which Hassabis says his team will combine a language model like GPT-4 with techniques used in AlphaGo, aiming to give the system new capabilities such as planning or the ability to solve problems.

Interestingly, this comes just a few weeks after DeepMind's Extreme Risks paper, which identified long-horizon planning as a dangerous capability: for example, adapting its plans in the light of unexpected obstacles or adversaries and generalizing to novel settings. For me, this is a bit like when a model can predict what humans would do in reaction to its own output.

Back to the article: it's interesting, though, that Hassabis is tasked both with accelerating Google's AI efforts and with managing unknown and potentially grave risks. So what's his take? Hassabis points to the extraordinary potential benefits of AI, such as for scientific discovery in areas like health or climate, and to the ability to develop new technologies that will help humanity.

He also believes that mandating a pause is impractical, as it would be near impossible to enforce. "If done correctly, it will be the most beneficial technology for humanity ever," he says of AI. "We've got to boldly and bravely go after those things." So how would AlphaGo become AlphaGo-GPT?

Hassabis described the basic approach behind AlphaGo in two of his recent talks. So what's going on here then? Well, effectively, one can think of the Go tree as the tree of all possibilities, where each node in this tree is a Go position. So what we're basically doing is guiding the search with the model.

So the model is coming up with the most probable moves and therefore guiding the tree search to be very efficient. And then when it runs out of time, of course, it outputs the best move that it's found up to that point. That model is learned from data or from simulated data.

Ideally, you have both in many cases. So in games, obviously, we have this, it's effectively simulated data. And then what you do is you take that model, and then you use that model to guide a search process according to some objective function. I think this is a general way to think about a lot of problems.

I'm not saying every problem can fit into that. I mean, maybe. And I'll give you an example from drug discovery, which is what we're trying to do at Isomorphic. So this is the tree I showed you earlier, finding the best Go move, right? You're trying to find a near optimal or close to optimal Go move and Go strategy.

Well, what happens if we just change those nodes to chemical compounds? Now, let me know in the comments if that reminded anyone else of the Tree of Thoughts paper, in which multiple plans are sampled and results were exponentially better on tasks that GPT-4 finds impossible, like creating workable crosswords, or mathematical problems that require a bit of planning, like creating the greatest possible integer from a set of four integers using operations like multiplication and addition.
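As a rough illustration of what "guiding the search with a model" means, here is a toy best-first tree search in Python. The propose_children and value functions are hypothetical stand-ins for AlphaGo's policy and value networks (or for an LLM proposing and scoring "thoughts"), and the objective is invented purely so the example runs.

```python
import heapq
import itertools

def propose_children(state, k=3):
    """Stand-in for a policy model: return the k most plausible next states."""
    return [state + [i] for i in range(k)]

def value(state):
    """Stand-in for a value model / objective function scoring a state.
    Toy objective: get the sum of the moves as close to 10 as possible."""
    return -abs(sum(state) - 10)

def guided_tree_search(root, budget=50):
    counter = itertools.count()                # tie-breaker so the heap never compares lists
    frontier = [(-value(root), next(counter), root)]
    best_state, best_score = root, value(root)
    for _ in range(budget):                    # "when it runs out of time, output the best found"
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)  # always expand the most promising node first
        for child in propose_children(state):
            score = value(child)
            if score > best_score:
                best_state, best_score = child, score
            heapq.heappush(frontier, (-score, next(counter), child))
    return best_state, best_score

print(guided_tree_search([]))
```

Swap the toy objective for a Go evaluator, a compound scorer, or an LLM judging partial solutions, and you have the general shape Hassabis is describing.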

Well, I think my theory might have some legs because look at where many of the authors of this paper work. And just yesterday, as I was researching for this video, the Tree of Thoughts paper was also cited in this paper on using language models to prove mathematical theorems. As you can see at the moment, GPT-4 doesn't do a great job.

But my point in bringing this up was this. They say towards the end of the paper that another key limitation of ChatGPT was its inability to search systematically in a large space. Remember, that's what AlphaGo is really good at. We frequently found that it stuck to an unpromising path when the correct solution could be found by backtracking, a la Tree of Thoughts, and exploring alternative paths.

This behavior is consistent with the general observation that LLMs are weak at search and planning. Addressing this weakness is an active area of research and then they reference the Tree of Thoughts paper. It could well be that Gemini, let alone Gemini 2, reaches state of the art for mathematical theorem proving.
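For a sense of what "backtracking and exploring alternative paths" looks like in code, here is a tiny depth-first sketch in Python. The propose_steps and looks_promising functions are hypothetical stand-ins for an LLM proposing and scoring proof steps, and the toy goal (steps summing to a target) is only there to make the example runnable.

```python
TARGET = 7  # toy goal standing in for "the proof is complete"

def propose_steps(path):
    """Stand-in for an LLM proposing candidate next steps (here: two fixed options)."""
    return [1, 3]

def looks_promising(path):
    """Stand-in for an evaluator; prune branches that have already overshot the target."""
    return sum(path) <= TARGET

def is_complete(path):
    return sum(path) == TARGET

def search(path, max_depth=7):
    if is_complete(path):
        return path
    if len(path) >= max_depth or not looks_promising(path):
        return None                      # abandon this branch and backtrack
    for step in propose_steps(path):
        solution = search(path + [step], max_depth)
        if solution is not None:
            return solution              # a deeper branch succeeded
    return None                          # every candidate failed; the caller keeps backtracking

print(search([]))  # depth-first with backtracking finds e.g. [1, 1, 1, 1, 1, 1, 1]
```

The contrast with the behaviour described in the paper is that an unpromising branch is abandoned and an alternative is tried, rather than the model committing to its first path.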

And to be honest, once we can prove theorems, we won't be far from generating new ones. And in my opinion, fusing this AlphaGo-style branching mechanism with a large language model could work for other things. We've all seen models like GPT-4 sometimes give a bad initial answer, picking just the most probable output in a way that's sometimes called "greedy decoding".

But methods like SmartGPT and self-consistency demonstrate that the initial or most probable output doesn't always reflect the best that a model can do. And this is just one of the reasons, as I said to Ronen, that I honestly think we could see a model hit 100% in the MMLU in less than 5 years.
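To show the contrast with greedy decoding, here is a minimal self-consistency-style sketch in Python. The sample_answer function is a hypothetical stand-in for sampling an LLM at non-zero temperature; a real implementation would extract final answers from longer reasoning chains before voting.

```python
import random
from collections import Counter

def sample_answer(question):
    """Stand-in for one sampled LLM answer; the toy distribution is occasionally wrong."""
    return random.choice(["42", "42", "42", "41"])

def self_consistency(question, n_samples=9):
    """Sample several answers instead of taking the single most probable (greedy) one,
    then return the answer the samples most often agree on."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))  # very likely "42", even though individual samples can be wrong
```

The point is simply that aggregating several sampled outputs often beats trusting the single most probable one.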

The MMLU, which I talked about in my SmartGPT video, is a famous machine learning benchmark, testing everything from formal logic to physics and politics. And I know that predicting 100% performance within 5 years is a very bold prediction, but that is my prediction. But if those are the growing capabilities, what does Demis Hassabis think about the implications of the sheer power of such a model?

One of the biggest challenges right now, Hassabis says, is to determine what the risks of a more capable AI are likely to be. I think more research by the field needs to be done very urgently on things like evaluation tests, he says, to determine how capable and controllable new AI models are.

He later mentions giving academia early access to these frontier models. And they do seem to be following through on this with DeepMind, OpenAI and Anthropic giving early access to their foundation models to the UK AI Task Force. This Foundation Model Task Force is led by Ian Hogarth, who was actually the author of this, the "We Must Slow Down the Race to Godlike AI" paper that I did a video on back in April.

Do check that video out. But in the article, Hogarth mentioned a practical plan to transform these companies into a CERN-like organisation. And somewhat unexpectedly, this idea was echoed this week by none other than Satya Nadella, who had earlier vowed to make Google "dance". Satya Nadella: Essentially, the biggest unsolved problem is how do you ensure, both at sort of a scientific understanding level and then the practical engineering level, that you can make sure that the AI never goes out of control.

And that's where I think there needs to be a CERN-like project where both the academics along with corporations and governments all come together to perhaps solve that alignment problem. But back to the article, the interview with Hassabis ended with this somewhat chilling response to the question "How worried should you be?" Hassabis says that no one really knows for sure that AI will become a major danger, but he is certain that if progress continues at its current pace, there isn't much time to develop safeguards.

I can see the kind of things we're building into the Gemini series, and we have no reason to believe they won't work. My own thoughts on this article are twofold.

First, that we might not want to underestimate Google and Hassabis, and that adding AlphaGo-type systems probably will work. And second, based on his comments, I do think there needs to be more clarity on just how much of Google DeepMind's workforce is working on these evaluations and pre-emptive measures.

This article from a few months ago estimates that there may be fewer than 100 researchers focused on those areas. Out of a workforce of well over a thousand, is that even 5% of the total? And if not, why take too seriously the safety commitments made at any AI summit, such as the one happening this autumn in the UK?

On the other hand, if Hassabis revealed that half or more of his workforce were on the case, then we could be more confident that the creators of AlphaGo and my fellow Londoners had a good chance of tree-searching to safety and success. As always, thank you so much for watching and have a wonderful day.