Alpha Everywhere: AlphaGeometry, AlphaCodium and the Future of LLMs
00:00:00.000 |
24 hours ago, Google DeepMind released AlphaGeometry, 00:00:04.360 |
and while their leaders are calling it a step toward AGI, 00:00:08.600 |
the team itself is warning everyone not to overhype it. 00:00:12.720 |
I've read the paper in Nature and the press releases, 00:00:16.880 |
and I'll get into what hitting gold-medal level for geometry in the International Math Olympiad 00:00:22.880 |
signifies about the growing alliance between language 00:00:26.280 |
models and search, between idea generation and brute force. 00:00:30.520 |
In that same vein, we'll also take a quick peek 00:00:33.600 |
at AlphaCodium, the brand new open-sourced rival to AlphaCode 2. 00:00:40.600 |
But let's start all the way down in the day-to-day way 00:00:48.200 |
If you think this is the way to go to get kids interested, 00:01:07.640 |
A tiny distance, dx, multiplied by the height, which 00:01:17.080 |
For those who don't know, the International Math Olympiad 00:01:20.360 |
is the most prestigious math competition in the world. 00:01:26.000 |
Competitors typically have to come through tough national selection just to get into the International Math Olympiad. 00:01:34.560 |
AlphaGeometry scores almost as highly as the average IMO gold medalist, 00:01:38.920 |
but specifically on a subset of geometry problems only. 00:01:46.720 |
So it's not like AlphaGeometry sat a full IMO test. 00:01:53.080 |
Nevertheless, getting a gold medal overall in the IMO 00:01:56.680 |
has long been one of the holy grails of machine learning. 00:02:00.600 |
That's maybe why one of the co-founders of DeepMind celebrated the result. 00:02:05.000 |
And even Demis Hassabis, the leader of DeepMind, weighed in: 00:02:11.280 |
this represents another step on the road to AGI. 00:02:21.080 |
Then again, he might also have read some of the caveats in the paper itself. 00:02:49.880 |
Well, AlphaGeometry is a neuro-symbolic system, 00:02:55.280 |
a marriage between a neural language model and the old-fashioned symbolic, pre-programmed kind of system. 00:02:58.560 |
And in fact, that alliance between large language models, 00:03:01.120 |
neural networks, and old-fashioned pre-programmed systems is going to come up again and again. 00:03:06.560 |
Idea generation, and you could call it creativity, paired with brute-force checking: 00:03:11.680 |
that alliance, I predict, will in the future yield AGI. 00:03:19.440 |
Take a simple example: proving that two angles are equal in an isosceles triangle. 00:03:30.040 |
The thing is, symbolic systems aren't designed to come up with the creative extra constructions that some proofs need. 00:03:38.720 |
The language model in this case was only 151 million parameters 00:03:43.360 |
and it was trained on purely synthetic data. 00:03:48.400 |
Generating that data, roughly 100 million synthetic theorems and proofs, was all about getting the model to provide proofs. 00:03:53.600 |
In 91 million of those samples, brute force would be enough, 00:03:57.400 |
just step-by-step deduction using known rules. 00:04:03.960 |
The remaining samples, though, needed extra auxiliary points or lines before the deduction could go through. The authors call those constructions pulling rabbits out of the hat. 00:04:06.280 |
And the language model was fine-tuned on those examples. 00:04:09.480 |
It paid particular attention to those examples. 00:04:12.440 |
Basically, it got really good at suggesting such constructs. 00:04:16.040 |
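To make that recipe concrete, here is a minimal sketch of the idea, assuming a hypothetical `SyntheticProof` record and a stand-in `fine_tune` routine. It only illustrates separating deduction-only proofs from the construction-bearing ones that get extra attention during training; it is not DeepMind's actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SyntheticProof:
    premises: str                # e.g. "triangle ABC with AB = AC"
    goal: str                    # e.g. "angle ABC = angle ACB"
    construction: Optional[str]  # e.g. "let D be the midpoint of BC", or None
    steps: List[str]             # deduction steps found by the symbolic engine

def split_corpus(corpus: List[SyntheticProof]):
    """Separate deduction-only proofs from those that needed a 'rabbit'."""
    deduction_only = [p for p in corpus if p.construction is None]
    needs_rabbit = [p for p in corpus if p.construction is not None]
    return deduction_only, needs_rabbit

# Stage 1: train on all proofs; stage 2: continue on the construction-bearing
# subset so the model gets especially good at proposing such constructs.
# fine_tune(model, deduction_only + needs_rabbit)   # hypothetical call
# fine_tune(model, needs_rabbit)                    # hypothetical call
```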
Going back to the isosceles triangle example, the moment you posit that extra line, 00:04:19.240 |
an old-fashioned symbolic deducer could then solve the rest, 00:04:27.280 |
proving that the angle at B and the angle at C are equal. 00:04:29.640 |
If, by the way, the deducer couldn't solve the problem, the language model would simply be asked for another construction, and the loop would go round again. 00:04:36.280 |
While most of that training data involved fairly basic proofs, 00:04:54.160 |
the synthetic theorems tend not to be symmetrical like human-discovered theorems, 00:04:57.840 |
as they are not biased towards any aesthetic standard. 00:05:05.480 |
The lead author of the paper put this really well 00:05:10.720 |
and pointed out that the approach isn't fully novel. 00:05:13.760 |
- The general observation here is that given a hard problem, 00:05:16.720 |
we usually have to come up with one or more rabbits, the auxiliary constructions, 00:05:25.120 |
so that the mechanical solver can just take the problem the rest of the way. 00:05:28.400 |
But if the solver failed to solve the problem, 00:05:31.120 |
then we can always come back and ask for more rabbits. 00:05:44.280 |
So we have a language model that is trained to propose these magic constructions, 00:05:48.560 |
and a symbolic engine that is tasked with handling all the mechanical cases. 00:05:53.200 |
And then we put these two components into a loop, one proposing and the other checking. 00:06:02.800 |
So this observation, the neuro-symbolic structure, 00:06:06.000 |
is not a novel observation that is made in our work. 00:06:11.480 |
Earlier works have already pointed out that a major limitation of purely symbolic systems is coming up with these constructions. 00:06:23.600 |
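To make that loop concrete, here is a minimal sketch in Python, assuming two hypothetical components: a `language_model` with a `propose` method that suggests an auxiliary construction, and a `symbolic_engine` with a `solve` method that exhaustively applies known deduction rules. It illustrates the propose-and-check structure, not DeepMind's implementation.

```python
def prove(problem, language_model, symbolic_engine, max_constructions=10):
    """Alternate between symbolic deduction and neural 'rabbit' proposals."""
    constructions = []
    for _ in range(max_constructions):
        # Mechanical part: try to close the proof by pure deduction
        # from the premises plus any constructions added so far.
        proof = symbolic_engine.solve(problem, constructions)
        if proof is not None:
            return constructions, proof
        # Creative part: ask the neural model for one more construction,
        # conditioned on the problem and everything tried so far.
        constructions.append(language_model.propose(problem, constructions))
    return None  # gave up within the construction budget
```

The symbolic step is attempted first on each pass because its deductions are exhaustive and checkable; the language model is only consulted when pure deduction stalls.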
- Geometry, it seems, might be particularly amenable to this kind of neuro-symbolic approach. 00:06:27.800 |
As one IMO gold medalist and Fields medalist put it, 00:06:35.880 |
"in the sense that we have a rather small number 00:06:44.920 |
Contrast a small, specialized model trained on a hundred million proofs with GPT-4, which, prompted on its own in the paper's comparison, solved none of the benchmark problems. 00:06:54.360 |
Of course, deciding which of the many possible constructs to try is where the search, and the compute budget, comes in. 00:07:17.280 |
Speaking of search and compute budget though, 00:07:21.080 |
they used NVIDIA's V100 GPUs and said, somewhat modestly, 00:07:25.240 |
"Scaling up these factors to examine a larger fraction 00:07:30.200 |
"of the search space might improve AlphaGeometry results even further." 00:07:35.280 |
That stands out because the V100 was replaced back in 2020 with the A100, 00:07:42.920 |
which was later succeeded by the H100. And yes, I know I pronounce my H's in a Cockney way. 00:07:56.440 |
So the fact that they used V100s is incredibly impressive. 00:07:59.560 |
I feel like the bitter lesson is gonna strike again soon 00:08:02.400 |
and IMO geometry is gonna be all but solved by next year. 00:08:06.320 |
I must caution though that this had been foreseen, 00:08:20.920 |
DeepMind, in their blog post, go a bit further though. 00:08:23.520 |
They described this as demonstrating AI's growing ability to reason logically. 00:08:30.960 |
I feel like there might be years more of debate 00:08:33.040 |
over whether it's appropriate to use that word, reason, in this context. 00:08:37.280 |
But in the end, it might end up being semantics. 00:08:44.080 |
Within a year, they hope it will be inside Google's Gemini. 00:08:47.840 |
Remember, Google also promised that AlphaCode 2 would be coming to Gemini. 00:08:47.840 |
And I do wonder if this is an example of mathematics falling first, 00:08:56.000 |
which would then lead to a torrent of results 00:08:59.220 |
that will impact everything in theoretical science. 00:09:01.560 |
As the co-founder of xAI and former Googler put it, 00:09:13.400 |
He said it's not easily generalizable to other domains 00:09:24.080 |
But speaking of AlphaCode and open sourcing, 00:09:31.280 |
AlphaCodium has just been released, and is claimed to beat AlphaCode 2 without fine-tuning. 00:09:35.000 |
All the relevant links will be in the description. 00:09:37.520 |
But there's another reason why I bring it up in this video, 00:09:40.000 |
not just that it's brand new and state of the art, 00:09:42.560 |
but also that it's the same theme of LLMs proposing solutions 00:09:46.380 |
and iterating based on feedback from the environment. 00:10:11.680 |
What I'm discovering, the same thing as the authors, 00:10:20.920 |
is that if you force an LLM into an immediate answer, 00:10:24.140 |
it will just pick an answer and then stick to it. 00:10:33.560 |
That's probably why chain of thought works so well. 00:10:36.040 |
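As a small illustration (my own wording, not a template taken from any of these papers), here are two prompt styles: one that demands an immediate answer, and one that lets the model lay out its reasoning before committing.

```python
question = "In triangle ABC, AB = AC and angle BAC = 40 degrees. What is angle ABC?"

# Forces an immediate commitment to an answer.
direct_prompt = (
    f"{question}\n"
    "Answer with the final number only."
)

# Chain-of-thought style: reason first, commit last.
step_by_step_prompt = (
    f"{question}\n"
    "First restate what is given and what is asked, reason step by step,\n"
    "and only then give the final answer on the last line."
)

print(direct_prompt)
print(step_by_step_prompt)
```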
Here's a great summary from Santiago on Twitter. 00:10:39.160 |
"First, AlphaCodeum gets the LLM and its model agnostic 00:10:45.760 |
and focus on the goal, inputs, outputs, rules, et cetera. 00:10:48.600 |
Then make the model reason about the tests it would need. 00:10:53.560 |
in order of correctness, simplicity, and robustness. 00:10:56.340 |
Now generate more diverse tests for edge cases." 00:11:04.840 |
If the tests fail, improve the code and repeat the process. 00:11:08.160 |
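Here is a condensed sketch of that flow, under the assumption of two hypothetical helpers: `llm`, a function returning the model's text response to a prompt, and `run_tests`, a sandboxed test runner returning a report with `all_passed` and `failures` fields. It mirrors the steps just described rather than reproducing the authors' actual code.

```python
def alphacodium_style_solve(problem: str, llm, run_tests, max_rounds: int = 5):
    # 1. Reflect on the problem: goal, inputs, outputs, rules.
    reflection = llm(f"Restate the goal, inputs, outputs and rules of:\n{problem}")
    # 2. Reason about the tests a correct solution would need to satisfy.
    test_plan = llm(f"What tests would a correct solution need?\n{reflection}")
    # 3. Generate candidate solutions and rank by correctness, simplicity, robustness.
    candidates = llm(f"Propose a few candidate solutions for:\n{reflection}")
    code = llm(f"Pick the best by correctness, simplicity and robustness:\n{candidates}")
    # 4. Generate additional diverse tests covering edge cases.
    tests = llm(f"Write additional diverse tests covering edge cases.\n{test_plan}")
    # 5. Iterate: run the tests, feed failures back, improve the code.
    for _ in range(max_rounds):
        report = run_tests(code, tests)
        if report.all_passed:
            return code
        code = llm(f"These tests failed:\n{report.failures}\nImprove the code:\n{code}")
    return code  # best effort after the iteration budget
```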
I can't help but notice that this is eerily reminiscent of an approach I experimented with myself, 00:11:14.480 |
but what it involved was commanding the model to think the problem through first. 00:11:22.580 |
Then I would force it to come up with test cases. 00:11:25.200 |
And the rest of the steps I might cover in another video. 00:11:35.920 |
It's almost like you're forcing it to reason logically. 00:11:44.640 |
The paper reports clear accuracy improvements compared to direct prompting across a range of models. 00:11:56.880 |
And this theme of proposing and then verifying just keeps occurring again and again in the literature. 00:12:01.200 |
And if you haven't seen my video on that, do check it out. 00:12:06.260 |
and these would be tested in a simulated environment 00:12:10.840 |
And even the notorious LLM skeptic, Professor Rao, makes a similar point in his papers, 00:12:29.400 |
which "additionally show that external verifiers 00:12:31.480 |
can help provide feedback on the generated plans 00:12:34.260 |
and back prompt the LLM for better plan generation." 00:12:49.040 |
on its implications for embodiment and robotics. 00:12:51.680 |
I also interviewed Professor Rao for this video 00:12:54.640 |
on reasoning as the Holy Grail for artificial intelligence. 00:12:58.320 |
While we're here though, I can't resist mentioning 00:13:00.800 |
that I also released this video tonight on AI Insiders. 00:13:04.640 |
Basically, it's my attempt through analyzing five papers 00:13:19.080 |
He's an AI Insider himself and one of the benefits 00:13:25.920 |
The best of these I'll talk about on the main channel, 00:13:32.120 |
who is a cybersecurity consultant based in London. 00:13:47.820 |
these amazing detailed diagrams to explain certain topics. 00:13:51.700 |
If you wanna know what I mean, check out his channel. 00:14:02.540 |
But what about the workers at Google DeepMind? 00:14:04.780 |
Well, for those workers, Google is reportedly spending hundreds of thousands 00:14:07.660 |
to millions of dollars to keep them at Google. 00:14:13.500 |
That comes amid reports of rivals courting some of Google's Gemini contributors since October. 00:14:16.340 |
Indeed, money-wise, I would say things are heating up. 00:14:20.700 |
I imagine Samsung have signed a multi-billion dollar contract 00:14:24.180 |
to get access to Google Gemini models in their smartphones. 00:14:27.900 |
And apparently, Samsung will be among the first partners to test Gemini Ultra before it's broadly available. 00:14:34.980 |
AlphaGeometry and AlphaCodium are definitely not AGI, but neither is the race to AGI slowing down. 00:14:41.820 |
Thank you so much for watching and have a wonderful day.