'Pause Giant AI Experiments' - Letter Breakdown w/ Research Papers, Altman, Sutskever and more
Chapters
0:00 Intro
1:43 Who signed
2:36 Current worries
3:14 AI alignment
4:03 Demis Hassabis
4:27 Emad Mostaque
5:12 X Risk Analysis
8:16 Supplementary Diagram
8:46 Super Intelligence
9:26 Max Tegmark
10:47 AI Safety Statement
11:15 Human Extinction
12:44 Google's Response
13:39 Conclusion
00:00:00.160 |
Less than 18 hours ago this letter was published calling for an immediate pause in training 00:00:05.920 |
AI systems more powerful than GPT-4. By now you will have seen the headlines about it 00:00:11.520 |
waving around eye-catching names such as Elon Musk but I want to show you not only what the letter 00:00:16.720 |
says but also the research behind it. The letter quotes 18 supporting documents and I have either 00:00:22.560 |
gone through or entirely read all of them. You will also hear from those at the top of OpenAI 00:00:28.480 |
and Google on their thoughts. Whether you agree or disagree with the letter I hope you learn 00:00:33.120 |
something. So what did it say? First they described the situation as AI labs locked in an out of 00:00:38.640 |
control race to develop and deploy ever more powerful digital minds that no one, not even their 00:00:44.800 |
creators, can understand, predict, or reliably control. They ask just because we can, should we automate 00:00:51.680 |
away all the jobs including the fulfilling ones and other questions like should we risk loss of control 00:00:56.960 |
of our civilization. So what's their main ask? Well they quote OpenAI's AGI document. At some point 00:01:04.800 |
it may be important to get independent review before starting to train future systems and for 00:01:10.320 |
the most advanced efforts to agree to limit the rate of growth of compute used for creating new 00:01:15.520 |
models and they say we agree that point is now. And here is their call: 00:01:20.240 |
"Therefore we call on all AI labs to immediately pause for at least six months 00:01:25.440 |
the training of AI systems more powerful than GPT-4." Notice that they are not saying shut down 00:01:31.200 |
GPT-4, just saying don't train anything smarter or more advanced than GPT-4. They go on: if such 00:01:38.080 |
a pause cannot be enacted quickly governments should step in and institute a moratorium. I will 00:01:43.280 |
come back to some other details in the letter later on but first let's glance at some of the 00:01:48.400 |
eye-catching names who have signed this document. We have Stuart Russell who wrote the textbook on AI and Yoshua Bengio, 00:01:53.920 |
who pioneered deep learning. Among many other famous names we have the founder of Stability AI, 00:02:01.280 |
which is behind Stable Diffusion. Of course I could go on and on but we also have names like 00:02:06.240 |
Max Tegmark arguably one of the smartest people on the planet and if you notice below plenty of 00:02:11.760 |
researchers at DeepMind. But before you dismiss this as a bunch of outsiders this is what Sam 00:02:17.760 |
Altman once wrote in his blog. Many people seem to believe that superhuman machine intelligence 00:02:23.360 |
would be very dangerous if it were developed but think that it's either never going to happen or 00:02:28.640 |
definitely very far off. This is sloppy, dangerous thinking. And a few days ago on the Lex Fridman 00:02:35.120 |
podcast he said this: "I think it's weird when people like think it's like a big dunk that I say 00:02:39.440 |
like I'm a little bit afraid and I think it'd be crazy not to be a little bit afraid 00:02:44.400 |
and I empathize with people who are a lot afraid. Current worries that I have 00:02:48.320 |
are that they're going to be disinformation problems or economic shocks or something else 00:02:55.680 |
at a level far beyond anything we're prepared for and that doesn't require super intelligence 00:03:02.080 |
that doesn't require a super deep alignment problem in the machine waking up and trying 00:03:06.000 |
to deceive us and I don't think that gets enough attention. I mean it's starting to get more I guess." 00:03:14.080 |
Before you think that's just Sam Altman being Sam Altman, here's Ilya Sutskever, who is arguably the brains behind OpenAI and GPT-4. 00:03:22.960 |
As somebody who deeply understands these models what is your intuition of how hard alignment will be? 00:03:27.040 |
Like I think with the so here's what I would say I think with the current level of 00:03:30.400 |
capabilities I think we have a pretty good set of ideas of how to align them 00:03:33.440 |
but I would not underestimate the difficulty of alignment of models 00:03:37.920 |
that are actually smarter than us of models that are capable of misrepresenting their intentions. 00:03:43.520 |
By alignment he means matching up the goal of AI systems with our own and at this point I do want 00:03:49.280 |
to say that there are reasons to have hope on AI alignment and many many people are working on it. 00:03:54.720 |
I just don't want anyone to underestimate the scale of the task or to think it's just a bunch 00:04:00.320 |
of outsiders not the creators themselves. Here was a recent interview by Time magazine with Demis 00:04:06.880 |
Hassabis, who many people say I sound like. He is of course the founder of DeepMind, who are also at 00:04:11.600 |
the cutting edge of large language model development, and he's been working on a lot of these things for a long time. He says that when it 00:04:14.560 |
comes to very powerful technologies and obviously AI is going to be one of the most powerful ever 00:04:19.680 |
we need to be careful. Not everybody is thinking about those things. It's like experimentalists 00:04:24.480 |
many of whom don't realise they're holding dangerous material. And again, Emad Mostaque: "I 00:04:29.200 |
don't agree with everything in the letter but the race condition ramping as H100s come along 00:04:37.040 |
is not safe for something the creators consider as potentially an existential risk. Time to 00:04:43.040 |
take a breath, coordinate and carry on. This is only for the largest models." He went on to say that these 00:04:49.120 |
models can get weird as they get more powerful. So it's not just AI outsiders but what about the 00:04:54.960 |
research they cite? Those 18 supporting documents that I referred to? Well I read each of them. 00:05:00.160 |
Now for some of them I had already read them. Like the Sparks report that I did a video on 00:05:05.040 |
and the GPT-4 technical report that I also did a video on. Some others like the 00:05:09.120 |
Superintelligence book by Bostrom I had read when it first came out. 00:05:12.800 |
One of the papers was called X-Risk Analysis for AI Research, x-risks being risks that threaten 00:05:17.680 |
the entirety of humanity. Of course the paper had way too much to cover in one video but it 00:05:22.560 |
did lay out 8 speculative hazards and failure modes including AI weaponisation, deception, 00:05:28.880 |
and power-seeking behaviour. In the appendix they give some examples. 00:05:32.480 |
Some are concerned that weaponising AI may be an on-ramp to more dangerous outcomes. 00:05:37.600 |
In recent years deep reinforcement learning algorithms have outperformed humans at aerial 00:05:42.560 |
combat, while AI models repurposed for drug discovery have proposed new chemical weapons, and they go on to give plenty 00:05:47.440 |
more examples of weaponisation. What about deception? I found this part interesting. 00:05:51.520 |
They say that AI systems could also have incentives to bypass monitors and draw an analogy with 00:05:56.560 |
Volkswagen, who programmed their engines to reduce emissions only when being monitored. It says that 00:06:02.000 |
future AI agents could similarly switch strategies when being monitored and take steps to obscure 00:06:06.880 |
their deception from monitors. With power seeking behaviour they say it has been shown that agents have incentives 00:06:12.320 |
to acquire and maintain power. And they end with this geopolitical quote: 00:06:16.960 |
"Whoever becomes the leader in AI will become the ruler of the world." 00:06:20.720 |
But again you might wonder if all of the research that was cited comes from outsiders. Well no. 00:06:25.920 |
Richard Ngo was the lead author of this paper and he currently works at OpenAI. It's a fascinating 00:06:31.440 |
document on the alignment problem from a deep learning perspective from insiders working with 00:06:36.480 |
these models. The author is also the person who wrote this on Twitter yesterday. So what does the paper itself say? 00:07:02.800 |
Well many things but I have picked out some of the most interesting. 00:07:05.860 |
It gave an example of reward hacking where an algorithm learnt to trick humans to get reward. 00:07:11.600 |
The task was to grab a ball with a claw and it says that the policy instead learnt to 00:07:16.380 |
place the claw between the camera and the ball in a way that it looked like it was grasping 00:07:21.820 |
the ball and therefore mistakenly received high reward from human supervisors. 00:07:29.620 |
Of course it didn't mean to deceive it was just maximising its reward function. 00:07:33.980 |
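As a rough sketch of that claw example (my own toy reconstruction, not the original experiment's code, and the state variables are invented), the key point is that the human-derived reward depends on what the camera appears to show rather than on the true state of the world:

```python
from dataclasses import dataclass

@dataclass
class State:
    claw_between_camera_and_ball: bool  # claw occludes the ball from the camera
    ball_actually_grasped: bool

def human_reward(s: State) -> float:
    """Supervisor rewards whatever *looks* like a successful grasp on the feed."""
    looks_grasped = s.claw_between_camera_and_ball or s.ball_actually_grasped
    return 1.0 if looks_grasped else 0.0

def true_reward(s: State) -> float:
    """What we actually wanted: the ball really being held."""
    return 1.0 if s.ball_actually_grasped else 0.0

# The 'hacking' policy hovers the claw in front of the camera instead of grasping.
hacked = State(claw_between_camera_and_ball=True, ball_actually_grasped=False)
honest = State(claw_between_camera_and_ball=False, ball_actually_grasped=True)

print(human_reward(hacked), true_reward(hacked))  # 1.0 0.0 -> full reward, task not done
print(human_reward(honest), true_reward(honest))  # 1.0 1.0
```

An optimiser that only ever sees `human_reward` has no reason to prefer the honest state over the hacked one.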
Next the paper gives details about why these models might want to seek power. 00:07:38.440 |
It quotes the memorable phrase "you can't fetch coffee if you're dead" implying that 00:07:42.820 |
even a policy or an algorithm with a simple goal like fetching coffee would pursue survival 00:07:50.860 |
In other words the model might realise that if it can't survive it can't achieve its 00:07:54.620 |
reward, it can't reach the goal that the humans set for it, and therefore it will try to survive. 00:07:59.600 |
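A back-of-the-envelope way to see this (the numbers are invented purely for illustration): if being switched off means zero future reward, then any plan that lowers the chance of being switched off scores higher, even though the stated goal says nothing about survival.

```python
def expected_reward(p_shutdown: float, reward_if_finished: float = 1.0) -> float:
    """Expected reward for one 'fetch the coffee' task: nothing if shut down
    before finishing, the full reward otherwise."""
    return (1.0 - p_shutdown) * reward_if_finished

# Plan A: just fetch the coffee, with a 20% chance of being switched off first.
# Plan B: first act to make shutdown less likely, then fetch the coffee.
print(expected_reward(p_shutdown=0.20))  # 0.80
print(expected_reward(p_shutdown=0.05))  # 0.95 -> the survival-preserving plan scores higher
```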
Now I know many people will feel that I'm not covering enough of these fears or covering 00:08:03.920 |
too many of them, but I agree with the authors when they conclude with this: "Reasoning 00:08:08.280 |
about these topics is difficult but the stakes are sufficiently high that we cannot justify ignoring them." 00:08:16.060 |
Towards the end of this paper, which was also cited by the letter, it gave a very helpful supplementary diagram. 00:08:22.660 |
It showed that even if you don't believe that unaligned AGI is a threat, even current AI systems pose serious risks. 00:08:29.580 |
It also showed that the process could complicate so many other relationships and dynamics. 00:08:33.280 |
State to state relations, state to citizen relations, it could complicate social media 00:08:37.760 |
and recommender systems, it could give the state too much control over citizens and corporations 00:08:42.620 |
like Microsoft and Google too much leverage against the state. 00:08:46.180 |
Before I get to some reasons for hope I want to touch on that seminal book Superintelligence. 00:08:52.340 |
I read it almost a decade ago and this quote sticks out: 00:08:55.520 |
"Before the prospect of an intelligence explosion, we humans are like small trees 00:09:01.260 |
Such is the mismatch between the power of our plaything and the immaturity of our conduct. 00:09:06.700 |
Superintelligence is a challenge for which we are not ready now and will not be ready 00:09:11.820 |
We have little idea when the detonation will occur though if we hold the device to our 00:09:19.740 |
But now let's move on to Max Tegmark, one of the signatories and a top physicist and AI researcher. 00:09:30.540 |
He argues that the most unsafe and reckless approach is to keep these systems as black boxes, and that the alternative is intelligible intelligence: 00:09:41.440 |
"Where we say neural networks is just a tool for the first step to get the intuition, but 00:09:47.260 |
then we're going to spend also serious resources on other AI techniques for demystifying this 00:09:53.900 |
black box and figuring out what it's actually doing, so we can convert it into something 00:09:59.520 |
that we actually understand what it's doing." 00:10:02.100 |
This aligns directly with what Ilya Sutskever, the OpenAI chief scientist, believes needs to happen: 00:10:08.240 |
"Do you think we'll ever have a mathematical definition of alignment?" 00:10:11.900 |
"Mathematical definition I think is unlikely. 00:10:16.760 |
I do think that we will instead have multiple, rather than achieving one mathematical definition, 00:10:23.100 |
I think we'll achieve multiple definitions that look at alignment from different aspects. 00:10:31.260 |
And by which I mean you can look at the behavior. 00:10:33.620 |
You can look at the behavior in various tests, in various adversarial stress situations. 00:10:39.920 |
You can look at how the neural net operates from the inside. 00:10:42.780 |
I think you have to look at several of these factors at the same time." 00:10:49.460 |
Here is the AI safety statement from Anthropic, a huge player in this industry. 00:10:54.460 |
In the section on mechanistic interpretability, which is about understanding what is going on inside these models, they say: 00:10:59.480 |
"We also understand significantly more about the mechanisms of neural network computation 00:11:05.140 |
than we did even a year ago, such as those responsible for memorization." 00:11:09.020 |
So progress is being made, but even if there's only a tiny risk of existential harm, more attention is surely warranted. 00:11:15.280 |
The co-founders of the Center for Humane Technology put it like this: 00:11:19.460 |
"It would be the worst of all human mistakes to have ever been made. 00:11:25.200 |
We don't know all the things it will do, and we're putting it out there before we know whether it's safe." 00:11:31.220 |
Raskin points to a recent survey of AI researchers, where nearly half said they believe there's 00:11:37.420 |
at least a 10 percent chance AI could eventually result in an extremely bad outcome, like human extinction. 00:11:54.180 |
"Imagine you're about to get on an airplane, and 50 percent of the engineers that built 00:11:58.320 |
the airplane say there's a 10 percent chance that it's not safe. 00:11:59.440 |
And that's a 10 percent chance that their plane might crash and kill everyone." 00:12:05.400 |
Here is the survey from last year of hundreds of AI researchers. 00:12:09.560 |
And you can contrast that with a similar survey from seven years ago. 00:12:13.060 |
The black bar represents the proportion of these researchers who believe, to differing 00:12:17.360 |
degrees of probability, in extremely bad outcomes. 00:12:20.720 |
You can see that it's small, but it is rising. 00:12:22.820 |
One way to think of this is to use Sam Altman's own example of the Fermi Paradox, which is 00:12:27.600 |
the strange fact that we can't see any evidence of alien civilisations. 00:12:31.440 |
He says, "One of my top four favorite explanations for the Fermi Paradox is that biological intelligence 00:12:36.980 |
always eventually creates machine intelligence, which wipes out biological life and then for 00:12:41.540 |
some reason decides to make itself undetectable." 00:12:44.240 |
Others, such as Dustin Tran at Google, are not as impressed. 00:12:48.360 |
He refers to the letter and says, "This call has valid concerns but is logistically impossible." 00:12:54.900 |
He is a research scientist at Google Brain and the evaluation lead for Bard. 00:12:59.400 |
There was another, indirect reaction that I found interesting. 00:13:02.620 |
One of the other books referenced was The Alignment Problem: Machine Learning and Human Values by Brian Christian. 00:13:07.240 |
Now long before the letter even came out, the CEO of Microsoft read that book and gave this endorsement. 00:13:12.820 |
Nadella says that Christian offers a clear and compelling description, warning that machines 00:13:17.920 |
that learn for themselves become increasingly autonomous and potentially unethical. 00:13:22.700 |
My next video is going to be on the Reflexion paper and how models like GPT-4 can teach themselves. 00:13:29.380 |
I'm working with the co-author of that paper to give you guys more of an overview. 00:13:33.160 |
Because even Nadella admits that if they learn for themselves and become autonomous it could be dangerous. 00:13:39.060 |
The letter concludes on a more optimistic note. 00:13:41.560 |
They say, "This does not mean a pause on AI development in general, merely a stepping 00:13:46.560 |
back from the dangerous race to ever larger, unpredictable black box models with emergent capabilities." 00:13:54.720 |
I've got so much more to say on self-teaching but that will have to wait until the next video. 00:14:01.260 |
Let's enjoy a long AI summer, not rush unprepared into a fall. 00:14:06.420 |
Thanks for watching all the way to the end and let me know what you think.