back to index

'Pause Giant AI Experiments' - Letter Breakdown w/ Research Papers, Altman, Sutskever and more


Chapters

0:0 Intro
1:43 Who signed
2:36 Current worries
3:14 AI alignment
4:3 Demis Hasabis
4:27 Emad Mustang
5:12 X Risk Analysis
8:16 Supplementary Diagram
8:46 Super Intelligence
9:26 Max Tegmark
10:47 AI Safety Statement
11:15 Human Extinction
12:44 Googles Response
13:39 Conclusion

Whisper Transcript | Transcript Only Page

00:00:00.160 | Less than 18 hours ago this letter was published calling for an immediate pause in training
00:00:05.920 | AI systems more powerful than GPT-4. By now you will have seen the headlines about it
00:00:11.520 | waving around eye-catching names such as Elon Musk but I want to show you not only what the letter
00:00:16.720 | says but also the research behind it. The letter quotes 18 supporting documents and I have either
00:00:22.560 | gone through or entirely read all of them. You will also hear from those at the top of OpenAI
00:00:28.480 | and Google on their thoughts. Whether you agree or disagree with the letter I hope you learn
00:00:33.120 | something. So what did it say? First they described the situation as AI labs locked in an out of
00:00:38.640 | control race to develop and deploy ever more powerful digital minds that no one, not even their
00:00:44.800 | creators, can understand predict or reliably control. They ask just because we can should we automate
00:00:51.680 | away all the jobs including the fulfilling ones and other questions like should we risk loss of control
00:00:56.960 | of our civilization. So what's their main ask? Well they quote OpenAI's AGI document. At some point
00:01:04.800 | it may be important to get independent review before starting to train future systems and for
00:01:10.320 | the most advanced efforts to agree to limit the rate of growth of compute used for creating new
00:01:15.520 | models and they say we agree that point is now. And here is their call:
00:01:20.240 | "Therefore we call on all AI labs to immediately pause for at least six months
00:01:25.440 | the training of AI systems more powerful than GPT-4. Notice that they are not saying shut down
00:01:31.200 | GPT-4 just saying don't train anything smarter or more advanced than GPT-4. They go on if such
00:01:38.080 | a pause cannot be enacted quickly governments should step in and institute a moratorium. I will
00:01:43.280 | come back to some other details in the letter later on but first let's glance at some of the
00:01:48.400 | eye-catching names who have signed this document. We have Stuart Russell who wrote the textbook on AI and Joshua Bengio,
00:01:53.920 | who pioneered deep learning. Among many other famous names we have the founder of stability AI
00:02:01.280 | which is behind stable diffusion. Of course I could go on and on but we also have names like
00:02:06.240 | Max Tegmark arguably one of the smartest people on the planet and if you notice below plenty of
00:02:11.760 | researchers at DeepMind. But before you dismiss this as a bunch of outsiders this is what Sam
00:02:17.760 | Altman once wrote in his blog. Many people seem to believe that superhuman machine intelligence
00:02:23.360 | would be very dangerous if it were developed but think that it's either never going to happen or
00:02:28.640 | definitely very far off. This is sloppy dangerous thinking. And a few days ago on the Lex Friedman
00:02:35.120 | podcast he said this: "I think it's weird when people like think it's like a big dunk that I say
00:02:39.440 | like I'm a little bit afraid and I think it'd be crazy not to be a little bit afraid
00:02:44.400 | and I empathize with people who are a lot afraid. Current worries that I have
00:02:48.320 | are that they're going to be disinformation problems or economic shocks or something else
00:02:55.680 | at a level far beyond anything we're prepared for and that doesn't require super intelligence
00:03:02.080 | that doesn't require a super deep alignment problem in the machine waking up and trying
00:03:06.000 | to deceive us and I don't think that gets enough attention. I mean it's starting to get more I guess.
00:03:14.080 | Before you think that's just Sam Altman being Sam Altman here's Ilya Satskova who arguably is the brains behind OpenAI and GPT-4.
00:03:22.960 | As somebody who deeply understands these models what is your intuition of how hard alignment will be?
00:03:27.040 | Like I think with the so here's what I would say I think with the current level of
00:03:30.400 | capabilities I think we have a pretty good set of ideas of how to align them
00:03:33.440 | but I would not underestimate the difficulty of alignment of models
00:03:37.920 | that are actually smarter than us of models that are capable of misrepresenting their intentions.
00:03:43.520 | By alignment he means matching up the goal of AI systems with our own and at this point I do want
00:03:49.280 | to say that there are reasons to have hope on AI alignment and many many people are working on it.
00:03:54.720 | I just don't want anyone to underestimate the scale of the task or to think it's just a bunch
00:04:00.320 | of outsiders not the creators themselves. Here was a recent interview by Time magazine with Demis
00:04:06.880 | Hassabis who many people say I sound like. He is the founder of course of DeepMind who are also at
00:04:11.600 | the cutting edge of large language development. He's also the founder of the company that I'm
00:04:13.280 | working with and he's been working on a lot of these things for a long time. He says when it
00:04:14.560 | comes to very powerful technologies and obviously AI is going to be one of the most powerful ever
00:04:19.680 | we need to be careful. Not everybody is thinking about those things. It's like experimentalists
00:04:24.480 | many of whom don't realise they're holding dangerous material. And again Emad Mostak I
00:04:29.200 | don't agree with everything in the letter but the race condition ramping as H100s come along
00:04:37.040 | is not safe for something the creators consider as potentially an existential risk. Time to
00:04:43.040 | take a breath, coordinate and carry on. This is only for the largest models. He went on that these
00:04:49.120 | models can get weird as they get more powerful. So it's not just AI outsiders but what about the
00:04:54.960 | research they cite? Those 18 supporting documents that I referred to? Well I read each of them.
00:05:00.160 | Now for some of them I had already read them. Like the Sparks report that I did a video on
00:05:05.040 | and the GPT-4 technical report that I also did a video on. Some others like the
00:05:09.120 | super intelligence book by Bostrom I had read when it first came out.
00:05:12.800 | One of the papers was called X-Risk Analysis for AI Research which are risks that threaten
00:05:17.680 | the entirety of humanity. Of course the paper had way too much to cover in one video but it
00:05:22.560 | did lay out 8 speculative hazards and failure modes including AI weaponisation, deception,
00:05:28.880 | power seeking behaviour. In the appendix they give some examples.
00:05:32.480 | Some are concerned that weaponising AI may be an on-ramp to more dangerous outcomes.
00:05:37.600 | In recent years deep reinforcement learning algorithms can outperform humans at aerial
00:05:42.560 | combat. While AlphaFold has discovered new chemical weapons and they go on to give plenty
00:05:47.440 | more examples of weaponisation. What about deception? I found this part interesting.
00:05:51.520 | They say that AI systems could also have incentives to bypass monitors and draw an analogy with
00:05:56.560 | Volkswagen who program their engines to reduce emissions only when being monitored. It says that
00:06:02.000 | future AI agents could similarly switch strategies when being monitored and take steps to obscure
00:06:06.880 | their deception from monitors. With power seeking behaviour they say it has been shown that agents have incentives
00:06:12.320 | to acquire and maintain power. And they end with this geopolitical quote:
00:06:16.960 | "Whoever becomes the leader in AI will become the ruler of the world."
00:06:20.720 | But again you might wonder if all of the research that was cited comes from outsiders. Well no.
00:06:25.920 | Richard Ngou was the lead author of this paper and he currently works at OpenAI. It's a fascinating
00:06:31.440 | document on the alignment problem from a deep learning perspective from insiders working with
00:06:36.480 | these models. The author was the guy who wrote this yesterday on Twitter: "I predict that by the
00:06:41.280 | end of 2021, the AI system will be able to take advantage of the current technology and the
00:06:42.080 | technology of the AI system. I believe that the AI system will be able to take advantage of the
00:06:44.640 | technology of the AI system and the AI system will be able to take advantage of the AI system.
00:06:46.960 | I believe that the AI system will be able to take advantage of the AI system and the AI system will
00:06:49.840 | be able to take advantage of the AI system and the AI system will be able to take advantage of the
00:06:52.960 | AI system and the AI system will be able to take advantage of the AI system and the AI system will
00:06:54.320 | be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system
00:06:56.800 | and the AI system will be able to take advantage of the AI system and the AI system will be able to take
00:06:59.200 | advantage of the AI system and the AI system will be able to take advantage of the AI system and the
00:07:00.960 | AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of
00:07:02.800 | Well many things but I have picked out some of the most interesting.
00:07:05.860 | It gave an example of reward hacking where an algorithm learnt to trick humans to get
00:07:10.600 | good feedback.
00:07:11.600 | The task was to grab a ball with a claw and it says that the policy instead learnt to
00:07:16.380 | place the claw between the camera and the ball in a way that it looked like it was grasping
00:07:21.820 | the ball and therefore mistakenly received high reward from human supervisors.
00:07:27.260 | Essentially deception to maximise reward.
00:07:29.620 | Of course it didn't mean to deceive it was just maximising its reward function.
00:07:33.980 | Next the paper gives details about why these models might want to seek power.
00:07:38.440 | It quotes the memorable phrase "you can't fetch coffee if you're dead" implying that
00:07:42.820 | even a policy or an algorithm with a simple goal like fetching coffee would pursue survival
00:07:48.980 | as an instrumental sub goal.
00:07:50.860 | In other words the model might realise that if it can't survive it can't achieve its
00:07:54.620 | reward it can't reach the goal that the humans set for it and therefore it will try
00:07:58.900 | to survive.
00:07:59.600 | Now I know many people will feel that I'm not covering enough of these fears or covering
00:08:03.920 | too many of them but I agree with the authors when they conclude with this "Reasoning
00:08:08.280 | about these topics is difficult but the stakes are sufficiently high that we cannot justify
00:08:13.220 | disregarding or postponing the work."
00:08:16.060 | Towards the end of this paper which was also cited by the letter it gave a very helpful
00:08:21.100 | supplementary diagram.
00:08:22.660 | It showed that even if you don't believe that unaligned AGI is a threat even current
00:08:27.620 | and near term AI complicate the process.
00:08:29.580 | It also showed that the process could complicate so many other relationships and dynamics.
00:08:33.280 | State to state relations, state to citizen relations, it could complicate social media
00:08:37.760 | and recommender systems, it could give the state too much control over citizens and corporations
00:08:42.620 | like Microsoft and Google too much leverage against the state.
00:08:46.180 | Before I get to some reasons for hope I want to touch on that seminal book superintelligence
00:08:51.340 | by Bostrom.
00:08:52.340 | I read it almost a decade ago and this quote sticks out:
00:08:55.520 | "Before the prospect of an intelligence explosion, we humans are like small trees
00:08:59.560 | and children playing with a bomb.
00:09:01.260 | Such is the mismatch between the power of our plaything and the immaturity of our conduct.
00:09:06.700 | Superintelligence is a challenge for which we are not ready now and will not be ready
00:09:10.420 | for a long time.
00:09:11.820 | We have little idea when the detonation will occur though if we hold the device to our
00:09:16.900 | ear we can hear a faint ticking sound."
00:09:19.740 | But now let's move on to Max Tegmark one of the signatories and a top physicist and
00:09:24.780 | AI researcher at MIT.
00:09:29.540 | Max Tegmark:
00:09:30.540 | "I think the most unsafe and reckless approach is the alternative to that is intelligible
00:09:39.520 | intelligence approach instead.
00:09:41.440 | Where we say neural networks is just a tool for the first step to get the intuition but
00:09:47.260 | then we're going to spend also serious resources on other AI techniques for demystifying this
00:09:53.900 | black box and figuring out what it's actually doing so we can convert it into something
00:09:58.380 | that's equally intelligent.
00:09:59.520 | But that we actually understand what it's doing."
00:10:02.100 | This aligns directly with what Ilya Sutskova, the Open AI chief scientist believes needs
00:10:07.240 | to be done.
00:10:08.240 | "Do you think we'll ever have a mathematical definition of alignment?"
00:10:11.900 | "Mathematical definition I think is unlikely.
00:10:16.760 | I do think that we will instead have multiple, rather than achieving one mathematical definition,
00:10:23.100 | I think we'll achieve multiple definitions that look at alignment from different aspects.
00:10:29.500 | We'll get the assurance that we want.
00:10:31.260 | And by which I mean you can look at the behavior.
00:10:33.620 | You can look at the behavior in various tests, in various adversarial stress situations.
00:10:39.920 | You can look at how the neural net operates from the inside.
00:10:42.780 | I think you have to look at several of these factors at the same time."
00:10:47.180 | And there are people working on this.
00:10:49.460 | Here is the AI safety statement from Anthropic, a huge player in this industry.
00:10:54.460 | In the section on mechanistic interpretability, which is understanding the machines, they
00:10:58.920 | say this:
00:10:59.480 | "We also understand significantly more about the mechanisms of neural network computation
00:11:05.140 | than we did even a year ago, such as those responsible for memorization."
00:11:09.020 | So progress is being made, but even if there's only a tiny risk of existential harm, more
00:11:14.280 | needs to be done.
00:11:15.280 | The co-founders of the Center for Humane Technology put it like this:
00:11:19.460 | "It would be the worst of all human mistakes to have ever been made.
00:11:23.240 | And we literally don't know how it works.
00:11:25.200 | We don't know all the things it will do, and we're putting it out there before we
00:11:29.460 | actually know whether it's safe."
00:11:31.220 | Raskin points to a recent survey of AI researchers, where nearly half said they believe there's
00:11:37.420 | at least a 10 percent chance AI could eventually result in an extremely bad outcome, like human
00:11:45.420 | extinction.
00:11:46.420 | "Where do you come down on that?"
00:11:48.180 | "I don't know.
00:11:49.180 | The point is..."
00:11:50.180 | "That scares me, you don't know."
00:11:52.180 | "Yeah.
00:11:53.180 | Here's the point.
00:11:54.180 | Imagine you're about to get on an airplane, and 50 percent of the engineers that built
00:11:58.320 | the airplane say there's a 10 percent chance that it's safe.
00:11:59.440 | And that's a 10 percent chance that their plane might crash and kill everyone."
00:12:03.400 | "Leave me at the gate."
00:12:04.400 | "Exactly."
00:12:05.400 | Here is the survey from last year of hundreds of AI researchers.
00:12:09.560 | And you can contrast that with a similar survey from seven years ago.
00:12:13.060 | The black bar represents the proportion of these researchers who believe, to differing
00:12:17.360 | degrees of probability, in extremely bad outcomes.
00:12:20.720 | You can see that it's small, but it is rising.
00:12:22.820 | One way to think of this is to use Sam Altman's own example of the Fermi Paradox, which is
00:12:27.600 | the strange fact that we can't see the future.
00:12:29.420 | We can't see or detect any aliens.
00:12:31.440 | He says, "One of my top four favorite explanations for the Fermi Paradox is that biological intelligence
00:12:36.980 | always eventually creates machine intelligence, which wipes out biological life and then for
00:12:41.540 | some reason decides to make itself undetectable."
00:12:44.240 | Others, such as Dustin Tran at Google, are not as impressed.
00:12:48.360 | He refers to the letter and says, "This call has valid concerns but is logistically impossible.
00:12:53.500 | It's hard to take seriously."
00:12:54.900 | He is a research scientist at Google Brain and the evaluation lead for BARD.
00:12:59.400 | There was another, indirect reaction that I found interesting.
00:13:02.620 | One of the other books referenced was the alignment problem: machine learning and human
00:13:06.240 | values.
00:13:07.240 | Now long before the letter even came out, the CEO of Microsoft read that book and gave
00:13:11.820 | this review.
00:13:12.820 | Nadella says that Christian offers a clear and compelling description and says that machines
00:13:17.920 | that learn for themselves become increasingly autonomous and potentially unethical.
00:13:22.700 | My next video is going to be on the reflection paper and how models like GPT-4 can teach
00:13:27.840 | themselves.
00:13:29.380 | I'm working with the co-author of that paper to give you guys more of an overview.
00:13:33.160 | Because even Nadella admits that if they learn for themselves and become autonomous it could
00:13:38.060 | be unethical.
00:13:39.060 | The letter concludes on a more optimistic note.
00:13:41.560 | They say, "This does not mean a pause on AI development in general, merely a stepping
00:13:46.560 | back from the dangerous race to ever larger, unpredictable black box models with emergent
00:13:52.640 | capabilities like self-teaching."
00:13:54.720 | I've got so much more to say on self-teaching but that will have to wait until the next video.
00:13:59.360 | For now though, let's end on this note.
00:14:01.260 | Let's enjoy a long AI summer, not rush unprepared into a fall.
00:14:06.420 | Thanks for watching all the way to the end and let me know what you think.