Stuart Russell: The Control Problem of Super-Intelligent AI | AI Podcast Clips
Chapters
0:00 Control Problem
1:43 King Midas Problem
4:08 Human Values
00:00:00.000 |
Let's just talk about maybe the control problem. 00:00:04.440 |
So this idea of losing the ability to control the behavior of our AI systems. 00:00:19.920 |
Well, so it doesn't take a genius to realize that if you make something that's smarter than you, you might have a problem. 00:00:29.080 |
You know, Alan Turing wrote about this and gave lectures about this in 1951. 00:00:39.080 |
He did a lecture on the radio, and he basically says, you know, once the machine thinking 00:00:46.320 |
method starts, very quickly they'll outstrip humanity. 00:00:53.120 |
And, you know, if we're lucky, I think he says, we may be able to 00:00:59.480 |
turn off the power at strategic moments, but even so our species would be humbled. 00:01:07.880 |
Because, you know, if it's a sufficiently intelligent machine, it's not going to let you switch it off. 00:01:14.680 |
So what do you think is meant, just for a quick tangent, if we shut off this superintelligent machine, that our species would be humbled? 00:01:24.120 |
I think he means that we would realize that we are inferior, right? 00:01:30.800 |
That we only survived by the skin of our teeth because we happened to get to the off switch in time. 00:01:38.920 |
And if we hadn't, then we would have lost control over the earth. 00:01:43.080 |
So are you more worried when you think about this stuff about superintelligent AI, or are 00:01:49.400 |
you more worried about super powerful AI that's not aligned with our values? 00:01:59.400 |
I think, so the main problem I'm working on is the control problem, the problem of machines 00:02:08.200 |
pursuing objectives that are, as you say, not aligned with human objectives. 00:02:14.600 |
And this has been the way we've thought about AI since the beginning. 00:02:22.960 |
You build a machine for optimizing, and then you put in some objective and it optimizes. 00:02:29.840 |
And we can think of this as the King Midas problem. 00:02:35.680 |
Because King Midas put in this objective, everything I touch should turn to gold, and 00:02:43.000 |
the gods, that's like the machine, they said, "Okay, done." 00:02:48.520 |
And of course his food and his drink and his family all turn to gold, and then he dies of starvation, a miserable death. 00:02:56.160 |
And this is a warning, it's a failure mode that pretty much every culture in history has a story about. 00:03:07.800 |
There's the genie that gives you three wishes, and the third wish is always, "Please undo the first two wishes." 00:03:17.760 |
And when Arthur Samuel wrote his checkers-playing program, which learned to play checkers considerably 00:03:25.080 |
better than Arthur Samuel could play, and actually reached a pretty decent standard, 00:03:32.480 |
Norbert Wiener, who was one of the major mathematicians of the 20th century, sort of the father of 00:03:38.360 |
modern automation control systems, he saw this and he basically extrapolated, as Turing 00:03:46.240 |
did, and said, "Okay, this is how we could lose control." 00:03:52.600 |
And specifically that we have to be certain that the purpose we put into the machine is the purpose we really desire. 00:04:10.200 |
You mean it's very difficult to encode, to put our values on paper is really difficult, or is it impossible? 00:04:23.080 |
So theoretically it's possible, but in practice it's extremely unlikely that we could specify 00:04:31.680 |
correctly in advance the full range of concerns of humanity. 00:04:37.400 |
You talked about cultural transmission of values, I think is how human-to-human transmission of values happens, right? 00:04:44.720 |
Well we learn, yeah, I mean, as we grow up we learn about the values that matter, how 00:04:51.840 |
things should go, what is reasonable to pursue and what isn't reasonable to pursue. 00:04:55.960 |
You think machines can learn in the same kind of way? 00:04:58.920 |
Yeah, so I think that what we need to do is to get away from this idea that you build 00:05:05.320 |
an optimizing machine and then you put the objective into it. 00:05:09.400 |
Because if it's possible that you might put in a wrong objective, and we already know 00:05:15.800 |
this is possible because it's happened lots of times, right? 00:05:19.000 |
That means that the machine should never take an objective that's given as gospel truth. 00:05:27.800 |
Because once it takes the objective as gospel truth, then it believes that whatever actions 00:05:36.600 |
it's taking in pursuit of that objective are the correct things to do. 00:05:40.880 |
So you could be jumping up and down and saying, "No, no, no, no, you're going to destroy the world!" 00:05:45.240 |
But the machine knows what the true objective is and is pursuing it, and tough luck to you. 00:05:54.120 |
This is, I think, how many of the 20th-century technologies work, right? 00:05:58.280 |
So in statistics, you minimize a loss function. 00:06:03.680 |
In control theory, you minimize a cost function. 00:06:06.680 |
In operations research, you maximize a reward function. 00:06:11.320 |
So in all these disciplines, this is how we conceive of the problem. 00:06:19.040 |
And it's the wrong problem, because we cannot specify with certainty the correct objective, right? 00:06:25.480 |
We need the machine to be uncertain about what it is that it's supposed to be maximizing. 00:06:32.520 |
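To make that contrast concrete, here is a minimal sketch in Python (the actions, objectives, and payoff numbers are all invented for illustration; none of this comes from the conversation): the classical agent optimizes one fixed objective, while the uncertain agent maximizes expected utility under a belief over candidate objectives.

```python
# Minimal sketch: fixed-objective optimization vs. optimization under
# uncertainty about the objective. All names and numbers are hypothetical.

actions = ["aggressive", "moderate", "cautious"]

# Hypothetical payoff tables for two candidate objectives.
utility = {
    "objective_A": {"aggressive": 10.0, "moderate": 6.0, "cautious": 4.0},
    "objective_B": {"aggressive": -20.0, "moderate": 2.0, "cautious": 3.0},
}

# Classical setup: objective_A is taken as gospel truth and optimized.
fixed_choice = max(actions, key=lambda a: utility["objective_A"][a])
print("fixed objective chooses:", fixed_choice)  # "aggressive"

# Uncertain setup: the machine holds a belief over which objective is
# the true one and maximizes expected utility under that belief.
belief = {"objective_A": 0.5, "objective_B": 0.5}

def expected_utility(action):
    return sum(p * utility[obj][action] for obj, p in belief.items())

uncertain_choice = max(actions, key=expected_utility)
print("uncertain objective chooses:", uncertain_choice)  # "moderate"
# The uncertain agent avoids "aggressive", which would be catastrophic
# if objective_B turns out to be what we actually wanted.
```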
But a favorite idea of yours, I've heard you say somewhere, well, I shouldn't pick favorites, 00:06:37.680 |
but it just sounds beautiful, of "We need to teach machines humility." 00:06:48.920 |
- In that they know that they don't know what it is they're supposed to be doing. 00:06:53.480 |
And that those objectives, I mean, they exist; they're within us, but we may not be able to fully articulate them. 00:07:02.160 |
We may not even know how we want our future to go. 00:07:10.800 |
And a machine that's uncertain is going to be deferential to us. 00:07:18.240 |
So if we say, "Don't do that," well, now the machine's learned something, a bit more about 00:07:23.160 |
our true objectives, because something that it thought was reasonable in pursuit of our 00:07:28.360 |
objective turns out not to be, so now it's learned something. 00:07:31.200 |
So it's going to defer because it wants to be doing what we really want. 00:07:36.480 |
And that point, I think, is absolutely central to solving the control problem. 00:07:44.160 |
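As a rough sketch of why uncertainty produces deference, in the spirit of the off-switch argument (the probabilities and utilities below are invented): a machine that admits it might be wrong gets higher expected value from letting the human veto its action than from acting unilaterally.

```python
# Hypothetical off-switch-style calculation. The machine believes its
# proposed action helps the human (u = +5) with probability 0.7, but
# might badly hurt them (u = -10) with probability 0.3.

p_good = 0.7
u_good, u_bad = 5.0, -10.0

# Acting unilaterally: expected utility under the machine's own belief.
act_now = p_good * u_good + (1 - p_good) * u_bad        # 0.5

# Deferring: propose the action and let the human decide. Assuming the
# human knows their own objective, they approve when it's good and
# switch the machine off when it's bad (utility 0: nothing happens).
defer = p_good * u_good + (1 - p_good) * 0.0            # 3.5

print(f"act now: {act_now:+.1f}, defer to human: {defer:+.1f}")
# Deferral wins whenever the machine assigns any probability to being
# wrong, so the uncertain machine has an incentive to keep the off
# switch available rather than disable it.
```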
And it's a different kind of AI when you take away this idea that the objective is known, 00:07:52.160 |
then, in fact, a lot of the theoretical frameworks that we're so familiar with, Markov decision 00:08:01.680 |
processes, goal-based planning, you know, standard games research, all of these techniques actually become inapplicable. 00:08:13.480 |
And you get a more complicated problem, because now the interaction with the human becomes 00:08:23.240 |
part of the problem, because the human, by making choices, is giving you more information 00:08:32.280 |
about the true objective, and that information helps you achieve the objective better. 00:08:39.040 |
And so that really means that you're mostly dealing with game-theoretic problems, where 00:08:44.040 |
you've got the machine and the human and they're coupled together, rather than a machine going off by itself with a fixed objective. 00:08:52.160 |
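One way to picture that coupling, as a small hedged sketch (the two candidate objectives and the likelihood numbers are made up): each choice the human makes is evidence, and the machine's belief about the true objective is updated by Bayes' rule.

```python
# Hypothetical coupled loop: human choices are observations that carry
# information about which objective the human really has.

belief = {"wants_speed": 0.5, "wants_safety": 0.5}

# Assumed likelihood of each observable human choice under each objective.
likelihood = {
    "wants_speed":  {"took_shortcut": 0.8, "took_safe_route": 0.2},
    "wants_safety": {"took_shortcut": 0.1, "took_safe_route": 0.9},
}

def update(belief, observation):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    posterior = {h: p * likelihood[h][observation] for h, p in belief.items()}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# The machine watches the human pick the safe route twice in a row.
for obs in ["took_safe_route", "took_safe_route"]:
    belief = update(belief, obs)
    print(obs, "->", {h: round(p, 3) for h, p in belief.items()})
# Belief shifts strongly toward "wants_safety" (about 0.95 after two
# observations), and the machine's planning changes accordingly.
```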
Which is fascinating on the machine and the human level, that when you don't have an objective, 00:09:00.640 |
it means you're coming up with an objective together. 00:09:03.600 |
I mean, there's a lot of philosophy where, you know, you could argue that life doesn't have inherent meaning. 00:09:08.920 |
We together agree on what gives it meaning, and we kind of culturally create things that 00:09:15.640 |
give it meaning, that answer why the heck we are on this earth anyway. 00:09:19.280 |
We together as a society create that meaning, and you have to learn that objective. 00:09:23.800 |
And one of the biggest, I thought that's where you were going to go for a second, one of 00:09:28.240 |
the biggest troubles we run into outside of statistics and machine learning and AI, in 00:09:33.560 |
just human civilization, is when you look at, I was born in the Soviet Union, and the 00:09:40.840 |
history of the 20th century, we ran into the most trouble, us humans, when there was absolute certainty about the objective. 00:09:50.840 |
And you do whatever it takes to achieve that objective, whether you're talking about Germany or the Soviet Union. 00:09:58.960 |
- Yeah, and I would say with corporations, in fact, some people argue that we don't have 00:10:05.160 |
to look forward to a time when AI systems take over the world. 00:10:08.560 |
They already have, and they're called corporations. 00:10:12.040 |
Corporations happen to be using people as components right now, but they are effectively 00:10:20.120 |
algorithmic machines, and they're optimizing an objective, which is quarterly profit that 00:10:25.760 |
isn't aligned with overall well-being of the human race, and they are destroying the world. 00:10:31.880 |
They are primarily responsible for our inability to tackle climate change. 00:10:37.280 |
So I think that's one way of thinking about what's going on with corporations. 00:10:42.640 |
But I think the point you're making is valid, that there are many systems in the real world 00:10:51.520 |
where we've sort of prematurely fixed on the objective and then decoupled the machine from the people it's supposed to be serving. 00:11:06.400 |
Government is supposed to be a machine that serves people, but instead it tends to be 00:11:12.000 |
taken over by people who have their own objective and use government to optimize that objective.