
Stuart Russell: The Control Problem of Super-Intelligent AI | AI Podcast Clips


Chapters

0:00 Control Problem
1:43 King Midas Problem
4:08 Human Values

Whisper Transcript

00:00:00.000 | Let's just talk about maybe the control problem.
00:00:04.440 | So this idea of losing the ability to control the behavior of our AI systems.
00:00:12.520 | So how do you see that?
00:00:14.920 | How do you see that coming about?
00:00:16.780 | What do you think we can do to manage it?
00:00:19.920 | Well, so it doesn't take a genius to realize that if you make something that's smarter
00:00:25.480 | than you, you might have a problem.
00:00:29.080 | You know, Alan Turing wrote about this and gave lectures about this in 1951.
00:00:39.080 | He did a lecture on the radio, and he basically says, you know, once the machine thinking
00:00:46.320 | method starts, very quickly they'll outstrip humanity.
00:00:53.120 | And, you know, if we're lucky, I think he says, we may be able to
00:00:59.480 | turn off the power at strategic moments, but even so, our species would be humbled.
00:01:04.480 | And actually he was wrong about that, right?
00:01:07.880 | Because you know, if it's a sufficiently intelligent machine, it's not going to let you switch
00:01:11.360 | it off.
00:01:13.000 | It's actually in competition with you.
00:01:14.680 | So, just for a quick tangent, what do you think is meant by that: if we shut off this superintelligent
00:01:20.880 | machine, our species would be humbled?
00:01:24.120 | I think he means that we would realize that we are inferior, right?
00:01:30.800 | That we only survive by the skin of our teeth because we happen to get to the off switch
00:01:35.080 | just in time.
00:01:38.920 | And if we hadn't, then we would have lost control over the earth.
00:01:43.080 | So are you more worried when you think about this stuff about superintelligent AI, or are
00:01:49.400 | you more worried about super powerful AI that's not aligned with our values?
00:01:55.200 | So the paperclip scenarios kind of...
00:01:59.400 | I think, so the main problem I'm working on is the control problem, the problem of machines
00:02:08.200 | pursuing objectives that are, as you say, not aligned with human objectives.
00:02:14.600 | And this has been the way we've thought about AI since the beginning.
00:02:22.960 | You build a machine for optimizing, and then you put in some objective and it optimizes.
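To make this concrete, here is a minimal sketch of that standard pattern, assuming a toy least-squares objective and synthetic data (both invented for illustration): a fixed objective goes in, and the machine simply optimizes it.

```python
import numpy as np

# The "standard model": a fixed, exogenously specified objective,
# and an optimizer that pursues it without ever questioning it.
# (Toy example: least-squares regression by gradient descent.)

def loss(theta, x, y):
    # The objective the designer "puts in", taken as gospel truth.
    return np.mean((x @ theta - y) ** 2)

def optimize(x, y, theta, lr=0.1, steps=1000):
    for _ in range(steps):
        grad = 2 * x.T @ (x @ theta - y) / len(y)  # gradient of the loss above
        theta = theta - lr * grad                  # pursue the objective, whatever it is
    return theta

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta_hat = optimize(x, y, theta=np.zeros(3))
print(loss(theta_hat, x, y))  # small: the machine did exactly what it was told
```

Whatever objective gets written into `loss`, the loop pursues it just as faithfully, which is exactly the failure mode discussed next.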
00:02:29.840 | And we can think of this as the King Midas problem.
00:02:35.680 | Because King Midas put in this objective, everything I touch should turn to gold, and
00:02:43.000 | the gods, that's like the machine, said, "Okay, done.
00:02:47.120 | You now have this power."
00:02:48.520 | And of course his food and his drink and his family all turn to gold, and then he dies
00:02:53.600 | of misery and starvation.
00:02:56.160 | And this is a warning, a failure mode; pretty much every culture in history
00:03:04.920 | has had some story along the same lines.
00:03:07.800 | There's the genie that gives you three wishes, and the third wish is always, "Please undo
00:03:12.960 | the first two wishes because I messed up."
00:03:17.760 | And when Arthur Samuel wrote his checker-playing program, which learned to play checkers considerably
00:03:25.080 | better than Arthur Samuel could play, and actually reached a pretty decent standard,
00:03:32.480 | Norbert Wiener, who was one of the major mathematicians of the 20th century, sort of the father of
00:03:38.360 | modern automation and control systems, saw this and basically extrapolated, as Turing
00:03:46.240 | did, and said, "Okay, this is how we could lose control."
00:03:52.600 | And specifically that we have to be certain that the purpose we put into the machine is
00:04:01.800 | the purpose which we really desire.
00:04:05.640 | And the problem is, we can't do that.
00:04:10.200 | You mean it's very difficult to encode, that putting our values on paper is really difficult,
00:04:15.720 | or are you just saying it's impossible?
00:04:18.320 | I hope the line is gray between the two.
00:04:23.080 | So theoretically it's possible, but in practice it's extremely unlikely that we could specify
00:04:31.680 | correctly in advance the full range of concerns of humanity.
00:04:37.400 | You've talked about the cultural transmission of values; I think that's how human-to-human transmission
00:04:42.200 | of values happens, right?
00:04:44.720 | Well we learn, yeah, I mean, as we grow up we learn about the values that matter, how
00:04:51.840 | things should go, what is reasonable to pursue and what isn't reasonable to pursue.
00:04:55.960 | You think machines can learn in the same kind of way?
00:04:58.920 | Yeah, so I think that what we need to do is to get away from this idea that you build
00:05:05.320 | an optimizing machine and then you put the objective into it.
00:05:09.400 | Because if it's possible that you might put in a wrong objective, and we already know
00:05:15.800 | this is possible because it's happened lots of times, right?
00:05:19.000 | That means that the machine should never take an objective that's given as gospel truth.
00:05:27.800 | Because once it takes the objective as gospel truth, then it believes that whatever actions
00:05:36.600 | it's taking in pursuit of that objective are the correct things to do.
00:05:40.880 | So you could be jumping up and down and saying, "No, no, no, no, you're going to destroy the
00:05:44.240 | world!"
00:05:45.240 | But the machine knows what the true objective is and is pursuing it, and tough luck to you.
00:05:51.160 | And this is not restricted to AI, right?
00:05:54.120 | This is true, I think, of many of the 20th-century technologies, right?
00:05:58.280 | So in statistics, you minimize a loss function.
00:06:01.280 | The loss function is exogenously specified.
00:06:03.680 | In control theory, you minimize a cost function.
00:06:06.680 | In operations research, you maximize a reward function.
00:06:10.320 | And so on.
00:06:11.320 | So in all these disciplines, this is how we conceive of the problem.
00:06:15.640 | And it's the wrong problem.
00:06:19.040 | Because we cannot specify with certainty the correct objective, right?
00:06:24.480 | We need uncertainty.
00:06:25.480 | We need the machine to be uncertain about what it is that it's supposed to be maximizing.
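As a minimal sketch of that alternative (the candidate objectives and payoffs below are invented for illustration, not anything from the conversation): rather than one hard-coded reward, the machine keeps a probability distribution over what the human might want and acts on the expectation.

```python
import numpy as np

# Instead of one fixed reward function, keep a distribution over
# candidate objectives and act on the expected reward under that belief.
# (Hypotheses and payoffs are illustrative assumptions.)

actions = ["make_paperclips", "ask_human", "do_nothing"]

# Rows = hypotheses about what the human really wants,
# columns = reward of each action under that hypothesis.
candidate_rewards = np.array([
    [10.0, 1.0, 0.0],   # A: paperclips really are all that matters
    [-50.0, 1.0, 0.0],  # B: maximizing paperclips would be a disaster
    [0.0, 1.0, 0.0],    # C: the human is indifferent
])
belief = np.array([1/3, 1/3, 1/3])  # uncertainty about the true objective

expected_reward = belief @ candidate_rewards
print(actions[int(np.argmax(expected_reward))])
# "ask_human": under uncertainty, checking in beats blindly optimizing.
```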
00:06:31.520 | - That's a favorite idea of yours, I've heard you say somewhere, well, I shouldn't pick favorites,
00:06:37.680 | but it just sounds beautiful: "We need to teach machines humility."
00:06:41.360 | - It's a beautiful way to put it.
00:06:45.480 | I love it.
00:06:46.920 | - That they're humble.
00:06:47.920 | - Humble AI.
00:06:48.920 | - In that they know that they don't know what it is they're supposed to be doing.
00:06:53.480 | And that those objectives, I mean, they exist.
00:06:58.000 | They're within us.
00:06:59.200 | But we may not be able to explicate them.
00:07:02.160 | We may not even know how we want our future to go.
00:07:10.800 | And a machine that's uncertain is going to be deferential to us.
00:07:18.240 | So if we say, "Don't do that," well, now the machine's learned something, a bit more about
00:07:23.160 | our true objectives, because something that it thought was reasonable in pursuit of our
00:07:28.360 | objective turns out not to be, so now it's learned something.
00:07:31.200 | So it's going to defer because it wants to be doing what we really want.
00:07:36.480 | And that point, I think, is absolutely central to solving the control problem.
00:07:44.160 | And it's a different kind of AI. When you take away this idea that the objective is known,
00:07:52.160 | then, in fact, a lot of the theoretical frameworks that we're so familiar with, Markov decision
00:08:01.680 | processes, goal-based planning, you know, standard games research, all of these techniques
00:08:09.680 | actually become inapplicable.
00:08:13.480 | And you get a more complicated problem, because now the interaction with the human becomes
00:08:23.240 | part of the problem, because the human, by making choices, is giving you more information
00:08:32.280 | about the true objective, and that information helps you achieve the objective better.
00:08:39.040 | And so that really means that you're mostly dealing with game-theoretic problems, where
00:08:44.040 | you've got the machine and the human and they're coupled together, rather than a machine going
00:08:49.320 | off by itself with a fixed objective.
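Here is a toy illustration of why that coupling pays off (the belief distribution is an invented assumption, not Russell's actual model): a machine that is uncertain about the true utility of its planned action does at least as well by letting the human veto it as by acting unilaterally, because the veto decision carries information about the objective.

```python
import numpy as np

# Uncertain machine, human in the loop: compare "act now" with
# "propose the action and let the human veto it".
# (The belief over the action's true utility u is an illustrative assumption.)

rng = np.random.default_rng(0)
u = rng.normal(loc=0.5, scale=2.0, size=100_000)  # belief over the true utility

value_act = u.mean()                    # act unilaterally: get u, whatever it is
value_defer = np.maximum(u, 0).mean()   # defer: the human allows it only if u > 0

print(f"act now: {value_act:.2f}, defer to the human: {value_defer:.2f}")
# defer >= act: uncertainty makes keeping the human in the loop worthwhile.

# And if the human does say "no, don't do that", the machine learns something:
posterior = u[u < 0]                    # condition the belief on the veto
print(f"belief about u after a veto: mean {posterior.mean():.2f}")
```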
00:08:52.160 | Which is fascinating on both the machine and the human level: when you don't have an objective,
00:09:00.640 | it means you're together coming up with an objective.
00:09:03.600 | I mean, there's a lot of philosophy that, you know, you could argue that life doesn't
00:09:07.520 | really have meaning.
00:09:08.920 | We together agree on what gives it meaning, and we kind of culturally create things that
00:09:15.640 | explain why the heck we are on this earth anyway.
00:09:19.280 | We together as a society create that meaning, and you have to learn that objective.
00:09:23.800 | And one of the biggest, I thought that's where you were going to go for a second, one of
00:09:28.240 | the biggest troubles we run into outside of statistics and machine learning and AI, in
00:09:33.560 | just human civilization, is that, when you look at the history of the 20th century (I was born
00:09:40.840 | in the Soviet Union), we ran into the most trouble, us humans, when there was
00:09:48.640 | certainty about the objective.
00:09:50.840 | And you do whatever it takes to achieve that objective, whether you're talking about Germany
00:09:55.040 | or communist Russia.
00:09:56.440 | You get into trouble with humans.
00:09:58.960 | - Yeah, and I would say with corporations, in fact, some people argue that we don't have
00:10:05.160 | to look forward to a time when AI systems take over the world.
00:10:08.560 | They already have, and they're called corporations.
00:10:12.040 | Corporations happen to be using people as components right now, but they are effectively
00:10:20.120 | algorithmic machines, and they're optimizing an objective, which is quarterly profit that
00:10:25.760 | isn't aligned with the overall well-being of the human race, and they are destroying the world.
00:10:31.880 | They are primarily responsible for our inability to tackle climate change.
00:10:37.280 | So I think that's one way of thinking about what's going on with corporations.
00:10:42.640 | But I think the point you're making is valid, that there are many systems in the real world
00:10:51.520 | where we've sort of prematurely fixed on the objective and then decoupled the machine from
00:11:00.160 | those that it's supposed to be serving.
00:11:03.240 | And I think you see this with government.
00:11:06.400 | Government is supposed to be a machine that serves people, but instead it tends to be
00:11:12.000 | taken over by people who have their own objective and use government to optimize that objective,
00:11:19.160 | regardless of what people want.
00:11:20.560 | [laughs]