
Stuart Russell: The Control Problem of Super-Intelligent AI | AI Podcast Clips


Chapters

0:00 Control Problem
1:43 King Midas Problem
4:08 Human Values

Transcript

Let's just talk about maybe the control problem. So this idea of losing the ability to control the behavior of our AI systems. So how do you see that? How do you see that coming about? What do you think we can do to manage it? Well, so it doesn't take a genius to realize that if you make something that's smarter than you, you might have a problem.

You know, Alan Turing wrote about this and gave lectures about this in 1951. He did a lecture on the radio, and he basically says, you know, once the machine thinking method starts, very quickly they'll outstrip humanity. And, you know, if we're lucky, we might be able to, I think he says, turn off the power at strategic moments, but even so, our species would be humbled.

And actually he was wrong about that, right? Because, you know, if it's a sufficiently intelligent machine, it's not going to let you switch it off. It's actually in competition with you. So what do you think he meant, just for a quick tangent, that if we shut off this superintelligent machine, our species would be humbled?

I think he means that we would realize that we are inferior, right? That we only survive by the skin of our teeth because we happen to get to the off switch just in time. And if we hadn't, then we would have lost control over the earth. So are you more worried when you think about this stuff about superintelligent AI, or are you more worried about super powerful AI that's not aligned with our values?

So the paperclip scenarios kind of... I think, so the main problem I'm working on is the control problem, the problem of machines pursuing objectives that are, as you say, not aligned with human objectives. And this has been the way we've thought about AI since the beginning. You build a machine for optimizing, and then you put in some objective and it optimizes.

And we can think of this as the King Midas problem. King Midas put in this objective, "Everything I touch should turn to gold," and the gods, that's like the machine, said, "Okay, done. You now have this power." And of course his food and his drink and his family all turn to gold, and then he dies of misery and starvation.

And this is a warning; it's a failure mode that pretty much every culture in history has had some story about along the same lines. There's the genie that gives you three wishes, and the third wish is always, "Please undo the first two wishes because I messed up." And when Arthur Samuel wrote his checker-playing program, which learned to play checkers considerably better than Arthur Samuel could play and actually reached a pretty decent standard, Norbert Wiener, who was one of the major mathematicians of the 20th century, sort of the father of modern automation and control systems, saw this and basically extrapolated, as Turing did, and said, "Okay, this is how we could lose control." And specifically, that we have to be certain that the purpose we put into the machine is the purpose which we really desire.

And the problem is, we can't do that. You mean it's very difficult to encode, that putting our values on paper is really difficult, or are you just saying it's impossible? I hope the line is gray between the two. So theoretically it's possible, but in practice it's extremely unlikely that we could specify correctly in advance the full range of concerns of humanity.

You talked about cultural transmission of values; I think that's how human-to-human transmission of values happens, right? Well, we learn, yeah. I mean, as we grow up, we learn about the values that matter, how things should go, what is reasonable to pursue and what isn't reasonable to pursue.

You think machines can learn in the same kind of way? Yeah, so I think that what we need to do is to get away from this idea that you build an optimizing machine and then you put the objective into it. Because if it's possible that you might put in a wrong objective, and we already know this is possible because it's happened lots of times, right?

That means that the machine should never take an objective that's given as gospel truth. Because once it takes the objective as gospel truth, then it believes that whatever actions it's taking in pursuit of that objective are the correct things to do. So you could be jumping up and down and saying, "No, no, no, no, you're going to destroy the world!" But the machine knows what the true objective is and is pursuing it, and tough luck to you.

And this is not restricted to AI, right? This is true, I think, of many of the 20th-century technologies, right? So in statistics, you minimize a loss function. The loss function is exogenously specified. In control theory, you minimize a cost function. In operations research, you maximize a reward function. And so on.

So in all these disciplines, this is how we conceive of the problem. And it's the wrong problem. Because we cannot specify with certainty the correct objective, right? We need uncertainty. We need the machine to be uncertain about what it is that it's supposed to be maximizing; that is, uncertain about the objective.
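To make the contrast concrete, here is a small, purely illustrative Python sketch; the candidate objectives, numbers, and names are invented for this example, not anything from the conversation. A classic optimizer takes one exogenously specified objective as gospel, while an agent that is uncertain about its objective hedges across several candidates.

```python
# Purely illustrative sketch; the candidate objectives and numbers are invented.
CANDIDATE_OBJECTIVES = {
    "profit": lambda outcome: outcome["profit"],
    "profit_minus_harm": lambda outcome: outcome["profit"] - 10 * outcome["harm"],
}

def fixed_objective_agent(actions, objective):
    """Classic formulation: maximize the one objective you were given."""
    return max(actions, key=objective)

def uncertain_objective_agent(actions, belief):
    """Maximize *expected* value under a belief (objective name -> probability)
    about which candidate objective it is supposed to be maximizing."""
    def expected_value(action):
        return sum(p * CANDIDATE_OBJECTIVES[name](action)
                   for name, p in belief.items())
    return max(actions, key=expected_value)

actions = [
    {"profit": 5, "harm": 0},
    {"profit": 8, "harm": 1},  # more profit, but it causes harm
]

# Told to maximize profit and taking that as gospel, the fixed agent picks the harmful action.
print(fixed_objective_agent(actions, CANDIDATE_OBJECTIVES["profit"]))
# The uncertain agent hedges across objectives and here prefers the safer action.
print(uncertain_objective_agent(actions, {"profit": 0.5, "profit_minus_harm": 0.5}))
```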

A favorite idea of yours, I've heard you say somewhere, well, I shouldn't pick favorites, but it just sounds beautiful, is that "we need to teach machines humility." - It's a beautiful way to put it. I love it. - That they're humble. - Humble AI. - In that they know that they don't know what it is they're supposed to be doing.

And those objectives, I mean, they exist. They're within us. But we may not be able to explicate them. We may not even know how we want our future to go. And a machine that's uncertain is going to be deferential to us. So if we say, "Don't do that," well, now the machine's learned something, a bit more about our true objectives, because something that it thought was reasonable in pursuit of our objective turns out not to be, so now it's learned something.

So it's going to defer because it wants to be doing what we really want. And that point, I think, is absolutely central to solving the control problem. And it's a different kind of AI when you take away this idea that the objective is known, then, in fact, a lot of the theoretical frameworks that we're so familiar with, Markov decision processes, goal-based planning, you know, standard games research, all of these techniques actually become inapplicable.
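One hedged way to picture the "don't do that" update described here, again with invented names, numbers, and a crude likelihood model rather than anything from Russell's actual systems: treat the human's veto as evidence about which candidate objective is the true one, and shift the machine's belief accordingly.

```python
# Illustrative sketch only: a toy Bayesian update when the human vetoes an action.
from typing import Callable, Dict, List

Outcome = Dict[str, float]
Objective = Callable[[Outcome], float]

def update_belief_on_veto(belief: Dict[str, float],
                          objectives: Dict[str, Objective],
                          vetoed: Outcome,
                          alternatives: List[Outcome]) -> Dict[str, float]:
    """Down-weight objectives under which the vetoed action would have been optimal:
    a human who knows the true objective is unlikely to veto its best action."""
    likelihood = {}
    for name, objective in objectives.items():
        best = max(alternatives + [vetoed], key=objective)
        # Crude likelihood model: a veto is unlikely (0.1) if the action was optimal
        # under this objective, and likely (0.9) otherwise.
        likelihood[name] = 0.1 if best is vetoed else 0.9
    unnormalized = {name: belief[name] * likelihood[name] for name in belief}
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}

objectives = {
    "profit": lambda o: o["profit"],
    "profit_minus_harm": lambda o: o["profit"] - 10 * o["harm"],
}
belief = {"profit": 0.5, "profit_minus_harm": 0.5}
vetoed = {"profit": 8, "harm": 1}          # the action the human said "no" to
alternatives = [{"profit": 5, "harm": 0}]  # what it could have done instead

print(update_belief_on_veto(belief, objectives, vetoed, alternatives))
# The belief shifts toward "profit_minus_harm", the objective that explains the veto,
# so the machine defers now and pursues that objective going forward.
```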

And you get a more complicated problem, because now the interaction with the human becomes part of the problem, because the human, by making choices, is giving you more information about the true objective, and that information helps you achieve the objective better. And so that really means that you're mostly dealing with game-theoretic problems, where you've got the machine and the human and they're coupled together, rather than a machine going off by itself with a fixed objective.
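For readers who want the coupled, game-theoretic version written down, a common formalization in this spirit (roughly the cooperative inverse reinforcement learning, or assistance-game, setup associated with Russell's group; the notation here is a simplification, not a quote from the conversation) is:

$$
\max_{\pi^H,\ \pi^R}\ \mathbb{E}\!\left[\sum_{t} R_\theta\!\left(s_t, a^H_t, a^R_t\right)\right],
\qquad \theta \sim P(\theta),
$$

where the human and the machine are scored by the same reward $R_\theta$, the human knows the preference parameter $\theta$, and the machine only has the prior $P(\theta)$; the human's actions $a^H_t$ are therefore informative observations about $\theta$, which is exactly the coupling described above.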

Which is fascinating on the machine and the human level, that when you don't have an objective, it means you're together coming up with an objective. I mean, there's a lot of philosophy where, you know, you could argue that life doesn't really have meaning. We together agree on what gives it meaning, and we kind of culturally create things that explain why the heck we are on this earth anyway.

We together as a society create that meaning, and you have to learn that objective. I thought that's where you were going to go for a second. One of the biggest troubles we run into, outside of statistics and machine learning and AI, just in human civilization, when you look at the history of the 20th century, and I was born in the Soviet Union, is that we ran into the most trouble, us humans, when there was certainty about the objective.

And you do whatever it takes to achieve that objective, whether you're talking about Germany or communist Russia. You get into trouble with humans. - Yeah, and I would say with corporations, in fact, some people argue that we don't have to look forward to a time when AI systems take over the world.

They already have, and they're called corporations. Corporations happen to be using people as components right now, but they are effectively algorithmic machines, and they're optimizing an objective, quarterly profit, that isn't aligned with the overall well-being of the human race, and they are destroying the world. They are primarily responsible for our inability to tackle climate change.

So I think that's one way of thinking about what's going on with corporations. But I think the point you're making is valid, that there are many systems in the real world where we've sort of prematurely fixed on the objective and then decoupled the machine from those that it's supposed to be serving.

And I think you see this with government. Government is supposed to be a machine that serves people, but instead it tends to be taken over by people who have their own objective and use government to optimize that objective, regardless of what people want.