
Leslie Kaelbling: Reinforcement Learning, Planning, and Robotics | Lex Fridman Podcast #15


Chapters

0:00 Intro
0:42 What made you get excited about AI
1:40 Philosophy and computer science
2:20 Majors in AI
3:08 AI philosophers
4:17 The philosophical gap
5:20 The first robot
9:00 Reinventing reinforcement learning
11:42 Roadblocks in symbolic reasoning
14:11 Where symbolic reasoning is useful
14:59 Why abstractions are critical
17:13 Automated construction of abstractions
19:21 Markov decision process
20:51 Partially observable Markov decision process
21:45 Planning under uncertainty
23:22 Optimality
25:20 Science vs Engineering
27:37 Belief vs State Space
32:40 Starting at the goal
34:08 Planning human life
35:37 Model-based vs model-free
37:27 Perception vs planning
40:06 Convolution
41:10 Self-awareness
42:45 Human-level intelligence
43:26 Test of intelligence
44:57 Journal of Machine Learning Research
46:17 Open Access
47:35 Review Process
51:29 Paper Reviews
53:35 Hopes and Fears
54:58 Existential Threats
58:50 Most Exciting Research

Whisper Transcript

00:00:00.000 | The following is a conversation with Leslie Kaelbling. She's a roboticist and professor at
00:00:05.360 | MIT. She's recognized for her work in reinforcement learning, planning, robot navigation, and several
00:00:12.080 | other topics in AI. She won the IJCAI Computers and Thought Award and was the editor-in-chief
00:00:18.560 | of the prestigious Journal of Machine Learning Research. This conversation is part of the
00:00:24.320 | Artificial Intelligence Podcast at MIT and beyond. If you enjoy it, subscribe on YouTube, iTunes,
00:00:31.280 | or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now,
00:00:37.760 | here's my conversation with Leslie Kaelbling.
00:00:41.040 | - What made me get excited about AI, I can say that, is I read Gödel, Escher, Bach when I was
00:00:47.680 | in high school. That was pretty formative for me because it exposed the interestingness of
00:00:56.240 | primitives and combination and how you can make complex things out of simple parts,
00:01:01.520 | and ideas of AI and what kinds of programs might generate intelligent behavior.
00:01:07.120 | - So you first fell in love with AI reasoning and logic, versus robots?
00:01:12.400 | - Yeah, the robots came because my first job, so I finished an undergraduate degree in philosophy
00:01:18.240 | at Stanford and was about to finish a master's in computer science, and I got hired at SRI
00:01:24.160 | in their AI lab. And they were building a robot. It was a kind of a follow-on to Shakey,
00:01:30.960 | but all the Shakey people were not there anymore. And so my job was to try to get this robot to do
00:01:36.080 | stuff, and that's really kind of what got me interested in robots.
00:01:39.280 | - So maybe taking a small step back, your bachelor's in Stanford in philosophy,
00:01:44.400 | did a master's and PhD in computer science, but the bachelor's in philosophy. So what was that
00:01:49.440 | journey like? What elements of philosophy do you think you bring to your work in computer science?
00:01:54.880 | - So it's surprisingly relevant. So part of the reason that I didn't do a computer science
00:02:00.080 | undergraduate degree was that there wasn't one at Stanford at the time, but there's a part
00:02:04.480 | of philosophy, and in fact, Stanford has a special sub-major in something now called
00:02:08.320 | symbolic systems, which is logic, model theory, formal semantics of natural language. And so that's
00:02:15.120 | actually a perfect preparation for work in AI and computer science.
00:02:19.680 | - That's kind of interesting. So if you were interested in artificial intelligence,
00:02:23.840 | what kind of majors were people even thinking about taking? Was it neuroscience?
00:02:30.800 | So besides philosophy, what were you supposed to do if you were
00:02:35.520 | fascinated by the idea of creating intelligence?
00:02:37.680 | - There weren't enough people who did that for that even to be a conversation. I mean,
00:02:42.240 | I think probably philosophy. I mean, it's interesting in my class, my graduating class of
00:02:50.000 | undergraduate philosophers, probably maybe slightly less than half went on in computer science,
00:02:57.840 | slightly less than half went on in law, and like one or two went on in philosophy.
00:03:02.320 | So it was a common kind of connection.
00:03:05.520 | - Do you think AI researchers have a role to be part-time philosophers, or should they stick
00:03:10.240 | to the solid science and engineering without sort of taking the philosophizing tangents? I mean,
00:03:16.240 | you work with robots, you think about what it takes to create intelligent beings. Aren't you
00:03:21.760 | the perfect person to think about the big picture philosophy at all?
00:03:25.200 | - The parts of philosophy that are closest to AI, I think, or at least the closest to AI that I think
00:03:29.520 | about are stuff like belief and knowledge and denotation and that kind of stuff. And that's,
00:03:36.160 | you know, it's quite formal and it's like just one step away from
00:03:39.600 | the kinds of computer science work that we do kind of routinely.
00:03:44.000 | I think that there are important questions still about what you can do with a machine and what you
00:03:53.200 | can't and so on. Although at least my personal view is that I'm completely a materialist and
00:03:57.840 | I don't think that there's any reason why we can't make a robot be behaviorally indistinguishable
00:04:04.160 | from a human. And the question of whether it's distinguishable internally, whether it's a zombie
00:04:10.640 | or not in philosophy terms, I actually don't, I don't know, and I don't know if I care too much
00:04:16.480 | about that. - Right, but there are philosophical notions, they're mathematical and philosophical
00:04:22.080 | because we don't know so much, of how difficult that is, how difficult is the perception problem,
00:04:27.520 | how difficult is the planning problem, how difficult is it to operate in this world successfully.
00:04:32.720 | Because our robots are not currently as successful as human beings in many tasks,
00:04:37.920 | the question about the gap between current robots and human beings borders a little bit on
00:04:44.560 | philosophy. You know, the expanse of knowledge that's required to operate in this world,
00:04:51.600 | and the ability to form common sense knowledge, the ability to reason about uncertainty,
00:04:56.640 | much of the work you've been doing, there are open questions there that, I don't know,
00:05:03.600 | require a certain big-picture view. - To me that doesn't seem like a philosophical
00:05:09.440 | gap at all. To me, there is a big technical gap, there's a huge technical gap, but I don't see any
00:05:16.640 | reason why it's more than a technical gap. - Perfect. So, when you mentioned AI, you mentioned
00:05:23.600 | SRI, and maybe, can you describe to me when you first fell in love with robotics, the robots
00:05:31.840 | that inspired you? So you mentioned Flakey and Shakey, and what was the robot that
00:05:40.480 | first captured your imagination about what's possible? - Right, well, so the first robot I worked with was
00:05:44.800 | Flakey. Shakey was a robot that the SRI people had built, but by the time I arrived, I think,
00:05:50.880 | it was sitting in a corner of somebody's office dripping hydraulic fluid into a pan.
00:05:55.040 | But it's iconic, and really, everybody should read the Shakey tech report, because it has so
00:06:01.760 | many good ideas in it. I mean, they invented A* search and symbolic planning and learning
00:06:09.200 | macro operators. They had low-level kind of configuration space planning for their robot,
00:06:16.080 | they had vision, they had the basic ideas of a ton of things. - Can you take a
00:06:20.800 | step back, did Shakey have arms, what was the job, what was the goal? - Shakey was a mobile robot,
00:06:26.560 | but it could push objects, and so it would move things around. - With which actuator, with arms?
00:06:31.680 | - With itself, with its base. - Okay, great. - So it could, but it, and they had painted the base
00:06:39.040 | boards black, so it used vision to localize itself in a map, it detected objects, it could detect
00:06:47.680 | objects that were surprising to it, it would plan and replan based on what it saw, it reasoned about
00:06:54.480 | whether to look and take pictures. I mean, it really had the basics of so many of the things
00:07:01.280 | that we think about now. - How did it represent the space around it? - So it had representations
00:07:08.000 | at a bunch of different levels of abstraction, so it had, I think, a kind of an occupancy grid
00:07:12.400 | of some sort at the lowest level. At the high level, it was abstract, symbolic kind of rooms
00:07:18.800 | and connectivity. - So where does Flakey come in? - Yeah, okay, so I showed up at SRI, and
00:07:25.120 | we were building a brand new robot. As I said, none of the people from the previous project were
00:07:31.040 | kind of there or involved anymore, so we were kind of starting from scratch, and my advisor
00:07:37.680 | was Stan Rosenschein, he ended up being my thesis advisor, and he was motivated by this idea of
00:07:44.080 | situated computation or situated automata, and the idea was that the tools of logical reasoning were
00:07:52.400 | important, but possibly only for the engineers or designers to use in the analysis of a system,
00:08:01.200 | but not necessarily to be manipulated in the head of the system itself, right? So I might use logic
00:08:07.600 | to prove a theorem about the behavior of my robot, even if the robot's not using logic in its head
00:08:12.640 | to prove theorems, right? So that was kind of the distinction. And so the idea was to kind of use
00:08:18.560 | those principles to make a robot do stuff, but a lot of the basic things we had to kind of
00:08:27.040 | learn for ourselves, 'cause I had zero background in robotics, I didn't know anything about control,
00:08:31.360 | I didn't know anything about sensors, so we reinvented a lot of wheels on the way to getting
00:08:35.760 | that robot to do stuff. - Do you think that was an advantage
00:08:38.240 | or a hindrance? - Oh, no, I mean, I'm big in favor of
00:08:42.480 | wheel reinvention, actually. I mean, I think you learn a lot by doing it. It's important, though,
00:08:48.240 | to eventually have the pointers so that you can see what's really going on, but I think you can
00:08:53.440 | appreciate much better the good solutions once you've messed around a little bit on your own
00:08:59.440 | and found a bad one. - Yeah, I think you mentioned
00:09:01.440 | reinventing reinforcement learning and referring to rewards as pleasures, a pleasure, I think,
00:09:09.520 | which I think is a nice name for it. - Yeah, it seemed good to me.
00:09:12.480 | - It's more fun, almost. Do you think you could tell the history of AI, machine learning,
00:09:19.040 | reinforcement learning, how you think about it from the '50s to now?
00:09:23.360 | - One thing is that it oscillates, right? So things become fashionable and then they go out
00:09:29.360 | and then something else becomes cool and then it goes out and so on. So there's some interesting
00:09:34.480 | sociological process that actually drives a lot of what's going on. Early days was kind of
00:09:40.800 | cybernetics and control, right? And the idea of homeostasis, right? People who made these robots
00:09:47.840 | that could, I don't know, try to plug into the wall when they needed power and then come loose
00:09:53.840 | and roll around and do stuff. And then I think over time, the thought, well, that was inspiring,
00:10:00.560 | but people said, no, no, no, we want to get maybe closer to what feels like real intelligence or
00:10:04.480 | human intelligence. And then maybe the expert systems people tried to do that, but maybe a
00:10:14.720 | little too superficially, right? So, oh, we get the surface understanding of what intelligence is
00:10:21.440 | like because I understand how a steel mill works and I can try to explain it to you and you can
00:10:25.600 | write it down in logic and then we can make a computer infer that. And then that didn't work
00:10:31.120 | out. But what's interesting, I think, is when a thing starts to not be working very well,
00:10:37.520 | it's not only do we change methods, we change problems, right? So it's not like we have better
00:10:44.160 | ways of doing the problem that the expert systems people were trying to do. We have no ways of
00:10:48.160 | trying to do that problem. Well, maybe a few, but we kind of give up
00:10:55.200 | on that problem and we switch to a different problem and we work that for a while and we make
00:11:01.120 | progress. - As a broad community.
00:11:02.480 | - As a community, yeah. - And there's a lot of people
00:11:04.160 | who would argue you don't give up on the problem, it's just you decrease the number of people working
00:11:09.520 | on it. You almost kind of like put it on the shelf, say, we'll come back to this 20 years later.
00:11:13.760 | - Yeah, I think that's right. Or you might decide that it's malformed.
00:11:18.080 | - Like you might say, it's wrong to just try to make something that does superficial symbolic
00:11:25.520 | reasoning behave like a doctor. You can't do that until you've had the sensory motor experience
00:11:32.640 | of being a doctor or something, right? So there's arguments that say that that problem was not well
00:11:37.280 | formed, or it could be that it is well formed, but we just weren't approaching it well.
00:11:42.240 | - So you mentioned that your favorite part of logic and symbolic systems is that they give
00:11:47.600 | short names for large sets. So there is some use to this, the use to symbolic reasoning. So
00:11:55.120 | looking at expert systems and symbolic computing, what do you think are the roadblocks that were
00:12:00.320 | hit in the '80s and '90s? - Ah, okay. So right, so the fact that I'm
00:12:04.720 | not a fan of expert systems doesn't mean that I'm not a fan of some kinds of symbolic reasoning.
00:12:13.040 | Let's see, roadblocks. Well, the main roadblock, I think, was the idea that humans could
00:12:19.600 | articulate their knowledge effectively into some kind of logical statements.
00:12:26.000 | - So it's not just the cost, the effort, but really just the capability of doing it.
00:12:31.120 | - Right, because we're all experts in vision, right? But totally don't have introspective
00:12:36.720 | access into how we do that, right? And it's true that, I mean, I think the idea was, well, of
00:12:45.120 | course, even people then would know, of course, I wouldn't ask you to please write down the rules
00:12:48.800 | that you use for recognizing a water bottle. That's crazy. And everyone understood that. But
00:12:53.440 | we might ask you to please write down the rules you use for deciding, I don't know, what tie to
00:12:59.920 | put on or how to set up a microphone or something like that. But even those things, I think people
00:13:07.920 | maybe, I think what they found, I'm not sure about this, but I think what they found was that the
00:13:12.720 | so-called experts could give explanations, sort of post hoc explanations, for how and why
00:13:19.120 | they did things, but they weren't necessarily very good. And then they depended on maybe some
00:13:27.680 | kinds of perceptual things, which again, they couldn't really define very well. So I think
00:13:33.840 | fundamentally, I think that the underlying problem with that was the assumption that people could
00:13:39.040 | articulate how and why they make their decisions. - Right, so it's almost encoding the knowledge,
00:13:45.760 | converting from expert to something that a machine could understand and reason with.
00:13:51.200 | - No, no, no, not even just encoding, but getting it out of you.
00:13:54.560 | - Just... - Right? Not writing it,
00:13:58.240 | I mean, yes, hard also to write it down for the computer, but I don't think that people can
00:14:03.680 | produce it. You can tell me a story about why you do stuff, but I'm not so sure that's the why.
00:14:10.080 | - Great. So there are still on the hierarchical planning side,
00:14:18.560 | places where symbolic reasoning is very useful. So as you've talked about, so...
00:14:25.040 | - Right, so don't... - Where's the gap?
00:14:29.520 | - Yeah, okay, good. So saying that humans can't provide a description of their reasoning processes,
00:14:36.480 | that's okay, fine, but that doesn't mean that it's not good to do reasoning of various styles
00:14:42.400 | inside a computer. Those are just two orthogonal points. So then the question is, what kind of
00:14:49.120 | reasoning should you do inside a computer? And the answer is, I think you need to do all different
00:14:54.560 | kinds of reasoning inside a computer, depending on what kinds of problems you face.
00:14:59.040 | - I guess the question is, what kind of things can you encode symbolically so you can reason about?
00:15:07.840 | - I think the idea about, and even symbolic, I don't even like that terminology,
00:15:16.080 | 'cause I don't know what it means technically and formally. I do believe in abstractions.
00:15:21.440 | So abstractions are critical, right? You cannot reason at completely fine grain about everything
00:15:27.840 | in your life, right? You can't make a plan at the level of images and torques for getting a PhD.
00:15:34.080 | So you have to reduce the size of the state space and you have to reduce the horizon
00:15:39.600 | if you're gonna reason about getting a PhD or even buying the ingredients to make dinner.
00:15:44.160 | And so how can you reduce the spaces and the horizon of the reasoning you have to do? And
00:15:50.960 | the answer is abstraction, spatial abstraction, temporal abstraction. I think abstraction along
00:15:55.280 | the lines of goals is also interesting. Like you might, or well, abstraction and decomposition.
00:16:01.600 | Goals is maybe more of a decomposition thing. So I think that's where these kinds of, if you want
00:16:06.880 | to call it symbolic or discrete models come in. You talk about a room of your house instead of
00:16:13.280 | your pose. You talk about doing something during the afternoon instead of at 2:54.
00:16:20.000 | And you do that because it makes your reasoning problem easier. And also because
00:16:28.480 | you don't have enough information to reason in high fidelity about your pose of your elbow at
00:16:35.280 | 2:35 this afternoon anyway. - Right. When you're trying to get a PhD.
00:16:39.360 | - Right. Or when you're doing anything really. - Yeah, okay.
00:16:41.920 | - Except for at that moment. At that moment, you do have to reason about the pose of your elbow,
00:16:46.000 | maybe. But then maybe you do that in some continuous joint space kind of model. So again,
00:16:53.440 | my biggest point about all of this is that there should be, that dogma is not the thing, right?
00:16:59.280 | It shouldn't be that I am in favor against symbolic reasoning and you're in favor against
00:17:04.080 | neural networks. It should be that just computer science tells us what the right answer to all
00:17:10.480 | these questions is if we were smart enough to figure it out. - Well, yeah. When you try to
00:17:13.840 | actually solve the problem with computers, the right answer comes out. You mentioned abstractions.
00:17:19.680 | I mean, neural networks form abstractions or rather there's automated ways to form abstractions.
00:17:27.200 | - Absolutely. - And there's
00:17:27.840 | expert driven ways to form abstractions and expert human driven ways. And humans just seems to be
00:17:34.240 | way better at forming abstractions currently and certain problems. So when you're referring to
00:17:39.840 | 2:45 p.m. versus afternoon, how do we construct that taxonomy? Is there any room for automated
00:17:48.960 | construction of such abstractions? - Oh, I think eventually, yeah. I mean,
00:17:53.680 | I think when we get to be better machine learning engineers, we'll build algorithms that
00:18:00.240 | build awesome abstractions. - That are useful in this kind
00:18:03.200 | of way that you're describing. - Yeah.
00:18:04.400 | - Yeah. So let's then step from the abstraction discussion and let's talk about POMDPs,
00:18:15.840 | partially observable Markov decision processes. So uncertainty. So first,
00:18:20.160 | what are Markov decision processes? - What are Markov decision processes?
00:18:23.600 | - And maybe how much of our world can be modeled as MDPs? How much, when you wake up in the
00:18:30.080 | morning and you're making breakfast, do you think of yourself as an MDP? So how do you think about
00:18:35.760 | MDPs and how they relate to our world? - Well, so there's a stance question, right?
00:18:41.280 | So a stance is a position that I take with respect to a problem. So I, as a researcher or a person
00:18:48.640 | who designs systems, can decide to make a model of the world around me in some terms. So I take
00:18:56.080 | this messy world and I say, I'm going to treat it as if it were a problem of this formal kind,
00:19:02.560 | and then I can apply solution concepts or algorithms or whatever to solve that formal
00:19:06.880 | thing, right? So of course the world is not anything. It's not an MDP or a POMDP. I don't
00:19:11.440 | know what it is, but I can model aspects of it in some way or some other way. And when I model some
00:19:16.960 | aspect of it in a certain way, that gives me some set of algorithms I can use. - You can model the
00:19:21.760 | world in all kinds of ways. Some are more accepting of uncertainty, more easily modeling uncertainty
00:19:31.520 | of the world. Some really force the world to be deterministic. And so certainly MDPs model the
00:19:39.440 | uncertainty of the world. - Yes. Model some uncertainty. They model not present state
00:19:44.880 | uncertainty, but they model uncertainty in the way the future will unfold. - Right. So what are
00:19:52.320 | Markov decision processes? - Okay, so Markov decision process is a model. It's a kind of a
00:19:56.160 | model that you could make that says, I know completely the current state of my system.
00:20:00.800 | And what it means to be a state is that I have all the information right now that will let me make
00:20:08.000 | predictions about the future as well as I can. So that remembering anything about my history
00:20:13.040 | wouldn't make my predictions any better. But then it also says that then I can take some actions
00:20:21.840 | that might change the state of the world and that I don't have a deterministic model of those
00:20:26.240 | changes. I have a probabilistic model of how the world might change. It's a useful model for some
00:20:33.440 | kinds of systems. I think it's a, I mean, it's certainly not a good model for most problems.
00:20:41.520 | I think because for most problems, you don't actually know the state. For most problems,
00:20:46.960 | it's partially observed. So that's now a different problem class. - So, okay, that's where the
00:20:53.280 | POMDPs, the partially observable Markov decision processes step in. So how do they address the
00:21:00.080 | fact that you can't observe most, you have incomplete information about most of the world
00:21:05.360 | around you? - Right. So now the idea is we still kind of postulate that there exists a state. We
00:21:10.880 | think that there is some information about the world out there such that if we knew that we
00:21:15.840 | could make good predictions, but we don't know the state. And so then we have to think about how,
00:21:21.600 | but we do get observations. Maybe I get images or I hear things or I feel things, and those might be
00:21:27.760 | local or noisy. And so therefore they don't tell me everything about what's going on. And then I
00:21:31.760 | have to reason about, given the history of actions I've taken and observations I've gotten,
00:21:37.680 | what do I think is going on in the world? And then given my own kind of uncertainty about what's
00:21:42.640 | going on in the world, I can decide what actions to take. - And how difficult is this problem of
00:21:48.160 | planning under uncertainty in your view, in your long experience of modeling the world,
00:21:54.000 | trying to deal with this uncertainty, especially in real-world systems? - Optimal planning for even
00:22:02.720 | discrete POMDPs can be undecidable depending on how you set it up. And so lots of people say,
00:22:10.720 | I don't use POMDPs because they are intractable. And I think that that's a kind of a very funny
00:22:16.480 | thing to say, because the problem you have to solve is the problem you have to solve.
00:22:22.080 | So if the problem you have to solve is intractable, that's what makes us AI people,
00:22:25.840 | right? So we solve, we understand that the problem we're solving is wildly intractable,
00:22:31.680 | that we can't, we will never be able to solve it optimally. At least I don't. Yeah, right. So
00:22:37.760 | later we can come back to an idea about bounded optimality and something. But anyway,
00:22:42.720 | we can't come up with optimal solutions to these problems. So we have to make approximations,
00:22:47.680 | approximations in modeling, approximations in solution algorithms and so on. And so
00:22:52.160 | I don't have a problem with saying, yeah, my problem actually, it is POMDP in continuous
00:22:58.800 | space with continuous observations and it's so computationally complex, I can't even think about
00:23:04.000 | it's, you know, big O, whatever. But that doesn't prevent me from, it helps me,
00:23:10.320 | gives me some clarity to think about it that way. And to then take steps to make approximation after
00:23:17.600 | approximation to get down to something that's like computable in some reasonable time.
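As a rough illustration of the MDP formalism discussed above, here is a minimal value-iteration sketch in Python. The two states, actions, transition probabilities, and rewards are invented for illustration; none of this comes from the conversation itself.

```python
# Minimal value-iteration sketch for a toy MDP (all states, actions,
# transition probabilities, and rewards are invented for illustration).
# An MDP is (S, A, T, R, gamma): states, actions, a probabilistic
# transition model, rewards, and a discount factor.

states = ["low_battery", "charged"]
actions = ["work", "recharge"]

# T[s][a] = list of (probability, next_state) pairs.
T = {
    "low_battery": {"work":     [(0.9, "low_battery"), (0.1, "charged")],
                    "recharge": [(1.0, "charged")]},
    "charged":     {"work":     [(0.7, "charged"), (0.3, "low_battery")],
                    "recharge": [(1.0, "charged")]},
}
R = {
    "low_battery": {"work": 0.0, "recharge": -1.0},
    "charged":     {"work": 2.0, "recharge": -1.0},
}
gamma = 0.9

# Value iteration: repeatedly back up expected future value until it settles.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                for a in actions)
         for s in states}

# Extract the greedy policy from the converged values.
policy = {s: max(actions,
                 key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a]))
          for s in states}
print(V, policy)
```

Exact dynamic programming like this assumes the state is fully known and the state space is tiny, which is precisely the assumption the POMDP discussion above relaxes.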
00:23:22.160 | When you think about optimality, you know, the community broadly has shifted on that,
00:23:27.600 | I think a little bit in how much they value the idea of optimality, of chasing an optimal solution.
00:23:35.680 | How has your views of chasing an optimal solution changed over the years when you work with robots?
00:23:42.320 | That's interesting. I think we have a little bit of a methodological crisis, actually,
00:23:49.360 | from the theoretical side. I mean, I do think that theory is important and that right now we're not
00:23:54.080 | doing much of it. So there's lots of empirical hacking around and training this and doing that
00:24:00.640 | and reporting numbers, but is it good? Is it bad? We don't know. It's very hard to say things.
00:24:05.440 | And if you look at like computer science theory, so people talked for a while, everyone was about
00:24:16.800 | solving problems optimally or completely. And then there were interesting relaxations, right? So
00:24:22.400 | people look at, oh, can I, are there regret bounds or can I do some kind of, you know,
00:24:29.360 | approximation? Can I prove something that I can approximately solve this problem or that I get
00:24:33.760 | closer to the solution as I spend more time and so on? What's interesting, I think, is that we don't
00:24:39.920 | have good approximate solution concepts for very difficult problems, right? I like to, you know,
00:24:49.200 | I like to say that I'm interested in doing a very bad job of very big problems.
00:24:53.040 | - That's a good quote.
00:24:56.000 | - Right. So very bad job of very big problems. I like to do that. But I wish I could say
00:25:02.960 | something. I wish I had a, I don't know, some kind of a formal solution concept that I could use to
00:25:12.000 | say, oh, this algorithm actually, it gives me something. Like I know what I'm going to get.
00:25:17.520 | I can do something other than just run it and get out 6.7.
00:25:20.640 | - That notion is still somehow deeply compelling to you. The notion that you can say,
00:25:26.240 | you can drop a thing on the table that says, you can expect this algorithm will give me some good
00:25:33.120 | results.
00:25:33.440 | - I hope there's, I hope science will, I mean, there's engineering and there's science. I think
00:25:39.280 | that they're not exactly the same. And I think right now we're making huge engineering, like,
00:25:45.920 | leaps and bounds. So the engineering is running away ahead of the science, which is cool and
00:25:50.880 | often how it goes, right? So we're making things and nobody knows how and why they work, roughly.
00:25:55.200 | But we need to turn that into science.
00:26:00.160 | - There's some form, it's, yeah, there's some room for formalizing.
00:26:04.720 | - We need to know what the principles are. Why does this work? Why does that not work? I mean,
00:26:08.400 | for a while people built bridges by trying, but now we can often predict whether it's going to
00:26:13.440 | work or not without building it. Can we do that for learning systems or for robots?
00:26:18.320 | - So your hope is from a materialistic perspective that
00:26:21.600 | intelligence, artificial intelligence systems, robots, are just fancier bridges.
00:26:28.080 | Belief space. What's the difference between belief space and state space? So you mentioned MDPs,
00:26:34.240 | POMDPs, reasoning about, you sense the world, there's a state. What's this belief space idea?
00:26:42.720 | - Yeah, that sounds so good.
00:26:44.240 | - That sounds good. So belief space, that is, instead of thinking about what's the state of
00:26:50.640 | the world and trying to control that as a robot, I think about what is the space of beliefs that I
00:26:58.800 | could have about the world? What's, if I think of a belief as a probability distribution of ways the
00:27:03.680 | world could be, a belief state as a distribution, and then my control problem, if I'm reasoning
00:27:10.080 | about how to move through a world I'm uncertain about, my control problem is actually the problem
00:27:16.160 | of controlling my beliefs. So I think about taking actions, not just what effect they'll
00:27:21.040 | have on the world outside, but what effect they'll have on my own understanding of the world outside.
00:27:25.200 | And so that might compel me to ask a question or look somewhere to gather information,
00:27:31.920 | which may not really change the world state, but it changes my own belief about the world.
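As a loose sketch of the belief-space idea (not from the conversation itself): for a small discrete state space, a belief is just a probability distribution over states, updated with a prediction step for the action taken and a correction step for the observation received. The door example and all the numbers below are hypothetical.

```python
# Discrete Bayes-filter sketch: a belief is a distribution over states,
# updated after each action (prediction) and observation (correction).
# The models and numbers here are invented for illustration.

def normalize(b):
    total = sum(b.values())
    return {s: p / total for s, p in b.items()}

def belief_update(belief, action, observation, T, O):
    # Prediction: push the belief through the transition model T[s][a][s'].
    predicted = {s2: sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
                 for s2 in belief}
    # Correction: weight by the observation likelihood O[s'][o] and renormalize.
    corrected = {s2: predicted[s2] * O[s2].get(observation, 0.0) for s2 in predicted}
    return normalize(corrected)

# Hypothetical two-state world: is the door open or closed?
T = {"open":   {"push": {"open": 1.0},                "wait": {"open": 1.0}},
     "closed": {"push": {"open": 0.8, "closed": 0.2}, "wait": {"closed": 1.0}}}
O = {"open":   {"see_open": 0.6, "see_closed": 0.4},
     "closed": {"see_open": 0.2, "see_closed": 0.8}}

belief = {"open": 0.5, "closed": 0.5}
belief = belief_update(belief, "push", "see_open", T, O)
print(belief)  # mass shifts strongly toward "open"
```

A planner working in belief space chooses actions by reasoning about how updates like this will change the distribution, which is what makes deliberate information gathering possible.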
00:27:36.480 | - That's a powerful way to empower the agent to reason about the world, to explore the world.
00:27:44.480 | What kind of problems does it allow you to solve to consider belief space versus just state space?
00:27:51.680 | - Well, any problem that requires deliberate information gathering, right? So if,
00:27:56.800 | in some problems, like chess, there's no uncertainty, or maybe there's uncertainty
00:28:03.440 | about the opponent, there's no uncertainty about the state. And some problems there's
00:28:09.760 | uncertainty, but you gather information as you go, right? You might say, "Oh, I'm driving my
00:28:15.200 | autonomous car down the road, and it doesn't know perfectly where it is, but the lidars are all
00:28:19.120 | going all the time, so I don't have to think about whether to gather information." But if you're a
00:28:24.640 | human driving down the road, you sometimes look over your shoulder to see what's going on behind
00:28:29.680 | you in the lane, and you have to decide whether you should do that now. And you have to trade off
00:28:37.280 | the fact that you're not seeing in front of you, and you're looking behind you, and how valuable
00:28:41.200 | is that information, and so on. And so to make choices about information gathering,
00:28:46.000 | you have to reason in belief space. Also, I mean, also to just take into account your own
00:28:55.440 | uncertainty before trying to do things. So you might say, "If I understand where I'm standing
00:29:02.640 | relative to the door jam pretty accurately, then it's okay for me to go through the door. But if
00:29:08.000 | I'm really not sure where the door is, then it might be better to not do that right now."
00:29:12.880 | The degree of your uncertainty about the world is actually part of the thing you're trying to
00:29:17.760 | optimize in forming the plan, right? That's right.
00:29:20.960 | So this idea of a long horizon of planning for a PhD, or just even how to get out of the house,
00:29:27.040 | or how to make breakfast. You show this presentation of the WTF, where's the fork,
00:29:32.720 | of robot looking at a sink. And can you describe how we plan in this world,
00:29:40.640 | of this idea of hierarchical planning we've mentioned? So yeah, how can a robot hope to
00:29:47.360 | plan about something with such a long horizon, where the goal is quite far away?
00:29:53.840 | People, since probably reasoning began, have thought about hierarchical reasoning.
00:29:59.760 | The temporal hierarchy in particular. Well, there's spatial hierarchy, but let's talk
00:30:03.040 | about temporal hierarchy. So you might say, "Oh, I have this long execution I have to do,
00:30:08.960 | but I can divide it into some segments abstractly." So maybe you have to get out of the house,
00:30:15.360 | I have to get in the car, I have to drive, and so on. And so you can plan. If you can build
00:30:22.720 | abstractions, so this, we started out by talking about abstractions, and we're back to that now.
00:30:26.960 | If you can build abstractions in your state space, and abstractions, sort of temporal abstractions,
00:30:34.560 | then you can make plans at a high level. And you can say, "I'm going to go to town, and then I'll
00:30:40.160 | have to get gas, and then I can go here, and I can do this other thing." And you can reason about
00:30:43.840 | the dependencies and constraints among these actions, again, without thinking about the complete
00:30:50.000 | details. What we do in our hierarchical planning work is then say, "All right, I make a plan at a
00:30:57.280 | high level of abstraction. I have to have some reason to think that it's feasible without
00:31:03.920 | working it out in complete detail." And that's actually the interesting step. I always like to
00:31:08.800 | talk about walking through an airport. Like, you can plan to go to New York and arrive at the
00:31:14.160 | airport, and then find yourself an office building later. You can't even tell me in advance what your
00:31:20.000 | plan is for walking through the airport. Partly because you're too lazy to think about it, maybe,
00:31:24.960 | but partly also because you just don't have the information. You don't know what gate you're
00:31:28.400 | landing in, or what people are going to be in front of you, or anything. So there's no point
00:31:34.320 | in planning in detail. But you have to have -- you have to make a leap of faith that you can figure
00:31:41.040 | it out once you get there. And it's really interesting to me how you arrive at that.
00:31:47.680 | How do you -- so you have learned over your lifetime to be able to make some kinds of
00:31:53.040 | predictions about how hard it is to achieve some kinds of sub-goals. And that's critical. Like,
00:31:58.800 | you would never plan to fly somewhere if you couldn't -- didn't have a model of how hard it
00:32:03.360 | was to do some of the intermediate steps. So one of the things we're thinking about now is,
00:32:06.800 | how do you do this kind of very aggressive generalization to situations that you haven't
00:32:14.080 | been in and so on, to predict how long it will take to walk through the Kuala Lumpur airport?
00:32:18.640 | Like, you could give me an estimate and it wouldn't be crazy. And you have to have an estimate of that
00:32:24.560 | in order to make plans that involve walking through the Kuala Lumpur airport, even if you
00:32:28.880 | don't need to know it in detail. So I'm really interested in these kinds of abstract models and
00:32:34.160 | how do we acquire them. But once we have them, we can use them to do hierarchical reasoning,
00:32:38.960 | which I think is very important. Yeah, there's this notion of goal regression and
00:32:44.720 | pre-image backchaining, this idea of starting at the goal and just forming these big clouds
00:32:50.960 | of states. I mean, it's almost like saying, you know, once you show up
00:32:59.200 | to the airport, you're like a few steps away from the goal. So like, thinking of it this way,
00:33:07.040 | it's kind of interesting. I don't know if you have sort of further comments on that,
00:33:12.320 | of starting at the goal. Yeah, I mean, it's interesting that Simon, Herb Simon, back in the
00:33:20.000 | early days of AI, talked a lot about means-ends reasoning and reasoning back from the goal.
00:33:25.120 | There's a kind of an intuition that people have that the number of, that state space is big,
00:33:32.960 | the number of actions you could take is really big. So if you say, here I sit and I want to
00:33:37.600 | search forward from where I am, what are all the things I could do? That's just overwhelming.
00:33:41.520 | If you say, if you can reason at this other level and say, here's what I'm hoping to achieve,
00:33:46.480 | what could I do to make that true? That somehow the branching is smaller. Now,
00:33:51.600 | what's interesting is that like in the AI planning community, that hasn't worked out. In the class of
00:33:56.960 | problems that they look at and the methods that they tend to use, it hasn't turned out that it's
00:34:00.720 | better to go backward. It's still kind of my intuition that it is, but I can't prove that
00:34:07.120 | to you right now. Right. I share your intuition, at least for us mere humans. Speaking of which,
00:34:15.920 | when you maybe now we take it and take a little step into that philosophy circle,
00:34:21.200 | how hard would it, when you think about human life, you give those examples often,
00:34:27.680 | how hard do you think it is to formulate human life as a planning problem or aspects of
00:34:32.400 | human life? So when you look at robots, you're often trying to think about object manipulation,
00:34:38.640 | tasks, about moving a thing. When you take a slight step outside the room, let the robot
00:34:46.240 | leave and go get lunch, or maybe try to pursue more fuzzy goals. How hard do you think is that
00:34:54.480 | problem? If you were to try to maybe put another way, try to formulate human life as a planning
00:35:00.800 | problem. Well, that would be a mistake. I mean, it's not all a planning problem, right? I think
00:35:05.760 | it's really, really important that we understand that you have to put together pieces and parts
00:35:11.920 | that have different styles of reasoning and representation and learning. I think it seems
00:35:18.080 | probably clear to anybody that it can't all be this or all be that. Brains aren't all like this
00:35:25.680 | or all like that, right? They have different pieces and parts and substructure and so on.
00:35:30.160 | So I don't think that there's any good reason to think that there's going to be like one
00:35:33.920 | true algorithmic thing that's going to do the whole job.
00:35:38.080 | Just a bunch of pieces together designed to solve a bunch of specific problems.
00:35:43.040 | Or maybe styles of problems. I mean, there's probably some reasoning that needs to go on
00:35:50.320 | in image space. I think, again, there's this model-based versus model-free idea, right? So
00:35:58.560 | in reinforcement learning, people talk about, "Oh, should I learn? I could learn a policy,
00:36:03.600 | just straight up a way of behaving. I could, it's popular, learn a value function,
00:36:09.440 | that's some kind of weird intermediate ground. Or I could learn a transition model, which tells me
00:36:15.840 | something about the dynamics of the world." Imagine that I learn a transition
00:36:20.800 | model and I couple it with a planner and I draw a box around that, I have a policy again. It's just
00:36:26.800 | stored a different way. But it's just as much of a policy as the other policy. It's just I've made,
00:36:34.080 | I think, the way I see it is it's a time-space trade-off in computation. Right? A more overt
00:36:41.440 | policy representation. Maybe it takes more space, but maybe I can compute quickly what action I
00:36:47.520 | should take. On the other hand, maybe a very compact model of the world dynamics plus a
00:36:52.480 | planner lets me compute what action to take too, just more slowly. I mean,
00:36:57.840 | I don't think there's an argument to be had. It's just like a question of what form of computation
00:37:03.840 | is best for us. - For the various sub-problems. - Right. So, and so like learning to do algebra
00:37:11.840 | manipulations for some reason is, I mean, that's probably gonna want naturally a sort of a
00:37:17.040 | different representation than riding a unicycle. The time constraints on the unicycle are
00:37:22.080 | serious. The space is maybe smaller. I don't know. But so I. - And there could be the more human
00:37:28.080 | sides of falling in love, having a relationship that might be another. - Yeah, I have no idea.
00:37:35.600 | - How to model that. Yeah. Let's first solve the algebra and the object manipulation.
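A small Python sketch of the time-space trade-off described a few exchanges above: an explicit policy is a lookup table that answers instantly, while a compact model plus a planner computes the same state-to-action mapping on demand. The toy states, dynamics, and rewards are invented.

```python
# Two representations of "what to do": an explicit policy table (more space,
# instant lookup) versus a tiny world model plus a planner that searches at
# decision time (less space, more computation). Everything here is invented.

from typing import Dict

# Overt policy: stored explicitly.
policy_table: Dict[str, str] = {"hungry": "cook", "fed": "work"}

def policy_from_table(state: str) -> str:
    return policy_table[state]

# Implicit policy: model[state][action] = (next_state, reward).
model = {
    "hungry": {"cook": ("fed", +1.0), "work": ("hungry", -1.0)},
    "fed":    {"cook": ("fed", -0.5), "work": ("fed", +2.0)},
}

def policy_from_model(state: str, horizon: int = 3) -> str:
    # Finite-horizon lookahead using the model; recomputed at every query.
    def value(s: str, h: int) -> float:
        if h == 0:
            return 0.0
        return max(r + value(s2, h - 1) for s2, r in model[s].values())
    return max(model[state],
               key=lambda a: model[state][a][1] + value(model[state][a][0], horizon - 1))

# Drawing a box around "model + planner" yields a policy, just stored differently.
assert policy_from_table("hungry") == policy_from_model("hungry") == "cook"
```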
00:37:42.560 | What do you think is harder, perception or planning? - Perception. That's why. - Understanding.
00:37:49.040 | That's why. So what do you think is so hard about perception,
00:37:53.760 | about understanding the world around you? - Well, I mean, I think the big question
00:37:57.520 | is representational. Hugely the question is representation. So perception has made
00:38:08.400 | great strides lately, right? And we can classify images and we can
00:38:12.800 | play certain kinds of games and predict how to steer the car and all this sort of stuff.
00:38:17.760 | I don't think we have a very good idea of what perception should deliver, right? So if you,
00:38:28.160 | if you believe in modularity, okay, there's a very strong view which says
00:38:34.560 | we shouldn't build in any modularity. We should make a giant, gigantic neural network,
00:38:40.400 | train it end to end to do the thing. And that's the best way forward. And it's hard to argue
00:38:47.600 | with that except on a sample complexity basis, right? So you might say, oh, well, if I want to
00:38:52.960 | do end to end reinforcement learning on this giant, giant neural network, it's going to take
00:38:56.400 | a lot of data and a lot of like broken robots and stuff. So then the only answer is to say, okay,
00:39:07.280 | we have to build something in, build in some structure or some bias. We know from theory of
00:39:12.960 | machine learning, the only way to cut down the sample complexity is to kind of cut down,
00:39:16.960 | somehow cut down the hypothesis space. You can do that by building in bias. There's all kinds
00:39:22.960 | of reasons to think that nature built bias into humans. Convolution is a bias. It's a very strong
00:39:31.760 | bias and it's a very critical bias. So my own view is that we should look for more things that are
00:39:38.880 | like convolution, but that address other aspects of reasoning, right? So convolution helps us a
00:39:43.840 | lot with a certain kind of spatial reasoning. That's quite close to the imaging. I think
00:39:50.640 | there's other ideas like that. Maybe some amount of forward search, maybe some notions of abstraction,
00:39:58.080 | maybe the notion that objects exist. Actually, I think that's pretty important. And a lot of people
00:40:03.200 | won't give you that to start with. Right? - So almost like a convolution in the,
00:40:07.600 | in the object, semantic object space or some kind of, some kind of ideas in there.
00:40:14.560 | - That's right. And people are starting, like the graph, graph convolutions are an idea that
00:40:18.320 | is related to relational representations. And so, I've come far
00:40:26.720 | afield from perception, but I think the thing that's going to make perception take
00:40:32.160 | the next step is actually understanding better what it should produce. Right? So what are we
00:40:37.600 | going to do with the output of it? Right? It's fine when what we're going to do with the output
00:40:41.280 | is steer. It's less clear when we're just trying to make one integrated, intelligent agent. What
00:40:49.040 | should the output of perception be? We have no idea. And how should that hook up to the other
00:40:53.520 | stuff? We don't know. So I think the pressing question is what kinds of structure can we build
00:41:00.480 | in that are like the moral equivalent of convolution that will make a really awesome
00:41:05.520 | superstructure that then learning can kind of progress on efficiently.
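A back-of-the-envelope illustration, in Python, of why convolution is such a strong bias: sharing a small kernel across image locations gives orders of magnitude fewer parameters than a dense layer on the same input, which shrinks the hypothesis space and the sample complexity. The image size and channel counts below are arbitrary.

```python
# Parameter counts for a convolutional layer versus a fully connected layer
# producing the same number of output units (sizes chosen arbitrarily).

H, W, C_in, C_out = 64, 64, 3, 16   # hypothetical image height, width, channels
k = 3                               # hypothetical kernel size

conv_params = C_out * (C_in * k * k + 1)                           # shared weights plus biases
dense_params = (H * W * C_out) * (H * W * C_in) + (H * W * C_out)  # one weight per connection

print(conv_params)    # 448
print(dense_params)   # about 805 million
```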
00:41:10.000 | - I agree. Very compelling description of actually where we stand with the perception problem.
00:41:14.080 | You're teaching a course on embodying intelligence. What do you think it takes to
00:41:19.120 | build a robot with human-level intelligence? - I don't know. If we knew, we would do it.
00:41:24.800 | - If you were to, I mean, okay. So do you think a robot needs to have a self-awareness,
00:41:35.200 | a consciousness, fear of mortality, or is it simpler than that? Or is consciousness a simple
00:41:44.160 | thing? Do you think about these notions? - I don't think much about consciousness.
00:41:49.200 | Even most philosophers who care about it will give you that you could have robots that are zombies,
00:41:55.760 | right? That behave like humans, but are not conscious. And I, at this moment,
00:41:59.680 | would be happy enough with that. So I'm not really worried one way or the other.
00:42:02.480 | - So on the technical side, you're not thinking of the use of self-awareness?
00:42:06.800 | - Well, but I, okay. But then what does self-awareness mean? I mean,
00:42:11.280 | that you need to have some part of the system that can observe other parts of the system and
00:42:18.160 | tell whether they're working well or not. That seems critical. So does that count as, I mean,
00:42:23.920 | does that count as self-awareness or not? Well, it depends on whether you think that there's
00:42:29.200 | somebody at home who can articulate whether they're self-aware. But clearly, if I have
00:42:33.680 | some piece of code that's counting how many times this procedure gets executed,
00:42:38.240 | that's a kind of self-awareness, right? So there's a big spectrum. It's clear you have to have some
00:42:44.080 | of it. - Right. We're quite far away,
00:42:46.320 | in many dimensions, but is there a direction of research that's most compelling to you for
00:42:51.600 | trying to achieve human-level intelligence in our robots?
00:42:55.760 | - Well, to me, I guess the thing that seems most compelling to me at the moment is this
00:43:00.960 | question of what to build in and what to learn. I think we're missing a bunch of ideas and
00:43:11.680 | we, you know, people, you know, don't you dare ask me how many years it's going to be until that
00:43:17.760 | happens because I won't even participate in the conversation because I think we're missing ideas
00:43:23.120 | and I don't know how long it's going to take to find them. - So I won't ask you how many years,
00:43:27.120 | but maybe I'll ask you when you'll be sufficiently impressed that we've achieved it. So what's a good
00:43:36.960 | test of intelligence? Do you like the Turing test, the natural language, and the robotic space? Is
00:43:42.640 | there something where you would sit back and think, "Oh, that's pretty impressive as a test,
00:43:49.760 | as a benchmark." Do you think about these kinds of problems? - No, I resist. I mean, I think all
00:43:55.040 | the time that we spend arguing about those kinds of things could be better spent just making the
00:43:59.840 | robots work better. - You don't value competition. So, I mean, there's a nature of benchmarks and
00:44:08.320 | data sets or Turing test challenges where everybody kind of gets together and tries to build a better
00:44:13.840 | robot because they want to out-compete each other. Like the DARPA challenge with the autonomous
00:44:18.080 | vehicles. Do you see the value of that or it can get in the way? - I think it can get in the way.
00:44:25.760 | Many people find it motivating and so that's good. I find it anti-motivating personally.
00:44:31.120 | But I think you get an interesting cycle where for a contest, a bunch of smart people get super
00:44:40.000 | motivated and they hack their brains out. And much of what gets done is just hacks, but sometimes
00:44:45.200 | really cool ideas emerge. And then that gives us something to chew on after that. So, it's not a
00:44:52.080 | thing for me, but I don't regret that other people do it. - Yeah, it's like you said, with everything
00:44:58.160 | else, the mix is good. So, jumping topics a little bit, you started the Journal of Machine Learning
00:45:03.440 | Research and served as its editor-in-chief. How did the publication come about? And what do you
00:45:12.560 | think about the current publishing model space in machine learning, artificial intelligence? -
00:45:18.400 | Okay, good. So, it came about because there was a journal called Machine Learning, which still
00:45:23.680 | exists, which was owned by Kluwer. And I was on the editorial board and we used to have these
00:45:30.800 | meetings annually where we would complain to Kluwer that it was too expensive for the libraries and
00:45:35.360 | that people couldn't publish. And we would really like to have some kind of relief on those fronts
00:45:39.840 | and they would always sympathize, but not do anything. So, we just decided to make a new
00:45:46.960 | journal. And there was the Journal of AI Research, which was on the same model, which had been in
00:45:53.280 | existence for maybe five years or so, and it was going on pretty well. So, we just made a new
00:46:00.480 | journal. I mean, I don't know, I guess it was work, but it wasn't that hard. So, basically,
00:46:06.320 | the editorial board, probably 75% of the editorial board of Machine Learning resigned and
00:46:14.720 | we founded this new journal. - But it was sort of, it was more open. - Yeah, right. So, it's
00:46:22.320 | completely open. It's open access. Actually, I had a postdoc, George Konidaris, who wanted to
00:46:29.520 | call these journals free for all. Because there were, I mean, it both has no page charges and has
00:46:37.440 | no access restrictions. And the reason, and so lots of people, I mean, there were people who
00:46:46.960 | were mad about the existence of this journal who thought it was a fraud or something. It would be
00:46:51.280 | impossible, they said, to run a journal like this with basically, I mean, for a long time, I didn't
00:46:56.240 | even have a bank account. I paid for the lawyer to incorporate and the IP address, and it used
00:47:05.200 | to cost a couple hundred dollars a year to run. It's a little bit more now, but not that much
00:47:09.760 | more. But that's because I think computer scientists are competent and autonomous in a way
00:47:17.440 | that many scientists in other fields aren't, I mean, at doing these kinds of things. We all
00:47:22.320 | typeset our own papers. We all have students and people who can hack a website together in
00:47:27.040 | the afternoon. So, the infrastructure for us was like, not a problem. But for other people in other
00:47:32.640 | fields, it's a harder thing to do. - Yeah, and this kind of open access journal is nevertheless
00:47:38.960 | one of the most prestigious journals. So prestige can be achieved without
00:47:46.240 | any of the- - Paper is not required for prestige,
00:47:49.120 | it turns out. - So, on the review process side,
00:47:52.320 | actually a long time ago, I don't remember when, but I reviewed a paper where you were also a
00:47:58.080 | reviewer, and I remember reading your review and being influenced by it. It was really well written. It
00:48:03.120 | influenced how I write future reviews. You disagreed with me actually, and it made
00:48:09.680 | my review much better. But nevertheless, the review process has its flaws.
00:48:19.280 | And how do you think, what do you think works well? How can it be improved?
00:48:23.200 | - So, actually when I started JMLR, I wanted to do something completely different.
00:48:27.600 | And I didn't because it felt like we needed a traditional journal of record. And so, we just
00:48:34.800 | made JMLR be almost like a normal journal, except for the open access parts of it, basically.
00:48:40.720 | Increasingly, of course, publication is not even a sensible word. You can publish something by
00:48:47.600 | putting it in archive so I can publish everything tomorrow. So, making stuff public is, there's no
00:48:55.360 | barrier. We still need curation and evaluation. I don't have time to read all of archive.
00:49:06.880 | And you could argue that kind of social thumbs upping of articles suffices, right? You might say,
00:49:21.280 | "Oh, heck with this. We don't need journals at all. We'll put everything on archive and people
00:49:25.920 | will upvote and downvote the articles and then your CV will say, "Oh man, he got a lot of upvotes."
00:49:30.880 | So, that's good. But I think there's still value in careful reading and commentary of things. And
00:49:45.040 | it's hard to tell when people are upvoting and downvoting or arguing about your paper on Twitter
00:49:49.440 | and Reddit, whether they know what they're talking about, right? So, then I have the
00:49:55.440 | second order problem of trying to decide whose opinions I should value and such. So, I don't know.
00:50:01.520 | If I had infinite time, which I don't, and I'm not going to do this because I really want to make
00:50:06.560 | robots work, but if I felt inclined to do something more in the publication direction,
00:50:11.920 | I would do this other thing, which I thought about doing the first time, which is to get together
00:50:16.800 | some set of people whose opinions I value and who are pretty articulate. And I guess we would be
00:50:22.880 | public, although we could be private, I'm not sure. And we would review papers. We wouldn't
00:50:27.520 | publish them and you wouldn't submit them. We would just find papers and we would write
00:50:31.040 | reviews and we would make those reviews public. And maybe if you, you know, so we're Leslie's
00:50:38.720 | friends who review papers and maybe eventually if we, our opinion was sufficiently valued,
00:50:44.320 | like the opinion of JMLR is valued, then you'd say on your CV that Leslie's friends gave my paper a
00:50:49.920 | five-star reading and that would be just as good as saying I got it accepted into this journal.
00:50:55.440 | So, I think we should have good public commentary and organize it in some way,
00:51:03.760 | but I don't really know how to do it. It's interesting times.
00:51:06.080 | - The way you describe it actually is really interesting. I mean, we do it for movies,
00:51:10.000 | IMDb.com. There are experts, critics who come in, they write reviews, but there's also
00:51:16.000 | regular non-critics. Humans write reviews and they're separated.
00:51:19.760 | - I like open review. The ICLR process I think is interesting.
00:51:29.120 | - It's a step in the right direction, but it's still not as compelling as
00:51:32.960 | reviewing movies or video games. I mean, it sometimes almost, it might be silly,
00:51:40.240 | at least from my perspective to say, but it boils down to the user interface,
00:51:43.760 | how fun and easy it is to actually perform the reviews, how efficient, how much you as a reviewer
00:51:50.160 | get street cred for being a good reviewer. Those human elements come into play.
00:51:56.640 | - No, it's a big investment to do a good review of a paper and the flood of papers is out of control.
00:52:04.000 | Right, so, you know, there aren't 3,000 new, I don't know how many new movies there are in a year.
00:52:08.480 | I don't know, but there's probably gonna be less than how many machine learning papers
00:52:11.920 | are in a year now. And I'm worried, you know, I, right, so I'm like an old person, so of course,
00:52:21.760 | I'm gonna say, "Rawr, rawr, rawr, things are moving too fast. I'm a stick in the mud."
00:52:26.320 | So I can say that, but my particular flavor of that is, I think the horizon for researchers
00:52:34.560 | has gotten very short. Students want to publish a lot of papers, and there's value in that,
00:52:41.520 | it's exciting, and you get patted on the head for it
00:52:46.480 | and so on. And some of that is fine, but I'm worried that we're driving out people who
00:52:57.760 | would spend two years thinking about something. Back in my day, when we worked on our theses,
00:53:05.280 | we did not publish papers. You did your thesis for years. You picked a hard problem and then
00:53:10.400 | you worked and chewed on it and did stuff and wasted time, for a long time. And
00:53:16.000 | roughly when it was done, you would write papers. And so I don't know how to, and I don't think
00:53:22.640 | that everybody has to work in that mode, but I think there's some problems that are hard enough
00:53:26.800 | that it's important to have a longer research horizon and I'm worried that
00:53:31.680 | we don't incentivize that at all at this point. - In this current structure.
00:53:36.800 | - Right. - Yeah. So, continuing on this theme,
00:53:40.560 | what are your hopes and fears about the future of AI? So AI has gone
00:53:47.280 | through a few winters, ups and downs. Do you see another winter of AI coming? Are you more hopeful
00:53:55.760 | about making robots work, as you said? - I think the cycles are inevitable,
00:54:02.880 | but I think each time we get higher, right? I mean, so, you know, it's like climbing some kind
00:54:09.680 | of landscape with a noisy optimizer. So it's clear that the, you know, the deep learning stuff has
00:54:19.520 | made deep and important improvements. And so the high watermark is now higher. There's no question,
00:54:26.960 | but of course, I think people are overselling and eventually investors, I guess, and other people
00:54:35.680 | look around and say, well, you're not quite delivering on this grand claim and that wild
00:54:41.680 | hypothesis. It's like, probably it's going to crash some amount and then it's okay. I mean,
00:54:48.800 | but I can't imagine that there's some awesome monotonic improvement from here to
00:54:55.200 | human-level AI. - So, you know, I have to ask this question, and I can probably anticipate
00:55:02.320 | the answers, but do you have a worry, short-term or long-term, about the existential threats of AI,
00:55:10.240 | and maybe, in the short term, less existential, but more about robots taking away jobs?
00:55:18.880 | - Well, actually, let me talk a little bit about utility. Actually, I had an interesting
00:55:27.200 | conversation with some military ethicists who wanted to talk to me about autonomous weapons.
00:55:32.560 | And they were interesting, smart, well-educated guys who didn't know too much about AI or machine
00:55:40.880 | learning. And the first question they asked me was, has your robot ever done something you didn't
00:55:45.280 | expect? And I, like, burst out laughing, because anybody who's ever done anything with a robot,
00:55:50.560 | right, knows that they don't do much. And what I realized was that their model of how we program a
00:55:56.240 | robot was completely wrong. Their model of how we can program a robot was like Lego Mindstorms,
00:56:02.560 | like, oh, go forward a meter, turn left, take a picture, do this, do that. And so if you have
00:56:07.280 | that model of programming, then it's true. It's kind of weird that your robot would do something
00:56:12.560 | that you didn't anticipate. But the fact is, and actually, so now this is my new educational
00:56:17.680 | mission. If I have to talk to non-experts, I try to teach them the idea that we don't operate at that level;
00:56:24.560 | we operate at least one, or maybe many, levels of abstraction above that. And we say, oh,
00:56:29.680 | here's a hypothesis class. Maybe it's a space of plans, or maybe it's a space of classifiers or
00:56:35.200 | whatever, but there's some set of answers and an objective function. And then we work on some
00:56:40.160 | optimization method that tries to optimize a solution in that class. And we don't know what
00:56:46.800 | solution is going to come out. So I think it's important to communicate that. So I mean, of
00:56:52.320 | course, probably people who listen to this, they know that lesson. But I think it's really critical
00:56:56.960 | to communicate that lesson. And then lots of people are now talking about the value alignment
00:57:01.840 | problem. So you want to be sure as robots or software systems get more competent, that their
00:57:09.360 | objectives are aligned with your objectives, or that our objectives are compatible in some way,
00:57:14.480 | or we have a good way of mediating when they have different objectives. And so I think it is
00:57:20.240 | important to start thinking in terms, like, you don't have to be freaked out by the robot apocalypse
00:57:26.720 | to accept that it's important to think about objective functions and value alignment.
00:57:30.480 | >> Yes. >> And, really,
00:57:32.960 | everyone who's done optimization knows that you have to be careful what you wish for, that,
00:57:37.120 | you know, sometimes you get the optimal solution, and you realize, man, that objective was wrong.
00:57:41.920 | So pragmatically, in the shortish term, it seems to me that those are really interesting and
00:57:50.480 | critical questions. And the idea that we're going to go from being people who engineer algorithms
00:57:55.040 | to being people who engineer objective functions, I think that's definitely going to happen. And
00:58:00.400 | that's going to change our thinking and methodology. >> You started at Stanford in
00:58:05.360 | philosophy; maybe that's where you should go back to, philosophy. >> Philosophy, maybe.
00:58:09.600 | >> Designing objective functions. >> Well, I mean, they're mixed together,
00:58:12.880 | because as we also know, as machine learning people, right, when you design, in fact, this is
00:58:18.080 | the lecture I gave in class today, when you design an objective function, you have to wear both hats.
00:58:23.360 | There's the hat that says, what do I want? And there's the hat that says, but I know what my
00:58:28.320 | optimizer can do to some degree. And I have to take that into account. So it's always a tradeoff,
00:58:34.640 | and we have to kind of be mindful of that. The part about taking people's jobs, I understand
00:58:41.520 | that that's important. I don't understand sociology or economics or people very well.
00:58:47.920 | So I don't know how to think about that. >> Yeah, so there might be a
00:58:51.840 | sociological aspect there, an economic aspect, that's very difficult to think about. Okay.
00:58:56.400 | >> I mean, I think other people should be thinking about it, but I'm just, that's not my strength.
00:58:59.840 | >> So what do you think is the most exciting area of research in the short term,
00:59:04.320 | for the community and for yourself? >> Well, so, I mean, there's this story
00:59:08.400 | I've been telling about how to engineer intelligent robots, right? So that's what we want to do. We
00:59:15.920 | all kind of want to do, well, I mean, some set of us want to do this. And the question is, what's
00:59:20.240 | the most effective strategy? And we've tried, and there's a bunch of different things you could do
00:59:24.960 | at the extremes, right? One super extreme is we do introspection and we write a program. Okay,
00:59:30.960 | that has not worked out very well. Another extreme is we take a giant bunch of neural
00:59:35.680 | goo and we try to train it up to do something. I don't think that's going to work either.
00:59:39.360 | So the question is, what's the middle ground? And again, this isn't a theological question
00:59:48.480 | or anything like that. It's just like, how do we, what's the best way to make this work out?
00:59:54.960 | And to me, I think it's clear: it's a combination
01:00:00.240 | of learning and not learning. And what should that combination be? And what's the stuff we
01:00:05.120 | build in? So to me, that's the most compelling question. >> And when you say engineer robots,
01:00:09.680 | you mean engineering systems that work in the real world? Is that the emphasis?
01:00:17.600 | Last question. Which robot, or robots, is your favorite from science fiction?
01:00:23.200 | So you can go with Star Wars and R2-D2, or you can go with more modern, maybe HAL from-
01:00:33.280 | >> No, I don't think I have a favorite robot from science fiction.
01:00:37.040 | >> This is back to, you like to make robots work in the real world here, not in-
01:00:45.360 | >> I mean, I love the process. And I care more about the process.
01:00:49.920 | >> The engineering process. >> Yeah. I mean, I do research because it's fun,
01:00:53.840 | not because I care about what we produce. >> Well, that's a beautiful note, actually,
01:00:59.520 | to end on. Leslie, thank you so much for talking today.
01:01:01.840 | >> Sure, it's been fun.