
Leslie Kaelbling: Reinforcement Learning, Planning, and Robotics | Lex Fridman Podcast #15


Chapters

0:00 Intro
0:42 What made you get excited about AI
1:40 Philosophy and computer science
2:20 Majors in AI
3:08 AI philosophers
4:17 The philosophical gap
5:20 The first robot
9:00 Reinventing reinforcement learning
11:42 Roadblocks in symbolic reasoning
14:11 Where symbolic reasoning is useful
14:59 Why abstractions are critical
17:13 Automated construction of abstractions
19:21 Markov decision process
20:51 Partially observable Markov decision process
21:45 Planning under uncertainty
23:22 Optimality
25:20 Science vs Engineering
27:37 Belief vs State Space
32:40 Starting at the goal
34:08 Planning human life
35:37 Model-based vs model-free
37:27 Perception vs planning
40:06 Convolution
41:10 Self-awareness
42:45 Human-level intelligence
43:26 Test of intelligence
44:57 Journal of Machine Learning Research
46:17 Open Access
47:35 Review Process
51:29 Paper Reviews
53:35 Hopes and Fears
54:58 Existential Threats
58:50 Most Exciting Research

Whisper Transcript

00:00:00.000 | The following is a conversation with Leslie Kaelbling. She's a roboticist and professor at
00:00:05.360 | MIT. She's recognized for her work in reinforcement learning, planning, robot navigation, and several
00:00:12.080 | other topics in AI. She won the IJCAI Computers and Thought Award and was the editor-in-chief
00:00:18.560 | of the prestigious Journal of Machine Learning Research. This conversation is part of the
00:00:24.320 | Artificial Intelligence Podcast at MIT and beyond. If you enjoy it, subscribe on YouTube, iTunes,
00:00:31.280 | or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now,
00:00:37.760 | here's my conversation with Leslie Kaelbling.
00:00:41.040 | - What made me get excited about AI, I can say that, is I read Gödel, Escher, Bach when I was
00:00:47.680 | in high school. That was pretty formative for me because it exposed the interestingness of
00:00:56.240 | primitives and combination and how you can make complex things out of simple parts,
00:01:01.520 | and ideas of AI and what kinds of programs might generate intelligent behavior.
00:01:07.120 | - So you first fell in love with AI reasoning and logic, versus robots?
00:01:12.400 | - Yeah, the robots came because my first job, so I finished an undergraduate degree in philosophy
00:01:18.240 | at Stanford and was about to finish a master's in computer science, and I got hired at SRI
00:01:24.160 | in their AI lab. And they were building a robot. It was a kind of a follow-on to Shakey,
00:01:30.960 | but all the Shakey people were not there anymore. And so my job was to try to get this robot to do
00:01:36.080 | stuff, and that's really kind of what got me interested in robots.
00:01:39.280 | - So maybe taking a small step back, your bachelor's in Stanford in philosophy,
00:01:44.400 | did a master's and PhD in computer science, but the bachelor's in philosophy. So what was that
00:01:49.440 | journey like? What elements of philosophy do you think you bring to your work in computer science?
00:01:54.880 | - So it's surprisingly relevant. So part of the reason that I didn't do a computer science
00:02:00.080 | undergraduate degree was that there wasn't one at Stanford at the time, but there's a part
00:02:04.480 | of philosophy, and in fact, Stanford has a special sub-major in something now called
00:02:08.320 | symbolic systems, which is logic, model theory, formal semantics of natural language. And so that's
00:02:15.120 | actually a perfect preparation for work in AI and computer science.
00:02:19.680 | - That's kind of interesting. So if you were interested in artificial intelligence,
00:02:23.840 | what kind of majors were people even thinking about taking? Was it neuroscience?
00:02:30.800 | So besides philosophy, what were you supposed to do if you were
00:02:35.520 | fascinated by the idea of creating intelligence?
00:02:37.680 | - There weren't enough people who did that for that even to be a conversation. I mean,
00:02:42.240 | I think probably philosophy. I mean, it's interesting in my class, my graduating class of
00:02:50.000 | undergraduate philosophers, probably maybe slightly less than half went on in computer science,
00:02:57.840 | slightly less than half went on in law, and like one or two went on in philosophy.
00:03:02.320 | So it was a common kind of connection.
00:03:05.520 | - Do you think AI researchers have a role to be part-time philosophers, or should they stick
00:03:10.240 | to the solid science and engineering without sort of taking the philosophizing tangents? I mean,
00:03:16.240 | you work with robots, you think about what it takes to create intelligent beings. Aren't you
00:03:21.760 | the perfect person to think about the big picture philosophy at all?
00:03:25.200 | - The parts of philosophy that are closest to AI, I think, or at least the closest to AI that I think
00:03:29.520 | about are stuff like belief and knowledge and denotation and that kind of stuff. And that's,
00:03:36.160 | you know, it's quite formal and it's like just one step away from
00:03:39.600 | the kinds of computer science work that we do kind of routinely.
00:03:44.000 | I think that there are important questions still about what you can do with a machine and what you
00:03:53.200 | can't and so on. Although at least my personal view is that I'm completely a materialist and
00:03:57.840 | I don't think that there's any reason why we can't make a robot be behaviorally indistinguishable
00:04:04.160 | from a human. And the question of whether it's distinguishable internally, whether it's a zombie
00:04:10.640 | or not in philosophy terms, I actually don't, I don't know, and I don't know if I care too much
00:04:16.480 | about that. - Right, but there are philosophical notions, they're mathematical and philosophical
00:04:22.080 | because we don't know so much, of how difficult that is, how difficult is the perception problem,
00:04:27.520 | how difficult is the planning problem, how difficult is it to operate in this world successfully.
00:04:32.720 | Because our robots are not currently as successful as human beings in many tasks,
00:04:37.920 | the question about the gap between current robots and human beings borders a little bit on
00:04:44.560 | philosophy. You know, the expanse of knowledge that's required to operate in this world,
00:04:51.600 | and the ability to form common sense knowledge, the ability to reason about uncertainty,
00:04:56.640 | much of the work you've been doing, there are open questions there that, I don't know,
00:05:03.600 | require a certain big-picture view. - To me that doesn't seem like a philosophical
00:05:09.440 | gap at all. To me, there is a big technical gap, there's a huge technical gap, but I don't see any
00:05:16.640 | reason why it's more than a technical gap. - Perfect. So, when you mentioned AI, you mentioned
00:05:23.600 | SRI, and maybe, can you describe to me when you first fell in love with robotics, the robots
00:05:31.840 | that inspired you? So you mentioned Flakey and Shakey, and what was the robot that
00:05:40.480 | first captured your imagination about what's possible? - Right, well, so the first robot I worked with was
00:05:44.800 | Flakey. Shakey was a robot that the SRI people had built, but by the time I arrived, I think,
00:05:50.880 | it was sitting in a corner of somebody's office dripping hydraulic fluid into a pan.
00:05:55.040 | But it's iconic, and really, everybody should read the Shakey tech report, because it has so
00:06:01.760 | many good ideas in it. I mean, they invented A* search and symbolic planning and learning
00:06:09.200 | macro operators. They had low-level kind of configuration space planning for their robot,
00:06:16.080 | they had vision, they had the basic ideas of a ton of things. - Can you take a
00:06:20.800 | step back, did Shakey have arms, what was the job, what was the goal? - Shakey was a mobile robot,
00:06:26.560 | but it could push objects, and so it would move things around. - With which actuator, with arms?
00:06:31.680 | - With itself, with its base. - Okay, great. - So it could, but it, and they had painted the base
00:06:39.040 | boards black, so it used vision to localize itself in a map, it detected objects, it could detect
00:06:47.680 | objects that were surprising to it, it would plan and replan based on what it saw, it reasoned about
00:06:54.480 | whether to look and take pictures. I mean, it really had the basics of so many of the things
00:07:01.280 | that we think about now. - How did it represent the space around it? - So it had representations
00:07:08.000 | at a bunch of different levels of abstraction, so it had, I think, a kind of an occupancy grid
00:07:12.400 | of some sort at the lowest level. At the high level, it was abstract, symbolic kind of rooms
00:07:18.800 | and connectivity. - So where does Flakey come in? - Yeah, okay, so I showed up at SRI, and
00:07:25.120 | we were building a brand new robot. As I said, none of the people from the previous project were
00:07:31.040 | kind of there or involved anymore, so we were kind of starting from scratch, and my advisor
00:07:37.680 | was Stan Rosenschein, he ended up being my thesis advisor, and he was motivated by this idea of
00:07:44.080 | situated computation or situated automata, and the idea was that the tools of logical reasoning were
00:07:52.400 | important, but possibly only for the engineers or designers to use in the analysis of a system,
00:08:01.200 | but not necessarily to be manipulated in the head of the system itself, right? So I might use logic
00:08:07.600 | to prove a theorem about the behavior of my robot, even if the robot's not using logic in its head
00:08:12.640 | to prove theorems, right? So that was kind of the distinction. And so the idea was to kind of use
00:08:18.560 | those principles to make a robot do stuff, but a lot of the basic things we had to kind of
00:08:27.040 | learn for ourselves, 'cause I had zero background in robotics, I didn't know anything about control,
00:08:31.360 | I didn't know anything about sensors, so we reinvented a lot of wheels on the way to getting
00:08:35.760 | that robot to do stuff. - Do you think that was an advantage
00:08:38.240 | or a hindrance? - Oh, no, I mean, I'm big in favor of
00:08:42.480 | wheel reinvention, actually. I mean, I think you learn a lot by doing it. It's important, though,
00:08:48.240 | to eventually have the pointers so that you can see what's really going on, but I think you can
00:08:53.440 | appreciate much better the good solutions once you've messed around a little bit on your own
00:08:59.440 | and found a bad one. - Yeah, I think you mentioned
00:09:01.440 | reinventing reinforcement learning and referring to rewards as pleasures, a pleasure, I think,
00:09:09.520 | which I think is a nice name for it. - Yeah, it seemed good to me.
00:09:12.480 | - It's more fun, almost. Do you think you could tell the history of AI, machine learning,
00:09:19.040 | reinforcement learning, how you think about it from the '50s to now?
00:09:23.360 | - One thing is that it oscillates, right? So things become fashionable and then they go out
00:09:29.360 | and then something else becomes cool and then it goes out and so on. So there's some interesting
00:09:34.480 | sociological process that actually drives a lot of what's going on. Early days was kind of
00:09:40.800 | cybernetics and control, right? And the idea of homeostasis, right? People who made these robots
00:09:47.840 | that could, I don't know, try to plug into the wall when they needed power and then come loose
00:09:53.840 | and roll around and do stuff. And then I think over time, the thought, well, that was inspiring,
00:10:00.560 | but people said, no, no, no, we want to get maybe closer to what feels like real intelligence or
00:10:04.480 | human intelligence. And then maybe the expert systems people tried to do that, but maybe a
00:10:14.720 | little too superficially, right? So, oh, we get the surface understanding of what intelligence is
00:10:21.440 | like because I understand how a steel mill works and I can try to explain it to you and you can
00:10:25.600 | write it down in logic and then we can make a computer infer that. And then that didn't work
00:10:31.120 | out. But what's interesting, I think, is when a thing starts to not be working very well,
00:10:37.520 | it's not only do we change methods, we change problems, right? So it's not like we have better
00:10:44.160 | ways of doing the problem that the expert systems people were trying to do. We have no ways of
00:10:48.160 | trying to do that problem. Well, maybe a few, but we kind of give up
00:10:55.200 | on that problem and we switch to a different problem and we work that for a while and we make
00:11:01.120 | progress. - As a broad community.
00:11:02.480 | - As a community, yeah. - And there's a lot of people
00:11:04.160 | who would argue you don't give up on the problem, it's just you decrease the number of people working
00:11:09.520 | on it. You almost kind of like put it on the shelf, say, we'll come back to this 20 years later.
00:11:13.760 | - Yeah, I think that's right. Or you might decide that it's malformed.
00:11:18.080 | - Like you might say, it's wrong to just try to make something that does superficial symbolic
00:11:25.520 | reasoning behave like a doctor. You can't do that until you've had the sensory motor experience
00:11:32.640 | of being a doctor or something, right? So there's arguments that say that that problem was not well
00:11:37.280 | formed, or it could be that it is well formed, but we just weren't approaching it well.
00:11:42.240 | - So you mentioned that your favorite part of logic and symbolic systems is that they give
00:11:47.600 | short names for large sets. So there is some use to this, the use to symbolic reasoning. So
00:11:55.120 | looking at expert systems and symbolic computing, what do you think are the roadblocks that were
00:12:00.320 | hit in the '80s and '90s? - Ah, okay. So right, so the fact that I'm
00:12:04.720 | not a fan of expert systems doesn't mean that I'm not a fan of some kinds of symbolic reasoning.
00:12:13.040 | Let's see, roadblocks. Well, the main roadblock, I think, was the idea that humans could
00:12:19.600 | articulate their knowledge effectively into some kind of logical statements.
00:12:26.000 | - So it's not just the cost, the effort, but really just the capability of doing it.
00:12:31.120 | - Right, because we're all experts in vision, right? But totally don't have introspective
00:12:36.720 | access into how we do that, right? And it's true that, I mean, I think the idea was, well, of
00:12:45.120 | course, even people then would know, of course, I wouldn't ask you to please write down the rules
00:12:48.800 | that you use for recognizing a water bottle. That's crazy. And everyone understood that. But
00:12:53.440 | we might ask you to please write down the rules you use for deciding, I don't know, what tie to
00:12:59.920 | put on or how to set up a microphone or something like that. But even those things, I think people
00:13:07.920 | maybe, I think what they found, I'm not sure about this, but I think what they found was that the
00:13:12.720 | so-called experts could give explanations, sort of post hoc explanations, for how and why
00:13:19.120 | they did things, but they weren't necessarily very good. And then they depended on maybe some
00:13:27.680 | kinds of perceptual things, which again, they couldn't really define very well. So I think
00:13:33.840 | fundamentally, I think that the underlying problem with that was the assumption that people could
00:13:39.040 | articulate how and why they make their decisions. - Right, so it's almost encoding the knowledge,
00:13:45.760 | converting from expert to something that a machine could understand and reason with.
00:13:51.200 | - No, no, no, not even just encoding, but getting it out of you.
00:13:54.560 | - Just... - Right? Not writing it,
00:13:58.240 | I mean, yes, hard also to write it down for the computer, but I don't think that people can
00:14:03.680 | produce it. You can tell me a story about why you do stuff, but I'm not so sure that's the why.
00:14:10.080 | - Great. So there are still on the hierarchical planning side,
00:14:18.560 | places where symbolic reasoning is very useful. So as you've talked about, so...
00:14:25.040 | - Right, so don't... - Where's the gap?
00:14:29.520 | - Yeah, okay, good. So saying that humans can't provide a description of their reasoning processes,
00:14:36.480 | that's okay, fine, but that doesn't mean that it's not good to do reasoning of various styles
00:14:42.400 | inside a computer. Those are just two orthogonal points. So then the question is, what kind of
00:14:49.120 | reasoning should you do inside a computer? And the answer is, I think you need to do all different
00:14:54.560 | kinds of reasoning inside a computer, depending on what kinds of problems you face.
00:14:59.040 | - I guess the question is, what kind of things can you encode symbolically so you can reason about?
00:15:07.840 | - I think the idea about, and even symbolic, I don't even like that terminology,
00:15:16.080 | 'cause I don't know what it means technically and formally. I do believe in abstractions.
00:15:21.440 | So abstractions are critical, right? You cannot reason at completely fine grain about everything
00:15:27.840 | in your life, right? You can't make a plan at the level of images and torques for getting a PhD.
00:15:34.080 | So you have to reduce the size of the state space and you have to reduce the horizon
00:15:39.600 | if you're gonna reason about getting a PhD or even buying the ingredients to make dinner.
00:15:44.160 | And so how can you reduce the spaces and the horizon of the reasoning you have to do? And
00:15:50.960 | the answer is abstraction, spatial abstraction, temporal abstraction. I think abstraction along
00:15:55.280 | the lines of goals is also interesting. Like you might, or well, abstraction and decomposition.
00:16:01.600 | Goals is maybe more of a decomposition thing. So I think that's where these kinds of, if you want
00:16:06.880 | to call it symbolic or discrete models come in. You talk about a room of your house instead of
00:16:13.280 | your pose. You talk about doing something during the afternoon instead of at 2:54.
00:16:20.000 | And you do that because it makes your reasoning problem easier. And also because
00:16:28.480 | you don't have enough information to reason in high fidelity about your pose of your elbow at
00:16:35.280 | 2:35 this afternoon anyway. - Right. When you're trying to get a PhD.
00:16:39.360 | - Right. Or when you're doing anything really. - Yeah, okay.
00:16:41.920 | - Except for at that moment. At that moment, you do have to reason about the pose of your elbow,
00:16:46.000 | maybe. But then maybe you do that in some continuous joint space kind of model. So again,
00:16:53.440 | my biggest point about all of this is that there should be, that dogma is not the thing, right?
00:16:59.280 | It shouldn't be that I am in favor against symbolic reasoning and you're in favor against
00:17:04.080 | neural networks. It should be that just computer science tells us what the right answer to all
00:17:10.480 | these questions is if we were smart enough to figure it out. - Well, yeah. When you try to
00:17:13.840 | actually solve the problem with computers, the right answer comes out. You mentioned abstractions.
00:17:19.680 | I mean, neural networks form abstractions or rather there's automated ways to form abstractions.
00:17:27.200 | - Absolutely. - And there's
00:17:27.840 | expert driven ways to form abstractions and expert human driven ways. And humans just seems to be
00:17:34.240 | way better at forming abstractions currently and certain problems. So when you're referring to
00:17:39.840 | 2:45 p.m. versus afternoon, how do we construct that taxonomy? Is there any room for automated
00:17:48.960 | construction of such abstractions? - Oh, I think eventually, yeah. I mean,
00:17:53.680 | I think when we get to be better machine learning engineers, we'll build algorithms that
00:18:00.240 | build awesome abstractions. - That are useful in this kind
00:18:03.200 | of way that you're describing. - Yeah.
00:18:04.400 | - Yeah. So let's then step from the abstraction discussion and let's talk about POMDPs,
00:18:15.840 | partially observable Markov decision processes. So uncertainty. So first,
00:18:20.160 | what are Markov decision processes? - What are Markov decision processes?
00:18:23.600 | - And maybe how much of our world can be modeled as MDPs? How much, when you wake up in the
00:18:30.080 | morning and you're making breakfast, do you think of yourself as an MDP? So how do you think about
00:18:35.760 | MDPs and how they relate to our world? - Well, so there's a stance question, right?
00:18:41.280 | So a stance is a position that I take with respect to a problem. So I, as a researcher or a person
00:18:48.640 | who designs systems, can decide to make a model of the world around me in some terms. So I take
00:18:56.080 | this messy world and I say, I'm going to treat it as if it were a problem of this formal kind,
00:19:02.560 | and then I can apply solution concepts or algorithms or whatever to solve that formal
00:19:06.880 | thing, right? So of course the world is not anything. It's not an MDP or a POMDP. I don't
00:19:11.440 | know what it is, but I can model aspects of it in some way or some other way. And when I model some
00:19:16.960 | aspect of it in a certain way, that gives me some set of algorithms I can use. - You can model the
00:19:21.760 | world in all kinds of ways. Some are more accepting of uncertainty, more easily modeling uncertainty
00:19:31.520 | of the world. Some really force the world to be deterministic. And so certainly MDPs model the
00:19:39.440 | uncertainty of the world. - Yes. Model some uncertainty. They model not present state
00:19:44.880 | uncertainty, but they model uncertainty in the way the future will unfold. - Right. So what are
00:19:52.320 | Markov decision processes? - Okay, so Markov decision process is a model. It's a kind of a
00:19:56.160 | model that you could make that says, I know completely the current state of my system.
00:20:00.800 | And what it means to be a state is that I have all the information right now that will let me make
00:20:08.000 | predictions about the future as well as I can. So that remembering anything about my history
00:20:13.040 | wouldn't make my predictions any better. But then it also says that then I can take some actions
00:20:21.840 | that might change the state of the world and that I don't have a deterministic model of those
00:20:26.240 | changes. I have a probabilistic model of how the world might change. It's a useful model for some
00:20:33.440 | kinds of systems. I think it's a, I mean, it's certainly not a good model for most problems.
00:20:41.520 | I think because for most problems, you don't actually know the state. For most problems,
00:20:46.960 | it's partially observed. So that's now a different problem class. - So, okay, that's where the
00:20:53.280 | POMDPs, the partially observable Markov decision processes step in. So how do they address the
00:21:00.080 | fact that you can't observe most, you have incomplete information about most of the world
00:21:05.360 | around you? - Right. So now the idea is we still kind of postulate that there exists a state. We
00:21:10.880 | think that there is some information about the world out there such that if we knew that we
00:21:15.840 | could make good predictions, but we don't know the state. And so then we have to think about how,
00:21:21.600 | but we do get observations. Maybe I get images or I hear things or I feel things, and those might be
00:21:27.760 | local or noisy. And so therefore they don't tell me everything about what's going on. And then I
00:21:31.760 | have to reason about, given the history of actions I've taken and observations I've gotten,
00:21:37.680 | what do I think is going on in the world? And then given my own kind of uncertainty about what's
00:21:42.640 | going on in the world, I can decide what actions to take. - And how difficult is this problem of
00:21:48.160 | planning under uncertainty in your view, in your long experience of modeling the world,
00:21:54.000 | trying to deal with this uncertainty, especially in real-world systems? - Optimal planning for even
00:22:02.720 | discrete POMDPs can be undecidable depending on how you set it up. And so lots of people say,
00:22:10.720 | I don't use POMDPs because they are intractable. And I think that that's a kind of a very funny
00:22:16.480 | thing to say, because the problem you have to solve is the problem you have to solve.
00:22:22.080 | So if the problem you have to solve is intractable, that's what makes us AI people,
00:22:25.840 | right? So we solve, we understand that the problem we're solving is wildly intractable,
00:22:31.680 | that we can't, we will never be able to solve it optimally. At least I don't. Yeah, right. So
00:22:37.760 | later we can come back to an idea about bounded optimality and something. But anyway,
00:22:42.720 | we can't come up with optimal solutions to these problems. So we have to make approximations,
00:22:47.680 | approximations in modeling, approximations in solution algorithms and so on. And so
00:22:52.160 | I don't have a problem with saying, yeah, my problem actually, it is POMDP in continuous
00:22:58.800 | space with continuous observations and it's so computationally complex, I can't even think about
00:23:04.000 | it's, you know, big O, whatever. But that doesn't prevent me from, it helps me,
00:23:10.320 | gives me some clarity to think about it that way. And to then take steps to make approximation after
00:23:17.600 | approximation to get down to something that's like computable in some reasonable time.
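As a rough illustration of the MDP formalism discussed above, here is a minimal value-iteration sketch in Python. The two states, actions, transition probabilities, and rewards are invented for illustration; none of this comes from the conversation itself.

```python
# Minimal value-iteration sketch for a toy MDP (all states, actions,
# transition probabilities, and rewards are invented for illustration).
# An MDP is (S, A, T, R, gamma): states, actions, a probabilistic
# transition model, rewards, and a discount factor.

states = ["low_battery", "charged"]
actions = ["work", "recharge"]

# T[s][a] = list of (probability, next_state) pairs.
T = {
    "low_battery": {"work":     [(0.9, "low_battery"), (0.1, "charged")],
                    "recharge": [(1.0, "charged")]},
    "charged":     {"work":     [(0.7, "charged"), (0.3, "low_battery")],
                    "recharge": [(1.0, "charged")]},
}
R = {
    "low_battery": {"work": 0.0, "recharge": -1.0},
    "charged":     {"work": 2.0, "recharge": -1.0},
}
gamma = 0.9

# Value iteration: repeatedly back up expected future value until it settles.
V = {s: 0.0 for s in states}
for _ in range(200):
    V = {s: max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                for a in actions)
         for s in states}

# Extract the greedy policy from the converged values.
policy = {s: max(actions,
                 key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a]))
          for s in states}
print(V, policy)
```

Exact dynamic programming like this assumes the state is fully known and the state space is tiny, which is precisely the assumption the POMDP discussion above relaxes.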
00:23:22.160 | When you think about optimality, you know, the community broadly has shifted on that,
00:23:27.600 | I think a little bit in how much they value the idea of optimality, of chasing an optimal solution.
00:23:35.680 | How has your views of chasing an optimal solution changed over the years when you work with robots?
00:23:42.320 | That's interesting. I think we have a little bit of a methodological crisis, actually,
00:23:49.360 | from the theoretical side. I mean, I do think that theory is important and that right now we're not
00:23:54.080 | doing much of it. So there's lots of empirical hacking around and training this and doing that
00:24:00.640 | and reporting numbers, but is it good? Is it bad? We don't know. It's very hard to say things.
00:24:05.440 | And if you look at like computer science theory, so people talked for a while, everyone was about
00:24:16.800 | solving problems optimally or completely. And then there were interesting relaxations, right? So
00:24:22.400 | people look at, oh, can I, are there regret bounds or can I do some kind of, you know,
00:24:29.360 | approximation? Can I prove something that I can approximately solve this problem or that I get
00:24:33.760 | closer to the solution as I spend more time and so on? What's interesting, I think, is that we don't
00:24:39.920 | have good approximate solution concepts for very difficult problems, right? I like to, you know,
00:24:49.200 | I like to say that I'm interested in doing a very bad job of very big problems.
00:24:53.040 | - That's a good quote.
00:24:56.000 | - Right. So very bad job of very big problems. I like to do that. But I wish I could say
00:25:02.960 | something. I wish I had a, I don't know, some kind of a formal solution concept that I could use to
00:25:12.000 | say, oh, this algorithm actually, it gives me something. Like I know what I'm going to get.
00:25:17.520 | I can do something other than just run it and get out 6.7.
00:25:20.640 | - That notion is still somehow deeply compelling to you. The notion that you can say,
00:25:26.240 | you can drop a thing on the table that says, you can expect this algorithm will give me some good
00:25:33.120 | results.
00:25:33.440 | - I hope there's, I hope science will, I mean, there's engineering and there's science. I think
00:25:39.280 | that they're not exactly the same. And I think right now we're making huge engineering, like,
00:25:45.920 | leaps and bounds. So the engineering is running away ahead of the science, which is cool and
00:25:50.880 | often how it goes, right? So we're making things and nobody knows how and why they work, roughly.
00:25:55.200 | But we need to turn that into science.
00:26:00.160 | - There's some form, it's, yeah, there's some room for formalizing.
00:26:04.720 | - We need to know what the principles are. Why does this work? Why does that not work? I mean,
00:26:08.400 | for a while people built bridges by trying, but now we can often predict whether it's going to
00:26:13.440 | work or not without building it. Can we do that for learning systems or for robots?
00:26:18.320 | - So your hope is from a materialistic perspective that
00:26:21.600 | intelligence, artificial intelligence systems, robots, are just fancier bridges.
00:26:28.080 | Belief space. What's the difference between belief space and state space? So you mentioned MDPs,
00:26:34.240 | POMDPs, reasoning about, you sense the world, there's a state. What's this belief space idea?
00:26:42.720 | - Yeah, that sounds so good.
00:26:44.240 | - That sounds good. So belief space, that is, instead of thinking about what's the state of
00:26:50.640 | the world and trying to control that as a robot, I think about what is the space of beliefs that I
00:26:58.800 | could have about the world? What's, if I think of a belief as a probability distribution of ways the
00:27:03.680 | world could be, a belief state as a distribution, and then my control problem, if I'm reasoning
00:27:10.080 | about how to move through a world I'm uncertain about, my control problem is actually the problem
00:27:16.160 | of controlling my beliefs. So I think about taking actions, not just what effect they'll
00:27:21.040 | have on the world outside, but what effect they'll have on my own understanding of the world outside.
00:27:25.200 | And so that might compel me to ask a question or look somewhere to gather information,
00:27:31.920 | which may not really change the world state, but it changes my own belief about the world.
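As a loose sketch of the belief-space idea (not from the conversation itself): for a small discrete state space, a belief is just a probability distribution over states, updated with a prediction step for the action taken and a correction step for the observation received. The door example and all the numbers below are hypothetical.

```python
# Discrete Bayes-filter sketch: a belief is a distribution over states,
# updated after each action (prediction) and observation (correction).
# The models and numbers here are invented for illustration.

def normalize(b):
    total = sum(b.values())
    return {s: p / total for s, p in b.items()}

def belief_update(belief, action, observation, T, O):
    # Prediction: push the belief through the transition model T[s][a][s'].
    predicted = {s2: sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
                 for s2 in belief}
    # Correction: weight by the observation likelihood O[s'][o] and renormalize.
    corrected = {s2: predicted[s2] * O[s2].get(observation, 0.0) for s2 in predicted}
    return normalize(corrected)

# Hypothetical two-state world: is the door open or closed?
T = {"open":   {"push": {"open": 1.0},                "wait": {"open": 1.0}},
     "closed": {"push": {"open": 0.8, "closed": 0.2}, "wait": {"closed": 1.0}}}
O = {"open":   {"see_open": 0.6, "see_closed": 0.4},
     "closed": {"see_open": 0.2, "see_closed": 0.8}}

belief = {"open": 0.5, "closed": 0.5}
belief = belief_update(belief, "push", "see_open", T, O)
print(belief)  # mass shifts strongly toward "open"
```

A planner working in belief space chooses actions by reasoning about how updates like this will change the distribution, which is what makes deliberate information gathering possible.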
00:27:36.480 | - That's a powerful way to empower the agent to reason about the world, to explore the world.
00:27:44.480 | What kind of problems does it allow you to solve to consider belief space versus just state space?
00:27:51.680 | - Well, any problem that requires deliberate information gathering, right? So if,
00:27:56.800 | in some problems, like chess, there's no uncertainty, or maybe there's uncertainty
00:28:03.440 | about the opponent, there's no uncertainty about the state. And some problems there's
00:28:09.760 | uncertainty, but you gather information as you go, right? You might say, "Oh, I'm driving my
00:28:15.200 | autonomous car down the road, and it doesn't know perfectly where it is, but the lidars are all
00:28:19.120 | going all the time, so I don't have to think about whether to gather information." But if you're a
00:28:24.640 | human driving down the road, you sometimes look over your shoulder to see what's going on behind
00:28:29.680 | you in the lane, and you have to decide whether you should do that now. And you have to trade off
00:28:37.280 | the fact that you're not seeing in front of you, and you're looking behind you, and how valuable
00:28:41.200 | is that information, and so on. And so to make choices about information gathering,
00:28:46.000 | you have to reason in belief space. Also, I mean, also to just take into account your own
00:28:55.440 | uncertainty before trying to do things. So you might say, "If I understand where I'm standing
00:29:02.640 | relative to the door jam pretty accurately, then it's okay for me to go through the door. But if
00:29:08.000 | I'm really not sure where the door is, then it might be better to not do that right now."
00:29:12.880 | The degree of your uncertainty about the world is actually part of the thing you're trying to
00:29:17.760 | optimize in forming the plan, right? That's right.
00:29:20.960 | So this idea of a long horizon of planning for a PhD, or just even how to get out of the house,
00:29:27.040 | or how to make breakfast. You show this presentation of the WTF, where's the fork,
00:29:32.720 | of robot looking at a sink. And can you describe how we plan in this world,
00:29:40.640 | of this idea of hierarchical planning we've mentioned? So yeah, how can a robot hope to
00:29:47.360 | plan about something with such a long horizon, where the goal is quite far away?
00:29:53.840 | People, since probably reasoning began, have thought about hierarchical reasoning.
00:29:59.760 | The temporal hierarchy in particular. Well, there's spatial hierarchy, but let's talk
00:30:03.040 | about temporal hierarchy. So you might say, "Oh, I have this long execution I have to do,
00:30:08.960 | but I can divide it into some segments abstractly." So maybe you have to get out of the house,
00:30:15.360 | I have to get in the car, I have to drive, and so on. And so you can plan. If you can build
00:30:22.720 | abstractions, so this, we started out by talking about abstractions, and we're back to that now.
00:30:26.960 | If you can build abstractions in your state space, and abstractions, sort of temporal abstractions,
00:30:34.560 | then you can make plans at a high level. And you can say, "I'm going to go to town, and then I'll
00:30:40.160 | have to get gas, and then I can go here, and I can do this other thing." And you can reason about
00:30:43.840 | the dependencies and constraints among these actions, again, without thinking about the complete
00:30:50.000 | details. What we do in our hierarchical planning work is then say, "All right, I make a plan at a
00:30:57.280 | high level of abstraction. I have to have some reason to think that it's feasible without
00:31:03.920 | working it out in complete detail." And that's actually the interesting step. I always like to
00:31:08.800 | talk about walking through an airport. Like, you can plan to go to New York and arrive at the
00:31:14.160 | airport, and then find yourself an office building later. You can't even tell me in advance what your
00:31:20.000 | plan is for walking through the airport. Partly because you're too lazy to think about it, maybe,
00:31:24.960 | but partly also because you just don't have the information. You don't know what gate you're
00:31:28.400 | landing in, or what people are going to be in front of you, or anything. So there's no point
00:31:34.320 | in planning in detail. But you have to have -- you have to make a leap of faith that you can figure
00:31:41.040 | it out once you get there. And it's really interesting to me how you arrive at that.
00:31:47.680 | How do you -- so you have learned over your lifetime to be able to make some kinds of
00:31:53.040 | predictions about how hard it is to achieve some kinds of sub-goals. And that's critical. Like,
00:31:58.800 | you would never plan to fly somewhere if you couldn't -- didn't have a model of how hard it
00:32:03.360 | was to do some of the intermediate steps. So one of the things we're thinking about now is,
00:32:06.800 | how do you do this kind of very aggressive generalization to situations that you haven't
00:32:14.080 | been in and so on, to predict how long it will take to walk through the Kuala Lumpur airport?
00:32:18.640 | Like, you could give me an estimate and it wouldn't be crazy. And you have to have an estimate of that
00:32:24.560 | in order to make plans that involve walking through the Kuala Lumpur airport, even if you
00:32:28.880 | don't need to know it in detail. So I'm really interested in these kinds of abstract models and
00:32:34.160 | how do we acquire them. But once we have them, we can use them to do hierarchical reasoning,
00:32:38.960 | which I think is very important. Yeah, there's this notion of goal regression and
00:32:44.720 | pre-image backchaining, this idea of starting at the goal and just forming these big clouds
00:32:50.960 | of states. I mean, it's almost like saying, you know, once you show up
00:32:59.200 | to the airport, you're like a few steps away from the goal. So like, thinking of it this way,
00:33:07.040 | it's kind of interesting. I don't know if you have sort of further comments on that,
00:33:12.320 | of starting at the goal. Yeah, I mean, it's interesting that Simon, Herb Simon, back in the
00:33:20.000 | early days of AI, talked a lot about means-ends reasoning and reasoning back from the goal.
00:33:25.120 | There's a kind of an intuition that people have that the number of, that state space is big,
00:33:32.960 | the number of actions you could take is really big. So if you say, here I sit and I want to
00:33:37.600 | search forward from where I am, what are all the things I could do? That's just overwhelming.
00:33:41.520 | If you say, if you can reason at this other level and say, here's what I'm hoping to achieve,
00:33:46.480 | what could I do to make that true? That somehow the branching is smaller. Now,
00:33:51.600 | what's interesting is that like in the AI planning community, that hasn't worked out. In the class of
00:33:56.960 | problems that they look at and the methods that they tend to use, it hasn't turned out that it's
00:34:00.720 | better to go backward. It's still kind of my intuition that it is, but I can't prove that
00:34:07.120 | to you right now. Right. I share your intuition, at least for us mere humans. Speaking of which,
00:34:15.920 | when you maybe now we take it and take a little step into that philosophy circle,
00:34:21.200 | how hard would it, when you think about human life, you give those examples often,
00:34:27.680 | how hard do you think it is to formulate human life as a planning problem or aspects of
00:34:32.400 | human life? So when you look at robots, you're often trying to think about object manipulation,
00:34:38.640 | tasks, about moving a thing. When you take a slight step outside the room, let the robot
00:34:46.240 | leave and go get lunch, or maybe try to pursue more fuzzy goals. How hard do you think is that
00:34:54.480 | problem? If you were to try to maybe put another way, try to formulate human life as a planning
00:35:00.800 | problem. Well, that would be a mistake. I mean, it's not all a planning problem, right? I think
00:35:05.760 | it's really, really important that we understand that you have to put together pieces and parts
00:35:11.920 | that have different styles of reasoning and representation and learning. I think it seems
00:35:18.080 | probably clear to anybody that it can't all be this or all be that. Brains aren't all like this
00:35:25.680 | or all like that, right? They have different pieces and parts and substructure and so on.
00:35:30.160 | So I don't think that there's any good reason to think that there's going to be like one
00:35:33.920 | true algorithmic thing that's going to do the whole job.
00:35:38.080 | Just a bunch of pieces together designed to solve a bunch of specific problems.
00:35:43.040 | Or maybe styles of problems. I mean, there's probably some reasoning that needs to go on
00:35:50.320 | in image space. I think, again, there's this model-based versus model-free idea, right? So
00:35:58.560 | in reinforcement learning, people talk about, "Oh, should I learn? I could learn a policy,
00:36:03.600 | just straight up a way of behaving. I could, it's popular, learn a value function,
00:36:09.440 | that's some kind of weird intermediate ground. Or I could learn a transition model, which tells me
00:36:15.840 | something about the dynamics of the world." Imagine that I learn a transition
00:36:20.800 | model and I couple it with a planner and I draw a box around that, I have a policy again. It's just
00:36:26.800 | stored a different way. But it's just as much of a policy as the other policy. It's just I've made,
00:36:34.080 | I think, the way I see it is it's a time-space trade-off in computation. Right? A more overt
00:36:41.440 | policy representation. Maybe it takes more space, but maybe I can compute quickly what action I
00:36:47.520 | should take. On the other hand, maybe a very compact model of the world dynamics plus a
00:36:52.480 | planner lets me compute what action to take too, just more slowly. I mean,
00:36:57.840 | I don't think there's an argument to be had. It's just like a question of what form of computation
00:37:03.840 | is best for us. - For the various sub-problems. - Right. So, and so like learning to do algebra
00:37:11.840 | manipulations for some reason is, I mean, that's probably gonna want naturally a sort of a
00:37:17.040 | different representation than riding a unicycle. The time constraints on the unicycle are
00:37:22.080 | serious. The space is maybe smaller. I don't know. But so I. - And there could be the more human
00:37:28.080 | sides of falling in love, having a relationship that might be another. - Yeah, I have no idea.
00:37:35.600 | - How to model that. Yeah. Let's first solve the algebra and the object manipulation.
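A small Python sketch of the time-space trade-off described a few exchanges above: an explicit policy is a lookup table that answers instantly, while a compact model plus a planner computes the same state-to-action mapping on demand. The toy states, dynamics, and rewards are invented.

```python
# Two representations of "what to do": an explicit policy table (more space,
# instant lookup) versus a tiny world model plus a planner that searches at
# decision time (less space, more computation). Everything here is invented.

from typing import Dict

# Overt policy: stored explicitly.
policy_table: Dict[str, str] = {"hungry": "cook", "fed": "work"}

def policy_from_table(state: str) -> str:
    return policy_table[state]

# Implicit policy: model[state][action] = (next_state, reward).
model = {
    "hungry": {"cook": ("fed", +1.0), "work": ("hungry", -1.0)},
    "fed":    {"cook": ("fed", -0.5), "work": ("fed", +2.0)},
}

def policy_from_model(state: str, horizon: int = 3) -> str:
    # Finite-horizon lookahead using the model; recomputed at every query.
    def value(s: str, h: int) -> float:
        if h == 0:
            return 0.0
        return max(r + value(s2, h - 1) for s2, r in model[s].values())
    return max(model[state],
               key=lambda a: model[state][a][1] + value(model[state][a][0], horizon - 1))

# Drawing a box around "model + planner" yields a policy, just stored differently.
assert policy_from_table("hungry") == policy_from_model("hungry") == "cook"
```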
00:37:42.560 | What do you think is harder, perception or planning? - Perception. That's why. - Understanding.
00:37:49.040 | That's why. So what do you think is so hard about perception,
00:37:53.760 | about understanding the world around you? - Well, I mean, I think the big question
00:37:57.520 | is representational. Hugely the question is representation. So perception has made
00:38:08.400 | great strides lately, right? And we can classify images and we can
00:38:12.800 | play certain kinds of games and predict how to steer the car and all this sort of stuff.
00:38:17.760 | I don't think we have a very good idea of what perception should deliver, right? So if you,
00:38:28.160 | if you believe in modularity, okay, there's a very strong view which says
00:38:34.560 | we shouldn't build in any modularity. We should make a giant, gigantic neural network,
00:38:40.400 | train it end to end to do the thing. And that's the best way forward. And it's hard to argue
00:38:47.600 | with that except on a sample complexity basis, right? So you might say, oh, well, if I want to
00:38:52.960 | do end to end reinforcement learning on this giant, giant neural network, it's going to take
00:38:56.400 | a lot of data and a lot of like broken robots and stuff. So then the only answer is to say, okay,
00:39:07.280 | we have to build something in, build in some structure or some bias. We know from theory of
00:39:12.960 | machine learning, the only way to cut down the sample complexity is to kind of cut down,
00:39:16.960 | somehow cut down the hypothesis space. You can do that by building in bias. There's all kinds
00:39:22.960 | of reasons to think that nature built bias into humans. Convolution is a bias. It's a very strong
00:39:31.760 | bias and it's a very critical bias. So my own view is that we should look for more things that are
00:39:38.880 | like convolution, but that address other aspects of reasoning, right? So convolution helps us a
00:39:43.840 | lot with a certain kind of spatial reasoning. That's quite close to the imaging. I think
00:39:50.640 | there's other ideas like that. Maybe some amount of forward search, maybe some notions of abstraction,
00:39:58.080 | maybe the notion that objects exist. Actually, I think that's pretty important. And a lot of people
00:40:03.200 | won't give you that to start with. Right? - So almost like a convolution in the,
00:40:07.600 | in the object, semantic object space or some kind of, some kind of ideas in there.
00:40:14.560 | - That's right. And people are starting, like the graph, graph convolutions are an idea that
00:40:18.320 | is related to relational representations. And so, I've come far
00:40:26.720 | afield from perception, but I think the thing that's going to make perception take
00:40:32.160 | the next step is actually understanding better what it should produce. Right? So what are we
00:40:37.600 | going to do with the output of it? Right? It's fine when what we're going to do with the output
00:40:41.280 | is steer. It's less clear when we're just trying to make one integrated, intelligent agent. What
00:40:49.040 | should the output of perception be? We have no idea. And how should that hook up to the other
00:40:53.520 | stuff? We don't know. So I think the pressing question is what kinds of structure can we build
00:41:00.480 | in that are like the moral equivalent of convolution that will make a really awesome
00:41:05.520 | superstructure that then learning can kind of progress on efficiently.
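A back-of-the-envelope illustration, in Python, of why convolution is such a strong bias: sharing a small kernel across image locations gives orders of magnitude fewer parameters than a dense layer on the same input, which shrinks the hypothesis space and the sample complexity. The image size and channel counts below are arbitrary.

```python
# Parameter counts for a convolutional layer versus a fully connected layer
# producing the same number of output units (sizes chosen arbitrarily).

H, W, C_in, C_out = 64, 64, 3, 16   # hypothetical image height, width, channels
k = 3                               # hypothetical kernel size

conv_params = C_out * (C_in * k * k + 1)                           # shared weights plus biases
dense_params = (H * W * C_out) * (H * W * C_in) + (H * W * C_out)  # one weight per connection

print(conv_params)    # 448
print(dense_params)   # about 805 million
```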
00:41:10.000 | - I agree. Very compelling description of actually where we stand with the perception problem.
00:41:14.080 | You're teaching a course on embodying intelligence. What do you think it takes to
00:41:19.120 | build a robot with human-level intelligence? - I don't know. If we knew, we would do it.
00:41:24.800 | - If you were to, I mean, okay. So do you think a robot needs to have a self-awareness,
00:41:35.200 | a consciousness, fear of mortality, or is it simpler than that? Or is consciousness a simple
00:41:44.160 | thing? Do you think about these notions? - I don't think much about consciousness.
00:41:49.200 | Even most philosophers who care about it will give you that you could have robots that are zombies,
00:41:55.760 | right? That behave like humans, but are not conscious. And I, at this moment,
00:41:59.680 | would be happy enough with that. So I'm not really worried one way or the other.
00:42:02.480 | - So on the technical side, you're not thinking of the use of self-awareness?
00:42:06.800 | - Well, but I, okay. But then what does self-awareness mean? I mean,
00:42:11.280 | that you need to have some part of the system that can observe other parts of the system and
00:42:18.160 | tell whether they're working well or not. That seems critical. So does that count as, I mean,
00:42:23.920 | does that count as self-awareness or not? Well, it depends on whether you think that there's
00:42:29.200 | somebody at home who can articulate whether they're self-aware. But clearly, if I have
00:42:33.680 | some piece of code that's counting how many times this procedure gets executed,
00:42:38.240 | that's a kind of self-awareness, right? So there's a big spectrum. It's clear you have to have some
00:42:44.080 | of it. - Right. We're quite far away,
00:42:46.320 | in many dimensions, but is there a direction of research that's most compelling to you for
00:42:51.600 | trying to achieve human-level intelligence in our robots?
00:42:55.760 | - Well, to me, I guess the thing that seems most compelling to me at the moment is this
00:43:00.960 | question of what to build in and what to learn. I think we're missing a bunch of ideas and
00:43:11.680 | we, you know, people, you know, don't you dare ask me how many years it's going to be until that
00:43:17.760 | happens because I won't even participate in the conversation because I think we're missing ideas
00:43:23.120 | and I don't know how long it's going to take to find them. - So I won't ask you how many years,
00:43:27.120 | but maybe I'll ask you when you'll be sufficiently impressed that we've achieved it. So what's a good
00:43:36.960 | test of intelligence? Do you like the Turing test, the natural language, and the robotic space? Is
00:43:42.640 | there something where you would sit back and think, "Oh, that's pretty impressive as a test,
00:43:49.760 | as a benchmark." Do you think about these kinds of problems? - No, I resist. I mean, I think all
00:43:55.040 | the time that we spend arguing about those kinds of things could be better spent just making the
00:43:59.840 | robots work better. - You don't value competition. So, I mean, there's a nature of benchmarks and
00:44:08.320 | data sets or Turing test challenges where everybody kind of gets together and tries to build a better
00:44:13.840 | robot because they want to out-compete each other. Like the DARPA challenge with the autonomous
00:44:18.080 | vehicles. Do you see the value of that or it can get in the way? - I think it can get in the way.
00:44:25.760 | Many people find it motivating and so that's good. I find it anti-motivating personally.
00:44:31.120 | But I think you get an interesting cycle where for a contest, a bunch of smart people get super
00:44:40.000 | motivated and they hack their brains out. And much of what gets done is just hacks, but sometimes
00:44:45.200 | really cool ideas emerge. And then that gives us something to chew on after that. So, it's not a
00:44:52.080 | thing for me, but I don't regret that other people do it. - Yeah, it's like you said, with everything
00:44:58.160 | else, the mix is good. So, jumping topics a little bit, you started the Journal of Machine Learning
00:45:03.440 | Research and served as its editor-in-chief. How did the publication come about? And what do you
00:45:12.560 | think about the current publishing model space in machine learning, artificial intelligence? -
00:45:18.400 | Okay, good. So, it came about because there was a journal called Machine Learning, which still
00:45:23.680 | exists, which was owned by Kluwer. And I was on the editorial board and we used to have these
00:45:30.800 | meetings annually where we would complain to Kluwer that it was too expensive for the libraries and
00:45:35.360 | that people couldn't publish. And we would really like to have some kind of relief on those fronts
00:45:39.840 | and they would always sympathize, but not do anything. So, we just decided to make a new
00:45:46.960 | journal. And there was the Journal of AI Research, which was on the same model, which had been in
00:45:53.280 | existence for maybe five years or so, and it was going on pretty well. So, we just made a new
00:46:00.480 | journal. I mean, I don't know, I guess it was work, but it wasn't that hard. So, basically,
00:46:06.320 | the editorial board, probably 75% of the editorial board of Machine Learning resigned and
00:46:14.720 | we founded this new journal. - But it was sort of, it was more open. - Yeah, right. So, it's
00:46:22.320 | completely open. It's open access. Actually, I had a postdoc, George Konidaris, who wanted to
00:46:29.520 | call these journals free for all. Because there were, I mean, it both has no page charges and has
00:46:37.440 | no access restrictions. And the reason, and so lots of people, I mean, there were people who
00:46:46.960 | were mad about the existence of this journal who thought it was a fraud or something. It would be
00:46:51.280 | impossible, they said, to run a journal like this with basically, I mean, for a long time, I didn't
00:46:56.240 | even have a bank account. I paid for the lawyer to incorporate and the IP address, and it used
00:47:05.200 | to cost a couple hundred dollars a year to run. It's a little bit more now, but not that much
00:47:09.760 | more. But that's because I think computer scientists are competent and autonomous in a way
00:47:17.440 | that many scientists in other fields aren't, I mean, at doing these kinds of things. We all
00:47:22.320 | typeset our own papers. We all have students and people who can hack a website together in
00:47:27.040 | the afternoon. So, the infrastructure for us was like, not a problem. But for other people in other
00:47:32.640 | fields, it's a harder thing to do. - Yeah, and this kind of open access journal is nevertheless
00:47:38.960 | one of the most prestigious journals. So prestige can be achieved without
00:47:46.240 | any of the- - Paper is not required for prestige,
00:47:49.120 | it turns out. - So, on the review process side,
00:47:52.320 | actually a long time ago, I don't remember when, but I reviewed a paper where you were also a
00:47:58.080 | reviewer, and I remember reading your review and being influenced by it. It was really well written. It
00:48:03.120 | influenced how I write future reviews. You disagreed with me actually, and it made
00:48:09.680 | my review much better. But nevertheless, the review process has its flaws.
00:48:19.280 | And how do you think, what do you think works well? How can it be improved?
00:48:23.200 | - So, actually when I started JMLR, I wanted to do something completely different.
00:48:27.600 | And I didn't because it felt like we needed a traditional journal of record. And so, we just
00:48:34.800 | made JMLR be almost like a normal journal, except for the open access parts of it, basically.
00:48:40.720 | Increasingly, of course, publication is not even a sensible word. You can publish something by
00:48:47.600 | putting it in archive so I can publish everything tomorrow. So, making stuff public is, there's no
00:48:55.360 | barrier. We still need curation and evaluation. I don't have time to read all of archive.
00:49:06.880 | And you could argue that kind of social thumbs upping of articles suffices, right? You might say,
00:49:21.280 | "Oh, heck with this. We don't need journals at all. We'll put everything on archive and people
00:49:25.920 | will upvote and downvote the articles and then your CV will say, "Oh man, he got a lot of upvotes."
00:49:30.880 | So, that's good. But I think there's still value in careful reading and commentary of things. And
00:49:45.040 | it's hard to tell when people are upvoting and downvoting or arguing about your paper on Twitter
00:49:49.440 | and Reddit, whether they know what they're talking about, right? So, then I have the
00:49:55.440 | second order problem of trying to decide whose opinions I should value and such. So, I don't know.
00:50:01.520 | If I had infinite time, which I don't, and I'm not going to do this because I really want to make
00:50:06.560 | robots work, but if I felt inclined to do something more in the publication direction,
00:50:11.920 | I would do this other thing, which I thought about doing the first time, which is to get together
00:50:16.800 | some set of people whose opinions I value and who are pretty articulate. And I guess we would be
00:50:22.880 | public, although we could be private, I'm not sure. And we would review papers. We wouldn't
00:50:27.520 | publish them and you wouldn't submit them. We would just find papers and we would write
00:50:31.040 | reviews and we would make those reviews public. And maybe if you, you know, so we're Leslie's
00:50:38.720 | friends who review papers and maybe eventually if we, our opinion was sufficiently valued,
00:50:44.320 | like the opinion of JMLR is valued, then you'd say on your CV that Leslie's friends gave my paper a
00:50:49.920 | five-star reading and that would be just as good as saying I got it accepted into this journal.
00:50:55.440 | So, I think we should have good public commentary and organize it in some way,
00:51:03.760 | but I don't really know how to do it. It's interesting times.
00:51:06.080 | - The way you describe it actually is really interesting. I mean, we do it for movies,
00:51:10.000 | IMDb.com. There are experts, critics who come in, they write reviews, but there's also
00:51:16.000 | regular non-critics. Humans write reviews and they're separated.
00:51:19.760 | - I like open review. The ICLR process I think is interesting.
00:51:29.120 | - It's a step in the right direction, but it's still not as compelling as
00:51:32.960 | reviewing movies or video games. I mean, it sometimes almost, it might be silly,
00:51:40.240 | at least from my perspective to say, but it boils down to the user interface,
00:51:43.760 | how fun and easy it is to actually perform the reviews, how efficient, how much you as a reviewer
00:51:50.160 | get street cred for being a good reviewer. Those human elements come into play.
00:51:56.640 | - No, it's a big investment to do a good review of a paper and the flood of papers is out of control.
00:52:04.000 | Right, so, you know, there aren't 3,000 new, I don't know how many new movies there are in a year.
00:52:08.480 | I don't know, but there's probably gonna be less than how many machine learning papers
00:52:11.920 | are in a year now. And I'm worried, you know, I, right, so I'm like an old person, so of course,
00:52:21.760 | I'm gonna say, "Rawr, rawr, rawr, things are moving too fast. I'm a stick in the mud."
00:52:26.320 | So I can say that, but my particular flavor of that is, I think the horizon for researchers
00:52:34.560 | has gotten very short. Students want to publish a lot of papers, and there's value in that,
00:52:41.520 | it's exciting, and you get patted on the head for it
00:52:46.480 | and so on. And some of that is fine, but I'm worried that we're driving out people who
00:52:57.760 | would spend two years thinking about something. Back in my day, when we worked on our theses,
00:53:05.280 | we did not publish papers. You did your thesis for years. You picked a hard problem and then
00:53:10.400 | you worked and chewed on it and did stuff and wasted time, for a long time. And
00:53:16.000 | roughly when it was done, you would write papers. And so I don't know how to, and I don't think
00:53:22.640 | that everybody has to work in that mode, but I think there's some problems that are hard enough
00:53:26.800 | that it's important to have a longer research horizon and I'm worried that
00:53:31.680 | we don't incentivize that at all at this point. - In this current structure.
00:53:36.800 | - Right. - Yeah. So, continuing on this theme,
00:53:40.560 | what are your hopes and fears about the future of AI? So AI has gone
00:53:47.280 | through a few winters, ups and downs. Do you see another winter of AI coming? Are you more hopeful
00:53:55.760 | about making robots work, as you said? - I think the cycles are inevitable,
00:54:02.880 | but I think each time we get higher, right? I mean, so, you know, it's like climbing some kind
00:54:09.680 | of landscape with a noisy optimizer. So it's clear that the, you know, the deep learning stuff has
00:54:19.520 | made deep and important improvements. And so the high watermark is now higher. There's no question,
00:54:26.960 | but of course, I think people are overselling and eventually investors, I guess, and other people
00:54:35.680 | look around and say, well, you're not quite delivering on this grand claim and that wild
00:54:41.680 | hypothesis. It's like, probably it's going to crash some amount and then it's okay. I mean,
00:54:48.800 | but I can't imagine that there's some awesome monotonic improvement from here to
00:54:55.200 | human-level AI. - So, you know, I have to ask this question, and I can probably anticipate
00:55:02.320 | the answers, but do you have a worry, short-term or long-term, about the existential threats of AI,
00:55:10.240 | and maybe, in the short term, less existential, but more about robots taking away jobs?
00:55:18.880 | - Well, actually, let me talk a little bit about utility. Actually, I had an interesting
00:55:27.200 | conversation with some military ethicists who wanted to talk to me about autonomous weapons.
00:55:32.560 | And they were interesting, smart, well-educated guys who didn't know too much about AI or machine
00:55:40.880 | learning. And the first question they asked me was, has your robot ever done something you didn't
00:55:45.280 | expect? And I, like, burst out laughing, because anybody who's ever done anything with a robot,
00:55:50.560 | right, knows that they don't do much. And what I realized was that their model of how we program a
00:55:56.240 | robot was completely wrong. Their model of how we can program a robot was like Lego Mindstorms,
00:56:02.560 | like, oh, go forward a meter, turn left, take a picture, do this, do that. And so if you have
00:56:07.280 | that model of programming, then it's true. It's kind of weird that your robot would do something
00:56:12.560 | that you didn't anticipate. But the fact is, and actually, so now this is my new educational
00:56:17.680 | mission. If I have to talk to non-experts, I try to teach them the idea that we don't operate at that level;
00:56:24.560 | we operate at least one, or maybe many, levels of abstraction above that. And we say, oh,
00:56:29.680 | here's a hypothesis class. Maybe it's a space of plans, or maybe it's a space of classifiers or
00:56:35.200 | whatever, but there's some set of answers and an objective function. And then we work on some
00:56:40.160 | optimization method that tries to optimize a solution in that class. And we don't know what
00:56:46.800 | solution is going to come out. So I think it's important to communicate that. So I mean, of
00:56:52.320 | course, probably people who listen to this, they know that lesson. But I think it's really critical
00:56:56.960 | to communicate that lesson. And then lots of people are now talking about the value alignment
00:57:01.840 | problem. So you want to be sure as robots or software systems get more competent, that their
00:57:09.360 | objectives are aligned with your objectives, or that our objectives are compatible in some way,
00:57:14.480 | or we have a good way of mediating when they have different objectives. And so I think it is
00:57:20.240 | important to start thinking in terms, like, you don't have to be freaked out by the robot apocalypse
00:57:26.720 | to accept that it's important to think about objective functions and value alignment.
00:57:30.480 | >> Yes. >> And, really,
00:57:32.960 | everyone who's done optimization knows that you have to be careful what you wish for, that,
00:57:37.120 | you know, sometimes you get the optimal solution, and you realize, man, that objective was wrong.
00:57:41.920 | So pragmatically, in the shortish term, it seems to me that those are really interesting and
00:57:50.480 | critical questions. And the idea that we're going to go from being people who engineer algorithms
00:57:55.040 | to being people who engineer objective functions, I think that's definitely going to happen. And
00:58:00.400 | that's going to change our thinking and methodology. >> You started at Stanford in
00:58:05.360 | philosophy; maybe that's where you should go back to, philosophy. >> Philosophy, maybe.
00:58:09.600 | >> Designing objective functions. >> Well, I mean, they're mixed together,
00:58:12.880 | because as we also know, as machine learning people, right, when you design, in fact, this is
00:58:18.080 | the lecture I gave in class today, when you design an objective function, you have to wear both hats.
00:58:23.360 | There's the hat that says, what do I want? And there's the hat that says, but I know what my
00:58:28.320 | optimizer can do to some degree. And I have to take that into account. So it's always a tradeoff,
00:58:34.640 | and we have to kind of be mindful of that. The part about taking people's jobs, I understand
00:58:41.520 | that that's important. I don't understand sociology or economics or people very well.
00:58:47.920 | So I don't know how to think about that. >> Yeah, so there might be a
00:58:51.840 | sociological aspect there, an economic aspect, that's very difficult to think about. Okay.
00:58:56.400 | >> I mean, I think other people should be thinking about it, but I'm just, that's not my strength.
00:58:59.840 | >> So what do you think is the most exciting area of research in the short term,
00:59:04.320 | for the community and for yourself? >> Well, so, I mean, there's this story
00:59:08.400 | I've been telling about how to engineer intelligent robots, right? So that's what we want to do. We
00:59:15.920 | all kind of want to do, well, I mean, some set of us want to do this. And the question is, what's
00:59:20.240 | the most effective strategy? And we've tried, and there's a bunch of different things you could do
00:59:24.960 | at the extremes, right? One super extreme is we do introspection and we write a program. Okay,
00:59:30.960 | that has not worked out very well. Another extreme is we take a giant bunch of neural
00:59:35.680 | goo and we try to train it up to do something. I don't think that's going to work either.
00:59:39.360 | So the question is, what's the middle ground? And again, this isn't a theological question
00:59:48.480 | or anything like that. It's just like, how do we, what's the best way to make this work out?
00:59:54.960 | And to me, I think it's clear: it's a combination
01:00:00.240 | of learning and not learning. And what should that combination be? And what's the stuff we
01:00:05.120 | build in? So to me, that's the most compelling question. >> And when you say engineer robots,
01:00:09.680 | you mean engineering systems that work in the real world? Is that the emphasis?
01:00:17.600 | Last question. Which robot, or robots, is your favorite from science fiction?
01:00:23.200 | So you can go with Star Wars and R2-D2, or you can go with more modern, maybe HAL from-
01:00:33.280 | >> No, I don't think I have a favorite robot from science fiction.
01:00:37.040 | >> This is back to, you like to make robots work in the real world here, not in-
01:00:45.360 | >> I mean, I love the process. And I care more about the process.
01:00:49.920 | >> The engineering process. >> Yeah. I mean, I do research because it's fun,
01:00:53.840 | not because I care about what we produce. >> Well, that's a beautiful note, actually,
01:00:59.520 | to end on. Leslie, thank you so much for talking today.
01:01:01.840 | >> Sure, it's been fun.