Leslie Kaelbling: Reinforcement Learning, Planning, and Robotics | Lex Fridman Podcast #15
Chapters
0:00 Intro
0:42 What made you get excited about AI
1:40 Philosophy and computer science
2:20 Majors in AI
3:08 AI philosophers
4:17 The philosophical gap
5:20 The first robot
9:00 Reinventing reinforcement learning
11:42 Roadblocks in symbolic reasoning
14:11 Where symbolic reasoning is useful
14:59 Why abstractions are critical
17:13 Automated construction of abstractions
19:21 Markov decision process
20:51 Partially observable Markov decision process
21:45 Planning under uncertainty
23:22 Optimality
25:20 Science vs Engineering
27:37 Belief vs State Space
32:40 Starting at the goal
34:08 Planning human life
35:37 Model-based vs model-free
37:27 Perception vs planning
40:06 Convolution
41:10 Self-awareness
42:45 Human-level intelligence
43:26 Test of intelligence
44:57 Journal of Machine Learning Research
46:17 Open Access
47:35 Review Process
51:29 Paper Reviews
53:35 Hopes and Fears
54:58 Existential Threats
58:50 Most Exciting Research
00:00:00.000 |
The following is a conversation with Leslie Kaelbling. She's a roboticist and professor at 00:00:05.360 |
MIT. She's recognized for her work in reinforcement learning, planning, robot navigation, and several 00:00:12.080 |
other topics in AI. She won the IJCAI Computers and Thought Award and was the editor-in-chief 00:00:18.560 |
of the prestigious Journal of Machine Learning Research. This conversation is part of the 00:00:24.320 |
Artificial Intelligence Podcast at MIT and beyond. If you enjoy it, subscribe on YouTube, iTunes, 00:00:31.280 |
or simply connect with me on Twitter @lexfridman, spelled F-R-I-D. And now, 00:00:37.760 |
here's my conversation with Leslie Kaelbling. 00:00:41.040 |
- What made me get excited about AI, I can say that, is I read Gödel, Escher, Bach when I was 00:00:47.680 |
in high school. That was pretty formative for me because it exposed the interestingness of 00:00:56.240 |
primitives and combination and how you can make complex things out of simple parts, 00:01:01.520 |
and ideas of AI and what kinds of programs might generate intelligent behavior. 00:01:07.120 |
- So you first fell in love with AI reasoning logic versus robots? 00:01:12.400 |
- Yeah, the robots came because my first job, so I finished an undergraduate degree in philosophy 00:01:18.240 |
at Stanford and was about to finish a master's in computer science, and I got hired at SRI 00:01:24.160 |
in their AI lab. And they were building a robot. It was a kind of a follow-on to Shakey, 00:01:30.960 |
but all the Shakey people were not there anymore. And so my job was to try to get this robot to do 00:01:36.080 |
stuff, and that's really kind of what got me interested in robots. 00:01:39.280 |
- So maybe taking a small step back, your bachelor's in Stanford in philosophy, 00:01:44.400 |
did a master's and PhD in computer science, but the bachelor's in philosophy. So what was that 00:01:49.440 |
journey like? What elements of philosophy do you think you bring to your work in computer science? 00:01:54.880 |
- So it's surprisingly relevant. So part of the reason that I didn't do a computer science 00:02:00.080 |
undergraduate degree was that there wasn't one at Stanford at the time, but that there's part 00:02:04.480 |
of philosophy, and in fact, Stanford has a special sub-major in something called now 00:02:08.320 |
symbolic systems, which is logic, model theory, formal semantics of natural language. And so that's 00:02:15.120 |
actually a perfect preparation for work in AI and computer science. 00:02:19.680 |
- That's kind of interesting. So if you were interested in artificial intelligence, 00:02:23.840 |
what kind of majors were people even thinking about taking? Was it neuroscience? 00:02:30.800 |
So besides philosophy, what were you supposed to do if you were 00:02:35.520 |
fascinated by the idea of creating intelligence? 00:02:37.680 |
- There weren't enough people who did that for that even to be a conversation. I mean, 00:02:42.240 |
I think probably philosophy. I mean, it's interesting in my class, my graduating class of 00:02:50.000 |
undergraduate philosophers, probably maybe slightly less than half went on in computer science, 00:02:57.840 |
slightly less than half went on in law, and like one or two went on in philosophy. 00:03:05.520 |
- Do you think AI researchers have a role to play as part-time philosophers, or should they stick 00:03:10.240 |
to the solid science and engineering without sort of taking the philosophizing tangents? I mean, 00:03:16.240 |
you work with robots, you think about what it takes to create intelligent beings. Aren't you 00:03:21.760 |
the perfect person to think about the big picture philosophy at all? 00:03:25.200 |
- The parts of philosophy that are closest to AI, I think, or at least the closest to AI that I think 00:03:29.520 |
about are stuff like belief and knowledge and denotation and that kind of stuff. And that's, 00:03:36.160 |
you know, it's quite formal and it's like just one step away from 00:03:39.600 |
the kinds of computer science work that we do kind of routinely. 00:03:44.000 |
I think that there are important questions still about what you can do with a machine and what you 00:03:53.200 |
can't and so on. Although at least my personal view is that I'm completely a materialist and 00:03:57.840 |
I don't think that there's any reason why we can't make a robot be behaviorally indistinguishable 00:04:04.160 |
from a human. And the question of whether it's distinguishable internally, whether it's a zombie 00:04:10.640 |
or not in philosophy terms, I actually don't, I don't know, and I don't know if I care too much 00:04:16.480 |
about that. - Right, but there are philosophical notions, they're mathematical and philosophical 00:04:22.080 |
because we don't know so much, of how difficult that is, how difficult is the perception problem, 00:04:27.520 |
how difficult is the planning problem, how difficult is it to operate in this world successfully. 00:04:32.720 |
Because our robots are not currently as successful as human beings in many tasks, 00:04:37.920 |
the question about the gap between current robots and human beings borders a little bit on 00:04:44.560 |
philosophy. You know, the expanse of knowledge that's required to operate in this world, 00:04:51.600 |
and the ability to form common sense knowledge, the ability to reason about uncertainty, 00:04:56.640 |
much of the work you've been doing, there are open questions there that, I don't know, 00:05:03.600 |
require a certain big-picture view. - To me that doesn't seem like a philosophical 00:05:09.440 |
gap at all. To me, there is a big technical gap, there's a huge technical gap, but I don't see any 00:05:16.640 |
reason why it's more than a technical gap. - Perfect. So, when you mentioned AI, you mentioned 00:05:23.600 |
SRI, and maybe, can you describe when you first fell in love with robotics, were inspired by robots? 00:05:31.840 |
You mentioned Flakey or Shakey, and what was the robot that 00:05:40.480 |
first captured your imagination of what's possible? - Right, well, so the first robot I worked with was 00:05:44.800 |
Flakey. Shakey was a robot that the SRI people had built, but by the time I arrived, 00:05:50.880 |
it was sitting in a corner of somebody's office dripping hydraulic fluid into a pan. 00:05:55.040 |
But it's iconic, and really, everybody should read the Shakey tech report, because it has so 00:06:01.760 |
many good ideas in it. I mean, they invented A* search and symbolic planning and learning 00:06:09.200 |
macro operators. They had low-level kind of configuration space planning for their robot, 00:06:16.080 |
they had vision, they had all the basic ideas of a ton of things. - Can you take a 00:06:20.800 |
step back, did Shakey have arms? What was the job, what was the goal? - Shakey was a mobile robot, 00:06:26.560 |
but it could push objects, and so it would move things around. - With which actuator, with arms? 00:06:31.680 |
- With itself, with its base. - Okay, great. - So it could. And they had painted the 00:06:39.040 |
baseboards black, so it used vision to localize itself in a map, it detected objects, it could detect 00:06:47.680 |
objects that were surprising to it, it would plan and replan based on what it saw, it reasoned about 00:06:54.480 |
whether to look and take pictures. I mean, it really had the basics of so many of the things 00:07:01.280 |
that we think about now. - How did it represent the space around it? - So it had representations 00:07:08.000 |
at a bunch of different levels of abstraction, so it had, I think, a kind of an occupancy grid 00:07:12.400 |
of some sort at the lowest level. At the high level, it was abstract, symbolic kind of rooms 00:07:18.800 |
and connectivity. - So where does Flaky come in? - Yeah, okay, so I showed up at SRI, and the, 00:07:25.120 |
we were building a brand new robot. As I said, none of the people from the previous project were 00:07:31.040 |
kind of there or involved anymore, so we were kind of starting from scratch, and my advisor 00:07:37.680 |
was Stan Rosenstein, he ended up being my thesis advisor, and he was motivated by this idea of 00:07:44.080 |
situated computation or situated automata, and the idea was that the tools of logical reasoning were 00:07:52.400 |
important, but possibly only for the engineers or designers to use in the analysis of a system, 00:08:01.200 |
but not necessarily to be manipulated in the head of the system itself, right? So I might use logic 00:08:07.600 |
to prove a theorem about the behavior of my robot, even if the robot's not using logic in its head 00:08:12.640 |
to prove theorems, right? So that was kind of the distinction. And so the idea was to kind of use 00:08:18.560 |
those principles to make a robot do stuff, but a lot of the basic things we had to kind of 00:08:27.040 |
learn for ourselves, 'cause I had zero background in robotics, I didn't know anything about control, 00:08:31.360 |
I didn't know anything about sensors, so we reinvented a lot of wheels on the way to getting 00:08:35.760 |
that robot to do stuff. - Do you think that was an advantage 00:08:38.240 |
or a hindrance? - Oh, no, I mean, I'm big in favor of 00:08:42.480 |
wheel reinvention, actually. I mean, I think you learn a lot by doing it. It's important, though, 00:08:48.240 |
to eventually have the pointers so that you can see what's really going on, but I think you can 00:08:53.440 |
appreciate much better the good solutions once you've messed around a little bit on your own 00:08:59.440 |
and found a bad one. - Yeah, I think you mentioned 00:09:01.440 |
reinventing reinforcement learning and referring to rewards as pleasures, a pleasure, I think, 00:09:09.520 |
which I think is a nice name for it. - Yeah, it seemed good to me. 00:09:12.480 |
- It's more fun, almost. Do you think you could tell the history of AI, machine learning, 00:09:19.040 |
reinforcement learning, how you think about it from the '50s to now? 00:09:23.360 |
- One thing is that it oscillates, right? So things become fashionable and then they go out 00:09:29.360 |
and then something else becomes cool and then it goes out and so on. So there's some interesting 00:09:34.480 |
sociological process that actually drives a lot of what's going on. Early days was kind of 00:09:40.800 |
cybernetics and control, right? And the idea of homeostasis, right? People who made these robots 00:09:47.840 |
that could, I don't know, try to plug into the wall when they needed power and then come loose 00:09:53.840 |
and roll around and do stuff. And then I think over time, the thought, well, that was inspiring, 00:10:00.560 |
but people said, no, no, no, we want to get maybe closer to what feels like real intelligence or 00:10:04.480 |
human intelligence. And then maybe the expert systems people tried to do that, but maybe a 00:10:14.720 |
little too superficially, right? So, oh, we get the surface understanding of what intelligence is 00:10:21.440 |
like because I understand how a steel mill works and I can try to explain it to you and you can 00:10:25.600 |
write it down in logic and then we can make a computer infer that. And then that didn't work 00:10:31.120 |
out. But what's interesting, I think, is when a thing starts to not be working very well, 00:10:37.520 |
it's not only do we change methods, we change problems, right? So it's not like we have better 00:10:44.160 |
ways of doing the problem that the expert systems people were trying to do. We have no ways of 00:10:48.160 |
trying to do that problem. Oh yeah, I know, I think, or maybe a few, but we kind of give up 00:10:55.200 |
on that problem and we switch to a different problem and we work on that for a while and we make 00:11:02.480 |
- As a community, yeah. - And there's a lot of people 00:11:04.160 |
who would argue you don't give up on the problem, it's just you decrease the number of people working 00:11:09.520 |
on it. You almost kind of like put it on the shelf, say, we'll come back to this 20 years later. 00:11:13.760 |
- Yeah, I think that's right. Or you might decide that it's malformed. 00:11:18.080 |
- Like you might say, it's wrong to just try to make something that does superficial symbolic 00:11:25.520 |
reasoning behave like a doctor. You can't do that until you've had the sensory motor experience 00:11:32.640 |
of being a doctor or something, right? So there's arguments that say that that problem was not well 00:11:37.280 |
formed, or it could be that it is well formed, but we just weren't approaching it well. 00:11:42.240 |
- So you mentioned that your favorite part of logic and symbolic systems is that they give 00:11:47.600 |
short names for large sets. So there is some use to this, the use to symbolic reasoning. So 00:11:55.120 |
looking at expert systems and symbolic computing, what do you think are the roadblocks that were 00:12:00.320 |
hit in the '80s and '90s? - Ah, okay. So right, so the fact that I'm 00:12:04.720 |
not a fan of expert systems doesn't mean that I'm not a fan of some kinds of symbolic reasoning. 00:12:13.040 |
Let's see, roadblocks. Well, the main roadblock, I think, was the idea that humans could 00:12:19.600 |
articulate their knowledge effectively into some kind of logical statements. 00:12:26.000 |
- So it's not just the cost, the effort, but really just the capability of doing it. 00:12:31.120 |
- Right, because we're all experts in vision, right? But totally don't have introspective 00:12:36.720 |
access into how we do that, right? And it's true that, I mean, I think the idea was, well, of 00:12:45.120 |
course, even people then would know, of course, I wouldn't ask you to please write down the rules 00:12:48.800 |
that you use for recognizing a water bottle. That's crazy. And everyone understood that. But 00:12:53.440 |
we might ask you to please write down the rules you use for deciding, I don't know, what tie to 00:12:59.920 |
put on or how to set up a microphone or something like that. But even those things, I think people 00:13:07.920 |
maybe, I think what they found, I'm not sure about this, but I think what they found was that the 00:13:12.720 |
so-called experts could give explanations that sort of post hoc explanations for how and why 00:13:19.120 |
they did things, but they weren't necessarily very good. And then they depended on maybe some 00:13:27.680 |
kinds of perceptual things, which again, they couldn't really define very well. So I think 00:13:33.840 |
fundamentally, I think that the underlying problem with that was the assumption that people could 00:13:39.040 |
articulate how and why they make their decisions. - Right, so it's almost encoding the knowledge, 00:13:45.760 |
converting from expert to something that a machine could understand and reason with. 00:13:51.200 |
- No, no, no, not even just encoding, but getting it out of you. 00:13:58.240 |
I mean, yes, hard also to write it down for the computer, but I don't think that people can 00:14:03.680 |
produce it. You can tell me a story about why you do stuff, but I'm not so sure that's the why. 00:14:10.080 |
- Great. So there are still on the hierarchical planning side, 00:14:18.560 |
places where symbolic reasoning is very useful. So as you've talked about, so... 00:14:29.520 |
- Yeah, okay, good. So saying that humans can't provide a description of their reasoning processes, 00:14:36.480 |
that's okay, fine, but that doesn't mean that it's not good to do reasoning of various styles 00:14:42.400 |
inside a computer. Those are just two orthogonal points. So then the question is, what kind of 00:14:49.120 |
reasoning should you do inside a computer? And the answer is, I think you need to do all different 00:14:54.560 |
kinds of reasoning inside a computer, depending on what kinds of problems you face. 00:14:59.040 |
- I guess the question is, what kind of things can you encode symbolically so you can reason about? 00:15:07.840 |
- I think the idea about, and even symbolic, I don't even like that terminology, 00:15:16.080 |
'cause I don't know what it means technically and formally. I do believe in abstractions. 00:15:21.440 |
So abstractions are critical, right? You cannot reason at completely fine grain about everything 00:15:27.840 |
in your life, right? You can't make a plan at the level of images and torques for getting a PhD. 00:15:34.080 |
So you have to reduce the size of the state space and you have to reduce the horizon 00:15:39.600 |
if you're gonna reason about getting a PhD or even buying the ingredients to make dinner. 00:15:44.160 |
And so how can you reduce the spaces and the horizon of the reasoning you have to do? And 00:15:50.960 |
the answer is abstraction, spatial abstraction, temporal abstraction. I think abstraction along 00:15:55.280 |
the lines of goals is also interesting. Like you might, or well, abstraction and decomposition. 00:16:01.600 |
Goals is maybe more of a decomposition thing. So I think that's where these kinds of, if you want 00:16:06.880 |
to call it symbolic or discrete models come in. You talk about a room of your house instead of 00:16:13.280 |
your pose. You talk about doing something during the afternoon instead of at 2:54. 00:16:20.000 |
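As a minimal sketch of the spatial and temporal abstraction described above, the hypothetical snippet below maps a continuous pose and a clock time onto coarse symbols such as a room name and "afternoon". The room boundaries, time bins, and function names are invented for illustration; they are not taken from any of Kaelbling's systems.

```python
# Hypothetical sketch of spatial/temporal abstraction: map fine-grained state
# (continuous pose, clock time) to coarse symbols a planner can reason over.
# Room boundaries and time bins are invented for illustration.

ROOMS = {                     # axis-aligned (x_min, x_max, y_min, y_max) boxes
    "kitchen": (0.0, 4.0, 0.0, 3.0),
    "hallway": (4.0, 6.0, 0.0, 3.0),
    "office":  (6.0, 10.0, 0.0, 3.0),
}

def abstract_pose(x: float, y: float) -> str:
    """Reduce a continuous pose to a room label."""
    for room, (x0, x1, y0, y1) in ROOMS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return room
    return "unknown"

def abstract_time(hour: float) -> str:
    """Reduce clock time to a coarse interval (e.g. 14.9 -> 'afternoon')."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"

# A high-level planner now reasons over ("office", "afternoon") instead of an
# exact pose at 2:54 p.m., shrinking both the state space and the horizon.
print(abstract_pose(7.2, 1.5), abstract_time(14.9))  # office afternoon
```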
And you do that because it makes your reasoning problem easier. And also because 00:16:28.480 |
you don't have enough information to reason in high fidelity about your pose of your elbow at 00:16:35.280 |
2:35 this afternoon anyway. - Right. When you're trying to get a PhD. 00:16:39.360 |
- Right. Or when you're doing anything really. - Yeah, okay. 00:16:41.920 |
- Except for at that moment. At that moment, you do have to reason about the pose of your elbow, 00:16:46.000 |
maybe. But then maybe you do that in some continuous joint space kind of model. So again, 00:16:53.440 |
my biggest point about all of this is that there should be, that dogma is not the thing, right? 00:16:59.280 |
It shouldn't be that I am in favor against symbolic reasoning and you're in favor against 00:17:04.080 |
neural networks. It should be that just computer science tells us what the right answer to all 00:17:10.480 |
these questions is if we were smart enough to figure it out. - Well, yeah. When you try to 00:17:13.840 |
actually solve the problem with computers, the right answer comes out. You mentioned abstractions. 00:17:19.680 |
I mean, neural networks form abstractions, or rather there are automated ways to form abstractions 00:17:27.840 |
and expert human-driven ways to form abstractions. And humans just seem to be 00:17:34.240 |
way better at forming abstractions currently, at least on certain problems. So when you're referring to 00:17:39.840 |
2:45 p.m. versus afternoon, how do we construct that taxonomy? Is there any room for automated 00:17:48.960 |
construction of such abstractions? - Oh, I think eventually, yeah. I mean, 00:17:53.680 |
I think when we get to be better machine learning engineers, we'll build algorithms that 00:18:00.240 |
build awesome abstractions. - That are useful in this kind 00:18:04.400 |
- Yeah. So let's then step away from the abstraction discussion and talk about POMDPs, 00:18:15.840 |
partially observable Markov decision processes. So uncertainty. So first, 00:18:20.160 |
what are Markov decision processes? - What are Markov decision processes? 00:18:23.600 |
- And maybe how much of our world can be modeled as MDPs? How much, when you wake up in the 00:18:30.080 |
morning and you're making breakfast, do you think of yourself as an MDP? So how do you think about 00:18:35.760 |
MDPs and how they relate to our world? - Well, so there's a stance question, right? 00:18:41.280 |
So a stance is a position that I take with respect to a problem. So I, as a researcher or a person 00:18:48.640 |
who designs systems, can decide to make a model of the world around me in some terms. So I take 00:18:56.080 |
this messy world and I say, I'm going to treat it as if it were a problem of this formal kind, 00:19:02.560 |
and then I can apply solution concepts or algorithms or whatever to solve that formal 00:19:06.880 |
thing, right? So of course the world is not anything. It's not an MDP or a POMDP. I don't 00:19:11.440 |
know what it is, but I can model aspects of it in some way or some other way. And when I model some 00:19:16.960 |
aspect of it in a certain way, that gives me some set of algorithms I can use. - You can model the 00:19:21.760 |
world in all kinds of ways. Some are more accepting of uncertainty, more easily modeling uncertainty 00:19:31.520 |
of the world. Some really force the world to be deterministic. And so certainly MDPs model the 00:19:39.440 |
uncertainty of the world. - Yes. Model some uncertainty. They model not present state 00:19:44.880 |
uncertainty, but they model uncertainty in the way the future will unfold. - Right. So what are 00:19:52.320 |
Markov decision processes? - Okay, so Markov decision process is a model. It's a kind of a 00:19:56.160 |
model that you could make that says, I know completely the current state of my system. 00:20:00.800 |
And what it means to be a state is that I have all the information right now that will let me make 00:20:08.000 |
predictions about the future as well as I can. So that remembering anything about my history 00:20:13.040 |
wouldn't make my predictions any better. But then it also says that then I can take some actions 00:20:21.840 |
that might change the state of the world and that I don't have a deterministic model of those 00:20:26.240 |
changes. I have a probabilistic model of how the world might change. It's a useful model for some 00:20:33.440 |
kinds of systems. I think it's a, I mean, it's certainly not a good model for most problems. 00:20:41.520 |
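For readers who want that MDP formulation made concrete, here is a hypothetical two-state toy problem in the sense just described: the current state is fully known, actions change it stochastically, and a few sweeps of value iteration compute state values. All states, actions, probabilities, and rewards are invented for illustration.

```python
# Hypothetical toy MDP: the agent knows the current state exactly, and actions
# change the state stochastically. Solved here with value iteration.
# States, actions, and all numbers are invented.

STATES = ["charged", "low_battery"]
ACTIONS = ["work", "recharge"]

# P[s][a] = list of (next_state, probability); R[s][a] = expected reward
P = {
    "charged":     {"work":     [("charged", 0.8), ("low_battery", 0.2)],
                    "recharge": [("charged", 1.0)]},
    "low_battery": {"work":     [("low_battery", 0.5), ("charged", 0.5)],
                    "recharge": [("charged", 1.0)]},
}
R = {
    "charged":     {"work": 1.0, "recharge": 0.0},
    "low_battery": {"work": 0.2, "recharge": 0.0},
}

def value_iteration(gamma=0.9, sweeps=100):
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        # Bellman backup: best expected reward plus discounted future value.
        V = {s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                    for a in ACTIONS)
             for s in STATES}
    return V

print(value_iteration())
```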
I think because for most problems, you don't actually know the state. For most problems, 00:20:46.960 |
it's partially observed. So that's now a different problem class. - So, okay, that's where the 00:20:53.280 |
POMDPs, the partially observable Markov decision processes step in. So how do they address the 00:21:00.080 |
fact that you can't observe most, you have incomplete information about most of the world 00:21:05.360 |
around you? - Right. So now the idea is we still kind of postulate that there exists a state. We 00:21:10.880 |
think that there is some information about the world out there such that if we knew that we 00:21:15.840 |
could make good predictions, but we don't know the state. And so then we have to think about how, 00:21:21.600 |
but we do get observations. Maybe I get images or I hear things or I feel things, and those might be 00:21:27.760 |
local or noisy. And so therefore they don't tell me everything about what's going on. And then I 00:21:31.760 |
have to reason about, given the history of actions I've taken and observations I've gotten, 00:21:37.680 |
what do I think is going on in the world? And then given my own kind of uncertainty about what's 00:21:42.640 |
going on in the world, I can decide what actions to take. - And how difficult is this problem of 00:21:48.160 |
planning under uncertainty in your view, in your long experience of modeling the world, 00:21:54.000 |
trying to deal with this uncertainty, especially in real-world systems? - Optimal planning for even 00:22:02.720 |
discrete POMDPs can be undecidable depending on how you set it up. And so lots of people say, 00:22:10.720 |
I don't use POMDPs because they are intractable. And I think that that's a kind of a very funny 00:22:16.480 |
thing to say, because the problem you have to solve is the problem you have to solve. 00:22:22.080 |
So if the problem you have to solve is intractable, that's what makes us AI people, 00:22:25.840 |
right? So we solve, we understand that the problem we're solving is wildly intractable, 00:22:31.680 |
that we can't, we will never be able to solve it optimally. At least I don't. Yeah, right. So 00:22:37.760 |
later we can come back to an idea about bounded optimality and something. But anyway, 00:22:42.720 |
we can't come up with optimal solutions to these problems. So we have to make approximations, 00:22:47.680 |
approximations in modeling, approximations in solution algorithms and so on. And so 00:22:52.160 |
I don't have a problem with saying, yeah, my problem actually, it is POMDP in continuous 00:22:58.800 |
space with continuous observations and it's so computationally complex, I can't even think about 00:23:04.000 |
its, you know, big O, whatever. But that doesn't prevent me from, it helps me, 00:23:10.320 |
gives me some clarity to think about it that way. And to then take steps to make approximation after 00:23:17.600 |
approximation to get down to something that's like computable in some reasonable time. 00:23:22.160 |
When you think about optimality, you know, the community broadly has shifted on that, 00:23:27.600 |
I think a little bit in how much they value the idea of optimality, of chasing an optimal solution. 00:23:35.680 |
How has your views of chasing an optimal solution changed over the years when you work with robots? 00:23:42.320 |
That's interesting. I think we have a little bit of a methodological crisis, actually, 00:23:49.360 |
from the theoretical side. I mean, I do think that theory is important and that right now we're not 00:23:54.080 |
doing much of it. So there's lots of empirical hacking around and training this and doing that 00:24:00.640 |
and reporting numbers, but is it good? Is it bad? We don't know. It's very hard to say things. 00:24:05.440 |
And if you look at like computer science theory, so people talked for a while, everyone was about 00:24:16.800 |
solving problems optimally or completely. And then there were interesting relaxations, right? So 00:24:22.400 |
people look at, oh, can I, are there regret bounds or can I do some kind of, you know, 00:24:29.360 |
approximation? Can I prove something that I can approximately solve this problem or that I get 00:24:33.760 |
closer to the solution as I spend more time and so on? What's interesting, I think, is that we don't 00:24:39.920 |
have good approximate solution concepts for very difficult problems, right? I like to, you know, 00:24:49.200 |
I like to say that I'm interested in doing a very bad job of very big problems. 00:24:56.000 |
- Right. So very bad job of very big problems. I like to do that. But I wish I could say 00:25:02.960 |
something. I wish I had a, I don't know, some kind of a formal solution concept that I could use to 00:25:12.000 |
say, oh, this algorithm actually, it gives me something. Like I know what I'm going to get. 00:25:17.520 |
I can do something other than just run it and get out 6.7. 00:25:20.640 |
- That notion is still somewhere deeply compelling to you. The notion that you can say, 00:25:26.240 |
you can drop a thing on the table that says, you can expect this algorithm will give you some good... 00:25:33.440 |
- I hope there's, I hope science will, I mean, there's engineering and there's science. I think 00:25:39.280 |
that they're not exactly the same. And I think right now we're making huge engineering, like, 00:25:45.920 |
leaps and bounds. So the engineering is running away ahead of the science, which is cool and 00:25:50.880 |
often how it goes, right? So we're making things and nobody knows how and why they work, roughly. 00:26:00.160 |
- There's some form, it's, yeah, there's some room for formalizing. 00:26:04.720 |
- We need to know what the principles are. Why does this work? Why does that not work? I mean, 00:26:08.400 |
for a while people built bridges by trying, but now we can often predict whether it's going to 00:26:13.440 |
work or not without building it. Can we do that for learning systems or for robots? 00:26:18.320 |
- So your hope is from a materialistic perspective that 00:26:21.600 |
intelligence, artificial intelligence systems, robots, are just fancier bridges. 00:26:28.080 |
Belief space. What's the difference between belief space and state space? So you mentioned MDPs, 00:26:34.240 |
POMDPs, reasoning about, you sense the world, there's a state. What's this belief space idea? 00:26:44.240 |
- That sounds good. So belief space, that is, instead of thinking about what's the state of 00:26:50.640 |
the world and trying to control that as a robot, I think about what is the space of beliefs that I 00:26:58.800 |
could have about the world? What's, if I think of a belief as a probability distribution of ways the 00:27:03.680 |
world could be, a belief state as a distribution, and then my control problem, if I'm reasoning 00:27:10.080 |
about how to move through a world I'm uncertain about, my control problem is actually the problem 00:27:16.160 |
of controlling my beliefs. So I think about taking actions, not just what effect they'll 00:27:21.040 |
have on the world outside, but what effect they'll have on my own understanding of the world outside. 00:27:25.200 |
And so that might compel me to ask a question or look somewhere to gather information, 00:27:31.920 |
which may not really change the world state, but it changes my own belief about the world. 00:27:36.480 |
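A minimal sketch of the belief-state idea: the controller keeps a probability distribution over hidden states and updates it after each action and observation, which is just a discrete Bayes filter. The door example, its transition and observation probabilities, and the function names are invented for illustration.

```python
# Hypothetical discrete Bayes filter: maintain a belief (distribution over
# hidden states) and update it using an action model and a noisy observation
# model. The door example and all probabilities are invented.

STATES = ["door_open", "door_closed"]

# P(next_state | state, action)
TRANSITION = {
    "push":    {"door_open":   {"door_open": 1.0, "door_closed": 0.0},
                "door_closed": {"door_open": 0.8, "door_closed": 0.2}},
    "nothing": {"door_open":   {"door_open": 1.0, "door_closed": 0.0},
                "door_closed": {"door_open": 0.0, "door_closed": 1.0}},
}

# P(observation | state) for a noisy "see open"/"see closed" sensor
OBSERVATION = {
    "see_open":   {"door_open": 0.6, "door_closed": 0.2},
    "see_closed": {"door_open": 0.4, "door_closed": 0.8},
}

def belief_update(belief, action, observation):
    # Prediction step: push the belief through the action model.
    predicted = {s2: sum(belief[s] * TRANSITION[action][s][s2] for s in STATES)
                 for s2 in STATES}
    # Correction step: weight by observation likelihood and renormalize.
    unnormalized = {s: OBSERVATION[observation][s] * predicted[s] for s in STATES}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}

belief = {"door_open": 0.5, "door_closed": 0.5}
belief = belief_update(belief, "push", "see_open")
print(belief)  # belief shifts strongly toward "door_open"
```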
- That's a powerful way to empower the agent to reason about the world, to explore the world. 00:27:44.480 |
What kind of problems does it allow you to solve to consider belief space versus just state space? 00:27:51.680 |
- Well, any problem that requires deliberate information gathering, right? So if, 00:27:56.800 |
in some problems, like chess, there's no uncertainty, or maybe there's uncertainty 00:28:03.440 |
about the opponent, there's no uncertainty about the state. And some problems there's 00:28:09.760 |
uncertainty, but you gather information as you go, right? You might say, "Oh, I'm driving my 00:28:15.200 |
autonomous car down the road, and it doesn't know perfectly where it is, but the lidars are all 00:28:19.120 |
going all the time, so I don't have to think about whether to gather information." But if you're a 00:28:24.640 |
human driving down the road, you sometimes look over your shoulder to see what's going on behind 00:28:29.680 |
you in the lane, and you have to decide whether you should do that now. And you have to trade off 00:28:37.280 |
the fact that you're not seeing in front of you, and you're looking behind you, and how valuable 00:28:41.200 |
is that information, and so on. And so to make choices about information gathering, 00:28:46.000 |
you have to reason in belief space. Also, I mean, also to just take into account your own 00:28:55.440 |
uncertainty before trying to do things. So you might say, "If I understand where I'm standing 00:29:02.640 |
relative to the door jamb pretty accurately, then it's okay for me to go through the door. But if 00:29:08.000 |
I'm really not sure where the door is, then it might be better to not do that right now." 00:29:12.880 |
The degree of your uncertainty about the world is actually part of the thing you're trying to 00:29:17.760 |
optimize in forming the plan, right? That's right. 00:29:20.960 |
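To make the look-versus-act trade-off concrete, here is one hypothetical rule of the kind a belief-space planner might use: commit to the physical action only when the belief is confident enough, otherwise spend a step gathering information. The entropy test and threshold are illustrative assumptions, not a description of any particular system.

```python
import math

def entropy(belief):
    """Shannon entropy of a discrete belief, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)

def choose(belief, confidence_threshold=0.5):
    """Hypothetical rule: if too uncertain, look (gather information);
    otherwise commit to the physical action (go through the door)."""
    if entropy(belief) > confidence_threshold:
        return "look"          # changes my belief, not the world
    return "go_through_door"   # changes the world

print(choose({"door_open": 0.55, "door_closed": 0.45}))  # look
print(choose({"door_open": 0.96, "door_closed": 0.04}))  # go_through_door
```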
So this idea of a long horizon of planning for a PhD, or just even how to get out of the house, 00:29:27.040 |
or how to make breakfast. You showed this presentation of the 'WTF, where's the fork' 00:29:32.720 |
robot looking at a sink. And can you describe how we plan in this world, 00:29:40.640 |
of this idea of hierarchical planning we've mentioned? So yeah, how can a robot hope to 00:29:47.360 |
plan about something with such a long horizon, where the goal is quite far away? 00:29:53.840 |
People, since probably reasoning began, have thought about hierarchical reasoning. 00:29:59.760 |
The temporal hierarchy in particular. Well, there's spatial hierarchy, but let's talk 00:30:03.040 |
about temporal hierarchy. So you might say, "Oh, I have this long execution I have to do, 00:30:08.960 |
but I can divide it into some segments abstractly." So maybe you have to get out of the house, 00:30:15.360 |
I have to get in the car, I have to drive, and so on. And so you can plan. If you can build 00:30:22.720 |
abstractions, so this, we started out by talking about abstractions, and we're back to that now. 00:30:26.960 |
If you can build abstractions in your state space, and abstractions, sort of temporal abstractions, 00:30:34.560 |
then you can make plans at a high level. And you can say, "I'm going to go to town, and then I'll 00:30:40.160 |
have to get gas, and then I can go here, and I can do this other thing." And you can reason about 00:30:43.840 |
the dependencies and constraints among these actions, again, without thinking about the complete 00:30:50.000 |
details. What we do in our hierarchical planning work is then say, "All right, I make a plan at a 00:30:57.280 |
high level of abstraction. I have to have some reason to think that it's feasible without 00:31:03.920 |
working it out in complete detail." And that's actually the interesting step. I always like to 00:31:08.800 |
talk about walking through an airport. Like, you can plan to go to New York and arrive at the 00:31:14.160 |
airport, and then find yourself an office building later. You can't even tell me in advance what your 00:31:20.000 |
plan is for walking through the airport. Partly because you're too lazy to think about it, maybe, 00:31:24.960 |
but partly also because you just don't have the information. You don't know what gate you're 00:31:28.400 |
landing in, or what people are going to be in front of you, or anything. So there's no point 00:31:34.320 |
in planning in detail. But you have to have -- you have to make a leap of faith that you can figure 00:31:41.040 |
it out once you get there. And it's really interesting to me how you arrive at that. 00:31:47.680 |
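A rough sketch of the hierarchical idea in that airport example: commit to an abstract plan up front, and refine each abstract step into concrete actions only when it is reached, on the leap of faith that refinement will be feasible. The operators and refinement stubs are invented for illustration and are not the planners used in Kaelbling's hierarchical planning work.

```python
# Hypothetical two-level planner: an abstract plan is committed to up front,
# and each abstract step is refined into concrete actions only when reached
# (the "leap of faith" that refinement will be feasible once we're there).

ABSTRACT_PLAN = ["leave_house", "drive_to_airport", "walk_through_airport",
                 "fly_to_NYC", "get_to_office"]

def refine(abstract_step, situation):
    """Expand one abstract step into low-level actions, using information
    that only becomes available at execution time (e.g. which gate we're at)."""
    if abstract_step == "walk_through_airport":
        return [f"walk_from_{situation['gate']}_to_exit", "follow_signs_to_taxi"]
    return [abstract_step]  # placeholder: treat other steps as primitive

def execute(plan):
    for step in plan:
        situation = sense_current_situation()  # only known once we get there
        for action in refine(step, situation):
            perform(action)

# Stubs so the sketch runs standalone
def sense_current_situation():
    return {"gate": "gate_23"}

def perform(action):
    print("doing:", action)

execute(ABSTRACT_PLAN)
```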
How do you -- so you have learned over your lifetime to be able to make some kinds of 00:31:53.040 |
predictions about how hard it is to achieve some kinds of sub-goals. And that's critical. Like, 00:31:58.800 |
you would never plan to fly somewhere if you couldn't -- didn't have a model of how hard it 00:32:03.360 |
was to do some of the intermediate steps. So one of the things we're thinking about now is, 00:32:06.800 |
how do you do this kind of very aggressive generalization to situations that you haven't 00:32:14.080 |
been in and so on, to predict how long will it take to walk through the Kuala Lumpur airport? 00:32:18.640 |
Like, you could give me an estimate and it wouldn't be crazy. And you have to have an estimate of that 00:32:24.560 |
in order to make plans that involve walking through the Kuala Lumpur airport, even if you 00:32:28.880 |
don't need to know it in detail. So I'm really interested in these kinds of abstract models and 00:32:34.160 |
how do we acquire them. But once we have them, we can use them to do hierarchical reasoning, 00:32:38.960 |
which I think is very important. Yeah, there's this notion of goal regression and 00:32:44.720 |
pre-image backchaining, this idea of starting at the goal and just forming these big clouds 00:32:50.960 |
of states. I mean, it's almost like saying, you know, once you show up 00:32:59.200 |
to the airport, you're like a few steps away from the goal. So like, thinking of it this way, 00:33:07.040 |
it's kind of interesting. I don't know if you have sort of further comments on that, 00:33:12.320 |
of starting at the goal. Yeah, I mean, it's interesting that Simon, Herb Simon, back in the 00:33:20.000 |
early days of AI, talked a lot about means-ends reasoning and reasoning back from the goal. 00:33:25.120 |
There's a kind of an intuition that people have that the state space is big, 00:33:32.960 |
the number of actions you could take is really big. So if you say, here I sit and I want to 00:33:37.600 |
search forward from where I am, what are all the things I could do? That's just overwhelming. 00:33:41.520 |
If you say, if you can reason at this other level and say, here's what I'm hoping to achieve, 00:33:46.480 |
what could I do to make that true? That somehow the branching is smaller. Now, 00:33:51.600 |
what's interesting is that like in the AI planning community, that hasn't worked out. In the class of 00:33:56.960 |
problems that they look at and the methods that they tend to use, it hasn't turned out that it's 00:34:00.720 |
better to go backward. It's still kind of my intuition that it is, but I can't prove that 00:34:07.120 |
to you right now. Right. I share your intuition, at least for us mere humans. Speaking of which, 00:34:15.920 |
maybe now we take a little step into that philosophy circle: 00:34:21.200 |
when you think about human life, you give those examples often, 00:34:27.680 |
how hard do you think it is to formulate human life as a planning problem or aspects of 00:34:32.400 |
human life? So when you look at robots, you're often trying to think about object manipulation, 00:34:38.640 |
tasks, about moving a thing. When you take a slight step outside the room, let the robot 00:34:46.240 |
leave and go get lunch, or maybe try to pursue more fuzzy goals. How hard do you think is that 00:34:54.480 |
problem? If you were to try to maybe put another way, try to formulate human life as a planning 00:35:00.800 |
problem. Well, that would be a mistake. I mean, it's not all a planning problem, right? I think 00:35:05.760 |
it's really, really important that we understand that you have to put together pieces and parts 00:35:11.920 |
that have different styles of reasoning and representation and learning. I think it seems 00:35:18.080 |
probably clear to anybody that it can't all be this or all be that. Brains aren't all like this 00:35:25.680 |
or all like that, right? They have different pieces and parts and substructure and so on. 00:35:30.160 |
So I don't think that there's any good reason to think that there's going to be like one 00:35:33.920 |
true algorithmic thing that's going to do the whole job. 00:35:38.080 |
Just a bunch of pieces together designed to solve a bunch of specific problems. 00:35:43.040 |
Or maybe styles of problems. I mean, there's probably some reasoning that needs to go on 00:35:50.320 |
in image space. I think, again, there's this model-based versus model-free idea, right? So 00:35:58.560 |
in reinforcement learning, people talk about, "Oh, should I learn? I could learn a policy, 00:36:03.600 |
just straight up a way of behaving. I could learn, it's popular to learn, a value function, 00:36:03.600 |
that's some kind of weird intermediate ground. Or I could learn a transition model, which tells me 00:36:15.840 |
something about the dynamics of the world." Imagine that I learn a transition 00:36:15.840 |
model and I couple it with a planner and I draw a box around that, I have a policy again. It's just 00:36:26.800 |
stored a different way. But it's just as much of a policy as the other policy. It's just I've made, 00:36:34.080 |
I think, the way I see it is it's a time-space trade-off in computation. Right? A more overt 00:36:41.440 |
policy representation. Maybe it takes more space, but maybe I can compute quickly what action I 00:36:47.520 |
should take. On the other hand, maybe a very compact model of the world dynamics plus a 00:36:52.480 |
planner lets me compute what action to take too, just more slowly. I mean, 00:36:57.840 |
I don't think there's any argument to be had. It's just like a question of what form of computation 00:37:03.840 |
is best for us. - For the various sub-problems. - Right. So, and so like learning to do algebra 00:37:11.840 |
manipulations for some reason is, I mean, that's probably gonna want naturally a sort of a 00:37:17.040 |
different representation than riding a unicycle. The time constraints on the unicycle are 00:37:22.080 |
serious. The space is maybe smaller. I don't know. But so I. - And there could be the more human 00:37:28.080 |
sides of falling in love, having a relationship that might be another. - Yeah, I have no idea. 00:37:35.600 |
- How to model that. Yeah. Let's first solve the algebra and the object manipulation. 00:37:42.560 |
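A minimal sketch of the time-space trade-off described a moment ago: the same state-to-action mapping can be stored overtly as a policy table (more storage, instant lookup) or computed on demand from a transition model plus a planner (compact, slower). The tiny corridor world and every name in it are invented for illustration.

```python
# Hypothetical illustration of "a model plus a planner, with a box drawn
# around it, is a policy again": the same mapping from state to action,
# stored two different ways (a time-space trade-off).

GOAL = 3
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]

def model(state, action):
    """Deterministic transition model of a 4-cell corridor (invented)."""
    return min(state + 1, 3) if action == "right" else max(state - 1, 0)

# Overt policy, stored as a table: more space, O(1) lookup per decision.
POLICY_TABLE = {0: "right", 1: "right", 2: "right", 3: "right"}

# Model-based alternative: compute the action on demand by one-step lookahead
# with the model: less storage, more computation per decision.
def planner_policy(state):
    return min(ACTIONS, key=lambda a: abs(GOAL - model(state, a)))

for s in STATES:
    assert POLICY_TABLE[s] == planner_policy(s)  # same behavior either way

print([planner_policy(s) for s in STATES])
```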
What do you think is harder, perception or planning? - Perception. - Understanding. 00:37:49.040 |
So what do you think is so hard about perception, 00:37:53.760 |
about understanding the world around you? - Well, I mean, I think the big question 00:37:57.520 |
is representational. Hugely the question is representation. So perception has made 00:38:08.400 |
great strides lately, right? And we can classify images and we can 00:38:12.800 |
play certain kinds of games and predict how to steer the car and all this sort of stuff. 00:38:17.760 |
I don't think we have a very good idea of what perception should deliver, right? So if you, 00:38:28.160 |
if you believe in modularity, okay, there's a very strong view which says 00:38:34.560 |
we shouldn't build in any modularity. We should make a giant, gigantic neural network, 00:38:40.400 |
train it end to end to do the thing. And that's the best way forward. And it's hard to argue 00:38:47.600 |
with that except on a sample complexity basis, right? So you might say, oh, well, if I want to 00:38:52.960 |
do end to end reinforcement learning on this giant, giant neural network, it's going to take 00:38:56.400 |
a lot of data and a lot of like broken robots and stuff. So then the only answer is to say, okay, 00:39:07.280 |
we have to build something in, build in some structure or some bias. We know from theory of 00:39:12.960 |
machine learning, the only way to cut down the sample complexity is to kind of cut down, 00:39:16.960 |
somehow cut down the hypothesis space. You can do that by building in bias. There's all kinds 00:39:22.960 |
of reasons to think that nature built bias into humans. Convolution is a bias. It's a very strong 00:39:31.760 |
bias and it's a very critical bias. So my own view is that we should look for more things that are 00:39:38.880 |
like convolution, but that address other aspects of reasoning, right? So convolution helps us a 00:39:43.840 |
lot with a certain kind of spatial reasoning. That's quite close to the imaging. I think 00:39:50.640 |
there's other ideas like that. Maybe some amount of forward search, maybe some notions of abstraction, 00:39:58.080 |
maybe the notion that objects exist. Actually, I think that's pretty important. And a lot of people 00:40:03.200 |
won't give you that to start with. Right? - So almost like a convolution in the 00:40:07.600 |
semantic object space, or some kind of ideas in there. 00:40:14.560 |
- That's right. And people are starting to; graph convolutions, for example, are an idea 00:40:18.320 |
related to relational representations. And so, I've come far 00:40:26.720 |
afield from perception, but I think the thing that's going to make perception take that kind 00:40:32.160 |
of next step is actually understanding better what it should produce. Right? So what are we 00:40:37.600 |
going to do with the output of it? Right? It's fine when what we're going to do with the output 00:40:41.280 |
is steer. It's less clear when we're just trying to make one integrated, intelligent agent. What 00:40:49.040 |
should the output of perception be? We have no idea. And how should that hook up to the other 00:40:53.520 |
stuff? We don't know. So I think the pressing question is what kinds of structure can we build 00:41:00.480 |
in that are like the moral equivalent of convolution that will make a really awesome 00:41:05.520 |
superstructure that then learning can kind of progress on efficiently. 00:41:10.000 |
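One concrete way to see why convolution is such a strong bias: compare the free parameters a convolutional layer needs with those of a fully connected layer over the same image, since shrinking the parameter count is one way of shrinking the hypothesis space. The layer sizes below are arbitrary assumptions; the comparison is just arithmetic.

```python
# Rough parameter-count comparison (invented layer sizes): a convolution's
# weight sharing and locality shrink the hypothesis space dramatically
# compared with a fully connected map over the same image.

H, W, C_IN, C_OUT, K = 64, 64, 3, 16, 3   # image size, channels, kernel size

conv_params = C_OUT * (C_IN * K * K + 1)                          # shared 3x3 kernels + biases
dense_params = (H * W * C_IN) * (H * W * C_OUT) + H * W * C_OUT   # every pixel to every pixel

print(f"conv layer:  {conv_params:,} parameters")    # 448
print(f"dense layer: {dense_params:,} parameters")   # roughly 805 million
```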
- I agree. Very compelling description of actually where we stand with the perception problem. 00:41:14.080 |
You're teaching a course on embodying intelligence. What do you think it takes to 00:41:19.120 |
build a robot with human-level intelligence? - I don't know. If we knew, we would do it. 00:41:24.800 |
- If you were to, I mean, okay. So do you think a robot needs to have a self-awareness, 00:41:35.200 |
a consciousness, fear of mortality, or is it simpler than that? Or is consciousness a simple 00:41:44.160 |
thing? Do you think about these notions? - I don't think much about consciousness. 00:41:49.200 |
Even most philosophers who care about it will give you that you could have robots that are zombies, 00:41:55.760 |
right? That behave like humans, but are not conscious. And I, at this moment, 00:41:59.680 |
would be happy enough with that. So I'm not really worried one way or the other. 00:42:02.480 |
- So on the technical side, you're not thinking of the use of self-awareness? 00:42:06.800 |
- Well, but I, okay. But then what does self-awareness mean? I mean, 00:42:11.280 |
that you need to have some part of the system that can observe other parts of the system and 00:42:18.160 |
tell whether they're working well or not. That seems critical. So does that count as, I mean, 00:42:23.920 |
does that count as self-awareness or not? Well, it depends on whether you think that there's 00:42:29.200 |
somebody at home who can articulate whether they're self-aware. But clearly, if I have 00:42:33.680 |
some piece of code that's counting how many times this procedure gets executed, 00:42:38.240 |
that's a kind of self-awareness, right? So there's a big spectrum. It's clear you have to have some. 00:42:46.320 |
- Of the many dimensions, is there a direction of research that's most compelling to you for 00:42:51.600 |
trying to achieve human-level intelligence in our robots? 00:42:55.760 |
- Well, to me, I guess the thing that seems most compelling to me at the moment is this 00:43:00.960 |
question of what to build in and what to learn. I think we're missing a bunch of ideas and 00:43:11.680 |
we, you know, people, you know, don't you dare ask me how many years it's going to be until that 00:43:17.760 |
happens because I won't even participate in the conversation because I think we're missing ideas 00:43:23.120 |
and I don't know how long it's going to take to find them. - So I won't ask you how many years, 00:43:27.120 |
but maybe I'll ask you when you'll be sufficiently impressed that we've achieved it. So what's a good 00:43:36.960 |
test of intelligence? Do you like the Turing test, the natural language, and the robotic space? Is 00:43:42.640 |
there something where you would sit back and think, "Oh, that's pretty impressive as a test, 00:43:49.760 |
as a benchmark." Do you think about these kinds of problems? - No, I resist. I mean, I think all 00:43:55.040 |
the time that we spend arguing about those kinds of things could be better spent just making the 00:43:59.840 |
robots work better. - You don't value competition. So, I mean, there's a nature of benchmarks and 00:44:08.320 |
data sets or Turing test challenges where everybody kind of gets together and tries to build a better 00:44:13.840 |
robot because they want to out-compete each other. Like the DARPA challenge with the autonomous 00:44:18.080 |
vehicles. Do you see the value of that or it can get in the way? - I think it can get in the way. 00:44:25.760 |
Many people find it motivating and so that's good. I find it anti-motivating personally. 00:44:31.120 |
But I think you get an interesting cycle where for a contest, a bunch of smart people get super 00:44:40.000 |
motivated and they hack their brains out. And much of what gets done is just hacks, but sometimes 00:44:45.200 |
really cool ideas emerge. And then that gives us something to chew on after that. So, it's not a 00:44:52.080 |
thing for me, but I don't regret that other people do it. - Yeah, it's like you said, with everything 00:44:58.160 |
else, the mix is good. So, jumping topics a little bit, you started the Journal of Machine Learning 00:45:03.440 |
Research and served as its editor-in-chief. How did the publication come about? And what do you 00:45:12.560 |
think about the current publishing model space in machine learning, artificial intelligence? - 00:45:18.400 |
Okay, good. So, it came about because there was a journal called Machine Learning, which still 00:45:23.680 |
exists, which was owned by Kluwer. And I was on the editorial board and we used to have these 00:45:23.680 |
meetings annually where we would complain to Kluwer that it was too expensive for the libraries and 00:45:30.800 |
that people couldn't publish. And we would really like to have some kind of relief on those fronts 00:45:39.840 |
and they would always sympathize, but not do anything. So, we just decided to make a new 00:45:46.960 |
journal. And there was the Journal of AI Research, which was on the same model, which had been in 00:45:53.280 |
existence for maybe five years or so, and it was going on pretty well. So, we just made a new 00:46:00.480 |
journal. I mean, I don't know, I guess it was work, but it wasn't that hard. So, basically, 00:46:06.320 |
the editorial board, probably 75% of the editorial board of machine learning resigned and 00:46:14.720 |
we founded this new journal. - But it was sort of, it was more open. - Yeah, right. So, it's 00:46:22.320 |
completely open. It's open access. Actually, I had a postdoc, George Konidaris, who wanted to 00:46:29.520 |
call these journals free-for-all. Because, I mean, it both has no page charges and has 00:46:37.440 |
no access restrictions. And the reason, and so lots of people, I mean, there were people who 00:46:46.960 |
were mad about the existence of this journal who thought it was a fraud or something. It would be 00:46:51.280 |
impossible, they said, to run a journal like this with basically, I mean, for a long time, I didn't 00:46:56.240 |
even have a bank account. I paid for the lawyer to incorporate and the IP address, and it just used 00:47:05.200 |
to cost a couple hundred dollars a year to run. It's a little bit more now, but not that much 00:47:09.760 |
more. But that's because I think computer scientists are competent and autonomous in a way 00:47:17.440 |
that many scientists in other fields aren't. I mean, at doing these kinds of things. We already 00:47:22.320 |
typeset our own papers. We all have students and people who can hack a website together in 00:47:27.040 |
the afternoon. So, the infrastructure for us was like, not a problem. But for other people in other 00:47:32.640 |
fields, it's a harder thing to do. - Yeah, and this kind of open access journal is nevertheless 00:47:38.960 |
one of the most prestigious journals. So, the prestige can be achieved without 00:47:38.960 |
any of the- - Paper is not required for prestige, 00:47:49.120 |
it turns out. - So, on the review process side, 00:47:52.320 |
actually a long time ago, I don't remember when, but I reviewed a paper where you were also a 00:47:58.080 |
reviewer and I remember reading your review being influenced by it. It was really well written. It 00:48:03.120 |
influenced how I write future reviews. You disagreed with me actually. And it made 00:48:09.680 |
my review much better. But nevertheless, the review process has its flaws. 00:48:19.280 |
And how do you think, what do you think works well? How can it be improved? 00:48:23.200 |
- So, actually when I started JMLR, I wanted to do something completely different. 00:48:27.600 |
And I didn't because it felt like we needed a traditional journal of record. And so, we just 00:48:34.800 |
made JMLR be almost like a normal journal, except for the open access parts of it, basically. 00:48:40.720 |
Increasingly, of course, publication is not even a sensible word. You can publish something by 00:48:47.600 |
putting it on arXiv, so I can publish everything tomorrow. So, making stuff public is, there's no 00:48:55.360 |
barrier. We still need curation and evaluation. I don't have time to read all of arXiv. 00:49:06.880 |
And you could argue that kind of social thumbs upping of articles suffices, right? You might say, 00:49:21.280 |
"Oh, heck with this. We don't need journals at all. We'll put everything on archive and people 00:49:25.920 |
will upvote and downvote the articles and then your CV will say, "Oh man, he got a lot of upvotes." 00:49:30.880 |
So, that's good. But I think there's still value in careful reading and commentary of things. And 00:49:45.040 |
it's hard to tell when people are upvoting and downvoting or arguing about your paper on Twitter 00:49:49.440 |
and Reddit, whether they know what they're talking about, right? So, then I have the 00:49:55.440 |
second order problem of trying to decide whose opinions I should value and such. So, I don't know. 00:50:01.520 |
If I had infinite time, which I don't, and I'm not going to do this because I really want to make 00:50:06.560 |
robots work, but if I felt inclined to do something more in the publication direction, 00:50:11.920 |
I would do this other thing, which I thought about doing the first time, which is to get together 00:50:16.800 |
some set of people whose opinions I value and who are pretty articulate. And I guess we would be 00:50:22.880 |
public, although we could be private, I'm not sure. And we would review papers. We wouldn't 00:50:27.520 |
publish them and you wouldn't submit them. We would just find papers and we would write 00:50:31.040 |
reviews and we would make those reviews public. And maybe if you, you know, so we're Leslie's 00:50:38.720 |
friends who review papers and maybe eventually if we, our opinion was sufficiently valued, 00:50:44.320 |
like the opinion of JMLR is valued, then you'd say on your CV that Leslie's friends gave my paper a 00:50:44.320 |
five-star reading and that would be just as good as saying I got it accepted into this journal. 00:50:55.440 |
So, I think we should have good public commentary and organize it in some way, 00:51:03.760 |
but I don't really know how to do it. It's interesting times. 00:51:06.080 |
- The way you describe it actually is really interesting. I mean, we do it for movies, 00:51:10.000 |
imdb.com. There's experts, critics come in, they write reviews, but there's also 00:51:16.000 |
regular non-critics. Humans write reviews and they're separated. 00:51:19.760 |
- I like open review. The ICLR process I think is interesting. 00:51:29.120 |
- It's a step in the right direction, but it's still not as compelling as 00:51:32.960 |
reviewing movies or video games. I mean, it sometimes almost, it might be silly, 00:51:40.240 |
at least from my perspective to say, but it boils down to the user interface, 00:51:43.760 |
how fun and easy it is to actually perform the reviews, how efficient, how much you as a reviewer 00:51:50.160 |
get street cred for being a good reviewer. Those human elements come into play. 00:51:56.640 |
- No, it's a big investment to do a good review of a paper and the flood of papers is out of control. 00:52:04.000 |
Right, so, you know, there aren't 3,000 new... I don't know how many new movies there are in a year. 00:52:08.480 |
I don't know, but it's probably gonna be less than how many machine learning papers 00:52:11.920 |
are in a year now. And I'm worried, you know, I, right, so I'm like an old person, so of course, 00:52:21.760 |
I'm gonna say, "Rawr, rawr, rawr, things are moving too fast. I'm a stick in the mud." 00:52:26.320 |
So I can say that, but my particular flavor of that is, I think the horizon for researchers 00:52:34.560 |
has gotten very short. That students want to publish a lot of papers and there's a huge, 00:52:41.520 |
there's value, it's exciting and there's value in that and you get patted on the head for it 00:52:46.480 |
and so on. But, and some of that is fine, but I'm worried that we're driving out people who 00:52:57.760 |
would spend two years thinking about something. Back in my day, when we worked on our theses, 00:53:05.280 |
we did not publish papers. You did your thesis for years. You picked a hard problem and then 00:53:10.400 |
you worked and chewed on it and did stuff and wasted time, for a long time. And when it was 00:53:16.000 |
roughly done, you would write papers. And so, I don't think 00:53:22.640 |
that everybody has to work in that mode, but I think there's some problems that are hard enough 00:53:26.800 |
that it's important to have a longer research horizon and I'm worried that 00:53:31.680 |
we don't incentivize that at all at this point. - In this current structure. 00:53:40.560 |
So, continuing on this theme, what are your hopes and fears about the future of AI? AI has gone 00:53:47.280 |
through a few winters, ups and downs. Do you see another winter of AI coming? Are you more hopeful 00:53:55.760 |
about making robots work, as you said? - I think the cycles are inevitable, 00:54:02.880 |
but I think each time we get higher, right? I mean, so, you know, it's like climbing some kind 00:54:09.680 |
of landscape with a noisy optimizer. So it's clear that the, you know, the deep learning stuff has 00:54:19.520 |
made deep and important improvements. And so the high watermark is now higher. There's no question, 00:54:26.960 |
but of course, I think people are overselling and eventually investors, I guess, and other people 00:54:35.680 |
look around and say, well, you're not quite delivering on this grand claim and that wild 00:54:41.680 |
hypothesis. It's like, probably it's going to crash some amount and then it's okay. I mean, 00:54:48.800 |
but I can't imagine that there's like some awesome monotonic improvement from here to 00:54:55.200 |
human level AI. - So, you know, I have to ask this question. I can probably anticipate 00:55:02.320 |
the answers, but do you have a worry, short term or long term, about the existential threats of AI, 00:55:10.240 |
and maybe, short term, less existential, but more about robots taking away jobs? 00:55:18.880 |
- Well, actually, let me talk a little bit about utility. Actually, I had an interesting 00:55:27.200 |
conversation with some military ethicists who wanted to talk to me about autonomous weapons. 00:55:32.560 |
And they were interesting, smart, well-educated guys who didn't know too much about AI or machine 00:55:40.880 |
learning. And the first question they asked me was, has your robot ever done something you didn't 00:55:45.280 |
expect? And I like burst out laughing, because anybody who's ever done anything with a robot, 00:55:50.560 |
right, knows that they don't do much. And what I realized was that their model of how we program a 00:55:56.240 |
robot was completely wrong. Their model of programming a robot was Lego Mindstorms: 00:56:02.560 |
like, oh, go forward a meter, turn left, take a picture, do this, do that. And so if you have 00:56:07.280 |
that model of programming, then it's true. It's kind of weird that your robot would do something 00:56:12.560 |
that you didn't anticipate. But the fact is, and actually, so now this is my new educational 00:56:17.680 |
mission. If I have to talk to non-experts, I try to teach them the idea that that's not how we operate; 00:56:24.560 |
we operate at least one, or maybe many, levels of abstraction above that. And we say, oh, 00:56:29.680 |
here's a hypothesis class. Maybe it's a space of plans, or maybe it's a space of classifiers or 00:56:35.200 |
whatever, but there's some set of answers and an objective function. And then we work on some 00:56:40.160 |
optimization method that tries to optimize a solution in that class. And we don't know what 00:56:46.800 |
solution is going to come out. So I think it's important to communicate that. So I mean, of 00:56:52.320 |
course, probably people who listen to this, they know that lesson. But I think it's really critical 00:56:56.960 |
to communicate that lesson. And then lots of people are now talking about the value alignment 00:57:01.840 |
problem. So you want to be sure as robots or software systems get more competent, that their 00:57:09.360 |
objectives are aligned with your objectives, or that our objectives are compatible in some way, 00:57:14.480 |
or we have a good way of mediating when they have different objectives. And so I think it is 00:57:20.240 |
important to start thinking in those terms. Like, you don't have to be freaked out by the robot apocalypse 00:57:26.720 |
to accept that it's important to think about objective functions and value alignment. 00:57:32.960 |
And everyone who's done optimization knows that you have to be careful what you wish for, because, 00:57:37.120 |
you know, sometimes you get the optimal solution, and you realize, man, that objective was wrong. 00:57:41.920 |
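To make that concrete, here is a minimal sketch of the framing described above, with every detail invented for illustration: the hypothesis class is a small space of grid-world plans, the objective function is written by hand, and a brute-force optimizer returns whichever plan scores best. The grid, the "flowerbed", the penalty values, and all the function names are assumptions of the example, not anything from the conversation.

```python
# A toy version of: pick a hypothesis class (here, short plans), write an
# objective, hand both to an optimizer, and see what comes out.
from itertools import product

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
START, GOAL = (0, 2), (4, 2)
FLOWERBED = {(2, 1), (2, 2), (2, 3)}  # something we care about but may forget to write down

def rollout(plan):
    """Execute a plan on a 5x5 grid, stopping early once the goal is reached."""
    x, y = START
    visited = [(x, y)]
    for action in plan:
        dx, dy = ACTIONS[action]
        x, y = min(max(x + dx, 0), 4), min(max(y + dy, 0), 4)
        visited.append((x, y))
        if (x, y) == GOAL:
            break
    return visited

def naive_objective(plan):
    """'Get to the goal quickly' -- says nothing about the flowerbed."""
    visited = rollout(plan)
    reached = 100 if visited[-1] == GOAL else 0
    return reached - (len(visited) - 1)

def careful_objective(plan):
    """Same goal, but with the forgotten concern made explicit."""
    visited = rollout(plan)
    reached = 100 if visited[-1] == GOAL else 0
    trampled = sum(1 for cell in visited if cell in FLOWERBED)
    return reached - (len(visited) - 1) - 1000 * trampled

def optimize(objective, horizon=8):
    """The optimizer: exhaustive search over the space of plans."""
    return max(product(ACTIONS, repeat=horizon), key=objective)

for name, objective in [("naive", naive_objective), ("careful", careful_objective)]:
    path = rollout(optimize(objective))
    tramples = any(cell in FLOWERBED for cell in path)
    print(f"{name}: {len(path) - 1} steps, tramples flowerbed: {tramples}")
# The naive optimum cuts straight through the flowerbed (4 steps); the careful
# optimum takes the long way around (8 steps). Both are "optimal" -- for what
# was written down, not necessarily for what was wanted.
```

In a real system the optimizer would be a planner or gradient descent rather than exhaustive search, but the structure is the same: the behavior that comes out is whatever the stated objective rewards, which is exactly how an "optimal" solution can still be the wrong one.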
So pragmatically, in the shortish term, it seems to me that those are really interesting and 00:57:50.480 |
critical questions. And the idea that we're going to go from being people who engineer algorithms 00:57:55.040 |
to being people who engineer objective functions, I think that's definitely going to happen. And 00:58:00.400 |
that's going to change our thinking and methodology. >> You started at Stanford 00:58:05.360 |
in philosophy, so maybe that's where you should go back to: philosophy. >> Philosophy, maybe. 00:58:09.600 |
>> Designing objective functions. >> Well, I mean, they're mixed together, 00:58:12.880 |
because as we also know, as machine learning people, right, when you design, in fact, this is 00:58:18.080 |
the lecture I gave in class today, when you design an objective function, you have to wear both hats. 00:58:23.360 |
There's the hat that says, what do I want? And there's the hat that says, but I know what my 00:58:28.320 |
optimizer can do to some degree. And I have to take that into account. So it's always a tradeoff, 00:58:34.640 |
and we have to kind of be mindful of that. The part about taking people's jobs, I understand 00:58:41.520 |
that that's important. I don't understand sociology or economics or people very well. 00:58:47.920 |
So I don't know how to think about that. >> Yeah, so there might be a 00:58:51.840 |
sociological aspect there, an economic aspect, that's very difficult to think about. Okay. 00:58:56.400 |
>> I mean, I think other people should be thinking about it, but I'm just, that's not my strength. 00:58:59.840 |
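Picking back up the "both hats" point from a moment earlier, here is another made-up sketch. The target value, the hill-climbing optimizer, and the shaping term are all assumptions of the example; the point is only the tradeoff it illustrates: one objective states exactly what is wanted, while the other bends the objective so the particular optimizer at hand can actually find a good answer.

```python
import random

random.seed(0)   # fixed seed so the toy run is repeatable
TARGET = 7.3     # an arbitrary made-up target

def sparse_objective(x):
    """What we actually want: land within 0.1 of the target. No partial credit."""
    return 1.0 if abs(x - TARGET) < 0.1 else 0.0

def shaped_objective(x):
    """The same intent, rewritten so a weak local optimizer has something to climb."""
    return -abs(x - TARGET)

def hill_climb(objective, x=0.0, steps=2000, step_size=0.05):
    """A deliberately limited optimizer: accept a random nudge only if it helps."""
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) > objective(x):
            x = candidate
    return x

for name, objective in [("sparse", sparse_objective), ("shaped", shaped_objective)]:
    x = hill_climb(objective)
    print(f"{name}: x = {x:.2f}, inside target band: {abs(x - TARGET) < 0.1}")
# The sparse objective is faithful to the intent but gives this optimizer nothing
# to follow, so the search never moves off its starting point. The shaped objective
# compromises -- it rewards partial progress we never asked for -- and the same
# optimizer succeeds.
```

This is the familiar reward-shaping tradeoff: every term added to help the optimizer is also a small departure from the original intent, so it has to be chosen with both hats on.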
>> So what do you think is the most exciting area of research in the short term, 00:59:04.320 |
for the community and for yourself? >> Well, so, I mean, there's this story 00:59:08.400 |
I've been telling about how to engineer intelligent robots, right? So that's what we want to do. We 00:59:15.920 |
all kind of want to do, well, I mean, some set of us want to do this. And the question is, what's 00:59:20.240 |
the most effective strategy? And we've tried, and there's a bunch of different things you could do 00:59:24.960 |
at the extremes, right? One super extreme is we do introspection and we write a program. Okay, 00:59:30.960 |
that has not worked out very well. Another extreme is we take a giant bunch of neural 00:59:35.680 |
goo and we try to train it up to do something. I don't think that's going to work either. 00:59:39.360 |
So the question is, what's the middle ground? And again, this isn't a theological question 00:59:48.480 |
or anything like that. It's just, what's the best way to make this work out? 00:59:54.960 |
And to me, it's clear: it's a combination 01:00:00.240 |
of learning and not learning. And what should that combination be? And what's the stuff we 01:00:05.120 |
build in? So to me, that's the most compelling question. >> And when you say engineer robots, 01:00:09.680 |
you mean engineering systems that work in the real world? Is that the emphasis? 01:00:17.600 |
Last question. Which robots or robot is your favorite from science fiction? 01:00:23.200 |
So you can go with Star Wars, R2-D2, or you can go with more modern, maybe HAL from- 01:00:33.280 |
>> No, I don't think I have a favorite robot from science fiction. 01:00:37.040 |
>> This is back to, you like to make robots work in the real world here, not in- 01:00:45.360 |
>> I mean, I love the process. And I care more about the process. 01:00:49.920 |
>> The engineering process. >> Yeah. I mean, I do research because it's fun, 01:00:53.840 |
not because I care about what we produce. >> Well, that's a beautiful note, actually, 01:00:59.520 |
to end on. Leslie, thank you so much for talking today.