Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81
Chapters
0:00 Introduction
2:26 Interest in robotics
5:32 Computer science
7:32 Favorite robot
13:25 How difficult is human-robot interaction?
32:01 HRI application domains
34:24 Optimizing the beliefs of humans
45:59 Difficulty of driving when humans are involved
65:02 Semi-autonomous driving
70:39 How do we specify good rewards?
77:30 Leaked information from human behavior
81:59 Three laws of robotics
86:31 Book recommendation
89:02 If a doctor gave you 5 years to live...
92:48 Small act of kindness
94:31 Meaning of life
The following is a conversation with Anca Dragan, 00:00:03.900 |
a professor at Berkeley working on human robot interaction, 00:00:08.160 |
algorithms that look beyond the robot's function 00:00:18.080 |
She also consults at Waymo, the autonomous vehicle company, 00:00:27.140 |
She is one of the most brilliant and fun roboticists 00:00:36.320 |
so I was a bit tired, even more so than usual, 00:00:48.880 |
So I had a lot of fun and really enjoyed this conversation. 00:01:01.680 |
or simply connect with me on Twitter at Lex Fridman, 00:01:08.160 |
As usual, I'll do one or two minutes of ads now 00:01:35.360 |
Since Cash App does fractional share trading, 00:01:39.240 |
let me mention that the order execution algorithm 00:01:43.400 |
to create the abstraction of fractional orders 00:01:54.760 |
that takes a step up to the next layer of abstraction 00:01:59.360 |
making trading more accessible for new investors 00:02:15.920 |
an organization that is helping to advance robotics 00:02:18.560 |
and STEM education for young people around the world. 00:02:22.340 |
And now here's my conversation with Anca Dragan. 00:02:25.920 |
When did you first fall in love with robotics? 00:02:37.060 |
because I first started getting into programming 00:02:54.340 |
and I was coming from this little school in Germany 00:02:59.020 |
but I had spent an exchange semester at Carnegie Mellon, 00:03:13.220 |
and I thought that robotics is a really cool way 00:03:16.300 |
to actually apply the stuff that I knew and loved, 00:03:25.780 |
which is I used to do mostly manipulation in my PhD, 00:03:30.780 |
but now I do kind of a bit of everything application-wise, 00:03:42.140 |
while I was a PhD student still for RSS 2014. 00:03:48.220 |
and he arranged for, it was Google at the time, 00:03:56.400 |
and it was just making decision after decision the right call 00:04:03.400 |
So it was a whole different experience, right? 00:04:08.680 |
- Was it the most magical robot you've ever met? 00:04:11.200 |
So like, for me too, meeting the Google self-driving car 00:04:14.920 |
for the first time was like a transformative moment. 00:04:19.960 |
that and Spot Mini, I don't know if you met Spot Mini 00:04:30.840 |
It's just, I mean, there's nothing truly special. 00:04:35.800 |
but the anthropomorphism that went on into my brain, 00:04:41.440 |
Like it had a little arm and it like, and looked at me. 00:04:48.960 |
And it made me realize, wow, robots can be so much more 00:04:54.240 |
They can be things that have a human connection. 00:04:56.920 |
Do you have, was the self-driving car the moment, 00:05:00.440 |
like, was there a robot that truly sort of inspired you? 00:05:03.880 |
- That was, I remember that experience very viscerally, 00:05:21.280 |
- Oh, that was like the smaller one, like the firefly. 00:05:25.640 |
And I put it on my laptop and I had that for years 00:05:30.160 |
until I finally changed my laptop out and you know. 00:05:33.120 |
- What about if we walk back, you mentioned optimization. 00:05:36.320 |
Like what beautiful ideas inspired you in math, 00:05:49.000 |
- The thing is I liked math from very early on, 00:05:52.400 |
from fifth grade is when I got into the math Olympiad 00:05:58.520 |
- Yeah, this, Romania is like our national sport, 00:06:14.960 |
And other than understanding, which was cool, 00:06:31.280 |
- Do you remember like the first program you've written? 00:06:36.400 |
I kind of do, it was in QBasic in fourth grade. 00:06:47.480 |
- Yeah, that was, I don't know how to do that anymore. 00:06:59.040 |
So you could sign up for dance or music or programming. 00:07:17.160 |
- I did a little bit of the computer science Olympiad, 00:07:21.400 |
but not as seriously as I did the math Olympiad. 00:07:25.800 |
Yeah, it's basically, here's a hard math problem, 00:07:27.760 |
solve it with a computer is kind of the deal. 00:07:39.920 |
well, what's like who or what is your favorite robot, 00:08:19.840 |
and it's the manipulator and what does it all mean? 00:08:33.640 |
So yeah, it goes woo and then it's super cute. 00:08:38.600 |
And yeah, the way it moves is just so expressive. 00:08:44.760 |
and what it's doing with these lenses is amazing. 00:08:48.280 |
And so I've really liked that from the start. 00:08:53.280 |
And then on top of that, sometimes I shared this, 00:09:01.160 |
My husband proposed to me by building a WALL-E 00:09:09.680 |
So it's seven degrees of freedom, including the lens thing. 00:09:23.520 |
And then it spewed out this box made out of Legos 00:09:34.400 |
- That could be like the most impressive thing 00:10:22.760 |
- And so do we automatically just anthropomorphize 00:10:39.920 |
So if you wanna do it in this very particular narrow setting 00:10:44.920 |
where it does only one thing and it's expressive, 00:11:12.040 |
just to clarify, I used to work a lot on this. 00:11:14.680 |
I don't work on that quite as much these days, 00:11:21.720 |
when they pick something up and put it in a place, 00:11:24.320 |
they can do that with various forms of style, 00:11:28.160 |
where you can say, well, this robot is succeeding 00:11:30.760 |
at this task and is confident versus it's hesitant 00:11:41.320 |
they can communicate so much about internal states 00:12:11.600 |
It doesn't reply in any way, it just says the same thing. 00:12:32.520 |
But it's really hard because it's, I don't know, 00:12:39.760 |
when it came to expressing goals or intentions for robots, 00:12:47.480 |
instead of doing robotics where you have your state 00:12:55.480 |
the reward function that you're trying to optimize, 00:12:57.880 |
now you kind of have to expand the notion of state 00:13:05.960 |
What do they think about the robots, something or other? 00:13:10.200 |
And then you have to optimize in that system. 00:13:14.160 |
how your motion, your actions end up sort of influencing 00:13:27.120 |
incorporating the human into the state model, 00:13:33.640 |
but how complicated are human beings, do you think? 00:13:46.160 |
Or is there something, do we have to model things like mood 00:13:52.800 |
I mean, all of these kinds of human qualities 00:14:00.160 |
- How hard is the problem of human robot interaction? 00:14:03.360 |
- Yeah, should we talk about what the problem 00:14:12.320 |
So, and by the way, I'm gonna talk about this very 00:14:15.840 |
particular view of human robot interaction, right? 00:14:21.600 |
or on the side of how do you have a good conversation 00:14:26.760 |
It turns out that if you make robots taller versus shorter, 00:14:29.200 |
this has an effect on how people act with them. 00:14:34.640 |
But I'm talking about this very kind of narrow thing, 00:14:42.880 |
in a lab out there in the world, but in isolation, 00:14:46.600 |
and now you're asking, what does it mean for the robot 00:14:55.880 |
That ends up changing the problem in two ways. 00:15:04.680 |
the robot is no longer the single agent acting. 00:15:08.560 |
You have humans who also take actions in that same space. 00:15:13.840 |
robots around an office, navigate around the people, 00:15:18.560 |
If I send the robot over there to the cafeteria 00:15:20.920 |
to get me a coffee, then there's probably other people 00:15:30.560 |
Then you have these people who are also making decisions 00:15:36.240 |
And even if the robot knows what it should do 00:15:39.160 |
and all of that, just coexisting with these people, right? 00:15:47.080 |
That's sort of the kind of problem number one. 00:15:56.560 |
if I'm a programmer, I can specify some objective 00:16:07.280 |
presumably you might have your own opinions about, 00:16:13.960 |
Then how should the robot know how close to me it should come 00:16:13.960 |
should satisfy the preferences of that end user, 00:16:39.800 |
So really it boils down to understand the humans 00:16:42.240 |
in order to interact with them and in order to please them. 00:16:51.080 |
So I think there's two tasks about understanding humans 00:17:00.960 |
So there's the task of being able to just anticipate 00:17:05.680 |
We all know that cars need to do this, right? 00:17:07.600 |
We all know that, well, if I navigate around some people, 00:17:47.480 |
to being able to anticipate what they'll do in the future. 00:17:57.000 |
because we're trying to achieve certain things. 00:17:59.400 |
And so I think that's the relationship between them. 00:18:01.600 |
Now, how complicated do these models need to be 00:18:05.560 |
in order to be able to understand what people want? 00:18:15.160 |
with something called inverse reinforcement learning, 00:18:25.240 |
- Right, so it's the problem of taking human behavior 00:18:34.560 |
and figuring out the reward function that that behavior is optimal with respect to. 00:18:58.200 |
"Okay, I'm getting the trade-offs that they're making. 00:19:03.000 |
I'm getting the preferences that they want out of this." 00:19:06.160 |
And so we've been successful in robotics somewhat with this. 00:19:10.320 |
And it's based on a very simple model of human behavior. 00:19:18.680 |
with respect to whatever it is that people want, right? 00:19:30.560 |
So this is based on utility maximization in economics. 00:19:39.760 |
"Okay, people are making choices by maximizing utility, go." 00:19:58.800 |
So they might choose something kind of stochastically 00:20:12.640 |
and something that we call Boltzmann rationality. 00:20:22.840 |
for these tasks where it turns out people act 00:20:26.320 |
noisily enough that you can't just do vanilla, 00:20:37.240 |
Then now we're hitting tasks where that's not enough. 00:20:44.360 |
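To make the noisy-rationality (Boltzmann) model concrete, here is a minimal sketch in Python; the three options, the single "effort" feature, and all the numbers are hypothetical, not from the conversation. The human is modeled as picking options with probability proportional to exp(reward), and the reward weight is recovered from noisy observed choices, which is the simple inverse reinforcement learning setup being described.

```python
import numpy as np

# Toy Boltzmann-rationality model: the human picks option a with
# probability proportional to exp(beta * reward(a)).  Hypothetical
# example: three ways to hand over a cup, scored by a single
# "effort" feature (lower effort = higher reward).

effort = np.array([0.2, 1.0, 0.5])            # hand-designed feature per option
true_weight = -2.0                            # hypothetical "true" human preference
beta = 3.0                                    # rationality / inverse temperature

def choice_probs(weight):
    """P(option) under Boltzmann rationality with reward = weight * effort."""
    logits = beta * weight * effort
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Simulate noisy human choices, then recover the weight by grid-search MLE.
rng = np.random.default_rng(0)
observed = rng.choice(3, size=200, p=choice_probs(true_weight))

candidate_weights = np.linspace(-5, 5, 201)
log_liks = [np.log(choice_probs(w)[observed]).sum() for w in candidate_weights]
estimate = candidate_weights[int(np.argmax(log_liks))]
print(f"recovered weight ~ {estimate:.2f} (true {true_weight})")
```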
So imagine you're trying to control some robot 00:20:50.080 |
'cause maybe you're a patient with a motor impairment 00:20:57.160 |
Or one task that we've looked at with Sergey is, 00:21:14.360 |
Imagine you're trying to provide some assistance 00:21:20.880 |
where you want the kind of the autonomy to kick in, 00:21:22.760 |
figure out what it is that you're trying to do 00:21:25.640 |
It's really hard to do that for say lunar lander 00:21:33.640 |
And so they seem much more noisy than really rational. 00:21:44.200 |
so we talked about utility from the forties, late fifties. 00:21:44.200 |
and behavioral economics started being a thing 00:21:58.680 |
People are messy and emotional and irrational 00:22:10.000 |
- So what does my robot do to understand what you want? 00:22:19.680 |
we get away with pretty simple models until we don't. 00:22:23.240 |
And then the question is, what do you do then? 00:22:36.800 |
enough that you can reliably understand what people want, 00:22:44.920 |
You'll get these systems that are more and more capable 00:22:49.120 |
that you're telling them the right thing to do. 00:23:02.160 |
it would be harder than if I got to say something 00:23:08.600 |
Can you, can the robot help its understanding of the human 00:23:13.120 |
by influencing the behavior by actually acting? 00:23:19.800 |
So one of the things that's been exciting to me lately 00:23:28.520 |
when you try to think of the robotics problem as, 00:23:31.920 |
okay, I have a robot and it needs to optimize 00:23:34.480 |
for whatever it is that a person wants it to optimize, 00:23:54.920 |
at least implicitly to what it is that they want. 00:23:57.200 |
They can't write it down, but they can talk about it. 00:24:11.880 |
And so there's these information gathering actions 00:24:15.360 |
that the robot can take to sort of solicit responses 00:24:21.920 |
this is not for the purpose of assisting people, 00:24:23.920 |
but with kind of back to coordinating with people in cars 00:24:31.840 |
so we were looking at cars being able to navigate 00:24:43.000 |
but you want to change lanes in front of them. 00:24:45.240 |
- Navigating around other humans inside cars? 00:24:59.000 |
Similar things, ideas apply to pedestrians as well, 00:25:06.240 |
Well, you could be trying to infer the driving style 00:25:24.320 |
if you think that if you want to hedge your bets 00:25:27.960 |
and say, ah, maybe they're actually pretty aggressive, 00:25:36.440 |
because you're not actually getting the observations 00:25:45.200 |
regardless if they're aggressive or defensive. 00:25:51.000 |
to reason about how it might actually be able 00:25:54.160 |
to gather information by changing the actions 00:25:58.080 |
And then the robot comes up with these cool things 00:26:02.520 |
and then sees if you're going to slow down or not. 00:26:05.240 |
Then if you slow down, it sort of updates its model of you 00:26:07.920 |
and says, oh, okay, you're more on the defensive side. 00:26:14.320 |
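A minimal sketch of the belief update being described, assuming a binary aggressive/defensive hypothesis and made-up likelihoods for how each style reacts when the robot inches forward; the real planner also chooses the probing action by reasoning about expected information gain, which this sketch leaves out.

```python
# A minimal sketch of the probing idea described above, assuming a
# binary driver-style hypothesis and made-up likelihoods for how each
# style responds to the robot inching into the lane.

belief = {"aggressive": 0.5, "defensive": 0.5}

# Hypothetical observation model: probability that the other driver
# slows down when the robot nudges forward, under each style.
p_slow_given_style = {"aggressive": 0.2, "defensive": 0.8}

def update(belief, slowed_down: bool):
    """Bayes update of the style belief after one probing action."""
    posterior = {}
    for style, prior in belief.items():
        likelihood = p_slow_given_style[style] if slowed_down else 1 - p_slow_given_style[style]
        posterior[style] = prior * likelihood
    z = sum(posterior.values())
    return {s: p / z for s, p in posterior.items()}

# Robot inches forward; the human slows down -> more mass on "defensive".
belief = update(belief, slowed_down=True)
print(belief)   # e.g. {'aggressive': 0.2, 'defensive': 0.8}
```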
That's so cool that you could use your own actions 00:26:30.240 |
- It's rare 'cause it's actually leveraging humans. 00:26:30.240 |
I mean, most roboticists I've talked to a lot, 00:26:38.240 |
are kind of, being honest, kind of afraid of humans. 00:26:42.960 |
- 'Cause they're messy and complicated, right? 00:26:46.680 |
Going back to what we were talking about earlier, 00:26:49.800 |
right now we're kind of in this dilemma of, okay, 00:26:57.960 |
We can figure out their driving styles, whatever. 00:27:06.040 |
And this one, I've had a little bit of hope recently 00:27:23.920 |
But basically one thing that we've been thinking about 00:27:27.960 |
instead of kind of giving up and saying people 00:27:30.440 |
are too crazy and irrational for us to make sense of them, 00:27:33.520 |
maybe we can give them a bit the benefit of the doubt 00:27:43.960 |
but just under different assumptions about the world, 00:28:02.720 |
This is the transition function, that's what they know. 00:28:11.040 |
the way, the reason they can seem a little messy 00:28:16.440 |
is that perhaps they just make different assumptions 00:28:33.280 |
is that we just don't understand the constraints 00:28:38.280 |
And so our goal shouldn't be to throw our hands up 00:28:43.640 |
let's try to understand what are the constraints. 00:28:55.560 |
That's just good to, communicating with humans, 00:28:58.480 |
that's just a good, assume that you just don't, 00:29:03.400 |
- It's just maybe there's something you're missing 00:29:06.000 |
and it's, you know, it especially happens to robots 00:29:08.560 |
'cause they're kind of dumb and they don't know things 00:29:10.200 |
and oftentimes people seem sort of super irrational 00:29:12.720 |
when actually they know a lot of things that robots don't 00:29:26.880 |
but assuming a much more simplified physics model, 00:29:31.040 |
'cause they don't get the complexity of this kind of craft 00:29:33.840 |
or the robot arm with seven degrees of freedom 00:29:38.320 |
So maybe they have this intuitive physics model, 00:29:41.520 |
which is not, you know, this notion of intuitive physics 00:29:44.240 |
is something that is studied actually in cognitive science 00:29:46.560 |
like Josh Tenenbaum's and Tom Griffiths' work on this stuff. 00:29:49.840 |
And what we found is that you can actually try 00:30:01.320 |
And then you can use that to sort of correct what it is 00:30:08.720 |
So they might be sending the craft somewhere, 00:30:16.920 |
if the world worked according to their intuitive 00:30:19.640 |
physics model, where do they think that the craft is going? 00:30:26.040 |
And then you can use the real physics, right? 00:30:31.560 |
instead of where they were actually sending you 00:30:38.320 |
and you know, in between the two flags and all that. 00:30:47.320 |
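A minimal sketch of that correction idea under entirely made-up one-step dynamics: the person plans with a simplified "intuitive" model that ignores drift, the assistant infers where they meant to go under that model, and then uses the real dynamics to get there. The actual work infers a learned intuitive-physics model rather than assuming one.

```python
import numpy as np

# Made-up dynamics: the person plans as if their joystick command moves
# the craft directly (a simple "intuitive" model), while the real craft
# also drifts with gravity.  The assistant infers the intended target
# under the intuitive model, then picks the command that achieves it
# under the real dynamics.

def intuitive_step(pos, cmd):
    # What the person *thinks* happens: position just moves with the command.
    return pos + cmd

def real_step(pos, cmd, drift=np.array([0.0, -0.3])):
    # What *actually* happens: same command, plus a gravity-like drift.
    return pos + cmd + drift

pos = np.array([0.0, 1.0])
human_cmd = np.array([0.5, 0.0])                   # person steers sideways

intended_target = intuitive_step(pos, human_cmd)   # where they think they'll end up
corrected_cmd = intended_target - real_step(pos, np.zeros(2))  # cancel the drift
print("intended:", intended_target, "corrected command:", corrected_cmd)
```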
maybe we're kind of underestimating humans in some ways 00:31:03.600 |
that for instance, have touched upon the planning horizon. 00:31:11.400 |
maybe we work under computational constraints. 00:31:13.680 |
And I think kind of our view recently has been, 00:31:19.760 |
and just break it in all sorts of ways, starting with the state. 00:31:23.440 |
The person doesn't get to see the real state. 00:31:31.600 |
maybe they're still learning about what it is that they want. 00:31:49.720 |
You're still trying to figure out what you like, 00:31:52.640 |
So I think it's important to also account for that. 00:32:04.760 |
and we were talking about human-robot interaction, 00:32:07.120 |
what kind of problem spaces are you thinking about? 00:32:13.880 |
like wheeled robots with autonomous vehicles? 00:32:18.560 |
Like when you think about human-robot interaction 00:32:24.440 |
for the entire community of human-robot interaction. 00:32:27.000 |
No, but like, what are the problems of interest here? 00:32:43.040 |
but it could just happen in the virtual space. 00:32:46.360 |
So where's the boundaries of this field for you 00:32:51.800 |
- Yeah, so I tried to find kind of underlying, 00:33:03.800 |
I might call what I do the kind of working on 00:33:06.640 |
the foundations of algorithmic human-robot interaction 00:33:15.920 |
is actually somewhat domain agnostic when it comes to, 00:33:27.880 |
it's sort of the same underlying principles apply. 00:33:36.600 |
But these things that we were talking about around, 00:33:42.440 |
It turns out that a lot of systems at their core benefit 00:33:45.760 |
from a better understanding of how human behavior relates 00:33:49.560 |
to what people want and need to predict human behavior, 00:33:53.560 |
physical robots of all sorts and beyond that. 00:34:00.600 |
and then I was picking up stuff with people around. 00:34:23.800 |
- A thought that popped into my head just now: 00:34:28.720 |
this really interesting idea of using actions 00:34:47.480 |
- Yeah, it's that they also have a world model of you, 00:35:01.480 |
You said with the kids, people see Alexa in a certain way. 00:35:06.320 |
Is there some value in trying to also optimize 00:35:13.560 |
Or is that a little too far away from the specifics 00:35:26.320 |
And we've seen a little bit of progress on this problem, 00:35:36.280 |
to how complicated does the human model need to be. 00:35:38.320 |
But in one piece of work that we were looking at, 00:35:52.720 |
what driving style the robot has, or something like that. 00:35:55.320 |
And what we're gonna do is we're gonna set up a system 00:35:58.240 |
where part of the state is the person's belief 00:36:10.760 |
And so they're updating their mental model of the robot. 00:36:13.760 |
So if they see a car that sort of cut someone off, 00:36:19.280 |
If they see sort of a robot head towards a particular door, 00:36:28.040 |
to try to understand their goals and intentions, 00:36:31.120 |
humans are inevitably gonna do that to robots. 00:36:34.560 |
And then that raises this interesting question 00:36:36.560 |
that you asked, which is, can we do something about that? 00:36:45.680 |
for being more informative and less confusing 00:36:50.920 |
of how your actions are being interpreted by the human, 00:36:53.640 |
how they're using these actions to update their belief. 00:36:56.680 |
And honestly, all we did is just Bayes' rule. 00:37:02.920 |
they see an action, they make some assumptions 00:37:06.360 |
presumably as being rational, 'cause robots are rational, 00:37:11.280 |
And then they incorporate that new piece of evidence, 00:37:31.200 |
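Since the observer model is "just Bayes' rule," here is a minimal sketch with two hypothetical goals and a made-up likelihood for how well an action points at each goal; a legibility-style planner would then pick the action whose posterior puts the most probability on the robot's true goal.

```python
import numpy as np

# A minimal sketch of the Bayes-rule observer model described here,
# with two hypothetical robot goals (door A or door B) and a human who
# assumes the robot acts noisily-rationally toward its goal.

goals = ["door_A", "door_B"]
prior = np.array([0.5, 0.5])

def action_likelihood(action_angle, goal):
    # Hypothetical observation model: actions that point more toward a
    # goal are more likely under that goal.
    goal_angle = {"door_A": 0.0, "door_B": np.pi / 2}[goal]
    return np.exp(-2.0 * (action_angle - goal_angle) ** 2)

def human_belief_update(prior, action_angle):
    """How the human observer updates P(goal | action) via Bayes' rule."""
    lik = np.array([action_likelihood(action_angle, g) for g in goals])
    post = prior * lik
    return post / post.sum()

# The robot can exploit this model: compare an ambiguous motion with an
# exaggerated, more informative one and see which leaves less confusion.
for angle in [np.pi / 4, 0.1]:               # ambiguous vs. clearly toward door_A
    print(angle, human_belief_update(prior, angle))
```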
- So that's kind of a mathematical formalization of that. 00:37:43.720 |
The kids talking to Alexa disrespectfully worries me. 00:37:52.200 |
I guess I grew up in the Soviet Union, World War II, 00:37:54.800 |
I'm a Jew too, so with the Holocaust and everything. 00:37:58.160 |
I just worry about how we humans sometimes treat the other, 00:38:01.720 |
the group that we call the other, whatever it is. 00:38:05.080 |
Through human history, the group that's the other 00:38:09.560 |
But it seems like the robot will be the other, 00:38:23.440 |
- Shoved around. And, at least at a shallow level, 00:38:28.440 |
it seems that robots need to talk back a little bit. 00:38:31.560 |
Like my intuition says, I mean, most companies 00:38:35.480 |
from sort of Roomba, autonomous vehicle companies 00:38:40.360 |
that a robot has a little bit of an attitude. 00:38:48.280 |
Like we humans don't seem to respect anything 00:38:53.000 |
- That, or like a mix of mystery and attitude and anger 00:38:58.000 |
and that threatens us subtly, maybe passive aggressively. 00:39:03.960 |
I don't know, it seems like we humans, yeah, need that. 00:39:13.920 |
- One is, one is, it's, we respond to, you know, 00:39:41.720 |
a little more expressive, a little bit more like, 00:39:45.480 |
that wasn't cool to do and now I'm bummed, right? 00:39:51.720 |
'cause people can't help but anthropomorphize 00:39:54.440 |
Even that though, the emotion being communicated 00:40:02.080 |
We're still interpreting, you know, we watch, 00:40:07.320 |
with little triangles and kind of dots on a screen 00:40:13.080 |
and you get really angry at the darn triangle 00:40:16.080 |
'cause why is it not leaving the square alone? 00:40:21.520 |
- The vulnerability, that's really interesting. 00:40:29.760 |
being assertive as the only mechanism of getting, 00:40:37.960 |
Perhaps there are other mechanisms that are less threatening. 00:40:43.960 |
But then this other thing that we can think about is, 00:40:48.360 |
that interaction is really game theoretic, right? 00:40:50.600 |
So the moment you're taking actions in a space, 00:40:52.760 |
the humans are taking actions in that same space, 00:40:55.360 |
but you have your own objective, which is, you know, 00:41:00.840 |
And then the human nearby has their own objective, 00:41:03.680 |
which somewhat overlaps with you, but not entirely. 00:41:06.640 |
You're not interested in getting into an accident 00:41:09.160 |
with each other, but you have different destinations 00:41:20.520 |
treating it as such as kind of a way we can step outside 00:41:32.200 |
and you don't realize you have any influence over it, 00:41:37.200 |
because you're understanding that people also understand 00:41:46.680 |
really talking about different equilibria of a game. 00:41:53.160 |
is to just make predictions about what people will do 00:41:57.800 |
And that's hard for the reasons we talked about, 00:41:59.880 |
which is how you have to understand people's intentions, 00:42:05.320 |
but somehow you have to get enough of an understanding 00:42:07.160 |
of that to be able to anticipate what happens next. 00:42:13.600 |
that people change what they do based on what you do, 00:42:17.320 |
'cause they don't plan in isolation either, right? 00:42:20.960 |
So when you see cars trying to merge on a highway 00:42:24.720 |
and not succeeding, one of the reasons this can be 00:42:27.640 |
is because they look at traffic that keeps coming, 00:42:32.640 |
they predict what these people are planning on doing, 00:42:52.840 |
"No, no, no, actually, these people change what they do 00:42:59.560 |
Like if the car actually tries to inch itself forward, 00:43:03.400 |
they might actually slow down and let the car in. 00:43:13.360 |
We call this like this underactuated system idea 00:43:16.040 |
where it's like an underactuated system in robotics, 00:43:18.480 |
but you're influencing these other degrees of freedom, 00:43:18.480 |
the human element in this picture as underactuated. 00:43:46.320 |
- Yeah, it's a very simple way of underactuation 00:43:48.800 |
where basically there's literally these degrees of freedom 00:43:59.440 |
that what you do influences what they end up doing. 00:44:14.280 |
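A minimal sketch of this underactuated framing, with a made-up human response model and utilities: the robot plans only over its own action, but evaluates it through the human action it is predicted to induce, which is the "degrees of freedom you influence but don't directly control" idea.

```python
# The robot cannot directly set the human's action, but its own action
# changes what the human is predicted to do.  The response model and
# utilities below are made up for illustration.

robot_actions = ["wait", "inch_forward"]

def predicted_human_action(robot_action):
    # Hypothetical human response model: a driver who sees the robot
    # inching forward is predicted to yield; otherwise they keep speed.
    return "yield" if robot_action == "inch_forward" else "keep_speed"

def robot_utility(robot_action, human_action):
    # Hypothetical utility: the robot wants to merge, which only works
    # if the human yields; waiting forever is mildly costly.
    if robot_action == "inch_forward" and human_action == "yield":
        return 1.0
    if robot_action == "wait":
        return -0.1
    return -1.0   # inching forward into someone who doesn't yield

best = max(robot_actions,
           key=lambda a: robot_utility(a, predicted_human_action(a)))
print(best)   # "inch_forward": the robot's own action steers the human's
```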
I think about this a lot in the case of pedestrians. 00:44:27.240 |
I learn a lot about myself, about our human behavior 00:44:37.880 |
is like you're putting your life on the line. 00:44:40.320 |
I don't know, tens of millions of time in America every day 00:44:44.520 |
is people are just like playing this weird game of chicken 00:44:54.280 |
That has to do either with the rules of the road 00:44:56.640 |
or with the general personality of the intersection 00:45:10.280 |
Somebody, a runner, gave me this advice. 00:45:10.280 |
And he said that if you don't make eye contact with people 00:45:22.240 |
when you're running, they will all move out of your way. 00:45:29.200 |
Oh, wow, I need to look this up, but it works. 00:45:32.880 |
My sense was if you communicate confidence in your actions 00:45:47.200 |
as opposed to nudging where you're sort of hesitant. 00:45:49.860 |
The hesitation might communicate that you're still 00:45:55.120 |
in the dance, in the game that they can influence 00:45:59.480 |
I've recently had a conversation with Jim Keller, 00:46:03.240 |
who's a sort of this legendary chip architect, 00:46:08.240 |
but he also led the autopilot team for a while. 00:46:12.280 |
And his intuition is that driving is fundamentally still 00:46:12.280 |
And you can kind of learn the right dynamics required 00:46:27.160 |
to do the merger and all those kinds of things. 00:46:29.840 |
And then my sense is, and I don't know if I can provide 00:46:41.540 |
Like it's not simply object collision avoidance problem. 00:46:49.240 |
of course, nobody knows the right answer here, 00:46:51.020 |
but where does your intuition fall on the difficulty, 00:46:54.360 |
fundamental difficulty of the driving problem, 00:47:12.840 |
No pedestrians, no human driven vehicles, no cyclists, 00:47:16.800 |
no people on little electric scooters zipping around, 00:47:25.040 |
There's nothing really that still needs to be solved 00:47:37.440 |
But we need to sort of internalize that idea. 00:47:42.960 |
'Cause we may not quite yet be done with that. 00:47:48.240 |
A lot of people kind of map autonomous driving 00:48:01.560 |
Do you see that as a, how hard is that problem? 00:48:06.160 |
So your intuition there behind your statement was, 00:48:16.760 |
I mean, and by the way, a bunch of years ago, 00:48:29.380 |
But I think it's fairly safe to say that at this point, 00:48:33.800 |
although you could always improve on things and all of that, 00:48:46.920 |
we've made a lot of progress on the perception side 00:48:46.920 |
and I don't undermine the difficulty of the problem. 00:48:54.520 |
I think that the planning problem, the control problem, 00:48:58.440 |
all very difficult, but I think what makes it really- 00:49:11.600 |
now it's no longer snowing, now it's slippery in this way, 00:49:14.120 |
now it's the dynamics part, I could imagine being, 00:49:35.300 |
it's not actually, it may not be a good example because- 00:49:47.820 |
To me, what feels dangerous is highway speeds 00:49:51.020 |
when everything is, to us humans, super clear. 00:49:57.060 |
I think it's kind of irresponsible to not use LIDAR. 00:50:04.580 |
but I think if you have the opportunity to use LIDAR, 00:50:08.740 |
well, good, and in a lot of cases you might not. 00:50:21.500 |
there's a lot of, how many cameras do you have? 00:50:28.420 |
I imagine there's stuff that's really hard to actually see. 00:50:37.740 |
I think more of my intuition comes from systems 00:50:37.740 |
I also sympathize with the Elon Musk statement 00:50:57.380 |
It's a fun notion to think that the things that work today 00:51:17.300 |
You see this in academic and research settings all the time. 00:51:19.900 |
The things that work force you to not explore outside, 00:51:26.780 |
The problem is in the safety critical systems, 00:51:29.020 |
you kinda wanna stick with the things that work. 00:51:32.060 |
So it's an interesting and difficult trade-off 00:51:47.140 |
- How, I mean, how hard is this human element? 00:52:00.020 |
But perhaps actually the year isn't the thing I'm asking. 00:52:11.660 |
in solving the human-robot interaction problem 00:52:27.020 |
And on top of that, playing the game is hard. 00:52:35.260 |
some of the fundamental understanding for that. 00:52:49.300 |
a few companies that don't have a driver in the car 00:53:06.620 |
But there's incredible engineering work being done there. 00:53:13.180 |
it sounds silly, but to be able to drive without a, 00:53:15.580 |
without a ride, sorry, without a driver in the seat. 00:53:27.820 |
without being able to take the steering wheel. 00:53:42.460 |
I mean, it felt fast because you're like freaking out. 00:53:51.180 |
And there's humans and it deals with them quite well. 00:53:53.820 |
It detects them, it negotiates the intersections, 00:53:58.180 |
So at least in those domains, it's solving them. 00:54:11.060 |
how quickly can we expand to like cities like San Francisco? 00:54:14.580 |
- Yeah, and I wouldn't say that it's just, you know, 00:54:17.140 |
now it's just pure engineering and it's probably the, 00:54:22.060 |
I'm speaking kind of very generally here as hypothesizing, 00:54:34.380 |
So that seems to suggest that things can be expanded 00:54:38.860 |
and can be scaled and we know how to do a lot of things, 00:54:49.220 |
as you learn more and more about new challenges 00:54:55.780 |
- How much of this problem do you think can be learned 00:55:02.740 |
how much of it can be learned from sort of data from scratch 00:55:08.460 |
of autonomous vehicle systems have a lot of heuristics 00:55:40.420 |
as it's a bunch of rules that some people wrote down 00:55:43.660 |
versus it's an end-to-end RL system or imitation learning, 00:55:57.180 |
So for instance, I think a very, very useful tool 00:56:11.860 |
is actually planning, search optimization, right? 00:56:15.140 |
So robotics is a sequential decision-making problem. 00:56:26.460 |
how to achieve its goal without hitting stuff 00:56:38.220 |
There's nothing rule-based around that, right? 00:56:42.060 |
and figuring out, or you're optimizing through a space 00:56:43.780 |
and figure out what seems to be the right thing to do. 00:56:49.940 |
because you need to learn models of the world. 00:56:52.580 |
And I think it's hard to just do the learning part 00:56:58.820 |
because then you're saying, well, I could do imitation, 00:57:01.780 |
but then when I go off distribution, I'm really screwed. 00:57:04.700 |
Or you can say, I can do reinforcement learning, 00:57:09.900 |
but then you have to do either reinforcement learning 00:57:12.740 |
in the real world, which sounds a little challenging 00:57:18.460 |
or you have to do reinforcement learning in simulation. 00:57:23.140 |
You need to model things, at least to model people, 00:57:30.180 |
whatever policy you get out of that is actually fine 00:57:30.180 |
It seems like humans, everything we've been talking about 00:57:51.420 |
Do you think simulation has a role in this space? 00:57:58.860 |
and train with them ahead of time, for instance. 00:58:07.700 |
the models are sort of human constructed or learned? 00:58:31.660 |
and you don't assume anything else and you just say, okay, 00:58:39.220 |
Let me fit a policy to how people work based on that. 00:58:39.220 |
What tends to happen is you collected some data 00:58:58.580 |
where that model that you've built of the human 00:59:01.020 |
completely sucks because out of distribution, 00:59:07.860 |
and then you take only the ones that are consistent 00:59:14.460 |
a lot of things could happen outside of that distribution 00:59:17.580 |
where you're confident and you know what's going on. 00:59:30.820 |
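One simple way to act on that out-of-distribution worry, sketched with made-up data and thresholds: score how far a queried state is from what the learned human model was trained on, and stop trusting the point prediction when the model is extrapolating.

```python
import numpy as np

# Guarding a learned human model against out-of-distribution queries:
# score how far a new state is from the training data and fall back to a
# conservative prediction when the model is extrapolating.

rng = np.random.default_rng(0)
train_states = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # states seen at training time

def novelty(state, k=10):
    """Average distance to the k nearest training states (a crude OOD score)."""
    d = np.linalg.norm(train_states - state, axis=1)
    return np.sort(d)[:k].mean()

def predict_human(state, learned_model, threshold=1.0):
    if novelty(state) > threshold:
        return "uncertain: fall back to a conservative, multi-modal prediction"
    return learned_model(state)

learned_model = lambda s: "keep going straight"                 # stand-in for a fitted policy
print(predict_human(np.array([0.1, -0.2]), learned_model))     # in-distribution
print(predict_human(np.array([8.0, 8.0]), learned_model))      # far from the data
```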
so distribution is referring to the data that you've seen. 00:59:38.100 |
- They've encountered so far at training time. 00:59:40.740 |
But it kind of also implies that there's a nice, 00:59:44.020 |
like statistical model that represents that data. 00:59:47.460 |
So out of distribution feels like, I don't know, 01:00:03.300 |
- And so, and what we're talking about here is 01:00:14.940 |
I can anticipate what will happen in situations 01:00:24.140 |
but, and I might be a little uncertain and so on. 01:00:26.580 |
I think it's this that if you just rely on data, 01:00:36.060 |
there's too many policies out there that fit the data. 01:00:40.700 |
to really be able to anticipate what the person will do. 01:00:43.020 |
It kind of depends on what they've been doing so far, 01:00:46.980 |
to kind of, at least implicitly, sort of say, 01:00:53.020 |
So anyway, it's like you're trying to map history of states 01:00:56.660 |
- And history meaning like the last few seconds 01:00:59.860 |
or the last few minutes or the last few months. 01:01:02.540 |
- Who knows, who knows how much you need, right? 01:01:06.500 |
the positions of everything or whatnot and velocities. 01:01:21.420 |
So there's all very related things to think about it. 01:01:24.540 |
Basically, what are assumptions that we should be making 01:01:34.580 |
And now you're talking about, well, I don't know, 01:01:38.660 |
Maybe you can assume that people actually have intentions 01:01:58.180 |
common sense reasoning, whatever the heck that means. 01:02:01.180 |
Do you think something like common sense reasoning 01:02:04.980 |
has to be solved in part to be able to solve this dance 01:02:09.060 |
of human-robot interaction in the driving space 01:02:14.980 |
Do you have to be able to reason about these kinds 01:02:21.900 |
of all the things we've been talking about humans, 01:02:27.140 |
I don't even know how to express them with words, 01:02:30.580 |
but the basics of human behavior, of fear of death. 01:02:38.020 |
in some kind of sense, maybe not, maybe it's implicit, 01:02:41.860 |
but it feels it's important to explicitly encode 01:02:44.700 |
the fear of death, that people don't wanna die. 01:02:48.160 |
Because it seems silly, but the game of chicken 01:02:54.200 |
that involves with the pedestrian crossing the street 01:03:06.080 |
I don't know, it just feels like all these human concepts 01:03:11.140 |
Do you share that sense or is this a lot simpler 01:03:17.060 |
And I'm the person who likes to complicate things. 01:03:45.100 |
you automatically capture that they have an incentive 01:04:05.580 |
as having these objectives, these incentives, 01:05:02.860 |
- Let me ask sort of another small side of this, 01:05:09.940 |
but there's also relatively successful systems 01:05:23.380 |
work quite a bit with the Cadillac Super Cruise system, 01:05:23.380 |
There's a bunch of basically lane centering systems. 01:05:39.740 |
of dealing with the human robot interaction problem 01:05:45.260 |
and relying on the human to help the robot out 01:06:08.060 |
- I think what we have to be careful about there 01:06:12.100 |
is to not, it seems like some of these systems, 01:06:16.180 |
not all, are making this underlying assumption 01:06:23.780 |
and I'm now really not driving, but supervising 01:06:28.860 |
And so we have to be careful with this assumption 01:06:52.140 |
And I think I'm concerned about this assumption 01:06:58.380 |
it's that when you let something kind of take control 01:07:01.340 |
and do its thing, and it depends on what that thing is, 01:07:07.860 |
But if you let it do its thing and take control, 01:07:18.380 |
find themselves in if they were the ones driving. 01:07:24.020 |
just as well there as they function in the states 01:07:29.980 |
Now, another part is the kind of the human factors 01:07:34.020 |
side of this, which is that, I don't know about you, 01:07:38.260 |
but I think I definitely feel like I'm experiencing things 01:07:42.060 |
very differently when I'm actively engaged in the task 01:07:55.420 |
like you see students who are actively trying 01:07:58.300 |
to come up with the answer, learn the thing better 01:08:14.220 |
a huge amount of heat on this and I stand by it. 01:08:17.860 |
- 'Cause I know the human factors community well. 01:08:28.220 |
Nevertheless, I've been continuously surprised 01:08:40.300 |
but we have to be a little bit more open-minded. 01:08:45.300 |
So I'll tell you, there's a few surprising things 01:08:49.460 |
that super, like everything you said to the word 01:09:02.460 |
but we don't know if these systems are fundamentally unsafe. 01:09:11.060 |
Like I'm surprised by the fact, not the fact, 01:09:21.180 |
but also from just talking to a lot of people, 01:09:23.980 |
when in the supervisory role of semi-autonomous systems 01:09:35.200 |
The people are actually more energized as observers. 01:09:50.900 |
will do a better job with the system together. 01:09:56.780 |
I guess mainly I'm pointing out that if you do it naively, 01:10:02.180 |
that assumption might actually really be wrong. 01:10:04.480 |
But I do think that if you explicitly think about 01:10:20.260 |
so you want to empower them to be so much better 01:10:27.020 |
And that's different, it's a very different mindset 01:10:39.420 |
- So one of the interesting things we've been talking about 01:10:42.340 |
is the rewards, that they seem to be fundamental 01:10:59.620 |
Like how do we come up with good reward functions? 01:11:20.540 |
because it's really supposed to be what the people want, 01:11:35.060 |
even if you take the interactive component away, 01:11:37.980 |
it's still really hard to design reward functions. 01:11:43.740 |
I mean, if we assume this sort of AI paradigm 01:11:59.460 |
or maybe it's a set depending on the situation, 01:12:02.280 |
if you write it out and then you deploy the agent, 01:12:06.900 |
you'd want to make sure that whatever you specified 01:12:10.180 |
incentivizes the behavior you want from the agent 01:12:14.760 |
in any situation that the agent will be faced with, right? 01:12:24.200 |
like, you know, this is how far away you should try to stay, 01:12:30.680 |
to be able to be efficient, and blah, blah, blah, right? 01:12:33.860 |
I need to make sure that whatever I specified, 01:12:36.480 |
those constraints or trade-offs or whatever they are, 01:12:40.080 |
that when the robot goes and solves that problem 01:12:45.020 |
that behavior is the behavior that I want to see. 01:12:58.100 |
that I think are representative of what the robot will face, 01:13:01.100 |
and I can tune and add and tune some reward function 01:13:15.720 |
because, you know, through the miracle of AI, 01:13:18.960 |
we don't have to specify rules for behavior anymore, right? 01:13:24.440 |
the robot comes up with the right thing to do, 01:13:28.440 |
it optimizes, right, in that situation, it optimizes, 01:13:38.900 |
making sure you didn't forget about 50 bazillion things 01:13:42.280 |
and how they all should be combining together 01:14:09.540 |
and the like designing of features or whatever 01:14:19.680 |
And yes, I agree that there's way less of it, 01:14:35.140 |
- So you're kind of referring to unintended consequences 01:14:46.480 |
- Suboptimal behavior that is, you know, actually optimal. 01:14:49.680 |
I mean, this, I guess the idea of unintended consequences, 01:14:51.600 |
you know, it's optimal with respect to what you specified, 01:14:57.520 |
- But that's not fundamentally a robotics problem, right? 01:15:05.260 |
which is you set a metric for an organization 01:15:27.380 |
failing to think ahead of time of all the possible things. 01:15:32.380 |
All the possible things that might be important. 01:15:41.560 |
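To make the "forgot about 50 bazillion things" failure concrete, here is a minimal sketch with made-up features, weights, and behaviors: the specified reward omits one feature the designer actually cares about, and the optimizer exploits exactly that gap.

```python
import numpy as np

# The reward-design trap described above, with entirely made-up features
# and candidate behaviors.  The specified reward forgets a "jerkiness"
# term, so the optimizer happily picks a behavior the designer never
# intended.

# Columns: [progress, collision_risk, jerkiness]
behaviors = {
    "smooth_and_safe":  np.array([0.8, 0.05, 0.1]),
    "fast_but_jerky":   np.array([1.0, 0.05, 0.9]),
}

specified_w = np.array([1.0, -10.0, 0.0])    # designer forgot to penalize jerkiness
intended_w  = np.array([1.0, -10.0, -1.0])   # what they actually wanted

def best(weights):
    return max(behaviors, key=lambda b: weights @ behaviors[b])

print("optimizing the specified reward picks:", best(specified_w))   # fast_but_jerky
print("optimizing the intended reward picks: ", best(intended_w))    # smooth_and_safe
```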
from the perspective of customizing to the end user, 01:15:44.020 |
but it really seems like it's not just the interaction 01:15:48.040 |
with the end user that's a problem of the human 01:15:50.880 |
and the robot collaborating so that the robot 01:15:55.160 |
This kind of back and forth, the robot probing, 01:15:57.280 |
the person being informative, all of that stuff 01:16:04.420 |
to this kind of maybe new form of human robot interaction, 01:16:10.780 |
and the expert programmer, roboticist, designer 01:16:28.100 |
What does it, when we think about the problem, 01:16:29.860 |
not as someone specifies all of your job is to optimize 01:16:34.460 |
and we start thinking about you're in this interaction 01:16:37.660 |
and this collaboration, and the first thing that comes up 01:16:45.020 |
it's not gospel, it's not like the letter of the law, 01:16:48.780 |
it's not the definition of the reward function 01:16:52.140 |
you should be optimizing, 'cause they're doing their best, 01:16:58.780 |
I think the sooner we'll get to more robust robots 01:17:02.460 |
that function better in different situations. 01:17:12.780 |
over putting too much weight on the reward specified 01:17:16.860 |
by definition, and maybe leaving a lot of other information 01:17:21.220 |
on the table, like what are other things we could do 01:17:27.460 |
besides attempting to specify a reward function. 01:17:32.180 |
I love the poetry of it, of leaked information. 01:17:34.860 |
You mentioned humans leak information about what they want, 01:17:55.260 |
and it's gonna stick with me for a while for some reason, 01:18:00.980 |
it kind of leaks indirectly from our behavior. 01:18:06.180 |
So I think maybe some surprising bits, right? 01:18:11.180 |
So we were talking before about, I'm a robot arm, 01:18:20.580 |
And now imagine that the robot has some initial objective 01:18:28.980 |
so they can do all these things functionally, 01:18:35.820 |
and maybe it's coming too close to me, right? 01:18:39.500 |
And maybe I'm the designer, maybe I'm the end user 01:18:52.420 |
And this is what we call physical human-robot interaction. 01:19:01.300 |
What should the robot do if such an event occurs? 01:19:03.580 |
And there's sort of different schools of thought. 01:19:05.020 |
Well, you can sort of treat it the control theoretic way 01:19:08.100 |
and say, this is a disturbance that you must reject. 01:19:11.220 |
You can sort of treat it more kind of heuristically 01:19:19.780 |
I'm gonna go in the direction that the person pushed me. 01:19:27.260 |
that that is signal that communicates about the reward 01:19:30.500 |
because if my robot was moving in an optimal way 01:19:40.260 |
Whatever it thinks is optimal is not actually optimal. 01:20:08.460 |
But they could have disengaged it for a million reasons. 01:20:16.860 |
can you structure a little bit your assumptions 01:20:20.460 |
about how human behavior relates to what they want? 01:20:26.100 |
is to literally just treat this external torque 01:20:26.100 |
and add it to the torque the robot was applying 01:20:33.020 |
with respect to whatever it is that the person wants. 01:20:39.700 |
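A rough sketch of the "correction as information about the reward" idea, using made-up features and a simple online update; the published methods derive the update from an observation model over corrections, which this toy version only gestures at.

```python
import numpy as np

# Treat the person's physical correction as evidence about the objective
# rather than as a disturbance.  Features and step size are made up; the
# spirit is an online update of reward weights in the direction the push
# suggests.

# Reward = w . features(trajectory); features: [efficiency, distance_to_human]
w = np.array([1.0, 0.0])                      # robot initially ignores distance to the human

planned_features   = np.array([0.9, 0.2])     # what the robot was about to do
corrected_features = np.array([0.8, 0.6])     # trajectory after the person pushed it away

# The correction made the trajectory slightly less efficient but much
# farther from the person -- interpret that as evidence about w.
alpha = 0.5                                   # hypothetical learning rate
w = w + alpha * (corrected_features - planned_features)
print(w)    # distance-to-human now gets positive weight: [0.95, 0.2]
```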
Now, you're right that there might be many things 01:20:51.420 |
and that you might need much more data than that 01:21:01.780 |
Not that we've done that in that context, just to clarify, 01:21:01.780 |
but it's definitely something we thought about 01:21:05.380 |
where you can have the robot start acting in a way, 01:21:10.380 |
like if there are a bunch of different explanations, right? 01:21:13.420 |
It moves in a way where it sees if you correct it 01:21:19.900 |
so that it can disambiguate and collect information 01:21:34.020 |
'cause the robot is about to do something bad. 01:21:42.540 |
that whatever it was about to do was not good. 01:21:46.740 |
that stopping and remaining stopped for a while 01:21:52.780 |
And that again is information about what are my preferences? 01:22:03.620 |
on the three laws of robotics from Isaac Asimov? 01:22:08.180 |
That don't harm humans, obey orders, protect yourself. 01:22:31.500 |
I know the three laws might be a silly notion, 01:22:35.580 |
what universal reward functions there might be 01:22:38.980 |
that we should enforce on the robots of the future? 01:22:47.060 |
And it doesn't, or is the mechanism that you just described, 01:22:52.700 |
it should be constantly adjusting kind of thing. 01:22:55.180 |
- I think it should constantly be adjusting kind of thing. 01:23:19.940 |
And you want these machines to do what you want, 01:23:24.620 |
so you don't want them to take you literally. 01:23:26.660 |
You wanna take what you say and interpret it in context. 01:23:31.660 |
And that's what we do with the specified rewards. 01:23:33.540 |
We don't take them literally anymore from the designer. 01:23:47.500 |
we sort of say, okay, the designer specified this thing, 01:23:57.180 |
that I shall always optimize, always and forever, 01:23:59.540 |
but rather as good evidence about what the person wants 01:23:59.540 |
'Cause ultimately that's what the designer thought about, 01:24:34.380 |
And then there's all these additional signals 01:24:36.380 |
we've been finding that it can kind of continually learn from 01:24:39.660 |
and adapt its understanding of what people want. 01:24:41.740 |
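A minimal sketch of treating the specified reward as evidence rather than gospel; everything here is hypothetical. Because the training environment never exercised the "side effect" feature, the robot stays uncertain about whether it matters, instead of concluding from the proxy that it doesn't.

```python
import numpy as np

# Two candidate "true" weight vectors, one training environment, and a
# simple model of how likely a designer is to have written down the
# given proxy reward if each candidate were what they really wanted.

# Features of the behavior that optimizing the PROXY reward produces in
# the training environment: [task_progress, side_effect]
proxy_behavior_train = np.array([1.0, 0.0])
candidates = {
    "cares_only_about_progress": np.array([1.0,  0.0]),
    "also_hates_side_effects":   np.array([1.0, -2.0]),
}

def designer_likelihood(true_w):
    # The designer wrote a proxy that works well in training; that is
    # likely under any true reward that scores the training behavior highly.
    return np.exp(true_w @ proxy_behavior_train)

post = {name: designer_likelihood(w) for name, w in candidates.items()}
z = sum(post.values())
post = {name: p / z for name, p in post.items()}
print(post)   # both candidates stay equally plausible -> stay uncertain,
              # and don't take the proxy literally in new situations
```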
Every time the person corrects it, maybe they demonstrate, 01:24:48.380 |
One really, really crazy one is the environment itself. 01:24:53.380 |
Like our world, you don't, it's not, you know, 01:25:03.580 |
and you're saying, oh, people are making decisions 01:25:07.020 |
But our world is something that we've been acting in, 01:25:07.020 |
So even though the robot doesn't see me doing this, 01:25:33.180 |
because there's no way for them to have magically, 01:25:35.860 |
you know, instantiated themselves in that way. 01:25:38.980 |
Someone must have actually taken the time to do that. 01:25:44.580 |
- So the environment actually tells, the environment-- 01:25:52.860 |
So you have to kind of reverse engineer the narrative 01:25:55.700 |
that happened to create the environment as it is 01:26:03.100 |
Because people don't have the bandwidth to do everything. 01:26:08.100 |
doesn't mean that I want it to be messy, right? 01:26:17.380 |
well, that's something else was more important, 01:26:21.620 |
So it's a little subtle, but yeah, we really think of it, 01:26:26.740 |
that people implicitly made about how they want their world. 01:26:31.980 |
- What book or books, technical or fiction or philosophical, 01:26:35.060 |
had a, when you like look back, your life had a big impact, 01:26:39.700 |
maybe it was a turning point, it was inspiring in some way. 01:26:45.780 |
that nobody in their right mind would want to read, 01:26:48.700 |
or maybe it's a book that you would recommend 01:26:52.660 |
or maybe those could be two different recommendations 01:27:01.700 |
- When I was in, it's kind of a personal story. 01:27:15.580 |
I didn't know anything about AI at that point. 01:27:17.620 |
I was, you know, I had watched the movie, "The Matrix." 01:27:30.140 |
and you know, you were asking in the beginning, 01:27:32.100 |
what are, you know, it's math and it's algorithms, 01:27:44.580 |
through a kind of a messy, complicated situation, 01:27:48.460 |
sort of what sequence of decisions you should make 01:27:55.900 |
I'm, you know, I'm biased, but that's a cool book. 01:28:02.500 |
the goal of the process of intelligence and mechanize it. 01:28:15.900 |
- That's how you can write math about human behavior, right? 01:28:18.860 |
Yeah, so that's, and I think that stuck with me 01:28:28.820 |
combine it with data and learning, put it all together, 01:28:32.940 |
and, you know, hope that instead of writing rules 01:28:36.140 |
for the robots, writing heuristics, designing behavior, 01:28:44.060 |
That's kind of our, you know, that's our signature move. 01:28:47.380 |
and then instead of kind of hand crafting this 01:28:49.580 |
and that and that, the robot figured stuff out 01:28:53.580 |
And I think that is the same enthusiasm that I got 01:28:56.260 |
from the robot figured out how to reach that goal 01:29:00.820 |
- So, apologize for the romanticized questions, 01:29:10.180 |
sort of emphasizing the finiteness of our existence, 01:29:19.620 |
- It's like my biggest nightmare, by the way. 01:29:24.700 |
So, I'm actually, I really don't like the idea 01:29:41.480 |
Do you think of it as a feature or a bug too? 01:29:44.460 |
Is it, you said you don't like the idea of dying, 01:29:47.520 |
but if I were to give you a choice of living forever, 01:29:59.340 |
And the moral of the story is that you have to make 01:30:05.380 |
'cause otherwise people just kind of, it's like WALL-E. 01:30:08.020 |
It's like, ah, I'm sorry, I'm gonna lie around. 01:30:30.360 |
Yeah, it's just, I think that's the scary part. 01:30:35.340 |
- I still think that we like existing so much 01:30:44.140 |
I find almost everything about this life beautiful. 01:30:46.900 |
The silliest, most mundane things are just beautiful. 01:30:59.740 |
I also feel like there's a lesson in there for robotics 01:31:08.700 |
the finiteness of things seems to be a fundamental nature 01:31:29.420 |
But anyway, if you were, speaking of reward functions, 01:31:45.300 |
'cause I don't know that I would change much. 01:31:59.140 |
Maybe I'll take more trips to the Caribbean or something, 01:32:04.300 |
but I tried to solve that already from time to time. 01:32:08.580 |
So yeah, I mean, I try to do the things that bring me joy 01:32:20.180 |
For the most part, I do things that spark joy. 01:32:31.660 |
But no, I mean, I think I have amazing colleagues 01:32:35.860 |
and amazing students and amazing family and friends 01:32:44.820 |
So I don't know that I would really change anything. 01:32:50.860 |
what small act of kindness, if one pops to mind, 01:32:53.940 |
were you once shown that you will never forget? 01:33:08.660 |
We were gearing up for our baccalaureate exam 01:33:15.940 |
I was comfortable enough with some of those subjects, 01:33:19.580 |
but physics was something that I hadn't focused on 01:33:23.020 |
And so they were all working with this one teacher. 01:33:39.740 |
because she sort of told me that I should take the SATs 01:33:55.540 |
I couldn't, my parents couldn't really afford 01:34:04.100 |
to kind of train me for SATs and all that jazz 01:34:12.020 |
And obviously that has taken you to here today, 01:34:16.260 |
also to one of the world experts in robotics. 01:34:20.020 |
- Yeah, people do it via small or large acts of kindness. 01:34:34.540 |
Let me talk about the most ridiculous big question. 01:34:40.100 |
What's the reward function under which we humans operate? 01:34:48.900 |
What gives life fulfillment, purpose, happiness, meaning? 01:34:55.580 |
- You can't even ask that question with a straight face. 01:35:05.820 |
- You're gonna try to answer it anyway, aren't you? 01:35:18.180 |
and this whole like you're a speck of dust kind of thing. 01:35:20.780 |
I think I was conceptualizing that we're kind of, 01:35:26.660 |
We don't matter much in the grand scheme of things. 01:35:32.060 |
'cause they talked about this multiverse theory 01:35:40.380 |
and it's just these pop in and out of existence. 01:35:42.500 |
So like our whole thing that we can't even fathom 01:35:45.580 |
how big it is was like a blip that went in and out. 01:35:48.820 |
And at that point I was like, okay, I'm done. 01:35:56.340 |
is try to impact whatever local thing we can impact. 01:35:59.900 |
Our communities leave a little bit behind there, 01:36:02.260 |
our friends, our family, our local communities 01:36:09.300 |
'Cause I just, everything beyond that seems ridiculous. 01:36:14.220 |
like how do you make sense of these multiverses? 01:36:16.540 |
Like are you inspired by the immensity of it? 01:36:29.420 |
or is it almost paralyzing in the mystery of it? 01:36:36.100 |
I'm frustrated by my inability to comprehend. 01:36:43.980 |
It's like, there's some stuff that we should time, 01:36:47.020 |
blah, blah, blah, that we should really be understanding. 01:37:08.220 |
- Well, that's one of the dreams of artificial intelligence 01:37:13.140 |
expand our cognitive capacity in order to understand, 01:37:16.060 |
build the theory of everything with the physics 01:37:19.900 |
and understand what the heck these multiverses are. 01:37:48.300 |
and thank you to our presenting sponsor, Cash App. 01:37:52.900 |
by downloading Cash App and using code LEXPODCAST. 01:37:56.980 |
If you enjoy this podcast, subscribe on YouTube, 01:38:03.320 |
or simply connect with me on Twitter @lexfridman. 01:38:11.460 |
"Your assumptions are your windows in the world. 01:38:19.860 |
Thank you for listening, and hope to see you next time.