Anca Dragan: Human-Robot Interaction and Reward Engineering

00:00:00.000 | The following is a conversation with Anka Jogan,

00:00:03.900 | a professor at Berkeley working on human robot interaction,

00:00:08.160 | algorithms that look beyond the robot's function

00:00:10.800 | in isolation and generate robot behavior

00:00:13.920 | that accounts for interaction

00:00:15.960 | and coordination with human beings.

00:00:18.080 | She also consults at Waymo, the autonomous vehicle company,

00:00:22.360 | but in this conversation,

00:00:23.580 | she's 100% wearing her Berkeley hat.

00:00:27.140 | She is one of the most brilliant and fun roboticists

00:00:30.600 | in the world to talk with.

00:00:32.520 | I had a tough and crazy day

00:00:34.840 | leading up to this conversation,

00:00:36.320 | so I was a bit tired, even more so than usual,

00:00:41.320 | but almost immediately as she walked in,

00:00:44.160 | her energy, passion, and excitement

00:00:46.320 | for human robot interaction was contagious.

00:00:48.880 | So I had a lot of fun and really enjoyed this conversation.

00:00:52.840 | This is the Artificial Intelligence Podcast.

00:00:55.520 | If you enjoy it, subscribe on YouTube,

00:00:57.880 | review it with five stars on Apple Podcast,

00:01:00.320 | support it on Patreon,

00:01:01.680 | or simply connect with me on Twitter at Lex Friedman,

00:01:05.160 | spelled F-R-I-D-M-A-N.

00:01:08.160 | As usual, I'll do one or two minutes of ads now

00:01:11.000 | and never any ads in the middle

00:01:12.560 | that can break the flow of the conversation.

00:01:14.800 | I hope that works for you

00:01:16.240 | and doesn't hurt the listening experience.

00:01:18.580 | This show is presented by Cash App,

00:01:22.720 | the number one finance app in the App Store.

00:01:25.560 | When you get it, use code LEXPODCAST.

00:01:29.360 | Cash App lets you send money to friends,

00:01:31.400 | buy Bitcoin, and invest in the stock market

00:01:33.920 | with as little as $1.

00:01:35.360 | Since Cash App does fractional share trading,

00:01:39.240 | let me mention that the order execution algorithm

00:01:41.740 | that works behind the scenes

00:01:43.400 | to create the abstraction of fractional orders

00:01:46.000 | is an algorithmic marvel.

00:01:48.200 | So big props to the Cash App engineers

00:01:50.540 | for solving a hard problem

00:01:52.240 | that in the end provides an easy interface

00:01:54.760 | that takes a step up to the next layer of abstraction

00:01:57.860 | over the stock market,

00:01:59.360 | making trading more accessible for new investors

00:02:02.100 | and diversification much easier.

00:02:04.860 | So again, if you get Cash App

00:02:07.560 | from the App Store or Google Play

00:02:09.200 | and use the code LEXPODCAST, you get $10,

00:02:13.160 | and Cash App will also donate $10 to FIRST,

00:02:15.920 | an organization that is helping to advance robotics

00:02:18.560 | and STEM education for young people around the world.

00:02:22.340 | And now here's my conversation with Enka Drogon.

00:02:25.920 | When did you first fall in love with robotics?

00:02:29.900 | - I think it was a very gradual process,

00:02:34.220 | and it was somewhat accidental actually,

00:02:37.060 | because I first started getting into programming

00:02:41.220 | when I was a kid and then into math,

00:02:43.220 | and then into, I decided computer science

00:02:46.320 | was the thing I was going to do,

00:02:47.900 | and then in college I got into AI,

00:02:50.220 | and then I applied to the Robotics Institute

00:02:52.540 | at Carnegie Mellon,

00:02:54.340 | and I was coming from this little school in Germany

00:02:57.180 | that nobody had heard of,

00:02:59.020 | but I had spent an exchange semester at Carnegie Mellon,

00:03:01.840 | so I had letters from Carnegie Mellon.

00:03:04.060 | So that was the only, you know, MIT said no,

00:03:06.920 | Berkeley said no, Stanford said no.

00:03:09.220 | That was the only place I got into,

00:03:11.140 | so I went there to the Robotics Institute,

00:03:13.220 | and I thought that robotics is a really cool way

00:03:16.300 | to actually apply the stuff that I knew and loved,

00:03:20.040 | like optimization.

00:03:21.700 | So that's how I got into robotics.

00:03:23.220 | I have a better story how I got into cars,

00:03:25.780 | which is I used to do mostly manipulation in my PhD,

00:03:30.780 | but now I do kind of a bit of everything application-wise,

00:03:34.780 | including cars, and I got into cars

00:03:38.940 | because I was here in Berkeley

00:03:42.140 | while I was a PhD student still for RSS 2014.

00:03:46.380 | Peter Biel organized it,

00:03:48.220 | and he arranged for, it was Google at the time,

00:03:51.440 | to give us rides in self-driving cars,

00:03:54.280 | and I was in a robot,

00:03:56.400 | and it was just making decision after decision the right call

00:04:01.400 | and it was so amazing.

00:04:03.400 | So it was a whole different experience, right?

00:04:05.560 | Just, I mean, manipulation is so hard,

00:04:07.020 | you can't do anything, and there it was.

00:04:08.680 | - Was it the most magical robot you've ever met?

00:04:11.200 | So like, for me, too, me and Google self-driving car,

00:04:14.920 | for the first time was like a transformative moment.

00:04:18.460 | Like I had two moments like that,

00:04:19.960 | that and Spot Mini, I don't know if you met Spot Mini

00:04:22.480 | from Boston Dynamics.

00:04:24.160 | I felt like I fell in love or something,

00:04:27.260 | 'cause I know how Spot Mini works, right?

00:04:30.840 | It's just, I mean, there's nothing truly special.

00:04:34.000 | It's great engineering work,

00:04:35.800 | but the anthropomorphism that went on into my brain,

00:04:40.280 | that came to life.

00:04:41.440 | Like it had a little arm and it like, and looked at me.

00:04:45.880 | He, she looked at me.

00:04:47.120 | You know, I don't know,

00:04:47.960 | there's a magical connection there.

00:04:48.960 | And it made me realize, wow, robots can be so much more

00:04:52.480 | than things that manipulate objects.

00:04:54.240 | They can be things that have a human connection.

00:04:56.920 | Do you have, was the self-driving car the moment,

00:05:00.440 | like, was there a robot that truly sort of inspired you?

00:05:03.880 | - That was, I remember that experience very viscerally,

00:05:08.240 | riding in that car and being just wowed.

00:05:11.280 | I had the, they gave us a sticker that said,

00:05:16.040 | I rode in a self-driving car

00:05:17.520 | and it had this cute little firefly on it.

00:05:20.440 | And, or a logo or something.

00:05:21.280 | - Oh, that was like the smaller one, like the firefly.

00:05:23.680 | - Yeah, the really cute one, yeah.

00:05:25.640 | And I put it on my laptop and I had that for years

00:05:30.160 | until I finally changed my laptop out and you know.

00:05:33.120 | - What about if we walk back, you mentioned optimization.

00:05:36.320 | Like what beautiful ideas inspired you in math,

00:05:40.720 | computer science early on?

00:05:42.640 | Like why get into this field?

00:05:44.520 | Seems like a cold and boring field of math.

00:05:47.400 | Like what was exciting to you about it?

00:05:49.000 | - The thing is I liked math from very early on,

00:05:52.400 | from fifth grade is when I got into the math Olympiad

00:05:56.640 | and all of that.

00:05:57.480 | - Oh, you competed too.

00:05:58.520 | - Yeah, this, Romania is like our national sport,

00:06:01.120 | you gotta understand.

00:06:02.760 | So I got into that fairly early

00:06:05.720 | and it was a little maybe too just theory

00:06:10.160 | with no kind of, I didn't kind of have a,

00:06:12.920 | didn't really have a goal.

00:06:14.960 | And other than understanding, which was cool,

00:06:17.520 | I always liked learning and understanding,

00:06:19.280 | but there was no, okay,

00:06:20.200 | what am I applying this understanding to?

00:06:22.200 | And so I think that's how I got into,

00:06:23.800 | more heavily into computer science

00:06:25.320 | 'cause it was kind of math meets something

00:06:29.200 | you can do tangibly in the world.

00:06:31.280 | - Do you remember like the first program you've written?

00:06:34.440 | - Okay, the first program I've written,

00:06:36.400 | I kind of do, it was in Q basic in fourth grade.

00:06:42.520 | - Wow.

00:06:43.360 | - And it was drawing like a circle.

00:06:46.640 | - Graphics.

00:06:47.480 | - Yeah, that was, I don't know how to do that anymore.

00:06:50.800 | But in fourth grade,

00:06:52.880 | that's the first thing that they taught me.

00:06:54.200 | I was like, you could take a special,

00:06:56.320 | I wouldn't say it was an extra,

00:06:57.600 | it's in a sense an extracurricular.

00:06:59.040 | So you could sign up for dance or music or programming.

00:07:03.360 | And I did the programming thing

00:07:04.720 | and my mom was like, what, why?

00:07:07.000 | - Did you compete in program?

00:07:08.920 | Like these days, Romania probably,

00:07:12.080 | that's like a big thing.

00:07:13.000 | There's a program and competitions.

00:07:15.440 | Was that, did that touch you at all?

00:07:17.160 | - I did a little bit of the computer science Olympian,

00:07:21.400 | but not as seriously as I did the math Olympian.

00:07:24.760 | So it was programming.

00:07:25.800 | Yeah, it's basically, here's a hard math problem,

00:07:27.760 | solve it with a computer is kind of the deal.

00:07:29.480 | - Oh yeah, it's more like algorithm.

00:07:30.760 | - Exactly, it's always algorithmic.

00:07:32.640 | - So, again, you kind of mentioned

00:07:35.600 | the Google self-driving car,

00:07:36.720 | but outside of that,

00:07:39.920 | well, what's like who or what is your favorite robot,

00:07:44.000 | real or fictional that like captivated

00:07:46.520 | your imagination throughout?

00:07:48.360 | I mean, I guess you kind of alluded

00:07:49.920 | to the Google self-drive,

00:07:51.440 | the Firefly was a magical moment,

00:07:53.600 | but is there something else?

00:07:54.880 | - It wasn't the Firefly there.

00:07:56.240 | I think there was the Lexus by the way.

00:07:58.000 | This was back then.

00:07:59.640 | But yeah, so good question.

00:08:01.680 | Okay, my favorite fictional robot is WALL-E.

00:08:08.800 | And I love how amazingly expressive it is.

00:08:13.800 | I'm personally thinks a little bit

00:08:16.040 | about expressive motion kinds of things

00:08:17.600 | you were saying with,

00:08:18.440 | you can do this and it's a head

00:08:19.840 | and it's the manipulator and what does it all mean?

00:08:22.840 | I like to think about that stuff.

00:08:24.040 | I love Pixar, I love animation.

00:08:26.160 | - WALL-E has two big eyes, I think, or no.

00:08:28.640 | - Yeah, it has these cameras and they move.

00:08:33.640 | So yeah, it goes woo and then it's super cute.

00:08:38.600 | And yeah, the way it moves is just so expressive.

00:08:41.440 | The timing of that motion,

00:08:43.280 | what it's doing with its arms

00:08:44.760 | and what it's doing with these lenses is amazing.

00:08:48.280 | And so I've really liked that from the start.

00:08:53.280 | And then on top of that, sometimes I shared this,

00:08:56.440 | it's a personal story I share with people

00:08:58.120 | or when I teach about AI or whatnot.

00:09:01.160 | My husband proposed to me by building a WALL-E

00:09:07.040 | and he actuated it.

00:09:09.680 | So it's seven degrees of freedom, including the lens thing.

00:09:13.520 | And it kind of came in and he made it have

00:09:18.520 | like the belly box opening thing.

00:09:21.880 | So it just did that.

00:09:23.520 | And then it spewed out this box made out of Legos

00:09:27.600 | that open slowly and then bam, yeah.

00:09:30.640 | Yeah, it was quite, it set a bar.

00:09:34.400 | - That could be like the most impressive thing

00:09:36.360 | I've ever heard.

00:09:37.200 | (laughing)

00:09:38.040 | Okay.

00:09:39.080 | - That was a special connection to WALL-E.

00:09:40.280 | Long story short, I like WALL-E

00:09:41.600 | 'cause I like animation and I like robots

00:09:43.760 | and I like the fact that this was,

00:09:46.920 | we still have this robot to this day.

00:09:49.880 | - How hard is that problem do you think

00:09:51.400 | of the expressivity of robots?

00:09:54.240 | Like with the Boston Dynamics,

00:09:56.800 | I never talked to those folks

00:09:59.000 | about this particular element.

00:10:00.360 | I've talked to them a lot,

00:10:02.120 | but it seems to be like almost an accidental

00:10:04.520 | side effect for them that they weren't,

00:10:07.480 | I don't know if they're faking it,

00:10:08.720 | they weren't trying to, okay.

00:10:11.760 | They do say that the gripper,

00:10:14.240 | it was not intended to be a face.

00:10:16.620 | I don't know if that's a honest statement,

00:10:20.400 | but I think they're legitimate.

00:10:21.720 | - It probably is.

00:10:22.760 | - And so do we automatically just anthropomorphize

00:10:26.760 | anything we can see about a robot?

00:10:29.280 | So like the question is how hard is it

00:10:31.760 | to create a WALL-E type robot

00:10:33.640 | that connects so deeply with us humans?

00:10:35.320 | What do you think?

00:10:36.720 | - It's really hard, right?

00:10:37.840 | So it depends on what setting.

00:10:39.920 | So if you wanna do it in this very particular narrow setting

00:10:44.920 | where it does only one thing and it's expressive,

00:10:48.160 | then you can get an animator.

00:10:49.600 | You can have Pixar on call,

00:10:51.200 | come in, design some trajectories.

00:10:53.480 | There was a, Anki had a robot called Cosmo

00:10:56.000 | where they put in some of these animations.

00:10:58.320 | That part is easy, right?

00:11:00.520 | The hard part is doing it not via these

00:11:04.320 | kind of handcrafted behaviors,

00:11:06.480 | but doing it generally autonomously.

00:11:09.840 | Like I want robots, I don't work on,

00:11:12.040 | just to clarify, I used to work a lot on this.

00:11:14.680 | I don't work on that quite as much these days,

00:11:17.360 | but the notion of having robots that,

00:11:21.720 | when they pick something up and put it in a place,

00:11:24.320 | they can do that with various forms of style,

00:11:28.160 | where you can say, well, this robot is succeeding

00:11:30.760 | at this task and is confident versus it's hesitant

00:11:32.960 | versus maybe it's happy or it's disappointed

00:11:36.600 | about something, some failure that it had.

00:11:38.800 | I think that when robots move,

00:11:41.320 | they can communicate so much about internal states

00:11:46.840 | or perceived internal states that they have.

00:11:49.800 | And I think that's really useful

00:11:53.280 | and an element that we'll want in the future

00:11:55.520 | because I was reading this article about how

00:11:59.040 | kids are being rude to Alexa

00:12:05.480 | because they can be rude to it

00:12:09.720 | and it doesn't really get angry, right?

00:12:11.600 | It doesn't reply in any way, it just says the same thing.

00:12:15.280 | So I think there's, at least for that,

00:12:16.840 | for the correct development of children,

00:12:20.120 | it's important that these things,

00:12:21.520 | you kind of react differently.

00:12:22.960 | I also think that you walk in your home

00:12:24.640 | and you have a personal robot.

00:12:26.160 | And if you're really pissed,

00:12:27.160 | presumably the robot should kind of behave

00:12:28.920 | slightly differently than when you're

00:12:30.560 | super happy and excited.

00:12:32.520 | But it's really hard because it's, I don't know,

00:12:37.280 | the way I would think about it

00:12:38.720 | and the way I've thought about it

00:12:39.760 | when it came to expressing goals or intentions for robots,

00:12:44.080 | it's, well, what's really happening is that

00:12:47.480 | instead of doing robotics where you have your state

00:12:51.520 | and you have your action space

00:12:53.600 | and you have your space,

00:12:55.480 | the reward function that you're trying to optimize,

00:12:57.880 | now you kind of have to expand the notion of state

00:13:00.600 | to include this human internal state.

00:13:02.800 | What is the person actually perceiving?

00:13:05.960 | What do they think about the robots, something or other?

00:13:10.200 | And then you have to optimize in that system.

00:13:12.800 | And so that means they have to understand

00:13:14.160 | how your motion, your actions end up sort of influencing

00:13:18.000 | the observer's kind of perception of you.

00:13:21.000 | And it's very hard to write math about that.

00:13:25.040 | - Right, so when you start to think about

00:13:27.120 | incorporating the human into the state model,

00:13:29.720 | apologize for the philosophical question,

00:13:33.640 | but how complicated are human beings, do you think?

00:13:36.680 | Can they be reduced to a kind of

00:13:40.720 | almost like an object that moves

00:13:43.760 | and maybe has some basic intents?

00:13:46.160 | Or is there something, do we have to model things like mood

00:13:50.080 | and general aggressiveness and time?

00:13:52.800 | I mean, all of these kinds of human qualities

00:13:55.000 | or like game theoretical qualities,

00:13:56.640 | like what's your sense?

00:13:58.800 | - How complicated is-

00:14:00.160 | - How hard is the problem of human robot interaction?

00:14:03.360 | - Yeah, should we talk about what the problem

00:14:06.000 | of human robot interaction is?

00:14:07.720 | - Yeah, what is human robot interaction?

00:14:10.800 | - And then talk about how that, yeah.

00:14:12.320 | So, and by the way, I'm gonna talk about this very

00:14:15.840 | particular view of human robot interaction, right?

00:14:19.040 | Which is not so much on the social side

00:14:21.600 | or on the side of how do you have a good conversation

00:14:24.520 | with the robot?

00:14:25.360 | What should the robot's appearance be?

00:14:26.760 | It turns out that if you make robots taller versus shorter,

00:14:29.200 | this has an effect on how people act with them.

00:14:31.920 | So I'm not talking about that.

00:14:34.640 | But I'm talking about this very kind of narrow thing,

00:14:36.240 | which is you take, if you wanna take a task

00:14:39.920 | that a robot can do in isolation,

00:14:42.880 | in a lab out there in the world, but in isolation,

00:14:46.600 | and now you're asking, what does it mean for the robot

00:14:49.720 | to be able to do this task for,

00:14:52.600 | presumably what its actually end goal is,

00:14:54.280 | which is to help some person.

00:14:55.880 | That ends up changing the problem in two ways.

00:15:01.760 | The first way it changes the problem is that

00:15:04.680 | the robot is no longer the single agent acting.

00:15:08.560 | Do you have humans who also take actions in that same space?

00:15:11.880 | You know, cars navigate around people,

00:15:13.840 | robots around an office, navigate around the people,

00:15:16.440 | in that office.

00:15:18.560 | If I send the robot to over there in the cafeteria

00:15:20.920 | to get me a coffee, then there's probably other people

00:15:23.560 | reaching for stuff in the same space.

00:15:25.320 | And so now you have your robot,

00:15:27.680 | and you're in charge of the actions

00:15:29.120 | that the robot is taking.

00:15:30.560 | Then you have these people who are also making decisions

00:15:33.480 | and taking actions in that same space.

00:15:36.240 | And even if the robot knows what it should do

00:15:39.160 | and all of that, just coexisting with these people, right?

00:15:42.720 | Kind of getting the actions to gel well,

00:15:45.320 | to mesh well together.

00:15:47.080 | That's sort of the kind of problem number one.

00:15:50.480 | And then there's problem number two,

00:15:51.640 | which is, goes back to this notion of,

00:15:56.560 | if I'm a programmer, I can specify some objective

00:16:00.280 | for the robot to go off and optimize,

00:16:02.000 | I can specify the task.

00:16:03.760 | But if I put the robot in your home,

00:16:07.280 | presumably you might have your own opinions about,

00:16:11.400 | well, okay, I want my house clean,

00:16:12.800 | but how do I want it clean?

00:16:13.960 | Then how should robot know how close to me it should come

00:16:16.280 | and all of that.

00:16:17.320 | And so I think those are the two differences

00:16:19.640 | that you have, you're acting around people,

00:16:22.440 | and what you should be optimizing for

00:16:26.000 | should satisfy the preferences of that end user,

00:16:28.480 | not of your programmer who programmed you.

00:16:30.840 | - Yeah, and the preferences thing is tricky.

00:16:33.760 | So figuring out those preferences,

00:16:35.680 | be able to interactively adjust,

00:16:38.320 | to understand what the human is doing.

00:16:39.800 | So really it boils down to understand the humans

00:16:42.240 | in order to interact with them and in order to please them.

00:16:45.840 | - Right.

00:16:47.040 | - So why is this hard?

00:16:48.440 | - Yeah, why is understanding humans hard?

00:16:51.080 | So I think there's two tasks about understanding humans

00:16:56.080 | that in my mind are very, very similar,

00:16:59.880 | but not everyone agrees.

00:17:00.960 | So there's the task of being able to just anticipate

00:17:04.400 | what people will do.

00:17:05.680 | We all know that cards need to do this, right?

00:17:07.600 | We all know that, well, if I navigate around some people,

00:17:10.600 | the robot has to get some notion of,

00:17:12.600 | okay, where is this person gonna be?

00:17:15.520 | So that's kind of the prediction side.

00:17:17.320 | And then there's what you were saying,

00:17:19.240 | satisfying the preferences, right?

00:17:21.040 | So adapting to the person's preferences,

00:17:22.840 | knowing what to optimize for,

00:17:24.520 | which is more this inference side,

00:17:25.920 | this what does this person want?

00:17:28.840 | What is their intent?

00:17:29.720 | What are their preferences?

00:17:31.640 | And to me, those kind of go together

00:17:35.160 | because I think that at the very least,

00:17:39.800 | if you can understand,

00:17:41.200 | if you can look at human behavior

00:17:43.040 | and understand what it is that they want,

00:17:45.560 | then that's sort of the key enabler

00:17:47.480 | to being able to anticipate what they'll do in the future.

00:17:50.720 | Because I think that we're not arbitrary.

00:17:53.640 | We make these decisions that we make,

00:17:55.480 | we act in the way we do

00:17:57.000 | because we're trying to achieve certain things.

00:17:59.400 | And so I think that's the relationship between them.

00:18:01.600 | Now, how complicated do these models need to be

00:18:05.560 | in order to be able to understand what people want?

00:18:10.160 | So we've gotten a long way in robotics

00:18:15.160 | with something called inverse reinforcement learning,

00:18:17.560 | which is the notion of someone acts,

00:18:19.520 | demonstrates how they want the thing done.

00:18:22.120 | - What is inverse reinforcement learning?

00:18:24.240 | You briefly said it.

00:18:25.240 | - Right, so it's the problem of take human behavior

00:18:30.240 | and infer reward function from this.

00:18:33.280 | So figure out what it is

00:18:34.560 | that that behavior is optimal with respect to.

00:18:37.440 | And it's a great way to think about

00:18:39.000 | learning human preferences in the sense of,

00:18:41.640 | you have a car and the person can drive it

00:18:45.360 | and then you can say,

00:18:46.200 | "Well, okay, I can actually learn

00:18:47.960 | what the person is optimizing for.

00:18:50.720 | I can learn their driving style."

00:18:53.520 | Or you can have people demonstrate

00:18:55.680 | how they want the house clean.

00:18:57.360 | And then you can say,

00:18:58.200 | "Okay, I'm getting the trade-offs that they're making.

00:19:03.000 | I'm getting the preferences that they want out of this."

00:19:06.160 | And so we've been successful in robotics somewhat with this.

00:19:10.320 | And it's based on a very simple model of human behavior.

00:19:15.040 | It was remarkably simple,

00:19:16.360 | which is that human behavior is optimal

00:19:18.680 | with respect to whatever it is that people want, right?

00:19:22.040 | So you make that assumption

00:19:23.120 | and now you can kind of inverse through.

00:19:24.400 | That's why it's called inverse,

00:19:25.920 | well, really optimal control,

00:19:27.280 | but also inverse reinforcement learning.

00:19:30.560 | So this is based on utility maximization in economics.

00:19:35.560 | Whereas back in the '40s,

00:19:38.440 | von Neumann and Morgenstein were like,

00:19:39.760 | "Okay, people are making choices by maximizing utility, go."

00:19:44.760 | And then in the late '50s,

00:19:48.400 | we had Luce and Shepard come in and say,

00:19:53.320 | "People are a little bit noisy

00:19:56.360 | and approximate in that process.

00:19:58.800 | So they might choose something kind of stochastically

00:20:02.560 | with probability proportional to

00:20:04.800 | how much utility something has.

00:20:08.000 | So there's a bit of noise in there."

00:20:10.480 | This has translated into robotics

00:20:12.640 | and something that we call Boltzmann rationality.

00:20:15.080 | So it's a kind of an evolution

00:20:16.600 | of inverse reinforcement learning

00:20:17.720 | that accounts for human noise.

00:20:20.520 | And we've had some success with that too

00:20:22.840 | for these tasks where it turns out people act

00:20:26.320 | noisily enough that you can't just do vanilla,

00:20:29.200 | the vanilla version.

00:20:30.840 | You can account for noise and still infer

00:20:33.240 | what they seem to want based on this.

00:20:37.240 | Then now we're hitting tasks where that's not enough.

00:20:40.720 | And-

00:20:41.560 | - What are examples?

00:20:43.240 | - What are examples?

00:20:44.360 | So imagine you're trying to control some robot

00:20:46.720 | that's fairly complicated.

00:20:48.640 | You're trying to control a robot arm

00:20:50.080 | 'cause maybe you're a patient with a motor impairment

00:20:53.440 | and you have this wheelchair mounted arm

00:20:54.760 | and you're trying to control it around.

00:20:57.160 | Or one test that we've looked at with Sergei is,

00:21:01.600 | and our students did, is a lunar lander.

00:21:03.720 | So I don't know if you know this Atari game,

00:21:05.920 | it's called Lunar Lander.

00:21:07.640 | It's really hard.

00:21:08.520 | People really suck at landing the thing.

00:21:10.600 | Mostly they just crash it left and right.

00:21:12.720 | Okay, so this is the kind of task.

00:21:14.360 | Imagine you're trying to provide some assistance

00:21:17.720 | to a person operating such a robot

00:21:20.880 | where you want the kind of the autonomy to kick in,

00:21:22.760 | figure out what it is that you're trying to do

00:21:24.200 | and help you do it.

00:21:25.640 | It's really hard to do that for say lunar lander

00:21:31.400 | because people are all over the place.

00:21:33.640 | And so they seem much more noisy than really rational.

00:21:37.400 | That's an example of a task

00:21:38.600 | where these models are kind of failing us.

00:21:40.920 | And it's not surprising because,

00:21:44.200 | so we talked about the forties utility, late fifties.

00:21:47.560 | Sort of noisy.

00:21:49.440 | Then the seventies came

00:21:51.240 | and behavioral economics started being a thing

00:21:54.280 | where people were like, no, no, no, no, no.

00:21:56.600 | People are not rational.

00:21:58.680 | People are messy and emotional and irrational

00:22:03.680 | and have all sorts of heuristics

00:22:05.840 | that might be domain specific

00:22:07.520 | and they're just a mess.

00:22:09.160 | - It's a mess.

00:22:10.000 | - So what does my robot do to understand what you want?

00:22:15.080 | And it's very, that's why it's complicated.

00:22:18.840 | For the most part,

00:22:19.680 | we get away with pretty simple models until we don't.

00:22:23.240 | And then the question is, what do you do then?

00:22:25.520 | And I had days when I wanted to pack my bags

00:22:30.880 | and go home and switch jobs

00:22:32.760 | because it's just, it feels really daunting

00:22:34.960 | to make sense of human behavior

00:22:36.800 | enough that you can reliably understand what people want,

00:22:40.480 | especially as robot capabilities

00:22:42.520 | will continue to get developed.

00:22:44.920 | You'll get these systems that are more and more capable

00:22:47.160 | of all sorts of things.

00:22:48.040 | And then you really want to make sure

00:22:49.120 | that you're telling them the right thing to do.

00:22:51.520 | What is that thing?

00:22:52.600 | Well, read it in human behavior.

00:22:55.120 | - So if I just sat here quietly

00:22:58.480 | and tried to understand something about you

00:23:00.360 | by listening to you talk,

00:23:02.160 | it would be harder than if I got to say something

00:23:06.120 | and ask you and interact and control.

00:23:08.600 | Can you, can the robot help its understanding of the human

00:23:13.120 | by influencing the behavior by actually acting?

00:23:18.120 | - Yeah, absolutely.

00:23:19.800 | So one of the things that's been exciting to me lately

00:23:23.680 | is this notion that when you try to,

00:23:28.520 | when you try to think of the robotics problem as,

00:23:31.920 | okay, I have a robot and it needs to optimize

00:23:34.480 | for whatever it is that a person wants it to optimize,

00:23:37.520 | as opposed to maybe what a programmer said,

00:23:40.680 | that problem we think of

00:23:43.320 | as a human robot collaboration problem

00:23:46.040 | in which both agents get to act,

00:23:49.120 | in which the robot knows less than the human

00:23:52.280 | because the human actually has access to,

00:23:54.920 | at least implicitly to what it is that they want.

00:23:57.200 | They can't write it down, but they can talk about it.

00:24:00.640 | They can give all sorts of signals.

00:24:02.280 | They can demonstrate.

00:24:03.560 | And, but the robot doesn't need to sit there

00:24:06.520 | and passively observe human behavior

00:24:08.760 | and try to make sense of it.

00:24:10.080 | The robot can act too.

00:24:11.880 | And so there's these information gathering actions

00:24:15.360 | that the robot can take to sort of solicit responses

00:24:19.000 | that are actually informative.

00:24:21.040 | So for instance,

00:24:21.920 | this is not for the purpose of assisting people,

00:24:23.920 | but with kind of back to coordinating with people in cars

00:24:26.440 | and all of that.

00:24:27.400 | One thing that Dorsa did was,

00:24:31.840 | so we were looking at cars being able to navigate

00:24:34.240 | around people and you might not know exactly

00:24:39.440 | the driving style of a particular individual

00:24:41.840 | that's next to you,

00:24:43.000 | but you want to change lanes in front of them.

00:24:45.240 | - Navigating around other humans inside cars?

00:24:48.760 | - Yeah, good, good clarification question.

00:24:50.920 | So if you have an autonomous car

00:24:54.040 | and it's trying to navigate the road

00:24:56.560 | around human driven vehicles.

00:24:59.000 | Similar things, ideas apply to pedestrians as well,

00:25:01.600 | but let's just take human driven vehicles.

00:25:03.880 | So now you're trying to change a lane.

00:25:06.240 | Well, you could be trying to infer the driving style

00:25:10.440 | of this person next to you.

00:25:12.160 | You'd like to know if they're in particular,

00:25:13.760 | if they're sort of aggressive or defensive,

00:25:15.960 | if they're going to let you kind of go in

00:25:18.000 | or if they're going to not.

00:25:19.440 | And it's very difficult to just,

00:25:24.320 | if you think that if you want to hedge your bets

00:25:27.960 | and say, ah, maybe they're actually pretty aggressive,

00:25:30.320 | I shouldn't try this.

00:25:31.560 | You kind of end up driving next to them

00:25:33.400 | and driving next to them, right?

00:25:34.840 | And then you don't know

00:25:36.440 | because you're not actually getting the observations

00:25:39.320 | that you get the way someone drives

00:25:41.120 | when they're next to you

00:25:42.560 | and they just need to go straight.

00:25:44.360 | It's kind of the same

00:25:45.200 | regardless if they're aggressive or defensive.

00:25:47.400 | And so you need to enable the robot

00:25:51.000 | to reason about how it might actually be able

00:25:54.160 | to gather information by changing the actions

00:25:56.960 | that it's taking.

00:25:58.080 | And then the robot comes up with these cool things

00:25:59.880 | where it kind of nudges towards you

00:26:02.520 | and then sees if you're going to slow down or not.

00:26:05.240 | Then if you slow down, it sort of updates its model of you

00:26:07.920 | and says, oh, okay, you're more on the defensive side.

00:26:11.280 | So now I can actually-

00:26:12.680 | - That's a fascinating dance.

00:26:14.320 | That's so cool that you could use your own actions

00:26:18.080 | to gather information.

00:26:19.360 | That feels like a totally open,

00:26:22.320 | exciting new world of robotics.

00:26:24.360 | I mean, how many people are even thinking

00:26:26.040 | about that kind of thing?

00:26:28.640 | - A handful of us, I'd say.

00:26:30.240 | - It's rare 'cause it's actually leveraging human.

00:26:33.400 | I mean, most roboticists I've talked to a lot,

00:26:35.960 | sort of colleagues and so on,

00:26:38.240 | are kind of, being honest, kind of afraid of humans.

00:26:42.960 | - 'Cause they're messy and complicated, right?

00:26:45.440 | I understand.

00:26:46.680 | Going back to what we were talking about earlier,

00:26:49.800 | right now we're kind of in this dilemma of, okay,

00:26:52.480 | there are tasks that we can just assume

00:26:54.000 | people are approximately rational for

00:26:55.680 | and we can figure out what they want.

00:26:57.120 | We can figure out their goals.

00:26:57.960 | We can figure out their driving styles, whatever.

00:26:59.720 | Cool.

00:27:00.560 | There are these tasks that we can't.

00:27:02.840 | So what do we do, right?

00:27:03.960 | Do we pack our bags and go home?

00:27:06.040 | And this one, I've had a little bit of hope recently

00:27:10.920 | and I'm kind of doubting myself

00:27:13.720 | 'cause what do I know that 50 years

00:27:16.280 | of behavioral economics hasn't figured out?

00:27:19.560 | But maybe it's not really in contradiction

00:27:21.480 | with the way that field is headed.

00:27:23.920 | But basically one thing that we've been thinking about

00:27:27.960 | instead of kind of giving up and saying people

00:27:30.440 | are too crazy and irrational for us to make sense of them,

00:27:33.520 | maybe we can give them a bit the benefit of the doubt

00:27:39.360 | and maybe we can think of them

00:27:41.400 | as actually being relatively rational,

00:27:43.960 | but just under different assumptions about the world,

00:27:48.960 | about how the world works, about, you know,

00:27:51.520 | when we think about rationality,

00:27:54.040 | implicit assumption is, oh, they're rational

00:27:56.480 | under all the same assumptions

00:27:57.880 | and constraints as the robot, right?

00:27:59.880 | What, if this is the state of the world,

00:28:01.760 | that's what they know.

00:28:02.720 | This is the transition function, that's what they know.

00:28:05.080 | This is the horizon, that's what they know.

00:28:07.360 | But maybe kind of this difference,

00:28:11.040 | the way, the reason they can seem a little messy

00:28:13.760 | and hectic, especially to robots,

00:28:16.440 | is that perhaps they just make different assumptions

00:28:20.000 | or have different beliefs.

00:28:21.640 | - Yeah.

00:28:22.480 | I mean, that's another fascinating idea

00:28:24.800 | that this, our kind of anecdotal desire

00:28:29.040 | to say that humans are irrational,

00:28:31.040 | perhaps grounded in behavioral economics,

00:28:33.280 | is that we just don't understand the constraints

00:28:36.400 | and the rewards under which they operate.

00:28:38.280 | And so our goal shouldn't be to throw our hands up

00:28:40.960 | and say they're irrational, it's to say,

00:28:43.640 | let's try to understand what are the constraints.

00:28:46.400 | - What it is that they must be assuming

00:28:48.400 | that makes this behavior make sense.

00:28:51.120 | Good life lesson, right?

00:28:52.600 | Good life lesson.

00:28:53.440 | - Yeah, it's true.

00:28:54.280 | - It's just outside of robotics.

00:28:55.560 | That's just good to, communicating with humans,

00:28:58.480 | that's just a good, assume that you just don't,

00:29:01.880 | sort of empathy, right?

00:29:03.400 | - It's just maybe there's something you're missing

00:29:06.000 | and it's, you know, it especially happens to robots

00:29:08.560 | 'cause they're kind of dumb and they don't know things

00:29:10.200 | and oftentimes people are sort of super irrational

00:29:12.720 | and that they actually know a lot of things that robots don't

00:29:15.360 | Sometimes, like with the Lunar Lander,

00:29:17.840 | the robot knows much more.

00:29:20.760 | So it turns out that if you try to say,

00:29:23.920 | look, maybe people are operating this thing,

00:29:26.880 | but assuming a much more simplified physics model,

00:29:31.040 | 'cause they don't get the complexity of this kind of craft

00:29:33.840 | or the robot arm with seven degrees of freedom

00:29:36.040 | with these inertias and whatever.

00:29:38.320 | So maybe they have this intuitive physics model,

00:29:41.520 | which is not, you know, this notion of intuitive physics

00:29:44.240 | is something that is studied actually in cognitive science

00:29:46.560 | was like Josh Tenenbaum, Tom Griffith's work on this stuff.

00:29:49.840 | And what we found is that you can actually try

00:29:54.600 | to figure out what physics model

00:29:58.320 | kind of best explains human actions.

00:30:01.320 | And then you can use that to sort of correct what it is

00:30:06.320 | that they're commanding the craft to do.

00:30:08.720 | So they might be sending the craft somewhere,

00:30:11.320 | but instead of executing that action,

00:30:13.240 | you can sort of take a step back and say,

00:30:15.200 | according to their intuitive,

00:30:16.920 | if the world worked according to their intuitive

00:30:19.640 | physics model, where do they think that the craft is going?

00:30:23.640 | Where are they trying to send it to?

00:30:26.040 | And then you can use the real physics, right?

00:30:28.640 | The inverse of that to actually figure out

00:30:30.240 | what you should do so that you do that

00:30:31.560 | instead of where they were actually sending you

00:30:33.400 | in the real world.

00:30:34.840 | And I kid you not, it worked.

00:30:36.000 | People land the damn thing,

00:30:38.320 | and you know, in between the two flags and all that.

00:30:42.480 | So it's not conclusive in any way,

00:30:45.200 | but I'd say it's evidence that, yeah,

00:30:47.320 | maybe we're kind of underestimating humans in some ways

00:30:50.440 | when we're giving up and saying,

00:30:51.640 | yeah, they're just crazy noisy.

00:30:53.200 | - So then you try to explicitly try to model

00:30:56.280 | the kind of a worldview that they have.

00:30:58.120 | - That they have, that's right.

00:30:59.640 | That's right.

00:31:00.480 | And it's not too, I mean,

00:31:02.240 | there's things in behavioral economics too

00:31:03.600 | that for instance, have touched upon the planning horizon.

00:31:06.960 | So there's this idea that,

00:31:08.560 | there's bounded rationality essentially.

00:31:10.440 | And the idea that, well,

00:31:11.400 | maybe we work under computational constraints.

00:31:13.680 | And I think kind of our view recently has been,

00:31:17.040 | take the Bellman update in AI

00:31:19.760 | and just break it in all sorts of ways by saying state.

00:31:22.600 | No, no, no.

00:31:23.440 | The person doesn't get to see the real state.

00:31:25.040 | Maybe they're estimating somehow.

00:31:26.560 | Transition function.

00:31:27.760 | No, no, no, no, no.

00:31:28.880 | Even the actual reward evaluation,

00:31:31.600 | maybe they're still learning about what it is that they want.

00:31:34.880 | Like, you know, when you watch Netflix

00:31:37.760 | and you have all the things

00:31:39.440 | and then you have to pick something,

00:31:41.720 | imagine that, you know,

00:31:42.920 | the AI system interpreted that choice

00:31:46.760 | as this is the thing you prefer to see.

00:31:48.840 | And how are you going to know?

00:31:49.720 | You're still trying to figure out what you like,

00:31:51.320 | what you don't like, et cetera.

00:31:52.640 | So I think it's important to also account for that.

00:31:55.560 | So it's not irrationality,

00:31:56.760 | because they're doing the right thing

00:31:58.120 | under the things that they know.

00:32:00.000 | - Yeah, that's brilliant.

00:32:01.320 | You mentioned recommender systems.

00:32:03.280 | What kind of,

00:32:04.760 | and we were talking about human-robot interaction,

00:32:07.120 | what kind of problem spaces are you thinking about?

00:32:10.840 | So is it robots,

00:32:13.880 | like wheeled robots with autonomous vehicles?

00:32:16.000 | Is it object manipulation?

00:32:18.560 | Like when you think about human-robot interaction

00:32:20.960 | in your mind,

00:32:21.960 | and maybe I'm sure you can speak

00:32:24.440 | for the entire community of human-robot interaction.

00:32:27.000 | No, but like, what are the problems of interest here?

00:32:30.560 | And does it, you know,

00:32:33.520 | I kind of think of open domain dialogue

00:32:39.160 | as human-robot interaction.

00:32:40.840 | And that happens not in the physical space,

00:32:43.040 | but it could just happen in the virtual space.

00:32:46.360 | So where's the boundaries of this field for you

00:32:49.520 | when you're thinking about the things

00:32:50.720 | we've been talking about?

00:32:51.800 | - Yeah, so I tried to find kind of underlying,

00:32:56.800 | I don't know what to even call them.

00:33:02.480 | I get try to work on, you know,

00:33:03.800 | I might call what I do the kind of working on

00:33:06.640 | the foundations of algorithmic human-robot interaction

00:33:09.600 | and trying to make contributions there.

00:33:11.800 | And it's important to me that whatever we do

00:33:15.920 | is actually somewhat domain agnostic when it comes to,

00:33:19.320 | is it about, you know, autonomous cars,

00:33:24.000 | or is it about quadrotors,

00:33:27.040 | or is it about,

00:33:27.880 | it's sort of the same underlying principles apply.

00:33:30.800 | Of course, when you're trying to get

00:33:31.680 | a particular domain to work,

00:33:32.920 | you usually have to do some extra work

00:33:34.240 | to adapt that to that particular domain.

00:33:36.600 | But these things that we were talking about around,

00:33:40.040 | well, you know, how do you model humans?

00:33:42.440 | It turns out that a lot of systems need to core benefit

00:33:45.760 | from a better understanding of how human behavior relates

00:33:49.560 | to what people want and need to predict human behavior,

00:33:53.560 | physical robots of all sorts and beyond that.

00:33:56.440 | And so I used to do manipulation.

00:33:58.520 | I used to be, you know, picking up stuff,

00:34:00.600 | and then I was picking up stuff with people around.

00:34:03.440 | And now it's sort of very broad

00:34:05.920 | when it comes to the application level,

00:34:07.800 | but in a sense, very focused on, okay,

00:34:11.160 | how does the problem need to change?

00:34:14.080 | How do the algorithms need to change

00:34:15.880 | when we're not doing a robot by itself,

00:34:19.160 | you know, emptying the dishwasher,

00:34:21.400 | but we're stepping outside of that.

00:34:23.800 | - I thought that popped into my head just now.

00:34:26.800 | On the game theoretic side, I think you said

00:34:28.720 | this really interesting idea of using actions

00:34:30.960 | to gain more information.

00:34:33.320 | But if we think of sort of game theory,

00:34:37.800 | the humans that are interacting with you,

00:34:43.440 | with you, the robot,

00:34:44.560 | well, I'm taking the identity of the robot.

00:34:46.440 | - Yeah, I do that all the time.

00:34:47.480 | - Yeah, is they also have a world model of view,

00:34:52.480 | and you can manipulate that.

00:34:57.440 | I mean, if we look at autonomous vehicles,

00:34:59.360 | people have a certain viewpoint.

00:35:01.480 | You said with the kids, people see Alexa in a certain way.

00:35:06.320 | Is there some value in trying to also optimize

00:35:10.880 | how people see you as a robot?

00:35:13.560 | Or is that a little too far away from the specifics

00:35:20.160 | of what we can solve right now?

00:35:21.680 | - So, well, both, right?

00:35:24.400 | So it's really interesting.

00:35:26.320 | And we've seen a little bit of progress on this problem,

00:35:31.000 | on pieces of this problem.

00:35:32.400 | So you can, again, it kind of comes down

00:35:36.280 | to how complicated does the human model need to be.

00:35:38.320 | But in one piece of work that we were looking at,

00:35:42.360 | we just said, okay, there's these parameters

00:35:46.240 | that are internal to the robot,

00:35:47.960 | and what the robot is about to do,

00:35:51.640 | or maybe what objective,

00:35:52.720 | what driving style the robot has, or something like that.

00:35:55.320 | And what we're gonna do is we're gonna set up a system

00:35:58.240 | where part of the state is the person's belief

00:36:00.360 | over those parameters.

00:36:02.360 | And now when the robot acts,

00:36:05.240 | that the person gets new evidence

00:36:07.640 | about this robot internal state.

00:36:10.760 | And so they're updating their mental model of the robot.

00:36:13.760 | So if they see a card that sort of cut someone off,

00:36:17.040 | they're like, oh, that's an aggressive card,

00:36:18.440 | they know more.

00:36:19.280 | If they see sort of a robot head towards a particular door,

00:36:24.160 | they're like, oh yeah,

00:36:25.000 | the robot's trying to get to that door.

00:36:26.360 | So this thing that we have to do with humans

00:36:28.040 | to try to understand their goals and intentions,

00:36:31.120 | humans are inevitably gonna do that to robots.

00:36:34.560 | And then that raises this interesting question

00:36:36.560 | that you asked, which is, can we do something about that?

00:36:38.960 | This is gonna happen inevitably,

00:36:40.280 | but we can sort of be more confusing

00:36:42.120 | or less confusing to people.

00:36:44.160 | And it turns out you can optimize

00:36:45.680 | for being more informative and less confusing

00:36:49.040 | if you have an understanding

00:36:50.920 | of how your actions are being interpreted by the human,

00:36:53.640 | how they're using these actions to update their belief.

00:36:56.680 | And honestly, all we did is just Bayes' rule.

00:36:59.640 | Basically, okay, the person has a belief,

00:37:02.920 | they see an action, they make some assumptions

00:37:04.760 | about how the robot generates its actions,

00:37:06.360 | presumably as being rational, 'cause robots are rational,

00:37:09.120 | it's reasonable to assume that about them.

00:37:11.280 | And then they incorporate that new piece of evidence,

00:37:16.280 | in the Bayesian sense, in their belief,

00:37:19.320 | and they obtain a posterior.

00:37:20.640 | And now the robot is trying to figure out

00:37:22.960 | what actions to take,

00:37:24.320 | such that it steers the person's belief

00:37:26.120 | to put as much probability mass as possible

00:37:28.320 | on the correct parameters.

00:37:31.200 | - So that's kind of a mathematical formalization of that.

00:37:33.920 | But my worry, and I don't know

00:37:37.560 | if you wanna go there with me,

00:37:38.720 | but I talk about this quite a bit.

00:37:43.720 | The kids talking to Alexa disrespectfully worries me.

00:37:49.080 | I worry in general about human nature.

00:37:52.200 | I guess I grew up in the Soviet Union, World War II,

00:37:54.800 | I'm a Jew too, so with the Holocaust and everything.

00:37:58.160 | I just worry about how we humans sometimes treat the other,

00:38:01.720 | the group that we call the other, whatever it is.

00:38:05.080 | Through human history, the group that's the other

00:38:07.280 | has been changed faces.

00:38:09.560 | But it seems like the robot will be the other,

00:38:12.480 | the other, the next other.

00:38:15.680 | And one thing is, it feels to me

00:38:19.400 | that robots don't get no respect.

00:38:22.240 | - They get shoved around.

00:38:23.440 | - Shoved around, and is there one at the shallow level,

00:38:27.160 | for a better experience,

00:38:28.440 | it seems that robots need to talk back a little bit.

00:38:31.560 | Like my intuition says, I mean, most companies

00:38:35.480 | from sort of Roomba, autonomous vehicle companies

00:38:38.440 | might not be so happy with the idea

00:38:40.360 | that a robot has a little bit of an attitude.

00:38:43.680 | But it feels to me that that's necessary

00:38:46.760 | to create a compelling experience.

00:38:48.280 | Like we humans don't seem to respect anything

00:38:50.640 | that doesn't give us some attitude.

00:38:53.000 | - That, or like a mix of mystery and attitude and anger

00:38:58.000 | and that threatens us subtly, maybe passive aggressively.

00:39:03.960 | I don't know, it seems like we humans, yeah, need that.

00:39:08.200 | Do you, what are your, is there something,

00:39:10.120 | you have thoughts on this?

00:39:11.880 | - I'll give you two thoughts on this.

00:39:13.080 | - Okay, sure.

00:39:13.920 | - One is, one is, it's, we respond to, you know,

00:39:19.200 | someone being assertive, but we also respond

00:39:23.200 | to someone being vulnerable.

00:39:26.240 | So I think robots, my first thought is that

00:39:28.440 | robots get shoved around and bullied a lot

00:39:31.720 | because they're sort of, you know, tempting

00:39:33.080 | and they're sort of showing off

00:39:34.320 | or they appear to be showing off.

00:39:35.960 | And so I think going back to these things

00:39:38.920 | we were talking about in the beginning

00:39:40.160 | of making robots a little more,

00:39:41.720 | a little more expressive, a little bit more like,

00:39:45.480 | that wasn't cool to do and now I'm bummed, right?

00:39:50.120 | I think that that can actually help

00:39:51.720 | 'cause people can't help but anthropomorphize

00:39:53.600 | and respond to that.

00:39:54.440 | Even that though, the emotion being communicated

00:39:57.080 | is not in any way a real thing.

00:39:58.920 | And people know that it's not a real thing

00:40:00.440 | 'cause they know it's just a machine.

00:40:02.080 | We're still interpreting, you know, we watch,

00:40:04.720 | there's this famous psychology experiment

00:40:07.320 | with little triangles and kind of dots on a screen

00:40:11.240 | and a triangle is chasing the square

00:40:13.080 | and you get really angry at the darn triangle

00:40:16.080 | 'cause why is it not leaving the square alone?

00:40:18.520 | So that's, yeah, we can't help.

00:40:20.120 | So that was the first thought.

00:40:21.520 | - The vulnerability, that's really interesting.

00:40:23.880 | I think of like being, pushing back,

00:40:29.760 | being assertive as the only mechanism of getting,

00:40:33.720 | of forming a connection, of getting respect,

00:40:36.320 | but perhaps vulnerability.

00:40:37.960 | Perhaps there's other mechanism that are less threatening.

00:40:40.160 | - Yeah.

00:40:41.000 | - Is there?

00:40:41.840 | - Well, a little bit, yes.

00:40:43.960 | But then this other thing that we can think about is,

00:40:47.200 | it goes back to what you were saying,

00:40:48.360 | that interaction is really game theoretic, right?

00:40:50.600 | So the moment you're taking actions in a space,

00:40:52.760 | the humans are taking actions in that same space,

00:40:55.360 | but you have your own objective, which is, you know,

00:40:58.040 | you're a car, you need to get your passenger

00:40:59.600 | to the destination.

00:41:00.840 | And then the human nearby has their own objective,

00:41:03.680 | which somewhat overlaps with you, but not entirely.

00:41:06.640 | You're not interested in getting into an accident

00:41:09.160 | with each other, but you have different destinations

00:41:11.560 | and you wanna get home faster

00:41:13.000 | and they wanna get home faster.

00:41:14.600 | And that's a general sum game at that point.

00:41:17.560 | And so, I think that's what,

00:41:20.520 | treating it as such as kind of a way we can step outside

00:41:25.600 | of this kind of mode that,

00:41:29.560 | where you try to anticipate what people do

00:41:32.200 | and you don't realize you have any influence over it,

00:41:35.280 | while still protecting yourself

00:41:37.200 | because you're understanding that people also understand

00:41:40.520 | that they can influence you.

00:41:42.640 | And it's just kind of back and forth,

00:41:44.160 | there's this negotiation, which is really,

00:41:46.680 | really talking about different equilibria of a game.

00:41:50.480 | The very basic way to solve coordination

00:41:53.160 | is to just make predictions about what people will do

00:41:55.840 | and then stay out of their way.

00:41:57.800 | And that's hard for the reasons we talked about,

00:41:59.880 | which is how you have to understand people's intentions,

00:42:02.800 | implicitly, explicitly, who knows,

00:42:05.320 | but somehow you have to get enough of an understanding

00:42:07.160 | of that to be able to anticipate what happens next.

00:42:10.680 | And so that's challenging.

00:42:11.720 | But then it's further challenged by the fact

00:42:13.600 | that people change what they do based on what you do,

00:42:17.320 | 'cause they don't plan in isolation either, right?

00:42:20.960 | So when you see cars trying to merge on a highway

00:42:24.720 | and not succeeding, one of the reasons this can be

00:42:27.640 | is because they look at traffic that keeps coming,

00:42:32.640 | they predict what these people are planning on doing,

00:42:35.640 | which is to just keep going,

00:42:37.440 | and then they stay out of the way

00:42:38.960 | 'cause there's no feasible plan, right?

00:42:41.920 | Any plan would actually intersect

00:42:44.320 | with one of these other people.

00:42:46.480 | So that's bad, so you get stuck there.

00:42:49.080 | So now, if you start thinking about it as,

00:42:52.840 | "No, no, no, actually, these people change what they do

00:42:57.840 | depending on what the car does."

00:42:59.560 | Like if the car actually tries to inch itself forward,

00:43:03.400 | they might actually slow down and let the car in.

00:43:08.040 | And now taking advantage of that,

00:43:10.400 | well, that's kind of the next level.

00:43:13.360 | We call this like this underactuated system idea

00:43:16.040 | where it's like an underactuated system in robotics,

00:43:18.480 | but you're influenced these other degrees of freedom,

00:43:23.080 | but you don't get to decide what they do.

00:43:26.000 | - Somewhere I've seen you mention

00:43:27.440 | the human element in this picture as underactuated.

00:43:31.760 | So you understand underactuated robotics

00:43:35.000 | is that you can't fully control the system.

00:43:40.000 | - You can't go in arbitrary directions

00:43:43.400 | in the configuration space.

00:43:44.840 | - Under your control.

00:43:46.320 | - Yeah, it's a very simple way of underactuation

00:43:48.800 | where basically there's literally these degrees of freedom

00:43:51.040 | that you can control,

00:43:52.000 | and these are the things that you can't,

00:43:53.480 | but you influence them.

00:43:54.320 | And I think that's the important part

00:43:55.840 | is that they don't do whatever,

00:43:58.160 | regardless of what you do,

00:43:59.440 | that what you do influences what they end up doing.

00:44:02.280 | - I just also like the poetry

00:44:04.160 | of calling human and robot interaction

00:44:06.160 | an underactuated robotics problem.

00:44:09.400 | And you also mentioned sort of nudging.

00:44:11.880 | It seems that they're, I don't know,

00:44:14.280 | I think about this a lot in the case of pedestrians.

00:44:16.640 | I've collected hundreds of hours of videos.

00:44:18.720 | I like to just watch pedestrians.

00:44:21.120 | And it seems that--

00:44:22.880 | - It's a funny hobby.

00:44:24.280 | - Yeah, it's weird.

00:44:25.760 | 'Cause I learn a lot.

00:44:27.240 | I learn a lot about myself, about our human behavior

00:44:30.480 | from watching pedestrians,

00:44:32.920 | watching people in their environment.

00:44:35.240 | Basically, crossing the street

00:44:37.880 | is like you're putting your life on the line.

00:44:40.320 | I don't know, tens of millions of time in America every day

00:44:44.520 | is people are just like playing this weird game of chicken

00:44:48.920 | when they cross the street,

00:44:49.920 | especially when there's some ambiguity

00:44:51.880 | about the right of way.

00:44:54.280 | That has to do either with the rules of the road

00:44:56.640 | or with the general personality of the intersection

00:44:59.840 | based on the time of day and so on.

00:45:02.320 | And this nudging idea,

00:45:04.080 | it seems that people don't even nudge.

00:45:07.320 | They just aggressively make a decision.

00:45:10.280 | Somebody, there's a runner that gave me this advice.

00:45:13.200 | I sometimes run in the street,

00:45:16.560 | not in the street, on the sidewalk.

00:45:18.800 | And he said that if you don't make eye contact with people

00:45:22.240 | when you're running, they will all move out of your way.

00:45:25.680 | - It's called civil inattention.

00:45:27.480 | - Civil inattention, that's a thing.

00:45:29.200 | Oh, wow, I need to look this up, but it works.

00:45:32.040 | What is that?

00:45:32.880 | My sense was if you communicate confidence in your actions

00:45:37.880 | that you're unlikely to deviate

00:45:41.280 | from the action that you're following,

00:45:43.120 | that's a really powerful signal to others

00:45:44.960 | that they need to plan around your actions

00:45:47.200 | as opposed to nudging where you're sort of hesitantly.

00:45:49.860 | The hesitation might communicate that you're still

00:45:55.120 | in the dance, in the game that they can influence

00:45:57.480 | with their own actions.

00:45:59.480 | I've recently had a conversation with Jim Keller,

00:46:03.240 | who's a sort of this legendary chip architect,

00:46:08.240 | but he also led the autopilot team for a while.

00:46:12.280 | And his intuition, that driving is fundamentally still

00:46:17.200 | like a ballistics problem.

00:46:18.880 | Like you can ignore the human element,

00:46:22.220 | that it's just not hitting things.

00:46:24.040 | And you can kind of learn the right dynamics required

00:46:27.160 | to do the merger and all those kinds of things.

00:46:29.840 | And then my sense is, and I don't know if I can provide

00:46:32.600 | sort of definitive proof of this,

00:46:34.960 | but my sense is like an order of magnitude

00:46:38.040 | or more difficult when humans are involved.

00:46:41.540 | Like it's not simply object collision avoidance problem.

00:46:46.540 | Where does your intuition,

00:46:49.240 | of course, nobody knows the right answer here,

00:46:51.020 | but where does your intuition fall on the difficulty,

00:46:54.360 | fundamental difficulty of the driving problem,

00:46:57.040 | when humans are involved?

00:46:58.800 | - Yeah, good question.

00:47:00.400 | I have many opinions on this.

00:47:02.200 | Imagine downtown San Francisco.

00:47:07.320 | - Yeah. - Yeah.

00:47:08.920 | It's crazy busy, everything.

00:47:10.760 | Okay, now take all the humans out.

00:47:12.840 | No pedestrians, no human driven vehicles, no cyclists,

00:47:16.800 | no people on little electric scooters zipping around,

00:47:19.440 | nothing.

00:47:20.280 | I think we're done.

00:47:22.000 | I think driving at that point is done.

00:47:23.840 | We're done.

00:47:25.040 | There's nothing really that still needs to be solved

00:47:28.320 | about that.

00:47:29.160 | - Well, let's pause there.

00:47:30.640 | I think I agree with you.

00:47:33.440 | And I think a lot of people that will hear

00:47:35.320 | will agree with that.

00:47:37.440 | But we need to sort of internalize that idea.

00:47:41.700 | So what's the problem there?

00:47:42.960 | 'Cause we may not quite yet be done with that.

00:47:45.320 | 'Cause a lot of people kind of focus

00:47:46.920 | on the perception problem.

00:47:48.240 | A lot of people kind of map autonomous driving

00:47:52.880 | into how close are we to solving,

00:47:55.760 | being able to detect all the drivable area,

00:48:00.120 | the objects in the scene.

00:48:01.560 | Do you see that as a, how hard is that problem?

00:48:06.160 | So your intuition there behind your statement was,

00:48:10.240 | we might have not solved it yet,

00:48:11.560 | but we're close to solving basically

00:48:13.480 | the perception problem.

00:48:14.560 | - I think the perception problem,

00:48:16.760 | I mean, and by the way, a bunch of years ago,

00:48:19.360 | this would not have been true.

00:48:21.520 | And a lot of issues in the space were coming

00:48:24.640 | from the fact that, oh, we don't really,

00:48:27.080 | we don't know what's where.

00:48:29.380 | But I think it's fairly safe to say that at this point,

00:48:33.800 | although you could always improve on things and all of that,

00:48:37.020 | you can drive through downtown San Francisco

00:48:38.920 | if there are no people around.

00:48:40.440 | There's no really perception issues

00:48:42.520 | standing in your way there.

00:48:44.960 | I think perception is hard, but yeah,

00:48:46.920 | we've made a lot of progress on the perceptions

00:48:49.240 | and I don't undermine the difficulty of the problem.

00:48:50.960 | I think everything about robotics

00:48:52.240 | is really difficult, of course.

00:48:54.520 | I think that the planning problem, the control problem,

00:48:58.440 | all very difficult, but I think what makes it really-

00:49:02.240 | - Is the humans.

00:49:03.080 | - Yeah, it might be, I mean,

00:49:05.480 | and I picked downtown San Francisco,

00:49:07.280 | adapting to, well, now it's snowing,

00:49:11.600 | now it's no longer snowing, now it's slippery in this way,

00:49:14.120 | now it's the dynamics part, I could imagine being,

00:49:19.780 | being still somewhat challenging, but-

00:49:24.060 | - No, the thing that I think worries us

00:49:25.980 | and our intuition's not good there

00:49:27.700 | is the perception problem at the edge cases.

00:49:30.840 | Downtown San Francisco, the nice thing,

00:49:35.300 | it's not actually, it may not be a good example because-

00:49:38.860 | - 'Cause you know what you're getting.

00:49:41.300 | Well, there's like crazy construction zones

00:49:43.180 | and all of that. - Yeah, but the thing is

00:49:44.420 | you're traveling at slow speeds,

00:49:46.140 | so it doesn't feel dangerous.

00:49:47.820 | To me, what feels dangerous is highway speeds

00:49:51.020 | when everything is, to us humans, super clear.

00:49:54.580 | - Yeah, I'm assuming LIDAR here, by the way.

00:49:57.060 | I think it's kind of irresponsible to not use LIDAR.

00:49:59.740 | That's just my personal opinion.

00:50:01.340 | (laughing)

00:50:03.460 | I mean, depending on your use case,

00:50:04.580 | but I think if you have the opportunity to use LIDAR,

00:50:08.740 | well, good, and in a lot of cases you might not.

00:50:11.100 | - Good, your intuition makes more sense now.

00:50:13.660 | So you don't think vision-

00:50:15.220 | - I really just don't know enough to say,

00:50:18.060 | well, vision alone, what's like,

00:50:21.500 | there's a lot of, how many cameras do you have?

00:50:24.220 | Is it how are you using them?

00:50:25.700 | I don't know. - Yeah, there's details.

00:50:27.300 | - There's all sorts of details.

00:50:28.420 | I imagine there's stuff that's really hard to actually see.

00:50:31.860 | How do you deal with glare?

00:50:33.820 | Exactly what you were saying,

00:50:34.660 | stuff that people would see that you don't.

00:50:37.740 | I think I have more of my intuition comes from systems

00:50:40.660 | that can actually use LIDAR as well.

00:50:44.260 | - Yeah, and until we know for sure,

00:50:45.780 | it makes sense to be using LIDAR.

00:50:47.980 | That's kind of the safety focus.

00:50:50.020 | But then there's sort of the,

00:50:52.220 | I also sympathize with the Elon Musk statement

00:50:55.820 | of LIDAR is a crutch.

00:50:57.380 | It's a fun notion to think that the things that work today

00:51:04.580 | is a crutch for the invention of the things

00:51:07.980 | that will work tomorrow, right?

00:51:12.220 | It's kind of true in the sense that

00:51:14.860 | we wanna stick to the comfort zone.

00:51:17.300 | You see this in academic and research settings all the time.

00:51:19.900 | The things that work force you to not explore outside,

00:51:24.180 | think outside the box.

00:51:25.340 | I mean, that happens all the time.

00:51:26.780 | The problem is in the safety critical systems,

00:51:29.020 | you kinda wanna stick with the things that work.

00:51:32.060 | So it's an interesting and difficult trade-off

00:51:34.900 | in the case of real world

00:51:37.180 | sort of safety critical robotic systems.

00:51:41.820 | - So your intuition is, just to clarify.

00:51:46.060 | - Yes.

00:51:47.140 | - How, I mean, how hard is this human element?

00:51:50.180 | Like how hard is driving

00:51:52.700 | when this human element is involved?

00:51:55.080 | Are we years, decades away from solving it?

00:52:00.020 | But perhaps actually the year isn't the thing I'm asking.

00:52:03.820 | It doesn't matter what the timeline is.

00:52:05.440 | But do you think we're,

00:52:08.020 | how many breakthroughs are we away from

00:52:11.660 | in solving the human-robot interaction problem

00:52:13.580 | to get this right?

00:52:15.580 | - I think it, in a sense, it really depends.

00:52:20.460 | I think that, we were talking about how,

00:52:23.980 | well, look, it's really hard

00:52:25.100 | because antisemite we will do is hard.

00:52:27.020 | And on top of that, playing the game is hard.

00:52:30.260 | But I think we sort of have the fundamental,

00:52:35.260 | some of the fundamental understanding for that.

00:52:38.580 | And then you already see that these systems

00:52:40.980 | are being deployed in the real world,

00:52:44.360 | even driverless, like there's, I think now,

00:52:49.300 | a few companies that don't have a driver in the car

00:52:54.300 | in some small areas.

00:52:56.660 | - I got a chance to, I went to Phoenix

00:52:59.420 | and I shot a video with Waymo.

00:53:02.620 | I need to get that video out.

00:53:04.620 | People have been giving me slack.

00:53:06.620 | But there's incredible engineering work being done there.

00:53:09.220 | And it's one of those other seminal moments

00:53:11.100 | for me in my life to be able to,

00:53:13.180 | it sounds silly, but to be able to drive without a,

00:53:15.580 | without a ride, sorry, without a driver in the seat.

00:53:19.300 | I mean, that was an incredible robotics.

00:53:22.340 | I was driven by a robot

00:53:24.220 | without being able to take over,

00:53:27.820 | without being able to take the steering wheel.

00:53:31.140 | That's a magical, that's a magical moment.

00:53:33.500 | So in that regard, in those domains,

00:53:35.500 | at least for like Waymo,

00:53:36.540 | they're solving that human,

00:53:39.940 | there's, I mean, they're going,

00:53:42.460 | I mean, it felt fast because you're like freaking out.

00:53:45.100 | At first, this is my first experience,

00:53:47.420 | but it's going like the speed limit, right?

00:53:49.060 | 30, 40, whatever it is.

00:53:51.180 | And there's humans and it deals with them quite well.

00:53:53.820 | It detects them, it negotiates the intersections,

00:53:56.980 | left turns and all that.

00:53:58.180 | So at least in those domains, it's solving them.

00:54:01.220 | The open question for me is like,

00:54:03.340 | how quickly can we expand?

00:54:05.980 | You know, that's the,

00:54:07.260 | you know, outside of the weather conditions,

00:54:10.100 | all those kinds of things,

00:54:11.060 | how quickly can we expand to like cities like San Francisco?

00:54:14.580 | - Yeah, and I wouldn't say that it's just, you know,

00:54:17.140 | now it's just pure engineering and it's probably the,

00:54:20.260 | I mean, and by the way,

00:54:22.060 | I'm speaking kind of very generally here as hypothesizing,

00:54:26.300 | but I think that there are successes

00:54:31.220 | and yet no one is everywhere out there.

00:54:34.380 | So that seems to suggest that things can be expanded

00:54:38.860 | and can be scaled and we know how to do a lot of things,

00:54:41.700 | but there's still probably, you know,

00:54:44.100 | new algorithms or modified algorithms

00:54:46.780 | that you still need to put in there

00:54:49.220 | as you learn more and more about new challenges

00:54:53.460 | that you get faced with.

00:54:55.780 | - How much of this problem do you think can be learned

00:54:58.280 | through end-to-end?

00:54:59.120 | Is the success of machine learning

00:55:00.700 | and reinforcement learning,

00:55:02.740 | how much of it can be learned from sort of data from scratch

00:55:06.020 | and how much, which most of the success

00:55:08.460 | of autonomous vehicle systems have a lot of heuristics

00:55:12.500 | and rule-based stuff on top,

00:55:14.380 | like human expertise injected,

00:55:18.220 | forced into the system to make it work.

00:55:20.820 | What's your sense?

00:55:21.980 | How much, what will be the role of learning

00:55:26.100 | in the near term and the long term?

00:55:28.180 | - I think on the one hand

00:55:32.340 | that learning is inevitable here, right?

00:55:37.340 | I think on the other hand,

00:55:38.780 | that when people characterize the problem

00:55:40.420 | as it's a bunch of rules that some people wrote down

00:55:43.660 | versus it's an end-to-end RL system or imitation learning,

00:55:49.500 | then maybe there's kind of something missing

00:55:52.500 | from maybe that's more.

00:55:57.180 | So for instance, I think a very, very useful tool

00:56:02.180 | in this sort of problem,

00:56:04.500 | both in how to generate the car's behavior

00:56:07.500 | and robots in general,

00:56:08.980 | and how to model human beings

00:56:11.860 | is actually planning, search optimization, right?

00:56:15.140 | So robotics is a sequential decision-making problem.

00:56:18.420 | And when a robot can figure out on its own

00:56:26.460 | how to achieve its goal without hitting stuff

00:56:29.020 | and all that stuff, right?

00:56:30.100 | All the good stuff for motion planning 101,

00:56:33.140 | I think of that as very much AI,

00:56:36.340 | not this is some rule or something.

00:56:38.220 | There's nothing rule-based around that, right?

00:56:40.420 | It's just you're searching through a space

00:56:42.060 | and figuring out, or you're optimizing through a space

00:56:43.780 | and figure out what seems to be the right thing to do.

00:56:46.460 | And I think it's hard to just do that

00:56:49.940 | because you need to learn models of the world.

00:56:52.580 | And I think it's hard to just do the learning part

00:56:55.780 | where you don't bother with any of that

00:56:58.820 | because then you're saying, well, I could do imitation,

00:57:01.780 | but then when I go off distribution, I'm really screwed.

00:57:04.700 | Or you can say, I can do reinforcement learning,

00:57:08.380 | which adds a lot of robustness,

00:57:09.900 | but then you have to do either reinforcement learning

00:57:12.740 | in the real world, which sounds a little challenging

00:57:15.380 | or that trial and error, you know,

00:57:18.460 | or you have to do reinforcement learning in simulation.

00:57:21.140 | And then that means, well, guess what?

00:57:23.140 | You need to model things, at least to model people,

00:57:27.260 | model the world enough that you, you know,

00:57:30.180 | whatever policy you get of that is like actually fine

00:57:32.660 | to roll out in the world

00:57:34.900 | and do some additional learning there.

00:57:36.460 | So.

00:57:37.580 | - Do you think simulation, by the way,

00:57:38.700 | just a quick tangent has a role

00:57:42.100 | in the human-robot interaction space?

00:57:44.260 | Like, is it useful?

00:57:46.300 | It seems like humans, everything we've been talking about

00:57:48.500 | are difficult to model and simulate.

00:57:51.420 | Do you think simulation has a role in this space?

00:57:53.660 | - I do.

00:57:54.500 | I think so because you can take models

00:57:58.860 | and train with them ahead of time, for instance.

00:58:03.860 | You can.

00:58:06.100 | - But the models, sorry to interrupt,

00:58:07.700 | the models are sort of human constructed or learned?

00:58:10.500 | - I think they have to be a combination

00:58:14.940 | because if you get some human data

00:58:20.020 | and then you say, this is how,

00:58:21.180 | this is going to be my model of the person,

00:58:22.980 | what are for simulation and training

00:58:24.460 | or for just deployment time?

00:58:25.820 | And that's what I'm planning with

00:58:27.220 | as my model of how people work.

00:58:29.140 | Regardless, if you take some data

00:58:31.660 | and you don't assume anything else and you just say, okay,

00:58:35.780 | this is some data that I've collected.

00:58:39.220 | Let me fit a policy to help people work based on that.

00:58:42.580 | What tends to happen is you collected some data

00:58:45.100 | and some distribution,

00:58:46.780 | and then now your robot sort of computes

00:58:51.180 | a best response to that, right?

00:58:52.940 | It's sort of like, what should I do

00:58:54.460 | if this is how people work?

00:58:56.260 | And easily goes off of distribution

00:58:58.580 | where that model that you've built of the human

00:59:01.020 | completely sucks because out of distribution,

00:59:03.460 | you have no idea, right?

00:59:05.100 | If you think of all the possible policies

00:59:07.860 | and then you take only the ones that are consistent

00:59:10.940 | with the human data that you've observed,

00:59:13.020 | that still leads a lot of,

00:59:14.460 | a lot of things could happen outside of that distribution

00:59:17.580 | where you're confident and you know what's going on.

00:59:19.860 | - By the way, that's a,

00:59:21.820 | I mean, I've gotten used to this terminology

00:59:23.780 | of out of distribution,

00:59:25.380 | but it's such a machine learning terminology

00:59:29.020 | because it kind of assumes,

00:59:30.820 | so distribution is referring to the data that you've seen.

00:59:35.820 | - The set of states that you encountered.

00:59:38.100 | - They've encountered so far at training time.

00:59:40.740 | But it kind of also implies that there's a nice,

00:59:44.020 | like statistical model that represents that data.

00:59:47.460 | So out of distribution feels like, I don't know,

00:59:50.340 | it raises to me philosophical questions

00:59:54.420 | of how we humans reason out of distribution,

00:59:58.700 | reason about things that are completely,

01:00:01.660 | we haven't seen before.

01:00:03.300 | - And so, and what we're talking about here is

01:00:05.820 | how do we reason about what other people do

01:00:09.220 | in situations where we haven't seen them?

01:00:11.500 | And somehow we just magically navigate that.

01:00:14.940 | I can anticipate what will happen in situations

01:00:18.100 | that are even novel in many ways.

01:00:21.740 | And I have a pretty good intuition for it.

01:00:23.060 | I don't always get it right,

01:00:24.140 | but, and I might be a little uncertain and so on.

01:00:26.580 | I think it's this that if you just rely on data,

01:00:31.580 | there's just too many possibilities,

01:00:36.060 | there's too many policies out there that fit the data.

01:00:38.020 | And by the way, it's not just data,

01:00:39.380 | it's really kind of history of state

01:00:40.700 | to really be able to anticipate what the person will do.

01:00:43.020 | It kind of depends on what they've been doing so far,

01:00:45.180 | 'cause that's the information you need

01:00:46.980 | to kind of, at least implicitly, sort of say,

01:00:49.500 | oh, this is the kind of person that this is,

01:00:51.300 | this is probably what they're trying to do.

01:00:53.020 | So anyway, it's like you're trying to map history of states

01:00:55.180 | to actions, there's many mappings.

01:00:56.660 | - And history meaning like the last few seconds

01:00:59.860 | or the last few minutes or the last few months.

01:01:02.540 | - Who knows, who knows how much you need, right?

01:01:04.700 | In terms of if your state is really like

01:01:06.500 | the positions of everything or whatnot and velocities.

01:01:09.660 | Who knows how much you need?

01:01:10.540 | And then there's so many mappings.

01:01:14.660 | And so now you're talking about

01:01:16.580 | how do you regularize that space?

01:01:17.940 | What priors do you impose?

01:01:20.180 | Or what's the inductive bias?

01:01:21.420 | So there's all very related things to think about it.

01:01:24.540 | Basically, what are assumptions that we should be making

01:01:28.860 | such that these models actually generalize

01:01:32.620 | outside of the data that we've seen?

01:01:34.580 | And now you're talking about, well, I don't know,

01:01:37.820 | what can you assume?

01:01:38.660 | Maybe you can assume that people actually have intentions

01:01:42.020 | and that's what drives their actions.

01:01:43.820 | Maybe that's the right thing to do

01:01:46.580 | when you haven't seen data very nearby

01:01:49.620 | that tells you otherwise.

01:01:51.020 | I don't know, it's a very open question.

01:01:53.380 | - Do you think, so one of the dreams

01:01:55.620 | of artificial intelligence was to solve

01:01:58.180 | common sense reasoning, whatever the heck that means.

01:02:01.180 | Do you think something like common sense reasoning

01:02:04.980 | has to be solved in part to be able to solve this dance

01:02:09.060 | of human-robot interaction in the driving space

01:02:12.260 | or human-robot interaction in general?

01:02:14.980 | Do you have to be able to reason about these kinds

01:02:16.900 | of common sense concepts of physics,

01:02:21.900 | of all the things we've been talking about humans,

01:02:27.140 | I don't even know how to express them with words,

01:02:30.580 | but the basics of human behavior, of fear of death.

01:02:34.620 | So to me, it's really important to encode

01:02:38.020 | in some kind of sense, maybe not, maybe it's implicit,

01:02:41.860 | but it feels it's important to explicitly encode

01:02:44.700 | the fear of death, that people don't wanna die.

01:02:48.160 | Because it seems silly, but the game of chicken

01:02:54.200 | that involves with the pedestrian crossing the street

01:02:59.780 | is playing with the idea of mortality.

01:03:02.980 | Like we really don't wanna die.

01:03:04.220 | It's not just like a negative reward.

01:03:06.080 | I don't know, it just feels like all these human concepts

01:03:10.060 | have to be encoded.

01:03:11.140 | Do you share that sense or is this a lot simpler

01:03:14.300 | than I'm making out to be?

01:03:15.820 | - I think it might be simpler.

01:03:17.060 | And I'm the person who likes to complicate things.

01:03:18.860 | I think it might be simpler than that.

01:03:21.140 | Because it turns out, for instance,

01:03:24.220 | if you say model people in the very,

01:03:29.220 | I'll call it traditional way.

01:03:30.860 | I don't know if it's fair to look at it

01:03:32.260 | as a traditional way, but calling people as,

01:03:35.380 | okay, they're irrational somehow,

01:03:37.900 | the utilitarian perspective.

01:03:40.100 | Well, in that, once you say that,

01:03:45.100 | you automatically capture that they have an incentive

01:03:48.980 | to keep on being.

01:03:50.260 | Stuart likes to say,

01:03:53.660 | you can't fetch the coffee if you're dead.

01:03:55.860 | - Stuart Russell, by the way.

01:03:58.420 | That's a good line.

01:04:01.340 | So when you're sort of treating agents

01:04:05.580 | as having these objectives, these incentives,

01:04:10.220 | humans or artificial,

01:04:12.660 | you're kind of implicitly modeling

01:04:14.860 | that they'd like to stick around

01:04:16.940 | so that they can accomplish those goals.

01:04:20.140 | So I think in a sense,

01:04:22.780 | maybe that's what draws me so much

01:04:24.180 | to the rationality framework,

01:04:25.500 | even though it's so broken,

01:04:26.900 | it's been such a useful perspective.

01:04:30.660 | And like we were talking about earlier,

01:04:32.220 | what's the alternative?

01:04:33.060 | I give up and go home,

01:04:33.940 | or I just use complete black boxes,

01:04:36.060 | but then I don't know what to assume

01:04:37.220 | out of distribution, I come back to this.

01:04:40.060 | It's just, it's been a very fruitful way

01:04:42.620 | to think about the problem

01:04:43.940 | in a very more positive way, right?

01:04:46.980 | It's just people aren't just crazy.

01:04:49.100 | Maybe they make more sense than we think.

01:04:51.460 | But I think we also have to somehow be ready

01:04:55.300 | for it to be wrong, be able to detect

01:04:58.260 | when these assumptions aren't holding,

01:05:00.460 | be all of that stuff.

01:05:02.860 | - Let me ask sort of another small side of this,

01:05:06.620 | that we've been talking about

01:05:07.780 | the pure autonomous driving problem,

01:05:09.940 | but there's also relatively successful systems

01:05:13.700 | already deployed out there

01:05:15.260 | in what you may call like level two autonomy

01:05:18.780 | or semi-autonomous vehicles,

01:05:20.660 | whether that's Tesla autopilot,

01:05:23.380 | work quite a bit with Cadillac Super Guru system,

01:05:27.460 | which has a driver facing camera

01:05:30.380 | that detects your state.

01:05:31.300 | There's a bunch of basically lane centering systems.

01:05:35.420 | What's your sense about this kind of way

01:05:39.740 | of dealing with the human robot interaction problem

01:05:43.140 | by having a really dumb robot

01:05:45.260 | and relying on the human to help the robot out

01:05:50.220 | to keep them both alive?

01:05:51.800 | Is that from the research perspective,

01:05:57.340 | how difficult is that problem?

01:05:59.260 | And from a practical deployment perspective,

01:06:02.220 | is that a fruitful way to approach

01:06:05.900 | this human robot interaction problem?

01:06:08.060 | - I think what we have to be careful about there

01:06:12.100 | is to not, it seems like some of these systems,

01:06:16.180 | not all, are making this underlying assumption

01:06:19.820 | that if, so I'm a driver

01:06:23.780 | and I'm now really not driving, but supervising

01:06:26.500 | and my job is to intervene, right?

01:06:28.860 | And so we have to be careful with this assumption

01:06:31.220 | that if I'm supervising,

01:06:35.620 | I will be just as safe as when I'm driving.

01:06:41.580 | Like that I will, if I wouldn't get

01:06:44.660 | into some kind of accident if I'm driving,

01:06:48.460 | I will be able to avoid that accident

01:06:50.820 | when I'm supervising too.

01:06:52.140 | And I think I'm concerned about this assumption

01:06:55.020 | from a few perspectives.

01:06:56.780 | So from a technical perspective,

01:06:58.380 | it's that when you let something kind of take control

01:07:01.340 | and do its thing, and it depends on what that thing is,

01:07:03.740 | obviously, and how much it's taking control

01:07:05.420 | and what things are you trusting it to do.

01:07:07.860 | But if you let it do its thing and take control,

01:07:11.820 | it will go to what we might call off policy

01:07:14.980 | from the person's perspective state.

01:07:16.700 | So states that the person wouldn't actually

01:07:18.380 | find themselves in if they were the ones driving.

01:07:20.820 | And the assumption that the person functions

01:07:24.020 | just as well there as they function in the states

01:07:26.180 | that they would normally encounter

01:07:27.980 | is a little questionable.

01:07:29.980 | Now, another part is the kind of the human factors

01:07:34.020 | side of this, which is that, I don't know about you,

01:07:38.260 | but I think I definitely feel like I'm experiencing things

01:07:42.060 | very differently when I'm actively engaged in the task

01:07:45.260 | versus when I'm a passive observer.

01:07:47.020 | Even if I try to stay engaged, right,

01:07:49.340 | it's very different than when I'm actually

01:07:51.060 | actively making decisions.

01:07:53.500 | And you see this in life in general,

01:07:55.420 | like you see students who are actively trying

01:07:58.300 | to come up with the answer, learn the thing better

01:08:00.860 | than when they're passively told the answer.

01:08:02.940 | I think that's somewhat related.

01:08:04.260 | And I think people have studied this

01:08:05.780 | in human factors for airplanes.

01:08:07.540 | And I think it's actually fairly established

01:08:10.140 | that these two are not the same.

01:08:12.100 | - Yeah, on that point, because I've gotten

01:08:14.220 | a huge amount of heat on this and I stand by it.

01:08:17.020 | - Okay.

01:08:17.860 | - 'Cause I know the human factors community well.

01:08:21.900 | And the work here is really strong.

01:08:23.940 | And there's many decades of work

01:08:26.220 | showing exactly what you're saying.

01:08:28.220 | Nevertheless, I've been continuously surprised

01:08:30.860 | that much of the predictions of that work

01:08:32.940 | has been wrong in what I've seen.

01:08:35.340 | So what we have to do,

01:08:36.980 | I still agree with everything you said,

01:08:40.300 | but we have to be a little bit more open-minded.

01:08:45.300 | So I'll tell you, there's a few surprising things

01:08:49.460 | that super, like everything you said to the word

01:08:52.900 | is actually exactly correct.

01:08:54.820 | But it doesn't say, what you didn't say

01:08:57.860 | is that these systems are,

01:09:00.140 | you said you can't assume a bunch of things,

01:09:02.460 | but we don't know if these systems are fundamentally unsafe.

01:09:06.700 | That's still unknown.

01:09:07.920 | There's a lot of interesting things.

01:09:11.060 | Like I'm surprised by the fact, not the fact,

01:09:15.900 | that what seems to be anecdotally

01:09:17.940 | from large data collection that we've done,

01:09:21.180 | but also from just talking to a lot of people,

01:09:23.980 | when in the supervisory role of semi-autonomous systems

01:09:27.140 | that are sufficiently dumb, at least,

01:09:29.500 | which is, that might be a key element,

01:09:33.580 | is the systems have to be dumb.

01:09:35.200 | The people are actually more energized as observers.

01:09:38.700 | So they're actually better,

01:09:40.620 | they're better at observing the situation.

01:09:43.420 | So there might be cases in systems,

01:09:46.540 | if you get the interaction right,

01:09:48.340 | where you, as a supervisor,

01:09:50.900 | will do a better job with the system together.

01:09:53.620 | - I agree.

01:09:54.460 | I think that is actually really possible.

01:09:56.780 | I guess mainly I'm pointing out that if you do it naively,

01:10:00.080 | you're implicitly assuming something,

01:10:02.180 | that assumption might actually really be wrong.

01:10:04.480 | But I do think that if you explicitly think about

01:10:07.780 | what the agent should do

01:10:10.740 | so that the person still stays engaged,

01:10:13.180 | so that you essentially empower the person

01:10:16.420 | to more than they could,

01:10:17.540 | that's really the goal, right?

01:10:19.060 | Is you still have a driver,

01:10:20.260 | so you want to empower them to be so much better

01:10:25.260 | than they would be by themselves.

01:10:27.020 | And that's different, it's a very different mindset

01:10:29.740 | than I want them to basically not drive.

01:10:32.600 | (laughs)

01:10:34.680 | But be ready to sort of take over.

01:10:39.420 | - So one of the interesting things we've been talking about

01:10:42.340 | is the rewards, that they seem to be fundamental

01:10:46.500 | to the way robots behaves.

01:10:49.220 | So broadly speaking,

01:10:52.460 | we've been talking about utility functions,

01:10:54.100 | so could you comment on how do we approach

01:10:56.940 | the design of reward functions?

01:10:59.620 | Like how do we come up with good reward functions?

01:11:02.620 | - Well, really good question,

01:11:05.140 | because the answer is we don't.

01:11:08.620 | (laughs)

01:11:10.880 | This was, you know, I used to think,

01:11:13.540 | I used to think about how,

01:11:16.460 | well, it's actually really hard

01:11:17.780 | to specify rewards for interaction,

01:11:20.540 | because it's really supposed to be what the people want,

01:11:24.120 | and then you really, you know,

01:11:24.960 | we talked about how you have to customize

01:11:26.540 | what you want to do to the end user.

01:11:30.660 | But I kind of realized that

01:11:35.060 | even if you take the interactive component away,

01:11:37.980 | it's still really hard to design reward functions.

01:11:42.640 | So what do I mean by that?

01:11:43.740 | I mean, if we assume this sort of AI paradigm

01:11:47.340 | in which there's an agent,

01:11:49.500 | and his job is to optimize some objectives,

01:11:52.540 | some reward, utility, loss, whatever, cost.

01:11:55.600 | If you write it out,

01:11:59.460 | or maybe it's a set depending on the situation,

01:12:01.140 | or whatever it is,

01:12:02.280 | if you write it out and then you deploy the agent,

01:12:06.900 | you'd want to make sure that whatever you specified

01:12:10.180 | incentivizes the behavior you want from the agent

01:12:14.760 | in any situation that the agent will be faced with, right?

01:12:18.580 | So I do motion planning on my robot arm,

01:12:22.000 | I specify some cost function,

01:12:24.200 | like, you know, this is how far away you should try to stay,

01:12:28.000 | so much it matters to stay away from people,

01:12:29.480 | and this is how much it matters

01:12:30.680 | to be able to be efficient, and blah, blah, blah, right?

01:12:33.860 | I need to make sure that whatever I specified,

01:12:36.480 | those constraints or trade-offs or whatever they are,

01:12:40.080 | that when the robot goes and solves that problem

01:12:43.260 | in every new situation,

01:12:45.020 | that behavior is the behavior that I want to see.

01:12:47.820 | And what I've been finding is

01:12:50.100 | that we have no idea how to do that.

01:12:52.260 | Basically, what I can do is I can sample,

01:12:56.460 | I can think of some situations

01:12:58.100 | that I think are representative of what the robot will face,

01:13:01.100 | and I can tune and add and tune some reward function

01:13:08.260 | until the optimal behavior is what I want

01:13:11.480 | on those situations,

01:13:13.200 | which, first of all, is super frustrating

01:13:15.720 | because, you know, through the miracle of AI,

01:13:18.960 | we don't have to specify rules for behavior anymore, right?

01:13:22.800 | The, we were saying before,

01:13:24.440 | the robot comes up with the right thing to do,

01:13:26.920 | you plug in the situation,

01:13:28.440 | it optimizes, right, in that situation, it optimizes,

01:13:31.560 | but you have to spend still a lot of time

01:13:34.600 | on actually defining what it is

01:13:37.100 | that that criteria should be,

01:13:38.900 | making sure you didn't forget about 50 bazillion things

01:13:41.320 | that are important

01:13:42.280 | and how they all should be combining together

01:13:44.560 | to tell the robot what's good and what's bad

01:13:46.760 | and how good and how bad.

01:13:48.800 | And so I think this is a lesson that,

01:13:53.660 | I don't know, kind of,

01:13:56.640 | I guess I closed my eyes to it for a while

01:13:59.080 | 'cause I've been, you know,

01:14:00.160 | tuning cost functions for 10 years now,

01:14:02.500 | but it really strikes me that

01:14:07.060 | yeah, we've moved the tuning

01:14:09.540 | and the like designing of features or whatever

01:14:13.220 | from the behavior side

01:14:18.220 | into the reward side.

01:14:19.680 | And yes, I agree that there's way less of it,

01:14:21.980 | but it still seems really hard

01:14:23.980 | to anticipate any possible situation

01:14:26.940 | and make sure you specify a reward function

01:14:30.180 | that when optimized will work well

01:14:32.780 | in every possible situation.

01:14:35.140 | - So you're kind of referring to unintended consequences

01:14:38.560 | or just in general,

01:14:39.640 | any kind of suboptimal behavior

01:14:42.080 | that emerges outside of the things you said,

01:14:44.800 | out of distribution.

01:14:46.480 | - Suboptimal behavior that is, you know, actually optimal.

01:14:49.680 | I mean, this, I guess the idea of unintended consequences,

01:14:51.600 | you know, it's optimal with respect to what you specified,

01:14:53.680 | but it's not what you want.

01:14:55.440 | And there's a difference between those.

01:14:57.520 | - But that's not fundamentally a robotics problem, right?

01:14:59.840 | That's a human problem.

01:15:01.280 | So like- - That's the thing, right?

01:15:03.400 | So there's this thing called Goodhart's Law,

01:15:05.260 | which is you set a metric for an organization

01:15:07.900 | and the moment it becomes a target

01:15:10.860 | that people actually optimize for,

01:15:13.020 | it's no longer a good metric.

01:15:14.660 | - Oh, what's it called?

01:15:15.500 | - Goodhart's Law. - Goodhart's Law.

01:15:17.380 | So the moment you specify a metric,

01:15:20.060 | it stops doing its job.

01:15:21.580 | - Yeah, it stops doing its job.

01:15:23.980 | So there's, yeah, there's such a thing

01:15:25.100 | as over-optimizing for things and, you know,

01:15:27.380 | failing to think ahead of time of all the possible things.

01:15:32.380 | All the possible things that might be important.

01:15:35.600 | And so that's interesting because,

01:15:38.860 | historically I work a lot on reward learning

01:15:41.560 | from the perspective of customizing to the end user,

01:15:44.020 | but it really seems like it's not just the interaction

01:15:48.040 | with the end user that's a problem of the human

01:15:50.880 | and the robot collaborating so that the robot

01:15:53.160 | can do what the human wants, right?

01:15:55.160 | This kind of back and forth, the robot probing,

01:15:57.280 | the person being informative, all of that stuff

01:16:00.200 | might be actually just as applicable

01:16:04.420 | to this kind of maybe new form of human robot interaction,

01:16:07.460 | which is the interaction between the robot

01:16:10.780 | and the expert programmer, roboticist, designer

01:16:14.300 | in charge of actually specifying

01:16:16.260 | what the heck the robot should do,

01:16:18.340 | specifying the task for the robot.

01:16:20.260 | - That's fascinating, that's so cool,

01:16:21.500 | like collaborating on the reward design.

01:16:23.780 | - Right, collaborating on the reward design.

01:16:26.220 | And so what does it mean, right?

01:16:28.100 | What does it, when we think about the problem,

01:16:29.860 | not as someone specifies all of your job is to optimize

01:16:34.460 | and we start thinking about you're in this interaction

01:16:37.660 | and this collaboration, and the first thing that comes up

01:16:41.340 | is when the person specifies a reward,

01:16:45.020 | it's not gospel, it's not like the letter of the law,

01:16:48.780 | it's not the definition of the reward function

01:16:52.140 | you should be optimizing, 'cause they're doing their best,

01:16:54.900 | but they're not some magic perfect oracle.

01:16:57.180 | And the sooner we start understanding that,

01:16:58.780 | I think the sooner we'll get to more robust robots

01:17:02.460 | that function better in different situations.

01:17:06.500 | And then you have kind of say, okay, well,

01:17:08.580 | it's almost like robots are over learning,

01:17:12.780 | over putting too much weight on the reward specified

01:17:16.860 | by definition, and maybe leaving a lot of other information

01:17:21.220 | on the table, like what are other things we could do

01:17:23.380 | to actually communicate to the robot

01:17:25.580 | about what we want them to do

01:17:27.460 | besides attempting to specify a reward function.

01:17:29.580 | - Yeah, you have this awesome, again,

01:17:32.180 | I love the poetry of it, of leaked information.

01:17:34.860 | You mentioned humans leak information about what they want,

01:17:39.940 | leak reward signal for the robot.

01:17:45.020 | So how do we detect these leaks?

01:17:47.780 | - What is that?

01:17:48.620 | Yeah, what are these leaks?

01:17:49.980 | - Well, they're just, I don't know,

01:17:51.860 | those were just recently, so I read it,

01:17:54.100 | I don't know where, from you,

01:17:55.260 | and it's gonna stick with me for a while for some reason,

01:17:58.660 | 'cause it's not explicitly expressed,

01:18:00.980 | it kind of leaks indirectly from our behavior.

01:18:04.540 | - From what we do, yeah, absolutely.

01:18:06.180 | So I think maybe some surprising bits, right?

01:18:11.180 | So we were talking before about, I'm a robot arm,

01:18:14.780 | it needs to move around people,

01:18:17.500 | carry stuff, put stuff away, all of that.

01:18:20.580 | And now imagine that the robot has some initial objective

01:18:25.580 | that the programmer gave it

01:18:28.980 | so they can do all these things functionally,

01:18:30.700 | it's capable of doing that.

01:18:32.460 | And now I noticed that it's doing something

01:18:35.820 | and maybe it's coming too close to me, right?

01:18:39.500 | And maybe I'm the designer, maybe I'm the end user

01:18:41.500 | and this robot is now in my home.

01:18:43.900 | And I push it away.

01:18:47.860 | So I push it away 'cause it's a reaction

01:18:50.580 | to what the robot is currently doing.

01:18:52.420 | And this is what we call physical human-robot interaction.

01:18:55.860 | And now there's a lot of interesting work

01:18:58.500 | on how the heck do you respond

01:18:59.900 | to physical human-robot interaction?

01:19:01.300 | What should the robot do if such an event occurs?

01:19:03.580 | And there's sort of different schools of thought.

01:19:05.020 | Well, you can sort of treat it the control theoretic way

01:19:08.100 | and say, this is a disturbance that you must reject.

01:19:11.220 | You can sort of treat it more kind of heuristically

01:19:15.900 | and say, I'm gonna go into some

01:19:17.100 | like gravity compensation mode

01:19:18.300 | so that I'm easily maneuverable around.

01:19:19.780 | I'm gonna go in the direction that the person pushed me.

01:19:22.260 | And to us, part of realization has been

01:19:27.260 | that that is signal that communicates about the reward

01:19:30.500 | because if my robot was moving in an optimal way

01:19:34.540 | and I intervened, that means that I disagree

01:19:37.780 | with his notion of optimality, right?

01:19:40.260 | Whatever it thinks is optimal is not actually optimal.

01:19:43.540 | And sort of optimization problems aside,

01:19:45.980 | that means that the cost function,

01:19:47.420 | the reward function is incorrect

01:19:51.380 | or at least is not what I want it to be.

01:19:53.540 | - How difficult is that signal to interpret

01:19:58.420 | and make actionable?

01:19:59.420 | So like, 'cause this connects

01:20:00.780 | to our autonomous vehicle discussion

01:20:02.100 | where they're in the semi-autonomous vehicle

01:20:03.940 | or autonomous vehicle when the safety driver

01:20:06.460 | disengages the car.

01:20:08.460 | But they could have disengaged it for a million reasons.

01:20:11.820 | - Yeah, yeah.

01:20:12.660 | So that's true.

01:20:15.060 | Again, it comes back to,

01:20:16.860 | can you structure a little bit your assumptions

01:20:20.460 | about how human behavior relates to what they want?

01:20:23.340 | And you can, one thing that we've done

01:20:26.100 | is literally just treated this external torque

01:20:29.540 | that they applied as, when you take that

01:20:33.020 | and you add it with what the torque the robot was

01:20:35.540 | already applying, that overall action

01:20:38.060 | is probably relatively optimal

01:20:39.700 | in respect to whatever it is that the person wants.

01:20:41.860 | And then that gives you information

01:20:43.100 | about what it is that they want.

01:20:44.380 | So you can learn that people want you

01:20:45.700 | to stay further away from them.

01:20:47.660 | Now, you're right that there might be many things

01:20:49.820 | that explain just that one signal

01:20:51.420 | and that you might need much more data than that

01:20:53.380 | for the person to be able to shape

01:20:55.500 | your reward function over time.

01:20:57.260 | You can also do this info gathering stuff

01:21:00.900 | that we were talking about.

01:21:01.780 | Now that we've done that in that context, just to clarify,

01:21:04.100 | but it's definitely something we thought about

01:21:05.380 | where you can have the robot start acting in a way,

01:21:10.380 | like if there are a bunch of different explanations, right?

01:21:13.420 | It moves in a way where it sees if you correct it

01:21:16.340 | in some other way or not,

01:21:17.580 | and then kind of actually plans its motion

01:21:19.900 | so that it can disambiguate and collect information

01:21:22.700 | about what you want.

01:21:23.820 | Anyway, so that's one way,

01:21:25.980 | that's kind of sort of leaked information,

01:21:27.420 | maybe even more subtle leaked information

01:21:29.260 | is if I just press the E-stop, right?

01:21:32.740 | I just, I'm doing it out of panic

01:21:34.020 | 'cause the robot is about to do something bad.

01:21:36.260 | There's again, information there, right?

01:21:38.460 | Okay, the robot should definitely stop,

01:21:40.780 | but it should also figure out

01:21:42.540 | that whatever it was about to do was not good.

01:21:45.220 | And in fact, it was so not good

01:21:46.740 | that stopping and remaining stopped for a while

01:21:48.940 | was a better trajectory for it

01:21:51.100 | than whatever it is that it was about to do.

01:21:52.780 | And that again is information about what are my preferences?

01:21:55.980 | What do I want?

01:21:57.580 | - Speaking of E-stops,

01:21:59.700 | what are your expert opinions

01:22:03.620 | on the three laws of robotics from Isaac Asimov?

01:22:07.340 | (laughing)

01:22:08.180 | That don't harm humans, obey orders, protect yourself.

01:22:11.300 | I mean, it's such a silly notion,

01:22:13.340 | but I speak to so many people these days,

01:22:15.420 | just regular folks, just, I don't know,

01:22:17.060 | my parents and so on about robotics.

01:22:19.420 | And they kind of operate in that space of,

01:22:21.940 | you know, imagining our future with robots

01:22:25.820 | and thinking what are the ethical,

01:22:28.460 | how do we get that dance right?

01:22:31.500 | I know the three laws might be a silly notion,

01:22:34.020 | but do you think about like

01:22:35.580 | what universal reward functions there might be

01:22:38.980 | that we should enforce on the robots of the future?

01:22:43.980 | Or is that a little too far out?

01:22:47.060 | And it doesn't, or is the mechanism that you just described,

01:22:51.260 | there shouldn't be three laws,

01:22:52.700 | it should be constantly adjusting kind of thing.

01:22:55.180 | - I think it should constantly be adjusting kind of thing.

01:22:57.820 | You know, the issue with the laws is,

01:23:00.100 | I don't even, you know, they're words

01:23:02.620 | and I have to write math

01:23:04.620 | and have to translate them into math.

01:23:06.260 | What does it mean to--

01:23:07.300 | What does harm mean?

01:23:08.260 | What is--

01:23:09.100 | - Obey. - It's not math.

01:23:11.980 | - Obey what, right?

01:23:12.940 | 'Cause we just talked about how

01:23:14.740 | you try to say what you want,

01:23:17.100 | but you don't always get it right.

01:23:19.940 | And you want these machines to do what you want,

01:23:22.580 | not necessarily exactly what you literally,

01:23:24.620 | so you don't want them to take you literally.

01:23:26.660 | You wanna take what you say and interpret it in context.

01:23:31.660 | And that's what we do with the specified rewards.

01:23:33.540 | We don't take them literally anymore from the designer.

01:23:36.740 | We, not we as a community,

01:23:38.540 | we as, you know, some members of my group,

01:23:41.540 | we and some of our collaborators

01:23:45.740 | like Peter Beal and Stuart Russell,

01:23:47.500 | we sort of say, okay, the designer specified this thing,

01:23:52.420 | but I'm gonna interpret it not as,

01:23:55.660 | this is the universal reward function

01:23:57.180 | that I shall always optimize, always and forever,

01:23:59.540 | but as this is good evidence about what the person wants

01:24:05.500 | and I should interpret that evidence

01:24:07.460 | in the context of these situations

01:24:09.460 | that it was specified for.

01:24:11.060 | 'Cause ultimately that's what the designer thought about,

01:24:12.900 | that's what they had in mind.

01:24:14.380 | And really them specifying reward function

01:24:16.860 | that works for me in all these situations

01:24:19.060 | is really kind of telling me

01:24:21.140 | that whatever behavior that incentivizes

01:24:23.220 | must be good behavior respect to the thing

01:24:26.060 | that I should actually be optimizing for.

01:24:28.220 | And so now the robot kind of has uncertainty

01:24:30.420 | about what it is that it should be,

01:24:32.420 | what its reward function is.

01:24:34.380 | And then there's all these additional signals

01:24:36.380 | we've been finding that it can kind of continually learn from

01:24:39.660 | and adapt its understanding of what people want.

01:24:41.740 | Every time the person corrects it, maybe they demonstrate,

01:24:44.820 | maybe they stop, hopefully not, right?

01:24:48.380 | One really, really crazy one is the environment itself.

01:24:53.380 | Like our world, you don't, it's not, you know,

01:24:58.900 | you observe our world and the state of it

01:25:01.540 | and it's not that you're seeing behavior

01:25:03.580 | and you're saying, oh, people are making decisions

01:25:05.220 | that are rational, blah, blah, blah.

01:25:07.020 | But our world is something that we've been acting when,

01:25:12.220 | according to our preferences.

01:25:14.180 | So I have this example where like,

01:25:15.620 | the robot walks into my home

01:25:16.860 | and my shoes are laid down on the floor,

01:25:19.380 | kind of in a line, right?

01:25:21.060 | It took effort to do that.

01:25:23.260 | So even though the robot doesn't see me doing this,

01:25:27.420 | you know, actually aligning the shoes,

01:25:29.860 | it should still be able to figure out

01:25:31.500 | that I want the shoes aligned

01:25:33.180 | because there's no way for them to have magically,

01:25:35.860 | you know, instantiated themselves in that way.

01:25:38.980 | Someone must have actually taken the time to do that.

01:25:43.660 | So it must be important.

01:25:44.580 | - So the environment actually tells, the environment--

01:25:46.980 | - Leaks information.

01:25:47.980 | - Leaks information.

01:25:48.820 | I mean, the environment is the way it is

01:25:50.660 | because humans somehow manipulated it.

01:25:52.860 | So you have to kind of reverse engineer the narrative

01:25:55.700 | that happened to create the environment as it is

01:25:57.780 | and that leaks the preference information.

01:26:00.580 | - Yeah, yeah, yeah, yeah.

01:26:01.420 | And you have to be careful, right?

01:26:03.100 | Because people don't have the bandwidth to do everything.

01:26:06.660 | So just because, you know, my house is messy

01:26:08.100 | doesn't mean that I want it to be messy, right?

01:26:10.820 | But that just, you know,

01:26:12.860 | I didn't put the effort into that.

01:26:14.380 | I put the effort into something else.

01:26:16.220 | So the robot should figure out,

01:26:17.380 | well, that's something else was more important,

01:26:19.180 | but it doesn't mean that, you know,

01:26:20.380 | the house being messy is not.

01:26:21.620 | So it's a little subtle, but yeah, we really think of it,

01:26:24.500 | the state itself is kind of like a choice

01:26:26.740 | that people implicitly made about how they want their world.

01:26:31.980 | - What book or books, technical or fiction or philosophical,

01:26:35.060 | had a, when you like look back, your life had a big impact,

01:26:39.700 | maybe it was a turning point, it was inspiring in some way.

01:26:42.740 | Maybe we're talking about some silly book

01:26:45.780 | that nobody in their right mind would want to read,

01:26:48.700 | or maybe it's a book that you would recommend

01:26:51.740 | to others to read,

01:26:52.660 | or maybe those could be two different recommendations

01:26:54.780 | that, of books that could be useful

01:26:59.120 | for people on their journey.

01:27:01.700 | - When I was in, it's kind of a personal story.

01:27:04.420 | When I was in 12th grade,

01:27:06.580 | I got my hands on a PDF copy in Romania

01:27:11.580 | of Russell Norvig, "AI Modern Approach."

01:27:15.580 | I didn't know anything about AI at that point.

01:27:17.620 | I was, you know, I had watched the movie, "The Matrix."

01:27:21.180 | (laughs)

01:27:22.020 | It was my exposure.

01:27:23.300 | And so I started going through this thing,

01:27:30.140 | and you know, you were asking in the beginning,

01:27:32.100 | what are, you know, it's math and it's algorithms,

01:27:36.180 | what's interesting?

01:27:37.140 | It was so captivating.

01:27:38.820 | This notion that you could just have a goal

01:27:42.580 | and figure out your way

01:27:44.580 | through a kind of a messy, complicated situation,

01:27:48.460 | sort of what sequence of decisions you should make

01:27:50.980 | to autonomously to achieve that goal.

01:27:53.980 | That was so cool.

01:27:55.900 | I'm, you know, I'm biased, but that's a cool book.

01:28:00.060 | - Yeah, you can convert, you know,

01:28:02.500 | the goal of the process of intelligence and mechanize it.

01:28:06.980 | I had the same experience.

01:28:08.060 | I was really interested in psychiatry

01:28:10.060 | and trying to understand human behavior.

01:28:12.020 | And then AI and modern approach is like,

01:28:14.580 | wait, you can just reduce it all to--

01:28:15.900 | - That's how you can write math about human behavior, right?

01:28:18.860 | Yeah, so that's, and I think that stuck with me

01:28:21.540 | 'cause, you know, a lot of what I do,

01:28:23.680 | a lot of what we do in my lab

01:28:26.740 | is write math about human behavior,

01:28:28.820 | combine it with data and learning, put it all together,

01:28:31.500 | give it to robots to plan with,

01:28:32.940 | and, you know, hope that instead of writing rules

01:28:36.140 | for the robots, writing heuristics, designing behavior,

01:28:39.500 | they can actually autonomously come up

01:28:42.100 | with the right thing to do around people.

01:28:44.060 | That's kind of our, you know, that's our signature move.

01:28:46.420 | We wrote some math,

01:28:47.380 | and then instead of kind of hand crafting this

01:28:49.580 | and that and that, the robot figured stuff out

01:28:52.300 | and isn't that cool.

01:28:53.580 | And I think that is the same enthusiasm that I got

01:28:56.260 | from the robot figured out how to reach that goal

01:28:58.780 | in that graph, isn't that cool?

01:29:00.820 | - So, apologize for the romanticized questions,

01:29:05.840 | but, and the silly ones.

01:29:07.780 | If a doctor gave you five years to live,

01:29:10.180 | sort of emphasizing the finiteness of our existence,

01:29:15.900 | what would you try to accomplish?

01:29:19.620 | - It's like my biggest nightmare, by the way.

01:29:23.100 | I really like living.

01:29:24.700 | So, I'm actually, I really don't like the idea

01:29:28.580 | of being told that I'm going to die.

01:29:31.260 | - Sorry to linger on that for a second.

01:29:32.660 | Do you, I mean, do you meditate or ponder

01:29:35.260 | on your mortality or are human?

01:29:37.900 | The fact that this thing ends,

01:29:39.140 | it seems to be a fundamental feature.

01:29:41.480 | Do you think of it as a feature or a bug too?

01:29:44.460 | Is it, you said you don't like the idea of dying,

01:29:47.520 | but if I were to give you a choice of living forever,

01:29:51.020 | like you're not allowed to die.

01:29:52.380 | - Now I'll say that I want to live forever,

01:29:54.380 | but I watch this show, it's very silly,

01:29:56.540 | it's called "The Good Place"

01:29:57.860 | and they reflect a lot on this.

01:29:59.340 | And the moral of the story is that you have to make

01:30:02.580 | the afterlife be finite too,

01:30:05.380 | 'cause otherwise people just kind of, it's like WALL-E.

01:30:08.020 | It's like, ah, I'm sorry, I'm gonna lie around.

01:30:10.620 | So, I think the finiteness helps,

01:30:12.680 | but yeah, it's just, I don't,

01:30:16.460 | I'm not a religious person.

01:30:19.100 | I don't think that there's something after.

01:30:21.900 | And so I think it just ends

01:30:24.180 | and you stop existing.

01:30:25.540 | And I really like existing.

01:30:27.060 | It's such a great privilege to exist.

01:30:30.360 | Yeah, it's just, I think that's the scary part.

01:30:35.340 | - I still think that we like existing so much

01:30:38.980 | because it ends.

01:30:40.500 | And that's so sad.

01:30:41.740 | It's so sad to me every time.

01:30:44.140 | I find almost everything about this life beautiful.

01:30:46.900 | The silliest, most mundane things are just beautiful.

01:30:49.860 | And I think I'm cognizant of the fact

01:30:51.860 | that I find it beautiful because it ends.

01:30:54.900 | And it's so, I don't know.

01:30:57.940 | I don't know how to feel about that.

01:30:59.740 | I also feel like there's a lesson in there for robotics

01:31:04.500 | and AI that is not like,

01:31:08.700 | the finiteness of things seems to be a fundamental nature

01:31:12.540 | of human existence.

01:31:13.420 | I think some people sort of accuse me

01:31:15.820 | of just being Russian and melancholic

01:31:18.340 | and romantic or something,

01:31:20.020 | but that seems to be a fundamental nature

01:31:23.780 | of our existence that should be incorporated

01:31:27.420 | in our reward functions.

01:31:29.420 | But anyway, if you were, speaking of reward functions,

01:31:34.420 | if you only had five years,

01:31:36.860 | what would you try to accomplish?

01:31:38.460 | - This is the thing.

01:31:39.460 | I'm thinking about this question

01:31:43.300 | and have a pretty joyous moment

01:31:45.300 | 'cause I don't know that I would change much.

01:31:48.460 | - Oh, that's a beautiful thing.

01:31:51.060 | - I'm trying to make some contributions

01:31:53.820 | to how we understand human AI interaction.

01:31:57.420 | I don't think I would change that.

01:31:59.140 | Maybe I'll take more trips to the Caribbean or something,

01:32:04.300 | but I tried to solve that already from time to time.

01:32:08.580 | So yeah, I mean, I try to do the things that bring me joy

01:32:13.580 | and thinking about these things bring me joy

01:32:16.540 | is the Marie Kondo thing.

01:32:18.260 | Don't do stuff that doesn't spark joy.

01:32:20.180 | For the most part, I do things that spark joy.

01:32:22.460 | Maybe I'll do like less service

01:32:23.900 | in the department or something.

01:32:25.460 | (laughs)

01:32:27.540 | I'm not dealing with admissions anymore.

01:32:31.660 | But no, I mean, I think I have amazing colleagues

01:32:35.860 | and amazing students and amazing family and friends

01:32:39.420 | and kind of spending time in some balance

01:32:41.460 | with all of them is what I do.

01:32:43.260 | And that's what I'm doing already.

01:32:44.820 | So I don't know that I would really change anything.

01:32:48.020 | - So on the spirit of positiveness,

01:32:50.860 | what small act of kindness, if one pops to mind,

01:32:53.940 | were you once shown that you will never forget?

01:32:56.900 | - When I was in high school,

01:33:02.240 | my friends, my classmates did some tutoring.

01:33:08.660 | We were gearing up for our baccalaureate exam

01:33:11.460 | and they did some tutoring on,

01:33:14.020 | well, some on math, some on whatever.

01:33:15.940 | I was comfortable enough with some of those subjects,

01:33:19.580 | but physics was something that I hadn't focused on

01:33:22.180 | in a while.

01:33:23.020 | And so they were all working with this one teacher.

01:33:28.020 | And I started working with that teacher.

01:33:32.220 | Her name is Nicole Bacano.

01:33:33.860 | And she was the one who kind of opened up

01:33:38.340 | this whole world for me,

01:33:39.740 | because she sort of told me that I should take the SATs

01:33:44.620 | and apply to go to college abroad

01:33:47.900 | and do better on my English and all of that.

01:33:52.300 | And when it came to, well, financially,

01:33:55.540 | I couldn't, my parents couldn't really afford

01:33:57.780 | to do all these things.

01:33:59.300 | She started tutoring me on physics for free.

01:34:02.180 | And on top of that, sitting down with me

01:34:04.100 | to kind of train me for SATs and all that jazz

01:34:08.300 | that she had experience with.

01:34:10.380 | - Wow.

01:34:12.020 | And obviously that has taken you to here today,

01:34:16.260 | also to one of the world experts in robotics.

01:34:18.300 | It's funny, those little-

01:34:20.020 | - Yeah, people do it via small or large acts of kindness.

01:34:23.940 | - For no reason, really.

01:34:25.420 | Just out of karma.

01:34:28.140 | - Wanting to support someone, yeah.

01:34:29.860 | - Yeah.

01:34:31.340 | So we talked a ton about reward functions.

01:34:34.540 | Let me talk about the most ridiculous big question.

01:34:38.300 | What is the meaning of life?

01:34:40.100 | What's the reward function under which we humans operate?

01:34:43.340 | Like what, maybe to your life,

01:34:45.020 | maybe broader to human life in general,

01:34:48.060 | what do you think?

01:34:48.900 | What gives life fulfillment, purpose, happiness, meaning?

01:34:55.580 | - You can't even ask that question with a straight face.

01:34:59.860 | That's how ridiculous this is.

01:35:00.700 | - I can't, I can't.

01:35:01.860 | - Okay, so, you know-

01:35:05.820 | - You're gonna try to answer it anyway, aren't you?

01:35:09.460 | - So I was in a planetarium once.

01:35:12.980 | - Yes.

01:35:14.860 | - And they show you the thing

01:35:16.740 | and then they zoom out and zoom out

01:35:18.180 | and this whole like you're a speck of dust kind of thing.

01:35:20.780 | I think I was conceptualizing that we're kind of,

01:35:22.900 | what are humans?

01:35:23.860 | We're just on this little planet, whatever.

01:35:26.660 | We don't matter much in the grand scheme of things.

01:35:29.580 | And then my mind got really blown

01:35:32.060 | 'cause they talked about this multiverse theory

01:35:35.260 | where they kind of zoomed out

01:35:37.020 | and were like, this is our universe.

01:35:38.340 | And then like there's a bazillion other ones

01:35:40.380 | and it's just these pop in and out of existence.

01:35:42.500 | So like our whole thing that we can't even fathom

01:35:45.580 | how big it is was like a blimp that went in and out.

01:35:48.820 | And at that point I was like, okay, I'm done.

01:35:51.300 | This is not, there is no meaning.

01:35:53.500 | And clearly what we should be doing

01:35:56.340 | is try to impact whatever local thing we can impact.

01:35:59.900 | Our communities leave a little bit behind there,

01:36:02.260 | our friends, our family, our local communities

01:36:05.260 | and just try to be there for other humans.

01:36:09.300 | 'Cause I just, everything beyond that seems ridiculous.

01:36:13.340 | - I mean, are you,

01:36:14.220 | like how do you make sense of these multiverses?

01:36:16.540 | Like are you inspired by the immensity of it?

01:36:21.100 | That do you, I mean, is there,

01:36:25.300 | like is it amazing to you

01:36:29.420 | or is it almost paralyzing in the mystery of it?

01:36:34.540 | - It's frustrating.

01:36:36.100 | I'm frustrated by my inability to comprehend.

01:36:40.780 | It just feels very frustrating.

01:36:43.980 | It's like, there's some stuff that we should time,

01:36:47.020 | blah, blah, blah, that we should really be understanding.

01:36:49.420 | And I definitely don't understand it,

01:36:50.860 | but the amazing physicists of the world

01:36:55.180 | have a much better understanding than me,

01:36:56.700 | but it's still since Epsilon

01:36:57.820 | and then the grand scheme of things.

01:36:59.220 | So it's very frustrating.

01:37:00.780 | It's just, it sort of feels like our brains

01:37:02.620 | don't have some fundamental capacity yet.

01:37:06.180 | Well, yet or ever, I don't know, but.

01:37:08.220 | - Well, that's one of the dreams of artificial intelligence

01:37:10.860 | is to create systems that will aid,

01:37:13.140 | expand our cognitive capacity in order to understand,

01:37:16.060 | build the theory of everything with the physics

01:37:19.900 | and understand what the heck these multiverses are.

01:37:23.840 | So I think there's no better way to end it

01:37:28.220 | than talking about the meaning of life

01:37:29.860 | and the fundamental nature of the universe

01:37:31.780 | and the multiverse. - And the multiverse.

01:37:33.700 | - So Anca, it is a huge honor.

01:37:35.500 | One of my favorite conversations I've had.

01:37:38.460 | I really, really appreciate your time.

01:37:41.060 | Thank you for talking today.

01:37:42.140 | - Thank you for coming.

01:37:43.380 | Come back again.

01:37:44.400 | - Thanks for listening to this conversation

01:37:46.940 | with Anca Duggan,

01:37:48.300 | and thank you to our presenting sponsor, Cash App.

01:37:51.260 | Please consider supporting the podcast

01:37:52.900 | by downloading Cash App and using code LEXPODCAST.

01:37:56.980 | If you enjoy this podcast, subscribe on YouTube,

01:37:59.480 | review it with five stars on Apple Podcast,

01:38:01.960 | support it on Patreon,

01:38:03.320 | or simply connect with me on Twitter @LexFriedman.

01:38:06.780 | And now, let me leave you with some words

01:38:10.000 | from Isaac Asimov.

01:38:11.460 | "Your assumptions are your windows in the world.

01:38:15.680 | "Scrub them off every once in a while,

01:38:17.920 | "or the light won't come in."

01:38:19.860 | Thank you for listening, and hope to see you next time.

01:38:24.240 | (upbeat music)

01:38:26.820 | (upbeat music)

01:38:29.400 | [BLANK_AUDIO]

Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81

Chapters