Elon Musk: Tesla Autopilot | Lex Fridman Podcast #18
Chapters
0:00 Introduction
2:35 The Dream of Autopilot
4:00 Autopilot Design
5:02 Computer Vision Uncertainty
7:10 How to best distribute effort
9:30 Fully redundant SOCs
10:18 Learning from edge cases
11:45 Manual control
12:57 Big leaps
13:56 Technological roadblocks
15:00 Self-driving cars
16:52 Full autonomy
20:08 Functional vigilance
23:10 Driver monitoring
24:28 Operational design domain
26:32 Neural network security
28:29 General AI
The following is a conversation with Elon Musk.
including CEOs and CTOs of automotive, robotics,
offering a podcast conversation with Mr. Musk.
I accepted with full control of questions I could ask
I've never spoken with Elon before this conversation,
Neither he nor his companies have any influence on my opinion, nor on the rigor and integrity
Tesla has never financially supported my research,
We agree on some things and disagree on others.
is to understand the way the guest sees the world.
One particular point of disagreement in this conversation was the extent to which camera-based driver monitoring
it will remain relevant for AI-assisted driving.
I believe that if implemented and integrated effectively, camera-based driver monitoring is likely to be of benefit
override any concern of human behavior and psychology.
but I deeply respect the engineering and innovation
My goal here is to catalyze a rigorous, nuanced, and objective discussion in industry and academia on AI-assisted driving, one that ultimately makes
And now, here's my conversation with Elon Musk.
- when, in the beginning, the big picture system level
when it was first conceived and started being installed
- I wouldn't characterize it as a vision or dream,
simply that there are obviously two massive revolutions
And it became obvious to me that in the future,
Which is not to say that there's no use, it's just rare,
It's just obvious that cars will drive themselves completely,
And if we did not participate in the autonomy revolution,
five to 10 times more than a car which is not autonomous.
but let's say at least for the next five years,
- So there are a lot of very interesting design choices
or in the Model 3 on the center stack display,
- The whole point of the display is to provide a health check on the vehicle's perception of reality.
but also radar and ultrasonics, GPS, and so forth.
like lane lines and traffic lights and other cars, and then in vector space, that is re-rendered onto a display so you can confirm whether the car knows what's going on or not by looking out the window.
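A rough sketch of that pipeline as described, with hypothetical Detection records standing in for the real perception output: objects are kept in vector space and then re-projected onto a simple top-down display so a person can sanity-check perception against what they see out the window.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    kind: str      # "car", "lane_line", "traffic_light", ...
    x: float       # longitudinal position in meters (vector space)
    y: float       # lateral position in meters

def render(detections, width=40, height=20, meters_per_cell=2.0):
    """Re-render vector-space detections as a crude top-down ASCII display."""
    grid = [[" "] * width for _ in range(height)]
    for d in detections:
        row = height - 1 - int(d.x / meters_per_cell)
        col = width // 2 + int(d.y / meters_per_cell)
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = d.kind[0].upper()   # C, L, T ...
    return "\n".join("".join(r) for r in grid)

print(render([Detection("car", 20, 0),
              Detection("lane_line", 10, -4),
              Detection("lane_line", 10, 4)]))
```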
- Right, I think that's an extremely powerful thing
and understanding what the system is capable of.
you know, like road segmentation, lane detection, vehicle detection, object detection underlying the system,
the vehicles in the vicinity, a very clean, crisp image,
and people do confirm that there's a car in front of me and the system sees there's a car in front of me,
One is augmented vision, which I'm sure you've seen,
where it's basically, we draw boxes and labels
and then there's what we call the visualizer, which is basically a vector space representation
But I think this is very difficult for people to,
to the current things that are being displayed is optimized for the general public understanding
or if you have the development build like I do,
But those would just be total gibberish to most people.
- What's your view on how to best distribute effort?
technical aspects of autopilot that are really important.
You only have so much money, only have so much time.
What do you think is the most important thing
Or do you see it as pretty evenly distributed
I think you keep quite close track of it, actually.
- Yeah, so we're approaching half a million cars
So this is, I'm not sure how many other cars on the road
but I'd be surprised if it's more than 5,000,
which means that we have 99% of all the data.
and plug the Tesla computer in, and that's it.
we're still exploring the boundaries of its capabilities,
but we're able to run the cameras at full frame rate,
and it's still got headroom, even on one of the systems.
The full self-driving computer is really two computers, two systems on a chip that are fully redundant.
- The redundancy, are they perfect copies of each other?
as opposed to an arguing machine kind of architecture
if you have a twin-engine aircraft, commercial aircraft, the system will operate best if both systems are operating,
We can actually just run a full duplicate on each one.
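A minimal sketch of that dual-SoC idea, assuming (hypothetically, not Tesla's actual software) that each chip runs the same perception function and a supervisor only acts when the two results agree:

```python
import numpy as np

def run_perception(frame, seed):
    """Stand-in for the full network running on one SoC. The seed only
    simulates two independent pieces of hardware executing the same function."""
    rng = np.random.default_rng(seed)
    # Pretend output: detection confidence for "vehicle ahead".
    return float(np.clip(frame.mean() + rng.normal(0, 1e-6), 0.0, 1.0))

def redundant_decision(frame, tolerance=1e-3):
    """Run the same computation on two units and cross-check before acting."""
    out_a = run_perception(frame, seed=0)   # SoC A
    out_b = run_perception(frame, seed=1)   # SoC B
    if abs(out_a - out_b) > tolerance:
        # Disagreement suggests a hardware or memory fault:
        # fall back to a safe behavior instead of trusting either output.
        return {"status": "fault", "action": "degrade gracefully / hand back control"}
    return {"status": "ok", "confidence": (out_a + out_b) / 2}

if __name__ == "__main__":
    frame = np.random.rand(8, 8)  # toy stand-in for camera input
    print(redundant_decision(frame))
```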
- You haven't really explored or hit the limit of the--
the really valuable data to learn from is the edge cases.
So, how do you, I mean, I've heard you talk somewhere
or perhaps can you speak to those edge cases,
how to discover more and more and more edge cases in driving?
- Well, there's a lot of things that are learned.
Say somebody's on autopilot and they take over,
and then, okay, that's a trigger that goes to our system that says, okay, did they take over for convenience, or because the autopilot wasn't working properly?
There's also, like, let's say we're trying to figure out what is the optimal spline for traversing an intersection.
Then, the ones where there are no interventions
So, you then say, okay, when it looks like this, do the following, and then you get the optimal spline for navigating a complex intersection.
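A toy sketch of that idea, assuming hypothetical logged (x, y) traces from intervention-free traversals of one intersection: average them and fit a smoothing spline as the reference path for "when it looks like this, do the following."

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical intervention-free traversals, each an (N, 2) array of x/y positions.
traversals = [
    np.column_stack([np.linspace(0, 20, 50),
                     np.sin(np.linspace(0, 2, 50)) * 3 + np.random.normal(0, 0.1, 50)])
    for _ in range(25)
]

# Average the good traversals point-by-point (they share the same sampling here).
mean_path = np.mean(traversals, axis=0)

# Fit a smoothing spline through the averaged path: this becomes the
# reference trajectory for navigating this intersection.
tck, _ = splprep([mean_path[:, 0], mean_path[:, 1]], s=1.0)
x_ref, y_ref = splev(np.linspace(0, 1, 200), tck)

print("reference spline has", len(x_ref), "points")
```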
- So, that's for, so there's kind of the common case.
You're trying to capture a huge amount of samples of a particular intersection, when things went right,
and then there's the edge case where, as you said, not for convenience, but something didn't go exactly right.
- Somebody took over, somebody asserted manual control from autopilot, and really, like, the way to look at this
If the user had to do input, there's something,
- That's a powerful line to think of it that way,
or if you want to, it's a navigation decision that autopilot's not currently designed to do,
- Yeah, that's gonna change with Navigate on Autopilot, which was just released, and without stalk confirm.
like, asserting control in order to do a lane change, or exit a freeway, or do a highway interchange,
- Yeah, so that, I don't think people quite understand
through the development of autopilot, through its history,
- And it also automatically overtakes slow cars. So, it's both navigation and seeking the fastest lane.
So, it'll overtake slower cars, and exit the freeway,
I mean, on the development version that I'm driving, the car fully stops and goes at traffic lights.
that are an inkling of a step towards full autonomy.
What would you say are the biggest technological roadblocks
or any Model 3 that has the full self-driving package,
That's important to have enough base computation.
Then refining the neural net and the control software, but all of that can just be provided as an over-the-air update.
- But capable is an interesting word, because--
and then the reliability will increase dramatically, and then it will receive regulatory approval.
I believe you are buying an appreciating asset,
- So, that's a really important statement there,
But, what's your intuition on the software side?
- I think people enjoy it very much so, on the highways. It's a total game changer for quality of life.
So, it's really just extending that functionality
and then being able to navigate complicated parking lots,
even if it's in a complete maze of a parking lot.
and something that people would actually find a lot of use
it's a source of annoyance when you have to do it manually,
If you look at the current level four vehicles
because there's always a safety driver in almost all cases
So, its capabilities are powerful enough to drive,
to still be supervising, just like a safety driver is
- I think it'll require detecting hands on wheel for at least six months or something like that from here.
Really, it's a question of, from a regulatory standpoint, how much safer than a person does autopilot need to be
with high confidence, statistically speaking, that the car is dramatically safer than a person,
So, it might need to be like 200 or 300% safer than a person.
- Incidents per mile, so crashes and fatalities.
there are far more crashes than there are fatalities.
So, you can assess what is the probability of a crash, then there's another step, which is probability of injury,
And all of those need to be much better than a person
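A worked illustration of that chain of probabilities, using made-up per-mile rates purely to show how a "multiple times safer than a person" comparison could be computed; none of these numbers are real Tesla or NHTSA figures.

```python
# Illustrative, made-up rates -- chosen only to demonstrate the arithmetic.
human  = {"crash": 1 / 500_000,   "injury_given_crash": 0.25, "fatality_given_crash": 0.005}
system = {"crash": 1 / 1_500_000, "injury_given_crash": 0.25, "fatality_given_crash": 0.005}

def per_mile(rates):
    """Chain the per-mile crash rate with conditional injury/fatality rates."""
    crash = rates["crash"]
    return {
        "crash": crash,
        "injury": crash * rates["injury_given_crash"],
        "fatality": crash * rates["fatality_given_crash"],
    }

h, s = per_mile(human), per_mile(system)
for k in h:
    # A crash rate one third of the human baseline is one way to read
    # "200% safer than a person" (a 3x improvement over the baseline).
    print(f"{k}: system is {h[k] / s[k]:.1f}x better than the human baseline")
```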
to have a healthy discourse with the regulatory bodies
- I mean, there's no question that regulators
there are, I think, almost 40,000 automotive deaths per year.
they'll probably receive a thousand times more press
- So, the psychology of that is actually fascinating. I don't think we'll have enough time to talk about that, but I have to talk to you about the human side of things.
So, myself and our team at MIT recently released a paper on functional vigilance of drivers while using Autopilot.
This is work we've been doing since Autopilot was first released publicly over three years ago, collecting video of driver faces and driver body.
So, I saw that you tweeted a quote from the abstract, so I can at least guess that you've glanced at it.
So, it appears that in the data that we've collected, that drivers are maintaining functional vigilance
such that we're looking at 18,000 disengagements and annotating whether they were able to take over control
So, they were there, present, looking at the road
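A small sketch of the aggregation that kind of annotation implies, with hypothetical per-disengagement records (not the actual MIT dataset) marking whether the driver took over in a timely manner:

```python
# Hypothetical annotations: one record per disengagement, with a boolean
# for whether the driver took over control in a timely manner.
annotations = [
    {"disengagement_id": i, "timely_takeover": (i % 100 != 0)}  # toy data
    for i in range(18_000)
]

timely = sum(a["timely_takeover"] for a in annotations)
print(f"{timely}/{len(annotations)} disengagements "
      f"({timely / len(annotations):.1%}) had a timely takeover")
```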
Okay, so this goes against what many would predict from the body of literature on vigilance with automation.
Now, the question is, do you think these results
you know, there's a small minority of drivers where their vigilance decrement would increase
I mean, the system's improving so much, so fast, that this is gonna be a moot point very soon.
Where vigilance is, like, if something's many times safer than a person, then adding a person does,
So, the fact that a human may, some percent of the population may exhibit a vigilance decrement will not affect
- No, in fact, I think it will become, very, very quickly,
but I'd say, I'd be shocked if it's not next year,
Now, it used to be that there were elevator operators, and you couldn't go in an elevator by yourself
because the automated elevator that stops at the floors
But I also have to ask, from a user experience
is camera-based detection of just sensing the human,
On the computer vision side, that's a fascinating problem,
And, like I said, you wouldn't want someone in the elevator,
some random person operating an elevator between floors?
from what you've seen with the full self-driving car computer, the rate of improvement is exponential.
- So one of the other very interesting design choices is the operational design domain of autopilot.
So contrast another vehicle system that we're studying
very constrained to particular kinds of highways,
but it's much narrower than the ODD of Tesla vehicles.
in that different philosophy of thinking where,
the limitations of the system, at least early on, together with the instrument cluster display, they start to understand what are the capabilities.
The con is you're letting drivers use it basically anywhere.
- Well, anywhere it could detect lanes with confidence.
to drive one of these two-ton death machines, and they could just drive wherever they wanted.
Just like elevators, you could just move the elevator
It could stop at halfway between floors if you want.
So, it's gonna seem like a mad thing in the future
- So I have a bunch of questions about the human psychology, about behavior and so on, that would become--
and the deep learning approach of learning from data
who tricked Autopilot to act in unexpected ways
- Can you elaborate on the confidence behind that answer?
- Well, a neural net is just a bunch of matrix math.
and basically reverse engineer how the matrix is being built and then create a little thing
- So, learn both on the valid data and the invalid data.
So, basically learn on the adversarial examples
what is a car and what is definitely not a car.
You train for this is a car and this is definitely not a car.
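A minimal sketch of training on adversarial examples in that spirit, using an FGSM-style perturbation (a standard, generic technique, not necessarily what Tesla does): craft perturbed inputs from real ones and include them in the training loss alongside the clean data.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, loss_fn, eps=0.03):
    """Create adversarial versions of a batch with the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y, loss_fn):
    """Train on the clean batch and on adversarial examples of the same batch,
    so small crafted perturbations no longer flip 'car' to 'not a car'."""
    x_adv = fgsm_perturb(model, x, y, loss_fn)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny toy classifier over 3x32x32 images, two classes: car / not-car.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(8, 3, 32, 32)      # toy image batch
y = torch.randint(0, 2, (8,))     # toy labels
print("loss:", train_step(model, optimizer, x, y, loss_fn))
```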
just Tesla and autopilot, current deep learning approaches
and then we'll need to figure out what shall we do
But it's amazing how people can't differentiate
and navigate streets versus general intelligence.
Like your toaster and your computer are both machines, but one's much more sophisticated than another.
To me, right now, this seems game, set, match.
I mean, I don't want to be complacent or overconfident, but that is just literally how it appears right now.
I could be wrong, but it appears to be the case
- Do you think we will ever create an AI system
in a deep meaningful way like in the movie Her?
- I think AI will be capable of convincing you
- You know, we start getting into a metaphysical question
And maybe they do, maybe they don't, I don't know.
But from a physics standpoint, I tend to think of things,
- And it's similar to seeing our world as a simulation.
There may not be a test to tell the difference between the real world and the simulation
And there may be ways to test whether it's a simulation.
but you could certainly imagine that a simulation could correct that once an entity in the simulation
it could either restart, pause the simulation, start a new simulation, or do one of many other things
- So when maybe you or somebody else creates an AGI system