
Elon Musk: Tesla Autopilot | Lex Fridman Podcast #18


Chapters

0:00 Introduction
2:35 The Dream of Autopilot
4:00 Autopilot Design
5:02 Computer Vision Uncertainty
7:10 How to Best Distribute Effort
9:30 Fully Redundant SoCs
10:18 Learning from Edge Cases
11:45 Manual Control
12:57 Big Leaps
13:56 Technological Roadblocks
15:00 Self-Driving Cars
16:52 Full Autonomy
20:08 Functional Vigilance
23:10 Driver Monitoring
24:28 Operational Design Domain
26:32 Neural Network Security
28:29 General AI


00:00:00.000 | The following is a conversation with Elon Musk.
00:00:03.000 | He's the CEO of Tesla, SpaceX, Neuralink,
00:00:06.240 | and a co-founder of several other companies.
00:00:09.240 | This conversation is part
00:00:10.760 | of the Artificial Intelligence podcast.
00:00:13.200 | The series includes leading researchers
00:00:15.640 | in academia and industry,
00:00:17.240 | including CEOs and CTOs of automotive, robotics,
00:00:21.120 | AI, and technology companies.
00:00:24.080 | This conversation happened after the release
00:00:26.360 | of the paper from our group at MIT
00:00:28.600 | on driver functional vigilance
00:00:30.520 | during use of Tesla's autopilot.
00:00:32.920 | The Tesla team reached out to me,
00:00:34.600 | offering a podcast conversation with Mr. Musk.
00:00:37.520 | I accepted with full control of questions I could ask
00:00:40.640 | and the choice of what is released publicly.
00:00:43.600 | I ended up editing out nothing of substance.
00:00:46.880 | I've never spoken with Elon before this conversation,
00:00:49.720 | publicly or privately.
00:00:51.760 | Neither he nor his companies have any influence
00:00:54.400 | on my opinion, nor on the rigor and integrity
00:00:57.800 | of the scientific method that I practice
00:00:59.720 | in my position at MIT.
00:01:01.800 | Tesla has never financially supported my research,
00:01:04.600 | and I've never owned a Tesla vehicle.
00:01:07.280 | I've never owned Tesla stock.
00:01:10.120 | This podcast is not a scientific paper.
00:01:12.760 | It is a conversation.
00:01:14.320 | I respect Elon as I do all other leaders
00:01:16.680 | and engineers I've spoken with.
00:01:18.640 | We agree on some things and disagree on others.
00:01:21.400 | My goal with these conversations is always
00:01:23.440 | to understand the way the guest sees the world.
00:01:26.880 | One particular point of disagreement in this conversation
00:01:29.880 | was the extent to which camera-based driver monitoring
00:01:33.200 | will improve outcomes, and for how long
00:01:36.040 | it will remain relevant for AI-assisted driving.
00:01:39.040 | As someone who works on and is fascinated
00:01:42.200 | by human-centered artificial intelligence,
00:01:45.160 | I believe that if implemented and integrated effectively,
00:01:48.680 | camera-based driver monitoring is likely to be of benefit
00:01:51.800 | in both the short-term and the long-term.
00:01:55.600 | In contrast, Elon and Tesla's focus
00:01:59.200 | is on the improvement of autopilot
00:02:01.160 | such that its statistical safety benefits
00:02:04.440 | override any concern of human behavior and psychology.
00:02:09.000 | Elon and I may not agree on everything,
00:02:12.000 | but I deeply respect the engineering and innovation
00:02:14.800 | behind the efforts that he leads.
00:02:16.840 | My goal here is to catalyze a rigorous, nuanced,
00:02:20.560 | and objective discussion in industry and academia
00:02:23.480 | on AI-assisted driving, one that ultimately makes
00:02:27.880 | for a safer and better world.
00:02:30.840 | And now, here's my conversation with Elon Musk.
00:02:34.580 | What was the vision, the dream of autopilot
00:02:38.640 | when, in the beginning, the big picture system level
00:02:41.400 | when it was first conceived and started being installed
00:02:44.900 | in 2014, the hardware and the cars?
00:02:47.520 | What was the vision, the dream?
00:02:49.760 | - I wouldn't characterize it as a vision or dream,
00:02:51.360 | simply that there are obviously two massive revolutions
00:02:56.320 | in the automobile industry.
00:03:00.080 | One is the transition to electrification,
00:03:04.400 | and then the other is autonomy.
00:03:06.340 | And it became obvious to me that in the future,
00:03:12.640 | any car that did not have autonomy
00:03:16.200 | would be about as useful as a horse.
00:03:19.120 | Which is not to say that there's no use, it's just rare,
00:03:22.120 | and somewhat idiosyncratic if somebody
00:03:24.080 | has a horse at this point.
00:03:25.520 | It's just obvious that cars will drive themselves completely,
00:03:28.040 | it's just a question of time.
00:03:29.640 | And if we did not participate in the autonomy revolution,
00:03:34.640 | then our cars would not be useful to people
00:03:40.880 | relative to cars that are autonomous.
00:03:43.720 | I mean, an autonomous car is arguably worth
00:03:48.200 | five to 10 times more than a car which is not autonomous.
00:03:53.200 | - In the long term?
00:03:55.160 | - Depends what you mean by long term,
00:03:56.200 | but let's say at least for the next five years,
00:03:59.540 | perhaps 10 years.
00:04:00.520 | - So there are a lot of very interesting design choices
00:04:04.080 | with Autopilot early on.
00:04:05.740 | First is showing on the instrument cluster,
00:04:09.960 | or in the Model 3 on the center stack display,
00:04:12.700 | what the combined sensor suite sees.
00:04:14.920 | What was the thinking behind that choice?
00:04:17.960 | Was there debate?
00:04:18.960 | What was the process?
00:04:20.520 | - The whole point of the display is to provide
00:04:24.240 | a health check on the vehicle's perception of reality.
00:04:28.120 | So the vehicle's taking in information
00:04:30.440 | from a bunch of sensors, primarily cameras,
00:04:32.240 | but also radar and ultrasonics, GPS, and so forth.
00:04:35.980 | And then that information is then rendered
00:04:41.880 | into vector space, and that, you know,
00:04:44.760 | with a bunch of objects, with properties,
00:04:47.640 | like lane lines and traffic lights and other cars,
00:04:51.200 | and then in vector space, that is re-rendered
00:04:54.840 | onto a display so you can confirm whether the car
00:04:58.400 | knows what's going on or not by looking out the window.
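
To make the rendering pipeline described here concrete, below is a minimal, hypothetical Python sketch: fused sensor output becomes a list of typed objects in vector space, and the driver-facing display re-renders only the clean, high-confidence ones. The names (PerceivedObject, render_to_display) and the 0.8 threshold are illustrative assumptions, not Tesla's actual code.

```python
# A minimal sketch (not Tesla's code) of the idea described above:
# fused sensor output becomes a list of typed objects in "vector space",
# and the display re-renders that list so the driver can sanity-check
# the car's perception against what they see out the window.
from dataclasses import dataclass

@dataclass
class PerceivedObject:          # hypothetical type for one fused detection
    kind: str                   # e.g. "car", "lane_line", "traffic_light"
    x_m: float                  # longitudinal position, meters ahead of ego
    y_m: float                  # lateral offset in meters (left positive)
    confidence: float           # fused detection confidence in [0, 1]

def render_to_display(objects: list[PerceivedObject]) -> list[str]:
    """Turn the vector-space scene into simple display primitives."""
    primitives = []
    for obj in objects:
        # Only clean, high-confidence objects are shown to the driver;
        # raw probabilities stay in the debug views discussed below.
        if obj.confidence >= 0.8:
            primitives.append(f"draw {obj.kind} at ({obj.x_m:.1f}, {obj.y_m:.1f}) m")
    return primitives

if __name__ == "__main__":
    scene = [
        PerceivedObject("car", 22.0, 0.3, 0.97),
        PerceivedObject("lane_line", 0.0, 1.8, 0.92),
        PerceivedObject("pedestrian", 35.0, -4.0, 0.41),  # uncertain: not rendered
    ]
    for line in render_to_display(scene):
        print(line)
```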
00:05:01.760 | - Right, I think that's an extremely powerful thing
00:05:05.480 | for people to get an understanding,
00:05:07.920 | so to become one with the system
00:05:09.200 | and understanding what the system is capable of.
00:05:11.720 | Now, have you considered showing more?
00:05:14.800 | So if we look at the computer vision,
00:05:17.480 | you know, like road segmentation, lane detection,
00:05:19.760 | vehicle detection, object detection underlying the system,
00:05:23.040 | there is, at the edges, some uncertainty.
00:05:25.720 | Have you considered revealing the parts,
00:05:29.840 | the uncertainty in the system, the sort of--
00:05:33.600 | - Probabilities associated with, say,
00:05:35.560 | image recognition or something like that?
00:05:36.800 | - Yeah, so right now it shows, like,
00:05:38.480 | the vehicles in the vicinity, a very clean, crisp image,
00:05:41.800 | and people do confirm that there's a car in front of me
00:05:44.640 | and the system sees there's a car in front of me,
00:05:46.680 | but to help people build an intuition
00:05:49.040 | of what computer vision is
00:05:50.760 | by showing some of the uncertainty.
00:05:53.080 | - Well, I think it's, in my car,
00:05:55.120 | I always look at the sort of the debug view,
00:05:58.200 | and there's two debug views.
00:06:00.040 | One is augmented vision, which I'm sure you've seen,
00:06:04.920 | where it's basically, we draw boxes and labels
00:06:08.520 | around objects that are recognized,
00:06:11.420 | and then there's what we call the visualizer,
00:06:15.300 | which is basically a vector space representation
00:06:17.980 | summing up the input from all sensors.
00:06:21.020 | That does not show any pictures,
00:06:24.520 | but it shows all of the,
00:06:27.140 | it basically shows the car's view
00:06:29.540 | of the world in vector space.
00:06:32.360 | But I think this is very difficult for people to,
00:06:35.620 | normal people to understand.
00:06:37.120 | They would not know what they're looking at.
00:06:39.540 | - So it's almost an HMI challenge.
00:06:40.940 | The current things that are being displayed
00:06:43.300 | are optimized for the general public's understanding
00:06:47.140 | of what the system's capable of.
00:06:48.780 | - It's like if you have no idea
00:06:50.100 | how computer vision works or anything,
00:06:51.700 | you can still look at the screen
00:06:53.060 | and see if the car knows what's going on.
00:06:55.800 | And then if you're a development engineer,
00:06:58.460 | or if you have the development build like I do,
00:07:02.420 | then you can see all the debug information.
00:07:06.040 | But those would just be total gibberish to most people.
00:07:11.300 | - What's your view on how to best distribute effort?
00:07:14.260 | So there's three, I would say,
00:07:16.000 | technical aspects of autopilot that are really important.
00:07:19.060 | So it's the underlying algorithms,
00:07:20.500 | like the neural network architecture,
00:07:22.300 | there's the data, so that's trained on,
00:07:24.500 | and then there's the hardware development.
00:07:26.300 | There may be others, but,
00:07:27.660 | so look, algorithm, data, hardware.
00:07:32.100 | You only have so much money, only have so much time.
00:07:35.240 | What do you think is the most important thing
00:07:37.740 | to allocate resources to?
00:07:40.060 | Or do you see it as pretty evenly distributed
00:07:42.980 | between those three?
00:07:44.540 | - We automatically get vast amounts of data
00:07:46.660 | because all of our cars have
00:07:48.660 | eight external facing cameras and radar
00:07:54.740 | and usually 12 ultrasonic sensors,
00:07:58.120 | GPS, obviously, and IMU.
00:08:02.580 | And so we basically have a fleet that has,
00:08:09.780 | we've got about 400,000 cars on the road
00:08:12.260 | that have that level of data.
00:08:14.020 | I think you keep quite close track of it, actually.
00:08:15.860 | - Yes.
00:08:16.700 | - Yeah, so we're approaching half a million cars
00:08:20.340 | on the road that have the full sensor suite.
00:08:22.500 | So this is, I'm not sure how many other cars on the road
00:08:27.340 | have the sensor suite,
00:08:29.400 | but I'd be surprised if it's more than 5,000,
00:08:32.300 | which means that we have 99% of all the data.
00:08:35.140 | - So there's this huge inflow of data.
00:08:38.380 | - Absolutely, massive inflow of data.
00:08:40.660 | And then it's taken us about three years,
00:08:44.400 | but now we've finally developed
00:08:45.700 | our full self-driving computer,
00:08:47.660 | which can process an order of magnitude
00:08:52.660 | more than the NVIDIA system
00:08:55.020 | that we currently have in the cars.
00:08:56.340 | And it's really just to use it,
00:08:58.380 | you unplug the NVIDIA computer
00:09:00.260 | and plug the Tesla computer in, and that's it.
00:09:02.640 | And it's, in fact, we're not even,
00:09:06.740 | we're still exploring the boundaries of its capabilities,
00:09:10.180 | but we're able to run the cameras at full frame rate,
00:09:11.940 | full resolution, not even crop the images,
00:09:15.420 | and it's still got headroom, even on one of the systems.
00:09:19.980 | The full self-driving computer is really two computers,
00:09:23.460 | two systems on a chip that are fully redundant.
00:09:26.100 | So you could put a bolt through basically
00:09:27.780 | any part of that system and it still works.
00:09:30.220 | - The redundancy, are they perfect copies of each other?
00:09:33.180 | - Yeah.
00:09:34.420 | - Also, it's purely for redundancy
00:09:35.980 | as opposed to an arguing machine kind of architecture
00:09:38.420 | where they're both making decisions.
00:09:40.060 | This is purely for redundancy.
00:09:41.860 | - I think it would be more like,
00:09:43.140 | if you have a twin-engine aircraft, commercial aircraft,
00:09:46.560 | the system will operate best if both systems are operating,
00:09:51.780 | but it's capable of operating safely on one.
00:09:55.640 | So, but as it is right now, we can just run,
00:10:00.260 | we haven't even hit the edge of performance.
00:10:04.420 | So, there's no need to actually distribute
00:10:09.260 | the functionality across both SOCs.
00:10:13.500 | We can actually just run a full duplicate on each one.
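
A rough sketch of the fully redundant arrangement described here, under the assumption of two identical compute units whose outputs are cross-checked and either of which can drive alone, like the twin-engine comparison above. The function and type names are hypothetical, not Tesla's.

```python
# Illustrative sketch only: two identical compute units run the full stack,
# their outputs are cross-checked, and either one alone is sufficient.
from typing import Callable, Optional

Frame = dict           # stand-in for a bundle of camera/radar/ultrasonic data
Plan = dict            # stand-in for the resulting driving plan

def redundant_plan(frame: Frame,
                   soc_a: Callable[[Frame], Optional[Plan]],
                   soc_b: Callable[[Frame], Optional[Plan]]) -> Plan:
    plan_a, plan_b = soc_a(frame), soc_b(frame)
    if plan_a is not None and plan_b is not None:
        # Both units healthy: agreement is an extra sanity check.
        assert plan_a == plan_b, "redundant units disagree"
        return plan_a
    # One unit failed (the "bolt through the board" case): the survivor drives.
    surviving = plan_a if plan_a is not None else plan_b
    if surviving is None:
        raise RuntimeError("both compute units failed")
    return surviving

# usage with two stand-in "SoCs" running the same stack:
stack = lambda frame: {"steer": 0.0, "accel": 0.1}
print(redundant_plan({"cams": []}, stack, stack))
```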
00:10:17.220 | - You haven't really explored or hit the limit of the--
00:10:20.660 | - Not yet, we haven't hit the limit.
00:10:22.540 | - So, the magic of deep learning
00:10:24.760 | is that it gets better with data.
00:10:27.300 | You said there's a huge inflow of data,
00:10:29.620 | but the thing about driving,
00:10:32.180 | the really valuable data to learn from is the edge cases.
00:10:36.740 | So, how do you, I mean, I've heard you talk somewhere
00:10:41.580 | about autopilot disengagements
00:10:44.180 | being an important moment of time to use.
00:10:46.980 | Is there other edge cases,
00:10:48.300 | or perhaps can you speak to those edge cases,
00:10:52.640 | what aspects of them might be valuable,
00:10:54.700 | or if you have other ideas,
00:10:56.180 | how to discover more and more and more edge cases in driving?
00:11:00.300 | - Well, there's a lot of things that are learned.
00:11:02.380 | There are certainly edge cases where,
00:11:04.820 | I say somebody's on autopilot and they take over,
00:11:08.080 | and then, okay, that's a trigger that goes to our system
00:11:12.380 | that says, okay, did they take over for convenience,
00:11:15.160 | or did they take over
00:11:16.800 | because the autopilot wasn't working properly?
00:11:19.380 | There's also, like, let's say we're trying to figure out
00:11:21.840 | what is the optimal spline for traversing an intersection.
00:11:27.880 | Then, the ones where there are no interventions
00:11:31.360 | are the right ones.
00:11:33.660 | So, you then say, okay, when it looks like this,
00:11:36.380 | do the following, and then you get the optimal spline
00:11:40.640 | for a complex, navigating a complex intersection.
00:11:44.780 | - So, that's for, so there's kind of the common case.
00:11:49.320 | You're trying to capture a huge amount of samples
00:11:52.280 | of a particular intersection, when things went right,
00:11:55.040 | and then there's the edge case where, as you said,
00:11:59.240 | not for convenience, but something didn't go exactly right.
00:12:02.040 | - Somebody took over, somebody asserted manual control
00:12:04.060 | from autopilot, and really, like, the way to look at this
00:12:07.620 | is view all input as error.
00:12:09.900 | If the user had to do input, there's something,
00:12:12.640 | all input is error.
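
A hedged sketch of the two ideas above: treating a driver takeover as a labeled event (convenience versus possible error) and averaging intervention-free traversals of an intersection into a reference path. The heuristic and names are illustrative assumptions, not Tesla's implementation.

```python
# Hedged sketch of the two ideas above (hypothetical names, not Tesla code):
# 1) a driver takeover is logged and classified as "convenience" vs. "error",
# 2) traversals of an intersection with no intervention are averaged to
#    estimate a reference path ("optimal spline") for that intersection.
import numpy as np

def label_takeover(planned_exit_ahead: bool, abrupt_steering: bool) -> str:
    """Crude illustrative heuristic: was the takeover convenience or a fault?"""
    if planned_exit_ahead and not abrupt_steering:
        return "convenience"          # e.g. the driver simply wanted to exit
    return "possible_error"           # flag for review / training signal

def reference_path(traversals: list[np.ndarray]) -> np.ndarray:
    """Average intervention-free traversals (each an N x 2 array of x, y
    waypoints sampled at the same arc-length stations) into one path."""
    return np.mean(np.stack(traversals), axis=0)

# usage: three clean traversals of the same intersection
t = np.linspace(0.0, 1.0, 50)
clean = [np.column_stack([t * 30.0, 2.0 * t**2 + 0.1 * i]) for i in range(3)]
print(reference_path(clean).shape)    # (50, 2)
print(label_takeover(planned_exit_ahead=True, abrupt_steering=False))
```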
00:12:13.920 | - That's a powerful line to think of it that way,
00:12:16.360 | 'cause it may very well be error,
00:12:17.760 | but if you want to exit the highway,
00:12:19.960 | or if you want to, it's a navigation decision
00:12:23.080 | that autopilot's not currently designed to do,
00:12:25.400 | then the driver takes over.
00:12:27.520 | How do you know the difference?
00:12:28.360 | - Yeah, that's gonna change with Navigate on Autopilot,
00:12:30.120 | which we just released, and without stalk confirm.
00:12:33.800 | So, the navigation, like, lane change-based,
00:12:36.120 | like, asserting control in order to do a lane change,
00:12:39.960 | or exit a freeway, or do a highway interchange,
00:12:43.560 | the vast majority of that will go away
00:12:46.040 | with the release that just went out.
00:12:48.880 | - Yeah, so that, I don't think people quite understand
00:12:52.960 | how big of a step that is.
00:12:54.560 | - Yeah, they don't.
00:12:55.880 | So, if you drive the car, then you do.
00:12:58.240 | - So, you still have to keep your hands
00:12:59.560 | on the steering wheel currently,
00:13:00.760 | when it does the automatic lane change.
00:13:03.400 | What are, so there's these big leaps
00:13:06.960 | through the development of autopilot, through its history,
00:13:10.040 | and what stands out to you as the big leaps?
00:13:13.560 | I would say this one, Navigate on Autopilot
00:13:16.160 | without having to confirm, is a huge leap.
00:13:21.120 | - It is a huge leap.
00:13:22.120 | - And it also automatically overtakes slow cars.
00:13:24.880 | So, it's both navigation and seeking the fastest lane.
00:13:29.880 | So, it'll overtake slower cars, and exit the freeway,
00:13:36.040 | and take highway interchanges,
00:13:38.640 | and then we have traffic light recognition,
00:13:45.520 | which is introduced initially as a warning.
00:13:50.200 | I mean, on the development version that I'm driving,
00:13:52.280 | the car fully stops and goes at traffic lights.
00:13:56.880 | - So, those are the steps, right?
00:13:58.480 | You've just mentioned some things
00:13:59.800 | that are an inkling of a step towards full autonomy.
00:14:02.360 | What would you say are the biggest technological roadblocks
00:14:08.000 | to full self-driving?
00:14:09.960 | - Actually, I don't think, I think we just,
00:14:11.440 | the full self-driving computer that we just,
00:14:13.600 | the Tesla, what we call the FSD computer,
00:14:17.120 | that's now in production.
00:14:20.640 | So, if you order any Model S or X,
00:14:24.280 | or any Model 3 that has the full self-driving package,
00:14:28.240 | you'll get the FSD computer.
00:14:29.760 | That's important to have enough base computation.
00:14:35.840 | Then refining the neural net and the control software,
00:14:38.840 | but all of that can just be provided as an over-the-air update.
00:14:42.840 | The thing that's really profound,
00:14:45.720 | and what I'll be emphasizing at the,
00:14:48.240 | sort of that investor day that we're having
00:14:51.960 | focused on autonomy,
00:14:53.320 | is that the cars currently being produced,
00:14:56.120 | or the hardware currently being produced,
00:14:58.200 | is capable of full self-driving.
00:15:01.000 | - But capable is an interesting word, because--
00:15:04.240 | - Like the hardware is.
00:15:05.920 | And as we refine the software,
00:15:07.560 | the capabilities will increase dramatically,
00:15:11.760 | and then the reliability will increase dramatically,
00:15:14.000 | and then it will receive regulatory approval.
00:15:16.200 | So, essentially, buying a car today
00:15:17.680 | is an investment in the future.
00:15:19.160 | You're essentially buying,
00:15:20.480 | I think the most profound thing is that
00:15:26.280 | if you buy a Tesla today,
00:15:27.800 | I believe you are buying an appreciating asset,
00:15:30.480 | not a depreciating asset.
00:15:33.120 | - So, that's a really important statement there,
00:15:35.320 | because if hardware is capable enough,
00:15:37.800 | that's the hard thing to upgrade, usually.
00:15:40.560 | So, then the rest is a software problem.
00:15:44.600 | - Software has no marginal cost, really.
00:15:47.940 | But, what's your intuition on the software side?
00:15:51.460 | How hard are the remaining steps
00:15:54.620 | to get it to where
00:15:57.700 | the experience, not just the safety,
00:16:03.940 | but the full experience,
00:16:05.740 | is something that people would enjoy?
00:16:09.260 | - I think people enjoy it very much so, on the highways.
00:16:12.820 | It's a total game changer for quality of life.
00:16:16.780 | For using Tesla autopilot on the highways.
00:16:21.340 | So, it's really just extending that functionality
00:16:23.020 | to city streets,
00:16:24.500 | adding in the traffic light recognition,
00:16:29.220 | navigating complex intersections,
00:16:31.420 | and then being able to navigate complicated parking lots,
00:16:36.420 | so the car can exit a parking space
00:16:40.460 | and come and find you,
00:16:41.300 | even if it's in a complete maze of a parking lot.
00:16:46.420 | And then it can just drop you off
00:16:49.940 | and find a parking spot by itself.
00:16:52.960 | - Yeah, in terms of enjoyability
00:16:54.420 | and something that people would actually find a lot of use
00:16:58.580 | from, the parking lot, is a really,
00:17:00.820 | it's rich of annoyance when you have to do it manually,
00:17:04.720 | so there's a lot of benefit to be gained
00:17:06.660 | from automation there.
00:17:07.820 | So, let me start injecting the human
00:17:10.380 | into this discussion a little bit.
00:17:12.780 | So, let's talk about full autonomy.
00:17:15.620 | If you look at the current level four vehicles
00:17:17.460 | being tested on road, like Waymo and so on,
00:17:19.780 | they're only technically autonomous.
00:17:23.380 | They're really level two systems
00:17:25.460 | with just a different design philosophy,
00:17:28.860 | because there's always a safety driver in almost all cases
00:17:31.540 | and they're monitoring the system.
00:17:33.340 | Do you see Tesla's full self-driving
00:17:38.060 | as still for a time to come,
00:17:40.620 | requiring supervision of the human being?
00:17:44.820 | So, its capabilities are powerful enough to drive,
00:17:47.460 | but nevertheless requires the human
00:17:49.020 | to still be supervising, just like a safety driver is
00:17:52.580 | in other fully autonomous vehicles.
00:17:57.380 | - I think it'll require detecting hands on wheel
00:18:01.540 | for at least six months or something like that from here.
00:18:06.540 | Really, it's a question of, from a regulatory standpoint,
00:18:14.840 | how much safer than a person does autopilot need to be
00:18:19.840 | for it to be okay to not monitor the car?
00:18:23.160 | And this is a debate that one can have,
00:18:27.120 | and then, but you need a large sample,
00:18:30.720 | a large amount of data so that you can prove
00:18:33.880 | with high confidence, statistically speaking,
00:18:36.640 | that the car is dramatically safer than a person,
00:18:40.400 | and that adding in the person monitoring
00:18:42.800 | does not materially affect the safety.
00:18:45.920 | So, it might need to be like 200 or 300% safer than a person.
00:18:50.120 | - And how do you prove that?
00:18:51.160 | - Incidents per mile.
00:18:52.400 | - Incidents per mile, so crashes and fatalities.
00:18:56.640 | - Yeah, fatalities would be a factor,
00:18:58.640 | but there are just not enough fatalities
00:19:00.440 | to be statistically significant at scale,
00:19:04.040 | but there are enough crashes,
00:19:06.960 | there are far more crashes than there are fatalities.
00:19:10.960 | So, you can assess what is the probability of a crash,
00:19:14.900 | then there's another step, which is probability of injury,
00:19:19.640 | and probability of permanent injury,
00:19:21.680 | and probability of death.
00:19:23.840 | And all of those need to be much better than a person
00:19:27.680 | by at least perhaps 200%.
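
To illustrate the kind of statistical argument being described, here is a small worked example with made-up numbers: comparing crash rates per mile via a Poisson log-rate-ratio confidence interval and asking whether the lower bound clears a chosen "safer than a person" threshold. This is a generic calculation, not Tesla's methodology.

```python
# A hedged, illustrative calculation (the counts are invented) of the kind of
# comparison described above: crash rates per mile for Autopilot miles vs.
# comparable human-driven miles, with a confidence interval on the rate ratio.
import math

def rate_ratio_ci(crashes_a: int, miles_a: float,
                  crashes_b: int, miles_b: float,
                  z: float = 2.576):          # ~99% two-sided normal quantile
    """CI for (human rate) / (autopilot rate) via the log rate ratio,
    using the usual Poisson normal approximation."""
    rate_a = crashes_a / miles_a              # autopilot crashes per mile
    rate_b = crashes_b / miles_b              # human crashes per mile
    log_ratio = math.log(rate_b / rate_a)
    se = math.sqrt(1.0 / crashes_a + 1.0 / crashes_b)
    lo, hi = math.exp(log_ratio - z * se), math.exp(log_ratio + z * se)
    return rate_b / rate_a, (lo, hi)

# Hypothetical counts: 400 crashes in 2e9 Autopilot miles,
# 3000 crashes in 5e9 comparable human-driven miles.
ratio, (lo, hi) = rate_ratio_ci(400, 2e9, 3000, 5e9)
print(f"estimated safety factor: {ratio:.1f}x, 99% CI ({lo:.1f}, {hi:.1f})")
# If the lower bound clears the chosen threshold (say 2x or 3x safer than
# a person), the claim is supported at that confidence level.
```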
00:19:32.680 | - And you think there's the ability
00:19:35.760 | to have a healthy discourse with the regulatory bodies
00:19:38.680 | on this topic?
00:19:40.080 | - I mean, there's no question that regulators
00:19:43.920 | pay a disproportionate amount of attention
00:19:46.880 | to that which generates press,
00:19:48.720 | this is just an objective fact,
00:19:50.560 | and Tesla generates a lot of press.
00:19:53.360 | So, in the United States,
00:19:57.800 | there's I think almost 40,000 automotive deaths per year.
00:20:01.180 | But if there are four in Tesla,
00:20:04.480 | they'll probably receive a thousand times more press
00:20:07.000 | than anyone else.
00:20:08.820 | - So, the psychology of that is actually fascinating.
00:20:11.480 | I don't think we'll have enough time to talk about that,
00:20:13.360 | but I have to talk to you about the human side of things.
00:20:17.040 | So, myself and our team at MIT recently released a paper
00:20:20.960 | on functional vigilance of drivers while using Autopilot.
00:20:24.600 | This is work we've been doing since Autopilot
00:20:27.480 | was first released publicly over three years ago,
00:20:30.220 | collecting video of driver faces and driver body.
00:20:34.600 | So, I saw that you tweeted a quote from the abstract,
00:20:38.460 | so I can at least guess that you've glanced at it.
00:20:43.360 | - Yeah, I read it.
00:20:44.520 | - Can I talk you through what we found?
00:20:46.320 | - Sure. - Okay.
00:20:47.280 | So, it appears that in the data that we've collected,
00:20:52.280 | that drivers are maintaining functional vigilance
00:20:55.240 | such that we're looking at 18,000 disengagements
00:20:57.880 | from Autopilot, 18,900,
00:21:00.400 | and annotating were they able to take over control
00:21:04.560 | in a timely manner.
00:21:05.760 | So, they were there, present, looking at the road
00:21:08.740 | to take over control.
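
For context, a small sketch of the kind of estimate behind a study like this: the observed proportion of timely takeovers among annotated disengagements, with a Wilson score interval. The count of timely takeovers used below is a placeholder, not the paper's reported figure.

```python
# Hedged sketch: given N annotated disengagements and k judged "timely
# takeovers", compute the observed proportion with a Wilson score interval.
import math

def wilson_interval(k: int, n: int, z: float = 1.96):
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4 * n * n))
    return p, (center - half, center + half)

n_disengagements = 18_900
k_timely = 18_850                      # hypothetical annotation result
p, (lo, hi) = wilson_interval(k_timely, n_disengagements)
print(f"timely takeover rate: {p:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```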
00:21:10.180 | Okay, so this goes against what many would predict
00:21:15.180 | from the body of literature on vigilance with automation.
00:21:19.500 | Now, the question is, do you think these results
00:21:22.620 | hold across the broader population?
00:21:24.860 | So, ours is just a small subset.
00:21:27.320 | Do you think, one of the criticism is that,
00:21:31.480 | you know, there's a small minority of drivers
00:21:33.660 | that may be highly responsible,
00:21:36.060 | where their vigilance decrement would increase
00:21:38.840 | with Autopilot use?
00:21:40.400 | - I think this is all really gonna be swept.
00:21:42.580 | I mean, the system's improving so much, so fast,
00:21:47.580 | that this is gonna be a moot point very soon.
00:21:50.340 | Where vigilance is, like, if something's many times
00:21:57.100 | safer than a person, then adding a person does,
00:22:01.580 | the effect on safety is limited.
00:22:04.540 | And, in fact, it could be negative.
00:22:08.780 | - That's really interesting.
00:22:11.500 | So, the fact that a human may, some percent of the population
00:22:16.500 | may exhibit a vigilance decrement will not affect
00:22:20.640 | the overall statistics numbers of safety.
00:22:22.340 | - No, in fact, I think it will become, very, very quickly,
00:22:27.340 | maybe even towards the end of this year,
00:22:29.260 | but I'd say, I'd be shocked if it's not next year,
00:22:32.040 | at the latest, that having a human intervene
00:22:36.340 | will decrease safety.
00:22:37.880 | Decrease.
00:22:40.780 | I can imagine if you're in an elevator.
00:22:42.900 | Now, it used to be that there were elevator operators,
00:22:45.700 | and you couldn't go in an elevator by yourself
00:22:48.100 | and work the lever to move between floors.
00:22:50.940 | And now, nobody wants an elevator operator,
00:22:56.980 | because the automated elevator that stops the floors
00:23:00.500 | is much safer than the elevator operator.
00:23:02.700 | And, in fact, it would be quite dangerous
00:23:05.420 | to have someone with a lever that can move
00:23:07.780 | the elevator between floors.
00:23:09.780 | - So, that's a really powerful statement,
00:23:12.780 | and a really interesting one.
00:23:14.660 | But I also have to ask, from a user experience
00:23:16.900 | and from a safety perspective,
00:23:18.740 | one of the passions for me, algorithmically,
00:23:21.260 | is camera-based detection of just sensing the human,
00:23:25.980 | but detecting what the driver's looking at,
00:23:27.780 | cognitive load, body pose.
00:23:29.660 | On the computer vision side, that's a fascinating problem,
00:23:32.100 | but do you, and there's many in industry
00:23:34.540 | who believe you have to have
00:23:35.700 | camera-based driver monitoring.
00:23:37.540 | Do you think there could be benefit gained
00:23:39.820 | from driver monitoring?
00:23:41.700 | - If you have a system that's at or below
00:23:45.980 | a human-level reliability,
00:23:47.220 | then driver monitoring makes sense.
00:23:48.980 | But if your system is dramatically better,
00:23:52.100 | more reliable than a human,
00:23:54.220 | then driver monitoring does not help much.
00:23:59.220 | And, like I said, you wouldn't want someone in the elevator,
00:24:04.460 | if you're in an elevator,
00:24:06.620 | do you really want someone with a big lever,
00:24:08.500 | some random person operating an elevator between floors?
00:24:11.580 | I wouldn't trust that.
00:24:14.380 | I would rather have the buttons.
00:24:15.980 | - Okay, you're optimistic about the pace
00:24:20.220 | of improvement of the system,
00:24:21.780 | from what you've seen with a full self-driving car,
00:24:24.300 | computer, the rate of improvement is exponential.
00:24:27.340 | - So one of the other very interesting design choices
00:24:31.580 | early on that connects to this
00:24:33.780 | is the operational design domain of autopilot.
00:24:38.260 | So where autopilot is able to be turned on.
00:24:41.700 | So contrast another vehicle system that we're studying
00:24:47.140 | is the Cadillac Super Cruise system.
00:24:49.020 | That's, in terms of ODD,
00:24:50.500 | very constrained to particular kinds of highways,
00:24:53.580 | well-mapped, tested,
00:24:55.420 | but it's much narrower than the ODD of Tesla vehicles.
00:24:58.700 | - It's like ADD.
00:25:01.940 | (laughing)
00:25:04.620 | - That's good, that's a good line.
00:25:06.460 | What was the design decision
00:25:10.380 | in that different philosophy of thinking where,
00:25:14.380 | so there's pros and cons.
00:25:15.580 | What we see with a wide ODD
00:25:19.060 | is Tesla drivers are able to explore more
00:25:22.300 | the limitations of the system, at least early on,
00:25:24.500 | and they understand,
00:25:26.220 | together with the instrument cluster display,
00:25:28.260 | they start to understand what are the capabilities.
00:25:30.420 | So that's a benefit.
00:25:31.980 | The con is you're letting drivers use it basically anywhere.
00:25:36.980 | - Well, anywhere that could detect lanes with confidence.
00:25:41.620 | - Was there a philosophy,
00:25:43.100 | design decisions that were challenging
00:25:46.580 | that were being made there?
00:25:48.140 | Or from the very beginning,
00:25:49.500 | was that done on purpose with intent?
00:25:54.500 | - Well, I mean, I think,
00:25:56.100 | frankly, it's pretty crazy letting people
00:25:57.860 | drive a two-ton death machine manually.
00:26:01.500 | That's crazy.
00:26:03.840 | In the future, people will be like,
00:26:06.180 | I can't believe anyone was just allowed
00:26:08.620 | to drive one of these two-ton death machines,
00:26:12.980 | and they could just drive wherever they wanted.
00:26:14.460 | Just like elevators, you could just move the elevator
00:26:17.100 | with that lever wherever you want.
00:26:18.140 | It could stop at halfway between floors if you want.
00:26:20.580 | It's pretty crazy.
00:26:23.540 | So, it's gonna seem like a mad thing in the future
00:26:29.540 | that people were driving cars.
00:26:31.840 | - So I have a bunch of questions about the human psychology,
00:26:35.660 | about behavior and so on, that would become--
00:26:39.180 | - That's moot, totally moot.
00:26:41.020 | - Because you have faith in the AI system.
00:26:46.300 | Not faith, but both on the hardware side
00:26:50.500 | and the deep learning approach of learning from data
00:26:52.940 | will make it just far safer than humans.
00:26:55.660 | - Yeah, exactly.
00:26:57.260 | - Recently, there are a few hackers
00:26:59.420 | who tricked Autopilot to act in unexpected ways
00:27:02.020 | with adversarial examples.
00:27:03.940 | So, we all know that neural network systems
00:27:06.500 | are very sensitive to minor disturbances
00:27:08.420 | to these adversarial examples on input.
00:27:11.220 | Do you think it's possible to defend against
00:27:13.140 | something like this for the industry?
00:27:15.980 | - Sure.
00:27:16.820 | (laughing)
00:27:19.060 | - Can you elaborate on the confidence behind that answer?
00:27:22.700 | - Well, a neural net is just a bunch of matrix math.
00:27:28.620 | You have to be a very sophisticated,
00:27:30.720 | somebody who really understands neural nets
00:27:33.320 | and basically reverse engineer how the matrix
00:27:37.340 | is being built and then create a little thing
00:27:40.500 | that just exactly causes the matrix math
00:27:44.100 | to be slightly off.
00:27:45.420 | But it's very easy to then block that
00:27:47.740 | by having basically negative recognition.
00:27:51.820 | It's like if the system sees something
00:27:53.900 | that looks like a matrix hack, exclude it.
00:27:57.700 | It's such an easy thing to do.
00:28:01.600 | - So, learn both on the valid data and the invalid data.
00:28:06.220 | So, basically learn on the adversarial examples
00:28:08.220 | to be able to exclude them.
00:28:09.740 | - Yeah, you basically want to both know
00:28:12.340 | what is a car and what is definitely not a car.
00:28:16.180 | You train for this is a car and this is definitely not a car.
00:28:19.100 | Those are two different things.
00:28:20.700 | People have no idea of neural nets, really.
00:28:23.820 | They probably think neural nets involves
00:28:25.300 | like fishing net or something.
00:28:27.100 | (laughing)
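
An illustrative sketch of the defence described above: crafted adversarial perturbations of "car" examples are added to training with a negative label, so the model learns "definitely not a car" alongside "car". The FGSM-style perturbation and toy data are assumptions for illustration, not Tesla's pipeline.

```python
# Illustrative sketch (not Tesla's pipeline) of the defence described above:
# alongside the ordinary "car" / "not car" labels, crafted adversarial
# perturbations of car images are added to training with a negative label,
# so the model learns to recognise "looks like a matrix hack" and exclude it.
import numpy as np

rng = np.random.default_rng(0)

def adversarial_perturbation(x: np.ndarray, grad: np.ndarray,
                             eps: float = 0.05) -> np.ndarray:
    """FGSM-style step: nudge the input along the sign of the loss gradient."""
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

# Toy "dataset": flattened image vectors with labels 1 = car, 0 = not a car.
cars     = rng.random((100, 64))
not_cars = rng.random((100, 64))

# Pretend we computed loss gradients for the car images with the current model
# (random here, for illustration), craft perturbed copies, and label them 0:
fake_grads  = rng.standard_normal((100, 64))
hacked_cars = adversarial_perturbation(cars, fake_grads)

X = np.vstack([cars, not_cars, hacked_cars])
y = np.concatenate([np.ones(100), np.zeros(100), np.zeros(100)])
print(X.shape, y.shape)   # training now includes "definitely not a car" hacks
```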
00:28:29.180 | - So, as you know, so taking a step beyond
00:28:33.980 | just Tesla and autopilot, current deep learning approaches
00:28:37.740 | still seem in some ways to be far
00:28:42.300 | from general intelligence systems.
00:28:44.660 | Do you think the current approaches
00:28:46.740 | will take us to general intelligence
00:28:49.740 | or do totally new ideas need to be invented?
00:28:53.820 | - I think we're missing a few key ideas
00:28:57.220 | for general intelligence,
00:29:00.100 | general artificial general intelligence.
00:29:05.140 | But it's gonna be upon us very quickly
00:29:07.220 | and then we'll need to figure out what shall we do
00:29:12.460 | if we even have that choice.
00:29:13.900 | But it's amazing how people can't differentiate
00:29:18.300 | between say the narrow AI that allows a car
00:29:22.740 | to figure out what a lane line is
00:29:24.420 | and navigate streets versus general intelligence.
00:29:29.420 | Like these are just very different things.
00:29:33.140 | Like your toaster and your computer are both machines
00:29:35.820 | but one's much more sophisticated than another.
00:29:38.660 | - You're confident with Tesla you can create
00:29:41.460 | the world's best toaster?
00:29:43.660 | - The world's best toaster, yes.
00:29:45.220 | The world's best self-driving.
00:29:51.060 | To me, right now, this seems game, set, match.
00:29:55.260 | I mean, I don't want to be complacent or overconfident
00:29:57.820 | but that is just literally how it appears right now.
00:30:02.660 | I could be wrong but it appears to be the case
00:30:06.340 | that Tesla is vastly ahead of everyone.
00:30:09.660 | - Do you think we will ever create an AI system
00:30:13.460 | that we can love and loves us back
00:30:16.300 | in a deep meaningful way like in the movie Her?
00:30:18.660 | - I think AI will be capable of convincing you
00:30:23.460 | to fall in love with it very well.
00:30:25.860 | - And that's different than us humans?
00:30:27.760 | - You know, we start getting into a metaphysical question
00:30:31.300 | of like do emotions and thoughts exist
00:30:33.820 | in a different realm than the physical?
00:30:35.620 | And maybe they do, maybe they don't, I don't know.
00:30:38.340 | But from a physics standpoint, I tend to think of things,
00:30:41.820 | like physics was my main sort of training
00:30:47.420 | and from a physics standpoint, essentially,
00:30:51.380 | if it loves you in a way that you can't tell
00:30:54.220 | whether it's real or not, it is real.
00:30:56.180 | - It's a physics view of love.
00:30:59.100 | - Yeah.
00:31:00.900 | If you cannot prove that it does not,
00:31:04.720 | if there's no test that you can apply
00:31:08.060 | that would make it,
00:31:10.160 | allow you to tell the difference,
00:31:15.900 | then there is no difference.
00:31:17.500 | - And it's similar to seeing our world as simulation.
00:31:21.100 | There may not be a test to tell the difference
00:31:22.940 | between what the real world and the simulation
00:31:25.420 | and therefore, from a physics perspective,
00:31:27.860 | it might as well be the same thing.
00:31:29.340 | - Yes.
00:31:30.180 | And there may be ways to test whether it's a simulation.
00:31:33.120 | There might be, I'm not saying there aren't,
00:31:36.020 | but you could certainly imagine that a simulation
00:31:37.940 | could correct for that: once an entity in the simulation
00:31:41.100 | found a way to detect the simulation,
00:31:43.060 | it could either restart, pause the simulation,
00:31:47.380 | start a new simulation or do one of many other things
00:31:49.900 | that then corrects for that error.
00:31:51.660 | - So when maybe you or somebody else creates an AGI system
00:31:59.340 | and you get to ask her one question,
00:32:02.980 | what would that question be?
00:32:04.380 | - What's outside the simulation?
00:32:18.700 | - Elon, thank you so much for talking today.
00:32:23.380 | It was a pleasure.
00:32:24.260 | - All right, thank you.
00:32:25.420 | (upbeat music)