Elon Musk: Tesla Autopilot | Lex Fridman Podcast #18
Chapters
0:00 Introduction
2:35 The Dream of Autopilot
4:00 Autopilot Design
5:02 Computer Vision Uncertainty
7:10 How to best distribute effort
9:30 Fully redundant SOCs
10:18 Learning from edge cases
11:45 Manual control
12:57 Big leaps
13:56 Technological roadblocks
15:00 Self-driving cars
16:52 Full autonomy
20:08 Functional vigilance
23:10 Driver monitoring
24:28 Operational design domain
26:32 Neural network security
28:29 General AI
The following is a conversation with Elon Musk.
including CEOs and CTOs of automotive, robotics,
offering a podcast conversation with Mr. Musk.
I accepted with full control of questions I could ask
I've never spoken with Elon before this conversation,
Neither he nor his companies have any influence on my opinion, nor on the rigor and integrity
Tesla has never financially supported my research,
We agree on some things and disagree on others.
is to understand the way the guest sees the world.
One particular point of disagreement in this conversation was the extent to which camera-based driver monitoring
it will remain relevant for AI-assisted driving.
I believe that if implemented and integrated effectively, camera-based driver monitoring is likely to be of benefit
override any concern of human behavior and psychology.
but I deeply respect the engineering and innovation
My goal here is to catalyze a rigorous, nuanced, and objective discussion in industry and academia on AI-assisted driving, one that ultimately makes
And now, here's my conversation with Elon Musk.
- when, in the beginning, the big picture system level
when it was first conceived and started being installed
- I wouldn't characterize it as a vision or dream,
simply that there are obviously two massive revolutions
And it became obvious to me that in the future,
Which is not to say that there's no use, it's just rare,
It's just obvious that cars will drive themselves completely,
And if we did not participate in the autonomy revolution,
five to 10 times more than a car which is not autonomous.
but let's say at least for the next five years,
- So there are a lot of very interesting design choices
or in the Model 3 on the center stack display,
- The whole point of the display is to provide a health check on the vehicle's perception of reality.
but also radar and ultrasonics, GPS, and so forth.
like lane lines and traffic lights and other cars, and then in vector space, that is re-rendered onto a display so you can confirm whether the car knows what's going on or not by looking out the window.
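A rough sketch of that pipeline as described, with hypothetical Detection records standing in for the real perception output: objects are kept in vector space and then re-projected onto a simple top-down display so a person can sanity-check perception against what they see out the window.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    kind: str      # "car", "lane_line", "traffic_light", ...
    x: float       # longitudinal position in meters (vector space)
    y: float       # lateral position in meters

def render(detections, width=40, height=20, meters_per_cell=2.0):
    """Re-render vector-space detections as a crude top-down ASCII display."""
    grid = [[" "] * width for _ in range(height)]
    for d in detections:
        row = height - 1 - int(d.x / meters_per_cell)
        col = width // 2 + int(d.y / meters_per_cell)
        if 0 <= row < height and 0 <= col < width:
            grid[row][col] = d.kind[0].upper()   # C, L, T ...
    return "\n".join("".join(r) for r in grid)

print(render([Detection("car", 20, 0),
              Detection("lane_line", 10, -4),
              Detection("lane_line", 10, 4)]))
```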
- Right, I think that's an extremely powerful thing
and understanding what the system is capable of.
you know, like road segmentation, lane detection, vehicle detection, object detection underlying the system,
the vehicles in the vicinity, a very clean, crisp image,
and people do confirm that there's a car in front of me and the system sees there's a car in front of me,
One is augmented vision, which I'm sure you've seen,
where it's basically, we draw boxes and labels
and then there's what we call the visualizer, which is basically a vector space representation
But I think this is very difficult for people to,
to the current things that are being displayed is optimized for the general public understanding
or if you have the development build like I do,
But those would just be total gibberish to most people.
- What's your view on how to best distribute effort?
technical aspects of autopilot that are really important.
You only have so much money, only have so much time.
What do you think is the most important thing
Or do you see it as pretty evenly distributed
I think you keep quite close track of it, actually.
- Yeah, so we're approaching half a million cars
So this is, I'm not sure how many other cars on the road
but I'd be surprised if it's more than 5,000,
which means that we have 99% of all the data.
and plug the Tesla computer in, and that's it.
we're still exploring the boundaries of its capabilities,
but we're able to run the cameras at full frame rate,
and it's still got headroom, even on one of the systems.
The full self-driving computer is really two computers, two systems on a chip that are fully redundant.
- The redundancy, are they perfect copies of each other?
as opposed to an arguing machine kind of architecture
if you have a twin-engine aircraft, commercial aircraft, the system will operate best if both systems are operating,
We can actually just run a full duplicate on each one.
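A minimal sketch of that dual-SoC idea, assuming (hypothetically, not Tesla's actual software) that each chip runs the same perception function and a supervisor only acts when the two results agree:

```python
import numpy as np

def run_perception(frame, seed):
    """Stand-in for the full network running on one SoC. The seed only
    simulates two independent pieces of hardware executing the same function."""
    rng = np.random.default_rng(seed)
    # Pretend output: detection confidence for "vehicle ahead".
    return float(np.clip(frame.mean() + rng.normal(0, 1e-6), 0.0, 1.0))

def redundant_decision(frame, tolerance=1e-3):
    """Run the same computation on two units and cross-check before acting."""
    out_a = run_perception(frame, seed=0)   # SoC A
    out_b = run_perception(frame, seed=1)   # SoC B
    if abs(out_a - out_b) > tolerance:
        # Disagreement suggests a hardware or memory fault:
        # fall back to a safe behavior instead of trusting either output.
        return {"status": "fault", "action": "degrade gracefully / hand back control"}
    return {"status": "ok", "confidence": (out_a + out_b) / 2}

if __name__ == "__main__":
    frame = np.random.rand(8, 8)  # toy stand-in for camera input
    print(redundant_decision(frame))
```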
- You haven't really explored or hit the limit of the--
the really valuable data to learn from is the edge cases.
So, how do you, I mean, I've heard you talk somewhere
or perhaps can you speak to those edge cases,
how to discover more and more and more edge cases in driving?
- Well, there's a lot of things that are learned.
Say somebody's on autopilot and they take over,
and then, okay, that's a trigger that goes to our system that says, okay, did they take over for convenience, or because the autopilot wasn't working properly?
There's also, like, let's say we're trying to figure out what is the optimal spline for traversing an intersection.
Then, the ones where there are no interventions
So, you then say, okay, when it looks like this, do the following, and then you get the optimal spline for navigating a complex intersection.
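A toy sketch of that idea, assuming hypothetical logged (x, y) traces from intervention-free traversals of one intersection: average them and fit a smoothing spline as the reference path for "when it looks like this, do the following."

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical intervention-free traversals, each an (N, 2) array of x/y positions.
traversals = [
    np.column_stack([np.linspace(0, 20, 50),
                     np.sin(np.linspace(0, 2, 50)) * 3 + np.random.normal(0, 0.1, 50)])
    for _ in range(25)
]

# Average the good traversals point-by-point (they share the same sampling here).
mean_path = np.mean(traversals, axis=0)

# Fit a smoothing spline through the averaged path: this becomes the
# reference trajectory for navigating this intersection.
tck, _ = splprep([mean_path[:, 0], mean_path[:, 1]], s=1.0)
x_ref, y_ref = splev(np.linspace(0, 1, 200), tck)

print("reference spline has", len(x_ref), "points")
```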
- So, that's for, so there's kind of the common case.
You're trying to capture a huge amount of samples of a particular intersection, when things went right,
and then there's the edge case where, as you said, not for convenience, but something didn't go exactly right.
- Somebody took over, somebody asserted manual control from autopilot, and really, like, the way to look at this
If the user had to do input, there's something,
- That's a powerful line to think of it that way,
or if you want to, it's a navigation decision that autopilot's not currently designed to do,
- Yeah, that's gonna change with Navigate on Autopilot, which was just released, and without stalk confirm.
like, asserting control in order to do a lane change, or exit a freeway, or do a highway interchange,
- Yeah, so that, I don't think people quite understand
through the development of autopilot, through its history,
- And it also automatically overtakes slow cars. So, it's both navigation and seeking the fastest lane.
So, it'll overtake slower cars, and exit the freeway,
I mean, on the development version that I'm driving, the car fully stops and goes at traffic lights.
that are an inkling of a step towards full autonomy.
What would you say are the biggest technological roadblocks
or any Model 3 that has the full self-driving package,
That's important to have enough base computation.
Then refining the neural net and the control software, but all of that can just be provided as an over-the-air update.
- But capable is an interesting word, because--
and then the reliability will increase dramatically, and then it will receive regulatory approval.
I believe you are buying an appreciating asset,
- So, that's a really important statement there,
But, what's your intuition on the software side?
- I think people enjoy it very much so, on the highways. It's a total game changer for quality of life.
So, it's really just extending that functionality
and then being able to navigate complicated parking lots,
even if it's in a complete maze of a parking lot.
and something that people would actually find a lot of use
it's a source of annoyance when you have to do it manually,
If you look at the current level four vehicles
because there's always a safety driver in almost all cases
So, its capabilities are powerful enough to drive,
to still be supervising, just like a safety driver is
- I think it'll require detecting hands on wheel for at least six months or something like that from here.
Really, it's a question of, from a regulatory standpoint, how much safer than a person does autopilot need to be
with high confidence, statistically speaking, that the car is dramatically safer than a person,
So, it might need to be like 200 or 300% safer than a person.
- Incidents per mile, so crashes and fatalities.
there are far more crashes than there are fatalities.
So, you can assess what is the probability of a crash, then there's another step, which is probability of injury,
And all of those need to be much better than a person
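A worked illustration of that chain of probabilities, using made-up per-mile rates purely to show how a "multiple times safer than a person" comparison could be computed; none of these numbers are real Tesla or NHTSA figures.

```python
# Illustrative, made-up rates -- chosen only to demonstrate the arithmetic.
human  = {"crash": 1 / 500_000,   "injury_given_crash": 0.25, "fatality_given_crash": 0.005}
system = {"crash": 1 / 1_500_000, "injury_given_crash": 0.25, "fatality_given_crash": 0.005}

def per_mile(rates):
    """Chain the per-mile crash rate with conditional injury/fatality rates."""
    crash = rates["crash"]
    return {
        "crash": crash,
        "injury": crash * rates["injury_given_crash"],
        "fatality": crash * rates["fatality_given_crash"],
    }

h, s = per_mile(human), per_mile(system)
for k in h:
    # A crash rate one third of the human baseline is one way to read
    # "200% safer than a person" (a 3x improvement over the baseline).
    print(f"{k}: system is {h[k] / s[k]:.1f}x better than the human baseline")
```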
to have a healthy discourse with the regulatory bodies
- I mean, there's no question that regulators
there are, I think, almost 40,000 automotive deaths per year.
they'll probably receive a thousand times more press
- So, the psychology of that is actually fascinating. I don't think we'll have enough time to talk about that, but I have to talk to you about the human side of things.
So, myself and our team at MIT recently released a paper on functional vigilance of drivers while using Autopilot.
This is work we've been doing since Autopilot was first released publicly over three years ago, collecting video of driver faces and driver body.
So, I saw that you tweeted a quote from the abstract, so I can at least guess that you've glanced at it.
So, it appears that in the data that we've collected, that drivers are maintaining functional vigilance
such that we're looking at 18,000 disengagements and annotating whether they were able to take over control
So, they were there, present, looking at the road
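A small sketch of the aggregation that kind of annotation implies, with hypothetical per-disengagement records (not the actual MIT dataset) marking whether the driver took over in a timely manner:

```python
# Hypothetical annotations: one record per disengagement, with a boolean
# for whether the driver took over control in a timely manner.
annotations = [
    {"disengagement_id": i, "timely_takeover": (i % 100 != 0)}  # toy data
    for i in range(18_000)
]

timely = sum(a["timely_takeover"] for a in annotations)
print(f"{timely}/{len(annotations)} disengagements "
      f"({timely / len(annotations):.1%}) had a timely takeover")
```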
Okay, so this goes against what many would predict from the body of literature on vigilance with automation.
Now, the question is, do you think these results
you know, there's a small minority of drivers where their vigilance decrement would increase
I mean, the system's improving so much, so fast, that this is gonna be a moot point very soon.
Where vigilance is, like, if something's many times safer than a person, then adding a person does,
So, the fact that a human may, some percent of the population may exhibit a vigilance decrement will not affect
- No, in fact, I think it will become, very, very quickly,
but I'd say, I'd be shocked if it's not next year,
Now, it used to be that there were elevator operators, and you couldn't go in an elevator by yourself
because the automated elevator that stops at the floors
But I also have to ask, from a user experience
is camera-based detection of just sensing the human,
On the computer vision side, that's a fascinating problem,
And, like I said, you wouldn't want someone in the elevator,
some random person operating an elevator between floors?
from what you've seen with the full self-driving car computer, the rate of improvement is exponential.
- So one of the other very interesting design choices is the operational design domain of autopilot.
So contrast another vehicle system that we're studying
very constrained to particular kinds of highways,
but it's much narrower than the ODD of Tesla vehicles.
in that different philosophy of thinking where,
the limitations of the system, at least early on, together with the instrument cluster display, they start to understand what are the capabilities.
The con is you're letting drivers use it basically anywhere.
- Well, anywhere it could detect lanes with confidence.
to drive one of these two-ton death machines, and they could just drive wherever they wanted.
Just like elevators, you could just move the elevator
It could stop at halfway between floors if you want.
So, it's gonna seem like a mad thing in the future
- So I have a bunch of questions about the human psychology, about behavior and so on, that would become--
and the deep learning approach of learning from data
who tricked Autopilot to act in unexpected ways
- Can you elaborate on the confidence behind that answer?
- Well, a neural net is just a bunch of matrix math.
and basically reverse engineer how the matrix is being built and then create a little thing
- So, learn both on the valid data and the invalid data.
So, basically learn on the adversarial examples
what is a car and what is definitely not a car.
You train for this is a car and this is definitely not a car.
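A minimal sketch of training on adversarial examples in that spirit, using an FGSM-style perturbation (a standard, generic technique, not necessarily what Tesla does): craft perturbed inputs from real ones and include them in the training loss alongside the clean data.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, x, y, loss_fn, eps=0.03):
    """Create adversarial versions of a batch with the fast gradient sign method."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y, loss_fn):
    """Train on the clean batch and on adversarial examples of the same batch,
    so small crafted perturbations no longer flip 'car' to 'not a car'."""
    x_adv = fgsm_perturb(model, x, y, loss_fn)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny toy classifier over 3x32x32 images, two classes: car / not-car.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(8, 3, 32, 32)      # toy image batch
y = torch.randint(0, 2, (8,))     # toy labels
print("loss:", train_step(model, optimizer, x, y, loss_fn))
```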
just Tesla and autopilot, current deep learning approaches
and then we'll need to figure out what shall we do
But it's amazing how people can't differentiate
and navigate streets versus general intelligence.
Like your toaster and your computer are both machines, but one's much more sophisticated than another.
To me, right now, this seems game, set, match.
I mean, I don't want to be complacent or overconfident, but that is just literally how it appears right now.
I could be wrong, but it appears to be the case
- Do you think we will ever create an AI system
in a deep meaningful way like in the movie Her?
- I think AI will be capable of convincing you
- You know, we start getting into a metaphysical question
And maybe they do, maybe they don't, I don't know.
But from a physics standpoint, I tend to think of things,
- And it's similar to seeing our world as a simulation.
There may not be a test to tell the difference between the real world and the simulation
And there may be ways to test whether it's a simulation.
but you could certainly imagine that a simulation could correct that once an entity in the simulation
it could either restart, pause the simulation, start a new simulation, or do one of many other things
- So when maybe you or somebody else creates an AGI system