Arguing Machines: Tesla Autopilot vs Neural Network
Chapters
0:00 Outside intro
1:00 Concept overview
2:46 In-car description of components
3:45 On-road demonstration
00:00:00.000 |
Our group at MIT is studying semi-autonomous vehicles. 00:00:02.880 |
Now that includes both inward-facing sensors for driver state sensing and outward-facing 00:00:07.740 |
sensors for scene perception and the control planning, motion planning task. 00:00:13.200 |
Now today we'll look at the second part of that, at the perception and the control of the vehicle. 00:00:19.740 |
On the dashboard of the Tesla, there's a Jetson TX2 with a camera sitting on top of it. 00:00:25.780 |
We have a neural network end-to-end running on the Jetson that's detecting the forward 00:00:31.040 |
roadway, taking it as a sequence of images and producing steering commands. 00:00:35.240 |
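As a rough illustration of what such an end-to-end network can look like, here is a minimal PyTorch sketch that maps a single forward-roadway difference image to a steering value. The layer sizes and names are illustrative assumptions, not the architecture running on the Jetson in this demo.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Minimal end-to-end steering regressor: image in, steering value out.

    Illustrative layer sizes only; not the network used in the demo.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 8, 100), nn.ReLU(),
            nn.Linear(100, 1),  # predicted steering command (scaled angle)
        )

    def forward(self, x):  # x: (batch, 1, height, width) difference image
        return self.regressor(self.features(x))

# Example: one 66x200 difference image in, one steering value out.
model = SteeringNet()
steering = model(torch.zeros(1, 1, 66, 200))
```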
We also have here a Tesla that has a perception control system on it in the form of Autopilot. 00:00:45.840 |
It's making decisions based on this single video stream producing steering commands. 00:00:52.000 |
And we'll look at two systems arguing today, Autopilot arguing against a neural network. 00:01:00.800 |
In this concept, Tesla Autopilot is the primary AI system and the end-to-end neural network is the secondary system. 00:01:08.080 |
And the disagreement between the two is used to detect challenging situations and seek the attention of the human driver. 00:01:14.080 |
It is important to clarify that this is not a criticism of Autopilot. 00:01:19.240 |
Of the two, it is by far the superior perception control system. 00:01:24.280 |
The question is whether the argument between the two systems can create transparency that 00:01:29.500 |
leverages the human driver as a supervisor of challenging driving scenarios, scenarios 00:01:34.740 |
that may not have otherwise been caught by Autopilot alone. 00:01:39.040 |
This is a general framework for supervision of black box AI systems that we hope can help make such systems safer. 00:01:46.880 |
In the paper accompanying this video, we show that we can predict driver-initiated disengagement 00:01:51.960 |
of Autopilot with a simple threshold on the disagreement of steering decisions. 00:01:58.400 |
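A minimal sketch of that thresholding idea, assuming steering streams from the two systems sampled at the same rate; the threshold and smoothing window below are illustrative values, not the ones reported in the paper.

```python
import numpy as np

def flag_disagreement(autopilot_steering, network_steering,
                      threshold=5.0, window=10):
    """Flag time steps where the two steering streams disagree.

    `threshold` (in the same units as the steering values) and the smoothing
    `window` are illustrative, not the values used in the paper.
    """
    ap = np.asarray(autopilot_steering, dtype=float)
    net = np.asarray(network_steering, dtype=float)
    disagreement = np.abs(ap - net)
    # Smooth over a short window so single-frame noise does not trigger an alert.
    kernel = np.ones(window) / window
    smoothed = np.convolve(disagreement, kernel, mode="same")
    return smoothed > threshold
```

A raised flag is then treated as a prediction that the driver is likely to disengage, or as a prompt to bring the driver's attention back to the road.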
We believe this is a very surprising and powerful result that hopefully may be useful for human 00:02:04.440 |
supervision of any kind of AI system that operates in the real world and makes decisions 00:02:09.440 |
where errors may result in loss of human life. 00:02:15.240 |
A quick note that we use the intensity of red color on the disagreement detected text 00:02:19.880 |
as the visualization of disagreement magnitude. 00:02:22.800 |
In retrospect, this is not an effective visualization because visually it looks like the two systems are disagreeing more than they actually are. 00:02:31.920 |
The intent of the on-road demo is to show successful real-time operation of the Arguing Machines framework. 00:02:37.980 |
The paper that goes along with this approach, on the other hand, is where we show the predictive 00:02:42.200 |
power of this approach on large-scale naturalistic data. 00:02:47.800 |
Inside the car, we have a screen over the center stack and a Jetson TX2 with a camera on the dashboard. 00:02:55.800 |
The camera is feeding a video stream into the Jetson. 00:02:58.640 |
On the Jetson is a neural network that's predicting the steering command, taking in end-to-end 00:03:04.720 |
the video stream from the forward roadway and giving steering commands as the output. 00:03:13.480 |
The pink line is the steering suggested by the neural network. 00:03:17.360 |
The cyan line is the steering of the car, of the Tesla, that we're getting from the CAN bus. 00:03:23.640 |
When I move the steering wheel around, we see that live in real-time mapped on this 00:03:30.400 |
graphic here showing in cyan the steering position of the car. 00:03:35.360 |
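For context, a car's steering position can be read from the CAN bus with a library such as python-can. The message ID, byte layout, and scaling below are hypothetical placeholders, not the actual Tesla signal definition.

```python
import can  # python-can

STEERING_MSG_ID = 0x123   # hypothetical arbitration ID, not the real Tesla one
DEGREES_PER_BIT = 0.1     # hypothetical scaling factor

def read_steering_angle(bus, timeout=1.0):
    """Return the latest steering angle in degrees, or None if nothing arrived."""
    msg = bus.recv(timeout=timeout)
    if msg is None or msg.arbitration_id != STEERING_MSG_ID:
        return None
    raw = int.from_bytes(msg.data[0:2], byteorder="little", signed=True)
    return raw * DEGREES_PER_BIT

bus = can.interface.Bus(channel="can0", bustype="socketcan")
angle = read_steering_angle(bus)  # drawn in cyan on the center-stack display
```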
Up top, whenever the two disagree significantly, the disagreement detected red sign appears. 00:03:47.440 |
We're now driving on the highway with the Tesla being controlled by autopilot and the 00:03:53.760 |
Jetson TX2 on the dashboard with a camera plugged in has a neural network running on it. 00:04:02.080 |
And the input to the neural network is a sequence of images and the output is steering commands. 00:04:06.760 |
Now there's two perception control systems working here. 00:04:09.880 |
One is autopilot, the other one is an end-to-end neural network. 00:04:14.200 |
The steering commands from both are being visualized on the center stack here. 00:04:18.760 |
In pink is the output from the neural network, in cyan is the output from autopilot. 00:04:28.240 |
And whenever there is some disagreement or a lot of disagreement, up on top there's a 00:04:32.320 |
disagreement detected text that becomes more intensely red the greater the disagreement. 00:04:40.680 |
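A sketch of how a disagreement magnitude could be mapped to the intensity of that red text; the saturation constant is an illustrative choice, not a value from the demo.

```python
def disagreement_color(magnitude, saturation=20.0):
    """Map a disagreement magnitude to an RGB red whose intensity grows with it."""
    alpha = min(max(magnitude / saturation, 0.0), 1.0)
    return (int(255 * alpha), 0, 0)  # brighter red for larger disagreement
```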
At the bottom of the screen is the input to the neural network that is a sequence of images 00:04:46.560 |
that are subtracted from each other, capturing the temporal dynamics of the scene. 00:04:54.600 |
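A minimal sketch of that preprocessing step, assuming grayscale frames stacked along the first axis:

```python
import numpy as np

def difference_frames(frames):
    """Convert a stack of grayscale frames (T, H, W) into difference images.

    Consecutive frames are subtracted so the network sees the temporal
    dynamics of the scene rather than a single static view.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = frames[1:] - frames[:-1]   # (T - 1, H, W)
    return diffs / 255.0               # keep values in a small, stable range
```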
So we have two perception control systems, two AI systems, taking in the external world using 00:05:01.120 |
a monocular camera and making a prediction, producing steering commands to control the vehicle. 00:05:06.640 |
Now whenever those two systems disagree, that's interesting for many reasons. 00:05:10.880 |
One, the disagreement is an indicator that from a visual perspective, from a perception 00:05:15.720 |
perspective the situation is challenging for those systems. 00:05:19.480 |
Therefore you might want to bring the driver's attention to the situation so they take control of the vehicle. 00:05:26.720 |
It's also interesting for validating systems. 00:05:29.400 |
So if you propose a new perception control system, you can imagine putting it into a 00:05:34.120 |
car to go along with autopilot or with other similar systems to see how well that new system 00:05:41.600 |
works with autopilot when it disagrees, when it doesn't. 00:05:45.480 |
And the disagreement from the computer vision aspect is also really interesting for detecting edge cases. 00:05:52.360 |
So the challenging thing about driving, or for building autonomous vehicles, is that most of driving is routine, and the interesting bits, the edge cases, are rare. 00:06:02.200 |
So one of the ways to detect those interesting bits, the edge cases, is to look at the disagreement 00:06:07.880 |
between these perception systems, to look at cases when the two perception systems diverge 00:06:13.320 |
and therefore they struggle with that situation. 00:06:17.160 |
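A sketch of how logged drives could be mined for such edge cases, assuming synchronized steering logs from both systems; the threshold and spacing values are illustrative.

```python
import numpy as np

def mine_edge_cases(timestamps, autopilot_steering, network_steering,
                    threshold=5.0, min_gap=5.0):
    """Return timestamps where the two systems diverge, spaced at least min_gap apart."""
    disagreement = np.abs(np.asarray(autopilot_steering, dtype=float)
                          - np.asarray(network_steering, dtype=float))
    candidates = np.asarray(timestamps)[disagreement > threshold]
    edge_cases, last = [], -np.inf
    for t in candidates:
        if t - last >= min_gap:
            edge_cases.append(float(t))
            last = t
    return edge_cases
```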
Finally, when the driver takes control of the vehicle, which I am doing 00:06:24.920 |
now, and when my steering decisions, my turning of the steering wheel is such that the neural 00:06:33.560 |
network disagrees, it perhaps means that I am either distracted or the situation is visually 00:06:42.120 |
challenging, therefore I should pay extra attention. 00:06:44.800 |
So it makes sense for the system to warn you about that situation. 00:06:50.760 |
Now the interesting thing about Tesla and the autopilot system is that if we instrument 00:06:56.480 |
a lot of these vehicles, as we have, we've instrumented 20 Teslas as part of the MIT 00:07:01.560 |
Autonomous Vehicle Study and are collecting data month after month, year after year now. 00:07:09.800 |
We can use that data to train better systems, to train perception systems, control, motion 00:07:16.360 |
planning and the end-to-end network that we're showing today. 00:07:19.760 |
We have the large-scale data to train the learning-based perception and control algorithms. 00:07:29.320 |
Now an important thing to mention is that these systems were designed to work on the highway. 00:07:35.640 |
So the kind of disagreement it's trained to detect is disagreement between Autopilot and the neural network in highway driving. 00:07:44.520 |
So the visual characteristics of lane markings deteriorating or construction zones and so on. 00:07:50.240 |
Now the details, if you're interested in more, can be found in the paper titled "Arguing Machines."