Arguing Machines: Tesla Autopilot vs Neural Network


Chapters

0:00 Outside intro
1:00 Concept overview
2:46 In-car description of components
3:45 On-road demonstration

Whisper Transcript

00:00:00.000 | Our group at MIT is studying semi-autonomous vehicles.
00:00:02.880 | Now that includes both inward-facing sensors for driver state sensing and outward-facing
00:00:07.740 | sensors for scene perception and the control planning, motion planning task.
00:00:13.200 | Now today we'll look at the second part of that, at the perception and the control of
00:00:17.520 | the vehicle.
00:00:19.740 | On the dashboard of the Tesla, there's a Jetson TX2 with a camera sitting on top of it.
00:00:25.780 | We have an end-to-end neural network running on the Jetson that's detecting the forward
00:00:31.040 | roadway, taking it in as a sequence of images and producing steering commands.
00:00:35.240 | We also have here a Tesla that has a perception control system on it in the form of Autopilot.
00:00:41.560 | It's using a monocular camera.
00:00:43.040 | This is the hardware version one.
00:00:45.840 | It's making decisions based on this single video stream producing steering commands.
00:00:52.000 | And we'll look at two systems arguing today, Autopilot arguing against a neural network.
00:00:58.120 | And we'll see what comes out.
00:01:00.800 | In this concept, Tesla Autopilot is the primary AI system and the end-to-end neural network
00:01:05.840 | is the secondary AI system.
00:01:08.080 | And the disagreement between the two is used to detect challenging situations and seek
00:01:12.080 | human driver supervision.
00:01:14.080 | It is important to clarify that this is not a criticism of Autopilot.
00:01:19.240 | Of the two, it is by far the superior perception control system.
00:01:24.280 | The question is whether the argument between the two systems can create transparency that
00:01:29.500 | leverages the human driver as a supervisor of challenging driving scenarios, scenarios
00:01:34.740 | that may not otherwise have been caught by Autopilot alone.
00:01:39.040 | This is a general framework for supervision of black box AI systems that we hope can help
00:01:43.800 | save human lives.
00:01:46.880 | In the paper accompanying this video, we show that we can predict driver-initiated disengagement
00:01:51.960 | of Autopilot with a simple threshold on the disagreement of steering decisions.
00:01:58.400 | We believe this is a very surprising and powerful result that hopefully may be useful for human
00:02:04.440 | supervision of any kind of AI system that operates in the real world and makes decisions
00:02:09.440 | where errors may result in loss of human life.
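
A minimal sketch of the thresholding idea described above, assuming steering decisions are available as angles in degrees. The function names and the 10-degree threshold are hypothetical placeholders for illustration, not values taken from the paper.

```python
def disagreement(primary_deg, secondary_deg):
    """Absolute difference between two steering decisions, in degrees."""
    return abs(primary_deg - secondary_deg)


def needs_supervision(primary_deg, secondary_deg, threshold_deg=10.0):
    """Flag a moment for human supervision when the two systems diverge.

    threshold_deg is a made-up value; in practice it would be tuned on
    recorded driving data.
    """
    return disagreement(primary_deg, secondary_deg) > threshold_deg


# Hypothetical (primary, secondary) steering samples in degrees.
samples = [(1.0, 1.5), (0.0, 14.0), (-3.0, -2.0)]
flags = [needs_supervision(p, s) for p, s in samples]
print(flags)
```

Only the middle sample is flagged, because it is the only one where the two systems differ by more than the assumed threshold.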
00:02:15.240 | A quick note that we use the intensity of red color on the "disagreement detected" text
00:02:19.880 | as the visualization of disagreement magnitude.
00:02:22.800 | In retrospect, this is not an effective visualization because visually it looks like the two systems
00:02:28.640 | are constantly disagreeing.
00:02:30.040 | They are not.
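
One way such a mapping from disagreement magnitude to red intensity might look, as a sketch only. The linear scaling and the 30-degree saturation point are assumptions; the video does not describe the actual mapping used in the demo.

```python
def red_intensity(disagreement_deg, max_deg=30.0):
    """Map a disagreement magnitude to a red channel value in 0..255.

    max_deg is a hypothetical saturation point: any disagreement at or
    above it is drawn at full intensity.
    """
    scaled = min(abs(disagreement_deg) / max_deg, 1.0)
    return round(255 * scaled)


for deg in (0.0, 6.0, 30.0, 60.0):
    print(deg, red_intensity(deg))
```

With a linear mapping like this, even a small disagreement produces a visible tint, which is consistent with the remark that the display can look like constant disagreement.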
00:02:31.920 | The intent of the on-road demo is to show successful real-time operation of the Arguing
00:02:36.200 | Machines framework.
00:02:37.980 | The paper that goes along with this approach, on the other hand, is where we show the predictive
00:02:42.200 | power of this approach on large-scale naturalistic data.
00:02:47.800 | Inside the car, we have a screen over the center stack and a Jetson TX2 with a camera
00:02:54.800 | on top of it.
00:02:55.800 | The camera is feeding a video stream into the Jetson.
00:02:58.640 | On the Jetson is a neural network that's predicting the steering command end-to-end, taking in
00:03:04.720 | the video stream from the forward roadway as input and giving
00:03:10.000 | a steering command as output.
00:03:11.440 | That's being shown as pink on this display.
00:03:13.480 | The pink line is the steering suggested by the neural network.
00:03:17.360 | The cyan line is the steering of the car, of the Tesla, that we're getting from the CAN bus.
00:03:23.640 | When I move the steering wheel around, we see that live in real-time mapped on this
00:03:30.400 | graphic here showing in cyan the steering position of the car.
00:03:35.360 | Up top, whenever the two disagree significantly, the red "disagreement detected" sign appears,
00:03:42.040 | showing that there's a disagreement.
00:03:43.880 | And I'll demonstrate that on road.
00:03:47.440 | We're now driving on the highway with the Tesla being controlled by Autopilot and the
00:03:53.760 | Jetson TX2 on the dashboard with a camera plugged in has a neural network running on
00:03:59.200 | it end-to-end.
00:04:02.080 | And the input to the neural network is a sequence of images and the output is steering commands.
00:04:06.760 | Now there are two perception control systems working here.
00:04:09.880 | One is Autopilot, the other one is an end-to-end neural network.
00:04:14.200 | The steering commands from both are being visualized on the center stack here.
00:04:18.760 | In pink is the output from the neural network, in cyan is the output from Autopilot.
00:04:28.240 | And whenever there is some disagreement or a lot of disagreement, up on top there's a
00:04:32.320 | "disagreement detected" text that becomes more intensely red the greater the disagreement.
00:04:40.680 | At the bottom of the screen is the input to the neural network, which is a sequence of images
00:04:46.560 | that are subtracted from each other, capturing the temporal dynamics of the scene.
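
A minimal sketch of subtracting consecutive frames, assuming each frame is a small grayscale image stored as a list of rows of integers. The exact preprocessing used in the demo is not described in the video, so this only illustrates the general idea; real code would more likely use an array library.

```python
def frame_differences(frames):
    """Return pixel-wise differences between each pair of consecutive frames.

    frames: list of 2D grayscale images (lists of rows of ints), all the
    same size. The result has one fewer image than the input.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diff = [
            [c - p for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ]
        diffs.append(diff)
    return diffs


# Three tiny 2x2 frames: one pixel brightens, then another.
frames = [
    [[0, 0], [0, 0]],
    [[0, 5], [0, 0]],
    [[0, 5], [3, 0]],
]
diffs = frame_differences(frames)
print(diffs)
```

Pixels that do not change between frames come out as zero, so the difference images emphasize what is moving in the scene.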
00:04:51.080 | All right, so why is this interesting?
00:04:54.600 | Because there are two perception control systems, two AI systems, taking in the external world using
00:05:01.120 | a monocular camera and making a prediction, producing steering commands to control the vehicle.
00:05:06.640 | Now whenever those two systems disagree, that's interesting for many reasons.
00:05:10.880 | One, the disagreement is an indicator that from a visual perspective, from a perception
00:05:15.720 | perspective the situation is challenging for those systems.
00:05:19.480 | Therefore you might want to bring the driver's attention to the situation so they take control
00:05:23.840 | back from the vehicle.
00:05:26.720 | It's also interesting for validating systems.
00:05:29.400 | So if you propose a new perception control system, you can imagine putting it into a
00:05:34.120 | car to go along with Autopilot or with other similar systems to see how well that new system
00:05:41.600 | works alongside Autopilot, when it disagrees and when it doesn't.
00:05:45.480 | And the disagreement from the computer vision aspect is also really interesting for detecting
00:05:50.960 | edge cases.
00:05:52.360 | So the challenging thing about driving, or about building autonomous vehicles, is that most
00:05:57.040 | of the driving is really boring.
00:05:59.560 | The interesting bits happen rarely.
00:06:02.200 | So one of the ways to detect those interesting bits, the edge cases, is to look at the disagreement
00:06:07.880 | between these perception systems, to look at cases when the two perception systems diverge
00:06:13.320 | and therefore they struggle with that situation.
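
A sketch of how edge cases might be pulled out of a recorded drive using disagreement, assuming two equal-rate logs of steering angles in degrees. The function name, the threshold, and the minimum segment length are all hypothetical choices for illustration.

```python
def disagreement_segments(primary, secondary, threshold_deg=10.0, min_len=3):
    """Find runs of samples where two steering logs disagree.

    Returns (start, end) index pairs, end exclusive, for every run of at
    least min_len consecutive samples whose absolute difference exceeds
    threshold_deg. Both parameter values are made-up defaults.
    """
    segments = []
    start = None
    n = min(len(primary), len(secondary))
    for i in range(n):
        over = abs(primary[i] - secondary[i]) > threshold_deg
        if over and start is None:
            start = i  # a new run of disagreement begins
        elif not over and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that lasts until the end of the log.
    if start is not None and n - start >= min_len:
        segments.append((start, n))
    return segments


primary_log = [0.0] * 10
secondary_log = [0.0, 0.0, 15.0, 15.0, 15.0, 0.0, 0.0, 20.0, 0.0, 0.0]
print(disagreement_segments(primary_log, secondary_log))
```

The single-sample spike at index 7 is ignored because it is shorter than the assumed minimum length, while the sustained run at indices 2 to 4 is reported as a candidate edge case.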
00:06:17.160 | Finally, when the driver is controlling and takes control of the vehicle, which I am doing
00:06:24.920 | now, and when my steering decisions, my turning of the steering wheel is such that the neural
00:06:33.560 | network disagrees, it perhaps means that I am either distracted or the situation is visually
00:06:42.120 | challenging, therefore I should pay extra attention.
00:06:44.800 | So it makes sense for the system to warn you about that situation.
00:06:50.760 | Now the interesting thing about Tesla and the Autopilot system is that if we instrument
00:06:56.480 | a lot of these vehicles, as we have, we've instrumented 20 Teslas as part of the MIT
00:07:01.560 | Autonomous Vehicle Study and are collecting data month after month, year after year now,
00:07:07.880 | video in and video out.
00:07:09.800 | We can use that data to train better systems, to train perception systems, control, motion
00:07:16.360 | planning and the end-to-end network that we're showing today.
00:07:19.760 | We have the large-scale data to train the learning-based perception and control algorithms.
00:07:29.320 | Now an important thing to mention is that these systems were designed to work on the
00:07:33.720 | highway, at highway speeds.
00:07:35.640 | So the kind of disagreement it's trained to detect is disagreement between Autopilot and
00:07:41.280 | the neural network in highway situations.
00:07:44.520 | So the visual characteristics of lane markings deteriorating or construction zones and so on.
00:07:50.240 | Now the details, if you're interested in more, can be found in a paper titled "Arguing
00:07:55.160 | Machines."