Arguing Machines: Tesla Autopilot vs Neural Network
Chapters
0:00 Outside intro
1:00 Concept overview
2:46 In-car description of components
3:45 On-road demonstration
00:00:00.000 |
Our group at MIT is studying semi-autonomous vehicles. 00:00:02.880 |
Now that includes both inward-facing sensors for driver state sensing and outward-facing 00:00:07.740 |
sensors for scene perception and the control planning, motion planning task. 00:00:13.200 |
Now today we'll look at the second part of that, at the perception and the control of the vehicle. 00:00:19.740 |
On the dashboard of the Tesla, there's a Jetson TX2 with a camera sitting on top of it. 00:00:25.780 |
We have a neural network end-to-end running on the Jetson that's detecting the forward 00:00:31.040 |
roadway, taking it as a sequence of images and producing steering commands. 00:00:35.240 |
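As a rough illustration of what such an end-to-end network can look like, here is a minimal PyTorch sketch that maps a single forward-roadway difference image to a steering value. The layer sizes and names are illustrative assumptions, not the architecture running on the Jetson in this demo.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Minimal end-to-end steering regressor: image in, steering value out.

    Illustrative layer sizes only; not the network used in the demo.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 8)),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 8, 100), nn.ReLU(),
            nn.Linear(100, 1),  # predicted steering command (scaled angle)
        )

    def forward(self, x):  # x: (batch, 1, height, width) difference image
        return self.regressor(self.features(x))

# Example: one 66x200 difference image in, one steering value out.
model = SteeringNet()
steering = model(torch.zeros(1, 1, 66, 200))
```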
We also have here a Tesla that has a perception control system on it in the form of Autopilot. 00:00:45.840 |
It's making decisions based on this single video stream producing steering commands. 00:00:52.000 |
And we'll look at two systems arguing today, Autopilot arguing against a neural network. 00:01:00.800 |
In this concept, Tesla Autopilot is the primary AI system and the end-to-end neural network is the secondary system. 00:01:08.080 |
And the disagreement between the two is used to detect challenging situations and seek the attention of the human driver. 00:01:14.080 |
It is important to clarify that this is not a criticism of Autopilot. 00:01:19.240 |
Of the two, it is by far the superior perception control system. 00:01:24.280 |
The question is whether the argument between the two systems can create transparency that 00:01:29.500 |
leverages the human driver as a supervisor of challenging driving scenarios, scenarios 00:01:34.740 |
that may not have otherwise been caught by Autopilot alone. 00:01:39.040 |
This is a general framework for supervision of black box AI systems that we hope can help make such systems safer. 00:01:46.880 |
In the paper accompanying this video, we show that we can predict driver-initiated disengagement 00:01:51.960 |
of Autopilot with a simple threshold on the disagreement of steering decisions. 00:01:58.400 |
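A minimal sketch of that thresholding idea, assuming steering streams from the two systems sampled at the same rate; the threshold and smoothing window below are illustrative values, not the ones reported in the paper.

```python
import numpy as np

def flag_disagreement(autopilot_steering, network_steering,
                      threshold=5.0, window=10):
    """Flag time steps where the two steering streams disagree.

    `threshold` (in the same units as the steering values) and the smoothing
    `window` are illustrative, not the values used in the paper.
    """
    ap = np.asarray(autopilot_steering, dtype=float)
    net = np.asarray(network_steering, dtype=float)
    disagreement = np.abs(ap - net)
    # Smooth over a short window so single-frame noise does not trigger an alert.
    kernel = np.ones(window) / window
    smoothed = np.convolve(disagreement, kernel, mode="same")
    return smoothed > threshold
```

A raised flag is then treated as a prediction that the driver is likely to disengage, or as a prompt to bring the driver's attention back to the road.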
We believe this is a very surprising and powerful result that hopefully may be useful for human 00:02:04.440 |
supervision of any kind of AI system that operates in the real world and makes decisions 00:02:09.440 |
where errors may result in loss of human life. 00:02:15.240 |
A quick note that we use the intensity of red color on the disagreement detected text 00:02:19.880 |
as the visualization of disagreement magnitude. 00:02:22.800 |
In retrospect, this is not an effective visualization because visually it looks like the two systems are disagreeing more than they actually are. 00:02:31.920 |
The intent of the on-road demo is to show successful real-time operation of the Arguing Machines framework. 00:02:37.980 |
The paper that goes along with this approach, on the other hand, is where we show the predictive 00:02:42.200 |
power of this approach on large-scale naturalistic data. 00:02:47.800 |
Inside the car, we have a screen over the center stack and a Jetson TX2 with a camera on the dashboard. 00:02:55.800 |
The camera is feeding a video stream into the Jetson. 00:02:58.640 |
On the Jetson is a neural network that's predicting the steering command, taking in end-to-end 00:03:04.720 |
the video stream from the forward roadway and giving steering commands as the output. 00:03:13.480 |
The pink line is the steering suggested by the neural network. 00:03:17.360 |
The cyan line is the steering of the car, of the Tesla, that we're getting from the CAN bus. 00:03:23.640 |
When I move the steering wheel around, we see that live in real-time mapped on this 00:03:30.400 |
graphic here showing in cyan the steering position of the car. 00:03:35.360 |
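For context, a car's steering position can be read from the CAN bus with a library such as python-can. The message ID, byte layout, and scaling below are hypothetical placeholders, not the actual Tesla signal definition.

```python
import can  # python-can

STEERING_MSG_ID = 0x123   # hypothetical arbitration ID, not the real Tesla one
DEGREES_PER_BIT = 0.1     # hypothetical scaling factor

def read_steering_angle(bus, timeout=1.0):
    """Return the latest steering angle in degrees, or None if nothing arrived."""
    msg = bus.recv(timeout=timeout)
    if msg is None or msg.arbitration_id != STEERING_MSG_ID:
        return None
    raw = int.from_bytes(msg.data[0:2], byteorder="little", signed=True)
    return raw * DEGREES_PER_BIT

bus = can.interface.Bus(channel="can0", bustype="socketcan")
angle = read_steering_angle(bus)  # drawn in cyan on the center-stack display
```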
Up top, whenever the two disagree significantly, the disagreement detected red sign appears. 00:03:47.440 |
We're now driving on the highway with the Tesla being controlled by autopilot and the 00:03:53.760 |
Jetson TX2 on the dashboard with a camera plugged in has a neural network running on it. 00:04:02.080 |
And the input to the neural network is a sequence of images and the output is steering commands. 00:04:06.760 |
Now there's two perception control systems working here. 00:04:09.880 |
One is autopilot, the other one is an end-to-end neural network. 00:04:14.200 |
The steering commands from both are being visualized on the center stack here. 00:04:18.760 |
In pink is the output from the neural network, in cyan is the output from autopilot. 00:04:28.240 |
And whenever there is some disagreement or a lot of disagreement, up on top there's a 00:04:32.320 |
disagreement detected text that becomes more intensely red the greater the disagreement. 00:04:40.680 |
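A sketch of how a disagreement magnitude could be mapped to the intensity of that red text; the saturation constant is an illustrative choice, not a value from the demo.

```python
def disagreement_color(magnitude, saturation=20.0):
    """Map a disagreement magnitude to an RGB red whose intensity grows with it."""
    alpha = min(max(magnitude / saturation, 0.0), 1.0)
    return (int(255 * alpha), 0, 0)  # brighter red for larger disagreement
```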
At the bottom of the screen is the input to the neural network that is a sequence of images 00:04:46.560 |
that are subtracted from each other, capturing the temporal dynamics of the scene. 00:04:54.600 |
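A minimal sketch of that preprocessing step, assuming grayscale frames stacked along the first axis:

```python
import numpy as np

def difference_frames(frames):
    """Convert a stack of grayscale frames (T, H, W) into difference images.

    Consecutive frames are subtracted so the network sees the temporal
    dynamics of the scene rather than a single static view.
    """
    frames = np.asarray(frames, dtype=np.float32)
    diffs = frames[1:] - frames[:-1]   # (T - 1, H, W)
    return diffs / 255.0               # keep values in a small, stable range
```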
So we have two perception control systems, two AI systems, taking in the external world using 00:05:01.120 |
a monocular camera and making a prediction, producing steering commands to control the vehicle. 00:05:06.640 |
Now whenever those two systems disagree, that's interesting for many reasons. 00:05:10.880 |
One, the disagreement is an indicator that from a visual perspective, from a perception 00:05:15.720 |
perspective the situation is challenging for those systems. 00:05:19.480 |
Therefore you might want to bring the driver's attention to the situation so they take control of the vehicle. 00:05:26.720 |
It's also interesting for validating systems. 00:05:29.400 |
So if you propose a new perception control system, you can imagine putting it into a 00:05:34.120 |
car to go along with autopilot or with other similar systems to see how well that new system 00:05:41.600 |
works with autopilot when it disagrees, when it doesn't. 00:05:45.480 |
And the disagreement from the computer vision aspect is also really interesting for detecting edge cases. 00:05:52.360 |
So the challenging thing about driving, or for building autonomous vehicles, is that most of driving is routine, and the interesting bits, the edge cases, are rare. 00:06:02.200 |
So one of the ways to detect those interesting bits, the edge cases, is to look at the disagreement 00:06:07.880 |
between these perception systems, to look at cases when the two perception systems diverge 00:06:13.320 |
and therefore they struggle with that situation. 00:06:17.160 |
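A sketch of how logged drives could be mined for such edge cases, assuming synchronized steering logs from both systems; the threshold and spacing values are illustrative.

```python
import numpy as np

def mine_edge_cases(timestamps, autopilot_steering, network_steering,
                    threshold=5.0, min_gap=5.0):
    """Return timestamps where the two systems diverge, spaced at least min_gap apart."""
    disagreement = np.abs(np.asarray(autopilot_steering, dtype=float)
                          - np.asarray(network_steering, dtype=float))
    candidates = np.asarray(timestamps)[disagreement > threshold]
    edge_cases, last = [], -np.inf
    for t in candidates:
        if t - last >= min_gap:
            edge_cases.append(float(t))
            last = t
    return edge_cases
```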
Finally, when the driver takes control of the vehicle, which I am doing 00:06:24.920 |
now, and when my steering decisions, my turning of the steering wheel is such that the neural 00:06:33.560 |
network disagrees, it perhaps means that I am either distracted or the situation is visually 00:06:42.120 |
challenging, therefore I should pay extra attention. 00:06:44.800 |
So it makes sense for the system to warn you about that situation. 00:06:50.760 |
Now the interesting thing about Tesla and the autopilot system is that if we instrument 00:06:56.480 |
a lot of these vehicles, as we have, we've instrumented 20 Teslas as part of the MIT 00:07:01.560 |
Autonomous Vehicle Study and are collecting data month after month, year after year now. 00:07:09.800 |
We can use that data to train better systems, to train perception systems, control, motion 00:07:16.360 |
planning and the end-to-end network that we're showing today. 00:07:19.760 |
We have the large-scale data to train the learning-based perception and control algorithms. 00:07:29.320 |
Now an important thing to mention is that these systems were designed to work on the highway. 00:07:35.640 |
So the kind of disagreement it's trained to detect is disagreement between Autopilot and the neural network in highway driving. 00:07:44.520 |
So the visual characteristics of lane markings deteriorating or construction zones and so on. 00:07:50.240 |
Now the details, if you're interested in more, can be found in the paper titled "Arguing Machines."