
Driver Gaze Region Classification in a Tesla


Transcript

We're driving around MIT campus today, in a little bit of rain, in a Tesla we've instrumented with six cameras plus other sensors, all going into a single-board computer. The reason we're doing that is we're going to give you a demo of driver gaze classification. One of the things we're interested in, in our group at the MIT Age Lab, is developing systems for driver state detection.
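As a rough illustration of what "six cameras going into a single-board computer" involves, here is a minimal sketch of timestamped multi-camera capture. It is not the pipeline from the video; the camera count, device indices, and the use of OpenCV are assumptions for the example.

```python
# Minimal sketch: grab one frame from several cameras with a shared host timestamp.
# Assumes the cameras show up as device indices 0..5 and are readable via OpenCV.
import time
import cv2

NUM_CAMERAS = 6  # hypothetical: face, hands/lap, center stack, cluster, fisheye, road

def open_cameras(n=NUM_CAMERAS):
    caps = [cv2.VideoCapture(i) for i in range(n)]
    return [c for c in caps if c.isOpened()]

def capture_step(caps):
    """Read one frame from each open camera, all tagged with the same timestamp."""
    t = time.time()
    frames = []
    for cap in caps:
        ok, frame = cap.read()
        frames.append(frame if ok else None)
    return t, frames

if __name__ == "__main__":
    cams = open_cameras()
    t, frames = capture_step(cams)
    print(f"{t:.3f}: captured {sum(f is not None for f in frames)} frames")
    for cap in cams:
        cap.release()
```

In practice the shared timestamp is what makes the later synchronized visualization and cross-stream analysis possible.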

So the first part is the perception, control, and planning, which comes from the external sensors: video cameras, radar, sometimes LiDAR. And then there are inward-facing sensors, like the cameras we have in here, that detect the state of the driver. This is an important component, because a car that's driving itself needs to know when the driver is able to take control back, and vice versa.

And now here is a visualization of some of the synchronized data we're capturing, both for real-time detection and post-processing analysis. Top left is the video of the face. Bottom left is the video of the hands, lap, and the instrument cluster. Bottom middle is the cropped video of the center stack display.

Bottom right is the fisheye video of the instrument cluster. And the top right is the video of the forward roadway. Then there are two things being detected. In the top middle is the visualization of the facial landmarks used in the gaze classification. In the bottom right are annotations of the instrument cluster video showing the status of Autopilot, based on the automatically detected Autopilot icon.
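The transcript only says the Autopilot icon is "automatically detected," so the following is just one plausible way to do it: a minimal template-matching sketch over the cropped instrument-cluster frame. The template file name and the match threshold are assumptions, not values from the video.

```python
# Minimal sketch: is the Autopilot icon visible in a cropped instrument-cluster frame?
# Uses normalized cross-correlation template matching; not the detector from the video.
import cv2

# Hypothetical template image of the Autopilot icon, cropped from an annotated frame.
ICON_TEMPLATE = cv2.imread("autopilot_icon.png", cv2.IMREAD_GRAYSCALE)
MATCH_THRESHOLD = 0.8  # assumed; would be tuned on labeled frames

def autopilot_engaged(cluster_frame_bgr):
    """Return True if the icon template matches somewhere in the cluster frame."""
    gray = cv2.cvtColor(cluster_frame_bgr, cv2.COLOR_BGR2GRAY)
    result = cv2.matchTemplate(gray, ICON_TEMPLATE, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val >= MATCH_THRESHOLD
```

Running this per frame turns the instrument-cluster video into a simple on/off timeline of the automation state.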

We can think of these two detections as classifying the state of the human and the state of the machine. That allows us to study the handover of control from the human to the machine and back. One of the key novel aspects of our approach is that instead of looking at gaze estimation as a geometric problem, we treat it as a supervised learning problem, classifying gaze into one of six regions: road, rearview mirror, left, right, center stack, and instrument cluster.
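To make the "supervised learning over six regions" framing concrete, here is a minimal sketch assuming facial landmarks are already extracted (for example, a standard 68-point detector) and each frame is annotated with one of the six regions. The normalization, feature choice, and random-forest classifier are illustrative assumptions, not the trained model from the video.

```python
# Minimal sketch: map facial-landmark coordinates to one of six gaze regions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

GAZE_REGIONS = ["road", "rearview mirror", "left", "right",
                "center stack", "instrument cluster"]

def landmarks_to_features(landmarks):
    """Flatten (68, 2) landmark coordinates, normalized to their bounding box."""
    pts = np.asarray(landmarks, dtype=np.float32)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    return ((pts - mins) / (maxs - mins + 1e-6)).ravel()

def train_gaze_classifier(landmark_sets, region_labels):
    """Supervised learning: fit a classifier on annotated frames.

    landmark_sets: list of (68, 2) arrays, one per frame
    region_labels: list of indices into GAZE_REGIONS, one per frame
    """
    X = np.stack([landmarks_to_features(l) for l in landmark_sets])
    y = np.asarray(region_labels)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, y)
    return clf

def classify_gaze(clf, landmarks):
    """Predict the gaze region for one frame's landmarks."""
    idx = clf.predict(landmarks_to_features(landmarks)[None, :])[0]
    return GAZE_REGIONS[int(idx)]
```

The point of the classification framing is that the label space is small and discrete, so large annotated datasets can carry the burden of the hard cases rather than an explicit geometric eye model.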

This approach allows us to use large, semi-automatically annotated datasets to generalize over the edge cases that pop up in the wild. And then, in addition to the data on the CAN network, there's the automated detection of the automation state from the instrument cluster. This combination of detecting human state and machine state allows us to study the interaction between the two.
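One simple way to study that interaction, sketched below under the assumption that the per-frame gaze labels and Autopilot on/off flags share the same synchronized timeline, is to scan for the frames where the automation state toggles and record where the driver was looking at that moment. The function and variable names are hypothetical.

```python
# Minimal sketch: find transfer-of-control moments and the driver's gaze at each one.
def find_handover_events(autopilot_on, gaze_regions):
    """Return (frame_index, direction, gaze_at_transition) for each Autopilot toggle.

    autopilot_on: list of bools, one per synchronized frame
    gaze_regions: list of region names, one per synchronized frame
    """
    events = []
    for i in range(1, len(autopilot_on)):
        if autopilot_on[i] != autopilot_on[i - 1]:
            direction = "human_to_machine" if autopilot_on[i] else "machine_to_human"
            events.append((i, direction, gaze_regions[i]))
    return events

# Example with made-up data: Autopilot engages at frame 2 and disengages at frame 4.
if __name__ == "__main__":
    ap = [False, False, True, True, False]
    gaze = ["road", "instrument cluster", "center stack", "road", "road"]
    print(find_handover_events(ap, gaze))
    # [(2, 'human_to_machine', 'center stack'), (4, 'machine_to_human', 'road')]
```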