MIT Advanced Vehicle Technology Study (MIT-AVT)


Transcript

As part of the MIT Autonomous Vehicle Technology Study, we're instrumenting cars with various degrees of automation. So let's take a look at one of those cars, a Tesla Model S, and walk through our instrumentation. Inside the car, we have three cameras. One is looking at the driver's face, capturing things like where the driver is looking, the driver's drowsiness state, emotional state, and cognitive load.
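For illustration, here's a minimal sketch of what per-frame processing of the face camera might look like: a classifier mapping each frame to a glance region. The label set and the stand-in model are assumptions for the sketch, not the study's actual models.

```python
# A minimal sketch of per-frame glance classification from the face
# camera. The label set and the stand-in model are assumptions; in a
# real pipeline the model would be a trained image classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

GLANCE_REGIONS = ["road", "instrument_cluster", "center_stack",
                  "rearview_mirror", "left", "right"]  # assumed labels

# Stand-in for a trained network: any classifier with one output
# per glance region fits this slot.
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(3 * 224 * 224, len(GLANCE_REGIONS)))
model.eval()

def classify_frame(frame: torch.Tensor) -> str:
    """Map one normalized 3x224x224 face-camera frame to a glance region."""
    with torch.no_grad():
        probs = F.softmax(model(frame.unsqueeze(0)), dim=1)
    return GLANCE_REGIONS[int(probs.argmax())]

# A random tensor stands in for a decoded video frame.
print(classify_frame(torch.rand(3, 224, 224)))
```

A sequence of these per-frame labels is what turns raw video into a glance timeline downstream.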

We have a camera looking at the driver's body, a fisheye-lens camera that captures the entire body of the driver, including the hands. That gives us information about whether the hands are off the wheel, whether the body is aligned, and other supplementary information about the state of the driver beyond what the face camera provides.
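As a rough sketch of that body-camera signal, a hands-on-wheel check can reduce to simple geometry once an upstream hand detector has produced bounding boxes. The detector, the wheel region, and all coordinates below are assumptions for illustration.

```python
# A minimal hands-on-wheel check from the body camera, assuming an
# upstream hand detector (hypothetical) that returns pixel-space boxes.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def hands_on_wheel(hand_boxes: List[Box], wheel_region: Box) -> bool:
    """A hand counts as on-wheel if its box intersects the wheel region."""
    return any(boxes_overlap(h, wheel_region) for h in hand_boxes)

# The wheel occupies a roughly fixed region of the fisheye image since
# the camera is rigidly mounted; these coordinates are made up.
WHEEL_REGION: Box = (300, 400, 700, 800)
print(hands_on_wheel([(350, 450, 450, 550)], WHEEL_REGION))  # True
```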

And finally, there's a forward-facing camera attached to the windshield that's looking at the forward roadway, capturing everything in the external environment: vehicles, lane markings, and other characteristics of the road. Having these three cameras in the car allows us to study driver behavior and interaction with automation.
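Because the three cameras record as separate streams, any per-moment analysis needs them aligned in time. Here's a minimal sketch of nearest-timestamp alignment; the frame rate and clock offsets below are made up for illustration.

```python
# A minimal sketch of aligning three video streams by timestamp,
# assuming each stream provides per-frame timestamps in seconds.
import bisect

def nearest_frame(timestamps, t):
    """Index of the frame whose timestamp is closest to t."""
    i = bisect.bisect_left(timestamps, t)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    return min(candidates, key=lambda j: abs(timestamps[j] - t))

# Per-stream timestamps; in practice these come from the recorder.
face_ts = [k / 30.0 for k in range(9000)]          # 5 min at 30 fps
body_ts = [k / 30.0 + 0.01 for k in range(9000)]   # slight clock offset
road_ts = [k / 30.0 - 0.02 for k in range(9000)]

# Walk the face stream and pull the closest body and road frames,
# so every annotated moment has all three views.
for k, t in enumerate(face_ts[:3]):
    print((k, nearest_frame(body_ts, t), nearest_frame(road_ts, t)))
```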

So the camera looking at the driver's face, the camera looking at the body, and the camera looking at the outside environment together allow us to understand, over hundreds of thousands of miles of real-world driving, how people interact with these technologies, and how artificial intelligence systems can play an important role in keeping us safe and making driving an enjoyable experience.

To date, we have collected 275,000 miles of real-world driving and interaction with autonomous systems in Tesla Model S vehicles, in Range Rover Evoque vehicles, and a Volvo S90. But most importantly, once that data is collected, it's just raw pixels: 3.5 billion video frames of raw pixels. We're using computer vision and deep learning methods to convert those pixels into knowledge, into an understanding of what drivers are actually doing with these systems.
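To get a feel for that scale, a back-of-envelope calculation: even at a generous inference throughput (the 500 frames/second figure is an assumption for illustration), a single pass of one model over 3.5 billion frames is measured in GPU-days.

```python
# Back-of-envelope cost of "touching every frame" with one model,
# assuming 500 frames/second per GPU (an illustrative number).
TOTAL_FRAMES = 3_500_000_000
FPS_PER_GPU = 500

gpu_seconds = TOTAL_FRAMES / FPS_PER_GPU
gpu_days = gpu_seconds / 86_400
print(f"{gpu_days:,.0f} GPU-days per model pass")  # ~81 GPU-days
```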

That comes from the face camera, the body camera, and the forward-facing camera. Understanding comes from actually being able to touch every single one of those frames and convert them into the behavior of human beings as they interact with these artificial intelligence systems. 3.5 billion frames of real-world driving.
