MIT-AVT: Data Collection Device (for Large-Scale Semi-Autonomous Driving)
Chapters
0:00
1:22 Components
1:37 Sensor Integration
3:17 Logitech C920 Webcam
7:30 Synchronization
12:30 Next Steps
00:00:00.000 |
The MIT Autonomous Vehicle Technology Study is all about collecting large amounts of naturalistic driving data. 00:00:07.000 |
Behind that data collection is this box right here that Dan has termed "Rider". 00:00:13.000 |
Dan is behind a lot of the hardware work we do, embedded systems, 00:00:17.000 |
and Michael is behind a lot of the software, the data pipeline, as well as just offloading the data from the device. 00:00:24.000 |
We'd like to tell you some of the details behind Rider, behind the sensors. 00:00:30.000 |
Now, we have three cameras in the car and the wires are running back into the trunk, and that's where Rider is sitting. 00:00:36.000 |
There are a lot of design specifications to make the system work month after month, 00:00:41.000 |
reliably across multiple vehicles, across multiple weather conditions, and so on. 00:00:46.000 |
At the end of the day, we have multiple sensor streams. 00:00:49.000 |
We have the three cameras coming in, we have IMU, GPS, and all of the raw CAN messages coming from the vehicle itself. 00:00:57.000 |
And all of that has to be collected reliably, synchronized, and post-processed once we offload the data. 00:01:04.000 |
First, we have a single-board computer here running a custom version of Linux that we wrote specifically for this application. 00:01:10.000 |
This single-board computer integrates all of the cameras, all the sensors, GPS, CAN, IMU, 00:01:16.000 |
and offloads it all onto the solid-state hard drive that we have on board. 00:01:21.000 |
There are some extra components here for cellular communication, as well as power management throughout the device. 00:01:27.000 |
Here we have our single-board computer, as well as sensor integration and our power system. 00:01:32.000 |
This is our solid-state drive that connects directly to our single-board computer. 00:01:36.000 |
On our single-board computer, we have a sensor integration board on top here. 00:01:40.000 |
You'll be able to see our real-time clock, as well as its battery backup and CAN transceiver. 00:01:44.000 |
On the reverse side of this board, we have our GPS receiver and IMU. 00:01:49.000 |
This is our CAN-controlled power board, which monitors CAN throughout the car and determines whether or not the system should be on or off. 00:01:55.000 |
When the system is on, this sends power through a buck converter to reduce the 12 volts from the vehicle down to 5 volts to operate the single-board computer. 00:02:04.000 |
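As a rough illustration of the logic involved, here is a minimal sketch assuming a Linux SocketCAN interface and the python-can library; the real power board implements this in hardware and firmware, and the timeout value here is hypothetical.

    import can  # python-can: pip install python-can

    CAN_IDLE_TIMEOUT_S = 30  # hypothetical: seconds of CAN silence before power-down

    def monitor_can_activity():
        # Listen on the vehicle's CAN bus via SocketCAN.
        bus = can.interface.Bus(channel="can0", bustype="socketcan")
        while True:
            msg = bus.recv(timeout=CAN_IDLE_TIMEOUT_S)
            if msg is None:
                # No CAN traffic for the timeout period: the car is off,
                # so stop recording and cut power to the single-board computer.
                print("CAN bus idle; powering down.")
                break
            # Any CAN traffic means the car is active: keep the system powered.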
We also have a 4G wireless connection on board to monitor the health of Rider and determine things like free capacity left on our drive, as well as temperature and power usage information. 00:02:17.000 |
The cameras connect to Rider through this USB hub right here. 00:02:24.000 |
We needed the box to do at least three things. 00:02:26.000 |
One was to record from at least three cameras; another was to record CAN vehicle telemetry data; 00:00:32.000 |
and lastly, to be able to store all this data on board for a long period of time, 00:00:37.000 |
so that people could drive around for months without us having to offload the data from their vehicles. 00:00:43.000 |
When we're talking about hundreds of thousands of miles of data, 00:00:50.000 |
every 100,000 miles of uncompressed video comes to about 100 petabytes of data. 00:00:59.000 |
Another key requirement was how to store all this data both on the device 00:03:04.000 |
and how to then offload it successfully onto thousands of machines to be processed with the computer vision 00:03:11.000 |
and deep learning algorithms that we're using. 00:03:14.000 |
One of the essential elements for that was to do compression on board. 00:03:20.000 |
These Logitech C920 cameras can record up to 1080p at 30 frames per second. 00:03:24.000 |
The major reason we went with these is that they do onboard H.264 compression of the video. 00:03:30.000 |
That allows us to offload all of the video processing from our single-board computer onto the individual cameras, 00:03:36.000 |
letting us use a very slim, pared-down, lightweight single-board computer to run all of these sensors. 00:03:44.000 |
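To illustrate what that buys us, here is a sketch, not the project's actual capture code: because the camera emits H.264 directly over USB, the recording process can simply copy the compressed stream to disk instead of encoding it, for example by launching ffmpeg from Python. The device path and output filename are hypothetical.

    import subprocess

    # Record the camera's hardware-encoded H.264 stream without re-encoding.
    # "-c:v copy" writes the compressed bitstream straight to disk, so the
    # single-board computer does almost no per-frame work.
    subprocess.run([
        "ffmpeg",
        "-f", "v4l2",                # Video4Linux2 capture on Linux
        "-input_format", "h264",     # request the camera's H.264 stream
        "-framerate", "30",
        "-video_size", "1920x1080",
        "-i", "/dev/video0",         # hypothetical device path
        "-c:v", "copy",              # stream copy: no CPU-side encoding
        "camera0.mkv",
    ], check=True)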
This is the original Logitech C920 that you would buy at a store. 00:03:48.000 |
These are two of the same Logitech C920s, but placed into a custom-made camera case just for this application. 00:03:55.000 |
What this allows us to do is add our own CS-mount lenses, giving us a zoom lens 00:04:01.000 |
as well as a fisheye lens within the car, for a greater range of fields of view inside the vehicle. 00:04:15.000 |
These are the standard types of lenses that connect to cameras like these, 00:04:20.000 |
often the industrial cameras used for autonomous vehicle applications. 00:04:25.000 |
We tested these cameras to see what would happen to them if placed inside of a hot car on a summer day. 00:04:32.000 |
We wanted to see whether these cameras would hold up to the summer heat and still function as needed. 00:04:44.000 |
And this is the temperature that it went up to. 00:04:46.000 |
We cycled these cameras between 58 and 75 degrees Celsius, 00:04:51.000 |
which brackets the roughly 150 degrees Fahrenheit (about 66 degrees Celsius) maximum temperature that a car reaches in the summer. 00:04:59.000 |
We also cranked it up to 127 degrees Celsius just to see what would happen to these cameras after prolonged high heat. 00:05:08.000 |
In fact, these cameras continued to work perfectly fine after that. 00:05:13.000 |
Creating a system that would intelligently and autonomously turn on and off to start and end recording was also a key aspect of this device. 00:05:21.000 |
Since people were just going to be driving their normal cars, we couldn't necessarily rely on them to start and end recording. 00:05:27.000 |
So this device, Rider, intelligently figures out when the car is running and when it's off to start and stop recording automatically. 00:05:37.000 |
So how does Rider specifically know when to turn on? 00:05:41.000 |
So we use CAN to determine when the system should turn off and on. 00:05:45.000 |
When CAN is active, the car is running, and we should turn the system on. 00:05:48.000 |
When CAN is inactive, we should turn the system off and end recording. 00:05:53.000 |
This also gives us the ability to trigger on certain CAN messages. 00:05:56.000 |
So, for instance, if we want to start recording as soon as they approach the car and unlock the door, we can do that. 00:06:02.000 |
Or when they turn the car on, put it into drive, and so on. 00:06:06.000 |
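For example, here is a hypothetical sketch using the python-can library; real arbitration IDs are proprietary and vary by vehicle, so the unlock ID below is made up.

    import can

    DOOR_UNLOCK_ID = 0x123  # hypothetical arbitration ID; real IDs vary by vehicle

    bus = can.interface.Bus(channel="can0", bustype="socketcan")
    # Only deliver frames whose ID matches the unlock message.
    bus.set_filters([{"can_id": DOOR_UNLOCK_ID, "can_mask": 0x7FF}])

    for msg in bus:  # a Bus is iterable over received messages
        # First matching frame: the driver unlocked the door, so start recording.
        print(f"Trigger message {msg.arbitration_id:#x} seen; starting recording.")
        break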
The cost of the car that the system resides in is about a thousand times more than the system itself. 00:06:15.000 |
So we have to make sure that we design the system and run the wires in such a way that it doesn't do any damage to the vehicles. 00:06:23.000 |
The biggest issue we've had with this system is camera cables becoming unplugged. 00:06:28.000 |
So when a camera cable becomes unplugged, the system will try to restart that subsystem multiple times. 00:06:35.000 |
And if it's unable to, it completely shuts off recording. 00:06:38.000 |
And as long as that cable is still unplugged, Rider will not start up the next time. 00:06:42.000 |
So one issue that we've seen is that unplugged cables cost us data that we would otherwise have recorded. 00:06:49.000 |
And that was one of the requirements of the system from the very beginning: 00:06:52.000 |
that all the video streams are always recorded perfectly and synchronized. 00:06:57.000 |
Now, if any of the subsystems is failing to record from its sensors, 00:07:01.000 |
then we try again, restarting the subsystem repeatedly, and if it's still not working, the whole system shuts down. 00:07:08.000 |
In order to understand what drivers are doing in these systems, the video is essential. 00:07:13.000 |
So if one of the cameras is not working, the system as a whole is not working. 00:07:20.000 |
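That restart-then-shut-down behavior amounts to a small watchdog loop, sketched below for illustration; the retry budget and the start_camera/is_healthy callbacks are hypothetical stand-ins for Rider's actual subsystem control.

    import time

    MAX_RESTARTS = 3  # hypothetical retry budget per subsystem

    def ensure_camera_recording(start_camera, is_healthy):
        # Restart a camera subsystem a few times; halt recording if it won't recover.
        for attempt in range(1, MAX_RESTARTS + 1):
            start_camera()
            time.sleep(5)  # give the subsystem time to come up
            if is_healthy():
                return True  # camera is streaming again; keep recording
            print(f"Camera restart attempt {attempt} failed.")
        # Partial data is worse than no data: if any stream is missing,
        # stop the whole recording session rather than save an incomplete set.
        print("Camera did not recover; shutting down recording.")
        return False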
The other crucial component of having a data collection system that's taking multiple streams 00:07:28.000 |
is that those streams have to be synchronized perfectly. 00:07:31.000 |
Synchronization was the highest priority from the very beginning of Rider's design. 00:07:38.000 |
We have a real-time clock onboard Rider that gives us timestamping accuracy down to two parts per million. 00:07:44.000 |
This means that over the course of a drive, 00:07:46.000 |
the timestamps issued to each of the different subsystems may drift by up to seven or so milliseconds per hour (2 microseconds of error per second times 3,600 seconds is about 7 milliseconds). 00:07:55.000 |
This drift is extremely small compared to that of most clocks on computers today. 00:08:01.000 |
And once the data is offloaded, the very first thing we do is make sure 00:08:06.000 |
that the data was timestamped correctly so that we can synchronize it. 00:08:11.000 |
Synchronization is the very first step of the data pipeline. 00:08:15.000 |
That means using the timestamps that came from the real-time clock, 00:08:20.000 |
assigned to every single piece of sensor data, to align the streams together. 00:08:28.000 |
Now for video, that means 30 frames a second perfectly aligned with the GPS signals and so on. 00:08:34.000 |
There are some other sensors, like the IMU and the CAN messages coming from the car, 00:08:39.000 |
that arrive much more frequently than 30 hertz, the 30-frames-a-second rate. 00:08:43.000 |
So we have a different synchronization scheme there. 00:08:45.000 |
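As a sketch of what timestamp-based alignment can look like, illustrative only and with made-up names: for each 30 Hz video frame timestamp, pick the nearest-in-time sample from a faster stream such as CAN or IMU.

    import numpy as np

    def align_nearest(frame_ts, sensor_ts, sensor_values):
        # frame_ts:      (N,) sorted numpy array of frame timestamps (30 Hz video)
        # sensor_ts:     (M,) sorted timestamps of a faster stream (CAN, IMU)
        # sensor_values: (M, ...) the corresponding sensor readings
        idx = np.searchsorted(sensor_ts, frame_ts)     # insertion points
        idx = np.clip(idx, 1, len(sensor_ts) - 1)
        left, right = sensor_ts[idx - 1], sensor_ts[idx]
        # Step back one index wherever the earlier neighbor is closer in time.
        idx = idx - ((frame_ts - left) < (right - frame_ts))
        return sensor_values[idx]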
But overall, synchronization from the very beginning of the design of the hardware 00:08:50.000 |
to the very end of the design of the software pipeline is crucial 00:08:54.000 |
because we want to be able to analyze what people are doing in these semi-autonomous vehicles. 00:09:00.000 |
And that means using data that comes from the face camera, the body camera, the forward view, 00:09:08.000 |
and all the vehicle telemetry messages coming from CAN. 00:09:13.000 |
The video stream compression, which is a very CPU- and GPU-intensive operation, is handled on the cameras themselves. 00:09:20.000 |
There are other CPU-intensive operations performed on Rider, like the sensor fusion for the IMU. 00:09:28.000 |
But for the most part, there are sufficient CPU cycles left for the actual data collection 00:09:33.000 |
to not have any skips or drifts in the sensor stream collection. 00:09:38.000 |
One of the questions we get is, how do we get the data from this box to our computers, 00:09:45.000 |
then to the cluster that's doing the compute? 00:09:48.000 |
So when we receive a hard drive from one of these Rider boxes that we're swapping, 00:09:53.000 |
we connect the hard drive locally to our computers, 00:09:55.000 |
and then we do a remote copy to a server that contains all of our data. 00:09:59.000 |
We then check the data for consistency and perform any fixes on the raw data 00:10:03.000 |
in preparation for a synchronization operation. 00:10:06.000 |
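The consistency check can be as simple as hashing every file on both sides of the copy; below is a minimal sketch with hypothetical paths, and the real pipeline also validates timestamps and video integrity.

    import hashlib
    from pathlib import Path

    def sha256_of(path, chunk_size=1 << 20):
        # Stream the file through SHA-256 so large videos don't fill memory.
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def checksum_tree(root):
        # Map relative file path -> SHA-256 for every file under root.
        root = Path(root)
        return {str(p.relative_to(root)): sha256_of(p)
                for p in root.rglob("*") if p.is_file()}

    # Compare the drive's contents against the server copy (paths hypothetical):
    # assert checksum_tree("/mnt/rider_drive") == checksum_tree("/data/rider_copy")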
So we're not doing any remote offloading of data. 00:10:09.000 |
The data lives on Rider until the subjects, the drivers, the owners of the car, bring the vehicle in. 00:10:17.000 |
Then we take the hard drive, swap it out, and offload the data from the hard drive. 00:10:21.000 |
Can you tell me the journey that a pixel takes on its way from the camera to our cluster? 00:10:30.000 |
Well, first the camera records the raw image data based on the settings 00:10:39.000 |
and that raw image data is compressed on the camera itself into an H.264 format 00:10:44.000 |
and then transmitted over the USB wire to the single-board computer in the Rider box. 00:10:50.000 |
Then it's recorded onto the solid-state drive in a video file, 00:10:54.000 |
where it will stay until we do an offload, after about six months 00:10:59.000 |
for our NDS subjects and one month for our FT subjects. 00:11:03.000 |
After that, the drive is connected to a local computer, 00:11:09.000 |
and the data is then processed with initial cleaning algorithms 00:11:14.000 |
in order to remove any corrupt data or to fix any subject data 00:11:18.000 |
in the configuration files for that particular trip. 00:11:28.000 |
Once cleaned and synchronized, it can then be used for different detection algorithms or manual annotation. 00:11:32.000 |
So the important hard work behind the magic that deep learning computer vision unlocks 00:11:38.000 |
is the synchronization, the cleaning of the messy data, 00:11:42.000 |
making sure we get anything that's at all weird out of the data, 00:11:49.000 |
so that at the end of the pipeline, we have a clean data set 00:11:53.000 |
of multiple sensor streams perfectly synchronized 00:11:56.000 |
that we can then use for both analysis and for annotation 00:12:00.000 |
so that we can improve the neural network models used for the various detection tasks. 00:12:07.000 |
So Rider has done an amazing job, across over 30 vehicles, 00:12:11.000 |
of collecting hundreds of thousands of miles worth of data. 00:12:17.000 |
We're talking about an incredible amount of data: 00:12:20.000 |
all compressed with H.264, it comes to close to 300 terabytes. 00:12:32.000 |
One huge improvement for Rider would be transitioning to another single-board computer, such as an NVIDIA Jetson. 00:12:39.000 |
There's a lot more capability for added sensors, as well as much more compute power, 00:12:43.000 |
and even the possibility of developing some real-time systems with a Jetson. 00:12:47.000 |
One of the critical things you realize when you're collecting huge amounts of driving data 00:12:50.000 |
is that most of driving is quite boring: 00:12:53.000 |
nothing interesting in terms of understanding driver behavior 00:12:56.000 |
or training computer vision models on edge cases and so on. 00:13:02.000 |
So one of the future steps we're taking is based on what we've found in the data so far: 00:13:09.000 |
we know which parts are interesting and which are not. 00:13:12.000 |
And so we want to design onboard algorithms that are asking, in real time, 00:13:17.000 |
"Is this the kind of data I want to keep right now?" 00:13:22.000 |
That means we can collect more efficiently just the bits that are interesting 00:13:26.000 |
for edge case neural network model training or for understanding human behavior. 00:13:31.000 |
Now, this is a totally unknown, open area, because we really don't understand 00:13:39.000 |
what happens when the car is driving itself versus when the human is driving. 00:13:42.000 |
So the initial stages of the study were to keep all the data so we could do the analysis: 00:13:47.000 |
to analyze the body pose, glance allocation, activity, smartphone usage, 00:13:52.000 |
all the various sudden decelerations, Autopilot usage, where it's used, how it's used. 00:14:01.000 |
But as we start to understand where the fundamental insights come from, 00:14:04.000 |
we can start to be more and more selective about which epochs of data we want to be collecting. 00:14:09.000 |
Now that requires real-time processing of the data. 00:14:12.000 |
As Dan said, that's where the power that the Jetson TX2 brings comes in. 00:14:19.000 |
Now all of this work is part of the MIT Autonomous Vehicle Technology Study. 00:14:24.000 |
We've collected over 320,000 miles so far and are collecting 500 to 1,000 miles every day. 00:14:31.000 |
So we're always growing, adding new vehicles. 00:14:34.000 |
We're looking at adding a Tesla Model 3, a Cadillac CT6 with the Super Cruise system, and others. 00:14:42.000 |
One of the driving principles behind our work is that the kind of data collection we need 00:14:47.000 |
in order to design safe semi-autonomous and autonomous vehicles has to record 00:14:52.000 |
not just the forward roadway or any kind of sensing of the external environment. 00:14:58.000 |
We need to have rich sensor information about the internal environment, 00:15:02.000 |
what the driver is doing, everything about their face, their glance, their cognitive load, 00:15:08.000 |
and body pose, everything about their activity. 00:15:11.000 |
We truly believe that autonomy, autonomous vehicles, require an understanding 00:15:17.000 |
of how human supervisors of those systems behave, how we can keep them attentive, 00:15:22.000 |
keep their glance on the road, keep them as effective, efficient supervisors of those systems.