
MIT-AVT: Data Collection Device (for Large-Scale Semi-Autonomous Driving)


Chapters

0:00
1:22 Components
1:37 Sensory Integration
3:17 Logitech C920 Webcam
7:30 Synchronization
12:30 Next Steps


00:00:00.000 | The MIT Autonomous Vehicle Technology Study is all about collecting large amounts of naturalistic driving data.
00:00:07.000 | Behind that data collection is this box right here that Dan has termed "Rider".
00:00:13.000 | Dan is behind a lot of the hardware work we do, embedded systems,
00:00:17.000 | and Michael is behind a lot of the software, the data pipeline, as well as just offloading the data from the device.
00:00:24.000 | We'd like to tell you some of the details behind Rider, behind the sensors.
00:00:30.000 | Now, we have three cameras in the car and the wires are running back into the trunk, and that's where Rider is sitting.
00:00:36.000 | There's a lot of design specifications to make the system work month after month,
00:00:41.000 | reliably across multiple vehicles, across multiple weather conditions, and so on.
00:00:46.000 | At the end of the day, we have multiple sensor streams.
00:00:49.000 | We have the three cameras coming in, we have IMU, GPS, and all of the raw CAN messages coming from the vehicle itself.
00:00:57.000 | And all of that has to be collected reliably, synchronized, and post-processed once we offload the data.
00:01:04.000 | First, we have a single-board computer here running a custom version of Linux that we wrote specifically for this application.
00:01:10.000 | This single-board computer integrates all of the cameras, all the sensors, GPS, CAN, IMU,
00:01:16.000 | and offloads it all onto the solid-state hard drive that we have on board.
00:01:21.000 | There are some extra components here for cellular communication, as well as power management throughout the device.
00:01:27.000 | Here we have our single-board computer, as well as sensor integration and our power system.
00:01:32.000 | This is our solid-state drive that connects directly to our single-board computer.
00:01:36.000 | On our single-board computer, we have a sensor integration board on top here.
00:01:40.000 | You'll be able to see our real-time clock, as well as its battery backup and CAN transceiver.
00:01:44.000 | On the reverse side of this board, we have our GPS receiver and IMU.
00:01:49.000 | This is our CAN-controlled power board, which monitors CAN throughout the car and determines whether or not the system should be on or off.
00:01:55.000 | When the system is on, this sends power through a buck converter to reduce the 12 volts from the vehicle down to 5 volts to operate the single-board computer.
00:02:04.000 | We also have a 4G wireless connection on board to monitor the health of Rider and determine things like free capacity left on our drive, as well as temperature and power usage information.
00:02:17.000 | The cameras connect to Rider through this USB hub right here.
00:02:24.000 | We needed the box to do at least three things.
00:02:26.000 | One was record from at least three cameras, record CAN vehicle telemetry data,
00:02:32.000 | and then lastly, be able to store all this data on board for a long period of time,
00:02:37.000 | such that people could drive around for months without us having to offload the data from their vehicles.
00:02:43.000 | When we're talking about hundreds of thousands of miles of data,
00:02:50.000 | every 100,000 miles of uncompressed video comes to roughly 100 petabytes of data.
00:02:59.000 | One of the other key requirements was how to store all this data on the device
00:03:04.000 | and how to then offload it successfully onto thousands of machines to be processed
00:03:11.000 | with the computer vision and deep learning algorithms that we're using.
00:03:14.000 | One of the essential elements for that was to do compression on board.
00:03:18.000 | These are Logitech C920 webcams.
00:03:20.000 | They can do up to 1080p at 30 frames a second.
00:03:24.000 | The major reason why we went with these is because they do onboard H.264 compression of the video.
00:03:30.000 | So that allows us to offload all of that processing from our single-board computer onto the individual cameras,
00:03:36.000 | letting us use a very slim, pared-down, lightweight single-board computer to run all of these sensors.
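To make that onboard-compression idea concrete, here is a minimal sketch of recording the C920's hardware-encoded H.264 stream straight to disk without re-encoding. It assumes a Linux host with ffmpeg (with v4l2 support) and a camera at /dev/video0, and is only an illustration, not the actual recording software on the box.

```python
# Minimal sketch: record the C920's onboard H.264 stream without re-encoding.
# Assumes a Linux host with ffmpeg and the camera enumerated at /dev/video0;
# an illustration only, not the box's actual recording code.
import subprocess

def record_h264(device="/dev/video0", out_path="cam0.mkv", seconds=60):
    """Copy the camera's hardware-encoded H.264 stream straight to disk."""
    cmd = [
        "ffmpeg",
        "-f", "v4l2",              # Video4Linux2 capture
        "-input_format", "h264",   # ask the UVC camera for its H.264 stream
        "-framerate", "30",
        "-video_size", "1920x1080",
        "-i", device,
        "-c:v", "copy",            # no re-encoding on the host CPU
        "-t", str(seconds),
        out_path,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    record_h264()
```

Because the H.264 encoding happens inside the camera, the host only shuffles already-compressed bytes, which is what lets a lightweight single-board computer keep up with three cameras at once.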
00:03:44.000 | This is the original Logitech C920 that you would buy at a store.
00:03:48.000 | These are two of the same Logitech C920s, but put into a custom-made camera case just for this application.
00:03:55.000 | What this allows us to do is add our own CS-type lenses, so we can have a zoom lens
00:04:01.000 | as well as a fisheye lens inside the car, giving us a greater range of fields of view within the vehicle.
00:04:08.000 | So this is the fisheye lens.
00:04:10.000 | This is the zoom lens.
00:04:12.000 | And besides the CS type, there's also a C type.
00:04:15.000 | These are the standard lens mounts that connect to these kinds of cameras,
00:04:20.000 | including the industrial cameras that are often used for autonomous vehicle applications.
00:04:25.000 | We tested these cameras to see what would happen to them if placed inside of a hot car on a summer day.
00:04:32.000 | We wanted to see whether these cameras would hold up to the summer heat and still function as needed.
00:04:38.000 | We put these cameras in a toaster.
00:04:41.000 | A scientific toaster.
00:04:44.000 | And this is the temperature that it went up to.
00:04:46.000 | We cycled these cameras between 58 and 75 degrees Celsius,
00:04:51.000 | which corresponds to roughly the 150 degrees Fahrenheit maximum temperature a car can reach in the summer.
00:04:59.000 | We also cranked it up to 127 degrees Celsius just to see what would happen to these cameras after prolonged high heat.
00:05:08.000 | In fact, these cameras continued to work perfectly fine after that.
00:05:13.000 | Creating a system that would intelligently and autonomously turn on and off to start and end recording was also a key aspect of this device.
00:05:21.000 | Since people were just going to be driving their normal cars, we couldn't rely on them necessarily to start and end recording.
00:05:27.000 | So this device, Rider, intelligently figures out when the car is running and when it's off to start and stop recording automatically.
00:05:37.000 | So how does Rider specifically know when to turn on?
00:05:41.000 | So we use CAN to determine when the system should turn off and on.
00:05:45.000 | When CAN is active, the car is running, and we should turn the system on.
00:05:48.000 | When CAN is inactive, we should turn the system off and end recording.
00:05:53.000 | This also gives us the ability to trigger on certain CAN messages.
00:05:56.000 | So, for instance, if we want to start recording as soon as they approach the car and unlock the door, we can do that.
00:06:06.000 | Or if they turn the car on, or they put it into drive, and so on.
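As a rough illustration of that decision logic, the sketch below watches CAN traffic and flips recording on and off based on bus activity, with a hook for triggering on a specific message. It assumes the python-can library, a SocketCAN interface named can0, an idle timeout, and a made-up arbitration ID; the real on/off control lives in dedicated hardware on the power board.

```python
# Illustrative sketch of the "CAN activity means the car is on" decision logic.
# Assumes python-can and a SocketCAN interface named can0; the actual power
# switching on Rider is done by dedicated hardware, not this script.
import time
import can

CAN_IDLE_TIMEOUT_S = 30.0   # assumed: how long without traffic before powering down
DOOR_UNLOCK_ID = 0x123      # hypothetical arbitration ID for a door-unlock message

def monitor(bus: can.BusABC):
    last_activity = time.monotonic()
    recording = False
    while True:
        msg = bus.recv(timeout=1.0)          # None if no frame arrived within 1 s
        now = time.monotonic()
        if msg is not None:
            last_activity = now
            if not recording:
                recording = True             # bus woke up: start recording
                print("CAN active -> start recording")
            if msg.arbitration_id == DOOR_UNLOCK_ID:
                print("door unlock seen -> could start recording even earlier")
        elif recording and now - last_activity > CAN_IDLE_TIMEOUT_S:
            recording = False                # bus went quiet: end recording
            print("CAN idle -> stop recording")

if __name__ == "__main__":
    monitor(can.interface.Bus(channel="can0", bustype="socketcan"))
```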
00:06:06.000 | The cost of the car that the system resides in is about a thousand times more than the system itself.
00:06:12.000 | So these are $100,000 plus cars.
00:06:15.000 | So we have to make sure that we design the system, we run the wires in such a way that it doesn't do any damage to the vehicles.
00:06:21.000 | What kind of things fail when they fail?
00:06:23.000 | The biggest issue we've had with this system are camera cables becoming unplugged.
00:06:28.000 | So when a camera cable becomes unplugged, the system will try to restart that subsystem multiple times.
00:06:35.000 | And if it's unable to, it completely shuts off recording.
00:06:38.000 | And as long as that cable is still unplugged, Rider will not start up the next time.
00:06:42.000 | So one issue that we've seen is that cables becoming unplugged causes us to lose the potential to record some data.
00:06:49.000 | And that was one of the requirements of the system from the very beginning,
00:06:52.000 | is that all the video streams are always recorded perfectly and synchronized.
00:06:57.000 | Now, if any of the subsystems is failing to record from its sensors,
00:07:01.000 | then we try again, restarting that subsystem, and if it's still not working, the whole thing shuts down.
00:07:08.000 | In order to understand what drivers are doing in these systems, the video is essential.
00:07:13.000 | So if one of the cameras is not working, that means the system as a whole is not working.
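A minimal sketch of that restart-then-shut-down policy might look like the following; start_camera() is a hypothetical placeholder for whatever actually launches a camera capture process, and the retry budget is an assumed value.

```python
# Sketch of the "restart the subsystem a few times, then give up" policy.
# start_camera() is a hypothetical stand-in for the real camera bring-up code.
import time

MAX_RESTARTS = 3   # assumed retry budget per camera

def start_camera(index: int) -> bool:
    """Hypothetical: try to open camera `index`; return True on success."""
    raise NotImplementedError

def bring_up_cameras(n_cameras: int = 3) -> bool:
    for cam in range(n_cameras):
        for _attempt in range(MAX_RESTARTS):
            try:
                if start_camera(cam):
                    break                    # this camera is up, move on to the next one
            except Exception:
                pass
            time.sleep(2.0)                  # brief pause before restarting the subsystem
        else:
            # One camera never came up: abort the whole recording session,
            # since partial video defeats the "all streams, always" requirement.
            return False
    return True
```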
00:07:20.000 | The other crucial component of having a data collection system that's taking multiple streams
00:07:28.000 | is that those streams have to be synchronized perfectly.
00:07:31.000 | Synchronization was the highest priority from the very beginning of Rider's design.
00:07:38.000 | We have a real-time clock onboard Rider that allows us down to two parts per million of accuracy in time stamping.
00:07:44.000 | This means over the course of a one-and-a-half-hour drive,
00:07:46.000 | our time stamps issued to each of the different subsystems may drift up to seven or so milliseconds.
00:07:55.000 | Relatively, this is extremely small compared to most clocks on computers today.
00:08:01.000 | And once the data is offloaded, the very first thing we do is make sure
00:08:06.000 | that the data was time stamped correctly so that we can synchronize it.
00:08:11.000 | Synchronizing the data is the first step of the data pipeline.
00:08:15.000 | That means taking the time stamp that came from the real-time clock,
00:08:20.000 | assigned to every single piece of sensor data, and using that time stamp to align the data together.
00:08:28.000 | Now for video, that means 30 frames a second perfectly aligned with other GPS signals and so on.
00:08:34.000 | There are some other sensors like IMU and the CAN messages coming from the car
00:08:39.000 | that come much more frequently than 30 hertz, 30 frames a second.
00:08:43.000 | So we have a different synchronization scheme there.
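One simple way to picture that alignment step is a nearest-timestamp match between the 30 fps frame times and a higher-rate stream such as the IMU, as sketched below; this is only an illustration of the idea, not the study's actual synchronization code.

```python
# Sketch: for each 30 fps video frame, find the closest IMU (or CAN) sample
# by the shared real-time-clock timestamps. Illustration only.
import numpy as np

def align_to_frames(frame_ts: np.ndarray, sensor_ts: np.ndarray) -> np.ndarray:
    """Return, for each frame timestamp, the index of the nearest sensor sample."""
    idx = np.searchsorted(sensor_ts, frame_ts)     # first sample at or after the frame time
    idx = np.clip(idx, 1, len(sensor_ts) - 1)
    left, right = sensor_ts[idx - 1], sensor_ts[idx]
    choose_left = (frame_ts - left) < (right - frame_ts)
    return np.where(choose_left, idx - 1, idx)

# Example: 30 Hz frame times against ~100 Hz IMU samples (timestamps in seconds).
frames = np.arange(0, 10, 1 / 30)
imu = np.arange(0, 10, 1 / 100)
imu = imu + np.random.uniform(-1e-3, 1e-3, size=imu.shape)   # a little timing jitter
nearest_imu_index = align_to_frames(frames, imu)
```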
00:08:45.000 | But overall, synchronization from the very beginning of the design of the hardware
00:08:50.000 | to the very end of the design of the software pipeline is crucial
00:08:54.000 | because we want to be able to analyze what people are doing in these semi-autonomous vehicles,
00:08:58.000 | how they're interacting with the technology.
00:09:00.000 | And that means using data that comes from the face camera, the body camera, the forward view,
00:09:05.000 | synchronized together with the GPS, the IMU,
00:09:08.000 | and all the messages coming from the vehicle telemetry from CAN.
00:09:13.000 | The video stream compression, which is a very CPU- or GPU-intensive operation,
00:09:18.000 | is performed onboard the camera.
00:09:20.000 | There are other CPU-intensive operations performed on Rider, like the sensor fusion for the IMU.
00:09:28.000 | But for the most part, there are sufficient CPU cycles left for the actual data collection
00:09:33.000 | to not have any skips or drifts in the sensor stream collection.
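As an example of what IMU sensor fusion can involve, here is a small complementary-filter sketch for estimating pitch from gyro and accelerometer readings. The specific filter and its gain are assumptions for illustration, not necessarily what runs on the box.

```python
# Hedged sketch of one common kind of IMU fusion: a complementary filter for pitch.
import math

class ComplementaryFilter:
    def __init__(self, alpha: float = 0.98):
        self.alpha = alpha   # assumed gain: trust in the integrated gyro vs. the accelerometer
        self.pitch = 0.0     # radians

    def update(self, gyro_rate: float, accel_x: float, accel_z: float, dt: float) -> float:
        # Short-term: integrate the gyro rate; long-term: lean on the accelerometer tilt.
        accel_pitch = math.atan2(accel_x, accel_z)
        self.pitch = self.alpha * (self.pitch + gyro_rate * dt) + (1 - self.alpha) * accel_pitch
        return self.pitch
```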
00:09:38.000 | One of the questions we get is, how do we get the data from this box to our computers,
00:09:45.000 | then to the cluster that's doing the compute?
00:09:48.000 | So when we receive a hard drive from one of these Rider boxes that we're swapping,
00:09:53.000 | we connect the hard drive locally to our computers,
00:09:55.000 | and then we do a remote copy to a server that contains all of our data.
00:09:59.000 | We then check the data for consistency and perform any fixes on the raw data
00:10:03.000 | in preparation for a synchronization operation.
00:10:06.000 | So we're not doing any remote offloading of data.
00:10:09.000 | The data lives on Rider until the subjects, the drivers, the owners of the car,
00:10:14.000 | come back to us and offload the data.
00:10:17.000 | So we take the hard drive, swap it out, and offload the data from the hard drive.
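A hedged sketch of that offload step, using rsync with checksums to copy a swapped drive to the data server; the mount point, hostname, paths, and the choice of rsync are all assumptions for illustration.

```python
# Sketch: copy a swapped Rider drive to the data server and verify with checksums.
# Mount point and remote destination are hypothetical.
import subprocess

def offload(drive_mount: str = "/mnt/rider_drive",
            remote: str = "user@dataserver:/data/avt/incoming/"):
    subprocess.run(
        ["rsync", "-a",        # preserve file metadata
         "--checksum",         # compare by content, not just size/mtime
         "--partial",          # keep partial files so large copies can resume
         "--progress",
         drive_mount + "/", remote],
        check=True,
    )

if __name__ == "__main__":
    offload()
```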
00:10:21.000 | Can you tell me the journey that a pixel takes on its way from the camera to our cluster?
00:10:30.000 | Well, first the camera records the raw image data based on the settings
00:10:35.000 | that we've configured from the Rider box,
00:10:39.000 | and that raw image data is compressed on the camera itself into an H.264 format
00:10:44.000 | and then transmitted over the USB cable to the single-board computer on the Rider box.
00:10:50.000 | Then it's recorded onto the solid state drive in a video file,
00:10:54.000 | where it will stay until we do an offload, roughly every six months
00:10:59.000 | for our NDS subjects and every month for our FT subjects.
00:11:03.000 | After that, the drive is connected to a local computer,
00:11:07.000 | the data is synced to a remote server,
00:11:09.000 | and is then processed with initial cleaning algorithms
00:11:14.000 | in order to remove any corrupt data or to fix any subject data
00:11:18.000 | in the configuration files for that particular trip.
00:11:21.000 | After the initial cleaning is taken care of,
00:11:24.000 | it is synchronized at 30 frames per second
00:11:28.000 | and can then be used for different detection algorithms or manual annotation.
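As an example of the kind of consistency check that runs before synchronization, the sketch below compares a trip video's frame count against its timestamp log; the file layout, the OpenCV dependency, and the tolerance are assumptions for illustration.

```python
# Sketch: does the number of frames in a trip's video roughly match its timestamp log?
# Assumes opencv-python and a one-timestamp-per-line log file; both are hypothetical details.
import cv2

def frames_match(video_path: str, timestamp_path: str, tolerance: int = 5) -> bool:
    cap = cv2.VideoCapture(video_path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    with open(timestamp_path) as f:
        n_stamps = sum(1 for line in f if line.strip())
    return abs(n_frames - n_stamps) <= tolerance
```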
00:11:32.000 | So the important hard work behind the magic that deep learning and computer vision unlock
00:11:38.000 | is the synchronization and the cleaning of the messy data,
00:11:42.000 | making sure we get anything that's at all weird out of the data,
00:11:49.000 | so that at the end of the pipeline, we have a clean data set
00:11:53.000 | of multiple sensor streams perfectly synchronized
00:11:56.000 | that we can then use for both analysis and for annotation
00:12:00.000 | so that we can improve the neural network models used for the various detection tasks.
00:12:07.000 | So Rider has done an amazing job across over 30 vehicles
00:12:11.000 | of collecting hundreds of thousands of miles worth of data,
00:12:15.000 | billions of video frames.
00:12:17.000 | So we're talking about an incredible amount of data,
00:12:20.000 | all compressed with H.264, totaling close to 300 terabytes of data.
00:12:26.000 | But, of course, it can always improve.
00:12:29.000 | So what are our next steps?
00:12:32.000 | One huge improvement for Rider would be transitioning to another single-board computer,
00:12:36.000 | in particular a Jetson TX2.
00:12:39.000 | There's a lot more capability for added sensors as well as much more compute power
00:12:43.000 | and even the possibility for developing some real-time systems with a Jetson.
00:12:47.000 | One of the things you realize when you're collecting huge amounts of driving data
00:12:50.000 | is that most of driving is quite boring.
00:12:53.000 | Nothing interesting happens in terms of understanding driver behavior
00:12:56.000 | or training computer vision models for edge cases and so on.
00:13:02.000 | So one of the future steps we're taking is based on what we've found in the data so far:
00:13:09.000 | we know which parts are interesting and which are not.
00:13:12.000 | And so we want to design onboard algorithms that process that video data
00:13:15.000 | in real time to determine,
00:13:17.000 | "Is this the kind of data I want to keep at this time?
00:13:20.000 | And if not, throw it out."
00:13:22.000 | That means we can collect more efficiently just the bits that are interesting
00:13:26.000 | for edge case neural network model training or for understanding human behavior.
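A sketch of that keep-or-discard idea: score each recorded segment with some detector and delete the ones below a threshold. The interestingness() function and the threshold are hypothetical placeholders; only the filtering logic around them is the point.

```python
# Sketch of onboard filtering: keep only segments an (assumed) detector finds interesting.
import os

KEEP_THRESHOLD = 0.5   # assumed threshold on an "interestingness" score in [0, 1]

def interestingness(segment_path: str) -> float:
    """Hypothetical model call returning a score in [0, 1]."""
    raise NotImplementedError

def filter_segment(segment_path: str) -> bool:
    score = interestingness(segment_path)
    if score < KEEP_THRESHOLD:
        os.remove(segment_path)    # boring segment: free the space for useful data
        return False
    return True                    # interesting segment: keep it for offload
```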
00:13:31.000 | Now this is a totally unknown open area because really we don't understand
00:13:36.000 | what people do in semi-autonomous vehicles,
00:13:39.000 | when the car is driving itself and when the human is driving.
00:13:42.000 | So the initial stages of the study were to keep all the data so we could do the analysis
00:13:47.000 | to analyze the body pose, glance allocation, activity, smartphone usage,
00:13:52.000 | all the various sudden decelerations, Autopilot usage, where it's used, how it's used,
00:13:58.000 | geography, weather, night, and so on.
00:14:01.000 | But as we start to understand where the fundamental insights come from,
00:14:04.000 | we can start to be more and more selective about which epochs of data we want to be collecting.
00:14:09.000 | Now that requires real-time processing of the data.
00:14:12.000 | As Dan said, that's where the power that the Jetson TX2 brings
00:14:17.000 | becomes more and more useful.
00:14:19.000 | Now all of this work is part of the MIT Autonomous Vehicle Technology Study.
00:14:24.000 | We've collected over 320,000 miles so far and are collecting 500 to 1,000 miles every day.
00:14:31.000 | So we're always growing, adding new vehicles.
00:14:34.000 | We're looking at adding a Tesla Model 3, a Cadillac CT6 with the Super Cruise system, and others.
00:14:42.000 | One of the driving principles behind our work is that the kind of data collection we need
00:14:47.000 | to design safe, semi-autonomous, and autonomous vehicles has to record
00:14:52.000 | not just the forward roadway or any kind of sensor data on the external environment.
00:14:58.000 | We need to have rich sensor information about the internal environment,
00:15:02.000 | what the driver is doing, everything about their face, their glance, their cognitive load,
00:15:08.000 | and body pose, everything about their activity.
00:15:11.000 | We truly believe that autonomy, autonomous vehicles, require an understanding
00:15:17.000 | of how human supervisors of those systems behave, how we can keep them attentive,
00:15:22.000 | keep their glance on the road, keep them as effective, efficient supervisors of those systems.
00:15:40.000 | Thanks for watching.