
MIT Self-Driving Cars (2018)


Chapters

0:00 Intro
9:59 Different approaches to autonomy
38:36 Sensors
49:51 Companies in the self-driving car space
58:18 Opportunities for deep learning

Transcript

Welcome back to 6.S094: Deep Learning for Self-Driving Cars. Today we will talk about autonomous vehicles, also referred to as driverless cars, autonomous cars, robo-cars.

First, the utopian view, where for many, autonomous vehicles have the opportunity to transform our society in a positive direction. 1.3 million people die every year in automobile crashes globally; 35, 38, 40,000 die every year in the United States. So one huge opportunity, one of the biggest focuses for us here at MIT, for people who truly care about this, is to design autonomous systems, artificial intelligence systems, that save lives. Those systems help work with, deal with, or take away what NHTSA calls the four D's of human folly: drunk, drugged, distracted, and drowsy driving. Autonomous vehicles have the ability to take away drunk, distracted, drowsy, and drugged driving.

Eliminate car ownership: taking shared mobility to another level. Eliminating car ownership, from the business side, is the opportunity to save people money and increase mobility and access. Removing ownership makes vehicles more accessible because the cost of getting from point A to point B drops an order of magnitude. And the insertion of software and intelligence into vehicles makes the way we see moving from point A to point B a totally different experience. Much like with our smartphones, it makes it a personalized, efficient, and reliable experience.

Now for the negative view, the dystopian view. Eliminate jobs: any technology throughout the history of human civilization has created fear that jobs relying on the prior technology will be lost. This is a huge fear, especially in trucking, because so many people in the United States and across the world work in the transportation sector, and the possibility that AI will remove those jobs has potentially catastrophic consequences.

One idea we have to struggle with in the 21st century, as intelligent systems that aren't human beings are further and further integrated into our lives, is that a failure of an autonomous vehicle, even if such failures are much rarer, even if the vehicles are much safer, means there is a possibility that an AI algorithm, designed by probably one of the engineers in this room, will kill a person who would not have died if they were in control of the vehicle. The idea of an intelligent system, one in direct interaction with a human being, killing that human being is one we have to struggle with on a philosophical, ethical, and technological level. Artificial intelligence systems, in popular culture and less so in engineering discussions, may not be ethically grounded. At this time, much of the focus of building these systems, as we'll talk about today and throughout this course, is on the technology: how do we make these things work?

But of course, years or decades out, the ethical concerns start arising. For Rodney Brooks, one of the seminal people from MIT, those ethical concerns will not be an issue for another several decades, at least five decades. But they're still important. It continues the thought, the question of the role of AI in our society: when that car gets to make a decision about human life, what is it making that decision based on?

Especially when it's a black box, what is the ethical grounding of that system? Does it conform with our social norms? Does it go against them? And there are many other concerns. Security is definitely a big one. Even a car that's not artificial-intelligence-based, a car that's software-based, as cars increasingly are: most of the cars on the road today are run by millions of lines of source code. If those lines of source code, written again by some of the engineers in this room, get to decide the life of a human being, then a hacker from outside the car can manipulate that code to also decide the fate of that human being. That's a huge concern.

For us, from the engineering perspective, the truth is somewhere in the middle. We want to find the best, positive way we can build these systems to transform our society, to improve the quality of life of everyone amongst us. But there's a grain of salt to the hype of autonomous vehicles. We have to remember, as we discussed in the previous lecture and as will come up again and again, that our intuition about what is difficult and what is easy for deep learning, for autonomous systems, is flawed. If we use ourselves in this example: human beings are extremely good at driving. This will come up again and again. Our intuition has to be grounded in an understanding of what the source of data is, what the annotation is, and what the approach, the algorithm, is. So you have to be careful about using our intuition, extending it decades out, and making predictions, whether towards the utopian or the dystopian view.

And as we talk about some of the advancements of companies working in this space today, you have to take with a grain of salt what people say in the media, what the companies say, what some of the speakers in this class say about their plans for the future and their current capabilities. A guide I can provide: when there's a promise of a future technology, future vehicles, that is two years out or more, that's a very doubtful prediction. One that is within a year, as we'll give a few examples of today, deserves skepticism. The real proof comes in actual testing on public roads, or, most impressive of all, when the technology is available for consumer purchase.

I would like to use Rodney Brooks, so it doesn't come from my mouth, but I happen to agree. His prediction is that no earlier than 2032 will a driverless taxi service in a major US city provide arbitrary pickup and drop-off locations fully autonomously. That's 14 years away. And by 2045 it will do so in multiple cities across the United States. So think about that: a lot of the engineers working in this space, a lot of the folks who are actually building these systems, agree with this idea, and that is the earliest I believe, and Rodney believes, this will happen. But, as all forecasters of technology have been wrong before, he could be wrong.

This is a plot, on the x-axis, of time throughout the 20th century, and on the y-axis the adoption rate, from 0 to 100%, of various technologies, from electricity to cars to radio to telephone and so on. As we get closer to today, the number of years it takes for a technology to go from 0 to 100% adoption is getting shorter and shorter. As a society we're better at throwing away the technology of old and accepting the technology of the new. So if a brilliant idea that solves some of the problems we're discussing comes along, it could change everything overnight. So let's talk about
different approaches to autonomy. We'll talk about sensors afterwards, we'll talk about companies, players in this space, and then we'll talk about AI, the actual algorithms, and how they can help solve some of the problems of autonomous vehicles.

Levels of autonomy. Here's a useful taxonomy of levels of autonomy: useful for initial discussion, for legal discussion, for policymaking, and for blog posts and media reports, but not useful, I would argue, for the design and engineering of the underlying intelligence and of the system viewed from a holistic perspective, the entire thing creating an experience that's safe and enjoyable. So let's go over those levels, the five, the six levels. This is presented in SAE report J3016, the most widely accepted taxonomy of autonomy. No automation at level zero. Level one and level two are increasing levels of automation: level one is cruise control, level two is adaptive cruise control plus lane keeping. Level three, I don't know what level three is. There are a lot of people who will explain that level three is conditional automation, meaning it's constrained to certain geographic locations; from an engineering perspective I'm personally a little bit confused about where that stands, and I'll try to redefine how we should view automation. Level four and level five are high and full automation. Level four is when the vehicle can drive itself fully for part of the time: there are certain areas in which it can take care of everything, no matter what, with no human interaction, input, or safekeeping required. Level five automation is the car does everything. Everything.

I would argue that those levels aren't useful for designing systems that actually work in the real world. I would argue that there are two systems. But first, a starting point: every system, to some degree, involves a human. It starts with manual control from a human: a human getting in the car and electing to do something. So that's manual control. What we're talking about is when the human engages the system, when the system is first available and the human chooses to turn it on. That's when we have two kinds of AI systems: human-centered autonomy, when the human is needed, is involved, and full autonomy, when AI is fully responsible for everything. From the legal perspective, A2, full autonomy, means the car, the designer, the AI system is liable, is responsible; and for human-centered autonomy, the human is responsible. What does this practically mean?

For human-centered autonomy, and we'll discuss examples of all of these, human interaction is necessary. The question then becomes: how often is the system available? Is it available only in traffic conditions, so bumper-to-bumper traffic? Is it available on the highway? Is it sensor-based, like in the Tesla vehicle, meaning that based on the visual characteristics of the scene the vehicle is confident enough to make perception and control decisions? The other factor, not discussed enough, and imprecisely discussed when it is, is the number of seconds given to the driver, not guaranteed but provided as a sort of feature, to take over. In the Tesla vehicle, and in all vehicles on the road today, that time is zero: zero seconds are guaranteed, zero seconds are provided. There's some room, sometimes hundreds of milliseconds, sometimes multiple seconds, but really there's no standard for how many seconds you get for the system to say wake up, take control. Then there's tele-operation, something that some of the companies will mention or are playing with, where a human being is involved remotely, controlling the vehicle remotely, able to take over control of the vehicle when the person inside is not able to: support by a human that's not inside the car. That's a very interesting idea to explore.

But on the human-centered autonomy side, all of those features are not required, they're not guaranteed. The human driver, the human inside the car, is always responsible at the end of the day; they must pay attention to the degree required to take over when the system fails. And under this level of autonomy, the system will fail at some point. That is the point, that is the collaboration between human and robot: the system will fail, and the human has to catch it when it does.

And then full autonomy is when AI is fully responsible. Now, as we'll see when we present some companies, on the marketing and PR side of things they might present that there are significant degrees of autonomy; if they're talking about L3 or L4 or L5, you have to read between the lines. You're not allowed to have tele-operation: if a human is remotely operating the vehicle, a human is still in the loop, a human is still involved, and it's still a human-centered autonomy system. You don't get the 10-second rule, where just because you give the driver 10 seconds to take control, that somehow removes liability for you. If you say, that's it, as an AI system I can't resolve, can't deal with, can't control the vehicle in this situation, and you have 10 seconds to take over, that's not good enough. The driver might be sleeping, that driver may have had a heart attack; they may not be able to control the vehicle. Fully autonomous systems must find safe harbor. They must get you, full stop, from point A to point B; that point B might be your desired destination or it might be a safe parking lot, but it has to bring you to a safe location. This is a clear definition of the two systems. And the human, of course, in our current conception of artificial intelligence and cars today, always overrides the AI system. So in the general case, the human gets to choose to take control. The AI can't take control from the human, except when danger is imminent, meaning sudden crashes, as in AEB events. We're not yet ready, as a society, for AI systems to say no, no, no, you're drunk, you can't drive. So beyond the traditional levels
from level 0 to level 5: the starting point is level 0, no automation; all cars start here. Level 1, level 2, and level 3, I would argue, fall into human-centered autonomy systems, A1, because they involve some degree of a human. Then L4 and L5, with some crossover, fall into full autonomy. Even with L4, with Waymo, as you can ask on Friday, and anyone else playing in this space, Cruise, Uber, there's very often a human driver involved. One of the huge accomplishments of Waymo over the past month, an incredible accomplishment, is that in Phoenix, Arizona, the car drove without a driver, meaning there was no safety driver, no engineer or staff member there to catch the car. A human being who doesn't work for Google or Waymo got into that car and got from point A to point B without a safety driver. That's an incredible accomplishment, and that particular trip was a fully autonomous trip. That is full autonomy: when there's no human to catch the car.

No AI presentation is good without cats. So a full autonomy, A2, system is when you do nothing but ride along; a human-centered autonomy system is when you have some control. I'm sorry, I had to.

So, the two paths for autonomous systems, A1 and A2. On the left is A1, human-centered; on the right is A2, full autonomy. Blue, from the artificial intelligence perspective, is easier, and red is harder. Easier meaning we do not have to achieve one hundred percent accuracy; harder means everything that's off of one hundred percent accuracy, no matter how small, has the potential of costing human lives and huge amounts of money for companies. We'll discuss later in the lecture the algorithms behind each of these methods on the left and the right, but this summarizes the two approaches.

Localization and mapping, for the car to determine where it's located: for human-centered autonomy, it's easy. It still has to do the perception, it has to localize itself within the lane, it has to find all the neighboring pedestrians and vehicles in order to control the vehicle to some degree, but because a human is there, it doesn't have to do so perfectly. When it fails, a human is there to catch it. Scene understanding: perceiving everything in the environment from camera, LiDAR, radar, ultrasonic. The planning of the vehicle, whether it's just staying within the lane, or, for adaptive cruise control, controlling the longitudinal movement of the vehicle, or changing lanes, as in the Tesla Autopilot or higher degrees of automation: all of those movement-planning decisions can be made autonomously when the human is there to catch the system. It's easier because you're allowed to be, rarely, wrong. The hard part is getting the human-robot interaction piece right. That's next Wednesday's lecture, where we'll discuss how deep learning can be used first to perceive everything about the driver and second to interact with the driver. That part is hard because you can't screw it up. You have to make sure you help the driver know where your flaws are so they can take over. If the driver is not paying attention, you have to bring their attention back to the road, back to the interaction. You have to get that piece right, because for a flawed system, one that's rarely flawed, the rarity is in fact the challenge; you have to get the interaction right. And then the final piece: communication. A fully autonomous vehicle must communicate extremely well with the external world, with the pedestrians, the jaywalkers, the humans
in this world, the cyclists. That communication piece, at least the part of it that's required for a safe and enjoyable driving experience, is extremely difficult. A Waymo vehicle, I wish them luck if they come to Boston, getting from point A to point B, because pedestrians will take advantage. A vehicle must assert itself in order to navigate Boston streets, and that assertion is communication. That piece is extremely difficult. For a Tesla vehicle, for a human-centered autonomy vehicle, L2, L3, the way you deal with Boston pedestrians is you take over, roll down the window, yell something, and then speed up. Getting an artificial intelligence system to actually be able to accomplish something like that, as we'll discuss on the ethics side and the engineering side, is extremely difficult.

That said, most of the literature in the human factors field, in the autonomous vehicle field, anyone who has studied autonomy in aviation and in vehicles, is extremely skeptical about the human-centered approach. They think it's deeply irresponsible. It's deeply irresponsible because, as argued, when you give human beings a technology that will take control part of the time, they will get lazy, they will take advantage of that technology, they will overtrust that technology, they'll assume it'll work perfectly, always. This idea, extended further and further, means that the better the system gets, the better the car gets at driving itself, the more the humans will sit back and be completely distracted, and they will not be able to re-engage themselves in order to safely catch the system when it fails.

This is Chris Urmson, the founder of the Google self-driving car program and now a co-founder, the other co-founder being the speaker of this class next Friday, Sterling Anderson, of a startup called Aurora. He was one of the big proponents, or I should say opponents, of the idea that human-centered autonomy could work. They tried it. He has publicly spoken about the fact that Google, in the early self-driving car program, tried shared autonomy, tried L2, and it failed, because their engineers, the people driving their vehicles, fell asleep. That's the belief people have, and we'll talk about why that may not be true. There's a fascinating truth in the way human beings can interact with artificial intelligence systems that may work in this case; as I mentioned, it's the human-robot interaction, building that deep connection between human and machine, of understanding, of communication.

This is what we believe happens. There are a lot of videos like this; it's fun, but it's also representative of what society believes happens when automation is allowed to enter the human experience of driving, where human life is at stake: that you can become completely disengaged. It's kind of a natural thing to think, but the question is, does this actually happen?

What actually happens on public roads? The amazing thing that people don't often talk about is that there are hundreds of thousands of vehicles on the road today equipped with Autopilot, Tesla Autopilot, that have a significant degree of autonomy. That's data, that's information, so we can answer the question: what actually happens? Many of the people on this team have instrumented 25 vehicles, 21 of which are Tesla Autopilot vehicles, recording everything about the driver: two HD cameras on the driver, one camera on the external roadway, and everything about the car, including audio, the state, pulling everything from the CAN bus, the kinematics of the vehicle, IMU, GPS. All of that information, now over 300,000 miles, over 5 billion video frames, all of it, as we'll talk about, analyzed with computer vision. You extract from that video of the driver everything they're doing: the level of distraction, the allocation of attention, drowsiness, emotional states, hands on wheel, hands off wheel, body pose, activity, smartphone usage. All of these factors, all of these things that you would think would fall apart when you start letting autonomy into your life. We'll talk about what the initial reality is, and it should be inspiring and thought-provoking.

As I said, three cameras, a single-board computer recording all the data, over a thousand machines in Holyoke doing the distributed computation, running the deep learning algorithms I've mentioned on these five-plus billion video frames, going from the raw data to actionable, useful information. The slides are up online if you'd like to look through them; I'll fly through some of them. This is a video of one of the thousands of trips we have in Autopilot in our data: a car driving autonomously a large fraction of the time, on highways, from here to California, from here to Chicago, to Florida, and all across the United States. We take that data and use supervised and semi-supervised learning algorithms. The number of frames here is huge: for those who work in computer vision, five billion frames is several orders of magnitude larger than any dataset that people are actively annotating and working with in computer vision. We want to use that data to understand the behavior of what people are actually doing in the cars, and we want to train the algorithms that do perception and control.

A quick summary: over 300,000 miles, 25 vehicles, and the colors are true to the actual colors of the vehicles, little fun fact: Tesla Model X, Model S, and now Model 3. 500-plus miles a day and growing; most days in 2018 are now over a thousand miles a day. This is a quick GPS map: in red is manual driving across the Boston area; in blue, cyan, is autonomous driving. This gives you a sense of the scope of this data. It's a huge number of miles with automated driving, several orders of magnitude larger than what Waymo is doing, what Cruise is doing, and what Uber is doing. The fraction of miles driven in this data with Autopilot, confirming what Elon Musk has stated, is 33%: 33% of miles are driven autonomously. This is a remarkable number. For those of you who drive, and for those of you who are familiar with these technologies, that is a remarkable adoption rate. That 33% of the miles are driven in Autopilot means these drivers are getting use out of the system; it's working for them. That's an incredible number. It's also incredible because, based on the decades of literature from aviation to automation in vehicles, to Chris Urmson and Waymo, the
belief is that such high numbers are likely to lead to crashes, to fatalities, to, at the very least, highly irresponsible behavior, drivers overtrusting the systems and getting in trouble. We can run the glance classification algorithms, and again, this is for next Wednesday's discussion of the actual algorithm: it's the algorithm that tells you the region the driver is looking at, comparing road, instrument cluster, left, rearview, center stack, and right. Does the allocation of glance change with Autopilot versus manual driving? It does not appear to, in any significant, noticeable way, meaning you don't start playing chess, you don't get in the back seat to sleep, you don't start texting on your smartphone or watching a movie, at least in this dataset. There's promise here for the human-centered approach. (A minimal sketch of what such a glance-region classifier might look like appears at the end of this passage.)

The observation, to summarize this particular data, is that people are using it a lot: the percentage of miles, the percentage of hours, is incredibly high, at least relative to what would be expected from these systems, and, given that, there are no crashes and no near-crashes in Autopilot in this data. The road type is mostly highway, traveling at high speeds. For mental engagement, we looked at 8,000 transfers of control from machine to human: human beings taking control of the vehicle, saying, you know what, I'm going to take control now, I'm not comfortable with the situation, for whatever reason, either not comfortable or electing to do something the vehicle is not able to, like turning off the highway, making a right or left turn, stopping for a stop sign, these kinds of things. Physical engagement, as I said: glance remains the same.

What do we take from this? It's something I'd like to really emphasize as we talk about autonomous vehicles in this class. The guest speakers are all on the other side, so I'm representing the human-centered side; all our speakers are focused on the full autonomy side, because that's the side roboticists know how to solve, that's the fascinating algorithm-nerd side, and that's the side I love as well. It's just that my belief stands that solving the perception-control problem is extremely difficult and two, three decades away. So in the meantime we have to utilize the human-robot interaction to actually bring these AI systems onto the road to successfully operate. And the way we do that, counterintuitively, is we have to let the artificial intelligence systems reveal their flaws. One of the most endearing things human beings can do with each other, with friends, is reveal their flaws to each other. Now, from an automotive perspective, from a company perspective, it's perhaps not appealing for an AI system to reveal what it sees about the world and what it doesn't see, where it succeeds and where it fails, but that is perhaps exactly what it needs to do. In the case of Autopilot, the very limited but, I believe, successful way it is currently doing that is by allowing you to use Autopilot basically anywhere. So what people are doing is trying to engage, to turn on, Autopilot in places where they really shouldn't: rural roads, curvy, with terrible road markings, in heavy rain, in snow, with lots of cars driving at high speeds all around. They turn Autopilot on to understand, to experience, the limitations of the system, to interact. That human-robot interaction is tactile: by turning it on and seeing, is it going to work here? How is it going to fail?
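To make the glance-region idea mentioned above a bit more concrete, here is a minimal, illustrative sketch of a classifier over the six regions named in the lecture (road, instrument cluster, left, rearview, center stack, right). This is not the actual pipeline used on this dataset; the architecture, input size, and grayscale face crops are assumptions chosen only to keep the example tiny and runnable.

```python
# Illustrative sketch only: a small CNN for glance-region classification over the
# six regions mentioned in the lecture. Architecture and input size are assumptions.
import torch
import torch.nn as nn

GLANCE_REGIONS = ["road", "instrument_cluster", "left", "rearview", "center_stack", "right"]

class GlanceNet(nn.Module):
    def __init__(self, num_classes=len(GLANCE_REGIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):                      # x: (batch, 1, 64, 64) grayscale face crops
        h = self.features(x)
        return self.classifier(h.flatten(1))   # logits over the six glance regions

model = GlanceNet()
frame = torch.randn(1, 1, 64, 64)              # stand-in for one annotated video frame
print(model(frame).softmax(dim=1))             # predicted distribution over regions
```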

And the human is always there to catch it. That interaction, that communication, that intimate understanding is what creates successful integration of AI in the car before we're able to solve the full autonomy puzzle: learn the limitations by exploring. It starts with this guy and hundreds of others, if you search YouTube for "first time with autopilot": the amazing experience of direct transfer of control of your life to an artificial intelligence system, in this case giving control to the Tesla Autopilot system. This is why, in the human-centered camp of autonomy, I believe that autonomous vehicles can be viewed as personal robots with which you build a relationship, where the human-robot interaction is the key problem, not the perception and control, and where the flaws of both humans and machines must be clearly communicated and perceived. Perceived, because we use computer vision algorithms to detect everything about the human; and communicated, because on the displays of the car, or even through voice, it has to be able to reveal when it doesn't see different aspects of the scene. With the human-centered approach, we can then focus on the left, the perception and control side, perceiving everything about the external environment and controlling the vehicle, without having to worry about being 99.99999% correct, approaching a hundred percent correct, because in the cases where it's extremely difficult we can let the human catch the system; we can reveal the flaws and let the human take over when the system can't.

So let's get to the sensors, the sources of raw data we get to work with. There are three: cameras, so image sensors, RGB, infrared, visual data; radar and ultrasonic; and LiDAR. Let's discuss what these sensors really are, their strengths and weaknesses, and how they can be integrated together through sensor fusion.

Radar is the old trusted friend, the sensor that's commonly available in most vehicles that have any degree of autonomy. On the left is a visualization of the kind of data high-resolution radar is able to extract. It's cheap, both radar, which works with electromagnetic waves, and ultrasonic, which works with sound waves: sending a wave, letting it bounce off the obstacles, and, knowing the speed of that wave, calculating the distance to the obstacle (see the small worked example at the end of this passage). It does extremely well in challenging weather, rain, snow. The downside is low resolution compared to the other sensors we'll discuss, but it is the most reliable and most used sensor in the automotive industry today, and it's the one that, in sensor fusion, is always there.

LiDAR, visualized on the right: the downside is that it's expensive, but it produces extremely accurate depth information and a high-resolution map of the environment with 360 degrees of visibility. It has some of the big strengths of radar in terms of reliability, but with much higher resolution and accuracy. The downside is cost. Here's a quick visualization comparing the two kinds of information you get to work with: the density and quality of the information with LiDAR is much higher, and LiDAR has been the successful source of ground truth, the reliable sensor relied upon in vehicles that don't care about cost.

And camera: the sensor that most people here should be passionate about, because machine learning, deep learning, has the most ability to have a significant impact there.
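Before getting to the why of cameras, a quick aside on the wave-based sensors above: the range calculation described there is a simple time-of-flight computation. A minimal sketch, where the echo times are made-up illustrative values rather than real sensor readings:

```python
# Minimal time-of-flight range sketch for the wave-based sensors described above.
# The echo times below are made-up illustrative values, not real sensor readings.
SPEED_OF_SOUND = 343.0      # m/s, ultrasonic (in air, roughly room temperature)
SPEED_OF_LIGHT = 3.0e8      # m/s, radar (and LiDAR)

def range_from_echo(round_trip_time_s, wave_speed):
    """Distance to the obstacle: the wave travels out and back, hence the divide by 2."""
    return wave_speed * round_trip_time_s / 2.0

print(range_from_echo(0.012, SPEED_OF_SOUND))   # ~2.06 m  (ultrasonic, parking-assist range)
print(range_from_echo(1.0e-6, SPEED_OF_LIGHT))  # ~150 m   (radar, highway range)
```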

Why? First, it's cheap, so it's everywhere. Second, it's the highest resolution, so there's the most densely packed information, which means there is information that can be learned from and inferred to interpret the external scene. That's why it's the best source of data for understanding the scene. The other reason it's awesome for deep learning is the hugeness of the data involved: there are many orders of magnitude more driving data available in camera, visible light or infrared, than in LiDAR. And our world is designed for visible light; our eyes work in similar ways to cameras, at least crudely, so the source data is similar. The lane markings, the traffic signs, the traffic lights, the other vehicles, the pedestrians, all operate with each other in this RGB space in terms of visual characteristics. The downside is that cameras are bad at depth estimation: it's noisy and difficult, even with stereo vision, to estimate depth relative to LiDAR. They're not good in extreme weather, and, at least for visible-light cameras, they're not good at night.

So let's compare the ranges. Here's a plot, with range in meters on the x-axis and acuity on the y-axis, with ultrasonic, LiDAR, radar, and camera, the passive visual sensor, plotted. The range of cameras is the greatest. We're going to look at several different conditions; this one is for clear, well-lit conditions, so during the day, no rain, no fog. LiDAR and radar have a smaller range, under 200 meters, and ultrasonic sensors, used mostly for park assistance and blind-spot warning, have terrible range: they're designed for high-resolution distance estimation at extremely close distances. Here, a little bit small, up top is the clear, well-lit plot we just looked at; on the bottom left is clear, dark conditions, so a clear night, no rain; and on the bottom right is heavy rain, snow, or fog. Vision falls apart, in terms of range and accuracy, under dark conditions and in rain, snow, or fog. Radar, our old trusted friend, stays strong: the same range, just under 200 meters, at the same acuity, and the same with sonar. LiDAR works well at night, but it does not do well with rain, fog, or snow, one of its biggest downsides other than cost.

Here's another interesting way to visualize this that I think is productive for our discussion of which sensor will win out: is it the Elon Musk prediction of camera, or the Waymo prediction of LiDAR? In this kind of plot, which we'll look at for every sensor, the greater the radius of the blue region, the more successful that sensor is at the feature on that axis, with a bunch of features lined up around the circle. For LiDAR: range is pretty good, not great but pretty good; resolution is also pretty good; it works in the dark; it works in bright light; but it falls apart in snow; it does not provide color, texture, or contrast information; it's able to detect speed; but the sensor size, at least to date, is huge, the sensor cost, at least to date, is extremely expensive, and it doesn't do well in proximity, where ultrasonic shines. Speaking of which, ultrasonic, same kind of plot: it does well in proximity detection, it's the cheapest of the four, the sensor size can be tiny, it works in snow, fog, and rain, but its resolution is terrible, its range is nearly non-existent, and it's not able to detect speed. That's where radar steps up: it's able to detect speed, it's also cheap, it's also small, but the resolution is very low, and, just like LiDAR, it's not able to provide texture or color information. Camera: the sensor cost is cheap, the sensor size is small, it's not good up close, in proximity, but the range is the longest and the resolution is the best of all of them. It doesn't work in the dark; it works in bright light, but not always: one of the biggest downfalls of camera sensors is sensitivity to lighting variation. It doesn't work in snow, fog, or rain, so it suffers much like LiDAR there, but it provides rich, interesting textural information, the very kind that deep learning needs to make sense of this world.

So let's look at the cheap sensors: ultrasonic, radar, and cameras. One approach is putting a bunch of those in a car and fusing them together; the cost there is low. One of the nice ways to visualize this, using the same technique, is to fuse them together, on the bottom, which gives you a sense of them working together to complement each other's strengths. And the question is whether camera or LiDAR will win out, for partial autonomy or full autonomy: on the bottom is this kind of visualization for a LiDAR sensor, and on top for fused radar, ultrasonic, and camera. At least under these considerations, the fusion of the cheap sensors can do as well as LiDAR. Now the open question is whether LiDAR, as the technology develops, can become cheap and its range can increase, because then LiDAR can win out. Solid-state LiDAR, and a lot of developments from a lot of LiDAR startups, promise to decrease the cost and increase the range of these sensors. But for now, on the camera front, we plow along: the annotated driving data grows exponentially, more and more people are beginning to annotate and study the particular driving perception and control problems, and the very algorithms, the supervised, semi-supervised, and generative networks that we use to work with this data, are improving. So it's a race, and of course radar and ultrasonic are always there to help.
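To make "complementing each other's strengths" slightly more concrete, here is a minimal, illustrative sketch of one of the simplest fusion ideas: weighting two noisy range estimates of the same obstacle by their confidence (inverse variance). The sensors, numbers, and noise levels are invented for illustration; real automotive fusion stacks, typically Kalman-filter-based trackers, are far more involved.

```python
# Minimal illustrative sensor-fusion sketch: inverse-variance weighting of two noisy
# range estimates for the same obstacle. Numbers are made up for illustration only.

def fuse(measurements):
    """measurements: list of (range_m, variance_m2). Returns fused range and variance."""
    weights = [1.0 / var for _, var in measurements]
    fused_range = sum(w * r for (r, _), w in zip(measurements, weights)) / sum(weights)
    fused_var = 1.0 / sum(weights)
    return fused_range, fused_var

radar_estimate  = (48.2, 0.25)   # radar: decent range accuracy, keeps working in rain
camera_estimate = (51.0, 4.00)   # camera depth: noisier, but contributes texture/class elsewhere

r, v = fuse([radar_estimate, camera_estimate])
print(f"fused range: {r:.1f} m, variance: {v:.2f}")   # dominated by the more confident radar
```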
So, companies that are playing in this space, some of them speaking here. Waymo: in April 2017 they exited their extensive, impressive testing process and allowed the first public riders in Phoenix. In November 2017, no safety driver, so the car truly achieved full autonomy, under a lot of constraints, but it's full autonomy. It's an incredible accomplishment for a company and for an artificial intelligence system, an amazing step in the direction of full autonomy, much sooner than people would otherwise have predicted. And the miles: 4 million miles driven autonomously by November 2017, and growing quickly, growing in terms of fully autonomous driving, if I can say so cautiously, because most of those miles have a safety driver, so I would argue it's not full autonomy; but however they define full autonomy, it's 4 million miles driven. Incredible. Uber, in terms of miles, is second on that list: they had driven 2 million miles autonomously by December of last year, 2017. And then the quiet player, in terms of not making any declarations of being fully autonomous, just quietly driving in a human-centered way, L2: over 1 billion miles in Autopilot. Over 300,000 vehicles today are equipped with Autopilot technology, with the ability to control the car laterally and longitudinally, and if anyone believes the CEO of Tesla, there will be over 1 million such vehicles by the end of 2018. But no matter what, the 300,000 is an incredible number, and the 1 billion miles is an incredible number.
Autopilot was first released in September 2014, one of the first such systems on the road. And, I count myself as one of the skeptics: in October 2016, Autopilot decided to let go of the incredible work done by Mobileye, now Intel, in designing their perception-control system. They decided to let go of it completely and start from scratch, using mostly deep learning methods, the Drive PX 2 system from NVIDIA, and eight cameras. They decided to start from scratch. That's the kind of boldness, the kind of risk-taking, that can come with naivety, but in this case it worked. Incredible.

The Audi A8 system is going to be released at the end of 2018, and it's one of the first vehicles promising what they're calling L3. The definition of L3, according to Thorsten Lionheart, the head of automated driving at Audi, is that the function operates as intended if the customer turns the traffic jam pilot on. Now, this L3 system is designed only for traffic jams, bumper-to-bumper traffic under 60 kilometers an hour. If the customer turns the traffic jam pilot on and uses it as intended, and the car was in control at the time of the accident, the driver goes to the insurance company, the insurance company compensates the victims of the accident, and in the aftermath they come to us and we will pay them. So that means the car is liable. The problem is, under the definitions of L2 and L3, perhaps there is some truth to this being an L3 system, but the important thing here is that it's nevertheless deeply and fundamentally human-centered, because, as you see in this demonstration video with a reporter, the car, for a poorly understood reason, transfers control to the driver, says that's it, I can't take care of this situation, you take control.

"How much time do you have, in terms of seconds, before you really need to take over?" "Well, this is the new thing about L3. With L3, the system gives the driver the prompt to take over vehicle control ahead of time, which in this case is up to 10 seconds. So if the traffic jam situation clears up, or any failure in the system occurs, anything you might think of, the system still needs to be able to drive automatically, because the driver has this time to take over. You might ask what is new about this, why Audi is saying this is the first L3 system worldwide on the market. When talking about these levels of automation, there's a classification which starts at level 0, which is basically the driver doing everything, no assistance, nothing, and then it gradually moves into partial automation. When we're talking about assistance functions like lane keeping and distance keeping, we're talking about level 2 assistance functions, meaning that the driver is obliged to permanently monitor the traffic situation, to keep their hands on the wheel even though there's support and assistance, and to intervene immediately if anything is not quite right. You know that from lane assistance systems: when the steering is not perfectly in the right lane, you have to intervene and correct immediately. And that is the main difference now: we get a takeover request."

So let's talk about what that means. This is still a human-centered system; it still struggles with, it still must solve, the human-robot interaction problem. And there are many others playing in the space. On the full autonomy side: Waymo, Uber, GM Cruise, nuTonomy, whose CTO will speak here on Tuesday, Optimus
Ride, Zenuity, Voyage, whose CEO will speak here next Thursday, and Aurora, not listed, whose founder will speak here next Friday. On the human-centered autonomy side, the reason I am speaking about it so much today is that we don't have any speakers for it; I'm the speaker. Tesla Autopilot has for several years now been doing incredible work on that side. We're also working with Volvo Pilot Assist, with a lot of different, more conservative, interesting approaches there; the Audi Traffic Jam Assist, as I mentioned, with the A8 being released at the end of this year; the Mercedes Drive Pilot in the E-Class; an interesting vehicle that I got to drive quite a bit, the Cadillac Super Cruise in the CT6, which is very much constrained geographically to highway driving; and the loudest, proudest of them all, George Hotz and the Comma.ai OpenPilot. I'll just leave that there. So where can AI help?

We'll get into the details in the coming lectures on each individual component; here I'd like to give some examples of the key areas, the problem spaces, where we can use machine learning to learn from data. The first is localization and mapping: being able to localize yourself in space, the very first question a robot needs to answer: where am I?

Scene understanding: taking the scene in and interpreting it, detecting all the entities in the scene and the class of those entities, in order to then do movement planning, to move around those entities. And finally, driver state, the essential element for human-robot interaction: perceiving everything about the driver, and everything about the pedestrians, the cyclists, and the cars outside, the human element of those, the human perception side. So, first, the where am I?

Visual odometry, using camera sensors, which is once again where the vision sensor is most amenable to learning-based approaches: visual odometry is using the camera to localize yourself, to answer the where-am-I question. The traditional approaches, as in SLAM, detect features in the scene and track them through time, from frame to frame, and from the movement of those features, thousands of features tracked, estimate the location and orientation of the vehicle, or of the camera. Those methods, with stereo vision, first require taking the two camera streams, undistorting them, computing a disparity map from the different perspectives of the two cameras, computing the matching between the two, then feature detection, SIFT and FAST or any of the non-deep-learning methods of extracting strong, detectable features that can be tracked from frame to frame, tracking those features, and estimating the trajectory and orientation of the camera. That's the traditional approach to visual odometry. In recent years, since 2015, but with the most success in the last year, there have been end-to-end deep learning approaches, with either stereo or monocular cameras; DeepVO is one of the most successful. The end-to-end method takes a sequence of images, extracts essential features from each image with a CNN, and then uses an RNN, a recurrent neural network, to track the trajectory, the pose, of the camera over time: image to pose, end to end. (A minimal sketch of this CNN-plus-RNN idea appears at the end of this passage.) Here's a visualization on the KITTI dataset using DeepVO, taking the video up on the top right as input and estimating the position of the vehicle: in red is the estimate, again end-to-end with a CNN and RNN, and in blue is the ground truth in the KITTI dataset. So this removes a lot of the modular parts of SLAM, of visual odometry, and makes it end to end, which means it's learnable, which means it gets better with data. That's huge. And that's vision alone. This is one of the exciting opportunities for people working in AI: the ability to use a single sensor, and perhaps the most inspiring one, because that sensor is similar to our own, the sensor we ourselves use, our eyes, as the primary sensor to control a vehicle. That's really exciting, and the fact that vision, visible light, is the most amenable to deep learning approaches makes this a particularly exciting area for deep learning research.

Scene understanding: of course, we could do a thousand slides on this. Traditionally, object detection, pedestrians, vehicles, was done with a bunch of different types of classifiers and feature extractors, Haar-like features, and deep learning has basically taken over and dominated every aspect of scene interpretation: perception, understanding, tracking, recognition, classification, detection problems. And audio, can't forget audio: we can use audio as a source of information, whether that's detecting honks or, in this case, using the audio of the tires, microphones near the tires, visualized there as a spectrogram of the incoming audio. For those of you who have a particularly tuned ear, you can listen to the different audio coming in here: wet road and dry road after the rain, so there's no rain but the road is nevertheless wet. Detecting that is extremely important for vehicles, because when traction is reduced they have poor control at the tire-to-road-surface contact, and being able to detect that from just audio is a very interesting approach.
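Going back to the end-to-end visual odometry idea above, here is a minimal sketch of the CNN-plus-RNN structure: a small convolutional network extracts features per frame, a recurrent network integrates them over time, and a linear head regresses a relative pose per frame. This is not the published DeepVO architecture; the layer sizes, input resolution, and 6-DoF output head are assumptions chosen for illustration.

```python
# Illustrative sketch of the end-to-end visual-odometry idea (DeepVO-style CNN + RNN).
# Not the published DeepVO architecture; shapes and layer sizes are assumptions.
import torch
import torch.nn as nn

class TinyDeepVO(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN: extracts features from each frame of the video sequence
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # RNN: integrates per-frame features over time into a trajectory estimate
        self.rnn = nn.LSTM(input_size=32 * 4 * 4, hidden_size=128, batch_first=True)
        # Head: 6-DoF relative pose per frame (x, y, z, roll, pitch, yaw)
        self.pose = nn.Linear(128, 6)

    def forward(self, video):                    # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))    # (b*t, 32, 4, 4)
        feats = feats.flatten(1).view(b, t, -1)  # (b, t, 512)
        hidden, _ = self.rnn(feats)              # (b, t, 128)
        return self.pose(hidden)                 # (b, t, 6) relative poses

model = TinyDeepVO()
clip = torch.randn(1, 10, 3, 64, 64)             # stand-in for 10 frames of a driving clip
print(model(clip).shape)                         # torch.Size([1, 10, 6])
```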
Finally, or not finally, next on the perception and control side is movement planning: getting from point A to point B. The traditional approaches are optimization-based: you formalize the problem, reduce it to a form that's amenable to optimization. There are a lot of assumptions that need to be made, but once those assumptions are made, you're able to generate thousands or millions of possible trajectories and have an objective function that determines which of the trajectories to take (a minimal sketch of this sample-and-score idea appears at the end of this passage). Here's a race car optimizing how to take a turn at high speed. With deep learning, reinforcement learning, the application of neural networks to reinforcement learning, is particularly exciting for both the control and the planning side. That's where two of the competitions we're doing in this class come into play: the simplistic two-dimensional world of DeepTraffic, and the high-speed, high-risk world of DeepCrash. We'll explore those in tomorrow's lectures on deep reinforcement learning.

And finally, driver state: detecting everything about the driver and then interacting with them. On the left, in green, are the easier problems; on the right, in red, are the harder problems, in terms of perception, in terms of how amenable they are to deep learning methods. Body pose estimation is a very well-studied problem; we have extremely good detectors for estimating the pose: the hands, the elbows, the shoulders, every visible aspect of the body. Head pose, the orientation of the head: we're extremely good at that. As we get smaller and smaller in size, blink rate, blink duration, eye pose, and blink dynamics start getting more and more difficult. All of these metrics are extremely important for detecting things like drowsiness, or as components of detecting emotion, or where people are looking. In driving, where your head is turned is not necessarily where you're looking: in regular, non-driving life, when you look somewhere you usually turn your head along with your eyes; in driving, your head often stays still or moves very subtly, while your eyes do a lot more of the moving. It's the kind of effect we describe as the lizard-owl effect: some fraction of people, a small fraction, are owls, meaning they move their head a lot, and most people are lizards, moving their eyes to allocate their attention. The problem with eyes, from the computer vision perspective, is that they're much harder to detect; under lighting variation, in real-world conditions, they get harder, and we'll discuss how to deal with that. Of course, that's where deep learning steps up and really helps with real-world data. Cognitive load, we'll discuss as well: estimating the cognitive load of the driver.

To give a quick clip: this is the driver glance we've seen before. The most important problem on the driver-state side is determining whether the driver is looking on-road or off-road. It's the dumbest, simplest, but most important aspect: are they in the seat and looking at the road, or are they not? That's driver glance classification: not estimating the XYZ geometric orientation of where they're looking, but an actual binary classification, on-road or off-road. Body pose estimation: determining if the hands are on the wheel or not, determining if the body alignment is standard, is good, for seatbelt safety.
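Referring back to the movement-planning discussion above, here is a minimal sketch of the sample-and-score idea: generate a handful of candidate trajectories and pick the one minimizing a hand-made objective that trades off lane keeping against obstacle proximity. The costs, weights, and obstacle position are invented for illustration; this is not any production planner.

```python
# Minimal, illustrative sample-and-score planning sketch (not a production planner).
# Generate a few candidate lateral offsets, roll each out as a simple trajectory, and
# score with a hand-made cost: stay near lane center, avoid a single obstacle.

LANE_CENTER = 0.0          # lateral position of lane center, meters
OBSTACLE = (20.0, 0.5)     # (longitudinal, lateral) position of an obstacle, meters

def rollout(lateral_offset, horizon_m=40.0, step_m=2.0):
    """Candidate trajectory: drift linearly to lateral_offset over the horizon."""
    n = int(horizon_m / step_m)
    return [(i * step_m, lateral_offset * (i / n)) for i in range(1, n + 1)]

def cost(trajectory):
    lane_cost = sum((y - LANE_CENTER) ** 2 for x, y in trajectory)
    obstacle_cost = sum(
        10.0 / (0.1 + (x - OBSTACLE[0]) ** 2 + (y - OBSTACLE[1]) ** 2) for x, y in trajectory
    )
    return lane_cost + obstacle_cost

candidates = [rollout(offset) for offset in (-2.0, -1.0, 0.0, 1.0, 2.0)]
best = min(candidates, key=cost)
print("chosen final lateral offset:", best[-1][1])   # -1.0: nudges away from the obstacle at +0.5 m
```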
This is one of the important things for autonomous vehicles: if there's an imminent danger, the driver should be asked to return to a position that is safe for them in case of a crash. Driver emotion: on the top is a satisfied driver, on the bottom is a frustrated driver, self-reported, and this is with voice-based navigation. One of the biggest sources of frustration for people in cars is voice-based navigation, trying to tell an artificial intelligence system, using your voice alone, where you would like to go: a huge source of frustration. One of the interesting things in our large dataset, from the affective computing perspective, is determining which features are most commonly associated with frustrated voice-based interaction, and that's a smile, shown there. It's the counterintuitive notion that emotion, in particular emotion in the car, is very context-dependent: smiling is not necessarily a sign of happiness, and the stoic, bored look of the driver up top is not necessarily a reflection of unhappiness. He is indeed a 10 out of 10 in terms of satisfaction with the experience, if he has ever been satisfied with anything; that happens to be Dan Brown, one of the amazing engineers on our team. Cognitive load: estimating, from the eye region and sequences of images, with 3D convolutional neural networks taking in a sequence of images of the eye and looking at the blink dynamics and the eye position, the cognitive load, from zero to two: how deep in thought you are.
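Here is a minimal sketch of the 3D-convolutional idea just described: a tiny network over a short sequence of eye-region crops, predicting cognitive load as one of three classes (0, 1, 2). This is not the actual model from the lecture; the input size, frame count, and layer sizes are assumptions for illustration.

```python
# Illustrative sketch of a 3D CNN over eye-region image sequences for cognitive load.
# Not the actual MIT model; input size, frame count, and layer sizes are assumptions.
import torch
import torch.nn as nn

class EyeLoadNet(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            # input: (batch, 1, frames=16, height=32, width=32), grayscale eye crops
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),   # -> 8 x 16 x 16
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),  # -> 4 x 8 x 8
        )
        self.head = nn.Linear(16 * 4 * 8 * 8, num_classes)

    def forward(self, clip):
        h = self.conv(clip)                 # spatio-temporal features (blink dynamics, eye position)
        return self.head(h.flatten(1))      # logits for cognitive load class 0, 1, or 2

model = EyeLoadNet()
eye_clip = torch.randn(1, 1, 16, 32, 32)    # stand-in 16-frame eye-region sequence
print(model(eye_clip).argmax(dim=1))        # predicted cognitive-load class
```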

Two paths to an autonomous future. Again, I would like, maybe for the last time but probably not, to argue for the one on the left, because our brilliant, much smarter than me, guest speakers will argue for the one on the right. The human-centered approach allows us to solve the 99%-accuracy versions of localization, scene understanding, and movement planning. Those are the problems we're taking on in this class: the scene segmentation we'll talk about on Thursday, the control we'll talk about tomorrow, and the driver state we'll talk about next Wednesday. These problems can be solved with deep learning today. The problems on the right, solving them to close to 100% accuracy, are extremely difficult and maybe decades away, because for full autonomy to be here, we have to solve this situation, and I've shown this many times, the Arc de Triomphe. We have to solve this situation; I'll give you just a few examples. What do you do? You have to solve this situation, a sort of subtler situation: here is a busy crosswalk, where no autonomous vehicle will ever have a hope of getting through unless it asserts itself, and there are a couple of vehicles here that kind of nudge themselves through, or at least, when they have the right of way, don't necessarily nudge but don't hesitate when a pedestrian is present, an ambulance flying by. Even though, if you used a trajectory-based pedestrian intent modeling algorithm to predict the momentum of the pedestrian, to estimate where they could possibly go, the autonomous vehicle would stop, these vehicles don't stop: they assert themselves, they move forward. Now, for a full autonomy system, and this may not be the last time I show this video, because it's taking full control, it's following a reward function, an objective function, all of the ethical and AI problems that arise, like this CoastRunners problem, will arise. So we have to solve those problems; we have to design that objective function. With that, I'd like to thank you and encourage you to come tomorrow, because you get a chance to participate in DeepTraffic, the deep reinforcement learning competition. Thank you very much. (applause)