Karl Iagnemma & Oscar Beijbom (Aptiv Autonomous Mobility)

00:00:00.000 | All right, welcome back to 6S094,

00:00:03.440 | Deep Learning for Self-Driving Cars.

00:00:05.440 | Today we have Carl and Yema,

00:00:07.860 | and Oscar Bayboom from Aptiv.

00:00:12.160 | Carl is the president of Aptiv Autonomous Mobility,

00:00:15.240 | where Oscar is the machine learning lead.

00:00:18.080 | Carl founded Neutonomy, as many of you know, in 2013.

00:00:22.280 | It's a Boston-based autonomous vehicle company,

00:00:25.960 | and Neutonomy was acquired by Aptiv in 2017,

00:00:29.160 | and now is part of Aptiv.

00:00:31.040 | Carl and team are one of the leaders

00:00:33.000 | in autonomous vehicle development and deployment,

00:00:36.080 | with cars on roads all over the United States,

00:00:38.120 | several sites.

00:00:39.440 | But most importantly, Carl is MIT through and through,

00:00:43.180 | as also some of you may know, getting his PhD here.

00:00:46.480 | He led a robotics group here

00:00:48.640 | as a research scientist for many years.

00:00:50.720 | So it's really a pleasure to have both Carl

00:00:54.280 | and Oscar with us today.

00:00:55.400 | Please give them a warm welcome.

00:00:57.200 | (audience applauding)

00:01:00.360 | - All right, thanks, Lex.

00:01:02.760 | Yeah, very glad to be back at MIT.

00:01:05.160 | Very impressed that you guys are here during IAP.

00:01:08.040 | My course load during IAP was usually ice skating,

00:01:13.080 | and sometimes there was a wine tasting course.

00:01:15.880 | This was now almost 20 years ago,

00:01:17.520 | and that was pretty much it.

00:01:18.800 | That's where the academic work stopped.

00:01:20.720 | So you guys are here to learn something,

00:01:22.660 | so I'm gonna do my best and try something radical, actually.

00:01:25.520 | Since I'm president now of Aptiv Autonomous Driving,

00:01:28.160 | I'm not allowed to talk about

00:01:29.040 | anything technical or interesting.

00:01:30.840 | I'm gonna flout that a little bit

00:01:32.240 | and raise some topics that we think about

00:01:35.440 | that I think are interesting questions

00:01:38.640 | to keep in the back of your mind

00:01:40.920 | as you're thinking about deep learning

00:01:42.080 | and autonomous driving.

00:01:42.940 | So I'll raise some of those questions.

00:01:44.960 | And then Oscar will actually present

00:01:46.960 | some real-life technology

00:01:49.340 | and some of the work that he has been doing.

00:01:51.580 | Oscar's our machine learning lead.

00:01:53.200 | Some of the work that he and his outstanding team

00:01:55.180 | have been doing around machine learning-based detectors

00:02:00.180 | for the perception problem.

00:02:03.560 | So let me first introduce Aptiv a little bit,

00:02:05.360 | 'cause people usually ask me, like,

00:02:08.240 | what's an Aptiv when I say I work for Aptiv?

00:02:11.000 | Aptiv's actually been around for a long time,

00:02:13.200 | but in a different form.

00:02:14.680 | Aptiv was previously Delphi Technologies,

00:02:17.200 | which was previously part of General Motors.

00:02:19.720 | So everybody's heard of General Motors.

00:02:21.120 | Some of you may have heard of Delphi,

00:02:23.480 | Aptiv spun from Delphi about 14 months ago.

00:02:28.480 | And so Aptiv's a tier one supplier.

00:02:30.800 | They're an automotive company

00:02:32.160 | that industrializes technology.

00:02:34.160 | Essentially, they take software and hardware,

00:02:37.960 | they industrialize it and put it on cars

00:02:39.880 | so it can run for many, many hundreds of thousands of miles

00:02:42.200 | without failing, which is a useful thing

00:02:44.960 | when we think about autonomous driving.

00:02:46.220 | So the themes for Aptiv,

00:02:48.080 | they develop what they say is safer, greener,

00:02:50.720 | and more connected solutions.

00:02:52.960 | Safer means safety systems, active safety,

00:02:55.520 | autonomous driving systems of the type that we're building.

00:02:58.320 | Greener, systems to enable electrification

00:03:02.440 | and kind of green vehicles.

00:03:03.720 | And then more connected connectivity solutions,

00:03:06.160 | both within the vehicle,

00:03:07.420 | transmitting data around the vehicle,

00:03:08.800 | and then externally, wireless communication.

00:03:11.680 | All of these things, as you can imagine,

00:03:14.200 | feed very, very nicely into the future

00:03:17.720 | transportation systems that the software

00:03:19.960 | will actually only be a part of.

00:03:21.600 | So Aptiv is in a really interesting spot

00:03:23.580 | when you think about the future of autonomous driving.

00:03:27.020 | And to give you a sense of scale,

00:03:29.600 | still kind of amazes me.

00:03:32.780 | The biggest my research group ever was at MIT

00:03:34.520 | was like 18, 18 people.

00:03:36.620 | Aptiv is 156,000 employees,

00:03:40.580 | so significant sized organization,

00:03:42.960 | about a $13 billion company by revenue

00:03:45.160 | in about 50 countries around the world.

00:03:47.740 | My group's about 700 people,

00:03:51.140 | so of which Oscar is one very important person.

00:03:53.640 | We're about 700 working on autonomous driving.

00:03:55.840 | We've got about 120 cars on the road

00:03:58.280 | in different countries,

00:03:59.680 | and I'll show you some examples of that.

00:04:01.480 | But first, let me take a trip down memory lane

00:04:04.880 | and show you a couple of snapshots

00:04:07.060 | about where we were not too long ago

00:04:10.400 | kind of as a community, but also me personally.

00:04:13.960 | And this will either inspire or horrify you,

00:04:16.120 | I'm not sure which.

00:04:17.880 | The fact is 2007, there were groups driving around

00:04:22.140 | with cars like running blade servers in the trunk

00:04:24.940 | that were generating so much heat,

00:04:26.580 | you had to install another air conditioner,

00:04:28.920 | which then was drawing so much power,

00:04:30.220 | you had to add another alternator,

00:04:32.060 | and then kind of rinse and repeat.

00:04:33.700 | So it wasn't a great situation.

00:04:35.740 | But people did enough algorithmically, computationally,

00:04:40.740 | to enable these cars,

00:04:43.560 | and this is the DARPA Urban Challenge

00:04:45.220 | for those of you that may be familiar,

00:04:46.740 | to enable these cars to do something useful

00:04:48.680 | and interesting on a closed course.

00:04:50.440 | And it kind of convinced enough people

00:04:53.920 | that given enough devotion of thought and resources

00:04:57.960 | that this might actually become a real thing someday.

00:05:00.860 | So I was one of those people that got convinced.

00:05:03.940 | 2010, this is now, I'm gonna crib from my co-founder Emilio

00:05:09.560 | who was a former MIT faculty member in AeroAstro.

00:05:12.320 | Emilio started up an operation in Singapore through Smart,

00:05:15.140 | who some of you have probably worked with.

00:05:16.640 | So this is some folks from Smart.

00:05:19.000 | That's James, who looks really young in that picture.

00:05:21.440 | He was one of Emilio's students

00:05:23.200 | who was basically taking a golf cart

00:05:26.200 | and turning it into an autonomous shuttle.

00:05:29.520 | It turned out to work pretty well,

00:05:31.000 | and it got people in Singapore excited,

00:05:33.300 | which in turn got us further excited.

00:05:35.300 | 2014, they did a demo where they let people of Singapore

00:05:39.020 | come and ride around in these carts in a garden,

00:05:42.240 | and that worked great over the course of a weekend.

00:05:45.800 | Around this time, we'd started Newtonomy.

00:05:48.080 | We'd actually started a commercial enterprise.

00:05:49.800 | It kind of stepped at least partly away

00:05:51.440 | from MIT at that point.

00:05:53.480 | 2015, we had cars on the road.

00:05:55.480 | This is a Mitsubishi IMEV electric vehicle.

00:05:58.300 | When we had all of our equipment in it,

00:06:00.400 | the front seat was pushed forward so far

00:06:02.080 | that me, I'm about six foot three,

00:06:04.000 | actually couldn't sit in the front seat,

00:06:05.920 | so I couldn't actually accompany people on rides.

00:06:07.880 | It wasn't very practical.

00:06:09.820 | We ended up switching cars to a Renault Zoe platform,

00:06:13.760 | which is the one you see here,

00:06:14.780 | which had a little more leg room.

00:06:16.440 | We were giving, at that point,

00:06:17.520 | open to the public rides in our cars in Singapore

00:06:21.040 | in the part of the city that we were allowed to operate in.

00:06:24.040 | It was a quick transition.

00:06:26.260 | As you can see, just even visually,

00:06:28.840 | the evolution of these systems has come a long way

00:06:31.480 | in a short time, and we're just a point example

00:06:34.340 | of this phenomenon, which is kind of, broadly speaking,

00:06:38.160 | similar across the industry.

00:06:39.720 | But 2017, we joined Aptiv, and we were excited by that

00:06:43.840 | because we, as primarily scientists and technologists,

00:06:47.520 | didn't have a great idea

00:06:48.560 | how we were gonna industrialize this technology

00:06:50.240 | and actually bring it to market

00:06:51.760 | and make it reliable and robust and make it safe,

00:06:55.120 | which is what I'm gonna talk about a little bit here today.

00:06:57.600 | So we joined Aptiv with its global footprint.

00:07:00.040 | Today, we're primarily in Pittsburgh,

00:07:02.200 | Boston, Singapore, and Vegas,

00:07:04.960 | and we've got connectivity to Aptiv's other sites

00:07:08.160 | in Shanghai and Wolfsburg.

00:07:10.160 | Let me tell you a little bit

00:07:11.000 | about what's happening in Vegas.

00:07:11.840 | I think people were here,

00:07:13.080 | when was Luke talking?

00:07:14.920 | Couple days ago, yesterday.

00:07:16.100 | So Luke from Lyft, Luke Vincent,

00:07:17.760 | probably talked a little bit about Vegas.

00:07:20.360 | Vegas is really an interesting place for us.

00:07:22.960 | We've got a big operation there,

00:07:24.080 | 130,000 square foot garage.

00:07:26.080 | We've got about 75 cars.

00:07:28.340 | We've got 30 of those cars on the Lyft network.

00:07:30.980 | So Aptiv technology,

00:07:32.200 | but connecting to the customer through Lyft.

00:07:33.720 | So if you go to Vegas and you open your Lyft app,

00:07:36.320 | it'll ask you, do you wanna take a ride in an autonomous car?

00:07:39.640 | You can opt in, you can opt out, it's up to you.

00:07:41.800 | If you opt in, there's a reasonable chance

00:07:43.520 | one of our cars will pick you up if you call for a ride.

00:07:46.520 | So anybody can do this,

00:07:48.120 | competitors, innocent bystanders,

00:07:50.600 | totally up to you, we have nothing to hide.

00:07:52.360 | Our cars are on the road 20 hours a day,

00:07:54.280 | seven days a week.

00:07:55.600 | If you take a ride, when you get out of the car,

00:07:57.600 | just like any Lyft ride,

00:07:58.800 | you gotta give us a star rating, one through five.

00:08:00.920 | And that, to us, is actually really interesting

00:08:02.720 | because it's a scaler, it's not too rich,

00:08:06.560 | but that star rating, to me,

00:08:08.520 | says something about the ride quality,

00:08:11.000 | meaning the comfort of the trip,

00:08:12.600 | the safety that you felt,

00:08:13.800 | and the efficiency of getting to where you wanted to go.

00:08:16.720 | Our star rating today is 4.95, which is pretty good.

00:08:21.120 | Key numbers, we've given, at this point,

00:08:24.280 | over 30,000 rides to more than 50,000 passengers.

00:08:27.680 | We've driven over a million miles in Vegas

00:08:30.160 | and a little bit additional, but primarily there.

00:08:34.320 | And as I mentioned, the 4.95.

00:08:37.000 | So what does it look like on the road?

00:08:38.280 | I'll show just one video today.

00:08:40.120 | I think Oscar has a few more.

00:08:41.680 | This one's actually in Singapore,

00:08:43.920 | but it's all kind of morally equivalent.

00:08:46.240 | You'll see a sped up, slightly sped up view of a run from,

00:08:51.240 | this is now probably six, seven months old,

00:08:54.200 | on the road in Singapore,

00:08:55.040 | but it's got some interesting stuff

00:08:56.800 | in a fairly typical run.

00:08:59.400 | Some of you may recognize these roads.

00:09:02.000 | We're on the wrong side of the road,

00:09:03.080 | remember, 'cause we're in Singapore.

00:09:04.600 | But to give you an example of some of the types of problems

00:09:07.460 | we have to solve on a daily basis.

00:09:10.380 | So let me run this thing.

00:09:11.660 | And you'll see as this car is cruising down the road,

00:09:16.660 | you have obstacles that we have to avoid,

00:09:20.460 | sometimes in the face of oncoming traffic.

00:09:23.340 | We've got to deal with sometimes situations

00:09:26.340 | where other road users are maybe not perfectly behaving

00:09:29.460 | by the rules.

00:09:30.300 | We've got to manage that in a natural way.

00:09:33.100 | Construction in Singapore, like everywhere else,

00:09:35.780 | is pretty ubiquitous.

00:09:37.220 | And so you have to navigate

00:09:38.500 | through these less structured environments.

00:09:40.680 | People who are sometimes doing things

00:09:44.660 | or indicating some future action,

00:09:46.660 | which you have to make inferences about,

00:09:49.220 | that can be tricky to navigate.

00:09:51.140 | So typical day, a route that any one of us as humans

00:09:55.060 | would drive through without batting an eye, no problem,

00:09:58.340 | is actually presents some really, really complex problems

00:10:02.740 | for autonomous vehicles.

00:10:04.260 | But it's the table stakes these days.

00:10:05.780 | These are the things you have to do if you want to be

00:10:07.300 | on the road, and certainly if you want to drive

00:10:09.740 | millions of miles with very few accidents,

00:10:12.340 | which is what we're doing.

00:10:13.660 | So that's an introduction to Aptiv

00:10:15.220 | and a little bit of background.

00:10:17.500 | So let me talk about, we're gonna talk about learning

00:10:20.700 | and how we think about learning

00:10:22.460 | in the context of autonomous driving.

00:10:25.260 | So there was a period a few years ago

00:10:27.060 | where I think as a community,

00:10:29.100 | people thought that we would be able to go

00:10:30.980 | from pixels to actuator commands

00:10:33.180 | with a single learned architecture,

00:10:35.540 | a single black box.

00:10:36.980 | I'll say, generally speaking,

00:10:39.260 | we no longer believe that's true.

00:10:40.660 | And I shouldn't include we in that.

00:10:43.060 | I didn't believe that was ever true.

00:10:44.460 | But some of us maybe thought that was true.

00:10:46.340 | And I'll tell you part of the reason why,

00:10:48.580 | in part of this talk,

00:10:50.540 | a big part of it comes down to safety.

00:10:53.060 | A big part of it comes down to safety.

00:10:54.700 | And the question of safety, convincing ourselves

00:10:57.940 | that that system, that black box,

00:10:59.660 | even if we could train it to accurately approximate

00:11:03.340 | this massively complex underlying function

00:11:05.900 | that we're trying to approximate,

00:11:07.900 | can we convince ourselves that it's safe?

00:11:10.340 | And it's very, very hard to answer

00:11:11.940 | that question affirmatively.

00:11:13.140 | And I'll raise some of the issues around why that is.

00:11:17.940 | This is not to say that learning methods

00:11:19.860 | are not incredibly useful for autonomous driving

00:11:21.820 | because they absolutely are.

00:11:23.580 | And Oscar will show you examples of why that is

00:11:25.900 | and how Aptiv is using some learning methods today.

00:11:28.900 | But this safety dimension is tricky

00:11:30.700 | because there's actually two axes here.

00:11:34.620 | One is the actual technical safety of the system,

00:11:36.860 | which is to say, can we build a system that's safe,

00:11:39.660 | that's provably in some sense safe,

00:11:41.860 | that we can validate, which we can convince ourselves,

00:11:45.300 | achieves the intended functionality

00:11:47.500 | in our operational design domain,

00:11:49.580 | that adheres to whatever regulatory requirements

00:11:52.980 | might be imposed on our jurisdictions that we're operating.

00:11:56.220 | And there's a whole longer list related to technical safety.

00:11:59.340 | But these are technical problems primarily.

00:12:02.220 | But there's another dimension,

00:12:03.300 | which up here is called perceived safety,

00:12:05.980 | which is to say, when you ride in a car,

00:12:08.300 | even if it's safe, do you believe that it's safe?

00:12:11.900 | And therefore, will you wanna take another trip?

00:12:14.420 | Which sounds kinda squishy,

00:12:16.300 | and as engineers, we're typically uncomfortable

00:12:18.980 | with that kind of stuff,

00:12:19.820 | but it turns out to be really important

00:12:21.260 | and probably harder to solve

00:12:22.900 | because it's a little bit squishy.

00:12:24.860 | And quite obviously, we gotta sit up here, right?

00:12:26.940 | We gotta be in this upper right-hand corner

00:12:28.540 | where we have not only a very safe car

00:12:30.980 | from a technical perspective,

00:12:32.060 | but one that feels safe, that inspires confidence

00:12:34.940 | in riders, in regulators, and in everybody else.

00:12:38.020 | So how do we get there in the context

00:12:40.940 | of elements of this system that may be black boxes,

00:12:44.340 | for lack of a better word?

00:12:46.340 | What's required is trust.

00:12:48.180 | You know, how do we get to this point

00:12:49.420 | where we can trust neural networks

00:12:51.060 | in the context of safety-critical systems,

00:12:53.180 | which is what an autonomous vehicle is?

00:12:55.180 | It really comes down to this question of,

00:12:58.300 | how do we convince ourselves

00:12:59.740 | that we can validate these systems?

00:13:01.220 | Again, validating the system,

00:13:03.100 | ensuring that it can meet the requirements,

00:13:07.140 | the operational requirements in the domain of interest

00:13:09.700 | that are imposed by the user, all right?

00:13:12.660 | There's three dimensions to this key question

00:13:17.100 | of understanding how to validate,

00:13:18.220 | and I'm gonna just briefly introduce

00:13:19.820 | some topics of interest around each of these.

00:13:22.500 | But the first one, trusting the data.

00:13:25.900 | Trusting the data.

00:13:27.660 | Do we actually have confidence

00:13:29.700 | about what goes into this algorithm?

00:13:32.260 | I mean, everybody knows garbage in, garbage out.

00:13:35.020 | There's various ways that we can make this garbage.

00:13:38.340 | We can have data which is insufficiently covering our domain,

00:13:42.180 | not representative of the domain.

00:13:43.540 | We can have data that's poorly annotated

00:13:45.620 | by our third-party trusted partners

00:13:47.300 | who we've trusted to label certain things of interest.

00:13:50.420 | So do we trust the data that's going in

00:13:52.620 | to the algorithm itself?

00:13:53.900 | Do we trust the implementation?

00:13:56.460 | We've got a beautiful algorithm,

00:13:58.060 | super descriptive, super robust,

00:14:00.100 | not brittle at all, well-trained,

00:14:02.300 | and we're running it on poor hardware.

00:14:03.940 | We've coded it poorly.

00:14:05.500 | We've got buffer overruns right and left.

00:14:07.540 | Do we trust the implementation

00:14:09.260 | to actually execute in a safe manner?

00:14:11.340 | And do we trust the algorithm?

00:14:14.620 | Again, generally speaking,

00:14:15.740 | we're trying to approximate really complicated functions.

00:14:19.020 | I don't think we typically use neural networks

00:14:21.300 | to approximate linear systems.

00:14:23.340 | So this is a gnarly, nasty function

00:14:25.340 | which has problems of critical interest

00:14:30.340 | which are really rare.

00:14:33.100 | In fact, they're the only ones of interest.

00:14:35.180 | So there's these events that happen very, very infrequently

00:14:38.380 | that we absolutely have to get right.

00:14:40.300 | It's a hard problem to convince ourselves

00:14:43.020 | that the algorithm is gonna perform properly

00:14:45.660 | in these unexpected and rare situations.

00:14:48.660 | So these are the sorts of things that we think about

00:14:51.740 | and that we have to answer in an intelligent way

00:14:55.220 | to convince ourselves that we have

00:14:56.780 | a validated neural network-based system.

00:15:00.060 | Okay, let me just step through each of these topics

00:15:05.020 | really quickly.

00:15:06.100 | So the topic of validation,

00:15:09.020 | what do we mean by that and why it is hard?

00:15:11.300 | There's a number of different dimensions here.

00:15:13.900 | The first is that we don't have insight

00:15:15.380 | into the nature of the function

00:15:16.500 | that we're trying to approximate.

00:15:18.860 | The underlying phenomena is really complicated.

00:15:21.580 | Again, if it weren't, we'd probably possibly be modeling it

00:15:24.940 | using different techniques.

00:15:25.940 | We'd write a closed-form equation to describe it.

00:15:28.400 | So that's a problem.

00:15:30.180 | Second, again, the accidents,

00:15:33.620 | the actual crashes on the road,

00:15:35.420 | what's going crashes and not accidents,

00:15:37.340 | these are rare.

00:15:38.600 | Luckily, they're very rare.

00:15:40.380 | But it makes the statistical argument

00:15:42.580 | around these accidents

00:15:44.540 | and being able to avoid these accidents

00:15:45.820 | really, really difficult.

00:15:47.060 | If you believe Rand, and they're pretty smart folks,

00:15:51.460 | they say you gotta drive 275 million miles

00:15:54.260 | without accident, without a crash,

00:15:55.980 | to claim a lower fatality rate

00:15:57.660 | than a human with 95% confidence.

00:16:00.260 | Well, how are we gonna do that?

00:16:01.820 | Can we think about using some correlated incident,

00:16:06.080 | maybe some kind of close call,

00:16:08.180 | as a proxy for accidents, which may be more frequent,

00:16:10.620 | and maybe back in that way?

00:16:12.500 | There's a lot of questions here,

00:16:14.020 | which I won't say we don't have any answers to,

00:16:16.180 | 'cause I wouldn't go that far,

00:16:17.180 | but they're hard questions.

00:16:19.060 | They're not questions with obvious answers.

00:16:21.860 | So this is one of them,

00:16:23.140 | this issue of rare events.

00:16:24.580 | The regulatory dimension is one of these known unknowns.

00:16:29.940 | How do we validate a system

00:16:31.700 | if the requirements that may be imposed upon us

00:16:34.100 | from outside regulatory bodies are still to be written?

00:16:38.620 | That's difficult.

00:16:40.140 | So there's a lack of consensus

00:16:42.680 | on what the safety target should be for these systems.

00:16:46.180 | This is obviously evolving.

00:16:47.540 | Smart people are thinking about this.

00:16:49.420 | But today, it's not at all clear.

00:16:51.860 | If you're driving in Las Vegas,

00:16:53.220 | if you're driving in Singapore,

00:16:54.620 | if you're driving in San Francisco,

00:16:56.020 | or anywhere in between, what this target needs to be.

00:16:58.820 | And then lastly, and this is a really interesting one,

00:17:03.880 | we can get through a validation process for a bill to code.

00:17:06.940 | Let's assume we can do that.

00:17:08.360 | Well, what happens when we wanna update the code?

00:17:10.480 | 'Cause obviously we will.

00:17:11.860 | Does that mean we have to start that validation process

00:17:13.940 | again from scratch,

00:17:14.820 | which will unavoidably be expensive and lengthy?

00:17:18.520 | Well, what if we only change a little bit of the code?

00:17:20.260 | What if we only change one line?

00:17:22.020 | But what if that one line is the most important line of code

00:17:24.740 | in the whole code base?

00:17:25.940 | This is one that I can tell you

00:17:29.700 | keeps a lot of people up at night,

00:17:30.980 | this question of revalidation.

00:17:33.260 | And then not even, again, keep that code base fixed.

00:17:35.940 | What if we move from one city to the next?

00:17:38.200 | And let's say that city is quite similar

00:17:39.620 | to your previous city, but not exactly the same.

00:17:42.380 | How do we think about validation

00:17:44.020 | in the context of new environments?

00:17:46.280 | So this continuous development issue is a challenge.

00:17:50.780 | All right, let me move on to talking about the data.

00:17:53.540 | There's probably people in this room

00:17:54.620 | who are doing active research in this area

00:17:57.020 | 'cause it's a really interesting one.

00:17:59.360 | But there's a couple of obvious questions, I would say,

00:18:02.780 | that we think about when we think about data.

00:18:06.340 | We can have a great algorithm,

00:18:07.740 | and if we're training it on poor data

00:18:09.940 | for one reason or another, we won't have a great output.

00:18:12.800 | So one thing we think about is the sufficiency,

00:18:16.660 | the completeness of the data,

00:18:19.100 | and the bias that may be inherent in the data

00:18:21.420 | for our operational domain.

00:18:23.820 | If we wanna operate 24 hours a day,

00:18:26.900 | and we only train on data collected during daytime,

00:18:29.840 | we're probably gonna have an issue.

00:18:31.580 | Annotating the data is another dimension of the problem.

00:18:36.420 | We can collect raw data that's sufficient,

00:18:38.420 | that covers our space, but when we annotate it,

00:18:41.100 | when we hand it off to a third party,

00:18:42.500 | 'cause it's typically a third party,

00:18:44.540 | to mark up the interesting aspects of it,

00:18:47.920 | we provide them some specifications,

00:18:49.340 | but we put a lot of trust in that third party,

00:18:51.780 | and trust that they're gonna do a good job

00:18:55.900 | annotating the interesting parts,

00:18:57.340 | and not the uninteresting parts,

00:18:58.880 | that they're gonna catch all the interesting parts

00:19:00.540 | that we've asked them to catch, et cetera.

00:19:02.640 | So this annotation part, which seems very mundane,

00:19:06.500 | very easy to manage, and kind of like low-hanging fruit,

00:19:10.420 | is in fact another key aspect

00:19:13.120 | of ensuring that we can trust the data.

00:19:16.020 | Okay, and this reference just kind of points to the fact

00:19:19.020 | that there are, again, smart people

00:19:21.300 | thinking about this problem,

00:19:22.540 | which rears its head in many domains

00:19:24.620 | beyond autonomous driving.

00:19:26.540 | Now what about the algorithms themselves?

00:19:31.060 | So moving on from the data to the actual algorithm,

00:19:34.800 | how do we convince ourselves that that algorithm,

00:19:38.420 | that like any kind of learning-based algorithm,

00:19:41.980 | we've trained on a training set,

00:19:44.420 | is gonna do well on some unknown test set?

00:19:48.120 | Well, there's a couple kind of properties

00:19:52.220 | of the algorithm that we can look at,

00:19:53.780 | that we can kind of interrogate,

00:19:55.860 | and kind of poke at to convince ourselves

00:19:58.740 | that that algorithm will perform well.

00:20:00.880 | You know, one is invariance,

00:20:03.580 | and the other one, we can say, is stability.

00:20:06.260 | If we make small perturbations to this function,

00:20:10.000 | does it behave well?

00:20:11.180 | Given kind of, let's say, a bounded input,

00:20:13.660 | do we see a bounded output?

00:20:15.460 | Or do we see some wild response?

00:20:17.640 | You know, I'm sure you've all heard of examples

00:20:22.060 | of adversarial images that can confuse

00:20:26.700 | learning-based classifiers.

00:20:28.460 | So it's a turtle.

00:20:31.060 | You show it a turtle, it says, "Well, that's a turtle."

00:20:33.460 | And then you show it a turtle that's maybe fuzzed

00:20:35.180 | with a little bit of noise that the human eye can't perceive.

00:20:38.320 | So it still looks like a turtle,

00:20:39.940 | and it tells you it's a machine gun.

00:20:41.740 | Obviously, for us in the driving domain,

00:20:44.860 | we want a stop sign to be correctly identified

00:20:46.940 | as a stop sign 100 times of 100.

00:20:49.380 | We don't want that stop sign,

00:20:51.380 | if somebody goes up and puts a piece of duct tape

00:20:53.180 | in the lower right-hand corner,

00:20:54.220 | to be interpreted as a yield sign, for example.

00:20:58.000 | So this question of the properties of the algorithm,

00:21:02.300 | its invariance, its stability,

00:21:04.260 | is something of high interest.

00:21:08.460 | And then lastly, to add one more point to this,

00:21:12.540 | this notion of interpretability.

00:21:14.620 | So interpretability, understanding why an algorithm

00:21:17.880 | made a decision that it made.

00:21:19.500 | This is the sort of thing that may not be a nice-to-have,

00:21:23.220 | may actually be a requirement,

00:21:25.100 | and would likely to be a requirement

00:21:26.700 | from the regulatory groups

00:21:27.900 | that I was referring to a minute ago.

00:21:29.740 | So let's say, imagine the case of a crash,

00:21:32.540 | where the system that was governing

00:21:34.580 | your trajectory generator was a data-driven system,

00:21:38.620 | was a deep-learning-based trajectory generator.

00:21:42.460 | Well, you may need to explain to someone

00:21:44.900 | exactly why that particular trajectory

00:21:47.740 | was generated at that particular moment.

00:21:50.020 | And this may be a hard thing to do,

00:21:52.340 | if the generator was a data-driven model.

00:21:55.180 | Now, obviously, there are people working

00:21:56.580 | and doing active research into this specific question

00:21:59.680 | of interpretable learning methods,

00:22:02.940 | but it's a thorny one.

00:22:05.420 | It's a very, very difficult topic,

00:22:07.420 | and it's not at all clear to me when and if

00:22:11.180 | we'll get to the stage where we can,

00:22:13.140 | to even a technical audience,

00:22:16.260 | but beyond that, to a lay jury,

00:22:18.580 | be able to explain why algorithm X made decision Y.

00:22:22.340 | Okay, so with all that in mind,

00:22:25.820 | let me talk a little bit about safety.

00:22:32.580 | That all maybe sounds pretty bleak.

00:22:34.180 | You think, well, man, why are we taking this course

00:22:35.820 | with Lex, 'cause we're never gonna really use this stuff.

00:22:37.700 | But in fact, we can.

00:22:39.980 | We can and will, as a community.

00:22:42.780 | There's a lot of tools we can bring to bear

00:22:45.720 | to think about neural networks,

00:22:48.780 | and they're, generally speaking,

00:22:49.820 | within the context of a broader safety argument.

00:22:52.860 | I think that's the key.

00:22:53.940 | We tend not to think about using a neural network

00:22:57.000 | as a holistic system to drive a car,

00:23:00.500 | but we'll think about it as a submodule

00:23:02.780 | that we can build other systems around,

00:23:05.600 | generally speaking, that which we can say,

00:23:07.980 | maybe make more rigorous claims about their performance,

00:23:10.700 | their underlying properties,

00:23:13.000 | and then therefore make a convincing,

00:23:14.900 | holistic safety argument that this end-to-end system is safe.

00:23:19.180 | We have tools, functional safety is,

00:23:22.860 | maybe familiar to some of you.

00:23:24.140 | It's something we think about a lot

00:23:25.060 | in the automotive domain.

00:23:26.360 | And SOTIF, which stands for

00:23:29.740 | Safety of the Intended Functionality,

00:23:31.460 | we're basically asking ourselves the question,

00:23:34.100 | is this overall function doing what it's intended to do?

00:23:38.580 | Is it operating safely?

00:23:39.740 | And is it meeting its specifications?

00:23:41.520 | There's kind of an analogy here

00:23:43.340 | to validation and verification, if you will.

00:23:47.180 | And we have to answer these questions

00:23:48.860 | around functional safety and SOTIF affirmatively,

00:23:52.720 | even when we have neural network-based elements

00:23:57.020 | in order to eventually put this car on the road.

00:24:00.260 | All right, so I mentioned that we need to do some embedding.

00:24:02.940 | This is an example of what it might look like.

00:24:05.240 | We refer to this as,

00:24:07.620 | sometimes we call this caging the learning.

00:24:10.060 | So we put the learning in a box.

00:24:11.580 | It's this powerful animal we wanna control.

00:24:14.360 | And in this case, it's up there at the top in red.

00:24:17.340 | That might be that trajectory proposer I was talking about.

00:24:21.360 | So let's say we've got a powerful trajectory proposer.

00:24:23.780 | We wanna use this thing.

00:24:24.740 | We've got it on what we call our performance compute,

00:24:26.980 | our high-powered compute.

00:24:28.380 | It's maybe not automotive grade.

00:24:29.820 | It's got some potential failure modes,

00:24:31.460 | but it's generally speaking, good performance.

00:24:34.060 | Let's go there.

00:24:35.300 | And we've got our neural network-based generator on it,

00:24:38.060 | which we can say some things about,

00:24:39.380 | but maybe not everything we'd like to.

00:24:41.280 | Well, we make the argument that if we can surround that,

00:24:44.780 | so if we can cage it, kind of underpin it

00:24:47.860 | with a safety system that we can say

00:24:50.180 | very rigorous things about its performance,

00:24:54.320 | then generally speaking, we may be okay.

00:24:56.060 | There may be a path to using neural networks

00:24:58.460 | on autonomous vehicles if we can wrap them

00:25:01.680 | in a safety architecture that we can say

00:25:03.900 | a lot of good things about.

00:25:05.700 | And this is exactly what this represents.

00:25:08.140 | So I'm gonna conclude my part of the talk here,

00:25:10.420 | hand it over to Oscar, with kind of a quote, an assertion.

00:25:15.420 | One of my engineers insisted I show today.

00:25:18.480 | The argument is the following.

00:25:20.460 | Engineering is inching closer to the natural sciences.

00:25:22.980 | I won't say how much closer, but closer.

00:25:24.920 | We're creating things that we don't fully understand,

00:25:27.460 | and then we're investigating the properties of our creation.

00:25:30.320 | We're not writing down closed-form functions.

00:25:33.100 | That would be too easy.

00:25:35.440 | We're generating these immensely complex

00:25:38.100 | functional approximators, and then we're just poking at 'em

00:25:40.980 | in different ways and saying, boy, well,

00:25:42.020 | what does this thing do under these situations?

00:25:44.500 | And I'll leave you with one image,

00:25:46.580 | which I'll present without comment,

00:25:48.100 | and then hand it over to Oscar.

00:25:50.000 | All right, thank you.

00:25:51.980 | (audience applauding)

00:25:55.140 | - So thanks a lot, Carl.

00:25:58.700 | Thanks, Lex, for the invite.

00:26:00.220 | Yes, my name is Oscar.

00:26:02.120 | I run the machine learning team at Aptiv Neutronomy.

00:26:05.660 | So let me begin with this slide.

00:26:08.920 | You know, not long ago, image classification was,

00:26:13.140 | you know, quite literally a joke.

00:26:14.380 | So this is an actual comic.

00:26:17.420 | How many have seen this before?

00:26:20.060 | Okay, well, I was doing my PhD in this era

00:26:22.820 | where, you know, building a bird classifier

00:26:26.820 | was like a PhD project, right?

00:26:28.660 | And it was, you know, it's funny 'cause it's true.

00:26:32.500 | And then, of course, as you well know,

00:26:34.660 | the deep learning revolution happened,

00:26:36.340 | and Lex, you know, previous introductory slides

00:26:38.940 | gives a great overview.

00:26:40.780 | I don't wanna redo that.

00:26:42.140 | I just wanna say sort of a straight line

00:26:44.420 | from what I consider the breakthrough paper

00:26:46.740 | by Krzyzewski et al.

00:26:48.900 | To the work I'll be talking about today,

00:26:51.020 | I'll start with these three.

00:26:51.900 | So you had the, you know, deep learning,

00:26:54.580 | end-to-end learning for image net classification

00:26:57.180 | by Krzyzewski et al.

00:26:58.300 | That paper's been cited 35,000 times.

00:27:00.940 | I checked yesterday.

00:27:01.980 | Then, 2014, Ross Gershick et al. at Berkeley

00:27:06.180 | basically showed how to, you know,

00:27:08.820 | repurpose the deep learning architecture

00:27:11.540 | to do detection in images.

00:27:13.860 | And that was the first time

00:27:14.700 | when the visual community really started seeing,

00:27:16.780 | okay, so classification is more general.

00:27:18.460 | You can classify anything,

00:27:19.460 | an image, an audio signal, whatever, right?

00:27:21.780 | But detection in images was very intimate

00:27:23.900 | to the computer vision community.

00:27:25.100 | We thought we were best in the world, right?

00:27:27.260 | So when this paper came out,

00:27:28.620 | that was sort of the final argument for like,

00:27:32.220 | okay, we all need to do deep learning now.

00:27:34.320 | Right, and then 2016, this paper came out,

00:27:37.860 | the single-shot multi-box detector,

00:27:40.020 | which I think is a great paper by Liu et al.

00:27:43.460 | So if you haven't looked at this paper,

00:27:46.220 | by all means, read them carefully.

00:27:48.020 | So as a result,

00:27:51.180 | you know, performance is no longer a joke, right?

00:27:54.980 | So this is a network that we developed in my group.

00:27:59.460 | So it's a joint image classification segmentation network.

00:28:04.460 | This thing, we can run this at 200 hertz on a single GPU.

00:28:07.780 | And in this video, in this rendering,

00:28:11.840 | there's no tracking applied.

00:28:13.940 | There's no temporal smoothing.

00:28:15.260 | Every single frame is analyzed

00:28:17.420 | independently from the other one.

00:28:20.300 | And you can see that we can model several different classes,

00:28:23.300 | you know, both boxes and the surfaces at the same time.

00:28:29.480 | Here's my cartoon drawing of a perception system

00:28:32.660 | on an autonomous vehicle.

00:28:33.660 | So you have the three different main sensibilities.

00:28:38.000 | Typically have some module that does detection and tracking.

00:28:41.300 | You know, there's tons of variations of this, of course,

00:28:44.200 | but you have some sort of sensor pipelines,

00:28:46.600 | and then in the end, you have a tracking and fusion step.

00:28:49.460 | So what I showed you in the previous video

00:28:52.380 | is basically this part.

00:28:53.260 | So like I said, there was no tracking,

00:28:55.140 | but it's like going from the camera to detections.

00:28:58.600 | And if you look, you know, when I started,

00:29:01.920 | so I come strict from the computer science

00:29:04.240 | learning community, so when I started looking

00:29:06.980 | at this pipeline, I'm like, why are there so many steps?

00:29:09.300 | Why aren't we optimizing things end to end?

00:29:11.740 | So obviously, there's a real temptation

00:29:14.160 | to just wrap everything in a kernel.

00:29:15.540 | It's a very well-defined input/output function.

00:29:18.620 | And like Carl alluded to, it's one that can be verified

00:29:22.460 | quite well, assuming you have the right data.

00:29:25.520 | I'm not gonna be talking about this.

00:29:28.260 | I am gonna talk about this,

00:29:30.540 | namely the building a deep learning kernel

00:29:33.860 | for the LiDAR pipeline.

00:29:35.120 | And LiDAR pipeline is arguably the backbone

00:29:37.980 | of the perception system

00:29:39.620 | for most autonomous driving systems.

00:29:43.860 | So what we're gonna do is,

00:29:44.980 | so this is basically gonna be the goal here.

00:29:47.580 | So we're gonna have a point cloud,

00:29:49.380 | it's input, and we're gonna have a neural network

00:29:52.940 | that takes that as input and then generates

00:29:54.980 | 3D bounding boxes that are in a well-coordinated system.

00:29:57.360 | So it's like 20 meters that way,

00:29:59.720 | it's two meters wide, so long,

00:30:01.780 | this rotation and this orientation and so on.

00:30:04.180 | So yeah, so that's what this talk is about.

00:30:09.420 | So I'm gonna talk about point pillars,

00:30:10.860 | which is a new method we developed for this,

00:30:13.100 | and new scenes, which is a benchmark data that we released.

00:30:16.460 | Okay, so what is point pillars?

00:30:18.260 | Well, it's a novel point cloud encoder.

00:30:21.300 | So what we do is we learn a representation

00:30:23.100 | that is suitable for downstream detection.

00:30:25.180 | It's almost like a, the main innovation

00:30:27.100 | is the translation from the point cloud

00:30:29.260 | to a canvas that can then be processed

00:30:31.620 | by a similar architecture that you would use in an image.

00:30:35.780 | And I'll show you how it performs,

00:30:37.620 | you know, all published measurement on KITTI

00:30:39.300 | by a large margin, especially with respect

00:30:42.980 | to inference speed.

00:30:45.860 | And there's a pre-printout and some code available

00:30:48.820 | if you guys wanna play around with it.

00:30:50.660 | So the architecture that we're gonna use

00:30:54.260 | looks something like this.

00:30:56.300 | And I should say, most papers in this space

00:31:00.900 | use this architecture.

00:31:02.780 | So it's kind of a natural design, right?

00:31:04.860 | So you have the point cloud at the top,

00:31:06.880 | you have this encoder, and that's where we introduce

00:31:09.380 | the point pillars, but you can have,

00:31:10.860 | I'll show you guys, you can have various types of encoders.

00:31:14.540 | And then after that, that feeds into a backbone,

00:31:16.600 | which is now a standard convolutional 2D backbone.

00:31:19.780 | You have a detection head, and you might have,

00:31:22.220 | you may or may not have a segmentation head on that.

00:31:25.100 | The point is that after the encoder,

00:31:26.620 | everything looks just like, the architecture's

00:31:28.900 | very, very similar to the SSD architecture

00:31:31.060 | or the RCNN architecture.

00:31:32.540 | So let's go into a little bit more detail, right?

00:31:38.020 | So the range, so what you're given here

00:31:40.420 | is a range of D meters, so you wanna model,

00:31:43.340 | you know, 40 meters, a 40 meter circle

00:31:45.740 | around the vehicle, for example.

00:31:47.740 | You have certain resolution of your bins,

00:31:51.500 | and then a number of output channels, right?

00:31:53.820 | So your input is a set of pillars,

00:31:55.900 | or in the pillar here is a vertical column, right?

00:31:59.100 | So you have N, M of those that are non-empty in the space.

00:32:03.060 | And you say a pillar P contains all the points,

00:32:05.500 | which are a lot of point X, Y, C, and intensity.

00:32:09.100 | And there's N sub, M indexed by M points in each pillar,

00:32:13.700 | right, so just to say that it varies, right?

00:32:16.940 | So it could be one single point at a particular location,

00:32:19.580 | it could be 200 points.

00:32:20.980 | And then it's centered around this bin.

00:32:23.220 | And the goal here is to produce a tensor as a fixed size.

00:32:27.540 | So it's height, which is, you know,

00:32:29.780 | range of a resolution, width, range of a resolution,

00:32:33.100 | and then this parameter C.

00:32:35.340 | C is the number of channels, so in an image,

00:32:38.420 | C will be three.

00:32:39.820 | We don't necessarily care about that.

00:32:41.400 | We call it a pseudo-image, but it's the same thing.

00:32:43.740 | It's a fixed number of channels

00:32:45.220 | that the backbone can then operate on.

00:32:47.140 | Yeah, so here's the same thing without math, right?

00:32:52.860 | So you have a lot of points, and then you have this space

00:32:55.140 | where you just grid it up in these pillars, right?

00:32:58.700 | Some are empty, some are not empty.

00:33:00.900 | So in this sort of, with this notation,

00:33:02.900 | let me give a little bit of a literature review.

00:33:05.900 | What people tend to do is you take each pillar,

00:33:07.980 | and you divide it into voxels, right?

00:33:09.580 | So now you have a 3D voxel grid, right?

00:33:11.780 | And then you say, I'm gonna extract

00:33:13.140 | some sort of features for each voxel.

00:33:14.460 | For example, how many points are in this voxel?

00:33:16.620 | Or what is the maximum intensity

00:33:18.640 | of all the points in this voxel?

00:33:20.740 | Then you extract features for the whole pillar, right?

00:33:23.580 | What is the max intensity across all the points

00:33:26.660 | in the whole pillar, right?

00:33:28.420 | All of these are hand-engineered functions

00:33:31.500 | that generates the fixed length output.

00:33:33.820 | So what you can do is you can now concatenate them,

00:33:36.380 | and their output is this tensor x, y, z.

00:33:40.940 | So then, VoxelNet came around, I'd say, a year or so ago.

00:33:49.420 | Maybe a little bit more by now.

00:33:51.740 | So they do the first, the first step is similar, right?

00:33:54.620 | So you divide each pillar into voxels,

00:33:56.520 | and then you take, you map the points in each voxels.

00:34:00.700 | And the novel thing here is that

00:34:02.380 | they got rid of the feature engineering.

00:34:03.980 | So they said, we'll map it from a voxel

00:34:06.940 | to features using a PointNet.

00:34:10.300 | And I'm not gonna get into the details of a PointNet,

00:34:12.220 | but it's basically a network architecture

00:34:15.780 | that allows you to take a point cloud

00:34:18.820 | and map it to, again, a fixed length representation.

00:34:21.920 | So it's a series of 1D convolutions and max pooling layers.

00:34:27.220 | It's a very neat paper, right?

00:34:29.240 | So what they did is they, okay,

00:34:30.420 | we say we apply that to each voxel,

00:34:32.340 | but now I end up with this awkward four-dimensional tensor

00:34:34.620 | 'cause I still have XYZ from the voxels,

00:34:37.660 | and then I have this C-dimensional output

00:34:41.860 | from the PointNet.

00:34:42.900 | So then they have to consolidate the Z dimension

00:34:45.540 | through a 3D convolution, right?

00:34:47.740 | And now you achieve your XYZ tensor.

00:34:50.980 | So now you're ready to go.

00:34:52.060 | So it's very nice in the sense that it's end-to-end method.

00:34:54.800 | They showed good performance,

00:34:57.020 | but at the end of the day, it was very slow.

00:34:58.260 | They got like five hertz runtime.

00:35:00.620 | And the culprit here is this last step,

00:35:03.700 | so the 3D convolution.

00:35:05.780 | It's much, much slower than a standard 2D convolution.

00:35:09.040 | All right, so here's what we did.

00:35:12.800 | We basically said, let's just forget about voxels.

00:35:15.940 | We'll take all the points in the pillar

00:35:17.900 | and we'll put it straight through PointNet.

00:35:21.700 | That's it.

00:35:22.540 | So just that single change gave a 10- to 100-fold speedup

00:35:29.260 | from VoxelNet.

00:35:30.940 | And then we simplified the PointNet.

00:35:33.260 | So now, instead of having,

00:35:34.380 | so PointNet can have several layers

00:35:35.980 | and several modules inside it.

00:35:37.780 | So we simplified it to a single 1D convolution

00:35:40.300 | and max pooling layer.

00:35:41.400 | And then we showed you can get a really fast implementation

00:35:45.380 | by taking all your pillars that are not empty,

00:35:48.140 | stack them together into a nice, dense tensor

00:35:50.420 | with a little bit of padding here and there.

00:35:52.620 | And then you can run the forward pass with a single,

00:35:56.780 | you can pose it as a 2D convolution

00:35:59.020 | with a one-by-one kernel.

00:36:00.460 | So the final encoder runtime is now 1.3 milliseconds,

00:36:06.180 | which is really, really fast.

00:36:08.340 | So the full method looks like this.

00:36:12.540 | So you have the point cloud,

00:36:14.300 | you have this pillar feature net, which is the encoder.

00:36:17.580 | So the different steps there,

00:36:20.540 | that feeds straight into the backbone

00:36:22.820 | and your detection heads.

00:36:24.100 | And there you go.

00:36:25.180 | So it's still a multi-stage architecture,

00:36:28.700 | but of course the key is that none of the steps are,

00:36:32.100 | all the steps are fully parameterized.

00:36:34.780 | And we can back propagate through the whole thing

00:36:37.980 | and learn it.

00:36:38.820 | So putting these things together,

00:36:43.500 | these were the results we got on the Qt Benchmark.

00:36:46.100 | So if you look at the car class, right,

00:36:51.340 | we actually got the highest performance,

00:36:53.580 | so this is I think the bird's eye view metric.

00:36:56.340 | And we even outperformed the methods

00:36:58.820 | that relied on LiDAR and vision.

00:37:00.420 | And we did that running at a little bit over 60 hertz.

00:37:05.780 | And this is, like I said, this is in terms of bird's eye view

00:37:15.100 | we can also measure the 3D benchmark

00:37:17.780 | and we get the same, very similar performance.

00:37:23.580 | Yeah, so, you know, car did well, cyclist did well,

00:37:28.340 | pedestrian there was one or two methods,

00:37:30.780 | future methods that did a little bit better.

00:37:32.700 | But then in aggregate on the top left,

00:37:35.300 | we ended up on top.

00:37:36.900 | So, yeah.

00:37:37.740 | And I put a little asterisk here,

00:37:40.660 | this is compared to published methods

00:37:43.020 | at the time of submission.

00:37:44.940 | And so many things happening so quickly.

00:37:47.500 | So there's tons of, you know,

00:37:49.620 | submissions on the Qt leaderboard

00:37:51.180 | that are completely anonymous,

00:37:52.980 | so we don't even know, you know,

00:37:55.180 | what was the input, what data did they use.

00:37:57.940 | So we only compare it to published methods.

00:38:00.140 | So here's some qualitative results.

00:38:04.660 | You have the, you know, just for visibilization

00:38:07.300 | you can project them into the image.

00:38:08.460 | So you see the gray boxes are the ground truth

00:38:10.260 | and the colored ones are the predictions.

00:38:13.500 | And yeah, some challenging ones,

00:38:21.220 | it's so small here.

00:38:22.700 | So we have, for example, the person on the right there,

00:38:25.420 | that's a person with a little stand

00:38:29.260 | got interpreted as a bicycle.

00:38:30.820 | We have this man on the ladder,

00:38:33.060 | which is an actual annotation error.

00:38:34.540 | So we discovered it as a person,

00:38:36.460 | but it wasn't annotated in the data.

00:38:38.260 | Here's a child on a bicycle that didn't get detected.

00:38:44.340 | So that's a, you know, that's a bummer.

00:38:50.020 | Okay, so that was KITTI,

00:38:54.340 | and then I just wanted to show you guys,

00:38:56.940 | of course we can run this on our vehicles.

00:39:00.140 | So this is a rendering.

00:39:01.660 | We just deploy the network at two hertz

00:39:04.980 | on the full 360 sensor suite.

00:39:08.700 | Input is still alive, you know, a few lighter sweeps,

00:39:12.180 | but just projected into the images for visualization.

00:39:17.660 | And again, no tracking or smoothing applied here.

00:39:19.740 | So it's every single frame is analyzed independently.

00:39:24.020 | See those arrows sticking out?

00:39:28.500 | That's the velocity estimate.

00:39:30.700 | So we actually show how you can,

00:39:32.260 | yeah, you can actually accumulate multiple point clouds

00:39:37.060 | into this method,

00:39:37.940 | and now you can start reasoning about velocity as well.

00:39:40.660 | (no audio)

00:39:42.980 | So the second part I want to talk about is NuScenes,

00:39:52.660 | which is a new benchmark data set that we have published.

00:39:57.660 | So what is NuScenes?

00:39:58.700 | So it's 1,020 second scenes

00:40:01.820 | that we collected with our development platform.

00:40:05.140 | So it's a full, it's the same platform that Carl showed,

00:40:08.740 | or a sort of previous generation platform, the Zoe vehicle.

00:40:12.060 | So it's full, you know, the full automotive sensor suite,

00:40:15.660 | data is registered and synced in 360 degree view.

00:40:20.620 | And it's also fully annotated with 3D bounding boxes.

00:40:22.860 | I think there's over one million 3D bounding boxes.

00:40:27.020 | And we actually make this freely available for research.

00:40:29.580 | So you can go to nuscenes.org right now

00:40:32.820 | and download a teaser release, which is 100 scenes,

00:40:38.260 | the full release will be in about a month.

00:40:40.380 | And of course the motivation is straightforward, right?

00:40:44.940 | So, you know, the whole field is driven by benchmark,

00:40:48.060 | and you know, without image, I don't think none of it,

00:40:51.020 | it may be the case that none of us are here,

00:40:52.780 | we're here, right?

00:40:53.620 | Because they may never have been able

00:40:54.860 | to write that first paper

00:40:56.740 | and sort of start this whole thing going.

00:40:58.960 | And when I started looking at 3D,

00:41:01.980 | I looked at the Kili benchmark,

00:41:03.140 | which is truly groundbreaking.

00:41:05.540 | I don't want to take anything away,

00:41:07.060 | but it was becoming outdated.

00:41:08.580 | They don't have full 3D view, they don't have any radar.

00:41:13.380 | So I think this offers an opportunity

00:41:15.980 | to sort of push the field forward a little bit.

00:41:18.880 | Right, and just as a comparison,

00:41:22.500 | this is sort of the most similar benchmark.

00:41:25.980 | And really the only one that you can really compare to

00:41:30.020 | is Kiti.

00:41:30.860 | But so there's other data sets that have maybe LIDAR only,

00:41:35.540 | tons of data sets that have image only, of course.

00:41:39.740 | But it's quite a big step up from Kiti.

00:41:43.420 | Yeah, some details.

00:41:46.180 | So you see the layout with the radars along the edge,

00:41:51.100 | all the cameras on the roof and the top LIDAR,

00:41:55.060 | and some of the receptive fields.

00:41:56.940 | And this data is all on the website.

00:41:59.160 | The taxonomy, so we model several different subcategories

00:42:03.180 | of pedestrians, several types of vehicles,

00:42:05.620 | some static objects, barrier cones.

00:42:08.460 | And then in addition, a bunch of attributes

00:42:11.020 | on the vehicles and on the pedestrians.

00:42:12.980 | All right, so without further ado,

00:42:15.380 | let's just look at some data.

00:42:16.640 | So this is one of the thousand scenes, right?

00:42:18.820 | So all I'm showing here is just playing the frames

00:42:23.700 | one by one of all the images.

00:42:25.680 | And again, the annotations live

00:42:30.700 | in the world coordinate system, right?

00:42:32.300 | So they are full 3D boxes.

00:42:34.580 | I've just projected them into the image.

00:42:36.580 | And that's what's so neat.

00:42:38.580 | So we're not really annotating the LIDAR

00:42:41.660 | or the camera or the radar.

00:42:43.380 | We're annotating the actual objects

00:42:45.420 | and put them in a world coordinate system

00:42:46.900 | and give all the transformations

00:42:48.060 | so you guys can play around with it how you like.

00:42:52.140 | So just to show that, so I can,

00:42:54.420 | because everything is ready,

00:42:55.420 | so I can now take the LIDAR sweeps

00:42:56.740 | and I can just project them into the images

00:42:58.780 | at the same time.

00:42:59.780 | So here I'm showing just colored by distance.

00:43:02.540 | So now you have some sort of sparse density measurement

00:43:06.700 | on the images, distance measurement, sorry.

00:43:09.660 | So that's all I wanted to talk about.

00:43:11.780 | Thank you.

00:43:12.620 | (audience applauding)

00:43:15.780 | - Hi, I was really, really interested

00:43:19.500 | in your discussion around validation

00:43:21.700 | and particularly continuous development

00:43:23.300 | and that sort of thing.

00:43:24.340 | And so my question was basically

00:43:26.020 | is this new scenes dataset,

00:43:27.320 | is this enough to guarantee

00:43:29.580 | that your model is going to generalize to unseen data

00:43:31.900 | and not hit pedestrians and that stuff

00:43:33.980 | or do you have other validation that you need to do?

00:43:36.100 | - No, no, no, I mean,

00:43:36.940 | so the new scenes effort is purely an academic effort.

00:43:40.100 | So we wanna share our data with academic community

00:43:43.580 | to drive the field forward.

00:43:46.300 | We're not making any claims

00:43:47.620 | that this is somehow a sufficient dataset

00:43:49.340 | for any safety case.

00:43:51.460 | It's a small subset of our data.

00:43:56.100 | Yeah, I would say, obviously,

00:43:59.620 | my background is in the academic world.

00:44:01.200 | One of the hardest things was always collecting data

00:44:03.400 | because it's difficult and expensive.

00:44:05.240 | And so having access to a dataset like that,

00:44:08.960 | which was expensive to collect and annotate,

00:44:12.520 | but which we thought we would make available

00:44:15.000 | because, well, we hope that it would spark

00:44:18.400 | academic interests and smart people

00:44:20.820 | like the people in this room

00:44:22.280 | coming up with new and better algorithms,

00:44:23.840 | which could benefit the whole community

00:44:25.120 | and then maybe some of you would even wanna

00:44:26.660 | come work with us at Aptiv.

00:44:28.120 | So not totally, a little bit of self-interest there.

00:44:31.700 | Wasn't intended to be for validation,

00:44:33.220 | it was more for research.

00:44:34.380 | To give you a sense of the scale of validation,

00:44:36.940 | there was one quote there at RAND

00:44:39.020 | saying you gotta drive 275 million miles or more,

00:44:42.000 | depending on the certainty you wanna impose.

00:44:47.000 | But to date as an industry,

00:44:48.680 | we've driven about 12 million miles

00:44:51.540 | to 12 to 14 million miles in sum,

00:44:54.220 | all participants in autonomous mode,

00:44:56.900 | under over hundreds of different bills of code

00:44:59.460 | and many different environments.

00:45:01.020 | So this would now be saying

00:45:02.100 | you're supposed to drive hundreds of millions of miles

00:45:04.560 | in a particular environment on a single bill of code,

00:45:07.360 | a single platform.

00:45:08.980 | Now obviously we're probably not gonna do that.

00:45:11.020 | What we'll end up doing is supplementing the driving

00:45:13.480 | with quite a lot of simulation

00:45:15.820 | and then other methodologies to convince ourselves

00:45:18.560 | that we can make a statistical,

00:45:20.260 | ultimately a statistical argument for safety.

00:45:22.420 | So there'll be use of data sets like this.

00:45:25.220 | We'll be doing lots of regression testing

00:45:26.820 | on supersized version of data sets like that

00:45:30.140 | and other kind of morally equivalent versions

00:45:32.120 | to test different parts of the systems.

00:45:33.500 | Now not just classification,

00:45:34.640 | but different aspects of the system.

00:45:36.380 | Our motion planning, decision making,

00:45:38.740 | localization, all aspects of the system.

00:45:42.580 | And then augment that with on-road driving

00:45:45.480 | and augment that with simulation.

00:45:46.900 | So the safety case is really quite a bit broader,

00:45:49.480 | unfortunately, than any single data set

00:45:51.940 | would allow you to kind of speak to.

00:45:54.620 | - From an industrial perspective,

00:45:56.340 | what do you think can 5G offer for autonomous vehicles?

00:46:00.920 | - 5G, yeah, it's an interesting one.

00:46:04.000 | Well, these vehicles are connected.

00:46:06.540 | You know, that's a requirement.

00:46:08.980 | Certainly when you think about operating them as a fleet.

00:46:12.620 | When the day comes when you have an autonomous vehicle

00:46:14.820 | that is personally owned,

00:46:16.900 | and that day will come in some point in the future,

00:46:19.180 | it may or may not be connected,

00:46:20.420 | it will almost certainly then be too.

00:46:22.220 | But when you have a fleet of vehicles

00:46:24.380 | and you wanna coordinate the activity of that fleet

00:46:27.380 | in a way to maximize the efficiency of that network,

00:46:30.100 | that transportation network,

00:46:31.660 | they're certainly connected.

00:46:33.180 | The requirements of that connectivity is fairly relaxed

00:46:35.740 | if you're talking about just passing back and forth

00:46:37.380 | the position of the car and maybe some status indicators.

00:46:40.500 | You know, are you in autonomous mode, manual mode,

00:46:42.280 | are all systems go, or do you have a fault code,

00:46:43.980 | and what is it?

00:46:45.240 | Now, there's some interesting requirements

00:46:48.220 | that become a little bit more stringent

00:46:49.580 | if you think about what we call teleoperation

00:46:51.740 | or remote operation of the car.

00:46:53.900 | The case where if the car encounters a situation

00:46:56.380 | it doesn't recognize, can't figure out,

00:46:58.620 | gets stuck or confused,

00:47:00.340 | you may kind of phone a human operator

00:47:02.900 | who's sitting remotely to intervene.

00:47:05.080 | And in that case, you know,

00:47:06.580 | that human operator will wanna have

00:47:07.900 | some situational awareness.

00:47:09.380 | There may be a demand of high bandwidth, low latency,

00:47:13.660 | high reliability of the sort that maybe 5G

00:47:16.940 | is better suited to than 4G.

00:47:19.540 | Or LT or whatever you've got.

00:47:21.420 | Broadly speaking, we see it as a very nice to have,

00:47:25.700 | but like any infrastructure,

00:47:28.420 | we understand that it's gonna arrive

00:47:30.660 | on a timeline of its own

00:47:32.420 | and be maintained by someone who's not us.

00:47:34.940 | So it's very much outside our control.

00:47:37.380 | And so for that reason, we design a system

00:47:39.700 | such that we don't rely on kind of the coming 5G wave,

00:47:43.260 | but we'll certainly welcome it when it arrives.

00:47:45.220 | - So you said you have presence in 45 countries.

00:47:48.260 | So did you observe any interesting patterns from that?

00:47:51.140 | Like your car, your same self-driving car model

00:47:55.500 | that is deployed in Vegas as well as Singapore

00:47:57.900 | was able to perform equally well

00:47:59.940 | in both Vegas and Singapore,

00:48:01.180 | or the model was able to perform very well in Singapore

00:48:03.700 | compared to Vegas?

00:48:05.580 | - To speak to your question

00:48:06.420 | about like country to country variation,

00:48:08.260 | you know, we touched on that for a moment

00:48:10.540 | in the validation discussion.

00:48:12.620 | But obviously driving in Singapore

00:48:14.180 | and driving in Vegas is pretty different.

00:48:15.540 | I mean, you're on the other side of the road for starters,

00:48:18.340 | but different traffic rules

00:48:20.900 | and it's sort of underappreciated people drive differently.

00:48:23.300 | There's slightly different traffic norms.

00:48:25.700 | So one of the things that,

00:48:27.380 | well, if anyone was in this class last year,

00:48:30.500 | my co-founder Emilio gave a talk

00:48:31.900 | about something we call rule books,

00:48:33.700 | which is a structure that we've designed

00:48:35.860 | around what we call the driving policy

00:48:37.460 | or the decision-making engine,

00:48:38.580 | which tries to admit in a general and fairly flexible way

00:48:44.020 | the ability to reprioritize rules, reassign rules,

00:48:47.300 | change weights on rules to enable us to drive

00:48:50.300 | in one community and then another

00:48:52.300 | in a fairly seamless manner.

00:48:53.980 | So to give you an example,

00:48:55.020 | when we wanted to get on the road in Singapore,

00:48:57.820 | if you can imagine you've got a,

00:48:59.460 | so let's say you're a autonomy engineer

00:49:02.100 | who was tasked with writing the decision-making engine

00:49:03.820 | and you decide I'm gonna do a finite state architecture,

00:49:06.100 | I'm gonna write down some transition rules,

00:49:07.860 | I'm gonna do them by hand, it's gonna be great.

00:49:09.820 | And then you did that for the right-hand driving

00:49:12.060 | and your boss came in and said,

00:49:13.100 | "Oh yeah, next Monday we're gonna be left-hand driving,

00:49:15.020 | "so just flip all that and get it ready to go."

00:49:18.580 | That could be a huge pain, pain to do,

00:49:21.620 | 'cause it's generally speaking you're doing it manually

00:49:23.300 | and then very difficult to validate,

00:49:24.780 | to ensure that the outputs are correct

00:49:27.380 | across the entire spectrum of possibilities.

00:49:29.740 | So we wanted to avoid that.

00:49:31.020 | And so the long story short,

00:49:33.580 | we actually quite carefully designed the system

00:49:36.940 | such that we can scale to different cities and countries.

00:49:41.780 | And one of the ways you do that is by thinking carefully

00:49:45.060 | around the architectural design

00:49:47.100 | of the decision-making engine.

00:49:48.900 | But it's quite different.

00:49:51.860 | There's four cities I mentioned which are our primary sites,

00:49:53.940 | Boston, Pittsburgh, Vegas, and Singapore,

00:49:56.700 | spans a wide spectrum of driving conditions.

00:49:58.760 | I mean, everybody knows Boston, which is pretty bad.

00:50:02.700 | Vegas is warm weather, mid-density urban, but it's Vegas.

00:50:07.700 | So I mean, all kinds of stuff.

00:50:11.080 | And then Singapore is interesting,

00:50:13.260 | perfect infrastructure, good weather, flat.

00:50:16.380 | People, generally speaking, obey the rules,

00:50:18.380 | so it's kind of close to the ideal case.

00:50:21.860 | So that exposure to this different spectrum of data,

00:50:25.340 | I think, I'll speak for Oscar, maybe, is pretty valuable.

00:50:27.900 | I know for other parts of the development team,

00:50:30.020 | quite valuable.

00:50:30.860 | - Singapore is ideal except there are,

00:50:33.100 | there's constant construction zones.

00:50:34.820 | So every time we drive out, there's a new construction zone.

00:50:37.620 | So we focused, have a lot of work

00:50:39.580 | on construction zone detection in Singapore.

00:50:41.720 | - And the torrential rain.

00:50:42.880 | - Yeah, and the jaywalkers.

00:50:44.720 | - And the jaywalkers, right.

00:50:46.440 | Yeah, they do jaywalk.

00:50:47.680 | People don't break the rules, but they jaywalk.

00:50:50.500 | Other than that, it's perfect.

00:50:52.480 | So which country's fully equipped?

00:50:54.120 | That's a really good question, yeah.

00:50:55.880 | Well, it's interesting because there's other dimensions.

00:50:59.120 | So when we look at which countries are interesting to us

00:51:01.640 | to be in as a market,

00:51:03.920 | there's the infrastructure conditions,

00:51:05.960 | there's the driving patterns and properties,

00:51:08.400 | the density, is it Times Square at rush hour

00:51:10.620 | or is it Dubuque, Iowa?

00:51:12.460 | There is the regulatory environment,

00:51:14.400 | which is incredibly important.

00:51:15.780 | You may have a perfectly well-suited city

00:51:17.220 | from a technical perspective

00:51:18.700 | and they may not allow you to drive there.

00:51:21.560 | So it's really all of these things put together.

00:51:24.460 | And so we kind of have a matrix.

00:51:26.280 | We analyze which cities check these boxes

00:51:28.420 | and assign them scores and then try to understand

00:51:31.700 | then also the economics of that market.

00:51:34.040 | Is that city, check all these boxes,

00:51:36.020 | but there's no one using mobility services there.

00:51:38.780 | There's no opportunity to actually generate revenue

00:51:41.420 | from the service.

00:51:43.020 | So you factor in all of those things.

00:51:45.160 | - Yeah, and I think, I mean, one thing to keep in mind

00:51:48.100 | that it's always the first thing I tell candidates

00:51:50.500 | when I interview them.

00:51:51.940 | There's a huge difference in the advantage

00:51:54.100 | to the business model we're proposing, right?

00:51:55.700 | The ride-hailing service.

00:51:57.260 | So we can choose, even if we commit to a certain city,

00:52:00.180 | we can still select the routes that we feel comfortable

00:52:03.540 | and we can roll it out sort of piece by piece.

00:52:05.180 | We can say, okay, we don't feel comfortable

00:52:07.260 | when driving at night in the city yet.

00:52:09.460 | So we just won't accept any rides, right?

00:52:12.060 | So there's like that decision space as well.

00:52:16.020 | - Hi, thank you very much for coming

00:52:17.500 | and giving us this talk today.

00:52:18.780 | It was very, very interesting.

00:52:19.660 | I have a question which might reveal more

00:52:21.420 | about how naive I am than anything else.

00:52:24.340 | I was comparing your point pillar approach

00:52:28.220 | to the earlier approach where you were,

00:52:31.740 | which is the Voxel-based approach

00:52:34.500 | to interpreting the LIDAR results.

00:52:37.100 | And in the Voxels, you had a four-dimensional tensor

00:52:40.900 | that you were starting with, and your point pillar,

00:52:42.900 | you only have three dimensions.

00:52:43.900 | You're throwing away the Z, as I understood it.

00:52:46.340 | So when you do that, are you concerned

00:52:49.100 | that you're losing information about potential occlusions

00:52:51.460 | or transparencies or semi-occlusions?

00:52:53.420 | Is this a concern?

00:52:56.540 | - I see.

00:52:57.380 | So I may have been a little bit sloppy there.

00:53:00.980 | So we're certainly not throwing away the Z.

00:53:03.500 | All we're saying is that we're learning

00:53:05.660 | the embedding in the Z dimension

00:53:08.060 | jointly with everything else.

00:53:10.340 | So VoxelNet, if you want, sort of felt,

00:53:14.220 | when I first signed that paper,

00:53:15.900 | I felt the need to spoon-feed the network a little bit

00:53:18.260 | and say, let's learn everything stratified

00:53:21.300 | in this height dimension.

00:53:24.820 | And then we'll have a second step

00:53:26.940 | where we learn to consolidate that into a single vector.

00:53:30.820 | We just said, why not just learn those things together?

00:53:33.660 | So, yeah.

00:53:35.100 | - Thanks for your talk.

00:53:36.740 | I have a question for Carl.

00:53:38.060 | You mentioned that if people make change to the code,

00:53:42.180 | do we need another validation or not?

00:53:45.580 | So I work in the industry of nuclear power.

00:53:48.700 | So we do nuclear power simulations.

00:53:51.180 | So when we make any change to our simulation code,

00:53:56.180 | and to make it commercialized,

00:53:57.860 | we need to submit a request for the NRC,

00:54:01.260 | which is the Nuclear Regulation Committee.

00:54:04.660 | So in your opinion, do you think for self-driving,

00:54:08.260 | we need another third-party validation committee or not?

00:54:14.140 | Or should that be a third party, or is just self-check?

00:54:19.580 | - Yeah, that's a really good question.

00:54:22.540 | So I don't know the answer.

00:54:24.140 | I wouldn't be surprised, let me put it this way.

00:54:26.020 | I would not be surprised either way

00:54:27.740 | if the automotive industry ended up

00:54:29.340 | with third-party regulatory oversight, or it didn't.

00:54:33.860 | And I'll tell you why.

00:54:34.700 | There's great precedence for what you just described.

00:54:37.780 | Nuclear, aerospace, there's external bodies

00:54:40.980 | who have deep technical competence,

00:54:43.700 | who can come in, they can do investigations,

00:54:45.780 | they can impose strict regulation, or advise regulation,

00:54:50.020 | and they can partner or define requirements

00:54:56.600 | for certification of various types.

00:54:58.880 | The automotive industry has largely been self-certifying.

00:55:01.680 | There's an argument, which is certainly not unreasonable,

00:55:05.080 | that you have a real alignment of incentive

00:55:08.800 | within the industry and with the public

00:55:11.640 | to be as safe as possible.

00:55:13.520 | Simply put, the cost of a crash is enormous,

00:55:17.900 | economically, socially, everything else.

00:55:20.720 | But whether it continues along that path,

00:55:22.840 | I couldn't tell you.

00:55:25.060 | It's an interesting space because it's one

00:55:28.580 | where the federal government is actually moving

00:55:31.140 | very, very quickly.

00:55:32.020 | I mean, I would say carefully, too,

00:55:33.900 | not overstepping and not trying to impose

00:55:36.620 | too much regulation around an industry

00:55:38.660 | that has never generated a dollar of revenue.

00:55:41.620 | It's still quite nascent.

00:55:43.620 | But if you would have told me a few years ago

00:55:45.680 | that there would have been very thoughtfully defined

00:55:48.820 | draft regulatory guidelines or advice,

00:55:52.620 | I mean, let's say, it's not firm regulation,

00:55:55.540 | around this industry, I probably wouldn't have believed you.

00:55:58.220 | But in fact, that exists.

00:55:59.180 | There's a third version that was released this summer

00:56:01.140 | by the Department of Transportation.

00:56:04.300 | So there's intense interest on the regulatory side.

00:56:07.700 | In terms of how far the process goes

00:56:11.880 | in terms of formation of an external body,

00:56:13.520 | I think really remains to be seen.

00:56:15.060 | I don't know the answer.

00:56:16.660 | - Thanks for your insightful talk.

00:56:18.740 | Looking at this slide, I'm wondering how easy

00:56:21.760 | and effective your train models are

00:56:25.100 | to transfer across different weathers

00:56:27.340 | and whether you need, for example,

00:56:29.300 | if it is snowing, do you need specific trainings

00:56:32.780 | for specifically for your lidars to work effectively

00:56:36.100 | or you don't see any issues in that regard?

00:56:39.500 | - No, I mean, I think the same rules apply

00:56:41.700 | to this method as any other machine learning-based method.

00:56:44.380 | You wanna have support in your training data

00:56:46.220 | for the situation you wanna deploy in.

00:56:49.220 | So if we have no snow in our training data,

00:56:51.760 | I wouldn't go and deploy this in snow.

00:56:54.120 | I do like, one thing I like after having worked

00:56:57.680 | so much with vision though is that the lidar point cloud

00:57:00.220 | is really easy to augment and play around with.

00:57:05.060 | So for example, if you wanna say,

00:57:07.720 | you wanna be robust in really rare events, right?

00:57:11.720 | So let's say there's a piano on the road.

00:57:13.680 | I really wanna detect that.

00:57:15.560 | But it's hard because I have very few examples

00:57:17.460 | of pianos on the road, right?

00:57:19.200 | Now if you think about augmenting your visual dataset

00:57:21.260 | with that data, it's actually quite tricky.

00:57:23.720 | So that easy to have a photorealistic piano

00:57:26.320 | in your training data.

00:57:27.480 | But it is quite easy to do that in your lidar data, right?

00:57:30.640 | So you have a 3D model of your piano,

00:57:33.200 | you have the model for your lidar

00:57:35.160 | and you can get a pretty accurate,

00:57:37.040 | fairly realistic point cloud return from that, right?

00:57:40.520 | So I like that part about working with lidar.

00:57:42.080 | You can augment, you can play around with it.

00:57:44.360 | In fact, one of the things we do when we train this model

00:57:47.500 | is that we copy and paste samples from,

00:57:51.820 | or like objects from different samples.

00:57:53.840 | So you can take a car that I saw yesterday,

00:57:56.240 | take the point returns on that car,

00:57:59.420 | you can just paste it into your current lidar sweep.

00:58:02.280 | You have to be a little bit careful, right?

00:58:03.860 | And this was actually proposed by another,

00:58:07.100 | by a previous paper.

00:58:08.500 | And we found that that was a really useful data.

00:58:11.680 | It sounds absurd, but it actually works.

00:58:14.420 | And it speaks to the ability to do that with lidar point cloud.

00:58:18.280 | - Okay, great.

00:58:19.120 | Please give Carl and Oscar a big hand.

00:58:20.920 | Thank you so much.

00:58:21.760 | (audience applauding)

00:58:25.540 | - Excellent.

00:58:26.540 | (upbeat music)

00:58:29.120 | (upbeat music)

00:58:31.700 | (upbeat music)

00:58:34.280 | (upbeat music)

00:58:36.860 | (upbeat music)

00:58:39.440 | (upbeat music)

00:58:42.020 | [BLANK_AUDIO]

Karl Iagnemma & Oscar Beijbom (Aptiv Autonomous Mobility) - MIT Self-Driving Cars

Chapters