
Karl Iagnemma & Oscar Beijbom (Aptiv Autonomous Mobility) - MIT Self-Driving Cars


Chapters

0:00 Introduction to Karl Iagnemma and Oscar Beijbom
1:00 Karl - Aptiv Background
10:18 Dimensions of Safety for AVs
12:47 Trusting neural networks behind the wheel
15:07 Validation of black-box systems
17:50 Trusting the data
19:27 Trusting the algorithms
22:27 Safety architecture for neural networks
25:20 Engineering is inching closer to the natural sciences
25:57 Oscar - DL for 3D Detection
30:06 PointPillars
39:51 nuScenes - a dataset for multimodal 3D object detection
43:17 Q&A

Whisper Transcript

00:00:00.000 | All right, welcome back to 6S094,
00:00:03.440 | Deep Learning for Self-Driving Cars.
00:00:05.440 | Today we have Karl Iagnemma
00:00:07.860 | and Oscar Beijbom from Aptiv.
00:00:12.160 | Karl is the president of Aptiv Autonomous Mobility,
00:00:15.240 | where Oscar is the machine learning lead.
00:00:18.080 | Karl founded nuTonomy, as many of you know, in 2013.
00:00:22.280 | It's a Boston-based autonomous vehicle company,
00:00:25.960 | and nuTonomy was acquired by Aptiv in 2017,
00:00:29.160 | and now is part of Aptiv.
00:00:31.040 | Karl and team are one of the leaders
00:00:33.000 | in autonomous vehicle development and deployment,
00:00:36.080 | with cars on roads all over the United States,
00:00:38.120 | several sites.
00:00:39.440 | But most importantly, Karl is MIT through and through,
00:00:43.180 | as also some of you may know, getting his PhD here.
00:00:46.480 | He led a robotics group here
00:00:48.640 | as a research scientist for many years.
00:00:50.720 | So it's really a pleasure to have both Karl
00:00:54.280 | and Oscar with us today.
00:00:55.400 | Please give them a warm welcome.
00:00:57.200 | (audience applauding)
00:01:00.360 | - All right, thanks, Lex.
00:01:02.760 | Yeah, very glad to be back at MIT.
00:01:05.160 | Very impressed that you guys are here during IAP.
00:01:08.040 | My course load during IAP was usually ice skating,
00:01:13.080 | and sometimes there was a wine tasting course.
00:01:15.880 | This was now almost 20 years ago,
00:01:17.520 | and that was pretty much it.
00:01:18.800 | That's where the academic work stopped.
00:01:20.720 | So you guys are here to learn something,
00:01:22.660 | so I'm gonna do my best and try something radical, actually.
00:01:25.520 | Since I'm president now of Aptiv Autonomous Driving,
00:01:28.160 | I'm not allowed to talk about
00:01:29.040 | anything technical or interesting.
00:01:30.840 | I'm gonna flout that a little bit
00:01:32.240 | and raise some topics that we think about
00:01:35.440 | that I think are interesting questions
00:01:38.640 | to keep in the back of your mind
00:01:40.920 | as you're thinking about deep learning
00:01:42.080 | and autonomous driving.
00:01:42.940 | So I'll raise some of those questions.
00:01:44.960 | And then Oscar will actually present
00:01:46.960 | some real-life technology
00:01:49.340 | and some of the work that he has been doing.
00:01:51.580 | Oscar's our machine learning lead.
00:01:53.200 | Some of the work that he and his outstanding team
00:01:55.180 | have been doing around machine learning-based detectors
00:02:00.180 | for the perception problem.
00:02:03.560 | So let me first introduce Aptiv a little bit,
00:02:05.360 | 'cause people usually ask me, like,
00:02:08.240 | what's an Aptiv when I say I work for Aptiv?
00:02:11.000 | Aptiv's actually been around for a long time,
00:02:13.200 | but in a different form.
00:02:14.680 | Aptiv was previously Delphi Technologies,
00:02:17.200 | which was previously part of General Motors.
00:02:19.720 | So everybody's heard of General Motors.
00:02:21.120 | Some of you may have heard of Delphi,
00:02:23.480 | Aptiv spun from Delphi about 14 months ago.
00:02:28.480 | And so Aptiv's a tier one supplier.
00:02:30.800 | They're an automotive company
00:02:32.160 | that industrializes technology.
00:02:34.160 | Essentially, they take software and hardware,
00:02:37.960 | they industrialize it and put it on cars
00:02:39.880 | so it can run for many, many hundreds of thousands of miles
00:02:42.200 | without failing, which is a useful thing
00:02:44.960 | when we think about autonomous driving.
00:02:46.220 | So the themes for Aptiv,
00:02:48.080 | they develop what they say is safer, greener,
00:02:50.720 | and more connected solutions.
00:02:52.960 | Safer means safety systems, active safety,
00:02:55.520 | autonomous driving systems of the type that we're building.
00:02:58.320 | Greener, systems to enable electrification
00:03:02.440 | and kind of green vehicles.
00:03:03.720 | And then more connected connectivity solutions,
00:03:06.160 | both within the vehicle,
00:03:07.420 | transmitting data around the vehicle,
00:03:08.800 | and then externally, wireless communication.
00:03:11.680 | All of these things, as you can imagine,
00:03:14.200 | feed very, very nicely into the future
00:03:17.720 | transportation systems that the software
00:03:19.960 | will actually only be a part of.
00:03:21.600 | So Aptiv is in a really interesting spot
00:03:23.580 | when you think about the future of autonomous driving.
00:03:27.020 | And to give you a sense of scale,
00:03:29.600 | still kind of amazes me.
00:03:32.780 | The biggest my research group ever was at MIT
00:03:34.520 | was like 18, 18 people.
00:03:36.620 | Aptiv is 156,000 employees,
00:03:40.580 | so significant sized organization,
00:03:42.960 | about a $13 billion company by revenue
00:03:45.160 | in about 50 countries around the world.
00:03:47.740 | My group's about 700 people,
00:03:51.140 | so of which Oscar is one very important person.
00:03:53.640 | We're about 700 working on autonomous driving.
00:03:55.840 | We've got about 120 cars on the road
00:03:58.280 | in different countries,
00:03:59.680 | and I'll show you some examples of that.
00:04:01.480 | But first, let me take a trip down memory lane
00:04:04.880 | and show you a couple of snapshots
00:04:07.060 | about where we were not too long ago
00:04:10.400 | kind of as a community, but also me personally.
00:04:13.960 | And this will either inspire or horrify you,
00:04:16.120 | I'm not sure which.
00:04:17.880 | The fact is 2007, there were groups driving around
00:04:22.140 | with cars like running blade servers in the trunk
00:04:24.940 | that were generating so much heat,
00:04:26.580 | you had to install another air conditioner,
00:04:28.920 | which then was drawing so much power,
00:04:30.220 | you had to add another alternator,
00:04:32.060 | and then kind of rinse and repeat.
00:04:33.700 | So it wasn't a great situation.
00:04:35.740 | But people did enough algorithmically, computationally,
00:04:40.740 | to enable these cars,
00:04:43.560 | and this is the DARPA Urban Challenge
00:04:45.220 | for those of you that may be familiar,
00:04:46.740 | to enable these cars to do something useful
00:04:48.680 | and interesting on a closed course.
00:04:50.440 | And it kind of convinced enough people
00:04:53.920 | that given enough devotion of thought and resources
00:04:57.960 | that this might actually become a real thing someday.
00:05:00.860 | So I was one of those people that got convinced.
00:05:03.940 | 2010, this is now, I'm gonna crib from my co-founder Emilio
00:05:09.560 | who was a former MIT faculty member in AeroAstro.
00:05:12.320 | Emilio started up an operation in Singapore through SMART,
00:05:15.140 | who some of you have probably worked with.
00:05:16.640 | So this is some folks from SMART.
00:05:19.000 | That's James, who looks really young in that picture.
00:05:21.440 | He was one of Emilio's students
00:05:23.200 | who was basically taking a golf cart
00:05:26.200 | and turning it into an autonomous shuttle.
00:05:29.520 | It turned out to work pretty well,
00:05:31.000 | and it got people in Singapore excited,
00:05:33.300 | which in turn got us further excited.
00:05:35.300 | 2014, they did a demo where they let people of Singapore
00:05:39.020 | come and ride around in these carts in a garden,
00:05:42.240 | and that worked great over the course of a weekend.
00:05:45.800 | Around this time, we'd started nuTonomy.
00:05:48.080 | We'd actually started a commercial enterprise.
00:05:49.800 | It kind of stepped at least partly away
00:05:51.440 | from MIT at that point.
00:05:53.480 | 2015, we had cars on the road.
00:05:55.480 | This is a Mitsubishi i-MiEV electric vehicle.
00:05:58.300 | When we had all of our equipment in it,
00:06:00.400 | the front seat was pushed forward so far
00:06:02.080 | that me, I'm about six foot three,
00:06:04.000 | actually couldn't sit in the front seat,
00:06:05.920 | so I couldn't actually accompany people on rides.
00:06:07.880 | It wasn't very practical.
00:06:09.820 | We ended up switching cars to a Renault Zoe platform,
00:06:13.760 | which is the one you see here,
00:06:14.780 | which had a little more leg room.
00:06:16.440 | We were giving, at that point,
00:06:17.520 | open to the public rides in our cars in Singapore
00:06:21.040 | in the part of the city that we were allowed to operate in.
00:06:24.040 | It was a quick transition.
00:06:26.260 | As you can see, just even visually,
00:06:28.840 | the evolution of these systems has come a long way
00:06:31.480 | in a short time, and we're just a point example
00:06:34.340 | of this phenomenon, which is kind of, broadly speaking,
00:06:38.160 | similar across the industry.
00:06:39.720 | But 2017, we joined Aptiv, and we were excited by that
00:06:43.840 | because we, as primarily scientists and technologists,
00:06:47.520 | didn't have a great idea
00:06:48.560 | how we were gonna industrialize this technology
00:06:50.240 | and actually bring it to market
00:06:51.760 | and make it reliable and robust and make it safe,
00:06:55.120 | which is what I'm gonna talk about a little bit here today.
00:06:57.600 | So we joined Aptiv with its global footprint.
00:07:00.040 | Today, we're primarily in Pittsburgh,
00:07:02.200 | Boston, Singapore, and Vegas,
00:07:04.960 | and we've got connectivity to Aptiv's other sites
00:07:08.160 | in Shanghai and Wolfsburg.
00:07:10.160 | Let me tell you a little bit
00:07:11.000 | about what's happening in Vegas.
00:07:11.840 | I think people were here,
00:07:13.080 | when was Luc talking?
00:07:14.920 | Couple days ago, yesterday.
00:07:16.100 | So Luc from Lyft, Luc Vincent,
00:07:17.760 | probably talked a little bit about Vegas.
00:07:20.360 | Vegas is really an interesting place for us.
00:07:22.960 | We've got a big operation there,
00:07:24.080 | 130,000 square foot garage.
00:07:26.080 | We've got about 75 cars.
00:07:28.340 | We've got 30 of those cars on the Lyft network.
00:07:30.980 | So Aptiv technology,
00:07:32.200 | but connecting to the customer through Lyft.
00:07:33.720 | So if you go to Vegas and you open your Lyft app,
00:07:36.320 | it'll ask you, do you wanna take a ride in an autonomous car?
00:07:39.640 | You can opt in, you can opt out, it's up to you.
00:07:41.800 | If you opt in, there's a reasonable chance
00:07:43.520 | one of our cars will pick you up if you call for a ride.
00:07:46.520 | So anybody can do this,
00:07:48.120 | competitors, innocent bystanders,
00:07:50.600 | totally up to you, we have nothing to hide.
00:07:52.360 | Our cars are on the road 20 hours a day,
00:07:54.280 | seven days a week.
00:07:55.600 | If you take a ride, when you get out of the car,
00:07:57.600 | just like any Lyft ride,
00:07:58.800 | you gotta give us a star rating, one through five.
00:08:00.920 | And that, to us, is actually really interesting
00:08:02.720 | because it's a scalar, it's not too rich,
00:08:06.560 | but that star rating, to me,
00:08:08.520 | says something about the ride quality,
00:08:11.000 | meaning the comfort of the trip,
00:08:12.600 | the safety that you felt,
00:08:13.800 | and the efficiency of getting to where you wanted to go.
00:08:16.720 | Our star rating today is 4.95, which is pretty good.
00:08:21.120 | Key numbers, we've given, at this point,
00:08:24.280 | over 30,000 rides to more than 50,000 passengers.
00:08:27.680 | We've driven over a million miles in Vegas
00:08:30.160 | and a little bit additional, but primarily there.
00:08:34.320 | And as I mentioned, the 4.95.
00:08:37.000 | So what does it look like on the road?
00:08:38.280 | I'll show just one video today.
00:08:40.120 | I think Oscar has a few more.
00:08:41.680 | This one's actually in Singapore,
00:08:43.920 | but it's all kind of morally equivalent.
00:08:46.240 | You'll see a sped up, slightly sped up view of a run from,
00:08:51.240 | this is now probably six, seven months old,
00:08:54.200 | on the road in Singapore,
00:08:55.040 | but it's got some interesting stuff
00:08:56.800 | in a fairly typical run.
00:08:59.400 | Some of you may recognize these roads.
00:09:02.000 | We're on the wrong side of the road,
00:09:03.080 | remember, 'cause we're in Singapore.
00:09:04.600 | But to give you an example of some of the types of problems
00:09:07.460 | we have to solve on a daily basis.
00:09:10.380 | So let me run this thing.
00:09:11.660 | And you'll see as this car is cruising down the road,
00:09:16.660 | you have obstacles that we have to avoid,
00:09:20.460 | sometimes in the face of oncoming traffic.
00:09:23.340 | We've got to deal with sometimes situations
00:09:26.340 | where other road users are maybe not perfectly behaving
00:09:29.460 | by the rules.
00:09:30.300 | We've got to manage that in a natural way.
00:09:33.100 | Construction in Singapore, like everywhere else,
00:09:35.780 | is pretty ubiquitous.
00:09:37.220 | And so you have to navigate
00:09:38.500 | through these less structured environments.
00:09:40.680 | People who are sometimes doing things
00:09:44.660 | or indicating some future action,
00:09:46.660 | which you have to make inferences about,
00:09:49.220 | that can be tricky to navigate.
00:09:51.140 | So typical day, a route that any one of us as humans
00:09:55.060 | would drive through without batting an eye, no problem,
00:09:58.340 | actually presents some really, really complex problems
00:10:02.740 | for autonomous vehicles.
00:10:04.260 | But it's the table stakes these days.
00:10:05.780 | These are the things you have to do if you want to be
00:10:07.300 | on the road, and certainly if you want to drive
00:10:09.740 | millions of miles with very few accidents,
00:10:12.340 | which is what we're doing.
00:10:13.660 | So that's an introduction to Aptiv
00:10:15.220 | and a little bit of background.
00:10:17.500 | So let me talk about, we're gonna talk about learning
00:10:20.700 | and how we think about learning
00:10:22.460 | in the context of autonomous driving.
00:10:25.260 | So there was a period a few years ago
00:10:27.060 | where I think as a community,
00:10:29.100 | people thought that we would be able to go
00:10:30.980 | from pixels to actuator commands
00:10:33.180 | with a single learned architecture,
00:10:35.540 | a single black box.
00:10:36.980 | I'll say, generally speaking,
00:10:39.260 | we no longer believe that's true.
00:10:40.660 | And I shouldn't include we in that.
00:10:43.060 | I didn't believe that was ever true.
00:10:44.460 | But some of us maybe thought that was true.
00:10:46.340 | And I'll tell you part of the reason why,
00:10:48.580 | in part of this talk,
00:10:50.540 | a big part of it comes down to safety.
00:10:53.060 | A big part of it comes down to safety.
00:10:54.700 | And the question of safety, convincing ourselves
00:10:57.940 | that that system, that black box,
00:10:59.660 | even if we could train it to accurately approximate
00:11:03.340 | this massively complex underlying function
00:11:05.900 | that we're trying to approximate,
00:11:07.900 | can we convince ourselves that it's safe?
00:11:10.340 | And it's very, very hard to answer
00:11:11.940 | that question affirmatively.
00:11:13.140 | And I'll raise some of the issues around why that is.
00:11:17.940 | This is not to say that learning methods
00:11:19.860 | are not incredibly useful for autonomous driving
00:11:21.820 | because they absolutely are.
00:11:23.580 | And Oscar will show you examples of why that is
00:11:25.900 | and how Aptiv is using some learning methods today.
00:11:28.900 | But this safety dimension is tricky
00:11:30.700 | because there's actually two axes here.
00:11:34.620 | One is the actual technical safety of the system,
00:11:36.860 | which is to say, can we build a system that's safe,
00:11:39.660 | that's provably in some sense safe,
00:11:41.860 | that we can validate, which we can convince ourselves,
00:11:45.300 | achieves the intended functionality
00:11:47.500 | in our operational design domain,
00:11:49.580 | that adheres to whatever regulatory requirements
00:11:52.980 | might be imposed in the jurisdictions where we're operating.
00:11:56.220 | And there's a whole longer list related to technical safety.
00:11:59.340 | But these are technical problems primarily.
00:12:02.220 | But there's another dimension,
00:12:03.300 | which up here is called perceived safety,
00:12:05.980 | which is to say, when you ride in a car,
00:12:08.300 | even if it's safe, do you believe that it's safe?
00:12:11.900 | And therefore, will you wanna take another trip?
00:12:14.420 | Which sounds kinda squishy,
00:12:16.300 | and as engineers, we're typically uncomfortable
00:12:18.980 | with that kind of stuff,
00:12:19.820 | but it turns out to be really important
00:12:21.260 | and probably harder to solve
00:12:22.900 | because it's a little bit squishy.
00:12:24.860 | And quite obviously, we gotta sit up here, right?
00:12:26.940 | We gotta be in this upper right-hand corner
00:12:28.540 | where we have not only a very safe car
00:12:30.980 | from a technical perspective,
00:12:32.060 | but one that feels safe, that inspires confidence
00:12:34.940 | in riders, in regulators, and in everybody else.
00:12:38.020 | So how do we get there in the context
00:12:40.940 | of elements of this system that may be black boxes,
00:12:44.340 | for lack of a better word?
00:12:46.340 | What's required is trust.
00:12:48.180 | You know, how do we get to this point
00:12:49.420 | where we can trust neural networks
00:12:51.060 | in the context of safety-critical systems,
00:12:53.180 | which is what an autonomous vehicle is?
00:12:55.180 | It really comes down to this question of,
00:12:58.300 | how do we convince ourselves
00:12:59.740 | that we can validate these systems?
00:13:01.220 | Again, validating the system,
00:13:03.100 | ensuring that it can meet the requirements,
00:13:07.140 | the operational requirements in the domain of interest
00:13:09.700 | that are imposed by the user, all right?
00:13:12.660 | There's three dimensions to this key question
00:13:17.100 | of understanding how to validate,
00:13:18.220 | and I'm gonna just briefly introduce
00:13:19.820 | some topics of interest around each of these.
00:13:22.500 | But the first one, trusting the data.
00:13:25.900 | Trusting the data.
00:13:27.660 | Do we actually have confidence
00:13:29.700 | about what goes into this algorithm?
00:13:32.260 | I mean, everybody knows garbage in, garbage out.
00:13:35.020 | There's various ways that we can make this garbage.
00:13:38.340 | We can have data which is insufficiently covering our domain,
00:13:42.180 | not representative of the domain.
00:13:43.540 | We can have data that's poorly annotated
00:13:45.620 | by our third-party trusted partners
00:13:47.300 | who we've trusted to label certain things of interest.
00:13:50.420 | So do we trust the data that's going in
00:13:52.620 | to the algorithm itself?
00:13:53.900 | Do we trust the implementation?
00:13:56.460 | We've got a beautiful algorithm,
00:13:58.060 | super descriptive, super robust,
00:14:00.100 | not brittle at all, well-trained,
00:14:02.300 | and we're running it on poor hardware.
00:14:03.940 | We've coded it poorly.
00:14:05.500 | We've got buffer overruns right and left.
00:14:07.540 | Do we trust the implementation
00:14:09.260 | to actually execute in a safe manner?
00:14:11.340 | And do we trust the algorithm?
00:14:14.620 | Again, generally speaking,
00:14:15.740 | we're trying to approximate really complicated functions.
00:14:19.020 | I don't think we typically use neural networks
00:14:21.300 | to approximate linear systems.
00:14:23.340 | So this is a gnarly, nasty function
00:14:25.340 | which has problems of critical interest
00:14:30.340 | which are really rare.
00:14:33.100 | In fact, they're the only ones of interest.
00:14:35.180 | So there's these events that happen very, very infrequently
00:14:38.380 | that we absolutely have to get right.
00:14:40.300 | It's a hard problem to convince ourselves
00:14:43.020 | that the algorithm is gonna perform properly
00:14:45.660 | in these unexpected and rare situations.
00:14:48.660 | So these are the sorts of things that we think about
00:14:51.740 | and that we have to answer in an intelligent way
00:14:55.220 | to convince ourselves that we have
00:14:56.780 | a validated neural network-based system.
00:15:00.060 | Okay, let me just step through each of these topics
00:15:05.020 | really quickly.
00:15:06.100 | So the topic of validation,
00:15:09.020 | what do we mean by that and why it is hard?
00:15:11.300 | There's a number of different dimensions here.
00:15:13.900 | The first is that we don't have insight
00:15:15.380 | into the nature of the function
00:15:16.500 | that we're trying to approximate.
00:15:18.860 | The underlying phenomenon is really complicated.
00:15:21.580 | Again, if it weren't, we'd probably possibly be modeling it
00:15:24.940 | using different techniques.
00:15:25.940 | We'd write a closed-form equation to describe it.
00:15:28.400 | So that's a problem.
00:15:30.180 | Second, again, the accidents,
00:15:33.620 | the actual crashes on the road,
00:15:35.420 | which we're calling crashes and not accidents,
00:15:37.340 | these are rare.
00:15:38.600 | Luckily, they're very rare.
00:15:40.380 | But it makes the statistical argument
00:15:42.580 | around these accidents
00:15:44.540 | and being able to avoid these accidents
00:15:45.820 | really, really difficult.
00:15:47.060 | If you believe Rand, and they're pretty smart folks,
00:15:51.460 | they say you gotta drive 275 million miles
00:15:54.260 | without accident, without a crash,
00:15:55.980 | to claim a lower fatality rate
00:15:57.660 | than a human with 95% confidence.
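To make that statistic concrete, here is a rough back-of-the-envelope version of the calculation (my own sketch; it assumes a human fatality rate of roughly 1.09 per 100 million miles and a zero-fatality demonstration, figures not stated in the talk):

```python
import math

# Assumed human fatality rate (hypothetical input): ~1.09 fatalities per 100 million miles.
human_rate = 1.09e-8   # fatalities per mile
confidence = 0.95

# Under a Poisson model, P(zero fatalities in n miles) = exp(-rate * n).
# To claim a lower fatality rate than a human at 95% confidence after a
# fatality-free demonstration, we need exp(-human_rate * n) <= 1 - confidence.
n_miles = math.log(1.0 / (1.0 - confidence)) / human_rate
print(f"Fatality-free miles needed: {n_miles / 1e6:.0f} million")   # roughly 275 million
```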
00:16:00.260 | Well, how are we gonna do that?
00:16:01.820 | Can we think about using some correlated incident,
00:16:06.080 | maybe some kind of close call,
00:16:08.180 | as a proxy for accidents, which may be more frequent,
00:16:10.620 | and maybe back in that way?
00:16:12.500 | There's a lot of questions here,
00:16:14.020 | which I won't say we don't have any answers to,
00:16:16.180 | 'cause I wouldn't go that far,
00:16:17.180 | but they're hard questions.
00:16:19.060 | They're not questions with obvious answers.
00:16:21.860 | So this is one of them,
00:16:23.140 | this issue of rare events.
00:16:24.580 | The regulatory dimension is one of these known unknowns.
00:16:29.940 | How do we validate a system
00:16:31.700 | if the requirements that may be imposed upon us
00:16:34.100 | from outside regulatory bodies are still to be written?
00:16:38.620 | That's difficult.
00:16:40.140 | So there's a lack of consensus
00:16:42.680 | on what the safety target should be for these systems.
00:16:46.180 | This is obviously evolving.
00:16:47.540 | Smart people are thinking about this.
00:16:49.420 | But today, it's not at all clear.
00:16:51.860 | If you're driving in Las Vegas,
00:16:53.220 | if you're driving in Singapore,
00:16:54.620 | if you're driving in San Francisco,
00:16:56.020 | or anywhere in between, what this target needs to be.
00:16:58.820 | And then lastly, and this is a really interesting one,
00:17:03.880 | we can get through a validation process for a build of code.
00:17:06.940 | Let's assume we can do that.
00:17:08.360 | Well, what happens when we wanna update the code?
00:17:10.480 | 'Cause obviously we will.
00:17:11.860 | Does that mean we have to start that validation process
00:17:13.940 | again from scratch,
00:17:14.820 | which will unavoidably be expensive and lengthy?
00:17:18.520 | Well, what if we only change a little bit of the code?
00:17:20.260 | What if we only change one line?
00:17:22.020 | But what if that one line is the most important line of code
00:17:24.740 | in the whole code base?
00:17:25.940 | This is one that I can tell you
00:17:29.700 | keeps a lot of people up at night,
00:17:30.980 | this question of revalidation.
00:17:33.260 | And then not even, again, keep that code base fixed.
00:17:35.940 | What if we move from one city to the next?
00:17:38.200 | And let's say that city is quite similar
00:17:39.620 | to your previous city, but not exactly the same.
00:17:42.380 | How do we think about validation
00:17:44.020 | in the context of new environments?
00:17:46.280 | So this continuous development issue is a challenge.
00:17:50.780 | All right, let me move on to talking about the data.
00:17:53.540 | There's probably people in this room
00:17:54.620 | who are doing active research in this area
00:17:57.020 | 'cause it's a really interesting one.
00:17:59.360 | But there's a couple of obvious questions, I would say,
00:18:02.780 | that we think about when we think about data.
00:18:06.340 | We can have a great algorithm,
00:18:07.740 | and if we're training it on poor data
00:18:09.940 | for one reason or another, we won't have a great output.
00:18:12.800 | So one thing we think about is the sufficiency,
00:18:16.660 | the completeness of the data,
00:18:19.100 | and the bias that may be inherent in the data
00:18:21.420 | for our operational domain.
00:18:23.820 | If we wanna operate 24 hours a day,
00:18:26.900 | and we only train on data collected during daytime,
00:18:29.840 | we're probably gonna have an issue.
00:18:31.580 | Annotating the data is another dimension of the problem.
00:18:36.420 | We can collect raw data that's sufficient,
00:18:38.420 | that covers our space, but when we annotate it,
00:18:41.100 | when we hand it off to a third party,
00:18:42.500 | 'cause it's typically a third party,
00:18:44.540 | to mark up the interesting aspects of it,
00:18:47.920 | we provide them some specifications,
00:18:49.340 | but we put a lot of trust in that third party,
00:18:51.780 | and trust that they're gonna do a good job
00:18:55.900 | annotating the interesting parts,
00:18:57.340 | and not the uninteresting parts,
00:18:58.880 | that they're gonna catch all the interesting parts
00:19:00.540 | that we've asked them to catch, et cetera.
00:19:02.640 | So this annotation part, which seems very mundane,
00:19:06.500 | very easy to manage, and kind of like low-hanging fruit,
00:19:10.420 | is in fact another key aspect
00:19:13.120 | of ensuring that we can trust the data.
00:19:16.020 | Okay, and this reference just kind of points to the fact
00:19:19.020 | that there are, again, smart people
00:19:21.300 | thinking about this problem,
00:19:22.540 | which rears its head in many domains
00:19:24.620 | beyond autonomous driving.
00:19:26.540 | Now what about the algorithms themselves?
00:19:31.060 | So moving on from the data to the actual algorithm,
00:19:34.800 | how do we convince ourselves that that algorithm,
00:19:38.420 | that like any kind of learning-based algorithm,
00:19:41.980 | we've trained on a training set,
00:19:44.420 | is gonna do well on some unknown test set?
00:19:48.120 | Well, there's a couple kind of properties
00:19:52.220 | of the algorithm that we can look at,
00:19:53.780 | that we can kind of interrogate,
00:19:55.860 | and kind of poke at to convince ourselves
00:19:58.740 | that that algorithm will perform well.
00:20:00.880 | You know, one is invariance,
00:20:03.580 | and the other one, we can say, is stability.
00:20:06.260 | If we make small perturbations to this function,
00:20:10.000 | does it behave well?
00:20:11.180 | Given kind of, let's say, a bounded input,
00:20:13.660 | do we see a bounded output?
00:20:15.460 | Or do we see some wild response?
00:20:17.640 | You know, I'm sure you've all heard of examples
00:20:22.060 | of adversarial images that can confuse
00:20:26.700 | learning-based classifiers.
00:20:28.460 | So it's a turtle.
00:20:31.060 | You show it a turtle, it says, "Well, that's a turtle."
00:20:33.460 | And then you show it a turtle that's maybe fuzzed
00:20:35.180 | with a little bit of noise that the human eye can't perceive.
00:20:38.320 | So it still looks like a turtle,
00:20:39.940 | and it tells you it's a machine gun.
00:20:41.740 | Obviously, for us in the driving domain,
00:20:44.860 | we want a stop sign to be correctly identified
00:20:46.940 | as a stop sign 100 times of 100.
00:20:49.380 | We don't want that stop sign,
00:20:51.380 | if somebody goes up and puts a piece of duct tape
00:20:53.180 | in the lower right-hand corner,
00:20:54.220 | to be interpreted as a yield sign, for example.
00:20:58.000 | So this question of the properties of the algorithm,
00:21:02.300 | its invariance, its stability,
00:21:04.260 | is something of high interest.
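As a rough illustration of the kind of stability probing described here, the sketch below perturbs an input with small bounded noise and checks whether a generic PyTorch classifier's decision flips (the model, noise budget, and threshold are placeholders, not anything Aptiv uses):

```python
import torch

def stability_check(model, image, epsilon=0.01, n_trials=20):
    """Fraction of small, bounded perturbations that flip the model's decision.

    A bounded input change (L-infinity noise <= epsilon) should give a bounded,
    consistent output; frequent label flips suggest the adversarial brittleness
    discussed above.
    """
    model.eval()
    with torch.no_grad():
        base_label = model(image.unsqueeze(0)).argmax(dim=1).item()
        flips = 0
        for _ in range(n_trials):
            noise = torch.empty_like(image).uniform_(-epsilon, epsilon)
            perturbed = (image + noise).clamp(0.0, 1.0)
            if model(perturbed.unsqueeze(0)).argmax(dim=1).item() != base_label:
                flips += 1
    return flips / n_trials
```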
00:21:08.460 | And then lastly, to add one more point to this,
00:21:12.540 | this notion of interpretability.
00:21:14.620 | So interpretability, understanding why an algorithm
00:21:17.880 | made a decision that it made.
00:21:19.500 | This is the sort of thing that may not be a nice-to-have,
00:21:23.220 | may actually be a requirement,
00:21:25.100 | and would likely to be a requirement
00:21:26.700 | from the regulatory groups
00:21:27.900 | that I was referring to a minute ago.
00:21:29.740 | So let's say, imagine the case of a crash,
00:21:32.540 | where the system that was governing
00:21:34.580 | your trajectory generator was a data-driven system,
00:21:38.620 | was a deep-learning-based trajectory generator.
00:21:42.460 | Well, you may need to explain to someone
00:21:44.900 | exactly why that particular trajectory
00:21:47.740 | was generated at that particular moment.
00:21:50.020 | And this may be a hard thing to do,
00:21:52.340 | if the generator was a data-driven model.
00:21:55.180 | Now, obviously, there are people working
00:21:56.580 | and doing active research into this specific question
00:21:59.680 | of interpretable learning methods,
00:22:02.940 | but it's a thorny one.
00:22:05.420 | It's a very, very difficult topic,
00:22:07.420 | and it's not at all clear to me when and if
00:22:11.180 | we'll get to the stage where we can,
00:22:13.140 | to even a technical audience,
00:22:16.260 | but beyond that, to a lay jury,
00:22:18.580 | be able to explain why algorithm X made decision Y.
00:22:22.340 | Okay, so with all that in mind,
00:22:25.820 | let me talk a little bit about safety.
00:22:32.580 | That all maybe sounds pretty bleak.
00:22:34.180 | You think, well, man, why are we taking this course
00:22:35.820 | with Lex, 'cause we're never gonna really use this stuff.
00:22:37.700 | But in fact, we can.
00:22:39.980 | We can and will, as a community.
00:22:42.780 | There's a lot of tools we can bring to bear
00:22:45.720 | to think about neural networks,
00:22:48.780 | and they're, generally speaking,
00:22:49.820 | within the context of a broader safety argument.
00:22:52.860 | I think that's the key.
00:22:53.940 | We tend not to think about using a neural network
00:22:57.000 | as a holistic system to drive a car,
00:23:00.500 | but we'll think about it as a submodule
00:23:02.780 | that we can build other systems around,
00:23:05.600 | generally speaking, that which we can say,
00:23:07.980 | maybe make more rigorous claims about their performance,
00:23:10.700 | their underlying properties,
00:23:13.000 | and then therefore make a convincing,
00:23:14.900 | holistic safety argument that this end-to-end system is safe.
00:23:19.180 | We have tools, functional safety is,
00:23:22.860 | maybe familiar to some of you.
00:23:24.140 | It's something we think about a lot
00:23:25.060 | in the automotive domain.
00:23:26.360 | And SOTIF, which stands for
00:23:29.740 | Safety of the Intended Functionality,
00:23:31.460 | we're basically asking ourselves the question,
00:23:34.100 | is this overall function doing what it's intended to do?
00:23:38.580 | Is it operating safely?
00:23:39.740 | And is it meeting its specifications?
00:23:41.520 | There's kind of an analogy here
00:23:43.340 | to validation and verification, if you will.
00:23:47.180 | And we have to answer these questions
00:23:48.860 | around functional safety and SOTIF affirmatively,
00:23:52.720 | even when we have neural network-based elements
00:23:57.020 | in order to eventually put this car on the road.
00:24:00.260 | All right, so I mentioned that we need to do some embedding.
00:24:02.940 | This is an example of what it might look like.
00:24:05.240 | We refer to this as,
00:24:07.620 | sometimes we call this caging the learning.
00:24:10.060 | So we put the learning in a box.
00:24:11.580 | It's this powerful animal we wanna control.
00:24:14.360 | And in this case, it's up there at the top in red.
00:24:17.340 | That might be that trajectory proposer I was talking about.
00:24:21.360 | So let's say we've got a powerful trajectory proposer.
00:24:23.780 | We wanna use this thing.
00:24:24.740 | We've got it on what we call our performance compute,
00:24:26.980 | our high-powered compute.
00:24:28.380 | It's maybe not automotive grade.
00:24:29.820 | It's got some potential failure modes,
00:24:31.460 | but it's generally speaking, good performance.
00:24:34.060 | Let's go there.
00:24:35.300 | And we've got our neural network-based generator on it,
00:24:38.060 | which we can say some things about,
00:24:39.380 | but maybe not everything we'd like to.
00:24:41.280 | Well, we make the argument that if we can surround that,
00:24:44.780 | so if we can cage it, kind of underpin it
00:24:47.860 | with a safety system that we can say
00:24:50.180 | very rigorous things about its performance,
00:24:54.320 | then generally speaking, we may be okay.
00:24:56.060 | There may be a path to using neural networks
00:24:58.460 | on autonomous vehicles if we can wrap them
00:25:01.680 | in a safety architecture that we can say
00:25:03.900 | a lot of good things about.
00:25:05.700 | And this is exactly what this represents.
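As a toy sketch of this "caging" idea (my own illustration, not Aptiv's architecture): the learned proposer runs freely, but every proposal passes through a simple rule-based safety check, with a well-understood fallback when the check fails. All names and thresholds here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    points: list        # (x, y) waypoints in the vehicle frame
    max_accel: float    # peak acceleration along the trajectory

def clearance(point, obstacle):
    # Euclidean distance from a waypoint to an obstacle center (placeholder geometry).
    return ((point[0] - obstacle[0]) ** 2 + (point[1] - obstacle[1]) ** 2) ** 0.5

def rule_based_fallback(state):
    # Simple, fully analyzable behavior: slow down along the current lane center.
    return Trajectory(points=state["lane_center"], max_accel=-2.0)

def is_safe(traj, state, accel_limit=3.0, min_clearance=1.0):
    # Checks we can make rigorous claims about: kinematic limits and obstacle clearance.
    if abs(traj.max_accel) > accel_limit:
        return False
    return all(clearance(p, obs) > min_clearance
               for p in traj.points for obs in state["obstacles"])

def plan(neural_proposer, state):
    proposal = neural_proposer(state)     # powerful but hard to validate directly
    if is_safe(proposal, state):          # the "cage": a checkable safety layer
        return proposal
    return rule_based_fallback(state)     # provable fallback when the check fails
```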
00:25:08.140 | So I'm gonna conclude my part of the talk here,
00:25:10.420 | hand it over to Oscar, with kind of a quote, an assertion.
00:25:15.420 | One of my engineers insisted I show today.
00:25:18.480 | The argument is the following.
00:25:20.460 | Engineering is inching closer to the natural sciences.
00:25:22.980 | I won't say how much closer, but closer.
00:25:24.920 | We're creating things that we don't fully understand,
00:25:27.460 | and then we're investigating the properties of our creation.
00:25:30.320 | We're not writing down closed-form functions.
00:25:33.100 | That would be too easy.
00:25:35.440 | We're generating these immensely complex
00:25:38.100 | functional approximators, and then we're just poking at 'em
00:25:40.980 | in different ways and saying, boy, well,
00:25:42.020 | what does this thing do under these situations?
00:25:44.500 | And I'll leave you with one image,
00:25:46.580 | which I'll present without comment,
00:25:48.100 | and then hand it over to Oscar.
00:25:50.000 | All right, thank you.
00:25:51.980 | (audience applauding)
00:25:55.140 | - So thanks a lot, Karl.
00:25:58.700 | Thanks, Lex, for the invite.
00:26:00.220 | Yes, my name is Oscar.
00:26:02.120 | I run the machine learning team at Aptiv nuTonomy.
00:26:05.660 | So let me begin with this slide.
00:26:08.920 | You know, not long ago, image classification was,
00:26:13.140 | you know, quite literally a joke.
00:26:14.380 | So this is an actual comic.
00:26:17.420 | How many have seen this before?
00:26:20.060 | Okay, well, I was doing my PhD in this era
00:26:22.820 | where, you know, building a bird classifier
00:26:26.820 | was like a PhD project, right?
00:26:28.660 | And it was, you know, it's funny 'cause it's true.
00:26:32.500 | And then, of course, as you well know,
00:26:34.660 | the deep learning revolution happened,
00:26:36.340 | and Lex, you know, previous introductory slides
00:26:38.940 | gives a great overview.
00:26:40.780 | I don't wanna redo that.
00:26:42.140 | I just wanna say sort of a straight line
00:26:44.420 | from what I consider the breakthrough paper
00:26:46.740 | by Krizhevsky et al.
00:26:48.900 | To the work I'll be talking about today,
00:26:51.020 | I'll start with these three.
00:26:51.900 | So you had the, you know, deep learning,
00:26:54.580 | end-to-end learning for ImageNet classification
00:26:57.180 | by Krizhevsky et al.
00:26:58.300 | That paper's been cited 35,000 times.
00:27:00.940 | I checked yesterday.
00:27:01.980 | Then, 2014, Ross Girshick et al. at Berkeley
00:27:06.180 | basically showed how to, you know,
00:27:08.820 | repurpose the deep learning architecture
00:27:11.540 | to do detection in images.
00:27:13.860 | And that was the first time
00:27:14.700 | when the visual community really started seeing,
00:27:16.780 | okay, so classification is more general.
00:27:18.460 | You can classify anything,
00:27:19.460 | an image, an audio signal, whatever, right?
00:27:21.780 | But detection in images was very intimate
00:27:23.900 | to the computer vision community.
00:27:25.100 | We thought we were best in the world, right?
00:27:27.260 | So when this paper came out,
00:27:28.620 | that was sort of the final argument for like,
00:27:32.220 | okay, we all need to do deep learning now.
00:27:34.320 | Right, and then 2016, this paper came out,
00:27:37.860 | the single-shot multi-box detector,
00:27:40.020 | which I think is a great paper by Liu et al.
00:27:43.460 | So if you haven't looked at this paper,
00:27:46.220 | by all means, read them carefully.
00:27:48.020 | So as a result,
00:27:51.180 | you know, performance is no longer a joke, right?
00:27:54.980 | So this is a network that we developed in my group.
00:27:59.460 | So it's a joint image classification segmentation network.
00:28:04.460 | This thing, we can run this at 200 hertz on a single GPU.
00:28:07.780 | And in this video, in this rendering,
00:28:11.840 | there's no tracking applied.
00:28:13.940 | There's no temporal smoothing.
00:28:15.260 | Every single frame is analyzed
00:28:17.420 | independently from the other one.
00:28:20.300 | And you can see that we can model several different classes,
00:28:23.300 | you know, both boxes and the surfaces at the same time.
00:28:29.480 | Here's my cartoon drawing of a perception system
00:28:32.660 | on an autonomous vehicle.
00:28:33.660 | So you have the three different main sensibilities.
00:28:38.000 | Typically have some module that does detection and tracking.
00:28:41.300 | You know, there's tons of variations of this, of course,
00:28:44.200 | but you have some sort of sensor pipelines,
00:28:46.600 | and then in the end, you have a tracking and fusion step.
00:28:49.460 | So what I showed you in the previous video
00:28:52.380 | is basically this part.
00:28:53.260 | So like I said, there was no tracking,
00:28:55.140 | but it's like going from the camera to detections.
00:28:58.600 | And if you look, you know, when I started,
00:29:01.920 | so I come strict from the computer science
00:29:04.240 | learning community, so when I started looking
00:29:06.980 | at this pipeline, I'm like, why are there so many steps?
00:29:09.300 | Why aren't we optimizing things end to end?
00:29:11.740 | So obviously, there's a real temptation
00:29:14.160 | to just wrap everything in a kernel.
00:29:15.540 | It's a very well-defined input/output function.
00:29:18.620 | And like Karl alluded to, it's one that can be verified
00:29:22.460 | quite well, assuming you have the right data.
00:29:25.520 | I'm not gonna be talking about this.
00:29:28.260 | I am gonna talk about this,
00:29:30.540 | namely the building a deep learning kernel
00:29:33.860 | for the LiDAR pipeline.
00:29:35.120 | And LiDAR pipeline is arguably the backbone
00:29:37.980 | of the perception system
00:29:39.620 | for most autonomous driving systems.
00:29:43.860 | So what we're gonna do is,
00:29:44.980 | so this is basically gonna be the goal here.
00:29:47.580 | So we're gonna have a point cloud,
00:29:49.380 | it's input, and we're gonna have a neural network
00:29:52.940 | that takes that as input and then generates
00:29:54.980 | 3D bounding boxes that are in a well-coordinated system.
00:29:57.360 | So it's like 20 meters that way,
00:29:59.720 | it's two meters wide, so long,
00:30:01.780 | this rotation and this orientation and so on.
00:30:04.180 | So yeah, so that's what this talk is about.
00:30:09.420 | So I'm gonna talk about PointPillars,
00:30:10.860 | which is a new method we developed for this,
00:30:13.100 | and nuScenes, which is a benchmark dataset that we released.
00:30:16.460 | Okay, so what is PointPillars?
00:30:18.260 | Well, it's a novel point cloud encoder.
00:30:21.300 | So what we do is we learn a representation
00:30:23.100 | that is suitable for downstream detection.
00:30:25.180 | It's almost like a, the main innovation
00:30:27.100 | is the translation from the point cloud
00:30:29.260 | to a canvas that can then be processed
00:30:31.620 | by a similar architecture that you would use in an image.
00:30:35.780 | And I'll show you how it outperforms,
00:30:37.620 | you know, all published methods on KITTI
00:30:39.300 | by a large margin, especially with respect
00:30:42.980 | to inference speed.
00:30:45.860 | And there's a pre-printout and some code available
00:30:48.820 | if you guys wanna play around with it.
00:30:50.660 | So the architecture that we're gonna use
00:30:54.260 | looks something like this.
00:30:56.300 | And I should say, most papers in this space
00:31:00.900 | use this architecture.
00:31:02.780 | So it's kind of a natural design, right?
00:31:04.860 | So you have the point cloud at the top,
00:31:06.880 | you have this encoder, and that's where we introduce
00:31:09.380 | the point pillars, but you can have,
00:31:10.860 | I'll show you guys, you can have various types of encoders.
00:31:14.540 | And then after that, that feeds into a backbone,
00:31:16.600 | which is now a standard convolutional 2D backbone.
00:31:19.780 | You have a detection head, and you might have,
00:31:22.220 | you may or may not have a segmentation head on that.
00:31:25.100 | The point is that after the encoder,
00:31:26.620 | everything looks just like, the architecture's
00:31:28.900 | very, very similar to the SSD architecture
00:31:31.060 | or the RCNN architecture.
00:31:32.540 | So let's go into a little bit more detail, right?
00:31:38.020 | So the range, so what you're given here
00:31:40.420 | is a range of D meters, so you wanna model,
00:31:43.340 | you know, 40 meters, a 40 meter circle
00:31:45.740 | around the vehicle, for example.
00:31:47.740 | You have certain resolution of your bins,
00:31:51.500 | and then a number of output channels, right?
00:31:53.820 | So your input is a set of pillars,
00:31:55.900 | where a pillar here is a vertical column, right?
00:31:59.100 | So you have M of those that are non-empty in the space.
00:32:03.060 | And you say a pillar P contains all the points,
00:32:05.500 | which are lidar points x, y, z, and intensity.
00:32:09.100 | And there's N_m, indexed by m, points in each pillar,
00:32:13.700 | right, so just to say that it varies, right?
00:32:16.940 | So it could be one single point at a particular location,
00:32:19.580 | it could be 200 points.
00:32:20.980 | And then it's centered around this bin.
00:32:23.220 | And the goal here is to produce a tensor as a fixed size.
00:32:27.540 | So it's height, which is, you know,
00:32:29.780 | range of a resolution, width, range of a resolution,
00:32:33.100 | and then this parameter C.
00:32:35.340 | C is the number of channels, so in an image,
00:32:38.420 | C will be three.
00:32:39.820 | We don't necessarily care about that.
00:32:41.400 | We call it a pseudo-image, but it's the same thing.
00:32:43.740 | It's a fixed number of channels
00:32:45.220 | that the backbone can then operate on.
00:32:47.140 | Yeah, so here's the same thing without math, right?
00:32:52.860 | So you have a lot of points, and then you have this space
00:32:55.140 | where you just grid it up in these pillars, right?
00:32:58.700 | Some are empty, some are not empty.
00:33:00.900 | So in this sort of, with this notation,
00:33:02.900 | let me give a little bit of a literature review.
00:33:05.900 | What people tend to do is you take each pillar,
00:33:07.980 | and you divide it into voxels, right?
00:33:09.580 | So now you have a 3D voxel grid, right?
00:33:11.780 | And then you say, I'm gonna extract
00:33:13.140 | some sort of features for each voxel.
00:33:14.460 | For example, how many points are in this voxel?
00:33:16.620 | Or what is the maximum intensity
00:33:18.640 | of all the points in this voxel?
00:33:20.740 | Then you extract features for the whole pillar, right?
00:33:23.580 | What is the max intensity across all the points
00:33:26.660 | in the whole pillar, right?
00:33:28.420 | All of these are hand-engineered functions
00:33:31.500 | that generates the fixed length output.
00:33:33.820 | So what you can do is you can now concatenate them,
00:33:36.380 | and their output is this tensor x, y, z.
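For concreteness, a tiny sketch of what such hand-engineered pillar features might look like (the particular features are illustrative, not taken from any specific paper):

```python
import numpy as np

def hand_engineered_pillar_features(points):
    """points: (N, 4) array of x, y, z, intensity for one pillar."""
    if len(points) == 0:
        return np.zeros(4, dtype=np.float32)
    return np.array([
        len(points),          # number of points in the pillar
        points[:, 3].max(),   # max intensity over the pillar
        points[:, 2].max(),   # highest point (z)
        points[:, 2].ptp(),   # vertical extent of the pillar
    ], dtype=np.float32)
```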
00:33:40.940 | So then, VoxelNet came around, I'd say, a year or so ago.
00:33:49.420 | Maybe a little bit more by now.
00:33:51.740 | So they do the first, the first step is similar, right?
00:33:54.620 | So you divide each pillar into voxels,
00:33:56.520 | and then you take, you map the points in each voxels.
00:34:00.700 | And the novel thing here is that
00:34:02.380 | they got rid of the feature engineering.
00:34:03.980 | So they said, we'll map it from a voxel
00:34:06.940 | to features using a PointNet.
00:34:10.300 | And I'm not gonna get into the details of a PointNet,
00:34:12.220 | but it's basically a network architecture
00:34:15.780 | that allows you to take a point cloud
00:34:18.820 | and map it to, again, a fixed length representation.
00:34:21.920 | So it's a series of 1D convolutions and max pooling layers.
00:34:27.220 | It's a very neat paper, right?
00:34:29.240 | So what they did is they, okay,
00:34:30.420 | we say we apply that to each voxel,
00:34:32.340 | but now I end up with this awkward four-dimensional tensor
00:34:34.620 | 'cause I still have XYZ from the voxels,
00:34:37.660 | and then I have this C-dimensional output
00:34:41.860 | from the PointNet.
00:34:42.900 | So then they have to consolidate the Z dimension
00:34:45.540 | through a 3D convolution, right?
00:34:47.740 | And now you achieve your XYZ tensor.
00:34:50.980 | So now you're ready to go.
00:34:52.060 | So it's very nice in the sense that it's end-to-end method.
00:34:54.800 | They showed good performance,
00:34:57.020 | but at the end of the day, it was very slow.
00:34:58.260 | They got like five hertz runtime.
00:35:00.620 | And the culprit here is this last step,
00:35:03.700 | so the 3D convolution.
00:35:05.780 | It's much, much slower than a standard 2D convolution.
00:35:09.040 | All right, so here's what we did.
00:35:12.800 | We basically said, let's just forget about voxels.
00:35:15.940 | We'll take all the points in the pillar
00:35:17.900 | and we'll put it straight through PointNet.
00:35:21.700 | That's it.
00:35:22.540 | So just that single change gave a 10- to 100-fold speedup
00:35:29.260 | from VoxelNet.
00:35:30.940 | And then we simplified the PointNet.
00:35:33.260 | So now, instead of having,
00:35:34.380 | so PointNet can have several layers
00:35:35.980 | and several modules inside it.
00:35:37.780 | So we simplified it to a single 1D convolution
00:35:40.300 | and max pooling layer.
00:35:41.400 | And then we showed you can get a really fast implementation
00:35:45.380 | by taking all your pillars that are not empty,
00:35:48.140 | stack them together into a nice, dense tensor
00:35:50.420 | with a little bit of padding here and there.
00:35:52.620 | And then you can run the forward pass with a single,
00:35:56.780 | you can pose it as a 2D convolution
00:35:59.020 | with a one-by-one kernel.
00:36:00.460 | So the final encoder runtime is now 1.3 milliseconds,
00:36:06.180 | which is really, really fast.
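A stripped-down PyTorch sketch of the encoder as described above: a single shared linear layer (a 1x1 convolution) with max-pooling over the points in each pillar, followed by scattering the pillar features back onto a 2D canvas. Tensor layouts, channel counts, and names are illustrative; see the paper and released code for the actual implementation.

```python
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    """Simplified PointNet: one 1x1 conv (shared linear layer), then max over points."""
    def __init__(self, in_channels=9, out_channels=64):
        super().__init__()
        self.linear = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, pillars):
        # pillars: (batch, in_channels, max_points, num_pillars) -- all non-empty
        # pillars stacked into one dense tensor, padded to max_points.
        x = torch.relu(self.bn(self.linear(pillars)))  # per-point features
        return x.max(dim=2).values                     # (batch, C, num_pillars)

def scatter_to_pseudo_image(pillar_features, pillar_xy_indices, H, W):
    # Scatter learned pillar features onto the 2D grid (the "pseudo-image") so a
    # standard 2D convolutional backbone, e.g. SSD-style, can run on it.
    batch, channels, _ = pillar_features.shape
    canvas = pillar_features.new_zeros(batch, channels, H, W)
    for b in range(batch):
        xs, ys = pillar_xy_indices[b]                  # grid coordinates of each pillar
        canvas[b, :, ys, xs] = pillar_features[b]
    return canvas
```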
00:36:08.340 | So the full method looks like this.
00:36:12.540 | So you have the point cloud,
00:36:14.300 | you have this pillar feature net, which is the encoder.
00:36:17.580 | So the different steps there,
00:36:20.540 | that feeds straight into the backbone
00:36:22.820 | and your detection heads.
00:36:24.100 | And there you go.
00:36:25.180 | So it's still a multi-stage architecture,
00:36:28.700 | but of course the key is that none of the steps are,
00:36:32.100 | all the steps are fully parameterized.
00:36:34.780 | And we can back propagate through the whole thing
00:36:37.980 | and learn it.
00:36:38.820 | So putting these things together,
00:36:43.500 | these were the results we got on the KITTI benchmark.
00:36:46.100 | So if you look at the car class, right,
00:36:51.340 | we actually got the highest performance,
00:36:53.580 | so this is I think the bird's eye view metric.
00:36:56.340 | And we even outperformed the methods
00:36:58.820 | that relied on LiDAR and vision.
00:37:00.420 | And we did that running at a little bit over 60 hertz.
00:37:05.780 | And this is, like I said, in terms of bird's eye view.
00:37:15.100 | We can also measure on the 3D benchmark,
00:37:17.780 | and we get the same, very similar performance.
00:37:23.580 | Yeah, so, you know, car did well, cyclist did well,
00:37:28.340 | pedestrian there was one or two methods,
00:37:30.780 | future methods that did a little bit better.
00:37:32.700 | But then in aggregate on the top left,
00:37:35.300 | we ended up on top.
00:37:36.900 | So, yeah.
00:37:37.740 | And I put a little asterisk here,
00:37:40.660 | this is compared to published methods
00:37:43.020 | at the time of submission.
00:37:44.940 | And so many things happening so quickly.
00:37:47.500 | So there's tons of, you know,
00:37:49.620 | submissions on the KITTI leaderboard
00:37:51.180 | that are completely anonymous,
00:37:52.980 | so we don't even know, you know,
00:37:55.180 | what was the input, what data did they use.
00:37:57.940 | So we only compare it to published methods.
00:38:00.140 | So here's some qualitative results.
00:38:04.660 | You have the, you know, just for visualization
00:38:07.300 | you can project them into the image.
00:38:08.460 | So you see the gray boxes are the ground truth
00:38:10.260 | and the colored ones are the predictions.
00:38:13.500 | And yeah, some challenging ones,
00:38:21.220 | it's so small here.
00:38:22.700 | So we have, for example, the person on the right there,
00:38:25.420 | that's a person with a little stand
00:38:29.260 | got interpreted as a bicycle.
00:38:30.820 | We have this man on the ladder,
00:38:33.060 | which is an actual annotation error.
00:38:34.540 | So we discovered it as a person,
00:38:36.460 | but it wasn't annotated in the data.
00:38:38.260 | Here's a child on a bicycle that didn't get detected.
00:38:44.340 | So that's a, you know, that's a bummer.
00:38:50.020 | Okay, so that was KITTI,
00:38:54.340 | and then I just wanted to show you guys,
00:38:56.940 | of course we can run this on our vehicles.
00:39:00.140 | So this is a rendering.
00:39:01.660 | We just deploy the network at two hertz
00:39:04.980 | on the full 360 sensor suite.
00:39:08.700 | Input is still live, you know, a few lidar sweeps,
00:39:12.180 | but just projected into the images for visualization.
00:39:17.660 | And again, no tracking or smoothing applied here.
00:39:19.740 | So it's every single frame is analyzed independently.
00:39:24.020 | See those arrows sticking out?
00:39:28.500 | That's the velocity estimate.
00:39:30.700 | So we actually show how you can,
00:39:32.260 | yeah, you can actually accumulate multiple point clouds
00:39:37.060 | into this method,
00:39:37.940 | and now you can start reasoning about velocity as well.
00:39:40.660 | (no audio)
00:39:42.980 | So the second part I want to talk about is nuScenes,
00:39:52.660 | which is a new benchmark dataset that we have published.
00:39:57.660 | So what is nuScenes?
00:39:58.700 | So it's 1,000 twenty-second scenes
00:40:01.820 | that we collected with our development platform.
00:40:05.140 | So it's a full, it's the same platform that Karl showed,
00:40:08.740 | or a sort of previous generation platform, the Zoe vehicle.
00:40:12.060 | So it's full, you know, the full automotive sensor suite,
00:40:15.660 | data is registered and synced in 360 degree view.
00:40:20.620 | And it's also fully annotated with 3D bounding boxes.
00:40:22.860 | I think there's over one million 3D bounding boxes.
00:40:27.020 | And we actually make this freely available for research.
00:40:29.580 | So you can go to nuscenes.org right now
00:40:32.820 | and download a teaser release, which is 100 scenes,
00:40:38.260 | the full release will be in about a month.
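If you want to poke at the data, the nuscenes-devkit on GitHub exposes a Python API roughly along these lines (version strings and paths are assumptions and may differ from what shipped with the teaser release):

```python
# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes', verbose=True)

# Each scene is roughly 20 seconds of driving; samples are its annotated keyframes.
scene = nusc.scene[0]
sample = nusc.get('sample', scene['first_sample_token'])

# A sample links the synchronized camera, lidar, and radar data, plus 3D box
# annotations that live in the world coordinate frame.
for ann_token in sample['anns'][:5]:
    ann = nusc.get('sample_annotation', ann_token)
    print(ann['category_name'], ann['translation'], ann['size'])
```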
00:40:40.380 | And of course the motivation is straightforward, right?
00:40:44.940 | So, you know, the whole field is driven by benchmarks,
00:40:48.060 | and you know, without ImageNet,
00:40:51.020 | it may be the case that none of us
00:40:52.780 | would be here, right?
00:40:53.620 | Because they may never have been able
00:40:54.860 | to write that first paper
00:40:56.740 | and sort of start this whole thing going.
00:40:58.960 | And when I started looking at 3D,
00:41:01.980 | I looked at the KITTI benchmark,
00:41:03.140 | which is truly groundbreaking.
00:41:05.540 | I don't want to take anything away,
00:41:07.060 | but it was becoming outdated.
00:41:08.580 | They don't have full 3D view, they don't have any radar.
00:41:13.380 | So I think this offers an opportunity
00:41:15.980 | to sort of push the field forward a little bit.
00:41:18.880 | Right, and just as a comparison,
00:41:22.500 | this is sort of the most similar benchmark.
00:41:25.980 | And really the only one that you can really compare to
00:41:30.020 | is KITTI.
00:41:30.860 | But so there's other data sets that have maybe LIDAR only,
00:41:35.540 | tons of data sets that have image only, of course.
00:41:39.740 | But it's quite a big step up from KITTI.
00:41:43.420 | Yeah, some details.
00:41:46.180 | So you see the layout with the radars along the edge,
00:41:51.100 | all the cameras on the roof and the top LIDAR,
00:41:55.060 | and some of the receptive fields.
00:41:56.940 | And this data is all on the website.
00:41:59.160 | The taxonomy, so we model several different subcategories
00:42:03.180 | of pedestrians, several types of vehicles,
00:42:05.620 | some static objects, barrier cones.
00:42:08.460 | And then in addition, a bunch of attributes
00:42:11.020 | on the vehicles and on the pedestrians.
00:42:12.980 | All right, so without further ado,
00:42:15.380 | let's just look at some data.
00:42:16.640 | So this is one of the thousand scenes, right?
00:42:18.820 | So all I'm showing here is just playing the frames
00:42:23.700 | one by one of all the images.
00:42:25.680 | And again, the annotations live
00:42:30.700 | in the world coordinate system, right?
00:42:32.300 | So they are full 3D boxes.
00:42:34.580 | I've just projected them into the image.
00:42:36.580 | And that's what's so neat.
00:42:38.580 | So we're not really annotating the LIDAR
00:42:41.660 | or the camera or the radar.
00:42:43.380 | We're annotating the actual objects
00:42:45.420 | and put them in a world coordinate system
00:42:46.900 | and give all the transformations
00:42:48.060 | so you guys can play around with it how you like.
00:42:52.140 | So just to show that, so I can,
00:42:54.420 | because everything is ready,
00:42:55.420 | so I can now take the LIDAR sweeps
00:42:56.740 | and I can just project them into the images
00:42:58.780 | at the same time.
00:42:59.780 | So here I'm showing just colored by distance.
00:43:02.540 | So now you have some sort of sparse density measurement
00:43:06.700 | on the images, distance measurement, sorry.
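The projection step itself is the standard chain of calibrated transforms; here is a generic numpy sketch of it (my own illustration, not the devkit code; the calibration inputs are assumed given):

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_from_lidar, K):
    """points_lidar: (N, 3) xyz in the lidar frame.
    T_cam_from_lidar: (4, 4) rigid transform from lidar to camera (from calibration).
    K: (3, 3) camera intrinsics.
    Returns pixel coordinates and depths for points in front of the camera."""
    homog = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # (N, 4)
    pts_cam = (T_cam_from_lidar @ homog.T)[:3]                          # (3, N) in camera frame
    in_front = pts_cam[2] > 1e-3                                        # keep points ahead of the lens
    pts_cam = pts_cam[:, in_front]
    pix = K @ pts_cam
    pix = pix[:2] / pix[2]                                              # perspective divide -> (u, v)
    return pix.T, pts_cam[2]                                            # pixels and depth, for coloring by distance
```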
00:43:09.660 | So that's all I wanted to talk about.
00:43:11.780 | Thank you.
00:43:12.620 | (audience applauding)
00:43:15.780 | - Hi, I was really, really interested
00:43:19.500 | in your discussion around validation
00:43:21.700 | and particularly continuous development
00:43:23.300 | and that sort of thing.
00:43:24.340 | And so my question was basically
00:43:26.020 | is this nuScenes dataset,
00:43:27.320 | is this enough to guarantee
00:43:29.580 | that your model is going to generalize to unseen data
00:43:31.900 | and not hit pedestrians and that stuff
00:43:33.980 | or do you have other validation that you need to do?
00:43:36.100 | - No, no, no, I mean,
00:43:36.940 | so the nuScenes effort is purely an academic effort.
00:43:40.100 | So we wanna share our data with academic community
00:43:43.580 | to drive the field forward.
00:43:46.300 | We're not making any claims
00:43:47.620 | that this is somehow a sufficient dataset
00:43:49.340 | for any safety case.
00:43:51.460 | It's a small subset of our data.
00:43:56.100 | Yeah, I would say, obviously,
00:43:59.620 | my background is in the academic world.
00:44:01.200 | One of the hardest things was always collecting data
00:44:03.400 | because it's difficult and expensive.
00:44:05.240 | And so having access to a dataset like that,
00:44:08.960 | which was expensive to collect and annotate,
00:44:12.520 | but which we thought we would make available
00:44:15.000 | because, well, we hope that it would spark
00:44:18.400 | academic interests and smart people
00:44:20.820 | like the people in this room
00:44:22.280 | coming up with new and better algorithms,
00:44:23.840 | which could benefit the whole community
00:44:25.120 | and then maybe some of you would even wanna
00:44:26.660 | come work with us at Aptiv.
00:44:28.120 | So not totally, a little bit of self-interest there.
00:44:31.700 | Wasn't intended to be for validation,
00:44:33.220 | it was more for research.
00:44:34.380 | To give you a sense of the scale of validation,
00:44:36.940 | there was one quote there at RAND
00:44:39.020 | saying you gotta drive 275 million miles or more,
00:44:42.000 | depending on the certainty you wanna impose.
00:44:47.000 | But to date as an industry,
00:44:48.680 | we've driven about
00:44:51.540 | 12 to 14 million miles in sum,
00:44:54.220 | all participants in autonomous mode,
00:44:56.900 | over hundreds of different builds of code
00:44:59.460 | and many different environments.
00:45:01.020 | So this would now be saying
00:45:02.100 | you're supposed to drive hundreds of millions of miles
00:45:04.560 | in a particular environment on a single build of code,
00:45:07.360 | a single platform.
00:45:08.980 | Now obviously we're probably not gonna do that.
00:45:11.020 | What we'll end up doing is supplementing the driving
00:45:13.480 | with quite a lot of simulation
00:45:15.820 | and then other methodologies to convince ourselves
00:45:18.560 | that we can make a statistical,
00:45:20.260 | ultimately a statistical argument for safety.
00:45:22.420 | So there'll be use of data sets like this.
00:45:25.220 | We'll be doing lots of regression testing
00:45:26.820 | on supersized versions of data sets like that
00:45:30.140 | and other kind of morally equivalent versions
00:45:32.120 | to test different parts of the systems.
00:45:33.500 | Now not just classification,
00:45:34.640 | but different aspects of the system.
00:45:36.380 | Our motion planning, decision making,
00:45:38.740 | localization, all aspects of the system.
00:45:42.580 | And then augment that with on-road driving
00:45:45.480 | and augment that with simulation.
00:45:46.900 | So the safety case is really quite a bit broader,
00:45:49.480 | unfortunately, than any single data set
00:45:51.940 | would allow you to kind of speak to.
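For intuition on where a figure like 275 million miles comes from, a standard back-of-the-envelope argument is the statistical "rule of three": with zero failures observed in n trials, the approximate 95% upper confidence bound on the failure rate is 3/n. Using a US fatality rate on the order of 1.09 per 100 million miles, which is roughly the figure the RAND study works from, this reconstruction gives:

```latex
n \;\gtrsim\; \frac{3}{p}
  \;=\; \frac{3}{1.09 \times 10^{-8}\ \text{fatalities per mile}}
  \;\approx\; 2.75 \times 10^{8}\ \text{miles}
```

The exact number depends on the confidence level and the safety margin you want to demonstrate, which is the "depending on the certainty you wanna impose" caveat above.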
00:45:54.620 | - From an industrial perspective,
00:45:56.340 | what do you think can 5G offer for autonomous vehicles?
00:46:00.920 | - 5G, yeah, it's an interesting one.
00:46:04.000 | Well, these vehicles are connected.
00:46:06.540 | You know, that's a requirement.
00:46:08.980 | Certainly when you think about operating them as a fleet.
00:46:12.620 | When the day comes when you have an autonomous vehicle
00:46:14.820 | that is personally owned,
00:46:16.900 | and that day will come at some point in the future,
00:46:19.180 | it may or may not be connected,
00:46:20.420 | but it will almost certainly be connected then too.
00:46:22.220 | But when you have a fleet of vehicles
00:46:24.380 | and you wanna coordinate the activity of that fleet
00:46:27.380 | in a way to maximize the efficiency of that network,
00:46:30.100 | that transportation network,
00:46:31.660 | they're certainly connected.
00:46:33.180 | The requirements of that connectivity are fairly relaxed
00:46:35.740 | if you're talking about just passing back and forth
00:46:37.380 | the position of the car and maybe some status indicators.
00:46:40.500 | You know, are you in autonomous mode, manual mode,
00:46:42.280 | are all systems go, or do you have a fault code,
00:46:43.980 | and what is it?
00:46:45.240 | Now, there's some interesting requirements
00:46:48.220 | that become a little bit more stringent
00:46:49.580 | if you think about what we call teleoperation
00:46:51.740 | or remote operation of the car.
00:46:53.900 | The case where if the car encounters a situation
00:46:56.380 | it doesn't recognize, can't figure out,
00:46:58.620 | gets stuck or confused,
00:47:00.340 | you may kind of phone a human operator
00:47:02.900 | who's sitting remotely to intervene.
00:47:05.080 | And in that case, you know,
00:47:06.580 | that human operator will wanna have
00:47:07.900 | some situational awareness.
00:47:09.380 | There may be a demand of high bandwidth, low latency,
00:47:13.660 | high reliability of the sort that maybe 5G
00:47:16.940 | is better suited to than 4G.
00:47:19.540 | Or LTE or whatever you've got.
00:47:21.420 | Broadly speaking, we see it as a very nice to have,
00:47:25.700 | but like any infrastructure,
00:47:28.420 | we understand that it's gonna arrive
00:47:30.660 | on a timeline of its own
00:47:32.420 | and be maintained by someone who's not us.
00:47:34.940 | So it's very much outside our control.
00:47:37.380 | And so for that reason, we design a system
00:47:39.700 | such that we don't rely on kind of the coming 5G wave,
00:47:43.260 | but we'll certainly welcome it when it arrives.
00:47:45.220 | - So you said you have presence in 45 countries.
00:47:48.260 | So did you observe any interesting patterns from that?
00:47:51.140 | Like your car, your same self-driving car model
00:47:55.500 | that is deployed in Vegas as well as Singapore
00:47:57.900 | was able to perform equally well
00:47:59.940 | in both Vegas and Singapore,
00:48:01.180 | or the model was able to perform very well in Singapore
00:48:03.700 | compared to Vegas?
00:48:05.580 | - To speak to your question
00:48:06.420 | about like country to country variation,
00:48:08.260 | you know, we touched on that for a moment
00:48:10.540 | in the validation discussion.
00:48:12.620 | But obviously driving in Singapore
00:48:14.180 | and driving in Vegas is pretty different.
00:48:15.540 | I mean, you're on the other side of the road for starters,
00:48:18.340 | but different traffic rules
00:48:20.900 | and it's sort of underappreciated that people drive differently.
00:48:23.300 | There's slightly different traffic norms.
00:48:25.700 | So one of the things that,
00:48:27.380 | well, if anyone was in this class last year,
00:48:30.500 | my co-founder Emilio gave a talk
00:48:31.900 | about something we call rule books,
00:48:33.700 | which is a structure that we've designed
00:48:35.860 | around what we call the driving policy
00:48:37.460 | or the decision-making engine,
00:48:38.580 | which tries to admit in a general and fairly flexible way
00:48:44.020 | the ability to reprioritize rules, reassign rules,
00:48:47.300 | change weights on rules to enable us to drive
00:48:50.300 | in one community and then another
00:48:52.300 | in a fairly seamless manner.
00:48:53.980 | So to give you an example,
00:48:55.020 | when we wanted to get on the road in Singapore,
00:48:57.820 | if you can imagine you've got a,
00:48:59.460 | so let's say you're an autonomy engineer
00:49:02.100 | who was tasked with writing the decision-making engine
00:49:03.820 | and you decide I'm gonna do a finite state architecture,
00:49:06.100 | I'm gonna write down some transition rules,
00:49:07.860 | I'm gonna do them by hand, it's gonna be great.
00:49:09.820 | And then you did that for the right-hand driving
00:49:12.060 | and your boss came in and said,
00:49:13.100 | "Oh yeah, next Monday we're gonna be left-hand driving,
00:49:15.020 | "so just flip all that and get it ready to go."
00:49:18.580 | That could be a huge pain to do,
00:49:21.620 | 'cause it's generally speaking you're doing it manually
00:49:23.300 | and then very difficult to validate,
00:49:24.780 | to ensure that the outputs are correct
00:49:27.380 | across the entire spectrum of possibilities.
00:49:29.740 | So we wanted to avoid that.
00:49:31.020 | And so the long story short,
00:49:33.580 | we actually quite carefully designed the system
00:49:36.940 | such that we can scale to different cities and countries.
00:49:41.780 | And one of the ways you do that is by thinking carefully
00:49:45.060 | around the architectural design
00:49:47.100 | of the decision-making engine.
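One way to picture the rule-book idea described here is as an ordered set of driving rules whose priorities or weights can be swapped out per region without rewriting the decision logic itself. The sketch below is a toy illustration under that assumption, not Aptiv's actual implementation; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    weight: float                       # higher weight: violating this rule costs more
    violation: Callable[[dict], float]  # maps a candidate trajectory to a violation score

@dataclass
class RuleBook:
    rules: List[Rule] = field(default_factory=list)

    def cost(self, trajectory: dict) -> float:
        # Weighted sum of rule violations; a real system might instead use a
        # lexicographic / hierarchical ordering rather than a single scalar.
        return sum(r.weight * r.violation(trajectory) for r in self.rules)

    def retuned(self, new_weights: Dict[str, float]) -> "RuleBook":
        # Re-prioritize rules for a new city or country without rewriting the logic.
        return RuleBook([Rule(r.name, new_weights.get(r.name, r.weight), r.violation)
                         for r in self.rules])

# Hypothetical usage: same rules, different priorities for a left-hand-traffic region.
# singapore_book = vegas_book.retuned({"keep_left": 100.0, "keep_right": 0.0})
```

Retuning then becomes a matter of supplying a new weight table per city rather than hand-editing a state machine, which is what makes validation across regions tractable.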
00:49:48.900 | But it's quite different.
00:49:51.860 | The four cities I mentioned, which are our primary sites,
00:49:53.940 | Boston, Pittsburgh, Vegas, and Singapore,
00:49:56.700 | span a wide spectrum of driving conditions.
00:49:58.760 | I mean, everybody knows Boston, which is pretty bad.
00:50:02.700 | Vegas is warm weather, mid-density urban, but it's Vegas.
00:50:07.700 | So I mean, all kinds of stuff.
00:50:11.080 | And then Singapore is interesting,
00:50:13.260 | perfect infrastructure, good weather, flat.
00:50:16.380 | People, generally speaking, obey the rules,
00:50:18.380 | so it's kind of close to the ideal case.
00:50:21.860 | So that exposure to this different spectrum of data,
00:50:25.340 | I think, I'll speak for Oscar, maybe, is pretty valuable.
00:50:27.900 | I know for other parts of the development team,
00:50:30.020 | quite valuable.
00:50:30.860 | - Singapore is ideal except there are
00:50:33.100 | constant construction zones.
00:50:34.820 | So every time we drive out, there's a new construction zone.
00:50:37.620 | So we've focused a lot of work
00:50:39.580 | on construction zone detection in Singapore.
00:50:41.720 | - And the torrential rain.
00:50:42.880 | - Yeah, and the jaywalkers.
00:50:44.720 | - And the jaywalkers, right.
00:50:46.440 | Yeah, they do jaywalk.
00:50:47.680 | People don't break the rules, but they jaywalk.
00:50:50.500 | Other than that, it's perfect.
00:50:52.480 | So which country's fully equipped?
00:50:54.120 | That's a really good question, yeah.
00:50:55.880 | Well, it's interesting because there's other dimensions.
00:50:59.120 | So when we look at which countries are interesting to us
00:51:01.640 | to be in as a market,
00:51:03.920 | there's the infrastructure conditions,
00:51:05.960 | there's the driving patterns and properties,
00:51:08.400 | the density, is it Times Square at rush hour
00:51:10.620 | or is it Dubuque, Iowa?
00:51:12.460 | There is the regulatory environment,
00:51:14.400 | which is incredibly important.
00:51:15.780 | You may have a perfectly well-suited city
00:51:17.220 | from a technical perspective
00:51:18.700 | and they may not allow you to drive there.
00:51:21.560 | So it's really all of these things put together.
00:51:24.460 | And so we kind of have a matrix.
00:51:26.280 | We analyze which cities check these boxes
00:51:28.420 | and assign them scores and then try to understand
00:51:31.700 | then also the economics of that market.
00:51:34.040 | Maybe that city checks all these boxes,
00:51:36.020 | but there's no one using mobility services there.
00:51:38.780 | There's no opportunity to actually generate revenue
00:51:41.420 | from the service.
00:51:43.020 | So you factor in all of those things.
00:51:45.160 | - Yeah, and I think, I mean, one thing to keep in mind,
00:51:48.100 | and it's always the first thing I tell candidates
00:51:50.500 | when I interview them:
00:51:51.940 | there's a huge advantage
00:51:54.100 | to the business model we're proposing, right?
00:51:55.700 | The ride-hailing service.
00:51:57.260 | So we can choose, even if we commit to a certain city,
00:52:00.180 | we can still select the routes that we feel comfortable with
00:52:03.540 | and we can roll it out sort of piece by piece.
00:52:05.180 | We can say, okay, we don't feel comfortable
00:52:07.260 | when driving at night in the city yet.
00:52:09.460 | So we just won't accept any rides, right?
00:52:12.060 | So there's like that decision space as well.
00:52:16.020 | - Hi, thank you very much for coming
00:52:17.500 | and giving us this talk today.
00:52:18.780 | It was very, very interesting.
00:52:19.660 | I have a question which might reveal more
00:52:21.420 | about how naive I am than anything else.
00:52:24.340 | I was comparing your point pillar approach
00:52:28.220 | to the earlier approach you described,
00:52:31.740 | which is the Voxel-based approach
00:52:34.500 | to interpreting the LIDAR results.
00:52:37.100 | And in the Voxels, you had a four-dimensional tensor
00:52:42.900 | that you were starting with, and in your point pillars,
00:52:42.900 | you only have three dimensions.
00:52:43.900 | You're throwing away the Z, as I understood it.
00:52:46.340 | So when you do that, are you concerned
00:52:49.100 | that you're losing information about potential occlusions
00:52:51.460 | or transparencies or semi-occlusions?
00:52:53.420 | Is this a concern?
00:52:56.540 | - I see.
00:52:57.380 | So I may have been a little bit sloppy there.
00:53:00.980 | So we're certainly not throwing away the Z.
00:53:03.500 | All we're saying is that we're learning
00:53:05.660 | the embedding in the Z dimension
00:53:08.060 | jointly with everything else.
00:53:10.340 | So VoxelNet, if you want, sort of felt,
00:53:14.220 | when I first read that paper,
00:53:15.900 | like it needed to spoon-feed the network a little bit
00:53:18.260 | and say, let's learn everything stratified
00:53:21.300 | in this height dimension.
00:53:24.820 | And then we'll have a second step
00:53:26.940 | where we learn to consolidate that into a single vector.
00:53:30.820 | We just said, why not just learn those things together?
00:53:33.660 | So, yeah.
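To make that answer concrete, here is a simplified sketch of a pillar-style feature encoder in which the raw z coordinate is just another input feature, so the vertical structure is learned jointly rather than pre-stratified into height bins. Shapes, feature counts, and layer sizes are illustrative, not the exact configuration from the paper.

```python
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    """Simplified PointPillars-style encoder: points in each (x, y) pillar keep their
    raw z value as a feature, and a shared linear layer plus max-pool learns the
    vertical structure jointly with everything else (no stratified z bins)."""
    def __init__(self, in_dim: int = 9, out_dim: int = 64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, pillars: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # pillars: (P, N, in_dim)  P non-empty pillars, up to N points each;
        #          features include x, y, z, reflectance, and offsets to the pillar center.
        # mask:    (P, N)          1 where a real point exists, 0 for padding.
        x = self.linear(pillars)                        # (P, N, out_dim)
        x = self.bn(x.transpose(1, 2)).transpose(1, 2)  # BatchNorm over the feature dim
        x = torch.relu(x)
        x = x.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
        return x.max(dim=1).values                      # (P, out_dim): one vector per pillar
```

The resulting per-pillar vectors are then scattered back onto the (x, y) grid to form a 2D pseudo-image that a standard convolutional backbone can consume.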
00:53:35.100 | - Thanks for your talk.
00:53:36.740 | I have a question for Carl.
00:53:38.060 | You mentioned that if people make changes to the code,
00:53:42.180 | do we need another validation or not?
00:53:45.580 | So I work in the industry of nuclear power.
00:53:48.700 | So we do nuclear power simulations.
00:53:51.180 | So when we make any change to our simulation code,
00:53:56.180 | and to make it commercialized,
00:53:57.860 | we need to submit a request to the NRC,
00:54:01.260 | which is the Nuclear Regulatory Commission.
00:54:04.660 | So in your opinion, do you think for self-driving,
00:54:08.260 | we need another third-party validation committee or not?
00:54:14.140 | Or should that be a third party, or is just self-check?
00:54:19.580 | - Yeah, that's a really good question.
00:54:22.540 | So I don't know the answer.
00:54:24.140 | I wouldn't be surprised, let me put it this way.
00:54:26.020 | I would not be surprised either way
00:54:27.740 | if the automotive industry ended up
00:54:29.340 | with third-party regulatory oversight, or it didn't.
00:54:33.860 | And I'll tell you why.
00:54:34.700 | There's great precedence for what you just described.
00:54:37.780 | Nuclear, aerospace, there's external bodies
00:54:40.980 | who have deep technical competence,
00:54:43.700 | who can come in, they can do investigations,
00:54:45.780 | they can impose strict regulation, or advise regulation,
00:54:50.020 | and they can partner or define requirements
00:54:56.600 | for certification of various types.
00:54:58.880 | The automotive industry has largely been self-certifying.
00:55:01.680 | There's an argument, which is certainly not unreasonable,
00:55:05.080 | that you have a real alignment of incentive
00:55:08.800 | within the industry and with the public
00:55:11.640 | to be as safe as possible.
00:55:13.520 | Simply put, the cost of a crash is enormous,
00:55:17.900 | economically, socially, everything else.
00:55:20.720 | But whether it continues along that path,
00:55:22.840 | I couldn't tell you.
00:55:25.060 | It's an interesting space because it's one
00:55:28.580 | where the federal government is actually moving
00:55:31.140 | very, very quickly.
00:55:32.020 | I mean, I would say carefully, too,
00:55:33.900 | not overstepping and not trying to impose
00:55:36.620 | too much regulation around an industry
00:55:38.660 | that has never generated a dollar of revenue.
00:55:41.620 | It's still quite nascent.
00:55:43.620 | But if you would have told me a few years ago
00:55:45.680 | that there would have been very thoughtfully defined
00:55:48.820 | draft regulatory guidelines or advice,
00:55:52.620 | I mean, let's say, it's not firm regulation,
00:55:55.540 | around this industry, I probably wouldn't have believed you.
00:55:58.220 | But in fact, that exists.
00:55:59.180 | There's a third version that was released this summer
00:56:01.140 | by the Department of Transportation.
00:56:04.300 | So there's intense interest on the regulatory side.
00:56:07.700 | In terms of how far the process goes
00:56:11.880 | in terms of formation of an external body,
00:56:13.520 | I think really remains to be seen.
00:56:15.060 | I don't know the answer.
00:56:16.660 | - Thanks for your insightful talk.
00:56:18.740 | Looking at this slide, I'm wondering how easy
00:56:21.760 | and effective your trained models are
00:56:25.100 | to transfer across different weather conditions,
00:56:27.340 | and whether you need, for example,
00:56:29.300 | if it is snowing, do you need specific training
00:56:32.780 | specifically for your lidars to work effectively,
00:56:36.100 | or you don't see any issues in that regard?
00:56:39.500 | - No, I mean, I think the same rules apply
00:56:41.700 | to this method as any other machine learning-based method.
00:56:44.380 | You wanna have support in your training data
00:56:46.220 | for the situation you wanna deploy in.
00:56:49.220 | So if we have no snow in our training data,
00:56:51.760 | I wouldn't go and deploy this in snow.
00:56:54.120 | I do like, one thing I like after having worked
00:56:57.680 | so much with vision though is that the lidar point cloud
00:57:00.220 | is really easy to augment and play around with.
00:57:05.060 | So for example, if you wanna say,
00:57:07.720 | you wanna be robust in really rare events, right?
00:57:11.720 | So let's say there's a piano on the road.
00:57:13.680 | I really wanna detect that.
00:57:15.560 | But it's hard because I have very few examples
00:57:17.460 | of pianos on the road, right?
00:57:19.200 | Now if you think about augmenting your visual dataset
00:57:21.260 | with that data, it's actually quite tricky.
00:57:23.720 | It's not that easy to have a photorealistic piano
00:57:26.320 | in your training data.
00:57:27.480 | But it is quite easy to do that in your lidar data, right?
00:57:30.640 | So you have a 3D model of your piano,
00:57:33.200 | you have the model for your lidar
00:57:35.160 | and you can get a pretty accurate,
00:57:37.040 | fairly realistic point cloud return from that, right?
00:57:40.520 | So I like that part about working with lidar.
00:57:42.080 | You can augment, you can play around with it.
00:57:44.360 | In fact, one of the things we do when we train this model
00:57:47.500 | is that we copy and paste samples from,
00:57:51.820 | or like objects from different samples.
00:57:53.840 | So you can take a car that I saw yesterday,
00:57:56.240 | take the point returns on that car,
00:57:59.420 | you can just paste it into your current lidar sweep.
00:58:02.280 | You have to be a little bit careful, right?
00:58:03.860 | And this was actually proposed by another,
00:58:07.100 | by a previous paper.
00:58:08.500 | And we found that that was a really useful data augmentation.
00:58:11.680 | It sounds absurd, but it actually works.
00:58:14.420 | And it speaks to the ability to do that with lidar point clouds.
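A rough sketch of that copy-and-paste trick, often called ground-truth sampling (the earlier paper alluded to is commonly credited as SECOND by Yan et al., 2018), might look like the following. The "be a little bit careful" part is checking that a pasted object does not overlap anything already in the scene; all names and the box format here are illustrative.

```python
import numpy as np

def boxes_overlap_bev(a, b):
    # Crude axis-aligned bird's-eye-view overlap test on (x, y, w, l) boxes;
    # ignores yaw for brevity.
    return (abs(a[0] - b[0]) < (a[2] + b[2]) / 2) and (abs(a[1] - b[1]) < (a[3] + b[3]) / 2)

def paste_objects(lidar_points, gt_boxes, object_bank, num_to_paste=5, rng=None):
    """Copy the lidar returns of objects saved from other sweeps into the current
    sweep, skipping any pasted object whose box would collide with an existing one.
    object_bank entries are dicts with 'points' (Mx4) and 'box' (x, y, w, l)."""
    rng = rng or np.random.default_rng()
    picks = rng.choice(len(object_bank),
                       size=min(num_to_paste, len(object_bank)), replace=False)
    for idx in picks:
        obj = object_bank[idx]
        if any(boxes_overlap_bev(obj["box"], b) for b in gt_boxes):
            continue  # the careful part: don't paste an object on top of another
        lidar_points = np.vstack([lidar_points, obj["points"]])
        gt_boxes = np.vstack([gt_boxes, obj["box"][None]])
    return lidar_points, gt_boxes
```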
00:58:18.280 | - Okay, great.
00:58:19.120 | Please give Carl and Oscar a big hand.
00:58:20.920 | Thank you so much.
00:58:21.760 | (audience applauding)
00:58:25.540 | - Excellent.
00:58:26.540 | (upbeat music)