Tesla AI Day Highlights | Lex Fridman
Chapters
0:00 Overview
1:16 Neural network architecture
4:55 Data and annotation
6:44 Autopilot & DOJO
8:28 Summary: 3 key ideas
9:55 Tesla Bot
00:00:00.000 |
Tesla AI Day presented the most amazing real-world AI 00:00:03.640 |
and engineering effort I have ever seen in my life. 00:00:15.160 |
It was amazing because I believe the autonomous driving task 00:00:18.540 |
and the general real-world robotics perception 00:00:30.480 |
inference compute and training compute required 00:00:38.520 |
Yesterday was the first time I saw in one place 00:00:48.840 |
and the general real-world robotics perception 00:00:52.760 |
This includes the neural network architecture and pipeline, 00:01:05.400 |
and yes, the generalized application of all of the above 00:01:25.240 |
from the state of the art in machine learning. 00:01:27.580 |
First is to predict the vector space, not in image space. 00:01:40.220 |
The thing about reality is that it happens out there 00:01:46.460 |
all the machine learning on the 2D projections 00:01:50.260 |
Like many good ideas, this is an obvious one, 00:01:59.660 |
The detections performed by the different heads 00:02:04.700 |
For now, the fusion is at the multiscale feature level. 00:02:13.120 |
of doing the detection and the machine learning 00:02:28.120 |
At each frame, concatenating positional encodings, 00:03:10.540 |
takes us further toward full end-to-end driving 00:03:25.940 |
of utilization of neural networks is planning. 00:03:29.780 |
So obviously optimal planning in action space 00:04:55.700 |
The other critical part of making all of this work 00:05:16.100 |
and then project it out into the image space. 00:05:28.140 |
as is the case with self-supervised learning, 00:05:30.300 |
auto labeling is the key to this whole thing. 00:05:33.140 |
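As a rough illustration of labeling once in vector space and then projecting the label out into image space, here is a minimal sketch using a simple pinhole camera model. The `Camera` class, its parameters, and the frame convention (x right, y down, z forward, no rotation) are assumptions made for this example; nothing here is taken from Tesla's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical pinhole camera (illustrative only).

    fx, fy, cx, cy are intrinsics in pixels. position is the camera's
    location in the vehicle frame in meters. Rotation is omitted for brevity.
    """
    fx: float
    fy: float
    cx: float
    cy: float
    position: tuple


def project_point(cam, point):
    """Project a 3D point (x right, y down, z forward) to pixel coordinates.

    Returns None when the point is behind the camera.
    """
    x = point[0] - cam.position[0]
    y = point[1] - cam.position[1]
    z = point[2] - cam.position[2]
    if z <= 0:
        return None
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def project_label(cameras, point):
    """Project one 3D label into every camera that can see it."""
    pixels = {}
    for name, cam in cameras.items():
        uv = project_point(cam, point)
        if uv is not None:
            pixels[name] = uv
    return pixels


# One label in 3D becomes a label in each camera image.
cameras = {
    "front": Camera(1000.0, 1000.0, 640.0, 360.0, (0.0, 0.0, 0.0)),
    "front_right": Camera(1000.0, 1000.0, 640.0, 360.0, (0.5, 0.0, 0.0)),
}
label_3d = (1.0, 0.5, 10.0)
pixels = project_label(cameras, label_3d)
```

The point of the sketch is that a single 3D annotation yields consistent 2D annotations in every camera, instead of labeling each image separately.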
One of the interesting things that was presented 00:05:37.100 |
that includes video, IMU, GPS, odometry, and so on 00:05:40.500 |
from multiple vehicles at the same location in time 00:05:52.940 |
these buckets of data from different vehicles, 00:06:02.660 |
of that particular part of road at that particular time. 00:06:06.300 |
That's amazing because the more the fleet grows, 00:06:08.500 |
the stronger that kind of auto labeling becomes. 00:06:12.300 |
And the more edge cases you're able to catch that way. 00:06:20.580 |
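The idea of pooling clips from multiple vehicles that passed the same part of the road can be sketched as a simple bucketing step. This is only an assumption about how such grouping might look: the clip fields (`vehicle_id`, `lat`, `lon`, `timestamp_s`), the grid cell size, and the time window are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def bucket_key(lat, lon, timestamp_s, cell_deg=0.001, window_s=3600):
    """Quantize position and time into a coarse (lat cell, lon cell, time window) key.

    cell_deg and window_s are arbitrary illustrative values.
    """
    return (
        math.floor(lat / cell_deg),
        math.floor(lon / cell_deg),
        int(timestamp_s // window_s),
    )


def multi_vehicle_buckets(clips, min_vehicles=2):
    """Group clips by bucket and keep buckets seen by at least min_vehicles vehicles."""
    buckets = defaultdict(list)
    for clip in clips:
        key = bucket_key(clip["lat"], clip["lon"], clip["timestamp_s"])
        buckets[key].append(clip)
    return {
        key: group
        for key, group in buckets.items()
        if len({c["vehicle_id"] for c in group}) >= min_vehicles
    }


# Hypothetical clips: car_a and car_b pass the same spot, car_c is elsewhere.
clips = [
    {"vehicle_id": "car_a", "lat": 37.42712, "lon": -122.16951, "timestamp_s": 1000},
    {"vehicle_id": "car_b", "lat": 37.42718, "lon": -122.16958, "timestamp_s": 2500},
    {"vehicle_id": "car_c", "lat": 37.43512, "lon": -122.16951, "timestamp_s": 1200},
]
shared = multi_vehicle_buckets(clips)
```

Each surviving bucket holds clips from different vehicles that could then be reconstructed and labeled jointly. As the fleet grows, more buckets pass the `min_vehicles` threshold.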
that are not going to appear often in the data, 00:06:22.460 |
even when that data set grows incredibly large. 00:06:27.880 |
of ultra complex scenes where accurate labeling 00:06:32.940 |
like a scene with like a hundred pedestrians, 00:06:41.020 |
and the data annotation is really just a big leap. 00:06:48.100 |
the neural network compiler that optimizes latency, 00:06:52.380 |
There's, I think I remember really nice testing 00:07:03.380 |
where you can compare different neural networks together. 00:07:13.900 |
are currently being used to continually retrain the network. 00:07:26.900 |
but unlike the neural network and the data annotation, 00:07:30.020 |
this is in the future, so to be deployed still, 00:07:34.020 |
is the Dojo computer, which is used for training. 00:07:37.940 |
So the Autopilot computer is the computer on the car 00:07:45.260 |
that performs the training of the neural network. 00:07:47.900 |
There's a, what they're calling a single training tile 00:07:53.820 |
It's made up of D1 chips that are built in house by Tesla. 00:08:08.620 |
And then I think they connected like a million nodes 00:08:14.740 |
I forget what the name is, but it's 1.1 exaflop. 00:08:30.820 |
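For reference, the 1.1 exaflop figure and the "like a million nodes" remark can be checked with some quick arithmetic. The per-chip and per-tile numbers below are not stated in this transcript. They are the figures publicly reported from the AI Day presentation as best recalled, so treat them as assumptions.

```python
# Assumed figures (not from this transcript; reported from the AI Day presentation):
TFLOPS_PER_D1_CHIP = 362   # reduced-precision (BF16/CFP8) TFLOPS per D1 chip
NODES_PER_D1_CHIP = 354    # training nodes per D1 chip
CHIPS_PER_TILE = 25        # D1 chips per training tile
TILES_IN_CLUSTER = 120     # training tiles in the full cluster

total_chips = CHIPS_PER_TILE * TILES_IN_CLUSTER
total_nodes = NODES_PER_D1_CHIP * total_chips
petaflops_per_tile = TFLOPS_PER_D1_CHIP * CHIPS_PER_TILE / 1_000
total_exaflops = TFLOPS_PER_D1_CHIP * total_chips / 1_000_000

print(total_chips, total_nodes, petaflops_per_tile, total_exaflops)
```

Under these assumptions the cluster has 3,000 chips and about 1.06 million nodes, and the total comes to roughly 1.09 exaflops, which rounds to the quoted 1.1.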
on AI Day is amazing because the, what would you call it? 00:08:40.940 |
of auto labeling plus manual labeling of edge cases. 00:08:46.340 |
plus the data collection, retraining, deploying. 00:08:49.540 |
And then again, you go back to the data collection, 00:08:56.000 |
And you can go through this loop as many times as you want 00:08:59.980 |
to arbitrarily improve the performance of the network. 00:09:08.300 |
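The loop described here (collect data, auto label it with manual labeling for edge cases, retrain, deploy, repeat) can be written down as a bare control-flow skeleton. Every function name below is a hypothetical placeholder supplied by the caller. The sketch shows only the shape of the loop and makes no claim about how performance actually improves.

```python
def run_data_engine(model, collect, auto_label, is_edge_case, manual_label, retrain, rounds):
    """Run the collect -> label -> retrain -> deploy loop for a fixed number of rounds.

    All callables are hypothetical stand-ins provided by the caller.
    """
    dataset = []
    for _ in range(rounds):
        # Data collection driven by the currently deployed model.
        clips = collect(model)
        for clip in clips:
            # Edge cases go to manual labeling, everything else is auto labeled.
            label = manual_label(clip) if is_edge_case(clip) else auto_label(clip)
            dataset.append((clip, label))
        # Retrain on everything gathered so far; the result is "deployed"
        # by being the model used for collection in the next round.
        model = retrain(model, dataset)
    return model, dataset


# Toy stand-ins: the "model" is just a version number.
final_model, dataset = run_data_engine(
    model=0,
    collect=lambda m: [f"clip_{m}", f"edge_{m}"],
    auto_label=lambda clip: "auto",
    is_edge_case=lambda clip: clip.startswith("edge"),
    manual_label=lambda clip: "manual",
    retrain=lambda m, data: m + 1,
    rounds=3,
)
```

In this toy run the model version advances once per round and the dataset accumulates across rounds, so each retraining step sees all previously labeled clips.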
but I also think this loop does not have a ceiling. 00:09:11.780 |
I still think there's a big place for driver sensing. 00:09:19.860 |
but damn it, this loop of manual and auto labeling 00:09:24.060 |
that leads to retraining, that leads to deployment, 00:09:28.020 |
and the auto labeling and the manual labeling 00:09:48.720 |
the deployment of PyTorch across these nodes, 00:09:55.940 |
Finally, the third reason all of this was amazing 00:10:23.220 |
the lifelong dream has been to build the mind, 00:10:26.900 |
the robot that becomes a friend and a companion to humans, 00:10:30.500 |
not just a servant that performs boring and dangerous tasks. 00:10:45.040 |
of perception, movement, and object manipulation. 00:10:51.200 |
in solving the former problem of human-robot interaction, 00:10:57.440 |
I'm not going to mention love when talking about robots.