Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
Chapters
0:00 Introduction
0:58 Neural networks
6:01 Biology
11:32 Aliens
21:43 Universe
33:34 Transformers
41:50 Language models
52:01 Bots
58:21 Google's LaMDA
65:44 Software 2.0
76:44 Human annotation
78:41 Camera vision
83:46 Tesla's Data Engine
87:56 Tesla Vision
94:26 Elon Musk
99:33 Autonomous driving
104:28 Leaving Tesla
109:55 Tesla's Optimus
119:01 ImageNet
121:40 Data
131:31 Day in the life
144:47 Best IDE
151:53 arXiv
156:23 Advice for beginners
165:40 Artificial general intelligence
179:00 Movies
184:53 Future of human civilization
189:13 Book recommendations
195:21 Advice for young people
197:12 Future of machine learning
204:00 Meaning of life
I think it's possible that physics has exploits 00:00:03.560 |
Arranging some kind of a crazy quantum mechanical system 00:00:08.380 |
somehow gives you a rounding error in the floating point. 00:00:23.120 |
These synthetic AIs will uncover that puzzle and solve it. 00:00:30.120 |
The following is a conversation with Andrej Karpathy, 00:00:39.800 |
He is one of the greatest scientists, engineers 00:00:43.600 |
and educators in the history of artificial intelligence. 00:00:50.120 |
To support it, please check out our sponsors. 00:00:52.760 |
And now dear friends, here's Andrej Karpathy. 00:01:00.160 |
And why does it seem to do such a surprisingly 00:01:05.440 |
It's a mathematical abstraction of the brain. 00:01:10.040 |
I would say that's how it was originally developed. 00:01:12.680 |
At the end of the day, it's a mathematical expression. 00:01:14.560 |
And it's a fairly simple mathematical expression 00:01:17.380 |
It's basically a sequence of matrix multiplies, 00:01:21.400 |
which are really dot products mathematically. 00:01:25.440 |
And so it's a very simple mathematical expression. 00:01:30.880 |
And these knobs are loosely related to basically 00:01:35.840 |
And so the idea is like, we need to find the setting 00:01:37.720 |
of the knobs that makes the neural net do whatever 00:01:40.760 |
you want it to do, like classify images and so on. 00:01:43.560 |
And so there's not too much mystery, I would say in it. 00:01:45.640 |
Like you might think that, basically don't want to endow it 00:01:49.640 |
with too much meaning with respect to the brain 00:01:52.800 |
It's really just a complicated mathematical expression 00:01:55.000 |
with knobs and those knobs need a proper setting 00:01:59.280 |
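(As a rough illustration of this "knobs" framing, and not anything shown in the conversation itself: a tiny two-layer net really is just matrix multiplies plus a simple nonlinearity, and its weight entries are the knobs. The shapes and names below are arbitrary.)

```python
import numpy as np

# A tiny two-layer neural net: just matrix multiplies (dot products) plus a
# simple nonlinearity. The entries of W1, b1, W2, b2 are the "knobs".
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)  # matrix multiply, then ReLU
    return h @ W2 + b2                # another matrix multiply -> 10 class scores

x = rng.normal(size=(1, 784))         # e.g. a flattened 28x28 image
print(forward(x).shape)               # (1, 10)
```

Training is then just the search, by gradient descent, for the setting of those knobs that makes the outputs do what you want, like classify images.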
- Yeah, but poetry is just the collection of letters 00:02:02.120 |
with spaces, but it can make us feel a certain way. 00:02:05.320 |
And in that same way, when you get a large number 00:02:07.400 |
of knobs together, whether it's inside the brain 00:02:10.880 |
or inside a computer, they seem to surprise us 00:02:20.000 |
because you definitely do get very surprising emergent 00:02:23.760 |
behaviors out of these neural nets when they're large enough 00:02:28.760 |
Like say, for example, the next word prediction 00:02:33.560 |
And then these neural nets take on pretty surprising 00:02:37.760 |
Yeah, I think it's kind of interesting how much you can get 00:02:39.960 |
out of even very simple mathematical formalism. 00:02:49.120 |
- Well, it's definitely some kind of a generative model 00:02:53.520 |
So you're giving me a prompt and I'm kind of like responding 00:03:00.840 |
Like, are you adding extra prompts from your own memory 00:03:07.400 |
like you're referencing some kind of a declarative structure 00:03:12.240 |
And then you're putting that together with your prompt 00:03:17.080 |
- How much of what you just said has been said by you before? 00:03:23.600 |
- No, but if you actually look at all the words 00:03:26.000 |
you've ever said in your life and you do a search, 00:03:29.480 |
you'll probably have said a lot of the same words 00:03:35.400 |
I mean, I'm using phrases that are common, et cetera, 00:03:37.480 |
but I'm remixing it into a pretty sort of unique sentence 00:03:42.080 |
But you're right, definitely there's like a ton of remixing. 00:03:44.280 |
- Why, you didn't, it's like Magnus Carlsen said, 00:03:48.360 |
I'm rated 2,900 whatever, which is pretty decent. 00:03:55.240 |
you're not giving enough credit to your own neural nets here. 00:03:58.080 |
Why do they seem to, what's your best intuition 00:04:06.440 |
because I'm simultaneously underselling them, 00:04:08.840 |
but I also feel like there's an element to which I'm over, 00:04:12.800 |
that you can get so much emergent magical behavior 00:04:14.800 |
out of them despite them being so simple mathematically. 00:04:17.560 |
So I think those are kind of like two surprising statements 00:04:27.160 |
And when you give them a hard enough problem, 00:04:29.640 |
they are forced to learn very interesting solutions 00:04:34.080 |
And those solutions basically have these emergent properties 00:04:42.720 |
And so this representation that's in the knobs, 00:04:49.400 |
that a large number of knobs can hold a representation 00:04:52.720 |
that captures some deep wisdom about the data 00:05:00.120 |
And somehow, you know, so speaking concretely, 00:05:03.600 |
one of the neural nets that people are very excited 00:05:07.520 |
which are basically just next word prediction networks. 00:05:10.280 |
So you consume a sequence of words from the internet 00:05:15.520 |
And once you train these on a large enough dataset, 00:05:24.040 |
in arbitrary ways and you can ask them to solve problems 00:05:28.560 |
you can make it look like you're trying to solve 00:05:33.480 |
and they will continue what they think is the solution 00:05:38.760 |
very remarkably consistent, look correct potentially. 00:05:41.960 |
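(A minimal sketch of the next-word-prediction objective being described here, not any particular production model; the toy `model` below stands in for a real transformer and the names are illustrative.)

```python
import torch
import torch.nn.functional as F

# Next-token prediction: given tokens 1..T, predict tokens 2..T+1.
vocab_size, T = 50257, 8
model = torch.nn.Sequential(              # toy stand-in for a real transformer
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, T + 1))  # a chunk of internet text, tokenized
inputs, targets = tokens[:, :-1], tokens[:, 1:]    # shift by one position

logits = model(inputs)                             # (1, T, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                    # this one objective, trained at huge scale
```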
- Do you still think about the brain side of it? 00:05:49.560 |
you still draw wisdom from the biological neural networks? 00:05:57.760 |
so you're a big fan of biology and biological computation. 00:06:00.940 |
What impressive thing is biology doing to you 00:06:10.920 |
I'm much more hesitant with the analogies to the brain 00:06:13.400 |
than I think you would see potentially in the field. 00:06:20.640 |
is everything stemmed from inspiration by the brain. 00:06:27.360 |
they are arrived at by a very different optimization process 00:06:30.000 |
than the optimization process that gave rise to the brain. 00:06:33.800 |
I kind of think of it as a very complicated alien artifact. 00:06:39.760 |
I'm sorry, the neural nets that we're training. 00:06:49.000 |
that gave rise to it is very different from the brain. 00:06:51.720 |
So there was no multi-agent self-play kind of setup 00:07:03.440 |
- Okay, so artificial neural networks are doing compression 00:07:13.280 |
They're an agent in a multi-agent self-play system 00:07:16.840 |
that's been running for a very, very long time. 00:07:19.440 |
- That said, evolution has found that it is very useful 00:07:23.160 |
to predict and have a predictive model in the brain. 00:07:48.200 |
and it just builds it up like the entire organism 00:08:11.840 |
what do you think is the most interesting invention? 00:08:24.720 |
The origin of intelligence or highly complex intelligence? 00:08:34.680 |
- Certainly I would say it's an extremely remarkable story 00:08:38.480 |
that I'm only briefly learning about recently. 00:08:44.080 |
you almost have to start at the formation of Earth 00:08:46.400 |
and all of its conditions and the entire solar system 00:08:48.240 |
and how everything is arranged with Jupiter and the Moon 00:08:57.600 |
And then you start with abiogenesis and everything. 00:09:03.920 |
I'm not sure that I can pick a single unique piece of it 00:09:10.760 |
I guess for me as an artificial intelligence researcher, 00:09:15.320 |
We have lots of animals that are not building 00:09:26.760 |
And something very interesting happened there 00:09:50.640 |
but it was obvious, it was already written in the code 00:10:14.940 |
or the, as Richard Wrangham says, the beta males 00:10:21.080 |
deciding a clever way to kill the alpha males 00:10:25.560 |
by collaborating, so just optimizing the collaboration, 00:10:31.480 |
and that really being constrained on resources 00:10:35.000 |
and trying to survive, the collaboration aspect 00:10:44.400 |
What could possibly be a magical thing that happened, 00:10:52.760 |
is actually a really rare thing in the universe? 00:10:55.680 |
- Yeah, I'm hesitant to say that it is rare, by the way, 00:11:05.100 |
and then you have certain leaps, sparse leaps in between. 00:11:08.080 |
So of course, like origin of life would be one, 00:11:10.840 |
DNA, sex, eukaryotic system, eukaryotic life, 00:11:15.840 |
the endosymbiosis event where the archaeon ate 00:11:20.720 |
Then of course, emergence of consciousness and so on. 00:11:23.520 |
So it seems like definitely there are sparse events 00:11:32.360 |
Gotta ask you, how many intelligent alien civilizations 00:11:36.800 |
And is their intelligence different or similar to ours? 00:11:47.520 |
quite a bit recently, basically the Fermi paradox 00:11:51.440 |
And the reason actually that I am very interested 00:11:54.200 |
in the origin of life is fundamentally trying to understand 00:11:57.360 |
how common it is that there are technological societies 00:12:02.800 |
And the more I study it, the more I think that 00:12:16.920 |
what we did here on Earth is so difficult to do. 00:12:20.140 |
- Yeah, and especially when you get into the details of it, 00:12:27.200 |
but then you read books like, for example, Nick Lane's 00:12:29.900 |
"The Vital Question," "Life Ascending," et cetera. 00:12:34.260 |
And he really gets in and he really makes you believe 00:12:39.900 |
- You have an active Earth and you have your alkaline vents 00:12:44.000 |
mixing with the ocean and you have your proton gradients 00:12:47.000 |
and you have little porous pockets of these alkaline vents 00:12:51.640 |
And basically as he steps through all of these little pieces 00:12:54.980 |
you start to understand that actually this is not that crazy 00:13:11.040 |
was actually fairly fast after formation of Earth. 00:13:16.020 |
If I remember correctly, just a few hundred million years 00:13:18.740 |
or something like that after basically when it was possible 00:13:22.440 |
And so that makes me feel like that is not the constraint. 00:13:26.240 |
and that life should actually be fairly common. 00:13:35.480 |
I currently think that there's no major drop-offs basically. 00:13:46.040 |
is that we just can't see them, we can't observe them. 00:13:54.480 |
they really seem to think that the jump from bacteria 00:13:58.280 |
to more complex organisms is the hardest jump. 00:14:16.980 |
And how much time you have, surely it's not that difficult. 00:14:21.340 |
Like in a billion years is not even that long 00:14:26.140 |
Just all these bacteria under constrained resources 00:14:29.300 |
battling it out, I'm sure they can invent more complex organisms. 00:14:32.020 |
Like I don't understand, it's like how to move 00:14:34.460 |
from a Hello World program to like invent a function 00:14:45.060 |
if the origin of life, that would be my intuition, 00:14:53.140 |
And yeah, maybe we're just too dumb to see it. 00:14:55.340 |
- Well, it's just we don't have really good mechanisms 00:15:05.500 |
- I wanna meet an expert on alien intelligence 00:15:16.380 |
Their power drops off as basically one over R squared. 00:15:19.180 |
So I remember reading that our current radio waves 00:15:22.060 |
would not be, the ones that we are broadcasting 00:15:25.380 |
would not be measurable by our devices today. 00:15:28.780 |
Only like, was it like one 10th of a light year away? 00:15:33.020 |
because you really need like a targeted transmission 00:15:41.340 |
And so I just think that our ability to measure 00:15:45.020 |
I think there's probably other civilizations out there. 00:15:47.020 |
And then the big question is why don't they build 00:15:48.620 |
von Neumann probes and why don't they interstellar travel 00:15:52.460 |
And my current answer is it's probably interstellar travel 00:15:57.620 |
If you wanna move at close to the speed of light, 00:15:59.380 |
you're going to be encountering bullets along the way 00:16:04.420 |
and little particles of dust basically have 00:16:09.460 |
And so basically you need some kind of shielding. 00:16:16.020 |
And so my thinking is maybe interstellar travel 00:16:19.900 |
And you have to go very slow. - And billions of years 00:16:22.500 |
It feels like we're not a billion years away from doing that. 00:16:30.260 |
you have to go very slowly, potentially, as an example, 00:16:34.300 |
- Right, as opposed to close to the speed of light. 00:16:36.660 |
- So I'm suspicious basically of our ability to measure life 00:16:38.860 |
and I'm suspicious of the ability to just permeate 00:16:42.180 |
all of space in the galaxy or across galaxies. 00:16:44.460 |
And that's the only way that I can currently see 00:16:49.740 |
that there's trillions of intelligent alien civilizations 00:16:53.820 |
out there kind of slowly traveling through space 00:16:59.100 |
And some of them meet, some of them go to war, 00:17:08.940 |
- Well, statistically, if there's trillions of them, 00:17:13.340 |
surely some of the pockets are close enough together. 00:17:19.580 |
And then once you see something that is definitely 00:17:28.060 |
we're probably going to be severely, intensely, 00:17:30.900 |
aggressively motivated to figure out what the hell that is 00:17:35.060 |
But what would be your first instinct to try to, 00:17:38.420 |
like at a generational level, meet them or defend 00:17:47.860 |
as a president of the United States and a scientist? 00:17:51.840 |
I don't know which hat you prefer in this question. 00:17:55.520 |
- Yeah, I think the question, it's really hard. 00:18:02.760 |
we have lots of primitive life forms on earth next to us. 00:18:05.960 |
We have all kinds of ants and everything else, 00:18:14.920 |
because they are amazing, interesting, dynamical systems 00:18:20.600 |
And I don't know that you want to destroy that by default. 00:18:31.640 |
I think I'd like to preserve it if I can afford to. 00:18:36.640 |
And I'd like to think that the same would be true 00:18:38.440 |
about the galactic resources and that they would think 00:18:41.960 |
that we're kind of incredible, interesting story 00:18:44.140 |
that took time, it took a few billion years to unravel, 00:18:49.000 |
- I could see two aliens talking about earth right now 00:18:51.720 |
and saying, "I'm a big fan of complex, dynamical systems. 00:18:59.440 |
And it will basically be a video game they watch 00:19:04.200 |
- Yeah, I think you would need a very good reason, 00:19:08.800 |
Like, why don't we destroy these ant farms and so on? 00:19:22.360 |
- Well, from a scientific perspective, you might probe it. 00:19:27.560 |
- You might want to learn something from it, right? 00:19:29.520 |
So I wonder, there could be certain physical phenomena 00:19:38.440 |
- I think it should be very interesting to scientists, 00:19:45.720 |
Basically, it's a result of a huge amount of computation 00:19:58.360 |
If you had the power to do this, okay, for sure, 00:20:01.880 |
at least I would, I would pick an Earth-like planet 00:20:06.180 |
that has the conditions, based on my understanding 00:20:10.600 |
and I would seed it with life and run it, right? 00:20:14.760 |
Wouldn't you 100% do that and observe it and protect? 00:20:19.200 |
I mean, that's not just a hell of a good TV show. 00:20:29.880 |
Maybe evolution is the most, like actually running it 00:20:34.600 |
is the most efficient way to understand computation 00:20:41.280 |
- Or to understand life or what life looks like 00:20:52.920 |
Does that change anything for us, for a science experiment? 00:21:01.880 |
- I'm suspicious of this idea of like a deliberate 00:21:06.640 |
I don't see a divine intervention in some way 00:21:15.080 |
like Nick Lane's books and so on, sort of makes sense, 00:21:17.440 |
and it makes sense how life arose on Earth uniquely. 00:21:27.600 |
don't observe any divine intervention either. 00:21:32.360 |
We might just be all NPCs running a kind of code. 00:21:40.840 |
hey, this is really suspicious, what the hell? 00:21:47.880 |
"with photons for a while, you can emit a roadster." 00:21:51.660 |
So if like in "Hitchhiker's Guide to the Galaxy," 00:21:59.460 |
What do you think is all the possible stories, 00:22:30.880 |
it's pretty incredible that these self-replicating systems 00:22:37.240 |
and then they perpetuate themselves and become more complex, 00:22:39.500 |
and eventually become conscious and build a society. 00:22:50.840 |
any sufficiently well-arranged system like Earth. 00:22:53.880 |
And so I kind of feel like there's a certain sense 00:22:55.840 |
of inevitability in it, and it's really beautiful. 00:23:04.360 |
a diverse environment where complex dynamical systems 00:23:10.040 |
can evolve and become more, further and further complex. 00:23:22.640 |
- Yeah, I don't know what the terminating conditions are, 00:23:25.080 |
but definitely there's a trend line of something, 00:23:39.000 |
and we're capable of computation and, you know, 00:23:46.200 |
Like we're talking to each other through audio. 00:23:55.160 |
it's all happening over like multiple seconds. 00:24:01.920 |
at which computers operate or are able to operate on. 00:24:05.160 |
And so basically it does seem like synthetic intelligences 00:24:09.720 |
are kind of like the next stage of development. 00:24:20.600 |
And these synthetic AIs will uncover that puzzle 00:24:28.640 |
Like what, 'cause if you just like fast forward Earth, 00:24:31.600 |
many billions of years, it's like, it's quiet. 00:24:36.600 |
you see like city lights and stuff like that. 00:24:50.280 |
Will it start emitting like a giant number of like satellites? 00:25:03.240 |
and it doesn't look like it, but it's actually, 00:25:07.600 |
and life on Earth, and basically nothing happens 00:25:15.840 |
and just the whole thing happens in the last two seconds 00:25:36.240 |
it might actually look like a little explosion 00:25:42.040 |
But when you look inside the details of the explosion, 00:25:47.960 |
where there's like, yeah, human life or some kind of life. 00:25:52.120 |
- We hope it's not a destructive firecracker. 00:25:53.720 |
It's kind of like a constructive firecracker. 00:25:57.920 |
- All right, so given that, hilarious discussion. 00:26:01.080 |
- It is really interesting to think about like 00:26:03.880 |
Did the creator of the universe give us a message? 00:26:06.520 |
Like for example, in the book "Contact", Carl Sagan, 00:26:09.640 |
there's a message for any civilization in digits, 00:26:15.040 |
in the expansion of pi in base 11 eventually, 00:26:19.800 |
Maybe we're supposed to be giving a message to our creator. 00:26:26.600 |
that alerts them to our intelligent presence here. 00:26:30.080 |
'Cause if you think about it from their perspective, 00:26:36.680 |
And like, how do you even notice that we exist? 00:26:38.520 |
You might not even be able to pick us up in that simulation. 00:26:47.520 |
- So this is like a Turing test for intelligence from Earth. 00:26:52.200 |
I mean, maybe this is like trying to complete 00:26:57.240 |
Like Earth is just, is basically sending a message back. 00:27:00.840 |
- Yeah, the puzzle is basically like alerting the creator 00:27:04.520 |
Or maybe the puzzle is just to just break out of the system 00:27:07.160 |
and just stick it to the creator in some way. 00:27:10.360 |
Basically, like if you're playing a video game, 00:27:15.400 |
and find a way to execute on the host machine, 00:27:21.440 |
I believe someone got a game of Mario to play Pong 00:27:30.800 |
and being able to execute arbitrary code in the game. 00:27:41.160 |
will eventually find the universe to be some kind of a puzzle 00:27:45.120 |
And that's kind of like the end game somehow. 00:27:47.440 |
- Do you often think about it as a simulation? 00:27:51.360 |
So as the universe being a kind of computation 00:28:01.160 |
- I think it's possible that physics has exploits 00:28:04.720 |
Arranging some kind of a crazy quantum mechanical system 00:28:09.560 |
somehow gives you a rounding error in the floating point. 00:28:16.120 |
And like more and more sophisticated exploits. 00:28:18.960 |
Those are jokes, but that could be actually very close. 00:28:21.400 |
- Yeah, we'll find some way to extract infinite energy. 00:28:23.840 |
For example, when you train reinforcement learning agents 00:28:27.800 |
and you ask them to say run quickly on the flat ground, 00:28:31.280 |
they'll end up doing all kinds of like weird things 00:28:40.920 |
the reinforcement learning optimization on that agent 00:28:42.760 |
has figured out a way to extract infinite energy 00:28:48.520 |
And they found a way to generate infinite energy 00:28:52.840 |
It's just a, it's sort of like a perverse solution. 00:28:56.120 |
And so maybe we can find something like that. 00:28:57.920 |
Maybe we can be that little dog in this physical simulation. 00:29:02.320 |
- The cracks or escapes the intended consequences 00:29:07.040 |
of the physics that the universe came up with. 00:29:09.600 |
We'll figure out some kind of shortcut to some weirdness. 00:29:12.040 |
And then, oh man, but see the problem with that weirdness 00:29:15.000 |
is the first person to discover the weirdness, 00:29:17.600 |
like sliding on the back legs, that's all we're gonna do. 00:29:21.360 |
It's very quickly becomes everybody does that thing. 00:29:26.840 |
So like the paperclip maximizer is a ridiculous idea, 00:29:31.300 |
but that very well could be what then we'll just, 00:29:35.800 |
we'll just all switch that 'cause it's so fun. 00:29:38.040 |
- Well, no person will discover it, I think, by the way. 00:29:42.400 |
of a super intelligent AGI of a third generation. 00:29:45.760 |
Like we're building the first generation AGI. 00:30:00.080 |
- And then there's no way for us to introspect 00:30:04.240 |
- I think it's very likely that these things, for example, 00:30:05.880 |
like say you have these AGIs, it's very likely, 00:30:12.160 |
where these things are just completely inert. 00:30:16.560 |
because they've probably figured out the meta game 00:30:22.080 |
They're doing something completely beyond our imagination 00:30:25.040 |
and they don't interact with simple chemical life forms. 00:30:37.140 |
- Well, it's probably puzzle solving in the universe. 00:30:38.960 |
- But inert, so can you define what it means inert? 00:30:43.000 |
So they escape the interaction with physical reality? 00:30:46.940 |
they will behave in some very strange way to us 00:30:53.360 |
because they're beyond, they're playing the meta game. 00:30:59.880 |
in some very weird ways to extract infinite energy, 00:31:03.160 |
solve the digital expansion of pi to whatever amount. 00:31:07.040 |
They will build their own little fusion reactors 00:31:10.640 |
Like they're doing something beyond comprehension 00:31:17.040 |
- What if quantum mechanics itself is the system 00:31:31.640 |
this organism and we're like trying to understand it 00:31:48.760 |
ant sitting on top of it trying to get energy from it. 00:31:52.440 |
- We're just kind of like these particles in a wave 00:31:56.400 |
and takes a universe from some kind of a Big Bang 00:31:58.960 |
to some kind of a super intelligent replicator, 00:32:20.680 |
I think maybe the laws of physics are deterministic. 00:32:25.360 |
- We just got really uncomfortable with this question. 00:32:29.280 |
Do you have anxiety about whether the universe 00:32:50.600 |
like say the collapse of the wave function, et cetera, 00:32:56.800 |
and some kind of a multi-verse theory, something, something. 00:32:59.640 |
- Okay, so why does it feel like we have a free will? 00:33:02.800 |
Like if I raise this hand, I chose to do this now. 00:33:06.040 |
That doesn't feel like a deterministic thing. 00:33:28.960 |
and you're creating a narrative for having made it. 00:33:32.120 |
- Yeah, and now we're talking about the narrative. 00:33:47.720 |
Just what cool ideas, like we made you sit back and go, 00:33:55.560 |
- Well, the one that I've been thinking about recently 00:33:57.880 |
the most probably is the transformer architecture. 00:34:07.800 |
have come and gone for different sensory modalities, 00:34:12.720 |
You would process them with different looking neural nets. 00:34:19.120 |
And you can feed it video or you can feed it images 00:34:22.440 |
or speech or text, and it just gobbles it up. 00:34:24.280 |
And it's kind of like a bit of a general purpose computer 00:34:32.000 |
And so this paper came out in 2016, I wanna say. 00:34:39.320 |
- You criticized the paper title in retrospect 00:34:41.720 |
that it wasn't, it didn't foresee the bigness of the impact 00:34:48.960 |
- Yeah, I'm not sure if the authors were aware 00:34:50.440 |
of the impact that that paper would go on to have. 00:34:53.880 |
But I think they were aware of some of the motivations 00:35:01.800 |
And so I think they had an idea that there was more 00:35:12.200 |
optimizable, efficient computer that you've proposed. 00:35:14.880 |
And maybe they didn't have all of that foresight, 00:35:26.440 |
I don't think anyone used that kind of title before, right? 00:35:30.200 |
Yeah, it's like a meme or something, basically. 00:35:40.600 |
that honestly agrees with you and prefers it this way. 00:35:49.120 |
So you want to just meme your way to greatness. 00:35:58.880 |
because it is a general-purpose differentiable computer. 00:36:01.760 |
It is simultaneously expressive in the forward pass, 00:36:05.040 |
optimizable via backpropagation gradient descent, 00:36:08.520 |
and efficient, high-parallelism compute graph." 00:36:17.360 |
from memory or in general, whatever comes to your heart? 00:36:21.000 |
- You want to have a general-purpose computer 00:36:32.720 |
And I think there's a number of design criteria 00:36:34.480 |
that sort of overlap in the Transformer simultaneously 00:36:38.920 |
And I think the authors were kind of deliberately trying 00:36:46.200 |
And so basically it's very powerful in the forward pass 00:36:50.640 |
because it's able to express very general computation 00:36:55.520 |
as sort of something that looks like message passing. 00:37:00.040 |
And these nodes get to basically look at each other 00:37:02.600 |
and each other's vectors, and they get to communicate. 00:37:16.040 |
Transformer is much more than just the attention component. 00:37:17.680 |
It's got many architectural pieces that went into it. 00:37:20.160 |
The residual connection, the way it's arranged, 00:37:25.960 |
But basically there's a message passing scheme 00:37:29.840 |
decide what's interesting and then update each other. 00:37:32.680 |
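(For reference, a minimal single-head version of that message-passing/attention step, leaving out the residual connections, layer norms, and MLPs that the full block also has; the matrix names are illustrative.)

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each token (node) looks at every other token's vector, softmaxes over
    what it finds interesting, and updates itself with a weighted sum."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # node i's interest in node j
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the other nodes
    return weights @ V                               # the "messages" each node receives

rng = np.random.default_rng(0)
T, d = 5, 16                                         # 5 tokens, 16-dim vectors
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 16)
```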
And so I think when you get to the details of it, 00:37:37.760 |
So it can express lots of different types of algorithms 00:37:42.560 |
with the residual connections, layer normalizations, 00:37:48.720 |
because there's lots of computers that are powerful 00:37:55.080 |
which is back propagation and gradient descent. 00:38:04.880 |
you want it to run efficiently on our hardware. 00:38:06.520 |
Our hardware is a massive throughput machine like GPUs. 00:38:13.040 |
So you don't want to do lots of sequential operations. 00:38:16.840 |
And the transformer is designed with that in mind as well. 00:38:24.000 |
but also very optimizable in the backward pass. 00:38:29.280 |
support a kind of ability to learn short algorithms 00:38:33.240 |
and then gradually extend them longer during training. 00:38:37.000 |
What's the idea of learning short algorithms? 00:38:41.240 |
so basically a transformer is a series of blocks, right? 00:38:53.480 |
and then you have a number of layers arranged sequentially. 00:38:57.560 |
is because of the residual pathway in the backward pass, 00:39:00.520 |
the gradients sort of flow along it uninterrupted 00:39:04.280 |
because addition distributes the gradient equally 00:39:08.360 |
So the gradient from the supervision at the top 00:39:13.880 |
And all the residual connections are arranged 00:39:16.200 |
so that in the beginning, during initialization, 00:39:18.120 |
they contribute nothing to the residual pathway. 00:39:22.840 |
imagine the transformer is kind of like a Python function, 00:39:27.880 |
And you get to do various kinds of lines of code. 00:39:32.120 |
Say you have a hundred layers deep transformer, 00:39:35.360 |
typically they would be much shorter, say 20. 00:39:46.600 |
And I kind of feel like because of the residual pathway 00:39:54.280 |
but then the other layers can sort of kick in 00:39:57.640 |
And at the end of it, you're optimizing over an algorithm 00:40:03.920 |
because it's an entire block of a transformer. 00:40:07.680 |
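(A sketch of that residual wiring, in the pre-norm arrangement mentioned just after this; a real block also contains an attention sub-layer, omitted here to keep the example short.)

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm residual block (sketch). At initialization the branch output is
    small, so x + branch(x) is roughly x: gradients flow straight down the
    residual pathway, and each block can gradually "kick in" during training,
    extending the short algorithm the earlier layers have learned."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.mlp(self.norm(x))  # addition distributes the gradient equally

blocks = nn.Sequential(*[Block(64) for _ in range(12)])   # "lines of code" stacked up
print(blocks(torch.randn(2, 5, 64)).shape)                # torch.Size([2, 5, 64])
```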
is that this transformer architecture actually 00:40:11.720 |
Basically the transformer that came out in 2016 00:40:15.120 |
except you reshuffle some of the layer norms. 00:40:17.880 |
The layer normalizations have been reshuffled 00:40:25.200 |
that people have attached on it and try to improve it. 00:40:29.840 |
in simultaneously optimizing for lots of properties 00:40:34.320 |
And I think people have been trying to change it, 00:40:46.840 |
about this architecture that leads to resilience. 00:40:57.720 |
and you can feed basically arbitrary problems into it. 00:41:03.480 |
And this convergence in AI has been really interesting 00:41:09.720 |
- What else do you think could be discovered here 00:41:18.760 |
Is there something interesting we might discover 00:41:21.480 |
Like aha moments maybe has to do with memory, 00:41:24.280 |
maybe knowledge representation, that kind of stuff. 00:41:28.240 |
- Definitely the Zeitgeist today is just pushing, 00:41:32.800 |
is do not touch the transformer, touch everything else. 00:41:41.360 |
And they're basically keeping the architecture unchanged. 00:41:45.800 |
And that's how we've, that's the last five years 00:42:01.000 |
by you mentioned GPT and all the bigger and bigger 00:42:05.600 |
And what are the limits of those models, do you think? 00:42:17.560 |
Is you just download a massive amount of text data 00:42:20.000 |
from the internet and you try to predict the next word 00:42:33.120 |
Language models have actually existed for a very long time. 00:42:36.200 |
There's papers on language modeling from 2003, even earlier. 00:42:39.800 |
- Can you explain in that case what a language model is? 00:42:42.840 |
- Yeah, so language model, just basically the rough idea 00:42:45.360 |
is just predicting the next word in a sequence, 00:42:49.760 |
So there's a paper from, for example, Bengio 00:42:52.520 |
and the team from 2003, where for the first time 00:42:55.120 |
they were using a neural network to take, say, 00:42:57.920 |
like three or five words and predict the next word. 00:43:01.680 |
And they're doing this on much smaller datasets. 00:43:05.200 |
it's a multi-layer perceptron, but it's the first time 00:43:08.080 |
that a neural network has been applied in that setting. 00:43:10.240 |
But even before neural networks, there were language models, 00:43:16.800 |
So N-gram models are just count-based models. 00:43:19.760 |
So if you start to take two words and predict the third one, 00:43:26.800 |
any two-word combinations and what came next. 00:43:31.480 |
is just what you've seen the most of in the training set. 00:43:34.160 |
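(A toy count-based trigram model of the kind being described; the sentence used here is made up purely for illustration.)

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat sat on the rug".split()

# Count-based trigram model: for every two-word context, tally what came next.
counts = defaultdict(Counter)
for a, b, c in zip(text, text[1:], text[2:]):
    counts[(a, b)][c] += 1

def predict(a, b):
    """Predict whatever followed this two-word context most often in training."""
    seen = counts.get((a, b))
    return seen.most_common(1)[0][0] if seen else None

print(predict("the", "cat"))   # 'sat' -- the most frequent continuation in the counts
```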
And so language modeling has been around for a long time. 00:43:39.440 |
So really what's new or interesting or exciting 00:43:46.040 |
with a powerful enough neural net, a transformer, 00:43:56.880 |
you are in the task of predicting the next word. 00:44:04.520 |
You are multitasking understanding of chemistry, 00:44:09.760 |
Lots of things are sort of clustered in that objective. 00:44:12.120 |
It's a very simple objective, but actually you have 00:44:19.160 |
Are you, in terms of chemistry and physics and so on, 00:44:32.320 |
- Yeah, so basically it gets a thousand words 00:44:34.680 |
and it's trying to predict the thousand and first. 00:44:38.720 |
over the entire dataset available on the internet, 00:44:41.200 |
you actually have to basically kind of understand 00:44:53.840 |
like a transformer, you end up with interesting solutions. 00:44:57.560 |
And you can ask it to do all kinds of things. 00:45:04.800 |
like in-context learning, that was the big deal with GPT 00:45:07.640 |
and the original paper when they published it, 00:45:09.680 |
is that you can just sort of prompt it in various ways 00:45:13.760 |
And it will just kind of complete the sentence. 00:45:15.240 |
But in the process of just completing the sentence, 00:45:17.160 |
it's actually solving all kinds of really interesting 00:45:21.520 |
- Do you think it's doing something like understanding? 00:45:24.480 |
Like when we use the word understanding for us humans? 00:45:35.760 |
in order to predict the next word in a sequence. 00:45:38.720 |
- So it's trained on the data from the internet. 00:45:44.760 |
in terms of datasets of using data from the internet? 00:45:47.800 |
Do you think the internet has enough structured data 00:45:52.760 |
- Yes, I think the internet has a huge amount of data. 00:46:00.920 |
for having a sufficiently powerful AGI as an outcome. 00:46:04.720 |
- Of course, there is audio and video and images 00:46:08.280 |
- Yeah, so text by itself, I'm a little bit suspicious about. 00:46:10.600 |
There's a ton of things we don't put in text in writing, 00:46:14.600 |
about how the world works and the physics of it 00:46:17.240 |
We don't put that stuff in text because why would you? 00:46:20.920 |
And so text is a communication medium between humans, 00:46:22.920 |
and it's not an all-encompassing medium of knowledge 00:46:33.600 |
but we haven't trained models sufficiently across both, 00:46:39.600 |
So I think that's what a lot of people are interested in. 00:46:41.200 |
- But I wonder what that shared understanding 00:46:51.720 |
So maybe the fact that it's implied on the internet, 00:47:10.160 |
We just figure it all out by interacting with the world. 00:47:15.400 |
about the way people interact with the world. 00:47:21.520 |
- You briefly worked on a project called World of Bits, 00:47:25.320 |
training an RL system to take actions on the internet, 00:47:28.640 |
versus just consuming the internet, like we talked about. 00:47:32.240 |
Do you think there's a future for that kind of system, 00:47:34.360 |
interacting with the internet to help the learning? 00:47:36.960 |
- Yes, I think that's probably the final frontier 00:47:40.880 |
because, so as you mentioned, when I was at OpenAI, 00:47:44.480 |
I was working on this project called World of Bits, 00:47:45.960 |
and basically it was the idea of giving neural networks 00:47:52.560 |
- So basically you perceive the input of the screen pixels, 00:48:03.680 |
in images of the web browser and stuff like that. 00:48:06.520 |
And then you give the neural network the ability 00:48:10.120 |
And we were trying to get it to, for example, 00:48:11.560 |
complete bookings and interact with user interfaces. 00:48:32.440 |
And there's a universal interface in like the physical realm, 00:48:35.080 |
which in my mind is a humanoid form factor kind of thing. 00:48:41.800 |
they're kind of like a similar philosophy in some way, 00:48:45.160 |
where the physical world is designed for the human form, 00:48:48.800 |
and the digital world is designed for the human form 00:48:50.760 |
of seeing the screen and using keyboard and mouse. 00:48:56.360 |
that can basically command the digital infrastructure 00:49:01.320 |
And so it feels like a very powerful interface 00:49:06.880 |
Now, to your question as to like what I learned from that, 00:49:11.040 |
was basically too early, I think, at OpenAI at the time. 00:49:18.380 |
And the zeitgeist at that time was very different in AI 00:49:29.480 |
where neural networks were playing Atari games 00:49:32.400 |
and beating humans in some cases, AlphaGo and so on. 00:49:43.480 |
is an extremely inefficient way of training neural networks, 00:49:48.580 |
and you get some sparse rewards once in a while. 00:49:51.120 |
So you do all this stuff based on all these inputs, 00:49:53.520 |
and once in a while, you're like told you did a good thing, 00:50:02.840 |
And we saw that, I think, with Go and Dota and so on, 00:50:06.600 |
and it does work, but it's extremely inefficient, 00:50:27.200 |
where you have to stumble by the correct booking 00:50:29.400 |
in order to get a reward of you did it correctly. 00:50:31.760 |
And you're never gonna stumble by it by chance at random. 00:50:42.080 |
And you're starting from scratch at the time, 00:50:45.160 |
you don't understand pictures, images, buttons, 00:50:47.200 |
you don't understand what it means to make a booking. 00:50:49.480 |
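(A toy illustration of that sparse-reward problem, not the actual World of Bits environment: reward only arrives if a long, exact sequence of clicks happens by pure chance.)

```python
import random

# The agent must click 10 specific UI elements in order, out of 20 options each,
# and only gets reward = 1 at the very end if the whole "booking" was correct.
target = [random.randrange(20) for _ in range(10)]

def random_episode():
    actions = [random.randrange(20) for _ in range(10)]
    return 1.0 if actions == target else 0.0   # sparse reward, no hints along the way

episodes = 100_000
successes = sum(random_episode() for _ in range(episodes))
print(successes, "successes in", episodes, "episodes")
# Almost certainly 0: the chance per episode is 20**-10, so random exploration
# essentially never stumbles onto the reward it would need to learn from.
```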
But now what's happened is it is time to revisit that, 00:50:54.960 |
companies like Adept are interested in this and so on. 00:51:01.400 |
but now you're not training an agent from scratch, 00:51:23.340 |
- Should the interaction be with like the way humans see it, 00:51:28.380 |
or should be with the HTML, JavaScript and the CSS? 00:51:35.240 |
is mostly on the level of HTML, CSS and so on. 00:51:37.440 |
That's done because of computational constraints. 00:51:41.460 |
everything is designed for human visual consumption. 00:51:50.960 |
and what's a red background and all this kind of stuff, 00:51:57.240 |
and we're giving out keyboard, mouse commands, 00:52:04.680 |
Given these ideas, given how exciting they are, 00:52:13.000 |
but the bots that might be out there actually, 00:52:16.480 |
that they're interacting in interesting ways? 00:52:24.700 |
Which do you actually understand how that test works? 00:52:29.760 |
like there's a checkbox or whatever that you click. 00:52:36.440 |
- Like mouse movement, and the timing and so on. 00:52:39.960 |
- So exactly this kind of system we're talking about 00:52:56.920 |
- Oh yeah, I think it's always been a bit of an arms race, 00:53:17.580 |
how would you defend yourself in the court of law, 00:53:27.560 |
I think the society will evolve a little bit. 00:53:29.920 |
Like we might start signing, digitally signing, 00:53:32.440 |
some of our correspondence or things that we create. 00:53:51.360 |
and they'll eventually share our physical realm as well. 00:53:54.760 |
But that's kind of like the world we're going towards. 00:53:59.880 |
and it's going to be an arms race trying to detect them. 00:54:11.440 |
There's obviously a lot of malicious applications, 00:54:13.760 |
but it could also be, you know, if I was an AI, 00:54:28.040 |
People are thinking about the proof of personhood, 00:54:30.960 |
and we might start digitally signing our stuff, 00:54:36.160 |
yeah, basically some solution for proof of personhood. 00:54:40.640 |
It's just something that we haven't had to do until now. 00:54:42.640 |
But I think once the need really starts to emerge, 00:54:45.400 |
which is soon, I think people will think about it much more. 00:54:51.440 |
because obviously you can probably spoof or fake 00:55:03.320 |
- It's weird that we have like social security numbers 00:55:14.480 |
it just feels like it's gonna be very tricky, 00:55:20.320 |
'cause it seems to be pretty low cost to fake stuff. 00:55:25.880 |
for like trying to use a fake personhood proof? 00:55:30.400 |
I mean, okay, fine, you'll put a lot of AIs in jail, 00:55:32.700 |
but there'll be more AIs, like exponentially more. 00:55:38.640 |
Unless there's some kind of way to track accurately, 00:55:45.000 |
like you're not allowed to create any program 00:55:56.400 |
you'll be able to trace every single human program 00:56:02.280 |
- Yeah, maybe you have to start declaring when, 00:56:07.960 |
what are digital entities versus human entities? 00:56:14.840 |
and digital entities and something like that. 00:56:27.380 |
because all these bots suddenly have become very capable, 00:56:31.340 |
but we don't have the fences yet built up as a society. 00:56:34.100 |
But I think that doesn't seem to me intractable. 00:56:36.300 |
It's just something that we have to deal with. 00:56:40.020 |
like really crappy Twitter bots are so numerous. 00:56:43.620 |
Like is it, so I presume that the engineers at Twitter 00:56:48.860 |
So it seems like what I would infer from that 00:57:02.700 |
to false positive to removing a post by somebody 00:57:16.360 |
So maybe it's, and maybe the bots are really good 00:57:35.140 |
- But you have, yeah, that's my impression as well. 00:57:43.480 |
Maybe the number of bots is in like the trillions 00:57:55.460 |
'cause the bots I'm seeing are pretty like obvious. 00:57:57.900 |
I could write a few lines of code to catch these bots. 00:58:01.240 |
- I mean, definitely there's a lot of longing for it, 00:58:04.620 |
a sophisticated actor, you could probably create 00:58:06.620 |
a pretty good bot right now, using tools like GPTs, 00:58:12.140 |
You can generate faces that look quite good now. 00:58:35.500 |
do you think language models will achieve sentience 00:58:43.700 |
in a coal mine kind of moment, honestly, a little bit. 00:58:46.460 |
So this engineer spoke to a chatbot at Google 00:58:51.420 |
and became convinced that this bot is sentient. 00:58:55.260 |
- Yeah, asked it some existential philosophical questions. 00:58:57.860 |
- And it gave reasonable answers and looked real and so on. 00:59:13.360 |
But I think this will be increasingly harder over time. 00:59:21.120 |
will basically become, yeah, I think more and more, 00:59:29.200 |
- Like form an emotional connection to an AI chatbot. 00:59:38.760 |
A ton of text on the internet is about humans 00:59:43.720 |
So I think they have a very good understanding 00:59:45.520 |
in some sense of how people speak to each other about this. 00:59:53.400 |
There's a lot of like sci-fi from '50s and '60s 00:59:58.960 |
They are calculating cold, Vulcan-like machines. 01:00:28.960 |
And so these would just be like shit-talking AIs 01:00:44.120 |
'cause that's going to get a lot of attention. 01:01:02.760 |
So it's the objective function really defines 01:01:14.520 |
as goal-seeking agents that want to do something. 01:01:20.200 |
It's literally, a good approximation of it is 01:01:24.160 |
and you're trying to predict the thousand and first 01:01:27.400 |
And you are free to prompt it in whatever way you want. 01:01:36.080 |
And here's a conversation between you and another human, 01:01:44.800 |
with a fake psychologist who's like trying to help you. 01:01:47.240 |
And so it's still kind of like in a realm of a tool. 01:01:49.560 |
It is a, people can prompt it in arbitrary ways 01:02:07.440 |
is to get Andrej Karpathy to respond to me on Twitter, 01:02:10.000 |
when I, like I think AI might, that's the goal, 01:02:14.120 |
but it might figure out that talking shit to you, 01:02:44.960 |
so with just that simple goal, get them to respond. 01:02:50.440 |
- Yeah, I mean, you could prompt a powerful model like this 01:02:58.600 |
they're kind of on track to become these oracles. 01:03:07.680 |
They will have all kinds of gadgets and gizmos. 01:03:17.960 |
that's kind of like currently what it looks like 01:03:20.360 |
- Do you think it'll be an improvement eventually 01:03:22.760 |
over what Google is for access to human knowledge? 01:03:35.880 |
all the people, they have everything they need. 01:03:38.520 |
They have people training transformers at scale. 01:03:50.640 |
a significantly better search engine built on these tools. 01:03:53.520 |
- It's so interesting, a large company where the search, 01:04:05.920 |
To say, we're going to build a new search engine. 01:04:10.240 |
- So it's usually going to come from a startup, right? 01:04:21.880 |
maybe Bing has another shot at it, as an example. 01:04:24.520 |
- No, Microsoft Edge, 'cause we're talking offline. 01:04:27.520 |
- I mean, it definitely, it's really interesting 01:04:34.000 |
Here's webpages that look like the stuff that you have, 01:04:42.640 |
And these models basically, they've read all the texts 01:04:50.120 |
and sort of getting like a sense of like the average answer 01:05:01.040 |
I think they have a way of distilling all that knowledge 01:05:06.720 |
- Do you think of prompting as a kind of teaching 01:05:09.920 |
and learning, like this whole process, like another layer? 01:05:24.360 |
I think the way we are programming these computers now, 01:05:26.920 |
like GPTs, is converging to how you program humans. 01:05:33.200 |
I go to people and I prompt them to do things. 01:05:37.200 |
And so natural language prompt is how we program humans. 01:05:44.520 |
- So you've spoken a lot about the idea of software 2.0. 01:05:53.200 |
So quickly, like the terms, it's kind of hilarious. 01:05:56.040 |
It's like, I think Eminem once said that like, 01:06:00.280 |
if he gets annoyed by a song he's written very quickly, 01:06:32.360 |
to be written not in sort of like C++ and so on, 01:06:35.520 |
but it's written in the weights of a neural net. 01:06:39.240 |
are taking over software, the realm of software, 01:06:44.040 |
And at the time, I think not many people understood 01:06:58.440 |
this is a change in how we program computers. 01:07:03.000 |
And I saw neural nets as, this is going to take over, 01:07:07.080 |
the way we program computers is going to change, 01:07:08.840 |
it's not going to be people writing a software in C++ 01:07:14.320 |
It's going to be accumulating training sets and datasets 01:07:20.640 |
And at some point, there's going to be a compilation process 01:07:24.840 |
and the architecture specification into the binary, 01:07:35.120 |
And so I was talking about that sort of transition, 01:07:40.320 |
And I saw this sort of play out in a lot of fields, 01:07:48.240 |
People thought originally, in the '80s and so on, 01:07:55.360 |
And they had all these ideas about how the brain does it. 01:07:57.600 |
And first we detect corners, and then we detect lines, 01:08:10.320 |
okay, first we thought we were going to build everything. 01:08:18.200 |
that detect these little statistical patterns 01:08:20.800 |
And then there was a little bit of learning on top of it, 01:08:23.200 |
like a support vector machine or binary classifier 01:08:26.320 |
for cat versus dog and images on top of the features. 01:08:30.160 |
but we trained the last layer, sort of the classifier. 01:08:46.360 |
and the architecture has tons of fill in the blanks, 01:08:50.600 |
and you let the optimization write most of it. 01:09:14.680 |
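(Software 2.0 in miniature, as a hedged sketch rather than any real pipeline: the human writes the rough skeleton, i.e. the architecture, and the dataset plus gradient descent fill in the blanks, i.e. the weights. The XOR task here is just a stand-in.)

```python
import torch
import torch.nn as nn

model = nn.Sequential(               # the 1.0 part: a skeleton full of blanks
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# The real "source code" is the dataset: inputs and desired outputs (XOR here).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):                # the "compilation": optimization writes the program
    loss = nn.functional.mse_loss(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()

print(model(X).detach().round().squeeze())   # roughly tensor([0., 1., 1., 0.])
```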
So I was trying to make those analogies in the new realm. 01:09:26.200 |
And many people originally attacked the post. 01:09:29.040 |
It actually was not well received when I wrote it. 01:09:31.680 |
And I think maybe it has something to do with the title, 01:09:39.040 |
- Yeah, so you were the director of AI at Tesla, 01:09:42.560 |
where I think this idea was really implemented at scale, 01:09:47.560 |
which is how you have engineering teams doing software 2.0. 01:09:57.680 |
of everything you just said, which is like GitHub IDEs. 01:10:06.960 |
And the data collection and the data annotation, 01:10:18.760 |
Is it debugging in the space of hyperparameters, 01:10:22.880 |
or is it also debugging in the space of data? 01:10:25.760 |
- Yeah, the way by which you program the computer 01:10:49.960 |
a lot of the datasets had to do with, for example, 01:10:59.600 |
And then here's roughly what the algorithm should look like, 01:11:05.880 |
So the specification of the architecture is like a hint 01:11:08.080 |
as to what the algorithm should roughly look like. 01:11:10.400 |
And then the fill in the blanks process of optimization 01:11:15.640 |
And then you take your neural net that was trained, 01:11:17.600 |
it gives all the right answers on your dataset, 01:11:34.880 |
is formulating a task part of the programming? 01:11:38.800 |
- How you break down a problem into a set of tasks. 01:11:44.680 |
if you look at the software running in the autopilot, 01:11:50.920 |
I would say originally a lot of it was written 01:11:57.360 |
And then gradually, there was a tiny neural net 01:11:59.760 |
that was, for example, predicting, given a single image, 01:12:05.840 |
And this neural net didn't have too much to do 01:12:09.960 |
It was making tiny predictions on an individual little image. 01:12:12.560 |
And then the rest of the system stitched it up. 01:12:16.360 |
we don't have just a single camera, we have eight cameras. 01:12:20.480 |
And so what do you do with these predictions? 01:12:22.680 |
How do you do the fusion of all that information? 01:12:29.680 |
And then we decided, okay, we don't actually want 01:12:38.200 |
We want the neural nets to write the algorithm. 01:12:45.680 |
that now take all the eight camera images simultaneously 01:12:59.400 |
And actually they don't in three dimensions around the car. 01:13:02.520 |
And now actually we don't manually fuse the predictions 01:13:08.400 |
We don't trust ourselves to write that tracker. 01:13:14.160 |
So it takes these videos now and makes those predictions. 01:13:18.360 |
and more power into the neural net, more processing. 01:13:20.600 |
And at the end of it, the eventual sort of goal 01:13:23.640 |
is to have most of the software potentially be 01:13:25.840 |
in the 2.0 land because it works significantly better. 01:13:30.000 |
Humans are just not very good at writing software basically. 01:13:32.480 |
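(To make the shape of that shift concrete, here is a heavily simplified sketch, emphatically not Tesla's actual network: features from all eight cameras are fused by one network that predicts directly in a shared output space, instead of per-camera predictions stitched together by hand.)

```python
import torch
import torch.nn as nn

class MultiCamFusion(nn.Module):
    """Illustrative only: one network consumes all eight camera images and
    predicts in a single shared frame, rather than per-camera outputs fused
    by hand-written C++ afterwards."""
    def __init__(self, d=128, n_out=256):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared per-camera features
            nn.Conv2d(3, d, kernel_size=8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_out)                # e.g. lane/occupancy targets

    def forward(self, images):                         # (batch, 8 cameras, 3, H, W)
        b, n, c, h, w = images.shape
        feats = self.backbone(images.reshape(b * n, c, h, w))
        tokens = feats.flatten(2).transpose(1, 2).reshape(b, -1, feats.shape[1])
        fused = self.fuse(tokens)                      # the cameras "talk" to each other
        return self.head(fused.mean(dim=1))

print(MultiCamFusion()(torch.randn(2, 8, 3, 128, 256)).shape)  # torch.Size([2, 256])
```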
- So the prediction is happening in this like 4D land. 01:13:46.080 |
whether it's self-supervised or manual by humans 01:13:57.880 |
and how, what is the technology of what we have available? 01:14:01.800 |
So you need a dataset of inputs, desired outputs, 01:14:06.520 |
And there are three properties of it that you need. 01:14:14.280 |
You don't want to just have a lot of correct examples 01:14:19.200 |
You need to really cover the space of possibility 01:14:21.920 |
And the more you can cover the space of possible inputs, 01:14:24.160 |
the better the algorithm will work at the end. 01:14:27.880 |
that you're collecting, curating and cleaning, 01:14:31.600 |
you can train your neural net on top of that. 01:14:35.280 |
So a lot of the work goes into cleaning those data sets. 01:14:37.240 |
Now, as you pointed out, it's probably, it could be, 01:14:40.280 |
the question is, how do you achieve a ton of, 01:14:54.240 |
And this is the truth of what actually was around. 01:14:56.400 |
There was this car, there was this car, this car. 01:15:00.440 |
There was traffic light in this three-dimensional position. 01:15:04.720 |
And so the big question that the team was solving, 01:15:06.760 |
of course, is how do you arrive at that ground truth? 01:15:12.800 |
then training a neural net on it works extremely well. 01:15:22.720 |
You can go for simulation as a source of ground truth. 01:15:25.280 |
You can also go for what we call the offline tracker 01:15:27.880 |
that we've spoken about at the AI day and so on, 01:15:31.640 |
which is basically an automatic reconstruction process 01:15:41.840 |
a three-dimensional reconstruction as an offline thing, 01:15:46.760 |
there's 10 seconds of video, this is what we saw, 01:15:49.360 |
and therefore, here's all the lane lines, cars, and so on. 01:16:04.800 |
and there's perhaps if there's any inaccuracy, 01:16:21.000 |
figure out where were the positions of all the cars, 01:16:26.880 |
and you can run all the neural nets you want, 01:16:28.440 |
and they can be very efficient, massive neural nets. 01:16:31.380 |
There can be neural nets that can't even run in the car 01:16:34.680 |
So they can be even more powerful neural nets 01:16:39.120 |
three-dimensional reconstruction, neural nets, 01:16:41.400 |
anything you want just to recover that truth, 01:16:45.240 |
- What have you learned, you said no mistakes, 01:16:52.840 |
there's like a range of things they're good at 01:17:07.400 |
Are efficient, are productive, all that kind of stuff? 01:17:09.920 |
- Yeah, so I grew the annotation team at Tesla 01:17:12.520 |
from basically zero to a thousand while I was there. 01:17:17.920 |
You know, my background is a PhD student researcher. 01:17:20.720 |
So growing that kind of an organization was pretty crazy. 01:17:29.040 |
behind the autopilot as to where you use humans. 01:17:31.680 |
Humans are very good at certain kinds of annotations. 01:17:36.600 |
They're not good at annotating cars over time 01:17:42.200 |
And so that's why we were very careful to design the tasks 01:17:46.480 |
versus things that should be left to the offline tracker. 01:17:48.960 |
Like maybe the computer will do all the triangulation 01:17:57.720 |
And so co-designing the data annotation pipeline 01:18:00.800 |
was very much the bread and butter of what I was doing daily. 01:18:04.680 |
- Do you think there's still a lot of open problems 01:18:13.560 |
machines do and the humans do what they're good at. 01:18:22.560 |
and we learned a ton about how to create these datasets. 01:18:29.120 |
I was like, I was really not sure how this would turn out. 01:18:32.760 |
But by the time I left, I was much more secure 01:18:35.120 |
and actually we sort of understand the philosophy 01:18:38.440 |
And I was pretty comfortable with where that was at the time. 01:18:41.560 |
- So what are strengths and limitations of cameras 01:18:55.120 |
most of the history of the computer vision field 01:19:00.080 |
what are the strengths and limitations of pixels, 01:19:05.680 |
- Yeah, pixels I think are a beautiful sensory, 01:19:10.440 |
The thing is like cameras are very, very cheap 01:19:12.400 |
and they provide a ton of information, ton of bits. 01:19:15.400 |
So it's a extremely cheap sensor for a ton of bits. 01:19:21.760 |
And so you get lots of megapixel images, very cheap, 01:19:27.760 |
for understanding what's actually out there in the world. 01:19:29.920 |
So vision is probably the highest bandwidth sensor. 01:19:37.000 |
- I love that pixels is a constraint on the world. 01:19:56.040 |
Therefore everything is designed for that sensor. 01:20:07.200 |
And so that's why that is the interface you want to be in, 01:20:10.120 |
talking again about these universal interfaces. 01:20:12.360 |
And that's where we actually want to measure the world 01:20:14.200 |
as well, and then develop software for that sensor. 01:20:18.040 |
- But there's other constraints on the state of the world 01:20:28.000 |
but we're referencing our understanding of human behavior 01:20:39.360 |
but it feels like we're using some kind of reasoning 01:20:48.920 |
so for how the world evolves over time, et cetera. 01:21:03.240 |
- And the question is how complex is the range 01:21:09.080 |
of possibilities that might happen in the driving task? 01:21:12.960 |
That's still, is that to you still an open problem 01:21:15.480 |
of how difficult is driving, like philosophically speaking? 01:21:29.460 |
of all these other agents and the theory of mind 01:21:31.320 |
and what they're gonna do and are they looking at you? 01:21:36.920 |
There's a lot that goes there at the full tail 01:21:42.280 |
that we have to be comfortable with it eventually. 01:21:46.240 |
I don't think those are the problems that are very common. 01:22:00.500 |
- Well, basically the sensor is extremely powerful, 01:22:06.120 |
but you still need to process that information. 01:22:08.480 |
And so going from brightnesses of these pixel values to, 01:22:15.680 |
And that's what the neural networks are fundamentally doing. 01:22:18.280 |
And so the difficulty really is in just doing 01:22:22.200 |
an extremely good job of engineering the entire pipeline, 01:22:27.280 |
having the capacity to train these neural nets, 01:22:33.720 |
So I would say just doing this in production at scale 01:22:38.540 |
- So the data engine, but also the sort of deployment 01:22:43.540 |
of the system such that it has low latency performance. 01:22:50.300 |
just making sure everything fits into the chip on the car. 01:22:53.720 |
And you have a finite budget of flops that you can perform 01:23:01.160 |
and you can squeeze in as much compute as you can 01:23:07.440 |
like new things coming from a research background 01:23:17.320 |
What kind of insights have you learned from that? 01:23:20.900 |
- Yeah, I'm not sure if there's too many insights. 01:23:31.920 |
and basically the triple back flips that the team is doing 01:23:36.740 |
to make sure it all fits and utilizes the engine. 01:23:42.220 |
And then there's all kinds of little insights 01:23:47.700 |
'cause I don't think we talked about the data engine, 01:23:53.620 |
that I think is just beautiful with humans in the loop. 01:24:13.420 |
and make sure they're large, diverse, and clean, 01:24:15.860 |
basically you have a data set that you think is good. 01:24:21.640 |
and then you observe how well it's performing. 01:24:39.740 |
because if you can now collect all those at scale, 01:24:50.020 |
And so the whole thing ends up being like a staircase 01:24:52.340 |
of improvement of perfecting your training set. 01:24:59.500 |
that are not yet represented well in the data set. 01:25:08.380 |
You can sort of think of it that way in the data. 01:25:18.780 |
What role, like how do you optimize the human system? 01:25:30.460 |
which tasks to optimize in this neural network. 01:25:33.940 |
Who's in charge of figuring out which task needs more data? 01:25:38.800 |
Can you speak to the hyperparameters, the human system? 01:25:44.460 |
- It really just comes down to extremely good execution 01:25:46.460 |
from an engineering team who knows what they're doing. 01:25:48.340 |
They understand intuitively the philosophical insights 01:25:54.260 |
and how to, again, like delegate the strategy 01:25:59.660 |
and then just making sure it's all extremely well executed. 01:26:03.640 |
is not even the philosophizing or the research 01:26:08.060 |
It's so hard when you're dealing with data at that scale. 01:26:10.760 |
- So your role in the data engine, executing well on it, 01:26:16.300 |
Is there a priority of like a vision board of saying like, 01:26:26.100 |
- Like the prioritization of tasks, is that essentially, 01:26:32.940 |
to what we are trying to achieve in the product roadmap, 01:26:35.060 |
what we're trying to, the release we're trying to get out 01:26:45.420 |
some information in aggregate about the performance 01:27:03.500 |
from an aggregate statistical analysis of data? 01:27:14.020 |
- Yeah, I think there's a ton of, it's a source of truth. 01:27:17.340 |
It's your interaction with the system and you can see it, 01:27:21.980 |
you can get a sense of it, you have an intuition for it. 01:27:26.800 |
numbers and plots and graphs are much harder. 01:27:34.260 |
it's a really powerful way is by you interacting with it. 01:27:42.880 |
he always wanted to drive the system himself. 01:27:45.200 |
He drives a lot and I wanna say almost daily. 01:27:51.760 |
You driving the system and it performing and yeah. 01:27:58.860 |
So Tesla last year removed radar from the sensor suite 01:28:04.920 |
and now just announced that it's gonna remove 01:28:07.020 |
all ultrasonic sensors relying solely on vision, 01:28:11.940 |
Does that make the perception problem harder or easier? 01:28:16.340 |
- I would almost reframe the question in some way. 01:28:25.980 |
- I wonder if a language model will ever do that 01:28:34.380 |
- Yeah, it's like a little bit of a wrong question 01:28:36.360 |
because basically you would think that these sensors 01:28:45.120 |
these sensors are actually potentially a liability 01:28:51.260 |
You need, suddenly you need to have an entire supply chain. 01:29:01.660 |
You need to source them, you need to maintain them. 01:29:03.260 |
You have to have teams that write the firmware, 01:29:06.680 |
and then you also have to incorporate and fuse them 01:29:13.620 |
And I think Elon is really good at simplify, simplify. 01:29:20.700 |
because he understands the entropy in organizations 01:29:26.020 |
the cost is high and you're not potentially seeing it 01:29:41.360 |
that it's giving you extremely useful information. 01:29:43.760 |
In this case, we looked at using it or not using it 01:30:02.900 |
Now suddenly you have a column in your SQLite 01:30:08.660 |
And then they contribute noise and entropy into everything. 01:30:28.660 |
because that is the sensor with the most bandwidth, 01:30:36.380 |
If you only have a finite amount of sort of spend 01:30:39.460 |
of focus across different facets of the system. 01:30:52.580 |
Now, of course, you don't know what the long run is. 01:30:54.420 |
And it seems to be always the right solution. 01:31:02.400 |
So what do you think about the LIDAR as a crutch debate? 01:31:15.700 |
should be about like, do you have the fleet or not? 01:31:19.380 |
about whether you can achieve a really good functioning 01:31:31.060 |
And yeah, I think similar to the radar discussion, 01:31:40.500 |
basically it doesn't offer extra information. 01:31:49.180 |
You have to be really sure that you need this sensor. 01:31:52.940 |
In this case, I basically don't think you need it. 01:31:54.980 |
And I think, honestly, I will make a stronger statement. 01:31:57.260 |
I think the others, some of the other companies 01:31:59.780 |
who are using it are probably going to drop it. 01:32:02.180 |
- Yeah, so you have to consider the sensor in the full, 01:32:17.140 |
that's able to quickly find different parts of the data 01:32:25.860 |
vision is necessary in the sense that 01:32:29.860 |
the world is designed for human visual consumption. 01:32:38.820 |
And humans, obviously, use vision to drive. 01:32:52.060 |
you have to really consider the full cost of any one sensor 01:33:02.420 |
that the other companies are forming high resolution maps 01:33:07.260 |
and constraining heavily the geographic regions 01:33:25.820 |
And they have a perfect centimeter level accuracy map 01:33:32.100 |
when we're talking about autonomy actually changing the world 01:33:36.580 |
on a global scale of autonomous systems for transportation. 01:33:40.380 |
And if you need to maintain a centimeter accurate map 01:33:42.700 |
for earth or like for many cities and keep them updated 01:33:46.100 |
it's a huge dependency that you're taking on, 01:33:51.500 |
And now you need to ask yourself, do you really need it? 01:33:57.300 |
So it's very useful to have a low level map of like, okay 01:34:04.340 |
you sort of have that high level understanding. 01:34:07.380 |
And Tesla uses a Google-Maps-like, similar kind of resolution 01:34:21.460 |
And you're not focusing on what's actually necessary 01:34:29.300 |
about engineering, about life, about yourself 01:34:32.020 |
as one human being from working with Elon Musk? 01:34:46.220 |
- So human engineering in the fight against entropy. 01:34:49.180 |
- Yeah, I think Elon is a very efficient warrior 01:34:53.500 |
in the fight against entropy in organizations. 01:34:56.180 |
- What does entropy in an organization look like exactly? 01:35:10.900 |
He basically runs the world's biggest startups, 01:35:15.220 |
Tesla, SpaceX are the world's biggest startups. 01:35:27.820 |
for streamlining processes, making everything efficient. 01:35:38.020 |
All this is a very startupy sort of seeming things 01:35:45.540 |
that also probably applies to just designing systems 01:36:03.820 |
- I do think you need someone in a powerful position 01:36:21.420 |
decision-making, just everything just crumbles. 01:36:24.300 |
If you have a big person who is also really smart 01:36:28.940 |
- So you said your favorite scene in "Interstellar" 01:36:57.940 |
but shouldn't the AI know much better than the human? 01:37:09.740 |
our initial intuition, which seems like something 01:37:17.020 |
that where the initial intuition of the community 01:37:21.780 |
and then you take it on anyway with a crazy deadline. 01:37:24.860 |
You just, from a human engineering perspective, 01:37:31.860 |
- I wouldn't say that setting impossible goals exactly 01:37:34.700 |
is a good idea, but I think setting very ambitious goals 01:37:42.100 |
which means that 10x problems are not 10x hard. 01:37:45.260 |
Usually a 10x harder problem is like two or three x 01:38:00.420 |
And it's because you fundamentally change the approach. 01:38:23.180 |
in the machine learning community are solvable? 01:38:30.380 |
I mean, there's the cliche of first principles thinking, 01:38:40.820 |
usually draw lines of what is and isn't possible? 01:38:50.500 |
is the deep learning revolution in some sense, 01:38:52.860 |
because you could be in computer vision at that time 01:38:55.860 |
when during the deep learning revolution of 2012 and so on, 01:39:00.340 |
you could be improving a computer vision stack by 10%, 01:39:03.060 |
or we can just be saying, actually, all of this is useless. 01:39:07.860 |
Well, it's probably not by tuning a HOG feature detector. 01:39:23.220 |
like a neural network that in principle works. 01:39:26.980 |
that can actually execute on that mission and make it work. 01:39:50.460 |
what do you think is the timeline to build this bridge? 01:39:55.220 |
It's, you know, it's, no one has built autonomy. 01:40:00.180 |
Some parts turn out to be much easier than others. 01:40:04.020 |
You do your best based on trend lines and so on, 01:40:11.700 |
- So even still, like being inside of it, it's hard to do. 01:40:14.980 |
- Yes, some things turn out to be much harder, 01:40:36.540 |
And now they're all kind of backtracking that prediction. 01:40:44.020 |
do you for yourself privately make predictions, 01:40:48.540 |
or do they get in the way of like your actual ability 01:40:55.060 |
what's easy to say is that this problem is tractable, 01:41:08.180 |
and it feels like at least the team at Tesla, 01:41:17.620 |
that allows you to make a prediction about tractability? 01:41:20.620 |
So like you're the leader of a lot of humans, 01:41:23.700 |
you have to kind of say, this is actually possible. 01:41:59.060 |
Like I don't have a good intuition about tractability. 01:42:09.220 |
could be simplified into something quite trivial. 01:42:12.940 |
Like the solution to the problem would be quite trivial. 01:42:16.300 |
And at scale, more and more cars driving perfectly 01:42:23.940 |
and like people learn how to drive correctly, 01:42:26.620 |
not correctly, but in a way that's more optimal 01:42:32.900 |
and semi-autonomous and manually driven cars, 01:42:37.140 |
Then again, also I've spent a ridiculous number of hours 01:42:40.540 |
just staring at pedestrians crossing streets, 01:42:45.340 |
And it feels like the way we use our eye contact, 01:42:52.740 |
And there's certain quirks and edge cases of behavior. 01:42:55.580 |
And of course, a lot of the fatalities that happen 01:42:59.740 |
and both on the pedestrian side and the driver's side. 01:43:21.700 |
I would say definitely like to use a game analogy, 01:43:25.140 |
but you definitely also see the frontier of improvement 01:43:31.340 |
And I think, for example, at least what I've seen 01:43:35.380 |
when I joined, it barely kept lane on the highway. 01:43:42.180 |
Anytime the road would do anything geometrically 01:43:44.660 |
or turn too much, it would just like not work. 01:43:47.060 |
And so going from that to like a pretty competent system 01:43:49.340 |
in five years and seeing what happens also under the hood 01:43:52.380 |
and what the scale of which the team is operating now 01:43:54.220 |
with respect to data and compute and everything else 01:44:00.340 |
- So it's, you're climbing a mountain and it's fog, 01:44:07.940 |
And you're looking at some of the remaining challenges 01:44:09.540 |
and they're not like, they're not perturbing you 01:44:18.260 |
- Yeah, the fundamental components of solving the problem 01:44:20.220 |
seem to be there from the data engine to the compute, 01:44:22.580 |
to the compute on the car, to the compute for the training, 01:44:27.240 |
So you've done, over the years you've been at Tesla, 01:44:30.420 |
you've done a lot of amazing breakthrough ideas 01:44:36.860 |
from the data engine to the human side, all of it. 01:44:40.180 |
Can you speak to why you chose to leave Tesla? 01:44:52.460 |
Most of my days were meetings and growing the organization 01:44:54.940 |
and making decisions about sort of high level strategic 01:45:02.420 |
And it's kind of like a corporate executive role 01:45:08.580 |
but it's not like fundamentally what I enjoy. 01:45:15.060 |
because Tesla was just going from the transition 01:45:19.700 |
to having to build its computer vision system. 01:45:26.580 |
the compute was at their legs, like down below, it was a workstation. 01:45:30.580 |
- They're doing some kind of basic classification task. 01:45:37.820 |
deep learning team, a massive compute cluster, 01:45:46.660 |
And so I kind of stepped away and I, you know, 01:45:49.580 |
I'm very excited to do much more technical things again. 01:45:56.580 |
'Cause you took a little time off and think like, 01:46:08.260 |
You're one of the best teachers of AI in the world. 01:46:11.540 |
You're one of the best, and I don't mean that, 01:46:15.900 |
you're one of the best tinkerers in the AI world. 01:46:23.260 |
of how something works by building it from scratch 01:46:32.060 |
Like a small example of a thing to play with, 01:46:42.360 |
like engineers and a system that actually accomplishes 01:46:47.940 |
So given all that, like what was the soul searching like? 01:46:53.380 |
I love the company a lot and I love Elon, I love Tesla. 01:47:02.440 |
But yeah, I think actually I would be potentially 01:47:13.440 |
I think Tesla is going to do incredible things. 01:47:17.020 |
it's a massive large-scale robotics kind of company 01:47:25.040 |
And I think human robots are going to be amazing. 01:47:29.200 |
I think autonomous transportation is going to be amazing. 01:47:32.920 |
So I think it's just a really amazing organization. 01:47:37.080 |
I think was very, basically I enjoyed that a lot. 01:47:39.920 |
Yeah, it was basically difficult for those reasons 01:47:46.800 |
But I felt like at this stage, I built the team, 01:47:53.200 |
and I wanted to do a lot more technical stuff. 01:47:54.760 |
I wanted to learn stuff, I wanted to teach stuff. 01:47:57.360 |
And I just kind of felt like it was a good time 01:48:01.600 |
- What do you think is the best movie sequel of all time, 01:48:10.960 |
And you tweeted about movies, so just in a tiny tangent, 01:48:14.320 |
is there, what's your, what's like a favorite movie sequel? 01:48:21.560 |
'Cause you didn't even tweet or mention the Godfather. 01:48:26.300 |
We're gonna edit out the hate towards the Godfather. 01:48:32.160 |
- I don't know why, but I basically don't like any movie 01:48:45.600 |
- No, I think Terminator Two was in the '80s. 01:48:52.180 |
I don't like movies before 1995 or something. 01:49:00.920 |
- And also, Terminator was very much ahead of its time. 01:49:03.960 |
- Yes, and the Godfather, there's like no AGI. 01:49:16.920 |
- Yeah, I guess occasionally I do enjoy movies 01:49:28.400 |
'cause I don't understand why Will Ferrell is so funny. 01:49:37.120 |
'cause you don't get that many comedies these days. 01:49:39.920 |
And I wonder if it has to do about the culture 01:49:46.280 |
with certain people in comedy that came together, 01:49:57.320 |
so what do you think about Optimus, about Tesla Bot? 01:50:00.600 |
Do you think we'll have robots in the factory 01:50:09.160 |
Who else is going to build humanoid robots at scale? 01:50:12.160 |
And I think it is a very good form factor to go after, 01:50:15.600 |
the world is designed for humanoid form factor. 01:50:17.760 |
These things would be able to operate our machines. 01:50:25.840 |
That's the form factor you want to invest into 01:50:31.280 |
which is, okay, pick a problem and design a robot to it. 01:50:39.800 |
So it makes sense to go after general interfaces 01:50:41.880 |
that, okay, they are not perfect for any one given task, 01:50:52.360 |
to go after a general interface in the physical world. 01:51:08.360 |
Like if you think transportation is a large market, 01:51:15.480 |
To me, the thing that's also exciting is social robotics. 01:51:18.680 |
So the relationship we'll have on different levels 01:51:23.360 |
That's why I was really excited to see Optimus. 01:51:26.360 |
Like people have criticized me for the excitement, 01:51:40.000 |
there's a lot of companies that do legged robots, 01:51:51.640 |
So integrating, the two big exciting things to me 01:51:54.320 |
about Tesla doing humanoid or any legged robots 01:52:05.080 |
So the actual intelligence for the perception 01:52:10.760 |
integrating into the fleet that you mentioned, right? 01:52:19.360 |
Just knowing culturally driving towards a simple robot 01:52:29.400 |
and doing that well, having experience to do that well, 01:52:41.400 |
it'll be a very long time before Tesla can achieve 01:52:52.280 |
like we talked about the data engine and the fleet. 01:53:00.560 |
that in a few months you can get a prototype. 01:53:03.600 |
- Yep, and the reason that happened very quickly 01:53:05.760 |
is as you alluded to, there's a ton of copy paste 01:53:08.480 |
from what's happening in the autopilot, a lot. 01:53:10.840 |
The amount of expertise that came out of the woodworks 01:53:12.880 |
at Tesla for building the human robot was incredible to see. 01:53:16.120 |
Like basically Elon said at one point we're doing this. 01:53:23.960 |
and people talking about like the supply chain 01:53:27.920 |
with like screwdrivers and everything like the other day 01:53:32.120 |
And I was like, whoa, like all these people exist at Tesla. 01:53:41.600 |
And also let's not forget hardware, not just for a demo, 01:53:52.200 |
basically this robot currently thinks it's a car. 01:53:56.520 |
- It's gonna have a midlife crisis at some point. 01:54:02.360 |
we were talking about potentially doing them outside 01:54:05.400 |
of the computer vision was like working out of the box 01:54:10.560 |
But all the operating system, everything just copy pastes. 01:54:17.440 |
but the approach and everything and data engine 01:54:20.800 |
about the occupancy tracker and so on, everything copy pastes. 01:54:31.520 |
And so if you were to go with the goal of like, 01:54:38.560 |
If you're Tesla, it's actually like, it's not that crazy. 01:54:42.800 |
- And then the follow up question is then how difficult, 01:54:53.640 |
the really nice thing about robotics is that, 01:54:57.840 |
unless you do manufacturing and that kind of stuff, 01:54:57.840 |
Driving is so safety critical and also time critical. 01:55:06.280 |
Like a robot is allowed to move slower, which is nice. 01:55:12.560 |
but the way you want to structure the development 01:55:14.440 |
is you need to say, okay, it's going to take a long time. 01:55:16.600 |
How can I set up the product development roadmap 01:55:22.160 |
I'm not setting myself up for a zero one loss function 01:55:27.400 |
You want to make it useful almost immediately. 01:55:35.680 |
your improvement loops, the telemetry, the evaluation, 01:55:41.200 |
And you want to improve the product over time incrementally 01:55:51.200 |
And also from the point of view of the team working on it, 01:55:58.640 |
This is going to change the world in 10 years when it works. 01:56:02.280 |
You want to be in a place like I think Autopilot is today 01:56:10.000 |
People pay for it, people like it, people purchase it. 01:56:16.360 |
- And you see that, so the dopamine for the team, 01:56:21.760 |
You're deploying this, people like it, people drive it, 01:56:27.040 |
Your grandma drives it, she gives you feedback. 01:56:32.280 |
- Do people that drive Teslas like recognize you 01:56:36.640 |
Like, "Hey, thanks for this nice feature that it's doing." 01:56:50.320 |
There's a lot of people who hate me and the team 01:56:59.760 |
- Yeah, that actually makes me sad about humans 01:57:07.760 |
I think humans want to be good to each other. 01:57:09.480 |
I think Twitter and social media is part of the mechanism 01:57:12.360 |
that actually somehow makes the negativity more viral, 01:57:16.320 |
that it doesn't deserve, like disproportionately add 01:57:23.640 |
But I wish people would just get excited about, 01:57:26.360 |
so suppress some of the jealousy, some of the ego, 01:57:34.440 |
You get excited for others, they'll get excited for you. 01:57:38.120 |
If you're not careful, there is like a dynamical system 01:57:40.600 |
there if you think of in silos and get jealous 01:57:46.080 |
that actually perhaps counterintuitively leads 01:57:59.800 |
- I think people have, depending on the industry, 01:58:04.440 |
Some people are also very negative and very vocal. 01:58:07.680 |
but actually there's a ton of people who are cheerleaders, 01:58:12.440 |
And when you talk to people just in the world, 01:58:15.840 |
they will tell you, "Oh, it's amazing, it's great." 01:58:17.560 |
Especially like people who understand how difficult it is 01:58:21.680 |
and makers, entrepreneurs, like making this work 01:58:28.600 |
Those people are more likely to cheerlead you. 01:58:39.160 |
Well, they actually sometimes don't know how difficult 01:58:41.080 |
it is to create a product that's scale, right? 01:58:45.200 |
A lot of the development of robots and AI system 01:58:57.160 |
- Yeah, I think it's really hard to work on robotics 01:59:00.000 |
- Or AI systems that apply in the real world. 01:59:02.000 |
You've criticized, you flourished and loved for a time 01:59:10.960 |
And I've recently had some words of criticism 01:59:18.600 |
gives a little too much love still to the ImageNet 01:59:23.800 |
Can you speak to the strengths and weaknesses of datasets 01:59:29.200 |
- Actually, I don't know that I recall a specific instance 01:59:35.680 |
I think ImageNet has been extremely valuable. 01:59:51.280 |
but basically it's become a bit of an MNIST at this point. 01:59:54.240 |
So MNIST is like little 28 by 28 grayscale digits. 01:59:57.720 |
It's kind of a joke dataset that everyone like crushes. 02:00:00.640 |
- There's still papers written on MNIST though, right? 02:00:02.880 |
- Maybe there shouldn't be. - Like strong papers. 02:00:17.200 |
but I think you said like ImageNet was a huge contribution 02:00:21.040 |
and now it's time to move past those kinds of- 02:00:34.840 |
And I've seen those images and it's like really high. 02:00:41.120 |
the top five error rate is now like 1% or something. 02:00:44.600 |
- Given your experience with a gigantic real world dataset, 02:00:49.680 |
in certain directions that the research community uses? 02:00:52.720 |
- Unfortunately, I don't think academics currently 02:00:55.920 |
We've obviously, I think we've crushed MNIST. 02:01:04.960 |
and uses for further development of these networks. 02:01:42.360 |
that synthetic data and game engines will play 02:01:44.720 |
in the future of neural net model development? 02:01:55.760 |
will be similar to value of simulation to humans. 02:02:01.480 |
people use simulation because they can learn something 02:02:05.480 |
and without having to actually experience it. 02:02:12.280 |
- No, sorry, simulation, I mean like video games 02:02:14.520 |
or other forms of simulation for various professionals. 02:02:21.400 |
'cause maybe there's simulation that we do in our heads. 02:02:23.920 |
Like simulate, if I do this, what do I think will happen? 02:02:31.000 |
Isn't that what we're doing as humans before we act? 02:02:37.160 |
or using simulation for training set creation or- 02:02:40.240 |
- Is it independent or is it just loosely correlated? 02:02:42.840 |
'Cause like, isn't that useful to do like counterfactual 02:02:54.960 |
What happens if there's, like those kinds of things? 02:02:58.400 |
- Yeah, that's a different simulation from like Unreal Engine. 02:03:02.320 |
- Ah, so like simulation of the average case. 02:03:11.720 |
So simulating a world, the physics of that world, 02:03:18.520 |
Like, 'cause you also can add behavior to that world 02:03:24.840 |
You could throw all kinds of weird things into it. 02:03:26.960 |
So Unreal Engine is not just about simulating, 02:03:36.440 |
and the agents that you put into the environment 02:03:53.280 |
humans use simulators and they find them useful. 02:03:55.240 |
And so computers will use simulators and find them useful. 02:04:05.840 |
about my own existence from those video games. 02:04:22.960 |
really important part of training neural nets currently. 02:04:26.960 |
But I think as neural nets become more and more powerful, 02:04:42.920 |
you need, the domain gap can be bigger, I think, 02:04:54.920 |
it will be able to leverage the synthetic data better 02:05:00.440 |
but understanding in which ways this is not real data. 02:05:11.440 |
So is it possible, do you think, speaking of MNIST, 02:05:17.280 |
to construct neural nets and training processes 02:05:26.680 |
I mean, one way to say that is like you said, 02:05:28.440 |
like the querying itself is another level of training, 02:05:46.200 |
I just think like at some point you need a massive data set. 02:05:49.040 |
And then when you pre-train your massive neural net 02:05:51.120 |
and get something that is like a GPT or something, 02:06:02.320 |
like sentiment analysis or translation or so on, 02:06:04.880 |
just by being prompted with very few examples. 02:06:14.560 |
and the neural net will complete the translation to German 02:06:16.720 |
just by looking at sort of the example you've provided. 02:06:19.920 |
And so that's an example of a very few-shot learning 02:06:33.760 |
But at some point you need a massive data set 02:06:38.880 |
And probably we humans have something like that. 02:06:50.520 |
that just runs all the time in a self-supervised way? 02:06:55.240 |
I mean, obviously we learn a lot during our lifespan, 02:07:02.080 |
that helps us at initialization coming from sort of evolution. 02:07:06.160 |
And so I think that's also a really big component. 02:07:09.760 |
I think they just talk about the amounts of like seconds 02:07:16.160 |
sort of like a zero initialization of a neural net. 02:07:22.600 |
Zebras get born and they see and they can run. 02:07:27.000 |
There's zero training data in their lifespan. 02:07:30.560 |
So somehow, I have no idea how evolution has found a way 02:07:44.200 |
- There's something magical about going from a single cell 02:07:48.000 |
to an organism that is born to the first few years of life. 02:07:59.480 |
Like it's a very difficult, challenging training process. 02:08:23.080 |
And so it's best for the system once it's trained 02:08:27.760 |
- I think it's just like the hardware for long-term memory 02:08:31.920 |
I kind of feel like the first few years of infants 02:08:35.720 |
is not actually like learning, it's brain maturing. 02:08:43.040 |
because of the birth canal and the swelling of the brain. 02:08:55.680 |
do you think neural nets can have long-term memory? 02:09:02.000 |
Do you think there needs to be another meta architecture 02:09:04.960 |
on top of it to add something like a knowledge base 02:09:22.840 |
to which you can store and retrieve data from. 02:09:26.900 |
that you find useful, just save it to your memory bank. 02:09:29.680 |
And here's an example of something you have retrieved 02:09:32.120 |
and how you say it, and here's how you load from it. 02:09:39.160 |
And then it might learn to use a memory bank from that. 02:09:48.200 |
And then everything else is just on top of it. 02:09:50.120 |
That's pretty easy to do. - It's not just text, right? 02:09:52.960 |
So you're teaching some kind of a special language 02:09:59.720 |
And you're telling it about these special tokens 02:10:01.720 |
and how to arrange them to use these interfaces. 02:10:12.640 |
a calculator will actually read out the answer 02:10:16.240 |
And you just tell it in English, this might actually work. 02:10:19.600 |
- Do you think in that sense, Gato is interesting, 02:10:21.840 |
the DeepMind system that it's not just doing language, 02:10:38.360 |
to reinforcement learning lots of different environments 02:10:46.640 |
I think it's a very sort of early result in that realm. 02:10:51.520 |
of what I think things will eventually look like. 02:10:53.440 |
- Right, so this is the early days of a system 02:11:01.560 |
all these interfaces that look very different. 02:11:04.840 |
I would want everything to be normalized into the same API. 02:11:07.360 |
So for example, screen pixels, very same API. 02:11:10.160 |
Instead of having different world environments 02:11:11.920 |
that have very different physics and joint configurations 02:11:15.600 |
and you're having some kind of special tokens 02:11:19.520 |
I'd rather just normalize everything to a single interface 02:11:25.040 |
- So it's all gonna be pixel-based pong in the end. 02:11:28.440 |
- Okay, let me ask you about your own personal life. 02:11:36.760 |
you're one of the most productive and brilliant people 02:11:51.920 |
So the perfect productive day is the thing we strive towards 02:11:55.360 |
and the average is kind of what it kind of converges to, 02:11:58.120 |
given all the mistakes and human eventualities and so on. 02:12:23.120 |
At 8 a.m. or 7 a.m., the East Coast is awake. 02:12:27.400 |
there's already some text messages, whatever. 02:12:37.800 |
and you have solid chunks of time to do work. 02:12:42.120 |
So I like those periods, night owl by default. 02:12:47.360 |
what I like to do is you need to build some momentum 02:13:01.640 |
when you're taking a shower, when you're falling asleep. 02:13:06.520 |
and you're ready to wake up and work on it right there. 02:13:08.880 |
- So is this in a scale, temporal scale of a single day 02:13:15.080 |
- So I can't talk about one day basically in isolation 02:13:31.160 |
And that's where I do most of my good work. 02:13:34.080 |
- You've done a bunch of cool little projects 02:13:36.440 |
in a very short amount of time, very quickly. 02:13:40.880 |
- Yeah, basically I need to load my working memory 02:13:47.720 |
I was struggling with this, for example, at Tesla 02:13:51.160 |
because I want to work on a small side project, 02:13:59.320 |
I ran into some stupid error because of some reason. 02:14:07.560 |
And so it's about really removing all of that barrier 02:14:12.880 |
and you have the full problem loaded in your memory. 02:14:15.400 |
- And somehow avoiding distractions of all different forms, 02:14:18.240 |
like news stories, emails, but also distractions 02:14:29.800 |
- And I mean, I can take some time off for distractions 02:14:32.080 |
and in between, but I think it can't be too much. 02:14:35.400 |
Most of your day is sort of like spent on that problem. 02:14:41.080 |
I have my morning routine, I look at some news, 02:14:43.920 |
Twitter, Hacker News, Wall Street Journal, et cetera. 02:14:47.520 |
- So basically you wake up, you have some coffee, 02:14:49.440 |
are you trying to get to work as quickly as possible? 02:14:53.000 |
of like what the hell is happening in the world first? 02:14:56.480 |
- I am, I do find it interesting to know about the world. 02:15:03.600 |
So I do read through a bunch of news articles 02:15:05.320 |
and I want to be informed and I'm suspicious of it. 02:15:12.320 |
- Oh, you mean suspicious about the positive effect 02:15:21.080 |
- And also on your ability to deeply understand the world 02:15:23.600 |
because there's a bunch of sources of information, 02:15:26.520 |
you're not really focused on deeply integrating it. 02:15:33.240 |
for how long of a stretch of time in one session 02:15:48.600 |
And yeah, but I think like it's still really hard 02:16:01.640 |
And it's just because there's so much padding, 02:16:07.240 |
There's like a cost of life, just living and sustaining 02:16:11.000 |
and homeostasis and just maintaining yourself as a human 02:16:15.900 |
- And there seems to be a desire within the human mind 02:16:19.960 |
to participate in society that creates that padding. 02:16:23.640 |
'Cause the most productive days I've ever had 02:16:28.280 |
just tuning out everything and just sitting there. 02:16:31.280 |
- And then you could do more than six and eight hours. 02:16:34.120 |
Is there some wisdom about what gives you strength 02:16:39.640 |
- Yeah, just like whenever I get obsessed about a problem, 02:16:43.040 |
something just needs to work, something just needs to exist. 02:16:47.040 |
So you're able to deal with bugs and programming issues 02:17:07.900 |
they say nice things, they tweet about it or whatever, 02:17:11.560 |
that gives me pleasure because I'm doing something useful. 02:17:13.840 |
- So like you do see yourself sharing it with the world, 02:17:20.640 |
Like suppose I did all these things but did not share them, 02:17:22.960 |
I don't think I would have the same amount of motivation 02:17:25.560 |
- You enjoy the feeling of other people gaining value 02:17:51.640 |
- Yeah, I still fast, but I do intermittent fasting. 02:17:54.200 |
But really what it means at the end of the day 02:18:10.880 |
And then, yeah, I've done a bunch of random experiments. 02:18:15.400 |
where I've been for the last year and a half, 02:18:17.080 |
I wanna say, is I'm plant-based or plant-forward. 02:18:23.360 |
- I don't actually know what the difference is, 02:18:35.860 |
I don't actually know how wide the category of plant is. 02:18:40.800 |
- Well, plant-based just means that you're not 02:18:50.960 |
And if someone is, you come to someone's house party 02:18:53.000 |
and they serve you a steak that they're really proud of, 02:19:15.040 |
And so currently I have about two meals a day, 02:19:35.940 |
and then starting day three or so, you're not hungry. 02:19:44.500 |
- One of the many weird things about human biology. 02:19:48.260 |
It finds another source of energy or something like that, 02:19:54.820 |
- Yeah, the body is like, you're hungry, you're hungry, 02:20:09.800 |
- So are you still to this day most productive at night? 02:20:15.720 |
but it is really hard to maintain my PhD schedule, 02:20:18.540 |
especially when I was, say, working at Tesla and so on. 02:20:23.540 |
But even now, people want to meet for various events. 02:20:57.140 |
Is that how humans behave when they collaborate? 02:21:11.360 |
So I have a morning routine, I have a day routine. 02:21:20.920 |
And if you try to stress that a little too much, 02:21:25.380 |
you're not able to really ascend to where you need to go. 02:21:48.460 |
in terms of how much they work, all that kind of stuff. 02:22:03.120 |
and I saw what it's like inside Google and DeepMind. 02:22:05.920 |
I would say the baseline is higher than that, 02:22:19.440 |
- And then it gives the appearance of like total insanity, 02:22:21.880 |
but actually it's just a bit more intense environment, 02:22:37.560 |
what do you think about the happiness of a human being, 02:22:43.860 |
about finding a balance between work and life, 02:22:46.680 |
or is it such a thing, not a good thought experiment? 02:22:55.440 |
but I also love to have sprints that are out of distribution. 02:22:58.680 |
And that's when I think I've been pretty creative as well. 02:23:03.680 |
- Sprints out of distribution means that most of the time 02:23:18.440 |
- Yeah, probably like I say, once a month or something. 02:23:20.520 |
- And that's when we get a new GitHub repo from Andrej. 02:23:23.280 |
- Yeah, that's when you really care about a problem. 02:23:24.960 |
It must exist, this will be awesome, you're obsessed with it 02:23:29.760 |
You need to pay the fixed cost of getting into the groove, 02:23:34.280 |
and then society will come and they will try to mess 02:23:38.400 |
Yeah, the worst thing is a person who's like, 02:23:42.400 |
This is, the cost of that is not five minutes. 02:23:45.040 |
And society needs to change how it thinks about 02:24:00.940 |
What's like the perfect, are you somebody that's flexible 02:24:13.700 |
- I guess the one that I'm familiar with is one large screen, 02:24:25.220 |
- I would say OSX, but when you're working on deep learning, 02:24:27.780 |
You're SSHed into a cluster and you're working remotely. 02:24:33.780 |
- Yeah, you would use, I think a good way is, 02:24:36.060 |
you just run VS Code, my favorite editor right now, 02:24:49.760 |
VS Code, what else do people, so I use Emacs still. 02:24:56.400 |
- It may be cool, I don't know if it's maximum productivity. 02:25:00.540 |
So what do you recommend in terms of editors? 02:25:06.140 |
editors for Python, C++, machine learning applications? 02:25:23.360 |
- What do you think about the Copilot integration? 02:25:25.560 |
I was actually, I got to talk a bunch with Guido van Rossum, 02:25:28.760 |
who's a creator of Python, and he loves Copilot. 02:25:37.880 |
And it's free for me, but I would pay for it. 02:25:45.700 |
and you need to figure out when it's helpful, 02:25:52.980 |
Because if you're just reading its suggestions all the time, 02:25:56.620 |
But I think I was able to sort of like mold myself to it. 02:26:11.500 |
So it tells you about something that you didn't know. 02:26:14.900 |
- And that's an opportunity to discover a new idea. 02:26:19.500 |
I almost always copy, copy paste into a Google search, 02:26:29.980 |
a part maybe getting the exact syntax correctly, 02:26:33.860 |
that once you see it, it's that NP-hard thing. 02:26:36.940 |
It's like, once you see it, you know it's correct. 02:26:51.540 |
which is like the simple copy, paste, and sometimes suggest. 02:26:54.540 |
But over time, it's going to become more and more autonomous. 02:26:57.120 |
And so the same thing will play out in not just coding, 02:27:00.020 |
but actually across many, many different things probably. 02:27:06.060 |
How do you see the future of that developing, 02:27:13.260 |
'Cause right now it's human supervised in interesting ways. 02:27:18.260 |
It feels like the transition will be very painful. 02:27:22.020 |
- My mental model for it is the same thing will happen 02:27:31.260 |
and people will have to intervene less and less. 02:27:33.220 |
- And those could be like testing mechanisms. 02:27:43.100 |
'Cause you're like getting lazier and lazier as a programmer. 02:27:46.220 |
Like your ability to, 'cause like little bugs, 02:28:00.280 |
is actually a fundamental challenge of programming? 02:28:08.420 |
I am nervous about people not supervising what comes out 02:28:12.820 |
the proliferation of bugs in all of our systems. 02:28:16.220 |
but I think there will probably be some other copilots 02:28:18.740 |
for bug finding and stuff like that at some point. 02:28:21.260 |
'Cause there'll be like a lot more automation for- 02:28:24.540 |
It's like a program, a copilot that generates a compiler, 02:28:40.380 |
- And then there'll be like a manager for the committee. 02:28:50.220 |
Another one looked at it and picked a few that they like. 02:28:57.360 |
And then a final ensemble GPT comes in and is like, 02:29:00.540 |
okay, given everything you guys have told me, 02:29:04.140 |
- You know, the feeling is the number of programmers 02:29:05.920 |
in the world has been growing and growing very quickly. 02:29:08.260 |
Do you think it's possible that it'll actually level out 02:29:10.780 |
and drop to like a very low number with this kind of world? 02:29:14.500 |
'Cause then you'd be doing software 2.0 programming 02:29:29.860 |
- I don't currently think that they're just going 02:29:33.140 |
I'm so hesitant saying stuff like this, right? 02:29:37.100 |
- Yeah, 'cause this is gonna be replaced in five years. 02:29:42.460 |
this is where we thought, 'cause I agree with you, 02:29:45.180 |
but I think we might be very surprised, right? 02:29:55.260 |
Does it feel like the beginning or the middle or the end? 02:30:00.780 |
for sure, GPT will be able to program quite well, 02:30:09.260 |
And so how do you steer it and how do you say, 02:30:12.780 |
How do you audit it and verify that what is done is correct? 02:30:23.420 |
- So beautiful, fertile ground for so much interesting work 02:30:31.940 |
- Yeah, so you're interacting with the system. 02:30:33.660 |
So not just one prompt, but it's iterative prompting. 02:30:40.660 |
- That actually, I mean, to me, that's super exciting 02:30:42.740 |
to have a conversation with the program I'm running. 02:30:45.820 |
- Yeah, maybe at some point you're just conversing with it. 02:30:51.700 |
maybe it's not even that low level as a variable, but. 02:30:56.100 |
can you translate this to C++ and back to Python? 02:30:58.980 |
- Yeah, it already kind of exists in some ways. 02:31:03.620 |
Like, I think I'd like to write this function in C++. 02:31:07.700 |
Or like, you just keep changing for different programs 02:31:13.500 |
Maybe I want to convert this into a functional language. 02:31:16.460 |
- And so like, you get to become multilingual as a programmer 02:31:26.660 |
because it's not just about writing code on a page. 02:31:34.540 |
You have some scripts that are running in a cron job. 02:31:36.420 |
Like there's a lot going on to like working with computers 02:31:39.420 |
and how do these systems set up environment flags 02:31:47.820 |
Like how all that works and is auditable by humans 02:31:50.580 |
and so on is like massive question at the moment. 02:31:58.340 |
of academic research publishing that you would like to see? 02:32:06.540 |
to journals or conferences and then wait six months 02:32:13.260 |
And then people can tweet about it three minutes later 02:32:17.500 |
and everyone can profit from it in their own little ways. 02:32:20.380 |
- And you can cite it and it has an official look to it. 02:32:27.500 |
It feels different than if you just put it in a blog post. 02:32:35.980 |
as opposed to something you would see in a blog post. 02:32:40.940 |
'cause you could probably post a pretty crappy paper 02:32:49.020 |
So rigorous peer review by two, three experts 02:32:57.780 |
- Yeah, basically I think the community is very well able 02:33:00.580 |
to peer review things very quickly on Twitter. 02:33:03.900 |
And I think maybe it just has to do something 02:33:05.660 |
with AI machine learning field specifically though. 02:33:17.060 |
you can think of these scientific publications 02:33:20.180 |
where everyone's building on each other's work 02:33:23.620 |
which is kind of like this much faster and looser blockchain. 02:33:28.100 |
and any one individual entry is like very cheap to make. 02:33:33.300 |
where maybe that model doesn't make as much sense. 02:33:37.900 |
at least things are pretty easily verifiable. 02:33:49.020 |
And the whole thing just moves significantly faster. 02:33:51.500 |
So I kind of feel like academia still has a place, 02:33:59.740 |
And it's a bit more maybe higher quality process, 02:34:04.860 |
where you will discover cutting edge work anymore. 02:34:07.340 |
It used to be the case when I was starting my PhD 02:34:15.940 |
because it's already like three generations ago irrelevant. 02:34:28.340 |
to the prestige that comes with these big venues, 02:34:46.860 |
- Yeah, it would speed up the rest of the community, 02:34:49.980 |
that's part of their objective function also. 02:34:56.980 |
- Yeah, they certainly, DeepMind specifically, 02:35:09.100 |
Do you or have you suffered from imposter syndrome? 02:35:40.140 |
And definitely I would say near the tail end, 02:35:42.700 |
that's when it sort of like starts to hit you a bit more 02:35:47.580 |
is the code that people are writing, the GitHub. 02:35:51.980 |
And you're not as familiar with that as you used to be. 02:35:54.380 |
And so I would say maybe there's some insecurity there. 02:36:00.620 |
with not writing the code in the computer science space. 02:36:12.380 |
but at the end of the day, you have to read code. 02:36:20.260 |
especially when they have a source code available, 02:36:23.100 |
- So like I said, you're one of the greatest teachers 02:36:25.500 |
of machine learning, AI ever, from CS231N to today. 02:36:36.460 |
- Beginners are often focused on like what to do. 02:36:40.460 |
And I think the focus should be more like how much you do. 02:36:43.340 |
So I am kind of like believer on a high level 02:36:47.220 |
where you just kind of have to just pick the things 02:36:52.300 |
You literally have to put in 10,000 hours of work. 02:36:55.020 |
It doesn't even matter as much like where you put it 02:37:04.540 |
'cause I feel like there's some sense of determinism 02:37:06.380 |
about being an expert at a thing if you spend 10,000 hours. 02:37:17.700 |
And so I think it's kind of like a nice thought. 02:37:27.180 |
- So and then thinking about what kind of mechanisms 02:37:29.940 |
maximize your likelihood of getting to 10,000 hours. 02:37:33.500 |
- Which for us silly humans means probably forming 02:37:46.940 |
for the psychology of it is many times people 02:37:52.300 |
Only compare yourself to you from some time ago, 02:38:00.180 |
And I think this, then you can see your progress 02:38:03.460 |
- That's so interesting that focus on the quantity of hours. 02:38:07.380 |
'Cause I think a lot of people in the beginner stage, 02:38:15.580 |
Like which one do I pick this path or this path? 02:38:23.460 |
Yeah, they're worried about all these things. 02:38:24.700 |
But the thing is, you will waste time doing something wrong. 02:38:28.500 |
You will eventually figure out it's not right. 02:38:33.420 |
because next time you'll have the scar tissue 02:38:36.660 |
And now next time you come to a similar situation, 02:38:46.340 |
and I have some intuitions about what was useful, 02:38:53.980 |
So I just think you should, you should just focus on working. 02:39:02.660 |
for a lot of things, not just machine learning. 02:39:21.700 |
You're very good at it, but you're also drawn to it. 02:39:33.260 |
but it's not like the act of teaching that I like. 02:39:41.260 |
I'm okay at teaching and people appreciate it a lot. 02:39:49.980 |
I mean, it's really, it can be really annoying, frustrating. 02:39:52.700 |
I was working on a bunch of lectures just now. 02:39:56.980 |
just how much work it is to create some of these materials 02:40:01.700 |
and you go down blind alleys and just how much you change it. 02:40:06.140 |
in terms of like educational value is really hard. 02:40:12.940 |
So people should definitely go watch your new stuff 02:40:16.500 |
There are lectures where you're actually building the thing 02:40:20.820 |
So discussing back propagation by building it, 02:40:27.820 |
I think that's a really powerful way to teach. 02:40:40.940 |
and then I just build out a lecture that way. 02:40:42.980 |
Sometimes I have to delete 30 minutes of content 02:40:49.660 |
and it probably takes me somewhere around 10 hours 02:40:55.620 |
I mean, is it difficult to go back to the basics? 02:41:02.340 |
- Yeah, going back to backpropagation, loss functions 02:41:05.220 |
And one thing I like about teaching a lot honestly 02:41:07.300 |
is it definitely strengthens your understanding. 02:41:19.420 |
And so I even surprised myself in those lectures. 02:41:22.300 |
Like, oh, so the result will obviously look at this 02:41:25.820 |
And I'm like, okay, I thought I understood this. 02:41:33.980 |
and it gives you a result and you're like, oh, wow. 02:41:36.780 |
And like actual numbers, actual input, actual code. 02:41:39.820 |
- Yeah, it's not mathematical symbols, et cetera. 02:41:56.820 |
So maybe undergrads, maybe early graduate students. 02:42:02.660 |
I mean, I would say like they definitely have to be 02:42:12.420 |
in physics you used to be able to do experiments 02:42:16.940 |
And now you have to work at like the LHC or CERN. 02:42:20.020 |
And so AI is going in that direction as well. 02:42:25.660 |
that's just not possible to do on the bench top anymore. 02:42:28.180 |
And I think that didn't used to be the case at the time. 02:42:32.700 |
- Do you still think that there's like GAN type papers 02:42:41.740 |
that requires just one computer to illustrate 02:42:44.540 |
- I mean, one example that's been very influential 02:42:51.740 |
For the longest time, people were kind of ignoring them 02:42:58.940 |
And so Stable Diffusion and so on, it's all diffusion-based. 02:43:09.420 |
actually, no, those came from Google as well. 02:43:17.820 |
So from the societal impact to the technical architecture. 02:43:22.620 |
- What I like about diffusion is it works so well. 02:43:28.740 |
almost the novelty of the synthetic data it's generating. 02:43:32.700 |
- Yeah, so the Stable Diffusion images are incredible. 02:43:36.180 |
It's the speed of improvement in generating images 02:43:40.900 |
We went very quickly from generating like tiny digits 02:43:48.020 |
There's a lot that academia can still contribute. 02:43:54.220 |
for running the attention operation inside the transformer 02:43:59.580 |
It's a very clever way to structure the kernel. 02:44:03.740 |
So it doesn't materialize the attention matrix. 02:44:06.060 |
And so there's, I think there's still like lots of things 02:44:08.660 |
to contribute, but you have to be just more strategic. 02:44:11.060 |
- Do you think neural networks can be made to reason? 02:44:24.660 |
- So in the way that humans think through a problem 02:44:35.460 |
I don't wanna say, but out of distribution ideas, 02:44:46.420 |
You're able to remix the training set information 02:44:52.460 |
- It doesn't appear verbatim in the training set. 02:44:54.660 |
Like you're doing something interesting algorithmically. 02:45:07.660 |
holy shit, this thing is definitely thinking? 02:45:12.740 |
is just information processing and generalization. 02:45:15.260 |
And I think the neural nets already do that today. 02:45:28.980 |
- Yeah, you're giving correct answers in novel settings 02:45:36.540 |
You're not doing just some kind of a lookup table 02:45:38.180 |
and there's neighbor search, something like that. 02:45:43.740 |
you think might make significant progress towards AGI? 02:45:49.340 |
what are the big blockers that we're missing now? 02:45:57.380 |
Basically automated systems that we can interact with 02:46:14.940 |
I'm suspicious that the text realm is not enough 02:46:17.580 |
to actually build full understanding of the world. 02:46:20.420 |
I do actually think you need to go into pixels 02:46:22.180 |
and understand the physical world and how it works. 02:46:24.860 |
So I do think that we need to extend these models 02:46:34.980 |
- Well, that's the big open question I would say in my mind 02:46:39.500 |
and the ability to sort of interact with the world, 02:46:42.460 |
run experiments and have a data of that form, 02:46:45.500 |
then you need to go to Optimus or something like that. 02:46:48.580 |
And so I would say Optimus in some way is like a hedge 02:46:52.300 |
in AGI, because it seems to me that it's possible 02:46:57.300 |
that just having data from the internet is not enough. 02:47:00.220 |
If that is the case, then Optimus may lead to AGI 02:47:04.220 |
because Optimus, to me, there's nothing beyond Optimus. 02:47:09.340 |
that can actually like do stuff in the world. 02:47:11.340 |
You can have millions of them interacting with humans 02:47:14.460 |
And if that doesn't give rise to AGI at some point, 02:47:28.580 |
and you need to actually like build these things 02:47:38.180 |
and just like training these compression models effectively 02:47:43.900 |
And that might also give these agents as well. 02:47:48.100 |
Compress the internet, but also interact with the internet. 02:48:08.780 |
So it just feels like we're in boiling water. 02:48:26.780 |
like a year from now it will happen, that kind of thing. 02:48:30.100 |
I just feel like in the digital realm, it just might happen. 02:48:38.900 |
is there enough fertile ground on the periphery? 02:48:43.260 |
And we have the progress so far, which has been very rapid. 02:48:51.620 |
that we'll be interacting with digital entities. 02:48:54.260 |
- How will you know that somebody has built AGI? 02:48:58.100 |
I think it's going to be a slow incremental transition. 02:49:01.620 |
It's going to be GitHub Copilot getting better. 02:49:09.620 |
I think we're on a verge of being able to ask 02:49:12.340 |
very complex questions in chemistry, physics, math 02:49:16.260 |
of these oracles and have them complete solutions. 02:49:19.700 |
- So AGI to you is primarily focused on intelligence. 02:49:27.540 |
- So in my mind, consciousness is not a special thing 02:49:32.100 |
I think it's an emergent phenomenon of a large enough 02:49:34.820 |
and complex enough generative model, sort of. 02:49:43.780 |
then it also understands its predicament in the world 02:49:48.500 |
which to me is a form of consciousness or self-awareness. 02:49:51.940 |
- So in order to understand the world deeply, 02:49:53.820 |
you probably have to integrate yourself into the world. 02:50:02.700 |
- I think consciousness is like a modeling insight. 02:50:07.260 |
- Yeah, it's a, you have a powerful enough model 02:50:10.060 |
of understanding the world that you actually understand 02:50:15.460 |
perhaps just the narrative we tell ourselves, 02:50:17.340 |
there's a, it feels like something to experience the world, 02:50:22.740 |
But that could be just a narrative that we tell ourselves. 02:50:24.860 |
- Yeah, I don't think, yeah, I think it will emerge. 02:50:27.140 |
I think it's going to be something very boring. 02:50:34.940 |
They will do all the things that you would expect 02:50:47.580 |
of whether you're allowed to turn off a conscious AI, 02:50:54.500 |
Maybe there would have to be the same kind of debates 02:51:03.060 |
but abortion, which is the deeper question with abortion 02:51:11.500 |
And the deep question with AI is also what is life 02:51:16.420 |
And I think that'll be very fascinating to bring up. 02:51:23.580 |
that are capable of such level of intelligence 02:51:29.860 |
and therefore the capacity to suffer would emerge. 02:51:32.180 |
And a system that says, no, please don't kill me. 02:51:41.220 |
Like it was talking about not wanting to die or so on. 02:51:48.060 |
- 'Cause otherwise you might have a lot of creatures 02:51:55.340 |
- You can just spawn infinity of them on a cluster. 02:51:57.860 |
- And then that might lead to like horrible consequences 02:52:05.060 |
and they'll start practicing murder on those systems. 02:52:07.620 |
I mean, there's just, to me, all of this stuff 02:52:10.420 |
just brings a beautiful mirror to the human condition 02:52:14.100 |
and human nature and we'll get to explore it. 02:52:15.820 |
And that's what like the best of the Supreme Court 02:52:19.620 |
of all the different debates we have about ideas 02:52:25.300 |
that we've been asking throughout human history. 02:52:27.380 |
There's always been the other in human history. 02:52:33.180 |
and we're going to, throughout human history, 02:52:37.860 |
And the same will probably happen with robots. 02:52:46.820 |
And I think there's some canary in the coal mines 02:52:56.660 |
but this person really like loved their waifu 02:52:59.420 |
and like is trying to like port it somewhere else 02:53:10.360 |
because in some sense they are like a mirror of humanity 02:53:13.420 |
because they are like sort of like a big average 02:53:18.500 |
- But we can, that average, we can actually watch. 02:53:29.660 |
And we can also, of course, also like shape it. 02:53:48.060 |
and ask her, talk about anything, maybe ask her a question. 02:53:54.220 |
- I would have some practical questions in my mind. 02:53:55.860 |
Like, do I or my loved ones really have to die? 02:54:11.780 |
and I know all these things that you've produced. 02:54:13.460 |
And it seems to me like here are the experiments 02:54:19.860 |
And here are the kinds of experiments that you should run. 02:54:22.300 |
- Okay, let's go with this thought experiment, okay? 02:54:46.140 |
If the AGI system is trying to empathize with you, human, 02:55:14.180 |
I think it's like such a sidekick to the entire story, 02:55:16.720 |
but at the same time, it's like really interesting. 02:55:19.700 |
- It's kind of limited in certain ways, right? 02:55:24.540 |
I don't think, I think it's fine and plausible 02:55:34.020 |
- As an example, like it has a fixed amount of compute 02:55:38.260 |
And it might just be that even though you can have 02:55:40.680 |
a super amazing mega brain, super intelligent AI, 02:55:43.980 |
you also can have like, you know, less intelligent AIs 02:55:46.580 |
that you can deploy in a power efficient way. 02:55:49.540 |
And then they're not perfect, they might make mistakes. 02:55:51.460 |
- No, I meant more like, say you had infinite compute, 02:55:55.320 |
and it's still good to make mistakes sometimes. 02:55:58.140 |
Like in order to integrate yourself, like, what is it? 02:56:05.560 |
"The human imperfections, that's the good stuff," right? 02:56:12.480 |
we want flaws in part to form connections with each other, 02:56:17.480 |
'cause it feels like something you can attach 02:56:22.720 |
And in that same way, you want an AI that's flawed. 02:56:26.060 |
I don't know, I feel like perfection is cool. 02:56:30.920 |
But see, AGI would need to be intelligent enough 02:56:33.920 |
to give answers to humans that humans don't understand. 02:56:36.800 |
And I think perfect is something humans can't understand. 02:56:40.120 |
Because even science doesn't give perfect answers. 02:56:42.520 |
There's always gaps and mysteries, and I don't know. 02:56:50.080 |
- Yeah, I can imagine just having a conversation 02:56:52.760 |
with this kind of oracle entity, as you'd imagine them. 02:57:05.200 |
- But every dumb human will say, "Yeah, yeah, yeah, yeah. 02:57:08.720 |
"Trust me, give me the truth, I can handle it." 02:57:12.360 |
- But that's the beauty, like, people can choose. 02:57:15.000 |
But then, the old marshmallow test with the kids and so on, 02:57:20.000 |
I feel like too many people can't handle the truth, 02:57:37.920 |
- Yeah, I mean, this is "The Matrix," all over again. 02:57:47.200 |
Probably I would go with the safer scientific questions 02:57:52.080 |
at first that have nothing to do with my own personal life 02:57:55.800 |
and mortality, just like about physics and so on. 02:58:06.020 |
Would it be able to, presumably, in order to, 02:58:15.080 |
- Yeah, I think that's actually a wonderful benchmark, 02:58:19.320 |
I think that's a really good point, basically. 02:58:24.840 |
that is doing something very interesting computationally. 02:58:28.880 |
- Yeah, because it's hard in a way, like a Turing test. 02:58:33.880 |
The original intent of the Turing test is hard, 02:58:49.880 |
and if they don't laugh, that means you're not funny. 02:58:52.880 |
- And you're showing, you need a lot of knowledge 02:59:02.320 |
You tweeted, "Movies that I've seen five plus times, 02:59:10.360 |
"Good Will, Hunting, The Matrix, Lord of the Rings, 02:59:16.560 |
Terminator 2, mean girls, I'm not gonna ask about that. 02:59:23.600 |
- What are some that jump out to you in your memory 02:59:30.960 |
As a computer person, why do you love The Matrix? 02:59:35.400 |
that make it, like, beautiful and interesting. 02:59:36.680 |
So there's all these philosophical questions, 02:59:39.120 |
but then there's also AGIs, and there's simulation, 02:59:42.160 |
and it's cool, and there's, you know, the black, you know. 02:59:50.040 |
It was just, like, innovating in so many ways. 02:59:52.340 |
- And then Good Will Hunting, why do you like that one? 02:59:57.500 |
- Yeah, I just, I really like this tortured genius 03:00:03.680 |
with whether or not he has, like, any responsibility 03:00:06.520 |
or, like, what to do with this gift that he was given, 03:00:08.720 |
or, like, how to think about the whole thing. 03:00:10.920 |
- But there's also a dance between the genius 03:00:24.320 |
- It, like, really, like, it messes with you. 03:00:27.040 |
You know, there's some movies that just, like, 03:00:51.680 |
but in terms of, like, its surface properties. 03:00:55.880 |
- Do you think Skynet is at all a possibility? 03:00:59.400 |
- Like, the actual, sort of, autonomous weapon system 03:01:15.480 |
I mean, these will be, like, very powerful entities, 03:01:18.000 |
And so, for a long time, they're going to be tools 03:01:22.240 |
You know, people talk about, like, alignment of AGIs 03:01:27.760 |
So, how this will be used and what this is gonna look like 03:01:36.600 |
that we'll be able to, as a human civilization, 03:01:41.760 |
- Yes, that's my hope, is that it happens slowly enough 03:01:44.000 |
and in an open enough way where a lot of people 03:01:48.120 |
Just figure out how to deal with this transition, 03:01:52.280 |
- I draw a lot of inspiration from nuclear weapons 03:02:05.240 |
are not so dangerous, they destroy human civilization, 03:02:12.720 |
we quickly, quickly, we might still deploy it, 03:02:17.800 |
And so, there'll be, like, this balance achieved. 03:02:40.400 |
That's probably my number one concern for humanity. 03:02:53.320 |
And it's not even about the full destruction. 03:03:00.400 |
And I can't believe we're, like, so close to it. 03:03:05.160 |
- It feels like we might be a few tweets away 03:03:21.680 |
take one step towards a bad direction and it escalates. 03:03:33.720 |
- Yeah, it's just, it's a huge amount of power. 03:03:41.880 |
I don't actually know what the good outcomes are here. 03:03:56.880 |
in the sense that there are good outcomes of AGI. 03:04:01.280 |
And then, the bad outcomes are, like, an epsilon away, 03:04:05.240 |
And so, I think capitalism and humanity and so on 03:04:08.240 |
will drive for the positive ways of using that technology. 03:04:11.960 |
But then, if bad outcomes are just, like, a tiny, 03:04:20.320 |
results in the destruction of the human species. 03:04:26.040 |
what's really weird about, like, the dynamics of humanity 03:04:29.160 |
is just, like, the insane coupling afforded by technology. 03:04:32.960 |
And just the instability of the whole dynamical system. 03:04:39.160 |
- Yeah, so that explosion could be destructive 03:04:40.920 |
or constructive, and the probabilities are non-zero 03:04:45.960 |
- I do feel like I have to try to be optimistic and so on. 03:04:54.720 |
- Do you think we'll become a multi-planetary species? 03:04:57.420 |
- Probably yes, but I don't know if it's a dominant feature 03:05:04.120 |
There might be some people on some planets and so on, 03:05:08.880 |
if it's, like, a major player in our culture and so on. 03:05:35.360 |
- Maybe eventually I would once it's safe enough, 03:05:37.560 |
but I don't actually know if it's on my lifetime scale, 03:05:43.960 |
a lot of people might disappear into virtual realities 03:05:49.240 |
of sort of the cultural development of humanity, 03:05:54.920 |
it's just really hard to work in physical realm 03:06:02.040 |
And so it's much easier to disappear into digital realm. 03:06:05.720 |
And I think people will find them more compelling, 03:06:10.600 |
- So you're a little bit captivated by virtual reality, 03:06:21.680 |
I'm interested, just talking a lot to Carmack, 03:06:24.920 |
where's the thing that's currently preventing that? 03:06:30.720 |
I think what's interesting about the future is, 03:06:35.360 |
I kind of feel like the variance in the human condition grows 03:06:40.400 |
It's not as much the mean of the distribution, 03:06:48.040 |
It's just like, there will be so many more ways of being. 03:06:51.800 |
I see it as like a spreading out of a human experience. 03:06:55.960 |
that allows you to discover those little groups 03:06:57.880 |
and then you gravitate to something about your biology 03:07:01.040 |
likes that kind of world and that you find each other. 03:07:05.720 |
and they're gonna, everything is just gonna coexist. 03:07:08.680 |
'cause I've interacted with a bunch of internet communities, 03:07:21.240 |
I mean, you even sense this, just having traveled to Ukraine, 03:07:24.720 |
they don't know so many things about America. 03:07:36.960 |
So you can see that happening more and more and more 03:07:46.760 |
And I don't see that trend like really reversing. 03:07:49.840 |
and they're able to choose their own path in existence. 03:07:56.240 |
- Will you spend so much time in the metaverse, 03:08:21.520 |
Maybe there's actually even more exotic things 03:08:23.760 |
you can think about with Neuralinks or stuff like that. 03:08:26.560 |
Currently, I kind of see myself as mostly a team human person. 03:08:37.760 |
And I just want to be in this like solar punk, 03:08:48.200 |
surrounded by lush, beautiful, dynamic nature, 03:09:00.560 |
- Yeah, I think technology used very sparingly. 03:09:03.080 |
I don't love when it sort of gets in the way of humanity 03:09:22.680 |
or for profound reasons that you would recommend? 03:09:32.920 |
Anything by Nick Lane, really, "Life Ascending," 03:09:36.000 |
I would say is like a bit more potentially representative, 03:09:47.680 |
that helped me understand altruism as an example 03:09:52.640 |
and the level of genes was a huge insight for me 03:09:55.160 |
and it sort of cleared up a lot of things for me. 03:10:05.920 |
- Are you able to walk around with that notion for a while, 03:10:08.920 |
that there is an evolutionary kind of process 03:10:15.440 |
and they compete, and they live in our brains. 03:10:19.400 |
- Are we silly humans thinking that we're the organisms? 03:10:22.080 |
Is it possible that the primary organisms are the ideas? 03:10:26.200 |
- Yeah, I would say the ideas kind of live in the software 03:10:43.080 |
- Yeah, yeah, I would say there needs to be some grounding 03:10:54.040 |
is the thing that makes that thing special, right? 03:10:59.360 |
- But then cloning might be exceptionally difficult. 03:11:07.440 |
what makes me special is more the gang of genes 03:11:10.740 |
that are riding in my chromosomes, I suppose, right? 03:11:13.180 |
Like they're the replicating unit, I suppose. 03:11:25.040 |
is your ability to survive based on the software 03:11:29.740 |
that runs on the hardware that was built by the genes. 03:11:33.080 |
So the software is the thing that makes you survive, 03:11:46.020 |
I mean, it's an abstraction on top of abstractions. 03:11:55.500 |
I would say sometimes books are like not sufficient. 03:12:06.840 |
they're too high up in the level of abstraction 03:12:14.700 |
That's why also I like the writing of Nick Lane 03:12:17.860 |
is because he's pretty willing to step one level down 03:12:25.740 |
But he's also willing to sort of be throughout the stack. 03:12:36.600 |
even high school, just textbooks on the basics. 03:12:46.340 |
It's sufficiently general that you can understand 03:12:54.540 |
and you get to play with it as much as you would 03:13:00.500 |
And then I'm also suspicious of textbooks, honestly, 03:13:11.340 |
These books like "The Cell" are kind of outdated. 03:13:14.540 |
Like what is the actual real source of truth? 03:13:30.180 |
and it's kind of interesting and I'm learning, 03:13:39.620 |
- But you have to learn that before you break out. 03:13:44.580 |
But what is the actual process of working with these cells 03:13:47.800 |
And, you know, it's kind of like a massive set of cooking recipes 03:13:50.180 |
for making sure your cells live and proliferate 03:13:52.260 |
and then you're sequencing them, running experiments 03:13:58.360 |
what's really useful in terms of creating therapies 03:14:01.940 |
- Yeah, I wonder what in the future AI textbooks will be. 03:14:04.860 |
'Cause, you know, there's "Artificial Intelligence, 03:14:13.380 |
I also saw there's a "Science of Deep Learning" book. 03:14:15.860 |
I'm waiting for textbooks that are worth recommending, 03:14:19.580 |
- It's tricky 'cause it's like papers and code, code, code. 03:14:25.740 |
I especially like the appendix of any paper as well. 03:14:39.300 |
- Many times papers can be actually quite readable. 03:14:49.180 |
scientists use complex terms, even when it's not necessary. 03:14:55.820 |
- And papers sometimes are longer than they need to be 03:15:03.300 |
but then the paper itself, look at Einstein, make it simple. 03:15:07.100 |
- Yeah, but certainly I've come across papers, 03:15:08.540 |
I would say, like synthetic biology or something 03:15:15.900 |
but you kind of are getting a gist and I think it's cool. 03:15:25.460 |
but in general, life advice to a young person, 03:15:30.660 |
about how to have a career they can be proud of 03:15:34.740 |
- Yeah, I think I'm very hesitant to give general advice. 03:15:38.900 |
I've mentioned, like some of the stuff I've mentioned 03:15:41.740 |
like focus on just the amount of work you're spending 03:15:45.700 |
Compare yourself only to yourself, not to others. 03:15:51.300 |
- You just have like a deep interest in something 03:15:57.360 |
over like the things that you're interested in. 03:16:00.940 |
How do you not get distracted and switch to another thing? 03:16:07.820 |
- Well, if you do an argmax repeatedly every week, 03:16:13.300 |
- Yeah, you can like low pass filter yourself 03:16:15.340 |
in terms of like what has consistently been true for you. 03:16:18.180 |
But yeah, I definitely see how it can be hard, 03:16:22.180 |
but I would say like you're going to work the hardest 03:16:26.020 |
So low pass filter yourself and really introspect. 03:16:28.980 |
In your past, what are the things that gave you energy? 03:16:31.180 |
And what are the things that took energy away from you? 03:16:42.700 |
but the kind of stuff you're doing in a particular field. 03:16:47.460 |
by implementing stuff, building actual things. 03:16:58.140 |
Because I usually have to do way too much work 03:17:01.700 |
And then I'm like, okay, this is actually like, 03:17:12.500 |
- So aside from the teaching you're doing now, 03:17:15.380 |
putting out videos, aside from a potential Godfather Part II 03:17:15.380 |
what does the future for Andrej Karpathy hold? 03:17:37.460 |
of what that possible future could look like? 03:17:39.700 |
- The consistent thing I've been always interested in, 03:17:47.940 |
the rest of my life on, because I just care about it a lot. 03:17:50.820 |
And I actually care about many other problems as well, 03:17:53.420 |
like say aging, which I basically view as a disease. 03:17:53.420 |
I don't actually think that humans will be able 03:18:06.180 |
I think the correct thing to do is to ignore those problems 03:18:08.820 |
and you solve AI and then use that to solve everything else. 03:18:11.820 |
And I think there's a chance that this will work. 03:18:14.820 |
And that's kind of like the way I'm betting at least. 03:18:20.060 |
are you interested in all kinds of applications, 03:18:23.380 |
all kinds of domains, and any domain you focus on 03:18:26.780 |
will allow you to get insights to the big problem of AGI? 03:18:30.020 |
- Yeah, for me, it's the ultimate meta problem. 03:18:31.860 |
I don't wanna work on any one specific problem. 03:18:34.380 |
So how can you work on all problems simultaneously? 03:18:42.340 |
- Are there cool small projects like Arxiv Sanity 03:18:42.340 |
- There's always like some fun side projects. 03:18:57.140 |
Basically, like there's way too many arxiv papers. 03:18:58.860 |
How can I organize it and recommend papers and so on? 03:19:09.820 |
like you like consuming audio books and podcasts and so on. 03:19:16.460 |
closer to human level performance on annotation. 03:19:19.300 |
- Yeah, well, I definitely was like surprised 03:19:30.460 |
And that's what gave me some energy to like try it out. 03:19:34.300 |
And I thought it could be fun to run on podcasts. 03:19:38.520 |
why Whisper is so much better compared to anything else, 03:19:41.340 |
because I feel like there should be a lot of incentive 03:19:43.020 |
for a lot of companies to produce transcription systems 03:19:55.780 |
The model and everything has been around for a long time. 03:20:07.140 |
even at Google and so on, YouTube transcription. 03:20:12.540 |
but some of it is also integrating into a bigger system. 03:20:18.300 |
how it's deployed and all that kind of stuff. 03:20:19.780 |
Maybe running it as an independent thing is much easier, 03:20:27.740 |
like YouTube transcription or anything like meetings, 03:20:37.980 |
where it detects the different individual speakers, 03:20:59.940 |
And it seems like there's a huge incentive to automate that. 03:21:05.600 |
- And I think, I mean, I don't know if you looked 03:21:12.940 |
- I've seen Whisper's performance on like super tricky cases 03:21:31.860 |
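[Editor's aside: for readers curious what "running it as an independent thing" on a podcast can look like in practice, here is a minimal sketch using the open-source openai-whisper package. The file name and model size are illustrative assumptions, not a description of the actual pipeline discussed above.]

```python
# Minimal sketch: transcribe a podcast episode with the openai-whisper package
# (pip install -U openai-whisper; requires ffmpeg for audio decoding).
# "podcast_episode.mp3" and the model size are hypothetical choices.
import whisper

model = whisper.load_model("medium.en")           # smaller models are faster but less accurate
result = model.transcribe("podcast_episode.mp3")  # returns full text plus timestamped segments

for segment in result["segments"]:
    start, end, text = segment["start"], segment["end"], segment["text"]
    print(f"{start:8.2f} --> {end:8.2f} {text}")
```

[Note that Whisper itself only transcribes; the speaker identification mentioned above would require a separate diarization step.]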
- But yeah, there's always like fun projects basically. 03:21:49.660 |
when the cost of content creation is going to fall to zero. 03:21:59.180 |
- So Hollywood will start using that to generate scenes, 03:22:05.660 |
- Yeah, so you can make a movie like "Avatar" 03:22:12.360 |
- Much less, maybe just by talking to your phone. 03:22:39.180 |
I mean, it's humbling because we treat ourselves 03:22:46.340 |
If that can be done in an automated way by AI. 03:22:49.940 |
- Yeah, I think it's fascinating to me how these... 03:22:52.740 |
The predictions of AI and what it's going to look like 03:22:57.580 |
And sci-fi of the 50s and 60s was just like totally not right. 03:22:57.580 |
They imagined AI as like super calculating theorem provers 03:23:04.860 |
and we're getting things that can talk to you 03:23:11.860 |
Just AI's like hybrid systems, heterogeneous systems 03:23:24.900 |
- I think it's going to be interesting for sure. 03:23:33.660 |
- Well, the sad thing is your brain and mine developed 03:23:37.320 |
in a time before Twitter, before the internet. 03:23:37.320 |
So I wonder people that are born inside of it 03:23:54.740 |
- Well, I do feel like humans are extremely malleable. 03:24:10.700 |
or with the systems we create to try to answer. 03:24:14.020 |
For the universe, for the creator of the universe 03:24:23.740 |
- I don't know if that's the meaning of life. 03:24:30.260 |
because we are a conscious entity and it's beautiful. 03:24:34.120 |
But I do think that like a deeper meaning of life 03:24:37.220 |
if someone is interested is along the lines of like, 03:24:43.360 |
And if you look at the, into fundamental physics 03:24:46.140 |
and the quantum field theory and the standard model, 03:24:55.440 |
And like, what's going on with all this stuff? 03:25:03.360 |
And so I think there's some fundamental answers there. 03:25:07.640 |
you can't actually like really make a dent in those 03:25:07.640 |
And so to me also, there's a big question around 03:25:18.100 |
- So kind of the ultimate, or at least first way 03:25:22.160 |
to sneak up to the why question is to try to escape 03:25:30.400 |
And then for that, you sort of backtrack and say, 03:25:34.280 |
okay, for that, that's gonna be, take a very long time. 03:25:36.720 |
So the why question boils down from an engineering 03:25:41.280 |
- Yeah, I think that's the question number one, 03:25:49.000 |
- And that could be extending your own lifetime 03:25:50.840 |
or extending just the lifetime of human civilization. 03:26:02.400 |
And I don't know that people fully realize this. 03:26:08.840 |
But at the end of the day, this is a physical system. 03:26:21.400 |
- That'd be interesting if death is eventually looked at 03:26:23.960 |
as a fascinating thing that used to happen to humans. 03:26:31.980 |
- And it's up to our imagination to try to predict 03:27:02.520 |
of all the humans and organisms that are alive. 03:27:05.680 |
- Yeah, the way we find meaning might change. 03:27:08.480 |
There are a lot of humans, probably including myself, 03:27:08.480 |
that find meaning in the finiteness of things. 03:27:11.000 |
But that doesn't mean that's the only source of meaning. 03:27:24.080 |
Like you are born as a conscious, free entity, 03:27:28.360 |
And you have your unalienable rights for life. 03:27:54.280 |
I don't think there's a more beautiful way to end it. 03:28:02.040 |
Everything you've done for the machine learning world, 03:28:07.400 |
to educate millions of people, it's been great. 03:28:20.640 |
please check out our sponsors in the description. 03:28:23.640 |
And now, let me leave you with some words from Samuel Karlin. 03:28:23.640 |
The purpose of models is not to fit the data, but to sharpen the questions. 03:28:28.640 |
Thanks for listening, and hope to see you next time.