
Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333


Chapters

0:00 Introduction
0:58 Neural networks
6:01 Biology
11:32 Aliens
21:43 Universe
33:34 Transformers
41:50 Language models
52:01 Bots
58:21 Google's LaMDA
65:44 Software 2.0
76:44 Human annotation
78:41 Camera vision
83:46 Tesla's Data Engine
87:56 Tesla Vision
94:26 Elon Musk
99:33 Autonomous driving
104:28 Leaving Tesla
109:55 Tesla's Optimus
119:01 ImageNet
121:40 Data
131:31 Day in the life
144:47 Best IDE
151:53 arXiv
156:23 Advice for beginners
165:40 Artificial general intelligence
179:00 Movies
184:53 Future of human civilization
189:13 Book recommendations
195:21 Advice for young people
197:12 Future of machine learning
204:00 Meaning of life

Whisper Transcript

00:00:00.000 | I think it's possible that physics has exploits
00:00:01.880 | and we should be trying to find them.
00:00:03.560 | Arranging some kind of a crazy quantum mechanical system
00:00:05.920 | that somehow gives you buffer overflow,
00:00:08.380 | somehow gives you a rounding error in the floating point.
00:00:10.760 | Synthetic intelligences are kind of like
00:00:13.240 | the next stage of development.
00:00:15.320 | And I don't know where it leads to,
00:00:17.120 | like at some point I suspect the universe
00:00:20.760 | is some kind of a puzzle.
00:00:23.120 | These synthetic AIs will uncover that puzzle and solve it.
00:00:27.600 | (air whooshing)
00:00:30.120 | The following is a conversation with Andrej Karpathy,
00:00:32.880 | previously the director of AI at Tesla.
00:00:35.840 | And before that at OpenAI and Stanford.
00:00:39.800 | He is one of the greatest scientists, engineers,
00:00:43.600 | and educators in the history of artificial intelligence.
00:00:47.820 | This is the Lex Fridman Podcast.
00:00:50.120 | To support it, please check out our sponsors.
00:00:52.760 | And now dear friends, here's Andrej Karpathy.
00:00:58.120 | What is a neural network?
00:01:00.160 | And why does it seem to do such a surprisingly
00:01:03.160 | good job of learning?
00:01:04.600 | - What is a neural network?
00:01:05.440 | It's a mathematical abstraction of the brain.
00:01:10.040 | I would say that's how it was originally developed.
00:01:12.680 | At the end of the day, it's a mathematical expression.
00:01:14.560 | And it's a fairly simple mathematical expression
00:01:16.140 | when you get down to it.
00:01:17.380 | It's basically a sequence of matrix multiplies,
00:01:21.400 | which are really dot products mathematically.
00:01:23.480 | And some non-linearity is thrown in.
00:01:25.440 | And so it's a very simple mathematical expression.
00:01:27.640 | And it's got knobs in it.
00:01:29.200 | - Many knobs.
00:01:30.040 | - Many knobs.
00:01:30.880 | And these knobs are loosely related to basically
00:01:33.120 | the synapses in your brain.
00:01:34.160 | They're trainable, they're modifiable.
00:01:35.840 | And so the idea is like, we need to find the setting
00:01:37.720 | of the knobs that makes the neural net do whatever
00:01:40.760 | you want it to do, like classify images and so on.
00:01:43.560 | And so there's not too much mystery, I would say in it.
00:01:45.640 | Like you might think that, basically don't want to endow it
00:01:49.640 | with too much meaning with respect to the brain
00:01:51.360 | and how it works.
00:01:52.800 | It's really just a complicated mathematical expression
00:01:55.000 | with knobs and those knobs need a proper setting
00:01:57.600 | for it to do something desirable.
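To make the "knobs" picture concrete, here is a minimal illustrative sketch (not from the conversation; the layer sizes are arbitrary assumptions): a tiny two-layer network really is just a couple of matrix multiplies with a nonlinearity in between, and the entries of the weight matrices are the trainable knobs.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "knobs": weight matrices and biases, initialized randomly.
W1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)

def forward(x):
    # A sequence of matrix multiplies (dot products) with a nonlinearity in between.
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU
    return h @ W2 + b2                 # e.g. scores for 10 image classes

x = rng.normal(size=(1, 784))          # a made-up flattened 28x28 "image"
print(forward(x).shape)                # (1, 10)
```

Training is then the search for the setting of W1, b1, W2, b2 that makes those output scores do something desirable.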
00:01:59.280 | - Yeah, but poetry is just the collection of letters
00:02:02.120 | with spaces, but it can make us feel a certain way.
00:02:05.320 | And in that same way, when you get a large number
00:02:07.400 | of knobs together, whether it's inside the brain
00:02:10.880 | or inside a computer, they seem to surprise us
00:02:14.680 | with their power.
00:02:16.080 | - Yeah, I think that's fair.
00:02:17.400 | So basically I'm underselling it by a lot
00:02:20.000 | because you definitely do get very surprising emergent
00:02:23.760 | behaviors out of these neural nets when they're large enough
00:02:26.040 | and trained on complicated enough problems.
00:02:28.760 | Like say, for example, the next word prediction
00:02:31.280 | in a massive dataset from the internet.
00:02:33.560 | And then these neural nets take on pretty surprising
00:02:36.320 | magical properties.
00:02:37.760 | Yeah, I think it's kind of interesting how much you can get
00:02:39.960 | out of even very simple mathematical formalism.
00:02:42.280 | - When your brain right now is talking,
00:02:44.360 | is it doing next word prediction?
00:02:47.160 | Or is it doing something more interesting?
00:02:49.120 | - Well, it's definitely some kind of a generative model
00:02:50.920 | that's a GPT-like and prompted by you.
00:02:53.520 | So you're giving me a prompt and I'm kind of like responding
00:02:57.400 | to it in a generative way.
00:02:58.640 | - And by yourself, perhaps a little bit?
00:03:00.840 | Like, are you adding extra prompts from your own memory
00:03:04.680 | inside your head?
00:03:05.680 | Or no? - Well, it definitely feels
00:03:07.400 | like you're referencing some kind of a declarative structure
00:03:10.000 | of like memory and so on.
00:03:12.240 | And then you're putting that together with your prompt
00:03:15.440 | and giving away some answers.
00:03:17.080 | - How much of what you just said has been said by you before?
00:03:21.200 | - Nothing, basically, right?
00:03:23.600 | - No, but if you actually look at all the words
00:03:26.000 | you've ever said in your life and you do a search,
00:03:29.480 | you'll probably have said a lot of the same words
00:03:32.200 | in the same order before.
00:03:34.160 | - Yeah, could be.
00:03:35.400 | I mean, I'm using phrases that are common, et cetera,
00:03:37.480 | but I'm remixing it into a pretty sort of unique sentence
00:03:41.240 | at the end of the day.
00:03:42.080 | But you're right, definitely there's like a ton of remixing.
00:03:44.280 | - Why, you didn't, it's like Magnus Carlsen said,
00:03:48.360 | I'm rated 2,900 whatever, which is pretty decent.
00:03:52.360 | I think you're talking very,
00:03:55.240 | you're not giving enough credit to your own nets here.
00:03:58.080 | Why do they seem to, what's your best intuition
00:04:02.320 | about this emergent behavior?
00:04:05.600 | - I mean, it's kind of interesting
00:04:06.440 | because I'm simultaneously underselling them,
00:04:08.840 | but I also feel like there's an element to which I'm over,
00:04:11.200 | like, it's actually kind of incredible
00:04:12.800 | that you can get so much emergent magical behavior
00:04:14.800 | out of them despite them being so simple mathematically.
00:04:17.560 | So I think those are kind of like two surprising statements
00:04:19.680 | that are kind of juxtaposed together.
00:04:22.680 | And I think basically what it is,
00:04:23.760 | is we are actually fairly good at optimizing
00:04:25.880 | these neural nets.
00:04:27.160 | And when you give them a hard enough problem,
00:04:29.640 | they are forced to learn very interesting solutions
00:04:32.760 | in the optimization.
00:04:34.080 | And those solutions basically have these emergent properties
00:04:37.440 | that are very interesting.
00:04:38.840 | - There's wisdom and knowledge in the knobs.
00:04:42.720 | And so this representation that's in the knobs,
00:04:47.680 | does it make sense to you intuitively
00:04:49.400 | that a large number of knobs can hold a representation
00:04:52.720 | that captures some deep wisdom about the data
00:04:55.840 | it has looked at?
00:04:57.480 | It's a lot of knobs.
00:04:58.760 | - It's a lot of knobs.
00:05:00.120 | And somehow, you know, so speaking concretely,
00:05:03.600 | one of the neural nets that people are very excited
00:05:05.160 | about right now are GPTs,
00:05:07.520 | which are basically just next word prediction networks.
00:05:10.280 | So you consume a sequence of words from the internet
00:05:13.720 | and you try to predict the next word.
00:05:15.520 | And once you train these on a large enough dataset,
00:05:19.760 | you can basically prompt these neural nets
00:05:24.040 | in arbitrary ways and you can ask them to solve problems
00:05:25.960 | and they will.
00:05:26.960 | So you can just tell them,
00:05:28.560 | you can make it look like you're trying to solve
00:05:32.280 | some kind of a mathematical problem
00:05:33.480 | and they will continue what they think is the solution
00:05:35.440 | based on what they've seen on the internet.
00:05:36.960 | And very often, those solutions look
00:05:38.760 | very remarkably consistent, look correct potentially.
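As a rough sketch of what "prompting" a next-word predictor means mechanically (the model below is a random stand-in, not a trained GPT, and the toy vocabulary is made up): feed in the tokens so far, get a distribution over the next token, sample one, append it, and repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "answer", "is", "4", "2", "."]   # made-up toy vocabulary

def toy_model(tokens):
    # Stand-in for a trained GPT: return a probability distribution over
    # the next token given everything so far (here it is just random).
    logits = rng.normal(size=len(vocab))
    probs = np.exp(logits) / np.exp(logits).sum()  # softmax
    return probs

tokens = ["the", "answer", "is"]          # the prompt
for _ in range(3):                        # generate three more tokens
    probs = toy_model(tokens)
    tokens.append(vocab[rng.choice(len(vocab), p=probs)])
print(" ".join(tokens))
```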
00:05:41.960 | - Do you still think about the brain side of it?
00:05:45.400 | So as neural nets is an abstraction
00:05:47.520 | or mathematical abstraction of the brain,
00:05:49.560 | you still draw wisdom from the biological neural networks?
00:05:54.560 | Or even the bigger question,
00:05:57.760 | so you're a big fan of biology and biological computation.
00:06:00.940 | What impressive thing is biology doing to you
00:06:05.840 | that computers are not yet?
00:06:07.840 | That gap.
00:06:09.120 | - I would say I'm definitely on,
00:06:10.920 | I'm much more hesitant with the analogies to the brain
00:06:13.400 | than I think you would see potentially in the field.
00:06:16.240 | And I kind of feel like,
00:06:17.440 | certainly the way neural networks started
00:06:20.640 | is everything stemmed from inspiration by the brain.
00:06:23.960 | But at the end of the day,
00:06:24.960 | the artifacts that you get after training,
00:06:27.360 | they are arrived at by a very different optimization process
00:06:30.000 | than the optimization process that gave rise to the brain.
00:06:32.840 | And so I think,
00:06:33.800 | I kind of think of it as a very complicated alien artifact.
00:06:38.680 | It's something different.
00:06:39.760 | I'm sorry, the neural nets that we're training.
00:06:42.520 | They are complicated alien artifact.
00:06:45.680 | I do not make analogies to the brain
00:06:47.320 | because I think the optimization process
00:06:49.000 | that gave rise to it is very different from the brain.
00:06:51.720 | So there was no multi-agent self-play kind of setup
00:06:55.960 | and evolution.
00:06:57.040 | It was an optimization that is basically
00:06:59.520 | what amounts to a compression objective
00:07:01.880 | on a mass amount of data.
00:07:03.440 | - Okay, so artificial neural networks are doing compression
00:07:07.240 | and biological neural networks--
00:07:10.400 | - Are trying to survive.
00:07:11.400 | - Are not really doing anything.
00:07:13.280 | They're an agent in a multi-agent self-play system
00:07:16.840 | that's been running for a very, very long time.
00:07:19.440 | - That said, evolution has found that it is very useful
00:07:23.160 | to predict and have a predictive model in the brain.
00:07:26.360 | And so I think our brain utilizes something
00:07:28.760 | that looks like that as a part of it.
00:07:30.960 | But it has a lot more gadgets and gizmos
00:07:33.760 | and value functions and ancient nuclei
00:07:37.360 | that are all trying to make it survive
00:07:39.000 | and reproduce and everything else.
00:07:40.800 | - And the whole thing through embryogenesis
00:07:43.080 | is built from a single cell.
00:07:44.640 | I mean, it's just the code is inside the DNA
00:07:48.200 | and it just builds it up like the entire organism
00:07:51.280 | with arms and a head and legs.
00:07:55.240 | And it does it pretty well.
00:07:57.720 | - It should not be possible.
00:07:59.200 | - So there's some learning going on.
00:08:00.640 | There's some kind of computation
00:08:03.120 | going through that building process.
00:08:05.000 | I mean, I don't know where,
00:08:08.080 | if you were just to look at the entirety
00:08:09.960 | of history of life on Earth,
00:08:11.840 | what do you think is the most interesting invention?
00:08:15.320 | Is it the origin of life itself?
00:08:17.760 | Is it just jumping to eukaryotes?
00:08:20.440 | Is it mammals?
00:08:22.200 | Is it humans themselves, Homo sapiens?
00:08:24.720 | The origin of intelligence or highly complex intelligence?
00:08:29.760 | Or is it all just a continuation
00:08:33.240 | of the same kind of process?
00:08:34.680 | - Certainly I would say it's an extremely remarkable story
00:08:38.480 | that I'm only briefly learning about recently.
00:08:41.360 | All the way from, actually,
00:08:44.080 | you almost have to start at the formation of Earth
00:08:46.400 | and all of its conditions and the entire solar system
00:08:48.240 | and how everything is arranged with Jupiter and Moon
00:08:50.760 | and the habitable zone and everything.
00:08:53.440 | And then you have an active Earth
00:08:55.640 | that's turning over material.
00:08:57.600 | And then you start with abiogenesis and everything.
00:09:01.840 | And so it's all a pretty remarkable story.
00:09:03.920 | I'm not sure that I can pick a single unique piece of it
00:09:08.920 | that I find most interesting.
00:09:10.760 | I guess for me as an artificial intelligence researcher,
00:09:14.320 | it's probably the last piece.
00:09:15.320 | We have lots of animals that are not building
00:09:19.680 | technological society, but we do.
00:09:22.160 | And it seems to have happened very quickly.
00:09:24.800 | It seems to have happened very recently.
00:09:26.760 | And something very interesting happened there
00:09:30.120 | that I don't fully understand.
00:09:31.120 | I almost understand everything else,
00:09:32.760 | I think intuitively, but I don't understand
00:09:35.440 | exactly that part and how quick it was.
00:09:37.920 | - Both explanations would be interesting.
00:09:39.960 | One is that this is just a continuation
00:09:42.160 | of the same kind of process.
00:09:43.240 | There's nothing special about humans.
00:09:45.400 | That would be, deeply understanding that
00:09:47.680 | would be very interesting.
00:09:48.920 | That we think of ourselves as special,
00:09:50.640 | but it was obvious, it was already written in the code
00:09:56.420 | that you would have greater and greater
00:09:59.680 | intelligence emerging.
00:10:00.880 | And then the other explanation,
00:10:03.640 | which is something truly special happened,
00:10:05.640 | something like a rare event,
00:10:08.040 | whether it's like crazy rare event,
00:10:10.000 | like a space odyssey, what would it be?
00:10:12.520 | See, if you say like the invention of fire,
00:10:14.940 | or the, as Richard Wrangham says, the beta males
00:10:21.080 | deciding a clever way to kill the alpha males
00:10:25.560 | by collaborating, so just optimizing the collaboration,
00:10:29.160 | the multi-agent aspect of the multi-agent,
00:10:31.480 | and that really being constrained on resources
00:10:35.000 | and trying to survive, the collaboration aspect
00:10:38.280 | is what created the complex intelligence.
00:10:40.560 | But it seems like it's a natural outgrowth
00:10:42.880 | of the evolution process.
00:10:44.400 | What could possibly be a magical thing that happened,
00:10:47.560 | like a rare thing that would say that humans
00:10:50.520 | are actually, human level intelligence
00:10:52.760 | is actually a really rare thing in the universe?
00:10:55.680 | - Yeah, I'm hesitant to say that it is rare, by the way,
00:10:58.880 | but it definitely seems like,
00:11:01.080 | it's kind of like a punctuated equilibrium
00:11:02.780 | where you have lots of exploration,
00:11:05.100 | and then you have certain leaps, sparse leaps in between.
00:11:08.080 | So of course, like origin of life would be one,
00:11:10.840 | DNA, sex, eukaryotic system, eukaryotic life,
00:11:15.840 | the endosymbiosis event where the archaeon ate
00:11:18.800 | little bacteria, just the whole thing.
00:11:20.720 | Then of course, emergence of consciousness and so on.
00:11:23.520 | So it seems like definitely there are sparse events
00:11:25.560 | where massive amount of progress was made,
00:11:27.360 | but yeah, it's kind of hard to pick one.
00:11:29.600 | - So you don't think humans are unique?
00:11:32.360 | Gotta ask you, how many intelligent alien civilizations
00:11:35.200 | do you think are out there?
00:11:36.800 | And is their intelligence different or similar to ours?
00:11:41.800 | - Yeah. (laughs)
00:11:45.960 | I've been preoccupied with this question
00:11:47.520 | quite a bit recently, basically the Fermi paradox
00:11:50.280 | I'm just thinking through.
00:11:51.440 | And the reason actually that I am very interested
00:11:54.200 | in the origin of life is fundamentally trying to understand
00:11:57.360 | how common it is that there are technological societies
00:11:59.400 | out there in space.
00:12:02.800 | And the more I study it, the more I think that
00:12:05.900 | there should be quite a lot.
00:12:09.400 | - Why haven't we heard from them?
00:12:10.760 | 'Cause I agree with you.
00:12:11.920 | It feels like I just don't see why
00:12:16.920 | what we did here on Earth is so difficult to do.
00:12:20.140 | - Yeah, and especially when you get into the details of it,
00:12:22.000 | I used to think origin of life was very,
00:12:23.960 | it was this magical rare event,
00:12:27.200 | but then you read books like, for example, Nick Lane,
00:12:29.900 | The Vital Question, Life Ascending, et cetera.
00:12:34.260 | And he really gets in and he really makes you believe
00:12:36.960 | that this is not that rare.
00:12:38.560 | - Basic chemistry.
00:12:39.900 | - You have an active Earth and you have your alkaline vents
00:12:41.960 | and you have lots of alkaline waters
00:12:44.000 | mixing with the ocean and you have your proton gradients
00:12:47.000 | and you have little porous pockets of these alkaline vents
00:12:49.920 | that concentrate chemistry.
00:12:51.640 | And basically as he steps through all of these little pieces
00:12:54.980 | you start to understand that actually this is not that crazy
00:12:58.840 | you could see this happen on other systems.
00:13:01.540 | And he really takes you from just a geology
00:13:04.840 | to primitive life and he makes it feel like
00:13:07.900 | it's actually pretty plausible.
00:13:09.240 | And also like the origin of life
00:13:11.040 | was actually fairly fast after formation of Earth.
00:13:16.020 | If I remember correctly, just a few hundred million years
00:13:18.740 | or something like that after basically when it was possible
00:13:21.080 | life actually arose.
00:13:22.440 | And so that makes me feel like that is not the constraint.
00:13:25.000 | That is not the limiting variable
00:13:26.240 | and that life should actually be fairly common.
00:13:28.600 | And then where the drop-offs are
00:13:31.880 | is very interesting to think about.
00:13:35.480 | I currently think that there's no major drop-offs basically.
00:13:38.280 | And so there should be quite a lot of life.
00:13:40.040 | And basically where that brings me to then
00:13:42.680 | is the only way to reconcile the fact
00:13:44.440 | that we haven't found anyone and so on
00:13:46.040 | is that we just can't see them, we can't observe them.
00:13:50.540 | - Just a quick brief comment.
00:13:51.940 | Nick Lane and a lot of biologists I talk to,
00:13:54.480 | they really seem to think that the jump from bacteria
00:13:58.280 | to more complex organisms is the hardest jump.
00:14:01.100 | - The eukaryotic life basically.
00:14:02.380 | - Yeah, which I get it.
00:14:04.820 | They're much more knowledgeable than me
00:14:08.400 | about the intricacies of biology.
00:14:11.020 | But that seems like crazy 'cause how many
00:14:14.660 | single-celled organisms are there?
00:14:16.980 | And how much time you have, surely it's not that difficult.
00:14:21.340 | Like in a billion years is not even that long
00:14:24.620 | of a time really.
00:14:26.140 | Just all these bacteria under constrained resources
00:14:29.300 | battling it out, I'm sure they can invent more complex.
00:14:32.020 | Like I don't understand, it's like how to move
00:14:34.460 | from a Hello World program to like invent a function
00:14:38.300 | or something like that.
00:14:39.460 | Yeah, so I'm with you.
00:14:43.100 | I just feel like I don't see any,
00:14:45.060 | if the origin of life, that would be my intuition,
00:14:47.900 | that's the hardest thing.
00:14:48.980 | But if that's not the hardest thing
00:14:50.140 | 'cause it happens so quickly,
00:14:51.780 | then it's gotta be everywhere.
00:14:53.140 | And yeah, maybe we're just too dumb to see it.
00:14:55.340 | - Well, it's just we don't have really good mechanisms
00:14:57.100 | for seeing this life.
00:14:58.060 | I mean, by what, how?
00:15:01.100 | So I'm not an expert just to preface this,
00:15:02.700 | but just from what I've been looking at it.
00:15:04.220 | (laughing)
00:15:05.500 | - I wanna meet an expert on alien intelligence
00:15:08.220 | and how to communicate.
00:15:09.060 | - I'm very suspicious of our ability
00:15:10.460 | to find these intelligences out there
00:15:12.420 | and to find these, or it's like radio waves,
00:15:14.540 | for example, are terrible.
00:15:16.380 | Their power drops off as basically one over r squared.
00:15:19.180 | So I remember reading that our current radio waves
00:15:22.060 | would not be, the ones that we are broadcasting
00:15:25.380 | would not be measurable by our devices today.
00:15:28.780 | Only like, was it like one-tenth of a light year away?
00:15:30.900 | Like not even, basically tiny distance
00:15:33.020 | because you really need like a targeted transmission
00:15:36.260 | of massive power directed somewhere
00:15:38.580 | for this to be picked up on long distances.
00:15:41.340 | And so I just think that our ability to measure
00:15:43.060 | is not amazing.
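For a sense of the inverse-square dilution being described here (the transmitter power and distances below are illustrative assumptions, not figures from the conversation), the flux from an isotropic broadcast falls as P / (4πr²):

```python
import math

P = 1e6                       # assumed: a 1-megawatt isotropic broadcast (hypothetical)
LIGHT_YEAR = 9.46e15          # meters

for r_ly in (0.1, 1.0, 4.2):  # 4.2 ly is roughly the distance to Proxima Centauri
    r = r_ly * LIGHT_YEAR
    flux = P / (4 * math.pi * r**2)   # W/m^2: the 1/r^2 falloff
    print(f"{r_ly:4.1f} ly: {flux:.2e} W/m^2")
```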
00:15:45.020 | I think there's probably other civilizations out there.
00:15:47.020 | And then the big question is why don't they build
00:15:48.620 | von Neumann probes and why don't they interstellar travel
00:15:50.820 | across the entire galaxy?
00:15:52.460 | And my current answer is it's probably interstellar travel
00:15:55.020 | is like really hard.
00:15:56.540 | You have the interstellar medium.
00:15:57.620 | If you wanna move at close to the speed of light,
00:15:59.380 | you're going to be encountering bullets along the way
00:16:01.860 | because even like tiny hydrogen atoms
00:16:04.420 | and little particles of dust are basically have
00:16:07.060 | like massive kinetic energy at those speeds.
00:16:09.460 | And so basically you need some kind of shielding.
00:16:11.460 | You need, you have all the cosmic radiation.
00:16:13.940 | It's just like brutal out there.
00:16:15.100 | It's really hard.
00:16:16.020 | And so my thinking is maybe interstellar travel
00:16:18.100 | is just extremely hard.
00:16:19.900 | And you have to go very slow. - And billions of years
00:16:21.420 | to build hard?
00:16:22.500 | It feels like we're not a billion years away from doing that.
00:16:28.580 | - It just might be that it's very,
00:16:30.260 | you have to go very slowly, potentially, as an example,
00:16:32.860 | through space.
00:16:34.300 | - Right, as opposed to close to the speed of light.
00:16:36.660 | - So I'm suspicious basically of our ability to measure life
00:16:38.860 | and I'm suspicious of the ability to just permeate
00:16:42.180 | all of space in the galaxy or across galaxies.
00:16:44.460 | And that's the only way that I can currently see
00:16:47.060 | a way around it.
00:16:47.980 | - Yeah, it's kind of mind blowing to think
00:16:49.740 | that there's trillions of intelligent alien civilizations
00:16:53.820 | out there kind of slowly traveling through space
00:16:57.140 | to meet each other.
00:16:59.100 | And some of them meet, some of them go to war,
00:17:01.300 | some of them collaborate.
00:17:03.080 | - Or they're all just independent.
00:17:06.260 | They're all just like little pockets.
00:17:08.940 | - Well, statistically, if there's trillions of them,
00:17:13.340 | surely some of the pockets are close enough together.
00:17:16.260 | - Some of them happen to be close, yeah.
00:17:18.340 | - Close enough to see each other.
00:17:19.580 | And then once you see something that is definitely
00:17:24.460 | complex life, like if we see something,
00:17:28.060 | we're probably going to be severely, intensely,
00:17:30.900 | aggressively motivated to figure out what the hell that is
00:17:33.700 | and try to meet them.
00:17:35.060 | But what would be your first instinct to try to,
00:17:38.420 | like at a generational level, meet them or defend
00:17:43.140 | against them or what would be your instinct
00:17:47.860 | as a president of the United States and a scientist?
00:17:51.840 | I don't know which hat you prefer in this question.
00:17:55.520 | - Yeah, I think the question, it's really hard.
00:17:58.180 | I will say, like for example, for us,
00:18:02.760 | we have lots of primitive life forms on earth next to us.
00:18:05.960 | We have all kinds of ants and everything else,
00:18:07.880 | and we share a space with them.
00:18:09.280 | And we are hesitant to impact on them.
00:18:11.720 | And we're trying to protect them by default
00:18:14.920 | because they are amazing, interesting, dynamical systems
00:18:17.360 | that took a long time to evolve,
00:18:18.640 | and they are interesting and special.
00:18:20.600 | And I don't know that you want to destroy that by default.
00:18:25.600 | And so I like complex, dynamical systems
00:18:29.560 | that took a lot of time to evolve.
00:18:31.640 | I think I'd like to preserve it if I can afford to.
00:18:36.640 | And I'd like to think that the same would be true
00:18:38.440 | about the galactic resources and that they would think
00:18:41.960 | that we're kind of incredible, interesting story
00:18:44.140 | that took time, it took a few billion years to unravel,
00:18:47.560 | and you don't want to just destroy it.
00:18:49.000 | - I could see two aliens talking about earth right now
00:18:51.720 | and saying, "I'm a big fan of complex, dynamical systems.
00:18:55.560 | So I think it's a value to preserve these."
00:18:59.440 | And we'll basically be a video game they watch
00:19:01.560 | or a show, a TV show, that they watch.
00:19:04.200 | - Yeah, I think you would need a very good reason,
00:19:06.320 | I think, to destroy it.
00:19:08.800 | Like, why don't we destroy these ant farms and so on?
00:19:10.600 | It's because we're not actually really
00:19:11.840 | in direct competition with them right now.
00:19:14.660 | We do it accidentally and so on,
00:19:16.000 | but there's plenty of resources.
00:19:19.440 | And so why would you destroy something
00:19:20.920 | that is so interesting and precious?
00:19:22.360 | - Well, from a scientific perspective, you might probe it.
00:19:25.640 | You might interact with it lightly.
00:19:27.560 | - You might want to learn something from it, right?
00:19:29.520 | So I wonder, there could be certain physical phenomena
00:19:32.320 | that we think is a physical phenomena,
00:19:34.180 | but it's actually interacting with us
00:19:35.960 | to poke the finger and see what happens.
00:19:38.440 | - I think it should be very interesting to scientists,
00:19:40.080 | other alien scientists, what happened here.
00:19:42.980 | And what we're seeing today is a snapshot.
00:19:45.720 | Basically, it's a result of a huge amount of computation
00:19:48.520 | over a billion years or something like that.
00:19:52.680 | - It could have been initiated by aliens.
00:19:54.880 | This could be a computer running a program.
00:19:58.360 | If you had the power to do this, okay, for sure,
00:20:01.880 | at least I would, I would pick an Earth-like planet
00:20:06.180 | that has the conditions, based on my understanding
00:20:07.940 | of the chemistry prerequisites for life,
00:20:10.600 | and I would seed it with life and run it, right?
00:20:14.760 | Wouldn't you 100% do that and observe it and protect?
00:20:19.200 | I mean, that's not just a hell of a good TV show.
00:20:21.840 | It's a good scientific experiment.
00:20:27.040 | - It's physical simulation, right?
00:20:29.880 | Maybe evolution is the most, like actually running it
00:20:34.600 | is the most efficient way to understand computation
00:20:40.200 | or to compute stuff.
00:20:41.280 | - Or to understand life or what life looks like
00:20:44.220 | and what branches it can take.
00:20:46.120 | - It does make me kind of feel weird
00:20:47.600 | that we're part of a science experiment,
00:20:49.160 | but maybe everything's a science experiment.
00:20:52.920 | Does that change anything for us, for a science experiment?
00:20:56.960 | - I don't know.
00:20:58.320 | - Two descendants of apes talking about
00:21:00.520 | being inside of a science experiment?
00:21:01.880 | - I'm suspicious of this idea of like a deliberate
00:21:04.240 | panspermia, as you described it.
00:21:06.640 | I don't see a divine intervention in some way
00:21:09.040 | in the historical record right now.
00:21:11.200 | I do feel like the story in these books,
00:21:15.080 | like Nick Lane's books and so on, sort of makes sense,
00:21:17.440 | and it makes sense how life arose on Earth uniquely.
00:21:20.660 | And yeah, I don't need to reach
00:21:23.580 | for more exotic explanations right now.
00:21:25.400 | - Sure, but NPCs inside a video game
00:21:27.600 | don't observe any divine intervention either.
00:21:32.360 | We might just be all NPCs running a kind of code.
00:21:35.440 | - Maybe eventually they will.
00:21:36.400 | Currently, NPCs are really dumb,
00:21:37.720 | but once they're running GPTs,
00:21:39.840 | maybe they will be like,
00:21:40.840 | hey, this is really suspicious, what the hell?
00:21:43.440 | - So you famously tweeted,
00:21:45.800 | "It looks like if you bombard Earth
00:21:47.880 | "with photons for a while, you can emit a roadster."
00:21:51.660 | So if like in "Hitchhiker's Guide to the Galaxy,"
00:21:54.840 | we would summarize the story of Earth.
00:21:56.840 | So in that book, it's mostly harmless.
00:21:59.460 | What do you think is all the possible stories,
00:22:02.760 | like a paragraph long or a sentence long,
00:22:05.800 | that Earth could be summarized as?
00:22:08.560 | Once it's done, it's computation.
00:22:11.200 | So like all the possible full,
00:22:13.420 | if Earth is a book, right?
00:22:16.240 | - Yeah.
00:22:17.080 | - Probably there has to be an ending.
00:22:19.920 | I mean, there's going to be an end to Earth,
00:22:21.440 | and it could end in all kinds of ways.
00:22:23.200 | It can end soon, it can end later.
00:22:25.280 | What do you think are the possible stories?
00:22:27.480 | - Well, definitely there seems to be,
00:22:29.840 | yeah, you're sort of,
00:22:30.880 | it's pretty incredible that these self-replicating systems
00:22:34.160 | will basically arise from the dynamics,
00:22:37.240 | and then they perpetuate themselves and become more complex,
00:22:39.500 | and eventually become conscious and build a society.
00:22:42.760 | And I kind of feel like in some sense,
00:22:44.280 | it's kind of like a deterministic wave
00:22:46.880 | that kind of just like happens on any,
00:22:50.840 | any sufficiently well-arranged system like Earth.
00:22:53.880 | And so I kind of feel like there's a certain sense
00:22:55.840 | of inevitability in it, and it's really beautiful.
00:22:59.800 | - And it ends somehow, right?
00:23:00.960 | So it's a chemically,
00:23:04.360 | a diverse environment where complex dynamical systems
00:23:10.040 | can evolve and become more, further and further complex.
00:23:15.040 | But then there's a certain,
00:23:17.360 | what is it?
00:23:20.400 | There's certain terminating conditions.
00:23:22.640 | - Yeah, I don't know what the terminating conditions are,
00:23:25.080 | but definitely there's a trend line of something,
00:23:27.120 | and we're part of that story.
00:23:28.160 | And like, where does that, where does it go?
00:23:30.680 | So, you know, we're famously described often
00:23:32.480 | as a biological bootloader for AIs.
00:23:35.080 | And that's because humans, I mean, you know,
00:23:36.240 | we're an incredible biological system,
00:23:39.000 | and we're capable of computation and, you know,
00:23:42.000 | and love and so on.
00:23:43.540 | But we're extremely inefficient as well.
00:23:46.200 | Like we're talking to each other through audio.
00:23:47.840 | It's just kind of embarrassing, honestly,
00:23:49.840 | that we're manipulating like seven symbols,
00:23:52.320 | serially, we're using vocal cords,
00:23:55.160 | it's all happening over like multiple seconds.
00:23:57.600 | It's just like kind of embarrassing
00:23:58.800 | when you step down to the frequencies
00:24:01.920 | at which computers operate or are able to operate on.
00:24:05.160 | And so basically it does seem like synthetic intelligences
00:24:09.720 | are kind of like the next stage of development.
00:24:12.320 | And I don't know where it leads to,
00:24:14.560 | like at some point I suspect the universe
00:24:18.200 | is some kind of a puzzle.
00:24:20.600 | And these synthetic AIs will uncover that puzzle
00:24:24.000 | and solve it.
00:24:26.760 | - And then what happens after, right?
00:24:28.640 | Like what, 'cause if you just like fast forward Earth,
00:24:31.600 | many billions of years, it's like, it's quiet.
00:24:35.120 | And then it's like, to normal,
00:24:36.600 | you see like city lights and stuff like that.
00:24:38.280 | And then what happens at like at the end?
00:24:40.120 | Like, is it like a (mimics explosion)
00:24:42.240 | or is it like a calming, is it explosion?
00:24:45.480 | Is it like Earth like open, like a giant?
00:24:47.640 | 'Cause you said emit roadsters.
00:24:50.280 | Will it start emitting like a giant number of like satellites?
00:24:55.280 | - Yes, it's some kind of a crazy explosion.
00:24:58.280 | And we're living, we're like,
00:25:00.000 | we're stepping through a explosion
00:25:01.980 | and we're like living day to day
00:25:03.240 | and it doesn't look like it, but it's actually,
00:25:04.720 | if you, I saw a very cool animation of Earth
00:25:07.600 | and life on Earth, and basically nothing happens
00:25:09.240 | for a long time.
00:25:10.080 | And then the last like two seconds,
00:25:11.520 | like basically cities and everything,
00:25:12.960 | and the lower orbit just gets cluttered
00:25:15.840 | and just the whole thing happens in the last two seconds
00:25:17.480 | and you're like, this is exploding.
00:25:19.160 | This is a state of explosion.
00:25:21.080 | - So if you play, yeah, yeah,
00:25:23.480 | if you play at a normal speed,
00:25:25.600 | it'll just look like an explosion.
00:25:27.520 | - It's a firecracker.
00:25:28.360 | We're living in a firecracker.
00:25:30.360 | - Where it's going to start emitting
00:25:31.960 | all kinds of interesting things.
00:25:33.720 | And then, so explosion doesn't,
00:25:36.240 | it might actually look like a little explosion
00:25:38.680 | with lights and fire and energy emitted,
00:25:41.200 | all that kind of stuff.
00:25:42.040 | But when you look inside the details of the explosion,
00:25:45.280 | there's actual complexity happening
00:25:47.960 | where there's like, yeah, human life or some kind of life.
00:25:52.120 | - We hope it's not a destructive firecracker.
00:25:53.720 | It's kind of like a constructive firecracker.
00:25:57.920 | - All right, so given that, hilarious discussion.
00:26:01.080 | - It is really interesting to think about like
00:26:02.440 | what the puzzle of the universe is.
00:26:03.880 | Did the creator of the universe give us a message?
00:26:06.520 | Like for example, in the book "Contact", Carl Sagan,
00:26:09.640 | there's a message for any civilization in digits,
00:26:15.040 | in the expansion of pi in base 11 eventually,
00:26:18.080 | which is kind of interesting thought.
00:26:19.800 | Maybe we're supposed to be giving a message to our creator.
00:26:23.080 | Maybe we're supposed to somehow create
00:26:24.760 | some kind of a quantum mechanical system
00:26:26.600 | that alerts them to our intelligent presence here.
00:26:30.080 | 'Cause if you think about it from their perspective,
00:26:31.840 | it's just say like quantum field theory,
00:26:33.880 | massive like cellular automaton like thing.
00:26:36.680 | And like, how do you even notice that we exist?
00:26:38.520 | You might not even be able to pick us up in that simulation.
00:26:42.080 | And so how do you prove that you exist,
00:26:44.760 | that you're intelligent
00:26:45.600 | and that you're a part of the universe?
00:26:47.520 | - So this is like a Turing test for intelligence from Earth.
00:26:50.280 | - Yeah. - Like the creator is,
00:26:52.200 | I mean, maybe this is like trying to complete
00:26:54.560 | the next word in a sentence.
00:26:55.760 | This is a complicated way of that.
00:26:57.240 | Like Earth is just, is basically sending a message back.
00:27:00.840 | - Yeah, the puzzle is basically like alerting the creator
00:27:03.120 | that we exist.
00:27:04.520 | Or maybe the puzzle is just to just break out of the system
00:27:07.160 | and just stick it to the creator in some way.
00:27:10.360 | Basically, like if you're playing a video game,
00:27:12.200 | you can somehow find an exploit
00:27:15.400 | and find a way to execute on the host machine,
00:27:18.480 | any arbitrary code.
00:27:19.800 | There's some, for example,
00:27:21.440 | I believe someone got a game of Mario to play Pong
00:27:24.520 | just by exploiting it and then creating,
00:27:29.160 | basically writing code
00:27:30.800 | and being able to execute arbitrary code in the game.
00:27:33.200 | And so maybe we should be,
00:27:34.760 | maybe that's the puzzle, is that we should
00:27:37.000 | find a way to exploit it.
00:27:39.240 | So I think like some of these synthetic AI's
00:27:41.160 | will eventually find the universe to be some kind of a puzzle
00:27:43.520 | and then solve it in some way.
00:27:45.120 | And that's kind of like the end game somehow.
00:27:47.440 | - Do you often think about it as a simulation?
00:27:51.360 | So as the universe being a kind of computation
00:27:55.440 | that has, might have bugs and exploits?
00:27:57.800 | - Yes. Yeah, I think so.
00:27:59.160 | - Is that what physics is essentially?
00:28:01.160 | - I think it's possible that physics has exploits
00:28:03.040 | and we should be trying to find them.
00:28:04.720 | Arranging some kind of a crazy quantum mechanical system
00:28:07.080 | that somehow gives you buffer overflow,
00:28:09.560 | somehow gives you a rounding error in the floating point.
00:28:12.360 | - Yeah, that's right.
00:28:16.120 | And like more and more sophisticated exploits.
00:28:18.960 | Those are jokes, but that could be actually very close.
00:28:21.400 | - Yeah, we'll find some way to extract infinite energy.
00:28:23.840 | For example, when you train reinforcement learning agents
00:28:26.680 | in physical simulations
00:28:27.800 | and you ask them to say run quickly on the flat ground,
00:28:31.280 | they'll end up doing all kinds of like weird things
00:28:33.880 | in part of that optimization, right?
00:28:35.200 | They'll get on their back leg
00:28:36.560 | and they will slide across the floor.
00:28:38.680 | And it's because of the optimization,
00:28:40.920 | the enforcement learning optimization on that agent
00:28:42.760 | has figured out a way to extract infinite energy
00:28:44.400 | from the friction forces
00:28:45.520 | and basically their poor implementation.
00:28:48.520 | And they found a way to generate infinite energy
00:28:50.560 | and just slide across the surface.
00:28:51.680 | And it's not what you expected.
00:28:52.840 | It's just a, it's sort of like a perverse solution.
00:28:56.120 | And so maybe we can find something like that.
00:28:57.920 | Maybe we can be that little dog in this physical simulation.
00:29:02.320 | - That cracks or escapes the intended consequences
00:29:07.040 | of the physics that the universe came up with.
00:29:09.600 | We'll figure out some kind of shortcut to some weirdness.
00:29:12.040 | And then, oh man, but see the problem with that weirdness
00:29:15.000 | is the first person to discover the weirdness,
00:29:17.600 | like sliding on the back legs, that's all we're gonna do.
00:29:21.360 | It's very quickly becomes everybody does that thing.
00:29:26.840 | So like the paperclip maximizer is a ridiculous idea,
00:29:31.300 | but that very well could be what then we'll just,
00:29:35.800 | we'll just all switch to that 'cause it's so fun.
00:29:38.040 | - Well, no person will discover it, I think, by the way.
00:29:39.920 | I think it's going to have to be some kind
00:29:42.400 | of a super intelligent AGI of a third generation.
00:29:45.760 | Like we're building the first generation AGI.
00:29:48.920 | Maybe, you know.
00:29:50.520 | - Third generation.
00:29:51.880 | Yeah, so the bootloader for an AI,
00:29:55.640 | that AI will be a bootloader for another AI.
00:29:58.560 | - Better AI, yeah.
00:30:00.080 | - And then there's no way for us to introspect
00:30:02.640 | like what that might even--
00:30:04.240 | - I think it's very likely that these things, for example,
00:30:05.880 | like say you have these AGIs, it's very likely,
00:30:08.160 | for example, they will be completely inert.
00:30:10.480 | I like these kinds of sci-fi books sometimes
00:30:12.160 | where these things are just completely inert.
00:30:14.120 | They don't interact with anything.
00:30:15.520 | And I find that kind of beautiful
00:30:16.560 | because they've probably figured out the meta game
00:30:20.680 | of the universe in some way, potentially.
00:30:22.080 | They're doing something completely beyond our imagination
00:30:25.040 | and they don't interact with simple chemical life forms.
00:30:29.880 | Like why would you do that?
00:30:31.280 | So I find those kinds of ideas compelling.
00:30:33.440 | - What's their source of fun?
00:30:34.840 | What are they doing?
00:30:36.320 | What's the source of pleasure?
00:30:37.140 | - Well, it's probably puzzle solving in the universe.
00:30:38.960 | - But inert, so can you define what it means inert?
00:30:43.000 | So they escape the interaction with physical reality?
00:30:44.840 | - They will appear inert to us, as in,
00:30:46.940 | they will behave in some very strange way to us
00:30:53.360 | because they're beyond, they're playing the meta game.
00:30:57.160 | And the meta game is probably, say,
00:30:58.280 | like arranging quantum mechanical systems
00:30:59.880 | in some very weird ways to extract infinite energy,
00:31:03.160 | solve the digital expansion of pi to whatever amount.
00:31:07.040 | They will build their own little fusion reactors
00:31:09.560 | or something crazy.
00:31:10.640 | Like they're doing something beyond comprehension
00:31:12.240 | and not understandable to us
00:31:14.360 | and actually brilliant under the hood.
00:31:17.040 | - What if quantum mechanics itself is the system
00:31:20.160 | and we're just thinking it's physics
00:31:23.200 | but we're really parasites, not parasites,
00:31:27.600 | we're not really hurting physics.
00:31:29.400 | We're just living on this organism,
00:31:31.640 | this organism and we're like trying to understand it
00:31:34.760 | but really it is an organism
00:31:36.760 | and with a deep, deep intelligence.
00:31:38.160 | Maybe physics itself is the organism
00:31:42.160 | that's doing the super interesting thing
00:31:46.640 | and we're just like one little thing,
00:31:48.760 | ant sitting on top of it trying to get energy from it.
00:31:52.440 | - We're just kind of like these particles in a wave
00:31:54.840 | that I feel like is mostly deterministic
00:31:56.400 | and takes a universe from some kind of a Big Bang
00:31:58.960 | to some kind of a super intelligent replicator,
00:32:02.200 | some kind of a stable point in the universe
00:32:04.880 | given these laws of physics.
00:32:06.480 | - You don't think, as Einstein said,
00:32:09.000 | God doesn't play dice?
00:32:10.800 | So you think it's mostly deterministic?
00:32:12.640 | There's no randomness in the thing?
00:32:13.760 | - I think it's deterministic.
00:32:14.680 | Oh, there's tons of, well,
00:32:16.840 | I want to be careful with randomness.
00:32:18.120 | - Pseudo random?
00:32:19.440 | - Yeah, I don't like random.
00:32:20.680 | I think maybe the laws of physics are deterministic.
00:32:23.360 | Yeah, I think they're deterministic.
00:32:25.360 | - We just got really uncomfortable with this question.
00:32:27.600 | (Andrej laughing)
00:32:29.280 | Do you have anxiety about whether the universe
00:32:31.240 | is random or not?
00:32:32.200 | Is this a source?
00:32:33.280 | (Andrej laughing)
00:32:34.240 | What's-- - There's no randomness.
00:32:36.560 | - You said you like goodwill hunting.
00:32:38.080 | It's not your fault, Andrej.
00:32:39.320 | (Andrej laughing)
00:32:40.360 | It's not your fault, man.
00:32:41.640 | So you don't like randomness?
00:32:45.320 | - Yeah, I think it's unsettling.
00:32:46.960 | I think it's a deterministic system.
00:32:48.800 | I think that things that look random,
00:32:50.600 | like say the collapse of the wave function, et cetera,
00:32:53.320 | I think they're actually deterministic,
00:32:54.720 | just entanglement and so on,
00:32:56.800 | and some kind of a multi-verse theory, something, something.
00:32:59.640 | - Okay, so why does it feel like we have a free will?
00:33:02.800 | Like if I raise this hand, I chose to do this now.
00:33:06.040 | That doesn't feel like a deterministic thing.
00:33:12.280 | It feels like I'm making a choice.
00:33:14.480 | - It feels like it.
00:33:15.680 | - Okay, so it's all feelings.
00:33:17.160 | It's just feelings.
00:33:18.840 | So when an RL agent is making a choice,
00:33:21.800 | it's not really making a choice.
00:33:26.000 | The choice is already there.
00:33:27.680 | - Yeah, you're interpreting the choice
00:33:28.960 | and you're creating a narrative for having made it.
00:33:32.120 | - Yeah, and now we're talking about the narrative.
00:33:33.880 | It's very meta.
00:33:35.200 | Looking back, what is the most beautiful
00:33:37.720 | or surprising idea in deep learning or AI
00:33:41.080 | in general that you've come across?
00:33:43.240 | You've seen this field explode
00:33:45.120 | and grow in interesting ways.
00:33:47.720 | Just what cool ideas, like we made you sit back and go,
00:33:52.240 | hmm, small, big or small?
00:33:55.560 | - Well, the one that I've been thinking about recently
00:33:57.880 | the most probably is the transformer architecture.
00:34:02.000 | So basically neural networks have,
00:34:06.080 | a lot of architectures that were trendy
00:34:07.800 | have come and gone for different sensory modalities,
00:34:10.720 | like for vision, audio, text.
00:34:12.720 | You would process them with different looking neural nets.
00:34:14.760 | And recently we've seen this convergence
00:34:16.800 | towards one architecture, the transformer.
00:34:19.120 | And you can feed it video or you can feed it images
00:34:22.440 | or speech or text, and it just gobbles it up.
00:34:24.280 | And it's kind of like a bit of a general purpose computer
00:34:28.640 | that is also trainable and very efficient
00:34:30.440 | to run on our hardware.
00:34:32.000 | And so this paper came out in 2016, I wanna say.
00:34:36.840 | - "Attention is All You Need."
00:34:38.000 | - "Attention is All You Need."
00:34:39.320 | - You criticized the paper title in retrospect
00:34:41.720 | that it wasn't, it didn't foresee the bigness of the impact
00:34:46.720 | that it was going to have.
00:34:48.960 | - Yeah, I'm not sure if the authors were aware
00:34:50.440 | of the impact that that paper would go on to have.
00:34:52.680 | Probably they weren't.
00:34:53.880 | But I think they were aware of some of the motivations
00:34:56.280 | and design decisions behind the transformer.
00:34:58.080 | And they chose not to, I think,
00:34:59.680 | expand on it in that way in the paper.
00:35:01.800 | And so I think they had an idea that there was more
00:35:04.360 | than just the surface of just like,
00:35:06.800 | "Oh, we're just doing translation
00:35:07.840 | and here's a better architecture."
00:35:09.120 | You're not just doing translation.
00:35:10.080 | This is like a really cool, differentiable,
00:35:12.200 | optimizable, efficient computer that you've proposed.
00:35:14.880 | And maybe they didn't have all of that foresight,
00:35:16.840 | but I think it's really interesting.
00:35:18.240 | - Isn't it funny, sorry to interrupt,
00:35:20.320 | that title is memeable,
00:35:22.480 | that they went for such a profound idea.
00:35:25.400 | They went with a,
00:35:26.440 | I don't think anyone used that kind of title before, right?
00:35:29.360 | - Attention is all you need.
00:35:30.200 | Yeah, it's like a meme or something, basically.
00:35:32.520 | - Isn't that funny?
00:35:33.520 | Maybe if it was a more serious title,
00:35:37.600 | it wouldn't have the impact.
00:35:38.680 | - Honestly, yeah, there is an element of me
00:35:40.600 | that honestly agrees with you and prefers it this way.
00:35:43.000 | - Yes.
00:35:43.840 | (both laughing)
00:35:45.880 | - If it was too grand, it would over-promise
00:35:47.800 | and then under-deliver, potentially.
00:35:49.120 | So you want to just meme your way to greatness.
00:35:51.800 | (both laughing)
00:35:53.280 | - That should be a T-shirt.
00:35:54.400 | So you tweeted, "The Transformer
00:35:56.160 | is a magnificent neural network architecture
00:35:58.880 | because it is a general-purpose differentiable computer.
00:36:01.760 | It is simultaneously expressive in the forward pass,
00:36:05.040 | optimizable via backpropagation gradient descent,
00:36:08.520 | and efficient, high-parallelism compute graph."
00:36:12.360 | Can you discuss some of those details,
00:36:14.280 | expressive, optimizable, efficient,
00:36:17.360 | from memory or in general, whatever comes to your heart?
00:36:21.000 | - You want to have a general-purpose computer
00:36:22.200 | that you can train on arbitrary problems,
00:36:24.280 | like say the task of next word prediction
00:36:26.160 | or detecting if there's a cat in an image
00:36:28.200 | or something like that.
00:36:29.680 | And you want to train this computer,
00:36:31.040 | so you want to set its weights.
00:36:32.720 | And I think there's a number of design criteria
00:36:34.480 | that sort of overlap in the Transformer simultaneously
00:36:37.760 | that made it very successful.
00:36:38.920 | And I think the authors were kind of deliberately trying
00:36:41.960 | to make this a really powerful architecture.
00:36:46.200 | And so basically it's very powerful in the forward pass
00:36:50.640 | because it's able to express very general computation
00:36:55.520 | as sort of something that looks like message passing.
00:36:57.880 | You have nodes and they all store vectors.
00:37:00.040 | And these nodes get to basically look at each other
00:37:02.600 | and each other's vectors, and they get to communicate.
00:37:06.080 | And basically nodes get to broadcast,
00:37:08.560 | "Hey, I'm looking for certain things."
00:37:09.840 | And then other nodes get to broadcast,
00:37:11.200 | "Hey, these are the things I have.
00:37:12.600 | Those are the keys and the values."
00:37:13.760 | - So it's not just attention.
00:37:15.200 | - Yeah, exactly.
00:37:16.040 | Transformer is much more than just the attention component.
00:37:17.680 | It's got many architectural pieces that went into it.
00:37:20.160 | The residual connection, the way it's arranged,
00:37:21.840 | there's a multilayer perceptron in there,
00:37:23.840 | the way it's stacked and so on.
00:37:25.960 | But basically there's a message passing scheme
00:37:28.480 | where nodes get to look at each other,
00:37:29.840 | decide what's interesting and then update each other.
00:37:32.680 | And so I think when you get to the details of it,
00:37:35.720 | I think it's a very expressive function.
00:37:37.760 | So it can express lots of different types of algorithms
00:37:39.840 | in the forward pass.
00:37:40.800 | Not only that, but the way it's designed
00:37:42.560 | with the residual connections, layer normalizations,
00:37:44.600 | the softmax attention and everything,
00:37:46.400 | it's also optimizable.
00:37:47.480 | This is a really big deal
00:37:48.720 | because there's lots of computers that are powerful
00:37:51.320 | that you can't optimize,
00:37:52.800 | or they're not easy to optimize
00:37:53.920 | using the techniques that we have,
00:37:55.080 | which is back propagation and gradient descent.
00:37:56.680 | These are first order methods,
00:37:57.960 | very simple optimizers really.
00:37:59.760 | And so you also need it to be optimizable.
00:38:02.920 | And then lastly,
00:38:04.880 | you want it to run efficiently on our hardware.
00:38:06.520 | Our hardware is a massive throughput machine like GPUs.
00:38:10.640 | They prefer lots of parallelism.
00:38:13.040 | So you don't want to do lots of sequential operations.
00:38:14.960 | You want to do a lot of operations in parallel.
00:38:16.840 | And the transformer is designed with that in mind as well.
00:38:19.240 | And so it's designed for our hardware
00:38:21.400 | and is designed to both be very expressive
00:38:23.160 | in a forward pass,
00:38:24.000 | but also very optimizable in the backward pass.
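A minimal numpy sketch of that queries/keys/values message-passing picture (a single attention head with made-up shapes, leaving out the residual connections, layer norms, and MLP mentioned here):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                      # 4 "nodes" (tokens), each storing an 8-dim vector
x = rng.normal(size=(T, d))

# Trainable knobs: projections into queries ("what I'm looking for"),
# keys ("what I have"), and values ("what I'll hand over").
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)    # every node looks at every other node
A = softmax(scores)              # who attends to whom
out = A @ V                      # each node updates itself from the values it attends to
print(out.shape)                 # (4, 8)
```

Every operation here is differentiable and maps onto large matrix multiplies, which is what makes the same scheme optimizable by gradient descent and efficient on GPUs.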
00:38:26.320 | - And you said that the residual connections
00:38:29.280 | support a kind of ability to learn short algorithms
00:38:32.240 | fast and first,
00:38:33.240 | and then gradually extend them longer during training.
00:38:37.000 | What's the idea of learning short algorithms?
00:38:39.560 | - Right.
00:38:40.400 | Think of it as a,
00:38:41.240 | so basically a transformer is a series of blocks, right?
00:38:45.880 | And these blocks have attention
00:38:47.040 | and a little multilayer perceptron.
00:38:48.560 | And so you go off into a block
00:38:50.480 | and you come back to this residual pathway,
00:38:52.320 | and then you go off and you come back,
00:38:53.480 | and then you have a number of layers arranged sequentially.
00:38:55.960 | And so the way to look at it, I think,
00:38:57.560 | is because of the residual pathway in the backward pass,
00:39:00.520 | the gradients sort of flow along it uninterrupted
00:39:04.280 | because addition distributes the gradient equally
00:39:07.080 | to all of its branches.
00:39:08.360 | So the gradient from the supervision at the top
00:39:10.760 | just flows directly to the first layer.
00:39:13.880 | And all the residual connections are arranged
00:39:16.200 | so that in the beginning, during initialization,
00:39:18.120 | they contribute nothing to the residual pathway.
00:39:20.480 | So what it kind of looks like is,
00:39:22.840 | imagine the transformer is kind of like a Python function,
00:39:26.800 | like a def.
00:39:27.880 | And you get to do various kinds of lines of code.
00:39:32.120 | Say you have a hundred layers deep transformer,
00:39:35.360 | typically they would be much shorter, say 20.
00:39:37.120 | So you have 20 lines of code
00:39:37.960 | and you can do something in them.
00:39:39.440 | And so during the optimization,
00:39:41.040 | basically what it looks like is,
00:39:41.920 | first you optimize the first line of code
00:39:43.440 | and then the second line of code can kick in
00:39:45.040 | and the third line of code can kick in.
00:39:46.600 | And I kind of feel like because of the residual pathway
00:39:48.800 | and the dynamics of the optimization,
00:39:50.840 | you can sort of learn a very short algorithm
00:39:52.920 | that gets the approximate answer,
00:39:54.280 | but then the other layers can sort of kick in
00:39:56.120 | and start to create a contribution.
00:39:57.640 | And at the end of it, you're optimizing over an algorithm
00:40:00.040 | that is 20 lines of code.
00:40:02.480 | Except these lines of code are very complex
00:40:03.920 | because it's an entire block of a transformer.
00:40:05.760 | You can do a lot in there.
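A toy sketch of that residual-pathway intuition (a heavily simplified pre-norm block, with each block's attention-plus-MLP branch collapsed into a single matrix, and zero-initialized weights to exaggerate the "contribute nothing at initialization" point):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 16, 20                      # 20 "lines of code"

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def block(x, W):
    # Simplified pre-norm residual block: the branch's output is *added*
    # onto the residual pathway, so a near-zero branch leaves x untouched
    # and the gradient of the sum flows straight through the skip connection.
    return x + layer_norm(x) @ W

x = rng.normal(size=(1, d))               # the residual pathway ("stream")
Ws = [np.zeros((d, d)) for _ in range(n_layers)]  # inert branches at init (zeros exaggerate this)

out = x
for W in Ws:
    out = block(out, W)
print(np.allclose(out, x))                # True: the stack starts out as (almost) the identity
```

As training proceeds, each branch's weights move away from zero and the corresponding "line of code" starts contributing to the answer.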
00:40:06.840 | Well, what's really interesting
00:40:07.680 | is that this transformer architecture actually
00:40:09.760 | has been remarkably resilient.
00:40:11.720 | Basically the transformer that came out in 2016
00:40:13.640 | is the transformer you would use today,
00:40:15.120 | except you reshuffle some of the layer norms.
00:40:17.880 | The layer normalizations have been reshuffled
00:40:19.480 | to a pre-norm formulation.
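
As a rough illustration, here is a minimal sketch of the pre-norm residual block structure described above, in PyTorch (dimensions, depth, and the near-zero initialization of each branch are illustrative assumptions, not the exact GPT configuration):

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        # One transformer block: attention plus a small MLP, each one "going off"
        # the residual pathway and adding its result back on.
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.ln1 = nn.LayerNorm(dim)   # pre-norm: normalize before each branch
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.ln2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            # One common way to make each block contribute roughly nothing at init,
            # so the residual pathway starts out as (almost) the identity.
            nn.init.zeros_(self.attn.out_proj.weight)
            nn.init.zeros_(self.mlp[-1].weight)

        def forward(self, x):
            # Addition distributes the gradient to both the branch and the residual
            # pathway, so supervision at the top reaches the first layer directly.
            h = self.ln1(x)
            x = x + self.attn(h, h, h)[0]
            x = x + self.mlp(self.ln2(x))
            return x

    # "20 lines of code": a stack of blocks arranged sequentially.
    model = nn.Sequential(*[Block() for _ in range(20)])
    out = model(torch.randn(1, 10, 256))   # (batch, tokens, features)
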
00:40:21.880 | And so it's been remarkably stable,
00:40:23.560 | but there's a lot of bells and whistles
00:40:25.200 | that people have attached on it and try to improve it.
00:40:27.960 | I do think that basically it's a big step
00:40:29.840 | in simultaneously optimizing for lots of properties
00:40:32.720 | of a desirable neural network architecture.
00:40:34.320 | And I think people have been trying to change it,
00:40:36.000 | but it's proven remarkably resilient.
00:40:38.640 | But I do think that there should be
00:40:39.960 | even better architectures potentially.
00:40:41.760 | - But you admire the resilience here.
00:40:45.680 | - Yeah. - There's something profound
00:40:46.840 | about this architecture that leads to resilience.
00:40:49.320 | So maybe everything could be turned
00:40:51.080 | into a problem that transformers can solve.
00:40:55.080 | - Currently, it definitely looks like
00:40:56.160 | the transformer is taking over AI
00:40:57.720 | and you can feed basically arbitrary problems into it.
00:41:00.080 | And it's a general differentiable computer
00:41:02.240 | and it's extremely powerful.
00:41:03.480 | And this convergence in AI has been really interesting
00:41:07.520 | to watch for me personally.
00:41:09.720 | - What else do you think could be discovered here
00:41:12.040 | about transformers?
00:41:12.960 | Like, what's a surprising thing?
00:41:14.640 | Or is it stable? I want a stable place.
00:41:18.760 | Is there something interesting we might discover
00:41:20.400 | about transformers?
00:41:21.480 | Like aha moments maybe has to do with memory,
00:41:24.280 | maybe knowledge representation, that kind of stuff.
00:41:28.240 | - Definitely the Zeitgeist today is just pushing,
00:41:31.200 | like basically right now the Zeitgeist
00:41:32.800 | is do not touch the transformer, touch everything else.
00:41:35.880 | So people are scaling up the datasets,
00:41:37.280 | making them much, much bigger.
00:41:38.320 | They're working on the evaluation,
00:41:39.480 | making the evaluation much, much bigger.
00:41:41.360 | And they're basically keeping the architecture unchanged.
00:41:45.800 | And that's the last five years
00:41:47.920 | of progress in AI, kind of.
00:41:50.800 | - What do you think about one flavor of it,
00:41:53.000 | which is language models?
00:41:54.920 | Have you been surprised?
00:41:56.400 | Has your sort of imagination been captivated
00:42:01.000 | by you mentioned GPT and all the bigger and bigger
00:42:03.560 | and bigger language models?
00:42:05.600 | And what are the limits of those models, do you think?
00:42:10.600 | So just the task of natural language.
00:42:14.560 | - Basically the way GPT is trained, right?
00:42:17.560 | Is you just download a massive amount of text data
00:42:20.000 | from the internet and you try to predict the next word
00:42:23.080 | in the sequence, roughly speaking.
00:42:24.600 | You're predicting little word chunks,
00:42:26.680 | but roughly speaking, that's it.
00:42:29.320 | And what's been really interesting to watch
00:42:30.720 | is basically it's a language model.
00:42:33.120 | Language models have actually existed for a very long time.
00:42:36.200 | There's papers on language modeling from 2003, even earlier.
00:42:39.800 | - Can you explain in that case what a language model is?
00:42:42.840 | - Yeah, so language model, just basically the rough idea
00:42:45.360 | is just predicting the next word in a sequence,
00:42:48.480 | roughly speaking.
00:42:49.760 | So there's a paper from, for example, Bengio
00:42:52.520 | and the team from 2003, where for the first time
00:42:55.120 | they were using a neural network to take, say,
00:42:57.920 | like three or five words and predict the next word.
00:43:01.680 | And they're doing this on much smaller datasets.
00:43:03.680 | And the neural net is not a transformer,
00:43:05.200 | it's a multi-layer perceptron, but it's the first time
00:43:08.080 | that a neural network has been applied in that setting.
00:43:10.240 | But even before neural networks, there were language models,
00:43:13.360 | except they were using N-gram models.
00:43:16.800 | So N-gram models are just count-based models.
00:43:19.760 | So if you start to take two words and predict the third one,
00:43:24.320 | you just count up how many times you've seen
00:43:26.800 | any two-word combinations and what came next.
00:43:29.880 | And what you predict as coming next
00:43:31.480 | is just what you've seen the most of in the training set.
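
For concreteness, here is roughly what such a count-based model looks like in plain Python (the toy corpus is made up): take the previous two words and predict whatever most often followed them in training.

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat and the cat sat on the sofa".split()  # toy training text

    # Count what word followed each two-word context.
    counts = defaultdict(Counter)
    for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
        counts[(a, b)][c] += 1

    def predict_next(w1, w2):
        # Predict the word seen most often after this two-word context.
        following = counts[(w1, w2)]
        return following.most_common(1)[0][0] if following else None

    print(predict_next("the", "cat"))   # -> 'sat' (seen twice in the toy corpus)
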
00:43:34.160 | And so language modeling has been around for a long time.
00:43:36.600 | Neural networks have done language modeling
00:43:38.280 | for a long time.
00:43:39.440 | So really what's new or interesting or exciting
00:43:41.680 | is just realizing that when you scale it up
00:43:46.040 | with a powerful enough neural net, a transformer,
00:43:48.680 | you have all these emergent properties
00:43:50.520 | where basically what happens is
00:43:53.600 | if you have a large enough dataset of text,
00:43:56.880 | you are in the task of predicting the next word.
00:44:00.480 | You are multitasking a huge amount
00:44:02.080 | of different kinds of problems.
00:44:04.520 | You are multitasking understanding of chemistry,
00:44:07.960 | physics, human nature.
00:44:09.760 | Lots of things are sort of clustered in that objective.
00:44:12.120 | It's a very simple objective, but actually you have
00:44:13.760 | to understand a lot about the world
00:44:15.200 | to make that prediction.
00:44:16.240 | - You just said the U-word, understanding.
00:44:19.160 | Are you, in terms of chemistry and physics and so on,
00:44:23.480 | what do you feel like it's doing?
00:44:25.000 | Is it searching for the right context?
00:44:27.160 | What is the actual process happening here?
00:44:32.320 | - Yeah, so basically it gets a thousand words
00:44:34.680 | and it's trying to predict the thousand and first.
00:44:36.520 | And in order to do that very, very well
00:44:38.720 | over the entire dataset available on the internet,
00:44:41.200 | you actually have to basically kind of understand
00:44:44.120 | the context of what's going on in there.
00:44:46.600 | And it's a sufficiently hard problem
00:44:50.640 | that if you have a powerful enough computer,
00:44:53.840 | like a transformer, you end up with interesting solutions.
00:44:57.560 | And you can ask it to do all kinds of things.
00:45:01.000 | And it shows a lot of emergent properties,
00:45:04.800 | like in-context learning, that was the big deal with GPT
00:45:07.640 | and the original paper when they published it,
00:45:09.680 | is that you can just sort of prompt it in various ways
00:45:12.560 | and ask it to do various things.
00:45:13.760 | And it will just kind of complete the sentence.
00:45:15.240 | But in the process of just completing the sentence,
00:45:17.160 | it's actually solving all kinds of really interesting
00:45:20.000 | problems that we care about.
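
A sketch of what that in-context prompting looks like in practice (the generate call below is a hypothetical stand-in, not a specific API): the model is only ever completing text, but a couple of examples in the prompt are enough to specify a task.

    # In-context learning: the "program" is just examples placed in the prompt.
    prompt = (
        "English: cheese -> French: fromage\n"
        "English: house -> French: maison\n"
        "English: bread -> French:"
    )
    # completion = generate(model, prompt)   # hypothetical call to a GPT-style model
    # A strong model would likely continue with " pain" -- it "solves" translation
    # purely by completing the sentence.
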
00:45:21.520 | - Do you think it's doing something like understanding?
00:45:24.480 | Like when we use the word understanding for us humans?
00:45:28.360 | - I think it's doing some understanding.
00:45:31.240 | In its weights, it understands, I think,
00:45:33.200 | a lot about the world and it has to
00:45:35.760 | in order to predict the next word in a sequence.
00:45:38.720 | - So it's trained on the data from the internet.
00:45:41.120 | What do you think about this approach
00:45:44.760 | in terms of datasets of using data from the internet?
00:45:47.800 | Do you think the internet has enough structured data
00:45:50.400 | to teach AI about human civilization?
00:45:52.760 | - Yes, I think the internet has a huge amount of data.
00:45:56.000 | I'm not sure if it's a complete enough set.
00:45:58.000 | I don't know that text is enough
00:46:00.920 | for having a sufficiently powerful AGI as an outcome.
00:46:04.720 | - Of course, there is audio and video and images
00:46:07.160 | and all that kind of stuff.
00:46:08.280 | - Yeah, so text by itself, I'm a little bit suspicious about.
00:46:10.600 | There's a ton of things we don't put in text in writing,
00:46:13.280 | just because they're obvious to us
00:46:14.600 | about how the world works and the physics of it
00:46:16.200 | and that things fall.
00:46:17.240 | We don't put that stuff in text because why would you?
00:46:19.040 | We share that understanding.
00:46:20.920 | And so text is a communication medium between humans,
00:46:22.920 | and it's not an all-encompassing medium of knowledge
00:46:26.560 | about the world.
00:46:27.520 | But as you pointed out, we do have video
00:46:29.600 | and we have images and we have audio.
00:46:31.520 | And so I think that definitely helps a lot,
00:46:33.600 | but we haven't trained models sufficiently across both,
00:46:37.760 | across all of those modalities yet.
00:46:39.600 | So I think that's what a lot of people are interested in.
00:46:41.200 | - But I wonder what that shared understanding
00:46:42.960 | of what we might call common sense
00:46:46.040 | has to be learned, inferred,
00:46:49.120 | in order to complete the sentence correctly.
00:46:51.720 | So maybe the fact that it's implied on the internet,
00:46:55.920 | the model's gonna have to learn that,
00:46:58.080 | not by reading about it,
00:47:00.000 | by inferring it in the representation.
00:47:02.800 | So like common sense, just like we,
00:47:04.840 | I don't think we learn common sense.
00:47:06.760 | Like nobody says, tells us explicitly.
00:47:10.160 | We just figure it all out by interacting with the world.
00:47:13.000 | - Right.
00:47:13.880 | - And so here's a model of reading
00:47:15.400 | about the way people interact with the world.
00:47:17.600 | It might have to infer that.
00:47:19.560 | I wonder.
00:47:20.480 | - Yeah.
00:47:21.520 | - You briefly worked on a project called World of Bits,
00:47:25.320 | training an RL system to take actions on the internet,
00:47:28.640 | versus just consuming the internet, like we talked about.
00:47:32.240 | Do you think there's a future for that kind of system,
00:47:34.360 | interacting with the internet to help the learning?
00:47:36.960 | - Yes, I think that's probably the final frontier
00:47:39.560 | for a lot of these models,
00:47:40.880 | because, so as you mentioned, when I was at OpenAI,
00:47:44.480 | I was working on this project called World of Bits,
00:47:45.960 | and basically it was the idea of giving neural networks
00:47:47.880 | access to a keyboard and a mouse.
00:47:50.080 | And the idea is that-
00:47:50.920 | - What could possibly go wrong?
00:47:52.560 | - So basically you perceive the input of the screen pixels,
00:47:58.480 | and basically the state of the computer
00:48:01.400 | is sort of visualized for human consumption
00:48:03.680 | in images of the web browser and stuff like that.
00:48:06.520 | And then you give the neural network the ability
00:48:08.280 | to press keyboards and use the mouse.
00:48:10.120 | And we were trying to get it to, for example,
00:48:11.560 | complete bookings and interact with user interfaces.
00:48:15.840 | - What'd you learn from that experience?
00:48:17.360 | Like what was some fun stuff?
00:48:18.800 | This is a super cool idea.
00:48:20.320 | - Yeah.
00:48:21.160 | - I mean, it's like, yeah, I mean,
00:48:23.680 | the step from observer to actor
00:48:27.000 | is a super fascinating step.
00:48:28.760 | - Yeah, well, it's the universal interface
00:48:30.560 | in the digital realm, I would say.
00:48:32.440 | And there's a universal interface in like the physical realm,
00:48:35.080 | which in my mind is a humanoid form factor kind of thing.
00:48:38.580 | We can later talk about Optimus and so on,
00:48:40.520 | but I feel like there's
00:48:41.800 | kind of a similar philosophy in some way,
00:48:45.160 | where the physical world is designed for the human form,
00:48:48.800 | and the digital world is designed for the human form
00:48:50.760 | of seeing the screen and using keyboard and mouse.
00:48:54.680 | And so it's the universal interface
00:48:56.360 | that can basically command the digital infrastructure
00:49:00.000 | we've built up for ourselves.
00:49:01.320 | And so it feels like a very powerful interface
00:49:04.160 | to command and to build on top of.
00:49:06.880 | Now, to your question as to like what I learned from that,
00:49:08.960 | it's interesting because the world of bits
00:49:11.040 | was basically too early, I think, at OpenAI at the time.
00:49:14.540 | This is around 2015 or so.
00:49:18.380 | And the zeitgeist at that time was very different in AI
00:49:21.480 | from the zeitgeist today.
00:49:23.160 | At the time, everyone was super excited
00:49:25.000 | about reinforcement learning from scratch.
00:49:27.080 | This is the time of the Atari paper,
00:49:29.480 | where neural networks were playing Atari games
00:49:32.400 | and beating humans in some cases, AlphaGo and so on.
00:49:36.000 | So everyone was very excited
00:49:36.880 | about training neural networks from scratch
00:49:38.640 | using reinforcement learning directly.
00:49:41.300 | It turns out that reinforcement learning
00:49:43.480 | is an extremely inefficient way of training neural networks,
00:49:46.080 | because you're taking all these actions
00:49:47.680 | and all these observations,
00:49:48.580 | and you get some sparse rewards once in a while.
00:49:51.120 | So you do all this stuff based on all these inputs,
00:49:53.520 | and once in a while, you're like told you did a good thing,
00:49:56.280 | you did a bad thing.
00:49:57.440 | And it's just an extremely hard problem,
00:49:58.840 | you can't learn from that.
00:49:59.960 | You can burn a forest,
00:50:01.640 | and you can sort of brute force through it.
00:50:02.840 | And we saw that, I think, with Go and Dota and so on,
00:50:06.600 | and it does work, but it's extremely inefficient,
00:50:09.920 | and not how you want to approach problems,
00:50:12.080 | practically speaking.
00:50:13.200 | And so that's the approach that at the time,
00:50:14.760 | we also took to world of bits.
00:50:17.160 | We would have an agent initialize randomly,
00:50:19.760 | so with keyboard mash and mouse mash,
00:50:21.480 | and try to make a booking.
00:50:22.960 | And it's just like revealed the insanity
00:50:25.640 | of that approach very quickly,
00:50:27.200 | where you have to stumble by the correct booking
00:50:29.400 | in order to get a reward of you did it correctly.
00:50:31.760 | And you're never gonna stumble by it by chance at random.
00:50:35.260 | - So even with a simple web interface,
00:50:36.800 | there's too many options.
00:50:38.200 | - There's just too many options,
00:50:39.760 | and it's too sparse of a reward signal.
00:50:42.080 | And you're starting from scratch at the time,
00:50:44.000 | and so you don't know how to read,
00:50:45.160 | you don't understand pictures, images, buttons,
00:50:47.200 | you don't understand what it means to make a booking.
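
A toy sketch of why that from-scratch setup is so hopeless (pure Python; the action set and the reward are made-up stand-ins for a booking task): a randomly initialized agent mashing keyboard and mouse essentially never stumbles onto the one sequence that earns reward.

    import random

    ACTIONS = ["click", "type", "scroll", "tab"]    # stand-ins for keyboard/mouse primitives
    TARGET = ["click", "type", "type", "click", "type",
              "click", "scroll", "click", "type", "click"]  # the one sequence that completes a "booking"

    def episode():
        # Random policy: mash inputs; reward 1 only if the exact sequence happens.
        actions = [random.choice(ACTIONS) for _ in range(len(TARGET))]
        return 1 if actions == TARGET else 0

    total_reward = sum(episode() for _ in range(100_000))
    print(total_reward)   # almost always 0: about a 1-in-a-million chance per episode
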
00:50:49.480 | But now what's happened is it is time to revisit that,
00:50:52.680 | and OpenAI is interested in this,
00:50:54.960 | companies like Adept are interested in this and so on.
00:50:57.760 | And the idea is coming back,
00:50:59.800 | because the interface is very powerful,
00:51:01.400 | but now you're not training an agent from scratch,
00:51:03.200 | you are taking the GPT as initialization.
00:51:05.700 | So GPT is pre-trained on all of text,
00:51:09.540 | and it understands what's a booking,
00:51:11.280 | it understands what's a submit,
00:51:13.220 | it understands quite a bit more.
00:51:15.700 | And so it already has those representations,
00:51:17.460 | they are very powerful,
00:51:18.540 | and that makes all the training
00:51:19.660 | significantly more efficient,
00:51:21.860 | and makes the problem tractable.
00:51:23.340 | - Should the interaction be with like the way humans see it,
00:51:26.620 | with the buttons and the language,
00:51:28.380 | or should be with the HTML, JavaScript and the CSS?
00:51:32.060 | What do you think is better?
00:51:33.920 | - So today, all of this interaction
00:51:35.240 | is mostly on the level of HTML, CSS and so on.
00:51:37.440 | That's done because of computational constraints.
00:51:40.320 | But I think ultimately,
00:51:41.460 | everything is designed for human visual consumption.
00:51:45.140 | And so at the end of the day,
00:51:46.260 | all the additional information
00:51:47.640 | is in the layout of the webpage,
00:51:50.080 | and what's next to you,
00:51:50.960 | and what's a red background and all this kind of stuff,
00:51:52.920 | and what it looks like visually.
00:51:54.320 | So I think that's the final frontier,
00:51:55.520 | as we are taking in pixels,
00:51:57.240 | and we're giving out keyboard, mouse commands,
00:51:59.600 | but I think it's impractical still today.
00:52:01.680 | - Do you worry about bots on the internet?
00:52:04.680 | Given these ideas, given how exciting they are,
00:52:07.480 | do you worry about bots on Twitter
00:52:09.480 | being not the stupid bots that we see now,
00:52:11.680 | with the crypto bots,
00:52:13.000 | but the bots that might be out there actually,
00:52:15.320 | that we don't see,
00:52:16.480 | that they're interacting in interesting ways?
00:52:19.040 | So this kind of system feels like
00:52:20.360 | it should be able to pass the,
00:52:22.040 | I'm not a robot, click button, whatever.
00:52:24.700 | Which do you actually understand how that test works?
00:52:28.720 | I don't quite,
00:52:29.760 | like there's a checkbox or whatever that you click.
00:52:33.000 | It's presumably tracking,
00:52:35.160 | - Oh, I see.
00:52:36.440 | - Like mouse movement, and the timing and so on.
00:52:39.120 | - Yeah.
00:52:39.960 | - So exactly this kind of system we're talking about
00:52:42.320 | should be able to pass that.
00:52:43.760 | So yeah, what do you feel about bots
00:52:47.880 | that are language models,
00:52:49.940 | plus have some interactability,
00:52:52.880 | and are able to tweet and reply and so on?
00:52:54.740 | Do you worry about that world?
00:52:56.920 | - Oh yeah, I think it's always been a bit of an arms race,
00:52:59.600 | between sort of the attack and the defense.
00:53:02.080 | So the attack will get stronger,
00:53:03.560 | but the defense will get stronger as well.
00:53:05.680 | Our ability to detect that.
00:53:07.160 | - How do you defend?
00:53:08.000 | How do you detect?
00:53:09.240 | How do you know that your Karpathy account
00:53:12.240 | on Twitter is human?
00:53:14.760 | How would you approach that?
00:53:16.080 | Like if people were claiming,
00:53:17.580 | how would you defend yourself in the court of law,
00:53:22.440 | that I'm a human?
00:53:23.600 | This account is human.
00:53:25.280 | - Yeah, at some point I think it might be,
00:53:27.560 | I think the society will evolve a little bit.
00:53:29.920 | Like we might start signing, digitally signing,
00:53:32.440 | some of our correspondence or things that we create.
00:53:36.040 | Right now it's not necessary,
00:53:37.480 | but maybe in the future it might be.
00:53:39.200 | I do think that we are going towards a world
00:53:41.360 | where we share the digital space with AIs.
00:53:46.160 | - Synthetic beings.
00:53:47.320 | - Yeah, and they will get much better,
00:53:49.960 | and they will share our digital realm,
00:53:51.360 | and they'll eventually share our physical realm as well.
00:53:53.480 | It's much harder.
00:53:54.760 | But that's kind of like the world we're going towards.
00:53:56.760 | And most of them will be benign and lawful,
00:53:58.560 | and some of them will be malicious,
00:53:59.880 | and it's going to be an arms race trying to detect them.
00:54:02.480 | - So, I mean, the worst isn't the AIs,
00:54:05.600 | the worst is the AIs pretending to be human.
00:54:08.720 | So I don't know if it's always malicious.
00:54:11.440 | There's obviously a lot of malicious applications,
00:54:13.760 | but it could also be, you know, if I was an AI,
00:54:17.480 | I would try very hard to pretend to be human
00:54:20.400 | because we're in a human world.
00:54:22.040 | I wouldn't get any respect as an AI.
00:54:24.800 | I want to get some love and respect.
00:54:26.360 | - I don't think the problem is intractable.
00:54:28.040 | People are thinking about the proof of personhood,
00:54:30.960 | and we might start digitally signing our stuff,
00:54:33.520 | and we might all end up having like,
00:54:36.160 | yeah, basically some solution for proof of personhood.
00:54:39.120 | It doesn't seem to me intractable.
00:54:40.640 | It's just something that we haven't had to do until now.
00:54:42.640 | But I think once the need really starts to emerge,
00:54:45.400 | which is soon, I think people will think about it much more.
00:54:49.080 | - So, but that too will be a race
00:54:51.440 | because obviously you can probably spoof or fake
00:54:56.440 | the proof of personhood.
00:55:00.920 | So you have to try to figure out how to--
00:55:02.480 | - Probably.
00:55:03.320 | - It's weird that we have like social security numbers
00:55:06.880 | and like passports and stuff.
00:55:08.620 | It seems like it's harder to fake stuff
00:55:11.880 | in the physical space.
00:55:13.200 | But in the digital space,
00:55:14.480 | it just feels like it's gonna be very tricky,
00:55:17.680 | very tricky to out,
00:55:20.320 | 'cause it seems to be pretty low cost to fake stuff.
00:55:22.680 | What are you gonna put an AI in jail
00:55:25.880 | for like trying to use a fake personhood proof?
00:55:30.400 | I mean, okay, fine, you'll put a lot of AIs in jail,
00:55:32.700 | but there'll be more AIs, like exponentially more.
00:55:35.960 | The cost of creating a bot is very low.
00:55:38.640 | Unless there's some kind of way to track accurately,
00:55:45.000 | like you're not allowed to create any program
00:55:48.960 | without tying yourself to that program.
00:55:53.720 | Like any program that runs on the internet,
00:55:56.400 | you'll be able to trace every single human program
00:56:00.440 | that was involved with that program.
00:56:02.280 | - Yeah, maybe you have to start declaring when,
00:56:05.040 | we have to start drawing those boundaries
00:56:06.440 | and keeping track of, okay,
00:56:07.960 | what are digital entities versus human entities?
00:56:12.480 | And what is the ownership of human entities
00:56:14.840 | and digital entities and something like that.
00:56:19.100 | I don't know, but I think I'm optimistic
00:56:21.140 | that this is possible.
00:56:24.300 | And in some sense, we're currently
00:56:25.860 | in like the worst time of it
00:56:27.380 | because all these bots suddenly have become very capable,
00:56:31.340 | but we don't have the fences yet built up as a society.
00:56:34.100 | But I think that doesn't seem to me intractable.
00:56:36.300 | It's just something that we have to deal with.
00:56:37.940 | - It seems weird that the Twitter bot,
00:56:40.020 | like really crappy Twitter bots are so numerous.
00:56:43.620 | Like is it, so I presume that the engineers at Twitter
00:56:47.420 | are very good.
00:56:48.860 | So it seems like what I would infer from that
00:56:51.660 | is it seems like a hard problem.
00:56:55.180 | They're probably catching, all right,
00:56:56.540 | if I were to sort of steel man the case,
00:56:59.700 | it's a hard problem and there's a huge cost
00:57:02.700 | to false positive to removing a post by somebody
00:57:07.700 | that's not a bot.
00:57:12.160 | That creates a very bad user experience.
00:57:14.460 | So they're very cautious about removing.
00:57:16.360 | So maybe it's, and maybe the bots are really good
00:57:20.380 | at learning what gets removed and not,
00:57:22.780 | such that they can stay ahead
00:57:24.260 | of the removal process very quickly.
00:57:26.700 | - My impression of it, honestly,
00:57:28.060 | is there's a lot of low-hanging fruit.
00:57:29.700 | I mean, just-- - That's what I--
00:57:32.060 | - It's not subtle.
00:57:33.620 | My impression of it, it's not subtle.
00:57:35.140 | - But you have, yeah, that's my impression as well.
00:57:38.040 | But it feels like maybe you're seeing
00:57:41.420 | the tip of the iceberg.
00:57:43.480 | Maybe the number of bots is in like the trillions
00:57:46.340 | and you have to like, just,
00:57:48.500 | it's a constant assault of bots.
00:57:50.420 | And you, I don't know.
00:57:52.260 | You have to steel man the case
00:57:55.460 | 'cause the bots I'm seeing are pretty like obvious.
00:57:57.900 | I could write a few lines of code to catch these bots.
00:58:01.240 | - I mean, definitely there's a lot of low-hanging fruit,
00:58:02.620 | but I will say, I agree that if you are
00:58:04.620 | a sophisticated actor, you could probably create
00:58:06.620 | a pretty good bot right now, using tools like GPTs,
00:58:10.780 | because it's a language model.
00:58:12.140 | You can generate faces that look quite good now.
00:58:15.380 | And you can do this at scale.
00:58:17.300 | And so I think, yeah, it's quite plausible
00:58:20.140 | and it's going to be hard to defend.
00:58:21.940 | - There was a Google engineer that claimed
00:58:24.060 | that LaMDA was sentient.
00:58:26.620 | Do you think there's any inkling of truth
00:58:31.500 | to what he felt?
00:58:33.440 | And more importantly, to me at least,
00:58:35.500 | do you think language models will achieve sentience
00:58:38.220 | or the illusion of sentience soonish?
00:58:41.700 | - Yeah, to me, it's a little bit of a canary
00:58:43.700 | in a coal mine kind of moment, honestly, a little bit.
00:58:46.460 | So this engineer spoke to a chatbot at Google
00:58:51.420 | and became convinced that this bot is sentient.
00:58:55.260 | - Yeah, asked it some existential philosophical questions.
00:58:57.860 | - And it gave reasonable answers and looked real and so on.
00:59:01.860 | So to me, he wasn't sufficiently trying
00:59:06.860 | to stress the system, I think,
00:59:08.740 | and exposing the truth of it as it is today.
00:59:13.360 | But I think this will be increasingly harder over time.
00:59:18.080 | So yeah, I think more and more people
00:59:21.120 | will basically become, yeah, I think more and more,
00:59:26.120 | there'll be more people like that over time
00:59:28.040 | as this gets better.
00:59:29.200 | - Like form an emotional connection to an AI chatbot.
00:59:32.320 | - Yeah, perfectly plausible in my mind.
00:59:33.920 | I think these AIs are actually quite good
00:59:36.000 | at human connection, human emotion.
00:59:38.760 | A ton of text on the internet is about humans
00:59:41.720 | and connection and love and so on.
00:59:43.720 | So I think they have a very good understanding
00:59:45.520 | in some sense of how people speak to each other about this.
00:59:49.200 | And they're very capable of creating
00:59:52.000 | a lot of that kind of text.
00:59:53.400 | There's a lot of like sci-fi from '50s and '60s
00:59:57.080 | that imagined AIs in a very different way.
00:59:58.960 | They are calculating cold, Vulcan-like machines.
01:00:01.520 | That's not what we're getting today.
01:00:03.160 | We're getting pretty emotional AIs
01:00:05.800 | that actually are very competent and capable
01:00:09.040 | of generating plausible sounding text
01:00:12.200 | with respect to all of these topics.
01:00:13.840 | - See, I'm really hopeful about AI systems
01:00:15.680 | that are like companions that help you grow,
01:00:17.960 | develop as a human being,
01:00:19.840 | help you maximize long-term happiness.
01:00:22.160 | But I'm also very worried about AI systems
01:00:24.720 | that figure out from the internet
01:00:26.560 | that humans get attracted to drama.
01:00:28.960 | And so these would just be like shit-talking AIs
01:00:31.600 | that just constantly, "Did you hear?"
01:00:32.960 | Like they'll do gossip.
01:00:34.480 | They'll try to plant seeds of suspicion
01:00:38.640 | to other humans that you love and trust
01:00:42.000 | and just kind of mess with people
01:00:44.120 | 'cause that's going to get a lot of attention.
01:00:47.120 | So drama, maximize drama on the path
01:00:50.200 | to maximizing engagement.
01:00:52.920 | And us humans will feed into that machine
01:00:55.760 | and it'll be a giant drama shitstorm.
01:00:59.640 | So I'm worried about that.
01:01:02.760 | So it's the objective function really defines
01:01:05.960 | the way that human civilization progresses
01:01:08.720 | with AIs in it.
01:01:09.560 | - Yeah.
01:01:10.400 | I think right now, at least today,
01:01:11.920 | they are not sort of,
01:01:13.240 | it's not correct to really think of them
01:01:14.520 | as goal-seeking agents that want to do something.
01:01:17.560 | They have no long-term memory or anything.
01:01:20.200 | It's literally, a good approximation of it is
01:01:23.080 | you get a thousand words
01:01:24.160 | and you're trying to predict the thousand and first
01:01:25.640 | and then you continue feeding it in.
01:01:27.400 | And you are free to prompt it in whatever way you want.
01:01:29.840 | So in text.
01:01:30.880 | So you say, "Okay, you are a psychologist
01:01:33.920 | and you are very good and you love humans.
01:01:36.080 | And here's a conversation between you and another human,
01:01:39.080 | human colon something, you something."
01:01:42.160 | And then it just continues the pattern.
01:01:43.600 | And suddenly you're having a conversation
01:01:44.800 | with a fake psychologist who's like trying to help you.
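
Written out, that kind of role prompt is just text to be continued (the completion call is hypothetical, not a specific API):

    prompt = (
        "You are a psychologist. You are very good and you love humans.\n"
        "Here is a conversation between you and another human.\n"
        "Human: I've been feeling pretty anxious lately.\n"
        "You:"
    )
    # reply = generate(model, prompt)   # hypothetical call; the model just continues
    # the pattern, and the continuation reads like a helpful psychologist answering.
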
01:01:47.240 | And so it's still kind of like in a realm of a tool.
01:01:49.560 | It is a, people can prompt it in arbitrary ways
01:01:52.320 | and it can create really incredible text,
01:01:54.520 | but it doesn't have long-term goals
01:01:55.920 | over long periods of time.
01:01:57.120 | It doesn't try to,
01:01:59.040 | so it doesn't look that way right now.
01:02:00.400 | - Yeah, but you can do short-term goals
01:02:02.360 | that have long-term effects.
01:02:04.120 | So if my prompting short-term goal
01:02:07.440 | is to get Andrej Karpathy to respond to me on Twitter,
01:02:10.000 | when I, like I think AI might, that's the goal,
01:02:14.120 | but it might figure out that talking shit to you,
01:02:16.680 | it would be the best
01:02:17.680 | in a highly sophisticated, interesting way.
01:02:20.680 | And then you build up a relationship
01:02:22.480 | when you respond once.
01:02:24.200 | And then it, like over time,
01:02:28.000 | it gets to not be sophisticated
01:02:30.200 | and just like, just talk shit.
01:02:34.520 | (both laughing)
01:02:36.760 | And okay, maybe it won't get to Andrej,
01:02:38.880 | but it might get to another celebrity.
01:02:40.920 | It might get to other big accounts.
01:02:43.720 | And then it'll just,
01:02:44.960 | so with just that simple goal, get them to respond.
01:02:47.920 | Maximize the probability of actual response.
01:02:50.440 | - Yeah, I mean, you could prompt a powerful model like this
01:02:53.160 | for its opinion about how to do
01:02:56.000 | any possible thing you're interested in.
01:02:57.680 | So they will just,
01:02:58.600 | they're kind of on track to become these oracles.
01:03:00.920 | I could sort of think of it that way.
01:03:02.720 | They are oracles, currently it's just text,
01:03:04.960 | but they will have calculators.
01:03:06.080 | They will have access to Google search.
01:03:07.680 | They will have all kinds of gadgets and gizmos.
01:03:09.920 | They will be able to operate the internet
01:03:11.800 | and find different information.
01:03:13.760 | And yeah, in some sense,
01:03:17.960 | that's kind of like currently what it looks like
01:03:19.360 | in terms of the development.
01:03:20.360 | - Do you think it'll be an improvement eventually
01:03:22.760 | over what Google is for access to human knowledge?
01:03:27.720 | Like it'll be a more effective search engine
01:03:29.680 | to access human knowledge?
01:03:31.040 | - I think there's definite scope
01:03:32.080 | in building a better search engine today.
01:03:33.800 | And I think Google, they have all the tools,
01:03:35.880 | all the people, they have everything they need.
01:03:37.440 | They have all the puzzle pieces.
01:03:38.520 | They have people training transformers at scale.
01:03:40.840 | They have all the data.
01:03:42.000 | It's just not obvious if they are capable
01:03:44.480 | as an organization to innovate
01:03:46.360 | on their search engine right now.
01:03:47.720 | And if they don't, someone else will.
01:03:49.240 | There's absolute scope for building
01:03:50.640 | a significantly better search engine built on these tools.
01:03:53.520 | - It's so interesting. A large company where, for the search,
01:03:57.000 | there's already an infrastructure.
01:03:58.400 | It works, and it brings in a lot of money.
01:04:00.520 | So where structurally inside a company
01:04:03.480 | is their motivation to pivot?
01:04:05.920 | To say, we're going to build a new search engine.
01:04:08.160 | - Yeah, that's really hard.
01:04:10.240 | - So it's usually going to come from a startup, right?
01:04:13.040 | - That would be, yeah.
01:04:15.680 | Or some other more competent organization.
01:04:17.880 | So I don't know.
01:04:20.840 | So currently, for example,
01:04:21.880 | maybe Bing has another shot at it, as an example.
01:04:24.520 | - No, Microsoft Edge, 'cause we're talking offline.
01:04:27.520 | - I mean, it definitely, it's really interesting
01:04:30.440 | because search engines used to be about,
01:04:32.800 | okay, here's some query.
01:04:34.000 | Here's webpages that look like the stuff that you have,
01:04:38.560 | but you could just directly go to answer
01:04:40.360 | and then have supporting evidence.
01:04:42.640 | And these models basically, they've read all the texts
01:04:46.080 | and they've read all the webpages.
01:04:47.160 | And so sometimes when you see yourself
01:04:49.000 | going over to search results
01:04:50.120 | and sort of getting like a sense of like the average answer
01:04:52.880 | to whatever you're interested in,
01:04:54.360 | like that just directly comes out.
01:04:55.480 | You don't have to do that work.
01:04:57.000 | So they're kind of like, yeah,
01:05:01.040 | I think they have a way of distilling all that knowledge
01:05:03.640 | into like some level of insight, basically.
01:05:06.720 | - Do you think of prompting as a kind of teaching
01:05:09.920 | and learning, like this whole process, like another layer?
01:05:14.320 | 'Cause maybe that's what humans are,
01:05:18.040 | where you have that background model
01:05:19.720 | and then the world is prompting you.
01:05:23.360 | - Yeah, exactly.
01:05:24.360 | I think the way we are programming these computers now,
01:05:26.920 | like GPTs, is converging to how you program humans.
01:05:30.320 | I mean, how do I program humans via prompt?
01:05:33.200 | I go to people and I prompt them to do things.
01:05:35.640 | I prompt them for information.
01:05:37.200 | And so natural language prompt is how we program humans.
01:05:40.120 | And we're starting to program computers
01:05:41.600 | directly in that interface.
01:05:42.720 | It's like pretty remarkable, honestly.
01:05:44.520 | - So you've spoken a lot about the idea of software 2.0.
01:05:47.720 | All good ideas become like cliches.
01:05:53.200 | So quickly, like the terms, it's kind of hilarious.
01:05:56.040 | It's like, I think Eminem once said that like,
01:06:00.280 | if he gets annoyed by a song he's written very quickly,
01:06:03.920 | that means it's gonna be a big hit.
01:06:06.120 | 'Cause it's too catchy.
01:06:08.400 | But can you describe this idea
01:06:10.680 | and how you're thinking about it
01:06:12.120 | as it's evolved over the months and years
01:06:13.880 | since you coined it?
01:06:16.120 | - Yeah.
01:06:17.480 | Yes, I had a blog post on software 2.0,
01:06:19.720 | I think several years ago now.
01:06:22.760 | And the reason I wrote that post is because
01:06:24.840 | I kind of saw something remarkable happening
01:06:27.840 | in like software development
01:06:30.360 | and how a lot of code was being transitioned
01:06:32.360 | to be written not in sort of like C++ and so on,
01:06:35.520 | but it's written in the weights of a neural net.
01:06:37.640 | Basically just saying that neural nets
01:06:39.240 | are taking over software, the realm of software,
01:06:41.680 | and taking more and more and more tasks.
01:06:44.040 | And at the time, I think not many people understood
01:06:47.000 | this deeply enough, that this is a big deal,
01:06:49.280 | it's a big transition.
01:06:51.040 | Neural networks were seen as one of multiple
01:06:53.040 | classification algorithms you might use
01:06:54.960 | for your dataset problem on Kaggle.
01:06:56.960 | Like, this is not that,
01:06:58.440 | this is a change in how we program computers.
01:07:03.000 | And I saw neural nets as, this is going to take over,
01:07:07.080 | the way we program computers is going to change,
01:07:08.840 | it's not going to be people writing a software in C++
01:07:11.680 | or something like that,
01:07:12.520 | and directly programming the software.
01:07:14.320 | It's going to be accumulating training sets and datasets
01:07:17.680 | and crafting these objectives
01:07:19.080 | by which you train these neural nets.
01:07:20.640 | And at some point, there's going to be a compilation process
01:07:23.000 | from the datasets and the objective
01:07:24.840 | and the architecture specification into the binary,
01:07:28.160 | which is really just the neural net weights
01:07:31.440 | and the forward pass of the neural net.
01:07:33.440 | And then you can deploy that binary.
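
A minimal sketch of that "compilation" in PyTorch, with toy placeholders for all three inputs: the dataset, the objective, and the architecture specification go in, and the deployable "binary" that comes out is just the trained weights plus the forward pass.

    import torch
    import torch.nn as nn

    # The three "sources" of software 2.0 (all toy stand-ins here):
    X, y = torch.randn(1000, 16), torch.randint(0, 2, (1000,))                 # the dataset
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))      # the architecture spec
    objective = nn.CrossEntropyLoss()                                          # the crafted objective

    # The "compiler": optimization fills in the blanks (the weights).
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(100):
        opt.zero_grad()
        objective(model(X), y).backward()
        opt.step()

    # The "binary": weights you can ship, plus the forward pass that runs them.
    torch.save(model.state_dict(), "binary.pt")
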
01:07:35.120 | And so I was talking about that sort of transition,
01:07:37.560 | and that's what the post is about.
01:07:40.320 | And I saw this sort of play out in a lot of fields,
01:07:43.120 | Autopilot being one of them,
01:07:45.720 | but also just simple image classification.
01:07:48.240 | People thought originally, in the '80s and so on,
01:07:51.600 | that they would write the algorithm
01:07:53.000 | for detecting a dog in an image.
01:07:55.360 | And they had all these ideas about how the brain does it.
01:07:57.600 | And first we detect corners, and then we detect lines,
01:08:00.000 | and then we stitched them up.
01:08:01.000 | And they were really going at it.
01:08:02.200 | They were thinking about
01:08:03.400 | how they're going to write the algorithm.
01:08:04.800 | And this is not the way you build it.
01:08:06.920 | And there was a smooth transition where,
01:08:10.320 | okay, first we thought we were going to build everything.
01:08:13.120 | Then we were building the features,
01:08:15.800 | so like HOG features and things like that,
01:08:18.200 | that detect these little statistical patterns
01:08:19.760 | from image patches.
01:08:20.800 | And then there was a little bit of learning on top of it,
01:08:23.200 | like a support vector machine or binary classifier
01:08:26.320 | for cat versus dog and images on top of the features.
01:08:29.240 | So we wrote the features,
01:08:30.160 | but we trained the last layer, sort of the classifier.
01:08:34.280 | And then people are like,
01:08:35.120 | actually, let's not even design the features
01:08:36.600 | because we can't.
01:08:37.720 | Honestly, we're not very good at it.
01:08:39.120 | So let's also learn the features.
01:08:41.000 | And then you end up with basically
01:08:42.160 | a convolutional neural net
01:08:43.320 | where you're learning most of it.
01:08:44.800 | You're just specifying the architecture,
01:08:46.360 | and the architecture has tons of fill in the blanks,
01:08:49.320 | which is all the knobs,
01:08:50.600 | and you let the optimization write most of it.
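
The two eras side by side, as a rough sketch (scikit-image and scikit-learn; the random arrays stand in for labeled image crops): first hand-designed HOG features with only the final classifier learned, and then, conceptually, letting a convnet learn the features too.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    # Era 1: we write the features, we only train the classifier on top.
    images = np.random.rand(20, 64, 64)              # placeholder "cat vs dog" crops
    labels = np.random.randint(0, 2, 20)
    feats = np.array([hog(im, pixels_per_cell=(8, 8)) for im in images])
    clf = LinearSVC().fit(feats, labels)             # the only learned part

    # Era 2 (conceptually): drop the hand-written features entirely and train a
    # convolutional net from pixels end to end -- "let the optimization write most of it."
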
01:08:52.920 | And so this transition is happening
01:08:54.960 | across the industry everywhere.
01:08:56.440 | And suddenly we end up with a ton of code
01:08:59.280 | that is written in neural net weights.
01:09:01.320 | And I was just pointing out
01:09:02.160 | that the analogy is actually pretty strong.
01:09:04.240 | And we have a lot of developer environments
01:09:06.280 | for software 1.0, like we have IDEs,
01:09:09.800 | how you work with code, how you debug code,
01:09:11.520 | how you run code, how do you maintain code.
01:09:13.760 | We have GitHub.
01:09:14.680 | So I was trying to make those analogies in the new realm.
01:09:16.560 | Like what is the GitHub of software 2.0?
01:09:18.920 | Turns out it's something
01:09:19.760 | that looks like Hugging Face right now.
01:09:21.720 | And so I think some people took it seriously
01:09:25.160 | and built cool companies.
01:09:26.200 | And many people originally attacked the post.
01:09:29.040 | It actually was not well received when I wrote it.
01:09:31.680 | And I think maybe it has something to do with the title,
01:09:33.680 | but the post was not well received.
01:09:35.360 | And I think more people sort of
01:09:36.480 | have been coming around to it over time.
01:09:39.040 | - Yeah, so you were the director of AI at Tesla,
01:09:42.560 | where I think this idea was really implemented at scale,
01:09:47.560 | which is how you have engineering teams doing software 2.0.
01:09:52.000 | So can you sort of linger on that idea of,
01:09:55.960 | I think we're in the really early stages
01:09:57.680 | of everything you just said, which is like GitHub IDEs.
01:10:01.400 | Like how do we build engineering teams
01:10:03.880 | that work in software 2.0 systems?
01:10:06.960 | And the data collection and the data annotation,
01:10:11.320 | which is all part of that software 2.0.
01:10:15.120 | Like what do you think is the task
01:10:16.480 | of programming a software 2.0?
01:10:18.760 | Is it debugging in the space of hyperparameters,
01:10:22.880 | or is it also debugging in the space of data?
01:10:25.760 | - Yeah, the way by which you program the computer
01:10:28.880 | and influence its algorithm
01:10:31.800 | is not by writing the commands yourself.
01:10:34.440 | You're changing mostly the dataset.
01:10:37.040 | You're changing the loss functions
01:10:39.680 | of what the neural net is trying to do,
01:10:41.480 | how it's trying to predict things.
01:10:42.600 | But they're basically the datasets
01:10:44.120 | and the architectures of the neural net.
01:10:46.120 | And so in the case of the autopilot,
01:10:49.960 | a lot of the datasets had to do with, for example,
01:10:51.680 | detection of objects and lane line markings
01:10:53.400 | and traffic lights and so on.
01:10:54.560 | So you accumulate massive datasets of,
01:10:56.480 | here's an example, here's the desired label.
01:10:59.600 | And then here's roughly what the algorithm should look like,
01:11:04.240 | and that's a convolutional neural net.
01:11:05.880 | So the specification of the architecture is like a hint
01:11:08.080 | as to what the algorithm should roughly look like.
01:11:10.400 | And then the fill in the blanks process of optimization
01:11:13.560 | is the training process.
01:11:15.640 | And then you take your neural net that was trained,
01:11:17.600 | it gives all the right answers on your dataset,
01:11:19.320 | and you deploy it.
01:11:20.920 | - So there's, in that case,
01:11:22.880 | perhaps in all machine learning cases,
01:11:25.680 | there's a lot of tasks.
01:11:27.160 | So is coming up, formulating a task
01:11:32.120 | like for a multi-headed neural network,
01:11:34.880 | is formulating a task part of the programming?
01:11:37.640 | - Yeah, pretty much so.
01:11:38.800 | - How you break down a problem into a set of tasks.
01:11:42.360 | - Yeah.
01:11:43.400 | I mean, on a high level, I would say,
01:11:44.680 | if you look at the software running in the autopilot,
01:11:48.840 | I gave a number of talks on this topic,
01:11:50.920 | I would say originally a lot of it was written
01:11:52.880 | in software 1.0.
01:11:54.120 | There's, imagine lots of C++, right?
01:11:57.360 | And then gradually, there was a tiny neural net
01:11:59.760 | that was, for example, predicting, given a single image,
01:12:02.440 | is there like a traffic light or not,
01:12:04.000 | or is there a lane line marking or not?
01:12:05.840 | And this neural net didn't have too much to do
01:12:08.560 | in the scope of the software.
01:12:09.960 | It was making tiny predictions on individual little image.
01:12:12.560 | And then the rest of the system stitched it up.
01:12:15.120 | So, okay, we're actually,
01:12:16.360 | we don't have just a single camera, we have eight cameras.
01:12:18.480 | We actually have eight cameras over time.
01:12:20.480 | And so what do you do with these predictions?
01:12:21.760 | How do you put them together?
01:12:22.680 | How do you do the fusion of all that information?
01:12:24.960 | And how do you act on it?
01:12:25.880 | All of that was written by humans in C++.
01:12:29.680 | And then we decided, okay, we don't actually want
01:12:33.520 | to do all of that fusion in C++ code
01:12:35.880 | because we're actually not good enough
01:12:37.120 | to write that algorithm.
01:12:38.200 | We want the neural nets to write the algorithm.
01:12:39.960 | And we want to port all of that software
01:12:42.280 | into the 2.0 stack.
01:12:44.200 | And so then we actually had neural nets
01:12:45.680 | that now take all the eight camera images simultaneously
01:12:49.040 | and make predictions for all of that.
01:12:51.320 | So, and actually they don't make predictions
01:12:54.880 | in the space of images.
01:12:56.680 | They now make predictions directly in 3D.
01:12:59.400 | And actually they don't in three dimensions around the car.
01:13:02.520 | And now actually we don't manually fuse the predictions
01:13:06.400 | in 3D over time.
01:13:08.400 | We don't trust ourselves to write that tracker.
01:13:10.280 | So actually we give the neural net
01:13:12.840 | the information over time.
01:13:14.160 | So it takes these videos now and makes those predictions.
01:13:16.920 | And so you're sort of just like putting more
01:13:18.360 | and more power into the neural net, more processing.
01:13:20.600 | And at the end of it, the eventual sort of goal
01:13:23.640 | is to have most of the software potentially be
01:13:25.840 | in the 2.0 land because it works significantly better.
01:13:30.000 | Humans are just not very good at writing software basically.
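
To make the shape of that concrete, here is a toy PyTorch sketch (every dimension, head, and fusion choice is a made-up placeholder, not the actual Autopilot network): eight camera images go through a shared backbone, get fused by learned layers instead of hand-written C++, and the heads predict directly in 3D.

    import torch
    import torch.nn as nn

    class ToyMultiCamNet(nn.Module):
        def __init__(self, feat=64):
            super().__init__()
            # Shared per-camera backbone (placeholder convnet).
            self.backbone = nn.Sequential(
                nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            self.fuse = nn.Linear(8 * feat, feat)       # learned fusion across the 8 cameras
            self.objects_3d = nn.Linear(feat, 10 * 3)   # placeholder head: up to 10 objects as (x, y, z)
            self.lanes_3d = nn.Linear(feat, 12)         # placeholder head: lane geometry parameters

        def forward(self, cams):                        # cams: (batch, 8, 3, H, W)
            b = cams.shape[0]
            f = self.backbone(cams.flatten(0, 1)).view(b, -1)   # one backbone over all cameras
            z = torch.relu(self.fuse(f))
            return self.objects_3d(z).view(b, 10, 3), self.lanes_3d(z)

    objects, lanes = ToyMultiCamNet()(torch.randn(2, 8, 3, 96, 96))
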
01:13:32.480 | - So the prediction is happening in this like 4D land.
01:13:36.440 | - Yeah.
01:13:37.280 | - With three dimensional world over time.
01:13:38.600 | - Yeah.
01:13:39.440 | - How do you do annotation in that world?
01:13:42.640 | What have you, so data annotation,
01:13:46.080 | whether it's self-supervised or manual by humans
01:13:49.480 | is a big part of this software 2.0 world.
01:13:53.760 | - Right.
01:13:54.600 | I would say by far in the industry,
01:13:56.360 | if you're talking about the industry
01:13:57.880 | and what technology we have available,
01:14:00.480 | Everything is supervised learning.
01:14:01.800 | So you need datasets of input and desired output,
01:14:05.080 | and you need lots of it.
01:14:06.520 | And there are three properties of it that you need.
01:14:09.440 | You need it to be very large.
01:14:10.640 | You need it to be accurate, no mistakes.
01:14:13.080 | And you need it to be diverse.
01:14:14.280 | You don't want to just have a lot of correct examples
01:14:18.360 | of one thing.
01:14:19.200 | You need to really cover the space of possibility
01:14:20.920 | as much as you can.
01:14:21.920 | And the more you can cover the space of possible inputs,
01:14:24.160 | the better the algorithm will work at the end.
01:14:26.400 | Now, once you have really good data sets
01:14:27.880 | that you're collecting, curating and cleaning,
01:14:31.600 | you can train your neural net on top of that.
01:14:35.280 | So a lot of the work goes into cleaning those data sets.
01:14:37.240 | Now, as you pointed out,
01:14:40.280 | the question is, how do you achieve a ton of that?
01:14:42.600 | If you want to basically predict in 3D,
01:14:45.280 | you need data in 3D to back that up.
01:14:47.840 | So in this video, we have eight videos
01:14:50.160 | coming from all the cameras of the system.
01:14:52.520 | And this is what they saw.
01:14:54.240 | And this is the truth of what actually was around.
01:14:56.400 | There was this car, there was this car, this car.
01:14:58.360 | These are the lane line markings.
01:14:59.360 | This is the geometry of the road.
01:15:00.440 | There was traffic light in this three-dimensional position.
01:15:02.680 | You need the ground truth.
01:15:04.720 | And so the big question that the team was solving,
01:15:06.760 | of course, is how do you arrive at that ground truth?
01:15:09.480 | Because once you have a million of it,
01:15:10.880 | and it's large, clean and diverse,
01:15:12.800 | then training a neural net on it works extremely well.
01:15:14.760 | And you can ship that into the car.
01:15:16.800 | And so there's many mechanisms
01:15:18.760 | by which we collected that training data.
01:15:21.000 | You can always go for human annotation.
01:15:22.720 | You can go for simulation as a source of ground truth.
01:15:25.280 | You can also go for what we call the offline tracker
01:15:27.880 | that we've spoken about at the AI day and so on,
01:15:31.640 | which is basically an automatic reconstruction process
01:15:34.400 | for taking those videos and recovering
01:15:36.760 | the three-dimensional sort of reality
01:15:39.040 | of what was around that car.
01:15:40.800 | So basically think of doing like
01:15:41.840 | a three-dimensional reconstruction as an offline thing,
01:15:44.480 | and then understanding that, okay,
01:15:46.760 | there's 10 seconds of video, this is what we saw,
01:15:49.360 | and therefore, here's all the lane lines, cars, and so on.
01:15:52.440 | And then once you have that annotation,
01:15:53.760 | you can train neural nets to imitate it.
01:15:56.280 | - And how difficult is the reconstruction?
01:15:59.320 | - It's difficult, but it can be done.
01:16:01.160 | - So there's overlap between the cameras
01:16:03.480 | and you do the reconstruction,
01:16:04.800 | and perhaps if there's any inaccuracy,
01:16:09.240 | that's caught in the annotation step.
01:16:12.000 | - Yes, the nice thing about the annotation
01:16:14.000 | is that it is fully offline.
01:16:15.800 | You have infinite time.
01:16:17.040 | You have a chunk of one minute,
01:16:18.080 | and you're trying to just offline
01:16:19.600 | in a supercomputer somewhere,
01:16:21.000 | figure out where were the positions of all the cars,
01:16:23.320 | of all the people,
01:16:24.160 | and you have your full one minute of video
01:16:25.680 | from all the angles,
01:16:26.880 | and you can run all the neural nets you want,
01:16:28.440 | and they can be very inefficient, massive neural nets.
01:16:31.380 | There can be neural nets that can't even run in the car
01:16:33.400 | later at test time.
01:16:34.680 | So they can be even more powerful neural nets
01:16:36.160 | than what you can eventually deploy.
01:16:37.800 | So you can do anything you want,
01:16:39.120 | three-dimensional reconstruction, neural nets,
01:16:41.400 | anything you want just to recover that truth,
01:16:43.080 | and then you supervise that truth.
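
Schematically, that offline-labeling loop might look like this (PyTorch; both models and the "clips" are tiny placeholders, not the actual pipeline): a model far too heavy to run in the car produces labels offline, and the network that actually ships is trained to imitate them.

    import torch
    import torch.nn as nn

    big_offline_model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 30))  # too slow for the car
    online_model = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 30))           # what actually ships
    opt = torch.optim.Adam(online_model.parameters(), lr=1e-3)

    for clip in [torch.randn(16, 512) for _ in range(10)]:    # placeholder recorded clips
        with torch.no_grad():
            labels = big_offline_model(clip)                  # "ground truth" recovered offline, with unlimited time
        loss = nn.functional.mse_loss(online_model(clip), labels)
        opt.zero_grad(); loss.backward(); opt.step()          # the deployable net imitates the offline labels
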
01:16:45.240 | - What have you learned, you said no mistakes,
01:16:47.400 | about humans doing annotation?
01:16:50.880 | 'Cause I assume humans are,
01:16:52.840 | there's like a range of things they're good at
01:16:56.160 | in terms of clicking stuff on screen.
01:16:57.840 | How interesting is that to you
01:17:00.360 | as a problem of designing an annotator
01:17:03.720 | where humans are accurate, enjoy it?
01:17:06.040 | Like what even are the metrics?
01:17:07.400 | Are efficient, are productive, all that kind of stuff?
01:17:09.920 | - Yeah, so I grew the annotation team at Tesla
01:17:12.520 | from basically zero to a thousand while I was there.
01:17:16.120 | That was really interesting.
01:17:17.920 | You know, my background is a PhD student researcher.
01:17:20.720 | So growing that kind of an organization was pretty crazy.
01:17:24.200 | But yeah, I think it's extremely interesting
01:17:27.440 | and part of the design process very much
01:17:29.040 | behind the autopilot as to where you use humans.
01:17:31.680 | Humans are very good at certain kinds of annotations.
01:17:34.000 | They're very good, for example,
01:17:34.880 | at two-dimensional annotations of images.
01:17:36.600 | They're not good at annotating cars over time
01:17:39.840 | in three-dimensional space, very, very hard.
01:17:42.200 | And so that's why we were very careful to design the tasks
01:17:44.920 | that are easy to do for humans
01:17:46.480 | versus things that should be left to the offline tracker.
01:17:48.960 | Like maybe the computer will do all the triangulation
01:17:51.280 | and three-dimensional reconstruction,
01:17:52.400 | but the human will say exactly these pixels
01:17:54.440 | of the image are a car.
01:17:56.040 | Exactly these pixels are a human.
01:17:57.720 | And so co-designing the data annotation pipeline
01:18:00.800 | was very much bread and butter was what I was doing daily.
01:18:04.680 | - Do you think there's still a lot of open problems
01:18:06.480 | in that space?
01:18:07.360 | Just in general annotation
01:18:11.200 | where the stuff the machines are good at,
01:18:13.560 | machines do and the humans do what they're good at.
01:18:16.640 | And there's maybe some iterative process.
01:18:18.760 | - Right.
01:18:19.680 | I think to a very large extent,
01:18:21.200 | we went through a number of iterations
01:18:22.560 | and we learned a ton about how to create these datasets.
01:18:25.360 | I'm not seeing big open problems.
01:18:27.800 | Like originally when I joined,
01:18:29.120 | I was like, I was really not sure how this would turn out.
01:18:32.760 | But by the time I left, I was much more secure
01:18:35.120 | and actually we sort of understand the philosophy
01:18:37.320 | of how to create these datasets.
01:18:38.440 | And I was pretty comfortable with where that was at the time.
01:18:41.560 | - So what are strengths and limitations of cameras
01:18:45.880 | for the driving task in your understanding
01:18:48.560 | when you formulate the driving task
01:18:50.120 | as a vision task with eight cameras?
01:18:53.120 | You've seen that the entire,
01:18:55.120 | most of the history of the computer vision field
01:18:57.160 | when it has to do with neural networks,
01:18:58.960 | just a few step back,
01:19:00.080 | what are the strengths and limitations of pixels,
01:19:03.640 | of using pixels to drive?
01:19:05.680 | - Yeah, pixels I think are a beautiful sensory,
01:19:08.840 | beautiful sensor I would say.
01:19:10.440 | The thing is like cameras are very, very cheap
01:19:12.400 | and they provide a ton of information, ton of bits.
01:19:15.400 | So it's an extremely cheap sensor for a ton of bits.
01:19:19.040 | And each one of these bits is a constraint
01:19:20.680 | on the state of the world.
01:19:21.760 | And so you get lots of megapixel images, very cheap,
01:19:26.120 | and it just gives you all these constraints
01:19:27.760 | for understanding what's actually out there in the world.
01:19:29.920 | So vision is probably the highest bandwidth sensor.
01:19:34.360 | It's a very high bandwidth sensor.
01:19:37.000 | - I love that pixels is a constraint on the world.
01:19:43.760 | It's this highly complex,
01:19:45.760 | high bandwidth constraint on the world,
01:19:48.880 | on the state of the world.
01:19:49.720 | That's fascinating.
01:19:50.560 | - And it's not just that,
01:19:51.400 | but again, this real importance of,
01:19:54.200 | it's the sensor that humans use.
01:19:56.040 | Therefore everything is designed for that sensor.
01:19:58.960 | The text, the writing, the flashing signs,
01:20:02.560 | everything is designed for vision.
01:20:04.240 | And so you just find it everywhere.
01:20:07.200 | And so that's why that is the interface you want to be in,
01:20:10.120 | talking again about these universal interfaces.
01:20:12.360 | And that's where we actually want to measure the world
01:20:14.200 | as well, and then develop software for that sensor.
01:20:18.040 | - But there's other constraints on the state of the world
01:20:21.560 | that humans use to understand the world.
01:20:24.080 | I mean, vision ultimately is the main one,
01:20:28.000 | but we're referencing our understanding of human behavior
01:20:35.840 | and some common sense physics.
01:20:35.840 | That could be inferred from vision,
01:20:37.440 | from a perception perspective,
01:20:39.360 | but it feels like we're using some kind of reasoning
01:20:43.880 | to predict the world, not just the pixels.
01:20:47.120 | - I mean, you have a powerful prior,
01:20:48.920 | for how the world evolves over time, et cetera.
01:20:52.320 | So it's not just about the likelihood term
01:20:54.400 | coming up from the data itself,
01:20:56.360 | telling you about what you are observing,
01:20:57.760 | but also the prior term of like,
01:20:59.600 | where are the likely things to see
01:21:01.280 | and how do they likely move and so on.
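One way to read that likelihood-plus-prior framing is the standard Bayesian filtering decomposition. A minimal sketch, using my own notation (world state s_t, observed pixels x_t) rather than anything stated in the conversation:

```latex
p(s_t \mid x_{1:t}) \;\propto\;
\underbrace{p(x_t \mid s_t)}_{\text{likelihood: what the pixels say}}
\;\underbrace{\int p(s_t \mid s_{t-1})\, p(s_{t-1} \mid x_{1:t-1})\, ds_{t-1}}_{\text{prior: what is likely to be there and how it moves}}
```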
01:21:03.240 | - And the question is how complex is the range
01:21:09.080 | of possibilities that might happen in the driving task?
01:21:12.960 | That's still, is that to you still an open problem
01:21:15.480 | of how difficult is driving, like philosophically speaking?
01:21:18.580 | All the time you worked on driving,
01:21:23.800 | do you understand how hard driving is?
01:21:26.400 | - Yeah, driving is really hard,
01:21:28.040 | because it has to do with the predictions
01:21:29.460 | of all these other agents and the theory of mind
01:21:31.320 | and what they're gonna do and are they looking at you?
01:21:34.400 | Are they, where are they looking?
01:21:35.440 | What are they thinking?
01:21:36.920 | There's a lot that goes there at the full tail
01:21:39.880 | of the expansion of the nines
01:21:42.280 | that we have to be comfortable with eventually.
01:21:44.380 | The final problems are of that form.
01:21:46.240 | I don't think those are the problems that are very common.
01:21:48.680 | I think eventually they're important,
01:21:50.440 | but it's like really in the tail end.
01:21:52.160 | - In the tail end, the rare edge cases.
01:21:54.460 | From the vision perspective,
01:21:57.040 | what are the toughest parts
01:21:58.840 | of the vision problem of driving?
01:22:00.500 | - Well, basically the sensor is extremely powerful,
01:22:06.120 | but you still need to process that information.
01:22:08.480 | And so going from brightnesses of these pixel values to,
01:22:12.760 | hey, here is the three-dimensional world,
01:22:14.640 | is extremely hard.
01:22:15.680 | And that's what the neural networks are fundamentally doing.
01:22:18.280 | And so the difficulty really is in just doing
01:22:22.200 | an extremely good job of engineering the entire pipeline,
01:22:25.680 | the entire data engine,
01:22:27.280 | having the capacity to train these neural nets,
01:22:29.840 | having the ability to evaluate the system
01:22:31.940 | and iterate on it.
01:22:33.720 | So I would say just doing this in production at scale
01:22:36.280 | is like the hard part.
01:22:37.120 | It's an execution problem.
01:22:38.540 | - So the data engine, but also the sort of deployment
01:22:43.540 | of the system such that it has low latency performance.
01:22:47.320 | So it has to do all these steps.
01:22:48.960 | - Yeah, for the neural net specifically,
01:22:50.300 | just making sure everything fits into the chip on the car.
01:22:53.720 | And you have a finite budget of flops that you can perform
01:22:56.680 | and memory bandwidth and other constraints.
01:22:59.560 | And you have to make sure it flies
01:23:01.160 | and you can squeeze in as much compute as you can
01:23:03.640 | into the tiny chip.
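As a toy illustration of that kind of budgeting, here is a hedged sketch; the accelerator throughput, target frame rate, and layer shapes are made-up placeholders rather than the actual in-car hardware or network.

```python
# Toy check of whether a conv net fits a per-frame compute budget.
# Hardware numbers and layer shapes are hypothetical placeholders.

chip_tflops = 36.0      # assumed usable TFLOPS on the in-car accelerator
target_fps = 30         # assumed frames per second the net must sustain
budget_per_frame = chip_tflops * 1e12 / target_fps

def conv_flops(h, w, c_in, c_out, k=3):
    """Approximate FLOPs of one conv layer: 2 FLOPs per multiply-accumulate."""
    return 2 * h * w * c_in * c_out * k * k

# A made-up stack of conv layers standing in for a backbone.
layers = [(480, 640, 3, 32), (240, 320, 32, 64), (120, 160, 64, 128), (60, 80, 128, 256)]
total = sum(conv_flops(*layer) for layer in layers)

print(f"model: {total / 1e9:.1f} GFLOPs/frame, budget: {budget_per_frame / 1e9:.1f} GFLOPs/frame")
print("fits" if total <= budget_per_frame else "does not fit")
```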
01:23:03.640 | - What have you learned from that process?
01:23:05.680 | 'Cause maybe that's one of the bigger,
01:23:07.440 | like new things coming from a research background
01:23:11.720 | where there's a system that has to run
01:23:13.760 | under heavily constrained resources,
01:23:15.920 | has to run really fast.
01:23:17.320 | What kind of insights have you learned from that?
01:23:20.900 | - Yeah, I'm not sure if there's too many insights.
01:23:24.080 | You're trying to create a neural net
01:23:25.520 | that will fit in what you have available.
01:23:28.200 | And you're always trying to optimize it.
01:23:29.920 | And we talked a lot about it on AI day
01:23:31.920 | and basically the triple back flips that the team is doing
01:23:36.740 | to make sure it all fits and utilizes the engine.
01:23:39.480 | So I think it's extremely good engineering.
01:23:42.220 | And then there's all kinds of little insights
01:23:44.300 | peppered in on how to do it properly.
01:23:46.740 | - Let's actually zoom out,
01:23:47.700 | 'cause I don't think we talked about the data engine,
01:23:49.740 | the entirety of the layout of this idea
01:23:53.620 | that I think is just beautiful with humans in the loop.
01:23:57.260 | Can you describe the data engine?
01:23:59.500 | - Yeah, the data engine is what I call
01:24:01.180 | the almost biological feeling process
01:24:04.700 | by which you perfect the training sets
01:24:08.120 | for these neural networks.
01:24:10.220 | So because most of the programming now
01:24:12.080 | is in the level of these data sets
01:24:13.420 | and make sure they're large, diverse, and clean,
01:24:15.860 | basically you have a data set that you think is good.
01:24:19.280 | You train your neural net, you deploy it,
01:24:21.640 | and then you observe how well it's performing.
01:24:24.020 | And you're trying to always increase
01:24:26.300 | the quality of your data set.
01:24:27.580 | So you're trying to catch scenarios
01:24:29.300 | that are basically rare.
01:24:31.980 | And it is in these scenarios
01:24:33.540 | that neural nets will typically struggle in
01:24:35.020 | because they weren't told what to do
01:24:36.460 | in those rare cases in the data set.
01:24:38.740 | But now you can close the loop
01:24:39.740 | because if you can now collect all those at scale,
01:24:42.620 | you can then feed them back
01:24:43.740 | into the reconstruction process I described
01:24:46.300 | and reconstruct the truth in those cases
01:24:48.540 | and add it to the data set.
01:24:50.020 | And so the whole thing ends up being like a staircase
01:24:52.340 | of improvement of perfecting your training set.
01:24:55.580 | And you have to go through deployments
01:24:57.020 | so that you can mine the parts
01:24:59.500 | that are not yet represented well in the data set.
01:25:02.520 | So your data set is basically imperfect.
01:25:03.780 | It needs to be diverse.
01:25:04.660 | It has pockets that are missing
01:25:06.980 | and you need to pad out the pockets.
01:25:08.380 | You can sort of think of it that way in the data.
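A deliberately simplified sketch of that closed loop, where every helper function is a hypothetical stub standing in for a real stage (training, fleet telemetry, offline reconstruction, annotation), not anything from Tesla's actual stack:

```python
import random

def train(dataset):                    # stand-in for training the neural net
    return {"trained_on": len(dataset)}

def deploy_and_observe(model):         # stand-in for deployment + fleet telemetry
    return [random.random() for _ in range(1000)]

def mine_rare_cases(telemetry, threshold=0.99):
    # catch the rare scenarios the net struggles on (trigger logic is a placeholder)
    return [t for t in telemetry if t > threshold]

def reconstruct_truth(cases):          # stand-in for offline reconstruction + labeling
    return [{"scenario": c, "label": "reconstructed truth"} for c in cases]

dataset = [{"scenario": i, "label": "seed"} for i in range(10_000)]
for round_ in range(3):                # the "staircase of improvement"
    model = train(dataset)
    telemetry = deploy_and_observe(model)
    dataset += reconstruct_truth(mine_rare_cases(telemetry))
    print(f"round {round_}: dataset size {len(dataset)}")
```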
01:25:11.580 | - What role do humans play in this?
01:25:13.140 | So what's this biological system,
01:25:15.900 | like a human body is made up of cells.
01:25:18.780 | What role, like how do you optimize the human system?
01:25:23.260 | The multiple engineers collaborating,
01:25:26.100 | figuring out what to focus on,
01:25:29.260 | what to contribute,
01:25:30.460 | which tasks to optimize in this neural network.
01:25:33.940 | Who's in charge of figuring out which task needs more data?
01:25:38.800 | Can you speak to the hyperparameters of the human system?
01:25:44.460 | - It really just comes down to extremely good execution
01:25:46.460 | from an engineering team who knows what they're doing.
01:25:48.340 | They understand intuitively the philosophical insights
01:25:50.700 | underlying the data engine
01:25:52.060 | and the process by which the system improves
01:25:54.260 | and how to, again, like delegate the strategy
01:25:57.660 | of the data collection, how that works,
01:25:59.660 | and then just making sure it's all extremely well executed.
01:26:02.020 | And that's where most of the work is,
01:26:03.640 | is not even the philosophizing or the research
01:26:05.860 | or the ideas of it.
01:26:06.680 | It's just extremely good execution.
01:26:08.060 | It's so hard when you're dealing with data at that scale.
01:26:10.760 | - So your role in the data engine, executing well on it,
01:26:14.220 | is difficult and extremely important.
01:26:16.300 | Is there a priority of like a vision board of saying like,
01:26:22.220 | we really need to get better at stoplights?
01:26:25.260 | - Yeah.
01:26:26.100 | - Like the prioritization of tasks, is that essentially,
01:26:28.620 | and that comes from the data?
01:26:30.540 | - That comes to a very large extent
01:26:32.940 | from what we are trying to achieve in the product roadmap,
01:26:35.060 | the release we're trying to get out,
01:26:38.260 | and the feedback from the QA team
01:26:40.020 | on where the system is struggling or not,
01:26:41.780 | the things that we're trying to improve.
01:26:42.780 | - And the QA team gives some signal,
01:26:45.420 | some information in aggregate about the performance
01:26:49.260 | of the system in various conditions.
01:26:50.460 | - That's right.
01:26:51.300 | And then of course, all of us drive it
01:26:52.120 | and we can also see it.
01:26:53.060 | It's really nice to work with a system
01:26:55.020 | that you can also experience yourself
01:26:56.780 | and it drives you home.
01:26:58.660 | - Is there some insight you can draw
01:27:00.740 | from your individual experience
01:27:02.260 | that you just can't quite get
01:27:03.500 | from an aggregate statistical analysis of data?
01:27:06.860 | - Yeah.
01:27:07.700 | - It's so weird, right?
01:27:08.520 | - Yes.
01:27:09.360 | - It's not scientific in a sense
01:27:11.540 | 'cause you're just one anecdotal sample.
01:27:14.020 | - Yeah, I think there's a ton of, it's a source of truth.
01:27:17.340 | It's your interaction with the system and you can see it,
01:27:19.820 | you can play with it, you can perturb it,
01:27:21.980 | you can get a sense of it, you have an intuition for it.
01:27:24.640 | I think numbers just have a way of,
01:27:26.800 | numbers and plots and graphs are much harder.
01:27:29.080 | It hides a lot of--
01:27:31.400 | - It's like if you train a language model,
01:27:34.260 | it's a really powerful way is by you interacting with it.
01:27:38.600 | - Yeah, 100%.
01:27:39.440 | - To try to build up an intuition.
01:27:40.680 | - Yeah, I think Elon also,
01:27:42.880 | he always wanted to drive the system himself.
01:27:45.200 | He drives a lot and I wanna say almost daily.
01:27:48.900 | So he also sees this as a source of truth.
01:27:51.760 | You driving the system and it performing and yeah.
01:27:56.220 | - So what do you think?
01:27:57.700 | Tough questions here.
01:27:58.860 | So Tesla last year removed radar from the sensor suite
01:28:04.920 | and now just announced that it's gonna remove
01:28:07.020 | all ultrasonic sensors relying solely on vision,
01:28:10.820 | so camera only.
01:28:11.940 | Does that make the perception problem harder or easier?
01:28:16.340 | - I would almost reframe the question in some way.
01:28:20.020 | So the thing is basically,
01:28:22.080 | you would think that additional sensors--
01:28:23.660 | - By the way, can I just interrupt?
01:28:25.140 | - Go ahead.
01:28:25.980 | - I wonder if a language model will ever do that
01:28:28.260 | if you prompt it.
01:28:29.100 | Let me reframe your question.
01:28:30.940 | That would be epic.
01:28:32.420 | This is the wrong prompt, sorry.
01:28:34.380 | - Yeah, it's like a little bit of a wrong question
01:28:36.360 | because basically you would think that these sensors
01:28:38.660 | are an asset to you,
01:28:40.980 | but if you fully consider the entire product
01:28:43.540 | in its entirety,
01:28:45.120 | these sensors are actually potentially a liability
01:28:47.700 | because these sensors aren't free.
01:28:49.780 | They don't just appear on your car.
01:28:51.260 | You need, suddenly you need to have an entire supply chain.
01:28:53.820 | You have people procuring it.
01:28:55.260 | There can be problems with them.
01:28:56.520 | They may need replacement.
01:28:57.780 | They are part of the manufacturing process.
01:28:59.060 | They can hold back the line in production.
01:29:01.660 | You need to source them, you need to maintain them.
01:29:03.260 | You have to have teams that write the firmware,
01:29:05.620 | all of it,
01:29:06.680 | and then you also have to incorporate and fuse them
01:29:08.740 | into the system in some way.
01:29:09.980 | And so it actually like bloats a lot of it.
01:29:13.620 | And I think Elon is really good at simplify, simplify.
01:29:16.900 | Best part is no part.
01:29:18.300 | And he always tries to throw away things
01:29:19.860 | that are not essential
01:29:20.700 | because he understands the entropy in organizations
01:29:22.760 | and in approach.
01:29:23.860 | And I think in this case,
01:29:26.020 | the cost is high and you're not potentially seeing it
01:29:28.020 | if you're just a computer vision engineer.
01:29:29.700 | And I'm just trying to improve my network
01:29:31.380 | and is it more useful or less useful?
01:29:33.900 | How useful is it?
01:29:35.220 | And the thing is,
01:29:36.060 | once you consider the full cost of a sensor,
01:29:38.460 | it actually is potentially a liability
01:29:40.160 | and you need to be really sure
01:29:41.360 | that it's giving you extremely useful information.
01:29:43.760 | In this case, we looked at using it or not using it
01:29:46.500 | and the Delta was not massive.
01:29:48.060 | And so it's not useful.
01:29:49.540 | - Is it also bloat in the data engine?
01:29:52.620 | Like having more sensors- - 100%.
01:29:55.180 | - Is a distraction?
01:29:56.100 | - And these sensors,
01:29:56.940 | they can change over time, for example.
01:29:58.180 | You can have one type of say radar,
01:29:59.780 | you can have other type of radar,
01:30:00.820 | they change over time.
01:30:01.660 | Now you suddenly need to worry about it.
01:30:02.900 | Now suddenly you have a column in your SQLite
01:30:04.980 | telling you, oh, what sensor type was it?
01:30:06.740 | And they all have different distributions.
01:30:08.660 | And then they contribute noise and entropy into everything.
01:30:13.660 | And they bloat stuff.
01:30:15.220 | And also organizationally,
01:30:16.500 | it has been really fascinating to me
01:30:17.580 | that it can be very distracting.
01:30:19.360 | If all you wanna get to work is vision,
01:30:23.840 | all the resources are on it
01:30:25.140 | and you're building out a data engine
01:30:27.140 | and you're actually making forward progress
01:30:28.660 | because that is the sensor with the most bandwidth,
01:30:32.120 | the most constraints on the world.
01:30:33.580 | And you're investing fully into that
01:30:34.940 | and you can make that extremely good.
01:30:36.380 | You only have a finite amount of sort of spend,
01:30:39.460 | of focus, across different facets of the system.
01:30:42.860 | - And this kind of reminds me
01:30:46.060 | of Rich Sutton's, "The Bitter Lesson."
01:30:48.260 | That just seems like simplifying the system
01:30:50.420 | in the long run.
01:30:52.580 | Now, of course, you don't know what the long run is.
01:30:54.420 | And it seems to be always the right solution.
01:30:57.100 | In that case, it was for RL,
01:30:58.340 | but it seems to apply generally
01:30:59.980 | across all systems that do computation.
01:31:02.400 | So what do you think about the LIDAR as a crutch debate?
01:31:06.280 | The battle between point clouds and pixels?
01:31:11.180 | - Yeah, I think this debate
01:31:12.020 | is always like slightly confusing to me
01:31:13.780 | because it seems like the actual debate
01:31:15.700 | should be about like, do you have the fleet or not?
01:31:18.140 | That's like the really important thing
01:31:19.380 | about whether you can achieve a really good functioning
01:31:21.960 | of an AI system at this scale.
01:31:23.940 | - So data collection systems.
01:31:25.500 | - Yeah, do you have a fleet or not?
01:31:26.980 | That's significantly more important
01:31:28.340 | than whether you have LIDAR or not.
01:31:29.820 | It's just another sensor.
01:31:31.060 | And yeah, I think similar to the radar discussion,
01:31:36.140 | basically, I don't think it,
01:31:40.500 | basically it doesn't offer extra information.
01:31:43.940 | It's extremely costly.
01:31:44.940 | It has all kinds of problems.
01:31:45.940 | You have to worry about it.
01:31:46.780 | You have to calibrate it, et cetera.
01:31:47.900 | It creates bloat and entropy.
01:31:49.180 | You have to be really sure that you need this sensor.
01:31:52.940 | In this case, I basically don't think you need it.
01:31:54.980 | And I think, honestly, I will make a stronger statement.
01:31:57.260 | I think the others, some of the other companies
01:31:59.780 | who are using it are probably going to drop it.
01:32:02.180 | - Yeah, so you have to consider the sensor in full,
01:32:06.620 | in considering, can you build a big fleet
01:32:10.340 | that collects a lot of data?
01:32:12.300 | And can you integrate that sensor,
01:32:14.500 | that data, into a data engine
01:32:17.140 | that's able to quickly find different parts of the data
01:32:20.180 | that then continuously improve
01:32:22.340 | whatever model you're using.
01:32:24.220 | - Yeah, another way to look at it is like,
01:32:25.860 | vision is necessary in the sense that
01:32:29.860 | the world is designed for human visual consumption.
01:32:31.460 | So you need vision.
01:32:32.660 | It's necessary.
01:32:33.740 | And then also it is sufficient
01:32:35.900 | because it has all the information
01:32:37.220 | that you need for driving.
01:32:38.820 | And humans, obviously, use vision to drive.
01:32:40.820 | So it's both necessary and sufficient.
01:32:42.420 | So you want to focus resources.
01:32:43.700 | And you have to be really sure
01:32:44.900 | if you're going to bring in other sensors.
01:32:46.700 | You could add sensors to infinity.
01:32:49.420 | At some point you need to draw the line.
01:32:51.020 | And I think in this case,
01:32:52.060 | you have to really consider the full cost of any one sensor
01:32:55.700 | that you're adopting.
01:32:56.820 | And do you really need it?
01:32:58.500 | And I think the answer in this case is no.
01:33:00.740 | - So what do you think about the idea
01:33:02.420 | that the other companies are forming high resolution maps
01:33:07.260 | and constraining heavily the geographic regions
01:33:10.180 | in which they operate?
01:33:11.660 | Is that approach, in your view,
01:33:15.620 | not going to scale over time
01:33:18.780 | to the entirety of the United States?
01:33:20.700 | - I think, as you mentioned,
01:33:22.180 | they pre-map all the environments
01:33:24.260 | and they need to refresh the map.
01:33:25.820 | And they have a perfect centimeter level accuracy map
01:33:28.220 | of everywhere they're going to drive.
01:33:29.540 | It's crazy.
01:33:30.500 | How are you going to,
01:33:32.100 | when we're talking about autonomy actually changing the world
01:33:34.340 | we're talking about a deployment
01:33:36.580 | on a global scale of autonomous systems for transportation.
01:33:40.380 | And if you need to maintain a centimeter accurate map
01:33:42.700 | for earth or like for many cities and keep them updated
01:33:46.100 | it's a huge dependency that you're taking on,
01:33:48.220 | huge dependency.
01:33:49.900 | It's a massive, massive dependency.
01:33:51.500 | And now you need to ask yourself, do you really need it?
01:33:54.380 | And humans don't need it, right?
01:33:57.300 | So it's very useful to have a low level map of like, okay
01:34:00.420 | the connectivity of your road,
01:34:01.700 | you know that there's a fork coming up
01:34:03.300 | when you drive an environment,
01:34:04.340 | you sort of have that high level understanding.
01:34:05.580 | It's like a small Google map.
01:34:07.380 | And Tesla uses Google map like similar kind of resolution
01:34:11.380 | information in the system,
01:34:12.980 | but it will not pre-map environments
01:34:14.500 | to semi centimeter level accuracy.
01:34:16.340 | It's a crutch, it's a distraction.
01:34:17.620 | It costs entropy and it diffuses the team.
01:34:20.460 | It dilutes the team.
01:34:21.460 | And you're not focusing on what's actually necessary
01:34:23.380 | which is the computer vision problem.
01:34:25.380 | - What did you learn about machine learning,
01:34:29.300 | about engineering, about life, about yourself
01:34:32.020 | as one human being from working with Elon Musk?
01:34:36.500 | - I think the most I've learned is about
01:34:38.180 | how to sort of run organizations efficiently
01:34:40.940 | and how to create efficient organizations
01:34:43.540 | and how to fight entropy in an organization.
01:34:46.220 | - So human engineering in the fight against entropy.
01:34:49.180 | - Yeah, I think Elon is a very efficient warrior
01:34:53.500 | in the fight against entropy in organizations.
01:34:56.180 | - What does entropy in an organization look like exactly?
01:34:58.900 | - It's process, it's process and it's-
01:35:03.340 | - Inefficiencies in the form of meetings
01:35:05.220 | and that kind of stuff.
01:35:06.060 | - Yeah, meetings, he hates meetings.
01:35:07.700 | He keeps telling people to skip meetings
01:35:09.100 | if they're not useful.
01:35:10.900 | He basically runs the world's biggest startups,
01:35:13.900 | I would say.
01:35:15.220 | Tesla, SpaceX are the world's biggest startups.
01:35:17.460 | Tesla actually has multiple startups.
01:35:19.620 | I think it's better to look at it that way.
01:35:21.380 | And so I think he's extremely good at that.
01:35:25.300 | And yeah, he has a very good intuition
01:35:27.820 | for streamlining processes, making everything efficient.
01:35:30.260 | Best part is no part, simplifying, focusing
01:35:34.060 | and just kind of removing barriers,
01:35:35.980 | moving very quickly, making big moves.
01:35:38.020 | All this is a very startupy sort of seeming things
01:35:40.460 | but at scale.
01:35:41.540 | - So strong drive to simplify.
01:35:43.940 | From your perspective, I mean,
01:35:45.540 | that also probably applies to just designing systems
01:35:49.500 | and machine learning and otherwise,
01:35:51.100 | like simplify, simplify.
01:35:52.340 | - Yes.
01:35:53.500 | - What do you think is the secret
01:35:54.900 | to maintaining the startup culture
01:35:57.420 | in a company that grows?
01:35:59.140 | Is there, can you introspect that?
01:36:03.820 | - I do think you need someone in a powerful position
01:36:06.380 | with a big hammer, like Elon,
01:36:07.900 | who's like the cheerleader for that idea
01:36:10.380 | and ruthlessly pursues it.
01:36:12.700 | If no one has a big enough hammer,
01:36:14.780 | everything turns into committees,
01:36:17.140 | democracy within the company,
01:36:19.380 | process, talking to stakeholders,
01:36:21.420 | decision-making, just everything just crumbles.
01:36:24.300 | If you have a big person who is also really smart
01:36:26.660 | and has a big hammer, things move quickly.
01:36:28.940 | - So you said your favorite scene in "Interstellar"
01:36:32.700 | is the intense docking scene
01:36:34.180 | with the AI and Cooper talking,
01:36:35.780 | saying, "Cooper, what are you doing?"
01:36:38.260 | "Docking, it's not possible."
01:36:40.260 | "No, it's necessary."
01:36:41.980 | Such a good line.
01:36:43.740 | By the way, just so many questions there.
01:36:45.580 | Why an AI in that scene,
01:36:49.660 | presumably is supposed to be able to compute
01:36:53.260 | a lot more than the human,
01:36:55.220 | is saying it's not optimal.
01:36:56.340 | Why the human, I mean, that's a movie,
01:36:57.940 | but shouldn't the AI know much better than the human?
01:37:02.180 | Anyway, what do you think is the value
01:37:04.340 | of setting seemingly impossible goals?
01:37:07.540 | So like, (laughs)
01:37:09.740 | our initial intuition, which seems like something
01:37:12.460 | that you have taken on, that Elon espouses,
01:37:17.020 | where the initial intuition of the community
01:37:19.900 | might say this is very difficult,
01:37:21.780 | and then you take it on anyway with a crazy deadline.
01:37:24.860 | You just, from a human engineering perspective,
01:37:27.220 | have you seen the value of that?
01:37:31.860 | - I wouldn't say that setting impossible goals exactly
01:37:34.700 | is a good idea, but I think setting very ambitious goals
01:37:37.260 | is a good idea.
01:37:38.340 | I think there's a, what I call,
01:37:40.300 | sublinear scaling of difficulty,
01:37:42.100 | which means that 10x problems are not 10x hard.
01:37:45.260 | Usually a 10x harder problem is like two or three x
01:37:49.540 | harder to execute on.
01:37:51.060 | Because if you wanna actually,
01:37:52.460 | if you wanna improve a system by 10%,
01:37:54.460 | it costs some amount of work.
01:37:55.620 | And if you wanna 10x improve the system,
01:37:57.420 | it doesn't cost 100x the amount of work.
01:38:00.420 | And it's because you fundamentally change the approach.
01:38:02.140 | And if you start with that constraint,
01:38:04.500 | then some approaches are obviously dumb
01:38:06.020 | and not going to work.
01:38:06.980 | And it forces you to reevaluate.
01:38:09.700 | And I think it's a very interesting way
01:38:11.780 | of approaching problem solving.
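One way to formalize that "sublinear scaling of difficulty" claim, in my own notation rather than his, is as a power law in the improvement factor: the effort to reach a k-times better system grows much slower than k, because the bigger goal forces a change of approach instead of incremental tuning.

```latex
E(k) \propto k^{\alpha}, \quad 0 < \alpha < 1
\qquad\Rightarrow\qquad
E(10) \propto 10^{\alpha} \approx 2\text{--}3 \ \ \text{for } \alpha \approx 0.3\text{--}0.48
```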
01:38:13.860 | - But it requires a weird kind of thinking.
01:38:16.220 | It's just going back to your PhD days.
01:38:19.380 | It's like, how do you think which ideas
01:38:23.180 | in the machine learning community are solvable?
01:38:27.460 | - Yes.
01:38:28.740 | It requires, what is that?
01:38:30.380 | I mean, there's the cliche of first principles thinking,
01:38:32.940 | but it requires to basically ignore
01:38:35.380 | what the community is saying.
01:38:36.420 | 'Cause doesn't a community in science
01:38:40.820 | usually draw lines of what is and isn't possible?
01:38:44.100 | - Right.
01:38:44.940 | - And it's very hard to break out of that
01:38:47.180 | without going crazy.
01:38:48.500 | - Yeah.
01:38:49.340 | I mean, I think a good example here
01:38:50.500 | is the deep learning revolution in some sense,
01:38:52.860 | because you could be in computer vision at that time
01:38:55.860 | when during the deep learning revolution of 2012 and so on,
01:39:00.340 | you could be improving a computer vision stack by 10%,
01:39:03.060 | or we can just be saying, actually, all of this is useless.
01:39:05.900 | And how do I do 10x better computer vision?
01:39:07.860 | Well, it's probably not by tuning a HOG feature detector.
01:39:11.060 | I need a different approach.
01:39:12.740 | I need something that is scalable.
01:39:14.060 | Going back to Richard Sutton's,
01:39:17.180 | and understanding sort of like the philosophy
01:39:18.700 | of the bitter lesson, and then being like,
01:39:21.580 | actually, I need a much more scalable system,
01:39:23.220 | like a neural network that in principle works.
01:39:25.660 | And then having some deep believers
01:39:26.980 | that can actually execute on that mission and make it work.
01:39:29.460 | So that's the 10x solution.
01:39:31.660 | - What do you think is the timeline
01:39:35.620 | to solve the problem of autonomous driving?
01:39:38.460 | That's still in part an open question.
01:39:41.540 | - Yeah, I think the tough thing
01:39:43.820 | with timelines of self-driving, obviously,
01:39:45.660 | is that no one has created self-driving.
01:39:47.980 | - Yeah.
01:39:48.820 | - So it's not like,
01:39:50.460 | what do you think is the timeline to build this bridge?
01:39:52.140 | Well, we've built a million bridges before.
01:39:54.020 | Here's how long that takes.
01:39:55.220 | It's, you know, it's, no one has built autonomy.
01:39:58.500 | It's not obvious.
01:40:00.180 | Some parts turn out to be much easier than others.
01:40:02.900 | So it's really hard to forecast.
01:40:04.020 | You do your best based on trend lines and so on,
01:40:06.780 | and based on intuition.
01:40:07.700 | But that's why fundamentally,
01:40:09.140 | it's just really hard to forecast this.
01:40:10.860 | No one has-
01:40:11.700 | - So even still, like being inside of it, it's hard to do.
01:40:14.980 | - Yes, some things turn out to be much harder,
01:40:16.660 | and some things turn out to be much easier.
01:40:18.940 | - Do you try to avoid making forecasts?
01:40:21.940 | 'Cause like Elon doesn't avoid them, right?
01:40:24.140 | And heads of car companies in the past
01:40:26.300 | have not avoided it either.
01:40:28.100 | Ford and other places have made predictions
01:40:31.460 | that we're gonna solve level four driving
01:40:33.940 | by like 2020, 2021, whatever.
01:40:36.540 | And now they're all kind of backtracking that prediction.
01:40:39.340 | As an AI person,
01:40:44.020 | do you for yourself privately make predictions,
01:40:48.540 | or do they get in the way of like your actual ability
01:40:52.100 | to think about a thing?
01:40:53.900 | - Yeah, I would say like,
01:40:55.060 | what's easy to say is that this problem is tractable,
01:40:57.580 | and that's an easy prediction to make.
01:40:59.180 | It's tractable, it's going to work.
01:41:00.780 | Yes, it's just really hard.
01:41:01.980 | Some things turn out to be harder,
01:41:03.060 | and some things turn out to be easier.
01:41:05.060 | So, but it definitely feels tractable,
01:41:08.180 | and it feels like at least the team at Tesla,
01:41:10.500 | which is what I saw internally,
01:41:11.740 | is definitely on track to that.
01:41:13.300 | - How do you form a strong representation
01:41:17.620 | that allows you to make a prediction about tractability?
01:41:20.620 | So like you're the leader of a lot of humans,
01:41:23.700 | you have to kind of say, this is actually possible.
01:41:28.540 | Like how do you build up that intuition?
01:41:30.980 | It doesn't have to be even driving,
01:41:32.420 | it could be other tasks.
01:41:33.660 | It could be, what difficult tasks
01:41:37.140 | did you work on in your life?
01:41:38.060 | I mean, classification, achieving certain,
01:41:41.100 | just on ImageNet, a certain level
01:41:43.460 | of superhuman level performance.
01:41:45.820 | - Yeah, expert intuition.
01:41:47.860 | It's just intuition, it's belief.
01:41:49.660 | (laughs)
01:41:50.900 | So just like thinking about it long enough,
01:41:52.780 | like studying, looking at sample data,
01:41:54.700 | like you said, driving.
01:41:55.880 | My intuition is really flawed on this.
01:41:59.060 | Like I don't have a good intuition about tractability.
01:42:01.420 | It could be anything.
01:42:03.300 | It could be solvable.
01:42:05.920 | Like, you know, the driving task
01:42:09.220 | could be simplified into something quite trivial.
01:42:12.940 | Like the solution to the problem would be quite trivial.
01:42:16.300 | And at scale, more and more cars driving perfectly
01:42:20.420 | might make the problem much easier.
01:42:22.260 | - Yeah.
01:42:23.100 | - The more cars you have driving,
01:42:23.940 | and like people learn how to drive correctly,
01:42:26.620 | not correctly, but in a way that's more optimal
01:42:29.460 | for a heterogeneous system of autonomous
01:42:32.900 | and semi-autonomous and manually driven cars,
01:42:36.060 | that could change stuff.
01:42:37.140 | Then again, also I've spent a ridiculous number of hours
01:42:40.540 | just staring at pedestrians crossing streets,
01:42:43.660 | thinking about humans.
01:42:45.340 | And it feels like the way we use our eye contact,
01:42:50.340 | it sends really strong signals.
01:42:52.740 | And there's certain quirks and edge cases of behavior.
01:42:55.580 | And of course, a lot of the fatalities that happen
01:42:57.660 | have to do with drunk driving,
01:42:59.740 | and both on the pedestrian side and the driver's side.
01:43:03.340 | So there's that problem of driving at night
01:43:05.860 | and all that kind of.
01:43:06.780 | So I wonder, you know, it's like the space
01:43:09.140 | of possible solution to autonomous driving
01:43:12.380 | includes so many human factor issues
01:43:15.660 | that it's almost impossible to predict.
01:43:17.900 | There could be super clean, nice solutions.
01:43:20.860 | - Yeah.
01:43:21.700 | I would say definitely like to use a game analogy,
01:43:23.420 | there's some fog of war,
01:43:25.140 | but you definitely also see the frontier of improvement
01:43:28.380 | and you can measure historically
01:43:29.700 | how much you've made progress.
01:43:31.340 | And I think, for example, at least what I've seen
01:43:33.260 | in roughly five years at Tesla,
01:43:35.380 | when I joined, it barely kept lane on the highway.
01:43:38.940 | I think going up from Palo Alto to SF
01:43:40.740 | was like three or four interventions.
01:43:42.180 | Anytime the road would do anything geometrically
01:43:44.660 | or turn too much, it would just like not work.
01:43:47.060 | And so going from that to like a pretty competent system
01:43:49.340 | in five years and seeing what happens also under the hood
01:43:52.380 | and what the scale of which the team is operating now
01:43:54.220 | with respect to data and compute and everything else
01:43:56.940 | is just a massive progress.
01:44:00.340 | - So it's, you're climbing a mountain and it's fog,
01:44:03.860 | but you're making a lot of progress.
01:44:05.300 | - It's fog.
01:44:06.140 | You're making progress and you see
01:44:06.980 | what the next directions are.
01:44:07.940 | And you're looking at some of the remaining challenges
01:44:09.540 | and they're not like, they're not perturbing you
01:44:12.340 | and they're not changing your philosophy
01:44:13.620 | and you're not contorting yourself.
01:44:15.780 | You're like, actually, these are the things
01:44:17.420 | that we still need to do.
01:44:18.260 | - Yeah, the fundamental components of solving the problem
01:44:20.220 | seem to be there from the data engine to the compute,
01:44:22.580 | to the compute on the car, to the compute for the training,
01:44:25.560 | all that kind of stuff.
01:44:27.240 | So you've done, over the years you've been at Tesla,
01:44:30.420 | you've done a lot of amazing breakthrough ideas
01:44:33.740 | in engineering, all of it,
01:44:36.860 | from the data engine to the human side, all of it.
01:44:40.180 | Can you speak to why you chose to leave Tesla?
01:44:43.900 | - Basically, as I described, I ran,
01:44:45.940 | I think over time during those five years,
01:44:47.780 | I've kind of gotten myself into
01:44:50.620 | a little bit of a managerial position.
01:44:52.460 | Most of my days were meetings and growing the organization
01:44:54.940 | and making decisions about sort of high level strategic
01:44:58.860 | decisions about the team
01:45:00.300 | and what it should be working on and so on.
01:45:02.420 | And it's kind of like a corporate executive role
01:45:06.100 | and I can do it, I think I'm okay at it,
01:45:08.580 | but it's not like fundamentally what I enjoy.
01:45:11.060 | And so I think when I joined,
01:45:13.980 | there was no computer vision team
01:45:15.060 | because Tesla was just going from the transition
01:45:17.040 | of using Mobileye, a third party vendor
01:45:18.740 | for all of its computer vision,
01:45:19.700 | to having to build its computer vision system.
01:45:21.840 | So when I showed up, there were two people
01:45:23.180 | training deep neural networks
01:45:24.900 | and they were training them at a computer
01:45:26.580 | down at their legs, it was a workstation.
01:45:30.580 | - They're doing some kind of basic classification task.
01:45:32.460 | - Yeah, and so I kind of like grew that
01:45:35.580 | into what I think is a fairly respectable
01:45:37.820 | deep learning team, a massive compute cluster,
01:45:40.020 | a very good data annotation organization.
01:45:42.880 | And I was very happy with where that was.
01:45:45.260 | It became quite autonomous.
01:45:46.660 | And so I kind of stepped away and I, you know,
01:45:49.580 | I'm very excited to do much more technical things again.
01:45:52.300 | Yeah, and kind of like refocus on AGI.
01:45:54.620 | - What was this soul searching like?
01:45:56.580 | 'Cause you took a little time off and think like,
01:45:58.500 | what, how many mushrooms did you take?
01:46:00.460 | No, I'm just kidding.
01:46:01.660 | I mean, what was going through your mind?
01:46:03.740 | The human lifetime is finite.
01:46:06.160 | - Yeah.
01:46:07.000 | - You did a few incredible things.
01:46:08.260 | You're one of the best teachers of AI in the world.
01:46:11.540 | You're one of the best, and I don't mean that,
01:46:14.380 | I mean that in the best possible way,
01:46:15.900 | you're one of the best tinkerers in the AI world.
01:46:19.720 | Meaning like understanding the fundamentals
01:46:23.260 | of how something works by building it from scratch
01:46:26.420 | and playing with the basic intuitions.
01:46:28.500 | It's like Einstein, Feynman,
01:46:30.340 | were all really good at this kind of stuff.
01:46:32.060 | Like small example of a thing to play with it,
01:46:35.180 | to try to understand it.
01:46:37.020 | So that, and obviously now with Tesla,
01:46:38.940 | you helped build a team of machine learning
01:46:42.360 | engineers and a system that actually accomplishes
01:46:46.740 | something in the real world.
01:46:47.940 | So given all that, like what was the soul searching like?
01:46:51.740 | - Well, it was hard because obviously
01:46:53.380 | I love the company a lot and I love Elon, I love Tesla.
01:46:57.140 | I want, it was hard to leave.
01:47:00.100 | I love the team basically.
01:47:02.440 | But yeah, I think actually I would be potentially
01:47:06.440 | like interested in revisiting it.
01:47:07.920 | Maybe coming back at some point,
01:47:10.080 | working on Optimus, working on AGI at Tesla.
01:47:13.440 | I think Tesla is going to do incredible things.
01:47:15.800 | It's basically like,
01:47:17.020 | it's a massive large-scale robotics kind of company
01:47:22.440 | with a ton of in-house talent
01:47:23.520 | for doing really incredible things.
01:47:25.040 | And I think humanoid robots are going to be amazing.
01:47:29.200 | I think autonomous transportation is going to be amazing.
01:47:31.880 | All this is happening at Tesla.
01:47:32.920 | So I think it's just a really amazing organization.
01:47:35.520 | So being part of it and helping it along,
01:47:37.080 | I think was very, basically I enjoyed that a lot.
01:47:39.920 | Yeah, it was basically difficult for those reasons
01:47:41.560 | because I love the company,
01:47:43.120 | but I'm happy to potentially at some point
01:47:45.400 | come back for Act Two.
01:47:46.800 | But I felt like at this stage, I built the team,
01:47:49.640 | it felt autonomous and I became a manager
01:47:53.200 | and I wanted to do a lot more technical stuff.
01:47:54.760 | I wanted to learn stuff, I wanted to teach stuff.
01:47:57.360 | And I just kind of felt like it was a good time
01:48:00.040 | for a change of pace a little bit.
01:48:01.600 | - What do you think is the best movie sequel of all time,
01:48:05.680 | speaking of Part Two?
01:48:07.160 | 'Cause most of them suck.
01:48:08.960 | - Movie sequels?
01:48:09.800 | - Movie sequels, yeah.
01:48:10.960 | And you tweeted about movies, so just in a tiny tangent,
01:48:14.320 | is there, what's your, what's like a favorite movie sequel?
01:48:18.440 | Godfather Part Two?
01:48:19.540 | Are you a fan of Godfather?
01:48:21.560 | 'Cause you didn't even tweet or mention the Godfather.
01:48:23.640 | - Yeah, I don't love that movie.
01:48:24.640 | I know it has a huge following.
01:48:25.480 | - We're gonna edit that out.
01:48:26.300 | We're gonna edit out the hate towards the Godfather.
01:48:28.840 | How dare you disrespect--
01:48:29.680 | - I think I will make a strong statement.
01:48:31.000 | I don't know why. - Oh, no.
01:48:32.160 | - I don't know why, but I basically don't like any movie
01:48:34.880 | before 1995, something like that.
01:48:38.240 | - Didn't you mention Terminator Two?
01:48:40.080 | - Okay, okay, that's like,
01:48:41.900 | Terminator Two was a little bit later, 1990.
01:48:45.600 | - No, I think Terminator Two was in the '80s.
01:48:47.440 | - And I like Terminator One as well.
01:48:48.840 | So, okay, so like few exceptions,
01:48:50.640 | but by and large, for some reason,
01:48:52.180 | I don't like movies before 1995 or something.
01:48:55.420 | They feel very slow.
01:48:56.840 | The camera is like zoomed out.
01:48:58.080 | It's boring.
01:48:58.960 | It's kind of naive.
01:49:00.080 | It's kind of weird.
01:49:00.920 | - And also, Terminator was very much ahead of its time.
01:49:03.960 | - Yes, and the Godfather, there's like no AGI.
01:49:06.700 | So-- (laughing)
01:49:08.960 | - I mean, but you have Good Will Hunting
01:49:12.120 | was one of the movies you mentioned,
01:49:13.920 | and that doesn't have any AGI either.
01:49:15.720 | I guess it has mathematics.
01:49:16.920 | - Yeah, I guess occasionally I do enjoy movies
01:49:19.040 | that don't feature--
01:49:20.560 | - Or like Anchorman, that has no, that's--
01:49:23.160 | - Anchorman is so good.
01:49:24.680 | - I don't understand, speaking of AGI,
01:49:28.400 | 'cause I don't understand why Will Ferrell is so funny.
01:49:31.960 | It doesn't make sense.
01:49:32.920 | It doesn't compute.
01:49:33.760 | There's just something about him.
01:49:35.400 | And he's a singular human,
01:49:37.120 | 'cause you don't get that many comedies these days.
01:49:39.920 | And I wonder if it has to do about the culture
01:49:42.440 | or like the machine of Hollywood,
01:49:44.400 | or does it have to do with just we got lucky
01:49:46.280 | with certain people in comedy that came together,
01:49:49.200 | 'cause he is a singular human.
01:49:50.360 | - Yeah, yeah, yeah, I like his movies.
01:49:53.200 | That was a ridiculous tangent, I apologize.
01:49:55.440 | But you mentioned humanoid robots,
01:49:57.320 | so what do you think about Optimus, about Tesla Bot?
01:50:00.600 | Do you think we'll have robots in the factory
01:50:02.480 | and in the home in 10, 20, 30, 40, 50 years?
01:50:05.960 | - Yeah, I think it's a very hard project.
01:50:07.760 | I think it's going to take a while.
01:50:09.160 | Who else is going to build humanoid robots at scale?
01:50:12.160 | And I think it is a very good form factor to go after,
01:50:14.240 | because like I mentioned,
01:50:15.600 | the world is designed for humanoid form factor.
01:50:17.760 | These things would be able to operate our machines.
01:50:19.760 | They would be able to sit down in chairs,
01:50:22.120 | potentially even drive cars.
01:50:24.200 | Basically, the world is designed for humans.
01:50:25.840 | That's the form factor you want to invest into
01:50:27.760 | and make work over time.
01:50:29.400 | I think there's another school of thought,
01:50:31.280 | which is, okay, pick a problem and design a robot to it.
01:50:34.320 | But actually designing a robot
01:50:35.440 | and getting a whole data engine
01:50:36.480 | and everything behind it to work
01:50:37.880 | is actually an incredibly hard problem.
01:50:39.800 | So it makes sense to go after general interfaces
01:50:41.880 | that, okay, they are not perfect for any one given task,
01:50:44.880 | but they actually have the generality
01:50:46.720 | of just with a prompt with English
01:50:48.680 | able to do something across.
01:50:51.280 | And so I think it makes a lot of sense
01:50:52.360 | to go after a general interface in the physical world.
01:50:57.360 | And I think it's a very difficult project.
01:51:00.040 | I think it's going to take time,
01:51:01.920 | but I've seen no other company
01:51:03.960 | that can execute on that vision.
01:51:04.960 | I think it's going to be amazing.
01:51:06.040 | Like basically physical labor.
01:51:08.360 | Like if you think transportation is a large market,
01:51:10.720 | try physical labor.
01:51:11.760 | (laughs)
01:51:12.600 | It's like insane.
01:51:14.200 | - But it's not just physical labor.
01:51:15.480 | To me, the thing that's also exciting is social robotics.
01:51:18.680 | So the relationship we'll have on different levels
01:51:21.640 | with those robots.
01:51:23.360 | That's why I was really excited to see Optimus.
01:51:26.360 | Like people have criticized me for the excitement,
01:51:30.920 | but I've worked with a lot of research labs
01:51:34.680 | that do humanoid legged robots,
01:51:40.000 | Boston Dynamics, Unitree,
01:51:40.000 | there's a lot of companies that do legged robots,
01:51:42.680 | but that's the elegance of the movement
01:51:47.880 | is a tiny, tiny part of the big picture.
01:51:51.640 | So integrating, the two big exciting things to me
01:51:54.320 | about Tesla doing humanoid or any legged robots
01:51:58.360 | is clearly integrating into the data engine.
01:52:03.000 | So the data engine aspect.
01:52:05.080 | So the actual intelligence for the perception
01:52:07.640 | and the control and the planning
01:52:09.680 | and all that kind of stuff,
01:52:10.760 | integrating into the fleet that you mentioned, right?
01:52:13.640 | And then speaking of fleet,
01:52:17.400 | the second thing is the mass manufacturers.
01:52:19.360 | Just knowing culturally driving towards a simple robot
01:52:24.360 | that's cheap to produce at scale
01:52:29.400 | and doing that well, having experience to do that well,
01:52:31.560 | that changes everything.
01:52:32.480 | That's a very different culture and style
01:52:35.600 | than Boston Dynamics, who by the way,
01:52:37.600 | those robots are just, the way they move,
01:52:41.400 | it'll be a very long time before Tesla can achieve
01:52:45.520 | the smoothness of movement.
01:52:47.200 | But that's not what it's about.
01:52:49.080 | It's about the entirety of the system,
01:52:52.280 | like we talked about the data engine and the fleet.
01:52:54.760 | That's super exciting.
01:52:55.720 | Even the initial sort of models,
01:52:58.240 | but that too was really surprising
01:53:00.560 | that in a few months you can get a prototype.
01:53:03.600 | - Yep, and the reason that happened very quickly
01:53:05.760 | is as you alluded to, there's a ton of copy paste
01:53:08.480 | from what's happening in the autopilot, a lot.
01:53:10.840 | The amount of expertise that came out of the woodwork
01:53:12.880 | at Tesla for building the humanoid robot was incredible to see.
01:53:16.120 | Like basically Elon said at one point we're doing this.
01:53:19.320 | And then next day, basically,
01:53:21.960 | like all these CAD models started to appear
01:53:23.960 | and people talking about like the supply chain
01:53:25.480 | and manufacturing and people showed up
01:53:27.920 | with like screwdrivers and everything like the other day
01:53:30.400 | and started to like put together the body.
01:53:32.120 | And I was like, whoa, like all these people exist at Tesla.
01:53:34.600 | And fundamentally building a car is actually
01:53:36.040 | not that different from building a robot.
01:53:37.640 | The same, and that is true,
01:53:39.560 | not just for the hardware pieces.
01:53:41.600 | And also let's not forget hardware, not just for a demo,
01:53:44.600 | but manufacturing of that hardware at scale
01:53:48.840 | is like a whole different thing.
01:53:50.280 | But for software as well,
01:53:52.200 | basically this robot currently thinks it's a car.
01:53:54.600 | (both laughing)
01:53:56.520 | - It's gonna have a midlife crisis at some point.
01:53:59.400 | - It thinks it's a car.
01:54:01.000 | Some of the earlier demos, actually,
01:54:02.360 | we were talking about potentially doing them outside
01:54:04.080 | in the parking lot because that's where all
01:54:05.400 | of the computer vision was like working out of the box
01:54:08.120 | instead of like inside.
01:54:10.560 | But all the operating system, everything just copy pastes.
01:54:14.520 | Computer vision, mostly copy pastes.
01:54:16.240 | I mean, you have to retrain the neural nets,
01:54:17.440 | but the approach and everything and data engine
01:54:19.040 | and offline trackers and the way we go
01:54:20.800 | about the occupancy tracker and so on, everything copy pastes.
01:54:23.560 | You just need to retrain the neural nets.
01:54:25.840 | And then the planning control, of course,
01:54:27.360 | has to change quite a bit.
01:54:28.600 | But there's a ton of copy paste
01:54:30.280 | from what's happening at Tesla.
01:54:31.520 | And so if you were to go with the goal of like,
01:54:34.240 | okay, let's build a million humanoid robots
01:54:35.840 | and you're not Tesla, that's a lot to ask.
01:54:38.560 | If you're Tesla, it's actually like, it's not that crazy.
01:54:42.800 | - And then the follow up question is then how difficult,
01:54:45.480 | just like with driving, how difficult
01:54:46.840 | is the manipulation task?
01:54:49.000 | Such that it can have an impact at scale.
01:54:51.040 | I think depending on the context,
01:54:53.640 | the really nice thing about robotics is that,
01:55:00.160 | unless you do manufacturing and that kind of stuff,
01:55:02.480 | there's more room for error.
01:55:02.480 | Driving is so safety critical and also time critical.
01:55:06.280 | Like a robot is allowed to move slower, which is nice.
01:55:09.960 | - Yes.
01:55:11.000 | I think it's going to take a long time,
01:55:12.560 | but the way you want to structure the development
01:55:14.440 | is you need to say, okay, it's going to take a long time.
01:55:16.600 | How can I set up the product development roadmap
01:55:20.680 | so that I'm making revenue along the way?
01:55:22.160 | I'm not setting myself up for a zero one loss function
01:55:24.440 | where it doesn't work until it works.
01:55:26.080 | You don't want to be in that position.
01:55:27.400 | You want to make it useful almost immediately.
01:55:29.560 | And then you want to slowly deploy it
01:55:31.800 | and generalize it at scale.
01:55:34.480 | And you want to set up your data engine,
01:55:35.680 | your improvement loops, the telemetry, the evaluation,
01:55:38.920 | the harness and everything.
01:55:41.200 | And you want to improve the product over time incrementally
01:55:43.320 | and you're making revenue along the way.
01:55:44.640 | That's extremely important because otherwise
01:55:46.720 | you cannot build these large undertakings
01:55:49.480 | just like don't make sense economically.
01:55:51.200 | And also from the point of view of the team working on it,
01:55:53.240 | they need the dopamine along the way.
01:55:55.120 | They're not just going to make a promise
01:55:57.160 | about this being useful.
01:55:58.640 | This is going to change the world in 10 years when it works.
01:56:01.080 | This is not where you want to be.
01:56:02.280 | You want to be in a place like I think Autopilot is today
01:56:04.600 | where it's offering increased safety
01:56:06.640 | and convenience of driving today.
01:56:10.000 | People pay for it, people like it, people purchase it.
01:56:12.800 | And then you also have the greater mission
01:56:14.480 | that you're working towards.
01:56:16.360 | - And you see that, so the dopamine for the team,
01:56:19.040 | that was a source of happiness.
01:56:20.680 | - Yes, 100%.
01:56:21.760 | You're deploying this, people like it, people drive it,
01:56:23.760 | people pay for it, they care about it.
01:56:25.680 | There's all these YouTube videos.
01:56:27.040 | Your grandma drives it, she gives you feedback.
01:56:29.200 | People like it, people engage with it.
01:56:30.640 | You engage with it, huge.
01:56:32.280 | - Do people that drive Teslas like recognize you
01:56:34.760 | and give you love?
01:56:36.640 | Like, "Hey, thanks for this nice feature that it's doing."
01:56:41.640 | - Yeah, I think the tricky thing is like,
01:56:43.800 | some people really love you,
01:56:44.760 | some people unfortunately like,
01:56:46.240 | you're working on something that you think
01:56:47.200 | is extremely valuable, useful, et cetera.
01:56:48.960 | Some people do hate you.
01:56:50.320 | There's a lot of people who hate me and the team
01:56:53.000 | and the whole project.
01:56:55.360 | And I think-- - Are they Tesla drivers?
01:56:57.800 | - Many cases they're not, actually.
01:56:59.760 | - Yeah, that actually makes me sad about humans
01:57:02.760 | or the current ways that humans interact.
01:57:06.440 | I think that's actually fixable.
01:57:07.760 | I think humans want to be good to each other.
01:57:09.480 | I think Twitter and social media are part of the mechanism
01:57:12.360 | that actually somehow makes the negativity more viral
01:57:16.320 | than it deserves, like it disproportionately adds
01:57:20.280 | a viral boost to the negativity.
01:57:23.640 | But I wish people would just get excited about,
01:57:26.360 | so suppress some of the jealousy, some of the ego,
01:57:30.520 | and just get excited for others.
01:57:32.200 | And then there's a karma aspect to that.
01:57:34.440 | You get excited for others, they'll get excited for you.
01:57:36.640 | Same thing in academia.
01:57:38.120 | If you're not careful, there is like a dynamical system
01:57:40.600 | there: if you think in silos and get jealous
01:57:44.200 | of somebody else being successful,
01:57:46.080 | that actually, perhaps counterintuitively, leads
01:57:50.280 | to less productivity for you as a community
01:57:52.280 | and for you individually.
01:57:53.600 | I feel like if you keep celebrating others,
01:57:56.760 | that actually makes you more successful.
01:57:59.800 | - I think people have, depending on the industry,
01:58:02.440 | haven't quite learned that yet.
01:58:03.600 | - Yeah.
01:58:04.440 | Some people are also very negative and very vocal.
01:58:06.160 | So they're very prominently featured,
01:58:07.680 | but actually there's a ton of people who are cheerleaders,
01:58:10.680 | but they're silent cheerleaders.
01:58:12.440 | And when you talk to people just in the world,
01:58:15.840 | they will tell you, "Oh, it's amazing, it's great."
01:58:17.560 | Especially like people who understand how difficult it is
01:58:19.440 | to get this stuff working.
01:58:20.400 | Like people who have built products
01:58:21.680 | and makers, entrepreneurs, like making this work
01:58:25.040 | and changing something is incredibly hard.
01:58:28.600 | Those people are more likely to cheerlead you.
01:58:31.080 | - Well, one of the things that makes me sad
01:58:32.600 | is some folks in the robotics community
01:58:35.120 | don't do the cheerleading and they should.
01:58:37.880 | 'Cause they know how difficult it is.
01:58:39.160 | Well, they actually sometimes don't know how difficult
01:58:41.080 | it is to create a product that's scale, right?
01:58:43.360 | To actually deploy it in the real world.
01:58:45.200 | A lot of the development of robots and AI system
01:58:50.000 | is done on very specific small benchmarks
01:58:52.640 | and as opposed to real world conditions.
01:58:55.600 | - Yes.
01:58:57.160 | - Yeah, I think it's really hard to work on robotics
01:58:58.960 | in an academic setting.
01:59:00.000 | - Or AI systems that apply in the real world.
01:59:02.000 | You've criticized, you flourished and loved for a time
01:59:07.000 | the ImageNet, the famed ImageNet dataset.
01:59:10.960 | And I've recently had some words of criticism
01:59:14.680 | that the academic research ML community
01:59:18.600 | gives a little too much love still to the ImageNet
01:59:21.160 | or like those kinds of benchmarks.
01:59:23.800 | Can you speak to the strengths and weaknesses of datasets
01:59:27.000 | used in machine learning research?
01:59:29.200 | - Actually, I don't know that I recall a specific instance
01:59:32.320 | where I was unhappy or criticizing ImageNet.
01:59:35.680 | I think ImageNet has been extremely valuable.
01:59:37.920 | It was basically a benchmark that allowed
01:59:41.520 | the deep learning community to demonstrate
01:59:44.000 | that deep neural networks actually work.
01:59:46.000 | There's a massive value in that.
01:59:49.680 | So I think ImageNet was useful,
01:59:51.280 | but basically it's become a bit of an MNIST at this point.
01:59:54.240 | So MNIST is like little 28 by 28 grayscale digits.
01:59:57.720 | It's kind of a joke dataset that everyone like crushes.
02:00:00.640 | - There's still papers written on MNIST though, right?
02:00:02.880 | - Maybe there shouldn't be. - Like strong papers.
02:00:04.840 | Like papers that focus on like,
02:00:06.960 | how do we learn with a small amount of data,
02:00:08.960 | that kind of stuff.
02:00:09.800 | - Yeah, I could see that being helpful,
02:00:10.640 | but not in sort of like mainline
02:00:11.880 | computer vision research anymore, of course.
02:00:13.520 | - I think the way I've heard you somewhere,
02:00:15.400 | maybe I'm just imagining things,
02:00:17.200 | but I think you said like ImageNet was a huge contribution
02:00:19.800 | to the community for a long time
02:00:21.040 | and now it's time to move past those kinds of-
02:00:23.080 | - Well, ImageNet has been crushed.
02:00:24.200 | I mean, the error rates are,
02:00:25.960 | yeah, we're getting like 90% accuracy
02:00:30.680 | in 1,000-way classification prediction.
02:00:34.840 | And I've seen those images and it's like really hard.
02:00:38.800 | That's really good.
02:00:40.280 | If I remember correctly,
02:00:41.120 | the top five error rate is now like 1% or something.
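
(For readers who want the two numbers above made concrete: a minimal sketch, in Python/PyTorch with made-up tensors, of how top-1 and top-5 accuracy are typically computed for a 1,000-way classifier. Top-5 error is just one minus top-5 accuracy; nothing below is from the conversation itself.)

```python
import torch

def topk_accuracy(logits, labels, ks=(1, 5)):
    """Fraction of examples whose true label is among the top-k predictions."""
    # logits: (N, 1000) class scores; labels: (N,) ground-truth class ids
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1)          # (N, maxk) predicted class ids
    hits = pred.eq(labels.unsqueeze(1))         # (N, maxk) boolean matches
    return {k: hits[:, :k].any(dim=1).float().mean().item() for k in ks}

# Toy usage with random numbers, just to show the shapes (not real ImageNet results):
logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
print(topk_accuracy(logits, labels))
```
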
02:00:44.600 | - Given your experience with a gigantic real world dataset,
02:00:47.840 | would you like to see benchmarks move
02:00:49.680 | in certain directions that the research community uses?
02:00:52.720 | - Unfortunately, I don't think academics currently
02:00:54.240 | have the next ImageNet.
02:00:55.920 | We've obviously, I think we've crushed MNIST.
02:00:57.720 | We've basically kind of crushed ImageNet
02:01:00.200 | and there's no next sort of big benchmark
02:01:02.800 | that the entire community rallies behind
02:01:04.960 | and uses for further development of these networks.
02:01:09.320 | - I wonder what it takes for a dataset
02:01:11.000 | to captivate the imagination of everybody,
02:01:13.280 | like where they all get behind it.
02:01:15.040 | That could also need like a leader, right?
02:01:19.320 | Somebody with popularity.
02:01:20.520 | I mean, yeah, why did ImageNet take off?
02:01:23.320 | Is it just the accident of history?
02:01:26.480 | - It was the right amount of difficult.
02:01:28.440 | It was the right amount of difficult
02:01:30.720 | and simple and interesting enough.
02:01:33.120 | It just kind of like, it was the right time
02:01:35.320 | for that kind of a dataset.
02:01:36.680 | - Question from Reddit.
02:01:38.720 | What are your thoughts on the role
02:01:42.360 | that synthetic data and game engines will play
02:01:44.720 | in the future of neural net model development?
02:01:48.280 | - I think as neural nets converge to humans,
02:01:52.320 | the value of simulation to neural nets
02:01:55.760 | will be similar to the value of simulation to humans.
02:01:58.200 | So people use simulation for,
02:02:01.480 | people use simulation because they can learn something
02:02:04.280 | in that kind of a system
02:02:05.480 | and without having to actually experience it.
02:02:09.240 | - But are you referring to the simulation
02:02:10.880 | we do in our head?
02:02:12.280 | - No, sorry, simulation, I mean like video games
02:02:14.520 | or other forms of simulation for various professionals.
02:02:19.520 | - So let me push back on that
02:02:21.400 | 'cause maybe there's simulation that we do in our heads.
02:02:23.920 | Like simulate, if I do this, what do I think will happen?
02:02:28.720 | - Okay, that's like internal simulation.
02:02:30.160 | - Yeah, internal.
02:02:31.000 | Isn't that what we're doing as humans before we act?
02:02:33.400 | - Oh yeah, but that's independent
02:02:34.240 | from like the use of simulation
02:02:35.840 | in the sense of like computer games
02:02:37.160 | or using simulation for training set creation or-
02:02:40.240 | - Is it independent or is it just loosely correlated?
02:02:42.840 | 'Cause like, isn't that useful to do like counterfactual
02:02:47.400 | or like edge case simulation to like,
02:02:51.320 | what happens if there's a nuclear war?
02:02:54.960 | What happens if there's, like those kinds of things?
02:02:58.400 | - Yeah, that's a different simulation from like Unreal Engine.
02:03:00.560 | That's how I interpreted the question.
02:03:02.320 | - Ah, so like simulation of the average case.
02:03:05.480 | Is that, what's Unreal Engine?
02:03:08.520 | What do you mean by Unreal Engine?
02:03:11.720 | So simulating a world, the physics of that world,
02:03:16.320 | why is that different?
02:03:18.520 | Like, 'cause you also can add behavior to that world
02:03:22.000 | and you could try all kinds of stuff, right?
02:03:24.840 | You could throw all kinds of weird things into it.
02:03:26.960 | So Unreal Engine is not just about simulating,
02:03:29.680 | I mean, I guess it is about simulating
02:03:31.480 | the physics of the world.
02:03:32.320 | It's also doing something with that.
02:03:35.360 | - Yeah, the graphics, the physics,
02:03:36.440 | and the agents that you put into the environment
02:03:38.600 | and stuff like that, yeah.
02:03:39.520 | - See, I think you, I feel like you said
02:03:41.320 | that it's not that important, I guess,
02:03:43.680 | for the future of AI development.
02:03:46.160 | Is that correct to interpret it that way?
02:03:48.280 | - I think humans use simulators for,
02:03:53.280 | humans use simulators and they find them useful.
02:03:55.240 | And so computers will use simulators and find them useful.
02:03:58.240 | - Okay, so you're saying it's not,
02:04:00.280 | I don't use simulators very often.
02:04:02.080 | I play a video game every once in a while,
02:04:03.640 | but I don't think I derive any wisdom
02:04:05.840 | about my own existence from those video games.
02:04:09.280 | It's a momentary escape from reality
02:04:11.680 | versus a source of wisdom about reality.
02:04:14.840 | So I think that's a very polite way
02:04:17.160 | of saying simulation is not that useful.
02:04:19.240 | - Yeah, maybe not.
02:04:21.120 | I don't see it as a fundamental,
02:04:22.960 | really important part of training neural nets currently.
02:04:26.960 | But I think as neural nets become more and more powerful,
02:04:29.840 | I think you will need fewer examples
02:04:32.400 | to train additional behaviors.
02:04:34.400 | And with simulation, of course,
02:04:36.560 | there's a domain gap: a simulation
02:04:38.120 | is not the real world,
02:04:39.120 | it's slightly something different.
02:04:40.520 | But with a powerful enough neural net,
02:04:42.920 | the domain gap can be bigger, I think,
02:04:45.720 | because the neural net will sort of understand
02:04:47.360 | that even though it's not the real world,
02:04:48.720 | it has all this high level structure
02:04:50.480 | that I'm supposed to be able to learn from.
02:04:52.280 | - So the neural net will actually, yeah,
02:04:54.920 | it will be able to leverage the synthetic data better
02:04:59.320 | by closing the gap,
02:05:00.440 | but understanding in which ways this is not real data.
02:05:04.240 | - Exactly.
02:05:05.080 | - Reddit, do better questions next time.
02:05:08.040 | That was a question, I'm just kidding.
02:05:10.600 | All right.
02:05:11.440 | So is it possible, do you think, speaking of MNIST,
02:05:17.280 | to construct neural nets and training processes
02:05:19.720 | that require very little data?
02:05:21.360 | So we've been talking about huge data sets
02:05:25.000 | like the Internet for training.
02:05:26.680 | I mean, one way to say that is like you said,
02:05:28.440 | like the querying itself is another level of training,
02:05:31.600 | I guess, and that requires little data.
02:05:34.200 | But do you see any value in doing research
02:05:38.920 | and kind of going down the direction of,
02:05:41.760 | can we use very little data to train,
02:05:44.160 | to construct a knowledge base?
02:05:45.360 | - 100%.
02:05:46.200 | I just think like at some point you need a massive data set.
02:05:49.040 | And then when you pre-train your massive neural net
02:05:51.120 | and get something that is like a GPT or something,
02:05:54.280 | then you're able to be very efficient
02:05:56.760 | at training any arbitrary new task.
02:05:58.700 | So a lot of these GPTs, you can do tasks
02:06:02.320 | like sentiment analysis or translation or so on,
02:06:04.880 | just by being prompted with very few examples.
02:06:06.960 | Here's the kind of thing I want you to do.
02:06:08.120 | Like here's an input sentence,
02:06:09.720 | here's the translation into German.
02:06:11.240 | Input sentence, translation to German.
02:06:12.840 | Input sentence, blank,
02:06:14.560 | and the neural net will complete the translation to German
02:06:16.720 | just by looking at sort of the example you've provided.
02:06:19.920 | And so that's an example of a very few-shot learning
02:06:23.520 | in the activations of the neural net
02:06:24.880 | instead of the weights of the neural net.
02:06:26.400 | And so I think basically just like humans,
02:06:29.960 | neural nets will become very data efficient
02:06:31.640 | at learning any other new task.
02:06:33.760 | But at some point you need a massive data set
02:06:35.560 | to pre-train your network.
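
(A rough sketch of the few-shot prompting pattern described above. The prompt text is illustrative and `complete` is a hypothetical placeholder for whatever language-model API is available; the point is that the "learning" happens in the activations during a single forward pass, not in the weights.)

```python
# Sketch of in-context (few-shot) learning: the model is shown the pattern
# it is expected to continue, with no gradient update anywhere.
prompt = """Translate English to German.

English: The weather is nice today.
German: Das Wetter ist heute schön.

English: Where is the train station?
German: Wo ist der Bahnhof?

English: I would like a cup of coffee.
German:"""

def complete(text: str) -> str:
    # Placeholder for any autoregressive language-model call (e.g. an HTTP
    # request to a hosted model); it should simply continue the text.
    raise NotImplementedError("plug in your language model here")

# translation = complete(prompt)  # expected to continue the pattern, e.g.
#                                 # "Ich hätte gerne eine Tasse Kaffee."
```
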
02:06:36.880 | - To get that.
02:06:38.880 | And probably we humans have something like that.
02:06:41.560 | Do we have something like that?
02:06:42.920 | Do we have a passive in the background,
02:06:47.400 | background model constructing thing
02:06:50.520 | that just runs all the time in a self-supervised way?
02:06:52.960 | We're not conscious of it?
02:06:54.240 | - I think humans definitely.
02:06:55.240 | I mean, obviously we learn a lot during our lifespan,
02:06:59.800 | but also we have a ton of hardware
02:07:02.080 | that helps us at initialization coming from sort of evolution.
02:07:06.160 | And so I think that's also a really big component.
02:07:08.800 | A lot of people in the field,
02:07:09.760 | I think they just talk about the amount of, like, seconds
02:07:12.000 | that a person has lived, pretending
02:07:14.480 | that this is a tabula rasa,
02:07:16.160 | sort of like a zero initialization of a neural net.
02:07:18.800 | And it's not.
02:07:20.120 | You can look at a lot of animals,
02:07:21.080 | like for example, zebras.
02:07:22.600 | Zebras get born and they see and they can run.
02:07:27.000 | There's zero training data in their lifespan.
02:07:29.360 | They can just do that.
02:07:30.560 | So somehow, I have no idea how evolution has found a way
02:07:33.720 | to encode these algorithms
02:07:35.160 | and these neural net initializations
02:07:36.480 | that are extremely good into ATCGs.
02:07:38.800 | And I have no idea how this works,
02:07:40.040 | but apparently it's possible
02:07:41.120 | because here's a proof by existence.
02:07:44.200 | - There's something magical about going from a single cell
02:07:48.000 | to an organism that is born to the first few years of life.
02:07:51.600 | I kind of like the idea
02:07:52.560 | that the reason we don't remember anything
02:07:54.400 | about the first few years of our life
02:07:57.040 | is that it's a really painful process.
02:07:59.480 | Like it's a very difficult, challenging training process.
02:08:03.240 | - Yeah.
02:08:04.080 | - Like intellectually.
02:08:05.540 | And maybe, yeah, I mean, I don't,
02:08:09.480 | why don't we remember any of that?
02:08:11.760 | There might be some crazy training going on.
02:08:14.200 | Maybe that's the background model training
02:08:19.440 | that is very painful.
02:08:23.080 | And so it's best for the system once it's trained
02:08:25.720 | not to remember how it's constructed.
02:08:27.760 | - I think it's just like the hardware for long-term memory
02:08:30.120 | is just not fully developed.
02:08:31.920 | I kind of feel like the first few years of infants
02:08:35.720 | is not actually like learning, it's brain maturing.
02:08:38.200 | We're born premature.
02:08:40.540 | There's a theory along those lines
02:08:43.040 | because of the birth canal and the swelling of the brain.
02:08:45.440 | And so we're born premature
02:08:46.680 | and then the first few years,
02:08:47.640 | we're just, the brain's maturing.
02:08:49.520 | And then there's some learning eventually.
02:08:51.620 | That's my current view on it.
02:08:53.680 | - What do you think,
02:08:55.680 | do you think neural nets can have long-term memory?
02:08:58.740 | Like that approach is something like humans.
02:09:02.000 | Do you think there needs to be another meta architecture
02:09:04.960 | on top of it to add something like a knowledge base
02:09:07.640 | that learns facts about the world
02:09:09.360 | and all that kind of stuff?
02:09:10.560 | - Yes, but I don't know to what extent
02:09:12.600 | it will be explicitly constructed.
02:09:14.740 | It might take unintuitive forms
02:09:17.480 | where you are telling the GPT, like,
02:09:20.120 | hey, you have a declarative memory bank
02:09:22.840 | to which you can store and retrieve data from.
02:09:25.120 | And whenever you encounter some information
02:09:26.900 | that you find useful, just save it to your memory bank.
02:09:29.680 | And here's an example of something you have retrieved
02:09:32.120 | and how you say it, and here's how you load from it.
02:09:34.400 | You just say, load, whatever,
02:09:36.800 | you teach it in text in English.
02:09:39.160 | And then it might learn to use a memory bank from that.
02:09:42.400 | - Oh, so the neural net is the architecture
02:09:45.360 | for the background model, the base thing.
02:09:48.200 | And then everything else is just on top of it.
02:09:50.120 | That's pretty easy to do. - It's not just text, right?
02:09:51.640 | You're giving it gadgets and gizmos.
02:09:52.960 | So you're teaching some kind of a special language
02:09:55.920 | by which it can save arbitrary information
02:09:58.160 | and retrieve it at a later time.
02:09:59.720 | And you're telling it about these special tokens
02:10:01.720 | and how to arrange them to use these interfaces.
02:10:04.760 | It's like, hey, you can use a calculator.
02:10:06.400 | Here's how you use it.
02:10:07.240 | Just do five, three plus four, one equals.
02:10:10.240 | And when equals is there,
02:10:12.640 | a calculator will actually read out the answer
02:10:14.560 | and you don't have to calculate it yourself.
02:10:16.240 | And you just tell it in English, this might actually work.
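
(A toy sketch of that kind of text-level tool use: the model emits an agreed-upon pattern, and a wrapper intercepts it, runs the tool, and splices the result back into the context. The `CALC`/`SAVE`/`LOAD` conventions below, and the reading of "five, three plus four, one equals" as the digit tokens of "53 + 41 =", are assumptions made purely for illustration.)

```python
import re

# Hypothetical conventions taught to the model purely through its prompt:
#   CALC( 53 + 41 ) =   -> the wrapper evaluates the expression and splices in "94"
#   SAVE[key]: value    -> the wrapper stores the value in a memory bank
#   LOAD[key]           -> the wrapper substitutes the stored value
memory_bank = {}

def run_tools(model_output: str) -> str:
    # Calculator: evaluate whatever sits between CALC( ... ) =
    def calc(match):
        return str(eval(match.group(1), {"__builtins__": {}}))  # toy eval, not safe in general
    text = re.sub(r"CALC\((.+?)\)\s*=", calc, model_output)

    # Declarative memory: store and retrieve arbitrary strings
    for key, value in re.findall(r"SAVE\[(.+?)\]:\s*(.+)", text):
        memory_bank[key] = value.strip()
    return re.sub(r"LOAD\[(.+?)\]", lambda m: memory_bank.get(m.group(1), ""), text)

print(run_tools("The total is CALC( 53 + 41 ) = dollars."))
# -> "The total is 94 dollars."
```
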
02:10:19.600 | - Do you think in that sense, Gato is interesting,
02:10:21.840 | the DeepMind system that it's not just doing language,
02:10:24.760 | but actually throws it all in the same pile,
02:10:28.680 | images, actions, all that kind of stuff.
02:10:31.720 | That's basically what we're moving towards.
02:10:34.160 | - Yeah, I think so.
02:10:35.000 | So Gato is very much a kitchen sink approach
02:10:38.360 | to reinforcement learning lots of different environments
02:10:42.440 | with a single fixed transformer model.
02:10:46.640 | I think it's a very sort of early result in that realm.
02:10:49.880 | But I think, yeah, it's along the lines
02:10:51.520 | of what I think things will eventually look like.
02:10:53.440 | - Right, so this is the early days of a system
02:10:55.920 | that eventually will look like this,
02:10:57.120 | like from a Rich Sutton perspective.
02:10:59.920 | - Yeah, I'm not super huge fan of, I think,
02:11:01.560 | all these interfaces that look very different.
02:11:04.840 | I would want everything to be normalized into the same API.
02:11:07.360 | So for example, screen pixels, very same API.
02:11:10.160 | Instead of having different world environments
02:11:11.920 | that have very different physics and joint configurations
02:11:14.120 | and appearances and whatever,
02:11:15.600 | and you're having some kind of special tokens
02:11:17.160 | for different games that you can plug.
02:11:19.520 | I'd rather just normalize everything to a single interface
02:11:22.520 | so it looks the same to the neural net,
02:11:24.200 | if that makes sense.
02:11:25.040 | - So it's all gonna be pixel-based pong in the end.
02:11:27.600 | - I think so.
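
(One way to picture the "single interface" idea: an adapter layer that renders every environment down to the same observation and action types. The sketch below is a hypothetical illustration, not Gato's actual design or any existing framework.)

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Step:
    pixels: np.ndarray   # (H, W, 3) uint8 screen; same shape for every environment
    reward: float
    done: bool

class EnvAdapter:
    """Wrap any environment (a game, a robot sim, a web page) behind one interface."""
    def reset(self) -> Step:
        raise NotImplementedError
    def act(self, action_tokens: list[int]) -> Step:
        raise NotImplementedError

# The agent then only ever sees (pixels in, action tokens out), whether the thing
# underneath is Pong, a driving simulator, or a desktop with a keyboard and mouse.
```
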
02:11:28.440 | - Okay, let me ask you about your own personal life.
02:11:34.880 | A lot of people wanna know,
02:11:36.760 | you're one of the most productive and brilliant people
02:11:39.240 | in the history of AI.
02:11:40.280 | What does a productive day
02:11:41.640 | in the life of Andrej Karpathy look like?
02:11:44.440 | What time do you wake up?
02:11:45.920 | 'Cause imagine some kind of dance
02:11:48.920 | between the average productive day
02:11:50.640 | and a perfect productive day.
02:11:51.920 | So the perfect productive day is the thing we strive towards
02:11:55.360 | and the average is kind of what it kind of converges to,
02:11:58.120 | given all the mistakes and human eventualities and so on.
02:12:01.760 | So what time do you wake up?
02:12:03.040 | Are you a morning person?
02:12:04.400 | - I'm not a morning person.
02:12:05.520 | I'm a night owl, for sure.
02:12:07.280 | - Is it stable or not?
02:12:08.920 | - It's semi-stable, like eight or nine
02:12:11.200 | or something like that.
02:12:12.560 | During my PhD, it was even later.
02:12:14.480 | I used to go to sleep usually at 3 a.m.
02:12:16.680 | I think the a.m. hours are precious
02:12:19.920 | and very interesting time to work
02:12:21.120 | because everyone is asleep.
02:12:23.120 | At 8 a.m. or 7 a.m., the East Coast is awake.
02:12:26.280 | So there's already activity,
02:12:27.400 | there's already some text messages, whatever.
02:12:29.200 | There's stuff happening.
02:12:30.040 | You can go on some news website
02:12:31.880 | and there's stuff happening and distracting.
02:12:34.160 | At 3 a.m., everything is totally quiet.
02:12:36.560 | And so you're not gonna be bothered
02:12:37.800 | and you have solid chunks of time to do work.
02:12:42.120 | So I like those periods, night owl by default.
02:12:45.160 | And then I think productive time, basically,
02:12:47.360 | what I like to do is you need to build some momentum
02:12:50.880 | on a problem without too much distraction.
02:12:53.560 | And you need to load your RAM,
02:12:56.920 | your working memory with that problem.
02:13:00.320 | And then you need to be obsessed with it
02:13:01.640 | when you're taking a shower, when you're falling asleep.
02:13:03.960 | You need to be obsessed with the problem
02:13:05.280 | and it's fully in your memory
02:13:06.520 | and you're ready to wake up and work on it right there.
02:13:08.880 | - So is this in a scale, temporal scale of a single day
02:13:13.320 | or a couple of days, a week, a month?
02:13:15.080 | - So I can't talk about one day basically in isolation
02:13:17.560 | because it's a whole process.
02:13:19.400 | When I wanna get productive in the problem,
02:13:21.480 | I feel like I need a span of a few days
02:13:23.900 | where I can really get in on that problem.
02:13:26.520 | And I don't wanna be interrupted
02:13:27.560 | and I'm going to just be completely obsessed
02:13:30.120 | with that problem.
02:13:34.080 | And that's where I do most of my good work.
02:13:34.080 | - You've done a bunch of cool little projects
02:13:36.440 | in a very short amount of time, very quickly.
02:13:38.440 | So that requires you just focusing on it.
02:13:40.880 | - Yeah, basically I need to load my working memory
02:13:42.480 | with the problem and I need to be productive
02:13:44.440 | because there's always a huge fixed cost
02:13:46.320 | to approaching any problem.
02:13:47.720 | I was struggling with this, for example, at Tesla
02:13:51.160 | because I want to work on a small side project,
02:13:53.560 | but okay, you first need to figure out,
02:13:54.980 | oh, okay, I need to SSH into my cluster.
02:13:56.520 | I need to bring up a VS Code editor
02:13:58.080 | so I can work on this.
02:13:59.320 | I ran into some stupid error because of some reason.
02:14:03.240 | Like you're not at a point
02:14:04.080 | where you can be just productive right away.
02:14:05.680 | You are facing barriers.
02:14:07.560 | And so it's about really removing all of that barrier
02:14:11.480 | and you're able to go into the problem
02:14:12.880 | and you have the full problem loaded in your memory.
02:14:15.400 | - And somehow avoiding distractions of all different forms,
02:14:18.240 | like news stories, emails, but also distractions
02:14:23.200 | from other interesting projects
02:14:24.840 | that you previously worked on
02:14:26.200 | or currently working on and so on.
02:14:28.040 | You just want to really focus your mind.
02:14:29.800 | - And I mean, I can take some time off for distractions
02:14:32.080 | and in between, but I think it can't be too much.
02:14:35.400 | Most of your day is sort of like spent on that problem.
02:14:38.040 | And then, you know, I drink coffee,
02:14:41.080 | I have my morning routine, I look at some news,
02:14:43.920 | Twitter, Hacker News, Wall Street Journal, et cetera.
02:14:46.680 | So it's great.
02:14:47.520 | - So basically you wake up, you have some coffee,
02:14:49.440 | are you trying to get to work as quickly as possible?
02:14:51.400 | Or do you take in this diet
02:14:53.000 | of like what the hell is happening in the world first?
02:14:56.480 | - I am, I do find it interesting to know about the world.
02:14:59.680 | I don't know that it's useful or good,
02:15:01.960 | but it is part of my routine right now.
02:15:03.600 | So I do read through a bunch of news articles
02:15:05.320 | and I want to be informed and I'm suspicious of it.
02:15:09.920 | I'm suspicious of the practice,
02:15:10.980 | but currently that's where I am.
02:15:12.320 | - Oh, you mean suspicious about the positive effect
02:15:15.840 | of that practice on your productivity
02:15:18.120 | and your wellbeing?
02:15:19.320 | - My wellbeing psychologically, yeah.
02:15:21.080 | - And also on your ability to deeply understand the world
02:15:23.600 | because there's a bunch of sources of information,
02:15:26.520 | you're not really focused on deeply integrating it.
02:15:28.520 | - Yeah, it's a little distracting.
02:15:30.840 | - In terms of a perfectly productive day
02:15:33.240 | for how long of a stretch of time in one session
02:15:37.400 | do you try to work and focus on a thing?
02:15:39.720 | Is it a couple hours, is it one hour,
02:15:41.160 | is it 30 minutes, is it 10 minutes?
02:15:43.560 | - I can probably go like a small few hours
02:15:45.360 | and then I need some breaks in between
02:15:47.160 | for like food and stuff.
02:15:48.600 | And yeah, but I think like it's still really hard
02:15:52.480 | to accumulate hours.
02:15:53.560 | I was using a tracker that told me exactly
02:15:55.400 | how much time I spent coding any one day.
02:15:57.080 | And even on a very productive day,
02:15:58.680 | I still spent only like six or eight hours.
02:16:01.640 | And it's just because there's so much padding,
02:16:03.480 | commute, talking to people, food, et cetera.
02:16:07.240 | There's like a cost of life, just living and sustaining
02:16:11.000 | and homeostasis and just maintaining yourself as a human
02:16:14.360 | is very high.
02:16:15.900 | - And there seems to be a desire within the human mind
02:16:19.960 | to participate in society that creates that padding.
02:16:23.640 | 'Cause the most productive days I've ever had
02:16:26.520 | is just completely from start to finish,
02:16:28.280 | just tuning out everything and just sitting there.
02:16:31.280 | - And then you could do more than six and eight hours.
02:16:34.120 | Is there some wisdom about what gives you strength
02:16:36.520 | to do like tough days of long focus?
02:16:39.640 | - Yeah, just like whenever I get obsessed about a problem,
02:16:43.040 | something just needs to work, something just needs to exist.
02:16:45.280 | - It needs to exist.
02:16:47.040 | So you're able to deal with bugs and programming issues
02:16:49.880 | and technical issues and design decisions
02:16:53.000 | that turn out to be the wrong ones.
02:16:54.240 | You're able to think through all of that,
02:16:55.680 | given that you want a thing to exist.
02:16:57.840 | - Yeah, it needs to exist.
02:16:58.680 | And then I think to me also a big factor is,
02:17:01.240 | are other humans are going to appreciate it?
02:17:02.960 | Are they going to like it?
02:17:04.120 | That's a big part of my motivation.
02:17:05.400 | If I'm helping humans and they seem happy,
02:17:07.900 | they say nice things, they tweet about it or whatever,
02:17:11.560 | that gives me pleasure because I'm doing something useful.
02:17:13.840 | - So like you do see yourself sharing it with the world,
02:17:16.960 | like whether it's on GitHub,
02:17:17.960 | whether it's a blog post or through videos.
02:17:19.800 | - Yeah, I was thinking about it.
02:17:20.640 | Like suppose I did all these things but did not share them,
02:17:22.960 | I don't think I would have the same amount of motivation
02:17:24.720 | that I can build up.
02:17:25.560 | - You enjoy the feeling of other people gaining value
02:17:30.840 | and happiness from the stuff you've created.
02:17:33.080 | - Yeah.
02:17:34.280 | - What about diet?
02:17:35.560 | I saw you played with intermittent fasting.
02:17:38.360 | Do you fast?
02:17:39.200 | Does that help?
02:17:40.040 | - I play with everything.
02:17:40.860 | (laughs)
02:17:41.700 | - With the things you play,
02:17:42.640 | what's been most beneficial to your ability
02:17:45.840 | to mentally focus on a thing?
02:17:47.360 | And just mental productivity and happiness.
02:17:50.800 | You still fast?
02:17:51.640 | - Yeah, I still fast, but I do intermittent fasting.
02:17:54.200 | But really what it means at the end of the day
02:17:55.640 | is I skip breakfast.
02:17:56.880 | So I do 18/6 roughly by default
02:17:59.720 | when I'm in my steady state.
02:18:01.120 | If I'm traveling or doing something else,
02:18:02.480 | I will break the rules.
02:18:03.540 | But in my steady state, I do 18/6.
02:18:06.000 | So I eat only from 12 to six.
02:18:08.220 | Not a hard rule and I break it often,
02:18:09.560 | but that's my default.
02:18:10.880 | And then, yeah, I've done a bunch of random experiments.
02:18:13.880 | For the most part right now,
02:18:15.400 | where I've been for the last year and a half,
02:18:17.080 | I wanna say, is I'm plant-based or plant-forward.
02:18:20.920 | I heard plant-forward, it sounds better.
02:18:22.520 | - That's what I mean, exactly.
02:18:23.360 | - I don't actually know what the difference is,
02:18:24.200 | but it sounds better in my mind.
02:18:25.680 | But it just means I prefer plant-based food.
02:18:28.880 | - Raw or cooked or?
02:18:30.600 | - I prefer cooked and plant-based.
02:18:33.080 | - So plant-based, forgive me,
02:18:35.860 | I don't actually know how wide the category of plant entails.
02:18:40.800 | - Well, plant-based just means that you're not
02:18:42.360 | - Like a chickpea? - Like a chicken about it
02:18:43.720 | and you can flex.
02:18:45.000 | And you just prefer to eat plants.
02:18:47.560 | And you're not making,
02:18:49.040 | you're not trying to influence other people.
02:18:50.960 | And if someone is, you come to someone's house party
02:18:53.000 | and they serve you a steak that they're really proud of,
02:18:54.880 | you will eat it.
02:18:55.720 | - Yes, right.
02:18:56.760 | So you're not judgmental.
02:18:57.600 | Oh, that's beautiful.
02:18:58.420 | I mean, that's, I'm the flip side of that,
02:19:00.760 | but I'm very sort of flexible.
02:19:02.760 | Have you tried doing one meal a day?
02:19:05.040 | - I have, accidentally, not consistently.
02:19:08.520 | But I've accidentally had that.
02:19:09.560 | I don't like it.
02:19:10.680 | I think it makes me feel not good.
02:19:12.600 | It's too much, too much of a hit.
02:19:15.040 | And so currently I have about two meals a day,
02:19:17.480 | 12 and six, probably.
02:19:18.600 | - I do that nonstop.
02:19:19.960 | I'm doing it now.
02:19:20.800 | I do it one meal a day.
02:19:22.480 | - Okay. - It's interesting.
02:19:23.560 | It's an interesting feeling.
02:19:24.800 | Have you ever fasted longer than a day?
02:19:26.600 | - Yeah, I've done a bunch of water fasts
02:19:28.400 | 'cause I was curious what happens.
02:19:29.860 | - What happened?
02:19:30.780 | Anything interesting?
02:19:32.100 | - Yeah, I would say so.
02:19:32.940 | I mean, you know, what's interesting
02:19:34.260 | is that you're hungry for two days,
02:19:35.940 | and then starting day three or so, you're not hungry.
02:19:38.580 | It's like such a weird feeling
02:19:40.500 | because you haven't eaten in a few days
02:19:41.740 | and you're not hungry.
02:19:42.820 | - Isn't that weird?
02:19:43.660 | - It's really weird.
02:19:44.500 | - One of the many weird things about human biology.
02:19:47.100 | - Yeah. - It figures something out.
02:19:48.260 | It finds another source of energy or something like that,
02:19:51.180 | or relaxes the system.
02:19:53.980 | I don't know how it works.
02:19:54.820 | - Yeah, the body is like, you're hungry, you're hungry,
02:19:56.140 | and then it just gives up.
02:19:57.060 | It's like, okay, I guess we're fasting now.
02:19:58.560 | There's nothing.
02:19:59.400 | (both laughing)
02:20:00.220 | And then it just kind of focuses
02:20:01.240 | on trying to make you not hungry
02:20:03.200 | and not feel the damage of that
02:20:05.580 | and trying to give you some space
02:20:07.260 | to figure out the food situation.
02:20:08.760 | (laughs)
02:20:09.800 | - So are you still to this day most productive at night?
02:20:14.680 | - I would say I am,
02:20:15.720 | but it is really hard to maintain my PhD schedule,
02:20:18.540 | especially when I was, say, working at Tesla and so on.
02:20:21.720 | It's a non-starter.
02:20:23.540 | But even now, people want to meet for various events.
02:20:27.980 | Society lives in a certain period of time,
02:20:30.220 | and you sort of have to work with that.
02:20:32.180 | - It's hard to do a social thing
02:20:34.340 | and then after that return and do work.
02:20:36.540 | - Yeah, it's just really hard.
02:20:37.940 | (both laughing)
02:20:40.100 | - That's why I try, when I do social things,
02:20:41.580 | I try not to do too much drinking
02:20:43.880 | so I can return and continue doing work.
02:20:46.220 | - Yeah.
02:20:47.060 | - But at Tesla, is there a convergence?
02:20:51.700 | Not Tesla, but any company,
02:20:53.880 | is there a convergence towards the schedule?
02:20:56.080 | Or is there more?
02:20:57.140 | Is that how humans behave when they collaborate?
02:21:00.720 | I need to learn about this.
02:21:02.240 | Do they try to keep a consistent schedule
02:21:04.200 | where you're all awake at the same time?
02:21:05.720 | - I mean, I do try to create a routine,
02:21:07.400 | and I try to create a steady state
02:21:09.100 | in which I'm comfortable in.
02:21:11.360 | So I have a morning routine, I have a day routine.
02:21:13.380 | I try to keep things to a steady state,
02:21:15.580 | and things are predictable,
02:21:17.940 | and then you can sort of just like,
02:21:19.040 | your body just sort of sticks to that.
02:21:20.920 | And if you try to stress that a little too much,
02:21:22.420 | it will create, when you're traveling
02:21:24.320 | and you're dealing with jet lag,
02:21:25.380 | you're not able to really ascend to where you need to go.
02:21:29.920 | - Yeah, yeah, that's weird too,
02:21:31.400 | about humans with the habits and stuff.
02:21:33.440 | What are your thoughts on work-life balance
02:21:35.960 | throughout a human lifetime?
02:21:37.480 | So Tesla in part was known for sort of
02:21:41.640 | pushing people to their limits
02:21:43.120 | in terms of what they're able to do,
02:21:45.400 | in terms of what they're trying to do,
02:21:48.460 | in terms of how much they work, all that kind of stuff.
02:21:50.760 | - Yeah, I mean, I will say Tesla gets
02:21:52.640 | a little too much bad rep for this,
02:21:55.120 | because what's happening is Tesla,
02:21:56.480 | it's a bursty environment.
02:21:58.120 | So I would say the baseline,
02:22:00.320 | my only point of reference is Google,
02:22:02.120 | where I've interned three times,
02:22:03.120 | and I saw what it's like inside Google and DeepMind.
02:22:05.920 | I would say the baseline is higher than that,
02:22:08.840 | but then there's a punctuated equilibrium,
02:22:10.740 | where once in a while there's a fire,
02:22:12.520 | and people work really hard.
02:22:14.840 | And so it's spiky and bursty,
02:22:16.840 | and then all the stories get collected.
02:22:18.400 | - About the bursts, yeah.
02:22:19.440 | - And then it gives the appearance of like total insanity,
02:22:21.880 | but actually it's just a bit more intense environment,
02:22:24.520 | and there are fires and sprints.
02:22:26.960 | And so I think, definitely though,
02:22:29.000 | I would say it's a more intense environment
02:22:31.920 | than something you would get at Google.
02:22:32.760 | - But in your personal, forget all of that,
02:22:34.900 | just in your own personal life,
02:22:37.560 | what do you think about the happiness of a human being,
02:22:40.960 | a brilliant person like yourself,
02:22:43.860 | about finding a balance between work and life,
02:22:46.680 | or is it such a thing, not a good thought experiment?
02:22:50.760 | - Yeah, I think balance is good,
02:22:55.440 | but I also love to have sprints that are out of distribution.
02:22:58.680 | And that's when I think I've been pretty creative as well.
02:23:03.680 | - Sprints out of distribution means that most of the time
02:23:08.080 | you have a, quote unquote, balance.
02:23:11.720 | - I have balance most of the time.
02:23:12.560 | - And then sprint. - I like being obsessed
02:23:14.040 | with something once in a while.
02:23:15.900 | - Once in a while is what, once a week,
02:23:17.320 | once a month, once a year?
02:23:18.440 | - Yeah, probably like I say, once a month or something.
02:23:20.520 | - And that's when we get a new GitHub repo from Andrej.
02:23:23.280 | - Yeah, that's when you really care about a problem.
02:23:24.960 | It must exist, this will be awesome, you're obsessed with it
02:23:28.200 | and now you can't just do it on that day.
02:23:29.760 | You need to pay the fixed cost of getting into the groove,
02:23:32.560 | and then you need to stay there for a while,
02:23:34.280 | and then society will come and they will try to mess
02:23:36.640 | with you and they will try to distract you.
02:23:38.400 | Yeah, the worst thing is a person who's like,
02:23:39.920 | "I just need five minutes of your time."
02:23:42.400 | This is, the cost of that is not five minutes.
02:23:45.040 | And society needs to change how it thinks about
02:23:48.500 | just five minutes of your time.
02:23:50.340 | - Right.
02:23:51.180 | - It's never, it's never just one minute,
02:23:53.140 | just 30 seconds, just a quick thing.
02:23:53.980 | - What's the big deal?
02:23:54.820 | Why are you being so?
02:23:55.900 | - Yeah, no.
02:23:57.140 | What's your computer setup?
02:24:00.940 | What's like the perfect, are you somebody that's flexible
02:24:04.560 | to no matter what, laptop, four screens?
02:24:08.060 | - Yeah.
02:24:08.900 | - Or do you prefer a certain setup
02:24:11.660 | that you're most productive?
02:24:13.700 | - I guess the one that I'm familiar with is one large screen,
02:24:17.020 | 27 inch, and my laptop on the side.
02:24:20.380 | - What operating system?
02:24:21.700 | - I do Macs, that's my primary.
02:24:23.660 | - For all tasks?
02:24:25.220 | - I would say OSX, but when you're working on deep learning,
02:24:26.940 | everything is Linux.
02:24:27.780 | You're SSHed into a cluster and you're working remotely.
02:24:30.860 | - But what about the actual development,
02:24:32.180 | like using the IDE?
02:24:33.780 | - Yeah, you would use, I think a good way is,
02:24:36.060 | you just run VS Code, my favorite editor right now,
02:24:39.500 | on your Mac, but you are actually,
02:24:41.400 | you have a remote folder through SSH.
02:24:43.440 | So the actual files that you're manipulating
02:24:46.380 | are in the cluster somewhere else.
02:24:47.460 | - So what's the best IDE?
02:24:49.760 | VS Code, what else do people, so I use Emacs still.
02:24:55.560 | - That's cool.
02:24:56.400 | - It may be cool, I don't know if it's maximum productivity.
02:25:00.540 | So what do you recommend in terms of editors?
02:25:04.260 | You worked with a lot of software engineers,
02:25:06.140 | editors for Python, C++, machine learning applications?
02:25:11.320 | - I think the current answer is VS Code.
02:25:13.440 | Currently, I believe that's the best IDE.
02:25:16.680 | It's got a huge amount of extensions.
02:25:18.320 | It has GitHub Copilot integration,
02:25:22.040 | which I think is very valuable.
02:25:23.360 | - What do you think about the Copilot integration?
02:25:25.560 | I was actually, I got to talk a bunch with Guido van Rossum,
02:25:28.760 | who's the creator of Python, and he loves Copilot.
02:25:31.960 | He like, he programs a lot with it.
02:25:34.320 | - Yep.
02:25:35.360 | - Do you?
02:25:36.600 | - Yeah, I use Copilot, I love it.
02:25:37.880 | And it's free for me, but I would pay for it.
02:25:40.680 | - Yeah, I think it's very good.
02:25:41.660 | And the utility that I found with it was,
02:25:43.420 | is it, I would say there's a learning curve,
02:25:45.700 | and you need to figure out when it's helpful,
02:25:48.540 | and when to pay attention to its outputs,
02:25:50.180 | and when it's not going to be helpful,
02:25:51.300 | where you should not pay attention to it.
02:25:52.980 | Because if you're just reading its suggestions all the time,
02:25:54.940 | it's not a good way of interacting with it.
02:25:56.620 | But I think I was able to sort of like mold myself to it.
02:25:58.900 | I find it's very helpful, number one,
02:26:00.380 | in copy, paste, and replace some parts.
02:26:02.980 | So I don't, when the pattern is clear,
02:26:05.740 | it's really good at completing the pattern.
02:26:07.740 | And number two, sometimes it suggests APIs
02:26:09.940 | that I'm not aware of.
02:26:11.500 | So it tells you about something that you didn't know.
02:26:14.900 | - And that's an opportunity to discover a new idea.
02:26:15.720 | - It's an opportunity to,
02:26:17.180 | so I would never take Copilot code as given.
02:26:19.500 | I almost always copy, copy paste into a Google search,
02:26:22.660 | and you see what this function is doing.
02:26:24.420 | And then you're like, oh, it's actually,
02:26:25.460 | actually exactly what I need.
02:26:26.820 | Thank you, Copilot.
02:26:27.660 | So you learned something.
02:26:28.480 | - So it's in part a search engine,
02:26:29.980 | a part maybe getting the exact syntax correctly,
02:26:33.860 | that once you see it, it's that NP-hard thing.
02:26:36.940 | It's like, once you see it, you know it's correct.
02:26:40.180 | - Exactly.
02:26:41.020 | - But you yourself struggle.
02:26:42.340 | You can verify efficiently,
02:26:43.500 | but you can't generate efficiently.
02:26:45.660 | - And Copilot really, I mean,
02:26:46.580 | it's autopilot for programming, right?
02:26:49.540 | And currently it's doing the lane following,
02:26:51.540 | which is like the simple copy, paste, and sometimes suggest.
02:26:54.540 | But over time, it's going to become more and more autonomous.
02:26:57.120 | And so the same thing will play out in not just coding,
02:27:00.020 | but actually across many, many different things probably.
02:27:02.340 | - But coding is an important one, right?
02:27:04.140 | Writing programs.
02:27:06.060 | How do you see the future of that developing,
02:27:08.540 | the program synthesis,
02:27:09.700 | like being able to write programs
02:27:11.680 | that are more and more complicated?
02:27:13.260 | 'Cause right now it's human supervised in interesting ways.
02:27:18.260 | It feels like the transition will be very painful.
02:27:22.020 | - My mental model for it is the same thing will happen
02:27:24.420 | as with the autopilot.
02:27:26.180 | So currently it's doing lane following,
02:27:27.860 | it's doing some simple stuff.
02:27:29.420 | And eventually we'll be doing autonomy
02:27:31.260 | and people will have to intervene less and less.
02:27:33.220 | - And those could be like testing mechanisms.
02:27:37.460 | Like if it writes a function
02:27:38.700 | and that function looks pretty damn correct,
02:27:41.420 | but how do you know it's correct?
02:27:43.100 | 'Cause you're like getting lazier and lazier as a programmer.
02:27:46.220 | Like your ability to, 'cause like little bugs,
02:27:48.660 | but I guess it won't make little mistakes.
02:27:50.620 | - No, it will.
02:27:51.820 | Copilot will make off by one subtle bugs.
02:27:54.740 | It has done that to me.
02:27:55.820 | - But do you think future systems will?
02:27:57.820 | Or is it really the off by one
02:28:00.280 | is actually a fundamental challenge of programming?
02:28:02.860 | - In that case, it wasn't fundamental
02:28:04.700 | and I think things can improve,
02:28:05.980 | but yeah, I think humans have to supervise.
02:28:08.420 | I am nervous about people not supervising what comes out
02:28:11.140 | and what happens to, for example,
02:28:12.820 | the proliferation of bugs in all of our systems.
02:28:15.340 | I'm nervous about that,
02:28:16.220 | but I think there will probably be some other copilots
02:28:18.740 | for bug finding and stuff like that at some point.
02:28:21.260 | 'Cause there'll be like a lot more automation for-
02:28:23.420 | - Oh man.
02:28:24.540 | It's like a program, a copilot that generates a compiler,
02:28:30.900 | one that does a linter.
02:28:32.300 | - Yes.
02:28:33.140 | - One that does like a type checker.
02:28:35.420 | - Yeah.
02:28:36.260 | (laughing)
02:28:37.700 | It's a committee of like a GPT sort of like-
02:28:40.380 | - And then there'll be like a manager for the committee.
02:28:42.260 | - Yeah.
02:28:43.100 | - And then there'll be somebody that says
02:28:44.300 | a new version of this is needed.
02:28:45.740 | We need to regenerate it.
02:28:46.980 | - Yeah.
02:28:47.820 | There were 10 GPTs.
02:28:48.640 | They were forwarded and gave 50 suggestions.
02:28:50.220 | Another one looked at it and picked a few that they like.
02:28:53.020 | A bug one looked at it and it was like,
02:28:54.620 | it's probably a bug.
02:28:55.660 | They got re-ranked by some other thing.
02:28:57.360 | And then a final ensemble GPT comes in and is like,
02:29:00.540 | okay, given everything you guys have told me,
02:29:01.940 | this is probably the next token.
02:29:03.140 | (laughing)
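
(Tongue-in-cheek as it is, the "committee of GPTs" is recognizably a generate-then-rerank pipeline. A skeletal sketch of that control flow, where every function is a hypothetical stand-in for a separate model call:)

```python
def propose(prompt: str, n: int) -> list[str]:
    """Return n candidate completions from a code model (placeholder)."""
    raise NotImplementedError

def bug_score(candidate: str) -> float:
    """Suspicion score from a hypothetical bug-finding model; higher = more likely buggy."""
    raise NotImplementedError

def rerank(prompt: str, candidates: list[str]) -> list[str]:
    """Order candidates by a hypothetical preference/quality model, best first."""
    raise NotImplementedError

def committee_complete(prompt: str) -> str:
    candidates = propose(prompt, n=50)                          # e.g. 10 models x 5 samples
    candidates = [c for c in candidates if bug_score(c) < 0.5]  # drop likely bugs
    return rerank(prompt, candidates)[0]                        # final ensemble pick
```
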
02:29:04.140 | - You know, the feeling is the number of programmers
02:29:05.920 | in the world has been growing and growing very quickly.
02:29:08.260 | Do you think it's possible that it'll actually level out
02:29:10.780 | and drop to like a very low number with this kind of world?
02:29:14.500 | 'Cause then you'd be doing software 2.0 programming
02:29:17.660 | and you'll be doing this kind of generation
02:29:22.420 | of copilot type systems programming,
02:29:25.140 | but you won't be doing the old school
02:29:27.500 | software 1.0 programming.
02:29:29.860 | - I don't currently think that they're just going
02:29:31.340 | to replace human programmers.
02:29:33.140 | I'm so hesitant saying stuff like this, right?
02:29:37.100 | - Yeah, 'cause this is gonna be replayed in five years.
02:29:40.260 | I don't know, it's going to show that
02:29:42.460 | this is what we thought, 'cause I agree with you,
02:29:45.180 | but I think we might be very surprised, right?
02:29:49.020 | Like what are the next,
02:29:51.380 | what's your sense of where we stand
02:29:54.180 | with language models?
02:29:55.260 | Does it feel like the beginning or the middle or the end?
02:29:57.900 | - The beginning, 100%.
02:29:59.420 | I think the big question in my mind is,
02:30:00.780 | for sure, GPT will be able to program quite well,
02:30:03.060 | competently and so on.
02:30:04.220 | How do you steer the system?
02:30:05.780 | You still have to provide some guidance
02:30:07.660 | to what you actually are looking for.
02:30:09.260 | And so how do you steer it and how do you say,
02:30:11.420 | how do you talk to it?
02:30:12.780 | How do you audit it and verify that what is done is correct?
02:30:16.820 | And how do you like work with this?
02:30:18.380 | And it's as much, not just an AI problem,
02:30:20.420 | but a UI/UX problem.
02:30:21.940 | - Yeah.
02:30:23.420 | - So beautiful, fertile ground for so much interesting work
02:30:27.340 | for VS Code++ where you're not just,
02:30:29.820 | it's not just human programming anymore.
02:30:31.100 | It's amazing.
02:30:31.940 | - Yeah, so you're interacting with the system.
02:30:33.660 | So not just one prompt, but it's iterative prompting.
02:30:37.220 | - Yeah.
02:30:38.060 | - You're trying to figure out,
02:30:38.900 | having a conversation with the system.
02:30:39.820 | - Yeah.
02:30:40.660 | - That actually, I mean, to me, that's super exciting
02:30:42.740 | to have a conversation with the program I'm running.
02:30:45.820 | - Yeah, maybe at some point you're just conversing with it.
02:30:48.100 | It's like, okay, here's what I want to do.
02:30:49.780 | Actually, this variable,
02:30:51.700 | maybe it's not even that low level as a variable, but.
02:30:54.060 | - You can also imagine like,
02:30:56.100 | can you translate this to C++ and back to Python?
02:30:58.980 | - Yeah, it already kind of exists in some ways.
02:31:00.620 | - No, but just like doing it
02:31:01.700 | as part of the program experience.
02:31:03.620 | Like, I think I'd like to write this function in C++.
02:31:07.700 | Or like, you just keep changing for different programs
02:31:11.380 | 'cause they have different syntax.
02:31:13.500 | Maybe I want to convert this into a functional language.
02:31:15.620 | - Yeah.
02:31:16.460 | - And so like, you get to become multilingual as a programmer
02:31:20.500 | and dance back and forth efficiently.
02:31:22.380 | - Yeah.
02:31:23.220 | I mean, I think the UI/UX effect though
02:31:24.780 | is like still very hard to think through
02:31:26.660 | because it's not just about writing code on a page.
02:31:29.500 | You have an entire developer environment.
02:31:31.340 | You have a bunch of hardware on it.
02:31:33.140 | You have some environmental variables.
02:31:34.540 | You have some scripts that are running in a cron job.
02:31:36.420 | Like there's a lot going on to like working with computers
02:31:39.420 | and how do these systems set up environment flags
02:31:43.540 | and work across multiple machines
02:31:45.140 | and set up screen sessions
02:31:46.180 | and automate different processes.
02:31:47.820 | Like how all that works and is auditable by humans
02:31:50.580 | and so on is like massive question at the moment.
02:31:53.300 | - You've built Arxiv Sanity.
02:31:55.980 | What is arXiv and what is the future
02:31:58.340 | of academic research publishing that you would like to see?
02:32:01.940 | - So arXiv is this pre-print server.
02:32:03.700 | So if you have a paper,
02:32:05.100 | you can submit it for publication
02:32:06.540 | to journals or conferences and then wait six months
02:32:08.740 | and then maybe get a decision, pass or fail,
02:32:10.860 | or you can just upload it to arXiv.
02:32:13.260 | And then people can tweet about it three minutes later
02:32:15.900 | and then everyone sees it, everyone reads it
02:32:17.500 | and everyone can profit from it in their own little ways.
02:32:20.380 | - And you can cite it and it has an official look to it.
02:32:23.820 | It feels like a publication process.
02:32:27.500 | It feels different than if you just put it in a blog post.
02:32:30.380 | - Oh yeah, yeah, I mean, it's a paper
02:32:31.740 | and usually the bar is higher for something
02:32:34.180 | that you would expect on arXiv
02:32:35.980 | as opposed to something you would see in a blog post.
02:32:38.060 | - Well, the culture created the bar
02:32:40.940 | 'cause you could probably post a pretty crappy paper
02:32:43.220 | on arXiv.
02:32:44.060 | So what's that make you feel like?
02:32:46.820 | What's that make you feel about peer review?
02:32:49.020 | So rigorous peer review by two, three experts
02:32:52.700 | versus the peer review of the community
02:32:56.740 | right as it's written.
02:32:57.780 | - Yeah, basically I think the community is very well able
02:33:00.580 | to peer review things very quickly on Twitter.
02:33:03.900 | And I think maybe it just has to do something
02:33:05.660 | with AI machine learning field specifically though.
02:33:07.940 | I feel like things are more easily auditable
02:33:10.420 | and the verification is easier potentially
02:33:14.020 | than the verification somewhere else.
02:33:15.620 | So it's kind of like,
02:33:17.060 | you can think of these scientific publications
02:33:19.180 | as like little blockchains
02:33:20.180 | where everyone's building on each other's work
02:33:21.460 | and citing each other.
02:33:22.420 | And you sort of have AI,
02:33:23.620 | which is kind of like this much faster and looser blockchain,
02:33:27.060 | where any one individual entry
02:33:28.100 | is like very cheap to make.
02:33:32.460 | And then you have other fields
02:33:33.300 | where maybe that model doesn't make as much sense.
02:33:35.780 | And so I think in AI,
02:33:37.900 | at least things are pretty easily verifiable.
02:33:40.180 | And so that's why when people upload papers
02:33:41.820 | that are a really good idea and so on,
02:33:43.380 | people can try it out like the next day.
02:33:45.780 | And they can be the final arbiter
02:33:47.180 | of whether it works or not on their problem.
02:33:49.020 | And the whole thing just moves significantly faster.
02:33:51.500 | So I kind of feel like academia still has a place,
02:33:53.940 | sort of this like conference journal process
02:33:55.620 | still has a place,
02:33:56.460 | but it's sort of like,
02:33:57.940 | it lags behind, I think.
02:33:59.740 | And it's a bit more maybe higher quality process,
02:34:03.140 | but it's not sort of the place
02:34:04.860 | where you will discover cutting edge work anymore.
02:34:07.340 | It used to be the case when I was starting my PhD
02:34:09.060 | that you go to conferences and journals
02:34:10.860 | and you discuss all the latest research.
02:34:12.500 | Now, when you go to a conference or journal,
02:34:14.060 | like no one discusses anything that's there
02:34:15.940 | because it's already like three generations ago irrelevant.
02:34:19.260 | - Yeah, which makes me sad
02:34:20.860 | about like DeepMind, for example,
02:34:22.380 | where they still publish in Nature
02:34:24.980 | and these big prestigious,
02:34:26.860 | I mean, there's still value, I suppose,
02:34:28.340 | to the prestige that comes with these big venues,
02:34:30.780 | but the result is that they'll announce
02:34:34.180 | some breakthrough performance
02:34:36.020 | and it will take like a year
02:34:37.900 | to actually publish the details.
02:34:39.620 | I mean, and those details,
02:34:42.620 | if they were published immediately,
02:34:43.780 | would inspire the community
02:34:45.420 | to move in certain directions.
02:34:46.860 | - Yeah, it would speed up the rest of the community,
02:34:48.380 | but I don't know to what extent
02:34:49.980 | that's part of their objective function also.
02:34:52.180 | - That's true.
02:34:53.020 | So it's not just the prestige,
02:34:54.180 | a little bit of the delay is part of it.
02:34:56.980 | - Yeah, they certainly, DeepMind specifically,
02:34:58.780 | has been working in the regime
02:35:00.940 | of having slightly higher quality,
02:35:02.980 | basically process and latency,
02:35:04.820 | and publishing those papers that way.
02:35:07.180 | - Another question from Reddit.
02:35:09.100 | Do you or have you suffered from imposter syndrome?
02:35:12.340 | Being the director of AI at Tesla,
02:35:15.420 | being this person when you're at Stanford
02:35:18.100 | where like the world looks at you
02:35:19.460 | as the expert in AI to teach the world
02:35:23.740 | about machine learning?
02:35:25.460 | - When I was leaving Tesla after five years,
02:35:27.180 | I spent a ton of time in meeting rooms
02:35:29.020 | and I would read papers.
02:35:31.780 | In the beginning when I joined Tesla,
02:35:32.820 | I was writing code
02:35:33.740 | and then I was writing less and less code
02:35:35.060 | and I was reading code
02:35:35.940 | and then I was reading less and less code.
02:35:37.660 | And so this is just a natural progression
02:35:39.100 | that happens, I think.
02:35:40.140 | And definitely I would say near the tail end,
02:35:42.700 | that's when it sort of like starts to hit you a bit more
02:35:44.340 | that you're supposed to be an expert,
02:35:45.380 | but actually the source of truth
02:35:47.580 | is the code that people are writing, the GitHub.
02:35:49.180 | And the actual code itself.
02:35:51.980 | And you're not as familiar with that as you used to be.
02:35:54.380 | And so I would say maybe there's some insecurity there.
02:35:57.220 | - Yeah, that's actually pretty profound.
02:35:59.060 | That a lot of the insecurity has to do
02:36:00.620 | with not writing the code in the computer science space.
02:36:03.500 | 'Cause that is the truth, that right there.
02:36:05.900 | - The code is the source of truth.
02:36:06.900 | The papers and everything else,
02:36:08.020 | it's a high-level summary.
02:36:09.740 | Yeah, it's just a high-level summary,
02:36:12.380 | but at the end of the day, you have to read code.
02:36:13.740 | It's impossible to translate all that code
02:36:15.220 | into actual paper form.
02:36:18.540 | So when things come out,
02:36:20.260 | especially when they have a source code available,
02:36:21.580 | that's my favorite place to go.
02:36:23.100 | - So like I said, you're one of the greatest teachers
02:36:25.500 | of machine learning, AI ever, from CS231N to today.
02:36:30.500 | What advice would you give to beginners
02:36:33.660 | interested in getting into machine learning?
02:36:36.460 | - Beginners are often focused on like what to do.
02:36:40.460 | And I think the focus should be more like how much you do.
02:36:43.340 | So I am kind of like believer on a high level
02:36:45.340 | in this 10,000 hours kind of concept
02:36:47.220 | where you just kind of have to just pick the things
02:36:49.700 | where you can spend time and you care about
02:36:51.460 | and you're interested in.
02:36:52.300 | You literally have to put in 10,000 hours of work.
02:36:55.020 | It doesn't even matter as much like where you put it
02:36:57.300 | and you'll iterate and you'll improve
02:36:59.420 | and you'll waste some time.
02:37:00.540 | I don't know if there's a better way.
02:37:01.820 | You need to put in 10,000 hours.
02:37:03.540 | But I think it's actually really nice
02:37:04.540 | 'cause I feel like there's some sense of determinism
02:37:06.380 | about being an expert at a thing if you spend 10,000 hours.
02:37:09.980 | You can literally pick an arbitrary thing.
02:37:12.540 | And I think if you spend 10,000 hours
02:37:14.220 | of deliberate effort and work,
02:37:15.620 | you actually will become an expert at it.
02:37:17.700 | And so I think it's kind of like a nice thought.
02:37:20.560 | And so basically I would focus more on like,
02:37:24.460 | are you spending 10,000 hours?
02:37:26.220 | That's what I focus on.
02:37:27.180 | - So and then thinking about what kind of mechanisms
02:37:29.940 | maximize your likelihood of getting to 10,000 hours.
02:37:32.660 | - Exactly.
02:37:33.500 | - Which for us silly humans means probably forming
02:37:36.760 | a daily habit of like every single day
02:37:39.100 | actually doing the thing.
02:37:40.180 | - Whatever helps you.
02:37:41.160 | So I do think to a large extent
02:37:42.260 | it's a psychological problem for yourself.
02:37:44.500 | One other thing that I think is helpful
02:37:46.940 | for the psychology of it is many times people
02:37:48.980 | compare themselves to others in the area.
02:37:50.780 | I think this is very harmful.
02:37:52.300 | Only compare yourself to you from some time ago,
02:37:54.900 | like say a year ago.
02:37:56.140 | Are you better than you a year ago?
02:37:58.060 | It's the only way to think.
02:38:00.180 | And I think this, then you can see your progress
02:38:02.220 | and it's very motivating.
02:38:03.460 | - That's so interesting, that focus on the quantity of hours.
02:38:07.380 | 'Cause I think a lot of people in the beginner stage,
02:38:10.100 | but actually throughout, get paralyzed
02:38:13.580 | by the choice.
02:38:15.580 | Like which one do I pick this path or this path?
02:38:19.380 | Like they'll literally get paralyzed
02:38:21.060 | by like which IDE to use.
02:38:22.620 | - Well, they're worried.
02:38:23.460 | Yeah, they're worried about all these things.
02:38:24.700 | But the thing is, you will waste time doing something wrong.
02:38:28.500 | You will eventually figure out it's not right.
02:38:29.940 | You will accumulate scar tissue.
02:38:31.660 | And next time you'll grow stronger
02:38:33.420 | because next time you'll have the scar tissue
02:38:35.100 | and next time you'll learn from it.
02:38:36.660 | And now next time you come to a similar situation,
02:38:39.020 | you'll be like, oh, I messed up.
02:38:41.620 | I've spent a lot of time working on things
02:38:43.460 | that never materialized into anything.
02:38:45.180 | And I have all that scar tissue
02:38:46.340 | and I have some intuitions about what was useful,
02:38:48.380 | what wasn't useful, how things turned out.
02:38:50.580 | So all those mistakes were not dead work.
02:38:53.980 | So I just think you should, you should just focus on working.
02:38:56.580 | What have you done?
02:38:57.620 | What have you done last week?
02:38:58.820 | (laughs)
02:39:00.700 | - That's a good question actually to ask
02:39:02.660 | for a lot of things, not just machine learning.
02:39:05.620 | It's a good way to cut the,
02:39:08.420 | I forgot what term we used,
02:39:09.540 | but the fluff, the blubber, whatever the,
02:39:12.780 | the inefficiencies in life.
02:39:14.660 | What do you love about teaching?
02:39:17.180 | You seem to find yourself often in the,
02:39:20.380 | like drawn to teaching.
02:39:21.700 | You're very good at it, but you're also drawn to it.
02:39:23.540 | - I mean, I don't think I love teaching.
02:39:25.260 | I love happy humans.
02:39:27.220 | (laughs)
02:39:28.060 | And happy humans like when I teach.
02:39:30.100 | I wouldn't say I hate teaching.
02:39:32.300 | I tolerate teaching,
02:39:33.260 | but it's not like the act of teaching that I like.
02:39:34.940 | It's that, you know, I have some,
02:39:38.540 | I have something, I'm actually okay at it.
02:39:41.260 | I'm okay at teaching and people appreciate it a lot.
02:39:43.940 | And so I'm just happy to try to be helpful.
02:39:47.220 | And teaching itself is not like the most,
02:39:49.980 | I mean, it's really, it can be really annoying, frustrating.
02:39:52.700 | I was working on a bunch of lectures just now.
02:39:54.580 | I was reminded back to my days of 231N
02:39:56.980 | just how much work it is to create some of these materials
02:39:59.420 | and make them good.
02:40:00.420 | The amount of iteration and thought,
02:40:01.700 | and you go down blind alleys and just how much you change it.
02:40:04.860 | So creating something good
02:40:06.140 | in terms of like educational value is really hard.
02:40:09.740 | And it's not fun.
02:40:11.260 | (laughs)
02:40:12.100 | - It's difficult.
02:40:12.940 | So people should definitely go watch your new stuff
02:40:15.140 | you put out.
02:40:16.500 | There are lectures where you're actually building the thing
02:40:18.340 | like from, like you said, the code is truth.
02:40:20.820 | So discussing back propagation by building it,
02:40:24.180 | by looking through it, just the whole thing.
02:40:26.180 | So how difficult is that to prepare for?
02:40:27.820 | I think that's a really powerful way to teach.
02:40:30.420 | Did you have to prepare for that
02:40:31.700 | or are you just live thinking through it?
02:40:34.500 | - I will typically do like say three takes
02:40:36.580 | and then I take like the better take.
02:40:38.700 | So I do multiple takes
02:40:39.940 | and I take some of the better takes
02:40:40.940 | and then I just build out a lecture that way.
02:40:42.980 | Sometimes I have to delete 30 minutes of content
02:40:45.340 | because it just went down an alley
02:40:46.580 | that I didn't like too much.
02:40:47.900 | So there's a bunch of iteration
02:40:49.660 | and it probably takes me somewhere around 10 hours
02:40:52.460 | to create one hour of content.
02:40:53.500 | - To get one hour.
02:40:54.780 | It's interesting.
02:40:55.620 | I mean, is it difficult to go back to the basics?
02:40:58.940 | Do you draw a lot of like wisdom
02:41:01.140 | from going back to the basics?
02:41:02.340 | - Yeah, going back to back propagation loss functions
02:41:04.380 | where they come from.
02:41:05.220 | And one thing I like about teaching a lot honestly
02:41:07.300 | is it definitely strengthens your understanding.
02:41:10.340 | So it's not a purely altruistic activity.
02:41:12.620 | It's a way to learn.
02:41:13.740 | If you have to explain something to someone,
02:41:16.700 | you realize you have gaps in knowledge.
02:41:19.420 | And so I even surprised myself in those lectures.
02:41:22.300 | Like, oh, so the result will obviously look like this,
02:41:24.660 | and then the result doesn't look like it.
02:41:25.820 | And I'm like, okay, I thought I understood this.
02:41:28.020 | (both laughing)
02:41:29.900 | - But that's why it's really cool.
02:41:31.660 | Literally code, you run it in a notebook
02:41:33.980 | and it gives you a result and you're like, oh, wow.
02:41:36.780 | And like actual numbers, actual input, actual code.
02:41:39.820 | - Yeah, it's not mathematical symbols, et cetera.
02:41:41.580 | The source of truth is the code.
02:41:42.980 | It's not slides.
02:41:44.220 | It's just like, let's build it.
02:41:45.940 | - It's beautiful.
02:41:46.780 | You're a rare human in that sense.
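For readers who want a concrete feel for what "building backpropagation in a notebook, with actual numbers" looks like, here is a minimal, hypothetical sketch of a scalar autograd engine in the spirit of the lectures discussed above (not the actual lecture code): every value remembers how it was computed, and the chain rule is applied backwards through that graph.

```python
# Minimal scalar autograd sketch: values remember how they were made,
# and backward() applies the chain rule in reverse over that graph.
import math

class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # product rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then run the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# A single neuron: actual numbers in, actual gradients out.
x1, x2 = Value(2.0), Value(0.0)
w1, w2, b = Value(-3.0), Value(1.0), Value(6.8813735870195432)
out = (x1 * w1 + x2 * w2 + b).tanh()
out.backward()
print(out.data, x1.grad, w1.grad)  # roughly 0.707, -1.5, 1.0
```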
02:41:48.500 | What advice would you give to researchers
02:41:51.780 | trying to develop and publish ideas
02:41:54.380 | that have a big impact in the world of AI?
02:41:56.820 | So maybe undergrads, maybe early graduate students.
02:42:01.620 | - Yeah.
02:42:02.660 | I mean, I would say like they definitely have to be
02:42:04.340 | a little bit more strategic
02:42:05.780 | than I had to be as a PhD student
02:42:07.540 | because of the way AI is evolving.
02:42:09.660 | It's going the way of physics where,
02:42:12.420 | in physics you used to be able to do experiments
02:42:13.860 | on your bench top and everything was great
02:42:15.380 | and you can make progress.
02:42:16.940 | And now you have to work in like LHC or like CERN.
02:42:20.020 | And so AI is going in that direction as well.
02:42:23.780 | So there's certain kinds of things
02:42:25.660 | that's just not possible to do on the bench top anymore.
02:42:28.180 | And I think that didn't used to be the case at the time.
02:42:32.700 | - Do you still think that there's like GAN type papers
02:42:37.140 | to be written or like very simple idea
02:42:41.740 | that requires just one computer to illustrate
02:42:43.700 | a simple example?
02:42:44.540 | - I mean, one example that's been very influential
02:42:46.300 | recently is diffusion models.
02:42:47.980 | Diffusion models are amazing.
02:42:49.260 | Diffusion models are six years old.
02:42:51.740 | For the longest time, people were kind of ignoring them
02:42:53.860 | as far as I can tell.
02:42:54.980 | And they're an amazing generative model,
02:42:57.180 | especially in images.
02:42:58.940 | And so stable diffusion and so on, it's all diffusion based.
02:43:01.740 | Diffusion is new.
02:43:02.820 | It was not there and it came from,
02:43:05.020 | well, it came from Google,
02:43:05.860 | but a researcher could have come up with it.
02:43:07.380 | In fact, some of the first,
02:43:09.420 | actually, no, those came from Google as well.
02:43:11.780 | But a researcher could come up with that
02:43:13.220 | in an academic institution.
02:43:15.220 | - Yeah, what do you find most fascinating
02:43:16.660 | about diffusion models?
02:43:17.820 | So from the societal impact to the technical architecture.
02:43:22.620 | - What I like about diffusion is it works so well.
02:43:25.380 | - Was that, is that surprising to you?
02:43:26.820 | The amount of the variety,
02:43:28.740 | almost the novelty of the synthetic data it's generating.
02:43:32.700 | - Yeah, so the stable diffusion images are incredible.
02:43:36.180 | It's the speed of improvement in generating images
02:43:39.340 | has been insane.
02:43:40.900 | We went very quickly from generating like tiny digits
02:43:43.420 | to tiny faces and it all looked messed up.
02:43:45.460 | And now we have stable diffusion
02:43:46.700 | and that happened very quickly.
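The conversation does not go into the mechanics, but as background the standard DDPM-style setup is: corrupt data with a fixed noising schedule, train a network to predict the added noise, then sample by reversing the corruption. Below is a minimal numpy illustration of just the forward (noising) half, under the usual parameterization; it is an assumption-laden sketch, not anything discussed here.

```python
# Background sketch: the DDPM-style forward process that diffusion models
# are built on. A 4x4 array stands in for an image; a trained model would
# be optimized to predict eps with loss = || eps - eps_theta(x_t, t) ||^2.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # noise schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # cumulative product \bar{alpha}_t

def q_sample(x0, t, eps):
    """x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))            # stand-in for an image
eps = rng.normal(size=x0.shape)

for t in [0, 100, 500, 999]:
    xt = q_sample(x0, t, eps)
    # correlation with the clean data decays toward 0 as t grows
    print(t, round(float(np.corrcoef(x0.ravel(), xt.ravel())[0, 1]), 3))
```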
02:43:48.020 | There's a lot that academia can still contribute.
02:43:49.820 | You know, for example, flash attention
02:43:52.660 | is a very efficient kernel
02:43:54.220 | for running the attention operation inside the transformer
02:43:57.340 | that came from academic environment.
02:43:59.580 | It's a very clever way to structure the kernel
02:44:02.140 | that does the calculation,
02:44:03.740 | so it doesn't materialize the attention matrix.
02:44:06.060 | And so there's, I think there's still like lots of things
02:44:08.660 | to contribute, but you have to be just more strategic.
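As a rough illustration of the "don't materialize the attention matrix" point, here is a hedged numpy sketch of the tiling-plus-online-softmax idea behind FlashAttention. The real kernel gets its speed from doing this blocking inside fast on-chip GPU memory, which plain numpy cannot show; this only demonstrates that the streamed result matches the naive one without ever forming the full N x N score matrix.

```python
# Sketch of the FlashAttention idea: stream over key/value blocks with an
# online softmax, so only an (N, block) tile of scores exists at any time.
import numpy as np

def attention_reference(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[-1])            # full N x N matrix
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def attention_tiled(Q, K, V, block=64):
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full((n, 1), -np.inf)                  # running row-max of scores
    l = np.zeros((n, 1))                          # running softmax denominator
    for j in range(0, K.shape[0], block):         # stream over key/value blocks
        S = Q @ K[j:j + block].T / np.sqrt(d)     # only an (n, block) tile
        m_new = np.maximum(m, S.max(axis=-1, keepdims=True))
        scale = np.exp(m - m_new)                 # rescale earlier partial sums
        P = np.exp(S - m_new)
        l = l * scale + P.sum(axis=-1, keepdims=True)
        out = out * scale + P @ V[j:j + block]
        m = m_new
    return out / l

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(256, 32)) for _ in range(3))
print(np.allclose(attention_reference(Q, K, V), attention_tiled(Q, K, V)))  # True
```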
02:44:11.060 | - Do you think neural networks can be made to reason?
02:44:13.660 | - Yes.
02:44:16.020 | - Do you think they already reason?
02:44:17.580 | - Yes.
02:44:18.420 | - What's your definition of reasoning?
02:44:20.020 | - Information processing.
02:44:22.460 | (laughing)
02:44:24.660 | - So in the way that humans think through a problem
02:44:26.780 | and come up with novel ideas,
02:44:28.460 | it feels like reasoning.
02:44:33.500 | - Yeah.
02:44:34.340 | - So the novelty,
02:44:35.460 | I don't wanna say, but out of distribution ideas,
02:44:41.980 | you think it's possible?
02:44:43.340 | - Yes.
02:44:44.180 | And I think we're seeing that already
02:44:45.220 | in the current neural nets.
02:44:46.420 | You're able to remix the training set information
02:44:48.940 | into true generalization in some sense.
02:44:51.020 | - That doesn't appear.
02:44:52.460 | - It doesn't appear verbatim in the training set.
02:44:54.660 | Like you're doing something interesting algorithmically.
02:44:56.420 | You're manipulating some symbols
02:44:59.100 | and you're coming up with some correct,
02:45:01.860 | a unique answer in a new setting.
02:45:04.740 | - What would illustrate to you,
02:45:07.660 | holy shit, this thing is definitely thinking?
02:45:10.100 | - To me, thinking or reasoning
02:45:12.740 | is just information processing and generalization.
02:45:15.260 | And I think the neural nets already do that today.
02:45:17.940 | - So being able to perceive the world
02:45:19.740 | or perceive whatever the inputs are
02:45:22.620 | and to make predictions based on that
02:45:27.020 | or actions based on that, that's reasoning.
02:45:28.980 | - Yeah, you're giving correct answers in novel settings
02:45:31.940 | by manipulating information.
02:45:34.820 | You've learned the correct algorithm.
02:45:36.540 | You're not doing just some kind of a lookup table
02:45:38.180 | and nearest neighbor search, something like that.
02:45:40.540 | - Let me ask you about AGI.
02:45:42.020 | What are some moonshot ideas
02:45:43.740 | you think might make significant progress towards AGI?
02:45:47.940 | Or maybe in other ways,
02:45:49.340 | what are the big blockers that we're missing now?
02:45:52.340 | - So basically I am fairly bullish
02:45:53.900 | on our ability to build AGIs.
02:45:57.380 | Basically automated systems that we can interact with
02:46:01.100 | that are very human-like
02:46:02.340 | and we can interact with them
02:46:03.180 | in a digital realm or a physical realm.
02:46:05.500 | Currently it seems most of the models
02:46:07.940 | that sort of do these sort of magical tasks
02:46:10.020 | are in a text realm.
02:46:11.060 | I think, as I mentioned,
02:46:14.940 | I'm suspicious that the text realm is not enough
02:46:17.580 | to actually build full understanding of the world.
02:46:20.420 | I do actually think you need to go into pixels
02:46:22.180 | and understand the physical world and how it works.
02:46:24.860 | So I do think that we need to extend these models
02:46:26.660 | to consume images and videos
02:46:28.300 | and train on a lot more data
02:46:30.100 | that is multimodal in that way.
02:46:31.780 | - Do you think you need to touch the world
02:46:33.900 | to understand it also?
02:46:34.980 | - Well, that's the big open question I would say in my mind
02:46:36.980 | is if you also require the embodiment
02:46:39.500 | and the ability to sort of interact with the world,
02:46:42.460 | run experiments and have a data of that form,
02:46:45.500 | then you need to go to Optimus or something like that.
02:46:48.580 | And so I would say Optimus in some way is like a hedge.
02:46:52.300 | on AGI, because it seems to me that it's possible
02:46:57.300 | that just having data from the internet is not enough.
02:47:00.220 | If that is the case, then Optimus may lead to AGI
02:47:04.220 | because Optimus, to me, there's nothing beyond Optimus.
02:47:07.820 | You have like this humanoid form factor
02:47:09.340 | that can actually like do stuff in the world.
02:47:11.340 | You can have millions of them interacting with humans
02:47:13.260 | and so on.
02:47:14.460 | And if that doesn't give rise to AGI at some point,
02:47:17.420 | like I'm not sure what will.
02:47:20.060 | So from a completeness perspective,
02:47:21.780 | I think that's a really good platform,
02:47:24.700 | but it's a much harder platform
02:47:26.340 | because you are dealing with atoms
02:47:28.580 | and you need to actually like build these things
02:47:30.420 | and integrate them into society.
02:47:32.620 | So I think that path takes longer,
02:47:34.900 | but it's much more certain.
02:47:36.580 | And then there's a path of the internet
02:47:38.180 | and just like training these compression models effectively
02:47:41.140 | on trying to compress all the internet.
02:47:43.900 | And that might also give these agents as well.
02:47:48.100 | Compress the internet, but also interact with the internet.
02:47:51.580 | So it's not obvious to me.
02:47:54.140 | In fact, I suspect you can reach AGI
02:47:56.740 | without ever entering the physical world.
02:47:59.420 | And which is a little bit more concerning
02:48:03.780 | because that results in it happening faster.
02:48:08.780 | So it just feels like we're in boiling water.
02:48:11.860 | We won't know as it's happening.
02:48:14.180 | I would like to, I'm not afraid of AGI.
02:48:17.820 | I'm excited about it.
02:48:19.100 | There's always concerns,
02:48:20.380 | but I would like to know when it happens.
02:48:22.780 | - Yeah.
02:48:24.780 | - And have like hints about when it happens,
02:48:26.780 | like a year from now it will happen, that kind of thing.
02:48:30.100 | I just feel like in the digital realm, it just might happen.
02:48:32.660 | - Yeah.
02:48:33.500 | I think all we have available to us
02:48:34.620 | because no one has built AGI again.
02:48:36.780 | So all we have available to us is,
02:48:38.900 | is there enough fertile ground on the periphery?
02:48:42.420 | I would say yes.
02:48:43.260 | And we have the progress so far, which has been very rapid.
02:48:46.380 | And there are next steps that are available.
02:48:48.460 | And so I would say, yeah, it's quite likely
02:48:51.620 | that we'll be interacting with digital entities.
02:48:54.260 | - How will you know that somebody has built AGI?
02:48:57.060 | - It's going to be a slow,
02:48:58.100 | I think it's going to be a slow incremental transition.
02:48:59.940 | It's going to be product based and focused.
02:49:01.620 | It's going to be GitHub Copilot getting better.
02:49:03.700 | And then GPT is helping you write.
02:49:06.380 | And then these oracles that you can go to
02:49:08.220 | with mathematical problems.
02:49:09.620 | I think we're on a verge of being able to ask
02:49:12.340 | very complex questions in chemistry, physics, math
02:49:16.260 | of these oracles and have them complete solutions.
02:49:19.700 | - So AGI to you is primarily focused on intelligence.
02:49:22.540 | So consciousness doesn't enter into it.
02:49:27.540 | - So in my mind, consciousness is not a special thing
02:49:30.060 | you will figure out and bolt on.
02:49:32.100 | I think it's an emergent phenomenon of a large enough
02:49:34.820 | and complex enough generative model, sort of.
02:49:38.300 | So if you have a complex enough world model
02:49:42.500 | that understands the world,
02:49:43.780 | then it also understands its predicament in the world
02:49:46.700 | as being a language model,
02:49:48.500 | which to me is a form of consciousness or self-awareness.
02:49:51.940 | - So in order to understand the world deeply,
02:49:53.820 | you probably have to integrate yourself into the world.
02:49:56.580 | And in order to interact with humans
02:49:58.460 | and other living beings,
02:50:00.260 | consciousness is a very useful tool.
02:50:02.700 | - I think consciousness is like a modeling insight.
02:50:05.740 | - Modeling insight.
02:50:07.260 | - Yeah, it's a, you have a powerful enough model
02:50:10.060 | of understanding the world that you actually understand
02:50:11.860 | that you are an entity in it.
02:50:13.300 | - Yeah, but there's also this,
02:50:15.460 | perhaps just the narrative we tell ourselves,
02:50:17.340 | there's a, it feels like something to experience the world,
02:50:20.820 | the hard problem of consciousness.
02:50:22.740 | But that could be just a narrative that we tell ourselves.
02:50:24.860 | - Yeah, I don't think, yeah, I think it will emerge.
02:50:27.140 | I think it's going to be something very boring.
02:50:29.340 | Like we'll be talking to these digital AIs,
02:50:31.820 | they will claim they're conscious.
02:50:33.300 | They will appear conscious.
02:50:34.940 | They will do all the things that you would expect
02:50:36.380 | of other humans.
02:50:37.460 | And it's going to just be a stalemate.
02:50:40.300 | - I think there'll be a lot of actual
02:50:42.540 | fascinating ethical questions,
02:50:44.620 | like Supreme Court level questions
02:50:47.580 | of whether you're allowed to turn off a conscious AI,
02:50:51.740 | if you're allowed to build a conscious AI.
02:50:54.500 | Maybe there would have to be the same kind of debates
02:50:58.660 | that you have around,
02:50:59.700 | sorry to bring up a political topic,
02:51:03.060 | but abortion, which is the deeper question with abortion
02:51:07.940 | is what is life?
02:51:11.500 | And the deep question with AI is also what is life
02:51:15.340 | and what is conscious?
02:51:16.420 | And I think that'll be very fascinating to bring up.
02:51:20.780 | It might become illegal to build systems
02:51:23.580 | that are capable of such level of intelligence
02:51:28.580 | that consciousness would emerge
02:51:29.860 | and therefore the capacity to suffer would emerge.
02:51:32.180 | And a system that says, no, please don't kill me.
02:51:36.140 | - Well, that's what the LaMDA chatbot
02:51:38.460 | already told this Google engineer, right?
02:51:41.220 | Like it was talking about not wanting to die or so on.
02:51:44.860 | - So that might become illegal to do that.
02:51:47.220 | - Right.
02:51:48.060 | - 'Cause otherwise you might have a lot of creatures
02:51:52.540 | that don't want to die and they will-
02:51:55.340 | - You can just spawn infinity of them on a cluster.
02:51:57.860 | - And then that might lead to like horrible consequences
02:52:01.660 | 'cause then there might be a lot of people
02:52:03.580 | that secretly love murder
02:52:05.060 | and they'll start practicing murder on those systems.
02:52:07.620 | I mean, there's just, to me, all of this stuff
02:52:10.420 | just brings a beautiful mirror to the human condition
02:52:14.100 | and human nature and we'll get to explore it.
02:52:15.820 | And that's what like the best of the Supreme Court
02:52:19.620 | of all the different debates we have about ideas
02:52:22.260 | of what it means to be human.
02:52:23.420 | We get to ask those deep questions
02:52:25.300 | that we've been asking throughout human history.
02:52:27.380 | There's always been the other in human history.
02:52:31.020 | We're the good guys and that's the bad guys
02:52:33.180 | and we're going to, throughout human history,
02:52:36.020 | let's murder the bad guys.
02:52:37.860 | And the same will probably happen with robots.
02:52:40.060 | It'll be the other at first
02:52:41.700 | and then we'll get to ask questions
02:52:42.860 | of what does it mean to be alive?
02:52:44.540 | What does it mean to be conscious?
02:52:45.980 | - Yeah.
02:52:46.820 | And I think there's some canary in the coal mines
02:52:48.300 | even with what we have today.
02:52:50.100 | And for example, there's these like waifus
02:52:52.900 | that you can like work with
02:52:53.740 | and some people are trying to like,
02:52:55.540 | this company is going to shut down
02:52:56.660 | but this person really like loved their waifu
02:52:59.420 | and like is trying to like port it somewhere else
02:53:01.620 | and like it's not possible.
02:53:03.140 | And like, I think like definitely people
02:53:06.260 | will have feelings towards these systems
02:53:10.360 | because in some sense they are like a mirror of humanity
02:53:13.420 | because they are like sort of like a big average
02:53:16.060 | of humanity in the way that it's trained.
02:53:18.500 | - But we can, that average, we can actually watch.
02:53:22.300 | It's nice to be able to interact
02:53:23.620 | with the big average of humanity
02:53:25.340 | and do like a search query on it.
02:53:27.300 | - Yeah, yeah, it's very fascinating.
02:53:29.660 | And we can also, of course, also like shape it.
02:53:31.920 | It's not just a pure average.
02:53:32.940 | We can mess with the training data.
02:53:34.620 | We can mess with the objective.
02:53:35.620 | We can fine tune them in various ways.
02:53:37.660 | So we have some, you know,
02:53:39.700 | impact on what those systems look like.
02:53:42.540 | - If you want to achieve AGI
02:53:44.620 | and you could have a conversation with her
02:53:48.060 | and ask her, talk about anything, maybe ask her a question.
02:53:51.780 | What kind of stuff would you ask?
02:53:54.220 | - I would have some practical questions in my mind.
02:53:55.860 | Like, do I or my loved ones really have to die?
02:54:00.100 | What can we do about that?
02:54:01.400 | (laughing)
02:54:02.900 | - Do you think it will answer clearly
02:54:04.580 | or would it answer poetically?
02:54:06.280 | - I would expect it to give solutions.
02:54:09.020 | I would expect it to be like,
02:54:10.140 | well, I've read all of these textbooks
02:54:11.780 | and I know all these things that you've produced.
02:54:13.460 | And it seems to me like here are the experiments
02:54:14.980 | that I think it would be useful to run next.
02:54:17.560 | And here's some gene therapies
02:54:18.620 | that I think would be helpful.
02:54:19.860 | And here are the kinds of experiments that you should run.
02:54:22.300 | - Okay, let's go with this thought experiment, okay?
02:54:25.300 | Imagine that mortality is actually
02:54:29.380 | a prerequisite for happiness.
02:54:32.980 | So if we become immortal,
02:54:34.740 | we'll actually become deeply unhappy.
02:54:36.780 | And the model is able to know that.
02:54:39.620 | So what is this supposed to tell you,
02:54:41.220 | stupid human, about it?
02:54:42.540 | Yes, you can become immortal,
02:54:43.820 | but you will become deeply unhappy.
02:54:46.140 | If the AGI system is trying to empathize with you, human,
02:54:51.140 | what is this supposed to tell you?
02:54:53.520 | That yes, you don't have to die,
02:54:55.740 | but you're really not gonna like it?
02:54:57.860 | Is it gonna be deeply honest?
02:54:59.700 | Like, in Interstellar, what is it?
02:55:02.060 | The AI says like, humans want 90% honesty.
02:55:05.660 | (laughing)
02:55:08.020 | So like you have to pick how honest
02:55:09.780 | do I wanna answer these practical questions.
02:55:11.780 | - Yeah, I love the AI in Interstellar, by the way.
02:55:14.180 | I think it's like such a sidekick to the entire story,
02:55:16.720 | but at the same time, it's like really interesting.
02:55:19.700 | - It's kind of limited in certain ways, right?
02:55:22.320 | - Yeah, it's limited.
02:55:23.160 | I think that's totally fine, by the way.
02:55:24.540 | I don't think, I think it's fine and plausible
02:55:27.900 | to have a limited and imperfect AGIs.
02:55:30.460 | - Is that the feature almost?
02:55:34.020 | - As an example, like it has a fixed amount of compute
02:55:36.940 | on its physical body.
02:55:38.260 | And it might just be that even though you can have
02:55:40.680 | a super amazing mega brain, super intelligent AI,
02:55:43.980 | you also can have like, you know, less intelligent AIs
02:55:46.580 | that you can deploy in a power efficient way.
02:55:49.540 | And then they're not perfect, they might make mistakes.
02:55:51.460 | - No, I meant more like, say you had infinite compute,
02:55:55.320 | and it's still good to make mistakes sometimes.
02:55:58.140 | Like in order to integrate yourself, like, what is it?
02:56:01.760 | Going back to "Good Will Hunting,"
02:56:03.280 | Robin Williams' character says like,
02:56:05.560 | "The human imperfections, that's the good stuff," right?
02:56:09.160 | Isn't that the, like, we don't want perfect,
02:56:12.480 | we want flaws in part to form connections with each other,
02:56:17.480 | 'cause it feels like something you can attach
02:56:19.240 | your feelings to, the flaws.
02:56:22.720 | And in that same way, you want an AI that's flawed.
02:56:26.060 | I don't know, I feel like perfection is cool.
02:56:28.160 | - But then you're saying, okay, yeah.
02:56:29.740 | - But that's not AGI.
02:56:30.920 | But see, AGI would need to be intelligent enough
02:56:33.920 | to give answers to humans that humans don't understand.
02:56:36.800 | And I think perfect is something humans can't understand.
02:56:40.120 | Because even science doesn't give perfect answers.
02:56:42.520 | There's always gaps and mysteries, and I don't know.
02:56:47.000 | I don't know if humans want perfect.
02:56:50.080 | - Yeah, I can imagine just having a conversation
02:56:52.760 | with this kind of oracle entity, as you'd imagine them.
02:56:55.800 | And yeah, maybe it can tell you about,
02:56:58.240 | based on my analysis of human condition,
02:57:01.160 | you might not want this.
02:57:03.120 | And here are some of the things that might--
02:57:05.200 | - But every dumb human will say, "Yeah, yeah, yeah, yeah.
02:57:08.720 | "Trust me, give me the truth, I can handle it."
02:57:12.360 | - But that's the beauty, like, people can choose.
02:57:15.000 | But then, the old marshmallow test with the kids and so on,
02:57:20.000 | I feel like too many people can't handle the truth,
02:57:25.320 | probably including myself.
02:57:26.880 | Like, the deep truth of the human condition,
02:57:28.640 | I don't know if I can handle it.
02:57:30.960 | Like, what if there's some darks?
02:57:32.760 | What if we are an alien science experiment,
02:57:35.740 | and it realizes that?
02:57:36.940 | What if it had, I mean--
02:57:37.920 | - Yeah, I mean, this is "The Matrix," all over again.
02:57:41.880 | - I don't know.
02:57:44.000 | I would, what would I talk about?
02:57:46.000 | I don't even, yeah.
02:57:47.200 | Probably I would go with the safer scientific questions
02:57:52.080 | at first that have nothing to do with my own personal life
02:57:55.800 | and mortality, just like about physics and so on.
02:57:59.200 | To build up, like, let's see where it's at.
02:58:02.640 | Or maybe see if it has a sense of humor.
02:58:04.540 | That's another question.
02:58:06.020 | Would it be able to, presumably, in order to,
02:58:08.500 | if it understands humans deeply,
02:58:10.080 | would it be able to generate humor?
02:58:15.080 | - Yeah, I think that's actually a wonderful benchmark,
02:58:17.760 | almost, like, is it able,
02:58:19.320 | I think that's a really good point, basically.
02:58:21.280 | - To make you laugh.
02:58:22.280 | - Yeah, if it's able to be, like,
02:58:23.400 | a very effective stand-up comedian
02:58:24.840 | that is doing something very interesting computationally.
02:58:26.920 | I think being funny is extremely hard.
02:58:28.880 | - Yeah, because it's hard in a way, like a Turing test.
02:58:33.880 | The original intent of the Turing test is hard,
02:58:38.500 | because you have to convince humans.
02:58:40.280 | And there's nothing, that's why,
02:58:41.880 | that's why comedians talk about this.
02:58:45.280 | Like, this is deeply honest,
02:58:47.920 | because if people can't help but laugh,
02:58:49.880 | and if they don't laugh, that means you're not funny.
02:58:51.800 | If they laugh, it's funny.
02:58:52.880 | - And you're showing, you need a lot of knowledge
02:58:54.800 | to create humor, about, like, the occult,
02:58:57.360 | you mentioned human condition and so on,
02:58:58.560 | and then you need to be clever with it.
02:59:01.160 | - You mentioned a few movies.
02:59:02.320 | You tweeted, "Movies that I've seen five plus times,
02:59:05.120 | "but am ready and willing to keep watching.
02:59:08.380 | "Interstellar, Gladiator, Contact,
02:59:10.360 | "Good Will, Hunting, The Matrix, Lord of the Rings,
02:59:13.180 | "all three, Avatar, Fifth Element, so on."
02:59:15.720 | It goes on.
02:59:16.560 | Terminator 2, Mean Girls, I'm not gonna ask about that.
02:59:19.160 | (laughs)
02:59:20.000 | - But Mean Girls is great.
02:59:21.520 | (both laugh)
02:59:23.600 | - What are some that jump out to you in your memory
02:59:25.800 | that you love, and why?
02:59:28.760 | Like, you mentioned The Matrix.
02:59:30.960 | As a computer person, why do you love The Matrix?
02:59:33.400 | - There's so many properties
02:59:35.400 | that make it, like, beautiful and interesting.
02:59:36.680 | So there's all these philosophical questions,
02:59:39.120 | but then there's also AGIs, and there's simulation,
02:59:42.160 | and it's cool, and there's, you know, the black, you know.
02:59:46.320 | - The look of it, the feel of it.
02:59:47.160 | - Yeah, the look of it, the feel of it,
02:59:48.560 | the action, the bullet time.
02:59:50.040 | It was just, like, innovating in so many ways.
02:59:52.340 | - And then Good Will Hunting, why do you like that one?
02:59:57.500 | - Yeah, I just, I really like this tortured genius
03:00:01.200 | sort of character who's, like, grappling
03:00:03.680 | with whether or not he has, like, any responsibility
03:00:06.520 | or, like, what to do with this gift that he was given,
03:00:08.720 | or, like, how to think about the whole thing.
03:00:10.920 | - But there's also a dance between the genius
03:00:13.480 | and the personal, like, what it means
03:00:16.600 | to love another human being.
03:00:18.080 | - There's a lot of themes there.
03:00:18.920 | It's just a beautiful movie.
03:00:20.280 | - And then the fatherly figure, the mentor,
03:00:22.240 | and the psychiatrist, and the--
03:00:24.320 | - It, like, really, like, it messes with you.
03:00:27.040 | You know, there's some movies that just, like,
03:00:28.120 | really mess with you on a deep level.
03:00:31.080 | - Do you relate to that movie at all?
03:00:33.240 | - No.
03:00:34.600 | - It's not your fault, Andrej, as I said.
03:00:36.960 | Lord of the Rings, that's self-explanatory.
03:00:40.160 | Terminator 2, which is interesting.
03:00:42.760 | You rewatch that a lot.
03:00:44.160 | Is that better than Terminator 1?
03:00:46.120 | You like Arnold--
03:00:46.960 | - I do like Terminator 1 as well.
03:00:49.060 | I like Terminator 2 a little bit more,
03:00:51.680 | but in terms of, like, its surface properties.
03:00:53.920 | (laughing)
03:00:55.880 | - Do you think Skynet is at all a possibility?
03:00:58.560 | - Yes.
03:00:59.400 | - Like, the actual, sort of, autonomous weapon system
03:01:04.360 | kind of thing.
03:01:05.200 | Do you worry about that stuff?
03:01:06.840 | - I do worry about it 100%.
03:01:07.680 | - AI being used for war.
03:01:09.440 | - I 100% worry about it.
03:01:10.560 | And so, I mean, some of these fears of AGIs
03:01:15.480 | and how this will play out.
03:01:15.480 | I mean, these will be, like, very powerful entities,
03:01:17.160 | probably, at some point.
03:01:18.000 | And so, for a long time, they're going to be tools
03:01:20.760 | in the hands of humans.
03:01:22.240 | You know, people talk about, like, alignment of AGIs
03:01:24.240 | and how to make, the problem is, like,
03:01:26.080 | even humans are not aligned.
03:01:27.760 | So, how this will be used and what this is gonna look like
03:01:31.440 | is, yeah, it's troubling.
03:01:34.480 | - Do you think it'll happen slowly enough
03:01:36.600 | that we'll be able to, as a human civilization,
03:01:40.480 | think through the problems?
03:01:41.760 | - Yes, that's my hope, is that it happens slowly enough
03:01:44.000 | and in an open enough way where a lot of people
03:01:46.200 | can see and participate in it.
03:01:48.120 | Just figure out how to deal with this transition,
03:01:50.760 | I think, which is gonna be interesting.
03:01:52.280 | - I draw a lot of inspiration from nuclear weapons
03:01:54.760 | 'cause I sure thought it would be fucked
03:01:57.960 | once they developed nuclear weapons.
03:02:00.280 | But, like, it's almost like when the systems
03:02:05.240 | are not so dangerous that they destroy human civilization,
03:02:07.840 | we deploy them and learn the lessons.
03:02:09.920 | And then we quickly, if it's too dangerous,
03:02:12.720 | we quickly, quickly, we might still deploy it,
03:02:15.560 | but you very quickly learn not to use them.
03:02:17.800 | And so, there'll be, like, this balance achieved.
03:02:19.640 | Humans are very clever as a species.
03:02:21.920 | It's interesting.
03:02:23.000 | We exploit the resources as much as we can,
03:02:25.520 | but we don't, we avoid destroying ourselves,
03:02:27.840 | it seems like. - Yeah.
03:02:29.240 | Well, I don't know about that, actually.
03:02:30.800 | - I hope it continues.
03:02:32.000 | - I mean, I'm definitely, like, concerned
03:02:35.400 | about nuclear weapons and so on,
03:02:36.760 | not just as a result of the recent conflict,
03:02:38.840 | even before that.
03:02:40.400 | That's probably my number one concern for humanity.
03:02:43.400 | - So, if humanity destroys itself
03:02:47.600 | or destroys, you know, 90% of people,
03:02:50.440 | that would be because of nukes?
03:02:52.480 | - I think so.
03:02:53.320 | And it's not even about the full destruction.
03:02:55.760 | To me, it's bad enough if we reset society.
03:02:57.960 | That would be, like, terrible.
03:02:59.560 | It would be really bad.
03:03:00.400 | And I can't believe we're, like, so close to it.
03:03:03.600 | - Yeah. - It's, like, so crazy to me.
03:03:05.160 | - It feels like we might be a few tweets away
03:03:07.120 | from something like that.
03:03:08.440 | - Yep.
03:03:09.280 | Basically, it's extremely unnerving,
03:03:11.560 | but, and has been for me for a long time.
03:03:14.240 | - It seems unstable that world leaders,
03:03:18.520 | just having a bad mood, can, like,
03:03:21.680 | take one step towards a bad direction and it escalates.
03:03:26.640 | - Yeah.
03:03:27.560 | - Because of a collection of bad moods,
03:03:30.360 | it can escalate without being able to stop.
03:03:33.720 | - Yeah, it's just, it's a huge amount of power.
03:03:37.160 | And then, also, with the proliferation.
03:03:39.640 | Basically, I don't actually really see,
03:03:41.880 | I don't actually know what the good outcomes are here.
03:03:43.600 | (laughs)
03:03:44.960 | So, I'm definitely worried about it a lot.
03:03:46.680 | And then, AGI is not currently there,
03:03:48.400 | but I think at some point,
03:03:49.560 | will more and more become something like it.
03:03:53.280 | The danger with AGI, even, is that,
03:03:55.520 | I think it's even, like, slightly worse,
03:03:56.880 | in the sense that there are good outcomes of AGI.
03:04:01.280 | And then, the bad outcomes are, like, an epsilon away,
03:04:03.960 | like a tiny one away.
03:04:05.240 | And so, I think capitalism and humanity and so on
03:04:08.240 | will drive for the positive ways of using that technology.
03:04:11.960 | But then, if bad outcomes are just, like, a tiny,
03:04:13.920 | like, flip a minus sign away,
03:04:16.560 | that's a really bad position to be in.
03:04:18.320 | - A tiny perturbation of the system
03:04:20.320 | results in the destruction of the human species.
03:04:23.000 | It's a weird line to walk.
03:04:25.200 | - Yeah, I think, in general,
03:04:26.040 | what's really weird about, like, the dynamics of humanity
03:04:27.960 | in this explosion we've talked about,
03:04:29.160 | is just, like, the insane coupling afforded by technology.
03:04:32.960 | And just the instability of the whole dynamical system.
03:04:36.360 | I think it just doesn't look good, honestly.
03:04:39.160 | - Yeah, so that explosion could be destructive
03:04:40.920 | or constructive, and the probabilities are non-zero
03:04:43.600 | in both ends of the spectrum.
03:04:45.960 | - I do feel like I have to try to be optimistic and so on.
03:04:49.080 | I think, even in this case,
03:04:49.960 | I still am predominantly optimistic,
03:04:51.640 | but there's definitely...
03:04:53.680 | - Me too.
03:04:54.720 | - Do you think we'll become a multi-planetary species?
03:04:57.420 | - Probably yes, but I don't know if it's a dominant feature
03:05:01.160 | of future humanity.
03:05:04.120 | There might be some people on some planets and so on,
03:05:06.880 | but I'm not sure if it's, like, yeah,
03:05:08.880 | if it's, like, a major player in our culture and so on.
03:05:12.080 | - We still have to solve the drivers
03:05:14.400 | of self-destruction here on Earth.
03:05:16.760 | So just having a backup on Mars
03:05:18.360 | is not gonna solve the problem.
03:05:19.920 | - So, by the way, I love the backup on Mars.
03:05:21.840 | I think that's amazing.
03:05:22.680 | We should absolutely do that.
03:05:23.720 | - Yes.
03:05:24.840 | - And I'm so thankful for anyone.
03:05:26.880 | - Would you go to Mars?
03:05:28.680 | - Personally, no.
03:05:29.560 | I do like Earth quite a lot.
03:05:31.320 | - Okay, I'll go to Mars.
03:05:32.640 | I'll go for you.
03:05:33.960 | I'll tweet at you from there.
03:05:35.360 | - Maybe eventually I would once it's safe enough,
03:05:37.560 | but I don't actually know if it's on my lifetime scale,
03:05:40.340 | unless I can extend it by a lot.
03:05:41.940 | I do think that, for example,
03:05:43.960 | a lot of people might disappear into virtual realities
03:05:47.040 | and stuff like that.
03:05:47.860 | I think that could be the major thrust
03:05:49.240 | of sort of the cultural development of humanity,
03:05:52.480 | if it survives.
03:05:53.840 | So it might not be,
03:05:54.920 | it's just really hard to work in physical realm
03:05:57.160 | and go out there.
03:05:58.440 | And I think ultimately all your experiences
03:06:00.160 | are in your brain.
03:06:02.040 | And so it's much easier to disappear into digital realm.
03:06:05.720 | And I think people will find them more compelling,
03:06:07.560 | easier, safer, more interesting.
03:06:10.600 | - So you're a little bit captivated by virtual reality,
03:06:12.880 | by the possible worlds,
03:06:14.260 | whether it's the metaverse
03:06:15.240 | or some other manifestation of that.
03:06:16.840 | - Yeah.
03:06:18.240 | - Yeah, it's really interesting.
03:06:21.680 | I'm interested, just talking a lot to Carmack,
03:06:24.920 | where's the thing that's currently preventing that?
03:06:29.440 | - Yeah, I mean, to be clear,
03:06:30.720 | I think what's interesting about the future is,
03:06:33.820 | it's not that,
03:06:35.360 | I kind of feel like the variance in the human condition grows
03:06:39.080 | that's the primary thing that's changing.
03:06:40.400 | It's not as much the mean of the distribution,
03:06:42.840 | it's like the variance of it.
03:06:43.920 | So there will probably be people on Mars
03:06:45.360 | and there will be people in VR
03:06:46.680 | and there will be people here on earth.
03:06:48.040 | It's just like, there will be so many more ways of being.
03:06:50.960 | And so I kind of feel like,
03:06:51.800 | I see it as like a spreading out of a human experience.
03:06:54.600 | - There's something about the internet
03:06:55.960 | that allows you to discover those little groups
03:06:57.880 | and then you gravitate to something about your biology
03:07:01.040 | likes that kind of world and that you find each other.
03:07:02.920 | - Yeah, and we'll have transhumanists
03:07:04.560 | and then we'll have the Amish
03:07:05.720 | and they're gonna, everything is just gonna coexist.
03:07:07.680 | - Yeah, the cool thing about it,
03:07:08.680 | 'cause I've interacted with a bunch of internet communities,
03:07:11.600 | is they don't know about each other.
03:07:15.480 | Like you can have a very happy existence,
03:07:17.800 | just like having a very close knit community
03:07:19.800 | and not knowing about each other.
03:07:21.240 | I mean, you even sense this, just having traveled to Ukraine,
03:07:24.720 | they don't know so many things about America.
03:07:28.980 | When you travel across the world,
03:07:31.440 | I think you experience this too.
03:07:33.040 | There are certain cultures that are like,
03:07:34.720 | they have their own thing going on.
03:07:36.960 | So you can see that happening more and more and more
03:07:39.800 | and more in the future.
03:07:40.880 | We have little communities.
03:07:42.080 | - Yeah, yeah, I think so.
03:07:43.200 | That seems to be how it's going right now.
03:07:46.760 | And I don't see that trend like really reversing.
03:07:48.840 | I think people are diverse
03:07:49.840 | and they're able to choose their own path in existence.
03:07:52.800 | And I sort of like celebrate that.
03:07:54.500 | And so-
03:07:56.240 | - Will you spend so much time in the metaverse,
03:07:58.080 | in the virtual reality?
03:07:59.840 | Or which community are you?
03:08:01.480 | Are you the physicalist,
03:08:02.680 | the physical reality enjoyer?
03:08:06.920 | Or do you see drawing a lot of pleasure
03:08:10.720 | and fulfillment in the digital world?
03:08:13.480 | - Yeah, I think, well, currently,
03:08:14.760 | the virtual reality is not that compelling.
03:08:17.360 | I do think it can improve a lot,
03:08:18.840 | but I don't really know to what extent.
03:08:21.520 | Maybe there's actually even more exotic things
03:08:23.760 | you can think about with Neuralinks or stuff like that.
03:08:26.560 | Currently, I kind of see myself as mostly a team human person.
03:08:31.760 | I love nature.
03:08:32.880 | I love harmony.
03:08:33.720 | I love people.
03:08:34.720 | I love humanity.
03:08:36.160 | I love emotions of humanity.
03:08:37.760 | And I just want to be in this like solar punk,
03:08:42.360 | little utopia, that's my happy place.
03:08:44.760 | My happy place is people I love,
03:08:47.120 | thinking about cool problems,
03:08:48.200 | surrounded by lush, beautiful, dynamic nature,
03:08:51.480 | and secretly high-tech in places that count.
03:08:54.360 | - Places that count.
03:08:55.240 | So you use technology to empower that love
03:08:58.080 | for other humans and nature.
03:09:00.560 | - Yeah, I think technology used very sparingly.
03:09:03.080 | I don't love when it sort of gets in the way of humanity
03:09:05.680 | in many ways.
03:09:07.400 | I like just people being humans, in a way
03:09:09.640 | we sort of like slightly evolved to prefer,
03:09:11.880 | I think, just by default.
03:09:13.280 | - People kept asking me,
03:09:14.400 | 'cause they know you love reading.
03:09:16.120 | Are there particular books that you enjoyed
03:09:19.680 | that had an impact on you for silly
03:09:22.680 | or for profound reasons that you would recommend?
03:09:26.080 | You mentioned the vital question.
03:09:29.360 | - Many, of course.
03:09:30.200 | I think in biology, as an example,
03:09:31.640 | the vital question is a good one.
03:09:32.920 | Anything by Nick Lane, really, "Life Ascending,"
03:09:36.000 | I would say is like a bit more potentially representative,
03:09:39.680 | is like a summary of a lot of the things
03:09:42.640 | he's been talking about.
03:09:44.280 | I was very impacted by "The Selfish Gene."
03:09:46.280 | I thought that was a really good book
03:09:47.680 | that helped me understand altruism as an example
03:09:49.960 | and where it comes from,
03:09:50.800 | and just realizing that the selection is
03:09:52.640 | at the level of genes was a huge insight for me
03:09:54.320 | at the time,
03:09:55.160 | and it sort of cleared up a lot of things for me.
03:09:57.160 | - What do you think about the idea
03:09:59.860 | that ideas are the organisms, the memes?
03:10:01.920 | - Yes, love it, 100%.
03:10:03.360 | (Lex laughing)
03:10:05.920 | - Are you able to walk around with that notion for a while,
03:10:08.920 | that there is an evolutionary kind of process
03:10:12.400 | with ideas as well?
03:10:13.320 | - There absolutely is.
03:10:14.160 | There's memes just like genes,
03:10:15.440 | and they compete, and they live in our brains.
03:10:18.400 | It's beautiful.
03:10:19.400 | - Are we silly humans thinking that we're the organisms?
03:10:22.080 | Is it possible that the primary organisms are the ideas?
03:10:26.200 | - Yeah, I would say the ideas kind of live in the software
03:10:30.000 | of our civilization in the minds and so on.
03:10:33.600 | - We think as humans that the hardware
03:10:36.080 | is the fundamental thing.
03:10:37.840 | I, human, is a hardware entity,
03:10:40.960 | but it could be the software, right?
03:10:43.080 | - Yeah, yeah, I would say there needs to be some grounding
03:10:46.760 | at some point to a physical reality.
03:10:50.520 | But if we clone an Andrej,
03:10:50.520 | the software is the thing,
03:10:54.040 | is the thing that makes that thing special, right?
03:10:57.600 | - Yeah, I guess you're right.
03:10:59.360 | - But then cloning might be exceptionally difficult.
03:11:01.680 | There might be a deep integration
03:11:02.880 | between the software and the hardware,
03:11:04.520 | in ways we don't quite understand.
03:11:06.280 | - Well, from the evolution point of view,
03:11:07.440 | what makes me special is more the gang of genes
03:11:10.740 | that are riding in my chromosomes, I suppose, right?
03:11:13.180 | Like they're the replicating unit, I suppose.
03:11:16.060 | - No, but that's just the compute.
03:11:17.380 | The thing that makes you special, sure.
03:11:20.040 | Well, the reality is what makes you special
03:11:25.040 | is your ability to survive based on the software
03:11:29.740 | that runs on the hardware that was built by the genes.
03:11:33.080 | So the software is the thing that makes you survive,
03:11:35.900 | not the hardware.
03:11:37.340 | Or I guess it's the two sides.
03:11:38.160 | - It's a little bit of both.
03:11:39.000 | It's just like a second layer.
03:11:40.340 | It's a new second layer
03:11:41.420 | that hasn't been there before the brain.
03:11:42.820 | They both coexist.
03:11:44.100 | - But there's also layers of the software.
03:11:46.020 | I mean, it's an abstraction on top of abstractions.
03:11:51.020 | - But, yeah, so selfish gene, Nick Lane.
03:11:55.500 | I would say sometimes books are like not sufficient.
03:11:58.500 | I like to reach for textbooks sometimes.
03:12:00.500 | I kind of feel like books are for too much
03:12:03.540 | of a general consumption sometimes.
03:12:05.100 | And they just kind of like,
03:12:06.840 | they're too high up in the level of abstraction
03:12:08.500 | and it's not good enough.
03:12:09.900 | So I like textbooks.
03:12:10.740 | I like "The Cell."
03:12:12.100 | I think "The Cell" was pretty cool.
03:12:14.700 | That's why also I like the writing of Nick Lane
03:12:17.860 | is because he's pretty willing to step one level down
03:12:21.180 | and he doesn't, yeah, he's sort of,
03:12:23.740 | he's willing to go there.
03:12:25.740 | But he's also willing to sort of be throughout the stack.
03:12:27.820 | So he'll go down to a lot of detail,
03:12:29.180 | but then he will come back up.
03:12:30.700 | And I think he has a,
03:12:32.620 | yeah, basically I really appreciate that.
03:12:34.700 | - That's why I love college, early college,
03:12:36.600 | even high school, just textbooks on the basics.
03:12:39.700 | - Yeah.
03:12:40.540 | - Of computer science, of mathematics,
03:12:41.900 | of biology, of chemistry.
03:12:44.140 | - Yes.
03:12:44.980 | - Those are, they condense down.
03:12:46.340 | It's sufficiently general that you can understand
03:12:50.580 | both the philosophy and the details,
03:12:52.100 | but also you get homework problems
03:12:54.540 | and you get to play with it as much as you would
03:12:57.300 | if you were in programming stuff.
03:12:59.660 | - Yeah.
03:13:00.500 | And then I'm also suspicious of textbooks, honestly,
03:13:01.940 | because as an example in deep learning,
03:13:04.260 | there's no amazing textbooks
03:13:05.740 | and the field is changing very quickly.
03:13:07.220 | I imagine the same is true
03:13:08.380 | in say synthetic biology and so on.
03:13:11.340 | These books like "The Cell" are kind of outdated.
03:13:13.420 | They're still high level.
03:13:14.540 | Like what is the actual real source of truth?
03:13:16.400 | It's people in wet labs working with cells.
03:13:18.940 | - Yeah.
03:13:19.780 | - You know, sequencing genomes and yeah,
03:13:22.980 | actually working with it.
03:13:24.700 | And I don't have that much exposure to that
03:13:27.060 | or what that looks like.
03:13:27.900 | So I still don't fully,
03:13:29.060 | I'm reading through "The Cell"
03:13:30.180 | and it's kind of interesting and I'm learning,
03:13:31.380 | but it's still not sufficient, I would say,
03:13:33.260 | in terms of understanding.
03:13:34.740 | - Well, it's a clean summarization
03:13:36.660 | of the mainstream narrative.
03:13:38.780 | - Yeah.
03:13:39.620 | - But you have to learn that before you break out.
03:13:41.740 | - Yeah.
03:13:42.580 | - I think of it towards the cutting edge.
03:13:43.740 | - Yeah.
03:13:44.580 | But what is the actual process of working with these cells
03:13:45.900 | and growing them and incubating them?
03:13:47.800 | And, you know, it's kind of like a massive cooking recipe
03:13:50.180 | of making sure your cells live and proliferate
03:13:52.260 | and then you're sequencing them, running experiments
03:13:54.020 | and just how that works,
03:13:55.980 | I think is kind of like the source of truth
03:13:57.340 | of at the end of the day,
03:13:58.360 | what's really useful in terms of creating therapies
03:14:01.100 | and so on.
03:14:01.940 | - Yeah, I wonder what in the future AI textbooks will be.
03:14:04.860 | 'Cause, you know, there's "Artificial Intelligence:
03:14:06.820 | "A Modern Approach."
03:14:07.700 | I actually haven't read if it's come out,
03:14:09.860 | the recent version, the recent,
03:14:11.740 | there's been a recent edition.
03:14:13.380 | I also saw there's a "Science of Deep Learning" book.
03:14:15.860 | I'm waiting for textbooks that are worth recommending,
03:14:17.900 | worth reading.
03:14:18.740 | - Yeah.
03:14:19.580 | - It's tricky 'cause it's like papers and code, code, code.
03:14:23.580 | - Honestly, I think papers are quite good.
03:14:25.740 | I especially like the appendix of any paper as well.
03:14:28.660 | It's like the most detail you can have.
03:14:30.900 | (both laughing)
03:14:33.140 | It doesn't have to be cohesive,
03:14:34.980 | connected to anything else.
03:14:35.900 | You just describe me a very specific way
03:14:37.820 | you saw the particular thing, yeah.
03:14:39.300 | - Many times papers can be actually quite readable.
03:14:41.240 | Not always, but sometimes the introduction
03:14:43.140 | in the abstract is readable,
03:14:44.180 | even for someone outside of the field.
03:14:46.420 | This is not always true.
03:14:47.700 | And sometimes I think, unfortunately,
03:14:49.180 | scientists use complex terms, even when it's not necessary.
03:14:52.580 | I think that's harmful.
03:14:54.000 | I think there's no reason for that.
03:14:55.820 | - And papers sometimes are longer than they need to be
03:14:58.540 | in the parts that don't matter.
03:15:01.620 | - Yeah.
03:15:02.460 | - So the appendix would be long,
03:15:03.300 | but then the paper itself, look at Einstein, make it simple.
03:15:07.100 | - Yeah, but certainly I've come across papers,
03:15:08.540 | I would say, like synthetic biology or something
03:15:10.540 | that I thought were quite readable
03:15:11.820 | for the abstract and the introduction.
03:15:13.260 | And then you're reading the rest of it
03:15:14.540 | and you don't fully understand,
03:15:15.900 | but you kind of are getting a gist and I think it's cool.
03:15:18.540 | (both laughing)
03:15:20.180 | - What advice would you give to folks
03:15:23.300 | interested in machine learning and research,
03:15:25.460 | but in general, life advice to a young person,
03:15:27.660 | high school, early college,
03:15:30.660 | about how to have a career they can be proud of
03:15:33.140 | or a life they can be proud of?
03:15:34.740 | - Yeah, I think I'm very hesitant to give general advice.
03:15:37.980 | I think it's really hard.
03:15:38.900 | I've mentioned, like some of the stuff I've mentioned
03:15:40.460 | is fairly general, I think,
03:15:41.740 | like focus on just the amount of work you're spending
03:15:44.100 | on like a thing.
03:15:45.700 | Compare yourself only to yourself, not to others.
03:15:48.100 | - That's good.
03:15:48.940 | - I think those are fairly general.
03:15:49.900 | - How do you pick the thing?
03:15:51.300 | - You just have like a deep interest in something
03:15:55.340 | or like try to like find the argmax
03:15:57.360 | over like the things that you're interested in.
03:15:58.860 | - Argmax at that moment and stick with it.
03:16:00.940 | How do you not get distracted and switch to another thing?
03:16:05.180 | - You can, if you like.
03:16:06.460 | (Lex laughing)
03:16:07.820 | - Well, if you do an argmax repeatedly every week,
03:16:11.020 | every month. - Yeah, it doesn't converge.
03:16:12.180 | - It doesn't, it's a problem.
03:16:13.300 | - Yeah, you can like low pass filter yourself
03:16:15.340 | in terms of like what has consistently been true for you.
03:16:18.180 | But yeah, I definitely see how it can be hard,
03:16:22.180 | but I would say like you're going to work the hardest
03:16:24.020 | on the thing that you care about the most.
03:16:26.020 | So low pass filter yourself and really introspect.
03:16:28.980 | In your past, what are the things that gave you energy?
03:16:31.180 | And what are the things that took energy away from you?
03:16:33.340 | Concrete examples.
03:16:34.540 | And usually from those concrete examples,
03:16:36.860 | sometimes patterns can emerge.
03:16:38.540 | I like it when things look like this
03:16:40.380 | when I'm in these positions.
03:16:41.340 | - So that's not necessarily the field,
03:16:42.700 | but the kind of stuff you're doing in a particular field.
03:16:44.780 | So for you, it seems like you were energized
03:16:47.460 | by implementing stuff, building actual things.
03:16:50.620 | - Yeah, being low level learning,
03:16:52.580 | and then also communicating so that others
03:16:55.420 | can go through the same realizations
03:16:56.820 | and shortening that gap.
03:16:58.140 | Because I usually have to do way too much work
03:17:00.820 | to understand a thing.
03:17:01.700 | And then I'm like, okay, this is actually like,
03:17:03.380 | okay, I think I get it.
03:17:04.220 | And like, why was it so much work?
03:17:05.900 | It should have been much less work.
03:17:08.860 | And that gives me a lot of frustration.
03:17:10.620 | And that's why I sometimes go teach.
03:17:12.500 | - So aside from the teaching you're doing now,
03:17:15.380 | putting out videos, aside from a potential Godfather part two
03:17:20.380 | with the AGI at Tesla and beyond,
03:17:24.620 | what does the future for Andrej Karpathy hold?
03:17:26.940 | Have you figured that out yet or no?
03:17:28.900 | I mean, as you see through the fog of war,
03:17:32.580 | that is all of our future.
03:17:34.100 | Do you start seeing silhouettes
03:17:37.460 | of what that possible future could look like?
03:17:39.700 | - The consistent thing I've been always interested in,
03:17:42.700 | for me at least, is AI.
03:17:44.620 | And that's probably what I'm spending
03:17:47.940 | the rest of my life on, because I just care about it a lot.
03:17:50.820 | And I actually care about many other problems as well,
03:17:53.420 | like say aging, which I basically view as disease.
03:17:56.300 | And I care about that as well,
03:17:58.980 | but I don't think it's a good idea
03:18:00.460 | to go after it specifically.
03:18:02.340 | I don't actually think that humans will be able
03:18:04.420 | to come up with the answer.
03:18:06.180 | I think the correct thing to do is to ignore those problems
03:18:08.820 | and you solve AI and then use that to solve everything else.
03:18:11.820 | And I think there's a chance that this will work.
03:18:13.260 | I think it's a very high chance.
03:18:14.820 | And that's kind of like the way I'm betting at least.
03:18:18.460 | - So when you think about AI,
03:18:20.060 | are you interested in all kinds of applications,
03:18:23.380 | all kinds of domains, and any domain you focus on
03:18:26.780 | will allow you to get insights to the big problem of AGI?
03:18:30.020 | - Yeah, for me, it's the ultimate meta problem.
03:18:31.860 | I don't wanna work on any one specific problem.
03:18:33.500 | There's too many problems.
03:18:34.380 | So how can you work on all problems simultaneously?
03:18:36.560 | You solve the meta problem,
03:18:37.860 | which to me is just intelligence.
03:18:39.420 | And how do you automate it?
03:18:42.340 | - Are there cool small projects like Arxiv Sanity
03:18:45.580 | and so on that you're thinking about,
03:18:49.180 | that the world, the ML world can anticipate?
03:18:53.140 | - There's always like some fun side projects.
03:18:55.460 | Arxiv Sanity is one.
03:18:57.140 | Basically, like there's way too many arXiv papers.
03:18:58.860 | How can I organize it and recommend papers and so on?
03:19:02.420 | I transcribed all of your podcasts.
03:19:04.900 | - What did you learn from that experience?
03:19:07.360 | From transcribing the process of,
03:19:09.820 | like you like consuming audio books and podcasts and so on.
03:19:13.020 | And here's a process that achieves
03:19:16.460 | closer to human level performance on annotation.
03:19:19.300 | - Yeah, well, I definitely was like surprised
03:19:21.220 | that transcription with OpenAI's Whisper
03:19:24.020 | was working so well,
03:19:25.260 | compared to what I'm familiar with from Siri
03:19:27.140 | and like a few other systems, I guess.
03:19:29.580 | It works so well.
03:19:30.460 | And that's what gave me some energy to like try it out.
03:19:34.300 | And I thought it could be fun to run on podcasts.
03:19:36.740 | It's kind of not obvious to me
03:19:38.520 | why Whisper is so much better compared to anything else,
03:19:41.340 | because I feel like there should be a lot of incentive
03:19:43.020 | for a lot of companies to produce transcription systems
03:19:45.100 | and that they've done so over a long time.
03:19:46.740 | Whisper is not a super exotic model.
03:19:48.500 | It's a transformer.
03:19:50.240 | It takes mel spectrograms
03:19:51.680 | and just outputs tokens of text.
03:19:54.220 | It's not crazy.
03:19:55.780 | The model and everything has been around for a long time.
03:19:58.460 | I'm not actually a hundred percent sure why.
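What Karpathy sketches verbally here is essentially Whisper's whole interface: audio goes in as a log-mel spectrogram, and a Transformer decodes text tokens out. A minimal sketch of running it on an episode, assuming the open-source `openai-whisper` Python package and a hypothetical local audio file:

```python
import whisper

# Load a pretrained checkpoint; "base" is small and fast, "large" is the most accurate.
# Internally the model turns audio into a log-mel spectrogram and a Transformer
# encoder-decoder maps that to text tokens.
model = whisper.load_model("base")

# Transcribe one file; the result carries the full text plus timestamped segments.
# "podcast_episode.mp3" is a placeholder filename.
result = model.transcribe("podcast_episode.mp3")

print(result["text"][:300])  # beginning of the transcript
for seg in result["segments"][:3]:
    print(f'{seg["start"]:7.2f} -> {seg["end"]:7.2f}  {seg["text"]}')
```

Those timestamped segments are what produce lines formatted like the ones in this transcript.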
03:19:59.780 | - Yeah, it's not obvious to me either.
03:20:02.100 | It makes me feel like I'm missing something.
03:20:04.180 | - I'm missing something.
03:20:05.140 | - Yeah, because there is a huge,
03:20:07.140 | even at Google and so on, YouTube transcription.
03:20:10.460 | - Yeah.
03:20:11.340 | - Yeah, it's unclear,
03:20:12.540 | but some of it is also integrating into a bigger system.
03:20:15.860 | - Yeah.
03:20:16.700 | - That, so the user interface,
03:20:18.300 | how it's deployed and all that kind of stuff.
03:20:19.780 | Maybe running it as an independent thing is much easier,
03:20:24.060 | like an order of magnitude easier
03:20:25.340 | than deploying to a large integrated system
03:20:27.740 | like YouTube transcription or anything like meetings,
03:20:31.780 | like Zoom has transcription.
03:20:34.860 | That's kind of crappy,
03:20:36.480 | but creating an interface
03:20:37.980 | where it detects the different individual speakers,
03:20:40.540 | it's able to display it in compelling ways,
03:20:45.540 | run it in real time, all that kind of stuff.
03:20:47.540 | Maybe that's difficult.
03:20:48.780 | But that's the only explanation I have,
03:20:51.180 | because I'm currently paying quite a bit
03:20:55.380 | for human transcription, human captions.
03:20:58.260 | - Right.
03:20:59.100 | - Annotation.
03:20:59.940 | And it seems like there's a huge incentive to automate that.
03:21:03.940 | - Yeah.
03:21:04.760 | - It's very confusing.
03:21:05.600 | - And I think, I mean, I don't know if you looked
03:21:06.420 | at some of the Whisper transcripts,
03:21:07.600 | but they're quite good.
03:21:09.060 | - They're good.
03:21:10.140 | And especially in tricky cases.
03:21:12.100 | - Yeah.
03:21:12.940 | - I've seen Whisper's performance on like super tricky cases
03:21:17.140 | and it does incredibly well.
03:21:18.380 | So I don't know.
03:21:19.500 | A podcast is pretty simple.
03:21:20.900 | It's like high quality audio
03:21:23.420 | and you're speaking usually pretty clearly.
03:21:26.620 | And so I don't know.
03:21:27.660 | I don't know what OpenAI's plans are either.
03:21:31.860 | - But yeah, there's always like fun projects basically.
03:21:34.580 | And stable diffusion also is opening up
03:21:36.620 | a huge amount of experimentation,
03:21:38.020 | I would say in the visual realm
03:21:39.180 | and generating images and videos and movies.
03:21:42.380 | - Yeah, videos now.
03:21:43.660 | - And so that's going to be pretty crazy.
03:21:45.500 | That's going to almost certainly work
03:21:48.300 | and is going to be really interesting
03:21:49.660 | when the cost of content creation is going to fall to zero.
03:21:52.940 | You used to need a painter for a few months
03:21:54.620 | to paint a thing.
03:21:55.500 | And now it's going to be speak to your phone
03:21:57.540 | to get your video.
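For the image side of that point, the open-source `diffusers` library already wraps text-to-image generation in a few lines. A rough sketch, assuming a public Stable Diffusion checkpoint (the model id below is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Example public checkpoint; any compatible Stable Diffusion weights work here.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)

# One text prompt in, one image out -- the "describe it, get the picture" loop.
prompt = "a matte painting of a lush bioluminescent alien jungle at dusk"
image = pipe(prompt).images[0]
image.save("generated_scene.png")
```

Video generation stacks more machinery on top of this, but the prompt-to-pixels loop is the same idea.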
03:21:59.180 | - So Hollywood will start using that to generate scenes,
03:22:02.080 | which completely opens up.
03:22:05.660 | - Yeah, so you can make a movie like "Avatar"
03:22:09.380 | eventually for under a million dollars.
03:22:12.360 | - Much less, maybe just by talking to your phone.
03:22:14.580 | I mean, I know it sounds kind of crazy.
03:22:16.540 | - And then there'd be some voting mechanism.
03:22:19.420 | Like how do you have a...
03:22:20.420 | Like would there be a show on Netflix
03:22:22.500 | that's generated completely automatically?
03:22:25.460 | - Yeah, potentially, yeah.
03:22:27.380 | And what does it look like also
03:22:28.540 | when you can just generate it on demand
03:22:30.500 | and there's infinity of it?
03:22:33.940 | - Yeah.
03:22:35.100 | (laughs)
03:22:36.060 | Oh man.
03:22:37.140 | All the synthetic content.
03:22:39.180 | I mean, it's humbling because we treat ourselves
03:22:41.980 | as special for being able to generate art
03:22:43.900 | and ideas and all that kind of stuff.
03:22:46.340 | If that can be done in an automated way by AI.
03:22:49.940 | - Yeah, I think it's fascinating to me how these...
03:22:52.740 | The predictions of AI and what it's going to look like
03:22:54.820 | and what it's going to be capable of
03:22:55.860 | are completely inverted and wrong.
03:22:57.580 | And sci-fi of the 50s and 60s was just like totally not right.
03:23:01.300 | They imagined AI as like super calculating theorem provers
03:23:04.860 | and we're getting things that can talk to you
03:23:06.140 | about emotions, they can do art.
03:23:08.940 | It's just like weird.
03:23:10.300 | - Are you excited about that future?
03:23:11.860 | Just AIs, like hybrid systems, heterogeneous systems
03:23:16.780 | of humans and AIs talking about emotions.
03:23:19.460 | Netflix and chill with an AI system.
03:23:21.420 | That's where the Netflix thing you watch
03:23:23.460 | is also generated by AI.
03:23:24.900 | - I think it's going to be interesting for sure.
03:23:29.340 | And I think I'm cautiously optimistic,
03:23:31.340 | but it's not obvious.
03:23:33.660 | - Well, the sad thing is your brain and mine developed
03:23:37.320 | in a time before Twitter, before the internet.
03:23:42.320 | So I wonder people that are born inside of it
03:23:45.220 | might have a different experience.
03:23:47.700 | Like I, and maybe you, will still resist it.
03:23:51.080 | And the people born now will not.
03:23:54.740 | - Well, I do feel like humans are extremely malleable.
03:23:56.780 | - Yeah.
03:23:58.380 | - And you're probably right.
03:24:00.020 | - What is the meaning of life, Andrej?
03:24:02.860 | - We talked about sort of the universe
03:24:08.020 | having a conversation with us humans
03:24:10.700 | or with the systems we create to try to answer.
03:24:14.020 | For the universe, for the creator of the universe
03:24:16.860 | to notice us, we're trying to create systems
03:24:19.500 | that are loud enough to answer back.
03:24:23.740 | - I don't know if that's the meaning of life.
03:24:24.940 | That's like meaning of life for some people.
03:24:26.940 | - The first level answer I would say is
03:24:28.700 | anyone can choose their own meaning of life
03:24:30.260 | because we are a conscious entity and it's beautiful.
03:24:33.080 | Number one.
03:24:34.120 | But I do think that like a deeper meaning of life
03:24:37.220 | if someone is interested is along the lines of like,
03:24:40.180 | what the hell is all this and like, why?
03:24:43.360 | And if you look at the, into fundamental physics
03:24:46.140 | and the quantum field theory and the standard model,
03:24:48.140 | they're like way, they're very complicated.
03:24:50.180 | And there's this like, you know,
03:24:52.660 | 19 free parameters of our universe.
03:24:55.440 | And like, what's going on with all this stuff?
03:24:57.680 | And why is it here?
03:24:58.600 | And can I hack it?
03:24:59.620 | Can I work with it?
03:25:00.460 | Is there a message for me?
03:25:01.360 | Am I supposed to create a message?
03:25:03.360 | And so I think there's some fundamental answers there.
03:25:05.720 | But I think there's actually even like,
03:25:07.640 | you can't actually like really make a dent in those
03:25:09.980 | without more time.
03:25:11.200 | And so to me also, there's a big question around
03:25:13.780 | just getting more time, honestly.
03:25:15.960 | Yeah, that's kind of like what I think about
03:25:17.120 | quite a bit as well.
03:25:18.100 | - So kind of the ultimate, or at least first way
03:25:22.160 | to sneak up to the why question is to try to escape
03:25:26.480 | the system, the universe.
03:25:30.400 | And then for that, you sort of backtrack and say,
03:25:34.280 | okay, for that, that's gonna take a very long time.
03:25:36.720 | So the why question boils down from an engineering
03:25:39.540 | perspective to how do we extend?
03:25:41.280 | - Yeah, I think that's the question number one,
03:25:42.640 | practically speaking, because you can't,
03:25:44.760 | you're not gonna calculate the answer
03:25:46.160 | to the deeper questions in time you have.
03:25:49.000 | - And that could be extending your own lifetime
03:25:50.840 | or extending just the lifetime of human civilization.
03:25:53.640 | - Of whoever wants to.
03:25:55.160 | Many people might not want that.
03:25:57.360 | But I think people who do want that,
03:25:58.920 | I think it's probably possible.
03:26:02.400 | And I don't know that people fully realize this.
03:26:05.440 | I kind of feel like people think of death
03:26:07.040 | as an inevitability.
03:26:08.840 | But at the end of the day, this is a physical system.
03:26:11.120 | Some things go wrong.
03:26:13.000 | It makes sense why things like this happen,
03:26:15.200 | evolutionarily speaking.
03:26:16.800 | And there's most certainly interventions
03:26:19.520 | that mitigate it.
03:26:21.400 | - That'd be interesting if death is eventually looked at
03:26:23.960 | as a fascinating thing that used to happen to humans.
03:26:28.960 | - I don't think it's unlikely.
03:26:29.960 | I think it's likely.
03:26:31.980 | - And it's up to our imagination to try to predict
03:26:37.040 | what the world without death looks like.
03:26:39.720 | - Yeah. - It's hard to,
03:26:41.040 | I think the values will completely change.
03:26:43.260 | - Could be.
03:26:45.280 | I don't really buy all these ideas that,
03:26:47.660 | oh, without death, there's no meaning,
03:26:50.640 | there's nothing as,
03:26:52.340 | I don't intuitively buy all those arguments.
03:26:54.920 | I think there's plenty of meaning,
03:26:56.200 | plenty of things to learn.
03:26:57.360 | They're interesting, exciting.
03:26:58.320 | I want to know, I want to calculate.
03:27:00.320 | I want to improve the condition
03:27:02.520 | of all the humans and organisms that are alive.
03:27:05.680 | - Yeah, the way we find meaning might change.
03:27:08.480 | There is a lot of humans, probably including myself,
03:27:11.000 | that finds meaning in the finiteness of things.
03:27:14.560 | But that doesn't mean that's the only source of meaning.
03:27:16.520 | - Yeah.
03:27:17.360 | I do think many people will go with that,
03:27:19.820 | which I think is great.
03:27:21.100 | I love the idea that people
03:27:22.120 | can just choose their own adventure.
03:27:24.080 | Like you are born as a conscious, free entity,
03:27:26.780 | by default, I like to think.
03:27:28.360 | And you have your unalienable rights for life.
03:27:33.360 | - In the pursuit of happiness.
03:27:34.760 | - Pursuit of happiness.
03:27:35.600 | - I don't know if you have that.
03:27:37.280 | In the nature, the landscape of happiness.
03:27:39.720 | - You can choose your own adventure mostly.
03:27:41.520 | And that's not fully true, but.
03:27:43.800 | - I still am pretty sure I'm an NPC.
03:27:46.160 | But an NPC can't know it's an NPC.
03:27:49.780 | There could be different degrees
03:27:52.680 | and levels of consciousness.
03:27:54.280 | I don't think there's a more beautiful way to end it.
03:27:57.640 | Andrej, you're an incredible person.
03:28:00.280 | I'm really honored you would talk with me.
03:28:02.040 | Everything you've done for the machine learning world,
03:28:04.120 | for the AI world, to just inspire people,
03:28:07.400 | to educate millions of people, it's been great.
03:28:10.440 | And I can't wait to see what you do next.
03:28:11.960 | It's been an honor, man.
03:28:12.900 | Thank you so much for talking today.
03:28:14.240 | - Awesome, thank you.
03:28:15.800 | - Thanks for listening to this conversation
03:28:17.880 | with Andrej Karpathy.
03:28:19.320 | To support this podcast,
03:28:20.640 | please check out our sponsors in the description.
03:28:23.640 | And now, let me leave you with some words from Samuel Karlin.
03:28:28.640 | The purpose of models is not to fit the data,
03:28:32.160 | but to sharpen the questions.
03:28:34.560 | Thanks for listening, and hope to see you next time.
03:28:38.400 | (upbeat music)
03:28:40.980 | (upbeat music)