Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333
Chapters
0:00 Introduction
0:58 Neural networks
6:01 Biology
11:32 Aliens
21:43 Universe
33:34 Transformers
41:50 Language models
52:01 Bots
58:21 Google's LaMDA
65:44 Software 2.0
76:44 Human annotation
78:41 Camera vision
83:46 Tesla's Data Engine
87:56 Tesla Vision
94:26 Elon Musk
99:33 Autonomous driving
104:28 Leaving Tesla
109:55 Tesla's Optimus
119:01 ImageNet
121:40 Data
131:31 Day in the life
144:47 Best IDE
151:53 arXiv
156:23 Advice for beginners
165:40 Artificial general intelligence
179:00 Movies
184:53 Future of human civilization
189:13 Book recommendations
195:21 Advice for young people
197:12 Future of machine learning
204:00 Meaning of life
I think it's possible that physics has exploits 00:00:03.560 |
Arranging some kind of a crazy quantum mechanical system 00:00:08.380 |
somehow gives you a rounding error in the floating point. 00:00:23.120 |
These synthetic AIs will uncover that puzzle and solve it. 00:00:30.120 |
The following is a conversation with Andrej Karpathy, 00:00:39.800 |
He is one of the greatest scientists, engineers 00:00:43.600 |
and educators in the history of artificial intelligence. 00:00:50.120 |
To support it, please check out our sponsors. 00:00:52.760 |
And now dear friends, here's Andrej Karpathy. 00:01:00.160 |
And why does it seem to do such a surprisingly 00:01:05.440 |
It's a mathematical abstraction of the brain. 00:01:10.040 |
I would say that's how it was originally developed. 00:01:12.680 |
At the end of the day, it's a mathematical expression. 00:01:14.560 |
And it's a fairly simple mathematical expression 00:01:17.380 |
It's basically a sequence of matrix multiplies, 00:01:21.400 |
which are really dot products mathematically. 00:01:25.440 |
And so it's a very simple mathematical expression. 00:01:30.880 |
And these knobs are loosely related to basically 00:01:35.840 |
And so the idea is like, we need to find the setting 00:01:37.720 |
of the knobs that makes the neural net do whatever 00:01:40.760 |
you want it to do, like classify images and so on. 00:01:43.560 |
And so there's not too much mystery, I would say in it. 00:01:45.640 |
Like you might think that, basically don't want to endow it 00:01:49.640 |
with too much meaning with respect to the brain 00:01:52.800 |
It's really just a complicated mathematical expression 00:01:55.000 |
with knobs and those knobs need a proper setting 00:01:59.280 |
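(As a rough illustration of this "knobs" framing, and not anything shown in the conversation itself: a tiny two-layer net really is just matrix multiplies plus a simple nonlinearity, and its weight entries are the knobs. The shapes and names below are arbitrary.)

```python
import numpy as np

# A tiny two-layer neural net: just matrix multiplies (dot products) plus a
# simple nonlinearity. The entries of W1, b1, W2, b2 are the "knobs".
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)  # matrix multiply, then ReLU
    return h @ W2 + b2                # another matrix multiply -> 10 class scores

x = rng.normal(size=(1, 784))         # e.g. a flattened 28x28 image
print(forward(x).shape)               # (1, 10)
```

Training is then just the search, by gradient descent, for the setting of those knobs that makes the outputs do what you want, like classify images.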
- Yeah, but poetry is just the collection of letters 00:02:02.120 |
with spaces, but it can make us feel a certain way. 00:02:05.320 |
And in that same way, when you get a large number 00:02:07.400 |
of knobs together, whether it's inside the brain 00:02:10.880 |
or inside a computer, they seem to surprise us 00:02:20.000 |
because you definitely do get very surprising emergent 00:02:23.760 |
behaviors out of these neural nets when they're large enough 00:02:28.760 |
Like say, for example, the next word prediction 00:02:33.560 |
And then these neural nets take on pretty surprising 00:02:37.760 |
Yeah, I think it's kind of interesting how much you can get 00:02:39.960 |
out of even very simple mathematical formalism. 00:02:49.120 |
- Well, it's definitely some kind of a generative model 00:02:53.520 |
So you're giving me a prompt and I'm kind of like responding 00:03:00.840 |
Like, are you adding extra prompts from your own memory 00:03:07.400 |
like you're referencing some kind of a declarative structure 00:03:12.240 |
And then you're putting that together with your prompt 00:03:17.080 |
- How much of what you just said has been said by you before? 00:03:23.600 |
- No, but if you actually look at all the words 00:03:26.000 |
you've ever said in your life and you do a search, 00:03:29.480 |
you'll probably have said a lot of the same words 00:03:35.400 |
I mean, I'm using phrases that are common, et cetera, 00:03:37.480 |
but I'm remixing it into a pretty sort of unique sentence 00:03:42.080 |
But you're right, definitely there's like a ton of remixing. 00:03:44.280 |
- Why, you didn't, it's like Magnus Carlsen said, 00:03:48.360 |
I'm rated 2,900 whatever, which is pretty decent. 00:03:55.240 |
you're not giving enough credit to your own neural nets here. 00:03:58.080 |
Why do they seem to, what's your best intuition 00:04:06.440 |
because I'm simultaneously underselling them, 00:04:08.840 |
but I also feel like there's an element to which I'm over, 00:04:12.800 |
that you can get so much emergent magical behavior 00:04:14.800 |
out of them despite them being so simple mathematically. 00:04:17.560 |
So I think those are kind of like two surprising statements 00:04:27.160 |
And when you give them a hard enough problem, 00:04:29.640 |
they are forced to learn very interesting solutions 00:04:34.080 |
And those solutions basically have these emergent properties 00:04:42.720 |
And so this representation that's in the knobs, 00:04:49.400 |
that a large number of knobs can hold a representation 00:04:52.720 |
that captures some deep wisdom about the data 00:05:00.120 |
And somehow, you know, so speaking concretely, 00:05:03.600 |
one of the neural nets that people are very excited 00:05:07.520 |
which are basically just next word prediction networks. 00:05:10.280 |
So you consume a sequence of words from the internet 00:05:15.520 |
And once you train these on a large enough dataset, 00:05:24.040 |
in arbitrary ways and you can ask them to solve problems 00:05:28.560 |
you can make it look like you're trying to solve 00:05:33.480 |
and they will continue what they think is the solution 00:05:38.760 |
very remarkably consistent, look correct potentially. 00:05:41.960 |
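(A minimal sketch of the next-word-prediction objective being described here, not any particular production model; the toy `model` below stands in for a real transformer and the names are illustrative.)

```python
import torch
import torch.nn.functional as F

# Next-token prediction: given tokens 1..T, predict tokens 2..T+1.
vocab_size, T = 50257, 8
model = torch.nn.Sequential(              # toy stand-in for a real transformer
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, T + 1))  # a chunk of internet text, tokenized
inputs, targets = tokens[:, :-1], tokens[:, 1:]    # shift by one position

logits = model(inputs)                             # (1, T, vocab_size)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                    # this one objective, trained at huge scale
```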
- Do you still think about the brain side of it? 00:05:49.560 |
you still draw wisdom from the biological neural networks? 00:05:57.760 |
so you're a big fan of biology and biological computation. 00:06:00.940 |
What impressive thing is biology doing to you 00:06:10.920 |
I'm much more hesitant with the analogies to the brain 00:06:13.400 |
than I think you would see potentially in the field. 00:06:20.640 |
is everything stemmed from inspiration by the brain. 00:06:27.360 |
they are arrived at by a very different optimization process 00:06:30.000 |
than the optimization process that gave rise to the brain. 00:06:33.800 |
I kind of think of it as a very complicated alien artifact. 00:06:39.760 |
I'm sorry, the neural nets that we're training. 00:06:49.000 |
that gave rise to it is very different from the brain. 00:06:51.720 |
So there was no multi-agent self-play kind of setup 00:07:03.440 |
- Okay, so artificial neural networks are doing compression 00:07:13.280 |
They're an agent in a multi-agent self-play system 00:07:16.840 |
that's been running for a very, very long time. 00:07:19.440 |
- That said, evolution has found that it is very useful 00:07:23.160 |
to predict and have a predictive model in the brain. 00:07:48.200 |
and it just builds it up like the entire organism 00:08:11.840 |
what do you think is the most interesting invention? 00:08:24.720 |
The origin of intelligence or highly complex intelligence? 00:08:34.680 |
- Certainly I would say it's an extremely remarkable story 00:08:38.480 |
that I'm only briefly learning about recently. 00:08:44.080 |
you almost have to start at the formation of Earth 00:08:46.400 |
and all of its conditions and the entire solar system 00:08:48.240 |
and how everything is arranged with Jupiter and the Moon 00:08:57.600 |
And then you start with abiogenesis and everything. 00:09:03.920 |
I'm not sure that I can pick a single unique piece of it 00:09:10.760 |
I guess for me as an artificial intelligence researcher, 00:09:15.320 |
We have lots of animals that are not building 00:09:26.760 |
And something very interesting happened there 00:09:50.640 |
but it was obvious, it was already written in the code 00:10:14.940 |
or the, as Richard Wrangham says, the beta males 00:10:21.080 |
deciding a clever way to kill the alpha males 00:10:25.560 |
by collaborating, so just optimizing the collaboration, 00:10:31.480 |
and that really being constrained on resources 00:10:35.000 |
and trying to survive, the collaboration aspect 00:10:44.400 |
What could possibly be a magical thing that happened, 00:10:52.760 |
is actually a really rare thing in the universe? 00:10:55.680 |
- Yeah, I'm hesitant to say that it is rare, by the way, 00:11:05.100 |
and then you have certain leaps, sparse leaps in between. 00:11:08.080 |
So of course, like origin of life would be one, 00:11:10.840 |
DNA, sex, eukaryotic system, eukaryotic life, 00:11:15.840 |
the endosymbiosis event where the archaeon ate 00:11:20.720 |
Then of course, emergence of consciousness and so on. 00:11:23.520 |
So it seems like definitely there are sparse events 00:11:32.360 |
Gotta ask you, how many intelligent alien civilizations 00:11:36.800 |
And is their intelligence different or similar to ours? 00:11:47.520 |
quite a bit recently, basically the Fermi paradox 00:11:51.440 |
And the reason actually that I am very interested 00:11:54.200 |
in the origin of life is fundamentally trying to understand 00:11:57.360 |
how common it is that there are technological societies 00:12:02.800 |
And the more I study it, the more I think that 00:12:16.920 |
what we did here on Earth is so difficult to do. 00:12:20.140 |
- Yeah, and especially when you get into the details of it, 00:12:27.200 |
but then you read books like, for example, Nick Lane's 00:12:29.900 |
"The Vital Question," "Life Ascending," et cetera. 00:12:34.260 |
And he really gets in and he really makes you believe 00:12:39.900 |
- You have an active Earth and you have your alkaline vents 00:12:44.000 |
mixing with the ocean and you have your proton gradients 00:12:47.000 |
and you have little porous pockets of these alkaline vents 00:12:51.640 |
And basically as he steps through all of these little pieces 00:12:54.980 |
you start to understand that actually this is not that crazy 00:13:11.040 |
was actually fairly fast after formation of Earth. 00:13:16.020 |
If I remember correctly, just a few hundred million years 00:13:18.740 |
or something like that after basically when it was possible 00:13:22.440 |
And so that makes me feel like that is not the constraint. 00:13:26.240 |
and that life should actually be fairly common. 00:13:35.480 |
I currently think that there's no major drop-offs basically. 00:13:46.040 |
is that we just can't see them, we can't observe them. 00:13:54.480 |
they really seem to think that the jump from bacteria 00:13:58.280 |
to more complex organisms is the hardest jump. 00:14:16.980 |
And how much time you have, surely it's not that difficult. 00:14:21.340 |
Like in a billion years is not even that long 00:14:26.140 |
Just all these bacteria under constrained resources 00:14:29.300 |
battling it out, I'm sure they can invent more complex organisms. 00:14:32.020 |
Like I don't understand, it's like how to move 00:14:34.460 |
from a Hello World program to like invent a function 00:14:45.060 |
if the origin of life, that would be my intuition, 00:14:53.140 |
And yeah, maybe we're just too dumb to see it. 00:14:55.340 |
- Well, it's just we don't have really good mechanisms 00:15:05.500 |
- I wanna meet an expert on alien intelligence 00:15:16.380 |
Their power drops off as basically one over R squared. 00:15:19.180 |
So I remember reading that our current radio waves 00:15:22.060 |
would not be, the ones that we are broadcasting 00:15:25.380 |
would not be measurable by our devices today. 00:15:28.780 |
Only like, was it like one 10th of a light year away? 00:15:33.020 |
because you really need like a targeted transmission 00:15:41.340 |
And so I just think that our ability to measure 00:15:45.020 |
I think there's probably other civilizations out there. 00:15:47.020 |
And then the big question is why don't they build 00:15:48.620 |
von Neumann probes and why don't they interstellar travel 00:15:52.460 |
And my current answer is it's probably interstellar travel 00:15:57.620 |
If you wanna move at close to the speed of light, 00:15:59.380 |
you're going to be encountering bullets along the way 00:16:04.420 |
and little particles of dust basically have 00:16:09.460 |
And so basically you need some kind of shielding. 00:16:16.020 |
And so my thinking is maybe interstellar travel 00:16:19.900 |
And you have to go very slow. - And billions of years 00:16:22.500 |
It feels like we're not a billion years away from doing that. 00:16:30.260 |
you have to go very slowly, potentially, as an example, 00:16:34.300 |
- Right, as opposed to close to the speed of light. 00:16:36.660 |
- So I'm suspicious basically of our ability to measure life 00:16:38.860 |
and I'm suspicious of the ability to just permeate 00:16:42.180 |
all of space in the galaxy or across galaxies. 00:16:44.460 |
And that's the only way that I can currently see 00:16:49.740 |
that there's trillions of intelligent alien civilizations 00:16:53.820 |
out there kind of slowly traveling through space 00:16:59.100 |
And some of them meet, some of them go to war, 00:17:08.940 |
- Well, statistically, if there's trillions of them, 00:17:13.340 |
surely some of the pockets are close enough together. 00:17:19.580 |
And then once you see something that is definitely 00:17:28.060 |
we're probably going to be severely, intensely, 00:17:30.900 |
aggressively motivated to figure out what the hell that is 00:17:35.060 |
But what would be your first instinct to try to, 00:17:38.420 |
like at a generational level, meet them or defend 00:17:47.860 |
as a president of the United States and a scientist? 00:17:51.840 |
I don't know which hat you prefer in this question. 00:17:55.520 |
- Yeah, I think the question, it's really hard. 00:18:02.760 |
we have lots of primitive life forms on earth next to us. 00:18:05.960 |
We have all kinds of ants and everything else, 00:18:14.920 |
because they are amazing, interesting, dynamical systems 00:18:20.600 |
And I don't know that you want to destroy that by default. 00:18:31.640 |
I think I'd like to preserve it if I can afford to. 00:18:36.640 |
And I'd like to think that the same would be true 00:18:38.440 |
about the galactic resources and that they would think 00:18:41.960 |
that we're kind of incredible, interesting story 00:18:44.140 |
that took time, it took a few billion years to unravel, 00:18:49.000 |
- I could see two aliens talking about earth right now 00:18:51.720 |
and saying, "I'm a big fan of complex, dynamical systems. 00:18:59.440 |
And it will basically be a video game they watch 00:19:04.200 |
- Yeah, I think you would need a very good reason, 00:19:08.800 |
Like, why don't we destroy these ant farms and so on? 00:19:22.360 |
- Well, from a scientific perspective, you might probe it. 00:19:27.560 |
- You might want to learn something from it, right? 00:19:29.520 |
So I wonder, there could be certain physical phenomena 00:19:38.440 |
- I think it should be very interesting to scientists, 00:19:45.720 |
Basically, it's a result of a huge amount of computation 00:19:58.360 |
If you had the power to do this, okay, for sure, 00:20:01.880 |
at least I would, I would pick an Earth-like planet 00:20:06.180 |
that has the conditions, based on my understanding 00:20:10.600 |
and I would seed it with life and run it, right? 00:20:14.760 |
Wouldn't you 100% do that and observe it and protect? 00:20:19.200 |
I mean, that's not just a hell of a good TV show. 00:20:29.880 |
Maybe evolution is the most, like actually running it 00:20:34.600 |
is the most efficient way to understand computation 00:20:41.280 |
- Or to understand life or what life looks like 00:20:52.920 |
Does that change anything for us, for a science experiment? 00:21:01.880 |
- I'm suspicious of this idea of like a deliberate 00:21:06.640 |
I don't see a divine intervention in some way 00:21:15.080 |
like Nick Lane's books and so on, sort of makes sense, 00:21:17.440 |
and it makes sense how life arose on Earth uniquely. 00:21:27.600 |
don't observe any divine intervention either. 00:21:32.360 |
We might just be all NPCs running a kind of code. 00:21:40.840 |
hey, this is really suspicious, what the hell? 00:21:47.880 |
"with photons for a while, you can emit a roadster." 00:21:51.660 |
So if like in "Hitchhiker's Guide to the Galaxy," 00:21:59.460 |
What do you think is all the possible stories, 00:22:30.880 |
it's pretty incredible that these self-replicating systems 00:22:37.240 |
and then they perpetuate themselves and become more complex, 00:22:39.500 |
and eventually become conscious and build a society. 00:22:50.840 |
any sufficiently well-arranged system like Earth. 00:22:53.880 |
And so I kind of feel like there's a certain sense 00:22:55.840 |
of inevitability in it, and it's really beautiful. 00:23:04.360 |
a diverse environment where complex dynamical systems 00:23:10.040 |
can evolve and become more, further and further complex. 00:23:22.640 |
- Yeah, I don't know what the terminating conditions are, 00:23:25.080 |
but definitely there's a trend line of something, 00:23:39.000 |
and we're capable of computation and, you know, 00:23:46.200 |
Like we're talking to each other through audio. 00:23:55.160 |
it's all happening over like multiple seconds. 00:24:01.920 |
at which computers operate or are able to operate on. 00:24:05.160 |
And so basically it does seem like synthetic intelligences 00:24:09.720 |
are kind of like the next stage of development. 00:24:20.600 |
And these synthetic AIs will uncover that puzzle 00:24:28.640 |
Like what, 'cause if you just like fast forward Earth, 00:24:31.600 |
many billions of years, it's like, it's quiet. 00:24:36.600 |
you see like city lights and stuff like that. 00:24:50.280 |
Will it start emitting like a giant number of like satellites? 00:25:03.240 |
and it doesn't look like it, but it's actually, 00:25:07.600 |
and life on Earth, and basically nothing happens 00:25:15.840 |
and just the whole thing happens in the last two seconds 00:25:36.240 |
it might actually look like a little explosion 00:25:42.040 |
But when you look inside the details of the explosion, 00:25:47.960 |
where there's like, yeah, human life or some kind of life. 00:25:52.120 |
- We hope it's not a destructive firecracker. 00:25:53.720 |
It's kind of like a constructive firecracker. 00:25:57.920 |
- All right, so given that, hilarious discussion. 00:26:01.080 |
- It is really interesting to think about like 00:26:03.880 |
Did the creator of the universe give us a message? 00:26:06.520 |
Like for example, in the book "Contact", Carl Sagan, 00:26:09.640 |
there's a message for any civilization in digits, 00:26:15.040 |
in the expansion of pi in base 11 eventually, 00:26:19.800 |
Maybe we're supposed to be giving a message to our creator. 00:26:26.600 |
that alerts them to our intelligent presence here. 00:26:30.080 |
'Cause if you think about it from their perspective, 00:26:36.680 |
And like, how do you even notice that we exist? 00:26:38.520 |
You might not even be able to pick us up in that simulation. 00:26:47.520 |
- So this is like a Turing test for intelligence from Earth. 00:26:52.200 |
I mean, maybe this is like trying to complete 00:26:57.240 |
Like Earth is just, is basically sending a message back. 00:27:00.840 |
- Yeah, the puzzle is basically like alerting the creator 00:27:04.520 |
Or maybe the puzzle is just to just break out of the system 00:27:07.160 |
and just stick it to the creator in some way. 00:27:10.360 |
Basically, like if you're playing a video game, 00:27:15.400 |
and find a way to execute on the host machine, 00:27:21.440 |
I believe someone got a game of Mario to play Pong 00:27:30.800 |
and being able to execute arbitrary code in the game. 00:27:41.160 |
will eventually find the universe to be some kind of a puzzle 00:27:45.120 |
And that's kind of like the end game somehow. 00:27:47.440 |
- Do you often think about it as a simulation? 00:27:51.360 |
So as the universe being a kind of computation 00:28:01.160 |
- I think it's possible that physics has exploits 00:28:04.720 |
Arranging some kind of a crazy quantum mechanical system 00:28:09.560 |
somehow gives you a rounding error in the floating point. 00:28:16.120 |
And like more and more sophisticated exploits. 00:28:18.960 |
Those are jokes, but that could be actually very close. 00:28:21.400 |
- Yeah, we'll find some way to extract infinite energy. 00:28:23.840 |
For example, when you train reinforcement learning agents 00:28:27.800 |
and you ask them to say run quickly on the flat ground, 00:28:31.280 |
they'll end up doing all kinds of like weird things 00:28:40.920 |
the reinforcement learning optimization on that agent 00:28:42.760 |
has figured out a way to extract infinite energy 00:28:48.520 |
And they found a way to generate infinite energy 00:28:52.840 |
It's just a, it's sort of like a perverse solution. 00:28:56.120 |
And so maybe we can find something like that. 00:28:57.920 |
Maybe we can be that little dog in this physical simulation. 00:29:02.320 |
- The cracks or escapes the intended consequences 00:29:07.040 |
of the physics that the universe came up with. 00:29:09.600 |
We'll figure out some kind of shortcut to some weirdness. 00:29:12.040 |
And then, oh man, but see the problem with that weirdness 00:29:15.000 |
is the first person to discover the weirdness, 00:29:17.600 |
like sliding on the back legs, that's all we're gonna do. 00:29:21.360 |
It's very quickly becomes everybody does that thing. 00:29:26.840 |
So like the paperclip maximizer is a ridiculous idea, 00:29:31.300 |
but that very well could be what then we'll just, 00:29:35.800 |
we'll just all switch that 'cause it's so fun. 00:29:38.040 |
- Well, no person will discover it, I think, by the way. 00:29:42.400 |
of a super intelligent AGI of a third generation. 00:29:45.760 |
Like we're building the first generation AGI. 00:30:00.080 |
- And then there's no way for us to introspect 00:30:04.240 |
- I think it's very likely that these things, for example, 00:30:05.880 |
like say you have these AGIs, it's very likely, 00:30:12.160 |
where these things are just completely inert. 00:30:16.560 |
because they've probably figured out the meta game 00:30:22.080 |
They're doing something completely beyond our imagination 00:30:25.040 |
and they don't interact with simple chemical life forms. 00:30:37.140 |
- Well, it's probably puzzle solving in the universe. 00:30:38.960 |
- But inert, so can you define what it means inert? 00:30:43.000 |
So they escape the interaction with physical reality? 00:30:46.940 |
they will behave in some very strange way to us 00:30:53.360 |
because they're beyond, they're playing the meta game. 00:30:59.880 |
in some very weird ways to extract infinite energy, 00:31:03.160 |
solve the digital expansion of pi to whatever amount. 00:31:07.040 |
They will build their own little fusion reactors 00:31:10.640 |
Like they're doing something beyond comprehension 00:31:17.040 |
- What if quantum mechanics itself is the system 00:31:31.640 |
this organism and we're like trying to understand it 00:31:48.760 |
ant sitting on top of it trying to get energy from it. 00:31:52.440 |
- We're just kind of like these particles in a wave 00:31:56.400 |
and takes a universe from some kind of a Big Bang 00:31:58.960 |
to some kind of a super intelligent replicator, 00:32:20.680 |
I think maybe the laws of physics are deterministic. 00:32:25.360 |
- We just got really uncomfortable with this question. 00:32:29.280 |
Do you have anxiety about whether the universe 00:32:50.600 |
like say the collapse of the wave function, et cetera, 00:32:56.800 |
and some kind of a multi-verse theory, something, something. 00:32:59.640 |
- Okay, so why does it feel like we have a free will? 00:33:02.800 |
Like if I raise this hand, I chose to do this now. 00:33:06.040 |
That doesn't feel like a deterministic thing. 00:33:28.960 |
and you're creating a narrative for having made it. 00:33:32.120 |
- Yeah, and now we're talking about the narrative. 00:33:47.720 |
Just what cool ideas, like we made you sit back and go, 00:33:55.560 |
- Well, the one that I've been thinking about recently 00:33:57.880 |
the most probably is the transformer architecture. 00:34:07.800 |
have come and gone for different sensory modalities, 00:34:12.720 |
You would process them with different looking neural nets. 00:34:19.120 |
And you can feed it video or you can feed it images 00:34:22.440 |
or speech or text, and it just gobbles it up. 00:34:24.280 |
And it's kind of like a bit of a general purpose computer 00:34:32.000 |
And so this paper came out in 2016, I wanna say. 00:34:39.320 |
- You criticized the paper title in retrospect 00:34:41.720 |
that it wasn't, it didn't foresee the bigness of the impact 00:34:48.960 |
- Yeah, I'm not sure if the authors were aware 00:34:50.440 |
of the impact that that paper would go on to have. 00:34:53.880 |
But I think they were aware of some of the motivations 00:35:01.800 |
And so I think they had an idea that there was more 00:35:12.200 |
optimizable, efficient computer that you've proposed. 00:35:14.880 |
And maybe they didn't have all of that foresight, 00:35:26.440 |
I don't think anyone used that kind of title before, right? 00:35:30.200 |
Yeah, it's like a meme or something, basically. 00:35:40.600 |
that honestly agrees with you and prefers it this way. 00:35:49.120 |
So you want to just meme your way to greatness. 00:35:58.880 |
because it is a general-purpose differentiable computer. 00:36:01.760 |
It is simultaneously expressive in the forward pass, 00:36:05.040 |
optimizable via backpropagation gradient descent, 00:36:08.520 |
and efficient, high-parallelism compute graph." 00:36:17.360 |
from memory or in general, whatever comes to your heart? 00:36:21.000 |
- You want to have a general-purpose computer 00:36:32.720 |
And I think there's a number of design criteria 00:36:34.480 |
that sort of overlap in the Transformer simultaneously 00:36:38.920 |
And I think the authors were kind of deliberately trying 00:36:46.200 |
And so basically it's very powerful in the forward pass 00:36:50.640 |
because it's able to express very general computation 00:36:55.520 |
as sort of something that looks like message passing. 00:37:00.040 |
And these nodes get to basically look at each other 00:37:02.600 |
and each other's vectors, and they get to communicate. 00:37:16.040 |
Transformer is much more than just the attention component. 00:37:17.680 |
It's got many architectural pieces that went into it. 00:37:20.160 |
The residual connection, the way it's arranged, 00:37:25.960 |
But basically there's a message passing scheme 00:37:29.840 |
decide what's interesting and then update each other. 00:37:32.680 |
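(For reference, a minimal single-head version of that message-passing/attention step, leaving out the residual connections, layer norms, and MLPs that the full block also has; the matrix names are illustrative.)

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Each token (node) looks at every other token's vector, softmaxes over
    what it finds interesting, and updates itself with a weighted sum."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # node i's interest in node j
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the other nodes
    return weights @ V                               # the "messages" each node receives

rng = np.random.default_rng(0)
T, d = 5, 16                                         # 5 tokens, 16-dim vectors
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 16)
```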
And so I think when you get to the details of it, 00:37:37.760 |
So it can express lots of different types of algorithms 00:37:42.560 |
with the residual connections, layer normalizations, 00:37:48.720 |
because there's lots of computers that are powerful 00:37:55.080 |
which is back propagation and gradient descent. 00:38:04.880 |
you want it to run efficiently on our hardware. 00:38:06.520 |
Our hardware is a massive throughput machine like GPUs. 00:38:13.040 |
So you don't want to do lots of sequential operations. 00:38:16.840 |
And the transformer is designed with that in mind as well. 00:38:24.000 |
but also very optimizable in the backward pass. 00:38:29.280 |
support a kind of ability to learn short algorithms 00:38:33.240 |
and then gradually extend them longer during training. 00:38:37.000 |
What's the idea of learning short algorithms? 00:38:41.240 |
so basically a transformer is a series of blocks, right? 00:38:53.480 |
and then you have a number of layers arranged sequentially. 00:38:57.560 |
is because of the residual pathway in the backward pass, 00:39:00.520 |
the gradients sort of flow along it uninterrupted 00:39:04.280 |
because addition distributes the gradient equally 00:39:08.360 |
So the gradient from the supervision at the top 00:39:13.880 |
And all the residual connections are arranged 00:39:16.200 |
so that in the beginning, during initialization, 00:39:18.120 |
they contribute nothing to the residual pathway. 00:39:22.840 |
imagine the transformer is kind of like a Python function, 00:39:27.880 |
And you get to do various kinds of lines of code. 00:39:32.120 |
Say you have a hundred layers deep transformer, 00:39:35.360 |
typically they would be much shorter, say 20. 00:39:46.600 |
And I kind of feel like because of the residual pathway 00:39:54.280 |
but then the other layers can sort of kick in 00:39:57.640 |
And at the end of it, you're optimizing over an algorithm 00:40:03.920 |
because it's an entire block of a transformer. 00:40:07.680 |
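(A sketch of that residual wiring, in the pre-norm arrangement mentioned just after this; a real block also contains an attention sub-layer, omitted here to keep the example short.)

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-norm residual block (sketch). At initialization the branch output is
    small, so x + branch(x) is roughly x: gradients flow straight down the
    residual pathway, and each block can gradually "kick in" during training,
    extending the short algorithm the earlier layers have learned."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return x + self.mlp(self.norm(x))  # addition distributes the gradient equally

blocks = nn.Sequential(*[Block(64) for _ in range(12)])   # "lines of code" stacked up
print(blocks(torch.randn(2, 5, 64)).shape)                # torch.Size([2, 5, 64])
```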
is that this transformer architecture actually 00:40:11.720 |
Basically the transformer that came out in 2016 00:40:15.120 |
except you reshuffle some of the layer norms. 00:40:17.880 |
The layer normalizations have been reshuffled 00:40:25.200 |
that people have attached on it and try to improve it. 00:40:29.840 |
in simultaneously optimizing for lots of properties 00:40:34.320 |
And I think people have been trying to change it, 00:40:46.840 |
about this architecture that leads to resilience. 00:40:57.720 |
and you can feed basically arbitrary problems into it. 00:41:03.480 |
And this convergence in AI has been really interesting 00:41:09.720 |
- What else do you think could be discovered here 00:41:18.760 |
Is there something interesting we might discover 00:41:21.480 |
Like aha moments maybe has to do with memory, 00:41:24.280 |
maybe knowledge representation, that kind of stuff. 00:41:28.240 |
- Definitely the Zeitgeist today is just pushing, 00:41:32.800 |
is do not touch the transformer, touch everything else. 00:41:41.360 |
And they're basically keeping the architecture unchanged. 00:41:45.800 |
And that's how we've, that's the last five years 00:42:01.000 |
by you mentioned GPT and all the bigger and bigger 00:42:05.600 |
And what are the limits of those models, do you think? 00:42:17.560 |
Is you just download a massive amount of text data 00:42:20.000 |
from the internet and you try to predict the next word 00:42:33.120 |
Language models have actually existed for a very long time. 00:42:36.200 |
There's papers on language modeling from 2003, even earlier. 00:42:39.800 |
- Can you explain in that case what a language model is? 00:42:42.840 |
- Yeah, so language model, just basically the rough idea 00:42:45.360 |
is just predicting the next word in a sequence, 00:42:49.760 |
So there's a paper from, for example, Bengio 00:42:52.520 |
and the team from 2003, where for the first time 00:42:55.120 |
they were using a neural network to take, say, 00:42:57.920 |
like three or five words and predict the next word. 00:43:01.680 |
And they're doing this on much smaller datasets. 00:43:05.200 |
it's a multi-layer perceptron, but it's the first time 00:43:08.080 |
that a neural network has been applied in that setting. 00:43:10.240 |
But even before neural networks, there were language models, 00:43:16.800 |
So N-gram models are just count-based models. 00:43:19.760 |
So if you start to take two words and predict the third one, 00:43:26.800 |
any two-word combinations and what came next. 00:43:31.480 |
is just what you've seen the most of in the training set. 00:43:34.160 |
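(A toy count-based trigram model of the kind being described; the sentence used here is made up purely for illustration.)

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat sat on the rug".split()

# Count-based trigram model: for every two-word context, tally what came next.
counts = defaultdict(Counter)
for a, b, c in zip(text, text[1:], text[2:]):
    counts[(a, b)][c] += 1

def predict(a, b):
    """Predict whatever followed this two-word context most often in training."""
    seen = counts.get((a, b))
    return seen.most_common(1)[0][0] if seen else None

print(predict("the", "cat"))   # 'sat' -- the most frequent continuation in the counts
```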
And so language modeling has been around for a long time. 00:43:39.440 |
So really what's new or interesting or exciting 00:43:46.040 |
with a powerful enough neural net, a transformer, 00:43:56.880 |
you are in the task of predicting the next word. 00:44:04.520 |
You are multitasking understanding of chemistry, 00:44:09.760 |
Lots of things are sort of clustered in that objective. 00:44:12.120 |
It's a very simple objective, but actually you have 00:44:19.160 |
Are you, in terms of chemistry and physics and so on, 00:44:32.320 |
- Yeah, so basically it gets a thousand words 00:44:34.680 |
and it's trying to predict the thousand and first. 00:44:38.720 |
over the entire dataset available on the internet, 00:44:41.200 |
you actually have to basically kind of understand 00:44:53.840 |
like a transformer, you end up with interesting solutions. 00:44:57.560 |
And you can ask it to do all kinds of things. 00:45:04.800 |
like in-context learning, that was the big deal with GPT 00:45:07.640 |
and the original paper when they published it, 00:45:09.680 |
is that you can just sort of prompt it in various ways 00:45:13.760 |
And it will just kind of complete the sentence. 00:45:15.240 |
But in the process of just completing the sentence, 00:45:17.160 |
it's actually solving all kinds of really interesting 00:45:21.520 |
- Do you think it's doing something like understanding? 00:45:24.480 |
Like when we use the word understanding for us humans? 00:45:35.760 |
in order to predict the next word in a sequence. 00:45:38.720 |
- So it's trained on the data from the internet. 00:45:44.760 |
in terms of datasets of using data from the internet? 00:45:47.800 |
Do you think the internet has enough structured data 00:45:52.760 |
- Yes, I think the internet has a huge amount of data. 00:46:00.920 |
for having a sufficiently powerful AGI as an outcome. 00:46:04.720 |
- Of course, there is audio and video and images 00:46:08.280 |
- Yeah, so text by itself, I'm a little bit suspicious about. 00:46:10.600 |
There's a ton of things we don't put in text in writing, 00:46:14.600 |
about how the world works and the physics of it 00:46:17.240 |
We don't put that stuff in text because why would you? 00:46:20.920 |
And so text is a communication medium between humans, 00:46:22.920 |
and it's not an all-encompassing medium of knowledge 00:46:33.600 |
but we haven't trained models sufficiently across both, 00:46:39.600 |
So I think that's what a lot of people are interested in. 00:46:41.200 |
- But I wonder what that shared understanding 00:46:51.720 |
So maybe the fact that it's implied on the internet, 00:47:10.160 |
We just figure it all out by interacting with the world. 00:47:15.400 |
about the way people interact with the world. 00:47:21.520 |
- You briefly worked on a project called World of Bits, 00:47:25.320 |
training an RL system to take actions on the internet, 00:47:28.640 |
versus just consuming the internet, like we talked about. 00:47:32.240 |
Do you think there's a future for that kind of system, 00:47:34.360 |
interacting with the internet to help the learning? 00:47:36.960 |
- Yes, I think that's probably the final frontier 00:47:40.880 |
because, so as you mentioned, when I was at OpenAI, 00:47:44.480 |
I was working on this project called World of Bits, 00:47:45.960 |
and basically it was the idea of giving neural networks 00:47:52.560 |
- So basically you perceive the input of the screen pixels, 00:48:03.680 |
in images of the web browser and stuff like that. 00:48:06.520 |
And then you give the neural network the ability 00:48:10.120 |
And we were trying to get it to, for example, 00:48:11.560 |
complete bookings and interact with user interfaces. 00:48:32.440 |
And there's a universal interface in like the physical realm, 00:48:35.080 |
which in my mind is a humanoid form factor kind of thing. 00:48:41.800 |
they're kind of like a similar philosophy in some way, 00:48:45.160 |
where the physical world is designed for the human form, 00:48:48.800 |
and the digital world is designed for the human form 00:48:50.760 |
of seeing the screen and using keyboard and mouse. 00:48:56.360 |
that can basically command the digital infrastructure 00:49:01.320 |
And so it feels like a very powerful interface 00:49:06.880 |
Now, to your question as to like what I learned from that, 00:49:11.040 |
was basically too early, I think, at OpenAI at the time. 00:49:18.380 |
And the zeitgeist at that time was very different in AI 00:49:29.480 |
where neural networks were playing Atari games 00:49:32.400 |
and beating humans in some cases, AlphaGo and so on. 00:49:43.480 |
is an extremely inefficient way of training neural networks, 00:49:48.580 |
and you get some sparse rewards once in a while. 00:49:51.120 |
So you do all this stuff based on all these inputs, 00:49:53.520 |
and once in a while, you're like told you did a good thing, 00:50:02.840 |
And we saw that, I think, with Go and Dota and so on, 00:50:06.600 |
and it does work, but it's extremely inefficient, 00:50:27.200 |
where you have to stumble by the correct booking 00:50:29.400 |
in order to get a reward of you did it correctly. 00:50:31.760 |
And you're never gonna stumble by it by chance at random. 00:50:42.080 |
And you're starting from scratch at the time, 00:50:45.160 |
you don't understand pictures, images, buttons, 00:50:47.200 |
you don't understand what it means to make a booking. 00:50:49.480 |
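(A toy illustration of that sparse-reward problem, not the actual World of Bits environment: reward only arrives if a long, exact sequence of clicks happens by pure chance.)

```python
import random

# The agent must click 10 specific UI elements in order, out of 20 options each,
# and only gets reward = 1 at the very end if the whole "booking" was correct.
target = [random.randrange(20) for _ in range(10)]

def random_episode():
    actions = [random.randrange(20) for _ in range(10)]
    return 1.0 if actions == target else 0.0   # sparse reward, no hints along the way

episodes = 100_000
successes = sum(random_episode() for _ in range(episodes))
print(successes, "successes in", episodes, "episodes")
# Almost certainly 0: the chance per episode is 20**-10, so random exploration
# essentially never stumbles onto the reward it would need to learn from.
```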
But now what's happened is it is time to revisit that, 00:50:54.960 |
companies like Adept are interested in this and so on. 00:51:01.400 |
but now you're not training an agent from scratch, 00:51:23.340 |
- Should the interaction be with like the way humans see it, 00:51:28.380 |
or should be with the HTML, JavaScript and the CSS? 00:51:35.240 |
is mostly on the level of HTML, CSS and so on. 00:51:37.440 |
That's done because of computational constraints. 00:51:41.460 |
everything is designed for human visual consumption. 00:51:50.960 |
and what's a red background and all this kind of stuff, 00:51:57.240 |
and we're giving out keyboard, mouse commands, 00:52:04.680 |
Given these ideas, given how exciting they are, 00:52:13.000 |
but the bots that might be out there actually, 00:52:16.480 |
that they're interacting in interesting ways? 00:52:24.700 |
Which do you actually understand how that test works? 00:52:29.760 |
like there's a checkbox or whatever that you click. 00:52:36.440 |
- Like mouse movement, and the timing and so on. 00:52:39.960 |
- So exactly this kind of system we're talking about 00:52:56.920 |
- Oh yeah, I think it's always been a bit of an arms race, 00:53:17.580 |
how would you defend yourself in the court of law, 00:53:27.560 |
I think the society will evolve a little bit. 00:53:29.920 |
Like we might start signing, digitally signing, 00:53:32.440 |
some of our correspondence or things that we create. 00:53:51.360 |
and they'll eventually share our physical realm as well. 00:53:54.760 |
But that's kind of like the world we're going towards. 00:53:59.880 |
and it's going to be an arms race trying to detect them. 00:54:11.440 |
There's obviously a lot of malicious applications, 00:54:13.760 |
but it could also be, you know, if I was an AI, 00:54:28.040 |
People are thinking about the proof of personhood, 00:54:30.960 |
and we might start digitally signing our stuff, 00:54:36.160 |
yeah, basically some solution for proof of personhood. 00:54:40.640 |
It's just something that we haven't had to do until now. 00:54:42.640 |
But I think once the need really starts to emerge, 00:54:45.400 |
which is soon, I think people will think about it much more. 00:54:51.440 |
because obviously you can probably spoof or fake 00:55:03.320 |
- It's weird that we have like social security numbers 00:55:14.480 |
it just feels like it's gonna be very tricky, 00:55:20.320 |
'cause it seems to be pretty low cost to fake stuff. 00:55:25.880 |
for like trying to use a fake personhood proof? 00:55:30.400 |
I mean, okay, fine, you'll put a lot of AIs in jail, 00:55:32.700 |
but there'll be more AIs, like exponentially more. 00:55:38.640 |
Unless there's some kind of way to track accurately, 00:55:45.000 |
like you're not allowed to create any program 00:55:56.400 |
you'll be able to trace every single human program 00:56:02.280 |
- Yeah, maybe you have to start declaring when, 00:56:07.960 |
what are digital entities versus human entities? 00:56:14.840 |
and digital entities and something like that. 00:56:27.380 |
because all these bots suddenly have become very capable, 00:56:31.340 |
but we don't have the fences yet built up as a society. 00:56:34.100 |
But I think that doesn't seem to me intractable. 00:56:36.300 |
It's just something that we have to deal with. 00:56:40.020 |
like really crappy Twitter bots are so numerous. 00:56:43.620 |
Like is it, so I presume that the engineers at Twitter 00:56:48.860 |
So it seems like what I would infer from that 00:57:02.700 |
to false positive to removing a post by somebody 00:57:16.360 |
So maybe it's, and maybe the bots are really good 00:57:35.140 |
- But you have, yeah, that's my impression as well. 00:57:43.480 |
Maybe the number of bots is in like the trillions 00:57:55.460 |
'cause the bots I'm seeing are pretty like obvious. 00:57:57.900 |
I could write a few lines of code to catch these bots. 00:58:01.240 |
- I mean, definitely there's a lot of longing for it, 00:58:04.620 |
a sophisticated actor, you could probably create 00:58:06.620 |
a pretty good bot right now, using tools like GPTs, 00:58:12.140 |
You can generate faces that look quite good now. 00:58:35.500 |
do you think language models will achieve sentience 00:58:43.700 |
in a coal mine kind of moment, honestly, a little bit. 00:58:46.460 |
So this engineer spoke to a chatbot at Google 00:58:51.420 |
and became convinced that this bot is sentient. 00:58:55.260 |
- Yeah, asked it some existential philosophical questions. 00:58:57.860 |
- And it gave reasonable answers and looked real and so on. 00:59:13.360 |
But I think this will be increasingly harder over time. 00:59:21.120 |
will basically become, yeah, I think more and more, 00:59:29.200 |
- Like form an emotional connection to an AI chatbot. 00:59:38.760 |
A ton of text on the internet is about humans 00:59:43.720 |
So I think they have a very good understanding 00:59:45.520 |
in some sense of how people speak to each other about this. 00:59:53.400 |
There's a lot of like sci-fi from '50s and '60s 00:59:58.960 |
They are calculating cold, Vulcan-like machines. 01:00:28.960 |
And so these would just be like shit-talking AIs 01:00:44.120 |
'cause that's going to get a lot of attention. 01:01:02.760 |
So it's the objective function really defines 01:01:14.520 |
as goal-seeking agents that want to do something. 01:01:20.200 |
It's literally, a good approximation of it is 01:01:24.160 |
and you're trying to predict the thousand and first 01:01:27.400 |
And you are free to prompt it in whatever way you want. 01:01:36.080 |
And here's a conversation between you and another human, 01:01:44.800 |
with a fake psychologist who's like trying to help you. 01:01:47.240 |
And so it's still kind of like in a realm of a tool. 01:01:49.560 |
It is a, people can prompt it in arbitrary ways 01:02:07.440 |
is to get Andrej Karpathy to respond to me on Twitter, 01:02:10.000 |
when I, like I think AI might, that's the goal, 01:02:14.120 |
but it might figure out that talking shit to you, 01:02:44.960 |
so with just that simple goal, get them to respond. 01:02:50.440 |
- Yeah, I mean, you could prompt a powerful model like this 01:02:58.600 |
they're kind of on track to become these oracles. 01:03:07.680 |
They will have all kinds of gadgets and gizmos. 01:03:17.960 |
that's kind of like currently what it looks like 01:03:20.360 |
- Do you think it'll be an improvement eventually 01:03:22.760 |
over what Google is for access to human knowledge? 01:03:35.880 |
all the people, they have everything they need. 01:03:38.520 |
They have people training transformers at scale. 01:03:50.640 |
a significantly better search engine built on these tools. 01:03:53.520 |
- It's so interesting, a large company where the search, 01:04:05.920 |
To say, we're going to build a new search engine. 01:04:10.240 |
- So it's usually going to come from a startup, right? 01:04:21.880 |
maybe Bing has another shot at it, as an example. 01:04:24.520 |
- No, Microsoft Edge, 'cause we're talking offline. 01:04:27.520 |
- I mean, it definitely, it's really interesting 01:04:34.000 |
Here's webpages that look like the stuff that you have, 01:04:42.640 |
And these models basically, they've read all the texts 01:04:50.120 |
and sort of getting like a sense of like the average answer 01:05:01.040 |
I think they have a way of distilling all that knowledge 01:05:06.720 |
- Do you think of prompting as a kind of teaching 01:05:09.920 |
and learning, like this whole process, like another layer? 01:05:24.360 |
I think the way we are programming these computers now, 01:05:26.920 |
like GPTs, is converging to how you program humans. 01:05:33.200 |
I go to people and I prompt them to do things. 01:05:37.200 |
And so natural language prompt is how we program humans. 01:05:44.520 |
- So you've spoken a lot about the idea of software 2.0. 01:05:53.200 |
So quickly, like the terms, it's kind of hilarious. 01:05:56.040 |
It's like, I think Eminem once said that like, 01:06:00.280 |
if he gets annoyed by a song he's written very quickly, 01:06:32.360 |
to be written not in sort of like C++ and so on, 01:06:35.520 |
but it's written in the weights of a neural net. 01:06:39.240 |
are taking over software, the realm of software, 01:06:44.040 |
And at the time, I think not many people understood 01:06:58.440 |
this is a change in how we program computers. 01:07:03.000 |
And I saw neural nets as, this is going to take over, 01:07:07.080 |
the way we program computers is going to change, 01:07:08.840 |
it's not going to be people writing a software in C++ 01:07:14.320 |
It's going to be accumulating training sets and datasets 01:07:20.640 |
And at some point, there's going to be a compilation process 01:07:24.840 |
and the architecture specification into the binary, 01:07:35.120 |
And so I was talking about that sort of transition, 01:07:40.320 |
And I saw this sort of play out in a lot of fields, 01:07:48.240 |
People thought originally, in the '80s and so on, 01:07:55.360 |
And they had all these ideas about how the brain does it. 01:07:57.600 |
And first we detect corners, and then we detect lines, 01:08:10.320 |
okay, first we thought we were going to build everything. 01:08:18.200 |
that detect these little statistical patterns 01:08:20.800 |
And then there was a little bit of learning on top of it, 01:08:23.200 |
like a support vector machine or binary classifier 01:08:26.320 |
for cat versus dog and images on top of the features. 01:08:30.160 |
but we trained the last layer, sort of the classifier. 01:08:46.360 |
and the architecture has tons of fill in the blanks, 01:08:50.600 |
and you let the optimization write most of it. 01:09:14.680 |
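(Software 2.0 in miniature, as a hedged sketch rather than any real pipeline: the human writes the rough skeleton, i.e. the architecture, and the dataset plus gradient descent fill in the blanks, i.e. the weights. The XOR task here is just a stand-in.)

```python
import torch
import torch.nn as nn

model = nn.Sequential(               # the 1.0 part: a skeleton full of blanks
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

# The real "source code" is the dataset: inputs and desired outputs (XOR here).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):                # the "compilation": optimization writes the program
    loss = nn.functional.mse_loss(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()

print(model(X).detach().round().squeeze())   # roughly tensor([0., 1., 1., 0.])
```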
So I was trying to make those analogies in the new realm. 01:09:26.200 |
And many people originally attacked the post. 01:09:29.040 |
It actually was not well received when I wrote it. 01:09:31.680 |
And I think maybe it has something to do with the title, 01:09:39.040 |
- Yeah, so you were the director of AI at Tesla, 01:09:42.560 |
where I think this idea was really implemented at scale, 01:09:47.560 |
which is how you have engineering teams doing software 2.0. 01:09:57.680 |
of everything you just said, which is like GitHub IDEs. 01:10:06.960 |
And the data collection and the data annotation, 01:10:18.760 |
Is it debugging in the space of hyperparameters, 01:10:22.880 |
or is it also debugging in the space of data? 01:10:25.760 |
- Yeah, the way by which you program the computer 01:10:49.960 |
a lot of the datasets had to do with, for example, 01:10:59.600 |
And then here's roughly what the algorithm should look like, 01:11:05.880 |
So the specification of the architecture is like a hint 01:11:08.080 |
as to what the algorithm should roughly look like. 01:11:10.400 |
And then the fill in the blanks process of optimization 01:11:15.640 |
And then you take your neural net that was trained, 01:11:17.600 |
it gives all the right answers on your dataset, 01:11:34.880 |
is formulating a task part of the programming? 01:11:38.800 |
- How you break down a problem into a set of tasks. 01:11:44.680 |
if you look at the software running in the autopilot, 01:11:50.920 |
I would say originally a lot of it was written 01:11:57.360 |
And then gradually, there was a tiny neural net 01:11:59.760 |
that was, for example, predicting, given a single image, 01:12:05.840 |
And this neural net didn't have too much to do 01:12:09.960 |
It was making tiny predictions on an individual little image. 01:12:12.560 |
And then the rest of the system stitched it up. 01:12:16.360 |
we don't have just a single camera, we have eight cameras. 01:12:20.480 |
And so what do you do with these predictions? 01:12:22.680 |
How do you do the fusion of all that information? 01:12:29.680 |
And then we decided, okay, we don't actually want 01:12:38.200 |
We want the neural nets to write the algorithm. 01:12:45.680 |
that now take all the eight camera images simultaneously 01:12:59.400 |
And actually they don't in three dimensions around the car. 01:13:02.520 |
And now actually we don't manually fuse the predictions 01:13:08.400 |
We don't trust ourselves to write that tracker. 01:13:14.160 |
So it takes these videos now and makes those predictions. 01:13:18.360 |
and more power into the neural net, more processing. 01:13:20.600 |
And at the end of it, the eventual sort of goal 01:13:23.640 |
is to have most of the software potentially be 01:13:25.840 |
in the 2.0 land because it works significantly better. 01:13:30.000 |
Humans are just not very good at writing software basically. 01:13:32.480 |
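(To make the shape of that shift concrete, here is a heavily simplified sketch, emphatically not Tesla's actual network: features from all eight cameras are fused by one network that predicts directly in a shared output space, instead of per-camera predictions stitched together by hand.)

```python
import torch
import torch.nn as nn

class MultiCamFusion(nn.Module):
    """Illustrative only: one network consumes all eight camera images and
    predicts in a single shared frame, rather than per-camera outputs fused
    by hand-written C++ afterwards."""
    def __init__(self, d=128, n_out=256):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared per-camera features
            nn.Conv2d(3, d, kernel_size=8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_out)                # e.g. lane/occupancy targets

    def forward(self, images):                         # (batch, 8 cameras, 3, H, W)
        b, n, c, h, w = images.shape
        feats = self.backbone(images.reshape(b * n, c, h, w))
        tokens = feats.flatten(2).transpose(1, 2).reshape(b, -1, feats.shape[1])
        fused = self.fuse(tokens)                      # the cameras "talk" to each other
        return self.head(fused.mean(dim=1))

print(MultiCamFusion()(torch.randn(2, 8, 3, 128, 256)).shape)  # torch.Size([2, 256])
```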
- So the prediction is happening in this like 4D land. 01:13:46.080 |
whether it's self-supervised or manual by humans 01:13:57.880 |
and how, what is the technology of what we have available? 01:14:01.800 |
So you need a dataset of inputs, desired outputs, 01:14:06.520 |
And there are three properties of it that you need. 01:14:14.280 |
You don't want to just have a lot of correct examples 01:14:19.200 |
You need to really cover the space of possibility 01:14:21.920 |
And the more you can cover the space of possible inputs, 01:14:24.160 |
the better the algorithm will work at the end. 01:14:27.880 |
that you're collecting, curating and cleaning, 01:14:31.600 |
you can train your neural net on top of that. 01:14:35.280 |
So a lot of the work goes into cleaning those data sets. 01:14:37.240 |
Now, as you pointed out, it's probably, it could be, 01:14:40.280 |
the question is, how do you achieve a ton of, 01:14:54.240 |
And this is the truth of what actually was around. 01:14:56.400 |
There was this car, there was this car, this car. 01:15:00.440 |
There was traffic light in this three-dimensional position. 01:15:04.720 |
And so the big question that the team was solving, 01:15:06.760 |
of course, is how do you arrive at that ground truth? 01:15:12.800 |
then training a neural net on it works extremely well. 01:15:22.720 |
You can go for simulation as a source of ground truth. 01:15:25.280 |
You can also go for what we call the offline tracker 01:15:27.880 |
that we've spoken about at the AI day and so on, 01:15:31.640 |
which is basically an automatic reconstruction process 01:15:41.840 |
a three-dimensional reconstruction as an offline thing, 01:15:46.760 |
there's 10 seconds of video, this is what we saw, 01:15:49.360 |
and therefore, here's all the lane lines, cars, and so on. 01:16:04.800 |
and there's perhaps if there's any inaccuracy, 01:16:21.000 |
figure out where were the positions of all the cars, 01:16:26.880 |
and you can run all the neural nets you want, 01:16:28.440 |
and they can be very efficient, massive neural nets. 01:16:31.380 |
There can be neural nets that can't even run in the car 01:16:34.680 |
So they can be even more powerful neural nets 01:16:39.120 |
three-dimensional reconstruction, neural nets, 01:16:41.400 |
anything you want just to recover that truth, 01:16:45.240 |
- What have you learned, you said no mistakes, 01:16:52.840 |
there's like a range of things they're good at 01:17:07.400 |
Are efficient, are productive, all that kind of stuff? 01:17:09.920 |
- Yeah, so I grew the annotation team at Tesla 01:17:12.520 |
from basically zero to a thousand while I was there. 01:17:17.920 |
You know, my background is a PhD student researcher. 01:17:20.720 |
So growing that kind of an organization was pretty crazy. 01:17:29.040 |
behind the autopilot as to where you use humans. 01:17:31.680 |
Humans are very good at certain kinds of annotations. 01:17:36.600 |
They're not good at annotating cars over time 01:17:42.200 |
And so that's why we were very careful to design the tasks 01:17:46.480 |
versus things that should be left to the offline tracker. 01:17:48.960 |
Like maybe the computer will do all the triangulation 01:17:57.720 |
And so co-designing the data annotation pipeline 01:18:00.800 |
was very much the bread and butter of what I was doing daily. 01:18:04.680 |
- Do you think there's still a lot of open problems 01:18:13.560 |
machines do and the humans do what they're good at. 01:18:22.560 |
and we learned a ton about how to create these datasets. 01:18:29.120 |
I was like, I was really not sure how this would turn out. 01:18:32.760 |
But by the time I left, I was much more secure 01:18:35.120 |
and actually we sort of understand the philosophy 01:18:38.440 |
And I was pretty comfortable with where that was at the time. 01:18:41.560 |
- So what are strengths and limitations of cameras 01:18:55.120 |
most of the history of the computer vision field 01:19:00.080 |
what are the strengths and limitations of pixels, 01:19:05.680 |
- Yeah, pixels I think are a beautiful sensory, 01:19:10.440 |
The thing is like cameras are very, very cheap 01:19:12.400 |
and they provide a ton of information, ton of bits. 01:19:15.400 |
So it's a extremely cheap sensor for a ton of bits. 01:19:21.760 |
And so you get lots of megapixel images, very cheap, 01:19:27.760 |
for understanding what's actually out there in the world. 01:19:29.920 |
So vision is probably the highest bandwidth sensor. 01:19:37.000 |
- I love that pixels is a constraint on the world. 01:19:56.040 |
Therefore everything is designed for that sensor. 01:20:07.200 |
And so that's why that is the interface you want to be in, 01:20:10.120 |
talking again about these universal interfaces. 01:20:12.360 |
And that's where we actually want to measure the world 01:20:14.200 |
as well, and then develop software for that sensor. 01:20:18.040 |
- But there's other constraints on the state of the world 01:20:28.000 |
but we're referencing our understanding of human behavior 01:20:39.360 |
but it feels like we're using some kind of reasoning 01:20:48.920 |
so for how the world evolves over time, et cetera. 01:21:03.240 |
- And the question is how complex is the range 01:21:09.080 |
of possibilities that might happen in the driving task? 01:21:12.960 |
That's still, is that to you still an open problem 01:21:15.480 |
of how difficult is driving, like philosophically speaking? 01:21:29.460 |
of all these other agents and the theory of mind 01:21:31.320 |
and what they're gonna do and are they looking at you? 01:21:36.920 |
There's a lot that goes there at the full tail 01:21:42.280 |
that we have to be comfortable with it eventually. 01:21:46.240 |
I don't think those are the problems that are very common. 01:22:00.500 |
- Well, basically the sensor is extremely powerful, 01:22:06.120 |
but you still need to process that information. 01:22:08.480 |
And so going from brightnesses of these pixel values to, 01:22:15.680 |
And that's what the neural networks are fundamentally doing. 01:22:18.280 |
And so the difficulty really is in just doing 01:22:22.200 |
an extremely good job of engineering the entire pipeline, 01:22:27.280 |
having the capacity to train these neural nets, 01:22:33.720 |
So I would say just doing this in production at scale 01:22:38.540 |
- So the data engine, but also the sort of deployment 01:22:43.540 |
of the system such that it has low latency performance. 01:22:50.300 |
just making sure everything fits into the chip on the car. 01:22:53.720 |
And you have a finite budget of flops that you can perform 01:23:01.160 |
and you can squeeze in as much compute as you can 01:23:07.440 |
like new things coming from a research background 01:23:17.320 |
What kind of insights have you learned from that? 01:23:20.900 |
- Yeah, I'm not sure if there's too many insights. 01:23:31.920 |
and basically the triple back flips that the team is doing 01:23:36.740 |
to make sure it all fits and utilizes the engine. 01:23:42.220 |
And then there's all kinds of little insights 01:23:47.700 |
'cause I don't think we talked about the data engine, 01:23:53.620 |
that I think is just beautiful with humans in the loop. 01:24:13.420 |
and make sure they're large, diverse, and clean, 01:24:15.860 |
basically you have a data set that you think is good. 01:24:21.640 |
and then you observe how well it's performing. 01:24:39.740 |
because if you can now collect all those at scale, 01:24:50.020 |
And so the whole thing ends up being like a staircase 01:24:52.340 |
of improvement of perfecting your training set. 01:24:59.500 |
that are not yet represented well in the data set. 01:25:08.380 |
You can sort of think of it that way in the data. 01:25:18.780 |
What role, like how do you optimize the human system? 01:25:30.460 |
which tasks to optimize in this neural network. 01:25:33.940 |
Who's in charge of figuring out which task needs more data? 01:25:38.800 |
Can you speak to the hyperparameters, the human system? 01:25:44.460 |
- It really just comes down to extremely good execution 01:25:46.460 |
from an engineering team who knows what they're doing. 01:25:48.340 |
They understand intuitively the philosophical insights 01:25:54.260 |
and how to, again, like delegate the strategy 01:25:59.660 |
and then just making sure it's all extremely well executed. 01:26:03.640 |
is not even the philosophizing or the research 01:26:08.060 |
It's so hard when you're dealing with data at that scale. 01:26:10.760 |
- So your role in the data engine, executing well on it, 01:26:16.300 |
Is there a priority of like a vision board of saying like, 01:26:26.100 |
- Like the prioritization of tasks, is that essentially, 01:26:32.940 |
to what we are trying to achieve in the product roadmap, 01:26:35.060 |
what we're trying to, the release we're trying to get out 01:26:45.420 |
some information in aggregate about the performance 01:27:03.500 |
from an aggregate statistical analysis of data? 01:27:14.020 |
- Yeah, I think there's a ton of, it's a source of truth. 01:27:17.340 |
It's your interaction with the system and you can see it, 01:27:21.980 |
you can get a sense of it, you have an intuition for it. 01:27:26.800 |
numbers and plots and graphs are much harder. 01:27:34.260 |
it's a really powerful way is by you interacting with it. 01:27:42.880 |
he always wanted to drive the system himself. 01:27:45.200 |
He drives a lot and I wanna say almost daily. 01:27:51.760 |
You driving the system and it performing and yeah. 01:27:58.860 |
So Tesla last year removed radar from the sensor suite 01:28:04.920 |
and now just announced that it's gonna remove 01:28:07.020 |
all ultrasonic sensors relying solely on vision, 01:28:11.940 |
Does that make the perception problem harder or easier? 01:28:16.340 |
- I would almost reframe the question in some way. 01:28:25.980 |
- I wonder if a language model will ever do that 01:28:34.380 |
- Yeah, it's like a little bit of a wrong question 01:28:36.360 |
because basically you would think that these sensors 01:28:45.120 |
these sensors are actually potentially a liability 01:28:51.260 |
You need, suddenly you need to have an entire supply chain. 01:29:01.660 |
You need to source them, you need to maintain them. 01:29:03.260 |
You have to have teams that write the firmware, 01:29:06.680 |
and then you also have to incorporate and fuse them 01:29:13.620 |
And I think Elon is really good at simplify, simplify. 01:29:20.700 |
because he understands the entropy in organizations 01:29:26.020 |
the cost is high and you're not potentially seeing it 01:29:41.360 |
that it's giving you extremely useful information. 01:29:43.760 |
In this case, we looked at using it or not using it 01:30:02.900 |
Now suddenly you have a column in your SQLite 01:30:08.660 |
And then they contribute noise and entropy into everything. 01:30:28.660 |
because that is the sensor with the most bandwidth, 01:30:36.380 |
If you only have a finite amount of sort of spend 01:30:39.460 |
of focus across different facets of the system. 01:30:52.580 |
Now, of course, you don't know what the long run is. 01:30:54.420 |
And it seems to be always the right solution. 01:31:02.400 |
So what do you think about the LIDAR as a crutch debate? 01:31:15.700 |
should be about like, do you have the fleet or not? 01:31:19.380 |
about whether you can achieve a really good functioning 01:31:31.060 |
And yeah, I think similar to the radar discussion, 01:31:40.500 |
basically it doesn't offer extra information. 01:31:49.180 |
You have to be really sure that you need this sensor. 01:31:52.940 |
In this case, I basically don't think you need it. 01:31:54.980 |
And I think, honestly, I will make a stronger statement. 01:31:57.260 |
I think the others, some of the other companies 01:31:59.780 |
who are using it are probably going to drop it. 01:32:02.180 |
- Yeah, so you have to consider the sensor in the full, 01:32:17.140 |
that's able to quickly find different parts of the data 01:32:25.860 |
vision is necessary in the sense that 01:32:29.860 |
the world is designed for human visual consumption. 01:32:38.820 |
And humans, obviously, use vision to drive. 01:32:52.060 |
you have to really consider the full cost of any one sensor 01:33:02.420 |
that the other companies are forming high resolution maps 01:33:07.260 |
and constraining heavily the geographic regions 01:33:25.820 |
And they have a perfect centimeter level accuracy map 01:33:32.100 |
when we're talking about autonomy actually changing the world 01:33:36.580 |
on a global scale of autonomous systems for transportation. 01:33:40.380 |
And if you need to maintain a centimeter accurate map 01:33:42.700 |
for earth or like for many cities and keep them updated 01:33:46.100 |
it's a huge dependency that you're taking on, 01:33:51.500 |
And now you need to ask yourself, do you really need it? 01:33:57.300 |
So it's very useful to have a low level map of like, okay 01:34:04.340 |
you sort of have that high level understanding. 01:34:07.380 |
And Tesla uses a Google-Maps-like, similar kind of resolution 01:34:21.460 |
And you're not focusing on what's actually necessary 01:34:29.300 |
about engineering, about life, about yourself 01:34:32.020 |
as one human being from working with Elon Musk? 01:34:46.220 |
- So human engineering in the fight against entropy. 01:34:49.180 |
- Yeah, I think Elon is a very efficient warrior 01:34:53.500 |
in the fight against entropy in organizations. 01:34:56.180 |
- What does entropy in an organization look like exactly? 01:35:10.900 |
He basically runs the world's biggest startups, 01:35:15.220 |
Tesla, SpaceX are the world's biggest startups. 01:35:27.820 |
for streamlining processes, making everything efficient. 01:35:38.020 |
All this is a very startupy sort of seeming things 01:35:45.540 |
that also probably applies to just designing systems 01:36:03.820 |
- I do think you need someone in a powerful position 01:36:21.420 |
decision-making, just everything just crumbles. 01:36:24.300 |
If you have a big person who is also really smart 01:36:28.940 |
- So you said your favorite scene in "Interstellar" 01:36:57.940 |
but shouldn't the AI know much better than the human? 01:37:09.740 |
our initial intuition, which seems like something 01:37:17.020 |
that where the initial intuition of the community 01:37:21.780 |
and then you take it on anyway with a crazy deadline. 01:37:24.860 |
You just, from a human engineering perspective, 01:37:31.860 |
- I wouldn't say that setting impossible goals exactly 01:37:34.700 |
is a good idea, but I think setting very ambitious goals 01:37:42.100 |
which means that 10x problems are not 10x hard. 01:37:45.260 |
Usually a 10x harder problem is like two or three x 01:38:00.420 |
And it's because you fundamentally change the approach. 01:38:23.180 |
in the machine learning community are solvable? 01:38:30.380 |
I mean, there's the cliche of first principles thinking, 01:38:40.820 |
usually draw lines of what is and isn't possible? 01:38:50.500 |
is the deep learning revolution in some sense, 01:38:52.860 |
because you could be in computer vision at that time 01:38:55.860 |
when during the deep learning revolution of 2012 and so on, 01:39:00.340 |
you could be improving a computer vision stack by 10%, 01:39:03.060 |
or we can just be saying, actually, all of this is useless. 01:39:07.860 |
Well, it's probably not by tuning a HOG feature detector. 01:39:23.220 |
like a neural network that in principle works. 01:39:26.980 |
that can actually execute on that mission and make it work. 01:39:50.460 |
what do you think is the timeline to build this bridge? 01:39:55.220 |
It's, you know, it's, no one has built autonomy. 01:40:00.180 |
Some parts turn out to be much easier than others. 01:40:04.020 |
You do your best based on trend lines and so on, 01:40:11.700 |
- So even still, like being inside of it, it's hard to do. 01:40:14.980 |
- Yes, some things turn out to be much harder, 01:40:36.540 |
And now they're all kind of backtracking that prediction. 01:40:44.020 |
do you for yourself privately make predictions, 01:40:48.540 |
or do they get in the way of like your actual ability 01:40:55.060 |
what's easy to say is that this problem is tractable, 01:41:08.180 |
and it feels like at least the team at Tesla, 01:41:17.620 |
that allows you to make a prediction about tractability? 01:41:20.620 |
So like you're the leader of a lot of humans, 01:41:23.700 |
you have to kind of say, this is actually possible. 01:41:59.060 |
Like I don't have a good intuition about tractability. 01:42:09.220 |
could be simplified into something quite trivial. 01:42:12.940 |
Like the solution to the problem would be quite trivial. 01:42:16.300 |
And at scale, more and more cars driving perfectly 01:42:23.940 |
and like people learn how to drive correctly, 01:42:26.620 |
not correctly, but in a way that's more optimal 01:42:32.900 |
and semi-autonomous and manually driven cars, 01:42:37.140 |
Then again, also I've spent a ridiculous number of hours 01:42:40.540 |
just staring at pedestrians crossing streets, 01:42:45.340 |
And it feels like the way we use our eye contact, 01:42:52.740 |
And there's certain quirks and edge cases of behavior. 01:42:55.580 |
And of course, a lot of the fatalities that happen 01:42:59.740 |
and both on the pedestrian side and the driver's side. 01:43:21.700 |
I would say definitely like to use a game analogy, 01:43:25.140 |
but you definitely also see the frontier of improvement 01:43:31.340 |
And I think, for example, at least what I've seen 01:43:35.380 |
when I joined, it barely kept lane on the highway. 01:43:42.180 |
Anytime the road would do anything geometrically 01:43:44.660 |
or turn too much, it would just like not work. 01:43:47.060 |
And so going from that to like a pretty competent system 01:43:49.340 |
in five years and seeing what happens also under the hood 01:43:52.380 |
and what the scale of which the team is operating now 01:43:54.220 |
with respect to data and compute and everything else 01:44:00.340 |
- So it's, you're climbing a mountain and it's fog, 01:44:07.940 |
And you're looking at some of the remaining challenges 01:44:09.540 |
and they're not like, they're not perturbing you 01:44:18.260 |
- Yeah, the fundamental components of solving the problem 01:44:20.220 |
seem to be there from the data engine to the compute, 01:44:22.580 |
to the compute on the car, to the compute for the training, 01:44:27.240 |
So you've done, over the years you've been at Tesla, 01:44:30.420 |
you've done a lot of amazing breakthrough ideas 01:44:36.860 |
from the data engine to the human side, all of it. 01:44:40.180 |
Can you speak to why you chose to leave Tesla? 01:44:52.460 |
Most of my days were meetings and growing the organization 01:44:54.940 |
and making decisions about sort of high level strategic 01:45:02.420 |
And it's kind of like a corporate executive role 01:45:08.580 |
but it's not like fundamentally what I enjoy. 01:45:15.060 |
because Tesla was just going from the transition 01:45:19.700 |
to having to build its computer vision system. 01:45:26.580 |
the compute was at their legs, like down below, it was a workstation. 01:45:30.580 |
- They're doing some kind of basic classification task. 01:45:37.820 |
deep learning team, a massive compute cluster, 01:45:46.660 |
And so I kind of stepped away and I, you know, 01:45:49.580 |
I'm very excited to do much more technical things again. 01:45:56.580 |
'Cause you took a little time off and think like, 01:46:08.260 |
You're one of the best teachers of AI in the world. 01:46:11.540 |
You're one of the best, and I don't mean that, 01:46:15.900 |
you're one of the best tinkerers in the AI world. 01:46:23.260 |
of how something works by building it from scratch 01:46:32.060 |
Like a small example of a thing to play with, 01:46:42.360 |
like engineers and a system that actually accomplishes 01:46:47.940 |
So given all that, like what was the soul searching like? 01:46:53.380 |
I love the company a lot and I love Elon, I love Tesla. 01:47:02.440 |
But yeah, I think actually I would be potentially 01:47:13.440 |
I think Tesla is going to do incredible things. 01:47:17.020 |
it's a massive large-scale robotics kind of company 01:47:25.040 |
And I think human robots are going to be amazing. 01:47:29.200 |
I think autonomous transportation is going to be amazing. 01:47:32.920 |
So I think it's just a really amazing organization. 01:47:37.080 |
I think was very, basically I enjoyed that a lot. 01:47:39.920 |
Yeah, it was basically difficult for those reasons 01:47:46.800 |
But I felt like at this stage, I built the team, 01:47:53.200 |
and I wanted to do a lot more technical stuff. 01:47:54.760 |
I wanted to learn stuff, I wanted to teach stuff. 01:47:57.360 |
And I just kind of felt like it was a good time 01:48:01.600 |
- What do you think is the best movie sequel of all time, 01:48:10.960 |
And you tweeted about movies, so just in a tiny tangent, 01:48:14.320 |
is there, what's your, what's like a favorite movie sequel? 01:48:21.560 |
'Cause you didn't even tweet or mention the Godfather. 01:48:26.300 |
We're gonna edit out the hate towards the Godfather. 01:48:32.160 |
- I don't know why, but I basically don't like any movie 01:48:45.600 |
- No, I think Terminator Two was in the '80s. 01:48:52.180 |
I don't like movies before 1995 or something. 01:49:00.920 |
- And also, Terminator was very much ahead of its time. 01:49:03.960 |
- Yes, and the Godfather, there's like no AGI. 01:49:16.920 |
- Yeah, I guess occasionally I do enjoy movies 01:49:28.400 |
'cause I don't understand why Will Ferrell is so funny. 01:49:37.120 |
'cause you don't get that many comedies these days. 01:49:39.920 |
And I wonder if it has to do about the culture 01:49:46.280 |
with certain people in comedy that came together, 01:49:57.320 |
so what do you think about Optimus, about Tesla Bot? 01:50:00.600 |
Do you think we'll have robots in the factory 01:50:09.160 |
Who else is going to build humanoid robots at scale? 01:50:12.160 |
And I think it is a very good form factor to go after, 01:50:15.600 |
the world is designed for humanoid form factor. 01:50:17.760 |
These things would be able to operate our machines. 01:50:25.840 |
That's the form factor you want to invest into 01:50:31.280 |
which is, okay, pick a problem and design a robot to it. 01:50:39.800 |
So it makes sense to go after general interfaces 01:50:41.880 |
that, okay, they are not perfect for any one given task, 01:50:52.360 |
to go after a general interface in the physical world. 01:51:08.360 |
Like if you think transportation is a large market, 01:51:15.480 |
To me, the thing that's also exciting is social robotics. 01:51:18.680 |
So the relationship we'll have on different levels 01:51:23.360 |
That's why I was really excited to see Optimus. 01:51:26.360 |
Like people have criticized me for the excitement, 01:51:40.000 |
there's a lot of companies that do legged robots, 01:51:51.640 |
So integrating, the two big exciting things to me 01:51:54.320 |
about Tesla doing humanoid or any legged robots 01:52:05.080 |
So the actual intelligence for the perception 01:52:10.760 |
integrating into the fleet that you mentioned, right? 01:52:19.360 |
Just knowing culturally driving towards a simple robot 01:52:29.400 |
and doing that well, having experience to do that well, 01:52:41.400 |
it'll be a very long time before Tesla can achieve 01:52:52.280 |
like we talked about the data engine and the fleet. 01:53:00.560 |
that in a few months you can get a prototype. 01:53:03.600 |
- Yep, and the reason that happened very quickly 01:53:05.760 |
is as you alluded to, there's a ton of copy paste 01:53:08.480 |
from what's happening in the autopilot, a lot. 01:53:10.840 |
The amount of expertise that came out of the woodworks 01:53:12.880 |
at Tesla for building the human robot was incredible to see. 01:53:16.120 |
Like basically Elon said at one point we're doing this. 01:53:23.960 |
and people talking about like the supply chain 01:53:27.920 |
with like screwdrivers and everything like the other day 01:53:32.120 |
And I was like, whoa, like all these people exist at Tesla. 01:53:41.600 |
And also let's not forget hardware, not just for a demo, 01:53:52.200 |
basically this robot currently thinks it's a car. 01:53:56.520 |
- It's gonna have a midlife crisis at some point. 01:54:02.360 |
we were talking about potentially doing them outside 01:54:05.400 |
of the computer vision was like working out of the box 01:54:10.560 |
But all the operating system, everything just copy pastes. 01:54:17.440 |
but the approach and everything and data engine 01:54:20.800 |
about the occupancy tracker and so on, everything copy pastes. 01:54:31.520 |
And so if you were to go with the goal of like, 01:54:38.560 |
If you're Tesla, it's actually like, it's not that crazy. 01:54:42.800 |
- And then the follow up question is then how difficult, 01:54:53.640 |
the really nice thing about robotics is that, 01:54:57.840 |
unless you do manufacturing and that kind of stuff, 01:54:57.840 |
Driving is so safety critical and also time critical. 01:55:06.280 |
Like a robot is allowed to move slower, which is nice. 01:55:12.560 |
but the way you want to structure the development 01:55:14.440 |
is you need to say, okay, it's going to take a long time. 01:55:16.600 |
How can I set up the product development roadmap 01:55:22.160 |
I'm not setting myself up for a zero one loss function 01:55:27.400 |
You want to make it useful almost immediately. 01:55:35.680 |
your improvement loops, the telemetry, the evaluation, 01:55:41.200 |
And you want to improve the product over time incrementally 01:55:51.200 |
And also from the point of view of the team working on it, 01:55:58.640 |
This is going to change the world in 10 years when it works. 01:56:02.280 |
You want to be in a place like I think Autopilot is today 01:56:10.000 |
People pay for it, people like it, people purchase it. 01:56:16.360 |
- And you see that, so the dopamine for the team, 01:56:21.760 |
You're deploying this, people like it, people drive it, 01:56:27.040 |
Your grandma drives it, she gives you feedback. 01:56:32.280 |
- Do people that drive Teslas like recognize you 01:56:36.640 |
Like, "Hey, thanks for this nice feature that it's doing." 01:56:50.320 |
There's a lot of people who hate me and the team 01:56:59.760 |
- Yeah, that actually makes me sad about humans 01:57:07.760 |
I think humans want to be good to each other. 01:57:09.480 |
I think Twitter and social media is part of the mechanism 01:57:12.360 |
that actually somehow makes the negativity more viral, 01:57:16.320 |
that it doesn't deserve, like disproportionately add 01:57:23.640 |
But I wish people would just get excited about, 01:57:26.360 |
so suppress some of the jealousy, some of the ego, 01:57:34.440 |
You get excited for others, they'll get excited for you. 01:57:38.120 |
If you're not careful, there is like a dynamical system 01:57:40.600 |
there if you think of in silos and get jealous 01:57:46.080 |
that actually perhaps counterintuitively leads 01:57:59.800 |
- I think people have, depending on the industry, 01:58:04.440 |
Some people are also very negative and very vocal. 01:58:07.680 |
but actually there's a ton of people who are cheerleaders, 01:58:12.440 |
And when you talk to people just in the world, 01:58:15.840 |
they will tell you, "Oh, it's amazing, it's great." 01:58:17.560 |
Especially like people who understand how difficult it is 01:58:21.680 |
and makers, entrepreneurs, like making this work 01:58:28.600 |
Those people are more likely to cheerlead you. 01:58:39.160 |
Well, they actually sometimes don't know how difficult 01:58:41.080 |
it is to create a product that's scale, right? 01:58:45.200 |
A lot of the development of robots and AI system 01:58:57.160 |
- Yeah, I think it's really hard to work on robotics 01:59:00.000 |
- Or AI systems that apply in the real world. 01:59:02.000 |
You've criticized, you flourished and loved for a time 01:59:10.960 |
And I've recently had some words of criticism 01:59:18.600 |
gives a little too much love still to the ImageNet 01:59:23.800 |
Can you speak to the strengths and weaknesses of datasets 01:59:29.200 |
- Actually, I don't know that I recall a specific instance 01:59:35.680 |
I think ImageNet has been extremely valuable. 01:59:51.280 |
but basically it's become a bit of an MNIST at this point. 01:59:54.240 |
So MNIST is like little 28 by 28 grayscale digits. 01:59:57.720 |
It's kind of a joke dataset that everyone like crushes. 02:00:00.640 |
- There's still papers written on MNIST though, right? 02:00:02.880 |
- Maybe there shouldn't be. - Like strong papers. 02:00:17.200 |
but I think you said like ImageNet was a huge contribution 02:00:21.040 |
and now it's time to move past those kinds of- 02:00:34.840 |
And I've seen those images and it's like really high. 02:00:41.120 |
the top five error rate is now like 1% or something. 02:00:44.600 |
- Given your experience with a gigantic real world dataset, 02:00:49.680 |
in certain directions that the research community uses? 02:00:52.720 |
- Unfortunately, I don't think academics currently 02:00:55.920 |
We've obviously, I think we've crushed MNIST. 02:01:04.960 |
and uses for further development of these networks. 02:01:42.360 |
that synthetic data and game engines will play 02:01:44.720 |
in the future of neural net model development? 02:01:55.760 |
will be similar to value of simulation to humans. 02:02:01.480 |
people use simulation because they can learn something 02:02:05.480 |
and without having to actually experience it. 02:02:12.280 |
- No, sorry, simulation, I mean like video games 02:02:14.520 |
or other forms of simulation for various professionals. 02:02:21.400 |
'cause maybe there's simulation that we do in our heads. 02:02:23.920 |
Like simulate, if I do this, what do I think will happen? 02:02:31.000 |
Isn't that what we're doing as humans before we act? 02:02:37.160 |
or using simulation for training set creation or- 02:02:40.240 |
- Is it independent or is it just loosely correlated? 02:02:42.840 |
'Cause like, isn't that useful to do like counterfactual 02:02:54.960 |
What happens if there's, like those kinds of things? 02:02:58.400 |
- Yeah, that's a different simulation from like Unreal Engine. 02:03:02.320 |
- Ah, so like simulation of the average case. 02:03:11.720 |
So simulating a world, the physics of that world, 02:03:18.520 |
Like, 'cause you also can add behavior to that world 02:03:24.840 |
You could throw all kinds of weird things into it. 02:03:26.960 |
So Unreal Engine is not just about simulating, 02:03:36.440 |
and the agents that you put into the environment 02:03:53.280 |
humans use simulators and they find them useful. 02:03:55.240 |
And so computers will use simulators and find them useful. 02:04:05.840 |
about my own existence from those video games. 02:04:22.960 |
really important part of training neural nets currently. 02:04:26.960 |
But I think as neural nets become more and more powerful, 02:04:42.920 |
you need, the domain gap can be bigger, I think, 02:04:54.920 |
it will be able to leverage the synthetic data better 02:05:00.440 |
but understanding in which ways this is not real data. 02:05:11.440 |
So is it possible, do you think, speaking of MNIST, 02:05:17.280 |
to construct neural nets and training processes 02:05:26.680 |
I mean, one way to say that is like you said, 02:05:28.440 |
like the querying itself is another level of training, 02:05:46.200 |
I just think like at some point you need a massive data set. 02:05:49.040 |
And then when you pre-train your massive neural net 02:05:51.120 |
and get something that is like a GPT or something, 02:06:02.320 |
like sentiment analysis or translation or so on, 02:06:04.880 |
just by being prompted with very few examples. 02:06:14.560 |
and the neural net will complete the translation to German 02:06:16.720 |
just by looking at sort of the example you've provided. 02:06:19.920 |
And so that's an example of a very few-shot learning 02:06:33.760 |
But at some point you need a massive data set 02:06:38.880 |
And probably we humans have something like that. 02:06:50.520 |
that just runs all the time in a self-supervised way? 02:06:55.240 |
I mean, obviously we learn a lot during our lifespan, 02:07:02.080 |
that helps us at initialization coming from sort of evolution. 02:07:06.160 |
And so I think that's also a really big component. 02:07:09.760 |
I think they just talk about the amounts of like seconds 02:07:16.160 |
sort of like a zero initialization of a neural net. 02:07:22.600 |
Zebras get born and they see and they can run. 02:07:27.000 |
There's zero training data in their lifespan. 02:07:30.560 |
So somehow, I have no idea how evolution has found a way 02:07:44.200 |
- There's something magical about going from a single cell 02:07:48.000 |
to an organism that is born to the first few years of life. 02:07:59.480 |
Like it's a very difficult, challenging training process. 02:08:23.080 |
And so it's best for the system once it's trained 02:08:27.760 |
- I think it's just like the hardware for long-term memory 02:08:31.920 |
I kind of feel like the first few years of infants 02:08:35.720 |
is not actually like learning, it's brain maturing. 02:08:43.040 |
because of the birth canal and the swelling of the brain. 02:08:55.680 |
do you think neural nets can have long-term memory? 02:09:02.000 |
Do you think there needs to be another meta architecture 02:09:04.960 |
on top of it to add something like a knowledge base 02:09:22.840 |
to which you can store and retrieve data from. 02:09:26.900 |
that you find useful, just save it to your memory bank. 02:09:29.680 |
And here's an example of something you have retrieved 02:09:32.120 |
and how you say it, and here's how you load from it. 02:09:39.160 |
And then it might learn to use a memory bank from that. 02:09:48.200 |
And then everything else is just on top of it. 02:09:50.120 |
That's pretty easy to do. - It's not just text, right? 02:09:52.960 |
So you're teaching some kind of a special language 02:09:59.720 |
And you're telling it about these special tokens 02:10:01.720 |
and how to arrange them to use these interfaces. 02:10:12.640 |
a calculator will actually read out the answer 02:10:16.240 |
And you just tell it in English, this might actually work. 02:10:19.600 |
- Do you think in that sense, Gato is interesting, 02:10:21.840 |
the DeepMind system that it's not just doing language, 02:10:38.360 |
to reinforcement learning lots of different environments 02:10:46.640 |
I think it's a very sort of early result in that realm. 02:10:51.520 |
of what I think things will eventually look like. 02:10:53.440 |
- Right, so this is the early days of a system 02:11:01.560 |
all these interfaces that look very different. 02:11:04.840 |
I would want everything to be normalized into the same API. 02:11:07.360 |
So for example, screen pixels, very same API. 02:11:10.160 |
Instead of having different world environments 02:11:11.920 |
that have very different physics and joint configurations 02:11:15.600 |
and you're having some kind of special tokens 02:11:19.520 |
I'd rather just normalize everything to a single interface 02:11:25.040 |
- So it's all gonna be pixel-based pong in the end. 02:11:28.440 |
- Okay, let me ask you about your own personal life. 02:11:36.760 |
you're one of the most productive and brilliant people 02:11:51.920 |
So the perfect productive day is the thing we strive towards 02:11:55.360 |
and the average is kind of what it kind of converges to, 02:11:58.120 |
given all the mistakes and human eventualities and so on. 02:12:23.120 |
At 8 a.m. or 7 a.m., the East Coast is awake. 02:12:27.400 |
there's already some text messages, whatever. 02:12:37.800 |
and you have solid chunks of time to do work. 02:12:42.120 |
So I like those periods, night owl by default. 02:12:47.360 |
what I like to do is you need to build some momentum 02:13:01.640 |
when you're taking a shower, when you're falling asleep. 02:13:06.520 |
and you're ready to wake up and work on it right there. 02:13:08.880 |
- So is this in a scale, temporal scale of a single day 02:13:15.080 |
- So I can't talk about one day basically in isolation 02:13:31.160 |
And that's where I do most of my good work. 02:13:34.080 |
- You've done a bunch of cool little projects 02:13:36.440 |
in a very short amount of time, very quickly. 02:13:40.880 |
- Yeah, basically I need to load my working memory 02:13:47.720 |
I was struggling with this, for example, at Tesla 02:13:51.160 |
because I want to work on a small side project, 02:13:59.320 |
I ran into some stupid error because of some reason. 02:14:07.560 |
And so it's about really removing all of that barrier 02:14:12.880 |
and you have the full problem loaded in your memory. 02:14:15.400 |
- And somehow avoiding distractions of all different forms, 02:14:18.240 |
like news stories, emails, but also distractions 02:14:29.800 |
- And I mean, I can take some time off for distractions 02:14:32.080 |
and in between, but I think it can't be too much. 02:14:35.400 |
Most of your day is sort of like spent on that problem. 02:14:41.080 |
I have my morning routine, I look at some news, 02:14:43.920 |
Twitter, Hacker News, Wall Street Journal, et cetera. 02:14:47.520 |
- So basically you wake up, you have some coffee, 02:14:49.440 |
are you trying to get to work as quickly as possible? 02:14:53.000 |
of like what the hell is happening in the world first? 02:14:56.480 |
- I am, I do find it interesting to know about the world. 02:15:03.600 |
So I do read through a bunch of news articles 02:15:05.320 |
and I want to be informed and I'm suspicious of it. 02:15:12.320 |
- Oh, you mean suspicious about the positive effect 02:15:21.080 |
- And also on your ability to deeply understand the world 02:15:23.600 |
because there's a bunch of sources of information, 02:15:26.520 |
you're not really focused on deeply integrating it. 02:15:33.240 |
for how long of a stretch of time in one session 02:15:48.600 |
And yeah, but I think like it's still really hard 02:16:01.640 |
And it's just because there's so much padding, 02:16:07.240 |
There's like a cost of life, just living and sustaining 02:16:11.000 |
and homeostasis and just maintaining yourself as a human 02:16:15.900 |
- And there seems to be a desire within the human mind 02:16:19.960 |
to participate in society that creates that padding. 02:16:23.640 |
'Cause the most productive days I've ever had 02:16:28.280 |
just tuning out everything and just sitting there. 02:16:31.280 |
- And then you could do more than six and eight hours. 02:16:34.120 |
Is there some wisdom about what gives you strength 02:16:39.640 |
- Yeah, just like whenever I get obsessed about a problem, 02:16:43.040 |
something just needs to work, something just needs to exist. 02:16:47.040 |
So you're able to deal with bugs and programming issues 02:17:07.900 |
they say nice things, they tweet about it or whatever, 02:17:11.560 |
that gives me pleasure because I'm doing something useful. 02:17:13.840 |
- So like you do see yourself sharing it with the world, 02:17:20.640 |
Like suppose I did all these things but did not share them, 02:17:22.960 |
I don't think I would have the same amount of motivation 02:17:25.560 |
- You enjoy the feeling of other people gaining value 02:17:51.640 |
- Yeah, I still fast, but I do intermittent fasting. 02:17:54.200 |
But really what it means at the end of the day 02:18:10.880 |
And then, yeah, I've done a bunch of random experiments. 02:18:15.400 |
where I've been for the last year and a half, 02:18:17.080 |
I wanna say, is I'm plant-based or plant-forward. 02:18:23.360 |
- I don't actually know what the difference is, 02:18:35.860 |
I don't actually know how wide the category of plant is. 02:18:40.800 |
- Well, plant-based just means that you're not 02:18:50.960 |
And if someone is, you come to someone's house party 02:18:53.000 |
and they serve you a steak that they're really proud of, 02:19:15.040 |
And so currently I have about two meals a day, 02:19:35.940 |
and then starting day three or so, you're not hungry. 02:19:44.500 |
- One of the many weird things about human biology. 02:19:48.260 |
It finds another source of energy or something like that, 02:19:54.820 |
- Yeah, the body is like, you're hungry, you're hungry, 02:20:09.800 |
- So are you still to this day most productive at night? 02:20:15.720 |
but it is really hard to maintain my PhD schedule, 02:20:18.540 |
especially when I was, say, working at Tesla and so on. 02:20:23.540 |
But even now, people want to meet for various events. 02:20:57.140 |
Is that how humans behave when they collaborate? 02:21:11.360 |
So I have a morning routine, I have a day routine. 02:21:20.920 |
And if you try to stress that a little too much, 02:21:25.380 |
you're not able to really ascend to where you need to go. 02:21:48.460 |
in terms of how much they work, all that kind of stuff. 02:22:03.120 |
and I saw what it's like inside Google and DeepMind. 02:22:05.920 |
I would say the baseline is higher than that, 02:22:19.440 |
- And then it gives the appearance of like total insanity, 02:22:21.880 |
but actually it's just a bit more intense environment, 02:22:37.560 |
what do you think about the happiness of a human being, 02:22:43.860 |
about finding a balance between work and life, 02:22:46.680 |
or is it such a thing, not a good thought experiment? 02:22:55.440 |
but I also love to have sprints that are out of distribution. 02:22:58.680 |
And that's when I think I've been pretty creative as well. 02:23:03.680 |
- Sprints out of distribution means that most of the time 02:23:18.440 |
- Yeah, probably like I say, once a month or something. 02:23:20.520 |
- And that's when we get a new GitHub repo from Andrej. 02:23:23.280 |
- Yeah, that's when you really care about a problem. 02:23:24.960 |
It must exist, this will be awesome, you're obsessed with it 02:23:29.760 |
You need to pay the fixed cost of getting into the groove, 02:23:34.280 |
and then society will come and they will try to mess 02:23:38.400 |
Yeah, the worst thing is a person who's like, 02:23:42.400 |
This is, the cost of that is not five minutes. 02:23:45.040 |
And society needs to change how it thinks about 02:24:00.940 |
What's like the perfect, are you somebody that's flexible 02:24:13.700 |
- I guess the one that I'm familiar with is one large screen, 02:24:25.220 |
- I would say OSX, but when you're working on deep learning, 02:24:27.780 |
You're SSHed into a cluster and you're working remotely. 02:24:33.780 |
- Yeah, you would use, I think a good way is, 02:24:36.060 |
you just run VS Code, my favorite editor right now, 02:24:49.760 |
VS Code, what else do people, so I use Emacs still. 02:24:56.400 |
- It may be cool, I don't know if it's maximum productivity. 02:25:00.540 |
So what do you recommend in terms of editors? 02:25:06.140 |
editors for Python, C++, machine learning applications? 02:25:23.360 |
- What do you think about the Copilot integration? 02:25:25.560 |
I was actually, I got to talk a bunch with Guido van Rossum, 02:25:28.760 |
who's a creator of Python, and he loves Copilot. 02:25:37.880 |
And it's free for me, but I would pay for it. 02:25:45.700 |
and you need to figure out when it's helpful, 02:25:52.980 |
Because if you're just reading its suggestions all the time, 02:25:56.620 |
But I think I was able to sort of like mold myself to it. 02:26:11.500 |
So it tells you about something that you didn't know. 02:26:14.900 |
- And that's an opportunity to discover a new idea. 02:26:19.500 |
I almost always copy, copy paste into a Google search, 02:26:29.980 |
a part maybe getting the exact syntax correctly, 02:26:33.860 |
that once you see it, it's that NP-hard thing. 02:26:36.940 |
It's like, once you see it, you know it's correct. 02:26:51.540 |
which is like the simple copy, paste, and sometimes suggest. 02:26:54.540 |
But over time, it's going to become more and more autonomous. 02:26:57.120 |
And so the same thing will play out in not just coding, 02:27:00.020 |
but actually across many, many different things probably. 02:27:06.060 |
How do you see the future of that developing, 02:27:13.260 |
'Cause right now it's human supervised in interesting ways. 02:27:18.260 |
It feels like the transition will be very painful. 02:27:22.020 |
- My mental model for it is the same thing will happen 02:27:31.260 |
and people will have to intervene less and less. 02:27:33.220 |
- And those could be like testing mechanisms. 02:27:43.100 |
'Cause you're like getting lazier and lazier as a programmer. 02:27:46.220 |
Like your ability to, 'cause like little bugs, 02:28:00.280 |
is actually a fundamental challenge of programming? 02:28:08.420 |
I am nervous about people not supervising what comes out 02:28:12.820 |
the proliferation of bugs in all of our systems. 02:28:16.220 |
but I think there will probably be some other copilots 02:28:18.740 |
for bug finding and stuff like that at some point. 02:28:21.260 |
'Cause there'll be like a lot more automation for- 02:28:24.540 |
It's like a program, a copilot that generates a compiler, 02:28:40.380 |
- And then there'll be like a manager for the committee. 02:28:50.220 |
Another one looked at it and picked a few that they like. 02:28:57.360 |
And then a final ensemble GPT comes in and is like, 02:29:00.540 |
okay, given everything you guys have told me, 02:29:04.140 |
- You know, the feeling is the number of programmers 02:29:05.920 |
in the world has been growing and growing very quickly. 02:29:08.260 |
Do you think it's possible that it'll actually level out 02:29:10.780 |
and drop to like a very low number with this kind of world? 02:29:14.500 |
'Cause then you'd be doing software 2.0 programming 02:29:29.860 |
- I don't currently think that they're just going 02:29:33.140 |
I'm so hesitant saying stuff like this, right? 02:29:37.100 |
- Yeah, 'cause this is gonna be replaced in five years. 02:29:42.460 |
this is where we thought, 'cause I agree with you, 02:29:45.180 |
but I think we might be very surprised, right? 02:29:55.260 |
Does it feel like the beginning or the middle or the end? 02:30:00.780 |
for sure, GPT will be able to program quite well, 02:30:09.260 |
And so how do you steer it and how do you say, 02:30:12.780 |
How do you audit it and verify that what is done is correct? 02:30:23.420 |
- So beautiful, fertile ground for so much interesting work 02:30:31.940 |
- Yeah, so you're interacting with the system. 02:30:33.660 |
So not just one prompt, but it's iterative prompting. 02:30:40.660 |
- That actually, I mean, to me, that's super exciting 02:30:42.740 |
to have a conversation with the program I'm running. 02:30:45.820 |
- Yeah, maybe at some point you're just conversing with it. 02:30:51.700 |
maybe it's not even that low level as a variable, but. 02:30:56.100 |
can you translate this to C++ and back to Python? 02:30:58.980 |
- Yeah, it already kind of exists in some ways. 02:31:03.620 |
Like, I think I'd like to write this function in C++. 02:31:07.700 |
Or like, you just keep changing for different programs 02:31:13.500 |
Maybe I want to convert this into a functional language. 02:31:16.460 |
- And so like, you get to become multilingual as a programmer 02:31:26.660 |
because it's not just about writing code on a page. 02:31:34.540 |
You have some scripts that are running in a cron job. 02:31:36.420 |
Like there's a lot going on to like working with computers 02:31:39.420 |
and how do these systems set up environment flags 02:31:47.820 |
Like how all that works and is auditable by humans 02:31:50.580 |
and so on is like massive question at the moment. 02:31:58.340 |
of academic research publishing that you would like to see? 02:32:06.540 |
to journals or conferences and then wait six months 02:32:13.260 |
And then people can tweet about it three minutes later 02:32:17.500 |
and everyone can profit from it in their own little ways. 02:32:20.380 |
- And you can cite it and it has an official look to it. 02:32:27.500 |
It feels different than if you just put it in a blog post. 02:32:35.980 |
as opposed to something you would see in a blog post. 02:32:40.940 |
'cause you could probably post a pretty crappy paper 02:32:49.020 |
So rigorous peer review by two, three experts 02:32:57.780 |
- Yeah, basically I think the community is very well able 02:33:00.580 |
to peer review things very quickly on Twitter. 02:33:03.900 |
And I think maybe it just has to do something 02:33:05.660 |
with AI machine learning field specifically though. 02:33:17.060 |
you can think of these scientific publications 02:33:20.180 |
where everyone's building on each other's work 02:33:23.620 |
which is kind of like this much faster and looser blockchain. 02:33:28.100 |
and any one individual entry is like very cheap to make. 02:33:33.300 |
where maybe that model doesn't make as much sense. 02:33:37.900 |
at least things are pretty easily verifiable. 02:33:49.020 |
And the whole thing just moves significantly faster. 02:33:51.500 |
So I kind of feel like academia still has a place, 02:33:59.740 |
And it's a bit more maybe higher quality process, 02:34:04.860 |
where you will discover cutting edge work anymore. 02:34:07.340 |
It used to be the case when I was starting my PhD 02:34:15.940 |
because it's already like three generations ago irrelevant. 02:34:28.340 |
to the prestige that comes with these big venues, 02:34:46.860 |
- Yeah, it would speed up the rest of the community, 02:34:49.980 |
that's part of their objective function also. 02:34:56.980 |
- Yeah, they certainly, DeepMind specifically, 02:35:09.100 |
Do you or have you suffered from imposter syndrome? 02:35:40.140 |
And definitely I would say near the tail end, 02:35:42.700 |
that's when it sort of like starts to hit you a bit more 02:35:47.580 |
is the code that people are writing, the GitHub. 02:35:51.980 |
And you're not as familiar with that as you used to be. 02:35:54.380 |
And so I would say maybe there's some insecurity there. 02:36:00.620 |
with not writing the code in the computer science space. 02:36:12.380 |
but at the end of the day, you have to read code. 02:36:20.260 |
especially when they have a source code available, 02:36:23.100 |
- So like I said, you're one of the greatest teachers 02:36:25.500 |
of machine learning, AI ever, from CS231N to today. 02:36:36.460 |
- Beginners are often focused on like what to do. 02:36:40.460 |
And I think the focus should be more like how much you do. 02:36:43.340 |
So I am kind of like believer on a high level 02:36:47.220 |
where you just kind of have to just pick the things 02:36:52.300 |
You literally have to put in 10,000 hours of work. 02:36:55.020 |
It doesn't even matter as much like where you put it 02:37:04.540 |
'cause I feel like there's some sense of determinism 02:37:06.380 |
about being an expert at a thing if you spend 10,000 hours. 02:37:17.700 |
And so I think it's kind of like a nice thought. 02:37:27.180 |
- So and then thinking about what kind of mechanisms 02:37:29.940 |
maximize your likelihood of getting to 10,000 hours. 02:37:33.500 |
- Which for us silly humans means probably forming 02:37:46.940 |
for the psychology of it is many times people 02:37:52.300 |
Only compare yourself to you from some time ago, 02:38:00.180 |
And I think this, then you can see your progress 02:38:03.460 |
- That's so interesting that focus on the quantity of hours. 02:38:07.380 |
'Cause I think a lot of people in the beginner stage, 02:38:15.580 |
Like which one do I pick this path or this path? 02:38:23.460 |
Yeah, they're worried about all these things. 02:38:24.700 |
But the thing is, you will waste time doing something wrong. 02:38:28.500 |
You will eventually figure out it's not right. 02:38:33.420 |
because next time you'll have the scar tissue 02:38:36.660 |
And now next time you come to a similar situation, 02:38:46.340 |
and I have some intuitions about what was useful, 02:38:53.980 |
So I just think you should, you should just focus on working. 02:39:02.660 |
for a lot of things, not just machine learning. 02:39:21.700 |
You're very good at it, but you're also drawn to it. 02:39:33.260 |
but it's not like the act of teaching that I like. 02:39:41.260 |
I'm okay at teaching and people appreciate it a lot. 02:39:49.980 |
I mean, it's really, it can be really annoying, frustrating. 02:39:52.700 |
I was working on a bunch of lectures just now. 02:39:56.980 |
just how much work it is to create some of these materials 02:40:01.700 |
and you go down blind alleys and just how much you change it. 02:40:06.140 |
in terms of like educational value is really hard. 02:40:12.940 |
So people should definitely go watch your new stuff 02:40:16.500 |
There are lectures where you're actually building the thing 02:40:20.820 |
So discussing back propagation by building it, 02:40:27.820 |
I think that's a really powerful way to teach. 02:40:40.940 |
and then I just build out a lecture that way. 02:40:42.980 |
Sometimes I have to delete 30 minutes of content 02:40:49.660 |
and it probably takes me somewhere around 10 hours 02:40:55.620 |
I mean, is it difficult to go back to the basics? 02:41:02.340 |
- Yeah, going back to backpropagation, loss functions 02:41:05.220 |
And one thing I like about teaching a lot honestly 02:41:07.300 |
is it definitely strengthens your understanding. 02:41:19.420 |
And so I even surprised myself in those lectures. 02:41:22.300 |
Like, oh, so the result will obviously look at this 02:41:25.820 |
And I'm like, okay, I thought I understood this. 02:41:33.980 |
and it gives you a result and you're like, oh, wow. 02:41:36.780 |
And like actual numbers, actual input, actual code. 02:41:39.820 |
- Yeah, it's not mathematical symbols, et cetera. 02:41:56.820 |
So maybe undergrads, maybe early graduate students. 02:42:02.660 |
I mean, I would say like they definitely have to be 02:42:12.420 |
in physics you used to be able to do experiments 02:42:16.940 |
And now you have to work at like the LHC or CERN. 02:42:20.020 |
And so AI is going in that direction as well. 02:42:25.660 |
that's just not possible to do on the bench top anymore. 02:42:28.180 |
And I think that didn't used to be the case at the time. 02:42:32.700 |
- Do you still think that there's like GAN type papers 02:42:41.740 |
that requires just one computer to illustrate 02:42:44.540 |
- I mean, one example that's been very influential 02:42:51.740 |
For the longest time, people were kind of ignoring them 02:42:58.940 |
And so Stable Diffusion and so on, it's all diffusion-based. 02:43:09.420 |
actually, no, those came from Google as well. 02:43:17.820 |
So from the societal impact to the technical architecture. 02:43:22.620 |
- What I like about diffusion is it works so well. 02:43:28.740 |
almost the novelty of the synthetic data it's generating. 02:43:32.700 |
- Yeah, so the Stable Diffusion images are incredible. 02:43:36.180 |
It's the speed of improvement in generating images 02:43:40.900 |
We went very quickly from generating like tiny digits 02:43:48.020 |
There's a lot that academia can still contribute. 02:43:54.220 |
for running the attention operation inside the transformer 02:43:59.580 |
It's a very clever way to structure the kernel. 02:44:03.740 |
So it doesn't materialize the attention matrix. 02:44:06.060 |
And so there's, I think there's still like lots of things 02:44:08.660 |
to contribute, but you have to be just more strategic. 02:44:11.060 |
- Do you think neural networks can be made to reason? 02:44:24.660 |
- So in the way that humans think through a problem 02:44:35.460 |
I don't wanna say, but out of distribution ideas, 02:44:46.420 |
You're able to remix the training set information 02:44:52.460 |
- It doesn't appear verbatim in the training set. 02:44:54.660 |
Like you're doing something interesting algorithmically. 02:45:07.660 |
holy shit, this thing is definitely thinking? 02:45:12.740 |
is just information processing and generalization. 02:45:15.260 |
And I think the neural nets already do that today. 02:45:28.980 |
- Yeah, you're giving correct answers in novel settings 02:45:36.540 |
You're not doing just some kind of a lookup table 02:45:38.180 |
and there's neighbor search, something like that. 02:45:43.740 |
you think might make significant progress towards AGI? 02:45:49.340 |
what are the big blockers that we're missing now? 02:45:57.380 |
Basically automated systems that we can interact with 02:46:14.940 |
I'm suspicious that the text realm is not enough 02:46:17.580 |
to actually build full understanding of the world. 02:46:20.420 |
I do actually think you need to go into pixels 02:46:22.180 |
and understand the physical world and how it works. 02:46:24.860 |
So I do think that we need to extend these models 02:46:34.980 |
- Well, that's the big open question I would say in my mind 02:46:39.500 |
and the ability to sort of interact with the world, 02:46:42.460 |
run experiments and have a data of that form, 02:46:45.500 |
then you need to go to Optimus or something like that. 02:46:48.580 |
And so I would say Optimus in some way is like a hedge 02:46:52.300 |
in AGI, because it seems to me that it's possible 02:46:57.300 |
that just having data from the internet is not enough. 02:47:00.220 |
If that is the case, then Optimus may lead to AGI 02:47:04.220 |
because Optimus, to me, there's nothing beyond Optimus. 02:47:09.340 |
that can actually like do stuff in the world. 02:47:11.340 |
You can have millions of them interacting with humans 02:47:14.460 |
And if that doesn't give rise to AGI at some point, 02:47:28.580 |
and you need to actually like build these things 02:47:38.180 |
and just like training these compression models effectively 02:47:43.900 |
And that might also give these agents as well. 02:47:48.100 |
Compress the internet, but also interact with the internet. 02:48:08.780 |
So it just feels like we're in boiling water. 02:48:26.780 |
like a year from now it will happen, that kind of thing. 02:48:30.100 |
I just feel like in the digital realm, it just might happen. 02:48:38.900 |
is there enough fertile ground on the periphery? 02:48:43.260 |
And we have the progress so far, which has been very rapid. 02:48:51.620 |
that we'll be interacting with digital entities. 02:48:54.260 |
- How will you know that somebody has built AGI? 02:48:58.100 |
I think it's going to be a slow incremental transition. 02:49:01.620 |
It's going to be GitHub Copilot getting better. 02:49:09.620 |
I think we're on a verge of being able to ask 02:49:12.340 |
very complex questions in chemistry, physics, math 02:49:16.260 |
of these oracles and have them complete solutions. 02:49:19.700 |
- So AGI to you is primarily focused on intelligence. 02:49:27.540 |
- So in my mind, consciousness is not a special thing 02:49:32.100 |
I think it's an emergent phenomenon of a large enough 02:49:34.820 |
and complex enough generative model, sort of. 02:49:43.780 |
then it also understands its predicament in the world 02:49:48.500 |
which to me is a form of consciousness or self-awareness. 02:49:51.940 |
- So in order to understand the world deeply, 02:49:53.820 |
you probably have to integrate yourself into the world. 02:50:02.700 |
- I think consciousness is like a modeling insight. 02:50:07.260 |
- Yeah, it's a, you have a powerful enough model 02:50:10.060 |
of understanding the world that you actually understand 02:50:15.460 |
perhaps just the narrative we tell ourselves, 02:50:17.340 |
there's a, it feels like something to experience the world, 02:50:22.740 |
But that could be just a narrative that we tell ourselves. 02:50:24.860 |
- Yeah, I don't think, yeah, I think it will emerge. 02:50:27.140 |
I think it's going to be something very boring. 02:50:34.940 |
They will do all the things that you would expect 02:50:47.580 |
of whether you're allowed to turn off a conscious AI, 02:50:54.500 |
Maybe there would have to be the same kind of debates 02:51:03.060 |
but abortion, which is the deeper question with abortion 02:51:11.500 |
And the deep question with AI is also what is life 02:51:16.420 |
And I think that'll be very fascinating to bring up. 02:51:23.580 |
that are capable of such level of intelligence 02:51:29.860 |
and therefore the capacity to suffer would emerge. 02:51:32.180 |
And a system that says, no, please don't kill me. 02:51:41.220 |
Like it was talking about not wanting to die or so on. 02:51:48.060 |
- 'Cause otherwise you might have a lot of creatures 02:51:55.340 |
- You can just spawn infinity of them on a cluster. 02:51:57.860 |
- And then that might lead to like horrible consequences 02:52:05.060 |
and they'll start practicing murder on those systems. 02:52:07.620 |
I mean, there's just, to me, all of this stuff 02:52:10.420 |
just brings a beautiful mirror to the human condition 02:52:14.100 |
and human nature and we'll get to explore it. 02:52:15.820 |
And that's what like the best of the Supreme Court 02:52:19.620 |
of all the different debates we have about ideas 02:52:25.300 |
that we've been asking throughout human history. 02:52:27.380 |
There's always been the other in human history. 02:52:33.180 |
and we're going to, throughout human history, 02:52:37.860 |
And the same will probably happen with robots. 02:52:46.820 |
And I think there's some canary in the coal mines 02:52:56.660 |
but this person really like loved their waifu 02:52:59.420 |
and like is trying to like port it somewhere else 02:53:10.360 |
because in some sense they are like a mirror of humanity 02:53:13.420 |
because they are like sort of like a big average 02:53:18.500 |
- But we can, that average, we can actually watch. 02:53:29.660 |
And we can also, of course, also like shape it. 02:53:48.060 |
and ask her, talk about anything, maybe ask her a question. 02:53:54.220 |
- I would have some practical questions in my mind. 02:53:55.860 |
Like, do I or my loved ones really have to die? 02:54:11.780 |
and I know all these things that you've produced. 02:54:13.460 |
And it seems to me like here are the experiments 02:54:19.860 |
And here are the kinds of experiments that you should run. 02:54:22.300 |
- Okay, let's go with this thought experiment, okay? 02:54:46.140 |
If the AGI system is trying to empathize with you, human, 02:55:14.180 |
I think it's like such a sidekick to the entire story, 02:55:16.720 |
but at the same time, it's like really interesting. 02:55:19.700 |
- It's kind of limited in certain ways, right? 02:55:24.540 |
I don't think, I think it's fine and plausible 02:55:34.020 |
- As an example, like it has a fixed amount of compute 02:55:38.260 |
And it might just be that even though you can have 02:55:40.680 |
a super amazing mega brain, super intelligent AI, 02:55:43.980 |
you also can have like, you know, less intelligent AIs 02:55:46.580 |
that you can deploy in a power efficient way. 02:55:49.540 |
And then they're not perfect, they might make mistakes. 02:55:51.460 |
- No, I meant more like, say you had infinite compute, 02:55:55.320 |
and it's still good to make mistakes sometimes. 02:55:58.140 |
Like in order to integrate yourself, like, what is it? 02:56:05.560 |
"The human imperfections, that's the good stuff," right? 02:56:12.480 |
we want flaws in part to form connections with each other, 02:56:17.480 |
'cause it feels like something you can attach 02:56:22.720 |
And in that same way, you want an AI that's flawed. 02:56:26.060 |
I don't know, I feel like perfection is cool. 02:56:30.920 |
But see, AGI would need to be intelligent enough 02:56:33.920 |
to give answers to humans that humans don't understand. 02:56:36.800 |
And I think perfect is something humans can't understand. 02:56:40.120 |
Because even science doesn't give perfect answers. 02:56:42.520 |
There's always gaps and mysteries, and I don't know. 02:56:50.080 |
- Yeah, I can imagine just having a conversation 02:56:52.760 |
with this kind of oracle entity, as you'd imagine them. 02:57:05.200 |
- But every dumb human will say, "Yeah, yeah, yeah, yeah. 02:57:08.720 |
"Trust me, give me the truth, I can handle it." 02:57:12.360 |
- But that's the beauty, like, people can choose. 02:57:15.000 |
But then, the old marshmallow test with the kids and so on, 02:57:20.000 |
I feel like too many people can't handle the truth, 02:57:37.920 |
- Yeah, I mean, this is "The Matrix," all over again. 02:57:47.200 |
Probably I would go with the safer scientific questions 02:57:52.080 |
at first that have nothing to do with my own personal life 02:57:55.800 |
and mortality, just like about physics and so on. 02:58:06.020 |
Would it be able to, presumably, in order to, 02:58:15.080 |
- Yeah, I think that's actually a wonderful benchmark, 02:58:19.320 |
I think that's a really good point, basically. 02:58:24.840 |
that is doing something very interesting computationally. 02:58:28.880 |
- Yeah, because it's hard in a way, like a Turing test. 02:58:33.880 |
The original intent of the Turing test is hard, 02:58:49.880 |
and if they don't laugh, that means you're not funny. 02:58:52.880 |
- And you're showing, you need a lot of knowledge 02:59:02.320 |
You tweeted, "Movies that I've seen five plus times, 02:59:10.360 |
"Good Will, Hunting, The Matrix, Lord of the Rings, 02:59:16.560 |
Terminator 2, mean girls, I'm not gonna ask about that. 02:59:23.600 |
- What are some that jump out to you in your memory 02:59:30.960 |
As a computer person, why do you love The Matrix? 02:59:35.400 |
that make it, like, beautiful and interesting. 02:59:36.680 |
So there's all these philosophical questions, 02:59:39.120 |
but then there's also AGIs, and there's simulation, 02:59:42.160 |
and it's cool, and there's, you know, the black, you know. 02:59:50.040 |
It was just, like, innovating in so many ways. 02:59:52.340 |
- And then Good Will Hunting, why do you like that one? 02:59:57.500 |
- Yeah, I just, I really like this tortured genius 03:00:03.680 |
with whether or not he has, like, any responsibility 03:00:06.520 |
or, like, what to do with this gift that he was given, 03:00:08.720 |
or, like, how to think about the whole thing. 03:00:10.920 |
- But there's also a dance between the genius 03:00:24.320 |
- It, like, really, like, it messes with you. 03:00:27.040 |
You know, there's some movies that just, like, 03:00:51.680 |
but in terms of, like, its surface properties. 03:00:55.880 |
- Do you think Skynet is at all a possibility? 03:00:59.400 |
- Like, the actual, sort of, autonomous weapon system 03:01:15.480 |
I mean, these will be, like, very powerful entities, 03:01:18.000 |
And so, for a long time, they're going to be tools 03:01:22.240 |
You know, people talk about, like, alignment of AGIs 03:01:27.760 |
So, how this will be used and what this is gonna look like 03:01:36.600 |
that we'll be able to, as a human civilization, 03:01:41.760 |
- Yes, that's my hope, is that it happens slowly enough 03:01:44.000 |
and in an open enough way where a lot of people 03:01:48.120 |
Just figure out how to deal with this transition, 03:01:52.280 |
- I draw a lot of inspiration from nuclear weapons 03:02:05.240 |
are not so dangerous, they destroy human civilization, 03:02:12.720 |
we quickly, quickly, we might still deploy it, 03:02:17.800 |
And so, there'll be, like, this balance achieved. 03:02:40.400 |
That's probably my number one concern for humanity. 03:02:53.320 |
And it's not even about the full destruction. 03:03:00.400 |
And I can't believe we're, like, so close to it. 03:03:05.160 |
- It feels like we might be a few tweets away 03:03:21.680 |
take one step towards a bad direction and it escalates. 03:03:33.720 |
- Yeah, it's just, it's a huge amount of power. 03:03:41.880 |
I don't actually know what the good outcomes are here. 03:03:56.880 |
in the sense that there are good outcomes of AGI. 03:04:01.280 |
And then, the bad outcomes are, like, an epsilon away, 03:04:05.240 |
And so, I think capitalism and humanity and so on 03:04:08.240 |
will drive for the positive ways of using that technology. 03:04:11.960 |
But then, if bad outcomes are just, like, a tiny, 03:04:20.320 |
results in the destruction of the human species. 03:04:26.040 |
what's really weird about, like, the dynamics of humanity 03:04:29.160 |
is just, like, the insane coupling afforded by technology. 03:04:32.960 |
And just the instability of the whole dynamical system. 03:04:39.160 |
- Yeah, so that explosion could be destructive 03:04:40.920 |
or constructive, and the probabilities are non-zero 03:04:45.960 |
- I do feel like I have to try to be optimistic and so on. 03:04:54.720 |
- Do you think we'll become a multi-planetary species? 03:04:57.420 |
- Probably yes, but I don't know if it's a dominant feature 03:05:04.120 |
There might be some people on some planets and so on, 03:05:08.880 |
if it's, like, a major player in our culture and so on. 03:05:35.360 |
- Maybe eventually I would once it's safe enough, 03:05:37.560 |
but I don't actually know if it's on my lifetime scale, 03:05:43.960 |
a lot of people might disappear into virtual realities 03:05:49.240 |
of sort of the cultural development of humanity, 03:05:54.920 |
it's just really hard to work in physical realm 03:06:02.040 |
And so it's much easier to disappear into digital realm. 03:06:05.720 |
And I think people will find them more compelling, 03:06:10.600 |
- So you're a little bit captivated by virtual reality, 03:06:21.680 |
I'm interested, just talking a lot to Carmack, 03:06:24.920 |
where's the thing that's currently preventing that? 03:06:30.720 |
I think what's interesting about the future is, 03:06:35.360 |
I kind of feel like the variance in the human condition grows 03:06:40.400 |
It's not as much the mean of the distribution, 03:06:48.040 |
It's just like, there will be so many more ways of being. 03:06:51.800 |
I see it as like a spreading out of a human experience. 03:06:55.960 |
that allows you to discover those little groups 03:06:57.880 |
and then you gravitate to something about your biology 03:07:01.040 |
likes that kind of world and that you find each other. 03:07:05.720 |
and they're gonna, everything is just gonna coexist. 03:07:08.680 |
'cause I've interacted with a bunch of internet communities, 03:07:21.240 |
I mean, you even sense this, just having traveled to Ukraine, 03:07:24.720 |
they don't know so many things about America. 03:07:36.960 |
So you can see that happening more and more and more 03:07:46.760 |
And I don't see that trend like really reversing. 03:07:49.840 |
and they're able to choose their own path in existence. 03:07:56.240 |
- Will you spend so much time in the metaverse, 03:08:21.520 |
Maybe there's actually even more exotic things 03:08:23.760 |
you can think about with Neuralinks or stuff like that. 03:08:26.560 |
Currently, I kind of see myself as mostly a team human person. 03:08:37.760 |
And I just want to be in this like solar punk, 03:08:48.200 |
surrounded by lush, beautiful, dynamic nature, 03:09:00.560 |
- Yeah, I think technology used very sparingly. 03:09:03.080 |
I don't love when it sort of gets in the way of humanity 03:09:22.680 |
or for profound reasons that you would recommend? 03:09:32.920 |
Anything by Nick Lane, really, "Life Ascending," 03:09:36.000 |
I would say is like a bit more potentially representative, 03:09:47.680 |
that helped me understand altruism as an example 03:09:52.640 |
and the level of genes was a huge insight for me 03:09:55.160 |
and it sort of cleared up a lot of things for me. 03:10:05.920 |
- Are you able to walk around with that notion for a while, 03:10:08.920 |
that there is an evolutionary kind of process 03:10:15.440 |
and they compete, and they live in our brains. 03:10:19.400 |
- Are we silly humans thinking that we're the organisms? 03:10:22.080 |
Is it possible that the primary organisms are the ideas? 03:10:26.200 |
- Yeah, I would say the ideas kind of live in the software 03:10:43.080 |
- Yeah, yeah, I would say there needs to be some grounding 03:10:54.040 |
is the thing that makes that thing special, right? 03:10:59.360 |
- But then cloning might be exceptionally difficult. 03:11:07.440 |
what makes me special is more the gang of genes 03:11:10.740 |
that are riding in my chromosomes, I suppose, right? 03:11:13.180 |
Like they're the replicating unit, I suppose. 03:11:25.040 |
is your ability to survive based on the software 03:11:29.740 |
that runs on the hardware that was built by the genes. 03:11:33.080 |
So the software is the thing that makes you survive, 03:11:46.020 |
I mean, it's an abstraction on top of abstractions. 03:11:55.500 |
I would say sometimes books are like not sufficient. 03:12:06.840 |
they're too high up in the level of abstraction 03:12:14.700 |
That's why also I like the writing of Nick Lane 03:12:17.860 |
is because he's pretty willing to step one level down 03:12:25.740 |
But he's also willing to sort of be throughout the stack. 03:12:36.600 |
even high school, just textbooks on the basics. 03:12:46.340 |
It's sufficiently general that you can understand 03:12:54.540 |
and you get to play with it as much as you would 03:13:00.500 |
And then I'm also suspicious of textbooks, honestly, 03:13:11.340 |
These books like "The Cell" are kind of outdated. 03:13:14.540 |
Like what is the actual real source of truth? 03:13:30.180 |
and it's kind of interesting and I'm learning, 03:13:39.620 |
- But you have to learn that before you break out. 03:13:44.580 |
But what is the actual process of working with these cells 03:13:47.800 |
And, you know, it's kind of like a massive set of cooking recipes 03:13:50.180 |
for making sure your cells live and proliferate 03:13:52.260 |
and then you're sequencing them, running experiments 03:13:58.360 |
what's really useful in terms of creating therapies 03:14:01.940 |
- Yeah, I wonder what in the future AI textbooks will be. 03:14:04.860 |
'Cause, you know, there's "Artificial Intelligence, 03:14:13.380 |
I also saw there's a "Science of Deep Learning" book. 03:14:15.860 |
I'm waiting for textbooks that are worth recommending, 03:14:19.580 |
- It's tricky 'cause it's like papers and code, code, code. 03:14:25.740 |
I especially like the appendix of any paper as well. 03:14:39.300 |
- Many times papers can be actually quite readable. 03:14:49.180 |
scientists use complex terms, even when it's not necessary. 03:14:55.820 |
- And papers sometimes are longer than they need to be 03:15:03.300 |
but then the paper itself, look at Einstein, make it simple. 03:15:07.100 |
- Yeah, but certainly I've come across papers, 03:15:08.540 |
I would say, like synthetic biology or something 03:15:15.900 |
but you kind of are getting a gist and I think it's cool. 03:15:25.460 |
but in general, life advice to a young person, 03:15:30.660 |
about how to have a career they can be proud of 03:15:34.740 |
- Yeah, I think I'm very hesitant to give general advice. 03:15:38.900 |
I've mentioned, like some of the stuff I've mentioned 03:15:41.740 |
like focus on just the amount of work you're spending 03:15:45.700 |
Compare yourself only to yourself, not to others. 03:15:51.300 |
- You just have like a deep interest in something 03:15:57.360 |
over like the things that you're interested in. 03:16:00.940 |
How do you not get distracted and switch to another thing? 03:16:07.820 |
- Well, if you do an argmax repeatedly every week, 03:16:13.300 |
- Yeah, you can like low pass filter yourself 03:16:15.340 |
in terms of like what has consistently been true for you. 03:16:18.180 |
But yeah, I definitely see how it can be hard, 03:16:22.180 |
but I would say like you're going to work the hardest 03:16:26.020 |
So low pass filter yourself and really introspect. 03:16:28.980 |
In your past, what are the things that gave you energy? 03:16:31.180 |
And what are the things that took energy away from you? 03:16:42.700 |
but the kind of stuff you're doing in a particular field. 03:16:47.460 |
by implementing stuff, building actual things. 03:16:58.140 |
Because I usually have to do way too much work 03:17:01.700 |
And then I'm like, okay, this is actually like, 03:17:12.500 |
- So aside from the teaching you're doing now, 03:17:15.380 |
putting out videos, aside from a potential Godfather Part II 03:17:15.380 |
what does the future for Andrej Karpathy hold? 03:17:37.460 |
of what that possible future could look like? 03:17:39.700 |
- The consistent thing I've been always interested in, 03:17:47.940 |
the rest of my life on, because I just care about it a lot. 03:17:50.820 |
And I actually care about many other problems as well, 03:17:53.420 |
like say aging, which I basically view as a disease. 03:17:53.420 |
I don't actually think that humans will be able 03:18:06.180 |
I think the correct thing to do is to ignore those problems 03:18:08.820 |
and you solve AI and then use that to solve everything else. 03:18:11.820 |
And I think there's a chance that this will work. 03:18:14.820 |
And that's kind of like the way I'm betting at least. 03:18:20.060 |
are you interested in all kinds of applications, 03:18:23.380 |
all kinds of domains, and any domain you focus on 03:18:26.780 |
will allow you to get insights to the big problem of AGI? 03:18:30.020 |
- Yeah, for me, it's the ultimate meta problem. 03:18:31.860 |
I don't wanna work on any one specific problem. 03:18:34.380 |
So how can you work on all problems simultaneously? 03:18:42.340 |
- Are there cool small projects like Arxiv Sanity 03:18:42.340 |
- There's always like some fun side projects. 03:18:57.140 |
Basically, like there's way too many arxiv papers. 03:18:58.860 |
How can I organize it and recommend papers and so on? 03:19:09.820 |
like you like consuming audio books and podcasts and so on. 03:19:16.460 |
closer to human level performance on annotation. 03:19:19.300 |
- Yeah, well, I definitely was like surprised 03:19:30.460 |
And that's what gave me some energy to like try it out. 03:19:34.300 |
And I thought it could be fun to run on podcasts. 03:19:38.520 |
why Whisper is so much better compared to anything else, 03:19:41.340 |
because I feel like there should be a lot of incentive 03:19:43.020 |
for a lot of companies to produce transcription systems 03:19:55.780 |
The model and everything has been around for a long time. 03:20:07.140 |
even at Google and so on, YouTube transcription. 03:20:12.540 |
but some of it is also integrating into a bigger system. 03:20:18.300 |
how it's deployed and all that kind of stuff. 03:20:19.780 |
Maybe running it as an independent thing is much easier, 03:20:27.740 |
like YouTube transcription or anything like meetings, 03:20:37.980 |
where it detects the different individual speakers, 03:20:59.940 |
And it seems like there's a huge incentive to automate that. 03:21:05.600 |
- And I think, I mean, I don't know if you looked 03:21:12.940 |
- I've seen Whisper's performance on like super tricky cases 03:21:31.860 |
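[Editor's aside: for readers curious what "running it as an independent thing" on a podcast can look like in practice, here is a minimal sketch using the open-source openai-whisper package. The file name and model size are illustrative assumptions, not a description of the actual pipeline discussed above.]

```python
# Minimal sketch: transcribe a podcast episode with the openai-whisper package
# (pip install -U openai-whisper; requires ffmpeg for audio decoding).
# "podcast_episode.mp3" and the model size are hypothetical choices.
import whisper

model = whisper.load_model("medium.en")           # smaller models are faster but less accurate
result = model.transcribe("podcast_episode.mp3")  # returns full text plus timestamped segments

for segment in result["segments"]:
    start, end, text = segment["start"], segment["end"], segment["text"]
    print(f"{start:8.2f} --> {end:8.2f} {text}")
```

[Note that Whisper itself only transcribes; the speaker identification mentioned above would require a separate diarization step.]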
- But yeah, there's always like fun projects basically. 03:21:49.660 |
when the cost of content creation is going to fall to zero. 03:21:59.180 |
- So Hollywood will start using that to generate scenes, 03:22:05.660 |
- Yeah, so you can make a movie like "Avatar" 03:22:12.360 |
- Much less, maybe just by talking to your phone. 03:22:39.180 |
I mean, it's humbling because we treat ourselves 03:22:46.340 |
If that can be done in an automated way by AI. 03:22:49.940 |
- Yeah, I think it's fascinating to me how these... 03:22:52.740 |
The predictions of AI and what it's going to look like 03:22:57.580 |
And sci-fi of the 50s and 60s was just like totally not right. 03:22:57.580 |
They imagined AI as like super calculating theorem provers 03:23:04.860 |
and we're getting things that can talk to you 03:23:11.860 |
Just AI's like hybrid systems, heterogeneous systems 03:23:24.900 |
- I think it's going to be interesting for sure. 03:23:33.660 |
- Well, the sad thing is your brain and mine developed 03:23:37.320 |
in a time before Twitter, before the internet. 03:23:37.320 |
So I wonder people that are born inside of it 03:23:54.740 |
- Well, I do feel like humans are extremely malleable. 03:24:10.700 |
or with the systems we create to try to answer. 03:24:14.020 |
For the universe, for the creator of the universe 03:24:23.740 |
- I don't know if that's the meaning of life. 03:24:30.260 |
because we are a conscious entity and it's beautiful. 03:24:34.120 |
But I do think that like a deeper meaning of life 03:24:37.220 |
if someone is interested is along the lines of like, 03:24:43.360 |
And if you look at the, into fundamental physics 03:24:46.140 |
and the quantum field theory and the standard model, 03:24:55.440 |
And like, what's going on with all this stuff? 03:25:03.360 |
And so I think there's some fundamental answers there. 03:25:07.640 |
you can't actually like really make a dent in those 03:25:07.640 |
And so to me also, there's a big question around 03:25:18.100 |
- So kind of the ultimate, or at least first way 03:25:22.160 |
to sneak up to the why question is to try to escape 03:25:30.400 |
And then for that, you sort of backtrack and say, 03:25:34.280 |
okay, for that, that's gonna be, take a very long time. 03:25:36.720 |
So the why question boils down from an engineering 03:25:41.280 |
- Yeah, I think that's the question number one, 03:25:49.000 |
- And that could be extending your own lifetime 03:25:50.840 |
or extending just the lifetime of human civilization. 03:26:02.400 |
And I don't know that people fully realize this. 03:26:08.840 |
But at the end of the day, this is a physical system. 03:26:21.400 |
- That'd be interesting if death is eventually looked at 03:26:23.960 |
as a fascinating thing that used to happen to humans. 03:26:31.980 |
- And it's up to our imagination to try to predict 03:27:02.520 |
of all the humans and organisms that are alive. 03:27:05.680 |
- Yeah, the way we find meaning might change. 03:27:08.480 |
There are a lot of humans, probably including myself, 03:27:08.480 |
that find meaning in the finiteness of things. 03:27:11.000 |
But that doesn't mean that's the only source of meaning. 03:27:24.080 |
Like you are born as a conscious, free entity, 03:27:28.360 |
And you have your unalienable rights for life. 03:27:54.280 |
I don't think there's a more beautiful way to end it. 03:28:02.040 |
Everything you've done for the machine learning world, 03:28:07.400 |
to educate millions of people, it's been great. 03:28:20.640 |
please check out our sponsors in the description. 03:28:23.640 |
And now, let me leave you with some words from Samuel Karlin. 03:28:23.640 |
The purpose of models is not to fit the data, but to sharpen the questions. 03:28:28.640 |
Thanks for listening, and hope to see you next time.