
Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258


Chapters

0:00 Introduction
0:36 Self-supervised learning
10:55 Vision vs language
16:46 Statistics
22:33 Three challenges of machine learning
28:22 Chess
36:25 Animals and intelligence
46:09 Data augmentation
67:29 Multimodal learning
79:18 Consciousness
84:03 Intrinsic vs learned ideas
88:15 Fear of death
96:07 Artificial Intelligence
109:56 Facebook AI Research
126:34 NeurIPS
142:46 Complexity
151:11 Music
156:06 Advice for young people

Whisper Transcript

00:00:00.000 | The following is a conversation with Yann LeCun,
00:00:02.740 | his second time on the podcast.
00:00:04.540 | He is the chief AI scientist at Meta, formerly Facebook,
00:00:09.180 | professor at NYU, Turing Award winner,
00:00:13.060 | one of the seminal figures in the history
00:00:15.620 | of machine learning and artificial intelligence,
00:00:18.500 | and someone who is brilliant and opinionated
00:00:21.980 | in the best kind of way,
00:00:23.460 | and so is always fun to talk to.
00:00:26.000 | This is the Lex Fridman Podcast.
00:00:28.000 | To support it, please check out our sponsors
00:00:29.980 | in the description, and now, here's my conversation
00:00:33.500 | with Yann LeCun.
00:00:35.040 | You co-wrote the article,
00:00:37.540 | "Self-supervised learning: The dark matter of intelligence."
00:00:40.900 | Great title, by the way, with Ishan Misra.
00:00:43.720 | So let me ask, what is self-supervised learning,
00:00:46.640 | and why is it the dark matter of intelligence?
00:00:49.940 | - I'll start with the dark matter part.
00:00:51.780 | There is obviously a kind of learning
00:00:55.700 | that humans and animals are doing
00:00:59.860 | that we currently are not reproducing properly
00:01:02.820 | with machines or with AI, right?
00:01:04.660 | So the most popular approaches to machine learning today
00:01:07.460 | are, or paradigms, I should say,
00:01:09.660 | are supervised learning and reinforcement learning.
00:01:12.700 | And they are extremely inefficient.
00:01:15.120 | Supervised learning requires many samples
00:01:17.620 | for learning anything,
00:01:19.740 | and reinforcement learning requires
00:01:21.820 | a ridiculously large number of trial and errors
00:01:24.900 | for a system to learn anything.
00:01:29.340 | And that's why we don't have self-driving cars.
00:01:32.060 | (Lex laughing)
00:01:32.980 | - That was a big leap from one to the other.
00:01:34.780 | Okay, so that, to solve difficult problems,
00:01:38.780 | you have to have a lot of human annotation
00:01:42.340 | for supervised learning to work,
00:01:44.080 | and to solve those difficult problems
00:01:45.500 | with reinforcement learning, you have to have
00:01:47.920 | some way to maybe simulate that problem
00:01:50.220 | such that you can do that large-scale kind of learning
00:01:52.700 | that reinforcement learning requires.
00:01:54.420 | - Right, so how is it that, you know,
00:01:57.180 | most teenagers can learn to drive a car
00:01:59.020 | in about 20 hours of practice,
00:02:02.300 | whereas even with millions of hours of simulated practice,
00:02:07.300 | a self-driving car can't actually learn
00:02:09.220 | to drive itself properly.
00:02:10.700 | And so obviously we're missing something, right?
00:02:13.900 | And it's quite obvious for a lot of people that,
00:02:16.420 | you know, the immediate response you get from many people
00:02:19.520 | is, well, you know, humans use their background knowledge
00:02:22.840 | to learn faster, and they're right.
00:02:25.820 | Now, how was that background knowledge acquired?
00:02:28.260 | And that's the big question.
00:02:30.040 | So now you have to ask, you know,
00:02:32.380 | how do babies in their first few months of life
00:02:35.100 | learn how the world works?
00:02:37.100 | Mostly by observation,
00:02:38.220 | because they can hardly act in the world.
00:02:40.220 | And they learn an enormous amount
00:02:42.500 | of background knowledge about the world.
00:02:43.820 | That may be the basis of what we call common sense.
00:02:47.940 | This type of learning, it's not learning a task,
00:02:51.220 | it's not being reinforced for anything,
00:02:53.620 | it's just observing the world and figuring out how it works.
00:02:57.240 | Building world models, learning world models.
00:03:01.140 | How do we do this?
00:03:02.060 | And how do we reproduce this in machines?
00:03:04.500 | So self-supervised learning is, you know,
00:03:07.620 | one instance or one attempt
00:03:10.220 | at trying to reproduce this kind of learning.
00:03:13.020 | - Okay, so you're looking at just observation,
00:03:16.300 | so not even the interacting part of a child.
00:03:18.620 | It's just sitting there watching mom and dad walk around,
00:03:21.540 | pick up stuff, all of that.
00:03:23.420 | - That's what you mean by background knowledge.
00:03:25.500 | - Perhaps not even watching mom and dad,
00:03:27.500 | just, you know, watching the world go by.
00:03:29.980 | - Just having eyes open or having eyes closed,
00:03:31.900 | or the very act of opening and closing eyes
00:03:34.460 | that the world appears and disappears,
00:03:36.260 | all that basic information.
00:03:37.820 | And you're saying in order to learn to drive,
00:03:43.100 | like the reason humans are able to learn to drive quickly,
00:03:45.820 | some faster than others,
00:03:47.340 | is because of the background knowledge
00:03:48.660 | they were able to watch cars operate in the world
00:03:51.740 | in the many years leading up to it,
00:03:53.580 | the physics of basic objects, all that kind of stuff.
00:03:55.740 | - That's right.
00:03:56.580 | I mean, the basic physics of objects,
00:03:57.420 | you don't even know, you don't even need to know,
00:03:59.540 | you know, how a car works, right?
00:04:00.820 | Because that you can learn fairly quickly.
00:04:02.460 | I mean, the example I use very often
00:04:03.780 | is you're driving next to a cliff,
00:04:05.700 | and you know in advance,
00:04:08.100 | because of your understanding of intuitive physics,
00:04:11.820 | that if you turn the wheel to the right,
00:04:13.740 | the car will veer to the right,
00:04:15.020 | will run off the cliff, fall off the cliff,
00:04:17.580 | and nothing good will come out of this, right?
00:04:20.420 | But if you are a sort of, you know,
00:04:22.740 | tabula rasa reinforcement learning system
00:04:25.100 | that doesn't have a model of the world,
00:04:27.060 | you have to repeat falling off this cliff
00:04:30.500 | thousands of times before you figure out it's a bad idea.
00:04:32.780 | And then a few more thousand times
00:04:34.580 | before you figure out how to not do it.
00:04:36.940 | And then a few more million times
00:04:38.460 | before you figure out how to not do it
00:04:39.780 | in every situation you ever encounter.
00:04:42.500 | - So self-supervised learning still has to have
00:04:45.820 | some source of truth being told to it by somebody.
00:04:50.100 | - So you have to figure out a way without human assistance
00:04:54.540 | or without significant amount of human assistance
00:04:56.580 | to get that truth from the world.
00:04:59.100 | So the mystery there is how much signal is there,
00:05:03.980 | how much truth is there that the world gives you,
00:05:06.260 | whether it's the human world,
00:05:08.180 | like you watch YouTube or something like that,
00:05:10.020 | or it's the more natural world.
00:05:12.980 | So how much signal is there?
00:05:14.900 | - So here's the trick.
00:05:16.300 | There is way more signal in sort of a self-supervised setting
00:05:20.580 | than there is in either a supervised
00:05:22.500 | or reinforcement setting.
00:05:24.540 | And this goes back to my analogy of the cake,
00:05:28.340 | le cake, as someone has called it,
00:05:32.340 | where when you try to figure out how much information
00:05:36.020 | you ask the machine to predict
00:05:37.820 | and how much feedback you give the machine at every trial,
00:05:40.980 | in reinforcement learning,
00:05:41.820 | you give the machine a single scalar,
00:05:43.300 | you tell the machine you did good, you did bad,
00:05:45.340 | and you only tell this to the machine once in a while.
00:05:49.580 | When I say you, it could be the universe
00:05:51.380 | telling the machine, right?
00:05:52.780 | But it's just one scalar.
00:05:55.780 | And so as a consequence,
00:05:57.100 | you cannot possibly learn something very complicated
00:05:59.540 | without many, many, many trials
00:06:01.060 | where you get many, many feedbacks of this type.
00:06:04.700 | Supervised learning, you give a few bits to the machine
00:06:08.860 | at every sample.
00:06:10.180 | Let's say you're training a system on, you know,
00:06:14.300 | recognizing images on ImageNet.
00:06:16.300 | There are 1,000 categories,
00:06:17.660 | that's a little less than 10 bits of information per sample.
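As a quick check on that figure (a worked number, not part of the conversation): a correct label drawn from 1,000 roughly equally likely categories carries at most

$$\log_2(1000) \approx 9.97 \ \text{bits per sample},$$

compared with the single scalar reward per trial in the reinforcement setting just described.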
00:06:20.900 | But self-supervised learning here is a setting.
00:06:24.620 | Ideally, we don't know how to do this yet,
00:06:26.340 | but ideally you would show a machine a segment of video
00:06:31.340 | and then stop the video and ask the machine to predict
00:06:34.140 | what's going to happen next.
00:06:35.540 | And so you let the machine predict,
00:06:38.660 | and then you let time go by
00:06:41.380 | and show the machine what actually happened.
00:06:44.260 | And hope the machine will, you know,
00:06:46.300 | learn to do a better job at predicting next time around.
00:06:49.340 | There's a huge amount of information you give the machine
00:06:51.500 | because it's an entire video clip of, you know,
00:06:56.500 | the future after the video clip you fed it
00:06:59.180 | in the first place.
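To make that setting concrete, here is a toy sketch of the loop he describes: feed past frames, let the machine predict the future, then show it what actually happened and adjust. He is explicit that nobody knows how to make this work well yet, so the model, the shapes, and the plain L2 loss below are purely illustrative placeholders, not a real recipe.

```python
import torch
import torch.nn as nn

class VideoPredictor(nn.Module):
    """Illustrative stand-in: maps a clip of past frames to a clip of future frames."""
    def __init__(self, frame_dim=1024, hidden=2048, n_future=8):
        super().__init__()
        self.n_future, self.frame_dim = n_future, frame_dim
        self.net = nn.Sequential(
            nn.Linear(frame_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, frame_dim * n_future),
        )

    def forward(self, past_frames):            # (batch, n_past, frame_dim)
        context = past_frames.mean(dim=1)       # crude summary of the past clip
        return self.net(context).view(-1, self.n_future, self.frame_dim)

model = VideoPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(past, future):
    # Let the machine predict, then let time go by and compare with what happened.
    pred = model(past)
    loss = ((pred - future) ** 2).mean()        # naive L2 loss over the whole future clip
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```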
00:07:00.220 | - So both for language and for vision,
00:07:02.820 | there's a subtle, seemingly trivial construction,
00:07:06.860 | but maybe that's representative
00:07:08.460 | of what is required to create intelligence,
00:07:10.580 | which is filling the gap.
00:07:13.700 | - Filling the gaps.
00:07:14.700 | - Sounds dumb, but can you,
00:07:17.780 | it is possible you could solve all of intelligence
00:07:22.060 | in this way, just for both language,
00:07:25.260 | just give a sentence and continue it,
00:07:28.780 | or give a sentence and there's a gap in it,
00:07:31.140 | some words blanked out,
00:07:33.500 | and you fill in what words go there.
00:07:35.700 | For vision, you give a sequence of images
00:07:39.180 | and predict what's gonna happen next,
00:07:40.940 | or you fill in what happened in between.
00:07:43.860 | Do you think it's possible that formulation alone,
00:07:47.020 | as a signal for self-supervised learning,
00:07:50.980 | can solve intelligence for vision and language?
00:07:53.620 | - I think that's our best shot at the moment.
00:07:56.300 | So whether this will take us all the way to,
00:07:59.820 | you know, human level intelligence or something,
00:08:01.780 | or just cat level intelligence is not clear,
00:08:04.860 | but among all the possible approaches
00:08:07.340 | that people have proposed, I think it's our best shot.
00:08:09.500 | So I think this idea of an intelligence system
00:08:14.500 | filling in the blanks, either predicting the future,
00:08:18.860 | inferring the past, filling in missing information,
00:08:22.180 | I'm currently filling the blank of what is behind your head
00:08:26.660 | and what your head looks like from the back,
00:08:30.580 | because I have basic knowledge about how humans are made.
00:08:33.740 | And I don't know if you're gonna,
00:08:35.660 | what are you gonna say, at which point you're gonna speak,
00:08:37.260 | whether you're gonna move your head this way or that way,
00:08:38.980 | which way you're gonna look.
00:08:40.260 | But I know you're not gonna just dematerialize
00:08:42.100 | and reappear three meters down the hall,
00:08:44.940 | because I know what's possible and what's impossible,
00:08:49.540 | according to intuitive physics.
00:08:50.900 | - So you have a model of what's possible and what's impossible
00:08:53.260 | and then you'd be very surprised if it happens,
00:08:55.100 | and then you'll have to reconstruct your model.
00:08:57.860 | - Right, so that's the model of the world.
00:08:59.620 | It's what tells you what fills in the blanks.
00:09:02.260 | So given your partial information
00:09:04.460 | about the state of the world, given by your perception,
00:09:08.060 | your model of the world fills in the missing information.
00:09:11.340 | And that includes predicting the future,
00:09:13.740 | retrodicting the past,
00:09:15.220 | filling in things you don't immediately perceive.
00:09:18.380 | - And that doesn't have to be purely generic vision
00:09:22.260 | or visual information or generic language.
00:09:24.300 | You can go to specifics,
00:09:25.820 | like predicting what control decision you make
00:09:30.260 | when you're driving in a lane.
00:09:31.580 | You have a sequence of images from a vehicle,
00:09:35.580 | and then you have information,
00:09:38.380 | if you record it on video, where the car ended up going.
00:09:41.780 | So you can go back in time and predict where the car went
00:09:45.500 | based on the visual information.
00:09:46.660 | That's very specific, domain-specific.
00:09:49.420 | - Right, but the question is whether we can come up
00:09:51.460 | with sort of a generic method for training machines
00:09:56.460 | to do this kind of prediction or filling in the blanks.
00:09:59.820 | So right now, this type of approach
00:10:03.220 | has been unbelievably successful
00:10:05.540 | in the context of natural language processing.
00:10:08.140 | Every model in natural language processing
00:10:09.660 | is pre-trained in self-supervised manner
00:10:12.220 | to fill in the blanks.
00:10:13.660 | You show it a sequence of words, you remove 10% of them,
00:10:16.380 | and then you train some gigantic neural net
00:10:17.900 | to predict the words that are missing.
00:10:20.260 | And once you've pre-trained that network,
00:10:22.660 | you can use the internal representation learned by it
00:10:26.540 | as input to something that you trained, supervised,
00:10:30.380 | or whatever.
00:10:32.140 | That's been incredibly successful.
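For concreteness, a minimal sketch of that fill-in-the-blanks recipe: blank out a fraction of the tokens, train a network to score the missing words over the whole vocabulary, then reuse its internal representation as input to a downstream supervised model. The sizes, the roughly 10% mask rate, and all names here are toy placeholders, not any particular production system.

```python
import torch
import torch.nn as nn

VOCAB, DIM, MASK_ID = 50_000, 512, 0            # toy sizes; real systems are far larger

class MaskedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.to_vocab = nn.Linear(DIM, VOCAB)   # one score per word in the dictionary

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))    # internal representation to reuse later
        return self.to_vocab(h), h

model = MaskedLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def pretrain_step(tokens, mask_prob=0.10):
    # Remove ~10% of the words and train the net to predict what was removed.
    mask = torch.rand(tokens.shape) < mask_prob
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits, _ = model(corrupted)
    loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # only the blanks
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# After pretraining, the representation h (not the vocabulary scores) is what gets
# fed as input to a smaller supervised model for whatever task you actually care about.
```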
00:10:33.300 | Not so successful in images, although it's making progress.
00:10:37.500 | And it's based on sort of manual data augmentation.
00:10:42.500 | We can go into this later.
00:10:43.460 | But what has not been successful yet is training from video.
00:10:47.140 | So getting a machine to learn to represent the visual world,
00:10:50.180 | for example, by just watching video.
00:10:52.700 | Nobody has really succeeded in doing this.
00:10:54.740 | - Okay, well, let's kind of give a high-level overview.
00:10:57.460 | What's the difference in kind and in difficulty
00:11:02.340 | between vision and language?
00:11:03.900 | So you said people haven't been able to really kind of crack
00:11:08.820 | the problem of vision open
00:11:10.420 | in terms of self-supervised learning,
00:11:11.900 | but that may not be necessarily
00:11:13.740 | because it's fundamentally more difficult.
00:11:15.820 | Maybe like when we're talking about achieving,
00:11:18.660 | like passing the Turing test in the full spirit
00:11:22.260 | of the Turing test in language might be harder than vision.
00:11:24.860 | That's not obvious.
00:11:26.380 | So in your view, which is harder,
00:11:29.380 | or perhaps are they just the same problem?
00:11:31.660 | The farther we get to solving each,
00:11:34.820 | the more we realize it's all the same thing.
00:11:36.700 | It's all the same cake.
00:11:37.620 | - I think what I'm looking for are methods
00:11:40.180 | that make them look essentially like the same cake,
00:11:43.580 | but currently they're not.
00:11:44.740 | And the main issue with learning world models
00:11:48.460 | or learning predictive models is that
00:11:50.860 | the prediction is never a single thing,
00:11:55.860 | because the world is not entirely predictable.
00:11:59.220 | It may be deterministic or stochastic.
00:12:00.700 | We can get into the philosophical discussion about it,
00:12:02.940 | but even if it's deterministic,
00:12:05.260 | it's not entirely predictable.
00:12:07.420 | And so if I play a short video clip
00:12:11.740 | and then I ask you to predict what's going to happen next,
00:12:14.140 | there's many, many plausible continuations
00:12:16.340 | for that video clip.
00:12:18.300 | And the number of continuations grows
00:12:20.540 | with the interval of time that you're asking the system
00:12:23.900 | to make a prediction for.
00:12:26.460 | And so one big question with self-supervised learning
00:12:29.860 | is how you represent this uncertainty,
00:12:32.300 | how you represent multiple discrete outcomes,
00:12:35.180 | how you represent a sort of continuum
00:12:37.060 | of possible outcomes, et cetera.
00:12:40.380 | And if you are a sort of a classical machine learning person,
00:12:45.180 | you say, "Oh, you just represent a distribution."
00:12:47.580 | And that we know how to do when we're predicting words,
00:12:52.540 | missing words in the text,
00:12:53.660 | because you can have a neural net give a score
00:12:56.820 | for every word in a dictionary.
00:12:58.580 | It's a big list of numbers, maybe 100,000 or so.
00:13:02.420 | And you can turn them into a probability distribution
00:13:05.220 | that tells you when I say a sentence,
00:13:07.620 | the cat is chasing the blank in the kitchen.
00:13:12.300 | There are only a few words that make sense there.
00:13:15.820 | It could be a mouse or it could be a lizard
00:13:18.340 | or something like that.
00:13:21.540 | And if I say the blank is chasing the blank in the savanna,
00:13:25.820 | you also have a bunch of plausible options
00:13:27.820 | for those two words, right?
00:13:29.180 | Because you have kind of underlying reality
00:13:33.620 | that you can refer to to sort of fill in those blanks.
00:13:36.260 | So you cannot say for sure in the savanna
00:13:42.020 | if it's a lion or a cheetah or whatever,
00:13:44.460 | you cannot know if it's a zebra or a gnu or whatever,
00:13:49.460 | wildebeest, the same thing.
00:13:50.860 | - Yeah.
00:13:51.700 | - But you can represent the uncertainty
00:13:56.820 | by just a long list of numbers.
00:13:58.460 | Now, if I do the same thing with video
00:14:01.780 | and I ask you to predict a video clip,
00:14:04.300 | it's not a discrete set of potential frames.
00:14:07.380 | You have to have some way of representing
00:14:09.980 | a sort of infinite number of plausible continuations
00:14:13.540 | of multiple frames in a high dimensional continuous space.
00:14:17.460 | And we just have no idea how to do this properly.
00:14:20.540 | - Finite high dimensional.
00:14:22.860 | So like you--
00:14:23.700 | - It's finite high dimensional, yes.
00:14:25.300 | - Just like the words,
00:14:26.220 | they try to get it down to a small finite set
00:14:31.220 | of like under a million, something like that.
00:14:34.220 | - Something like that.
00:14:35.060 | - I mean, it's kind of ridiculous
00:14:36.020 | that we're doing a distribution
00:14:39.020 | of every single possible word for language and it works.
00:14:42.900 | It feels like that's a really dumb way to do it.
00:14:45.300 | Like there seems to be like there should be
00:14:49.720 | some more compressed representation
00:14:52.900 | of the distribution of the words.
00:14:55.020 | - You're right about that.
00:14:56.140 | - And so--
00:14:56.980 | - I agree.
00:14:57.800 | - Do you have any interesting ideas
00:14:58.900 | about how to represent all of reality in a compressed way
00:15:01.860 | such that you can form a distribution over it?
00:15:03.780 | - That's one of the big questions, how do you do that?
00:15:06.180 | But I mean, what's kind of,
00:15:07.980 | another thing that really is stupid about,
00:15:12.140 | I shouldn't say stupid,
00:15:13.060 | but like simplistic about current approaches
00:15:15.540 | to self-supervised learning in NLP and text
00:15:19.340 | is that not only do you represent
00:15:21.880 | a giant distribution of words,
00:15:23.780 | but for multiple words that are missing,
00:15:25.640 | those distributions are essentially independent
00:15:27.660 | of each other.
00:15:28.500 | And you don't pay too much of a price for this.
00:15:33.020 | So you can't, so the system,
00:15:36.720 | in the sentence that I gave earlier,
00:15:38.900 | if it gives a certain probability for a lion and a cheetah,
00:15:43.620 | and then a certain probability for gazelle, wildebeest,
00:15:48.420 | and zebra, those two probabilities
00:15:52.980 | are independent of each other.
00:15:54.780 | And it's not the case that those things are independent.
00:15:58.020 | Lions actually attack like bigger animals than cheetahs.
00:16:01.440 | So, there's a huge independence hypothesis in this process,
00:16:05.940 | which is not actually true.
00:16:07.780 | The reason for this is that we don't know how to represent
00:16:10.860 | properly distributions over combinatorial sequences
00:16:15.580 | of symbols, essentially,
00:16:17.340 | because the number of combinations grows exponentially
00:16:18.980 | with the length of the sequence.
00:16:21.300 | And so we have to use tricks for this,
00:16:22.740 | but those tricks kind of get around it,
00:16:26.380 | they don't really deal with it.
00:16:27.780 | So the big question is,
00:16:30.420 | would there be some sort of abstract
00:16:33.380 | latent representation of text that would say that,
00:16:37.420 | when I switch lion for cheetah,
00:16:40.660 | I also have to switch zebra for gazelle.
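A tiny numerical illustration of that independence issue (all numbers invented): if the two blanks are predicted independently, the joint distribution over pairs is forced to be the outer product of the two marginals, so there is no way to encode that switching the predator should also switch the prey.

```python
import numpy as np

predators = ["lion", "cheetah"]
prey = ["wildebeest", "gazelle"]

# Independently predicted marginal distributions for each blank (made-up numbers).
p_predator = np.array([0.5, 0.5])
p_prey = np.array([0.5, 0.5])

# Factorized model: the joint is just the outer product of the marginals,
# so every (predator, prey) pairing gets exactly 0.25.
joint_independent = np.outer(p_predator, p_prey)
print(joint_independent)

# A joint model with the same marginals could instead keep the pairing information:
joint_coupled = np.array([[0.40, 0.10],   # lion    -> mostly wildebeest
                          [0.10, 0.40]])  # cheetah -> mostly gazelle
print(joint_coupled.sum(axis=1), joint_coupled.sum(axis=0))  # marginals are still 0.5 each
```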
00:16:45.480 | - Yeah, so this independence assumption,
00:16:48.720 | let me throw some criticism at you that I often hear
00:16:51.140 | and see how you respond.
00:16:52.940 | So this kind of filling in the blanks is just statistics.
00:16:56.020 | You're not learning anything,
00:16:58.000 | like the deep underlying concepts.
00:17:01.580 | You're just mimicking stuff from the past.
00:17:05.660 | You're not learning anything new
00:17:07.540 | such that you can use it to generalize about the world.
00:17:10.800 | Or, okay, let me just say the crude version,
00:17:14.100 | which is it's just statistics.
00:17:16.200 | It's not intelligence.
00:17:17.860 | What do you have to say to that?
00:17:19.620 | What do you usually say to that
00:17:20.880 | if you kind of hear this kind of thing?
00:17:22.640 | - I don't get into those discussions
00:17:23.940 | because they're kind of pointless.
00:17:26.740 | So first of all, it's quite possible
00:17:28.740 | that intelligence is just statistics.
00:17:30.460 | It's just statistics of a particular kind.
00:17:32.540 | - But this is the philosophical question.
00:17:35.580 | Is it possible that intelligence is just statistics?
00:17:40.260 | - Yeah.
00:17:41.580 | But what kind of statistics?
00:17:43.500 | So if you are asking the question,
00:17:46.180 | are the models of the world that we learn,
00:17:50.620 | do they have some notion of causality?
00:17:53.380 | So if the criticism comes from people who say,
00:17:56.220 | current machine learning system don't care about causality,
00:17:59.420 | which by the way is wrong, I agree with them.
00:18:03.100 | Your model of the world should have your actions
00:18:06.560 | as one of the inputs,
00:18:09.100 | and that will drive you to learn causal models of the world
00:18:11.420 | where you know what intervention in the world
00:18:15.060 | will cause what result,
00:18:16.700 | or you can do this by observation of other agents
00:18:19.420 | acting in the world and observing the effect,
00:18:21.920 | other humans, for example.
00:18:24.220 | So I think at some level of description,
00:18:28.380 | intelligence is just statistics,
00:18:30.200 | but that doesn't mean you won't have models
00:18:35.180 | that have deep mechanistic explanation for what goes on.
00:18:40.060 | The question is, how do you learn them?
00:18:41.740 | That's the question I'm interested in.
00:18:44.420 | Because a lot of people who actually voice their criticism
00:18:49.340 | say that those mechanistic model
00:18:50.980 | have to come from someplace else.
00:18:52.660 | They have to come from human designers,
00:18:54.060 | they have to come from I don't know what.
00:18:56.180 | And obviously we learn them.
00:18:57.880 | Or if we don't learn them as an individual,
00:19:01.800 | nature learned them for us using evolution.
00:19:04.920 | So regardless of what you think,
00:19:07.180 | those processes have been learned somehow.
00:19:10.260 | - So if you look at the human brain,
00:19:12.940 | just like when we humans introspect
00:19:14.660 | about how the brain works,
00:19:16.340 | it seems like when we think about what is intelligence,
00:19:20.260 | we think about the high level stuff,
00:19:22.460 | like the models we've constructed,
00:19:23.960 | concepts like cognitive science,
00:19:25.580 | like concepts of memory and reasoning module,
00:19:28.700 | almost like these high level modules.
00:19:31.660 | Does this serve as a good analogy?
00:19:35.380 | Like are we ignoring the dark matter,
00:19:40.380 | the basic low level mechanisms,
00:19:43.580 | just like we ignore the way the operating system works,
00:19:45.820 | we're just using the high level software.
00:19:49.660 | We're ignoring that at the low level,
00:19:52.740 | the neural network might be doing something like statistics.
00:19:56.460 | Like me, sorry to use this word
00:19:59.140 | probably incorrectly and crudely,
00:20:00.580 | but doing this kind of fill in the gap kind of learning
00:20:03.340 | and just kind of updating the model constantly
00:20:05.740 | in order to be able to support the raw sensory information,
00:20:09.260 | to predict it and then adjust to the prediction
00:20:11.380 | when it's wrong.
00:20:12.420 | But like when we look at our brain at the high level,
00:20:15.860 | it feels like we're playing chess,
00:20:18.340 | like we're playing with high level concepts
00:20:22.260 | and we're stitching them together
00:20:23.700 | and we're putting them into long-term memory.
00:20:26.040 | But really what's going underneath
00:20:28.300 | is something we're not able to introspect,
00:20:30.200 | which is this kind of simple, large neural network
00:20:34.460 | that's just filling in the gaps.
00:20:36.020 | - Right, well, okay, so there's a lot of questions
00:20:38.260 | and a lot of answers there.
00:20:39.780 | Okay, so first of all,
00:20:40.620 | there's a whole school of thought in neuroscience,
00:20:42.700 | computational neuroscience in particular,
00:20:45.260 | that likes the idea of predictive coding,
00:20:47.800 | which is really related to the idea I was talking about
00:20:50.820 | in self-supervised learning.
00:20:52.060 | So everything is about prediction,
00:20:53.580 | the essence of intelligence is the ability to predict
00:20:56.360 | and everything the brain does is trying to predict
00:20:59.940 | everything from everything else.
00:21:02.140 | Okay, and that's really sort of the underlying principle,
00:21:04.780 | if you want, that self-supervised learning
00:21:07.820 | is trying to kind of reproduce this idea of prediction
00:21:10.660 | as kind of an essential mechanism
00:21:13.060 | of task-independent learning, if you want.
00:21:16.320 | The next step is what kind of intelligence
00:21:19.340 | are you interested in reproducing?
00:21:21.140 | And of course, we all think about trying to reproduce
00:21:25.300 | high-level cognitive processes in humans,
00:21:28.340 | but with machines, we're not even at the level
00:21:30.420 | of even reproducing the learning processes in a cat brain.
00:21:35.420 | Our most intelligent systems
00:21:39.020 | don't have as much common sense as a house cat.
00:21:41.960 | So how is it that cats learn?
00:21:45.180 | And cats don't do a whole lot of reasoning.
00:21:47.900 | They certainly have causal models.
00:21:49.580 | They certainly have, because many cats can figure out
00:21:53.620 | how they can act on the world to get what they want.
00:21:56.660 | They certainly have a fantastic model of intuitive physics,
00:22:01.660 | certainly the dynamics of their own bodies,
00:22:04.620 | but also of prey and things like that, right?
00:22:06.940 | So they're pretty smart.
00:22:09.940 | They only do this with about 800 million neurons.
00:22:12.460 | We are not anywhere close
00:22:15.980 | to reproducing this kind of thing.
00:22:17.980 | So to some extent, I could say,
00:22:21.340 | let's not even worry about the high-level cognition
00:22:26.340 | and long-term planning and reasoning that humans can do
00:22:28.660 | until we figure out,
00:22:30.100 | can we even reproduce what cats are doing?
00:22:32.500 | Now, that said, this ability to learn world models,
00:22:37.000 | I think is the key to the possibility
00:22:40.140 | of learning machines that can also reason.
00:22:43.160 | So whenever I give a talk, I say,
00:22:44.340 | there are three main challenges in machine learning.
00:22:47.300 | The first one is getting machines to learn
00:22:49.940 | to represent the world,
00:22:51.820 | and I'm proposing self-supervised learning.
00:22:54.840 | The second is getting machines to reason
00:22:58.000 | in ways that are compatible
00:22:59.240 | with essentially gradient-based learning,
00:23:01.600 | because this is what deep learning is all about, really.
00:23:04.280 | And the third one is something we have no idea how to solve,
00:23:07.640 | at least I have no idea how to solve,
00:23:09.480 | is can we get machines to learn hierarchical representations
00:23:14.360 | of action plans?
00:23:15.980 | We know how to train them
00:23:18.780 | to learn hierarchical representations of perception,
00:23:22.240 | with convolutional nets and things like that,
00:23:23.680 | and transformers, but what about action plans?
00:23:26.080 | Can we get them to spontaneously learn
00:23:28.320 | good hierarchical representations of actions?
00:23:30.560 | - Also gradient-based.
00:23:32.440 | - Yeah, all of that needs to be somewhat differentiable
00:23:35.920 | so that you can apply sort of gradient-based learning,
00:23:38.760 | which is really what deep learning is about.
00:23:40.920 | - So it's background, knowledge, ability to reason
00:23:46.760 | in a way that's differentiable,
00:23:50.520 | that is somehow connected, deeply integrated
00:23:53.840 | with that background knowledge,
00:23:55.480 | or builds on top of that background knowledge,
00:23:57.640 | and then given that background knowledge,
00:23:59.120 | be able to make hierarchical plans in the world.
00:24:02.400 | - So if you take classical optimal control,
00:24:05.480 | there's something in classical optimal control
00:24:07.000 | called model predictive control,
00:24:10.520 | and it's been around since the early '60s.
00:24:13.840 | NASA uses that to compute trajectories of rockets.
00:24:16.840 | And the basic idea is that you have a predictive model
00:24:20.600 | of the rocket, let's say,
00:24:21.840 | or whatever system you intend to control,
00:24:25.440 | which, given the state of the system at time t,
00:24:28.360 | and given an action that you're taking on the system,
00:24:31.640 | so for a rocket to be thrust,
00:24:33.520 | and all the controls you can have,
00:24:35.600 | it gives you the state of the system
00:24:37.960 | at time t plus delta t, right?
00:24:39.560 | So basically a differential equation, something like that.
00:24:44.240 | And if you have this model,
00:24:45.920 | and you have this model in the form
00:24:47.840 | of some sort of neural net,
00:24:49.360 | or some sort of set of formula
00:24:51.600 | that you can back propagate gradient through,
00:24:53.600 | you can do what's called model predictive control,
00:24:55.880 | or gradient-based model predictive control.
00:24:58.280 | So you have, you can unroll that model in time,
00:25:03.280 | you feed it a hypothesized sequence of actions,
00:25:09.560 | and then you have some objective function
00:25:13.560 | that measures how well, at the end of the trajectory,
00:25:16.040 | the system has succeeded or matched what you want it to do.
00:25:18.880 | Is it a robot arm?
00:25:21.200 | Have you grasped the object you want to grasp?
00:25:23.400 | If it's a rocket, are you at the right place
00:25:26.320 | near the space station?
00:25:27.600 | Things like that.
00:25:29.080 | And by back propagation through time,
00:25:30.960 | and again, this was invented in the 1960s
00:25:32.800 | by optimal control theorists,
00:25:35.120 | you can figure out what is the optimal sequence of actions
00:25:39.080 | that will get my system to the best final state.
00:25:44.040 | So that's a form of reasoning.
00:25:47.560 | It's basically planning,
00:25:48.640 | and a lot of planning systems in robotics
00:25:51.120 | are actually based on this.
00:25:52.840 | And you can think of this as a form of reasoning.
00:25:56.160 | So to take the example of the teenager driving a car again,
00:26:00.800 | you have a pretty good dynamical model of the car,
00:26:02.880 | it doesn't need to be very accurate,
00:26:04.200 | but you know, again, that if you turn the wheel
00:26:06.760 | to the right and there is a cliff,
00:26:08.080 | you're gonna run off the cliff, right?
00:26:09.400 | You don't need to have a very accurate model
00:26:10.840 | to predict that.
00:26:12.080 | And you can run this in your mind
00:26:13.640 | and decide not to do it for that reason,
00:26:16.040 | because you can predict in advance
00:26:17.440 | that the result is gonna be bad.
00:26:18.560 | So you can sort of imagine different scenarios
00:26:21.000 | and then employ or take the first step in the scenario
00:26:25.200 | that is most favorable,
00:26:26.360 | and then repeat the process of planning.
00:26:27.800 | That's called receding horizon model predictive control.
00:26:30.560 | So even all those things have names,
00:26:33.000 | going back decades.
00:26:35.800 | And so if you're not, you know,
00:26:38.960 | in classical optimal control,
00:26:40.040 | the model of the world is not generally learned.
00:26:42.440 | There's, you know, sometimes a few parameters
00:26:44.840 | you have to identify,
00:26:45.680 | that's called systems identification.
00:26:47.080 | But generally the model is mostly deterministic
00:26:52.000 | and mostly built by hand.
00:26:53.280 | So the big question of AI,
00:26:55.640 | I think the big challenge of AI for the next decade
00:26:58.720 | is how do we get machines to run predictive models
00:27:01.080 | of the world that deal with uncertainty
00:27:03.680 | and deal with the real world in all this complexity.
00:27:05.800 | So it's not just the trajectory of a rocket,
00:27:08.120 | which you can reduce to first principles.
00:27:10.200 | It's not even just the trajectory of a robot arm,
00:27:13.000 | which again, you can model by, you know,
00:27:14.880 | careful mathematics, but it's everything else,
00:27:17.160 | everything we observe in the world, you know,
00:27:18.840 | people behavior, you know, physical systems
00:27:22.960 | that involve collective phenomena like water or, you know,
00:27:27.960 | trees and, you know, branches in a tree or something,
00:27:31.880 | or like complex things that, you know,
00:27:35.040 | humans have no trouble developing abstract representations
00:27:38.520 | and predictive model for,
00:27:39.840 | but we still don't know how to do with machines.
00:27:41.600 | - Where do you put in these three,
00:27:43.880 | maybe in the planning stages,
00:27:46.160 | the game theoretic nature of this world,
00:27:50.640 | where your actions not only respond
00:27:52.960 | to the dynamic nature of the world, the environment,
00:27:55.520 | but also affect it.
00:27:57.520 | So if there's other humans involved,
00:27:59.840 | is this point number four,
00:28:02.240 | or is it somehow integrated
00:28:03.400 | into the hierarchical representation of action in your view?
00:28:06.640 | - I think it's integrated.
00:28:07.480 | It's just that now your model of the world has to deal with,
00:28:11.360 | you know, it just makes it more complicated, right?
00:28:13.080 | The fact that humans are complicated
00:28:15.600 | and not easily predictable,
00:28:17.240 | that makes your model of the world much more complicated,
00:28:19.880 | that much more complicated.
00:28:21.360 | - Well, there's a chess, I mean,
00:28:23.080 | I suppose chess is an analogy.
00:28:25.280 | So Monte Carlo tree search.
00:28:28.120 | I mean, there's a, I go, you go, I go, you go.
00:28:32.040 | Like Andrej Karpathy recently gave a talk
00:28:34.960 | at MIT about car doors.
00:28:36.960 | I think there's some machine learning too,
00:28:39.280 | but mostly car doors.
00:28:40.920 | And there's a dynamic nature to the car,
00:28:43.360 | like the person opening the door checking.
00:28:45.720 | I mean, he wasn't talking about that.
00:28:46.880 | He was talking about the perception problem
00:28:48.440 | of what the ontology of what defines a car door,
00:28:50.920 | this big philosophical question.
00:28:52.920 | But to me, it was interesting,
00:28:54.080 | 'cause like it's obvious
00:28:55.720 | that the person opening the car doors,
00:28:57.320 | they're trying to get out, like here in New York,
00:28:59.600 | trying to get out of the car.
00:29:01.440 | You slowing down is going to signal something.
00:29:03.640 | You speeding up is gonna signal something.
00:29:05.440 | That's a dance.
00:29:06.480 | It's a asynchronous chess game.
00:29:10.200 | I don't know.
00:29:11.040 | So it feels like it's not just,
00:29:16.960 | I mean, I guess you can integrate all of them
00:29:18.800 | to one giant model,
00:29:20.360 | like the entirety of these little interactions.
00:29:24.360 | 'Cause it's not as complicated as chess.
00:29:25.760 | It's just like a little dance.
00:29:27.160 | We do like a little dance together
00:29:28.800 | and then we figure it out.
00:29:30.000 | - Well, in some ways it's way more complicated than chess
00:29:32.520 | because it's continuous,
00:29:35.000 | it's uncertain in a continuous manner.
00:29:37.280 | - It doesn't feel more complicated.
00:29:39.840 | - But it doesn't feel more complicated
00:29:41.080 | because that's what we've evolved to solve.
00:29:43.680 | This is the kind of problem we've evolved to solve.
00:29:45.480 | And so we're good at it
00:29:46.400 | because nature has made us good at it.
00:29:49.280 | Nature has not made us good at chess.
00:29:52.360 | We completely suck at chess.
00:29:55.720 | In fact, that's why we designed it as a game,
00:29:58.000 | is to be challenging.
00:29:59.040 | And if there is something that recent progress
00:30:02.600 | in chess and Go has made us realize
00:30:05.600 | is that humans are really terrible at those things,
00:30:07.920 | like really bad.
00:30:08.920 | There was a story before AlphaGo
00:30:11.520 | that the best Go players thought
00:30:15.200 | there were maybe two or three stones
00:30:16.720 | behind an ideal player that they would call God.
00:30:19.640 | In fact, no, there are like nine or 10 stones behind,
00:30:23.680 | I mean, which is bad.
00:30:25.360 | So we're not good at,
00:30:27.400 | and it's because we have limited working memory.
00:30:30.360 | We're not very good at doing this tree exploration
00:30:32.960 | that computers are much better at doing than we are,
00:30:36.760 | but we are much better
00:30:37.960 | at learning differentiable models of the world.
00:30:40.600 | I mean, I say differentiable in a kind of,
00:30:43.840 | I should say not differentiable in the sense
00:30:46.040 | that we run backprop through it,
00:30:47.480 | but in the sense that our brain has some mechanism
00:30:50.520 | for estimating gradients of some kind.
00:30:54.080 | And that's what makes us efficient.
00:30:56.520 | So if you have an agent that consists of a model
00:31:01.520 | of the world, which in the human brain
00:31:04.360 | is basically the entire front half of your brain,
00:31:06.800 | an objective function, which in humans
00:31:13.320 | is a combination of two things.
00:31:14.400 | There is your sort of intrinsic motivation module,
00:31:17.640 | which is in the basal ganglia,
00:31:19.120 | the base of your brain.
00:31:20.080 | That's the thing that measures pain and hunger
00:31:22.480 | and things like that, like immediate feelings and emotions.
00:31:26.800 | And then there is the equivalent
00:31:30.720 | of what people in reinforcement learning call a critic,
00:31:32.560 | which is a sort of module that predicts ahead
00:31:36.040 | what the outcome of a situation will be.
00:31:41.040 | And so it's not a cost function,
00:31:43.800 | but it's sort of not an objective function,
00:31:45.400 | but it's sort of a trained predictor
00:31:49.000 | of the ultimate objective function.
00:31:50.960 | And that also is differentiable.
00:31:52.560 | And so if all of this is differentiable,
00:31:54.640 | your cost function, your critic, your world model,
00:31:59.640 | then you can use gradient-based type methods
00:32:03.080 | to do planning, to do reasoning, to do learning,
00:32:05.840 | to do all the things that we'd like
00:32:08.160 | an intelligent agent to do.
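A schematic sketch of the agent just outlined: a world model, an intrinsic cost standing in for the hardwired "feelings" module, and a trained critic that predicts the long-run objective, with planning done by optimizing an action sequence through all three by gradient descent. Every component below is a toy placeholder for the architecture being described, not an implementation of it.

```python
import torch
import torch.nn as nn

state_dim, action_dim, horizon = 8, 2, 10

world_model = nn.Linear(state_dim + action_dim, state_dim)   # (state, action) -> next state (toy)
critic = nn.Linear(state_dim, 1)          # trained predictor of the long-run objective

def intrinsic_cost(state):
    # Stand-in for the hardwired module (hunger, pain, ...): here, distance from zero.
    return (state ** 2).sum()

def plan(state, iters=50, lr=0.05):
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.SGD([actions], lr=lr)
    for _ in range(iters):
        s, total = state, 0.0
        for a in actions:                                     # imagine the rollout
            s = world_model(torch.cat([s, a]))
            total = total + intrinsic_cost(s)
        total = total + critic(s).squeeze()                   # predicted future cost
        opt.zero_grad()
        total.backward()                                      # gradients flow through
        opt.step()                                            # all the differentiable pieces
    return actions.detach()

first_action = plan(torch.randn(state_dim))[0]
```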
00:32:11.840 | - And a gradient-based learning,
00:32:14.200 | like what's your intuition?
00:32:15.360 | That's probably at the core of what can solve intelligence.
00:32:18.400 | So you don't need like a logic-based reasoning in your view.
00:32:23.400 | - I don't know how to make logic-based reasoning
00:32:27.280 | compatible with efficient learning.
00:32:29.760 | - Yeah.
00:32:31.000 | - Okay, I mean, there is a big question,
00:32:32.320 | perhaps a philosophical question.
00:32:33.880 | I mean, it's not that philosophical,
00:32:35.200 | but that we can ask is that, you know,
00:32:38.080 | all the learning algorithms we know
00:32:40.360 | from engineering and computer science
00:32:43.280 | proceed by optimizing some objective function.
00:32:46.520 | - Yeah. - Right?
00:32:48.320 | So one question we may ask is,
00:32:49.920 | does learning in the brain minimize an objective function?
00:32:54.720 | I mean, it could be a composite
00:32:57.320 | of multiple objective functions,
00:32:58.480 | but it's still an objective function.
00:33:00.320 | Second, if it does optimize an objective function,
00:33:04.640 | does it do it by some sort of gradient estimation?
00:33:09.640 | You know, it doesn't need to be backprop,
00:33:10.880 | but some way of estimating the gradient in an efficient manner
00:33:14.840 | whose complexity is on the same order of magnitude
00:33:17.000 | as actually running the inference.
00:33:20.760 | 'Cause you can't afford to do things like, you know,
00:33:24.920 | perturbing a weight in your brain
00:33:26.520 | to figure out what the effect is,
00:33:28.040 | and then sort of, you know,
00:33:29.640 | you can't do sort of estimating gradient by perturbation.
00:33:32.640 | It's, to me, it seems very implausible
00:33:35.400 | that the brain uses some sort of, you know,
00:33:39.200 | zeroth-order black box gradient-free optimization
00:33:42.960 | because it's so much less efficient
00:33:45.160 | than gradient optimization.
00:33:46.280 | So it has to have a way of estimating gradient.
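The efficiency gap being pointed at can be put as a simple count (illustrative arithmetic, not part of the conversation): estimating a gradient by perturbing one parameter at a time costs roughly one extra forward pass per parameter, whereas backpropagation delivers all of them at once:

$$\underbrace{N + 1 \ \text{forward passes}}_{\text{perturbation, for } N \text{ parameters}} \qquad \text{vs.} \qquad \underbrace{\approx 1 \ \text{forward} + 1 \ \text{backward pass}}_{\text{backprop}}.$$

For anything with billions of parameters or synapses, the perturbation route is hopelessly slow, which is the sense in which zeroth-order optimization is "so much less efficient."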
00:33:49.240 | - Is it possible that some kind of logic-based reasoning
00:33:52.760 | emerges in pockets as a useful, like you said,
00:33:56.320 | if the brain is an objective function,
00:33:58.080 | maybe it's a mechanism for creating objective functions.
00:34:01.280 | It's a mechanism for creating knowledge bases, for example,
00:34:06.280 | that can then be queried.
00:34:08.360 | Like, maybe it's like an efficient representation
00:34:10.240 | of knowledge that's learned in a gradient-based way
00:34:12.640 | or something like that.
00:34:13.760 | - Well, so I think there is a lot of different types
00:34:15.920 | of intelligence.
00:34:17.280 | So first of all, I think the type of logical reasoning
00:34:19.600 | that we think about, that we are, you know,
00:34:23.040 | maybe stemming from, you know, sort of classical AI
00:34:26.000 | of the 1970s and '80s,
00:34:27.640 | I think humans use that relatively rarely
00:34:32.920 | and are not particularly good at it.
00:34:34.640 | - But we judge each other based on our ability
00:34:37.480 | to solve those rare problems.
00:34:40.520 | It's called an IQ test.
00:34:41.600 | - I don't think so.
00:34:42.600 | Like, I'm not very good at chess.
00:34:45.160 | - Yes, I'm judging you this whole time because,
00:34:48.480 | well, we actually-
00:34:49.720 | - With your, you know, heritage,
00:34:51.760 | I'm sure you're good at chess.
00:34:53.440 | - No, stereotypes.
00:34:55.000 | Not all stereotypes are true.
00:34:56.640 | - Well, I'm terrible at chess.
00:34:58.960 | So, you know, but I think perhaps another type
00:35:03.960 | of intelligence that I have is this, you know,
00:35:07.560 | ability of sort of building models to the world from,
00:35:11.000 | you know, reasoning, obviously, but also data.
00:35:15.640 | And those models generally are more kind of analogical,
00:35:18.600 | right?
00:35:19.440 | So it's reasoning by simulation and by analogy,
00:35:23.880 | where you use one model to apply to a new situation,
00:35:26.840 | even though you've never seen that situation,
00:35:28.400 | you can sort of connect it to a situation
00:35:31.560 | you've encountered before.
00:35:33.440 | And your reasoning is more, you know,
00:35:36.360 | akin to some sort of internal simulation.
00:35:38.360 | So you're kind of simulating what's happening
00:35:41.080 | when you're building, I don't know,
00:35:42.160 | a box out of wood or something, right?
00:35:44.000 | You kind of imagine in advance,
00:35:46.360 | like what would be the result of, you know,
00:35:47.760 | cutting the wood in this particular way?
00:35:49.560 | Are you going to use, you know, screws and nails or whatever?
00:35:52.800 | When you are interacting with someone,
00:35:54.080 | you also have a model of that person
00:35:55.720 | and sort of interact with that person, you know,
00:35:59.480 | having this model in mind to kind of tell the person
00:36:03.560 | what you think is useful to them.
00:36:05.240 | So I think this ability to construct models of the world
00:36:10.160 | is basically the essence of intelligence.
00:36:13.840 | And the ability to use it then to plan actions
00:36:18.200 | that will fulfill a particular criterion,
00:36:23.040 | of course, is necessary as well.
00:36:25.440 | - So I'm going to ask you a series of impossible questions
00:36:27.720 | as we keep asking, as I've been doing.
00:36:30.160 | So if that's the fundamental sort of dark matter
00:36:33.400 | of intelligence, this ability to form a background model,
00:36:36.560 | what's your intuition about how much knowledge is required?
00:36:41.440 | You know, I think dark matter,
00:36:43.120 | you can put a percentage on it
00:36:46.000 | of the composition of the universe
00:36:50.040 | and how much of it is dark matter,
00:36:51.440 | how much of it is dark energy.
00:36:52.640 | How much information do you think is required
00:36:57.640 | to be a house cat?
00:36:59.920 | So you have to be able to, when you see a box,
00:37:02.160 | go in it, when you see a human, compute the most evil action,
00:37:06.240 | if there's a thing that's near an edge, you knock it off.
00:37:09.600 | All of that, plus the extra stuff you mentioned,
00:37:12.720 | which is a great self-awareness of the physics
00:37:15.720 | of your own body and the world.
00:37:18.740 | How much knowledge is required, do you think, to solve it?
00:37:21.600 | I don't even know how to measure an answer to that question.
00:37:25.600 | - I'm not sure how to measure it,
00:37:26.680 | but whatever it is, it fits in about 800,000 neurons.
00:37:32.100 | 800 million neurons, sorry.
00:37:33.900 | - The representation does.
00:37:35.380 | - Everything, all knowledge, everything, right?
00:37:38.660 | It's less than a billion.
00:37:41.460 | A dog is two billion, but a cat is less than one billion.
00:37:44.380 | And so multiply that by a thousand
00:37:48.100 | and you get the number of synapses.
00:37:50.260 | And I think almost all of it is learned
00:37:52.740 | through a sort of self-supervised learning,
00:37:55.900 | although I think a tiny sliver
00:37:58.460 | is learned through reinforcement learning,
00:37:59.860 | and certainly very little through
00:38:02.180 | classical supervised learning,
00:38:03.300 | although it's not even clear how supervised learning
00:38:05.180 | actually works in the biological world.
00:38:08.100 | So I think almost all of it is self-supervised learning,
00:38:12.860 | but it's driven by the sort of ingrained objective functions
00:38:17.860 | that a cat or a human have at the base of their brain,
00:38:21.380 | which kind of drives their behavior.
00:38:24.880 | So nature tells us you're hungry.
00:38:29.480 | It doesn't tell us how to feed ourselves.
00:38:31.900 | That's something that the rest of our brain
00:38:33.500 | has to figure out, right?
00:38:34.820 | - What's interesting is there might be more
00:38:37.940 | like deeper objective functions
00:38:39.660 | that are allowing the whole thing.
00:38:41.300 | So hunger may be some kind of,
00:38:44.500 | now you go to like neurobiology,
00:38:46.140 | it might be just the brain
00:38:47.500 | trying to maintain homeostasis.
00:38:52.460 | So hunger is just one of the human perceivable symptoms
00:38:58.020 | of the brain being unhappy with the way things are currently.
00:39:01.440 | It could be just like one really dumb
00:39:03.240 | objective function at the core.
00:39:04.920 | - But that's how behavior is driven.
00:39:08.440 | The fact that the basal ganglia
00:39:11.240 | drive us to do things that are different
00:39:14.800 | from say an orangutan or certainly a cat
00:39:18.160 | is what makes human nature
00:39:20.040 | versus orangutan nature versus cat nature.
00:39:23.240 | So for example,
00:39:25.680 | our basal ganglia drives us to
00:39:28.540 | seek the company of other humans.
00:39:32.220 | And that's because nature has figured out
00:39:34.540 | that we need to be social animals
00:39:36.140 | for our species to survive.
00:39:37.540 | And it's true of many primates.
00:39:40.320 | It's not true of orangutans.
00:39:42.620 | Orangutans are solitary animals.
00:39:44.920 | They don't seek the company of others.
00:39:46.900 | In fact, they avoid them.
00:39:48.140 | In fact, they scream at them when they come too close
00:39:51.060 | because they're territorial.
00:39:52.740 | 'Cause for their survival,
00:39:55.880 | evolution has figured out that's the best thing.
00:39:58.280 | I mean, they're occasionally social, of course,
00:40:00.040 | for reproduction and stuff like that.
00:40:03.520 | But they're mostly solitary.
00:40:05.920 | So all of those behaviors are not part of intelligence.
00:40:09.720 | People say, "Oh, you're never gonna have
00:40:11.040 | "intelligent machines because human intelligence is social."
00:40:13.960 | But then you look at orangutans, you look at octopus.
00:40:16.840 | Octopus never know their parents.
00:40:18.800 | They barely interact with any other.
00:40:20.480 | And they get to be really smart
00:40:22.200 | in less than a year, in like half a year.
00:40:24.920 | In a year, they're adults.
00:40:27.640 | In two years, they're dead.
00:40:28.800 | So there are things that we think, as humans,
00:40:33.600 | are intimately linked with intelligence,
00:40:35.760 | like social interaction, like language.
00:40:38.840 | I think we give way too much importance to language
00:40:43.520 | as a substrate of intelligence as humans
00:40:46.760 | because we think our reasoning is so linked with language.
00:40:49.840 | - So to solve the house cat intelligence problem,
00:40:53.480 | you think you could do it on a desert island?
00:40:55.480 | You could have-- - Pretty much.
00:40:56.760 | - You could just have a cat sitting there
00:40:58.760 | looking at the waves, at the ocean waves,
00:41:03.160 | and figure a lot of it out.
00:41:05.720 | - It needs to have sort of the right set of drives
00:41:08.840 | to kind of get it to do the thing
00:41:12.540 | and learn the appropriate things, right?
00:41:14.320 | But like, for example, baby humans are driven
00:41:19.440 | to learn to stand up and walk.
00:41:21.920 | That's kind of, this desire is hardwired.
00:41:26.000 | How to do it precisely is not, that's learned.
00:41:28.520 | But the desire to-- - To walk?
00:41:30.440 | - Move around and stand up, that's sort of hardwired.
00:41:35.440 | But it's very simple to hardwire this kind of stuff.
00:41:38.040 | - Oh, like the desire to, well, that's interesting.
00:41:42.760 | You're hardwired to want to walk.
00:41:45.600 | That's not a, there's gotta be a deeper need for walking.
00:41:50.440 | I think it was probably socially imposed by society
00:41:53.120 | that you need to walk, all the other bipedal--
00:41:55.560 | - No, like a lot of simple animals that, you know,
00:41:58.280 | would probably walk without ever watching
00:42:01.040 | any other members of the species.
00:42:03.880 | - It seems like a scary thing to have to do
00:42:06.820 | 'cause you suck at bipedal walking at first.
00:42:09.280 | It seems crawling is much safer, much more,
00:42:13.380 | like, why are you in a hurry?
00:42:15.720 | - Well, because you have this thing
00:42:17.560 | that drives you to do it, you know,
00:42:19.320 | which is sort of part of the sort of human development.
00:42:25.080 | - Is that understood, actually, what--
00:42:26.720 | - Not entirely, no.
00:42:28.280 | - What's the reason to get on two feet?
00:42:29.760 | It's really hard.
00:42:30.680 | Like, most animals don't get on two feet.
00:42:32.800 | Why is that? - Well, they get on four feet.
00:42:34.000 | You know, many mammals get on four feet.
00:42:35.800 | - Yeah, they do get-- - Very quickly.
00:42:36.800 | Some of them, extremely quickly.
00:42:38.520 | - But I don't, you know, like, from the last time
00:42:41.400 | I've interacted with a table, that's much more stable
00:42:43.640 | than a thing on two legs.
00:42:44.980 | It's just a really hard problem.
00:42:46.440 | - Yeah, I mean, birds have figured it out with two feet.
00:42:49.640 | - Well, technically, we can go into ontology.
00:42:52.000 | They have four.
00:42:53.160 | I guess they have two feet.
00:42:54.480 | - They have two feet. - Chickens.
00:42:56.400 | - You know, dinosaurs have two feet, many of them.
00:42:58.840 | - Allegedly.
00:42:59.680 | I'm just now learning that T-Rex was eating grass,
00:43:04.320 | not other animals.
00:43:05.400 | T-Rex might have been a friendly pet.
00:43:08.040 | What do you think about, I don't know if you looked at
00:43:12.440 | the test for general intelligence
00:43:14.620 | that Francois Chollet put together.
00:43:16.380 | I don't know if you got a chance to look at
00:43:18.100 | that kind of thing.
00:43:19.660 | What's your intuition about how to solve
00:43:21.820 | like an IQ type of test?
00:43:23.740 | - I don't know.
00:43:24.580 | I think it's so outside of my radar screen
00:43:26.140 | that it's not really relevant, I think, in the short term.
00:43:30.700 | - Well, I guess one way to ask, another way,
00:43:33.920 | perhaps more closer to what, to your work,
00:43:37.280 | is like, how do you solve MNIST
00:43:40.260 | with very little example data?
00:43:42.720 | - That's right, and that's the answer to this probably,
00:43:44.840 | is self-supervised learning.
00:43:45.840 | Just learn to represent images, and then learning
00:43:48.240 | to recognize handwritten digits on top of this
00:43:51.800 | will only require a few samples.
00:43:53.640 | And we observe this in humans, right?
00:43:55.480 | You show a young child a picture book
00:43:58.680 | with a couple of pictures of an elephant, and that's it.
00:44:01.960 | The child knows what an elephant is.
00:44:03.880 | And we see this today with practical systems
00:44:06.720 | that we train image recognition systems
00:44:09.520 | with enormous amounts of images,
00:44:13.660 | either completely self-supervised
00:44:15.740 | or very weakly supervised.
00:44:16.960 | For example, you can train a neural net
00:44:20.840 | to predict whatever hashtag people type on Instagram, right?
00:44:24.120 | Then you can do this with billions of images
00:44:25.740 | 'cause there's billions per day that are showing up.
00:44:28.520 | So the amount of training data there
00:44:30.640 | is essentially unlimited.
00:44:32.300 | And then you take the output representation,
00:44:35.020 | you know, a couple layers down from the output
00:44:37.320 | of what the system learned,
00:44:39.400 | and feed this as input to a classifier
00:44:42.000 | for any object in the world that you want,
00:44:43.760 | and it works pretty well.
00:44:45.000 | So that's transfer learning, okay?
00:44:47.600 | Or weakly supervised transfer learning.
00:44:50.140 | People are making very, very fast progress
00:44:53.480 | using self-supervised learning
00:44:55.280 | for this kind of scenario as well.
00:44:57.380 | And, you know, my guess is that
00:45:00.760 | that's gonna be the future.
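As a concrete illustration of the weakly supervised transfer-learning recipe described above, here is a minimal sketch assuming a PyTorch setup; the ResNet-50 backbone, the frozen-feature protocol, and the 10-class head are illustrative stand-ins, not the Instagram-hashtag system being discussed.

```python
import torch
import torch.nn as nn
import torchvision

# Stand-in for a network pretrained on a weakly supervised task
# (e.g. hashtag prediction); here an ImageNet-pretrained ResNet-50.
backbone = torchvision.models.resnet50(weights="DEFAULT")
backbone.fc = nn.Identity()            # drop the original output layer
for p in backbone.parameters():        # freeze the pretrained representation
    p.requires_grad = False
backbone.eval()

# Train only a small classifier on top of the frozen features.
classifier = nn.Linear(2048, 10)       # 10 downstream classes (assumed)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():              # features come from the frozen backbone
        features = backbone(images)
    loss = loss_fn(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Only the small linear head is trained here; the representation itself comes "for free" from the pretraining task, which is the point of the transfer-learning scenario described.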
00:45:02.520 | - For self-supervised learning,
00:45:03.640 | how much cleaning do you think is needed
00:45:06.800 | for filtering malicious signal,
00:45:11.760 | or what's a better term?
00:45:13.000 | But like a lot of people use hashtags on Instagram
00:45:15.720 | to get like good SEO
00:45:20.040 | that doesn't fully represent the contents of the image.
00:45:23.080 | Like they'll put a picture of a cat
00:45:24.480 | and hashtag it with like science, awesome, fun.
00:45:28.040 | I don't know, all kinds.
00:45:29.720 | Why would you put science?
00:45:31.200 | That's not very good SEO.
00:45:33.080 | - The way my colleagues who worked on this project
00:45:34.960 | at Facebook now Meta, Meta.AI,
00:45:38.680 | a few years ago dealt with this is that
00:45:41.560 | they only selected something like 17,000 tags
00:45:43.760 | that correspond to kind of physical things or situations.
00:45:48.080 | Like, you know, that has some visual content.
00:45:50.320 | So, you know, you wouldn't have like #tbt
00:45:55.800 | or anything like that.
00:45:57.120 | - Oh, so they keep a very select set of hashtags
00:46:00.800 | is what you're saying? - Yeah.
00:46:01.800 | - Okay.
00:46:02.960 | - It's still on the order of, you know, 10 to 20,000.
00:46:06.040 | So it's fairly large.
00:46:07.920 | - Okay.
00:46:09.040 | Can you tell me about data augmentation?
00:46:11.240 | What the heck is data augmentation?
00:46:13.240 | And how is it used, maybe contrast of learning for video?
00:46:18.240 | What are some cool ideas here?
00:46:20.840 | - Right, so data augmentation,
00:46:22.080 | I mean, first data augmentation, you know,
00:46:23.800 | is the idea of artificially increasing the size
00:46:26.120 | of your training set by distorting the images
00:46:29.360 | that you have in ways that don't change
00:46:31.000 | the nature of the image, right?
00:46:32.320 | So you take, you do MNIST,
00:46:33.960 | you can do data augmentation on MNIST.
00:46:35.440 | And people have done this since the 1990s, right?
00:46:37.320 | You take an MNIST digit and you shift it a little bit,
00:46:40.840 | or you change the size or rotate it, skew it,
00:46:44.800 | you know, et cetera.
00:46:46.960 | - Add noise.
00:46:48.240 | - Add noise, et cetera.
00:46:49.480 | And it works better.
00:46:50.800 | If you train a supervised classifier with augmented data,
00:46:53.440 | you're gonna get better results.
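A minimal sketch of the label-preserving distortions described here (small shift, rotation, rescale, added noise), written with torchvision transforms as an assumed toolkit; the parameter values are arbitrary.

```python
import torch
from torchvision import transforms

# Distortions that do not change the identity of an MNIST digit:
# small shifts, rotations, rescalings, and a bit of additive noise.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=10,            # small rotation
                            translate=(0.1, 0.1),  # small shift
                            scale=(0.9, 1.1)),     # small rescale
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.05 * torch.randn_like(x)).clamp(0.0, 1.0)),
])
```

Each epoch the classifier then sees a freshly distorted copy of every training digit, which effectively enlarges the training set.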
00:46:55.560 | Now it's become really interesting
00:46:58.600 | over the last couple of years,
00:47:00.400 | because a lot of self-supervised learning techniques
00:47:04.160 | to pre-train vision systems are based on data augmentation.
00:47:08.000 | And the basic techniques is originally inspired
00:47:12.000 | by techniques that I worked on in the early '90s
00:47:15.840 | and Geoff Hinton worked on also in the early '90s.
00:47:17.720 | They were sort of parallel work.
00:47:20.040 | I used to call this Siamese networks.
00:47:21.600 | So basically you take two identical copies
00:47:24.960 | of the same network, they share the same weights,
00:47:27.720 | and you show two different views of the same object.
00:47:31.760 | Either those two different views may have been obtained
00:47:33.920 | by data augmentation, or maybe it's two different views
00:47:36.480 | of the same scene from a camera that you moved
00:47:39.320 | or at different times or something like that, right?
00:47:41.360 | Or two pictures of the same person, things like that.
00:47:44.480 | And then you train this neural net,
00:47:46.440 | those two identical copies of this neural net
00:47:48.400 | to produce an output representation, a vector,
00:47:51.480 | in such a way that the representation
00:47:53.960 | for those two images are as close to each other as possible,
00:47:58.920 | as identical to each other as possible, right?
00:48:00.840 | Because you want the system to basically learn a function
00:48:04.640 | that will be invariant, that will not change,
00:48:07.160 | whose output will not change when you transform those inputs
00:48:10.840 | in those particular ways, right?
00:48:12.880 | So that's easy to do.
00:48:15.680 | What's complicated is how do you make sure
00:48:17.720 | that when you show two images that are different,
00:48:19.520 | the system will produce different things?
00:48:21.960 | Because if you don't have a specific provision for this,
00:48:26.200 | the system will just ignore the input.
00:48:28.240 | When you train it, it will end up ignoring the input
00:48:30.360 | and just produce a constant vector
00:48:31.720 | that is the same for every input, right?
00:48:33.640 | That's called a collapse.
00:48:35.200 | Now, how do you avoid collapse?
00:48:36.720 | So there's two ideas.
00:48:37.760 | One idea that I proposed in the early '90s
00:48:41.560 | with my colleagues at Bell Labs,
00:48:43.080 | Jane Bromley and a couple other people,
00:48:45.360 | which we now call contrastive learning,
00:48:48.280 | which is to have negative examples, right?
00:48:50.040 | So you have pairs of images that you know are different
00:48:53.200 | and you show them to the network and those two copies,
00:48:57.520 | and then you push the two output vectors
00:48:59.560 | away from each other.
00:49:01.120 | And it will eventually guarantee
00:49:02.240 | that things that are semantically similar
00:49:04.920 | produce similar representations
00:49:06.520 | and things that are different
00:49:07.360 | produce different representations.
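A minimal sketch of this margin-based contrastive objective, assuming PyTorch; the margin value is an arbitrary assumption, and `encoder` in the usage comment is a hypothetical shared-weight network, not the original code.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, is_same, margin=1.0):
    """Pull embeddings of matching pairs together; push non-matching
    pairs at least `margin` apart (a sketch of the idea, not the original)."""
    d = F.pairwise_distance(z1, z2)                    # Euclidean distance per pair
    pos = is_same * d.pow(2)                           # attract positive pairs
    neg = (1.0 - is_same) * F.relu(margin - d).pow(2)  # repel negative pairs
    return (pos + neg).mean()

# Usage sketch: one encoder applied twice, i.e. two copies sharing weights.
# z1, z2 = encoder(x1), encoder(x2)
# loss = contrastive_loss(z1, z2, is_same)  # is_same: 1.0 for positives, 0.0 otherwise
```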
00:49:09.040 | We actually came up with this idea
00:49:11.480 | for a project of doing signature verification.
00:49:14.520 | So we would collect signatures
00:49:17.880 | from multiple signatures on the same person
00:49:20.200 | and then train a neural net
00:49:21.440 | to produce the same representation.
00:49:23.320 | And then force the system to produce different representation
00:49:28.320 | from different signatures.
00:49:29.880 | This was actually, the problem was proposed
00:49:33.000 | by people from what was a subsidiary of AT&T at the time
00:49:36.720 | called NCR.
00:49:38.280 | And they were interested in storing a representation
00:49:41.040 | of the signature on the 80 bytes
00:49:43.520 | of the magnetic strip of a credit card.
00:49:46.680 | So we came up with this idea of having a neural net
00:49:48.800 | with 80 outputs that we quantized to bytes
00:49:52.320 | so that we could encode the-
00:49:53.880 | - And that encoding was then used to compare
00:49:55.480 | whether the signature matches or not.
00:49:57.120 | - That's right.
00:49:57.960 | So then you would sign, it would run through the neural net
00:50:00.680 | and then you would compare the output vector
00:50:02.440 | to whatever is stored on your card.
00:50:03.280 | - Did it actually work?
00:50:04.680 | - It worked, but they ended up not using it.
00:50:06.880 | Because nobody cares actually.
00:50:10.160 | I mean, the American financial payment system
00:50:13.840 | is incredibly lax in that respect compared to Europe.
00:50:17.600 | - Oh, with the signatures?
00:50:19.000 | What's the purpose of signatures anyway?
00:50:20.560 | This is very-
00:50:21.400 | - Nobody looks at them, nobody cares.
00:50:23.320 | - It's, yeah.
00:50:24.480 | - Yeah, no.
00:50:25.320 | So that's contrastive learning, right?
00:50:27.840 | So you need positive and negative pairs.
00:50:29.480 | And the problem with that is that,
00:50:31.760 | even though I had the original paper on this,
00:50:34.760 | I'm actually not very positive about it
00:50:36.800 | because it doesn't work in high dimension.
00:50:38.680 | If your representation is high dimensional,
00:50:41.040 | there's just too many ways for two things to be different.
00:50:44.280 | And so you would need lots and lots and lots
00:50:46.320 | of negative pairs.
00:50:48.240 | So there is a particular implementation of this,
00:50:50.800 | which is relatively recent
00:50:51.920 | from actually the Google Toronto group,
00:50:54.840 | where Geoff Hinton is the senior member there.
00:51:01.360 | And it's called SimCLR, S-I-M-C-L-R.
00:51:03.720 | And it's basically a particular way
00:51:03.720 | of implementing this idea of contrastive learning,
00:51:06.760 | the particular objective function.
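A minimal sketch of a SimCLR-style (NT-Xent) objective, assuming PyTorch, a batch where z1[i] and z2[i] are embeddings of two augmented views of the same image, and an arbitrary temperature; this illustrates the kind of objective mentioned, not the paper's code.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over a batch: each embedding must identify the other
    view of its own image among all 2N candidates."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # an embedding can't match itself
    n = z1.shape[0]
    # The positive for row i is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```

Every other image in the batch serves as a negative, which is why this family of methods needs large batches when the representation is high-dimensional.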
00:51:08.640 | Now, what I'm much more enthusiastic about these days
00:51:13.200 | is non-contrastive methods.
00:51:14.640 | So other ways to guarantee that
00:51:16.600 | the representations would be different
00:51:20.680 | for different inputs.
00:51:23.240 | And it's actually based on an idea
00:51:26.160 | that Geoff Hinton proposed in the early '90s
00:51:29.520 | with his student at the time, Sue Becker.
00:51:31.960 | And it's based on the idea
00:51:32.800 | of maximizing the mutual information
00:51:34.280 | between the outputs of the two systems.
00:51:36.160 | You only show positive pairs,
00:51:37.440 | you only show pairs of images
00:51:38.800 | that you know are somewhat similar.
00:51:41.640 | And you train the two networks to be informative,
00:51:44.160 | but also to be as informative of each other as possible.
00:51:49.720 | So basically one representation has to be predictable
00:51:52.240 | from the other, essentially.
00:51:53.880 | And he proposed that idea,
00:51:57.160 | had a couple of papers in the early '90s,
00:52:00.200 | and then nothing was done about it for decades.
00:52:03.080 | And I kind of revived this idea
00:52:04.800 | together with my postdocs at FAIR,
01:02:07.000 | particularly a postdoc called Stéphane Deny,
00:52:09.640 | who's now a junior professor in Finland
00:52:12.520 | at the University of Aalto.
00:52:13.880 | We came up with something that we called Barlow Twins,
00:52:19.280 | and it's a particular way
00:52:20.640 | of maximizing the information content of a vector
00:52:25.480 | using some hypotheses.
00:52:28.080 | And we have kind of another version of it
00:52:32.040 | that's more recent now called VICREG, V-I-C-R-E-G,
00:52:34.520 | that means variance, invariance, covariance, regularization.
00:52:37.880 | And it's the thing I'm the most excited about
00:52:39.960 | in machine learning in the last 15 years.
00:52:41.720 | I mean, I'm really, really excited about this.
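A minimal sketch of a VICReg-style variance-invariance-covariance objective, assuming PyTorch; the loss weights and the unit-variance target here are assumptions for illustration, not necessarily the published settings.

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Non-contrastive objective: only positive pairs are shown, and collapse is
    prevented by regularizing the statistics of the embeddings themselves."""
    n, d = z1.shape

    # Invariance: two views of the same image should map to similar vectors.
    sim_loss = F.mse_loss(z1, z2)

    # Variance: keep every embedding dimension "alive" to prevent collapse.
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # Covariance: decorrelate dimensions so they carry non-redundant information.
    z1c, z2c = z1 - z1.mean(dim=0), z2 - z2.mean(dim=0)
    cov1 = (z1c.t() @ z1c) / (n - 1)
    cov2 = (z2c.t() @ z2c) / (n - 1)
    def off_diag(m):
        return m - torch.diag(torch.diag(m))
    cov_loss = off_diag(cov1).pow(2).sum() / d + off_diag(cov2).pow(2).sum() / d

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss
```

No negative pairs appear anywhere; the variance and covariance terms alone keep the representation informative, which is the sense in which the method is non-contrastive.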
00:52:44.360 | - What kind of data augmentation is useful for that,
00:52:47.920 | not contrast to the learning method?
00:52:50.240 | Are we talking about, does that not matter that much?
00:52:52.640 | Or it seems like a very important part of the step.
00:52:55.920 | - Yeah.
00:52:56.760 | - How you generate the images that are similar,
00:52:58.160 | but sufficiently different.
00:52:59.600 | - Yeah, that's right.
00:53:00.440 | It's an important step, and it's also an annoying step,
00:53:02.400 | because you need to have that knowledge
00:53:03.760 | of what data augmentation you can do
00:53:06.800 | that do not change the nature of the object.
00:53:10.400 | And so the standard scenario,
00:53:13.160 | which a lot of people working in this area are using,
00:53:15.400 | is you use the type of distortion.
00:53:19.640 | So basically you do a geometric distortion.
00:53:22.040 | So one basically just shifts the image a little bit,
00:53:24.240 | it's called cropping.
00:53:25.240 | Another one kind of changes the scale a little bit.
00:53:27.760 | Another one kind of rotates it.
00:53:29.160 | Another one changes the colors.
00:53:30.720 | You can do a shift in color balance
00:53:32.960 | or something like that.
00:53:34.120 | Saturation.
00:53:35.840 | Another one sort of blurs it.
00:53:37.160 | Another one adds noise.
00:53:38.160 | So you have like a catalog of kind of standard things,
00:53:41.160 | and people try to use the same ones for different algorithms
00:53:44.040 | so that they can compare.
00:53:45.960 | But some algorithms, some self-supervised algorithm
00:53:48.240 | actually can deal with much bigger,
00:53:50.680 | like more aggressive data augmentation, and some don't.
00:53:53.560 | So that kind of makes the whole thing difficult.
00:53:56.400 | But that's the kind of distortions we're talking about.
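A minimal sketch of such a catalog of distortions, used to generate the two views fed to a joint-embedding method; torchvision is an assumed toolkit and the exact parameters are illustrative.

```python
from torchvision import transforms

# The standard distortions mentioned: crop/shift, rescale, color shifts,
# grayscale, blur. Parameters are illustrative, not a canonical recipe.
view_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),   # crop + rescale
    transforms.RandomApply([transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

def two_views(image):
    """Two independently distorted views of the same image (a positive pair)."""
    return view_transform(image), view_transform(image)
```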
00:53:58.800 | And so you train with those distortions,
00:54:03.560 | and then you chop off the last layer,
00:54:07.320 | a couple layers of the network,
00:54:11.120 | and you use the representation as input to a classifier.
00:54:13.560 | You train the classifier on ImageNet, let's say,
00:54:17.640 | or whatever, and measure the performance.
00:54:20.520 | And interestingly enough,
00:54:23.120 | the methods that are really good
00:54:24.400 | at eliminating the information that is irrelevant,
00:54:26.840 | which is the distortions between those images,
00:54:29.200 | do a good job at eliminating it.
00:54:32.400 | And as a consequence,
00:54:34.080 | you cannot use the representations in those systems
00:54:37.200 | for things like object detection and localization,
00:54:39.880 | because that information is gone.
00:54:41.520 | So the type of data augmentation you need to do
00:54:44.720 | depends on the task you want eventually the system to solve.
00:54:48.640 | And the type of data augmentation,
00:54:50.680 | standard data augmentation that we use today,
00:54:52.560 | are only appropriate for object recognition
00:54:54.680 | or image classification.
00:54:56.000 | They're not appropriate for things like--
00:54:57.760 | - Can you help me out understand why localization is--
00:55:00.800 | So you're saying it's just not good at the negative,
00:55:03.760 | like at classifying the negative,
00:55:05.440 | so that's why it can't be used for the localization?
00:55:07.920 | - No, it's just that you train the system,
00:55:10.360 | you give it an image,
00:55:12.360 | and then you give it the same image shifted and scaled,
00:55:14.960 | and you tell it that's the same image.
00:55:17.400 | So the system basically is trained
00:55:19.160 | to eliminate the information about position and size.
00:55:22.040 | So now you want to use that--
00:55:24.760 | - Oh, like literally--
00:55:26.200 | - Where an object is and what size it is.
00:55:27.760 | - Like a bounding box, like to be able to actually, okay.
00:55:30.800 | It can still find the object in the image,
00:55:34.160 | it's just not very good at finding
00:55:35.960 | the exact boundaries of that object, interesting.
00:55:39.000 | Interesting, which, you know,
00:55:41.120 | that's an interesting sort of philosophical question.
00:55:43.480 | How important is object localization anyway?
00:55:47.040 | We're like obsessed by measuring like image segmentation,
00:55:51.240 | obsessed by measuring perfectly
00:55:53.080 | knowing the boundaries of objects,
00:55:54.700 | when arguably that's not that essential
00:55:59.700 | to understanding what are the contents of the scene.
00:56:03.800 | - On the other hand, I think evolutionarily,
00:56:05.880 | the first vision systems in animals
00:56:08.200 | were basically all about localization,
00:56:10.040 | very little about recognition.
00:56:12.480 | And in the human brain, you have two separate pathways
00:56:15.320 | for recognizing the nature of a scene or an object,
00:56:20.320 | and localizing objects.
00:56:22.320 | So you use the first pathway called the ventral pathway
00:56:25.200 | for telling what you're looking at.
00:56:28.160 | The other pathway, the dorsal pathway,
00:56:30.580 | is used for navigation, for grasping, for everything else.
00:56:34.140 | And basically a lot of the things you need for survival
00:56:36.900 | are localization and detection.
00:56:39.740 | - Is similarity learning or contrastive learning,
00:56:45.060 | are these non-contrastive methods
00:56:46.540 | the same as understanding something?
00:56:48.860 | Just because you know a distorted cat
00:56:50.680 | is the same as a non-distorted cat,
00:56:52.620 | does that mean you understand what it means to be a cat?
00:56:56.740 | - To some extent.
00:56:57.580 | I mean, it's a superficial understanding, obviously.
00:57:00.100 | - But what is the ceiling of this method, do you think?
00:57:02.380 | Is this just one trick on the path
00:57:05.140 | to doing self-supervised learning,
00:57:07.300 | or can we go really, really far?
00:57:10.020 | - I think we can go really far.
00:57:11.260 | So if we figure out how to use techniques of that type,
00:57:16.260 | perhaps very different, but of the same nature,
00:57:19.460 | to train a system from video,
00:57:22.460 | to do video prediction, essentially,
00:57:24.260 | I think we'll have a path towards,
00:57:29.080 | I wouldn't say unlimited,
00:57:31.340 | but a path towards some level of physical common sense
00:57:36.340 | in machines.
00:57:38.100 | And I also think that that ability to learn
00:57:43.100 | how the world works from a high-throughput channel,
00:57:47.720 | like vision, is a necessary step
00:57:51.900 | towards real artificial intelligence.
00:57:55.540 | In other words, I believe in grounded intelligence.
00:57:58.100 | I don't think we can train a machine
00:57:59.920 | to be intelligent purely from text,
00:58:02.180 | because I think the amount of information
00:58:04.420 | about the world that's contained in text
00:58:06.220 | is tiny compared to what we need to know.
00:58:09.960 | So for example, and people have attempted to do this
00:58:15.300 | for 30 years, right, the Cyc project,
00:58:17.460 | and things like that, right,
00:58:18.420 | of basically kind of writing down all the facts
00:58:20.620 | that are known and hoping that some sort of common sense
00:58:24.100 | will emerge, I think it's basically hopeless.
00:58:27.180 | But let me take an example.
00:58:28.300 | You take an object, I describe a situation to you.
00:58:31.300 | I take an object, I put it on the table,
00:58:33.540 | and I push the table.
00:58:34.940 | It's completely obvious to you that the object
00:58:37.220 | will be pushed with the table, right,
00:58:39.220 | because it's sitting on it.
00:58:40.580 | There's no text in the world, I believe,
00:58:43.420 | that explains this.
00:58:45.060 | And so if you train a machine as powerful as it could be,
00:58:48.380 | you know, your GPT 5000, or whatever it is,
00:58:53.380 | it's never gonna learn about this.
00:58:55.640 | That information is just not present in any text.
00:59:01.020 | - Well, the question, like with the Cyc project,
00:59:03.260 | the dream, I think, is to have like 10 million,
00:59:08.020 | say, facts like that, that give you a head start,
00:59:13.260 | like a parent guiding you.
00:59:15.460 | Now, we humans don't need a parent to tell us
00:59:17.580 | that the table will move, sorry,
00:59:19.500 | the smartphone will move with the table.
00:59:21.700 | But we get a lot of guidance in other ways,
00:59:25.900 | so it's possible that we can give it a quick shortcut.
00:59:28.420 | - What about a cat?
00:59:29.460 | A cat knows that.
00:59:31.060 | - No, but they evolved, so--
00:59:33.380 | - No, they learn like us.
00:59:34.660 | - Sorry, the physics of stuff?
00:59:37.340 | - Yeah.
00:59:38.740 | - Well, yeah, so you're saying it's,
00:59:42.500 | so you're putting a lot of intelligence
00:59:45.100 | onto the nurture side, not the nature.
00:59:47.140 | - Yes.
00:59:47.980 | - 'Cause we seem to have, you know,
00:59:50.020 | there's a very inefficient, arguably,
00:59:52.540 | process of evolution that got us from bacteria
00:59:55.540 | to who we are today.
00:59:56.960 | Started at the bottom, now we're here.
00:59:59.780 | So the question is how, okay,
01:00:04.260 | the question is how fundamental is that,
01:00:05.980 | the nature of the whole hardware?
01:00:08.380 | And then, is there any way to shortcut it
01:00:11.660 | if it's fundamental?
01:00:12.500 | If it's not, if it's most of the intelligence,
01:00:14.280 | most of the cool stuff we've been talking about
01:00:15.900 | is mostly nurture, mostly trained,
01:00:18.780 | we figure it out by observing the world,
01:00:20.660 | we can form that big, beautiful, sexy background model
01:00:24.780 | that you're talking about just by sitting there.
01:00:27.240 | Then, okay, then you need to, then like maybe,
01:00:32.600 | it is all supervised learning all the way down.
01:00:37.820 | It's all supervised learning, so.
01:00:38.980 | Whatever it is that makes human intelligence
01:00:42.180 | different from other animals,
01:00:44.100 | which a lot of people think is language
01:00:46.340 | and logical reasoning and this kind of stuff,
01:00:48.740 | it cannot be that complicated
01:00:49.900 | because it only popped up in the last million years.
01:00:52.860 | - Yeah.
01:00:53.700 | - And it only involves less than 1% of a genome,
01:00:59.660 | which is the difference between human genome
01:01:01.220 | and chimps or whatever.
01:01:03.420 | So it can't be that complicated.
01:01:06.900 | It can't be that fundamental.
01:01:08.020 | I mean, most of the complicated stuff
01:01:10.860 | already exists in cats and dogs
01:01:12.460 | and certainly primates, non-human primates.
01:01:15.800 | - Yeah, that little thing with humans
01:01:18.620 | might be just something about social interaction
01:01:22.420 | and ability to maintain ideas
01:01:23.940 | across a collective of people.
01:01:28.100 | It sounds very dramatic and very impressive,
01:01:30.800 | but it probably isn't, mechanistically speaking.
01:01:33.340 | - It is, but we're not there yet.
01:01:34.660 | Like, we have, I mean, this is numbers.
01:01:37.300 | - Number 634 in the list of problems we have to solve.
01:01:42.300 | (laughs)
01:01:43.380 | - So basic physics of the world is number one.
01:01:46.860 | What do you, just a quick tangent on data augmentation.
01:01:51.580 | So a lot of it is hard-coded versus learned.
01:01:56.580 | Do you have any intuition that maybe
01:02:00.940 | there could be some weird data augmentation,
01:02:03.580 | like generative type of data augmentation,
01:02:06.180 | like doing something weird to images,
01:02:07.660 | which then improves the similarity learning process?
01:02:12.660 | So not just kind of dumb, simple distortions,
01:02:16.260 | but by you shaking your head,
01:02:18.100 | just saying that even simple distortions are enough.
01:02:20.900 | - I think, no, I think data augmentation
01:02:22.780 | is a temporary, necessary evil.
01:02:25.080 | So what people are working on now is two things.
01:02:28.880 | One is the type of self-supervised learning,
01:02:32.220 | like trying to translate the type of self-supervised learning
01:02:35.460 | people use in language, translating these two images,
01:02:38.660 | which is basically a denoising autoencoder method, right?
01:02:41.820 | So you take an image, you block, you mask some parts of it,
01:02:46.820 | and then you train some giant neural net
01:02:49.500 | to reconstruct the parts that are missing.
01:02:52.660 | And until very recently,
01:02:56.220 | there was no working methods for that.
01:02:59.140 | All the autoencoder type methods for images
01:03:01.620 | weren't producing very good representation,
01:03:03.740 | but there's a paper now coming out of the FAIR group
01:03:06.620 | at Menlo Park that actually works very well.
01:03:08.980 | So that doesn't require data augmentation,
01:03:12.140 | that requires only masking.
01:03:14.460 | Okay.
01:03:15.300 | - Only masking for images.
01:03:17.180 | Okay.
01:03:19.060 | - Right, so you mask part of the image
01:03:20.300 | and you train a system, which, you know,
01:03:23.060 | in this case is a transformer because you can,
01:03:26.620 | the transformer represents the image
01:03:28.380 | as non-overlapping patches,
01:03:30.860 | so it's easy to mask patches and things like that.
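A minimal sketch of the patch-masking step described here, assuming PyTorch; the 16-pixel patches and 75% masking ratio are illustrative assumptions, and the encoder and decoder that would consume these patches are omitted.

```python
import torch

def random_patch_mask(images, patch_size=16, mask_ratio=0.75):
    """Split each image into non-overlapping patches and keep only a random
    subset; the rest would be reconstructed by a decoder during training."""
    b, c, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    # (B, C, ph, pw, ps, ps) -> (B, num_patches, patch_dim)
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, ph * pw, -1)

    num_keep = int(ph * pw * (1 - mask_ratio))
    keep_idx = torch.rand(b, ph * pw).argsort(dim=1)[:, :num_keep]  # random subset per image
    visible = torch.gather(
        patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
    return visible, keep_idx, patches  # reconstruction targets are the hidden patches
```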
01:03:33.260 | - Okay, then my question transfers to that problem,
01:03:35.620 | the masking, like why should the mask be a square
01:03:38.740 | or a rectangle?
01:03:40.060 | - So it doesn't matter, like, you know,
01:03:41.580 | I think we're gonna come up probably in the future
01:03:44.300 | with sort of, you know, ways to mask that are, you know,
01:03:48.540 | kind of random, essentially.
01:03:51.140 | I mean, they are random already, but-
01:03:52.860 | - No, no, but like something that's challenging,
01:03:55.820 | like optimally challenging.
01:03:59.380 | So like, I mean, maybe it's a metaphor that doesn't apply,
01:04:02.460 | but you're, it seems like there's a data augmentation
01:04:06.420 | or masking, there's an interactive element with it.
01:04:09.860 | Like, you're almost like playing with an image.
01:04:11.980 | - Yeah.
01:04:12.820 | - And like, it's like the way we play
01:04:14.180 | with an image in our minds.
01:04:15.660 | - No, but it's like dropout.
01:04:16.700 | It's like Boltzmann machine training.
01:04:18.300 | You know, every time you see a percept,
01:04:23.180 | you also, you can perturb it in some way.
01:04:26.820 | And then the principle of the training procedure
01:04:31.500 | is to minimize the difference of the output
01:04:33.580 | or the representation between the clean version
01:04:36.900 | and the corrupted version, essentially, right?
01:04:40.260 | And you can do this in real time, right?
01:04:42.020 | So, you know, Boltzmann machines work like this, right?
01:04:44.220 | You show a percept, you tell the machine
01:04:47.420 | that's a good combination of activities
01:04:49.820 | or your input neurons.
01:04:50.900 | And then you either let them go their merry way
01:04:57.020 | without clamping them to values,
01:05:00.060 | or you only do this with a subset.
01:05:02.380 | And what you're doing is you're training the system
01:05:04.620 | so that the stable state of the entire network
01:05:07.980 | is the same regardless of whether it sees the entire input
01:05:10.660 | or whether it sees only part of it.
01:05:12.460 | You know, denoising autoencoder method
01:05:15.380 | is basically the same thing, right?
01:05:16.940 | You're training a system to reproduce the input,
01:05:19.540 | the complete input and filling the blanks,
01:05:21.820 | regardless of which parts are missing.
01:05:24.060 | And that's really the underlying principle.
01:05:26.220 | And you could imagine sort of even in the brain,
01:05:28.260 | some sort of neural principle where, you know,
01:05:30.700 | neurons kind of oscillate, right?
01:05:32.780 | So they take their activity and then temporarily
01:05:35.460 | they kind of shut off to, you know,
01:05:37.980 | force the rest of the system to basically reconstruct
01:05:42.100 | the input without their help, you know?
01:05:44.780 | And I mean, you could imagine, you know,
01:05:49.020 | more or less biologically possible processes.
01:05:51.060 | - Something like that.
01:05:51.900 | And I guess with this denoising autoencoder
01:05:54.940 | and masking and data augmentation,
01:05:58.700 | you don't have to worry about being super efficient.
01:06:01.140 | You can just do as much as you want
01:06:03.980 | and get better over time.
01:06:06.180 | 'Cause I was thinking like you might wanna be clever
01:06:08.780 | about the way you do all of these procedures, you know,
01:06:12.020 | but that's only, it's somehow costly to do every iteration,
01:06:16.740 | but it's not really.
01:06:17.940 | - Not really, maybe.
01:06:20.300 | And then there is, you know,
01:06:21.500 | data augmentation without explicit data augmentation.
01:06:24.180 | Is data augmentation by weighting,
01:06:25.580 | which is, you know, the sort of video prediction.
01:06:28.100 | You're observing a video clip,
01:06:31.500 | observing the, you know, the continuation of that video clip
01:06:35.940 | and you try to learn a representation
01:06:38.060 | using the joint embedding architectures
01:06:40.260 | in such a way that the representation of the future clip
01:06:43.300 | is easily predictable from the representation
01:06:45.660 | of the observed clip.
01:06:47.380 | - Do you think YouTube has enough raw data
01:06:52.740 | from which to learn how to be a cat?
01:06:56.420 | - I think so.
01:06:57.780 | - So the amount of data is not the constraint.
01:07:01.220 | - No, it would require some selection, I think.
01:07:04.140 | - Some selection?
01:07:05.420 | - Some selection of, you know,
01:07:07.060 | maybe the right type of data.
01:07:08.460 | - Don't go down the rabbit hole of just cat videos.
01:07:11.100 | I might, you might need to watch some lectures or something.
01:07:14.580 | - No.
01:07:15.700 | - How meta would that be?
01:07:17.500 | If it like watches lectures about intelligence
01:07:21.380 | and then learns, watches your lectures on NYU
01:07:24.300 | and learns from that how to be intelligent.
01:07:26.220 | - I don't think there'd be enough.
01:07:27.860 | - What's your, do you find multimodal learning interesting?
01:07:33.220 | We've been talking about visual language,
01:07:35.060 | like combining those together,
01:07:36.460 | maybe audio, all those kinds of things.
01:07:38.140 | - There's a lot of things that I find interesting
01:07:40.380 | in the short term, but are not addressing
01:07:43.260 | the important problem that I think are really
01:07:45.220 | kind of the big challenges.
01:07:46.660 | So I think, you know, things like multitask learning,
01:07:48.940 | continual learning, you know, adversarial issues.
01:07:53.940 | I mean, those have, you know, great practical interests
01:07:57.020 | in the relatively short term possibly,
01:08:00.300 | but I don't think they're fundamental, you know,
01:08:01.460 | active learning, even to some extent reinforcement learning.
01:08:04.380 | I think those things will become either obsolete
01:08:07.940 | or useless or easy once we figured out
01:08:12.940 | how to do self-supervised representation learning
01:08:15.900 | or learning predictive world models.
01:08:19.300 | And so I think that's what, you know,
01:08:21.540 | the entire community should be focusing on.
01:08:24.420 | At least people who are interested
01:08:25.460 | in sort of fundamental questions or, you know,
01:08:27.220 | really kind of pushing the envelope of AI
01:08:29.540 | towards the next stage.
01:08:31.460 | But of course there's like a huge amount of, you know,
01:08:33.340 | very interesting work to do in sort of practical questions
01:08:35.860 | that have, you know, short term impact.
01:08:38.020 | - Well, you know, it's difficult to talk about
01:08:41.300 | the temporal scale because all of human civilization
01:08:44.260 | will eventually be destroyed because the sun will die out.
01:08:48.580 | And even if Elon Musk is successful
01:08:50.300 | in multi-planetary colonization across the galaxy,
01:08:54.620 | eventually the entirety of it
01:08:56.620 | will just become giant black holes.
01:08:58.980 | And that's gonna keep the universe.
01:09:00.820 | - That's gonna take a while, though.
01:09:02.140 | - So, but what I'm saying is then that logic
01:09:04.860 | can be used to say it's all meaningless.
01:09:07.420 | I'm saying all that to say that multitask learning
01:09:11.900 | might be, you're calling it practical or pragmatic
01:09:16.220 | or whatever, that might be the thing
01:09:18.340 | that achieves something very akin to intelligence
01:09:21.140 | while we're trying to solve the more general problem
01:09:26.940 | of self-supervised learning of background knowledge.
01:09:29.460 | So the reason I bring that up,
01:09:30.660 | maybe one way to ask that question.
01:09:33.080 | I've been very impressed by what
01:09:34.740 | Tesla Autopilot team is doing.
01:09:36.460 | I don't know if you've gotten a chance to glance
01:09:38.340 | at this particular one example of multitask learning
01:09:42.140 | where they're literally taking the problem,
01:09:45.000 | like, I don't know, Charles Darwin studying animals.
01:09:48.940 | They're studying the problem of driving and asking,
01:09:52.100 | okay, what are all the things you have to perceive?
01:09:55.020 | And the way they're solving it is, one,
01:09:57.860 | there's an ontology where you're bringing that to the table.
01:10:00.420 | So you're formulating a bunch of different tasks.
01:10:02.300 | It's like over 100 tasks or something like that
01:10:04.260 | that they're involved in driving.
01:10:06.060 | And then they're deploying it
01:10:07.740 | and then getting data back from people that run into trouble
01:10:10.580 | and they're trying to figure out, do we add tasks?
01:10:12.700 | Do we, like, we focus on each individual task separately.
01:10:15.900 | - Sure. - In fact, half,
01:10:17.140 | so I would say, I'll classify Andrej Karpathy's talk
01:10:20.020 | in two ways.
01:10:20.860 | So one was about doors and the other one
01:10:23.140 | about how much ImageNet sucks.
01:10:24.740 | He kept going back and forth on those two topics,
01:10:28.600 | which ImageNet sucks,
01:10:30.060 | meaning you can't just use a single benchmark.
01:10:33.060 | There's so, like, you have to have, like,
01:10:36.060 | a giant suite of benchmarks to understand
01:10:38.460 | how well your system actually works.
01:10:40.020 | - Oh, I agree with him.
01:10:40.860 | I mean, he's a very sensible guy.
01:10:42.980 | Now, okay, it's very clear that if you're faced
01:10:47.620 | with an engineering problem that you need to solve
01:10:50.500 | in a relatively short time,
01:10:51.940 | particularly if you have Elon Musk breathing down your neck,
01:10:55.900 | you're going to have to take shortcuts, right?
01:10:57.380 | You might think about the fact that the right thing to do
01:11:02.380 | and the long-term solution involves, you know,
01:11:04.540 | some fancy self-supervised learning,
01:11:06.580 | but you have Elon Musk breathing down your neck,
01:11:10.260 | and this involves human lives,
01:11:13.620 | and so you have to basically just do
01:11:17.380 | the systematic engineering and fine-tuning and refinements
01:11:22.380 | and trial and error and all that stuff.
01:11:26.380 | There's nothing wrong with that.
01:11:27.460 | That's called engineering.
01:11:28.620 | That's called putting technology out
01:11:34.420 | in the world, and you have to kind of ironclad it
01:11:38.620 | before you do this, you know,
01:11:40.460 | so much for, you know, grand ideas and principles.
01:11:46.260 | But, you know, I'm placing myself sort of, you know,
01:11:50.740 | some, you know, upstream of this,
01:11:54.500 | quite a bit upstream of this.
01:11:55.780 | - You're Plato, thinking about platonic forms.
01:11:58.260 | You're- - It's not platonic,
01:11:59.900 | because eventually I want that stuff to get used,
01:12:03.100 | but it's okay if it takes five or 10 years
01:12:06.900 | for the community to realize this is the right thing to do.
01:12:09.300 | I've done this before.
01:12:11.260 | It's been the case before that, you know,
01:12:13.220 | I've made that case.
01:12:14.420 | I mean, if you look back in the mid-2000s, for example,
01:12:17.740 | and you ask yourself the question,
01:12:18.980 | okay, I want to recognize cars or faces or whatever,
01:12:22.060 | you know, I can use convolutional nets,
01:12:25.580 | so I can use sort of more conventional
01:12:28.380 | kind of computer vision techniques, you know,
01:12:29.900 | using interest point detectors or SIFT,
01:12:32.580 | dense SIFT features and, you know,
01:12:34.300 | sticking an SVM on top.
01:12:35.740 | At that time, the datasets were so small
01:12:37.820 | that those methods that use more hand engineering
01:12:41.940 | worked better than conv nets.
01:12:43.580 | There was just not enough data for conv nets,
01:12:45.540 | and conv nets were a little slow
01:12:47.860 | with the kind of hardware that was available at the time.
01:12:50.820 | And there was a sea change when, basically when, you know,
01:12:55.580 | datasets became bigger and GPUs became available.
01:12:58.580 | That's what, you know, two of the main factors
01:13:02.900 | that basically made people change their mind.
01:13:05.900 | And you can look at the history of,
01:13:11.820 | like all sub branches of AI or pattern recognition,
01:13:15.500 | and there's a similar trajectory followed by techniques
01:13:19.740 | where people start by, you know,
01:13:22.220 | engineering the hell out of it.
01:13:25.180 | You know, be it optical character recognition,
01:13:29.180 | speech recognition, computer vision,
01:13:31.740 | like image recognition in general,
01:13:34.260 | natural language understanding, like, you know,
01:13:35.980 | translation, things like that, right?
01:13:37.980 | You start to engineer the hell out of it.
01:13:39.980 | You start to acquire all the knowledge,
01:13:42.700 | the prior knowledge you know about image formation,
01:13:44.780 | about, you know, the shape of characters,
01:13:46.620 | about, you know, morphological operations,
01:13:49.580 | about like feature extraction, Fourier transforms,
01:13:52.420 | you know, Zernike moments, you know, whatever, right?
01:13:54.500 | People have come up with thousands of ways
01:13:56.300 | of representing images so that they could be
01:13:58.620 | easily classified afterwards.
01:14:01.620 | Same for speech recognition, right?
01:14:03.020 | There is, you know, it took decades for people
01:14:05.020 | to figure out a good front end to pre-process
01:14:07.940 | a speech signal so that, you know,
01:14:10.540 | the information about what is being said is preserved,
01:14:13.420 | but most of the information about the identity
01:14:15.940 | of the speaker is gone.
01:14:17.060 | You know, cepstral coefficients or whatever, right?
01:14:21.940 | And same for text, right?
01:14:24.540 | You do named entity recognition and you parse
01:14:27.460 | and you do tagging of the parts of speech.
01:14:32.460 | And, you know, you do this sort of tree representation
01:14:35.580 | of clauses and all that stuff, right,
01:14:37.500 | before you can do anything.
01:14:39.180 | So that's how it starts, right?
01:14:44.620 | Just engineer the hell out of it.
01:14:46.300 | And then you start having data
01:14:49.020 | and maybe you have more powerful computers,
01:14:51.260 | maybe you know something about statistical learning.
01:14:53.460 | So you start using machine learning
01:14:54.660 | and it's usually a small sliver on top of your
01:14:56.740 | kind of handcrafted system where, you know,
01:14:58.300 | you extract features by hand.
01:15:00.580 | Okay, and now, you know, nowadays,
01:15:02.580 | the standard way of doing this is that
01:15:04.100 | you train the entire thing end to end
01:15:05.380 | with a deep learning system and it learns its own features
01:15:07.740 | and, you know, speech recognition systems nowadays,
01:15:11.940 | OCR systems, are completely end to end.
01:15:13.900 | It's, you know, it's some giant neural net
01:15:16.380 | that takes raw waveforms and produces a sequence
01:15:19.940 | of characters coming out.
01:15:21.380 | And it's just a huge neural net, right?
01:15:23.020 | There's no hidden Markov model,
01:15:24.940 | there's no language model that is explicit
01:15:27.380 | other than, you know, something that's ingrained
01:15:29.540 | in the sort of neural language model, if you want.
01:15:31.900 | Same for translation, same for all kinds of stuff.
01:15:34.340 | So you see this continuous evolution
01:15:37.380 | from, you know, less and less handcrafting
01:15:41.340 | and more and more learning.
01:15:42.700 | And I think, I mean, it's true in biology as well.
01:15:49.260 | - So, I mean, we might disagree about this.
01:15:52.860 | Maybe not.
01:15:54.020 | In this one little piece at the end,
01:15:56.860 | you mentioned active learning.
01:15:58.340 | It feels like active learning,
01:16:01.460 | which is the selection of data and also the interactivity,
01:16:04.700 | needs to be part of this giant neural network.
01:16:06.780 | You cannot just be an observer
01:16:08.340 | to do self-supervised learning.
01:16:09.700 | You have to, well, I don't,
01:16:12.180 | self-supervised learning is just a word,
01:16:14.540 | but I would, whatever this giant stack
01:16:16.740 | of a neural network that's automatically learning,
01:16:19.620 | it feels, my intuition is that you have to have a system,
01:16:24.620 | whether it's a physical robot or a digital robot
01:16:30.180 | that's interacting with the world
01:16:32.300 | and doing so in a flawed way and improving over time
01:16:35.900 | in order to form the self-supervised learning well.
01:16:41.820 | You can't just give it a giant sea of data.
01:16:44.940 | - Okay, I agree and I disagree.
01:16:47.060 | I agree in the sense that I think, I agree in two ways.
01:16:52.060 | The first way I agree is that if you want,
01:16:55.100 | and you certainly need a causal model of the world
01:16:57.420 | that allows you to predict the consequences of your actions,
01:17:00.460 | to train that model, you need to take actions.
01:17:02.740 | You need to be able to act in a world and see the effect
01:17:06.100 | for you to learn causal models of the world.
01:17:08.420 | - So, that's not obvious because you can observe others.
01:17:11.500 | - You can observe others.
01:17:12.340 | - And you can infer that they're similar to you
01:17:14.660 | and then you can learn from that.
01:17:15.900 | - Yeah, but then you have to kind of hardwire that part,
01:17:18.340 | right, and then you need mirror neurons
01:17:19.820 | and all that stuff, right?
01:17:20.660 | So, and it's not clear to me
01:17:23.220 | how you would do this in a machine.
01:17:24.380 | So, I think the action part would be necessary
01:17:29.380 | for having causal models of the world.
01:17:32.580 | The second reason it may be necessary
01:17:36.660 | or at least more efficient is that active learning
01:17:40.580 | basically goes for the jugular of what you don't know, right?
01:17:44.900 | There's obvious areas of uncertainty about your world
01:17:49.900 | and about how the world behaves.
01:17:52.980 | And you can resolve this uncertainty
01:17:56.220 | by systematic exploration of that part that you don't know.
01:18:00.300 | And if you know that you don't know,
01:18:01.700 | then it makes you curious.
01:18:03.020 | You kind of look into situations that...
01:18:05.580 | And across the animal world,
01:18:09.260 | different species are different levels of curiosity, right?
01:18:13.740 | Depending on how they're built, right?
01:18:15.100 | So, cats and rats are incredibly curious,
01:18:18.740 | dogs not so much, I mean, less.
01:18:20.620 | - Yeah, so it could be useful
01:18:22.100 | to have that kind of curiosity.
01:18:23.900 | - So, it'd be useful,
01:18:24.740 | but curiosity just makes the process faster.
01:18:26.980 | It doesn't make the process exist.
01:18:28.780 | So, what process, what learning process is it
01:18:34.580 | that active learning makes more efficient?
01:18:38.660 | And I'm asking that first question.
01:18:43.380 | We haven't answered that question yet.
01:18:44.780 | So, I worry about active learning once this question is...
01:18:48.100 | - So, it's the more fundamental question to ask.
01:18:50.820 | And if active learning or interaction
01:18:54.580 | increases the efficiency of the learning,
01:18:57.100 | see, sometimes it becomes very different
01:19:00.260 | if the increase is several orders of magnitude, right?
01:19:04.820 | - That's true.
01:19:05.660 | - But fundamentally, it's still the same thing
01:19:08.100 | in building up the intuition about how to,
01:19:11.180 | in a self-supervised way,
01:19:12.420 | to construct background models,
01:19:13.820 | efficient or inefficient, is the core problem.
01:19:18.640 | What do you think about Yoshua Bengio's
01:19:20.820 | talking about consciousness
01:19:22.900 | and all of these kinds of concepts?
01:19:24.540 | - Okay, I don't know what consciousness is, but...
01:19:29.540 | - It's a good opener.
01:19:31.980 | - And to some extent, a lot of the things
01:19:33.580 | that are said about consciousness remind me
01:19:35.980 | of the questions people were asking themselves
01:19:38.740 | in the 18th century or 17th century
01:19:41.340 | when they discovered that, you know,
01:19:44.100 | how the eye works and the fact that the image
01:19:46.300 | at the back of the eye was upside down, right?
01:19:49.900 | Because you have a lens.
01:19:51.180 | And so, on your retina, the image that forms
01:19:53.420 | is an image of the world, but it's upside down.
01:19:55.500 | How is it that you see right side up?
01:19:58.200 | And, you know, with what we know today in science,
01:20:00.460 | you know, we realize this question doesn't make any sense
01:20:03.860 | or is kind of ridiculous in some way, right?
01:20:06.340 | So, I think a lot of what is said about consciousness
01:20:08.180 | is of that nature.
01:20:09.020 | Now, that said, there's a lot of really smart people
01:20:10.980 | that, for whom I have a lot of respect,
01:20:13.820 | who are talking about this topic,
01:20:15.060 | people like David Chalmers, who is a colleague of mine
01:20:17.380 | at NYU.
01:20:18.220 | I have kind of an unorthodox, folk,
01:20:23.760 | speculative hypothesis about consciousness.
01:20:29.220 | So, we're talking about the study of a world model.
01:20:32.020 | And I think, you know, our entire prefrontal cortex
01:20:35.540 | basically is the engine for our world model.
01:20:40.820 | But when we are attending at a particular situation,
01:20:44.580 | we're focused on that situation.
01:20:46.060 | We basically cannot attend to anything else.
01:20:48.580 | And that seems to suggest that we basically have only one
01:20:53.580 | world model engine in our prefrontal cortex.
01:20:58.400 | That engine is configurable to the situation at hand.
01:21:02.580 | So, we are building a box out of wood,
01:21:04.620 | or we are, you know, driving down the highway,
01:21:08.340 | playing chess.
01:21:09.340 | We basically have a single model of the world
01:21:12.860 | that we're configuring to the situation at hand,
01:21:15.380 | which is why we can only attend to one task at a time.
01:21:18.080 | Now, if there is a task that we do repeatedly,
01:21:21.700 | it goes from the sort of deliberate reasoning
01:21:25.980 | using model of the world and prediction,
01:21:27.460 | and perhaps something like model predictive control,
01:21:29.340 | which I was talking about earlier,
01:21:31.420 | to something that is more subconscious
01:21:33.380 | that becomes automatic.
01:21:34.420 | So, I don't know if you've ever played against
01:21:36.340 | a chess grandmaster.
01:21:38.980 | You know, I get wiped out in 10 plies, right?
01:21:42.980 | And, you know, I have to think about my move
01:21:45.980 | for, you know, like 15 minutes.
01:21:48.680 | And the person in front of me, the grandmaster,
01:21:52.300 | you know, would just like react within seconds, right?
01:21:55.200 | You know, he doesn't need to think about it.
01:21:58.580 | That's become part of the subconscious,
01:21:59.980 | because, you know, it's basically just pattern recognition
01:22:02.620 | at this point.
01:22:03.460 | Same, you know, the first few hours you drive a car,
01:22:07.660 | you're really attentive.
01:22:08.660 | You can't do anything else.
01:22:09.660 | And then after 20, 30 hours of practice, 50 hours,
01:22:13.180 | you know, it's subconscious.
01:22:14.100 | You can talk to the person next to you,
01:22:15.420 | you know, things like that, right?
01:22:17.060 | Unless the situation becomes unpredictable,
01:22:18.980 | and then you have to stop talking.
01:22:21.020 | So, that suggests you only have one model in your head.
01:22:23.780 | And it might suggest the idea that consciousness
01:22:27.820 | basically is the module that configures
01:22:29.700 | this world model of yours.
01:22:31.700 | You know, you need to have some sort of executive
01:22:35.260 | kind of overseer that configures your world model
01:22:38.300 | for the situation at hand.
01:22:40.540 | And that leads to kind of the really curious concept
01:22:43.780 | that consciousness is not a consequence of the power
01:22:46.860 | of our minds, but of the limitation of our brains.
01:22:49.940 | But because we have only one world model,
01:22:52.020 | we have to be conscious.
01:22:53.660 | If we had as many world models as there are situations
01:22:57.620 | we encounter, then we could do all of them simultaneously,
01:23:00.740 | and we wouldn't need this sort of executive control
01:23:02.940 | that we call consciousness.
01:23:04.500 | - Yeah, interesting.
01:23:05.340 | And somehow maybe that executive controller,
01:23:08.940 | I mean, the hard problem of consciousness,
01:23:10.980 | there's some kind of chemicals in biology
01:23:12.860 | that's creating a feeling, like it feels
01:23:15.940 | to experience some of these things.
01:23:17.740 | That's kind of like the hard question is,
01:23:22.460 | what the heck is that, and why is that useful?
01:23:24.880 | Maybe the more pragmatic question,
01:23:26.180 | why is it useful to feel like this is really you
01:23:29.940 | experiencing this versus just like information
01:23:33.360 | being processed?
01:23:35.360 | - It could be just a very nice side effect
01:23:39.040 | of the way we evolved that it's just very useful
01:23:43.640 | to feel a sense of ownership to the decisions you make,
01:23:48.640 | to the perceptions you make, to the model
01:23:51.760 | you're trying to maintain.
01:23:53.200 | Like you own this thing, and it's the only one you got,
01:23:56.280 | and if you lose it, it's gonna really suck.
01:23:58.440 | And so you should really send the brain
01:24:00.640 | some signals about it.
01:24:03.720 | - What ideas do you believe might be true
01:24:06.840 | that most or at least many people disagree with you with,
01:24:10.080 | let's say in the space of machine learning?
01:24:13.760 | - Well, it depends who you talk about.
01:24:14.920 | But I think, so certainly there is a bunch of people
01:24:19.920 | who are nativist, right, who think that a lot
01:24:22.000 | of the basic things about the world are kind of hardwired
01:24:24.080 | in our minds.
01:24:25.320 | Things like the world is three-dimensional, for example.
01:24:28.880 | Is that hardwired?
01:24:30.400 | Things like object permanence, is it something
01:24:33.080 | that we learn before the age of three months or so,
01:24:37.520 | or are we born with it?
01:24:39.360 | And there are wide disagreements among
01:24:42.640 | the cognitive scientists for this.
01:24:46.560 | I think those things are actually very simple to learn.
01:24:49.040 | Is it the case that the oriented edge detectors in V1
01:24:54.240 | are learned, or are they hardwired?
01:24:56.160 | I think they are learned.
01:24:57.280 | They might be learned before birth,
01:24:58.560 | because it's really easy to generate signals
01:25:00.600 | from the retina that actually will train edge detectors.
01:25:03.000 | So, and again, those are things that can be learned
01:25:06.760 | within minutes of opening your eyes, right?
01:25:09.560 | I mean, since the 1990s, we have algorithms
01:25:14.040 | that can learn oriented edge detectors
01:25:15.440 | completely unsupervised with the equivalent
01:25:17.840 | of a few minutes of real time.
01:25:19.080 | So those things have to be learned.
01:25:21.540 | There's also those MIT experiments where you kind of plug
01:25:26.160 | the optic nerve into the auditory cortex of a baby ferret,
01:25:30.000 | right, and that auditory cortex
01:25:31.280 | becomes a visual cortex, essentially.
01:25:33.400 | So, you know, clearly there's learning taking place there.
01:25:37.980 | So, you know, I think a lot of what people think
01:25:40.680 | are so basic that they need to be hardwired,
01:25:43.160 | I think a lot of those things are learned
01:25:44.440 | because they are easy to learn.
01:25:46.240 | - So you put a lot of value in the power of learning.
01:25:49.960 | What kind of things do you suspect might not be learned?
01:25:53.340 | Is there something that could not be learned?
01:25:56.040 | - So your intrinsic drives are not learned.
01:25:59.760 | There are the things that, you know, make humans human
01:26:03.440 | or make, you know, cats different from dogs, right?
01:26:07.400 | It's the basic drives that are kind of hardwired
01:26:10.000 | in our basal ganglia.
01:26:11.920 | I mean, there are people who are working
01:26:14.000 | on this kind of stuff.
01:26:15.040 | It's called intrinsic motivation
01:26:16.320 | in the context of reinforcement learning.
01:26:18.160 | So these are objective functions
01:26:20.040 | where the reward doesn't come from the external world.
01:26:23.040 | It's computed by your own brain.
01:26:24.600 | Your own brain computes whether you're happy or not, right?
01:26:28.120 | It measures your degree of comfort or discomfort.
01:26:32.500 | And because it's your brain computing this,
01:26:36.080 | presumably it knows also how to estimate
01:26:37.760 | gradients of this, right?
01:26:38.760 | So it's easier to learn when your objective is intrinsic.
01:26:43.760 | So that has to be hardwired.
01:26:48.760 | The critic that makes long-term prediction of the outcome,
01:26:53.420 | which is the eventual result of this, that's learned.
01:26:56.720 | And perception is learned,
01:26:59.020 | and your model of the world is learned.
01:27:01.220 | But let me take an example of, you know, why the critic,
01:27:04.200 | I mean, an example of how the critic may be learned, right?
01:27:06.800 | If I come to you, you know, I reach across the table
01:27:11.200 | and I pinch your arm, right?
01:27:13.320 | Complete surprise for you.
01:27:15.040 | You would not have expected this from me.
01:27:15.880 | - I was expecting that the whole time, but yes, right.
01:27:18.040 | Let's say for the sake of the story, yes.
01:27:21.720 | - Okay, your basal ganglia is gonna light up
01:27:24.940 | 'cause it's gonna hurt, right?
01:27:26.780 | And now your model of the world includes the fact that
01:27:31.080 | I may pinch you if I approach my-
01:27:33.160 | - Don't trust humans.
01:27:36.180 | - Right, my hand to your arm.
01:27:37.820 | So if I try again, you're gonna recoil.
01:27:39.960 | And that's your critic, your predictive,
01:27:44.020 | you know, your predictor of your ultimate pain system
01:27:50.460 | that predicts that something bad is gonna happen
01:27:52.320 | and you recoil to avoid it.
01:27:53.760 | - So even that can be learned.
01:27:55.160 | - That is learned, definitely.
01:27:56.600 | This is what allows you also to, you know,
01:27:59.320 | define sub goals, right?
01:28:00.600 | So the fact that, you know, you're a school child,
01:28:04.440 | you wake up in the morning and you go to school and,
01:28:07.000 | you know, it's not because you necessarily like waking up
01:28:11.640 | early and going to school,
01:28:12.720 | but you know that there is a long-term objective
01:28:14.640 | you're trying to optimize.
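As a toy illustration of how such a critic could be learned, here is a minimal temporal-difference sketch; the tabular states, the reward signal, and the discount factor are illustrative assumptions, not a claim about how the brain actually implements this.

```python
import numpy as np

# A learned critic as a table of long-term value estimates over toy states.
values = np.zeros(10)          # value estimate for each of 10 toy states
alpha, gamma = 0.1, 0.95       # learning rate, discount factor (assumed)

def td_update(state, reward, next_state):
    """Move V(state) toward the observed reward plus the discounted prediction
    for what comes next, so bad outcomes get anticipated earlier and earlier."""
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
```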
01:28:15.840 | - So Ernest Becker, I'm not sure if you're familiar
01:28:18.120 | with him, the philosopher, he wrote the book
01:28:20.060 | "Denial of Death" and his idea is that
01:28:22.180 | one of the core motivations of human beings
01:28:24.460 | are terror of death, are fear of death.
01:28:27.260 | That's what makes us unique from cats.
01:28:28.900 | Cats are just surviving.
01:28:30.540 | They do not have a deep, like a cognizance,
01:28:35.540 | introspection that over the horizon is the end.
01:28:41.700 | And he says that, I mean,
01:28:43.020 | there's a terror management theory
01:28:44.380 | that just all these psychological experiments that show,
01:28:47.500 | basically this idea that all of human civilization,
01:28:52.500 | everything we create is kind of trying to forget
01:28:56.820 | if even for a brief moment that we're going to die.
01:29:00.620 | When do you think humans understand
01:29:03.720 | that they're going to die?
01:29:04.860 | Is it learned early on also?
01:29:07.460 | Like?
01:29:09.060 | - I don't know at what point, I mean, it's a question,
01:29:12.500 | like, you know, at what point do you realize that,
01:29:14.620 | you know, what death really is?
01:29:16.460 | And I think most people don't actually realize
01:29:18.220 | what death is, right?
01:29:19.260 | I mean, most people believe that you go to heaven
01:29:20.980 | or something, right?
01:29:21.900 | - So to push back on that, what Ernest Becker says
01:29:25.600 | and Sheldon Solomon, all of those folks,
01:29:29.340 | and I find those ideas a little bit compelling
01:29:31.660 | is that there is moments in life, early in life,
01:29:34.140 | a lot of this fun happens early in life
01:29:36.540 | when you are, when you do deeply experience the terror
01:29:41.540 | of this realization and all the things you think about,
01:29:45.340 | about religion, all those kinds of things
01:29:47.220 | that we kind of think about more like teenage years
01:29:49.620 | and later, we're talking about way earlier.
01:29:52.140 | - No, it was like seven or eight years,
01:29:53.220 | something like that, yeah.
01:29:54.060 | - You realize, holy crap, this is,
01:29:58.820 | like the mystery, the terror, like,
01:30:00.700 | it's almost like you're a little prey,
01:30:03.240 | a little baby deer sitting in the darkness
01:30:05.380 | of the jungle, the woods, looking all around you,
01:30:08.060 | the darkness full of terror.
01:30:09.580 | I mean, that realization says, okay,
01:30:12.180 | I'm going to go back in the comfort of my mind
01:30:14.500 | where there is a deep meaning,
01:30:16.820 | where there is a, maybe like, pretend I'm immortal
01:30:20.420 | in however way, however kind of idea I can construct
01:30:25.060 | to help me understand that I'm immortal.
01:30:27.180 | Religion helps with that.
01:30:28.660 | You can delude yourself in all kinds of ways,
01:30:31.460 | like lose yourself in the busyness of each day,
01:30:34.220 | have little goals in mind, all those kinds of things
01:30:36.380 | to think that it's gonna go on forever,
01:30:38.220 | and you kind of know you're gonna die, yeah,
01:30:40.780 | and it's gonna be sad, but you don't really understand
01:30:43.820 | that you're going to die.
01:30:45.140 | And so that's their idea.
01:30:46.460 | And I find that compelling because it does seem
01:30:49.940 | to be a core unique aspect of human nature
01:30:52.820 | that we're able to think that we're going,
01:30:55.180 | we're able to really understand that this life is finite.
01:30:59.540 | That seems important.
01:31:00.620 | - There's a bunch of different things there.
01:31:02.280 | So first of all, I don't think there is
01:31:03.660 | a qualitative difference between us and cats in that respect.
01:31:07.520 | I think the difference is that we just have
01:31:09.240 | a better long-term ability to predict in the long term,
01:31:14.240 | and so we have a better understanding of how the world works,
01:31:17.380 | so we have a better understanding of finiteness of life
01:31:20.180 | and things like that.
01:31:21.020 | - So we have a better planning engine than cats?
01:31:23.520 | - Yeah.
01:31:24.440 | - Okay.
01:31:25.280 | - But what's the motivation for planning that far?
01:31:28.840 | - Well, I think it's just a side effect
01:31:30.160 | of the fact that we have just a better planning engine
01:31:32.320 | because it makes us, as I said,
01:31:34.760 | the essence of intelligence is the ability to predict.
01:31:37.400 | And so because we're smarter, as a side effect,
01:31:41.200 | we also have this ability to kind of make predictions
01:31:43.480 | about our own future existence or lack thereof.
01:31:47.560 | - Okay.
01:31:48.480 | - You say religion helps with that.
01:31:50.520 | I think religion hurts, actually.
01:31:52.960 | It makes people worry about what's gonna happen
01:31:55.600 | after their death, et cetera.
01:31:57.480 | If you believe that you just don't exist after death,
01:32:01.160 | it solves completely the problem, at least.
01:32:02.920 | - You're saying if you don't believe in God,
01:32:04.960 | you don't worry about what happens after death?
01:32:07.200 | - Yeah.
01:32:08.240 | - I don't know.
01:32:09.080 | - You only worry about this life
01:32:11.880 | because that's the only one you have.
01:32:14.240 | - I think it's, well, I don't know.
01:32:16.120 | If I were to say what Ernest Becker says,
01:32:17.760 | and I would say I agree with him more than not,
01:32:22.160 | is you do deeply worry.
01:32:26.160 | If you believe there's no God,
01:32:27.880 | there's still a deep worry of the mystery of it all.
01:32:31.760 | Like, how does that make any sense that it just ends?
01:32:35.680 | I don't think we can truly understand that this right,
01:32:39.720 | I mean, so much of our life, the consciousness, the ego,
01:32:43.000 | is invested in this being.
01:32:46.120 | - Science keeps bringing humanity down from its pedestal.
01:32:51.560 | And that's just another example of it.
01:32:54.720 | - That's wonderful, but for us individual humans,
01:32:57.840 | we don't like to be brought down from a pedestal.
01:33:00.280 | You're saying like-- - I'm fine with it.
01:33:01.720 | - But see, you're fine with it because, well,
01:33:04.140 | so what Ernest Becker would say is you're fine with it
01:33:06.360 | because that's just a more peaceful existence for you,
01:33:08.560 | but you're not really fine.
01:33:09.560 | You're hiding from, in fact, some of the people
01:33:12.000 | that experience the deepest trauma earlier in life,
01:33:17.000 | they often, before they seek extensive therapy,
01:33:19.600 | will say that I'm fine.
01:33:21.080 | It's like when you talk to people who are truly angry,
01:33:23.480 | how are you doing, I'm fine.
01:33:25.440 | The question is what's going on.
01:33:27.800 | - Now, I had a near-death experience.
01:33:29.200 | I had a very bad motorbike accident when I was 17.
01:33:33.640 | So, but that didn't have any impact
01:33:36.940 | on my reflection on that topic.
01:33:40.460 | - So I'm basically just playing a bit of a devil's advocate,
01:33:43.120 | pushing back on wondering is it truly possible
01:33:46.820 | to accept death?
01:33:47.660 | And the flip side that's more interesting, I think,
01:33:49.700 | for AI and robotics is how important is it to have this
01:33:54.700 | as one of the suite of motivations,
01:33:57.160 | is to not just avoid falling off the roof or something
01:34:02.160 | like that, but ponder the end of the ride.
01:34:07.160 | If you listen to the Stoics, it's a great motivator.
01:34:14.820 | It adds a sense of urgency.
01:34:16.900 | So maybe to truly fear death or be cognizant of it
01:34:21.440 | might give a deeper meaning and urgency
01:34:25.520 | to the moment, to live fully.
01:34:30.560 | - Maybe I don't disagree with that.
01:34:32.220 | I mean, I think what motivates me here is,
01:34:34.960 | you know, knowing more about human nature.
01:34:38.980 | I mean, I think human nature and human intelligence
01:34:41.760 | is a big mystery.
01:34:42.600 | It's a scientific mystery, in addition to, you know,
01:34:46.580 | philosophical and et cetera,
01:34:48.580 | but, you know, I'm a true believer in science.
01:34:50.860 | So, and I do have kind of a belief that for complex systems
01:34:57.540 | like the brain and the mind, the way to understand it
01:35:02.540 | is to try to reproduce it with, you know,
01:35:05.360 | artifacts that you build, because you know
01:35:07.660 | what's essential to it when you try to build it.
01:35:10.000 | You know, the same way I've used this analogy before
01:35:12.420 | with you, I believe, the same way we only started
01:35:15.820 | to understand aerodynamics when we started
01:35:18.640 | building airplanes, and that helped us understand
01:35:20.440 | how birds fly.
01:35:21.340 | So I think there's kind of a similar process here
01:35:25.480 | where we don't have a theory, a full theory of intelligence,
01:35:29.660 | but building, you know, intelligent artifacts
01:35:31.760 | will help us perhaps develop some, you know,
01:35:34.640 | underlying theory that encompasses not just
01:35:37.800 | artificial implements, but also human
01:35:41.920 | and biological intelligence in general.
01:35:43.840 | - So you're an interesting person to ask this question
01:35:46.080 | about sort of all kinds of different other
01:35:49.400 | intelligent entities or intelligences.
01:35:53.100 | What are your thoughts about kind of like the Turing
01:35:56.300 | or the Chinese room question?
01:35:58.020 | If we create an AI system that exhibits a lot of properties
01:36:04.100 | of intelligence and consciousness,
01:36:06.400 | how comfortable are you thinking of that entity
01:36:10.220 | as intelligent or conscious?
01:36:12.340 | So you're trying to build now systems that have intelligence
01:36:15.560 | and there's metrics about their performance,
01:36:17.420 | but that metric is external.
01:36:22.540 | - Okay.
01:36:23.380 | - So how are you, are you okay calling a thing intelligent
01:36:26.420 | or are you going to be like most humans
01:36:29.020 | and be once again unhappy to be brought down
01:36:32.700 | from a pedestal of consciousness/intelligence?
01:36:34.900 | - No, I'll be very happy to understand
01:36:39.500 | more about human nature, human mind, and human intelligence
01:36:45.500 | through the construction of machines
01:36:47.200 | that have similar abilities.
01:36:50.560 | And if a consequence of this is to bring down humanity
01:36:54.480 | one notch down from its already low pedestal,
01:36:58.000 | I'm just fine with it.
01:36:59.100 | That's just the reality of life.
01:37:01.300 | So I'm fine with that.
01:37:02.440 | Now, you were asking me about things that,
01:37:04.980 | opinions I have that a lot of people may disagree with.
01:37:07.900 | I think if we think about the design
01:37:12.740 | of an autonomous intelligence system,
01:37:14.220 | so assuming that we are somewhat successful at some level
01:37:18.660 | of getting machines to learn models of the world,
01:37:20.420 | predictive models of the world,
01:37:22.580 | we build intrinsic motivation objective functions
01:37:25.820 | to drive the behavior of that system.
01:37:28.300 | The system also has perception modules
01:37:30.060 | that allows it to estimate the state of the world
01:37:32.780 | and then have some way of figuring out a sequence of actions
01:37:35.460 | that, you know, to optimize a particular objective.
01:37:38.000 | If it has a critic of the type that I was describing before,
01:37:42.700 | the thing that makes you recoil your arm
01:37:44.580 | the second time I try to pinch you,
01:37:48.580 | intelligent autonomous machine will have emotions.
01:37:51.660 | I think emotions are an integral part
01:37:54.020 | of autonomous intelligence.
01:37:56.380 | If you have an intelligent system
01:37:58.980 | that is driven by intrinsic motivation, by objectives,
01:38:03.120 | if it has a critic that allows it to predict in advance
01:38:07.640 | whether the outcome of a situation
01:38:10.060 | is going to be good or bad, it's going to have emotions.
01:38:12.220 | It's going to have fear.
01:38:13.460 | - Yes.
01:38:14.300 | - When it predicts that the outcome is going to be bad
01:38:18.140 | and something to avoid, it's going to have elation
01:38:20.700 | when it predicts it's going to be good.
01:38:22.620 | If it has drives to relate with humans,
01:38:28.180 | you know, in some ways, the way humans have,
01:38:30.620 | you know, it's going to be social, right?
01:38:34.460 | And so it's going to have emotions about attachment
01:38:37.380 | and things of that type.
01:38:38.620 | So I think, you know, the sort of sci-fi thing
01:38:43.620 | where, you know, you see commander data
01:38:46.900 | like having an emotion chip that you can turn off, right?
01:38:50.100 | I think that's ridiculous.
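Since the architecture in this answer (perception, a predictive world model, an intrinsic objective, a learned critic, and a search over action sequences) is only described in words, here is a schematic sketch of how those modules could fit together. The module interfaces, the random-search planner, and the toy instantiation are assumptions made for illustration; they are not LeCun's actual design.

```python
# Schematic sketch of an autonomous-agent loop: perception -> world model ->
# search over action sequences scored by an intrinsic cost plus a learned critic.
# All interfaces and the toy instantiation below are illustrative assumptions.

from dataclasses import dataclass
import random
from typing import Callable, List

@dataclass
class Agent:
    perceive: Callable[[object], list]          # observation -> estimated state
    world_model: Callable[[list, int], list]    # (state, action) -> predicted next state
    intrinsic_cost: Callable[[list], float]     # hard-wired: how "uncomfortable" a state is
    critic: Callable[[list], float]             # learned: predicted long-term cost

    def plan(self, obs, actions: List[int], horizon: int = 3, n_samples: int = 64) -> List[int]:
        """Pick the action sequence whose imagined rollout minimizes
        immediate intrinsic cost plus the critic's long-term estimate."""
        state = self.perceive(obs)
        best_seq, best_cost = None, float("inf")
        for _ in range(n_samples):
            seq = [random.choice(actions) for _ in range(horizon)]
            s, cost = state, 0.0
            for a in seq:
                s = self.world_model(s, a)
                cost += self.intrinsic_cost(s)
            cost += self.critic(s)   # the "emotion": anticipated outcome of the final state
            if cost < best_cost:
                best_seq, best_cost = seq, cost
        return best_seq

# Toy instantiation: a 1-D position the agent wants to keep near zero.
agent = Agent(
    perceive=lambda obs: [float(obs)],
    world_model=lambda s, a: [s[0] + a],
    intrinsic_cost=lambda s: abs(s[0]),
    critic=lambda s: 0.5 * abs(s[0]),
)
print(agent.plan(obs=4, actions=[-1, 0, 1]))
```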
01:38:51.700 | - So, I mean, here's the difficult
01:38:53.380 | philosophical social question.
01:38:57.820 | Do you think there will be a time,
01:39:00.040 | like a civil rights movement for robots where,
01:39:03.120 | okay, forget the movement, but a discussion,
01:39:06.460 | like the Supreme Court, that particular kinds of robots,
01:39:12.900 | you know, particular kinds of systems
01:39:14.860 | deserve the same rights as humans
01:39:18.300 | because they can suffer just as humans can,
01:39:21.640 | all those kinds of things?
01:39:24.740 | - Well, perhaps, perhaps not.
01:39:27.340 | Like imagine that humans were,
01:39:29.580 | that you could, you know, die and be restored.
01:39:33.740 | Like, you know, you could be sort of, you know,
01:39:35.500 | be 3D reprinted and, you know,
01:39:37.540 | your brain could be reconstructed in its finest details.
01:39:40.740 | Our ideas of rights will change in that case.
01:39:43.140 | If you can always just,
01:39:44.540 | there's always a backup, you could always restore.
01:39:48.220 | Maybe like the importance of murder
01:39:50.260 | will go down one notch.
01:39:51.980 | - That's right.
01:39:52.820 | But also the, your, you know,
01:39:56.140 | desire to do dangerous things like, you know,
01:39:59.620 | doing skydiving or, you know,
01:40:01.940 | or, you know, race car driving, you know,
01:40:06.100 | car racing or that kind of stuff, you know,
01:40:07.500 | would probably increase.
01:40:09.420 | Or, you know, airplane aerobatics
01:40:11.100 | or that kind of stuff, right?
01:40:11.940 | It would be fine to do a lot of those things
01:40:14.140 | or explore, you know, dangerous areas and things like that.
01:40:17.460 | It would kind of change your relationship.
01:40:19.180 | So now it's very likely that robots would be like that
01:40:22.380 | because, you know, they'll be based on perhaps technology
01:40:27.060 | that is somewhat similar to today's technology.
01:40:30.140 | And you can always have a backup.
01:40:32.260 | - So it's possible.
01:40:34.300 | I don't know if you like video games,
01:40:35.700 | but there's a game called Diablo and-
01:40:39.340 | - Oh, my sons are huge fans of this.
01:40:41.860 | - Yes.
01:40:42.700 | - And in fact, they made a game that's inspired by it.
01:40:47.060 | - Awesome.
01:40:47.900 | Like built a game?
01:40:49.260 | - My three sons have a game design studio between them.
01:40:52.380 | Yeah. - That's awesome.
01:40:53.220 | - They came out with a game.
01:40:54.060 | - Like it just came out last year?
01:40:55.540 | - No, this was last year, early last year,
01:40:57.300 | about a year ago.
01:40:58.140 | - That's awesome.
01:40:58.980 | But so in Diablo, there's something called hardcore mode,
01:41:01.980 | which if you die, there's no, you're gone.
01:41:05.420 | - Right.
01:41:06.260 | - That's it.
01:41:07.220 | So it's possible with AI systems,
01:41:09.660 | for them to be able to operate successfully
01:41:13.220 | and for us to treat them in a certain way,
01:41:15.540 | 'cause they have to be integrated in human society,
01:41:18.380 | they have to be able to die, no copies allowed.
01:41:22.020 | In fact, copying is illegal.
01:41:23.860 | It's possible with humans as well,
01:41:25.260 | like cloning will be illegal, even when it's possible.
01:41:28.580 | - But cloning is not copying, right?
01:41:29.940 | I mean, you don't reproduce the mind of the person
01:41:33.100 | and experience.
01:41:33.940 | - Right.
01:41:34.780 | - It's just a delayed twin, so.
01:41:36.420 | - But then it's, but we were talking about with computers
01:41:39.060 | that you will be able to copy.
01:41:40.500 | - Right.
01:41:41.340 | - You will be able to perfectly save,
01:41:42.660 | pickle the mind state.
01:41:46.660 | And it's possible that that will be illegal
01:41:49.660 | because that goes against,
01:41:52.320 | that will destroy the motivation of the system.
01:41:55.980 | - Okay, so let's say you have a domestic robot.
01:41:59.100 | - Yes.
01:41:59.940 | - Okay, sometime in the future.
01:42:01.380 | - Yes.
01:42:02.460 | - And the domestic robot, you know,
01:42:04.940 | comes to you kind of somewhat pre-trained,
01:42:07.140 | you know, it can do a bunch of things.
01:42:08.340 | - Yes.
01:42:09.180 | - But it has a particular personality
01:42:10.580 | that makes it slightly different from the other robots
01:42:12.300 | because that makes them more interesting.
01:42:14.220 | And then because it's, you know,
01:42:15.920 | it's lived with you for five years,
01:42:18.060 | you've grown some attachment to it and vice versa.
01:42:21.900 | And it's learned a lot about you.
01:42:24.380 | Or maybe it's not a household robot.
01:42:25.900 | Maybe it's a virtual assistant that lives in your,
01:42:29.380 | you know, augmented reality glasses or whatever, right?
01:42:32.580 | You know, the horror movie type thing, right?
01:42:35.020 | And that system, to some extent,
01:42:39.620 | the intelligence in that system is a bit like your child
01:42:43.900 | or maybe your PhD student in the sense that
01:42:47.100 | there's a lot of you in that machine now, right?
01:42:49.780 | - Yeah.
01:42:50.620 | - And so if it were a living thing,
01:42:53.500 | you would do this for free if you want, right?
01:42:56.560 | If it's your child, your child can, you know,
01:42:58.380 | then live his or her own life.
01:43:01.580 | And, you know, the fact that they learn stuff from you
01:43:04.020 | doesn't mean that you have any ownership of it, right?
01:43:06.020 | - Yeah.
01:43:06.860 | - But if it's a robot that you've trained,
01:43:09.380 | perhaps you have some intellectual property claim about-
01:43:13.980 | - Intellectual property?
01:43:15.140 | Oh, I thought you meant like a permanence value
01:43:18.160 | in the sense that's part of you is in-
01:43:20.140 | - Well, there is permanence value, right?
01:43:21.700 | So you would lose a lot if that robot were to be destroyed
01:43:24.660 | and you had no backup, you would lose a lot.
01:43:26.460 | You would lose a lot of investment, you know,
01:43:28.100 | kind of like a person dying, you know,
01:43:31.860 | that a friend of yours dying or a coworker
01:43:35.620 | or something like that.
01:43:36.820 | - But also you have like intellectual property rights
01:43:42.340 | in the sense that that system is fine-tuned
01:43:45.940 | to your particular existence.
01:43:47.340 | So that's now a very unique instantiation
01:43:49.860 | of that original background model,
01:43:51.980 | whatever it was that arrived.
01:43:54.260 | - And then there are issues of privacy, right?
01:43:55.660 | Because now imagine that that robot has its own
01:43:59.700 | kind of volition and decides to work for someone else
01:44:02.820 | or kind of, you know, thinks life with you
01:44:05.980 | is sort of untenable or whatever.
01:44:07.820 | - Right.
01:44:08.660 | - Now, all the things that that system learned from you,
01:44:12.740 | you know, can you like, you know,
01:44:16.820 | delete all the personal information
01:44:18.100 | that that system knows about you?
01:44:19.620 | - Yeah.
01:44:20.580 | - I mean, that would be kind of an ethical question.
01:44:22.180 | Like, you know, can you erase the mind of a intelligent
01:44:26.500 | robot to protect your privacy?
01:44:29.820 | - Yeah.
01:44:30.660 | - You can't do this with humans.
01:44:31.540 | You can ask them to shut up,
01:44:32.620 | but that you don't have complete power over them.
01:44:35.620 | - Can't erase humans.
01:44:36.780 | Yeah, it's the problem with the relationships, you know,
01:44:39.020 | that you break up, you can't erase the other human.
01:44:42.660 | With robots, I think it will have to be the same thing
01:44:44.940 | with robots, that risk, that there has to be some risk
01:44:50.300 | to our interactions to truly experience them deeply,
01:44:55.100 | it feels like.
01:44:56.140 | So you have to be able to lose your robot friend
01:44:59.620 | and that robot friend to go tweeting
01:45:01.660 | about how much of an asshole you are.
01:45:03.700 | - But then are you allowed to, you know,
01:45:06.140 | murder the robot to protect your private information?
01:45:08.620 | - Yeah, probably not.
01:45:09.460 | - If the robot decides to leave?
01:45:10.300 | - I have this intuition that for robots with certain,
01:45:14.540 | like, it's almost like a regulation.
01:45:16.820 | If you declare your robot to be,
01:45:19.220 | let's call it sentient or something like that,
01:45:20.980 | like this robot is designed for human interaction,
01:45:24.180 | then you're not allowed to murder these robots.
01:45:26.020 | It's the same as murdering other humans.
01:45:28.180 | - Well, but what about you do a backup of the robot
01:45:30.300 | that you preserve on a hard drive
01:45:32.580 | with the equivalent in the future?
01:45:33.860 | - That might be illegal.
01:45:34.700 | It's like piracy is illegal.
01:45:38.020 | - No, but it's your own robot, right?
01:45:39.740 | - But you can't, you don't--
01:45:41.620 | - But then you can wipe out his brain,
01:45:44.980 | so this robot doesn't know anything about you anymore,
01:45:47.380 | but you still have, technically it's still in existence
01:45:50.380 | because you backed it up.
01:45:51.660 | - And then there'll be these great speeches
01:45:53.500 | at the Supreme Court by saying,
01:45:55.420 | oh, sure, you can erase the mind of the robot
01:45:57.780 | just like you can erase the mind of a human.
01:46:00.020 | We both can suffer.
01:46:01.060 | There'll be some epic, like,
01:46:02.180 | Obama-type character with a speech that we,
01:46:05.620 | like, the robots and the humans are the same.
01:46:07.940 | We can both suffer, we can both hope,
01:46:11.340 | we can both, all of those kinds of things,
01:46:14.820 | raise families, all that kind of stuff.
01:46:17.220 | It's interesting for these, just like you said,
01:46:20.100 | emotion seems to be a fascinatingly powerful aspect
01:46:24.180 | of human-to-human interaction, human-robot interaction,
01:46:27.340 | and if they're able to exhibit emotions,
01:46:30.460 | at the end of the day, that's probably going to
01:46:33.540 | have us deeply consider human rights,
01:46:37.100 | like what we value in humans,
01:46:38.460 | what we value in other animals.
01:46:40.300 | That's why robots and AI is great.
01:46:42.140 | It makes us ask really good questions.
01:46:44.260 | - The hard questions, yeah.
01:46:46.100 | You asked about the Chinese room-type argument.
01:46:49.580 | Is it real, if it looks real?
01:46:51.500 | I think the Chinese room argument is a ridiculous one.
01:46:54.300 | - So for people who don't know, Chinese room is,
01:46:57.820 | I don't even know how to formulate it well,
01:47:00.740 | but basically, you can mimic the behavior
01:47:04.620 | of an intelligent system by just following
01:47:06.780 | a giant algorithm codebook that tells you
01:47:10.180 | exactly how to respond in exactly each case,
01:47:12.880 | but is that really intelligent?
01:47:14.700 | It's like a giant lookup table.
01:47:16.580 | When this person says this, you answer this.
01:47:18.580 | When this person says this, you answer this.
01:47:21.020 | And if you understand how that works,
01:47:24.300 | you have this giant, nearly infinite lookup table.
01:47:27.340 | Is that really intelligence?
01:47:28.620 | 'Cause intelligence seems to be a mechanism
01:47:31.300 | that's much more interesting and complex
01:47:33.420 | than this lookup table.
01:47:34.620 | - I don't think so.
01:47:35.460 | So the real question comes down to,
01:47:38.940 | do you think you can mechanize intelligence in some way,
01:47:44.340 | even if that involves learning?
01:47:47.580 | And the answer is, of course, yes.
01:47:49.300 | There's no question.
01:47:50.740 | There's a second question then,
01:47:52.140 | which is, assuming you can reproduce intelligence
01:47:56.540 | in sort of different hardware than biological hardware,
01:47:59.380 | you know, like computers,
01:48:00.620 | can you match human intelligence in all the domains
01:48:07.420 | in which humans are intelligent?
01:48:11.860 | Is it possible, right?
01:48:13.940 | So that's the hypothesis of strong AI.
01:48:17.060 | The answer to this, in my opinion, is an unqualified yes.
01:48:20.700 | This will happen at some point.
01:48:22.620 | There's no question that machines, at some point,
01:48:25.300 | will become more intelligent than humans
01:48:26.580 | in all domains where humans are intelligent.
01:48:28.580 | This is not for tomorrow.
01:48:30.180 | It's gonna take a long time,
01:48:32.180 | regardless of what Elon and others have claimed or believed.
01:48:37.180 | This is a lot harder than many of those guys think it is.
01:48:43.420 | And many of those guys who thought it was simpler than that
01:48:45.780 | years, you know, five years ago,
01:48:47.460 | now think it's hard because it's been five years
01:48:49.900 | and they realize it's gonna take a lot longer.
01:48:53.420 | That includes a bunch of people at DeepMind, for example.
01:48:56.180 | - Oh, interesting.
01:48:57.020 | I haven't actually touched base with the DeepMind folks,
01:48:59.340 | but some of it, Elon or Demis Hassabis,
01:49:03.300 | I mean, sometimes in your role,
01:49:05.820 | you have to kind of create deadlines
01:49:08.780 | that are nearer than farther away
01:49:10.740 | to kind of create an urgency.
01:49:12.820 | 'Cause you have to believe the impossible is possible
01:49:15.180 | in order to accomplish it.
01:49:16.180 | And there's, of course, a flip side to that coin,
01:49:18.540 | but it's a weird, you can't be too cynical
01:49:21.260 | if you wanna get something done.
01:49:22.420 | - Absolutely, I agree with that.
01:49:24.300 | But I mean, you have to inspire people, right,
01:49:26.900 | to work on sort of ambitious things.
01:49:28.740 | So, you know, it's certainly a lot harder than we believe,
01:49:35.580 | but there's no question in my mind that this will happen.
01:49:38.180 | And now, you know, people are kind of worried about
01:49:40.260 | what does that mean for humans?
01:49:42.460 | They are gonna be brought down from their pedestal,
01:49:45.100 | you know, a bunch of notches with that.
01:49:47.940 | And, you know, is that gonna be good or bad?
01:49:51.700 | I mean, it's just gonna give more power, right?
01:49:53.460 | It's an amplifier for human intelligence, really.
01:49:56.180 | - So speaking of doing cool, ambitious things,
01:49:59.700 | FAIR, the Facebook AI Research Group,
01:50:02.900 | has recently celebrated its eighth birthday,
01:50:05.500 | or maybe you can correct me on that.
01:50:08.620 | Looking back, what has been the successes,
01:50:11.580 | the failures, the lessons learned
01:50:13.340 | from the eight years of FAIR?
01:50:14.420 | And maybe you can also give context of
01:50:16.540 | where does the newly minted meta AI fit into,
01:50:21.260 | how does it relate to FAIR?
01:50:22.580 | - Right, so let me tell you a little bit
01:50:23.740 | about the organization of all this.
01:50:25.500 | Yeah, FAIR was created almost exactly eight years ago.
01:50:30.020 | It wasn't called FAIR yet.
01:50:31.220 | It took that name a few months later.
01:50:33.540 | And at the time I joined Facebook,
01:50:37.740 | there was a group called the AI Group
01:50:39.460 | that had about 12 engineers and a few scientists,
01:50:43.500 | like 10 engineers and two scientists
01:50:45.420 | or something like that.
01:50:47.020 | I ran it for three and a half years as a director,
01:50:49.900 | hired the first few scientists
01:50:52.300 | and kind of set up the culture and organized it,
01:50:55.300 | explained to the Facebook leadership
01:50:57.820 | what fundamental research was about
01:51:00.140 | and how it can work within industry
01:51:03.580 | and how it needs to be open and everything.
01:51:07.180 | And I think it's been an unqualified success
01:51:12.180 | in the sense that FAIR has simultaneously produced
01:51:16.460 | top-level research and advanced the science
01:51:20.900 | and the technology, provided tools,
01:51:22.500 | open source tools like PyTorch and many others.
01:51:24.980 | But at the same time has had a direct
01:51:29.820 | or mostly indirect impact on Facebook at the time,
01:51:34.620 | now meta, in the sense that a lot of systems
01:51:37.900 | that meta is built around now are based on research projects
01:51:44.340 | that started at FAIR.
01:51:48.300 | And so if you were to take out deep learning
01:51:50.020 | out of Facebook services now and meta more generally,
01:51:55.020 | I mean, the company would literally crumble.
01:51:57.660 | I mean, it's completely built around AI these days
01:52:01.380 | and it's really essential to the operations.
01:52:03.900 | So what happened after three and a half years
01:52:06.540 | is that I changed role, I became chief scientist.
01:52:10.140 | So I'm not doing day-to-day management of FAIR anymore.
01:52:14.780 | I'm more of a kind of, you know,
01:52:17.020 | think about strategy and things like that.
01:52:18.780 | And I carry my, I conduct my own research.
01:52:21.380 | I have, you know, my own kind of research group
01:52:23.220 | working on self-supervised learning and things like this,
01:52:25.220 | which I didn't have time to do when I was director.
01:52:28.140 | So now FAIR is run by Joelle Pineau and Antoine Bordes.
01:52:33.820 | Together, because FAIR is kind of split in two now,
01:52:36.300 | there's something called FAIR Labs,
01:52:37.820 | which is sort of bottom-up, science-driven research
01:52:40.900 | and FAIR Accel, which is slightly more organized
01:52:43.420 | for bigger projects that require a little more kind of focus
01:52:47.660 | and more engineering support and things like that.
01:52:49.740 | So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel.
01:52:52.860 | - Where are they located?
01:52:54.540 | - It's delocalized all over.
01:52:56.620 | So there's no question that the leadership of the company
01:53:02.500 | believes that this was a very worthwhile investment.
01:53:06.540 | And what that means is that it's there for the long run.
01:53:11.540 | Right, so there is, if you want to talk in these terms,
01:53:16.780 | which I don't like, there's a business model, if you want,
01:53:19.540 | where FAIR, despite being a very fundamental research lab,
01:53:23.660 | brings a lot of value to the company,
01:53:25.900 | mostly indirectly through other groups.
01:53:27.860 | Now, what happened three and a half years ago
01:53:31.540 | when I stepped down was also the creation of Facebook AI,
01:53:34.620 | which was basically a larger organization
01:53:37.660 | that covers FAIR, so FAIR is included in it,
01:53:41.700 | but also has other organizations that are focused
01:53:46.260 | on applied research or advanced development of AI technology
01:53:51.220 | that is more focused on the products of the company.
01:53:54.660 | - So less emphasis on fundamental research.
01:53:56.660 | - Less fundamental, but it's still research.
01:53:58.220 | I mean, there's a lot of papers coming out
01:53:59.740 | of those organizations and people are awesome
01:54:03.940 | and wonderful to interact with,
01:54:06.380 | but it serves as kind of a way to kind of scale up,
01:54:11.380 | if you want, sort of AI technology,
01:54:15.700 | which may be very experimental and sort of lab prototypes
01:54:19.380 | into things that are usable.
01:54:20.580 | - So FAIR is a subset of meta AI.
01:54:23.020 | Will FAIR become like KFC?
01:54:25.140 | It'll just keep the F, nobody cares what the F stands for?
01:54:29.420 | - We'll know soon enough, probably by the end of 2021.
01:54:34.420 | - I guess it's not a giant change, MAIR, FAIR.
01:54:38.420 | - Well, MAIR doesn't sound too good,
01:54:39.540 | but the brand people are kind of deciding on this
01:54:43.540 | and they've been hesitating for a while now
01:54:45.860 | and they tell us they're gonna come up with an answer
01:54:48.500 | as to whether FAIR is gonna change name
01:54:50.460 | or whether we're gonna change just the meaning of the F.
01:54:53.180 | - Oh, that's a good call.
01:54:54.180 | I would keep FAIR and change the meaning of the F.
01:54:56.140 | - That would be my preference.
01:54:58.340 | - I would turn the F into fundamental.
01:55:00.980 | - Oh, that's good. - Fundamental AI research.
01:55:02.260 | - Oh, that's really good, yeah, yeah.
01:55:03.100 | - Within meta AI.
01:55:04.260 | So this would be sort of meta FAIR,
01:55:06.700 | but people would call it FAIR, right?
01:55:08.340 | - Yeah, exactly.
01:55:09.340 | I like it.
01:55:10.180 | - And now meta AI is part of the reality lab.
01:55:15.180 | So, you know, meta now, the new Facebook, right,
01:55:21.180 | it's called meta and it's kind of divided
01:55:23.940 | into, you know, Facebook, Instagram, WhatsApp,
01:55:28.660 | and reality lab.
01:55:32.900 | And reality lab is about, you know, AR, VR,
01:55:35.700 | you know, telepresence, communication,
01:55:39.180 | technology and stuff like that.
01:55:40.460 | It's kind of the, you can think of it as the sort of,
01:55:44.140 | a combination of sort of new products
01:55:47.820 | and technology part of meta.
01:55:51.900 | - Is that where the touch sensing for robots,
01:55:54.180 | I saw that you were posting about, that's-
01:55:56.020 | - But touch sensing for robot is part of FAIR, actually.
01:55:58.180 | That's a FAIR product. - Oh, it is, okay, cool.
01:56:00.500 | - This is also the, no, but there is the other way,
01:56:02.980 | the haptic glove, right? - Yes.
01:56:05.980 | - That has like- - That's more reality lab.
01:56:07.860 | - That's reality lab research.
01:56:10.700 | - Reality lab research.
01:56:11.700 | But by the way, the touch sensors is super interesting.
01:56:14.340 | Like integrating that modality
01:56:16.060 | into the whole sensing suite is very interesting.
01:56:20.060 | So what do you think about the metaverse?
01:56:23.620 | What do you think about this whole kind of expansion
01:56:27.740 | of the view of the role of Facebook and meta in the world?
01:56:30.820 | - Well, metaverse really should be thought of
01:56:32.420 | as the next step in the internet, right?
01:56:35.260 | Sort of trying to kind of, you know,
01:56:40.260 | make the experience more compelling of, you know,
01:56:44.060 | being connected either with other people or with content.
01:56:49.420 | And, you know, we are evolved and trained to evolve in,
01:56:54.260 | you know, 3D environments where, you know,
01:56:57.260 | we can see other people, we can talk to them when near them,
01:57:00.980 | or, you know, and other people are far away can't hear us,
01:57:04.060 | you know, things like that, right?
01:57:04.980 | So there's a lot of social conventions that exist
01:57:08.580 | in the real world that we can try to transpose.
01:57:10.740 | Now, what is gonna be eventually the,
01:57:13.220 | how compelling is it gonna be?
01:57:16.180 | Like, you know, is it gonna be the case
01:57:18.700 | that people are gonna be willing to do this
01:57:21.260 | if they have to wear, you know,
01:57:22.660 | a huge pair of goggles all day?
01:57:24.580 | Maybe not.
01:57:25.500 | - But then again, if the experience
01:57:27.460 | is sufficiently compelling, maybe so.
01:57:30.300 | - Or if the device that you have to wear
01:57:32.140 | is just basically a pair of glasses, you know,
01:57:34.300 | and technology makes sufficient progress for that.
01:57:36.780 | You know, AR is a much easier concept to grasp
01:57:41.540 | that you're gonna have, you know,
01:57:43.180 | augmented reality glasses that basically contain
01:57:46.580 | some sort of, you know, virtual assistant
01:57:48.620 | that can help you in your daily lives.
01:57:50.260 | - But at the same time with the AR,
01:57:51.900 | you have to contend with reality.
01:57:53.460 | With VR, you can completely detach yourself from reality,
01:57:55.860 | so it gives you freedom.
01:57:57.180 | It might be easier to design worlds in VR.
01:58:00.340 | - Yeah, but you can imagine, you know,
01:58:02.300 | the metaverse being-
01:58:03.540 | - A mix.
01:58:05.580 | - A mix, right.
01:58:06.500 | Or like you can have objects that exist in the metaverse
01:58:09.300 | that, you know, pop up on top of the real world
01:58:11.180 | or only exist in virtual reality.
01:58:14.380 | - Okay, let me ask the hard question.
01:58:17.060 | - Oh, because all of this was easy.
01:58:18.300 | - This was easy.
01:58:19.380 | The Facebook, now Meta, the social network
01:58:24.260 | has been painted by the media as net negative for society,
01:58:28.260 | even destructive and evil at times.
01:58:30.820 | You've pushed back against this, defending Facebook.
01:58:34.060 | Can you explain your defense?
01:58:36.540 | - Yeah, so the description,
01:58:38.620 | the company that is being described in some media
01:58:42.580 | is not the company we know when we work inside.
01:58:47.340 | And, you know, it could be claimed that
01:58:51.260 | a lot of employees are uninformed
01:58:52.860 | about what really goes on in a company,
01:58:54.580 | but, you know, I'm a vice president.
01:58:56.540 | I mean, I have a pretty good vision of what goes on.
01:58:58.660 | You know, I don't know everything, obviously.
01:59:00.180 | I'm not involved in everything,
01:59:01.860 | but certainly not in decision about like, you know,
01:59:04.580 | content moderation or anything like this,
01:59:06.100 | but I have some decent vision of what goes on.
01:59:10.140 | And this evil that is being described, I just don't see it.
01:59:13.660 | And then, you know, I think there is an easy story to buy,
01:59:18.180 | which is that, you know, all the bad things in the world,
01:59:21.740 | and, you know, the reason your friend believe crazy stuff,
01:59:25.140 | you know, there's an easy scapegoat, right,
01:59:28.740 | in social media in general, Facebook in particular.
01:59:34.460 | We have to look at the data.
01:59:35.460 | Like, is it the case that Facebook, for example,
01:59:40.060 | polarizes people politically?
01:59:42.700 | Are there academic studies that show this?
01:59:45.220 | Is it the case that, you know, teenagers
01:59:48.980 | think of themselves less if they use Instagram more?
01:59:52.140 | Is it the case that, you know,
01:59:55.340 | people get more riled up against, you know,
01:59:59.140 | opposite sides in a debate or political opinion
02:00:02.700 | if they are more on Facebook or if they are less?
02:00:05.700 | And study after study show that none of this is true.
02:00:10.700 | This is independent studies by academic.
02:00:12.420 | They're not funded by Facebook or Meta, you know,
02:00:15.900 | studied by Stanford, by some of my colleagues at NYU,
02:00:18.300 | actually, with whom I have no connection.
02:00:21.220 | You know, there's a study recently, they paid people,
02:00:25.020 | I think it was in the former Yugoslavia.
02:00:29.940 | I'm not exactly sure in what part,
02:00:31.820 | but they paid people to not use Facebook for a while
02:00:34.380 | in the period before the anniversary of the Srebrenica
02:00:39.940 | massacres, right?
02:00:41.140 | So, you know, people get riled up, like, should, you know,
02:00:43.180 | should we have a celebration?
02:00:45.460 | I mean, a memorial kind of celebration for it or not.
02:00:48.700 | So they paid a bunch of people to not use Facebook
02:00:51.420 | for a few weeks.
02:00:52.580 | And it turns out that those people ended up being
02:00:57.580 | more polarized than they were at the beginning.
02:01:00.460 | And the people who were more on Facebook
02:01:01.620 | were less polarized.
02:01:02.620 | There's a study, you know, from Stanford of,
02:01:07.620 | economists at Stanford that try to identify the causes
02:01:11.020 | of increasing polarization in the US.
02:01:14.460 | And it's been going on for 40 years before, you know,
02:01:17.460 | Mark Zuckerberg was born.
02:01:19.100 | - Yeah.
02:01:19.940 | - Continuously.
02:01:20.780 | And so if there is a cause,
02:01:24.220 | it's not Facebook or social media.
02:01:26.100 | So you could say if social media just accelerated,
02:01:28.100 | but no, I mean, it's basically a continuous evolution
02:01:31.740 | by some measure of polarization in the US.
02:01:34.300 | And then you compare this with other countries
02:01:36.300 | like the West half of Germany,
02:01:40.100 | because you can go 40 years in the East side
02:01:43.260 | or Denmark or other countries.
02:01:46.020 | And they use Facebook just as much.
02:01:47.980 | And they're not getting more polarized,
02:01:49.140 | they're getting less polarized.
02:01:50.540 | So if you want to look for, you know,
02:01:52.980 | a causal relationship there,
02:01:54.700 | you can find a scapegoat, but you can't find a cause.
02:01:58.420 | Now, if you want to fix the problem,
02:02:00.260 | you have to find the right cause.
02:02:01.620 | And what riles me up is that people now are,
02:02:04.900 | people now are accusing Facebook of bad deeds
02:02:08.300 | that are done by others.
02:02:09.300 | And those others are, we're not doing anything about them.
02:02:12.380 | And by the way, those others include
02:02:14.460 | the owner of the Wall Street Journal
02:02:15.660 | in which all of those papers were published.
02:02:17.700 | - So I should mention that I'm talking to Schrep,
02:02:20.060 | Mike Schrep, on this podcast and also Mark Zuckerberg,
02:02:23.460 | and probably these are conversations you can have with them.
02:02:26.340 | 'Cause it's very interesting to me,
02:02:27.620 | even if Facebook has some measurable negative effect,
02:02:31.900 | you can't just consider that in isolation.
02:02:33.780 | You have to consider about all the positive ways
02:02:35.900 | that it connects us.
02:02:36.780 | So like every technology--
02:02:38.100 | - It connects people, it's a question.
02:02:39.620 | - You can't just say like, there's an increase in division.
02:02:43.860 | Yes, probably Google search engine
02:02:46.060 | has created increase in division,
02:02:47.860 | but you have to consider about how much information
02:02:49.860 | are brought to the world.
02:02:51.100 | Like I'm sure Wikipedia created more division
02:02:53.660 | if you just look at the division,
02:02:55.300 | but you have to look at the full context of the world
02:02:57.660 | and did it make a better world?
02:02:59.100 | - I mean, the printing press has created more division.
02:03:01.580 | - Exactly.
02:03:03.020 | - So when the printing press was invented,
02:03:06.860 | the first books that were printed
02:03:09.300 | were things like the Bible,
02:03:10.780 | and that allowed people to read the Bible by themselves,
02:03:13.780 | not get the message uniquely from priests in Europe.
02:03:17.380 | And that created the Protestant movement
02:03:20.340 | and 200 years of religious persecution and wars.
02:03:23.660 | So that's a bad side effect of the printing press.
02:03:26.180 | Social networks aren't being nearly as bad
02:03:28.500 | as the printing press,
02:03:29.340 | but nobody would say the printing press was a bad idea.
02:03:32.900 | - Yeah, a lot of it is perception
02:03:35.100 | and there's a lot of different incentives operating here.
02:03:38.420 | Maybe a quick comment,
02:03:40.020 | since you're one of the top leaders at Facebook
02:03:42.660 | and at Meta, sorry, that's in the tech space,
02:03:46.740 | I'm sure Facebook involves
02:03:48.500 | a lot of incredible technological challenges
02:03:52.020 | that need to be solved.
02:03:52.900 | A lot of it probably is on the computer infrastructure,
02:03:54.980 | the hardware, I mean, it's just a huge amount.
02:03:58.900 | Maybe can you give me context
02:04:00.380 | about how much of Schrep's life is AI
02:04:04.380 | and how much of it is low-level compute?
02:04:06.220 | How much of it is flying all around doing business stuff
02:04:09.580 | in the same way Zuckerberg, Mark Zuckerberg?
02:04:12.020 | - They really focus on AI.
02:04:13.740 | I mean, certainly in the run-up to the creation of FAIR
02:04:18.740 | and for at least a year after that, if not more,
02:04:24.060 | Mark was very, very much focused on AI
02:04:26.700 | and was spending quite a lot of effort on it.
02:04:29.700 | And that's his style.
02:04:30.780 | When he gets interested in something,
02:04:32.060 | he reads everything about it.
02:04:34.100 | He read some of my papers, for example, before I joined.
02:04:36.900 | And so he learned a lot about it.
02:04:41.860 | - He said he liked notes.
02:04:43.740 | - Right.
02:04:44.580 | And Schrep was really into it also.
02:04:51.100 | I mean, Schrep is really kind of,
02:04:52.780 | has something I've tried to preserve also
02:04:57.940 | despite my not so young age,
02:05:00.180 | which is a sense of wonder about science and technology.
02:05:03.180 | And he certainly has that.
02:05:05.260 | He's also a wonderful person.
02:05:07.420 | I mean, in terms of like as a manager,
02:05:10.380 | like dealing with people and everything,
02:05:12.140 | Mark also actually.
02:05:13.220 | So, I mean, they're very like, you know, very human people.
02:05:18.020 | In the case of Mark, it's shockingly human,
02:05:20.340 | you know, given his trajectory.
02:05:23.180 | I mean, the personality of him
02:05:27.060 | that is being painted in the press is just completely wrong.
02:05:29.580 | - Yeah.
02:05:30.420 | But you have to know how to play the press.
02:05:31.940 | So that's, I put some of that responsibility on him too.
02:05:36.180 | You have to, it's like, you know, like the director,
02:05:41.180 | the conductor of an orchestra,
02:05:44.300 | you have to play the press and the public
02:05:46.940 | in a certain kind of way
02:05:47.980 | where you convey your true self to them.
02:05:49.700 | If there's a depth of kindness to it.
02:05:51.100 | - It's hard.
02:05:51.940 | And it's probably not the best at it.
02:05:53.700 | So, yeah.
02:05:56.420 | You have to learn.
02:05:57.700 | And it's sad to see, I'll talk to him about it,
02:06:00.420 | but Schrep is slowly stepping down.
02:06:04.020 | It's always sad to see folks sort of be there
02:06:07.500 | for a long time and slowly, I guess time is sad.
02:06:11.220 | - I think he's done the thing he set out to do.
02:06:14.780 | And, you know, he's got, you know,
02:06:17.540 | family priorities and stuff like that.
02:06:21.460 | And I understand, you know, after 13 years or something.
02:06:26.460 | - It's been a good run.
02:06:28.900 | - Which in Silicon Valley is basically a lifetime.
02:06:32.140 | - Yeah.
02:06:32.980 | - You know, cause you know, it's dog years.
02:06:34.980 | - So NeurIPS, the conference, just wrapped up.
02:06:37.620 | Let me just go back to something else.
02:06:40.580 | You posted that the paper you coauthored
02:06:42.500 | was rejected from NeurIPS.
02:06:44.460 | As you said, proudly in quotes, rejected.
02:06:47.140 | - As a joke.
02:06:48.940 | - Yeah, I know.
02:06:51.300 | Can you describe this paper and like,
02:06:53.940 | what was the idea in it?
02:06:55.700 | And also maybe this is a good opportunity
02:06:58.460 | to ask what are the pros and cons,
02:07:00.580 | what works and what doesn't about the review process?
02:07:03.620 | - Yeah, let me talk about the paper first.
02:07:04.980 | I'll talk about the review process afterwards.
02:07:08.260 | The paper is called VICREG.
02:07:10.700 | So this is, I mentioned that before,
02:07:12.540 | variance, invariance, covariance, regularization.
02:07:14.900 | And it's a technique, a non-contrastive learning technique
02:07:18.260 | for what I call joint embedding architecture.
02:07:21.300 | So Siamese nets are an example
02:07:23.340 | of joint embedding architecture.
02:07:24.860 | So joint embedding architecture is,
02:07:26.580 | let me back up a little bit, right?
02:07:30.620 | So if you want to do self-supervised learning,
02:07:33.300 | you can do it by prediction.
02:07:35.140 | So let's say you want to train a system to predict video,
02:07:38.580 | right?
02:07:39.420 | You show it a video clip
02:07:40.260 | and you train the system to predict the next,
02:07:43.580 | the continuation of that video clip.
02:07:45.060 | Now, because you need to handle uncertainty,
02:07:47.820 | because there are many, you know,
02:07:48.980 | many continuations that are plausible,
02:07:51.580 | you need to have, you need to handle this in some way.
02:07:54.020 | You need to have a way for the system
02:07:56.660 | to be able to produce multiple predictions.
02:08:00.620 | And the way, the only way I know to do this
02:08:03.500 | is through what's called a latent variable.
02:08:05.420 | So you have some sort of hidden vector
02:08:08.780 | of a variable that you can vary over a set
02:08:11.180 | or draw from a distribution.
02:08:12.700 | And as you vary this vector over a set,
02:08:14.500 | the output, the prediction,
02:08:15.620 | varies over a set of plausible predictions.
02:08:18.460 | Okay, so that's called,
02:08:19.460 | I call this a generative latent variable model.
02:08:23.220 | - Got it.
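As a concrete, and entirely toy, illustration of the generative latent-variable idea just described: the predictor takes the observed context plus a latent vector z, and sampling or sweeping z produces a set of distinct plausible continuations rather than a single averaged one. The random decoder weights and dimensions below are placeholders, not a trained model.

```python
# Toy illustration (not from the paper discussed) of a generative
# latent-variable predictor: the output depends on the context AND a latent z.

import numpy as np

rng = np.random.default_rng(0)
CONTEXT_DIM, LATENT_DIM, OUT_DIM = 8, 2, 8

# Stand-ins for a trained decoder; the weights here are random placeholders.
W_ctx = rng.normal(size=(OUT_DIM, CONTEXT_DIM))
W_z = rng.normal(size=(OUT_DIM, LATENT_DIM))

def predict(context: np.ndarray, z: np.ndarray) -> np.ndarray:
    """One plausible continuation, parameterized by the latent z."""
    return np.tanh(W_ctx @ context + W_z @ z)

context = rng.normal(size=CONTEXT_DIM)        # e.g. features of the observed clip
# Sampling z produces multiple distinct predictions for the same context:
predictions = [predict(context, rng.normal(size=LATENT_DIM)) for _ in range(5)]
print(np.round(np.stack(predictions), 2))
```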
02:08:24.940 | - Okay, now there is an alternative to this,
02:08:27.020 | to handle uncertainty.
02:08:28.660 | And instead of directly predicting
02:08:31.140 | the next frames of the clip,
02:08:34.820 | you also run those through another neural net.
02:08:39.620 | So you now have two neural nets,
02:08:42.460 | one that looks at the, you know,
02:08:45.780 | the initial segment of the video clip,
02:08:48.660 | and another one that looks at the continuation
02:08:51.220 | during training, right?
02:08:52.380 | And what you're trying to do is learn a representation
02:08:56.260 | of those two video clips
02:08:59.020 | that is maximally informative
02:09:00.740 | about the video clips themselves,
02:09:03.420 | but is such that you can predict the representation
02:09:07.140 | of the second video clip
02:09:08.540 | from the representation of the first one, easily.
02:09:11.340 | Okay, and you can sort of formalize this
02:09:13.580 | in terms of maximizing mutual information
02:09:15.300 | and some stuff like that, but it doesn't matter.
02:09:18.100 | What you want is informative representations
02:09:23.100 | of the two video clips that are mutually predictable.
02:09:27.460 | What that means is that there's a lot of details
02:09:30.860 | in the second video clips that are irrelevant.
02:09:33.140 | You know, let's say a video clip consists in,
02:09:38.780 | you know, a camera panning the scene.
02:09:42.020 | There's going to be a piece of that room
02:09:43.340 | that is going to be revealed,
02:09:44.740 | and I can somewhat predict
02:09:46.180 | what that room is going to look like,
02:09:48.060 | but I may not be able to predict the details
02:09:50.220 | of the texture of the ground
02:09:52.300 | and where the tiles are ending and stuff like that, right?
02:09:54.500 | So those are irrelevant details
02:09:56.340 | that perhaps my representation will eliminate.
02:09:59.620 | And so what I need is to train this second neural net
02:10:03.660 | in such a way that whenever the continuation
02:10:09.020 | video clip varies over all the plausible continuations,
02:10:12.220 | the representation doesn't change.
02:10:15.620 | - Got it.
02:10:16.460 | So it's the, yeah, yeah, got it.
02:10:18.100 | Over the space of representations,
02:10:20.860 | doing the same kind of thing
02:10:21.860 | as you're doing with similarity learning.
02:10:24.300 | - Right.
02:10:25.700 | So these are two ways to handle
02:10:28.060 | multimodality in a prediction, right?
02:10:29.580 | In the first way, you parametrize the prediction
02:10:32.260 | with a latent variable,
02:10:33.460 | but you predict pixels essentially, right?
02:10:35.780 | In the second one, you don't predict pixels,
02:10:38.380 | you predict an abstract representation of pixels
02:10:40.660 | and you guarantee that this abstract representation
02:10:43.460 | has as much information as possible about the input,
02:10:46.140 | but sort of, you know,
02:10:47.020 | drops all the stuff that you really can't predict,
02:10:49.700 | essentially.
02:10:50.540 | I used to be a big fan of the first approach.
02:10:53.860 | And in fact, in this paper with Ishan Mishra,
02:10:55.580 | this blog post, the dark matter intelligence,
02:10:58.340 | I was kind of advocating for this.
02:10:59.740 | And in the last year and a half,
02:11:01.540 | I've completely changed my mind.
02:11:02.780 | I'm now a big fan of the second one.
02:11:05.540 | And it's because of a small collection of algorithms
02:11:10.020 | that have been proposed over the last year and a half
02:11:13.220 | or so, two years to do this, including VICReg,
02:11:17.820 | its predecessor called Barlow-Twins,
02:11:19.620 | which I mentioned, a method from our friends
02:11:23.140 | at DeepMind called BYOL.
02:11:24.540 | And there's a bunch of others now that kind of work similarly.
02:11:29.580 | So they're all based on this idea of joint embedding.
02:11:32.580 | Some of them have an explicit criterion
02:11:34.660 | that is an approximation of mutual information.
02:11:36.620 | Some others like BYOL work, but we don't really know why.
02:11:39.420 | And there's been like lots of theoretical papers
02:11:41.220 | about why BYOL works.
02:11:42.340 | No, it's not that because we take it out
02:11:43.940 | and it still works and blah, blah, blah.
02:11:46.020 | I mean, so there's like a big debate,
02:11:47.820 | but the important point is that we now have a collection
02:11:51.540 | of non-contrastive joint embedding methods,
02:11:53.700 | which I think is the best thing since sliced bread.
02:11:56.380 | So I'm super excited about this
02:11:58.300 | because I think it's our best shot for techniques
02:12:02.020 | that would allow us to kind of build predictive world models
02:12:06.340 | and at the same time,
02:12:07.460 | learn hierarchical representations of the world
02:12:09.900 | where what matters about the world is preserved
02:12:11.860 | and what is irrelevant is eliminated.
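For readers who want to see the shape of these non-contrastive objectives, here is a compact sketch of a VICReg-style loss on two batches of embeddings (the two "views" passed through the two branches of a joint embedding architecture): an invariance term, a variance (anti-collapse) term, and a covariance (decorrelation) term. The coefficients and exact forms follow the published description as I understand it, but treat this as an illustration rather than the reference implementation.

```python
# Sketch of a VICReg-style non-contrastive loss on two embedding batches.
# Coefficients and the hinge form are taken as illustrative assumptions;
# see the VICReg paper for the authoritative recipe.

import numpy as np

def vicreg_style_loss(z_a: np.ndarray, z_b: np.ndarray,
                      sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z_a.shape

    # Invariance: the two representations of the same sample should match.
    invariance = np.mean((z_a - z_b) ** 2)

    # Variance: keep each embedding dimension's std above 1 to prevent collapse.
    def variance_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))
    variance = variance_term(z_a) + variance_term(z_b)

    # Covariance: decorrelate dimensions by penalizing off-diagonal covariance.
    def covariance_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d
    covariance = covariance_term(z_a) + covariance_term(z_b)

    return sim_w * invariance + var_w * variance + cov_w * covariance

# Toy usage with random embeddings standing in for the two network branches.
rng = np.random.default_rng(0)
z1 = rng.normal(size=(256, 32))
z2 = z1 + 0.1 * rng.normal(size=(256, 32))    # a slightly perturbed second view
print(round(vicreg_style_loss(z1, z2), 3))
```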
02:12:14.420 | - And by the way, the representations of before and after
02:12:17.020 | is in the space in a sequence of images,
02:12:20.540 | or is it for single images?
02:12:22.300 | - It would be either for a single image, for a sequence.
02:12:24.660 | It doesn't have to be images.
02:12:25.660 | This could be applied to text.
02:12:26.700 | This could be applied to just about any signal.
02:12:28.540 | I'm looking for methods that are generally applicable
02:12:32.940 | that are not specific to one particular modality.
02:12:36.180 | It could be audio or whatever.
02:12:37.620 | - Got it.
02:12:38.460 | So what's the story behind this paper?
02:12:39.660 | This paper is what, is describing one such method?
02:12:43.460 | - This is this VICReg method.
02:12:44.500 | So this is co-authored,
02:12:45.700 | the first author is a student called Adrien Bardes,
02:12:49.260 | who is a resident PhD student at FAIR Paris,
02:12:52.700 | who is co-advised by me and Jean Ponce,
02:12:55.820 | who's a professor at Ecole Normale Supérieure,
02:12:58.180 | also a research director at INRIA.
02:13:00.660 | So this is a wonderful program in France
02:13:03.580 | where PhD students can basically do their PhD in industry.
02:13:06.620 | And that's kind of what's happening here.
02:13:08.940 | And this paper is a follow-up on this Barlow-Twin paper
02:13:15.420 | by my former postdoc now, Stéphane Deny,
02:13:18.340 | with Li Jing and Jure Zbontar
02:13:21.500 | and a bunch of other people from FAIR.
02:13:24.700 | And one of the main criticism from reviewers
02:13:31.340 | is that VICReg is not different enough from Barlow Twins.
02:13:31.340 | But my impression is that it's Barlow-Twins
02:13:36.340 | with a few bugs fixed essentially.
02:13:39.860 | And in the end, this is what people will use.
02:13:43.140 | - Right.
02:13:44.500 | - But I'm used to stuff that I submit being rejected for a while.
02:13:48.980 | - So it might be rejected and actually exceptionally well cited
02:13:51.300 | 'cause people use it.
02:13:52.140 | - Well, it's already cited like a bunch of times.
02:13:54.340 | - So, I mean, the question is then to the deeper question
02:13:57.580 | about peer review and conferences.
02:14:00.220 | I mean, computer science as a field is kind of unique
02:14:02.580 | that the conference is highly prized.
02:14:04.940 | That's one.
02:14:05.780 | - Right.
02:14:06.620 | - And it's interesting because the peer review process
02:14:08.860 | there is similar, I suppose, to journals,
02:14:11.020 | but it's accelerated significantly.
02:14:13.600 | Well, not significantly, but it goes fast.
02:14:16.500 | And it's a nice way to get stuff out quickly,
02:14:19.740 | to peer review it quickly,
02:14:20.740 | go to present it quickly to the community.
02:14:22.580 | So not quickly, but quicker.
02:14:25.100 | - Yeah.
02:14:25.940 | - But nevertheless, it has many of the same flaws
02:14:27.780 | of peer review 'cause it's a limited number
02:14:30.180 | of people look at it.
02:14:31.460 | There's bias, and the following:
02:14:32.740 | if you wanna do new ideas,
02:14:35.520 | you're gonna get pushed back.
02:14:37.020 | There's self-interested people that kind of can infer
02:14:42.060 | who submitted it and kind of be cranky about it,
02:14:46.700 | all that kind of stuff.
02:14:47.700 | - Yeah, I mean, there's a lot of social phenomenon there.
02:14:50.980 | There's one social phenomenon, which is that
02:14:53.180 | because the field has been growing exponentially,
02:14:56.760 | the vast majority of people in the field
02:14:58.540 | are extremely junior.
02:14:59.980 | - Yeah.
02:15:00.820 | - So as a consequence, and that's just a consequence
02:15:03.220 | of the field growing, right?
02:15:04.860 | So as the number of, as the size of the field
02:15:07.860 | kind of starts saturating, you will have less
02:15:10.100 | of that problem of reviewers being very inexperienced.
02:15:15.100 | A consequence of this is that young reviewers,
02:15:20.160 | I mean, there's a phenomenon which is that
02:15:22.860 | reviewers try to make their life easy
02:15:24.620 | and to make their life easy when reviewing a paper
02:15:27.460 | is very simple, you just have to find a flaw in the paper.
02:15:29.700 | Right?
02:15:30.540 | So basically they see their task as finding flaws in papers
02:15:34.500 | and most papers have flaws, even the good ones.
02:15:36.740 | - Yeah.
02:15:38.140 | - And so it's easy to do that.
02:15:41.500 | Your job is easier as a reviewer if you just focus on this.
02:15:46.420 | But what's important is like, is there a new idea
02:15:50.840 | in that paper that is likely to have influence?
02:15:54.120 | It doesn't matter if the experiments are not that great,
02:15:56.240 | if the protocol is, you know, so-so, you know,
02:16:00.680 | things like that, as long as there is a worthy idea in it
02:16:05.040 | that will influence the way people think about the problem,
02:16:08.080 | even if they make it better, you know, eventually,
02:16:11.160 | I think that's really what makes a paper useful.
02:16:15.460 | And so this combination of social phenomena
02:16:19.480 | creates a disease that has plagued, you know,
02:16:24.120 | other fields in the past, like speech recognition,
02:16:26.640 | where basically, you know, people chase numbers
02:16:28.520 | on benchmarks and it's much easier to get a paper accepted
02:16:33.520 | if it brings an incremental improvement
02:16:37.000 | on a sort of mainstream, well-accepted method or problem.
02:16:43.800 | And those are, to me, boring papers.
02:16:46.020 | I mean, they're not useless, right?
02:16:47.860 | Because industry, you know, thrives on that kind of progress,
02:16:52.340 | but they're not the one that I'm interested in
02:16:54.020 | in terms of like new concepts and new ideas.
02:16:55.620 | So papers that are really trying to strike
02:16:59.260 | kind of new advances generally don't make it.
02:17:02.560 | Now, thankfully, we have arXiv.
02:17:04.200 | - arXiv, exactly.
02:17:05.260 | And then there's open review type of situations where you,
02:17:08.820 | and then, I mean, Twitter is a kind of open review.
02:17:11.620 | I'm a huge believer that review should be done
02:17:13.840 | by thousands of people, not two people.
02:17:15.680 | - I agree.
02:17:16.700 | - And so arXiv, like do you see a future
02:17:19.540 | where a lot of really strong papers,
02:17:21.200 | it's already the present, but a growing future
02:17:23.620 | where it'll just be arXiv
02:17:25.320 | and you're presenting an ongoing, continuous conference
02:17:31.260 | called Twitter / the Internet / Arxiv Sanity.
02:17:35.540 | Andrej just released a new version.
02:17:38.000 | So just not, you know, not being so elitist
02:17:40.880 | about this particular gating.
02:17:43.420 | - It's not a question of being elitist or not.
02:17:44.940 | It's a question of basically providing recommendations
02:17:49.940 | and seals of approval for people who don't see themselves
02:17:53.340 | as having the ability to evaluate papers by themselves, right?
02:17:55.900 | And so it saves time, right?
02:17:57.300 | If you rely on other people's opinion
02:17:59.980 | and you trust those people or those groups
02:18:03.700 | to evaluate a paper for you,
02:18:07.300 | that saves you time.
02:18:09.920 | 'Cause you don't have to like scrutinize the paper as much.
02:18:13.340 | It is brought to your attention.
02:18:15.140 | I mean, there's the whole idea
02:18:15.980 | of sort of collective recommender system, right?
02:18:18.700 | So I actually thought about this a lot,
02:18:21.180 | you know, about 10, 15 years ago,
02:18:24.140 | 'cause there were discussions at NIPS
02:18:27.020 | and we were about to create ICLR with Yoshua Bengio.
02:18:31.220 | And so I wrote a document kind of describing
02:18:34.820 | a reviewing system, which basically was,
02:18:38.020 | you post your paper on some repository,
02:18:39.660 | let's say archive or now could be open review.
02:18:42.540 | And then you can form a reviewing entity,
02:18:46.200 | which is equivalent to a reviewing board,
02:18:48.120 | you know, of a journal or a program committee
02:18:52.320 | of a conference.
02:18:53.940 | You have to list the members
02:18:55.540 | and then that group, reviewing entity,
02:18:59.400 | can choose to review a particular paper,
02:19:02.560 | spontaneously or not.
02:19:03.680 | There is no exclusive relationship anymore
02:19:05.580 | between a paper and a venue or reviewing entity.
02:19:09.160 | Any reviewing entity can review any paper
02:19:11.160 | or may choose not to.
02:19:14.080 | And then, you know, give an evaluation.
02:19:16.640 | It's not published, not published,
02:19:17.920 | it's just an evaluation and a comment,
02:19:20.320 | which would be public, signed by the reviewing entity.
02:19:23.660 | And if it's signed by a reviewing entity,
02:19:25.880 | you know, it's one of the members of the reviewing entity.
02:19:27.760 | So if the reviewing entity is, you know,
02:19:30.680 | Lex Fridman's, you know, preferred papers, right?
02:19:33.700 | You know, it's Lex Fridman writing the review.
02:19:35.620 | - Yes.
02:19:36.700 | So for me, that's a beautiful system, I think.
02:19:40.920 | But what's in addition to that,
02:19:42.900 | it feels like there should be a reputation system
02:19:45.800 | for the reviewers.
02:19:47.480 | - For the reviewing entities,
02:19:49.020 | not the reviewers individually.
02:19:50.260 | - The reviewing entities, sure.
02:19:51.700 | But even within that, the reviewers too,
02:19:53.900 | because there's another thing here.
02:19:57.140 | It's not just the reputation,
02:19:59.340 | it's an incentive for an individual person to do great.
02:20:02.780 | Right now, in the academic setting,
02:20:05.060 | the incentive is kind of internal,
02:20:07.900 | just wanting to do a good job.
02:20:09.240 | But honestly, that's not a strong enough incentive
02:20:11.260 | to do a really good job in reading a paper,
02:20:13.700 | in finding the beautiful amidst the mistakes and the flaws
02:20:16.380 | and all that kind of stuff.
02:20:17.780 | Like, if you're the person
02:20:19.220 | that first discovered a powerful paper,
02:20:22.420 | and you get to be proud of that discovery,
02:20:25.100 | then that gives a huge incentive to you.
02:20:27.740 | - That's a big part of my proposal, actually,
02:20:29.300 | where I describe that as, you know,
02:20:31.700 | if your evaluation of papers
02:20:33.700 | is predictive of future success,
02:20:37.500 | then your reputation should go up as a reviewing entity.
02:20:40.900 | So, yeah, exactly.
02:20:43.740 | I mean, I even had a master's student
02:20:46.260 | who was a master's student in library science
02:20:49.540 | and computer science,
02:20:50.380 | actually kind of work out exactly
02:20:52.460 | how that should work with formulas and everything.
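Purely as a toy illustration of the mechanism being described, and not the formulas from that student project, here is a sketch in Python where a reviewing entity publicly scores papers and its reputation rises when those scores turn out to be predictive of later impact; the update rule, the entity name, and the citation numbers are invented for the example.

```python
# Toy sketch of the reputation idea: a reviewing entity publicly scores papers,
# and once "future success" (e.g., citations) is observed, its reputation is
# updated by how predictive its scores were. Entirely hypothetical formulas.
# Requires Python 3.10+ for statistics.correlation.
from dataclasses import dataclass, field
from statistics import correlation

@dataclass
class ReviewingEntity:
    name: str
    members: list[str]
    reputation: float = 1.0
    scores: dict[str, float] = field(default_factory=dict)  # paper_id -> score

    def review(self, paper_id: str, score: float, comment: str) -> None:
        # Reviews are public and signed by the entity; there is no exclusive
        # relationship between a paper and any one venue or entity.
        self.scores[paper_id] = score
        print(f"[{self.name}] {paper_id}: {score:.1f} -- {comment}")

    def update_reputation(self, observed_impact: dict[str, float]) -> None:
        shared = [p for p in self.scores if p in observed_impact]
        if len(shared) < 2:
            return
        # Reputation tracks how well past scores predicted eventual impact.
        r = correlation([self.scores[p] for p in shared],
                        [observed_impact[p] for p in shared])
        self.reputation = 0.9 * self.reputation + 0.1 * max(r, 0.0)

# Usage: an entity scores two preprints, then impact data arrives later.
entity = ReviewingEntity("Lex Fridman's preferred papers", ["lex"])
entity.review("paper-A", 9.0, "new idea, so-so experiments")
entity.review("paper-B", 4.0, "incremental benchmark chasing")
entity.update_reputation({"paper-A": 500.0, "paper-B": 20.0})
```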
02:20:55.100 | - So, in terms of implementation,
02:20:56.780 | do you think that's something that's doable?
02:20:58.580 | - I mean, I've been sort of talking about this
02:21:00.740 | to sort of various people like, you know,
02:21:02.940 | Andrew McCallum, who started Open Review.
02:21:05.900 | And the reason why we picked Open Review
02:21:07.780 | for ICLR initially,
02:21:09.060 | even though it was very early for them,
02:21:11.380 | is because my hope was that ICLR
02:21:14.260 | was eventually going to kind of
02:21:16.700 | inaugurate this type of system.
02:21:18.580 | So ICLR kept the idea of open reviews.
02:21:22.220 | So where the reviews are, you know,
02:21:23.820 | published with a paper, which I think is very useful.
02:21:27.300 | But in many ways, it's kind of reverted
02:21:29.740 | to more of a conventional type of conference
02:21:33.260 | for everything else.
02:21:34.100 | And that, I mean, I don't run ICLR,
02:21:37.780 | I'm just the president of the foundation,
02:21:41.180 | but, you know, people who run it
02:21:44.100 | should make decisions about how to run it.
02:21:45.620 | And I'm not going to tell them
02:21:47.340 | because they are volunteers
02:21:48.500 | and I'm really thankful that they do that.
02:21:50.300 | So, but I'm saddened by the fact that
02:21:53.820 | we're not being innovative enough.
02:21:57.060 | - Yeah, me too.
02:21:57.900 | I hope that changes.
02:21:59.660 | Yeah, 'cause the communication of science broadly,
02:22:02.060 | but the communication of computer science ideas
02:22:04.420 | is how you make those ideas have impact, I think.
02:22:08.300 | - Yeah, and I think, you know, a lot of this is
02:22:11.420 | because people have in their mind kind of an objective,
02:22:16.220 | which is, you know, fairness for authors
02:22:19.100 | and the ability to count points basically
02:22:22.260 | and give credit accurately.
02:22:24.860 | But that comes at the expense of the progress of science.
02:22:28.860 | So to some extent,
02:22:29.700 | we're slowing down the progress of science.
02:22:32.140 | - And are we actually achieving fairness?
02:22:34.420 | - And we're not achieving fairness, you know,
02:22:36.460 | we still have biases, you know,
02:22:38.060 | we're doing, you know, a double-blind review,
02:22:39.780 | but, you know, the biases are still there.
02:22:44.340 | There are different kinds of biases.
02:22:46.700 | - You write that the phenomenon of emergence,
02:22:49.340 | collective behavior exhibited by a large collection
02:22:51.660 | of simple elements in interaction
02:22:54.220 | is one of the things that got you
02:22:55.700 | into neural nets in the first place.
02:22:57.740 | I love cellular automata,
02:22:59.060 | I love simple interacting elements
02:23:01.940 | and the things that emerge from them.
02:23:04.020 | Do you think we understand how complex systems
02:23:07.260 | can emerge from such simple components
02:23:09.580 | that interact simply?
02:23:11.020 | - No, we don't.
02:23:12.260 | It's a big mystery.
02:23:13.100 | Also, it's a mystery for physicists,
02:23:14.460 | it's a mystery for biologists.
02:23:16.020 | You know, how is it that the universe around us
02:23:22.060 | seems to be increasing in complexity and not decreasing?
02:23:25.140 | I mean, that is a kind of curious property of physics
02:23:29.620 | that despite the second law of thermodynamics,
02:23:32.340 | it seems that, you know, evolution and learning
02:23:35.940 | and so on seem, at least locally,
02:23:39.620 | to increase complexity, not decrease it.
02:23:43.980 | So perhaps the ultimate purpose of the universe
02:23:46.500 | is to just get more complex.
02:23:49.060 | - Have these, I mean, small pockets of beautiful complexity.
02:23:55.060 | Does that, does cellular automata,
02:23:57.100 | do these kinds of emergence of complex systems
02:23:59.660 | give you some intuition or guide your understanding
02:24:04.100 | of machine learning systems and neural networks and so on?
02:24:06.660 | Or are these for you right now, disparate concepts?
02:24:09.420 | - Well, it got me into it.
02:24:10.860 | You know, I discovered the existence of the perceptron
02:24:15.580 | when I was a college student, you know,
02:24:18.540 | by reading a book on, it was a debate between Chomsky
02:24:20.940 | and Piaget, and Seymour Papert from MIT
02:24:24.180 | was kind of singing the praises of the perceptron
02:24:26.620 | in that book.
02:24:27.460 | And that was the first time I heard about a learning machine,
02:24:29.740 | right, so I started digging into the literature
02:24:31.340 | and I found those books,
02:24:33.540 | which were basically transcription of, you know,
02:24:36.020 | workshops or conferences from the 50s and 60s
02:24:39.860 | about self-organizing systems.
02:24:42.140 | So there were, there was a series of conferences
02:24:44.540 | on self-organizing systems and these books on this.
02:24:48.140 | Some of them are, you can actually get them
02:24:50.180 | at the internet archive, you know, the digital version.
02:24:53.220 | And there are like fascinating articles in there by,
02:24:58.260 | there's a guy whose name has been largely forgotten,
02:25:00.340 | Heinz von Foerster.
02:25:01.740 | So it was a German physicist who immigrated to the US
02:25:06.180 | and worked on self-organizing systems in the 50s.
02:25:11.180 | And in the 60s,
02:25:12.860 | at the University of Illinois Urbana-Champaign,
02:25:14.420 | he created the Biological Computer Laboratory, BCL,
02:25:18.900 | which was, you know, all about neural nets.
02:25:21.580 | Unfortunately, that was kind of towards the end
02:25:23.340 | of the popularity of neural nets.
02:25:24.820 | So that lab never kind of thrived very much,
02:25:27.660 | but he wrote a bunch of papers about self-organization
02:25:30.260 | and about the mystery of self-organization.
02:25:33.420 | An example he has is, you take,
02:25:35.620 | imagine you are in space, there's no gravity,
02:25:37.980 | you have a big box with magnets in it, okay?
02:25:42.100 | You know, kind of rectangular magnets
02:25:43.820 | with North Pole on one end, South Pole on the other end.
02:25:46.820 | You shake the box gently and the magnets
02:25:48.980 | will kind of stick to each other
02:25:50.100 | and probably form a complex structure, you know,
02:25:53.660 | spontaneously, you know,
02:25:55.420 | that could be an example of self-organization.
02:25:57.020 | But, you know, you have lots of examples,
02:25:58.340 | neural nets are an example of self-organization too,
02:26:01.180 | you know, in many respects.
02:26:02.980 | And it's a bit of a mystery, you know,
02:26:05.900 | how, like what is possible with this, you know,
02:26:09.420 | pattern formation in physical systems,
02:26:11.940 | in chaotic system and things like that, you know,
02:26:14.700 | the emergence of life, you know, things like that.
02:26:16.860 | So, you know, how does that happen?
02:26:19.540 | So it's a big puzzle for physicists as well.
02:26:22.540 | - It feels like understanding this,
02:26:24.660 | the mathematics of emergence in some constrained situations
02:26:29.660 | might help us create intelligence.
02:26:32.060 | Like help us add a little spice to the systems
02:26:35.980 | because you seem to be able to,
02:26:39.500 | in complex systems with emergence,
02:26:41.900 | to be able to get a lot from little.
02:26:44.620 | And so that seems like a shortcut
02:26:47.020 | to get big leaps in performance.
02:26:51.100 | - But there's a missing theoretical concept
02:26:53.660 | that we don't have.
02:26:55.020 | - Yeah.
02:26:55.860 | - And it's something also I've been fascinated by
02:26:58.420 | since my undergrad days.
02:27:00.700 | And it's how you measure complexity, right?
02:27:03.900 | So we don't actually have good ways of measuring,
02:27:06.940 | or at least we don't have good ways of interpreting
02:27:09.860 | the measures that we have at our disposal.
02:27:11.940 | Like how do you measure the complexity of something, right?
02:27:14.500 | So there's all those things, you know,
02:27:15.660 | like, you know, Kolmogorov, Chaitin, Solomonoff complexity
02:27:18.540 | of, you know, the length of the shortest program
02:27:20.940 | that would generate a bit string
02:27:22.460 | can be thought of as the complexity of that bit string.
02:27:25.220 | - Mm-hmm.
02:27:26.060 | - I've been fascinated by that concept.
02:27:28.180 | The problem with that is that that complexity
02:27:32.380 | is defined up to a constant, which can be very large.
02:27:34.980 | - Right.
02:27:36.740 | - There are similar concepts that are derived from,
02:27:38.860 | you know, Bayesian probability theory,
02:27:43.340 | where, you know, the complexity of something
02:27:45.580 | is the negative log of its probability, essentially, right?
02:27:49.460 | And you have a complete equivalence between the two things.
02:27:52.260 | And there you would think, you know,
02:27:53.180 | the probability is something
02:27:54.420 | that's well-defined mathematically,
02:27:56.220 | which means complexity is well-defined.
02:27:58.220 | But it's not true.
02:27:59.060 | You need to have a model of the distribution.
02:28:02.660 | You may need to have a prior
02:28:03.780 | if you're doing Bayesian inference.
02:28:05.100 | And the prior plays the same role
02:28:06.580 | as the choice of the computer
02:28:07.940 | with which you measure Kolmogorov complexity.
02:28:10.460 | And so every measure of complexity we have
02:28:12.980 | has some arbitrariness in it.
02:28:14.500 | You know, an additive constant,
02:28:17.740 | which can be arbitrarily large.
02:28:20.500 | And so, you know, how can we come up with a good theory
02:28:24.260 | of how things become more complex
02:28:25.580 | if we don't have a good measure of complexity?
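For reference, the two textbook statements being contrasted here can be written out as follows; these are standard definitions, not something derived in the conversation.

```latex
% Invariance theorem: for any two universal machines U and V there is a
% constant c_{U,V}, independent of the string x, such that
K_U(x) \;\le\; K_V(x) + c_{U,V},
% so Kolmogorov complexity is only defined up to an additive constant.

% Shannon / MDL view: under a probabilistic model (prior) p, the optimal
% code length for x is its negative log-probability,
L_p(x) \;=\; -\log_2 p(x),
% and changing the prior p shifts L_p(x) in the same way that changing the
% reference machine shifts K_U(x).
```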
02:28:26.900 | - Yeah, which we need for,
02:28:28.580 | there's one way that people study this
02:28:31.500 | in the space of biology,
02:28:32.980 | the people that study the origin of life
02:28:34.540 | or try to recreate life in the laboratory.
02:28:37.820 | And the more interesting one is the alien one,
02:28:39.860 | is when we go to other planets,
02:28:42.020 | how would we recognize this life?
02:28:44.700 | 'Cause, you know, complexity, we associate complexity,
02:28:47.500 | maybe some level of mobility with life.
02:28:49.820 | You know, we have to be able to like have concrete algorithms
02:28:55.700 | for like measuring the level of complexity we see
02:29:00.780 | in order to know the difference between life and non-life.
02:29:03.340 | - And the problem is that complexity
02:29:04.620 | is in the eye of the beholder.
02:29:06.060 | So let me give you an example.
02:29:08.100 | If I give you an image of the MNIST digits, right?
02:29:13.100 | And I flip through MNIST digits,
02:29:16.020 | there is some, obviously some structure to it
02:29:18.700 | because local structure, you know,
02:29:21.060 | neighboring pixels are correlated
02:29:22.780 | across the entire dataset.
02:29:26.140 | I imagine that I apply a random permutation
02:29:30.980 | to all the pixels, a fixed random permutation.
02:29:34.580 | I show you those images, they will look, you know,
02:29:37.980 | really disorganized to you, more complex.
02:29:40.420 | In fact, they're not more complex in absolute terms,
02:29:43.500 | they're exactly the same as originally, right?
02:29:46.100 | And if you knew what the permutation was, you know,
02:29:47.860 | you could undo the permutation.
02:29:50.020 | Now, imagine I give you special glasses
02:29:52.900 | that undo that permutation.
02:29:54.700 | Now, all of a sudden, what looked complicated becomes simple.
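A tiny NumPy sketch of this permutation-glasses thought experiment: one fixed pixel permutation makes images look like noise, yet it is exactly invertible, so nothing about their intrinsic information content has changed; the random "digits" below are stand-ins for MNIST.

```python
# The permutation-glasses thought experiment: scramble pixels with one fixed
# permutation, the images look far more "complex", yet the inverse permutation
# (the special glasses) recovers them exactly.
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)          # one fixed permutation for the whole dataset
inv_perm = np.argsort(perm)              # the "glasses" that undo it

def scramble(images):                     # images: (n, 28, 28)
    flat = images.reshape(len(images), -1)
    return flat[:, perm].reshape(-1, 28, 28)

def unscramble(images):
    flat = images.reshape(len(images), -1)
    return flat[:, inv_perm].reshape(-1, 28, 28)

digits = rng.integers(0, 256, size=(5, 28, 28))              # stand-in for MNIST digits
assert np.array_equal(unscramble(scramble(digits)), digits)  # nothing was lost
```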
02:29:57.620 | - Right.
02:29:58.460 | - So if you have two, if you have, you know,
02:30:00.900 | humans on one end and then another race of aliens
02:30:03.820 | that sees the universe with permutation glasses.
02:30:05.980 | - Yeah, with the permutation glasses.
02:30:07.460 | (Lex laughing)
02:30:08.740 | - What we perceive as simple, to them, is highly complicated,
02:30:11.380 | it's probably heat.
02:30:12.340 | - Yeah, heat, yeah.
02:30:13.540 | - Okay, and what they perceive as simple to us
02:30:15.900 | is random fluctuation, it's heat.
02:30:19.060 | - Yeah.
02:30:20.460 | - So-
02:30:21.300 | - Truly in the eye of the beholder,
02:30:22.780 | depends what kind of glasses you're wearing.
02:30:24.940 | - Right.
02:30:25.780 | - Depends what kind of algorithm you're running
02:30:26.860 | in your perception system.
02:30:28.380 | - So I don't think we'll have a theory of intelligence,
02:30:31.140 | self-organization, evolution, things like this
02:30:34.380 | until we have a good handle on a notion of complexity,
02:30:38.540 | which we know is in the eye of the beholder.
02:30:40.860 | - Yeah, it's sad to think that we might not be able
02:30:44.420 | to detect or interact with alien species
02:30:47.620 | because we're wearing different glasses.
02:30:50.340 | - Because the notion of locality
02:30:51.500 | might be different from ours.
02:30:52.460 | - Yeah, exactly.
02:30:53.300 | - This actually connects with fascinating questions
02:30:55.260 | in physics at the moment, like modern physics,
02:30:58.140 | quantum physics, like, you know, questions about,
02:31:00.300 | like, you know, can we recover the information
02:31:02.580 | that's lost in a black hole and things like this, right?
02:31:04.620 | And that relies on notions of complexity,
02:31:07.980 | which, you know, I find this fascinating.
02:31:11.700 | - Can you describe your personal quest
02:31:13.420 | to build an expressive electronic wind instrument, EWI?
02:31:18.420 | What is it?
02:31:20.660 | What does it take to build it?
02:31:24.060 | - Well, I'm a tinkerer.
02:31:25.140 | I like building things.
02:31:26.820 | I like building things with combinations of electronics
02:31:29.020 | and, you know, mechanical stuff.
02:31:32.460 | You know, I have a bunch of different hobbies,
02:31:34.140 | but, you know, probably my first one was little,
02:31:38.020 | was building model airplanes and stuff like that.
02:31:39.820 | And I still do that to some extent,
02:31:41.900 | but also electronics.
02:31:42.740 | I taught myself electronics before I studied it.
02:31:45.180 | And the reason I taught myself electronics
02:31:48.140 | is because of music.
02:31:49.620 | My cousin was an aspiring electronic musician
02:31:53.180 | and he had an analog synthesizer.
02:31:55.020 | And I was, you know, basically modifying it for him
02:31:58.020 | and building sequencers and stuff like that, right, for him.
02:32:00.260 | I was in high school when I was doing this.
02:32:02.620 | - How's the interest in like progressive rock, like '80s?
02:32:06.060 | Like what's the greatest band of all time,
02:32:07.980 | according to Yann LeCun?
02:32:09.500 | - Oh, man, there's too many of them.
02:32:11.100 | But, you know, it's a combination of, you know,
02:32:16.100 | Mahavishnu Orchestra, Weather Report,
02:32:19.820 | Yes, Genesis, you know,
02:32:22.780 | - Yes, Genesis.
02:32:23.980 | - Pre-Peter Gabriel, Gentle Giant, you know,
02:32:28.100 | things like that.
02:32:29.100 | - Great.
02:32:29.940 | Okay, so this love of electronics
02:32:32.300 | and this love of music combined together.
02:32:34.260 | - Right, so I was actually trained to play
02:32:36.340 | Baroque and Renaissance music.
02:32:39.500 | And I played in an orchestra when I was in high school
02:32:43.300 | and first year of college.
02:32:45.780 | And I played the recorder, crumhorn,
02:32:48.060 | a little bit of oboe, you know, things like that.
02:32:50.220 | So I'm a wind instrument player.
02:32:52.540 | But I always wanted to play improvised music,
02:32:54.100 | even though I don't know anything about it.
02:32:56.340 | And the only way I figured, you know,
02:32:58.780 | short of like learning to play a saxophone
02:33:01.060 | was to play electronic wind instruments.
02:33:03.540 | So they behave, you know,
02:33:04.540 | the fingering is similar to a saxophone,
02:33:06.380 | but, you know, you have wide variety of sound
02:33:09.060 | because you control the synthesizer with it.
02:33:11.020 | So I had a bunch of those, you know,
02:33:13.100 | going back to the late 80s from either Yamaha or Akai.
02:33:18.100 | They're both kind of the main manufacturers of those.
02:33:22.500 | So they were classically, you know,
02:33:23.660 | going back several decades.
02:33:25.700 | But I've never been completely satisfied with them
02:33:27.660 | because of lack of expressivity.
02:33:29.260 | And, you know, those things, you know,
02:33:32.460 | are somewhat expressive.
02:33:33.420 | I mean, they measure the breath pressure,
02:33:34.780 | they measure the lip pressure,
02:33:36.540 | and, you know, you have various parameters.
02:33:39.820 | You can vary it with fingers,
02:33:41.500 | but they're not really as expressive
02:33:44.820 | as an acoustic instrument, right?
02:33:47.100 | You hear John Coltrane play two notes
02:33:49.420 | and you know it's John Coltrane,
02:33:50.820 | you know, it's got a unique sound.
02:33:53.060 | Or Miles Davis, right?
02:33:54.340 | You can hear it's Miles Davis playing the trumpet
02:33:57.540 | because the sound reflects their, you know,
02:34:02.540 | physiognomy, basically.
02:34:04.780 | The shape of the vocal tract kind of shapes the sound.
02:34:09.700 | So how do you do this with an electronic instrument?
02:34:12.860 | And I was, many years ago I met a guy called David Wessel.
02:34:16.140 | He was a professor at Berkeley
02:34:18.780 | and created the Center for like, you know,
02:34:21.940 | music technology there.
02:34:23.500 | And he was interested in that question.
02:34:26.140 | And so I kept kind of thinking about this for many years.
02:34:28.620 | And finally, because of COVID, you know, I was at home.
02:34:31.540 | I was in my workshop.
02:34:32.580 | My workshop serves also as my kind of Zoom room
02:34:36.020 | and home office.
02:34:37.340 | - This is in New Jersey?
02:34:38.780 | - In New Jersey.
02:34:39.620 | And I started really being serious about, you know,
02:34:43.580 | building my own EWI instrument.
02:34:45.780 | - What else is going on in that New Jersey workshop?
02:34:48.140 | Is there some crazy stuff you've built?
02:34:50.860 | Like just, or like left on the workshop floor, left behind?
02:34:55.180 | - A lot of crazy stuff is, you know,
02:34:57.580 | electronics built with microcontrollers of various kinds
02:35:01.660 | and, you know, weird flying contraptions.
02:35:04.860 | - So you still love flying?
02:35:08.700 | - It's a family disease.
02:35:09.860 | My dad got me into it when I was a kid
02:35:12.620 | and he was building model airplanes when he was a kid.
02:35:16.820 | And he was a mechanical engineer.
02:35:19.780 | He taught himself electronics also.
02:35:21.140 | So he built his early radio control systems
02:35:24.060 | in the late sixties, early seventies.
02:35:26.420 | And so that's what got me into, I mean,
02:35:29.780 | he got me into kind of, you know,
02:35:31.060 | engineering and science and technology.
02:35:33.020 | - Do you also have an interest in appreciation of flight
02:35:36.100 | in other forms, like with drones, quadcopters, or do you,
02:35:39.220 | is it model airplane, the thing that's-
02:35:42.700 | - You know, before drones were, you know,
02:35:45.180 | kind of a consumer product, you know,
02:35:49.180 | I built my own, you know,
02:35:50.220 | also building a microcontroller
02:35:51.940 | with gyroscopes and accelerometers for stabilization,
02:35:56.220 | writing the firmware for it, you know.
02:35:57.700 | And then when it became kind of a standard thing
02:35:59.140 | you could buy, it was boring, you know,
02:36:00.300 | I stopped doing it, it was not fun anymore.
02:36:02.460 | - Yeah, you were doing it before it was cool.
02:36:06.260 | - Yeah.
02:36:07.100 | - What advice would you give to a young person today
02:36:10.020 | in high school and college that dreams of doing
02:36:13.780 | something big like Yann LeCun,
02:36:15.940 | like let's talk in the space of intelligence,
02:36:18.940 | dreams of having a chance to solve
02:36:20.940 | some fundamental problem in space of intelligence,
02:36:23.980 | both for their career and just in life,
02:36:26.180 | being somebody who was a part of creating something special?
02:36:30.700 | - So try to get interested by big questions,
02:36:35.420 | things like, you know, what is intelligence?
02:36:38.660 | What is the universe made of?
02:36:40.420 | What's life all about?
02:36:41.660 | Things like that.
02:36:42.500 | Like even like crazy big questions, like what's time?
02:36:49.060 | Like nobody knows what time is.
02:36:51.460 | And then learn basic things, like basic methods,
02:36:56.460 | either from math, from physics or from engineering.
02:37:03.260 | Things that have a long shelf life.
02:37:05.620 | Like if you have a choice between like, you know,
02:37:08.740 | learning, you know, mobile programming on iPhone
02:37:11.700 | or quantum mechanics, take quantum mechanics.
02:37:14.860 | Because you're gonna learn things
02:37:18.500 | that you have no idea exist.
02:37:20.420 | And you may not, you may never be a quantum physicist,
02:37:25.340 | but you'll learn about path integrals
02:37:26.780 | and path integrals are used everywhere.
02:37:29.140 | It's the same formula that you use for, you know,
02:37:31.100 | Bayesian integration and stuff like that.
02:37:33.300 | - So the ideas, the little ideas within quantum mechanics,
02:37:38.100 | within some of these kind of more solidified fields
02:37:41.460 | will have a longer shelf life.
02:37:42.660 | They'll somehow be used indirectly in your work.
02:37:46.940 | - Learn classical mechanics,
02:37:48.100 | like you learn about Lagrangians, for example.
02:37:50.420 | Which is like a hugely useful concept, you know,
02:37:55.140 | for all kinds of different things.
02:37:57.300 | Learn statistical physics, because all the math
02:38:01.660 | that comes out of, you know, for machine learning,
02:38:04.420 | basically comes out of what's figured out
02:38:07.260 | by statistical physicists in the, you know,
02:38:09.220 | late 19th, early 20th century, right?
02:38:10.940 | So, and for some of them actually more recently,
02:38:14.260 | by people like Giorgio Parisi,
02:38:16.100 | who just got the Nobel prize for the replica method,
02:38:19.060 | among other things, it's used for a lot of different things.
02:38:23.180 | You know, variational inference,
02:38:25.580 | that math comes from statistical physics.
02:38:27.620 | So, a lot of those kind of, you know, basic courses,
02:38:33.580 | you know, if you do electrical engineering,
02:38:36.220 | you take signal processing,
02:38:37.620 | you'll learn about Fourier transforms.
02:38:39.860 | Again, something super useful that's at the basis
02:38:42.700 | of things like graph neural nets,
02:38:44.900 | which is an entirely new sub area of, you know,
02:38:49.380 | AI machine learning, deep learning,
02:38:50.660 | which I think is super promising
02:38:52.140 | for all kinds of applications.
02:38:54.340 | Something very promising,
02:38:55.220 | if you're more interested in applications,
02:38:56.660 | is the applications of AI machine learning
02:38:58.820 | and deep learning to science.
02:39:00.420 | Or to science that can help solve big problems in the world.
02:39:05.540 | I have colleagues at Meta, at FAIR,
02:39:09.220 | who started this project called Open Catalyst,
02:39:11.220 | and it's an open project collaborative.
02:39:14.540 | And the idea is to use deep learning
02:39:16.620 | to help design new chemical compounds or materials
02:39:21.620 | that would facilitate the separation
02:39:23.740 | of hydrogen from oxygen.
02:39:25.780 | If you can efficiently separate oxygen from hydrogen
02:39:29.020 | with electricity, you solve climate change.
02:39:33.500 | It's as simple as that.
02:39:34.420 | Because you cover, you know,
02:39:37.580 | some random desert with solar panels,
02:39:39.740 | and you have them work all day, produce hydrogen,
02:39:43.420 | and then you shoot the hydrogen wherever it's needed.
02:39:45.380 | You don't need anything else.
02:39:46.820 | You know, you have controllable power
02:39:53.420 | that, you know, can be transported anywhere.
02:39:55.620 | So if we have a large-scale, efficient
02:39:59.700 | energy storage technology, like producing hydrogen,
02:40:04.180 | we solve climate change.
02:40:06.620 | Here's another way to solve climate change,
02:40:08.500 | is figuring out how to make fusion work.
02:40:10.420 | Now, the problem with fusion
02:40:11.460 | is that you make a super-hot plasma,
02:40:13.580 | and the plasma is unstable, and you can't control it.
02:40:16.220 | Maybe with deep learning, you can find controllers
02:40:17.940 | that will stabilize plasma
02:40:19.100 | and make, you know, practical fusion reactors.
02:40:21.620 | I mean, that's very speculative,
02:40:23.060 | but, you know, it's worth trying,
02:40:24.460 | because, you know, the payoff is huge.
02:40:28.260 | There's a group at Google working on this,
02:40:29.900 | led by John Platt.
02:40:31.140 | - So, control, convert as many problems
02:40:33.900 | in science and physics and biology and chemistry
02:40:36.780 | into a learnable problem,
02:40:39.740 | and see if a machine can learn it.
02:40:41.540 | - Right, I mean, there's properties of, you know,
02:40:43.900 | complex materials that we don't understand
02:40:46.300 | from first principle, for example, right?
02:40:48.540 | So, you know, if we could design new, you know,
02:40:53.060 | new materials, we could make more efficient batteries.
02:40:56.420 | You know, we could make maybe faster electronics.
02:40:58.780 | We could, I mean, there's a lot of things we can imagine
02:41:01.900 | doing, or, you know, lighter materials
02:41:04.500 | for cars or airplanes and things like that.
02:41:06.420 | Maybe better fuel cells.
02:41:07.620 | I mean, there's all kinds of stuff we can imagine.
02:41:09.500 | If we had good fuel cells, hydrogen fuel cells,
02:41:12.300 | we could use them to power airplanes,
02:41:13.620 | and, you know, transportation wouldn't be, or cars,
02:41:17.220 | we wouldn't have emission problem,
02:41:20.300 | CO2 emission problems for air transportation anymore.
02:41:24.580 | So, there's a lot of those things,
02:41:26.500 | I think, where AI, you know, can be used.
02:41:29.180 | And this is not even talking about all the sort of
02:41:32.420 | medicine, biology, and everything like that, right?
02:41:35.660 | You know, like protein folding, you know,
02:41:38.100 | figuring out, like, how can you design a protein
02:41:40.540 | so that it sticks to another protein at a particular site,
02:41:42.820 | because that's how you design drugs in the end.
02:41:45.180 | So, you know, deep learning would be useful for
02:41:47.580 | all of this, and those would be, you know,
02:41:49.260 | sort of enormous progress
02:41:51.100 | if we could use it for that.
02:41:53.380 | Here's an example.
02:41:54.300 | If you take, this is like from recent material physics,
02:41:58.260 | you take a monoatomic layer of graphene, right?
02:42:02.180 | So, it's just carbon on an hexagonal mesh,
02:42:04.900 | and you make this single atom thick.
02:42:09.140 | You put another one on top,
02:42:10.340 | you twist them by some magic number of degrees,
02:42:13.100 | three degrees or something, it becomes a superconductor.
02:42:16.780 | Nobody has any idea why.
02:42:18.100 | (both laughing)
02:42:20.820 | - I wanna know how that was discovered,
02:42:22.460 | but that's the kind of thing that machine learning
02:42:23.900 | can actually discover, these kinds of things.
02:42:25.820 | - Maybe not, but there is a hint, perhaps,
02:42:28.980 | that with machine learning, we would train a system
02:42:31.740 | to basically be a phenomenological model
02:42:34.860 | of some complex emergent phenomenon,
02:42:37.220 | which superconductivity is one of those,
02:42:40.380 | where this collective phenomenon is too difficult
02:42:45.340 | to describe from first principles
02:42:46.900 | with the usual sort of reductionist type method.
02:42:51.900 | But we could have deep learning systems
02:42:54.940 | that predict the properties of a system
02:42:57.660 | from a description of it,
02:42:59.180 | after being trained with sufficiently many samples.
02:43:02.660 | This guy, Pascal Fua, at EPFL,
02:43:06.660 | he has a startup company
02:43:08.100 | where he basically trained a convolutional net,
02:43:13.420 | essentially, to predict the aerodynamic properties of solids.
02:43:17.980 | And you can generate as much data as you want
02:43:19.620 | by just running computational fluid dynamics, right?
02:43:21.900 | So, you give it a wing, airfoil, or something,
02:43:28.260 | a shape of some kind,
02:43:29.780 | and you run computational fluid dynamics,
02:43:31.380 | you get, as a result, the drag and lift
02:43:36.380 | and all that stuff, right?
02:43:37.460 | And you can generate lots of data,
02:43:40.060 | train a neural net to make those predictions,
02:43:41.780 | and now what you have is a differentiable model
02:43:44.100 | of, let's say, drag and lift,
02:43:46.940 | as a function of the shape of that solid.
02:43:48.660 | And so you can use backprop to do design,
02:43:49.900 | you can optimize the shape,
02:43:51.460 | so you get the properties you want.
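A hedged sketch of that surrogate-plus-gradient design loop: the "CFD" here is a toy analytic formula, and the network size, shape parameterization, and learning rates are made up for illustration.

```python
# Sketch: (1) fit a neural surrogate to (shape -> drag) data that a CFD solver
# would normally produce, (2) freeze it and optimize the shape by gradient
# descent through the differentiable surrogate. The "CFD" here is a toy formula.
import torch
import torch.nn as nn

def fake_cfd_drag(shape_params):            # stand-in for an expensive CFD run
    return ((shape_params - 0.3) ** 2).sum(dim=-1, keepdim=True)

surrogate = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

# 1) Train the surrogate on simulated (shape, drag) pairs.
for _ in range(2000):
    shapes = torch.rand(256, 8)
    loss = nn.functional.mse_loss(surrogate(shapes), fake_cfd_drag(shapes))
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) Inverse design: freeze the surrogate and backprop w.r.t. the shape itself.
for p in surrogate.parameters():
    p.requires_grad_(False)

shape = torch.rand(1, 8, requires_grad=True)
design_opt = torch.optim.Adam([shape], lr=5e-2)
for _ in range(300):
    drag = surrogate(shape).sum()
    design_opt.zero_grad()
    drag.backward()
    design_opt.step()

# Under this toy setup, the optimized shape should drift toward ~0.3 per dimension.
print("optimized shape:", shape.detach().numpy().round(2))
```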
02:43:53.220 | - Yeah, that's incredible.
02:43:56.020 | That's incredible.
02:43:56.860 | And on top of all that,
02:43:58.260 | probably you should read a little bit of literature
02:44:01.420 | and a little bit of history for inspiration and for wisdom,
02:44:06.420 | 'cause after all, all of these technologies
02:44:08.780 | will have to work in the human world.
02:44:10.260 | - Yes.
02:44:11.100 | - And the human world is complicated.
02:44:12.620 | - It is, certainly.
02:44:14.100 | - Yann, this is an amazing conversation.
02:44:18.380 | I'm really honored that you would talk with me today.
02:44:20.380 | Thank you for all the amazing work
02:44:21.820 | you're doing at FAIR, at Meta,
02:44:23.780 | and thank you for being so passionate
02:44:26.220 | after all these years about everything that's going on.
02:44:28.780 | You're a beacon of hope for the machine learning community.
02:44:31.620 | And thank you so much
02:44:32.700 | for spending your valuable time with me today.
02:44:34.460 | That was awesome.
02:44:35.300 | - Thanks for having me on.
02:44:36.300 | That was a pleasure.
02:44:37.780 | - Thanks for listening to this conversation with Yann LeCun.
02:44:41.420 | To support this podcast,
02:44:42.780 | please check out our sponsors in the description.
02:44:45.740 | And now, let me leave you with some words
02:44:47.820 | from Isaac Asimov.
02:44:49.580 | "Your assumptions are your windows on the world.
02:44:53.700 | "Scrub them off every once in a while,
02:44:55.940 | "or the light won't come in."
02:44:57.860 | Thank you for listening,
02:45:00.060 | and hope to see you next time.
02:45:02.060 | (upbeat music)