Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258
Chapters
0:00 Introduction
0:36 Self-supervised learning
10:55 Vision vs language
16:46 Statistics
22:33 Three challenges of machine learning
28:22 Chess
36:25 Animals and intelligence
46:09 Data augmentation
67:29 Multimodal learning
79:18 Consciousness
84:03 Intrinsic vs learned ideas
88:15 Fear of death
96:07 Artificial Intelligence
109:56 Facebook AI Research
126:34 NeurIPS
142:46 Complexity
151:11 Music
156:06 Advice for young people
00:00:00.000 |
The following is a conversation with Yann LeCun, 00:00:04.540 |
He is the chief AI scientist at Meta, formerly Facebook, 00:00:15.620 |
of machine learning and artificial intelligence, 00:00:29.980 |
in the description, and now, here's my conversation 00:00:37.540 |
"Self-supervised learning, the dark matter of intelligence." 00:00:43.720 |
So let me ask, what is self-supervised learning, 00:00:46.640 |
and why is it the dark matter of intelligence? 00:00:59.860 |
that we currently are not reproducing properly 00:01:04.660 |
So the most popular approaches to machine learning today 00:01:09.660 |
are supervised learning and reinforcement learning. 00:01:21.820 |
a ridiculously large number of trial and errors 00:01:29.340 |
And that's why we don't have self-driving cars. 00:01:45.500 |
with reinforcement learning, you have to have 00:01:50.220 |
such that you can do that large-scale kind of learning 00:02:02.300 |
whereas even with millions of hours of simulated practice, 00:02:10.700 |
And so obviously we're missing something, right? 00:02:13.900 |
And it's quite obvious for a lot of people that, 00:02:16.420 |
you know, the immediate response you get from many people 00:02:19.520 |
is, well, you know, humans use their background knowledge 00:02:25.820 |
Now, how was that background knowledge acquired? 00:02:32.380 |
how do babies in their first few months of life 00:02:43.820 |
That may be the basis of what we call common sense. 00:02:47.940 |
This type of learning, it's not learning a task, 00:02:53.620 |
it's just observing the world and figuring out how it works. 00:02:57.240 |
Building world models, learning world models. 00:03:10.220 |
at trying to reproduce this kind of learning. 00:03:13.020 |
- Okay, so you're looking at just observation, 00:03:18.620 |
It's just sitting there watching mom and dad walk around, 00:03:23.420 |
- That's what you mean by background knowledge. 00:03:29.980 |
- Just having eyes open or having eyes closed, 00:03:37.820 |
And you're saying in order to learn to drive, 00:03:43.100 |
like the reason humans are able to learn to drive quickly, 00:03:48.660 |
they were able to watch cars operate in the world 00:03:53.580 |
the physics of basic objects, all that kind of stuff. 00:03:57.420 |
you don't even know, you don't even need to know, 00:04:08.100 |
because of your understanding of intuitive physics, 00:04:17.580 |
and nothing good will come out of this, right? 00:04:30.500 |
thousands of times before you figure out it's a bad idea. 00:04:42.500 |
- So self-supervised learning still has to have 00:04:45.820 |
some source of truth being told to it by somebody. 00:04:50.100 |
- So you have to figure out a way without human assistance 00:04:54.540 |
or without significant amount of human assistance 00:04:59.100 |
So the mystery there is how much signal is there, 00:05:03.980 |
how much truth is there that the world gives you, 00:05:08.180 |
like you watch YouTube or something like that, 00:05:16.300 |
There is way more signal in sort of a self-supervised setting 00:05:32.340 |
where when you try to figure out how much information 00:05:37.820 |
and how much feedback you give the machine at every trial, 00:05:43.300 |
you tell the machine you did good, you did bad, 00:05:45.340 |
and you only tell this to the machine once in a while. 00:05:57.100 |
you cannot possibly learn something very complicated 00:06:01.060 |
where you get many, many feedbacks of this type. 00:06:04.700 |
Supervised learning, you give a few bits to the machine 00:06:10.180 |
Let's say you're training a system on, you know, 00:06:17.660 |
that's a little less than 10 bits of information per sample. 00:06:20.900 |
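(As a quick check on that figure: if the task is classifying among, say, 1,000 categories, a label carries at most log2(1000) ≈ 9.97 bits, i.e. a little less than 10 bits per sample; compare that with an occasional scalar reward in reinforcement learning, or an entire video clip in the self-supervised setting.)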
But self-supervised learning here is a setting. 00:06:26.340 |
but ideally you would show a machine a segment of video 00:06:31.340 |
and then stop the video and ask the machine to predict 00:06:46.300 |
learn to do a better job at predicting next time around. 00:06:49.340 |
There's a huge amount of information you give the machine 00:06:51.500 |
because it's an entire video clip of, you know, 00:07:02.820 |
there's a subtle, seemingly trivial construction, 00:07:17.780 |
it is possible you could solve all of intelligence 00:07:43.860 |
Do you think it's possible that formulation alone, 00:07:50.980 |
can solve intelligence for vision and language? 00:07:53.620 |
- I think that's our best shot at the moment. 00:07:59.820 |
you know, human level intelligence or something, 00:08:07.340 |
that people have proposed, I think it's our best shot. 00:08:09.500 |
So I think this idea of an intelligence system 00:08:14.500 |
filling in the blanks, either predicting the future, 00:08:18.860 |
inferring the past, filling in missing information, 00:08:22.180 |
I'm currently filling the blank of what is behind your head 00:08:30.580 |
because I have basic knowledge about how humans are made. 00:08:35.660 |
what are you gonna say, at which point you're gonna speak, 00:08:37.260 |
whether you're gonna move your head this way or that way, 00:08:40.260 |
But I know you're not gonna just dematerialize 00:08:44.940 |
because I know what's possible and what's impossible, 00:08:50.900 |
- So you have a model of what's possible and what's impossible 00:08:53.260 |
and then you'd be very surprised if it happens, 00:08:55.100 |
and then you'll have to reconstruct your model. 00:08:59.620 |
It's what tells you what fills in the blanks. 00:09:04.460 |
about the state of the world, given by your perception, 00:09:08.060 |
your model of the world fills in the missing information. 00:09:15.220 |
filling in things you don't immediately perceive. 00:09:18.380 |
- And that doesn't have to be purely generic vision 00:09:25.820 |
like predicting what control decision you make 00:09:31.580 |
You have a sequence of images from a vehicle, 00:09:38.380 |
if you record it on video, where the car ended up going. 00:09:41.780 |
So you can go back in time and predict where the car went 00:09:49.420 |
- Right, but the question is whether we can come up 00:09:51.460 |
with sort of a generic method for training machines 00:09:56.460 |
to do this kind of prediction or filling in the blanks. 00:10:05.540 |
in the context of natural language processing. 00:10:13.660 |
You show it a sequence of words, you remove 10% of them, 00:10:22.660 |
you can use the internal representation learned by it 00:10:26.540 |
as input to something that you trained, supervised, 00:10:33.300 |
Not so successful in images, although it's making progress. 00:10:37.500 |
And it's based on sort of manual data augmentation. 00:10:43.460 |
But what has not been successful yet is training from video. 00:10:47.140 |
So getting a machine to learn to represent the visual world, 00:10:54.740 |
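As a rough illustration of the masked-word scheme he describes for text, here is a minimal sketch (not the actual BERT code; the toy transformer, vocabulary size, and 15% masking rate are assumptions made purely for illustration):

```python
# Fill-in-the-blanks pretraining: hide some words, predict them from context.
import torch
import torch.nn as nn

vocab_size, d_model, mask_id = 10_000, 256, 0     # toy values
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
        num_layers=2))
head = nn.Linear(d_model, vocab_size)             # a score for every word in the dictionary

tokens = torch.randint(1, vocab_size, (8, 32))    # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.15            # remove ~15% of the words
corrupted = tokens.masked_fill(mask, mask_id)

logits = head(model(corrupted))                   # (batch, seq, vocab) scores
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # predict only the masked words
loss.backward()
```

The internal representation computed by `model` is what would then be reused as input to a downstream supervised task.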
- Okay, well, let's kind of give a high-level overview. 00:10:57.460 |
What's the difference in kind and in difficulty 00:11:03.900 |
So you said people haven't been able to really kind of crack 00:11:15.820 |
Maybe like when we're talking about achieving, 00:11:18.660 |
like passing the Turing test in the full spirit 00:11:22.260 |
of the Turing test in language might be harder than vision. 00:11:40.180 |
that make them look essentially like the same cake, 00:11:44.740 |
And the main issue with learning world models 00:11:55.860 |
because the world is not entirely predictable. 00:12:00.700 |
We can get into the philosophical discussion about it, 00:12:11.740 |
and then I ask you to predict what's going to happen next, 00:12:20.540 |
with the interval of time that you're asking the system 00:12:26.460 |
And so one big question with self-supervised learning 00:12:32.300 |
how you represent multiple discrete outcomes, 00:12:40.380 |
And if you are a sort of a classical machine learning person, 00:12:45.180 |
you say, "Oh, you just represent a distribution." 00:12:47.580 |
And that we know how to do when we're predicting words, 00:12:53.660 |
because you can have a neural net give a score 00:12:58.580 |
It's a big list of numbers, maybe 100,000 or so. 00:13:02.420 |
And you can turn them into a probability distribution 00:13:12.300 |
There are only a few words that make sense there. 00:13:15.820 |
It could be a mouse or it could be a laser spot 00:13:21.540 |
And if I say the blank is chasing the blank in the savanna, 00:13:33.620 |
that you can refer to to sort of fill in those blanks. 00:13:44.460 |
you cannot know if it's a zebra or a gnu or whatever, 00:14:09.980 |
a sort of infinite number of plausible continuations 00:14:13.540 |
of multiple frames in a high dimensional continuous space. 00:14:17.460 |
And we just have no idea how to do this properly. 00:14:26.220 |
they try to get it down to a small finite set 00:14:31.220 |
of like under a million, something like that. 00:14:39.020 |
of every single possible word for language and it works. 00:14:42.900 |
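Concretely, the discrete trick being described looks like the snippet below (illustrative numbers only). The point of the surrounding discussion is that no analogous finite list exists to score and normalize over when the thing to predict is a high-dimensional continuous video frame:

```python
import torch

scores = torch.randn(100_000)           # one score per word in the dictionary
probs = torch.softmax(scores, dim=0)    # exp(s_i) / sum_j exp(s_j)
print(probs.sum())                      # 1.0 -- a proper probability distribution
print(torch.topk(probs, k=5).indices)   # the few words that "make sense there"
```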
It feels like that's a really dumb way to do it. 00:14:58.900 |
about how to represent all of reality in a compressed way 00:15:01.860 |
such that you can form a distribution over it? 00:15:03.780 |
- That's one of the big questions, how do you do that? 00:15:25.640 |
those distributions are essentially independent 00:15:28.500 |
And you don't pay too much of a price for this. 00:15:38.900 |
if it gives a certain probability for a lion and a cheetah, 00:15:43.620 |
and then a certain probability for gazelle, wildebeest, 00:15:54.780 |
And it's not the case that those things are independent. 00:15:58.020 |
Lions actually attack like bigger animals than cheetahs. 00:16:01.440 |
So, there's a huge independence hypothesis in this process, 00:16:07.780 |
The reason for this is that we don't know how to represent 00:16:10.860 |
properly distributions over combinatorial sequences 00:16:33.380 |
latent representation of text that would say that, 00:16:40.660 |
lion for cheetah, I also have to switch zebra for gazelle. 00:16:48.720 |
let me throw some criticism at you that I often hear 00:16:52.940 |
So this kind of filling in the blanks is just statistics. 00:17:07.540 |
such that you can use it to generalize about the world. 00:17:35.580 |
Is it possible that intelligence is just statistics? 00:17:53.380 |
So if the criticism comes from people who say, 00:17:56.220 |
current machine learning system don't care about causality, 00:17:59.420 |
which by the way is wrong, I agree with them. 00:18:03.100 |
Your model of the world should have your actions 00:18:09.100 |
and that will drive you to learn causal models of the world 00:18:11.420 |
where you know what intervention in the world 00:18:16.700 |
or you can do this by observation of other agents 00:18:19.420 |
acting in the world and observing the effect, 00:18:35.180 |
that have deep mechanistic explanation for what goes on. 00:18:44.420 |
Because a lot of people who actually voice their criticism 00:19:16.340 |
it seems like when we think about what is intelligence, 00:19:25.580 |
like concepts of memory and reasoning module, 00:19:43.580 |
just like we ignore the way the operating system works, 00:19:52.740 |
the neural network might be doing something like statistics. 00:20:00.580 |
but doing this kind of fill in the gap kind of learning 00:20:03.340 |
and just kind of updating the model constantly 00:20:05.740 |
in order to be able to support the raw sensory information, 00:20:09.260 |
to predict it and then adjust to the prediction 00:20:12.420 |
But like when we look at our brain at the high level, 00:20:23.700 |
and we're putting them into long-term memory. 00:20:30.200 |
which is this kind of simple, large neural network 00:20:36.020 |
- Right, well, okay, so there's a lot of questions 00:20:40.620 |
there's a whole school of thought in neuroscience, 00:20:47.800 |
which is really related to the idea I was talking about 00:20:53.580 |
the essence of intelligence is the ability to predict 00:20:56.360 |
and everything the brain does is trying to predict 00:21:02.140 |
Okay, and that's really sort of the underlying principle, 00:21:07.820 |
is trying to kind of reproduce this idea of prediction 00:21:21.140 |
And of course, we all think about trying to reproduce 00:21:28.340 |
but with machines, we're not even at the level 00:21:30.420 |
of even reproducing the learning processes in a cat brain. 00:21:39.020 |
don't have as much common sense as a house cat. 00:21:49.580 |
They certainly have, because many cats can figure out 00:21:53.620 |
how they can act on the world to get what they want. 00:21:56.660 |
They certainly have a fantastic model of intuitive physics, 00:22:04.620 |
but also of prey and things like that, right? 00:22:09.940 |
They only do this with about 800 million neurons. 00:22:21.340 |
let's not even worry about the high-level cognition 00:22:26.340 |
and long-term planning and reasoning that humans can do 00:22:32.500 |
Now, that said, this ability to learn world models, 00:22:44.340 |
there are three main challenges in machine learning. 00:23:01.600 |
because this is what deep learning is all about, really. 00:23:04.280 |
And the third one is something we have no idea how to solve, 00:23:09.480 |
is can we get machines to learn hierarchical representations 00:23:18.780 |
to learn hierarchical representations of perception, 00:23:22.240 |
with convolutional nets and things like that, 00:23:23.680 |
and transformers, but what about action plans? 00:23:28.320 |
good hierarchical representations of actions? 00:23:32.440 |
- Yeah, all of that needs to be somewhat differentiable 00:23:35.920 |
so that you can apply sort of gradient-based learning, 00:23:40.920 |
- So it's background knowledge, ability to reason 00:23:55.480 |
or builds on top of that background knowledge, 00:23:59.120 |
be able to make hierarchical plans in the world. 00:24:05.480 |
there's something in classical optimal control 00:24:13.840 |
NASA uses that to compute trajectories of rockets. 00:24:16.840 |
And the basic idea is that you have a predictive model 00:24:25.440 |
which, given the state of the system at time t, 00:24:28.360 |
and given an action that you're taking on the system, 00:24:39.560 |
So basically a differential equation, something like that. 00:24:51.600 |
that you can back propagate gradient through, 00:24:53.600 |
you can do what's called model predictive control, 00:24:58.280 |
So you have, you can unroll that model in time, 00:25:03.280 |
you feed it a hypothesized sequence of actions, 00:25:13.560 |
that measures how well, at the end of the trajectory, 00:25:16.040 |
the system has succeeded or matched what you want it to do. 00:25:21.200 |
Have you grasped the object you want to grasp? 00:25:35.120 |
you can figure out what is the optimal sequence of actions 00:25:39.080 |
that will get my system to the best final state. 00:25:52.840 |
And you can think of this as a form of reasoning. 00:25:56.160 |
So to take the example of the teenager driving a car again, 00:26:00.800 |
you have a pretty good dynamical model of the car, 00:26:04.200 |
but you know, again, that if you turn the wheel 00:26:18.560 |
So you can sort of imagine different scenarios 00:26:21.000 |
and then employ or take the first step in the scenario 00:26:27.800 |
That's called receding horizon model predictive control. 00:26:40.040 |
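A minimal sketch of that recipe, assuming a differentiable dynamics model and a terminal cost (both toy, hand-written placeholders here, not anything NASA would use):

```python
# Model predictive control: unroll the model in time, score the final state,
# and optimize the hypothesized action sequence by backpropagating through the rollout.
import torch

def dynamics(state, action):               # x_{t+1} = f(x_t, u_t); toy linear model
    return state + 0.1 * action

def terminal_cost(state, goal):             # how far the final state is from the goal
    return ((state - goal) ** 2).sum()

state0, goal, horizon = torch.zeros(2), torch.tensor([1.0, -1.0]), 20
actions = torch.zeros(horizon, 2, requires_grad=True)
opt = torch.optim.SGD([actions], lr=0.5)

for _ in range(100):
    opt.zero_grad()
    state = state0
    for t in range(horizon):                # unroll the model over the horizon
        state = dynamics(state, actions[t])
    loss = terminal_cost(state, goal)
    loss.backward()                         # gradients flow back through the whole rollout
    opt.step()

# Receding horizon: execute actions[0], observe the real world, then re-plan.
```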
the model of the world is not generally learned. 00:26:42.440 |
There's, you know, sometimes a few parameters 00:26:47.080 |
But generally the model is mostly deterministic 00:26:55.640 |
I think the big challenge of AI for the next decade 00:26:58.720 |
is how do we get machines to run predictive models 00:27:03.680 |
and deal with the real world in all this complexity. 00:27:10.200 |
It's not even just the trajectory of a robot arm, 00:27:14.880 |
careful mathematics, but it's everything else, 00:27:17.160 |
everything we observe in the world, you know, 00:27:22.960 |
that involve collective phenomena like water or, you know, 00:27:27.960 |
trees and, you know, branches in a tree or something, 00:27:35.040 |
humans have no trouble developing abstract representations 00:27:39.840 |
but we still don't know how to do with machines. 00:27:52.960 |
to the dynamic nature of the world, the environment, 00:28:03.400 |
into the hierarchical representation of action in your view? 00:28:07.480 |
It's just that now your model of the world has to deal with, 00:28:11.360 |
you know, it just makes it more complicated, right? 00:28:17.240 |
that makes your model of the world much more complicated, 00:28:28.120 |
I mean, there's a, I go, you go, I go, you go. 00:28:48.440 |
of what the ontology of what defines a car door, 00:28:57.320 |
they're trying to get out, like here in New York, 00:29:01.440 |
You slowing down is going to signal something. 00:29:16.960 |
I mean, I guess you can integrate all of them 00:29:20.360 |
like the entirety of these little interactions. 00:29:30.000 |
- Well, in some ways it's way more complicated than chess 00:29:43.680 |
This is the kind of problem we've evolved to solve. 00:29:55.720 |
In fact, that's why we designed it as a game, 00:29:59.040 |
And if there is something that recent progress 00:30:05.600 |
is that humans are really terrible at those things, 00:30:16.720 |
behind an ideal player that they would call God. 00:30:19.640 |
In fact, no, there are like nine or 10 stones behind, 00:30:27.400 |
and it's because we have limited working memory. 00:30:30.360 |
We're not very good at doing this tree exploration 00:30:32.960 |
that computers are much better at doing than we are, 00:30:37.960 |
at learning differentiable models of the world. 00:30:47.480 |
but in the sense that our brain has some mechanism 00:30:56.520 |
So if you have an agent that consists of a model 00:31:04.360 |
is basically the entire front half of your brain, 00:31:14.400 |
There is your sort of intrinsic motivation module, 00:31:20.080 |
That's the thing that measures pain and hunger 00:31:22.480 |
and things like that, like immediate feelings and emotions. 00:31:30.720 |
of what people in reinforcement learning call a critic, 00:31:32.560 |
which is a sort of module that predicts ahead 00:31:54.640 |
your cost function, your critic, your world model, 00:32:03.080 |
to do planning, to do reasoning, to do learning, 00:32:15.360 |
That's probably at the core of what can solve intelligence. 00:32:18.400 |
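To make that module list concrete, here is a schematic, entirely hypothetical sketch of how a world model, an intrinsic-cost module, and a critic might fit together to score an imagined course of action (all three networks are toy placeholders):

```python
import torch
import torch.nn as nn

world_model = nn.Linear(8 + 2, 8)      # predicts the next state from (state, action)
intrinsic_cost = nn.Linear(8, 1)       # hardwired "discomfort" of a state (pain, hunger, ...)
critic = nn.Linear(8, 1)               # learned prediction of long-term intrinsic cost

def imagined_cost(state, actions):
    """Roll the world model forward over a hypothesized action sequence."""
    total = 0.0
    for a in actions:
        state = world_model(torch.cat([state, a]))
        total = total + intrinsic_cost(state)
    return total + critic(state)        # immediate costs plus predicted long-term cost

state = torch.randn(8)
candidate = [torch.randn(2) for _ in range(5)]
print(imagined_cost(state, candidate))  # lower is better; optimize as in the planning sketch above
```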
So you don't need like a logic-based reasoning in your view. 00:32:23.400 |
- I don't know how to make logic-based reasoning 00:32:43.280 |
proceed by optimizing some objective function. 00:32:49.920 |
does learning in the brain minimize an objective function? 00:33:00.320 |
Second, if it does optimize an objective function, 00:33:04.640 |
does it do it by some sort of gradient estimation? 00:33:10.880 |
but some way of estimating the gradient in an efficient manner 00:33:14.840 |
whose complexity is on the same order of magnitude 00:33:20.760 |
'Cause you can't afford to do things like, you know, 00:33:29.640 |
you can't do sort of estimating gradient by perturbation. 00:33:39.200 |
zeroth-order black box gradient-free optimization 00:33:46.280 |
So it has to have a way of estimating gradient. 00:33:49.240 |
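The scaling argument can be seen in a toy comparison: a perturbation-style (finite-difference) gradient estimate needs roughly one extra function evaluation per parameter, hopeless at the scale of billions of synapses, whereas backpropagation recovers every partial derivative in about one backward pass. A sketch, with an arbitrary quadratic objective standing in for the real thing:

```python
import numpy as np

def finite_difference_grad(f, w, eps=1e-5):
    """O(len(w)) evaluations of f -- one perturbation per parameter."""
    grad = np.zeros_like(w)
    base = f(w)
    for i in range(len(w)):
        w_eps = w.copy()
        w_eps[i] += eps
        grad[i] = (f(w_eps) - base) / eps
    return grad

f = lambda w: float(np.dot(w, w))          # f(w) = ||w||^2, true gradient is 2w
w = np.array([1.0, -2.0, 3.0])
print(finite_difference_grad(f, w))        # ~[2, -4, 6], at the cost of len(w)+1 evaluations
```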
- Is it possible that some kind of logic-based reasoning 00:33:52.760 |
emerges in pockets as a useful, like you said, 00:33:58.080 |
maybe it's a mechanism for creating objective functions. 00:34:01.280 |
It's a mechanism for creating knowledge bases, for example, 00:34:08.360 |
Like, maybe it's like an efficient representation 00:34:10.240 |
of knowledge that's learned in a gradient-based way 00:34:13.760 |
- Well, so I think there is a lot of different types 00:34:17.280 |
So first of all, I think the type of logical reasoning 00:34:23.040 |
maybe stemming from, you know, sort of classical AI 00:34:34.640 |
- But we judge each other based on our ability 00:34:45.160 |
- Yes, I'm judging you this whole time because, 00:34:58.960 |
So, you know, but I think perhaps another type 00:35:03.960 |
of intelligence that I have is this, you know, 00:35:07.560 |
ability of sort of building models to the world from, 00:35:11.000 |
you know, reasoning, obviously, but also data. 00:35:15.640 |
And those models generally are more kind of analogical, 00:35:19.440 |
So it's reasoning by simulation and by analogy, 00:35:23.880 |
where you use one model to apply to a new situation, 00:35:26.840 |
even though you've never seen that situation, 00:35:38.360 |
So you're kind of simulating what's happening 00:35:49.560 |
Are you going to use, you know, screws and nails or whatever? 00:35:55.720 |
and sort of interact with that person, you know, 00:35:59.480 |
having this model in mind to kind of tell the person 00:36:05.240 |
So I think this ability to construct models of the world 00:36:13.840 |
And the ability to use it then to plan actions 00:36:25.440 |
- So I'm going to ask you a series of impossible questions 00:36:30.160 |
So if that's the fundamental sort of dark matter 00:36:33.400 |
of intelligence, this ability to form a background model, 00:36:36.560 |
what's your intuition about how much knowledge is required? 00:36:52.640 |
How much information do you think is required 00:36:59.920 |
So you have to be able to, when you see a box, 00:37:02.160 |
go in it, when you see a human compute the most evil action, 00:37:06.240 |
if there's a thing that's near an edge, you knock it off. 00:37:09.600 |
All of that, plus the extra stuff you mentioned, 00:37:12.720 |
which is a great self-awareness of the physics 00:37:18.740 |
How much knowledge is required, do you think, to solve it? 00:37:21.600 |
I don't even know how to measure an answer to that question. 00:37:26.680 |
but whatever it is, it fits in about 800 million neurons. 00:37:35.380 |
- Everything, all knowledge, everything, right? 00:37:41.460 |
A dog is two billion, but a cat is less than one billion. 00:38:03.300 |
although it's not even clear how supervised learning 00:38:08.100 |
So I think almost all of it is self-supervised learning, 00:38:12.860 |
but it's driven by the sort of ingrained objective functions 00:38:17.860 |
that a cat or a human have at the base of their brain, 00:38:52.460 |
So hunger is just one of the human perceivable symptoms 00:38:58.020 |
of the brain being unhappy with the way things are currently. 00:39:48.140 |
In fact, they scream at them when they come too close 00:39:55.880 |
evolution has figured out that's the best thing. 00:39:58.280 |
I mean, they're occasionally social, of course, 00:40:05.920 |
So all of those behaviors are not part of intelligence. 00:40:11.040 |
"intelligent machines because human intelligence is social." 00:40:13.960 |
But then you look at orangutans, you look at octopus. 00:40:28.800 |
So there are things that we think, as humans, 00:40:38.840 |
I think we give way too much importance to language 00:40:46.760 |
because we think our reasoning is so linked with language. 00:40:49.840 |
- So to solve the house cat intelligence problem, 00:40:53.480 |
you think you could do it on a desert island? 00:41:05.720 |
- It needs to have sort of the right set of drives 00:41:14.320 |
But like, for example, baby humans are driven 00:41:26.000 |
How to do it precisely is not, that's learned. 00:41:30.440 |
- Move around and stand up, that's sort of hardwired. 00:41:35.440 |
But it's very simple to hardwire this kind of stuff. 00:41:38.040 |
- Oh, like the desire to, well, that's interesting. 00:41:45.600 |
That's not a, there's gotta be a deeper need for walking. 00:41:50.440 |
I think it was probably socially imposed by society 00:41:53.120 |
that you need to walk, all the other bipedal-- 00:41:55.560 |
- No, like a lot of simple animals that, you know, 00:42:19.320 |
which is sort of part of the sort of human development. 00:42:38.520 |
- But I don't, you know, like, from the last time 00:42:41.400 |
I've interacted with a table, that's much more stable 00:42:46.440 |
- Yeah, I mean, birds have figured it out with two feet. 00:42:49.640 |
- Well, technically, we can go into ontology. 00:42:56.400 |
- You know, dinosaurs have two feet, many of them. 00:42:59.680 |
I'm just now learning that T-Rex was eating grass, 00:43:08.040 |
What do you think about, I don't know if you looked at 00:43:26.140 |
that it's not really relevant, I think, in the short term. 00:43:42.720 |
- That's right, and that's the answer to this probably, 00:43:45.840 |
Just learn to represent images, and then learning 00:43:48.240 |
to recognize handwritten digits on top of this 00:43:58.680 |
with a couple of pictures of an elephant, and that's it. 00:44:20.840 |
to predict whatever hashtag people type on Instagram, right? 00:44:25.740 |
'cause there's billions per day that are showing up. 00:44:35.020 |
you know, a couple layers down from the output 00:45:13.000 |
But like a lot of people use hashtags on Instagram 00:45:20.040 |
that doesn't fully represent the contents of the image. 00:45:24.480 |
and hashtag it with like science, awesome, fun. 00:45:33.080 |
- The way my colleagues who worked on this project 00:45:41.560 |
they only selected something like 17,000 tags 00:45:43.760 |
that correspond to kind of physical things or situations. 00:45:48.080 |
Like, you know, that has some visual content. 00:45:57.120 |
- Oh, so they keep a very select set of hashtags 00:46:02.960 |
- It's still on the order of, you know, 10 to 20,000. 00:46:13.240 |
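A rough sketch of that weakly-supervised pretraining recipe as described: multi-label hashtag prediction over a curated tag vocabulary, after which an intermediate layer is reused as a generic image representation. The toy backbone, random data, and 17,000-tag vocabulary size are placeholders:

```python
import torch
import torch.nn as nn

num_tags = 17_000                                        # curated "visual" hashtags
backbone = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=4),
                         nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
tag_head = nn.Linear(64, num_tags)

imgs = torch.randn(8, 3, 224, 224)
tags = (torch.rand(8, num_tags) < 0.001).float()         # a few hashtags per image

loss = nn.functional.binary_cross_entropy_with_logits(tag_head(backbone(imgs)), tags)
loss.backward()

features = backbone(imgs)   # the reusable representation, "a couple layers down from the output"
```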
And how is it used, maybe contrastive learning for video? 00:46:23.800 |
is the idea of artificially increasing the size 00:46:26.120 |
of your training set by distorting the images 00:46:35.440 |
And people have done this since the 1990s, right? 00:46:37.320 |
You take an MNIST digit and you shift it a little bit, 00:46:40.840 |
or you change the size or rotate it, skew it, 00:46:50.800 |
If you train a supervised classifier with augmented data, 00:47:00.400 |
because a lot of self-supervised learning techniques 00:47:04.160 |
to pre-train vision systems are based on data augmentation. 00:47:08.000 |
And the basic techniques is originally inspired 00:47:12.000 |
by techniques that I worked on in the early '90s 00:47:15.840 |
and Geoff Hinton worked on also in the early '90s. 00:47:24.960 |
of the same network, they share the same weights, 00:47:27.720 |
and you show two different views of the same object. 00:47:31.760 |
Either those two different views may have been obtained 00:47:33.920 |
by data augmentation, or maybe it's two different views 00:47:36.480 |
of the same scene from a camera that you moved 00:47:39.320 |
or at different times or something like that, right? 00:47:41.360 |
Or two pictures of the same person, things like that. 00:47:46.440 |
those two identical copies of this neural net 00:47:48.400 |
to produce an output representation, a vector, 00:47:53.960 |
for those two images are as close to each other as possible, 00:47:58.920 |
as identical to each other as possible, right? 00:48:00.840 |
Because you want the system to basically learn a function 00:48:04.640 |
that will be invariant, that will not change, 00:48:07.160 |
whose output will not change when you transform those inputs 00:48:17.720 |
that when you show two images that are different, 00:48:21.960 |
Because if you don't have a specific provision for this, 00:48:28.240 |
When you train it, it will end up ignoring the input 00:48:50.040 |
So you have pairs of images that you know are different 00:48:53.200 |
and you show them to the network and those two copies, 00:49:11.480 |
for a project of doing signature verification. 00:49:23.320 |
And then force the system to produce different representation 00:49:33.000 |
by people from what was a subsidiary of AT&T at the time 00:49:38.280 |
And they were interested in storing a representation 00:49:46.680 |
So we came up with this idea of having a neural net 00:49:57.960 |
So then you would sign, it would run through the neural net 00:50:10.160 |
I mean, the American financial payment system 00:50:13.840 |
is incredibly lax in that respect compared to Europe. 00:50:31.760 |
even though I had the original paper on this, 00:50:41.040 |
there's just too many ways for two things to be different. 00:50:48.240 |
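For reference, a minimal sketch of the Siamese, contrastive recipe just described: one shared encoder applied to both inputs, a term pulling matching pairs together, and a margin-based term pushing non-matching pairs apart. The tiny encoder, margin, and random data are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))   # toy shared encoder

def contrastive_loss(x_a, x_b, same, margin=1.0):
    """same=1 for two views of one thing, same=0 for two different things."""
    z_a, z_b = encoder(x_a), encoder(x_b)          # same weights applied twice
    d = F.pairwise_distance(z_a, z_b)
    pull = same * d.pow(2)                          # positives: make representations close
    push = (1 - same) * F.relu(margin - d).pow(2)   # negatives: push apart up to a margin
    return (pull + push).mean()

x1, x2 = torch.randn(16, 3, 32, 32), torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 2, (16,)).float()
loss = contrastive_loss(x1, x2, labels)
loss.backward()
```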
So there is a particular implementation of this, 00:50:54.840 |
where Geoff Hinton is the senior member there. 00:51:03.720 |
of implementing this idea of contrastive learning, 00:51:08.640 |
Now, what I'm much more enthusiastic about these days 00:51:41.640 |
And you train the two networks to be informative, 00:51:44.160 |
but also to be as informative of each other as possible. 00:51:49.720 |
So basically one representation has to be predictable 00:52:00.200 |
and then nothing was done about it for decades. 00:52:13.880 |
We came up with something that we called Barlow Twins, 00:52:20.640 |
of maximizing the information content of a vector 00:52:32.040 |
that's more recent now called VICReg, V-I-C-R-E-G, 00:52:34.520 |
that means variance, invariance, covariance, regularization. 00:52:37.880 |
And it's the thing I'm the most excited about 00:52:41.720 |
I mean, I'm really, really excited about this. 00:52:44.360 |
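A minimal sketch of a VICReg-style objective, as one reading of the three terms he names: an invariance term pulling the two embeddings together, a variance term keeping each dimension from collapsing, and a covariance term decorrelating dimensions so the representation stays informative. The loss weights, epsilon, and random embeddings are placeholders:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    n, d = z_a.shape
    invariance = F.mse_loss(z_a, z_b)                       # same content -> same embedding
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    variance = F.relu(1 - std_a).mean() + F.relu(1 - std_b).mean()   # avoid collapse
    z_a, z_b = z_a - z_a.mean(dim=0), z_b - z_b.mean(dim=0)
    cov_a, cov_b = (z_a.T @ z_a) / (n - 1), (z_b.T @ z_b) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    covariance = (off_diag(cov_a) ** 2).sum() / d + (off_diag(cov_b) ** 2).sum() / d
    return sim_w * invariance + var_w * variance + cov_w * covariance

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)       # embeddings of two views
print(vicreg_loss(z1, z2))
```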
- What kind of data augmentation is useful for that, 00:52:50.240 |
Are we talking about, does that not matter that much? 00:52:52.640 |
Or it seems like a very important part of the step. 00:52:56.760 |
- How you generate the images that are similar, 00:53:00.440 |
It's an important step, and it's also an annoying step, 00:53:13.160 |
which a lot of people working in this area are using, 00:53:22.040 |
So one basically just shifts the image a little bit, 00:53:25.240 |
Another one kind of changes the scale a little bit. 00:53:38.160 |
So you have like a catalog of kind of standard things, 00:53:41.160 |
and people try to use the same ones for different algorithms 00:53:45.960 |
But some algorithms, some self-supervised algorithm 00:53:50.680 |
like more aggressive data augmentation, and some don't. 00:53:53.560 |
So that kind of makes the whole thing difficult. 00:53:56.400 |
But that's the kind of distortions we're talking about. 00:54:11.120 |
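That standard catalog of distortions, written as a torchvision-style pipeline (a common recipe in this literature; the specific parameters below are placeholders, not the exact settings of any particular paper):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),   # shift and rescale a crop
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),            # brightness/contrast/saturation/hue
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

# Two independent calls on the same image give the two "views" fed to the joint-embedding networks:
# view_a, view_b = augment(img), augment(img)
```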
and you use the representation as input to a classifier. 00:54:13.560 |
You train the classifier on ImageNet, let's say, 00:54:24.400 |
at eliminating the information that is irrelevant, 00:54:26.840 |
which is the distortions between those images, 00:54:34.080 |
you cannot use the representations in those systems 00:54:37.200 |
for things like object detection and localization, 00:54:41.520 |
So the type of data augmentation you need to do 00:54:44.720 |
depends on the task you want eventually the system to solve. 00:54:50.680 |
standard data augmentation that we use today, 00:54:57.760 |
- Can you help me out understand why localization is-- 00:55:00.800 |
So you're saying it's just not good at the negative, 00:55:05.440 |
so that's why it can't be used for the localization? 00:55:12.360 |
and then you give it the same image shifted and scaled, 00:55:19.160 |
to eliminate the information about position and size. 00:55:27.760 |
- Like a bounding box, like to be able to actually, okay. 00:55:35.960 |
the exact boundaries of that object, interesting. 00:55:41.120 |
that's an interesting sort of philosophical question. 00:55:47.040 |
We're like obsessed by measuring like image segmentation, 00:55:59.700 |
to understanding what are the contents of the scene. 00:56:12.480 |
And in the human brain, you have two separate pathways 00:56:15.320 |
for recognizing the nature of a scene or an object, 00:56:22.320 |
So you use the first pathway called the ventral pathway 00:56:30.580 |
is used for navigation, for grasping, for everything else. 00:56:34.140 |
And basically a lot of the things you need for survival 00:56:39.740 |
- Is similarity learning or contrastive learning, 00:56:52.620 |
does that mean you understand what it means to be a cat? 00:56:57.580 |
I mean, it's a superficial understanding, obviously. 00:57:00.100 |
- But what is the ceiling of this method, do you think? 00:57:11.260 |
So if we figure out how to use techniques of that type, 00:57:16.260 |
perhaps very different, but of the same nature, 00:57:31.340 |
but a path towards some level of physical common sense 00:57:43.100 |
how the world works from a high-throughput channel, 00:57:55.540 |
In other words, I believe in grounded intelligence. 00:58:09.960 |
So for example, and people have attempted to do this 00:58:18.420 |
of basically kind of writing down all the facts 00:58:20.620 |
that are known and hoping that some sort of common sense 00:58:24.100 |
will emerge, I think it's basically hopeless. 00:58:28.300 |
You take an object, I describe a situation to you. 00:58:34.940 |
It's completely obvious to you that the object 00:58:45.060 |
And so if you train a machine as powerful as it could be, 00:58:55.640 |
That information is just not present in any text. 00:59:01.020 |
- Well, the question, like with the Cyc project, 00:59:03.260 |
the dream, I think, is to have like 10 million, 00:59:08.020 |
say, facts like that, that give you a head start, 00:59:15.460 |
Now, we humans don't need a parent to tell us 00:59:25.900 |
so it's possible that we can give it a quick shortcut. 00:59:52.540 |
process of evolution that got us from bacteria 01:00:12.500 |
If it's not, if it's most of the intelligence, 01:00:14.280 |
most of the cool stuff we've been talking about 01:00:20.660 |
we can form that big, beautiful, sexy background model 01:00:24.780 |
that you're talking about just by sitting there. 01:00:27.240 |
Then, okay, then you need to, then like maybe, 01:00:32.600 |
it is all supervised learning all the way down. 01:00:46.340 |
and logical reasoning and this kind of stuff, 01:00:49.900 |
because it only popped up in the last million years. 01:00:53.700 |
- And it only involves less than 1% of a genome, 01:01:18.620 |
might be just something about social interaction 01:01:30.800 |
but it probably isn't, mechanistically speaking. 01:01:37.300 |
- Number 634 in the list of problems we have to solve. 01:01:43.380 |
- So basic physics of the world is number one. 01:01:46.860 |
What do you, just a quick tangent on data augmentation. 01:02:07.660 |
which then improves the similarity learning process? 01:02:12.660 |
So not just kind of dumb, simple distortions, 01:02:18.100 |
just saying that even simple distortions are enough. 01:02:25.080 |
So what people are working on now is two things. 01:02:32.220 |
like trying to translate the type of self-supervised learning 01:02:35.460 |
people use in language, translating these two images, 01:02:38.660 |
which is basically a denoising autoencoder method, right? 01:02:41.820 |
So you take an image, you block, you mask some parts of it, 01:03:03.740 |
but there's a paper now coming out of the FAIR group 01:03:23.060 |
in this case is a transformer because you can, 01:03:30.860 |
so it's easy to mask patches and things like that. 01:03:33.260 |
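A rough sketch of that masked-patch reconstruction idea, in the spirit of masked autoencoders rather than the specific FAIR paper he mentions; the tiny transformer, 16-pixel patches, and 50% masking ratio are assumptions:

```python
import torch
import torch.nn as nn

patch, d_model = 16, 256
to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)   # patchify the image
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
decoder = nn.Linear(d_model, 3 * patch * patch)    # predict the pixels of each patch

imgs = torch.randn(4, 3, 224, 224)
tokens = to_patches(imgs).flatten(2).transpose(1, 2)       # (batch, 196 patches, d_model)
mask = torch.rand(tokens.shape[:2]) < 0.5                  # mask half the patches
corrupted = tokens.masked_fill(mask.unsqueeze(-1), 0.0)

recon = decoder(encoder(corrupted))                        # reconstruct patch pixels
target = nn.functional.unfold(imgs, patch, stride=patch).transpose(1, 2)
loss = nn.functional.mse_loss(recon[mask], target[mask])   # penalize only the masked patches
loss.backward()
```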
- Okay, then my question transfers to that problem, 01:03:35.620 |
the masking, like why should the mask be a square 01:03:41.580 |
I think we're gonna come up probably in the future 01:03:44.300 |
with sort of, you know, ways to mask that are, you know, 01:03:52.860 |
- No, no, but like something that's challenging, 01:03:59.380 |
So like, I mean, maybe it's a metaphor that doesn't apply, 01:04:02.460 |
but you're, it seems like there's a data augmentation 01:04:06.420 |
or masking, there's an interactive element with it. 01:04:09.860 |
Like, you're almost like playing with an image. 01:04:26.820 |
And then the principle of the training procedure 01:04:33.580 |
or the representation between the clean version 01:04:36.900 |
and the corrupted version, essentially, right? 01:04:42.020 |
So, you know, Boltzmann machines work like this, right? 01:04:50.900 |
And then you either let them go their merry way 01:05:02.380 |
And what you're doing is you're training the system 01:05:04.620 |
so that the stable state of the entire network 01:05:07.980 |
is the same regardless of whether it sees the entire input 01:05:16.940 |
You're training a system to reproduce the input, 01:05:26.220 |
And you could imagine sort of even in the brain, 01:05:28.260 |
some sort of neural principle where, you know, 01:05:32.780 |
So they take their activity and then temporarily 01:05:37.980 |
force the rest of the system to basically reconstruct 01:05:49.020 |
more or less biologically possible processes. 01:05:58.700 |
you don't have to worry about being super efficient. 01:06:06.180 |
'Cause I was thinking like you might wanna be clever 01:06:08.780 |
about the way you do all of these procedures, you know, 01:06:12.020 |
but that's only, it's somehow costly to do every iteration, 01:06:21.500 |
data augmentation without explicit data augmentation. 01:06:25.580 |
which is, you know, the sort of video prediction. 01:06:31.500 |
observing the, you know, the continuation of that video clip 01:06:40.260 |
in such a way that the representation of the future clip 01:06:43.300 |
is easily predictable from the representation 01:06:57.780 |
- So the amount of data is not the constraint. 01:07:01.220 |
- No, it would require some selection, I think. 01:07:08.460 |
- Don't go down the rabbit hole of just cat videos. 01:07:11.100 |
I might, you might need to watch some lectures or something. 01:07:17.500 |
If it like watches lectures about intelligence 01:07:21.380 |
and then learns, watches your lectures on NYU 01:07:27.860 |
- What's your, do you find multimodal learning interesting? 01:07:38.140 |
- There's a lot of things that I find interesting 01:07:43.260 |
the important problem that I think are really 01:07:46.660 |
So I think, you know, things like multitask learning, 01:07:48.940 |
continual learning, you know, adversarial issues. 01:07:53.940 |
I mean, those have, you know, great practical interests 01:08:00.300 |
but I don't think they're fundamental, you know, 01:08:01.460 |
active learning, even to some extent reinforcement learning. 01:08:04.380 |
I think those things will become either obsolete 01:08:12.940 |
how to do self-supervised representation learning 01:08:25.460 |
in sort of fundamental questions or, you know, 01:08:31.460 |
But of course there's like a huge amount of, you know, 01:08:33.340 |
very interesting work to do in sort of practical questions 01:08:38.020 |
- Well, you know, it's difficult to talk about 01:08:41.300 |
the temporal scale because all of human civilization 01:08:44.260 |
will eventually be destroyed because the sun will die out. 01:08:50.300 |
in multi-planetary colonization across the galaxy, 01:09:07.420 |
I'm saying all that to say that multitask learning 01:09:11.900 |
might be, you're calling it practical or pragmatic 01:09:18.340 |
that achieves something very akin to intelligence 01:09:21.140 |
while we're trying to solve the more general problem 01:09:26.940 |
of self-supervised learning of background knowledge. 01:09:36.460 |
I don't know if you've gotten a chance to glance 01:09:38.340 |
at this particular one example of multitask learning 01:09:45.000 |
like, I don't know, Charles Darwin studying animals. 01:09:48.940 |
They're studying the problem of driving and asking, 01:09:52.100 |
okay, what are all the things you have to perceive? 01:09:57.860 |
there's an ontology where you're bringing that to the table. 01:10:00.420 |
So you're formulating a bunch of different tasks. 01:10:02.300 |
It's like over 100 tasks or something like that 01:10:07.740 |
and then getting data back from people that run into trouble 01:10:10.580 |
and they're trying to figure out, do we add tasks? 01:10:12.700 |
Do we, like, we focus on each individual task separately. 01:10:17.140 |
so I would say, I'll classify Andrej Karpathy's talk 01:10:24.740 |
He kept going back and forth on those two topics, 01:10:30.060 |
meaning you can't just use a single benchmark. 01:10:42.980 |
Now, okay, it's very clear that if you're faced 01:10:47.620 |
with an engineering problem that you need to solve 01:10:51.940 |
particularly if you have Elon Musk breathing down your neck, 01:10:55.900 |
you're going to have to take shortcuts, right? 01:10:57.380 |
You might think about the fact that the right thing to do 01:11:02.380 |
and the long-term solution involves, you know, 01:11:06.580 |
but you have Elon Musk breathing down your neck, 01:11:17.380 |
the systematic engineering and fine-tuning and refinements 01:11:34.420 |
in the world, and you have to kind of ironclad it 01:11:40.460 |
so much for, you know, grand ideas and principles. 01:11:46.260 |
But, you know, I'm placing myself sort of, you know, 01:11:59.900 |
because eventually I want that stuff to get used, 01:12:06.900 |
for the community to realize this is the right thing to do. 01:12:14.420 |
I mean, if you look back in the mid-2000s, for example, 01:12:18.980 |
okay, I want to recognize cars or faces or whatever, 01:12:28.380 |
kind of computer vision techniques, you know, 01:12:37.820 |
that those methods that use more hand engineering 01:12:43.580 |
There was just not enough data for conv nets, 01:12:47.860 |
with the kind of hardware that was available at the time. 01:12:50.820 |
And there was a sea change when, basically when, you know, 01:12:55.580 |
datasets became bigger and GPUs became available. 01:12:58.580 |
That's what, you know, two of the main factors 01:13:02.900 |
that basically made people change their mind. 01:13:11.820 |
like all sub branches of AI or pattern recognition, 01:13:15.500 |
and there's a similar trajectory followed by techniques 01:13:25.180 |
You know, be it optical character recognition, 01:13:34.260 |
natural language understanding, like, you know, 01:13:42.700 |
the prior knowledge you know about image formation, 01:13:49.580 |
about like feature extraction, Fourier transforms, 01:13:52.420 |
you know, Zernike moments, you know, whatever, right? 01:14:03.020 |
There is, you know, it took decades for people 01:14:05.020 |
to figure out a good front end to pre-process 01:14:10.540 |
the information about what is being said is preserved, 01:14:13.420 |
but most of the information about the identity 01:14:17.060 |
You know, cepstral coefficients or whatever, right? 01:14:32.460 |
And, you know, you do this sort of tree representation 01:14:51.260 |
maybe you know something about statistical learning. 01:14:54.660 |
and it's usually a small sliver on top of your 01:15:05.380 |
with a deep learning system and it learns its own features 01:15:07.740 |
and, you know, speech recognition systems nowadays, 01:15:16.380 |
that takes raw waveforms and produces a sequence 01:15:27.380 |
other than, you know, something that's ingrained 01:15:29.540 |
in the sort of neural language model, if you want. 01:15:31.900 |
Same for translation, same for all kinds of stuff. 01:15:42.700 |
And I think, I mean, it's true in biology as well. 01:16:01.460 |
which is the selection of data and also the interactivity, 01:16:04.700 |
needs to be part of this giant neural network. 01:16:16.740 |
of a neural network that's automatically learning, 01:16:19.620 |
it feels, my intuition is that you have to have a system, 01:16:24.620 |
whether it's a physical robot or a digital robot 01:16:32.300 |
and doing so in a flawed way and improving over time 01:16:35.900 |
in order to form the self-supervised learning well. 01:16:47.060 |
I agree in the sense that I think, I agree in two ways. 01:16:55.100 |
and you certainly need a causal model of the world 01:16:57.420 |
that allows you to predict the consequences of your actions, 01:17:00.460 |
to train that model, you need to take actions. 01:17:02.740 |
You need to be able to act in a world and see the effect 01:17:08.420 |
- So, that's not obvious because you can observe others. 01:17:12.340 |
- And you can infer that they're similar to you 01:17:15.900 |
- Yeah, but then you have to kind of hardware that part, 01:17:24.380 |
So, I think the action part would be necessary 01:17:36.660 |
or at least more efficient is that active learning 01:17:40.580 |
basically goes for the jugular of what you don't know, right? 01:17:44.900 |
There's obvious areas of uncertainty about your world 01:17:56.220 |
by systematic exploration of that part that you don't know. 01:18:09.260 |
different species are different levels of curiosity, right? 01:18:28.780 |
So, what process, what learning process is it 01:18:44.780 |
So, I worry about active learning once this question is... 01:18:48.100 |
- So, it's the more fundamental question to ask. 01:19:00.260 |
if the increase is several orders of magnitude, right? 01:19:05.660 |
- But fundamentally, it's still the same thing 01:19:13.820 |
efficient or inefficient, is the core problem. 01:19:24.540 |
- Okay, I don't know what consciousness is, but... 01:19:35.980 |
of the questions people were asking themselves 01:19:44.100 |
how the eye works and the fact that the image 01:19:46.300 |
at the back of the eye was upside down, right? 01:19:53.420 |
is an image of the world, but it's upside down. 01:19:58.200 |
And, you know, with what we know today in science, 01:20:00.460 |
you know, we realize this question doesn't make any sense 01:20:06.340 |
So, I think a lot of what is said about consciousness 01:20:09.020 |
Now, that said, there's a lot of really smart people 01:20:15.060 |
people like David Chalmers, who is a colleague of mine 01:20:29.220 |
So, we're talking about the study of a world model. 01:20:32.020 |
And I think, you know, our entire prefrontal cortex 01:20:40.820 |
But when we are attending at a particular situation, 01:20:48.580 |
And that seems to suggest that we basically have only one 01:20:58.400 |
That engine is configurable to the situation at hand. 01:21:04.620 |
or we are, you know, driving down the highway, 01:21:09.340 |
We basically have a single model of the world 01:21:12.860 |
that we're configuring to the situation at hand, 01:21:15.380 |
which is why we can only attend to one task at a time. 01:21:18.080 |
Now, if there is a task that we do repeatedly, 01:21:21.700 |
it goes from the sort of deliberate reasoning 01:21:27.460 |
and perhaps something like model predictive control, 01:21:34.420 |
So, I don't know if you've ever played against 01:21:38.980 |
You know, I get wiped out in 10 plies, right? 01:21:48.680 |
And the person in front of me, the grandmaster, 01:21:52.300 |
you know, would just like react within seconds, right? 01:21:59.980 |
because, you know, it's basically just pattern recognition 01:22:03.460 |
Same, you know, the first few hours you drive a car, 01:22:09.660 |
And then after 20, 30 hours of practice, 50 hours, 01:22:21.020 |
So, that suggests you only have one model in your head. 01:22:23.780 |
And it might suggest the idea that consciousness 01:22:31.700 |
You know, you need to have some sort of executive 01:22:35.260 |
kind of overseer that configures your world model 01:22:40.540 |
And that leads to kind of the really curious concept 01:22:43.780 |
that consciousness is not a consequence of the power 01:22:46.860 |
of our minds, but of the limitation of our brains. 01:22:53.660 |
If we had as many world models as there are situations 01:22:57.620 |
we encounter, then we could do all of them simultaneously, 01:23:00.740 |
and we wouldn't need this sort of executive control 01:23:22.460 |
what the heck is that, and why is that useful? 01:23:26.180 |
why is it useful to feel like this is really you 01:23:29.940 |
experiencing this versus just like information 01:23:39.040 |
of the way we evolved that it's just very useful 01:23:43.640 |
to feel a sense of ownership to the decisions you make, 01:23:53.200 |
Like you own this thing, and it's the only one you got, 01:24:06.840 |
that most or at least many people disagree with you with, 01:24:14.920 |
But I think, so certainly there is a bunch of people 01:24:19.920 |
who are nativist, right, who think that a lot 01:24:22.000 |
of the basic things about the world are kind of hardwired 01:24:25.320 |
Things like the world is three-dimensional, for example. 01:24:30.400 |
Things like object permanence, is it something 01:24:33.080 |
that we learn before the age of three months or so, 01:24:46.560 |
I think those things are actually very simple to learn. 01:24:49.040 |
Is it the case that the oriented edge detectors in V1 01:25:00.600 |
from the retina that actually will train edge detectors. 01:25:03.000 |
So, and again, those are things that can be learned 01:25:21.540 |
There's also those MIT experiments where you kind of plug 01:25:26.160 |
the optic nerve on the auditory cortex of a baby ferret, 01:25:33.400 |
So, you know, clearly there's learning taking place there. 01:25:37.980 |
So, you know, I think a lot of what people think 01:25:46.240 |
- So you put a lot of value in the power of learning. 01:25:49.960 |
What kind of things do you suspect might not be learned? 01:25:53.340 |
Is there something that could not be learned? 01:25:59.760 |
There are the things that, you know, make humans human 01:26:03.440 |
or make, you know, cats different from dogs, right? 01:26:07.400 |
It's the basic drives that are kind of hardwired 01:26:20.040 |
where the reward doesn't come from the external world. 01:26:24.600 |
Your own brain computes whether you're happy or not, right? 01:26:28.120 |
It measures your degree of comfort or discomfort. 01:26:38.760 |
So it's easier to learn when your objective is intrinsic. 01:26:48.760 |
The critic that makes long-term prediction of the outcome, 01:26:53.420 |
which is the eventual result of this, that's learned. 01:27:01.220 |
But let me take an example of, you know, why the critic, 01:27:04.200 |
I mean, an example of how the critic may be learned, right? 01:27:06.800 |
If I come to you, you know, I reach across the table 01:27:15.880 |
- I was expecting that the whole time, but yes, right. 01:27:26.780 |
And now your model of the world includes the fact that 01:27:44.020 |
you know, your predictor of your ultimate pain system 01:27:50.460 |
that predicts that something bad is gonna happen 01:28:00.600 |
So the fact that, you know, you're a school child, 01:28:04.440 |
you wake up in the morning and you go to school and, 01:28:07.000 |
you know, it's not because you necessarily like waking up 01:28:12.720 |
but you know that there is a long-term objective 01:28:15.840 |
- So Ernest Becker, I'm not sure if you're familiar 01:28:35.540 |
introspection that over the horizon is the end. 01:28:44.380 |
that just all these psychological experiments that show, 01:28:47.500 |
basically this idea that all of human civilization, 01:28:52.500 |
everything we create is kind of trying to forget 01:28:56.820 |
if even for a brief moment that we're going to die. 01:29:09.060 |
- I don't know at what point, I mean, it's a question, 01:29:12.500 |
like, you know, at what point do you realize that, 01:29:16.460 |
And I think most people don't actually realize 01:29:19.260 |
I mean, most people believe that you go to heaven 01:29:21.900 |
- So to push back on that, what Ernest Becker says 01:29:29.340 |
and I find those ideas a little bit compelling 01:29:31.660 |
is that there is moments in life, early in life, 01:29:36.540 |
when you are, when you do deeply experience the terror 01:29:41.540 |
of this realization and all the things you think about, 01:29:47.220 |
that we kind of think about more like teenage years 01:30:05.380 |
of the jungle, the woods, looking all around you, 01:30:12.180 |
I'm going to go back in the comfort of my mind 01:30:16.820 |
where there is a, maybe like, pretend I'm immortal 01:30:20.420 |
in however way, however kind of idea I can construct 01:30:28.660 |
You can delude yourself in all kinds of ways, 01:30:31.460 |
like lose yourself in the busyness of each day, 01:30:34.220 |
have little goals in mind, all those kinds of things 01:30:40.780 |
and it's gonna be sad, but you don't really understand 01:30:46.460 |
And I find that compelling because it does seem 01:30:55.180 |
we're able to really understand that this life is finite. 01:31:03.660 |
a qualitative difference between us and cats in the term. 01:31:09.240 |
a better long-term ability to predict in the long term, 01:31:14.240 |
and so we have a better understanding of how the world works, 01:31:17.380 |
so we have a better understanding of finiteness of life 01:31:21.020 |
- So we have a better planning engine than cats? 01:31:25.280 |
- But what's the motivation for planning that far? 01:31:30.160 |
of the fact that we have just a better planning engine 01:31:34.760 |
the essence of intelligence is the ability to predict. 01:31:37.400 |
And so because we're smarter, as a side effect, 01:31:41.200 |
we also have this ability to kind of make predictions 01:31:43.480 |
about our own future existence or lack thereof. 01:31:52.960 |
It makes people worry about what's gonna happen 01:31:57.480 |
If you believe that you just don't exist after death, 01:32:04.960 |
you don't worry about what happens after death? 01:32:17.760 |
and I would say I agree with him more than not, 01:32:27.880 |
there's still a deep worry of the mystery of it all. 01:32:31.760 |
Like, how does that make any sense that it just ends? 01:32:35.680 |
I don't think we can truly understand that this right, 01:32:39.720 |
I mean, so much of our life, the consciousness, the ego, 01:32:46.120 |
- Science keeps bringing humanity down from its pedestal. 01:32:54.720 |
- That's wonderful, but for us individual humans, 01:32:57.840 |
we don't like to be brought down from a pedestal. 01:33:01.720 |
- But see, you're fine with it because, well, 01:33:04.140 |
so what Ernest Becker would say is you're fine with it 01:33:06.360 |
because that's just a more peaceful existence for you, 01:33:09.560 |
You're hiding from, in fact, some of the people 01:33:12.000 |
that experience the deepest trauma earlier in life, 01:33:17.000 |
they often, before they seek extensive therapy, 01:33:21.080 |
It's like when you talk to people who are truly angry, 01:33:29.200 |
I had a very bad motorbike accident when I was 17. 01:33:40.460 |
- So I'm basically just playing a bit of a devil's advocate, 01:33:43.120 |
pushing back on wondering is it truly possible 01:33:47.660 |
And the flip side that's more interesting, I think, 01:33:49.700 |
for AI and robotics is how important is it to have this 01:33:57.160 |
is to not just avoid falling off the roof or something 01:34:07.160 |
If you listen to the Stoics, it's a great motivator. 01:34:16.900 |
So maybe to truly fear death or be cognizant of it 01:34:38.980 |
I mean, I think human nature and human intelligence 01:34:42.600 |
It's a scientific mystery, in addition to, you know, 01:34:48.580 |
but, you know, I'm a true believer in science. 01:34:50.860 |
So, and I do have kind of a belief that for complex systems 01:34:57.540 |
like the brain and the mind, the way to understand it 01:35:07.660 |
what's essential to it when you try to build it. 01:35:10.000 |
You know, the same way I've used this analogy before 01:35:12.420 |
with you, I believe, the same way we only started 01:35:18.640 |
building airplanes, and that helped us understand 01:35:21.340 |
So I think there's kind of a similar process here 01:35:25.480 |
where we don't have a theory, a full theory of intelligence, 01:35:29.660 |
but building, you know, intelligent artifacts 01:35:43.840 |
- So you're an interesting person to ask this question 01:35:53.100 |
What are your thoughts about kind of like the Turing 01:35:58.020 |
If we create an AI system that exhibits a lot of properties 01:36:06.400 |
how comfortable are you thinking of that entity 01:36:12.340 |
So you're trying to build now systems that have intelligence 01:36:23.380 |
- So how are you, are you okay calling a thing intelligent 01:36:32.700 |
from a pedestal of consciousness/intelligence? 01:36:39.500 |
more about human nature, human mind, and human intelligence 01:36:50.560 |
And if a consequence of this is to bring down humanity 01:36:54.480 |
one notch down from its already low pedestal, 01:37:04.980 |
opinions I have that a lot of people may disagree with. 01:37:14.220 |
so assuming that we are somewhat successful at some level 01:37:18.660 |
of getting machines to learn models of the world, 01:37:22.580 |
we build intrinsic motivation objective functions 01:37:30.060 |
that allows it to estimate the state of the world 01:37:32.780 |
and then have some way of figuring out a sequence of actions 01:37:35.460 |
that, you know, to optimize a particular objective. 01:37:38.000 |
If it has a critic of the type that I was describing before, 01:37:48.580 |
intelligent autonomous machine will have emotions. 01:37:58.980 |
that is driven by intrinsic motivation, by objectives, 01:38:03.120 |
if it has a critic that allows it to predict in advance 01:38:10.060 |
is going to be good or bad, it's going to have emotions. 01:38:14.300 |
- When it predicts that the outcome is going to be bad 01:38:18.140 |
and something to avoid, it's going to have fear; when it predicts a good outcome, elation. 01:38:34.460 |
And so it's going to have emotions about attachment 01:38:38.620 |
So I think, you know, the sort of sci-fi thing 01:38:46.900 |
like having an emotion chip that you can turn off, right? 01:39:00.040 |
like a civil rights movement for robots where, 01:39:06.460 |
like the Supreme Court, that particular kinds of robots, 01:39:29.580 |
that you could, you know, die and be restored. 01:39:33.740 |
Like, you know, you could be sort of, you know, 01:39:37.540 |
your brain could be reconstructed in its finest details. 01:39:40.740 |
Our ideas of rights will change in that case. 01:39:44.540 |
there's always a backup, you could always restore. 01:39:56.140 |
desire to do dangerous things like, you know, 01:40:14.140 |
or explore, you know, dangerous areas and things like that. 01:40:19.180 |
So now it's very likely that robots would be like that 01:40:22.380 |
because, you know, they'll be based on perhaps technology 01:40:27.060 |
that is somewhat similar to today's technology. 01:40:42.700 |
- And in fact, they made a game that's inspired by it. 01:40:49.260 |
- My three sons have a game design studio between them. 01:40:58.980 |
But so in Diablo, there's something called hardcore mode, 01:41:15.540 |
'cause they have to be integrated in human society, 01:41:18.380 |
they have to be able to die, no copies allowed. 01:41:25.260 |
like cloning will be illegal, even when it's possible. 01:41:29.940 |
I mean, you don't reproduce the mind of the person 01:41:36.420 |
- But then it's, but we were talking about with computers 01:41:52.320 |
that will destroy the motivation of the system. 01:41:55.980 |
- Okay, so let's say you have a domestic robot. 01:42:10.580 |
that makes it slightly different from the other robots 01:42:18.060 |
you've grown some attachment to it and vice versa. 01:42:25.900 |
Maybe it's a virtual assistant that lives in your, 01:42:29.380 |
you know, augmented reality glasses or whatever, right? 01:42:32.580 |
You know, the horror movie type thing, right? 01:42:39.620 |
the intelligence in that system is a bit like your child 01:42:47.100 |
there's a lot of you in that machine now, right? 01:42:53.500 |
you would do this for free if you want, right? 01:42:56.560 |
If it's your child, your child can, you know, 01:43:01.580 |
And, you know, the fact that they learn stuff from you 01:43:04.020 |
doesn't mean that you have any ownership of it, right? 01:43:09.380 |
perhaps you have some intellectual property claim about- 01:43:15.140 |
Oh, I thought you meant like a permanence value 01:43:21.700 |
So you would lose a lot if that robot were to be destroyed 01:43:26.460 |
You would lose a lot of investment, you know, 01:43:36.820 |
- But also you have like intellectual property rights 01:43:54.260 |
- And then there are issues of privacy, right? 01:43:55.660 |
Because now imagine that that robot has its own 01:43:59.700 |
kind of volition and decides to work for someone else 01:44:08.660 |
- Now, all the things that that system learned from you, 01:44:20.580 |
- I mean, that would be kind of an ethical question. 01:44:22.180 |
Like, you know, can you erase the mind of an intelligent 01:44:32.620 |
but that you don't have complete power over them. 01:44:36.780 |
Yeah, it's the problem with the relationships, you know, 01:44:39.020 |
that you break up, you can't erase the other human. 01:44:42.660 |
With robots, I think it will have to be the same thing 01:44:44.940 |
with robots, that risk, that there has to be some risk 01:44:50.300 |
to our interactions to truly experience them deeply, 01:44:56.140 |
So you have to be able to lose your robot friend 01:45:06.140 |
murder the robot to protect your private information? 01:45:10.300 |
- I have this intuition that for robots with certain, 01:45:19.220 |
let's call it sentient or something like that, 01:45:20.980 |
like this robot is designed for human interaction, 01:45:24.180 |
then you're not allowed to murder these robots. 01:45:28.180 |
- Well, but what about you do a backup of the robot 01:45:44.980 |
so this robot doesn't know anything about you anymore, 01:45:47.380 |
but you still have, technically it's still in existence 01:45:55.420 |
oh, sure, you can erase the mind of the robot 01:46:05.620 |
like, the robots and the humans are the same. 01:46:17.220 |
It's interesting for these, just like you said, 01:46:20.100 |
emotion seems to be a fascinatingly powerful aspect 01:46:24.180 |
of human-to-human interaction, human-robot interaction, 01:46:30.460 |
at the end of the day, that's probably going to 01:46:46.100 |
You asked about the Chinese room-type argument. 01:46:51.500 |
I think the Chinese room argument is a ridiculous one. 01:46:54.300 |
- So for people who don't know, Chinese room is, 01:47:24.300 |
you have this giant, nearly infinite lookup table. 01:47:38.940 |
do you think you can mechanize intelligence in some way, 01:47:52.140 |
which is, assuming you can reproduce intelligence 01:47:56.540 |
in sort of different hardware than biological hardware, 01:48:00.620 |
can you match human intelligence in all the domains 01:48:17.060 |
The answer to this, in my opinion, is an unqualified yes. 01:48:22.620 |
There's no question that machines, at some point, 01:48:32.180 |
regardless of what Elon and others have claimed or believed. 01:48:37.180 |
This is a lot harder than many of those guys think it is. 01:48:43.420 |
And many of those guys who thought it was simpler than that 01:48:47.460 |
now think it's hard because it's been five years 01:48:49.900 |
and they realize it's gonna take a lot longer. 01:48:53.420 |
That includes a bunch of people at DeepMind, for example. 01:48:57.020 |
I haven't actually touched base with the DeepMind folks, 01:49:12.820 |
'Cause you have to believe the impossible is possible 01:49:16.180 |
And there's, of course, a flip side to that coin, 01:49:24.300 |
But I mean, you have to inspire people, right, 01:49:28.740 |
So, you know, it's certainly a lot harder than we believe, 01:49:35.580 |
but there's no question in my mind that this will happen. 01:49:38.180 |
And now, you know, people are kind of worried about 01:49:42.460 |
They are gonna be brought down from their pedestal, 01:49:51.700 |
I mean, it's just gonna give more power, right? 01:49:53.460 |
It's an amplifier for human intelligence, really. 01:49:56.180 |
- So speaking of doing cool, ambitious things, 01:50:16.540 |
where does the newly minted meta AI fit into, 01:50:25.500 |
Yeah, FAIR was created almost exactly eight years ago. 01:50:39.460 |
that had about 12 engineers and a few scientists, 01:50:47.020 |
I ran it for three and a half years as a director, 01:50:52.300 |
and kind of set up the culture and organized it, 01:51:12.180 |
in the sense that FAIR has simultaneously produced 01:51:22.500 |
open source tools like PyTorch and many others. 01:51:29.820 |
or mostly indirect impact on Facebook at the time, 01:51:37.900 |
that meta is built around now are based on research projects 01:51:50.020 |
out of Facebook services now and meta more generally, 01:51:57.660 |
I mean, it's completely built around AI these days 01:52:03.900 |
So what happened after three and a half years 01:52:06.540 |
is that I changed role, I became chief scientist. 01:52:10.140 |
So I'm not doing day-to-day management of FAIR anymore. 01:52:21.380 |
I have, you know, my own kind of research group 01:52:23.220 |
working on self-supervised learning and things like this, 01:52:25.220 |
which I didn't have time to do when I was director. 01:52:28.140 |
So now FAIR is run by Joelle Pineau and Antoine Bordes. 01:52:33.820 |
Together, because FAIR is kind of split in two now, 01:52:37.820 |
which is sort of bottom-up, science-driven research 01:52:40.900 |
and FAIR Accel, which is slightly more organized 01:52:43.420 |
for bigger projects that require a little more kind of focus 01:52:47.660 |
and more engineering support and things like that. 01:52:49.740 |
So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel. 01:52:56.620 |
So there's no question that the leadership of the company 01:53:02.500 |
believes that this was a very worthwhile investment. 01:53:06.540 |
And what that means is that it's there for the long run. 01:53:11.540 |
Right, so there is, if you want to talk in these terms, 01:53:16.780 |
which I don't like, there's a business model, if you want, 01:53:19.540 |
where FAIR, despite being a very fundamental research lab, 01:53:27.860 |
Now, what happened three and a half years ago 01:53:31.540 |
when I stepped down was also the creation of Facebook AI, 01:53:41.700 |
but also has other organizations that are focused 01:53:46.260 |
on applied research or advanced development of AI technology 01:53:51.220 |
that is more focused on the products of the company. 01:53:59.740 |
of those organizations and people are awesome 01:54:06.380 |
but it serves as kind of a way to kind of scale up, 01:54:15.700 |
which may be very experimental and sort of lab prototypes 01:54:25.140 |
It'll just keep the F, nobody cares what the F stands for? 01:54:29.420 |
- We'll know soon enough, probably by the end of 2021. 01:54:34.420 |
- I guess it's not a giant change, MAIR, FAIR. 01:54:39.540 |
but the brand people are kind of deciding on this 01:54:45.860 |
and they tell us they're gonna come up with an answer 01:54:50.460 |
or whether we're gonna change just the meaning of the F. 01:54:54.180 |
I would keep FAIR and change the meaning of the F. 01:55:00.980 |
- Oh, that's good. - Fundamental AI research. 01:55:10.180 |
- And now meta AI is part of the reality lab. 01:55:15.180 |
So, you know, meta now, the new Facebook, right, 01:55:23.940 |
into, you know, Facebook, Instagram, WhatsApp, 01:55:40.460 |
It's kind of the, you can think of it as the sort of, 01:55:51.900 |
- Is that where the touch sensing for robots, 01:55:56.020 |
- But touch sensing for robots is part of FAIR, actually. 01:55:58.180 |
That's a FAIR product. - Oh, it is, okay, cool. 01:56:00.500 |
- This is also the, no, but there is the other way, 01:56:11.700 |
But by the way, the touch sensors is super interesting. 01:56:16.060 |
into the whole sensing suite is very interesting. 01:56:23.620 |
What do you think about this whole kind of expansion 01:56:27.740 |
of the view of the role of Facebook and meta in the world? 01:56:30.820 |
- Well, metaverse really should be thought of 01:56:40.260 |
make the experience more compelling of, you know, 01:56:44.060 |
being connected either with other people or with content. 01:56:49.420 |
And, you know, we are evolved and trained to evolve in, 01:56:57.260 |
we can see other people, we can talk to them when near them, 01:57:00.980 |
or, you know, and other people are far away can't hear us, 01:57:04.980 |
So there's a lot of social conventions that exist 01:57:08.580 |
in the real world that we can try to transpose. 01:57:32.140 |
is just basically a pair of glasses, you know, 01:57:34.300 |
and technology makes sufficient progress for that. 01:57:36.780 |
You know, AR is a much easier concept to grasp 01:57:43.180 |
augmented reality glasses that basically contain 01:57:53.460 |
With VR, you can completely detach yourself from reality, 01:58:06.500 |
Or like you can have objects that exist in the metaverse 01:58:09.300 |
that, you know, pop up on top of the real world 01:58:24.260 |
has been painted by the media as net negative for society, 01:58:30.820 |
You've pushed back against this, defending Facebook. 01:58:38.620 |
the company that is being described in some media 01:58:42.580 |
is not the company we know when we work inside. 01:58:56.540 |
I mean, I have a pretty good vision of what goes on. 01:58:58.660 |
You know, I don't know everything, obviously. 01:59:01.860 |
but certainly not in decision about like, you know, 01:59:06.100 |
but I have some decent vision of what goes on. 01:59:10.140 |
And this evil that is being described, I just don't see it. 01:59:13.660 |
And then, you know, I think there is an easy story to buy, 01:59:18.180 |
which is that, you know, all the bad things in the world, 01:59:21.740 |
and, you know, the reason your friend believe crazy stuff, 01:59:28.740 |
in social media in general, Facebook in particular. 01:59:35.460 |
Like, is it the case that Facebook, for example, 01:59:48.980 |
think of themselves less if they use Instagram more? 01:59:59.140 |
opposite sides in a debate or political opinion 02:00:02.700 |
if they are more on Facebook or if they are less? 02:00:05.700 |
And study after study shows that none of this is true. 02:00:12.420 |
They're not funded by Facebook or Meta, you know, 02:00:15.900 |
studies by Stanford, by some of my colleagues at NYU, 02:00:21.220 |
You know, there's a study recently, they paid people, 02:00:31.820 |
but they paid people to not use Facebook for a while 02:00:34.380 |
in the period before the anniversary of the Srebrenica 02:00:41.140 |
So, you know, people get riled up, like, should, you know, 02:00:45.460 |
I mean, a memorial kind of celebration for it or not. 02:00:48.700 |
So they paid a bunch of people to not use Facebook 02:00:52.580 |
And it turns out that those people ended up being 02:00:57.580 |
more polarized than they were at the beginning. 02:01:07.620 |
economists at Stanford that try to identify the causes 02:01:14.460 |
And it's been going on for 40 years before, you know, 02:01:26.100 |
So you could say if social media just accelerated, 02:01:28.100 |
but no, I mean, it's basically a continuous evolution 02:01:34.300 |
And then you compare this with other countries 02:01:54.700 |
you can find a scapegoat, but you can't find a cause. 02:02:04.900 |
people now are accusing Facebook of bad deeds 02:02:09.300 |
And those others are, we're not doing anything about them. 02:02:17.700 |
- So I should mention that I'm talking to Schrep, 02:02:20.060 |
Mike Schroepfer, on this podcast and also Mark Zuckerberg, 02:02:23.460 |
and probably these are conversations you can have with them. 02:02:27.620 |
even if Facebook has some measurable negative effect, 02:02:33.780 |
You have to consider about all the positive ways 02:02:39.620 |
- You can't just say like, there's an increase in division. 02:02:47.860 |
but you have to consider about how much information 02:02:51.100 |
Like I'm sure Wikipedia created more division 02:02:55.300 |
but you have to look at the full context of the world 02:02:59.100 |
- I mean, the printing press has created more division. 02:03:10.780 |
and that allowed people to read the Bible by themselves, 02:03:13.780 |
not get the message uniquely from priests in Europe. 02:03:20.340 |
and 200 years of religious persecution and wars. 02:03:23.660 |
So that's a bad side effect of the printing press. 02:03:29.340 |
but nobody would say the printing press was a bad idea. 02:03:35.100 |
and there's a lot of different incentives operating here. 02:03:40.020 |
since you're one of the top leaders at Facebook 02:03:42.660 |
and at Meta, sorry, that's in the tech space, 02:03:52.900 |
A lot of it probably is on the computer infrastructure, 02:03:54.980 |
the hardware, I mean, it's just a huge amount. 02:04:06.220 |
How much of it is flying all around doing business stuff 02:04:13.740 |
I mean, certainly in the run-up of the creation of FAIR 02:04:13.740 |
and for at least a year after that, if not more, 02:04:26.700 |
and was spending quite a lot of effort on it. 02:04:34.100 |
He read some of my papers, for example, before I joined. 02:05:00.180 |
which is a sense of wonder about science and technology. 02:05:13.220 |
So, I mean, they're very like, you know, very human people. 02:05:27.060 |
that he's painting in the press is just completely wrong. 02:05:31.940 |
So that's, I put some of that responsibility on him too. 02:05:36.180 |
You have to, it's like, you know, like the director, 02:05:57.700 |
And it's sad to see, I'll talk to him about it, 02:06:04.020 |
It's always sad to see folks sort of be there 02:06:07.500 |
for a long time and slowly, I guess time is sad. 02:06:11.220 |
- I think he's done the thing he set out to do. 02:06:21.460 |
And I understand, you know, after 13 years or something. 02:06:28.900 |
- Which in Silicon Valley is basically a lifetime. 02:06:34.980 |
- So in Europe, the conference just wrapped up. 02:07:00.580 |
what works and what doesn't about the review process? 02:07:04.980 |
I'll talk about the review process afterwards. 02:07:12.540 |
variance-invariance-covariance regularization. 02:07:14.900 |
And it's a technique, a non-contrastive learning technique 02:07:18.260 |
for what I call joint embedding architecture. 02:07:30.620 |
So if you want to do self-supervised learning, 02:07:35.140 |
So let's say you want to train a system to predict video, 02:07:40.260 |
and you train the system to predict the next, 02:07:51.580 |
you need to have, you need to handle this in some way. 02:08:19.460 |
I call this a generative latent variable model. 02:08:34.820 |
you also run those through another neural net. 02:08:48.660 |
and another one that looks at the continuation 02:08:52.380 |
And what you're trying to do is learn a representation 02:09:03.420 |
but is such that you can predict the representation 02:09:08.540 |
from the representation of the first one, easily. 02:09:15.300 |
and some stuff like that, but it doesn't matter. 02:09:23.100 |
of the two video clips that are mutually predictable. 02:09:27.460 |
What that means is that there's a lot of details 02:09:30.860 |
in the second video clips that are irrelevant. 02:09:33.140 |
You know, let's say a video clip consists in, 02:09:52.300 |
and where the tiles are ending and stuff like that, right? 02:09:56.340 |
that perhaps my representation will eliminate. 02:09:59.620 |
And so what I need is to train this second neural net 02:10:09.020 |
video clip varies over all the plausible continuations, 02:10:29.580 |
In the first way, you parametrize the prediction 02:10:38.380 |
you predict an abstract representation of pixels 02:10:40.660 |
and you guarantee that this abstract representation 02:10:43.460 |
has as much information as possible about the input, 02:10:47.020 |
drops all the stuff that you really can't predict, 02:10:50.540 |
I used to be a big fan of the first approach. 02:10:53.860 |
And in fact, in this paper with Ishan Mishra, 02:10:55.580 |
this blog post, the dark matter intelligence, 02:11:05.540 |
And it's because of a small collection of algorithms 02:11:10.020 |
that have been proposed over the last year and a half 02:11:13.220 |
or so, two years to do this, including VICReg, 02:11:24.540 |
And there's a bunch of others now that kind of work similarly. 02:11:29.580 |
So they're all based on this idea of joint embedding. 02:11:34.660 |
that is an approximation of mutual information. 02:11:36.620 |
Some others like BYOL work, but we don't really know why. 02:11:39.420 |
And there's been like lots of theoretical papers 02:11:47.820 |
but the important point is that we now have a collection 02:11:53.700 |
which I think is the best thing since sliced bread. 02:11:58.300 |
because I think it's our best shot for techniques 02:12:02.020 |
that would allow us to kind of build predictive world models 02:12:07.460 |
learn hierarchical representations of the world 02:12:09.900 |
where what matters about the world is preserved 02:12:14.420 |
- And by the way, the representations of before and after 02:12:22.300 |
- It would be either for a single image, for a sequence. 02:12:26.700 |
This could be applied to just about any signal. 02:12:28.540 |
I'm looking for methods that are generally applicable 02:12:32.940 |
that are not specific to one particular modality. 02:12:39.660 |
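To make those criteria concrete, here is a minimal PyTorch-style sketch of a VICReg-like loss on two batches of embeddings. The loss weights, embedding sizes, and the encoder that would produce z_a and z_b are illustrative assumptions for this sketch, not details taken from the conversation or the paper.

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Toy VICReg-style criterion on two batches of embeddings of shape (batch, dim).

    Invariance: embeddings of the two views should be mutually predictable.
    Variance: keep each dimension's std above a margin so the representation
    does not collapse to a constant.
    Covariance: decorrelate dimensions so information spreads across them.
    """
    n, d = z_a.shape

    # Invariance term: the two views should map to nearby representations.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge on the per-dimension standard deviation.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance term: penalize off-diagonal entries of the covariance matrix.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = ~torch.eye(d, dtype=torch.bool)
    cov_loss = cov_a[off_diag].pow(2).sum() / d + cov_b[off_diag].pow(2).sum() / d

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss

# Example: embeddings of two augmented views of the same batch.
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_style_loss(z_a, z_b)
```

The variance and covariance terms are what make this kind of method non-contrastive: nothing compares a pair against negative samples, and collapse is prevented by regularizing the statistics of the embeddings directly.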
This paper is what, is describing one such method? 02:12:45.700 |
the first author is a student called Adrien Bardes, 02:12:55.820 |
who's a professor at Ecole Normale Supérieure, 02:13:03.580 |
where PhD students can basically do their PhD in industry. 02:13:08.940 |
And this paper is a follow-up on this Barlow Twins paper 02:13:27.780 |
is that VICReg is not different enough from Barlow Twins. 02:13:39.860 |
And in the end, this is what people will use. 02:13:44.500 |
- But I'm used to stuff that I submit being rejected for a while. 02:13:48.980 |
- So it might be rejected and actually exceptionally well cited 02:13:52.140 |
- Well, it's already cited like a bunch of times. 02:13:54.340 |
- So, I mean, the question is then to the deeper question 02:14:00.220 |
I mean, computer science as a field is kind of unique 02:14:06.620 |
- And it's interesting because the peer review process 02:14:16.500 |
And it's a nice way to get stuff out quickly, 02:14:25.940 |
- But nevertheless, it has many of the same flaws 02:14:37.020 |
There's self-interested people that kind of can infer 02:14:42.060 |
who submitted it and kind of be cranky about it, 02:14:47.700 |
- Yeah, I mean, there's a lot of social phenomenon there. 02:14:53.180 |
because the field has been growing exponentially, 02:15:00.820 |
- So as a consequence, and that's just a consequence 02:15:04.860 |
So as the number of, as the size of the field 02:15:07.860 |
kind of starts saturating, you will have less 02:15:10.100 |
of that problem of reviewers being very inexperienced. 02:15:15.100 |
A consequence of this is that young reviewers, 02:15:24.620 |
and to make their life easy when reviewing a paper 02:15:27.460 |
is very simple, you just have to find a flaw in the paper. 02:15:30.540 |
So basically they see their task as finding flaws in papers 02:15:34.500 |
and most papers have flaws, even the good ones. 02:15:41.500 |
Your job is easier as a reviewer if you just focus on this. 02:15:46.420 |
But what's important is like, is there a new idea 02:15:54.120 |
It doesn't matter if the experiments are not that great, 02:15:56.240 |
if the protocol is, you know, so-so, you know, 02:16:00.680 |
things like that, as long as there is a worthy idea in it 02:16:05.040 |
that will influence the way people think about the problem, 02:16:08.080 |
even if they make it better, you know, eventually, 02:16:11.160 |
I think that's really what makes a paper useful. 02:16:19.480 |
creates a disease that has plagued, you know, 02:16:24.120 |
other fields in the past, like speech recognition, 02:16:26.640 |
where basically, you know, people chase numbers 02:16:28.520 |
on benchmarks and it's much easier to get a paper accepted 02:16:37.000 |
on a sort of mainstream, well-accepted method or problem. 02:16:47.860 |
Because industry, you know, strives on those kind of progress 02:16:52.340 |
but they're not the one that I'm interested in 02:16:59.260 |
kind of new advances generally don't make it. 02:17:05.260 |
And then there's open review type of situations where you, 02:17:08.820 |
and then, I mean, Twitter is a kind of open review. 02:17:11.620 |
I'm a huge believer that review should be done 02:17:21.200 |
it's already the present, but a growing future 02:17:25.320 |
and you're presenting an ongoing, continuous conference 02:17:43.420 |
- It's not a question of being elitist or not. 02:17:44.940 |
It's a question of being basically recommendation 02:17:49.940 |
and seal of approvals for people who don't see themselves 02:17:53.340 |
as having the ability to do so by themselves, right? 02:18:09.920 |
'Cause you don't have to like scrutinize the paper as much. 02:18:15.980 |
of sort of collective recommender system, right? 02:18:27.020 |
and we were about to create ICLR with Yoshua Bengio. 02:18:39.660 |
let's say archive or now could be open review. 02:18:48.120 |
you know, of a journal or a program committee 02:19:05.580 |
between a paper and a venue or reviewing entity. 02:19:20.320 |
which would be public, signed by the reviewing entity. 02:19:25.880 |
you know, it's one of the members of reviewing entity. 02:19:30.680 |
Lex Fridman's, you know, preferred papers, right? 02:19:33.700 |
You know, it's Lex Fridman writing the review. 02:19:36.700 |
So for me, that's a beautiful system, I think. 02:19:42.900 |
it feels like there should be a reputation system 02:19:59.340 |
it's an incentive for an individual person to do great. 02:20:09.240 |
But honestly, that's not a strong enough incentive 02:20:13.700 |
in finding the beautiful amidst the mistakes and the flaws 02:20:27.740 |
- That's a big part of my proposal, actually, 02:20:37.500 |
then your reputation should go up as a reviewing entity. 02:20:46.260 |
who was a master's student in library science 02:20:52.460 |
how that should work with formulas and everything. 02:20:58.580 |
- I mean, I've been sort of talking about this 02:21:23.820 |
published with a paper, which I think is very useful. 02:21:29.740 |
to kind of more of a conventional type conferences 02:21:59.660 |
Yeah, 'cause the communication of science broadly, 02:22:02.060 |
but the communication of computer science ideas 02:22:04.420 |
is how you make those ideas have impact, I think. 02:22:08.300 |
- Yeah, and I think, you know, a lot of this is 02:22:11.420 |
because people have in their mind kind of an objective, 02:22:24.860 |
But that comes at the expense of the progress of science. 02:22:34.420 |
- And we're not achieving fairness, you know, 02:22:38.060 |
we're doing, you know, a double-blind review, 02:22:46.700 |
- You write that the phenomenon of emergence, 02:22:49.340 |
collective behavior exhibited by a large collection 02:23:04.020 |
Do you think we understand how complex systems 02:23:16.020 |
You know, how is it that the universe around us 02:23:22.060 |
seems to be increasing in complexity and not decreasing? 02:23:25.140 |
I mean, that is a kind of curious property of physics 02:23:29.620 |
that despite the second law of thermodynamics, 02:23:32.340 |
we seem to be, you know, evolution and learning 02:23:35.940 |
and et cetera seems to be kind of at least locally 02:23:43.980 |
So perhaps the ultimate purpose of the universe 02:23:49.060 |
- Have these, I mean, small pockets of beautiful complexity. 02:23:57.100 |
do these kinds of emergence of complex systems 02:23:59.660 |
give you some intuition or guide your understanding 02:24:04.100 |
of machine learning systems and neural networks and so on? 02:24:06.660 |
Or are these for you right now, disparate concepts? 02:24:10.860 |
You know, I discovered the existence of the perceptron 02:24:18.540 |
by reading a book on, it was a debate between Chomsky 02:24:24.180 |
was kind of singing the praise of the perceptron 02:24:27.460 |
And I, the first time I heard about the learning machine, 02:24:33.540 |
which were basically transcription of, you know, 02:24:36.020 |
workshops or conferences from the 50s and 60s 02:24:42.140 |
So there were, there was a series of conferences 02:24:44.540 |
on self-organizing systems and these books on this. 02:24:50.180 |
at the internet archive, you know, the digital version. 02:24:53.220 |
And there are like fascinating articles in there by, 02:24:58.260 |
there's a guy whose name has been largely forgotten, 02:25:01.740 |
So it was a German physicist who immigrated to the US 02:25:06.180 |
and worked on self-organizing systems in the 50s. 02:25:14.420 |
he created the biological computer laboratory, BCL, 02:25:21.580 |
Unfortunately, that was kind of towards the end 02:25:27.660 |
but he wrote a bunch of papers about self-organization 02:25:35.620 |
imagine you are in space, there's no gravity, 02:25:43.820 |
with North Pole on one end, South Pole on the other end. 02:25:50.100 |
and probably form a complex structure, you know, 02:25:55.420 |
that could be an example of self-organization. 02:25:58.340 |
neural nets are an example of self-organization too, 02:26:05.900 |
how, like what is possible with this, you know, 02:26:11.940 |
in chaotic system and things like that, you know, 02:26:14.700 |
the emergence of life, you know, things like that. 02:26:24.660 |
the mathematics of emergence in some constrained situations 02:26:32.060 |
Like help us add a little spice to the systems 02:26:55.860 |
- And it's something also I've been fascinated by 02:27:03.900 |
So we don't actually have good ways of measuring, 02:27:06.940 |
or at least we don't have good ways of interpreting 02:27:11.940 |
Like how do you measure the complexity of something, right? 02:27:15.660 |
like, you know, Kolmogorov, Chaitin, Solomonoff complexity 02:27:18.540 |
of, you know, the length of the shortest program 02:27:22.460 |
can be thought of as the complexity of that bit string. 02:27:28.180 |
The problem with that is that that complexity 02:27:32.380 |
is defined up to a constant, which can be very large. 02:27:36.740 |
- There are similar concepts that are derived from, 02:27:45.580 |
is the negative log of its probability, essentially, right? 02:27:49.460 |
And you have a complete equivalence between the two things. 02:27:59.060 |
You need to have a model of the distribution. 02:28:07.940 |
with which you measure Kolmogorov complexity. 02:28:20.500 |
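Stated in symbols, and only as a recap of standard textbook results rather than anything derived in the conversation, the two caveats above look like this:

```latex
% Invariance theorem: switching the reference machine U to V changes the
% complexity by at most an additive constant, which can be very large.
\[
  \lvert K_U(x) - K_V(x) \rvert \;\le\; c_{U,V} \qquad \text{for all strings } x
\]
% Link to probability: for any computable model P of the data,
\[
  K_U(x) \;\le\; -\log_2 P(x) + c_P ,
\]
% so complexity behaves like a negative log-probability, up to a constant
% that depends on the model (or machine) you committed to.
```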
And so, you know, how can we come up with a good theory 02:28:25.580 |
if we don't have a good measure of complexity? 02:28:37.820 |
And the more interesting one is the alien one, 02:28:44.700 |
'Cause, you know, complexity, we associate complexity, 02:28:49.820 |
You know, we have to be able to like have concrete algorithms 02:28:55.700 |
for like measuring the level of complexity we see 02:29:00.780 |
in order to know the difference between life and non-life. 02:29:08.100 |
If I give you an image of the MNIST digits, right? 02:29:16.020 |
there is some, obviously some structure to it 02:29:30.980 |
to all the pixels, a fixed random permutation. 02:29:34.580 |
I show you those images, they will look, you know, 02:29:40.420 |
In fact, they're not more complex in absolute terms, 02:29:43.500 |
they're exactly the same as originally, right? 02:29:46.100 |
And if you knew what the permutation was, you know, 02:29:54.700 |
Now, all of a sudden, what looked complicated becomes simple. 02:30:00.900 |
humans on one end and then another race of aliens 02:30:03.820 |
that sees the universe with permutation glasses. 02:30:08.740 |
- What we perceive as simple to them is hardly complicated, 02:30:13.540 |
- Okay, and what they perceive as simple to us 02:30:25.780 |
- Depends what kind of algorithm you're running 02:30:28.380 |
- So I don't think we'll have a theory of intelligence, 02:30:31.140 |
self-organization, evolution, things like this 02:30:34.380 |
until we have a good handle on a notion of complexity, 02:30:40.860 |
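As a concrete illustration of the permutation-glasses point, here is a small Python sketch. The synthetic 28x28 image is a stand-in for an MNIST digit so the snippet runs without downloading anything, and the compressed size is only a crude proxy for a complexity measure.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an MNIST digit: a simple structured 28x28 image.
img = np.zeros((28, 28), dtype=np.uint8)
img[8:20, 8:20] = 255

# A fixed random permutation of the 784 pixel positions: the "permutation glasses".
perm = rng.permutation(img.size)
scrambled = img.flatten()[perm].reshape(img.shape)

# An off-the-shelf compressor finds the structured image easier to compress
# than the scrambled one, even though no information was added or removed.
print(len(zlib.compress(img.tobytes())))        # smaller: visible structure
print(len(zlib.compress(scrambled.tobytes())))  # larger: looks like noise to us

# Knowing the permutation, the original is recovered exactly, bit for bit:
# the "complexity" difference is in the observer's model, not in the data.
inverse = np.argsort(perm)
restored = scrambled.flatten()[inverse].reshape(img.shape)
assert np.array_equal(restored, img)
```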
- Yeah, it's sad to think that we might not be able 02:30:53.300 |
- This actually connects with fascinating questions 02:30:55.260 |
in physics at the moment, like modern physics, 02:30:58.140 |
quantum physics, like, you know, questions about, 02:31:00.300 |
like, you know, can we recover the information 02:31:02.580 |
that's lost in a black hole and things like this, right? 02:31:13.420 |
to build an expressive electronic wind instrument, EWI? 02:31:26.820 |
I like building things with combinations of electronics 02:31:32.460 |
You know, I have a bunch of different hobbies, 02:31:34.140 |
but, you know, probably my first one was little, 02:31:38.020 |
was building model airplanes and stuff like that. 02:31:42.740 |
I taught myself electronics before I studied it. 02:31:49.620 |
My cousin was an aspiring electronic musician 02:31:55.020 |
And I was, you know, basically modifying it for him 02:31:58.020 |
and building sequencers and stuff like that, right, for him. 02:32:02.620 |
- How's the interest in like progressive rock, like '80s? 02:32:11.100 |
But, you know, it's a combination of, you know, 02:32:39.500 |
And I played in an orchestra when I was in high school 02:32:48.060 |
a little bit of oboe, you know, things like that. 02:32:52.540 |
But I always wanted to play improvised music, 02:33:06.380 |
but, you know, you have wide variety of sound 02:33:13.100 |
going back to the late 80s from either Yamaha or Akai. 02:33:18.100 |
They're both kind of the main manufacturers of those. 02:33:25.700 |
But I've never been completely satisfied with them 02:33:54.340 |
You can hear it's Miles Davis playing the trumpet 02:34:04.780 |
The shape of the vocal tract kind of shapes the sound. 02:34:09.700 |
So how do you do this with an electronic instrument? 02:34:12.860 |
And I was, many years ago I met a guy called David Wessel. 02:34:26.140 |
And so I kept kind of thinking about this for many years. 02:34:28.620 |
And finally, because of COVID, you know, I was at home. 02:34:32.580 |
My workshop serves also as my kind of Zoom room 02:34:39.620 |
And I started really being serious about, you know, 02:34:45.780 |
- What else is going on in that New Jersey workshop? 02:34:50.860 |
Like just, or like left on the workshop floor, left behind? 02:34:57.580 |
electronics built with microcontrollers of various kinds 02:35:12.620 |
and he was building model airplanes when he was a kid. 02:35:33.020 |
- Do you also have an interest in appreciation of flight 02:35:36.100 |
in other forms, like with drones, quadcopters, or do you, 02:35:51.940 |
with gyroscopes and accelerometers for stabilization, 02:35:57.700 |
And then when it became kind of a standard thing 02:36:02.460 |
- Yeah, you were doing it before it was cool. 02:36:07.100 |
- What advice would you give to a young person today 02:36:10.020 |
in high school and college that dreams of doing 02:36:15.940 |
like let's talk in the space of intelligence, 02:36:20.940 |
some fundamental problem in space of intelligence, 02:36:26.180 |
being somebody who was a part of creating something special? 02:36:42.500 |
Like even like crazy big questions, like what's time? 02:36:51.460 |
And then learn basic things, like basic methods, 02:36:56.460 |
either from math, from physics or from engineering. 02:37:05.620 |
Like if you have a choice between like, you know, 02:37:08.740 |
learning, you know, mobile programming on iPhone 02:37:11.700 |
or quantum mechanics, take quantum mechanics. 02:37:20.420 |
And you may not, you may never be a quantum physicist, 02:37:29.140 |
It's the same formula that you use for, you know, 02:37:33.300 |
- So the ideas, the little ideas within quantum mechanics, 02:37:38.100 |
within some of these kind of more solidified fields 02:37:48.100 |
like you learn about Lagrangians, for example. 02:37:50.420 |
Which is like a hugely useful concept, you know, 02:37:57.300 |
Learn statistical physics, because all the math 02:38:01.660 |
that comes out of, you know, for machine learning, 02:38:10.940 |
So, and for some of them actually more recently, 02:38:16.100 |
who just got the Nobel prize for the replica method, 02:38:19.060 |
among other things, it's used for a lot of different things. 02:38:27.620 |
So, a lot of those kind of, you know, basic courses, 02:38:39.860 |
Again, something super useful is at the basis 02:38:44.900 |
which is an entirely new sub area of, you know, 02:39:00.420 |
Or to science that can help solve big problems in the world. 02:39:09.220 |
who started this project called Open Catalyst, 02:39:16.620 |
to help design new chemical compounds or materials 02:39:25.780 |
If you can efficiently separate oxygen from hydrogen 02:39:39.740 |
and you have them work all day, produce hydrogen, 02:39:43.420 |
and then you ship the hydrogen wherever it's needed. 02:39:53.420 |
that's, you know, can be transported anywhere. 02:39:59.700 |
energy storage technology, like producing hydrogen, 02:40:13.580 |
and the plasma is unstable, and you can't control it. 02:40:16.220 |
Maybe with deep learning, you can find controllers 02:40:19.100 |
and make, you know, practical fusion reactors. 02:40:33.900 |
in science and physics and biology and chemistry 02:40:41.540 |
- Right, I mean, there's properties of, you know, 02:40:48.540 |
So, you know, if we could design new, you know, 02:40:53.060 |
new materials, we could make more efficient batteries. 02:40:56.420 |
You know, we could make maybe faster electronics. 02:40:58.780 |
We could, I mean, there's a lot of things we can imagine 02:41:07.620 |
I mean, there's all kinds of stuff we can imagine. 02:41:09.500 |
If we had good fuel cells, hydrogen fuel cells, 02:41:13.620 |
and, you know, transportation wouldn't be, or cars, 02:41:20.300 |
CO2 emission problems for air transportation anymore. 02:41:29.180 |
And this is not even talking about all the sort of 02:41:32.420 |
medicine, biology, and everything like that, right? 02:41:38.100 |
figuring out, like, how can you design your proteins 02:41:40.540 |
that it sticks to another protein at a particular site, 02:41:42.820 |
because that's how you design drugs in the end. 02:41:47.580 |
all of this, and those are kind of, you know, 02:41:54.300 |
If you take, this is like from recent material physics, 02:41:58.260 |
you take a monoatomic layer of graphene, right? 02:42:10.340 |
you twist them by some magic number of degrees, 02:42:13.100 |
three degrees or something, it becomes superconductor. 02:42:22.460 |
but that's the kind of thing that machine learning 02:42:23.900 |
can actually discover, these kinds of things. 02:42:28.980 |
that with machine learning, we would train a system 02:42:40.380 |
where this collective phenomenon is too difficult 02:42:46.900 |
with the usual sort of reductionist type method. 02:42:59.180 |
after being trained with sufficiently many samples. 02:43:08.100 |
where he basically trained a convolutional net, 02:43:13.420 |
essentially, to predict the aerodynamic properties of solids. 02:43:17.980 |
And you can generate as much data as you want 02:43:19.620 |
by just running computational fluid dynamics, right? 02:43:40.060 |
train a neural net to make those predictions, 02:43:41.780 |
and now what you have is a differentiable model 02:43:58.260 |
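A minimal sketch of that pattern, with a toy function standing in for the fluid-dynamics solver (the real project, its data, and its architecture are not shown here): fit a convolutional net on solver-generated samples, then use the fact that the net is differentiable to optimize a design by gradient descent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def toy_solver(shapes):
    """Stand-in for an expensive CFD run: maps a (B, 1, 32, 32) shape mask
    to one scalar per sample. A real setup would call a fluid simulator."""
    weights = torch.linspace(0.0, 1.0, shapes.shape[-1])
    return (shapes * weights).mean(dim=(-1, -2))      # shape (B, 1)

# 1) Train a convolutional surrogate on solver-generated data.
surrogate = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 32 * 32, 1),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(200):
    shapes = torch.rand(64, 1, 32, 32)                # generate as much data as you want
    targets = toy_solver(shapes)
    loss = F.mse_loss(surrogate(shapes), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) The surrogate is differentiable, so a design can be optimized through it.
for p in surrogate.parameters():
    p.requires_grad_(False)                           # freeze the surrogate
design = torch.rand(1, 1, 32, 32, requires_grad=True)
design_opt = torch.optim.Adam([design], lr=0.05)
for _ in range(100):
    predicted_cost = surrogate(design.clamp(0, 1)).mean()
    design_opt.zero_grad()
    predicted_cost.backward()
    design_opt.step()
```

The two-step recipe, surrogate first, gradient-based design search second, is what makes a learned emulator more useful than a lookup table of simulation runs.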
probably you should read a little bit of literature 02:44:01.420 |
and a little bit of history for inspiration and for wisdom, 02:44:18.380 |
I'm really honored that you would talk with me today. 02:44:26.220 |
after all these years about everything that's going on. 02:44:28.780 |
You're a beacon of hope for the machine learning community. 02:44:32.700 |
for spending your valuable time with me today. 02:44:37.780 |
- Thanks for listening to this conversation with Yann LeCun. 02:44:42.780 |
please check out our sponsors in the description. 02:44:49.580 |
"Your assumptions are your windows on the world.