Yann LeCun: Dark Matter of Intelligence and Self-Supervised Learning | Lex Fridman Podcast #258
Chapters
0:00 Introduction
0:36 Self-supervised learning
10:55 Vision vs language
16:46 Statistics
22:33 Three challenges of machine learning
28:22 Chess
36:25 Animals and intelligence
46:09 Data augmentation
67:29 Multimodal learning
79:18 Consciousness
84:03 Intrinsic vs learned ideas
88:15 Fear of death
96:07 Artificial Intelligence
109:56 Facebook AI Research
126:34 NeurIPS
142:46 Complexity
151:11 Music
156:06 Advice for young people
00:00:00.000 |
The following is a conversation with Yann LeCun, 00:00:04.540 |
He is the chief AI scientist at Meta, formerly Facebook, 00:00:15.620 |
of machine learning and artificial intelligence, 00:00:29.980 |
in the description, and now, here's my conversation 00:00:37.540 |
"Self-supervised learning, the dark matter of intelligence." 00:00:43.720 |
So let me ask, what is self-supervised learning, 00:00:46.640 |
and why is it the dark matter of intelligence? 00:00:59.860 |
that we currently are not reproducing properly 00:01:04.660 |
So the most popular approaches to machine learning today 00:01:09.660 |
are supervised learning and reinforcement learning. 00:01:21.820 |
a ridiculously large number of trial and errors 00:01:29.340 |
And that's why we don't have self-driving cars. 00:01:45.500 |
with reinforcement learning, you have to have 00:01:50.220 |
such that you can do that large-scale kind of learning 00:02:02.300 |
whereas even with millions of hours of simulated practice, 00:02:10.700 |
And so obviously we're missing something, right? 00:02:13.900 |
And it's quite obvious for a lot of people that, 00:02:16.420 |
you know, the immediate response you get from many people 00:02:19.520 |
is, well, you know, humans use their background knowledge 00:02:25.820 |
Now, how was that background knowledge acquired? 00:02:32.380 |
how do babies in their first few months of life 00:02:43.820 |
That may be the basis of what we call common sense. 00:02:47.940 |
This type of learning, it's not learning a task, 00:02:53.620 |
it's just observing the world and figuring out how it works. 00:02:57.240 |
Building world models, learning world models. 00:03:10.220 |
at trying to reproduce this kind of learning. 00:03:13.020 |
- Okay, so you're looking at just observation, 00:03:18.620 |
It's just sitting there watching mom and dad walk around, 00:03:23.420 |
- That's what you mean by background knowledge. 00:03:29.980 |
- Just having eyes open or having eyes closed, 00:03:37.820 |
And you're saying in order to learn to drive, 00:03:43.100 |
like the reason humans are able to learn to drive quickly, 00:03:48.660 |
they were able to watch cars operate in the world 00:03:53.580 |
the physics of basic objects, all that kind of stuff. 00:03:57.420 |
you don't even know, you don't even need to know, 00:04:08.100 |
because of your understanding of intuitive physics, 00:04:17.580 |
and nothing good will come out of this, right? 00:04:30.500 |
thousands of times before you figure out it's a bad idea. 00:04:42.500 |
- So self-supervised learning still has to have 00:04:45.820 |
some source of truth being told to it by somebody. 00:04:50.100 |
- So you have to figure out a way without human assistance 00:04:54.540 |
or without significant amount of human assistance 00:04:59.100 |
So the mystery there is how much signal is there, 00:05:03.980 |
how much truth is there that the world gives you, 00:05:08.180 |
like you watch YouTube or something like that, 00:05:16.300 |
There is way more signal in sort of a self-supervised setting 00:05:32.340 |
where when you try to figure out how much information 00:05:37.820 |
and how much feedback you give the machine at every trial, 00:05:43.300 |
you tell the machine you did good, you did bad, 00:05:45.340 |
and you only tell this to the machine once in a while. 00:05:57.100 |
you cannot possibly learn something very complicated 00:06:01.060 |
where you get many, many feedbacks of this type. 00:06:04.700 |
Supervised learning, you give a few bits to the machine 00:06:10.180 |
Let's say you're training a system on, you know, 00:06:17.660 |
that's a little less than 10 bits of information per sample. 00:06:20.900 |
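(As a quick check on that figure: if the task is classifying among, say, 1,000 categories, a label carries at most log2(1000) ≈ 9.97 bits, i.e. a little less than 10 bits per sample; compare that with an occasional scalar reward in reinforcement learning, or an entire video clip in the self-supervised setting.)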
But self-supervised learning here is a setting. 00:06:26.340 |
but ideally you would show a machine a segment of video 00:06:31.340 |
and then stop the video and ask the machine to predict 00:06:46.300 |
learn to do a better job at predicting next time around. 00:06:49.340 |
There's a huge amount of information you give the machine 00:06:51.500 |
because it's an entire video clip of, you know, 00:07:02.820 |
there's a subtle, seemingly trivial construction, 00:07:17.780 |
it is possible you could solve all of intelligence 00:07:43.860 |
Do you think it's possible that formulation alone, 00:07:50.980 |
can solve intelligence for vision and language? 00:07:53.620 |
- I think that's our best shot at the moment. 00:07:59.820 |
you know, human level intelligence or something, 00:08:07.340 |
that people have proposed, I think it's our best shot. 00:08:09.500 |
So I think this idea of an intelligence system 00:08:14.500 |
filling in the blanks, either predicting the future, 00:08:18.860 |
inferring the past, filling in missing information, 00:08:22.180 |
I'm currently filling the blank of what is behind your head 00:08:30.580 |
because I have basic knowledge about how humans are made. 00:08:35.660 |
what are you gonna say, at which point you're gonna speak, 00:08:37.260 |
whether you're gonna move your head this way or that way, 00:08:40.260 |
But I know you're not gonna just dematerialize 00:08:44.940 |
because I know what's possible and what's impossible, 00:08:50.900 |
- So you have a model of what's possible and what's impossible 00:08:53.260 |
and then you'd be very surprised if it happens, 00:08:55.100 |
and then you'll have to reconstruct your model. 00:08:59.620 |
It's what tells you what fills in the blanks. 00:09:04.460 |
about the state of the world, given by your perception, 00:09:08.060 |
your model of the world fills in the missing information. 00:09:15.220 |
filling in things you don't immediately perceive. 00:09:18.380 |
- And that doesn't have to be purely generic vision 00:09:25.820 |
like predicting what control decision you make 00:09:31.580 |
You have a sequence of images from a vehicle, 00:09:38.380 |
if you record it on video, where the car ended up going. 00:09:41.780 |
So you can go back in time and predict where the car went 00:09:49.420 |
- Right, but the question is whether we can come up 00:09:51.460 |
with sort of a generic method for training machines 00:09:56.460 |
to do this kind of prediction or filling in the blanks. 00:10:05.540 |
in the context of natural language processing. 00:10:13.660 |
You show it a sequence of words, you remove 10% of them, 00:10:22.660 |
you can use the internal representation learned by it 00:10:26.540 |
as input to something that you trained, supervised, 00:10:33.300 |
Not so successful in images, although it's making progress. 00:10:37.500 |
And it's based on sort of manual data augmentation. 00:10:43.460 |
But what has not been successful yet is training from video. 00:10:47.140 |
So getting a machine to learn to represent the visual world, 00:10:54.740 |
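As a rough illustration of the masked-word scheme he describes for text, here is a minimal sketch (not the actual BERT code; the toy transformer, vocabulary size, and 15% masking rate are assumptions made purely for illustration):

```python
# Fill-in-the-blanks pretraining: hide some words, predict them from context.
import torch
import torch.nn as nn

vocab_size, d_model, mask_id = 10_000, 256, 0     # toy values
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
        num_layers=2))
head = nn.Linear(d_model, vocab_size)             # a score for every word in the dictionary

tokens = torch.randint(1, vocab_size, (8, 32))    # a batch of token sequences
mask = torch.rand(tokens.shape) < 0.15            # remove ~15% of the words
corrupted = tokens.masked_fill(mask, mask_id)

logits = head(model(corrupted))                   # (batch, seq, vocab) scores
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # predict only the masked words
loss.backward()
```

The internal representation computed by `model` is what would then be reused as input to a downstream supervised task.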
- Okay, well, let's kind of give a high-level overview. 00:10:57.460 |
What's the difference in kind and in difficulty 00:11:03.900 |
So you said people haven't been able to really kind of crack 00:11:15.820 |
Maybe like when we're talking about achieving, 00:11:18.660 |
like passing the Turing test in the full spirit 00:11:22.260 |
of the Turing test in language might be harder than vision. 00:11:40.180 |
that make them look essentially like the same cake, 00:11:44.740 |
And the main issue with learning world models 00:11:55.860 |
because the world is not entirely predictable. 00:12:00.700 |
We can get into the philosophical discussion about it, 00:12:11.740 |
and then I ask you to predict what's going to happen next, 00:12:20.540 |
with the interval of time that you're asking the system 00:12:26.460 |
And so one big question with self-supervised learning 00:12:32.300 |
how you represent multiple discrete outcomes, 00:12:40.380 |
And if you are a sort of a classical machine learning person, 00:12:45.180 |
you say, "Oh, you just represent a distribution." 00:12:47.580 |
And that we know how to do when we're predicting words, 00:12:53.660 |
because you can have a neural net give a score 00:12:58.580 |
It's a big list of numbers, maybe 100,000 or so. 00:13:02.420 |
And you can turn them into a probability distribution 00:13:12.300 |
There are only a few words that make sense there. 00:13:15.820 |
It could be a mouse or it could be a laser spot 00:13:21.540 |
And if I say the blank is chasing the blank in the savanna, 00:13:33.620 |
that you can refer to to sort of fill in those blanks. 00:13:44.460 |
you cannot know if it's a zebra or a gnu or whatever, 00:14:09.980 |
a sort of infinite number of plausible continuations 00:14:13.540 |
of multiple frames in a high dimensional continuous space. 00:14:17.460 |
And we just have no idea how to do this properly. 00:14:26.220 |
they try to get it down to a small finite set 00:14:31.220 |
of like under a million, something like that. 00:14:39.020 |
of every single possible word for language and it works. 00:14:42.900 |
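Concretely, the discrete trick being described looks like the snippet below (illustrative numbers only). The point of the surrounding discussion is that no analogous finite list exists to score and normalize over when the thing to predict is a high-dimensional continuous video frame:

```python
import torch

scores = torch.randn(100_000)           # one score per word in the dictionary
probs = torch.softmax(scores, dim=0)    # exp(s_i) / sum_j exp(s_j)
print(probs.sum())                      # 1.0 -- a proper probability distribution
print(torch.topk(probs, k=5).indices)   # the few words that "make sense there"
```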
It feels like that's a really dumb way to do it. 00:14:58.900 |
about how to represent all of reality in a compressed way 00:15:01.860 |
such that you can form a distribution over it? 00:15:03.780 |
- That's one of the big questions, how do you do that? 00:15:25.640 |
those distributions are essentially independent 00:15:28.500 |
And you don't pay too much of a price for this. 00:15:38.900 |
if it gives a certain probability for a lion and a cheetah, 00:15:43.620 |
and then a certain probability for gazelle, wildebeest, 00:15:54.780 |
And it's not the case that those things are independent. 00:15:58.020 |
Lions actually attack like bigger animals than cheetahs. 00:16:01.440 |
So, there's a huge independence hypothesis in this process, 00:16:07.780 |
The reason for this is that we don't know how to represent 00:16:10.860 |
properly distributions over combinatorial sequences 00:16:33.380 |
latent representation of text that would say that, 00:16:40.660 |
lion for cheetah, I also have to switch zebra for gazelle. 00:16:48.720 |
let me throw some criticism at you that I often hear 00:16:52.940 |
So this kind of filling in the blanks is just statistics. 00:17:07.540 |
such that you can use it to generalize about the world. 00:17:35.580 |
Is it possible that intelligence is just statistics? 00:17:53.380 |
So if the criticism comes from people who say, 00:17:56.220 |
current machine learning system don't care about causality, 00:17:59.420 |
which by the way is wrong, I agree with them. 00:18:03.100 |
Your model of the world should have your actions 00:18:09.100 |
and that will drive you to learn causal models of the world 00:18:11.420 |
where you know what intervention in the world 00:18:16.700 |
or you can do this by observation of other agents 00:18:19.420 |
acting in the world and observing the effect, 00:18:35.180 |
that have deep mechanistic explanation for what goes on. 00:18:44.420 |
Because a lot of people who actually voice their criticism 00:19:16.340 |
it seems like when we think about what is intelligence, 00:19:25.580 |
like concepts of memory and reasoning module, 00:19:43.580 |
just like we ignore the way the operating system works, 00:19:52.740 |
the neural network might be doing something like statistics. 00:20:00.580 |
but doing this kind of fill in the gap kind of learning 00:20:03.340 |
and just kind of updating the model constantly 00:20:05.740 |
in order to be able to support the raw sensory information, 00:20:09.260 |
to predict it and then adjust to the prediction 00:20:12.420 |
But like when we look at our brain at the high level, 00:20:23.700 |
and we're putting them into long-term memory. 00:20:30.200 |
which is this kind of simple, large neural network 00:20:36.020 |
- Right, well, okay, so there's a lot of questions 00:20:40.620 |
there's a whole school of thought in neuroscience, 00:20:47.800 |
which is really related to the idea I was talking about 00:20:53.580 |
the essence of intelligence is the ability to predict 00:20:56.360 |
and everything the brain does is trying to predict 00:21:02.140 |
Okay, and that's really sort of the underlying principle, 00:21:07.820 |
is trying to kind of reproduce this idea of prediction 00:21:21.140 |
And of course, we all think about trying to reproduce 00:21:28.340 |
but with machines, we're not even at the level 00:21:30.420 |
of even reproducing the learning processes in a cat brain. 00:21:39.020 |
don't have as much common sense as a house cat. 00:21:49.580 |
They certainly have, because many cats can figure out 00:21:53.620 |
how they can act on the world to get what they want. 00:21:56.660 |
They certainly have a fantastic model of intuitive physics, 00:22:04.620 |
but also of prey and things like that, right? 00:22:09.940 |
They only do this with about 800 million neurons. 00:22:21.340 |
let's not even worry about the high-level cognition 00:22:26.340 |
and long-term planning and reasoning that humans can do 00:22:32.500 |
Now, that said, this ability to learn world models, 00:22:44.340 |
there are three main challenges in machine learning. 00:23:01.600 |
because this is what deep learning is all about, really. 00:23:04.280 |
And the third one is something we have no idea how to solve, 00:23:09.480 |
is can we get machines to learn hierarchical representations 00:23:18.780 |
to learn hierarchical representations of perception, 00:23:22.240 |
with convolutional nets and things like that, 00:23:23.680 |
and transformers, but what about action plans? 00:23:28.320 |
good hierarchical representations of actions? 00:23:32.440 |
- Yeah, all of that needs to be somewhat differentiable 00:23:35.920 |
so that you can apply sort of gradient-based learning, 00:23:40.920 |
- So it's background knowledge, ability to reason 00:23:55.480 |
or builds on top of that background knowledge, 00:23:59.120 |
be able to make hierarchical plans in the world. 00:24:05.480 |
there's something in classical optimal control 00:24:13.840 |
NASA uses that to compute trajectories of rockets. 00:24:16.840 |
And the basic idea is that you have a predictive model 00:24:25.440 |
which, given the state of the system at time t, 00:24:28.360 |
and given an action that you're taking on the system, 00:24:39.560 |
So basically a differential equation, something like that. 00:24:51.600 |
that you can back propagate gradient through, 00:24:53.600 |
you can do what's called model predictive control, 00:24:58.280 |
So you have, you can unroll that model in time, 00:25:03.280 |
you feed it a hypothesized sequence of actions, 00:25:13.560 |
that measures how well, at the end of the trajectory, 00:25:16.040 |
the system has succeeded or matched what you want it to do. 00:25:21.200 |
Have you grasped the object you want to grasp? 00:25:35.120 |
you can figure out what is the optimal sequence of actions 00:25:39.080 |
that will get my system to the best final state. 00:25:52.840 |
And you can think of this as a form of reasoning. 00:25:56.160 |
So to take the example of the teenager driving a car again, 00:26:00.800 |
you have a pretty good dynamical model of the car, 00:26:04.200 |
but you know, again, that if you turn the wheel 00:26:18.560 |
So you can sort of imagine different scenarios 00:26:21.000 |
and then employ or take the first step in the scenario 00:26:27.800 |
That's called receding horizon model predictive control. 00:26:40.040 |
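A minimal sketch of that recipe, assuming a differentiable dynamics model and a terminal cost (both toy, hand-written placeholders here, not anything NASA would use):

```python
# Model predictive control: unroll the model in time, score the final state,
# and optimize the hypothesized action sequence by backpropagating through the rollout.
import torch

def dynamics(state, action):               # x_{t+1} = f(x_t, u_t); toy linear model
    return state + 0.1 * action

def terminal_cost(state, goal):             # how far the final state is from the goal
    return ((state - goal) ** 2).sum()

state0, goal, horizon = torch.zeros(2), torch.tensor([1.0, -1.0]), 20
actions = torch.zeros(horizon, 2, requires_grad=True)
opt = torch.optim.SGD([actions], lr=0.5)

for _ in range(100):
    opt.zero_grad()
    state = state0
    for t in range(horizon):                # unroll the model over the horizon
        state = dynamics(state, actions[t])
    loss = terminal_cost(state, goal)
    loss.backward()                         # gradients flow back through the whole rollout
    opt.step()

# Receding horizon: execute actions[0], observe the real world, then re-plan.
```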
the model of the world is not generally learned. 00:26:42.440 |
There's, you know, sometimes a few parameters 00:26:47.080 |
But generally the model is mostly deterministic 00:26:55.640 |
I think the big challenge of AI for the next decade 00:26:58.720 |
is how do we get machines to run predictive models 00:27:03.680 |
and deal with the real world in all this complexity. 00:27:10.200 |
It's not even just the trajectory of a robot arm, 00:27:14.880 |
careful mathematics, but it's everything else, 00:27:17.160 |
everything we observe in the world, you know, 00:27:22.960 |
that involve collective phenomena like water or, you know, 00:27:27.960 |
trees and, you know, branches in a tree or something, 00:27:35.040 |
humans have no trouble developing abstract representations 00:27:39.840 |
but we still don't know how to do with machines. 00:27:52.960 |
to the dynamic nature of the world, the environment, 00:28:03.400 |
into the hierarchical representation of action in your view? 00:28:07.480 |
It's just that now your model of the world has to deal with, 00:28:11.360 |
you know, it just makes it more complicated, right? 00:28:17.240 |
that makes your model of the world much more complicated, 00:28:28.120 |
I mean, there's a, I go, you go, I go, you go. 00:28:48.440 |
of what the ontology of what defines a car door, 00:28:57.320 |
they're trying to get out, like here in New York, 00:29:01.440 |
You slowing down is going to signal something. 00:29:16.960 |
I mean, I guess you can integrate all of them 00:29:20.360 |
like the entirety of these little interactions. 00:29:30.000 |
- Well, in some ways it's way more complicated than chess 00:29:43.680 |
This is the kind of problem we've evolved to solve. 00:29:55.720 |
In fact, that's why we designed it as a game, 00:29:59.040 |
And if there is something that recent progress 00:30:05.600 |
is that humans are really terrible at those things, 00:30:16.720 |
behind an ideal player that they would call God. 00:30:19.640 |
In fact, no, there are like nine or 10 stones behind, 00:30:27.400 |
and it's because we have limited working memory. 00:30:30.360 |
We're not very good at doing this tree exploration 00:30:32.960 |
that computers are much better at doing than we are, 00:30:37.960 |
at learning differentiable models of the world. 00:30:47.480 |
but in the sense that our brain has some mechanism 00:30:56.520 |
So if you have an agent that consists of a model 00:31:04.360 |
is basically the entire front half of your brain, 00:31:14.400 |
There is your sort of intrinsic motivation module, 00:31:20.080 |
That's the thing that measures pain and hunger 00:31:22.480 |
and things like that, like immediate feelings and emotions. 00:31:30.720 |
of what people in reinforcement learning call a critic, 00:31:32.560 |
which is a sort of module that predicts ahead 00:31:54.640 |
your cost function, your critic, your world model, 00:32:03.080 |
to do planning, to do reasoning, to do learning, 00:32:15.360 |
That's probably at the core of what can solve intelligence. 00:32:18.400 |
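To make that module list concrete, here is a schematic, entirely hypothetical sketch of how a world model, an intrinsic-cost module, and a critic might fit together to score an imagined course of action (all three networks are toy placeholders):

```python
import torch
import torch.nn as nn

world_model = nn.Linear(8 + 2, 8)      # predicts the next state from (state, action)
intrinsic_cost = nn.Linear(8, 1)       # hardwired "discomfort" of a state (pain, hunger, ...)
critic = nn.Linear(8, 1)               # learned prediction of long-term intrinsic cost

def imagined_cost(state, actions):
    """Roll the world model forward over a hypothesized action sequence."""
    total = 0.0
    for a in actions:
        state = world_model(torch.cat([state, a]))
        total = total + intrinsic_cost(state)
    return total + critic(state)        # immediate costs plus predicted long-term cost

state = torch.randn(8)
candidate = [torch.randn(2) for _ in range(5)]
print(imagined_cost(state, candidate))  # lower is better; optimize as in the planning sketch above
```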
So you don't need like a logic-based reasoning in your view. 00:32:23.400 |
- I don't know how to make logic-based reasoning 00:32:43.280 |
proceed by optimizing some objective function. 00:32:49.920 |
does learning in the brain minimize an objective function? 00:33:00.320 |
Second, if it does optimize an objective function, 00:33:04.640 |
does it do it by some sort of gradient estimation? 00:33:10.880 |
but some way of estimating the gradient in an efficient manner 00:33:14.840 |
whose complexity is on the same order of magnitude 00:33:20.760 |
'Cause you can't afford to do things like, you know, 00:33:29.640 |
you can't do sort of estimating gradient by perturbation. 00:33:39.200 |
zeroth-order black box gradient-free optimization 00:33:46.280 |
So it has to have a way of estimating gradient. 00:33:49.240 |
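The scaling argument can be seen in a toy comparison: a perturbation-style (finite-difference) gradient estimate needs roughly one extra function evaluation per parameter, hopeless at the scale of billions of synapses, whereas backpropagation recovers every partial derivative in about one backward pass. A sketch, with an arbitrary quadratic objective standing in for the real thing:

```python
import numpy as np

def finite_difference_grad(f, w, eps=1e-5):
    """O(len(w)) evaluations of f -- one perturbation per parameter."""
    grad = np.zeros_like(w)
    base = f(w)
    for i in range(len(w)):
        w_eps = w.copy()
        w_eps[i] += eps
        grad[i] = (f(w_eps) - base) / eps
    return grad

f = lambda w: float(np.dot(w, w))          # f(w) = ||w||^2, true gradient is 2w
w = np.array([1.0, -2.0, 3.0])
print(finite_difference_grad(f, w))        # ~[2, -4, 6], at the cost of len(w)+1 evaluations
```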
- Is it possible that some kind of logic-based reasoning 00:33:52.760 |
emerges in pockets as a useful, like you said, 00:33:58.080 |
maybe it's a mechanism for creating objective functions. 00:34:01.280 |
It's a mechanism for creating knowledge bases, for example, 00:34:08.360 |
Like, maybe it's like an efficient representation 00:34:10.240 |
of knowledge that's learned in a gradient-based way 00:34:13.760 |
- Well, so I think there is a lot of different types 00:34:17.280 |
So first of all, I think the type of logical reasoning 00:34:23.040 |
maybe stemming from, you know, sort of classical AI 00:34:34.640 |
- But we judge each other based on our ability 00:34:45.160 |
- Yes, I'm judging you this whole time because, 00:34:58.960 |
So, you know, but I think perhaps another type 00:35:03.960 |
of intelligence that I have is this, you know, 00:35:07.560 |
ability of sort of building models to the world from, 00:35:11.000 |
you know, reasoning, obviously, but also data. 00:35:15.640 |
And those models generally are more kind of analogical, 00:35:19.440 |
So it's reasoning by simulation and by analogy, 00:35:23.880 |
where you use one model to apply to a new situation, 00:35:26.840 |
even though you've never seen that situation, 00:35:38.360 |
So you're kind of simulating what's happening 00:35:49.560 |
Are you going to use, you know, screws and nails or whatever? 00:35:55.720 |
and sort of interact with that person, you know, 00:35:59.480 |
having this model in mind to kind of tell the person 00:36:05.240 |
So I think this ability to construct models of the world 00:36:13.840 |
And the ability to use it then to plan actions 00:36:25.440 |
- So I'm going to ask you a series of impossible questions 00:36:30.160 |
So if that's the fundamental sort of dark matter 00:36:33.400 |
of intelligence, this ability to form a background model, 00:36:36.560 |
what's your intuition about how much knowledge is required? 00:36:52.640 |
How much information do you think is required 00:36:59.920 |
So you have to be able to, when you see a box, 00:37:02.160 |
go in it, when you see a human compute the most evil action, 00:37:06.240 |
if there's a thing that's near an edge, you knock it off. 00:37:09.600 |
All of that, plus the extra stuff you mentioned, 00:37:12.720 |
which is a great self-awareness of the physics 00:37:18.740 |
How much knowledge is required, do you think, to solve it? 00:37:21.600 |
I don't even know how to measure an answer to that question. 00:37:26.680 |
but whatever it is, it fits in about 800 million neurons. 00:37:35.380 |
- Everything, all knowledge, everything, right? 00:37:41.460 |
A dog is two billion, but a cat is less than one billion. 00:38:03.300 |
although it's not even clear how supervised learning 00:38:08.100 |
So I think almost all of it is self-supervised learning, 00:38:12.860 |
but it's driven by the sort of ingrained objective functions 00:38:17.860 |
that a cat or a human have at the base of their brain, 00:38:52.460 |
So hunger is just one of the human perceivable symptoms 00:38:58.020 |
of the brain being unhappy with the way things are currently. 00:39:48.140 |
In fact, they scream at them when they come too close 00:39:55.880 |
evolution has figured out that's the best thing. 00:39:58.280 |
I mean, they're occasionally social, of course, 00:40:05.920 |
So all of those behaviors are not part of intelligence. 00:40:11.040 |
"intelligent machines because human intelligence is social." 00:40:13.960 |
But then you look at orangutans, you look at octopus. 00:40:28.800 |
So there are things that we think, as humans, 00:40:38.840 |
I think we give way too much importance to language 00:40:46.760 |
because we think our reasoning is so linked with language. 00:40:49.840 |
- So to solve the house cat intelligence problem, 00:40:53.480 |
you think you could do it on a desert island? 00:41:05.720 |
- It needs to have sort of the right set of drives 00:41:14.320 |
But like, for example, baby humans are driven 00:41:26.000 |
How to do it precisely is not, that's learned. 00:41:30.440 |
- Move around and stand up, that's sort of hardwired. 00:41:35.440 |
But it's very simple to hardwire this kind of stuff. 00:41:38.040 |
- Oh, like the desire to, well, that's interesting. 00:41:45.600 |
That's not a, there's gotta be a deeper need for walking. 00:41:50.440 |
I think it was probably socially imposed by society 00:41:53.120 |
that you need to walk, all the other bipedal-- 00:41:55.560 |
- No, like a lot of simple animals that, you know, 00:42:19.320 |
which is sort of part of the sort of human development. 00:42:38.520 |
- But I don't, you know, like, from the last time 00:42:41.400 |
I've interacted with a table, that's much more stable 00:42:46.440 |
- Yeah, I mean, birds have figured it out with two feet. 00:42:49.640 |
- Well, technically, we can go into ontology. 00:42:56.400 |
- You know, dinosaurs have two feet, many of them. 00:42:59.680 |
I'm just now learning that T-Rex was eating grass, 00:43:08.040 |
What do you think about, I don't know if you looked at 00:43:26.140 |
that it's not really relevant, I think, in the short term. 00:43:42.720 |
- That's right, and that's the answer to this probably, 00:43:45.840 |
Just learn to represent images, and then learning 00:43:48.240 |
to recognize handwritten digits on top of this 00:43:58.680 |
with a couple of pictures of an elephant, and that's it. 00:44:20.840 |
to predict whatever hashtag people type on Instagram, right? 00:44:25.740 |
'cause there's billions per day that are showing up. 00:44:35.020 |
you know, a couple layers down from the output 00:45:13.000 |
But like a lot of people use hashtags on Instagram 00:45:20.040 |
that doesn't fully represent the contents of the image. 00:45:24.480 |
and hashtag it with like science, awesome, fun. 00:45:33.080 |
- The way my colleagues who worked on this project 00:45:41.560 |
they only selected something like 17,000 tags 00:45:43.760 |
that correspond to kind of physical things or situations. 00:45:48.080 |
Like, you know, that has some visual content. 00:45:57.120 |
- Oh, so they keep a very select set of hashtags 00:46:02.960 |
- It's still on the order of, you know, 10 to 20,000. 00:46:13.240 |
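A rough sketch of that weakly-supervised pretraining recipe as described: multi-label hashtag prediction over a curated tag vocabulary, after which an intermediate layer is reused as a generic image representation. The toy backbone, random data, and 17,000-tag vocabulary size are placeholders:

```python
import torch
import torch.nn as nn

num_tags = 17_000                                        # curated "visual" hashtags
backbone = nn.Sequential(nn.Conv2d(3, 64, kernel_size=7, stride=4),
                         nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten())
tag_head = nn.Linear(64, num_tags)

imgs = torch.randn(8, 3, 224, 224)
tags = (torch.rand(8, num_tags) < 0.001).float()         # a few hashtags per image

loss = nn.functional.binary_cross_entropy_with_logits(tag_head(backbone(imgs)), tags)
loss.backward()

features = backbone(imgs)   # the reusable representation, "a couple layers down from the output"
```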
And how is it used, maybe contrastive learning for video? 00:46:23.800 |
is the idea of artificially increasing the size 00:46:26.120 |
of your training set by distorting the images 00:46:35.440 |
And people have done this since the 1990s, right? 00:46:37.320 |
You take an MNIST digit and you shift it a little bit, 00:46:40.840 |
or you change the size or rotate it, skew it, 00:46:50.800 |
If you train a supervised classifier with augmented data, 00:47:00.400 |
because a lot of self-supervised learning techniques 00:47:04.160 |
to pre-train vision systems are based on data augmentation. 00:47:08.000 |
And the basic techniques is originally inspired 00:47:12.000 |
by techniques that I worked on in the early '90s 00:47:15.840 |
and Geoff Hinton worked on also in the early '90s. 00:47:24.960 |
of the same network, they share the same weights, 00:47:27.720 |
and you show two different views of the same object. 00:47:31.760 |
Either those two different views may have been obtained 00:47:33.920 |
by data augmentation, or maybe it's two different views 00:47:36.480 |
of the same scene from a camera that you moved 00:47:39.320 |
or at different times or something like that, right? 00:47:41.360 |
Or two pictures of the same person, things like that. 00:47:46.440 |
those two identical copies of this neural net 00:47:48.400 |
to produce an output representation, a vector, 00:47:53.960 |
for those two images are as close to each other as possible, 00:47:58.920 |
as identical to each other as possible, right? 00:48:00.840 |
Because you want the system to basically learn a function 00:48:04.640 |
that will be invariant, that will not change, 00:48:07.160 |
whose output will not change when you transform those inputs 00:48:17.720 |
that when you show two images that are different, 00:48:21.960 |
Because if you don't have a specific provision for this, 00:48:28.240 |
When you train it, it will end up ignoring the input 00:48:50.040 |
So you have pairs of images that you know are different 00:48:53.200 |
and you show them to the network and those two copies, 00:49:11.480 |
for a project of doing signature verification. 00:49:23.320 |
And then force the system to produce different representation 00:49:33.000 |
by people from what was a subsidiary of AT&T at the time 00:49:38.280 |
And they were interested in storing a representation 00:49:46.680 |
So we came up with this idea of having a neural net 00:49:57.960 |
So then you would sign, it would run through the neural net 00:50:10.160 |
I mean, the American financial payment system 00:50:13.840 |
is incredibly lax in that respect compared to Europe. 00:50:31.760 |
even though I had the original paper on this, 00:50:41.040 |
there's just too many ways for two things to be different. 00:50:48.240 |
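For reference, a minimal sketch of the Siamese, contrastive recipe just described: one shared encoder applied to both inputs, a term pulling matching pairs together, and a margin-based term pushing non-matching pairs apart. The tiny encoder, margin, and random data are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))   # toy shared encoder

def contrastive_loss(x_a, x_b, same, margin=1.0):
    """same=1 for two views of one thing, same=0 for two different things."""
    z_a, z_b = encoder(x_a), encoder(x_b)          # same weights applied twice
    d = F.pairwise_distance(z_a, z_b)
    pull = same * d.pow(2)                          # positives: make representations close
    push = (1 - same) * F.relu(margin - d).pow(2)   # negatives: push apart up to a margin
    return (pull + push).mean()

x1, x2 = torch.randn(16, 3, 32, 32), torch.randn(16, 3, 32, 32)
labels = torch.randint(0, 2, (16,)).float()
loss = contrastive_loss(x1, x2, labels)
loss.backward()
```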
So there is a particular implementation of this, 00:50:54.840 |
where Geoff Hinton is the senior member there. 00:51:03.720 |
of implementing this idea of contrastive learning, 00:51:08.640 |
Now, what I'm much more enthusiastic about these days 00:51:41.640 |
And you train the two networks to be informative, 00:51:44.160 |
but also to be as informative of each other as possible. 00:51:49.720 |
So basically one representation has to be predictable 00:52:00.200 |
and then nothing was done about it for decades. 00:52:13.880 |
We came up with something that we called Barlow Twins, 00:52:20.640 |
of maximizing the information content of a vector 00:52:32.040 |
that's more recent now called VICReg, V-I-C-R-E-G, 00:52:34.520 |
that means variance, invariance, covariance, regularization. 00:52:37.880 |
And it's the thing I'm the most excited about 00:52:41.720 |
I mean, I'm really, really excited about this. 00:52:44.360 |
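A minimal sketch of a VICReg-style objective, as one reading of the three terms he names: an invariance term pulling the two embeddings together, a variance term keeping each dimension from collapsing, and a covariance term decorrelating dimensions so the representation stays informative. The loss weights, epsilon, and random embeddings are placeholders:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    n, d = z_a.shape
    invariance = F.mse_loss(z_a, z_b)                       # same content -> same embedding
    std_a = torch.sqrt(z_a.var(dim=0) + 1e-4)
    std_b = torch.sqrt(z_b.var(dim=0) + 1e-4)
    variance = F.relu(1 - std_a).mean() + F.relu(1 - std_b).mean()   # avoid collapse
    z_a, z_b = z_a - z_a.mean(dim=0), z_b - z_b.mean(dim=0)
    cov_a, cov_b = (z_a.T @ z_a) / (n - 1), (z_b.T @ z_b) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    covariance = (off_diag(cov_a) ** 2).sum() / d + (off_diag(cov_b) ** 2).sum() / d
    return sim_w * invariance + var_w * variance + cov_w * covariance

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)       # embeddings of two views
print(vicreg_loss(z1, z2))
```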
- What kind of data augmentation is useful for that, 00:52:50.240 |
Are we talking about, does that not matter that much? 00:52:52.640 |
Or it seems like a very important part of the step. 00:52:56.760 |
- How you generate the images that are similar, 00:53:00.440 |
It's an important step, and it's also an annoying step, 00:53:13.160 |
which a lot of people working in this area are using, 00:53:22.040 |
So one basically just shifts the image a little bit, 00:53:25.240 |
Another one kind of changes the scale a little bit. 00:53:38.160 |
So you have like a catalog of kind of standard things, 00:53:41.160 |
and people try to use the same ones for different algorithms 00:53:45.960 |
But some algorithms, some self-supervised algorithm 00:53:50.680 |
like more aggressive data augmentation, and some don't. 00:53:53.560 |
So that kind of makes the whole thing difficult. 00:53:56.400 |
But that's the kind of distortions we're talking about. 00:54:11.120 |
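That standard catalog of distortions, written as a torchvision-style pipeline (a common recipe in this literature; the specific parameters below are placeholders, not the exact settings of any particular paper):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),   # shift and rescale a crop
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),            # brightness/contrast/saturation/hue
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])

# Two independent calls on the same image give the two "views" fed to the joint-embedding networks:
# view_a, view_b = augment(img), augment(img)
```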
and you use the representation as input to a classifier. 00:54:13.560 |
You train the classifier on ImageNet, let's say, 00:54:24.400 |
at eliminating the information that is irrelevant, 00:54:26.840 |
which is the distortions between those images, 00:54:34.080 |
you cannot use the representations in those systems 00:54:37.200 |
for things like object detection and localization, 00:54:41.520 |
So the type of data augmentation you need to do 00:54:44.720 |
depends on the task you want eventually the system to solve. 00:54:50.680 |
standard data augmentation that we use today, 00:54:57.760 |
- Can you help me out understand why localization is-- 00:55:00.800 |
So you're saying it's just not good at the negative, 00:55:05.440 |
so that's why it can't be used for the localization? 00:55:12.360 |
and then you give it the same image shifted and scaled, 00:55:19.160 |
to eliminate the information about position and size. 00:55:27.760 |
- Like a bounding box, like to be able to actually, okay. 00:55:35.960 |
the exact boundaries of that object, interesting. 00:55:41.120 |
that's an interesting sort of philosophical question. 00:55:47.040 |
We're like obsessed by measuring like image segmentation, 00:55:59.700 |
to understanding what are the contents of the scene. 00:56:12.480 |
And in the human brain, you have two separate pathways 00:56:15.320 |
for recognizing the nature of a scene or an object, 00:56:22.320 |
So you use the first pathway called the ventral pathway 00:56:30.580 |
is used for navigation, for grasping, for everything else. 00:56:34.140 |
And basically a lot of the things you need for survival 00:56:39.740 |
- Is similarity learning or contrastive learning, 00:56:52.620 |
does that mean you understand what it means to be a cat? 00:56:57.580 |
I mean, it's a superficial understanding, obviously. 00:57:00.100 |
- But what is the ceiling of this method, do you think? 00:57:11.260 |
So if we figure out how to use techniques of that type, 00:57:16.260 |
perhaps very different, but of the same nature, 00:57:31.340 |
but a path towards some level of physical common sense 00:57:43.100 |
how the world works from a high-throughput channel, 00:57:55.540 |
In other words, I believe in grounded intelligence. 00:58:09.960 |
So for example, and people have attempted to do this 00:58:18.420 |
of basically kind of writing down all the facts 00:58:20.620 |
that are known and hoping that some sort of common sense 00:58:24.100 |
will emerge, I think it's basically hopeless. 00:58:28.300 |
You take an object, I describe a situation to you. 00:58:34.940 |
It's completely obvious to you that the object 00:58:45.060 |
And so if you train a machine as powerful as it could be, 00:58:55.640 |
That information is just not present in any text. 00:59:01.020 |
- Well, the question, like with the Cyc project, 00:59:03.260 |
the dream, I think, is to have like 10 million, 00:59:08.020 |
say, facts like that, that give you a head start, 00:59:15.460 |
Now, we humans don't need a parent to tell us 00:59:25.900 |
so it's possible that we can give it a quick shortcut. 00:59:52.540 |
process of evolution that got us from bacteria 01:00:12.500 |
If it's not, if it's most of the intelligence, 01:00:14.280 |
most of the cool stuff we've been talking about 01:00:20.660 |
we can form that big, beautiful, sexy background model 01:00:24.780 |
that you're talking about just by sitting there. 01:00:27.240 |
Then, okay, then you need to, then like maybe, 01:00:32.600 |
it is all supervised learning all the way down. 01:00:46.340 |
and logical reasoning and this kind of stuff, 01:00:49.900 |
because it only popped up in the last million years. 01:00:53.700 |
- And it only involves less than 1% of a genome, 01:01:18.620 |
might be just something about social interaction 01:01:30.800 |
but it probably isn't, mechanistically speaking. 01:01:37.300 |
- Number 634 in the list of problems we have to solve. 01:01:43.380 |
- So basic physics of the world is number one. 01:01:46.860 |
What do you, just a quick tangent on data augmentation. 01:02:07.660 |
which then improves the similarity learning process? 01:02:12.660 |
So not just kind of dumb, simple distortions, 01:02:18.100 |
just saying that even simple distortions are enough. 01:02:25.080 |
So what people are working on now is two things. 01:02:32.220 |
like trying to translate the type of self-supervised learning 01:02:35.460 |
people use in language, translating these two images, 01:02:38.660 |
which is basically a denoising autoencoder method, right? 01:02:41.820 |
So you take an image, you block, you mask some parts of it, 01:03:03.740 |
but there's a paper now coming out of the FAIR group 01:03:23.060 |
in this case is a transformer because you can, 01:03:30.860 |
so it's easy to mask patches and things like that. 01:03:33.260 |
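A rough sketch of that masked-patch reconstruction idea, in the spirit of masked autoencoders rather than the specific FAIR paper he mentions; the tiny transformer, 16-pixel patches, and 50% masking ratio are assumptions:

```python
import torch
import torch.nn as nn

patch, d_model = 16, 256
to_patches = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)   # patchify the image
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
decoder = nn.Linear(d_model, 3 * patch * patch)    # predict the pixels of each patch

imgs = torch.randn(4, 3, 224, 224)
tokens = to_patches(imgs).flatten(2).transpose(1, 2)       # (batch, 196 patches, d_model)
mask = torch.rand(tokens.shape[:2]) < 0.5                  # mask half the patches
corrupted = tokens.masked_fill(mask.unsqueeze(-1), 0.0)

recon = decoder(encoder(corrupted))                        # reconstruct patch pixels
target = nn.functional.unfold(imgs, patch, stride=patch).transpose(1, 2)
loss = nn.functional.mse_loss(recon[mask], target[mask])   # penalize only the masked patches
loss.backward()
```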
- Okay, then my question transfers to that problem, 01:03:35.620 |
the masking, like why should the mask be a square 01:03:41.580 |
I think we're gonna come up probably in the future 01:03:44.300 |
with sort of, you know, ways to mask that are, you know, 01:03:52.860 |
- No, no, but like something that's challenging, 01:03:59.380 |
So like, I mean, maybe it's a metaphor that doesn't apply, 01:04:02.460 |
but you're, it seems like there's a data augmentation 01:04:06.420 |
or masking, there's an interactive element with it. 01:04:09.860 |
Like, you're almost like playing with an image. 01:04:26.820 |
And then the principle of the training procedure 01:04:33.580 |
or the representation between the clean version 01:04:36.900 |
and the corrupted version, essentially, right? 01:04:42.020 |
So, you know, Boltzmann machines work like this, right? 01:04:50.900 |
And then you either let them go their merry way 01:05:02.380 |
And what you're doing is you're training the system 01:05:04.620 |
so that the stable state of the entire network 01:05:07.980 |
is the same regardless of whether it sees the entire input 01:05:16.940 |
You're training a system to reproduce the input, 01:05:26.220 |
And you could imagine sort of even in the brain, 01:05:28.260 |
some sort of neural principle where, you know, 01:05:32.780 |
So they take their activity and then temporarily 01:05:37.980 |
force the rest of the system to basically reconstruct 01:05:49.020 |
more or less biologically possible processes. 01:05:58.700 |
you don't have to worry about being super efficient. 01:06:06.180 |
'Cause I was thinking like you might wanna be clever 01:06:08.780 |
about the way you do all of these procedures, you know, 01:06:12.020 |
but that's only, it's somehow costly to do every iteration, 01:06:21.500 |
data augmentation without explicit data augmentation. 01:06:25.580 |
which is, you know, the sort of video prediction. 01:06:31.500 |
observing the, you know, the continuation of that video clip 01:06:40.260 |
in such a way that the representation of the future clip 01:06:43.300 |
is easily predictable from the representation 01:06:57.780 |
- So the amount of data is not the constraint. 01:07:01.220 |
- No, it would require some selection, I think. 01:07:08.460 |
- Don't go down the rabbit hole of just cat videos. 01:07:11.100 |
I might, you might need to watch some lectures or something. 01:07:17.500 |
If it like watches lectures about intelligence 01:07:21.380 |
and then learns, watches your lectures on NYU 01:07:27.860 |
- What's your, do you find multimodal learning interesting? 01:07:38.140 |
- There's a lot of things that I find interesting 01:07:43.260 |
the important problem that I think are really 01:07:46.660 |
So I think, you know, things like multitask learning, 01:07:48.940 |
continual learning, you know, adversarial issues. 01:07:53.940 |
I mean, those have, you know, great practical interests 01:08:00.300 |
but I don't think they're fundamental, you know, 01:08:01.460 |
active learning, even to some extent reinforcement learning. 01:08:04.380 |
I think those things will become either obsolete 01:08:12.940 |
how to do self-supervised representation learning 01:08:25.460 |
in sort of fundamental questions or, you know, 01:08:31.460 |
But of course there's like a huge amount of, you know, 01:08:33.340 |
very interesting work to do in sort of practical questions 01:08:38.020 |
- Well, you know, it's difficult to talk about 01:08:41.300 |
the temporal scale because all of human civilization 01:08:44.260 |
will eventually be destroyed because the sun will die out. 01:08:50.300 |
in multi-planetary colonization across the galaxy, 01:09:07.420 |
I'm saying all that to say that multitask learning 01:09:11.900 |
might be, you're calling it practical or pragmatic 01:09:18.340 |
that achieves something very akin to intelligence 01:09:21.140 |
while we're trying to solve the more general problem 01:09:26.940 |
of self-supervised learning of background knowledge. 01:09:36.460 |
I don't know if you've gotten a chance to glance 01:09:38.340 |
at this particular one example of multitask learning 01:09:45.000 |
like, I don't know, Charles Darwin studying animals. 01:09:48.940 |
They're studying the problem of driving and asking, 01:09:52.100 |
okay, what are all the things you have to perceive? 01:09:57.860 |
there's an ontology where you're bringing that to the table. 01:10:00.420 |
So you're formulating a bunch of different tasks. 01:10:02.300 |
It's like over 100 tasks or something like that 01:10:07.740 |
and then getting data back from people that run into trouble 01:10:10.580 |
and they're trying to figure out, do we add tasks? 01:10:12.700 |
Do we, like, we focus on each individual task separately. 01:10:17.140 |
so I would say, I'll classify Andrej Karpathy's talk 01:10:24.740 |
He kept going back and forth on those two topics, 01:10:30.060 |
meaning you can't just use a single benchmark. 01:10:42.980 |
Now, okay, it's very clear that if you're faced 01:10:47.620 |
with an engineering problem that you need to solve 01:10:51.940 |
particularly if you have Elon Musk breathing down your neck, 01:10:55.900 |
you're going to have to take shortcuts, right? 01:10:57.380 |
You might think about the fact that the right thing to do 01:11:02.380 |
and the long-term solution involves, you know, 01:11:06.580 |
but you have Elon Musk breathing down your neck, 01:11:17.380 |
the systematic engineering and fine-tuning and refinements 01:11:34.420 |
in the world, and you have to kind of ironclad it 01:11:40.460 |
so much for, you know, grand ideas and principles. 01:11:46.260 |
But, you know, I'm placing myself sort of, you know, 01:11:59.900 |
because eventually I want that stuff to get used, 01:12:06.900 |
for the community to realize this is the right thing to do. 01:12:14.420 |
I mean, if you look back in the mid-2000s, for example, 01:12:18.980 |
okay, I want to recognize cars or faces or whatever, 01:12:28.380 |
kind of computer vision techniques, you know, 01:12:37.820 |
that those methods that use more hand engineering 01:12:43.580 |
There was just not enough data for conv nets, 01:12:47.860 |
with the kind of hardware that was available at the time. 01:12:50.820 |
And there was a sea change when, basically when, you know, 01:12:55.580 |
datasets became bigger and GPUs became available. 01:12:58.580 |
That's what, you know, two of the main factors 01:13:02.900 |
that basically made people change their mind. 01:13:11.820 |
like all sub branches of AI or pattern recognition, 01:13:15.500 |
and there's a similar trajectory followed by techniques 01:13:25.180 |
You know, be it optical character recognition, 01:13:34.260 |
natural language understanding, like, you know, 01:13:42.700 |
the prior knowledge you know about image formation, 01:13:49.580 |
about like feature extraction, Fourier transforms, 01:13:52.420 |
you know, Zernike moments, you know, whatever, right? 01:14:03.020 |
There is, you know, it took decades for people 01:14:05.020 |
to figure out a good front end to pre-process 01:14:10.540 |
the information about what is being said is preserved, 01:14:13.420 |
but most of the information about the identity 01:14:17.060 |
You know, cepstral coefficients or whatever, right? 01:14:32.460 |
And, you know, you do this sort of tree representation 01:14:51.260 |
maybe you know something about statistical learning. 01:14:54.660 |
and it's usually a small sliver on top of your 01:15:05.380 |
with a deep learning system and it learns its own features 01:15:07.740 |
and, you know, speech recognition systems nowadays, 01:15:16.380 |
that takes raw waveforms and produces a sequence 01:15:27.380 |
other than, you know, something that's ingrained 01:15:29.540 |
in the sort of neural language model, if you want. 01:15:31.900 |
Same for translation, same for all kinds of stuff. 01:15:42.700 |
And I think, I mean, it's true in biology as well. 01:16:01.460 |
which is the selection of data and also the interactivity, 01:16:04.700 |
needs to be part of this giant neural network. 01:16:16.740 |
of a neural network that's automatically learning, 01:16:19.620 |
it feels, my intuition is that you have to have a system, 01:16:24.620 |
whether it's a physical robot or a digital robot 01:16:32.300 |
and doing so in a flawed way and improving over time 01:16:35.900 |
in order to form the self-supervised learning well. 01:16:47.060 |
I agree in the sense that I think, I agree in two ways. 01:16:55.100 |
and you certainly need a causal model of the world 01:16:57.420 |
that allows you to predict the consequences of your actions, 01:17:00.460 |
to train that model, you need to take actions. 01:17:02.740 |
You need to be able to act in a world and see the effect 01:17:08.420 |
- So, that's not obvious because you can observe others. 01:17:12.340 |
- And you can infer that they're similar to you 01:17:15.900 |
- Yeah, but then you have to kind of hardware that part, 01:17:24.380 |
So, I think the action part would be necessary 01:17:36.660 |
or at least more efficient is that active learning 01:17:40.580 |
basically goes for the jugular of what you don't know, right? 01:17:44.900 |
There's obvious areas of uncertainty about your world 01:17:56.220 |
by systematic exploration of that part that you don't know. 01:18:09.260 |
different species are different levels of curiosity, right? 01:18:28.780 |
So, what process, what learning process is it 01:18:44.780 |
So, I worry about active learning once this question is... 01:18:48.100 |
- So, it's the more fundamental question to ask. 01:19:00.260 |
if the increase is several orders of magnitude, right? 01:19:05.660 |
- But fundamentally, it's still the same thing 01:19:13.820 |
efficient or inefficient, is the core problem. 01:19:24.540 |
- Okay, I don't know what consciousness is, but... 01:19:35.980 |
of the questions people were asking themselves 01:19:44.100 |
how the eye works and the fact that the image 01:19:46.300 |
at the back of the eye was upside down, right? 01:19:53.420 |
is an image of the world, but it's upside down. 01:19:58.200 |
And, you know, with what we know today in science, 01:20:00.460 |
you know, we realize this question doesn't make any sense 01:20:06.340 |
So, I think a lot of what is said about consciousness 01:20:09.020 |
Now, that said, there's a lot of really smart people 01:20:15.060 |
people like David Chalmers, who is a colleague of mine 01:20:29.220 |
So, we're talking about the study of a world model. 01:20:32.020 |
And I think, you know, our entire prefrontal cortex 01:20:40.820 |
But when we are attending at a particular situation, 01:20:48.580 |
And that seems to suggest that we basically have only one 01:20:58.400 |
That engine is configurable to the situation at hand. 01:21:04.620 |
or we are, you know, driving down the highway, 01:21:09.340 |
We basically have a single model of the world 01:21:12.860 |
that we're configuring to the situation at hand, 01:21:15.380 |
which is why we can only attend to one task at a time. 01:21:18.080 |
Now, if there is a task that we do repeatedly, 01:21:21.700 |
it goes from the sort of deliberate reasoning 01:21:27.460 |
and perhaps something like model predictive control, 01:21:34.420 |
So, I don't know if you've ever played against 01:21:38.980 |
You know, I get wiped out in 10 plies, right? 01:21:48.680 |
And the person in front of me, the grandmaster, 01:21:52.300 |
you know, would just like react within seconds, right? 01:21:59.980 |
because, you know, it's basically just pattern recognition 01:22:03.460 |
Same, you know, the first few hours you drive a car, 01:22:09.660 |
And then after 20, 30 hours of practice, 50 hours, 01:22:21.020 |
So, that suggests you only have one model in your head. 01:22:23.780 |
And it might suggest the idea that consciousness 01:22:31.700 |
You know, you need to have some sort of executive 01:22:35.260 |
kind of overseer that configures your world model 01:22:40.540 |
And that leads to kind of the really curious concept 01:22:43.780 |
that consciousness is not a consequence of the power 01:22:46.860 |
of our minds, but of the limitation of our brains. 01:22:53.660 |
If we had as many world models as there are situations 01:22:57.620 |
we encounter, then we could do all of them simultaneously, 01:23:00.740 |
and we wouldn't need this sort of executive control 01:23:22.460 |
what the heck is that, and why is that useful? 01:23:26.180 |
why is it useful to feel like this is really you 01:23:29.940 |
experiencing this versus just like information 01:23:39.040 |
of the way we evolved that it's just very useful 01:23:43.640 |
to feel a sense of ownership to the decisions you make, 01:23:53.200 |
Like you own this thing, and it's the only one you got, 01:24:06.840 |
that most or at least many people disagree with you with, 01:24:14.920 |
But I think, so certainly there is a bunch of people 01:24:19.920 |
who are nativist, right, who think that a lot 01:24:22.000 |
of the basic things about the world are kind of hardwired 01:24:25.320 |
Things like the world is three-dimensional, for example. 01:24:30.400 |
Things like object permanence, is it something 01:24:33.080 |
that we learn before the age of three months or so, 01:24:46.560 |
I think those things are actually very simple to learn. 01:24:49.040 |
Is it the case that the oriented edge detectors in V1 01:25:00.600 |
from the retina that actually will train edge detectors. 01:25:03.000 |
So, and again, those are things that can be learned 01:25:21.540 |
There's also those MIT experiments where you kind of plug 01:25:26.160 |
the optic nerve on the auditory cortex of a baby ferret, 01:25:33.400 |
So, you know, clearly there's learning taking place there. 01:25:37.980 |
So, you know, I think a lot of what people think 01:25:46.240 |
- So you put a lot of value in the power of learning. 01:25:49.960 |
What kind of things do you suspect might not be learned? 01:25:53.340 |
Is there something that could not be learned? 01:25:59.760 |
There are the things that, you know, make humans human 01:26:03.440 |
or make, you know, cats different from dogs, right? 01:26:07.400 |
It's the basic drives that are kind of hardwired 01:26:20.040 |
where the reward doesn't come from the external world. 01:26:24.600 |
Your own brain computes whether you're happy or not, right? 01:26:28.120 |
It measures your degree of comfort or discomfort. 01:26:38.760 |
So it's easier to learn when your objective is intrinsic. 01:26:48.760 |
The critic that makes long-term prediction of the outcome, 01:26:53.420 |
which is the eventual result of this, that's learned. 01:27:01.220 |
But let me take an example of, you know, why the critic, 01:27:04.200 |
I mean, an example of how the critic may be learned, right? 01:27:06.800 |
If I come to you, you know, I reach across the table 01:27:15.880 |
- I was expecting that the whole time, but yes, right. 01:27:26.780 |
And now your model of the world includes the fact that 01:27:44.020 |
you know, your predictor of your ultimate pain system 01:27:50.460 |
that predicts that something bad is gonna happen 01:28:00.600 |
So the fact that, you know, you're a school child, 01:28:04.440 |
you wake up in the morning and you go to school and, 01:28:07.000 |
you know, it's not because you necessarily like waking up 01:28:12.720 |
but you know that there is a long-term objective 01:28:15.840 |
- So Ernest Becker, I'm not sure if you're familiar 01:28:35.540 |
introspection that over the horizon is the end. 01:28:44.380 |
that just all these psychological experiments that show, 01:28:47.500 |
basically this idea that all of human civilization, 01:28:52.500 |
everything we create is kind of trying to forget 01:28:56.820 |
if even for a brief moment that we're going to die. 01:29:09.060 |
- I don't know at what point, I mean, it's a question, 01:29:12.500 |
like, you know, at what point do you realize that, 01:29:16.460 |
And I think most people don't actually realize 01:29:19.260 |
I mean, most people believe that you go to heaven 01:29:21.900 |
- So to push back on that, what Ernest Becker says 01:29:29.340 |
and I find those ideas a little bit compelling 01:29:31.660 |
is that there is moments in life, early in life, 01:29:36.540 |
when you are, when you do deeply experience the terror 01:29:41.540 |
of this realization and all the things you think about, 01:29:47.220 |
that we kind of think about more like teenage years 01:30:05.380 |
of the jungle, the woods, looking all around you, 01:30:12.180 |
I'm going to go back in the comfort of my mind 01:30:16.820 |
where there is a, maybe like, pretend I'm immortal 01:30:20.420 |
in however way, however kind of idea I can construct 01:30:28.660 |
You can delude yourself in all kinds of ways, 01:30:31.460 |
like lose yourself in the busyness of each day, 01:30:34.220 |
have little goals in mind, all those kinds of things 01:30:40.780 |
and it's gonna be sad, but you don't really understand 01:30:46.460 |
And I find that compelling because it does seem 01:30:55.180 |
we're able to really understand that this life is finite. 01:31:03.660 |
a qualitative difference between us and cats in the term. 01:31:09.240 |
a better long-term ability to predict in the long term, 01:31:14.240 |
and so we have a better understanding of how the world works, 01:31:17.380 |
so we have a better understanding of finiteness of life 01:31:21.020 |
- So we have a better planning engine than cats? 01:31:25.280 |
- But what's the motivation for planning that far? 01:31:30.160 |
of the fact that we have just a better planning engine 01:31:34.760 |
the essence of intelligence is the ability to predict. 01:31:37.400 |
And so because we're smarter, as a side effect, 01:31:41.200 |
we also have this ability to kind of make predictions 01:31:43.480 |
about our own future existence or lack thereof. 01:31:52.960 |
It makes people worry about what's gonna happen 01:31:57.480 |
If you believe that you just don't exist after death, 01:32:04.960 |
you don't worry about what happens after death? 01:32:17.760 |
and I would say I agree with him more than not, 01:32:27.880 |
there's still a deep worry of the mystery of it all. 01:32:31.760 |
Like, how does that make any sense that it just ends? 01:32:35.680 |
I don't think we can truly understand that this right, 01:32:39.720 |
I mean, so much of our life, the consciousness, the ego, 01:32:46.120 |
- Science keeps bringing humanity down from its pedestal. 01:32:54.720 |
- That's wonderful, but for us individual humans, 01:32:57.840 |
we don't like to be brought down from a pedestal. 01:33:01.720 |
- But see, you're fine with it because, well, 01:33:04.140 |
so what Ernest Becker would say is you're fine with it 01:33:06.360 |
because that's just a more peaceful existence for you, 01:33:09.560 |
You're hiding from, in fact, some of the people 01:33:12.000 |
that experience the deepest trauma earlier in life, 01:33:17.000 |
they often, before they seek extensive therapy, 01:33:21.080 |
It's like when you talk to people who are truly angry, 01:33:29.200 |
I had a very bad motorbike accident when I was 17. 01:33:40.460 |
- So I'm basically just playing a bit of a devil's advocate, 01:33:43.120 |
pushing back on wondering is it truly possible 01:33:47.660 |
And the flip side that's more interesting, I think, 01:33:49.700 |
for AI and robotics is how important is it to have this 01:33:57.160 |
is to not just avoid falling off the roof or something 01:34:07.160 |
If you listen to the Stoics, it's a great motivator. 01:34:16.900 |
So maybe to truly fear death or be cognizant of it 01:34:38.980 |
I mean, I think human nature and human intelligence 01:34:42.600 |
It's a scientific mystery, in addition to, you know, 01:34:48.580 |
but, you know, I'm a true believer in science. 01:34:50.860 |
So, and I do have kind of a belief that for complex systems 01:34:57.540 |
like the brain and the mind, the way to understand it 01:35:07.660 |
what's essential to it when you try to build it. 01:35:10.000 |
You know, the same way I've used this analogy before 01:35:12.420 |
with you, I believe, the same way we only started 01:35:18.640 |
building airplanes, and that helped us understand 01:35:21.340 |
So I think there's kind of a similar process here 01:35:25.480 |
where we don't have a theory, a full theory of intelligence, 01:35:29.660 |
but building, you know, intelligent artifacts 01:35:43.840 |
- So you're an interesting person to ask this question 01:35:53.100 |
What are your thoughts about kind of like the Turing 01:35:58.020 |
If we create an AI system that exhibits a lot of properties 01:36:06.400 |
how comfortable are you thinking of that entity 01:36:12.340 |
So you're trying to build now systems that have intelligence 01:36:23.380 |
- So how are you, are you okay calling a thing intelligent 01:36:32.700 |
from a pedestal of consciousness/intelligence? 01:36:39.500 |
more about human nature, human mind, and human intelligence 01:36:50.560 |
And if a consequence of this is to bring down humanity 01:36:54.480 |
one notch down from its already low pedestal, 01:37:04.980 |
opinions I have that a lot of people may disagree with. 01:37:14.220 |
so assuming that we are somewhat successful at some level 01:37:18.660 |
of getting machines to learn models of the world, 01:37:22.580 |
we build intrinsic motivation objective functions 01:37:30.060 |
that allows it to estimate the state of the world 01:37:32.780 |
and then have some way of figuring out a sequence of actions 01:37:35.460 |
that, you know, to optimize a particular objective. 01:37:38.000 |
If it has a critic of the type that I was describing before, 01:37:48.580 |
intelligent autonomous machine will have emotions. 01:37:58.980 |
that is driven by intrinsic motivation, by objectives, 01:38:03.120 |
if it has a critic that allows it to predict in advance 01:38:10.060 |
is going to be good or bad, it's going to have emotions. 01:38:14.300 |
- When it predicts that the outcome is going to be bad 01:38:18.140 |
and something to avoid, it's going to have fear; when it predicts a good outcome, elation. 01:38:34.460 |
And so it's going to have emotions about attachment 01:38:38.620 |
So I think, you know, the sort of sci-fi thing 01:38:46.900 |
like having an emotion chip that you can turn off, right? 01:39:00.040 |
like a civil rights movement for robots where, 01:39:06.460 |
like the Supreme Court, that particular kinds of robots, 01:39:29.580 |
that you could, you know, die and be restored. 01:39:33.740 |
Like, you know, you could be sort of, you know, 01:39:37.540 |
your brain could be reconstructed in its finest details. 01:39:40.740 |
Our ideas of rights will change in that case. 01:39:44.540 |
there's always a backup, you could always restore. 01:39:56.140 |
desire to do dangerous things like, you know, 01:40:14.140 |
or explore, you know, dangerous areas and things like that. 01:40:19.180 |
So now it's very likely that robots would be like that 01:40:22.380 |
because, you know, they'll be based on perhaps technology 01:40:27.060 |
that is somewhat similar to today's technology. 01:40:42.700 |
- And in fact, they made a game that's inspired by it. 01:40:49.260 |
- My three sons have a game design studio between them. 01:40:58.980 |
But so in Diablo, there's something called hardcore mode, 01:41:15.540 |
'cause they have to be integrated in human society, 01:41:18.380 |
they have to be able to die, no copies allowed. 01:41:25.260 |
like cloning will be illegal, even when it's possible. 01:41:29.940 |
I mean, you don't reproduce the mind of the person 01:41:36.420 |
- But then it's, but we were talking about with computers 01:41:52.320 |
that will destroy the motivation of the system. 01:41:55.980 |
- Okay, so let's say you have a domestic robot. 01:42:10.580 |
that makes it slightly different from the other robots 01:42:18.060 |
you've grown some attachment to it and vice versa. 01:42:25.900 |
Maybe it's a virtual assistant that lives in your, 01:42:29.380 |
you know, augmented reality glasses or whatever, right? 01:42:32.580 |
You know, the horror movie type thing, right? 01:42:39.620 |
the intelligence in that system is a bit like your child 01:42:47.100 |
there's a lot of you in that machine now, right? 01:42:53.500 |
you would do this for free if you want, right? 01:42:56.560 |
If it's your child, your child can, you know, 01:43:01.580 |
And, you know, the fact that they learn stuff from you 01:43:04.020 |
doesn't mean that you have any ownership of it, right? 01:43:09.380 |
perhaps you have some intellectual property claim about- 01:43:15.140 |
Oh, I thought you meant like a permanence value 01:43:21.700 |
So you would lose a lot if that robot were to be destroyed 01:43:26.460 |
You would lose a lot of investment, you know, 01:43:36.820 |
- But also you have like intellectual property rights 01:43:54.260 |
- And then there are issues of privacy, right? 01:43:55.660 |
Because now imagine that that robot has its own 01:43:59.700 |
kind of volition and decides to work for someone else 01:44:08.660 |
- Now, all the things that that system learned from you, 01:44:20.580 |
- I mean, that would be kind of an ethical question. 01:44:22.180 |
Like, you know, can you erase the mind of an intelligent 01:44:32.620 |
but that you don't have complete power over them. 01:44:36.780 |
Yeah, it's the problem with the relationships, you know, 01:44:39.020 |
that you break up, you can't erase the other human. 01:44:42.660 |
With robots, I think it will have to be the same thing 01:44:44.940 |
with robots, that risk, that there has to be some risk 01:44:50.300 |
to our interactions to truly experience them deeply, 01:44:56.140 |
So you have to be able to lose your robot friend 01:45:06.140 |
murder the robot to protect your private information? 01:45:10.300 |
- I have this intuition that for robots with certain, 01:45:19.220 |
let's call it sentient or something like that, 01:45:20.980 |
like this robot is designed for human interaction, 01:45:24.180 |
then you're not allowed to murder these robots. 01:45:28.180 |
- Well, but what about you do a backup of the robot 01:45:44.980 |
so this robot doesn't know anything about you anymore, 01:45:47.380 |
but you still have, technically it's still in existence 01:45:55.420 |
oh, sure, you can erase the mind of the robot 01:46:05.620 |
like, the robots and the humans are the same. 01:46:17.220 |
It's interesting for these, just like you said, 01:46:20.100 |
emotion seems to be a fascinatingly powerful aspect 01:46:24.180 |
of human-to-human interaction, human-robot interaction, 01:46:30.460 |
at the end of the day, that's probably going to 01:46:46.100 |
You asked about the Chinese room-type argument. 01:46:51.500 |
I think the Chinese room argument is a ridiculous one. 01:46:54.300 |
- So for people who don't know, Chinese room is, 01:47:24.300 |
you have this giant, nearly infinite lookup table. 01:47:38.940 |
do you think you can mechanize intelligence in some way, 01:47:52.140 |
which is, assuming you can reproduce intelligence 01:47:56.540 |
in sort of different hardware than biological hardware, 01:48:00.620 |
can you match human intelligence in all the domains 01:48:17.060 |
The answer to this, in my opinion, is an unqualified yes. 01:48:22.620 |
There's no question that machines, at some point, 01:48:32.180 |
regardless of what Elon and others have claimed or believed. 01:48:37.180 |
This is a lot harder than many of those guys think it is. 01:48:43.420 |
And many of those guys who thought it was simpler than that 01:48:47.460 |
now think it's hard because it's been five years 01:48:49.900 |
and they realize it's gonna take a lot longer. 01:48:53.420 |
That includes a bunch of people at DeepMind, for example. 01:48:57.020 |
I haven't actually touched base with the DeepMind folks, 01:49:12.820 |
'Cause you have to believe the impossible is possible 01:49:16.180 |
And there's, of course, a flip side to that coin, 01:49:24.300 |
But I mean, you have to inspire people, right, 01:49:28.740 |
So, you know, it's certainly a lot harder than we believe, 01:49:35.580 |
but there's no question in my mind that this will happen. 01:49:38.180 |
And now, you know, people are kind of worried about 01:49:42.460 |
They are gonna be brought down from their pedestal, 01:49:51.700 |
I mean, it's just gonna give more power, right? 01:49:53.460 |
It's an amplifier for human intelligence, really. 01:49:56.180 |
- So speaking of doing cool, ambitious things, 01:50:16.540 |
where does the newly minted meta AI fit into, 01:50:25.500 |
Yeah, FAIR was created almost exactly eight years ago. 01:50:39.460 |
that had about 12 engineers and a few scientists, 01:50:47.020 |
I ran it for three and a half years as a director, 01:50:52.300 |
and kind of set up the culture and organized it, 01:51:12.180 |
in the sense that FAIR has simultaneously produced 01:51:22.500 |
open source tools like PyTorch and many others. 01:51:29.820 |
or mostly indirect impact on Facebook at the time, 01:51:37.900 |
that meta is built around now are based on research projects 01:51:50.020 |
out of Facebook services now and meta more generally, 01:51:57.660 |
I mean, it's completely built around AI these days 01:52:03.900 |
So what happened after three and a half years 01:52:06.540 |
is that I changed role, I became chief scientist. 01:52:10.140 |
So I'm not doing day-to-day management of FAIR anymore. 01:52:21.380 |
I have, you know, my own kind of research group 01:52:23.220 |
working on self-supervised learning and things like this, 01:52:25.220 |
which I didn't have time to do when I was director. 01:52:28.140 |
So now FAIR is run by Joelle Pineau and Antoine Bordes. 01:52:33.820 |
Together, because FAIR is kind of split in two now, 01:52:37.820 |
which is sort of bottom-up, science-driven research 01:52:40.900 |
and FAIR Accel, which is slightly more organized 01:52:43.420 |
for bigger projects that require a little more kind of focus 01:52:47.660 |
and more engineering support and things like that. 01:52:49.740 |
So Joelle leads FAIR Labs and Antoine Bordes leads FAIR Accel. 01:52:56.620 |
So there's no question that the leadership of the company 01:53:02.500 |
believes that this was a very worthwhile investment. 01:53:06.540 |
And what that means is that it's there for the long run. 01:53:11.540 |
Right, so there is, if you want to talk in these terms, 01:53:16.780 |
which I don't like, there's a business model, if you want, 01:53:19.540 |
where FAIR, despite being a very fundamental research lab, 01:53:27.860 |
Now, what happened three and a half years ago 01:53:31.540 |
when I stepped down was also the creation of Facebook AI, 01:53:41.700 |
but also has other organizations that are focused 01:53:46.260 |
on applied research or advanced development of AI technology 01:53:51.220 |
that is more focused on the products of the company. 01:53:59.740 |
of those organizations and people are awesome 01:54:06.380 |
but it serves as kind of a way to kind of scale up, 01:54:15.700 |
which may be very experimental and sort of lab prototypes 01:54:25.140 |
It'll just keep the F, nobody cares what the F stands for? 01:54:29.420 |
- We'll know soon enough, probably by the end of 2021. 01:54:34.420 |
- I guess it's not a giant change, MAIR, FAIR. 01:54:39.540 |
but the brand people are kind of deciding on this 01:54:45.860 |
and they tell us they're gonna come up with an answer 01:54:50.460 |
or whether we're gonna change just the meaning of the F. 01:54:54.180 |
I would keep FAIR and change the meaning of the F. 01:55:00.980 |
- Oh, that's good. - Fundamental AI research. 01:55:10.180 |
- And now meta AI is part of the reality lab. 01:55:15.180 |
So, you know, meta now, the new Facebook, right, 01:55:23.940 |
into, you know, Facebook, Instagram, WhatsApp, 01:55:40.460 |
It's kind of the, you can think of it as the sort of, 01:55:51.900 |
- Is that where the touch sensing for robots, 01:55:56.020 |
- But touch sensing for robots is part of FAIR, actually. 01:55:58.180 |
That's a FAIR product. - Oh, it is, okay, cool. 01:56:00.500 |
- This is also the, no, but there is the other way, 01:56:11.700 |
But by the way, the touch sensors is super interesting. 01:56:16.060 |
into the whole sensing suite is very interesting. 01:56:23.620 |
What do you think about this whole kind of expansion 01:56:27.740 |
of the view of the role of Facebook and meta in the world? 01:56:30.820 |
- Well, metaverse really should be thought of 01:56:40.260 |
make the experience more compelling of, you know, 01:56:44.060 |
being connected either with other people or with content. 01:56:49.420 |
And, you know, we are evolved and trained to evolve in, 01:56:57.260 |
we can see other people, we can talk to them when near them, 01:57:00.980 |
or, you know, and other people are far away can't hear us, 01:57:04.980 |
So there's a lot of social conventions that exist 01:57:08.580 |
in the real world that we can try to transpose. 01:57:32.140 |
is just basically a pair of glasses, you know, 01:57:34.300 |
and technology makes sufficient progress for that. 01:57:36.780 |
You know, AR is a much easier concept to grasp 01:57:43.180 |
augmented reality glasses that basically contain 01:57:53.460 |
With VR, you can completely detach yourself from reality, 01:58:06.500 |
Or like you can have objects that exist in the metaverse 01:58:09.300 |
that, you know, pop up on top of the real world 01:58:24.260 |
has been painted by the media as net negative for society, 01:58:30.820 |
You've pushed back against this, defending Facebook. 01:58:38.620 |
the company that is being described in some media 01:58:42.580 |
is not the company we know when we work inside. 01:58:56.540 |
I mean, I have a pretty good vision of what goes on. 01:58:58.660 |
You know, I don't know everything, obviously. 01:59:01.860 |
but certainly not in decision about like, you know, 01:59:06.100 |
but I have some decent vision of what goes on. 01:59:10.140 |
And this evil that is being described, I just don't see it. 01:59:13.660 |
And then, you know, I think there is an easy story to buy, 01:59:18.180 |
which is that, you know, all the bad things in the world, 01:59:21.740 |
and, you know, the reason your friend believe crazy stuff, 01:59:28.740 |
in social media in general, Facebook in particular. 01:59:35.460 |
Like, is it the case that Facebook, for example, 01:59:48.980 |
think of themselves less if they use Instagram more? 01:59:59.140 |
opposite sides in a debate or political opinion 02:00:02.700 |
if they are more on Facebook or if they are less? 02:00:05.700 |
And study after study shows that none of this is true. 02:00:12.420 |
They're not funded by Facebook or Meta, you know, 02:00:15.900 |
studies by Stanford, by some of my colleagues at NYU, 02:00:21.220 |
You know, there's a study recently, they paid people, 02:00:31.820 |
but they paid people to not use Facebook for a while 02:00:34.380 |
in the period before the anniversary of the Srebrenica 02:00:41.140 |
So, you know, people get riled up, like, should, you know, 02:00:45.460 |
I mean, a memorial kind of celebration for it or not. 02:00:48.700 |
So they paid a bunch of people to not use Facebook 02:00:52.580 |
And it turns out that those people ended up being 02:00:57.580 |
more polarized than they were at the beginning. 02:01:07.620 |
economists at Stanford that try to identify the causes 02:01:14.460 |
And it's been going on for 40 years before, you know, 02:01:26.100 |
So you could say if social media just accelerated, 02:01:28.100 |
but no, I mean, it's basically a continuous evolution 02:01:34.300 |
And then you compare this with other countries 02:01:54.700 |
you can find a scapegoat, but you can't find a cause. 02:02:04.900 |
people now are accusing Facebook of bad deeds 02:02:09.300 |
And those others are, we're not doing anything about them. 02:02:17.700 |
- So I should mention that I'm talking to Schrep, 02:02:20.060 |
Mike Schroepfer, on this podcast and also Mark Zuckerberg, 02:02:23.460 |
and probably these are conversations you can have with them. 02:02:27.620 |
even if Facebook has some measurable negative effect, 02:02:33.780 |
You have to consider about all the positive ways 02:02:39.620 |
- You can't just say like, there's an increase in division. 02:02:47.860 |
but you have to consider about how much information 02:02:51.100 |
Like I'm sure Wikipedia created more division 02:02:55.300 |
but you have to look at the full context of the world 02:02:59.100 |
- I mean, the printing press has created more division. 02:03:10.780 |
and that allowed people to read the Bible by themselves, 02:03:13.780 |
not get the message uniquely from priests in Europe. 02:03:20.340 |
and 200 years of religious persecution and wars. 02:03:23.660 |
So that's a bad side effect of the printing press. 02:03:29.340 |
but nobody would say the printing press was a bad idea. 02:03:35.100 |
and there's a lot of different incentives operating here. 02:03:40.020 |
since you're one of the top leaders at Facebook 02:03:42.660 |
and at Meta, sorry, that's in the tech space, 02:03:52.900 |
A lot of it probably is on the computer infrastructure, 02:03:54.980 |
the hardware, I mean, it's just a huge amount. 02:04:06.220 |
How much of it is flying all around doing business stuff 02:04:13.740 |
I mean, certainly in the run-up of the creation of FAIR 02:04:13.740 |
and for at least a year after that, if not more, 02:04:26.700 |
and was spending quite a lot of effort on it. 02:04:34.100 |
He read some of my papers, for example, before I joined. 02:05:00.180 |
which is a sense of wonder about science and technology. 02:05:13.220 |
So, I mean, they're very like, you know, very human people. 02:05:27.060 |
that he's painting in the press is just completely wrong. 02:05:31.940 |
So that's, I put some of that responsibility on him too. 02:05:36.180 |
You have to, it's like, you know, like the director, 02:05:57.700 |
And it's sad to see, I'll talk to him about it, 02:06:04.020 |
It's always sad to see folks sort of be there 02:06:07.500 |
for a long time and slowly, I guess time is sad. 02:06:11.220 |
- I think he's done the thing he set out to do. 02:06:21.460 |
And I understand, you know, after 13 years or something. 02:06:28.900 |
- Which in Silicon Valley is basically a lifetime. 02:06:34.980 |
- So in Europe, the conference just wrapped up. 02:07:00.580 |
what works and what doesn't about the review process? 02:07:04.980 |
I'll talk about the review process afterwards. 02:07:12.540 |
variance-invariance-covariance regularization. 02:07:14.900 |
And it's a technique, a non-contrastive learning technique 02:07:18.260 |
for what I call joint embedding architecture. 02:07:30.620 |
So if you want to do self-supervised learning, 02:07:35.140 |
So let's say you want to train a system to predict video, 02:07:40.260 |
and you train the system to predict the next, 02:07:51.580 |
you need to have, you need to handle this in some way. 02:08:19.460 |
I call this a generative latent variable model. 02:08:34.820 |
you also run those through another neural net. 02:08:48.660 |
and another one that looks at the continuation 02:08:52.380 |
And what you're trying to do is learn a representation 02:09:03.420 |
but is such that you can predict the representation 02:09:08.540 |
from the representation of the first one, easily. 02:09:15.300 |
and some stuff like that, but it doesn't matter. 02:09:23.100 |
of the two video clips that are mutually predictable. 02:09:27.460 |
What that means is that there's a lot of details 02:09:30.860 |
in the second video clips that are irrelevant. 02:09:33.140 |
You know, let's say a video clip consists in, 02:09:52.300 |
and where the tiles are ending and stuff like that, right? 02:09:56.340 |
that perhaps my representation will eliminate. 02:09:59.620 |
And so what I need is to train this second neural net 02:10:09.020 |
video clip varies over all the plausible continuations, 02:10:29.580 |
In the first way, you parametrize the prediction 02:10:38.380 |
you predict an abstract representation of pixels 02:10:40.660 |
and you guarantee that this abstract representation 02:10:43.460 |
has as much information as possible about the input, 02:10:47.020 |
drops all the stuff that you really can't predict, 02:10:50.540 |
I used to be a big fan of the first approach. 02:10:53.860 |
And in fact, in this paper with Ishan Mishra, 02:10:55.580 |
this blog post, the dark matter intelligence, 02:11:05.540 |
And it's because of a small collection of algorithms 02:11:10.020 |
that have been proposed over the last year and a half 02:11:13.220 |
or so, two years to do this, including VICReg, 02:11:24.540 |
And there's a bunch of others now that kind of work similarly. 02:11:29.580 |
So they're all based on this idea of joint embedding. 02:11:34.660 |
that is an approximation of mutual information. 02:11:36.620 |
Some others like BYOL work, but we don't really know why. 02:11:39.420 |
And there's been like lots of theoretical papers 02:11:47.820 |
but the important point is that we now have a collection 02:11:53.700 |
which I think is the best thing since sliced bread. 02:11:58.300 |
because I think it's our best shot for techniques 02:12:02.020 |
that would allow us to kind of build predictive world models 02:12:07.460 |
learn hierarchical representations of the world 02:12:09.900 |
where what matters about the world is preserved 02:12:14.420 |
- And by the way, the representations of before and after 02:12:22.300 |
- It would be either for a single image, for a sequence. 02:12:26.700 |
This could be applied to just about any signal. 02:12:28.540 |
I'm looking for methods that are generally applicable 02:12:32.940 |
that are not specific to one particular modality. 02:12:39.660 |
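To make those criteria concrete, here is a minimal PyTorch-style sketch of a VICReg-like loss on two batches of embeddings. The loss weights, embedding sizes, and the encoder that would produce z_a and z_b are illustrative assumptions for this sketch, not details taken from the conversation or the paper.

```python
import torch
import torch.nn.functional as F

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Toy VICReg-style criterion on two batches of embeddings of shape (batch, dim).

    Invariance: embeddings of the two views should be mutually predictable.
    Variance: keep each dimension's std above a margin so the representation
    does not collapse to a constant.
    Covariance: decorrelate dimensions so information spreads across them.
    """
    n, d = z_a.shape

    # Invariance term: the two views should map to nearby representations.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance term: hinge on the per-dimension standard deviation.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = F.relu(1.0 - std_a).mean() + F.relu(1.0 - std_b).mean()

    # Covariance term: penalize off-diagonal entries of the covariance matrix.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = ~torch.eye(d, dtype=torch.bool)
    cov_loss = cov_a[off_diag].pow(2).sum() / d + cov_b[off_diag].pow(2).sum() / d

    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss

# Example: embeddings of two augmented views of the same batch.
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_style_loss(z_a, z_b)
```

The variance and covariance terms are what make this kind of method non-contrastive: nothing compares a pair against negative samples, and collapse is prevented by regularizing the statistics of the embeddings directly.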
This paper is what, is describing one such method? 02:12:45.700 |
the first author is a student called Adrien Bardes, 02:12:55.820 |
who's a professor at Ecole Normale Supérieure, 02:13:03.580 |
where PhD students can basically do their PhD in industry. 02:13:08.940 |
And this paper is a follow-up on this Barlow Twins paper 02:13:27.780 |
is that VICReg is not different enough from Barlow Twins. 02:13:39.860 |
And in the end, this is what people will use. 02:13:44.500 |
- But I'm used to stuff that I submit being rejected for a while. 02:13:48.980 |
- So it might be rejected and actually exceptionally well cited 02:13:52.140 |
- Well, it's already cited like a bunch of times. 02:13:54.340 |
- So, I mean, the question is then to the deeper question 02:14:00.220 |
I mean, computer science as a field is kind of unique 02:14:06.620 |
- And it's interesting because the peer review process 02:14:16.500 |
And it's a nice way to get stuff out quickly, 02:14:25.940 |
- But nevertheless, it has many of the same flaws 02:14:37.020 |
There's self-interested people that kind of can infer 02:14:42.060 |
who submitted it and kind of be cranky about it, 02:14:47.700 |
- Yeah, I mean, there's a lot of social phenomenon there. 02:14:53.180 |
because the field has been growing exponentially, 02:15:00.820 |
- So as a consequence, and that's just a consequence 02:15:04.860 |
So as the number of, as the size of the field 02:15:07.860 |
kind of starts saturating, you will have less 02:15:10.100 |
of that problem of reviewers being very inexperienced. 02:15:15.100 |
A consequence of this is that young reviewers, 02:15:24.620 |
and to make their life easy when reviewing a paper 02:15:27.460 |
is very simple, you just have to find a flaw in the paper. 02:15:30.540 |
So basically they see their task as finding flaws in papers 02:15:34.500 |
and most papers have flaws, even the good ones. 02:15:41.500 |
Your job is easier as a reviewer if you just focus on this. 02:15:46.420 |
But what's important is like, is there a new idea 02:15:54.120 |
It doesn't matter if the experiments are not that great, 02:15:56.240 |
if the protocol is, you know, so-so, you know, 02:16:00.680 |
things like that, as long as there is a worthy idea in it 02:16:05.040 |
that will influence the way people think about the problem, 02:16:08.080 |
even if they make it better, you know, eventually, 02:16:11.160 |
I think that's really what makes a paper useful. 02:16:19.480 |
creates a disease that has plagued, you know, 02:16:24.120 |
other fields in the past, like speech recognition, 02:16:26.640 |
where basically, you know, people chase numbers 02:16:28.520 |
on benchmarks and it's much easier to get a paper accepted 02:16:37.000 |
on a sort of mainstream, well-accepted method or problem. 02:16:47.860 |
Because industry, you know, strives on those kind of progress 02:16:52.340 |
but they're not the one that I'm interested in 02:16:59.260 |
kind of new advances generally don't make it. 02:17:05.260 |
And then there's open review type of situations where you, 02:17:08.820 |
and then, I mean, Twitter is a kind of open review. 02:17:11.620 |
I'm a huge believer that review should be done 02:17:21.200 |
it's already the present, but a growing future 02:17:25.320 |
and you're presenting an ongoing, continuous conference 02:17:43.420 |
- It's not a question of being elitist or not. 02:17:44.940 |
It's a question of being basically recommendation 02:17:49.940 |
and seal of approvals for people who don't see themselves 02:17:53.340 |
as having the ability to do so by themselves, right? 02:18:09.920 |
'Cause you don't have to like scrutinize the paper as much. 02:18:15.980 |
of sort of collective recommender system, right? 02:18:27.020 |
and we were about to create ICLR with Yoshua Bengio. 02:18:39.660 |
let's say archive or now could be open review. 02:18:48.120 |
you know, of a journal or a program committee 02:19:05.580 |
between a paper and a venue or reviewing entity. 02:19:20.320 |
which would be public, signed by the reviewing entity. 02:19:25.880 |
you know, it's one of the members of reviewing entity. 02:19:30.680 |
Lex Fridman's, you know, preferred papers, right? 02:19:33.700 |
You know, it's Lex Fridman writing the review. 02:19:36.700 |
So for me, that's a beautiful system, I think. 02:19:42.900 |
it feels like there should be a reputation system 02:19:59.340 |
it's an incentive for an individual person to do great. 02:20:09.240 |
But honestly, that's not a strong enough incentive 02:20:13.700 |
in finding the beautiful amidst the mistakes and the flaws 02:20:27.740 |
- That's a big part of my proposal, actually, 02:20:37.500 |
then your reputation should go up as a reviewing entity. 02:20:46.260 |
who was a master's student in library science 02:20:52.460 |
how that should work with formulas and everything. 02:20:58.580 |
- I mean, I've been sort of talking about this 02:21:23.820 |
published with a paper, which I think is very useful. 02:21:29.740 |
to kind of more of a conventional type conferences 02:21:59.660 |
Yeah, 'cause the communication of science broadly, 02:22:02.060 |
but the communication of computer science ideas 02:22:04.420 |
is how you make those ideas have impact, I think. 02:22:08.300 |
- Yeah, and I think, you know, a lot of this is 02:22:11.420 |
because people have in their mind kind of an objective, 02:22:24.860 |
But that comes at the expense of the progress of science. 02:22:34.420 |
- And we're not achieving fairness, you know, 02:22:38.060 |
we're doing, you know, a double-blind review, 02:22:46.700 |
- You write that the phenomenon of emergence, 02:22:49.340 |
collective behavior exhibited by a large collection 02:23:04.020 |
Do you think we understand how complex systems 02:23:16.020 |
You know, how is it that the universe around us 02:23:22.060 |
seems to be increasing in complexity and not decreasing? 02:23:25.140 |
I mean, that is a kind of curious property of physics 02:23:29.620 |
that despite the second law of thermodynamics, 02:23:32.340 |
we seem to be, you know, evolution and learning 02:23:35.940 |
and et cetera seems to be kind of at least locally 02:23:43.980 |
So perhaps the ultimate purpose of the universe 02:23:49.060 |
- Have these, I mean, small pockets of beautiful complexity. 02:23:57.100 |
do these kinds of emergence of complex systems 02:23:59.660 |
give you some intuition or guide your understanding 02:24:04.100 |
of machine learning systems and neural networks and so on? 02:24:06.660 |
Or are these for you right now, disparate concepts? 02:24:10.860 |
You know, I discovered the existence of the perceptron 02:24:18.540 |
by reading a book on, it was a debate between Chomsky 02:24:24.180 |
was kind of singing the praise of the perceptron 02:24:27.460 |
And I, the first time I heard about the learning machine, 02:24:33.540 |
which were basically transcription of, you know, 02:24:36.020 |
workshops or conferences from the 50s and 60s 02:24:42.140 |
So there were, there was a series of conferences 02:24:44.540 |
on self-organizing systems and these books on this. 02:24:50.180 |
at the internet archive, you know, the digital version. 02:24:53.220 |
And there are like fascinating articles in there by, 02:24:58.260 |
there's a guy whose name has been largely forgotten, 02:25:01.740 |
So it was a German physicist who immigrated to the US 02:25:06.180 |
and worked on self-organizing systems in the 50s. 02:25:14.420 |
he created the biological computer laboratory, BCL, 02:25:21.580 |
Unfortunately, that was kind of towards the end 02:25:27.660 |
but he wrote a bunch of papers about self-organization 02:25:35.620 |
imagine you are in space, there's no gravity, 02:25:43.820 |
with North Pole on one end, South Pole on the other end. 02:25:50.100 |
and probably form a complex structure, you know, 02:25:55.420 |
that could be an example of self-organization. 02:25:58.340 |
neural nets are an example of self-organization too, 02:26:05.900 |
how, like what is possible with this, you know, 02:26:11.940 |
in chaotic system and things like that, you know, 02:26:14.700 |
the emergence of life, you know, things like that. 02:26:24.660 |
the mathematics of emergence in some constrained situations 02:26:32.060 |
Like help us add a little spice to the systems 02:26:55.860 |
- And it's something also I've been fascinated by 02:27:03.900 |
So we don't actually have good ways of measuring, 02:27:06.940 |
or at least we don't have good ways of interpreting 02:27:11.940 |
Like how do you measure the complexity of something, right? 02:27:15.660 |
like, you know, Kolmogorov, Chaitin, Solomonoff complexity 02:27:18.540 |
of, you know, the length of the shortest program 02:27:22.460 |
can be thought of as the complexity of that bit string. 02:27:28.180 |
The problem with that is that that complexity 02:27:32.380 |
is defined up to a constant, which can be very large. 02:27:36.740 |
- There are similar concepts that are derived from, 02:27:45.580 |
is the negative log of its probability, essentially, right? 02:27:49.460 |
And you have a complete equivalence between the two things. 02:27:59.060 |
You need to have a model of the distribution. 02:28:07.940 |
with which you measure Kolmogorov complexity. 02:28:20.500 |
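Stated in symbols, and only as a recap of standard textbook results rather than anything derived in the conversation, the two caveats above look like this:

```latex
% Invariance theorem: switching the reference machine U to V changes the
% complexity by at most an additive constant, which can be very large.
\[
  \lvert K_U(x) - K_V(x) \rvert \;\le\; c_{U,V} \qquad \text{for all strings } x
\]
% Link to probability: for any computable model P of the data,
\[
  K_U(x) \;\le\; -\log_2 P(x) + c_P ,
\]
% so complexity behaves like a negative log-probability, up to a constant
% that depends on the model (or machine) you committed to.
```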
And so, you know, how can we come up with a good theory 02:28:25.580 |
if we don't have a good measure of complexity? 02:28:37.820 |
And the more interesting one is the alien one, 02:28:44.700 |
'Cause, you know, complexity, we associate complexity, 02:28:49.820 |
You know, we have to be able to like have concrete algorithms 02:28:55.700 |
for like measuring the level of complexity we see 02:29:00.780 |
in order to know the difference between life and non-life. 02:29:08.100 |
If I give you an image of the MNIST digits, right? 02:29:16.020 |
there is some, obviously some structure to it 02:29:30.980 |
to all the pixels, a fixed random permutation. 02:29:34.580 |
I show you those images, they will look, you know, 02:29:40.420 |
In fact, they're not more complex in absolute terms, 02:29:43.500 |
they're exactly the same as originally, right? 02:29:46.100 |
And if you knew what the permutation was, you know, 02:29:54.700 |
Now, all of a sudden, what looked complicated becomes simple. 02:30:00.900 |
humans on one end and then another race of aliens 02:30:03.820 |
that sees the universe with permutation glasses. 02:30:08.740 |
- What we perceive as simple to them is hardly complicated, 02:30:13.540 |
- Okay, and what they perceive as simple to us 02:30:25.780 |
- Depends what kind of algorithm you're running 02:30:28.380 |
- So I don't think we'll have a theory of intelligence, 02:30:31.140 |
self-organization, evolution, things like this 02:30:34.380 |
until we have a good handle on a notion of complexity, 02:30:40.860 |
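As a concrete illustration of the permutation-glasses point, here is a small Python sketch. The synthetic 28x28 image is a stand-in for an MNIST digit so the snippet runs without downloading anything, and the compressed size is only a crude proxy for a complexity measure.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an MNIST digit: a simple structured 28x28 image.
img = np.zeros((28, 28), dtype=np.uint8)
img[8:20, 8:20] = 255

# A fixed random permutation of the 784 pixel positions: the "permutation glasses".
perm = rng.permutation(img.size)
scrambled = img.flatten()[perm].reshape(img.shape)

# An off-the-shelf compressor finds the structured image easier to compress
# than the scrambled one, even though no information was added or removed.
print(len(zlib.compress(img.tobytes())))        # smaller: visible structure
print(len(zlib.compress(scrambled.tobytes())))  # larger: looks like noise to us

# Knowing the permutation, the original is recovered exactly, bit for bit:
# the "complexity" difference is in the observer's model, not in the data.
inverse = np.argsort(perm)
restored = scrambled.flatten()[inverse].reshape(img.shape)
assert np.array_equal(restored, img)
```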
- Yeah, it's sad to think that we might not be able 02:30:53.300 |
- This actually connects with fascinating questions 02:30:55.260 |
in physics at the moment, like modern physics, 02:30:58.140 |
quantum physics, like, you know, questions about, 02:31:00.300 |
like, you know, can we recover the information 02:31:02.580 |
that's lost in a black hole and things like this, right? 02:31:13.420 |
to build an expressive electronic wind instrument, EWI? 02:31:26.820 |
I like building things with combinations of electronics 02:31:32.460 |
You know, I have a bunch of different hobbies, 02:31:34.140 |
but, you know, probably my first one was little, 02:31:38.020 |
was building model airplanes and stuff like that. 02:31:42.740 |
I taught myself electronics before I studied it. 02:31:49.620 |
My cousin was an aspiring electronic musician 02:31:55.020 |
And I was, you know, basically modifying it for him 02:31:58.020 |
and building sequencers and stuff like that, right, for him. 02:32:02.620 |
- How's the interest in like progressive rock, like '80s? 02:32:11.100 |
But, you know, it's a combination of, you know, 02:32:39.500 |
And I played in an orchestra when I was in high school 02:32:48.060 |
a little bit of oboe, you know, things like that. 02:32:52.540 |
But I always wanted to play improvised music, 02:33:06.380 |
but, you know, you have wide variety of sound 02:33:13.100 |
going back to the late 80s from either Yamaha or Akai. 02:33:18.100 |
They're both kind of the main manufacturers of those. 02:33:25.700 |
But I've never been completely satisfied with them 02:33:54.340 |
You can hear it's Miles Davis playing the trumpet 02:34:04.780 |
The shape of the vocal tract kind of shapes the sound. 02:34:09.700 |
So how do you do this with an electronic instrument? 02:34:12.860 |
And I was, many years ago I met a guy called David Wessel. 02:34:26.140 |
And so I kept kind of thinking about this for many years. 02:34:28.620 |
And finally, because of COVID, you know, I was at home. 02:34:32.580 |
My workshop serves also as my kind of Zoom room 02:34:39.620 |
And I started really being serious about, you know, 02:34:45.780 |
- What else is going on in that New Jersey workshop? 02:34:50.860 |
Like just, or like left on the workshop floor, left behind? 02:34:57.580 |
electronics built with microcontrollers of various kinds 02:35:12.620 |
and he was building model airplanes when he was a kid. 02:35:33.020 |
- Do you also have an interest in appreciation of flight 02:35:36.100 |
in other forms, like with drones, quadcopters, or do you, 02:35:51.940 |
with gyroscopes and accelerometers for stabilization, 02:35:57.700 |
And then when it became kind of a standard thing 02:36:02.460 |
- Yeah, you were doing it before it was cool. 02:36:07.100 |
- What advice would you give to a young person today 02:36:10.020 |
in high school and college that dreams of doing 02:36:15.940 |
like let's talk in the space of intelligence, 02:36:20.940 |
some fundamental problem in space of intelligence, 02:36:26.180 |
being somebody who was a part of creating something special? 02:36:42.500 |
Like even like crazy big questions, like what's time? 02:36:51.460 |
And then learn basic things, like basic methods, 02:36:56.460 |
either from math, from physics or from engineering. 02:37:05.620 |
Like if you have a choice between like, you know, 02:37:08.740 |
learning, you know, mobile programming on iPhone 02:37:11.700 |
or quantum mechanics, take quantum mechanics. 02:37:20.420 |
And you may not, you may never be a quantum physicist, 02:37:29.140 |
It's the same formula that you use for, you know, 02:37:33.300 |
- So the ideas, the little ideas within quantum mechanics, 02:37:38.100 |
within some of these kind of more solidified fields 02:37:48.100 |
like you learn about Lagrangians, for example. 02:37:50.420 |
Which is like a hugely useful concept, you know, 02:37:57.300 |
Learn statistical physics, because all the math 02:38:01.660 |
that comes out of, you know, for machine learning, 02:38:10.940 |
So, and for some of them actually more recently, 02:38:16.100 |
who just got the Nobel prize for the replica method, 02:38:19.060 |
among other things, it's used for a lot of different things. 02:38:27.620 |
So, a lot of those kind of, you know, basic courses, 02:38:39.860 |
Again, something super useful is at the basis 02:38:44.900 |
which is an entirely new sub area of, you know, 02:39:00.420 |
Or to science that can help solve big problems in the world. 02:39:09.220 |
who started this project called Open Catalyst, 02:39:16.620 |
to help design new chemical compounds or materials 02:39:25.780 |
If you can efficiently separate oxygen from hydrogen 02:39:39.740 |
and you have them work all day, produce hydrogen, 02:39:43.420 |
and then you ship the hydrogen wherever it's needed. 02:39:53.420 |
that's, you know, can be transported anywhere. 02:39:59.700 |
energy storage technology, like producing hydrogen, 02:40:13.580 |
and the plasma is unstable, and you can't control it. 02:40:16.220 |
Maybe with deep learning, you can find controllers 02:40:19.100 |
and make, you know, practical fusion reactors. 02:40:33.900 |
in science and physics and biology and chemistry 02:40:41.540 |
- Right, I mean, there's properties of, you know, 02:40:48.540 |
So, you know, if we could design new, you know, 02:40:53.060 |
new materials, we could make more efficient batteries. 02:40:56.420 |
You know, we could make maybe faster electronics. 02:40:58.780 |
We could, I mean, there's a lot of things we can imagine 02:41:07.620 |
I mean, there's all kinds of stuff we can imagine. 02:41:09.500 |
If we had good fuel cells, hydrogen fuel cells, 02:41:13.620 |
and, you know, transportation wouldn't be, or cars, 02:41:20.300 |
CO2 emission problems for air transportation anymore. 02:41:29.180 |
And this is not even talking about all the sort of 02:41:32.420 |
medicine, biology, and everything like that, right? 02:41:38.100 |
figuring out, like, how can you design your proteins 02:41:40.540 |
that it sticks to another protein at a particular site, 02:41:42.820 |
because that's how you design drugs in the end. 02:41:47.580 |
all of this, and those are kind of, you know, 02:41:54.300 |
If you take, this is like from recent material physics, 02:41:58.260 |
you take a monoatomic layer of graphene, right? 02:42:10.340 |
you twist them by some magic number of degrees, 02:42:13.100 |
three degrees or something, it becomes superconductor. 02:42:22.460 |
but that's the kind of thing that machine learning 02:42:23.900 |
can actually discover, these kinds of things. 02:42:28.980 |
that with machine learning, we would train a system 02:42:40.380 |
where this collective phenomenon is too difficult 02:42:46.900 |
with the usual sort of reductionist type method. 02:42:59.180 |
after being trained with sufficiently many samples. 02:43:08.100 |
where he basically trained a convolutional net, 02:43:13.420 |
essentially, to predict the aerodynamic properties of solids. 02:43:17.980 |
And you can generate as much data as you want 02:43:19.620 |
by just running computational fluid dynamics, right? 02:43:40.060 |
train a neural net to make those predictions, 02:43:41.780 |
and now what you have is a differentiable model 02:43:58.260 |
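A minimal sketch of that pattern, with a toy function standing in for the fluid-dynamics solver (the real project, its data, and its architecture are not shown here): fit a convolutional net on solver-generated samples, then use the fact that the net is differentiable to optimize a design by gradient descent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def toy_solver(shapes):
    """Stand-in for an expensive CFD run: maps a (B, 1, 32, 32) shape mask
    to one scalar per sample. A real setup would call a fluid simulator."""
    weights = torch.linspace(0.0, 1.0, shapes.shape[-1])
    return (shapes * weights).mean(dim=(-1, -2))      # shape (B, 1)

# 1) Train a convolutional surrogate on solver-generated data.
surrogate = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(8 * 32 * 32, 1),
)
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(200):
    shapes = torch.rand(64, 1, 32, 32)                # generate as much data as you want
    targets = toy_solver(shapes)
    loss = F.mse_loss(surrogate(shapes), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2) The surrogate is differentiable, so a design can be optimized through it.
for p in surrogate.parameters():
    p.requires_grad_(False)                           # freeze the surrogate
design = torch.rand(1, 1, 32, 32, requires_grad=True)
design_opt = torch.optim.Adam([design], lr=0.05)
for _ in range(100):
    predicted_cost = surrogate(design.clamp(0, 1)).mean()
    design_opt.zero_grad()
    predicted_cost.backward()
    design_opt.step()
```

The two-step recipe, surrogate first, gradient-based design search second, is what makes a learned emulator more useful than a lookup table of simulation runs.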
probably you should read a little bit of literature 02:44:01.420 |
and a little bit of history for inspiration and for wisdom, 02:44:18.380 |
I'm really honored that you would talk with me today. 02:44:26.220 |
after all these years about everything that's going on. 02:44:28.780 |
You're a beacon of hope for the machine learning community. 02:44:32.700 |
for spending your valuable time with me today. 02:44:37.780 |
- Thanks for listening to this conversation with Yann LeCun. 02:44:42.780 |
please check out our sponsors in the description. 02:44:49.580 |
"Your assumptions are your windows on the world.