Yann LeCun: Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36
Chapters
0:00 Intro
1:11 Space Odyssey
1:38 Value misalignment
3:05 Designing objective functions
4:12 Ethical systems
4:57 Holding secrets
5:42 Autonomous AI
9:15 Intuition
10:49 Learning and Reasoning
13:10 Working Memory
16:07 Energy minimization
18:20 Expert systems
21:13 Causal inference
24:23 Deep learning in the 90s
31:10 A war story
34:48 Toy problems
36:41 Interactive environments
40:48 How many boolean functions
45:08 Self-supervised learning
47:18 Ground language in reality
50:28 Handling uncertainty in the world
52:14 Selfsupervised learning
53:53 Model-based reinforcement learning
56:58 Transfer learning
The following is a conversation with Yann LeCun. 00:00:03.080 |
He's considered to be one of the fathers of deep learning, 00:00:09.040 |
which is the recent revolution in AI that's captivated the world 00:00:12.280 |
with the possibility of what machines can learn from data. 00:00:18.520 |
a vice president and chief AI scientist at Facebook, 00:00:26.280 |
He's probably best known as the founding father 00:00:40.080 |
unafraid to speak his mind in a distinctive French accent 00:00:45.720 |
both in the rigorous medium of academic research 00:00:57.960 |
give it five stars on iTunes, support it on Patreon, 00:01:00.960 |
or simply connect with me on Twitter at Lex Fridman, 00:01:06.840 |
And now, here's my conversation with Yann LeCun. 00:01:15.400 |
HAL 9000 decides to get rid of the astronauts 00:01:20.360 |
for people that haven't seen the movie, spoiler alert, 00:01:23.040 |
because he, it, she believes that the astronauts, 00:01:31.600 |
Do you see HAL as flawed in some fundamental way 00:01:52.120 |
and the machine strives to achieve this objective. 00:01:55.680 |
And if you don't put any constraints on this objective, 00:01:58.120 |
like don't kill people and don't do things like this, 00:02:00.760 |
the machine, given the power, will do stupid things 00:02:07.960 |
or damaging things to achieve this objective. 00:02:10.160 |
It's a little bit like, I mean, we're used to this 00:02:31.520 |
and education, obviously, to sort of correct for those. 00:02:35.160 |
- So maybe just pushing a little further on that point, 00:02:44.360 |
There's a, there's fuzziness around the ambiguity 00:02:49.800 |
but, you know, do you think that there will be a time 00:02:54.800 |
from a utilitarian perspective where an AI system, 00:02:58.160 |
where it is not misalignment, where it is alignment 00:03:02.800 |
that an AI system will make decisions that are difficult? 00:03:06.800 |
I mean, eventually we'll have to figure out how to do this. 00:03:12.600 |
because we've been doing this with humans for millennia. 00:03:20.880 |
And we don't do it by, you know, programming things, 00:03:30.680 |
And it's actually the design of an objective function. 00:03:40.720 |
So there is this idea somehow that it's a new thing 00:03:44.560 |
for people to try to design objective functions 00:03:47.920 |
But no, we've been writing laws for millennia 00:03:52.080 |
So that's where, you know, the science of lawmaking 00:04:02.840 |
- So it's nothing, there's nothing special about HAL 00:04:09.440 |
to make some of these difficult ethical judgments 00:04:13.000 |
- Yeah, and we have systems like this already 00:04:15.080 |
that make many decisions for ourselves in society 00:04:22.600 |
like rules about things that sometimes have bad side effects. 00:04:27.480 |
And we have to be flexible enough about those rules 00:04:39.640 |
- Wow, is that by accident or is there a lot- 00:04:52.520 |
so an improvement of HAL 9000, what would you improve? 00:04:57.920 |
I wouldn't ask you to hold secrets and tell lies 00:05:01.960 |
because that's really what breaks it in the end. 00:05:03.800 |
That's the fact that it's asking itself questions 00:05:11.720 |
all the secrecy of the preparation of the mission 00:05:13.960 |
and the fact that it was a discovery on the lunar surface 00:05:28.560 |
- So you think there's never should be a set of things 00:05:35.480 |
like a set of facts that should not be shared 00:05:42.320 |
- Well, I think, no, I think it should be a bit like 00:06:05.920 |
and we can sort of hardwire this into our machines 00:06:10.920 |
So I'm not, you know, an advocate of the three laws 00:06:14.640 |
of robotics, you know, the Asimov kind of thing, 00:06:26.920 |
these are not questions that are kind of really worth 00:06:31.120 |
asking today because we just don't have the technology 00:06:34.400 |
We don't have autonomous intelligent machines. 00:06:37.480 |
semi-intelligent machines that are very specialized. 00:06:40.960 |
But they don't really sort of satisfy an objective. 00:06:43.320 |
They're just, you know, kind of trained to do one thing. 00:06:49.960 |
of a full-fledged autonomous intelligent system, 00:06:53.320 |
asking the question of how we design this objective, 00:07:01.560 |
in that it helps us understand our own ethical codes, 00:07:10.240 |
if you imagine that an AGI system is here today, 00:07:14.280 |
how would we program it as a kind of nice thought experiment 00:07:39.200 |
but certainly they shouldn't be framed as HAL. 00:07:49.400 |
but what is the most beautiful or surprising idea 00:08:11.040 |
The fact that you can build gigantic neural nets, 00:08:16.440 |
train them on relatively small amounts of data, 00:08:26.920 |
breaks everything you read in every textbook, right? 00:08:29.240 |
Every pre-deep learning textbook that told you 00:09:00.360 |
before I knew anything that this is a good idea. 00:09:10.080 |
- So, okay, so can you talk through the intuition 00:09:12.280 |
of why it was obvious to you if you remember? 00:09:16.120 |
it's sort of like those people in the late 19th century 00:09:19.960 |
who proved that heavier than air flight was impossible, right? 00:09:30.400 |
it's obviously wrong as an empirical question, right? 00:09:39.920 |
And we know it's a large network of neurons in interaction 00:09:43.160 |
and that learning takes place by changing the connections. 00:09:49.320 |
but sort of trying to derive basic principles, 00:09:59.680 |
that I've been convinced of since I was an undergrad 00:10:04.680 |
that intelligence is inseparable from learning. 00:10:10.040 |
an intelligent machine by basically programming, 00:10:20.320 |
arrives at this intelligence through learning. 00:10:25.800 |
machine learning was a completely obvious path. 00:10:35.200 |
and learning is the automation of intelligence. 00:10:44.560 |
Because do you think of reasoning as learning? 00:11:03.440 |
- Do you think neural networks can be made to reason? 00:11:18.280 |
will emerge from it, you know, from learning? 00:11:23.160 |
all of our kind of model of what reasoning is 00:11:28.880 |
and are therefore incompatible with gradient-based learning. 00:11:39.920 |
that don't use kind of gradient information, if you want. 00:11:47.520 |
it's just that it's incompatible with learning 00:11:57.560 |
with suspicion by a lot of computer scientists 00:12:07.720 |
the kind of math you do in electrical engineering 00:12:10.200 |
than the kind of math you do in computer science. 00:12:12.760 |
And, you know, nothing in machine learning is exact, right? 00:12:16.200 |
Computer science is all about sort of, you know, 00:12:21.920 |
of like, you know, every index has to be right 00:12:24.200 |
and you can prove that an algorithm is correct, right? 00:12:27.200 |
Machine learning is the science of sloppiness, really. 00:12:33.560 |
So, okay, maybe let's feel around in the dark 00:12:41.840 |
or a system that works with continuous functions 00:12:54.320 |
build on previous knowledge, build on extra knowledge, 00:12:59.520 |
generalize outside of any training set ever built. 00:13:04.560 |
If, yeah, maybe do you have inklings of thoughts 00:13:14.200 |
I think, you know, we'd be building it right now. 00:13:22.240 |
So what you need to have is a working memory. 00:13:25.320 |
So you need to have some device, if you want, 00:13:29.960 |
some subsystem that can store a relatively large number 00:13:43.920 |
there are kind of three main types of memory. 00:13:45.760 |
One is the sort of memory of the state of your cortex, 00:13:51.560 |
and that sort of disappears within 20 seconds. 00:13:53.800 |
You can't remember things for more than about 20 seconds 00:13:56.200 |
or a minute if you don't have any other form of memory. 00:14:00.400 |
The second type of memory, which is longer term, 00:14:04.320 |
So you can, you know, you came into this building, 00:14:06.520 |
you remember where the exit is, where the elevators are. 00:14:15.360 |
You might remember something about what I said, 00:14:19.360 |
you might remember something about what I said, 00:14:28.400 |
And then the longer term memory is in the synapse, 00:14:36.440 |
is that you want the hippocampus-like thing, right? 00:14:44.560 |
Neural Turing Machines and stuff like that, right? 00:14:46.680 |
Transformers, which have sort of a memory in there, 00:14:57.120 |
Another thing you need is some sort of network 00:15:03.200 |
get an information back, and then kind of crunch on it, 00:15:14.280 |
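A minimal sketch of the kind of differentiable memory read that memory networks and transformers use, assuming simple dot-product attention over a small key/value store (an illustration, not any of the specific systems mentioned):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8                                # embedding size
keys = rng.normal(size=(10, d))      # 10 memory slots: their "addresses"
values = rng.normal(size=(10, d))    # 10 memory slots: the stored content
query = rng.normal(size=d)           # what the network wants to look up

weights = softmax(keys @ query)      # soft addressing: how well each slot matches
read = weights @ values              # differentiable read: weighted sum of contents

print(weights.round(3))              # attention distribution over the memory slots
print(read.shape)                    # (8,): a vector the rest of the network can use
```

Because the read is a soft, weighted sum rather than a hard lookup, gradients flow through it and the whole thing can be trained end to end.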
is a process by which you update your knowledge 00:15:31.080 |
so that seems to be too small to contain the knowledge 00:15:39.200 |
- Well, a transformer doesn't have this idea of recurrence. 00:15:47.120 |
- But recurrence would build on the knowledge somehow. 00:15:54.680 |
and expand the amount of information, perhaps, 00:16:00.320 |
But is this something that just can emerge with size? 00:16:04.760 |
Because it seems like everything we have now is too small. 00:16:13.000 |
way, I mean, sort of the original memory network 00:16:15.200 |
maybe had something like the right architecture, 00:16:20.520 |
so that the memory contains all of Wikipedia, 00:16:25.120 |
- So there's a need for new ideas there, okay. 00:16:31.360 |
which is very classical, also, in some types of AI, 00:16:36.360 |
and it's based on, let's call it energy minimization. 00:17:22.120 |
You have a model of what's going to happen in the world 00:17:25.560 |
and that allows you to, by energy minimization, 00:17:29.840 |
that optimizes a particular objective function, 00:17:43.520 |
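A minimal sketch of inference by energy minimization, assuming a toy quadratic energy and plain gradient descent (the energy, shapes, and step size are all made up for illustration):

```python
import numpy as np

# Toy energy E(x, y) = ||y - W x||^2: x is the observed input, y is the variable
# we infer. Inference here means finding the y that minimizes the energy.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # fixed parameters of the (toy) world model
x = rng.normal(size=5)        # observation

def energy(y):
    return float(np.sum((y - W @ x) ** 2))

def energy_grad(y):
    return 2.0 * (y - W @ x)  # analytic gradient of the quadratic energy

y = np.zeros(3)               # arbitrary initial guess
for _ in range(200):          # gradient descent as the minimization procedure
    y -= 0.1 * energy_grad(y)

print("inferred y:", np.round(y, 4))
print("final energy:", round(energy(y), 8))  # close to 0: y has converged to W @ x
```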
And perhaps what led to the ability of humans to reason 00:18:16.720 |
is not a useful way to think about knowledge? 00:18:20.280 |
- Graphs are a little brittle, or logic representation. 00:18:23.960 |
So basically, you know, variables that have values 00:18:31.280 |
is a little too rigid and too brittle, right? 00:18:32.840 |
So one of the, you know, some of the early efforts 00:18:35.680 |
in that respect were to put probabilities on them. 00:18:40.680 |
So a rule, you know, if you have this and that symptom, 00:18:44.280 |
you know, you have this disease with that probability, 00:18:58.320 |
you know, Bayesian networks and graphical models 00:19:00.320 |
and causal inference and variational, you know, method. 00:19:04.960 |
So there is, I mean, certainly a lot of interesting work 00:19:11.440 |
The main issue with this is knowledge acquisition. 00:19:13.880 |
How do you reduce a bunch of data to a graph of this type? 00:19:21.840 |
on the human being to encode, to add knowledge. 00:19:31.440 |
do you want to represent knowledge as symbols, 00:19:34.640 |
and do you want to manipulate them with logic? 00:19:37.240 |
And again, that's incompatible with learning. 00:19:42.680 |
Geoff Hinton has been advocating for many decades 00:20:14.360 |
"From Machine Learning to Machine Reasoning." 00:20:23.160 |
and then put the result back in the same space. 00:20:24.920 |
So it's this idea of working memory, basically. 00:20:36.900 |
I mean, you can learn basic logic operations there. 00:20:43.400 |
There's a big debate on sort of how much prior structure 00:20:46.680 |
you have to put in for this kind of stuff to emerge. 00:20:57.520 |
from the, you mentioned causal inference world. 00:21:00.240 |
So his worry is that the current neural networks 00:21:06.960 |
what causes what causal inference between things. 00:21:12.760 |
- So I think he's right and wrong about this. 00:21:21.320 |
people sort of didn't worry too much about this. 00:21:23.800 |
But there's a lot of people now working on causal inference. 00:21:26.200 |
And there's a paper that just came out last week 00:21:32.000 |
exactly on that problem of how do you kind of, 00:21:48.040 |
- I'd like to read that paper because that ultimately, 00:21:51.200 |
the challenge there is also seems to fall back 00:21:56.920 |
to ultimately decide causality between things. 00:22:01.920 |
- People are not very good at establishing causality, 00:22:06.560 |
and physicists actually don't believe in causality 00:22:08.560 |
because look at all the basic laws of macrophysics 00:22:17.400 |
- It's as soon as you start looking at macroscopic systems, 00:22:28.360 |
- Is it emergent or is it part of the fundamental fabric 00:22:34.320 |
- Or is it a bias of intelligent systems that, 00:22:48.480 |
the math doesn't care about the flow of time. 00:22:54.120 |
People themselves are not very good at establishing 00:22:58.960 |
If you ask, I think it was in one of Seymour Papert's books 00:23:08.880 |
he's the guy who co-authored the book "Perceptrons" 00:23:17.240 |
He, in the sense of studying learning in humans 00:23:21.080 |
and machines, that's why he got interested in perceptron. 00:23:32.680 |
a lot of kids will say, they will think for a while 00:23:35.840 |
and they'll say, "Oh, it's the branches in the trees. 00:23:40.120 |
So they get the causal relationship backwards. 00:23:42.600 |
And it's because their understanding of the world 00:23:46.280 |
I mean, these are like four or five year old kids. 00:23:48.800 |
It gets better and then you understand that this, 00:23:57.440 |
because of our common sense understanding of things, 00:24:04.960 |
there's a lot of stuff that we can figure out 00:24:24.520 |
but all of humanity has been completely deluded 00:24:34.600 |
you attribute it to some deity, some divinity. 00:24:40.120 |
That's a way of saying, "I don't know the cause, 00:24:43.080 |
- So you mentioned Marvin Minsky and the irony of 00:24:54.600 |
You were there in the '90s, you were there in the '80s, 00:24:58.120 |
In the '90s, why do you think people lost faith 00:25:00.640 |
in deep learning in the '90s and found it again 00:25:13.840 |
I mean, I think I would put that around 1995, 00:25:23.760 |
from mainstream machine learning, if you want. 00:25:26.280 |
There were, it was basically electrical engineering 00:25:39.600 |
I was too close to it to really sort of analyze it 00:25:50.800 |
So the first one is, at the time, neural nets were, 00:25:57.880 |
in the sense that you would implement backprop 00:26:18.680 |
You would probably make some very basic mistakes, 00:26:21.320 |
like, you know, badly initialize your weights, 00:26:27.640 |
And of course, you know, and you would train on XOR 00:26:29.280 |
because you didn't have any other dataset to train on. 00:26:32.000 |
And of course, you know, it works half the time. 00:26:36.280 |
Also, you would train it with batch gradient, 00:26:40.240 |
So there was a lot of, there was a bag of tricks 00:26:42.680 |
that you had to know to make those things work, 00:26:54.720 |
to be able to kind of, you know, display things, 00:26:59.360 |
kind of get a good intuition for how to get them to work, 00:27:02.120 |
have enough flexibility so you can create, you know, 00:27:04.640 |
network architectures like convolutional nets 00:27:09.160 |
I mean, you had to write everything from scratch. 00:27:22.760 |
which, by the way, is one of my favorite languages. 00:27:34.960 |
it's that we had to write our Lisp interpreter. 00:27:37.560 |
Okay, 'cause it's not like we used one that existed. 00:27:50.920 |
we invented this idea of basically having modules 00:27:57.680 |
and then interconnecting those modules in a graph. 00:28:04.800 |
and we were able to implement this using our Lisp system. 00:28:14.400 |
So we actually wrote a compiler for that Lisp interpreter 00:28:16.880 |
so that Patrice Simard, who is now at Microsoft, 00:28:26.640 |
and then we'll have a self-contained compute system 00:28:32.280 |
Neither PyTorch nor TensorFlow can do this today. 00:28:40.280 |
- I mean, there's something like that in PyTorch 00:28:48.120 |
we had to invest a huge amount of effort to do this. 00:28:52.440 |
if you don't completely believe in the concept, 00:28:55.160 |
you're not going to invest the time to do this. 00:28:59.320 |
or today, this would turn into Torch or PyTorch 00:29:05.000 |
everybody would use it and realize it's good. 00:29:13.800 |
release anything in open source of this nature. 00:29:17.760 |
And so we could not distribute our code, really. 00:29:24.920 |
I also read that there was some almost patent, 00:29:27.760 |
like a patent on convolutional neural networks. 00:30:07.600 |
And there was a period where the US patent office 00:30:24.040 |
I mean, I never actually strongly believed in this, 00:30:28.880 |
Facebook basically doesn't believe in this kind of patent. 00:30:33.200 |
Google files patents because they've been burned by Apple. 00:30:38.200 |
And so now they do this for defensive purpose, 00:30:59.480 |
They are there because of the legal landscape 00:31:11.760 |
So what happens was the first patent about convolutional net 00:31:15.400 |
was about kind of the early version of convolutional net 00:31:22.800 |
with stride more than one, if you want, right? 00:31:25.200 |
And then there was a second one on convolutional nets 00:31:28.400 |
with separate pooling layers, training with backprop. 00:31:36.200 |
At the time, the life of a patent was 17 years. 00:31:39.320 |
So here's what happened over the next few years 00:31:44.640 |
character recognition technology around convolutional nets. 00:31:56.160 |
In 1995, it was for large check reading machines 00:32:00.640 |
And those systems were developed by an engineering group 00:32:16.600 |
And the lawyers just looked at all the patents 00:32:20.400 |
and they distributed the patents among the various companies. 00:32:22.960 |
They gave the convolutional net patent to NCR 00:32:26.400 |
because they were actually selling products that used it. 00:32:29.200 |
But nobody at NCR had any idea what a convolutional net was. 00:32:39.880 |
where I didn't actually work on machine learning 00:32:44.880 |
And between 2002 and 2007, I was working on them, 00:32:48.840 |
crossing my finger that nobody at NCR would notice. 00:32:52.040 |
- Yeah, and I hope that this kind of somewhat, 00:32:58.320 |
relative openness of the community now will continue. 00:33:02.920 |
- It accelerates the entire progress of the industry. 00:33:13.000 |
is not whether Facebook or Google or Microsoft or IBM 00:33:21.080 |
We want to build intelligent virtual assistants 00:33:24.960 |
We don't have monopoly on good ideas for this. 00:33:33.840 |
to human level intelligence and common sense, 00:33:52.600 |
This calls to the gap between the space of ideas 00:34:00.440 |
of practical application that you often speak to. 00:34:07.880 |
"to have a solution to artificial general intelligence, 00:34:14.220 |
"or who claim to have figured out how the brain works. 00:34:29.120 |
But I think your opinion is still MNIST and ImageNet, 00:34:43.360 |
and the practical testing, the practical application 00:34:57.280 |
as some sort of standard kind of benchmark, if you want. 00:35:04.280 |
people, Jason Weston, Antoine Bordes and a few others 00:35:14.320 |
to access working memory and things like this. 00:35:16.920 |
And it was very useful, even though it wasn't a real task. 00:35:26.040 |
It's just that I was really struck by the fact that 00:35:29.560 |
a lot of people, particularly a lot of people 00:35:31.120 |
with money to invest would be fooled by people telling them, 00:35:40.200 |
So there's a lot of people who try to take advantage 00:35:58.620 |
or it may be very difficult to establish a benchmark. 00:36:02.560 |
Establishing benchmarks is part of the process. 00:36:14.920 |
to just every kind of information you can pull off 00:36:24.940 |
what kind of benchmarks do you see that start creeping 00:36:33.600 |
like reasoning, like maybe you don't like the term, 00:36:41.520 |
- A lot of people are working on interactive environments 00:36:44.160 |
in which you can train and test intelligent systems. 00:36:50.200 |
the classical paradigm of supervised learning 00:37:10.100 |
the order in which you see them shouldn't matter, 00:37:17.560 |
which is the case, for example, in robotics, right? 00:37:32.960 |
so that creates also a dependency between samples, right? 00:37:40.960 |
is gonna be probably in the same building, most likely. 00:37:47.920 |
of this training set, test set hypothesis break 00:37:56.400 |
So people are setting up artificial environments 00:38:05.840 |
and can interact with objects and things like this. 00:38:18.800 |
and you have games, you know, things like that. 00:38:29.840 |
because it implies that human intelligence is general 00:38:35.760 |
and human intelligence is nothing like general. 00:38:40.860 |
We think it's general, we like to think of ourselves 00:38:54.220 |
is ability to learn, as we were talking about learning, 00:39:18.520 |
coming out of one of your eyes, okay, two million total, 00:39:23.440 |
It's one million nerve fibers, your optical nerve. 00:39:30.640 |
So the input to your visual cortex is one million bits. 00:39:36.880 |
Now, they're connected to your brain in a particular way, 00:39:41.940 |
that are kind of a little bit like a convolutional net, 00:39:44.160 |
they're kind of local in space and things like this. 00:39:55.720 |
and I put a device that makes a random permutation, 00:40:04.600 |
is a fixed but random permutation of all the pixels. 00:40:07.820 |
There's no way in hell that your visual cortex, 00:40:22.680 |
- No, because now two pixels that are nearby in the world 00:40:25.640 |
will end up in very different places in your visual cortex, 00:40:29.240 |
and your neurons there have no connections with each other 00:40:35.040 |
the hardware is built in many ways to support-- 00:40:42.600 |
- Yeah, but it's still pretty damn impressive. 00:40:54.040 |
so let's imagine you want to train your visual system 00:40:58.280 |
to recognize particular patterns of those one million bits. 00:41:33.020 |
How many of those functions can actually be computed 00:41:37.240 |
And the answer is a tiny, tiny, tiny, tiny, tiny, tiny sliver 00:41:51.480 |
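A back-of-the-envelope version of that counting argument, for n = one million binary inputs:

```latex
% n = 10^6 binary inputs (one per optic-nerve fiber)
\text{possible inputs: } 2^{n} = 2^{10^{6}}, \qquad
\text{Boolean functions } f:\{0,1\}^{n}\to\{0,1\}: \; 2^{2^{n}} = 2^{2^{10^{6}}}
```

Any network with a fixed, finite number of connections can realize only a vanishing fraction of those 2^(2^n) functions, which is the sliver in question.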
Okay, that's an argument against the word general. 00:41:55.540 |
I agree with your intuition, but I'm not sure it's, 00:42:13.380 |
that are outside of our comprehension, right? 00:42:54.140 |
When you reduce the volume, the temperature goes up, 00:42:57.380 |
the pressure goes up, things like that, right? 00:43:02.180 |
Those are the things you can know about that system. 00:43:16.720 |
And what you don't know about it is the entropy, 00:43:23.980 |
The energy contained in that thing is what we call heat. 00:43:45.420 |
And you're right, that's a nice way to put it. 00:43:47.340 |
We're general to all the things we can imagine, 00:43:50.220 |
which is a very tiny subset of all things that are possible. 00:43:56.300 |
or the Kolmogorov-Chaitin-Solomonoff complexity. 00:44:05.580 |
except for all the ones that you can actually write down. 00:44:13.500 |
So we can just call it artificial intelligence. 00:44:31.580 |
and it's difficult to define what human intelligence is. 00:44:43.900 |
Damn impressive demonstration of intelligence, whatever. 00:44:46.700 |
And so on that topic, most successes in deep learning 00:44:57.860 |
Is there a hope to reduce involvement of human input 00:45:16.620 |
The only thing I'm interested in at the moment is, 00:45:19.100 |
I call it self-supervised learning, not unsupervised, 00:45:21.260 |
'cause unsupervised learning is a loaded term. 00:45:24.020 |
People who know something about machine learning 00:45:31.580 |
And the wide public, when you say unsupervised learning, 00:45:33.660 |
oh my God, machines are gonna learn by themselves 00:45:40.820 |
- Yeah, so I call it self-supervised learning 00:45:42.940 |
because in fact, the underlying algorithms that are used 00:45:52.340 |
is not predict a particular set of variables, 00:46:06.420 |
But what you train the machine to do is basically 00:46:18.820 |
and ask it to predict what's gonna happen next. 00:46:20.980 |
And of course, after a while, you can show what happens 00:46:27.540 |
You can do, like all the latest, most successful models 00:46:34.820 |
You know, sort of BERT-style systems, for example, right? 00:46:38.700 |
You show it a window of a dozen words on a text corpus, 00:46:59.540 |
- So you construct, so in an unsupervised way, 00:47:05.100 |
- Or video, or the physical world, or whatever, right? 00:47:24.780 |
to have kind of true human level intelligence, 00:47:26.860 |
I think you need to ground language in reality. 00:47:29.260 |
So some people are attempting to do this, right? 00:47:32.820 |
Having systems that kind of have some visual representation 00:47:41.100 |
But it's like a huge technical problem that is not solved, 00:47:45.100 |
and that explains why self-supervised learning works 00:47:52.780 |
in the context of image recognition and video, 00:48:00.700 |
it's much easier to represent uncertainty in the prediction 00:48:06.940 |
than it is in the context of things like video and images. 00:48:13.940 |
you know, 15% of the words that are taken out. 00:48:20.060 |
There is a hundred thousand words in the lexicon, 00:48:33.140 |
So there, representing uncertainty in the prediction 00:48:58.780 |
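A minimal sketch of that BERT-style recipe — mask roughly 15% of the tokens and train the network to output a full distribution over the vocabulary at each blank, which is how the uncertainty is represented — using a toy vocabulary and random logits in place of a real model (everything here is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "[MASK]"]
word_to_id = {w: i for i, w in enumerate(vocab)}

sentence = ["the", "cat", "sat", "on", "the", "mat"]
tokens = [word_to_id[w] for w in sentence]

# Corrupt the input: blank out roughly 15% of the positions.
n_masked = max(1, len(tokens) * 15 // 100)
mask_positions = rng.choice(len(tokens), size=n_masked, replace=False)
corrupted = list(tokens)
for p in mask_positions:
    corrupted[p] = word_to_id["[MASK]"]

# A real model would map `corrupted` to logits; random logits stand in here.
logits = rng.normal(size=(len(tokens), len(vocab)))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Training objective: cross-entropy on the masked positions only. The output is
# a full distribution over the vocabulary, so uncertainty about the missing word
# is captured instead of forcing a single "correct" guess.
loss = -np.mean([np.log(probs[p, tokens[p]]) for p in mask_positions])
print("masked positions:", sorted(int(p) for p in mask_positions))
print("cross-entropy on masks:", round(float(loss), 3))
```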
You can't train a system to make one prediction. 00:49:22.820 |
but I might turn my head to the left or to the right. 00:49:27.020 |
If you don't have a system that can predict this, 00:49:31.820 |
to kind of minimize the error with a prediction 00:49:37.020 |
in all possible future positions that I might be in, 00:49:59.340 |
there might be artificial ways of like self-play in games 00:50:03.260 |
to where you can simulate part of the environment. 00:50:16.940 |
And because you can do huge amounts of data generation, 00:50:21.580 |
Well, it creeps up on the problem from the side of data, 00:50:26.020 |
and you don't think that's the right way to creep up. 00:50:30.980 |
So if you have a machine learn a predictive model 00:50:42.540 |
Just give a few frames of the game to a ConvNet, 00:50:47.020 |
and then have it generate the next few frames. 00:50:49.660 |
And if the game is deterministic, it works fine. 00:50:52.380 |
And that includes feeding the system with the action 00:51:03.060 |
The problem comes from the fact that the real world 00:51:09.700 |
And so there you get those blurry predictions, 00:51:11.340 |
and you can't do planning with blurry predictions. 00:51:14.300 |
Right, so if you have a perfect model of the world, 00:51:26.740 |
But if your model is imperfect, how can you plan? 00:51:33.900 |
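A tiny numerical illustration of that failure mode, with made-up "frames": when two futures are equally likely, the single prediction that minimizes squared error is their average — the blur — which is not a future that can actually occur, and not something you can plan against.

```python
import numpy as np

# Two equally likely futures for a tiny 1-D "frame": an edge on the left or right.
futures = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

def expected_mse(prediction):
    return float(np.mean((futures - prediction) ** 2))

candidates = {
    "commit to future A": futures[0],
    "commit to future B": futures[1],
    "average of futures (blurry)": futures.mean(axis=0),
}
for name, pred in candidates.items():
    print(f"{name}: {pred} -> expected MSE {expected_mse(pred):.3f}")
# The blurry average scores best under squared error, yet it is not a possible
# future -- which is why planning on top of it breaks down.
```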
What are your thoughts on the extension of this, 00:51:39.660 |
it's connected to something you were talking about 00:51:44.580 |
So as opposed to sort of completely unsupervised 00:51:58.100 |
So if you think about a robot exploring a space 00:52:05.260 |
every once in a while asking for human input, 00:52:14.180 |
It's going to make things that we can already do 00:52:16.780 |
more efficient, or they will learn slightly more efficiently, 00:52:29.340 |
there's no conflict between self-supervised learning, 00:52:34.340 |
reinforcement learning, and supervised learning, 00:52:38.020 |
I see self-supervised learning as a preliminary 00:53:04.580 |
take about 80 hours of training to reach the level 00:53:07.780 |
that any human can reach in about 15 minutes. 00:53:10.020 |
They get better than humans, but it takes them a long time. 00:53:21.020 |
Oriol Vinyals and his team's system to play StarCraft, 00:53:27.020 |
plays, you know, a single map, a single type of player, 00:53:38.780 |
with about the equivalent of 200 years of training 00:53:46.380 |
It's not something that no human can, could ever do. 00:53:50.060 |
- I mean, I'm not sure what lesson to take away from that. 00:54:00.180 |
It would probably have to drive millions of hours. 00:54:03.940 |
It will have to kill thousands of pedestrians. 00:54:11.620 |
before it figures out that it's a bad idea, first of all. 00:54:15.140 |
And second of all, before it figures out how not to do it. 00:54:18.460 |
And so, I mean, this type of learning, obviously, 00:54:28.700 |
which I've been advocating for like five years now, 00:54:31.380 |
is that we have predictive models of the world 00:54:34.660 |
that include the ability to predict under uncertainty. 00:54:54.300 |
we know that if we turn the wheel to the right, 00:54:59.860 |
Because we have a pretty good model of intuitive physics 00:55:05.300 |
Babies learn this around the age of eight or nine months, 00:55:14.180 |
of the effect of turning the wheel of the car. 00:55:18.060 |
So there's a lot of things that we bring to the table, 00:55:20.620 |
which is basically our predictive model of the world. 00:55:23.500 |
And that model allows us to not do stupid things 00:55:35.260 |
but that allows us to learn really, really, really quickly. 00:55:38.780 |
So that's called model-based reinforcement learning. 00:55:41.340 |
There's some imitation and supervised learning 00:55:58.100 |
- And the physics is somewhat transferable from, 00:56:09.060 |
you don't need to be from a particularly intelligent species 00:56:12.620 |
to know that if you spill water from a container, 00:56:31.260 |
That's what self-supervised learning is all about. 00:56:34.060 |
- If you were to try to construct a benchmark for, 00:56:42.300 |
Do you think it's useful, interesting/possible 00:57:04.300 |
is train on some gigantic dataset of labeled digit, 00:57:10.540 |
We do this at Facebook, like in production, right? 00:57:15.860 |
to predict hashtags that people type on Instagram. 00:57:18.180 |
And we train on billions of images, literally billions. 00:57:38.180 |
What kind of transfer learning would be useful and impressive? 00:57:48.020 |
you know, have a kind of scenario for benchmark 00:57:54.500 |
and you can, and it's very large number of unlabeled data. 00:58:02.060 |
It could be where you do, you know, frame prediction. 00:58:04.820 |
It could be images where you could choose to, 00:58:18.780 |
and then you train on a particular supervised task, 00:58:32.100 |
as you increase the number of labeled training samples. 00:58:44.860 |
than if you train from scratch, from random weights. 00:58:47.460 |
So that to reach the same level of performance 00:58:50.500 |
in a completely supervised, purely supervised system 00:58:54.140 |
would reach, you would need way fewer samples. 00:58:57.700 |
because it will answer the question to like, you know, 00:59:02.980 |
Okay, you know, if I want to get to a particular 00:59:12.140 |
Can I do, you know, self-supervised pre-training 00:59:15.340 |
to reduce this to about a hundred or something? 00:59:25.020 |
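A minimal sketch of that evaluation protocol — pre-train a representation on unlabeled data, then fine-tune on a growing number of labeled samples and compare against training from scratch — using scikit-learn on synthetic data, with PCA as a crude stand-in for the self-supervised stage (all names and numbers here are placeholders, not a proposed benchmark):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: a large "unlabeled" pool plus a held-out test set.
X, y = make_classification(n_samples=20000, n_features=50, n_informative=10,
                           random_state=0)
X_pool, y_pool = X[:15000], y[:15000]   # labels exist but are mostly left unused
X_test, y_test = X[15000:], y[15000:]

# "Pre-training" on the unlabeled pool (labels ignored at this stage).
encoder = PCA(n_components=10).fit(X_pool)

# Fine-tune on a growing number of labeled samples, with and without pre-training.
for n_labels in [50, 200, 1000, 5000]:
    Xs, ys = X_pool[:n_labels], y_pool[:n_labels]
    pretrained = LogisticRegression(max_iter=1000).fit(encoder.transform(Xs), ys)
    scratch = LogisticRegression(max_iter=1000).fit(Xs, ys)
    print(n_labels, "labels |",
          "with pre-training:", round(pretrained.score(encoder.transform(X_test), y_test), 3),
          "| from scratch:", round(scratch.score(X_test, y_test), 3))
```

The quantity of interest is the curve of test performance versus number of labeled samples, and how far pre-training shifts it to the left.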
- Telling you, active learning, but you disagree. 00:59:32.460 |
It's just gonna make things that we already do. 00:59:40.900 |
So I worked with a lot of large scale datasets 00:59:54.300 |
It's, you know, working with the data you have. 00:59:56.140 |
I mean, certainly people are doing things like, 01:00:09.420 |
And with just that, I would probably reach the same. 01:00:12.420 |
So it's a weak form of active learning, if you want. 01:00:16.340 |
- Yes, but there might be a much stronger version. 01:00:20.980 |
- That's what, and that's an open question if it exists. 01:00:23.940 |
The question is how much stronger it can get. 01:00:32.140 |
and deep learning can solve the autonomous driving problem. 01:00:38.300 |
possibilities of deep learning in this space? 01:00:42.980 |
I mean, I don't think we'll ever have a self-driving system, 01:01:04.220 |
and that was the case for autonomous driving, 01:01:11.300 |
but there's a lot of engineering that's involved 01:01:13.700 |
in kind of, you know, taking care of corner cases 01:01:29.100 |
now computer vision, natural language processing. 01:01:43.820 |
decent level of autonomy, where you don't expect 01:01:52.580 |
100 square kilometers or square miles in Phoenix, 01:01:55.340 |
but the weather is nice and the roads are wide, 01:02:03.260 |
with tons of lidars and sophisticated sensors 01:02:11.300 |
And you engineer the hell out of everything else, 01:02:17.940 |
so you have a complete 3D model of everything. 01:02:37.500 |
but I think eventually the long-term solution 01:02:45.020 |
of self-supervised learning and model-based reinforcement 01:02:50.860 |
- But ultimately learning will be not just at the core, 01:02:54.780 |
but really the fundamental part of the system. 01:02:57.180 |
- Yeah, it already is, but it'll become more and more. 01:03:00.340 |
- What do you think it takes to build a system 01:03:04.060 |
You talked about the AI system in the movie "Her" 01:03:22.860 |
but I don't know how many obstacles there are after this. 01:03:26.620 |
there is a bunch of mountains that we have to climb 01:03:38.380 |
have been overly optimistic about the result of AI. 01:03:54.540 |
is that all the problems you want to solve are exponential. 01:03:56.340 |
And so you can't actually use it for anything useful. 01:04:00.060 |
- Yeah, so yeah, all you see is the first peak. 01:04:02.260 |
So what are the first couple of peaks for "Her"? 01:04:09.780 |
How do we get machines to learn models of the world 01:04:15.820 |
So we've been working with cognitive scientists. 01:04:22.260 |
So this Emmanuel Dupoux, who is at FAIR in Paris 01:04:26.620 |
half-time, and is also a researcher at a French university. 01:04:42.700 |
And you can measure this in sort of various ways. 01:04:45.660 |
So things like distinguishing animate objects 01:04:52.940 |
you can tell the difference at age two, three months. 01:05:02.900 |
You know, there are various things like this. 01:05:06.460 |
the fact that objects are not supposed to float in the air, 01:05:10.060 |
you learn this around the age of eight or nine months. 01:05:12.580 |
If you look at a lot of, you know, eight-month-old babies, 01:05:15.340 |
you give them a bunch of toys on their high chair. 01:05:18.540 |
First thing they do is they throw them on the ground 01:05:29.700 |
but they, you know, they need to do the experiment, right? 01:05:32.660 |
So, you know, how do we get machines to learn like babies? 01:05:36.580 |
Mostly by observation with a little bit of interaction 01:05:41.220 |
because I think that's really a crucial piece 01:05:49.500 |
it needs to have a predictive model of the world. 01:05:51.340 |
So something that says, here is a world at time T, 01:05:54.060 |
here is a state of the world at time T plus one 01:06:01.260 |
- Yeah, well, but we don't know how to represent 01:06:04.820 |
So it's gotta be something weaker than that, okay? 01:06:23.260 |
is some sort of objective that you want to optimize. 01:06:26.020 |
Am I reaching the goal of grabbing this object? 01:06:48.780 |
computes your level of contentment or miscontentment. 01:07:14.860 |
of what your basal ganglia is going to tell you. 01:07:23.740 |
And you're predicting this because of your model of the world 01:07:26.100 |
and your sort of predictor of this objective, right? 01:07:35.140 |
you have the hardwired contentment objective computer, 01:07:46.700 |
which basically predicts your level of contentment. 01:08:04.420 |
- Call this a policy network or something like that, right? 01:08:13.940 |
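A minimal sketch of that three-part arrangement — a predictive world model, a hardwired objective that scores predicted states, and action selection by rolling the model forward — on a toy one-dimensional world (all specifics are invented for illustration; a real system would learn the model and amortize the search into a policy network):

```python
def world_model(state, action):
    # Learned from experience in a real system; here a known toy dynamics.
    return state + action

def discontentment(state):
    # Hardwired objective ("basal ganglia"): squared distance from a goal at 3.0.
    return (state - 3.0) ** 2

def predicted_cost(state, plan):
    # Roll the world model forward and sum the predicted discontentment.
    total = 0.0
    for action in plan:
        state = world_model(state, action)
        total += discontentment(state)
    return total

state = 0.0
actions = [-1.0, 0.0, 1.0]
# Brute-force the best 3-step plan; a policy network would amortize this search.
best_plan = min(
    ((a1, a2, a3) for a1 in actions for a2 in actions for a3 in actions),
    key=lambda plan: predicted_cost(state, plan),
)
print("chosen plan:", best_plan)   # drives the state toward the goal at 3.0
```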
And you can be stupid in three different ways. 01:08:16.100 |
You can be stupid because your model of the world is wrong. 01:08:19.380 |
You can be stupid because your objective is not aligned 01:08:36.340 |
but you're unable to figure out a course of action 01:08:42.380 |
Some people who are in charge of big countries 01:08:57.980 |
you've criticized the art project that is Sophia the Robot. 01:09:07.540 |
is that it uses our natural inclination to anthropomorphize 01:09:11.740 |
things that look like humans and give them more. 01:09:14.780 |
Do you think that could be used by AI systems 01:09:38.500 |
about their marketing or behavior in general, 01:09:45.660 |
- I mean, don't you think, here's a tough question. 01:09:55.980 |
feels that Sophia can do way more than she actually can. 01:10:22.100 |
are taking advantage of the same misunderstanding 01:10:47.940 |
I mean, the reviewers are generally not very forgiving 01:10:57.180 |
And, but there are certainly quite a few startups 01:10:59.660 |
that have had a huge amount of hype around this 01:11:05.500 |
and I've been calling it out when I've seen it. 01:11:08.020 |
So yeah, but to go back to your original question, 01:11:13.020 |
I think, I don't think embodiment is necessary. 01:11:20.460 |
without some level of grounding in the real world. 01:11:30.300 |
- Can you talk to ground, what grounding means? 01:11:34.020 |
so there is this classic problem of common sense reasoning, 01:11:41.020 |
And so I tell you the trophy doesn't fit in a suitcase 01:11:49.180 |
And the it in the first case refers to the trophy 01:11:55.180 |
is because you know what the trophy and the suitcase are, 01:11:57.020 |
you know, one is supposed to fit in the other one, 01:12:00.620 |
and the big object doesn't fit in a small object 01:12:03.020 |
unless it's a TARDIS, you know, things like that, right? 01:12:05.300 |
So you have this knowledge of how the world works, 01:12:10.660 |
I don't believe you can learn everything about the world 01:12:14.700 |
by just being told in language how the world works. 01:12:18.020 |
I think you need some low-level perception of the world, 01:12:21.740 |
you know, be it visual touch, you know, whatever, 01:12:23.740 |
but some higher bandwidth perception of the world. 01:12:32.540 |
There's a lot of things that just will never appear in text 01:12:37.020 |
So I think common sense will emerge from, you know, 01:12:45.660 |
or perhaps even interacting in virtual environments 01:12:48.900 |
and possibly, you know, robot interacting in the real world. 01:12:55.980 |
but I think there's a need for some grounding. 01:13:04.860 |
It just needs to have an awareness, a grounding. 01:13:07.700 |
- Right, but it needs to know how the world works 01:13:10.140 |
to have, you know, to not be frustrating to talk to. 01:13:14.420 |
- And you talked about emotions being important. 01:13:29.340 |
the thing that calculates your level of miscontentment, 01:13:46.420 |
You have this inkling that there is some chance 01:13:49.260 |
that something really bad is gonna happen to you, 01:13:53.700 |
is gonna happen to you, you kind of give up, right? 01:14:08.860 |
So you mentioned very practical things of fear, 01:14:13.420 |
- But they are kind of the results of, you know, drives. 01:14:16.340 |
- Yeah, there's deeper biological stuff going on, 01:14:38.500 |
- You know, I think the first one we'll create 01:14:53.620 |
- Well, what's a good question to ask, you know, 01:14:56.900 |
to be impressed? - What is the cause of wind? 01:14:58.940 |
And if she answers, oh, it's because the leaves 01:15:03.900 |
of the tree are moving and that creates wind, 01:15:12.620 |
- No, and then you tell her, actually, you know, 01:15:24.500 |
to do common sense reasoning about the physical world. 01:15:26.980 |
- Yeah, and you'll sum it up with a causal inference.