Yoshua Bengio: Deep Learning | Lex Fridman Podcast #4
Chapters
0:00 Introduction
3:42 Current state of deep learning
6:44 Architecture vs dataset
8:01 Learning through interaction
10:46 Our brain is big
12:40 Knowledge
24:01 Ex Machina
25:28 Bottle Ideas
27:54 Bias in Machine Learning
31:29 Teaching Machines
33:58 The Turing Test
37:48 What's next
40:20 GANs
- What difference between biological neural networks 00:00:02.760 |
and artificial neural networks is most mysterious, 00:00:07.900 |
- First of all, there's so much we don't know 00:00:29.380 |
something that we don't know how biological neural networks 00:00:32.680 |
do but would be really useful for artificial ones 00:00:52.680 |
And this mismatch, I think, this kind of mismatch 00:01:00.020 |
to A, understand better how brains might do these things 00:01:04.340 |
because we don't have good corresponding theories 00:01:08.260 |
and B, maybe provide new ideas that we could explore 00:01:18.940 |
and that we could incorporate in artificial neural nets. 00:01:22.360 |
- So let's break credit assignment up a little bit. 00:01:38.480 |
building up common sense knowledge over time, 00:01:41.200 |
or is it more in the reinforcement learning sense 00:01:47.640 |
for a particular, to achieve a certain kind of goal? 00:01:50.200 |
- So I was thinking more about the first two meanings 00:02:12.860 |
and assign credit to decisions or interpretations 00:02:24.500 |
And then we can change the way we would have reacted 00:02:31.860 |
And now that's credit assignment used for learning. 00:02:35.120 |
- So in which way do you think artificial neural networks, 00:02:44.620 |
the current architectures are not able to capture 00:02:49.860 |
presumably you're thinking of very long-term? 00:02:53.860 |
So the current nets are doing a fairly good job 00:02:58.420 |
for sequences with dozens or, say, hundreds of time steps. 00:03:06.060 |
and depending on what you have to remember and so on, 00:03:10.940 |
Whereas humans seem to be able to do credit assignment 00:03:17.700 |
Like I could remember something I did last year 00:03:20.580 |
and then now because I see some new evidence, 00:03:23.260 |
I'm gonna change my mind about the way I was thinking 00:03:26.860 |
last year and hopefully not do the same mistake again. 00:03:31.100 |
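To ground the point about credit assignment reaching back only dozens or hundreds of time steps, here is a small sketch of truncated backpropagation through time in PyTorch; the network sizes, the 50-step truncation window, and the random data are illustrative assumptions, not anything from the conversation.

```python
import torch
import torch.nn as nn

# Sketch of why long-horizon credit assignment is hard in practice:
# gradients in a recurrent net are usually propagated back only over a
# truncated window of k steps, so decisions made much earlier in the
# sequence receive no learning signal. All sizes here are made up.
rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()))

x = torch.randn(4, 1000, 8)       # a long sequence: 1000 time steps
y = torch.randn(4, 1000, 1)
k = 50                            # truncation window for backprop through time

h = None
for t0 in range(0, x.size(1), k):
    chunk_x, chunk_y = x[:, t0:t0 + k], y[:, t0:t0 + k]
    out, h = rnn(chunk_x, h)
    loss = nn.functional.mse_loss(readout(out), chunk_y)
    opt.zero_grad()
    loss.backward()               # gradient stops at the chunk boundary...
    opt.step()
    h = h.detach()                # ...because the hidden state is detached here
```

Anything that happened before the truncation window gets no gradient at all, which is one concrete version of the limitation being described.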
- I think a big part of that is probably forgetting. 00:03:36.120 |
You're only remembering the really important things. 00:03:48.020 |
to higher level cognition here regarding consciousness, 00:03:54.300 |
So there's deciding what comes to consciousness 00:04:01.460 |
- So you've been at the forefront there all along 00:04:07.000 |
showing some of the amazing things that neural networks, 00:04:24.020 |
is the weakest aspect of the way deep neural networks 00:04:36.700 |
trained on large quantities of images or texts, 00:04:54.340 |
and it's not nearly as robust and abstract in general 00:05:01.700 |
Okay, so that doesn't tell us how to fix things, 00:05:10.700 |
how we can maybe train our neural nets differently 00:05:27.840 |
Also, one thing I'll talk about in my talk this afternoon 00:05:32.840 |
is instead of learning separately from images and videos 00:05:38.480 |
on one hand and from texts on the other hand, 00:05:41.880 |
we need to do a better job of jointly learning 00:05:46.200 |
about language and about the world to which it refers 00:05:55.720 |
We need to have good world models in our neural nets 00:06:03.600 |
which talk about what's going on in the world, 00:06:10.120 |
to help provide clues about what high level concepts 00:06:24.480 |
that the purely unsupervised learning of representations 00:06:28.920 |
doesn't give rise to high level representations 00:06:33.360 |
that are as powerful as the ones we're getting 00:06:37.420 |
And so the clues we're getting just with the labels, 00:06:41.240 |
not even sentences, is already very powerful. 00:06:44.840 |
- Do you think that's an architecture challenge 00:07:08.760 |
the training objectives, the training frameworks. 00:07:11.480 |
For example, going from passive observation of data 00:07:25.080 |
the relationships between causes and effects, 00:07:34.880 |
the highest level explanations to rise from the learning, 00:07:50.240 |
So these kinds of questions are neither in the data set 00:08:01.600 |
- Yeah, I've heard you mention in several contexts 00:08:08.240 |
and it seems fascinating because in some sense, 00:08:12.200 |
except with some cases in reinforcement learning, 00:08:15.800 |
that idea is not part of the learning process 00:08:30.360 |
"You know what, if you poke this object in this kind of way, 00:08:34.400 |
it would be really helpful for me to further learn." 00:08:39.400 |
Sort of almost guiding some aspect of learning. 00:08:44.120 |
So I was talking to Rebecca Saxe just an hour ago, 00:08:47.600 |
and she was talking about lots and lots of evidence 00:08:50.760 |
from infants seem to clearly pick what interests them 00:09:05.000 |
They focus their attention on aspects of the world 00:09:18.100 |
- So that's a fascinating view of the future progress, 00:09:36.720 |
of the things that have been increasing a lot 00:09:39.300 |
in the past few years will also make significant progress? 00:09:43.600 |
So some of the representational issues that you mentioned, 00:09:52.360 |
- Oh, shallow you mean in the sense of abstraction? 00:09:58.160 |
- I don't think that having more depth in the network 00:10:03.720 |
we have 10,000 is going to solve our problem. 00:10:11.440 |
What is clear to me is that engineers and companies 00:10:17.960 |
to tune architectures and explore all kinds of tweaks 00:10:27.160 |
But I don't think that's gonna be nearly enough. 00:10:35.080 |
to achieve the goal that these learners actually understand 00:10:40.480 |
in a deep way the environment in which they are, 00:10:49.920 |
that's more interesting than just more layers. 00:10:52.800 |
It's basically once you figure out a way to learn 00:10:57.600 |
through interacting, how many parameters does it take 00:11:10.880 |
So I agree that in order to build neural nets 00:11:15.160 |
with the kind of broad knowledge of the world 00:11:21.000 |
probably the kind of computing power we have now 00:11:25.680 |
So, well, the good news is there are hardware companies 00:11:28.680 |
building neural net chips, and so it's gonna get better. 00:11:37.520 |
is that even our state of the art deep learning methods 00:11:53.840 |
I mean, of course, if you train them with enough examples, 00:11:57.280 |
But it's just like, instead of what humans might need, 00:12:10.140 |
And so I think there's an opportunity for academics 00:12:21.080 |
and exciting research to advance the state of the art 00:12:24.400 |
in training frameworks, learning models, agent learning, 00:12:29.320 |
in even simple environments that are synthetic, 00:12:37.600 |
- We talked about priors and common sense knowledge. 00:12:43.400 |
It seems like we humans take a lot of knowledge for granted. 00:12:59.140 |
and how we can teach neural networks or learning systems 00:13:10.960 |
like there's a time where knowledge representation, 00:13:23.320 |
And it was kind of put on hold a little bit, it seems like. 00:13:36.800 |
And how do you think those goals can be addressed? 00:14:08.740 |
And that knowledge is also necessary for machines 00:14:16.940 |
And that knowledge is hard to codify in expert systems, 00:14:21.140 |
rule-based systems, and classical AI formalism. 00:14:27.820 |
like not really good ways of handling uncertainty. 00:14:37.460 |
but I think still isn't enough in the minds of people, 00:14:44.080 |
that comes from distributed representations, 00:14:47.220 |
the thing that really makes neural nets work so well. 00:14:51.180 |
And it's hard to replicate that kind of power 00:15:02.140 |
is nicely decomposed into like a bunch of rules. 00:15:06.680 |
Whereas if you think about a neural net, it's the opposite. 00:15:24.320 |
that we have to take lessons from classical AI 00:15:28.580 |
in order to bring in another kind of compositionality 00:15:47.740 |
- So let me connect with disentangled representations, 00:15:55.500 |
and I still believe that it's really important 00:16:04.880 |
that build representations in which the important factors, 00:16:09.460 |
hopefully causal factors, are nicely separated 00:16:15.120 |
So that's the idea of disentangled representations. 00:16:38.180 |
So let's say we have these disentangled representations. 00:16:48.140 |
I mean, this is like too much of an assumption. 00:16:51.320 |
They're gonna have some interesting relationships 00:16:57.720 |
The kind of knowledge about those relationships 00:17:00.080 |
in a classical AI system is encoded in the rules. 00:17:03.140 |
Like a rule is just like a little piece of knowledge 00:17:05.500 |
that says, oh, I have these two, three, four variables 00:17:11.100 |
Then I can say something about one or two of them 00:17:21.020 |
which are like the variables in a rule-based system, 00:17:28.620 |
the mechanisms that relate those variables to each other. 00:17:38.940 |
And when I change a rule because I'm learning, 00:17:46.820 |
are very sensitive to what's called catastrophic forgetting 00:17:54.220 |
it can destroy the old things that I had learned, right? 00:18:14.780 |
But my idea is that when you project the data 00:18:18.780 |
it becomes possible to now represent this extra knowledge 00:18:23.420 |
beyond the transformation from input to representations, 00:18:26.140 |
which is how representations act on each other 00:18:35.540 |
So now it's the rules that are disentangled from each other 00:18:44.340 |
Like does there need to be an architectural difference? 00:18:58.940 |
And also computation, like it's not just variables, 00:19:06.700 |
But I'm hypothesizing that in the right high-level 00:19:15.200 |
and how they relate to each other can be disentangled 00:19:18.260 |
and that will provide a lot of generalization power. 00:19:24.020 |
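As one concrete, and much narrower, operationalization of the disentangling idea discussed above, here is a toy beta-VAE sketch in PyTorch: upweighting the KL term pressures the latent dimensions toward independence. The architecture, the sizes, and the choice of beta are illustrative assumptions, and this is only one of several approaches to disentangled representations, not the specific method being described.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBetaVAE(nn.Module):
    """Toy beta-VAE: weighting the KL term with beta > 1 pressures each
    latent dimension toward independence, which is one rough way people
    operationalize 'disentangled' factors. Sizes are illustrative."""
    def __init__(self, x_dim=784, z_dim=10, beta=4.0):
        super().__init__()
        self.beta = beta
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z), mu, logvar

    def loss(self, x):
        x_hat, mu, logvar = self(x)
        recon = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + self.beta * kl

# usage: loss = TinyBetaVAE().loss(torch.rand(32, 784))
```

Note this only addresses disentangling the factors themselves; how the learned factors act on each other, the "rules" part of the discussion, is a separate question.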
- Distribution of the test set is assumed to be the same 00:19:31.100 |
This is where current machine learning is too weak. 00:19:36.780 |
is not able to tell us anything about how our, 00:19:46.140 |
if we don't know what the new distribution will be. 00:20:12.420 |
where things look very different on the surface, 00:20:16.200 |
but it's still the same laws of physics, right? 00:20:25.020 |
but because you can transport a lot of the knowledge 00:20:39.060 |
you can now make sense of what is going on on this planet 00:20:44.420 |
- Taking that analogy further and distorting it, 00:20:55.620 |
- Or maybe, which is probably one of my favorite AI movies. 00:21:01.860 |
- And then there's another one that a lot of people love 00:21:04.780 |
that maybe a little bit outside of the AI community 00:21:12.620 |
- By the way, what are your views on that movie? 00:21:16.860 |
- So there are things I like and things I hate. 00:21:25.940 |
which is there's quite a large community of people 00:21:29.260 |
from different backgrounds, often outside of AI, 00:21:35.820 |
- You've seen this community develop over time, 00:21:45.420 |
to have discourse about it within AI community 00:21:48.420 |
and outside and grounded in the fact that "Ex Machina" 00:21:58.620 |
There's a big difference between the sort of discussion 00:22:05.220 |
and the sort of discussion that really matter 00:22:09.180 |
So I think the picture of Terminator and AI loose 00:22:21.460 |
isn't really so useful for the public discussion 00:22:28.540 |
the things I believe really matter are the short-term 00:22:32.940 |
and medium-term, very likely negative impacts of AI 00:22:40.700 |
like Big Brother scenarios with face recognition 00:22:43.420 |
or killer robots or the impact on the job market 00:22:46.820 |
or concentration of power and discrimination, 00:22:50.060 |
all kinds of social issues which could actually, 00:22:53.840 |
some of them could really threaten democracy, for example. 00:22:58.900 |
- Just to clarify, when you said killer robots, 00:23:01.180 |
you mean autonomous weapon, like the weapon systems. 00:23:04.180 |
- Yes, I don't mean-- - Not "Turkish Terminator." 00:23:07.340 |
So I think these short and medium-term concerns 00:23:11.260 |
should be important parts of the public debate. 00:23:26.940 |
should we study what could happen if a meteorite 00:23:32.780 |
So I think it's very unlikely that this is gonna happen 00:23:38.420 |
It's very, the sort of scenario of an AI getting loose 00:23:51.940 |
and who knows what AI will be in 50 years from now. 00:23:54.380 |
So I think it is worth that scientists study those problems. 00:23:57.660 |
It's just not a pressing question as far as I'm concerned. 00:24:04.780 |
but what do you like and not like about Ex Machina 00:24:10.340 |
'Cause I actually watched it for the second time 00:24:15.140 |
and I enjoyed it quite a bit more the second time 00:24:18.260 |
when I sort of learned to accept certain pieces of it. 00:24:40.660 |
Science is not happening in some hidden place 00:25:09.660 |
All the scientists who are expert in their field 00:25:24.900 |
from the way science is painted in this movie. 00:25:41.580 |
- Do you think that will always be the case with AI? 00:26:00.840 |
It's not how I can foresee it in the foreseeable future. 00:26:16.220 |
- I think it's ominous that the lights went off 00:26:24.380 |
and you could imagine all kinds of science fiction. 00:26:28.060 |
maybe similar to the question about existential risk, 00:26:31.160 |
is that this kind of movie paints such a wrong picture 00:26:36.160 |
of what is actual, you know, the actual science 00:26:43.620 |
on people's understanding of current science. 00:26:59.340 |
Research is exploration in the space of ideas. 00:27:02.000 |
And different people will focus on different directions. 00:27:08.720 |
So I'm totally fine with people exploring directions 00:27:13.080 |
that are contrary to mine or look orthogonal to mine. 00:27:21.900 |
I and my friends don't claim we have universal truth 00:27:26.220 |
especially about what will happen in the future. 00:27:28.860 |
Now, that being said, we have our intuitions, 00:27:34.220 |
according to where we think we can be most useful 00:27:37.720 |
and where society has the most to gain or to lose. 00:27:43.140 |
and not end up in a society where there's only one voice 00:27:48.140 |
and one way of thinking and research money is spread out. 00:27:55.140 |
- Disagreement is a sign of good research, good science. 00:28:00.060 |
- The idea of bias in the human sense of bias. 00:28:05.660 |
- How do you think about instilling in machine learning 00:28:09.740 |
something that's aligned with human values in terms of bias? 00:28:17.700 |
of what a fundamental respect for other human beings means. 00:28:31.500 |
and then there are long-term things that we need to do. 00:28:43.380 |
to take data sets in which we know there is bias, 00:28:54.580 |
discrimination against particular groups, and so on. 00:29:06.700 |
We can do it, for example, using adversarial methods 00:29:11.540 |
to make our systems less sensitive to these variables 00:29:25.620 |
But I think, in fact, they're sufficiently mature 00:29:28.940 |
that governments should start regulating companies 00:29:32.620 |
where it matters, say, like insurance companies, 00:29:36.740 |
because those techniques will probably reduce the bias, 00:29:43.140 |
For example, maybe their predictions will be less accurate, 00:29:45.860 |
and so companies will not do it until you force them. 00:29:55.180 |
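A minimal sketch of the kind of adversarial technique mentioned above, in the common gradient-reversal formulation: an auxiliary head tries to predict the sensitive variable from the learned features, and the reversed gradient pushes the encoder to discard that information. The module names and sizes are illustrative, and this is one formulation among several, not necessarily the exact method being referred to.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, flipped gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, n_features=20, n_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.task_head = nn.Linear(n_hidden, 1)   # main prediction (e.g. a risk score)
        self.adv_head = nn.Linear(n_hidden, 1)    # tries to predict the sensitive variable

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        y_hat = self.task_head(z)
        # The adversary sees gradient-reversed features, so training it well
        # pushes the encoder to strip out information about the sensitive variable.
        a_hat = self.adv_head(GradReverse.apply(z, lambd))
        return y_hat, a_hat
```

The trade-off mentioned in the conversation shows up directly here: the more aggressively the encoder removes the sensitive information, the more task accuracy may drop.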
of how we can instill moral values into computers. 00:29:59.180 |
Obviously, this is not something we'll achieve 00:30:21.420 |
to patterns of, say, injustice, which could trigger anger. 00:30:26.420 |
So these are things we can do in the medium term, 00:30:31.660 |
and eventually train computers to model, for example, 00:30:42.200 |
I would say the simplest thing is unfair situations, 00:30:52.700 |
I think it's quite feasible within the next few years, 00:30:57.540 |
these kinds of things, to the extent, unfortunately, 00:31:00.640 |
that they understand enough about the world around us, 00:31:05.820 |
but maybe we can initially do this in virtual environments. 00:31:19.020 |
I think we could train machines to detect those situations 00:31:29.300 |
- You have shown excitement and done a lot of excellent work 00:31:40.980 |
- And one of the things I'm really passionate about 00:32:14.880 |
I think is something that deserves a lot more attention 00:32:19.420 |
So there are people who've coined the term machine teaching. 00:32:22.700 |
So what are good strategies for teaching a learning agent? 00:32:42.420 |
where there's a learning agent and a teaching agent. 00:32:46.300 |
Presumably the teaching agent would eventually be a human, 00:32:59.260 |
which it can acquire using whatever way, brute force, 00:33:04.740 |
to help the learner learn as quickly as possible. 00:33:09.020 |
So the learner is gonna try to learn by itself, 00:33:17.440 |
can have an influence on the interaction with the learner 00:33:29.060 |
or just at the boundary between what it knows 00:33:32.620 |
So there's a tradition of these kind of ideas 00:33:35.860 |
from other fields, like tutorial systems, for example, 00:33:40.820 |
and AI, and of course, people in the humanities 00:33:46.700 |
but I think it's time that machine learning people 00:33:51.740 |
we'll have more and more human-machine interaction 00:33:56.940 |
and I think understanding how to make this work better-- 00:34:00.540 |
- All the problems around that are very interesting 00:34:04.180 |
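To make the teacher/learner framing concrete, here is a toy machine-teaching loop under a deliberately simple assumption: the concept is a 1-D threshold, the teacher already knows it, and it picks each example to shrink the learner's uncertainty as fast as possible. The binary-search-style strategy and all numbers are illustrative, not something from the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy machine-teaching loop: the teacher knows the target concept (a 1-D
# threshold) and at each step shows the example that most shrinks the
# learner's remaining uncertainty interval.
pool = rng.uniform(0.0, 1.0, size=200)        # candidate teaching examples
true_threshold = 0.6                          # concept known only to the teacher
labels = (pool > true_threshold).astype(int)

lo, hi = 0.0, 1.0                             # learner's current version space
for step in range(8):
    # probe the middle of the learner's interval (binary-search-like teaching)
    target = (lo + hi) / 2.0
    i = int(np.argmin(np.abs(pool - target)))
    x, y = pool[i], labels[i]
    if y == 1:
        hi = min(hi, x)    # threshold must lie below a positive example
    else:
        lo = max(lo, x)    # threshold must lie at or above a negative example
    print(f"step {step}: showed x={x:.3f} (label {y}), "
          f"learner believes threshold is in ({lo:.3f}, {hi:.3f})")
```

A good teacher here needs far fewer examples than random sampling, which is the basic intuition behind studying teaching strategies for learning agents.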
You've done a lot of work with language, too. 00:34:07.540 |
What aspect of the traditionally formulated Turing test, 00:34:12.540 |
a test of natural language understanding and generation, 00:34:20.220 |
is the hardest part of conversation to solve for machines? 00:34:24.580 |
- So I would say it's everything having to do 00:34:36.460 |
so these sentences that are semantically ambiguous. 00:34:39.500 |
In other words, you need to understand enough 00:34:42.300 |
about the world in order to really interpret properly 00:34:49.380 |
for machine learning because they point in the direction 00:34:52.980 |
of building systems that both understand how the world works 00:35:01.420 |
and associate that knowledge with how to express it 00:35:19.780 |
and all the underlying challenges we just mentioned 00:35:40.900 |
to learn from human agents, whatever their language. 00:35:46.260 |
- Well, certainly us humans can talk more beautifully 00:35:58.620 |
to convey complex ideas than it is in English. 00:36:07.580 |
But of course, the goal ultimately is our human brain 00:36:11.980 |
is able to utilize any kind of those languages 00:36:18.420 |
- Yeah, of course there are differences between languages 00:36:20.460 |
and maybe some are slightly better at some things. 00:36:25.020 |
where we're trying to understand how the brain works 00:36:31.180 |
- So you've lived perhaps through an AI winter of sorts. 00:36:39.820 |
- How did you stay warm and continue your research? 00:36:48.500 |
And what have you learned from the experience? 00:36:55.600 |
Don't be trying to just please the crowds and the fashion. 00:37:00.600 |
And if you have a strong intuition about something 00:37:07.960 |
that is not contradicted by actual evidence, go for it. 00:37:17.160 |
- Not your own instinct, based on everything 00:37:23.400 |
when your experiments contradict those beliefs. 00:37:26.640 |
But you have to stick to your beliefs otherwise. 00:37:31.740 |
It's what allowed me to go through those years. 00:37:34.940 |
It's what allowed me to persist in directions 00:37:51.380 |
of course it's marked with technical breakthroughs, 00:37:54.400 |
but it's also marked with these seminal events 00:37:57.420 |
that capture the imagination of the community. 00:38:04.260 |
the world champion human Go player was one of those moments. 00:38:08.780 |
What do you think the next such moment might be? 00:38:20.460 |
As I said, science really moves by small steps. 00:38:25.940 |
Now what happens is you make one more small step 00:38:41.180 |
to do something you were not able to do before. 00:38:46.840 |
or solving a problem becomes cheaper than what existed 00:38:52.820 |
So especially in the world of commerce and applications, 00:38:56.980 |
the impact of a small scientific progress could be huge. 00:39:01.660 |
But in the science itself, I think it's very, very gradual. 00:39:13.140 |
- So if I look at one trend that I like in my community, 00:39:35.100 |
pretty much absent just two or three years ago. 00:39:38.000 |
So there's really a big interest from students 00:39:41.700 |
and there's a big interest from people like me. 00:39:45.180 |
So I would say this is something where we're gonna see 00:39:49.660 |
more progress even though it hasn't yet provided much 00:40:00.420 |
like Google is not making money on this right now. 00:40:04.740 |
this is really, really important for many reasons. 00:40:07.240 |
So in other words, I would say reinforcement learning 00:40:17.060 |
that an agent is learning about its environment. 00:40:19.500 |
- Now, reinforcement learning you're excited about, 00:40:38.740 |
in building agents that can understand the world. 00:40:42.160 |
A lot of the successes in reinforcement learning 00:40:51.100 |
you don't actually learn a model of the world. 00:40:55.560 |
And we don't know how to do model-based RL right now. 00:41:02.020 |
in order to build models that can generalize faster 00:41:11.080 |
at least the underlying causal mechanisms in the world. 00:41:21.080 |
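As a contrast with the model-free setting described above, here is a minimal model-based sketch on a made-up 1-D system: learn a dynamics model from random interaction, then plan by imagining rollouts through it (random-shooting MPC). The toy dynamics, the linear model, and the planner are all illustrative assumptions, not a description of any particular research system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal model-based control on a toy 1-D system: fit a dynamics model from
# random interaction, then plan actions through the learned model.
def true_dynamics(s, a):
    return 0.9 * s + 0.5 * a + 0.01 * rng.normal()

# 1) collect transitions with random actions
S, A, S_next = [], [], []
s = 0.0
for _ in range(500):
    a = rng.uniform(-1.0, 1.0)
    s_next = true_dynamics(s, a)
    S.append(s); A.append(a); S_next.append(s_next)
    s = s_next

# 2) learn a world model: fit s' ~ w_s * s + w_a * a by least squares
X = np.stack([S, A], axis=1)
w, *_ = np.linalg.lstsq(X, np.array(S_next), rcond=None)

# 3) plan through the learned model: pick the action whose imagined rollout
#    keeps the state closest to a goal
def plan(s0, goal=1.0, horizon=5, n_candidates=256):
    best_action, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, cost = s0, 0.0
        for a in actions:
            s = w[0] * s + w[1] * a      # imagined transition, not the real world
            cost += (s - goal) ** 2
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action

print("first planned action from s=0:", plan(0.0))
```

The appeal of this family of approaches is exactly what the conversation points at: once a reasonable model of the environment is learned, new goals can be pursued with far less additional experience.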
If you look back, what was the first moment in your life 00:41:26.080 |
when you were fascinated by either the human mind 00:41:31.320 |
- You know, when I was an adolescent, I was reading a lot. 00:41:42.560 |
And then I had one of the first personal computers 00:41:52.440 |
- Start with fiction and then make it a reality. 00:41:56.040 |
- Yoshua, thank you so much for talking to me.