Yoshua Bengio: Deep Learning | Lex Fridman Podcast #4
Chapters
0:00 Introduction
3:42 Current state of deep learning
6:44 Architecture vs dataset
8:01 Learning through interaction
10:46 Our brain is big
12:40 Knowledge
24:01 Ex Machina
25:28 Bottle Ideas
27:54 Bias in Machine Learning
31:29 Teaching Machines
33:58 The Turing Test
37:48 What's next
40:20 GANs
- What difference between biological neural networks 00:00:02.760 |
and artificial neural networks is most mysterious, 00:00:07.900 |
- First of all, there's so much we don't know 00:00:29.380 |
something that we don't know how biological neural networks 00:00:32.680 |
do but would be really useful for artificial ones 00:00:52.680 |
And this mismatch, I think, this kind of mismatch 00:01:00.020 |
to A, understand better how brains might do these things 00:01:04.340 |
because we don't have good corresponding theories 00:01:08.260 |
and B, maybe provide new ideas that we could explore 00:01:18.940 |
and that we could incorporate in artificial neural nets. 00:01:22.360 |
- So let's break credit assignment up a little bit. 00:01:38.480 |
building up common sense knowledge over time, 00:01:41.200 |
or is it more in the reinforcement learning sense 00:01:47.640 |
for a particular, to achieve a certain kind of goal? 00:01:50.200 |
- So I was thinking more about the first two meanings 00:02:12.860 |
and assign credit to decisions or interpretations 00:02:24.500 |
And then we can change the way we would have reacted 00:02:31.860 |
And now that's credit assignment used for learning. 00:02:35.120 |
- So in which way do you think artificial neural networks, 00:02:44.620 |
the current architectures are not able to capture 00:02:49.860 |
presumably you're thinking of very long-term? 00:02:53.860 |
So the current nets are doing a fairly good job 00:02:58.420 |
for sequences with dozens or, say, hundreds of time steps. 00:03:06.060 |
and depending on what you have to remember and so on, 00:03:10.940 |
Whereas humans seem to be able to do credit assignment 00:03:17.700 |
Like I could remember something I did last year 00:03:20.580 |
and then now because I see some new evidence, 00:03:23.260 |
I'm gonna change my mind about the way I was thinking 00:03:26.860 |
last year and hopefully not do the same mistake again. 00:03:31.100 |
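To ground the point about credit assignment reaching back only dozens or hundreds of time steps, here is a small sketch of truncated backpropagation through time in PyTorch; the network sizes, the 50-step truncation window, and the random data are illustrative assumptions, not anything from the conversation.

```python
import torch
import torch.nn as nn

# Sketch of why long-horizon credit assignment is hard in practice:
# gradients in a recurrent net are usually propagated back only over a
# truncated window of k steps, so decisions made much earlier in the
# sequence receive no learning signal. All sizes here are made up.
rnn = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
readout = nn.Linear(16, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(readout.parameters()))

x = torch.randn(4, 1000, 8)       # a long sequence: 1000 time steps
y = torch.randn(4, 1000, 1)
k = 50                            # truncation window for backprop through time

h = None
for t0 in range(0, x.size(1), k):
    chunk_x, chunk_y = x[:, t0:t0 + k], y[:, t0:t0 + k]
    out, h = rnn(chunk_x, h)
    loss = nn.functional.mse_loss(readout(out), chunk_y)
    opt.zero_grad()
    loss.backward()               # gradient stops at the chunk boundary...
    opt.step()
    h = h.detach()                # ...because the hidden state is detached here
```

Anything that happened before the truncation window gets no gradient at all, which is one concrete version of the limitation being described.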
- I think a big part of that is probably forgetting. 00:03:36.120 |
You're only remembering the really important things. 00:03:48.020 |
to higher level cognition here regarding consciousness, 00:03:54.300 |
So there's deciding what comes to consciousness 00:04:01.460 |
- So you've been at the forefront there all along 00:04:07.000 |
showing some of the amazing things that neural networks, 00:04:24.020 |
is the weakest aspect of the way deep neural networks 00:04:36.700 |
trained on large quantities of images or texts, 00:04:54.340 |
and it's not nearly as robust and abstract in general 00:05:01.700 |
Okay, so that doesn't tell us how to fix things, 00:05:10.700 |
how we can maybe train our neural nets differently 00:05:27.840 |
Also, one thing I'll talk about in my talk this afternoon 00:05:32.840 |
is instead of learning separately from images and videos 00:05:38.480 |
on one hand and from texts on the other hand, 00:05:41.880 |
we need to do a better job of jointly learning 00:05:46.200 |
about language and about the world to which it refers 00:05:55.720 |
We need to have good world models in our neural nets 00:06:03.600 |
which talk about what's going on in the world, 00:06:10.120 |
to help provide clues about what high level concepts 00:06:24.480 |
that the purely unsupervised learning of representations 00:06:28.920 |
doesn't give rise to high level representations 00:06:33.360 |
that are as powerful as the ones we're getting 00:06:37.420 |
And so the clues we're getting just with the labels, 00:06:41.240 |
not even sentences, is already very powerful. 00:06:44.840 |
- Do you think that's an architecture challenge 00:07:08.760 |
the training objectives, the training frameworks. 00:07:11.480 |
For example, going from passive observation of data 00:07:25.080 |
the relationships between causes and effects, 00:07:34.880 |
the highest level explanations to rise from the learning, 00:07:50.240 |
So these kinds of questions are neither in the data set 00:08:01.600 |
- Yeah, I've heard you mention in several contexts 00:08:08.240 |
and it seems fascinating because in some sense, 00:08:12.200 |
except with some cases in reinforcement learning, 00:08:15.800 |
that idea is not part of the learning process 00:08:30.360 |
"You know what, if you poke this object in this kind of way, 00:08:34.400 |
it would be really helpful for me to further learn." 00:08:39.400 |
Sort of almost guiding some aspect of learning. 00:08:44.120 |
So I was talking to Rebecca Saxe just an hour ago, 00:08:47.600 |
and she was talking about lots and lots of evidence 00:08:50.760 |
from infants seem to clearly pick what interests them 00:09:05.000 |
They focus their attention on aspects of the world 00:09:18.100 |
- So that's a fascinating view of the future progress, 00:09:36.720 |
of the things that have been increasing a lot 00:09:39.300 |
in the past few years will also make significant progress? 00:09:43.600 |
So some of the representational issues that you mentioned, 00:09:52.360 |
- Oh, shallow you mean in the sense of abstraction? 00:09:58.160 |
- I don't think that having more depth in the network 00:10:03.720 |
we have 10,000 is going to solve our problem. 00:10:11.440 |
What is clear to me is that engineers and companies 00:10:17.960 |
to tune architectures and explore all kinds of tweaks 00:10:27.160 |
But I don't think that's gonna be nearly enough. 00:10:35.080 |
to achieve the goal that these learners actually understand 00:10:40.480 |
in a deep way the environment in which they are, 00:10:49.920 |
that's more interesting than just more layers. 00:10:52.800 |
It's basically once you figure out a way to learn 00:10:57.600 |
through interacting, how many parameters does it take 00:11:10.880 |
So I agree that in order to build neural nets 00:11:15.160 |
with the kind of broad knowledge of the world 00:11:21.000 |
probably the kind of computing power we have now 00:11:25.680 |
So, well, the good news is there are hardware companies 00:11:28.680 |
building neural net chips, and so it's gonna get better. 00:11:37.520 |
is that even our state of the art deep learning methods 00:11:53.840 |
I mean, of course, if you train them with enough examples, 00:11:57.280 |
But it's just like, instead of what humans might need, 00:12:10.140 |
And so I think there's an opportunity for academics 00:12:21.080 |
and exciting research to advance the state of the art 00:12:24.400 |
in training frameworks, learning models, agent learning, 00:12:29.320 |
in even simple environments that are synthetic, 00:12:37.600 |
- We talked about priors and common sense knowledge. 00:12:43.400 |
It seems like we humans take a lot of knowledge for granted. 00:12:59.140 |
and how we can teach neural networks or learning systems 00:13:10.960 |
like there's a time where knowledge representation, 00:13:23.320 |
And it was kind of put on hold a little bit, it seems like. 00:13:36.800 |
And how do you think those goals can be addressed? 00:14:08.740 |
And that knowledge is also necessary for machines 00:14:16.940 |
And that knowledge is hard to codify in expert systems, 00:14:21.140 |
rule-based systems, and classical AI formalism. 00:14:27.820 |
like not really good ways of handling uncertainty. 00:14:37.460 |
but I think still isn't enough in the minds of people, 00:14:44.080 |
that comes from distributed representations, 00:14:47.220 |
the thing that really makes neural nets work so well. 00:14:51.180 |
And it's hard to replicate that kind of power 00:15:02.140 |
is nicely decomposed into like a bunch of rules. 00:15:06.680 |
Whereas if you think about a neural net, it's the opposite. 00:15:24.320 |
that we have to take lessons from classical AI 00:15:28.580 |
in order to bring in another kind of compositionality 00:15:47.740 |
- So let me connect with disentangled representations, 00:15:55.500 |
and I still believe that it's really important 00:16:04.880 |
that build representations in which the important factors, 00:16:09.460 |
hopefully causal factors, are nicely separated 00:16:15.120 |
So that's the idea of disentangled representations. 00:16:38.180 |
So let's say we have these disentangled representations. 00:16:48.140 |
I mean, this is like too much of an assumption. 00:16:51.320 |
They're gonna have some interesting relationships 00:16:57.720 |
The kind of knowledge about those relationships 00:17:00.080 |
in a classical AI system is encoded in the rules. 00:17:03.140 |
Like a rule is just like a little piece of knowledge 00:17:05.500 |
that says, oh, I have these two, three, four variables 00:17:11.100 |
Then I can say something about one or two of them 00:17:21.020 |
which are like the variables in a rule-based system, 00:17:28.620 |
the mechanisms that relate those variables to each other. 00:17:38.940 |
And when I change a rule because I'm learning, 00:17:46.820 |
are very sensitive to what's called catastrophic forgetting 00:17:54.220 |
it can destroy the old things that I had learned, right? 00:18:14.780 |
But my idea is that when you project the data 00:18:18.780 |
it becomes possible to now represent this extra knowledge 00:18:23.420 |
beyond the transformation from input to representations, 00:18:26.140 |
which is how representations act on each other 00:18:35.540 |
So now it's the rules that are disentangled from each other 00:18:44.340 |
Like does there need to be an architectural difference? 00:18:58.940 |
And also computation, like it's not just variables, 00:19:06.700 |
But I'm hypothesizing that in the right high-level 00:19:15.200 |
and how they relate to each other can be disentangled 00:19:18.260 |
and that will provide a lot of generalization power. 00:19:24.020 |
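As one concrete, and much narrower, operationalization of the disentangling idea discussed above, here is a toy beta-VAE sketch in PyTorch: upweighting the KL term pressures the latent dimensions toward independence. The architecture, the sizes, and the choice of beta are illustrative assumptions, and this is only one of several approaches to disentangled representations, not the specific method being described.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBetaVAE(nn.Module):
    """Toy beta-VAE: weighting the KL term with beta > 1 pressures each
    latent dimension toward independence, which is one rough way people
    operationalize 'disentangled' factors. Sizes are illustrative."""
    def __init__(self, x_dim=784, z_dim=10, beta=4.0):
        super().__init__()
        self.beta = beta
        self.enc = nn.Linear(x_dim, 2 * z_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z), mu, logvar

    def loss(self, x):
        x_hat, mu, logvar = self(x)
        recon = F.mse_loss(x_hat, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon + self.beta * kl

# usage: loss = TinyBetaVAE().loss(torch.rand(32, 784))
```

Note this only addresses disentangling the factors themselves; how the learned factors act on each other, the "rules" part of the discussion, is a separate question.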
- Distribution of the test set is assumed to be the same 00:19:31.100 |
This is where current machine learning is too weak. 00:19:36.780 |
is not able to tell us anything about how our, 00:19:46.140 |
if we don't know what the new distribution will be. 00:20:12.420 |
where things look very different on the surface, 00:20:16.200 |
but it's still the same laws of physics, right? 00:20:25.020 |
but because you can transport a lot of the knowledge 00:20:39.060 |
you can now make sense of what is going on on this planet 00:20:44.420 |
- Taking that analogy further and distorting it, 00:20:55.620 |
- Or maybe, which is probably one of my favorite AI movies. 00:21:01.860 |
- And then there's another one that a lot of people love 00:21:04.780 |
that maybe a little bit outside of the AI community 00:21:12.620 |
- By the way, what are your views on that movie? 00:21:16.860 |
- So there are things I like and things I hate. 00:21:25.940 |
which is there's quite a large community of people 00:21:29.260 |
from different backgrounds, often outside of AI, 00:21:35.820 |
- You've seen this community develop over time, 00:21:45.420 |
to have discourse about it within AI community 00:21:48.420 |
and outside and grounded in the fact that "Ex Machina" 00:21:58.620 |
There's a big difference between the sort of discussion 00:22:05.220 |
and the sort of discussion that really matter 00:22:09.180 |
So I think the picture of Terminator and AI loose 00:22:21.460 |
isn't really so useful for the public discussion 00:22:28.540 |
the things I believe really matter are the short-term 00:22:32.940 |
and medium-term, very likely negative impacts of AI 00:22:40.700 |
like Big Brother scenarios with face recognition 00:22:43.420 |
or killer robots or the impact on the job market 00:22:46.820 |
or concentration of power and discrimination, 00:22:50.060 |
all kinds of social issues which could actually, 00:22:53.840 |
some of them could really threaten democracy, for example. 00:22:58.900 |
- Just to clarify, when you said killer robots, 00:23:01.180 |
you mean autonomous weapon, like the weapon systems. 00:23:04.180 |
- Yes, I don't mean-- - Not "Turkish Terminator." 00:23:07.340 |
So I think these short and medium-term concerns 00:23:11.260 |
should be important parts of the public debate. 00:23:26.940 |
should we study what could happen if a meteorite 00:23:32.780 |
So I think it's very unlikely that this is gonna happen 00:23:38.420 |
It's very, the sort of scenario of an AI getting loose 00:23:51.940 |
and who knows what AI will be in 50 years from now. 00:23:54.380 |
So I think it is worth that scientists study those problems. 00:23:57.660 |
It's just not a pressing question as far as I'm concerned. 00:24:04.780 |
but what do you like and not like about Ex Machina 00:24:10.340 |
'Cause I actually watched it for the second time 00:24:15.140 |
and I enjoyed it quite a bit more the second time 00:24:18.260 |
when I sort of learned to accept certain pieces of it. 00:24:40.660 |
Science is not happening in some hidden place 00:25:09.660 |
All the scientists who are expert in their field 00:25:24.900 |
from the way science is painted in this movie. 00:25:41.580 |
- Do you think that will always be the case with AI? 00:26:00.840 |
It's not how I can foresee it in the foreseeable future. 00:26:16.220 |
- I think it's ominous that the lights went off 00:26:24.380 |
and you could imagine all kinds of science fiction. 00:26:28.060 |
maybe similar to the question about existential risk, 00:26:31.160 |
is that this kind of movie paints such a wrong picture 00:26:36.160 |
of what is actual, you know, the actual science 00:26:43.620 |
on people's understanding of current science. 00:26:59.340 |
Research is exploration in the space of ideas. 00:27:02.000 |
And different people will focus on different directions. 00:27:08.720 |
So I'm totally fine with people exploring directions 00:27:13.080 |
that are contrary to mine or look orthogonal to mine. 00:27:21.900 |
I and my friends don't claim we have universal truth 00:27:26.220 |
especially about what will happen in the future. 00:27:28.860 |
Now, that being said, we have our intuitions, 00:27:34.220 |
according to where we think we can be most useful 00:27:37.720 |
and where society has the most to gain or to lose. 00:27:43.140 |
and not end up in a society where there's only one voice 00:27:48.140 |
and one way of thinking and research money is spread out. 00:27:55.140 |
- Disagreement is a sign of good research, good science. 00:28:00.060 |
- The idea of bias in the human sense of bias. 00:28:05.660 |
- How do you think about instilling in machine learning 00:28:09.740 |
something that's aligned with human values in terms of bias? 00:28:17.700 |
of what a fundamental respect for other human beings means. 00:28:31.500 |
and then there are long-term things that we need to do. 00:28:43.380 |
to take data sets in which we know there is bias, 00:28:54.580 |
discrimination against particular groups, and so on. 00:29:06.700 |
We can do it, for example, using adversarial methods 00:29:11.540 |
to make our systems less sensitive to these variables 00:29:25.620 |
But I think, in fact, they're sufficiently mature 00:29:28.940 |
that governments should start regulating companies 00:29:32.620 |
where it matters, say, like insurance companies, 00:29:36.740 |
because those techniques will probably reduce the bias, 00:29:43.140 |
For example, maybe their predictions will be less accurate, 00:29:45.860 |
and so companies will not do it until you force them. 00:29:55.180 |
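A minimal sketch of the kind of adversarial technique mentioned above, in the common gradient-reversal formulation: an auxiliary head tries to predict the sensitive variable from the learned features, and the reversed gradient pushes the encoder to discard that information. The module names and sizes are illustrative, and this is one formulation among several, not necessarily the exact method being referred to.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass, flipped gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DebiasedClassifier(nn.Module):
    def __init__(self, n_features=20, n_hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.task_head = nn.Linear(n_hidden, 1)   # main prediction (e.g. a risk score)
        self.adv_head = nn.Linear(n_hidden, 1)    # tries to predict the sensitive variable

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        y_hat = self.task_head(z)
        # The adversary sees gradient-reversed features, so training it well
        # pushes the encoder to strip out information about the sensitive variable.
        a_hat = self.adv_head(GradReverse.apply(z, lambd))
        return y_hat, a_hat
```

The trade-off mentioned in the conversation shows up directly here: the more aggressively the encoder removes the sensitive information, the more task accuracy may drop.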
of how we can instill moral values into computers. 00:29:59.180 |
Obviously, this is not something we'll achieve 00:30:21.420 |
to patterns of, say, injustice, which could trigger anger. 00:30:26.420 |
So these are things we can do in the medium term, 00:30:31.660 |
and eventually train computers to model, for example, 00:30:42.200 |
I would say the simplest thing is unfair situations, 00:30:52.700 |
I think it's quite feasible within the next few years, 00:30:57.540 |
these kinds of things, to the extent, unfortunately, 00:31:00.640 |
that they understand enough about the world around us, 00:31:05.820 |
but maybe we can initially do this in virtual environments. 00:31:19.020 |
I think we could train machines to detect those situations 00:31:29.300 |
- You have shown excitement and done a lot of excellent work 00:31:40.980 |
- And one of the things I'm really passionate about 00:32:14.880 |
I think is something that deserves a lot more attention 00:32:19.420 |
So there are people who've coined the term machine teaching. 00:32:22.700 |
So what are good strategies for teaching a learning agent? 00:32:42.420 |
where there's a learning agent and a teaching agent. 00:32:46.300 |
Presumably the teaching agent would eventually be a human, 00:32:59.260 |
which it can acquire using whatever way, brute force, 00:33:04.740 |
to help the learner learn as quickly as possible. 00:33:09.020 |
So the learner is gonna try to learn by itself, 00:33:17.440 |
can have an influence on the interaction with the learner 00:33:29.060 |
or just at the boundary between what it knows 00:33:32.620 |
So there's a tradition of these kind of ideas 00:33:35.860 |
from other fields, like tutorial systems, for example, 00:33:40.820 |
and AI, and of course, people in the humanities 00:33:46.700 |
but I think it's time that machine learning people 00:33:51.740 |
we'll have more and more human-machine interaction 00:33:56.940 |
and I think understanding how to make this work better-- 00:34:00.540 |
- All the problems around that are very interesting 00:34:04.180 |
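To make the teacher/learner framing concrete, here is a toy machine-teaching loop under a deliberately simple assumption: the concept is a 1-D threshold, the teacher already knows it, and it picks each example to shrink the learner's uncertainty as fast as possible. The binary-search-style strategy and all numbers are illustrative, not something from the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy machine-teaching loop: the teacher knows the target concept (a 1-D
# threshold) and at each step shows the example that most shrinks the
# learner's remaining uncertainty interval.
pool = rng.uniform(0.0, 1.0, size=200)        # candidate teaching examples
true_threshold = 0.6                          # concept known only to the teacher
labels = (pool > true_threshold).astype(int)

lo, hi = 0.0, 1.0                             # learner's current version space
for step in range(8):
    # probe the middle of the learner's interval (binary-search-like teaching)
    target = (lo + hi) / 2.0
    i = int(np.argmin(np.abs(pool - target)))
    x, y = pool[i], labels[i]
    if y == 1:
        hi = min(hi, x)    # threshold must lie below a positive example
    else:
        lo = max(lo, x)    # threshold must lie at or above a negative example
    print(f"step {step}: showed x={x:.3f} (label {y}), "
          f"learner believes threshold is in ({lo:.3f}, {hi:.3f})")
```

A good teacher here needs far fewer examples than random sampling, which is the basic intuition behind studying teaching strategies for learning agents.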
You've done a lot of work with language, too. 00:34:07.540 |
What aspect of the traditionally formulated Turing test, 00:34:12.540 |
a test of natural language understanding and generation, 00:34:20.220 |
is the hardest part of conversation to solve for machines? 00:34:24.580 |
- So I would say it's everything having to do 00:34:36.460 |
so these sentences that are semantically ambiguous. 00:34:39.500 |
In other words, you need to understand enough 00:34:42.300 |
about the world in order to really interpret properly 00:34:49.380 |
for machine learning because they point in the direction 00:34:52.980 |
of building systems that both understand how the world works 00:35:01.420 |
and associate that knowledge with how to express it 00:35:19.780 |
and all the underlying challenges we just mentioned 00:35:40.900 |
to learn from human agents, whatever their language. 00:35:46.260 |
- Well, certainly us humans can talk more beautifully 00:35:58.620 |
to convey complex ideas than it is in English. 00:36:07.580 |
But of course, the goal ultimately is our human brain 00:36:11.980 |
is able to utilize any kind of those languages 00:36:18.420 |
- Yeah, of course there are differences between languages 00:36:20.460 |
and maybe some are slightly better at some things. 00:36:25.020 |
where we're trying to understand how the brain works 00:36:31.180 |
- So you've lived perhaps through an AI winter of sorts. 00:36:39.820 |
- How did you stay warm and continue your research? 00:36:48.500 |
And what have you learned from the experience? 00:36:55.600 |
Don't be trying to just please the crowds and the fashion. 00:37:00.600 |
And if you have a strong intuition about something 00:37:07.960 |
that is not contradicted by actual evidence, go for it. 00:37:17.160 |
- Not your own instinct, based on everything 00:37:23.400 |
when your experiments contradict those beliefs. 00:37:26.640 |
But you have to stick to your beliefs otherwise. 00:37:31.740 |
It's what allowed me to go through those years. 00:37:34.940 |
It's what allowed me to persist in directions 00:37:51.380 |
of course it's marked with technical breakthroughs, 00:37:54.400 |
but it's also marked with these seminal events 00:37:57.420 |
that capture the imagination of the community. 00:38:04.260 |
the world champion human Go player was one of those moments. 00:38:08.780 |
What do you think the next such moment might be? 00:38:20.460 |
As I said, science really moves by small steps. 00:38:25.940 |
Now what happens is you make one more small step 00:38:41.180 |
to do something you were not able to do before. 00:38:46.840 |
or solving a problem becomes cheaper than what existed 00:38:52.820 |
So especially in the world of commerce and applications, 00:38:56.980 |
the impact of a small scientific progress could be huge. 00:39:01.660 |
But in the science itself, I think it's very, very gradual. 00:39:13.140 |
- So if I look at one trend that I like in my community, 00:39:35.100 |
pretty much absent just two or three years ago. 00:39:38.000 |
So there's really a big interest from students 00:39:41.700 |
and there's a big interest from people like me. 00:39:45.180 |
So I would say this is something where we're gonna see 00:39:49.660 |
more progress even though it hasn't yet provided much 00:40:00.420 |
like Google is not making money on this right now. 00:40:04.740 |
this is really, really important for many reasons. 00:40:07.240 |
So in other words, I would say reinforcement learning 00:40:17.060 |
that an agent is learning about its environment. 00:40:19.500 |
- Now, reinforcement learning you're excited about, 00:40:38.740 |
in building agents that can understand the world. 00:40:42.160 |
A lot of the successes in reinforcement learning 00:40:51.100 |
you don't actually learn a model of the world. 00:40:55.560 |
And we don't know how to do model-based RL right now. 00:41:02.020 |
in order to build models that can generalize faster 00:41:11.080 |
at least the underlying causal mechanisms in the world. 00:41:21.080 |
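As a contrast with the model-free setting described above, here is a minimal model-based sketch on a made-up 1-D system: learn a dynamics model from random interaction, then plan by imagining rollouts through it (random-shooting MPC). The toy dynamics, the linear model, and the planner are all illustrative assumptions, not a description of any particular research system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal model-based control on a toy 1-D system: fit a dynamics model from
# random interaction, then plan actions through the learned model.
def true_dynamics(s, a):
    return 0.9 * s + 0.5 * a + 0.01 * rng.normal()

# 1) collect transitions with random actions
S, A, S_next = [], [], []
s = 0.0
for _ in range(500):
    a = rng.uniform(-1.0, 1.0)
    s_next = true_dynamics(s, a)
    S.append(s); A.append(a); S_next.append(s_next)
    s = s_next

# 2) learn a world model: fit s' ~ w_s * s + w_a * a by least squares
X = np.stack([S, A], axis=1)
w, *_ = np.linalg.lstsq(X, np.array(S_next), rcond=None)

# 3) plan through the learned model: pick the action whose imagined rollout
#    keeps the state closest to a goal
def plan(s0, goal=1.0, horizon=5, n_candidates=256):
    best_action, best_cost = 0.0, np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)
        s, cost = s0, 0.0
        for a in actions:
            s = w[0] * s + w[1] * a      # imagined transition, not the real world
            cost += (s - goal) ** 2
        if cost < best_cost:
            best_cost, best_action = cost, actions[0]
    return best_action

print("first planned action from s=0:", plan(0.0))
```

The appeal of this family of approaches is exactly what the conversation points at: once a reasonable model of the environment is learned, new goals can be pursued with far less additional experience.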
If you look back, what was the first moment in your life 00:41:26.080 |
when you were fascinated by either the human mind 00:41:31.320 |
- You know, when I was an adolescent, I was reading a lot. 00:41:42.560 |
And then I had one of the first personal computers 00:41:52.440 |
- Start with fiction and then make it a reality. 00:41:56.040 |
- Yoshua, thank you so much for talking to me.