
Yoshua Bengio: Deep Learning | Lex Fridman Podcast #4


Chapters

0:00 Introduction
3:42 Current state of deep learning
6:44 Architecture vs dataset
8:01 Learning through interaction
10:46 Our brain is big
12:40 Knowledge
24:01 Ex Machina
25:28 Bottling Ideas
27:54 Bias in Machine Learning
31:29 Teaching Machines
33:58 The Turing Test
37:48 What's next
40:20 GANs

Whisper Transcript

00:00:00.000 | - What difference between biological neural networks
00:00:02.760 | and artificial neural networks is most mysterious,
00:00:06.200 | captivating, and profound for you?
00:00:07.900 | - First of all, there's so much we don't know
00:00:13.800 | about biological neural networks,
00:00:15.400 | and that's very mysterious and captivating
00:00:17.580 | because maybe it holds the key
00:00:20.600 | to improving artificial neural networks.
00:00:23.380 | One of the things I studied recently,
00:00:29.380 | something that we don't know how biological neural networks
00:00:32.680 | do but would be really useful for artificial ones
00:00:37.200 | is the ability to do credit assignment
00:00:39.920 | through very long time spans.
00:00:44.160 | There are things that we can in principle do
00:00:48.060 | with artificial neural nets,
00:00:49.160 | but it's not very convenient
00:00:50.320 | and it's not biologically plausible.
00:00:52.680 | And this mismatch, I think, this kind of mismatch
00:00:57.200 | may be an interesting thing to study
00:01:00.020 | to A, understand better how brains might do these things
00:01:04.340 | because we don't have good corresponding theories
00:01:06.920 | with artificial neural nets,
00:01:08.260 | and B, maybe provide new ideas that we could explore
00:01:13.260 | about things that brains do differently
00:01:18.940 | and that we could incorporate in artificial neural nets.
00:01:22.360 | - So let's break credit assignment up a little bit.
00:01:24.700 | - Yes.
00:01:25.540 | - It's a beautifully technical term,
00:01:27.800 | but it could incorporate so many things.
00:01:29.840 | So is it more on the RNN memory side,
00:01:34.840 | thinking like that,
00:01:36.720 | or is it something about knowledge,
00:01:38.480 | building up common sense knowledge over time,
00:01:41.200 | or is it more in the reinforcement learning sense
00:01:45.440 | that you're picking up rewards over time
00:01:47.640 | for a particular, to achieve a certain kind of goal?
00:01:50.200 | - So I was thinking more about the first two meanings
00:01:53.880 | whereby we store all kinds of memories,
00:01:58.880 | episodic memories in our brain,
00:02:02.220 | which we can access later
00:02:06.100 | in order to help us both infer causes
00:02:11.100 | of things that we are observing now
00:02:12.860 | and assign credit to decisions or interpretations
00:02:19.140 | we came up with a while ago
00:02:21.740 | when those memories were stored.
00:02:24.500 | And then we can change the way we would have reacted
00:02:29.500 | or interpreted things in the past.
00:02:31.860 | And now that's credit assignment used for learning.
00:02:35.120 | - So in which way do you think artificial neural networks,
00:02:41.260 | the current LSTM,
00:02:44.620 | the current architectures are not able to capture
00:02:49.860 | presumably you're thinking of very long-term?
00:02:53.020 | - Yes.
00:02:53.860 | So the current nets are doing a fairly good job
00:02:58.420 | for sequences with dozens or, say, hundreds of time steps.
00:03:03.180 | And then it gets sort of harder and harder
00:03:06.060 | and depending on what you have to remember and so on,
00:03:08.420 | as you consider longer durations.
00:03:10.940 | Whereas humans seem to be able to do credit assignment
00:03:15.940 | through essentially arbitrary times.
00:03:17.700 | Like I could remember something I did last year
00:03:20.580 | and then now because I see some new evidence,
00:03:23.260 | I'm gonna change my mind about the way I was thinking
00:03:26.860 | last year and hopefully not make the same mistake again.
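
As a concrete illustration of the long-horizon credit-assignment problem described here, below is a minimal sketch, assuming PyTorch and an invented recall task (not anything from the conversation): the network sees an informative token only at the first time step and must reproduce it at the last, so every gradient signal has to survive backpropagation through the entire sequence. At a couple of hundred steps this tends to train fine; stretch the lag much further and learning typically degrades.

```python
import torch
import torch.nn as nn

seq_len, batch, n_classes = 200, 32, 4   # push seq_len higher to feel the strain

class Recaller(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_classes + 1, 16)       # +1 for an uninformative filler token
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.head(h[:, -1])                          # answer is read off the last step

def make_batch():
    cue = torch.randint(0, n_classes, (batch,))             # token shown only at t = 0
    filler = torch.full((batch, seq_len - 1), n_classes, dtype=torch.long)
    return torch.cat([cue.unsqueeze(1), filler], dim=1), cue

model = Recaller()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    x, y = make_batch()
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()   # credit for the final answer must flow back through all 200 time steps
    opt.step()
```
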
00:03:31.100 | - I think a big part of that is probably forgetting.
00:03:36.120 | You're only remembering the really important things.
00:03:38.940 | So it's very efficient forgetting.
00:03:40.920 | - Yes.
00:03:43.060 | So there's a selection of what we remember.
00:03:45.100 | And I think there are really cool connections
00:03:48.020 | to higher level cognition here regarding consciousness,
00:03:52.600 | deciding and emotions.
00:03:54.300 | So there's deciding what comes to consciousness
00:03:56.500 | and what gets stored in memory,
00:03:58.320 | which are not trivial either.
00:04:01.460 | - So you've been at the forefront there all along
00:04:07.000 | showing some of the amazing things that neural networks,
00:04:09.920 | deep neural networks can do
00:04:12.140 | in the field of artificial intelligence,
00:04:13.580 | just broadly in all kinds of applications.
00:04:16.060 | But we can talk about that forever,
00:04:19.140 | but what in your view,
00:04:21.980 | because we're thinking towards the future,
00:04:24.020 | is the weakest aspect of the way deep neural networks
00:04:26.940 | represent the world?
00:04:28.060 | What, in your view, is missing?
00:04:31.360 | - So current state of the art neural nets
00:04:36.700 | trained on large quantities of images or texts,
00:04:41.860 | have some level of understanding
00:04:45.980 | of what explains those data sets,
00:04:49.020 | but it's very basic.
00:04:51.820 | It's very low level,
00:04:54.340 | and it's not nearly as robust and abstract in general
00:04:59.340 | as our understanding.
00:05:01.700 | Okay, so that doesn't tell us how to fix things,
00:05:05.820 | but I think it encourages us to think about
00:05:10.700 | how we can maybe train our neural nets differently
00:05:15.700 | so that they would focus, for example,
00:05:21.060 | on causal explanation,
00:05:22.520 | something that we don't do currently
00:05:25.060 | with neural net training.
00:05:27.840 | Also, one thing I'll talk about in my talk this afternoon
00:05:32.840 | is instead of learning separately from images and videos
00:05:38.480 | on one hand and from texts on the other hand,
00:05:41.880 | we need to do a better job of jointly learning
00:05:46.200 | about language and about the world to which it refers
00:05:51.160 | so that both sides can help each other.
00:05:55.720 | We need to have good world models in our neural nets
00:06:00.200 | for them to really understand sentences
00:06:03.600 | which talk about what's going on in the world,
00:06:05.840 | and I think we need language input
00:06:10.120 | to help provide clues about what high level concepts
00:06:15.040 | like semantic concepts should be represented
00:06:18.720 | at the top levels of these neural nets.
00:06:21.800 | In fact, there is evidence
00:06:24.480 | that the purely unsupervised learning of representations
00:06:28.920 | doesn't give rise to high level representations
00:06:33.360 | that are as powerful as the ones we're getting
00:06:35.960 | from supervised learning.
00:06:37.420 | And so the clues we're getting just with the labels,
00:06:41.240 | not even sentences, is already very powerful.
00:06:44.840 | - Do you think that's an architecture challenge
00:06:47.720 | or is it a data set challenge?
00:06:49.640 | - Neither.
00:06:50.480 | (laughing)
00:06:52.720 | - I'm tempted to just end it there.
00:06:56.480 | (laughing)
00:06:57.320 | - No, okay.
00:06:58.160 | - Can you elaborate slightly?
00:06:59.640 | - Yes.
00:07:00.480 | (laughing)
00:07:03.080 | Of course, data sets and architectures
00:07:04.520 | are something you wanna always play with,
00:07:06.400 | but I think the crucial thing is more
00:07:08.760 | the training objectives, the training frameworks.
00:07:11.480 | For example, going from passive observation of data
00:07:17.200 | to more active agents,
00:07:20.080 | which learn by intervening in the world,
00:07:25.080 | the relationships between causes and effects,
00:07:28.600 | the sort of objective functions
00:07:31.400 | which could be important to allow
00:07:34.880 | the highest level explanations to rise from the learning,
00:07:39.880 | which I don't think we have now,
00:07:43.520 | the kinds of objective functions
00:07:45.080 | which could be used to reward exploration,
00:07:48.980 | the right kind of exploration.
00:07:50.240 | So these kinds of questions are neither in the data set
00:07:54.440 | nor in the architecture,
00:07:55.660 | but more in how we learn,
00:07:58.320 | under what objectives and so on.
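
One family of training objectives along these lines, sketched very loosely below, rewards the agent for seeking out transitions its own forward model cannot yet predict, a curiosity-style intrinsic bonus. This is a hedged illustration, not Bengio's specific proposal, and the toy environment and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 2
dynamics = rng.integers(0, n_states, size=(n_states, n_actions))   # toy fixed environment

def step(s, a):
    return dynamics[s, a]

# the agent's own forward model: counts of observed (s, a) -> s' transitions
model_counts = np.zeros((n_states, n_actions, n_states))

def intrinsic_reward(s, a, s_next):
    counts = model_counts[s, a]
    p = (counts + 1e-6) / (counts.sum() + 1e-6 * n_states)
    return -np.log(p[s_next])          # surprise: large when the model has not seen this transition

s = 0
for t in range(1000):
    a = int(rng.integers(n_actions))   # behaviour left random for brevity; a real agent would
                                       # pick actions expected to yield a high bonus
    s_next = int(step(s, a))
    bonus = intrinsic_reward(s, a, s_next)
    model_counts[s, a, s_next] += 1    # the model improves, so the bonus for this transition fades
    s = s_next
```
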
00:08:01.600 | - Yeah, I've heard you mention in several contexts
00:08:04.600 | the idea of the way children learn,
00:08:06.520 | they interact with objects in the world,
00:08:08.240 | and it seems fascinating because in some sense,
00:08:12.200 | except with some cases in reinforcement learning,
00:08:15.800 | that idea is not part of the learning process
00:08:20.800 | in artificial neural networks.
00:08:23.040 | It's almost like, do you envision something
00:08:25.520 | like an objective function saying,
00:08:30.360 | "You know what, if you poke this object in this kind of way,
00:08:34.400 | it would be really helpful for me to further learn."
00:08:39.400 | Sort of almost guiding some aspect of learning.
00:08:43.280 | - Right, right, right.
00:08:44.120 | So I was talking to Rebecca Saxe just an hour ago,
00:08:47.600 | and she was talking about lots and lots of evidence
00:08:50.760 | that infants seem to clearly pick what interests them
00:09:00.200 | in a directed way.
00:09:01.560 | And so they're not passive learners.
00:09:05.000 | They focus their attention on aspects of the world
00:09:09.680 | which are most interesting, surprising,
00:09:12.560 | in a non-trivial way that makes them change
00:09:16.600 | their theories of the world.
00:09:18.100 | - So that's a fascinating view of the future progress,
00:09:24.140 | but on a more maybe boring question,
00:09:30.080 | do you think going deeper,
00:09:32.080 | and so do you think just increasing the size
00:09:36.720 | of the things that have been increasing a lot
00:09:39.300 | in the past few years will also make significant progress?
00:09:43.600 | So some of the representational issues that you mentioned,
00:09:48.560 | they're kind of shallow in some sense.
00:09:52.360 | - Oh, shallow you mean in the sense of abstraction?
00:09:55.360 | - In the sense of abstraction.
00:09:56.440 | They're not getting some, like you said--
00:09:58.160 | - I don't think that having more depth in the network
00:10:01.840 | in the sense of instead of 100 layers,
00:10:03.720 | we have 10,000 is going to solve our problem.
00:10:06.660 | - You don't think so?
00:10:07.500 | - No.
00:10:08.960 | - Is that obvious to you?
00:10:10.440 | - Yes.
00:10:11.440 | What is clear to me is that engineers and companies
00:10:14.720 | and labs and grad students will continue
00:10:17.960 | to tune architectures and explore all kinds of tweaks
00:10:22.800 | to make the current state of the art
00:10:24.480 | slightly, ever so slightly, better.
00:10:27.160 | But I don't think that's gonna be nearly enough.
00:10:29.680 | I think we need some fairly drastic changes
00:10:31.880 | in the way that we are considering learning
00:10:35.080 | to achieve the goal that these learners actually understand
00:10:40.480 | in a deep way the environment in which they are,
00:10:44.040 | you know, observing and acting.
00:10:46.600 | - But I guess I was trying to ask a question
00:10:49.920 | that's more interesting than just more layers.
00:10:52.800 | It's basically once you figure out a way to learn
00:10:57.600 | through interacting, how many parameters does it take
00:11:01.320 | to store that information?
00:11:02.920 | So I think our brain is quite a bit bigger
00:11:06.760 | than most neural networks.
00:11:07.840 | - Right, right.
00:11:08.680 | Oh, I see what you mean.
00:11:09.500 | Oh, I'm with you there.
00:11:10.880 | So I agree that in order to build neural nets
00:11:15.160 | with the kind of broad knowledge of the world
00:11:18.040 | that typical adult humans have,
00:11:21.000 | probably the kind of computing power we have now
00:11:23.960 | is gonna be insufficient.
00:11:25.680 | So, well, the good news is there are hardware companies
00:11:28.680 | building neural net chips, and so it's gonna get better.
00:11:31.480 | However, the good news in a way,
00:11:36.240 | which is also a bad news,
00:11:37.520 | is that even our state of the art deep learning methods
00:11:42.520 | fail to learn models that understand
00:11:47.060 | even very simple environments
00:11:48.720 | like some grid worlds that we have built.
00:11:51.720 | Even these fairly simple environments.
00:11:53.840 | I mean, of course, if you train them with enough examples,
00:11:56.240 | eventually they get it.
00:11:57.280 | But it's just like, instead of what humans might need,
00:12:02.280 | just dozens of examples,
00:12:04.640 | these things will need millions, right?
00:12:07.720 | For very, very, very simple tasks.
00:12:10.140 | And so I think there's an opportunity for academics
00:12:14.400 | who don't have the kind of computing power
00:12:17.200 | that say Google has to do really important
00:12:21.080 | and exciting research to advance the state of the art
00:12:24.400 | in training frameworks, learning models, agent learning,
00:12:29.320 | in even simple environments that are synthetic,
00:12:33.640 | that seem trivial,
00:12:34.720 | but yet current machine learning fails on.
00:12:37.600 | - We talked about priors and common sense knowledge.
00:12:43.400 | It seems like we humans take a lot of knowledge for granted.
00:12:48.400 | So what's your view of these priors
00:12:53.720 | of forming this broad view of the world,
00:12:56.600 | this accumulation of information,
00:12:59.140 | and how we can teach neural networks or learning systems
00:13:02.160 | to pick that knowledge up?
00:13:03.720 | So knowledge, you know, for a while,
00:13:06.800 | in artificial intelligence,
00:13:08.960 | maybe in the '80s,
00:13:10.960 | there was a time when knowledge representation,
00:13:13.840 | knowledge acquisition, expert systems,
00:13:16.840 | I mean, the symbolic AI view,
00:13:21.040 | was an interesting problem set to solve.
00:13:23.320 | And it was kind of put on hold a little bit, it seems like.
00:13:27.360 | - Because it doesn't work.
00:13:28.480 | - It doesn't work, that's right.
00:13:29.560 | But that's right.
00:13:32.020 | - But the goals of that remain important.
00:13:35.720 | - Yes, remain important.
00:13:36.800 | And how do you think those goals can be addressed?
00:13:40.680 | - Right, so first of all,
00:13:42.360 | I believe that one reason why
00:13:45.560 | the classical expert systems approach failed
00:13:49.280 | is because a lot of the knowledge we have,
00:13:52.360 | so you talked about common sense, intuition,
00:13:55.100 | there's a lot of knowledge like this
00:13:59.360 | which is not consciously accessible.
00:14:03.360 | There are lots of decisions we're taking
00:14:04.680 | that we can't really explain,
00:14:05.920 | even if sometimes we make up a story.
00:14:08.740 | And that knowledge is also necessary for machines
00:14:13.740 | to take good decisions.
00:14:16.940 | And that knowledge is hard to codify in expert systems,
00:14:21.140 | rule-based systems, and classical AI formalism.
00:14:24.740 | And there are other issues, of course,
00:14:26.060 | with the old AI,
00:14:27.820 | like not really good ways of handling uncertainty.
00:14:32.540 | I would say something more subtle,
00:14:35.500 | which we understand better now,
00:14:37.460 | but I think still isn't enough in the minds of people,
00:14:41.460 | there's something really powerful
00:14:44.080 | that comes from distributed representations,
00:14:47.220 | the thing that really makes neural nets work so well.
00:14:51.180 | And it's hard to replicate that kind of power
00:14:56.780 | in a symbolic world.
00:14:59.540 | The knowledge in expert systems and so on
00:15:02.140 | is nicely decomposed into like a bunch of rules.
00:15:06.680 | Whereas if you think about a neural net, it's the opposite.
00:15:09.140 | You have this big blob of parameters
00:15:12.100 | which work intensely together
00:15:14.020 | to represent everything the network knows.
00:15:16.580 | And it's not sufficiently factorized.
00:15:19.260 | And so I think this is one of the weaknesses
00:15:22.380 | of current neural nets,
00:15:24.320 | that we have to take lessons from classical AI
00:15:28.580 | in order to bring in another kind of compositionality
00:15:32.540 | which is common in language, for example,
00:15:34.300 | and in these rules,
00:15:35.900 | but that isn't so native to neural nets.
00:15:39.540 | - And on that line of thinking,
00:15:42.700 | disentangled representations.
00:15:45.700 | - Yes.
00:15:46.540 | - So.
00:15:47.740 | - So let me connect with disentangled representations,
00:15:50.820 | if you might, if you don't mind.
00:15:51.660 | - Yes, yes, exactly.
00:15:53.120 | - So for many years, I've thought,
00:15:55.500 | and I still believe that it's really important
00:15:57.620 | that we come up with learning algorithms,
00:16:00.700 | either unsupervised or supervised,
00:16:02.600 | or reinforcement, whatever,
00:16:04.880 | that build representations in which the important factors,
00:16:09.460 | hopefully causal factors, are nicely separated
00:16:12.300 | and easy to pick up from the representation.
00:16:15.120 | So that's the idea of disentangled representations.
00:16:17.460 | It says transform the data into a space
00:16:19.660 | where everything becomes easy.
00:16:21.820 | We can maybe just learn with linear models
00:16:25.220 | about the things we care about.
00:16:27.620 | And I still think this is important,
00:16:29.540 | but I think this is missing out
00:16:31.020 | on a very important ingredient,
00:16:33.860 | which classical AI systems can remind us of.
00:16:38.180 | So let's say we have these disentangled representations.
00:16:40.740 | You still need to learn about
00:16:42.340 | the relationships between the variables,
00:16:45.500 | those high-level semantic variables.
00:16:46.980 | They're not gonna be independent.
00:16:48.140 | I mean, this is like too much of an assumption.
00:16:51.320 | They're gonna have some interesting relationships
00:16:53.280 | that allow us to predict things in the future,
00:16:55.340 | to explain what happened in the past.
00:16:57.720 | The kind of knowledge about those relationships
00:17:00.080 | in a classical AI system is encoded in the rules.
00:17:03.140 | Like a rule is just like a little piece of knowledge
00:17:05.500 | that says, oh, I have these two, three, four variables
00:17:08.940 | that are linked in this interesting way.
00:17:11.100 | Then I can say something about one or two of them
00:17:13.380 | given a couple of others, right?
00:17:14.940 | In addition to disentangling
00:17:16.780 | the elements of the representation,
00:17:21.020 | which are like the variables in a rule-based system,
00:17:24.260 | you also need to disentangle
00:17:28.620 | the mechanisms that relate those variables to each other.
00:17:33.300 | So like the rules.
00:17:34.420 | So the rules are neatly separated.
00:17:36.340 | Each rule is living on its own.
00:17:38.940 | And when I change a rule because I'm learning,
00:17:42.580 | it doesn't need to break other rules.
00:17:45.060 | Whereas current neural nets, for example,
00:17:46.820 | are very sensitive to what's called catastrophic forgetting
00:17:49.940 | where after I've learned some things
00:17:52.460 | and then I learn new things,
00:17:54.220 | it can destroy the old things that I had learned, right?
00:17:57.180 | If the knowledge was better factorized
00:18:00.140 | and separated, disentangled,
00:18:03.520 | then you would avoid a lot of that.
00:18:06.620 | Now you can't do this in the sensory domain,
00:18:10.460 | but my idea-- - What do you mean
00:18:12.500 | by sensory domain? - Like in pixel space.
00:18:14.780 | But my idea is that when you project the data
00:18:17.580 | in the right semantic space,
00:18:18.780 | it becomes possible to now represent this extra knowledge
00:18:23.420 | beyond the transformation from input to representations,
00:18:26.140 | which is how representations act on each other
00:18:28.820 | and predict the future and so on,
00:18:30.820 | in a way that can be neatly disentangled.
00:18:35.540 | So now it's the rules that are disentangled from each other
00:18:38.180 | and not just the variables
00:18:39.180 | that are disentangled from each other.
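
To make the catastrophic-forgetting point concrete, here is a minimal sketch (PyTorch assumed, with two invented toy tasks): a single undifferentiated set of parameters is trained on task A, then on task B, and because nothing keeps the two pieces of knowledge factorized, performance on A typically collapses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(weight):               # two toy binary tasks with different decision rules
    x = torch.randn(2000, 10)
    y = (x @ weight > 0).long()
    return x, y

task_a = make_task(torch.randn(10))
task_b = make_task(torch.randn(10))

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(x, y, epochs=200):
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(net(x), y).backward()
        opt.step()

def accuracy(x, y):
    return (net(x).argmax(dim=1) == y).float().mean().item()

train(*task_a)
print("task A after training on A:", accuracy(*task_a))   # high
train(*task_b)
print("task A after training on B:", accuracy(*task_a))   # typically drops sharply
```
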
00:18:41.220 | - And you draw a distinction
00:18:42.340 | between semantic space and pixel.
00:18:44.340 | Like does there need to be an architectural difference?
00:18:47.660 | - Well, yeah.
00:18:48.500 | So there's the sensory space like pixels,
00:18:50.300 | where everything is entangled.
00:18:52.240 | The information, like the variables
00:18:55.140 | that are completely interdependent
00:18:56.580 | in very complicated ways.
00:18:58.940 | And also computation, like it's not just variables,
00:19:02.940 | it's also how they are related to each other
00:19:04.700 | is all intertwined.
00:19:06.700 | But I'm hypothesizing that in the right high-level
00:19:11.180 | representation space, both the variables
00:19:15.200 | and how they relate to each other can be disentangled
00:19:18.260 | and that will provide a lot of generalization power.
00:19:21.700 | - Generalization power. - Yes.
00:19:24.020 | - Distribution of the test set is assumed to be the same
00:19:28.420 | as the distribution of the training set.
00:19:30.220 | - Right.
00:19:31.100 | This is where current machine learning is too weak.
00:19:34.900 | It doesn't tell us anything,
00:19:36.780 | it's not able to tell us anything about how our
00:19:39.380 | models, you know, let's say, are gonna generalize
00:19:41.060 | to a new distribution.
00:19:42.300 | And, you know, people may think,
00:19:44.900 | well, but there's nothing we can say
00:19:46.140 | if we don't know what the new distribution will be.
00:19:48.900 | The truth is humans are able
00:19:51.300 | to generalize to new distributions.
00:19:53.700 | - Yeah, how are we able to do that?
00:19:54.940 | - Yeah, because there is something,
00:19:56.460 | these new distributions,
00:19:58.060 | even though they could look very different
00:19:59.580 | from the training distributions,
00:20:01.660 | they have things in common.
00:20:02.540 | So let me give you a concrete example.
00:20:04.380 | You read a science fiction novel.
00:20:06.380 | The science fiction novel maybe, you know,
00:20:09.980 | brings you in some other planet
00:20:12.420 | where things look very different on the surface,
00:20:16.200 | but it's still the same laws of physics, right?
00:20:18.980 | And so you can read the book
00:20:20.060 | and you understand what's going on.
00:20:23.100 | So the distribution is very different,
00:20:25.020 | but because you can transport a lot of the knowledge
00:20:28.940 | you had from Earth about the underlying
00:20:32.380 | cause and effect relationships
00:20:33.940 | and physical mechanisms and all that,
00:20:36.580 | and maybe even social interactions,
00:20:39.060 | you can now make sense of what is going on on this planet
00:20:41.380 | where like visually, for example,
00:20:42.980 | things are totally different.
00:20:44.420 | - Taking that analogy further and distorting it,
00:20:48.820 | let's enter a science fiction world
00:20:51.380 | of, say, "2001: A Space Odyssey" with HAL.
00:20:54.780 | - Yeah.
00:20:55.620 | - Or maybe, which is probably one of my favorite AI movies.
00:21:00.620 | And then-- - Me too.
00:21:01.860 | - And then there's another one that a lot of people love
00:21:04.780 | that maybe a little bit outside of the AI community
00:21:08.460 | is "Ex Machina."
00:21:09.780 | - Right. - I don't know
00:21:10.620 | if you've seen it. - Yes, yes.
00:21:12.620 | - By the way, what are your views on that movie?
00:21:14.780 | Does it, are you able to enjoy it?
00:21:16.860 | - So there are things I like and things I hate.
00:21:21.220 | - So let me, you could talk about that
00:21:23.420 | in the context of a question I wanna ask,
00:21:25.940 | which is there's quite a large community of people
00:21:29.260 | from different backgrounds, often outside of AI,
00:21:32.060 | who are concerned about existential threat
00:21:34.140 | of artificial intelligence. - Right.
00:21:35.820 | - You've seen this community develop over time,
00:21:38.700 | you've seen, you have a perspective.
00:21:40.220 | So what do you think is the best way to talk
00:21:42.860 | about AI safety, to think about it,
00:21:45.420 | to have discourse about it within AI community
00:21:48.420 | and outside and grounded in the fact that "Ex Machina"
00:21:52.700 | is one of the main sources of information
00:21:54.660 | for the general public about AI?
00:21:56.660 | - So I think you're putting it right.
00:21:58.620 | There's a big difference between the sort of discussion
00:22:02.340 | we oughta have within the AI community
00:22:05.220 | and the sort of discussion that really matters
00:22:07.700 | in the general public.
00:22:09.180 | So I think the picture of Terminator and AI loose
00:22:15.420 | and killing people and super intelligence
00:22:18.780 | that's gonna destroy us, whatever we try,
00:22:21.460 | isn't really so useful for the public discussion
00:22:26.140 | because for the public discussion,
00:22:28.540 | the things I believe really matter are the short-term
00:22:32.940 | and medium-term, very likely negative impacts of AI
00:22:36.260 | on society, whether it's from security,
00:22:40.700 | like Big Brother scenarios with face recognition
00:22:43.420 | or killer robots or the impact on the job market
00:22:46.820 | or concentration of power and discrimination,
00:22:50.060 | all kinds of social issues which could actually,
00:22:53.840 | some of them could really threaten democracy, for example.
00:22:58.900 | - Just to clarify, when you said killer robots,
00:23:01.180 | you mean autonomous weapons, like weapon systems.
00:23:04.180 | - Yes, I don't mean-- - Not the Terminator.
00:23:05.940 | - That's right.
00:23:07.340 | So I think these short and medium-term concerns
00:23:11.260 | should be important parts of the public debate.
00:23:13.860 | Now, existential risk for me
00:23:16.420 | is a very unlikely consideration,
00:23:20.260 | but still worth academic investigation
00:23:25.260 | in the same way that you could say,
00:23:26.940 | should we study what could happen if a meteorite
00:23:31.060 | came to Earth and destroyed it?
00:23:32.780 | So I think it's very unlikely that this is gonna happen
00:23:35.900 | or happen in a reasonable future.
00:23:38.420 | It's very, the sort of scenario of an AI getting loose
00:23:43.420 | goes against my understanding
00:23:45.220 | of at least current machine learning
00:23:46.700 | and current neural nets and so on.
00:23:48.620 | It's not plausible to me.
00:23:50.340 | But of course, I don't have a crystal ball
00:23:51.940 | and who knows what AI will be in 50 years from now.
00:23:54.380 | So I think it is worth that scientists study those problems.
00:23:57.660 | It's just not a pressing question as far as I'm concerned.
00:24:00.540 | - So before I continue down that line,
00:24:03.780 | I have a few questions there,
00:24:04.780 | but what do you like and not like about Ex Machina
00:24:09.500 | as a movie?
00:24:10.340 | 'Cause I actually watched it for the second time
00:24:12.260 | and enjoyed it.
00:24:13.860 | I hated it the first time,
00:24:15.140 | and I enjoyed it quite a bit more the second time
00:24:18.260 | when I sort of learned to accept certain pieces of it.
00:24:22.760 | You see it as a concept movie.
00:24:26.060 | What was your experience?
00:24:27.060 | What were your thoughts?
00:24:29.220 | - So the negative is the picture it paints
00:24:35.180 | of science is totally wrong.
00:24:37.240 | Science in general and AI in particular.
00:24:40.660 | Science is not happening in some hidden place
00:24:45.580 | by some really smart guy.
00:24:48.740 | - One person.
00:24:49.580 | - One person.
00:24:50.400 | This is totally unrealistic.
00:24:52.020 | This is not how it happens.
00:24:54.300 | Even a team of people in some isolated place
00:24:57.620 | will not make it.
00:24:58.540 | Science moves by small steps
00:25:01.780 | thanks to the collaboration and community
00:25:06.780 | of a large number of people interacting.
00:25:09.660 | All the scientists who are expert in their field
00:25:15.260 | kind of know what is going on,
00:25:16.580 | even in the industrial labs.
00:25:18.260 | Information flows and leaks and so on.
00:25:22.100 | The spirit of it is very different
00:25:24.900 | from the way science is painted in this movie.
00:25:28.780 | - Yeah, let me ask on that point.
00:25:31.500 | It's been the case to this point
00:25:34.180 | that kind of even if the research happens
00:25:36.260 | inside Google or Facebook, inside companies,
00:25:38.140 | it still kind of comes out, ideas come out.
00:25:40.740 | - Absolutely.
00:25:41.580 | - Do you think that will always be the case with AI?
00:25:42.940 | Is it possible to bottle ideas to the point
00:25:45.860 | where there's a set of breakthroughs
00:25:49.140 | that go completely undiscovered
00:25:50.300 | by the general research community?
00:25:52.460 | Do you think that's even possible?
00:25:54.980 | - It's possible, but it's unlikely.
00:25:56.780 | - Unlikely.
00:25:58.260 | - It's not how it is done now.
00:26:00.840 | It's not how I can foresee it in the foreseeable future.
00:26:04.640 | But of course, I don't have a crystal ball.
00:26:10.340 | And so who knows?
00:26:13.260 | This is science fiction, after all.
00:26:15.380 | But usually--
00:26:16.220 | - I think it's ominous that the lights went off
00:26:17.980 | during that discussion.
00:26:19.440 | - So the problem, again, there's a,
00:26:23.380 | one thing is the movie,
00:26:24.380 | and you could imagine all kinds of science fiction.
00:26:26.060 | The problem for me,
00:26:28.060 | maybe similar to the question about existential risk,
00:26:31.160 | is that this kind of movie paints such a wrong picture
00:26:36.160 | of what is actual, you know, the actual science
00:26:39.520 | and how it's going on,
00:26:40.620 | that it can have unfortunate effects
00:26:43.620 | on people's understanding of current science.
00:26:46.800 | And so that's kind of sad.
00:26:49.280 | There is an important principle in research,
00:26:54.260 | which is diversity.
00:26:56.140 | So in other words, research is exploration.
00:26:59.340 | Research is exploration in the space of ideas.
00:27:02.000 | And different people will focus on different directions.
00:27:05.180 | And this is not just good, it's essential.
00:27:08.720 | So I'm totally fine with people exploring directions
00:27:13.080 | that are contrary to mine or look orthogonal to mine.
00:27:16.900 | I am more than fine.
00:27:20.300 | I think it's important.
00:27:21.900 | I and my friends don't claim we have universal truth
00:27:25.380 | about what will,
00:27:26.220 | especially about what will happen in the future.
00:27:28.860 | Now, that being said, we have our intuitions,
00:27:31.680 | and then we act accordingly,
00:27:34.220 | according to where we think we can be most useful
00:27:37.720 | and where society has the most to gain or to lose.
00:27:40.300 | We should have those debates
00:27:43.140 | and not end up in a society where there's only one voice
00:27:48.140 | and one way of thinking and research money is spread out.
00:27:55.140 | - Disagreement is a sign of good research, good science.
00:27:59.220 | - Yes.
00:28:00.060 | - The idea of bias in the human sense of bias.
00:28:04.060 | - Yeah.
00:28:05.660 | - How do you think about instilling in machine learning
00:28:09.740 | something that's aligned with human values in terms of bias?
00:28:14.060 | We, intuitively as human beings,
00:28:15.780 | have a concept of what bias means,
00:28:17.700 | of what a fundamental respect for other human beings means.
00:28:21.740 | But how do we instill that
00:28:23.780 | into machine learning systems, do you think?
00:28:26.660 | - So I think there are short-term things
00:28:29.700 | that are already happening,
00:28:31.500 | and then there are long-term things that we need to do.
00:28:35.260 | In the short term, there are techniques
00:28:37.900 | that have been proposed,
00:28:39.060 | and I think will continue to be improved,
00:28:40.980 | and maybe alternatives will come up,
00:28:43.380 | to take data sets in which we know there is bias,
00:28:47.260 | we can measure it.
00:28:48.180 | Pretty much any data set where humans are,
00:28:51.340 | you know, being observed, taking decisions,
00:28:53.140 | will have some sort of bias,
00:28:54.580 | discrimination against particular groups, and so on.
00:28:57.220 | And we can use machine learning techniques
00:29:01.100 | to try to build predictors, classifiers,
00:29:04.140 | that are gonna be less biased.
00:29:06.700 | We can do it, for example, using adversarial methods
00:29:11.540 | to make our systems less sensitive to these variables
00:29:16.260 | we should not be sensitive to.
00:29:18.260 | So these are clear, well-defined ways
00:29:20.780 | of trying to address the problem.
00:29:22.260 | Maybe they have weaknesses,
00:29:23.660 | and more research is needed, and so on.
00:29:25.620 | But I think, in fact, they're sufficiently mature
00:29:28.940 | that governments should start regulating companies
00:29:32.620 | where it matters, say, like insurance companies,
00:29:35.100 | so that they use those techniques,
00:29:36.740 | because those techniques will probably reduce the bias,
00:29:41.740 | but at a cost.
00:29:43.140 | For example, maybe their predictions will be less accurate,
00:29:45.860 | and so companies will not do it until you force them.
00:29:49.060 | All right, so this is short-term.
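
Below is a hedged sketch of the adversarial technique mentioned above, using synthetic data and a hypothetical architecture; it is one common formulation from the literature, not necessarily the exact one Bengio has in mind. An adversary tries to recover the protected attribute from the predictor's internal representation, and the predictor is trained both to do its task and to make that adversary fail.

```python
import torch
import torch.nn as nn

x = torch.randn(1000, 20)                     # features
y = torch.randint(0, 2, (1000,))              # task label
z = torch.randint(0, 2, (1000,))              # protected attribute we should not rely on

encoder = nn.Sequential(nn.Linear(20, 32), nn.ReLU())
task_head = nn.Linear(32, 2)
adversary = nn.Linear(32, 2)

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(task_head.parameters()), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1.0                                     # strength of the fairness penalty

for step in range(2000):
    h = encoder(x)

    # 1) the adversary learns to predict z from the current representation
    adv_loss = ce(adversary(h.detach()), z)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # 2) the predictor learns its task while pushing the adversary's accuracy down
    #    (a simple sign-flip penalty; fancier objectives exist)
    main_loss = ce(task_head(h), y) - lam * ce(adversary(h), z)
    opt_main.zero_grad(); main_loss.backward(); opt_main.step()
```

As the interview notes, the debiased predictor usually pays some accuracy cost on the original task, which is exactly why regulation may be needed before companies adopt it.
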
00:29:50.780 | Long-term, I'm really interested in thinking
00:29:55.180 | of how we can instill moral values into computers.
00:29:59.180 | Obviously, this is not something we'll achieve
00:30:01.420 | in the next five or 10 years.
00:30:03.100 | How can we, you know, there's already work
00:30:06.460 | in detecting emotions, for example,
00:30:09.300 | in images, in sounds, in texts,
00:30:13.140 | and also studying how different agents
00:30:18.140 | interacting in different ways may correspond
00:30:21.420 | to patterns of, say, injustice, which could trigger anger.
00:30:26.420 | So these are things we can do in the medium term,
00:30:31.660 | and eventually train computers to model, for example,
00:30:36.660 | how humans react emotionally.
00:30:42.200 | I would say the simplest thing is unfair situations,
00:30:47.020 | which trigger anger.
00:30:48.620 | This is one of the most basic emotions
00:30:50.500 | that we share with other animals.
00:30:52.700 | I think it's quite feasible within the next few years,
00:30:55.380 | so we can build systems that can detect
00:30:57.540 | these kinds of things, to the extent, unfortunately,
00:31:00.640 | that they understand enough about the world around us,
00:31:04.240 | which is a long time away,
00:31:05.820 | but maybe we can initially do this in virtual environments.
00:31:10.340 | So you can imagine like a video game
00:31:12.140 | where agents interact in some ways,
00:31:15.340 | and then some situations trigger an emotion.
00:31:19.020 | I think we could train machines to detect those situations
00:31:22.640 | and predict that the particular emotion
00:31:24.420 | will likely be felt if a human
00:31:27.260 | was playing one of the characters.
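
A very rough sketch of what that could look like in a toy virtual environment: situations are described by a few numbers, annotated with whether a human playing that character would plausibly feel wronged, and an ordinary classifier is trained on those labels. The features, labels, and labelling rule below are entirely hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# each situation: [reward_to_me, reward_to_other, did_I_do_the_work]
states = torch.tensor([[0.0, 1.0, 1.0],    # I did the work, the other agent got the reward
                       [1.0, 1.0, 1.0],
                       [0.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0]] * 50)
anger = torch.tensor([1, 0, 0, 0] * 50)    # annotator label: unfair -> anger likely

clf = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(300):
    opt.zero_grad()
    loss_fn(clf(states), anger).backward()
    opt.step()

# the trained classifier can then be queried on unseen situations
print(clf(torch.tensor([[0.0, 2.0, 1.0]])).softmax(dim=1))
```
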
00:31:29.300 | - You have shown excitement and done a lot of excellent work
00:31:33.040 | with unsupervised learning, but on a super,
00:31:36.580 | you know, there's been a lot of success
00:31:38.560 | on the supervised learning side.
00:31:39.700 | - Yes, yes.
00:31:40.980 | - And one of the things I'm really passionate about
00:31:43.780 | is how humans and robots work together.
00:31:46.300 | And in the context of supervised learning,
00:31:49.380 | that means the process of annotation.
00:31:52.220 | Do you think about the problem of annotation
00:31:54.780 | or, put in a more interesting way,
00:31:58.380 | of humans teaching machines?
00:32:00.860 | - Yes. - Is there?
00:32:02.380 | - Yes, I think it's an important subject.
00:32:04.980 | Reducing it to annotation may be useful
00:32:08.100 | for somebody building a system tomorrow,
00:32:11.180 | but longer term, the process of teaching,
00:32:14.880 | I think is something that deserves a lot more attention
00:32:17.740 | from the machine learning community.
00:32:19.420 | So there are people who've coined the term machine teaching.
00:32:22.700 | So what are good strategies for teaching a learning agent?
00:32:26.580 | And can we design, train a system
00:32:30.620 | that is gonna be a good teacher?
00:32:32.620 | So in my group, we have a project
00:32:35.700 | called BabyAI, or the BabyAI game,
00:32:38.620 | where there is a game or a scenario
00:32:42.420 | where there's a learning agent and a teaching agent.
00:32:46.300 | Presumably the teaching agent would eventually be a human,
00:32:50.540 | but we're not there yet.
00:32:52.060 | And the role of the teacher
00:32:57.300 | is to use its knowledge of the environment,
00:32:59.260 | which it can acquire using whatever way, brute force,
00:33:04.740 | to help the learner learn as quickly as possible.
00:33:09.020 | So the learner is gonna try to learn by itself,
00:33:11.180 | maybe using some exploration and whatever,
00:33:14.260 | but the teacher can choose,
00:33:17.440 | can have an influence on the interaction with the learner
00:33:21.500 | so as to guide the learner,
00:33:24.100 | maybe teach it the things
00:33:27.300 | that the learner has most trouble with,
00:33:29.060 | or just at the boundary between what it knows
00:33:30.980 | and doesn't know and so on.
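
A toy sketch of that teacher/learner protocol follows (pure Python; the names and the "environment" are hypothetical, and this is not the actual BabyAI code): the teacher already knows the answers, probes the learner, and keeps presenting exactly the tasks the learner still gets wrong.

```python
import random

tasks = list(range(20))                       # task i has ground-truth answer i % 4
truth = {t: t % 4 for t in tasks}

learner = {}                                  # the learner's current guesses (a stand-in model)

def learner_answer(task):
    return learner.get(task, random.randrange(4))

def teacher_pick():
    # the teacher probes the learner and selects a task it still fails
    failing = [t for t in tasks if learner_answer(t) != truth[t]]
    return random.choice(failing) if failing else None

for round_ in range(200):
    task = teacher_pick()
    if task is None:
        break                                 # the learner has mastered everything the teacher checks
    learner[task] = truth[task]               # "teaching": here, simply showing the answer

print("rounds needed:", round_)
```
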
00:33:32.620 | So there's a tradition of these kinds of ideas
00:33:35.860 | from other fields, like tutorial systems, for example,
00:33:40.820 | in AI, and of course, people in the humanities
00:33:45.460 | have been thinking about these questions,
00:33:46.700 | but I think it's time that machine learning people
00:33:49.780 | look at this because in the future,
00:33:51.740 | we'll have more and more human-machine interaction
00:33:55.620 | with a human in a loop,
00:33:56.940 | and I think understanding how to make this work better--
00:34:00.540 | - All the problems around that are very interesting
00:34:02.660 | and not sufficiently addressed.
00:34:04.180 | You've done a lot of work with language, too.
00:34:07.540 | What aspect of the traditionally formulated Turing test,
00:34:12.540 | a test of natural language understanding and generation,
00:34:15.980 | in your eyes, is the most difficult?
00:34:18.340 | Of conversation, what in your eyes
00:34:20.220 | is the hardest part of conversation to solve for machines?
00:34:24.580 | - So I would say it's everything having to do
00:34:27.060 | with the non-linguistic knowledge,
00:34:30.340 | which implicitly you need
00:34:32.060 | in order to make sense of sentences.
00:34:34.860 | Things like the Winograd schema,
00:34:36.460 | so these sentences that are semantically ambiguous.
00:34:39.500 | In other words, you need to understand enough
00:34:42.300 | about the world in order to really interpret properly
00:34:45.620 | those sentences.
00:34:46.660 | I think these are interesting challenges
00:34:49.380 | for machine learning because they point in the direction
00:34:52.980 | of building systems that both understand how the world works
00:34:58.660 | and its causal relationships in the world
00:35:01.420 | and associate that knowledge with how to express it
00:35:06.380 | in language, either for reading or writing.
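
For readers unfamiliar with the term, the classic example (Terry Winograd's original, not one quoted in this conversation) is the pair "The city councilmen refused the demonstrators a permit because they feared violence" versus "... because they advocated violence." Deciding whether "they" refers to the councilmen or the demonstrators in each version cannot be done from syntax alone; it requires knowing how permits, protests, and fear of violence work in the world, which is exactly the non-linguistic knowledge being pointed to here.
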
00:35:10.140 | - You speak French?
00:35:13.380 | - Yes, it's my mother tongue.
00:35:14.700 | - It's one of the romance languages.
00:35:17.180 | Do you think passing the Turing test
00:35:19.780 | and all the underlying challenges we just mentioned
00:35:22.100 | depend on language?
00:35:23.180 | Do you think it might be easier in French
00:35:24.840 | than it is in English?
00:35:26.140 | - No.
00:35:26.980 | - Or is it independent of language?
00:35:28.380 | - I think it's independent of language.
00:35:30.620 | I would like to build systems
00:35:35.540 | that can use the same principles,
00:35:38.540 | the same learning mechanisms
00:35:40.900 | to learn from human agents, whatever their language.
00:35:46.260 | - Well, certainly us humans can talk more beautifully
00:35:50.740 | and smoothly in poetry.
00:35:52.200 | So I'm Russian originally.
00:35:53.660 | I know poetry in Russian is maybe easier
00:35:58.620 | to convey complex ideas than it is in English.
00:36:02.380 | But maybe I'm showing my bias
00:36:04.180 | and some people could say that about French.
00:36:07.580 | But of course, the goal ultimately is our human brain
00:36:11.980 | is able to utilize any kind of those languages
00:36:15.900 | to use them as tools to convey meaning.
00:36:18.420 | - Yeah, of course there are differences between languages
00:36:20.460 | and maybe some are slightly better at some things.
00:36:22.700 | But in the grand scheme of things
00:36:25.020 | where we're trying to understand how the brain works
00:36:26.900 | and language and so on,
00:36:29.180 | I think these differences are minute.
00:36:31.180 | - So you've lived perhaps through an AI winter of sorts.
00:36:37.980 | - Yes.
00:36:39.820 | - How did you stay warm and continue your research?
00:36:44.620 | - Stay warm with friends.
00:36:45.660 | - With friends.
00:36:46.580 | Okay, so it's important to have friends.
00:36:48.500 | And what have you learned from the experience?
00:36:53.040 | - Listen to your inner voice.
00:36:55.600 | Don't be trying to just please the crowds and the fashion.
00:37:00.600 | And if you have a strong intuition about something
00:37:07.960 | that is not contradicted by actual evidence, go for it.
00:37:12.920 | I mean, it could be contradicted by people.
00:37:16.000 | (laughing)
00:37:17.160 | - Not your own instinct, based on everything
00:37:19.720 | you've learned.
00:37:20.560 | - Of course you have to adapt your beliefs
00:37:23.400 | when your experiments contradict those beliefs.
00:37:26.640 | But you have to stick to your beliefs otherwise.
00:37:31.740 | It's what allowed me to go through those years.
00:37:34.940 | It's what allowed me to persist in directions
00:37:39.280 | that took time, whatever other people think,
00:37:42.840 | took time to mature and bring fruits.
00:37:47.980 | - So history of AI is marked with these,
00:37:51.380 | of course it's marked with technical breakthroughs,
00:37:54.400 | but it's also marked with these seminal events
00:37:57.420 | that capture the imagination of the community.
00:38:00.740 | Most recently, I would say AlphaGo beating
00:38:04.260 | the world champion human Go player was one of those moments.
00:38:08.780 | What do you think the next such moment might be?
00:38:12.780 | - Okay, so first of all, I think that these
00:38:15.140 | so-called seminal events are overrated.
00:38:18.200 | (laughing)
00:38:20.460 | As I said, science really moves by small steps.
00:38:25.940 | Now what happens is you make one more small step
00:38:30.740 | and it's like the drop
00:38:35.740 | that fills the bucket, and then you have
00:38:39.040 | drastic consequences because now you're able
00:38:41.180 | to do something you were not able to do before.
00:38:43.780 | Or now, say the cost of building some device
00:38:46.840 | or solving a problem becomes cheaper than what existed
00:38:50.380 | and you have a new market that opens up.
00:38:52.820 | So especially in the world of commerce and applications,
00:38:56.980 | the impact of a small scientific progress could be huge.
00:39:01.660 | But in the science itself, I think it's very, very gradual.
00:39:07.460 | - Where are these steps being taken now?
00:39:10.620 | So there's unsupervised learning.
00:39:13.140 | - So if I look at one trend that I like in my community,
00:39:18.140 | so for example, at Mila, my institute,
00:39:23.260 | what are the two hottest topics?
00:39:24.980 | GANs and reinforcement learning.
00:39:29.340 | Even though in Montreal in particular,
00:39:32.340 | like reinforcement learning was something
00:39:35.100 | pretty much absent just two or three years ago.
00:39:38.000 | So there's really a big interest from students
00:39:41.700 | and there's a big interest from people like me.
00:39:45.180 | So I would say this is something where we're gonna see
00:39:49.660 | more progress even though it hasn't yet provided much
00:39:54.380 | in terms of actual industrial fallout.
00:39:58.220 | Like even though there's AlphaGo,
00:40:00.420 | like Google is not making money on this right now.
00:40:02.880 | But I think over the long term,
00:40:04.740 | this is really, really important for many reasons.
00:40:07.240 | So in other words, I would say reinforcement learning,
00:40:11.140 | or maybe more generally agent learning,
00:40:13.900 | 'cause it doesn't have to be with rewards.
00:40:15.660 | It could be in all kinds of ways
00:40:17.060 | that an agent is learning about its environment.
00:40:19.500 | - Now, reinforcement learning you're excited about,
00:40:22.740 | do you think GANs could provide something,
00:40:27.540 | - Yes. - some moment in--
00:40:30.940 | - Well, GANs or other generative models,
00:40:34.700 | I believe will be crucial ingredients
00:40:38.740 | in building agents that can understand the world.
00:40:42.160 | A lot of the successes in reinforcement learning
00:40:45.980 | in the past have been with policy gradient
00:40:49.340 | where you just learn a policy,
00:40:51.100 | you don't actually learn a model of the world.
00:40:53.380 | But there are lots of issues with that.
00:40:55.560 | And we don't know how to do model-based RL right now.
00:40:58.500 | But I think this is where we have to go
00:41:02.020 | in order to build models that can generalize faster
00:41:05.500 | and better, like to new distributions,
00:41:08.320 | that capture, to some extent
00:41:11.080 | at least, the underlying causal mechanisms in the world.
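
Here is a hedged sketch of the distinction drawn above, on a toy three-armed bandit (numpy only; a hypothetical setup, not any specific published system): a REINFORCE-style policy-gradient update adjusts the policy directly from sampled rewards and never builds a model of the environment, which is exactly the piece a model-based approach would add.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])        # hidden reward of each arm

theta = np.zeros(3)                           # policy parameters (softmax preferences)
lr = 0.1

def policy():
    e = np.exp(theta - theta.max())
    return e / e.sum()

for step in range(2000):
    p = policy()
    a = rng.choice(3, p=p)
    r = rng.normal(true_means[a], 0.1)        # sample a reward; no model of it is kept

    # REINFORCE: grad of log pi(a) is one_hot(a) - p for a softmax policy
    grad_log = -p
    grad_log[a] += 1.0
    theta += lr * r * grad_log                # model-free: only the policy changes

print("learned policy:", policy().round(2))   # should concentrate on the best arm

# A model-based variant would, in addition, maintain estimates of the environment
# (e.g. running means of each arm's reward) and could plan or simulate with them,
# which is what lets knowledge transfer to new but related situations.
```
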
00:41:14.940 | - Last question, what made you fall in love
00:41:19.280 | with artificial intelligence?
00:41:21.080 | If you look back, what was the first moment in your life
00:41:26.080 | when you were fascinated by either the human mind
00:41:29.300 | or the artificial mind?
00:41:31.320 | - You know, when I was an adolescent, I was reading a lot.
00:41:33.700 | And then I started reading science fiction.
00:41:36.480 | (both laughing)
00:41:37.800 | - There you go.
00:41:38.640 | - I got, that's it.
00:41:40.320 | That's where I got hooked.
00:41:42.560 | And then I had one of the first personal computers
00:41:47.360 | and I got hooked in programming.
00:41:51.080 | And so it just, you know.
00:41:52.440 | - Start with fiction and then make it a reality.
00:41:54.640 | - That's right.
00:41:56.040 | - Yoshua, thank you so much for talking to me.
00:41:57.560 | - My pleasure.
00:41:58.400 | (upbeat music)