François Chollet: Limits of Deep Learning | AI Podcast Clips
What do you think are the current limits of deep learning, if we look specifically at these function approximators that try to generalize from data?
You've talked about local versus extreme generalization.
You mentioned that neural networks don't generalize well, humans do.
And you've also mentioned that extreme generalization requires something like reasoning to fill in the gaps.
So how can we start trying to build systems like that?
Deep learning models are like huge parametric models, differentiable, so continuous.
So they're trained pretty much point by point.
They're learning a continuous geometric morphing from an input vector space to an output vector space.
And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things it has already seen in the training data.
At best, it can do interpolation across points.
But that means that in order to train your network, you need a dense sampling of the input-cross-output space, almost a point-by-point sampling, which can be very expensive if you're dealing with complex real-world problems like autonomous driving, for instance, or robotics.
It's doable if you're looking at a subset of the visual space, but even then it's still very expensive.
And it's only going to be able to make sense of things that are very close to what it has seen before.
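To make the point about local generalization concrete, here is a minimal sketch (my own illustration, not from the conversation), assuming scikit-learn is available: a network fit on a dense sampling of one region of the input space interpolates well inside that region, but its predictions typically degrade on inputs far outside it.

```python
# A network trained on a dense sampling of [0, 2*pi] interpolates there,
# but usually fails on inputs far from anything it has seen.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 2 * np.pi, size=(2000, 1))  # dense sampling of the input region
y_train = np.sin(x_train).ravel()

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(x_train, y_train)

x_inside = np.array([[1.0], [3.0], [5.0]])    # close to points seen in training
x_outside = np.array([[10.0], [20.0]])        # far from anything seen in training

print(model.predict(x_inside), np.sin(x_inside).ravel())    # usually a close match
print(model.predict(x_outside), np.sin(x_outside).ravel())  # usually far off
```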
And in contrast to that, well, of course you have human intelligence, but even if you're not looking at human intelligence, you can look at very simple rules, algorithms.
If you have a symbolic rule, it can actually apply to a very, very large set of inputs.
It is not obtained by doing a point-by-point mapping.
So for instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning, point by point, what the sorted representation of each particular list looks like.
But instead you could have a very, very simple sorting algorithm written in a few lines.
And it can process any list at all, because it is abstract, because it is a set of rules.
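The sorting example is easy to make concrete. A handful of lines of explicit rules (here, a simple insertion sort sketch of my own) apply to any list of comparable values, with no point-by-point sampling of inputs and outputs:

```python
# A few lines of abstract rules that sort any list whatsoever.
def insertion_sort(items):
    result = []
    for item in items:
        i = 0
        while i < len(result) and result[i] < item:
            i += 1
        result.insert(i, item)  # place the item at its sorted position
    return result

print(insertion_sort([3, 1, 2]))           # [1, 2, 3]
print(insertion_sort([9.5, -4, 0, 9.5]))   # works for any comparable values, any length
```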
So deep learning is really like point-by-point geometric morphings, trained with gradient descent.
And meanwhile, abstract rules can generalize much better.
And I think the future is really to combine the two.
How do we combine good point-by-point functions with programs, which is what the symbolic AI approach offers?
I mean, obviously we're jumping into a realm where there are no good answers, just kind of ideas and intuitions and so on.
Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning.
For instance, successful robotics systems are already mostly model-based and rule-based.
At the same time, they're using deep learning as perception modules.
Sometimes they're using deep learning as a way to inject a fuzzy intuition into a rule-based process.
If you look at a system like a self-driving car, it's not just one big end-to-end neural network; that wouldn't work at all, precisely because in order to train that, you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic.
Instead, the self-driving car is mostly symbolic.
It's mostly based on explicit models, in this case mostly 3D models of the environment around the car, but it's interfacing with the real world using deep learning modules.
So the deep learning there serves as a way to convert the raw sensory information into something the symbolic system can operate on.
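As a rough sketch of the hybrid pattern being described (all names here are hypothetical, not from any real self-driving stack): a learned perception module turns raw sensory input into a symbolic description, and explicit hand-written rules operate on that description.

```python
# Hypothetical hybrid pipeline: neural perception feeding a rule-based planner.
from dataclasses import dataclass

@dataclass
class Obstacle:
    distance_m: float        # symbolic facts extracted from raw pixels
    lateral_offset_m: float

def perceive(camera_frame) -> list[Obstacle]:
    """Stand-in for a trained neural network: raw sensory input -> symbolic facts."""
    # A real system would run a detector here; this sketch returns a fixed example.
    return [Obstacle(distance_m=12.0, lateral_offset_m=0.2)]

def plan(obstacles: list[Obstacle]) -> str:
    """Explicit, hand-written rules operating on the symbolic description."""
    for obs in obstacles:
        if obs.distance_m < 15.0 and abs(obs.lateral_offset_m) < 1.0:
            return "brake"
    return "keep_lane"

print(plan(perceive(camera_frame=None)))  # -> "brake"
```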
Okay, well, let's linger on that a little more.
So dense sampling from input to output, you said it's obviously very difficult.
Self-driving, for many people... let's not even talk about self-driving, let's talk about lane following.
Lane following, yeah, it's definitely a problem you can solve with an end-to-end deep learning model.
I don't know why you're jumping from the extreme so easily, because I disagree with you on that.
I think, well, it's not obvious to me that you can solve lane following.
I think in general, there are no hard limitations to what you can learn with a deep neural network, as long as the search space is rich enough, flexible enough, and as long as you have this dense sampling of the input-cross-output space.
The problem is that this dense sampling could mean anything from 10,000 examples to trillions of examples.
And if you could just give it a chance and think about what kinds of problems can be solved by getting a huge amount of data and thereby creating a dense mapping.
So let's think about natural language dialogue, the Turing test.
Do you think the Turing test can be solved with a neural network alone?
Well, the Turing test is all about tricking people into believing they're talking to a human.
And I don't think that's actually very difficult, because it's more about exploiting human perception and not so much about intelligence.
There's a big difference between mimicking intelligent behavior and actual intelligent behavior.
So, okay, let's look at maybe the Alexa Prize and so on, the different formulations of natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes.
That's a little less about mimicking and more about, I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that can happen in a dialogue.
Do you think that problem is learnable with this kind of neural network that does a point-by-point mapping?
So I think it would be very, very challenging to do this with deep learning.
I don't think it's out of the question either.
The space of problems that can be solved with a large neural network: what's your sense about the space of those problems?
In practice, well, deep learning is a great fit for perception problems.
In general, any problem which is not really amenable to explicit handcrafted rules, or rules that you can generate by exhaustive search over some program space.
So perception, artificial intuition, as long as you have a sufficient training dataset.
I mean, perception, there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems.
So do you think larger networks will be able to start to understand the physics of the scene, the three-dimensional structure and relationships of objects in the scene?
Or is that really where symbolic AI has to step in?
Well, it's always possible to solve these problems with deep learning.
It's just that an explicit, rule-based, abstract model would be a far better, more compressed representation than learning just this mapping of: in this situation, this thing happens; if you change the situation slightly, then this other thing happens; and so on.
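A tiny illustration of that compression argument (my own example, not from the conversation): one explicit physical rule answers for any situation, whereas a memorized situation-to-outcome mapping only covers the cases it has stored.

```python
# Explicit rule versus a stored point-by-point mapping.
G = 9.81  # m/s^2

def fall_distance(t_seconds: float) -> float:
    """Explicit physical rule: d = 1/2 * g * t^2, valid for any t."""
    return 0.5 * G * t_seconds ** 2

# The "mapping" alternative: a table of observed (situation -> outcome) pairs.
observed = {1.0: 4.9, 2.0: 19.6, 3.0: 44.1}

print(fall_distance(2.5))   # the rule answers for an unseen situation
print(observed.get(2.5))    # the stored mapping has nothing to say: None
```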
Do you think it's possible to automatically generate the programs that would require that kind of reasoning?
The way expert systems failed is that so many facts about the world had to be hand-coded.
Do you think it's possible to learn those logical statements that are true about the world?
Do you think, I mean, that's kind of what theorem proving at a basic level is trying to do.
Yeah, except it's much harder to formulate statements about the world compared to formulating mathematical statements.
Statements about the world tend to be subjective.
However, today we just don't really know how to do it.
So it's very much a graph search or tree search problem.
And so we are limited to the sort of tree search and graph search algorithms that we have today.
But certainly I think genetic algorithms are very promising.
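As a toy sketch of what search over a program space can look like (all details here are my own assumptions, not a statement of how this should actually be done): a simple mutation-only evolutionary search over small arithmetic expression trees, scored by how well they fit a target specification.

```python
# Toy program synthesis as evolutionary search: evolve a small expression
# tree (programs built from +, *, x, and constants) to fit target data.
import random

XS = list(range(-5, 6))
TARGET = [x * x + 1 for x in XS]   # the "specification" to satisfy

OPS = ["+", "*"]
TERMINALS = ["x", 1, 2]

def random_expr(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return (random.choice(OPS), random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, a, b = expr
    va, vb = evaluate(a, x), evaluate(b, x)
    return va + vb if op == "+" else va * vb

def fitness(expr):
    # Lower is better: squared error against the specification.
    return sum((evaluate(expr, x) - t) ** 2 for x, t in zip(XS, TARGET))

def mutate(expr):
    # Crude mutation: sometimes replace the whole program with a fresh one.
    return random_expr() if random.random() < 0.5 else expr

random.seed(0)
population = [random_expr() for _ in range(200)]
for _ in range(50):
    population.sort(key=fitness)
    parents = population[:50]  # keep the fittest programs
    population = parents + [mutate(random.choice(parents)) for _ in range(150)]

best = min(population, key=fitness)
print(best, fitness(best))  # often finds ('+', ('*', 'x', 'x'), 1) or an equivalent
```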