
François Chollet: Limits of Deep Learning | AI Podcast Clips


Transcript

What do you think are the current limits of deep learning, if we look specifically at these function approximators that try to generalize from data? You've talked about local versus extreme generalization. You've mentioned that neural networks don't generalize well and humans do, so there's this gap. And you've also mentioned that extreme generalization requires something like reasoning to fill those gaps.

So how can we start trying to build systems like that? Right. Yeah. So this is by design, right? Deep learning models are like huge parametric models, differentiable, so continuous, that go from an input space to an output space. And they're trained with gradient descent. So they're trained pretty much point by point.

They're learning a continuous geometric morphing from an input vector space to an output vector space. Right. And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things that it has already seen in the training data.
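As a minimal sketch of that interpolation-only behavior (using scikit-learn purely as a stand-in for any gradient-descent-trained parametric model; the toy function and the library are illustrative assumptions, not anything from the conversation): a network fit on a dense sampling of sin(x) over one interval does fine near its training points and becomes unreliable far outside them.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Dense point-by-point sampling of sin(x) on [-pi, pi].
rng = np.random.default_rng(0)
X_train = rng.uniform(-np.pi, np.pi, size=(2000, 1))
y_train = np.sin(X_train).ravel()

# A small differentiable parametric model trained with gradient descent.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

# Near the training data: interpolation, usually accurate.
print(net.predict([[0.5]]), "vs", np.sin(0.5))
# Far from the training data: extrapolation, typically unreliable.
print(net.predict([[3 * np.pi]]), "vs", np.sin(3 * np.pi))
```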

At best, it can do interpolation across points. But that means in order to train your network, you need a dense sampling of the input cross output space, almost a point by point sampling, which can be very expensive if you're dealing with complex real world problems like autonomous driving, for instance, or robotics.

It's doable if you're looking at a subset of the visual space, but even then it's still fairly expensive. You still need millions of examples. And it's only going to be able to make sense of things that are very close to what it has seen before. And in contrast to that, well, of course you have human intelligence, but even if you're not looking at human intelligence, you can look at very simple rules, algorithms.

If you have a symbolic rule, it can actually apply to a very, very large set of inputs because it is abstract. It is not obtained by doing a point by point mapping. Right. So for instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning point by point what the sorted representation of this specific list is like.

But instead you could have a very, very simple sorting algorithm written in a few lines. Maybe it's just, you know, two nested loops. And it can process any list at all because it is abstract, because it is a set of rules. So deep learning is really point by point geometric morphings, trained with gradient descent.
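The "two nested loops" could be, for instance, a selection sort; a minimal sketch of the contrast (hypothetical, not code from the conversation): a few abstract rules with no notion of training data, applicable to any list whatsoever.

```python
def sort_list(xs):
    """Selection sort: two nested loops, a handful of abstract rules."""
    xs = list(xs)  # copy, so the input list is left untouched
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[j] < xs[i]:
                xs[i], xs[j] = xs[j], xs[i]  # swap out-of-order pair
    return xs

# Works on any comparable values, including ones never "seen" before.
print(sort_list([3, 1, 2]))          # [1, 2, 3]
print(sort_list([9.5, -4.0, 1e12]))  # [-4.0, 9.5, 1000000000000.0]
```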

And meanwhile, abstract rules can generalize much better. And I think the future is really to combine the two. So how do we, do you think, combine the two? How do we combine good point by point functions with programs, which is what symbolic AI-type systems are? At which level does the combination happen?

I mean, obviously we're jumping into the realm of where there's no good answers. There are just kind of ideas and intuitions and so on. Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning. For instance, successful robotics systems are already mostly model-based, rule-based, things like planning algorithms and so on.

At the same time, they're using deep learning as perception modules. Sometimes they're using deep learning as a way to inject a fuzzy intuition into a rule-based process. If you look at a system like a self-driving car, it's not just one big end-to-end neural network, that wouldn't work at all, precisely because in order to train that, you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic, obviously.

Instead, the self-driving car is mostly symbolic. It's software, it's programmed by hand. It's mostly based on explicit models, in this case, mostly 3D models of the environment around the car, but it's interfacing with the real world using deep learning modules. So the deep learning there serves as a way to convert the raw sensory information to something usable by symbolic systems.
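Schematically, that hybrid pattern might look like the following sketch (entirely hypothetical names and thresholds; a real autonomous driving stack is vastly more involved): a learned perception module converts raw sensory input into explicit symbols, and handwritten rules plan on top of them.

```python
from dataclasses import dataclass

@dataclass
class Obstacle:
    distance_m: float     # explicit, human-meaningful symbols
    lane_offset_m: float

def perceive(camera_frame) -> list[Obstacle]:
    """Stand-in for a trained neural network: raw pixels -> symbols."""
    # A real system would run a detection model on the frame here.
    return [Obstacle(distance_m=12.0, lane_offset_m=0.1)]

def plan(obstacles: list[Obstacle]) -> str:
    """Handwritten, explicit rules operating on the symbolic output."""
    if any(o.distance_m < 15.0 and abs(o.lane_offset_m) < 0.5
           for o in obstacles):
        return "brake"
    return "keep_lane"

print(plan(perceive(camera_frame=None)))  # -> brake
```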

Okay, well, let's linger on that a little more. So dense sampling from input to output, you said it's obviously very difficult. Is it possible? In the case of self-driving, you mean? Let's say self-driving, right? Self-driving for many people, let's not even talk about self-driving, let's talk about steering, so staying inside the lane.

Lane following, yeah, it's definitely a problem you can solve with an end-to-end deep learning model, but that's like one small subset. Hold on a second. I don't know why you're jumping from the extreme so easily, because I disagree with you on that. I think, well, it's not obvious to me that you can solve lane following.

No, it's not obvious. I think it's doable. I think in general, there are no hard limitations to what you can learn with a deep neural network as long as the search space is rich enough, is flexible enough. And as long as you have this dense sampling of the input cross output space.

The problem is that this dense sampling could mean anything from 10,000 examples to trillions and trillions. So that's my question: what's your intuition? If you could just give it a chance and think about it: what kind of problems can be solved by getting a huge amount of data and thereby creating a dense mapping?

So let's think about natural language dialogue, the Turing test. Do you think the Turing test can be solved with a neural network alone? Well, the Turing test is all about tricking people into believing they're talking to a human. And I don't think that's actually very difficult because it's more about exploiting human perception and not so much about intelligence.

There's a big difference between mimicking intelligent behavior and actual intelligent behavior. So, okay, let's look at maybe the Alexa Prize and so on, the different formulations of natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes. That's a little less about mimicking and that's more about, I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that happen in dialogue and so on.

Do you think that problem is learnable with this kind of neural network that does the point-to-point mapping? So I think it would be very, very challenging to do this with deep learning. I don't think it's out of the question either. I wouldn't rule it out.

What's your sense of the space of problems that can be solved with a large neural network? So, useful problems for us? In theory, it's infinite, right? You can solve any problem. In practice, well, deep learning is a great fit for perception problems, and in general, any problem which is not really amenable to explicit handcrafted rules, or rules that you can generate by exhaustive search over some program space.

So perception, artificial intuition, as long as you have a sufficient training dataset. And that's the question. I mean, in perception, there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems. So do you think larger networks will be able to start to understand the physics of the scene, the three-dimensional structure and relationships of objects in the scene and so on?

Or really, is that where symbolic AI has to step in? Well, it's always possible to solve these problems with deep learning. It's just extremely inefficient. An explicit rule-based abstract model would be a far better, more compressed representation of physics than learning just this mapping of "in this situation, this thing happens; if you change the situation slightly, then this other thing happens," and so on.
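A toy way to see the compression argument (a hypothetical example, not one raised in the conversation): one line of explicit physics covers every situation, while a point-by-point mapping is just a table of observed cases.

```python
# One abstract rule: height of a dropped object after t seconds.
def drop_height(h0: float, t: float, g: float = 9.81) -> float:
    return h0 - 0.5 * g * t ** 2  # applies to any h0 and any t

# A point-by-point "model" of the same physics is a table of samples;
# it says nothing about situations between or beyond its entries.
samples = {(100.0, 1.0): 95.095, (100.0, 2.0): 80.38}

print(drop_height(55.0, 3.0))  # the rule answers a never-sampled situation
```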

If you change the situation slightly, then this other thing happens and so on. Do you think it's possible to automatically generate the programs that would require that kind of reasoning? Or does it have to? The way the expert systems failed is so many facts about the world had to be hand-coded in.

Do you think it's possible to learn those logical statements that are true about the world and their relationships? Do you think, I mean, that's kind of what theorem proving at a basic level is trying to do, right? Yeah, except it's much harder to formulate statements about the world compared to formulating mathematical statements.

Statements about the world tend to be subjective. So can you learn rule-based models? Yes, definitely. That's the field of program synthesis. However, today we just don't really know how to do it. It's very much a graph search or tree search problem. And so we are limited to the sort of tree search and graph search algorithms that we have today.
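A minimal picture of what "program synthesis as tree search" can mean (a toy enumerator over a tiny expression grammar; nothing like a production synthesizer): enumerate expression trees by depth and return the first one consistent with all the input/output examples.

```python
import itertools

def candidates(depth):
    """Enumerate expression trees over a tiny grammar: x, 1, 2, +, *."""
    if depth == 0:
        yield from ["x", "1", "2"]
        return
    for op in ["+", "*"]:
        for a, b in itertools.product(list(candidates(depth - 1)), repeat=2):
            yield f"({a} {op} {b})"
    yield from candidates(depth - 1)

def synthesize(examples, max_depth=2):
    """Tree search: return the first program matching every example."""
    for prog in candidates(max_depth):
        if all(eval(prog, {"x": x}) == y for x, y in examples):
            return prog
    return None

# Find a program consistent with f(x) = 2x + 1 on three examples.
print(synthesize([(0, 1), (1, 3), (5, 11)]))  # first expression equal to 2x+1
```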

But certainly I think genetic algorithms are very promising. So almost like genetic programming. Genetic programming, exactly.
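In miniature, the genetic approach might look like this sketch (hypothetical and heavily simplified; real genetic programming evolves program trees, whereas this toy evolves just two coefficients): mutate and select candidate programs of the form f(x) = a*x + b against input/output examples.

```python
import random

EXAMPLES = [(0, 1), (1, 3), (5, 11)]  # target behavior: f(x) = 2x + 1

def fitness(prog):
    a, b = prog
    return -sum(abs((a * x + b) - y) for x, y in EXAMPLES)  # 0 is perfect

def mutate(prog):
    a, b = prog
    return (a + random.choice([-1, 0, 1]), b + random.choice([-1, 0, 1]))

random.seed(0)
population = [(random.randint(-5, 5), random.randint(-5, 5)) for _ in range(20)]
for _ in range(50):
    # Selection: keep the fittest half. Variation: mutate the survivors.
    population.sort(key=fitness, reverse=True)
    population = population[:10] + [mutate(p) for p in population[:10]]

print(max(population, key=fitness))  # typically converges to (2, 1)
```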