Gary Marcus: Limits of Deep Learning | AI Podcast Clips
You've highlighted this in your new book as well, but a couple of years ago you wrote a paper titled Deep Learning: A Critical Appraisal that lists ten challenges faced by current deep learning systems. So let me summarize them as data efficiency, transfer learning, hierarchical knowledge, open-ended inference, explainability, integrating prior knowledge, causal reasoning, modeling of a stable world, robustness, adversarial examples, and so on. And then my favorite probably is reliability and engineering of real-world systems. People can read the paper, and they should definitely read the paper. But which of these challenges, if solved, in your view has the biggest impact on the AI community?
- And I'm going to be evasive, because I think that they go together a lot. Some of them might be solved independently of others, but I think a good solution to AI starts by having real, what I would call cognitive, models of the world. Right now we have an approach that's dominant where you take statistical approximations of things, but you don't really understand them. So you know that bottles are correlated in your data with bottle caps, but you don't understand that there's a thread on the bottle cap that fits with the thread on the bottle, and that it tightens, and that if I tighten it enough there's a seal and the water won't come out. Having a good cognitive model of that kind of everyday phenomenon is what we call common sense. And if you had that, then a lot of these other things would start to fall into at least a little bit better place.

Right now you're learning correlations between pixels when you play a video game or something like that. It works when the video game is just the way that you studied it, and then you alter the video game in small ways, like you move the paddle in Breakout a few pixels, and the system fails, because it doesn't understand, it doesn't have a representation of a paddle, a ball, a wall, a set of bricks, and so on.
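As a concrete, invented illustration of the contrast Marcus is drawing, here is a minimal Python sketch: a bare co-occurrence statistic versus an explicit causal model of cap, thread, and seal. All of the predicates and numbers below are made up for illustration, not anything from the conversation.

```python
# Statistical association: bottles and bottle caps co-occur in the data.
# This number is invented; it supports no inference about how caps work.
cooccurrence = {("bottle", "bottle cap"): 0.92}

# Cognitive model: an explicit causal chain about threads and seals.
def water_comes_out(cap_on: bool, tightened: bool, tipped_over: bool) -> bool:
    """Water escapes only if the bottle is tipped and there is no seal."""
    sealed = cap_on and tightened          # thread engages, seal forms
    return tipped_over and not sealed

# The model supports counterfactual inference the correlation cannot:
print(water_comes_out(cap_on=True, tightened=True, tipped_over=True))   # False
print(water_comes_out(cap_on=True, tightened=False, tipped_over=True))  # True
```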
- So the idea of common sense, it's full of mystery. You've worked on it, but it's nevertheless full of mystery, full of promise. The way you've been discussing it now is very intuitive: it makes a lot of sense that that is something we should have, and that it's something deep learning systems lack. But the argument could be that we're oversimplifying it, we're oversimplifying the notion of common sense, because that's how it feels like we as humans operate at the cognitive level.
- So a lot of people aren't actually gonna read my book. But if they did read the book, one of the things that might come as a surprise to them is that we actually say common sense is really hard and really complicated. My critics know that I like common sense, but that chapter actually starts by us beating up, not on deep learning, but kind of on our own home team, as it were. Ernie and I are first and foremost people that believe in at least some of what good old-fashioned AI tried to do. So we believe in symbols and logic and programming; things like that are important. And we go through why even those tools that we hold fairly dear aren't really enough. So we talk about why common sense is actually many things, and some of them fit really well with those classical sets of tools.
So I know that a bottle is an object, or it's a vessel, let's say, and I know a vessel is an object and objects are material things in the physical world. So I can make some inferences: if I know that vessels need to not have holes in them in order to carry their contents, then I can infer that a bottle shouldn't have holes in it. So you can do hierarchical inference and so forth.
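The hierarchical inference Marcus describes can be made concrete with a small sketch. The taxonomy, property names, and helper functions below are invented for illustration; this is just the classical is-a/inheritance pattern, not code from the book.

```python
# "is-a" taxonomy: bottle -> vessel -> object -> material-thing
ISA = {
    "bottle": "vessel",
    "vessel": "object",
    "object": "material-thing",
}

# Properties attached at some level of the hierarchy (illustrative names).
PROPERTIES = {
    "vessel": {"must-not-have-holes"},       # vessels must hold their contents
    "material-thing": {"occupies-space"},
}

def ancestors(kind):
    """Walk up the is-a chain, yielding the kind and every ancestor."""
    while kind is not None:
        yield kind
        kind = ISA.get(kind)

def inherited_properties(kind):
    """A kind inherits every property of every ancestor."""
    props = set()
    for k in ancestors(kind):
        props |= PROPERTIES.get(k, set())
    return props

# Hierarchical inference: a bottle is a vessel, so it must not have holes.
assert "must-not-have-holes" in inherited_properties("bottle")
print(inherited_properties("bottle"))
```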
And we say that's great, but it's only a tiny piece of what you need for common sense, and we give lots of examples that don't fit into that. Another one that we talk about is a cheese grater. You've got holes in a cheese grater, you've got a handle on top. You can build a model, in the game-engine sense of a model, so that you could have a little cartoon character flying around through the holes of the grater. But we don't have a system yet, taxonomy doesn't help us that much, that really understands why the handle is on top, and what you do with the handle, or why all of those circles are sharp, or how you'd hold the cheese with respect to the grater in order to make it actually work.
- So those ideas are just abstractions that could emerge on a system like a very large deep neural network?
- I'm a skeptic that that kind of emergence per se can work. I think that deep learning might play a role in the systems that do what I want systems to do, but I've never seen a deep learning system really extract an abstract concept. There are principled reasons for that, stemming from how backpropagation works and how the architectures are set up.

One example is that deep learning people actually all build in something called convolution, which Yann LeCun is famous for, and which is an abstraction: the abstraction is that an object looks the same if it appears in different places. And what LeCun figured out, and essentially why he was a co-winner of the Turing Award, was that if you program this in innately, then your system will be a whole lot more efficient. In principle this should be learnable, but people don't have systems that kind of reify things like that. And so what you'd really wind up with, if you don't program that in advance, is a system that kind of realizes that this is the same thing as this, but then I take your little clock there and I move it over, and it doesn't realize that the same thing applies to the clock.
So you're right that convolution is just one of those things; it's an innate feature that's programmed in by the human expert.
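A minimal sketch of the abstraction being discussed, translation invariance: a convolutional detector shares its weights across positions, so a pattern found in one place is found in another, while a "dense" detector with weights tied to specific positions misses the shifted copy. The toy 1-D signals and kernel are arbitrary choices for illustration.

```python
import numpy as np

kernel = np.array([1.0, -1.0, 1.0])          # a tiny pattern detector
pattern = np.array([1.0, -1.0, 1.0])

def conv1d(x, k):
    """Slide the same weights across every position (weight sharing)."""
    n = len(k)
    return np.array([x[i:i + n] @ k for i in range(len(x) - n + 1)])

x_left = np.zeros(10);  x_left[1:4]  = pattern   # pattern near the left
x_right = np.zeros(10); x_right[6:9] = pattern   # same pattern, shifted

# Convolution: the peak response simply moves with the pattern.
print(conv1d(x_left, kernel).argmax())   # 1
print(conv1d(x_right, kernel).argmax())  # 6

# A dense detector with weights only at positions 1..3, as if it had
# learned the pattern in one place, does not respond to the shifted copy.
dense_w = np.zeros(10); dense_w[1:4] = kernel
print(dense_w @ x_left)    # strong response (3.0)
print(dense_w @ x_right)   # no response (0.0)
```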
- Yes, but the nice feature is, it feels like coming up with that one brilliant idea can get you a Turing Award, yet it still requires less effort than encoding all of that knowledge by hand, something we'll talk about. So it feels like there's a huge amount of limitations, which you clearly outline, with deep learning, but the nice feature of deep learning, whatever it is able to accomplish, is that it does a lot of stuff automatically, without human intervention.
- Well, and that's part of why people love it, right? But I always think of this quote from Bertrand Russell, which is that it has all the advantages of theft over honest toil. It's really hard to program into a machine a notion of causality, or even how a bottle works. Ernie Davis and I wrote a, I don't know, 45-page academic paper trying just to understand what a container is, which I don't think anybody ever read, but it's a very detailed analysis of all the things, well, not even all, some of the things you need to do in order to understand containers. It would be a whole lot nicer not to have to do that by hand. And I'm a co-author on the paper, I made it a little bit better, but Ernie did the hard work for that particular paper, and it took him like three months to get the logical statements correct. And maybe that's not the right way to do it. It's a way to do it, but on that way of doing it, it's really hard work to do something as simple as understanding containers, and nobody wants to do that hard work.

Everybody would rather just feed their system a bunch of videos with a bunch of containers and have the system infer how containers work. It would be so much less effort: let the machine do the work. And so I understand the impulse, I understand why people want to do that. But I've never seen anybody build a system that, in a robust way, can actually watch videos and predict exactly which containers would leak and which ones wouldn't, or something like that. And I know someone's going to go out and do that since I said it, and I look forward to seeing it. But getting these things to work robustly is really, really hard.
So Yann LeCun, who was my colleague at NYU for many years, thinks that the hard work should go into defining an unsupervised learning algorithm that will watch videos and use the next frame, basically, in order to tell it what's going on. He thinks that's the royal road, and he's willing to put in the work in devising that algorithm. My intuition, based on years of watching this stuff and making predictions twenty years ago that still hold even though there's a lot more computation and so forth, is that we actually have to do a different kind of hard work, which is more like building a design specification for what we want the system to do, doing hard engineering work to figure out how we do things like what Yann did for convolution, in order to figure out how to encode complex knowledge into the systems. The current systems don't have that much knowledge other than convolution, which is, again, this object being in different places and having the same perception, I guess I'll say, the same appearance.
People don't see how to naturally fit the one with the other.
- But also, on the expert-system side, there's a temptation to go too far the other way: just having an expert sort of sit down and encode the description, the framework, for what a container is, and then having the system reason about the rest. From my view, one really exciting possibility is that of active learning, where it's continuous learning: there's kind of deep-learning-type extraction of information from data, patterns and so on, but humans are also guiding the learning procedure, guiding both the process and the framework of how the machine learns, whatever the task is.
- I was with you with almost everything you said, except the phrase "deep learning." What I think you really want there is a new form of machine learning. So let's remember, deep learning is a particular way of doing machine learning. Most often it's done with supervised data for perceptual categories. There are other things you can do with deep learning, some of them quite technical, but the standard use of deep learning is: I have a lot of examples and I have labels for them. You just get millions of examples, millions of labels. It's better than any other solution that anybody has devised, but it is not good at representing abstract knowledge. It's not good at representing things like "bottles contain liquid and have tops to them," and so forth. It's not very good at learning or representing that kind of knowledge. It is an example of having a machine learn something, but it's a machine that learns a particular kind of thing, which is object classification. It's not a particularly good algorithm for learning about the abstractions that govern our world. Part of what we counsel in the book is that maybe people should be working on devising such algorithms.
- So one possibility, just, I wonder what you think about it, is that deep neural networks do form abstractions, but they're not accessible to us humans, in terms of we can't... So is it possible that either current or future neural networks form very high-level abstractions which are as powerful as our human abstractions of common sense, and so the problem is essentially that we need to make them explainable?
- This is an astute question, but I think the answer is at least partly no. One of the kinds of classical neural network architectures is what we call an auto-associator: it just tries to take an input, goes through a set of hidden layers, and comes out with an output, and it's supposed to learn, essentially, the identity function, that your input is the same as your output. So think of this with binary numbers: you've got, like, the one, the two, the four, the eight, the sixteen, and so forth. And so if you want to input 24, you turn on the 16, you turn on the 8; it's like binary one, one, and a bunch of zeros.

So I did some experiments in 1998 with the precursors of contemporary deep learning, and what I showed was that you could train these networks on all the even numbers and they would never generalize to the odd numbers. A lot of people thought that I was, I don't know, an idiot, or faking the experiment, or something. But it is true that, with the class of networks that we had in that day, they would never generalize to the odd numbers. And it's not that the networks were stupid; it's that they see the world in a different way than we do. They were basically concerned with: what is the probability that the rightmost output node is going to be a one? And as far as they were concerned, in everything they'd ever been trained on, it was a zero. So they figured, well, why would I turn it on now? Whereas a person would look at the same problem and say, well, it's obvious, we're just doing the same thing; the Latin for it is mutatis mutandis, change what needs to be changed.
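A rough sketch of an experiment in the spirit of the one described, not Marcus's actual 1998 code: an auto-associator trained by plain backprop to reproduce binary codes of even numbers only, then probed with an odd one. Network size, learning rate, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
BITS = 5

def to_bits(n):
    """5-bit binary code, most significant bit first (rightmost = the ones)."""
    return np.array([(n >> i) & 1 for i in range(BITS - 1, -1, -1)], float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training set: even numbers only, so the rightmost bit is always 0.
X = np.stack([to_bits(n) for n in range(0, 32, 2)])

W1 = rng.normal(0, 0.5, (BITS, 8))   # input -> hidden
W2 = rng.normal(0, 0.5, (8, BITS))   # hidden -> output

for _ in range(20000):               # plain backprop on the identity task
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    dY = (Y - X) * Y * (1 - Y)       # squared-error gradient at the output
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dY / len(X)
    W1 -= 0.5 * X.T @ dH / len(X)

# On even numbers, reconstruction is close to perfect.
print(np.round(sigmoid(sigmoid(X @ W1) @ W2)))

# On an odd number: the rightmost output node was 0 for every training
# example, so the network tends to keep it near 0 instead of copying the 1.
test = to_bits(7)
out = sigmoid(sigmoid(test @ W1) @ W2)
print(test, np.round(out, 2))
```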
So I can do f of x equals x plus two, and I can do it for a couple of values: I can tell you if x is three, then f of x is five, and if x is four, f of x is six. And now I can do it with some totally different number, like a million, and you can say, well, obviously it's a million and two, because you have an algebraic operation that you're applying to a variable. And deep learning systems kind of emulate that, but they don't actually do it. For that particular example you can fudge a solution, but the general form of the problem remains: what they learn is really correlations between particular input nodes and particular output nodes. And they're complex correlations, with multiple nodes involved and so forth, but they're not structured over these operations over variables.
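To make the extrapolation point concrete, here is a minimal sketch with arbitrary sizes and rates: a small tanh network fit on f(x) = x + 2 for a few values interpolates on its training range but does not extrapolate to a distant input, while the algebraic rule trivially does.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.array([[3.0], [4.0], [5.0], [6.0]])
ys = xs + 2.0                               # the rule we hope it learns

W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

for _ in range(50000):                      # plain gradient descent
    H = np.tanh(xs @ W1 + b1)
    pred = H @ W2 + b2
    d = (pred - ys) / len(xs)               # squared-error gradient
    W2 -= 0.01 * H.T @ d;   b2 -= 0.01 * d.sum(0)
    dH = (d @ W2.T) * (1 - H ** 2)
    W1 -= 0.01 * xs.T @ dH; b1 -= 0.01 * dH.sum(0)

def rule(x):
    """The operation over a variable: works for any x."""
    return x + 2

x_new = np.array([[1_000_000.0]])
print((np.tanh(xs @ W1 + b1) @ W2 + b2).ravel())     # close to [5, 6, 7, 8]
print((np.tanh(x_new @ W1 + b1) @ W2 + b2).ravel())  # saturated: nowhere near 1000002
print(rule(1_000_000))                               # 1000002
```

The failure is structural: tanh units saturate far outside the training range, so the network's output is bounded no matter how large the input, whereas the symbolic rule applies unchanged.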
Now, someday people may do a new form of deep learning that incorporates that stuff, and there's some tentative work on things like differentiable programming right now. But the sort of classic stuff, like people use for ImageNet, doesn't have it. And you have people like Hinton going around saying symbol manipulation, like what Marcus advocates, is like the gasoline engine, it's obsolete; we should just use this cool electric power that we've got with deep learning. And that's really destructive, because we really do need to have the gasoline engine here, so to speak. I mean, I don't think it's a good analogy, but we really do need to have the stuff that symbol manipulation gives us.
- Yeah, and Hinton as well would say that we need to throw out everything and start over.
- And I had a friend who interviewed him and tried to pin him down on what exactly we need to throw out, and he didn't know. If he knew that, he'd throw it out himself. But, I mean, you can't have it both ways: you can't be like, "I don't know what to throw out, but I am gonna throw out the symbols." And not just the symbols, but the variables and the operations over variables. Don't forget the operations over variables, the stuff that I'm endorsing, and which John McCarthy did when he founded AI: that stuff is the stuff we build most computer programs out of. There are people now who say we don't need computer programmers anymore, not quite looking at the statistics of how much computer programmers actually get paid these days. Most of them do a little bit of machine learning, but they still write a lot of code: code where it's like, if the value of x is greater than the value of y, then do this kind of thing, conditionals and comparisons, operations over variables.

There's this fantasy that you can machine-learn anything, but there are some things you would never wanna machine-learn. I would not use a phone operating system that was machine-learned, like, you made a bunch of phone calls and you recorded which packets were transmitted and trained a system on that. Or a web browser built by taking logs of keystrokes and images, screenshots, and then trying to learn a browser from them. Nobody would ever, no rational person would ever, try to build a browser that way. They would use symbol manipulation, the stuff that I think AI needs to avail itself of, in addition to deep learning.