Gary Marcus: Limits of Deep Learning | AI Podcast Clips


Transcript

You've highlighted these in your new book as well, but a couple of years ago you wrote a paper titled Deep Learning: A Critical Appraisal that lists 10 challenges faced by current deep learning systems. So let me summarize them as data efficiency, transfer learning, hierarchical knowledge, open-ended inference, explainability, integrating prior knowledge, causal reasoning, modeling a stable world, robustness, adversarial examples, and so on.

And then my favorite probably is reliability and engineering of real-world systems. So, whatever, people can read the paper; they should definitely read the paper, and definitely read your book. But which of these challenges, if solved, in your view would have the biggest impact on the AI community? - It's a very good question.

And I'm going to be evasive because I think that they go together a lot. So some of them might be solved independently of others. But I think a good solution to AI starts by having real, what I would call cognitive models of what's going on. So right now we have an approach that's dominant where you take statistical approximations of things, but you don't really understand them.

So you know that bottles are correlated in your data with bottle caps, but you don't understand that there's a thread on the bottle cap that fits with the thread on the bottle, and that that tightens, and if I tighten it enough there's a seal and the water won't come out.

There's no machine that understands that. And having a good cognitive model of that kind of everyday phenomena is what we call common sense. And if you had that, then a lot of these other things start to fall into at least a little bit better place. Because right now you're learning correlations between pixels when you play a video game or something like that.

And it doesn't work very well. It works when the video game is just the way that you studied it, and then you alter the video game in small ways, like you move the paddle in Breakout a few pixels, and the system falls apart, because it doesn't understand; it doesn't have a representation of a paddle, a ball, a wall, a set of bricks, and so forth.

And so it's reasoning at the wrong level. - So the idea of common sense, it's full of mystery. You've worked on it, but it's nevertheless full of mystery, full of promise. What does common sense mean? What does knowledge mean? So the way you've been discussing it now is very intuitive.

It makes a lot of sense that that is something we should have, and that's something deep learning systems don't have. But the argument could be that we're oversimplifying the notion of common sense, because that's how it feels like we as humans approach problems at the cognitive level.

- So a lot of people aren't actually gonna read my book, but if they did read the book, one of the things that might come as a surprise to them is that we actually say common sense is really hard and really complicated. My critics know that I like common sense, but that chapter actually starts by us beating up, not on deep learning, but kind of on our own home team, as it were.

So Ernie and I are first and foremost people that believe in at least some of what good old-fashioned AI tried to do. So we believe that symbols and logic and programming, things like that, are important. And we go through why even those tools that we hold fairly dear aren't really enough.

So we talk about why common sense is actually many things. And some of them fit really well with those classical sets of tools. So things like taxonomy. So I know that a bottle is an object or it's a vessel, let's say, and I know a vessel is an object and objects are material things in the physical world.

So I can make some inferences: if I know that vessels need to not have holes in them in order to carry their contents, then I can infer that a bottle shouldn't have a hole in it in order to carry its contents. So you can do hierarchical inference and so forth.
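A minimal sketch of that kind of taxonomic, hierarchical inference, written in Python with invented class and attribute names (only the bottle/vessel example comes from the conversation):

```python
# A toy taxonomy: a Bottle is a kind of Vessel, and a Vessel is a kind of
# PhysicalObject. A property stated once at the Vessel level is inherited
# by everything below it in the hierarchy.

class PhysicalObject:
    is_material_thing = True

class Vessel(PhysicalObject):
    # Vessels need to not have holes in them in order to carry their contents.
    needs_no_holes_to_carry_contents = True

class Bottle(Vessel):
    pass

# Hierarchical inference: facts about vessels (and objects) transfer to bottles.
assert Bottle.needs_no_holes_to_carry_contents
assert Bottle.is_material_thing
```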

And we say that's great, but it's only a tiny piece of what you need for common sense. We give lots of examples that don't fit into that. So another one that we talk about is a cheese grater. You've got holes in a cheese grater, you've got a handle on top.

You can build a model in the game engine sense of a model so that you could have a little cartoon character flying around through the holes of the grater. But we don't have a system yet, taxonomy doesn't help us that much, that really understands why the handle is on top and what you do with the handle or why all of those circles are sharp or how you'd hold the cheese with respect to the grater in order to make it actually work.

- So those ideas are just abstractions that could emerge on a system like a very large deep neural network? - I'm a skeptic that that kind of emergence per se can work. So I think that deep learning might play a role in the systems that do what I want systems to do, but it won't do it by itself.

I've never seen a deep learning system really extract an abstract concept. There are principled reasons for that, stemming from how back propagation works and how the architectures are set up. One example is that deep learning people actually all build in something called convolution, which Yann LeCun is famous for, which is an abstraction.

They don't have their systems learn this. So the abstraction is that an object looks the same if it appears in different places. And what LeCun figured out, and essentially why he was a co-winner of the Turing Award, was that if you program this in innately, then your system will be a whole lot more efficient.

In principle, this should be learnable, but people don't have systems that kind of reify things and make them more abstract. And so what you'd really wind up with if you don't program that in advance is a system that kind of realizes that this is the same thing as this, but then I take your little clock there and I move it over and it doesn't realize that the same thing applies to the clock.
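To make concrete what is built in rather than learned here, a rough sketch of convolution's translation equivariance, assuming PyTorch (the filter, image, and sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The innate abstraction in convolution is weight sharing: one small filter is
# applied at every location, so "the same thing in a different place" produces
# the same (shifted) response without being relearned.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1, bias=False)

image = torch.zeros(1, 1, 16, 16)
image[0, 0, 2:5, 2:5] = 1.0                               # a small square
shifted = torch.roll(image, shifts=(8, 8), dims=(2, 3))   # the same square, moved

# Shifting the input shifts the output: translation equivariance comes for free.
assert torch.allclose(torch.roll(conv(image), shifts=(8, 8), dims=(2, 3)),
                      conv(shifted), atol=1e-6)

# A fully connected layer has separate weights for every pixel, so it would
# have to relearn the square at each new location from data instead.
fully_connected = nn.Linear(16 * 16, 16 * 16, bias=False)
```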

- So the really nice thing, you're right, is that convolution is just one of those things; it's an innate feature that's programmed in by the human expert. - We need more of those, not less. - Yes, but the nice feature is, it feels like coming up with that brilliant idea can get you a Turing Award, yet it requires less effort than encoding, something we'll talk about, the expert system.

So encoding a lot of knowledge by hand. So it feels like there are a huge number of limitations, which you clearly outline, with deep learning, but the nice feature of deep learning is that, whatever it is able to accomplish, it does a lot of stuff automatically, without human intervention. - Well, and that's part of why people love it, right?

But I always think of this quote from Bertrand Russell, which is that it has all the advantages of theft over honest toil. It's really hard to program into a machine a notion of causality, or even how a bottle works, or what containers are. Ernie Davis and I wrote a, I don't know, 45-page academic paper trying just to understand what a container is. I don't think anybody ever read the paper, but it's a very detailed analysis of all the things, well, not even all, some of the things you need to do in order to understand a container.

It would be a whole lot nicer not to have to do that by hand. I'm a co-author on the paper, and I made it a little bit better, but Ernie did the hard work for that particular paper, and it took him like three months to get the logical statements correct. And maybe that's not the right way to do it.

It's a way to do it, but on that way of doing it, it's really hard work to do something as simple as understanding containers, and nobody wants to do that hard work. Even Ernie didn't want to do that hard work. Everybody would rather just feed their system in with a bunch of videos with a bunch of containers and have the systems infer how containers work.

It would be like so much less effort, let the machine do the work. And so I understand the impulse, I understand why people want to do that. I just don't think that it works. I've never seen anybody build a system that in a robust way can actually watch videos and predict exactly which containers would leak and which ones wouldn't or something like that.

And I know someone's going to go out and do that now that I've said it, and I look forward to seeing it. But getting these things to work robustly is really, really hard. So Yann LeCun, who was my colleague at NYU for many years, thinks that the hard work should go into defining an unsupervised learning algorithm that will watch videos and use the next frame, basically, in order to tell it what's going on.

And he thinks that's the royal road, and he's willing to put in the work of devising that algorithm; then he wants the machine to do the rest. And again, I understand the impulse. My intuition, based on years of watching this stuff, and on making predictions 20 years ago that still hold even though there's a lot more computation and so forth, is that we actually have to do a different kind of hard work. It's more like building a design specification for what we want the system to do, and doing the hard engineering work, like what Yann did for convolution, to figure out how to encode complex knowledge into the systems.

The current systems don't have that much knowledge other than convolution, which is, again, this object being in different places and having the same perception, I guess I'll say, the same appearance. People don't want to do that work. They don't see how to naturally fit one with the other. - I think that's, yes, absolutely.

But also, on the expert system side, there's a temptation to go too far the other way: just having an expert sort of sit down and encode the description, the framework, for what a container is, and then having the system reason out the rest. From my view, one really exciting possibility is that of active learning, where there's continuous interaction between a human and a machine.

So the machine does a kind of deep-learning-type extraction of information from data, patterns and so on, but humans are also guiding the learning procedure, guiding both the process and the framework of how the machine learns, whatever the task is. - I was with you with almost everything you said except the phrase deep learning.

What I think you really want there is a new form of machine learning. So let's remember, deep learning is a particular way of doing machine learning. Most often it's done with supervised data for perceptual categories. There are other things you can do with deep learning, some of them quite technical, but the standard use of deep learning is I have a lot of examples and I have labels for them.

So here are pictures. This one's the Eiffel Tower. This one's the Sears Tower. This one's the Empire State Building. This one's a cat. This one's a pig and so forth. You just get millions of examples, millions of labels. Deep learning is extremely good at that. It's better than any other solution that anybody has devised, but it is not good at representing abstract knowledge.

It's not good at representing things like bottles contain liquid and have tops to them and so forth. It's not very good at learning or representing that kind of knowledge. It is an example of having a machine learn something, but it's a machine that learns a particular kind of thing, which is object classification.
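A minimal sketch of that standard supervised setup, assuming PyTorch and using random tensors purely as stand-ins for the images and labels:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Supervised perceptual classification in miniature: pixels in, label out.
# Random noise stands in for "millions of examples, millions of labels";
# the labels might mean 0=Eiffel Tower, 1=cat, 2=pig, and so on.
images = torch.randn(512, 3, 32, 32)
labels = torch.randint(0, 5, (512,))

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 5),                     # scores for five categories
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(10):                       # a few passes over the data
    loss = nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Nothing in this setup represents what a tower or a cat is, or what bottles
# contain; it only maps pixel patterns to label indices.
```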

It's not a particularly good algorithm for learning about the abstractions that govern our world. There may be such a thing, and part of what we counsel in the book is that maybe people should be working on devising such things. - So one possibility, I just wonder what you think about it, is that deep neural networks do form abstractions, but they're not accessible to us humans, in terms of we can't-- - There's some truth in that.

- So is it possible that either current or future neural networks form very high level abstractions which are as powerful as our human abstractions of common sense? We just can't get a hold of them. And so the problem is essentially we need to make them explainable. - This is an astute question, but I think the answer is at least partly no.

One of the classical kinds of neural network architectures is what we call an auto-associator; it just tries to take an input, pass it through a set of hidden layers, and come out with an output. And it's supposed to learn essentially the identity function: that your input is the same as your output.

So think of this as binary numbers: you've got the one, the two, the four, the eight, the 16, and so forth. And so if you want to input 24, you turn on the 16 and you turn on the eight; it's like binary one, one, and a bunch of zeros.

So I did some experiments in 1998 with the precursors of contemporary deep learning. And what I showed was that you could train these networks on all the even numbers, and they would never generalize to the odd numbers. A lot of people thought that I was, I don't know, an idiot, or faking the experiment, or that it wasn't true, or whatever.

But it is true that, with the class of networks that we had in that day, they would never ever make this generalization. And it's not that the networks were stupid; it's that they see the world in a different way than we do. They were basically concerned with: what is the probability that the rightmost output node is going to be one?

And as far as they were concerned, in everything they'd ever been trained on, it was a zero. That node had never been turned on. And so they figured, well, why turn it on now? Whereas a person would look at the same problem and say, well, it's obvious, we're just doing the thing that corresponds.
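A rough modern reconstruction of the kind of experiment being described, assuming PyTorch (this is not the original 1998 setup; the architecture and training details are invented):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def to_bits(n, width=8):
    """Binary encoding of n as a float vector, most significant bit first."""
    return torch.tensor([float(b) for b in format(n, f"0{width}b")])

# Train an auto-associator (input -> hidden layer -> output, learning the
# identity function) on even numbers only, so the rightmost bit is 0 in
# every training example.
train = torch.stack([to_bits(n) for n in range(0, 256, 2)])

net = nn.Sequential(nn.Linear(8, 16), nn.Sigmoid(), nn.Linear(16, 8), nn.Sigmoid())
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
for _ in range(2000):
    loss = nn.functional.binary_cross_entropy(net(train), train)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Probe with an odd number. However well the other bits come out, the
# rightmost output unit, which was never turned on during training, stays
# near zero: the network does not apply "copy the input" as a rule over a
# variable the way a person would.
print(net(to_bits(23).unsqueeze(0)).round())
```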

The Latin for it is mutatis mutandis: change what needs to be changed. And we do this; this is what algebra is. So I can do x equals y plus two, and I can do it for a couple of values. I can tell you if y is three, then x is five, and if y is four, x is six.

And now I can do it with some totally different number, like a million. Then you can say, well, obviously it's a million and two, because you have an algebraic operation that you're applying to a variable. And deep learning systems kind of emulate that, but they don't actually do it.

For that particular example, you can fudge a solution to that particular problem. But the general form of that problem remains: what they learn is really correlations between different input and output nodes. And they're complex correlations with multiple nodes involved and so forth, but ultimately they're correlative. They're not structured as operations over variables.
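One toy way to see the contrast between an operation over a variable and a learned correlation, again assuming PyTorch, with an arbitrary little network:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The symbolic version: an operation over a variable, valid for any value.
def f(y):
    return y + 2

assert f(3) == 5 and f(4) == 6 and f(1_000_000) == 1_000_002

# A small network fit to the same mapping on y in [0, 10] interpolates nicely,
# but it has learned a correlation over that range, not the operation "+2",
# so it has nothing to carry far outside its training range.
ys = torch.linspace(0, 10, 200).unsqueeze(1)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
for _ in range(2000):
    loss = nn.functional.mse_loss(net(ys), ys + 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(net(torch.tensor([[5.0]])))          # close to 7
print(net(torch.tensor([[1_000_000.0]])))  # tanh units saturate: nowhere near 1,000,002
```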

Now, someday people may do a new form of deep learning that incorporates that stuff. And I think it will help a lot. And there's some tentative work on things like differentiable programming right now that fall into that category. But the sort of classic stuff like people use for ImageNet doesn't have it.

And you have people like Hinton going around saying symbol manipulation, what Marcus, what I advocate, is like the gasoline engine, it's obsolete, we should just use this cool electric power that we've got with deep learning. And that's really destructive, because we really do need to have the gasoline engine stuff that represents...

I mean, I don't think it's a good analogy, but we really do need to have the stuff that represents symbols. - Yeah, and Hinton as well would say that we do need to throw out everything and start over. So there's a... - Yeah, Hinton said that to Axios. And I had a friend who interviewed him and tried to pin him down on what exactly we need to throw out.

And he was very evasive. - Well, of course, 'cause if he knew that, he'd throw it out himself. - But I mean, you can't have it both ways. You can't be like, I don't know what to throw out, but I am gonna throw out the symbols. And not just the symbols, but the variables and the operations over the variables.

Don't forget the operations over the variables. The stuff that I'm endorsing, and which John McCarthy did when he founded AI, is the stuff that we build most computers out of. There are people now who say, we don't need computer programmers anymore, and they're not quite looking at the statistics of how much computer programmers actually get paid right now.

We need lots of computer programmers. And most of them do a little bit of machine learning, but they still write a lot of code, right? Code where it's like, if the value of x is greater than the value of y, then do this, that kind of thing: conditionals and comparison operations over variables.
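The kind of code being described, written out and deliberately trivial:

```python
# Ordinary symbol manipulation: an operation applied to variables that holds
# for whatever values x and y happen to take, with no training data involved.
def larger(x, y):
    if x > y:        # compare two variables
        return x
    return y

assert larger(7, 3) == 7
assert larger(-2.5, 10) == 10
```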

Like there's this fantasy you can machine learn anything. There's some things you would never wanna machine learn. I would not use a phone operating system that was machine learned. Like you made a bunch of phone calls and you recorded which packets were transmitted and you just machine learned it.

It'd be insane. Or to build a web browser by taking logs of keystrokes and images, screenshots, and then trying to learn the relation between them. Nobody would ever, no rational person would ever try to build a browser that way. They would use symbol manipulation, the stuff that I think AI needs to avail itself of in addition to deep learning.