Gary Marcus: Limits of Deep Learning | AI Podcast Clips
You've highlighted this in your new book as well, but a couple of years ago you wrote a paper titled Deep Learning: A Critical Appraisal that lists ten challenges faced by current deep learning systems. So let me summarize them as data efficiency, transfer learning, hierarchical knowledge, open-ended inference, explainability, integrating prior knowledge, causal reasoning, modeling of a stable world, robustness, adversarial examples, and so on. And then my favorite probably is reliability and engineering of real-world systems. People can read the paper, and they should definitely read the paper. But which of these challenges, if solved, in your view has the biggest impact on the AI community?
- And I'm going to be evasive, because I think that they go together a lot. Some of them might be solved independently of others, but I think a good solution to AI starts by having real, what I would call cognitive, models of the world. Right now we have an approach that's dominant where you take statistical approximations of things, but you don't really understand them. So you know that bottles are correlated in your data with bottle caps, but you don't understand that there's a thread on the bottle cap that fits with the thread on the bottle, and that it tightens, and that if I tighten it enough there's a seal and the water won't come out. Having a good cognitive model of that kind of everyday phenomenon is what we call common sense. And if you had that, then a lot of these other things would start to fall into at least a little bit better place.

Right now you're learning correlations between pixels when you play a video game or something like that. It works when the video game is just the way that you studied it, and then you alter the video game in small ways, like you move the paddle in Breakout a few pixels, and the system fails, because it doesn't understand, it doesn't have a representation of a paddle, a ball, a wall, a set of bricks, and so on.
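As a concrete, invented illustration of the contrast Marcus is drawing, here is a minimal Python sketch: a bare co-occurrence statistic versus an explicit causal model of cap, thread, and seal. All of the predicates and numbers below are made up for illustration, not anything from the conversation.

```python
# Statistical association: bottles and bottle caps co-occur in the data.
# This number is invented; it supports no inference about how caps work.
cooccurrence = {("bottle", "bottle cap"): 0.92}

# Cognitive model: an explicit causal chain about threads and seals.
def water_comes_out(cap_on: bool, tightened: bool, tipped_over: bool) -> bool:
    """Water escapes only if the bottle is tipped and there is no seal."""
    sealed = cap_on and tightened          # thread engages, seal forms
    return tipped_over and not sealed

# The model supports counterfactual inference the correlation cannot:
print(water_comes_out(cap_on=True, tightened=True, tipped_over=True))   # False
print(water_comes_out(cap_on=True, tightened=False, tipped_over=True))  # True
```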
- So the idea of common sense, it's full of mystery. You've worked on it, but it's nevertheless full of mystery, full of promise. The way you've been discussing it now is very intuitive: it makes a lot of sense that that is something we should have, and that it's something deep learning systems lack. But the argument could be that we're oversimplifying it, we're oversimplifying the notion of common sense, because that's how it feels like we as humans operate at the cognitive level.
- So a lot of people aren't actually gonna read my book. But if they did read the book, one of the things that might come as a surprise to them is that we actually say common sense is really hard and really complicated. My critics know that I like common sense, but that chapter actually starts by us beating up, not on deep learning, but kind of on our own home team, as it were. Ernie and I are first and foremost people that believe in at least some of what good old-fashioned AI tried to do. So we believe in symbols and logic and programming; things like that are important. And we go through why even those tools that we hold fairly dear aren't really enough. So we talk about why common sense is actually many things, and some of them fit really well with those classical sets of tools.
So I know that a bottle is an object, or it's a vessel, let's say, and I know a vessel is an object and objects are material things in the physical world. So I can make some inferences: if I know that vessels need to not have holes in them in order to carry their contents, then I can infer that a bottle shouldn't have holes in it. So you can do hierarchical inference and so forth.
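The hierarchical inference Marcus describes can be made concrete with a small sketch. The taxonomy, property names, and helper functions below are invented for illustration; this is just the classical is-a/inheritance pattern, not code from the book.

```python
# "is-a" taxonomy: bottle -> vessel -> object -> material-thing
ISA = {
    "bottle": "vessel",
    "vessel": "object",
    "object": "material-thing",
}

# Properties attached at some level of the hierarchy (illustrative names).
PROPERTIES = {
    "vessel": {"must-not-have-holes"},       # vessels must hold their contents
    "material-thing": {"occupies-space"},
}

def ancestors(kind):
    """Walk up the is-a chain, yielding the kind and every ancestor."""
    while kind is not None:
        yield kind
        kind = ISA.get(kind)

def inherited_properties(kind):
    """A kind inherits every property of every ancestor."""
    props = set()
    for k in ancestors(kind):
        props |= PROPERTIES.get(k, set())
    return props

# Hierarchical inference: a bottle is a vessel, so it must not have holes.
assert "must-not-have-holes" in inherited_properties("bottle")
print(inherited_properties("bottle"))
```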
And we say that's great, but it's only a tiny piece of what you need for common sense, and we give lots of examples that don't fit into that. Another one that we talk about is a cheese grater. You've got holes in a cheese grater, you've got a handle on top. You can build a model, in the game-engine sense of a model, so that you could have a little cartoon character flying around through the holes of the grater. But we don't have a system yet, taxonomy doesn't help us that much, that really understands why the handle is on top, and what you do with the handle, or why all of those circles are sharp, or how you'd hold the cheese with respect to the grater in order to make it actually work.
- So those ideas are just abstractions that could emerge on a system like a very large deep neural network?
- I'm a skeptic that that kind of emergence per se can work. I think that deep learning might play a role in the systems that do what I want systems to do, but I've never seen a deep learning system really extract an abstract concept. There are principled reasons for that, stemming from how backpropagation works and how the architectures are set up.

One example is that deep learning people actually all build in something called convolution, which Yann LeCun is famous for, and which is an abstraction: the abstraction is that an object looks the same if it appears in different places. And what LeCun figured out, and essentially why he was a co-winner of the Turing Award, was that if you program this in innately, then your system will be a whole lot more efficient. In principle this should be learnable, but people don't have systems that kind of reify things like that. And so what you'd really wind up with, if you don't program that in advance, is a system that kind of realizes that this is the same thing as this, but then I take your little clock there and I move it over, and it doesn't realize that the same thing applies to the clock.
So you're right that convolution is just one of those things; it's an innate feature that's programmed in by the human expert.
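A minimal sketch of the abstraction being discussed, translation invariance: a convolutional detector shares its weights across positions, so a pattern found in one place is found in another, while a "dense" detector with weights tied to specific positions misses the shifted copy. The toy 1-D signals and kernel are arbitrary choices for illustration.

```python
import numpy as np

kernel = np.array([1.0, -1.0, 1.0])          # a tiny pattern detector
pattern = np.array([1.0, -1.0, 1.0])

def conv1d(x, k):
    """Slide the same weights across every position (weight sharing)."""
    n = len(k)
    return np.array([x[i:i + n] @ k for i in range(len(x) - n + 1)])

x_left = np.zeros(10);  x_left[1:4]  = pattern   # pattern near the left
x_right = np.zeros(10); x_right[6:9] = pattern   # same pattern, shifted

# Convolution: the peak response simply moves with the pattern.
print(conv1d(x_left, kernel).argmax())   # 1
print(conv1d(x_right, kernel).argmax())  # 6

# A dense detector with weights only at positions 1..3, as if it had
# learned the pattern in one place, does not respond to the shifted copy.
dense_w = np.zeros(10); dense_w[1:4] = kernel
print(dense_w @ x_left)    # strong response (3.0)
print(dense_w @ x_right)   # no response (0.0)
```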
- Yes, but the nice feature is, it feels like coming up with that one brilliant idea can get you a Turing Award, yet it still requires less effort than encoding all of that knowledge by hand, something we'll talk about. So it feels like there's a huge amount of limitations, which you clearly outline, with deep learning, but the nice feature of deep learning, whatever it is able to accomplish, is that it does a lot of stuff automatically, without human intervention.
- Well, and that's part of why people love it, right? But I always think of this quote from Bertrand Russell, which is that it has all the advantages of theft over honest toil. It's really hard to program into a machine a notion of causality, or even how a bottle works. Ernie Davis and I wrote a, I don't know, 45-page academic paper trying just to understand what a container is, which I don't think anybody ever read, but it's a very detailed analysis of all the things, well, not even all, some of the things you need to do in order to understand containers. It would be a whole lot nicer not to have to do that by hand. And I'm a co-author on the paper, I made it a little bit better, but Ernie did the hard work for that particular paper, and it took him like three months to get the logical statements correct. And maybe that's not the right way to do it. It's a way to do it, but on that way of doing it, it's really hard work to do something as simple as understanding containers, and nobody wants to do that hard work.

Everybody would rather just feed their system a bunch of videos with a bunch of containers and have the system infer how containers work. It would be so much less effort: let the machine do the work. And so I understand the impulse, I understand why people want to do that. But I've never seen anybody build a system that, in a robust way, can actually watch videos and predict exactly which containers would leak and which ones wouldn't, or something like that. And I know someone's going to go out and do that since I said it, and I look forward to seeing it. But getting these things to work robustly is really, really hard.
So Yann LeCun, who was my colleague at NYU for many years, thinks that the hard work should go into defining an unsupervised learning algorithm that will watch videos and use the next frame, basically, in order to tell it what's going on. He thinks that's the royal road, and he's willing to put in the work in devising that algorithm. My intuition, based on years of watching this stuff and making predictions twenty years ago that still hold even though there's a lot more computation and so forth, is that we actually have to do a different kind of hard work, which is more like building a design specification for what we want the system to do, doing hard engineering work to figure out how we do things like what Yann did for convolution, in order to figure out how to encode complex knowledge into the systems. The current systems don't have that much knowledge other than convolution, which is, again, this object being in different places and having the same perception, I guess I'll say, the same appearance.
People don't see how to naturally fit the one with the other.
- But also, on the expert-system side, there's a temptation to go too far the other way: just having an expert sort of sit down and encode the description, the framework, for what a container is, and then having the system reason about the rest. From my view, one really exciting possibility is that of active learning, where it's continuous learning: there's kind of deep-learning-type extraction of information from data, patterns and so on, but humans are also guiding the learning procedure, guiding both the process and the framework of how the machine learns, whatever the task is.
- I was with you with almost everything you said, except the phrase "deep learning." What I think you really want there is a new form of machine learning. So let's remember, deep learning is a particular way of doing machine learning. Most often it's done with supervised data for perceptual categories. There are other things you can do with deep learning, some of them quite technical, but the standard use of deep learning is: I have a lot of examples and I have labels for them. You just get millions of examples, millions of labels. It's better than any other solution that anybody has devised, but it is not good at representing abstract knowledge. It's not good at representing things like "bottles contain liquid and have tops to them," and so forth. It's not very good at learning or representing that kind of knowledge. It is an example of having a machine learn something, but it's a machine that learns a particular kind of thing, which is object classification. It's not a particularly good algorithm for learning about the abstractions that govern our world. Part of what we counsel in the book is that maybe people should be working on devising such algorithms.
- So one possibility, just, I wonder what you think about it, is that deep neural networks do form abstractions, but they're not accessible to us humans, in terms of we can't... So is it possible that either current or future neural networks form very high-level abstractions which are as powerful as our human abstractions of common sense, and so the problem is essentially that we need to make them explainable?
- This is an astute question, but I think the answer is at least partly no. One of the kinds of classical neural network architectures is what we call an auto-associator: it just tries to take an input, goes through a set of hidden layers, and comes out with an output, and it's supposed to learn, essentially, the identity function, that your input is the same as your output. So think of this with binary numbers: you've got, like, the one, the two, the four, the eight, the sixteen, and so forth. And so if you want to input 24, you turn on the 16, you turn on the 8; it's like binary one, one, and a bunch of zeros.

So I did some experiments in 1998 with the precursors of contemporary deep learning, and what I showed was that you could train these networks on all the even numbers and they would never generalize to the odd numbers. A lot of people thought that I was, I don't know, an idiot, or faking the experiment, or something. But it is true that, with the class of networks that we had in that day, they would never generalize to the odd numbers. And it's not that the networks were stupid; it's that they see the world in a different way than we do. They were basically concerned with: what is the probability that the rightmost output node is going to be a one? And as far as they were concerned, in everything they'd ever been trained on, it was a zero. So they figured, well, why would I turn it on now? Whereas a person would look at the same problem and say, well, it's obvious, we're just doing the same thing; the Latin for it is mutatis mutandis, change what needs to be changed.
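A rough sketch of an experiment in the spirit of the one described, not Marcus's actual 1998 code: an auto-associator trained by plain backprop to reproduce binary codes of even numbers only, then probed with an odd one. Network size, learning rate, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
BITS = 5

def to_bits(n):
    """5-bit binary code, most significant bit first (rightmost = the ones)."""
    return np.array([(n >> i) & 1 for i in range(BITS - 1, -1, -1)], float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training set: even numbers only, so the rightmost bit is always 0.
X = np.stack([to_bits(n) for n in range(0, 32, 2)])

W1 = rng.normal(0, 0.5, (BITS, 8))   # input -> hidden
W2 = rng.normal(0, 0.5, (8, BITS))   # hidden -> output

for _ in range(20000):               # plain backprop on the identity task
    H = sigmoid(X @ W1)
    Y = sigmoid(H @ W2)
    dY = (Y - X) * Y * (1 - Y)       # squared-error gradient at the output
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= 0.5 * H.T @ dY / len(X)
    W1 -= 0.5 * X.T @ dH / len(X)

# On even numbers, reconstruction is close to perfect.
print(np.round(sigmoid(sigmoid(X @ W1) @ W2)))

# On an odd number: the rightmost output node was 0 for every training
# example, so the network tends to keep it near 0 instead of copying the 1.
test = to_bits(7)
out = sigmoid(sigmoid(test @ W1) @ W2)
print(test, np.round(out, 2))
```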
So I can do f of x equals x plus two, and I can do it for a couple of values: I can tell you if x is three, then f of x is five, and if x is four, f of x is six. And now I can do it with some totally different number, like a million, and you can say, well, obviously it's a million and two, because you have an algebraic operation that you're applying to a variable. And deep learning systems kind of emulate that, but they don't actually do it. For that particular example you can fudge a solution, but the general form of the problem remains: what they learn is really correlations between particular input nodes and particular output nodes. And they're complex correlations, with multiple nodes involved and so forth, but they're not structured over these operations over variables.
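To make the extrapolation point concrete, here is a minimal sketch with arbitrary sizes and rates: a small tanh network fit on f(x) = x + 2 for a few values interpolates on its training range but does not extrapolate to a distant input, while the algebraic rule trivially does.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.array([[3.0], [4.0], [5.0], [6.0]])
ys = xs + 2.0                               # the rule we hope it learns

W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

for _ in range(50000):                      # plain gradient descent
    H = np.tanh(xs @ W1 + b1)
    pred = H @ W2 + b2
    d = (pred - ys) / len(xs)               # squared-error gradient
    W2 -= 0.01 * H.T @ d;   b2 -= 0.01 * d.sum(0)
    dH = (d @ W2.T) * (1 - H ** 2)
    W1 -= 0.01 * xs.T @ dH; b1 -= 0.01 * dH.sum(0)

def rule(x):
    """The operation over a variable: works for any x."""
    return x + 2

x_new = np.array([[1_000_000.0]])
print((np.tanh(xs @ W1 + b1) @ W2 + b2).ravel())     # close to [5, 6, 7, 8]
print((np.tanh(x_new @ W1 + b1) @ W2 + b2).ravel())  # saturated: nowhere near 1000002
print(rule(1_000_000))                               # 1000002
```

The failure is structural: tanh units saturate far outside the training range, so the network's output is bounded no matter how large the input, whereas the symbolic rule applies unchanged.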
Now, someday people may do a new form of deep learning that incorporates that stuff, and there's some tentative work on things like differentiable programming right now. But the sort of classic stuff, like people use for ImageNet, doesn't have it. And you have people like Hinton going around saying symbol manipulation, like what Marcus advocates, is like the gasoline engine, it's obsolete; we should just use this cool electric power that we've got with deep learning. And that's really destructive, because we really do need to have the gasoline engine here, so to speak. I mean, I don't think it's a good analogy, but we really do need to have the stuff that symbol manipulation gives us.
- Yeah, and Hinton as well would say that we need to throw out everything and start over.
- And I had a friend who interviewed him and tried to pin him down on what exactly we need to throw out, and he didn't know. If he knew that, he'd throw it out himself. But, I mean, you can't have it both ways: you can't be like, "I don't know what to throw out, but I am gonna throw out the symbols." And not just the symbols, but the variables and the operations over variables. Don't forget the operations over variables, the stuff that I'm endorsing, and which John McCarthy did when he founded AI: that stuff is the stuff we build most computer programs out of. There are people now who say we don't need computer programmers anymore, not quite looking at the statistics of how much computer programmers actually get paid these days. Most of them do a little bit of machine learning, but they still write a lot of code: code where it's like, if the value of x is greater than the value of y, then do this kind of thing, conditionals and comparisons, operations over variables.

There's this fantasy that you can machine-learn anything, but there are some things you would never wanna machine-learn. I would not use a phone operating system that was machine-learned, like, you made a bunch of phone calls and you recorded which packets were transmitted and trained a system on that. Or a web browser built by taking logs of keystrokes and images, screenshots, and then trying to learn a browser from them. Nobody would ever, no rational person would ever, try to build a browser that way. They would use symbol manipulation, the stuff that I think AI needs to avail itself of, in addition to deep learning.