Gary Marcus: Limits of Deep Learning | AI Podcast Clips


00:00:00.000 | You've highlighted this in your new book as well, but a couple of years ago you wrote a paper titled
00:00:05.840 | "Deep Learning: A Critical Appraisal" that lists 10 challenges faced by current deep learning
00:00:10.880 | systems.
00:00:11.880 | So let me summarize them as data efficiency, transfer learning, hierarchical knowledge,
00:00:19.120 | open-ended inference, explainability, integrating prior knowledge, causal reasoning, modeling
00:00:26.440 | of a stable world, robustness, adversarial examples, and so on.
00:00:30.480 | And then my favorite probably is reliability and engineering of real world systems.
00:00:35.440 | So, whatever, people should definitely read the paper, and they should definitely
00:00:39.560 | read your book.
00:00:40.760 | But which of these challenges, if solved in your view, has the biggest impact on the AI
00:00:46.760 | community?
00:00:47.760 | - It's a very good question.
00:00:50.280 | And I'm going to be evasive because I think that they go together a lot.
00:00:54.320 | So some of them might be solved independently of others.
00:00:57.760 | But I think a good solution to AI starts by having real, what I would call cognitive models
00:01:03.560 | of what's going on.
00:01:04.640 | So right now we have an approach that's dominant where you take statistical approximations
00:01:09.720 | of things, but you don't really understand them.
00:01:12.040 | So you know that bottles are correlated in your data with bottle caps, but you don't
00:01:16.800 | understand that there's a thread on the bottle cap that fits with the thread on the bottle,
00:01:21.480 | and that that tightens, and if I tighten it enough, there's a seal and the water won't come out.
00:01:26.640 | There's no machine that understands that.
00:01:28.440 | And having a good cognitive model of that kind of everyday phenomena is what we call
00:01:32.160 | common sense.
00:01:33.160 | And if you had that, then a lot of these other things start to fall into at least a little
00:01:38.000 | bit better place.
00:01:39.000 | Because right now you're learning correlations between pixels when you play a video game
00:01:42.640 | or something like that.
00:01:44.080 | And it doesn't work very well.
00:01:45.120 | It works when the video game is just the way that you studied it, and then you alter the
00:01:48.560 | video game in small ways, like you move the paddle in Breakout a few pixels, and the
00:01:52.440 | system falls apart.
00:01:53.440 | Because it doesn't understand, it doesn't have a representation of a paddle, a ball,
00:01:57.520 | a wall, a set of bricks, and so forth.
00:01:59.600 | And so it's reasoning at the wrong level.
00:02:02.680 | - So the idea of common sense, it's full of mystery.
00:02:06.520 | You've worked on it, but it's nevertheless full of mystery, full of promise.
00:02:11.680 | What does common sense mean?
00:02:12.800 | What does knowledge mean?
00:02:14.240 | So the way you've been discussing it now is very intuitive.
00:02:16.960 | It makes a lot of sense that that is something we should have, and that's something deep
00:02:20.040 | learning systems don't have.
00:02:22.000 | But the argument could be that we're oversimplifying it, we're oversimplifying the notion
00:02:28.760 | of common sense, because that's how it feels like we as humans, at the cognitive level,
00:02:34.280 | approach problems.
00:02:35.280 | - So a lot of people aren't actually gonna read my book.
00:02:39.620 | But if they did read the book, one of the things that might come as a surprise to them
00:02:43.320 | is that we actually say common sense is really hard and really complicated.
00:02:47.920 | So they would probably... my critics know that I like common sense, but that chapter actually
00:02:53.640 | starts by us beating up, not on deep learning, but kind of on our own home team, as it were.
00:02:58.280 | So Ernie and I are first and foremost people that believe in at least some of what good
00:03:03.400 | old fashioned AI tried to do.
00:03:04.880 | So we believe that symbols and logic and programming, things like that, are important.
00:03:10.160 | And we go through why even those tools that we hold fairly dear aren't really enough.
00:03:15.840 | So we talk about why common sense is actually many things.
00:03:19.080 | And some of them fit really well with those classical sets of tools.
00:03:22.840 | So things like taxonomy.
00:03:24.560 | So I know that a bottle is an object or it's a vessel, let's say, and I know a vessel is
00:03:29.960 | an object and objects are material things in the physical world.
00:03:33.840 | So I can make some inferences: if I know that vessels need to not have holes in them
00:03:43.360 | in order to carry their contents, then I can infer that a bottle shouldn't
00:03:47.360 | have a hole in it, in order to carry its contents.
00:03:49.080 | So you can do hierarchical inference and so forth.
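As a minimal sketch of that kind of taxonomic inference (a toy knowledge base assumed here for illustration, not a system from the book):

```python
# Toy is-a hierarchy: properties stated at one level are inherited below it.
ISA = {"bottle": "vessel", "vessel": "object", "object": "material thing"}
PROPS = {"vessel": {"must not have holes, in order to carry its contents"}}

def properties(kind):
    """Walk up the is-a chain, collecting inherited properties."""
    props = set()
    while kind is not None:
        props |= PROPS.get(kind, set())
        kind = ISA.get(kind)
    return props

# A bottle inherits the vessel constraint without it ever being stated for bottles.
print(properties("bottle"))
```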
00:03:52.280 | And we say that's great, but it's only a tiny piece of what you need for common sense.
00:03:57.400 | We give lots of examples that don't fit into that.
00:03:59.720 | So another one that we talk about is a cheese grater.
00:04:02.800 | You've got holes in a cheese grater, you've got a handle on top.
00:04:05.820 | You can build a model in the game engine sense of a model so that you could have a little
00:04:10.840 | cartoon character flying around through the holes of the grater.
00:04:14.040 | But we don't have a system yet (taxonomy doesn't help us that much) that really understands
00:04:18.760 | why the handle is on top and what you do with the handle or why all of those circles are
00:04:23.040 | sharp or how you'd hold the cheese with respect to the grater in order to make it actually
00:04:27.840 | work.
00:04:28.840 | - So those ideas are just abstractions that could emerge on a system like a very large
00:04:34.920 | deep neural network?
00:04:36.080 | - I'm a skeptic that that kind of emergence per se can work.
00:04:39.440 | So I think that deep learning might play a role in the systems that do what I want systems
00:04:44.680 | to do, but it won't do it by itself.
00:04:46.120 | I've never seen a deep learning system really extract an abstract concept.
00:04:52.240 | There are principled reasons for that, stemming from how backpropagation works and
00:04:56.800 | how the architectures are set up.
00:04:59.440 | One example is deep learning people actually all build in something called convolution,
00:05:05.880 | which Yann LeCun is famous for, which is an abstraction.
00:05:09.440 | They don't have their systems learn this.
00:05:11.200 | So the abstraction is an object that looks the same if it appears in different places.
00:05:15.480 | And what LeCun figured out, and essentially why he was a co-winner of the Turing Award,
00:05:20.400 | was that if you program this in innately, then your system would be a whole lot more
00:05:26.160 | efficient.
00:05:27.160 | In principle, this should be learnable, but people don't have systems that kind of reify
00:05:31.920 | things and make them more abstract.
00:05:34.240 | And so what you'd really wind up with if you don't program that in advance is a system
00:05:38.600 | that kind of realizes that this is the same thing as this, but then I take your little
00:05:42.560 | clock there and I move it over and it doesn't realize that the same thing applies to the
00:05:45.800 | clock.
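To make the convolution point concrete, here is a minimal sketch (a toy 1-D example assumed for illustration, not LeCun's actual architecture) of how sharing the same kernel weights across positions builds in the abstraction that an object looks the same wherever it appears:

```python
import numpy as np

def conv1d(signal, kernel):
    """Slide the SAME kernel weights across every position (weight sharing)."""
    n, k = len(signal), len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(n - k + 1)])

edge = np.array([-1.0, 1.0])                    # a tiny "edge detector" kernel
a = np.array([0., 0., 1., 1., 0., 0., 0., 0.])  # pattern near the left
b = np.array([0., 0., 0., 0., 0., 1., 1., 0.])  # the same pattern, shifted right

# Because the weights are shared, the detector responds the same way wherever
# the pattern appears; the response simply shifts along with the input.
print(conv1d(a, edge))  # [ 0.  1.  0. -1.  0.  0.  0.]
print(conv1d(b, edge))  # [ 0.  0.  0.  0.  1.  0. -1.]
```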
00:05:46.800 | - So the really nice thing, you're right, is that convolution is just one of those things:
00:05:51.520 | it's an innate feature that's programmed in by the human expert.
00:05:55.600 | - We need more of those, not less.
00:05:57.480 | - Yes, but the nice feature is, it feels like coming up with that brilliant
00:06:02.800 | idea can get you a Turing Award, yet it requires less effort than encoding, something we'll
00:06:11.600 | talk about later, with the expert systems:
00:06:12.920 | encoding a lot of knowledge by hand.
00:06:16.320 | So it feels like there's a huge number of limitations, which you clearly outline, with
00:06:21.680 | deep learning; but the nice feature of deep learning is that, whatever it is able to accomplish,
00:06:25.920 | it does a lot of stuff automatically, without human intervention.
00:06:30.880 | - Well, and that's part of why people love it, right?
00:06:32.960 | But I always think of this quote from Bertrand Russell, which is that it has all the advantages
00:06:38.840 | of theft over honest toil.
00:06:40.800 | It's really hard to program into a machine a notion of causality or even how a bottle
00:06:47.120 | works or what containers are.
00:06:48.960 | Ernie Davis and I wrote a, I don't know, 45-page academic paper trying just to understand what
00:06:54.320 | a container is, which I don't think anybody ever read, but it's a very detailed
00:06:59.760 | analysis of all the things, well, not even all, some of the things you need to do in
00:07:03.400 | order to understand a container.
00:07:04.880 | It would be a whole lot nicer... I'm a co-author on the paper, and I made it a little bit better,
00:07:09.400 | but Ernie did the hard work for that particular paper.
00:07:11.960 | And it took him like three months to get the logical statements correct, and maybe that's
00:07:17.320 | not the right way to do it.
00:07:19.160 | It's a way to do it, but on that way of doing it, it's really hard work to do something
00:07:24.560 | as simple as understanding containers, and nobody wants to do that hard work.
00:07:29.160 | Even Ernie didn't want to do that hard work.
00:07:32.240 | Everybody would rather just feed their system a bunch of videos with a bunch of
00:07:35.760 | containers and have the system infer how containers work.
00:07:40.080 | It would be like so much less effort, let the machine do the work.
00:07:43.100 | And so I understand the impulse, I understand why people want to do that.
00:07:46.360 | I just don't think that it works.
00:07:47.960 | I've never seen anybody build a system that in a robust way can actually watch videos
00:07:54.760 | and predict exactly which containers would leak and which ones wouldn't or something
00:07:58.560 | like that.
00:07:59.560 | And I know someone's going to go out and do that since I said it, and I look forward to
00:08:02.680 | seeing it.
00:08:04.480 | But getting these things to work robustly is really, really hard.
00:08:09.200 | So Yann LeCun, who was my colleague at NYU for many years, thinks that the hard work
00:08:14.880 | should go into defining an unsupervised learning algorithm that will watch videos, using the
00:08:21.440 | next frame, basically, in order to tell it what's going on.
00:08:24.840 | And he thinks that's the royal road, and he's willing to put in the work in devising that
00:08:29.600 | algorithm.
00:08:29.600 | Then he wants the machine to do the rest.
00:08:31.920 | And again, I understand the impulse.
00:08:34.140 | My intuition, based on years of watching this stuff and making predictions 20 years ago
00:08:39.240 | that still hold even though there's a lot more computation and so forth, is that we
00:08:43.000 | actually have to do a different kind of hard work, which is more like building a design
00:08:46.560 | specification for what we want the system to do, doing hard engineering work to figure
00:08:51.000 | out how we do things like what Yann did for convolution in order to figure out how to
00:08:55.640 | encode complex knowledge into the systems.
00:08:58.760 | The current systems don't have that much knowledge other than convolution, which is, again, this
00:09:03.800 | idea of an object being in different places and having the same perception, I guess I'll say,
00:09:10.800 | the same appearance.
00:09:13.180 | People don't want to do that work.
00:09:14.480 | They don't see how to naturally fit one with the other.
00:09:17.760 | - I think that's, yes, absolutely.
00:09:19.640 | But also on the expert system side, there's a temptation to go too far the other way.
00:09:23.880 | So it's just having an expert sort of sit down and encode the description, the framework
00:09:28.800 | for what a container is, and then having the system reason out the rest.
00:09:32.800 | From my view, one really exciting possibility is active learning, where there's continuous
00:09:37.720 | interaction between a human and a machine.
00:09:40.420 | So the machine does a kind of deep learning type extraction of information from data,
00:09:45.120 | patterns and so on, but humans also guide the learning procedure, guiding both the
00:09:54.160 | process and the framework of how the machine learns, whatever the task is.
00:09:58.240 | - I was with you with almost everything you said except the phrase "deep learning."
00:10:02.820 | What I think you really want there is a new form of machine learning.
00:10:06.760 | So let's remember, deep learning is a particular way of doing machine learning.
00:10:10.440 | Most often it's done with supervised data for perceptual categories.
00:10:15.080 | There are other things you can do with deep learning, some of them quite technical, but
00:10:19.000 | the standard use of deep learning is I have a lot of examples and I have labels for them.
00:10:23.840 | So here are pictures.
00:10:25.080 | This one's the Eiffel Tower.
00:10:26.640 | This one's the Sears Tower.
00:10:27.920 | This one's the Empire State Building.
00:10:29.600 | This one's a cat.
00:10:30.600 | This one's a pig and so forth.
00:10:31.600 | You just get millions of examples, millions of labels.
00:10:35.240 | Deep learning is extremely good at that.
00:10:37.480 | It's better than any other solution that anybody has devised, but it is not good at representing
00:10:42.200 | abstract knowledge.
00:10:43.720 | It's not good at representing things like bottles contain liquid and have tops to them
00:10:50.080 | and so forth.
00:10:51.080 | It's not very good at learning or representing that kind of knowledge.
00:10:53.760 | It is an example of having a machine learn something, but it's a machine that learns
00:10:59.000 | a particular kind of thing, which is object classification.
00:11:01.920 | It's not a particularly good algorithm for learning about the abstractions that govern
00:11:06.200 | our world.
00:11:07.200 | There may be such a thing.
00:11:09.440 | Part of what we counsel in the book is maybe people should be working on devising such
00:11:12.440 | things.
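As a minimal sketch of that standard supervised recipe (toy stand-in data and a single linear layer, just to show the shape of the setup, not a real pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 64))      # stand-ins for millions of example images
y = rng.integers(0, 5, size=1000)    # stand-ins for labels: tower, cat, pig...

# One linear layer trained by gradient descent on softmax cross-entropy.
W = np.zeros((64, 5))
for _ in range(200):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1     # gradient of cross-entropy w.r.t. logits
    W -= 0.01 * X.T @ p / len(y)

# The whole setup maps inputs to labels; a fact like "bottles contain liquid
# and have tops" has no natural home anywhere in it.
```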
00:11:13.440 | - So one possibility, and I just wonder what you think about it, is that deep neural networks do
00:11:19.280 | form abstractions, but they're not accessible to us humans, in the sense that we can't--
00:11:25.640 | - There's some truth in that.
00:11:27.040 | - So is it possible that either current or future neural networks form very high level
00:11:32.040 | abstractions which are as powerful as our human abstractions of common sense?
00:11:38.640 | We just can't get a hold of them.
00:11:41.200 | And so the problem is essentially we need to make them explainable.
00:11:45.560 | - This is an astute question, but I think the answer is at least partly no.
00:11:49.560 | One of the kinds of classical neural network architectures is what we call an auto-associator.
00:11:53.840 | It just tries to take an input, go through a set of hidden layers, and come out with
00:11:58.440 | an output.
00:11:59.440 | And it's supposed to learn essentially the identity function, that your input is the
00:12:02.600 | same as your output.
00:12:03.600 | So think of this with binary numbers: you've got the ones, the twos, the fours, the eights,
00:12:06.840 | the sixteens, and so forth.
00:12:08.680 | And so if you want to input 24, you turn on the 16, you turn on the eight.
00:12:12.120 | It's like binary one, one, and a bunch of zeros.
00:12:15.240 | So I did some experiments in 1998 with the precursors of contemporary deep learning.
00:12:23.240 | And what I showed was you could train these networks on all the even numbers, and they
00:12:28.560 | would never generalize to the odd numbers.
00:12:30.920 | A lot of people thought that I was, I don't know, an idiot, or faking the experiment, or
00:12:34.800 | that it wasn't true or whatever.
00:12:36.520 | But it is true that with the class of networks that we had in that day, they would never
00:12:41.880 | ever make this generalization.
00:12:43.920 | And it's not that the networks were stupid, it's that they see the world in a different
00:12:47.880 | way than we do.
00:12:49.840 | They were basically concerned, what is the probability that the rightmost output node
00:12:54.840 | is going to be one?
00:12:56.420 | And as far as they were concerned, in everything they'd ever been trained on, it was a zero.
00:13:01.400 | That node had never been turned on.
00:13:03.440 | And so they figured, well, why would I turn it on now?
00:13:05.360 | Whereas a person would look at the same problem and say, well, it's obvious, we're just doing
00:13:08.480 | the thing that corresponds.
00:13:10.120 | The Latin for it is mutatis mutandis: change what needs to be changed.
00:13:14.720 | And we do this, this is what algebra is.
00:13:16.880 | So I can do f of y equals y plus two, and I can do it for a couple of values.
00:13:21.640 | I can tell you if y is three, then f of y is five, and if y is four, it's six.
00:13:25.600 | And now I can do it with some totally different number, like a million.
00:13:27.720 | Then you can say, well, obviously it's a million and two, because you have an algebraic operation
00:13:31.720 | that you're applying to a variable.
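In code, that algebraic point looks like this (a trivial sketch): the operation is defined over the variable itself, so it covers values it has never seen:

```python
def f(y):
    return y + 2          # f(y) = y + 2, an operation over a variable

print(f(3), f(4))         # 5 6, the cases we were "shown"
print(f(1_000_000))       # 1000002, a value never seen, handled for free
```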
00:13:34.000 | And deep learning systems kind of emulate that, but they don't actually do it.
00:13:38.800 | For the particular example, you can fudge a solution to that particular problem.
00:13:44.400 | But the general form of that problem remains that what they learn is really correlations between
00:13:48.880 | different input and output nodes.
00:13:50.360 | And they're complex correlations with multiple nodes involved and so forth.
00:13:54.880 | But ultimately, they're correlative.
00:13:56.600 | They're not structured over these operations over variables.
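A minimal sketch in the spirit of that 1998 experiment (a small sigmoid autoencoder trained with plain backprop; this is an assumed toy setup, not the original one, and exact behavior varies from run to run):

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bits(n, width=5):
    """Big-endian binary encoding, e.g. 24 -> [1, 1, 0, 0, 0]."""
    return np.array([(n >> i) & 1 for i in range(width - 1, -1, -1)], dtype=float)

# Train on the identity function, but ONLY on even numbers, so the rightmost
# (ones) output node is 0 in every single training example.
X = np.array([to_bits(n) for n in range(0, 32, 2)])
W1 = rng.normal(0, 0.5, (5, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 5)); b2 = np.zeros(5)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):                 # plain backprop on squared error
    H = sig(X @ W1 + b1)
    Y = sig(H @ W2 + b2)
    dY = (Y - X) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= 0.1 * H.T @ dY; b2 -= 0.1 * dY.sum(axis=0)
    W1 -= 0.1 * X.T @ dH; b1 -= 0.1 * dH.sum(axis=0)

# Test on an odd number the net has never seen. It tends to reproduce the high
# bits but hold the ones bit near 0: that node was never on during training,
# so the learned correlations never push it toward 1.
test = to_bits(21)                    # 21 = 10101 in binary
print(sig(sig(test @ W1 + b1) @ W2 + b2).round(2))
```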
00:13:59.320 | Now, someday people may do a new form of deep learning that incorporates that stuff.
00:14:03.600 | And I think it will help a lot.
00:14:04.600 | And there's some tentative work on things like differentiable programming right now
00:14:08.280 | that fall into that category.
00:14:09.960 | But the sort of classic stuff like people use for ImageNet doesn't have it.
00:14:14.960 | And you have people like Hinton going around saying symbol manipulation, like what Marcus,
00:14:19.160 | what I, advocate, is like the gasoline engine.
00:14:21.960 | It's obsolete.
00:14:22.960 | We should just use this cool electric power that we've got with deep learning.
00:14:26.920 | And that's really destructive, because we really do need to have the gasoline engine
00:14:31.480 | stuff that represents...
00:14:33.520 | I mean, I don't think it's a good analogy, but we really do need to have the stuff that
00:14:38.760 | represents symbols.
00:14:39.760 | - Yeah, and Hinton as well would say that we do need to throw out everything and start
00:14:44.880 | over.
00:14:45.880 | So there's a...
00:14:46.880 | - Yeah, Hinton said that to Axios.
00:14:49.360 | And I had a friend who interviewed him and tried to pin him down on what exactly we need
00:14:53.560 | to throw out.
00:14:54.560 | And he was very evasive.
00:14:55.560 | - Well, of course, 'cause we can't...
00:15:01.640 | If he knew that, he'd throw it out himself. But I mean, you can't have it both ways.
00:15:01.640 | You can't be like, I don't know what to throw out, but I am gonna throw out the symbols.
00:15:06.160 | I mean, and not just the symbols, but the variables and the operations over the variables.
00:15:10.480 | Don't forget the operations over the variables, the stuff that I'm endorsing, and which John
00:15:15.380 | McCarthy did when he founded AI, that stuff is the stuff that we build most computers
00:15:20.040 | out of.
00:15:21.040 | There are people now who say, we don't need computer programmers anymore.
00:15:25.200 | They're not quite looking at the statistics of how much computer programmers actually get paid
00:15:28.080 | right now.
00:15:29.240 | We need lots of computer programmers.
00:15:30.720 | And most of them, they do a little bit of machine learning, but they still do a lot
00:15:34.800 | of code, right?
00:15:36.320 | Code where it's like, if the value of X is greater than the value of Y, then do this
00:15:40.200 | kind of thing, like conditionals and comparing operations over variables.
00:15:43.840 | Like there's this fantasy you can machine learn anything.
00:15:46.480 | There are some things you would never wanna machine learn.
00:15:48.760 | I would not use a phone operating system that was machine learned.
00:15:52.360 | Like you made a bunch of phone calls and you recorded which packets were transmitted and
00:15:56.000 | you just machine learned it.
00:15:57.360 | It'd be insane.
00:15:58.800 | Or to build a web browser by taking logs of keystrokes and images, screenshots, and then
00:16:05.920 | trying to learn the relation between them.
00:16:07.920 | Nobody would ever, no rational person would ever try to build a browser that way.
00:16:12.160 | They would use symbol manipulation, the stuff that I think AI needs to avail itself of in
00:16:16.320 | addition to deep learning.