Noam Chomsky: Deep Learning is Useful but It Doesn't Tell You Anything about Human Language



00:00:00.000 | - Let me ask you about a field of machine learning,
00:00:05.000 | deep learning.
00:00:06.560 | There's been a lot of progress
00:00:08.920 | in neural network based machine learning
00:00:11.520 | in the recent decade.
00:00:14.000 | Of course, neural network research goes back many decades.
00:00:17.560 | What do you think are the limits of deep learning,
00:00:23.220 | of neural network based machine learning?
00:00:26.120 | - Well, to give a real answer to that,
00:00:28.760 | you'd have to understand the exact processes
00:00:32.520 | that are taking place, and those are pretty opaque.
00:00:35.520 | So it's pretty hard to prove a theorem
00:00:37.840 | about what can be done and what can't be done.
00:00:41.640 | But I think it's reasonably clear.
00:00:44.400 | I mean, putting technicalities aside,
00:00:46.800 | what deep learning is doing
00:00:49.520 | is taking huge numbers of examples
00:00:52.940 | and finding some patterns.
00:00:55.320 | Okay, that could be interesting. In some areas it is.
00:00:59.560 | But we have to ask here a certain question.
00:01:02.680 | Is it engineering or is it science?
00:01:05.820 | Engineering in the sense of just trying
00:01:08.160 | to build something that's useful,
00:01:10.420 | or science in the sense that it's trying
00:01:12.400 | to understand something about elements of the world.
00:01:16.320 | So take a Google parser.
00:01:19.400 | We can ask that question.
00:01:21.480 | Is it useful?
00:01:22.760 | Yeah, it's pretty useful.
00:01:24.040 | You know, I use a Google translator.
00:01:27.000 | So on engineering grounds,
00:01:29.480 | it's kind of worth having, like a bulldozer.
00:01:32.500 | Does it tell you anything about human language?
00:01:36.560 | Zero.
00:01:37.400 | Nothing.
00:01:39.420 | And in fact, it's very striking.
00:01:41.760 | From the very beginning,
00:01:44.420 | it's just totally remote from science.
00:01:47.960 | So what is a Google parser doing?
00:01:50.220 | It's taking an enormous text,
00:01:52.800 | let's say the Wall Street Journal corpus,
00:01:55.320 | and asking how close can we come
00:01:58.200 | to getting the right description
00:02:01.760 | of every sentence in the corpus.
00:02:04.040 | Well, every sentence in the corpus
00:02:06.160 | is essentially an experiment.
00:02:08.320 | Each sentence that you produce is an experiment,
00:02:12.380 | which is, am I a grammatical sentence?
00:02:15.320 | The answer is usually yes.
00:02:17.280 | So most of the stuff in the corpus is grammatical sentences.
00:02:20.880 | But now ask yourself,
00:02:22.580 | is there any science which takes random experiments,
00:02:27.580 | which are carried out for no reason whatsoever,
00:02:31.320 | and tries to find out something from them?
00:02:34.160 | Like if you're, say, a chemistry PhD student,
00:02:37.280 | you wanna get a thesis, can you say,
00:02:38.840 | well, I'm just gonna
00:02:41.040 | mix a lot of things together, no purpose,
00:02:44.120 | and maybe I'll find something.
00:02:47.280 | You'd be laughed out of the department.
00:02:50.040 | Science tries to find critical experiments,
00:02:53.820 | ones that answer some theoretical question.
00:02:56.740 | Doesn't care about coverage of millions of experiments.
00:03:00.580 | So it just begins by being very remote from science,
00:03:03.840 | and it continues like that.
00:03:05.860 | So the usual question that's asked
00:03:08.940 | about, say, a Google parser,
00:03:11.140 | or some parser,
00:03:13.780 | is how well does it do on a corpus?
00:03:15.940 | But there's another question that's never asked.
00:03:18.740 | How well does it do on something
00:03:20.520 | that violates all the rules of language?
00:03:23.700 | So for example, take the structure dependence case
00:03:26.320 | that I mentioned.
00:03:27.280 | Suppose there was a language in which
00:03:29.800 | you used linear proximity as the mode of interpretation.
00:03:34.800 | Then deep learning would work very easily on that.
00:03:39.320 | In fact, much more easily than an actual language.
00:03:42.380 | Is that a success?
00:03:43.560 | No, that's a failure.
00:03:45.200 | From a scientific point of view, it's a failure.
00:03:48.000 | It shows that we're not discovering
00:03:51.160 | the nature of the system at all,
00:03:53.440 | 'cause it does just as well or even better
00:03:55.360 | on things that violate the structure of the system.
00:03:58.480 | And it goes on from there.
00:04:00.320 | It's not an argument against doing it.
00:04:02.400 | It is useful to have devices like this.
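
To make the structure-dependence point concrete, here is a minimal sketch, assuming English yes/no question formation as the test case. The toy sentence, the hard-coded parse, and the function names are all invented for illustration:

```python
# Declarative sentence: "the man who is tall is happy"
WORDS = ["the", "man", "who", "is", "tall", "is", "happy"]

def front_first_aux(words):
    """Linear proximity: front the linearly first auxiliary.
    Trivial to induce from surface word order alone, but no
    human language uses this rule."""
    i = words.index("is")  # first "is" by linear position
    return [words[i]] + words[:i] + words[i + 1:]

def front_main_clause_aux(words):
    """Structure dependence: front the auxiliary of the main clause.
    The parse is hard-coded here: "who is tall" is a relative clause
    inside the subject, so the second "is" heads the main clause."""
    i = 5  # index of the main-clause auxiliary in WORDS
    return [words[i]] + words[:i] + words[i + 1:]

print(" ".join(front_first_aux(WORDS)))        # *is the man who tall is happy
print(" ".join(front_main_clause_aux(WORDS)))  # is the man who is tall happy
```

The linear rule is the easier one to learn from surface statistics alone, yet it yields the ungrammatical output; that asymmetry is why a learner that does just as well on a made-up linear-proximity language counts, scientifically, as a failure.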
00:04:04.800 | - So yes, so neural networks are kind of approximators
00:04:08.280 | that look, there's echoes of the behaviorism debates,
00:04:11.800 | right, behaviorism.
00:04:13.760 | - More than echoes.
00:04:15.200 | Many of the people in deep learning
00:04:17.680 | say they've vindicated, Terry Sejnowski, for example,
00:04:22.200 | in his recent books, says this vindicates
00:04:25.160 | Skinnerian behaviorism.
00:04:27.160 | It doesn't have anything to do with it.
00:04:29.080 | - Yes, but I think there's something
00:04:31.400 | actually fundamentally different when the data set is huge.
00:04:35.920 | But your point is extremely well taken.
00:04:38.800 | But do you think we can learn, approximate,
00:04:43.040 | that interesting complex structure of language
00:04:46.420 | with neural networks that will somehow
00:04:48.440 | help us understand the science?
00:04:50.380 | - It's possible.
00:04:52.120 | I mean, you find patterns that you hadn't noticed,
00:04:54.880 | let's say, could be.
00:04:57.360 | In fact, it's very much like a kind of linguistics
00:05:01.240 | that's done, what's called corpus linguistics.
00:05:05.720 | Suppose you have some language
00:05:08.720 | where all the speakers have died out,
00:05:11.080 | but you have records.
00:05:12.760 | So you just look at the records
00:05:15.720 | and see what you can figure out from that.
00:05:18.200 | It's much better to have actual speakers
00:05:21.280 | where you can do critical experiments.
00:05:23.680 | But if they're all dead, you can't do them.
00:05:26.120 | So you have to try to see what you can find out
00:05:28.400 | from just looking at the data that's around.
00:05:31.480 | You can learn things.
00:05:32.640 | Actually, paleoanthropology is very much like that.
00:05:36.000 | You can't do a critical experiment
00:05:38.240 | on what happened two million years ago.
00:05:41.120 | So you're kind of forced just to take what data's around
00:05:44.160 | and see what you can figure out from it.
00:05:46.800 | Okay, it's a serious study.