Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426
Chapters
0:00 Introduction
1:13 Human language
5:19 Generalizations in language
11:06 Dependency grammar
21:05 Morphology
29:40 Evolution of languages
33:00 Noam Chomsky
77:06 Thinking and language
90:36 LLMs
103:35 Center embedding
130:02 Learning a new language
133:54 Nature vs nurture
140:30 Culture and language
154:58 Universal language
159:21 Language translation
162:36 Animal communication
00:00:00.000 |
Naively, I certainly thought that all humans would have words for exact counting, and the Pirahã don't. 00:00:11.680 |
There's not a word for one in their language. 00:00:13.720 |
And so, there's certainly not a word for two, three, or four, so that kind of blows people's minds. 00:00:21.240 |
How are you going to ask, "I want two of those"? 00:00:24.360 |
And so, that's just not a thing you can possibly ask in Pirahã. 00:00:32.360 |
The following is a conversation with Edward Gibson, or Ted, as everybody calls him. 00:00:40.640 |
He heads the MIT Language Lab that investigates why human languages look the way they do, 00:00:46.680 |
the relationship between culture and language, and how people represent, process, and learn language. 00:00:53.680 |
Also, he has a book titled "Syntax: A Cognitive Approach," published by MIT Press 00:01:05.560 |
To support this podcast, please check out our sponsors in the description. 00:01:13.800 |
When did you first become fascinated with human language? 00:01:17.580 |
As a kid in school, when we had to structure sentences in English grammar, I found that 00:01:25.780 |
I found it confusing as to what it was I was told to do. 00:01:29.700 |
I didn't understand what the theory was behind it, but I found it very interesting. 00:01:34.540 |
So, when you look at grammar, you're almost thinking about it like a puzzle, almost like a mathematical puzzle? 00:01:40.540 |
I didn't know I was going to work on this at all at that point. 00:01:42.780 |
I was really just, I was kind of a math geek person, a computer scientist. 00:01:48.740 |
And then I found language as a neat puzzle to work on from an engineering perspective, 00:01:56.980 |
That's what I, as a, I sort of accidentally, I decided after I finished my undergraduate 00:02:03.060 |
degree, which was computer science and math in Canada, at Queen's University, I decided 00:02:09.300 |
It's like, that's what I always thought I would do. 00:02:11.420 |
And I went to Cambridge, where they had a master's program in computational linguistics, 00:02:18.140 |
and I hadn't taken a single language class before. 00:02:21.740 |
All I'd taken was CS, computer science, math classes, pretty much, mostly, as an undergrad. 00:02:26.940 |
And I just thought this was an interesting thing to do for a year, because it was a single year. 00:02:33.300 |
And then I ended up spending my whole life doing it. 00:02:35.980 |
So fundamentally, your journey through life was one of a mathematician and a computer 00:02:39.820 |
scientist, and then you kind of discovered the puzzle, the problem of language, and approached 00:02:46.020 |
it from that angle, to try to understand it from that angle, almost like a mathematician 00:02:53.780 |
- As an engineer, I'd say, I mean, to be frank, I had taken an AI class, I guess it was '83 00:02:59.500 |
or '84, '85, somewhere in there, a long time ago, and there was a natural language section in it. 00:03:06.540 |
I thought, there must be more interesting things we can do. 00:03:10.140 |
It didn't seem very, it seemed just a bunch of hacks to me. 00:03:14.900 |
It didn't seem like a real theory of things in any way. 00:03:17.780 |
And so I just thought this seemed like an interesting area where there wasn't enough 00:03:24.260 |
- Did you ever come across the philosophy angle of logic? 00:03:27.940 |
So if you think about the '80s with AI, the expert systems where you try to kind of maybe 00:03:34.180 |
sidestep the poetry of language and some of the syntax and the grammar and all that kind 00:03:38.820 |
of stuff and go to the underlying meaning that language is trying to communicate and 00:03:43.380 |
try to somehow compress that in a computer-representable way, did you ever come across that in your 00:03:50.900 |
- I mean, I probably did, but I wasn't as interested in it. 00:03:53.380 |
I was trying to do the easier problems first, the ones I could, thought maybe were handleable, 00:03:58.940 |
which seems like the syntax is easier, which is just the forms as opposed to the meaning, 00:04:04.140 |
like when you're starting to talk about the meaning, that's a very hard problem, and it still is. 00:04:09.860 |
But the forms is easier, and so I thought at least figuring out the forms of human language, 00:04:16.020 |
which sounds really hard, but is actually maybe more tractable. 00:04:20.380 |
You think there is a big divide, there's a gap, there's a distance between form and meaning, 00:04:26.420 |
because that's a question you have discussed a lot with LLMs, because they're damn good 00:04:34.780 |
- I think that's what they're good at, is form. 00:04:37.780 |
And that's why they're good, because they can do form. 00:04:40.780 |
I mean, it's an open question, right, how close form and meaning are, but we'll discuss that later. 00:04:46.380 |
But to me, studying form, maybe it's a romantic notion, gives you, form is like the shadow 00:04:54.300 |
of the bigger meaning thing underlying language. Language is how we communicate 00:05:02.540 |
ideas; we communicate with each other using language. 00:05:05.580 |
So in understanding the structure of that communication, I think you start to understand 00:05:10.860 |
the structure of thought and the structure of meaning behind those thoughts and communication, 00:05:19.660 |
- What do you find most beautiful about human language, maybe the form of human language, 00:05:28.020 |
- What I find beautiful about human language is some of the generalizations that happen 00:05:34.460 |
within and across languages. 00:05:37.380 |
So let me give you an example of something which I find kind of remarkable, that is if 00:05:43.220 |
a language, if it has a word order such that the verbs tend to come before their objects, then it tends to have prepositions. 00:05:51.380 |
So we have the first, the subject comes first in a simple sentence, so I say, the dog chased 00:05:59.100 |
the cat, or Mary kicked the ball, so the subject's first, and then after the subject, there's 00:06:03.980 |
the verb, and then we have objects, all these things come after in English. 00:06:08.220 |
So it's generally a verb, and most of the stuff that we want to say comes after the 00:06:11.940 |
subject, it's the objects, there's a lot of things we want to say that come after. 00:06:16.060 |
And there's a lot of languages like that, about 40% of the languages of the world look 00:06:20.060 |
like that, they're subject-verb-object languages. 00:06:24.180 |
And then these languages tend to have prepositions, these little markers on the nouns that connect them to other words. 00:06:36.340 |
So a preposition like in, or on, or of, or about, I say I talk about something, the something 00:06:44.100 |
is the object of that preposition, we have these little markers come before, just like verbs come before their objects. 00:06:51.820 |
Okay, and then, so, now we look at other languages, like Japanese, or Hindi, or some, these are 00:06:57.820 |
so-called verb-final languages, those, maybe a little more than 40%, maybe 45% of the world's 00:07:04.980 |
languages, or more, I mean 50% of the world's languages are verb-final, and those tend to have 00:07:09.500 |
postpositions; those markers, those languages have the same kinds of markers as we do in English, 00:07:18.980 |
So, sorry, they put 'em after, the markers come after, so you say, instead of, you know, 00:07:25.340 |
talk about a book, you say a book about, the opposite order there, in Japanese or in Hindi, 00:07:32.700 |
you do the opposite, and the talk comes at the end, so the verb will come at the end 00:07:37.660 |
So instead of Mary kicked the ball, it's Mary ball kicked, and then if it says Mary kicked 00:07:44.660 |
the ball to John, it's John to, the to, the marker there, the preposition, it's a postposition 00:07:52.540 |
And so the interesting thing, fascinating thing to me, is that within a language, this 00:07:58.180 |
order aligns, it's harmonic, and so if it's one or the other, it's either verb-initial 00:08:05.980 |
or verb-final, but then you'll have prepositions or postpositions to match, and that's 00:08:11.660 |
across the languages that we can look at, we've got around 1,000 languages for, there's 00:08:16.420 |
around 7,000 languages on the Earth right now, but we have information about, say, word 00:08:22.760 |
order on around 1,000 of those, a pretty decent amount of information. 00:08:27.120 |
And for those 1,000 which we know about, about 95% fit that pattern, so they will have either 00:08:34.060 |
verb-initial, it's about half and half: half are verb-initial, like English, and half are verb-final, 00:08:40.660 |
- So just to clarify, verb-initial is subject-verb-object. 00:08:50.160 |
- That's correct, yeah, the subject is generally first. 00:08:52.220 |
- That's so fascinating, "I ate an apple," or "I apple ate," okay, and it's fascinating 00:08:59.500 |
that there's a pretty even division in the world amongst those, 40, 45%. 00:09:05.780 |
And those two are the most common by far, those two word orders; the subject tends to be first. 00:09:09.900 |
There's so many interesting things, but these things are, the thing I find so fascinating 00:09:12.900 |
is there are these generalizations within and across a language. 00:09:17.340 |
And not only those, and there's actually a simple explanation, I think, for a lot of 00:09:22.540 |
that, and that is you're trying to minimize dependencies between words. 00:09:28.660 |
That's basically the story, I think, behind a lot of why word order looks the way it is, 00:09:34.220 |
is we're always connecting, what is the thing I'm telling you? 00:09:38.220 |
I'm talking to you in sentences, you're talking to me in sentences, these are sequences of 00:09:42.060 |
words which are connected, and the connections are dependencies between the words. 00:09:47.860 |
And it turns out that what we're trying to do in a language is actually minimize those dependency lengths. 00:09:54.580 |
It's easier for me to say things if the words that are connecting for their meaning are close together. 00:09:59.300 |
It's easier for you in understanding if that's also true. 00:10:03.500 |
If they're far away, it's hard to produce that, and it's hard for you to understand. 00:10:08.820 |
And the languages of the world, within a language and across languages, fit that generalization, 00:10:14.020 |
which is, so it turns out that having verbs initial and then having prepositions ends up making dependencies shorter. 00:10:23.140 |
And having verbs final and having postpositions ends up making dependencies shorter than if 00:10:29.500 |
you cross them. If you cross them, it's possible, 00:10:35.460 |
it just ends up with longer dependencies than if you didn't. 00:10:43.900 |
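As a rough sketch of that idea (an editorial illustration, not something from the conversation), you can count dependency lengths directly: give every word the index of its head word and sum the distances. The toy sentence and head indices below are assumptions for the example.

```python
def total_dependency_length(heads):
    """Sum of linear distances between each word and its head.
    heads[i] is the index of word i's head, or None for the root."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

# Verb before object, with a preposition (harmonic):
# "Mary kicked the ball to John" -- kicked is the root
harmonic = [1, None, 3, 1, 1, 4]

# Same dependencies, but with a postposition instead (crossed):
# "Mary kicked the ball John to"
crossed = [1, None, 3, 1, 5, 1]

print(total_dependency_length(harmonic))  # 8
print(total_dependency_length(crossed))   # 9 -- mixing the orders costs length
```

Even in this tiny example, the harmonic order comes out shorter, which is the direction of the generalization being described.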
So it was observed a long time ago, without the explanation, by a guy called Joseph Greenberg, 00:10:53.460 |
He observed a lot of generalizations about how word order works, and these are some of 00:10:57.220 |
the harmonic generalizations that he observed. 00:11:12.140 |
- Well, what I mean is, in language, there's kind of three structures, three components: there's the sounds, 00:11:23.940 |
I'm not talking about that part, I'm talking, then there's two meaning parts, and those 00:11:27.420 |
are the words, and you were talking about meaning earlier. 00:11:30.420 |
So words have a form, and they have a meaning associated with them, and so cat is a full 00:11:35.300 |
form in English, and it has a meaning associated with whatever a cat is. 00:11:38.760 |
And then the combinations of words, that's what I'll call grammar or syntax, and that's 00:11:45.180 |
like when I have a combination like "the cat" or "two cats," okay? 00:11:49.660 |
So where I take two different words there and put them together, and I get a compositional 00:11:54.960 |
meaning from putting those two different words together, and so that's the syntax. 00:11:59.180 |
And in any sentence or utterance, whatever I'm talking to you, you're talking to me, 00:12:04.340 |
we have a bunch of words and we're putting them together in a sequence; it turns out they 00:12:08.980 |
are connected so that every word is connected to just one other word in that sentence. 00:12:17.020 |
And so you end up with what's called technically a tree, it's a tree structure, where there's 00:12:21.460 |
a root of that utterance, of that sentence, and then there's a bunch of dependents, like 00:12:27.740 |
branches from that root that go down to the words. 00:12:31.140 |
The words are the leaves in this metaphor for a tree. 00:12:34.700 |
So a tree is also sort of a mathematical construct. 00:12:37.180 |
Yeah, yeah, it's a graph theoretical thing, exactly. 00:12:40.180 |
So it's fascinating that you can break down a sentence into a tree, and then every word 00:12:45.380 |
is hanging on to another, it's depending on it. 00:12:48.820 |
And everyone agrees on that, so all linguists will agree with that, that is not controversial. 00:12:53.380 |
There's nobody sitting here listening mad at you. 00:12:59.700 |
I think in every language, I think everyone agrees that all sentences are trees at some level. 00:13:07.540 |
'Cause it, to me, just as a layman, it's surprising that you can break down sentences into trees like that. 00:13:19.820 |
I've never heard of anyone disagreeing with that. 00:13:22.500 |
The details of the trees are what people disagree about. 00:13:25.580 |
Well, okay, so what's at the root of a tree, how do you construct, how hard is it, what 00:13:30.860 |
is the process of constructing a tree from a sentence? 00:13:34.180 |
Well, this is where, you know, depending on what your, there's different theoretical notions. 00:13:38.420 |
I'm gonna say the simplest thing, dependency grammar. 00:13:41.380 |
It's like a bunch of people invented this; Tesnière was the first, a French guy, back in, 00:13:46.060 |
I mean, the paper was published in 1959, but he was working in the '30s and stuff, so, 00:13:50.900 |
and it goes back to, you know, the philologist Pāṇini was doing this in ancient India, okay? 00:13:57.980 |
And so, you know, doing something like this, the simplest thing we can think of is that 00:14:02.420 |
there's just connections between the words to make the utterance. 00:14:06.420 |
And so, let's just say I have, like, two dogs entered a room, okay, here's a sentence. 00:14:11.980 |
And so, we're connecting two and dogs together, that's like, there's some dependency between 00:14:17.100 |
those words to make some bigger meaning, and then we're connecting dogs now to entered, 00:14:23.860 |
And we connect a room somehow to entered, and so I'm gonna connect room to entered, and then a to room. 00:14:30.580 |
That's the tree, is I, the root is entered, that's, the thing is like an entering event, 00:14:35.060 |
that's what we're saying here, and the subject, which is whatever that dog is, is two dogs, 00:14:39.740 |
it was, and the connection goes back to dogs, and then that goes back to two. 00:14:46.780 |
It starts at entered, goes to dogs, down to two, and then the other side, after the verb, 00:14:52.300 |
the object, it goes to room, and then that goes back to the determiner or article, whatever "a" is. 00:14:58.500 |
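To make that tree concrete (again my own sketch, not Gibson's notation), the structure he just walked through can be written as a simple head map and printed:

```python
# Each word points to exactly one head word; "entered" is the root.
head = {"two": "dogs", "dogs": "entered", "a": "room",
        "room": "entered", "entered": None}

def print_tree(word, depth=0):
    print("  " * depth + word)
    for child, h in head.items():
        if h == word:
            print_tree(child, depth + 1)

print_tree("entered")
# entered
#   dogs
#     two
#   room
#     a
```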
So, there's a bunch of categories of words here, we're noticing, so there are verbs, 00:15:02.780 |
those are these things that typically mark, they refer to events and states in the world, 00:15:08.660 |
and there are nouns, which typically refer to people, places, and things, is what people 00:15:12.660 |
say, but they can refer to other more, they can refer to events themselves as well. 00:15:17.060 |
They're marked by, you know, how they get used; the category, the part of speech of a word 00:15:25.700 |
is defined by its distribution. It's like, that's how you decide what the category of a word is, not by the meaning. 00:15:32.580 |
What's usually the root, is it gonna be the verb that defines the event? 00:15:39.220 |
- Yeah, I mean, if I don't say a verb, then there won't be a verb, and so it'll be something else. 00:15:43.380 |
- What if you're messing, are we talking about language that's like, correct language? 00:15:46.300 |
What if you're doing poetry and messing with stuff, is it, then rules go out the window, 00:15:55.020 |
- No, no, no, no, you're constrained by whatever language you're dealing with. 00:15:56.700 |
Probably you have other constraints in poetry, such that you're, like usually in poetry there's 00:16:00.900 |
multiple constraints that you want to, like you want to usually convey multiple meanings 00:16:04.820 |
is the idea, and maybe you have like a rhythm or a rhyming structure as well, and depending, 00:16:09.980 |
but you usually are constrained by your, the rules of your language for the most part, 00:16:17.620 |
You can violate them somewhat, but not too much, so it has to be recognizable as your 00:16:23.300 |
Like in English, I can't say, "Dogs two entered room a." 00:16:27.500 |
I mean, I meant that, you know, two dogs entered a room, and I can't mess with the order of 00:16:33.540 |
the articles, the articles and the nouns, you just can't do that. 00:16:37.420 |
In some languages, you can mess around with the order of words much more. 00:16:42.380 |
I mean, you speak Russian, Russian has a much freer word order than English, and so in fact 00:16:46.900 |
you can move around words in, you know, I told you that English has a subject, verb, 00:16:51.540 |
object, word order, so does Russian, but Russian is much freer than English, and so you can 00:16:56.320 |
actually mess around with the word order, so probably Russian poetry is gonna be quite 00:17:00.740 |
different from English poetry because the word order is much less constrained. 00:17:04.820 |
- Yeah, there's a much more extensive culture of poetry throughout the history of the last 00:17:10.540 |
hundred years in Russia, and I always wondered why that is, but it seems that there's more freedom there. 00:17:20.100 |
You can morph the language more easily by altering the words, altering the order of the words, 00:17:26.340 |
- Well, you can just mess with different things in each language, and so in Russian, you have 00:17:29.780 |
case markers, which are just these endings on the nouns which tell you how each noun connects to the verb. 00:17:37.100 |
We don't have that in English, and so when I say Mary kissed John, I don't know who the 00:17:42.820 |
agent or the patient is except by the order of the words, right? 00:17:46.220 |
In Russian, you actually have a marker on the end if you're using a Russian name, and 00:17:49.820 |
each of those names, you'll also say is it, you know, it'll be the nominative, which is 00:17:55.660 |
marking the subject, or an accusative will mark the object. 00:18:00.780 |
You could put accusative first, you could put subject, you could put the patient first, 00:18:07.340 |
and then the verb, and then the subject, and that would be a perfectly good Russian sentence, 00:18:11.660 |
and it would still mean, I could say John kissed Mary, meaning Mary kissed John, as 00:18:17.900 |
long as I use the case markers in the right way. 00:18:21.020 |
- I love the terminology of agent and patient, and the other ones you used, those are sort 00:18:29.320 |
- Those are, those are for, like, kind of meaning, those are meaning, and subject and 00:18:32.540 |
object are generally used for position, so subject is just like the thing that comes 00:18:37.140 |
before the verb, and the object is the one that comes after the verb. 00:18:40.180 |
The agent is kind of like the thing doing, that's kind of what that means, right? 00:18:44.380 |
The subject is often the person doing the action, right, the thing, so yeah. 00:18:49.780 |
So how hard is it to form a tree in general, is there a procedure to it, like if you look 00:18:55.260 |
at different languages, is it supposed to be a very natural, like is it automatable, 00:18:59.140 |
or is there some human genius involved in constructing it? 00:19:01.220 |
- I think it's pretty automatable at this point. 00:19:03.820 |
People can figure out what the words are, they can figure out the morphemes, which are 00:19:05.860 |
the, technically, morphemes are the minimal meaning units within a language, okay? 00:19:10.980 |
And so, when you say eats, or drinks, it actually has two morphemes in it in English, there's 00:19:16.300 |
the root, which is the verb, and then there's some ending on it which tells you, you know, 00:19:20.540 |
that's this third person, third person singular. 00:19:24.860 |
- Morphemes are just the minimal meaning units within a language, and a word is just, kind 00:19:28.060 |
of the things we put spaces between in English, and they're a little bit more, they have the 00:19:31.940 |
morphology as well, they have the endings, this inflectional morphology on the endings 00:19:37.140 |
- It modifies something about the word that adds additional meaning. 00:19:40.100 |
- Yeah, yeah, yeah, and so we have a little bit of that in English, very little, much 00:19:43.300 |
more in Russian, for instance, but we have a little bit in English, and so we have a 00:19:47.340 |
little on the nouns, you can say it's either singular or plural, and you can say, same 00:19:52.220 |
thing for verbs, like simple past tense, for example, it's like, you know, notice in English 00:19:58.100 |
we say drinks, you know, he drinks, but everyone else is I drink, you drink, we drink, it's 00:20:02.860 |
unmarked in a way, and then, but in the past tense, it's just drank, for everyone, there's no person marking. 00:20:09.820 |
There is morphology, it's marking past tense, but it's kind of, it's an irregular now, so 00:20:13.820 |
we don't even, you know, drink to drank, you know, it's not even a regular word, so in 00:20:17.720 |
most verbs, many verbs, there's an -ed, we kind of add, so walk to walked, we add that 00:20:22.380 |
to say it's the past tense, that I just happened to choose an irregular, 'cause it's a high-frequency 00:20:26.480 |
word, and the high-frequency words tend to have irregulars in English. 00:20:31.380 |
- Irregular is just, there isn't a rule, so drink to drank, it's an irregular. 00:20:37.260 |
- As opposed to walk, walked, talked, talked. 00:20:40.020 |
- And there's a lot of irregulars in English. 00:20:44.180 |
The frequent ones, the common words, tend to be irregular, there's many, many more low-frequency 00:20:50.760 |
words, and those tend to be, those are regular ones. 00:20:53.120 |
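A tiny sketch of that regular-versus-irregular split (my illustration; the word list is just for the example): store the high-frequency irregulars whole, and fall back to the -ed rule for everything else.

```python
# High-frequency irregular past tenses are stored whole;
# everything else falls back to the regular -ed rule.
IRREGULAR_PAST = {"drink": "drank", "eat": "ate", "go": "went"}

def past_tense(verb):
    return IRREGULAR_PAST.get(verb, verb + "ed")

print(past_tense("walk"))   # walked (regular rule)
print(past_tense("drink"))  # drank  (stored irregular)
```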
- The evolution of the irregulars are fascinating, 'cause it's essentially slang that's sticky, 00:20:57.240 |
'cause you're breaking the rules, and then everybody uses it and doesn't follow the rules, 00:21:01.960 |
and they say screw it to the rules, it's fascinating, so you said morphemes, lots of questions, 00:21:07.840 |
so morphology is what, the study of morphemes? 00:21:11.040 |
- Morphology is the connections between the morphemes and the roots. 00:21:14.880 |
So in English, we mostly have suffixes, we have endings on the words, not very much, 00:21:22.000 |
Some words, depending on your language, can have mostly prefixes, mostly suffixes, or 00:21:28.120 |
mostly, or both, and then even languages, several languages have things called infixes, 00:21:32.840 |
where you have some kind of a general form for the root, and you put stuff in the middle, 00:21:42.760 |
- That's fascinating, that's fascinating, so in general, there's what, two morphemes per word? 00:21:51.760 |
- Well in English, it's one or two, in English, it tends to be one or two, there can be more. 00:21:56.640 |
In other languages, a language like Finnish, which has a very elaborate morphology, there 00:22:02.800 |
may be 10 morphemes on the end of a root, and so there may be millions of forms of a given word. 00:22:09.720 |
- Okay, I will ask the same question over and over, but how does, just sometimes to 00:22:18.800 |
understand things like morphemes, it's nice to just ask the question, how do these kinds of things evolve? 00:22:26.480 |
So you have a great book studying sort of the cognitive processing of language, how language is 00:22:35.320 |
used for communication, so the mathematical notion of how effective language is for communication, 00:22:40.360 |
what role that plays in the evolution of language, but just high level, like how do we, how does 00:22:46.120 |
a language evolve with, where English is two morphemes, or one or two morphemes per word, 00:22:54.760 |
So what, how does that happen, is it just people? 00:23:00.600 |
That's a very good question, is why do languages have more morphology versus less morphology, 00:23:06.640 |
and I don't think we know the answer to this. 00:23:08.520 |
I think there's just a lot of good solutions to the problem of communication. 00:23:13.440 |
So I believe, as you hinted, that language is an invented system by humans for communicating 00:23:22.080 |
their ideas, and I think it comes down to we label the things we want to talk about, 00:23:26.560 |
those are the morphemes and words, those are the things we want to talk about in the world, 00:23:30.320 |
and we invent those things, and then we put them together in ways that are easy for us to produce and understand. 00:23:38.120 |
But that's like a naive view, and I don't, I mean, I think it's probably right, right? 00:23:43.960 |
- One has to notice, I don't know if it's naive, I think it's simple. 00:23:48.640 |
- Naive is an indication that it's incorrect somehow, it's a trivial, too simple, I think 00:23:56.720 |
But it's interesting how sticky, it feels like two people got together, it just feels 00:24:03.400 |
like once you figure out certain aspects of a language, that just becomes sticky and the 00:24:07.480 |
tribe forms around that language, or maybe the language, maybe the tribe forms first 00:24:11.800 |
and then the language evolves, and then you just kind of agree and you stick to whatever 00:24:16.560 |
- These are very interesting questions, we don't know really about how words, even words, 00:24:22.720 |
get invented very much about, we don't really, I mean, assuming they get invented, we don't 00:24:28.640 |
really know how that process works and how these things evolve. 00:24:31.280 |
What we have is kind of a current picture, a current picture of a few thousand languages, 00:24:40.960 |
We don't have any pictures of really how these things are evolving, really. 00:24:45.960 |
And then the evolution is massively confused by contact, right? 00:24:52.260 |
So as soon as one language group, one group runs into another, we are smart, humans are 00:24:58.640 |
smart and they take on whatever is useful in the other group. 00:25:02.780 |
And so any kind of contrast which you're talking about, which I find useful, I'm gonna start using. 00:25:09.480 |
And I worked a little bit in specific areas of words, in number words and in color words. 00:25:16.240 |
And in color words, so we have, in English, we have around 11 words that everyone knows for colors. 00:25:25.080 |
And many more, if you happen to be interested in color for some reason or other, if you're 00:25:29.520 |
a fashion designer or an artist or something, you may have many, many more words. 00:25:33.800 |
But we can see millions, like if you have normal color vision, normal trichromatic color 00:25:38.960 |
vision, you can see millions of distinctions in color. 00:25:43.200 |
The most efficient, no, the most detailed color vocabulary would have over a million 00:25:49.240 |
terms to distinguish all the different colors that we can see. 00:25:53.760 |
So it's somehow, it's kind of useful for English to have evolved in some way to, there's 11 00:26:01.560 |
terms that people find useful to talk about, black, white, red, blue, green, yellow, purple, 00:26:08.920 |
brown, gray, pink, and I probably missed something there. 00:26:11.440 |
Anyway, there's 11 that everyone knows, but you go to different cultures, especially the 00:26:17.960 |
non-industrialized cultures, and there'll be many fewer. 00:26:21.080 |
So some cultures will have only two, believe it or not. 00:26:25.080 |
The Dani in Papua New Guinea have only two labels that the group uses for color. 00:26:33.000 |
They are very, very dark and very, very light, which are roughly black and white. 00:26:36.800 |
And you might think, oh, they're dividing the whole color space into light and dark 00:26:43.120 |
They mostly just only label the black and the white things. 00:26:46.040 |
They just don't talk about the colors for the other ones. 00:26:50.320 |
I worked with a group called the Tsimane' down in Bolivia in South America, and they have 00:26:56.920 |
three words that everyone knows, but there's a few others that many people know. 00:27:04.160 |
And so they have, kind of depending on how you count, between three and seven words that people use. 00:27:15.120 |
And red, red is, that tends to be the third word that everyone, that cultures bring in. 00:27:21.120 |
If there's a word, it's always red, the third one. 00:27:23.480 |
And then after that, it's kind of all bets are off about what they bring in. 00:27:26.560 |
And so after that, they bring in a sort of a big blue-green group. 00:27:34.120 |
And then different people have different words that they'll use for other parts of the space. 00:27:39.040 |
And so anyway, it's probably related to what they want to talk about, not what they see, because 00:27:47.880 |
So it's not like they have a weak, a low color palette in the things they're looking at. 00:27:54.320 |
They're looking at a lot of beautiful scenery, a lot of different colored flowers and berries 00:28:02.600 |
And so there's lots of things of very bright colors, but they just don't label the color 00:28:08.400 |
And the reason, probably, we don't know this, but we think probably what's going on here 00:28:12.880 |
is that what you do, why you label something, is you need to talk to someone else about 00:28:20.080 |
Well, if I have two things which are identical, and I want you to give me the one that's different, 00:28:26.280 |
and the only way it varies is color, then I invent a word which tells you, "This is the one I want." 00:28:32.640 |
So I want the red sweater off the rack, not the green sweater. 00:28:36.360 |
And so those things will be identical, because these are things we made, and they're dyed, 00:28:42.680 |
And so in industrialized society, everything we've got is pretty much arbitrarily colored. 00:28:50.640 |
But if you go to a non-industrialized group, that's not true. 00:28:53.520 |
And so they don't, it's not only that they're not interested in color, if you bring bright 00:28:57.480 |
colored things to them, they like them just like we like them. 00:29:01.080 |
Bright colors are great, they're beautiful, but they just don't need to, no need to talk 00:29:08.080 |
- So probably color words are a good example of how language evolves from sort of function, 00:29:13.320 |
when you need to communicate the use of something. 00:29:16.800 |
- Then you kind of invent different variations, and basically, you can imagine that the evolution 00:29:22.200 |
of a language has to do with what the early tribe's doing, like what kind of problems 00:29:27.680 |
are facing them, and they're quickly figuring out how to efficiently communicate the solution 00:29:32.720 |
to those problems, whether it's aesthetic or functional, all that kind of stuff, running 00:29:39.600 |
But I think what you're pointing to is that we don't have data on the evolution of language, 00:29:45.840 |
because many languages were formed a long time ago, so you don't get the chatter. 00:29:50.160 |
We have a little bit of Old English to Modern English, because there was a writing system, 00:29:58.680 |
So the word order changed, for instance, from Old English to Middle English to Modern English, 00:30:02.440 |
and so we can see things like that, but most languages don't even have a writing system. 00:30:07.080 |
So of the 7,000, only a small subset of those have a writing system, and even if they have 00:30:13.120 |
a writing system, it's not a very modern writing system, and so they don't have it, so we just 00:30:17.360 |
basically have, for Mandarin, for Chinese, we have a lot of evidence for a long time, 00:30:25.600 |
For German a little bit, but not for a whole lot of long-term language evolution. 00:30:32.240 |
We just have snapshots, is what we've got, of current languages. 00:30:34.960 |
- Yeah, you get an inkling of that from the rapid communication on certain platforms, 00:30:40.640 |
There's different communities, and they'll come up with different slang. 00:30:44.200 |
Especially, from my perspective, driven by a little bit of humor, or maybe mockery or 00:30:49.400 |
whatever, just talking shit in different kinds of ways, and you could see the evolution of 00:30:57.040 |
language there, because I think a lot of things on the internet, you don't want to be the 00:31:03.920 |
boring mainstream, so you want to deviate from the proper way of talking, and so you 00:31:11.960 |
get a lot of deviation, rapid deviation, and then when communities collide, you get, just 00:31:18.240 |
like you said, humans adapt to it, and you can see it through the lens of humor. 00:31:22.240 |
It's very difficult to study, but you can imagine 100 years from now, if there's a new 00:31:26.420 |
language born, for example, we'll get really high-resolution data. 00:31:33.100 |
All languages change all the time, so there's the famous result about the Queen's English. 00:31:40.680 |
So if you look at the Queen's vowels, the Queen's English is supposed to be, originally 00:31:45.520 |
the proper way to talk was sort of defined by however the Queen talked, or the King, 00:31:50.080 |
whoever was in charge, and so if you look at how her vowels changed from when she first 00:31:57.800 |
became Queen in 1952, '53, when she was coronated, I mean, that's Queen Elizabeth who died recently, 00:32:03.040 |
of course, until 50 years later, her vowels changed, her vowels shifted a lot. 00:32:08.240 |
And so even in the sounds of British English, in her, the way she was talking was changing. 00:32:16.800 |
So that's just, in the sounds, there's change. 00:32:19.280 |
I don't know what's, I'm interested, we're all interested in what's driving any of these 00:32:25.040 |
The word order of English changed a lot over 1,000 years, right? 00:32:28.400 |
So it used to look like German, it used to be a verb-final language with case marking, 00:32:33.880 |
and it shifted to a verb-medial language, a lot of contact, so a lot of contact with 00:32:38.120 |
French, and it became a verb-medial language with no case marking. 00:32:48.200 |
- It totally evolved, and so it may very well, I mean, it doesn't evolve maybe very much 00:32:52.240 |
in 20 years, is maybe what you're talking about, but over 50 and 100 years, things change 00:32:57.600 |
- We'll now have good data on it, which is great. 00:33:01.200 |
- Can you talk to what is syntax and what is grammar? 00:33:06.920 |
You were asking me before about how do I figure out what a dependency structure is. 00:33:10.600 |
I'd say the dependency structures aren't that hard to, generally, I think there's a lot 00:33:14.760 |
of agreement of what they are for almost any sentence in most languages. 00:33:22.680 |
There are other parameters in the mix such that some people think there's a more complicated structure. 00:33:30.080 |
And so, you know, like Noam Chomsky, he's the most famous linguist ever, and he is famous 00:33:36.760 |
for proposing a slightly more complicated syntax. 00:33:43.720 |
So he's well-known for many, many things, but in the '50s and early '60s, but late '50s, 00:33:50.120 |
he was basically figuring out what's called formal language theory. 00:33:54.420 |
And he figured out sort of a framework for figuring out how complicated a certain type 00:34:01.480 |
of language might be, so-called phrase structure grammars of language might be. 00:34:06.120 |
And so his idea was that maybe we can think about the complexity of a language by how complex the rules are that generate it. 00:34:18.720 |
They will have a left-hand side and they'll have a right-hand side. 00:34:22.840 |
And on the left-hand side, we'll expand to the thing on the right-hand side. 00:34:25.560 |
So say we'll start with an S, which is like the root, which is a sentence, okay? 00:34:30.640 |
And then we're going to expand to things like a noun phrase and a verb phrase, as he called them. 00:34:36.800 |
An S goes to an NP and a VP is a kind of a phrase structure rule. 00:34:42.280 |
An NP is a determiner and a noun, for instance, and a verb phrase is something else, is a 00:34:47.960 |
verb and another noun phrase and another NP, for instance. 00:34:50.920 |
Those are the rules of a very simple phrase structure, okay? 00:34:55.280 |
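As a sketch of how such rules generate sentences (an editorial example; the toy lexicon is made up), here is that three-rule grammar run as a random generator:

```python
import random

# The three phrase structure rules just described, plus a toy lexicon.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"], ["two"]],
    "N":   [["dog"], ["dogs"], ["room"]],
    "V":   [["entered"], ["chased"]],
}

def expand(symbol):
    if symbol not in grammar:             # a terminal word
        return [symbol]
    rhs = random.choice(grammar[symbol])  # expand regardless of context
    return [word for part in rhs for word in expand(part)]

print(" ".join(expand("S")))  # e.g. "the dog chased a room"
```

Note that a grammar this simple happily produces "two dog entered the room": number agreement needs more machinery than these context-free rules provide.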
And so he proposed phrase structure grammar as a way to sort of cover human languages. 00:35:01.120 |
And then he actually figured out that, well, depending on the formalization of those grammars, 00:35:05.080 |
you might get more complicated or less complicated languages. 00:35:08.280 |
And so he said, well, these are things called context-free languages that those rules generate. 00:35:14.480 |
He thought human languages would tend to be what he calls context-free languages. 00:35:20.040 |
But there are simpler languages, which are so-called regular languages, and they have 00:35:23.760 |
a more constrained form to the rules of the phrase structure of these particular rules. 00:35:28.840 |
So he basically discovered and kind of invented ways to describe the language, and those are these formal grammars. 00:35:38.640 |
And he was mostly interested in English initially in his work in the '50s. 00:35:44.600 |
So formal language theory is the big field of just studying language formally. 00:35:49.320 |
- Yes, and it doesn't have to be human language there. 00:35:51.440 |
We can have computer languages, any kind of system which is generating some set of expressions 00:36:01.880 |
And those could be like the statements in a computer language, for example. 00:36:08.280 |
So it could be that or it could be human language. 00:36:10.240 |
- So technically you can study programming languages. 00:36:16.480 |
There's a big field of programming languages within formal language theory. 00:36:20.600 |
- Okay, and then phrase structure grammar is this idea that you can break down language into these rules? 00:36:28.920 |
- It's a particular formalism for describing language. 00:36:35.120 |
He's the one who figured that stuff out back in the '50s. 00:36:41.720 |
The context-free grammar is kind of equivalent in the sense that it generates the same sentences as a dependency grammar. 00:36:49.400 |
The dependency grammar is a little simpler in some way. 00:36:51.720 |
You just have a root, and it goes down from there; we don't have any of these extra phrasal categories. 00:36:59.440 |
The phrase structure grammar is kind of a different way to think about the dependency 00:37:04.960 |
It's slightly more complicated, but it's kind of the same in some ways. 00:37:07.720 |
- So to clarify, dependency grammar is the framework under which you see language, and 00:37:13.840 |
you make the case that this is a good way to describe language. 00:37:20.920 |
Someone is very upset right now, so let's, just kidding. 00:37:24.280 |
But what's the difference between, where's the place of disagreement between phrase structure 00:37:34.480 |
So phrase structure grammar and dependency grammar aren't that far apart. 00:37:38.240 |
I like dependency grammar because it's more perspicuous. 00:37:42.500 |
It's more transparent about representing the connections between the words. 00:37:46.200 |
It's just a little harder to see in phrase structure grammar. 00:37:49.100 |
The place where Chomsky sort of devolved or went off from this is he also thought there was movement. 00:38:01.220 |
That's the place where I would say we disagree. 00:38:03.440 |
And I mean, maybe we'll get into that later, but the idea is, if you wanna, do you want me to explain movement? 00:38:13.040 |
Okay, so here's the, movement is, Chomsky basically sees English and he says, okay, 00:38:17.020 |
I said, we had that sentence earlier, it was like two dogs entered the room, but it's changed 00:38:22.380 |
a little bit, say, two dogs will enter the room. 00:38:25.180 |
And he notices that, hey, English, if I wanna make a question, a yes/no question from that 00:38:30.660 |
same sentence, I say, instead of two dogs will enter the room, I say, will two dogs 00:38:36.060 |
Okay, there's a different way to say the same idea, and it's like, well, the auxiliary verb 00:38:40.780 |
that will thing, it's at the front as opposed to in the middle, okay? 00:38:45.600 |
And so, and he looked, if you look at English, you see that that's true for all those modal 00:38:50.660 |
verbs and for other kinds of auxiliary verbs in English, you always do that, you always 00:38:54.460 |
put an auxiliary verb at the front, and when he saw that, so if I say, I can win this bet, 00:39:01.560 |
can I win this bet, right, so I move a can to the front. 00:39:04.580 |
So actually, that's a theory, I just gave you a theory there, he talks about it as movement, 00:39:09.740 |
that word in the declarative is the root, is the sort of default way to think about 00:39:14.540 |
the sentence, and you move the auxiliary verb to the front. 00:39:17.940 |
That's a movement theory, okay, and he just thought that was just so obvious that it must 00:39:23.260 |
be true, that there's nothing more to say about that, that this is how auxiliary verbs 00:39:29.980 |
There's a movement rule, such that you move, like to get from the declarative to the interrogative, 00:39:35.080 |
you're moving the auxiliary to the front, and it's a little more complicated as soon 00:39:38.060 |
as you go to simple present and simple past, because if I say John slept, you have to say 00:39:45.100 |
did John sleep, not slept John, right, and so you have to somehow get an auxiliary verb in there, 00:39:49.900 |
and I guess underlyingly, slept is like did sleep; it's a little more complicated than that, 00:39:54.660 |
but that's his idea, there's a movement, okay, and so a different way to think about that, 00:39:59.380 |
that isn't, I mean, then he ended up showing later, so he proposed this theory of grammar, 00:40:04.580 |
which has movement, and there's other places where he thought there's movement, not just 00:40:07.660 |
auxiliary verbs, but things like the passive in English and things like questions, WH questions, 00:40:14.340 |
a bunch of places where he thought there's also movement going on, and in each one of 00:40:19.220 |
those, he thinks there's words, well, phrases and words are moving around from one structure 00:40:23.260 |
to another, which he called deep structure to surface structure, I mean, there's like 00:40:26.300 |
two different structures in his theory, okay. 00:40:29.860 |
There's a different way to think about this, which is there's no movement at all, there's 00:40:34.540 |
a lexical copying rule, such that the word will or the word can, these auxiliary verbs, 00:40:41.380 |
they just have two forms, and one of them is the declarative and one of them is interrogative, 00:40:46.580 |
and you basically have the declarative one, and oh, I form the interrogative, or I can 00:40:50.860 |
form one from the other, doesn't matter which direction you go, and I just have a new entry, 00:40:55.900 |
which has the same meaning, which has a slightly different argument structure, argument structure 00:41:00.820 |
is just a fancy word for the ordering of the words, and so if I say, it was the dogs, two 00:41:07.540 |
dogs can or will enter the room, there's two forms of will, one is will declarative, and 00:41:16.220 |
then okay, I've got my subject to the left, it comes before me, and the verb comes after 00:41:20.660 |
me in that one, and then the will interrogative, it's like, oh, I go first, interrogative, 00:41:25.940 |
will is first, and then I have the subject immediately after, and then the verb after 00:41:29.860 |
that, and so you can just generate from one of those words, another word with a slightly 00:41:35.020 |
different argument structure, with different ordering. 00:41:37.820 |
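A minimal sketch of that lexical-copying idea (my own encoding, not a formalism from Sag or Gibson): "will" simply has two stored entries with the same meaning but different argument orders, so no movement rule is needed.

```python
# Two stored entries for "will": same meaning, different word orders.
lexicon = {
    "will_decl": {"meaning": "FUTURE", "order": ["Subj", "will", "Verb"]},
    "will_int":  {"meaning": "FUTURE", "order": ["will", "Subj", "Verb"]},
}

def realize(entry, subj, verb):
    slots = {"Subj": subj, "Verb": verb}
    return " ".join(slots.get(token, token) for token in lexicon[entry]["order"])

print(realize("will_decl", "two dogs", "enter the room"))
# two dogs will enter the room
print(realize("will_int", "two dogs", "enter the room"))
# will two dogs enter the room
```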
- And these are just lexical copies, they're not necessarily moving from one to another. 00:41:43.140 |
- There's a romantic notion that you have one main way to use a word, and then you could 00:41:48.940 |
move it around, which is essentially what movement is implying. 00:41:52.820 |
- Yeah, but that's the lexical copying is similar, so then we do lexical copying for 00:41:58.420 |
that same idea, that maybe the declarative is the source, and then we can copy it, and 00:42:03.100 |
so an advantage, there's multiple advantages of the lexical copying story, it's not my 00:42:08.740 |
story, this is like, Ivan Sag, linguists, a bunch of linguists have been proposing these 00:42:14.140 |
stories as well, in tandem with the movement story, okay, Ivan Sag died a while ago, but 00:42:20.060 |
he was one of the proponents of the non-movement, the lexical copying story, and so that 00:42:24.900 |
is that, a great advantage is, well, Chomsky, really famously in 1971, showed that the movement 00:42:34.500 |
story leads to learnability problems, it leads to problems for how language is learned, it's 00:42:41.220 |
really, really hard to figure out what the underlying structure of a language is if you 00:42:45.940 |
have both phrase structure and movement, it's like really hard to figure out what came from 00:42:51.380 |
what, there's like a lot of possibilities there. 00:42:53.460 |
If you don't have that problem, the learning problem gets a lot easier. 00:42:57.220 |
- Just say there's lexical copies, and when we say the learning problem, do you mean humans learning a language? 00:43:03.140 |
- Yeah, just learning English, so a baby is lying around listening to me talk, and how 00:43:09.220 |
are they learning English, or maybe it's a two-year-old who's learning interrogatives 00:43:13.940 |
and stuff, how are they doing that, are they doing it from, are they figuring out, so Chomsky 00:43:20.940 |
said it's impossible to figure it out, actually, he said it's actually impossible, not hard, 00:43:26.300 |
but impossible, and therefore, that's where universal grammar comes from, is that it has 00:43:31.220 |
to be built in, and so what they're learning is, there's some built in, movement is built 00:43:37.140 |
in in his story, it's absolutely part of your language module, and then you're 00:43:44.380 |
just setting parameters; English is just sort of a variant of 00:43:48.340 |
the universal grammar, and you're figuring out, oh, which orders does English do these 00:43:52.940 |
things, that's, the non-movement story doesn't have this, it's like much more bottom up, 00:43:59.500 |
you're learning rules, you're learning rules one by one, and oh, there's, this word is 00:44:04.420 |
connected to that word, a great advantage, another advantage, it's learnable, another 00:44:08.880 |
advantage of it is that it predicts that not all auxiliaries might move, like it might 00:44:14.300 |
depend on the word, depending on whether you, and that turns out to be true, so there's 00:44:19.140 |
words that don't really work as auxiliary, they work in declarative and not in interrogative, 00:44:25.860 |
so I can say, I'll give you the opposite first, I can say, "Aren't I invited to the party?" 00:44:32.820 |
And that's an interrogative form, but it's not from, "I aren't invited to the party," 00:44:38.180 |
there is no, "I aren't," so that's interrogative only. 00:44:42.540 |
And then we also have forms like, "Ought," "I ought to do this," and I guess some older speakers can say, "Ought I to do this?" 00:44:55.780 |
I don't even think "ought" is great, but I mean, I totally recognize, "I ought to," 00:44:59.100 |
it's not too bad, actually, I can say, "I ought to do this," that sounds pretty good. 00:45:04.280 |
But "Ought I?", I don't know, it just sounds completely off to me. 00:45:08.500 |
Anyway, so there are variants here, and a lot of these words just work in one versus 00:45:13.100 |
the other, and that's fine under the lexical copying story, it's like, well, you just 00:45:17.660 |
learn the usage, whatever the usage is, is what you do with this word. 00:45:23.780 |
But it's a little bit harder in the movement story. 00:45:26.700 |
That's an advantage, I think, of lexical copying, and in all these 00:45:30.460 |
different places, there's all these usage variants which make the movement story a little harder to maintain. 00:45:39.980 |
So one of the main divisions here is the movement story versus the lexical copying story, that 00:45:43.940 |
has to do with the auxiliary words and so on, but if you rewind to the phrase structure grammar versus dependency grammar? 00:45:52.540 |
Those are equivalent in some sense, in that for any dependency grammar, I can generate 00:45:57.780 |
a phrase structure grammar which generates exactly the same sentences, I just like the 00:46:03.220 |
dependency grammar formalism because it makes something really salient, which is the lengths 00:46:11.020 |
of dependencies between words, which isn't so obvious in the phrase structure. 00:46:15.220 |
In the phrase structure, it's just kind of hard to see. 00:46:17.640 |
It's in there, it's just very, very, it's opaque. 00:46:21.060 |
- Technically, I think phrase structure grammar is mappable to dependency grammar. 00:46:29.580 |
- Yeah, for a particular dependency grammar, you can make a phrase structure grammar which 00:46:34.220 |
generates exactly those same sentences, and vice versa, but there are many phrase structure 00:46:39.340 |
grammars for which you can't really make a dependency grammar. 00:46:41.980 |
I mean, you can do a lot more in a phrase structure grammar, but you get many more 00:46:48.860 |
You can have more structure in there, and some people like that, and maybe there's value 00:46:55.180 |
- Well, for you, so we should clarify, so dependency grammar, it's just, well, one word 00:47:01.020 |
depends on only one other word, and you form these trees, and that makes, it really puts 00:47:07.220 |
priority on those dependencies, just like as a tree that you can then measure the distance 00:47:12.660 |
of the dependency from one word to the other. 00:47:15.140 |
They can then map to the cognitive processing of the sentences, how easy it is to understand and produce. 00:47:23.620 |
So, it just puts the focus on the mathematical distance of dependence between words. 00:47:35.460 |
- Just continue on the thread of Chomsky, 'cause it's really interesting, 'cause as you're 00:47:39.420 |
discussing disagreement, to the degree there's disagreement, you're also telling the history 00:47:44.440 |
of the study of language, which is really awesome. 00:47:47.220 |
So, you mentioned context-free versus regular. 00:47:50.660 |
Does that distinction come into play for dependency grammars? 00:47:57.420 |
I mean, regular languages are too simple for human languages. 00:48:04.380 |
But human languages, in the phrase structure world, are definitely at least context-free. 00:48:11.620 |
Maybe a little bit more, a little bit harder than that. 00:48:15.300 |
So, there's something called context-sensitive as well, where you can have more than one thing on the left-hand side. 00:48:22.860 |
In a context-free grammar, you have one thing on the left; this is like a bunch of formal language theory. 00:48:31.140 |
So, you have a left-hand side category, and you're expanding to anything on the right. 00:48:36.700 |
So, the idea is that that category on the left expands in independent of context to 00:48:40.660 |
those things, whatever they are on the right, doesn't matter what. 00:48:43.820 |
And a context-sensitive says, okay, I actually have more than one thing on the left. 00:48:50.140 |
I can tell you only in this context, maybe you have a left and a right context, or just 00:48:54.580 |
a left context or a right context; I have two or more things on the left, which tells you how that category expands in that context. 00:49:02.700 |
A regular language is just more constrained, and so it doesn't allow just anything on the right. 00:49:09.540 |
It allows very, basically, it's one very complicated rule, is kind of what a regular language is. 00:49:17.260 |
And so, it doesn't have any, what's it say, long-distance dependencies? 00:49:25.300 |
Yeah, recursion is where you, which is, human languages have recursion, they have embedding, 00:49:29.260 |
and you can't, well, it doesn't allow center-embedded recursion, which human languages have, which we can get into later. 00:49:39.460 |
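A small illustration of why center embedding matters here (my example sentences, not from the conversation): the pattern is n nouns followed by their n verbs in reverse order, like the classic aⁿbⁿ language, which a context-free grammar can generate but no regular (finite-state) grammar can for unbounded n.

```python
def center_embed(nouns, verbs):
    """nouns[i] pairs with verbs[i]; the verbs come out reversed,
    giving the N1 N2 ... V2 V1 shape of center embedding."""
    return " ".join(nouns + verbs[::-1])

print(center_embed(["the dog"], ["ran"]))
# the dog ran
print(center_embed(["the dog", "the cat"], ["ran", "chased"]))
# the dog the cat chased ran
print(center_embed(["the dog", "the cat", "the rat"], ["ran", "chased", "bit"]))
# the dog the cat the rat bit chased ran
```

Matching each noun to its verb requires a stack, which is exactly the memory a finite-state grammar lacks; a context-free grammar provides it.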
But, you know, the formal language stuff is a little aside, Chomsky wasn't proposing it 00:49:44.500 |
He was just pointing out that human languages are at least context-free, 'cause 00:49:49.380 |
that was kind of stuff we did for formal languages, and what he was most 00:49:52.960 |
interested in was human language, and that's like, the movement is where he sort 00:50:00.120 |
of set off on, I would say, a very interesting, but wrong foot. 00:50:04.740 |
It was kind of interesting, it's a very, I agree, it's a very interesting history. 00:50:08.040 |
So there's this, so he proposed multiple theories, in '57 and then '65; they all have 00:50:13.640 |
this framework, though, it was phrase structure plus movement, different versions of the phrase 00:50:18.020 |
structure and the movement in the '57 and '65 versions; these are the most famous original bits of Chomsky's work. 00:50:23.180 |
And then '71 is when he figured out that those lead to learning problems, that there's cases 00:50:27.540 |
where a kid could never figure out which rule, which set of rules was intended. 00:50:34.980 |
And so, and then he said, well, that means it's innate. 00:50:37.620 |
It's kind of interesting, he just really thought the movement was just so obviously true that 00:50:41.820 |
he couldn't, he didn't even entertain giving it up, it's just obvious, that's obviously right. 00:50:48.180 |
And it was later where people figured out that there's all these subtle ways in which 00:50:53.500 |
things, which look like generalizations aren't generalizations, and they, across the category, 00:50:58.820 |
they're word-specific, and they have, and they kind of work, but they don't work across 00:51:02.780 |
various other words in the category, and so it's easier to just think of these things as word-specific. 00:51:07.820 |
And I think he was very obsessed, I don't know, I'm guessing, that he just, he really 00:51:13.220 |
wanted this story to be simple in some sense, and language is a little more complicated than that. 00:51:18.940 |
He didn't like words, he never talks about words, he likes to talk about combinations of words. 00:51:23.940 |
And words are, you know, look up a dictionary, there's 50 senses for a common word, right? 00:51:28.900 |
The word "take" will have 30 or 40 senses in it. 00:51:32.060 |
So there'll be many different senses for common words. 00:51:35.400 |
And he just doesn't think about that, or he doesn't think that's language. 00:51:39.900 |
I think he doesn't think that's language, he thinks that words are distinct from combinations of words. 00:51:47.760 |
If you look at my brain in the scanner, while I'm listening to a language I understand, 00:51:54.180 |
and you compare, I can localize my language network in a few minutes, in like 15 minutes. 00:51:59.320 |
And what you do is I listen to a language I know, I listen to, you know, maybe some 00:52:03.000 |
language I don't know, or I listen to muffled speech, or I read sentences, and I read non-words, 00:52:09.180 |
like I can do anything like this, anything that's sort of really like English, and anything that's not. 00:52:13.700 |
So I've got something like it and not, and I've got a control. 00:52:16.660 |
And the voxels, which is just, you know, the 3D pixels in my brain that are responding 00:52:22.740 |
most is a language area, and that's this left-lateralized area in my head. 00:52:30.540 |
And wherever I look in that network, if you look for the combinations versus the words, you don't find a difference. 00:52:41.460 |
There are no areas that we know, I mean, that's, it's a little overstated right now. 00:52:46.940 |
At this point, the technology isn't great, it's not bad, but we have the best way to 00:52:51.980 |
figure out what's going on in my brain when I'm listening or reading language is to use fMRI. 00:53:02.140 |
So I can figure out where exactly these signals are coming from, pretty, you know, down to, 00:53:06.460 |
you know, millimeters, you know, cubic millimeters or smaller, okay? But the signal is based on blood oxygenation. 00:53:16.420 |
And oxygen takes a little while to get to those cells, so it takes on the order of seconds. 00:53:21.140 |
So I talk fast, I probably listen fast, and I can probably understand things really fast. 00:53:28.060 |
And so to say that we know what's going on, that the words, right now in that network, 00:53:34.620 |
our best guess is that whole network is doing something similar, but maybe different parts are doing somewhat different things. 00:53:43.900 |
We just don't have very good methods to figure that out, right, at this moment. 00:53:47.820 |
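To make the localizer logic concrete, here is a minimal cartoon in Python (an editor's illustration, not the lab's actual pipeline; the voxel counts, noise model, and threshold are all invented): for each voxel, compare the average response to intact language against a control condition and keep the voxels where language wins decisively.

```python
import numpy as np

# Toy localizer: per-voxel responses to intact language vs. a control
# condition (e.g., an unknown language or muffled speech). All numbers
# here are made up for illustration.
rng = np.random.default_rng(0)
n_voxels, n_trials = 1000, 40
language = rng.normal(0.0, 1.0, (n_voxels, n_trials))
control = rng.normal(0.0, 1.0, (n_voxels, n_trials))
language[:50] += 1.5  # pretend the first 50 voxels are language-selective

# Simple t-like contrast per voxel, then an arbitrary threshold.
diff = language.mean(axis=1) - control.mean(axis=1)
se = np.sqrt(language.var(axis=1) / n_trials + control.var(axis=1) / n_trials)
t = diff / se
language_network = np.where(t > 3.0)[0]
print(f"{len(language_network)} voxels selected as the 'language network'")
```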
And so since we're kind of talking about the history of the study of language, and you're 00:53:54.500 |
both at MIT, or were for a long time, what 00:53:59.180 |
interesting disagreements, what tensions of ideas, are there between you and Noam Chomsky? 00:54:03.860 |
And we should say that Noam was in the linguistics department, and you're, I guess for a time 00:54:10.660 |
were affiliated there, but primarily brain and cognitive science department, which is 00:54:16.140 |
another way of studying language, and you've been talking about fMRI. 00:54:19.940 |
So what, is there something else interesting to bring to the surface about the disagreement 00:54:25.700 |
between the two of you, or other people in the discipline? 00:54:28.980 |
- Yeah, I mean, I've been at MIT for 31 years, since 1993, and Chomsky's been there much longer. 00:54:36.860 |
So I met him, I knew him, when I first got there, I guess, and we would interact. 00:54:44.220 |
So I'd say our biggest difference is our methods, and so that's the biggest difference between 00:54:52.300 |
me and Noam, is that I gather data from people. 00:54:57.820 |
I do experiments with people, and I gather corpus data, whatever corpus data's available, 00:55:02.940 |
and we do quantitative methods to evaluate any kind of hypothesis we have. 00:55:09.900 |
And so, he has never once been associated with any experiment or corpus work, ever. 00:55:19.600 |
It's his own intuitions, so I just don't think that's the way to do things. 00:55:25.720 |
That's a they're-across-the-street-from-us kind of difference between brain and cog sci and linguistics. 00:55:32.260 |
I mean, not all linguists; some of the linguists, depending on what you do, the more speech-oriented ones, 00:55:37.020 |
they do more quantitative stuff. But in meaning, words and, well, combinations 00:55:43.100 |
of words, syntax, semantics, they tend not to do experiments and corpus analysis. 00:55:49.420 |
- So on the linguistics side, probably, well, the method is a symptom of a bigger approach, 00:55:56.020 |
which for Noam is sort of a psychology/philosophy side, and for you, it's more sort of data-driven, empirical. 00:56:08.500 |
Brain and cognitive science is MIT's old psychology department. 00:56:12.060 |
It was a psychology department up until 1985, and it became the Brain and Cognitive Science Department. 00:56:17.000 |
And so, I mean, my training is math and computer science, but I'm a psychologist. 00:56:27.380 |
- I don't know what I am, but I'm happy to be called a linguist, I'm happy to be called 00:56:30.540 |
a computer scientist, I'm happy to be called a psychologist, any of those things. 00:56:33.980 |
- In the actual, like how that manifests itself outside of the methodology is like these differences, 00:56:39.660 |
these subtle differences about the movement story versus the lexical copy story. 00:56:45.640 |
So the theories differ, but I think the reason we differ is in part because of how we evaluate them. 00:56:52.980 |
And so I evaluate theories quantitatively, and Noam doesn't. 00:56:59.380 |
Okay, well, let's explore the theories that you explore in your book. 00:57:04.420 |
Let's return to this dependency grammar framework of looking at language. 00:57:10.140 |
What's a good justification why the dependency grammar framework is a good way to explain language? 00:57:16.800 |
- So the reason I like dependency grammar, as I've said before, is that it's very transparent 00:57:22.660 |
about its representation of distance between words. 00:57:26.120 |
So it's like, all it is, is you've got a bunch of words, you're connecting together to make 00:57:30.980 |
a sentence, and a really neat insight, which turns out to be true, is that the further 00:57:39.100 |
apart the pair of words are that you're connecting, the harder it is to do the production and the comprehension. 00:57:44.740 |
It's harder to produce, it's harder to understand when the words are far apart. 00:57:47.500 |
When they're close together, it's easy to produce and it's easy to comprehend. 00:57:53.720 |
So in any language, we have mostly local connections between words, but they don't have to be local. 00:58:01.840 |
The connections are abstract; they're between categories of words. 00:58:05.180 |
And so you can always make things further apart if you add modification, for example. 00:58:13.840 |
So a noun in English comes before a verb, the subject noun comes before a verb, and they're usually close together. 00:58:22.120 |
I can say what I said before, you know, "The dog entered the room," or something like that. 00:58:27.120 |
If I say something more about "dog" after it, then what I'm doing is, indirectly, I'm 00:58:32.280 |
lengthening the dependence between "dog" and "entered" by adding more stuff to it. 00:58:39.320 |
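To make the distance idea concrete, here is a small Python sketch (an editor's illustration; the parses are hand-coded rather than produced by a parser, and distance is counted in word positions, which is only one of several defensible metrics):

```python
# Dependency length as distance in word positions. Arcs are
# (head, dependent) pairs over word indices; parses are hand-coded.

def total_dependency_length(arcs):
    """Sum of |head - dependent| over all arcs."""
    return sum(abs(h - d) for h, d in arcs)

# "The dog entered the room."
arcs_short = [(1, 0),  # dog -> The
              (2, 1),  # entered -> dog (subject)
              (4, 3),  # room -> the
              (2, 4)]  # entered -> room (object)

# "The dog that I saw entered the room." (modifier added after "dog")
arcs_long = [(1, 0), (1, 4), (4, 2), (4, 3),  # relative clause on "dog"
             (5, 1),                          # entered -> dog: stretched to 4
             (7, 6), (5, 7)]

print(total_dependency_length(arcs_short))  # 5
print(total_dependency_length(arcs_long))   # 14: same links, more distance
```

The subject-verb link itself is unchanged; only the material between "dog" and "entered" grew, which is exactly what lengthens the dependency.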
If I say, "The boy who the cat scratched cried," we're going to have a mean cat here. 00:58:50.400 |
And so what I've got here is, "The boy cried," it would be a very short, simple sentence, 00:58:54.320 |
and I just told you something about the boy: I told you it was the boy who the cat scratched. 00:59:01.240 |
So the "cried" is connected to the "boy"; the "cried" at the end is connected to the "boy" at the beginning. 00:59:09.860 |
And I can say, "The cat which the dog chased ran away," or something, okay? 00:59:17.920 |
But it's really hard now, I've got, you know, whatever I have here, I have, "The boy who 00:59:23.960 |
the cat"—now let's say I try to modify "cat," okay? 00:59:27.080 |
"The boy who the cat which the dog chased scratched ran away." 00:59:34.880 |
I'm sort of just working that through in my head, how to produce it, and it's really hard. 00:59:41.600 |
At least I've got intonation there to sort of mark the boundaries and stuff, but that's still very hard to understand. 00:59:52.400 |
So what's interesting about that is that what I'm doing is nesting dependencies there. 00:59:56.200 |
I'm putting one—I've got a subject connected to a verb there, and then I'm modifying that 01:00:01.920 |
with a clause, another clause, which happens to have a subject and a verb relation. 01:00:06.240 |
I'm trying to do that again on the second one. 01:00:08.120 |
And what that does is it lengthens out the dependence; multiple dependencies actually get lengthened. 01:00:13.320 |
So the dependencies get longer, and the outside ones get long, and even the ones in between get longer. 01:00:28.200 |
So no matter what language you look at, just figure out some structure where 01:00:33.680 |
I'm going to have some modification following some head, which is connected to some later word, and nest another one inside it. 01:00:41.040 |
So 100%, that will be uninterpretable in that language in the same way that was uninterpretable in English. 01:00:47.800 |
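Here is what that nesting does to the subject-verb distances in the example above (editor's sketch; positions are word indices):

```python
# "The boy who the cat which the dog chased scratched cried"
#   0   1    2   3    4    5     6    7     8       9        10
subject_verb_arcs = {
    "dog -> chased":    (7, 8),   # innermost clause: distance 1
    "cat -> scratched": (4, 9),   # middle clause: distance 5
    "boy -> cried":     (1, 10),  # outermost clause: distance 9
}
for name, (subj, verb) in subject_verb_arcs.items():
    print(f"{name}: distance {verb - subj}")
# Each added level of embedding stretches every dependency that spans it,
# so several long dependencies are open at the same time.
```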
The distance of the dependencies: so in "the boy cried," there's a dependence between 01:00:55.880 |
two words, and then you're counting the number of, what, morphemes between them? 01:01:10.480 |
And you're saying that the longer the distance of that dependence, no matter the language, the harder it is. 01:01:23.080 |
But that—the people will be very upset that speak that language. 01:01:27.240 |
Not upset, but they'll either not understand it, or they'll be like this is—their brain 01:01:34.040 |
- They will have a hard time either producing or comprehending it. 01:01:36.600 |
They might tell you that's not their language. 01:01:40.020 |
I mean, it's following their—like, they'll agree with each of those pieces as part of 01:01:43.220 |
their language, but somehow that combination will be very, very difficult to produce and 01:01:54.240 |
- So, well, I mean, I'm giving you two kinds of explanations here. 01:01:58.040 |
I'm telling you that center embedding and nesting are the same thing; those are synonyms. 01:02:03.980 |
And the observation is that those are always hard. 01:02:06.760 |
Center embedding and nesting are always hard. 01:02:08.080 |
And I gave you an explanation for why they might be hard, which is long-distance connections. 01:02:12.580 |
When you do center embedding, when you do nesting, you always have long-distance connections. 01:02:16.940 |
And so that's not necessarily the right explanation, but I can go through 01:02:20.880 |
reasons why that's probably a good explanation. 01:02:26.240 |
So probably it's a pair of these dependencies getting long that 01:02:31.200 |
drives you to be really confused in that case. 01:02:33.980 |
And so what's the behavioral consequence there? I mean, this is kind of methods. 01:02:41.540 |
You could try to do experiments to get people to produce these things. 01:02:44.600 |
They're going to have a hard time producing them. 01:02:46.160 |
You can try to do experiments to get them to understand them and get—see how well 01:02:49.800 |
they understand them, can they understand them. 01:02:52.720 |
Another method you can do is give people partial materials and ask them to complete them, you 01:02:58.440 |
know, those center-embedded materials, and they'll fail. 01:03:06.820 |
So center embedding meaning, like, you take a normal sentence like "the boy cried" and inject 01:03:10.720 |
a bunch of crap in the middle that separates the boy and the cried. 01:03:20.120 |
Center-embedding, those are totally equivalent terms. 01:03:22.080 |
I'm sorry I sometimes use one and sometimes use the other. 01:03:28.160 |
And then what you're saying is there's a bunch of different kinds of experiments you can 01:03:32.160 |
And the way to test any one of them is, like, to have more embedding, more center embedding. 01:03:37.320 |
But then you have to measure the level of understanding, I guess. 01:03:42.360 |
I mean, the simplest way is just to ask people how good it sounds. 01:03:52.240 |
And so it's like, I don't know what it means exactly, but it's doing something such that 01:03:55.720 |
we're measuring something about the confusion, the difficulty associated with those. 01:03:59.000 |
- And those, like those are giving you a signal. 01:04:02.760 |
What about the completion of the center embedding? 01:04:05.560 |
- So if you give them a partial sentence, say I say "the book which the author who," and ask them to complete it. 01:04:15.600 |
I mean, either say it, yeah, or say it's written in front of you and you can just type the rest. 01:04:21.480 |
They will fail, even though that one's not too hard, right? 01:04:24.240 |
So if I complete it, it's like, oh, "the book which the author who I met wrote was good." 01:04:33.840 |
If I give that completion online somewhere to a crowdsourcing platform and ask people 01:04:40.280 |
to complete that, they will miss off a verb very regularly, like half the time, maybe 01:04:46.640 |
They'll say, they'll just leave off one of those verb phrases. 01:04:49.520 |
Even with that simple one, say "the book which the author who," they'll give just two verbs when you need three. 01:05:04.080 |
They'll say, "who was famous, was good," or something like that. 01:05:11.360 |
So 40%, maybe 30, will do it correctly; correctly meaning they'll do a three-verb completion. 01:05:20.600 |
- Yeah, I can actually, I'm struggling with it in my head. 01:05:25.360 |
- If you're looking at it, it's a little easier than listening, but it's pretty tough. 01:05:28.200 |
'Cause when you listen, there's no trace of it. 01:05:31.320 |
You have to remember the words that I'm saying, which is very hard auditorily. 01:05:38.840 |
It's easier in many dimensions in some ways, depending on the person. 01:05:41.680 |
It's easier to gather written data. I mean, I work in psycholinguistics, 01:05:49.400 |
and so a lot of our work is based on written stuff because it's so easy to gather data that way. 01:05:57.240 |
Spoken tasks are just more complicated to administer and analyze because people do weird 01:06:02.480 |
things when they speak, and it's harder to analyze what they do. 01:06:05.880 |
But they generally point to the same kinds of things. 01:06:10.080 |
- Okay, so the universal theory of language by Ted Gibson is that you can form dependency 01:06:19.320 |
trees from any sentence, and you can measure the distance in some way of 01:06:23.920 |
those dependencies, and then you can say that most languages have very short dependencies. 01:06:34.880 |
So an ex-student of mine, Richard Futrell, who's at the University of California, Irvine, did 01:06:40.680 |
a thing a bunch of years ago now, where he looked at all the languages we could look 01:06:45.720 |
at, which was about 40 initially, and now I think there's about 60, for which there are parsed corpora. 01:06:52.760 |
So meaning there's gotta be a big text, a bunch of texts, which have been parsed for 01:06:57.120 |
the dependency structures, and there's about 60 of those which have been parsed that way. 01:07:01.840 |
And for all of those, what he did was take any sentence in one of those languages, and 01:07:09.720 |
you can do the dependency structure, and then start at the root, we're talking about dependency 01:07:13.360 |
structures, that's pretty easy now, and he's trying to figure out what a control version of 01:07:18.080 |
the same sentence might look like in that language. 01:07:21.280 |
And so he's just like, all right, there's a root, and let's say the sentence is "two dogs entered the room." 01:07:28.160 |
So "entered" is the root, and "entered" has two dependents: it's got "dogs," and it has "room." 01:07:35.440 |
And what he does is scramble that order, those three things, the head 01:07:40.280 |
and the two dependents, in just some random order, and then just do that for every head. 01:07:46.000 |
So now do it for "two" and "dogs," and for "the" and "room." 01:07:50.480 |
And that's not, it's a very short sentence, when sentences get longer, and you have more 01:07:55.120 |
dependents, there's more scrambling that's possible, and what he found, so that's one, 01:08:00.800 |
you can figure out one scrambling for that sentence, he did this like a hundred times, 01:08:04.000 |
for every sentence in every one of these texts, every corpus, and then he just compared the 01:08:10.880 |
dependency lengths in those random scramblings to what actually happened, what the English 01:08:16.640 |
or the French or the German was in the original language, or Chinese, or what all these like 01:08:22.960 |
And the dependency lengths are always shorter in the real language, compared to this kind 01:08:28.400 |
And there's another, a little more rigid, control. The way I described it, you 01:08:36.120 |
could have crossed dependencies; by scrambling that way, you could scramble in any way at 01:08:41.440 |
all. Languages don't do that; they tend not to cross dependencies very much. 01:08:46.440 |
Like so the dependency structure, they tend to keep things non-crossed, and there's a 01:08:52.240 |
technical term, they call that projective, but it's just non-crossed is all that is projective. 01:08:56.720 |
And so if you just constrain the scrambling so that it only gives you projective orders, you get the same result. 01:09:04.320 |
So still, human languages are much shorter than this kind of a control. 01:09:10.720 |
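A toy version of that comparison, to show the mechanics (an editor's sketch; the actual study used dependency-parsed corpora across dozens of languages, while this hand-codes one tiny tree and the projectivity-preserving shuffle):

```python
import random

# A projective random linearization: recursively shuffle each head with
# its dependents' subtrees, so every subtree stays contiguous and arcs
# never cross.
def random_projective_order(tree, root):
    units = [[root]] + [random_projective_order(tree, child)
                        for child in tree.get(root, [])]
    random.shuffle(units)
    return [word for unit in units for word in unit]

def dependency_length(order, tree):
    pos = {word: i for i, word in enumerate(order)}
    return sum(abs(pos[head] - pos[dep])
               for head, deps in tree.items() for dep in deps)

# "Two dogs entered the room": entered -> {dogs, room}, dogs -> {Two}, ...
tree = {"entered": ["dogs", "room"], "dogs": ["Two"], "room": ["the"]}
attested = ["Two", "dogs", "entered", "the", "room"]

random.seed(0)
baseline = [dependency_length(random_projective_order(tree, "entered"), tree)
            for _ in range(100)]
print("attested order:", dependency_length(attested, tree))
print("random projective mean:", sum(baseline) / len(baseline))
# With longer sentences the gap between real and scrambled orders grows.
```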
So what it means is that, in every language, we're trying to put things close together. 01:09:18.920 |
It doesn't matter about the word order, some of these are verb-final, some of these are 01:09:21.720 |
verb-medial-like English, and some are even verb-initial, there are a few languages in 01:09:25.800 |
the world which have VSO word order, verb-subject-object languages; we haven't talked about those. 01:09:34.000 |
- And even in those languages, it's still short dependencies. 01:09:39.080 |
- Okay, so what are some possible explanations for that? 01:09:47.160 |
So that's one of the, I suppose, disagreements you might have with Chomsky, so you consider 01:09:53.240 |
the evolution of language in terms of information theory, and for you, the purpose of language 01:10:02.680 |
is ease of communication, right, and processing. 01:10:06.280 |
So I mean, the story here is just about communication, it is just about production, really? 01:10:15.120 |
- Oh, I just mean ease of language production, it's easier for me to say things when the, 01:10:20.360 |
what I'm doing whenever I'm talking to you is somehow I'm formulating some idea in my 01:10:24.240 |
head and I'm putting these words together, and it's easier for me to do that, to put, 01:10:29.840 |
to say something where the words are closely connected in a dependency, as opposed to separated, 01:10:35.600 |
by putting something in between and over and over again, it's just hard for me to keep 01:10:39.600 |
that in my head. That's the whole story; basically, the dependency grammar 01:10:44.880 |
sort of gives that to you: long is bad, short is good. It's easier to keep 01:10:50.440 |
in mind because you have to keep it in mind, probably for production; it probably matters 01:10:55.660 |
in comprehension as well. 01:10:58.160 |
- It's on both sides, the production and the-- 01:11:00.400 |
- But I would guess it's probably evolved for production, it's about producing, what's 01:11:04.040 |
easier for me to say, that ends up being easier for you also, that's very hard to disentangle, 01:11:09.800 |
this idea of who is it for, is it for me, the speaker, or is it for you, the listener, 01:11:14.160 |
I mean part of my language is for you, like the way I talk to you is gonna be different 01:11:19.320 |
from how I talk to different people, I'm definitely angling what I'm saying to who I'm saying, 01:11:24.600 |
it's not like I'm just talking the same way to every single person, and so I am sensitive 01:11:29.920 |
to my audience, but does that work itself out in the dependency length differences, 01:11:37.480 |
I don't know, maybe that's about just the words, that part, which words I select. 01:11:41.280 |
- My initial intuition is that you optimize language for the audience, but it's just kind 01:11:48.320 |
of like messing with my head a little bit to say that some of the optimization might 01:11:52.440 |
be, maybe the primary objective of the optimization might be the ease of production. 01:11:57.400 |
- We have different senses I guess, I'm very selfish, and you're like, I think it's all 01:12:03.920 |
about me, I'm just doing what's easiest for me, I don't wanna, I mean but I have to of 01:12:09.520 |
course choose the words that I think you're gonna know, I'm not gonna choose words you 01:12:14.200 |
don't know, in fact I'm gonna fix that, so there it's about, but maybe for the syntax, 01:12:20.280 |
for the combinations it's just about me, I feel like it's, I don't know though, it's 01:12:24.040 |
very hard to-- - Wait, wait, wait, but the purpose of communication is to be understood, 01:12:27.920 |
is to convince others and so on, so like the selfish thing is to be understood, so it's 01:12:32.680 |
about the listener. - Okay, it's a little circular there too 01:12:34.000 |
then, okay. - Right, I mean like the ease of production-- 01:12:37.200 |
- Helps me be understood then, I don't think it's circular, so I want what's-- 01:12:42.320 |
- No I think the primary objective is about the listener, 'cause otherwise if you're optimizing 01:12:49.400 |
for the ease of production then you're not gonna have any of the interesting complexity 01:12:53.320 |
of language, like you're trying to like explain-- - Well let's control for what it is I want 01:12:57.120 |
to say, like I'm saying let's control for the thing, the message, control for the message, 01:13:01.880 |
I want to tell you-- - But that means the message needs to be 01:13:03.280 |
understood, that's the goal. - Oh but that's the meaning, so I'm still 01:13:06.440 |
talking about the form, just the form of the meaning, how do I frame the form of the meaning 01:13:11.920 |
is all I'm talking about, you're talking about a harder thing I think, it's like how am I, 01:13:16.040 |
like trying to change the meaning, let's keep the meaning constant, like which, if you keep 01:13:21.200 |
the meaning constant, how can I phrase whatever it is I need to say, like I gotta pick the 01:13:26.360 |
right words and I'm gonna pick the order so that it's easy for me, that's what I think 01:13:31.920 |
it's probably like. - I think I'm still tying meaning and form 01:13:36.040 |
together in my head, but you're saying if you keep the meaning of what you're saying 01:13:40.320 |
constant, the optimization, yeah it could be the primary objective that optimization 01:13:46.120 |
is for production, that's interesting. I'm struggling to keep constant meaning, it's 01:13:54.120 |
just so, I mean I'm a human, so for me the form, without having introspected on this, 01:14:02.440 |
the form and the meaning are tied together, like deeply, because I'm a human, like for 01:14:09.680 |
me when I'm speaking, 'cause I haven't thought about language, like in a rigorous way, about 01:14:14.800 |
the form of language. - But look, for any event, there's an unbounded, 01:14:22.360 |
I don't wanna say infinite, but sort of ways that I might communicate that same event. 01:14:26.760 |
This two dogs entered a room, I can say in many, many different ways, I can say hey, 01:14:31.360 |
there's two dogs, they entered the room. Hey, the room was entered by something, the thing 01:14:37.120 |
that was entered was two dogs, I mean that's kind of awkward and weird and stuff, but those 01:14:40.960 |
are all similar messages with different forms, different ways I might frame, and of course 01:14:48.040 |
I use the same words there all the time. I could have referred to the dogs as a Dalmatian 01:14:52.960 |
and a poodle or something. I could have been more specific or less specific about what 01:14:56.760 |
they are, and I could have said, been more abstract about the number. So I'm trying to 01:15:02.520 |
keep the meaning, which is this event, constant, and then how am I gonna describe that to get 01:15:08.280 |
that to you, it kind of depends on what you need to know, right, and what I think you 01:15:11.360 |
need to know, but I'm like trying to, let's control for all that stuff, and not, and I'm 01:15:16.680 |
just choosing, I'm doing something simpler than you're doing, which is just forms, just 01:15:21.800 |
words. - So to you, specifying the breed of dog 01:15:25.960 |
and whether they're cute or not is changing the meaning. 01:15:30.320 |
- That might be, yeah, yeah, that would be changing, oh, that would be changing the meaning 01:15:32.840 |
for sure. - Right, so you're just, well, yeah, yeah. 01:15:36.640 |
That's changing the meaning, but say, even if we keep that constant, we can still talk 01:15:40.600 |
about what's easier or hard for me, right, the listener and the, right? Which phrase 01:15:46.000 |
structures I use, which combinations, which, you know. 01:15:49.080 |
- This is so fascinating and just like a really powerful window into human language, but I 01:15:56.080 |
wonder still throughout this how vast the gap between meaning and form. I just have 01:16:03.480 |
this like maybe romanticized notion that they're close together, that they evolve close, like 01:16:09.120 |
hand in hand, that you can't just simply optimize for one without the other being in the room 01:16:15.880 |
with us. Like it's, well, it's kind of like an iceberg. Form is the tip of the iceberg 01:16:21.920 |
and the rest, the meaning is the iceberg, but you can't like separate. 01:16:26.120 |
- But I think that's why these large language models are so successful is 'cause they're 01:16:30.640 |
good at form and form isn't that hard in some sense. And meaning is tough still and that's 01:16:35.960 |
why they're not, you know, they don't understand what they're doing. We're gonna talk about 01:16:39.120 |
that later maybe, but like we can distinguish in our, forget about large language models, 01:16:44.920 |
like humans, maybe you'll talk about that later too, is like the difference between 01:16:49.200 |
language, which is a communication system, and thinking, which is meaning. So language 01:16:54.440 |
is a communication system for the meaning, it's not the meaning. And so that's why, I 01:16:59.760 |
mean, and there's a lot of interesting evidence we can talk about relevant to that. 01:17:04.560 |
- Well, I mean, that's a really interesting question. What is the difference between language, 01:17:10.800 |
written, communicated, versus thought? What, to you, is the difference between them? 01:17:19.040 |
- Well, you or anyone has to think of a task, which they think is a good thinking task. 01:17:24.640 |
And there's lots and lots of tasks, which should be good thinking tasks. And whatever 01:17:29.320 |
those tasks are, let's say it's, you know, playing chess, or that's a good thinking 01:17:33.160 |
task, or playing some game, or doing some complex puzzles, maybe remembering some digits, 01:17:39.640 |
that's thinking, remembering some, a lot of different tasks we might think, maybe just 01:17:43.160 |
listening to music is thinking, or there's a lot of different tasks we might think of 01:17:46.520 |
as thinking. There's this woman in my department, Ev Fedorenko, and she's done a lot of work 01:17:51.640 |
on this question about what's the connection between language and thought. And so she uses, 01:17:56.680 |
I was referring earlier to MRI, fMRI, that's her primary method. And so she has been really 01:18:02.860 |
fascinated by this question about whether, what language is. And so, as I mentioned earlier, 01:18:08.600 |
you can localize my language area, your language area, in a few minutes. In like 15 minutes 01:18:13.920 |
I can listen to language, listen to non-language, or backward speech, or something, and we'll 01:18:18.760 |
find areas, a left-lateralized network in my head, which is specialized, very sensitive 01:18:18.760 |
to language, as opposed to whatever that control was, okay? 01:18:28.080 |
- Can you specify what you mean by language, like communicated language? Like what is language? 01:18:31.880 |
- Just sentences. You know, I'm listening to English of any kind, a story, or I can 01:18:35.680 |
read sentences, anything at all that I understand, if I understand it, then it'll activate my language network. 01:18:42.720 |
- My language network is going like crazy when I'm talking, and when I'm listening to 01:18:45.960 |
you, because we're both, we're communicating. 01:18:49.480 |
- Yeah, it's incredibly stable. So, I happen to be married to this woman, Ev Fedorenko, 01:18:55.400 |
and so I've been scanned by her over, and over, and over, since 2007, or six, or something. 01:18:59.680 |
And so my language network is exactly the same, you know, a month ago, as it was back then. 01:19:06.480 |
- It's amazingly stable, it's astounding. It's a really fundamentally cool thing. And 01:19:11.720 |
so my language network is, it's like my face, okay? It's not changing much over time, inside my head. 01:19:17.720 |
- Can I ask a quick question? Sorry, this is a small tangent. At which point in the, 01:19:22.280 |
as you grow up from baby to adult, does it stabilize? 01:19:28.000 |
- That's a very hard question. They're working on that right now, because of the problem 01:19:31.560 |
scanning little kids, like trying to do the localization 01:19:36.520 |
on little children in this scanner. You're lying in the fMRI scanner, that's the best way 01:19:41.280 |
to figure out where something's going on inside our brains. And the scanner's loud, and you're 01:19:45.680 |
in this tiny little area, you're claustrophobic. And it doesn't bother me at all, I can go 01:19:50.360 |
to sleep in there. But some people are bothered by it, and little kids don't really like it, 01:19:54.520 |
and they don't like to lie still. And you have to be really still, because if you move 01:19:57.760 |
around, that messes up the coordinates of where everything is. And so, you know, your 01:20:02.160 |
question is, how and when is language developing, how 01:20:07.440 |
does this left-lateralized system come into play? And it's really hard to get a two year 01:20:11.480 |
old to do this task. But you can maybe, they're starting to get three and four and five year 01:20:15.600 |
olds to do this task for short periods, and it looks like it's there pretty early. 01:20:19.960 |
- So clearly, when you lead up to a baby's first words, before that, there's a lot of 01:20:26.120 |
fascinating turmoil going on about figuring out, what are these people saying? And you're 01:20:32.720 |
trying to make sense, how does that connect to the world, and all that kind of stuff. 01:20:36.960 |
That might be just fascinating development that's happening there. That's hard to introspect. 01:20:41.760 |
- But anyway, we're back to the scanner. And I can find my network in 15 minutes, and now 01:20:47.640 |
we can ask, find my network, find yours, find, you know, 20 other people do this task. And 01:20:53.080 |
we can do some other tasks. Anything else you think is thinking of some other thing. 01:20:56.880 |
I can do a spatial memory task. I can do a music perception task. I can do programming 01:21:03.920 |
task, if I program, okay? I can do, where I can understand computer programs. And none 01:21:10.080 |
of those tasks tap the language network at all. Like, at all. There's no overlap. They're 01:21:15.320 |
highly activated in other parts of the brain. There's a bilateral network, which I think 01:21:20.880 |
she tends to call the multiple demands network, which does anything kind of hard. And so anything 01:21:25.360 |
that's kind of difficult in some ways will activate that multiple demands network. I 01:21:30.480 |
mean, music will be in some music area. You know, there's music-specific kinds of areas. 01:21:36.560 |
But none of them are activating the language area at all, unless there's words. Like, so 01:21:41.440 |
if you have music, and there's a song, and you can hear the words, then you get the language 01:21:46.640 |
- Are we talking about speaking and listening? Or are we also talking about reading? 01:21:54.680 |
- So this network doesn't make any difference if it's written or spoken. So the thing that 01:22:00.720 |
she calls, Fedorenko calls, the language network is this high-level language. So it's not about 01:22:05.160 |
the spoken language, and it's not about the written language. It's about either one of 01:22:09.240 |
them. And so when you do speech, you either listen to speech, and you subtract away some 01:22:14.840 |
language you don't understand, or you subtract away backward speech, which sounds like speech, 01:22:20.760 |
but it isn't. And then so you take away the sound part altogether. And then if you do 01:22:26.680 |
written, you get exactly the same network. So for just reading the language versus reading 01:22:32.040 |
sort of nonsense words or something like that, you'll find exactly the same network. And 01:22:36.280 |
so this is about high-level comprehension of language, yeah, in this case. And the same 01:22:41.560 |
thing happens, production's a little harder to run the scanner, but the same thing happens 01:22:44.280 |
in production. You get the same network. So production's a little harder, right? You have 01:22:47.320 |
to figure out how do you run a task in the network such that you're doing some kind of 01:22:50.920 |
production. And I can't remember what, they've done a bunch of different kinds of tasks there 01:22:54.360 |
where you get people to produce things, yeah, figure out how to produce. And the same network 01:22:59.720 |
goes on there. It's actually the same place. - Wait, wait, so if you read random words? 01:23:04.600 |
- Yeah, if you read things like-- - Like gibberish. 01:23:07.480 |
- Yeah, yeah, Lewis Carroll's "'Twas brillig," Jabberwocky, right? They call that jabberwocky sentences. 01:23:14.760 |
- Not as much. There are words in there. - Yeah, 'cause it's still-- 01:23:17.880 |
- There's function words and stuff, so it's lower activation. 01:23:20.600 |
- Fascinating. - Yeah, yeah. So there's like, 01:23:22.440 |
basically, the more language-like it is, the higher it goes in the language network. And 01:23:27.000 |
that network is there from when you speak, as soon as you learn language. And it's there, 01:23:33.560 |
like you speak multiple languages, the same network is going for your multiple languages. 01:23:37.640 |
So you speak English, you speak Russian, both of them are hitting that same network if you're 01:23:43.000 |
fluent in those languages. - So programming-- 01:23:45.080 |
- Not at all. Isn't that amazing? Even if you're a really good programmer, that is not a human 01:23:50.520 |
language. It's just not conveying the same information. And so it is not in the language 01:23:57.240 |
network, I think. - That's weird. - It's pretty cool. 01:23:59.880 |
- That's really weird. - And so that's like one set of data. 01:24:01.800 |
This is hers, shows that what you might think is thinking is not language. Language is just 01:24:08.440 |
this conventionalized system that we've worked out in human languages. Oh, another fascinating 01:24:14.600 |
little tidbit is that even if there are these constructed languages like Klingon, or I don't 01:24:21.560 |
know the languages from Game of Thrones, I'm sorry, I don't remember those languages. 01:24:24.600 |
- There's a lot of people offended right now. - There's people that speak those languages. 01:24:28.200 |
They really speak those languages because the people that wrote the languages for the shows, 01:24:34.920 |
they did an amazing job of constructing something like a human language. And that lights up the 01:24:40.840 |
language area. Because they can speak pretty much arbitrary thoughts in a human language. 01:24:46.840 |
It's a constructed human language, and probably it's related to human languages because the people 01:24:51.560 |
that were constructing them were making them like human languages in various ways. But it also 01:24:56.040 |
activates the same network, which is pretty cool. Anyway. 01:24:59.400 |
- Sorry to go into a place where you may be a little bit philosophical, but is it possible 01:25:05.400 |
that this area of the brain is doing some kind of translation into a deeper set of concepts? 01:25:14.760 |
- So it's doing communication, right? It is translating from thought, whatever that is, 01:25:19.960 |
it's more abstract, and it's doing that. That's what it's doing. That is kind of what it is doing. 01:25:24.920 |
It's kind of a meaning network, I guess. - Yeah, like a translation network. But I 01:25:29.240 |
wonder what is at the core, at the bottom of it, what are thoughts? Are thoughts, 01:25:34.440 |
to me like thoughts and words, are they neighbors, or is it one turtle sitting on top of the other? 01:25:41.960 |
Meaning like, is there a deep set of concepts that we-- 01:25:46.280 |
- Well, there's connections between what these things mean, and then there's probably other 01:25:51.240 |
parts of the brain that represent what these things mean. And so when I'm talking about whatever it is I 01:25:56.360 |
want to talk about, it'll be represented somewhere else. That knowledge of whatever that is will be 01:26:01.400 |
represented somewhere else. - Well, I wonder if there's some stable, 01:26:04.840 |
nicely compressed encoding of meanings that's separate from language. I guess the implication 01:26:14.200 |
here is that we don't think in language. - That's correct. Isn't that cool? And that's 01:26:21.720 |
so interesting. So people, I mean, this is like hard to do experiments on, but there is this idea 01:26:26.680 |
of inner voice, and a lot of people have an inner voice. And so if you do a poll on the internet and 01:26:32.360 |
ask if you hear yourself talking when you're just thinking or whatever, about 70 or 80% of people 01:26:37.720 |
will say yes. Most people have an inner voice. I don't. And so I always find this strange. So when 01:26:44.280 |
people talk about an inner voice, I always thought this was a metaphor, but they really do hear it. I know most of 01:26:50.360 |
you, whoever's listening to this, thinks I'm crazy now 'cause I don't have an inner voice, and I just 01:26:55.240 |
don't know what you're listening to. It sounds so kind of annoying to me to have this voice going on 01:27:01.000 |
while you're thinking, but I guess most people have that, and I don't have that, and we don't 01:27:06.760 |
really know what that connects to. - I wonder if the inner voice activates 01:27:10.280 |
that same network. I wonder. - I don't know. I don't know. I mean, 01:27:14.280 |
this could be speechy, right? So that's like, you hear. Do you have an inner voice? 01:27:17.720 |
- I don't think so. - Oh. A lot of people have 01:27:20.280 |
this sense that they hear themselves, and then say they read someone's email. I've heard people tell 01:27:25.960 |
me that they hear that other person's voice when they read other people's emails, and I'm like, 01:27:31.640 |
wow, that sounds so disruptive. - I do think I vocalize what I'm reading, 01:27:36.520 |
but I don't think I hear a voice. - Well, you probably don't have 01:27:39.800 |
an inner voice. - Yeah, I don't think I have an inner voice. 01:27:40.840 |
- People have an inner voice. People have this strong percept of hearing sound in their heads 01:27:46.600 |
when they're just thinking. - I refuse to believe 01:27:49.000 |
that's the majority of people. - Majority, absolutely. 01:27:54.600 |
- Whenever I ask a class, and when I go on the internet, they always say that. 01:27:58.280 |
So you're in a minority. - It could be a self-report flaw. 01:28:03.480 |
When I read, inside my head I'm kind of, like, saying the words, which is probably the wrong way to read, 01:28:12.920 |
but I don't hear a voice. There's no percept of a voice. I refuse to believe the majority of 01:28:19.400 |
people have it. Anyway, the human brain is fascinating, but it still blew 01:28:23.560 |
my mind that language does appear, comprehension does appear to be separate from thinking. 01:28:31.240 |
- Mm-hmm, so that's one set. One set of data from Fedorenko's group is that no matter what task you 01:28:39.160 |
do, if it doesn't have words and combinations of words in it, then it won't light up the language 01:28:43.800 |
network. It'll be active somewhere else, but not there. So that's one. And then this other 01:28:49.320 |
piece of evidence relevant to that question is it turns out there are this group of people who've 01:28:56.680 |
had a massive stroke on the left side and wiped out their language network. And as long as they 01:29:02.520 |
didn't wipe out everything on the right as well, in that case, they wouldn't be cognitively 01:29:06.280 |
functional. But if they just wiped out language, which is pretty tough to do because it's very 01:29:11.160 |
expansive on the left, but if they have, then there is patients like this, so-called global 01:29:17.240 |
aphasics, who can do any task just fine, but not language. You can't talk to them. I mean, 01:29:24.760 |
they don't understand you. They can't speak, can't write, they can't read, but they can play chess, 01:29:31.240 |
they can drive their cars, they can do all kinds of other stuff, do math. So math is not in the 01:29:36.520 |
language area, for instance. You do arithmetic and stuff, that's not language area. It's got 01:29:40.680 |
symbols. So people sort of confuse some kind of symbolic processing with language, and symbolic 01:29:44.440 |
processing is not the same. So there are symbols and they have meaning, but it's not language. It's 01:29:49.400 |
not a conventionalized language system. And so math isn't there. And so they can do math. They 01:29:55.720 |
do just as well as their age-matched controls and all these tasks. This is Rosemary Varley over in 01:30:01.080 |
University College London, who has a bunch of patients with whom she's shown this. 01:30:05.320 |
So that sort of combination suggests that language isn't necessary for thinking. It doesn't mean you 01:30:14.040 |
can't think in language. You could think in language, 'cause language allows a lot of 01:30:17.640 |
expression, but it's just, you don't need it for thinking. It suggests that language is separate, 01:30:22.280 |
is a separate system. - This is kind of blowing my mind. 01:30:26.040 |
- I'm trying to load that in, because it has implications for large language models. 01:30:32.120 |
- It sure does, and they've been working on that. 01:30:34.280 |
- Well, let's take a stroll there. You wrote that the best current theories of human language are 01:30:39.320 |
arguably large language models. So this has to do with form. 01:30:42.760 |
- It's kind of a big theory, but the reason it's arguably the best is that it does the best at 01:30:49.720 |
predicting what's English, for instance. It's incredibly good, better than any other theory. 01:30:55.800 |
It's so... but there's not enough detail. 01:31:00.760 |
- Well, it's opaque. You don't know what's going on. 01:31:03.960 |
- You don't know what's going on. It's another black box. But I think it is a theory. 01:31:07.640 |
- What's your definition of a theory? 'Cause it's a gigantic black box with a very large 01:31:13.640 |
number of parameters controlling it. To me, theory usually requires a simplicity, right? 01:31:19.960 |
- Well, I don't know. Maybe I'm just being loose there. I think it's not a great theory, 01:31:24.920 |
but it's a theory. It's a good theory in one sense, in that it covers all the data. 01:31:28.760 |
Like anything you want to say in English, it does. And so that's how it's arguably the best, 01:31:33.080 |
is that no other theory is as good as a large language model in predicting exactly what's good 01:31:38.440 |
and what's bad in English. Now you're saying, is it a good theory? Well, probably not, you know, 01:31:43.800 |
because I want a smaller theory than that. It's too big. I agree. 01:31:46.920 |
- You could probably construct a mechanism by which it can generate a simple explanation 01:31:53.400 |
of a particular language, like a set of rules. It could generate a dependency grammar. 01:32:03.240 |
- You could probably just ask it about itself. - Well, you know, that presumes, 01:32:14.520 |
and there's some evidence for this, that some large language models are implementing something 01:32:20.680 |
like dependency grammar inside them. And so there's work from a guy called Chris Manning 01:32:25.560 |
and colleagues over at Stanford in natural language processing. And they looked at, I don't know 01:32:31.960 |
how many large language model types, but certainly BERT and some others, where you do some kind of 01:32:38.120 |
fancy math to figure out exactly what kind of abstractions of representations are going on. 01:32:43.320 |
And they were saying, it does look like dependency structure is what they're constructing. So it's 01:32:49.160 |
actually a very, very good map. So they are constructing something like that. Does it mean 01:32:55.960 |
that they're using that for meaning? I mean, probably, but we don't know. 01:33:00.360 |
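The kind of analysis behind that claim can be sketched roughly as follows (an editor's illustration of a Hewitt-and-Manning-style structural probe, not the authors' code; random vectors stand in for real BERT embeddings, and the tree distances for "The dog entered the room" are hand-coded, so this only shows the mechanics):

```python
import torch

# Structural-probe sketch: learn a linear map B so that squared distances
# between projected word vectors approximate dependency-tree distances.
torch.manual_seed(0)
n_words, dim, rank = 5, 64, 16
embeddings = torch.randn(n_words, dim)  # stand-in for contextual embeddings

# Tree distances for "The dog entered the room" with arcs
# The-dog, dog-entered, entered-room, the-room (hand-computed).
tree_dist = torch.tensor([[0., 1., 2., 4., 3.],
                          [1., 0., 1., 3., 2.],
                          [2., 1., 0., 2., 1.],
                          [4., 3., 2., 0., 1.],
                          [3., 2., 1., 1., 0.]])

B = torch.randn(dim, rank, requires_grad=True)
optimizer = torch.optim.Adam([B], lr=0.01)
for _ in range(500):
    projected = embeddings @ B                  # (n_words, rank)
    delta = projected.unsqueeze(0) - projected.unsqueeze(1)
    predicted = (delta ** 2).sum(-1)            # squared L2 distances
    loss = (predicted - tree_dist).abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final probe loss: {loss.item():.3f}")
# With real embeddings, a low loss on held-out sentences is the evidence
# that something like dependency structure is linearly recoverable.
```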
- You write that the kinds of theories of language that LLMs are closest to 01:33:05.000 |
are called construction-based theories. Can you explain what construction-based theories are? 01:33:09.160 |
- It's just a general theory of language such that there's a form and a meaning pair 01:33:16.360 |
for lots of pieces of the language. And so it's primarily usage-based, is the construction 01:33:21.720 |
grammar. It's trying to deal with the things that people actually say, actually say and actually 01:33:27.480 |
write. And so it's a usage-based idea. And what's a construction? A construction is either a simple 01:33:33.720 |
word, so like a morpheme plus its meaning, or a combination of words. It's basically 01:33:39.320 |
combinations of words, like the rules. But it's unspecified as to what the form of the grammar 01:33:49.560 |
is underlyingly. And so I would argue that the dependency grammar is maybe the right form to use 01:33:56.760 |
for the types of construction grammar. Construction grammar typically isn't quite formalized. 01:34:03.480 |
And so maybe a formalization of that might be in dependency grammar. 01:34:09.400 |
I mean, I would think so. But it's up to people, other researchers in that area, to work that out. 01:34:17.160 |
- Do you think that large language models understand language? Are they mimicking language? I guess the deeper 01:34:23.720 |
question there is, are they just understanding the surface form? Or do they understand something 01:34:29.720 |
deeper about the meaning that then generates the form? - I mean, I would argue they're doing the 01:34:35.160 |
form. They're doing the form, they're doing it really, really well. And are they doing the 01:34:38.440 |
meaning? No, probably not. I mean, there's lots of these examples from various groups showing that 01:34:44.120 |
they can be tricked in all kinds of ways. They really don't understand the meaning of what's 01:34:48.440 |
going on. And so there are a lot of examples that various groups have given, which show they 01:34:55.400 |
don't really understand what's going on. So you know the Monty Hall problem is this silly problem, 01:35:00.440 |
right? Where you have three doors; it's Let's Make a Deal, this old game show, 01:35:06.040 |
and there's three doors, and there's a prize behind one, and there's some junk prizes behind 01:35:12.680 |
the other two, and you're trying to select one. And Monty, he knows where the 01:35:18.760 |
target item is, the good thing; he knows what's behind everything. And he gives 01:35:24.360 |
you a choice, you choose one of the three, and then he opens one of the doors, and it's some 01:35:28.040 |
junk prize. And then the question is, should you trade to get the other one? And the answer is yes, 01:35:32.360 |
you should trade, because he knew which doors he could open, and so now the odds are two 01:35:36.440 |
thirds, okay? And then if you just change that a little bit for the large language model: the large 01:35:41.720 |
language model has seen that explanation so many times that, if you change the story 01:35:47.560 |
a little bit so it sounds like the Monty Hall problem but it's not, it falls for it. You just say, 01:35:51.720 |
"Oh, there's three doors, and one behind them is a good prize, and there's two bad doors. I happen 01:35:57.400 |
to know it's behind door number one. The good prize, the car, is behind door number one. So, 01:36:01.800 |
I'm going to choose door number one. Monty Hall opens door number three and shows me nothing 01:36:05.560 |
there. Should I trade for door number two, even though I know the good prize is in door number 01:36:09.560 |
one?" And then the large language model will say, "Yes, you should trade," because it just goes 01:36:13.960 |
through the forms that it's seen before so many times on these cases, where it's, "Yes, you should 01:36:20.920 |
trade, because your odds have shifted from one in three to two out of three for that door." 01:36:25.640 |
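A quick simulation makes the contrast plain (an editor's sketch): in the standard setup switching wins about two-thirds of the time, while in the modified story, where you already know the car is behind your chosen door, switching can never win.

```python
import random

def standard_monty(trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Monty opens a junk door that isn't your pick.
        opened = next(d for d in range(3) if d != pick and d != car)
        switched = next(d for d in range(3) if d != pick and d != opened)
        wins += (switched == car)
    return wins / trials

print("standard game, switching wins:", standard_monty())  # about 0.667
# Modified story: you know the car is behind door 1 and you picked it,
# so switching wins with probability exactly 0, whatever Monty opens.
```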
It doesn't have any way to remember that actually you have 100% probability behind that door number 01:36:31.800 |
one. You know that. That's not part of the scheme that it's seen hundreds and hundreds of times 01:36:37.160 |
before. Even if you try to explain to it that it's wrong, that it can't do that, it'll just keep 01:36:43.080 |
giving you back the problem. - But it's also possible the large language model will be aware 01:36:48.200 |
of the fact that there's sometimes over-representation of a particular kind of formulation. 01:36:55.800 |
And it's easy to get tricked by that. So you could see, as they get larger and larger, 01:37:01.880 |
models being a little bit more skeptical when they see over-representation. So it just feels like 01:37:08.040 |
training on form can go really far in terms of being able to generate 01:37:19.160 |
things that look like the thing understands deeply the underlying world model of the kind of 01:37:28.200 |
mathematical world, physical world, psychological world that would generate these kinds of sentences. 01:37:36.600 |
It just feels like you're creeping close to the meaning part. Easily fooled, all this kind of 01:37:42.600 |
stuff. But that's humans too. So it just seems really impressive how often it seems like it 01:37:51.320 |
understands concepts. - I mean, you don't have to convince me of that. I am very, very impressed. 01:37:58.120 |
I mean, you're giving a possible world where maybe someone's going to train some other versions such 01:38:05.480 |
that it'll be somehow abstracting away from types of forms. I mean, I don't think that's happened. 01:38:11.880 |
- Well, no, no, no. I'm not saying that. I think when you just look at anecdotal examples 01:38:17.640 |
and just showing a large number of them where it doesn't seem to understand and it's easily fooled, 01:38:22.680 |
that does not seem like a scientific, data-driven analysis of in how many places it's damn impressive 01:38:32.360 |
in terms of meaning and understanding and in how many places it's easily fooled. 01:38:35.560 |
- That's not the inference. So I don't want to make that. The inference I wouldn't want to make 01:38:40.760 |
was that inference. The inference I'm trying to push is just that is it like humans here? It's 01:38:46.120 |
probably not like humans here. It's different. So humans don't make that error. If you explain that 01:38:50.920 |
to them, they're not going to make that error. They don't make that error. And so it's doing 01:38:55.320 |
something different from humans that they're doing in that case. - Well, what's the mechanism by which 01:39:00.360 |
humans figure out that it's an error? - I'm just saying the error there is like, if I explain to 01:39:04.840 |
you there's a 100% chance that the car is behind this door, well, do you want to trade? People say 01:39:11.240 |
no. But this thing will say yes because it's so wound up on the form 01:39:17.480 |
that it makes an error that a human doesn't make, which is kind of interesting. 01:39:22.840 |
- Less likely to make, I should say. - Yeah, less likely. 01:39:28.440 |
- I mean, you're asking a system to understand 01:39:34.200 |
"100%," like you're asking about mathematical concepts. And so, like. - Look, the places 01:39:40.600 |
where large language models are, the form is amazing. So let's go back to nested structures, 01:39:46.840 |
center-embedded structures, okay? If you ask a human to complete those, they can't do it. 01:39:50.920 |
Neither can a large language model. They're just like humans in that. If I ask a large 01:39:55.960 |
language model... - That's fascinating, by the way, that 01:39:58.440 |
it struggles with center embedding. 01:40:01.400 |
- Just like humans, exactly like humans. Exactly the same way as humans. And that's not trained. 01:40:06.360 |
So they do exactly, so that is a similarity. So but then it's, that's not meaning, right? This 01:40:13.400 |
is form. But when we get into meaning, this is where they get kind of messed up, where you start 01:40:17.960 |
to saying, oh, what's behind this door? Oh, it's, you know, this is the thing I want. Humans don't 01:40:22.920 |
mess that up as much. Here, the form matches amazingly, without being 01:40:31.160 |
trained to do that. I mean, it's trained in the sense that it's getting lots of data, which is 01:40:34.840 |
just like human data, but it's not being trained on bad sentences and being told what's bad. It 01:40:41.720 |
just can't do those. It'll actually say things like, those are too hard for me to complete, 01:40:46.760 |
or something, which is kind of interesting, actually. How does it know that? I don't know. 01:40:50.120 |
Oh, but it really often doesn't just complete sentences. It very often says stuff that's true, 01:40:58.280 |
and sometimes says stuff that's not true. And almost always the form is great. 01:41:04.840 |
But it's still very surprising that with really great form, it's able to generate a lot of things 01:41:12.440 |
that are true, based on what it's trained on and so on. So it's not just form that is 01:41:19.800 |
generating. It's mimicking true statements from the internet. I guess the underlying idea there 01:41:28.040 |
is that on the internet, truth is overrepresented versus falsehoods. 01:41:34.840 |
So, but the fundamental thing it's trained on, you're saying, is just form. 01:41:41.160 |
Well, that's a sad... To me, that's still a little bit of an open question. I probably lean 01:41:48.120 |
agreeing with you, especially now you've just blown my mind that there's a separate module 01:41:54.440 |
in the brain for language versus thinking. Maybe there's a fundamental part missing from 01:42:00.680 |
the large language model approach that lacks the thinking, the reasoning capability. 01:42:06.840 |
Yeah, that's what this group argues. So the same group, Fedorenko's group, 01:42:13.800 |
has a recent paper arguing exactly that. There's a guy called Kyle Mahowald, who's here in Austin, 01:42:20.360 |
Texas, actually. He's an old student of mine, but he's faculty in linguistics at Texas. 01:42:26.200 |
That's fascinating. Still, to me, an open question. 01:42:31.080 |
What to you are the interesting limits of LLMs? 01:42:32.920 |
You know, I don't see any limits to their form. Their form is perfect. 01:42:39.480 |
Yeah, yeah, yeah. It's pretty much... I mean, it's close to... 01:42:41.800 |
Well, you said ability to complete central embeddings. 01:42:44.920 |
Yeah, it's just the same as humans. It seems the same. 01:42:47.560 |
But that's not perfect, right? It should be able to... 01:42:49.080 |
That's good. No, but I want it to be like humans. I want a model of humans. 01:42:53.400 |
Oh, wait, wait, wait. Oh, so perfect is as close to humans as possible. I got it. 01:42:59.640 |
But you should be able to, if you're not human, like you're superhuman, 01:43:03.160 |
you should be able to complete central embedded sentences, right? 01:43:06.600 |
I mean, that's the mechanism. If it's modeling something, 01:43:10.840 |
I think it's kind of really interesting that it can't... 01:43:14.120 |
That it's more like... I think it's potentially 01:43:17.240 |
underlyingly modeling something like the way the form is processed. 01:43:27.800 |
And how they generate language. Process language and generate language, that's fascinating. 01:43:35.160 |
If we can just linger on the center embedding thing, that's hard for LLMs to produce, 01:43:40.040 |
and that seems really impressive because that's hard for humans to produce. 01:43:43.400 |
And how does that connect to the thing we've been talking about before, 01:43:48.520 |
which is the dependency grammar framework in which you view language, 01:43:52.920 |
and the finding that short dependencies seem to be a universal part of language. 01:43:58.120 |
So why is it hard to complete center embeddings? 01:44:01.960 |
So what I like about dependency grammar is it makes 01:44:05.480 |
the cognitive cost associated with longer distance connections very transparent. 01:44:14.360 |
Turns out there is a cost associated with producing and comprehending 01:44:19.480 |
connections between words which are just not beside each other. 01:44:23.320 |
The further apart they are, the worse it is, according to... 01:44:30.840 |
Can you just linger on what do you mean by cognitive cost? 01:44:34.840 |
Oh, well, you can measure it in a lot of ways. 01:44:36.760 |
The simplest is just asking people to say how good a sentence sounds. 01:44:44.360 |
And you can try to triangulate then across sentences and across structures 01:44:48.920 |
to try to figure out what the source of that is. 01:44:50.840 |
You can look at reading times in controlled materials. 01:44:56.760 |
In certain kinds of materials, and then we can measure the dependency distances there. 01:45:09.960 |
We could look at the language network, we could look at the activation 01:45:13.240 |
in the language network and how big the activation is depending on the dependency lengths. 01:45:18.920 |
And it turns out, in just random sentences that you're listening to, 01:45:22.440 |
so there are people listening to stories here, 01:45:27.080 |
the longer the dependency is, the stronger the activation in the language network. 01:45:35.240 |
There's a bunch of different measures we could do. 01:45:37.240 |
That's kind of a neat measure, actually, of actual... 01:45:41.880 |
- So you can somehow, in different ways, convert it to a number. 01:45:44.920 |
I wonder if there's a beautiful equation connecting cognitive costs 01:45:50.920 |
- Yeah, it's complicated, but probably it's doable. 01:45:55.480 |
I tried to do that a while ago and I was reasonably successful, 01:46:00.360 |
but for some reason I stopped working on that. 01:46:02.200 |
I agree with you that it would be nice to figure out... 01:46:08.680 |
Another issue you raised before was how do you measure distance? 01:46:15.960 |
It's that some words matter more than others, and probably meaning-bearing nouns 01:46:22.040 |
might matter, and then it maybe depends on which kind of noun. 01:46:25.080 |
Is it a new noun or a noun that's already been mentioned? 01:46:32.280 |
So probably the simplest thing to do is just like, 01:46:34.120 |
"Oh, let's forget about all that and just think about words or morphemes." 01:46:39.160 |
But there might be some insight in the kind of function that fits the data, 01:46:51.720 |
- So we think it's probably an exponential such that the longer the distance, 01:47:03.560 |
If you've got a bunch of them that are being connected at some point, 01:47:06.680 |
the cost at the ends of those is some exponential function of those distances, is my guess. 01:47:13.240 |
But because the reason it's probably an exponential is like it's not just the distance 01:47:18.200 |
between two words because I can make a very, very long subject verb dependency 01:47:21.960 |
by adding lots and lots of noun phrases and prepositional phrases, and that's not so hard. 01:47:27.240 |
It's when you do nested dependencies, when I have multiple of these, that it gets hard. 01:47:34.360 |
- Probably somehow connected to working memory or something like this. 01:47:36.920 |
- Yeah, it's probably a function of the memory here, the access. 01:47:43.640 |
It's kind of hard to figure out what was referred to earlier. 01:47:48.280 |
That's the sort of notion of retrieval, as opposed to a storage thing, 01:47:51.960 |
but trying to connect, retrieve those earlier words depending on what was in between. 01:47:57.480 |
And then we're talking about interference of similar things in between. 01:48:01.240 |
The right theory probably has that kind of notion, 01:48:06.280 |
And so I'm dealing with an abstraction over the right theory, 01:48:12.120 |
And then maybe you're right though, there's some sort of an exponential 01:48:18.280 |
so we can figure out a function for any given sentence in any given language. 01:48:22.920 |
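[Editor's note: a minimal sketch of the kind of cost function being discussed, in Python. The toy parse, the choice of base, and the linear vs. exponential forms are illustrative assumptions, not the lab's actual model.]

```python
# Toy dependency parse of a center-embedded sentence:
#   "The reporter who the senator attacked admitted the error."
# Each word index maps to the index of its head word (None = root).
heads = {0: 1, 1: 6, 2: 5, 3: 4, 4: 5, 5: 1, 6: None, 7: 8, 8: 6}

def dependency_lengths(heads):
    """Distance in words between each dependent and its head."""
    return [abs(i - h) for i, h in heads.items() if h is not None]

def linear_cost(heads):
    # cost grows linearly with each dependency's length
    return sum(dependency_lengths(heads))

def exponential_cost(heads, base=1.5):
    # the guess above: nested long dependencies blow up the cost fast
    return sum(base ** d for d in dependency_lengths(heads))

print(dependency_lengths(heads))  # the subject-verb link spans 5 words
print(linear_cost(heads))         # 18
print(exponential_cost(heads))    # ~24.3
```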
But you know, it's funny, people haven't done that too much, 01:48:25.640 |
which I do think is, I'm interested that you find that interesting. 01:48:30.760 |
and a lot of people haven't found it interesting. 01:48:32.600 |
And I don't know why I haven't got people to want to work on that. 01:48:36.440 |
- No, that's a beautiful finding, and the underlying idea is beautiful, 01:48:40.120 |
that there's a cognitive cost that correlates with the length of dependency. 01:48:44.760 |
It feels like, I mean, language is so fundamental to the human experience, 01:48:48.440 |
and this is a nice, clean theory of language where it's like, wow, okay, 01:48:55.560 |
so we like our words close together, dependent words close together. 01:49:04.000 |
- It's so simple, and yet it explains some very complicated phenomena. 01:49:09.640 |
it's kind of hard to know why they're so hard, 01:49:13.320 |
but I can give you a math formula for why each one of them is bad and where, 01:49:21.480 |
Is there like, if you take a piece of text and then simplify, 01:49:25.560 |
sort of like there's an average length of dependency, 01:49:29.720 |
and then you like, you know, reduce it and see comprehension on the entire, 01:49:35.320 |
not just single sentence, but like, you know, 01:49:37.480 |
you go from James Joyce to Hemingway or something. 01:49:43.880 |
That does, there's probably things you can do in that kind of direction. 01:49:47.480 |
- We might, you know, we're gonna talk about legalese at some point, 01:49:50.760 |
so maybe we'll talk about that kind of thinking with applied to legalese. 01:49:55.400 |
- Well, let's talk about legalese, 'cause you mentioned that as an exception. 01:50:03.480 |
- That you say that most natural languages, as we've been talking about, 01:50:08.840 |
have local dependencies, with one exception, legalese. 01:50:15.160 |
- Oh, well, legalese is what you think it is. 01:50:19.880 |
- Well, I mean, like, I actually know very little 01:50:24.120 |
- So I'm just talking about language in laws and language in contracts. 01:50:30.520 |
we have to run into every other day or every day, 01:50:38.280 |
And, or, you know, partly it's just long, right? 01:50:40.760 |
There's a lot of text there that we don't really want to know about. 01:50:46.200 |
so I've been working with this guy called Eric Martinez, 01:50:49.960 |
who is a, he was a lawyer who was taking my class. 01:50:53.560 |
I was teaching a psycholinguistics lab class, 01:50:55.800 |
and I have been teaching it for a long time at MIT, 01:51:00.120 |
And he took the class 'cause he had done some linguistics as an undergrad, 01:51:03.400 |
and he was interested in the problem of why legalese sounds hard to understand. 01:51:09.320 |
You know, why, and so why is it hard to understand, 01:51:11.880 |
and why do they write that way if it is so hard to understand? 01:51:15.320 |
It seems apparent that it's hard to understand. 01:51:20.280 |
And we did an evaluation of a bunch of contracts. 01:51:24.760 |
Actually, we just took a bunch of sort of random contracts, 01:51:29.240 |
contracts and laws might not be exactly the same, 01:51:33.720 |
that most people have to deal with most of the time. 01:51:36.040 |
And so that's kind of the most common thing that humans have, 01:51:38.680 |
like humans, that adults in our industrialized society 01:51:48.520 |
but it turns out that the way they're written is very center-embedded, 01:51:59.000 |
and it does have, surprisingly, slightly lower-frequency words 01:52:10.680 |
You just revealed a game that lawyers are playing. 01:52:18.920 |
so now you're saying it's, they're doing it intentionally. 01:52:20.760 |
I don't think they're doing it intentionally. 01:52:30.520 |
so like, 'cause it turns out that we're not the first 01:52:34.200 |
Like, back to, Nixon had a plain language act in 1970, 01:52:45.640 |
"Oh, we've got to simplify legal language, must simplify it." 01:52:52.040 |
You need to know what it is you're supposed to do 01:52:55.480 |
And so you need to like, you need a psycholinguist 01:52:58.120 |
to analyze the text and see what's wrong with it 01:53:05.400 |
And so what we did was just, that's what we did. 01:53:08.280 |
We just took a bunch of contracts, had people, 01:53:17.240 |
And so that is like, basically how often a clause 01:53:23.240 |
would intervene between a subject and a verb. 01:53:26.200 |
For example, that's one kind of a center embedding 01:53:29.480 |
And turns out they're massively center-embedded. 01:53:32.440 |
Like, so I think in random contracts and in random laws, 01:53:35.720 |
I think you get about 70% or 80%, something like 70% 01:53:39.480 |
of sentences have a center-embedded clause in them, 01:53:43.400 |
If you go to any other text, it's down to 20% or something. 01:53:46.680 |
It's so much higher than any control you can think of, 01:53:54.280 |
No, people don't write center-embedded sentences 01:53:59.720 |
it's in the 20%, 30% realm, as opposed to 70. 01:54:03.080 |
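[Editor's note: a rough sketch of how the measure described above could be approximated with an off-the-shelf dependency parser. The spaCy model name is real, but the set of clause labels and the subject-verb heuristic are my assumptions, not the exact criteria from the study.]

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline with a parser
CLAUSE_DEPS = {"relcl", "acl", "advcl", "ccomp"}  # assumed clause markers

def has_center_embedded_clause(sent):
    """True if some clause head sits strictly between a subject and its verb."""
    for tok in sent:
        if tok.dep_ in ("nsubj", "nsubjpass") and tok.head.pos_ in ("VERB", "AUX"):
            lo, hi = sorted((tok.i, tok.head.i))
            if any(t.dep_ in CLAUSE_DEPS and lo < t.i < hi for t in sent):
                return True
    return False

doc = nlp("The payment, which the company had promised, was reduced.")
print(sum(has_center_embedded_clause(s) for s in doc.sents))  # counts such sentences
```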
And so there's that, and there's low-frequency words. 01:54:09.000 |
Passive, for some reason, there's much more of the passive voice in English legalese 01:54:22.600 |
than there is in other texts. - And the passive voice 01:54:24.120 |
accounts for some of the low-frequency words. 01:54:25.720 |
- No, no, no, no, those are separate, those are separate. 01:54:28.040 |
- Oh, so passive voice sucks, low-frequency word sucks. 01:54:36.120 |
These are things which happen in legalese text. 01:54:38.920 |
The dependent measure is how well you understand 01:54:47.240 |
So the passive, it turns out, has zero effect on your comprehension ability, 01:54:55.880 |
and low-frequency words are gonna hurt you, 01:54:59.480 |
But what really hurts is the center embedding. 01:55:08.920 |
That makes them, they can't recall what was said as well. 01:55:20.840 |
We also ran lawyers, from sort of different levels of law firms and stuff. 01:55:34.840 |
They didn't process it just as well as the un-center-embedded versions. 01:55:41.240 |
So they can much better recall, much better understanding, 01:55:51.400 |
So we constructed non-center embedded versions 01:55:54.600 |
We constructed versions which have higher frequency words 01:56:08.280 |
And the un-center embedding makes big differences 01:56:16.280 |
But how hard is it to detect center embedding? 01:56:32.200 |
- So you're not just looking for long dependencies. 01:56:34.280 |
You're just literally looking for center embedding. 01:56:35.880 |
- Yeah, yeah, we are in this case, in these cases. 01:56:37.480 |
But long dependencies and center embeddings, they're highly correlated. 01:56:43.160 |
A clause you throw inside of a sentence just blows up the dependency lengths. 01:56:47.560 |
Can I read a sentence for you from these things? 01:56:49.560 |
- I mean, this is just like one of the things that, 01:57:00.600 |
It goes, "In the event that any payment or benefit 01:57:02.920 |
by the company, all such payments and benefits, 01:57:06.440 |
under Section 3A hereof, being hereinafter 01:57:13.160 |
then the cash severance payments shall be reduced." 01:57:15.320 |
So that's something we pulled from a regular text, 01:57:23.400 |
They throw the definition of what payments and benefits are 01:57:31.560 |
- How about put the definition somewhere else, 01:58:02.440 |
and they much preferred the un-center-embedded versions. 01:58:06.920 |
- Yeah, and we asked them, "Would you hire someone 01:58:12.360 |
and they always preferred the less complicated version, 01:58:28.600 |
that there's actually some kind of a performative meaning 01:58:40.120 |
Like, that's a reasonable guess, and maybe it's just... 01:58:47.480 |
So we kind of call this the magic spell hypothesis. 01:58:49.880 |
So when you tell someone to put a magic spell on someone, 01:58:58.440 |
You know, that's kind of what people will tend to do. 01:59:00.440 |
They'll do rhyming, and they'll do sort of like 01:59:05.320 |
And maybe there's a syntactic sort of reflex here 01:59:12.760 |
And so that's like, oh, it's trying to tell you 01:59:19.960 |
It's telling you something that we want you to believe 01:59:24.280 |
That's what legal contracts are trying to enforce on you. 01:59:31.400 |
This is like a very abstract form, center embedding, 01:59:39.720 |
for lawyers to generate things that are hard to understand? 02:00:05.080 |
what is broken about the gigantic bureaucracy 02:00:07.320 |
that leads to Chernobyl or something like this. 02:00:09.240 |
I think the incentives under which you operate 02:00:26.760 |
look at the system, as opposed to asking individual lawyers 02:00:35.160 |
- Like you're gonna need a lawyer to figure that out, 02:00:40.120 |
I guess, from the perspective of the individual. 02:00:42.360 |
But then that could be the performative aspect. 02:00:44.360 |
It could be as opposed to the incentive-driven 02:01:16.760 |
- Influential bad apples that everybody looks up to, 02:01:21.400 |
or whatever, they're like central figures in how-- 02:02:04.840 |
What they're doing is just throwing stuff in there 02:02:16.280 |
'cause if you only use it in that one sentence, 02:02:19.160 |
then there's no reason to introduce extra terms. 02:02:35.080 |
- So maybe the next president of the United States 02:02:43.480 |
and make Ted the language czar of the United States. 02:02:47.800 |
Martinez is the guy you should really put in there. 02:02:53.160 |
- But center embeddings are the bad thing to have. 02:03:02.780 |
- And it is really fascinating on many fronts 02:03:12.040 |
So one of the mathematical formulations you have 02:03:28.360 |
So Shannon, Claude Shannon was a student at MIT 02:03:33.800 |
And so he wrote this very influential piece of work 02:03:37.320 |
about communication theory or information theory. 02:03:40.120 |
And he was interested in human language, actually. 02:03:43.560 |
He was interested in this problem of communication, 02:03:46.600 |
of getting a message from my head to your head. 02:03:59.080 |
And so assuming we both speak the same language, 02:04:14.200 |
And then the problem there in the communication 02:04:18.760 |
is that there's a lot of noise in the system. 02:04:23.080 |
I don't speak perfectly, I make errors, that's noise. 02:04:34.520 |
There's some speaking going on that you're at a party, 02:04:39.320 |
You're trying to hear someone, it's hard to understand them 02:04:41.400 |
because there's all this other stuff going on 02:04:48.520 |
so that you have some problem maybe understanding me 02:04:52.040 |
for stuff that's just internal to you in some way. 02:05:01.240 |
You know, who knows why you're not able to pay attention 02:05:05.800 |
And so that language, if it's a communication system, 02:05:12.440 |
the passing of the message from one side to the other. 02:05:15.160 |
And so, I mean, one idea is that maybe aspects of like 02:05:21.640 |
word order, for example, might've been optimized in some way 02:05:24.440 |
to make language a little more easy to be passed across the noisy channel. 02:05:32.920 |
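[Editor's note: the noisy-channel idea can be sketched with Bayes' rule: a listener scores each candidate intended sentence by prior times likelihood, P(intended | perceived) ∝ P(intended) · P(perceived | intended). All probabilities below are made-up illustrative numbers.]

```python
# Toy noisy-channel comprehension. The listener hears an implausible
# sentence and asks: did the speaker really mean that, or is it noise?
prior = {  # P(intended): world knowledge about plausible messages
    "the mother gave the candle to the daughter": 0.95,
    "the mother gave the daughter to the candle": 0.05,
}

def likelihood(perceived, intended):
    # P(perceived | intended): each word substitution from the intended
    # sentence is assumed to cost a factor of 0.3 (an arbitrary noise rate)
    swaps = sum(a != b for a, b in zip(perceived.split(), intended.split()))
    return 0.8 if swaps == 0 else 0.3 ** swaps

perceived = "the mother gave the daughter to the candle"
posterior = {s: prior[s] * likelihood(perceived, s) for s in prior}
print(max(posterior, key=posterior.get))
# -> the plausible message wins: the listener "corrects" the noisy percept
```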
You know, it's very interesting, historically, 02:05:37.080 |
He was at MIT and he did, this was his master's thesis 02:05:40.840 |
You know, it's crazy how much he did for his master's thesis 02:05:48.440 |
And it just wasn't a popular, communication as a reason, 02:05:53.960 |
a source for what language was, wasn't popular at the time. 02:05:56.760 |
So Chomsky was becoming dominant, the field was moving in there. 02:05:58.760 |
And he just wasn't able to get a handle there, I think. 02:06:01.880 |
And so he moved to Bell Labs and worked on communication 02:06:05.880 |
from a mathematical point of view and was, you know, 02:06:12.120 |
- More on the signal side versus like the language side. 02:06:27.480 |
- We can kind of show that there's a noisy channel process 02:06:31.160 |
going on in when you're listening to me, you know, 02:06:34.520 |
you can often sort of guess what I meant by what I, you know, 02:06:39.560 |
And I mean, with respect to sort of why language 02:06:43.240 |
looks the way it does, we might, there might be sort of, 02:06:45.480 |
as I alluded to, there might be ways in which word order 02:06:49.000 |
is somewhat optimized for, because of the noisy channel 02:06:53.000 |
- I mean, that's really cool to sort of model 02:06:55.480 |
if you don't hear certain parts of a sentence 02:06:57.800 |
or have some probability of missing that part, 02:07:05.720 |
- And then you're kind of saying like the word order 02:07:07.880 |
and the syntax of language, the dependency length 02:07:14.200 |
- Yeah, well, the dependency length is really about memory. 02:07:17.080 |
I think that's like about sort of what's easier or harder 02:07:20.920 |
And these other ideas are about sort of robustness 02:07:24.440 |
So the problem of potential loss of signal due to noise. 02:07:28.840 |
It's so that there may be aspects of word order, 02:07:33.480 |
And, you know, we have this one guess in that direction. 02:07:41.480 |
All we can do is like, look at the current languages 02:07:44.120 |
This is like, we can't sort of see how languages change 02:07:46.520 |
or anything because we've got these snapshots of a few, 02:07:49.320 |
you know, a hundred or a few thousand languages. 02:07:54.920 |
of modifications to test these things experimentally. 02:07:57.560 |
And so, you know, so just take this with a grain of salt. 02:08:01.720 |
The dependency stuff I can, I'm much more solid on. 02:08:10.600 |
Here's like, why, you know, why does word order look the way it does? 02:08:14.040 |
We're now into shaky territory, but it's kind of cool. 02:08:25.320 |
And you model with a noisy channel, the loudness, 02:08:30.360 |
the noise, and we have the signal that's coming across. 02:08:33.480 |
And you're saying word order might have something 02:08:36.200 |
to do with optimizing that, where there's presence of noise. 02:08:41.480 |
I mean, to me, it's interesting how much you can load 02:08:51.480 |
at least three different kinds of things going on there. 02:08:53.960 |
And we probably don't want to treat them all as the same. 02:08:56.440 |
- And so I think that you, you know, the right model, 02:08:58.760 |
a better model of a noisy channel would treat background noise, 02:09:06.360 |
speaker-inherent noise, and listener-inherent noise. 02:09:10.040 |
And those are not, those are all different things. 02:09:11.960 |
But then underneath it, there's a million other subsets. 02:09:18.120 |
I just mentioned cognitive load on both sides. 02:09:27.960 |
we start to creep into the meaning realm of like, 02:09:36.760 |
And so if it's second language for you versus first language, 02:09:40.200 |
and how, maybe what other languages you know, 02:09:44.200 |
And that's like potentially very informative. 02:09:49.800 |
So like a child learning a language is a, you know, 02:09:53.000 |
as a noisy representation of English grammar, 02:09:58.520 |
So maybe when they're six, they're perfectly formed. 02:10:04.200 |
is like a way to measure a language's learning problems. 02:10:08.280 |
So like, what's the correlation between everything 02:10:22.200 |
Is there some kind of, or like the dependency grammar, 02:10:30.280 |
- Yeah, well, of all the languages in the world's languages, 02:10:33.160 |
none that we know of right now is any better than any other 02:10:36.600 |
with respect to sort of optimizing dependency links. 02:10:53.880 |
You know, they're just sort of noisy solutions 02:11:01.880 |
They're probably optimized for communication. 02:11:11.240 |
if it were just about minimizing dependency links 02:11:22.600 |
but languages always have regularity in their rules. 02:11:32.360 |
all that mattered was keeping the dependencies 02:11:52.520 |
So they're very, and depending on the language, 02:11:56.040 |
So you speak Russian, they're less strict than English. 02:12:05.320 |
Like that's probably not about communication. 02:12:11.080 |
It's probably easier to learn regular things, 02:12:14.440 |
things which are very predictable and easy to, 02:12:16.760 |
so that's probably about learning is our guess, 02:12:31.800 |
We have freer, but not free, like there's always-- 02:12:37.720 |
is like cultural, like sticky cultural things, 02:12:44.760 |
that it's an imperfect, it's a noisy, stochastic optimization. 02:12:44.760 |
The function over which you're optimizing is very noisy. 02:12:57.720 |
that learning is part of the objective function, 02:13:00.120 |
'cause some languages are way harder to learn than others, 02:13:12.520 |
- But that depends on what you started with, right? 02:13:14.920 |
So, it really depends on how close that second language is to your first language. 02:13:19.080 |
And so, yes, it's very, very hard to learn Arabic 02:13:26.280 |
if you've started with English. Chinese, I think, is the worst. 02:13:29.000 |
There's like Defense Language Institute in the United States 02:13:46.840 |
like by three or four, they speak that language. 02:13:49.320 |
And so, there's no evidence of anything harder or easier 02:13:56.440 |
How much of language, this is returning to Chomsky a little bit, is innate? 02:13:56.440 |
to explain away certain things that are observed. 02:14:15.160 |
- I mean, the answer is I don't know, of course. 02:14:19.400 |
But I mean, I like to, I'm an engineer at heart, I guess, 02:14:28.600 |
And so, I'm guessing that a lot of it's learned. 02:14:31.240 |
So, I think the reason Chomsky went with the innateness 02:14:34.120 |
is because he hypothesized movement in his grammar. 02:14:40.120 |
He was interested in grammar, and movement's hard to learn. 02:14:43.400 |
Movement is a hard, it's a hard thing to learn, 02:14:45.320 |
to learn these two things together and how they interact. 02:14:47.720 |
And there's a lot of ways in which you might generate 02:14:50.280 |
exactly the same sentences, and it's really hard. 02:14:54.840 |
Sorry, so I guess it's not learned, it's innate. 02:14:59.160 |
and just think about that in a different way, 02:15:11.400 |
It's actually, it's a valuable asset of the theory. 02:15:23.640 |
And that's kind of why I think these large language models 02:15:25.640 |
are learning so well, is because I think you can learn 02:15:28.520 |
the form, the forms of human language from the input. 02:15:37.880 |
That could be just, you don't need, you don't need anything innate. 02:15:46.680 |
It doesn't have to, so there's something called the visual word form area 02:16:01.160 |
which does visual word processing if you read, 02:16:13.720 |
And so, the modularization is not evidence for innateness. 02:16:39.000 |
or something goes really wrong on the left side, 02:17:00.440 |
And there's these natural experiments which happen, 02:17:15.880 |
'cause they happen to be accidentally scanned 02:17:18.360 |
It's like, what happened to your left hemisphere? 02:17:28.920 |
So, that's like a very interesting current research. 02:17:47.560 |
And she happened to be a writer for the New York Times. 02:17:50.200 |
And there was an article in the New York Times 02:17:58.920 |
about sort of the general process of MRI and language. 02:18:04.040 |
And because she's writing for the New York Times, 02:18:11.880 |
because they've been accidentally scanned for some reason 02:18:24.360 |
They're kind of messy, but natural experiments. 02:18:29.000 |
- The first few hours, days, months of human life 02:18:46.040 |
all that kind of stuff, no matter what happens. 02:18:47.960 |
Not no matter what, but robust to the different ways 02:19:02.760 |
that language seems to be happening separate from thought? 02:19:26.600 |
that it could be completely separate from thought. 02:19:39.240 |
You can't do a thought experiment to figure that out. 02:19:41.720 |
You need a scanner, you need brain-damaged people, 02:19:44.760 |
you need something, you need ways to measure that. 02:19:57.800 |
There's no way to say that the language network 02:20:11.400 |
So you can always make, "Oh, it's only two people. 02:20:16.200 |
"It's four people," or something for the patients. 02:20:18.920 |
And there's something special about them we don't know. 02:20:20.760 |
But these are just random people, and with lots of them. 02:20:33.000 |
What's the connection between culture and language? 02:20:37.000 |
You've also mentioned that much of our study of language 02:20:49.880 |
Western, educated, industrialized, rich, and democratic. 02:20:49.880 |
And he basically was pushing that observation 02:21:20.360 |
when we're talking in psychology or sociology, 02:21:25.240 |
about humans if we're talking about undergrads 02:21:37.640 |
there's a lot of other kinds of languages in the world 02:21:51.960 |
I mean, of course, English and Chinese cultures are different, 02:21:54.600 |
but hunter-gatherers are much more different in some ways. 02:21:59.160 |
And so if culture has an effect on what language is, 02:22:03.080 |
then we kind of want to look there as well as looking at industrialized ones. 02:22:06.760 |
It's not like the industrialized cultures aren't interesting. 02:22:09.640 |
But we want to look at non-industrialized cultures as well. 02:22:28.440 |
in that they do a little bit of farming as well, 02:23:05.240 |
I mean, he was a missionary actually initially 02:23:19.160 |
the Chimani and the Piraha are both isolate languages, 02:23:22.440 |
meaning there's no known related languages at all. 02:23:22.440 |
And so in Africa, you've got a lot of moving of people 02:24:03.560 |
if you've got to move because you've got no water, 02:24:08.600 |
And then you run into contact with other tribes, 02:24:15.000 |
And so people can stay there for hundreds and hundreds 02:24:19.000 |
And so these groups, the Chimani and the Piraha 02:24:23.720 |
And they can just, I guess they've just lived there 02:24:30.600 |
And so, I mean, I'm interested in them because they are, 02:24:35.000 |
I mean, in these cases, I'm interested in their words, 02:34:40.680 |
their orders of words, but I'm mostly just interested 02:24:49.320 |
And so with the Piraha, sort of most interesting, 02:24:51.560 |
I was working on number there, number information. 02:24:54.760 |
And so the basic idea is I think language is invented. 02:25:01.800 |
It's the same idea so that what you need to talk about 02:25:05.880 |
with someone else is what you're gonna invent words for. 02:25:09.160 |
And so we invent labels for colors that I need, 02:25:12.680 |
not that I can see, but the things I need to tell you about 02:25:21.800 |
or a word for aquamarine in the Amazon jungle 02:25:26.920 |
for the most part, because I don't have two things 02:25:31.480 |
And so numbers are really another fascinating source 02:25:35.400 |
of information here where you might, naively, 02:25:39.640 |
I certainly thought that all humans would have words 02:25:47.160 |
Okay, so they don't have any words for even one. 02:25:50.440 |
There's not a word for one in their language. 02:25:54.040 |
And so there's certainly not a word for two, three or four. 02:26:01.800 |
- How are you gonna ask, I want two of those? 02:26:04.840 |
And so that's just not a thing you can possibly ask 02:26:12.440 |
Okay, so it was thought to be a one, two, many language. 02:26:16.200 |
There are three words for quantifiers for sets, 02:26:19.480 |
but people had thought that those meant one, two and many. 02:26:23.800 |
But what they really mean is few, some and many. 02:26:31.880 |
and this is kind of cool, is that we gave people, 02:26:39.880 |
doesn't really matter what they are, identical objects. 02:26:42.280 |
And I sort of start off here, I just give you one of those 02:26:46.920 |
Okay, so you're a Piraha speaker and you tell me what it is. 02:26:49.640 |
And then I give you two and say, what's that? 02:26:51.640 |
And nothing's changing in the set except for the number, okay? 02:26:55.080 |
And then I just ask you to label these things. 02:26:56.760 |
We just do this for a bunch of different people. 02:27:03.320 |
So they say the word that we thought was one, it's few, 02:27:06.920 |
for the first one, and then maybe they say few again, 02:27:15.160 |
And then five, six, seven, eight, I go all the way to 10. 02:27:21.880 |
because they told me what the word was for six, seven, eight. 02:27:26.040 |
And I'm gonna continue asking them at nine and 10. 02:27:30.200 |
They understand that I wanna know their language. 02:27:33.640 |
is I'm trying to learn their language, and so that's okay. 02:27:37.960 |
'cause they already told me what the word for many was, 02:27:43.480 |
So it's a little funny to do this task over and over. 02:27:46.040 |
We did this with a guy called, Dan was our translator. 02:27:49.160 |
He's the only one who really speaks Piraha fluently. 02:27:53.160 |
He's a good bilingual for a bunch of languages, 02:28:10.520 |
and they all do exactly the same labeling from one up. 02:28:10.520 |
We do some of them up, some of them down first, okay? 02:28:19.000 |
And so we do, instead of one to 10, we do 10 down to one. 02:28:45.480 |
and there's gonna be a threshold in the context. 02:28:53.640 |
I mean, that's gonna depend completely on the context. 02:28:55.960 |
- And that might actually be, at first, hard to discover, 02:29:09.000 |
That's fascinating that numbers don't present themselves. 02:29:28.040 |
We put out those spools of thread again, okay? 02:29:34.680 |
And those happened to be uninflated red balloons. 02:29:39.000 |
It's just they're a bunch of exactly the same thing. 02:29:57.000 |
because I did this with this guy, Mike Frank, 02:29:59.560 |
and I'd be the experimenter telling him to do this 02:30:06.920 |
All we had to, I didn't have to speak Piraha. 02:30:06.920 |
Like, "do what he did" is all we had to be able to say. 02:30:10.680 |
We do some sort of random number of items up to 10. 02:30:28.040 |
I don't need to know how many there are there 02:30:31.000 |
And they would make mistakes, but very, very few. 02:30:37.160 |
Just gonna say, like there's no, these are low stakes. 02:30:50.680 |
But I just don't know what he did wrong there 02:30:54.200 |
And, you know, I can train my dog to do this task. 02:31:00.360 |
But the other task that was sort of more interesting 02:31:12.360 |
I just put an opaque sheet in front of the things. 02:31:12.360 |
And it's easy if it's two or three, it's very easy. 02:31:37.000 |
- For us it's easy 'cause we just count them. 02:32:02.360 |
And so then, and there's a bunch of tasks we did 02:32:07.080 |
They did approximate after five on all those tasks. 02:32:17.240 |
- There's a little bit of a chicken and egg thing there. 02:32:27.960 |
won't be able to come up with a counting task. 02:32:36.680 |
So yes, you develop counting because you need it. 02:32:50.360 |
They do matching really well for building purposes, 02:32:53.160 |
building some kind of hut or something like this. 02:32:55.720 |
So it's interesting that language is a limiter 02:33:22.840 |
This is one of those problems with the snapshot 02:33:40.520 |
and you have say 17 goats and you go to bed at night 02:33:44.520 |
boy, it's easier to have a count system to do that. 02:33:50.520 |
So they don't have, like, people often ask me 02:33:54.600 |
they say, "Well, don't these Purahan, don't they have kids? 02:33:57.800 |
I'm like, "Yeah, they have a lot of children." 02:34:02.600 |
And they go, "Well, don't they need the numbers 02:34:10.760 |
because that's not how you keep track of your kids. 02:34:22.840 |
If you replaced one with someone else, I would care. 02:34:35.400 |
you're gonna know them actually individually also. 02:34:38.200 |
- I mean, cows, goats, if there's a source of food and milk 02:34:44.600 |
such that you don't have to care about their identities 02:35:02.520 |
Like, why do we have all these different languages? 02:35:07.880 |
Well, my guess is that the function of a language 02:35:13.240 |
I mean, unless there's some function to that language 02:35:36.520 |
And so there's a neighboring group called Mosetén, 02:35:36.520 |
So there's two languages which are really close, 02:35:54.680 |
in that it has a lot of contact with Spanish, 02:35:59.560 |
The reason it's dying is there's not a lot of value 02:36:03.000 |
for the local people in their native language. 02:36:06.600 |
So there's much more value in knowing Spanish, 02:36:19.800 |
They want, and so Mosetén is in danger and is dying. 02:36:19.800 |
the reason we learn language is to communicate, 02:36:35.000 |
and to do whatever it is to feed our families. 02:36:38.200 |
And if that's not happening, then it won't take off. 02:36:47.320 |
It's not because it's an easy language to learn. 02:36:54.200 |
- But because the United States is a gigantic economy, 02:37:02.120 |
and so there's a motivation to learn Mandarin. 02:37:09.880 |
because there's so, so many speakers all over the world. 02:37:21.000 |
that do want to learn language just for language's sake, 02:37:39.000 |
- And that, well, well-- - We're moving towards 02:37:53.640 |
but if you look at geopolitics and superpowers, 02:37:58.120 |
it does seem that there's another thing in tension, 02:38:00.040 |
which is a language is a national identity sometimes. 02:38:07.400 |
Language, Ukrainian language is a symbol of that war 02:38:11.240 |
in many ways, like a country fighting for its own identity. 02:38:18.520 |
is the convenience of trade and the economics 02:38:21.720 |
and be able to communicate with neighboring countries 02:38:25.240 |
and trade more efficiently with neighboring countries, 02:38:31.320 |
- I completely agree. - 'Cause language is the way, 02:38:33.720 |
for every community, like dialects that emerge 02:38:42.520 |
for people to say F-U to the more powerful people. 02:38:48.280 |
So in that way, language can be used as that tool. 02:38:52.760 |
And there's a lot of work to try to create that identity 02:38:57.960 |
As a cognitive scientist and language expert, 02:39:02.840 |
I hope that continues because I don't want languages to die. 02:39:07.800 |
because they're so interesting for so many reasons. 02:39:15.880 |
just for the language part, but I think there's a lot 02:39:25.400 |
that can break down the barriers of language? 02:39:27.400 |
So while all these different diverse languages exist, 02:39:29.960 |
I guess there's many ways of asking this question, 02:39:36.760 |
in an automated way from one language to another? 02:39:43.160 |
So there are concepts that are in one language 02:39:51.400 |
So like good luck translating a lot of English into Piraha. 02:39:56.680 |
There's no way to do it because there are no words 02:40:09.800 |
And so I just don't know what those concepts are. 02:40:11.800 |
I mean, the space, the world space is different 02:40:19.000 |
things are, it's gonna have to do with their life 02:40:25.720 |
And so there's gonna be problems like that always. 02:40:36.360 |
It's like extreme, I'd say in the number space, 02:40:39.560 |
exact number space, but in the color dimension, right? 02:40:45.960 |
that you don't have ways to talk about the concepts. 02:40:49.480 |
- And there might be entire concepts that are missing. 02:40:51.640 |
But to you, it's more about the space of concept 02:40:59.160 |
But so you were talking earlier about translation 02:41:06.360 |
I mean, now we're talking about translations of form, right? 02:41:26.840 |
but there's a music and a rhythm to the form. 02:41:32.120 |
like the difference between Dostoevsky and Tolstoy 02:41:34.840 |
or Hemingway, Bukowski, James Joyce, like I mentioned, 02:41:49.720 |
I'm optimistic that we could get measures of those things. 02:41:59.560 |
- Translating to Hemingway is probably the lowest, 02:42:06.920 |
in the average per-sentence dependency length. 02:42:15.720 |
- It's simple sentences with short dependencies, yeah, yeah, yeah. 02:42:18.920 |
- I mean, that's when, if you have really long sentences, 02:42:21.400 |
even if they don't have center embedding, like-- 02:42:30.200 |
- But it is much more likely to have the possibility 02:42:33.400 |
of long dependencies with long sentences, yeah. 02:42:39.000 |
who does a lot of cool stuff, really brilliant. 02:42:42.120 |
He works with Tristan Harris and a bunch of stuff, 02:42:43.960 |
but he was talking to me about communicating with animals. 02:42:52.040 |
where you're trying to find the common language 02:42:57.640 |
And he was saying that there's a lot of promising work 02:43:02.120 |
that even though the signals are very different, 02:43:04.840 |
like the actual, if you have embeddings of the languages, 02:43:10.760 |
they're actually trying to communicate similar type things. 02:43:20.760 |
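[Editor's note: the embedding-alignment idea being described is often implemented as an orthogonal Procrustes problem: find the rotation that best maps one embedding space onto another. Below is a minimal NumPy sketch on synthetic data; whether animal communication embeddings actually align this way is the open empirical question.]

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Rotation W minimizing ||X @ W - Y|| over orthogonal matrices."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                  # "embeddings" in system A
R = np.linalg.qr(rng.normal(size=(8, 8)))[0]  # a hidden rotation
Y = X @ R                                     # system B = rotated system A
W = orthogonal_procrustes(X, Y)
print(np.allclose(X @ W, Y))  # True: the two spaces align up to rotation
```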
in everything you've seen in different cultures, 02:43:33.480 |
There's this sort of weird view, well, odd view, I think, 02:43:38.280 |
to think that human language is somehow special. 02:43:44.600 |
We can certainly do more than any of the other species. 02:43:48.040 |
And maybe our language system is part of that. 02:43:55.400 |
But people have often talked about how human, 02:44:00.760 |
how only human language has this compositionality thing 02:44:21.240 |
And the reasoning is like, that's bad reasoning. 02:44:25.400 |
I'm pretty sure if you asked a whale what we're saying, 02:44:28.600 |
they'd say, well, they're making a bunch of weird noises. 02:44:28.600 |
- And so it's like, this is a very odd reasoning 02:44:37.000 |
because we're the only ones who have human language. 02:44:38.600 |
I'm like, well, we don't know what those other, 02:44:46.680 |
And it might very well be something complicated 02:44:51.240 |
I mean, sure, with a small brain in lower species, 02:44:55.960 |
there's probably not a very good communication system. 02:45:00.120 |
what seems to be abilities to communicate something, 02:45:05.160 |
there might very well be a lot more signal there 02:45:10.600 |
- But also if we have a lot of intellectual humility here, 02:45:16.120 |
who I admire very much, has talked a lot about, 02:45:23.640 |
So like, yes, the signal there is even less than, 02:45:28.200 |
but like, it's not out of the realm of possibility 02:45:40.200 |
through some way of communicating with each other. 02:45:43.560 |
And if you have enough humility about that possibility, 02:45:46.120 |
I think you can, I think it would be a very interesting, 02:45:49.640 |
in a few decades, maybe centuries, hopefully not, 02:45:52.840 |
a humbling possibility of being able to communicate, 02:46:07.080 |
but some of them will. - But you could still-- 02:46:13.080 |
there could be some interesting trees out there. 02:46:17.000 |
Well, they're probably talking to other trees, right? 02:46:23.160 |
to some other conspecific, as opposed to us, right? 02:46:27.960 |
And so there probably is, there may be some signal there. 02:46:32.680 |
actually it's pretty common to say that human language 02:46:50.200 |
'cause we don't speak these other communication systems 02:47:02.120 |
- Let me ask you a wild, out there sci-fi question. 02:47:05.160 |
If we make contact with an intelligent alien civilization 02:47:25.400 |
He is like amazing at learning foreign languages. 02:47:28.760 |
And so this is an amazing feat to be able to go. 02:47:37.320 |
- Well, there was a guy that had been there before, 02:48:05.480 |
is one of the most basic things to figure out. 02:48:10.760 |
and you just throw a stick down and say, "Stick." 02:48:22.280 |
that there weren't any count words in this language 02:48:24.760 |
because they didn't know this was interesting. 02:48:24.760 |
But you have to be pretty out there socially, 02:48:37.800 |
and these are really very different people from you. 02:48:37.800 |
is that's how a lot of people know a lot of languages, 02:48:49.960 |
- That's a tough one, where you just show up knowing nothing. 02:48:53.800 |
- It's beautiful that humans are able to connect in that way. 02:49:10.760 |
- When you see something interesting, just go and do it. 02:49:17.560 |
So when I saw that the Piraha were available to go and visit, 02:49:17.560 |
we had some trouble with the Brazilian government. 02:49:31.480 |
And so I was like, "All right, I gotta find another group." 02:49:33.720 |
And so we searched around and we were able to find the Chimani, 02:49:33.720 |
because I wanted to keep working on this kind of problem. 02:49:38.520 |
And so we found the Chimani and just went there. 02:49:38.520 |
I didn't really have, we didn't have contact. 02:49:40.520 |
We had a little bit of contact and brought someone. 02:49:44.680 |
And that was, we just kind of just try things. 02:49:48.440 |
I say it's like, a lot of that's just like ambition, 02:49:51.080 |
just try to do something that other people haven't done. 02:49:54.040 |
Just give it a shot is what I, I mean, I do that all the time. 02:50:20.360 |
please check out our sponsors in the description. 02:50:23.160 |
And now let me leave you with some words from Wittgenstein. 02:50:26.520 |
The limits of my language mean the limits of my world.