Stanford CS224N NLP with Deep Learning | 2023 | Lecture 14 - Insights between NLP and Linguistics

00:00:00.000 | 
Cool. Hi, everyone. Hi, I'm Isabel. I'm a PhD student in the NLP group. This lecture is about 00:00:15.920 | 
connecting insights between NLP and linguistics. Yeah, so hopefully we're going to learn some 00:00:21.040 | 
linguistics and think about some cool things about language. Some logistics. We're in the 00:00:27.760 | 
project part of the class, which is cool. We're so excited to see everything you guys 00:00:31.760 | 
do. You should have a mentor grader assigned through your project proposal: whoever 00:00:39.360 | 
graded your project proposal. Especially if you're in a custom project, we recommend 00:00:45.080 | 
that you go to your grader's office hours. They'll know the most and be most into your 00:00:51.120 | 
project. And project milestones are due next Thursday. So that's in one week from now. 00:00:57.800 | 
So hopefully you guys are all getting warmed up doing some things for the project. And 00:01:04.160 | 
we'd love to hear where you are next week. Cool. So the main thing that I'm going to 00:01:10.960 | 
talk about today is that there's been kind of a paradigm shift for the role of linguistics 00:01:16.760 | 
in NLP due to large language models. So it used to be that there was just human language. 00:01:23.880 | 
We create it all the time. We're literally constantly creating it. And then we would analyze it 00:01:28.440 | 
in all these ways. Maybe we want to make trees out of it. Maybe we want to make different 00:01:31.520 | 
types of trees out of it. And then all that would kind of go into making some kind of 00:01:38.240 | 
computer system that can use language. And now we've cut out this middle part. So we 00:01:46.280 | 
have human language. And we can just immediately train a system that's very competent in human 00:01:53.120 | 
language. And so now we have all this analysis stuff from before. And we're still producing 00:02:01.400 | 
more and more of it. There's still all this structure, all this knowledge that we know 00:02:04.420 | 
about language. And the question is, is this relevant at all to NLP? And I'm going to show 00:02:10.200 | 
how it's useful for looking at these models, understanding these models, understanding 00:02:17.280 | 
how things work, what we can expect, what we can't expect from large language models. 00:02:25.760 | 
So in this lecture, we'll learn some linguistics, hopefully. Language is an amazing thing. It's 00:02:32.120 | 
like so fun to think about language. And hopefully, we can instill some of that in you. Maybe 00:02:36.800 | 
you'll go take Ling 1 or something after this. And we'll discuss some questions about NLP 00:02:44.160 | 
and linguistics. Where does linguistics fit in for today's NLP? And what does NLP have 00:02:49.240 | 
to gain from knowing and analyzing human language? What does a 224N student have to gain from 00:02:59.680 | 
So for the lecture today, we're going to start off talking about structure in human language, 00:03:07.640 | 
thinking about the linguistics of syntax and how structure works in language. We're going 00:03:11.280 | 
to then move on to looking at linguistic structure in NLP, in language models, the kind of analysis 00:03:18.640 | 
that people have done for understanding structure in NLP. And then we're going to think of going 00:03:26.660 | 
beyond pure structure, so beyond thinking about syntax, thinking about how meaning and 00:03:36.080 | 
discourse and all of that play into making language language and how we can think of 00:03:41.640 | 
this both from a linguistic side and from a deep learning side. 00:03:46.200 | 
And then lastly, we're going to look at multilinguality and language diversity in NLP. Cool. So starting 00:03:56.260 | 
off with structure in human language, just like a small primer in language in general, 00:04:04.400 | 
if you've taken any intro to linguistics class, you'll know all of this. But I think it's 00:04:08.080 | 
fun to get situated in the amazingness of this stuff. So all humans have language. And 00:04:14.760 | 
no other animal communication is similar. It's this thing which is incredibly easy for 00:04:20.000 | 
any baby to pick up in any situation. And it's just this remarkably complex system. 00:04:26.940 | 
Very famously linguists like to talk about the case of Nicaraguan sign language because 00:04:33.840 | 
it kind of emerged while people were watching, in a great way. After the Sandinista 00:04:42.840 | 
Revolution, there was a kind of large public education effort in Nicaragua and they 00:04:51.160 | 
made a school for deaf children. And there was no central Nicaraguan sign language. People 00:04:58.980 | 
had like isolated language. And then you see this like full language emerge in the school 00:05:03.440 | 
very autonomously, very naturally. I hope this is common knowledge. Maybe it's not. 00:05:09.140 | 
Sign languages are like full languages with like morphology and things like pronouns and 00:05:14.800 | 
tenses and like all the things. It's not like how I would talk to you across the room. Yeah. 00:05:20.480 | 
And so, and what's cool about language is that it can be manipulated to say infinite 00:05:24.000 | 
things. And the brain is finite. So it seems we have some kind of set of rules that we 00:05:29.720 | 
tend to be able to pick up from hearing them as a baby, and then we're able to say infinite 00:05:34.840 | 
things. And we can manipulate these rules to really say anything, right? We can talk 00:05:38.960 | 
about things that don't exist, things that can't exist. This is very different from like 00:05:42.560 | 
the kind of animal communication we see, like a squirrel alarm call or something, 00:05:47.160 | 
you know, it's like, watch out, there's a cat. Things that are like totally abstract, 00:05:53.520 | 
you know, that have like no grounding in anything. We can express like subtle differences between 00:05:58.640 | 
similar things. Whenever I'm thinking about this point, 00:06:03.880 | 
about this feature of language, I think of the Stack Exchange Worldbuilding 00:06:07.080 | 
site. I don't know if you've ever looked at the sidebar, where there's this 00:06:11.360 | 
thing where science fiction authors kind of pitch their ideas for their science 00:06:15.200 | 
fiction world. And it's the wackiest, you can really create any world 00:06:19.480 | 
with English, with the language that we're given. It's amazing. And so there's 00:06:25.600 | 
structure underlying language, right? This is, I said recap here, cause we've done like 00:06:28.800 | 
the dependency parsing lectures. We thought about this, right? But you know, if we have 00:06:33.480 | 
some sentence like, you know, Isabel broke the window, the window was broken by 00:06:37.040 | 
Isabel, right? We have these two sentences and there's some kind of relation between them. And 00:06:40.760 | 
then we have another two sentences and they have a similar relation between 00:06:44.800 | 
them, right? This kind of passive alternation is something which exists for both 00:06:47.680 | 
of these pairs of sentences, you know, and then we can even use made-up words and 00:06:53.360 | 
you can still see that it's a passive alternation, right? And so it seems that we have some knowledge 00:06:57.060 | 
of structure that's separate from, from the words we use and the things we say that's 00:07:00.640 | 
kind of above it. And then what's interesting about structure is that it dictates how we 00:07:05.960 | 
can use language, right? So, you know, if I have a sentence like the cat sat on the 00:07:10.720 | 
mat, and then someone tells you, well, 00:07:16.280 | 
if you make a tree for it, it's going to look like this, according to 00:07:18.800 | 
my type of tree theory, you would say, well, why should I care about that? And the reason 00:07:24.600 | 
that this stuff is relevant is because it kind of influences what you could do, right? 00:07:30.720 | 
So any subtree (in this specific case any subtree, in other 00:07:34.800 | 
cases many subtrees) can be replaced with one item, right? So 00:07:39.280 | 
it's like, he sat on the mat, or he sat on it, or he sat there, right? Or he did so. Did 00:07:44.520 | 
so is two words, but you know, there's a lot of ink spilled over do in English, especially 00:07:50.240 | 
in like early linguistic teaching. So we're not going to spill any ink. It's kind of like 00:07:53.400 | 
one word. But then when something is not a subtree, you can't really replace it 00:07:58.200 | 
with one thing, right? So you can't express the cat sat with one item and 00:08:02.080 | 
have the mat as a different thing, right? You could say, he did so on the 00:08:06.000 | 
mat, right? You'd have to do two things. And one way you could think 00:08:09.880 | 
about this is that, well, it's not a subtree, right? You kind of 00:08:13.640 | 
have to go up a level to do this. And so you can't really separate the 00:08:18.960 | 
cat from on the mat in this way. And we implicitly know so many complex rules 00:08:25.220 | 
about structure, right? We're processing these streams of sound or streams 00:08:29.560 | 
of letters all the time. And yet the ways that we use them show 00:08:34.600 | 
that we have all these complex ideas, like the tree I just showed. For 00:08:38.960 | 
example, I'm just going to give some examples for a taste 00:08:42.960 | 
of the kinds of things people are thinking about now, but there are so many, right? 00:08:48.120 | 
So, what can we pull out to make a question, right? If we form a question, 00:08:53.280 | 
we form it by referring to some part of, you know, there might 00:08:58.280 | 
be another sentence which is the statement version, right? And we've kind of 00:09:03.280 | 
pulled out some part to make the question. They're not necessarily 00:09:06.840 | 
fully related, but you know, so if we say Leon is a doctor, we can pull that 00:09:10.440 | 
out to make a question, right? Like what is Leon? And if we have, my cat likes tuna, 00:09:14.880 | 
we could pull that out. What does my cat like? Again, ignore the do. If we have something 00:09:19.680 | 
like Leon is a doctor and an activist, we actually can't pull out this last thing, 00:09:24.800 | 
right? So if something's conjoined with an and, 00:09:28.560 | 
it can't be taken out of that and, right? You could only say, 00:09:32.800 | 
what is Leon? And you could answer, oh, a doctor and an activist. But you can't really say, 00:09:35.880 | 
what is Leon a doctor and? This is not how question formation works. And you know, 00:09:39.480 | 
this is something that we all know. I don't think it's something that any of 00:09:42.360 | 
us have been taught, right? Even people who've been taught English as a second language. 00:09:45.200 | 
I don't think this is something which is ever really taught explicitly, 00:09:50.520 | 
right? But most of us probably know this very well. Another such rule, right? 00:09:58.760 | 
is, when can we kind of shuffle things around, right? So if 00:10:04.400 | 
we have something like I dictated the letter to my secretary, right? We can make a 00:10:10.040 | 
longer version of that, right? I dictated the letter that I had been procrastinating 00:10:13.040 | 
writing for weeks and weeks to my secretary. This character is both a grad student 00:10:18.160 | 
and a high-ranking executive. And then we can move 00:10:25.840 | 
that long thing to the end, right? So it's like, I dictated to my secretary the 00:10:28.760 | 
letter that I'd been procrastinating writing for weeks and weeks. And that's fine. 00:10:31.600 | 
You know, maybe it's slightly awkwardly phrased, but I think this, 00:10:37.200 | 
for me at least (everyone varies, right?), could appear in natural productive speech. 00:10:42.000 | 
But then something like this is much worse, right? So somehow the fact that it 00:10:46.280 | 
becomes weighty is good and we can move it to the end. But when it doesn't become weighty, 00:10:50.800 | 
we can't, right? This sounds kind of more Yoda-y than real language. 00:10:56.080 | 
And so we have this rule, and this one's not that easy to explain, 00:11:01.840 | 
actually. People have tried many ways to make sense of this in linguistics. 00:11:06.280 | 
But it's a thing we all know, right? And so when I say rules 00:11:10.440 | 
of grammar, these are not the kind of rules that we're usually taught as rules of grammar, 00:11:13.760 | 
right? So a community of speakers, you know, for example, standard American English 00:11:19.400 | 
speakers, they share this rough consensus of the implicit rules they all have. 00:11:23.480 | 
These are not exactly the same, you know, people have gradations and disagree on things. 00:11:27.880 | 
And then a grammar is an attempt to describe all these rules, 00:11:32.960 | 
right? And linguists might write out a big thing called, 00:11:36.600 | 
you know, the grammar of the English language, where they're trying to describe 00:11:40.440 | 
all of them. It's really not going to be large enough ever. This is 00:11:45.400 | 
a really hefty book and it's still not describing all of them, right? Language 00:11:49.320 | 
is so complex. But so what, so what we were told as rules of grammar, you know, these 00:11:54.080 | 
kind of like prescriptive rules where they tell us what we can and can't do, you know, 00:11:57.880 | 
they often have other purposes than describing the English language, right? So for example, 00:12:02.040 | 
when they've told us things like, oh, you should never start a sentence with and, you 00:12:05.800 | 
know, that's like not true. You know, we start sentences with and all the time in English 00:12:08.720 | 
and it's fine. You know, there's probably some 00:12:15.840 | 
reason that they're saying this, right? Especially if you're trying to teach 00:12:18.160 | 
a high schooler to write, you know, when you want them to focus 00:12:21.160 | 
their thoughts, you probably don't want them to be like, oh, and this, oh, and this again. 00:12:23.600 | 
And so you tell them, oh, rule of writing, 00:12:26.960 | 
you know, is you can never start a sentence with and, right? And when they say something 00:12:30.640 | 
like, oh, it's incorrect to say, I don't want nothing, this is bad grammar, you know, 00:12:35.480 | 
well, in standard American English, you probably wouldn't have 00:12:39.720 | 
nothing there, right? You would have anything, right? But in many dialects 00:12:45.880 | 
of English, and in many languages across the world, when you have a negation, right, 00:12:49.240 | 
like the not in don't, then everything it scopes over also has to be negated 00:12:55.360 | 
or has to agree. And many dialects of English are like this. And so what they're really 00:12:58.940 | 
telling you is, you know, the dialect with the most power in the United States doesn't 00:13:02.200 | 
do negation this way. And so you shouldn't either in school. Right. And so, 00:13:08.640 | 
you know, the way that we can maybe define grammaticality, right, rather than 00:13:12.000 | 
what they tell us is wrong or right, is that, you know, if we choose a community of 00:13:15.160 | 
speakers to look into, they share this rough consensus of their implicit rules. And so 00:13:19.480 | 
the utterances that we can generate from these rules, you know, are grammatical, roughly, 00:13:25.480 | 
you know, everyone has these gradations of what they can accept. And if we can't produce 00:13:29.640 | 
an utterance using these rules, you know, it's ungrammatical. And 00:13:33.120 | 
this is the descriptive way of thinking about grammar, where we're thinking 00:13:36.820 | 
about what people actually say and what people actually like and don't like. And so for an 00:13:41.480 | 
example, you know, in English, largely, we have a pretty strict rule that 00:13:45.720 | 
the subject, the verb and the object appear in this SVO order. There's exceptions 00:13:49.760 | 
to this, like there's exceptions to everything, right? Especially things like "says I" in some 00:13:52.520 | 
dialects, but you know, largely, if something is before the verb, it's a subject, if 00:13:56.760 | 
something is after the verb, it's an object, and you can't move that around too much. And, 00:14:01.880 | 
you know, we also have these subject pronouns, you know, like I, she, he, they, that have 00:14:05.840 | 
to be the subject and these object pronouns, you know, me, her, him, them, that have 00:14:09.600 | 
to be the object. And so if we follow these rules, we get a 00:14:15.080 | 
sentence that we think is good, right? Like, I love her. And if we don't, then we get a 00:14:18.720 | 
sentence that we think is ungrammatical, right? Something like me love she, it's like, 00:14:21.960 | 
we don't know who is who, you know, who is doing the loving and who is being 00:14:25.760 | 
loved in this one, right? It doesn't exactly parse. And this is also 00:14:31.280 | 
true, you know, even when there's no ambiguity, this continues to be true, right? 00:14:36.280 | 
So for a sentence like me a cupcake ate, where the meaning is perfectly clear, 00:14:41.460 | 
our rules of grammaticality don't seem to cut as much slack, right? We're like, 00:14:44.520 | 
oh, this is wrong. I understand what you mean, but in my head, I know it's not, you 00:14:48.760 | 
know, correct, and not by the prescriptive notion of what I think is correct, you know, 00:14:52.680 | 
but by the descriptive notion, I just don't like it. Right. And 00:14:59.320 | 
also, you know, sentences can be grammatical without any meaning. So you 00:15:02.880 | 
can have meaning without grammaticality, right? Like me a cupcake ate. And you can 00:15:06.600 | 
also have, it's the classic example from Chomsky in 1957. I introduced it earlier, 00:15:16.480 | 
but yeah, classically from 1957, you know, colorless green ideas sleep furiously, 00:15:21.080 | 
right? This has no meaning, because you can't really make any sense out of this 00:15:25.560 | 
sentence as a whole, but you know it's grammatical. And you know it's grammatical, 00:15:28.520 | 
right, because you can make an ungrammatical version of it, right? Like colorless green 00:15:31.840 | 
ideas sleeps furious, right? Which doesn't work, because there's no agreement, even though 00:15:35.400 | 
you don't have any meaning for any of this. And then lastly, you know, people don't fully 00:15:42.520 | 
agree. You know, everyone has their own idiolect, right? People usually speak more 00:15:46.440 | 
than one dialect and they kind of move between them and they have a mixture, and those have 00:15:49.240 | 
their own way of thinking of things. People also have different 00:15:52.600 | 
opinions at the margins. Some people like some things more, others don't, right? So 00:15:57.080 | 
an example of this is, not everyone is as strict for some WH constraints, right? 00:16:01.400 | 
So if you're trying to pull out something like, I saw who Emma doubted reports that we had 00:16:05.320 | 
captured in the nationwide FBI manhunt, this is from a paper by Hofmeister and Ivan Sag 00:16:10.200 | 
from Stanford. Some people like it, some people don't, you know. 00:16:14.480 | 
Some people can clearly see it as, oh, it's the who that we had captured 00:16:17.800 | 
and Emma doubted the reports that we had captured. You know, and some people are like, this is 00:16:21.560 | 
as bad as, what is Leon a doctor and, I don't like it. Right. So yeah, so that's 00:16:30.040 | 
grammaticality. And the question is, why do you even need this? Right. It's like, 00:16:35.000 | 
we accept these useless utterances and we block out these perfectly communicative 00:16:39.400 | 
utterances. Right. And I started off saying that this is a fundamental 00:16:43.520 | 
facet of human intelligence. It seems kind of, you know, a strange thing to have. 00:16:49.040 | 
And so I think one thing I keep returning to when I think about linguistics is that 00:16:53.840 | 
a basic fact about language is that we can say anything, right. Really, 00:16:58.040 | 
every language, you know, can express anything, you know, and if there's no word 00:17:01.080 | 
for something, people will develop it if they want to talk about it. Right. And so if we 00:17:05.760 | 
ignore the rules because we know what's probably intended, right, you know, then we 00:17:10.520 | 
would be limiting possibilities. Right. So in my kitchen horror novel, where the ingredients 00:17:13.800 | 
become sentient, I want to say the onion chopped the chef. And if people just assumed 00:17:18.800 | 
I meant the chef chopped the onion because SVO order doesn't really matter, then 00:17:24.040 | 
I can't say that. So then, yeah, to conclude, you know, a fact 00:17:33.360 | 
about language that's very cool is that it's compositional, right. We have 00:17:38.400 | 
the set of rules that defines grammaticality, and then this lexicon, 00:17:42.960 | 
right, this dictionary of words that relate to the world we want to talk about. 00:17:46.080 | 
And we kind of combine them in these limitless ways to say anything we want to say. Cool. 00:17:50.720 | 
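
To make the "rules plus lexicon" picture concrete, here is a minimal Python sketch of a toy grammar. The word lists and the function names (`is_grammatical`, `generate`) are illustrative assumptions, not anything from the lecture, and a real grammar would be vastly larger. The sketch encodes only the two English constraints discussed above, SVO order and subject versus object pronoun forms, and deliberately ignores agreement.

```python
import itertools

# Hypothetical toy lexicon: words grouped by the slot they are allowed to fill.
SUBJECTS = {"I", "she", "he", "they", "the chef", "the onion"}
OBJECTS = {"me", "her", "him", "them", "the chef", "the onion"}
VERBS = {"love", "chopped", "saw"}

def is_grammatical(subject, verb, obj):
    """Accept only subject-verb-object triples licensed by the toy rules."""
    return subject in SUBJECTS and verb in VERBS and obj in OBJECTS

def generate():
    """Enumerate every sentence the toy rules and lexicon can produce."""
    for s, v, o in itertools.product(sorted(SUBJECTS), sorted(VERBS), sorted(OBJECTS)):
        yield f"{s} {v} {o}"

print(is_grammatical("I", "love", "her"))                  # True
print(is_grammatical("me", "love", "she"))                 # False: wrong pronoun forms
print(is_grammatical("the onion", "chopped", "the chef"))  # True: odd meaning, fine structure
print(len(list(generate())))                               # 6 * 3 * 6 = 108
```

Because the rules never look at meaning, "the onion chopped the chef" is licensed just like "the chef chopped the onion", which is the point of the kitchen horror novel example: a small set of rules and words already yields many distinct sentences, including ones nobody expected to need.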
Any questions about all this? I've tried to bring a lot of linguistic fun facts 00:17:54.360 | 
top of mind for this lecture. So hopefully I have answers for things you want 00:18:01.760 | 
to know. Cool. Cool. Yeah. Cool. So now, you know, that was a nice foray into 00:18:11.000 | 
a lot of sixties linguistics. You know, how does that relate to us 00:18:16.280 | 
today, right, in NLP? And so we said that in humans, you know, we can think about 00:18:23.800 | 
language as, there's a system for producing language, you know, that can be 00:18:27.320 | 
described by these discrete rules, you know, so it's smaller than all the 00:18:32.000 | 
things that we can say. There's these rules that we can put together 00:18:34.680 | 
to say things. And so do NLP systems work like that? And one answer is, well, 00:18:40.160 | 
they definitely used to, right? Because as I said in the beginning, before self-supervised 00:18:44.320 | 
learning, the way to approach doing NLP was through understanding the human language system, 00:18:50.760 | 
right? And then trying to imitate it, trying to see, you know, if you think really, really 00:18:53.520 | 
hard about how humans do something, then you kind of code up a computer to do it. 00:18:58.400 | 
Right. And so for one example, you know, parsing used to be super important 00:19:03.160 | 
in NLP. Right. And this is because, you know, as an example, if I want 00:19:09.120 | 
my sentiment analysis system to classify a movie review correctly, right, something like, 00:19:14.120 | 
my uncultured roommate hated this movie, but I absolutely loved it. Right. How 00:19:19.160 | 
would we do this before we had ChatGPT? You know, we might have 00:19:24.560 | 
some semantic representation of words like hate and uncultured, you know, it's not looking 00:19:27.720 | 
good for the movie, but you know, how does everything relate? Well, you know, 00:19:33.120 | 
we might ask how a human would structure this, you know, so many linguists, you 00:19:36.680 | 
know, there's many theories of how syntax might work, but they 00:19:40.920 | 
would tell you something like this. So it's like, okay, now I'm interested 00:19:44.320 | 
in the I, right. Because that's probably what the review relates to. There's 00:19:48.240 | 
this worrying stuff about uncultured and hated, but it seems like those are related syntactically 00:19:52.720 | 
together, right? It's the roommate that hated, and that can't really connect to the I, right. 00:19:57.120 | 
So the I can't really be related to the hated, right. Because they're kind of separated. 00:20:03.200 | 
They're separate subtrees, separated by this conjunction, by this "but" relation. 00:20:10.280 | 
And so it seems that I goes with loved, which is looking good for the movie, 00:20:15.280 | 
you know, we have loved it. And so then we have to move beyond the rules of syntax, 00:20:19.520 | 
right, to the rules of discourse. How would this kind of, you know, what could 00:20:24.400 | 
"it" mean? You know, and there's a bunch of rules of discourse. Now, if you say "it", 00:20:27.120 | 
you're probably referring to the latest salient thing that, you know, matches, 00:20:31.760 | 
and, you know, "it" is probably non-sentient, right. And so, you know, in this case it would 00:20:36.240 | 
be movie, right. So then, you know, linguistic theory 00:20:44.680 | 
helped NLP reverse engineer language. So you had something like: input, you know, 00:20:49.600 | 
you'd get syntax, you'd get semantics from the syntax, right. So you would take 00:20:55.040 | 
the tree and then from the tree kind of build up all these little, you know, 00:21:00.400 | 
you can build up these little functions of how things 00:21:04.840 | 
relate to each other. And then you'd go to discourse, right. So what 00:21:10.320 | 
refers to what, what nouns are being talked about, what things are being talked 00:21:14.760 | 
about, you know, and then whatever else was interesting for your specific use 00:21:24.880 | 
case. Now we don't need all that, right. Language models just seem to catch on to a lot of these 00:21:29.640 | 
things, right. So this whole thing that I did with the tree is something ChatGPT 00:21:34.840 | 
does, and it does much harder things than this, right. This isn't even 00:21:38.160 | 
slightly prompt engineered. I just woke up one morning, I was like, oh, there's another 00:21:41.200 | 
lecture, going to put that into ChatGPT. And this exactly, you know, I didn't even get 00:21:44.920 | 
some like, yeah, stop. Well, I guess I got a bit of moralizing, but it just 00:21:51.600 | 
immediately told me, you know, who likes it, who doesn't 00:21:55.040 | 
like it, and why I'm doing something slightly wrong, which is how it ends everything, right. 00:22:04.360 | 
And so, you know, NLP systems definitely used to (this is where we were) work in this 00:22:11.720 | 
kind of structured, discrete way. But now NLP works better than it ever has before. 00:22:16.320 | 
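
As a caricature of that older structured, discrete style, here is a minimal Python sketch of the movie review example. Everything in it is a hypothetical stand-in: the three-word polarity lexicon, the function names, and especially the "parser", which just splits on the conjunction "but" and treats a clause starting with "I" as the reviewer's own opinion. A real system of that era would have used a full syntactic parse, and the discourse step (resolving "it" to the movie) is left out entirely.

```python
# Hypothetical polarity lexicon: +1 for positive words, -1 for negative ones.
POLARITY = {"hated": -1, "uncultured": -1, "loved": 1}

def score(words):
    """Sum the polarity of every word we have an entry for."""
    return sum(POLARITY.get(w.lower(), 0) for w in words)

def bag_of_words_sentiment(review):
    """Ignore structure: add up the polarity of all words in the review."""
    return score(review.replace(",", " ").replace(".", " ").split())

def reviewer_sentiment(review):
    """Use (crude) structure: only count clauses whose subject is 'I'."""
    clauses = [c.strip(" ,.") for c in review.split(" but ")]
    own = [c for c in clauses if c.split()[:1] == ["I"]]
    return sum(score(c.split()) for c in own)

review = "My uncultured roommate hated this movie, but I absolutely loved it."
print(bag_of_words_sentiment(review))  # -1: "uncultured" and "hated" outweigh "loved"
print(reviewer_sentiment(review))      # 1: only "I absolutely loved it" is counted
```

The contrast between the two numbers is the reason parsing mattered: without knowing which words sit in which subtree, the negative words about the roommate get attributed to the reviewer.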
And we're not constraining our systems to know any syntax, right. So what about 00:22:21.120 | 
structure in modern language models? And so the question that 00:22:29.440 | 
a lot of analysis work has been focused on, and I think we'll 00:22:33.520 | 
have more analysis lectures later also, so this is going to be, you know, looked at in 00:22:37.520 | 
more detail, right, is how could you get from training data, you know, which is just kind 00:22:41.680 | 
of a loose set of things that have appeared on the internet, or sometimes, rarely, not 00:22:45.560 | 
on the internet, right, to rules about language, right, to the idea that 00:22:51.200 | 
there's this structure underlying language that we all seem to know, even though we do 00:22:54.220 | 
just talk in streams of things that then sometimes appear on the internet. And one way to think 00:23:00.120 | 
about this is testing, you know, how novel words and old structures work, right. 00:23:07.480 | 
So humans can easily integrate new words into our old syntactic structures. I remember 00:23:12.520 | 
I had lived in Greece for a few years for middle school, not speaking English 00:23:16.240 | 
too much. And I came back for high school and, yeah, this was in Berkeley 00:23:24.360 | 
in the East Bay. And there were literally like 10 new vocabulary words 00:23:28.240 | 
I'd never heard of before. And they all had a very similar role to dank 00:23:31.800 | 
or sick, you know, but they were the ones that were being tested out and did 00:23:34.160 | 
not pass. And within, you know, one day I immediately knew how to use all of them, 00:23:39.680 | 
right. It was not a hard thing for me. I didn't have to get a 00:23:43.200 | 
bunch of training data about how to use, you know, all these words. Right. And so this 00:23:49.000 | 
is one way of arguing for, you know, the thing I was arguing for 00:23:53.680 | 
the whole first part of the lecture, that syntactic structures exist independently 00:23:58.120 | 
of the words that they have appeared with. Right. A famous example of this is 00:24:04.200 | 
Lewis Carroll's poem, Jabberwocky. Right. I was going to quote from it, but I can't 00:24:07.400 | 
actually see it there. Right. Where, you know, he just made 00:24:12.080 | 
up a bunch of new words and he just made this poem, which is all new open class words. Open 00:24:16.440 | 
class words are what we call, you know, nouns, verbs, adjectives, adverbs, 00:24:21.760 | 
classes of words that we add new things to all the time, while things like conjunctions, 00:24:27.880 | 
you know, like and, or, but, are closed class. Oh, there's been a new conjunction 00:24:32.240 | 
added recently. I just remembered after I said that. Does anyone know of a conjunction 00:24:36.720 | 
that's been added in the past 30 years or something, maybe 40? Spoken slash, like now we say slash 00:24:42.080 | 
and it kind of has a meaning that's not and, or, or but. It's 00:24:46.160 | 
a new one, but it's closed class generally. This happens rarely. Anyway. And 00:24:50.440 | 
so, you know, you, you, you have like twas brillig and the slithy toves, did gyre and 00:24:55.000 | 
gimble and the wave, right? Toves is a noun. We all know that we've never heard it before. 00:24:59.000 | 
And in fact, you know, one word from Jabberwocky, chortle, actually entered 00:25:03.560 | 
the English vocabulary, right? It kind of means like a little chuckle that's 00:25:07.120 | 
maybe slightly suppressed or something. Right? So it shows like, you know, there 00:25:11.080 | 
was one, literally like one example of this word and then people picked it up and started 00:25:15.160 | 
using it as if it was a real word. Right? And so one way of asking, do language 00:25:22.600 | 
models have structure, is, do they have this ability? And, you know, I was thinking 00:25:27.760 | 
it would be cool to go over a benchmark about this. Right? So 00:25:31.080 | 
people make things where you could test your language model to see if 00:25:34.560 | 
it does this. Yeah. Are there any questions so far, before I go into this new benchmark? 00:25:44.920 | 
Cool. So yeah, the COGS benchmark, the compositional generalization from semantics 00:25:52.560 | 
benchmark or something. Right. It kind of checks if language models can 00:26:00.560 | 
do new word-structure combinations. Right? So the task at hand is semantic interpretation. 00:26:07.160 | 
This is, I kind of glossed over it before, but it's like if you have a sentence, 00:26:11.440 | 
right, like the girl saw the hedgehog, you have this idea that, 00:26:15.480 | 
like, saw is a function that takes in two arguments and it outputs that the first 00:26:20.240 | 
one saw the second one. You know, this is 00:26:24.400 | 
one way of thinking about semantics. There's many more as we'll see, but you know, this 00:26:27.400 | 
is one. And so you can make a little kind of lambda expression 00:26:33.080 | 
about, you know, what the sentence means, and to get that you 00:26:39.640 | 
kind of have to use the tree to get it correct. But anyway, the specific 00:26:46.640 | 
mechanism of this is not very important, but it's just the semantic interpretation 00:26:49.400 | 
where you take the girl saw the hedgehog and you output this function 00:26:52.080 | 
of like, you know, see takes two arguments, you know, first is the girl, second is the 00:26:56.440 | 
hedgehog. And then the training and the test set have distinct words and 00:27:01.460 | 
structures in different roles. Right. So for example, you know, you have 00:27:07.200 | 
things like Paula, right, or the hedgehog is like always an object in the training 00:27:13.240 | 
data, so when you're fine-tuning to do this task. But then in the test data, it's a subject, 00:27:17.880 | 
right. So it's like, can you use this word that you've 00:27:24.680 | 
seen, you know, in a new place? Because in English, anything 00:27:28.880 | 
that's an object can be a subject, you know, there's 00:27:34.000 | 
some subtlety around like some things are more likely to be subjects, but yeah. And 00:27:37.960 | 
then similarly, you know, if you have something like the cat on the mat, you know, 00:27:42.160 | 
so this idea that like a noun can go with a prepositional 00:27:48.440 | 
phrase, right. But that's always in object position, right. Like Emma saw the cat 00:27:51.920 | 
on the mat. And then, can you do something like, you know, the cat on the mat 00:27:56.320 | 
saw Mary, right. So it's like move that kind of structure to subject position, which is 00:27:59.880 | 
something that in English we can do, right. Like any type of noun phrase that can be in 00:28:04.560 | 
an object position can be in subject position. And so that's the COGS 00:28:08.960 | 
benchmark, you know, large language models haven't aced this yet. I wrote this and like 00:28:13.120 | 
I was looking over this slide and I was like, well, they haven't checked the largest ones. 00:28:16.440 | 
You know, they never do check the largest ones because it's really hard to do this 00:28:20.400 | 
kind of analysis work, you know, and things move so fast with 00:28:24.680 | 
the really large ones. But you know, T5, 3 billion, you know, 3 billion is like a large 00:28:29.000 | 
number. It's maybe not a large language model anymore. But, you know, they don't ace this, 00:28:33.920 | 
right. They're getting like 80%, while when they don't have to do the structural 00:28:37.720 | 
generalization, when they can just do a test set in which 00:28:42.000 | 
things appear in the same role as in the training set, they get like 100%, easy. It's not a very 00:28:45.560 | 
hard task. And so, you know, 80% is still pretty good, you know, and it's 00:28:50.880 | 
probably like if a human had never ever seen something in subject position, I'm not sure 00:28:55.440 | 
that it would be 100% as easy as if they had, you know. I think that, you know, 00:29:00.000 | 
we don't want to fully idealize how things work in humans, right. 00:29:05.400 | 
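
The train/test split described for COGS can be pictured with a small sketch. This is only an illustration under stated assumptions: the `verb(subject, object)` string is a simplified stand-in for the benchmark's real logical forms, and the helper names (`make_example`, `roles_seen`, `needs_generalization`, `exact_match`) are hypothetical, not part of any COGS release.

```python
from collections import defaultdict


def make_example(subject, verb, obj):
    """Pair a toy sentence with a simplified 'verb(subject, object)' meaning."""
    return {
        "sentence": f"the {subject} {verb} the {obj}",
        "meaning": f"{verb}({subject}, {obj})",
        "subject": subject,
        "object": obj,
    }


# In training, "hedgehog" only ever shows up as an object.
train = [
    make_example("girl", "saw", "hedgehog"),
    make_example("cat", "chased", "hedgehog"),
    make_example("girl", "chased", "cat"),
]
# At test time, "hedgehog" is moved into subject position.
test = [make_example("hedgehog", "saw", "cat")]


def roles_seen(examples):
    """Map each noun to the set of grammatical roles it appeared in."""
    roles = defaultdict(set)
    for ex in examples:
        roles[ex["subject"]].add("subject")
        roles[ex["object"]].add("object")
    return roles


def needs_generalization(example, train_roles):
    """True if some noun is used in a role it never had in training."""
    return (
        "subject" not in train_roles[example["subject"]]
        or "object" not in train_roles[example["object"]]
    )


def exact_match(predictions, golds):
    """Fraction of predicted meanings that equal the gold meaning exactly."""
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)


train_roles = roles_seen(train)
print([needs_generalization(ex, train_roles) for ex in test])
```

A model is then scored with something like `exact_match` separately on test items where `needs_generalization` is true and on items where it is false, which is the comparison between the roughly 80% and roughly 100% numbers mentioned above.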
So similarly, you can take literal Jabberwocky sentences, right. So, building 00:29:12.280 | 
on some work that John did that I'm sure you'll talk about later, so I'm not going 00:29:15.160 | 
to go in, but maybe I'm wrong on that assumption, right, we can kind of test the model's 00:29:19.800 | 
embedding space, right. So if we go high up in the layers and test the embedding space, 00:29:24.180 | 
we can test it to see if it encodes structural information, right. And so we can test 00:29:28.760 | 
to see, okay, is there a rough representation of syntactic tree relations 00:29:35.920 | 
in this latent space. And then, yeah, a recent paper asked, does 00:29:44.560 | 
this work when we introduce new words, right. So if we take 00:29:49.320 | 
Jabberwocky-style sentences and then ask, can the model find the 00:29:54.400 | 
trees for these in its latent space, does it encode them? And the 00:29:59.760 | 
answer is, you know, it's kind of worse. You know, in this graph, the 00:30:03.680 | 
hatched bars, so the ones on the right, are the Jabberwocky sentences, and the 00:30:09.040 | 
clear ones, or the not hatched ones, I guess, are 00:30:13.840 | 
the normal sentences. And we see, you know, performance is worse, you know, so this is 00:30:17.680 | 
unlabeled attachment score on the y-axis. It, you know, performs probably 00:30:21.400 | 
worse than humans, right? It's easier to read a normal poem than to read Jabberwocky. So, 00:30:24.840 | 
you know, the extent to which this is damning or something, you know, is, I think, 00:30:28.600 | 
very, very small. I have the paper linked there, but, you know, I think 00:30:32.200 | 
the paper is maybe a bit more sure about this being a big deal than 00:30:37.720 | 
it is. But yeah, you know, it does show that this kind of process 00:30:45.400 | 
What are the words that, like, apply for Jabberwocky substitutions? 00:30:50.400 | 
Oh, so this is, um, something called phonotactics, right? So 00:30:56.400 | 
I think this is probably kind of what you're asking, that it's like, 00:31:00.400 | 
you want a word which sounds like it could be in English, right? Like provocated, 00:31:05.400 | 
right? It sounds like it could be in English. You know, a classic example is, you know, 00:31:09.400 | 
blick could be an English word, you know, bnick can't, right? We can't start 00:31:12.400 | 
And that's not an impossibility of the mouth. 00:31:15.660 | 
It's similar things like pterodactyl, pneumonia. 00:31:19.620 | 
These come from Greek words like pneumonas and ptero. 00:31:26.980 | 
Like PN and PT, I can put them at the beginning of a syllable. 00:31:33.280 | 
And so if you follow these rules and also add the correct suffixes and stuff. 00:31:39.520 | 
So like provocated we know is like past tense and stuff. 00:31:43.460 | 
Then you can make kind of words that don't exist but could exist. 00:31:54.080 | 
You don't want to do something totally wacky to test the models. 00:32:01.540 | 
So when you generate this test set with these Jabberwocky substitutions, 00:32:06.860 | 
are these words generated by a computer or is there a human coming up with words that sound like English? 00:32:17.820 | 
And I think they get theirs from some list of them. 00:32:21.420 | 
Because if you have 200, that's enough to run this test. 00:32:28.380 | 
I mean, I think that the phonotactic rules of English can be actually laid out kind of simply. 00:32:33.940 | 
It's like you can't really have two stops together. 00:32:43.100 | 
You can probably make a short program or a long-ish program, but not a very super complex one 00:32:53.000 | 
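
As a rough idea of what such a short program could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the onset, nucleus, and coda lists are tiny hand-picked samples rather than a real description of English phonotactics, it works on spelling rather than sounds, and it is not how the paper's word list was built.

```python
import random

VOWELS = set("aeiou")
# Assumed, far-from-complete list of consonant clusters that can start an English word.
LEGAL_ONSETS = [
    "b", "bl", "br", "d", "dr", "f", "fl", "g", "gl", "gr", "k",
    "m", "n", "p", "pl", "pr", "s", "sl", "st", "t", "tr", "v",
]
NUCLEI = ["a", "e", "i", "o", "u"]
CODAS = ["", "ck", "m", "n", "nd", "st", "t"]


def leading_cluster(word):
    """Return the consonant letters before the first vowel letter."""
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[:i]


def has_legal_onset(word):
    """Check the word-initial cluster against the assumed onset list."""
    cluster = leading_cluster(word.lower())
    return cluster == "" or cluster in LEGAL_ONSETS


def make_nonce_word(rng, suffix=""):
    """Build onset + vowel + coda, then attach a real suffix such as 'ed'."""
    return rng.choice(LEGAL_ONSETS) + rng.choice(NUCLEI) + rng.choice(CODAS) + suffix


rng = random.Random(0)
print([make_nonce_word(rng, suffix="ed") for _ in range(5)])
print(has_legal_onset("blick"), has_legal_onset("bnick"), has_legal_onset("pterodactyl"))
```

The sketch does not check whether a generated string happens to be a real word, which an actual test set would need to filter out.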
So I'm wondering how the model would tokenize these jabberwocky sentences. 00:32:56.740 | 
Would it not just map all these words like provocated just to the unknown? 00:33:01.180 | 
So these are largely models that have WordPiece tokenizers. 00:33:08.260 | 
So if they don't know a word, they're like, OK, what's the largest bit of it that I know? 00:33:17.100 | 
It's like back in the day-- and this is like back in the day meaning until maybe like six or seven years ago, 00:33:22.060 | 
it was very normal to have UNK tokens, like unknown tokens. 00:33:24.460 | 
But now generally, there is no such thing as an unknown. 00:33:27.700 | 
At a bare minimum, you have the alphabet in your vocabulary. 00:33:33.380 | 
So at a bare minimum, you're splitting everything up into like letter by letter tokens, character by character tokens. 00:33:45.460 | 
yeah, it should find kind of like-- and this is why the phonotactic stuff is kind of important for this, right? 00:33:53.100 | 
That it's tokenized like hopefully in like slightly bigger chunks that have some meaning. 00:33:57.620 | 
And because of how attention works and how contextualization works, you can like-- 00:34:01.500 | 
even if you have like a little bit of a word, you can give the correct kind of attention to it 00:34:08.540 | 
once it figures out what's going on a few layers in for like a real unknown word. 00:34:27.220 | 
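
The "largest bit of it that I know" idea can be sketched as greedy longest-match-first splitting, which is the usual way WordPiece-style tokenization is applied at inference time. The vocabulary below is a made-up toy (a few pieces plus the lowercase alphabet), not any real model's vocabulary, and the `##` prefix for word-internal pieces follows the common BERT convention.

```python
import string


def build_toy_vocab():
    """Toy vocabulary: a few hypothetical pieces plus every lowercase letter."""
    vocab = {"pro", "##voc", "##ated", "##ed", "slith", "##y"}
    for ch in string.ascii_lowercase:
        vocab.add(ch)          # letter at the start of a word
        vocab.add("##" + ch)   # letter inside a word
    return vocab


def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of a single word into known pieces."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # Cannot happen for lowercase ASCII words, since letters are in the vocab.
            raise ValueError(f"no known piece for {word[start:]!r}")
        tokens.append(piece)
        start = end
    return tokens


vocab = build_toy_vocab()
for word in ["provocated", "slithy", "bnick"]:
    print(word, wordpiece_tokenize(word, vocab))
```

With this toy vocabulary, a word with familiar chunks comes out in a few larger pieces, while a word with nothing familiar falls back to single letters, so nothing maps to an unknown token.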
A few slides back, there was like 80% scores that you were saying these are not-- 00:34:36.940 | 
I'm just trying to get a sense of what 80% means in that context. 00:34:43.940 | 
I think the relevant comparison is that, when you didn't have this kind of structural difference, 00:34:53.820 | 
was like something which was like never an object was then an object. 00:34:56.620 | 
You know, the like the accuracy on that test set is like 100% like easy. 00:35:03.540 | 
And so it kind of-- there was no good graph which showed these next to each other. 00:35:09.180 | 
And so I think like that's like the relevant piece of information that like somehow this like swapping 00:35:16.860 | 
That being said, you're right, like exact match of semantic parses is kind of a hard metric. 00:35:22.940 | 
this is-- yeah, none of this stuff, and I think this is important. 00:35:25.900 | 
None of this stuff is like, they do not have the kind of rules humans have. 00:35:28.420 | 
This is also like, well, there's a bit of confusion. 00:35:34.820 | 
And I'm going to go into that in the next section too. 00:35:42.780 | 
Overall, like I think the results are like surprisingly not damning, I would say. 00:35:47.220 | 
there's like clearly like, you know, maybe not the fully like programmed discrete kind of rules. 00:35:56.420 | 
Another thing we could do, yeah, is test how syntactic structure kind of maps onto like meaning and role, right? 00:36:02.100 | 
And so as we said before, right, in English, the syntax of word order gives us the who-did-what-to-whom meaning. 00:36:09.460 | 
And so, you know, for any combination like A verb B, 00:36:14.700 | 
if I say something like A verb B, you know, A is the doer, B is the patient. 00:36:19.220 | 
And so we ask, is this kind of relationship, you know, as strictly represented in English language models as it is in the English language? 00:36:29.420 | 
And so what we could do is that we could take a bunch of things which like, you know, appear in subject position, 00:36:35.340 | 
a bunch of things which appear in object position and take their latent space representation and kind of learn, you know, 00:36:48.860 | 
learn a little classifier, you know, this should be a pretty clear distinction in latent space. 00:36:52.860 | 
In any good model, right, and these models are good, this should be a pretty clear distinction. 00:36:57.020 | 
We could just use a linear classifier to kind of separate them, right? 00:36:59.740 | 
And the more on the one side you are, you're more subject, the more on the other side you are, you're more object, right? 00:37:05.860 | 
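
A linear classifier of this kind can be sketched in a few lines. This is a toy under loud assumptions: the "hidden states" are synthetic Gaussian vectors standing in for real model representations, the probe is plain logistic regression trained by gradient descent, and none of the names or settings come from the paper being described.

```python
import math
import random


def make_toy_states(rng, n, dim, shift):
    """Synthetic stand-ins for hidden states: noise shifted along dimension 0."""
    return [
        [rng.gauss(0, 1) + (shift if j == 0 else 0.0) for j in range(dim)]
        for _ in range(n)
    ]


def train_linear_probe(xs, ys, epochs=200, lr=0.1):
    """Logistic regression by batch gradient descent; returns weights and bias."""
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def signed_distance(x, w, b):
    """Distance from the dividing line: positive = subject side, negative = object side."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def run_demo(seed=0, n=100, dim=8):
    rng = random.Random(seed)
    subjects = make_toy_states(rng, n, dim, shift=2.0)   # label 1
    objects = make_toy_states(rng, n, dim, shift=-2.0)   # label 0
    xs, ys = subjects + objects, [1] * n + [0] * n
    w, b = train_linear_probe(xs, ys)
    dists = [signed_distance(x, w, b) for x in xs]
    accuracy = sum((d > 0) == (y == 1) for d, y in zip(dists, ys)) / len(ys)
    return accuracy, sum(dists[:n]) / n, sum(dists[n:]) / n


accuracy, mean_subject, mean_object = run_demo()
print(accuracy, mean_subject, mean_object)
```

In a real experiment the vectors would be representations taken from a given layer of the model, and the probe would be evaluated on held-out words; the signed distance is the quantity that gets plotted layer by layer.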
And so then we can test, you know, does the model know the difference, you know, 00:37:14.260 | 
between when something is a subject and when something is an object, you know, 00:37:18.140 | 
does it know that you're going to go on opposite sides of this dividing line, you know, 00:37:26.340 | 
even if everything else stays the same and all the clues point to something else, right? 00:37:30.580 | 
So it's like does syntax map onto role in this way? 00:37:33.300 | 
You might think, well, I could just check if it's like second or like fifth, right? 00:37:37.100 | 
But, you know, yeah, this is a paper that I wrote, you know, we did compare, you know, 00:37:42.380 | 
we tried to control for position stuff in various ways. 00:37:48.180 | 
And so hopefully, we claim, we're kind of showing the syntax-to-role mapping. 00:37:56.500 | 
So if we kind of graph the distance from that dividing line, you know, on the y-axis, 00:38:04.620 | 
we see like the original subjects when we swap them and put them in object position, 00:38:11.260 | 
they do like diverge as we go up layers in that dimension. 00:38:15.700 | 
And we tried this, again, you know, all this analysis experiment is with some kind of small models, 00:38:18.580 | 
with some BERT, with some GPT-2, you know, with a bigger version of GPT-2, and it worked out. 00:38:22.380 | 
But, you know, none of this is like the big, big stuff. 00:38:30.420 | 
I think now we're starting to see more analysis on the big, big stuff. 00:38:36.780 | 
So then where are we with like structure and language models, right? 00:38:39.300 | 
We know that language models are not, they're not engineered around discrete linguistic rules. 00:38:45.100 | 
But the pre-training process, you know, it isn't just a bunch of surface level memorization, right? 00:38:49.700 | 
There is some kind of like discrete rule-based system kind of coming out of this. 00:38:56.420 | 
You know, maybe it's not the perfect kind of thing you would like write down in a syntax class, 00:39:00.500 | 
but, you know, there is some syntactic knowledge, you know, and it's complicated in various ways. 00:39:06.260 | 
And that's what we're going to get to next, right? 00:39:09.140 | 
There's no ground truth for how language works yet, right? 00:39:11.700 | 
Like if we knew how to fully describe English, right, with a bunch of good discrete rules, 00:39:17.020 | 
we would just like make an old pipeline system and it would be amazing, right? 00:39:21.500 | 
If we could like take the Cambridge grammar of English, but like it was truly, truly complete. 00:39:26.980 | 
If we just knew how English worked, we would do that. 00:39:29.540 | 
And so we're working on this case where there's really no ground truth. 00:39:42.260 | 
So moving beyond this kind of like very structure-based idea of language, 00:39:48.460 | 
I think it's very cool to learn about structure in this way. 00:39:51.140 | 
And like at least how I was taught linguistics, it was like a lot of it, 00:39:53.940 | 
the first like many semesters was like this kind of stuff. 00:39:59.740 | 
But I think there's so much more. 00:40:02.580 | 
And very importantly, I think that meaning plays a role in linguistic structure, right? 00:40:10.020 | 
Like there's a lot of rich information in words that affects the final way that the syntax works. 00:40:16.260 | 
And of course, what you end up meaning, and what the words influence each other to mean, right? 00:40:21.740 | 
And so like the semantics of words, right, the meaning, 00:40:24.380 | 
it's like always playing a role in forming and applying the rules of language, right? 00:40:28.580 | 
And so, you know, for example, like a classic example is like, you know, verbs, 00:40:31.900 | 
they like have kind of like selectional restriction, right? 00:40:34.260 | 
So like ate can like take kind of any food and it can also take nothing. 00:40:37.420 | 
It's like I ate, it means that I've just like I've eaten, right? 00:40:41.860 | 
The word devoured actually can't be used intransitively, right? 00:40:47.860 | 
There's verbs like elapsed that only take like, you know, a very certain type of noun, right? 00:40:52.820 | 
Like elapsed only takes nouns that refer to time, you know, 00:40:59.140 | 
so maybe like harvest can refer to time, moon can refer to time, somewhere, you know, 00:41:02.500 | 
but trees, it cannot take a noun like trees, right? 00:41:04.820 | 
There's even verbs that only ever take one specific noun as their argument, right? 00:41:09.580 | 
I think, yeah, my advisor Dan Jurafsky told me to put this one in. 00:41:16.980 | 
And what's cool is that that's how we train models these days. 00:41:20.460 | 
If you see this diagram I screenshotted from John's Transformers lecture, right? 00:41:28.740 | 
We start with these, on the order of like a thousand-dimensional, you know, 00:41:32.500 | 
depending on the model size, embeddings, right? 00:41:35.580 | 
Think of how much information you can express on a plane, right, 00:41:39.260 | 
on two dimensions. The kind of richness that you can fit into a thousand dimensions, you know, 00:41:43.300 | 
it's huge. And we start with these word embeddings and then move on, right, 00:41:48.460 | 
to the attention block and everything. 00:41:51.780 | 
And so, yeah, I'm just gonna go through some examples of the ways that- that languages, 00:41:57.060 | 
you know, the ways that like meaning kind of plays a role in forming syntax, 00:42:01.980 | 
hopefully it's like fun, a tour through like the cool things that happen in language, right? 00:42:06.900 | 
So, as we said, you know, anything can be an object, anything can be a subject, 00:42:11.260 | 
we want to be able to say anything, language can like express anything, 00:42:14.180 | 
this is like kind of a basic part of language. 00:42:16.300 | 
But, you know, many languages have a special syntactic way of dealing with this, right? 00:42:20.620 | 
So, they want to tell you if there's an object that you wouldn't expect, right? 00:42:24.220 | 
Like in this case, I want to tell you, hey, watch out, you know, be careful, 00:42:28.300 | 
we're dealing with a weird object here, right? 00:42:30.980 | 
So, this is kind of in the syntax of languages. You know, if you're 00:42:34.740 | 
a native speaker or you've learned Spanish, right? 00:42:38.660 | 
You know this "a" constraint, right? 00:42:40.700 | 
So, if something is an object but it's inanimate, 00:42:47.260 | 
you don't need the "a" because you're like, yeah, I found a problem. 00:42:49.700 | 
But then if you're putting something animate in the object position, 00:42:52.620 | 
you need to kind of mark it and you'd be like, hey, watch out, you know, there's an object here. 00:42:56.180 | 
And that's like a rule of the grammar, right? 00:43:00.980 | 
Similarly, Hindi has a kind of more subtle one, but I think it's cool, right? 00:43:07.340 | 
So, if you put an object that is definite, 00:43:12.740 | 
you have to mark it with a little, this is an object marker, right? 00:43:18.220 | 
And you might ask, okay, I understand why animacy is a big deal, right? 00:43:26.180 | 
Like, you know, maybe animate things more often do things than have things done to them. 00:43:33.980 | 
But why would you need this little "ko" marker, this the goat versus a goat? 00:43:39.460 | 
And it's like, well, probably if something is definite, 00:43:41.020 | 
it means that it's kind of, 00:43:45.540 | 
we've kind of probably been talking about it or we're all thinking about it, you know. 00:43:48.820 | 
For example, it's like, oh, I ate the apple, right? 00:43:51.140 | 
This means that either like we had one apple left and I ate it or like it was like really rotten or something. 00:43:55.140 | 
You can't believe I ate it, right? Or something like that. 00:43:57.220 | 
And so like, then things that we're already talking about, 00:43:59.940 | 
they're probably more likely to be subjects, right? 00:44:02.340 | 
Like, you know, if I was like, oh, Rosa, you know, like, 00:44:08.060 | 
yeah, I feel like Rosa did this and Rosa did that and Rosa that. 00:44:12.340 | 
And then, Leon kissed Rosa. 00:44:15.260 | 
You'd be like, no, you probably want to be like Rosa kissed Leon, right? 00:44:17.100 | 
You probably want to put, you know, it's not strict, 00:44:18.860 | 
but if you're talking about something, 00:44:20.860 | 
it's probably going to be the subject of the next sentence. 00:44:22.580 | 
So then if it's the goat, you have to put a little accusative marker on it. 00:44:26.540 | 
So this is like how like the marking in the language works, 00:44:30.540 | 
and it's kind of all influenced by this like interesting semantic relationship. 00:44:36.100 | 
And language models are also aware of these gradations. 00:44:39.100 | 
And, you know, in a similar classifying subjects and objects paper that we wrote, 00:44:47.540 | 
we see that language models also have these gradations, right? 00:44:50.940 | 
So again, if you map the probability of being a subject 00:44:54.140 | 
with that classifier on the y-axis, right, we see that there's a high accuracy, right? 00:45:00.460 | 
On the left, we have the subjects, they're classified above. 00:45:01.980 | 
On the right, we have the objects, they're classified below. 00:45:03.940 | 
But, you know, animacy kind of influences this grammatical distinction, right? 00:45:08.100 | 
So like if you're animate and a subject, you're very sure. 00:45:11.500 | 
If you're inanimate and an object, you're very sure. 00:45:13.500 | 
Anything else, you're kind of close to 50, you know? 00:45:16.100 | 
And so this kind of relation where the meaning plays 00:45:25.500 | 
into the structure is reflected in language models, you know? 00:45:33.780 | 
Or, you know, we should kind of temper our expectations maybe away 00:45:38.820 | 
from the fully syntactic things that we're talking about. 00:45:44.580 | 
Another kind of cool example of how meaning can influence, you know, 00:45:51.700 | 
what we can say: I've said from the beginning many times that all kind of combinations of 00:45:55.900 | 
structures and words are possible, but that's not strictly true, right? 00:46:00.100 | 
So in many cases, if something is like too outlandish, we often do just assume the more 00:46:05.740 | 
So there's these psycholinguistics experiments where they kind of test this with, you 00:46:10.900 | 
know, these kind of giving verbs. 00:46:13.460 | 
So it's like, you know, the mother gave the daughter the candle, and you could actually 00:46:16.260 | 
switch that around, you know, it's called the dative alternation, 00:46:19.740 | 
you switch that around to make the mother gave the candle to the daughter. 00:46:24.580 | 
And then if you switch around who's actually being given, right? 00:46:28.580 | 
So if you're actually saying the mother gave the candle the daughter, people don't really, 00:46:36.260 | 
people don't interpret that in its literal sense. 00:46:38.780 | 
They usually interpret it as like the mother gave the daughter the candle. 00:46:41.600 | 
And like, of course, outlandish meanings, you know, they're never impossible to express, 00:46:47.380 | 
And so you can like kind of spell it out, you know, you could be like, well, the mother, 00:46:52.060 | 
she picked up her daughter and she handed her to the candle, you know, who is sentient. 00:46:55.420 | 
And then you could say this, but you can't do it simply with the give 00:47:00.900 | 
word, like people tend to interpret it the other way. 00:47:03.940 | 
And so marking these less prominent things, sorry, these less 00:47:07.580 | 
plausible things, and marking them more prominently, is a pervasive feature that we see 00:47:13.500 | 
And all these ways are like, you know, also like very like embedded in the grammar as 00:47:21.260 | 
So another way that, you know, we see meaning kind of play in to, you 00:47:31.540 | 
know, and kind of break apart this full compositionality, you know, syntax picture, 00:47:37.100 | 
right, is that meaning can't always be composed from individual words, right? 00:47:41.060 | 
And language is just full of idioms. You know, sometimes when you talk about idioms, you, you know, 00:47:45.140 | 
you might think, okay, there's maybe like 20 of them, you know, things like my grandfather 00:47:48.260 | 
would say, you know, things about like chickens and donkeys. 00:47:53.500 | 
You know, we're actually constantly using constructions that, you know, we 00:47:56.740 | 
couldn't actually get from, you know, they're kind of idiomatic in their 00:48:01.140 | 
little sense, right, that we couldn't actually get from composing the words, right? 00:48:05.220 | 
Things like, I wouldn't put it past him, he's getting to me these days, that won't go down 00:48:09.860 | 
There's so, so many of these, and it's kind of a basic part of communication 00:48:15.700 | 
to kind of use these little canned idiomatic phrases, you know, and linguists 00:48:22.700 | 
love saying that, oh, any string of words you say is totally novel, 00:48:27.780 | 
you know, and it's like probably true, you know, I've been speaking for like 50 minutes, 00:48:32.060 | 
you know, and probably no one has said this exact thing ever before, I just 00:48:34.740 | 
use the compositional rules of English to make it. 00:48:36.340 | 
But actually, most of my real utterances are like, oh, yeah, no, totally, right, something 00:48:41.980 | 
like that, which is actually, people say that all the time, right? 00:48:44.220 | 
Most of my real utterances are like, people say that all the time, you know, we have these 00:48:47.700 | 
little canned things that we love reusing, and, you know, we reuse them 00:48:50.980 | 
so much that like they stop making sense if you break them apart into individual words, 00:48:55.900 | 
And we even have these constructions that can take arguments, but don't 00:48:59.860 | 
really, you know, so they're not like canned words, they're kind of like a canned 00:49:03.740 | 
way of saying something that, you know, doesn't really work if you build up from the syntax, 00:49:07.940 | 
So like, oh, he won't eat shrimp, let alone oysters, right? 00:49:14.140 | 
Well, it means like I'm defining some axis of, you know, of moreness, right? 00:49:19.580 | 
In this case, probably like shellfish and, like, weird or something, you 00:49:25.260 | 
know, and so it's like, well, shrimp is less weird, so oysters are more, you know, and if I 00:49:27.860 | 
say like, oh, he won't eat shrimp, let alone beef, right? 00:49:31.220 | 
So this construction does kind of a complex thing, right? 00:49:35.340 | 
Where you're saying like, he won't do one thing, let alone the one that's worse in the 00:49:38.340 | 
dimension. You know, it's like, oh, she slept the afternoon away, he knitted the 00:49:43.060 | 
night away, they drank the night away, right? 00:49:44.820 | 
This time-away thing doesn't actually, you know, you can't 00:49:48.700 | 
really tell otherwise. You know, or this -er, -er construction, like, 00:49:52.860 | 
the bigger they are, the more expensive they are, right? 00:49:56.020 | 
Like the, man, I forgot how it goes, the bigger they come, the harder they fall, right? 00:50:00.900 | 
Like so it doesn't even have to be a, yeah, and it was like, you know, that travesty of 00:50:05.980 | 
Right, like that of a construction, there's so many of these, right? 00:50:08.460 | 
Like so much of how we speak, if you actually try to do the tree parse and then the 00:50:12.220 | 
semantic parse up from it, it won't really make sense. 00:50:16.460 | 
And so there's been this work, this is more recently kind of 00:50:21.620 | 
coming to light, and I've been really excited by it. 00:50:23.660 | 
There's testing constructions in large language models. 00:50:26.860 | 
There was just this year a paper by Kyle Mahowald, who is a postdoc here, testing 00:50:32.940 | 
the "a beautiful five days in Austin" construction, right? 00:50:35.500 | 
So it's the a-adjective-numeral-noun construction where it's like, it doesn't 00:50:43.820 | 
Because it's like, it wouldn't really work, right? 00:50:48.860 | 
And there's like many ways, you know, and like anything kind of similar to it, right? 00:50:52.980 | 
Like "a five beautiful days," that doesn't work, right? 00:50:56.220 | 
So somehow this specific construction is grammatically correct to us. 00:50:59.580 | 
But, you know, you can't say a five days in Austin, right? 00:51:02.380 | 
You can't say a five beautiful days in Austin, you know, you have to say like this. 00:51:06.260 | 
And GPT-3 actually largely concurs with humans on 00:51:13.380 | 
So on the left here, the gray bars, we have the things that are 00:51:21.500 | 
So those are like a beautiful five days in Austin and five beautiful days in Austin, 00:51:28.460 | 
They do this over like many, many instances of this construction, not just Austin, obviously. 00:51:33.220 | 
But yeah, and we see GPT-3 accepts these, you know, those are the gray bars, and 00:51:37.740 | 
humans also accept these, those are the green triangles. 00:51:41.900 | 
And for every other iteration, the human triangles are very low. 00:51:45.360 | 
And GPT-3 is lower, but does get tricked by some things, right? 00:51:48.560 | 
So it seems to have this knowledge of this construction, but not as like starkly as humans 00:51:53.580 | 
So especially, if you see that third one over there, 00:51:57.860 | 
the five beautiful days, humans don't accept it as much. 00:52:01.020 | 
It's funny, to me it sounds almost better than the rest of them, but I guess these 00:52:14.900 | 
And GPT-3 thinks those are better than maybe humans 00:52:18.320 | 
do, but there is this difference, you know, a significant difference between 00:52:25.440 | 
And then similarly, some people tested the "the X-er, the Y-er" construction, right? 00:52:29.200 | 
And so they took examples of sentences that were the "the X-er, the Y-er" construction. 00:52:34.660 | 
And then they took example sentences which had an "-er" followed by 00:52:40.300 | 
an "-er", but they weren't actually "the X-er, the Y-er", right? 00:52:44.620 | 
It's like, oh, "the older guys help out the younger guys", right? 00:52:47.540 | 
So that's not a "the X-er, the Y-er" construction. 00:52:49.780 | 
And then they were like, right, if we mark the ones that are 00:52:53.620 | 
as positive and the ones that aren't as negative, does the latent space of models kind of 00:52:58.340 | 
That all this construction is kind of clustered together in a way. 00:53:04.940 | 
And then the last thing I want to talk about in this semantic space, you know, after 00:53:09.700 | 
constructions and all that, is that the meaning of words is actually very subtle 00:53:13.420 | 
and sensitive, and it's influenced by context in all these crazy ways, right? 00:53:17.780 | 
And Erika Petersen and Chris Potts from the linguistics department here did this 00:53:23.180 | 
great investigation on the verb "break", you know. 00:53:30.300 | 
And it's like, "break" can have all these meanings, right? 00:53:32.540 | 
Like we think, yeah, "break" is a word, you know, and words are 00:53:36.340 | 
things like "table" and "dog" and "break" that have one sense. 00:53:39.020 | 
But, you know, actually there aren't even senses that you can enumerate, you know, like 00:53:43.380 | 
river bank and financial bank; it's just like, yeah, you know, "break the horse" means tame 00:53:47.700 | 
It means, like, spread into smaller bits of money, right? 00:53:55.020 | 
There's just so many ways in which "break", you know, its meaning is just 00:54:00.300 | 
It's kind of true for every word, you know, or many words, maybe like "table" 00:54:04.940 | 
It's like, yeah, there's like a set of all things that are tables or dogs. 00:54:08.580 | 
You know, there's maybe some more philosophical way of going about it, but, you know, so like 00:54:12.380 | 
pocket, you know, it's like a pocket, but then like you can pocket something. 00:54:15.140 | 
Then like it kind of means steal in many cases, doesn't just mean put something in your pocket 00:54:20.900 | 
So yeah, there's all these ways in which the meaning of 00:54:31.260 | 
And what they do is, don't worry about what's actually going on here, but, you 00:54:34.980 | 
know, they've kind of mapped each sense to, like, a color. 00:54:38.540 | 
And when you start off in layer one, they're all, I think this is just by position 00:54:44.940 | 
You start off in layer one and it's just like, I think that's what it is. 00:54:45.940 | 
And then if you take all the words and pass them through a big model, 00:54:59.220 | 
And then, you know, by the end, they've all kind of split up. 00:55:01.380 | 
All the colors are kind of clustering together. 00:55:03.740 | 
Each color is kind of one of these meanings. 00:55:07.060 | 
And so they kind of cluster together, and is it constructions again, 00:55:09.860 | 
or is it just, you know, the way in which they kind of isolate these really subtle 00:55:18.740 | 
So then I think a big question in NLP, right, is, how do we strike the balance between 00:55:22.780 | 
syntax and the ways that meaning influences things? 00:55:26.620 | 
And I pulled up this quote from a book by Joan Bybee, which I enjoy. 00:55:33.860 | 
And I think it kind of brings to light a question that we should be asking in an 00:55:39.740 | 
This book is just a linguistics book. 00:55:41.740 | 
But, you know, it says: "While language is full of both broad generalizations and item-specific 00:55:45.420 | 
properties, linguists have been dazzled by the quest for general patterns." 00:55:50.300 | 
You know, and of course, the abstract structures and categories of language are 00:55:55.300 | 
But, you know, I would submit, or she would submit, that what is even more fascinating 00:55:58.460 | 
is the way that the general structures arise from and interact with the more specific items 00:56:02.100 | 
of language, producing a highly conventional set of general and specific structures that 00:56:07.060 | 
allow the expression of both conventional and novel ideas. 00:56:10.100 | 
It's kind of this middle ground between abstraction and specificity that 00:56:16.180 | 
we would want, you know, that humans probably exhibit, that we would want our models 00:56:21.900 | 
I was wondering if you could go back one slide and just unpack this diagram a little more 00:56:36.900 | 
Oh, so this is all like, you know, so if you take, you know, the way that 00:56:45.180 | 
words are, you know, as you're passing through a transformer, through many layers, 00:56:48.980 | 
I just want to be like, look at how the colors cluster. 00:56:52.460 | 
But yeah, you're passing through a transformer, many layers at any one point in that transformer, 00:56:57.140 | 
you could like say, OK, how are the words organized now, you know, and you think, well, 00:57:02.380 | 
I'm going to project that to two dimensions from like a thousand. 00:57:05.340 | 
And that's, you know, maybe a good idea, maybe a bad idea. 00:57:07.820 | 
I think there's a lot of, but, you know, I wouldn't be able to show them here if they were 00:57:11.620 | 
So let's assume that it's an OK thing to be doing. 00:57:15.660 | 
Then, you know, this is what they've done for layer one and then for layer 00:57:21.540 | 
And so we could see that they start off where the colors are totally 00:57:25.300 | 
jumbled and they're probably, you know, before layer one, you add in the position 00:57:31.220 | 
So I think that's what all those clusters are. 00:57:34.620 | 
So it's like kind of clustering because you don't have anything to go off of. 00:57:36.340 | 
You know, it's like this is break and it's in position five. 00:57:38.420 | 
It's like, OK, I guess I'll cluster all the breaks in position five. 00:57:41.380 | 
But then as you go up the model, right. 00:57:46.380 | 
And kind of all this meaning is being formed. 00:57:48.580 | 
You see these senses kind of come out in how it organizes things. 00:57:57.340 | 
So all these "breaks" kind of become, they're very specific. 00:58:01.500 | 
You know, they're very kind of subtle versions of "break". 00:58:04.260 | 
You know, there's this work, and I think it's different from a lot of NLP work because 00:58:08.300 | 
it has a lot of labor put into this labeling. 00:58:13.540 | 
Like this is something, because, you know, the person who, this is a linguistic 00:58:19.940 | 
And if you go through a corpus and label every "break" by which one of these it means, 00:58:24.300 | 
And so I think it's the kind of thing that you wouldn't be able to show otherwise. 00:58:33.100 | 
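One way to put a number on "the colors cluster by the last layer" is a separation score. This is only a sketch under assumptions: the 2-D points and sense labels below are made up for illustration, not taken from the study, and the score (mean distance to the nearest other-sense centroid over mean distance to the own-sense centroid) is a hypothetical choice, not the measure the authors used.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def separation(points_by_sense):
    """Mean distance to the nearest other-sense centroid divided by
    mean distance to the own-sense centroid. Higher means the senses
    sit in more clearly separated clusters."""
    centroids = {sense: centroid(vs) for sense, vs in points_by_sense.items()}
    own, other = [], []
    for sense, vectors in points_by_sense.items():
        for v in vectors:
            own.append(math.dist(v, centroids[sense]))
            other.append(min(math.dist(v, c)
                             for s, c in centroids.items() if s != sense))
    return (sum(other) / len(other)) / (sum(own) / len(own))


# Toy 2-D projections of "break" tokens (hypothetical numbers).
# Early layer: the two senses are mixed together.
layer_first = {
    "tame": [(0.0, 0.1), (1.0, 0.9)],
    "shatter": [(0.1, 0.0), (0.9, 1.0)],
}
# Late layer: each sense forms its own tight cluster.
layer_last = {
    "tame": [(0.0, 0.1), (0.1, 0.0)],
    "shatter": [(1.0, 0.9), (0.9, 1.0)],
}

print(separation(layer_first), separation(layer_last))
```

With real data you would compute this per layer on the labeled tokens and watch whether the score rises with depth; the 2-D projection step itself is left out here.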
So yeah, language is characterized by the fact that it's this amazingly abstract system. 00:58:39.220 | 
And, you know, and we want our models to capture that. 00:58:40.700 | 
That's why we do all this compositionality kind of syntax tests. 00:58:43.420 | 
You know, but meaning is so rich and multifaceted. 00:58:46.340 | 
High dimensional spaces are much better at capturing these subtleties. 00:58:50.220 | 
We started off talking about word embeddings in this class. 00:58:53.020 | 
You know, high dimensional spaces are so much better at this than any rules that we would 00:58:55.100 | 
come up with, being like, OK, maybe we could have "break" subscript, like "break money", 00:58:59.620 | 
you know, and we're going to put that into our system. 00:59:02.420 | 
And so where do deep learning models stand now? 00:59:07.100 | 
Between surface level memorization and abstraction. 00:59:08.940 | 
You know, and this is what like a lot of analysis and interpretability work is trying to understand, 00:59:13.540 | 
And I think that what's important to keep in mind when we're reading and kind of doing 00:59:16.260 | 
this analysis and interpretability work is that this is not even a solved question for 00:59:22.100 | 
Like we don't know exactly where humans stand between having an abstract grammar and 00:59:24.660 | 
having these very construction-specific and meaning-specific ways that 00:59:32.620 | 
Any questions overall on the importance of semantics and the richness of human language? 00:59:39.420 | 
This is probably a question from quite a bit before, but you're showing a chart from your 00:59:45.860 | 
research where the model is really well able to distinguish inanimate from animate given 00:59:58.980 | 
I was just trying to interpret that graph and understand what the sort of links between 01:00:13.060 | 
So this is similar to the other graph, where what it's 01:00:15.980 | 
trying to distinguish is subject from object. 01:00:19.540 | 
But we've just split the test set into these four ways. 01:00:22.300 | 
We've split it into subject inanimate, subject animate, you know, so we just split the test 01:00:28.100 | 
And so what the two panels in the x axis are showing are these different 01:00:34.380 | 
So, OK, things that are subjects, and basically the ground truth is that things 01:00:37.060 | 
on the left should be above 50, things on the right should be below 50. 01:00:41.700 | 
But if we further split it by animate and inanimate, we see that there's just, like, influence 01:00:52.300 | 
Sorry, I rushed over these graphs; I kind of want to give a taste of things that 01:00:56.740 | 
But yeah, it's good to also understand fully what's going on. 01:01:09.180 | 
So I'm assuming for judging acceptability, you just ask that for like GPT-3, how do you 01:01:20.460 | 
I think that's what Kyle Mahowald did in this paper. 01:01:24.140 | 
You could just take the probabilities it outputs, and then, you know, 01:01:26.700 | 
for GPT-3, it's going left to right. 01:01:29.620 | 
I think there's like other things that people do sometimes. 01:01:32.580 | 
But yeah, especially for these models, you don't have too much access to anything apart from 01:01:36.380 | 
the generation and the probability of each generation. 01:01:42.260 | 
Yeah, I think that you might want to do that. 01:01:44.780 | 
And, you know, you don't want to multiply every logit together, right? 01:01:48.420 | 
Because then if you're multiplying many probabilities, longer sentences, you 01:01:54.540 | 
Which is not exactly true for humans, or, you know, it's not true in that way for 01:01:58.740 | 
So, you know, I think there's like things you should do, like ways to control it and 01:02:00.820 | 
stuff like when you're running an experiment like this. 01:02:16.820 | 
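To make the length issue concrete, here is a minimal sketch, assuming you already have a per-token probability for each sentence from a left-to-right model. The probabilities below are made up for illustration, and per-token averaging is just one common control, not necessarily what the paper did.

```python
import math


def total_logprob(token_probs):
    """Log of the product of the token probabilities.
    Every extra token can only lower (or keep) this score."""
    return sum(math.log(p) for p in token_probs)


def mean_logprob(token_probs):
    """Length-normalized score: average log-probability per token."""
    return total_logprob(token_probs) / len(token_probs)


# Hypothetical per-token probabilities: same per-token quality,
# but one sentence is three times as long as the other.
short_sentence = [0.5, 0.5]
long_sentence = [0.5] * 6

print(total_logprob(short_sentence), total_logprob(long_sentence))
print(mean_logprob(short_sentence), mean_logprob(long_sentence))
```

The raw product penalizes the longer sentence just for being longer, while the per-token average treats the two as equally good, which is closer to what you want when comparing acceptability across sentences of different lengths.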
So so far we've been talking about English, right? 01:02:19.020 | 
All this, I haven't been saying it explicitly all the time, but most things I've said, 01:02:21.820 | 
you know, apart from some, maybe some differential object marking examples, right? 01:02:24.860 | 
They've been kind of about English, about English models, but there's so many languages, 01:02:29.500 | 
There's like 7,000 languages in the world, maybe not over, there's around 7,000 languages 01:02:40.780 | 
It's kind of difficult, you know, like even in the case of English, right? 01:02:45.460 | 
The language spoken in Scotland, is that English? 01:02:47.420 | 
It's like, you know, something like Jamaican English, you know, like maybe that's a different 01:02:51.380 | 
There's like different structures, but it's still like clearly like much more related 01:02:57.340 | 
And so, you know, how do you make a multilingual model? 01:03:03.100 | 
Well, so far a big approach has been, you know, you take a bunch of languages, this is like 01:03:09.300 | 
all of them, and maybe you're not going to take all of them, you know, maybe you can 01:03:11.340 | 
take a hundred or something, and you just funnel them into just like one transformer 01:03:16.340 | 
And there's maybe things you could do, like upsampling the ones you don't have too much 01:03:18.580 | 
data of, you know, or downsampling the ones you have too much data of, you know, but 01:03:23.220 | 
like this is the general approach, you know, what if we just make one, you know, like one 01:03:27.300 | 
transformer language model, you know, like something like a BERT, it's usually like a 01:03:32.860 | 
BERT type model, because it's hard to get good generation for like too many languages, 01:03:36.420 | 
you know, but yeah, how about just get one transformer language model for all of these 01:03:42.240 | 
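For the upsampling and downsampling idea, one common recipe is to sample each language with probability proportional to its share of the data raised to a power below 1. This is a sketch under assumptions: the language codes and sentence counts are made up, and `alpha=0.3` is an illustrative value, not a setting from the lecture.

```python
def sampling_probs(sizes, alpha=0.3):
    """Exponent-smoothed sampling distribution over languages.

    sizes: dict mapping language code -> number of training sentences.
    alpha: 1.0 keeps the raw proportions; values below 1.0 upsample
           small languages and downsample large ones.
    """
    total = sum(sizes.values())
    weights = {lang: (n / total) ** alpha for lang, n in sizes.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Hypothetical corpus sizes for a high-, mid-, and low-resource language.
sizes = {"en": 1_000_000, "el": 50_000, "yo": 1_000}
raw = sampling_probs(sizes, alpha=1.0)
smoothed = sampling_probs(sizes, alpha=0.3)

for lang in sizes:
    print(lang, round(raw[lang], 4), round(smoothed[lang], 4))
```

The smoothed distribution still shows the model more English than anything else, but the low-resource language gets seen far more often than its raw share of the data would allow.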
And so what's cool about this is that multilingual language models, right, they let us share 01:03:46.700 | 
parameters between high resource languages and low resource languages, right? 01:03:51.300 | 
There's a lot of languages in the world, really just most languages in the world, which you 01:03:54.620 | 
could not train even a BERT-size model for, right, there's just not enough 01:03:58.860 | 
data, and, yeah, there's a lot of work being done on this. 01:04:03.380 | 
And one way to do this is to say, well, you know, pre-training and 01:04:07.460 | 
transfer learning, they brought us so much unexpected success, right? 01:04:12.860 | 
And so, you know, we get this great linguistic capability and generality, right, 01:04:17.460 | 
if we pre-train something in English, that we weren't asking for, so, you know, will 01:04:21.820 | 
this self-supervised learning paradigm, you know, can it deliver between languages? 01:04:25.300 | 
So it's like, maybe I can get a lot of the linguistic knowledge, like 01:04:30.500 | 
the more general stuff, from just all the high resource languages and then kind 01:04:33.580 | 
of apply it to the low resource languages, right? 01:04:35.620 | 
Like a bilingual person doesn't have two totally separate parts of themselves, 01:04:39.420 | 
right, that have learned languages; there's probably some sharing, some way that things are 01:04:43.260 | 
in the same space, like, linguistically things are broadly the same, right? 01:04:48.620 | 
And so, you know, we have this attempt to bring NLP to 01:04:57.060 | 
some still very small subset of the 7,000 languages in the world. 01:05:03.020 | 
On the one hand, you know, languages are remarkably diverse. 01:05:05.860 | 
So we'll go over some of the cool ways that languages in the world vary, you know, and 01:05:10.580 | 
so does multilingual NLP capture the specific differences of different languages? 01:05:15.020 | 
On the other hand, you know, languages are similar to each other in many ways. 01:05:18.740 | 
And so does multilingual NLP capture the parallel structure between languages? 01:05:23.940 | 
So, you know, just to go over some ways, like, you know, really understanding 01:05:27.860 | 
how diverse languages can be, you know, in around a quarter, this is a quote 01:05:33.140 | 
from a book, but you know, in around a quarter of the world's languages, every statement, 01:05:37.900 | 
right, like every time you use a verb must specify the type of source on which it is 01:05:42.940 | 
So it's like a part, you know, how we have tense in English, where, you 01:05:46.140 | 
know, kind of everything you say is either in the past or the present or the 01:05:51.100 | 
And so an example in Tariana, these are again from the book, right? 01:05:58.320 | 
But, you know, you have this marker in bold at the end, right? 01:06:02.920 | 
And so when you say something like, "Jose has played football", right? 01:06:07.940 | 
If you put the "-ka" marker, that means that we saw it, right? 01:06:10.620 | 
It's kind of like the visual evidential marker, right? 01:06:12.540 | 
And there's kind of a non-visual marker that kind of means we heard it, right? 01:06:17.340 | 
So if you say a statement, you could say we heard it, right? 01:06:21.660 | 
There's a, like, we infer it from visual evidence, right? 01:06:24.140 | 
So if it's like, oh, his cleats are gone, and he is also gone, and, 01:06:30.100 | 
you know, we see people going to play football, right? 01:06:32.740 | 
Or we see people coming back, I guess, from playing football, because it's in the past, right? 01:06:35.300 | 
That means, you know, we can infer it. 01:06:37.940 | 
Or, if he plays football every Saturday, you know, and it's 01:06:43.340 | 
Saturday, you would use a different marker, right? 01:06:47.340 | 
Or if someone has told you, if it's hearsay, you would use a different marker, right? 01:06:50.900 | 
So this is like, this is like a part of the grammar, right? 01:06:57.660 | 
Like, I don't speak any language that has this, it seems like it's, it seems like very 01:07:03.100 | 
cool and like different from like anything I would ever think would be like a part of 01:07:08.820 | 
Or like, especially like a compulsory part of the grammar, right? 01:07:14.180 | 
And you can like map out, I wanted to include some maps from WALS, the World Atlas of Linguistic 01:07:21.540 | 
You know, you could like map out all the languages, right? 01:07:23.820 | 
Like I only speak white dot languages, which are like no grammatical evidentials. 01:07:27.820 | 
You know, if you want to say whether you heard something or saw it, you have to say it like 01:07:31.860 | 
But there's many languages, you know, yeah, especially in the Americas, right? 01:07:39.820 | 
Tariana is, I think, a Brazilian language from up by the border with, yeah. 01:07:46.460 | 
But yeah, the, you know, while we're looking at like language typology maps, right? 01:07:52.980 | 
And so these language organization and categorization maps, the most, like, 01:07:59.220 | 
the classic one, right, is again the subject, object, and verb order, right? 01:08:04.460 | 
So as you said, English has SVO order, but there's just so many orders that, you 01:08:09.660 | 
know, almost all the possible ones are attested, you know, some languages 01:08:14.340 | 
have no dominant order, like Greek, so a language that I speak natively has no dominant 01:08:20.220 | 
You would say you would move things around for emphasis or whatever. 01:08:24.020 | 
And you see like, and here, you know, we're seeing some, some like diversity, we're seeing 01:08:28.580 | 
typology, we're also seeing some tendencies, right? 01:08:30.900 | 
Like some are just so much more common than others, right? 01:08:33.580 | 
And this is like, again, something which like people talk about so much, right? 01:08:41.900 | 
Why are some more common than others? It's a basic fact of language, it's something 01:08:45.100 | 
which happened, you know, is this just a fact of how discourse works, maybe, 01:08:49.140 | 
you know, that it's more preferred for many people to say something, you know, 01:08:55.740 | 
Another way though that languages vary, you know, is like the number of morphemes they 01:09:00.180 | 
Like some languages, you know, like Vietnamese classically, are just very isolating, 01:09:04.060 | 
like each kind of thing you want to express, like tense 01:09:08.020 | 
or something, is going to be in a different word; you know, in English, we actually combine 01:09:12.460 | 
kind of tenses, we have things like "-able", right? 01:09:14.500 | 
Like "throwable" or something, right? 01:09:18.180 | 
And then in some languages, really so much stuff is expressed 01:09:23.220 | 
And so you can have languages, especially in like Alaska and Canada, a lot of languages 01:09:28.740 | 
there and in Greenland, where you have, and these are all one language family, 01:09:36.460 | 
you can have kind of whole sentences expressed with just things that 01:09:43.620 | 
So you have to have things like the, you know, the object and the, or I guess 01:09:51.260 | 
in this case, you start with the object, again, you have kind of the verb and, like, 01:09:55.340 | 
whether it's happening or not happening and who said it, or whether it's 01:09:58.540 | 
said in the future, and all that just kind of all put in, you know, these, quote 01:10:03.380 | 
It's like a very different way of a language working than English works like at all, right? 01:10:08.620 | 
Yeah, this is from two slides ago, the one with the map. 01:10:09.620 | 
I just want to know like what these dots mean, because in the US, the top right is gray, 01:10:18.140 | 
like in the Northeast, but in the Pacific Northwest, it's yellow. 01:10:21.500 | 
Is that different dialects for like the same American English? 01:10:28.580 | 
So English is just this one dot in here, spread in amongst all the Cornish and 01:10:35.940 | 
Yeah, so English is just like in Great Britain. 01:10:42.940 | 
Yeah, and that's why all this 01:10:47.580 | 
evidential stuff is happening in, like, the Americas, right? 01:10:50.740 | 
Because there's like a lot of, you know, very often the indigenous languages of the Americas 01:10:54.300 | 
are like the classic, like very evidentially marking ones, which are the pink ones. 01:10:59.180 | 
You said that normally we use a BERT-style model for multilingual models because 01:11:04.300 | 
it's difficult to do natural language generation across languages. 01:11:08.420 | 
I mean, I guess intuitively that makes sense, right? 01:11:11.140 | 
Because of the subtleties and the nuance between different languages when you're producing it. 01:11:15.300 | 
But is there a particular reason that that's been so much 01:11:21.580 | 
I think good generation is just harder, right? 01:11:25.420 | 
Like to get something like, you know, GPT-3 or something. 01:11:28.540 | 
You need really a lot of data, and maybe, it's kind of like, I think 01:11:32.340 | 
there are, can I think of any, are there any, is it G-Shard? 01:11:36.900 | 
Yeah, I can't really think of any like, you know, like encoder-decoder, as you said, you 01:11:43.540 | 
Of course, like GPT-3 has this thing where if you're like, how do you say this in French? 01:11:46.100 | 
You'll be like, you say it like this, you know? 01:11:47.340 | 
So it's like, if you've seen all of the data, it's going to include a lot of languages, 01:11:50.780 | 
but this kind of like multilingual model where you'd be like, right, you know, be as good 01:11:54.380 | 
as GPT-3, but in this other language, you know, I think it's just, it's just, you need 01:11:58.740 | 
a lot more data to get that kind of coherence, right? 01:12:01.140 | 
As opposed to, yeah, as opposed to something like, if you do text infilling or something, 01:12:05.580 | 
which is how the BERT-style models are, then you get very good, even if the text 01:12:09.860 | 
infilling, you know, performance isn't great for every language, you can actually get very, 01:12:15.340 | 
very good embeddings to work with for a lot of those languages. 01:12:21.580 | 
Now, for just one last language diversity thing, I think this is interesting, the motion 01:12:26.620 | 
event stuff, because this is actually, you know, it's languages that, 01:12:31.140 | 
you know, many of us know, I'm going to talk about Spanish, but it's actually something 01:12:34.220 | 
which you might not have thought about, but then once you see, you're like, oh, actually, 01:12:37.420 | 
that actually affects how everything works. 01:12:40.660 | 
So in English, right, the manner of motion is usually expressed on the verb, right? 01:12:43.420 | 
So you can say something like "the bottle floated into the cave", right? 01:12:46.140 | 
And so the fact that it's floating is on the verb, and the fact that it's going 01:12:51.660 | 
While in Spanish, the direction of motion is usually expressed on the verb. Greek is like 01:12:58.180 | 
I feel like most Indo-European languages are not like this, they're actually like English. 01:13:01.300 | 
So most languages from Europe to North India tend to not be like this, 01:13:07.380 | 
So you would say, "La botella entró a la cueva flotando," right? 01:13:10.860 | 
So you'd have like, so the floating is not usually put on the main verb. 01:13:16.140 | 
And in English, you could actually say, right, "the bottle entered the cave 01:13:19.620 | 
floating", it's just maybe not what you would say, right? 01:13:23.460 | 
And similarly, in Spanish, you can say it the other way, right? 01:13:26.580 | 
These are called satellite-framed languages and verb-framed languages, and it really affects 01:13:29.780 | 
how you would say most, you know, kind of how everything works, right? 01:13:34.500 | 
It's kind of like a division that's like, you know, pretty attested, of course, it's 01:13:43.180 | 
Chinese I think often has these structures where there's like two verb slots, right? 01:13:47.540 | 
Where you could have both a manner of motion and a direction of motion kind of in the like 01:13:52.860 | 
the one verb slot, none of them have to go kind of like after playing some different 01:13:58.620 | 
So there's all these ways in which languages are just different, 01:14:01.580 | 
you know, from things that maybe we didn't even think could be in a language, like 01:14:09.020 | 
But we don't realize that sometimes they're just so different in these 01:14:16.100 | 
And so, you know, going to the other angle: languages are so different, they're 01:14:21.740 | 
So, you know, there's this idea, is there a universal grammar, some 01:14:26.740 | 
abstract structure that unites all languages, right? 01:14:31.940 | 
And you know, the question is, can we define an abstraction where we can say, like, all 01:14:36.780 | 
There's other ways of thinking about universals, like all languages tend to 01:14:39.500 | 
be one way, or languages that tend to be one way also tend to be some other 01:14:44.660 | 
And there's like a third way of thinking about universals, that's like languages all deal 01:14:49.940 | 
in similar types of relations, you know, like subject, object, you know, like types of modifiers, 01:14:55.300 | 
So the universal dependencies project was like a way of kind of saying like, maybe we 01:15:02.020 | 
can make dependencies kind of for all languages in a way that doesn't shoehorn them into each 01:15:08.980 | 
RRG, like relational something grammar, you know, was also kind of this idea that maybe 01:15:12.940 | 
one way to think about all languages together is like the kind of relations they define, 01:15:17.300 | 
And, you know, ask me about the Chomskyan and the Greenbergian stuff if you want 01:15:26.460 | 
It's kind of, yeah, it's slightly more difficult. 01:15:29.380 | 
So maybe it's easier to think of this third one in terms of NLP, right? 01:15:33.820 | 
And back to the subject-object relation stuff, if we look at it across languages, 01:15:38.100 | 
right, we see that they're kind of encoded in parallel, because those 01:15:41.780 | 
classifiers that we're training, they're about as accurate in their own language as they 01:15:47.020 | 
Their own language being red and other languages being black, right? 01:15:50.660 | 
It's not like, wow, if I take a multilingual model and I train one classifier in one language, 01:15:56.140 | 
it's going to be so good at itself and so bad at everything else, right? 01:15:59.660 | 
They're clearly like on the top end, the red dots. 01:16:03.860 | 
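The transfer setup can be sketched like this: fit a classifier on embeddings from one language, then test it unchanged on another. This is only an illustration under assumptions: the 2-D vectors are made up, the two languages are hypothetical, and a nearest-centroid classifier stands in for whatever probe was actually trained on the model's representations.

```python
import math


def fit_centroids(examples):
    """Nearest-centroid 'probe': one mean vector per label.
    examples: list of (vector, label) pairs."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in by_label.items()}


def predict(centroids, vec):
    """Label whose centroid is closest to vec."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def accuracy(centroids, examples):
    correct = sum(predict(centroids, vec) == label for vec, label in examples)
    return correct / len(examples)


# Toy embeddings of nouns from two hypothetical languages that
# share one space in a multilingual model (made-up numbers).
lang_a = [((0.9, 0.1), "subject"), ((0.8, 0.2), "subject"),
          ((0.1, 0.9), "object"), ((0.2, 0.8), "object")]
lang_b = [((0.7, 0.3), "subject"), ((0.85, 0.25), "subject"),
          ((0.3, 0.7), "object"), ((0.15, 0.6), "object")]

probe = fit_centroids(lang_a)  # trained on language A only
print("own language:", accuracy(probe, lang_a))
print("other language:", accuracy(probe, lang_b))
```

If subjecthood is encoded in parallel across languages, the probe's accuracy on the other language stays close to its accuracy on its own language, which is the pattern described for the red and black dots.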
And UD relations, right, so universal dependencies, right, like the kind of like dependency relations, 01:16:13.620 | 
Again, main thing to take from this example is that like the colors cluster together, 01:16:18.900 | 
So if you train a parser, or, you know, parse classification, in one 01:16:24.620 | 
language and kind of transfer it to another, you see these clusters form for the other 01:16:29.100 | 
So it's like these ideas of how like things relate together, right? 01:16:31.460 | 
Like a kind of noun modifier, you know, all that kind of stuff. 01:16:35.620 | 
They do cluster together in these parallel ways across languages, you know? 01:16:42.060 | 
And so language specificity is also important. 01:16:48.420 | 
But you know, it seems like maybe sometimes some languages are shoehorned into others 01:16:54.820 | 
And maybe part of this is that data quality is very variable in multilingual corpora, 01:16:59.940 | 
So if you take like all these multilingual corpora, there was like an audit of them. 01:17:04.460 | 
And for all these various multilingual corpora, like 20% of languages, they're less 01:17:08.620 | 
than 50% correct, meaning 50% of it was often just links or just something 01:17:14.380 | 
So that might be some language, but it was not at all. 01:17:19.340 | 
And maybe we don't want too much parameter sharing, right? 01:17:23.140 | 
AfriBERTa is a kind of recent BERT model trained only on African 01:17:29.460 | 
languages, you know, maybe having too many high-resource languages is harming, you 01:17:33.900 | 
know, and there's work here at Stanford being done in the same direction, you know. 01:17:37.980 | 
Another, yeah, another recent cross-lingual model, XLM-V, came out, which is like, why 01:17:47.060 | 
You know, like you just have like a big vocabulary. 01:17:53.140 | 
It kind of knocks out similar models with smaller vocabularies, which are like, maybe, 01:17:56.620 | 
you know, "computer" is the same in English and French. 01:18:00.420 | 
Maybe it's better to separate things, you know. 01:18:01.700 | 
It's hard to find this balance between, let's skip over this paper too. 01:18:05.860 | 
It's very cool and there's a link there, so you should look at it. 01:18:08.940 | 
But yeah, we want language generality, but we also want to preserve diversity. 01:18:13.980 | 
And so how is multilingual NLP doing, you know, especially with things like dialects? 01:18:17.860 | 
You know, there's so many complex issues for multilingual NLP to be dealing with. 01:18:22.780 | 
How can deep learning work for low resource languages? 01:18:25.620 | 
You know, what are the ethics of working in NLP for low resource languages? 01:18:31.500 | 
Who like wants the language to be translated? 01:18:33.100 | 
You know, these are all like very important ethical issues in multilingual NLP. 01:18:38.580 | 
And so after looking at structure, beyond structure, multilinguality in models, I hope 01:18:47.100 | 
you know that linguistics is a way of, you know, investigating what's going on in black 01:18:52.220 | 
The subtleties of linguistic analysis, they can help us understand what we want or expect 01:18:58.100 | 
And even though we're not reverse engineering human language, linguistic insights, I hope 01:19:02.260 | 
I've convinced you, still have a place in understanding, you know, the models that 01:19:05.380 | 
we're working with, the models that we're dealing with. 01:19:07.740 | 
And you know, and in so many more ways beyond what we've discussed here, you know, like 01:19:12.220 | 
language acquisition, language and vision, and like instructions and music, discourse, 01:19:16.860 | 
conversation and communication, and like so many other ways. 01:19:22.380 | 
If there's any more questions, you can come ask me.