Lex Fridman Podcast - #426 – Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs
Episode Date: April 17, 2024

Edward Gibson is a psycholinguistics professor at MIT and heads the MIT Language Lab. Please support this podcast by checking out our sponsors:
- Yahoo Finance: https://yahoofinance.com
- Listening: https://listening.com/lex and use code LEX to get one month free
- Policygenius: https://policygenius.com/lex
- Shopify: https://shopify.com/lex to get $1 per month trial
- Eight Sleep: https://eightsleep.com/lex to get special savings

Transcript: https://lexfridman.com/edward-gibson-transcript

EPISODE LINKS:
Edward's X: https://x.com/LanguageMIT
TedLab: https://tedlab.mit.edu/
Edward's Google Scholar: https://scholar.google.com/citations?user=4FsWE64AAAAJ
TedLab's YouTube: https://youtube.com/@Tedlab-MIT

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
- Check out the sponsors above, it's the best way to support this podcast
- Support on Patreon: https://www.patreon.com/lexfridman
- Twitter: https://twitter.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Medium: https://medium.com/@lexfridman

OUTLINE:
Here's the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) - Introduction
(10:53) - Human language
(14:59) - Generalizations in language
(20:46) - Dependency grammar
(30:45) - Morphology
(39:20) - Evolution of languages
(42:40) - Noam Chomsky
(1:26:46) - Thinking and language
(1:40:16) - LLMs
(1:53:14) - Center embedding
(2:19:42) - Learning a new language
(2:23:34) - Nature vs nurture
(2:30:10) - Culture and language
(2:44:38) - Universal language
(2:49:01) - Language translation
(2:52:16) - Animal communication
Transcript
Discussion (0)
The following is a conversation with Edward Gibson or Ted as everybody calls him. He is a
psycholinguistics professor at MIT. He heads the MIT language lab that investigates why human
languages look the way they do, the relationship between culture and language, and how people
represent, process, and learn language. Also, he should have a book titled Syntax, A Cognitive Approach, published by MIT Press,
coming out this fall.
So look out for that.
And now a quick few second mention of each sponsor.
Check them out in the description.
It's the best way to support this podcast.
We got Yahoo Finance for basically everything
you've ever needed if you're an investor.
Listening for listening to research papers,
Policygenius for insurance, Shopify for selling stuff online,
and Eight Sleep for naps.
Choose wisely my friends.
Also, if you want to work with our amazing team
or just get in touch with me,
go to lexfridman.com slash contact.
And now onto the full ad reads.
As always, no ads in the middle.
I try to make this interesting,
but if you must skip friends,
please still check out the sponsors.
I enjoy their stuff, maybe you will too.
This episode is brought to you by Yahoo Finance,
a new sponsor, and they got a new website
that you should check out.
It's a website that provides financial management
reports, information, and news for investors.
Yahoo itself has been around forever.
Yahoo Finance has been around forever.
I don't know how long, but it must be over 20 years.
It's survived so much.
It evolved rapidly and quickly, adjusting, evolving,
improving, all of that.
The thing I use it for now is there's a portfolio
that you can add your account to.
Ever since I had zero money, I used,
boy, I think it's called TD Ameritrade.
I still use that same thing, just getting a basic mutual fund.
And I think TD Ameritrade got bought by Charles Schwab
or acquired or merged, I don't know.
I don't know how these things work.
All I know is that Yahoo Finance can integrate that
and just show me everything I need to know
about my quote unquote portfolio.
I don't have anything interesting going on,
but it is still good to kind of monitor it,
to stay in touch.
Now, a lot of people I know have a lot more interesting stuff
going on investment-wise.
So, all of that could be easily integrated
into Yahoo Finance,
and you can look at all that stuff,
the charts, blah, blah, blah.
It looks beautiful and sexy,
and just helps you be informed.
Now, that's about your own portfolio,
but then also for the entirety of the finance information
for the entirety of the world.
That's all there.
The big news, the analysis of everything that's going on,
everything like that.
And I should also mention that I would like to do
more and more financial episodes.
I've done a couple of conversations with Ray Dalio.
A lot of that is about finance,
but some of that is about sort of geopolitics
and the bigger context of finance.
I just recently did a conversation with Bill Ackman,
very much about finance.
And I did a series of conversations on cryptocurrency,
lots and lots of brilliant people,
Michael Saylor, so on.
Charles Hoskinson, Vitalik,
I mean, just lots of brilliant people in that space
thinking about the future of money, future of finance.
Anyway, you can keep track of all of that with Yahoo Finance.
For comprehensive financial news and analysis,
go to yahoofinance.com.
That's yahoofinance.com.
This episode is also brought to you by Listening,
an app that allows you to listen to academic papers.
It's a thing I've always wished existed.
And I always kind of suspected it's very difficult
to pull off, but these guys pulled it off.
Basically, it's any kind of formatted text
brought to life through audio.
Now for me, the thing I care about most,
and I think that's at the foundation of listening,
is academic papers.
So I love to read academic papers
and there's several levels of rigor
in the actual reading process.
But listening to them, especially after I skimmed it
or after I did a deep dive,
listening to them is just such a beautiful experience.
It solidifies the understanding.
It brings to life all kinds of thoughts.
And I'm doing this while I'm cooking,
while I'm running,
while I'm going to grab a coffee, all that kind of stuff.
It does require an elevated level of focus,
especially the kind of papers I listen to,
which are computer science papers.
But you can load in all kinds of stuff.
You can do philosophy papers,
you could do psychology papers, like this
very topic of linguistics.
I've listened to a few papers on linguistics.
I went back to Chomsky and listened to papers.
It's great.
Papers, books, PDFs, webpages, articles,
all that kind of stuff.
Even email newsletters.
And the voices they got are pretty sexy.
It's great.
It's pleasant to listen to.
I think that's what's ultimately most important, is it shouldn't feel like a chore to listen to it.
Like I really enjoy it.
Normally you'd get a two week free trial,
but listeners of this podcast get one month free.
So go to listening.com slash Lex,
that's listening.com slash Lex.
This episode is brought to you by Policy Genius,
a marketplace for insurance, life, auto, home,
disability, all kinds of insurance.
There's really nice tools for comparison.
I'm a big fan of nice tools for comparison.
Like I have to travel to harsh conditions soon
and I have to figure out how I need to update my equipment
to make sure it's weatherproof, waterproof even.
It's just resilient to harsh conditions
and it would be nice to have sort of comparisons.
I have to resort to like Reddit posts or forum posts
kind of debating different audio recorders
and cabling and microphones and waterproof containers, all
that kind of stuff.
I would love to be able to do a rigorous comparison of them.
Of course, going to Amazon, you get the reviews and those are actually really, really solid.
So I think Amazon has been the giant gift to society in that way, that you can lay out all the different options and get a lot of structured analysis
of how good this thing is.
So Amazon's been great at that.
Now, what Policy Genius did is did the Amazon thing
but for insurance.
So the tools for comparison is really my favorite thing.
It's just really easy to understand.
The full marketplace of insurance.
With Policy Genius, you can find life insurance policies
that start at just $292 per year for $1 million of coverage.
Head to policygenius.com slash Lex,
or click the link in the description
to get your free life insurance quotes
and see how much you can save.
That's policygenius.com slash Lex.
This episode is also brought to you by Shopify,
a platform designed for anyone to sell anywhere
with a great looking online store.
I'm not name dropping here,
but I recently went on a hike with the CEO of Shopify,
Toby, he's brilliant.
I've been a fan of his for a long time,
long before Shopify was a sponsor.
I don't even know if he knows
that Shopify sponsors this podcast.
Now, just to clarify, it really doesn't matter.
Nobody in this world can put pressure on me
to have a sponsor or not to have a sponsor,
or for a sponsor to put pressure on me
when I can and can't say.
I, when I wake up in the morning, feel completely free to say what I want to say and to think what
I want to think.
I've been very fortunate in that way in many dimensions in my life.
And I also have always lived a frugal life and a life of discipline, which is where the
freedom of speech and the freedom of thought truly come from.
So I don't need anybody.
I don't need a boss, I don't need money.
I'm free to exist in this world in the way I see is right.
Now, on top of that, of course,
I'm surrounded by incredible people,
many of whom I disagree with and have arguments.
So I'm influenced by those conversations
and those arguments that I'm always learning,
always challenging myself, always humbling myself.
I have kind of intellectual humility.
I kind of suspect I'm kind of an idiot.
I start my approach to the world of ideas from that place,
assuming I'm an idiot and everybody has a lesson
to teach me.
Anyway, not sure why I went off on that tangent,
but the hike was beautiful.
Nature, friends, is beautiful.
Anyway, I have a Shopify store, lexfridman.com slash store.
It's very minimal, which is how I like, I think,
most things.
If you wanna set up a store, it's super easy.
Takes a few minutes,
even I figured out how to do it.
Sign up for a $1 per month trial period
at Shopify.com slash Lex, that's all lowercase.
Go to Shopify.com slash Lex to take your business
to the next level today.
This episode is also brought to you by Eight Sleep
and its Pod 3 Cover.
The source of my escape.
The door, when opened, allows me to travel away
from the troubles of the world
into this ethereal universe of calmness,
a cold bed surface with a warm blanket,
a perfect 20 minute nap, and it doesn't matter
how dark the place my mind is in,
a nap will pull me out and I see the beauty
of the world again.
Technologically speaking, Eight Sleep is just really cool.
You can control temperature with an app.
It's become such an integral part of my life
that I've begun to take it for granted.
Typical human.
So the app controls the temperature.
I set it, currently I'm setting it to a negative five.
Then it's just super nice, cool surface.
It's something I really look forward to,
especially when I'm traveling.
I don't have one of those.
It really makes me feel like home.
Check it out and get special savings
when you go to eightsleep.com slash Lex.
This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description.
And now, dear friends, here's Edward Gibson. When did you first become fascinated with human language?
As a kid in school, when we had to structure sentences in English grammar, I found that
process interesting. I found it
confusing as to what it was I was told to do. I didn't understand what
the theory was behind it, but I found it very interesting. So when you look at
grammar you're almost thinking about like a puzzle, like almost like a
mathematical puzzle. Yeah I think that's right. I didn't know I was gonna work on
this at all at that point. I was really just, I was kind of a math geek person, computer scientist.
I really liked computer science.
And then I found language as a neat puzzle to work on from an engineering perspective,
actually.
I sort of accidentally, I decided after I finished my undergraduate degree, which was
computer science and math, in Canada at Queen's University, I decided to go to grad school.
That's what I always thought I would do.
I went to Cambridge where they had a master's program in computational linguistics.
I hadn't taken a single language class before.
All I'd taken was CS, computer science, math classes, pretty much,
mostly, as an undergrad. And I just thought this was an interesting thing to do for a
year because it was a single year program. And then I ended up spending my whole life
doing it.
So fundamentally, your journey through life was one of a mathematician and a computer
scientist. And then you kind of discovered the puzzle, the problem of language and approached it from that angle
to try to understand it from that angle,
almost like a mathematician or maybe even an engineer.
As an engineer, I'd say, I mean, to be frank,
I had taken an AI class, I guess it was '83 or '84,
somewhere in there, a long time ago,
and there was a natural language section in there
and it didn't impress me.
I thought there must be more interesting things we can do.
It didn't seem very, it seemed just a bunch of hacks to me.
It didn't seem like a real theory of things in any way.
So I just thought this seemed like an interesting area
where there wasn't enough good work.
Did you ever come across the philosophy angle of logic?
So if you think about the 80s with AI, the expert systems where you try to
kind of maybe sidestep the poetry of language and some of the syntax
and the grammar and all that kind of stuff and go to the underlying meaning
that language is trying to communicate and try to somehow compress that
in a computer representable way.
Do you ever come across that in your studies?
I mean, I probably did, but I wasn't as interested in it.
I was trying to do the easier problems first,
the ones I thought maybe were handleable,
which seems like the syntax is easier,
like which is just the forms as opposed to the meaning.
Like you're talking about,
when you start talking
about the meaning, that's a very hard problem
and it still is a really, really hard problem.
But the forms is easier.
And so I thought at least figuring out the forms
of human language, which sounds really hard
but is actually maybe more tractable.
So it's interesting.
You think there is a big divide, there's a gap,
there's a distance between form and meaning.
Because that's a question you have discussed a lot
with LLMs because they're damn good at form.
Yeah, I think that's what they're good at, is form.
Yeah.
Exactly, and that's why they're good
because they can do form, meaning's hard.
Do you think there's, oh wow,
and I mean it's an open question, right?
How close form and meaning are. We'll discuss it, but to me, studying form,
maybe it's a romantic notion, gives you,
form is like the shadow of the bigger meaning thing
underlying language.
Form is, language is how we communicate ideas.
We communicate with each other using language.
So in understanding the structure of that communication,
I think you start to understand the structure of thought
and the structure of meaning behind those thoughts
and communication to me, but to you, big gap.
Yeah.
What do you find most beautiful about human language?
Maybe the form of human language,
the expression of human language.
What I find beautiful about human language
is some of the generalizations
that happen across the human language,
just within and across a language.
So let me give you an example of something
which I find kind of remarkable.
That is if like a language,
if it has a word order such that the verbs
tend to come before their objects.
And so that's like English does that.
So we have the first, the subject comes first
in a simple sentence.
So I say, you know, the dog chased the cat
or Mary kicked the ball.
So the subject's first, and then after the subject,
there's the verb, and then we have objects.
All these things come after in English.
So it's generally a verb, and most of the stuff
that we wanna say comes after the subject,
it's the objects, there's a lot of things
we wanna say they come after.
And there's a lot of languages like that,
about 40% of the languages of the world look like that.
They're subject, verb, object languages.
And then these languages tend to have prepositions,
these little markers on the nouns that connect nouns to other nouns or nouns to verbs. So
a preposition like in or on or of or about: I say I talk about something, and the something is the object of that preposition.
These little markers also come, just like verbs, before their nouns. Okay, and then
so now we look at other languages, like Japanese or Hindi. These are so-called
verb-final languages. Those are about maybe a little more than 40 percent, maybe 45 percent of
the world's languages, or more. I mean, 50% of the world's languages are verb final. Those tend to be postpositions. Those
markers, they have the same kinds of markers as we do in English, but they put them after.
So sorry, they put them first, the markers come first. So you say, instead of, you know, talk about a book,
you say a book about, the opposite order there,
in Japanese or in Hindi, you do the opposite.
And the talk comes at the end.
So the verb will come at the end as well.
So instead of Mary kicked the ball, it's Mary ball kicked.
And then if it says Mary kicked the ball to John,
it's John to, the to, the marker there,
the preposition, it's a postposition in these languages.
And so the interesting thing, a fascinating thing to me
is that within a language, this order aligns, it's harmonic.
And so if it's one or the other,
it's either verb initial or verb
final, but then you'll have prepositions, prepositions or postpositions. And that's
across the languages that we can look at. We've got around a thousand languages. There's
around 7,000 languages on the earth right now. But we have information about, say, word
order on around 1,000 of those,
pretty decent amount of information,
and for those 1,000 which we know about,
about 95% fit that pattern.
So they will have either verb,
it's about half and half, half of verb initial,
like English, and half of verb final, like Japanese.
So just to clarify, verb initial is subject, verb, object.
That's correct.
Verb final is still subject, object, verb.
That's correct, yeah, the subject is generally first.
That's so fascinating.
I ate an apple or I apple ate.
Yes.
Okay, and it's fascinating that there's a pretty even
division in the world amongst those, 40, 45%.
Yeah, it's pretty even.
And those two are the most common by far.
Those two word orders, the subject tends to be first.
There's so many interesting things,
but these things are,
the thing I find so fascinating
is there are these generalizations
within and across a language.
And not only those are the,
and there's actually a simple explanation, I think,
for a lot of that.
And that is, you're trying to like, minimize dependencies
between words. That's basically the story, I think, behind a lot of why word order looks the way it is,
is you, we're always connecting. What is it, what is the thing I'm telling you? I'm talking to you
in sentences, you're talking to me in sentences. These are sequences of words which are connected.
And the connections are dependencies between the words.
And it turns out that what we're trying to do
in a language is actually minimize those dependency links.
It's easier for me to say things
if the words that are connecting for their meaning
are close together.
It's easier for you in understanding if that's also true.
If they're far away, it's hard to produce that
and it's hard for you to understand.
And the languages of the world,
within a language and across languages,
fit that generalization.
So it turns out that having verbs initial
and then having prepositions
ends up making dependencies shorter.
And having verbs final and having postpositions ends up making dependencies shorter. And having verbs final and having post positions
ends up making dependencies shorter than if you cross them. If you cross them, it ends
up, you just end up, it's possible. You can do it.
You mean within a language.
Within a language, you can do it. It just ends up with longer dependencies than if you
didn't. And so, languages tend to go that way. They tend to, they call it harmonic.
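For readers who think in code, here is a rough sketch of the dependency-length idea being described (a toy illustration only; the example sentence, the made-up "crossed" order, and the head-dependent attachment choices are mine, not from Ted's work):

```python
# Toy sketch of dependency length minimization (illustrative only).
# A dependency links a head word to a dependent word; its length is the
# distance between their positions in the sentence.

def total_dependency_length(links):
    """Sum of distances between linked word positions."""
    return sum(abs(head - dep) for head, dep in links)

# "Mary kicked the ball to John" -> positions 0..5
# links (head, dependent): kicked->Mary, kicked->ball, ball->the,
#                          kicked->to, to->John
harmonic = [(1, 0), (1, 3), (3, 2), (1, 4), (4, 5)]

# A made-up "crossed" order, verb-initial but with a postposition:
# "kicked Mary the ball John to" -> the marker ends up far from the verb.
crossed = [(0, 1), (0, 3), (3, 2), (0, 5), (5, 4)]

print(total_dependency_length(harmonic))  # 8
print(total_dependency_length(crossed))   # 11
```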
So it was observed a long time ago,
without the explanation, by a guy called Joseph Greenberg,
who's a famous typologist from Stanford.
He observed a lot of generalizations
about how word order works.
And these are some of the harmonic generalizations
that he observed.
Harmonic generalizations about word order.
There's so many things I want to ask you.
Okay, let me just, sometimes basics.
You mentioned dependencies a few times.
What do you mean by dependencies?
Well, what I mean is in language,
there's kind of three structures to,
three components to the structure of language.
One is the sounds.
So cat is k-a-t in English.
I'm not talking about that part.
I'm talking about,
then there's two meaning parts and those are the words. And you were talking about meaning earlier.
So words have a form and they have a meaning associated with them. And so cat is a full form
in English and it has a meaning associated with whatever a cat is. And then the combinations of
words, that's what I'll call grammar or syntax. And that's like when I have a combination like
the cat or two cats, okay? So where I take two different words there and put them together and
I get a compositional meaning from putting those two different words together. And so that's the
syntax. And in any sentence or utterance, whatever I'm talking to you, you're talking to me, we have a bunch of words
and we're putting together in a sequence.
It turns out they are connected
so that every word is connected to just one other word
in that sentence.
And so you end up with what's called technically a tree.
It's a tree structure.
So there's a root of that utterance, of that sentence, and then there's a bunch of
dependents, like branches from that root, that go down to the words. The words are the leaves in this metaphor for a tree.
So a tree is also sort of a mathematical construct. Yeah, it's a graph-theoretical thing, graph theory.
Yeah. So it's fascinating that you can break down a sentence into a tree, and then every word
is hanging onto another, it's depending on it.
That's right, and everyone agrees on that,
so all linguists will agree with that.
No one, no one.
This is not a controversial thing.
That is not controversial.
There's nobody sitting here listening mad at you.
I don't think so.
Okay, there's no linguist sitting there mad at this.
No, I think in every language,
I think everyone agrees that all sentences
are trees at some level.
Can I pause on that?
Sure.
Because to me, just as a layman, it's surprising.
Yeah.
That you can break down sentences in all languages.
All languages, I think.
Into a tree.
I think so.
I've never heard of anyone disagreeing with that.
That's weird.
The details of the trees are what people disagree about.
Well, okay, so what's the root of a tree?
How do you construct?
How hard is it?
What is the process of constructing a tree from a sentence?
Well, this is where, you know,
depending on what your, there's different theoretical notions.
I'm gonna say the simplest thing, dependency grammar.
It's like a bunch of people invented this.
Tenier was the first French guy back in, I mean, the paper was published in 1959, but he was
working on the 30s and stuff. And it goes back to, you know, philologist Pinyini was doing this in
ancient India, okay? And so, you know, doing something like this, the simplest thing we
can think of is that there's just connections between the words to make the
utterance. And so let's just say I have like two dogs entered a room. Okay, here's a sentence.
And so we're connecting two and dogs together. That's like there's some dependency between
those words to make some bigger meaning. And then we're connecting dogs now to entered,
right? And we connect a room somehow to entered.
And so I'm gonna connect to room
and then room back to entered.
That's the tree.
The root is entered.
That's the thing is like an entering event.
That's what we're saying here.
And the subject, which is whatever that dog is,
is two dogs it was.
And the connection goes back to dogs,
which goes back to, then that goes back to two.
I'm just, that's my tree.
It starts at entered, goes to dogs, down to two, and then the other side, after the verb,
the object, it goes to room, and then that goes back to the determiner or article, whatever
you want to call that word.
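As a small illustration of the tree just described (a toy representation added for clarity, not any standard parser's output), each word points to the one word it depends on, with the root pointing to nothing:

```python
# "Two dogs entered a room" as a dependency tree:
# each word hangs off exactly one other word; "entered" is the root.
sentence = ["two", "dogs", "entered", "a", "room"]

# head[i] = index of the word that word i depends on (None marks the root)
head = {
    0: 1,     # "two"  -> "dogs"
    1: 2,     # "dogs" -> "entered"
    2: None,  # "entered" is the root
    3: 4,     # "a"    -> "room"
    4: 2,     # "room" -> "entered"
}

for i, word in enumerate(sentence):
    parent = "ROOT" if head[i] is None else sentence[head[i]]
    print(f"{word:8} -> {parent}")
```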
So there's a bunch of categories of words here we're noticing.
So there are verbs, those are these things that typically mark,
they refer to events and states in the world. And there are nouns which typically refer
to people, places, and things is what people say, but they can refer to other more, I think
you've heard of events themselves as well. They're marked by, you know, how they, how
they, the category, the part of speech of a word is how it gets used in language.
That's how you decide what the category of a word is.
Not by the meaning, but how it gets used.
How it's used.
What's usually the root?
Is it gonna be the verb that defines the event?
Usually, yes, yes.
Okay.
I mean, if I don't say a verb, then there won't be a verb,
and so it'll be something else.
What if you're messing, are we talking about language
that's like correct language? What if you're doing poetry and messing with stuff?
What if you're doing poetry and messing with stuff?
Is it, then rules go out the window, right?
Then it's, you're still constrained.
No, no, no, no, no.
You're constrained by whatever language you're dealing with.
Probably you have other constraints in poetry,
such that you're, like usually in poetry,
there's multiple constraints that you want to,
like you want to usually convey multiple meanings
is the idea.
And maybe you have like a rhythm or a rhyming structure as well.
And depending on, so, but you usually are constrained by your, the rules of your language
for the most part.
And so you don't violate those too much.
You can violate them somewhat, but not too much.
So it has to be recognizable as your language.
Like in English, I can't say dogs two entered room a.
I mean, I meant that, you know, two dogs entered a room, and I can't mess with the order of the
articles, the articles and the nouns. You just can't do that. In some languages, you can mess
around with the order of words much more. I mean, you speak Russian. Russian has a much freer word order than English.
And so in fact, you can move around words in,
I told you that English has the subject,
verb, object, word order, so does Russian,
but Russian is much freer than English.
And so you can actually mess around with the word order.
So probably Russian poetry is gonna be quite different
from English poetry because the word order
is much less constrained.
Yeah, there's a much more extensive culture of poetry
throughout the history of the last hundred years in Russia.
And I always wondered why that is,
but it seems that there's more flexibility
in the way the language is used.
There's more, you're morphing the language easier
by altering the words,
altering the order of the words, messing with it.
Well, you can just mess with different things in each language.
And so in Russian, you have case markers, right, on the end, which are these endings
on the nouns, which tell you how it connects, each noun connects to the verb, right?
We don't have that in English.
And so when I say, um, Mary kissed John, I don't know who the agent or the patient is,
except by the order of the
words, right?
In Russian, you actually have a marker on the end if you're using a Russian name and
each of those names, you'll also say, is it, you know, agent, it'll be the, you know, a
nominative, which is marking the subject or an accusative will mark the object.
And you could put them in the reverse order.
You could put accusative first, you could put subject, you could put the patient first,
and then the verb, and then the subject, and that would be a perfectly good Russian sentence. And
it would still mean, I could say John kissed Mary, meaning Mary kissed John, as long as I use the
case markers in the right way. You can't do that in English. And so-
I love the terminology of agent and patient, and the other ones you used.
Those are sort of linguistic terms, correct?
Those are, those are for like kind of meaning.
Those are meaning, and subject and object
are generally used for position.
So subject is just like the thing that comes before the verb,
and the object is the one that comes after the verb.
The agent is kind of like the thing doing,
that's kind of what that means, right?
The subject is often the person doing the action, right?
The thing, so yeah.
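A rough way to picture the case-marking point in code (a toy sketch; the "-nom" and "-acc" suffixes are invented stand-ins, not actual Russian morphology): the markers, not the word order, decide who did what to whom.

```python
# Toy illustration: invented case suffixes ("-nom", "-acc") stand in for
# case marking; word order no longer decides who did what to whom.
def interpret(words):
    roles = {}
    for w in words:
        if w.endswith("-nom"):
            roles["agent"] = w[:-4]     # nominative marks the agent/subject
        elif w.endswith("-acc"):
            roles["patient"] = w[:-4]   # accusative marks the patient/object
        else:
            roles["verb"] = w
    return roles

print(interpret(["Mary-nom", "John-acc", "kissed"]))
print(interpret(["John-acc", "kissed", "Mary-nom"]))  # same meaning, different order
```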
Okay, this is fascinating.
So how hard is it to form a tree in general?
Is there a procedure to it?
Like if you look at different languages,
is it supposed to be a very natural,
like is it automatable or is there some human genius
involved in it?
I think it's pretty automatable at this point.
People can figure out what the words are.
They can figure out the morphemes, which are the, technically morphemes are the minimal
meaning units within a language.
And so when you say eats or drinks, it actually has two morphemes in English.
There's the root, which is the verb, and then there's some ending on it, which tells you
that's the third person singular.
Can you say what morphemes are?
Morphemes are just the minimal meaning units
within a language.
And a word is just kind of the things we put spaces
between in English and they have a little bit more,
they have the morphology as well.
They have the endings, this inflectional morphology
on the endings on the roots.
They modify something about the word
that adds additional meaning.
They tell you, yeah, yeah.
And so we have a little bit of that in English,
very little, you have much more in Russian, for instance.
And, but we have a little bit in English. And so we have a little on the nouns,
you can say it's either singular or plural. And you can say, same thing for verbs, like simple
past tense, for example. So, you know, notice in English, we say drinks, you know, he drinks,
but everyone else says, I drink, you drink, we drink. It's unmarked in a way. And then,
but in the past tense, it's just drank. For everyone, there's no morphology
at all for past tense. If there is morphology, it's marking past tense, but it's kind of
an irregular now. So we don't even, you know, drink to drank, you know, it's not even a
regular word. So in most verbs, many verbs, there's an ed we kind of add. So walk to walked,
we add that to say it's the past tense. I just happened to choose an irregular because it's a high-frequency word, and the high-frequency words tend to have
irregulars in English for... What's an irregular?
Irregular is just there isn't a rule. So drink to drank is an irregular.
Drink drank, okay, versus walked. As opposed to walk, walked, talked, talked.
And there's a lot of irregulars in English. The frequent ones, the common words, tend to be irregular.
There's many, many more low-frequency words,
and those are regular ones.
The evolution of the irregulars are fascinating.
It's essentially slang that's sticky,
because you're breaking the rules,
and then everybody uses it and doesn't follow the rules,
and they say, screw it to the rules.
It's fascinating.
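As a rough sketch of the regular/irregular distinction in code (a toy model; the verb lists are just a handful of illustrative examples): regulars follow a rule, irregulars are stored exceptions.

```python
# Toy model of English past tense: a general "-ed" rule plus stored
# exceptions for irregulars (only a handful of examples listed here).
IRREGULAR_PAST = {"drink": "drank", "sleep": "slept", "go": "went", "eat": "ate"}

def past_tense(verb):
    if verb in IRREGULAR_PAST:          # irregulars are memorized, not rule-built
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"               # "chase" -> "chased"
    return verb + "ed"                  # "walk" -> "walked"

for v in ["walk", "talk", "drink", "chase"]:
    print(v, "->", past_tense(v))
```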
So you said morphemes, lots of questions.
So morphology is what, the study of morphemes?
Morphology is the connections
between the morphemes onto the roots.
So in English, we mostly have suffixes,
we have endings on the words,
not very much, but a little bit,
and as opposed to prefixes,
some words, depending on your language,
can have mostly prefixes, mostly suffixes,
or mostly, or both.
And then even languages, several languages
have things called infixes,
where you have some kind of a general form for the root,
and you put stuff in the middle, you change the vowels.
Stuff like that.
That's fascinating.
That is fascinating. So in general, there's what, two morphemes per word?
One or two or three?
Well, in English, it's one or two.
In English, it tends to be one or two.
There can be more.
In other languages, a language like Finnish,
which has a very elaborate morphology,
there may be 10 morphemes on the end of a root.
Okay, and so there may be millions of forms
of a given word, okay?
Okay, I will ask the same question over and over,
but how does the, just sometimes to understand
things like morphemes, it's nice to just ask the question,
how do these kinds of things evolve?
So you have a great book studying sort of the,
how the cognitive processing,
how language is used for communication,
so the mathematical notion of how effective language is
for communication, what role that plays
in the evolution of language, but just high level,
like how does a language evolve
where English is two morphemes or one or two morphemes
per word and then Finnish has infinity per word?
So how does that happen?
Is it just people?
That's a really good question.
That's a very good question.
It's like why do languages have more morphology
versus less morphology?
And I don't think we know the answer to this.
I think there's just like a lot of good solutions
to the problem of communication.
So like, I believe as you hinted that language
is an invented system by humans for communicating their ideas.
And I think it comes down to we label the things
we wanna talk about, those are the morphemes and words,
those are the things we wanna talk about in the world
and we invent those things.
And then we put them together in ways that are easy for us
to convey, to process.
But that's like a naive view and I don't,
I mean, I think it's probably right, right?
It's naive and probably right.
Well, that's the thing.
I don't know if it's naive, I think it's simple.
Simple, yeah.
I think naive is an indication that's incorrect somehow.
It's a trivial, too simple.
I think it could very well be correct.
But it's interesting how sticky,
it feels like two people got together.
It just feels like once you figure out
certain aspects
of a language, they just become sticky
and the tribe forms around that language.
Maybe the language, maybe the tribe forms first
and then the language evolves.
And then you just kind of agree
and then you stick to whatever that is.
I mean, these are very interesting questions.
We don't know really about how words,
even words get invented very much about,
we don't really, I mean,
assuming they get invented, we don't really know how that process works and how these things evolve.
What we have is kind of a current picture, a current picture of a few thousand languages,
a few thousand instances. We don't have any pictures of really how these things are evolving really.
And then the evolution is massively confused by contact, right?
So as soon as one language group, one group runs into another, we are smart.
Humans are smart and they take on whatever is useful in the other group.
And so any kind of contrast which you're talking about,
which I find useful, I'm going to start using as well. So I worked a little bit in specific
areas of words, in number words and in color words. And color words, so we have in English,
we have around 11 words that everyone knows for colors. And many more if you happen to be interested
in color for some reason or other. If you're a fashion designer or an artist or something,
you may have many, many more words. But we can see millions. Like if you have normal
color vision, normal trichromatic color vision, you can see millions of distinctions in color.
So we don't have millions of words. The most efficient, no, the most detailed color vocabulary would have over a million terms to distinguish all
the different colors that we can see, but of course we don't have that. So it's somehow,
it's kind of useful for English to have evolved in some way to, there's 11 terms that people
find useful to talk about, you know, black, white, red, blue, green,
yellow, purple, gray, pink, and I probably missed something there. Anyway, there's 11
that everyone knows.
Yeah.
And depending on your, but you go to different cultures, especially the non-industrialized
cultures and there'll be many fewer. So some cultures will have only two, believe it or not. The Dani in Papua New Guinea
have only two labels that the group uses for color. Those are roughly black and white. They are very,
very dark and very, very light, which are roughly black and white. And you might think, oh,
they're dividing the whole color space into light and dark or something. And that's not really true.
They mostly just only label the black and the white things. They just don't talk about the colors for the other ones. And so,
and then there's other groups. I've worked with a group called the Tsimane' down in Bolivia in
South America, and they have three words that everyone knows, but there's a few others that
several people like, that many people know. And so, they have, kind of depending on how you count,
between three and seven words that the group knows, okay? And again, they're black and white,
everyone knows those. And red, red is, that tends to be the third word that everyone,
that cultures bring in, if there's a word, it's always red, the third one. And then after that,
it's kind of all bets are off about what they bring in. And so after that, they bring in a sort of a big blue-green
space, grue, they have one for that. And then they have, and then, you know, different people
have different words that they'll use for other parts of the space. And so anyway, it's probably
related to what they want to talk about, not what they see,
because they see the same colors as we see.
So it's not like they have a weak,
a low color palette in the things they're looking at.
They're looking at a lot of beautiful scenery, okay?
A lot of different colored flowers and berries and things.
And so there's lots of things of very bright colors,
but they just don't label the color in those cases. And the reason probably, we don't know this, but we think probably what's going on here is that what you do, why you label something,
is you need to talk to someone else about it. And why do I need to talk about a color?
Well, if I have two things which are identical and I want you to give me the one that's different
and the only way it varies is color,
then I invent a word which tells you,
this is the one I want.
So I want the red sweater off the rack,
not the green sweater, right?
There's two, and so those things will be identical
because these are things we made and they're dyed
and there's nothing different about them.
And so in industrialized society, we have,
everything we've got is pretty much arbitrarily colored,
but if you go to a non-industrialized group,
that's not true.
And so they don't, suddenly they're not interested in color.
If you bring bright colored things to them,
they like them just like we like them.
Bright colors are great, they're beautiful,
but they just don't need to, no
need to talk about them. They don't have.
So probably color words is a good example of how language evolves from sort of function
when you need to communicate the use of something.
I think so.
Then you kind of invent different variations and basically you can imagine that the evolution
of a language has to do with what the early tribes doing.
Like what kind of problems are facing them
and they're quickly figuring out
how to efficiently communicate the solution
to those problems, whether it's aesthetic or function,
all that kind of stuff, running away from a mammoth
or whatever.
But you know, I think what you're pointing to
is that we don't have data on the evolution of language
because many languages were formed a long time ago, so you don't get the chatter.
We have a little bit of like old English to modern English because there was a writing
system and we can see how old English looked.
So the word order changed for instance in old English to middle English to modern English
and so we could see things like that, but most languages
don't even have a writing system. Of the 7,000, only a small subset of those have a writing system.
Even if they have a writing system, it's not a very modern writing system. We just basically
have for Mandarin, for Chinese, we have a lot of evidence for long time and for English, and not for much else, not for German a little bit,
but not for a whole lot of long-term language evolution.
We don't have a lot.
We just have snapshots is what we've got
of current languages.
Yeah, you get an inkling of that
from the rapid communication on certain platforms
like on Reddit.
There's different communities,
and they'll come up with different slang.
Usually, from my perspective,
driven by a little bit of humor,
or maybe mockery or whatever,
it's just talking shit in different kinds of ways.
And you could see the evolution of language there,
because I think a lot of things on the internet,
you don't want to be the boring mainstream.
So you want to deviate from the proper way of talking.
And so you get a lot of deviation, like rapid deviation.
Then when communities collide, you get like,
just like you said, humans adapt to it.
And you can see it through the lens of humor.
I mean, it's very difficult to study,
but you can imagine like 100 years from now, well, if there's a new language born, for example, we'll get really high resolution
data. I mean, English is changing. English changes all the time. All languages change all the time.
So, you know, it's a famous result about the Queen's English. So, if you look at the Queen's
vowels, the Queen's English is supposed to be originally
the proper way for the talk was sort of defined by whoever the Queen talked or the King, whoever
was in charge.
And so, if you look at how her vowels changed from when she first became Queen in 1952 or
1953 when she was coronated, the first, I mean, that's Queen Elizabeth who died recently,
of course, until 50 years later,
her vowels changed, her vowels shifted a lot. And so that, even in the sounds of British English,
in her, the way she was talking was changing, the vowels were changing slightly. So that's just,
in the sounds there's change. I don't know what's, I'm interested, we're all interested in what's
driving any of these changes. The word order of English changed a lot over a thousand years, right?
So it used to look like German.
It used to be a verb final language with case marking and it shifted to a verb medial language.
A lot of contact, so a lot of contact with French and it became a verb medial language
with no case marking.
And so it became this verb initially thing.
So that's-
It's evolving.
It totally evolved.
And so it may very well, I mean,
it doesn't evolve maybe very much in 20 years
is maybe what you're talking about.
But over 50 and a hundred years,
things change a lot, I think.
We'll now have good data on it, which is great.
That's for sure.
Can you talk to what is syntax and what is grammar?
So you wrote a book on syntax.
I did.
You were asking me before about how do I figure out
what a dependency structure is.
I'd say the dependency structures aren't that hard.
Generally, I think it's a lot of agreement
of what they are for almost any sentence in most languages.
I think people will agree on a lot of that.
There are other parameters in the mix such that some people think there's
a more complicated grammar than just a dependency structure. And so, you know, like Noam Chomsky,
he's the most famous linguist ever. And he is famous for proposing a slightly more complicated
syntax. And so he invented phrase structure grammar. So he's well known for many, many
things, but in the 50s and early 60s, like the late 50s, he was basically figuring out
what's called formal language theory. So, and he figured out sort of a framework for
figuring out how complicated language, you know, a certain type of language might be,
so-called phrase structure grammars of language might be.
And so his idea was that maybe we can think
about the complexity of a language
by how complicated the rules are, okay?
And the rules will look like this.
They will have a left-hand side
and it'll have a right-hand side.
Look, something on the left-hand side
will expand to the thing on the right-hand side. So say we'll start a right hand side. Something on the left hand side will expand to the thing on the right hand side.
So say we'll start with an S,
which is like the root, which is a sentence, okay?
And then we're gonna expand to things
like a noun phrase and a verb phrase
is what he would say, for instance, okay?
An S goes to an NP and a VP
is a kind of a phrase structure rule.
And then we figure out what an NP is.
An NP is a determiner and a noun, for instance.
And a verb phrase is something else, is a verb and another noun phrase and another NP,
for instance. Those are the rules of a very simple phrase structure. Okay? And so he proposed
phrase structure grammar as a way to sort of cover human languages. And then he actually
figured out that, well, depending on the formalization of those grammars, you might get more complicated or less complicated languages. So he said,
well, there are these things called context-free languages, with that kind of rule. He thought human languages
tend to be what he calls context-free languages. But there are simpler languages, which are
so-called regular languages, and they have a more constrained form to the rules of the phrase structure of these particular rules.
He basically discovered and kind of invented ways to describe a language, and those are
the phrase structures of a human language.
He was mostly interested in English initially in his work in the 50s.
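As an illustration of the kind of phrase-structure rules being described (a tiny toy grammar made up for this example, not Chomsky's actual analysis of English), here is a context-free grammar that expands S into NP and VP and generates simple sentences:

```python
import random

# A tiny toy phrase structure grammar: each left-hand side expands to one
# of the right-hand sides; lowercase strings are words (terminals).
RULES = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["two"], ["a"]],
    "N":   [["dogs"], ["room"], ["cat"]],
    "V":   [["entered"], ["chased"]],
}

def generate(symbol="S"):
    if symbol not in RULES:            # a word: nothing left to expand
        return [symbol]
    expansion = random.choice(RULES[symbol])
    words = []
    for sym in expansion:
        words.extend(generate(sym))
    return words

print(" ".join(generate()))  # e.g. "two dogs entered a room"
```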
Quick questions around all this.
Formal language theory is the big field
of just studying language formally.
Yes, and it doesn't have to be human language there.
We can have computer languages, any kind of system
which is generating some set of expressions in a language.
And those could be like the statements in a computer language,
for example. It could be that or it could be human language.
So technically you can study programming languages.
Yes, and have been heavily studied using this formalism. There's a big field of
programming languages within formal language theory.
Okay. And then phrase structure grammar is this idea that you can break down language
into this SNP, VP type of thing.
It's a particular formalism for describing language. Okay. So, and Chomsky was the first
one. He's the one who figured that stuff out back in the fifties. And that's equivalent
actually. The context-free grammar is kind of equivalent in the sense
that it generates the same sentences
as a dependency grammar would.
The dependency grammar is a little simpler in some way.
You just have a root and it goes,
we don't have any of these,
the rules are implicit I guess,
and we just have connections between words.
The phrase structure grammar is kind of a different way
to think about the dependency grammar.
It's slightly more complicated,
but it's kind of the same in some ways.
So to clarify, dependency grammar is the framework
under which you see language and you make a case
that this is a good way to describe language.
That's correct.
And Noam Chomsky is watching this,
he's very upset right now, so let's,
just kidding, but what's the difference between,
where's the place of disagreement
between phrase structure grammar and dependency grammar?
They're very close.
So phrase structure grammar and dependency grammar
aren't that far apart.
I like dependency grammar because it's more perspicuous,
it's more transparent about representing the connections between the words.
It's just a little harder to see in phrase structure grammar.
The place where Chomsky sort of devolved or went off
from this is he also thought there was
something called movement, okay?
And so, and that's where we disagree, okay?
That's the place where I would say we disagree.
And I mean, maybe we'll get into that later, but the idea is, if you want to, do you want me to explain that?
I would love for you to explain movement. Okay, so you're saying so many interesting things.
Yeah, yeah. Okay, so here's the movement: Chomsky basically sees English and he says,
okay, I said, you know, we had that sentence earlier, like it was like two dogs entered
the room. It's changed a little bit, say two dogs will enter the room. And he notices that, hey, English, if I want to make a question,
a yes-no question from that same sentence, I say instead of two dogs will enter the room,
I say, will two dogs enter the room? Okay, there's a different way to say the same idea.
And it's like, well, the auxiliary verb, that will thing, it's at the front as opposed to
in the middle. Okay?
And so, and he looked, you know, if you look at English, you see that that's true for all
those modal verbs and for other kinds of auxiliary verbs in English.
You always do that.
You always put an auxiliary verb at the front.
And when he saw that, so, you know, if I say, I can win this bet, can I win this bet, right?
So I move a can to the front.
So actually, that's a theory.
I just gave you a theory there.
He talks about it as movement.
That word in the declaratives, the declarative is the root,
is the sort of default way to think about the sentence
and you move the auxiliary verb to the front.
That's a movement theory, okay?
And he just thought that was just so obvious
that it must be true,
that there's nothing more to say
about that, that this is how auxiliary verbs work in English.
There's a movement rule such that you're moving, like to get from the declarative to the interrogative,
you're moving the auxiliary to the front.
And it's a little more complicated as soon as you go to simple present and simple past
because if I say, you know, John slept, you have to say, did John
sleep, not slept John, right? And so you have to somehow get an auxiliary verb and I guess
underlyingly it's like slept is, it's a little more complicated than that, but that's his
idea there's a movement, okay? And so a different way to think about that, that isn't, I mean,
then he ended up showing later. So he proposed this theory of grammar, which has movement. There's other places where he thought there's movement, not just auxiliary verbs,
but things like the passive in English and things like questions, WH questions,
a bunch of places where he thought there's also movement going on. And in each one of those,
he thinks there's words, well, phrases and words are moving around from one structure to another,
which he called deep structure to surface structure. I mean, there's like two different structures
in his theory, okay? There's a different way to think about this, which is there's no movement
at all. There's a lexical copying rule such that the word will or the word can, these
auxiliary verbs, they just have two forms. And one of them is the declarative
and one of them is interrogative. And you basically have the declarative one and, oh,
I form the interrogative or I can form one from the other, it doesn't matter which direction
you go. And I just have a new entry, which has the same meaning, which has a slightly
different argument structure. Argument structure is just a fancy word for the ordering of the words. And so if I say,
the two dogs can or will enter the room, there's two forms of will. One is will declarative,
and then, okay, I've got my subject to the left, it comes before me, and the verb comes after me in that one. And then the will interrogative is like, oh, I go first.
Interrogative will is first,
and then I have the subject immediately after,
and then the verb after that.
And so you can just generate from one of those words
another word with a slightly different argument structure
with different ordering.
And these are just lexical copies.
And they're just.
They're not necessarily moving from one to another.
There's no movement.
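A rough way to picture the lexical-copying idea in code (my own toy sketch of the intuition, not Ivan Sag's or anyone's actual formalism): the auxiliary "will" simply has two stored entries with different word orders, rather than one entry plus a movement rule.

```python
# Toy sketch of "lexical copies": the auxiliary "will" has two stored
# entries with the same meaning but different argument orders, instead of
# one entry plus a movement rule.
LEXICON = {
    ("will", "declarative"):   ["SUBJECT", "will", "VERB_PHRASE"],
    ("will", "interrogative"): ["will", "SUBJECT", "VERB_PHRASE"],
}

def build(aux, mood, subject, verb_phrase):
    slots = {"SUBJECT": subject, "VERB_PHRASE": verb_phrase}
    return " ".join(slots.get(token, token) for token in LEXICON[(aux, mood)])

print(build("will", "declarative", "two dogs", "enter the room"))
# two dogs will enter the room
print(build("will", "interrogative", "two dogs", "enter the room"))
# will two dogs enter the room
```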
There's a romantic notion that you have like
one main way to use a word,
and then you could move it around.
Right, right.
Which is essentially what movement is implying.
Yeah, but that's the lexical copying is similar.
So then we do lexical copying for that same idea
that maybe the declarative is the source,
and then we can copy it.
And so an advantage,
there's multiple advantages of the lexical copying story. It's not my story. This is like Ivan Sag, linguists, a bunch of linguists
have been proposing these stories as well, you know, in tandem with the movement story. Okay,
you know, he's, Ivan Sag died a while ago, but he was one of the proponents of the non-movement, the lexical copying story. And so that is that a great advantage is, well, Chomsky really famously in 1971 showed
that the movement story leads to learnability problems.
It leads to problems for how language is learned.
It's really, really hard to figure out what the underlying structure of a language is
if you have both phrase structure and movement.
It's like really hard to figure out what came from what.
There's like a lot of possibilities there.
If you don't have that problem,
learning, the learning problem gets a lot easier.
Just say there's lexical copies.
Yeah, yeah.
When we say the learning problem,
do you mean like humans learning a new language?
Yeah, just learning English.
So a baby is lying around in the crib, listening to me talk.
How are they learning English?
Or maybe it's a two-year-old who's learning interrogatives and stuff.
How are they doing that?
Are they doing it from, are they figuring out?
So Chomsky said it's impossible to figure it out, actually.
He said it's actually impossible, not hard, but impossible.
And therefore, that's where universal grammar comes from, is that it has to be built in.
And so what they're learning is that there's some built-in movement that's built in in
his story, is absolutely part of your language module.
And then you're just setting parameters. You're set depending on English is just sort of
a variant of the universal grammar and you're figuring out which orders do those English do
these things. The non-movement story doesn't have this. It's like much more bottom-up. You're
learning rules. You're learning rules one by one, this word is connected to that word. Another
advantage, it's learnable, another advantage of it is that it predicts that not all auxiliaries
might move. It might depend on the word, depending on whether you, and that turns out to be true.
So there's words that don't really work as auxiliary, they work in declarative and not in interrogative.
So I can say, I'll give you the opposite first.
If I can say, aren't I invited to the party?
And that's an interrogative form,
but it's not from I aren't invited to the party.
There is no I aren't, right?
So that's interrogative only.
And then we also have forms like ought.
I ought to do this.
And I guess some British, old British people can say-
Ought I.
Exactly, it doesn't sound right, does it?
For me, it sounds ridiculous.
I don't even think ought is great,
but I mean, I totally recognize I ought to do.
I think it's not too bad actually.
I can say ought to do this.
That sounds pretty good.
Ought I, if I'm trying to sound sophisticated maybe.
I don't know, it just sounds completely out to me.
Ought I.
Anyway, so there are variants here.
And a lot of these words just work in one versus the other.
And that's fine under the lexical copying story.
It's like, well, you just learn the usage.
Whatever the usage is, is what you do with this word.
But it's a little bit harder in the movement story.
The movement story, that's an advantage,
I think, of lexical copying.
In all these different places,
there's all these usage variants which make
the movement story a little bit harder to work.
So one of the main divisions here is
the movement story versus the lexical copy story.
That has to do about the auxiliary words and so on.
But if you rewind to the phrase structure grammar
versus dependency grammar.
Those are equivalent in some sense
in that for any dependency grammar,
I can generate a phrase structure grammar
which generates exactly the same sentences.
I just like the dependency grammar formalism because it makes
something really salient, which is the length of dependencies between words, which isn't
so obvious in the phrase structure. In the phrase structure, it's just kind of hard to
see. It's in there. It's just very opaque.
Technically, I think phrase structure grammar is mappable to dependency grammar.
And vice versa. And vice versa. Yeah, there's these little labels, S and NP and VP. Yeah.
For a particular dependency grammar, you can make a phrase structure grammar which generates
exactly those same sentences and vice versa. But there are many phrase structure grammars
which you can't really make a dependency grammar. I mean, you can do a lot more in a phrase
structure grammar,
but you get many more of these extra nodes basically.
You can have more structure in there.
And some people like that, and maybe there's value to that.
I don't like it.
Well, for you, so we should clarify.
So dependency grammar is just, well,
one word depends on only one other word,
and you form these trees, and that makes, it really puts priority
on those dependencies just like as a tree
that you can then measure the distance
of the dependency from one word to the other.
They can then map to the cognitive processing
of the sentences, how easy it is to understand
and all that kind of stuff.
So it just puts the focus on just like the mathematical
distance of dependence between words.
So like it's just a different focus.
Absolutely.
Just continue on the thread of Chomsky
because it's really interesting.
Because as you're discussing disagreement,
to the degree there's disagreement,
you're also telling the history of the study of language which is really awesome. So you mentioned
context-free versus regular. Does that distinction come into play for
dependency grammars? No, not at all. I mean, regular languages are too
simple for human languages. They're a part of the hierarchy, but human languages, in the phrase structure world,
are definitely at least context-free,
maybe a little bit more, a little bit harder than that.
But, so there's something called context sensitive as well,
where you can have, like this is just the formal language
description, in a context free grammar, you have one, this is like a bunch of like
formal language theory we're doing here.
I love it.
Okay. So you have a left-hand side category and you're expanding it to anything on the right,
that's context-free. So the idea is that that category on the left expands,
independent of context, to those things, whatever they are on the right. It doesn't matter what. And context-sensitive says, okay,
I actually have more than one thing on the left.
I can tell you only in this context,
maybe you have like a left and a right context
or just a left context or a right context,
two or more things on the left
tell you how to expand those things in that way.
Okay, so it's context sensitive.
A regular language is just more
constrained. And so it doesn't allow just anything on the right. Basically, it's one very complicated
rule is kind of what a regular language is. And so it doesn't have any, I was going to say
long distance dependencies, it doesn't allow recursion, for instance. There's no recursion.
Yeah, recursion is where you,
which is human languages have recursion.
They have embedding and you can't, well,
it doesn't allow center embedded recursion,
which human languages have, which is what.
Center embedded recursion.
So within a sentence, within a sentence.
Yeah, within a sentence.
So here we're going to get to that.
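As a rough illustration of the rule shapes being contrasted here, a toy grammar, not a serious analysis of English: one recursive context-free rule is enough to produce the center-embedded sentences discussed below, which is exactly the kind of nesting a regular (finite-state) grammar has no memory for. A sketch in Python:

import random

# A toy context-free grammar, just to show the rule shapes being described:
# one category on the left, any string of categories and words on the right.
# The second NP rule is recursive (an NP inside an NP), which is what lets
# relative clauses nest.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "who", "NP", "VT"]],
    "N":  [["boy"], ["cat"], ["dog"]],
    "VP": [["cried"], ["ran", "away"]],
    "VT": [["scratched"], ["chased"]],
}

def generate(symbol="S", depth=0, max_depth=3):
    if symbol not in GRAMMAR:            # a plain word, nothing left to expand
        return [symbol]
    options = GRAMMAR[symbol]
    if depth >= max_depth:               # cut recursion off so generation terminates
        options = options[:1]
    out = []
    for piece in random.choice(options):
        out.extend(generate(piece, depth + 1, max_depth))
    return out

print(" ".join(generate()))              # e.g. "the boy who the cat scratched cried"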
But I know the formal language stuff is a little aside.
Chomsky wasn't proposing it for human languages even.
He was just pointing out that human languages are context-free. That was kind of stuff we did
for formal languages. What he was most interested in was human language. The movement is where he
sort of set off on the, I would say, a very interesting, but wrong foot. It was kind of interesting.
I agree, it's a very interesting history. So he proposed this multiple theories in 57
and then 65. They all have this framework though, was phrase structure plus movement.
Different versions of the phrase structure and the movement in the 57. These are the
most famous original bits of Chomsky's work. And then in 71 is when he figured out that those lead to learning problems,
that there's cases where a kid could never figure out
which set of rules was intended.
And then he said, well, that means it's innate.
It's kind of interesting.
He just really thought the movement
was just so obviously true that he couldn't,
he didn't even entertain giving it up.
It's just obvious that's
obviously right. And it was later where people figured out that there's all these subtle
ways in which things which look like generalizations aren't generalizations; they,
you know, across the category, they're word-specific, and they kind
of work, but they don't work across various other words in the category. And so it's easier to just think of these things as lexical copies. And I think he was very obsessed,
I don't know, I'm guessing, that he really wanted this story to be simple in some sense.
And language is a little more complicated in some sense. He didn't like words. He never talks about
words. He likes to talk about combinations of words. And words are, you know, look up a dictionary, there's 50 senses for a common word, right?
The word take will have 30 or 40 senses in it.
So there'll be many different senses for common words.
And he just doesn't think about that.
It doesn't think that's language.
I think he doesn't think that's language.
He thinks that words are distinct from combinations
of words. I think they're the same. If you look at my brain in the scanner while I'm
listening to a language I understand and you compare, I can localize my language network
in a few minutes, in like 15 minutes. And what you do is I listen to a language I know,
I listen to maybe some language I don't know, or I listen to muffled
speech or I read sentences, I read non-words.
I can do anything like this, anything that's sort of really like English and anything that's
not very like English.
So I've got something like it and not, and I've got control.
And the voxels, which is just the 3D pixels in my brain that are responding most, is a
language area.
And that's this left lateralized area in my head.
And wherever I look in that network,
if you look for the combinations versus the words,
it's everywhere.
It's the same.
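The logic of that localizer contrast can be sketched with simulated numbers. This is made-up data and a much cruder analysis than the real pipeline, just to show the shape of the computation: compare each voxel's response to sentences against a control condition and keep the voxels that respond reliably more to sentences.

import numpy as np

# Simulated per-voxel responses to sentences versus a control condition
# (nonwords, backwards speech), then a simple per-voxel t-statistic.
rng = np.random.default_rng(0)
n_voxels, n_trials = 1000, 40

control  = rng.normal(0.0, 1.0, size=(n_trials, n_voxels))
language = rng.normal(0.0, 1.0, size=(n_trials, n_voxels))
language[:, :100] += 1.5            # pretend the first 100 voxels prefer sentences

diff = language.mean(axis=0) - control.mean(axis=0)
se = np.sqrt(language.var(axis=0, ddof=1) / n_trials + control.var(axis=0, ddof=1) / n_trials)
t = diff / se                       # two-sample t-statistic, one per voxel

print(int((t > 3.0).sum()), "voxels respond reliably more to sentences than to the control")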
That's fascinating.
And so it's like hard to find,
there are no areas that we know.
I mean, that's, it's a little
overstated right now.
At this point, the technology isn't great.
It's not bad, but we have the best way to figure out what's going on in my brain when
I'm listening or reading language is to use fMRI, functional magnetic resonance imaging.
And that's a very good localization method.
So I can figure out where exactly these signals are coming from pretty well,
you know, down to millimeters, cubic millimeters or smaller. Okay, very small, we
can figure those out very well. The problem is the when. It's measuring oxygen, and
oxygen takes a little while to get to those cells, so it takes on the order of seconds. So
I talk fast I probably listen fast and I can probably understand things
really fast. So a lot of stuff happens in two seconds. And so to say that we know what's going
on, that the words right now in that network, our best guess is that whole network is doing
something similar, but maybe different parts of that network are doing different things. And
that's probably the case. We just don't have very good methods to figure that out,
right, at this moment.
And so.
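To picture why the "when" is the hard part, here is a toy illustration, with made-up numbers rather than a real hemodynamic model, of how a slow, seconds-long response smears together word-level events that are only fractions of a second apart:

import numpy as np

# The scanner sees neural events only after they are blurred by a slow response.
dt = 0.1                                   # seconds per sample
t = np.arange(0, 20, dt)
hrf = (t ** 5) * np.exp(-t)                # crude gamma-shaped response, peaks around 5 s
hrf /= hrf.sum()

events = np.zeros_like(t)
events[[10, 15, 20]] = 1.0                 # three "word" events at 1.0 s, 1.5 s, 2.0 s

bold = np.convolve(events, hrf)[:len(t)]   # roughly what the scanner measures
print("measured response peaks near", round(float(t[np.argmax(bold)]), 1),
      "seconds, long after the three closely spaced events")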
Since we're kind of talking about the history
of the study of language,
what other interesting disagreements,
and you're both at MIT, or were for a long time,
what kind of interesting disagreements there,
tension of ideas are there,
between you and Noam Chomsky?
And we should say that Noam was in the linguistics
department and you're, I guess for a time
were affiliated there, but primarily brain
and cognitive science department.
It's just another way of studying language
and you've been talking about fMRI.
So like what, is there something else interesting
to bring to the surface about the disagreement between the two of you or other people in the discussion?
Yeah, I mean, I've been at MIT for 31 years since 1993 and Chomsky's been there much longer.
So I met him, I knew him, I met when I first got there, I guess, and we would interact every now and then. I'd say that, so I'd say our biggest difference is our methods.
And so that's the biggest difference between me and Noam is that I gather data from people.
I do experiments with people and I gather corpus data, whatever corpus data is available,
and we do quantitative methods to evaluate
any kind of hypothesis we have. He just doesn't do that. So he has never once been
associated with any experiment or corpus work ever. And so it's all thought experiments. It's
his own intuitions. So I just don't think that's the way to do things. That's a, you know, across the street,
they're across the street from us,
kind of difference between brain and cog-sci and linguistics.
I mean, not all, some of the linguists,
depending on what you do, more speech oriented,
they do more quantitative stuff,
but in the meaning, words and well,
it's combinations of words, syntax, semantics,
they tend not
to do experiments and corpus analyses.
So I know linguistic science probably, well, but the method is a symptom of a bigger approach,
which is sort of a psychology, philosophy side on Noam.
And for you, it's more sort of data driven, sort of almost like mathematical approach.
Yeah, I mean, I'm a psychologist.
So I would say we're in psychology.
You know, I'm in brain and cognitive sciences
is MIT's old psychology department.
It was a psychology department up until 1985,
and that became the brain and cognitive science department.
And so I mean, my training is in psychology.
I mean, my training is math and computer science,
but I'm a psychologist.
I mean, I don't know what I am.
So data-driven psychologist.
Yeah, yeah, yeah.
You are.
I know what I am, but I'm happy to be called a linguist,
I'm happy to be called a computer scientist,
I'm happy to be called a psychologist,
any of those things.
But in the actual, like how that manifests itself
outside of the methodology is like these differences,
these subtle differences about the movement story
versus the lexical copy story.
Yeah, those are theories, right?
So the theories are, but I think the reason we differ
in part is because of how we evaluate the theories.
And so I evaluate theories quantitatively
and Noam doesn't.
Got it.
Okay, well let's explore the theories
that you explore in your book.
Let's return to this dependency grammar framework
of looking at language.
What's a good justification why the dependency
grammar framework is a good way to explain language?
What's your intuition?
So the reason I like dependency grammar,
as I've said before, is that it's very transparent
about its representation of distance between words.
So it's like, all it is is you've got a bunch of words, you're connecting together to make a sentence.
And a really neat insight, which turns out to be true, is that the further apart the pair of words are that you're connecting,
the harder it is to do the production, the harder it is to do the comprehension.
It's as harder to produce, it's hard to understand when the words are far apart.
When they're close together, it's easy to produce and it's easy to comprehend. Let me give you an
example, okay? So we have, in any language, we have mostly local connections between words,
but they're abstract. The connections are abstract. They're between categories of words. And so you can always make things further apart if you put your, if you add modification, for example,
after a noun. So a noun in English comes before a verb. The subject noun comes before a verb.
And then there's an object after, for example. So I can say what I said before, you know,
the dog entered the room or something like that.
So I can modify dog.
If I say something more about dog after it,
then what I'm doing is indirectly,
I'm lengthening the dependence between dog and entered
by adding more stuff to it.
So I just make it explicit here.
If I say,
the boy who the cat scratched cried.
We're gonna have a mean cat here.
And so what I've got here is the boy cried
would be a very short, simple sentence.
And I just told you something about the boy
and I told you it was the boy who the cat scratched, okay?
So the cry is connected to the boy.
The cry at the end is connected to the boy in the beginning. Right, and so I can do that. I can say that that's a perfectly fine English sentence
and I can say the cat which the dog chased ran away or something. Okay. I can do that.
But it's really hard now. I've got whatever I have here. I have the boy who the cat, now let's say I try to
modify cat, okay, the boy who the cat which the dog chased scratched ran away. Oh my god, that's
hard, right? I can, I'm sort of just working that through in my head how to produce and how to,
and it's really very just horrendous to understand. It's not so bad, at least I've got intonation
there to sort of mark the boundaries and stuff, but it's,
that's really complicated.
That's sort of English in a way.
I mean, that follows the rules of English.
But so what's interesting about that is, is that what I'm doing is nesting dependencies
here.
I'm putting one, I've got a subject connected to a verb there, and then I'm modifying that
with a clause, another clause, which happens to have a subject and a verb relation.
I'm trying to do that again on the second one.
And what that does is it lengthens out the dependence,
multiple dependence actually get lengthened out there.
The dependencies get longer, on the outside ones get long,
and even the ones in between get kind of long.
And you just, so what's fascinating is that that's bad.
That's really horrendous in English,
but that's horrendous in any language.
And so in no matter what language you look at,
if you do just figure out some structure
where I'm gonna have some modification following some head,
which is connected some later head,
and I do it again, it won't be good, it guaranteed.
Like 100%, that will be uninterpretable in that language
in the same way that was uninterpretable in English.
Just to clarify, the distance of the dependencies is, whenever the boy cried,
there's a dependence between two words, and then you're counting the number
of what, morphemes, between them?
That's a good question.
I just say words, but whether it's words or morphemes in between, we don't know that.
Actually, that's a very good question.
What is the distance metric?
But let's just say it's words, sure.
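Just to make the measurement concrete, here is a minimal sketch, with heads assigned by hand and distance counted in words, since the metric is left open above:

# heads[i] is the index of word i's head, or None for the root
def total_dependency_length(heads):
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

# "the boy cried": the -> boy, boy -> cried, cried is the root
short_heads = [1, 2, None]

# "the boy who the cat scratched cried": the relative clause stretches the boy-cried link
# words:      the  boy  who  the  cat  scratched  cried
long_heads = [1,   6,   5,   4,   5,   1,         None]

print(total_dependency_length(short_heads))   # 2
print(total_dependency_length(long_heads))    # 15, with "boy ... cried" alone spanning 5 words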
So, and you're saying the longer the distance
to that dependence, the more, no matter the language,
except legalese.
Even legalese.
Even legalese, okay, we'll talk about it.
We'll talk about it.
We'll get to that.
Okay, okay, okay.
But that, the people will be very upset
that speak that language.
Not upset, but they'll either not understand it,
they'll be like, this is,
their brain will be working in overtime.
They will have a hard time either producing
or comprehending it.
They might tell you that's not their language.
It's sort of the language.
I mean, it's following,
like they'll agree with each of those pieces
as part of their language,
but somehow that combination will be very, very difficult
to produce and understand.
Is that a chicken or the egg issue here?
So like, is...
Well, I'm giving you an explanation.
Right.
So the, I mean, I'm giving you two kinds of explanations.
I'm telling you that center embedding, that's nesting,
those are the same, those are synonyms
for the same concept here.
And the explanation for why, those are always hard,
center embedding and nesting are always hard. And I gave you an explanation for why they might be
hard, which is long distance connections. When you do center embedding, when you do nesting,
you always have long distance connections between the dependents. So that's not necessarily the
right explanation. I can go through reasons why that's probably a good explanation. And it's not
really just about one of them.
So probably it's a pair of them or something of these dependents that you get long that
drives you to be really confused in that case.
And so the behavioral consequence there, this is kind of methods, like how do we get at
this?
You could try to do experiments to get people to produce these things.
They're going to have a hard time producing them.
You can try to do experiments to get them to understand them and see how well they understand
them, can they understand them.
Another method you can do is give people partial materials and ask them to complete them, those
center embedded materials, and they'll fail.
So I've done that.
I've done all these kinds of things.
So, wait I mean, so,
so center embedding meaning,
like you take a normal sentence,
like the boy cried and they inject a bunch of crap
in the middle that separates the boy and the cried.
Okay, that's center embedding,
and nesting is on top of that.
No, nesting is the same thing.
Center embedding, those are totally equivalent terms.
I'm sorry, I sometimes use one and sometimes use two.
Ah, got it, got it.
They don't mean anything different.
Got it.
And then what you're saying is there's a bunch
of different kinds of experiments you can do.
I mean, I like the understanding one is like,
have more embedding, more center embedding,
is it easier or harder to understand,
but then you have to measure the level of understanding,
I guess.
Yeah, you could.
I mean, there's multiple ways to do that.
I mean, the simplest way is
just to ask people how good does it sound, how natural does it sound. That's a very blunt but very good measure.
It's very, very reliable. People will do the same thing. And so it's like, I don't know what it
means exactly, but it's doing something such that we're measuring something about the confusion,
the difficulty associated with those. And those like, those are giving you a signal.
That's why you can say them. Okay, what about the completion of the center-embedded bit?
So if you give them a partial sentence,
say I say the book which the author who,
and I ask you to now finish that off for me.
I mean, either say it, yeah, yeah,
but you can just say it's written in front of you
and you can just type in, have as much time as you want.
They will, even though that one's not too hard, right?
So if I say it's like the book, it's like, oh, the book which the author who I met wrote was good, you know,
that's a very simple completion for that. If I give that completion online somewhere to a, you know,
a crowdsourcing platform and ask people to complete that, they will miss off a verb
very regularly, like half the time,
maybe two thirds of the time. They'll just leave off one of those verb phrases. Even
with that simple, so let's say the book, which the author, who, and they'll say, was, they
need three verbs, right? I need three verbs here. Who I met wrote was good, and they'll
give me two. They'll say who was famous was
good or something like that. They'll just give me two and that'll happen about 60% of
the time. So 40%, maybe 30, they'll do it correctly, meaning they'll do it with three
verb phrases. I don't know what's correct or not. This is hard. It's a hard task.
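A crude way to see the bookkeeping the completer is being asked to do: the main clause needs a verb, and every relative pronoun opens another clause that still needs one. This is a toy counting rule with toy word lists, nothing like a real parser.

RELATIVIZERS = {"who", "whom", "which", "that"}
VERBS = {"met", "wrote", "was", "cried", "scratched"}   # just the verbs used in these examples

def verbs_still_needed(prefix):
    words = prefix.lower().split()
    clauses_opened = 1 + sum(w in RELATIVIZERS for w in words)
    verbs_supplied = sum(w in VERBS for w in words)
    return clauses_opened - verbs_supplied

print(verbs_still_needed("the book which the author who"))              # 3
print(verbs_still_needed("the book which the author who I met wrote"))  # 1: "was good" is still owed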
Yeah, I'm actually struggling with it in my head.
Well, it's easier when you stare at it.
If you look at it, it's a little easier; listening is pretty tough.
Because there's no trace of it. You have to remember the words that I'm saying,
which is very hard auditorily. We wouldn't do it this way. You do it written. You can look at it
and figure it out. It's easier in many dimensions in some ways, depending on the person. It's easier
to gather written data. I mean, most sort of,
I work in psycholinguistics, right? Psychology of language and stuff. And so a lot of our work is
based on written stuff because it's so easy to gather data from people doing written kinds of
tasks. Spoken tasks are just more complicated to administer and analyze because people do weird
things when they speak and it's harder to analyze what they do. But they, um, they,
they generally point to the same kinds of things.
So, okay. So the universal theory of language by Ted Gibson is
that you can form dependency trees for many sentences and you can measure the distance in some
way of those dependencies. And then you can say that most languages
have very short dependencies.
All languages. All languages.
All languages have short dependencies.
You can actually measure that.
So an ex-student of mine,
this guy's at University of California Irvine,
Richard Futrell did a thing a bunch of years ago now,
where he looked at all the languages we could look at,
which was about 40 initially,
and now I think there's about 60
for which there are dependency structures.
So there are meaning that it's gotta be like a big text,
bunch of texts which have been parsed
for their dependency structures.
And there's about 60 of those
which have been parsed that way.
And for all of those,
what he did was take any sentence
in one of those languages and you can do the dependency structure and then start at the
root. We're talking about dependency structures. That's pretty easy now. And he's trying to
figure out what a control way you might say the same sentence is in that language. And
so we just say, all right, there's a root. Say a sentence is, let's go back to two dogs entered
the room.
So entered is the root.
And entered has two dependents.
It's got dogs, and it has room.
And what he does is like, let's scramble that order.
That's three things, the root and the head
and the two dependents, and into some random order, just random.
And then just do that for all the dependents down the tree. So now do it for "the" and "two" under "dogs", and for "the" under "room".
And that's, you know, a very short sentence. When sentences get longer
and you have more dependents, there's more scrambling that's possible.
So that's one, you can figure out one scrambling for that sentence.
He did it like a hundred times for every sentence in every corpus, in every one of
these texts, every corpus.
And then he just compared the dependency lengths in those random scramblings to what actually
happened, what the English or the French or the German was in the original language or
Chinese or what all these like 80 languages, 60 languages, okay?
And the dependency lengths are always shorter in the real language compared to this kind of a control. And there's another, slightly more rigid version of his
control. So the way I described it, you could have crossed dependencies, like by scrambling that way,
you could scramble in any way at all. Languages don't do that. They tend not to cross dependencies very much. So the dependency structure, they tend to keep things non-crossed.
There's a technical term, they call that projective, but it's just non-crossed is all that is
projective. And so if you just constrain the scrambling so that it only gives you projective,
sort of non-crossed, the same thing holds. So still human languages are much shorter
than this kind of a control.
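A rough sketch of that random-reordering control, not Futrell's actual pipeline: keep the dependency tree fixed, shuffle each head together with its dependents, and read the words off in that order. This stays projective, like the stricter control just described. On a toy sentence this short the gap is tiny; the published result is an average over corpora of much longer sentences.

import random

def dependents(heads):
    kids = {i: [] for i in range(len(heads))}
    for i, h in enumerate(heads):
        if h is not None:
            kids[h].append(i)
    return kids

def random_order(node, kids):
    block = [node] + kids[node]
    random.shuffle(block)                  # scramble the head and its dependents locally
    out = []
    for n in block:
        out.extend([n] if n == node else random_order(n, kids))
    return out

def total_length(order, heads):
    pos = {w: i for i, w in enumerate(order)}
    return sum(abs(pos[i] - pos[h]) for i, h in enumerate(heads) if h is not None)

# "the two dogs entered the room": entered is the root
heads = [2, 2, 3, None, 5, 3]
kids = dependents(heads)
root = heads.index(None)

real = total_length(list(range(len(heads))), heads)
controls = [total_length(random_order(root, kids), heads) for _ in range(1000)]
print("real order:", real, " average random control:", sum(controls) / len(controls))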
So what it means is that in every language,
we're trying to put things close
relative to this kind of a control.
It doesn't matter about the word order.
Some of these are verb final.
Some of them use a verb medial like English.
And some are even verb initial.
There are a few languages of the world which have VSO,
word order, verb, subject, object languages.
Haven't talked about those.
It's like 10% of the.
And even in those languages, it's still short dependencies.
Short dependencies is rules.
Okay, so what are some possible explanations for that?
For why languages have evolved that way?
So that's one of the, I suppose, disagreements
you might have with Chomsky.
So you consider the evolution of language
in terms of information theory.
And for you, the purpose of language
is ease of communication, right, and processing.
That's right, that's right. So I mean,
the story here is just about communication.
It is just about production, really.
It's about ease of production, is the story.
When you say production, can you?
Oh, I just mean, ease of language production.
It's easier for me to say things when the,
when I'm doing, whenever I'm talking to you,
is somehow I'm formulating some idea in my head
and I'm putting these words together. And it's easier for me to do that, to say something where
the words are closely connected in a dependency as opposed to separated by putting something in
between and over and over again. It's just hard for me to keep that in my head. That's the whole
story. The story is basically, the dependency grammar sort of gives that to you.
Like just like long is bad, short is good.
It's like easier to keep in mind
because you have to keep it in mind for,
probably for production,
probably matters in comprehension as well.
Like also matters in comprehension.
So on both sides of it, the production and the.
But I would guess it's probably evolved for production.
Like about producing, it's what's easier for me to say
that ends up being easier for you also.
That's very hard to disentangle,
this idea of who is it for?
Is it for me, the speaker, or is it for you, the listener?
I mean, part of my language is for you.
Like the way I talk to you is gonna be different
from how I talk to different people.
So I'm definitely angling what I'm saying
to who I'm saying, right?
It's not like I'm just talking the same way
to every single person, and so I am sensitive
to my audience, but does that work itself out
in the dependency length differences?
I don't know, maybe that's about just the words,
that part, which words I select.
My initial intuition is that you optimize language
for the audience.
Yeah.
But it's both.
It's just kind of like messing with my head a little bit
to say that some of the optimization might be,
it may be the primary objective of the optimization
might be the ease of production.
Yeah, and we have different senses, I guess.
I'm like very selfish.
And you're like, I think it's all about me.
I'm like, I'm just doing this easiest for me.
I don't wanna, I mean, but I have to, of course,
choose the words that I think you're gonna know.
I'm not gonna choose words you don't know.
In fact, I'm gonna fix that.
So there it's about, but maybe for the syntax,
for the combinations, it's just about me.
I feel like it's, I don't know though.
It's very hard.
Wait, wait, wait, wait, wait, wait,
but the purpose of communication is to be understood.
Is to convince others and so on.
So like the selfish thing is to be understood.
Okay, yeah, it's a little circular there too then, okay.
Right, I mean like the ease of production.
Helps me be understood then.
I don't think it's circular.
So I want what's the-
No, I think the primary objective is to be understood,
is about the listener.
Because otherwise, if you're optimizing
for the ease of production,
then you're not gonna have any of the interesting
complexity of language.
Like, you're trying to like explain...
Well, let's control for what it is I wanna say.
Like I'm saying, let's control for the thing,
the message, control for the message.
But that means the message needs to be understood.
That's the goal.
Oh, but that's the meaning.
So I'm still talking about the form.
Just the form of the meaning.
How do I frame the form of the meaning
is all I'm talking about.
You're talking about a harder thing, I think.
It's like how am I, like try to change the meaning.
Let's keep the meaning constant.
Like which, if you keep the meaning constant,
how can I phrase whatever it is I need to say,
like I gotta pick the right words,
and I'm gonna pick the order so that it's easy for me.
That's what I think it's probably like.
I think I'm still tying meaning and form together in my head.
But you're saying if you keep the meaning
of what you're saying constant,
the optimization, yeah, it could be the primary objective that
optimization is the for production. That's interesting. I'm struggling to keep constant
the meaning is just so I mean, I'm such a human, right? So for me, the form without
having introspected on this, the form and the meaning are tied together, like deeply, because I'm a human.
Like for me when I'm speaking,
because I haven't thought about language,
like in a rigorous way about the form of language.
Look, for any event,
there's an unbounded, I don't wanna say infinite,
but sort of unbounded ways
that I might communicate that same event.
This two dogs entered a room,
I can say in many, many different ways.
I can say, hey, there's two dogs.
They entered the room.
Hey, the room was entered by something.
The thing that was entered was two dogs.
I mean, that's kind of awkward and weird and stuff,
but those are all similar messages
with different forms,
different ways I might frame.
And of course, I use the same words there all the time.
I could have referred to the dogs as a Dalmatian
and a poodle or something.
I could have been more specific or less specific
about what they are, and I could have said,
been more abstract about the number.
So I'm trying to keep the meaning,
which is this event constant,
and then how am I gonna describe that to get that to you?
It kind of depends on what you need to know, right?
And what I think you need to know,
but I'm like, let's get control for all that stuff,
and not, and it's like, I'm just like choosing,
but I'm doing something simpler than you're doing,
which is just forms.
Just words.
So to you specifying the breed of dog
and whether they're cute or not is changing the meaning.
That might be, yeah.
Yeah, that would be changing.
Oh, that would be changing the meaning for sure.
Right, so you're just, yeah.
Yeah, yeah.
That's changing the meaning.
But say, even if we keep that constant,
we can still talk about what's easier or hard for me, right?
The listener and the, right?
I can have which phrase structures I use,
which combinations, which, you know.
This is so fascinating and just like a really powerful window
into human language, but I wonder still throughout this,
how vast the gap between meaning and form.
I just have this like maybe romanticized notion
that they're close together,
that they evolve close to like hand in hand.
That you can't just simply optimize for one
without the other being in the room with us.
Like, it's, well, it's kind of like an iceberg.
Form is the tip of the iceberg and the rest,
the meaning is the iceberg, but you can't like separate.
But I think that's why these large language models
are so successful is because they're good at form
and form isn't that hard in some sense.
And meaning is tough still.
And that's why they're not, they don't understand
what they're doing.
We're gonna talk about that later maybe,
but like we can distinguish in our,
forget about large language models,
like humans, maybe you'll talk about that later too,
is like the difference between language,
which is a communication system,
and thinking, which is meaning.
So language is a communication system for the meaning,
it's not the meaning.
And so that's why, I mean,
and there's a lot of interesting evidence
we can talk about relevant to that.
Well, I mean, that's a really interesting question.
What is the difference between language written,
communicated versus thought?
What, to you, is the difference between them?
Well, you or anyone has to think of a task
which they think is a good thinking task.
And there's lots and lots
of tasks which should be good thinking tasks. Whatever those tasks are, let's say it's playing
chess or that's a good thinking task or playing some game or doing some complex puzzles, maybe
remembering some digits, that's thinking, maybe
just listening to music is thinking, or there's a lot of different tasks
we might think of as thinking.
There's this woman in my department, Ev Fedorenko,
and she's done a lot of work on this question
about what's the connection between language and thought.
And so she uses, I was referring earlier to fMRI,
that's her primary method.
And so she has been really fascinated by this question
about what language is.
Okay.
And so as I mentioned earlier, you can localize my language area, your language area in a
few minutes.
Okay.
In like 15 minutes, I can listen to language, listen to non-language or backward speech
or something.
And we'll find areas, left lateralized network in my head, which is especially, which is
very sensitive to language
as opposed to whatever that control was, okay?
Can you specify what you mean by language,
like communicating language?
Like what is language?
Just sentences.
I'm listening to English of any kind, a story,
or I can read sentences, anything at all that I understand,
if I understand it, then it'll activate my language network.
So right now, my language network is going like crazy
when I'm talking and when I'm listening to you
because we're both, we're communicating.
And that's pretty stable.
Yeah, it's incredibly stable.
So I happen to be married to this woman, Ev Fedorenko,
and so I've been scanned by her over and over and over
since 2007 or six or something.
And so my language network is exactly the same
you know, like a month ago as it was back in 2007.
It's amazingly stable, it's astounding.
And it's a really fundamentally cool thing.
So my language network is like my face, okay?
It's not changing much over time inside my head.
Can I ask a quick question, sorry, as a small tangent?
At which point in the, as you grow up from baby to adult,
does it stabilize? We don't know.
That's a very hard question. They're working on that right
now, because of the problem of scanning little kids, trying to do
the localization on little children. In this scanner, you're
lying in the fMRI scanner, that's the best way to figure out where something's
going on inside our brains, and the scanner is loud and you're in this tiny little area, you're claustrophobic, and
it doesn't bother me at all. I can go to sleep in there. But some people are bothered by
it, and little kids don't really like it, and they don't like to lie still. And you
have to be really still because you move around, that messes up the coordinates of where everything
is.
And so, to get to your question: how and when is language developing?
How does this left-lateralized system come into play?
And it's really hard to get a two year old to do this task.
But you can maybe, they're starting to get
three and four and five year olds
to do this task for short periods.
And it looks like it's there pretty early.
So clearly when you lead up to a baby's first words,
before that there's a lot of fascinating turmoil going on
about figuring out what are these people saying
and you're trying to make sense,
how does that connect to the world and all that kind of stuff.
Yeah, that might be just fascinating development
that's happening there.
That's hard to introspect.
But anyway, you.
But anyway, we're back to the scanner
and I can find my network in 15 minutes,
and now we can ask, find my network, find yours,
find 20 other people do this task,
and we can do some other tasks.
Anything else you think is thinking of some other thing.
I can do a spatial memory task.
I can do a music perception task.
I can do a programming task if I program, okay, where I
can understand computer programs. And none of those tasks will tap the language
network at all. Like at all. There's no overlap. They're highly activated in other parts of
the brain. There's a bilateral network, which I think she tends to call the multiple demands
network, which does anything kind of hard. And so anything that's kind of difficult in some ways
will activate that multiple demands network.
I mean, music will be in some music area,
there's music specific kinds of areas.
And so, but none of them are activating
the language area at all, unless there's words.
Like, so if you have music and there's a song
and you can hear the words,
then you get the language area.
Are we talking about speaking and listening or are we also talking about reading?
This is all comprehension of any kind.
That is fascinating.
This network doesn't make any difference if it's written or spoken.
The thing that she, Fedorenko, calls the language network is this high-level language.
It's not about the spoken language and it's not about the written language.
It's about either one of them.
When you do speech, you listen to speech and you subtract away some language you don't
understand or you subtract away backwards speech, which sounds like speech but isn't.
You take away the sound part altogether.
And so, and then if you do written,
you get exactly the same network.
So for just reading the language
versus reading sort of nonsense words or something like that,
you'll find exactly the same network.
And so it's about high level,
the comprehension of language.
Yeah, in this case.
And the same thing happens,
production is a little harder to run the scanner,
but the same thing happens in production.
You get the same network.
So production's a little harder, right?
You have to figure out how do you run a task
in the network such that you're doing some kind
of production.
And I can't remember what,
they've done a bunch of different kinds of tasks there
where you get people to produce things,
figure out how to produce.
And the same network goes on there,
exactly the same place.
And so if you, wait, wait.
So if you read random words.
Yeah, if you read things like,
Like gibberish.
Yeah, yeah, Lewis Carroll's "'Twas brillig,"
Jabberwocky, right, they call that Jabberwocky speech.
The network doesn't get activated.
Not as much.
There are words in there.
Because it's still, it's English-like.
There's function words and stuff,
so it's lower activation.
It's fascinating.
Yeah, yeah, so there's like, basically the more language
like it is, the higher it goes
in the language network. And that network is there from when you speak from as soon as you learn
language. And, and it's, it's there, like you speak multiple languages, the same network is going for
your multiple languages. So you speak English, you speak Russian, then the both of them are
hitting that same network. If you, if you're fluent in those languages. So programming.
Not at all, isn't that amazing?
Even if you're a really good programmer,
that is not a human language,
it's just not conveying the same information.
And so it is not in the language network.
And so-
That is mind blowing as I think.
That's pretty cool. That's weird.
That's amazing.
So that's like one set of data.
This is hers, like shows that
what you might think is thinking is not language.
Language is just this conventionalized system
that we've worked out in human languages.
Oh, another fascinating little bit is that
even if they're these constructed languages,
like Klingon or I don't know the languages
from Game of Thrones, I'm sorry,
I don't remember those languages.
Maybe you do.
There's a lot of people offended right now.
There's people that speak those languages.
They really speak those languages
because the people that wrote the languages for the shows,
they did an amazing job of constructing
something like a human language.
And that lights up the language area.
That's like, because they can speak, you know,
pretty much arbitrary thoughts in a human language.
It's not a, it's a constructed human language.
Probably it's related to human languages
because the people that were constructing them
were making them like human languages in various ways.
But it also activates the same network,
which is pretty, pretty cool.
Anyway.
Sorry to go into a place where you may be
a little bit philosophical,
but is it possible that this area of the brain
is doing some kind of translation into a deeper set of,
almost like concepts?
I mean, it has to be doing.
So it's doing in communication, right?
It is translating from thought,
whatever that is, is more abstract,
and it's doing that.
That's what it's doing.
Like, it is, that is kind of what it is doing.
It's kind of a meaning network, I guess.
Yeah, like a translation network.
But I wonder what is at the core, at the bottom of it,
like what are thoughts?
Are they thoughts?
To me, like thoughts and words, are they neighbors
or is it one turtle sitting on top of the other?
Meaning like, is there a deep set of concepts that we?
Well, there's connections right between
what these things mean,
and then there's probably other parts of the brain
that what these things mean.
And so, when I'm talking about whatever it is
I wanna talk about, it'll be represented somewhere else.
That knowledge of whatever that is
will be represented somewhere else.
Well, I wonder if there's like some stable,
nicely compressed encoding of meanings
that's separate from language.
I guess the implication here is that
that we don't think in language.
That's correct.
Isn't that cool?
And that's so interesting.
So people, I mean, this is like hard to do experiments on, but there is this idea of inner voice and a lot of people have an inner voice. And so if you do
a poll on the internet and ask if you hear yourself talking when you're just thinking or whatever,
about 70 or 80% of people will say yes. Most people have an inner voice. I don't. And so I
always find this strange. So when people talk about an inner voice,
I always thought this was a metaphor. And they hear. I know most of you, whoever's listening to
this thinks I'm crazy now because I don't have an inner voice and I just don't know what you're
listening to. It sounds so kind of annoying to me to have this voice going on while you're thinking,
but I guess most people have that, and I don't have that,
and we don't really know what that connects to.
I wonder if the inner voice activates that same network.
I don't know.
I don't know.
I mean, this could be speechy, right?
So that's like, you hear, do you have an inner voice?
I don't think so.
Oh.
A lot of people have this sense that they hear other,
they hear themselves, and then say they read someone's email, I've
heard people tell me that they hear that other person's voice when they read other people's
emails.
And I'm like, wow, that sounds so disruptive.
I do think I vocalize what I'm reading, but I don't think I hear a voice.
Well, you probably don't have an inner voice.
Yeah, I don't think I have an inner voice.
People have an inner voice.
People have this strong percept of hearing sound
in their heads when they're just thinking.
I refuse to believe that's the majority of people.
Majority, absolutely.
What?
It's like two thirds or three quarters.
It's a lot.
Whenever I ask a class, and I went to the internet,
they always say that.
So you're in a minority.
It could be a self-report flaw.
It could be.
You know, when I'm reading, inside my head,
I'm kind of like saying the words,
which is probably the wrong way to read,
but I don't hear a voice.
There's no percept of a voice.
I refuse to believe the majority of people have a voice.
Anyway, it's a fascinating, the human brain is fascinating,
but it still blew my mind that the,
that language does appear,
comprehension does appear to be separate from thinking.
So that's one set.
One set of data from Fedorenko's group is that
no matter what task you do,
if it doesn't have words and combinations of words in it,
then it won't light up the language network.
And you know, you could, it'll be active somewhere else,
but not there.
So that's one.
And then this other piece of evidence
relevant to that question is,
it turns out there are this group of people
who've had a massive stroke on the left side
and wiped out their language network.
And as long as they didn't wipe out
everything on the right as well,
in that case they wouldn't be, you know, cognitively functional. But if they just
wiped out language, which is pretty tough to do because it's very expansive on the left,
but if they have, then there are these patients, so-called global aphasics,
who can do any task just fine, but not language. You can't talk to them. I mean, they don't understand you.
They can't speak, can't write, they can't read,
but they can play chess, they can drive their cars,
they can do all kinds of other stuff.
You know, do math.
So math is not in the language area, for instance.
You do arithmetic and stuff, that's not language area.
It's got symbols.
So people sort of confuse some kind of
symbolic processing with language,
and symbolic processing is not the same.
So there are symbols and they have meaning, but it's not language.
It's not a, you know, conventionalized language system.
And so, math isn't there.
So they can do math.
They do just as well as their age-matched controls on all these tasks.
This is Rosemary Varley over in University College London, who has a bunch of patients
who she's shown this, that they're just,
so that sort of combination suggests
that language isn't necessary for thinking.
It doesn't mean you can't think in language,
you could think in language,
because language allows a lot of expression,
but it's just, you don't need it for thinking.
It suggests that language is a separate system.
This is kind of blowing my mind right now.
It's cool, isn't it?
I'm trying to load that in
because it has implications for large language models.
It sure does, and they've been working on that.
Well, let's take a stroll there.
You wrote that the best current theories of human language
are arguably large language models.
So this has to do with form.
It's kind of a big theory,
but the reason it's arguably the best is that it does the best at predicting
what's English for instance. It's incredibly good, you know, it's better
than any other theory. It's so you know, but you know, we don't you know, there's
it's not sort of there's not enough detail. It's opaque, like there's not you
don't know what's going on. No, what's going on. It's another black box. But I think it's you know, it is a theory. What there's not, you don't know what's going on. You don't know what's going on.
It's another black box.
But I think it's, you know, it is a theory.
What's your definition of a theory?
Cause it's a gigantic, it's a gigantic black box
with a very large number of parameters controlling it.
To me, theory usually requires a simplicity, right?
Well, I don't know.
Maybe I'm just being loose there.
I think it's a, it's not a great theory, but it's a theory.
It's a good theory in one sense
in that it covers all the data.
Like anything you wanna say in English, it does.
So that's how it's arguably the best,
is that no other theory is as good
as a large language model in predicting exactly
what's good and what's bad in English.
Now you're saying, is it a good theory?
Well, probably not, because I want a smaller theory
than that, it's too big, I agree.
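Operationally, "predicting what's English" can be read as scoring strings by how probable a model finds them. A small sketch, assuming the Hugging Face transformers library and the small GPT-2 model (far weaker than the models being discussed), purely as an illustration:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def nll(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()   # mean negative log-likelihood per token

print(nll("The two dogs entered the room."))        # lower score = judged more English-like
print(nll("Dogs two the room the entered."))        # scrambled version typically scores much worse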
You could probably construct a mechanism by which
it can generate a simple explanation
of a particular language, like a set of rules.
Something like, it could generate a dependency grammar
for a language, right?
Yes.
You could probably, you could probably just ask it about itself.
Well, that presumes, and there's some evidence for this,
that some large language models are implementing
something like dependency grammar inside them.
And so there's work from a guy called Chris Manning
and colleagues over at Stanford in natural language.
And they looked at, I don't know how many
large language model types, but certainly BERT
and some others where you do some kind of fancy math
to figure out exactly what kind of abstractions
of representations are going on.
And they were saying it does look like
dependency structure is what they're constructing.
It doesn't, so it's actually a very, very good map.
So they are constructing something like that.
Does it mean that they're using that for meaning?
I mean, probably, but we don't know.
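The flavor of that probing work, in miniature: learn a linear map of the word vectors whose pairwise distances reproduce distances in the parse tree. This sketch uses random stand-in vectors and a linear "tree distance" of |i - j|, not real BERT activations or the published method, purely to show the shape of the idea.

import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 6, 32
H = rng.normal(size=(n_words, dim))                       # pretend hidden states, one per word
D = np.abs(np.subtract.outer(np.arange(n_words), np.arange(n_words))).astype(float)

# classical multidimensional scaling: target points whose squared distances equal D
J = np.eye(n_words) - np.ones((n_words, n_words)) / n_words
G = -0.5 * J @ D @ J
vals, vecs = np.linalg.eigh(G)
X = vecs[:, -5:] * np.sqrt(np.clip(vals[-5:], 0, None))   # 5 dimensions suffice for 6 points

# least-squares linear probe: B maps hidden states onto those target points
B, *_ = np.linalg.lstsq(H, X, rcond=None)
pred = (((H @ B)[:, None, :] - (H @ B)[None, :, :]) ** 2).sum(-1)
print(np.round(pred, 1))                                  # reproduces D: 0s on the diagonal, 1, 2, ... off it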
You write that the kinds of theories of language
that LLMs are closest to are called
construction-based theories.
Can you explain what construction-based theories are?
It's just a general theory of language such that
there's a form and a meaning pair
for lots of pieces of the language.
And so it's primarily usage-based,
is the construction grammar.
It's just trying to deal with the things
that people
actually say and actually write. And so it's a usage-based idea. And what's a construction? A construction is either a simple word, so like a morpheme plus
its meaning, or a combination of words. It's basically a combination of words, like the rules.
But it's unspecified as to what the form of the grammar is underlyingly. And so I would argue that the dependency grammar is maybe the right form to use for the types
of construction grammar. Construction grammar typically isn't quite formalized. And so maybe the formalization,
a formalization of that,
it might be in dependency grammar.
I mean, I would think so,
but I mean, it's up to people,
other researchers in that area if they agree or not.
So do you think that
large language models understand language?
Are they mimicking language?
I guess the deeper question there is,
are they just understanding the surface form?
Or do they understand something deeper about the meaning
that then generates the form?
I mean, I would argue they're doing the form.
They're doing the form, they're doing it really, really well.
And are they doing the meaning?
No, probably not.
I mean, there's lots of these examples from various groups
showing that they can be tricked
in all kinds of ways.
They really don't understand the meaning of what's going on.
And so there's a lot of examples that he and other groups have given, which just show they
don't really understand what's going on.
So you know the Monty Hall problem is this silly problem, right?
Where, you know, if you have three doors, it's Let's Make a Deal, this old game show, and there's a prize behind one and there's
some junk prizes behind the other two and you're trying to select one. And if you, you
know, he knows, Monty, he knows where the target item is, the good thing, he knows everything
is back there. And you're supposed to, he gives you a choice, you choose one of the three, and then he opens one of the doors and it's some junk prize. And
then the question is, should you trade to get the other one? And the answer is yes, you should trade
because he knew which doors he could turn around. And so now the odds are two-thirds. Okay. And then
you just change that a little bit to the large language model. The large language model has seen
that explanation so many times that it just...
If you change the story a little bit,
make it sound like it's the Monty Hall problem
but it's not.
You just say, oh, there's three doors
and one behind them is a good prize,
and there's two bad doors.
I happen to know it's behind door number one.
The good prize, the car is behind door number one.
So I'm gonna choose door number one.
Monty Hall opens door number three
and shows me nothing there. Should I trade for door number two,
even though I know the good prize is behind door number one? And then the large language models say,
yes, you should trade because it just goes through the forms that it's seen before so many times on
these cases where it's, yes, you should trade because your odds have shifted from one in three now to
two out of three.
It doesn't have any way to remember that actually
you have 100% probability behind that door number one.
You know that, that's not part of the scheme
that it's seen hundreds and hundreds of times before.
And so you can't, even if you try to explain to it
that it's wrong, that they can't do that,
it'll just keep giving you back the problem.
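A quick simulation of the two situations being contrasted: the standard Monty Hall setup, where switching wins about two-thirds of the time, and the degenerate variant described here, where you are told the prize is behind your chosen door, so switching can only lose.

import random

def switch_win_rate(trials=100_000):
    wins = 0
    for _ in range(trials):
        prize  = random.randrange(3)
        choice = random.randrange(3)
        # the host opens a junk door that is neither your choice nor the prize
        opened = random.choice([d for d in range(3) if d != choice and d != prize])
        switched = next(d for d in range(3) if d != choice and d != opened)
        wins += (switched == prize)
    return wins / trials

print("standard Monty Hall, switching wins:", round(switch_win_rate(), 3))   # about 0.667
print("variant where the prize is known to be behind your door: switching wins 0.0")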
But it's also possible that a larger language model
will be aware of the fact that there's sometimes
over-representation of a particular kind of formulation
and it's easy to get tricked by that.
And so you could see if they get larger and larger,
models be a little bit more skeptical.
So you see overrepresentation.
So it just feels like form can,
training on form can go really far
in terms of being able to generate things
that look like the thing understands deeply
the underlying world model of the kind of mathematical world,
physical world, psychological world
that would generate these kinds of sentences.
It just feels like you're creeping close
to the meaning part.
Easily fooled, all this kind of stuff,
but that's humans too.
So it just seems really impressive
how often it seems like it understands concepts.
I mean, you don't have to convince me of that.
I am very, very impressed, but does it do, I mean,
you're giving a possible world where maybe
someone's gonna train some other versions
such that it'll be somehow abstracting away
from types of forms.
I mean, I don't think that's happened.
And so.
Well, no, no, no, no.
I'm not saying that.
I think when you just look at anecdotal examples
and just showing a large number of them
where it doesn't seem to understand.
Yeah.
And it's easily fooled.
Yes.
That does not seem like a scientific,
data-driven analysis of how many places
it's damn impressive in terms of meaning and understanding
and how many places it's easily fooled.
That's not the inference.
So I don't wanna make that,
the inference I wouldn't wanna make was that inference.
The inference I'm trying to push is just that,
is it like humans here?
It's probably not like humans here, it's different.
So humans don't make that error.
If you explain that to them,
they're not gonna make that error.
They don't make that error.
And so that's something, it's doing something different
from humans that they're doing in that case.
What's the mechanism by which humans figure out
that it's an error?
I'm just saying the error there is like,
if I explain to you there's a 100% chance
that the car is, in this case, behind this door,
well, do you want to trade?
People say no, but this thing will say yes
because of that trick,
it's so wound up on the form,
that's an error that a human doesn't make,
which is kind of interesting.
Less likely to make, I should say.
Yeah, less likely.
Because humans are very...
Oh yeah.
You're asking a system to understand 100%,
like you're asking about mathematical concepts.
And so like...
Look, the places where large language models are,
the form is amazing.
So let's go back to nested structures,
center embedded structures.
Okay, if you ask a human to complete those,
they can't do it.
Neither can a large language model.
They're just like humans in that.
If you ask, if I ask a large language model-
That's fascinating, by the way.
That's-
The center embedding.
Yeah, center embedding.
It struggles with center embedding-
Just like humans, exactly like humans.
Exactly the same way as humans.
And that's not trained.
So they do exactly, so that is a similarity.
So, but then it's, that's not meaning, right?
This is form.
But when we get into meaning,
this is where they get kind of messed up.
When you start saying, oh, what's behind this door?
Oh, it's, you know, this is the thing I want.
Humans don't mess that up as much, you know.
Here, the form, the way the form matches, is amazing,
is similar, without being trained to do that.
I mean, it's trained in the sense
that it's getting lots of data,
which is just like human data,
but it's not being trained on, you know,
bad sentences and being told what's bad.
It just can't do those.
It'll actually say things like,
those are too hard for me to complete or something, which is kind of interesting actually. How does it
know that? I don't know. But it really often doesn't just complete sentences. It very often says stuff
that's true and sometimes says stuff that's not true. And almost always the form is great.
But it's still very surprising that with really great form
it's able to generate a lot of things that are true.
Based on what it's trained on and so on.
So it's not just form that it's generating,
it's mimicking true statements from-
That's right, that's right.
From the internet.
I guess the underlying idea there is that on the internet,
truth is overrepresented versus falsehoods.
I think that's probably right.
So but the fundamental thing it's trained on,
you're saying, is just form.
I think so, yeah, yeah, I think so.
Well, to me that's still a little bit of an open question.
I probably lean toward agreeing with you,
especially now that you've just blown my mind
that there's a separate module in the brain
for language versus thinking.
Maybe there's a fundamental part missing
from the large language model approach
that lacks the thinking, the reasoning capability.
Yeah, that's what this group argues. So the same group, Fedorenko's group, has a
recent paper arguing exactly that. There's a guy called Kyle Mahowald, who's here in Austin, Texas,
actually. He's an old student of mine, but he's faculty in linguistics at Texas,
and he was the first author on that.
That's fascinating.
Still, to me, an open question.
Yeah.
What are the interesting limits of LLMs?
You know, I don't see any limits to their form.
Their form is perfect. Impressive.
Yeah, yeah, yeah, it's pretty much,
I mean, it's close to being.
Well, you said ability to complete center embeddings.
Yeah, it's just the same as humans.
It seems the same.
But that's not perfect, right?
It should be able to. That's good.
No, but I want it to be like humans.
I'm trying to, I want a model of humans.
But, oh wait, wait, wait, wait.
Oh, so perfect is as close to humans as possible.
I got it. Yeah.
But if you're not human,
if you're superhuman,
you should be able to complete
center-embedded sentences, right?
I mean, that's the mechanism, if it's modeling,
I think it's kind of really interesting that it can't.
That it's really interesting.
It's more like, I think it's potentially
modeling something like
the way the form is processed.
The form of human language.
The way that you.
And how humans process the language.
Yes, I think that's plausible.
And how they generate language.
Process language and generate language, that's fascinating.
So in that sense, they're perfect.
If we can just linger on the center embedding thing
that's hard for LLM to produce,
and that seems really impressive
because that's hard for humans to produce,
and how does that connect to the thing
we've been talking about before,
which is the dependency grammar framework
in which you view language and the finding
that short dependencies seem to be
a universal part of language.
So why is it hard to complete center embeddings?
So what I like about dependency grammar
is it makes
the cognitive cost associated with
longer distance connections very transparent.
Basically, it turns out there is a cost associated
with producing and comprehending connections between words
which are just not beside each other.
The further apart they are, the worse it is.
According to, well, we can measure that.
And there is a cost associated with that.
Can you just linger on what do you mean by cognitive cost?
Sure.
How do you measure it?
Oh, well, you can measure it in a lot of ways.
The simplest is just asking people to say
how good a sentence sounds.
We just ask.
That's one way to measure.
And you try to triangulate then across sentences and across structures to try to figure out what the
source of that is. You can look at reading times in controlled materials, you know, in certain
kinds of materials and then we can like measure the dependency distances there. We can, there's a
recent study which looked at, we're talking about the brain here, we could look
at the language network, okay?
We could look at the language network and we could look at the activation in the language
network and how big the activation is depending on the length of the dependencies.
And it turns out, in just random sentences that you're listening
to, so there are people listening to stories here,
the longer the dependency is, the stronger the activation in the language network.
And so there's some measure, there's a bunch
of different measures we could do.
That's a kind of a neat measure actually of actual.
Activations. Activation in the brain.
So that you can somehow in different ways
convert it to a number.
I wonder if there's a beautiful equation
connecting cognitive cost and length of dependency.
E equals MC squared kind of thing.
Yeah, it's complicated but probably it's doable.
I would guess it's doable.
I tried to do that a while ago
and I was reasonably successful
but for some reason I stopped working on that.
I agree with you that it would be nice to figure out.
So there's like some way to figure out the cost. I mean, it's
complicated.
Another issue you raised before was, how do you measure distance? Is it words?
It probably isn't. Part of the problem is that some words matter more than others,
and probably, you know, nouns might matter more, and then maybe it depends
on which kind of noun.
Is it a noun we've already introduced
or a noun that's already been mentioned?
Is it a pronoun versus a name?
Like all these things probably matter.
So probably the simplest thing to do is just like,
oh, let's forget about all that
and just think about words or morphemes.
For sure, but there might be some insight
in the kind of function
that fits the data, meaning like quadratic, like what?
I think it's an exponential.
So we think it's probably an exponential
such that the longer the distance, the less it matters.
And so then the cost is the sum of those.
That was our best guess a while ago.
So if you've got a bunch of dependencies
that are being connected at some point,
at the ends of those,
the cost is some exponential function of those lengths,
is my guess.
The reason it's probably an exponential
is that it's not just the distance between two words,
because I can make a very, very long subject-verb dependency
by adding lots and lots of noun phrases
and prepositional phrases, and it doesn't matter too much. It's when you do nested, when I
have multiple of these, that things go really bad, go south.
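As a toy illustration of the kind of summed cost Ted is gesturing at, here is a short sketch in Python. The saturating functional form and the decay constant are placeholders assumed for illustration, not the lab's actual model, and distance is counted naively in words over hand-annotated head-dependent pairs.

```python
# A toy sketch of a summed dependency cost: each head-dependent link contributes
# a cost that grows with its length but saturates, so each extra intervening
# word matters less the further out you are. The functional form and TAU are
# assumptions for illustration, not a published model.
import math

TAU = 3.0  # assumed decay constant, in words

def link_cost(length):
    """Saturating cost for one head-dependent link of the given word distance."""
    return 1.0 - math.exp(-length / TAU)

def sentence_cost(dependencies):
    """Sum per-link costs over (head_index, dependent_index) pairs."""
    return sum(link_cost(abs(h - d)) for h, d in dependencies)

# "The dog ran away" -- all links are short.
local = [(1, 0), (2, 1), (2, 3)]
# "The dog that the cat chased ran away" -- the subject-verb link now spans the
# whole embedded clause (rough hand annotation; indices are word positions).
nested = [(1, 0), (6, 1), (1, 5), (4, 3), (5, 4), (5, 2), (6, 7)]

print(round(sentence_cost(local), 2), round(sentence_cost(nested), 2))
```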
Probably somehow connected to working memory or something like this.
Yeah, that's probably a function of the memory here, the access, trying to find those
earlier things. It's kind of hard to figure out what
was referred to earlier. Those are those connections. That's the sort of retrieval notion, as opposed
to a storage thing, trying to connect, retrieve those earlier words depending on
what was in between. And then we're talking about interference of similar things in between.
The right theory probably has that kind of notion, an interference of similar things. And so I'm dealing with an abstraction over the right theory,
which is just, you know, let's count words. It's not right, but it's close. And then maybe you're
right though, there's some sort of an exponential or something to figure out the totals so we can
figure out a function for any given sentence in any given language. But, you know, it's funny,
people haven't done that too much,
which I do think is, I'm interested
that you find that interesting.
I really find that interesting,
and a lot of people haven't found it interesting,
and I don't know why I haven't got people
to wanna work on that.
I really like that too.
No, that's a beautiful, and the underlying idea
is beautiful that there's a cognitive cost
that correlates with the length of dependency.
It just, it feels like it's a deep,
I mean language is so fundamental to the human experience
and this is a nice, clean theory of language
where it's like wow, okay, so like we like our words
close together, the dependent words close together.
That's why I like it too, it's so simple.
Yeah, the simplicity of the theory.
And yet it explains some very complicated phenomena.
If I write these very complicated sentences,
it's kind of hard to know why they're so hard
and you can like, oh, nail it down.
I can give you a math formula
for why each one of them is bad and where.
And that's kind of cool.
I think that's very neat.
Have you gone through the process?
Is there like, if you take a piece of text
and then simplify,
sort of like there's an average length of dependency
and then you like, reduce it and see comprehension
on the entire, not just a single sentence,
but like, you know, you go from James Joyce
to Hemingway or something.
No, no, the simple answer is no, but
there's probably things you can do
in that kind of direction.
That's fun.
We might, we're gonna talk about legalese at some point,
and so maybe we'll talk about that kind of thinking
with applied to legalese.
Well, let's talk about legalese,
because you mentioned that as an exception,
which is taking tangent upon tangent.
That's an interesting one.
You give it as an exception.
It's an exception.
That you say that most natural languages,
as we've been talking about,
have local dependencies with one exception, legalese.
That's right.
So what is legalese, first of all?
Oh, well, legalese is what you think it is.
It's just any legal language.
I mean, I actually know very little
about the kind of language that lawyers use.
So I'm just talking about language in laws and language in contracts.
So the stuff that you have to run into, we have to run into every other day or every
day and you skip over because it reads poorly.
Or partly it's just long, right?
There's a lot of text there that we don't really want to know about.
But the thing I'm interested in, so I've been working with this guy called Eric Martinez,
he was a lawyer who was taking my class.
I was teaching a psycholinguistics lab class, and I had been teaching it for a long time
at MIT, and he was a law student at Harvard.
And he took the class because he had done some linguistics as an undergrad and he was interested in the problem of why legalese sounds hard to understand. So why is it hard to understand and
why do they write that way if it is so hard to understand? It seems apparent that it's hard to
understand. The question is why is it? And so we didn't know and we did an evaluation of a bunch
of contracts. Actually, we just took a bunch of random contracts.
Because I don't know, contracts and laws
might not be exactly the same, but contracts
are kind of the things that most people have to deal with most
of the time.
And so that's kind of the most common thing
that humans have, that adults in our industrialized society
have to deal with a lot.
And so that's what we pulled.
And we didn't know what was hard about them, but it turns out that the way they're written
is very center embedded, has nested structures in them.
So it has low frequency words as well.
That's not surprising.
Lots of texts have low frequency.
It does have, surprisingly, slightly lower frequency words than other kinds of control texts, even
sort of academic texts.
Legalese is even worse.
It is the worst that we were able to find.
It's fascinating.
You just reveal the game that lawyers are playing.
They're optimizing it differently.
Well, you know, it's interesting.
Now you're getting at why, and so,
and I don't think, so now you're saying
they're doing it intentionally.
I don't think they're doing it intentionally.
But let's get to the next.
It's an emergent phenomena, okay.
Yeah, yeah, yeah, we'll get to that.
We'll get to that.
And so, but we wanted to see why.
So we see what first, as opposed to why,
because it turns out that we're not the first
to observe that legalese is weird.
Like, going back to Nixon, he had a Plain Language Act in 1970
and Obama had one.
And boy, a lot of these,
a lot of presidents have said,
oh, we've got to simplify legal language,
must simplify it.
But you don't know how it's complicated.
It's not easy to simplify it.
You need to know what it is you're supposed to do
before you can fix it, right?
And so you need to like, you need a psycholinguist
to analyze the text and see what's wrong with it
before you can like fix it.
You don't know how to fix it.
How am I supposed to fix something
when I don't know what's wrong with it?
And so that's what we did.
We just took a bunch of contracts and we encoded them for a bunch of features.
And one of those features was center embedding.
And so that is like basically how often a clause would intervene between a subject and a verb, for example. That's one kind of
a center embedding of a clause. And turns out they're massively center embedded. So
I think in random contracts and in random laws, I think you get about 70% or 80, something
like 70% of sentences have a center embedded clause in them, which is insanely high. If
you go to any other text, it's down to 20% or something.
It's so much higher than any control you can think of,
including, you think, oh, people think,
oh, technical academic texts.
No, people don't write center embedded sentences
in technical academic texts.
I mean, they do a little bit,
but it's on the 20%, 30% realm as opposed to 70.
And so there's that, and there's low frequency words,
and then people, oh, maybe it's passive.
People don't like the passive.
Passive for some reason, the passive voice in English
has a bad rap, and I'm not really sure
where that comes from.
And there is a lot of passive.
There's much more passive voice in legalese
than there is in other texts.
And the passive voice accounts for some
of the low frequency words.
No, no, no, no, those are separate.
Oh, so passive voice sucks, low frequency words suck.
Well, sucks is different.
That's a judgment on passive.
Yeah, yeah, yeah, drop the judgment.
It's just like these are frequent.
These are things which happen in legalese texts.
Then we can ask, the dependent measure is like,
how well you understand those things with those features.
Okay. And so then
it turns out the passive makes no difference. So it has a zero effect on your comprehension
ability, on your recall ability, nothing at all. That has no effect. The words matter a little bit.
They do. Low frequency words are going to hurt you in recall and understanding. But what really,
what really hurts is the center embedding. That kills you. That is like, that slows people down.
That makes them very poor at understanding.
They can't recall what was said nearly as well.
And we did this not only on laypeople, we did it on a lot of laypeople.
We ran it on a hundred lawyers.
We recruited lawyers from a wide range of sort of different levels of law firms and stuff, and they have the same pattern.
So when they did this, I did not know what would happen. I thought maybe they could
process it, they're used to legalese, maybe they'd process it just as well as if it were normal. No, no,
they're much better than laypeople. So they can much better recall, much
better understanding, but they have the same main effects as laypeople, exactly the same. So they also much prefer the non-center. So we
constructed non-center embedded versions of each of these. We constructed versions which have
higher frequency words in those places, and we did un-passivized. We turned them into active
versions. The passive active made no difference.
The words made little difference,
and the un-center embedding makes big differences
in all the populations.
Un-center embedding.
How hard is that process, by the way?
It's not very hard.
I'm so sorry, I don't mean to question,
but how hard is it to detect center embedding?
Oh, easy, easy to detect.
That's an easy one to parse.
You're just looking at long dependencies,
or is there a real?
Yeah, yeah, you can just, you can,
so there's automatic parsers for English,
which are pretty good.
Very good.
And they can detect center embedding.
Oh yeah, very, very.
Or I guess nesting.
Perfectly.
Yeah, they've learned, yeah, pretty much.
So you're not just looking for long dependencies,
you're just literally looking for center embedding.
Yeah, we are in this case, in these cases.
But long dependencies and center embeddings are highly correlated,
these two things.
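As an aside on the automatic detection Ted mentions, here is a rough sketch of how one might flag sentences with a clause intervening between a subject and its verb using an off-the-shelf dependency parser. The heuristic and the label set are my own guesses for illustration, not the coding scheme from the legalese studies, and spaCy's small English model has to be downloaded separately.

```python
# A rough heuristic for flagging center-embedded clauses: find a subject whose
# verb lies to its right, and check whether any token in between heads a clause.
# This is an illustrative sketch, not the coding scheme used in the studies.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

CLAUSE_DEPS = {"relcl", "acl", "advcl", "ccomp", "csubj"}  # assumed clause labels

def has_center_embedded_clause(sentence):
    doc = nlp(sentence)
    for tok in doc:
        if tok.dep_ in ("nsubj", "nsubjpass") and tok.head.i > tok.i:
            between = doc[tok.i + 1 : tok.head.i]
            if any(t.dep_ in CLAUSE_DEPS for t in between):
                return True
    return False

print(has_center_embedded_clause(
    "The payments, which are defined in Section 3A, shall be reduced."))  # expect True
print(has_center_embedded_clause("The payments shall be reduced."))       # expect False
```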
So like center embedding is a big bomb you throw inside
of a sentence that just blows up the, that makes-
Yeah, can I read a sentence for you from these things?
I mean, this is just, one of the things is my eyes might glaze over mid-sentence.
No, I understand that. I mean, legalese is hard.
Here it goes: in the event that any payment or benefit by the company, all such payments and benefits, including the payments and benefits under Section 3A hereof, being
hereinafter referred to as a total payment, would be subject to the excise tax, then the
cash severance payments shall be reduced.
So that's something we pulled from a regular text, from a contract.
And the center embedded bit there is just, for some reason, there's a definition.
They throw the definition of what payments and benefits are in between the subject and
the verb.
How about don't do that?
How about put the definition somewhere else as opposed to in the middle of the sentence?
And so that's very, very common, by the way.
That's what happens.
You just throw your definitions, you use a word, a couple of words, and then you define
it and then you continue the sentence. Like, just don't write like that. And you ask,
so then we asked lawyers, we thought, oh, maybe lawyers like this. Lawyers don't like this.
They don't like this. They don't want to write like this.
We asked them to rate materials with the same meaning, center-embedded
and un-center-embedded, and they much preferred the un-center-embedded versions.
On the comprehension, on the reading side.
Yeah, and we asked them,
we asked them would you hire someone
who writes like this or this?
We asked them all kinds of questions,
and they always preferred the less complicated version,
all of them.
So I don't even think they want it this way.
Yeah, but how did it happen?
How did it happen?
That's a very good question.
And the answer is, they still don't know.
But. I have some theories. Well, our question. And the answer is, I still don't know. But.
I have some theories.
Well, our best theory at the moment is that there's,
there's actually some kind of a performative meaning
in the center embedding in the style,
which tells you it's legalese.
We think that that's the kind of a style
which tells you it's legalese.
Like that's a, it's a reasonable guess.
And maybe it's just, so for instance, if you're like, it's like
a magic spell. So we kind of call this the magic spell hypothesis. So when you give them,
when you tell someone to put a magic spell on someone, what do you do? They,
you know, people know what a magic spell is and they do a lot of rhyming. You know, that's kind
of what people will tend to do. They'll do rhyming and they'll do sort of like some kind of poetry
kind of thing. Abracadabra type of thing. Exactly, yeah. And maybe there's a syntactic sort of reflex here of a magic spell, which is
center embedding. And so that's like, oh, it's trying to, like, tell you this is something
which is true, which is what the goal of law is, right? It's telling you something that we want
you to believe is certainly true, right? That's what legal contracts are trying to enforce on you,
right?
And so maybe that's like a form which has,
this is like an abstract, very abstract form,
center embedding, which has a meaning associated with it.
Well, don't you think there's an incentive
for lawyers to generate things that are hard to understand?
That was one of our working hypotheses.
We just couldn't find any evidence of that.
No, lawyers also don't understand it.
But you're creating space.
Why? I mean, if you ask, in the communist Soviet Union,
the individual members, their self-report is not going to
correctly reflect
what is broken about the gigantic bureaucracy
that then leads to Chernobyl or something like this.
I think the incentives under which you operate
are not always transparent to the members within that system.
So it just feels like a strange coincidence
that there is benefit if you just zoom out,
look at the system as opposed to asking individual lawyers
that making something hard to understand
is going to make a lot of people money.
You're gonna need a lawyer to figure that out,
I guess, from the perspective of the individual,
but then that could be the performative aspect.
It could be as opposed to the incentive driven
to be complicated.
It could be performative to where we lawyers
speak in this sophisticated way
and you regular humans don't understand it
so you need to hire a lawyer.
Yeah, I don't know which one it is,
but it's suspicious.
Suspicious that it's hard to understand
and that everybody's eyes glaze over and they don't read.
I'm suspicious as well, I'm still suspicious.
And I hear what you're saying, it could be kind of,
no individual, and even average of individuals,
it could just be a few bad apples in a way
which are driving the effect in some way.
Influential bad apples, that everybody looks up to,
whatever, they're like central figures in how, you know.
But it turns out, it is kind of interesting
that among our hundred lawyers, they did not shift that.
They didn't want this, that's fascinating.
They really didn't like it.
And so it gave us hope.
And they weren't better than regular people
at comprehending it.
Or they were on average better, but like.
But they had the same difference.
The same, same difference.
Exact same difference.
So they, but I, they wanted it fixed.
So they also, and so that gave us hope that
because it actually isn't very hard to construct a material
which is un-center embedded and has the same meaning,
it's not very hard to do.
Just basically in that situation,
you're just putting definitions outside
of the subject-verb relation in that particular example.
And that's kind of, that's pretty general,
what they're doing is just throwing stuff in there which you didn't have to put in
there. There's extra words involved typically. You may need a few extra words sort of to refer
to the things that you're defining outside in some way. Because if you only use it in that one
sentence, then there's no reason to introduce extra terms. So we might have a few more words,
but it'll be easier to understand.
So I mean, I have hope that now that maybe we can
make legalese less convoluted in this way.
So maybe the next president of the United States can,
instead of saying generic things, say,
I ban center embeddings and make Ted the language czar.
Well, make it Eric.
Martinez is the guy you should really put in there.
Yeah, yeah.
Yeah, yeah.
I mean, yeah.
But center embeddings are the bad thing to have.
That's right.
So you can get rid of that.
That'll do a lot of it.
That'll fix a lot.
That's fascinating.
That is so fascinating.
And it is really fascinating on many fronts
that humans are just not able to deal with this kind of thing.
And that language, because of that, evolved in the way it did.
It's fascinating.
So one of the mathematical formulations you have
when talking about language as communication is,
let's say, the noisy channel.
What's a noisy channel?
So that's about communication and so this is
going back to Shannon. So Shannon, Claude Shannon was a student at MIT in the
40s and so he wrote this very influential piece of work about
communication theory or information theory, and he was interested in human
language, actually. He was interested in this problem of communication,
of getting a message from my head to your head.
And so he was concerned or interested in
what was a robust way to do that.
And so that, assuming we both speak the same language,
we both already speak English, whatever the language is.
We speak that.
What is a way that I can say the language
so that it's most likely to get the signal that I want to you?
And then the problem there in the communication
is the noisy channel.
It's that there's a lot of noise in the system.
I don't speak perfectly.
I make errors. That's noise.
There's background noise, you know that.
Like a literal background noise.
There is like white noise in the background
or some other kind of noise, there's some speaking going on
that you're just, you're at a party,
that's background noise, you're trying to hear someone,
it's hard to understand them because there's all this
other stuff going on in the background.
And then there's noise on the communication, on the receiver side, so that you have some problem
maybe understanding me for stuff that's just internal to you in some way. So you've got
some other problems, whatever, with understanding for whatever reasons. Maybe you've had too
much to drink. Who knows why you're not able to pay attention to the signal. So that's the noisy channel.
And so that language, if it's communication system,
we are trying to optimize in some sense
the passing of the message from one side to the other.
And so one idea is that maybe aspects of like word order,
for example, might have been optimized in some way
to make language
a little easier to pass from speaker to listener.
And so Shannon's the guy that did this stuff
way back in the forties.
He was very interesting, you know, historically,
he was interested in working in linguistics.
He was in MIT and he did,
this was his master's thesis of all things.
You know, it's crazy how much he did
for his master's thesis in 1948, I think, or 49 something.
And he wanted to keep working in language,
and it just wasn't popular. Communication
as a reason, a source, for what language is
wasn't popular at the time.
So Chomsky was moving in there.
He just wasn't able to get a handle there, I think.
And so he moved to Bell Labs and worked on communication
from a mathematical point of view
and did all kinds of amazing work.
And so he's just.
More on the signal side versus the language side.
It would have been interesting to see
if he pursued the language side.
That's really interesting.
He was interested in that.
His examples in the 40s are kind of like, they're very language-like things.
We can kind of show that there's a noisy channel process going on when you're listening to me:
you can often sort of guess what I meant, what you think I meant, given what I said.
And I mean, with respect to sort of why language looks the way it does, we might, there might be sort of, as I alluded to,
there might be ways in which word order is somewhat optimized
because of the noisy channel in some way.
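To make the noisy-channel idea concrete in code, here is a toy sketch of the inference Ted describes: a listener combines a prior over what the speaker plausibly meant with a noise model of how easily that could have been corrupted into what was actually heard. The candidate sentences, the priors, and the word-level edit noise model are all invented for illustration; this is not the lab's model.

```python
# Toy noisy-channel inference: guess the intended sentence by weighing the prior
# plausibility of each candidate against how much corruption would be needed to
# turn it into what was heard. All numbers and sentences are made up.
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return dp[-1][-1]

def best_guess(heard, candidates, noise_rate=0.2):
    """Return the candidate maximizing prior * P(heard | candidate)."""
    heard_words = heard.split()
    def score(item):
        sentence, prior = item
        return prior * noise_rate ** edit_distance(heard_words, sentence.split())
    return max(candidates, key=score)[0]

candidates = [("the mother gave the candle to the daughter", 0.99),
              ("the mother gave the daughter to the candle", 0.01)]
# Even though the implausible sentence matches the signal exactly, the strong
# prior pulls the listener toward the plausible reading.
print(best_guess("the mother gave the daughter to the candle", candidates))
```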
I mean, that's really cool to sort of model.
If you don't hear certain parts of a sentence or have some probability of
missing that part, like how do you construct the language that's resilient to
that? That's somewhat robust to that.
Yeah, that's the idea. And then you're kind of saying like the word order
and the syntax of the language, the dependency length are all helpful. Yeah, well, the dependency
length is really about memory, right? I think that's like about sort of what's easier or harder
to produce in some way. And these other ideas are about robustness to communication, so the problem
of potential loss of signal due to noise.
There may be aspects of word order which are somewhat optimized for that.
We have this one guess in that direction.
These are kind of just those stories, I have to be pretty frank.
They're not like, I can't show this is true.
All we can do is look at the current languages of the world.
We can't see how languages change or anything because we've got these snapshots
of a few hundred or a few thousand languages.
We don't really, and we can't do the right kinds of modifications to test these things
experimentally.
And so, you know, so just take this with a grain of salt, okay, from this stuff.
The dependency stuff I can, I'm much more solid on.
I'm like, here's what the lengths are,
and here's what's hard, and here's what's easy,
and this is a reasonable structure.
I think I'm pretty reasonable.
Here's like why, you know, why does a word order
look the way it does?
We're now into shaky territory, but it's kind of cool.
But we're talking about, just to be clear,
we're talking about maybe just actually
the sounds of communication.
Like you and I are sitting in a bar, it's very loud,
and you model with a noisy channel the loudness,
the noise, and we have the signal that's coming across,
and you're saying word order might have something to do
with optimizing that, the presence of noise.
Yes, yes.
I mean, it's really interesting.
I mean, to me, it's interesting how much you can load
into the noisy channel.
Like, how much can you bake in?
You said like, you know, cognitive load on the receiver end.
We think that there's at least three different kinds of things going on there.
We probably don't want to treat them all as the same.
Sure.
And so I think that the right model, a better model of a noisy channel would have three
different sources of noise, which are background noise, speaker
inherent noise and listener inherent noise. And those are all different things.
Sure. But then underneath it, there's a million other subsets.
Yeah. Oh, yeah. That's true.
Under receiving, I just mentioned cognitive load on both sides. Then there's speech impediments,
or just everything.
World view, I mean, the meaning,
we start to creep into the meaning realm
of like, we have different worldviews.
Well, how about just form still though?
Like just what language you know.
Like, so how well you know the language.
And so if it's second language for you
versus first language, and maybe what other languages
you know, these are still just form stuff.
And that's like potentially very informative.
And how old you are, these things probably matter.
So like a child learning a language
has a noisy representation of English grammar,
depending on how old they are.
So maybe when they're six, they're perfectly formed.
But-
You mentioned one of the things is like a way to measure
a language is learning problems.
So like, what's the correlation between everything
we've been talking about and how easy it is
to learn a language?
So is like short dependencies correlated
to ability to learn a language?
Is there some kind of, or like the dependency grammar,
is there some kind of connection there?
How easy it is to learn?
Yeah, well, of all the languages in the world,
none, as far as we know right now, is any better than any other
with respect to sort of optimizing dependency lengths,
for example.
They all kind of do it, do it well.
They all keep them low.
So I think of every human language
as some kind of an optimization problem,
a complex optimization problem to this communication problem.
And so they've solved it, they're just sort of
noisy solutions to this problem of communication.
There's just so many ways you can do this.
So they're not optimized for learning,
they're probably not blessed for communication.
And learning, so yes, one of the factors which is,
yeah, so learning is messing this up a bit.
And so, for example, if it were just about
minimizing dependency lengths,
and that was all that matters,
so then we might find grammars
which didn't have regularity in their rules,
but languages always have regularity in their rules.
So what I mean by that is that if I wanted to say something
to you in the optimal way to say it,
it really mattered to me, all that mattered
was keeping the dependencies as close together as possible.
Then I would have a very lax set of phrase structure
or dependency rules.
They wouldn't have very many of those.
I would have very little of that.
And I would just put the words as close,
the things that refer to the things that are connected
right beside each other.
But we don't do that.
Like there are word order rules, right?
So they're very, and depending on the language,
they're more and less strict, right?
So you speak Russian, they're less strict than English.
English has very rigid word order rules.
We order things in a very particular way.
And so why do we do that?
Like that's probably not about communication,
that's probably about learning.
I mean, then we're talking about learning.
It's like probably easier to learn regular things,
things which are very predictable and easy to,
so that's probably about learning is our guess,
because that can't be about communication.
Can it be just noise?
Can it be just the messiness
of the development of a language?
Well, if it were just a communication,
then we should have languages
which have very, very free word order.
And we don't have that.
We have free-er, but not free.
Like there's always.
Well, no, but what I mean by noise is like cultural,
like sticky cultural things,
like the way you communicate,
just there's a stickiness to it.
That it's an imperfect, it's a noisy,
it's stochastic, the function over which
you're optimizing is very noisy.
So, because I don't, it feels weird to say
that learning is part of the objective function,
because some languages are way harder
to learn than others, right?
Or is that, that's not true?
That's interesting. I mean, that's the public perception, right?
Yes, that's true for a second language.
For a second language.
But that depends on what you started with, right? So it really depends on how close that second
language is to the first language you've got. And so, yes, it's very, very hard to learn Arabic if
you've started with English, or it's hard to learn Japanese,
or, if you've started with English,
Chinese I think is the worst.
There's like Defense Language Institute in the United States
has like a list of how hard it is
to learn what language from English.
I think Chinese is the worst.
But this is the second language.
You're saying babies don't care.
No.
There's no evidence that there's anything harder or easier
about any language for any baby to learn.
Like by three or four they speak that language.
And so there's no evidence of anything harder or easier
about any human language.
They're all kind of equal.
To what degree is language,
this is returning to Chomsky a little bit, is innate.
You said that for Chomsky,
he used the idea that language is,
some aspects of language are innate to explain away
certain things that are observed.
How much are we born with language
at the core of our mind, brain?
I mean, the answer is I don't know, of course.
But I mean, I like to, I'm an engineer at heart, I guess,
and I sort of think it's fine to
postulate that a lot of it's learned.
And so I'm guessing that a lot of it's learned.
So I think the reason Chomsky went with the innateness is because he hypothesized movement
in his grammar.
He was interested in grammar and movement's hard to learn.
I think he's right.
Movement is a hard thing to learn, to learn these two things together and how they interact. And there's like a lot of
ways in which you might generate exactly the same sentences and it's like really hard. And so he's
like, oh, I guess it's learned. Sorry, I guess it's not learned, it's innate. And if you just
throw out the movement and just think about that in a different way, then you get some messiness.
But the messiness is human language, which it actually fits better.
That messiness isn't a problem.
It's actually a valuable asset of the theory.
So I think I don't really see a reason to postulate much innate structure.
And that's kind of why I think these large language models are learning so well, is because
I think you can learn the forms of human language from the input.
I think that's likely to be true.
So that part of the brain that lights up when you're doing all the comprehension, that could
be learned.
That could be just, you don't need any.
Yeah, it doesn't have to be innate.
So lots of stuff is modular in the brain that's learned.
It doesn't have to, you know, so there's something called the visual word form area in the back,
and so it's in the back of your head, near the, you know, the visual cortex, okay?
And that is very specialized language, sorry, very specialized brain area, which does visual
word processing if you read, if you're a reader,
okay?
If you don't read, you don't have it, okay?
Guess what?
You spend some time learning to read and you develop that brain area which does exactly
that.
And so these, the modularization is not evidence for innateness.
So the modularization of a language area doesn't mean we're born with it.
We could have easily learned that.
We might have been born with it. We could have easily learned that. We might have been born with it.
We just don't know at this point.
We might very well have been born
with this left lateralized area.
I mean, there's like a lot of other interesting components
here, features of this kind of argument.
So some people get a stroke or something goes really wrong
on the left side, where the language area would be,
and that isn't there, it's not available. And it
develops just fine on the right. So it's not about the left. It
goes to the left. This is a very interesting question. Why are any of the brain areas the
way that they are? How did they come to be that way? There's these natural experiments
which happen where people get these strange events in
their brains at very young ages which wipe out sections of their brain and they behave totally
normally and no one knows anything was wrong. And we find out later because they happen to be
accidentally scanned for some reason. It's like, what happened to your left hemisphere? It's missing.
There's not many people who miss their whole left hemisphere, but they'll be missing some other
section of their left or their right. And
they behave absolutely normally, would never know. So that's like a very interesting, you
know, current research. You know, this is another project that this person, Ev Fedorenko,
is working on. She's got all these people contacting her because she's scanned some
people who have been missing sections. One person missed a section of her brain
and was scanned in her lab.
And she happened to be a writer for the New York Times.
And there was an article in New York Times
about the, just about the scanning procedure
and about what might be learned about
by sort of the general process of MRI and language,
in an unnecessary language.
And because she's writing for the New York Times,
then all these people started writing to her,
who also have similar kinds of deficits
because they've been accidentally scanned for some reason
and found out they're missing some section.
They volunteer to be scanned.
So these are natural experiments.
Natural experiments.
They're kind of messy, but natural experiments are kind of cool.
She calls them interesting brains.
The first few hours, days, months of human life are fascinating.
It's like, well, inside the womb, actually, like that development.
That machinery, whatever that is, seems to create powerful humans
that are able to speak, comprehend, think, all that kind of stuff, no matter what happens. What would Chomsky say about the fact,
the thing you're saying now that language
seems to be happening separate from thought?
As far as I understand, maybe you can correct me,
he thought that language underpins.
Yeah, he thinks so.
I don't know what he'd say.
He would be surprised, because for him,
the idea is that language is the foundation of thought.
That's right, absolutely.
And it's pretty mind-blowing to think that it could be completely separate from thought.
That's right, but so you know he's basically a philosopher,
philosopher of language in a way, thinking about these things. It's a fine thought. You can't test
it in his methods. You can't do a thought experiment to figure that out. You need a
scanner. You need brain damage people. You need something, you need ways to measure that. And that's what, you know, fMRI offers
as a, and, you know, patients are a little messier. fMRI is pretty unambiguous, I'd say.
It's like very unambiguous. There's no way to say that the language network is doing
any of these tasks. There's like, you should look at those data.
It's like, there's no chance that you can say
that those networks are overlapping.
They're not overlapping.
They're just like completely different.
And so, you know, you can always make,
you know, it's only two people, it's four people
or something for the patients.
And there's something special about them we don't know.
But these are just random people and with lots of them.
And you find always the same effects,
and it's very robust, I'd say.
That's a fascinating effect.
You mentioned Bolivia.
What's the connection between culture and language?
You've also mentioned that much of our study of language
comes from W-E-I-R-D, weird people: Western, educated, industrialized, rich, and democratic.
So when you study remote cultures,
such as around the Amazon jungle,
what can you learn about language?
So that term weird is from Joe Henrich.
He's at Harvard.
He's a Harvard evolutionary biologist.
And so he works on lots of different topics.
And he basically was pushing that observation that we should be careful about the inferences
we want to make when we're talking in psychology or social, yeah, mostly in psychology, I guess, about
humans, if we're talking about, you know,
undergrads at MIT and Harvard. Those aren't the same, right? These aren't the same things.
And so if you want to make inferences about language, for instance,
there's a lot of other kinds of languages in the world than English and
French and Chinese.
And so maybe for language, we care about how culture, because cultures can be very, I mean,
of course English and Chinese cultures are very different, but hunter-gatherers are much
more different in some ways.
And so if culture has an effect on what language is, then we kind of want to look there as well as looking.
It's not like the industrialized cultures aren't interesting. Of course they are.
But we want to look at non-industrialized cultures as well. And so I worked with two.
I worked with the Chimani, who are in Bolivia, in the Amazon, both in the Amazon in these cases.
They are so-called farmer-foragers, which is not hunter-gatherers.
It's sort of one-up from hunter-gatherers in that they do a little bit of farming as well,
a lot of hunting as well, but a little bit of farming. And the kind of farming they do
is the kind of farming that I might do if I ever were to grow tomatoes or something in my backyard.
So it's not like big field farming. It's just farming for a family, a few things you do
that.
So that's the kind of farming they do.
And the other group I've worked with are the Piraha, which are also in the Amazon and happen
to be in Brazil.
And that's with a guy called Dan Everett, who is a linguist, anthropologist who actually
lived and worked in the, I mean, he was a missionary
actually initially back in the 70s working with trying to translate languages so they could teach
them the Bible, teach them Christianity. What can you say about that?
Yeah, so the two groups I've worked with, the Chimani and the Piraha, are both isolate languages,
meaning there's no known connected languages at all, just like on
their own. There's a lot of those and most of the isolates occur in the Amazon or in Papua New
Guinea, in these places where the world has sort of stayed still for long enough. There aren't earthquakes,
well, certainly no earthquakes in the Amazon jungle, and the climate isn't bad.
So you don't have droughts. And so, you know, in Africa, you've got a lot of moving of people
because there's drought problems. And so, so they get a lot of language contact when you have,
when people have to, if you you got to move because you got
no water, then you got to get going. And then you run into contact with other tribes, other groups.
In the Amazon, that's not the case. And so people can stay there for hundreds and hundreds and
probably thousands of years, I guess. And so these groups, the Chimani and the Piraha are
both isolates in that sense. And I guess they've just lived there
for ages and ages with minimal contact with other outside groups.
And so I mean, I'm interested in them because they are, I mean, in these cases, I'm interested
in their words.
I would love to study their syntax, their orders of words, but I'm mostly just interested
in how languages are
connected to their cultures in this way. And so with the Piraha, the most interesting, I was
working on number there, number information. And so the basic idea is I think language is invented.
That's what I get from the words here is that I think language is invented. We talked about color
earlier. It's the same idea. so that what you need to talk about
with someone else is what you're going to invent words for.
And so we invent labels for colors that I need, not that I can see, but the things I
need to tell you about so that I can get objects from you or get you to give me the right objects.
And I just don't need a word for teal or a word for aquamarine in the Amazon jungle for
the most part, because I don't have two things which differ on those colors.
I just don't have that.
And so numbers are really another fascinating source of information here where you might,
naively, I certainly thought that all humans would have words for exact counting, and the Piraha don't.
Okay, so they don't have any words for even one. There's not a word for one in their language.
And so there's certainly not a word for two, three, or four. So that kind of blows people's
minds off. Yeah, that's blowing my mind. That's pretty weird. How are you going to ask,
I want two of those. You just don't. And so that's just not a thing you can possibly ask in Piraha. It's not possible. That is, there's
no words for that. So here's how we found this out. Okay. So it was thought to be a one, two,
many language. There are three words, three quantifiers for sets. But people had thought
that those meant one, two, and many. But what they really mean is few, some, and many.
Many is correct.
It's few, some, and many.
And so the way we figured this out, and this is kind of cool,
is that we gave people a set of objects.
These happened to be spools of thread.
Doesn't really matter what they are.
Identical objects.
And when I sort of start off here,
I just give you one of those and say, what's that?
Okay, so you're a Piraha speaker
and you tell me what it is.
And then I give you two and say, what's that?
And nothing's changing in this set except for the number.
Okay, and then I just ask you to label these things.
We just do this for a bunch of different people.
And frankly, I did this task.
This is fascinating.
And it's a weird, it's a little bit weird.
So they say the word that we thought was one, it's few, for the first one, and then maybe they say few, or maybe they
say some for the second. And then for the third or the fourth, they start using the word many for
the set. And then five, six, seven, eight, I go all the way to 10. And it's always the same word. And
they look at me like I'm stupid because they told me what the word was for six, seven, eight, and I'm going to continue asking them at nine and ten. I'm sorry. They understand
that I want to know their language. That's the point of the task is like I'm trying to learn
their language and so that's okay. But it does seem like I'm a little slow because they already
told me what the word for many was five, six, seven, and I keep asking. So it's a little funny
to do this task over and over.
We did this with a guy called, Dan was our translator.
He's the only one who really speaks Piraha fluently.
He's a good bilingual for a bunch of languages,
but also English and Piraha.
And then a guy called Mike Frank was also a student
with me down there.
He and I did these things.
And so you do that,
okay? And everyone does the same thing. We ask like 10 people and they all do exactly the same
labeling for one up. And then we just do the same thing down on random order actually. We do some of
them up, some of them down first, okay? And so we do, instead of one to 10, we do 10 down to one.
And so I give them 10, nine, and eight, they start saying the word for some.
And then when you get to four,
everyone is saying the word for few,
which we thought was one.
So it's like the context determined
what word, what that quantifier they used was.
So it's not a count word, they're not count words,
they're just approximate words.
And they're gonna be noisy
when you interview a bunch of people
with the definition of few
and there's gonna be a threshold in the context.
Yeah, I don't know what that means.
That's gonna be dependent on the context.
I think it's true in English too, right?
If you ask an English person what a few is,
I mean, that's gonna depend completely on the context.
And it might actually be at first hard to discover
because for a lot of people,
the jump from one to two will be few, right?
So it's a jump.
Yeah, it might be. It might still be there, yeah.
I mean, that's fascinating.
That's fascinating that numbers don't present themselves.
So the words aren't there.
And then we do these other things.
Well, if they don't have the words,
can they do exact matching kinds of tasks?
Can they even do those tasks?
And the answer is sort of yes and no.
And so, yes, they can do them.
So here's the tasks that we did. We put out those spools of thread again. Okay. So I put like three
out here and then we gave them some objects and those happened to be uninflated red balloons.
It doesn't really matter what they are. It's just they're a bunch of exactly the same thing.
And it was easy to put them down right next to these spools of thread.
And so then I put out three of these and your task was to just put one against each of my
three things.
And they could do that perfectly.
So I mean, I would actually do that.
It was a very easy task to explain to them because I did this with this guy, Mike Frank,
and he would be my, I'd be the experimenter telling him to do this and showing him to
do this. And then we just like, just do what he did, you know, copy him. All we had to, I didn't
have to speak to Peter Ha, except for know what, copy him, like do what he did is like
all we had to be able to say. And then they would do that just perfectly. And so we'd
move it up. We'd do some sort of random number of items up to 10 and they basically do perfectly
on that. They never get that wrong. I mean, that's not a counting task, right?
That is just a match.
You just put one against another.
It doesn't matter how many.
I don't need to know how many there are there
to do that correctly.
And they would make mistakes, but very, very few
and no more than MIT undergrads.
I'm just gonna say, like, this is low stakes.
So, you know, you make mistakes.
Counting is not required to complete the matching task.
That's right, not at all.
Okay, and so that's our control. And this guy had gone down there before So you make mistakes. So counting is not required to complete the matching task. That's right. Not at all. OK.
And so that's our control.
And this guy had gone down there before
and said that they couldn't do this task.
But I just don't know what he did wrong there,
because they can do this task perfectly well.
And I can train my dog to do this task.
So of course they can do this task.
And so it's not a hard task.
But the other task that was more interesting is like,
so then we do a bunch of tasks where you need
some way to encode the set. So, like, one of them is, I just put an
opaque sheet in front of the other things.
I put down a set of these things and I put an opaque sheet down, and so you can't see them anymore.
And I tell you do the same thing you were doing before.
And it's easy if it's two or three, it's very easy.
But if I don't have the words for eight,
it's a little harder.
Like maybe with practice, well, no.
Because you have to count.
For us, it's easy because we just count them.
It's just so easy to count them.
But they can't count them because they don't count.
They don't have words for this thing.
And so they would do approximate.
It's totally fascinating.
So they would get them approximately right,
after four or five,
because you can basically always get four right,
three or four, that looks,
that's something we can visually see.
But after that, you kind of have,
it's an approximate number.
And so then, and there was a bunch of tasks we did
and they all failed, I mean failed.
They did approximate after five on all those tasks.
And it kind of shows that the words,
you kind of need the words, you know,
to be able to do these kinds of tasks.
There's a little bit of a chicken and egg thing there,
because if you don't have the words,
then maybe that'll limit you in the kind of,
like a little baby Einstein there,
won't be able to come up with a counting system.
You know what I mean?
Like the ability to count enables you to come up
with interesting things probably.
So yes, you develop counting because you need it,
but then once you have counting,
you can probably come up with a bunch of different inventions.
Like how to, I don't know, what kind of thing,
they do matching really well for building purposes,
building some kind of hut or something like this.
So it's interesting that language is a limiter on what you're able to do.
Yeah, here language just is the words, here it's the words.
Like the words for exact count are the limiting factor here.
They just don't have them.
Yeah, well that's what I mean.
That limit is also a limit on the society
and what they're able to build.
That's gonna be true, yeah.
So it's probably, I mean we don't know,
this is one of those problems with the snapshot
of just current languages is that we don't know
what causes a culture to discover slash invent
a counting system, but the hypothesis is the guess
out there is something to do with farming.
So if you have a bunch of goats
and you wanna keep track of them,
and say you have 17 goats
and you go to bed at night and you get up in the morning,
boy, it's easier to have a count system to do that.
You know, that's an abstraction over a set.
So that I don't have, like, people often ask me
when I talk to them about this kind of work,
and they say, well, don't these,
don't the Pirahã, don't they have kids?
Don't they have a lot of children?
I'm like, yeah, they have a lot of children.
And they do, they often have families
of three or four or five kids.
And they go, well, don't they need the numbers
to keep track of their kids?
And I always ask the person who says this, like, do you have children?
And the answer is always no, because that's not how you keep track of your kids.
You care about their identities.
It's very important to me. It's not like I go, I think I have five children.
It doesn't just matter how many, it matters which five. It's like,
If you replace one with someone else,
I would care.
A goat, maybe not, right?
That's the kind of point.
It's an abstraction.
Something that looks very similar to the one
wouldn't matter to me, probably.
But if you care about goats,
you're gonna know them actually individually also.
Yeah, you will.
I mean, cows and goats, if that's the source of food
and milk and all that kind of stuff,
you're gonna actually really care.
But I'm saying it is an abstraction such that
you don't have to care about their identities
to do this thing fast.
That's the hypothesis, not mine.
From anthropologists as they're guessing
about where words for counting came from
is from farming maybe.
Do you have a sense why universal languages
like Esperanto have not taken off?
Like why do we have all these different languages?
Well my guess is, the function of a language
is to do something in a community.
And I mean, unless there's some function
to that language in the community,
it's not gonna survive, it's not gonna be useful.
So here's a great example.
So language death is super common, okay?
Languages are dying all around the world.
And here's why they're dying.
And it's like, yeah, I see this in, you know,
it's not happening right now in either the Chimane
or the Pirahã, but it probably will.
And so there's a neighboring group called Mosetén,
which is, I said that it's a,
I said it's actually, there's a dual, there's two of them.
Okay, so it's actually, there's two languages which are really close,
which are Mosetén and Chimane, which are unrelated to anything else.
And Mosetén is unlike Chimane in that it has a lot of contact with
Spanish, and it's dying. So that language is dying, and the reason it's dying is there's not a lot of
value for the local people in their native language.
So there's much more value in knowing Spanish, like because they want to feed their families.
And how do you feed your family?
You learn Spanish so you can make money, so you can get a job and do these things and
then you make money.
And so Spanish gets them the things they want.
And so Mosetén is in danger and is dying.
And that's normal.
And so basically the problem is that people,
the reason we learn language is to communicate.
And we use it to make money and to do whatever it is
to feed our families.
And if that's not happening, then it won't take off.
It's not like a game or something.
This is like something we use,
like why is English so popular?
It's not because it's an easy language to learn.
Maybe it is, I don't really know.
But that's not why it's popular.
But because it's the United States,
it's a gigantic economy and therefore.
Big economies that do this.
That's all it is.
It's all about money, and that's why
there's a motivation to learn Mandarin.
There's a motivation to learn Spanish.
There's a motivation to learn English.
These languages are very valuable to know
because there's so, so many speakers all over the world.
That's fascinating.
There's less of a value economically.
It's like kind of what drives this.
It's not a, you know, it's not just for fun.
I mean, there are these groups that do want
to learn language just for language's sake
and they want, and then there's something, you know, to that.
but those are rare, those are rarities in general.
Those are a few small groups that do that.
Most people don't do that.
Well, if that was a primary driver,
then everybody would be speaking English
or speaking one language.
There's also a tension.
That's happening.
And that, well, well.
We're moving towards fewer and fewer languages.
We are.
I wonder if, you're right, maybe, you know, this is slow,
but maybe that's where we're moving.
But there is a tension,
you're saying a language that infringes,
but if you look at geopolitics and superpowers,
it does seem that there's another thing in tension,
which is a language is a national identity sometimes.
Oh yeah. For certain nations, I mean, that's the war in Ukraine.
Language, Ukrainian language is a symbol of that war
in many ways, like a country fighting for its own identity.
So it's not merely the convenience.
I mean, those two things are in tension,
is the convenience of trade and the economics
and be able to communicate with neighboring countries
and trade more efficiently with neighboring countries,
all that kind of stuff, but also identity of the group.
I completely agree.
Language is that way for every community,
like dialects that emerge are a kind of identity for people
and sometimes a way for people to say F-U
to the more powerful people.
That's interesting.
So in that way, language can be used as that tool.
Yeah, I completely agree and there's a lot of work
to try to create that identity so people want to do that.
As a cognitive scientist and language expert,
I hope that continues because I don't want languages to die.
I want languages to survive because they're so interesting
for so many reasons.
But I mean, I find them fascinating
just for the language part,
but I think there's a lot of connections to culture as well,
which is also very important.
Do you have hope for machine translation
that can break down the barriers of language?
So while all these different diverse languages exist,
I guess there's many ways of asking this question,
but basically how hard is it to translate
in an automated way from one language to another?
There's gonna be cases
where it's gonna be really hard, right?
So there are concepts that are in one language
and not in another.
The most extreme kinds of cases are these cases
of number information.
So good luck translating a lot of English into Piraha.
It's just impossible.
There's no way to do it because there are no words
for these concepts that we're talking about.
There's probably the flip side, right?
There's probably stuff in Piraha,
which is gonna be hard to translate into English
on the other side.
And so I just don't know what those concepts are.
I mean, their world space is a little
different from my world space.
And so I don't know, like,
the things they talk about,
it's gonna have to do with their life
as opposed to my industrial life,
which is gonna be different.
And so there's gonna be problems like that always.
Maybe it's not so bad in the case of some of these spaces
and maybe it's gonna be harder in others.
And so it's pretty bad in number.
It's like extreme, I'd say, in the number space,
exact number space, but in the color dimension, right?
So that's not so bad.
But it's a problem
that you don't have ways to talk about the concepts.
There's some.
And there might be entire concepts that are missing.
So to you, it's more about the space of concept
versus the space of form.
Like form, you can probably map.
Yes.
Yeah, but so you were talking earlier about translation
and about how translations, you know, there's good and bad translations. Now we're
talking about translations of form, right? So what makes writing good?
There's music to the form.
It's not just the content. It's how it's written. And translating that, that sounds difficult.
We should say that there is, like, I don't hesitate to say meaning, but
there's a music and a rhythm to the form. When you look at the broad picture, like the difference
between Dostoevsky and Tolstoy, or Hemingway, Bukowski, James Joyce, like I mentioned,
there's a beat to it, there's an edge to it that is in the form.
We can probably get measures of those.
Yeah.
I don't know.
I'm optimistic that we could get measures of those things
and so maybe that's
Translatable. I don't know.
I don't know though.
I have not worked on that.
I would love to see translation to Hemingway.
I mean, Hemingway is probably the lowest,
I would love to see different authors,
but the average per sentence dependency length
for Hemingway is probably the shortest.
That's your sense, huh?
It's simple sentences with short, yeah, yeah, yeah.
I mean, that's when, if you have really long sentences,
even if they don't have center embedding, like.
They can have longer connections.
They don't have to, right?
You can have a long, long sentence
with a bunch of local words, yeah.
But it is much more likely to have the possibility
of long dependencies with long sentences, yeah.
I met a guy named Aza Raskin who does a lot of cool stuff,
really brilliant, works with Tristan Harris
and a bunch of stuff.
But he was talking to me about communicating with animals.
He co-founded Earth Species Project,
where you're trying to find the common language
between whales, crows, and humans.
And he was saying that there's a lot of promising work,
that even though the signals are very different,
like the actual, like, if you have embeddings
of the languages, they're actually trying
to communicate similar type things.
Is there something you can comment on that,
like where is there promise to that
in everything you've seen in different cultures,
especially like remote cultures,
that this is a possibility?
No.
Like we can talk to whales.
I would say yes.
I think it's not crazy at all.
I think it's quite reasonable.
There's this sort of weird view, well, odd view, I think,
that to think that human language is somehow special.
I mean, it is, maybe it is.
We can certainly do more than any of the other species.
And maybe our language system is part of that.
It's possible.
But people have often talked about how,
like Chomsky in fact has talked about how,
only human language has this compositionality thing
that he thinks is sort of key in language.
And the problem with that argument
is he doesn't speak whale.
And he doesn't speak crow and he doesn't speak monkey.
You know, he's like, they say things like,
well, they're making a bunch of grunts and squeaks.
And that reasoning is like, that's bad reasoning.
Like, you know, I'm pretty sure if you asked a whale
what we're saying, they'd say, well, they're making a bunch
of weird noises.
Exactly.
And so it's like, this is a very odd reasoning
to be making that human language is special
because we're the only ones who have human language.
I'm like, well, we don't know what those other,
we just don't, we can't talk to them yet.
And so there's probably a signal in there, and it might very well
be something complicated like human language.
I mean, sure with a small brain in lower species,
there's probably not a very good communication system,
but in these higher species where you have,
what seems to be abilities to communicate something,
there might very well be a lot more signal there
than we might have otherwise thought.
But also if we have a lot of intellectual humility here,
there's somebody formerly from MIT,
Neri Oxman, who I admire very much,
has talked a lot about, has worked on
communicating with plants.
So like, yes, the signal there is even less than, but like it's not out of the realm of possibility
that all nature has a way of communicating.
And it's a very different language,
but they do develop a kind of language through the chemistry,
through some way of communicating with each other.
And if you have enough humility about that possibility,
I think you can, I think it would be a very interesting
in a few decades, maybe centuries, hopefully not,
a humbling possibility of being able to communicate
not just between humans effectively,
but between all of living things on Earth.
Well, I mean, I think some of them
are not gonna have much interesting to say.
But some of them will.
We don't know.
We certainly don't know.
I think if we were humble,
there could be some interesting trees out there.
Well, they're probably talking to other trees, right?
They're not talking to us.
And so to the extent they're talking,
they're saying something interesting to some other
conspecific as opposed to us, right?
And so there probably is, there may be some signal there.
So there are people out there,
actually it's pretty common to say that human language
is special and different
from any other animal communication system.
And I just don't think the evidence is there for that claim.
I think it's not obvious.
We just don't know,
because we don't speak these other communication systems
until we get better.
I do think there are people working on that,
as you pointed out,
the people working on WhaleSpeak, for instance,
that's really fascinating.
Let me ask you a wild, out there sci-fi question.
If we make contact with an intelligent alien civilization
and you get to meet them, how hard do you think,
like how surprised would you be
about their way of communicating?
Do you think it would be recognizable?
Maybe there's some parallels here
when you go to the remote tribes.
I mean, I would want Dan Everett with me.
He is like amazing at learning foreign languages.
And so he, like, this is an amazing feat, right?
To be able to go, this is a language,
Pirahã, which had no translators before him.
I mean, there were- Oh, wow.
He was a missionary- He just shows up.
Well, there was a guy that had been there before,
but he wasn't very good.
And so he learned the language far better
than anyone else had learned before him.
He's like good at, he's a very social person.
I think that's a big part of it, is being able to interact.
So I don't know, it kind of depends on these species
from outer space how much they wanna talk to us.
Is there something you can say about the process he follows?
Like how do you show up to a tribe and socialize?
I mean, I guess colors and counting
is one of the most basic things to figure out.
Yeah, you start that.
You actually start with like objects
and just say, you know, just throw a stick down
and say stick and then you say, what do you call this?
And then they'll say the word whatever.
And he says a standard thing to do is to throw two sticks.
Two sticks and then, you know, he learned pretty quick
that there weren't any count words in this language
because they didn't, you know, this wasn't interesting.
It was kind of weird.
They'd say some or something, the same word over and over again.
But that is a standard thing.
But you have to be pretty out there socially, willing to talk to random people, which these
are really very different people from you.
And he's very social.
And so I think that's a big part of this is that's how a lot of people know a lot of languages,
is they're willing to talk to other people.
That's a tough one, where you just show up knowing nothing.
Yeah, oh God.
That's beautiful that humans are able to connect in that way.
Yeah, yeah.
You've had an incredible career
exploring this fascinating topic.
What advice would you give to young people
about how to have a career like that,
or a life that they can be proud of?
When you see something interesting, just go and do it.
Like I do that.
Like that's something I do,
which is kind of unusual for most people.
So like when I saw the,
like if the Pirahã were available to go and visit,
I was like, yes, yes, I'll go.
And then when we couldn't go back,
we had some trouble with the Brazilian government. There's some corrupt people there. It was
very difficult to go back in there. And so I was like, all right, I got to find another
group. And so we searched around and we were able to find the Chimane because I wanted
to keep working on this kind of problem. And so we found the Chimane and just went there.
I didn't really have, we didn't have contact, we had a little bit of contact and brought someone and that was, you know, we just kind of just try things. I say it's like,
a lot of that's just like ambition, just try to do something that other people haven't done. Just
give it a shot is what I, I mean, I do that all the time. I don't know.
I love it. And I love the fact that your pursuit of fun has landed you here talking to me. This
was an incredible conversation.
Ted, you're just a fascinating human being.
Thank you for taking a journey through human language
with me today, this is awesome.
Thank you very much, Lex, what a pleasure.
Thanks for listening to this conversation
with Edward Gibson.
To support this podcast, please check out our sponsors
in the description.
And now, let me leave you with some words from Wittgenstein.
The limits of my language mean the limits of my world.
Thank you for listening and hope to see you next time.