Steven Pinker: Linguistics as a Window to Understanding the Brain | Big Think
My name is Steve Pinker, and I’m Professor of Psychology at Harvard University. And today I’m going to speak to you about language. I’m actually not a linguist, but a cognitive scientist. I’m not so much interested in language as an object in its own right, but as a window to the human mind.
Language is one of the fundamental topics in the human sciences. It’s the trait that most conspicuously distinguishes humans from other species; it’s essential to human cooperation. We accomplish amazing things by sharing our knowledge or coordinating our actions by means of words. It poses profound scientific mysteries such as, how did language evolve in this particular species? How does the brain compute language? But also, language has many practical applications, not surprisingly given how central it is to human life.
Language comes so naturally to us that we’re apt to forget what a strange and miraculous gift it is. But think about what you’re doing for the next hour. You’re going to be listening patiently as a guy makes noise as he exhales. Now, why would you do something like that? It’s not that I can claim that the sounds I’m going to make are particularly mellifluous, but rather I’ve coded information into the exact sequences of hisses and hums and squeaks and pops that I’ll be making. You have the ability to recover the information from that stream of noises, allowing us to share ideas.
Now, the ideas we are going to share are about this talent, language, but with a slightly different sequence of hisses and squeaks, I could cause you to be thinking thoughts about a vast array of topics, anything from the latest developments in your favorite reality show to theories of the origin of the universe. This is what I think of as the miracle of language, its vast expressive power, and it’s a phenomenon that still fills me with wonder, even after having studied language for 35 years. And it is the prime phenomenon that the science of language aims to explain.
Not surprisingly, language is central to human life. The Biblical story of the Tower of Babel reminds us that humans accomplish great things because they can exchange information about their knowledge and intentions via the medium of language. Language, moreover, is not a peculiarity of one culture, but it has been found in every society ever studied by anthropologists. There are some 6,000 languages spoken on Earth, all of them complex, and no one has ever discovered a human society that lacks complex language.
For this and other reasons, Charles Darwin wrote, “Man has an instinctive tendency to speak, as we see in the babble of our young children, while no child has an instinctive tendency to bake, brew or write.” Language is an intricate talent, and it’s not surprising that the science of language should be a complex discipline. It includes the study of how language itself works, including: grammar, the assembly of words, phrases and sentences; phonology, the study of sound; semantics, the study of meaning; and pragmatics, the study of the use of language in conversation.
Scientists interested in language also study how it is processed in real time, a field called psycholinguistics; how it is acquired by children, the study of language acquisition; and how it is computed in the brain, the discipline called neurolinguistics.
Now, before we begin, it’s important to not confuse language with three other things that are closely related to language. One of them is written language. Unlike spoken language, which is found in all human cultures throughout history, writing was invented a very small number of times in human history, about 5,000 years ago. And alphabetic writing, where each mark on the page stands for a vowel or a consonant, appears to have been invented only once in all of human history by the Canaanites about 3,700 years ago. And as Darwin pointed out, children have no instinctive tendency to write, but have to learn it through instruction and schooling.
A second thing not to confuse language with is proper grammar. Linguists distinguish between descriptive grammar—the rules that characterize how people speak—and prescriptive grammar—rules that characterize how people ought to speak if they are writing careful written prose. A dirty secret from linguistics is that not only are these not the same kinds of rules, but many of the prescriptive rules of language make no sense whatsoever.
Take one of the most famous of these rules, the rule not to split infinitives. According to this rule, Captain Kirk made a grievous grammatical error when he said that the mission of the Enterprise was “to boldly go where no man has gone before.” He should have said, according to these editors, “to go boldly where no man has gone before,” which immediately clashes with the rhythm and structure of ordinary English.
In fact, this prescriptive rule was based on a clumsy analogy with Latin, where you can’t split an infinitive because it’s a single word, as in “facere,” to do. Julius Caesar couldn’t have split an infinitive if he wanted to. That rule was translated literally over into English, where it really should not apply.
Another famous prescriptive rule is that one should never use a so-called double negative. Mick Jagger should not have sung, “I can’t get no satisfaction,” he really should have sung, “I can’t get any satisfaction.” Now, this is often promoted as a rule of logical speaking, but “can’t” and “any” is just as much of a double negative as “can’t” and “no.”
The only reason that “can’t get any satisfaction” is deemed correct and “can’t get no satisfaction” is deemed ungrammatical is that the dialect of English spoken in the south of England in the 17th century used “can’t” and “any” rather than “can’t” and “no.” If the capital of England had been in the north of the country instead of the south of the country, then “can’t get no” would have been correct and “can’t get any” would have been deemed incorrect.
There’s nothing special about a language that happens to be chosen as the standard for a given country. In fact, if you compare the rules of languages and so-called dialects, each one is complex in different ways. Take for example, African-American Vernacular English, also called Black English or Ebonics. There is a construction in African-American English where you can say, “He be workin,” which is not an error or bastardization or a corruption of Standard English, but in fact conveys a subtle distinction, one that’s different than simply, “He workin.”
“He be workin,” means that he is employed; he has a job. “He workin,” means that he happens to be working at the moment that you and I are speaking. Now, this is a tense difference that can be made in African-American English that is not made in Standard English, one of many examples in which the dialects have their own set of rules that is just as sophisticated and complex as the one in the standard language.
Now, a third thing not to confuse language with is thought. Many people report that they think in language, but cognitive psychologists have shown that there are many kinds of thought that don’t actually take place in the form of sentences.
(1.) Babies (and other mammals) communicate without speech. For example, we know from ingenious experiments that non-linguistic creatures, such as babies before they’ve learned to speak, or other kinds of animals, have sophisticated kinds of cognition. They register cause and effect and objects and the intentions of other people, all without the benefit of speech.
(2.) Types of thinking go on without language: visual thinking. We also know that even in creatures that do have language, namely adults, a lot of thinking goes on in forms other than language, for example, visual imagery. Suppose you look at the top two three-dimensional figures in this display, and I ask you, do they have the same shape or a different shape? People don’t solve that problem by describing those strings of cubes in words, but rather by taking an image of one and mentally rotating it into the orientation of the other, a form of non-linguistic thinking.
(3.) We use tacit knowledge to understand language and remember the gist. For that matter, even when you understand language, what you come away with is not in itself the actual language that you hear. Another important finding in cognitive psychology is that long-term memory for verbal material records the gist or the meaning or the content of the words rather than the exact form of the words.
For example, I like to think that you retain some memory of what I have been saying for the last 10 minutes. But I suspect that if I were to ask you to reproduce any sentence that I have uttered, you would be incapable of doing so. What sticks in memory is far more abstract than the actual sentences, something that we can call meaning or content or semantics.
In fact, when it even comes to understanding a sentence, the actual words are the tip of a vast iceberg of a very rapid, unconscious, non-linguistic processing that’s necessary even to make sense of the language itself. And I’ll illustrate this with a classic bit of poetry, the lines from the shampoo bottle: “Wet hair, lather, rinse, repeat.”
Now, in understanding that very simple snatch of language, you have to know, for example, that when you repeat, you don’t wet your hair a second time because it’s already wet. And when you get to the end of it and you see “repeat,” you don’t keep repeating over and over in an infinite loop; repeat here means, “repeat just once.”
Now, this tacit knowledge of what the writers of language had in mind is necessary to understand language, but it itself is not language.
(4.) If language is thinking, then where did it come from? Finally, if language were really thought, it would raise the question of where language would come from if we were incapable of thinking without language. After all, the English language was not designed by some committee of Martians who came down to Earth and gave it to us. Rather, language is a grassroots phenomenon. It’s the original wiki, which aggregates the contributions of hundreds of thousands of people who invent jargon and slang and new constructions.
Some of them get accumulated into the language as people seek out new ways of expressing their thoughts, and that’s how we get a language in the first place. Now, this is not to deny that language can affect thought, and linguistics has long been interested in what has sometimes been called the linguistic relativity hypothesis or the Sapir-Whorf Hypothesis, named after the two linguists who first formulated it, namely that language can affect thought.
There’s a lot of controversy over the status of the linguistic relativity hypothesis, but no one believes that language is the same thing as thought and that all of our mental life consists of reciting sentences.
Now that we have set aside what language is not, let’s turn to what language is, beginning with the question of how language works. In a nutshell, you can divide language into three topics.
There are the words that are the basic components of sentences that are stored in a part of long-term memory that we can call the mental lexicon or the mental dictionary. There are rules, the recipes or algorithms that we use to assemble bits of language into more complex stretches of language, including syntax—the rules that allow us to assemble words into phrases and sentences; morphology—the rules that allow us to assemble bits of words, like prefixes and suffixes into complex words; phonology—the rules that allow us to combine vowels and consonants into the smallest words.
And then all of this knowledge of language has to connect to the world through interfaces that allow us to understand language coming from others and to produce language that others can understand: the language interfaces.
Let’s start with words. The basic principle of a word was identified by the Swiss linguist, Ferdinand de Saussure, more than 100 years ago when he called attention to the arbitrariness of the sign. Take for example the word, “duck.” The word, “duck” doesn’t look like a duck or walk like a duck or quack like a duck, but I can use it to get you to think the thought of a duck because all of us at some point in our lives have memorized that brute force association between that sound and that meaning, which means that it has to be stored in memory in some format.
In a very simplified form, an entry in the mental lexicon might look something like this: there is a symbol for the word itself, there is some kind of specification of its sound, and there’s some kind of specification of its meaning.
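As a rough illustration of such an entry, here is a minimal sketch in Python. The class name `LexicalEntry`, its field names, and the informal spellings of the sound and the meaning are all hypothetical choices made for this example; they are not a claim about how the brain actually stores words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """One hypothetical entry in a toy mental lexicon."""
    symbol: str   # a label for the word itself
    sound: str    # a rough specification of its pronunciation
    meaning: str  # a rough specification of what it refers to


# The pairing of sound and meaning is arbitrary: nothing about the
# string "duhk" resembles a bird, so the link simply has to be stored.
mental_lexicon = {
    "duck": LexicalEntry(
        symbol="duck",
        sound="duhk",
        meaning="a waterbird that quacks",
    ),
}

entry = mental_lexicon["duck"]
print(entry.sound, "->", entry.meaning)
```

The point of the sketch is only that each word needs its own stored record, because neither field can be computed from the other.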
Now, one of the remarkable facts about the mental lexicon is how capacious it is. Using dictionary sampling techniques, where you take the top left-hand word on every 20th page of the dictionary, give it to people in a multiple-choice test, correct for guessing, and multiply by the size of the dictionary, you can estimate that a typical high school graduate has a vocabulary of around 60,000 words, which works out to a rate of learning of about one new word every two hours starting from the age of one.
When you think that every one of these words is as arbitrary as a telephone number or a date in history, you’re reminded about the remarkable capacity of human long-term memory to store the meanings and sounds of words.
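The arithmetic behind the dictionary sampling estimate can be sketched as follows. The test score, the number of answer choices, the dictionary size, and the age of 18 are made-up inputs chosen so that the estimate lands on 60,000; the correction for guessing is the standard one for multiple-choice tests and is an assumption about how the correction was done.

```python
def correct_for_guessing(observed_correct: float, n_choices: int) -> float:
    """Remove the share of correct answers expected from blind guessing."""
    chance = 1.0 / n_choices
    return max(0.0, (observed_correct - chance) / (1.0 - chance))


def estimate_vocabulary(observed_correct: float, n_choices: int,
                        dictionary_size: int) -> float:
    """Scale the corrected proportion of known words up to the dictionary."""
    return correct_for_guessing(observed_correct, n_choices) * dictionary_size


def hours_per_new_word(vocabulary: float, age_now: float,
                       age_start: float = 1.0,
                       hours_per_day: float = 24.0) -> float:
    """Average number of hours between newly learned words."""
    days = (age_now - age_start) * 365
    return days * hours_per_day / vocabulary


# Hypothetical inputs: 40% correct on a 4-choice test, 300,000-word dictionary.
vocabulary = estimate_vocabulary(0.40, 4, 300_000)
print(round(vocabulary))

# Counting all 24 hours versus an assumed 16 waking hours per day.
print(round(hours_per_new_word(60_000, 18), 2))
print(round(hours_per_new_word(60_000, 18, hours_per_day=16), 2))
```

Depending on whether sleeping hours are counted, the rate comes out somewhere in the neighborhood of one word every two hours.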
But of course, we don’t just blurt out individual words; we combine them into phrases and sentences. And that brings up the second major component of language, namely, grammar.
Now the modern study of grammar is inseparable from the contributions of one linguist, the famous scholar, Noam Chomsky, who set the agenda for the field of linguistics for the last 60 years.
To begin with, Chomsky noted that the main puzzle that we have to explain in understanding language is creativity, or as linguists often call it, productivity—the ability to produce and understand new sentences.
Except for a small number of clichéd formulas, just about any sentence that you produce or understand is a brand new combination produced for the first time, perhaps in your life, perhaps even in the history of the species. We have to explain how people are capable of doing it.
It shows that when we know a language, we haven’t just memorized a very long list of sentences, but rather have internalized a grammar or algorithm or recipe for combining elements into brand new assemblies. For that reason, Chomsky has insisted that linguistics is really properly a branch of psychology and is a window into the human mind.
A second insight is that languages have a syntax which can’t be identified with their meaning. Now, the only quotation that I know of, of a linguist that has actually made it into Bartlett’s Familiar Quotations, is the following sentence from Chomsky, from 1956: “Colorless green ideas sleep furiously.”
Well, what’s the point of that sentence? The point is that it is very close to meaningless. On the other hand, any English speaker can instantly recognize that it conforms to the patterns of English syntax. Compare, for example, “furiously sleep ideas dream colorless,” which is also meaningless, but which we perceive as a word salad.
A third insight is that syntax doesn’t consist of a string of word-by-word associations as in stimulus-response theories in psychology, where producing a word is a response which you then hear, and it becomes a stimulus to producing the next word, and so on.
Again, the sentence, “colorless green ideas sleep furiously,” can help make this point. Because if you look at the word-by-word transition probabilities in that sentence—for example, colorless and then green—how often have you heard colorless and green in succession? Probably zero times.
Green and ideas, those two words never occur together; ideas and sleep, sleep and furiously. Every one of the transition probabilities is very close to zero; nonetheless, the sentence as a whole can be perceived as a well-formed English sentence.
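The word-by-word transition argument can be made concrete with a small bigram count. This is only a sketch: the four-sentence corpus below is invented for the example and stands in for a speaker's experience, so the zero counts illustrate the logic rather than measure real English.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count how often each word is immediately followed by each other word."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


# A tiny invented corpus containing all the words except "colorless".
toy_corpus = [
    "green grass grows in the field",
    "new ideas spread quickly",
    "children sleep soundly at night",
    "the crowd protested furiously",
]

counts = bigram_counts(toy_corpus)
sentence = "colorless green ideas sleep furiously".split()

# Every adjacent pair in the sentence is unattested in the corpus.
for pair in zip(sentence, sentence[1:]):
    print(pair, counts[pair])
```

A model that only chained words by these counts would have no basis for preferring this sentence over a scrambled version of it, which is the point of the example.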
Language, in general, has long-distance dependencies. The word in one position in a sentence can dictate the choice of the word several positions downstream. For example, if you begin a sentence with “either,” somewhere down the line, there has to be an “or.” If you have an “if,” generally, you expect somewhere down the line there to be a “then.”
There’s a story about a child who says to his father, “Daddy, why did you bring that book that I don’t want to be read to out of, up for?” where you have a set of nested or embedded long-distance dependencies.
Indeed, one of the applications of linguistics to the study of good prose style is that sentences can be rendered difficult to understand if they have too many long-distance dependencies because that could put a strain on the short-term memory of the reader or listener while trying to understand them.
Rather than a set of word-by-word associations, sentences are assembled in a hierarchical structure that looks like an upside-down tree. Let me give you an example of how that works in the case of English.
One of the basic rules of English is that a sentence consists of a noun phrase, the subject, followed by a verb phrase, the predicate. A second rule, in turn, expands the verb phrase. A verb phrase consists of a verb followed by a noun phrase, the object, followed by a sentence, the complement, as in, “I told him that it was sunny outside.”
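The two rules just described can be sketched as a tree built from nested tuples. The labels `Comp`, `AP`, `A`, and `Adv`, and the decision to place “that” directly inside the embedded sentence, are simplifying assumptions made for this example.

```python
# Each node is (label, child, child, ...); a child is a word or another node.
# S  -> NP VP        (subject followed by predicate)
# VP -> V NP S       (verb, object, sentence complement)
tree = (
    "S",
    ("NP", "I"),
    ("VP",
        ("V", "told"),
        ("NP", "him"),
        ("S",
            ("Comp", "that"),
            ("NP", "it"),
            ("VP", ("V", "was"), ("AP", ("A", "sunny"), ("Adv", "outside"))))),
)


def leaves(node):
    """Read the words off the bottom of the tree, left to right."""
    _label, *children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def bracket(node):
    """Render the tree as a labeled bracketing."""
    label, *children = node
    parts = [c if isinstance(c, str) else bracket(c) for c in children]
    return "[" + label + " " + " ".join(parts) + "]"


print(" ".join(leaves(tree)))
print(bracket(tree))
```

Notice that an `S` appears inside the `VP` of another `S`; that self-embedding is what makes the structure hierarchical rather than a flat chain of words.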
Now, why do linguists insist that language must be composed out of phrase-structural rules?
(1.) Rules allow for open-ended creativity. Well, for one thing, that helps explain the main phenomenon that we want to explain, namely the open-ended creativity of language.
(2.) Rules allow for expression of unfamiliar meaning. It allows us to express unfamiliar meanings. There’s a cliché in journalism, for example, that when a dog bites a man, that isn’t news, but when a man bites a dog, that is news. The beauty of grammar is that it allows us to convey news by assembling familiar words into brand new combinations. Also, because of the way phrase structure rules work, they produce a vast number of possible combinations.
(3.) Rules allow for production of vast numbers of combinations. Moreover, the number of different thoughts that we can express through the combinatorial power of grammar is not just humongous, but in a technical sense, it’s infinite.
Now, of course, no one lives an infinite number of years, and therefore cannot showcase their ability to understand an infinite number of sentences, but you can make the point in the same way that a mathematician can say that someone who understands the rules of arithmetic knows that there are an infinite number of numbers; namely, if anyone ever claimed to have found the longest one, you can always come up with one that’s even bigger by adding one to it. And you can do the same thing with language.
Let me illustrate it in the following way. As a matter of fact, there has been a claim that there is a world’s longest sentence. Who would make such a claim? Well, who else? The Guinness Book of World Records. You can look it up. There is an entry for the World’s Longest Sentence. It is 1,300 words long, and it comes from a novel by William Faulkner.
Now I won’t read all 1,300 words, but I’ll just tell you how it begins: “They both bore it as though in deliberate flagellant exaltation…” and it runs on from there. But I’m here to tell you that in fact, this is not the world’s longest sentence. And I’ve been tempted to obtain immortality in Guinness by submitting the following record breaker: “Faulkner wrote, they both bore it as though in deliberate flagellant exaltation.”
But sadly, this would not be immortality after all but only the proverbial 15 minutes of fame because based on what you now know, you could submit a record breaker for the record breaker, namely, “Guinness noted that Faulkner wrote,” or “Pinker mentioned that Guinness noted that Faulkner wrote,” or “who cares that Pinker mentioned that Guinness noted that Faulkner wrote…”
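The record-breaking trick is a recursive rule: any sentence can be placed inside a longer one. A minimal sketch, in which the function name `embed` and the shortened stand-in for the Faulkner sentence are assumptions made for the example:

```python
def embed(sentence: str, speaker: str, verb: str) -> str:
    """Wrap a sentence inside a longer sentence that reports it."""
    return f"{speaker} {verb} that {sentence}"


# A shortened stand-in for the 1,300-word original.
sentence = "they both bore it"
frames = [("Faulkner", "wrote"), ("Guinness", "noted"), ("Pinker", "mentioned")]

lengths = [len(sentence.split())]
for speaker, verb in frames:
    sentence = embed(sentence, speaker, verb)
    lengths.append(len(sentence.split()))
    print(sentence)

# Each application of the rule yields a strictly longer sentence.
print(lengths)
```

Because the output of `embed` is always a valid input to `embed`, there is no longest sentence, just as there is no largest number.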
Phrase structure also shapes meaning, so the same words can be grouped in more than one way. Take, for example, the following wonderfully ambiguous sentence that appeared in TV Guide: “On tonight’s program, Conan will discuss sex with Dr. Ruth.”
Now this has a perfectly innocent meaning in which the verb, “discuss” involves two things: namely, the topic of discussion, “sex,” and the person with whom it’s being discussed, in this case, Dr. Ruth. But it has a somewhat naughtier meaning if you rearrange the words into phrases according to a different structure, in which case “sex with Dr. Ruth” is the topic of conversation, and that’s what’s being discussed.
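The two readings can be written out as two different trees over the same words. This is a sketch using nested tuples; the node labels and the variable names are illustrative assumptions.

```python
def leaves(node):
    """Read the words off the bottom of the tree, left to right."""
    _label, *children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


with_dr_ruth = ("PP", ("P", "with"), ("NP", "Dr. Ruth"))

# Reading 1: the verb has two companions, a topic and a conversation partner.
innocent = ("VP", ("V", "discuss"), ("NP", "sex"), with_dr_ruth)

# Reading 2: the verb has one companion, a single larger noun phrase.
naughty = ("VP", ("V", "discuss"), ("NP", ("N", "sex"), with_dr_ruth))

print(leaves(innocent) == leaves(naughty))  # same string of words
print(innocent == naughty)                  # different structure
print(len(innocent) - 1, len(naughty) - 1)  # branches under the verb phrase
```

The string of words is identical in both cases; only the grouping into phrases differs, and that is where the difference in meaning comes from.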
Now, phrase structure not only can account for our ability to produce so many sentences, but it’s also necessary for us to understand what they mean. The geometry of branches in a phrase structure is essential to figuring out who did what to whom.
Another important contribution of Chomsky to the science of language is the focus on language acquisition by children. Now, children can’t memorize sentences because knowledge of language isn’t just one long list of memorized sentences, but somehow they must distill out or abstract out the rules that go into assembling sentences based on what they hear coming out of their parents’ mouths when they are little. And the talent of using rules to produce combinations is in evidence from the moment that kids begin to speak.
Children create sentences unheard from adults. At the two-word stage, which you typically see in children who are 18 months or a bit older, kids are producing the smallest sentences that deserve to be counted as sentences—namely, two words long. But already it’s clear that they are putting them together using rules in their own mind.
To take an example, a child might say, “more outside,” meaning, take me outside or let me stay outside. Now, adults don’t say, “more outside.” So it’s not a phrase that the child simply memorized by rote, but it shows that already children are using these rules to put together new combinations.
Another example: a child, having jam washed from his fingers, said to his mother, “all gone sticky.” Again, not a phrase that you could ever have copied from a parent, but one that shows the child producing new combinations.
Past tense rule. An easy way of showing that children assimilate rules of grammar unconsciously from the moment they begin to speak is the use of the past tense rule. For example, children go through a long stage in which they make errors like, “We holded the baby rabbits,” or “He teared the paper and then he sticked it.”
These are cases in which they overgeneralize the regular rule of forming the past tense, adding ‘ed’, to irregular verbs like “hold,” “stick,” or “tear.” And it’s easy to get children to flaunt this ability to apply rules productively in a laboratory demonstration called the Wug Test.
You bring a kid into a lab. You show them a picture of a little bird and you say, “This is a wug.” And you show them another picture and you say, “Well, now there are two of them. There are two...” and children will fill in the gap by saying “wugs.”
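Both the overgeneralization errors and the wug result can be sketched as a default rule that applies whenever no stored form is found. The function names, the spelling-level suffixes, and the idea of modeling the child as having an empty table of irregulars are simplifying assumptions made for this example.

```python
# Irregular forms have to be memorized one by one.
ADULT_IRREGULARS = {"hold": "held", "tear": "tore", "stick": "stuck"}


def regular_past(verb: str) -> str:
    """The regular past tense rule, at the level of spelling."""
    return verb + "ed"


def regular_plural(noun: str) -> str:
    """The regular plural rule, at the level of spelling."""
    return noun + "s"


def past_tense(verb: str, stored_irregulars: dict) -> str:
    """Use a stored irregular form if there is one; otherwise apply the rule."""
    return stored_irregulars.get(verb, regular_past(verb))


# A speaker who has not yet stored the irregular forms overgeneralizes.
for verb in ("hold", "tear", "stick"):
    print(verb, past_tense(verb, {}), past_tense(verb, ADULT_IRREGULARS))

# A rule, unlike a memorized list, extends to a word never heard before.
print(regular_plural("wug"))
```

The same mechanism that produces the errors is the one that handles the invented word, which is why both count as evidence of a rule.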
Again, a form they could not have memorized because it’s invented for the experiment, but it shows that they have productive mastery of the regular plural rule in English. And famously, Chomsky claimed that children solved the problem of language acquisition by having the general design of language already wired into them in the form of a universal grammar.
Universal grammar is, in effect, a spec sheet for what the rules of any language have to look like. What is the evidence that children are born with a universal grammar? Well, surprisingly, Chomsky didn’t propose this by actually studying kids in the lab or kids in the home, but through a more abstract argument called “The poverty of the input.”
Namely, if you look at what goes into the ears of a child and look at the talent they end up with as adults, there is a big chasm between them that can only be filled in by assuming that the child has a lot of knowledge of the way that language works already built-in.
Here’s how the argument works. One of the things that children have to learn when they learn English is how to form a question. Now, children will get evidence from their parents’ speech as to how the question rule works, such as sentences like, “The man is here,” and the corresponding question, “Is the man here?”
Now, logically speaking, a child getting that kind of input could posit two different kinds of rules. There’s a simple word-by-word linear rule. In this case, find the first “is” in the sentence and move it to the front. “The man is here,” “Is the man here?”
Now, there’s a more complex rule that the child could posit called a structure-dependent rule, one that looks at the geometry of the phrase structure tree. In this case, the rule would be: find the first “is” after the subject noun phrase and move that to the front of the sentence.
A diagram of what that rule would look like is as follows: you look for the “is” that occurs after the subject noun phrase, and that’s what gets moved to the front of the sentence. Now, what’s the difference between the simple word-by-word rule and the more complex structure-dependent rule?
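Well, you can see the difference when it comes to forming the question from a slightly more complex sentence like, “The man who is tall is in the room.” The word-by-word rule moves the first “is” and yields the garbled “Is the man who tall is in the room?”, whereas the structure-dependent rule moves the “is” after the subject and yields the correct “Is the man who is tall in the room?” But how is the child supposed to learn that? How did all of us end up with the correct structure-dependent rule rather than the far simpler word-by-word version of the rule?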
Well, Chomsky argues, if you were actually to look at the kind of language that all of us hear, it’s actually quite rare to hear a sentence like, “Is the man who is tall in the room?”, the kind of input that would logically inform you that the word-by-word rule is wrong and the structure-dependent rule is right.
Nonetheless, we all grow up into adults who unconsciously use the structure-dependent rule rather than the word-by-word rule. Moreover, children don’t make errors like, “Is the man who tall is in the room?” As soon as they begin to form complex questions, they use the structure-dependent rule. And that, Chomsky argues, is evidence that structure-dependent rules are part of the definition of universal grammar that children are born with.
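The two candidate rules can be written as two small functions to show where they part ways. This is a sketch: the function names are invented, and the length of the subject noun phrase is handed to the structure-dependent rule as a number, standing in for whatever parsing a real speaker would do.

```python
def linear_rule(words):
    """Word-by-word rule: move the first "is" in the sentence to the front."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def structure_dependent_rule(words, subject_len):
    """Move the first "is" that follows the subject noun phrase.

    subject_len is the number of words in the subject noun phrase,
    assumed here to be supplied by a parser.
    """
    i = words.index("is", subject_len)
    return ["is"] + words[:i] + words[i + 1:]


simple = "the man is here".split()
print(" ".join(linear_rule(simple)))
print(" ".join(structure_dependent_rule(simple, subject_len=2)))

complex_sentence = "the man who is tall is in the room".split()
print(" ".join(linear_rule(complex_sentence)))
print(" ".join(structure_dependent_rule(complex_sentence, subject_len=5)))
```

On the simple sentence the two rules give the same answer, so that kind of input cannot tell them apart; only the complex sentence separates them.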
Now, though Chomsky has been fantastically influential in the science of language, that does not mean that all language scientists agree with him. And there have been a number of critiques of Chomsky over the years. For one thing, the critics point out, Chomsky hasn’t really shown principles of universal grammar that are specific to language itself as opposed to general ways in which the human mind works across multiple domains, language and vision and control of motion and memory and so on.
We don’t really know that universal grammar is specific to language, according to this critique. Secondly, Chomsky and the linguists working with him have not examined all 6,000 of the world’s languages and shown that the principles of universal grammar apply to all 6,000. They’ve posited it based on a small number of languages and the logic of the poverty of the input but haven’t actually come through with the data that would be necessary to prove that universal grammar is really universal.
Finally, the critics argue, Chomsky has not shown that more general-purpose learning models, such as neural network models, are incapable of learning language together with all the other things that children learn, and therefore has not proven that there has to be specific knowledge of how grammar works in order for the child to learn grammar.
Another component of language governs the sound pattern of language, the ways that the vowels and consonants can be assembled into the minimal units that go into words. Phonology, as this branch of linguistics is called, consists of formation rules that capture what is a possible word in a language according to the way that it sounds.
To give you an example, the sequence “bluk” is not an English word, but you get a sense that it could be an English word that someone could coin as “bluk.” But when you hear the sound “krechtz,” you instantly know that not only isn’t it an English word, but it really couldn’t be an English word. “Krechtz,” by the way, comes from Yiddish and means kind of to sigh or to moan.
Oi. That’s to “krechtz.” The reason that we recognize it’s not English is because it has sounds like “ch” and sequences like “chtz,” which aren’t part of the formation rules of English phonology.
But together with the rules that define the basic words of a language, there are also phonological rules that make adjustments to the sounds, depending on the other sounds they appear with. Very few of us realize, for example, in English, that the past tense suffix “ed” is actually pronounced in three different ways.
When we say, “He walked,” we pronounce the “ed” like a “t,” walked. When we say “jogged,” we pronounce it as a “d,” jogged. And when we say “patted,” we stick in a vowel, pat-ted, showing that the same suffix, “ed,” can be readjusted in its pronunciation according to the rules of English phonology.
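The three pronunciations follow from the last sound of the verb stem. The sketch below uses the standard textbook statement of the rule, which the talk does not spell out: a vowel is inserted after “t” or “d”, the suffix is “t” after other voiceless consonants, and “d” everywhere else. The informal sound labels and the small table of final sounds are assumptions made for the example.

```python
# Voiceless consonants other than "t", written informally.
VOICELESS = {"p", "k", "f", "s", "sh", "ch", "th"}


def past_suffix_sound(final_sound: str) -> str:
    """Pick the pronunciation of "ed" from the stem's final sound."""
    if final_sound in {"t", "d"}:
        return "id"  # a vowel is inserted, as in pat-ted
    if final_sound in VOICELESS:
        return "t"   # as in walked
    return "d"       # as in jogged


# Hypothetical lookup from a verb to its final sound, not its final letter.
FINAL_SOUND = {"walk": "k", "jog": "g", "pat": "t"}

for verb, sound in FINAL_SOUND.items():
    print(verb, "->", past_suffix_sound(sound))
```

The rule works on sounds rather than letters, which is why the input here is the final sound of the stem and not its spelling.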
Now, when someone acquires English as a foreign language or acquires a foreign language in general, they carry over the rules of phonology of their first language and apply them to their second language. We have a word for it; we call it an “accent.” When a language user deliberately manipulates the rules of phonology, that is, when they don’t just speak in order to convey content but pay attention to which phonological structures are being used, we call it poetry and rhetoric.
So far, I’ve been talking about knowledge of language, the rules that go into defining what are possible sequences of language. But those sequences have to get into the brain during speech comprehension and they have to get out during speech production. And that takes us to the topic of language interfaces.
And let’s start with production. This diagram here is literally a human cadaver that has been sawn in half. An anatomist took a saw and [sound], allowing us to see in cross-section the human vocal tract. And that can illustrate how we get our knowledge of language out into the world as a sequence of sounds.
Now, each of us has at the top of our windpipe or trachea, a complex structure called the larynx or voice box; it’s behind your Adam’s Apple. And the air coming out of your lungs has to go past two cartilaginous flaps that vibrate and produce a rich, buzzy sound source, full of harmonics. Before that vibrating sound gets out to the world, it has to pass through a gauntlet of chambers in the vocal tract.
The chambers are the throat behind the tongue, the cavity above the tongue, and the cavity formed by the lips; and when you block off airflow through the mouth, it can come out through the nose.
Now, each one of those cavities has a shape that, thanks to the laws of physics, will amplify some of the harmonics in that buzzy sound source and suppress others. We can change the shape of those cavities when we move our tongue around. When we move our tongue forward and backward, for example, as in “eh,” “aa,” “eh,” “aa,” we change the shape of the cavity behind the tongue, changing the frequencies that are amplified or suppressed, and the listener hears them as two different vowels.
Likewise, when we raise or lower the tongue, we change the shape of the resonant cavity above the tongue, as in, say, “eh,” “ah,” “eh,” “ah.” Once again, the change in the mixture of harmonics is perceived as a change in the nature of the vowel.
When we stop the flow of air and then release it as in “t,” “ca,” “ba,” then we hear a consonant rather than a vowel, or even when we restrict the flow of air, as in “f,” “ss,” producing a chaotic noisy sound. Each one of those sounds that gets sculpted by different articulators is perceived by the brain as a qualitatively different vowel or consonant.
Now, an interesting peculiarity of the human vocal tract is that it obviously co-opts structures that evolved for different purposes—for breathing and for swallowing and so on. And it’s an interesting fact, first noted by Darwin, that the larynx over the course of evolution has descended in the throat so that every particle of food going from the mouth through the esophagus to the stomach has to pass over the opening into the larynx with some probability of being inhaled, leading to the danger of death by choking.
And in fact, until the invention of the Heimlich Maneuver, several thousand people every year died of choking because of this maladaptive trait of the human vocal tract. Why did we evolve a mouth and throat that leave us vulnerable to choking?
Well, a plausible hypothesis is that it’s a compromise that was made in the course of evolution to allow us to speak. By opening up a range of possibilities for altering the resonant cavities, for moving the tongue back and forth and up and down, we expanded the range of speech sounds we could make, improving the efficiency of language, but at the cost of an increased risk of choking. That suggests that language presumably had some survival advantage that compensated for the disadvantage in choking.
What about the flow of information in the other direction, that is, from the world into the brain—the process of speech comprehension? Speech comprehension turns out to be an extraordinarily complex computational process, which we're reminded of every time we interact with a voicemail menu on a telephone or use dictation on our computers.
For example, one writer, using a state-of-the-art speech-to-text system, dictated the following words into his computer. He dictated “book tour,” and it came out on the screen as “back to work.” Another example: he said, “I truly couldn’t see,” and it came out on the screen as, “a cruelly good MC.” Even more disconcertingly, he started a letter to his parents by saying, “Dear mom and dad,” and what came out on the screen was, “The man is dead.”
Now, dictation systems have gotten better and better, but they still have a way to go before they can duplicate a human stenographer. What is it about the problem of speech understanding that makes it so easy for a human, but so hard for a computer? Well, there are two main contributors.
One of them is the fact that each phoneme, each vowel or consonant, actually comes out very differently depending on what comes before and what comes after, a phenomenon sometimes called co-articulation. Let me give you an example. The place called Cape Cod has two “c” sounds, each of them symbolized by the letter “C,” the hard “C.”
Nonetheless, when you pay attention to how you pronounce them, you notice that in fact, you pronounce them in very different parts of the mouth. Try it: Cape Cod, Cape Cod… “c,” “c.” In one case, the “c” is produced way back in the mouth; the other it’s produced much farther forward.
We don’t notice that we pronounce “c” in two different ways depending on whether it comes before an “a” or an “o,” but that difference changes the shape of the resonant cavity in our mouth, which produces a very different waveform. And unless a computer is specifically programmed to take that variability into account, it will perceive those two different “c’s” as the different sounds that, objectively speaking, they really are: “c-eh,” “c-oa.”
They really are different sounds, but our brain lumps them together.
The other reason that speech recognition is such a difficult problem is the absence of segmentation. Now, we have an illusion when we listen to speech that it consists of a sequence of sounds corresponding to words.
But if you actually were to look at the waveform of a sentence on an oscilloscope, there would not be little silences between the words the way there are little bits of white space between printed words on a page, but rather a continuous ribbon in which the end of one word leads right to the beginning of the next. It’s something that we’re aware of when we listen to speech in a foreign language, when we have no idea where one word ends and the other one begins.
In our own language, we detect the word boundaries simply because in our mental lexicon, we have stretches of sound that correspond to words, and those tell us where each word ends. But you can’t get that information from the waveform itself. In fact, there’s a whole genre of wordplay that takes advantage of the fact that word boundaries are not physically present in the speech wave.
Novelty songs like “Mairzy doats and dozy doats and liddle lamzy divey, a kiddley divey too, wooden shoe?” Now, it turns out that this is actually a grammatical sequence of words in English: “Mares eat oats and does eat oats and little lambs eat ivy; a kid'll eat ivy too, wouldn’t you?”
When it is spoken or sung normally, the boundaries between words are obliterated and so the same sequence of sounds can be perceived either as nonsense or, if you know what they’re meant to convey, as sentences.
Another example familiar to most children: “Fuzzy Wuzzy was a bear, Fuzzy Wuzzy had no hair. Fuzzy Wuzzy wasn’t very fuzzy, was he?” And the famous doggerel, “I scream, you scream, we all scream for ice cream.”
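The point about word boundaries can be made concrete with a small sketch. It assumes a toy lexicon of four entries and a rough sound spelling invented purely for illustration (it is not IPA, and nothing here is a claim about how the brain or any real speech recognizer works). Given an unbroken stream of sounds, the function lists every way of carving it into known words, and the "I scream" / "ice cream" stream comes out with two equally valid parses.

```python
from typing import Dict, List

# Hypothetical toy lexicon: rough sound spellings (made up for this sketch,
# not a real phonetic alphabet) mapped to the words they correspond to.
LEXICON: Dict[str, str] = {
    "ay": "I",
    "ays": "ice",
    "skriym": "scream",
    "kriym": "cream",
}


def segmentations(stream: str, lexicon: Dict[str, str]) -> List[List[str]]:
    """Return every way to split an unbroken sound stream into lexicon words."""
    if not stream:
        return [[]]  # exactly one way to segment nothing: no words at all
    parses: List[List[str]] = []
    # Try every possible first word, then segment whatever is left over.
    for end in range(1, len(stream) + 1):
        chunk = stream[:end]
        if chunk in lexicon:
            for rest in segmentations(stream[end:], lexicon):
                parses.append([lexicon[chunk]] + rest)
    return parses


# The stream itself contains no boundaries; only the lexicon supplies them.
for parse in segmentations("ayskriym", LEXICON):
    print(" ".join(parse))
```

Nothing in the stream favors one parse over the other; choosing between them takes knowledge from outside the sound signal.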
We are generally unaware of how ambiguous language is. In context, we effortlessly and unconsciously derive the intended meaning of a sentence, but a poor computer not equipped with all of our common sense and human abilities and just going by the words and the rules is often flabbergasted by all the different possibilities.
Take a sentence as simple as “Mary had a little lamb.” You might think that that’s a perfectly simple, unambiguous sentence. But now imagine that it was continued with “with mint sauce.” You realize that “have” is actually a highly ambiguous word.
As a result, computer translations can often deliver comically incorrect results. According to legend, one of the first computer systems that was designed to translate from English to Russian and back again did the following: given the sentence, “The spirit is willing, but the flesh is weak,” it translated it back as “The vodka is agreeable, but the meat is rotten.”
So why do people understand language so much better than computers? What is the knowledge that we have that has been so hard to program into our machines? Well, there’s a third interface between language and the rest of the mind, and that is the subject matter of the branch of linguistics called Pragmatics, namely, how people understand language in context using their knowledge of the world and their expectations about how other speakers communicate.
The most important principle of Pragmatics is called “the cooperative principle,” namely: assume that your conversational partner is working with you to try to get a meaning across truthfully and clearly. And our knowledge of Pragmatics, like our knowledge of syntax and phonology and so on, is deployed effortlessly but involves many intricate computations.
For example, if I were to say, “If you could pass the guacamole, that would be awesome,” you understand that as a polite request meaning, “give me the guacamole.” You don’t interpret it literally as a rumination about a hypothetical affair; you just assume that the person wanted something and was using that string of words to convey the request politely.
Often, comedies will use the absence of pragmatics in robots as a source of humor, as in the old “Get Smart” situation comedy, which had a robot named Hymie. A recurring joke in the series would be that Maxwell Smart would say to Hymie, “Hymie, can you give me a hand?” And then Hymie would go [sound], remove his hand, and pass it over to Maxwell Smart, not understanding that “give me a hand” in context means “help me” rather than “literally transfer the hand over to me.”
Or take the following example of Pragmatics in action. Consider the following dialogue: Martha says, “I’m leaving you.” John says, “Who is he?” Now, understanding language requires finding the antecedents of pronouns, in this case, who the “he” refers to, and any competent English speaker knows exactly who the “he” is, presumably John’s romantic rival, even though it was never stated explicitly in any part of the dialogue.
This shows how we bring to bear on language understanding a vast store of knowledge about human behavior, human interactions, and human relationships. And we often have to use that background knowledge even to solve mechanical problems like, who does a pronoun like “he” refer to? It’s that knowledge that’s extraordinarily difficult, to say the least, to program into a computer.
Language is a miracle of the natural world because it allows us to exchange an unlimited number of ideas using a finite set of mental tools. Those mental tools comprise a large lexicon of memorized words and a powerful mental grammar that can combine them. Language, thought of in this way, should not be confused with writing, with the prescriptive rules of proper grammar or style, or with thought itself.
Modern linguistics is guided by the questions, though not always the answers, suggested by the linguist Noam Chomsky, namely: how is the unlimited creativity of language possible? What are the abstract mental structures that relate words to one another? How do children acquire them? What is universal across languages? And what does that say about the human mind?
The study of language has many practical applications, including computers that understand and speak, the diagnosis and treatment of language disorders, the teaching of reading, writing, and foreign languages, and the interpretation of the language of law, politics, and literature.
But for someone like me, language is eternally fascinating because it speaks to such fundamental questions of the human condition. Language is really at the center of a number of different concerns of thought, of social relationships, of human biology, of human evolution, that all speak to what’s special about the human species. Language is the most distinctively human talent. Language is a window into human nature, and most significantly, the vast expressive power of language is one of the wonders of the natural world. Thank you.