Chapter 12

Chapter 12 Speech Perception

Animals use sound to communicate in many ways • Bird calls • Whale calls • Baboon shrieks • Vervet calls • Grasshoppers rubbing their legs • These kinds of communication differ from language in the structure of the signals.

Speech perception is a broad category • Understanding what is said (linguistic information) • Understanding “paralinguistic information” • Speaker’s identity • Speaker’s affective state • Speech processing ≠ linguistic processing.

Vocal tract • Includes larynx, throat, tongue, teeth, and lips. • Vocal cords = vocal folds • Male vocal cords are 60% larger than female vocal cords in humans • Size of the vocal cords is not the sole cue to the sex of the speaker: children’s voices can be discriminated by sex as well.

Physical disturbances in air ≠ phonemes • Many different sounds are lumped together into a single phoneme. • Another case of separating the physical from the psychological.

Humans normally speak at about 12 phonemes per second. • Humans can comprehend speech at up to about 50 phonemes per second. • Voice spectrogram changes with age. • Spectrograms can be taken of all sorts of sounds.
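
To make the idea of a spectrogram concrete, here is a minimal sketch, not the method used for any figure in these slides. It slices a signal into short overlapping frames and measures how much energy each frequency band carries in each frame. The signal is a synthetic two-tone test sound, and the sample rate, frame length, hop size, and function names are all assumptions chosen for illustration.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Taper the frame with a Hann window to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / (frame_len - 1)))
            for n, s in enumerate(frame)
        ]
        # Naive discrete Fourier transform; keep bins from 0 Hz up to Nyquist.
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


def peak_frequency(magnitudes, sample_rate, frame_len=64):
    """Return the center frequency (Hz) of the strongest bin in one frame."""
    peak_bin = max(range(len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / frame_len


SAMPLE_RATE = 8000  # Hz, assumed for this illustration
# Hypothetical test sound: a 500 Hz tone followed by a 1500 Hz tone.
low = [math.sin(2.0 * math.pi * 500 * t / SAMPLE_RATE) for t in range(512)]
high = [math.sin(2.0 * math.pi * 1500 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(low + high)

first_peak = peak_frequency(spec[0], SAMPLE_RATE)
last_peak = peak_frequency(spec[-1], SAMPLE_RATE)
print(first_peak, last_peak)
```

In this sketch the strongest band moves from the lower tone in the first frame to the higher tone in the last frame, which is the kind of change over time that a spectrogram displays. Real speech spectrograms are computed the same way in principle, usually with a fast Fourier transform and longer recordings.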

Neural analysis of speech sounds • One phoneme can have distinct sound spectrograms. Distinct sound spectrograms can be metamers for a phoneme.

Primary Auditory Cortex http://www.molbio.princeton.edu/courses/mb427/2000/projects/0008/messedupbrainmain.html

Broca’s and Wernicke’s areas

Brain mechanisms of speech perception • Single-cell recordings in monkeys show neurons that are sensitive to: • Time elapsed between lip movements and the start of sound production • Acoustic context of a sound • Rate of sound frequency changes

Human studies • Human studies have been based on neuroimaging (fMRI and PET). • A1 is not a linguistic center; it is merely an auditory center. It does not respond preferentially to speech over other sounds. • Speech processing is a grab bag of kinds of processing, e.g., linguistic, emotional, and speaker identity.

Wernicke’s aphasia • Subjects can hear sounds. • Subjects lose ability to comprehend speech, though they can produce (clearly disturbed) speech themselves.

Other brain regions involved in speech processing • Right temporal lobe is involved in processing emotion, speaker sex, and speaker identity. • Phonagnosia: impaired ability to recognize familiar voices • Right temporal lobe is less involved in linguistic analysis. • Right prefrontal cortex and parts of the limbic system respond to emotion.

Other brain regions involved in speech processing • Both hemispheres are active for human vocalizations, such as laughing or humming. • Some motor areas for speech are active during speech perception.

A “what” and “where” pathway in speech processing? • One pathway is anterior (forward) and ventral (below) • The other pathway is posterior (backward) and dorsal (above). • Not clear what these pathways do.

Understanding speech: Aftereffects • Tilt aftereffect and motion aftereffect are due to “fatigue” of specific neurons. • Eimas & Corbit (1973) performed a linguistic version. • Take ambiguous phonemes, e.g., between /t/ and /d/. • After listening to /d/ over and over, the ambiguity disappears: the ambiguous sound is heard as /t/.

Understanding speech: Context effects • In vision, surrounding objects affect the interpretation of size, color, and brightness. In other words, context influences perception. • In speech, context also influences perception. We noted this earlier with /di/ and /du/.

Understanding speech: Context effects • Semantic context can influence perception. • Examples: misheard song lyrics (aka mondegreens). • “They had slain the Earl of Moray/And Lady Mondegreen.” (actual: “They had slain the Earl of Moray/And laid him on the green.”) • “Gladly, the cross-eyed bear.” (actual: “Gladly the Cross I’d Bear”) • “There’s a bathroom on the right” (actual: “There’s a bad moon on the rise”) • “’Scuse me while I kiss this guy” (actual: “’Scuse me while I kiss the sky”) • “He’s Got the Whole World in His Pants” • “When a Man Loves a Walnut”

Understanding speech: Context effects • Semantic context can influence perception. • Examples of song lyrics. • Speed of utterance influences phonetic interpretation. • A syllable may sound like /ba/ when preceding words are spoken slowly, but like /pa/ when preceding words are spoken quickly. • Cadence of a sentence can influence interpretation of the last word (Ladefoged & Broadbent, 1957).

Understanding speech: Visual effects • McGurk effect • Movies of speakers influence the syllables heard. • Vocal /ga/ + lip /ba/ = /da/ • Vocal “tough” + lip “hole” = “towel”. • McGurk effect is reduced with face inversion.

Emotions of talking heads • Movie of a facial emotion + voice with an emotion • When face and voice agree, most subjects correctly identify the emotion. • When face and voice conflict, subjects tend to report the emotion of the facial expression.

McGurk effect + talking heads effect make sense, since they enable humans to function more reliably in noisy environments. • Infants 18-20 weeks old can match voice and face. • Humans can match movies of speakers with voices of speakers.

Monkeys and preferential looking • Ghazanfar & Logothetis (2003) • Showed monkeys two silent movies of monkeys vocalizing at the same time. • Played a vocalization that matched one of the silent movies. • All 20 monkeys looked at the monkey face that matched the sound.

More neuroimaging of speech perception • Subjects watched faces of silent speakers. • MT (aka V5) was active for motion processing. • A1 and additional language centers were also active.

Perceived sound boundaries in words are illusory. • Pauses indicate times at which to switch speakers. • Disfluencies: repetitions, false starts, and useless interjections. • Disfluencies help by parsing the sentence, giving the listener time to process, and hinting at new information.

Other disfluencies: “Bushisms” • "If you've got somebody in harm's way, you want the president being—making advice, not—be given advice by the military, and not making decisions based upon the latest Gallup poll or focus group."—New Albany, Ind., Nov. 13, 2007

Other disfluencies: “Bushisms” • "We're going to—we'll be sending a person on the ground there pretty soon to help implement the malaria initiative, and that initiative will mean spreading nets and insecticides throughout the country so that we can see a reduction in death of young children that—a death that we can cure."—Washington, D.C., Oct. 18, 2007

“Bushisms” • "My hearts are with the Jeffcoats right now, that's what I'm thinking."—After meeting with California wildfire victims Kendra and Jay Jeffcoat, San Diego, Calif., Oct. 25, 2007

“Bushisms” • "You know, when you give a man more money in his pocket—in this case, a woman more money in her pocket to expand a business, it—they build new buildings. And when somebody builds a new building somebody has got to come and build the building. And when the building expanded it prevented additional opportunities for people to work."—Lancaster, Pa., Oct. 3, 2007

Intonation • Conveys end of sentence. • Margaret Thatcher • Differentiates questions from statements. • She forgot her book? vs. She forgot her book. • Indicates speaker • Conveys mood.

Language-based learning impairment (LLI): a specifically linguistic, rather than acoustic, impairment. • LLI appears to be an insensitivity to fast alternations in the speech signal. • This can be treated, to some degree, by a video game that relies on sensitivity to fast alternations.
