The Perception of Speech
Speech • Speech is for rapid communication • Speech is composed of units of sound called phonemes • Examples of phonemes: /b/ in bat, /p/ in pat
Acoustic Properties of Speech • Speech can be characterized by a spectrogram
Acoustic Properties of Speech • Spectrogram reveals differences between phonemes • The differences are in the formants and the formant transitions
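To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python that slices a signal into short frames and measures how much energy each frequency carries in each frame. The test signal (a 500 Hz tone followed by a 1500 Hz tone), the sample rate, and the frame length are illustrative assumptions, not values from the slides, and a real formant analysis of speech is considerably more involved.

```python
import cmath
import math


def spectrogram(signal, frame_len, hop):
    """Return one magnitude spectrum (list of floats) per analysis frame."""
    # Hann window: tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Naive discrete Fourier transform, positive-frequency bins only.
        for k in range(frame_len // 2):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def peak_frequencies(frames, sample_rate, frame_len):
    """Frequency (Hz) of the strongest bin in each frame."""
    return [max(range(len(s)), key=s.__getitem__) * sample_rate / frame_len
            for s in frames]


# Illustrative signal: 1024 samples at 500 Hz, then 1024 samples at 1500 Hz.
SAMPLE_RATE = 8000
FRAME_LEN = 256
signal = [math.sin(2 * math.pi * (500 if n < 1024 else 1500) * n / SAMPLE_RATE)
          for n in range(2048)]

frames = spectrogram(signal, FRAME_LEN, hop=FRAME_LEN)
peaks = peak_frequencies(frames, SAMPLE_RATE, FRAME_LEN)
print(peaks)
```

Each row of `frames` corresponds to one vertical slice of a spectrogram. In speech, the bands of concentrated energy in those slices are the formants, and the way the bands move from slice to slice are the formant transitions.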
Perceiving Speech • So perceiving (interpreting) speech sounds is simply a matter of matching the spectrotemporal properties (the shape of the spectrogram) of the incoming sound waves to the appropriate phoneme • right?…
Perceiving Speech • So perceiving (interpreting) speech sounds is simply a matter of matching the spectrotemporal properties (the shape of the spectrogram) of the incoming sound waves to the appropriate phoneme • Then specific phonemes must correspond to specific spectrograms - a property called acoustic-phonetic invariance
Perceiving Speech • Acoustic - Phonetic invariance says that phonemes should match one and only one pattern in the spectrogram • This is not the case! For example /d/ followed by different vowels:
Perceiving Speech • Acoustic - Phonetic invariance says that phonemes should match one and only one pattern in the spectrogram • This is not the case! For example /d/ • Clearly perception and understanding of speech sounds is more elaborate than simply interpreting an internal spectrogram
Perceiving Speech • The phrase “Peter buttered the burnt toast” has five /t/ phonemes, but the spectrogram does not contain five identical sweeps
Perceiving Speech • The Segmentation Problem • Segmentation is the perception of silence between words • Often illusory
Perceiving Speech • The phrase “I owe you a Yo-Yo” has no silence in it!
Spoken Input • The Segmentation Problem: • The stream of acoustic input is not physically segmented into discrete phonemes, words, phrases, etc. • Silent gaps don’t always indicate (aren’t perceived as) interruptions in speech
Spoken Input • The Segmentation Problem: • The stream of acoustic input is not physically segmented into discrete phonemes, words, phrases, etc. • A continuous speech stream is sometimes perceived as having gaps
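The mismatch between physical silence and perceived word boundaries can be sketched with a simple gap detector. Everything here is a hypothetical illustration: the loudness envelope is an invented toy list (one value per short frame), the word-boundary index is made up, and `silent_gaps` is just a basic threshold rule, not a model of how listeners segment speech.

```python
def silent_gaps(envelope, threshold=0.1, min_frames=2):
    """Return (start, end) frame ranges where loudness stays below threshold.

    `end` is exclusive. Runs shorter than `min_frames` are ignored.
    """
    gaps = []
    start = None
    for i, level in enumerate(envelope):
        if level < threshold:
            if start is None:
                start = i  # a quiet run begins here
        elif start is not None:
            if i - start >= min_frames:
                gaps.append((start, i))
            start = None
    # Handle a quiet run that lasts until the end of the signal.
    if start is not None and len(envelope) - start >= min_frames:
        gaps.append((start, len(envelope)))
    return gaps


# Invented toy envelope: a silent stretch inside one word (frames 3-5),
# and no silence at all at the hypothetical word boundary (frame 8).
toy_envelope = [0.8, 0.7, 0.9, 0.0, 0.0, 0.0, 0.6, 0.8, 0.7, 0.9, 0.8]
hypothetical_word_boundary = 8

gaps = silent_gaps(toy_envelope)
boundary_is_silent = any(start <= hypothetical_word_boundary < end
                         for start, end in gaps)
print(gaps, boundary_is_silent)
```

In this toy case the detector finds a gap where no word boundary exists and finds nothing at the boundary itself, which is why a listener cannot segment speech by listening for silence alone.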
Perceiving Speech • So how do you perceive speech? Some of the “strategies”: 1. reduce the data 2. use context clues 3. use vision
Categorical Perception • Categorical perception is a phenomenon in which the brain assigns a stimulus to one category or another but never to an intermediate category
Categorical Perception • For example, /ba/ and /pa/ differ in voice onset time: the delay between releasing the stopped flow of air and the start of vocal-fold vibration • For /ba/, voice onset time is about 10 ms • /pa/ is similar except that voice onset time is about 50 ms
Categorical Perception • Voice onset time can range from zero to more than 50 ms. For example, you could synthesize a sound with a voice onset time of 30 ms, but... • English speakers will hear either /ba/ or /pa/ but never something in between
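The all-or-none labeling described above can be sketched as a hard threshold on voice onset time. The 30 ms boundary used here is a hypothetical value chosen only because it lies midway between the 10 ms and 50 ms examples on the slides; the function names are likewise illustrative.

```python
# Hypothetical category boundary (ms); chosen for illustration only.
BOUNDARY_MS = 30.0


def perceived_phoneme(vot_ms, boundary_ms=BOUNDARY_MS):
    """Map a continuous voice onset time onto one of two discrete labels."""
    return "/ba/" if vot_ms < boundary_ms else "/pa/"


def heard_as_different(vot_a_ms, vot_b_ms, boundary_ms=BOUNDARY_MS):
    """Two sounds are told apart only if they fall in different categories."""
    return (perceived_phoneme(vot_a_ms, boundary_ms)
            != perceived_phoneme(vot_b_ms, boundary_ms))


for vot in (0, 10, 20, 40, 50, 60):
    print(vot, "ms ->", perceived_phoneme(vot))

# Same 10 ms physical difference, very different perceptual outcome:
print(heard_as_different(10, 20))  # both on the /ba/ side of the boundary
print(heard_as_different(25, 35))  # straddles the boundary
```

The point of the sketch is that equal physical steps in voice onset time do not produce equal perceptual steps: a pair within a category sounds the same, while a pair that straddles the boundary sounds like two different phonemes.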
Categorical Perception is Part of Learning a Language • Babies can discriminate /ba/ from /pa/ and can discriminate these from phonemes with intermediate voice onset times! • By 10 to 12 months, babies (learning English) stop discriminating intermediate voice onset times
Categorical Perception is Part of Learning a Language • Once category boundaries are learned it is very difficult to unlearn them • Non-native speakers of any language often cannot hear certain phonemes the way native speakers do • As a consequence they tend to keep at least a slight accent
Categorical Perception • Another example:
Perception (of all types) Makes Use of Context • The stream of information contained in speech is usually ambiguous and incomplete • Your brain makes a “best guess” based on the circumstances
Perception (of all types) Makes Use of Context • Consider the following example: “The __eel fell off the shoe.” “The __eel fell off the car.” (a cough replaces the missing sound) • Listeners report hearing the “appropriate” phoneme during the cough
Much of Speech Perception isn’t Auditory! • Why rely on only one sensory system when there is information in two? • The brain seamlessly integrates any information it is given; this is called cross-modal integration
Cross-modal Integration • Speech perception involves the synthesis of vision and hearing • The McGurk effect demonstrates the critical role of vision in speech perception
Cross-modal Integration • The McGurk Effect
Cross-modal Integration • The McGurk effect suggests that visual and auditory information are combined to enhance speech perception under normal circumstances • When visual and auditory information are incongruous, the resulting perception is unpredictable and often wrong
Auditory Scene Analysis • Sounds don’t happen in isolation; they happen in streams of changing frequencies • How does the system group related auditory events into streams and keep different streams separate?
Auditory Scene Analysis • Solving this problem is called Auditory Scene Analysis • One important principle is proximity in pitch, time, or spatial location
Auditory Scene Analysis • Effect of timing proximity: slow vs. fast • Do you hear this, or this? [Figure: two alternative groupings of the tones, plotted as pitch over time]
Auditory Scene Analysis • Effect of pitch proximity: far vs. close • Do you hear this, or this? [Figure: two alternative groupings of the tones, plotted as pitch over time]
Auditory Scene Analysis • Effect of proximity: • The auditory system groups together events that happen close together in time and frequency • This enables us to perceive meaningful streams of information when they are mixed with distraction • Interestingly, the brain can disentangle mixed streams only under certain circumstances • E.g. the “picket fence illusion”: gaps of silence dramatically distort perception of a sentence, while bursts of noise do not
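The proximity principle can be sketched as a simple grouping rule: a tone joins an existing stream only if the pitch jump per unit time from that stream's last tone is small enough. This is a toy rule under stated assumptions, not a model from the slides; the 60 semitones-per-second limit, the tone frequencies, and all function names are hypothetical.

```python
import math


def semitones(f1_hz, f2_hz):
    """Size of the pitch interval between two frequencies, in semitones."""
    return abs(12 * math.log2(f2_hz / f1_hz))


def group_into_streams(tones, max_rate=60.0):
    """Greedy grouping of (onset_s, freq_hz) tones into streams.

    Assumes onset times are strictly increasing. `max_rate` (semitones per
    second) is a hypothetical limit on how fast a single stream may move.
    """
    streams = []
    for onset, freq in tones:
        best, best_rate = None, None
        for stream in streams:
            last_onset, last_freq = stream[-1]
            rate = semitones(last_freq, freq) / (onset - last_onset)
            if best_rate is None or rate < best_rate:
                best, best_rate = stream, rate
        if best is not None and best_rate <= max_rate:
            best.append((onset, freq))      # close enough: same stream
        else:
            streams.append([(onset, freq)])  # too far: start a new stream
    return streams


def alternating(low_hz, high_hz, interval_s, count):
    """Low-high-low-high... tone sequence with a fixed onset interval."""
    return [(i * interval_s, low_hz if i % 2 == 0 else high_hz)
            for i in range(count)]


far_fast = group_into_streams(alternating(400, 1600, 0.1, 8))
far_slow = group_into_streams(alternating(400, 1600, 0.5, 8))
close_fast = group_into_streams(alternating(400, 450, 0.1, 8))
print(len(far_fast), len(far_slow), len(close_fast))
```

With these assumed values, tones that are far apart in pitch and presented quickly split into a low stream and a high stream, while slowing the sequence down or moving the pitches closer together merges them into a single stream.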