
From speech signal acoustics to perception

From speech signal acoustics to perception. Louis C.W. Pols Institute of Phonetic Sciences (IFA) Amsterdam Center for Language and Communication (ACLC). NATO-ASI “Dynamics of Speech Production and Perception” Il Ciocco, Tuscany, Italy, July 4, 2002. Overview.


Presentation Transcript


  1. From speech signal acoustics to perception. Louis C.W. Pols, Institute of Phonetic Sciences (IFA), Amsterdam Center for Language and Communication (ACLC). NATO-ASI “Dynamics of Speech Production and Perception”, Il Ciocco, Tuscany, Italy, July 4, 2002

  2. Overview
     • how do we perceive (speech) dynamics?
     • The Intelligent Ear. On the Nature of Sound Perception, by Reinier Plomp (2002)
     • from psychoacoustics to speech perception
     • (lack of) context; robustness; continuity
     • V and C reduction; coarticulation
     • perceptual compensation for articulatory undershoot?
     • speech efficiency
     • conclusions
     From speech signal acoustics to perception, Il Ciocco

  3. Various scientific preferences
     • several biases have affected the history of (speech &) hearing research (Plomp, 2002):
        • dominance of sinusoidal tones as stimuli
        • preference for microscopic approach (e.g., phoneme discrimination rather than intelligibility)
        • emphasis on psychophysical (rather than cognitive) aspects of hearing
        • clean stimuli in the lab rather than the acoustic reality of the outside world (disruptive sounds)

  4. Psychoacoustics - speech perception
     • duration, pitch, loudness, timbre, direction
     • absolute and masked threshold, jnd, discrimination
     • continuity
     • complexity (pure - complex tone, voicing)
     • effect of context, meaning (intelligibility), frequency of occurrence
     • phoneme: more text-guided than perceived
     • speech perceptual tasks:
        • phoneme → sentence identification; discrimination; matching

  5. Detection thresholds and jnd
     [Figure: thresholds for multi-harmonic, simple, stationary signals and for single-formant-like periodic signals. Values shown: F2 3 - 5%; frequency 1.5 Hz; BW 20 - 40%]

  6. Perceiving speech-like transitions
     • Ph.D. thesis A. van Wieringen (1995)
        • “Perceiving dynamic speechlike sounds. Psycho-acoustics and speech perception”
        • see also van Wieringen & Pols, Acustica 84 (1998) 520-528
     • stimulus characteristics
        • (segmented and/or reversed) natural or synthetic
        • tone glide; single- or multi-formant transition
        • isolated transitions; initial or final transitions with steady state
        • converging or diverging transitions (var. duration or slope)
     • task: jnd/DL; matching; absolute identification; classification

  7. DL for short speech-like transitions
     [Figure: simple and complex stimuli; initial and final transitions; short and longer transitions]
     Adapted from van Wieringen & Pols (1998), Acta Acustica 84, 520-528, “Discrimination of short and rapid speechlike transitions”

  8. Perceiving (speech) dynamics
     • vowel perception with or without transitions?
     • our claims (van Son, IFA Proc. 17 (1993)):
        • only evidence for compensatory processes, i.e. perceptual overshoot and dynamic specification, when in an appropriate context
        • synthetic isolated dynamic formant tracks lead to perceptual undershoot (= averaging)
        • silent-center studies are ambiguous
     • conclusion: information in formant dynamics is only used when vowels are heard in an appropriate context

  9. Vowel identification
     • compare V responses for dynamic stimuli with those for static stimuli
     • calculate net shift in V responses per onglide (CV), complete (CVC), or offglide (VC)
     • result: responses average over the trailing part of the formant track

  10. Perceptual undershoot
      [Figure: Net shift in vowel responses to tokens with curved formant tracks vs. stationary tokens. All values significant, except small open triangles]

  11. Effect of local context
      • “Perisegmental speech improves consonant and vowel identification”, van Son & Pols, Speech Comm. 29, 1-22 (1999)
      • also “Phoneme recognition as a function of task and context”, IFA Proc. 24, 27-38 (2001) and Proc. SPRAAC, 25-30 (2001)
      • also Pols & van Son (1993), “Acoustics and perception of dynamic vowel segments”, Speech Comm. 13, 135-147

  12. V and C identification
      • gated tokens from 120 CVC speech fragments taken from a long text reading
      • 50 ms V kernel, + V transition, + C part (L/R)
      • stimuli randomized; V identification (17 Ss) and Ci and Cf identification (15 Ss)
      • results:
         • phoneme identification benefits from extra speech
         • left context more beneficial than right context
         • better identification when the other member of the pair was also identified correctly (context effect)

  13. Error rates of vowel identification for the individual stimulus token types. Long-short vowel errors (/α-a:, -o:/) are ignored

  14. V and C in CV tokens were identified better when the other member of the pair was identified correctly

  15. Effect of (lack of) context
      • 100 Dutch listeners identifying V segments
      • “Vowel contrast reduction”, Koopmans-van Beinum (1980)

      $$\mathrm{ASC} = \frac{1}{n} \sum_{i=1}^{n} \left| LF_i - \overline{LF_i} \right|^2 \quad \text{(total variance)}, \qquad LF_i = 100 \, \log_{10} F_i$$
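
The ASC measure on this slide can be sketched in a few lines. This is a minimal illustration, not the original analysis: it assumes that each vowel i is a point of log-transformed formants (LF = 100 log10 F), that the overbar denotes the centroid of all vowel points, and that |.| is the Euclidean distance. The function names and the formant values are hypothetical.

```python
import math


def log_formants(formants_hz):
    """Map formant frequencies in Hz to LF = 100 * log10(F)."""
    return [100.0 * math.log10(f) for f in formants_hz]


def acoustic_system_contrast(vowels_hz):
    """Mean squared Euclidean distance of the log-formant points to their centroid.

    Assumption: the overbar in the slide formula is the centroid over all vowels.
    """
    points = [log_formants(v) for v in vowels_hz]
    n = len(points)
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dims)]
    total = 0.0
    for p in points:
        total += sum((p[d] - centroid[d]) ** 2 for d in range(dims))
    return total / n


# Hypothetical (F1, F2) values in Hz, for illustration only.
clear = [(300, 2300), (750, 1300), (350, 800)]
reduced = [(400, 1900), (600, 1400), (430, 1100)]

print("ASC clear  :", acoustic_system_contrast(clear))
print("ASC reduced:", acoustic_system_contrast(reduced))
```

Under these assumptions, a vowel system whose points lie closer together in log-formant space yields a smaller ASC, which is the sense in which the measure expresses contrast reduction.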

  16. Human word intelligibility vs. noise
      [Figure] From Ph.D. thesis H. Steeneken (1992), ‘On measuring and predicting speech intelligibility’

  17. Robustness to degraded speech
      • speech = time-modulated signal in frequency bands
      • relatively insensitive to (spectral) distortions
         • prerequisite for digital hearing aid
         • modulating spectral slope: -5 to +5 dB/oct, 0.25-2 Hz
      • temporal smearing of envelope modulation
         • ca. 4 Hz max. in modulation spectrum → syllable
         • LP>4 Hz and HP<8 Hz little effect on intelligibility
      • spectral envelope smearing
         • for BW>1/3 oct masked SRT starts to degrade
      (for references, see keynote paper Pols in Proc. ICPhS’99)

  18. Some examples
      • partly reversed speech (Saberi & Perrott, Nature, 4/99)
         • fixed-duration segments time reversed or shifted in time
         • perfect sentence intelligibility up to 50 ms (demo: every 50 ms reversed; original)
         • low-frequency modulation envelope (3-8 Hz) vs. acoustic spectrum
         • syllable as information unit? (S. Greenberg)
      • gap and click restoration (Warren)
      • gating experiments
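
The "partly reversed speech" manipulation on this slide (fixed-duration segments, each reversed in time) can be sketched as follows. This is only an illustration of the segment-wise reversal, not the stimulus-generation code of the cited study; the function name `locally_reverse` and its parameters are hypothetical, and the signal is assumed to be a plain list of samples.

```python
def locally_reverse(samples, sample_rate_hz, segment_ms):
    """Reverse the signal within consecutive fixed-duration segments.

    The order of the segments is kept; only the samples inside each
    segment are played backwards. A final, shorter segment is reversed too.
    """
    seg_len = max(1, int(round(sample_rate_hz * segment_ms / 1000.0)))
    out = []
    for start in range(0, len(samples), seg_len):
        out.extend(reversed(samples[start:start + seg_len]))
    return out


# Toy signal: 10 samples at 1 kHz, reversed in 4 ms (= 4 sample) segments.
signal = list(range(10))
print(locally_reverse(signal, sample_rate_hz=1000, segment_ms=4))
```

For a speech recording sampled at 16 kHz, a call such as `locally_reverse(x, 16000, 50)` would reverse every 50 ms stretch, which corresponds to the segment duration mentioned in the demo on the slide.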

  19. Continuity, especially while masked
      [Figure: frequency (500, 900, 1200, 2000 Hz) vs. time]
      • continuity effect (Miller & Licklider), auditory induction (Warren), pulsation threshold (Houtgast)
      • also for gliding tones
      • also for complex tones
      • also for pitch
      • fission, fusion
      • segregation, streaming
      • phonemic restoration

  20. V and C reduction, coarticulation
      • spectral variability is not random but, at least partly, speaker-, style-, and context-specific
      • read - spontaneous; stressed - unstressed
      • not just for vowels, but also for consonants
         • duration; spectral balance
         • intervocalic sound energy difference
         • F2 slope difference; locus equation

  21. C-duration and C error rate
      [Figure: two panels, mean consonant duration and mean error rate for C identification]
      791 VCV pairs (read & spontaneous; stressed & unstressed segments; one male); C identification by 22 Dutch subjects. Adapted from van Son & Pols (Eurospeech’97)

  22. Perception of acoustic V reduction
      • Ph.D. thesis Dick van Bergem (1995)
         • “Acoustic and lexical vowel reduction”
      • lexical V reduction: Fr /betõ/ vs. Du /b@tOn/
      • acoustic V reduction:
         • Du ‘miljoen’ as /mIljun/ or as /m@ljun/
      • identify the unstressed vowels (as V or @)
         • by 20 listeners (8 M, 12 F)
         • in 47 words (cond. W and S)
         • or 20 words (cond. P), like ‘milJOEN’ or ‘biosCOOP’
         • spoken by 20 male speakers (2280 stimuli)

  23. 4 reduction stages for 20 speakers
      [Figure: % schwa responses on /I/ by 20 listeners; values shown: 5%, 36%, 60%, 69%; with model prediction for schwa in this m-l context]
      Adapted from van Bergem (1995)
      Conclusion: Vowel reduction is not centralization but contextual assimilation

  24. Speech efficiency
      • speech is most efficient if it contains only the information needed to understand it: “Speech is the missing information” (Lindblom, JASA ‘96)
      • less information needed for more predictable things:
         • shorter duration and more spectral reduction for high-frequency syllables and words
         • C-confusion correlates with acoustic factors (duration, CoG) and with information content (syllable/word frequency)
      $I(x) = -\log_2(\mathrm{Prob}(x))$ in bits
      (see van Son, Koopmans-van Beinum, and Pols (ICSLP’98))
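
The information measure I(x) = -log2(Prob(x)) on this slide can be illustrated with a short sketch. It assumes that Prob(x) is estimated as the relative frequency of an item in a corpus; the function name and the word counts below are hypothetical and chosen only to give round numbers.

```python
import math


def information_content_bits(counts):
    """Return I(x) = -log2(Prob(x)) for every item, with Prob(x) estimated
    as the item's count divided by the total count (an assumption here)."""
    total = sum(counts.values())
    return {item: -math.log2(c / total) for item, c in counts.items()}


# Hypothetical word counts, for illustration only (total = 16).
counts = {"de": 8, "van": 4, "het": 2, "bioscoop": 1, "miljoen": 1}

for word, bits in information_content_bits(counts).items():
    print(f"{word:10s} {bits:.2f} bits")
```

Frequent items get a low value and rare items a high one, which matches the slide's point that more predictable syllables and words carry less information.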

  25. Correlation between consonant confusion and the 4 measures indicated
      [Figure] Material: Dutch male speaker, 20 min. read/spontaneous (R/S), 12k syllables, 8k words; 791 VCV R/S (308 lexically stressed, 483 unstressed); C identification by 22 Ss
      Adapted from van Son et al. (Proc. ICSLP’98)

  26. Conclusions
      • perceiving speech (segments) very much depends on speech quality and context
      • presenting isolated segments is also a kind of context
      • only ‘proper’ interpretation of formant transitions (perceptual compensation for spectro-temporal undershoot) when presented in an appropriate context
      • reduced V are best perceived as schwa if transitions are contextually assimilated
