
A Database of Vocal Tract Resonance Trajectories for Research in Speech Processing




Presentation Transcript


  1. A Database of Vocal Tract Resonance Trajectories for Research in Speech Processing. L. Deng (Microsoft Research, Redmond); X. Cui & A. Alwan (U. of California, Los Angeles); R. Pruvenok (Georgia Institute of Technology, Atlanta); J. Huang (Carnegie Mellon U., Pittsburgh); S. Momen (Princeton U., Princeton); Y. Chen (Cornell U., Ithaca)

  2. Introduction • Joint research project between MSR & IPAM at UCLA • Carried out during the 2005 NSF-RIPS summer program • Main goals: • Create a database of VTR/formant trajectories (ground truth) for research in speech processing • Quantitatively assess various existing automatic VTR/formant tracking algorithms

  3. Background • Vocal tract resonance (VTR, or formant-I) --- acoustic resonance of the human vocal tract during speech production • May differ from spectral peaks measured from the speech signal (formant-II) • VTR/formants are important for both speech perception and production • Many techniques exist for automatic VTR or formant-II extraction

  4. Background (cont’d) • Difficulty of automatic VTR/formant tracking: • When two formants are close to each other (e.g., /iy, y, uw, r/) • Consonant sounds whose VTRs are not directly visible in the spectrogram (e.g., nasals, fricatives, stops) • CV or VC transitions • Lack of a standard database for quantitative evaluation of tracking algorithms • Requirement for extensive human expertise

  5. Data Selection • Subset of TIMIT utterances • 538 utterances in total • 192 utterances in core test set • 346 utterances in training set (173 speakers; one SX & one SI for each) • Balance of speaker, dialect, gender, & phoneme distributions

  6. VTR Trajectory Labeling • Start from the results of a previous VTR tracking algorithm (ICASSP 2004 paper) • Develop a software tool for manual error correction using spectrogram display • Use human expertise

  7. GUI Tool for VTR Labeling/Correction

  8. Human Expertise • Prior knowledge of nominal VTR target values for individual phones • Contextual effects on VTR values (target-directed trajectories) • Overall spectral properties across the entire utterance (same phones at different times) • Effects of anti-resonances in splitting VTRs of nasalized vowels • Special formant movement patterns (e.g., velar pinch) • Etc.

  9. After correction

  10. Two Automatic Algorithms • WaveSurfer (http://www.speech.kth.se/wavesurfer) (same algorithm as ESPS/xwaves; Talkin et al.) • Based on LPC analysis and dynamic programming • MSR hidden-dynamic-model-based algorithm • Implemented by Kalman filter/smoother • Piecewise-linearized mapping from VTR to cepstra • By-product of a speech recognizer • Tying all phone VTR targets • Details in ICASSP 2004 paper
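For context, the LPC-based approach named above works by fitting a linear-prediction polynomial to each speech frame; the complex root pairs of that polynomial correspond to candidate resonances (formants). The following is an illustrative Python sketch of that classic idea only, not WaveSurfer's actual implementation (which adds dynamic-programming continuity constraints); the pre-emphasis coefficient, LPC order, and bandwidth threshold are assumed values.

```python
import numpy as np

def lpc_formants(frame, fs, order=12):
    """Estimate candidate formant frequencies (Hz) for one speech frame
    via LPC root-finding -- an illustrative sketch of the technique."""
    # Pre-emphasize (flatten spectral tilt) and window the frame.
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    x = x * np.hamming(len(x))
    # Autocorrelation method: lags 0..order, then solve the normal equations.
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # Roots of the prediction polynomial A(z) = 1 - a1 z^-1 - ... - ap z^-p.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency (Hz)
    bws = -fs / np.pi * np.log(np.abs(roots))   # pole radius -> bandwidth (Hz)
    # Keep sharp, non-DC resonances; sorted ascending, so F1 comes first.
    return sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
```

A dynamic-programming tracker such as WaveSurfer's then picks, frame by frame, the candidate-to-formant assignment that minimizes a cost combining deviation from nominal values and frame-to-frame discontinuity.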

  11. Comparisons of Two Algorithms: "His failure to open the store by eight cost him his job."

  12. Comparisons of Two Algorithms: "We always thought we would die with our boots on."

  13. Cross-Labeler Variation Results

  14. Computing Formant Tracking Errors

  15. Computing Formant Tracking Errors: Focusing on Transitions
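One plausible form for the error computation in the two slides above is a per-formant mean absolute frequency deviation between the hand-corrected reference trajectory and an automatic track, optionally restricted to a frame mask marking CV/VC transition regions. The array layout and the choice of metric here are assumptions for illustration, not necessarily the paper's exact scoring protocol.

```python
import numpy as np

def formant_tracking_error(ref, hyp, mask=None):
    """Mean absolute deviation (Hz) between a reference VTR trajectory
    and an automatic track.

    ref, hyp: (frames x formants) arrays of frequencies in Hz.
    mask: optional boolean per-frame selector, e.g. True only on
          CV/VC transition frames, so errors are scored where
          tracking is hardest.
    Returns one error figure per formant (F1, F2, F3, ...)."""
    ref = np.asarray(ref, dtype=float)
    hyp = np.asarray(hyp, dtype=float)
    if mask is None:
        mask = np.ones(ref.shape[0], dtype=bool)
    return np.abs(ref[mask] - hyp[mask]).mean(axis=0)
```

With the transition-frame mask, the same function yields the "focusing on transitions" variant: only the masked frames contribute to the average.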

  16. Summary and Conclusions • VTR/formants are critical for speech production, perception, and processing • Prior to this work, no standard database existed • Created a database of VTR trajectories using human expertise • Immediate application: quantitative evaluation of automatic VTR/formant tracking algorithms • Second-pass verification & correction at MSR recently completed • Data soon to be publicly released from both the MSR and UCLA sites
