
Graphical Models of Articulation Using GMTK






Presentation Transcript


  1. Graphical Models of Articulation Using GMTK Karen Livescu Massachusetts Institute of Technology CLSP Workshop 2001 August 16, 2001

  2. Overview • Motivation • Why use articulatory models for speech recognition? • Why use graphical models to represent articulation? • Proposed model • Issues in articulatory modeling for ASR • Choice of features • Model size and constraints • Initialization • Structure learning • Conclusion

  3. Why articulatory modeling? [Diagram: the articulators (lips, velum, glottis, tongue)] • Definition: • Articulatory features specify the state of the articulators (directly or implicitly) at a given point in time • Features can be binary/multivalued, discrete/continuous, partial/complete • Motivation: • Speech may be better described by the asynchronous motion of articulators than by phones with rigid start and end times • Articulatory features concisely represent coarticulatory effects such as nasalization and inserted stop closures (“warmth” → “warmpth”) • Articulatory features can help recover information; e.g., a vowel is more likely to be nasalized if the following nasal is deleted • Pronunciation modeling: phones are more likely to be modified by articulatory changes than replaced outright with other phones

  4. Example: “warmth” → “warmpth”
  • Phone-based view:
  Brain: Give me a []!
  Lips, tongue, velum, glottis (in unison, for each phone): Right on it, sir!
  • Articulatory view:
  Brain: Give me a []!
  Lips: Huh?
  Velum, glottis: Right on it, sir!
  Tongue: Umm…yeah, OK.

  5. Why graphical models to represent articulation? • Graphical models allow us to • quickly and easily specify structures with a large number of variables • represent knowledge about the variables in a concise and transparent way • easily reason and communicate about the conditional independence relations in the model • perform structure learning explicitly on the variables of interest • Articulatory models use a large number of variables in each time frame, and therefore benefit from all of the above properties

  6. Articulatory graphical model for speech recognition [Diagram: two consecutive frames i and i+1; each frame contains a phone variable, articulatory variables a1, a2, …, aN, and an observation variable obs, with each variable also depending on its counterpart in the previous frame] • Initial version implemented at CLSP Workshop 2001

  7. GMTK implementation of articulatory structure

  variable : phone {
    type: discrete hidden cardinality NUM_PHONES;
    switchingparents: nil;
    conditionalparents: word(0), wordPosition(0) using MTCPT("wordWordPos2Phone");
  }
  variable : voicing {
    type: discrete hidden cardinality 2;
    switchingparents: nil;
    conditionalparents: phone(0), voicing(-1) using MDCPT("voicingMDCPT");
  }
  variable : velum {
    type: discrete hidden cardinality 2;
    switchingparents: nil;
    conditionalparents: phone(0), velum(-1) using MDCPT("velumMDCPT");
  }
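The per-frame dependency in the structure above, where each articulatory feature is conditioned on the current phone and on its own value in the previous frame, can be sketched outside GMTK as well. The Python toy below uses made-up phones and CPT entries (not trained parameters) to sample a binary voicing chain from P(voicing_t | phone_t, voicing_{t-1}):

```python
import random

random.seed(0)

# Toy CPT mirroring conditionalparents: phone(0), voicing(-1).
# voicing_cpt[(phone, prev_voicing)] = P(voicing_t = 1). Values are illustrative.
voicing_cpt = {
    ("aa", 0): 0.9, ("aa", 1): 0.95,   # vowels strongly favor voicing
    ("s", 0): 0.05, ("s", 1): 0.2,     # /s/ favors voicelessness, with some inertia
}

def sample_chain(phones, cpt, init=0):
    """Sample a binary feature chain: feature_t ~ P(. | phone_t, feature_{t-1})."""
    feats, prev = [], init
    for p in phones:
        prob_one = cpt[(p, prev)]
        prev = 1 if random.random() < prob_one else 0
        feats.append(prev)
    return feats

print(sample_chain(["s", "aa", "aa", "s"], voicing_cpt))
```

In GMTK the same dependence is expressed declaratively via MDCPTs and inference handles all chains jointly; this sketch only shows the conditional structure of a single feature.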

  8. Issues in articulatory modeling • Choice of features • Feature set developed with Eva Holtz and Katrin Kirchhoff at WS01 • 8 features, state space size = 8,960 • Model size and constraints • Use inter-frame (and inter-articulator) constraints to limit the state space • Experiment with varying the size of the state space • Initialization • Initialize from existing phone models • If multiple phones match the same feature setting, aggregate their models • If no phone matches a feature setting, interpolate the models of phones with similar features (similarly to the HAMM of Richardson, Bilmes, and Diorio) • Discriminative structure learning over articulatory variables
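The initialization scheme above can be illustrated with a small sketch. The phone set, the one-dimensional Gaussian means, the binary feature vectors, and the similarity weighting below are all illustrative assumptions (this is not the WS01 feature set): exact feature matches are averaged, and a feature setting with no matching phone falls back to a similarity-weighted interpolation of all phone models:

```python
import math

# Hypothetical phone models: a 1-D Gaussian mean per phone (placeholder values),
# and a binary feature vector (voicing, nasality) per phone.
phone_means = {"m": -1.0, "n": -0.8, "b": 0.5, "d": 0.7}
phone_feats = {"m": (1, 1), "n": (1, 1), "b": (1, 0), "d": (1, 0)}

def init_mean(target_feats):
    """Initialize the model for a feature setting from existing phone models:
    average the models of all phones whose features match exactly; if none
    match, interpolate all phone models weighted by feature agreement."""
    matches = [p for p, f in phone_feats.items() if f == target_feats]
    if matches:  # aggregate exact matches
        return sum(phone_means[p] for p in matches) / len(matches)
    # fall back: weight each phone by exp(number of agreeing features)
    weights = {p: math.exp(sum(a == b for a, b in zip(f, target_feats)))
               for p, f in phone_feats.items()}
    z = sum(weights.values())
    return sum(w * phone_means[p] for p, w in weights.items()) / z
```

For example, the setting (1, 1) is matched by both nasals here, so its mean is the average of the "m" and "n" models, while an unmatched setting such as (0, 1) gets a blend leaning toward the phones sharing more features.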

  9. Conclusion • Speech may be better modeled by articulatory features than by phones • Graphical models in general, and GMTK in particular, allow for easy experimentation with articulatory models • Plan for future work: • Experiments with pre-specified articulatory structure • Structure learning on articulatory variables
