Learning Structured Models for Phone Recognition
Presentation Transcript

  1. Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein

  2. Acoustic Modeling

  3. Motivation
  • Standard acoustic models impose many structural constraints
  • We propose an automatic approach
  • Use TIMIT dataset
  • MFCC features
  • Full covariance Gaussians (Young and Woodland, 1994)

  4. Phone Classification (figure: unlabeled phone segments marked "?")

  5. Phone Classification (figure: segment labeled "æ")

  6. HMMs for Phone Classification

  7. HMMs for Phone Classification Temporal Structure

  8. Standard subphone/mixture HMM Temporal Structure Gaussian Mixtures
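The subphone HMM sketched on slides 6–8 scores a sequence of acoustic frames with the forward algorithm; for classification, each phone's HMM scores the segment and the highest-scoring phone wins. A minimal sketch in log space (the transition matrix, per-frame emission scores, and uniform start distribution are illustrative assumptions, not the paper's exact parameterization):

```python
import numpy as np

def forward_log_likelihood(log_trans, log_emit):
    """Log-likelihood of a frame sequence under one phone's HMM.

    log_trans: (S, S) log transition matrix over subphone states,
               log_trans[i, j] = log P(s_{t+1}=j | s_t=i).
    log_emit:  (T, S) per-frame log emission scores, e.g. from
               Gaussians over MFCC vectors (hypothetical inputs).
    """
    T, S = log_emit.shape
    alpha = log_emit[0] - np.log(S)  # assume a uniform start distribution
    for t in range(1, T):
        # log-sum-exp recursion: alpha_t(j) = log sum_i exp(alpha_{t-1}(i)
        # + log_trans[i, j]) + log_emit[t, j]
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(log_trans)) + log_emit[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())
```

To classify a segment, one would call this once per phone HMM and take the argmax over phones.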

  9. Our Model (vs. the standard model: fully connected structure, single Gaussians)

  10. Hierarchical Baum-Welch Training (error rates shown: 32.1%, 28.7%, 25.6%, 23.9%)
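Hierarchical Baum-Welch training (slide 10) alternates splitting every subphone state in two with EM retraining. A sketch of just the split step, duplicating each state's parameters and adding small symmetry-breaking noise so EM can differentiate the copies (the function name, noise scale, and Gaussian-mean representation are assumptions for illustration; the EM retraining pass is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_states(trans, means):
    """Split every subphone state into two substates.

    trans: (S, S) row-stochastic transition matrix.
    means: (S, D) Gaussian emission means.
    Old state i becomes new states 2i and 2i+1; transition mass into
    a split state is shared evenly, so rows remain stochastic.
    """
    new_trans = np.repeat(np.repeat(trans, 2, axis=0), 2, axis=1) / 2.0
    new_means = np.repeat(means, 2, axis=0)
    # small perturbation so the two copies can diverge under EM
    new_means = new_means + 0.01 * rng.standard_normal(new_means.shape)
    return new_trans, new_means
```

Repeating split-then-retrain yields the hierarchy of progressively refined substates the slide's error rates trace.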

  11. Phone Classification Results

  12. ? ? ? ? ? ? ? ? ? Phone Recognition

  13. Standard State-Tied Acoustic Models

  14. No more State-Tying

  15. No more Gaussian Mixtures

  16. Fully connected internal structure

  17. Fully connected external structure

  18. Refinement of the /ih/-phone

  19. Refinement of the /ih/-phone

  20. Refinement of the /ih/-phone

  21. Refinement of the /ih/-phone

  22. Refinement of the /l/-phone

  23. Hierarchical Refinement Results

  24. Merging
  • Not all phones are equally complex
  • Compute the log-likelihood loss from merging
  (Figure: split model vs. model merged at one node, over time steps t-1, t, t+1)

  25. Merging Criterion (figure: split vs. merged model over time steps t-1, t, t+1)
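The merging criterion on slides 24–25 estimates how much data log-likelihood would be lost by collapsing two sibling substates back into one. One way to approximate it, in the style of the forward-backward split-merge approximation from Petrov et al. (2006) that this work builds on: at each frame, swap the two substates' forward/backward contributions for those of a hypothetical merged state. All input names are illustrative:

```python
import numpy as np

def merge_loss(alpha1, alpha2, beta1, beta2, seq_like, p1, p2):
    """Approximate log-likelihood loss of merging substates 1 and 2.

    alpha*, beta*: (T,) forward and backward probabilities of each
                   substate at each frame.
    seq_like:      total likelihood of the observation sequence.
    p1, p2:        relative frequencies of the two substates,
                   used to average their forward scores.
    """
    merged_alpha = p1 * alpha1 + p2 * alpha2   # weighted inward score
    merged_beta = beta1 + beta2                # summed outward score
    loss = 0.0
    for t in range(len(alpha1)):
        # sequence likelihood if the merge applied only at frame t
        lt = (seq_like
              - alpha1[t] * beta1[t] - alpha2[t] * beta2[t]
              + merged_alpha[t] * merged_beta[t])
        loss += np.log(lt / seq_like)
    return loss
```

Merges with the smallest loss are applied, so simple phones keep few substates while complex ones stay refined.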

  26. Split and Merge Results

  27. HMM states per phone

  28. HMM states per phone

  29. HMM states per phone

  30. Alignment Results

  31. Alignment State Distribution

  32. Inference
  • State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
  • Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
  • Transcription: d - ae - d
  • Which decoding? Viterbi? Variational?
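The mapping on slide 32 from substates to a transcription can be sketched directly: strip the substate indices, then collapse consecutive repeats. This assumes the slide's naming convention of phone label plus substate digit (e.g. `ae5`); the point of the slide is that Viterbi over substates optimizes the wrong quantity, since many substate sequences map to the same phone sequence:

```python
import re

def collapse_to_transcription(state_seq):
    """Map a subphone state sequence to its phone transcription."""
    # drop trailing substate indices: "ae5" -> "ae"
    phones = [re.sub(r"\d+$", "", s) for s in state_seq]
    # collapse consecutive repeats: d d d ae ae d -> d ae d
    out = []
    for p in phones:
        if not out or out[-1] != p:
            out.append(p)
    return out
```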

  33. Variational Inference
  • Variational approximation
  • Solution: posterior edge marginals
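The variational approximation on slide 33 is built from posterior edge marginals, which come out of standard forward-backward. A minimal sketch in probability space for clarity (a real system would work in log space; the uniform start distribution is an assumption):

```python
import numpy as np

def edge_marginals(log_trans, log_emit):
    """Posterior edge marginals xi[t, i, j] = P(s_t=i, s_{t+1}=j | obs)
    computed with forward-backward.

    log_trans: (S, S) log transition matrix.
    log_emit:  (T, S) per-frame log emission scores.
    """
    trans = np.exp(log_trans)
    emit = np.exp(log_emit)
    T, S = emit.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = emit[0] / S                     # uniform start assumed
    for t in range(1, T):                      # forward pass
        alpha[t] = (alpha[t - 1] @ trans) * emit[t]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # backward pass
        beta[t] = trans @ (emit[t + 1] * beta[t + 1])
    Z = alpha[-1].sum()                        # sequence likelihood
    xi = np.empty((T - 1, S, S))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * trans
                 * (emit[t + 1] * beta[t + 1])[None, :]) / Z
    return xi
```

Each `xi[t]` sums to one over state pairs, so these marginals can parameterize a simpler sequence model for decoding.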

  34. Phone Recognition Results

  35. Conclusions
  • Minimalist, automatic approach
    • Unconstrained
    • Accurate
  • Phone classification: competitive with state-of-the-art discriminative methods despite being generative
  • Phone recognition: better than standard state-tied triphone models

  36. Thank you! http://nlp.cs.berkeley.edu