
Learning Structured Models for Phone Recognition

Slav Petrov, Adam Pauls, Dan Klein. Standard acoustic models impose many structural constraints; this work proposes an automatic approach to learning that structure instead. Experiments use the TIMIT dataset, MFCC features, and full-covariance Gaussians.



Presentation Transcript


  1. Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein

  2. Acoustic Modeling

  3. Motivation • Standard acoustic models impose many structural constraints • We propose an automatic approach • Use TIMIT Dataset • MFCC features • Full covariance Gaussians (Young and Woodland, 1994)
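Each MFCC frame is scored under a full-covariance Gaussian; a minimal sketch of that density computation (the 39-dimensional zero frame and unit covariance are illustrative, not from the talk):

```python
import numpy as np

def gaussian_loglik(x, mu, sigma):
    """Log-density of frame x under a full-covariance Gaussian N(mu, sigma)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)          # stable log-determinant
    maha = diff @ np.linalg.solve(sigma, diff)    # Mahalanobis distance
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Toy 39-dimensional "MFCC" frame scored under a unit Gaussian
d = 39
print(gaussian_loglik(np.zeros(d), np.zeros(d), np.eye(d)))
```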

  4. Phone Classification (figure: a sequence of unlabeled acoustic frames)

  5. Phone Classification (figure: the same frames, labeled /ae/)

  6. HMMs for Phone Classification

  7. HMMs for Phone Classification Temporal Structure

  8. Standard subphone/mixture HMM (figure: temporal structure with Gaussian-mixture emissions)
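The "temporal structure" in such models is typically a left-to-right subphone topology; an illustrative 3-substate transition matrix (the probabilities here are made up for the sketch):

```python
import numpy as np

# A standard 3-subphone left-to-right topology for one phone:
# each substate may self-loop or advance, but never move backwards.
A = np.array([
    [0.6, 0.4, 0.0],   # begin  -> begin | middle
    [0.0, 0.6, 0.4],   # middle -> middle | end
    [0.0, 0.0, 1.0],   # end    -> end (phone exit handled externally)
])
assert np.allclose(A.sum(axis=1), 1.0)   # each row is a valid distribution
```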

  9. Our Model (figure: standard model with Gaussian mixtures vs. our model with fully connected substates and single Gaussians)

  10. Hierarchical Baum-Welch Training (figure: classification error at successive split levels: 32.1%, 28.7%, 25.6%, 23.9%)
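The hierarchical training loop can be sketched as follows: each substate is split in two by perturbing its mean, and the model is then re-estimated with Baum-Welch. This is a minimal illustration (the EM re-estimation step is elided, and the perturbation scheme is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_states(means, covs, eps=0.1):
    """Split each substate in two via a small symmetric mean perturbation.
    In the full system, Baum-Welch (EM) re-estimation follows each split."""
    new_means, new_covs = [], []
    for mu, sigma in zip(means, covs):
        noise = eps * rng.standard_normal(mu.shape)
        new_means += [mu + noise, mu - noise]   # two perturbed copies
        new_covs += [sigma.copy(), sigma.copy()]
    return new_means, new_covs

means, covs = [np.zeros(3)], [np.eye(3)]
for _ in range(2):                  # two split rounds: 1 -> 2 -> 4 substates
    means, covs = split_states(means, covs)
print(len(means))  # 4
```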

  11. Phone Classification Results

  12. Phone Recognition (figure: a sequence of unlabeled acoustic frames)

  13. Standard State-Tied Acoustic Models

  14. No more State-Tying

  15. No more Gaussian Mixtures

  16. Fully connected internal structure

  17. Fully connected external structure

  18. Refinement of the /ih/-phone

  19. Refinement of the /ih/-phone

  20. Refinement of the /ih/-phone

  21. Refinement of the /ih/-phone

  22. Refinement of the /l/-phone

  23. Hierarchical Refinement Results

  24. Merging • Not all phones are equally complex • Compute the log-likelihood loss from merging (figure: split model vs. model merged at one node, over frames t-1, t, t+1)

  25. Merging Criterion (figure: split vs. merged model over frames t-1, t, t+1)
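The merging criterion can be illustrated with a toy one-dimensional version: the drop in training log-likelihood when two Gaussian substates with soft counts are pooled into a single Gaussian. All quantities here are hypothetical; the full model computes the loss from posterior counts rather than explicit data points:

```python
import numpy as np

def gaussian_merge_loss(n1, mu1, var1, n2, mu2, var2):
    """Log-likelihood loss from pooling two 1-D Gaussian substates
    (soft counts n1, n2) into one; a small loss means merging is safe."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n                                  # pooled mean
    var = (n1 * (var1 + (mu1 - mu) ** 2)
           + n2 * (var2 + (mu2 - mu) ** 2)) / n                     # pooled variance
    def ll(n_i, var_i):
        # log-likelihood of n_i points under their own ML Gaussian fit
        return -0.5 * n_i * (np.log(2 * np.pi * var_i) + 1.0)
    split_ll = ll(n1, var1) + ll(n2, var2)
    merged_ll = ll(n, var)
    return split_ll - merged_ll

# Identical substates: merging costs (almost) nothing
print(gaussian_merge_loss(10, 0.0, 1.0, 10, 0.0, 1.0))
# Well-separated substates: large loss, keep the split
print(gaussian_merge_loss(10, -3.0, 1.0, 10, 3.0, 1.0))
```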

  26. Split and Merge Results

  27. HMM states per phone

  28. HMM states per phone

  29. HMM states per phone

  30. Alignment Results

  31. Alignment State Distribution

  32. Inference • State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5 • Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d • Transcription: d-ae-d • Viterbi? Variational?
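The collapse from subphone path to phone sequence to transcription shown on this slide can be written directly (the string-based state names are just a convenient encoding for the sketch):

```python
from itertools import groupby

def collapse(states):
    """Map a subphone-state path to its transcription:
    strip substate indices (d1 -> d), then merge repeated phones."""
    phones = [s.rstrip("0123456789") for s in states]
    return [p for p, _ in groupby(phones)]   # keep one per consecutive run

path = "d1 d6 d6 d4 ae5 ae2 ae3 ae0 d2 d2 d3 d7 d5".split()
print(collapse(path))  # ['d', 'ae', 'd']
```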

  33. Variational Inference • Solution: posterior edge marginals • Variational approximation (formula not preserved in this transcript)
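Posterior state and edge marginals come from the forward-backward recursions; a minimal discrete-HMM sketch of the state marginals (the toy parameters and observation sequence are illustrative, not from the paper):

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior state marginals p(z_t | x_{1..T}) for a discrete HMM.
    pi: initial distribution, A: transitions, B[state, symbol]: emissions."""
    T, S = len(obs), len(pi)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                         # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):                # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_backward(pi, A, B, [0, 1, 0]))      # rows sum to 1
```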

  34. Phone Recognition Results

  35. Conclusions • Minimalist, automatic approach: unconstrained and accurate • Phone classification: competitive with state-of-the-art discriminative methods despite being generative • Phone recognition: better than standard state-tied triphone models

  36. Thank you! http://nlp.cs.berkeley.edu
