
Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein
Motivation
• Standard acoustic models impose many structural constraints
• We propose an automatic approach
• Use TIMIT Dataset
• MFCC features
• Full covariance Gaussians
(Young and Woodland, 1994)
Phone Classification
HMMs for Phone Classification
• Temporal Structure
Standard subphone/mixture HMM
• Temporal Structure
• Gaussian Mixtures
Our Model vs. Standard Model
• Fully Connected
• Single Gaussians
Hierarchical Baum-Welch Training
(Figure labels, in extracted order: 25.6%, 23.9%, 32.1%, 28.7%)
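The split step that hierarchical training builds on can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that training alternates between splitting substates and re-running Baum-Welch, and it uses one-dimensional Gaussians for brevity. The function name `split_substates`, the `(weight, mean, var)` tuples, and the perturbation size `eps` are all hypothetical.

```python
import math

def split_substates(substates, eps=0.01):
    """Split every (weight, mean, var) substate into two perturbed copies.

    Assumption: 1-D Gaussians and a fixed, deterministic perturbation.
    A real system would use full covariance Gaussians and re-estimate
    all parameters with Baum-Welch after each split.
    """
    result = []
    for weight, mean, var in substates:
        shift = eps * math.sqrt(var)  # small offset relative to the spread
        result.append((weight / 2.0, mean - shift, var))
        result.append((weight / 2.0, mean + shift, var))
    return result

# Start from a single substate and split it once.
model = split_substates([(1.0, 0.0, 4.0)])
```

The two copies share the original mass and variance. The small mean offset only breaks the symmetry so that re-estimation can move them toward different parts of the data.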
Phone Recognition
Merging
• Not all phones are equally complex
• Compute log likelihood loss from merging
(Figure: split model vs. model merged at one node, over time steps t-1, t, t+1)
Merging Criterion
(Figure: time steps t-1, t, t+1)
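A log likelihood loss from merging can be illustrated with a simplified stand-in. The sketch below is not the criterion from the slides, whose formula did not survive in the text. It assumes one-dimensional Gaussians, ignores the HMM context, and compares the likelihood of some frames under two weighted components with their likelihood under a single moment-matched Gaussian. The names `merge`, `merge_loss`, and the example data are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def merge(a, b):
    """Moment-match two (weight, mean, var) components into one."""
    (w1, m1, v1), (w2, m2, v2) = a, b
    w = w1 + w2
    mean = (w1 * m1 + w2 * m2) / w
    second_moment = (w1 * (v1 + m1 ** 2) + w2 * (v2 + m2 ** 2)) / w
    return w, mean, second_moment - mean ** 2

def merge_loss(data, a, b):
    """Log likelihood under the split pair minus under the merged Gaussian."""
    w, mean, var = merge(a, b)
    loss = 0.0
    for x in data:
        split = sum(wi / w * math.exp(log_gauss(x, mi, vi))
                    for wi, mi, vi in (a, b))
        loss += math.log(split) - log_gauss(x, mean, var)
    return loss

frames = [-2.1, -2.0, -1.9, 1.9, 2.0, 2.1]
# Two well-separated components: merging them should cost likelihood.
far_apart = merge_loss(frames, (0.5, -2.0, 0.1), (0.5, 2.0, 0.1))
# Two identical components: merging them should cost nothing.
identical = merge_loss(frames, (0.5, 0.0, 4.0), (0.5, 0.0, 4.0))
```

Under this simplification, pairs with a small loss are the natural candidates for merging, which matches the observation that not all phones need the same number of substates.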
Alignment Results
Inference
• State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
• Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
• Transcription: d - ae - d
(Slide labels: Viterbi, Variational, ???)
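The relationship between the three levels above can be made concrete with a short sketch. It assumes, as the example suggests, that a substate name is a phone label followed by a numeric index, and that the transcription is obtained by collapsing runs of identical phones. The helper names `state_to_phone` and `collapse` are hypothetical.

```python
import re
from itertools import groupby

def state_to_phone(state):
    """Drop the trailing substate index, e.g. 'ae5' -> 'ae'."""
    return re.sub(r"\d+$", "", state)

def collapse(states):
    """Map a state sequence to its phone sequence and its transcription."""
    phones = [state_to_phone(s) for s in states]
    # Consecutive repeats of the same phone form a single segment.
    transcription = [phone for phone, _ in groupby(phones)]
    return phones, transcription

states = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5".split("-")
phones, transcription = collapse(states)
print("-".join(phones))
print(" - ".join(transcription))
```

Many different state sequences collapse to the same transcription, so the single best state sequence does not necessarily yield the best transcription.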
Variational Inference
• Variational approximation: (equation not recoverable from the slide text)
• Solution: posterior edge marginals
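One ingredient of working with posterior edge marginals can be sketched as follows. This is an illustration only, not the authors' procedure. It assumes that substate-level edge posteriors between two adjacent frames are already available (for example from a forward-backward pass, which is not shown) and sums them to the phone level. The function name `phone_edge_marginals` and the example probabilities are hypothetical.

```python
import re
from collections import defaultdict

def phone_of(state):
    """Drop the trailing substate index, e.g. 'd6' -> 'd'."""
    return re.sub(r"\d+$", "", state)

def phone_edge_marginals(state_edge_posteriors):
    """Sum substate-level edge posteriors into phone-level edge posteriors.

    state_edge_posteriors maps (from_state, to_state) to the posterior
    probability of taking that transition between two adjacent frames.
    """
    totals = defaultdict(float)
    for (src, dst), prob in state_edge_posteriors.items():
        totals[(phone_of(src), phone_of(dst))] += prob
    return dict(totals)

# Made-up posteriors for a single pair of adjacent frames.
edges = {
    ("d1", "d6"): 0.5,
    ("d4", "ae5"): 0.25,
    ("d6", "ae5"): 0.125,
    ("ae5", "ae2"): 0.125,
}
marginals = phone_edge_marginals(edges)
```

Summing over substates keeps the total probability mass unchanged while removing the substate identity, which is the level of detail a phone-level transcription needs.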
Conclusions
• Minimalist, Automatic Approach
  • Unconstrained
  • Accurate
• Phone Classification
  • Competitive with state-of-the-art discriminative methods despite being generative
• Phone Recognition
  • Better than standard state-tied triphone models
Thank you! http://nlp.cs.berkeley.edu