Learning Structured Models for Phone Recognition

Slav Petrov, Adam Pauls, Dan Klein

Acoustic Modeling

Motivation
• Standard acoustic models impose many structural constraints
• We propose an automatic approach
• Use TIMIT Dataset
• MFCC features
• Full covariance Gaussians (Young and Woodland, 1994)

Phone Classification [Figure: segments with unknown phone labels, each marked "?"]

Phone Classification æ

HMMs for Phone Classification

HMMs for Phone Classification Temporal Structure

Standard subphone/mixture HMM: Temporal Structure, Gaussian Mixtures

Standard Model vs. Our Model: Fully Connected, Single Gaussians

Hierarchical Baum-Welch Training [Figure values: 32.1%, 28.7%, 25.6%, 23.9%]
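The slide names the training procedure but gives no details. A common form of hierarchical refinement, assumed here, repeatedly splits every substate into two slightly perturbed copies and then re-estimates the model with Baum-Welch. The sketch below shows only the split step for 1-D Gaussian substates; `split_substates`, the `(mean, var, weight)` tuple layout, and the `scale` parameter are hypothetical names chosen for illustration, and the re-estimation pass is left out.

```python
import random

def split_substates(substates, rng, scale=0.01):
    """Split each (mean, var, weight) substate into two perturbed children.

    Assumption: means are nudged apart by a small random fraction of the
    standard deviation so that later EM passes can pull the copies apart.
    """
    children = []
    for mean, var, weight in substates:
        delta = scale * (var ** 0.5) * rng.uniform(0.5, 1.0)
        children.append((mean + delta, var, weight / 2))
        children.append((mean - delta, var, weight / 2))
    return children

rng = random.Random(0)
model = [(0.0, 1.0, 1.0)]  # one substate to start with
sizes = []
for _ in range(3):
    model = split_substates(model, rng)
    # A Baum-Welch re-estimation pass would run here (omitted in this sketch).
    sizes.append(len(model))

print(sizes)  # [2, 4, 8]
```

Halving the weight keeps the total mass of each parent substate unchanged, so the split model starts out close to the model it was derived from.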

Phone Classification Results

Phone Recognition [Figure: sequence of unknown phone labels, each marked "?"]

Standard State-Tied Acoustic Models

No more State-Tying

No more Gaussian Mixtures

Fully connected internal structure

Fully connected external structure

Refinement of the /ih/-phone

Refinement of the /l/-phone

Hierarchical Refinement Results

Merging
• Not all phones are equally complex
• Compute log likelihood loss from merging
[Figure: "Split model" and "Merged at one node", each over time steps t-1, t, t+1]

Merging Criterion [Figure: two panels over time steps t-1, t, t+1]
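The idea of a log likelihood loss from merging can be illustrated with a deliberately simplified sketch. It assumes 1-D observations that are hard-assigned to two substates and compares the data log likelihood under two separately fitted Gaussians with that under a single Gaussian fitted to the pooled data. This is not the criterion shown on the slides, which is evaluated inside the HMM; the function names are hypothetical.

```python
import math

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance (with a small variance floor)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, max(var, 1e-6)

def log_likelihood(xs, mean, var):
    """Sum of Gaussian log densities of xs under N(mean, var)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in xs
    )

def merge_loss(xs_a, xs_b):
    """Log likelihood lost by replacing two substates with one merged substate."""
    split_ll = (log_likelihood(xs_a, *fit_gaussian(xs_a))
                + log_likelihood(xs_b, *fit_gaussian(xs_b)))
    pooled = xs_a + xs_b
    merged_ll = log_likelihood(pooled, *fit_gaussian(pooled))
    return split_ll - merged_ll

# Two nearly identical substates: merging costs almost nothing.
near = merge_loss([0.0, 1.0, 2.0], [0.1, 1.1, 1.9])
# Two well-separated substates: merging costs a lot.
far = merge_loss([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
print(near < far)  # True
```

Substate pairs with a small loss are candidates for merging back, which lets simple phones keep few states while complex phones keep more.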

Split and Merge Results

HMM states per phone

Alignment Results

Alignment State Distribution

Inference
• State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
• Phone sequence: d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
• Transcription: d-ae-d
Viterbi / Variational / ???
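The mapping on this slide from a substate sequence to a phone sequence to a transcription can be sketched in a few lines. The sketch assumes that each substate label is a phone name followed by a numeric index, as in the example above, and that a transcription is obtained by collapsing runs of the same phone; `collapse` is a hypothetical helper name.

```python
import re
from itertools import groupby

def collapse(state_sequence):
    """Map 'd1-d6-ae5-...' to (per-frame phone list, collapsed transcription)."""
    states = state_sequence.split("-")
    # Drop the trailing substate index: 'ae5' -> 'ae'.
    phones = [re.sub(r"\d+$", "", s) for s in states]
    # Collapse each run of identical phones into a single symbol.
    transcription = [phone for phone, _ in groupby(phones)]
    return phones, transcription

phones, transcription = collapse("d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5")
print("-".join(phones))         # d-d-d-d-ae-ae-ae-ae-d-d-d-d-d
print("-".join(transcription))  # d-ae-d
```

Many different substate sequences collapse to the same transcription, so the single best state sequence need not correspond to the best transcription. Note also that this simple collapsing cannot represent two identical phones in a row.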

Variational Inference
• Variational Approximation: [equation omitted in transcript]
• Solution: Posterior edge marginals
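To make "posterior edge marginals" concrete, here is a minimal forward-backward sketch for a generic toy HMM. It computes, for each time step, the posterior probability of every transition (edge) given the whole observation sequence. The equations from the slide did not survive in the transcript, so this illustrates the quantity in general and is not the authors' formulation; `edge_marginals`, `init`, `trans`, and `emit_probs` are hypothetical names, and the numbers are made up.

```python
def edge_marginals(init, trans, emit_probs):
    """Return xi[t][i][j] = P(s_t = i, s_{t+1} = j | x_1..x_T).

    init[i]          : initial state probability
    trans[i][j]      : transition probability i -> j
    emit_probs[t][i] : p(x_t | state i), precomputed per frame
    Unscaled recursions; fine for short sequences only.
    """
    T, S = len(emit_probs), len(init)
    # Forward pass.
    alpha = [[0.0] * S for _ in range(T)]
    for i in range(S):
        alpha[0][i] = init[i] * emit_probs[0][i]
    for t in range(1, T):
        for j in range(S):
            alpha[t][j] = emit_probs[t][j] * sum(
                alpha[t - 1][i] * trans[i][j] for i in range(S))
    # Backward pass.
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(
                trans[i][j] * emit_probs[t + 1][j] * beta[t + 1][j]
                for j in range(S))
    z = sum(alpha[T - 1])  # likelihood of the observations
    # Posterior of each edge (i at time t, j at time t + 1).
    return [[[alpha[t][i] * trans[i][j] * emit_probs[t + 1][j]
              * beta[t + 1][j] / z
              for j in range(S)] for i in range(S)] for t in range(T - 1)]

xi = edge_marginals(
    init=[0.6, 0.4],
    trans=[[0.7, 0.3], [0.4, 0.6]],
    emit_probs=[[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]],
)
for t, edges in enumerate(xi):
    print(t, round(sum(sum(row) for row in edges), 6))
```

At every time step the edge marginals form a distribution over transitions, so each printed total is 1 up to rounding.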

Phone Recognition Results

Conclusions
• Minimalist, Automatic Approach
• Unconstrained
• Accurate
• Phone Classification
  • Competitive with state-of-the-art discriminative methods despite being generative
• Phone Recognition
  • Better than standard state-tied triphone models

Thank you! http://nlp.cs.berkeley.edu
