
TANDEM OBSERVATION MODELS


Presentation Transcript


  1. TANDEM OBSERVATION MODELS

  2. Introduction
  • Tandem is a method for using the predictions of an MLP as observation vectors in generative models, e.g. HMMs
  • Extensively used in the ICSI/SRI systems: 10-20% improvement for English, Arabic, and Mandarin
  • Most previous work derived tandem features from phone MLPs (e.g., Hermansky et al. '00 and Morgan et al. '05)
  • We explore tandem features based on articulatory MLPs
  - Similar to the approach in Kirchhoff '99
  • Questions
  - Are articulatory tandems better than the phonetic ones?
  - Are factored observation models for tandem and acoustic (e.g. PLP) observations better than the observation concatenation approaches?

  3. Tandem Processing Steps
  MLP OUTPUTS → LOGARITHM → PRINCIPAL COMPONENT ANALYSIS → SPEAKER MEAN/VAR NORMALIZATION → TANDEM FEATURE
  • MLP posteriors are processed to make them more Gaussian-like
  • There are 8 articulatory MLPs; their outputs are joined together at the input (64 dims)
  • PCA reduces the dimensionality to 26 (95% of the total variance)
  • This 26-dimensional vector is used as the acoustic observation in an HMM or some other model
  • The tandem features are usually used in combination w/ a standard feature, e.g. PLP (a sketch of the chain follows below)
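  A minimal Python sketch of this chain, assuming NumPy and scikit-learn; the posterior matrix shape, the epsilon floor, and fitting the PCA on the same utterances it transforms are simplifications (in practice the PCA basis would be estimated once on training data):

      import numpy as np
      from sklearn.decomposition import PCA

      def tandem_features(posteriors, n_components=26, eps=1e-10):
          """posteriors: (n_frames, 64) array of the 8 articulatory
          MLPs' outputs, concatenated frame by frame."""
          # Step 1: logarithm, to make the skewed posteriors more Gaussian-like.
          log_post = np.log(posteriors + eps)
          # Step 2: PCA from 64 dims down to 26 (~95% of the total variance).
          reduced = PCA(n_components=n_components).fit_transform(log_post)
          # Step 3: per-speaker mean/variance normalization.
          return (reduced - reduced.mean(axis=0)) / (reduced.std(axis=0) + eps)

  The resulting 26-dim vectors would then be appended to the standard PLP frames, e.g. np.hstack([plp, tandem]), before HMM training.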

  4. Tandem Observation Models
  [Diagram: "Concatenated Observations" (the state Q emits one stacked Tandem+PLP vector) vs. "Factored Observations" (the state Q emits the Tandem and PLP streams through separate densities): p(X, Y|Q) = p(X|Q) p(Y|Q)]
  • Feature concatenation: simply append the tandems to the PLPs
  - All of the standard modeling methods apply to this meta observation vector (e.g., MLLR, MMIE, and HLDA)
  • Factored models: the tandem and PLP distributions are factored at the HMM state output distributions (see the sketch below)
  - Potentially a more efficient use of free parameters, especially if the streams are conditionally independent
  - Can use, e.g., separate triphone clusters for each observation
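  To make the contrast concrete, here is a sketch with single diagonal-covariance Gaussians standing in for the real state output densities (the actual systems use GMMs; the parameter shapes are illustrative assumptions):

      import numpy as np

      def diag_gauss_loglik(x, mean, var):
          # Log-density of x under a diagonal-covariance Gaussian.
          return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

      def loglik_concat(plp, tandem, mean, var):
          # Concatenated model: one density over the stacked [PLP; tandem] vector.
          return diag_gauss_loglik(np.concatenate([plp, tandem]), mean, var)

      def loglik_factored(plp, tandem, plp_mean, plp_var, td_mean, td_var):
          # Factored model: log p(x, y|q) = log p(x|q) + log p(y|q),
          # i.e. the two streams are conditionally independent given the state.
          return (diag_gauss_loglik(plp, plp_mean, plp_var)
                  + diag_gauss_loglik(tandem, td_mean, td_var))

  With a single diagonal Gaussian per state the two scores coincide; the factorization starts to matter once mixtures, stream-specific tying, or per-stream weights come in, as in the triphone experiments below.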

  5. Articulatory vs. Phone Tandems
  • Monophones on the 500-word vocabulary task w/o alignments; feature-concatenated PLP/tandem models
  • All tandem systems are significantly better than PLP alone
  • Articulatory tandems are as good as phone tandems
  • Articulatory tandems from MLPs trained on Fisher (1776 hrs) outperform those from MLPs trained on SVB (3 hrs)

  6. Concatenation vs. Factoring
  • Monophone models w/o alignments
  • All tandem results are significant over the PLP baseline
  • Consistent improvements from factoring; statistically significant on the 500-word task

  7. Triphone Experiments
  • 500-word vocabulary task w/o alignments
  • PLP x Tandem factoring uses separate decision trees for PLP and tandem, as well as factored pdfs (see the sketch after this slide)
  • A significant improvement from factoring over the feature concatenation approach
  • All pairs of results are statistically significant
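  A hedged sketch of what separate decision trees per stream mean at scoring time: each stream maps the same triphone context to its own tied state, and the factored score sums the per-stream log-likelihoods. The dictionary-style tree lookups and callable pdfs are placeholder abstractions, not a real toolkit's API:

      def factored_triphone_loglik(plp, tandem, triphone,
                                   plp_tree, tandem_tree,
                                   plp_pdfs, tandem_pdfs):
          """triphone: a context label such as ('k', 'ae', 't');
          plp_tree / tandem_tree: separate leaf maps, so the two
          streams may be tied differently."""
          plp_state = plp_tree[triphone]      # leaf of the PLP tree
          td_state = tandem_tree[triphone]    # leaf of the tandem tree
          # Factored state output: sum of the per-stream log-likelihoods.
          return plp_pdfs[plp_state](plp) + tandem_pdfs[td_state](tandem)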

  8. Observation Factoring and Weight Tuning: Results
  [Diagram: "Factored tandem", where phoneState emits the PLPs and the KLT'ed log MLP outputs as separate streams, and "Fully factored tandem", where phoneState emits the PLPs plus the log outputs of the separate MLPs (dg1, pl1, rd, . . .), each as its own stream]
  • Dimensions of streams: dims after KLT account for 95% of variance

  9. Weight Tuning
  [Diagram: the same factored (PLPs + KLT'ed log MLP outputs) and fully factored (PLPs + dg1, pl1, rd, . . .) stream structures as above]
  • MLP weight = 1; language model tuned for PLP weight = 1
  • Weight tuning in progress (a weighted-likelihood sketch follows below)
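  Stream-weight tuning in multi-stream models is usually done with per-stream exponents on the factored likelihoods, which become multipliers in the log domain; the sketch below assumes that form. The slide fixes both weights at 1 as the starting point; the tuning grid and the dev_wer scoring function are hypothetical:

      import numpy as np

      def weighted_loglik(ll_plp, ll_tandem, w_plp=1.0, w_tandem=1.0):
          # Stream-weighted score: w_plp * log p(x|q) + w_tandem * log p(y|q).
          return w_plp * ll_plp + w_tandem * ll_tandem

      # Hypothetical tuning loop: hold the PLP weight at 1 and pick the
      # tandem weight minimizing word error rate on a held-out set.
      # best_w = min(np.arange(0.0, 2.01, 0.25), key=lambda w: dev_wer(w))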

  10. Summary
  • Tandem features w/ PLPs outperform PLPs alone for both monophones and triphones
  - 8-13% relative improvements (statistically significant)
  • Articulatory tandems are as good as phone tandems
  - Further comparisons w/ phone MLPs trained on Fisher
  • Factored models look promising (significant results on the 500-word vocabulary task)
  - Further experiments w/ tying, initialization
  - Judiciously selected dependencies between the factored vectors, instead of complete independence
