
Minimum Phone Error (MPE) Model and Feature Training


Presentation Transcript


  1. Minimum Phone Error (MPE) Model and Feature Training ShihHsiang 2006

  2. The derivation flow of the various training criteria

  3. Difference • MPE vs. ORCE • ORCE focuses on word error rate and is implemented on N-best results • MPE focuses on phone accuracy and is implemented on a word graph; it also introduces a prior distribution over the newly estimated models (I-smoothing) • MPE vs. MMI • MMI treats the correct transcription as the numerator lattice and the whole word graph as the denominator lattice (the competing sequences) • MPE treats all possible correct sequences on the word graph as the numerator lattice, and all possible wrong sequences as the denominator lattice

  4. fMPE (cont.) • Feature-space minimum phone error (fMPE) is a discriminative training method that adds an offset, obtained by projecting a high-dimensional feature, to the current feature: y_t = x_t + M h_t, where x_t is the current feature (with the current-frame average as context), M is the feature transform matrix, and h_t is the high-dimensional feature vector • Each vector h_t contains 10,000 Gaussian posterior probabilities, and the Gaussian likelihoods are evaluated with no priors
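The fMPE feature computation y_t = x_t + M h_t can be sketched as follows. This is a toy illustration, not the actual system: the dimensions are tiny (the slide uses 10,000 Gaussians), the Gaussian layer uses unit variances and equal priors (matching the "no priors" remark), and the names `gaussian_posteriors`, `means`, and `M` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 13   # base feature dimension (toy size)
H = 50   # number of Gaussians / posterior dimensions (10,000 in the slide)
T = 5    # number of frames

# Toy Gaussian layer: random means, unit variance, equal (i.e. no) priors.
means = rng.normal(size=(H, D))

def gaussian_posteriors(x):
    """Posterior of each Gaussian for frame x (unit variance, no priors)."""
    log_lik = -0.5 * np.sum((means - x) ** 2, axis=1)
    log_lik -= log_lik.max()          # for numerical stability
    p = np.exp(log_lik)
    return p / p.sum()

# fMPE transform matrix M, normally trained by MPE gradient descent.
M = rng.normal(scale=0.01, size=(D, H))

X = rng.normal(size=(T, D))           # current features x_t
Y = np.empty_like(X)
for t in range(T):
    h_t = gaussian_posteriors(X[t])   # high-dimensional posterior vector h_t
    Y[t] = X[t] + M @ h_t             # new feature y_t = x_t + M h_t
```

Because h_t sums to one and M is small at initialisation, y_t starts close to x_t; training then moves the features in directions that improve the MPE criterion.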

  5. fMPE (cont.) • Objective function: the MPE criterion, with gradient descent used to update the transformation matrix M • Direct differential: the explicit derivative of the objective with respect to the features, and hence with respect to M
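The direct-differential update above can be sketched numerically. Since y_t = x_t + M h_t depends on M explicitly, the direct part of the gradient is dF/dM = Σ_t (dF/dy_t) h_tᵀ; here the per-frame derivatives dF/dy_t (which in practice come from MPE lattice statistics) and the posterior vectors h_t are random stand-ins, and the learning rate is an arbitrary toy value.

```python
import numpy as np

rng = np.random.default_rng(1)
D, H, T = 4, 8, 6                       # toy sizes

M = np.zeros((D, H))                    # transform matrix, initialised at zero
Hs = rng.dirichlet(np.ones(H), size=T)  # stand-in posterior vectors h_t
dF_dy = rng.normal(size=(T, D))         # stand-in for dF/dy_t from MPE statistics

# Direct differential: dF/dM = sum_t (dF/dy_t) h_t^T
grad_direct = dF_dy.T @ Hs              # shape (D, H)

lr = 0.1
M = M + lr * grad_direct                # gradient step on the MPE objective
```

The indirect differential discussed on the next slide would add a second term to this gradient, accounting for how ML retraining of the model reacts to the new features.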

  6. fMPE (cont.) • When only the direct differential is used to update the transformation matrix, significant improvements are obtained but are lost soon after the acoustic model is retrained with ML • The indirect differential part thus aims to reflect the change in the model parameters caused by ML retraining on the new features

  7. offset fMPE • Offset fMPE differs from the original fMPE in the definition of the high-dimensional vector h_t of posterior probabilities, where h_t(i) represents the posterior of the i-th Gaussian at time t; the offsets are dimension dependent • The number of Gaussians needed is about 1,000, significantly lower than the 100,000 for the original fMPE

  8. Dimension-weighted offset fMPE • Different from offset fMPE, which gives the same weight to each dimension of the feature offset vector • Dimension-weighted offset fMPE calculates the posterior probability on each dimension of the feature offset vector

  9. Experiments (on MATBN) • Error rates (%) for MPE and fMPE for different features, on different acoustic levels.

  10. Experiments (cont.) • CER(%) for offset fMPE and dimension-weighted offset fMPE with different features

  11. Connect to SPLICE • Decomposition Scheme 1

  12. Connect to SPLICE (cont.) • Compensation of the original feature is carried out by adding a large number of bias vectors, each of which is computed as a full-rank rotation of a small set of posterior probabilities • Maximum-likelihood estimation, where the indicated term is greater than the remaining (n−1) terms

  13. Connect to SPLICE (cont.) • Decomposition Scheme 2

  14. Connect to SPLICE (cont.) • The compensation vector consists of a linear weighted sum of a set of frame-independent correction vectors, where the weight is the posterior probability associated with the corresponding correction vector • The key difference is • the bias vector for compensation in fMPE is specific to each time frame t • the bias vector in feature-space stochastic matching is common over all frames in the utterance
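The feature-space stochastic matching form of compensation described above, a posterior-weighted sum of frame-independent correction vectors, can be sketched as follows. The sizes, the correction vectors `r`, and the function name `compensate` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
D, K = 13, 4                 # feature dimension and number of correction classes (toy)
r = rng.normal(size=(K, D))  # frame-independent correction (bias) vectors

def compensate(x, posteriors):
    """Add the posterior-weighted sum of correction vectors to the feature."""
    return x + posteriors @ r

x = rng.normal(size=D)
p = np.array([0.7, 0.2, 0.05, 0.05])  # posterior of each class given x
y = compensate(x, p)
```

This makes the contrast in the slide concrete: here the correction vectors r_k are shared across the whole utterance and only the weights p change, whereas in fMPE the offset M h_t is recomputed from frame-specific Gaussian posteriors at every time t.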
