
Bayesian Predictive Classification With Incremental Learning for Noisy Speech Recognition



Presentation Transcript


  1. Bayesian Predictive Classification With Incremental Learning for Noisy Speech Recognition 朱國華 89/12/06

  2. References • 簡仁宗、廖國鴻, “具有累進學習能力之貝氏預測法則在汽車語音辨識之應用” (Bayesian Predictive Rules with Incremental Learning Applied to Car Speech Recognition), ROCLING XIII, pp. 179-197, 2000. • H. Jiang, K. Hirose and Q. Huo, “Robust Speech Recognition Based on a Bayesian Prediction Approach”, IEEE Transactions on Speech and Audio Processing, vol. 7, no. 4, pp. 426-440, July 1999. • J.T. Chien, “Online Hierarchical Transformation of Hidden Markov Models for Speech Recognition”, IEEE Transactions on Speech and Audio Processing, vol. 7, no. 6, pp. 656-667, November 1999.

  3. Contents • Introduction • Problem Formulation • Some Decision Rules for ASR • Transform-Based Bayesian Predictive Classification (TBPC) • Derivation of the Bayesian Predictive Likelihood Measurement (BPLM) • Online Prior Evolution (OPE) • Experiments and Discussions

  4. Introduction • Transform-based Bayesian predictive classification (TBPC): a robust decision rule for noisy speech recognition. • Online prior evolution (OPE) to cope with the nonstationary testing environment (covering both environmental variation and speaker variation).

  5. Problem Formulation • Approximate MAP (Quasi-Bayesian, QB) estimation for ASR: n: index of the input test utterance. W: word content or syllable string of the input utterance. η: acoustic transformation parameter (function). X^(n) := {X1, X2, …, Xn}: the i.i.d. and successively observed block samples. φ^(n-1): the environmental statistics estimated from the previous input utterances X1, X2, …, X_(n-1).
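  The QB decision rule written out from these definitions (a reconstruction; the slide's equation is not preserved in this transcript, so the exact notation is assumed):

  $$(\hat{W}_n, \hat{\eta}_n) = \arg\max_{W,\,\eta}\; p(X_n \mid W, \eta)\; p(W, \eta \mid \varphi^{(n-1)})$$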

  6. Problem Formulation (cont.) • Assume W and η are independent, so the previous QB estimation can be rewritten as follows: {p(W): language model}
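  With the independence assumption p(W, η | φ^(n-1)) = p(W) p(η | φ^(n-1)), the rule factors as (a sketch consistent with the QB form above):

  $$\hat{W}_n = \arg\max_{W}\; p(W)\; \max_{\eta}\; p(X_n \mid W, \eta)\; p(\eta \mid \varphi^{(n-1)})$$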

  7. Some Decision Rules for ASR • Plug-in MAP Rule: • The performance of the plug-in MAP decision rule depends on the choice of estimation approach (ML, MAP, discriminative training, etc.), the nature and size of the training data, and the degree of mismatch between training and testing conditions. • Point estimation.
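  The plug-in MAP rule substitutes a point estimate of the acoustic model parameters, written here as Λ̂ (notation assumed):

  $$\hat{W} = \arg\max_{W}\; p(W)\; p(X \mid W, \hat{\Lambda})$$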

  8. Some Decision Rules for ASR (cont.) • Minimax Rule: • Nonparametric compensation. • Minimizes the upper bound of the worst-case probability of classification error. • Assumes the unknown true parameter Λ is a random variable with a uniform distribution in a neighborhood region Ω(Λ).
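  In the Merhav-Lee style minimax formulation, this leads to maximizing the likelihood over the neighborhood (a sketch; the symbols Λ and Ω(Λ) are assumed from the bullet above):

  $$\hat{W} = \arg\max_{W}\; p(W)\; \max_{\Lambda' \in \Omega(\Lambda)}\; p(X \mid W, \Lambda')$$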

  9. TBPC • Transform-Based Bayesian Predictive Classification (TBPC) Rule: • where the predictive likelihood is obtained by:
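  A reconstruction of the rule and its predictive likelihood from the surrounding slides (exact slide notation is assumed):

  $$\hat{W}_n = \arg\max_{W}\; p(W)\; \tilde{p}(X_n \mid W, \varphi^{(n-1)})$$

  $$\tilde{p}(X_n \mid W, \varphi^{(n-1)}) = \int p(X_n \mid W, G_{\eta}(\Lambda))\; p(\eta \mid \varphi^{(n-1)})\; d\eta$$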

  10. TBPC (cont.) • TBPC treats the transformed parameter as a random variable (not a point estimate). • The average is taken both with respect to the sampling variation in the expected testing data and with respect to the uncertainty of the transformation parameter described by the prior pdf p(η | φ^(n-1)). • TBPC can be applied in both supervised and unsupervised learning environments.

  11. TBPC (cont.) • Transformation-based Adaptation: • For a given HMM model with L states and K mixtures, Λ = {λ_i} = {ω_ik, μ_ik, r_ik}, i = 1…L, k = 1…K, the estimated transformation function G^(n)(·) for the given testing utterance X_n is defined as: • where c is the index of the transformation cluster (hierarchical transformation).
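  A minimal sketch of such a transformation, assuming the per-cluster mean-bias form used in Chien's online hierarchical transformation work (the slide's exact G is not preserved here):

  $$G^{(n)}_{\eta}(\mu_{ik}) = \mu_{ik} + \eta_c, \quad \text{for each Gaussian } (i,k) \text{ assigned to cluster } c$$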

  12. TBPC (cont.) • Implementation (Approach I): • Considering the missing-data problem, we use the Viterbi TBPC for the likelihood: • A frame-synchronous Viterbi Bayesian search algorithm can be utilized to reduce the memory space and computational load (Jiang, IEEE SAP 1999).
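  Viterbi TBPC replaces the sum over hidden state/mixture sequences with the single best path (a reconstruction; notation assumed):

  $$\hat{W}_n = \arg\max_{W}\; p(W)\; \max_{s,\,l}\; \int p(X_n, s, l \mid W, G_{\eta}(\Lambda))\; p(\eta \mid \varphi^{(n-1)})\; d\eta$$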

  13. TBPC (cont.) • Implementation (Approach I, cont.): • Jiang (IEEE SAP 1999) considered only the uncertainty of the mean vectors of CDHMMs with diagonal covariance matrices, and assumed the means are uniformly distributed in a neighborhood of the pretrained means (no online adaptation).

  14. TBPC (cont.) • Implementation (Approach II): • The Bayesian Predictive Density Based Model Compensation (BP-MC) of the K-mixture state observation pdf is: • where f(x_t^(n) | λ_ik) is the Bayesian predictive density, defined below:
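  A reconstruction of the two densities from the definitions above (notation assumed):

  $$b_i(x_t^{(n)}) = \sum_{k=1}^{K} \omega_{ik}\; f(x_t^{(n)} \mid \lambda_{ik})$$

  $$f(x_t^{(n)} \mid \lambda_{ik}) = \int p(x_t^{(n)} \mid \lambda_{ik}, \eta_c)\; p(\eta_c \mid \varphi_c^{(n-1)})\; d\eta_c$$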

  15. TBPC (cont.) • Implementation (Approach II, cont.): • The choice of prior pdf: • In Chien (ROCLING 2000), the prior is a multivariate Gaussian pdf, which is the conjugate prior for the mean of a Gaussian observation density.
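  Under that choice, the cluster prior and its hyperparameters take the form (a sketch; the hyperparameter names m_c and τ_c are assumptions):

  $$p(\eta_c \mid \varphi_c^{(n-1)}) = \mathcal{N}\!\left(\eta_c;\; m_c^{(n-1)},\; \big(\tau_c^{(n-1)}\big)^{-1}\right), \qquad \varphi_c^{(n-1)} = \{m_c^{(n-1)}, \tau_c^{(n-1)}\}$$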

  16. Derivation of the BPLM • Since p(x_t^(n) | λ_ik, η_c) and p(η_c | φ_c^(n-1)) are both Gaussian, we can derive f(x_t^(n) | λ_ik) in closed form (assume both τ_c and r_ik are diagonal precision matrices):
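  This is the standard conjugate-Gaussian result: integrating a Gaussian likelihood against a Gaussian prior on the mean bias yields another Gaussian whose covariances add (a sketch consistent with the assumptions above):

  $$f(x_t^{(n)} \mid \lambda_{ik}) = \mathcal{N}\!\left(x_t^{(n)};\; \mu_{ik} + m_c^{(n-1)},\; r_{ik}^{-1} + \big(\tau_c^{(n-1)}\big)^{-1}\right)$$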

  17. Online Prior Evolution • Viterbi Approach: • where (s_n^*, l_n^*) are the most likely state and mixture-component sequences corresponding to X_n, respectively.
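  A reconstruction of the Viterbi-approximated prior update (notation assumed): the evolved prior is the posterior given the best path,

  $$p(\eta \mid \varphi^{(n)}) \approx p(\eta \mid X_n, s_n^*, l_n^*, \varphi^{(n-1)}) \propto p(X_n, s_n^*, l_n^* \mid G_{\eta}(\Lambda))\; p(\eta \mid \varphi^{(n-1)})$$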

  18. Online Prior Evolution (cont.) • The parameter statistics of the c-th cluster are:
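  A sketch of the conjugate update under the Gaussian bias model above, with diagonal precisions; the sums run over frames t aligned by (s_n^*, l_n^*) to Gaussians (i, k) in cluster c (the slide's exact equations are assumed):

  $$\tau_c^{(n)} = \tau_c^{(n-1)} + \sum_{(t,i,k) \in c} r_{ik}$$

  $$m_c^{(n)} = \big(\tau_c^{(n)}\big)^{-1} \left( \tau_c^{(n-1)} m_c^{(n-1)} + \sum_{(t,i,k) \in c} r_{ik} \big(x_t^{(n)} - \mu_{ik}\big) \right)$$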

  19. Online Prior Evolution (cont.) • From the above derivation, we can adapt (learn) the hyperparameters φ_c^(n) online from φ_c^(n-1).

  20. Online Prior Evolution (cont.) • We can estimate the initial hyperparameters φ_c^(0) from the given training data.

  21. Experiments • Training and testing data set I (Mic1, clean): • 70 males and 70 females; each person recorded 10 continuous Mandarin digit sentences. • 50 males' and 50 females' utterances are used for training; the remaining 20 males' and 20 females' are used for testing. • Training and testing data set II (Mic2, noisy): • 2 males + 2 females in a Toyota Corolla 1.8. • 3 males + 3 females in a Nissan Sentra 1.6. • Each speaker individually recorded 10 sentences at idle speed, 20 sentences at 50 km/h, and 30 sentences at 90 km/h. • 5 sentences per speaker are chosen arbitrarily for training; the others are used for testing.

  22. Experiments (cont.) • Signal-to-Noise Ratio:
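  A standard definition of the signal-to-noise ratio, added here for reference (s is the speech signal, n the noise):

  $$\mathrm{SNR} = 10 \log_{10} \frac{\sum_t s^2(t)}{\sum_t n^2(t)} \;\; \mathrm{dB}$$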

  23. Experiments (cont.) • Recognizer Structure • Features: 12th-order LPC-derived cepstra and Δ-cepstra, plus Δ and ΔΔ log energy. • HMM models: 7 states and 4 mixtures for each digit model, plus 3 different single-state background-noise models.

  24. Experiments (cont.) • Baseline results:

  25. Experiments (cont.) • Supervised DER as a function of the number of training utterances.

  26. Experiments (cont.) • Unsupervised TBPC-OPE DER (values in parentheses are the % improvement). • In the 2-cluster case, the 10 digit models form one cluster and the 3 background-noise models form the other.

  27. Experiments (cont.) • Unsupervised Performance Comparison of different BPC approaches:

  28. Discussions • Jiang's results are the worst of all because the prior distribution is fixed. • Surendran's results are worse than TBPC-OPE because its prior-pdf adaptation relies only on the current input utterance rather than on the accumulated ones. • In the BP-MC approach, the mixture weights (Dirichlet distribution) and variances (Wishart distribution) of the HMM could also be adjusted jointly with the means.
