Bayesian Learning for Latent Semantic Analysis
Presentation Transcript

  1. Bayesian Learning for Latent Semantic Analysis. Authors: Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu. Presenter: Hsuan-Sheng Chiu

  2. Reference • Chia-Sheng Wu, “Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval”, 2005 • Q. Huo and C.-H. Lee, “On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate”, 1997

  3. Outline • Introduction • PLSA • ML (Maximum Likelihood) • MAP (Maximum A Posteriori) • QB (Quasi-Bayes) • Experiments • Conclusions

  4. Introduction • LSA vs. PLSA • Linear algebra and probability • Semantic space and latent topics • Batch learning vs. Incremental learning

  5. PLSA • PLSA is a general machine learning technique that adopts the aspect model to represent co-occurrence data. • Topics (hidden variables) • Corpus (document-word pairs)

  6. PLSA • Assume that di and wj are conditionally independent given the associated latent topic zk • Joint probability (a standard form is sketched below):
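  The joint-probability equation on this slide did not survive the transcript. Under the asymmetric aspect-model parameterization that the later slides appear to use, it is usually written as follows (a sketch with assumed notation, not necessarily the slide's exact form):
    P(d_i, w_j) = P(d_i) \sum_k P(w_j \mid z_k) P(z_k \mid d_i)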

  7. ML PLSA • Log-likelihood of Y • ML estimation (standard forms are sketched below)
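  The slide's equations were not preserved. For reference, with n(d_i, w_j) denoting the count of word w_j in document d_i, the PLSA log-likelihood and the ML estimate are usually written as (a sketch, not the slide's exact notation):
    L(\theta) = \sum_i \sum_j n(d_i, w_j) \log P(d_i, w_j)
    \theta_{\mathrm{ML}} = \arg\max_\theta L(\theta)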

  8. ML PLSA • Maximization:

  9. ML PLSA • Complete data: • Incomplete data: • EM (Expectation-Maximization) Algorithm • E-step • M-step

  10. ML PLSA • E-Step (a standard form is sketched below)
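  The E-step equation is missing from the transcript. In standard PLSA the posterior probability of topic z_k given a document-word pair is (a sketch under the asymmetric parameterization assumed above):
    P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_l P(w_j \mid z_l) P(z_l \mid d_i)}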

  11. ML PLSA • Auxiliary function (expected complete-data log-likelihood), and its expansion

  12. ML PLSA • M-step: • Lagrange multiplier

  13. ML PLSA • Differentiation • New parameter estimation (standard re-estimates are sketched below):
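  The re-estimation formulas were not preserved. The standard ML PLSA M-step updates, obtained by differentiating the Lagrangian with sum-to-one constraints, are (a sketch in the assumed notation):
    P(w_j \mid z_k) = \frac{\sum_i n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{j'} \sum_i n(d_i, w_{j'}) P(z_k \mid d_i, w_{j'})}
    P(z_k \mid d_i) = \frac{\sum_j n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)}, \quad n(d_i) = \sum_j n(d_i, w_j)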

  14. MAP PLSA • Estimation by maximizing the posterior probability: • Definition of the prior distribution: • Dirichlet density: • Prior density (involves a Kronecker delta; the priors over the two parameter sets are assumed independent). A typical form is sketched below.
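  The prior density itself is missing from the transcript. A typical choice, consistent with the Dirichlet density named on the slide, places independent Dirichlet priors on the two PLSA parameter sets. The hyperparameter labels below (alpha for the word-topic multinomials, beta for the topic-document multinomials) are illustrative, since the slide's notation was not preserved apart from the alpha of slide 22:
    g(\theta) \propto \prod_k \prod_j P(w_j \mid z_k)^{\alpha_{jk} - 1} \; \prod_i \prod_k P(z_k \mid d_i)^{\beta_{ik} - 1}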

  15. MAP PLSA • Consider the prior density: • Maximum a posteriori estimation (the objective is sketched below):
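  As a reference for the missing equation, the MAP objective combines the PLSA likelihood with the prior g(theta):
    \theta_{\mathrm{MAP}} = \arg\max_\theta \big[ \log P(Y \mid \theta) + \log g(\theta) \big]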

  16. MAP PLSA • E-step: • expectation • Auxiliary function:

  17. MAP PLSA • M-step • Lagrange multiplier

  18. MAP PLSA • Auxiliary function:

  19. MAP PLSA • Differentiation • New parameter estimation (a sketch of how the hyperparameters enter is given below):
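  The MAP re-estimation formulas were not preserved. With a Dirichlet prior as above, the ML counts are shifted by the hyperparameters; for the word-topic parameters this typically gives (a sketch, assuming the alpha notation of slide 22):
    P(w_j \mid z_k) = \frac{\sum_i n(d_i, w_j) P(z_k \mid d_i, w_j) + \alpha_{jk} - 1}{\sum_{j'} \big[ \sum_i n(d_i, w_{j'}) P(z_k \mid d_i, w_{j'}) + \alpha_{j'k} - 1 \big]}
  with an analogous update for P(z_k | d_i).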

  20. QB PLSA • An online information system needs to be updated continuously. • Estimation by maximizing the posterior probability (sketched below): • The posterior density is approximated by the closest tractable prior density with hyperparameters • Compared with MAP PLSA, the key difference in QB PLSA is the incremental updating of the hyperparameters.
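  A hedged sketch of the recursive Bayes idea, with assumed notation in the spirit of the Huo and Lee reference on slide 2: after processing data batches Y_1, ..., Y_{n-1}, the posterior is approximated by a Dirichlet with hyperparameters \varphi^{(n-1)}, which then serves as the prior for the new batch Y_n:
    \hat{\theta}^{(n)} = \arg\max_\theta \big[ \log P(Y_n \mid \theta) + \log g(\theta \mid \varphi^{(n-1)}) \big], \quad g(\theta \mid \varphi^{(n)}) \approx p(\theta \mid Y_1, \ldots, Y_n)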

  21. QB PLSA • Conjugate prior: • In Bayesian probability theory, a conjugate prior is a prior distribution with the property that the posterior distribution belongs to the same family as the prior. • A closed-form solution • A reproducible prior/posterior pair for incremental learning
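  For example, the Dirichlet is conjugate to the multinomial: if theta ~ Dir(\alpha_1, ..., \alpha_K) and counts n_1, ..., n_K are observed, the posterior is again a Dirichlet,
    p(\theta \mid n_1, \ldots, n_K) = \mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_K + n_K),
  which is what makes a reproducible prior/posterior pair possible here.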

  22. QB PLSA • Hyperparameter α:

  23. QB PLSA • After some rearrangement, the exponential of the posterior expectation function can be expressed as shown on the slide. • A reproducible prior/posterior pair is obtained, which provides the updating mechanism for the hyperparameters (a sketch of a typical update follows).
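  The actual update equations were not preserved. In recursive Bayes learning with a Dirichlet prior, the hyperparameters are typically updated by adding the expected counts from the new data batch, so a representative form (a sketch, not necessarily the paper's exact formula) is:
    \hat{\alpha}_{jk} = \alpha_{jk} + \sum_i n(d_i, w_j) P(z_k \mid d_i, w_j)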

  24. Initial Hyperparameters • An open issue in Bayesian learning • If the initial prior knowledge is too strong, or after a large amount of adaptation data has been processed incrementally, new adaptation data usually have only a small impact on parameter updating in incremental training.

  25. Experiments • MED Corpus: • 1033 medical abstracts with 30 queries • 7014 unique terms • 433 abstracts for ML training • 600 abstracts for MAP or QB training • Query subset for testing • K=8 • Reuters-21578: • 4270 documents for training • 2925 for QB learning • 2790 documents for testing • 13353 unique words • 10 categories

  26. Experiments

  27. Experiments

  28. Experiments

  29. Conclusions • This paper presented an adaptive text modeling and classification approach for PLSA-based information systems. • Future work: • Extension of PLSA to bigram or trigram modeling will be explored. • Application to spoken document classification and retrieval

  30. Discriminative Maximum Entropy Language Model for Speech Recognition. Authors: Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien. Presenter: Hsuan-Sheng Chiu

  31. Reference • R. Rosenfeld, S. F. Chen and X. Zhu, “Whole-sentence exponential language models: a vehicle for linguistic-statistical integration”, 2001 • W. H. Tsai, “An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition”, 2005

  32. Outline • Introduction • Whole-sentence exponential model • Discriminative ME language model • Experiment • Conclusions

  33. Introduction • Language model • Statistical n-gram model • Latent semantic language model • Structured language model • Based on the maximum entropy principle, we can integrate different features to establish an optimal probability distribution.

  34. Whole-Sentence Exponential Model • Traditional method: • Exponential form (sketched below): • Usage: • When used for speech recognition, the model is not suitable for the first pass of the recognizer and should instead be used to re-score N-best lists.
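  The exponential form itself is missing from the transcript. Following the Rosenfeld, Chen and Zhu reference on slide 31, the whole-sentence exponential model is usually written as:
    P(W) = \frac{1}{Z} P_0(W) \exp\Big( \sum_i \lambda_i f_i(W) \Big)
  where P_0(W) is a baseline model (e.g., an n-gram), f_i(W) are sentence-level feature functions, \lambda_i are the feature weights, and Z normalizes the distribution.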

  35. Whole-Sentence ME Language Model • Expectation of the feature function: • Empirical: • Actual: • Constraint (sketched below):
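  The constraint equations did not survive the transcript. In maximum entropy modeling, the model expectation of each feature is constrained to match its empirical expectation (a sketch in the assumed notation):
    E_{\tilde{P}}[f_i] = E_P[f_i], \quad \text{i.e.} \quad \sum_W \tilde{P}(W) f_i(W) = \sum_W P(W) f_i(W) \ \text{for every feature } f_i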

  36. Whole-Sentence ME Language Model • To solve the constrained optimization problem:

  37. GIS algorithm (a sketch of the iterative updates is given below)
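  The GIS (Generalized Iterative Scaling) slide content was not preserved. For reference, the standard GIS update moves each weight according to the ratio of empirical to model feature expectations, \lambda_i \leftarrow \lambda_i + (1/C) \log( E_{\tilde{P}}[f_i] / E_P[f_i] ), where C bounds the total feature count per event. Below is a minimal, fully enumerable Python sketch of that update; the toy data and names are illustrative, and a real whole-sentence ME model would need sampling to approximate the model expectations, as discussed in the Rosenfeld et al. reference.

    import numpy as np

    def gis(features, p_emp, n_iters=200):
        """Generalized Iterative Scaling on a small enumerable event space.
        features: (n_events, n_feats) array of nonnegative feature values
        p_emp:    empirical distribution over the events"""
        n_events, n_feats = features.shape
        C = features.sum(axis=1).max()                 # GIS constant
        # Slack feature so every event's feature values sum exactly to C
        slack = C - features.sum(axis=1, keepdims=True)
        F = np.hstack([features, slack])
        target = p_emp @ F                             # empirical feature expectations
        lam = np.zeros(F.shape[1])
        for _ in range(n_iters):
            p = np.exp(F @ lam)
            p /= p.sum()                               # current model distribution
            expect = p @ F                             # model feature expectations
            lam += np.log(target / expect) / C         # GIS update
        return lam[:n_feats], p

    # Toy usage: four "sentences", two binary features
    feats = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
    p_emp = np.array([0.4, 0.3, 0.2, 0.1])
    weights, model_p = gis(feats, p_emp)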

  38. Discriminative ME Language Model • In general, ME can be viewed as maximum likelihood estimation of a log-linear distribution. • A discriminative language model based on the whole-sentence ME model (DME) is proposed.

  39. Discriminative ME Language Model • Acoustic features for ME estimation: • Sentence-level log-likelihood ratio of competing and target sentences • Feature weight parameter: • Namely, the feature parameter is activated (set to one) for those speech signals observed in the training database.

  40. Discriminative ME Language Model • New estimation: • Upgrade to discriminative linguistic parameters

  41. Discriminative ME Language Model

  42. Experiment • Corpus: TCC300 • 32 mixtures • 12 Mel-frequency cepstral coefficients • 1 log-energy, and their first derivatives • 4200 sentences for training, 450 for testing • Corpus: Academia Sinica CKIP balanced corpus • Five million words • Vocabulary of 32909 words

  43. Experiment

  44. Conclusions • A new ME language model integrating linguistic and acoustic features for speech recognition • The derived ME language model was inherently discriminative. • The DME model involved a constrained optimization procedure and was powerful for knowledge integration.

  45. Relation between DME and MMI • MMI criterion (the standard form is sketched below): • Modified MMI criterion: • Express the ME model as an ML model:
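  The criterion itself is missing from the transcript. For reference, the standard MMI objective for speech recognition maximizes the posterior probability of the correct transcriptions W_r given the acoustics X_r over the training utterances r (a sketch, notation assumed):
    F_{\mathrm{MMI}}(\lambda) = \sum_r \log \frac{P_\lambda(X_r \mid W_r)\, P(W_r)}{\sum_W P_\lambda(X_r \mid W)\, P(W)}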

  46. Relation between DME and MMI • The optimal parameter:

  47. Relation between DME and MMI

  48. Relation between DME and MMI