Bayesian Learning for Latent Semantic Analysis
Presentation Transcript

  1. Bayesian Learning for Latent Semantic Analysis. Authors: Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu. Presenter: Hsuan-Sheng Chiu

  2. Reference • Chia-Sheng Wu, “Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval”, 2005 • Q. Huo and C.-H. Lee, “On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate”, 1997

  3. Outline • Introduction • PLSA • ML (Maximum Likelihood) • MAP (Maximum A Posteriori) • QB (Quasi-Bayes) • Experiments • Conclusions

  4. Introduction • LSA vs. PLSA • Linear algebra and probability • Semantic space and latent topics • Batch learning vs. Incremental learning

  5. PLSA • PLSA is a general machine learning technique that adopts the aspect model to represent co-occurrence data. • Topics (hidden variables) • Corpus (document-word pairs)

  6. PLSA • Assume that di and wj are conditionally independent given the associated latent topic zk • Joint probability (a standard form is sketched below):
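  The joint-probability equation on this slide did not survive the transcript. Under the asymmetric aspect-model parameterization that the later slides appear to use, it is usually written as follows (a sketch with assumed notation, not necessarily the slide's exact form):
    P(d_i, w_j) = P(d_i) \sum_k P(w_j \mid z_k) P(z_k \mid d_i)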

  7. ML PLSA • Log-likelihood of Y • ML estimation (standard forms are sketched below)
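  The slide's equations were not preserved. For reference, with n(d_i, w_j) denoting the count of word w_j in document d_i, the PLSA log-likelihood and the ML estimate are usually written as (a sketch, not the slide's exact notation):
    L(\theta) = \sum_i \sum_j n(d_i, w_j) \log P(d_i, w_j)
    \theta_{\mathrm{ML}} = \arg\max_\theta L(\theta)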

  8. ML PLSA • Maximization:

  9. ML PLSA • Complete data: • Incomplete data: • EM (Expectation-Maximization) Algorithm • E-step • M-step

  10. ML PLSA • E-Step (a standard form is sketched below)
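  The E-step equation is missing from the transcript. In standard PLSA the posterior probability of topic z_k given a document-word pair is (a sketch under the asymmetric parameterization assumed above):
    P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_l P(w_j \mid z_l) P(z_l \mid d_i)}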

  11. ML PLSA • Auxiliary function (expected complete-data log-likelihood), and its expansion

  12. ML PLSA • M-step: • Lagrange multiplier

  13. ML PLSA • Differentiation • New parameter estimation (standard re-estimates are sketched below):
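  The re-estimation formulas were not preserved. The standard ML PLSA M-step updates, obtained by differentiating the Lagrangian with sum-to-one constraints, are (a sketch in the assumed notation):
    P(w_j \mid z_k) = \frac{\sum_i n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{j'} \sum_i n(d_i, w_{j'}) P(z_k \mid d_i, w_{j'})}
    P(z_k \mid d_i) = \frac{\sum_j n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)}, \quad n(d_i) = \sum_j n(d_i, w_j)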

  14. MAP PLSA • Estimation by maximizing the posterior probability: • Definition of the prior distribution: • Dirichlet density: • Prior density (involves a Kronecker delta; the priors over the two parameter sets are assumed independent). A typical form is sketched below.
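  The prior density itself is missing from the transcript. A typical choice, consistent with the Dirichlet density named on the slide, places independent Dirichlet priors on the two PLSA parameter sets. The hyperparameter labels below (alpha for the word-topic multinomials, beta for the topic-document multinomials) are illustrative, since the slide's notation was not preserved apart from the alpha of slide 22:
    g(\theta) \propto \prod_k \prod_j P(w_j \mid z_k)^{\alpha_{jk} - 1} \; \prod_i \prod_k P(z_k \mid d_i)^{\beta_{ik} - 1}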

  15. MAP PLSA • Consider the prior density: • Maximum a posteriori estimation (the objective is sketched below):
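  As a reference for the missing equation, the MAP objective combines the PLSA likelihood with the prior g(theta):
    \theta_{\mathrm{MAP}} = \arg\max_\theta \big[ \log P(Y \mid \theta) + \log g(\theta) \big]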

  16. MAP PLSA • E-step: • expectation • Auxiliary function:

  17. MAP PLSA • M-step • Lagrange multiplier

  18. MAP PLSA • Auxiliary function:

  19. MAP PLSA • Differentiation • New parameter estimation (a sketch of how the hyperparameters enter is given below):
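  The MAP re-estimation formulas were not preserved. With a Dirichlet prior as above, the ML counts are shifted by the hyperparameters; for the word-topic parameters this typically gives (a sketch, assuming the alpha notation of slide 22):
    P(w_j \mid z_k) = \frac{\sum_i n(d_i, w_j) P(z_k \mid d_i, w_j) + \alpha_{jk} - 1}{\sum_{j'} \big[ \sum_i n(d_i, w_{j'}) P(z_k \mid d_i, w_{j'}) + \alpha_{j'k} - 1 \big]}
  with an analogous update for P(z_k | d_i).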

  20. QB PLSA • An online information system needs to be updated continuously. • Estimation by maximizing the posterior probability (sketched below): • The posterior density is approximated by the closest tractable prior density with hyperparameters • Compared with MAP PLSA, the key difference in QB PLSA is the incremental updating of the hyperparameters.
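  A hedged sketch of the recursive Bayes idea, with assumed notation in the spirit of the Huo and Lee reference on slide 2: after processing data batches Y_1, ..., Y_{n-1}, the posterior is approximated by a Dirichlet with hyperparameters \varphi^{(n-1)}, which then serves as the prior for the new batch Y_n:
    \hat{\theta}^{(n)} = \arg\max_\theta \big[ \log P(Y_n \mid \theta) + \log g(\theta \mid \varphi^{(n-1)}) \big], \quad g(\theta \mid \varphi^{(n)}) \approx p(\theta \mid Y_1, \ldots, Y_n)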

  21. QB PLSA • Conjugate prior: • In Bayesian probability theory, a conjugate prior is a prior distribution with the property that the posterior distribution belongs to the same family as the prior. • A closed-form solution • A reproducible prior/posterior pair for incremental learning
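  For example, the Dirichlet is conjugate to the multinomial: if theta ~ Dir(\alpha_1, ..., \alpha_K) and counts n_1, ..., n_K are observed, the posterior is again a Dirichlet,
    p(\theta \mid n_1, \ldots, n_K) = \mathrm{Dir}(\alpha_1 + n_1, \ldots, \alpha_K + n_K),
  which is what makes a reproducible prior/posterior pair possible here.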

  22. QB PLSA • Hyperparameter α:

  23. QB PLSA • After some rearrangement, the exponential of the posterior expectation function can be expressed as shown on the slide. • A reproducible prior/posterior pair is obtained, which provides the updating mechanism for the hyperparameters (a sketch of a typical update follows).
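  The actual update equations were not preserved. In recursive Bayes learning with a Dirichlet prior, the hyperparameters are typically updated by adding the expected counts from the new data batch, so a representative form (a sketch, not necessarily the paper's exact formula) is:
    \hat{\alpha}_{jk} = \alpha_{jk} + \sum_i n(d_i, w_j) P(z_k \mid d_i, w_j)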

  24. Initial Hyperparameters • An open issue in Bayesian learning • If the initial prior knowledge is too strong, or after a large amount of adaptation data has been processed incrementally, new adaptation data usually have only a small impact on parameter updating in incremental training.

  25. Experiments • MED Corpus: • 1033 medical abstracts with 30 queries • 7014 unique terms • 433 abstracts for ML training • 600 abstracts for MAP or QB training • Query subset for testing • K=8 • Reuters-21578: • 4270 documents for training • 2925 for QB learning • 2790 documents for testing • 13353 unique words • 10 categories

  26. Experiments

  27. Experiments

  28. Experiments

  29. Conclusions • This paper presented an adaptive text modeling and classification approach for PLSA-based information systems. • Future work: • Extension of PLSA to bigram or trigram modeling will be explored. • Application to spoken document classification and retrieval

  30. Discriminative Maximum Entropy Language Model for Speech Recognition. Authors: Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien. Presenter: Hsuan-Sheng Chiu

  31. Reference • R. Rosenfeld, S. F. Chen and X. Zhu, “Whole-sentence exponential language models: a vehicle for linguistic-statistical integration”, 2001 • W. H. Tsai, “An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition”, 2005

  32. Outline • Introduction • Whole-sentence exponential model • Discriminative ME language model • Experiment • Conclusions

  33. Introduction • Language model • Statistical n-gram model • Latent semantic language model • Structured language model • Based on the maximum entropy principle, we can integrate different features to establish an optimal probability distribution.

  34. Whole-Sentence Exponential Model • Traditional method: • Exponential form (sketched below): • Usage: • When used for speech recognition, the model is not suitable for the first pass of the recognizer and should instead be used to re-score N-best lists.
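  The exponential form itself is missing from the transcript. Following the Rosenfeld, Chen and Zhu reference on slide 31, the whole-sentence exponential model is usually written as:
    P(W) = \frac{1}{Z} P_0(W) \exp\Big( \sum_i \lambda_i f_i(W) \Big)
  where P_0(W) is a baseline model (e.g., an n-gram), f_i(W) are sentence-level feature functions, \lambda_i are the feature weights, and Z normalizes the distribution.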

  35. Whole-Sentence ME Language Model • Expectation of the feature function: • Empirical: • Actual: • Constraint (sketched below):
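  The constraint equations did not survive the transcript. In maximum entropy modeling, the model expectation of each feature is constrained to match its empirical expectation (a sketch in the assumed notation):
    E_{\tilde{P}}[f_i] = E_P[f_i], \quad \text{i.e.} \quad \sum_W \tilde{P}(W) f_i(W) = \sum_W P(W) f_i(W) \ \text{for every feature } f_i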

  36. Whole-Sentence ME Language Model • To solve the constrained optimization problem:

  37. GIS algorithm (a sketch of the iterative updates is given below)
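  The GIS (Generalized Iterative Scaling) slide content was not preserved. For reference, the standard GIS update moves each weight according to the ratio of empirical to model feature expectations, \lambda_i \leftarrow \lambda_i + (1/C) \log( E_{\tilde{P}}[f_i] / E_P[f_i] ), where C bounds the total feature count per event. Below is a minimal, fully enumerable Python sketch of that update; the toy data and names are illustrative, and a real whole-sentence ME model would need sampling to approximate the model expectations, as discussed in the Rosenfeld et al. reference.

    import numpy as np

    def gis(features, p_emp, n_iters=200):
        """Generalized Iterative Scaling on a small enumerable event space.
        features: (n_events, n_feats) array of nonnegative feature values
        p_emp:    empirical distribution over the events"""
        n_events, n_feats = features.shape
        C = features.sum(axis=1).max()                 # GIS constant
        # Slack feature so every event's feature values sum exactly to C
        slack = C - features.sum(axis=1, keepdims=True)
        F = np.hstack([features, slack])
        target = p_emp @ F                             # empirical feature expectations
        lam = np.zeros(F.shape[1])
        for _ in range(n_iters):
            p = np.exp(F @ lam)
            p /= p.sum()                               # current model distribution
            expect = p @ F                             # model feature expectations
            lam += np.log(target / expect) / C         # GIS update
        return lam[:n_feats], p

    # Toy usage: four "sentences", two binary features
    feats = np.array([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])
    p_emp = np.array([0.4, 0.3, 0.2, 0.1])
    weights, model_p = gis(feats, p_emp)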

  38. Discriminative ME Language Model • In general, ME can be viewed as maximum likelihood estimation of a log-linear distribution. • A discriminative language model based on the whole-sentence ME model (DME) is proposed.

  39. Discriminative ME Language Model • Acoustic features for ME estimation: • Sentence-level log-likelihood ratio of competing and target sentences • Feature weight parameter: • Namely, the feature parameter is activated (set to one) for those speech signals observed in the training database.

  40. Discriminative ME Language Model • New estimation: • Upgrade to discriminative linguistic parameters

  41. Discriminative ME Language Model

  42. Experiment • Corpus: TCC300 • 32 mixtures • 12 Mel-frequency cepstral coefficients • 1 log-energy, and their first derivatives • 4200 sentences for training, 450 for testing • Corpus: Academia Sinica CKIP balanced corpus • Five million words • Vocabulary of 32909 words

  43. Experiment

  44. Conclusions • A new ME language model integrating linguistic and acoustic features for speech recognition • The derived ME language model was inherently discriminative. • The DME model involved a constrained optimization procedure and was powerful for knowledge integration.

  45. Relation between DME and MMI • MMI criterion (the standard form is sketched below): • Modified MMI criterion: • Express the ME model as an ML model:
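  The criterion itself is missing from the transcript. For reference, the standard MMI objective for speech recognition maximizes the posterior probability of the correct transcriptions W_r given the acoustics X_r over the training utterances r (a sketch, notation assumed):
    F_{\mathrm{MMI}}(\lambda) = \sum_r \log \frac{P_\lambda(X_r \mid W_r)\, P(W_r)}{\sum_W P_\lambda(X_r \mid W)\, P(W)}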

  46. Relation between DME and MMI • The optimal parameter:

  47. Relation between DME and MMI

  48. Relation between DME and MMI