
Relevance Language Modeling For Speech Recognition


Presentation Transcript


  1. Relevance Language Modeling For Speech Recognition Kuan-Yu Chen and Berlin Chen National Taiwan Normal University, Taipei, Taiwan ICASSP 2011 2014/1/17 Reporter: 陳思澄

  2. Outline • Introduction • Basic Relevance Model (RM) • Topic-based Relevance Model • Modeling Pairwise Word Association • Experiments • Conclusion

  3. Introduction • In the relevance modeling approach to IR, each query Q is assumed to be associated with an unknown relevance class R_Q, and documents that are relevant to the information need expressed in the query are samples drawn from R_Q. • When RM is applied to language modeling in speech recognition, we can conceptually regard the search history H as a query and each of its immediately succeeding words w as a document, and estimate a relevance model for modeling the relationship between H and w. (Diagram: query, relevance class, and relevant documents.)

  4. Basic Relevance Model • The task of language modeling in speech recognition can be interpreted as calculating the conditional probability P(w|H). • H is a search history, usually expressed as a word sequence w_1, w_2, …, w_{L-1}, and w is one of its possible immediately succeeding words. • Because the relevance class of each search history is not known in advance, a local feedback-like procedure can be used to obtain a set of relevant documents D_H to estimate the joint probability P(H, w).
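To make the feedback step concrete, here is a minimal Python sketch of the local feedback-like retrieval, assuming the contemporaneous corpus is a list of tokenized documents and that a background unigram distribution bg_unigram is available; the query-likelihood scoring with Dirichlet smoothing is an assumption of the sketch, since the slide does not specify the retrieval model. The sketches that follow share these conventions.

import math
from collections import Counter

def retrieve_relevant_docs(history, docs, bg_unigram, top_n=10, mu=2000.0):
    """Score each candidate document against the search history (treated
    as a query) and keep the top-N as the approximate relevance class D_H.
    Query-likelihood scoring with Dirichlet smoothing is one common choice;
    the retrieval model is an assumption, not specified by the slide."""
    scores = []
    for doc in docs:                        # each doc is a list of word tokens
        counts, dlen = Counter(doc), len(doc)
        score = 0.0
        for q in history:
            p_bg = bg_unigram.get(q, 1e-9)  # background unigram probability
            score += math.log((counts[q] + mu * p_bg) / (dlen + mu))
        scores.append(score)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in ranked[:top_n]]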

  5. Basic Relevance Model • P(H, w) = Σ_{D ∈ D_H} P(D) P(H, w|D), where P(D) is the probability that we would randomly select D and P(H, w|D) is the joint probability of simultaneously observing H and w in D. • The joint probability of observing H together with w in D is: P(H, w|D) = P(w|D) Π_{i=1}^{L-1} P(w_i|D). • Bag-of-words assumption: the words are assumed conditionally independent given D, and their order is of no importance.
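A minimal sketch of the bag-of-words joint probability just defined, assuming a uniform document prior P(D) and epsilon-floored relative-frequency estimates for P(·|D), both simplifications not fixed by the slide:

from collections import Counter

def joint_prob(history, w, rel_docs, eps=1e-6):
    """Bag-of-words relevance model:
        P(H, w) = sum over D in D_H of P(D) * P(w|D) * prod_i P(w_i|D).
    Uniform P(D) and epsilon-floored relative frequencies are
    simplifying assumptions of this sketch."""
    p_d = 1.0 / len(rel_docs)                # uniform P(D)
    total = 0.0
    for doc in rel_docs:
        counts, dlen = Counter(doc), len(doc)
        p = max(counts[w] / dlen, eps)       # P(w|D)
        for wi in history:
            p *= max(counts[wi] / dlen, eps) # P(w_i|D)
        total += p_d * p
    return total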

  6. Basic Relevance Model • The conditional probability is obtained by normalizing the joint probability: P_RM(w|H) = P(H, w) / Σ_{w'} P(H, w'). • The background n-gram language model trained on a large general corpus can provide the generic constraint information of lexical regularities.
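The normalization and its combination with the background model might look as follows, reusing joint_prob from the sketch above; the linear interpolation with an n-gram callable p_trigram and the weight lam are assumptions here, one standard way to bring in the background constraints rather than the paper's confirmed recipe:

def rm_conditional(history, w, vocab, rel_docs):
    """P_RM(w|H) = P(H, w) / sum over w' of P(H, w').  Normalising over
    the full vocabulary is costly; restricting w' to words that occur in
    the feedback documents is a common shortcut (an assumption here)."""
    denom = sum(joint_prob(history, v, rel_docs) for v in vocab)
    return joint_prob(history, w, rel_docs) / denom

def adapted_prob(history, w, vocab, rel_docs, p_trigram, lam=0.5):
    """Linear interpolation with the background n-gram model; the weight
    lam is a tuning assumption, not a value given in the paper."""
    return lam * rm_conditional(history, w, vocab, rel_docs) \
        + (1.0 - lam) * p_trigram(history, w)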

  7. Topic-based Relevance Model • TRM goes a step further by incorporating latent topic information into RM modeling. • The relevant documents of each search history are assumed to share the same set of latent topic variables describing the “word-document” co-occurrence characteristics.

  8. Topic-based Relevance Model • TRM can be represented by: P(H, w|D) = Σ_{k=1}^{K} P(T_k|D) P(w|T_k) Π_{i=1}^{L-1} P(w_i|T_k). • (The words of a document are all assumed to come from the same latent topic.)
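A sketch of the TRM computation under that shared-topic assumption; topic_mix(doc) and p_w_given_topic(word, k) stand in for a pre-trained topic model (e.g., PLSA) and are hypothetical interfaces:

def trm_joint(history, w, rel_docs, topic_mix, p_w_given_topic):
    """Topic-based relevance model: each feedback document contributes
    through a shared latent topic, so
        P(H, w|D) = sum_k P(T_k|D) * P(w|T_k) * prod_i P(w_i|T_k).
    topic_mix(doc) -> list of P(T_k|D); p_w_given_topic(word, k) -> P(word|T_k).
    Both are hypothetical interfaces to an assumed pre-trained topic model."""
    p_d = 1.0 / len(rel_docs)                # uniform P(D), as before
    total = 0.0
    for doc in rel_docs:
        per_doc = 0.0
        for k, p_tk in enumerate(topic_mix(doc)):
            p = p_w_given_topic(w, k)
            for wi in history:
                p *= p_w_given_topic(wi, k)
            per_doc += p_tk * p
        total += p_d * per_doc
    return total                             # joint P(H, w); normalise as for RM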

  9. Modeling Pairwise Word Association • Instead of using RM to model the association between an entire search history H and a newly decoded word w, we can also use RM to render the pairwise word association between a word w_i in the history and the newly decoded word w.

  10. Modeling Pairwise Word Association • A “composite” conditional probability for the search history H to predict w can be obtained by linearly combining the pairwise probabilities of all words in the history: P_PRM(w|H) = Σ_{i=1}^{L-1} α_i P(w|w_i), • where the values of the nonnegative weighting coefficients α_i are empirically set to be exponentially decayed.
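A sketch of the composite probability; p_pairwise(w_i, w) stands for the RM-style estimate of P(w|w_i), and the decay base is an assumed tuning value, since the slide only states that the weights decay exponentially:

def prm_conditional(history, w, p_pairwise, decay=0.8):
    """Composite PRM probability:
        P_PRM(w|H) = sum_i alpha_i * P(w|w_i),
    with nonnegative weights alpha_i decaying exponentially the farther
    w_i is from the predicted position, renormalised to sum to one.
    p_pairwise(w_i, w) -> P(w|w_i); the decay base 0.8 is an assumption."""
    L = len(history)
    raw = [decay ** (L - 1 - i) for i in range(L)]   # newest word weighted most
    z = sum(raw)
    return sum((a / z) * p_pairwise(wi, w) for a, wi in zip(raw, history))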

  11. By the same token, a set of latent topics can be used to describe the word-word co-occurrence relationships in a relevant document D, and the pairwise word association between a history word w_i and the decoded word w is thus modeled by: P(w_i, w|D) = Σ_{k=1}^{K} P(T_k|D) P(w|T_k) P(w_i|T_k).
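A matching sketch of the topic-based pairwise association, with the same hypothetical topic-model interfaces as before; its normalized output plugs into prm_conditional above to give TPRM:

def tprm_pairwise(wi, w, rel_docs, topic_mix, p_w_given_topic):
    """Topic-based pairwise association:
        P(w_i, w|D) = sum_k P(T_k|D) * P(w|T_k) * P(w_i|T_k),
    accumulated over the feedback documents with a uniform prior P(D).
    The topic-model interfaces are the same assumptions as in trm_joint."""
    p_d = 1.0 / len(rel_docs)
    total = 0.0
    for doc in rel_docs:
        total += p_d * sum(
            p_tk * p_w_given_topic(w, k) * p_w_given_topic(wi, k)
            for k, p_tk in enumerate(topic_mix(doc)))
    return total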

  12. Experimental Setup • Speech corpus: 196 hours (MATBN). • Vocabulary size: 72 thousand words. • A trigram language model was estimated from a background text corpus consisting of 170 million Chinese characters. • The baseline rescoring procedure with the background trigram language model results in a character error rate (CER) of 20.08% on the test set. Experiments: • 1. We assess the effectiveness of RM and PRM with respect to different numbers of retrieved documents being used to approximate the relevance class. • 2. We measure the goodness of RM and PRM when a set of latent topics is additionally employed to describe the word-word co-occurrence relationships in a relevant document; the resulting models are TRM and TPRM. • 3. We compare the proposed methods with several well-practiced language model adaptation methods.
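For context, a hedged sketch of what the N-best rescoring step might look like; the slide mentions the baseline rescoring but not its exact form, so the hypothesis format, the log_lm interface, and the score weights are all assumptions:

def rescore_nbest(nbest, log_lm, am_weight=1.0, lm_weight=10.0):
    """Re-rank N-best hypotheses with the adapted language model.
    Each hypothesis is assumed to be a (word list, acoustic log-score)
    pair, and log_lm(history, word) is assumed to return a log-probability
    from the interpolated model; the weights are tuning assumptions."""
    def total_score(hyp):
        words, am_logscore = hyp
        lm_logscore = sum(log_lm(words[:i], words[i])
                          for i in range(1, len(words)))
        return am_weight * am_logscore + lm_weight * lm_logscore
    return max(nbest, key=total_score)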

  13. Experimental Results • This reveals that only a small subset of relevant documents retrieved from the contemporaneous corpus is sufficient for dynamic language model adaptation. • PRM shows its superiority over RM for almost all adaptation settings. (Table: results of RM and PRM, in CER (%).)

  14. Experimental Results • Simply assuming that the model parameters are uniformly distributed tends to perform slightly worse than assuming a Dirichlet prior, with both at their best settings. • (Table: results of TRM and TPRM, in CER (%).)

  15. Experimental Results • These results are at the same performance level as that obtained by TPRM. • On the other hand, TBLM achieves its best CER of 19.32%, for which the corresponding number of trigger pairs was determined using the development set. • Our proposed methods seem to be good surrogates for the existing language model adaptation methods in terms of CER reduction.

  16. Conclusion • We study a novel use of relevance information for dynamic language model adaptation in speech recognition. • Our methods not only inherit the merits of several existing techniques but also provide a flexible yet systematic way to render the lexical and topical relationships between a search history and an upcoming word. • Empirical results on large-vocabulary continuous speech recognition seem to demonstrate the utility of the presented models. • These methods can also be used to expand query models for spoken document retrieval (SDR) tasks.
