This talk presents experiments in adaptive language modeling for multi-domain speech recognition, carried out within the IBM Superhuman Recognition Program on datasets including Switchboard, voicemail, and call-center conversations. The key methods are tuning the mixing weights of an interpolated language model per conversation and building new language models from relevant training data. Simple adaptation proves effective across these recognition tasks, and dynamic language model construction is flagged for future exploration.
Experiments in Adaptive Language Modeling Lidia Mangu & Geoffrey Zweig
Motivation • Multi-domain recognition • IBM Superhuman Recognition Program • Switchboard / Fisher • Voicemail • Call Center • ICSI Meetings • A one-size-fits-all LM may not fit every domain • Even a gigantic LM
Lots of Past Work • Kneser & Steinbiss ’93 • “On the Dynamic Adaptation of Stochastic Language Modeling” • Tune mixing weights to suit particular text • Chen, Gauvain, Lamel, Adda & Adda ’01 • “Language Model Adaptation for Broadcast News Transcription” • Build and add new LMs from relevant training data • Florian & Yarowsky ’99 – Hierarchical LMs • Gao, Li & Lee ’00 – Upweight training counts whose frequency is similar to that in test • Seymore & Rosenfeld ’97 – Interpolate topic LMs • Bacchiani & Roark ’03 – MAP adaptation for voicemail • Many others.
Plan of Attack • No adaptation: the Superhuman LM, an 8-way interpolated LM from multiple domains • Baseline adaptation: adjust interpolation weights per conversation • Extended adaptation: build a new LM from relevant training data (data-selection sketch below)
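The slides do not say how "relevant training data" is identified for the extended adaptation step. A common recipe, and a minimal sketch under that assumption (the segment objects and the `perplexity` method are hypothetical stand-ins, not the actual IBM toolchain), is to score candidate training segments by perplexity under a small LM built from the decoded conversation and keep the best-matching fraction:

```python
# Hypothetical sketch of selecting "relevant training data".
# Assumption (not stated in the slides): relevance is measured by
# perplexity under an LM built from the decoded transcripts of the
# current conversation; lower perplexity = more similar text.

def select_relevant_segments(segments, conversation_lm, keep_fraction=0.1):
    """Rank candidate training segments by perplexity under the
    conversation LM and keep the lowest-perplexity fraction."""
    scored = sorted(segments, key=conversation_lm.perplexity)
    return scored[: max(1, int(len(scored) * keep_fraction))]
```

A new LM built from the selected segments can then be added to the interpolation alongside the atomic models.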
Description of Atomic LMs • SWB + CallHome • 3.4M words, 1.4M 3-gms • Broadcast News • 148M words, 38M 3-gms • Financial Call Centers • 655K words, 303K 3-gms • UW Web data (conversational-like) • 192M words, 48M 3-gms • SWB Cellular • 244K words, 134K 3-gms • UW Web data (meeting-like) • 28M words, 12M 3-gms • UW Newsgroup data • 102M words, 34M 3-gms • Voicemail • 1.1M words, 551K 3-gms
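For concreteness, the 8-way interpolated LM combines these atomic models linearly: P(w | h) = Σᵢ λᵢ Pᵢ(w | h) with Σᵢ λᵢ = 1. A minimal sketch, where `prob(word, history)` is a hypothetical n-gram lookup interface rather than the actual toolkit API:

```python
# Linear interpolation of the eight atomic LMs listed above.
# `models` holds the atomic LMs; `weights` holds the lambdas.

def interpolated_prob(word, history, models, weights):
    """P(w | h) = sum_i lambda_i * P_i(w | h)."""
    assert abs(sum(weights) - 1.0) < 1e-6, "weights must sum to 1"
    return sum(lam * lm.prob(word, history)
               for lam, lm in zip(weights, models))
```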
Description of Lattice-Building Models & Process • Generate lattices with a bigram LM • Word-internal acoustic context • 3.6K acoustic units; 142K Gaussians • PLP + VTLN + FMLLR + MMI • LM rescoring w/ 8-way interpolated LM • Acoustic rescoring w/ cross-word AM • Cross-word AM • 10K acoustic units; 589K Gaussians • PLP + VTLN + FMLLR + ML • Adapt on the decoded transcripts of the last step • Adjust interpolation weights to minimize perplexity on the decoded transcripts (EM sketch below)
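The slide does not spell out the weight-update rule. The standard way to minimize the perplexity of a linear mixture on adaptation text (here, the decoded transcripts of one conversation) is EM re-estimation of the mixture weights; a minimal sketch under that assumption, again with a hypothetical `prob(word, history)` interface:

```python
def em_reestimate_weights(events, models, weights, iterations=20):
    """EM re-estimation of mixture weights so the interpolated LM has
    minimal perplexity on `events`, a list of (word, history) n-gram
    events taken from the decoded transcripts of one conversation."""
    lam = list(weights)
    for _ in range(iterations):
        resp = [0.0] * len(models)
        for word, history in events:
            comp = [w * m.prob(word, history) for w, m in zip(lam, models)]
            z = sum(comp)
            if z > 0.0:
                for i, c in enumerate(comp):
                    resp[i] += c / z        # E-step: posterior weight of model i
        total = sum(resp)
        lam = [r / total for r in resp]     # M-step: renormalize responsibilities
    return lam
```

Each EM iteration cannot decrease the likelihood of the adaptation text, so a few iterations per conversation suffice in practice.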
Conclusions • Simple adaptation is effective for a multi-domain system • Contrasts with some previous results on Broadcast News • Not very sensitive to initial decoding errors • Dynamic LM construction remains to be explored