This document provides a comprehensive overview of the implementation of a language model using HTK. It introduces key concepts, including N-grams and their variations such as Unigram, Bigram, Trigram, and 4-grams, and discusses the process of database preparation and vocabulary mapping. We explore the effect of different n-gram levels on perplexity and performance, emphasizing that while higher n-grams yield better accuracy, they also increase memory usage and risk overfitting. We conclude with insights on balancing model complexity and performance.
Language model using HTK
Raymond Sastraputera
Overview • Introduction • Implementation • Experimentation • Conclusion
Introduction • Language model • N-gram • HTK 3.3 • Windows binary
[Figure: a four-word sequence, Word 1 through Word 4, illustrating the n-gram context window]
Implementation • Database Preparation • Word map • N-gram file • Mapping OOV words • Vocabulary list
Implementation • Language model generation • Unigram • Bigram • Trigram • 4-gram • Perplexity
Conclusion and Summary • Higher-order n-grams • Lower perplexity • More memory usage • Too high risks overfitting • Multiple back-off models • Waste of time
Reference • 1. HTK (http://htk.eng.cam.ac.uk/)
Thank you • Any Questions?