Learn about Skip N-grams, Language Models, Bag Generation Metrics, and Clustering in text analysis. Discover ways to predict words and measure model stability and reconstruction.
Language Model Methods and Metrics
Gary Luu, Ryan Fortune
Skip N-grams
• Interpolated with bigram
• Get influence of words further away without increasing dimensionality
• Learning curve
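One way to read the skip n-gram slide is as a skip bigram P(w_i | w_{i-2}) linearly interpolated with the ordinary bigram P(w_i | w_{i-1}). The sketch below assumes that reading; the function names, the toy corpus, and the weight `lam=0.7` are illustrative choices, not taken from the slides. Both tables stay two-dimensional, which is the point about not increasing dimensionality.

```python
from collections import Counter

def pair_counts(sentences, distance):
    """Count (context, word) pairs where the context is `distance` positions back."""
    pairs, contexts = Counter(), Counter()
    for sent in sentences:
        for i in range(distance, len(sent)):
            pairs[(sent[i - distance], sent[i])] += 1
            contexts[sent[i - distance]] += 1
    return pairs, contexts

def cond_prob(pairs, contexts, context, word):
    """Maximum-likelihood P(word | context); 0.0 for an unseen context."""
    total = contexts[context]
    return pairs[(context, word)] / total if total else 0.0

def interpolated_prob(word, prev1, prev2, bigram, skip, lam=0.7):
    """lam * P(word | prev1) + (1 - lam) * P(word | prev2), prev2 being two back."""
    return (lam * cond_prob(*bigram, prev1, word)
            + (1 - lam) * cond_prob(*skip, prev2, word))

# Toy corpus, purely illustrative.
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
bigram = pair_counts(sentences, 1)  # adjacent pairs
skip = pair_counts(sentences, 2)    # pairs that skip one word

p = interpolated_prob("sat", prev1="cat", prev2="the", bigram=bigram, skip=skip)
print(round(p, 4))
```

In practice the weight would be tuned on held-out data and the component estimates smoothed; neither is shown here.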
Content Word Language Model
• Helps predict the next word using the last uncommon word, to try to capture context
• Found a list of the 250 most common words
• Tried different sizes for the common-word list
• Interpolated with other language models, since this model alone wouldn't maintain grammar
• P(w|C), where C is the last uncommon (content) word
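A minimal sketch of the content-word idea, assuming C is the most recent word in the history that is not on the common-word list. The helper names, the toy corpus, and the hand-picked common set are hypothetical; the slides use a list of 250 common words.

```python
from collections import Counter

def most_common_words(sentences, n):
    """Return the n most frequent words as a set (the 'common' list)."""
    counts = Counter(w for s in sentences for w in s)
    return {w for w, _ in counts.most_common(n)}

def last_content_word(history, common):
    """Most recent word in the history that is not a common word, else None."""
    for w in reversed(history):
        if w not in common:
            return w
    return None

def train_content_model(sentences, common):
    """Count (content word, next word) pairs over the corpus."""
    pairs, contexts = Counter(), Counter()
    for s in sentences:
        for i in range(1, len(s)):
            c = last_content_word(s[:i], common)
            if c is not None:
                pairs[(c, s[i])] += 1
                contexts[c] += 1
    return pairs, contexts

def content_prob(word, history, model, common):
    """Maximum-likelihood P(word | C); 0.0 when no content word is available."""
    pairs, contexts = model
    c = last_content_word(history, common)
    if c is None or contexts[c] == 0:
        return 0.0
    return pairs[(c, word)] / contexts[c]

def interpolate(p_content, p_ngram, lam=0.3):
    """Mix with an n-gram estimate, which carries the local grammar."""
    return lam * p_content + (1 - lam) * p_ngram

sentences = [
    ["the", "bank", "of", "the", "river", "flooded"],
    ["the", "river", "flooded"],
    ["the", "bank", "of", "the", "city", "closed"],
]
common = {"the", "of"}  # hand-picked here; could come from most_common_words
model = train_content_model(sentences, common)
p_river = content_prob("river", ["the", "bank", "of", "the"], model, common)
```

Here the history "the bank of the" skips over "of" and "the" and conditions on "bank", which is how the model reaches back for context that a bigram would miss.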
Bag Generation Metrics
• Bag generation is NP-hard
• Random-restart greedy hill climbing
• Stability metric
  • Give the model the correct sentence: does it maintain it as an optimum?
  • Reported as the percentage of sentences that remain stable
• Reconstruction metric
  • Needs to be compared against a lucky/random baseline
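The search and the two metrics can be sketched as follows. Several things here are assumptions rather than details from the slides: the neighbourhood is "swap two positions", the scorer is a toy add-one-smoothed bigram model, and reconstruction is read as exact recovery of the sentence from a shuffled bag.

```python
import math
import random
from collections import Counter
from itertools import combinations

def make_bigram_scorer(sentences):
    """Toy add-one-smoothed bigram log-probability scorer (an assumption)."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        padded = ["<s>"] + list(s) + ["</s>"]
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    v = len(vocab)

    def score(words):
        padded = ["<s>"] + list(words) + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
                   for a, b in zip(padded, padded[1:]))
    return score

def best_swap(order, score):
    """Best strictly better ordering one swap away, or None at a local optimum."""
    best, best_score = None, score(order)
    for i, j in combinations(range(len(order)), 2):
        cand = list(order)
        cand[i], cand[j] = cand[j], cand[i]
        s = score(cand)
        if s > best_score + 1e-12:
            best, best_score = cand, s
    return best

def hill_climb(bag, score, restarts=10, rng=None):
    """Random-restart greedy hill climbing over orderings of the bag."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(restarts):
        order = list(bag)
        rng.shuffle(order)
        while True:
            nxt = best_swap(order, score)
            if nxt is None:
                break
            order = nxt
        if best is None or score(order) > score(best):
            best = order
    return best

def stability(sentences, score):
    """Fraction of correct sentences that are already local optima."""
    return sum(best_swap(s, score) is None for s in sentences) / len(sentences)

def reconstruction_rate(sentences, score, restarts=10):
    """Fraction of sentences recovered exactly from their bag of words."""
    hits = sum(hill_climb(s, score, restarts) == list(s) for s in sentences)
    return hits / len(sentences)

train = [["the", "cat", "sat"], ["the", "dog", "ran"]]
score = make_bigram_scorer(train)
rebuilt = hill_climb(["sat", "the", "cat"], score)
```

Stability isolates the model from the search: if the correct sentence is not even a local optimum, no search procedure will settle on it. Short sentences can be reconstructed by chance, which is why the reconstruction rate needs a random baseline for comparison.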
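Clustering: IBMFullPredict
• Clustering overview
• Perplexity down to 107 with a million-sentence corpus
• $$P_{\text{ibmfullpredict}}(w_i \mid w_{i-2} w_{i-1}) = \left[\lambda P(W_i \mid w_{i-2} w_{i-1}) + (1-\lambda) P(W_i \mid W_{i-2} W_{i-1})\right] \times \left[\mu P(w_i \mid w_{i-2} w_{i-1} W_i) + (1-\mu) P(w_i \mid W_{i-2} W_{i-1} W_i)\right]$$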
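The IBMFullPredict formula on this slide is a product of two interpolations: one predicts the cluster of the next word, the other predicts the word given that cluster. The sketch below only shows that arithmetic plus a standard perplexity computation. It assumes the four component probabilities are already estimated elsewhere; the argument names and example numbers are illustrative, and uppercase W is taken to denote a word's cluster.

```python
import math

def ibm_fullpredict(p_cluster_given_words, p_cluster_given_clusters,
                    p_word_given_words, p_word_given_clusters, lam, mu):
    """Combine the four component estimates as in the slide's formula."""
    # Predict the cluster W_i from the word history and from the cluster history.
    cluster_term = lam * p_cluster_given_words + (1 - lam) * p_cluster_given_clusters
    # Predict the word w_i inside that cluster, again from both histories.
    word_term = mu * p_word_given_words + (1 - mu) * p_word_given_clusters
    return cluster_term * word_term

def perplexity(word_probs):
    """exp of the negative mean log-probability over the predicted words."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Made-up component probabilities, only to show the arithmetic.
p = ibm_fullpredict(0.4, 0.2, 0.5, 0.1, lam=0.5, mu=0.5)
print(round(p, 4))
```

The cluster-history terms are estimated from far denser counts than the word-history terms, so they act as a backstop when the word trigram context is rare.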