Presentation Transcript


  1. A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model Domain Adaptation. Frank Wood and Yee Whye Teh, AISTATS 2009. Presented by: Mingyuan Zhou, Duke University, ECE, December 18, 2009

  2. Outline • Background • Pitman-Yor Process • Hierarchical Pitman-Yor Process Language Models • Doubly Hierarchical Pitman-Yor Process Language Model • Inference • Experimental results • Summary

  3. Background: Language modeling and n-Gram models • “A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence.” • n-Gram (n=2: bigram, n=3: trigram) • Smoothing. Reference: S. F. Chen and J. T. Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.
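The n-Gram formulas on this slide were shown as images; the standard formulation (following Chen & Goodman) factors the probability of a sentence s = w_1 … w_ℓ into conditional probabilities of each word given its preceding n−1 words, with the maximum-likelihood estimate read off from training counts c(·):

```latex
p(s) = \prod_{i=1}^{\ell} p\!\left(w_i \mid w_{i-n+1}^{\,i-1}\right),
\qquad
p_{\mathrm{ML}}\!\left(w_i \mid w_{i-n+1}^{\,i-1}\right)
  = \frac{c\!\left(w_{i-n+1}^{\,i}\right)}{c\!\left(w_{i-n+1}^{\,i-1}\right)}.
```

Smoothing redistributes some of this maximum-likelihood mass so that n-grams unseen in training still receive nonzero probability.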

  4. Example • Smoothing. Reference: S. F. Chen and J. T. Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.
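As a minimal, runnable illustration of the smoothing idea (hypothetical toy code, not from the slides; the paper itself smooths via a Pitman-Yor prior rather than additive counts), the sketch below builds a bigram model with optional add-λ smoothing, so an unseen bigram still receives nonzero probability:

```python
from collections import Counter
from itertools import chain

def bigram_model(corpus, vocab, lam=0.0):
    """Return p(w | v) for a bigram model with add-lambda smoothing.

    corpus: list of sentences, each a list of tokens including boundary
            markers such as "<s>" and "</s>".
    lam:    0.0 gives the maximum-likelihood estimate; lam > 0 reserves
            probability mass for unseen bigrams (a deliberately simple
            stand-in for the smoothing methods surveyed by Chen & Goodman).
    """
    bigrams = Counter(chain.from_iterable(zip(s[:-1], s[1:]) for s in corpus))
    unigrams = Counter(chain.from_iterable(s[:-1] for s in corpus))
    V = len(vocab)

    def prob(w, v):
        return (bigrams[(v, w)] + lam) / (unigrams[v] + lam * V)

    return prob

# Toy usage (hypothetical data):
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]
vocab = {"the", "cat", "dog", "sat", "</s>"}
p = bigram_model(corpus, vocab, lam=0.5)
print(p("cat", "the"))   # smoothed estimate of p(cat | the)
print(p("dog", "cat"))   # unseen bigram still gets nonzero probability
```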

  5. Evaluation • Train the n-Gram model • Calculate the probability the model assigns to held-out test data • Cross-entropy • Perplexity (the standard definitions are given below). Reference: S. F. Chen and J. T. Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.
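The evaluation formulas on this slide were figures in the original deck; the standard definitions from Chen & Goodman, for a test set T containing W_T words, are

```latex
H_p(T) = -\frac{1}{W_T}\,\log_2 p(T),
\qquad
\mathrm{PPL}_p(T) = 2^{H_p(T)},
```

where p(T) is the probability the trained model assigns to the test data; lower cross-entropy (equivalently, lower perplexity) indicates a better language model.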

  6. Dirichlet Process and Pitman-Yor Process • Dirichlet Process: the number of unique words grows as O(α log n) • Pitman-Yor Process: the number of unique words grows as O(α n^d) • When d = 0, the Pitman-Yor Process reduces to the DP • Both can be understood through the Chinese Restaurant Process: with concentration α, discount d, c_k customers at table k and t occupied tables, a new customer sits at an existing table k with probability proportional to c_k (DP) or c_k − d (Pitman-Yor), and at a new table with probability proportional to α (DP) or α + d·t (Pitman-Yor).
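A small simulation of this Chinese Restaurant Process view (illustrative code, not from the paper) makes the two seating rules explicit; setting the discount d = 0 recovers the Dirichlet Process case, and with d > 0 the number of occupied tables shows the faster power-law growth discussed on the next slide:

```python
import random

def crp_sample(n_customers, alpha, d=0.0, rng=random.Random(0)):
    """Simulate table assignments under the (Pitman-Yor) Chinese Restaurant Process.

    A new customer sits at existing table k with probability proportional to
    (c_k - d), and at a new table with probability proportional to
    (alpha + d * num_tables). With d = 0 this is the Dirichlet Process CRP.
    """
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n_customers):
        weights = [c - d for c in tables] + [alpha + d * len(tables)]
        r = rng.random() * sum(weights)
        k = 0
        while r > weights[k]:
            r -= weights[k]
            k += 1
        if k == len(tables):
            tables.append(1)        # open a new table
        else:
            tables[k] += 1          # join existing table k
    return tables

# With d > 0 the number of occupied tables grows roughly like alpha * n^d,
# versus alpha * log(n) for the Dirichlet Process (d = 0).
print(len(crp_sample(10_000, alpha=1.0, d=0.0)))
print(len(crp_sample(10_000, alpha=1.0, d=0.5)))
```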

  7. Power-law properties of the Pitman-Yor Process [Figures: the number of unique words, and the proportion of words appearing only once, plotted against the total number of words.]

  8. Hierarchical Pitman-Yor Process Language Models
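The HPYLM itself was presented in figures; its construction (from Teh, 2006, the model this paper builds on) places a Pitman-Yor prior on the next-word distribution G_u for every context u, with the base measure given by the distribution for the shortened context π(u) obtained by dropping the earliest word:

```latex
G_u \sim \mathrm{PY}\!\left(d_{|u|},\, \theta_{|u|},\, G_{\pi(u)}\right),
\qquad
G_{\emptyset} \sim \mathrm{PY}\!\left(d_0,\, \theta_0,\, U\right),
```

where U is the uniform distribution over the vocabulary and the discount d_{|u|} and concentration θ_{|u|} are shared by all contexts of the same length.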

  9. Doubly Hierarchical Pitman-Yor Process Language Model

  10. Doubly Hierarchical Pitman-Yor Process Language Model
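These two slides described the DHPYLM pictorially. As a hedged sketch of the construction (my paraphrase, not necessarily the paper's exact notation), each domain-specific next-word distribution gets a Pitman-Yor prior whose base measure combines two parents, the same context in a shared latent general-domain HPYLM and the shortened context within the domain itself, which is what makes the prior a graphical rather than tree-structured Pitman-Yor process:

```latex
G^{(j)}_{u} \sim \mathrm{PY}\!\left(d_{|u|},\; \theta_{|u|},\;
  \lambda_{u}\, G^{(0)}_{u} + \left(1-\lambda_{u}\right) G^{(j)}_{\pi(u)}\right),
```

where j indexes the domain, G^{(0)} is the latent general-domain model, and λ_u is a mixing weight; the multi-floor Chinese restaurant representation listed on the next slide decides, table by table, which of the two parents each back-off draw comes from.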

  11. Inference • Dirichlet Process, Chinese Restaurant Process • Hierarchical Dirichlet Process, Chinese Restaurant Franchise • Pitman-Yor Process, Chinese Restaurant Process • Hierarchical Pitman-Yor Process, Chinese Restaurant Franchise • Doubly Hierarchical Pitman-Yor Language Model, Graphical Pitman-Yor Process, Multi-floor Chinese Restaurant Process, Multi-floor Chinese Restaurant Franchise (a sketch of the franchise back-off mechanism follows below)
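To show how the Chinese Restaurant Franchise ties the hierarchy together (a generic, simplified sketch, not the paper's inference code; the new-table weight here omits the parent restaurant's probability of the word, which a full sampler would include), a customer who opens a new table in the restaurant for context u sends a proxy customer to the restaurant for the shortened back-off context:

```python
import random

def seat_customer(restaurants, context, word, alpha, d, rng):
    """Seat a customer for `word` in the restaurant for `context`.

    restaurants: dict mapping a context tuple to {word: [table sizes]}.
    Opening a new table sends a proxy customer to the parent restaurant
    (the context with its earliest word dropped), which is how counts are
    shared up the back-off hierarchy in a Chinese Restaurant Franchise.
    (Simplified: the full franchise weights the new-table option by the
    parent restaurant's probability of `word`.)
    """
    tables = restaurants.setdefault(context, {}).setdefault(word, [])
    total_tables = sum(len(ts) for ts in restaurants[context].values())
    weights = [c - d for c in tables] + [alpha + d * total_tables]
    r = rng.random() * sum(weights)
    k = 0
    while r > weights[k]:
        r -= weights[k]
        k += 1
    if k == len(tables):
        tables.append(1)             # open a new table ...
        if context:                  # ... and recurse to the parent restaurant
            seat_customer(restaurants, context[1:], word, alpha, d, rng)
    else:
        tables[k] += 1               # join existing table k

# Toy usage with hypothetical trigram contexts:
rng = random.Random(0)
restaurants = {}
for ctx, w in [(("the", "cat"), "sat"), (("the", "dog"), "sat"), (("a", "dog"), "ran")]:
    seat_customer(restaurants, ctx, w, alpha=1.0, d=0.5, rng=rng)
print(restaurants.get((), {}))  # the root restaurant has accumulated proxy customers
```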

  12. Experimental results (HPYLM)

  13. Experimental results (DHPYLM)

  14. Summary • DHPYLM achieves encouraging domain-adaptation results. • A graphical Pitman-Yor process is constructed, and a multi-floor Chinese restaurant representation is proposed for sampling. • DHPYLM may be integrated into topic models to eliminate “bag-of-words” assumptions.
