
CS 679: Text Mining

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 679: Text Mining. Lecture #9: Introduction to Markov Chain Monte Carlo, part 3. Slide Credit: Teg Grenager of Stanford University, with adaptations and improvements by Eric Ringger.


Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 679: Text Mining Lecture #9: Introduction to Markov Chain Monte Carlo, part 3 Slide Credit: Teg Grenager of Stanford University, with adaptations and improvements by Eric Ringger.

  2. Announcements • Assignment #4 • Prepare and give a lecture on your 2 favorite papers • Starting in approx. 1.5 weeks • Assignment #5 • Pre-proposal • Answer the 5 Questions!

  3. Where are we? • Joint distributions are useful for answering questions (joint or conditional) • Directed graphical models represent joint distributions • Hierarchical Bayesian models: make parameters explicit • We want to ask conditional questions of our models • E.g., posterior distributions over latent variables given large collections of data • Simultaneously inferring values of model parameters and answering the conditional questions: “posterior inference” • Posterior inference is a challenging computational problem • Sampling is an efficient mechanism for performing that inference • Variety of MCMC methods: random walk on a carefully constructed Markov chain • Convergence to the desired stationary distribution

  4. Agenda • Motivation • The Monte Carlo Principle • Markov Chain Monte Carlo • Metropolis Hastings • Gibbs Sampling • Advanced Topics

  5. Metropolis-Hastings • The symmetry requirement of the Metropolis proposal distribution can be hard to satisfy • Metropolis-Hastings is the natural generalization of the Metropolis algorithm, and the most popular MCMC algorithm • Choose a proposal distribution q which is not necessarily symmetric • Define a Markov chain with the following process: • Sample a candidate point x* from a proposal distribution q(x* | x(t)) • Compute the importance ratio: r = [p(x*) q(x(t) | x*)] / [p(x(t)) q(x* | x(t))] • With probability min(r, 1), transition to x*; otherwise stay in the same state x(t)
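As a concrete illustration of the process above, here is a minimal Metropolis-Hastings sampler in Python (a sketch, not from the slides). It targets a standard normal density, and the proposal is deliberately asymmetric, a unit-variance Gaussian centered at x(t) + 0.5 (an arbitrary choice), so that the full q(x(t) | x*) / q(x* | x(t)) correction in the ratio actually matters.

```python
import math
import random

def target_log_p(x):
    # Unnormalized log-density of the target: a standard normal (assumed example).
    return -0.5 * x * x

def q_log(x_new, x_old):
    # Log-density (up to a constant) of the proposal q(x_new | x_old):
    # Normal(x_old + 0.5, 1). The +0.5 drift makes q asymmetric.
    mu = x_old + 0.5
    return -0.5 * (x_new - mu) ** 2

def metropolis_hastings(n_samples, x0=0.0, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_star = rng.gauss(x + 0.5, 1.0)  # candidate from q(x* | x(t))
        # log r = log p(x*) + log q(x(t)|x*) - log p(x(t)) - log q(x*|x(t))
        log_r = (target_log_p(x_star) + q_log(x, x_star)
                 - target_log_p(x) - q_log(x_star, x))
        if rng.random() < math.exp(min(0.0, log_r)):  # accept w.p. min(r, 1)
            x = x_star
        samples.append(x)  # on rejection, stay at x(t)
    return samples
```

After burn-in, the empirical mean and variance of the chain should approach 0 and 1, the moments of the target, despite the drifting proposal; dropping the q terms from `log_r` would bias the chain toward larger x.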

  6. MH convergence • Theorem: The Metropolis-Hastings algorithm converges to the target distribution p(x). • Proof: • For all x, x', WLOG assume p(x) q(x' | x) ≥ p(x') q(x | x'), so that a candidate proposed in the reverse direction (from x' to x) is always accepted (i.e., min(r, 1) = 1) • Then p(x) T(x' | x) = p(x) q(x' | x) · [p(x') q(x | x')] / [p(x) q(x' | x)] (transition prob. = proposal times acceptance) = p(x') q(x | x') (commute and cancel) = p(x') q(x | x') · 1 (multiply by 1, b/c the reverse candidate is always accepted) = p(x') T(x | x') • Thus, it satisfies detailed balance with respect to p(x), so p(x) is the stationary distribution
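The detailed-balance property is easy to verify numerically on a finite state space. Below is a small sketch (not from the slides): a hypothetical 3-state target p with an asymmetric proposal matrix q, from which we build the full M-H transition matrix T and check p_i T(j | i) = p_j T(i | j) for every pair of states.

```python
# Hypothetical 3-state example: target p and an asymmetric proposal q
# (each row of q is a distribution over next states).
p = [0.2, 0.3, 0.5]
q = [[0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6],
     [0.5, 0.4, 0.1]]

def mh_transition_matrix(p, q):
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r = (p[j] * q[j][i]) / (p[i] * q[i][j])  # importance ratio
                T[i][j] = q[i][j] * min(1.0, r)          # propose, then accept
        T[i][i] = 1.0 - sum(T[i])  # rejected moves stay at state i
    return T

T = mh_transition_matrix(p, q)
# Detailed balance: p_i T(j|i) == p_j T(i|j) for every pair (i, j).
for i in range(3):
    for j in range(3):
        assert abs(p[i] * T[i][j] - p[j] * T[j][i]) < 1e-12
```

Because every pairwise flow balances, p is stationary: summing the detailed-balance identity over i gives p T = p.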

  7. Gibbs sampling • A special case of Metropolis-Hastings which is applicable to state spaces in which • we have a factored state space x = (x_1, …, x_n), and • access to the full (“complete”) conditionals p(x_i | x_1, …, x_{i-1}, x_{i+1}, …, x_n), abbreviated p(x_i | x_{-i}) • Perfect for Bayesian networks! • Idea: To transition from one state (variable assignment) to another, • Pick a variable x_i • Sample its value from the conditional distribution p(x_i | x_{-i}) • That’s it! • We’ll show in a minute why this is an instance of MH and thus must be sampling from the joint distribution we wanted.

  8. Markov blanket • Recall that Bayesian networks encode a factored representation of the joint distribution • Variables are independent of their non-descendants given their parents • Variables are independent of everything else in the network given their Markov blanket (their parents, children, and children’s other parents)! • So, to sample each node, we only need to condition on its Markov blanket: p(x_i | x_{-i}) = p(x_i | MB(x_i))

  9. Gibbs sampling • More formally, the proposal distribution is q(x* | x) = p(x*_i | x_{-i}) if x*_{-i} = x_{-i}, and 0 otherwise • The importance ratio is r = [p(x*) q(x | x*)] / [p(x) q(x* | x)] = [p(x*) p(x_i | x*_{-i})] / [p(x) p(x*_i | x_{-i})] (dfn of proposal distribution) = [p(x*_i | x_{-i}) p(x_{-i}) p(x_i | x_{-i})] / [p(x_i | x_{-i}) p(x_{-i}) p(x*_i | x_{-i})] (dfn of conditional probability, twice, using x*_{-i} = x_{-i}) = 1 (cancel common terms) • So we always accept!

  10. Gibbs sampling example • Consider a simple, 2-variable Bayes net with Boolean variables A and B • Initialize randomly • Sample the variables alternately, each from its full conditional [the slide shows the CPTs and the sequence of sampled T/F values, which did not survive transcription]
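The alternating scheme on this slide can be sketched in Python. The network structure is the two-node net A → B with Boolean variables; the CPT numbers below are assumptions for illustration, since the values on the original slide did not survive transcription. Sampling B given A reads straight off the CPT; sampling A given B uses Bayes' rule, which is exactly conditioning on A's Markov blanket (here, its child B).

```python
import random

# Hypothetical CPTs for the net A -> B (assumed values, not the slide's).
P_A = 0.3                                  # P(A = True)
P_B_GIVEN_A = {True: 0.8, False: 0.1}      # P(B = True | A)

def gibbs(n_iters, seed=0):
    rng = random.Random(seed)
    a = rng.random() < 0.5                 # random initialization
    b = rng.random() < 0.5
    counts = {}
    for _ in range(n_iters):
        # Sample A from its full conditional P(A | B), via Bayes' rule:
        # P(A | B=b) ∝ P(A) P(B=b | A).
        w_t = P_A * (P_B_GIVEN_A[True] if b else 1 - P_B_GIVEN_A[True])
        w_f = (1 - P_A) * (P_B_GIVEN_A[False] if b else 1 - P_B_GIVEN_A[False])
        a = rng.random() < w_t / (w_t + w_f)
        # Sample B from its full conditional P(B | A), read off the CPT.
        b = rng.random() < P_B_GIVEN_A[a]
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return {k: v / n_iters for k, v in counts.items()}
```

With these CPTs the exact joint is P(T,T) = 0.24, P(T,F) = 0.06, P(F,T) = 0.07, P(F,F) = 0.63, and the empirical state frequencies of the chain converge to those values.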

  11. Gibbs Sampling (in 2-D) (MacKay, 2002)
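The 2-D picture can be reproduced with a few lines of Python. The target here is a zero-mean bivariate Gaussian with unit variances and correlation rho, a standard illustrative choice (MacKay's exact figure distribution is not reproduced); each full conditional of such a Gaussian is the univariate normal N(rho·x_other, 1 − rho²), so the staircase path in the figure alternates axis-aligned draws from these conditionals.

```python
import math
import random

def gibbs_2d(n_samples, rho=0.9, seed=0):
    """Gibbs sampling for a zero-mean, unit-variance bivariate Gaussian
    with correlation rho (assumed example target)."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    sd = math.sqrt(1.0 - rho * rho)
    out = []
    for _ in range(n_samples):
        x1 = rng.gauss(rho * x2, sd)  # x1 | x2 ~ N(rho*x2, 1 - rho^2)
        x2 = rng.gauss(rho * x1, sd)  # x2 | x1 ~ N(rho*x1, 1 - rho^2)
        out.append((x1, x2))
    return out
```

With strong correlation (rho near 1) the conditionals are narrow, so each axis-aligned move is small and the chain mixes slowly along the ridge; this is the practical weakness the figure illustrates.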

  12. Practical issues • How many iterations? • How to know when to stop? • M-H: What’s a good proposal function? • Gibbs: How to derive the complete conditionals?
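One common answer to "how to know when to stop" is the Gelman-Rubin potential scale reduction factor, R-hat, computed from several independently initialized chains; it is not mentioned on the slide, so this is a sketch of one standard diagnostic rather than the course's prescribed method. Values near 1 suggest the chains have mixed; a common heuristic is to keep sampling until R-hat < 1.1.

```python
def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for m chains of length n
    (basic Gelman-Rubin form, without the split-chain refinement)."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n  # pooled posterior-variance estimate
    return (var_hat / W) ** 0.5
```

If the chains disagree (e.g., each is stuck near a different mode), B dominates W and R-hat is well above 1, flagging non-convergence.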

  13. Advanced Topics • Simulated annealing, for global optimization, is a form of MCMC • Mixtures of MCMC transition functions • Monte Carlo EM • Stochastic E-step, i.e., sample instead of computing the full posterior • Reversible-jump MCMC for model selection • Adaptive proposal distributions
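To make the first bullet concrete: simulated annealing is an ordinary Metropolis chain whose target exp(−E(x)/T) is progressively sharpened by lowering the temperature T, so that mass concentrates on low-energy states. A minimal sketch (not from the slides; the linear cooling schedule and Gaussian proposal are arbitrary choices):

```python
import math
import random

def simulated_annealing(energy, x0, n_steps, seed=0):
    """Metropolis chain on p_T(x) ∝ exp(-energy(x)/T) with T decaying
    linearly toward 0; returns the best state visited."""
    rng = random.Random(seed)
    x, best = x0, x0
    for t in range(n_steps):
        T = max(1e-3, 1.0 - t / n_steps)   # cooling schedule (assumed)
        x_star = x + rng.gauss(0.0, 1.0)   # symmetric Gaussian proposal
        # Metropolis acceptance at temperature T: min(1, exp(-dE/T)).
        if rng.random() < math.exp(min(0.0, (energy(x) - energy(x_star)) / T)):
            x = x_star
        if energy(x) < energy(best):
            best = x
    return best
```

At high T the chain accepts uphill moves freely and explores; as T falls it behaves more like greedy descent, which is why annealing can escape local minima that pure hill-climbing cannot.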

  14. Next • Document Clustering by Gibbs Sampling on a Mixture of Multinomials • From Dan Walker’s dissertation!
