






Presentation Transcript


  1. A Guided Tour of Finite Mixture Models: From Pearson to the Web. ICML ‘01 Keynote Talk, Williams College, MA, June 29th 2001. Padhraic Smyth, Information and Computer Science, University of California, Irvine. www.datalab.uci.edu

  2. Outline • What are mixture models? • Definitions and examples • How can we learn mixture models? • A brief history and illustration • What are mixture models useful for? • Applications in Web and transaction data • Recent research in mixtures?

  3. Acknowledgements • Students: • Igor Cadez, Scott Gaffney, Xianping Ge, Dima Pavlov • Collaborators • David Heckerman, Chris Meek, Heikki Mannila, Christine McLaren, Geoff McLachlan, David Wolpert • Funding • NSF, NIH, NIST, KLA-Tencor, UCI Cancer Center, Microsoft Research, IBM Research, HNC Software.

  4. Finite Mixture Models

  5. Finite Mixture Models

  6. Finite Mixture Models

  7. Finite Mixture Models

  8. Finite Mixture Models  p(x | Θ) = Σk αk pk(x | θk), where αk is the weight, pk the component model, and θk the parameters of the k-th component

  9. Example: Mixture of Gaussians • Gaussian mixtures:

  10. Example: Mixture of Gaussians • Gaussian mixtures: Each mixture component is a multidimensional Gaussian with its own mean μk and covariance “shape” Σk

  11. Example: Mixture of Gaussians • Gaussian mixtures: Each mixture component is a multidimensional Gaussian with its own mean μk and covariance “shape” Σk e.g., K=2, 1-dim: {θ, α} = {μ1, σ1, μ2, σ2, α1}
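As an aside (not part of the original slides), a minimal sketch of this K=2, 1-dimensional case in Python, assuming NumPy and SciPy are available; the function and parameter names are illustrative only:

```python
# Minimal sketch: density of a two-component, 1-D Gaussian mixture
# with parameters {mu1, sigma1, mu2, sigma2, alpha1} as on the slide.
import numpy as np
from scipy.stats import norm

def gaussian_mixture_pdf(x, mu1, sigma1, mu2, sigma2, alpha1):
    """p(x) = alpha1 * N(x; mu1, sigma1^2) + (1 - alpha1) * N(x; mu2, sigma2^2)."""
    return (alpha1 * norm.pdf(x, loc=mu1, scale=sigma1)
            + (1.0 - alpha1) * norm.pdf(x, loc=mu2, scale=sigma2))

# Example: evaluate the mixture density on a grid of points.
x = np.linspace(-5.0, 10.0, 200)
p = gaussian_mixture_pdf(x, mu1=0.0, sigma1=1.0, mu2=4.0, sigma2=2.0, alpha1=0.3)
```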

  12. Example: Mixture of Naïve Bayes

  13. Example: Mixture of Naïve Bayes

  14. Example: Mixture of Naïve Bayes Conditional Independence model for each component (often quite useful to first-order)
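To make the conditional-independence idea concrete, here is a minimal sketch (not from the talk) of one such component over binary term vectors, where the assumed parameter theta[k, j] = p(term j present | component k) is purely illustrative:

```python
# Minimal sketch: mixture of naive Bayes components for binary
# document-term vectors; parameter names are illustrative only.
import numpy as np

def component_loglik(x, theta_k):
    """log p(x | component k), factorized over terms (conditional independence)."""
    return np.sum(x * np.log(theta_k) + (1 - x) * np.log(1 - theta_k))

def mixture_loglik(x, weights, theta):
    """log p(x) = log sum_k alpha_k * prod_j p(x_j | component k)."""
    logs = [np.log(w) + component_loglik(x, theta[k]) for k, w in enumerate(weights)]
    return np.logaddexp.reduce(logs)
```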

  15. Mixtures of Naïve Bayes  [Figure: sparse binary document-term matrix; rows are Documents, columns are Terms, with 1s marking term occurrences]

  16. Mixtures of Naïve Bayes  [Figure: the same document-term matrix with rows grouped into Component 1 and Component 2 blocks]

  17. Other Component Models • Mixtures of Rectangles • Pelleg and Moore (ICML, 2001) • Mixtures of Trees • Meila and Jordan (2000) • Mixtures of Curves • Quandt and Ramsey (1978) • Mixtures of Sequences • Poulsen (1990)

  18. Interpretation of Mixtures 1. C has a direct (physical) interpretation e.g., C = {age of fish}, C = {male, female}

  19. Interpretation of Mixtures 1. C has a direct (physical) interpretation e.g., C = {age of fish}, C = {male, female} 2. C might have an interpretation e.g., clusters of Web surfers

  20. Interpretation of Mixtures 1. C has a direct (physical) interpretation e.g., C = {age of fish}, C = {male, female} 2. C might have an interpretation e.g., clusters of Web surfers 3. C is just a convenient latent variable e.g., flexible density estimation

  21. Graphical Models for Mixtures E.g., Mixtures of Naïve Bayes: C

  22. Graphical Models for Mixtures E.g., Mixtures of Naïve Bayes: C X1 X2 X3

  23. Graphical Models for Mixtures E.g., Mixtures of Naïve Bayes: [Figure: node C (discrete, hidden) with directed edges to X1, X2, X3 (observed)]

  24. Sequential Mixtures  [Figure: the mixture graphical model replicated at Time t-1, Time t, Time t+1, each with a hidden C node over observed X1, X2, X3]

  25. Sequential Mixtures  [Figure: as above, with the C nodes now linked across Time t-1, Time t, Time t+1]  Markov Mixtures = C has Markov dependence = Hidden Markov Model (here with naïve Bayes)  C = discrete state, couples observables through time
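A minimal sketch (not from the talk) of how the Markov dependence on C enters the likelihood, via the standard scaled forward recursion; the array names are assumptions for illustration:

```python
# Minimal sketch: log-likelihood of a Markov mixture (HMM) with K
# discrete hidden states, computed with the scaled forward recursion.
import numpy as np

def hmm_loglik(emit, init, trans):
    """emit[t, k] = p(x_t | C_t = k); init[k] = p(C_1 = k); trans[j, k] = p(C_t = k | C_{t-1} = j)."""
    alpha = init * emit[0]                      # forward message at the first time step
    loglik = 0.0
    for t in range(1, emit.shape[0]):
        scale = alpha.sum()                     # normalizer, accumulated in log space
        loglik += np.log(scale)
        alpha = (alpha / scale) @ trans * emit[t]
    return loglik + np.log(alpha.sum())
```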

  26. Dynamic Mixtures • Computer Vision • mixtures of Kalman filters for tracking • Atmospheric Science • mixtures of curves and dynamical models for cyclones • Economics • mixtures of switching regressions for the US economy

  27. Limitations of Mixtures • Discrete state space • not always appropriate • e.g., in modeling dynamical systems • Training • no closed form solution, can be tricky • Interpretability • many different mixture solutions may explain the same data

  28. Learning of Mixture Models

  29. Learning Mixtures from Data  Consider fixed K, e.g., unknown parameters Θ = {μ1, σ1, μ2, σ2, α1}  Given data D = {x1, …, xN}, we want to find the parameters Θ that “best fit” the data

  30. Early Attempts Weldon’s data, 1893 - n=1000 crabs from Bay of Naples - Ratio of forehead to body length - suspected existence of 2 separate species

  31. Early Attempts Karl Pearson, 1894: - JRSS paper - proposed a mixture of 2 Gaussians - 5 parameters Θ = {μ1, σ1, μ2, σ2, α1} - parameter estimation -> method of moments - involved solving a 9th-degree polynomial equation! (see Chapter 10, Stigler (1986), The History of Statistics)

  32. “The solution of an equation of the ninth degree, where almost all powers, to the ninth, of the unknown quantity are existing, is, however, a very laborious task. Mr. Pearson has indeed possessed the energy to perform his heroic task…. But I fear he will have few successors…..” Charlier (1906)

  33. Maximum Likelihood Principle • Fisher, 1922 • assume a probabilistic model • likelihood = p(data | parameters, model) • find the parameters that make the data most likely

  34. Maximum Likelihood Principle • Fisher, 1922 • assume a probabilistic model • likelihood = p(data | parameters, model) • find the parameters that make the data most likely
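A minimal sketch (not from the original slides) of this likelihood for the two-component Gaussian mixture above; maximizing it over the five parameters is exactly the problem the EM algorithm on the next slide addresses:

```python
# Minimal sketch: log-likelihood L(Theta) = sum_i log p(x_i | Theta)
# for the 1-D, two-component Gaussian mixture.
import numpy as np
from scipy.stats import norm

def log_likelihood(data, mu1, sigma1, mu2, sigma2, alpha1):
    p = (alpha1 * norm.pdf(data, mu1, sigma1)
         + (1.0 - alpha1) * norm.pdf(data, mu2, sigma2))
    return np.sum(np.log(p))
```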

  35. 1977: The EM Algorithm • Dempster, Laird, and Rubin • general framework for likelihood-based parameter estimation with missing data • start with initial guesses of parameters • E-step: estimate memberships given params • M-step: estimate params given memberships • Repeat until convergence • converges to a (local) maximum of likelihood • E-step and M-step are often computationally simple • generalizes to maximum a posteriori (with priors)
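A minimal sketch (not from the talk) of these E- and M-steps for a 1-D Gaussian mixture with K components, assuming a plain NumPy/SciPy setup with no priors:

```python
# Minimal sketch: EM for a 1-D, K-component Gaussian mixture.
import numpy as np
from scipy.stats import norm

def em_gmm(x, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=K, replace=False)   # initial guesses of parameters
    sigma = np.full(K, x.std())
    alpha = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: membership probabilities p(component k | x_i)
        resp = alpha * norm.pdf(x[:, None], mu, sigma)   # shape (N, K)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters given memberships
        Nk = resp.sum(axis=0)
        alpha = Nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return alpha, mu, sigma
```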

  36. Example of a Log-Likelihood Surface  [Figure: log-likelihood surface plotted over Mean 2 and a log scale for Sigma 2]

  37. [Figure: data from the Control Group and the Anemia Group]
