Expectation Maximization Algorithm
Presentation Transcript

  1. Expectation Maximization Algorithm Rong Jin

  2. A Mixture Model Problem • Apparently, the dataset consists of two modes • How can we automatically identify the two modes?

  3. Gaussian Mixture Model (GMM) • Assume that the dataset is generated by a mixture of two Gaussian distributions (the model is written out below) • Gaussian model 1: • Gaussian model 2: • If we knew the membership of each bin, estimating the two Gaussian models would be easy • How can we estimate the two Gaussian models without knowing the memberships of the bins?
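
The slide's formulas do not survive in the transcript; in standard notation, a two-component Gaussian mixture with mixing weights $\pi_1 + \pi_2 = 1$ is presumably what was displayed:

    $p(x) = \pi_1\,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + \pi_2\,\mathcal{N}(x \mid \mu_2, \sigma_2^2)$, where $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\big(-\tfrac{(x-\mu)^2}{2\sigma^2}\big)$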

  4. EM Algorithm for GMM • Treat the memberships as hidden variables • EM algorithm for the Gaussian mixture model • Unknown memberships: • Unknown Gaussian models: • Learn these two sets of parameters iteratively

  5. Start with A Random Guess • Randomly assign the memberships to each bin

  6. Start with A Random Guess • Randomly assign the memberships to each bin • Estimate the mean and variance of each Gaussian model

  7. E-step • Fix the two Gaussian models • Estimate the posterior for each data point

  8. EM Algorithm for GMM • Re-estimate the memberships for each bin

  9. M-Step • Fix the memberships • Re-estimate the two Gaussian models, with each estimate weighted by the posteriors

  10. EM Algorithm for GMM • Re-estimate the memberships for each bin • Re-estimate the models (a runnable sketch of the full loop follows below)
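
A minimal NumPy version of the E/M iteration the slides describe (all names are mine, not from the slides; a sketch, not the author's implementation):

    import numpy as np

    def em_gmm_1d(x, n_iter=100, seed=0):
        """EM for a two-component 1-D Gaussian mixture (minimal sketch)."""
        rng = np.random.default_rng(seed)
        # Start with a random guess: random memberships for each point.
        r = rng.uniform(size=len(x))
        r = np.column_stack([r, 1.0 - r])        # posteriors, shape (n, 2)
        for _ in range(n_iter):
            # M-step: re-estimate weights, means, variances (weighted by posteriors).
            pi = r.mean(axis=0)
            mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / r.sum(axis=0)
            var = np.maximum(var, 1e-8)          # guard against collapse
            # E-step: recompute each point's posterior under the current models.
            dens = pi / np.sqrt(2 * np.pi * var) * np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
            r = dens / dens.sum(axis=1, keepdims=True)
        return pi, mu, var, r

    # Example: data with two modes, as in the slides' histogram.
    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
    pi, mu, var, _ = em_gmm_1d(x)
    print(pi, mu, var)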

  11. At the 5-th Iteration • The red Gaussian component slowly shifts toward the left end of the x-axis

  12. At the 10-th Iteration • The red Gaussian component is still slowly shifting toward the left end of the x-axis

  13. At the 20-th Iteration • The red Gaussian component makes a more noticeable shift toward the left end of the x-axis

  14. At the 50-th Iteration • The red Gaussian component is close to its desired location

  15. At the 100-th Iteration • The results are almost identical to those at the 50-th iteration

  16. EM as A Bound Optimization • The EM algorithm in fact maximizes the log-likelihood function of the training data • Likelihood for a data point x • Log-likelihood of the training data (both written out below)
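
With the mixture density from slide 3, the two dropped formulas are presumably:

    $p(x; \theta) = \sum_{k=1}^{2} \pi_k\,\mathcal{N}(x \mid \mu_k, \sigma_k^2)$
    $\ell(\theta) = \sum_{i=1}^{n} \log p(x_i; \theta)$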

  19. Logarithm Bound Algorithm • Start with an initial guess

  20. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound that touches the objective at the current guess (the touch point)

  21. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the optimal solution that maximizes the bound

  22. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the optimal solution that maximizes the bound • Repeat the procedure

  23. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the optimal solution that maximizes the bound • Repeat the procedure • Converge to a local optimum (the optimal point in the figure)
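
In symbols (my notation, not preserved in the transcript): let $\ell(\theta)$ be the objective and $Q(\theta; \theta_t)$ the lower bound built at the current guess $\theta_t$. The procedure relies on

    $Q(\theta; \theta_t) \le \ell(\theta)$ for all $\theta$, with equality at the touch point: $Q(\theta_t; \theta_t) = \ell(\theta_t)$
    $\theta_{t+1} = \arg\max_\theta Q(\theta; \theta_t)$

so each iteration can only increase the objective: $\ell(\theta_{t+1}) \ge Q(\theta_{t+1}; \theta_t) \ge Q(\theta_t; \theta_t) = \ell(\theta_t)$.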

  24. EM as A Bound Optimization • Parameters from the previous iteration: $\theta'$ • Parameters for the current iteration: $\theta$ • Compute the difference in the log-likelihood, $\ell(\theta) - \ell(\theta')$

  25. Concave property of the logarithm function • Jensen's inequality: for $\alpha_k \ge 0$ with $\sum_k \alpha_k = 1$, $\log \sum_k \alpha_k x_k \ge \sum_k \alpha_k \log x_k$

  26. Definition of the posterior • $p(k \mid x; \theta') = \pi'_k\,\mathcal{N}(x \mid \mu'_k, \sigma_k'^2) \,/\, p(x; \theta')$
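
Slides 24-26 display this derivation as images; reconstructed in standard form, with $\theta'$ the previous parameters and the concavity and posterior facts above:

    $\ell(\theta) - \ell(\theta') = \sum_i \log \frac{p(x_i; \theta)}{p(x_i; \theta')} = \sum_i \log \sum_k p(k \mid x_i; \theta')\,\frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\pi'_k\,\mathcal{N}(x_i \mid \mu'_k, \sigma_k'^2)} \ge \sum_i \sum_k p(k \mid x_i; \theta') \log \frac{\pi_k\,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\pi'_k\,\mathcal{N}(x_i \mid \mu'_k, \sigma_k'^2)}$

Maximizing this lower bound over $\theta$ is exactly the M-step, and the bound equals zero at $\theta = \theta'$, so the log-likelihood never decreases.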

  27. Log-Likelihood of the EM Algorithm • The bound guarantees a monotone increase in log-likelihood • The algorithm can nevertheless converge to local optima or saddle points

  28. Maximize GMM Model • What is the global optimal solution to GMM? • Maximizing the objective function of GMM is an ill-posed problem: the likelihood is unbounded, e.g. center one Gaussian on a single data point and let its variance shrink to zero

  30. Identify Hidden Variables • For certain learning problems, identifying the hidden variables is not an easy task • Consider a simple translation model • For a pair of English and Chinese sentences: • A simple translation model: • The log-likelihood of the training corpus: (one standard form of both is written out below)
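
A standard "simple translation model" of this kind (an IBM-Model-1-style model with uniform alignments; my reconstruction of the dropped equations) is

    $\Pr(e \mid c) \propto \prod_{i=1}^{|e|} \sum_{j=1}^{|c|} \Pr(e_i \mid c_j)$, with log-likelihood $\mathcal{L} = \sum_{(e, c)} \log \Pr(e \mid c)$

where $e_i$ ranges over the English words and $c_j$ over the Chinese words of a sentence pair.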

  31. Identify Hidden Variables • Consider a simple case • Introduce an alignment variable a(i) • Rewrite the likelihood in terms of a(i)

  36. EM Algorithm for A Translation Model • Introduce an alignment variable for each translation pair • EM algorithm for the translation model • E-step: compute the posterior for each alignment variable • M-step: estimate the translation probabilities Pr(e|c) • We are lucky here: in general, this step can be extremely difficult and usually requires approximate approaches (a minimal sketch of the loop follows below)
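
A minimal sketch of this E/M loop under the Model-1-style assumptions above (the data format and all names are mine; a sketch, not the author's code):

    from collections import defaultdict

    def em_translation(pairs, n_iter=10):
        """pairs: list of (english_words, chinese_words) sentence pairs."""
        t = defaultdict(lambda: 1.0)          # Pr(e|c), uniform at the start
        for _ in range(n_iter):
            count = defaultdict(float)        # expected co-occurrence counts
            total = defaultdict(float)        # expected counts per source word
            for e_sent, c_sent in pairs:
                for e in e_sent:
                    # E-step: posterior of aligning e to each candidate c.
                    z = sum(t[(e, c)] for c in c_sent)
                    for c in c_sent:
                        p = t[(e, c)] / z
                        count[(e, c)] += p
                        total[c] += p
            # M-step: re-estimate Pr(e|c) from the expected counts.
            t = defaultdict(lambda: 1.0,
                            {ec: count[ec] / total[ec[1]] for ec in count})
        return t

    # Toy usage: two "sentence pairs" with overlapping vocabulary.
    pairs = [(["e1", "e2"], ["c1", "c2"]), (["e1", "e3"], ["c1", "c3"])]
    print(em_translation(pairs)[("e1", "c1")])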

  37. Compute Pr(e|c) • First compute

  39. Bound Optimization for A Translation Model

  41. Iterative Scaling • Maximum entropy model (written out below) • Iterative scaling assumptions: • All features are non-negative • The sum of the features is constant for every input
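
The equations behind these bullets, reconstructed in a standard form (conditional maxent with per-class weights $w_y$; generalized iterative scaling additionally assumes $f_j(x) \ge 0$ and a constant feature sum $C$):

    $\Pr(y \mid x) = \frac{\exp\big(\sum_j w_{y,j} f_j(x)\big)}{\sum_{y'} \exp\big(\sum_j w_{y',j} f_j(x)\big)}$, with $f_j(x) \ge 0$ and $\sum_j f_j(x) = C$ for every $x$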

  42. Iterative Scaling • Compute the empirical mean of each feature for every class, i.e., for every j and every class y • Initialize w1, w2, …, wc = 0 • Repeat • Compute p(y|x) for each training data point (xi, yi) using the w from the previous iteration • Compute the mean of each feature for every class using the estimated probabilities, i.e., for every j and every y • Compute the update for every j and every y • Update w accordingly (a runnable sketch of this procedure follows below)
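
A minimal NumPy sketch of these steps (generalized iterative scaling; the names and the update constant are my reconstruction of the slide's dropped formulas):

    import numpy as np

    def gis(X, y, n_classes, n_iter=100):
        """GIS for a conditional maxent model.
        X: (n, d) non-negative features with a constant row sum C; y: (n,) labels."""
        n, d = X.shape
        C = X.sum(axis=1)[0]                      # constant sum of features
        w = np.zeros((n_classes, d))
        # Empirical mean of each feature for every class.
        emp = np.zeros((n_classes, d))
        for i in range(n):
            emp[y[i]] += X[i]
        emp /= n
        for _ in range(n_iter):
            # p(y|x) for each training point under the current w.
            s = X @ w.T
            p = np.exp(s - s.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            # Model mean of each feature for every class.
            model = (p.T @ X) / n
            # GIS update: w += (1/C) * log(empirical mean / model mean).
            w += np.log(np.maximum(emp, 1e-12) / np.maximum(model, 1e-12)) / C
        return w

In practice the constant-sum condition is met by appending a "slack" feature $f_{d+1}(x) = C - \sum_j f_j(x)$ to every example.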

  43. Iterative Scaling

  44. Iterative Scaling • Can we use the concave property of the logarithm function here? • No, we can't: Jensen's inequality lower-bounds the log-partition term $\log Z(x)$, which upper-bounds the log-likelihood, and we need a lower bound • Use $\log z \le z - 1$ on the partition-function ratio instead

  45. Iterative Scaling • The weights are still coupled with each other • Further decomposition is needed

  46. Iterative Scaling

  47. Iterative Scaling • Wait a minute, this can't be right! What went wrong?

  48. Logarithm Bound Algorithm • Start with an initial guess • Come up with a lower bound • Search for the optimal solution that maximizes the bound