# Expectation Maximization Algorithm

##### Presentation Transcript

1. A Mixture Model Problem • The dataset appears to consist of two modes • How can we automatically identify the two modes?

2. Gaussian Mixture Model (GMM) • Assume that the dataset is generated by a mixture of two Gaussian distributions • Gaussian model 1: • Gaussian model 2: • If we knew the membership of each bin, estimating the two Gaussian models would be easy. • How can we estimate the two Gaussian models without knowing the memberships of the bins?
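The formulas for the two Gaussian models did not survive in this transcript. A standard two-component parameterization, which is my reconstruction rather than the slide's exact notation, is:

```latex
% Two-component Gaussian mixture (assumed standard form; the slide's exact notation is lost)
p(x) = \pi_1 \,\mathcal{N}(x \mid \mu_1, \sigma_1^2) + \pi_2 \,\mathcal{N}(x \mid \mu_2, \sigma_2^2),
\qquad
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```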

3. EM Algorithm for GMM • Treat the memberships as hidden variables • EM algorithm for the Gaussian mixture model • Unknown memberships: • Unknown Gaussian models: • Learn these two sets of parameters iteratively

4. Start with A Random Guess • Randomly assign a membership to each bin

5. Start with A Random Guess • Randomly assign a membership to each bin • Estimate the mean and variance of each Gaussian model

6. E-step • Fix the two Gaussian models • Estimate the posterior for each data point (see the sketch below)
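A minimal NumPy sketch of this E-step, assuming the two-component 1-D parameterization above; the names `gaussian_pdf`, `e_step`, `pi`, `mu`, and `var` are illustrative, not from the slides:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """1-D Gaussian density evaluated at every point in x."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def e_step(x, pi, mu, var):
    """E-step: posteriors p(k | x_i) with the two Gaussian models held fixed.

    pi, mu, var are length-2 arrays (mixing weight, mean, variance per component).
    """
    # Unnormalized responsibility: prior weight times component likelihood.
    resp = np.stack([pi[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)], axis=1)
    # Normalize so the two posteriors for each data point sum to one.
    return resp / resp.sum(axis=1, keepdims=True)
```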

7. EM Algorithm for GMM • Re-estimate the memberships for each bin

8. M-Step • Fix the memberships • Re-estimate the two Gaussian models, with each statistic weighted by the posteriors (see the sketch below)
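A matching M-step sketch under the same assumptions; every statistic is a posterior-weighted average, which is what the slide's "weighted by posteriors" annotation refers to:

```python
def m_step(x, resp):
    """M-step: re-estimate the two Gaussian models with the memberships (posteriors) fixed."""
    nk = resp.sum(axis=0)                                    # effective count per component
    pi = nk / len(x)                                         # mixing weights
    mu = (resp * x[:, None]).sum(axis=0) / nk                # posterior-weighted means
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk   # posterior-weighted variances
    return pi, mu, var
```

Alternating `e_step` and `m_step` until the parameters stop changing reproduces the iterations shown on the following slides.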

9. EM Algorithm for GMM • Re-estimate the memberships for each bin • Re-estimate the models

10. At the 5th Iteration • The red Gaussian component slowly shifts toward the left end of the x-axis

11. At the 10th Iteration • The red Gaussian component continues to shift slowly toward the left end of the x-axis

12. At the 20th Iteration • The red Gaussian component makes a more noticeable shift toward the left end of the x-axis

13. At the 50th Iteration • The red Gaussian component is close to the desired location

14. At the 100th Iteration • The results are almost identical to those of the 50th iteration

15. EM as A Bound Optimization • EM algorithm in fact maximizes the log-likelihood function of training data • Likelihood for a data point x • Log-likelihood of training data
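The equations on slides 15–17 were images; for the two-component GMM above, the likelihood of a single point and the log-likelihood of the training data presumably take the standard form:

```latex
% Likelihood of a single point under the mixture, and log-likelihood of the training set
p(x \mid \theta) = \sum_{k=1}^{2} \pi_k \,\mathcal{N}(x \mid \mu_k, \sigma_k^2),
\qquad
\ell(\theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{2} \pi_k \,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)
```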

19. Logarithm Bound Algorithm • Start with an initial guess • Construct a lower bound that touches the objective at the current guess (the "touch point")

20. Logarithm Bound Algorithm • Start with an initial guess • Construct a lower bound • Search for the solution that maximizes the bound

21. Logarithm Bound Algorithm • Start with an initial guess • Construct a lower bound • Search for the solution that maximizes the bound • Repeat the procedure

22. Logarithm Bound Algorithm • Start with an initial guess • Construct a lower bound • Search for the solution that maximizes the bound • Repeat the procedure • Converge to a local optimum (the "optimal point"); the iteration is sketched below
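The figure that accompanied slides 19–22 is lost; written out in my own notation, one bound-optimization step is:

```latex
% Build a lower bound Q that touches the objective l at the current guess, then maximize it.
Q(\theta; \theta^{(t)}) \le \ell(\theta) \ \text{for all } \theta,
\qquad
Q(\theta^{(t)}; \theta^{(t)}) = \ell(\theta^{(t)}),
\qquad
\theta^{(t+1)} = \arg\max_{\theta} Q(\theta; \theta^{(t)})
```

Because the bound touches the objective at θ^(t), each step cannot decrease ℓ(θ), which is why the procedure converges to a local optimum.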

23. EM as A Bound Optimization • Parameter for previous iteration: • Parameter for current iteration: • Compute

24. Concave property of logarithm function

25. Definition of posterior
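Slides 23–25 derive the bound, but the equations were images. The following is the standard derivation in the notation introduced above, not a verbatim copy of the slides: the log-likelihood difference is lower-bounded via the concavity of the logarithm (Jensen's inequality), using the posterior computed from the previous parameters.

```latex
% Lower bound on the improvement in log-likelihood, with q_i(k) = p(k | x_i, \theta^{(t)})
\ell(\theta) - \ell(\theta^{(t)})
  = \sum_{i} \log \frac{\sum_{k} \pi_k \,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{p(x_i \mid \theta^{(t)})}
  = \sum_{i} \log \sum_{k} q_i(k)\,
      \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{q_i(k)\, p(x_i \mid \theta^{(t)})}
  \ \ge\ \sum_{i} \sum_{k} q_i(k)\,
      \log \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{q_i(k)\, p(x_i \mid \theta^{(t)})}
```

Maximizing this lower bound over the new parameters is exactly the M-step, and evaluating q_i(k) with the previous parameters is the E-step.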

26. Log-Likelihood of the EM Algorithm • Saddle points (EM can converge to these as well as to local maxima)

27. Maximize GMM Model • What is the globally optimal solution to GMM? • Maximizing the objective function of GMM is an ill-posed problem (the likelihood is unbounded when one component's variance shrinks to zero around a single data point)

29. Identify Hidden Variables • For certain learning problems, identifying hidden variables is not an easy task • Consider a simple translation model • For a pair of English and Chinese sentences: • A simple translation model is (a common form is sketched below) • The log-likelihood of the training corpus
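The model equations on this slide are missing. A "simple translation model" of the kind described is commonly an IBM-Model-1-style word-to-word model; the following form is my assumption about what the slide showed:

```latex
% Assumed IBM-Model-1-style model: each English word e_i is generated by some Chinese word c_j
\Pr(\mathbf{e} \mid \mathbf{c}) \propto \prod_{i=1}^{m} \sum_{j=1}^{n} \Pr(e_i \mid c_j),
\qquad
\ell = \sum_{(\mathbf{e},\, \mathbf{c}) \in \text{corpus}} \log \Pr(\mathbf{e} \mid \mathbf{c})
```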

30. Identify Hidden Variables • Consider a simple case • Introduce an alignment variable a(i) • Rewrite the likelihood in terms of a(i)

34. EM Algorithm for A Translation Model • Introduce an alignment variable for each translation pair • EM algorithm for the translation model • E-step: compute the posterior for each alignment variable • M-step: estimate the translation probability Pr(e|c)

35. EM Algorithm for A Translation Model • Introduce an alignment variable for each translation pair • EM algorithm for the translation model • E-step: compute the posterior for each alignment variable • M-step: estimate the translation probability Pr(e|c) • We are lucky here: in general, this step can be extremely difficult and usually requires approximate approaches
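A compact sketch of this E-step/M-step loop, assuming the word-to-word model above; `em_translation` and its data layout are illustrative, not from the slides:

```python
from collections import defaultdict

def em_translation(pairs, n_iter=10):
    """EM for a word-to-word translation model Pr(e|c).

    pairs: list of (english_words, chinese_words) tokenized sentence pairs.
    Returns a dict t[(e, c)] approximating Pr(e|c).
    """
    t = defaultdict(lambda: 1.0)          # uniform (unnormalized) initialization
    for _ in range(n_iter):
        count = defaultdict(float)        # expected counts of (e, c)
        total = defaultdict(float)        # expected counts of c
        for eng, chi in pairs:
            for e in eng:
                # E-step: posterior of each alignment a(i) = j, proportional to Pr(e | c_j).
                z = sum(t[(e, c)] for c in chi)
                for c in chi:
                    p = t[(e, c)] / z
                    count[(e, c)] += p
                    total[c] += p
        # M-step: re-estimate Pr(e|c) from the expected counts.
        t = defaultdict(float, {(e, c): count[(e, c)] / total[c] for (e, c) in count})
    return t
```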

36. Compute Pr(e|c) • First compute

38. Bound Optimization for A Translation Model

40. Iterative Scaling • Maximum entropy model • Iterative scaling • All features • The sum of the features is constant (the model and these conditions are sketched below)
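The model definition and feature conditions on this slide were images. The standard conditional maximum entropy model, and the non-negativity and constant-feature-sum conditions that generalized iterative scaling relies on, presumably look like this (my reconstruction):

```latex
% Conditional maximum entropy model with one weight vector per class,
% and the feature conditions assumed by generalized iterative scaling.
p(y \mid x; \mathbf{w}) =
  \frac{\exp\big(\sum_{j} w_{y,j}\, f_j(x)\big)}
       {\sum_{y'} \exp\big(\sum_{j} w_{y',j}\, f_j(x)\big)},
\qquad
f_j(x) \ge 0, \quad \sum_{j} f_j(x) = C \ \text{for every } x
```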

41. Iterative Scaling • Compute the empirical mean of each feature for every class, i.e., for every j and every class y • Start with w1, w2, …, wc = 0 • Repeat: • Compute p(y|x) for each training data point (xi, yi) using the w from the previous iteration • Compute the mean of each feature for every class using the estimated probabilities, i.e., for every j and every y • Compute for every j and every y • Update w as (a full sketch of these steps follows this slide)
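A NumPy sketch of the loop described on this slide, filling in the two missing update formulas with the standard generalized-iterative-scaling update; the function name `gis` and the array layout are my own:

```python
import numpy as np

def gis(X, y, n_classes, n_iter=100):
    """Generalized iterative scaling for a conditional maxent model p(y|x) ∝ exp(w_y · x).

    X: (n, d) array of non-negative features whose rows sum to a constant C
       (add a slack feature if they do not); y: integer labels in [0, n_classes).
    """
    n, d = X.shape
    C = X.sum(axis=1).max()
    W = np.zeros((n_classes, d))                 # start with all weights at zero
    # Empirical mean of each feature for every class.
    emp = np.zeros((n_classes, d))
    for xi, yi in zip(X, y):
        emp[yi] += xi
    emp /= n
    for _ in range(n_iter):
        # p(y|x) for every training point under the current weights.
        scores = X @ W.T
        scores -= scores.max(axis=1, keepdims=True)
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)
        # Model-expected mean of each feature for every class.
        exp_mean = (p.T @ X) / n
        # GIS update: w_{y,j} += (1/C) * log(empirical mean / expected mean).
        W += np.log(np.maximum(emp, 1e-12) / np.maximum(exp_mean, 1e-12)) / C
    return W
```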

42. Iterative Scaling

43. Iterative Scaling • Can we use the concave property of the logarithm function? • No, we can't, because we need a lower bound

44. Iterative Scaling • The weights still couple with each other • Further decomposition is still needed

45. Iterative Scaling

46. Iterative Scaling • Wait a minute, this cannot be right! What happened?

47. Logarithm Bound Algorithm • Start with an initial guess • Construct a lower bound • Search for the solution that maximizes the bound