
  1. Forward-backward algorithm LING 575 Week 2: 01/10/08

  2. Some facts • The forward-backward algorithm is a special case of EM. • EM stands for “expectation maximization”. • EM falls into the general framework of maximum-likelihood estimation (MLE).

  3. Outline • Maximum likelihood estimate (MLE) • EM in a nutshell • Forward-backward algorithm • Hw1 • Additional slides • EM main ideas • EM for PM models

  4. MLE

  5. What is MLE? • Given • A sample X={X1, …, Xn} • A list of parameters θ • We define • Likelihood of the data: P(X | θ) • Log-likelihood of the data: L(θ) = log P(X | θ) • Given X, find θML = argmaxθ L(θ)

  6. MLE (cont) • Often we assume that the Xi are independent and identically distributed (i.i.d.) • Depending on the form of P(X | θ), solving this optimization problem can be easy or hard.

  7. An easy case • Assuming • A coin has a probability p of being a head, 1-p of being a tail. • Observation: We toss a coin N times, and the result is a set of Hs and Ts, and there are m Hs. • What is the value of p based on MLE, given the observation?

  8. An easy case (cont) p = m/N (setting dL/dp = m/p − (N−m)/(1−p) = 0 and solving for p)
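The closed-form answer can be checked numerically. A minimal sketch (function names are ours, not from the slides):

```python
from math import log

def coin_log_likelihood(p, m, n):
    """Log-likelihood of m heads in n tosses: m*log(p) + (n-m)*log(1-p)."""
    return m * log(p) + (n - m) * log(1 - p)

def mle_coin(m, n):
    """Closed-form MLE for the head probability: p = m / n."""
    return m / n

# Example: m = 70 heads out of N = 100 tosses.
p_hat = mle_coin(70, 100)
# The MLE should score at least as high as any other candidate value of p.
assert all(coin_log_likelihood(p_hat, 70, 100) >= coin_log_likelihood(p, 70, 100)
           for p in (0.3, 0.5, 0.69, 0.71, 0.9))
```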

  9. EM: basic concepts

  10. Basic setting in EM • X is a set of data points: observed data • θ is a parameter vector. • EM is a method to find θML = argmaxθ P(X | θ) when: • Calculating P(X | θ) directly is hard. • Calculating P(X, Y | θ) is much simpler, where Y is “hidden” data (or “missing” data).

  11. The basic EM strategy • Z = (X, Y) • Z: complete data (“augmented data”) • X: observed data (“incomplete” data) • Y: hidden data (“missing” data)

  12. The “missing” data Y • Y need not necessarily be missing in the practical sense of the word. • It may just be a conceptually convenient technical device to simplify the calculation of P(X | θ). • There could be many possible Ys.

  13. Examples of EM

  14. Basic idea • Consider a set of starting parameters • Use these to “estimate” the missing data • Use “complete” data to update parameters • Repeat until convergence

  15. The EM algorithm • Start with an initial estimate θ0 • Repeat until convergence • E-step: calculate Q(θ | θt) = E[ log P(X, Y | θ) | X, θt ] • M-step: find θt+1 = argmaxθ Q(θ | θt)
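As a concrete instance of the E-step/M-step loop, here is EM for a mixture of two biased coins with fixed, equal mixing weights: each trial's coin identity is the hidden Y. The data and all names below are illustrative, not from the slides:

```python
from math import comb, log

def lik(h, n, p):
    """Binomial likelihood of h heads in n tosses under bias p."""
    return comb(n, h) * p**h * (1 - p)**(n - h)

def data_loglik(trials, pA, pB):
    """Incomplete-data log-likelihood; each coin is picked with prob 1/2."""
    return sum(log(0.5 * lik(h, n, pA) + 0.5 * lik(h, n, pB)) for h, n in trials)

def em_step(trials, pA, pB):
    hA = tA = hB = tB = 0.0
    for h, n in trials:
        lA, lB = lik(h, n, pA), lik(h, n, pB)
        wA = lA / (lA + lB)           # E-step: P(coin = A | trial, current params)
        hA += wA * h; tA += wA * (n - h)
        hB += (1 - wA) * h; tB += (1 - wA) * (n - h)
    # M-step: MLE on the expected (fractional) counts
    return hA / (hA + tA), hB / (hB + tB)

trials = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]  # (heads, tosses)
pA, pB = 0.6, 0.5
for _ in range(50):
    pA, pB = em_step(trials, pA, pB)
```

Each pass through `em_step` is one E-step plus one M-step; the data log-likelihood is guaranteed not to decrease from one iteration to the next.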

  16. EM Highlights • It is a general algorithm for missing-data problems. • It requires “specialization” to the problem at hand. • Some classes of problems have a closed-form solution for the M-step. For example, • Forward-backward algorithm for HMM • Inside-outside algorithm for PCFG • EM in IBM MT Models

  17. The forward-backward algorithm

  18. Notation • A sentence: O1,T = o1…oT • T is the sentence length • The state sequence X1,T+1 = X1 … XT+1 • t: time index, ranging from 1 to T+1 • Xt: the state at time t • i, j: states si, sj • k: word wk in the vocabulary

  19. Forward probability αi(t): the probability of producing o1,t-1 while ending up in state si: αi(t) = P(o1,t-1, Xt = i | μ) Initialization: αi(1) = πi Induction: αj(t+1) = Σi αi(t) aij bijot

  20. Backward probability • βi(t): the probability of producing the sequence ot,T given that at time t we are in state si: βi(t) = P(ot,T | Xt = i, μ) Initialization: βi(T+1) = 1 Induction: βi(t) = Σj aij bijot βj(t+1)
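The two recursions can be sketched as follows. This sketch uses the simpler state-emission convention with 0-based time (the slides follow M&S's arc-emission notation, so the indices differ slightly); `pi`, `A`, `B` and the toy numbers are our own illustrative choices:

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_0..o_t, X_t = i); state-emission HMM, 0-based time."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        row = [B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
               for j in range(N)]
        alpha.append(row)
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_{T-1} | X_t = i); beta[T-1][i] = 1."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta

# Toy 2-state HMM over a binary alphabet (numbers are illustrative).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 0]
alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
p_obs = sum(alpha[-1])
# Invariant: sum_i alpha[t][i] * beta[t][i] = P(O) at every time t.
assert all(abs(sum(a * b for a, b in zip(alpha[t], beta[t])) - p_obs) < 1e-12
           for t in range(len(obs)))
```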

  21. Calculating the prob of the observation P(O | μ) = Σi αi(T+1) = Σi πi βi(1)

  22. pt(i, j) is the prob of traversing the arc from si to sj at time t given O (denoted by pt(i, j) in M&S): pt(i, j) = αi(t) aij bijot βj(t+1) / P(O | μ)

  23. γi(t) is the prob of being at state i at time t given O: γi(t) = αi(t) βi(t) / P(O | μ) = Σj pt(i, j)

  24. Expected counts • Calculate expected counts by summing over the time index t • Expected # of transitions from state i to j in O: Σt=1..T pt(i, j) • Expected # of transitions from state i in O: Σt=1..T γi(t) = Σt=1..T Σj pt(i, j)

  25. Update parameters

  26. Final formulae âij = Σt=1..T pt(i, j) / Σt=1..T γi(t) b̂ijk = Σt: ot=wk pt(i, j) / Σt=1..T pt(i, j) π̂i = γi(1)

  27. The inner loop for the forward-backward algorithm Given an input sequence and an HMM: 1. Calculate forward probability: • Base case: αi(1) = πi • Recursive case: αj(t+1) = Σi αi(t) aij bijot 2. Calculate backward probability: • Base case: βi(T+1) = 1 • Recursive case: βi(t) = Σj aij bijot βj(t+1) 3. Calculate expected counts: pt(i, j), γi(t), and their sums over t 4. Update the parameters: âij, b̂ijk
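The whole inner loop can be sketched end-to-end. As before, this is a sketch in the state-emission convention (emissions indexed by state rather than by arc, unlike the slides' notation), with illustrative parameter names and data; it performs one re-estimation step and checks the EM guarantee that P(O) cannot decrease:

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_0..o_t, X_t = i); 0-based time."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_{T-1} | X_t = i)."""
    N, T = len(A), len(obs)
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    return beta

def baum_welch_step(obs, pi, A, B, V):
    """One forward-backward re-estimation step (V = vocabulary size)."""
    N, T = len(pi), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    p_obs = sum(alpha[-1])
    # gamma[t][i]: prob of being in state i at time t, given O
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(N)] for t in range(T)]
    # xi[t][i][j]: prob of taking the i -> j transition at time t, given O
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # Update: expected counts over normalization (the "final formulae")
    new_pi = gamma[0]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(V)] for j in range(N)]
    return new_pi, new_A, new_B

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 1, 0, 1]
pi2, A2, B2 = baum_welch_step(obs, pi, A, B, V=2)
# EM guarantee: the likelihood of the data never decreases after an update.
assert sum(forward(obs, pi2, A2, B2)[-1]) >= sum(forward(obs, pi, A, B)[-1]) - 1e-12
```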

  28. Summary • A way of estimating parameters for HMM • Define forward and backward probabilities, which can be calculated efficiently (DP) • Given an initial parameter setting, we re-estimate the parameters at each iteration. • The forward-backward algorithm is a special case of the EM algorithm for PM models

  29. Hw1

  30. Arc-emission HMM: Q1: How to estimate the emission probability in a state-emission HMM?

  31. Given an input sequence and an HMM: 1. Calculate forward probability: • Base case • Recursive case 2. Calculate backward probability: • Base case • Recursive case 3. Calculate expected counts 4. Update the parameters Q2: How to modify the algorithm when there are multiple input sentences (e.g., a set of sentences)?

  32. EM: main ideas

  33. Additional slides

  34. Idea #1: find θ that maximizes the likelihood of training data

  35. Idea #2: find the θt sequence No analytical solution → iterative approach: find a sequence θ0, θ1, …, θt, … s.t. L(θt) ≤ L(θt+1)

  36. Idea #3: find θt+1 that maximizes a tight lower bound of the log-likelihood L(θ)

  37. Idea #4: find θt+1 that maximizes the Q function • The Q function is a lower bound of L(θ), up to an additive term that does not depend on θ

  38. The Q-function • Define the Q-function (a function of θ): Q(θ | θt) = E[ log P(X, Y | θ) | X, θt ] = Σy P(y | X, θt) log P(X, y | θ) • Y is a random vector. • X=(x1, x2, …, xn) is a constant (vector). • θt is the current parameter estimate and is a constant (vector). • θ is the normal variable (vector) that we wish to adjust. • The Q-function is the expected value of the complete-data log-likelihood log P(X, Y | θ) with respect to Y, given X and θt.

  39. The EM algorithm • Start with an initial estimate θ0 • Repeat until convergence • E-step: calculate Q(θ | θt) = E[ log P(X, Y | θ) | X, θt ] • M-step: find θt+1 = argmaxθ Q(θ | θt)

  40. Important classes of EM problems • Products of multinomial (PM) models • Exponential families • Gaussian mixture • …

  41. The EM algorithm for PM models

  42. PM models P(X, Y | θ) = Πj Πk θjk^cjk(X, Y) where the θjk form a partition of all the parameters, and for any j: Σk θjk = 1

  43. HMM is a PM

  44. PCFG • PCFG: each sample point (x,y): • x is a sentence • y is a possible parse tree for that sentence.

  45. PCFG is a PM

  46. Q-function for PM Q(θ | θt) = Σj Σk c̄jk log θjk, where c̄jk = Σy P(y | X, θt) cjk(X, y) is the expected count

  47. Maximizing the Q function Maximize Q(θ | θt) subject to the constraint Σk θjk = 1 for each j. Use Lagrange multipliers.
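Writing c̄jk for the expected count of outcome k in multinomial j (as on slide 46), the constrained maximization sketches out as a standard Lagrange-multiplier argument:

```latex
\max_{\theta}\; Q(\theta\mid\theta^t)=\sum_{j}\sum_{k}\bar c_{jk}\log\theta_{jk}
\qquad\text{s.t.}\qquad \sum_{k}\theta_{jk}=1\ \ \forall j

\Lambda=\sum_{j,k}\bar c_{jk}\log\theta_{jk}
      +\sum_{j}\lambda_{j}\Bigl(1-\sum_{k}\theta_{jk}\Bigr)

\frac{\partial\Lambda}{\partial\theta_{jk}}
   =\frac{\bar c_{jk}}{\theta_{jk}}-\lambda_{j}=0
\;\Rightarrow\;\theta_{jk}=\frac{\bar c_{jk}}{\lambda_{j}}

\sum_{k}\theta_{jk}=1
\;\Rightarrow\;\lambda_{j}=\sum_{k'}\bar c_{jk'}
\;\Rightarrow\;\hat\theta_{jk}=\frac{\bar c_{jk}}{\sum_{k'}\bar c_{jk'}}
```

The result is exactly the "expected count over normalization factor" form of the optimal solution.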

  48. Optimal solution θ̂jk = c̄jk / Σk' c̄jk' (expected count over normalization factor)

  49. PCFG example • Calculate expected counts • Update parameters

  50. The EM algorithm for PM models
  initialize θ
  for each iteration:
    set each expected count c̄jk to 0                      // E-step
    for each training example xi:
      for each possible y:
        compute P(y | xi, θ) and add cjk(xi, y) · P(y | xi, θ) to c̄jk   // for each parameter
    for each parameter:                                    // M-step
      θjk ← c̄jk / Σk' c̄jk'