


## Hidden Markov Models


**Hidden Markov Models**
Richard Golden (following the approach of Chapter 9 of Manning and Schütze, 2000)
REVISION DATE: April 15 (Tuesday), 2003

**VMM (Visible Markov Model)**
[Diagram: a two-state visible Markov model with start state S0, states S1 and S2, and transition probabilities a11=0.7, a12=0.3, a21=0.5, a22=0.5.]

**HMM Notation**
• State sequence variables: X1, …, XT+1
• Output sequence variables: O1, …, OT
• Set of hidden states: S1, …, SN
• Output alphabet: K1, …, KM
• Initial state probabilities (π1, …, πN): πi = p(X1 = Si), i = 1, …, N
• State transition probabilities (aij), i, j ∈ {1, …, N}: aij = p(Xt+1 = Sj | Xt = Si), t = 1, …, T
• Emission probabilities (bik), i ∈ {1, …, N}, k ∈ {1, …, M}: bik = p(Ot = Kk | Xt = Si), t = 1, …, T

**HMM State-Emission Representation**
[Diagram: the two-state HMM with initial probabilities π1=1, π2=0, transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5, and emission arrows from each state to the output symbols K1, K2, K3 with probabilities b11=0.6, b12=0.1, b13=0.3, b21=0.1, b22=0.7, b23=0.2.]
• In this representation the emission arrows come off the states.

**Arc-Emission Representation**
• Note that sometimes a Hidden Markov Model is represented by having the emission arrows come off the arcs.
• In this situation you would have a lot more emission arrows because there are a lot more arcs…
• But the transition and emission probabilities are the same… it just takes longer to draw on your PowerPoint presentation (self-conscious presentation).

**Fundamental Questions for HMMs: Model Fit**
• How can we compute the likelihood of the observations and hidden states given known emission and transition probabilities? Compute: p("Dog"/NOUN, "is"/VERB, "Good"/ADJ | {aij}, {bkm})
• How can we compute the likelihood of the observations alone given known emission and transition probabilities? Compute: p("Dog", "is", "Good" | {aij}, {bkm})

**Fundamental Questions for HMMs: Inference**
• How can we infer the sequence of hidden states given the observations and the known emission and transition probabilities?
• Maximize p("Dog"/?, "is"/?, "Good"/? | {aij}, {bkm}) with respect to the unknown labels.

**Fundamental Questions for HMMs: Learning**
• How can we estimate the emission and transition probabilities given observations, assuming that the hidden states are observable during the learning process?
• How can we estimate the emission and transition probabilities given observations only?

**Direct Calculation of Model Fit (Note Use of "Markov" Assumptions), Part 1**
This follows directly from the definition of a conditional probability: p(o, x) = p(o | x) p(x).
EXAMPLE:
p("Dog"/NOUN, "is"/VERB, "Good"/ADJ | {aij}, {bij})
= p("Dog", "is", "Good" | NOUN, VERB, ADJ, {aij}, {bij}) × p(NOUN, VERB, ADJ | {aij}, {bij})

**Direct Calculation of the Likelihood of Labeled Observations (Note Use of "Markov" Assumptions), Part 2**
EXAMPLE: Compute p("Dog"/NOUN, "is"/VERB, "good"/ADJ | {aij}, {bkm}).

**Graphical Algorithm Representation of the Direct Calculation of the Likelihood of Observations and Hidden States (Not Hard!)**
(Note that "Good" is the name of the dog, so it is a noun!)
The likelihood of a particular "labeled" sequence of observations, e.g., p("Dog"/NOUN, "is"/VERB, "Good"/NOUN | {aij}, {bkm}), may be computed with the "direct calculation" method using the following simple graphical algorithm. Specifically:
p(K3/S1, K2/S2, K1/S1 | {aij}, {bkm}) = π1 · b13 · a12 · b22 · a21 · b11
[Diagram: the state-emission HMM from above, with π1=1, π2=0, transitions a11=0.7, a12=0.3, a21=0.5, a22=0.5, and emissions b11=0.6, b12=0.1, b13=0.3, b21=0.1, b22=0.7, b23=0.2.]

**Extension to the Case Where the Likelihood of the Observations Given the Parameters Is Needed** (e.g., p("Dog", "is", "good" | {aij}, {bij}))
KILLER EQUATION!!!!! Sum the labeled likelihood over all possible hidden state sequences:
p(O1, …, OT | {aij}, {bik}) = Σ over all labelings X1, …, XT of [ πX1 · bX1,O1 · ∏t=2…T aXt−1,Xt · bXt,Ot ]

**Efficiency of Calculations Is Important (e.g., Model Fit)**
• Assume 1 multiplication per microsecond.
• Assume an N = 1000 word vocabulary and a T = 7 word sentence.
• (2T+1)·N^(T+1) multiplications by "direct calculation" yields (2·7+1)·1000^(7+1) multiplications, which is about 475,000 million years of computer time!!!
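The graphical direct calculation just multiplies "emit and jump" factors along one labeled path. A minimal Python sketch (the function and variable names are mine, not from the slides) that checks π1·b13·a12·b22·a21·b11 with the deck's numbers:

```python
# Two-state HMM from the slides (0-based indices here: S1->0, S2->1, K1->0, ...).
pi = [1.0, 0.0]                          # initial state probabilities
a = [[0.7, 0.3],                         # a[i][j] = p(X_{t+1}=S_j | X_t=S_i)
     [0.5, 0.5]]
b = [[0.6, 0.1, 0.3],                    # b[i][k] = p(O_t=K_k | X_t=S_i)
     [0.1, 0.7, 0.2]]

def labeled_likelihood(states, outputs):
    """Direct calculation of p(O_1/X_1, ..., O_T/X_T | {a_ij}, {b_ik})."""
    p = pi[states[0]] * b[states[0]][outputs[0]]          # start, emit O_1
    for t in range(1, len(states)):
        p *= a[states[t - 1]][states[t]] * b[states[t]][outputs[t]]  # jump, emit
    return p

# The deck's example p(K3/S1, K2/S2, K1/S1) = pi1*b13*a12*b22*a21*b11:
print(labeled_likelihood([0, 1, 0], [2, 1, 0]))  # 1*0.3*0.3*0.7*0.5*0.6 ≈ 0.0189
```

The six multiplications correspond exactly to the six arrows traversed in the graphical algorithm above.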
• 2N²T multiplications using the "forward method" is about 14 seconds of computer time!!!

**Forward, Backward, and Viterbi Calculations**
• Forward calculation methods are thus very useful.
• Forward, Backward, and Viterbi calculations will now be discussed.

**Forward Calculations – Overview**
[Trellis diagram: states S1 and S2 unrolled over times 2–4, with output symbols K1, K2, K3 and the transition and emission probabilities given earlier.]

**Forward Calculations – Time 2 (1-Word Example)**
NOTE: α1(2) + α2(2) is the likelihood of the observation/word "K3" in this "1-word example."
[Trellis diagram for time 2.]

**Forward Calculations – Time 3 (2-Word Example)**
[Trellis diagram for time 3, computing α1(3) and α2(3).]

**Forward Calculations – Time 4 (3-Word Example)**
[Trellis diagram for time 4.]

**Forward Calculation of Likelihood Function ("Emit and Jump")**
αi(1) = πi
αj(t+1) = Σi αi(t) · bi,Ot · aij,  t = 1, …, T
p(O1, …, OT) = Σi αi(T+1)

**Backward Calculations – Overview**
[Trellis diagram: the same unrolled network, now processed from right to left.]

**Backward Calculations – Time 4**
[Trellis diagram for time 4.]

**Backward Calculations – Time 3**
[Trellis diagram for time 3.]

**Backward Calculations – Time 2**
NOTE: β1(2) + β2(2) is the likelihood of the observation/word sequence "K2, K1" in this "2-word example."
[Trellis diagram for time 2.]

**Backward Calculations – Time 1**
[Trellis diagram for time 1.]
**Backward Calculation of Likelihood Function ("Emit and Jump")**
βi(T+1) = 1
βi(t) = bi,Ot · Σj aij · βj(t+1),  t = T, …, 1
p(O1, …, OT) = Σi πi · βi(1)

**You Get the Same Answer Going Forward or Backward!!**
Forward: p(O1, …, OT) = Σi αi(T+1). Backward: p(O1, …, OT) = Σi πi · βi(1).

**The Forward-Backward Method**
• Note the forward method computes: αi(t) = p(O1, …, Ot−1, Xt = Si).
• Note the backward method computes (t > 1): βi(t) = p(Ot, …, OT | Xt = Si).
• We can do the forward-backward method, which computes p(O1, …, OT) using the following formula (for any choice of t = 1, …, T+1!): p(O1, …, OT) = Σi αi(t) · βi(t)

**Example Forward-Backward Calculation!**
[Worked forward and backward tables for the running example; both directions give the same likelihood.]

**Solution to Problem 1**
• The "hard part" of the 1st problem was to find the likelihood of the observations for an HMM.
• We can now do this using either the forward, backward, or forward-backward method.

**Solution to Problem 2: Viterbi Algorithm (Computing the "Most Probable" Labeling)**
• Consider the direct calculation of labeled observations.
• Previously we summed these likelihoods together across all possible labelings to solve the first problem, which was to compute the likelihood of the observations given the parameters (the hard part of HMM Question 1!).
• We solved this problem using the forward or backward method.
• Now we want to compute all possible labelings and their respective likelihoods and pick the labeling that is the largest!
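The "emit and jump" recursions in the slides above can be written out directly. A minimal Python sketch (names are mine, not from the deck) of the forward and backward passes for the deck's two-state HMM, checking that the forward, backward, and forward-backward formulas all give the same likelihood:

```python
# Two-state HMM from the slides (0-based indices: S1->0, S2->1, K1->0, ...).
pi = [1.0, 0.0]
a = [[0.7, 0.3], [0.5, 0.5]]             # a[i][j] = p(X_{t+1}=S_j | X_t=S_i)
b = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]   # b[i][k] = p(O_t=K_k | X_t=S_i)
N = 2

def forward(obs):
    """alpha[t-1][i] = p(O_1..O_{t-1}, X_t = S_i); 'emit and jump'."""
    alpha = [pi[:]]                                   # alpha at time 1 is pi
    for o in obs:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * b[i][o] * a[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(obs):
    """beta[t-1][i] = p(O_t..O_T | X_t = S_i); beta at time T+1 is all ones."""
    beta = [[1.0] * N]
    for o in reversed(obs):
        nxt = beta[0]
        beta.insert(0, [b[i][o] * sum(a[i][j] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

obs = [2, 1, 0]                                       # the word sequence K3, K2, K1
alpha, beta = forward(obs), backward(obs)
like_fwd = sum(alpha[-1])                             # sum_i alpha_i(T+1)
like_bwd = sum(pi[i] * beta[0][i] for i in range(N))  # sum_i pi_i * beta_i(1)
like_fb = sum(alpha[1][i] * beta[1][i] for i in range(N))  # any t works; here t=2
print(like_fwd, like_bwd, like_fb)                    # all three agree
```

For the three-word sequence K3, K2, K1 each pass costs on the order of N² multiplications per time step, rather than enumerating all N^T = 8 labelings.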
EXAMPLE: Compute the most probable labeling in p("Dog"/?, "is"/?, "good"/? | {aij}, {bkm}).

**Efficiency of Calculations Is Important (e.g., Most Likely Labeling Problem)**
• Just as in the forward-backward calculations, we can solve the problem of computing the likelihood of every one of the N^T possible labelings efficiently.
• Instead of millions of years of computing time, we can solve the problem in several seconds!!

**Viterbi Algorithm – Overview (Same Setup as the Forward Algorithm)**
[Trellis diagram: the same unrolled network used for the forward calculations.]

**Viterbi Forward Calculations – Time 2 (1-Word Example)**
[Trellis diagram: keep the most probable path into each state at time 2.]

**Backtracking – Time 2 (1-Word Example)**
[Trellis diagram: backtrack along the stored best-path pointers.]

**Viterbi Forward Calculations (2-Word Example)**
[Trellis diagram for times 2–3.]

**Backtracking (2-Word Example)**
[Trellis diagram: backtrack from the most probable state at time 3.]

**Viterbi Forward Calculations – Time 4 (3-Word Example)**
[Trellis diagram for times 2–4.]

**Backtracking to Obtain the Labeling for the 3-Word Case**
[Trellis diagram: backtrack from the most probable state at time 4 to recover the most probable labeling.]

**Third Fundamental Question: Parameter Estimation**
• Make an initial guess for {aij} and {bkm}.
• Compute the probability that one hidden state follows another, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
• Compute
the probability of an observed symbol given a hidden state, given {aij}, {bkm}, and the sequence of observations (computed using the forward-backward algorithm).
• Use these computed probabilities to make an improved guess for {aij} and {bkm}.
• Repeat this process until convergence.
• It can be shown that this algorithm does in fact converge to the correct choice of {aij} and {bkm}, assuming that the initial guess was close enough.
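The Viterbi procedure walked through in the trellis slides above is the forward recursion with the sum replaced by a max, plus stored backpointers. A minimal Python sketch (names are mine, not from the deck):

```python
# Two-state HMM from the slides (0-based indices: S1->0, S2->1, K1->0, ...).
pi = [1.0, 0.0]
a = [[0.7, 0.3], [0.5, 0.5]]             # a[i][j] = p(X_{t+1}=S_j | X_t=S_i)
b = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]   # b[i][k] = p(O_t=K_k | X_t=S_i)
N = 2

def viterbi(obs):
    """Most probable labeling X_1..X_T and its likelihood (max instead of sum)."""
    delta = [pi[i] * b[i][obs[0]] for i in range(N)]  # best path into each state
    backptr = []
    for o in obs[1:]:
        scores = [[delta[i] * a[i][j] * b[j][o] for i in range(N)]
                  for j in range(N)]
        backptr.append([max(range(N), key=lambda i: scores[j][i])
                        for j in range(N)])
        delta = [max(scores[j]) for j in range(N)]
    best = max(range(N), key=lambda j: delta[j])      # most probable final state
    path = [best]
    for back in reversed(backptr):                    # backtracking step
        path.append(back[path[-1]])
    path.reverse()
    return path, max(delta)

path, p = viterbi([2, 1, 0])          # the word sequence K3, K2, K1
print(path, p)
```

For this example the recovered labeling is S1, S2, S1, which is exactly the labeled path whose direct-calculation likelihood is π1·b13·a12·b22·a21·b11.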

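The reestimation loop described above is the Baum-Welch (EM) algorithm. A minimal single-sequence sketch of one reestimation step, assuming the same "emit and jump" forward/backward conventions as earlier in the deck (all names are mine; a production version would add smoothing and guard against zero denominators):

```python
# Two-state HMM from the slides (0-based indices: S1->0, S2->1, K1->0, ...).
pi = [1.0, 0.0]
a = [[0.7, 0.3], [0.5, 0.5]]             # a[i][j] = p(X_{t+1}=S_j | X_t=S_i)
b = [[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]]   # b[i][k] = p(O_t=K_k | X_t=S_i)
N, M = 2, 3

def forward(obs, pi, a, b):
    alpha = [pi[:]]                      # alpha[t-1][i] = p(O_1..O_{t-1}, X_t=S_i)
    for o in obs:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * b[i][o] * a[i][j] for i in range(N))
                      for j in range(N)])
    return alpha

def backward(obs, pi, a, b):
    beta = [[1.0] * N]                   # beta[t-1][i] = p(O_t..O_T | X_t=S_i)
    for o in reversed(obs):
        nxt = beta[0]
        beta.insert(0, [b[i][o] * sum(a[i][j] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs, pi, a, b):
    """One EM reestimation of (pi, a, b) from a single observation sequence."""
    T = len(obs)
    alpha, beta = forward(obs, pi, a, b), backward(obs, pi, a, b)
    like = sum(alpha[T])
    # gamma[t][i] = p(X_{t+1} = S_i | O)   (t = 0..T-1, i.e. times 1..T)
    gamma = [[alpha[t][i] * beta[t][i] / like for i in range(N)]
             for t in range(T)]
    # xi[t][i][j] = p(X_{t+1} = S_i, X_{t+2} = S_j | O): "emit at i, jump to j"
    xi = [[[alpha[t][i] * b[i][obs[t]] * a[i][j] * beta[t + 1][j] / like
            for j in range(N)] for i in range(N)] for t in range(T)]
    new_pi = gamma[0][:]
    new_a = [[sum(xi[t][i][j] for t in range(T)) /
              sum(gamma[t][i] for t in range(T)) for j in range(N)]
             for i in range(N)]
    new_b = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(M)]
             for i in range(N)]
    return new_pi, new_a, new_b

obs = [2, 1, 0]                          # the word sequence K3, K2, K1
new_pi, new_a, new_b = baum_welch_step(obs, pi, a, b)
# EM guarantees the likelihood of the training sequence does not decrease:
print(sum(forward(obs, pi, a, b)[-1]), sum(forward(obs, new_pi, new_a, new_b)[-1]))
```

Iterating this step until the parameters stop changing is the convergence loop described in the last slide; as noted there, convergence to the correct parameters depends on the initial guess.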