
Hidden Markov Models




Presentation Transcript


  1. Hidden Markov Models

  2. Hidden Markov Model • In some Markov processes, we may not be able to observe the states directly.

  3. Hidden Markov Model
  • An HMM is a quintuple (S, E, P, A, B).
  • S: {s1 … sN} are the values for the hidden states.
  • E: {e1 … eT} are the values for the observations.
  • P: probability distribution of the initial state.
  • A: transition probability matrix.
  • B: emission probability matrix.
  • [Figure: trellis of hidden states X1 … Xt−1, Xt, Xt+1 … XT, each emitting an observation e1 … et−1, et, et+1 … eT.]
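For concreteness, the quintuple can be held in a small data structure. The following is a minimal sketch, not something from the slides: the field names (pi for P, A, B) and the toy two-state weather model are illustrative assumptions, and the later sketches reuse this hmm object.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HMM:
    """A discrete HMM (S, E, P, A, B); field names are illustrative."""
    states: List[str]                 # S: values of the hidden states s1 .. sN
    symbols: List[str]                # possible observation values
    pi: Dict[str, float]              # P: initial state distribution
    A: Dict[str, Dict[str, float]]    # A: transition probabilities, A[s][s']
    B: Dict[str, Dict[str, float]]    # B: emission probabilities, B[s][observation]

# Toy two-state model with made-up numbers, reused in the later sketches
hmm = HMM(
    states=["Rain", "Sun"],
    symbols=["umbrella", "no umbrella"],
    pi={"Rain": 0.5, "Sun": 0.5},
    A={"Rain": {"Rain": 0.7, "Sun": 0.3}, "Sun": {"Rain": 0.3, "Sun": 0.7}},
    B={"Rain": {"umbrella": 0.9, "no umbrella": 0.1},
       "Sun": {"umbrella": 0.2, "no umbrella": 0.8}},
)
```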

  4. Inferences with HMM
  • Filtering: P(xt | e1:t). Given an observation sequence, compute the probability distribution of the last state.
  • Decoding: argmaxx1:t P(x1:t | e1:t). Given an observation sequence, compute the most likely hidden state sequence.
  • Learning: argmaxθ Pθ(e1:t), where θ = (P, A, B) are the parameters of the HMM. Given an observation sequence, find the transition and emission probability tables that assign the observations the highest probability (unsupervised learning).

  5. Filtering
  P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)
  = P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t) / P(et+1 | e1:t)
  = P(et+1 | Xt+1) P(Xt+1 | e1:t) / P(et+1 | e1:t)
  where the one-step prediction is
  P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t).
  The result has the same form as P(Xt | e1:t), so compute it by recursion over t.
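A sketch of this recursion in Python, reusing the toy hmm object above. Normalizing the product P(et+1 | Xt+1) P(Xt+1 | e1:t) plays the role of dividing by P(et+1 | e1:t); the function name and dictionary representation are assumptions, not from the slides.

```python
def filter_step(hmm, belief, observation):
    """One filtering update: P(X_{t+1} | e_{1:t+1}) from the previous belief P(X_t | e_{1:t})."""
    # Prediction: P(X_{t+1} | e_{1:t}) = sum_{x_t} P(X_{t+1} | x_t) P(x_t | e_{1:t})
    predicted = {s2: sum(hmm.A[s1][s2] * belief[s1] for s1 in hmm.states)
                 for s2 in hmm.states}
    # Correction: weight by P(e_{t+1} | X_{t+1}) and renormalize
    unnorm = {s: hmm.B[s][observation] * predicted[s] for s in hmm.states}
    z = sum(unnorm.values())          # equals P(e_{t+1} | e_{1:t})
    return {s: p / z for s, p in unnorm.items()}

belief = dict(hmm.pi)                 # initial distribution P (the slides' x0)
for e in ["umbrella", "umbrella", "no umbrella"]:
    belief = filter_step(hmm, belief, e)
print(belief)                         # P(X_t | e_{1:t}) after three observations
```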

  6. Filtering Example

  7. Viterbi Algorithm
  • Compute argmaxx1:t P(x1:t | e1:t).
  • Since P(x1:t | e1:t) = P(x1:t, e1:t) / P(e1:t), and P(e1:t) remains constant when we consider different x1:t,
  argmaxx1:t P(x1:t | e1:t) = argmaxx1:t P(x1:t, e1:t).
  • Since the Markov chain is a Bayes net,
  P(x1:t, e1:t) = P(x0) Πi=1..t P(xi | xi−1) P(ei | xi).
  • Minimize −log P(x1:t, e1:t) = −log P(x0) + Σi=1..t (−log P(xi | xi−1) − log P(ei | xi)).

  8. Viterbi Algorithm
  • Given an HMM (S, E, P, A, B) and observations e1:t, construct a graph consisting of 1 + tN nodes:
  • one initial node x0, and
  • N nodes at each time i, where the jth node at time i represents Xi = sj.
  • The link between the nodes Xi−1 = sj and Xi = sk has length −log [ P(Xi = sk | Xi−1 = sj) P(ei | Xi = sk) ].

  9. The problem of finding argmaxx1:t P(x1:t | e1:t) becomes that of finding the shortest path from the initial node x0 to one of the N nodes at time t.
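Putting slides 7-9 together, here is a sketch of Viterbi as dynamic programming over −log probabilities (the shortest-path view), again reusing the toy hmm object. For simplicity the initial distribution is treated as the distribution of the first emitting state; names and structure are illustrative, not from the slides.

```python
import math

def viterbi(hmm, observations):
    """Most likely state sequence argmax_{x_{1:t}} P(x_{1:t} | e_{1:t}), via shortest paths in -log space."""
    # delta[s]: length of the shortest path (sum of -log probabilities) ending in state s
    delta = {s: -math.log(hmm.pi[s]) - math.log(hmm.B[s][observations[0]])
             for s in hmm.states}
    backpointers = []
    for e in observations[1:]:
        # best predecessor of s2 minimizes delta[s1] - log P(s2 | s1); -log P(e | s2) is common to all s1
        step = {s2: min(hmm.states, key=lambda s1: delta[s1] - math.log(hmm.A[s1][s2]))
                for s2 in hmm.states}
        delta = {s2: delta[step[s2]] - math.log(hmm.A[step[s2]][s2]) - math.log(hmm.B[s2][e])
                 for s2 in hmm.states}
        backpointers.append(step)
    # Trace the shortest path back from the best final state
    path = [min(hmm.states, key=lambda s: delta[s])]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return list(reversed(path))

print(viterbi(hmm, ["umbrella", "umbrella", "no umbrella"]))
```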

  10. Example

  11. Baum-Welch Algorithm
  • The previous two kinds of computation need the parameters θ = (P, A, B). Where do the probabilities come from?
  • Relative frequency? But the states are not observable!
  • Solution: the Baum-Welch algorithm.
  • Unsupervised learning from observations: find argmaxθ Pθ(e1:t).

  12. Baum-Welch Algorithm
  • Start with an initial set of parameters θ0 (possibly arbitrary).
  • Compute pseudo counts: how many times did the transition from Xi−1 = sj to Xi = sk occur?
  • Use the pseudo counts to obtain another (better) set of parameters θ1.
  • Iterate until Pθ1(e1:t) is no bigger than Pθ0(e1:t), i.e. until the likelihood stops increasing.
  • This is a special case of EM (Expectation-Maximization).

  13. Pseudo Counts
  • Given the observation sequence e1:T, the pseudo count of the link from Xt = si to Xt+1 = sj is the probability P(Xt = si, Xt+1 = sj | e1:T).

  14. Update HMM Parameters
  • Add P(Xt = si, Xt+1 = sj | e1:T) to count(i, j).
  • Add P(Xt = si | e1:T) to count(i).
  • Add P(Xt = si | e1:T) to count(i, et).
  • Updated aij = count(i, j) / count(i).
  • Updated bj(et) = count(j, et) / count(j).

  15. P(Xt = si, Xt+1 = sj | e1:T)
  = P(Xt = si, Xt+1 = sj, e1:t, et+1, et+2:T) / P(e1:T)
  = P(Xt = si, e1:t) P(Xt+1 = sj | Xt = si) P(et+1 | Xt+1 = sj) P(et+2:T | Xt+1 = sj) / P(e1:T)
  = P(Xt = si, e1:t) aij bj(et+1) P(et+2:T | Xt+1 = sj) / P(e1:T)
  = αi(t) aij bj(et+1) βj(t+1) / P(e1:T)

  16. Forward Probability
  • αi(t) = P(e1:t, Xt = si), the forward probability used in the pseudo-count formula above.

  17. Backward Probability
  • βi(t) = P(et+1:T | Xt = si), the backward probability used in the pseudo-count formula above.
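The formulas for these two slides are not in the transcript; below is a standard sketch consistent with how αi(t) and βj(t+1) are used on slides 15, 18, and 19 (αi(t) = P(e1:t, Xt = si) and βi(t) = P(et+1:T | Xt = si)), reusing the HMM structure from the earlier sketch.

```python
def forward_alphas(hmm, observations):
    """alpha[t][s] = P(e_1 .. e_{t+1}, X_{t+1} = s), with 0-based list index t."""
    # pi is treated as the distribution of the first emitting state
    alpha = [{s: hmm.pi[s] * hmm.B[s][observations[0]] for s in hmm.states}]
    for e in observations[1:]:
        prev = alpha[-1]
        alpha.append({s2: hmm.B[s2][e] * sum(prev[s1] * hmm.A[s1][s2] for s1 in hmm.states)
                      for s2 in hmm.states})
    return alpha

def backward_betas(hmm, observations):
    """beta[t][s] = P(e_{t+2} .. e_T | X_{t+1} = s); all ones at the final step."""
    beta = [{s: 1.0 for s in hmm.states}]
    for e in reversed(observations[1:]):
        nxt = beta[0]
        beta.insert(0, {s1: sum(hmm.A[s1][s2] * hmm.B[s2][e] * nxt[s2] for s2 in hmm.states)
                        for s1 in hmm.states})
    return beta
```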

  18. [Figure: one link of the trellis, from Xt = si to Xt+1 = sj, with forward probability αi(t) entering it, link weight aij bj(et+1), and backward probability βj(t+1) leaving it, shown over times t−1, t, t+1, t+2.]

  19. P(Xt = si | e1:T)
  = P(Xt = si, e1:t, et+1:T) / P(e1:T)
  = P(et+1:T | Xt = si, e1:t) P(Xt = si, e1:t) / P(e1:T)
  = P(et+1:T | Xt = si) P(Xt = si, e1:t) / P(e1:T)
  = αi(t) βi(t) / P(e1:T)
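Putting slides 13-15 and 19 together, here is a sketch of one pseudo-count (E) step followed by the parameter-update (M) step, assuming the hmm object and the forward_alphas / backward_betas helpers sketched above; the dictionary count tables mirror count(i, j), count(i), and count(i, et) from slide 14.

```python
def baum_welch_step(hmm, observations):
    """One round of pseudo-count collection and re-estimation (slides 12-15, 19)."""
    alpha = forward_alphas(hmm, observations)
    beta = backward_betas(hmm, observations)
    T = len(observations)
    likelihood = sum(alpha[T - 1][s] for s in hmm.states)   # P(e_{1:T})

    count_ij = {(i, j): 0.0 for i in hmm.states for j in hmm.states}
    count_i = {i: 0.0 for i in hmm.states}
    count_ie = {(i, e): 0.0 for i in hmm.states for e in hmm.symbols}

    for t in range(T):
        for i in hmm.states:
            # P(X_t = s_i | e_{1:T}) = alpha_i(t) beta_i(t) / P(e_{1:T})           (slide 19)
            gamma = alpha[t][i] * beta[t][i] / likelihood
            count_i[i] += gamma
            count_ie[(i, observations[t])] += gamma
            if t + 1 < T:
                for j in hmm.states:
                    # P(X_t = s_i, X_{t+1} = s_j | e_{1:T})
                    #   = alpha_i(t) a_ij b_j(e_{t+1}) beta_j(t+1) / P(e_{1:T})    (slide 15)
                    xi = (alpha[t][i] * hmm.A[i][j] * hmm.B[j][observations[t + 1]]
                          * beta[t + 1][j]) / likelihood
                    count_ij[(i, j)] += xi

    # M-step (slide 14): a_ij = count(i, j) / count(i),  b_j(e) = count(j, e) / count(j)
    A = {i: {j: count_ij[(i, j)] / count_i[i] for j in hmm.states} for i in hmm.states}
    B = {j: {e: count_ie[(j, e)] / count_i[j] for e in hmm.symbols} for j in hmm.states}
    return HMM(hmm.states, hmm.symbols, hmm.pi, A, B)
```

Repeating baum_welch_step and recomputing P(e1:T) until it stops increasing gives the iteration described on slide 12; the initial distribution P is left unchanged here because slide 14 does not re-estimate it.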

  20. Speech Recognition

  21. Phones

  22. Speech Signal • Waveform • Spectrogram

  23. Feature Extraction
  • [Figure: the speech signal is split into frames; Frame 1 and Frame 2 are converted into feature vectors X1 and X2.]

  24. Speech System Architecture
  • Speech input → acoustic analysis → feature vectors x1 … xT.
  • Global search: over word sequences w1 … wk, maximize P(x1 … xT | w1 … wk) · P(w1 … wk).
  • The acoustic model P(x1 … xT | w1 … wk) is supplied by the phoneme inventory and the pronunciation lexicon.
  • The language model supplies P(w1 … wk).
  • Output: the recognized word sequence.
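The global search can be read as a Bayes decision rule: pick the word sequence maximizing P(x1 … xT | w1 … wk) · P(w1 … wk). The sketch below works in log space; acoustic_logprob and lm_logprob are placeholder callables standing in for the acoustic model and the language model, not names from the slides.

```python
def recognize(feature_vectors, candidate_word_sequences, acoustic_logprob, lm_logprob):
    """Global search sketch: choose the word sequence w_1 .. w_k maximizing
    log P(x_1 .. x_T | w_1 .. w_k) + log P(w_1 .. w_k)."""
    return max(candidate_word_sequences,
               key=lambda words: acoustic_logprob(feature_vectors, words) + lm_logprob(words))
```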

  25. HMM for Speech Recognition
  • [Figure: a left-to-right word model with states start0, n1, iy2, d3, end4; transitions a01, a12, a23, a34, self-loops a11, a22, a33, and a skip transition a24; the states emit the observation sequence o1 o2 o3 o4 o5 o6 with emission probabilities b(o1) … b(o6).]

  26. Language Modeling
  • Goal: determine which sequence of words is more likely:
  • I went to a party
  • Eye went two a bar tea
  • Rudolph the red nose reindeer.
  • Rudolph the Red knows rain, dear.
  • Rudolph the Red Nose reigned here.
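One common way to make "more likely" concrete is an n-gram language model. The bigram scorer below is a generic sketch, not something specified on the slide; unigram and bigram are assumed probability tables estimated from a text corpus.

```python
import math

def bigram_logprob(words, unigram, bigram, floor=1e-8):
    """Score a word sequence as log P(w_1) + sum_i log P(w_i | w_{i-1});
    unseen words or word pairs fall back to a small floor probability."""
    score = math.log(unigram.get(words[0], floor))
    for prev, cur in zip(words, words[1:]):
        score += math.log(bigram.get((prev, cur), floor))
    return score
```

With tables estimated from ordinary text, "I went to a party" scores far higher than "Eye went two a bar tea", because its words and word pairs are much more frequent.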

  27. Summary
  • HMM: filtering, decoding, learning.
  • Speech recognition: feature extraction from the signal, HMMs for speech recognition.
