
Hidden Markov Models


Presentation Transcript


  1. Hidden Markov Models BMI/CS 576 www.biostat.wisc.edu/bmi576.html Sushmita Roy sroy@biostat.wisc.edu Oct 25th, 2012

  2. Revisiting the CpG question • How to classify a new region into CpG or not CpG? • Could divide the genome into k-basepair bins • Using the log-likelihood ratio per bin, we can ask whether the bin is more likely to have been generated by one Markov chain (CpG island) than by the other (background); see the sketch after this slide • Alternatively, we can connect the two Markov chains, which in essence gives us a hidden Markov model.
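
To make the bin-scoring idea concrete, here is a minimal Python sketch, assuming first-order Markov chains stored as nested dictionaries of transition probabilities. The dictionary layout, function names, and the decision threshold of zero are placeholder choices of mine, not the CpG parameters or code from the course.

```python
from math import log

def log_likelihood(seq, trans):
    """Log probability of seq under a first-order Markov chain.
    trans[prev][cur] is the probability of observing cur immediately after prev."""
    return sum(log(trans[prev][cur]) for prev, cur in zip(seq, seq[1:]))

def classify_bin(seq, cpg_trans, background_trans):
    """Score one k-basepair bin with the log-likelihood ratio of the two chains;
    a positive score favors the CpG-island model."""
    score = log_likelihood(seq, cpg_trans) - log_likelihood(seq, background_trans)
    return score > 0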

  3. A simple HMM • [figure: CpG-island states A+, C+, G+, T+ and background states A-, C-, G-, T-, with the two chains connected] • given, say, a T in our input sequence, which state emitted it?

  4. The hidden part of the problem • we’ll distinguish between the observed parts of a problem and the hidden parts • in the Markov models we’ve considered previously, it is clear which state accounts for each part of the observed sequence • in the model above, there are multiple states that could account for each part of the observed sequence – this is the hidden part of the problem

  5. The parameters of an HMM • as in Markov chain models, we have transition probabilities: a_kl is the probability of a transition from state k to state l • a path π is a sequence of states through the model • since we’ve decoupled states and characters, we also have emission probabilities: e_k(b) is the probability of emitting character b in state k
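
A minimal sketch of how these parameters might be stored in Python, using nested dictionaries keyed by state. The state names and all the numbers below are illustrative placeholders, not parameters taken from the lecture.

```python
# a[k][l]: probability of a transition from state k to state l
# e[k][b]: probability of emitting character b in state k
# (placeholder two-state model: one CpG-island-like state, one background state)
transitions = {
    "begin":      {"island": 0.5, "background": 0.5},
    "island":     {"island": 0.7, "background": 0.2, "end": 0.1},
    "background": {"island": 0.1, "background": 0.8, "end": 0.1},
}
emissions = {
    "island":     {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}
```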

  6. A simple HMM with emission parameters • [figure: an HMM with begin state 0, emitting states 1–4, and end state 5; each emitting state has its own emission distribution over A, C, G, T, and each edge is labeled with a transition probability] • the callouts highlight the probability of a transition from state 1 to state 3 and the probability of emitting character A in state 2

  7. Three important questions • How likely is a given sequence of observations? the Forward algorithm • What is the most probable “path” for generating a given sequence? the Viterbi algorithm • How can we learn the HMM parameters given a set of sequences? the Forward-Backward (Baum-Welch) algorithm

  8. Path notation • let π = (π_1, …, π_L) be a vector representing a path (sequence of states) through the HMM • [figure: the five-state HMM from slide 6]

  9. How likely is a given sequence of observations and states? • the probability that the path π is taken and the sequence x is generated: P(x, π) = a_{0,π_1} Π_{i=1..L} e_{π_i}(x_i) a_{π_i,π_{i+1}}, where π_{L+1} denotes the end state (assuming begin/end are the only silent states on the path)
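
A small sketch of this joint probability in Python, under the same assumption that begin and end are the only silent states. The dictionary-based parameter layout and the state names "begin"/"end" are my own convention, not something fixed by the slides.

```python
def joint_probability(x, path, a, e, begin="begin", end="end"):
    """P(x, path): probability that the HMM takes `path` (one emitting state per
    character of x) and emits x, with begin/end as the only silent states."""
    assert len(x) == len(path)
    p = a[begin][path[0]]                      # enter the first emitting state
    for i, (state, symbol) in enumerate(zip(path, x)):
        p *= e[state][symbol]                  # emit the i-th character
        nxt = path[i + 1] if i + 1 < len(path) else end
        p *= a[state][nxt]                     # move to the next state (or to end)
    return p
```

For example, with the placeholder parameters sketched earlier, joint_probability("CG", ["island", "island"], transitions, emissions) multiplies a_begin,island · e_island(C) · a_island,island · e_island(G) · a_island,end.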

  10. How likely is a given sequence? • [figure: the five-state HMM from slide 6]

  11. How likely is a given sequence of observations? • the probability over all paths is: P(x) = Σ_π P(x, π)

  12. Number of paths • [figure: a small HMM with begin state 0, two emitting states 1 and 2, and end state 3] • for a sequence of length L, how many possible paths through this HMM are there? 2^L • the Forward algorithm enables us to compute P(x) efficiently

  13. How likely is a given sequence: the Forward algorithm • define f_k(i) to be the probability of being in state k having observed the first i characters of x • we want to compute f_N(L), the probability of being in the end state (state N) having observed all of x • can define this recursively

  14. The Forward algorithm • [figure: the five-state HMM from slide 6] • because of the Markov property, we don’t have to explicitly enumerate every path – use dynamic programming instead • e.g. compute f_k(i) from the values f_j(i-1) for the states j that have transitions into k

  15. The Forward algorithm • initialization: f_0(0) = 1, the probability that we’re in the start state and have observed 0 characters from the sequence (and f_k(0) = 0 for all other states k)

  16. The Forward algorithm • recursion for emitting states (i = 1…L): f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl • recursion for silent states: f_l(i) = Σ_k f_k(i) a_kl

  17. The Forward algorithm • termination: P(x) = f_N(L) = Σ_k f_k(L) a_kN, the probability that we’re in the end state (state N) and have observed the entire sequence
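
Putting the initialization, recursion, and termination together, here is a minimal Python sketch of the Forward algorithm for an HMM in which begin and end are the only silent states; the dictionary-based parameters and the function name are my own conventions.

```python
def forward(x, a, e, emitting, begin="begin", end="end"):
    """Return P(x) and the forward table f, where f[k][i] is the probability of
    being in state k having observed the first i characters of x.
    a[k][l]: transition probabilities; e[k][b]: emission probabilities."""
    L = len(x)
    f = {k: [0.0] * (L + 1) for k in [begin] + list(emitting)}
    f[begin][0] = 1.0                          # initialization: begin state, 0 characters seen
    for i in range(1, L + 1):                  # recursion for emitting states
        for l in emitting:
            f[l][i] = e[l][x[i - 1]] * sum(
                f[k][i - 1] * a[k].get(l, 0.0) for k in f)
    # termination: sum of transitions from emitting states into the end state
    px = sum(f[k][L] * a[k].get(end, 0.0) for k in emitting)
    return px, f
```

With the toy parameters sketched earlier, forward("TAGA", transitions, emissions, ["island", "background"]) returns the probability of TAGA summed over all paths through that placeholder model (not the model from the slides).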

  18. Forward algorithm example • given the sequence x = TAGA • [figure: the five-state HMM from slide 6]

  19. Forward algorithm example • given the sequence x = TAGA • initialization: set f_0(0) = 1 and f_k(0) = 0 for the other states • computing other values: fill in f_k(i) column by column (i = 1…4) using the recursion above

  20. Forward algorithm note • [figure: the topology of the five-state HMM, with begin state 0, emitting states 1–4, and end state 5] • in some cases, we can make the algorithm more efficient by taking into account the minimum number of steps that must be taken to reach a state • e.g. for this HMM, we don’t need to initialize or compute the forward values for state/position combinations that cannot occur

  21. Three important questions • How likely is a given sequence? • What is the most probable “path” for generating a given sequence? • How can we learn the HMM parameters given a set of sequences?

  22. Finding the most probable path: the Viterbi algorithm • define v_k(i) to be the probability of the most probable path accounting for the first i characters of x and ending in state k • we want to compute v_N(L), the probability of the most probable path accounting for all of the sequence and ending in the end state • can define recursively, use DP to find efficiently

  23. Finding the most probable path: the Viterbi algorithm • initialization: v_0(0) = 1, and v_k(0) = 0 for all other states k

  24. The Viterbi algorithm • recursion for emitting states (i = 1…L): v_l(i) = e_l(x_i) max_k [v_k(i-1) a_kl], keeping track of the maximizing k (a pointer to the most probable path) • recursion for silent states: v_l(i) = max_k [v_k(i) a_kl]

  25. The Viterbi algorithm • traceback: follow the pointers back starting at the state chosen in the termination step • termination: v_N(L) = max_k [v_k(L) a_kN], the probability of the most probable path
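
A compact Python sketch of Viterbi under the same dictionary-based conventions as the forward sketch above (begin/end as the only silent states; the function and variable names are mine).

```python
def viterbi(x, a, e, emitting, begin="begin", end="end"):
    """Return the probability of the most probable path for x and the path
    itself (one emitting state per character), found by dynamic programming."""
    L = len(x)
    v = {k: [0.0] * (L + 1) for k in emitting}
    ptr = {k: [None] * (L + 1) for k in emitting}
    for i in range(1, L + 1):
        for l in emitting:
            if i == 1:                         # v_0(0) = 1, so only the begin transition matters
                best_k, best = begin, a[begin].get(l, 0.0)
            else:
                best_k, best = max(((k, v[k][i - 1] * a[k].get(l, 0.0))
                                    for k in emitting), key=lambda kv: kv[1])
            v[l][i] = e[l][x[i - 1]] * best
            ptr[l][i] = best_k                 # remember where the best path came from
    # termination: best transition into the end state
    last, p = max(((k, v[k][L] * a[k].get(end, 0.0)) for k in emitting),
                  key=lambda kv: kv[1])
    path = [last]
    for i in range(L, 1, -1):                  # traceback: follow the pointers back
        path.append(ptr[path[-1]][i])
    return p, path[::-1]
```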

  26. Three important questions • How likely is a given sequence? • What is the most probable “path” for generating a given sequence? • How can we learn the HMM parameters given a set of sequences?

  27. Learning without hidden information • learning is simple if we know the correct path for each sequence in our training set • [figure: the path 0 → 2 → 2 → 4 → 4 → 5 traced through the five-state HMM for the training sequence C A G T] • estimate parameters by counting the number of times each parameter is used across the training set

  28. Learning with hidden information • if we don’t know the correct path for each sequence in our training set, consider all possible paths for the sequence • [figure: the sequence C A G T with unknown intermediate states (0 → ? → ? → ? → ? → 5) in the five-state HMM] • estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set

  29. Learning parameters • if we know the state path for each training sequence, learning the model parameters is simple • no hidden information during training • count how often each parameter is used • normalize/smooth to get probabilities • process is just like it was for Markov chain models • if we don’t know the path for each training sequence, how can we determine the counts? • key insight: estimate the counts by considering every path weighted by its probability
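
For the first case (state paths known), the counting-and-normalizing procedure might look like the following sketch; the list-of-pairs input format and the omission of pseudocounts are simplifications of my own.

```python
from collections import defaultdict

def supervised_estimates(training, begin="begin", end="end"):
    """training: list of (sequence, path) pairs, where path gives the emitting
    state for each character.  Count how often each transition and emission is
    used, then normalize (pseudocounts/smoothing omitted for brevity)."""
    t_counts = defaultdict(lambda: defaultdict(float))
    e_counts = defaultdict(lambda: defaultdict(float))
    for seq, path in training:
        full_path = [begin] + list(path) + [end]
        for k, l in zip(full_path, full_path[1:]):
            t_counts[k][l] += 1                # transition k -> l used once
        for state, symbol in zip(path, seq):
            e_counts[state][symbol] += 1       # state emitted this character once
    a = {k: {l: c / sum(d.values()) for l, c in d.items()} for k, d in t_counts.items()}
    e = {k: {b: c / sum(d.values()) for b, c in d.items()} for k, d in e_counts.items()}
    return a, e
```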

  30. Learning parameters: the Baum-Welch algorithm • a.k.a. the Forward-Backward algorithm • an Expectation Maximization (EM) algorithm • EM is a family of algorithms for learning probabilistic models in problems that involve hidden information • in this context, the hidden information is the path that best explains each training sequence

  31. Learning parameters: the Baum-Welch algorithm • algorithm sketch: • initialize parameters of model • iterate until convergence • calculate the expected number of times each transition or emission is used • adjust the parameters to maximize the likelihood of these expected values

  32. The expectation step • we need to know the probability of the ith symbol being produced by state k, given sequence x (the posterior probability of state k at position i) • we also need to know the probability of the ith symbol being produced by state k and the (i+1)th symbol being produced by state l, given sequence x • given these, we can compute our expected counts for state transitions and character emissions

  33. The expectation step • the probability of producing x with the ith symbol being emitted by state k is P(x, π_i = k) = f_k(i) · b_k(i) • the first term is f_k(i), computed by the forward algorithm • the second term is b_k(i), computed by the backward algorithm

  34. The expectation step • we want to know the probability of generating sequence x with the ith symbol being produced by state k (for all x, i and k) • [figure: the five-state HMM from slide 6, with the sequence C A G T]

  35. The expectation step • the forward algorithm gives us f_k(i), the probability of being in state k and having observed the first i characters of x • [figure: the five-state HMM from slide 6, with the sequence C A G T]

  36. The expectation step • the backward algorithm gives us b_k(i), the probability of observing the rest of x, given that we’re in state k after i characters • [figure: the five-state HMM from slide 6, with the sequence C A G T]

  37. The expectation step • putting forward and backward together, we can compute the probability of producing sequence x with the ith symbol being produced by state k • [figure: the five-state HMM from slide 6, with the sequence C A G T]

  38. The expectation step • [figure: the five-state HMM from slide 6, with the sequence C A G T]

  39. The backward algorithm • initialization: b_k(L) = a_kN for states k with a transition to the end state (state N)

  40. The backward algorithm • recursion (i = L-1 … 1): b_k(i) = Σ_l a_kl e_l(x_{i+1}) b_l(i+1), summing over the states l reachable from k

  41. The backward algorithm • termination: P(x) = Σ_l a_0l e_l(x_1) b_l(1)
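
A Python sketch of the backward pass, mirroring the forward sketch above (same dictionary-based parameter layout, same assumption that begin/end are the only silent states).

```python
def backward(x, a, e, emitting, begin="begin", end="end"):
    """Return P(x) and the backward table b, where b[k][i] is the probability of
    observing x[i+1..L] given that we are in state k after the first i characters."""
    L = len(x)
    b = {k: [0.0] * (L + 1) for k in emitting}
    for k in emitting:                         # initialization: one transition to the end state
        b[k][L] = a[k].get(end, 0.0)
    for i in range(L - 1, 0, -1):              # recursion, i = L-1 ... 1
        for k in emitting:
            b[k][i] = sum(a[k].get(l, 0.0) * e[l][x[i]] * b[l][i + 1]
                          for l in emitting)
    # termination: leave the begin state, emit x_1, continue with b_l(1)
    px = sum(a[begin].get(l, 0.0) * e[l][x[0]] * b[l][1] for l in emitting)
    return px, b
```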

  42. The expectation step • now we can calculate the probability of the ith symbol being produced by state k, given x: P(π_i = k | x) = f_k(i) b_k(i) / P(x)

  43. The expectation step • now we can calculate the expected number of times letter c is emitted by state k: n_{k,c} = Σ_j (1 / P(x^j)) Σ_{i : x^j_i = c} f^j_k(i) b^j_k(i), where the inner sum is over positions where c occurs in x^j and the outer sum is over sequences • here we’ve added the superscript j to refer to a specific sequence in the training set

  44. The expectation step • and we can calculate the expected number of times that the transition from k to l is used: n_{k→l} = Σ_j (1 / P(x^j)) Σ_i f^j_k(i) a_kl e_l(x^j_{i+1}) b^j_l(i+1) • or, if l is a silent state: n_{k→l} = Σ_j (1 / P(x^j)) Σ_i f^j_k(i) a_kl b^j_l(i)
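
The following sketch accumulates these expected counts for a single training sequence, given forward and backward tables from the sketches above; the data structures, the function name, and the decision to handle one sequence at a time are my own simplifications.

```python
from collections import defaultdict

def expected_counts(x, a, e, emitting, f, b, px, begin="begin", end="end"):
    """E-step contribution of one sequence x: expected emission counts n[k][c]
    and expected transition counts t[k][l], computed from the forward table f,
    the backward table b, and the sequence probability px = P(x)."""
    L = len(x)
    n = defaultdict(lambda: defaultdict(float))
    t = defaultdict(lambda: defaultdict(float))
    for i in range(1, L + 1):                  # expected emissions of x_i from state k
        for k in emitting:
            n[k][x[i - 1]] += f[k][i] * b[k][i] / px
    for i in range(1, L):                      # expected k -> l transitions between positions
        for k in emitting:
            for l in emitting:
                t[k][l] += f[k][i] * a[k].get(l, 0.0) * e[l][x[i]] * b[l][i + 1] / px
    for l in emitting:                         # transitions out of the silent begin state
        t[begin][l] += a[begin].get(l, 0.0) * e[l][x[0]] * b[l][1] / px
    for k in emitting:                         # transitions into the silent end state
        t[k][end] += f[k][L] * a[k].get(end, 0.0) / px
    return n, t
```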

  45. The maximization step • let n_{k,c} be the expected number of emissions of c from state k for the training set • estimate new emission parameters by: e_k(c) = n_{k,c} / Σ_{c'} n_{k,c'} • just like in the simple case • but typically we’ll do some “smoothing” (e.g. add pseudocounts)

  46. The maximization step • let n_{k→l} be the expected number of transitions from state k to state l for the training set • estimate new transition parameters by: a_kl = n_{k→l} / Σ_{l'} n_{k→l'}
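
A short sketch of the M-step given accumulated expected counts; the pseudocount scheme shown is one simple smoothing choice, not necessarily the one used in the course.

```python
def maximization_step(n, t, pseudocount=1.0):
    """Turn expected emission counts n[k][c] and expected transition counts
    t[k][l] into new emission and transition probabilities, adding a small
    pseudocount to every count before normalizing."""
    new_e = {k: {c: (cnt + pseudocount) / (sum(d.values()) + pseudocount * len(d))
                 for c, cnt in d.items()}
             for k, d in n.items()}
    new_a = {k: {l: (cnt + pseudocount) / (sum(d.values()) + pseudocount * len(d))
                 for l, cnt in d.items()}
             for k, d in t.items()}
    return new_a, new_e
```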

  47. The Baum-Welch algorithm • initialize the parameters of the HMM • iterate until convergence • initialize the expected counts n_{k,c} and n_{k→l} with pseudocounts • E-step: for each training-set sequence j = 1…n • calculate the forward values f_k(i) for sequence j • calculate the backward values b_k(i) for sequence j • add the contribution of sequence j to n_{k,c} and n_{k→l} • M-step: update the HMM parameters using n_{k,c} and n_{k→l}

  48. Baum-Welch algorithm example • given • the HMM with the parameters initialized as shown [figure: a small HMM with begin state 0, emitting states 1 and 2, and end state 3, with initial transition probabilities on the edges and emission tables over A, C, G, T for states 1 and 2] • the training sequences TAG, ACG • we’ll work through one iteration of Baum-Welch

  49. Baum-Welch example (cont) • determining the forward values for TAG • here we compute just the values that represent events with non-zero probability • in a similar way, we also compute forward values for ACG

  50. Baum-Welch example (cont) • determining the backward values for TAG • here we compute just the values that represent events with non-zero probability • in a similar way, we also compute backward values for ACG
