
Markov Chains and Hidden Markov Models


Presentation Transcript


  1. Markov Chains and Hidden Markov Models Marjolijn Elsinga & Elze de Groot

  2. Andrei A. Markov • Born: 14 June 1856 in Ryazan, Russia • Died: 20 July 1922 in Petrograd, Russia • Graduate of Saint Petersburg University (1878) • Work: number theory and analysis, continued fractions, limits of integrals, approximation theory and the convergence of series

  3. Today's topics • Markov chains • Hidden Markov models - Viterbi Algorithm - Forward Algorithm - Backward Algorithm - Posterior Probabilities

  4. Markov Chains (1) • Emitting states

  5. Markov Chains (2) • Transition probabilities: a_st = P(x_i = t | x_{i-1} = s) • Probability of the sequence: P(x) = P(x_1) Π_{i=2..L} a_{x_{i-1} x_i}

  6. Key property of Markov Chains • The probability of a symbol x_i depends only on the value of the preceding symbol x_{i-1}
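
A minimal sketch of this property, using a toy first-order chain over DNA; the transition numbers below are made up for illustration, not estimated from data:

```python
# Toy first-order Markov chain over DNA (illustrative probabilities only).
trans = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    "C": {"A": 0.20, "C": 0.25, "G": 0.35, "T": 0.20},
    "G": {"A": 0.25, "C": 0.30, "G": 0.25, "T": 0.20},
    "T": {"A": 0.20, "C": 0.20, "G": 0.30, "T": 0.30},
}
begin = {s: 0.25 for s in "ACGT"}  # silent begin state, assumed uniform here

def sequence_probability(x):
    """P(x) = P(x_1) * prod_i P(x_i | x_{i-1}): each symbol depends only
    on the symbol immediately before it (the Markov property)."""
    p = begin[x[0]]
    for s, t in zip(x, x[1:]):
        p *= trans[s][t]
    return p

print(sequence_probability("CGCG"))  # 0.25 * 0.35 * 0.30 * 0.35
```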

  7. Begin and End states • Silent states

  8. Example: CpG Islands • CpG = Cytosine – phosphodiester bond – Guanine • 100 – 1000 bases long • Cytosine is modified by methylation • Methylation is suppressed in short stretches of the genome (start regions of genes) • High chance of mutation into a thymine (T)

  9. Two questions • How would we decide if a short stretch of genomic sequence comes from a CpG island or not? • How would we find, given a long piece of sequence, the CpG islands in it, if there are any?

  10. Discrimination • 48 putative CpG islands are extracted • Derive 2 models - regions labelled as CpG island (‘+’ model) - regions from the remainder (‘-’ model) • Transition probabilities are set to a+_st = c+_st / Σ_t' c+_st', where c+_st is the number of times letter t follows letter s (see the sketch below)
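
A minimal sketch of that counting procedure, with hypothetical training sequences standing in for the 48 labelled regions:

```python
from collections import Counter

def estimate_transitions(sequences, alphabet="ACGT"):
    """Maximum-likelihood estimate a_st = c_st / sum_t' c_st', where
    c_st counts how often letter t follows letter s."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))  # count adjacent pairs (s, t)
    totals = {s: sum(counts[(s, t)] for t in alphabet) for s in alphabet}
    return {s: {t: counts[(s, t)] / totals[s] if totals[s] else 0.0
                for t in alphabet} for s in alphabet}

# Hypothetical '+' training data; each row of the result sums to 1.
plus = estimate_transitions(["CGACGTCG", "GCGCGCAT"])
print(plus["C"])
```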

  11. Maximum Likelihood Estimators • Each row sums to 1 • Tables are asymmetric

  12. Log-odds ratio • S(x) = log [ P(x | model +) / P(x | model -) ] = Σ_{i=1..L} log ( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} )
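
A sketch of scoring with this ratio; the ‘+’ and ‘-’ tables below are toy numbers over {C, G} only, so the point is the formula, not the values:

```python
import math

# Toy transition tables over {C, G} only (illustrative values).
plus  = {"C": {"C": 0.35, "G": 0.65}, "G": {"C": 0.60, "G": 0.40}}
minus = {"C": {"C": 0.60, "G": 0.40}, "G": {"C": 0.55, "G": 0.45}}

def log_odds(x):
    """S(x) = sum_i log(a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i});
    positive scores favour the '+' (CpG island) model."""
    return sum(math.log(plus[s][t] / minus[s][t]) for s, t in zip(x, x[1:]))

print(log_odds("CGCG"))  # > 0, i.e. more CpG-island-like under these tables
```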

  13. Discrimination shown

  14. Simulation: ‘+’ model

  15. Simulation: ‘-’ model

  16. Today's topics • Markov chains • Hidden Markov models - Viterbi Algorithm - Forward Algorithm - Backward Algorithm - Posterior Probabilities

  17. Hidden Markov Models (HMM) (1) • No one-to-one correspondence between states and symbols • No longer possible to say what state the model is in when observing x_i • Transition probability from state k to l: a_kl = P(π_i = l | π_{i-1} = k) • π_i is the i-th state in the path (state sequence)

  18. Hidden Markov Models (HMM) (2) • Begin state: a_0k • End state: a_k0 • In the CpG islands example: states A+, C+, G+, T+ and A-, C-, G-, T-, all emitting the plain symbols A, C, G, T

  19. Hidden Markov Models (HMM) (3) • We need a new set of parameters because we decoupled symbols from states • Probability that symbol b is seen when in state k: e_k(b) = P(x_i = b | π_i = k)

  20. Example: dishonest casino (1) • Fair die and loaded die • Loaded die: probability 0.5 of a 6 and probability 0.1 for 1-5 • Switch from fair to loaded: probability 0.05 • Switch back: probability 0.1

  21. Dishonest casino (2) • Emission probabilities for the fair and the loaded die • An HMM is a model that generates or emits sequences

  22. Dishonest casino (3) • Hidden: you don’t know if the die is fair or loaded • Joint probability of observed sequence x and state sequence π: P(x, π) = a_{0 π_1} Π_{i=1..L} e_{π_i}(x_i) a_{π_i π_{i+1}} (see the sketch below)
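
A sketch of the casino HMM and this joint probability; the uniform start distribution a_0k = 0.5 is an assumption, and the end state is omitted for brevity:

```python
# Dishonest-casino HMM from the slides; start distribution assumed uniform.
start = {"F": 0.5, "L": 0.5}               # F = fair die, L = loaded die
trans = {"F": {"F": 0.95, "L": 0.05},      # switch fair -> loaded: 0.05
         "L": {"F": 0.10, "L": 0.90}}      # switch loaded -> fair: 0.10
emit  = {"F": {r: 1 / 6 for r in "123456"},
         "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

def joint_probability(x, path):
    """P(x, pi) = a_{0 pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i pi_{i+1}}."""
    p = start[path[0]]
    for i, (state, symbol) in enumerate(zip(path, x)):
        p *= emit[state][symbol]            # emission at position i
        if i + 1 < len(path):
            p *= trans[state][path[i + 1]]  # transition to the next state
    return p

print(joint_probability("266", "FLL"))  # 0.5 * 1/6 * 0.05 * 0.5 * 0.9 * 0.5
```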

  23. Three algorithms • What is the most probable path for generating a given sequence? Viterbi Algorithm • How likely is a given sequence? Forward Algorithm • How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm

  24. Viterbi Algorithm • CGCG can be generated in different ways, and with different probabilities • Choose the path with the highest probability • The most probable path can be found recursively

  25. Viterbi Algorithm (2) • v_k(i) = probability of the most probable path ending in state k with observation x_i

  26. Viterbi Algorithm (3) • Initialisation: v_0(0) = 1 • Recursion: v_l(i+1) = e_l(x_{i+1}) · max_k ( v_k(i) a_kl ) • Termination: P(x, π*) = max_k ( v_k(L) a_k0 ) (see the log-space sketch below)
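
A log-space sketch of this recursion for the casino model (parameters repeated so the block runs on its own; the end-state term is omitted):

```python
import math

start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit  = {"F": {r: 1 / 6 for r in "123456"},
         "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

def viterbi(x):
    """v_l(i+1) = e_l(x_{i+1}) + max_k (v_k(i) + a_kl), all in log space,
    followed by a traceback through the stored pointers."""
    v = {k: math.log(start[k]) + math.log(emit[k][x[0]]) for k in trans}
    pointers = []
    for symbol in x[1:]:
        v_new, ptr = {}, {}
        for l in trans:
            best = max(trans, key=lambda k: v[k] + math.log(trans[k][l]))
            v_new[l] = v[best] + math.log(trans[best][l]) + math.log(emit[l][symbol])
            ptr[l] = best
        v = v_new
        pointers.append(ptr)
    state = max(v, key=v.get)          # best final state
    path = [state]
    for ptr in reversed(pointers):     # trace back to the start
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("26666662"))  # the run of sixes should decode as loaded ('L')
```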

  27. Viterbi Algorithm • Most probable path for CGCG

  28. Viterbi Algorithm • Result with the casino example

  29. Three algorithms • What is the most probable path for generating a given sequence? Viterbi Algorithm • How likely is a given sequence? Forward Algorithm • How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm

  30. Forward Algorithm (1) • Probability over all possible paths • The number of possible paths increases exponentially with the length of the sequence • The forward algorithm enables us to compute this efficiently

  31. Forward Algorithm (2) • Replace the maximisation steps of the Viterbi algorithm with sums • Probability of the observed sequence up to and including x_i, requiring π_i = k: f_k(i) = P(x_1 … x_i, π_i = k)

  32. Forward Algorithm (3) • Initialisation: f_0(0) = 1 • Recursion: f_l(i+1) = e_l(x_{i+1}) · Σ_k f_k(i) a_kl • Termination: P(x) = Σ_k f_k(L) a_k0 (see the sketch below)
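
A sketch of the forward recursion for the same casino model, again ignoring an explicit end state:

```python
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit  = {"F": {r: 1 / 6 for r in "123456"},
         "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

def forward(x):
    """f[i][k] = P(x_1..x_i, pi_i = k); the Viterbi max over the
    previous states becomes a sum. Returns (f, P(x))."""
    f = [{k: start[k] * emit[k][x[0]] for k in trans}]
    for symbol in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][symbol] * sum(prev[k] * trans[k][l] for k in trans)
                  for l in trans})
    return f, sum(f[-1].values())

f, px = forward("266")
print(px)  # likelihood of the sequence, summed over all state paths
```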

  33. Three algorithms • What is the most probable path for generating a given sequence? Viterbi Algorithm • How likely is a given sequence? Forward Algorithm • How can we learn the HMM parameters given a set of sequences? Forward-Backward (Baum-Welch) Algorithm

  34. Backward Algorithm (1) • Probability of the observed sequence from x_{i+1} to the end of the sequence, requiring π_i = k: b_k(i) = P(x_{i+1} … x_L | π_i = k)

  35. Disadvantage of the algorithms • Multiplying many probabilities gives very small numbers, which can lead to underflow errors on the computer • This can be solved by running the algorithms in log space, calculating log(v_l(i)) (a log-sum-exp sketch follows below)
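
Sums of probabilities (as in the forward and backward recursions) need one extra trick in log space; a common sketch is log-sum-exp:

```python
import math

def logsumexp(log_values):
    """Stable log(sum_i exp(v_i)): factor out the maximum so the
    exponentials cannot all underflow to zero at once."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values))

print(logsumexp([math.log(1e-300), math.log(2e-300)]))  # log(3e-300)
```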

  36. Backward Algorithm (2) • Initialisation: b_k(L) = a_k0 for all k • Recursion: b_k(i) = Σ_l a_kl e_l(x_{i+1}) b_l(i+1) • Termination: P(x) = Σ_l a_0l e_l(x_1) b_l(1) (see the sketch below)
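
A sketch of the backward recursion for the casino model, with b_k(L) initialised to 1 because the end state is omitted:

```python
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit  = {"F": {r: 1 / 6 for r in "123456"},
         "L": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

def backward(x):
    """b[i][k] = P(x_{i+1}..x_L | pi_i = k), filled from right to left:
    b_k(i) = sum_l a_kl * e_l(x_{i+1}) * b_l(i+1)."""
    b = [{k: 1.0 for k in trans}]  # b_k(L) = 1 (no explicit end state)
    for symbol in reversed(x[1:]):
        b.insert(0, {k: sum(trans[k][l] * emit[l][symbol] * b[0][l]
                            for l in trans) for k in trans})
    return b

b = backward("266")
print(sum(start[k] * emit[k]["2"] * b[0][k] for k in trans))  # same P(x)
```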

  37. Posterior State Probability (1) • Probability that observation x_i came from state k, given the observed sequence • Posterior probability of state k at time i when the emitted sequence is known: P(π_i = k | x)

  38. Posterior State Probability (2) • First calculate the probability of producing the entire observed sequence with the i-th symbol being produced by state k • P(x, π_i = k) = f_k(i) · b_k(i)

  39. Posterior State Probability (3) • The posterior probabilities are then: P(π_i = k | x) = f_k(i) b_k(i) / P(x) • P(x) is the result of the forward or backward calculation (see the sketch below)
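
Combining the two gives a posterior-decoding sketch; this reuses the forward and backward functions from the sketches above rather than repeating them:

```python
def posteriors(x):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x) for every position i."""
    f, px = forward(x)
    b = backward(x)
    return [{k: f[i][k] * b[i][k] / px for k in f[i]} for i in range(len(x))]

for i, row in enumerate(posteriors("26666662"), start=1):
    print(i, {k: round(p, 3) for k, p in row.items()})  # each row sums to 1
```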

  40. Posterior Probabilities (4) • For the casino example

  41. Two questions • How would we decide if a short stretch of genomic sequence comes from a CpG island or not? • How would we find, given a long piece of sequence, the CpG islands in it, if there are any?

  42. Prediction of CpG islands • First way: Viterbi Algorithm - Find the most probable path through the model - When this path goes through the ‘+’ states, a CpG island is predicted

  43. Prediction of CpG islands • Second way: Posterior Decoding - function: G(i | x) = Σ_k P(π_i = k | x) g(k) - g(k) = 1 for k ∈ {A+, C+, G+, T+} - g(k) = 0 for k ∈ {A-, C-, G-, T-} - G(i | x) is the posterior probability, according to the model, that base i is in a CpG island (see the sketch below)
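
A sketch of that g-weighted sum, assuming `posterior` is a list of per-position dicts mapping state names to posterior probabilities (the two-position example below is hypothetical):

```python
PLUS_STATES = {"A+", "C+", "G+", "T+"}

def cpg_posterior(posterior):
    """G(i|x) = sum_k P(pi_i = k | x) * g(k): the posterior mass on the
    '+' states at each position i."""
    return [sum(p for k, p in row.items() if k in PLUS_STATES)
            for row in posterior]

print(cpg_posterior([{"C+": 0.7, "C-": 0.3}, {"G+": 0.4, "G-": 0.6}]))
```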

  44. Summary (1) • A Markov chain is a collection of states in which each state depends only on the state before it • A hidden Markov model is a model in which the state sequence is ‘hidden’

  45. Summary (2) • Most probable path: Viterbi algorithm • How likely is a given sequence?: forward algorithm • Posterior state probability: forward and backward algorithms (used to find the most probable state of an observation)
