1 / 23

Advanced Algorithms and Models for Computational Biology -- a machine learning approach

Advanced Algorithms and Models for Computational Biology -- a machine learning approach. Computational Genomics II: HMM variants and Comparative Gene Finding Eric Xing Lecture 5, February 1, 2005. Reading: Chap 3, 5 DEKM book Chap 9, DTW book . Higher-order HMMs. The Genetic Code

kaspar
Télécharger la présentation

Advanced Algorithms and Models for Computational Biology -- a machine learning approach

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Algorithms and Models for Computational Biology-- a machine learning approach Computational Genomics II: HMM variants and Comparative Gene Finding Eric Xing Lecture 5, February 1, 2005 Reading: Chap 3, 5 DEKM book Chap 9, DTW book

  2. Higher-order HMMs • The Genetic Code • 3 nucleotides make 1 amino acid • Statistical dependencies in triplets • Question: • Recognize protein-coding segments with an HMM

  3. y1,...,N = i i e e i y1 x1,...,N = A C T T G x1 A ... y2 y3 y4 yN ... x2 A x3 A x4 A xN A Higher-order HMMs • Every state of the HMM emits 1 nucleotide • Transition probabilities: Probability of a state at one position, given those of 3 previous positions (triplets): P(yi | yi-1,yi-2, yi-3) • Emission probabilities: P(xi| yi) • Algorithms extend with small modifications

  4. y1,...,N = i i e e i y1 x1,...,N = A C T T G Q1 Q2 Q3 x1 A ... Y1,Y2,Y3 Y2,Y3,Y4 Y3,Y4,Y5 y2 y3 y4 yN ... ... X1,X2,X3 X2,X3,X4 X3,X4,X5 R1 R2 R3 x2 A x3 A x4 A xN A ... Inference on Higher-order HMMs • Building 1st-order HMM on "mega" state • Use FB algorithm as usual • P(Q2|R)  P(Y2, Y3, Y4 |X)  P(Y3 |X)=SY2,Y4P(Y2, Y3, Y4 |X)

  5. Modeling the Duration of States 1-p • Length distribution of region X: E[lX] = 1/(1-p) • Geometric distribution, with mean 1/(1-p) • (homework: derive this) • This is a significant disadvantage of HMMs • Several solutions exist for modeling different length distributions X Y p q 1-q

  6. Observed Duration Time

  7. Poisson Point Process • A counting process that represents the total number of occurrences of discrete events during a temporal/spatial interval • the number of occurrences in any internal of length t isPoisson distributedwith parameter lt: • the number of occurrences in disjoint intervals are independent • the duration of the interval between two consecutive occurrences has the following distribution:

  8. Poisson point process m= lt Truncation is needed at both ends!

  9. ... y2 y3 y4 yN y1 d1 d2 d3 dN d4 ... xd2 A xd3 A xd2 A xdN A xd1 A Generalized HMM Upon entering a state: • Choose duration d, according to probability distribution • Generate d letters according to emission probs • Take a transition to next state according to transition probs Disadvantage: Increase in complexity: Time: O(D2) Space: O(D) where D = maximum duration of state

  10. Comparative Genomics

  11. A pairwise comparison between human and mouse genome

  12. Aligning One Locus

  13. Three Pairwise Alignments

  14. Example: a human/mouse ortholog

  15. Paired HMM M (+1,+1) Alignments correspond 1-to-1 with sequences of states M, I, J I (+1, 0) J (0, +1) -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

  16. Let’s score the transitions s(xi, yj) M (+1,+1) Alignments correspond 1-to-1 with sequences of states M, I, J s(xi, yj) s(xi, yj) -d -d I (+1, 0) J (0, +1) -e -e -e -e -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

  17. 1 – 2 –  M P(xi, yj) 1 – 2 –  1 – 2 –      I P(xi) J P(yj)   A Pair HMM for alignments BEGIN  1 – 2 –    I M J    END

  18. Gene Finding

  19. Generalized HMM Gene finder

  20. Generalized Pair-HMM gene finder

  21. Hierarchical state transition in pHMM

  22. Allowing for inserted exons

  23. Acknowledgments • Serafim Batzoglou: for some of the slides adapted or modified from his lecture slides at Stanford University • Lior Pachter': for some of the slides modified from his lectures at UC Berkeley

More Related