Hidden Markov Models Sasha Tkachev and Ed Anderson

Presenter: Sasha Tkachev

Forward algorithm
• We want to find P(sequence | HMM)
• Naïve way: sum the probabilities of all possible paths through the hidden states
• Recursion does this more efficiently: the probability of being in the cloudy state at t=2 depends only on the state probabilities at t=1 and the observation at t=2
• When we reach t=3 (the last observation in this example), P is simply the sum of the probabilities of being sunny, cloudy or rainy at t=3
http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/forward_algorithm/s1_pg1.html
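A minimal sketch of the recursion above, next to the naïve sum over all paths. The three weather states come from the slide; the observation symbols ("dry", "damp", "soggy") and every probability value are made up for illustration.

```python
from itertools import product

STATES = ("sunny", "cloudy", "rainy")

# All numbers below are hypothetical, chosen only so that each row sums to 1.
START = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
TRANS = {
    "sunny": {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}
EMIT = {
    "sunny": {"dry": 0.7, "damp": 0.2, "soggy": 0.1},
    "cloudy": {"dry": 0.3, "damp": 0.4, "soggy": 0.3},
    "rainy": {"dry": 0.1, "damp": 0.3, "soggy": 0.6},
}


def forward(obs):
    """P(obs | HMM) via the forward recursion (obs must be non-empty)."""
    # alpha[s] = P(observations up to t, hidden state at t is s)
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        # Each new alpha depends only on the previous alphas and on o.
        alpha = {
            s: EMIT[s][o] * sum(alpha[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    # At the last time step, sum over all states.
    return sum(alpha.values())


def brute_force(obs):
    """Naive way: add up the probability of every possible hidden path."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = START[path[0]] * EMIT[path[0]][obs[0]]
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][o]
        total += p
    return total


obs = ("dry", "damp", "soggy")
print(forward(obs), brute_force(obs))
```

Both functions return the same value. The brute-force version visits 3^T paths for T observations, while the recursion does about 3 × 3 × T multiplications.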

Pfam
• Database of protein domains and domain families
• Contains multiple sequence alignments and profile HMMs for every domain
• “Seed” and “full” alignments: the seed alignment is rather small; the full alignment contains everything and is built from the seed alignment using HMMER
http://www.sanger.ac.uk/Software/Pfam

Using Pfam
• For known proteins, get a pre-calculated domain structure
• For new sequences, get a list of matching domains
• Analyse domain structure, e.g. find a list of proteins with a similar domain structure, or a list of proteins containing domains A and B
• Species-specific analysis, e.g. find all domains unique to a certain virus
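The queries above come down to set operations once the domain assignments are in hand. The sketch below assumes they have already been retrieved into plain dictionaries. The protein, domain and species names are placeholders, and none of the functions correspond to an actual Pfam interface.

```python
def proteins_with_all(architectures, required):
    """Proteins whose domain list contains every domain in `required`."""
    required = set(required)
    return sorted(p for p, doms in architectures.items() if required <= set(doms))


def same_architecture(architectures, query):
    """Other proteins with exactly the same ordered domain structure as `query`."""
    target = tuple(architectures[query])
    return sorted(
        p for p, doms in architectures.items()
        if p != query and tuple(doms) == target
    )


def domains_unique_to(species_domains, species):
    """Domains seen in `species` and in no other species in the table."""
    others = set().union(
        *(doms for name, doms in species_domains.items() if name != species)
    )
    return set(species_domains[species]) - others


# Hypothetical data: protein -> ordered list of domains along the sequence.
architectures = {
    "p1": ["A", "B"],
    "p2": ["A", "C"],
    "p3": ["A", "B"],
    "p4": ["B", "A", "C"],
}
# Hypothetical data: species -> set of domains found in its proteome.
species_domains = {
    "virus_x": {"A", "D"},
    "host": {"A", "B", "C"},
}

print(proteins_with_all(architectures, ["A", "B"]))
print(same_architecture(architectures, "p1"))
print(domains_unique_to(species_domains, "virus_x"))
```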

Gene prediction, GENSCAN (1997)
• “Explicit state duration HMM”, generalized HMM (GHMM)
• P(Φ, S) = P(s_1 | q_1, d_1) f(d_1) T(q_2 | q_1) × P(s_2 | q_2, d_2) f(d_2) … T(q_N | q_{N-1}) × P(s_N | q_N, d_N) f(d_N)
  Φ – sequence of states {q_1 … q_N}
  S – sequence segments {s_1 … s_N}, where s_i has length d_i and is generated by state q_i
  T(q | q’) – transition probability q’ → q
  f(d) – state duration probability according to a distribution
• Individual states can themselves be an HMM, e.g. coding exon states
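To make the product above concrete, here is a small sketch that scores one parse (a list of state/segment pairs) in log space. It follows the slide's formula as written, so there is no initial-state term. The two-state model, the geometric duration distribution and the independent-base emission model are toy assumptions, not GENSCAN's actual submodels.

```python
import math


def geometric(p):
    """Toy duration distribution: f(d) = (1 - p)**(d - 1) * p for d >= 1."""
    return lambda d: (1.0 - p) ** (d - 1) * p


def iid_bases(freqs):
    """Toy emission model: each base in the segment is drawn independently."""
    return lambda seg: math.prod(freqs[b] for b in seg)


def parse_log_prob(parse, trans, duration, emission):
    """log P(parse, sequence) for a parse given as [(state, segment), ...].

    trans[a][b] is the probability of moving from state a to state b;
    the duration d of each segment is simply its length.
    """
    logp = 0.0
    prev = None
    for state, seg in parse:
        if prev is not None:
            logp += math.log(trans[prev][state])       # T(q_k | q_{k-1})
        logp += math.log(duration[state](len(seg)))    # f(d_k)
        logp += math.log(emission[state](seg))         # P(s_k | q_k, d_k)
        prev = state
    return logp


# Hypothetical two-state model that alternates exon and intron.
trans = {"exon": {"intron": 1.0}, "intron": {"exon": 1.0}}
duration = {"exon": geometric(0.5), "intron": geometric(0.5)}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
emission = {"exon": iid_bases(uniform), "intron": iid_bases(uniform)}

parse = [("exon", "ATG"), ("intron", "GT"), ("exon", "A")]
print(parse_log_prob(parse, trans, duration, emission))
```

Because each state supplies its own `duration` and `emission` callables, a state's emission model could itself be another HMM, which is the point of the last bullet.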

Modelling Internal Coding Exons
• Decide whether the evaluated sequence looks like a coding or non-coding region by looking at hexamer (a “word” 6 bp long) frequencies in exons/introns. This is done with a 5th-order Markov model, in which each base is conditioned on the preceding five
• Take into account splice signals and translational start and stop signals (all non-HMM)
• Use a modified Viterbi algorithm to get the optimal parse
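A rough sketch of the hexamer idea: estimate P(base | previous five bases) from hexamer counts in coding and non-coding training sequences, then score a candidate region by its log-odds. The training strings, the pseudocount and the function names are illustrative assumptions. The sketch also ignores reading frame, which gene finders usually model explicitly for coding regions.

```python
import math
from collections import Counter


def train_fifth_order(seqs, pseudocount=1.0):
    """Return cond_prob(context, base) estimated from hexamer counts."""
    hexamers = Counter()
    for s in seqs:
        for i in range(len(s) - 5):
            hexamers[s[i:i + 6]] += 1

    def cond_prob(context, base):
        # P(base | 5-base context), smoothed so unseen hexamers are not zero.
        total = sum(hexamers[context + b] for b in "ACGT")
        return (hexamers[context + base] + pseudocount) / (total + 4 * pseudocount)

    return cond_prob


def log_odds(seq, coding, noncoding):
    """Sum of log(P_coding / P_noncoding) over each base after the first five."""
    score = 0.0
    for i in range(5, len(seq)):
        context, base = seq[i - 5:i], seq[i]
        score += math.log(coding(context, base) / noncoding(context, base))
    return score


# Toy training data (hypothetical, far too small for real use).
coding = train_fifth_order(["ATGGCCATGGCCATGGCC"])
noncoding = train_fifth_order(["ATATATATATATATATAT"])

print(log_odds("ATGGCCATGGCC", coding, noncoding))  # positive: looks coding
print(log_odds("ATATATATAT", coding, noncoding))    # negative: looks non-coding
```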

Comparative genomic methods
• Mouse and human genome sequences provide new data; how can we use it?
• Use a generalized pair HMM (GPHMM) for alignment and gene prediction at the same time for both genomes (SLAM)
• Or modify the GENSCAN scoring scheme with alignment scores (TWINSCAN)
• Methods that can use more than two genomes are being developed, e.g. TWINSCAN 3.0
