This document explores how pair Hidden Markov Models (HMMs) extend finite state automata (FSAs) for pairwise sequence alignment. Converting an FSA into a probabilistic model makes it possible to assess the reliability of an alignment and to explore alternative alignments. Within the HMM framework, the Viterbi algorithm finds the most probable alignment, and recasting it in log-odds form gives a recurrence that looks like standard pairwise dynamic programming. Weighting all alternative alignments probabilistically also gives a similarity score that does not depend on any single alignment.
Hidden Markov Models Pairwise Alignments
Hidden Markov Models
• Finite state automata with multiple states are a convenient description of complex dynamic programming algorithms for pairwise alignment
• They form the basis for a probabilistic model of the gapped alignment process, obtained by converting the FSA into an HMM
• Advantages:
1) the resulting probabilistic model can be used to explore the reliability of the alignment and to explore alternative alignments
2) weighting all alternative alignments probabilistically yields similarity scores that are independent of any specific alignment
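Hidden Markov Models
[Figure: pair HMM with states B (Begin), M (emits an aligned pair with probability $p_{x_i y_j}$), X (emits $x_i$ with probability $q_{x_i}$), Y (emits $y_j$ with probability $q_{y_j}$) and E (End). Transition labels: 1-2δ-τ into M from M, δ from M into X or Y, ε for staying in X or Y, 1-ε-τ from X or Y back to M, τ into E.]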
Hidden Markov Models
• Pair Hidden Markov Models generate an aligned pair of sequences
• Start in the Begin state B and cycle over the following two steps:
1) pick the next state according to the transition probability distribution leaving the current state
2) pick a symbol pair to be added to the alignment according to the emission probability distribution in the new state
• Stop when a transition into the End state E is made
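The generative cycle above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the parameter names (`p_match`, `q`, `delta`, `eps`, `tau`) are hypothetical, and the sketch assumes that Begin has the same outgoing transitions as M and that there is no direct transition between X and Y.

```python
import random

def sample_pair_alignment(p_match, q, delta, eps, tau, rng=None):
    """Generate one alignment as a list of (a, b) columns; '-' marks a gap.

    p_match maps (a, b) pairs to emission probabilities of state M,
    q maps single symbols to emission probabilities of states X and Y.
    """
    rng = rng or random.Random()
    # Assumption: Begin (B) shares the outgoing transitions of M.
    from_match = [("M", 1 - 2 * delta - tau), ("X", delta), ("Y", delta), ("E", tau)]
    transitions = {
        "B": from_match,
        "M": from_match,
        "X": [("M", 1 - eps - tau), ("X", eps), ("E", tau)],
        "Y": [("M", 1 - eps - tau), ("Y", eps), ("E", tau)],
    }

    def draw(distribution):
        # distribution is a list of (outcome, probability) pairs
        outcomes, weights = zip(*distribution)
        return rng.choices(outcomes, weights=weights, k=1)[0]

    state, columns = "B", []
    while True:
        state = draw(transitions[state])           # step 1: pick the next state
        if state == "E":                           # stop on entering End
            return columns
        if state == "M":                           # step 2: emit in the new state
            columns.append(draw(list(p_match.items())))
        elif state == "X":
            columns.append((draw(list(q.items())), "-"))
        else:
            columns.append(("-", draw(list(q.items()))))
```

Passing a seeded `random.Random` instance as `rng` makes the sampled alignment reproducible.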
Hidden Markov Models
• State M has emission probability distribution $p_{ab}$ for emitting an aligned pair a:b
• States X and Y have distributions $q_{x_i}$ and $q_{y_j}$ for emitting symbol $x_i$ from sequence x, or symbol $y_j$ from sequence y, against a gap
• The transition probability from M to an insert state X or Y is denoted δ, and the probability of staying in an insert state is denoted ε
• The probability of a transition into the End state is denoted τ
• All algorithms discussed so far carry across to pair HMMs
• The total probability of generating a particular alignment is just the product of the probabilities of each individual step
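As an illustration of the last point, the sketch below multiplies the transition and emission probabilities along a given alignment. The function and parameter names are hypothetical, and the code assumes that Begin behaves like M for outgoing transitions and that X and Y are not directly connected.

```python
def alignment_probability(columns, p_match, q, delta, eps, tau):
    """Probability of generating the alignment `columns` (list of (a, b), '-' = gap)."""
    prob = 1.0
    prev = "M"  # assumption: Begin has the same outgoing transitions as M
    for a, b in columns:
        # The column type determines the state that emitted it.
        state = "X" if b == "-" else "Y" if a == "-" else "M"
        if prev == "M":
            trans = (1 - 2 * delta - tau) if state == "M" else delta
        elif state == "M":
            trans = 1 - eps - tau          # X or Y back to M
        elif state == prev:
            trans = eps                    # staying in the same insert state
        else:
            trans = 0.0                    # assumption: no direct X <-> Y transition
        if state == "M":
            emit = p_match[(a, b)]
        else:
            emit = q[a] if state == "X" else q[b]
        prob *= trans * emit
        prev = state
    return prob * tau                      # final transition into End


p = {("A", "A"): 0.4, ("A", "C"): 0.1, ("C", "A"): 0.1, ("C", "C"): 0.4}
q = {"A": 0.5, "C": 0.5}
print(alignment_probability([("A", "A"), ("C", "-")], p, q, 0.2, 0.3, 0.1))
```

With these toy parameters the alignment A:A, C:- has probability (0.5 · 0.4) · (0.2 · 0.5) · 0.1, up to floating-point rounding.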
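Hidden Markov Models Viterbi Algorithm for pair HMMs
• Initialisation: $v^M(0,0) = 1$, $v^X(0,0) = v^Y(0,0) = 0$; all $v^{\bullet}(i,-1)$ and $v^{\bullet}(-1,j)$ are set to 0
• Recurrence, for $i = 0,\dots,n$ and $j = 0,\dots,m$, except $(0,0)$:
$$v^M(i,j) = p_{x_i y_j} \max \begin{cases} (1-2\delta-\tau)\, v^M(i-1,j-1) \\ (1-\varepsilon-\tau)\, v^X(i-1,j-1) \\ (1-\varepsilon-\tau)\, v^Y(i-1,j-1) \end{cases}$$
$$v^X(i,j) = q_{x_i} \max \begin{cases} \delta\, v^M(i-1,j) \\ \varepsilon\, v^X(i-1,j) \end{cases}$$
$$v^Y(i,j) = q_{y_j} \max \begin{cases} \delta\, v^M(i,j-1) \\ \varepsilon\, v^Y(i,j-1) \end{cases}$$
• Termination: $v^E = \tau \max\big(v^M(n,m),\, v^X(n,m),\, v^Y(n,m)\big)$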
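The three steps can be sketched in Python as follows. This is a minimal version that returns only the probability of the best alignment (no traceback) and works with raw probabilities, so it will underflow on long sequences. Names are hypothetical; the sketch assumes the transition structure with parameters δ, ε and τ described earlier, with Begin treated like M.

```python
def pair_viterbi(x, y, p_match, q, delta, eps, tau):
    """Probability of the most probable alignment of x and y under the pair HMM."""
    n, m = len(x), len(y)
    # v?[i][j]: best probability of emitting x[:i] and y[:j], ending in that state.
    vM = [[0.0] * (m + 1) for _ in range(n + 1)]
    vX = [[0.0] * (m + 1) for _ in range(n + 1)]
    vY = [[0.0] * (m + 1) for _ in range(n + 1)]
    vM[0][0] = 1.0  # initialisation: Begin is treated like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:  # match state consumes one symbol from each sequence
                vM[i][j] = p_match[(x[i - 1], y[j - 1])] * max(
                    (1 - 2 * delta - tau) * vM[i - 1][j - 1],
                    (1 - eps - tau) * vX[i - 1][j - 1],
                    (1 - eps - tau) * vY[i - 1][j - 1],
                )
            if i > 0:            # X consumes a symbol of x only
                vX[i][j] = q[x[i - 1]] * max(delta * vM[i - 1][j], eps * vX[i - 1][j])
            if j > 0:            # Y consumes a symbol of y only
                vY[i][j] = q[y[j - 1]] * max(delta * vM[i][j - 1], eps * vY[i][j - 1])

    # termination: transition into End from the best final state
    return tau * max(vM[n][m], vX[n][m], vY[n][m])
```

Storing back-pointers alongside each maximum would recover the alignment itself; they are left out to keep the sketch short.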
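Hidden Markov Models probabilistic model for a random alignment
[Figure: random model with states B (Begin), X (emits $x_i$ with probability $q_{x_i}$), a silent state, Y (emits $y_j$ with probability $q_{y_j}$) and E (End). Transitions into X or Y, including the loops on X and Y, carry probability 1-η; transitions that skip or leave X or Y carry probability η.]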
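Hidden Markov Model
• The main states X and Y emit the two sequences independently
• The silent state does not emit any symbol but gathers input from the X and Begin states
• The probability of a pair of sequences according to the random model is
$$P(x,y \mid R) = \eta(1-\eta)^n \prod_{i=1}^{n} q_{x_i} \; \eta(1-\eta)^m \prod_{j=1}^{m} q_{y_j} = \eta^2 (1-\eta)^{n+m} \prod_{i=1}^{n} q_{x_i} \prod_{j=1}^{m} q_{y_j}$$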
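A small sketch of the random-model probability, assuming each emitted residue a contributes a factor (1-η)·q_a and that leaving X and leaving Y each contribute one factor η. The function name is hypothetical.

```python
from itertools import chain

def random_model_probability(x, y, q, eta):
    """P(x, y | R): both sequences emitted independently by the random model."""
    prob = eta * eta                 # one eta for ending each of the two sequences
    for a in chain(x, y):            # n + m residues in total
        prob *= (1 - eta) * q[a]     # stay/enter transition times emission
    return prob


q = {"A": 0.5, "C": 0.5}
print(random_model_probability("A", "AC", q, 0.1))
```

Dividing the probability of an alignment under the pair HMM by this value gives the odds ratio whose logarithm is used as the alignment score.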
Hidden Markov Model
• Allocate the terms in the random-model probability to the steps that make up the probability of the Viterbi alignment, so that the log-odds ratio is the sum of the individual log-odds terms
• Allocate one factor of $(1-\eta)$ and the corresponding $q_a$ factor to each residue that is emitted in a Viterbi step
• So the match transitions will be allocated $(1-\eta)^2 q_a q_b$, where a and b are the residues matched
• The insert states will be allocated $(1-\eta) q_a$, where a is the residue inserted
• As the Viterbi path must account for all residues, exactly (n+m) terms will be used, leaving only the constant factor $\eta^2$
Hidden Markov Model
• We can now compute in terms of an additive model with log-odds emission scores and log-odds transition scores
• This is the most practical way to implement pair HMMs
• Merge the emission scores with the transitions to produce scores that correspond to the standard terms used in sequence alignment by dynamic programming
• Now the log-odds version of the Viterbi alignment algorithm can be given in a form that looks like standard pairwise dynamic programming
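One way to carry out the merge is sketched below. The specific formulas are an assumption of this sketch and are not spelled out in the text: a match score s(a,b) that absorbs the M-to-M transition, a gap-open penalty d that absorbs both the M-to-insert transition and the later return to M, and a gap-extension penalty e. All function and parameter names are hypothetical.

```python
import math

def log_odds_scores(p_match, q, delta, eps, tau, eta):
    """Return (s, d, e): match scores, gap-open and gap-extension penalties."""
    stay_match = 1 - 2 * delta - tau   # M -> M transition
    back_to_match = 1 - eps - tau      # X -> M or Y -> M transition

    # Assumed form: emission log-odds plus log-odds of the M -> M transition.
    s = {
        (a, b): math.log(p / (q[a] * q[b])) + math.log(stay_match / (1 - eta) ** 2)
        for (a, b), p in p_match.items()
    }
    # Assumed form: opening a gap pays for M -> X and pre-pays the return X -> M,
    # relative to the M -> M transition already counted in s.
    d = -math.log(delta * back_to_match / ((1 - eta) * stay_match))
    # Assumed form: extending a gap costs the X -> X transition against (1 - eta).
    e = -math.log(eps / (1 - eta))
    return s, d, e
```

Because the insert states emit with the same distribution q as the random model, their emission log-odds are zero and only the transition terms remain in d and e.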
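Hidden Markov Model Optimal log-odds alignment
• Initialisation: $V^M(0,0) = -2\log\eta$, $V^X(0,0) = V^Y(0,0) = -\infty$; all $V^{\bullet}(i,-1)$ and $V^{\bullet}(-1,j)$ are set to $-\infty$
• Recursion, for $i = 0,\dots,n$ and $j = 0,\dots,m$, except $(0,0)$, where $s(a,b)$ is the merged log-odds match score, $d$ the gap-open penalty and $e$ the gap-extension penalty:
$$V^M(i,j) = s(x_i,y_j) + \max \begin{cases} V^M(i-1,j-1) \\ V^X(i-1,j-1) \\ V^Y(i-1,j-1) \end{cases}$$
$$V^X(i,j) = \max \begin{cases} V^M(i-1,j) - d \\ V^X(i-1,j) - e \end{cases}$$
$$V^Y(i,j) = \max \begin{cases} V^M(i,j-1) - d \\ V^Y(i,j-1) - e \end{cases}$$
• Termination: $V = \max\big(V^M(n,m),\, V^X(n,m) + c,\, V^Y(n,m) + c\big)$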
Hidden Markov Model
• The constant c in the termination has the value $c = \log(1-2\delta-\tau) - \log(1-\varepsilon-\tau)$
• The procedure shows how for any pair HMM we can derive an equivalent finite state automaton for obtaining the most probable alignment
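The log-odds form of the algorithm can be sketched as an ordinary three-matrix dynamic program. This is a minimal version under stated assumptions: the match scores `s`, gap-open penalty `d`, gap-extension penalty `e` and termination constant `c` are passed in precomputed, the initial value is assumed to be -2·log η, and no traceback is kept. All names are hypothetical.

```python
import math

def log_odds_viterbi(x, y, s, d, e, c, eta):
    """Best log-odds alignment score of x and y (score only, no traceback)."""
    n, m = len(x), len(y)
    neg_inf = -math.inf
    VM = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    VX = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    VY = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    VM[0][0] = -2 * math.log(eta)  # assumed initialisation: accounts for eta^2

    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:  # match: add the merged score s
                VM[i][j] = s[(x[i - 1], y[j - 1])] + max(
                    VM[i - 1][j - 1], VX[i - 1][j - 1], VY[i - 1][j - 1]
                )
            if i > 0:            # gap in y: open from M or extend in X
                VX[i][j] = max(VM[i - 1][j] - d, VX[i - 1][j] - e)
            if j > 0:            # gap in x: open from M or extend in Y
                VY[i][j] = max(VM[i][j - 1] - d, VY[i][j - 1] - e)

    # Paths ending in an insert state are corrected by the constant c.
    return max(VM[n][m], VX[n][m] + c, VY[n][m] + c)


# Toy scores chosen for readability, not derived from a real model.
toy_s = {(a, b): (2.0 if a == b else -1.0) for a in "AC" for b in "AC"}
print(log_odds_viterbi("AC", "A", toy_s, d=3.0, e=1.0, c=0.5, eta=0.5))
```

Apart from the initial value and the constant c, this is the same recursion as affine-gap global alignment, which is the sense in which the pair HMM yields an equivalent finite state automaton.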