Explore hidden Markov models in sequence modeling, including Markov Chains, CpG Islands, Viterbi Algorithm, Forward and Backward Algorithms, and Expectation Maximization.
Sequence Models
• So far we examined several probabilistic sequence models
• These models, however, assumed that positions are independent
• This means that the order of elements in the sequence did not play a role
• In this class we learn about probabilistic models of sequences
Probability of Sequences
• Fix an alphabet Σ
• Let X1,…,Xn be a sequence of random variables over Σ
• We want to model P(X1,…,Xn)
Markov Chains
Assumption:
• Xi+1 is independent of the past once we know Xi
This allows us to write:
• P(X1,…,Xn) = P(X1) · P(X2|X1) · … · P(Xn|Xn-1)
Markov Chains (cont.)
Assumption:
• P(Xi+1|Xi) is the same for all i
Notation: P(Xi+1=b | Xi=a) = Aab
• By specifying the matrix A and the initial probabilities, we define P(X1,…,Xn)
• To avoid the special case of P(X1), we can use a special start state s and denote P(X1=a) = Asa
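As a concrete illustration, here is a minimal sketch (not part of the original slides) of scoring a DNA sequence under a first-order Markov chain; the alphabet encoding, the markov_log_prob helper, and the uniform parameters are illustrative assumptions.

```python
import numpy as np

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def markov_log_prob(seq, init, trans):
    """log P(x1,...,xn) = log P(x1) + sum_i log A[x_i, x_{i+1}]."""
    x = [IDX[c] for c in seq]
    logp = np.log(init[x[0]])
    for a, b in zip(x, x[1:]):
        logp += np.log(trans[a, b])
    return logp

# Illustrative (made-up) parameters: each row of `trans` sums to 1.
init = np.full(4, 0.25)
trans = np.full((4, 4), 0.25)
print(markov_log_prob("CGCGAATT", init, trans))
```

Working in log space avoids numerical underflow when n is large.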
Example: CpG islands
• In the human genome, CpG dinucleotides are relatively rare
• CpG pairs undergo a process called methylation that modifies the C nucleotide
• A methylated C can (with relatively high chance) mutate to a T
• Promoter regions are CpG rich
• These regions are not methylated, and thus mutate less often
• These are called CpG islands
CpG Islands
• We construct Markov chains for CpG-rich ("+") and CpG-poor ("−") regions
• Using maximum likelihood estimates from 60K nucleotides, we get two models
Ratio Test for CpG islands
• Given a sequence X1,…,Xn we compute the log-likelihood ratio
• S(X1,…,Xn) = log [ P(X1,…,Xn | + model) / P(X1,…,Xn | − model) ] = Σi log ( A+[Xi-1, Xi] / A−[Xi-1, Xi] )
• S > 0 suggests the sequence comes from a CpG island
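A hedged sketch of this log-odds score, assuming the two 4×4 transition matrices have already been estimated from "+" (CpG-rich) and "−" (CpG-poor) training regions; the function name and interface are illustrative.

```python
import numpy as np

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def log_odds(seq, trans_plus, trans_minus):
    """Sum_i log( A+[x_{i-1}, x_i] / A-[x_{i-1}, x_i] ); a positive score favors the '+' model."""
    x = [IDX[c] for c in seq]
    return sum(np.log(trans_plus[a, b] / trans_minus[a, b])
               for a, b in zip(x, x[1:]))
```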
Finding CpG islands
Simple-minded approach:
• Pick a window of size N (N = 100, for example)
• Compute the log-ratio for the sequence in the window, and classify based on that
Problems:
• How do we select N?
• What do we do when the window intersects the boundary of a CpG island?
Alternative Approach
• Build a model that includes "+" states and "−" states
• A state "remembers" the last nucleotide and the type of region
• A transition from a "−" state to a "+" state marks the start of a CpG island
Hidden Markov Models
Two components:
• A Markov chain of hidden states H1,…,Hn taking L possible values
• P(Hi+1=l | Hi=k) = Akl
• Observations X1,…,Xn
• Assumption: Xi depends only on the hidden state Hi
• P(Xi=a | Hi=k) = Bka
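To fix notation for the algorithms that follow, here is a minimal (hypothetical) container for the two components; the class name, attribute names, and the toy two-state parameters are assumptions, not part of the slides.

```python
import numpy as np

class HMM:
    def __init__(self, init, trans, emit):
        self.init = np.asarray(init)    # P(H1 = k), standing in for the start-state row
        self.trans = np.asarray(trans)  # A[k, l] = P(H_{i+1}=l | H_i=k)
        self.emit = np.asarray(emit)    # B[k, a] = P(X_i=a | H_i=k)

# Toy two-state example (e.g. "+"/"-" regions emitting A, C, G, T):
hmm = HMM(init=[0.5, 0.5],
          trans=[[0.9, 0.1], [0.1, 0.9]],
          emit=[[0.15, 0.35, 0.35, 0.15],   # "+" state: C/G rich
                [0.25, 0.25, 0.25, 0.25]])  # "-" state: uniform
```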
Computing the Most Probable Sequence
Given: x1,…,xn
Output: h*1,…,h*n such that
• (h*1,…,h*n) = argmax over h1,…,hn of P(h1,…,hn | x1,…,xn) = argmax over h1,…,hn of P(x1,…,xn, h1,…,hn)
Idea:
• If we know the value of hi, then the most probable sequence on positions i+1,…,n does not depend on observations before time i
• Let Vi(l) be the probability of the best sequence h1,…,hi with hi = l that accounts for x1,…,xi
Viterbi Algorithm
• Set V0(0) = 1, V0(l) = 0 for l > 0
• for i = 1,…,n
  • for l = 1,…,L
    • set Vi(l) = Blxi · maxk [ Vi-1(k) · Akl ]  and  Ptri(l) = argmaxk [ Vi-1(k) · Akl ]
• Let h*n = argmaxl Vn(l)
• for i = n-1,…,1
  • set h*i = Ptri+1(h*i+1)
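A sketch of this recursion in log space, reusing the hypothetical hmm container above; obs is a sequence of symbol indices, and 0-based indices replace the 1,…,L labels of the slides.

```python
import numpy as np

def viterbi(hmm, obs):
    """Return the most probable hidden path for an observation index sequence."""
    L, n = len(hmm.init), len(obs)
    V = np.full((n, L), -np.inf)      # V[i, l] = best log probability ending in state l
    ptr = np.zeros((n, L), dtype=int) # back-pointers
    V[0] = np.log(hmm.init) + np.log(hmm.emit[:, obs[0]])
    for i in range(1, n):
        for l in range(L):
            scores = V[i - 1] + np.log(hmm.trans[:, l])
            ptr[i, l] = np.argmax(scores)
            V[i, l] = np.log(hmm.emit[l, obs[i]]) + scores[ptr[i, l]]
    # Trace back from the best final state.
    path = [int(np.argmax(V[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1]
```

Log probabilities are used here only to avoid underflow on long sequences; the recursion is otherwise the one on the slide.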
Computing Probabilities
Given: x1,…,xn
Output: P(x1,…,xn)
• How do we sum over an exponential number of hidden sequences?
Forward Algorithm
• Perform dynamic programming on the sequence
• Let fi(l) = P(x1,…,xi, Hi=l)
• Recursion rule: fi+1(l) = Blxi+1 · Σk fi(k) · Akl   (with f1(l) = A0l · Blx1, using the start state 0)
• Conclusion: P(x1,…,xn) = Σl fn(l)
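A matching sketch of the forward pass (same hypothetical hmm object, 0-based indexing):

```python
import numpy as np

def forward(hmm, obs):
    """f[i, l] = P(x1..x_{i+1}, H_{i+1}=l); also returns P(x1,...,xn)."""
    L, n = len(hmm.init), len(obs)
    f = np.zeros((n, L))
    f[0] = hmm.init * hmm.emit[:, obs[0]]
    for i in range(1, n):
        f[i] = hmm.emit[:, obs[i]] * (f[i - 1] @ hmm.trans)
    return f, f[-1].sum()
```

For long sequences these messages underflow; in practice one rescales each fi or works in log space.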
Backward Algorithm
• Perform dynamic programming on the sequence
• Let bi(l) = P(xi+1,…,xn | Hi=l)
• Recursion rule: bi(k) = Σl Akl · Blxi+1 · bi+1(l)   (with bn(l) = 1)
• Conclusion: P(x1,…,xn) = Σl A0l · Blx1 · b1(l)
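And a sketch of the backward pass under the same assumptions:

```python
import numpy as np

def backward(hmm, obs):
    """b[i, k] = P(x_{i+2}..x_n | H_{i+1}=k) in 0-based indexing."""
    L, n = len(hmm.init), len(obs)
    b = np.zeros((n, L))
    b[-1] = 1.0
    for i in range(n - 2, -1, -1):
        # b[i, k] = sum_l A[k, l] * B[l, x_{i+1}] * b[i+1, l]
        b[i] = hmm.trans @ (hmm.emit[:, obs[i + 1]] * b[i + 1])
    return b
```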
Computing Posteriors
• How do we compute P(Hi=l | x1,…,xn)?
• Combining the two passes: P(Hi=l | x1,…,xn) = fi(l) · bi(l) / P(x1,…,xn)
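A short sketch reusing the hypothetical forward and backward helpers above:

```python
def posteriors(hmm, obs):
    """Row i, column l: P(H_{i+1}=l | x1,...,xn) in 0-based indexing."""
    f, px = forward(hmm, obs)
    b = backward(hmm, obs)
    return f * b / px
```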
Dishonest Casino (again)
• Example: compute the posterior probability of the "fair" state at each point in a long sequence of rolls; stretches where this posterior drops indicate where the loaded die was likely used
Learning
Given sequences x1,…,xn and h1,…,hn
• How do we learn Akl and Bka?
• We want to find parameters that maximize the likelihood P(x1,…,xn, h1,…,hn)
We simply count:
• Nkl – number of times hi=k & hi+1=l
• Nka – number of times hi=k & xi=a
and normalize:
• Akl = Nkl / Σl' Nkl'
• Bka = Nka / Σa' Nka'
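A sketch of this counting estimator, assuming the hidden path is given as an index sequence aligned with the observations; adding pseudocounts would guard against states that never occur.

```python
import numpy as np

def estimate_supervised(hidden, obs, L, M):
    """hidden/obs: aligned index sequences; L hidden states, M output symbols."""
    N_kl = np.zeros((L, L))
    N_ka = np.zeros((L, M))
    for k, l in zip(hidden, hidden[1:]):
        N_kl[k, l] += 1          # transition counts
    for k, a in zip(hidden, obs):
        N_ka[k, a] += 1          # emission counts
    # Normalize each row (a pseudocount could be added here to avoid zero rows).
    A = N_kl / N_kl.sum(axis=1, keepdims=True)
    B = N_ka / N_ka.sum(axis=1, keepdims=True)
    return A, B
```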
Learning
Given only the sequence x1,…,xn
• How do we learn Akl and Bka?
• We want to find parameters that maximize the likelihood P(x1,…,xn)
Problem:
• The counts are inaccessible since we do not observe hi
Expected Counts
• We can compute the expected number of times hi=k & hi+1=l:
  E[Nkl] = Σi P(Hi=k, Hi+1=l | x1,…,xn) = Σi fi(k) · Akl · Blxi+1 · bi+1(l) / P(x1,…,xn)
• Similarly
  E[Nka] = Σi:xi=a P(Hi=k | x1,…,xn)
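The same quantities in code, as a sketch built on the hypothetical forward/backward helpers above:

```python
import numpy as np

def expected_counts(hmm, obs):
    """Expected transition counts E[N_kl] and emission counts E[N_ka]."""
    f, px = forward(hmm, obs)
    b = backward(hmm, obs)
    L, M = hmm.emit.shape
    E_kl = np.zeros((L, L))
    E_ka = np.zeros((L, M))
    for i in range(len(obs) - 1):
        # P(H_i=k, H_{i+1}=l | x) = f_i(k) * A_kl * B_{l, x_{i+1}} * b_{i+1}(l) / P(x)
        E_kl += np.outer(f[i], hmm.emit[:, obs[i + 1]] * b[i + 1]) * hmm.trans / px
    post = f * b / px                 # P(H_i=k | x) for every position
    for i, a in enumerate(obs):
        E_ka[:, a] += post[i]
    return E_kl, E_ka
```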
Expectation Maximization (EM)
• Choose initial Akl and Bka
E-step:
• Compute expected counts E[Nkl], E[Nka]
M-step:
• Re-estimate: Akl = E[Nkl] / Σl' E[Nkl'],  Bka = E[Nka] / Σa' E[Nka']
• Reiterate
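Putting the pieces together, a sketch of Baum-Welch style EM iterations built on the expected_counts helper above; the stopping rule, tolerance, and the choice to keep the initial distribution fixed are illustrative assumptions.

```python
import numpy as np

def em_step(hmm, obs):
    """One E-step/M-step pass; returns the new log-likelihood."""
    E_kl, E_ka = expected_counts(hmm, obs)               # E-step
    hmm.trans = E_kl / E_kl.sum(axis=1, keepdims=True)   # M-step: re-estimate A
    hmm.emit = E_ka / E_ka.sum(axis=1, keepdims=True)    # M-step: re-estimate B
    # (The initial distribution is kept fixed here for brevity.)
    f, px = forward(hmm, obs)
    return np.log(px)

def baum_welch(hmm, obs, tol=1e-6, max_iter=100):
    prev = -np.inf
    for _ in range(max_iter):
        ll = em_step(hmm, obs)
        if ll - prev < tol:          # the likelihood never decreases
            break
        prev = ll
    return hmm
```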
EM - basic properties
• P(x1,…,xn : A'kl, B'ka) ≥ P(x1,…,xn : Akl, Bka), where A', B' are the re-estimated parameters
• The likelihood grows in each iteration
• If P(x1,…,xn : Akl, Bka) = P(x1,…,xn : A'kl, B'ka), then Akl, Bka is a stationary point of the likelihood
  • either a local maximum, a minimum, or a saddle point
Complexity of the E-step
• Compute forward and backward messages
  • Time complexity O(nL2), space complexity O(nL)
• Accumulate expected counts
  • Time complexity O(nL2)
  • Space complexity O(L2)
EM - problems
Local maxima:
• Learning can get stuck in local maxima
• Sensitive to initialization
• Requires some method for escaping such maxima
Choosing L:
• We often do not know how many hidden values the model should have, or whether they can be learned from the data