Explore hidden Markov models in sequence modeling, including Markov Chains, CpG Islands, Viterbi Algorithm, Forward and Backward Algorithms, and Expectation Maximization.
Sequence Models
• So far we examined several probabilistic sequence models
• These models, however, assumed that positions are independent
• This means that the order of elements in the sequence did not play a role
• In this class we learn about probabilistic models of sequences
Probability of Sequences
• Fix an alphabet Σ
• Let X1,…,Xn be a sequence of random variables over Σ
• We want to model P(X1,…,Xn)
Markov Chains
Assumption:
• Xi+1 is independent of the past once we know Xi
This allows us to write:
• P(X1,…,Xn) = P(X1) · P(X2|X1) · … · P(Xn|Xn-1)
Markov Chains (cont.)
Assumption:
• P(Xi+1|Xi) is the same for all i
Notation: P(Xi+1=b | Xi=a) = Aab
• By specifying the matrix A and the initial probabilities, we define P(X1,…,Xn)
• To avoid the special case of P(X1), we can use a special start state s and denote P(X1=a) = Asa
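As a concrete illustration, here is a minimal sketch (not part of the original slides) of scoring a DNA sequence under a first-order Markov chain; the alphabet encoding, the markov_log_prob helper, and the uniform parameters are illustrative assumptions.

```python
import numpy as np

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def markov_log_prob(seq, init, trans):
    """log P(x1,...,xn) = log P(x1) + sum_i log A[x_i, x_{i+1}]."""
    x = [IDX[c] for c in seq]
    logp = np.log(init[x[0]])
    for a, b in zip(x, x[1:]):
        logp += np.log(trans[a, b])
    return logp

# Illustrative (made-up) parameters: each row of `trans` sums to 1.
init = np.full(4, 0.25)
trans = np.full((4, 4), 0.25)
print(markov_log_prob("CGCGAATT", init, trans))
```

Working in log space avoids numerical underflow when n is large.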
Example: CpG islands
• In the human genome, CpG dinucleotides are relatively rare
• CpG pairs undergo a process called methylation that modifies the C nucleotide
• A methylated C can (with relatively high chance) mutate to a T
• Promoter regions are CpG rich
• These regions are not methylated, and thus mutate less often
• These are called CpG islands
CpG Islands
• We construct Markov chains for CpG-rich ("+") and CpG-poor ("−") regions
• Using maximum likelihood estimates from 60K nucleotides, we get two models
Ratio Test for CpG islands
• Given a sequence X1,…,Xn we compute the log-likelihood ratio
• S(X1,…,Xn) = log [ P(X1,…,Xn | + model) / P(X1,…,Xn | − model) ] = Σi log ( A+[Xi-1, Xi] / A−[Xi-1, Xi] )
• S > 0 suggests the sequence comes from a CpG island
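A hedged sketch of this log-odds score, assuming the two 4×4 transition matrices have already been estimated from "+" (CpG-rich) and "−" (CpG-poor) training regions; the function name and interface are illustrative.

```python
import numpy as np

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}

def log_odds(seq, trans_plus, trans_minus):
    """Sum_i log( A+[x_{i-1}, x_i] / A-[x_{i-1}, x_i] ); a positive score favors the '+' model."""
    x = [IDX[c] for c in seq]
    return sum(np.log(trans_plus[a, b] / trans_minus[a, b])
               for a, b in zip(x, x[1:]))
```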
Finding CpG islands
Simple-minded approach:
• Pick a window of size N (N = 100, for example)
• Compute the log-ratio for the sequence in the window, and classify based on that
Problems:
• How do we select N?
• What do we do when the window intersects the boundary of a CpG island?
Alternative Approach
• Build a model that includes "+" states and "−" states
• A state "remembers" the last nucleotide and the type of region
• A transition from a "−" state to a "+" state marks the start of a CpG island
Hidden Markov Models
Two components:
• A Markov chain of hidden states H1,…,Hn taking L possible values
• P(Hi+1=l | Hi=k) = Akl
• Observations X1,…,Xn
• Assumption: Xi depends only on the hidden state Hi
• P(Xi=a | Hi=k) = Bka
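To fix notation for the algorithms that follow, here is a minimal (hypothetical) container for the two components; the class name, attribute names, and the toy two-state parameters are assumptions, not part of the slides.

```python
import numpy as np

class HMM:
    def __init__(self, init, trans, emit):
        self.init = np.asarray(init)    # P(H1 = k), standing in for the start-state row
        self.trans = np.asarray(trans)  # A[k, l] = P(H_{i+1}=l | H_i=k)
        self.emit = np.asarray(emit)    # B[k, a] = P(X_i=a | H_i=k)

# Toy two-state example (e.g. "+"/"-" regions emitting A, C, G, T):
hmm = HMM(init=[0.5, 0.5],
          trans=[[0.9, 0.1], [0.1, 0.9]],
          emit=[[0.15, 0.35, 0.35, 0.15],   # "+" state: C/G rich
                [0.25, 0.25, 0.25, 0.25]])  # "-" state: uniform
```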
Computing the Most Probable Sequence
Given: x1,…,xn
Output: h*1,…,h*n such that
• (h*1,…,h*n) = argmax over h1,…,hn of P(h1,…,hn | x1,…,xn) = argmax over h1,…,hn of P(x1,…,xn, h1,…,hn)
Idea:
• If we know the value of hi, then the most probable sequence on positions i+1,…,n does not depend on observations before time i
• Let Vi(l) be the probability of the best sequence h1,…,hi with hi = l that accounts for x1,…,xi
Viterbi Algorithm
• Set V0(0) = 1, V0(l) = 0 for l > 0
• for i = 1,…,n
  • for l = 1,…,L
    • set Vi(l) = Blxi · maxk [ Vi-1(k) · Akl ]  and  Ptri(l) = argmaxk [ Vi-1(k) · Akl ]
• Let h*n = argmaxl Vn(l)
• for i = n-1,…,1
  • set h*i = Ptri+1(h*i+1)
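A sketch of this recursion in log space, reusing the hypothetical hmm container above; obs is a sequence of symbol indices, and 0-based indices replace the 1,…,L labels of the slides.

```python
import numpy as np

def viterbi(hmm, obs):
    """Return the most probable hidden path for an observation index sequence."""
    L, n = len(hmm.init), len(obs)
    V = np.full((n, L), -np.inf)      # V[i, l] = best log probability ending in state l
    ptr = np.zeros((n, L), dtype=int) # back-pointers
    V[0] = np.log(hmm.init) + np.log(hmm.emit[:, obs[0]])
    for i in range(1, n):
        for l in range(L):
            scores = V[i - 1] + np.log(hmm.trans[:, l])
            ptr[i, l] = np.argmax(scores)
            V[i, l] = np.log(hmm.emit[l, obs[i]]) + scores[ptr[i, l]]
    # Trace back from the best final state.
    path = [int(np.argmax(V[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1]
```

Log probabilities are used here only to avoid underflow on long sequences; the recursion is otherwise the one on the slide.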
Computing Probabilities
Given: x1,…,xn
Output: P(x1,…,xn)
• How do we sum over an exponential number of hidden sequences?
Forward Algorithm
• Perform dynamic programming on the sequence
• Let fi(l) = P(x1,…,xi, Hi=l)
• Recursion rule: fi+1(l) = Blxi+1 · Σk fi(k) · Akl   (with f1(l) = A0l · Blx1, using the start state 0)
• Conclusion: P(x1,…,xn) = Σl fn(l)
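A matching sketch of the forward pass (same hypothetical hmm object, 0-based indexing):

```python
import numpy as np

def forward(hmm, obs):
    """f[i, l] = P(x1..x_{i+1}, H_{i+1}=l); also returns P(x1,...,xn)."""
    L, n = len(hmm.init), len(obs)
    f = np.zeros((n, L))
    f[0] = hmm.init * hmm.emit[:, obs[0]]
    for i in range(1, n):
        f[i] = hmm.emit[:, obs[i]] * (f[i - 1] @ hmm.trans)
    return f, f[-1].sum()
```

For long sequences these messages underflow; in practice one rescales each fi or works in log space.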
Backward Algorithm
• Perform dynamic programming on the sequence
• Let bi(l) = P(xi+1,…,xn | Hi=l)
• Recursion rule: bi(k) = Σl Akl · Blxi+1 · bi+1(l)   (with bn(l) = 1)
• Conclusion: P(x1,…,xn) = Σl A0l · Blx1 · b1(l)
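And a sketch of the backward pass under the same assumptions:

```python
import numpy as np

def backward(hmm, obs):
    """b[i, k] = P(x_{i+2}..x_n | H_{i+1}=k) in 0-based indexing."""
    L, n = len(hmm.init), len(obs)
    b = np.zeros((n, L))
    b[-1] = 1.0
    for i in range(n - 2, -1, -1):
        # b[i, k] = sum_l A[k, l] * B[l, x_{i+1}] * b[i+1, l]
        b[i] = hmm.trans @ (hmm.emit[:, obs[i + 1]] * b[i + 1])
    return b
```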
Computing Posteriors
• How do we compute P(Hi=l | x1,…,xn)?
• Combining the two passes: P(Hi=l | x1,…,xn) = fi(l) · bi(l) / P(x1,…,xn)
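A short sketch reusing the hypothetical forward and backward helpers above:

```python
def posteriors(hmm, obs):
    """Row i, column l: P(H_{i+1}=l | x1,...,xn) in 0-based indexing."""
    f, px = forward(hmm, obs)
    b = backward(hmm, obs)
    return f * b / px
```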
Dishonest Casino (again)
• Example: compute the posterior probability of the "fair" state at each point in a long sequence of rolls; stretches where this posterior drops indicate where the loaded die was likely used
Learning
Given sequences x1,…,xn and h1,…,hn
• How do we learn Akl and Bka?
• We want to find parameters that maximize the likelihood P(x1,…,xn, h1,…,hn)
We simply count:
• Nkl – number of times hi=k & hi+1=l
• Nka – number of times hi=k & xi=a
and normalize:
• Akl = Nkl / Σl' Nkl'
• Bka = Nka / Σa' Nka'
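A sketch of this counting estimator, assuming the hidden path is given as an index sequence aligned with the observations; adding pseudocounts would guard against states that never occur.

```python
import numpy as np

def estimate_supervised(hidden, obs, L, M):
    """hidden/obs: aligned index sequences; L hidden states, M output symbols."""
    N_kl = np.zeros((L, L))
    N_ka = np.zeros((L, M))
    for k, l in zip(hidden, hidden[1:]):
        N_kl[k, l] += 1          # transition counts
    for k, a in zip(hidden, obs):
        N_ka[k, a] += 1          # emission counts
    # Normalize each row (a pseudocount could be added here to avoid zero rows).
    A = N_kl / N_kl.sum(axis=1, keepdims=True)
    B = N_ka / N_ka.sum(axis=1, keepdims=True)
    return A, B
```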
Learning
Given only the sequence x1,…,xn
• How do we learn Akl and Bka?
• We want to find parameters that maximize the likelihood P(x1,…,xn)
Problem:
• The counts are inaccessible since we do not observe hi
Expected Counts
• We can compute the expected number of times hi=k & hi+1=l:
  E[Nkl] = Σi P(Hi=k, Hi+1=l | x1,…,xn) = Σi fi(k) · Akl · Blxi+1 · bi+1(l) / P(x1,…,xn)
• Similarly
  E[Nka] = Σi:xi=a P(Hi=k | x1,…,xn)
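The same quantities in code, as a sketch built on the hypothetical forward/backward helpers above:

```python
import numpy as np

def expected_counts(hmm, obs):
    """Expected transition counts E[N_kl] and emission counts E[N_ka]."""
    f, px = forward(hmm, obs)
    b = backward(hmm, obs)
    L, M = hmm.emit.shape
    E_kl = np.zeros((L, L))
    E_ka = np.zeros((L, M))
    for i in range(len(obs) - 1):
        # P(H_i=k, H_{i+1}=l | x) = f_i(k) * A_kl * B_{l, x_{i+1}} * b_{i+1}(l) / P(x)
        E_kl += np.outer(f[i], hmm.emit[:, obs[i + 1]] * b[i + 1]) * hmm.trans / px
    post = f * b / px                 # P(H_i=k | x) for every position
    for i, a in enumerate(obs):
        E_ka[:, a] += post[i]
    return E_kl, E_ka
```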
Expectation Maximization (EM)
• Choose initial Akl and Bka
E-step:
• Compute expected counts E[Nkl], E[Nka]
M-step:
• Re-estimate: Akl = E[Nkl] / Σl' E[Nkl'],  Bka = E[Nka] / Σa' E[Nka']
• Reiterate
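Putting the pieces together, a sketch of Baum-Welch style EM iterations built on the expected_counts helper above; the stopping rule, tolerance, and the choice to keep the initial distribution fixed are illustrative assumptions.

```python
import numpy as np

def em_step(hmm, obs):
    """One E-step/M-step pass; returns the new log-likelihood."""
    E_kl, E_ka = expected_counts(hmm, obs)               # E-step
    hmm.trans = E_kl / E_kl.sum(axis=1, keepdims=True)   # M-step: re-estimate A
    hmm.emit = E_ka / E_ka.sum(axis=1, keepdims=True)    # M-step: re-estimate B
    # (The initial distribution is kept fixed here for brevity.)
    f, px = forward(hmm, obs)
    return np.log(px)

def baum_welch(hmm, obs, tol=1e-6, max_iter=100):
    prev = -np.inf
    for _ in range(max_iter):
        ll = em_step(hmm, obs)
        if ll - prev < tol:          # the likelihood never decreases
            break
        prev = ll
    return hmm
```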
EM - basic properties
• P(x1,…,xn : A'kl, B'ka) ≥ P(x1,…,xn : Akl, Bka), where A', B' are the re-estimated parameters
• The likelihood grows in each iteration
• If P(x1,…,xn : Akl, Bka) = P(x1,…,xn : A'kl, B'ka), then Akl, Bka is a stationary point of the likelihood
  • either a local maximum, a minimum, or a saddle point
Complexity of the E-step
• Compute forward and backward messages
  • Time complexity O(nL2), space complexity O(nL)
• Accumulate expected counts
  • Time complexity O(nL2)
  • Space complexity O(L2)
EM - problems
Local maxima:
• Learning can get stuck in local maxima
• Sensitive to initialization
• Requires some method for escaping such maxima
Choosing L:
• We often do not know how many hidden values the model should have, or whether they can be learned from the data