This document presents a comprehensive overview of Hidden Markov Models (HMMs), focusing on three critical questions: how to compute the likelihood of a sequence given an HMM, determine the most likely sequence of states, and learn an HMM from observed sequences. Key algorithms such as the Forward Algorithm, Viterbi Algorithm, and Baum-Welch Algorithm (an Expectation-Maximization approach) are discussed, along with methods for parameter estimation, including handling hidden state sequences. The document emphasizes the need for expected counts in the learning process and illustrates each step in the Baum-Welch algorithm.
Learning HMM parameters. Sushmita Roy, BMI/CS 576 (www.biostat.wisc.edu/bmi576/), sroy@biostat.wisc.edu, Oct 31st, 2013
Recall the three questions in HMMs
• Given a sequence of observations, how likely is it that an HMM generated it?
  • Forward algorithm
• What is the most likely sequence of states that generated a sequence of observations?
  • Viterbi algorithm
• How can we learn an HMM from a set of sequences?
  • Forward-backward or Baum-Welch (an EM algorithm)
Learning HMMs from data
• Parameter estimation
• If we knew the state sequence, it would be easy to estimate the parameters
• But we need to work with hidden state sequences
• Use "expected" counts of state transitions
Learning without hidden information
• Learning is simple if we know the correct path for each sequence in our training set
  [Figure: an HMM with begin state 0, emitting states 1-4, and end state 5; for the sequence CAGT the known path is 0, 2, 2, 4, 4, 5]
• Estimate parameters by counting the number of times each parameter is used across the training set
Learning without hidden information
• Transition probabilities (k and l are states; c_kl is the number of transitions from k to l in the training set):
  a_kl = c_kl / Σ_l' c_kl'
• Emission probabilities (c_k(b) is the number of times b is emitted from k):
  e_k(b) = c_k(b) / Σ_b' c_k(b')
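To make the counting concrete, here is a minimal Python sketch of this fully observed case; it is not from the slides, and the function and variable names (estimate_from_known_paths, c_trans, c_emit, pseudocount) are assumptions of mine. Paths are assumed to list only the emitting states, with begin and end omitted.

from collections import defaultdict

def estimate_from_known_paths(sequences, paths, states, alphabet, pseudocount=1.0):
    """Estimate a_kl and e_k(b) by counting over labeled training data.

    sequences: list of strings over `alphabet`
    paths: one state path per sequence, same length as the sequence
    pseudocount: added to every count so no denominator is zero
    """
    c_trans = defaultdict(float)  # c_kl: number of k -> l transitions
    c_emit = defaultdict(float)   # c_k(b): number of times state k emits b
    for x, pi in zip(sequences, paths):
        for i, (b, k) in enumerate(zip(x, pi)):
            c_emit[(k, b)] += 1.0
            if i + 1 < len(pi):
                c_trans[(k, pi[i + 1])] += 1.0
    # a_kl = c_kl / sum_l' c_kl'   and   e_k(b) = c_k(b) / sum_b' c_k(b')
    a, e = {}, {}
    for k in states:
        t_total = sum(c_trans[(k, l)] + pseudocount for l in states)
        e_total = sum(c_emit[(k, b)] + pseudocount for b in alphabet)
        a[k] = {l: (c_trans[(k, l)] + pseudocount) / t_total for l in states}
        e[k] = {b: (c_emit[(k, b)] + pseudocount) / e_total for b in alphabet}
    return a, e

For the example above this could be called as estimate_from_known_paths(["CAGT"], [[2, 2, 4, 4]], [1, 2, 3, 4], "ACGT").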
Learning with hidden information
• If we don't know the correct path for each sequence in our training set, consider all possible paths for the sequence
  [Figure: the same HMM with begin state 0, emitting states 1-4, and end state 5; for the sequence CAGT the intermediate states of the path are unknown: 0, ?, ?, ?, ?, 5]
• Estimate parameters through a procedure that counts the expected number of times each parameter is used across the training set
The Baum-Welch algorithm
• Also known as the forward-backward algorithm
• An Expectation Maximization (EM) algorithm
  • Expectation: estimate the "expected" number of times each transition and emission is used (using the current values of the parameters)
  • Maximization: re-estimate the parameters given these expected counts
• The hidden variables are the state paths; the expected transition and emission counts summarize them
The expectation step
• We need to know the probability of the ith symbol being produced by state k, given the sequence x (the posterior probability of state k at position i): P(π_i = k | x)
• We also need to know the probability of the ith and (i+1)th symbols being produced by states k and l, given the sequence x: P(π_i = k, π_{i+1} = l | x)
• Given these we can compute our expected counts for state transitions and character emissions
Computing P(π_i = k | x)
• We will do this in a somewhat indirect manner
• First we compute the probability of the entire observed sequence with the ith symbol being generated by state k: P(x, π_i = k)
  • Forward algorithm: f_k(i) = P(x_1 ... x_i, π_i = k)
  • Backward algorithm: b_k(i) = P(x_{i+1} ... x_L | π_i = k)
Computing P(π_i = k | x)
• If we can compute P(x, π_i = k) = f_k(i) b_k(i)
• How can we get P(π_i = k | x)? Divide by the probability of the whole sequence:
  P(π_i = k | x) = f_k(i) b_k(i) / P(x)
• The forward step gives us f_k(i); we still need b_k(i)
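The forward values come from the forward algorithm covered earlier in the course. As a reference point for the backward algorithm that follows, here is a minimal sketch, assuming the HMM is represented by dictionaries a (transitions between emitting states), e (emission distributions), a_begin (transitions out of the begin state), and a_end (transitions into the end state); this representation and the names are assumptions, not the slides' notation.

def forward(x, states, a, e, a_begin, a_end):
    """Compute f_k(i) = P(x_1..x_i, pi_i = k) for i = 1..L, and P(x)."""
    L = len(x)
    f = [{k: 0.0 for k in states} for _ in range(L + 1)]  # f[i][k]; index 0 unused
    for k in states:                      # initialization: enter from the begin state
        f[1][k] = a_begin[k] * e[k][x[0]]
    for i in range(2, L + 1):             # recursion over positions 2..L
        for l in states:
            f[i][l] = e[l][x[i - 1]] * sum(f[i - 1][k] * a[k][l] for k in states)
    px = sum(f[L][k] * a_end[k] for k in states)  # termination: exit to the end state
    return f, px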
The backward algorithm
• The backward algorithm gives us b_k(i), the probability of observing the rest of x, given that we're in state k after i characters:
  b_k(i) = P(x_{i+1} ... x_L | π_i = k)
  [Figure: the example HMM with begin state 0, emitting states 1-4 (each with its own emission distribution over A, C, G, T), and end state 5, shown with its transition probabilities, processing the sequence CAGT]
Steps of the backward algorithm
• Initialization (i = L, the last position of the sequence):
  b_k(L) = a_{k,end} for all states k
• Recursion (i = L-1 down to 1):
  b_k(i) = Σ_l a_kl e_l(x_{i+1}) b_l(i+1)
• Termination:
  P(x) = Σ_l a_{0,l} e_l(x_1) b_l(1)
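A matching sketch of the backward recursion above, under the same assumed representation (a, e, a_begin, a_end) used for the forward sketch:

def backward(x, states, a, e, a_begin, a_end):
    """Compute b_k(i) = P(x_{i+1}..x_L | pi_i = k) for i = 1..L, and P(x)."""
    L = len(x)
    b = [{k: 0.0 for k in states} for _ in range(L + 1)]  # b[i][k]; index 0 unused
    for k in states:                      # initialization (i = L): transition to the end state
        b[L][k] = a_end[k]
    for i in range(L - 1, 0, -1):         # recursion (i = L-1 down to 1)
        for k in states:
            # x[i] is the (i+1)th symbol in the 1-based notation of the slides
            b[i][k] = sum(a[k][l] * e[l][x[i]] * b[i + 1][l] for l in states)
    # termination: P(x), starting from the begin state
    px = sum(a_begin[l] * e[l][x[0]] * b[1][l] for l in states)
    return b, px

Both functions return the same P(x), which is a useful sanity check when debugging.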
Computing P(π_i = k, π_{i+1} = l | x)
• This is the probability that the ith symbol is generated by state k and the (i+1)th symbol by state l, given the sequence x:
  P(π_i = k, π_{i+1} = l | x) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
Putting it all together
• We need the expected number of times character c is emitted by state k, summed over all training sequences x^j (with f and b computed separately for each sequence):
  n_{k,c} = Σ_j (1 / P(x^j)) Σ_{i : x^j_i = c} f_k(i) b_k(i)
• And the expected number of times state k transitions to state l:
  n_{k→l} = Σ_j (1 / P(x^j)) Σ_i f_k(i) a_kl e_l(x^j_{i+1}) b_l(i+1)
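A sketch of accumulating these expected counts for a single training sequence, reusing the forward and backward sketches above; the function name add_expected_counts and the count dictionaries n_emit and n_trans are my own, not the slides' notation.

def add_expected_counts(x, states, a, e, a_begin, a_end, n_emit, n_trans):
    """Add one sequence's contribution to the expected counts n_{k,c} and n_{k->l}."""
    L = len(x)
    f, px = forward(x, states, a, e, a_begin, a_end)
    b, _ = backward(x, states, a, e, a_begin, a_end)
    for i in range(1, L + 1):
        for k in states:
            # expected emissions: P(pi_i = k | x) = f_k(i) b_k(i) / P(x)
            n_emit[(k, x[i - 1])] += f[i][k] * b[i][k] / px
            if i < L:
                for l in states:
                    # expected transitions: f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
                    n_trans[(k, l)] += f[i][k] * a[k][l] * e[l][x[i]] * b[i + 1][l] / px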
The maximization step
• Estimate new emission parameters by:
  e_k(c) = n_{k,c} / Σ_c' n_{k,c'}
• Estimate new transition parameters by:
  a_kl = n_{k→l} / Σ_l' n_{k→l'}
• Just like in the simple case, but typically we'll do some "smoothing" (e.g., add pseudocounts)
The Baum-Welch algorithm
• Initialize the parameters of the HMM
• Iterate until convergence
  • Initialize the expected counts n_{k,c} and n_{k→l} with pseudocounts
  • E-step: for each training sequence j = 1 ... n
    • Calculate the forward values f_k(i) for sequence j
    • Calculate the backward values b_k(i) for sequence j
    • Add the contribution of sequence j to n_{k,c} and n_{k→l}
  • M-step: update the HMM parameters using n_{k,c} and n_{k→l}
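Putting the E-step and M-step together, a minimal Baum-Welch loop along the lines of the outline above might look like the sketch below. It reuses the forward, backward, and add_expected_counts sketches; a fixed number of iterations stands in for a real convergence test, and the begin/end transition probabilities are held fixed for simplicity. These choices are mine, not the slides'.

from collections import defaultdict

def baum_welch(sequences, states, alphabet, a, e, a_begin, a_end,
               n_iterations=10, pseudocount=1.0):
    """Re-estimate the transition (a) and emission (e) parameters from unlabeled sequences."""
    for _ in range(n_iterations):
        # initialize the expected counts with pseudocounts
        n_emit = defaultdict(lambda: pseudocount)
        n_trans = defaultdict(lambda: pseudocount)
        # E-step: add each training sequence's contribution to the expected counts
        for x in sequences:
            add_expected_counts(x, states, a, e, a_begin, a_end, n_emit, n_trans)
        # M-step: normalize the expected counts into new parameter estimates
        for k in states:
            t_total = sum(n_trans[(k, l)] for l in states)
            e_total = sum(n_emit[(k, c)] for c in alphabet)
            a[k] = {l: n_trans[(k, l)] / t_total for l in states}
            e[k] = {c: n_emit[(k, c)] / e_total for c in alphabet}
    return a, e

For the two-state example that follows, sequences would be ["TAG", "ACG"], states would be [1, 2], and alphabet would be "ACGT".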
Baum-Welch algorithm example
• Given
  • the HMM with the parameters initialized as shown
    [Figure: an HMM with begin state 0, emitting states 1 and 2, and end state 3; state 1 emissions A 0.4, C 0.1, G 0.1, T 0.4; state 2 emissions A 0.1, C 0.4, G 0.4, T 0.1; transition probabilities as labeled in the figure]
  • the training sequences TAG and ACG
• We'll work through one iteration of Baum-Welch
Baum-Welch example (cont.)
• Determining the forward values for TAG
• Here we compute just the values that are needed for computing successive values
  • For example, there is no point in calculating f_1(3)
• In a similar way, we also compute the forward values for ACG
Baum-Welch example (cont.)
• Determining the backward values for TAG
• Again, here we compute just the values that are needed
• In a similar way, we also compute the backward values for ACG
Baum-Welch example (cont.)
• Determining the expected emission counts for state 1
  [Table: one column for the contribution of TAG, one for the contribution of ACG, plus a pseudocount; note that the forward/backward values in the two columns differ, since in each column they are computed for that column's sequence]
Baum-Welch example (cont.)
• Determining the expected transition counts for state 1 (not using pseudocounts)
  [Table: one column for the contribution of TAG and one for the contribution of ACG]
• In a similar way, we also determine the expected emission/transition counts for state 2
Baum-Welch example (cont.)
• Determining the probabilities for state 1 by normalizing its expected counts (the maximization step)
Summary
• Three problems in HMMs
  • Probability of an observed sequence
    • Forward algorithm
  • Most likely path for an observed sequence
    • Viterbi algorithm
    • Can be used for segmentation of an observed sequence
  • Parameter estimation
    • Baum-Welch
• The backward algorithm is used to compute a quantity needed to estimate the posterior of a state given the entire observed sequence