Hidden Markov Models

Hidden Markov Models Theory By Johan Walters (SR 2003)

Topics overview
• HMMs as part of Speech Recognition
• Input / Output
• Basics
• Definition
• Simple HMM
• Markov assumption
• HMM program
• Evaluation problem
• Decoding problem
• Learning problem

HMM in SR

Input / Output
An HMM is a statistical model that describes a probability distribution over a number of possible sequences.
• Input: a sequence of feature vectors
• Output: the words with the highest probability of having been spoken
Given a sequence of feature vectors, which words were most probably meant?

Basics
• States
• State transition probabilities
• Symbol emission probabilities
[Figure: state transition probability matrix]

A simple HMM

Formal definition of an HMM
• An output observation alphabet
• The set of states
• A transition probability matrix
• An output probability matrix
• An initial state distribution
Assumptions
• Markov assumption
• Output independence assumption
• Ease of use / no significant effect
Formal notation for the whole parameter set: Ф = (A, B, π)
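The symbols of the formal definition did not survive on the slide. One common way to write the five components is sketched below. The names O, S, M and N and the state-emission form b_i(k) are assumptions for illustration, not necessarily the original notation; the HMM program slide uses an arc-emission probability b_ijk instead.

```latex
\begin{align*}
O   &= \{o_1, \dots, o_M\}  && \text{output observation alphabet (assumed name)} \\
S   &= \{s_1, \dots, s_N\}  && \text{set of states (assumed name)} \\
A   &= \{a_{ij}\},\quad a_{ij} = P(X_{t+1} = j \mid X_t = i) && \text{transition probability matrix} \\
B   &= \{b_i(k)\},\quad b_i(k) = P(o_t = k \mid X_t = i)     && \text{output probability matrix} \\
\pi &= \{\pi_i\},\quad \pi_i = P(X_1 = i)                    && \text{initial state distribution} \\
\Phi &= (A, B, \pi) && \text{whole parameter set}
\end{align*}
```

Each row of A, each row of B, and π itself must sum to 1.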

Markov assumption
“probability of the random variable at a given time depends only on the value at the preceding time.”
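Written out, the Markov assumption and the output independence assumption listed with it look roughly as follows. This is a sketch in state-emission notation (b_i(k) is the probability of emitting symbol k in state i), which is an assumption about the notation rather than something the slides fix.

```latex
\begin{align*}
\text{Markov assumption:} \quad
  & P(X_t \mid X_1, \dots, X_{t-1}) = P(X_t \mid X_{t-1}) \\
\text{Output independence:} \quad
  & P(o_t \mid X_1, \dots, X_t,\; o_1, \dots, o_{t-1}) = P(o_t \mid X_t) \\
\text{Consequence:} \quad
  & P(O, X \mid \Phi) = \pi_{X_1}\, b_{X_1}(o_1) \prod_{t=2}^{T} a_{X_{t-1} X_t}\, b_{X_t}(o_t)
\end{align*}
```

The last line is what makes the model tractable: the joint probability of a state path and an observation sequence factors into one term per time step.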

HMM Program
• t := 1
• Start in state s_i with probability π_i (i.e., X_1 = i)
• Forever do
  • Move from state s_i to state s_j with probability a_ij (i.e., X_{t+1} = j)
  • Emit observation symbol o_t = k with probability b_ijk
  • t := t + 1
• end
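The generative loop above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `sample_hmm`, the nested-list layout of the parameters, and the fixed number of steps (instead of "forever") are all assumptions. It keeps the arc-emission form b_ijk, where the emitted symbol depends on the transition taken.

```python
import random


def sample_hmm(pi, a, b, steps, rng):
    """Generate `steps` symbols from an arc-emission HMM.

    pi[i]      : probability of starting in state i
    a[i][j]    : probability of moving from state i to state j
    b[i][j][k] : probability of emitting symbol k on the move i -> j
    rng        : a random.Random instance (passed in for reproducibility)
    """
    def draw(probs):
        # Draw one index according to the given probability weights.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    i = draw(pi)              # X_1 = i with probability pi_i
    states, symbols = [i], []
    for _ in range(steps):    # "forever do", truncated to `steps` iterations
        j = draw(a[i])        # move i -> j with probability a_ij
        k = draw(b[i][j])     # emit o_t = k with probability b_ijk
        states.append(j)
        symbols.append(k)
        i = j                 # t := t + 1
    return states, symbols


# Hypothetical two-state, two-symbol model, for illustration only.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[[0.9, 0.1], [0.5, 0.5]],
     [[0.5, 0.5], [0.2, 0.8]]]

hidden, observed = sample_hmm(PI, A, B, steps=10, rng=random.Random(0))
```

Only `observed` would be visible to a recognizer; `hidden` is the state sequence the later slides try to reason about.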

HMM
[Figure: speech signal over time, divided into frames (with a frame shift) from which features are extracted]
• A symbol sequence (the observations) is generated by starting at an initial state and moving from state to state until a terminal state is reached.
• The state sequence is “hidden”.
• Only the symbol sequence that the hidden states emit is observable.

Problems
• The Evaluation Problem: Given the observation sequence O and the model Ф, how do we efficiently compute P(O|Ф), the probability of the observation sequence given the model?
• The Decoding Problem: Given O and Ф, how do we find the sequence of hidden states that most probably generated the observed sequence?
• The Learning Problem: How can we adjust the model parameters Ф to maximize the joint probability (likelihood) of the observations?

How to evaluate an HMM
• Given multiple HMMs (one for each word) and an observation sequence, which HMM most probably generated the sequence?
Simple (expensive) solution:
• Enumerate all possible state sequences S of length T
• For each path S, compute its probability: state sequence probability × joint output probability
• Sum these probabilities over all paths
Efficient solution:
• The Forward algorithm computes the same sum far more efficiently, with complexity O(N²T) for N states, instead of enumerating all N^T paths
• It reuses partially computed probabilities recursively
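The two approaches can be compared on a toy model. The sketch below assumes a state-emission HMM (B[i][k] is the probability of emitting symbol k in state i) and a non-empty observation sequence. The function names and the example parameters are hypothetical.

```python
from itertools import product


def forward(pi, A, B, obs):
    """P(O | model) via the Forward algorithm, O(N^2 T)."""
    n = len(pi)
    # alpha[i] = P(o_1..o_t, X_t = i); initialised for t = 1.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Reuse the previous alphas instead of re-walking every path.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def brute_force(pi, A, B, obs):
    """P(O | model) by enumerating all N^T state sequences."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        # State sequence probability times joint output probability.
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state, three-symbol model.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2, 2, 1]

print(forward(PI, A, B, OBS), brute_force(PI, A, B, OBS))
```

Both functions return the same value up to rounding; only the amount of work differs. For long sequences the raw probabilities underflow, so practical implementations usually work with scaled or log probabilities, which this sketch omits.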

How to evaluate an HMM (2)
[Figure: Speech → Feature extraction → likelihood computation P(X|λ1), …, P(X|λV), one per word model (HMM for word 1, labelled “Seoul”, through HMM for word V) → Select maximum → Recognized word]
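The pipeline in the figure, scoring the observation sequence against every word model and picking the maximum, might be sketched like this. The word labels, the one-state toy models, and the function names are hypothetical; the likelihood is computed with a state-emission Forward pass.

```python
def forward_likelihood(pi, A, B, obs):
    """P(X | lambda) for one word model, using the Forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def recognize(models, obs):
    """Return the word whose HMM gives `obs` the highest likelihood."""
    scores = {word: forward_likelihood(*params, obs)
              for word, params in models.items()}
    best_word = max(scores, key=scores.get)   # the "select maximum" box
    return best_word, scores


# Hypothetical one-state models over a two-symbol alphabet: (pi, A, B).
MODELS = {
    "word_1": ([1.0], [[1.0]], [[0.9, 0.1]]),  # mostly emits symbol 0
    "word_2": ([1.0], [[1.0]], [[0.1, 0.9]]),  # mostly emits symbol 1
}

word, scores = recognize(MODELS, [0, 0, 1])
```

In a real recognizer the observations are feature vectors rather than small integers, so the emission term would be a density over vectors; the selection step stays the same.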

How to decode an HMM
• The Forward algorithm does not find the best state sequence (the “best path”)
• Exhaustive search for the best path is expensive
• The Viterbi algorithm is used instead:
  • It also reuses partially computed results recursively
  • Each partial result is the best path so far
  • Each computed state remembers the best previous state leading to it
  • Complexity O(N²T)
• Finding the best path is very important for continuous speech recognition
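A minimal Viterbi sketch follows, again assuming a state-emission HMM and a non-empty observation sequence. It differs from the Forward recursion in taking a maximum where Forward takes a sum, and in keeping a back-pointer per state. The names and the toy model are hypothetical.

```python
def viterbi(pi, A, B, obs):
    """Return (best state path, its probability) for `obs`."""
    n = len(pi)
    # delta[i] = probability of the best path that ends in state i.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []  # backptr[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        prev, new_delta = [], []
        for j in range(n):
            # Each state remembers the best previous state leading to it.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            prev.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(prev)
        delta = new_delta
    # Trace the back-pointers from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for prev in reversed(backptr):
        path.append(prev[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical two-state, three-symbol model.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

best_path, best_prob = viterbi(PI, A, B, [0, 1, 2, 2, 1])
```

As with the Forward sketch, a practical version would use log probabilities to avoid underflow on long sequences.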

How to estimate HMM Parameters (learning)
Baum-Welch (or Forward-Backward) algorithm
• Estimation of the model parameters Ф = (A, B, π):
  • First make an initial guess of the parameters (which may well be entirely wrong)
  • Refine it by assessing its worth and attempting to reduce the errors it provokes when fitted to the given data
  • In this sense it resembles gradient descent on an error measure; more precisely, it is an expectation-maximization procedure that never decreases the likelihood and converges to a local optimum
• Uses a forward probability term α and a backward probability term β
• Similar to Forward and Viterbi (recursive use of partial results) but more complex
• Unsupervised learning: feed sample speech data along with the phonemes of the spoken words
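One re-estimation step can be sketched as below. This is a simplified illustration under several assumptions: a state-emission model with discrete symbols, a single training sequence of length at least 2, strictly positive initial parameters, and no scaling (so it is only suitable for short sequences). The function names are hypothetical.

```python
def forward_backward(pi, A, B, obs):
    """Return the forward terms alpha[t][i] and backward terms beta[t][i]."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]      # beta at the last step is 1
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta


def baum_welch_step(pi, A, B, obs):
    """One re-estimation; returns (new_pi, new_A, new_B, old likelihood)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_backward(pi, A, B, obs)
    lik = sum(alpha[T - 1])                   # P(O | current model)
    # gamma[t][i] = P(X_t = i | O): expected state occupancy.
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)]
             for t in range(T)]
    # xi[i][j] = expected number of i -> j transitions.
    xi = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
               for t in range(T - 1)) / lik
           for j in range(n)] for i in range(n)]
    new_pi = gamma[0][:]
    new_A = [[xi[i][j] / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, lik


# Hypothetical initial guess and training sequence.
PI0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
OBS = [0, 1, 2, 2, 1, 0, 0, 2]

pi1, A1, B1, lik0 = baum_welch_step(PI0, A0, B0, OBS)
```

Calling `baum_welch_step` repeatedly on its own output, and stopping once the likelihood no longer improves noticeably, gives the "re-estimate until converged" loop. Training on several utterances would accumulate the expected counts over all sequences before normalising, which this sketch does not do.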

How to estimate HMM Parameters (learning) (2)
[Figure: training loop. Speech database (waveforms and features for the words “il”, “i”, “chil”) → Feature extraction → Baum-Welch re-estimation → Converged? No: re-estimate again; Yes: end, yielding the word HMMs λ1, λ2, λ7]
