An Introduction to Hidden Markov Models and Gesture Recognition Troy L. McDaniel Research Assistant Center for Cognitive Ubiquitous Computing Arizona State University Notation and Algorithms From (Dugad and Desai 1996) Please send your questions, comments, and errata to Troy.McDaniel@asu.edu
The Big Picture We learned about this part. Now let's take a closer look at how this part works…
Introduction • A hidden Markov model can be used to model and recognize temporal sequences • How? We can train a finite state machine using training data consisting of sequences of symbols • States will represent, e.g., poses for gestures, and transitions between states will have probabilities
Applications • Speech Recognition • Computational Biology • Computer Vision • Biometrics • Gesture Recognition • And many others… • Let's take a look at gesture recognition in detail…
Gesture Recognition-Training • Interact with a computer through gestures • Training • Create a database of gestures – we store the feature vectors of the poses that make up each gesture • Create a database of poses to increase accuracy • Train an HMM for each class of gestures
Gesture Recognition-Testing • Testing • Segmentation – Obtain the user's hand by identifying skin-colored pixels; this effectively performs background subtraction. • Feature Extraction – Extract features. For example, we can fit ellipsoids around the fingers and palm, and use their major axes and the angles between them. • Pose Recognition – Match feature vectors with those in the pose database to improve recognition. • Gesture Recognition – Run gestures through all of the HMMs. The HMM with the highest probability is the recognized gesture.
Gesture Recognition System • Overview of the system • Next, we will learn how HMMs work…
Urns and Marbles Example • There are 3 urns, each filled with some number of marbles of certain colors, say red, green, or blue • A friend of ours is in a room choosing urns, each time taking out a marble, shouting its color, and putting it back • We're outside the room and cannot see in! • We know the # of urns and the observations (R, R, G, R, B, …) • But what is it that we don't know?
Urns and Marbles Example-II • The urns are states, each with an initial probability • Transition probabilities exist between states • The Markovian property: the next urn chosen depends only on the current urn • Each state represents a distribution of symbols (e.g., red = 25%, green = 25%, and blue = 50% for urn 1)
So What’s An HMM? • As we’ve already seen, it is a finite number of states connected by transitions, which can generate an observation sequence depending on its transition, observation (emission), and initial probabilities • It is represented as three sets of probabilities: λ = (A, B, π) • The Markov model is hidden because we don’t know which state produced each observation • Going from the urn example to more familiar models…
So What’s An HMM?-II • For gesture recognition, a state will represent a pose • The distribution for each state will be over symbols represented by feature vectors—e.g., the major axes of the fingers and palm, and the angles between them. • Remember that during training, gestures belonging to the same class (goodbye, etc.) will still have variations. • An HMM can either represent a single object such as a word or gesture, or a collection of objects.
The Algorithms • Next, we’re going to cover algorithms for training and testing hidden Markov models • Algorithms include Forward-Backward, Viterbi, K-means, Baum-Welch, and a Kullback-Leibler-based distance measure • Each algorithm, once explained, will be mapped to pseudocode
HMM Structure Pseudocode • For the pseudocode, assume that HMMs are objects, containing the constants and data structures below.
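The original pseudocode figure did not survive extraction; as a stand-in, here is a minimal sketch of such an HMM object in Python, using the deck's notation (N states, M symbols, and the three probability sets A, B, π). The field names and the toy numbers are assumptions for illustration, not the deck's original listing.

```python
from dataclasses import dataclass

@dataclass
class HMM:
    """Minimal HMM container: three sets of probabilities, lambda = (A, B, pi)."""
    N: int      # number of states (e.g., urns or poses)
    M: int      # number of observation symbols (e.g., marble colors)
    A: list     # N x N transition probabilities, A[i][j] = a_ij
    B: list     # N x M emission probabilities, B[i][k] = b_i(k)
    pi: list    # length-N initial state probabilities

# A toy 2-urn, 2-color model (numbers are made up for illustration).
toy = HMM(N=2, M=2,
          A=[[0.7, 0.3], [0.4, 0.6]],
          B=[[0.6, 0.4], [0.2, 0.8]],
          pi=[0.5, 0.5])
```

Each row of A and B, and the vector π, must sum to 1, since each is a probability distribution.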
Problem #1 • HMM applications are reduced to solving 3 problems. Let's look at the first one… • Problem 1: Given λ = (A, B, π), how do we compute P(O|λ)? • Solution: Forward-Backward Algorithm • Why do we care? And when do we use it? What's the probability of getting B, G, R, B?
Why Do We Care? [Figure: the observation sequence Red, Green, Blue is scored against HMM 1, HMM 2, and HMM 3, which return, e.g., 98%, 5%, and 50%; the model giving the highest P(O|λ) is the recognized class.]
But First, the Brute Force Approach • Let's look at the brute force approach first • We can find this probability by taking, for each fixed state sequence, the probability of O given that sequence times the probability of the sequence itself • But we must do this for every possible state sequence… • With N^T possible state sequences (N states, T observations), it's not practical.
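The brute-force sum described above can be written directly; this sketch (the model values in the example are made up) enumerates all N^T state sequences, which is exactly why the approach blows up for long observation sequences.

```python
from itertools import product

def brute_force_prob(A, B, pi, O):
    """P(O) by summing P(O, I) over every possible state sequence I."""
    N, T = len(pi), len(O)
    total = 0.0
    for seq in product(range(N), repeat=T):   # all N**T state sequences
        p = pi[seq[0]] * B[seq[0]][O[0]]      # P(I) * P(O|I), built term by term
        for t in range(1, T):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][O[t]]
        total += p
    return total

# Uniform 2-state, 2-symbol model: every length-T sequence has P(O) = (1/2)**T.
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]
print(brute_force_prob(A, B, pi, [0, 1]))  # 0.25
```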
Forward Algorithm • A more practical approach: the Forward Algorithm • The forward variable: α_t(i) = P(O_1 O_2 … O_t, q_t = i | λ) • The probability of the partial observation sequence up to time t and state i at time t • It is an inductive algorithm, shown next… What's the probability of getting B, G, R, B and ending at urn 2?
Forward Algorithm-II • Initialization: α_1(i) = π_i b_i(O_1) • Induction: α_{t+1}(j) = [ Σ_{i=1..N} α_t(i) a_ij ] b_j(O_{t+1}) • Termination: P(O|λ) = Σ_{i=1..N} α_T(i) • Order N²T multiplications!
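The three steps above translate almost line for line into code. This is a sketch in plain Python (0-based time indices; the uniform model in the example is made up, chosen so the answer is easy to check by hand):

```python
def forward(A, B, pi, O):
    """Forward algorithm: alpha[t][i] = P(O_1..O_{t+1}, state i at step t).

    Returns (alpha, P) where P = P(O | lambda). The triple loop gives
    the stated order-N^2*T multiplication count.
    """
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                        # initialization: pi_i * b_i(O_1)
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(T - 1):                    # induction
        for j in range(N):
            alpha[t + 1][j] = B[j][O[t + 1]] * sum(
                alpha[t][i] * A[i][j] for i in range(N))
    return alpha, sum(alpha[T - 1])           # termination: sum_i alpha_T(i)

# Uniform 2-state, 2-symbol model: any length-3 sequence has P(O) = (1/2)**3.
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]
_, P = forward(A, B, pi, [0, 1, 0])
print(P)  # 0.125
```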
Forward Algorithm Example [Figure: a 3-state × 3-time-step trellis computing the probability of R, G, B; adding up the circled α_3(i) values in the last column gives 2.95%.]
Backward Algorithm • Next is the Backward Algorithm • The backward variable: β_t(i) = P(O_{t+1} O_{t+2} … O_T | q_t = i, λ) • The probability of the observation sequence O_{t+1}, O_{t+2}, …, O_T given the HMM and state i at time t • Similar to, but with important distinctions from, the forward variable • These differences allow us to break a sequence in half and attack it from both ends • Reduced run time • Allows for novel algorithms
Backward Algorithm Example [Figure: a 3-state × 3-time-step trellis computing the probability of R, G, B from the back: 0.5·0.25·0.12 + 0.25·0.2·0.1125 + 0.25·0.45·0.0788 ≈ 2.9%, agreeing with the forward example up to rounding.]
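The backward pass mirrors the forward one, working from the end of the sequence toward the start. A sketch under the same assumptions as before (0-based indices; the uniform example model is made up, and must give the same P(O) as the forward pass):

```python
def backward(A, B, pi, O):
    """Backward algorithm: beta[t][i] = P(O_{t+2}..O_T | state i at step t)."""
    N, T = len(pi), len(O)
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):                        # initialization: beta_T(i) = 1
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):            # induction, moving backward in time
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # Termination: P(O) = sum_i pi_i * b_i(O_1) * beta_1(i),
    # which must match the forward algorithm's answer.
    return beta, sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(N))

# Same uniform toy model as the forward sketch: P(O) = (1/2)**3 for T = 3.
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[0.5, 0.5], [0.5, 0.5]]
pi = [0.5, 0.5]
_, P = backward(A, B, pi, [0, 1, 0])
print(P)  # 0.125
```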
Problem #2 • Problem 2: Given λ, find a state sequence I such that the probability of the observation sequence O is greater than for any other state sequence. I.e., find the state sequence that maximizes P(O, I|λ). • Solution: Viterbi Algorithm • Why do we care? And when do we use it? What sequence of urns gives us the best chance of getting B, G, B?
Why Do We Care? A particular state sequence within a hidden Markov model can correspond to a certain object, such as the word ‘hello’, which is made up of phonemes represented as states. The highest-probability sequence may therefore identify a particular word or gesture.
Viterbi Algorithm • Each transition is assigned a cost: with a_ij = 0.6 and b_j(O_t) = 0.3, the cost is -ln(a_ij b_j(O_t)) = -ln(0.6 · 0.3) ≈ 1.71 • As these probabilities increase, the cost decreases!
Viterbi Algorithm-II • So, it all comes down to finding the path with the minimum cost! • A low probability = a large cost • A high probability = a small cost
Viterbi Algorithm-III • Initialization: s_1(i) = -ln(π_i b_i(O_1)) • Recursion: s_t(j) = min_i [ s_{t-1}(i) - ln(a_ij b_j(O_t)) ], recording the minimizing i as a backpointer • Termination: take the minimum s_T(i) and trace the backpointers to recover the path • (and order of N²T multiplications!)
Viterbi Algorithm Example [Figure: for the observation sequence R, B, two 3-state × 2-time-step tables are filled in: an sTable of accumulated costs and an aTable of backpointers. Take the minimum value in the last column of the sTable, match it with its aTable entry, trace backward, and we get a path of 1, 2.]
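The sTable/aTable procedure from the example can be sketched as follows. This assumes all probabilities are nonzero (so -ln is defined), and the "sticky" two-state model in the demo is made up so that the best path is easy to verify by hand:

```python
import math

def viterbi(A, B, pi, O):
    """Minimum -ln(cost) path, using an sTable of accumulated costs and
    an aTable of backpointers as in the worked example."""
    N, T = len(pi), len(O)
    s = [[0.0] * N for _ in range(T)]         # sTable: accumulated costs
    a = [[0] * N for _ in range(T)]           # aTable: backpointers
    for i in range(N):                        # initialization
        s[0][i] = -math.log(pi[i] * B[i][O[0]])
    for t in range(1, T):                     # recursion: keep the cheapest way in
        for j in range(N):
            costs = [s[t - 1][i] - math.log(A[i][j] * B[j][O[t]])
                     for i in range(N)]
            a[t][j] = min(range(N), key=lambda i: costs[i])
            s[t][j] = costs[a[t][j]]
    best = min(range(N), key=lambda i: s[T - 1][i])   # min cost in last column
    path = [best]
    for t in range(T - 1, 0, -1):             # trace the backpointers
        path.append(a[t][path[-1]])
    path.reverse()
    return path, math.exp(-s[T - 1][best])    # best path and its probability

# Sticky model: each state strongly prefers its own symbol, so O = 0,0,1
# is best explained by the state path 0,0,1.
path, p = viterbi([[0.9, 0.1], [0.1, 0.9]],
                  [[0.99, 0.01], [0.01, 0.99]],
                  [0.9, 0.1], [0, 0, 1])
print(path)  # [0, 0, 1]
```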
Problem #3 • Problem 3: Given an observation sequence O, how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? • I.e., how can we maximize the probability of getting B, G, B? Or maximize R, B, G and its best state sequence 1, 3, 2?
K-means Algorithm [Figure: training data enters a K-means trainer, which outputs a trained HMM.]
K-means Algorithm Pseudocode-II [Figure: pseudocode listing, continued; not recovered.]
K-means Algorithm Example [Figure: let the colors of marbles in our urns take on decimal values as a function of R, G, B. Points in R, G, B space are classified by their nearest initial mean, new means are calculated, the points are re-classified, and the resulting clusters are used for HMM generation.]
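The classify / recompute-means loop from the example can be sketched as plain 1-D k-means over hypothetical scalar color values (the real system would cluster feature vectors); this quantizes continuous features into the discrete symbols an HMM emits. The point and mean values below are made up:

```python
def kmeans(points, means, iters=100):
    """Classify each point to its nearest mean, recompute each mean from
    its cluster, and repeat until the means stop moving."""
    for _ in range(iters):
        clusters = [[] for _ in means]
        for p in points:                         # classify step
            j = min(range(len(means)), key=lambda k: abs(p - means[k]))
            clusters[j].append(p)
        new = [sum(c) / len(c) if c else m       # calculate new means
               for c, m in zip(clusters, means)]
        if new == means:                         # converged: re-classification
            break                                # would change nothing
        means = new
    return means, clusters

# Two well-separated groups pull the initial means to their centers.
means, clusters = kmeans([1.0, 2.0, 10.0, 11.0], [0.0, 5.0])
print(means)  # [1.5, 10.5]
```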
Baum-Welch Re-estimation Formulas [Figure: an observation sequence and an initial HMM enter the Baum-Welch algorithm, which outputs a trained HMM.]
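The formula figure itself did not survive extraction. The standard Baum-Welch re-estimates, written in the deck's α/β notation, are:

```latex
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O\mid\lambda)},
\qquad
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)

\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t\,:\,O_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```

Here ξ_t(i,j) is the probability of being in state i at time t and state j at time t+1 given O, and γ_t(i) the probability of being in state i at time t; each re-estimation pass never decreases P(O|λ).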
Baum-Welch Pseudocode [Figure: pseudocode listing; not recovered.]
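Since the original listing was lost, here is a self-contained sketch of one Baum-Welch re-estimation pass (the forward and backward passes are repeated inline so the block stands alone; the model values in the demo are made up). It follows the standard ξ/γ formulas and is an illustration, not the deck's original pseudocode:

```python
def forward(A, B, pi, O):
    N, T = len(pi), len(O)
    al = [[0.0] * N for _ in range(T)]
    for i in range(N):
        al[0][i] = pi[i] * B[i][O[0]]
    for t in range(T - 1):
        for j in range(N):
            al[t + 1][j] = B[j][O[t + 1]] * sum(al[t][i] * A[i][j]
                                                for i in range(N))
    return al

def backward(A, B, pi, O):
    N, T = len(pi), len(O)
    be = [[0.0] * N for _ in range(T)]
    for i in range(N):
        be[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):
        for i in range(N):
            be[t][i] = sum(A[i][j] * B[j][O[t + 1]] * be[t + 1][j]
                           for j in range(N))
    return be

def baum_welch_step(A, B, pi, O):
    """One re-estimation pass; each pass never decreases P(O | lambda)."""
    N, M, T = len(pi), len(B[0]), len(O)
    al, be = forward(A, B, pi, O), backward(A, B, pi, O)
    P = sum(al[T - 1][i] for i in range(N))
    # gamma[t][i] = P(state i at t | O);  xi[t][i][j] = P(i at t, j at t+1 | O)
    gamma = [[al[t][i] * be[t][i] / P for i in range(N)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][O[t + 1]] * be[t + 1][j] / P
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]                              # expected initial occupancy
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))  # transitions out of i
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T))      # time in j emitting k
              for k in range(M)] for j in range(N)]
    return new_A, new_B, new_pi

# One pass on a toy model (values are illustrative only).
A2, B2, pi2 = baum_welch_step([[0.7, 0.3], [0.4, 0.6]],
                              [[0.6, 0.4], [0.2, 0.8]],
                              [0.5, 0.5], [0, 1, 0, 0, 1])
```

In practice the pass is repeated until P(O|λ) stops improving, matching the Initial HMM → Baum-Welch → Trained HMM flow in the diagram above.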