Discriminative Training and Machine Learning Approaches
Discriminative Training and Machine Learning Approaches Chih-Pin Liao Machine Learning Lab, Dept. of CSIE, NCKU
Our Concerns • Feature extraction and HMM modeling should be jointly performed. • A common objective function should be considered. • To alleviate model confusion and improve recognition performance, we should estimate HMMs using a discriminative criterion built from statistical theory. • Model parameters should be calculated rapidly, without applying a gradient descent algorithm.
Minimum Classification Error (MCE) • MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications. • Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors. • A gradient descent algorithm was used to estimate the HMM parameters.
MCE Training Procedure • Procedure for training discriminative models using observations X • Discriminant function • Anti-discriminant function • Misclassification measure
Expected Loss • The loss function is calculated by mapping the misclassification measure into a range between zero and one through a sigmoid function. • Minimize the expected loss, or classification error, to find the discriminative model.
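The chain above (discriminant function, anti-discriminant function, misclassification measure, sigmoid loss) can be sketched as follows. The function name, the log-sum-exp form of the anti-discriminant, and the smoothing constants eta and alpha are illustrative choices, not details taken from the slides:

```python
import numpy as np

def mce_loss(scores, target, eta=1.0, alpha=1.0):
    """Smoothed MCE loss for one training sample.

    scores: per-class discriminant values g_j(X), e.g. log-likelihoods;
    target: index of the correct class.
    """
    g_target = scores[target]
    competitors = np.delete(scores, target)
    # Anti-discriminant: smoothed maximum over the competing classes.
    g_anti = np.log(np.mean(np.exp(eta * competitors))) / eta
    # Misclassification measure: positive when the target class loses.
    d = -g_target + g_anti
    # Sigmoid maps d into (0, 1): a differentiable 0/1 classification error.
    return 1.0 / (1.0 + np.exp(-alpha * d))

loss = mce_loss(np.array([2.0, 0.5, -1.0]), target=0)
```

Minimizing the average of this loss over the training set, e.g. by gradient descent, is what "minimize the expected loss" refers to.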
Likelihood Ratio Test • A new training criterion was derived from hypothesis testing theory. • We test a null hypothesis against an alternative hypothesis. • The optimal solution is obtained by a likelihood ratio test, according to the Neyman-Pearson lemma. • A higher likelihood ratio implies stronger confidence towards accepting the null hypothesis.
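A minimal numerical sketch of such a test, assuming Gaussian hypotheses for concreteness (the distributions, parameters, and threshold are illustrative, not from the slides):

```python
import math

def log_likelihood_ratio(x, mu0, mu1, sigma=1.0):
    """Log likelihood ratio of H0: x ~ N(mu0, sigma^2)
    versus H1: x ~ N(mu1, sigma^2)."""
    def log_pdf(v, mu):
        return (-0.5 * math.log(2 * math.pi * sigma ** 2)
                - (v - mu) ** 2 / (2 * sigma ** 2))
    return log_pdf(x, mu0) - log_pdf(x, mu1)

def accept_null(x, mu0, mu1, threshold=0.0):
    # Neyman-Pearson: accept H0 when the log likelihood ratio exceeds a
    # threshold chosen for the desired false-alarm rate.
    return log_likelihood_ratio(x, mu0, mu1) > threshold
```

A sample near mu0 yields a large ratio (accept H0); a sample near mu1 yields a small one (reject H0).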
Hypotheses in HMM Training • Null and alternative hypotheses: Observations X are from target HMM state j; Observations X are not from target HMM state j. • We develop discriminative HMM parameters for the target state against the non-target states. • The problem reduces to verifying the goodness of the data alignment to the corresponding HMM states.
Maximum Confidence HMM • The MCHMM is estimated by maximizing the log-likelihood ratio, or confidence measure, where the parameter set consists of the HMM parameters and a transformation matrix.
Hybrid Parameter Estimation • The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation. • E-step
MC Classification Rule • Let Y denote an input test image. We apply the same criterion to identify the most likely category corresponding to Y.
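As a sketch, the rule picks the category whose model gives the largest confidence measure (log-likelihood ratio) for Y. The inputs here, per-category log-likelihoods under the target and anti (competing) models, are hypothetical names introduced for illustration:

```python
import numpy as np

def mc_classify(target_loglik, anti_loglik):
    """Return the index of the category with the highest confidence
    measure, i.e. the largest log-likelihood ratio of the target model
    against its anti-model, evaluated on the test data Y."""
    confidence = np.asarray(target_loglik) - np.asarray(anti_loglik)
    return int(confidence.argmax())
```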
Summary • A new maximum confidence HMM framework was proposed. • The hypothesis testing principle was used to build the training criterion. • Discriminative feature extraction and HMM modeling were performed under the same criterion. • Jen-Tzung Chien and Chih-Pin Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, April 2008.
Introduction • Conditional Random Fields (CRFs) • relax the usual conditional independence assumption of the likelihood model • enforce the homogeneity of the labeling variables conditioned on the observation • Due to the weak assumptions of the CRF model and its discriminative nature, it • allows arbitrary relationships among the data • may require fewer resources to train its parameters
CRF models have shown better performance than the Hidden Markov Model (HMM) and Maximum Entropy Markov Models (MEMMs) in • language and text processing problems • object recognition problems • image and video segmentation • tracking problems in video sequences
Two Classes of Models • Generative model (HMM) - models the joint distribution of states and observations • Direct model (MEMM and CRF) - models the posterior probability directly (figures: MEMM and CRF graphical structures)
Comparisons of the Two Kinds of Models • Generative model - HMM • Uses a Bayes rule approximation • Assumes that observations are independent • Multiple overlapping features are not modeled • The best state sequence is found through the recursive Viterbi algorithm
Direct model - MEMM and CRF • Direct modeling of the posterior probability • Dependencies among observations are flexibly modeled • The best state sequence is found through the recursive Viterbi algorithm
Hidden Markov Model & Maximum Entropy Markov Model
HMM for Human Motion Recognition • The HMM is defined by • transition probability • observation probability
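Given those two probability tables, plus an initial state distribution, the most likely state sequence can be decoded with the Viterbi algorithm. This is a generic textbook sketch, not the paper's implementation:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete HMM.

    obs: sequence of observation symbol indices;
    pi: initial state probabilities (N,);
    A: transition probabilities (N, N); B: observation probabilities (N, M).
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        trans = delta[t - 1][:, None] + np.log(A)   # (from, to)
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

With near-diagonal transition and emission matrices, the decoded path simply tracks the observations.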
Maximum Entropy Markov Model • The MEMM is defined by a single conditional probability that replaces the transition and observation probabilities of the HMM.
Maximum Entropy Criterion • Definition of the feature functions, where • Constrained optimization problem: the model expectation of each feature must match its empirical expectation.
Solution of the MEMM • Lagrange multipliers are used for the constrained optimization, where the multipliers are the model parameters • The solution is obtained by
GIS Algorithm • Optimizes the Maximum Mutual Information (MMI) criterion • Step 1: Calculate the empirical expectation • Step 2: Start from an initial parameter value • Step 3: Calculate the model expectation • Step 4: Update the model parameters • Repeat steps 3 and 4 until convergence
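The four steps can be sketched for a toy maximum entropy classifier as follows. The joint feature construction, the damping constant C, and the 1e-10 smoothing are illustrative assumptions, not details from the slides:

```python
import numpy as np

def gis(features, labels, n_iter=50):
    """Generalized Iterative Scaling for a toy maxent classifier.

    features: (n_samples, n_feats) nonnegative feature matrix;
    labels: class index per sample. Feature functions are taken as
    f_i(x, y) = x_i * 1[y == c] for each class c (an illustrative choice).
    """
    n, d = features.shape
    n_classes = int(labels.max()) + 1

    def joint(x, y):
        f = np.zeros(d * n_classes)
        f[y * d:(y + 1) * d] = x
        return f

    C = max(features.sum(axis=1).max(), 1.0)  # GIS damping constant
    lam = np.zeros(d * n_classes)
    # Step 1: empirical expectation of each feature.
    emp = np.mean([joint(x, y) for x, y in zip(features, labels)], axis=0)
    # Step 2: start from an initial value (all zeros).
    for _ in range(n_iter):
        # Step 3: model expectation under the current parameters.
        model = np.zeros_like(lam)
        for x in features:
            scores = np.array([lam @ joint(x, y) for y in range(n_classes)])
            p = np.exp(scores - scores.max())
            p /= p.sum()
            for y in range(n_classes):
                model += p[y] * joint(x, y)
        model /= n
        # Step 4: update the parameters, damped by 1/C.
        lam += np.log((emp + 1e-10) / (model + 1e-10)) / C
    return lam
```

On linearly separable data the learned weights favor the correct class for each input.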
Conditional Random Field • Definition Let G = (V, E) be a graph such that Y = (Y_v), v in V. When conditioned on X, the variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w != v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means w and v are neighbors in G. Then (X, Y) is a conditional random field.
CRF Model Parameters • The undirected graphical structure can be used to factorize the posterior into a normalized product of potential functions • Consider the graph as a linear-chain structure • Model parameter set • Feature function set
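For the linear-chain case, the normalized product of potentials can be evaluated with a forward pass. The parameterization below, per-position unary log-potentials plus a shared pairwise log-potential matrix, is an illustrative simplification, not the paper's feature set:

```python
import numpy as np

def crf_log_Z(unary, pairwise):
    """Log partition function of a linear-chain CRF via the forward algorithm.

    unary: (T, N) log-potentials psi_t(y_t, x);
    pairwise: (N, N) log-potentials psi(y_{t-1}, y_t).
    """
    alpha = unary[0].copy()
    for t in range(1, len(unary)):
        # log-sum-exp over the previous label for each current label
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(pairwise)) + unary[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def crf_log_posterior(y, unary, pairwise):
    """log p(y | x) = score(y, x) - log Z(x)."""
    score = unary[np.arange(len(y)), y].sum() + pairwise[y[:-1], y[1:]].sum()
    return score - crf_log_Z(unary, pairwise)
```

Because Z sums the unnormalized scores over every label sequence, the posterior probabilities of all sequences sum to one.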
CRF Parameter Estimation • We can rewrite and maximize the posterior probability, where • The log posterior probability is given by
Parameter Updating by the GIS Algorithm • Differentiate the log posterior probability with respect to a parameter • Setting this derivative to zero yields the constraint in the maximum entropy model • This estimation has no closed-form solution, so we can use the GIS algorithm.
Summary and Future Works • We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical model algorithm is applied. • In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability. • The posterior probability can then be calculated directly by an approximating approach. • Chih-Pin Liao and Jen-Tzung Chien, "Graphical modeling of conditional random fields for human motion recognition," ICASSP 2008, IEEE International Conference on Acoustics, Speech and Signal Processing, March 31-April 4, 2008, pp. 1969-1972.
Thanks for your attention and discussion