Discriminative Training and Machine Learning Approaches
Discriminative Training and Machine Learning Approaches Chih-Pin Liao Machine Learning Lab, Dept. of CSIE, NCKU
Our Concerns • Feature extraction and HMM modeling should be jointly performed. • A common objective function should be considered. • To alleviate model confusion and improve recognition performance, we should estimate HMMs using a discriminative criterion built from statistical theory. • Model parameters should be calculated rapidly, without applying a gradient descent algorithm.
Minimum Classification Error (MCE) • MCE is a popular discriminative training algorithm developed for speech recognition and extended to other pattern recognition applications. • Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors. • A gradient descent algorithm was used to estimate the HMM parameters.
MCE Training Procedure • Procedure for training discriminative models using observations X • Discriminant function • Anti-discriminant function • Misclassification measure
Expected Loss • The loss function is calculated by mapping the misclassification measure into a range between zero and one through a sigmoid function. • Minimize the expected loss, or classification error, to find the discriminative model.
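The chain above (discriminant function, anti-discriminant function, misclassification measure, sigmoid loss) can be sketched as follows. The function name, the log-sum-exp form of the anti-discriminant, and the smoothing constants eta and alpha are illustrative choices, not details taken from the slides:

```python
import numpy as np

def mce_loss(scores, target, eta=1.0, alpha=1.0):
    """Smoothed MCE loss for one training sample.

    scores: per-class discriminant values g_j(X), e.g. log-likelihoods;
    target: index of the correct class.
    """
    g_target = scores[target]
    competitors = np.delete(scores, target)
    # Anti-discriminant: smoothed maximum over the competing classes.
    g_anti = np.log(np.mean(np.exp(eta * competitors))) / eta
    # Misclassification measure: positive when the target class loses.
    d = -g_target + g_anti
    # Sigmoid maps d into (0, 1): a differentiable 0/1 classification error.
    return 1.0 / (1.0 + np.exp(-alpha * d))

loss = mce_loss(np.array([2.0, 0.5, -1.0]), target=0)
```

Minimizing the average of this loss over the training set, e.g. by gradient descent, is what "minimize the expected loss" refers to.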
Likelihood Ratio Test • A new training criterion was derived from hypothesis testing theory. • We test a null hypothesis against an alternative hypothesis. • The optimal solution is obtained by a likelihood ratio test, according to the Neyman-Pearson lemma. • A higher likelihood ratio implies stronger confidence towards accepting the null hypothesis.
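A minimal numerical sketch of such a test, assuming Gaussian hypotheses for concreteness (the distributions, parameters, and threshold are illustrative, not from the slides):

```python
import math

def log_likelihood_ratio(x, mu0, mu1, sigma=1.0):
    """Log likelihood ratio of H0: x ~ N(mu0, sigma^2)
    versus H1: x ~ N(mu1, sigma^2)."""
    def log_pdf(v, mu):
        return (-0.5 * math.log(2 * math.pi * sigma ** 2)
                - (v - mu) ** 2 / (2 * sigma ** 2))
    return log_pdf(x, mu0) - log_pdf(x, mu1)

def accept_null(x, mu0, mu1, threshold=0.0):
    # Neyman-Pearson: accept H0 when the log likelihood ratio exceeds a
    # threshold chosen for the desired false-alarm rate.
    return log_likelihood_ratio(x, mu0, mu1) > threshold
```

A sample near mu0 yields a large ratio (accept H0); a sample near mu1 yields a small one (reject H0).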
Hypotheses in HMM Training • Null and alternative hypotheses: Observations X are from target HMM state j; Observations X are not from target HMM state j. • We develop discriminative HMM parameters for the target state against the non-target states. • The problem reduces to verifying the goodness of the data alignment to the corresponding HMM states.
Maximum Confidence HMM • The MCHMM is estimated by maximizing the log-likelihood ratio, or confidence measure, where the parameter set consists of the HMM parameters and a transformation matrix.
Hybrid Parameter Estimation • The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation. • E-step
MC Classification Rule • Let Y denote an input test image. We apply the same criterion to identify the most likely category corresponding to Y.
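As a sketch, the rule picks the category whose model gives the largest confidence measure (log-likelihood ratio) for Y. The inputs here, per-category log-likelihoods under the target and anti (competing) models, are hypothetical names introduced for illustration:

```python
import numpy as np

def mc_classify(target_loglik, anti_loglik):
    """Return the index of the category with the highest confidence
    measure, i.e. the largest log-likelihood ratio of the target model
    against its anti-model, evaluated on the test data Y."""
    confidence = np.asarray(target_loglik) - np.asarray(anti_loglik)
    return int(confidence.argmax())
```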
Summary • A new maximum confidence HMM framework was proposed. • The hypothesis testing principle was used to build the training criterion. • Discriminative feature extraction and HMM modeling were performed under the same criterion. • Jen-Tzung Chien and Chih-Pin Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, April 2008.
Introduction • Conditional Random Fields (CRFs) • relax the usual conditional independence assumption of the likelihood model • enforce the homogeneity of the labeling variables conditioned on the observation • Due to the weak assumptions of the CRF model and its discriminative nature, it • allows arbitrary relationships among the data • may require fewer resources to train its parameters
CRF models have shown better performance than the Hidden Markov Model (HMM) and Maximum Entropy Markov Models (MEMMs) in • language and text processing problems • object recognition problems • image and video segmentation • tracking problems in video sequences
Two Classes of Models • Generative model (HMM) - models the joint distribution of states and observations • Direct model (MEMM and CRF) - models the posterior probability directly (figures: MEMM and CRF graphical structures)
Comparisons of the Two Kinds of Models • Generative model - HMM • Uses a Bayes rule approximation • Assumes that observations are independent • Multiple overlapping features are not modeled • The best state sequence is found through the recursive Viterbi algorithm
Direct model - MEMM and CRF • Direct modeling of the posterior probability • Dependencies among observations are flexibly modeled • The best state sequence is found through the recursive Viterbi algorithm
Hidden Markov Model & Maximum Entropy Markov Model
HMM for Human Motion Recognition • The HMM is defined by • transition probability • observation probability
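Given those two probability tables, plus an initial state distribution, the most likely state sequence can be decoded with the Viterbi algorithm. This is a generic textbook sketch, not the paper's implementation:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete HMM.

    obs: sequence of observation symbol indices;
    pi: initial state probabilities (N,);
    A: transition probabilities (N, N); B: observation probabilities (N, M).
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))            # best log-score ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        trans = delta[t - 1][:, None] + np.log(A)   # (from, to)
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) + np.log(B[:, obs[t]])
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

With near-diagonal transition and emission matrices, the decoded path simply tracks the observations.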
Maximum Entropy Markov Model • The MEMM is defined by a single conditional probability that replaces the transition and observation probabilities of the HMM.
Maximum Entropy Criterion • Definition of the feature functions, where • Constrained optimization problem: the model expectation of each feature must match its empirical expectation.
Solution of the MEMM • Lagrange multipliers are used for the constrained optimization, where the multipliers are the model parameters • The solution is obtained by
GIS Algorithm • Optimizes the Maximum Mutual Information (MMI) criterion • Step 1: Calculate the empirical expectation • Step 2: Start from an initial parameter value • Step 3: Calculate the model expectation • Step 4: Update the model parameters • Repeat steps 3 and 4 until convergence
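The four steps can be sketched for a toy maximum entropy classifier as follows. The joint feature construction, the damping constant C, and the 1e-10 smoothing are illustrative assumptions, not details from the slides:

```python
import numpy as np

def gis(features, labels, n_iter=50):
    """Generalized Iterative Scaling for a toy maxent classifier.

    features: (n_samples, n_feats) nonnegative feature matrix;
    labels: class index per sample. Feature functions are taken as
    f_i(x, y) = x_i * 1[y == c] for each class c (an illustrative choice).
    """
    n, d = features.shape
    n_classes = int(labels.max()) + 1

    def joint(x, y):
        f = np.zeros(d * n_classes)
        f[y * d:(y + 1) * d] = x
        return f

    C = max(features.sum(axis=1).max(), 1.0)  # GIS damping constant
    lam = np.zeros(d * n_classes)
    # Step 1: empirical expectation of each feature.
    emp = np.mean([joint(x, y) for x, y in zip(features, labels)], axis=0)
    # Step 2: start from an initial value (all zeros).
    for _ in range(n_iter):
        # Step 3: model expectation under the current parameters.
        model = np.zeros_like(lam)
        for x in features:
            scores = np.array([lam @ joint(x, y) for y in range(n_classes)])
            p = np.exp(scores - scores.max())
            p /= p.sum()
            for y in range(n_classes):
                model += p[y] * joint(x, y)
        model /= n
        # Step 4: update the parameters, damped by 1/C.
        lam += np.log((emp + 1e-10) / (model + 1e-10)) / C
    return lam
```

On linearly separable data the learned weights favor the correct class for each input.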
Conditional Random Field • Definition Let G = (V, E) be a graph such that Y = (Y_v), v in V. When conditioned on X, the variables Y_v obey the Markov property with respect to the graph: p(Y_v | X, Y_w, w != v) = p(Y_v | X, Y_w, w ~ v), where w ~ v means w and v are neighbors in G. Then (X, Y) is a conditional random field.
CRF Model Parameters • The undirected graphical structure can be used to factorize the posterior into a normalized product of potential functions • Consider the graph as a linear-chain structure • Model parameter set • Feature function set
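For the linear-chain case, the normalized product of potentials can be evaluated with a forward pass. The parameterization below, per-position unary log-potentials plus a shared pairwise log-potential matrix, is an illustrative simplification, not the paper's feature set:

```python
import numpy as np

def crf_log_Z(unary, pairwise):
    """Log partition function of a linear-chain CRF via the forward algorithm.

    unary: (T, N) log-potentials psi_t(y_t, x);
    pairwise: (N, N) log-potentials psi(y_{t-1}, y_t).
    """
    alpha = unary[0].copy()
    for t in range(1, len(unary)):
        # log-sum-exp over the previous label for each current label
        m = alpha.max()
        alpha = m + np.log(np.exp(alpha - m) @ np.exp(pairwise)) + unary[t]
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def crf_log_posterior(y, unary, pairwise):
    """log p(y | x) = score(y, x) - log Z(x)."""
    score = unary[np.arange(len(y)), y].sum() + pairwise[y[:-1], y[1:]].sum()
    return score - crf_log_Z(unary, pairwise)
```

Because Z sums the unnormalized scores over every label sequence, the posterior probabilities of all sequences sum to one.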
CRF Parameter Estimation • We can rewrite and maximize the posterior probability, where • The log posterior probability is given by
Parameter Updating by the GIS Algorithm • Differentiate the log posterior probability with respect to a parameter • Setting this derivative to zero yields the constraint in the maximum entropy model • This estimation has no closed-form solution, so we can use the GIS algorithm.
Summary and Future Works • We construct a complex CRF with cycles for better modeling of contextual dependency; a graphical model algorithm is applied. • In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability. • The posterior probability can then be calculated directly by an approximating approach. • Chih-Pin Liao and Jen-Tzung Chien, "Graphical modeling of conditional random fields for human motion recognition," ICASSP 2008, IEEE International Conference on Acoustics, Speech and Signal Processing, March 31-April 4, 2008, pp. 1969-1972.
Thanks for your attention and discussion