Pattern Recognition: Statistical and Neural


Presentation Transcript


  1. Nanjing University of Science & Technology. Pattern Recognition: Statistical and Neural. Lonnie C. Ludeman. Lecture 9, Sept 28, 2005.

  2. Review 1: Classifier Performance Measures
  1. A posteriori probability (maximize)
  2. Probability of error (minimize)
  3. Bayes average cost (minimize)
  4. Probability of detection (maximize with a fixed probability of false alarm) (Neyman-Pearson rule)
  5. Losses (minimize the maximum loss)

  3. Review 2: MAP, MPE, and Bayes Classification Rule
  Decide C1 if l(x) > N; decide C2 if l(x) < N, where l(x) = p(x | C1) / p(x | C2) is the likelihood ratio and N is the threshold:
  N_MAP = N_MPE = P(C2) / P(C1)
  N_BAYES = (C22 - C12) P(C2) / [ (C11 - C21) P(C1) ]
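
To make the review concrete, here is a minimal sketch of the two-class likelihood ratio test with the thresholds above. The Gaussian class conditionals, the priors, and the cost values are assumptions chosen for illustration; they do not come from the lecture.

```python
import numpy as np
from scipy.stats import norm

# A minimal sketch of the two-class likelihood ratio test.
# Densities, priors, and costs below are illustrative assumptions.

p1, p2 = 0.7, 0.3                      # a priori probabilities P(C1), P(C2)
pdf1 = norm(loc=0.0, scale=1.0).pdf    # p(x | C1)
pdf2 = norm(loc=2.0, scale=1.0).pdf    # p(x | C2)

# Thresholds from the review slide.
N_map = N_mpe = p2 / p1
C11, C12, C21, C22 = 0.0, 2.0, 1.0, 0.0        # assumed costs C_ij
N_bayes = ((C22 - C12) * p2) / ((C11 - C21) * p1)

def classify(x, N):
    """Decide C1 if l(x) = p(x|C1)/p(x|C2) > N, else C2."""
    return 1 if pdf1(x) / pdf2(x) > N else 2

print(classify(0.5, N_map), classify(0.5, N_bayes))
```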

  4. Review 3: General Calculation of Probability of Error
  [Figure: the probability of error can be computed in any of three equivalent spaces.]
  Pattern space x: region R1, decide C1; region R2, decide C2.
  Feature space y = g(x): region F1, decide C1; region F2, decide C2.
  Likelihood ratio space l(x) = p(x | C1) / p(x | C2): decide C1 if l(x) > N, decide C2 if l(x) < N, where N is the threshold.

  5. Topics for Lecture 9
  1. Neyman-Pearson decision rule and receiver operating characteristic (ROC)
  2. M-class MAP decision rule
  3. M-class MPE decision rule
  4. M-class Bayes decision rule

  6. Motivation: Falling Rock
  Small probability of a falling rock; difficult to assign realistic costs to the consequences; very high cost of failing to detect; low cost of a false alarm.

  7. Definitions:
  Detection: P(decide target | target)
  Miss: P(decide no target | target)
  False alarm: P(decide target | no target)
  Correct dismissal: P(decide no target | no target)

  8. Neyman-Pearson Classifier (2 Classes)
  A. Assumptions:
  C1 (target): p(x | C1) known
  C2 (no target): p(x | C2) known
  No a priori probabilities specified
  No cost assignments available
  An acceptable false alarm rate is specified

  9. B. Performance: Probability of Detection and Probability of False Alarm
  PD = P(decide target | target is present)
  PFA = P(decide target | target is NOT present)
  C. Decision rule: maximize the probability of detection for an acceptable false alarm rate.

  10. Neyman-Pearson Decision Rule (rough derivation)
  PD = P(decide target | target) = ∫R1 p(x | C1) dx = 1 - PM
  PFA = P(decide target | no target) = ∫R1 p(x | C2) dx ≤ α
  where α is the acceptable false alarm rate. Use a Lagrange multiplier λ to minimize J as follows:
  J = PM + λ (PFA - α)

  11. Neyman-Pearson Decision Rule (rough derivation, continued)
  J = 1 - ∫R1 p(x | C1) dx + λ ( ∫R1 p(x | C2) dx - α )
    = 1 - λα + ∫R1 [ -p(x | C1) + λ p(x | C2) ] dx
  To minimize J we select x to be in R1 if the term in [ ... ] is negative; that is, x is assigned to R1 if
  -p(x | C1) + λ p(x | C2) < 0
  which can be rearranged as follows.

  12. Neyman-Pearson Decision Rule
  Decide C1 if p(x | C1) / p(x | C2) > λ = N_NP; decide C2 if p(x | C1) / p(x | C2) < λ,
  where λ is the solution of the constraining equation
  α = ∫R1(λ) p(x | C2) dx
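
As a sketch of how the constraining equation can be solved, consider an assumed univariate Gaussian example, target C1 ~ N(2, 1) and no-target C2 ~ N(0, 1), both invented for illustration. Here l(x) is increasing in x, so R1 reduces to {x > t} and the constraint inverts in closed form:

```python
from scipy.stats import norm

# Solving the Neyman-Pearson constraint for an assumed Gaussian pair:
#   C1 (target):    x ~ N(2, 1)
#   C2 (no target): x ~ N(0, 1)
# l(x) = p(x|C1)/p(x|C2) is increasing in x, so R1 = {x > t}.

alpha = 0.05                     # acceptable false-alarm rate
t = norm.ppf(1.0 - alpha)        # PFA = P(x > t | C2) = alpha
PD = norm.sf(t - 2.0)            # PD  = P(x > t | C1)
N_np = norm.pdf(t, 2, 1) / norm.pdf(t, 0, 1)   # threshold on l(x)

print(f"t = {t:.3f}, PD = {PD:.3f}, N_NP = {N_np:.3f}")
```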

  13. Receiver Operating Characteristic (ROC)
  [Figure: ROC curve, PD versus PFA.]
  The ROC plots PD against PFA as the threshold varies; the slope of the curve at the operating point (PFA, PD) equals N_NP. The point (1, 1) corresponds to "always say target" and (0, 0) to "always say NO target".
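
Sweeping the threshold over the observation axis traces the whole ROC for the same assumed Gaussian pair; each threshold gives one operating point (PFA, PD):

```python
import numpy as np
from scipy.stats import norm

# Tracing the ROC for the assumed pair C1 ~ N(2,1), C2 ~ N(0,1):
# every threshold t yields one operating point (PFA(t), PD(t)).

ts = np.linspace(-4, 6, 201)
PFA = norm.sf(ts, loc=0, scale=1)   # P(x > t | no target)
PD = norm.sf(ts, loc=2, scale=1)    # P(x > t | target)

# Endpoints match the slide: t -> -inf gives (1, 1) "always say target",
# t -> +inf gives (0, 0) "always say no target".
for pfa, pd in [(PFA[0], PD[0]), (PFA[-1], PD[-1])]:
    print(f"(PFA, PD) = ({pfa:.3f}, {pd:.3f})")
```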

  14. Extension of MAP, MPE, and Bayes to M Classes
  Shorthand notation for the M-class case:
  C1: x ~ p(x | C1), P(C1)
  C2: x ~ p(x | C2), P(C2)
  ...
  CM: x ~ p(x | CM), P(CM)

  15. Maximum A Posteriori Classification Rule (M-Class Case)
  A. Basic Assumptions:
  Known: conditional probability density functions p(x | C1), p(x | C2), ..., p(x | CM)
  Known: a priori probabilities P(C1), P(C2), ..., P(CM)
  B. Performance Measure: a posteriori probability P(Ci | x)

  16. Maximum A Posteriori Classification Rule (M-Class Case)
  C. Decision Rule: for an observed vector x, select the class with maximum a posteriori probability:
  if P(Ci | x) > P(Cj | x) for all j = 1, 2, ..., M, j ≠ i, then decide x is from Ci;
  if equality, then decide between the boundary classes by random choice.

  17. Derivation of MAP Decision Rule
  Determine, for i = 1, 2, ..., M, the a posteriori probabilities P(Ci | x). Use one form of Bayes theorem:
  P(Ci | x) = p(x | Ci) P(Ci) / p(x)
  Substitute the above for P(Ci | x) to give p(x | Ci) P(Ci) / p(x), i = 1, 2, ..., M. But p(x) is the same for all terms, so the decision rule simplifies to:

  18. MAP Decision Rule
  For an observed vector x, select class Ci if
  p(x | Ci) P(Ci) > p(x | Cj) P(Cj) for all j = 1, 2, ..., M, j ≠ i;
  if equality, then decide between the boundary classes by random choice.
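
A minimal sketch of the M-class MAP rule, assuming three univariate Gaussian class conditionals and invented priors (the slide's random tie-breaking is omitted; argmax simply takes the first maximizer):

```python
import numpy as np
from scipy.stats import norm

# M-class MAP rule for assumed univariate Gaussian class conditionals;
# the densities and priors are illustrative, not from the lecture.

priors = np.array([0.5, 0.3, 0.2])             # P(C1), P(C2), P(C3)
means, sigmas = [0.0, 2.0, 4.0], [1.0, 1.0, 1.0]

def map_decide(x):
    """Return i maximizing p(x | Ci) * P(Ci) (ties broken arbitrarily)."""
    scores = [norm.pdf(x, m, s) * P for m, s, P in zip(means, sigmas, priors)]
    return int(np.argmax(scores)) + 1          # class index 1..M

print([map_decide(x) for x in (-1.0, 1.2, 3.5)])
```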

  19. 2. Minimum Probability of Error Classification Rule (M-Class Case)
  A. Basic Assumptions:
  Known conditional probability density functions p(x | C1), p(x | C2), ..., p(x | CM)
  Known a priori probabilities P(C1), P(C2), ..., P(CM)
  B. Performance (total probability of error):
  P(error) = P(error | C1) P(C1) + P(error | C2) P(C2) + ... + P(error | CM) P(CM)
  C. Decision Rule: minimizes P(error)

  20. 2. Derivation: Minimum Probability of Error Classification Rule (M-Class Case)
  Select decision regions R1, R2, ..., Ri, ..., RM partitioning the pattern space X (decide Ci when x falls in Ri) such that P(error) is minimized.
  But P(error) = 1 - P(correct), where
  P(correct) = P(correct | C1) P(C1) + P(correct | C2) P(C2) + ... + P(correct | CM) P(CM)

  21. Derivation continued:
  P(correct | C1) = P(decide C1 | C1) = ∫R1 p(x | C1) dx
  ...
  P(correct | Ck) = P(decide Ck | Ck) = ∫Rk p(x | Ck) dx
  ...
  P(correct | CM) = P(decide CM | CM) = ∫RM p(x | CM) dx

  22. Derivation continued:
  P(error) = 1 - Σ (k=1 to M) ∫Rk p(x | Ck) P(Ck) dx
  The minimum probability of error decision rule selects Rk, k = 1, 2, ..., M, such that P(error) is minimized. By selecting x to be a member of Rk if the term p(x | Ck) P(Ck) is the MAXIMUM over all classes, we minimize P(error).
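
One way to check this expression numerically: because Rk is exactly the region where p(x | Ck) P(Ck) wins, the sum of the per-region integrals equals the integral of the pointwise maximum. A sketch for an assumed 3-class univariate Gaussian setup:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Evaluating P(error) = 1 - sum_k integral over Rk of p(x|Ck) P(Ck) dx
# for an assumed 3-class Gaussian example, where Rk is the region in
# which p(x|Ck) P(Ck) is maximal (the MPE regions).

priors = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 2.0, 4.0])

def integrand(x):
    # The winning term p(x|Ck) P(Ck) at each x: integrating it over the
    # whole line equals the sum of the per-region integrals.
    return max(norm.pdf(x, m, 1.0) * P for m, P in zip(means, priors))

p_correct, _ = quad(integrand, -10, 14)
print(f"P(error) = {1.0 - p_correct:.4f}")
```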

  23. Thus the MPE Decision Rule
  For an observed vector x, select class Ck if
  p(x | Ck) P(Ck) > p(x | Cj) P(Cj) for all j = 1, 2, ..., M, j ≠ k;
  if equality, then decide between the boundary classes by random choice.
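
Since the MPE rule is identical in form to the MAP rule, a Monte Carlo estimate under the same assumed 3-class Gaussian setup should agree with the numerical integral above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Monte Carlo check of the MPE rule for the assumed 3-class Gaussian
# setup: draw labeled samples, apply the rule, estimate P(error).

priors = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 2.0, 4.0])

n = 100_000
labels = rng.choice(3, size=n, p=priors)     # true classes 0..2
x = rng.normal(means[labels], 1.0)           # one sample per label

# Decide the class k maximizing p(x | Ck) P(Ck).
scores = norm.pdf(x[:, None], means[None, :], 1.0) * priors
decisions = scores.argmax(axis=1)

print(f"estimated P(error) = {(decisions != labels).mean():.4f}")
```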

  24. Bayes Classifier (M-Class Case)
  A. Statistical Assumptions. Known, for each class, the conditional probability density function of the observed pattern vector and the a priori probability:
  C1: x ~ p(x | C1), P(C1)
  C2: x ~ p(x | C2), P(C2)
  ...
  Ck: x ~ p(x | Ck), P(Ck)
  ...
  CM: x ~ p(x | CM), P(CM)

  25. Bayes Classifier: Cost Definitions
  Define the costs associated with the decisions:
  C11, C12, ..., C1M
  C21, C22, ..., C2M
  ...
  CM1, CM2, ..., CMM
  where Cij = the cost associated with deciding class Ci when the true class is Cj.

  26. Bayes Classifier: Risk Definition (M-Class Case)
  Risk is defined as the average cost associated with making a decision:
  R = Risk = Σ (i=1 to M) Σ (j=1 to M) Cij P(decide Ci | Cj) P(Cj)
  where P(decide Ci | Cj) = ∫Ri p(x | Cj) dx

  27. Derivation continued:
  Risk = ∫R1 Σ (j=1 to M) C1j p(x | Cj) P(Cj) dx
       + ∫R2 Σ (j=1 to M) C2j p(x | Cj) P(Cj) dx
       + ...
       + ∫RM Σ (j=1 to M) CMj p(x | Cj) P(Cj) dx

  28. Bayes Decision Rule: M-Class Case
  Define yi(x) = Σ (j=1 to M) Cij p(x | Cj) P(Cj)
  To MINIMIZE the risk, we assign x to the region Ri if yi(x) < yj(x) for all j ≠ i.

  29. Bayes Decision Rule: M-Class Case (Final Step of Derivation)
  yi(x) = Σ (j=1 to M) Cij p(x | Cj) P(Cj)
  If yi(x) < yj(x) for all j ≠ i, then decide x is from Ci.
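
A sketch of the resulting classifier: compute yi(x) for every class and pick the smallest. The cost matrix Cij, densities, and priors below are assumptions for illustration; note that with a 0-1 cost matrix this rule reduces to the MPE/MAP rule.

```python
import numpy as np
from scipy.stats import norm

# M-class Bayes rule: y_i(x) = sum_j C_ij p(x|Cj) P(Cj); decide the
# class with the smallest y_i(x). Costs, densities, priors assumed.

priors = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 2.0, 4.0])
C = np.array([[0.0, 1.0, 2.0],      # C[i, j] = cost of deciding C_{i+1}
              [1.0, 0.0, 1.0],      # when the true class is C_{j+1}
              [4.0, 1.0, 0.0]])

def bayes_decide(x):
    """Return the class index i (1..M) minimizing y_i(x)."""
    weighted = norm.pdf(x, means, 1.0) * priors    # p(x|Cj) P(Cj)
    y = C @ weighted                               # y_i(x) for each i
    return int(np.argmin(y)) + 1

print([bayes_decide(x) for x in (-1.0, 1.2, 3.5)])
```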

  30. Summary
  1. Neyman-Pearson decision rule and receiver operating characteristic (ROC)
  2. M-class MAP decision rule
  3. M-class MPE decision rule
  4. M-class Bayes decision rule

  31. End of Lecture 9
