This paper discusses the necessity and implementation of a posteriori corrections in classification models to improve their accuracy, particularly in real-life scenarios where class distributions differ significantly from controlled experiments. It examines various methods to adjust probability estimates, optimize performance, and balance training and test data distributions. By employing techniques such as linear scaling and softmax adjustments, researchers can refine model outputs, increasing sensitivity, specificity, and overall confidence in predictions. Practical examples illustrate the efficacy of these corrective measures.
A Posteriori Corrections to Classification Methods Włodzisław Duch & Łukasz Itert Department of Informatics, Nicholas Copernicus University, Torun, Poland. http://www.phys.uni.torun.pl/kmk
Motivation So you’ve got your model … but that’s not the end. Try to derive as much information from it as possible. • In pathological cases NN, FS, RS and other systems lead to results below the base rates. How to avoid it? • In a controlled experiment the split was 50%-50%; in real life it is 5%-95%. How to deal with it? • So your model is accurate; that doesn’t impress me much. How about the costs? Confidence in results? Sensitivity? Specificity? Can you improve it quickly? A posteriori corrections may help and are (almost) for free.
Corrections increasing accuracy NN, kNN, FIS, RS and other systems do not estimate probabilities rigorously, but some estimates of p(Ci|X) are obtained. Many systems do not optimize error functions. Idea: linear scaling of probabilities: K classes, CK is the majority class. ki = 0 gives the majority classifier, ki = 1 gives the original one. Optimize the coefficients ki.
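The slide’s scaling formula did not survive extraction, so the following is a minimal sketch of one plausible parameterization consistent with the stated endpoints (all ki = 0 gives the majority classifier, all ki = 1 gives the original model), with the majority class CK absorbing the normalization:

```python
import numpy as np

def linear_scaling(p, k):
    """Linearly rescale class probabilities p; the last entry is the
    majority class C_K. Assumed form: p'_i = k_i * p_i for i < K,
    p'_K = 1 - sum_i k_i * p_i (normalization fixes the majority class)."""
    p = np.asarray(p, dtype=float)
    k = np.asarray(k, dtype=float)
    scaled = k * p[:-1]                    # scale the K-1 non-majority classes
    return np.append(scaled, 1.0 - scaled.sum())

p = [0.30, 0.10, 0.60]                     # model output, C_3 is the majority class
print(linear_scaling(p, [0.0, 0.0]))       # -> [0. 0. 1.]   (majority classifier)
print(linear_scaling(p, [1.0, 1.0]))       # -> [0.3 0.1 0.6] (original model)
```

With ki between 0 and 1 this interpolates between the model and the base-rate prediction, which is exactly the regime where a model scoring below the base rates can be repaired.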
Softmax If ki ∈ [0,1] then the probability of the majority class may only grow. Solution: assume ki ∈ [0,∞), fix kK = 1, and use softmax. This will flatten the probabilities; for 2 classes: P(C|X) ∈ [(1+e)^-1, (1+e^-1)^-1] ≈ [0.27, 0.73].
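A quick sketch of the flattening effect: feeding probabilities (values already in [0,1], rather than unbounded logits) through softmax compresses the two-class output into roughly [0.27, 0.73], since the difference of two probabilities lies in [-1, 1]:

```python
import numpy as np

def softmax(z):
    """Standard softmax with the usual max-subtraction for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Even the most extreme 2-class probability vectors are flattened:
print(softmax([1.0, 0.0]))   # -> [0.731, 0.269]
print(softmax([0.0, 1.0]))   # -> [0.269, 0.731]
```

Multiplying the inputs by coefficients ki > 1 before the softmax re-sharpens the outputs, which is why releasing the upper bound on ki lets non-majority classes grow again.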
Cost function Pi(X) are the “true” probabilities, if given; otherwise Pi(X) = 1 if the label of the training vector X is Ci, and Pi(X) = 0 otherwise. kNNs, Kohonen nets, decision trees, and many fuzzy and rough systems do not minimize such a cost function. Alternative: stacking with a linear perceptron.
Cost function with linear rescaling Due to normalization, only the coefficients k1 … kK-1 remain as free parameters:
Minimum of E(k) - solution An elegant closed-form solution is found in the least-mean-squares (LMS) sense.
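The slide’s derivation was an image lost in extraction; the sketch below assumes the rescaling p'_i = k_i p_i (i < K), p'_K = 1 - Σ k_i p_i from the earlier slide and the quadratic cost E(k) = Σ_X Σ_i (p'_i(X) - P_i(X))², which makes the minimum an ordinary least-squares problem:

```python
import numpy as np

def fit_scaling_lms(P_model, Y):
    """Least-squares fit of scaling coefficients k_1..k_{K-1}.

    P_model: (n, K) model probabilities, last column = majority class C_K.
    Y:       (n,) integer class labels in 0..K-1.
    Each vector X contributes K residuals that are linear in k, so the
    minimum of E(k) is found by np.linalg.lstsq.
    """
    n, K = P_model.shape
    T = np.eye(K)[Y]                        # one-hot "true" probabilities P_i(X)
    rows, targets = [], []
    for x in range(n):
        for i in range(K - 1):              # residual: k_i p_i(X) - P_i(X)
            row = np.zeros(K - 1)
            row[i] = P_model[x, i]
            rows.append(row)
            targets.append(T[x, i])
        rows.append(P_model[x, :K - 1])     # majority-class residual:
        targets.append(1.0 - T[x, K - 1])   # sum_i k_i p_i(X) - (1 - P_K(X))
    k, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return k

# Sanity check: a model that is already perfect needs no rescaling (k = 1)
P = np.array([[1.0, 0.0], [0.0, 1.0]])
print(fit_scaling_lms(P, np.array([0, 1])))   # -> [1.]
```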
Numerical example The primate splice-junction DNA gene sequences: 60 nucleotides; distinguish whether there is an intron => exon boundary, an exon => intron boundary, or neither. 3190 vectors (2000 training + 1190 test). kNN (k=11, Manhattan metric) gave the initial probabilities. Before correction: 85.8% (train), 85.7% (test). After correction: 86.4% (train), 86.9% (test); k1 = 1.0282, k2 = 0.8785. MSE improvement: better probabilities, even if not always correct answers.
Changes in the a priori class distribution The a priori class distribution is different in the training and test data. If the data come from the same process, the class-conditional densities p(X|Ci) stay constant, but the posteriors p(Ci|X) change. Bayes theorem relates the training pt(Ci|X) and test p(Ci|X):
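The formula itself was lost with the slide image; the standard Bayes-theorem correction under the constant-density assumption reweights each posterior by the ratio of new to old priors and renormalizes, which the following sketch implements:

```python
import numpy as np

def prior_correction(p_train, prior_train, prior_test):
    """Correct posteriors when only the class priors change:
    p(Ci|X) ∝ pt(Ci|X) * p(Ci) / pt(Ci), renormalized over classes.
    Valid under the slide's assumption that p(X|Ci) is unchanged."""
    w = np.asarray(prior_test, dtype=float) / np.asarray(prior_train, dtype=float)
    q = np.asarray(p_train, dtype=float) * w
    return q / q.sum()

# Trained on a balanced 50-50 sample, deployed where the real split is 5-95:
print(prior_correction([0.6, 0.4], [0.5, 0.5], [0.05, 0.95]))
# -> approx [0.073, 0.927]: the decision flips to the true majority class
```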
Estimation of a priori probabilities How to estimate the new p(Ci)? Estimate the confusion matrix pt(Ci|Cj) on the training set (McLachlan and Basford 1988); estimate ptest(Cj) by applying the classifier to the test data. Solve the linear equations: Experiment: use an MLP on a small 50-50 training sample.
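The linear equations on the slide were an image; the system being solved couples the observed predicted-class frequencies on the test set to the unknown test priors through the training-set confusion probabilities, as in this sketch (the 2-class numbers are hypothetical):

```python
import numpy as np

def estimate_new_priors(conf_train, pred_freq_test):
    """Solve ptest(predicted = Cj) = sum_i pt(predicted = Cj | true = Ci) * p(Ci)
    for the unknown test priors p(Ci).

    conf_train:     (K, K) matrix; entry [j, i] = probability the classifier
                    predicts Cj given true class Ci, estimated on training data.
    pred_freq_test: (K,) observed frequency of each predicted class on test data.
    """
    p = np.linalg.solve(conf_train, pred_freq_test)
    p = np.clip(p, 0.0, None)          # guard against small negative estimates
    return p / p.sum()

A = np.array([[0.9, 0.2],              # column i = prediction distribution
              [0.1, 0.8]])             # when the true class is Ci
freq = np.array([0.27, 0.73])          # predicted-class frequencies on test data
print(estimate_new_priors(A, freq))    # -> [0.1 0.9]
```

The recovered priors can then be plugged into the Bayes correction of the previous slide.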
What to optimize? Overall accuracy is not always the most important thing to optimize. Given a model M, the confusion matrix for a class C+ versus all other classes is (rows = true, columns = predicted by M):
Quantities derived from p(Ci|Cj) Several quantities are used to evaluate classification models M created to distinguish the C+ class:
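The definitions on the slide were an image; the standard quantities derived from the 2x2 confusion matrix (rows = true class, columns = predicted) are:

```python
def class_metrics(tp, fn, fp, tn):
    """Standard quantities for evaluating a model on class C+:
    tp/fn = true class C+ predicted as +/-, fp/tn = other classes as +/-."""
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # S+, recall for C+
        "specificity": tn / (tn + fp),   # S-
        "precision":   tp / (tp + fp),
    }

m = class_metrics(tp=40, fn=10, fp=5, tn=45)
print(m)   # accuracy 0.85, sensitivity 0.8, specificity 0.9, precision ~0.889
```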
Error functions The best classifier is selected using Recall/Precision curves or ROC curves: Sensitivity vs. (1 - Specificity), i.e. S+ vs. (1 - S-). Confidence in M may be increased by rejecting some cases.
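A minimal sketch of the rejection idea: refuse to classify when the top posterior is below a threshold, trading coverage for confidence on the cases that remain (the threshold value here is hypothetical and application-dependent):

```python
import numpy as np

def predict_with_rejection(probs, threshold=0.8):
    """Return the predicted class index, or None if the maximum posterior
    falls below `threshold` (the case is rejected as too uncertain)."""
    probs = np.asarray(probs, dtype=float)
    return int(probs.argmax()) if probs.max() >= threshold else None

print(predict_with_rejection([0.55, 0.45]))   # -> None (rejected)
print(predict_with_rejection([0.95, 0.05]))   # -> 0
```

Sweeping the threshold traces out an accuracy-rejection curve, a companion to the ROC curve for choosing the operating point.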
Errors and costs Optimization with explicit costs: For a = 0 this is equivalent to maximization of and for large a to the maximization of
Conclusions Applying a trained model in a real-world application does not end with classification; it may be only the beginning. Three types of corrections to optimize the final model have been considered: • a posteriori corrections, improving accuracy by scaling probabilities • restoring the balance between the training and test distributions • improving the confidence, sensitivity or specificity of results. They are especially useful for optimization of logical rules. They may be combined; for example, a posteriori corrections may be applied to the accuracy for a chosen class (sensitivity), to confidence, to cost optimization, etc.