This paper discusses the necessity and implementation of a posteriori corrections in classification models to improve their accuracy, particularly in real-life scenarios where class distributions differ significantly from controlled experiments. It examines various methods to adjust probability estimates, optimize performance, and balance training and test data distributions. By employing techniques such as linear scaling and softmax adjustments, researchers can refine model outputs, increasing sensitivity, specificity, and overall confidence in predictions. Practical examples illustrate the efficacy of these corrective measures.
A Posteriori Corrections to Classification Methods Włodzisław Duch & Łukasz Itert Department of Informatics, Nicholas Copernicus University, Torun, Poland. http://www.phys.uni.torun.pl/kmk
Motivation So you’ve got your model … but that’s not the end. Try to derive as much information from it as possible. • In pathological cases NN, FS, RS and other systems lead to results below the base rates. How to avoid it? • In a controlled experiment the split was 50%-50%; in real life it is 5%-95%. How to deal with it? • So your model is accurate; that doesn’t impress me much. How about the costs? Confidence in results? Sensitivity? Specificity? Can you improve it quickly? A posteriori corrections may help and are (almost) for free.
Corrections increasing accuracy NN, kNN, FIS, RS and other systems do not estimate probabilities rigorously, but some estimates of p(Ci|X) are obtained. Many systems do not optimize error functions. Idea: linear scaling of probabilities: K classes, CK is the majority class. ki = 0 gives the majority classifier, ki = 1 gives the original one. Optimize the coefficients ki.
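The slide’s scaling formula did not survive extraction, so the following is a minimal sketch of one plausible parameterization consistent with the stated endpoints (all ki = 0 gives the majority classifier, all ki = 1 gives the original model), with the majority class CK absorbing the normalization:

```python
import numpy as np

def linear_scaling(p, k):
    """Linearly rescale class probabilities p; the last entry is the
    majority class C_K. Assumed form: p'_i = k_i * p_i for i < K,
    p'_K = 1 - sum_i k_i * p_i (normalization fixes the majority class)."""
    p = np.asarray(p, dtype=float)
    k = np.asarray(k, dtype=float)
    scaled = k * p[:-1]                    # scale the K-1 non-majority classes
    return np.append(scaled, 1.0 - scaled.sum())

p = [0.30, 0.10, 0.60]                     # model output, C_3 is the majority class
print(linear_scaling(p, [0.0, 0.0]))       # -> [0. 0. 1.]   (majority classifier)
print(linear_scaling(p, [1.0, 1.0]))       # -> [0.3 0.1 0.6] (original model)
```

With ki between 0 and 1 this interpolates between the model and the base-rate prediction, which is exactly the regime where a model scoring below the base rates can be repaired.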
Softmax If ki ∈ [0,1] then the probability of the majority class may only grow. Solution: assume ki ∈ [0,∞), fix kK = 1, and use softmax. This will flatten the probabilities; for 2 classes: P(C|X) ∈ [(1+e)^-1, (1+e^-1)^-1] ≈ [0.27, 0.73].
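A quick sketch of the flattening effect: feeding probabilities (values already in [0,1], rather than unbounded logits) through softmax compresses the two-class output into roughly [0.27, 0.73], since the difference of two probabilities lies in [-1, 1]:

```python
import numpy as np

def softmax(z):
    """Standard softmax with the usual max-subtraction for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

# Even the most extreme 2-class probability vectors are flattened:
print(softmax([1.0, 0.0]))   # -> [0.731, 0.269]
print(softmax([0.0, 1.0]))   # -> [0.269, 0.731]
```

Multiplying the inputs by coefficients ki > 1 before the softmax re-sharpens the outputs, which is why releasing the upper bound on ki lets non-majority classes grow again.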
Cost function Pi(X) are the “true” probabilities, if given; otherwise Pi(X) = 1 if the label of the training vector X is Ci, and Pi(X) = 0 otherwise. kNNs, Kohonen nets, decision trees, and many fuzzy and rough systems do not minimize such a cost function. Alternative: stacking with a linear perceptron.
Cost function with linear rescaling Due to normalization, only the coefficients k1 … kK-1 remain as free parameters:
Minimum of E(k) - solution An elegant closed-form solution is found in the least-mean-squares (LMS) sense.
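The slide’s derivation was an image lost in extraction; the sketch below assumes the rescaling p'_i = k_i p_i (i < K), p'_K = 1 - Σ k_i p_i from the earlier slide and the quadratic cost E(k) = Σ_X Σ_i (p'_i(X) - P_i(X))², which makes the minimum an ordinary least-squares problem:

```python
import numpy as np

def fit_scaling_lms(P_model, Y):
    """Least-squares fit of scaling coefficients k_1..k_{K-1}.

    P_model: (n, K) model probabilities, last column = majority class C_K.
    Y:       (n,) integer class labels in 0..K-1.
    Each vector X contributes K residuals that are linear in k, so the
    minimum of E(k) is found by np.linalg.lstsq.
    """
    n, K = P_model.shape
    T = np.eye(K)[Y]                        # one-hot "true" probabilities P_i(X)
    rows, targets = [], []
    for x in range(n):
        for i in range(K - 1):              # residual: k_i p_i(X) - P_i(X)
            row = np.zeros(K - 1)
            row[i] = P_model[x, i]
            rows.append(row)
            targets.append(T[x, i])
        rows.append(P_model[x, :K - 1])     # majority-class residual:
        targets.append(1.0 - T[x, K - 1])   # sum_i k_i p_i(X) - (1 - P_K(X))
    k, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return k

# Sanity check: a model that is already perfect needs no rescaling (k = 1)
P = np.array([[1.0, 0.0], [0.0, 1.0]])
print(fit_scaling_lms(P, np.array([0, 1])))   # -> [1.]
```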
Numerical example The primate splice-junction DNA gene sequences: 60 nucleotides; distinguish whether there is an intron => exon boundary, an exon => intron boundary, or neither. 3190 vectors (2000 training + 1190 test). kNN (k=11, Manhattan metric) gave the initial probabilities. Before correction: 85.8% (train), 85.7% (test). After correction: 86.4% (train), 86.9% (test); k1 = 1.0282, k2 = 0.8785. MSE improvement: better probabilities, even if not always correct answers.
Changes in the a priori class distribution The a priori class distribution is different in the training and test data. If the data come from the same process, the class-conditional densities p(X|Ci) stay constant, but the posteriors p(Ci|X) change. Bayes theorem relates the training pt(Ci|X) and test p(Ci|X):
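The formula itself was lost with the slide image; the standard Bayes-theorem correction under the constant-density assumption reweights each posterior by the ratio of new to old priors and renormalizes, which the following sketch implements:

```python
import numpy as np

def prior_correction(p_train, prior_train, prior_test):
    """Correct posteriors when only the class priors change:
    p(Ci|X) ∝ pt(Ci|X) * p(Ci) / pt(Ci), renormalized over classes.
    Valid under the slide's assumption that p(X|Ci) is unchanged."""
    w = np.asarray(prior_test, dtype=float) / np.asarray(prior_train, dtype=float)
    q = np.asarray(p_train, dtype=float) * w
    return q / q.sum()

# Trained on a balanced 50-50 sample, deployed where the real split is 5-95:
print(prior_correction([0.6, 0.4], [0.5, 0.5], [0.05, 0.95]))
# -> approx [0.073, 0.927]: the decision flips to the true majority class
```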
Estimation of a priori probabilities How to estimate the new p(Ci)? Estimate the confusion matrix pt(Ci|Cj) on the training set (McLachlan and Basford 1988); estimate ptest(Cj) by applying the classifier to the test data. Solve the linear equations: Experiment: use an MLP on a small 50-50 training sample.
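The linear equations on the slide were an image; the system being solved couples the observed predicted-class frequencies on the test set to the unknown test priors through the training-set confusion probabilities, as in this sketch (the 2-class numbers are hypothetical):

```python
import numpy as np

def estimate_new_priors(conf_train, pred_freq_test):
    """Solve ptest(predicted = Cj) = sum_i pt(predicted = Cj | true = Ci) * p(Ci)
    for the unknown test priors p(Ci).

    conf_train:     (K, K) matrix; entry [j, i] = probability the classifier
                    predicts Cj given true class Ci, estimated on training data.
    pred_freq_test: (K,) observed frequency of each predicted class on test data.
    """
    p = np.linalg.solve(conf_train, pred_freq_test)
    p = np.clip(p, 0.0, None)          # guard against small negative estimates
    return p / p.sum()

A = np.array([[0.9, 0.2],              # column i = prediction distribution
              [0.1, 0.8]])             # when the true class is Ci
freq = np.array([0.27, 0.73])          # predicted-class frequencies on test data
print(estimate_new_priors(A, freq))    # -> [0.1 0.9]
```

The recovered priors can then be plugged into the Bayes correction of the previous slide.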
What to optimize? Overall accuracy is not always the most important thing to optimize. Given a model M, the confusion matrix for a class C+ versus all other classes is (rows = true, columns = predicted by M):
Quantities derived from p(Ci|Cj) Several quantities are used to evaluate classification models M created to distinguish the C+ class:
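The definitions on the slide were an image; the standard quantities derived from the 2x2 confusion matrix (rows = true class, columns = predicted) are:

```python
def class_metrics(tp, fn, fp, tn):
    """Standard quantities for evaluating a model on class C+:
    tp/fn = true class C+ predicted as +/-, fp/tn = other classes as +/-."""
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # S+, recall for C+
        "specificity": tn / (tn + fp),   # S-
        "precision":   tp / (tp + fp),
    }

m = class_metrics(tp=40, fn=10, fp=5, tn=45)
print(m)   # accuracy 0.85, sensitivity 0.8, specificity 0.9, precision ~0.889
```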
Error functions The best classifier is selected using Recall/Precision curves or ROC curves: Sensitivity vs. (1 - Specificity), i.e. S+ vs. (1 - S-). Confidence in M may be increased by rejecting some cases.
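A minimal sketch of the rejection idea: refuse to classify when the top posterior is below a threshold, trading coverage for confidence on the cases that remain (the threshold value here is hypothetical and application-dependent):

```python
import numpy as np

def predict_with_rejection(probs, threshold=0.8):
    """Return the predicted class index, or None if the maximum posterior
    falls below `threshold` (the case is rejected as too uncertain)."""
    probs = np.asarray(probs, dtype=float)
    return int(probs.argmax()) if probs.max() >= threshold else None

print(predict_with_rejection([0.55, 0.45]))   # -> None (rejected)
print(predict_with_rejection([0.95, 0.05]))   # -> 0
```

Sweeping the threshold traces out an accuracy-rejection curve, a companion to the ROC curve for choosing the operating point.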
Errors and costs Optimization with explicit costs: For a = 0 this is equivalent to maximization of and for large a to the maximization of
Conclusions Applying a trained model in a real-world application does not end with classification; it may be only the beginning. Three types of corrections to optimize the final model have been considered: • a posteriori corrections, improving accuracy by scaling probabilities • restoring the balance between the training and test distributions • improving the confidence, sensitivity or specificity of results. They are especially useful for optimization of logical rules. They may be combined; for example, a posteriori corrections may be applied to the accuracy for a chosen class (sensitivity), to confidence, to cost optimization, etc.