Bias-Variance Trade-Off in Classifier Selection and Optimization
Presentation Transcript
Classification. Derek Hoiem, CS 598, Spring 2009 (Jan 27, 2009)
Outline • Principles of generalization • Survey of classifiers • Project discussion • Discussion of Rosch
Pipeline for Prediction: Imagery → Representation → Classifier → Predictions
Bias and Variance [Figure: error vs. model complexity; low-complexity models have high bias and low variance, high-complexity models have low bias and high variance]
Overfitting • Need a validation set • The validation set is not the same as the test set; see the split sketch below
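To make the validation/test distinction concrete, here is a minimal sketch of a three-way split; the random data and the scikit-learn calls are my own illustration, not part of the lecture.

```python
# Minimal sketch of a train/validation/test split (assumed setup, not from the
# slides). Hyperparameters are tuned on the validation set; the test set is
# touched only once, at the very end.
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)

# Hold out 20% as the final test set, then split the rest 75/25 into train/val.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)
```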
Bias-Variance View of Features • More compact representation = lower variance, potentially higher bias • More features = higher variance, lower bias • More independence among features = simpler classifier = lower variance
How to reduce variance • Parameterize model (e.g., linear vs. piecewise)
How to measure complexity? • VC dimension gives an upper bound on generalization error: with probability $1-\eta$,
$$\text{test error} \le \text{training error} + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}$$
where $N$ is the size of the training set and $h$ is the VC dimension.
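As a worked instance of the bound above, the sketch below evaluates it numerically; the particular values of training error, N, h, and eta are hypothetical.

```python
# A worked instance of the VC bound above, with hypothetical numbers.
import numpy as np

def vc_bound(train_error, N, h, eta=0.05):
    """Upper bound on test error, holding with probability 1 - eta."""
    return train_error + np.sqrt(
        (h * (np.log(2 * N / h) + 1) - np.log(eta / 4)) / N)

# The bound tightens as N grows relative to the VC dimension h.
for N in (1_000, 10_000, 100_000):
    print(f"N={N:6d}  bound={vc_bound(train_error=0.10, N=N, h=100):.3f}")
```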
How to reduce variance • Parameterize model • Regularize • Increase number of training examples
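A tiny sketch of the second bullet: L2 regularization shrinks the learned weights, trading a little bias for lower variance. The synthetic data and Ridge model here are assumptions for illustration.

```python
# Assumed synthetic regression data; stronger L2 penalty -> smaller weights.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=50)

for alpha in (0.0, 1.0, 100.0):  # alpha is the regularization strength
    w = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:6.1f}  ||w|| = {np.linalg.norm(w):.3f}")
```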
Effect of Training Size [Figure: error vs. number of training examples]
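A sketch of how such a curve could be generated with scikit-learn's learning_curve; the classifier and the synthetic data are assumptions, not the lecture's setup.

```python
# Assumed setup: training and validation error as the training set grows.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"N={n:5d}  train error={1 - tr:.3f}  val error={1 - va:.3f}")
```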
Risk Minimization • Margins [Figure: two classes of points in the (x1, x2) plane separated by a margin]
Classifiers • Generative methods • Naïve Bayes • Bayesian Networks • Discriminative methods • Logistic Regression • Linear SVM • Kernelized SVM • Ensemble methods • Randomized Forests • Boosted Decision Trees • Instance-based • K-nearest neighbor • Unsupervised • K-means
Components of classification methods • Objective function • Parameterization • Regularization • Training • Inference
Classifiers: Naïve Bayes • Objective • Parameterization • Regularization • Training • Inference [Figure: graphical model with class label y as the parent of features x1, x2, x3]
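A minimal Gaussian Naive Bayes sketch; the data is assumed. GaussianNB fits one per-class mean and variance per feature, matching the conditional-independence structure of the y → x1, x2, x3 model in the figure.

```python
# Assumed data with 3 features, mirroring the x1, x2, x3 of the figure.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
clf = GaussianNB().fit(X, y)     # training: per-class feature statistics
print(clf.predict_proba(X[:2]))  # inference: posterior via Bayes' rule
```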
Classifiers: Logistic Regression • Objective • Parameterization • Regularization • Training • Inference
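A minimal logistic-regression sketch with assumed data. Objective: regularized log-likelihood; parameterization: a linear weight vector; regularization: L2 via the C parameter (C is the inverse regularization strength, so smaller C means lower variance).

```python
# Assumed data; sklearn's LogisticRegression uses L2 regularization by default.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression(penalty='l2', C=1.0, max_iter=1000).fit(X, y)
print(clf.predict_proba(X[:2]))  # inference: sigmoid of the linear score
```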
Classifiers: Linear SVM • Objective • Parameterization • Regularization • Training • Inference [Figure: linearly separable classes in the (x1, x2) plane with a max-margin boundary]
Classifiers: Linear SVM • Objective • Parameterization • Regularization • Training • Inference • Needs slack when the classes are not linearly separable [Figure: overlapping classes in the (x1, x2) plane; some points violate the margin]
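A minimal soft-margin linear SVM sketch; the data is assumed. The C parameter prices the slack variables: small C permits more margin violations (stronger regularization, lower variance), large C fits the training data more tightly.

```python
# Assumed data; LinearSVC minimizes hinge loss plus an L2 penalty.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
clf = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
print(clf.decision_function(X[:2]))  # inference: signed distance to the boundary
```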
Classifiers: Kernelized SVM • Objective • Parameterization • Regularization • Training • Inference [Figure: points on the x1 axis that are not linearly separable become separable after mapping to (x1, x1²)]
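A sketch mirroring the figure: 1-D points separable only after the lift to (x1, x1²). A degree-2 polynomial kernel performs this lift implicitly; the data here is an assumed toy example.

```python
# Assumed toy data: inner vs. outer points on a line, not linearly separable in x1.
import numpy as np
from sklearn.svm import SVC

x1 = np.linspace(-2, 2, 200).reshape(-1, 1)
y = (np.abs(x1.ravel()) > 1).astype(int)

clf = SVC(kernel='poly', degree=2, coef0=1).fit(x1, y)  # implicit (x1, x1^2) lift
print(clf.score(x1, y))  # separable in the lifted space
```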
Classifiers: Decision Trees • Objective • Parameterization • Regularization • Training • Inference [Figure: two classes in the (x1, x2) plane partitioned by axis-aligned splits]
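A minimal decision-tree sketch with assumed data. Training greedily picks axis-aligned splits; max_depth is the regularizer that controls the bias/variance trade-off.

```python
# Assumed data; a shallow tree has higher bias but lower variance.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict(X[:2]))  # inference: walk from the root to a leaf
```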
Ensemble Methods: Boosting (figure from Friedman et al. 2000)
Boosted Decision Trees [Figure: an ensemble of shallow decision trees with node tests such as "High in image?", "Gray?", "Smooth?", "Green?", "Many long lines?", "Blue?", and "Very high vanishing point?", combined to estimate P(label | good segment, data) for the labels Ground, Vertical, and Sky] [Collins et al. 2002]
Boosted Decision Trees • How to control bias/variance trade-off • Size of trees • Number of trees
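A sketch of boosted shallow trees with assumed data, exposing the two knobs named above: the size of the trees (max_depth of the base learner) and the number of trees (n_estimators).

```python
# Assumed data; AdaBoost over depth-2 trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),  # size of trees
    n_estimators=100,                               # number of trees
).fit(X, y)  # note: the keyword was base_estimator before scikit-learn 1.2
print(clf.score(X, y))
```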
K-nearest neighbor • Objective • Parameterization • Regularization • Training • Inference [Figure: two classes of points in the (x1, x2) plane; a query point is labeled by its nearest neighbors]
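A minimal k-NN sketch with assumed data. There is no training beyond storing the examples; k is the bias/variance knob (k = 1 gives low bias and high variance, large k smooths the decision boundary).

```python
# Assumed data; "training" just memorizes the examples.
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict(X[:2]))  # inference: majority vote among the 5 nearest points
```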
Clustering [Figure: two panels of points in the (x1, x2) plane showing unlabeled data grouped into clusters]
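A minimal k-means sketch matching the unsupervised entry in the classifier survey: no labels, just grouping points by proximity. The blob data is assumed for illustration.

```python
# Assumed data: three Gaussian blobs, clustered without labels.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # learned cluster centers
```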
References • General • Tom Mitchell, Machine Learning, McGraw-Hill, 1997 • Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995 • AdaBoost • Friedman, Hastie, and Tibshirani, "Additive logistic regression: a statistical view of boosting", Annals of Statistics, 2000 • SVMs • http://www.support-vector.net/icml-tutorial.pdf