
Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods


Presentation Transcript


  1. Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods. Charles Rosenberg, Thesis Oral, May 10, 2004. Thesis Committee: Martial Hebert (co-chair), Sebastian Thrun (co-chair), Henry Schneiderman, Avrim Blum, Tom Minka (Microsoft Research).

  2. Motivation: Object Detection [example eye detections from the Schneiderman detector]
  • Modern object detection systems "work".
  • Lots of manually labeled training data is required.
  • How can we reduce the cost of training data?

  3. Approach: Semi-Supervised Training
  • Supervised training: costly, fully labeled data.
  • Semi-supervised training: fully and weakly labeled data.
  • Goal: develop a semi-supervised approach for the object detection problem and characterize the issues.

  4. What is Semi-Supervised Training?
  • Supervised training: the standard training approach; training with fully labeled data.
  • Semi-supervised training: training with a combination of fully labeled data and unlabeled or weakly labeled data.
  • Weakly labeled data: certain label values are unknown (e.g. the object is present, but its location and scale are unknown); labeling is relatively "cheap".
  • Unlabeled data: no label information is known.

  5. Issues for Object Detection
  • What semi-supervised approaches are applicable? They must handle the unique characteristics of the object detection problem and be compatible with existing detector implementations.
  • What are the practical concerns? Object detector interactions, training data issues, and detector parameter settings.
  • What kind of performance gain is possible? How much labeled training data is needed?

  6. Contributions
  • Devised an approach that achieves substantial performance gains through semi-supervised training.
  • Comprehensive evaluation of semi-supervised training applied to object detection.
  • Detailed characterization and comparison of the semi-supervised approaches used.

  7. Presentation Outline • Introduction • Background • Semi-supervised Training Approach • Analysis: Filter Based Detector • Analysis: Schneiderman Detector • Conclusions and Future Work

  8. What is Unique About Object Detection?
  • Complex feature set: high dimensional, continuous, with a complex distribution.
  • Large inherent variation: lighting, viewpoint, scale, location, etc.
  • Many examples per training image: many negative examples and a very small number of positive examples; negative examples are free.
  • Large class overlap: the object class is a "subset" of the clutter class.

  9. Background
  • Graph-based approaches: a graph is constructed to represent the relationships between the labeled and unlabeled data (the construction method is important); edges in the graph are weighted according to a distance measure. Blum, Chawla, ICML 2001; Szummer, Jaakkola, NIPS 2001; Zhu, Ghahramani, Lafferty, ICML 2003.
  • Information regularization: explicit about the information transferred from P(X) to P(Y|X). Szummer, Jaakkola, NIPS 2002; Corduneanu, Jaakkola, UAI 2003.
  • Multiple instance learning: addresses multiple examples per data element. Dietterich, Lathrop, Lozano-Perez, AI 1997; Maron, Lozano-Perez, NIPS 1998; Zhang, Goldman, NIPS 2001.
  • Transduction and other methods…

  10. Presentation Outline • Introduction • Background • Semi-supervised Training Approach • Analysis: Filter Based Detector • Analysis: Schneiderman Detector • Conclusions and Future Work

  11. Semi-Supervised Training Approaches
  • Expectation-Maximization (EM): a batch algorithm; all data is processed each iteration; soft class assignments (a likelihood distribution over class labels, recomputed each iteration).
  • Self-training: an incremental algorithm; data is added to the active pool at each iteration; hard class assignments (the most likely class is assigned, and labels do not change once assigned).

  12. Semi-Supervised Training with EM (Dempster, Laird, Rubin, 1977; Nigam, McCallum, Thrun, Mitchell, 1999)
  • Train the initial detector model with the initial labeled data set.
  • Expectation step: run the detector on the weakly labeled set and compute the most likely detection; compute the expected statistics of the fully labeled examples and of the weakly labeled examples weighted by their class likelihoods.
  • Maximization step: update the parameters of the detection model.
  • Repeat for a fixed number of iterations or until convergence.
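  As a concrete reference, here is a minimal Python sketch of this EM loop. The `Detector` interface (`fit` with sample weights, `predict_proba` returning per-class likelihoods) is hypothetical and not from the thesis; it is only meant to make the E and M steps explicit.

```python
import numpy as np

def em_semi_supervised(detector, labeled_X, labeled_y, weak_X, n_iters=10):
    """EM-style semi-supervised training (sketch, hypothetical interface)."""
    detector.fit(labeled_X, labeled_y)  # initial model from fully labeled data
    n_classes = len(np.unique(labeled_y))
    for _ in range(n_iters):
        # E-step: soft class assignments (likelihoods) for weakly labeled data
        probs = detector.predict_proba(weak_X)          # (n_weak, n_classes)
        # M-step: refit on labeled data (weight 1) plus each weak example
        # replicated once per class, weighted by its class likelihood
        X = np.concatenate([labeled_X] + [weak_X] * n_classes)
        y = np.concatenate([labeled_y] +
                           [np.full(len(weak_X), c) for c in range(n_classes)])
        w = np.concatenate([np.ones(len(labeled_X))] +
                           [probs[:, c] for c in range(n_classes)])
        detector.fit(X, y, sample_weight=w)
    return detector
```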

  13. Semi-Supervised Training with Self-Training (Nigam, Ghani, 2000; Moreno, Agarwal, ICML 2003)
  • Train the detector model with the labeled data set.
  • Run the detector on the weakly labeled set and compute the most likely detection.
  • Score each detection with the selection metric.
  • Select the m best-scoring examples and add them to the labeled training set.
  • Repeat until the weakly labeled data is exhausted or until some other stopping criterion is met.
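  A matching sketch of the self-training loop, under the same hypothetical interface; `best_detection` (returning the most likely detection with its features and hard label) and `score_fn` (the selection metric, sketched after the next slide) are assumptions, not the thesis's API.

```python
def self_train(detector, labeled_X, labeled_y, weak_pool, score_fn, m=5):
    """Incremental self-training (sketch). Class assignments are hard, and
    once an example enters the labeled set its label never changes."""
    X, y = list(labeled_X), list(labeled_y)
    pool = list(weak_pool)
    while pool:  # or until some other stopping criterion is met
        detector.fit(X, y)
        # most likely detection in each weakly labeled image
        dets = [detector.best_detection(img) for img in pool]
        # score each detection with the selection metric
        scores = [score_fn(det, X) for det in dets]
        # move the m best-scoring examples into the labeled training set
        best = sorted(range(len(pool)), key=scores.__getitem__, reverse=True)[:m]
        for i in best:
            X.append(dets[i].features)
            y.append(dets[i].label)
        pool = [img for i, img in enumerate(pool) if i not in set(best)]
    return detector
```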

  14. Self-Training Selection Metrics
  • Nearest neighbor (NN) distance: score = minimum distance between the detection and the labeled examples.
  • Detector confidence: score = detection confidence; intuitively appealing, but can prove problematic in practice.
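  In code, the two metrics reduce to simple scoring functions, where higher scores mean "select first". This sketch uses plain Euclidean distance for the NN metric and hypothetical `detection.confidence` / `detection.features` attributes; the actual distance measure varies by detector (e.g. Mahalanobis later in the talk).

```python
import numpy as np

def confidence_score(detection, labeled_features=None):
    # detector-confidence metric: the detection's own confidence / odds ratio
    return detection.confidence

def nn_score(detection, labeled_features):
    # NN metric: negated minimum distance to any labeled example, so that
    # detections closest to the labeled data score highest
    dists = np.linalg.norm(np.asarray(labeled_features) - detection.features,
                           axis=1)
    return -dists.min()
```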

  15.–20. Selection Metric Behavior [animation sequence comparing how the confidence metric and the nearest-neighbor (NN) metric select unlabeled points in a two-class example; legend: class 1, unlabeled, class 2]

  21. Semi-Supervised Training & Computer Vision
  • EM approaches: S. Baluja, Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data, NIPS 1998; R. Fergus, P. Perona, A. Zisserman, Object Class Recognition by Unsupervised Scale-Invariant Learning, CVPR 2003.
  • Self-training: A. Selinger, Minimally Supervised Acquisition of 3D Recognition Models from Cluttered Images, CVPR 2001.
  • Summary: reasonable performance improvements are reported, but in one-off experiments, with no insight into the issues or general application.

  22. Presentation Outline • Introduction • Background • Semi-supervised Training Approach • Analysis: Filter Based Detector • Analysis: Schneiderman Detector • Conclusions and Future Work

  23. Filter Based Detector [block diagram: input image → filter bank → feature vector f_i at each pixel location x_i → separate Gaussian mixture models for the object (M_o) and clutter (M_c) classes]

  24. Filter Based Detector Overview
  • Input features and model: features = the output of 20 filters at each pixel location; generative model = a separate Gaussian mixture model for the object and clutter classes; a single model is used for all locations on the object.
  • Detection: compute the filter responses and the likelihood under the object and clutter models at each pixel location; a "spatial model" is used to aggregate pixel responses into object-level responses.
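  A runnable sketch of the per-pixel generative scoring, using scikit-learn GMMs on synthetic stand-in data; the component counts, covariance type, and data here are assumptions, not the thesis's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Stand-in training data: (n_pixels, 20) filter-bank responses per class.
rng = np.random.default_rng(0)
object_feats = rng.normal(0.5, 1.0, size=(5000, 20))
clutter_feats = rng.normal(0.0, 1.0, size=(20000, 20))

# One GMM per class; 8 diagonal components is an arbitrary choice.
obj_gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(object_feats)
clt_gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(clutter_feats)

def log_likelihood_ratio(feats):
    """Per-pixel log P(f | object) - log P(f | clutter)."""
    return obj_gmm.score_samples(feats) - clt_gmm.score_samples(feats)

llr = log_likelihood_ratio(rng.normal(size=(100, 20)))  # one value per pixel
```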

  25. Spatial Model [diagram: training images + object masks → spatial model; per-pixel log likelihood ratio maps combined with the spatial model → example detection]
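  One plausible reading of this aggregation step is a mask-weighted sum of per-pixel log-likelihood ratios, implemented here as cross-correlation; this is a sketch under that assumption and may differ from the thesis's exact spatial model.

```python
import numpy as np
from scipy.signal import correlate2d

def object_score_map(llr_map, spatial_mask):
    """Slide the object-shaped spatial mask over the per-pixel
    log-likelihood-ratio map; each output value is the mask-weighted sum
    of pixel evidence for an object centered at that location."""
    return correlate2d(llr_map, spatial_mask, mode="valid")

# Example: a 5x5 uniform mask over a random 32x32 LLR map
rng = np.random.default_rng(0)
scores = object_score_map(rng.normal(size=(32, 32)), np.ones((5, 5)) / 25)
```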

  26. Typical Example Filter Model Detections [sample detection plots and log likelihood ratio plots]

  27. Filter Based Detector Overview
  • Fully supervised training: a fully labeled example = image + pixel mask; the Gaussian mixture model parameters are trained; the spatial model is trained from the pixel masks.
  • Semi-supervised training: a weakly labeled example = an image containing the object; the initial model is trained using the fully labeled object and clutter data; the spatial model and the clutter class model are fixed once trained with the initial labeled data set; EM and self-training variants are evaluated.

  28. Self-Training Selection Metrics
  • Nearest neighbor (NN) selection metric: selection is by distance to the closest labeled example; the distance is based on a model of each weakly labeled example.
  • Confidence-based selection metric: selection is by the detector odds ratio.

  29. Filter Based Experiment Details
  • Training data: 12 images of a desktop telephone + clutter, viewpoints within +/- 90 degrees, roughly constant scale and lighting conditions; 96 clutter-only images.
  • Experimental variations: 12 repetitions with different fully/weakly labeled training data splits.
  • Testing data: 12 images, a disjoint set, similar imaging conditions. [examples of a correct and an incorrect detection]

  30. Example Filter Model Results [detection maps: labeled data only, expectation-maximization, self-training with the confidence metric, self-training with the NN metric]

  31. Single Image Semi-Supervised Results
  • Labeled data only = 26.7%
  • Expectation-Maximization = 19.2%
  • Confidence metric = 34.2%
  • 1-NN selection metric = 47.5%

  32. Two Image Semi-Supervised Results [reference image paired with a close, near, or far view]
  • Labeled data only + near pair = 52.5%
  • 4-NN metric + near pair = 85.8%

  33. Presentation Outline • Introduction • Background • Semi-supervised Training Approach • Analysis: Filter Based Detector • Analysis: Schneiderman Detector • Conclusions and Future Work

  34. Example Schneiderman Face Detections

  35. Schneiderman Detector Details (Schneiderman 1998, 2000, 2003, 2004)
  • Detection process: wavelet transform → feature construction → classifier → search over location + scale.
  • Training process: wavelet transform → feature search / feature selection → AdaBoost.

  36. Schneiderman Detector Training Data
  • Fully supervised training: fully labeled examples with landmark locations.
  • Semi-supervised training: a weakly labeled example = an image containing the object; the initial model is trained using the fully labeled data; variants of self-training are evaluated.

  37. Self-Training Selection Metrics [labeled images vs. candidate image]
  • Confidence-based selection metric: the classifier output / odds ratio.
  • Nearest neighbor selection metric: preprocessing = high-pass filter + normalized variance; Mahalanobis distance to the closest labeled example.
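  A sketch of this NN metric: high-pass filtering via subtraction of a Gaussian blur, variance normalization, and a diagonal-covariance Mahalanobis distance. The blur scale and the diagonal covariance estimate are assumptions, not the thesis's exact preprocessing.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(patch, sigma=2.0):
    """High-pass filter (subtract a Gaussian blur), then normalize the
    result to zero mean and unit variance; sigma is an assumption."""
    hp = patch - gaussian_filter(patch, sigma)
    return (hp - hp.mean()) / (hp.std() + 1e-8)

def nn_mahalanobis_score(candidate, labeled_patches):
    """Negated Mahalanobis distance to the closest labeled example,
    using a diagonal covariance estimated from the labeled patches."""
    X = np.stack([preprocess(p).ravel() for p in labeled_patches])
    var = X.var(axis=0) + 1e-8                 # diagonal covariance
    c = preprocess(candidate).ravel()
    d2 = ((X - c) ** 2 / var).sum(axis=1)      # squared Mahalanobis distance
    return -np.sqrt(d2.min())
```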

  38. Schneiderman Experiment Details
  • Training data: 231 images from the FERET data set and the web; multiple eyes per image = 480 training examples; 80 synthetic variations each (position, scale, orientation); native object resolution = 24x16 pixels; 15,000 non-object examples from clutter images.

  39. Schneiderman Experiment Details
  • Evaluation metric: a detection within +/- 0.5 object radius and +/- 1 scale octave is counted as correct.
  • Performance measure: area under the ROC curve (AUC), where the ROC (receiver operating characteristic) curve plots detection rate in percent vs. the number of false positives.
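  A minimal sketch of computing AUC from such a curve via the trapezoidal rule; normalizing by the maximum false-positive count (to map the area into [0, 1]) is an assumption here, not necessarily the thesis's convention.

```python
import numpy as np

def roc_auc(detection_rates, false_positive_counts, max_fp):
    """Area under a detection-rate vs. false-positive-count ROC curve,
    normalized by the false-positive range."""
    order = np.argsort(false_positive_counts)
    fp = np.asarray(false_positive_counts, dtype=float)[order]
    dr = np.asarray(detection_rates, dtype=float)[order]
    return np.trapz(dr, fp) / max_fp

# Example: detection rate measured at increasing false-positive counts
print(roc_auc([0.60, 0.75, 0.85, 0.90], [0, 5, 10, 20], max_fp=20))
```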

  40. Schneiderman Experiment Details
  • Experimental variations: 5-10 runs with random data splits per experiment.
  • Experimental complexity: training the detector = one iteration; one iteration = 12 CPU hours on a 2 GHz class machine; one run = 10 iterations = 120 CPU hours = 5 CPU days; one experiment = 10 runs = 50 CPU days; all experiments together took approximately 3 CPU years.
  • Testing data: a separate set of 44 images with 102 examples.

  41.–42. Example Detection Results [image pairs comparing fully labeled data only vs. fully labeled + weakly labeled data]

  43. Performance vs. Fully Labeled Data Set Size: when can weakly labeled data help? [plot: full-data normalized AUC vs. fully labeled training set size on a log scale]
  • Three regimes of operation: saturated, smooth, and failure.
  • Weakly labeled data can help in the "smooth" regime.

  44. Confidence Metric Self-Training AUC Performance [plot: full-data normalized AUC vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
  • Improved performance over the range of data set sizes.
  • Not all improvements are significant at the 95% level.

  45. NN Metric Self-Training AUC Performance [plot: full-data normalized AUC vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
  • Improved performance over the range of data set sizes.
  • All improvements are significant at the 95% level.

  46. MSE Metric Changes to Self-Training Behavior [plots: base-data normalized AUC vs. iteration number, for confidence metric performance and NN metric performance]
  • The NN metric performance trend is level or upward.

  47.–48. Example Training Image Progression [selected training images over self-training iterations 1–5, each annotated with the detector score under each metric; NN metric scores (0.822, 0.867, 0.882, 0.922, 0.931, 0.906) trend upward while confidence metric scores (0.822, 0.770, 0.798, 0.798, 0.745, 0.759) drift downward]

  49. How much weakly labeled data is used? [plots: weakly labeled data set size, and the ratio of weakly to fully labeled data, vs. fully labeled training set size (24, 30, 34, 40, 48, 60)]
  • The amount used is relatively constant over the initial data set size.

  50. Presentation Outline • Introduction • Background • Semi-supervised Training Approach • Analysis: Filter Based Detector • Analysis: Schneiderman Detector • Conclusions and Future Work
