Learning from labelled and unlabeled data

Learning from labelled and unlabeled data Semi-Supervised Learning Machine Learning – PDEEC 2008/2009 Filipe Tiago Alves de Magalhães 08/09/2014

Semi-Supervised Learning Supervised Learning Unsupervised Learning Labbeled + unlabeled data discover patterns in the data that relate data attributes with a target (class) attribute. The data have no target attribute (unlabeled). Semi-Supervised Learning Typically, plenty of unlabeled data available. We want to explore the data to find some intrinsic structures in them. • These patterns are then • utilized to predict the • values of the target • attribute in future • data instances. Tries to improve the predictive power using both labelled and unlabeled data. (Expected to be better than using one alone)

Semi-Supervised Learning Unlabeled data is easy to obtain Labelled data can be difficult to obtain - human annotation is boring - may require experts - may require special equipment - very time-consuming • Examples: • Web page classification (billions of pages) • Email classification (SPAM or No-SPAM) • Speech annotation (400h for each hour of conversation) • …

Semi-Supervised Learning Semi-Supervised learning can be seen as an excellent way to improve the results that we would get using exclusively supervised or non-supervised methods, for the same scenario. Although we (or specialists) do not need to spend such a big effort labelling data, a great concern must be faced for the design of good models, feature extraction, kernels definition.

Semi-Supervised Learning Sometimes, it may not be so hard to label data… www.espgame.org Tries to guess the user’s gender based on his/her choices. After that, we tell if it was right or wrong Takes advantage of player’s intervention in order to enrich the training of automatic learning algorithms

Semi-Supervised Self-Training of Object Detection Models Chuck Rosenberg Google, Inc. Martial Hebert Carnegie Mellon University Henry Schneiderman Carnegie Mellon University 7th IEEE Workshops on Application of Computer Vision (WACV/MOTION'05) 2005

Semi-Supervised Learning Self-Training L = (Xi , Yi ) Set of labelled data U = (Xi , ? ) Set of unlabeled data • Algorithm • Repeat • Train a classifier C with training data L • Classify data in U with C • Find a subset U’ of U with the most confident scores • L + U’  L • U – U’  U

Semi-Supervised Self-Training of Object Detection Models Object detection Object detection based on its shape - time-consuming - exhaustive labelling (background, foreground, object, non-object) Try to simplify the collection and preparation of training data - combining data labelled in different ways - labelling of each image region can take the form of a probability distribution over labels (“weakly” labelled) - e.g., is more likely that the object is present in the centre of the image - e.g., a certain image has a high likelihood of containing the object, but its position is unknown.

Semi-Supervised Self-Training of Object Detection Models Training Approaches Generic detection algorithm for classification of a subwindow in an image as being part of the “object” class or the “clutter/everything else” class If X – image feature vectors xi – data at a specific location in the image (i = {1, … ,n} indexes images locations) Y – class f – foreground b – background θf – parameters of the foreground model θb – parameters of the background model

Semi-Supervised Self-Training of Object Detection Models Training Approaches EM approach

Semi-Supervised Self-Training of Object Detection Models Training Approaches EM approach • There are many reasons why EM may not perform well in a particular semi-supervised training context. • EM solely finds a set of model parameters which maximize the likelihood of the data. • - Fully labeled data may not sufficiently constrain the solution, which means that there may be solutions which maximize the data likelihood but do not optimize classification performance.

Semi-Supervised Self-Training of Object Detection Models Training Approaches Alternative

Semi-Supervised Self-Training of Object Detection Models Detector Overview (Experimental Setup) Subwindow is processed for lighting correction Two-level wavelet transform is applied Features are computed by vector quantizing groups of wavelet coefficients Subwindow is classified by thresholding a linear combination of the log-likelihood ratios of the features Cascade architecture → only image patches which are accepted by the first detector are passed on to the next

Semi-Supervised Self-Training of Object Detection Models Data (Experimental Setup) Landmark used on a typical training image sample training images and the training examples associated with them Set with positive examples – 231 images 480 training examples Independent test set – 44 images 102 test examples 15000 negative examples Training examples – 24 x 16 pixels (rotated, scaled and cropped) 200-300 pixels high and 300-400 pixels wide

Semi-Supervised Self-Training of Object Detection Models Training (Experimental Setup) Training the model with fully labeled data consists of the following steps: • Given the training data landmark locations • geometrically normalize the training example subimages; • apply lighting normalization to the subimages; • generate synthetic training examples (scaling, shifting and rotating) • Compute the wavelet transform of the subimages • Quantize each group of wavelet coefficients and build a naïve Bayes model with respect to each group to discriminate between positive and negative examples • Adjust the naïve Bayes model using boosting, but maintaining a linear decision function, effectively performing gradient descent on the margin • Compute a ROC curve for the detector using a cross validation set • Choose a threshold for the linear function, based on the final performance desired

Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experimental Setup) Selection metric is crucial to the performance of the training • Confidence selection • Computed at every iteration by applying the detector trained from the current set of labelled data to the weakly labelled data set. • Detection with highest confidence is selected and added to the training set • MSE selection • Is calculated for each weakly labelled example by evaluating the distance between the corresponding image window and all of the other templates in the training data (including the original labelled examples and the weakly labelled examples added in prior iterations)

Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experimental Setup) The candidate image and the labeled images are first normalized with a specific set of processing steps before the MSE based score metric is computed. The score is based on the Mahalanobis distance

Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experimental Setup) position Detector MSE selection metric scale The detector must be accurate in localization but need not be accurate in detection since false detection will be discarded due to their large MSE distances to all of the training examples. This is crucial to ensure the performance of the training algorithm with small initial training sets. This is also part of the reason for the MSE to outperform the confidence metric, which requires the detector to be accurate in both localization and detection performance.

Semi-Supervised Self-Training of Object Detection Models Experiment Scenarios (Experiments and Analysis) Each experiment was repeated using a different initial random subset, in order to avoid the variance that was being observed in the detector performance and in the behaviour of the semi-supervised training process. Experiment = specific set of experimental conditions Run = each repetition of that experiment Mostly, 5 runs were performed for each experiment Typically, 20 weakly labelled images were added to the training set at each iteration, because of the substantial training time of the detector. Ideally, only a single image would be added at each iteration.

Semi-Supervised Self-Training of Object Detection Models Evaluation Metrics (Experiments and Analysis) Each run was evaluated by using the area under the ROC curve (AUC). Because different experimental conditions affect performance, the AUCs were normalized relatively to the full data performance of that run. if (performance level = = 1.0) { the model being evaluated has the same performance as it would if all of the labelled data was utilised } if (performance level < 1.0) { the model has a lower performance than that achieved with the full data set } To compute the full data performance, each specific run is trained with the full data set and its performance is recorded.

Semi-Supervised Self-Training of Object Detection Models Baseline training configurations (Experiments and Analysis) Smooth regime was chosen in order to perform experiments under conditions where the addition of weakly labelled data would make a difference.

Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experiments and Analysis) Does the choice of the selection metric make a substantial difference in the performance of the semi-supervised training? Confidence metric MSE metric

Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experiments and Analysis) Does the choice of the selection metric make a substantial difference in the performance of the semi-supervised training? decreases increases

Semi-Supervised Self-Training of Object Detection Models Relative size of fully Labelled Data(Experiments and Analysis) How many weakly labelled examples do we need to add to the training set in order to reach the best detector performance?

Semi-Supervised Self-Training of Object Detection Models Conclusions/Discussion The results showed that it was possible to achieve detection performance that was close to the base performance obtained with the fully labelled data, even when a small fraction of the training data was used in the initial training set. The experiments showed that the self-training approach to semi-supervised training can be applied to an existing detector that was originally designed for supervised training. The MSE selection metric consistently outperformed the confidence metric. More generally, the self-training approach using an independently-defined selection metric outperforms both the confidence metrics and the batch EM approaches. During the training process, the distribution of the labeled data at any particular iteration may not match the actual underlying distribution of the data.

Semi-Supervised Self-Training of Object Detection Models Conclusions/Discussion True labels for the unlabeled data Original unlabeled data and labelled data (c),(d) The points labelled by the incremental self-training algorithm after 5 iterations using the confidence metric and the Euclidean metric, respectively.

Semi-Supervised Self-Training of Object Detection Models Future Work Study the relation between the semi-supervised training approach evaluated here with the co-training approaches. Develop more precise guidelines for selecting the initial training set. The approach could be extended to training examples that are labelled in different ways. For example, some images may be provided with scale information and nothing else. Additional information may be provided such as the rough shape of the object, or a prior distribution over its location in the image.

ZZZZZZZZZZZZZZ….. Still Awake???

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Andrew B. Goldberg Computer Sciences Department University of Wisconsin-Madison Xiaojin Zhu Computer Sciences Department University of Wisconsin-Madison TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing 2006

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Sentiment Categorization ? ? ?

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Sentiment Categorization

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization What we saw is rating inference Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL. • In this work… • Graph-based Semi-supervised Learning • Main assumption encoded in graph: • Similar documents should have similar ratings

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 50% accuracy 

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 100% accuracy 

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Goal

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Approach

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Measuring Loss over the Graph

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Minimization now is non- trivial

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Finding a Closed-Form Solution Fortunately, we

Learning from labelled and unlabeled data

Learning from labelled and unlabeled data

Presentation Transcript

Self-taught Learning Transfer Learning from Unlabeled Data

LEARNING FROM DATA

Stochastic Unsupervised Learning on Unlabeled Data

Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples

LABELLED BUNDLES

Learning from Positive and Unlabeled Examples

Learning from Positive and Unlabeled Examples Investigator: Bing Liu, Computer Science

Techniques For Exploiting Unlabeled Data

Learning From Data

A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples

Latent Tree Analysis of Unlabeled Data

Improving the Graph Mincut Approach to Learning from Labeled and Unlabeled Examples

Classification of unlabeled data:

Incorporating Unlabeled Data in the Learning Process

Learning from Labeled and Unlabeled Data using Graph Mincuts

Incorporating Unlabeled Data in the Learning Process

Information Extraction with Unlabeled Data

A Theoretical Model for Learning from Labeled and Unlabeled Data

Techniques For Exploiting Unlabeled Data