
Semi-Supervised Learning






Presentation Transcript


1. Semi-Supervised Learning
• Can we improve the quality of our learning by combining labeled and unlabeled data?
• Usually a lot more unlabeled data is available than labeled data
• Assume a set L of labeled data and a set U of unlabeled data, drawn from the same distribution (see the setup sketch below)
• The focus here is on semi-supervised classification, though there are many other variations:
• Aiding clustering with some labeled data
• Regression
• Model selection with unlabeled data (COD)
• Transduction vs. induction
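To make the setup concrete, here is a minimal sketch in Python that splits a toy dataset into a small labeled set L and a larger unlabeled set U; the dataset, split size, and scikit-learn helpers are illustrative assumptions, not from the slides. The later sketches reuse X_L, y_L, and X_U.

    # Illustrative setup: small labeled set L, larger unlabeled set U,
    # both drawn from the same underlying distribution.
    import numpy as np
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, random_state=0)
    rng = np.random.default_rng(0)
    labeled = np.zeros(len(X), dtype=bool)
    labeled[rng.choice(len(X), size=50, replace=False)] = True  # only 5% labeled

    X_L, y_L = X[labeled], y[labeled]   # labeled data L
    X_U = X[~labeled]                   # unlabeled data U (true labels hidden)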

2. How Semi-Supervised Learning Works
• Approaches make strong model assumptions (guesses); if these are wrong, they can make things worse
• Some commonly used assumptions:
• Clusters of data are from the same class
• Data can be represented as a mixture of parameterized distributions
• Decision boundaries should go through non-dense areas of the data
• The model should be as simple as possible (Occam's razor)

3. Unsupervised Learning of Domain Features
• PCA, SVD
• NLDR – non-linear dimensionality reduction
• Deep learning
• Deep belief nets
• Sparse auto-encoders
• Self-taught learning (a representation-then-classify sketch follows below)
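The common pattern behind these methods is to learn a representation from all of the data (L plus U) and then train a supervised model on the transformed labeled subset. A minimal sketch using PCA as the unsupervised step, reusing X_L, y_L, and X_U from the setup sketch; the component count and choice of classifier are arbitrary:

    # Unsupervised step benefits from U: fit PCA on labeled + unlabeled data,
    # then train a supervised classifier on the transformed labeled subset.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    pca = PCA(n_components=5).fit(np.vstack([X_L, X_U]))
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_L), y_L)
    preds = clf.predict(pca.transform(X_U))  # predictions for the unlabeled set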

4. Self-Training (Bootstrap)
• Self-Training (sketched below):
• Train a supervised model on the labeled data L
• Test it on the unlabeled data U
• Add the most confidently classified members of U to L
• Repeat
• Multi-Model
• Uses an ensemble of trained models for self-training
• Co-Training
• Train two models with different, independent feature sets
• Add the most confident instances from U of one model into L of the other
• Multi-View Training
• Find an ensemble of multiple diverse models trained on L which also tend to all agree well on U
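A minimal sketch of the basic self-training loop described above; the confidence threshold and iteration cap are illustrative assumptions:

    # Self-training: train on L, pseudo-label the most confident points of U,
    # move them into L, and repeat.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X_lab, y_lab, X_unl = X_L.copy(), y_L.copy(), X_U.copy()
    for _ in range(10):                          # illustrative iteration cap
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        if len(X_unl) == 0:
            break
        proba = clf.predict_proba(X_unl)
        confident = proba.max(axis=1) > 0.95     # illustrative threshold
        if not confident.any():
            break
        X_lab = np.vstack([X_lab, X_unl[confident]])
        y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
        X_unl = X_unl[~confident]

scikit-learn also ships a ready-made version of this loop as sklearn.semi_supervised.SelfTrainingClassifier.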

5. More Models
• Generative – assume the data can be represented by some mixture of parameterized models (e.g., Gaussians) and use EM to learn the parameters (à la Baum-Welch); a minimal EM sketch follows below
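A minimal sketch of that idea for a two-class Gaussian mixture, clamping the responsibilities of labeled points to their known class in the E-step. The shared, fixed covariance is a simplifying assumption made here, not something from the slides:

    # Semi-supervised EM for a 2-component Gaussian mixture: labeled points
    # get hard (clamped) responsibilities, unlabeled points get soft ones.
    import numpy as np
    from scipy.stats import multivariate_normal

    X_all = np.vstack([X_L, X_U])
    n_L, K = len(X_L), 2
    mu = np.array([X_L[y_L == k].mean(axis=0) for k in range(K)])  # init from L
    pi = np.full(K, 1.0 / K)
    cov = np.cov(X_all.T) + 1e-6 * np.eye(X_all.shape[1])  # shared, held fixed

    for _ in range(50):
        # E-step: posterior responsibility of each component for each point
        dens = np.column_stack(
            [pi[k] * multivariate_normal.pdf(X_all, mu[k], cov) for k in range(K)])
        r = dens / dens.sum(axis=1, keepdims=True)
        r[:n_L] = np.eye(K)[y_L]                 # clamp labeled responsibilities
        # M-step: re-estimate mixture weights and means
        pi = r.mean(axis=0)
        mu = (r.T @ X_all) / r.sum(axis=0)[:, None]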

6. Graph Models
• Neighboring nodes are assumed to be similar; larger edge weights indicate greater similarity
• Force members of L with the same class to be close, while maintaining smoothness with respect to the graph for U
• Add in members of U as neighbors based on some similarity measure
• Iteratively label U (breadth-first); see the label-propagation sketch below
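scikit-learn implements this family directly; a minimal sketch using LabelSpreading over a k-nearest-neighbor graph (the kernel and neighbor count are illustrative choices):

    # Graph-based semi-supervised learning: mark unlabeled points with -1 and
    # let labels diffuse over a k-nearest-neighbor similarity graph.
    import numpy as np
    from sklearn.semi_supervised import LabelSpreading

    X_all = np.vstack([X_L, X_U])
    y_all = np.concatenate([y_L, np.full(len(X_U), -1)])   # -1 marks unlabeled

    graph = LabelSpreading(kernel="knn", n_neighbors=7).fit(X_all, y_all)
    pseudo = graph.transduction_[len(X_L):]                # inferred labels for U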

7. TSVM
• Transductive SVM (TSVM), also called Semi-Supervised SVM (S3VM)
• Maximize the margin over both L and U, so that the decision surface is placed in non-dense regions of the data
• Assumes the classes are "well-separated"
• Can also try to simultaneously keep the class proportions on both sides similar to the labeled proportions
• A simplified sketch follows below
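Full TSVM solvers (e.g., the label-switching approach of Joachims) are more involved; the sketch below is a heavily simplified, S3VM-flavored loop that pseudo-labels U and anneals its influence upward. It captures the "push the margin into low-density regions" intuition but is not the real algorithm:

    # Simplified S3VM-style loop (NOT a full transductive SVM solver): refit an
    # SVM while gradually increasing the weight of pseudo-labeled points from U.
    import numpy as np
    from sklearn.svm import SVC

    svm = SVC(kernel="linear").fit(X_L, y_L)
    for w in (0.01, 0.1, 0.5, 1.0):              # anneal unlabeled influence
        y_U_hat = svm.predict(X_U)               # current pseudo-labels for U
        X_all = np.vstack([X_L, X_U])
        y_all = np.concatenate([y_L, y_U_hat])
        weights = np.concatenate([np.ones(len(X_L)), np.full(len(X_U), w)])
        svm = SVC(kernel="linear").fit(X_all, y_all, sample_weight=weights)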

