
Ensembles: Co-training and Classifier Improvement

This presentation introduces ensembles of classifiers, covering bagging, boosting, and co-training. It explains why co-training can exploit plentiful unlabeled data and closes with a real-world web-page classification example.

Presentation Transcript


  1. Outline
  • Ensembles
  • Co-training

  2. Ensembles of Classifiers
  • Assume the classifiers' errors are independent
  • Assume they are combined by majority vote
  • The probability that the majority is wrong is a tail area under the binomial distribution

  3. Ensemble of 21 Classifiers
  [Figure: binomial distribution of the number of classifiers in error; y-axis shows probability (up to about 0.2), x-axis the number of classifiers in error]
  • If each individual classifier's error rate is 0.3...
  • ...the area under the curve for 11 or more wrong is 0.026
  • An order-of-magnitude improvement!
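That tail area is easy to verify directly. A minimal check, assuming independent errors; the use of scipy here is my own choice, not something from the slides:

```python
# Probability that a majority of 21 independent classifiers,
# each wrong with probability 0.3, is wrong (i.e., 11+ errors).
from scipy.stats import binom

p_majority_wrong = binom.sf(10, 21, 0.3)  # P(X >= 11) = 1 - P(X <= 10)
print(f"{p_majority_wrong:.3f}")          # ~0.026, matching the slide
```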

  4. Constructing Ensembles
  • Bagging
    • Run the classifier k times, each time on m examples drawn randomly with replacement from the original set of m examples (see the sketch below)
    • Each such training set contains about 63.2% of the original examples (plus duplicates)
  • Cross-validated committees
    • Divide the examples into k disjoint sets
    • Train k classifiers, each on the original set minus one of the k subsets
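A sketch of bagging's bootstrap sampling, with illustrative toy data; the 63.2% figure follows because P(an example is never drawn) = (1 - 1/m)^m, which tends to 1/e:

```python
import random

def bootstrap_sample(examples):
    """Draw m examples with replacement from a set of m examples."""
    return [random.choice(examples) for _ in examples]

examples = list(range(10_000))
sample = bootstrap_sample(examples)
# About 1 - 1/e = 0.632 of the distinct originals appear in the sample.
print(len(set(sample)) / len(examples))   # roughly 0.632
```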

  5. Ensemble Creation II
  • Boosting (sketched below)
    • Maintain a probability distribution over the set of training examples
    • On each iteration, use the distribution to sample training data
    • Use the resulting error rate to modify the distribution
    • This creates harder and harder learning problems...
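A hedged sketch of this loop in the style of AdaBoost; the `train_weak` callable and the stopping rule are assumptions I've filled in, not details from the slides:

```python
import math

def boost(xs, ys, train_weak, rounds):
    """AdaBoost-style loop: reweight examples so each round is 'harder'."""
    m = len(xs)
    dist = [1.0 / m] * m                    # distribution over training examples
    ensemble = []
    for _ in range(rounds):
        h = train_weak(xs, ys, dist)        # weak learner sees the distribution
        err = sum(d for d, x, y in zip(dist, xs, ys) if h(x) != y)
        if err == 0 or err >= 0.5:          # stop if perfect or too weak
            break
        alpha = 0.5 * math.log((1 - err) / err)
        # Upweight mistakes, downweight correct examples, then renormalize,
        # so the next round focuses on the hard cases.
        dist = [d * math.exp(alpha if h(x) != y else -alpha)
                for d, x, y in zip(dist, xs, ys)]
        z = sum(dist)
        dist = [d / z for d in dist]
        ensemble.append((alpha, h))
    return ensemble
```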

  6. Co-Training Motivation
  • Learning methods need labeled data
  • Lots of <x, f(x)> pairs
  • Hard to get... (who wants to label data?)
  • But unlabeled data is usually plentiful...
  • Could we use it instead?

  7. Co-training
  • Suppose each instance has two parts: x = [x1, x2], with x1, x2 conditionally independent given f(x)
  • Suppose each half can be used to classify the instance: there exist f1, f2 such that f1(x1) = f2(x2) = f(x)
  • Suppose f1, f2 are learnable: f1 ∈ H1, f2 ∈ H2, and there exist learning algorithms A1, A2
  [Diagram: a stream of unlabeled instances [x1, x2]]

  8. [Diagram: the co-training loop — A1 labels unlabeled instances from view x1, producing labeled pairs <[x1, x2], f1(x1)>, which A2 uses to learn hypothesis f2; a sketch of the loop follows]
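Putting slides 7 and 8 together, a minimal sketch of the co-training loop. The `predict`/`confidence` model interface, the number of rounds, and the per-round quota `k` are illustrative assumptions:

```python
def co_train(labeled, unlabeled, A1, A2, rounds=10, k=5):
    """labeled: list of ((x1, x2), y); unlabeled: list of (x1, x2).
    A1, A2: functions mapping a list of (view, y) pairs to a model with
    .predict(view) and .confidence(view) methods (an assumed interface)."""
    f1 = f2 = None
    for _ in range(rounds):
        f1 = A1([(x[0], y) for x, y in labeled])   # learn from view x1
        f2 = A2([(x[1], y) for x, y in labeled])   # learn from view x2
        # Each learner labels its k most confident unlabeled instances;
        # those become training data for the *other* learner next round.
        for f, view in ((f1, 0), (f2, 1)):
            ranked = sorted(unlabeled, key=lambda x: f.confidence(x[view]),
                            reverse=True)
            for x in ranked[:k]:
                labeled.append((x, f.predict(x[view])))
                unlabeled.remove(x)
    return f1, f2
```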

  9. Observations
  • We can apply A1 to generate as much training data as we want
  • If x1 is conditionally independent of x2 given f(x), then the errors in the labels produced by A1 will look like random noise to A2
  • Thus there is no limit to the quality of the hypothesis A2 can produce

  10. It really works!
  • Learning to classify web pages as course pages
  • x1 = bag of words on the page
  • x2 = bag of words from all anchors pointing to the page
  • Naïve Bayes classifiers
  • 12 labeled pages, 1,039 unlabeled pages
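For concreteness, one way the two views could be built; the scikit-learn classes and the toy strings here are assumptions of mine (the slides say only "naïve Bayes classifiers"):

```python
# Build the two views for the course-page task and fit one
# naive Bayes classifier per view.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

pages   = ["lecture syllabus homework exam",   # hypothetical page text
           "products pricing checkout cart"]
anchors = ["cs 101 course page",               # hypothetical anchor text
           "online store"]
labels  = [1, 0]                               # 1 = course page

X1 = CountVectorizer().fit_transform(pages)    # view x1: words on the page
X2 = CountVectorizer().fit_transform(anchors)  # view x2: words in inbound anchors

f1 = MultinomialNB().fit(X1, labels)
f2 = MultinomialNB().fit(X2, labels)
```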
