
Paired Sampling in Density-Sensitive Active Learning


Presentation Transcript


  1. Paired Sampling in Density-Sensitive Active Learning Pinar Donmez joint work with Jaime G. Carbonell Language Technologies Institute School of Computer Science Carnegie Mellon University

  2. Outline • Problem setting • Motivation • Our approach • Experiments • Conclusion

  3. Setting • X: feature space, label set Y = {-1, +1} • Data D ~ X × Y • D = T ∪ U • T: training set, U: unlabeled set • T is small initially, U is large • Active Learning: • choose the most informative samples to label • Goal: high performance with the fewest labeling requests

  4. Motivation • Optimize the decision boundary placement • Sampling disproportionately on one side may not be optimal • Maximize likelihood of straddling the boundary with paired samples • Three factors affect sampling • Local density • Conditional entropy maximization • Utility score

  5. Illustrative Example • Left figure (paired sampling): • significant shift in the current hypothesis • large reduction in version space • Right figure (single-point sampling): • small shift in the current hypothesis • small reduction in version space

  6. Density-Sensitive Distance • Cluster Hypothesis: • the decision boundary should NOT cut clusters • squeeze distances in high-density regions • increase distances in low-density regions • Solution: Density-Sensitive Distance • find the weakest link along each path in a graph G (sketched below) • a better way to avoid outliers (i.e. a very short edge in a long path) Chapelle & Zien (2005)
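A minimal sketch of the weakest-link idea, using a hard min-max over paths on the complete Euclidean graph; Chapelle & Zien (2005) use a softened variant, so the function below is an illustrative simplification rather than their exact distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

def weakest_link_distance(X):
    """Bottleneck ("weakest link") path distance on the complete graph over X.

    The distance between two points is the smallest achievable longest edge
    over all connecting paths, which shrinks distances inside dense clusters
    and stretches them across sparse gaps. This hard min-max is a simplified
    stand-in for the softened version in Chapelle & Zien (2005).
    """
    D = cdist(X, X)            # Euclidean edge lengths of the complete graph
    B = D.copy()
    for k in range(len(X)):    # Floyd-Warshall-style minimax relaxation
        B = np.minimum(B, np.maximum(B[:, [k]], B[[k], :]))
    return B
```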

  7. Density-Sensitive Distance • Apply MDS (multi-dimensional scaling) to the density-sensitive distance matrix to obtain a Euclidean embedding • Find the eigenvalues and eigenvectors of the double-centered squared-distance matrix • Pick the first p eigenvectors, i.e. those with the largest eigenvalues
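A sketch of this embedding step via classical MDS (double centering plus eigendecomposition); the function name is illustrative, and clipping negative eigenvalues is an assumption about how the p dimensions are chosen.

```python
import numpy as np

def classical_mds(D, p):
    """Classical MDS: embed a distance matrix D into p Euclidean dimensions.

    Double-center the squared distances, eigendecompose the resulting Gram
    matrix, and keep the eigenvectors of the p largest eigenvalues (negative
    eigenvalues, if any, are clipped to zero).
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram (inner-product) matrix
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:p]          # indices of the p largest
    scale = np.sqrt(np.clip(evals[top], 0.0, None))
    return evecs[:, top] * scale               # n x p Euclidean coordinates
```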

  8. Active Sampling Procedure • Given a training set T in the MDS space: 1. Train a logistic regression classifier on T 2. For every candidate pair of points in U, compute the pairwise score S 3. Choose the pair with the maximum score and query its labels • Repeat steps 1-3 (a skeleton of this loop is sketched below)
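A skeleton of this loop, assuming a `score_fn` that implements the pairwise score S described on the following slides and an `oracle` callable standing in for the human annotator; this is a sketch of the procedure, not the authors' implementation.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression

def paired_active_learning(X_mds, y, labeled, score_fn, oracle, n_iters=20):
    """Paired active sampling loop (sketch).

    Assumptions: `score_fn(i, j, proba, X_mds)` returns the pairwise score S
    for candidate points i and j; `oracle(i)` returns the true label of
    point i (stands in for a labeling request).
    """
    labeled = list(labeled)
    unlabeled = [i for i in range(len(X_mds)) if i not in labeled]
    clf = None
    for _ in range(n_iters):
        clf = LogisticRegression().fit(X_mds[labeled], y[labeled])  # 1. train on T
        proba = clf.predict_proba(X_mds)[:, 1]                      # P(y = +1 | x)
        i, j = max(combinations(unlabeled, 2),                      # 2-3. best-scoring pair
                   key=lambda pair: score_fn(pair[0], pair[1], proba, X_mds))
        for k in (i, j):                                            # query both labels
            y[k] = oracle(k)
            labeled.append(k)
            unlabeled.remove(k)
    return clf
```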

  9. Details of the Scoring Function S • Two components of S • likelihood of a pair having opposite labels (straddling the decision boundary) • utility of the pair • By the cluster assumption • the decision boundary should not cut clusters => points in different clusters are likely to have different labels • In the transformed space, points in different clusters have low similarity (large distance) • Thus, we can estimate the likelihood of opposite labels from the pairwise distance in the embedded space

  10. An Analysis Justifying our Claim • Pairwise distances are divided into bins • Pairs are assigned to bins according to their distances • For each bin, the relative frequency of pairs with opposite class labels is computed • This graph (empirically) shows that the likelihood of two points having opposite labels increases monotonically with the pairwise distance between them. * This graph is plotted on the g50c dataset.
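The check on this slide can be reproduced along the following lines; the bin count is an arbitrary choice and `opposite_label_frequency` is an illustrative name.

```python
import numpy as np

def opposite_label_frequency(dist, y, n_bins=10):
    """Bin all pairwise distances and, within each bin, measure the fraction
    of pairs whose class labels differ (the quantity plotted on the slide)."""
    iu = np.triu_indices_from(dist, k=1)               # each unordered pair once
    d, opposite = dist[iu], (y[iu[0]] != y[iu[1]])
    edges = np.linspace(d.min(), d.max(), n_bins + 1)
    bins = np.clip(np.digitize(d, edges) - 1, 0, n_bins - 1)
    return np.array([opposite[bins == b].mean() if np.any(bins == b) else np.nan
                     for b in range(n_bins)])
```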

  11. Utility Function • Two components • Local density, which depends on • the number of close neighbors • their proximity • Conditional entropy • For binary problems, H(y|x) = -P(+1|x) log P(+1|x) - P(-1|x) log P(-1|x)
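For completeness, the binary conditional entropy can be computed as follows (a standard formula, not specific to this paper):

```python
import numpy as np

def binary_entropy(p):
    """Conditional entropy H(y | x) for a binary problem, with p = P(y = +1 | x).
    It peaks at p = 0.5, i.e. where the current classifier is most uncertain;
    clipping avoids log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))
```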

  12. Uncertainty-Weighted Density • captures • the density of a given point • the information content of its neighbors • novelty: • each neighbor's contribution is weighted by its uncertainty • reduces the effect of highly certain neighbors • dense points with highly uncertain neighbors become important
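A sketch of one way to realize this weighting: each point's score sums over its nearest neighbors, with every neighbor contributing in proportion to its own predictive entropy. The Gaussian kernel, the neighborhood size k, and the bandwidth sigma are assumptions; the slide only states the weighting principle.

```python
import numpy as np

def uncertainty_weighted_density(dist, proba, k=10, sigma=1.0):
    """Uncertainty-weighted density scores (sketch): dense points whose
    neighbors are highly uncertain score highest; highly certain neighbors
    contribute little."""
    p = np.clip(proba, 1e-12, 1 - 1e-12)
    H = -(p * np.log(p) + (1 - p) * np.log(1 - p))     # per-point entropy
    scores = np.zeros(dist.shape[0])
    for i in range(dist.shape[0]):
        nbrs = np.argsort(dist[i])[1:k + 1]            # skip the point itself
        w = np.exp(-dist[i, nbrs] ** 2 / (2 * sigma ** 2))
        scores[i] = np.sum(w * H[nbrs])
    return scores
```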

  13. Utility Function • the utility of a pair trades off, via a regularization term • the information content (entropy) of the pair itself • the proximity-weighted information content of its neighbors
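Putting the pieces together, one plausible (assumed) form of the final pairwise score is sketched below: S = P(opposite labels) × utility, where P is taken to grow monotonically with the density-sensitive distance and `lam` regularizes the trade-off between the pair's own entropy and its uncertainty-weighted neighborhood density. This is not the paper's exact formula.

```python
import numpy as np

def pair_score(i, j, proba, dist, density, lam=0.5):
    """Assembled pairwise score S (assumed form, for illustration only)."""
    p = np.clip(proba, 1e-12, 1 - 1e-12)
    H = -(p * np.log(p) + (1 - p) * np.log(1 - p))      # entropy of each point
    p_opposite = 1.0 - np.exp(-dist[i, j])              # assumed monotone map from distance
    utility = lam * (H[i] + H[j]) + (1 - lam) * (density[i] + density[j])
    return p_opposite * utility
```

Wrapped in a small lambda that also supplies `dist` and the precomputed density scores, a function of this shape can serve as the `score_fn` in the sampling-loop skeleton sketched after slide 8.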

  14. Experimental Data • the pair with the maximum score is selected at each iteration • six binary datasets are used

  15. Experiment Setting • For each data set • start with 2 labeled data points (1 positive, 1 negative) • run each method for 20 iterations • results averaged over 10 runs • Baselines • Uncertainty Sampling • Density-only Sampling • Representative Sampling (Xu et al., 2003) • Random Sampling

  16. Results

  17. Results

  18. Conclusion • Our contributions: • combine uncertainty, density, and dissimilarity across the decision boundary • proximity-weighted conditional entropy selection is effective for active learning • Results show • our method significantly outperforms the baselines in error reduction • it requires fewer labeling requests than the baselines to achieve the same performance

  19. Thank You!
