
Active Learning: Sampling Method



  1. Active Learning: Sampling Method Meeting 6 — Jan 31, 2013 CSCE 6933 Rodney Nielsen

  2. Space of Active Learning

  3. Uncertainty Sampling • Select examples based on the model's confidence in its predictions • Least confident • Margin sampling • Entropy (all three sketched below)
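
A minimal sketch of the three measures, assuming the model exposes a class-probability vector p for each instance; the function names and NumPy usage are illustrative, not from the slides. Each returns a score where higher means more uncertain:

```python
import numpy as np

def least_confident(p):
    # 1 minus the probability of the most likely label
    return 1.0 - np.max(p)

def margin(p):
    # Negative gap between the two most probable labels, so that
    # higher = more uncertain, matching the other two measures
    top_two = np.sort(p)[-2:]
    return -(top_two[1] - top_two[0])

def entropy(p):
    # Shannon entropy of the predicted class distribution
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))
```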

  4. If |Y| = 2, the three uncertainty measures are equivalent • If |Y| = 3, consider the following example distributions • 0.34, 0.33, 0.33 • 0.50, 0.50, 0.00 • 0.50, 0.49, 0.01 • 0.40, 0.30, 0.30 • 0.41, 0.40, 0.19
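
Running the measures sketched above on these distributions shows why they diverge once |Y| > 2: least confident and entropy both rank (0.34, 0.33, 0.33) as most uncertain (entropy ≈ 1.099, near the ln 3 maximum), while margin sampling prefers (0.50, 0.50, 0.00), whose entropy is only ≈ 0.693, because its top two labels are exactly tied.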

  5. Query by Committee • Train a committee of hypotheses • Representing different regions of the version space • Obtain some measure of (dis)agreement on the instances in the dataset (e.g., vote entropy; sketched below) • Assume the most informative instance is the one on which the committee disagrees most • Goal: minimize the version space • There is no consensus on committee size, but even two or three members give good results
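
A sketch of the vote-entropy disagreement measure; the committee construction and the predict interface are assumptions, not from the slides:

```python
import numpy as np

def vote_entropy(member_votes, n_labels):
    # member_votes: one predicted label per committee member for a single
    # instance; the entropy of the vote distribution measures disagreement
    votes = np.bincount(member_votes, minlength=n_labels) / len(member_votes)
    nz = votes[votes > 0]
    return -np.sum(nz * np.log(nz))

# Query the instance the committee disagrees on most (hypothetical API):
# x_star = max(pool, key=lambda x: vote_entropy([h.predict(x) for h in committee], n_labels))
```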

  6. Competing Hypotheses

  7. Expected Model Change • Query the instance that would cause the largest expected change in the hypothesis h, where the expectation is taken under the current model's predictions • E.g., the instance that would produce the largest expected gradient step in the model parameters (sketched below) • Prefer the instance x that leads to the most significant change in the model
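
The gradient variant is often called expected gradient length in the literature: since the true label of a candidate x is unknown, take the expectation of the gradient norm over the labels, weighted by the current model's belief. A sketch, where grad(x, y) is a hypothetical function returning the training gradient the pair (x, y) would induce:

```python
import numpy as np

def expected_gradient_length(x, label_probs, grad):
    # Expectation over labels y, weighted by the model's belief p(y|x),
    # of the norm of the gradient the pair (x, y) would add to training
    return sum(p_y * np.linalg.norm(grad(x, y))
               for y, p_y in enumerate(label_probs))
```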

  8. Expected Model Change • What learning algorithms does this work for? • What are the issues? • Can be computationally expensive for large datasets and feature spaces • Can be led astray if features aren't properly scaled • How do you properly scale the features?
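
On the scaling question, one standard answer (an illustrative sketch, not the course's prescribed method) is to standardize each feature to zero mean and unit variance using statistics fit on the labeled set, so no single feature dominates the gradient norm:

```python
import numpy as np

def standardize(X_labeled, X_unlabeled):
    # Fit the statistics on labeled data only, then apply to both sets
    mu = X_labeled.mean(axis=0)
    sigma = X_labeled.std(axis=0) + 1e-12   # guard against zero variance
    return (X_labeled - mu) / sigma, (X_unlabeled - mu) / sigma
```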

  9. Admin • IR / Thursday’s meeting time

  10. ML Publication Venues • ML Journals • Machine Learning • Journal of Machine Learning Research • ML Conferences • NIPS – Neural Information Processing Systems • ICML – International Conference on ML • ECML • IROS – Intl Conf on Intelligent Robots and Systems • ICPR – Intl Conference on Pattern Recognition • ISNN – Intl Symposium on Neural Networks • COLT – Computational Learning Theory • UAI – Uncertainty in Artificial Intelligence • AAAI – Association for the Advancement of AI • IJCAI – International Joint Conference on AI • FLAIRS – Florida AI Research Society Conference

  11. NLP Publication Venues • NLP Journals • Computational Linguistics • JNLE – Journal of Natural Language Engineering • Language Resources and Evaluation • NLP Conferences • ACL / NAACL / EACL / PAACL • ICASSP • COLING • HLT • LREC • EMNLP • Interspeech

  12. Projects • Set up a meeting with me next week to discuss possible projects • Come prepared to discuss the concept you are most interested in pursuing (not the implementation details, just the high-level description) • Or, if you don't have a specific goal, send me an email describing your general interests

  13. Reading Responses • Skip this coming Monday/Tuesday reading response

  14. Estimated Error Reduction • Other models approximate the goal of minimizing future error by minimizing a proxy (e.g., uncertainty, variance, …) • Estimated error reduction attempts to minimize the expected future error E[error] directly (sketched below)
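
A sketch of the idea; the model methods retrain and predict_proba are hypothetical stand-ins, not a real API. For each candidate x and each possible label y, retrain on L ∪ {(x, y)}, estimate the error remaining on the unlabeled pool U, and weight by the current model's belief in y. The nested loops over the labels, the pool, and the retraining are exactly what drives the costs on the next slide.

```python
def expected_error_after_query(x, L, U, model, labels):
    # Expected pool uncertainty remaining after hypothetically adding
    # (x, y) to the labeled set, averaged over the model's belief in y
    score = 0.0
    for y in labels:
        p_y = model.predict_proba(x)[y]     # current belief that y is correct
        m = model.retrain(L + [(x, y)])     # hypothetical retraining call
        score += p_y * sum(1.0 - max(m.predict_proba(u)) for u in U)
    return score

# x_star = min(U, key=lambda x: expected_error_after_query(x, L, U, model, labels))
```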

  15. Estimated Error Reduction • Often computationally prohibitive • Binary logistic regression would be O(|U| |L| G) • Where G is the number of gradient descent iterations to convergence • Conditional Random Fields would be O(T |Y|^(T+2) |U| |L| G) • Where T is the number of instances in the sequence

  16. Variance Reduction • Regression problems • E[error²] = noise + bias² + variance (decomposition below) • The learner can't change the noise or the bias, so minimize the variance • The Fisher Information Ratio is used for classification
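
For reference, the decomposition behind the slide is the standard bias-variance result for squared error (a textbook identity, not specific to these slides), where ŷ is the learner's prediction after training on dataset D:

```latex
\mathbb{E}\big[(y - \hat{y}(x))^2\big]
  = \underbrace{\mathbb{E}\big[(y - \mathbb{E}[y \mid x])^2\big]}_{\text{noise}}
  + \underbrace{\big(\mathbb{E}_D[\hat{y}(x)] - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat{y}(x) - \mathbb{E}_D[\hat{y}(x)])^2\big]}_{\text{variance}}
```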

  17. Outlier Phenomenon • Uncertainty sampling and Query by Committee might be hindered by querying many outliers

  18. Density Weighted Methods • Uncertainty sampling and Query by Committee might be hindered by querying many outliers • Density-weighted methods overcome this potential problem by also considering whether the example is representative of the input distribution (sketched below) • Tends to work better than the base informativeness measures on their own
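
One concrete density-weighted scheme is the information density framework (Settles), which multiplies a base informativeness score by the candidate's average similarity to the unlabeled pool; a minimal sketch, with the similarity function and the parameter beta as assumptions:

```python
import numpy as np

def information_density_score(x, U, base_score, similarity, beta=1.0):
    # Base informativeness (e.g., entropy) weighted by average similarity
    # to the pool, so representative instances are preferred over outliers
    density = np.mean([similarity(x, u) for u in U])
    return base_score(x) * density ** beta
```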

  19. Diversity • Naïve selection by the earlier methods tends to choose examples that are very similar to one another • Must factor this in and seek diversity among the queries (a greedy sketch follows)
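
A greedy sketch of one way to do this (an illustrative heuristic, not a method named on the slide): select queries one at a time, penalizing candidates by their similarity to queries already chosen.

```python
def diverse_batch(pool, score, similarity, k, lam=0.5):
    # Trade informativeness (score) against redundancy with the batch so far
    batch = []
    for _ in range(k):
        best = max((x for x in pool if x not in batch),
                   key=lambda x: score(x) - lam * max(
                       (similarity(x, b) for b in batch), default=0.0))
        batch.append(best)
    return batch
```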

  20. Active Learning Empirical Results • Appears to work well, barring publication bias (from Settles, 2009)

  21. Labeling Costs • Are all labels created equal? • Generating labels by experiments • Some instances are easier to label (e.g., shorter sentences) • Can pre-label data for a small savings • Experimental problems • Value of information (VOI) • Considers labeling and estimated misclassification costs • Critical to the goal of Active Learning • Divide informativeness by cost? (sketched below)
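
A sketch of the cost-normalized idea in the last bullet (both functions are hypothetical stand-ins): score each candidate by informativeness per unit labeling cost, a return-on-investment style criterion.

```python
def cost_normalized_score(x, informativeness, labeling_cost):
    # Informativeness per unit annotation cost; higher is better
    return informativeness(x) / labeling_cost(x)

# x_star = max(pool, key=lambda x: cost_normalized_score(x, informativeness, labeling_cost))
```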

  22. Batch Mode Active Learning

  23. Questions • ???
