Semi-Supervised Learning Using Randomized Mincuts
This paper discusses the use of randomized mincuts for semi-supervised learning, where there is limited labeled data but plenty of unlabeled data. The approach utilizes the relationships between unlabeled examples to guide predictions and provides estimates of prediction confidence.
Presentation Transcript
Semi-Supervised Learning Using Randomized Mincuts Avrim Blum, John Lafferty, Raja Reddy, Mugizi Rwebangira Carnegie Mellon
Motivation • Often have little labeled data but lots of unlabeled data. • We want to use the relationships between the unlabeled examples to guide our predictions. • Assumption: “Similar examples should generally be labeled similarly.”
Add auxiliary “super-nodes” [figure: a “+” source node and a “−” sink node attached to the labeled examples]
Obtain s-t mincut [figure: the minimum cut separating the “+” source from the “−” sink]
Classification [figure: each unlabeled node is labeled according to which side of the cut it falls on]
Problem • Plain mincut gives no indication of its confidence on different examples. Solution • Add random weights to the edges. • Run plain mincut and obtain a classification. • Repeat the above process several times. • For each unlabeled example, take a majority vote. • The margin of the vote gives a measure of the confidence.
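The voting procedure on this slide can be sketched in plain Python. This is a minimal sketch, not the authors' code: the helper names `randomized_mincut` and `min_cut_side` are made up here, the super-nodes are arbitrarily named `'+'` and `'-'`, the max-flow routine is a basic Edmonds-Karp implementation, and every labeled example is assumed to appear in the edge list.

```python
import random
from collections import deque

def min_cut_side(cap, s, t):
    """Edmonds-Karp max-flow; returns the set of nodes on the source
    side of a minimum s-t cut.  cap is a dict-of-dicts of residual
    capacities and is modified in place."""
    while True:
        # BFS for an augmenting path in the residual graph.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:            # no augmenting path: cut found
            return set(parent)
        # Recover the path, push the bottleneck flow along it.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        f = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= f
            cap[v].setdefault(u, 0)
            cap[v][u] += f

def randomized_mincut(edges, labeled, n_runs=20, noise=0.5, seed=0):
    """Repeated mincut with randomly perturbed edge weights.

    edges:   list of (u, v, weight) for the similarity graph
    labeled: dict node -> +1 / -1 (nodes must appear in edges)
    Returns dict node -> (majority label, vote margin in [0, 1])."""
    rng = random.Random(seed)
    nodes = {u for u, v, _ in edges} | {v for u, v, _ in edges}
    votes = {u: 0 for u in nodes if u not in labeled}
    for _ in range(n_runs):
        cap = {u: {} for u in nodes | {'+', '-'}}
        for u, v, w in edges:
            w = w + noise * rng.random()    # add random weight to the edge
            cap[u][v] = cap[u].get(v, 0) + w
            cap[v][u] = cap[v].get(u, 0) + w
        # Auxiliary super-nodes tied to the labeled examples by
        # effectively-infinite capacities.
        for node, y in labeled.items():
            sup = '+' if y > 0 else '-'
            cap[sup][node] = cap[node][sup] = 1e9
        source_side = min_cut_side(cap, '+', '-')
        for u in votes:
            votes[u] += 1 if u in source_side else -1
    # Majority label plus margin of the vote as a confidence estimate.
    return {u: (1 if c >= 0 else -1, abs(c) / n_runs)
            for u, c in votes.items()}
```

On a toy graph of two tight clusters joined by one weak edge, the weak edge is cut in essentially every run, so the unlabeled nodes receive their cluster's label with margin near 1.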
Before adding random weights [figure: the mincut found on the original graph]
After adding random weights [figure: a different mincut found once the edge weights are perturbed]
PAC-Bayes • PAC-Bayes bounds show that the ‘average’ of several hypotheses that are all consistent with the training data will probably be more accurate than any single hypothesis. • In our case each distinct cut corresponds to a different hypothesis. • Hence the average of these cuts will probably be more accurate than any single cut.
Markov Random Fields • Ideally we would like to assign a weight to each cut in the graph (a higher weight to small cuts) and then take a weighted vote over all the cuts in the graph. • This corresponds to a Markov Random Field model. • We don’t know how to do this efficiently, but we can view randomized mincuts as an approximation.
Related Work – Gaussian Fields • Zhu, Ghahramani and Lafferty (ICML 2003). • Each unlabeled example receives a label that is the average of its neighbors. • Equivalent to minimizing the squared difference of the labels.
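The neighbor-averaging rule on this slide can be sketched directly: iterate "each unlabeled node becomes the weighted average of its neighbors" until the labels stop changing, which converges to the harmonic (Gaussian field) solution. A minimal sketch, not the paper's implementation; the function name `harmonic_labels` and the fixed iteration count are assumptions.

```python
def harmonic_labels(edges, labeled, n_iter=200):
    """Harmonic-function labeling: labeled nodes are clamped to +1/-1,
    and each unlabeled node is repeatedly replaced by the weighted
    average of its neighbors' labels (Gauss-Seidel sweeps)."""
    nbrs = {}
    for u, v, w in edges:
        nbrs.setdefault(u, []).append((v, w))
        nbrs.setdefault(v, []).append((u, w))
    # Start unlabeled nodes at 0; labeled nodes keep their labels.
    f = {u: float(labeled.get(u, 0)) for u in nbrs}
    for _ in range(n_iter):
        for u in nbrs:
            if u in labeled:
                continue                       # labels stay clamped
            total = sum(w for _, w in nbrs[u])
            f[u] = sum(w * f[v] for v, w in nbrs[u]) / total
    return f
```

On the chain a(+1) – b – c – d(−1) with unit weights, the solution interpolates linearly: b converges to 1/3 and c to −1/3, and thresholding at 0 recovers the two classes.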
How to construct the graph? • k-NN • Graph may not have small balanced cuts. • How to learn k? • Connect all points within distance δ • Can have disconnected components. • How to learn δ? • Minimum Spanning Tree • No parameters to learn. • Gives connected, sparse graph. • Seems to work well on most datasets.
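The minimum-spanning-tree option above can be sketched as Kruskal's algorithm over all pairwise Euclidean distances. A minimal sketch under stated assumptions: the function name `mst_graph` is made up, and points are plain coordinate tuples rather than the datasets from the experiments.

```python
import math
from itertools import combinations

def mst_graph(points):
    """Minimum-spanning-tree graph over a list of points:
    Kruskal's algorithm with union-find on all pairwise Euclidean
    distances.  Returns (i, j, distance) edges -- a connected, sparse
    graph with no parameters to learn."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    parent = list(range(len(points)))
    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider every pair, cheapest first.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # keep the edge only if it joins
            parent[ri] = rj            # two separate components
            tree.append((i, j, d))
    return tree
```

For n points the tree always has exactly n − 1 edges, which is what makes the resulting graph sparse; the O(n²) pairwise-distance step is the cost of being parameter-free.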
Experiments • ONE vs. TWO: 1128 examples (8 x 8 array of integers, Euclidean distance). • ODD vs. EVEN: 4000 examples (16 x 16 array of integers, Euclidean distance). • PC vs. MAC: 1943 examples (20 newsgroups dataset, TF-IDF distance).
Conclusions • We can get useful estimates of the confidence of our predictions. • Often get better accuracy than plain mincut. • Minimum spanning tree gives good results across different datasets.
Future Work • Sample complexity lower bounds (i.e. how much unlabeled data do we need to see?). • More principled way of sampling cuts?