
In Defense of One-Vs-All Classification

In Defense of One-Vs-All Classification. Ryan Rifkin and Aldebaro Klautau. Journal of Machine Learning Research, Volume 5 (December 2004), Pages 101–141. Presented by Shuiwang Ji, Machine Learning Lab at CSE, Center for Evolutionary Functional Genomics, The Biodesign Institute.



Presentation Transcript


  1. In Defense of One-Vs-All Classification. Ryan Rifkin and Aldebaro Klautau. Journal of Machine Learning Research, Volume 5 (December 2004), Pages 101–141. Presented by Shuiwang Ji, Machine Learning Lab at CSE, Center for Evolutionary Functional Genomics, The Biodesign Institute. Some of these slides are taken from: http://www.mit.edu/~9.520/Classes/class08.html

  2. Main thesis • “The one-against-rest scheme is extremely powerful, producing results that are often at least as accurate as other methods.” • “Experimental evidence for the superiority of proposed alternatives over a simple one-against-rest scheme is improperly controlled or measured.”

  3. Outline • Single machine approaches; • Error correcting code approaches; • Tree-structured approaches (NOT in the paper); • Experiments.

  4. Watson & Watkins (WW)(1998) • Binary-class: Learn one function. Penalize each machine separately based on the margin violations; • Multi-class: Pay a penalty based on the relative values output by the machines.

  5. Watson & Watkins (WW)(1998) • Learn N functions. If a point x is in class i, make (k-1)*n

  6. Watson & Watkins (WW)(1998) • Too many constraints and slack variables (k-1)*n; • Not easy to decompose (not scalable); • Experimental setup is problematic.

  7. Crammer & Singer (2001) • Watson & Watkins: paying each class for which • Crammer & Singer: Penalize for the largest

  8. Crammer & Singer (2001) • Watson & Watkins: • Crammer & Singer n(k-1) n

  9. Crammer & Singer (2001) • Fewer slacks (compared to Watson & Watkins ); • Can be decomposed (more scalable); • Many tricks are developed and implemented for efficient training; • C and R source codes available: http://www.cis.upenn.edu/~crammer/code/MCSVM/MCSVM_1_0.tar.gz R (http://www.r-project.org/ ) kernlab package

  10. Lee, Lin, and Wahba (2001)

  11. Lee, Lin, Wahba, Analysis • Like the WW formulation, this formulation is big, and no decomposition method is provided; • The analysis is asymptotic: it requires n → ∞ with appropriately chosen regularization, and no convergence rates are provided. But asymptotically, density estimation would also allow us to recover the optimal Bayes rule.
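For reference, a hedged sketch of the Lee, Lin & Wahba multicategory SVM as I understand it (f = (f_1, …, f_k), (·)_+ is the hinge, and the sum-to-zero constraint is what drives the Bayes-consistency argument):

\[
\min_{f}\ \ \frac{1}{n}\sum_{i=1}^{n}\sum_{j\neq y_i}\Bigl(f_j(x_i) + \tfrac{1}{k-1}\Bigr)_{+} \;+\; \lambda\sum_{j=1}^{k}\|h_j\|_{\mathcal{H}_K}^{2},
\qquad \text{s.t.}\ \ \sum_{j=1}^{k} f_j(x) = 0\ \ \forall x,
\]
with f_j = h_j + b_j and h_j in the RKHS induced by the kernel K.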

  12. Outline • Single machine approaches; • Error correcting code approaches; • Tree-structured approaches; • Experiments.

  13. Error-Correcting Codes (ECC), Dietterich & Bakiri (1995). [Figure: each class is assigned a binary codeword; the binary classifiers' outputs form a codeword that a meta-classifier decodes to the nearest class codeword. Source: Dietterich and Bakiri (1995)]

  14. One-against-rest
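To make the scheme the paper defends concrete, here is a minimal one-against-rest sketch in Python (scikit-learn's LinearSVC is assumed as the underlying binary machine; this is an illustration, not the authors' code):

```python
import numpy as np
from sklearn.svm import LinearSVC  # stand-in for any regularized binary classifier

def ova_fit(X, y, C=1.0):
    """Train one binary machine per class: class c vs. the rest."""
    classes = np.unique(y)
    machines = [LinearSVC(C=C).fit(X, (y == c).astype(int)) for c in classes]
    return classes, machines

def ova_predict(classes, machines, X):
    """Assign each point to the class whose machine outputs the largest real value."""
    scores = np.column_stack([m.decision_function(X) for m in machines])
    return classes[np.argmax(scores, axis=1)]
```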

  15. One-against-one

  16. Special cases of ECC Source: http://www-cse.ucsd.edu/users/elkan/254spring01/aldebaro1.pdf
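The correspondence can be made explicit with coding matrices. A hedged Python sketch (the function names are mine; the decoding shown is simple Hamming-style decoding, whereas the paper also considers loss-based decoding):

```python
import numpy as np

def ova_code_matrix(k):
    """One-against-rest: row c is the codeword of class c
    (+1 where class c is the positive class, -1 everywhere else)."""
    return 2.0 * np.eye(k) - 1.0

def ovo_code_matrix(k):
    """One-against-one: one column per class pair, entries in {+1, -1, 0};
    0 means the class takes no part in that binary problem."""
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    M = np.zeros((k, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        M[i, col], M[j, col] = 1.0, -1.0
    return M

def decode(code_matrix, binary_outputs):
    """Pick the class whose codeword agrees most with the signs of the
    binary classifiers' outputs (zero entries are ignored automatically)."""
    agreement = code_matrix @ np.sign(binary_outputs)
    return int(np.argmax(agreement))
```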

  17. Outline • Single machine approaches; • Error correcting code approaches; • Tree-structured approaches; • Experiments.

  18. Large Margin Directed Acyclic Graph (DAG) • Identical to one-against-one at training time; • At test time, the DAG determines which classifiers to evaluate on a given point; • Classes i and j are compared, and whichever class achieves the lower score is removed from further consideration; • Repeat k−1 times until only one class remains (see the sketch below).
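A hedged sketch of the test-time elimination (pairwise_clf is assumed to be a dictionary of trained one-against-one decision functions, keyed by class pair):

```python
def dag_predict(x, classes, pairwise_clf):
    """DAG evaluation: repeatedly compare the first and last surviving classes
    and drop the loser; k-1 comparisons leave a single class."""
    candidates = list(classes)
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]
        if pairwise_clf[(i, j)](x) > 0:   # assumed convention: >0 favours class i
            candidates.pop()              # class j is removed from consideration
        else:
            candidates.pop(0)             # class i is removed from consideration
    return candidates[0]
```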

  19. Large Margin DAGs for Multiclass Classification Source: Platt et al. (2000)

  20. Margin tree (Tibshirani and Hastie 2006) • An SVM is constructed for each pair of classes to compute pairwise margins; • Agglomerative clustering uses the pairwise margins as distances to construct the hierarchy bottom-up (see the sketch below); • Three approaches: greedy, single linkage, and complete linkage.
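A hedged sketch of the clustering step (the pairwise margin matrix is assumed to be precomputed from the pairwise SVMs, e.g. 2/||w_ij||; SciPy's standard hierarchical clustering stands in for the paper's procedure):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def margin_tree_linkage(margin_matrix, method="complete"):
    """Agglomerative clustering of classes using pairwise margins as distances:
    poorly separated classes (small margin) merge first, well separated ones last.
    method may be 'single' or 'complete'; margin_matrix is symmetric, zero diagonal."""
    condensed = squareform(margin_matrix, checks=False)  # condensed distance vector
    return linkage(condensed, method=method)             # (k-1) x 4 merge table
```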

  21. Margin tree Source: Tibshirani and Hastie (2006)

  22. Outline • Single machine approaches; • Error correcting code approaches; • Tree-structured approaches; • Experiments (Compare five ECC approaches).

  23. Observations • In nearly all cases, the results of the compared methods are very close; • In the majority of experiments, 0 lies in the confidence interval, meaning the classifiers are not statistically significantly different.

  24. Implementations in R • e1071: one-against-one (LIBSVM) • kernlab: one-against-one, Crammer & Singer, Weston & Watkins • klaR: one-against-rest (SVMlight) • marginTree: http://www-stat.stanford.edu/~tibs/marginTree_1.00.zip

  25. Q & A Thank you!
