
Presentation Transcript


  1. MILESTONE RESULTS Mar. 1st, 2007 Agnostic Learning vs. Prior Knowledge challenge Isabelle Guyon, Amir Saffari, Gideon Dror, Gavin Cawley, Olivier Guyon, and many other volunteers, see http://www.agnostic.inf.ethz.ch/credits.php

  2. Thanks

  3. Agnostic Learning vs. Prior Knowledge challenge When everything else fails, ask for additional domain knowledge… • Two tracks: • Agnostic learning: Preprocessed datasets in a nice “feature-based” representation, but no knowledge about the identity of the features. • Prior knowledge: Raw data, sometimes not in a feature-based representation. Information given about the nature and structure of the data.

  4. Part I DATASETS

  5. Datasets (http://www.agnostic.inf.ethz.ch)

     Dataset  Type           Domain           Features  Training ex.  Validation ex.  Test ex.
     ADA      Dense          Marketing        48        4147          415             41471
     GINA     Dense          Digits           970       3153          315             31532
     HIVA     Dense          Drug discovery   1617      3845          384             38449
     NOVA     Sparse binary  Text classif.    16969     1754          175             17537
     SYLVA    Dense          Ecology          216       13086         1308            130858

  6. ADA ADA is the marketing database • Task: Discover high-revenue people from census data. Two-class problem. • Source: Census Bureau, “Adult” database from the UCI machine-learning repository. • Features: 14 original attributes including age, workclass, education, marital status, occupation, and native country. Continuous, binary, and categorical features.

  7. GINA GINA is the digit database • Task: Handwritten digit recognition; separate the odd from the even digits. Two-class problem with heterogeneous classes. • Source: MNIST database formatted by LeCun and Cortes. • Features: 28x28 pixel map.
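For illustration only, a minimal sketch of the odd-vs-even relabelling described above, assuming MNIST-style inputs held in hypothetical arrays `digits` (0–9 class labels) and `pixels` (28x28 images); this is not the organizers' formatting code.

```python
# Sketch (assumed inputs, not the challenge's actual preprocessing):
# map 10-class MNIST labels to GINA's two-class odd-vs-even task.
import numpy as np

def to_odd_vs_even(digits, pixels):
    """digits: array of 0-9 labels; pixels: array of 28x28 images."""
    y = np.where(digits % 2 == 1, 1, -1)   # odd digits -> +1, even digits -> -1
    X = pixels.reshape(len(pixels), -1)    # flatten 28x28 pixel maps to 784 features
    return X, y
```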

  8. HIVA HIVA is the HIV database • Task: Find compounds active against HIV infection. We reduced it to a two-class problem (active vs. inactive) but provide the original labels (active, moderately active, and inactive). • Data source: National Cancer Institute. • Data representation: The compounds are represented by their 3D molecular structure.

  9. NOVA NOVA is the text classification database • Task: Classify newsgroup emails into politics or religion vs. other topics. • Source: The 20-Newsgroups dataset from the UCI machine-learning repository. • Data representation: The raw text, with an estimated 17000-word vocabulary. Sample message: “Subject: Re: Goalie masks / Lines: 21 / Tom Barrasso wore a great mask, one time, last season. He unveiled it at a game in Boston. It was all black, with Pgh city scenes on it. The ‘Golden Triangle’ graced the top, along with a steel mill on one side and the Civic Arena on the other. On the back of the helmet was the old Pens' logo, the current (at the time) Pens logo, and a space for the ‘new’ logo. A great mask done in by a goalie's superstition. Lori”
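As an illustration of how raw newsgroup text can be turned into a feature-based representation of roughly this vocabulary size, here is a minimal bag-of-words sketch using scikit-learn's CountVectorizer; the example documents are placeholders and this is not the challenge's actual preprocessing.

```python
# Sketch: sparse binary bag-of-words features from raw text, capped at a
# ~17000-word vocabulary (an assumption mirroring the NOVA description above).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Tom Barrasso wore a great mask, one time, last season.",   # placeholder documents
    "Classify newsgroup emails into politics or religion vs. other topics.",
]

vectorizer = CountVectorizer(binary=True, max_features=17000)  # word presence/absence only
X = vectorizer.fit_transform(docs)     # sparse matrix: documents x vocabulary
print(X.shape, X.nnz)                  # matrix size and number of non-zero entries
```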

  10. SYLVA SYLVA is the ecology database • Task: Classify forest cover types into Ponderosa pine vs. everything else. • Source: US Forest Service (USFS). • Data representation: Forest cover type for 30 x 30 meter cells encoded with 108 features (elevation, hill shade, wilderness type, soil type, etc.).

  11. Part II PROTOCOL and SCORING

  12. Protocol • Data split: training/validation/test, with proportions 10/1/100. • Online feedback on validation data (first phase). • Validation labels released in February 2007. • Challenge extended until August 1st, 2007. • Final ranking on test data using the last five complete submissions of each entrant.

  13. Performance metrics • Balanced Error Rate (BER): average of error rates of positive class and negative class. • Area Under the ROC Curve (AUC). • Guess error (for the performance prediction challenge only): dBER = abs(testBER – guessedBER)
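A minimal sketch of these quantities, assuming labels in {-1, +1} and hypothetical variable names; scikit-learn is used only for the AUC.

```python
# Sketch of the slide's metrics: BER, AUC, and the guess error dBER.
import numpy as np
from sklearn.metrics import roc_auc_score

def balanced_error_rate(y_true, y_pred):
    """BER: average of the error rates on the positive and negative classes."""
    pos, neg = y_true == 1, y_true == -1
    err_pos = np.mean(y_pred[pos] != 1)    # fraction of positives misclassified
    err_neg = np.mean(y_pred[neg] != -1)   # fraction of negatives misclassified
    return 0.5 * (err_pos + err_neg)

def guess_error(test_ber, guessed_ber):
    """dBER = abs(testBER - guessedBER), performance prediction challenge only."""
    return abs(test_ber - guessed_ber)

# AUC from real-valued decision scores (hypothetical `scores` array):
# auc = roc_auc_score(y_true, scores)
```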

  14. Ranking • Overall score: for each dataset, regardless of the track, rank all entries by test BER; score = entry_rank / max_rank. The overall score is the average score over datasets. • Only the last five complete entries of each participant are kept, regardless of track. • Individual dataset ranking: for each dataset, one ranking per track using test BER. • Overall ranking: entries are ranked separately in each track by their overall score. Entries with “prior knowledge” results for at least one dataset are counted in the “prior knowledge” track.
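For illustration, a sketch of this scoring rule, assuming a hypothetical dictionary `test_ber` mapping each dataset to the test BER of every ranked entry; it is not the official scoring code.

```python
# Sketch: normalized-rank scoring averaged over datasets, as described above.
import numpy as np

def overall_scores(test_ber):
    """test_ber: {dataset: {entry_name: test BER}} -> {entry_name: overall score}."""
    per_entry = {}
    for dataset, bers in test_ber.items():
        entries = sorted(bers, key=bers.get)           # lowest test BER ranks first
        max_rank = len(entries)
        for rank, name in enumerate(entries, start=1):
            per_entry.setdefault(name, []).append(rank / max_rank)
    # overall score = average normalized rank over the datasets an entry appears in
    return {name: float(np.mean(scores)) for name, scores in per_entry.items()}
```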

  15. Part III RESULT ANALYSIS

  16. Challenge statistics • Start date: October 1st, 2006. • Milestone (NIPS 06): December 1st, 2006. • Milestone: March 1st, 2007. • End: August 1st, 2007. • Total duration: 10 months. • Last five complete entries ranked (Aug. 1st): • Total ALvsPK challenge entrants: 37. • Total ALvsPK development entries: 1070. • Total ALvsPK complete entries: 90 prior + 167 agnostic. • Number of ranked participants: 13 (prior), 13 (agnostic). • Number of ranked submissions: 7 prior + 12 agnostic.

  17. Learning curves: best entry performance on the IJCNN06 challenge, BER vs. time in days (0–160), one curve per dataset (HIVA, ADA, NOVA, GINA, SYLVA). [Figure]

  18. Learning curves: best entry performance on the IJCNN07 challenge, BER vs. time in months, one curve per dataset (HIVA, ADA, NOVA, GINA, SYLVA). [Figure]

  19. BER distribution (agnostic learning and prior knowledge tracks). The black vertical line indicates the best ranked entry (only the last five entries of each participant were ranked). Beware of overfitting!

  20. Final AL results Agnostic-learning best ranked entries as of August 1st, 2007. The best average BER is still held by Reference (Gavin Cawley) with the entry “the bad”. Note that the best entry for each dataset is not necessarily the best entry overall.

  21. Method comparison (PPC), agnostic track: dBER vs. test BER. No significant improvement so far. [Figure]

  22. LS-SVM (Gavin Cawley, July 2006)
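The slide only names the method, so here is a generic least-squares SVM (Suykens-style) sketch with an RBF kernel for context; it is not Gavin Cawley's actual entry, and the kernel width and regularization values are arbitrary.

```python
# Sketch of a generic LS-SVM classifier: the dual is a single linear system
# rather than a quadratic program. Labels are assumed to be in {-1, +1}.
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, C=10.0, gamma=0.1):
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                       # system: [0, y^T; y, Omega + I/C][b; alpha] = [0; 1]
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / C
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]             # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test, gamma=0.1):
    K = rbf_kernel(X_test, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)
```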

  23. LogitBoost (Roman Lutz, July 2006)
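Again the slide names only the method, so this is a generic LogitBoost sketch (Friedman, Hastie & Tibshirani) with depth-1 regression trees as base learners, not Roman Lutz's actual submission.

```python
# Sketch of two-class LogitBoost: fit regression stumps to Newton-style
# working responses, updating an additive model on the half-log-odds scale.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logitboost_fit(X, y01, n_rounds=100):
    """y01: labels in {0, 1}. Returns the list of fitted base regressors."""
    F = np.zeros(len(y01))
    p = np.full(len(y01), 0.5)
    stumps = []
    for _ in range(n_rounds):
        w = np.clip(p * (1 - p), 1e-10, None)   # working weights
        z = (y01 - p) / w                       # working responses
        stump = DecisionTreeRegressor(max_depth=1).fit(X, z, sample_weight=w)
        stumps.append(stump)
        F += 0.5 * stump.predict(X)
        p = 1.0 / (1.0 + np.exp(-2.0 * F))
    return stumps

def logitboost_predict(stumps, X):
    F = 0.5 * sum(s.predict(X) for s in stumps)
    return (F > 0).astype(int)                  # predicted class in {0, 1}
```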

  24. Final PK results Prior-knowledge best ranked entries as of August 1st, 2007. The best average BER is held by Reference (Gavin Cawley) with “interim all prior”. Louis Duclos-Gosselin is second on ADA with Neural Network13, and S. Joshua Swamidass is second on HIVA, but they are not entered in the table because they did not submit complete entries. The overall entry ranking is performed with the overall score (average rank over all datasets). The best performing complete entry may not contain all the best performing entries on the individual datasets. We indicate the ranks of the “prior” entries only for individual datasets.

  25. AL vs. PK: who wins? We compare the best results of the ranked entries for entrants who entered both tracks. If the agnostic-learning BER is larger than the prior-knowledge BER, “1” is shown in the table. The sign test is not powerful enough to reveal a significant advantage of PK or AL.

  26. Progress? • On ADA and NOVA, the best results obtained by the participants are in the agnostic track! But it is possible to do better with prior knowledge: on ADA, the PK winner has a worse AL entry; the best PK reference entry yields the best results on NOVA. • On GINA and SYLVA, significantly better results are achieved in the prior-knowledge track, and all but one participant who entered both tracks did better with PK. • On HIVA, experts achieve significantly better results with prior knowledge, but non-experts entering both tracks do worse in the PK track.

  27. Conclusion • PK wins, but not by a huge margin; improving performance using PK is not that easy! • AL using fairly simple low-level features is a fast way of getting hard-to-beat results. • The website will remain open for post-challenge entries: http://www.agnostic.inf.ethz.ch.
