
Classifier ensembles: Does the combination rule matter?


Presentation Transcript


  1. Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva, School of Computer Science, Bangor University, UK. l.i.kuncheva@bangor.ac.uk

  2. [Diagram: feature values (object description) → classifier, classifier, classifier → combiner → class label. The classifiers and combiner together form the classifier ensemble.]

  3. Congratulations! The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, check out team scores on the Leaderboard, and join the discussions on the Forum. We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love. [Ensemble diagram as on slide 2.]

  4. [Ensemble diagram as on slide 2, annotated: cited 7194 times by 28 July 2013 (Google Scholar).]

  5. Classifier combination? Hmmmm… David Hand: We are kidding ourselves; there is no real progress in spite of ensemble methods. D. J. Hand (2006) Classifier technology and the illusion of progress, Statist. Sci. 21(1), 1–14. Saso Dzeroski: Chances are that the single best classifier will be better than the ensemble. S. Dzeroski and B. Zenko (2004) Is combining classifiers better than selecting the best one? Machine Learning, 54, 255–273.

  6. Quo Vadis? "combining classifiers" OR "classifier combination" OR "classifier ensembles" OR "ensemble of classifiers" OR "combining multiple classifiers" OR "committee of classifiers" OR "classifier committee" OR "committees of neural networks" OR "consensus aggregation" OR "mixture of experts" OR "bagging predictors" OR adaboost OR (( "random subspace" OR "random forest" OR "rotation forest" OR boosting) AND "machine learning")

  7. Gartner’s Hype Cycle: a typical evolution pattern of a new technology Where are we?...

  8. Journals in which the top-cited ensemble papers appear (counts from the slide in parentheses; the top cited paper is an application paper):
  • IEEE TPAMI (6) = IEEE Transactions on Pattern Analysis and Machine Intelligence
  • IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics
  • JASA = Journal of the American Statistical Association
  • IJCV = International Journal of Computer Vision
  • JTB = Journal of Theoretical Biology
  • PPL (2) = Protein and Peptide Letters
  • JAE = Journal of Animal Ecology
  • PR = Pattern Recognition
  • ML (4) = Machine Learning
  • NN = Neural Networks
  • CC = Cerebral Cortex

  9. International Workshop on Multiple Classifier Systems, 2000–2013 and continuing.

  10. Levels of questions
  • A. Combination level (Combiner): selection or fusion? voting or another combination method? trainable or non-trainable combiner?
  • B. Classifier level (Classifier 1, Classifier 2, …, Classifier L): same or different classifiers? decision trees, neural networks or other? how many?
  • C. Feature level (Features): all features or subsets of features? random or selected subsets?
  • D. Data level (Data set): independent/dependent bootstrap samples? selected data sets?

  11. Strength of classifiers versus number of classifiers L (from L = 1 upward):
  • L = 1: the perfect classifier?
  • 3–8 classifiers: heterogeneous; trained combiner (stacked generalisation).
  • 30–50 classifiers: same or different models? trained or non-trained combiner? selection or fusion? IS IT WORTH IT? How about here?
  • 100+ classifiers: same model; non-trained combiner (bagging, boosting, etc.).
  At the extremes: a large ensemble of nearly identical classifiers gives REDUNDANCY; a small ensemble of weak classifiers gives INSUFFICIENCY. Must engineer diversity…

  12. Strength of classifiers (continued):
  • 3–8 heterogeneous classifiers with a trained combiner (stacked generalisation): diversity is pretty impossible…
  • 30–50 classifiers: same or different models? trained or non-trained combiner? selection or fusion? IS IT WORTH IT?
  • 100+ classifiers of the same model with a non-trained combiner (bagging, boosting, etc.): diversity is absolutely CRUCIAL!
  A large ensemble of nearly identical classifiers gives REDUNDANCY; a small ensemble of weak classifiers gives INSUFFICIENCY. Must engineer diversity…

  13. [Diagram: for an object x, classifier outputs come in two forms: label outputs (a single class label from classes 1, 2, 3) and continuous-valued outputs (a degree of support per class), collected in a decision profile.]

  14. Ensemble (label outputs, R, G, B): Red, Red, Blue, Green, Red. Majority vote: Red. [Slide shows the resulting colour swatch.]
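The plurality vote on label outputs can be sketched in a few lines of Python; the five-label ensemble output below is illustrative, matching the kind of example on this slide:

```python
from collections import Counter

def majority_vote(labels):
    """Plurality vote: the label predicted by most ensemble members wins.
    Ties resolve to the first-encountered label, since Counter preserves
    insertion order and most_common's sort is stable."""
    return Counter(labels).most_common(1)[0][0]

# Illustrative ensemble output: three of five members say Red.
print(majority_vote(["Red", "Red", "Blue", "Green", "Red"]))  # prints Red
```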

  15. Ensemble (label outputs, R, G, B): Red, Red, Blue, Green, Red. Majority vote: Red. Weighted majority vote (with the weights shown on the slide: 0.27, 0.70, 0.50, 0.05, 0.50, 0.02, 0.10, 0.70, 0.10): Green.
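A weighted majority vote sums classifier weights per label rather than counting votes, which is how a plurality of Reds can lose to Green. A minimal sketch, with illustrative labels and weights (not the exact numbers from the slide):

```python
def weighted_majority_vote(labels, weights):
    """Accumulate each classifier's weight on its predicted label;
    the label with the largest total weight wins."""
    totals = {}
    for label, w in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)

# Two of four classifiers say Red, but the highly weighted fourth says Green:
# Red totals 0.30, Blue 0.50, Green 0.70.
labels = ["Red", "Red", "Blue", "Green"]
weights = [0.10, 0.20, 0.50, 0.70]
print(weighted_majority_vote(labels, weights))  # prints Green
```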

  16. Ensemble (label outputs, R, G, B): Red, Red, Blue, Green, Red, Red. The vector of labels (RBRRGR) is passed as input to a classifier acting as combiner, which outputs: Green.

  17. Ensemble (continuous outputs, [R,G,B]) [0.7 0.6 0.5] [0.1 0.0 0.6] [0.4 0.3 0.1] [0.6 0.3 0.1] [0 1 0] [0.9 0.7 0.8]

  18. Ensemble (continuous outputs, [R,G,B]) [0.7 0.6 0.5] [0.1 0.0 0.6] [0.4 0.3 0.1] [0.6 0.3 0.1] [0 1 0] [0.9 0.7 0.8] Mean R = 0.45

  19. Ensemble (continuous outputs, [R,G,B]) [0.7 0.6 0.5] [0.1 0.0 0.6] [0.4 0.3 0.1] [0.6 0.3 0.1] [0 1 0] [0.9 0.7 0.8] Mean R = 0.45 Mean G = 0.48

  20. Ensemble (continuous outputs, [R,G,B]) [0.7 0.6 0.5] [0.1 0.0 0.6] [0.4 0.3 0.1] [0.6 0.3 0.1] [0 1 0] [0.9 0.7 0.8] Mean R = 0.45 Mean G = 0.48 Mean B = 0.35 Class GREEN

  21. Ensemble (continuous outputs, [R,G,B]): the six output vectors stacked row by row form the decision profile (rows = classifiers, columns = R, G, B):
  0.6 0.3 0.1
  0.1 0.0 0.6
  0.7 0.6 0.5
  0.4 0.3 0.1
  0.0 1.0 0.0
  0.9 0.7 0.8
  Mean R = 0.45, Mean G = 0.48, Mean B = 0.35. Class GREEN
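The mean (average) rule on this decision profile can be reproduced directly; the numbers below are the six output vectors from the slides:

```python
# Decision profile from the slides: rows = classifiers, columns = (R, G, B).
profile = [
    [0.7, 0.6, 0.5],
    [0.1, 0.0, 0.6],
    [0.4, 0.3, 0.1],
    [0.6, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.7, 0.8],
]
classes = ["Red", "Green", "Blue"]

# Column-wise mean support for each class, then pick the largest.
means = [sum(col) / len(profile) for col in zip(*profile)]
print([round(m, 2) for m in means])      # [0.45, 0.48, 0.35]
print(classes[means.index(max(means))])  # Green
```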

  22. Decision profile (rows = classifiers, columns = classes):
  0.6 0.3 0.1
  0.1 0.0 0.6
  0.7 0.6 0.5
  0.4 0.3 0.1
  0.0 1.0 0.0
  0.9 0.7 0.8
  The entry in row 4, column 3 (here 0.1) is the support that classifier #4 gives to the hypothesis that the object to classify comes from class #3. Would be nice if these were probability distributions…

  23. Decision profile (rows = classifiers, columns = classes): we can take probability outputs from the classifiers.

  24. Combination rules.
  For label outputs: majority (plurality) vote; weighted majority vote; Naïve Bayes; BKS; a classifier.
  For continuous-valued outputs: simple rules (minimum, maximum, product, average (sum)); c regressions (one per class); a classifier.
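The simple continuous-output rules all reduce to a column-wise statistic over the decision profile followed by an arg-max over classes. A hedged sketch (the function name and rule keys are illustrative):

```python
import math

def combine(profile, rule="average"):
    """Combine a decision profile (rows = classifiers, columns = classes)
    with a simple rule and return the index of the winning class."""
    columns = list(zip(*profile))              # per-class support vectors
    rules = {
        "minimum": min,
        "maximum": max,
        "product": math.prod,
        "average": lambda c: sum(c) / len(c),  # same ranking as the sum rule
    }
    support = [rules[rule](col) for col in columns]
    return max(range(len(support)), key=support.__getitem__)

# Decision profile from the earlier slides (columns = Red, Green, Blue).
profile = [[0.7, 0.6, 0.5], [0.1, 0.0, 0.6], [0.4, 0.3, 0.1],
           [0.6, 0.3, 0.1], [0.0, 1.0, 0.0], [0.9, 0.7, 0.8]]
print(combine(profile, "average"))  # 1  (Green, as on the slides)
print(combine(profile, "maximum"))  # 1  (largest single support: 1.0 for Green)
```

Note that the different rules can disagree in general; on this particular profile the average and maximum rules happen to pick the same class.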

  25. Combination rules (continued): the decision profile is the input to the combiners for continuous-valued outputs. For label outputs: majority (plurality) vote; weighted majority vote; Naïve Bayes; BKS; a classifier. For continuous-valued outputs: simple rules (minimum, maximum, product, average (sum)); c regressions (one per class); a classifier.

  26. [Ensemble diagram as on slide 2: feature values (object description) → classifiers → combiner → class label.]

  27. [Same diagram with the combiner replaced by a classifier: feature values (object description) → classifiers → classifier → class label.]

  28. Bob Duin: The Combining Classifier: to Train or Not to Train? http://samcnitt.tumblr.com/

  29. Tin Ho: “Multiple Classifier Combination: Lessons and Next Steps”, 2002 “Instead of looking for the best set of features and the best classifier, now we look for the best set of classifiers and then the best combination method. One can imagine that very soon we will be looking for the best set of combination methods and then the best way to use them all. If we do not take the chance to review the fundamental problems arising from this challenge, we are bound to be driven into such an infinite recurrence, dragging along more and more complicated combination schemes and theories and gradually losing sight of the original problem.”

  30. Conclusions - 1 • Classifier ensembles: Does the combination rule matter? • In a word, yes. • But its merit depends upon • the base classifier model, • the training of the individual classifiers, • the diversity, • the possibility to train the combiner, and more.

  31. Conclusions - 2 The choice of the combiner should not be side-lined. The combiner should be chosen in relation to the rest of the ensemble and the available data.

  32. Questions to you: What is the future of classifier ensembles? (Are they here to stay or are they a mere phase?) In what direction(s) will they evolve/dissolve? What will be the ‘classifier of the future’? Or the ‘classification paradigm of the future’? And one last question: How can we get a handle on the ever-growing scientific literature in each and every area? How can we find the gems among the pile of stones?
