
  1. Combining Multiple Experts’ Opinion CS570 Lecture Notes by Jin Hyung Kim Computer Science Department KAIST

  2. Combining Multiple Decisions • Methods for reaching consensus in group decision making • Voting method • Social choice functions - preference ordering • Decision with multiple decision-makers

  3. Basic Problem Statement • Given a number of experts working on the same problem, is the group decision superior to individual decisions? Excerpted from A. F. R. Rahman and M. C. Fairhurst

  4. Is Democracy the Answer? Maybe? Excerpted from A. F. R. Rahman and M. C. Fairhurst

  5. Clever Ways to Exploit Experts • Complementary information • Redundancy: checks and balances • Simultaneous use of experts with a combination method • Issues: • How to express an expert's preference or belief • Computational cost of reaching consensus

  6. Delphi Method • A set of procedures for formulating a group judgment • Based on the adage that 'two heads are better than one' • Logically, n heads should be at least as good as two (certainly no worse) • Delphi procedures led to increased accuracy of group responses, as measured by: • the spread of answers (standard deviation of responses on a given question) • a self-rating index (average of individual self-ratings on a given question)

  7. Delphi Method Procedure • Anonymous response • Opinions of members are obtained by questionnaire or other formal communication channels • Iteration and controlled feedback • A systematic exercise conducted in several iterations, with feedback between rounds • A summary of the results of the previous round is conveyed to participants • Statistical group response • The group opinion is defined as the statistical aggregate of individual responses on the final round

  8. Why Is the Delphi Method Good? • Influence of dominant individuals • A face-to-face group is highly influenced by the person who talks the most, yet there is little correlation between speech and knowledge • Noise • Semantic noise exists: much of the communication concerns individual interests, not problem solving per se • Group pressure for conformity • Group pressure can distort individual judgments

  9. Combining Multiple Experts • Approaches for improving the performance of a group of experts • Best single expert vs. combining multiple experts • Difficulty of the single-expert (single-classifier) approach • Some features can hardly be incorporated into a single classifier • Two heads (experts, classifiers) are better than one • Methods for generating multiple classifiers • Methods for combining multiple classifiers

  10. Sorts of Combination Methods • Three sorts of classification results (decisions) from classifiers: • Measurement score • Ranking (ranked list) • Single choice • Three sorts of architecture: • Sequential (serial) • Conditional topology: once a classifier fails, the next is called • Hierarchical topology: classifiers of various levels are applied in succession • Parallel • Hybrid

  11. Three Levels of Classification Results • Measurement Level • Rank Level (ordering) • Abstract Level (top choice)

  12. Serial Architecture

  13. Parallel Architecture (Horizontal Systems)

  14. Hybrid Approach

  15. Combining Multiple Classifiers (diagram: an input is fed in parallel to Classifier #1 through Classifier #K; a Decision Combinator fuses their decisions into a single fused decision)

  16. Decision Combination Methods • Decision combination type • Majority voting • Weighted majority voting • Borda count • Naïve Bayesian with independence assumption • Behavioral Knowledge Space • Dependency-based framework • Divide-and-conquer type • Boosting • AdaBoost

  17. Decision Combination Type

  18. Divide and Conquer Type (diagram: individual solutions are combined into a final solution)

  19. Example: Decisions with Multiple Classifiers • Notations • a set of classes • a set of classifiers • an input x • For an input x from class b, three classifiers yield rankings.

  20. Decision with Voting Method
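Slide 20's worked example is a figure in the original. As a minimal illustrative sketch (the class labels below are made up, not the ones from the slide), plurality/majority voting over the classifiers' abstract-level (top-choice) decisions looks like this:

```python
from collections import Counter

def majority_vote(decisions):
    """Combine abstract-level (top-choice) decisions by plurality voting.

    decisions: list of class labels, one per classifier.
    Returns the label chosen by the most classifiers (ties broken arbitrarily).
    """
    counts = Counter(decisions)
    label, _ = counts.most_common(1)[0]
    return label

# Illustrative: three classifiers' top choices for one input
print(majority_vote(["b", "a", "b"]))  # -> "b"
```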

  21. Weighted Voting • A generalization of the voting scheme • A weight is the relative significance of a classifier • Weights can be estimated from observations of classifier performance • How? It is a supervised parameter-learning problem
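A hedged sketch of weighted majority voting; the weights below are illustrative stand-ins for values that would be estimated from each classifier's observed performance:

```python
from collections import defaultdict

def weighted_vote(decisions, weights):
    """Weighted majority voting: each classifier's top choice counts
    with a weight reflecting its estimated reliability.

    decisions: list of class labels, one per classifier.
    weights:   list of non-negative weights, one per classifier
               (e.g., estimated from validation-set accuracy).
    """
    scores = defaultdict(float)
    for label, w in zip(decisions, weights):
        scores[label] += w
    return max(scores, key=scores.get)

# Illustrative: classifier 2 is trusted more than classifiers 1 and 3,
# so its vote overturns the plurality ("b": 1.2 beats "a": 0.9).
print(weighted_vote(["a", "b", "a"], [0.5, 1.2, 0.4]))  # -> "b"
```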

  22. Decision with Borda Count • Borda count: sum of reverse ranks (a class ranked r-th among M classes receives M − r points; the class with the largest total wins)
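A minimal sketch of the Borda count over rank-level decisions; the rankings are illustrative, not the ones from the slide's figure:

```python
def borda_count(rankings):
    """Borda count: a class at 0-based position r in a ranked list of M
    classes receives M - 1 - r points (its "reverse rank"); points are
    summed over all classifiers and the highest total wins.

    rankings: list of ranked lists, best class first, one per classifier.
    """
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for r, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (m - 1 - r)
    return max(scores, key=scores.get)

# Illustrative rankings from three classifiers over classes a, b, c
rankings = [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
print(borda_count(rankings))  # -> "b" (score 5, vs. 3 for "a" and 1 for "c")
```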

  23. Decision by Bayesian Method
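Slide 23 is a figure in the original. As a hedged sketch of the standard naive-Bayesian combination under the independence assumption (the confusion matrices and priors below are illustrative, and decisions are assumed to be abstract-level class indices):

```python
import numpy as np

def naive_bayes_combination(decisions, confusion, prior):
    """Naive-Bayesian combination: P(c | d_1..d_N) is proportional to
    P(c) * prod_j P(d_j | c), where each P(d_j | c) is read off
    classifier j's confusion matrix (rows = true class, cols = decision)
    estimated from training data.
    """
    posterior = prior.copy()
    for j, d in enumerate(decisions):
        posterior *= confusion[j][:, d]  # column d: P(d | c) for every class c
    return posterior / posterior.sum()

# Illustrative: 2 classes, 2 classifiers
confusion = [np.array([[0.9, 0.1], [0.2, 0.8]]),
             np.array([[0.7, 0.3], [0.4, 0.6]])]
prior = np.array([0.5, 0.5])
print(naive_bayes_combination([0, 1], confusion, prior))  # ~[0.69, 0.31]
```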

  24. Behavioral Knowledge Space • Estimate the prior probabilities from a training set • by counting frequencies for each combination of the classifiers' decisions • Once these probabilities are obtained, BKS is a lookup table • Simple
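A minimal sketch of a BKS lookup table, assuming abstract-level decisions; the training tuples below are illustrative:

```python
from collections import defaultdict, Counter

class BKSTable:
    """Behavioral Knowledge Space: a lookup table indexed by the tuple of
    all N classifiers' decisions, storing class frequencies counted on a
    training set."""

    def __init__(self):
        self.cells = defaultdict(Counter)

    def train(self, decision_tuples, true_labels):
        # Count how often each true class co-occurs with each decision tuple.
        for decisions, label in zip(decision_tuples, true_labels):
            self.cells[tuple(decisions)][label] += 1

    def classify(self, decisions, default=None):
        cell = self.cells.get(tuple(decisions))
        if not cell:           # empty cell: no training case fell here
            return default     # e.g., fall back to majority voting
        return cell.most_common(1)[0][0]

# Illustrative: decisions of 2 classifiers plus the true labels
bks = BKSTable()
bks.train([("a", "a"), ("a", "b"), ("a", "b")], ["a", "b", "b"])
print(bks.classify(("a", "b")))  # -> "b" (seen twice with true class "b")
```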

  25. BKS Example From Multiple Classifier Systems by F. Roli

  26. BKS Drawbacks • Requires K^(N+1) posterior probabilities • K: number of classes • N: number of classifiers • e.g., K = 10 classes and N = 3 classifiers already require 10^4 = 10,000 entries • In case of limited training samples • many of the cells are empty • probabilities are estimated from a small number of cases From Multiple Classifier Systems by F. Roli

  27. Dependency-based Bayesian • Problems of the independence assumption • Combining highly dependent classifiers under the independence assumption causes problems: • biased decision combination • degraded performance • A combination method without the independence assumption is desirable

  28. With Additional Classifier C4 • C4 is a ditto of (identical to) C1

  29. Undesirable Outcomes under the Independence Assumption

  30. K-th Order Dependency • (Conditional) Independence Assumption (too coarse) • Behavior-Knowledge Space (BKS) Method (too large) • Approximation with k-th order dependency
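The slide's formulas did not survive the transcript. As a hedged reconstruction of the contrast it draws (the notation e_1, …, e_N for the classifier decisions, c for the class, and π for a dependency structure is assumed, not taken from the slide):

```latex
% Conditional independence assumption (too coarse):
P(e_1,\dots,e_N \mid c) \;\approx\; \prod_{i=1}^{N} P(e_i \mid c)

% BKS stores the full joint table (too large: K^{N+1} entries):
P(e_1,\dots,e_N \mid c)

% First-order (k = 1) dependency as a middle ground: each decision is
% conditioned on one other decision e_{\pi(i)} chosen by a dependency
% structure (tree) \pi:
P(e_1,\dots,e_N \mid c) \;\approx\; \prod_{i=1}^{N} P(e_i \mid e_{\pi(i)}, c)
```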

  31. Generating MCSs • The effectiveness of an ensemble method relies on the characteristics of the classifiers as a group • complementary classifiers • Injecting randomness • Varying classifier architecture, parameters, etc. • Manipulating the training data • Mostly heuristics, but theoretical bases have recently emerged

  32. Bagging • Bagging constructs multiple classifiers by training-data manipulation • Equal-size training sets are obtained by bootstrap • Bootstrap: randomly draw N samples with replacement from the original N samples • Any combination method can be used • majority voting • simple averaging (a sketch follows below)
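A minimal sketch of bagging, assuming a generic base learner; train_classifier is a hypothetical stand-in for any training routine that returns a callable classifier:

```python
import random
from collections import Counter

def bootstrap_sample(data):
    """Randomly draw N samples with replacement from the N original samples."""
    return [random.choice(data) for _ in range(len(data))]

def bagging_train(data, train_classifier, n_classifiers=10):
    """Train each classifier on its own bootstrap replicate of the data.

    train_classifier: hypothetical base learner; takes a training set and
    returns a callable classifier clf(x) -> label.
    """
    return [train_classifier(bootstrap_sample(data))
            for _ in range(n_classifiers)]

def bagging_predict(classifiers, x):
    """Combine the ensemble by majority voting (simple averaging is the
    alternative when classifiers output measurement-level scores)."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]
```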

  33. Schemes of Bagging

  34. Boosting • Combine weak learning models into a strong learning model • Weak learning model: its error is only slightly better than a random guess • The first expert is trained on a set of N examples • Misclassified examples are used as the training set for the next expert • Subsequent experts are trained on examples on which the previously trained experts disagree • Needs a LARGE training set • Solution: allow examples to be reused

  35. AdaBoost • Update the weights of training examples • easy examples get lower weight • hard examples get higher weight • 1. For all i, set w(i) = 1/N • 2. For weak classifier j, set w(i) = w(i) · b for every example i that j classifies correctly, where b = exp(−log((1−e)/e)) = e/(1−e) and e is the weighted training error • 3. Normalize all w(i)'s and go to step 2 • Final classification is a weighted vote of the T weak learners • The most widely used boosting algorithm (a sketch follows below)
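A hedged sketch of the weight-update loop described on the slide (AdaBoost.M1-style); train_weak is a hypothetical weak-learner interface, not from the slides:

```python
import math

def adaboost(examples, labels, train_weak, n_rounds):
    """AdaBoost sketch following the slide: reweight examples so that hard
    (misclassified) ones gain relative weight and easy ones lose it.

    train_weak(examples, labels, weights) -> hypothesis h, a callable with
    h(x) in the label set (a stand-in for any weak learner that accepts
    weighted data). Returns a list of (alpha, h) pairs for weighted voting.
    """
    n = len(examples)
    w = [1.0 / n] * n                      # step 1: uniform weights
    ensemble = []
    for _ in range(n_rounds):
        h = train_weak(examples, labels, w)
        # weighted training error e
        e = sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
        if e <= 0 or e >= 0.5:             # stop: learner is perfect, or the
            break                          # weak-learning assumption fails
        b = e / (1 - e)                    # b = exp(-log((1-e)/e))
        alpha = math.log(1 / b)            # this learner's vote weight
        # step 2: down-weight correctly classified (easy) examples
        w = [wi * b if h(x) == y else wi
             for wi, x, y in zip(w, examples, labels)]
        s = sum(w)
        w = [wi / s for wi in w]           # step 3: normalize, repeat
        ensemble.append((alpha, h))
    return ensemble

def adaboost_predict(ensemble, x, classes):
    """Final decision: weighted vote of the T weak learners."""
    return max(classes,
               key=lambda c: sum(a for a, h in ensemble if h(x) == c))
```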
