
Similarity-based Classifier Combination for Decision Making


Presentation Transcript


  1. Similarity-based Classifier Combination for Decision Making Authors: Gongde Guo, Daniel Neagu Department of Computing, University of Bradford

  2. Outline of Presentation
  • Background
    • Classification process
    • Drawbacks of a single classifier
    • Solutions
  • Approaches for multiple classifier systems
    • Explanation of the four approaches
    • An architecture of a multiple classifier system
  • Involved classifiers for combination
    • k-Nearest Neighbour method (kNN)
    • Weighted k-Nearest Neighbour method (wkNN)
    • Contextual Probability-based Classification (CPC)
    • kNN Model-based method (kNNModel)
  • Combination strategies
    • Majority voting-based combination
    • Maximal Similarity-based Combination
    • Average Similarity-based Combination
    • Weighted Similarity-based Combination
  • Experimental results
  • Conclusions
  • References

  3. Background - Classification Process Classification occurs in a wide range of human activities. At its broadest, the term could cover any activity in which some decision or forecast is made on the basis of currently available information, and a classifier is then some formal method for repeatedly making such judgments in new situations (Michie et al. 1994). Various approaches to classification have been developed and applied to real-world applications for decision making. Examples include probabilistic decision theory, discriminant analysis, fuzzy-neural networks, belief networks, non-parametric methods, tree-structured classifiers, and rough sets.

  4. Background - Drawbacks of a Single Classifier Unfortunately, no classifier is dominant for all data distributions, and the data distribution of the task at hand is usually unknown. A single classifier may not be discriminative enough when the number of classes is large. For applications where the classes of content are numerous, unlimited, and unpredictable, one specific classifier cannot solve the problem with good accuracy.

  5. Background - Solutions A Multiple Classifier System (MCS) is a powerful solution to difficult decision-making problems involving large data sets and noisy input, because it allows the simultaneous use of arbitrary feature descriptors and classification procedures. The ultimate goal of designing such a multiple classifier system is to achieve the best possible classification performance for the task at hand. Empirical studies have observed that different classifier designs potentially offer complementary information about the patterns to be classified, which can be harnessed to improve the performance of the selected classifier.

  6. Architecture of Multiple Classification Systems [Figure: four generic designs for combining classifiers, each feeding classifiers C1, …, CL into a combiner. Approach 1: different combination schemes; Approach 2: different classifier models; Approach 3: different feature subsets S1, …, Sk; Approach 4: different training sets D1, …, Dm.] Given a set of classifiers C = {C1, C2, …, CL} and a dataset D, each instance x in D is represented as a feature vector [x1, x2, …, xn]^T, x ∈ R^n. A classifier takes x as input and assigns it a class label from Ω, i.e. Ci: R^n → Ω. Four approaches are generally used to design a classifier combination system (Kuncheva, 2003).

  7. Explanation of the Four Approaches Approach 1: the problem is to pick a combination scheme for the L classifiers C1, C2, …, CL under study to form a combiner. Approach 2: the problem is to choose the individual classifiers by considering issues of similarity/diversity, homogeneity/heterogeneity, etc. Approach 3: the problem is to build each Ci on an individual subset of features (a subspace of R^n). Approach 4: the problem is to select training subsets D1, D2, …, Dm of the dataset D that lead to a team of diverse classifiers.

  8. An Architecture of Multiple Classifier System [Figure: data sets pass through data pre-processing (feature selection by GR, IG and CFS); the four classifiers kNN, kNNModel, wkNN and CPC each produce an output (Output1, Output2, Output3, Output4), and these outputs are merged by the classifier combination stage (MSC, ASC or WSC) into the final output.]

  9. Involved Classifiers for Combination - kNN Given an instance x, the k-nearest neighbour classifier finds its k nearest instances and traditionally uses the majority rule (or majority voting rule) to determine its class, i.e. it assigns to x the single most frequent class label associated with the k nearest neighbours. This is illustrated in the accompanying figure for k = 5: the two classes are depicted by "□" and "o", with ten instances for each class; each instance is represented by a two-dimensional point within a continuous-valued Euclidean space, and the query instance x is marked separately.
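
A minimal sketch of this rule in Python (the Euclidean distance, the k = 5 default, and the arbitrary tie-breaking are illustrative assumptions, not the authors' implementation):

```python
from collections import Counter
import math

def euclidean(a, b):
    # straight-line distance between two numeric feature vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(x, training_data, k=5):
    """Assign x the most frequent class label among its k nearest neighbours.

    training_data: list of (feature_vector, class_label) pairs.
    """
    # keep the k training instances closest to x
    neighbours = sorted(training_data, key=lambda item: euclidean(item[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    # plurality vote; ties are broken by the insertion order of Counter
    return votes.most_common(1)[0][0]
```

For example, knn_classify((1.0, 2.0), data, k=3) returns the label held by the majority of the three training points closest to (1.0, 2.0).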

  10. Involved Classifiers for Combination - wkNN In wkNN, the k nearest neighbours are assigned different weights. Let ∆ be a distance measure, and x1, x2, …, xk be the k nearest neighbours of x arranged in increasing order of ∆(xi, x), so that x1 is the first nearest neighbour of x. Each neighbour xi is assigned a distance weight wi that decreases as ∆(xi, x) increases. Instance x is assigned to the class for which the weights of the representatives among the k nearest neighbours sum to the greatest value.
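
The weight formula itself appears only as an image on the original slide, so the sketch below substitutes the widely used Dudani-style weighting, wi = (dk - di)/(dk - d1), with wi = 1 when dk = d1; treat this choice, and the inline distance helper, as assumptions rather than the authors' definition:

```python
import math

def wknn_classify(x, training_data, k=5):
    """Weighted kNN: neighbours closer to x contribute larger weights.

    Dudani-style weights w_i = (d_k - d_i) / (d_k - d_1) are assumed here;
    the weight definition on the original slide is not reproduced.
    """
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    neighbours = sorted((dist(v, x), label) for v, label in training_data)[:k]
    d1, dk = neighbours[0][0], neighbours[-1][0]
    scores = {}
    for di, label in neighbours:
        w = 1.0 if dk == d1 else (dk - di) / (dk - d1)
        scores[label] = scores.get(label, 0.0) + w
    # the class whose neighbour weights sum to the greatest value wins
    return max(scores, key=scores.get)
```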

  11. Involved Classifiers for Combination- CPC Contextual probability-based classifier (CPC) (Guo et al., 2004) is based on a new function G – a probability function used to calculate the support of overlapping or non-overlapping neighbourhoods. The idea of CPC is to aggregate the support of multiple sets of nearest neighbours of a new instance for various classes to give a more reliable support value, which better reveals the true class of this instance.

  12. Involved Classifiers for Combination - kNNModel The basic idea of the kNN model-based classification method (kNNModel) (Guo et al. 2003) is to find a set of more meaningful representatives of the complete data set to serve as the basis for further classification. Each chosen representative xi is stored as a tuple <Cls(xi), Sim(xi), Num(xi), Rep(xi)>, which respectively gives the class label of xi; the similarity of xi to the furthest instance among the instances covered by Ni; the number of instances covered by Ni; and a representation of instance xi. The symbol Ni denotes the neighbourhood of xi, i.e. the region within which the distance to xi is less than or equal to Sim(xi). kNNModel can generate a set of optimal representatives by inductive learning from the dataset.
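
A hedged sketch of the representative tuple <Cls(xi), Sim(xi), Num(xi), Rep(xi)> and the coverage test it suggests; the field names, the reading of Sim(xi) as a distance radius, and the omission of the inductive learning step are assumptions made for illustration:

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Representative:
    cls: str                 # Cls(xi): class label of the representative xi
    sim: float               # Sim(xi): distance to the furthest instance covered by Ni (read here as a radius)
    num: int                 # Num(xi): number of instances covered by Ni
    rep: Sequence[float]     # Rep(xi): the feature vector representing xi

def covers(r: Representative, x: Sequence[float]) -> bool:
    # x falls inside the neighbourhood Ni if its distance to Rep(xi)
    # does not exceed Sim(xi)
    d = math.sqrt(sum((ri - xi) ** 2 for ri, xi in zip(r.rep, x)))
    return d <= r.sim
```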

  13. Combination Strategy - Majority Voting-based Combination Given a new instance x to be classified, whose true class label is tx, and k predefined classifiers denoted A1, A2, …, Ak, each classifier Ai approximates a discrete-valued function Ai: R^n → Ω. The final class label of x, obtained by majority voting-based classifier combination, is f(x) = argmax_{ω∈Ω} Σ_{i=1}^{k} δ(ω, Ai(x)), where δ(a, b) = 1 if a = b, and δ(a, b) = 0 otherwise.
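
A small sketch of the majority-voting combiner: each base classifier Ai casts one vote, δ(a, b) is the agreement indicator from the formula above, and the class gathering the most votes wins (the classifier call interface is assumed for illustration):

```python
def majority_vote(x, classifiers, class_labels):
    """f(x) = argmax over classes of sum_i delta(class, A_i(x))."""
    predictions = [A(x) for A in classifiers]        # A_i(x) for each base classifier
    delta = lambda a, b: 1 if a == b else 0          # indicator used in the formula above
    return max(class_labels,
               key=lambda c: sum(delta(c, p) for p in predictions))
```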

  14. Combination Strategy - Class-wise Similarity-based Classifier Combination The classification result of x given by classifier Aj is a vector of normalised similarity values of x to each class, Sj = <Sj1, Sj2, …, Sjm>, where j = 1, 2, …, k. The final class label of x can be obtained in three different ways: a) Maximal Similarity-based Combination (MSC): assign x to the class that receives the largest single similarity value from any of the k classifiers. b) Average Similarity-based Combination (ASC): assign x to the class with the largest similarity value averaged over the k classifiers. c) Weighted Similarity-based Combination (WSC): a weighted combination of these class-wise similarities, where α is a control parameter used to set the relative importance of local optimization and global optimization in the combination.
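
The three rules can be sketched as follows. MSC and ASC follow directly from their names (largest single similarity versus largest average similarity across the k classifiers); the WSC branch is only an assumed α-blend of those two quantities, since the slide's own formula is not reproduced in this transcript:

```python
def combine_similarities(S, rule="ASC", alpha=0.5):
    """S: k x m matrix with S[j][i] = normalised similarity of x to class i
    according to classifier A_j.  Returns the index of the winning class."""
    k, m = len(S), len(S[0])
    max_sim = [max(S[j][i] for j in range(k)) for i in range(m)]       # local evidence
    avg_sim = [sum(S[j][i] for j in range(k)) / k for i in range(m)]   # global evidence
    if rule == "MSC":
        score = max_sim
    elif rule == "ASC":
        score = avg_sim
    else:  # "WSC": assumed alpha-blend, not the authors' exact formula
        score = [alpha * max_sim[i] + (1 - alpha) * avg_sim[i] for i in range(m)]
    return max(range(m), key=lambda i: score[i])
```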

  15. Experimental Results This study mainly focuses on Approach 1. Given the four classifiers kNN, kNNModel, CPC and wkNN, we empirically propose three similarity-based classifier combination schemes. After evaluating them on fifteen public datasets from the UCI machine learning repository, we apply the best approach to a real-world application, the prediction of the toxic environmental effects of chemicals, in order to obtain better classification performance.

  16. Fifteen public data sets from the UCI machine learning repository and one data set (Phenols) from a real-world application (toxicity prediction of chemical compounds) have been collected for training and testing. Some information about these data sets is given in Table 1, where NF = number of features, NN = number of nominal features, NO = number of ordinal features, NB = number of binary features, NI = number of instances, and CD = class distribution. Four Phenols data sets are used in the experiments: Phenols_M is the Phenols data set with MOA (Mechanism of Action) as the prediction endpoint; Phenols_M_FS is the Phenols_M data set after feature selection; Phenols_T is the Phenols data set with toxicity as the prediction endpoint; and Phenols_T_FS is the Phenols_T data set after feature selection.

  17. Table 2. A comparison of four individual algorithms and MV in classification performance.

  18. Table 3. A comparison of different combination schemes

  19. Table 4. The signed test of different classifiers. In Table 4, the entry 2.07 (+) in cell (3, 4), for example, means that WSC performs significantly better than kNNModel over the nineteen data sets, since the corresponding |Z| > Z0.95 = 1.729. The entry 1.15 (-) in cell (3, 2) means there is no significant difference in performance between WSC and SVM over the nineteen data sets, since the corresponding |Z| < Z0.95 = 1.729.
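
For reference, a normal-approximation form of the sign test over n = 19 paired comparisons can be sketched as below; the win counts and the exact statistic the authors computed are not shown on the slide, so both the formula and the example value are assumptions:

```python
import math

def sign_test_z(wins, n):
    """Normal approximation to the sign test: wins = number of data sets on
    which one classifier beats the other, out of n paired comparisons."""
    return (wins - 0.5 * n) / (0.5 * math.sqrt(n))

# A hypothetical 14 wins out of 19 data sets gives |Z| ≈ 2.06, just above
# the critical value 1.729 quoted in Table 4.
print(abs(sign_test_z(14, 19)))
```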

  20. Conclusions
  • The proposed methods directly employ the class-wise similarity measures used in each individual classifier for combination, without converting the representation from similarity to probability.
  • The combination significantly improves the average classification accuracy over the nineteen data sets: the average classification accuracy of WSC is better than that of any individual classifier and of the majority voting-based combination method.
  • The statistical test also shows that the proposed combination method WSC is better than any individual classifier, with the exception of SVM.
  • The average classification accuracy of WSC is nevertheless better than that of SVM, with a 2.49% improvement.
  • Further research is required into how to combine heterogeneous classifiers using class-wise similarity-based combination methods.

  21. References
  (Michie et al. 1994) D. Michie, D.J. Spiegelhalter, and C.C. Taylor. Machine Learning, Neural and Statistical Classification. Ellis Horwood, 1994.
  (Guo et al. 2003) G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer. kNN Model-Based Approach in Classification. In Proc. of ODBASE 2003, LNCS 2888, pp. 986-996, 2003.
  (Guo et al. 2004) G. Guo, H. Wang, D. Bell, Z. Liao. Contextual Probability-Based Classification. In Proc. of ER 2004, LNCS 3288, pp. 313-326, Springer-Verlag, 2004.
  (Kuncheva, 2003) L.I. Kuncheva. Combining Classifiers: Soft Computing Solutions. In: S.K. Pal (Ed.), Pattern Recognition: From Classical to Modern Approaches, pp. 427-452, World Scientific, Singapore, 2003.

  22. Thank you very much!
