
Evaluation of classifiers: choosing the right model


Presentation Transcript


  1. Evaluation of classifiers: choosing the right model
  Rajashree Paul, Irina Pekerskaya

  2. 1. Introduction
  • Different classification algorithms – which one is best? The No Free Lunch theorem says no single algorithm is best on all problems.
  • A large number of experimental evaluations have already been conducted.
  • Idea: ask what is important to the user.

  3. 1.1. Long-term vision
  • The user defines the quality measures that are important
  • The data characteristics of the given dataset are analyzed
  • A feature selection method is chosen
  • A classification method is chosen
  • The parameters of the classification method are optimized

  4. 1.2. Project goals
  • Analyze possible user-defined criteria
  • Analyze possible data characteristics
  • Analyze the behavior of the chosen algorithms
  • Give suggestions on the algorithm choice with respect to those criteria and characteristics
  • Survey some approaches for automatic selection of the classification method

  5. Presentation Progress
  • Introduction
  • Classification Algorithms
  • Specifying Classification Problems
  • Combining Algorithms and different measures for a final mapping
  • Meta Learning and automatic selection of the algorithm

  6. 2. Classification Algorithms
  • Decision Tree (C4.5)
  • Neural Net
  • Naïve Bayes
  • k-Nearest Neighbor
  • Linear Discriminant Analysis (LDA)

  7. 2.1 Decision Trees (C4.5)
  • Starts with the root holding the complete dataset
  • Chooses the best split using the gain ratio (information gain normalized by split information) and partitions the dataset accordingly
  • Recursively does the same at each node
  • Stops when no attributes are left or all records at the node belong to the same class
  • Applies post-pruning to avoid overfitting
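The gain-ratio criterion C4.5 uses to choose a split can be illustrated with a short sketch. This is a minimal illustration assuming categorical attributes; the function names and the toy data are invented for this example, not taken from the slides.

    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain_ratio(rows, labels, attribute_index):
        """Information gain of a categorical split, divided by its split information."""
        n = len(labels)
        # Partition the class labels by the value of the chosen attribute.
        partitions = {}
        for row, label in zip(rows, labels):
            partitions.setdefault(row[attribute_index], []).append(label)
        # Information gain = entropy before the split - weighted entropy after it.
        remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
        gain = entropy(labels) - remainder
        # Split information penalizes attributes with many distinct values.
        split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
        return gain / split_info if split_info > 0 else 0.0

    # Toy example: pick the attribute with the highest gain ratio at the root.
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
    labels = ["no", "no", "yes", "yes"]
    best = max(range(len(rows[0])), key=lambda i: gain_ratio(rows, labels, i))
    print("best attribute index:", best)  # -> 0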

  8. 2.2 Neural Net
  • Layered network of neurons (perceptrons) connected via weighted edges
  • In a backpropagation network:
  1. The input is fed forward through the network to produce the actual output
  2. The actual output is compared with the expected output to obtain the error
  3. The error is propagated back through the network to tune the weights and bias of each node
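The three steps above can be sketched for a single training example on a one-hidden-layer network. A minimal numpy illustration with made-up layer sizes and learning rate; it is not the configuration used in any of the cited experiments.

    import numpy as np

    rng = np.random.default_rng(0)
    # Tiny network: 3 inputs -> 4 hidden units -> 1 output, sigmoid activations.
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.0, 2.0])   # input
    t = np.array([1.0])              # expected output
    lr = 0.1                         # learning rate

    # 1. Feed the input forward to obtain the actual output.
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)

    # 2. Compare the actual and expected outputs (the error).
    error = y - t

    # 3. Propagate the error back to tune the weights and biases of each layer.
    delta_out = error * y * (1 - y)               # gradient at the output layer
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # gradient at the hidden layer
    W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid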

  9. 2.3 Naïve Bayes
  • Assumes that the input attributes are independent of each other given the target class
  • Decision rule of the Naïve Bayes classifier: predict the class C* = argmax_{Cj} P(Cj) ∏_i P(Oi | Cj), where P(Oi | Cj) is the conditional probability of attribute value Oi given class Cj
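The decision rule amounts to an arg-max over classes of the prior times the product of the per-attribute conditional probabilities. A minimal sketch, assuming categorical attributes and pre-estimated probability tables; the dictionaries below are hypothetical values, not estimates from any dataset in the slides.

    from math import log

    def naive_bayes_predict(x, priors, cond_prob):
        """Pick argmax_Cj P(Cj) * prod_i P(Oi | Cj), using log-probabilities for stability."""
        best_class, best_score = None, float("-inf")
        for c, prior in priors.items():
            score = log(prior) + sum(log(cond_prob[c][i][v]) for i, v in enumerate(x))
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    # Hypothetical estimates for two classes and two categorical attributes.
    priors = {"yes": 0.6, "no": 0.4}
    cond_prob = {
        "yes": [{"sunny": 0.2, "rain": 0.8}, {"hot": 0.3, "cool": 0.7}],
        "no":  [{"sunny": 0.7, "rain": 0.3}, {"hot": 0.6, "cool": 0.4}],
    }
    print(naive_bayes_predict(("rain", "cool"), priors, cond_prob))  # -> "yes"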

  10. 2.4 k-Nearest Neighbor
  • Uses a distance function as a measure of similarity/dissimilarity between two objects
  • Classifies an object to the majority class of its k nearest neighbors
  • The value of k may vary
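A minimal k-NN sketch with Euclidean distance and a majority vote among the k nearest training examples; the toy data and the choice k=3 are placeholders.

    from collections import Counter
    import math

    def knn_classify(query, examples, k=3):
        """Classify `query` as the majority class among its k nearest training examples."""
        # examples: list of (feature_vector, class_label) pairs
        dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    examples = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]
    print(knn_classify((1.1, 0.9), examples, k=3))  # -> "A"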

  11. 2.5 Linear Discriminant Analysis (LDA)
  • Optimal for normally distributed data and classes having equal covariance structure
  What it does:
  • For each class it generates a linear discriminant function
  • For a problem with d features it separates the classes by (d-1)-dimensional hyperplanes
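The "one linear function per class" point can be made concrete: with class means, class priors, and a pooled covariance matrix, each class gets a linear discriminant score and the predicted class is the one with the highest score. A minimal numpy sketch under the equal-covariance assumption; the arrays are placeholders.

    import numpy as np

    def lda_fit(X, y):
        """Estimate per-class means, class priors, and the inverse of a pooled covariance."""
        classes = np.unique(y)
        means = {c: X[y == c].mean(axis=0) for c in classes}
        priors = {c: np.mean(y == c) for c in classes}
        # Pooled (shared) covariance reflects the equal-covariance assumption of LDA.
        centered = np.vstack([X[y == c] - means[c] for c in classes])
        cov = centered.T @ centered / (len(X) - len(classes))
        return classes, means, priors, np.linalg.inv(cov)

    def lda_predict(x, classes, means, priors, cov_inv):
        """Linear score per class: x' S^-1 mu_c - 0.5 mu_c' S^-1 mu_c + log prior_c."""
        scores = {c: x @ cov_inv @ means[c] - 0.5 * means[c] @ cov_inv @ means[c]
                     + np.log(priors[c]) for c in classes}
        return max(scores, key=scores.get)

    X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 4.5], [3.8, 4.2]])
    y = np.array(["A", "A", "B", "B"])
    print(lda_predict(np.array([1.1, 1.9]), *lda_fit(X, y)))  # -> "A"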

  12. Presentation Progress
  • Introduction
  • Classification Algorithms
  • Specifying Classification Problems
  • Combining Algorithms and different measures for a final mapping
  • Meta Learning and automatic selection of the algorithm

  13. 3. Specifying Classification Problems
  • 3.1 Quality Measures
  • 3.2 Data Characteristics

  14. 3.1 Quality Measures
  • Accuracy
  • Training Time
  • Execution Time
  • Interpretability
  • Scalability
  • Robustness
  • …

  15. 3.2 Data Characteristics
  • Size of Dataset
  • Number of Attributes
  • Amount of Training Data
  • Proportion of Symbolic Attributes
  • Proportion of Missing Values
  • Proportion of Noisy Data
  • Linearity of Data
  • Normality of Data
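Most of these characteristics can be computed mechanically from a dataset. A minimal sketch, assuming the data sits in a pandas DataFrame with a separate label column; the column conventions and the toy frame are assumptions for illustration only.

    import pandas as pd

    def data_characteristics(df: pd.DataFrame, label_column: str) -> dict:
        """Compute a few of the dataset characteristics listed above."""
        features = df.drop(columns=[label_column])
        symbolic = features.select_dtypes(include=["object", "category"])
        return {
            "n_examples": len(df),
            "n_attributes": features.shape[1],
            "prop_symbolic_attributes": symbolic.shape[1] / features.shape[1],
            "prop_missing_values": features.isna().mean().mean(),
        }

    df = pd.DataFrame({"colour": ["red", "blue", None],
                       "size": [1.0, 2.5, 3.0],
                       "label": ["a", "b", "a"]})
    print(data_characteristics(df, "label"))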

  16. Presentation Progress
  • Introduction
  • Classification Algorithms
  • Specifying Classification Problems
  • Combining Algorithms and different measures for a final mapping
  • Meta Learning and automatic selection of the algorithm

  17. 4. Combining Algorithms and Measures
  • 1st step toward our goal: comparing the algorithms with respect to different data characteristics

  18. 4.1 Evaluating Algorithms with respect to Data Characteristics

  19. 4. Combining Algorithms and Measures (contd.)
  • 2nd step: comparing algorithms in terms of different quality measures

  20. 4.2 Evaluating Algorithms with respect to Quality Measures

  21. 4. Combining Algorithms and Measures (contd.)
  • Finally: merging the information of the two tables into one mapping: (data characteristics + quality measures) => algorithm

  22. 4.3 Choice of the algorithm

  23. Presentation Progress
  • Introduction
  • Classification Algorithms
  • Specifying Classification Problems
  • Combining Algorithms and different measures for a final mapping
  • Meta Learning and automatic selection of the algorithm

  24. 5. Meta-learning Overview
  • We have some knowledge about the algorithms' performance on some datasets
  • Measure the similarity between the given dataset and previously processed ones and choose those that are similar
  • Measure algorithm performance on the chosen datasets
  • Rank the algorithms

  25. 5. Meta-learning: Measuring the similarity between datasets
  Data characteristics:
  • Number of examples
  • Number of attributes
  • Proportion of symbolic attributes
  • Proportion of missing values
  • Proportion of attributes with outliers
  • Entropy of classes

  26. 5. Meta-learning: Measuring the similarity between datasets
  • k-Nearest Neighbor over the data characteristics above
  • Distance function: see the reconstruction below
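The distance formula itself did not survive the transcript. In the instance-based ranking method of Brazdil & Soares (reference 20), the distance between two datasets is the sum over the data characteristics of absolute differences, each normalized by the range of that characteristic across the stored datasets; a reconstruction along those lines (an assumption, not a verbatim copy of the slide) is:

    dist(d_i, d_j) = \sum_{x} \frac{|v_{x,d_i} - v_{x,d_j}|}{\max_k v_{x,d_k} - \min_k v_{x,d_k}}

where v_{x,d} is the value of characteristic x measured on dataset d.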

  27. 5. Meta-learning: Evaluating algorithms
  • Adjusted ratio of ratios (ARR): a multicriteria evaluation measure (see the reconstruction below)
  • AccD – the amount of accuracy the user is willing to trade for a 10-times speedup or slowdown
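The ARR formula was shown as an image on the original slide. Following reference 20, ARR compares algorithm a_p against algorithm a_q on dataset d_i by the ratio of their success rates, discounted by the logarithm of the ratio of their run times weighted by AccD; the reconstruction below follows that paper and is not copied from the slide:

    ARR^{a_p,a_q}_{d_i} = \frac{SR^{a_p}_{d_i} / SR^{a_q}_{d_i}}{1 + AccD \cdot \log\left(T^{a_p}_{d_i} / T^{a_q}_{d_i}\right)}

Here SR is the success rate (accuracy) and T the run time of an algorithm on dataset d_i; an algorithm's overall score is obtained by averaging its ARR against the other algorithms over the selected datasets.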

  28. 5. Meta-learning: Ranking algorithms
  • Calculate ARR for each algorithm
  • Rank them according to this measure

  29. 5. Meta-learning: Putting it together
  • Given a new dataset with certain data characteristics
  • Find the k datasets most similar to the new dataset using the k-NN algorithm
  • Using the stored time and accuracy results on those training datasets, compute the ARR measure for all algorithms on the k chosen datasets
  • Rank the algorithms according to ARR (an end-to-end sketch follows)
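The whole procedure on this slide can be sketched end to end. The meta-database layout, the AccD value, and the helper names below are assumptions made for illustration; this is not the system described in the references.

    import math

    def dataset_distance(a, b, ranges):
        """Normalized L1 distance between two meta-feature dictionaries."""
        return sum(abs(a[f] - b[f]) / ranges[f] for f in ranges)

    def arr(acc_p, time_p, acc_q, time_q, acc_d=0.02):
        """Adjusted ratio of ratios of algorithm p vs. q on one dataset (assumed form)."""
        return (acc_p / acc_q) / (1.0 + acc_d * math.log10(time_p / time_q))

    def rank_algorithms(new_meta, meta_db, results, k=3):
        """meta_db: {dataset: meta-features}; results: {(dataset, algo): (accuracy, time)}."""
        ranges = {f: (max(m[f] for m in meta_db.values()) -
                      min(m[f] for m in meta_db.values())) or 1.0
                  for f in new_meta}
        # 1. Find the k stored datasets most similar to the new one.
        neighbours = sorted(meta_db, key=lambda d: dataset_distance(new_meta, meta_db[d], ranges))[:k]
        algos = sorted({a for (_, a) in results})
        # 2. Average each algorithm's ARR against every other algorithm on those datasets.
        scores = {}
        for p in algos:
            vals = [arr(*results[(d, p)], *results[(d, q)])
                    for d in neighbours for q in algos if q != p]
            scores[p] = sum(vals) / len(vals)
        # 3. Rank by the averaged ARR, best first.
        return sorted(scores, key=scores.get, reverse=True)

Calling rank_algorithms with the meta-features of the new dataset returns the candidate classifiers ordered by their averaged ARR on the neighbouring datasets.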

  30. 5. Meta-learning
  • A promising approach, as it provides a quantitative measure
  • Needs more investigation, for example incorporating more criteria into the ARR measure

  31. 6. Conclusion
  Done:
  • An overall mapping of five classifiers according to some user-specified parameters along with some characteristics reflecting the type of dataset
  • Surveyed systems that automatically select the appropriate classifier, or rank different classifiers, given a dataset and a set of user preferences

  32. 6. Conclusion
  Future research:
  • Performing dedicated experiments
  • Analyzing the choice of feature selection method and the optimization of the parameters
  • Working on an intelligent system for choosing the classification method

  33. 7. References
  • 1. Brazdil, P., & Soares, C. (2000). A comparison of ranking methods for classification algorithm selection. In R. López de Mántaras & E. Plaza (Eds.), Machine Learning: Proceedings of the 11th European Conference on Machine Learning, ECML 2000 (pp. 63–74). Berlin: Springer.
  • 2. Brodley, C., & Utgoff, P. (1995). Multivariate decision trees. Machine Learning, 19.
  • 3. Brown, D., Corruble, V., & Pittard, C. L. (1993). A comparison of decision tree classifiers with backpropagation neural networks for multimodal classification problems. Pattern Recognition, 26(6).
  • 4. Curram, S., & Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 45(4).
  • 5. Han, J., & Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann.
  • 6. Hilario, M., & Kalousis, A. (1999). Building algorithm profiles for prior model selection in knowledge discovery systems. In Proceedings of the IEEE SMC'99 International Conference on Systems, Man and Cybernetics. New York: IEEE Press.
  • 7. Kalousis, A., & Theoharis, T. (1999). NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. Intelligent Data Analysis, 3(5), 319–337.

  34. 7. References
  • 8. Keller, J., Paterson, I., & Berrer, H. (2000). An integrated concept for multi-criteria ranking of data-mining algorithms. In J. Keller & C. Giraud-Carrier (Eds.), Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination.
  • 9. Kiang, M. (2003). A comparative assessment of classification methods. Decision Support Systems, 35.
  • 10. Kononenko, I., & Bratko, I. (1991). Information-based evaluation criterion for classifier's performance. Machine Learning, 6.
  • 11. Lim, T., Loh, W., & Shih, Y. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40.
  • 12. Michie, D., Spiegelhalter, D., & Taylor, C. (1994). Machine Learning, Neural and Statistical Classification. Ellis Horwood.
  • 13. Mitchell, T. (1997). Machine Learning. New York: McGraw-Hill.
  • 14. Quinlan, J. (1993). C4.5: Programs for Machine Learning. San Francisco: Morgan Kaufmann.

  35. 7. References
  • 15. Salzberg, S. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1.
  • 16. Shavlik, J., Mooney, R., & Towell, G. (1991). Symbolic and neural learning algorithms: An experimental comparison. Machine Learning, 6.
  • 17. Todorovski, L., & Džeroski, S. (2000). Combining multiple models with meta decision trees. In D. Zighed, J. Komorowski, & J. Zytkow (Eds.), Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000) (pp. 54–64). New York: Springer.
  • 18. Weiss, S., & Kulikowski, C. (1991). Computer Systems That Learn. San Francisco: Morgan Kaufmann.
  • 19. Witten, I., & Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann.
  • 20. Brazdil, P., & Soares, C. (2003). Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning, 50, 251–277.
  • 21. Blake, C., Keogh, E., & Merz, C. (1998). UCI Repository of Machine Learning Databases. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html

  36. Thank you! Questions?
