R for Classification
This document presents an exploration of classification techniques using R to automatically identify object types based on measured features. It covers data preparation and exploratory data analysis (EDA) techniques such as box plots and PCA. Additionally, various classification algorithms, including decision trees, support vector machines (SVM), and ensemble methods, are examined. The document also discusses performance assessment methods like confusion matrices and ROC curves. It features example data and highlights the best features for distinguishing between classes.
Presentation Transcript
R for Classification
Jennifer Broughton
Shimadzu Research Laboratory, Manchester, UK
jennifer.broughton@srlab.co.uk
2nd May 2013
Classification?
Automatic identification of the type (class) of an object from its measured variables (features).

Object Type   Feature 1   Feature 2   Feature 3   …….   Feature n
Label 1       val[1,1]    val[1,2]    val[1,3]    …….   val[1,n]
Label 2       val[2,1]    val[2,2]    val[2,3]    …….   val[2,n]
……            …….         …….         …….         …….   ………
Label m       val[m,1]    val[m,2]    val[m,3]    …….   val[m,n]
2 of 17
Example Data 3 of 17
Data Preparation & Investigation
EDA Technique:
• Box Plots
• PCA
• Decision Trees
• Clustering
Used to find:
• Best features to distinguish between classes
• Relationships between features
• Feature reduction
Result: Training Set
4 of 17
Box Plots
PCA & Multivariate Analysis: ade4, FactoMineR
5 of 17
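The plots on this slide are not preserved in the transcript. As a rough sketch of the two techniques, the following uses the built-in iris data set (an assumption, not the presenter's data) and base R's boxplot and prcomp rather than the ade4 or FactoMineR packages named on the slide.

```r
# Illustrative data: the built-in iris data set (not the presenter's data).
data(iris)

# Box plot of one feature split by class. A feature whose boxes barely
# overlap between classes is a good candidate for telling them apart.
boxplot(Petal.Length ~ Species, data = iris,
        main = "Petal length by species", ylab = "Petal length (cm)")

# PCA on the four numeric features, centred and scaled to unit variance.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores on the first two components, coloured by class.
plot(pca$x[, 1:2], col = as.integer(iris$Species), pch = 19,
     main = "PCA scores by species")
legend("topright", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```

If the first one or two components carry most of the variance and the classes separate in the score plot, that suggests the feature set can be reduced before training.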
Example Classifier 6 of 17
Classification Algorithms in R
Rattle: R Analytical Tool To Learn Easily
(Rattle: A Data Mining GUI for R, Graham J Williams, The R Journal, 1(2):45-55)
7 of 17
SVM 8 of 17
Ensemble Algorithm 9 of 17
Training and Testing
• Training Set (labelled) → Classification Algorithm (Neural Network, Support Vector Machine, Random Forest) → Trained Classifier
• Test Set (unlabelled) → Trained Classifier → Classification Results
• Prediction Results + Labels → Assess Predictions: Confusion Matrix, ROC Curve (2 categories), ….
10 of 17
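The train/test workflow on this slide might look roughly like the sketch below. It assumes the randomForest package is installed and uses iris as stand-in data. The 70/30 split, the seed, and ntree = 200 are arbitrary choices for illustration.

```r
# Assumption: the randomForest package is installed; iris stands in for real data.
library(randomForest)

set.seed(42)                      # make the random split reproducible
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))

train_set <- iris[train_idx, ]    # labelled data used to fit the classifier
test_set  <- iris[-train_idx, ]   # held out; labels used only for assessment

# Train the classifier on the labelled training set.
rf <- randomForest(Species ~ ., data = train_set, ntree = 200)

# Predict classes for the test set without giving the classifier its labels.
test_features <- test_set[, names(test_set) != "Species"]
pred <- predict(rf, newdata = test_features)

# Compare the predictions with the held-back labels.
accuracy <- mean(pred == test_set$Species)
print(accuracy)
```

Keeping the test labels out of the predict call mirrors the "Test Set (unlabelled)" box: the labels come back in only at the assessment step.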
Using Classifiers in R
• Select Training Data
• Build Classifier: classifier <- algorithm(formula, data, options) (boosting and nnet)
• Run Classifier: classifier.pred <- predict(classifier, newdata, options)
11 of 17
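A minimal sketch of the build/run pattern, using two packages that ship with standard R installations. nnet is named on the slide. rpart (decision trees) is swapped in here as a second algorithm in place of boosting, purely to show that the same formula interface carries over. The size, decay, and maxit values are illustrative guesses, not tuned settings.

```r
library(nnet)    # single-hidden-layer neural networks
library(rpart)   # decision trees, used here as a second example algorithm

set.seed(1)
train_idx <- sample(nrow(iris), 100)
train_set <- iris[train_idx, ]
new_data  <- iris[-train_idx, ]

# Build: classifier <- algorithm(formula, data, options)
nn_classifier   <- nnet(Species ~ ., data = train_set, size = 4,
                        decay = 0.01, maxit = 200, trace = FALSE)
tree_classifier <- rpart(Species ~ ., data = train_set, method = "class")

# Run: classifier.pred <- predict(classifier, newdata, options)
nn_pred   <- predict(nn_classifier,   newdata = new_data, type = "class")
tree_pred <- predict(tree_classifier, newdata = new_data, type = "class")

# Count how many objects each classifier assigned to each class.
print(table(nn_pred))
print(table(tree_pred))
```

Only the algorithm name and its options change between the two classifiers; the formula, the data argument, and the predict call keep the same shape.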
SVM & Neural Net Tuning 12 of 17
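The tuning output shown on this slide is not preserved. One common way to tune an SVM in R is a cross-validated grid search with tune.svm from the e1071 package, sketched below. The package choice, the parameter grid, and the iris data are all assumptions, not the presenter's settings.

```r
# Assumption: the e1071 package is installed.
library(e1071)

set.seed(7)
# Grid search over gamma and cost, scored by 10-fold cross-validation
# (the default resampling scheme in e1071's tune functions).
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-3:0), cost = 10^(0:2))

print(tuned$best.parameters)    # best gamma/cost pair found on the grid
print(tuned$best.performance)   # cross-validated error for that pair

best_svm <- tuned$best.model    # SVM refitted with the best parameters
```

The same package provides a tune.nnet wrapper, so a neural network's size and decay can be searched in the same style.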
Classifier Feedback
• print(classifier)
• plot(classifier)
• high Gini Coefficient = high dispersion
13 of 17
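The transcript does not say which classifier produced the feedback on this slide. As one possible illustration, a random forest (randomForest package, assumed installed) supports print and plot and reports a Gini-based importance for each feature.

```r
# Assumption: the randomForest package is installed; iris is stand-in data.
library(randomForest)

set.seed(3)
classifier <- randomForest(Species ~ ., data = iris, ntree = 200)

print(classifier)   # call, out-of-bag error estimate, out-of-bag confusion matrix
plot(classifier)    # error rates against the number of trees grown

# Mean decrease in Gini impurity attributed to each feature;
# larger values mean the feature did more to separate the classes.
gini <- importance(classifier)[, "MeanDecreaseGini"]
print(sort(gini, decreasing = TRUE))
varImpPlot(classifier)
```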
Classifier Prediction Results
• predict(type = "class")
• predict(type = "prob")
• confusion matrix
14 of 17
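The two prediction types and the confusion matrix can be sketched with an rpart decision tree, chosen here only because it ships with R and accepts both type values. The exact type names vary between classifier packages, so this is an illustration under that assumption.

```r
library(rpart)

set.seed(11)
train_idx  <- sample(nrow(iris), 100)
classifier <- rpart(Species ~ ., data = iris[train_idx, ], method = "class")
test_set   <- iris[-train_idx, ]

# type = "class": one predicted label per object (a factor).
pred_class <- predict(classifier, newdata = test_set, type = "class")

# type = "prob": one row per object, one column of probability per class.
pred_prob <- predict(classifier, newdata = test_set, type = "prob")
print(head(pred_prob))

# Confusion matrix: predicted labels against the true labels.
confusion <- table(Predicted = pred_class, Actual = test_set$Species)
print(confusion)
```

Counts on the diagonal of the table are correct classifications; off-diagonal counts show which classes are being confused with each other.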
Binary Classification Results
                      Class Present?
Class Detected?       N                 Y
Y                     False Positive    True Positive
N                     True Negative     False Negative
15 of 17
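The four cells of the binary table lead to the usual summary rates. The counts below are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Hypothetical counts, for illustration only.
tp <- 40   # class present and detected
fp <- 5    # class absent but detected
fn <- 10   # class present but not detected
tn <- 45   # class absent and not detected

# Same layout as the table: rows = detected (Y, N), columns = present (N, Y).
confusion <- matrix(c(fp, tn, tp, fn), nrow = 2,
                    dimnames = list(Detected = c("Y", "N"),
                                    Present  = c("N", "Y")))
print(confusion)

sensitivity <- tp / (tp + fn)                   # true positive rate
specificity <- tn / (tn + fp)                   # true negative rate
precision   <- tp / (tp + fp)                   # share of detections that are real
accuracy    <- (tp + tn) / (tp + fp + fn + tn)  # share of all objects classified correctly

print(c(sensitivity = sensitivity, specificity = specificity,
        precision = precision, accuracy = accuracy))
```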
ROC Curves in R ROCR package 16 of 17
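A minimal ROCR sketch for a two-class problem follows. The data (two iris species), the logistic regression scorer, and the restriction to the two sepal features are assumptions made so that the curve is not trivially perfect; they are not taken from the presentation.

```r
# Assumption: the ROCR package is installed.
library(ROCR)

# Two-class problem derived from iris: versicolor vs virginica.
two_class <- droplevels(iris[iris$Species != "setosa", ])

set.seed(5)
train_idx <- sample(nrow(two_class), 70)
train_set <- two_class[train_idx, ]
test_set  <- two_class[-train_idx, ]

# Logistic regression as a simple scoring classifier; it models the
# probability of the second factor level (virginica).
fit <- glm(Species ~ Sepal.Length + Sepal.Width,
           data = train_set, family = binomial)
scores <- predict(fit, newdata = test_set, type = "response")

# ROCR pairs each score with its true label ...
pred <- prediction(scores, test_set$Species)

# ... and derives true positive rate against false positive rate.
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, main = "ROC curve: versicolor vs virginica")
abline(0, 1, lty = 2)   # reference line for a random classifier

# Area under the ROC curve.
auc <- performance(pred, measure = "auc")@y.values[[1]]
print(auc)
```

A curve that hugs the top-left corner, with an area close to 1, indicates a classifier that separates the two classes well across all score thresholds.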
Example Results 17 of 17