An Exercise in Machine Learning

An Exercise in Machine Learning • http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/ • Cornelia Caragea

Outline • Machine Learning Software • Preparing Data • Building Classifiers • Interpreting Results

Machine Learning Software • Suites (General Purpose) • WEKA (Source: Java) • MLC++ (Source: C++) • SAS • List from KDNuggets (Various) • Specific • Classification: C4.5, SVMlight • Association Rule Mining • Bayesian Net … • Commercial vs. Free

What does WEKA do? • Implements state-of-the-art learning algorithms • Main strength is classification • Also provides regression, association rule, and clustering algorithms • Extensible, so new learning schemes can be tried • Large variety of handy tools (dataset transformation, filters, visualization, etc.)

WEKA resources • API Documentation, Tutorials, Source code. • WEKA mailing list • Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations • Weka-related Projects: • Weka-Parallel - parallel processing for Weka • RWeka - linking R and Weka • YALE - Yet Another Learning Environment • Many others…

Preparing Data • ARFF Data Format • Header: declares the relation name and each attribute with its type • Data: one instance (example) per line, written as a comma-separated list of attribute values
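As a rough illustration of the header/data layout, here is a minimal Python sketch that holds a tiny ARFF document in a string and reads it back. The relation name, attribute names, and values are invented for the example, and `parse_arff` is a hypothetical helper that handles only simple, unquoted, dense files; it is not WEKA's own loader.

```python
# A tiny ARFF document: header (@relation, @attribute) followed by @data.
# The relation, attribute names and values are made up for illustration.
ARFF_TEXT = """\
@relation weather_toy

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}

@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Minimal reader (hypothetical helper): simple, unquoted, dense ARFF only."""
    attributes = []   # (name, type) pairs collected from the header
    instances = []    # one list of string values per data line
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            instances.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, instances


attributes, instances = parse_arff(ARFF_TEXT)
print(attributes)
print(instances)
```

By WEKA's convention the last attribute (here `play`) is commonly used as the class attribute, but the class can be chosen explicitly when the data is loaded.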

Launching WEKA • java -jar weka.jar

Load Dataset into WEKA

Data Filters • Useful support for data preprocessing • Removing or adding attributes, resampling the dataset, removing examples, etc. • Can create stratified cross-validation folds of the given dataset, so that class distributions are approximately retained within each fold • Typically split the data into 2/3 for training and 1/3 for testing
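The idea of a stratified 2/3 vs. 1/3 split can be sketched in plain Python as follows. This is an illustration of the concept only, not WEKA's filter implementation; `stratified_split`, its parameters, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # randomize within each class
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])            # about 2/3 of this class
        test_idx.extend(indices[cut:])             # the remaining 1/3
    return sorted(train_idx), sorted(test_idx)


# Toy class column: 9 "yes" and 6 "no" instances.
labels = ["yes"] * 9 + ["no"] * 6
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting each class separately is what keeps the class distribution of the training and test parts close to that of the full dataset.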

Data Filters

Building Classifiers • A classifier model is a mapping from the dataset attributes to the class (target) attribute; how the model is built and the form it takes differ between algorithms • Decision Tree and Naïve Bayes Classifiers • Which one is the best? • No Free Lunch!

Building Classifiers

(1) weka.classifiers.rules.ZeroR • Class for building and using a 0-R classifier • Majority class classifier • Predicts the mean (for a numeric class) or the mode (for a nominal class)
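The behaviour described above can be sketched in a few lines of Python. This is a stand-alone illustration of the 0-R idea, not the code of `weka.classifiers.rules.ZeroR`; the class below and its method names are assumptions made for the example.

```python
from collections import Counter
from statistics import mean


class ZeroR:
    """Baseline that ignores all attributes and looks only at the class column."""

    def fit(self, targets):
        if all(isinstance(t, (int, float)) for t in targets):
            self.prediction = mean(targets)                           # numeric class: mean
        else:
            self.prediction = Counter(targets).most_common(1)[0][0]   # nominal class: mode
        return self

    def predict(self, n_instances):
        # Every instance receives the same prediction.
        return [self.prediction] * n_instances


nominal = ZeroR().fit(["yes", "yes", "no"])
numeric = ZeroR().fit([1.0, 2.0, 3.0])
print(nominal.predict(2))
print(numeric.predict(2))
```

Because it uses no attribute information, 0-R is mainly useful as a baseline: any other classifier should do at least as well on the same data.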

Exercise 1 • http://www.cs.iastate.edu/~cs573x/BBSIlab/2006/exercises/ex1.html

(2) weka.classifiers.bayes.NaiveBayes • Class for building a Naive Bayes classifier

(3) weka.classifiers.trees.J48 • Class for generating a pruned or unpruned C4.5 decision tree

Test Options • Percentage Split (2/3 Training; 1/3 Testing) • Cross-validation • Estimates the generalization error by resampling when data is limited; the error estimate is averaged over the folds • stratified • 10-fold • leave-one-out (LOO)
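A minimal sketch of the averaged cross-validation error estimate, written in plain Python. To keep it short, the "classifier" is just a majority-class predictor and the folds are neither shuffled nor stratified; the function names and the toy labels are assumptions made for the example, not WEKA's evaluation code.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k folds of nearly equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def majority_class(labels):
    return Counter(labels).most_common(1)[0][0]


def cross_validated_error(labels, k):
    """Average test error of a majority-class predictor over k folds."""
    errors = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        guess = majority_class(train)                  # "train" on the other folds
        wrong = sum(1 for i in test_idx if labels[i] != guess)
        errors.append(wrong / len(test_idx))           # error on the held-out fold
    return sum(errors) / k                             # averaged error estimate


labels = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
five_fold_error = cross_validated_error(labels, k=5)
loo_error = cross_validated_error(labels, k=len(labels))   # leave-one-out
print(five_fold_error, loo_error)
```

Leave-one-out is the special case where the number of folds equals the number of instances, so each instance is held out exactly once.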

Understanding Output

Decision Tree Output (1)

Decision Tree Output (2)

Performance Measures • Accuracy & Error rate • Confusion matrix (contingency table) • True Positive rate & False Positive rate (plotted as the Receiver Operating Characteristic curve and summarized by the area under it) • Precision, Recall & F-Measure • Sensitivity & Specificity • For more information on these, see uisp09-Evaluation.ppt
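For a two-class problem, the measures listed above all follow from the four cells of the confusion matrix. The Python sketch below shows the standard definitions; the function name, the choice of "yes" as the positive class, and the toy predictions are assumptions made for the example.

```python
def binary_metrics(actual, predicted, positive="yes"):
    """Compute common measures from the 2x2 confusion matrix."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)            # = true positive rate = sensitivity
    return {
        "confusion": {"tp": tp, "fn": fn, "fp": fp, "tn": tn},
        "accuracy": ratio(tp + tn, len(pairs)),
        "error_rate": ratio(fp + fn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, fp + tn),
    }


actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "yes", "no", "no", "no", "yes", "no", "yes"]
metrics = binary_metrics(actual, predicted)
for name, value in metrics.items():
    print(name, value)
```

Note that sensitivity is the same quantity as recall (the true positive rate), and specificity equals one minus the false positive rate.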

Decision Tree Pruning • Overcomes over-fitting • Pre-pruning and post-pruning • Reduced-error pruning • Subtree raising with different confidence values • Comparing tree size and accuracy

Subtree Replacement • Bottom-up: a subtree is considered for replacement only after all of its own subtrees have been considered

Subtree Raising • Deletes a node and redistributes its instances • Slower than subtree replacement
