Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods

Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods Hidenao Abe & Takahira Yamaguchi Shizuoka University, JAPAN {hidenao,yamaguti}@ks.cs.inf.shizuoka.ac.jp

Contents • Introduction • Constructive meta-learning based on method repositories • Case study with common data sets • Conclusion Parallel and Distributed Computing for Machine Learning 2003

学習データセット学習 Data Set 学習 Learning Algorithms Overview of meta-learning scheme Meta-learning algorithm Meta Knowledge Meta-Learning Executor A Better result to the given data set than its done by each single learning algorithm Parallel and Distributed Computing for Machine Learning 2003

Selective meta-learning scheme • Integrating base-level classifiers, which are learned with different training data sets generating by • “Bootstrap Sampling” (bagging) • weighting ill-classified instances (boosting) • Integrating base-level classifiers, which are learned with different algorithms, with • simple voting (voting) • constructing meta-classifier with a meta data set and a meta-level learning algorithm (stacking, cascading) They don’t work well, when no base-level learning algorithm works well to the given data set !! -> Because they don’t de-compose base-level learning algorithms. Parallel and Distributed Computing for Machine Learning 2003

Contents • Introduction • Constructive meta-learning based on method repositories • Basic idea of our constructive meta-learning • An implementation of the constructive meta-learning called CAMLET • Parallel execution and search for CAMLET • Case Study with common data sets • Conclusion Parallel and Distributed Computing for Machine Learning 2003

De-composition& Organization Search & Composition + C4.5 CS VS AQ15 Basic Idea of ourConstructive Meta-Learning Analysis of Representative Inductive Learning Algorithms Parallel and Distributed Computing for Machine Learning 2003

De-composition& Organization Search & Composition + CS AQ15 Basic Idea ofConstructive Meta-Learning Analysis of Representative Inductive Learning Algorithms Parallel and Distributed Computing for Machine Learning 2003

De-composition& Organization Search & Composition + Automatic Composition of Inductive Applications Organizing inductive learning methods, control structures and treated objects. Basic Idea ofConstructive Meta-Learning Analysis of Representative Inductive Learning Algorithms Parallel and Distributed Computing for Machine Learning 2003

Analysis of Representative Inductive Learning Algorithms We have analyzed the following 8 learning algorithms: • Version Space • AQ15 • ID3 • C4.5 • Boosted C4.5 • Bagged C4.5 • Neural Network with back propagation • Classifier Systems We have identified 22 specific inductive learning methods. Parallel and Distributed Computing for Machine Learning 2003

Inductive Learning Method Repository Parallel and Distributed Computing for Machine Learning 2003

Data Type Hierarchy Organization of input/output/reference data types for inductive learning methods Objects classifier-set If-Then rules tree (Decision Tree) network (Neural Net) data-set training data set validation data set test data set Parallel and Distributed Computing for Machine Learning 2003

Identifying Inductive Learning Methods and Control Structures Modifying training and validation data sets Generating Training & Validation Data Sets Evaluating classifier sets to test data set Generating a classifier set Evaluating a classifier set Modifying a classifier set Modifying a classifier set Parallel and Distributed Computing for Machine Learning 2003

CAMLET:a Computer Aided Machine Learning Engineering Tool Data Sets, Goal Accuracy CAMLET Method Repository Data Type Hierarchy Control Structures Construction Instantiation Compile User Go to the goal accuracy? Go & Test Yes No Refinement Inductive application, etc. Parallel and Distributed Computing for Machine Learning 2003

CAMLET with a parallel environment Data Sets, Goal Accuracy CAMLET Method Repository Data Type Hierarchy Control Structures Composition Level Construction Specifications Data Sets Execution Level Instantiation Compile User Go to the goal accuracy? Go & Test Yes No Results Refinement Inductive application, etc. Parallel and Distributed Computing for Machine Learning 2003

Parallel Executions of Inductive Applications Initializing Composition Level is necessary one process Constructing specs Sending specs Execution Level is necessary one or more process elements R R R R Refining & Sending Executing Receiving Refining the spec Sending refined one Waiting Receiving a result Ex Ex Ex R Ex Ex Ex Ex Ex Ex S Ex R: receiving ,Ex: executing, S: sending Parallel and Distributed Computing for Machine Learning 2003

Refinement of Inductive Applications with GA Executed Specs & Their Results t Generation • Transform executed specifications to chromosomes • Select parents with “Tournament Method” • Crossover the parents and Mutate one of their children • Execute children’s specification, transforming chromosomes to specs • * If CAMLET can execute more than two I.A at the same time, some slow I.A will be added to t+2 or later Generation. Selection execute & add Parents Crossover and Mutation Children t+1 Generation X Generation Parallel and Distributed Computing for Machine Learning 2003

CAMLET on a parallel GA refinement Data Sets, Goal Accuracy CAMLET Method Repository Data Type Hierarchy Control Structures Composition Level Construction Specifications Data Sets Execution Level Instantiation Compile User Go to the goal accuracy? Go & Test Yes No Results Refinement Inductive application, etc. Parallel and Distributed Computing for Machine Learning 2003

Contents • Introduction • Constructive meta-learning based on method repositories • Case Study with common data sets • Accuracy comparison with common data sets • Evaluation of parallel efficiencies in CAMLET • Conclusion Parallel and Distributed Computing for Machine Learning 2003

Set up of accuracy comparison with common data sets • We have used two kinds of common data sets • 10 Statlog data sets • 32 UCI data sets (distributed with WEKA) • We have input goal accuracies to each data set as the criterion. • CAMLET has output just one specification of the best inductive application to each data set and its performance. • CAMLET has searched about 6,000 inductive applications for the best one, executing up to one hundred inductive applications. • 10 “Execution Level” PEs have been used to each data set. • Stacking methods implemented in WEKA data mining suites Parallel and Distributed Computing for Machine Learning 2003

Statlog common data sets To generate 10-fold cross validation data sets’ pairs, We have used SplitDatasetFilter implemented in WEKA. Parallel and Distributed Computing for Machine Learning 2003

Setting up of Stacking methods to Statlog common data sets Meta-level Learning Algorithms J4.8 (with pruning, C=0.25) Naïve Bayes Classification with Linear Regression (with all of features) J4.8 (with pruning, C=0.25) Naïve Bayes Part (with pruning, C=0.25) IBk (k=5) Bagged J4.8 (Sampling rate=55%, 5 Iterations) Bagged J4.8 (Sampling rate=55%, 10 Iterations) Boosted J4.8 (5 Iterations) Boosted J4.8 (10 Iterations) Base-level Learning Algorithms Parallel and Distributed Computing for Machine Learning 2003

Evaluation on accuracies to Statlog common data sets Inductive applications composed by CAMLET have shown us as good performance as that of stacking with Classification via Linear Regression. Parallel and Distributed Computing for Machine Learning 2003

Setting up of Stacking methods to UCI common data sets Meta-level Learning Algorithms J4.8 (with pruning, C=0.25) Naïve Bayes Linear Regression (with all of features) J4.8 (with pruning, C=0.25) Naïve Bayes Part (with pruning, C=0.25) IBk (k=5) Bagged J4.8 (Sampling rate=55%, 10 Iterations) Boosted J4.8 (10 Iterations) Base-level Learning Algorithms Parallel and Distributed Computing for Machine Learning 2003

Evaluation on accuracies to UCI common data sets Parallel and Distributed Computing for Machine Learning 2003

Analysis of parallel efficiency of CAMLET • We have analyzed parallel efficiencies of executions of the case study with “Statlog common data sets”. • We have used 10 processing elements to execute each inductive applications. • Parallel Efficiency is calculated as the following formula: 1<=x<100, n:#fold, e:Execution Time of Each PE Parallel and Distributed Computing for Machine Learning 2003

Parallel Efficiency of the case study with StatLog data sets Parallel Efficiencies of 10-fold CV data sets Parallel Efficiencies of training-test data sets Parallel and Distributed Computing for Machine Learning 2003

Why the parallel efficiency of “satimage” data set was so low? • Because: • - CAMLET could not find satisfactory inductive application to this • data set within 100 executions of inductive applications. • In the end of search, inductive applications with large computational • cost badly affected to parallel efficiency. Parallel and Distributed Computing for Machine Learning 2003

Contents • Introduction • Constructive meta-learning based on method repositories • Case Study with common data sets • Conclusion Parallel and Distributed Computing for Machine Learning 2003

Conclusion • CAMLET has been implemented as a tool for “Constructive Meta-Learning” scheme based on method repositories. • CAMLET shows us a significant performance as a meta-learning scheme. • We are extending the method repository to construct data mining application of whole data mining process. • The parallel efficiency of CAMLET has been achieved greater than 5.93 times. • It should be improved • with methods to equalize each execution time such as load balancing or/and three-layered architecture. • with more efficient search method to find satisfactory inductive application faster. Parallel and Distributed Computing for Machine Learning 2003

Thank you! Parallel and Distributed Computing for Machine Learning 2003

ベースレベル 分類器ベースレベル分類器ベースレベル分類器ベースレベル分類器 Base-level classifiers Base-level classifiers Stacking: training and prediction Training phase Prediction phase Training Set Repeating with CV Training Set Test Set Learning of Base-level classifiers Training Set (Int.) Test Set (Int.) Learning of Base-level classifiers Transformation Meta-level Test Set Transformation Prediction Meta-level Training Set Meta-level classifiers Meta-level learning Meta-level classifier Results of Preduction Parallel and Distributed Computing for Machine Learning 2003

How to transforme base-level attributes to meta-level attributes Learning with Algorithm 1 Classifier by Algorithm 1 • Getting prediction probability to each class value with the • classifier. • 2. Adding base-level class value of each base-level instance A algorithm C class values #Meta-level Att. = AC Parallel and Distributed Computing for Machine Learning 2003

Meta-level classifiers (Ex.) The prediction probability of”0” With classifier by algorithm 1 ＞0.1 ≦0.1 The prediction probability of”1” With classifier by algorithm 2 The prediction probability of”1” With classifier by algorithm 1 ＞0.95 ≦0.95 ≦0.9 ＞0.9 The prediction probability of”0” With classifier by algorithm 3 Prediction “1” Prediction “1” Prediction “0” ．．．． Parallel and Distributed Computing for Machine Learning 2003

Accuracies of 8 base-level learning algorithms to Statlog data sets CAMLET Base-level algorithms Stacking Base-level algorithms Accuracy（％） Accuracy（％） Parallel and Distributed Computing for Machine Learning 2003

C4.5 Decision Tree without pruning (reproduction of CAMLET) Generating training and validation data sets with a void validation set 1. Select just one control structure 2. Fill each method with specific methods 3. Instantiate this spec. 4. Compile the spec. 5. Execute its executable code 6. Refine the spec. (if needed) Generating a classifier set (decision tree) with entropy + information ratio Evaluating a classifier set with set evaluation Evaluating classifier sets to test data set with single classifier set evaluation Parallel and Distributed Computing for Machine Learning 2003

Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods

Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods

Presentation Transcript

Comparing the accuracy of prediction methods

Parallel Spectral Methods: Fast Fourier Transform (FFTs) with Applications

Parallel Methods for Nano/Materials Science Applications

Parallel Applications of the Minimax Algorithm

Comparing Training Methods

Comparing Research Methods

Applications of Inductive Logic Programming(ILP)

Automatic methods of MT evaluation

Parallel composition, reduction

Parallel Methods for Nano/Materials Science Applications

Parallel Spectral Methods: Fast Fourier Transform (FFTs) with Applications

Scheduling Mixed Parallel Applications with Reservations

PARALLEL APPLICATIONS

Parallel Methods for Nano/Materials Science Applications

Comparing Web Applications with Desktop Applications: An Empirical Study

Parallel Spectral Methods: Fast Fourier Transform (FFTs) with Applications

Parallel Spectral Methods: Fast Fourier Transform (FFTs) with Applications

Comparing the Parallel Automatic Composition of Inductive Applications with Stacking Methods

Parallel Spectral Methods: Fast Fourier Transform (FFTs) with Applications

Parallel Spectral Methods: Fast Fourier Transform (FFTs) with Applications