
Meta Learning: For Classification

Daniel Spohn (drspohn@student.ysu.edu), Youngstown State University, 03-23-06.


Presentation Transcript


  1. Meta Learning: For Classification. Daniel Spohn (drspohn@student.ysu.edu), Youngstown State University, 03-23-06

  2. Introduction - Meta-Learning
     • Meta-learning is a process for improving classification results by applying additional methods on top of the base algorithms.
     • Bagging, stacking, and boosting all involve running multiple classification algorithms, or running the same algorithm multiple times.
     • Meta-learning can also be run on a table to determine which algorithm is most appropriate for learning from it.

  3. Stacking
     • Uses multiple algorithms and combines their results.
     • Output from the previous layer is passed on as input to the next layer.
     • Example: the output of a decision tree can be used as input for a neural network.
     • Combining algorithms this way helps reduce the weaknesses of any individual algorithm.
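The layering described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's setup: the level-0 learners are one-feature threshold rules (standing in for the decision tree) and the level-1 learner is a simple lookup table (standing in for the neural network). The function names `train_stump`, `train_lookup`, and `train_stack` are hypothetical. For brevity the level-1 learner is trained on level-0 outputs for the same training rows; practical stacking usually uses held-out predictions instead.

```python
from collections import Counter

def train_stump(rows, labels, feature):
    """Level-0 learner: split one feature at its mean, predict the majority label per side."""
    threshold = sum(r[feature] for r in rows) / len(rows)
    sides = {True: Counter(), False: Counter()}
    for r, y in zip(rows, labels):
        sides[r[feature] > threshold][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    vote = {s: (c.most_common(1)[0][0] if c else default) for s, c in sides.items()}
    return lambda r: vote[r[feature] > threshold]

def train_lookup(meta_rows, labels):
    """Level-1 learner: map each tuple of level-0 outputs to its most common label."""
    table = {}
    for m, y in zip(meta_rows, labels):
        table.setdefault(m, Counter())[y] += 1
    default = Counter(labels).most_common(1)[0][0]
    return lambda m: table[m].most_common(1)[0][0] if m in table else default

def train_stack(rows, labels):
    """Train one stump per feature, then feed their outputs to the level-1 learner."""
    base = [train_stump(rows, labels, f) for f in range(len(rows[0]))]
    meta_rows = [tuple(b(r) for b in base) for r in rows]
    meta = train_lookup(meta_rows, labels)
    return lambda r: meta(tuple(b(r) for b in base))

# Toy data (made up): the label is 1 when either feature is 1.
rows = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 1, 1, 1]
stacked = train_stack(rows, labels)
first_stump = train_stump(rows, labels, 0)
```

On this toy data the first stump alone misclassifies the row `(0, 1)`, while the stacked model gets every training row right, because the level-1 learner sees both stumps' outputs.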

  4. Boosting
     • The algorithm is run repeatedly until the misclassification rate is low.
     • After each run, the instances that are difficult to classify are given higher weight and the algorithm is run again.
     • As an alternative, boosting can draw a random sub-sample for each run instead of using weights.
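The reweighting loop can be made concrete with a small AdaBoost-style sketch. It assumes binary labels coded as +1 and -1, a single numeric attribute, and one-threshold rules as the base learner; these choices and the function names are illustrative assumptions, not the Weka implementation used in the talk.

```python
import math

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) for the best one-threshold rule.
    The rule predicts `sign` when x > threshold and `-sign` otherwise."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x > threshold else -sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best

def adaboost(xs, ys, rounds):
    """Run the base learner `rounds` times, reweighting the instances each time."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        err, threshold, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, sign))
        # Misclassified instances gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * (sign if x > threshold else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all rules in the ensemble."""
    score = sum(alpha * (sign if x > t else -sign) for alpha, t, sign in ensemble)
    return 1 if score > 0 else -1

# Toy data (made up): no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
one_round = adaboost(xs, ys, 1)
three_rounds = adaboost(xs, ys, 3)
```

With one round the ensemble is a single threshold rule and misclassifies three of the ten toy instances; after three rounds the weighted vote classifies all ten correctly.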

  5. Bagging
     • Bagging stands for bootstrap aggregating.
     • Implements “voting”.
     • Multiple models are trained independently on different bootstrap samples (random subsets drawn with replacement) of the dataset.
     • The results of the models are then combined to create the final model.
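A minimal sketch of bagging with voting follows. It assumes a single numeric attribute and a toy base learner that predicts the class whose mean is nearest; the base learner and the names `train_centroid` and `bagging` are illustrative assumptions only.

```python
import random
from collections import Counter

def train_centroid(xs, ys):
    """Toy base learner: predict the class whose mean value is closest to x."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    means = {y: sum(v) / len(v) for y, v in groups.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))

def bagging(xs, ys, n_models, seed=0):
    """Train each model on its own bootstrap sample, then combine by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(xs) indices with replacement.
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(train_centroid([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data (made up): two well-separated groups.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
vote = bagging(xs, ys, n_models=25)
```

An odd number of models avoids tied votes when there are two classes.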

  6. Meta-Classifiers
     • Meta-learning using meta-classifiers.
     • Meta-classifiers here are characteristics of the dataset itself (e.g., number of columns, types of attributes).
     • Using the meta-classifiers, a dataset is compared to previously analyzed datasets, and a ranking is produced showing the estimated effectiveness of classification algorithms.
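The comparison step might look roughly like the sketch below. It is not how Weka MetaL works internally; it only illustrates the idea of describing a dataset by a few characteristics and reusing the ranking of the most similar previously analyzed dataset. The chosen characteristics, the distance measure, the knowledge-base entries, and the placeholder names `algorithm_A` and `algorithm_B` are all assumptions made for illustration.

```python
def meta_features(table):
    """Describe a dataset by simple characteristics.
    `table` maps column name -> list of values; a column counts as
    continuous when every value is an int or a float."""
    columns = list(table.values())
    continuous = sum(all(isinstance(v, (int, float)) for v in col) for col in columns)
    return {
        "instances": len(columns[0]),
        "attributes": len(columns),
        "continuous": continuous,
        "nominal": len(columns) - continuous,
    }

def distance(a, b):
    """Sum of relative differences, so large counts do not dominate."""
    return sum(abs(a[k] - b[k]) / max(a[k], b[k], 1) for k in a)

def recommend(features, knowledge_base):
    """Return the algorithm ranking stored for the most similar known dataset."""
    nearest = min(knowledge_base, key=lambda entry: distance(features, entry["features"]))
    return nearest["ranking"]

# Hypothetical knowledge base; the rankings are placeholders, not measured results.
knowledge_base = [
    {"features": {"instances": 4, "attributes": 3, "continuous": 2, "nominal": 1},
     "ranking": ["algorithm_A", "algorithm_B"]},
    {"features": {"instances": 1000, "attributes": 40, "continuous": 5, "nominal": 35},
     "ranking": ["algorithm_B", "algorithm_A"]},
]

# Made-up table with two continuous columns and one nominal column.
table = {"age": [25, 40, 61], "hours": [40, 35.5, 20], "sex": ["F", "M", "F"]}
features = meta_features(table)
ranking = recommend(features, knowledge_base)
```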

  7. Datasets - Adults
     • Adult.arff
     • 48,842 instances
     • 14 attributes (6 continuous, 8 nominal)
     • Contains information on adults such as age, gender, ethnicity, marital status, education, native country, etc.
     • The instances are classified into either “Salary >50K” or “Salary <= 50K”.

  8. Datasets - Census Income
     • census-income.arff
     • 199,523 instances
     • 40 attributes (7 continuous, 33 nominal)
     • Demographic and monetary information from the 1994 and 1995 surveys conducted by the U.S. Census Bureau.
     • The instances are classified into either “Income >50K” or “Income <= 50K”.

  9. Datasets - Census Income (continued)
     • The dataset has already been split into a training set and a testing set (2/3 training, 1/3 testing).
     • This dataset contains missing values, and some attributes may need to be discretized to improve the performance and effectiveness of the algorithm.
     • Additionally, some attributes may need to be removed because they are irrelevant.

  10. Analyzing Adult Dataset
      • Running this dataset through Weka MetaL (meta-classifiers) produces a ranking of classification algorithms.
      • The top-ranked algorithm is LogitBoost.

  11. Analyzing Adult Dataset (continued)
      • When using Weka MetaL's top-ranked algorithm, LogitBoost, 84.68% of instances are correctly classified.
      • When using Weka MetaL's lowest-ranked algorithm, ZeroR, 76.07% of instances are correctly classified.
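ZeroR is a useful reference point because it ignores every attribute and always predicts the most frequent class, so its accuracy is simply the share of the majority class in the data. The sketch below illustrates that behavior on made-up labels; the function names are hypothetical and this is not Weka's code.

```python
from collections import Counter

def zero_r(train_labels):
    """Return a classifier that always predicts the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _row: majority

def accuracy(classifier, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(classifier(r) == y for r, y in zip(rows, labels)) / len(labels)

# Made-up labels: three of four instances belong to the majority class.
labels = ["<=50K", "<=50K", ">50K", "<=50K"]
rows = [None] * len(labels)  # attributes are irrelevant to ZeroR
baseline = zero_r(labels)
baseline_accuracy = accuracy(baseline, rows, labels)
```

Any algorithm worth using should beat this baseline, which is why the gap between the two percentages on the slide matters more than either number alone.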

  12. Analyzing Census Dataset
      • Took a sample of 9,953 records (of 199,523).
      • Of the 40 attributes, selected the 9 most important for classification.
      • Grouped the attributes that contain continuous numbers into static (fixed) groups.
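Grouping a continuous attribute into fixed groups can be sketched as follows. The bin edges and group names are hypothetical examples for an age attribute; the slides do not say which boundaries were actually used.

```python
from bisect import bisect_right

def discretize(value, edges, names):
    """Map a number to a named group; edges[i] is the lower bound of names[i + 1]."""
    return names[bisect_right(edges, value)]

# Hypothetical age groups (assumed boundaries, not taken from the presentation).
AGE_EDGES = [25, 45, 65]
AGE_NAMES = ["under 25", "25-44", "45-64", "65 and over"]

ages = [24, 25, 44, 65]
groups = [discretize(a, AGE_EDGES, AGE_NAMES) for a in ages]
```

There must always be exactly one more group name than there are edges, since three boundaries split the number line into four ranges.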

  13. Analyzing Census Dataset (continued)
      • Running J48 results in:
        • <=50K: 99% correctly classified
        • >50K: 17% correctly classified
      • Running AdaBoost with J48 as the base classifier:
        • <=50K: 98% correctly classified
        • >50K: 36% correctly classified
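The per-class percentages on this slide are computed separately for each true class, which matters when one class is much larger than the other. The sketch below shows the calculation on made-up labels; it does not reproduce the census results.

```python
from collections import Counter

def per_class_accuracy(actual, predicted):
    """For each true class, the fraction of its instances that were predicted correctly."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / totals[label] for label in totals}

def overall_accuracy(actual, predicted):
    """Fraction of all instances that were predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Made-up example: 8 majority-class instances and 2 minority-class instances.
actual = ["<=50K"] * 8 + [">50K"] * 2
predicted = ["<=50K"] * 9 + [">50K"]
by_class = per_class_accuracy(actual, predicted)
overall = overall_accuracy(actual, predicted)
```

Here the overall accuracy is 0.9 even though only half of the minority class is classified correctly, which is the same kind of imbalance the slide's J48 numbers show.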

  14. Summary
      • Meta-learning helps improve results over the basic algorithms.
      • Using meta-characteristics on the Adult dataset to determine an appropriate algorithm, I achieved almost 85% correct classification.
      • Using boosting, AdaBoost with J48 correctly classified more than twice as many instances of the second group (>50K) as J48 alone (36% versus 17%).
