This presentation by Manoj Wartikar and Sameer Sagade evaluates the WEKA (Waikato Environment for Knowledge Analysis) system. It covers WEKA's main features, including attribute selection, clustering, and classification algorithms, and weighs its advantages and disadvantages. WEKA is written in Java, implements a wide range of machine learning algorithms, and is well suited to data mining tasks. Version 3.1.7 addresses some of the earlier shortcomings by adding a graphical user interface and visualization of results.
Evaluation of WEKA (Waikato Environment for Knowledge Analysis)
Presented by: Manoj Wartikar & Sameer Sagade
Outline
• Introduction to the WEKA System
• Features
• Pros and Cons
• Enhancements
Introduction
• A research project at the University of Waikato, New Zealand
• A collection of machine learning algorithms for solving real-world data mining problems
• Developed in Java 2
Features
Documented features of WEKA:
• Attribute Selection
• Clustering
• Classification
• Association Rules
• Filters
• Estimators
Attribute Selection
• Part of the preprocessing phase of the knowledge discovery process
• Specifies the attributes (and their values) on which the data is to be mined, so that irrelevant or redundant attributes can be left out
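Choosing which attributes to mine on requires some way to score them. One common scoring rule, assumed here purely for illustration, is information gain: how much the class entropy drops once the data is split on the attribute. The class `InfoGain`, its methods, and the toy data are hypothetical and are not WEKA's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: information gain of a nominal attribute with respect to the class.
class InfoGain {
    // Entropy, in bits, of a distribution given as label -> count.
    static double entropy(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) total += c;
        double h = 0.0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // H(class) minus the size-weighted class entropy within each value of the attribute.
    static double gain(String[] attribute, String[] label) {
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < label.length; i++) {
            classCounts.merge(label[i], 1, Integer::sum);
            byValue.computeIfAbsent(attribute[i], v -> new HashMap<>())
                   .merge(label[i], 1, Integer::sum);
        }
        double remainder = 0.0;
        for (Map<String, Integer> subset : byValue.values()) {
            int size = 0;
            for (int c : subset.values()) size += c;
            remainder += (double) size / label.length * entropy(subset);
        }
        return entropy(classCounts) - remainder;
    }

    public static void main(String[] args) {
        // Toy data (hypothetical): "windy" determines the class, "day" is irrelevant.
        String[] windy = {"yes", "yes", "no", "no"};
        String[] day = {"mon", "tue", "mon", "tue"};
        String[] play = {"no", "no", "yes", "yes"};
        System.out.println("gain(windy) = " + gain(windy, play));
        System.out.println("gain(day)   = " + gain(day, play));
    }
}
```

Ranking attributes by such a score and keeping only the top ones is one simple way to discard irrelevant attributes before mining.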
Attribute Selection (contd.)
• Algorithms implemented:
  • Best First
  • Forward Selection
  • Ranked Output First
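Forward selection can be sketched as a greedy loop: start from the empty attribute set and repeatedly add whichever attribute most improves a merit score, stopping when no single addition helps. Best-first search differs mainly in keeping a list of promising subsets so that it can backtrack. This is a minimal sketch, not WEKA's implementation; `demoMerit` is a hypothetical stand-in for a real subset evaluator such as a classifier's cross-validated accuracy.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

// Sketch of greedy forward selection over attribute indices 0..numAttributes-1.
class ForwardSelection {
    static Set<Integer> select(int numAttributes, ToDoubleFunction<Set<Integer>> merit) {
        Set<Integer> selected = new TreeSet<>();
        double current = merit.applyAsDouble(selected);
        while (true) {
            int bestAttr = -1;
            double bestMerit = current;
            // Try each attribute not yet chosen and remember the one that helps most.
            for (int a = 0; a < numAttributes; a++) {
                if (selected.contains(a)) continue;
                Set<Integer> trial = new TreeSet<>(selected);
                trial.add(a);
                double m = merit.applyAsDouble(trial);
                if (m > bestMerit) {
                    bestMerit = m;
                    bestAttr = a;
                }
            }
            if (bestAttr < 0) return selected; // no single addition improves the merit: stop
            selected.add(bestAttr);
            current = bestMerit;
        }
    }

    // Hypothetical merit: attributes 1 and 3 are useful; every attribute costs a small penalty.
    static double demoMerit(Set<Integer> subset) {
        double score = 0.0;
        for (int a : subset) if (a == 1 || a == 3) score += 1.0;
        return score - 0.1 * subset.size();
    }

    public static void main(String[] args) {
        System.out.println(select(5, ForwardSelection::demoMerit)); // [1, 3]
    }
}
```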
Clustering
• Algorithms implemented:
  • Cobweb
  • Expectation Maximization (EM)
  • Clusterer
  • Distribution Clusterer
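Expectation maximization alternates two steps: the E-step computes how strongly each mixture component "owns" each instance, and the M-step re-estimates every component's parameters from those soft assignments. The sketch below assumes a single numeric attribute, exactly two Gaussian components, and hand-picked starting means; it has no safeguards for a component that captures no instances. WEKA's EM clusterer is more general, and all names here are hypothetical.

```java
import java.util.Arrays;

// Sketch: EM for a mixture of two 1-D Gaussians (much simpler than WEKA's EM clusterer).
class TwoGaussianEM {
    final double[] weight = {0.5, 0.5}; // mixing proportions
    final double[] mean;                 // component means
    final double[] var = {1.0, 1.0};    // component variances

    TwoGaussianEM(double initialMean0, double initialMean1) {
        mean = new double[] {initialMean0, initialMean1};
    }

    static double gaussian(double x, double mu, double variance) {
        double d = x - mu;
        return Math.exp(-d * d / (2 * variance)) / Math.sqrt(2 * Math.PI * variance);
    }

    void fit(double[] x, int iterations) {
        int n = x.length;
        double[][] resp = new double[n][2];
        for (int it = 0; it < iterations; it++) {
            // E-step: responsibility of each component for each point (each row sums to 1).
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int k = 0; k < 2; k++) {
                    resp[i][k] = weight[k] * gaussian(x[i], mean[k], var[k]);
                    total += resp[i][k];
                }
                for (int k = 0; k < 2; k++) resp[i][k] /= total;
            }
            // M-step: re-estimate the weight, mean and variance of each component.
            for (int k = 0; k < 2; k++) {
                double nk = 0.0, sum = 0.0;
                for (int i = 0; i < n; i++) {
                    nk += resp[i][k];
                    sum += resp[i][k] * x[i];
                }
                mean[k] = sum / nk;
                double sq = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = x[i] - mean[k];
                    sq += resp[i][k] * d * d;
                }
                var[k] = Math.max(sq / nk, 1e-6); // floor keeps a component from collapsing
                weight[k] = nk / n;
            }
        }
    }

    // Hard assignment: index of the component with the larger weighted density.
    int cluster(double v) {
        double p0 = weight[0] * gaussian(v, mean[0], var[0]);
        double p1 = weight[1] * gaussian(v, mean[1], var[1]);
        return p0 >= p1 ? 0 : 1;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 1.2, 0.8, 9.0, 9.2, 8.8}; // hypothetical: two obvious groups
        TwoGaussianEM em = new TwoGaussianEM(0.0, 10.0);
        em.fit(data, 20);
        System.out.println("means = " + Arrays.toString(em.mean));
    }
}
```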
Classification
• Algorithms implemented:
  • k-Nearest Neighbor
  • Naïve Bayes
  • Bagging
  • Boosting
  • Multi-Class Classifier
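As an example of the first algorithm in the list, k-nearest-neighbor classification stores the training instances and labels a new instance by majority vote among the k closest ones. This minimal sketch assumes purely numeric attributes and plain Euclidean distance without normalization; the class name and data are hypothetical, not WEKA's implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Sketch: k-nearest-neighbour classification with Euclidean distance and a majority vote.
class KNearestNeighbors {
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static String classify(double[][] train, String[] labels, double[] query, int k) {
        // Order training instances by their distance to the query.
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> distance(train[i], query)));
        // Majority vote among the k closest; a tie goes to the label that reached the top count first.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int j = 0; j < Math.min(k, order.length); j++) {
            String label = labels[order[j]];
            int count = votes.merge(label, 1, Integer::sum);
            if (best == null || count > votes.get(best)) best = label;
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical 2-D training data with two well-separated classes.
        double[][] train = {{1, 1}, {1, 2}, {8, 8}, {9, 8}};
        String[] labels = {"a", "a", "b", "b"};
        System.out.println(classify(train, labels, new double[] {2, 1}, 3)); // a
    }
}
```

In practice attributes are usually normalized first, since otherwise the attribute with the largest range dominates the distance.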
Association Rules
• Algorithms implemented:
  • Apriori
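Apriori finds frequent itemsets level by level. It relies on the property that every subset of a frequent itemset must itself be frequent, which lets it prune candidates before counting them. Rules are then read off the frequent itemsets and kept if their confidence is high enough. This minimal sketch expresses minimum support as an absolute transaction count; the class name and the market-basket transactions are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of Apriori frequent-itemset mining; support is an absolute transaction count.
class AprioriSketch {
    // Number of transactions containing every item of the itemset.
    static int support(List<Set<String>> transactions, Set<String> itemset) {
        int count = 0;
        for (Set<String> t : transactions) if (t.containsAll(itemset)) count++;
        return count;
    }

    // Confidence of antecedent ==> consequent: support(both) / support(antecedent).
    static double confidence(List<Set<String>> transactions, Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new TreeSet<>(antecedent);
        both.addAll(consequent);
        return (double) support(transactions, both) / support(transactions, antecedent);
    }

    // Apriori property: every k-subset of a frequent (k+1)-itemset must itself be frequent.
    static boolean allSubsetsFrequent(Set<String> candidate, List<Set<String>> frequentK) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequentK.contains(subset)) return false;
        }
        return true;
    }

    static List<Set<String>> frequentItemsets(List<Set<String>> transactions, int minCount) {
        List<Set<String>> result = new ArrayList<>();
        // Level 1: frequent single items.
        Set<String> items = new TreeSet<>();
        for (Set<String> t : transactions) items.addAll(t);
        List<Set<String>> level = new ArrayList<>();
        for (String item : items) {
            Set<String> single = new TreeSet<>(Set.of(item));
            if (support(transactions, single) >= minCount) level.add(single);
        }
        while (!level.isEmpty()) {
            result.addAll(level);
            // Join pairs of frequent k-itemsets into (k+1)-candidates, pruning with the Apriori property.
            Set<Set<String>> candidates = new LinkedHashSet<>();
            for (int i = 0; i < level.size(); i++) {
                for (int j = i + 1; j < level.size(); j++) {
                    Set<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() == level.get(i).size() + 1 && allSubsetsFrequent(union, level)) {
                        candidates.add(union);
                    }
                }
            }
            // Keep only the candidates that reach the minimum support count.
            List<Set<String>> next = new ArrayList<>();
            for (Set<String> c : candidates) if (support(transactions, c) >= minCount) next.add(c);
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical market-basket transactions.
        List<Set<String>> tx = List.of(
            Set.of("bread", "milk"), Set.of("bread", "butter"),
            Set.of("bread", "milk", "butter"), Set.of("milk"));
        System.out.println(frequentItemsets(tx, 2));
        System.out.println(confidence(tx, Set.of("butter"), Set.of("bread")));
    }
}
```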
Filters
• Algorithms implemented:
  • Attribute Filter
  • Discretize Filter
  • Split Dataset Filter
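A discretize filter turns a numeric attribute into a nominal one by assigning each value to an interval. One simple scheme, assumed here for illustration, is equal-width binning: split the range between the minimum and maximum into a fixed number of equally wide intervals. The class name and data are hypothetical; WEKA's filter may use different binning rules.

```java
import java.util.Arrays;

// Sketch: unsupervised equal-width discretization of one numeric attribute.
class EqualWidthDiscretizer {
    // Maps each value to a bin index in [0, bins).
    static int[] discretize(double[] values, int bins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // If all values are equal the width is 0, so everything goes into bin 0.
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            // The maximum lands exactly on the upper edge; keep it in the last bin.
            out[i] = Math.min(bin, bins - 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] temperature = {0, 2.5, 5, 7.5, 10}; // hypothetical numeric attribute
        System.out.println(Arrays.toString(discretize(temperature, 4))); // [0, 1, 2, 3, 3]
    }
}
```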
Estimators
• Algorithms implemented:
  • Discrete Estimator
  • Kernel Estimator
  • Normal Estimator
  • Poisson Estimator
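Each estimator turns observed values into a probability or a density. The sketch below assumes one simple form for each of the four listed: a smoothed relative frequency for discrete values, a Gaussian fitted by mean and standard deviation, an average of Gaussian kernels, and a Poisson distribution whose rate is the sample mean. The method names, the add-one smoothing, and the fixed kernel bandwidth are illustrative assumptions; WEKA's estimator classes may differ in detail.

```java
// Sketches of four simple estimators; WEKA's estimator classes may differ in detail.
class SimpleEstimators {
    static double mean(double[] data) {
        double sum = 0.0;
        for (double d : data) sum += d;
        return sum / data.length;
    }

    static double gaussian(double x, double mu, double sd) {
        double z = (x - mu) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    // Discrete estimator: relative frequency of a symbol with add-one (Laplace) smoothing.
    static double discrete(int[] counts, int symbol) {
        int total = 0;
        for (int c : counts) total += c;
        return (counts[symbol] + 1.0) / (total + counts.length);
    }

    // Normal estimator: Gaussian density with the sample mean and (population) standard deviation.
    static double normal(double[] data, double x) {
        double mu = mean(data), sq = 0.0;
        for (double d : data) sq += (d - mu) * (d - mu);
        return gaussian(x, mu, Math.sqrt(sq / data.length));
    }

    // Kernel estimator: average of Gaussian kernels of bandwidth h centred on each observation.
    static double kernel(double[] data, double x, double h) {
        double sum = 0.0;
        for (double d : data) sum += gaussian(x, d, h);
        return sum / data.length;
    }

    // Poisson estimator: P(X = k) with the rate lambda set to the sample mean.
    static double poisson(double[] data, int k) {
        double lambda = mean(data), p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) p *= lambda / i;
        return p;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 2.0, 3.0}; // hypothetical observations
        System.out.println("normal(2)   = " + normal(data, 2.0));
        System.out.println("kernel(2)   = " + kernel(data, 2.0, 0.5));
        System.out.println("poisson(2)  = " + poisson(data, 2));
        System.out.println("discrete(0) = " + discrete(new int[] {3, 1}, 0));
    }
}
```

The normal estimator assumes the data are not all identical (a zero standard deviation would divide by zero); the kernel estimator avoids committing to a single bell shape at the cost of choosing a bandwidth.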
Sample Execution
java weka.associations.Apriori -t data/weather.nominal.arff -I yes

Apriori
=======
Minimum support: 0.2
Minimum confidence: 0.9
Number of cycles performed: 17
Generated sets of large itemsets:
Size of set of large itemsets L(1): 12
Sample Execution (contd.)
Best rules found:
1. humidity=normal windy=FALSE 4 ==> play=yes 4 (1)
2. temperature=cool 4 ==> humidity=normal 4 (1)
3. outlook=overcast 4 ==> play=yes 4 (1)
4. temperature=cool play=yes 3 ==> humidity=normal 3 (1)
5. outlook=rainy windy=FALSE 3 ==> play=yes 3 (1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 (1)
7. outlook=sunny humidity=high 3 ==> play=no 3 (1)
8. outlook=sunny play=no 3 ==> humidity=high 3 (1)
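Each rule line appears to read as: the antecedent, the number of instances matching it, "==>", the consequent, the number of instances matching the whole rule, and the rule's confidence in parentheses. Under that reading, the confidence is the second count divided by the first. Rule 4 works out as below; the support check additionally assumes the standard 14-instance weather data, for which a minimum support of 0.2 means an itemset must match at least 3 instances, consistent with the smallest counts in the list.

$$
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}, \qquad
\mathrm{conf}(\text{temperature=cool} \wedge \text{play=yes} \Rightarrow \text{humidity=normal}) = \frac{3}{3} = 1
$$

$$
\lceil 0.2 \times 14 \rceil = \lceil 2.8 \rceil = 3 \ \text{instances}
$$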
Boosting
• AdaBoost
• LogitBoost
• Decision Stump (a one-level decision tree, commonly used as the weak base learner for boosting)
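To show how boosting and decision stumps fit together, here is a minimal sketch of discrete AdaBoost over a single numeric attribute. Each round picks the stump with the lowest weighted error, gives it a vote weight of 0.5·ln((1 − err)/err), and then increases the weight of the instances it got wrong. The ±1 class labels, the class and method names, and the data are assumptions for illustration; LogitBoost is not shown, and WEKA's own boosting classes are more general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: discrete AdaBoost over one numeric attribute, using decision stumps as weak learners.
// Class labels are assumed to be +1 / -1.
class AdaBoostStumps {
    // A decision stump predicts `polarity` when x > threshold and -polarity otherwise.
    static final class Stump {
        final double threshold, alpha;
        final int polarity;
        Stump(double threshold, int polarity, double alpha) {
            this.threshold = threshold;
            this.polarity = polarity;
            this.alpha = alpha;
        }
        int predict(double x) { return x > threshold ? polarity : -polarity; }
    }

    final List<Stump> stumps = new ArrayList<>();

    void fit(double[] x, int[] y, int rounds) {
        int n = x.length;
        double[] w = new double[n];
        Arrays.fill(w, 1.0 / n); // start with uniform instance weights
        for (int r = 0; r < rounds; r++) {
            // Pick the stump (threshold at a data value, either polarity) with the lowest weighted error.
            double bestErr = Double.MAX_VALUE, bestT = 0.0;
            int bestP = 1;
            for (double t : x) {
                for (int p : new int[] {1, -1}) {
                    double err = 0.0;
                    for (int i = 0; i < n; i++) {
                        int pred = x[i] > t ? p : -p;
                        if (pred != y[i]) err += w[i];
                    }
                    if (err < bestErr) {
                        bestErr = err;
                        bestT = t;
                        bestP = p;
                    }
                }
            }
            // Vote weight alpha = 0.5 * ln((1 - err) / err); the clamp keeps it finite for a perfect stump.
            double e = Math.min(Math.max(bestErr, 1e-10), 1 - 1e-10);
            Stump s = new Stump(bestT, bestP, 0.5 * Math.log((1 - e) / e));
            stumps.add(s);
            // Up-weight misclassified instances, down-weight correct ones, then renormalise.
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                w[i] *= Math.exp(-s.alpha * y[i] * s.predict(x[i]));
                total += w[i];
            }
            for (int i = 0; i < n; i++) w[i] /= total;
        }
    }

    // Weighted vote of all stumps.
    int predict(double x) {
        double score = 0.0;
        for (Stump s : stumps) score += s.alpha * s.predict(x);
        return score >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // Hypothetical data: the positive class sits in the middle, which no single stump can separate.
        double[] x = {1, 2, 3, 4, 5, 6};
        int[] y = {-1, -1, 1, 1, -1, -1};
        AdaBoostStumps model = new AdaBoostStumps();
        model.fit(x, y, 3);
        for (double v : x) System.out.print(model.predict(v) + " ");
        System.out.println();
    }
}
```

On this toy data a single stump misclassifies two of the six points, while a hand trace of three boosting rounds gives a weighted vote that classifies all six training points correctly.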
Pros and Cons of WEKA
Pros:
• Covers the entire machine learning process
• Makes it easy to compare the results of the different algorithms implemented
• Accepts input in ARFF (Attribute-Relation File Format), a widely used data format
Pros and Cons of WEKA (contd.)
• Flexible APIs for programmers
• Customization possible
Pros and Cons of WEKA (contd.)
Cons:
• Text-based (command-line) user interface only
• Requires the Java Virtual Machine to be installed for execution
• No visualization of the mining results
Enhancements
• The new version, WEKA 3.1.7, overcomes some of the shortcomings of the previous version by adding:
  • A graphical user interface
  • Visualization of results
  • Mining of non-local databases