WEKA

WEKA. Weka: the bird. Copyright: Martin Kramer (mkramer@wxs.nl). Hamilton. WEKA: the software. Waikato Environment for Knowledge Analysis. Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java. Released under the GPL.

Presentation Transcript


  1. WEKA

  2. Weka: the bird Copyright: Martin Kramer (mkramer@wxs.nl) University of Waikato

  6. Hamilton

  7. WEKA: the software
     • Waikato Environment for Knowledge Analysis
     • Collection of state-of-the-art machine learning algorithms and data processing tools implemented in Java
     • Released under the GPL
     • Support for the whole process of experimental data mining
        • Preparation of input data
        • Statistical evaluation of learning schemes
        • Visualization of input data and the result of learning
     • Used for education, research and applications
     • Complements “Data Mining” by Witten & Frank

  8. Main Features
     • 49 data preprocessing tools
     • 76 classification/regression algorithms
     • 8 clustering algorithms
     • 15 attribute/subset evaluators + 10 search algorithms for feature selection
     • 3 algorithms for finding association rules
     • 3 graphical user interfaces
        • “The Explorer” (exploratory data analysis)
        • “The Experimenter” (experimental environment)
        • “The KnowledgeFlow” (new process model inspired interface)

  9. History
     • Project funded by the NZ government since 1993
        • Develop state-of-the-art workbench of data mining tools
        • Explore fielded applications
        • Develop new fundamental methods

  10. History - timeline
      • Late 1992 - funding was applied for by Ian Witten
      • 1993 - development of the interface and infrastructure
         • WEKA acronym coined by Geoff Holmes
         • WEKA’s file format “ARFF” was created by Andrew Donkin
         • ARFF was rumored to stand for Andrew’s Ridiculous File Format
      • Sometime in 1994 - first internal release of WEKA
         • TCL/TK user interface + learning algorithms written mostly in C
         • Very much beta software
         • Changes for the b1 release included (among others): “Ambiguous and Unsupported menu commands removed.” “Crashing processes handled (in most cases :-)”
      • October 1996 - first public release of WEKA (v 2.1)

  11. History - timeline
      • July 1997 - WEKA 2.2
         • Schemes: 1R, T2, K*, M5, M5Class, IB1-4, FOIL, PEBLS, support for C5
         • Included a facility (based on unix makefiles) for configuring and running large scale experiments
      • Early 1997 - decision was made to rewrite WEKA in Java
         • Originated from code written by Eibe Frank for his PhD
         • Originally codenamed JAWS (JAva Weka System)
      • May 1998 - WEKA 2.3
         • Last release of the TCL/TK-based system
      • Mid 1999 - WEKA 3 (100% Java) released
         • Version to complement the Data Mining book
         • Development version (including GUI)

  12. Back then…

  13. Today:

  14. Explorer: pre-processing the data
      • Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
      • Data can also be read from a URL or from an SQL database using JDBC
      • Pre-processing tools in WEKA are called “filters”
      • WEKA contains filters for:
         • Discretization, normalization, resampling, attribute selection, attribute combination, …
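
To make the idea of a “filter” concrete, the sketch below shows in plain Java what two of the listed transformations, normalization and discretization, do to numeric data. It is an illustration only and does not use or reproduce WEKA's own filter classes: the class name `FilterSketch` and its methods are hypothetical, and it assumes min-max scaling to [0, 1] and equal-width binning, which are only two of several possible variants.

```java
import java.util.Arrays;

// Conceptual sketch of two pre-processing "filters".
// Hypothetical names; this is NOT WEKA's implementation or API.
final class FilterSketch {

    /** Min-max normalization: rescales every column of data to the range [0, 1]. */
    static double[][] normalize(double[][] data) {
        if (data.length == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        // First pass: find the minimum and maximum of each column.
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        // Second pass: rescale; a constant column is mapped to 0.
        double[][] out = new double[data.length][cols];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < cols; j++) {
                double range = max[j] - min[j];
                out[i][j] = range == 0 ? 0.0 : (data[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    /** Equal-width discretization: maps each value to a bin index in 0..bins-1 (bins must be positive). */
    static int[] discretize(double[] values, int bins) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double width = (max - min) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = width == 0 ? 0 : (int) ((values[i] - min) / width);
            out[i] = Math.min(bin, bins - 1); // the maximum value falls into the last bin
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 40.0}};
        System.out.println(Arrays.deepToString(normalize(data)));
        System.out.println(Arrays.toString(discretize(new double[] {0, 1, 5, 9, 10}, 2)));
    }
}
```

In both cases the filter takes a dataset in and hands a transformed dataset out, which is why filters can be chained before a learning scheme is applied.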

  15. Explorer: Building classification models
      • “Classifiers” in WEKA are models for predicting nominal or numeric quantities
      • Implemented schemes include:
         • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
      • “Meta”-classifiers include:
         • Bagging, boosting, stacking, error-correcting output codes, data cleansing, …
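
The relationship between a classifier and a “meta”-classifier can be sketched in plain Java: a meta-classifier is itself a classifier that wraps other classifiers. The example below is a minimal illustration under stated assumptions, not WEKA's code or API. The `Learner` interface, the one-attribute `Stump` base learner, and the `Bagging` wrapper are hypothetical names, and the data is restricted to a single numeric attribute with class labels 0 and 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Conceptual sketch of a base classifier and a bagging meta-classifier.
// Hypothetical names; this is NOT WEKA's implementation or API.
final class BaggingSketch {

    /** Minimal classifier contract: one numeric attribute, class labels 0 or 1. */
    interface Learner {
        void train(double[] x, int[] y);
        int predict(double x);
    }

    /** Base learner: a single threshold rule on the attribute (a "decision stump"). */
    static final class Stump implements Learner {
        private double threshold;
        private int leftLabel; // label predicted when x <= threshold

        @Override public void train(double[] x, int[] y) {
            int bestErrors = Integer.MAX_VALUE;
            // Try every observed value as a threshold, with both label orientations.
            for (double candidate : x) {
                for (int left = 0; left <= 1; left++) {
                    int errors = 0;
                    for (int i = 0; i < x.length; i++) {
                        int predicted = x[i] <= candidate ? left : 1 - left;
                        if (predicted != y[i]) errors++;
                    }
                    if (errors < bestErrors) {
                        bestErrors = errors;
                        threshold = candidate;
                        leftLabel = left;
                    }
                }
            }
        }

        @Override public int predict(double x) {
            return x <= threshold ? leftLabel : 1 - leftLabel;
        }
    }

    /** Meta-classifier: trains each stump on a bootstrap sample, then takes a majority vote. */
    static final class Bagging implements Learner {
        private final int size;
        private final Random random;
        private final List<Learner> members = new ArrayList<>();

        Bagging(int size, long seed) {
            this.size = size;
            this.random = new Random(seed);
        }

        @Override public void train(double[] x, int[] y) {
            members.clear();
            for (int m = 0; m < size; m++) {
                // Bootstrap sample: draw x.length instances with replacement.
                double[] bx = new double[x.length];
                int[] by = new int[y.length];
                for (int i = 0; i < x.length; i++) {
                    int pick = random.nextInt(x.length);
                    bx[i] = x[pick];
                    by[i] = y[pick];
                }
                Learner stump = new Stump();
                stump.train(bx, by);
                members.add(stump);
            }
        }

        @Override public int predict(double x) {
            int votesForOne = 0;
            for (Learner member : members) votesForOne += member.predict(x);
            return 2 * votesForOne > members.size() ? 1 : 0;
        }
    }
}
```

Because `Bagging` implements the same interface as `Stump`, it can be used anywhere a plain classifier is expected, for example `new BaggingSketch.Bagging(11, 42L)` followed by `train` and `predict`. Boosting and stacking follow the same wrapping pattern with different combination rules.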

  16-24. Explorer: classification

  25. KnowledgeFlow: process flows

  26-27. KnowledgeFlow: batch processing

  28. KnowledgeFlow: incremental processing

  29-31. Experimenter

  32. Impact - downloads

  33. Projects based on WEKA
      • Incorporate/wrap WEKA
         • GRB Tool Shed - a tool to aid gamma ray burst research
         • YALE - facility for large scale ML experiments
         • GATE - NLP workbench with a WEKA interface
         • Judge - document clustering and classification
      • Extend/modify WEKA
         • BioWeka - extension library for knowledge discovery in biology
         • WekaMetal - meta learning extension to WEKA
         • Weka-Parallel - parallel processing for WEKA
         • Grid Weka - grid computing using WEKA
         • Weka-CG - computational genetics tool library

  34. The WEKA Project Today
      • FRST funding for the next two years
      • Goal of the project remains the same
      • People
         • 6 staff
         • 2 postdocs
         • 3 PhD students
         • 3 MSc students
         • 2 research programmers

  35. The Future
      • Continue to develop and support WEKA
      • MOA (Massive Online Analysis)
         • Framework that supports learning from data streams
         • Facilities for data generation, experimental analysis, learning algorithms, etc.
         • The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct
         • First public release, probably this Christmas, or perhaps Thanksgiving (as it’s just another turkey)
      • MILK
         • Multi-Instance Learning Kit
      • Proper
         • Propositionalization toolbox for WEKA
