CS 8520: Artificial Intelligence

Weka Lab. Paula Matuszek. Spring 2013.


Presentation Transcript


  1. CS 8520: Artificial Intelligence Weka Lab Paula Matuszek Spring, 2013 CSC 8520 Spring 2013. Paula Matuszek

  2. Weka is
     • Waikato Environment for Knowledge Analysis
     • A machine learning software suite from the University of Waikato
     • Under development for 20 years
     • Well-developed, maintained, supported
     • Open source
     • Available in Windows, Mac and Unix versions
     • http://www.cs.waikato.ac.nz/ml/weka/index.html
     • Lots of help available at the wiki:
       • http://weka.wikispaces.com/

  3. ROC Curve
     • {Receiver|Relative} Operating Characteristic Curve
     • Name derives from signal detection theory
     • Plots sensitivity (true positive rate) on the Y axis against 1 - specificity (false positive rate) on the X axis
     • Ideal would be (0,1). Random would fall on the diagonal, e.g. (0.5, 0.5) (in a balanced domain)
     • Useful for
       • evaluating a classifier
       • comparing classifiers
       • setting cutoffs for class membership
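The construction on the ROC slide can be sketched in a few lines of plain Python. This is an illustration only, not Weka's implementation: the function names `roc_points` and `auc`, and the six labels and scores, are made up for the example. Each cutoff on the classifier score yields one point (1 - specificity, sensitivity).

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, one per score cutoff."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    # Sort by descending score so that lowering the cutoff admits examples in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Examples that share a score cross the cutoff together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical data: 1 = positive class, scores are classifier confidences.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
print(auc(curve))
```

A curve that passes through (0, 1) separates the classes perfectly, and an area near 0.5 is what random scoring would give. Picking one point on the curve corresponds to setting a cutoff for class membership.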

  4. http://en.wikipedia.org/wiki/File:ROC_space-2.png

  5. More Weka
     • Last week: cross-validated decision tree.
     • Go through section 4.2 of the tutorial.
     • What data set did you use?
     • Which classifier did better based on the confusion matrix?
     • What about the ROC curve?
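When comparing two classifiers by their confusion matrices, it can help to reduce each matrix to a few rates. The sketch below assumes a binary problem. The helper names and all counts are hypothetical, not output from Weka or from the tutorial.

```python
def rates(tp, fn, fp, tn):
    """Summary rates for a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_point(r):
    """The single ROC-space point (1 - specificity, sensitivity) for one classifier."""
    return (1 - r["specificity"], r["sensitivity"])


# Hypothetical confusion-matrix counts for two classifiers on the same 100 examples.
first = rates(tp=40, fn=10, fp=5, tn=45)
second = rates(tp=45, fn=5, fp=15, tn=35)

for name, r in (("first", first), ("second", second)):
    print(name, r, roc_point(r))
```

With these made-up counts the first classifier has the higher accuracy, while the second finds more of the positives at the cost of more false alarms. Which one "did better" depends on which kind of error matters more for the data set.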

  6. Trying a Support Vector Classifier
     • SMO is a support vector classifier
     • http://weka.sourceforge.net/doc/weka/classifiers/functions/SMO.html
     • libSVM is a faster SVM, but it is not installed with Weka; Weka includes only a wrapper for it.

  7. Decision Tree vs SMO
     • Repeat section 4.2, replacing the RandomForest classifier with SMO
     • What were the results for your data source?

  8. Moving on to the Weka Explorer
     • Explore some of the data sets included with Weka.
     • Restart Weka, using the Explorer instead of the KnowledgeFlow.
     • Make sure the Preprocess tab is highlighted
     • Use the Open file option to look at some of the data sets
     • Choose one which is binary (usually there is a feature just labeled class) and which looks interesting.
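Weka data sets are stored as ARFF text files, so one quick way to check whether a data set is binary is to look at the nominal attributes declared in its header. The sketch below is a deliberately simplified reader, not Weka's parser: it assumes unquoted attribute names and values without embedded commas. The helper names and the toy header are invented for illustration.

```python
import re


def nominal_attributes(arff_text):
    """Map each nominal attribute name in an ARFF header to its list of values."""
    found = {}
    for line in arff_text.splitlines():
        line = line.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section begins
        # Nominal attributes list their values in braces: @attribute name {a, b}
        match = re.match(r"@attribute\s+(\S+?)\s*\{(.*)\}", line, re.IGNORECASE)
        if match:
            values = [v.strip() for v in match.group(2).split(",")]
            found[match.group(1)] = values
    return found


def binary_attributes(arff_text):
    """Names of nominal attributes that take exactly two values."""
    return [name for name, values in nominal_attributes(arff_text).items()
            if len(values) == 2]


# Toy header, made up for this example.
sample = """@relation toy
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute class {yes, no}
@data
sunny,85,no
"""

print(nominal_attributes(sample))
print(binary_attributes(sample))
```

If an attribute named class shows up in the second list, the data set is a reasonable candidate for the binary exercises that follow.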

  9. Exploring with Weka
     • We will go through a different tutorial, which uses the Explorer interface
     • The tutorial is at http://www.ibm.com/developerworks/opensource/library/os-weka2/index.html
     • It uses data which can be downloaded from the Download section, about 2/3 of the way down the page.

  10. Decision Tree Again
      • The first part of the tutorial creates a decision tree using J48, as in the KnowledgeFlow tutorial.
      • This should give exactly the same results as the KnowledgeFlow approach; it’s just a different interface.
      • Which did you find easier?
      • Try it on the data set you chose earlier. How well did it do?

  11. Clustering
      • The second part of the tutorial uses the SimpleKMeans clustering algorithm.
      • Try it on the sample data they provide.
      • Do the results for their data make sense?
      • Set the number of clusters to 2 and try it on the data set you chose.
      • Do the results make sense?
      • Do the two clusters match the two classes in your data?
      • Try it again removing the “class” feature. Do you still get reasonable results?
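To make the question "do the two clusters match the two classes?" concrete, here is a minimal k-means sketch in plain Python. It is not Weka's SimpleKMeans: it uses fixed starting centroids instead of a random seed, and the data points, class labels, and function names are all invented for the example. The class label is kept out of the features, which corresponds to the run with the class feature removed.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def kmeans(points, centroids, iterations=20):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids


def agreement(assignment, classes):
    """Fraction of points whose cluster matches their class (two clusters, two classes).

    Cluster numbers are arbitrary, so take the better of the two possible mappings.
    """
    first = classes[0]
    same = sum(1 for a, cls in zip(assignment, classes) if (a == 0) == (cls == first))
    return max(same, len(classes) - same) / len(classes)


# Hypothetical two-feature data with a known binary class, used only for checking.
points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (5.0, 5.2), (5.1, 4.9), (4.9, 5.0)]
classes = ["a", "a", "a", "b", "b", "b"]

assignment, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(assignment, centroids)
print(agreement(assignment, classes))
```

On well-separated data like this toy set, the clusters line up with the classes. On a real data set the agreement can be much lower, which is a useful thing to look for in the exercise.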

  12. Explore!
      • Go ahead and try some of the other capabilities in Weka.
