This document presents an overview of high-performance data mining (HP-KDD) challenges and the current tools used in the field. It provides insights into software tools like MLC++/MineSet and NCSA D2K, highlighting their application in knowledge discovery and data mining (KDD). The objectives of the lab include gaining proficiency with KDD software, understanding modern machine learning systems, and implementing data-driven projects with specified milestones and evaluation criteria. The project incorporates supervised learning and performance evaluation techniques, utilizing various data sources for effective decision support.
Lab 0
High-Performance Data Mining: Problems and Current Tools
Monday, May 15, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Data Mining Software Practicum
• Lab and Implementation Project Objectives
  • Learn to work with some KDD software tools
  • Understand modern ML-based object-oriented HP-KDD systems
  • Main tools: MLC++/MineSet and NCSA D2K
• Implementation Project Milestones
  • Project plan
    • Due Monday, May 22, 2000 (Homework 1)
    • Short writeup: 1-2 pages design plus simple specification document
  • Project interim report
    • Due Wednesday, May 31, 2000 (Homework 2)
    • Preliminary itinerary and documentation
  • Comparative results
    • Using software tools
    • Due Thursday, June 8, 2000 (machine problem; Homework 3)
    • Exposure to: MineSet; SNNS, NS3; Hugin, BKD; GPSys, Genesis
KDD and Software Engineering: D2K Framework
[Figure: Rapid KDD Development Environment]
Performance Element: Decision Support Systems (DSS)
[Figure labels: Environment (Data Model), Knowledge Base, Performance Element, Learning Element]
• Model Identification (Relational Database)
  • Specify data model
  • Group attributes by type (dimension)
  • Define queries
• Prediction Objective Identification
  • Identify target function
  • Define hypothesis space
• Transformation of Data
  • Reduce data: e.g., decrease frequency
  • Select relevant data channels (given prediction objective)
  • Integrate models, sources of data (e.g., interactively elicited rules)
• Supervised Learning
• Analysis and Assimilation: Performance Evaluation using DSS
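The transformation, supervised learning, and performance evaluation steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sensor records, the function names (make_toy_records, downsample, select_channels, holdout_accuracy), and the 1-nearest-neighbor learner are all hypothetical stand-ins chosen for brevity, not part of MLC++/MineSet or NCSA D2K.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Record = Dict[str, float]


def make_toy_records(n: int) -> Tuple[List[Record], List[str]]:
    """Build hypothetical sensor records plus target labels (assumed data)."""
    records = [{"temp": 20.0 + 2.0 * i, "noise": float((i * 37) % 11)}
               for i in range(n)]
    # Target function (assumed): raise an alarm at 60 degrees or above.
    labels = ["alarm" if r["temp"] >= 60.0 else "normal" for r in records]
    return records, labels


def downsample(items: Sequence[T], step: int) -> List[T]:
    """Reduce data by decreasing frequency: keep every step-th item."""
    return list(items[::step])


def select_channels(records: Sequence[Record],
                    channels: Sequence[str]) -> List[List[float]]:
    """Keep only the data channels relevant to the prediction objective."""
    return [[r[c] for c in channels] for r in records]


def predict_1nn(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float]) -> str:
    """Supervised learning stand-in: label of the nearest training example."""
    def squared_distance(i: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))
    return train_y[min(range(len(train_x)), key=squared_distance)]


def holdout_accuracy(xs: Sequence[Sequence[float]],
                     ys: Sequence[str]) -> float:
    """Performance evaluation: train on even positions, test on odd ones."""
    train_x, train_y = xs[0::2], ys[0::2]
    test_x, test_y = xs[1::2], ys[1::2]
    correct = sum(predict_1nn(train_x, train_y, x) == y
                  for x, y in zip(test_x, test_y))
    return correct / len(test_x)


if __name__ == "__main__":
    records, labels = make_toy_records(40)
    records, labels = downsample(records, 2), downsample(labels, 2)
    features = select_channels(records, ["temp"])  # drop the "noise" channel
    print("holdout accuracy:", holdout_accuracy(features, labels))
```

Dropping the uninformative "noise" channel before learning mirrors the "select relevant data channels" step: the learner only sees attributes that bear on the target function.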
Some Interesting Industrial Applications
• Application areas: Database Mining; Reasoning (Inference, Decision Support); Planning, Control
• NCSA D2K - http://www.ncsa.uiuc.edu/STI/ALG
• Cartia ThemeScapes - http://www.cartia.com
• DC-ARM - http://www-kbs.ai.uiuc.edu
[Figure: 6500 news stories from the WWW in 1997]
[Figure: state labels Normal, Ignited, Fire Alarm, Engulfed, Extinguished, Flooding, Destroyed]