
Data Mining Introduction



Presentation Transcript


  1. Data Mining Introduction TYNE SYSTEM Chun-hung, Chou 2003.12.09

  2. Outline 1. Data Mining Overview 2. Functionalities 3. Software 4. R function 5. Example 6. Q & A

  3. Data Mining Overview

  4. Knowledge Discovery Process 1. Data cleaning - remove noise and inconsistent data 2. Data integration - combine multiple data sources 3. Data selection - retrieve the data relevant to the analysis task 4. Data transformation - transform the data into forms appropriate for mining 5. Data mining 6. Pattern evaluation - identify the truly interesting patterns 7. Knowledge presentation
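To make the seven steps concrete, here is a minimal Python sketch that runs toy sales records through each stage in order. The record fields, thresholds, and helper names are hypothetical. The "mining" step is a trivial stand-in (each customer's share of spending), not a specific algorithm from the slides.

```python
# Hypothetical record fields ("customer", "region", "amount") and thresholds,
# chosen only to illustrate the seven steps; not taken from the slides.

def clean(records):
    """Data cleaning: drop records with missing or negative (inconsistent) amounts."""
    return [r for r in records if r.get("amount") is not None and r["amount"] >= 0]


def integrate(*sources):
    """Data integration: combine several sources into one list of records."""
    combined = []
    for source in sources:
        combined.extend(source)
    return combined


def select(records, region):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["region"] == region]


def transform(records):
    """Data transformation: aggregate to total spending per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return totals


def mine(totals):
    """Data mining (trivial stand-in): each customer's share of total spending."""
    grand_total = sum(totals.values())
    return {customer: amount / grand_total for customer, amount in totals.items()}


def evaluate(patterns, min_share):
    """Pattern evaluation: keep only patterns above an interestingness threshold."""
    return {c: s for c, s in patterns.items() if s >= min_share}


def present(patterns):
    """Knowledge presentation: render the surviving patterns as readable text."""
    return [f"{c}: {s:.1%} of spending" for c, s in sorted(patterns.items())]


shop_a = [
    {"customer": "c1", "region": "north", "amount": 120},
    {"customer": "c2", "region": "north", "amount": None},  # missing value
    {"customer": "c3", "region": "south", "amount": 50},
]
shop_b = [
    {"customer": "c1", "region": "north", "amount": 30},
    {"customer": "c2", "region": "north", "amount": 20},
    {"customer": "c4", "region": "north", "amount": -5},  # inconsistent value
]

totals = transform(select(integrate(clean(shop_a), clean(shop_b)), "north"))
report = present(evaluate(mine(totals), min_share=0.5))
print(report)  # ['c1: 88.2% of spending']
```

Each stage is a separate function so that any one of them can be swapped out without touching the others. A real project would replace `mine` and `evaluate` with a proper algorithm and interestingness measure.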

  5. What is Data Mining? • Viewed as part of the Knowledge Discovery process. • Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data. • Uses tools from Computer Science and Artificial Intelligence as well as Statistics.

  6. Why do we need data mining? • Large number of records (cases) (10^8 to 10^12 bytes) • High-dimensional data (variables) (10 to 10^4 attributes) • Only a small portion, typically 5% to 10%, of the collected data is ever analyzed. • Data that may never be explored continue to be collected, for fear that something that may prove important in the future would otherwise be missing. • The magnitude of the data precludes most traditional analyses (ANOVA/PC/…)

  7. Potential Applications • Fraud Detection • Manufacturing Processes • Targeting Markets • Scientific Data Analysis • Risk Management • Web Intelligence • Bioinformatics • …

  8. Data Mining Myths • Data mining tools need no guidance. • Data mining models explain behavior. • Data mining requires no data analysis skill. • Data mining tools are "different" from statistics. • Data mining eliminates the need to understand your business and your data.

  9. Data Mining Functionalities • Concept/Class Description • Association Analysis • Classification Analysis • Cluster Analysis • Outlier Analysis • Evolution Analysis

  10. Concept Description Generate descriptions for the characterization and comparison of data. Characterization: summarizes and describes a collection of data (e.g., mean, distribution, percentile, …). Comparison: summarizes and distinguishes one collection of data from other collection(s) of data.

  11. Concept Description Method: visualization (e.g., boxplot, bar chart, histogram, …); statistics/tabulation (e.g., mean, std, proportion, contingency table, …)
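The statistical side of characterization and comparison can be sketched with Python's standard `statistics` module. This is an illustrative sketch only. The function names and sample values are assumptions, not from the slides, and "comparison" is reduced here to the difference between two summaries.

```python
import statistics
from collections import Counter


def characterize(values):
    """Characterization: summarize one collection of numeric values."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def compare(target, contrast):
    """Comparison: difference between the summaries of two collections."""
    t, c = characterize(target), characterize(contrast)
    return {key: t[key] - c[key] for key in t}


def contingency_table(pairs):
    """Cross-tabulate (row category, column category) pairs into counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


# Hypothetical sample data for illustration only.
group_a = [2, 4, 4, 4, 5, 5, 7, 9]
group_b = [1, 2, 3, 4, 5]
print(characterize(group_a))
print(compare(group_a, group_b))
print(contingency_table([("male", "yes"), ("female", "no"), ("male", "yes"), ("female", "yes")]))
```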

  12. Association Analysis • Goal: find interesting relationships among items in a given data set

  13. Association Analysis Example: • Market Basket Analysis - an example of rule-based machine learning • Customer analysis: Market Basket Analysis uses information about what a customer purchases to give us insight into who they are and why they make certain purchases. • Product analysis: Market Basket Analysis gives us insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to purchase.
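Market basket rules are commonly scored by two measures:

* support: the fraction of baskets that contain all the items in the rule;
* confidence: of the baskets containing the left-hand item, the fraction that also contain the right-hand item.

The sketch below enumerates item pairs by brute force. It is not Apriori or any other frequent-itemset algorithm. The baskets and thresholds are hypothetical.

```python
from itertools import combinations


def count(itemset, transactions):
    """Number of transactions containing every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force rules lhs -> rhs over item pairs meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b}, transactions)
        if both / n < min_support:  # support of {a, b}
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / count({lhs}, transactions)  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both / n, confidence))
    return rules


# Hypothetical baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
for lhs, rhs, sup, conf in pair_rules(baskets, min_support=0.6, min_confidence=0.8):
    print(f"{lhs} -> {rhs}  support={sup:.2f} confidence={conf:.2f}")
```

Confidence is computed from raw counts rather than from two rounded support fractions, which keeps values like 3/4 exact.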

  14. Classification Analysis Goal: build a model that describes a predetermined set of data classes or concepts, and use the model for prediction

  15. Classification Analysis Method: decision tree, Bayesian network, Bayesian belief network, neural network, k-nearest neighbor, case-based reasoning, genetic algorithm, rough sets, fuzzy logic
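Of the methods listed, k-nearest neighbor is the simplest to show in a few lines. This is a minimal sketch with three assumptions:

* features are numeric vectors;
* distance is Euclidean;
* the class is chosen by a plain majority vote.

The training data and labels are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training examples.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training data for illustration only.
train = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.8), "normal"),
    ((0.9, 1.1), "normal"),
    ((5.0, 5.0), "abnormal"),
    ((5.2, 4.8), "abnormal"),
    ((4.9, 5.1), "abnormal"),
]
print(knn_predict(train, (1.1, 1.0)))  # the three nearest are all "normal"
print(knn_predict(train, (5.1, 5.0)))
```

Choosing an odd `k` avoids ties in a two-class problem. With more classes, `Counter.most_common` breaks ties by first occurrence.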

  16. Cluster Analysis Goal: group a set of physical or abstract objects into classes of similar objects

  17. Cluster Analysis • Method: partitioning methods (k-means); hierarchical methods (top-down, bottom-up); density-based methods (clusters of arbitrary shape); grid-based methods (cells); model-based methods (best fit to a given model)
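The partitioning approach can be illustrated with a plain k-means loop (Lloyd's iteration). Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its points, stopping when the centroids no longer change. This sketch makes three assumptions:

* points are numeric tuples;
* distance is Euclidean;
* the initial centroids are sampled from the data with a fixed seed.

The sample points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means; returns (centroids, clusters from the last assignment pass)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the initial centroids. Practical implementations usually run several random starts or use a smarter seeding scheme.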

  18. Outlier Analysis Outlier: a data object that is inconsistent with the rest of a given data set. Goal: find an efficient method to mine the outliers

  19. Outlier Analysis Method: - Statistical-Based Outlier Detection - Distance-Based Outlier Detection - Deviation-Based Outlier Detection
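For distance-based outlier detection, one common formulation is the DB(pct, dmin)-outlier. An object counts as an outlier if at least a fraction pct of the data set lies farther than distance dmin from it. The sketch below assumes Euclidean distance and uses a naive pairwise scan. The parameter values and readings are hypothetical.

```python
import math


def distance_based_outliers(points, pct, dmin):
    """Return the DB(pct, dmin)-outliers of points.

    A point is flagged when at least a fraction pct of the data set lies
    farther than dmin from it (Euclidean distance). Naive O(n^2) scan.
    """
    n = len(points)
    outliers = []
    for p in points:
        far = sum(1 for q in points if math.dist(p, q) > dmin)  # p itself is at distance 0
        if far / n >= pct:
            outliers.append(p)
    return outliers


# Hypothetical readings: four close together and one far away.
readings = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (10.0, 10.0)]
print(distance_based_outliers(readings, pct=0.75, dmin=3.0))
```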

  20. Evolution Analysis • Goal: describe and model regularities or trends for objects whose behavior changes over time

  21. Evolution Analysis • Method: statistical methods, trend analysis, similarity search in time-series analysis, sequential pattern mining, periodicity analysis
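Two of the listed methods, trend analysis and similarity search in time series, can be sketched in a few lines. This sketch assumes evenly spaced observations. It uses a moving average and a least-squares line as the trend measures, and Euclidean distance between equal-length windows for similarity. The series are hypothetical.

```python
import math


def moving_average(series, window):
    """Trend smoothing: mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def linear_trend(series):
    """Least-squares line y = slope * t + intercept over time steps t = 0, 1, ..."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, y_mean - slope * t_mean


def most_similar_window(series, query):
    """Similarity search: start index of the subsequence closest to query (Euclidean)."""
    m = len(query)
    return min(range(len(series) - m + 1), key=lambda s: math.dist(series[s:s + m], query))


print(moving_average([1, 2, 3, 4, 5], 3))
print(linear_trend([1, 3, 5, 7]))
print(most_similar_window([0, 1, 5, 6, 5, 1, 0], [5, 6, 5]))
```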

  22. Commercial Software • Full Suite

  23. Method in R

  24. Example - Decision Tree • Decision tree for tool abnormality detection (AWD030, AWD050, AWD080)
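The decision tree itself appeared only as an image in the original deck, so its splitting criterion is not known. To illustrate how a tree can choose a split, here is an ID3-style information-gain sketch. The attributes (`temp`, `pressure`), their values, and the labels are hypothetical. They are not the AWD030/AWD050/AWD080 tool data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting rows on a categorical attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain (the root test in ID3)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical records for illustration only.
rows = [
    {"temp": "high", "pressure": "low", "status": "abnormal"},
    {"temp": "high", "pressure": "high", "status": "abnormal"},
    {"temp": "normal", "pressure": "low", "status": "normal"},
    {"temp": "normal", "pressure": "high", "status": "normal"},
]
print(best_split(rows, ["temp", "pressure"], "status"))  # temp
```

A full tree would apply `best_split` recursively to each subset until the labels are pure or no attributes remain.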

  25. Example - Decision Tree

  26. Example - Cluster

  27. Question & Suggestion

  28. Thanks!
