Data Mining Fundamentals: A Comprehensive Overview

Part I Data Mining Fundamentals

Chapter 1 Data Mining: A First View

1.1 Data Mining: A Definition

Data Mining: The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Induction-based Learning: The process of forming general concept definitions by observing specific examples of concepts to be learned.

Knowledge Discovery in Databases (KDD): The application of the scientific method to data mining. Data mining is one step of the KDD process.

Data Mining: A KDD Process
• Data mining: the core of the knowledge discovery process.
Figure: The KDD process, from bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge
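The KDD steps can be sketched as a chain of small functions, one per stage. This is a minimal Python sketch under stated assumptions: the records, the field names (`id`, `age`, `margin`), the helper names, and the 0.5 evaluation threshold are all hypothetical, and `mine` is a toy frequency count standing in for a real mining algorithm.

```python
from collections import Counter


def clean(records):
    """Data cleaning: drop records that have any missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(source_a, source_b, key):
    """Data integration: inner join of two sources on a shared key.

    The joined list plays the role of the data warehouse in this sketch.
    """
    index = {r[key]: r for r in source_b}
    return [{**a, **index[a[key]]} for a in source_a if a[key] in index]


def select(records, attributes):
    """Selection: keep only the task-relevant attributes."""
    return [{attr: r[attr] for attr in attributes} for r in records]


def mine(records, attribute):
    """Data mining (toy stand-in): relative frequency of each attribute value."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def evaluate(patterns, threshold=0.5):
    """Pattern evaluation: keep patterns whose frequency meets the threshold."""
    return {p: f for p, f in patterns.items() if f >= threshold}


# Hypothetical source tables.
customers = [
    {"id": 1, "age": "20-29"},
    {"id": 2, "age": None},  # removed by cleaning
    {"id": 3, "age": "20-29"},
    {"id": 4, "age": "30-39"},
]
accounts = [
    {"id": 1, "margin": "yes"},
    {"id": 3, "margin": "yes"},
    {"id": 4, "margin": "no"},
]

warehouse = integrate(clean(customers), accounts, key="id")
task_data = select(warehouse, ["age", "margin"])
knowledge = evaluate(mine(task_data, "margin"))
print(knowledge)  # {'yes': 0.6666666666666666}
```

Each stage only consumes the output of the previous one, which mirrors the bottom-to-top flow of the figure; a real project would replace `mine` with an actual learning technique.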

1.2 What Can Computers Learn?

Four Levels of Learning
• Facts
• Concepts
• Procedures (to be worked out)
• Principles

Concepts: Computers are good at learning concepts, which are the output of a data mining session.

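Three Concept Views
• Classical View (crisp): a concept is given by a definition; the view of old hands.
• Probabilistic View (e.g., 85%): the view of those with some experience; corresponds to data mining rules with a confidence value.
• Exemplar View (CBR, case-based reasoning): the view of a newcomer, who relies on remembered examples.
An illustrated example: good credit?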
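The three views can be contrasted on a made-up "good credit" concept. This Python sketch is purely illustrative: the attribute names, their values, the 85% confidence, and the stored exemplars are invented assumptions, not data from the original material.

```python
# Hypothetical "good credit" concept under the three views.
# Attribute names, values, the 0.85 confidence, and the exemplars are invented.

def classical_view(applicant):
    """Crisp definition: the applicant either satisfies it or does not."""
    return applicant["income"] == "high" and applicant["late_payments"] == "none"


def probabilistic_view(applicant):
    """Rule with a confidence: IF income = high THEN good credit (85%).

    Returns the confidence when the rule fires, otherwise None (rule is silent).
    """
    return 0.85 if applicant["income"] == "high" else None


EXEMPLARS = [
    ({"income": "high", "late_payments": "none"}, True),
    ({"income": "low", "late_payments": "some"}, False),
    ({"income": "high", "late_payments": "some"}, True),
]


def exemplar_view(applicant, exemplars=EXEMPLARS):
    """Label the applicant like the stored example sharing the most values."""
    def shared(example):
        return sum(applicant[attr] == value for attr, value in example[0].items())
    return max(exemplars, key=shared)[1]


applicant = {"income": "high", "late_payments": "some"}
print(classical_view(applicant), probabilistic_view(applicant), exemplar_view(applicant))
# False 0.85 True
```

Here the views disagree on the same applicant: the crisp definition rejects the late payments, the rule reports only a confidence, and the closest stored example is a good-credit case.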

Supervised Learning: Build a learner model using data instances of known origin, then use the model to determine the outcome for new instances of unknown origin.

Supervised Learning: A Decision Tree Example

Decision Tree: A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.

Figure 1.1 A decision tree for the data in Table 1.1

Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
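The production rules can be applied mechanically, first matching rule wins. A minimal Python sketch, assuming instances are dictionaries keyed by the attribute names used above with the string values "Yes" and "No"; `RULES` and `apply_rules` are illustrative names, not part of the original material.

```python
# The three production rules, written as (conditions, conclusion) pairs
# and checked in order.
RULES = [
    ({"Swollen Glands": "Yes"}, "Strep Throat"),
    ({"Swollen Glands": "No", "Fever": "Yes"}, "Cold"),
    ({"Swollen Glands": "No", "Fever": "No"}, "Allergy"),
]


def apply_rules(instance, rules=RULES):
    """Return the conclusion of the first rule whose conditions all hold, else None."""
    for conditions, diagnosis in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return diagnosis
    return None


# The new instance from Assignment 4 (Patient ID = 14).
patient_14 = {"Sore Throat": "Yes", "Fever": "No", "Swollen Glands": "No",
              "Congestion": "No", "Headache": "No"}
print(apply_rules(patient_14))  # Allergy
```

Only Swollen Glands and Fever are tested, so Sore Throat, Congestion, and Headache have no effect on the result; patient 14 falls through to the Allergy rule.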

Unsupervised Clustering: A data mining method that builds models from data without predefined classes.

Three groups are formed (Table 1.3 shows only part of the whole table):
• G1: MarginAccount = yes and Age = 20-29 and AnnualIncome = 40-59k (accuracy = 80%, coverage = 0.50)
• G2: AccountType = Custodial and FavoriteRecreation = Skiing and AnnualIncome = 40-59k (accuracy = 95%, coverage = 0.35)
• G3: AccountType = joint and Trades/Month > 5 and TransactionMethod = online (accuracy = 82%, coverage = 0.65)
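The slide does not define accuracy and coverage. The Python sketch below assumes one common reading: accuracy is the fraction of instances satisfying a group's conditions that actually belong to that group, and coverage is the fraction of the group's instances that satisfy the conditions. The rows and the `group` field are hypothetical; Table 1.3 is not reproduced here.

```python
def accuracy_and_coverage(instances, rule, group):
    """Assumed definitions (not confirmed by the slide):
    accuracy = |matches rule AND in group| / |matches rule|
    coverage = |matches rule AND in group| / |in group|
    """
    matched = [x for x in instances if rule(x)]
    members = [x for x in instances if x["group"] == group]
    hits = [x for x in matched if x["group"] == group]
    accuracy = len(hits) / len(matched) if matched else 0.0
    coverage = len(hits) / len(members) if members else 0.0
    return accuracy, coverage


def g1_rule(x):
    """G1's conditions written as a predicate."""
    return (x["MarginAccount"] == "yes" and x["Age"] == "20-29"
            and x["AnnualIncome"] == "40-59k")


# Hypothetical rows, not Table 1.3.
rows = [
    {"MarginAccount": "yes", "Age": "20-29", "AnnualIncome": "40-59k", "group": "G1"},
    {"MarginAccount": "yes", "Age": "20-29", "AnnualIncome": "40-59k", "group": "G1"},
    {"MarginAccount": "yes", "Age": "20-29", "AnnualIncome": "40-59k", "group": "G2"},
    {"MarginAccount": "no", "Age": "30-39", "AnnualIncome": "40-59k", "group": "G1"},
    {"MarginAccount": "yes", "Age": "40-49", "AnnualIncome": "60-69k", "group": "G1"},
]

print(accuracy_and_coverage(rows, g1_rule, "G1"))  # (0.6666666666666666, 0.5)
```

Passing the rule as a predicate rather than a dictionary of equalities lets conditions such as Trades/Month > 5 (group G3) be expressed the same way.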

1.3 Is Data Mining Appropriate for My Problem?

Data Mining or Data Query?
• Shallow Knowledge (SQL)
• Multidimensional Knowledge (OLAP)
• Hidden Knowledge (DM)
• Deep Knowledge (human)

Data Mining vs. Data Query: An Example
Use a data query if you already know roughly what you are looking for. Use data mining to find regularities in the data that are not obvious.
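To make the contrast concrete, here is a Python sketch using the standard-library `sqlite3` module on invented basket data. The query answers a question posed in advance; the pair-counting function is a toy stand-in for a mining algorithm, surfacing a regularity nobody asked about. The table name, data, and function names are hypothetical.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical sales records: (basket id, item).
SALES = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"), (4, "milk"),
]


def build_db(rows):
    """Load the records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (basket INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def baskets_with(conn, item):
    """Data query: answers a question we already know how to ask."""
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT basket) FROM sales WHERE item = ?", (item,)
    ).fetchone()
    return count


def frequent_pairs(conn, min_count):
    """Toy data mining: item pairs that co-occur often, found without naming them first."""
    baskets = {}
    for basket, item in conn.execute("SELECT basket, item FROM sales"):
        baskets.setdefault(basket, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


conn = build_db(SALES)
print(baskets_with(conn, "milk"))  # 4
print(frequent_pairs(conn, 3))     # {('bread', 'milk'): 3}
```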

1.4 Expert Systems or Data Mining?

Figure 14-2 Detailed architecture of an expert system

Expert System: A computer program that emulates the problem-solving skills of one or more human experts.

Knowledge Engineer: A person trained to interact with an expert in order to capture their knowledge.

Figure 1.2 Data mining vs. expert systems

1.5 A Simple Data Mining Process Model

Figure 1.3 A simple data mining process model

Assembling the Data
• The Data Warehouse
• Relational Databases and Flat Files

Mining the Data

Interpreting the Results

Result Application

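1.6 Why Not Simple Search?

Nearest Neighbor Classifier: Places a new instance in the class of the most similar stored instance (cf. CBR, case-based reasoning). Drawbacks: classification is time consuming, and it is entropy independent (all attributes count equally; no measure such as entropy singles out the important ones).

K-nearest Neighbor Classifier: Classifies a new instance using its K nearest neighbors, typically by a majority vote among them.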
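A minimal Python sketch of K-nearest neighbor classification, assuming categorical attributes and a simple-matching similarity (the number of attributes on which two instances agree). The training rows are invented for illustration and are not Table 1.1; `similarity` and `knn_classify` are hypothetical names.

```python
from collections import Counter


def similarity(a, b):
    """Number of attributes on which two instances agree (simple matching)."""
    return sum(1 for attr, value in a.items() if b.get(attr) == value)


def knn_classify(training, query, k=1):
    """Classify `query` by majority vote among its k most similar training instances."""
    # Every stored instance is compared with the query, which is why
    # nearest-neighbor classification is time consuming on large data sets.
    ranked = sorted(training, key=lambda row: similarity(row[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training instances: (attributes, class).
TRAINING = [
    ({"Fever": "Yes", "Swollen Glands": "Yes", "Congestion": "No"}, "Strep Throat"),
    ({"Fever": "Yes", "Swollen Glands": "No", "Congestion": "Yes"}, "Cold"),
    ({"Fever": "No", "Swollen Glands": "No", "Congestion": "Yes"}, "Allergy"),
    ({"Fever": "No", "Swollen Glands": "No", "Congestion": "No"}, "Allergy"),
]

query = {"Fever": "No", "Swollen Glands": "No", "Congestion": "Yes"}
print(knn_classify(TRAINING, query, k=1))  # Allergy
print(knn_classify(TRAINING, query, k=3))  # Allergy
```

With k = 1 this is the plain nearest neighbor classifier. Every attribute counts equally in `similarity`, which is the "entropy independent" weakness noted above. Ties are broken in favor of instances listed earlier, an arbitrary choice made here for determinism.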

Assignment 4
A new instance: Patient ID = 14, Sore Throat = Yes, Fever = No, Swollen Glands = No, Congestion = No, Headache = No.
Comparison with the training instances:
• one matched attribute: ID = 1, 9
• two matched attributes: ID = 2, 5, 10
• three matched attributes: ID = 3, 6, 7, 8
• four matched attributes: ID = 4 → strep throat?
Using the decision tree, the correct diagnosis should be allergy.
Q: Try the K-nearest Neighbor Classifier.

1.7 Data Mining Applications

Customer Intrinsic Value

Figure 1.4 Intrinsic vs. actual customer value
