
Part I

Part I. Data Mining Fundamentals. Data Mining: A First View. Chapter 1. 1.1 Data Mining: A Definition. Data Mining. The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data. Induction-based Learning.

donnaclark

Presentation Transcript


  1. Part I. Data Mining Fundamentals

  2. Chapter 1. Data Mining: A First View

  3. 1.1 Data Mining: A Definition

  4. Data Mining: The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

  5. Induction-based Learning: The process of forming general concept definitions by observing specific examples of concepts to be learned.

  6. Knowledge Discovery in Databases (KDD): The application of the scientific method to data mining. Data mining is one step of the KDD process.

  7. Data Mining: A KDD Process. Data mining is the core of the knowledge discovery process. [Figure: KDD process diagram with the labels Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation, Knowledge]

  8. 1.2 What Can Computers Learn?

  9. Four Levels of Learning: Facts; Concepts; Procedures (to be worked out); Principles.

  10. Concepts: Computers are good at learning concepts. Concepts are the output of a data mining session.

  11. Three Concept Views. Classical View (crisp), old hands: as a definition. Probabilistic View (85%), with some experience: DM rules with confidence. Exemplar View (CBR), newcomer. An illustrated example: good credit?

  12. Supervised Learning: Build a learner model using data instances of known origin. Use the model to determine the outcome of new instances of unknown origin.

  13. Supervised Learning: A Decision Tree Example

  14. Decision Tree: A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.

  15. Figure 1.1 A decision tree for the data in Table 1.1

  16. Production Rules
      IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
      IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
      IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
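
As a minimal sketch, the three production rules on slide 16 can be written as one small Python function. The function name `diagnose` and its boolean parameters are illustrative assumptions and do not come from the slides.

```python
def diagnose(swollen_glands: bool, fever: bool) -> str:
    """Apply the three production rules from slide 16, in order."""
    # Rule 1: IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
    if swollen_glands:
        return "Strep Throat"
    # Rule 2: IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
    if fever:
        return "Cold"
    # Rule 3: IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
    return "Allergy"


if __name__ == "__main__":
    # The new patient from Assignment 4: no swollen glands, no fever.
    print(diagnose(swollen_glands=False, fever=False))
```

Because the rules test Swollen Glands first and Fever second, the other attributes (Sore Throat, Congestion, Headache) never affect the outcome here.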

  17. Unsupervised Clustering: A data mining method that builds models from data without predefined classes.

  18. 3 groups formed (Table 1.3 is only a part of the whole table)
      G1. MarginAccount = yes and Age = 20-29 and AnnualIncome = 40-59k; accuracy = 80%, coverage = 0.5
      G2. AccountType = Custodial and FavoriteRecreation = Skiing and AnnualIncome = 40-59k; accuracy = 95%, coverage = 0.35
      G3. AccountType = Joint and Trades/Month > 5 and TransactionMethod = Online; accuracy = 82%, coverage = 0.65
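
The slide reports accuracy and coverage for each group rule but does not define them. The sketch below assumes a common reading: accuracy is the fraction of instances satisfying the rule's conditions that belong to the group, and coverage is the fraction of the group's instances that satisfy the conditions. The records, field names, and the helper `rule_stats` are hypothetical and are not taken from Table 1.3.

```python
def rule_stats(records, antecedent, in_group):
    """Return (accuracy, coverage) of a rule under the assumed definitions."""
    matched = [r for r in records if antecedent(r)]   # rule conditions hold
    members = [r for r in records if in_group(r)]     # instance is in the group
    both = [r for r in matched if in_group(r)]        # conditions hold and in group
    accuracy = len(both) / len(matched) if matched else 0.0
    coverage = len(both) / len(members) if members else 0.0
    return accuracy, coverage


# Hypothetical customer records, made up for illustration only.
records = [
    {"account_type": "joint", "trades_per_month": 8, "method": "online", "group": 3},
    {"account_type": "joint", "trades_per_month": 6, "method": "online", "group": 3},
    {"account_type": "joint", "trades_per_month": 7, "method": "online", "group": 1},
    {"account_type": "joint", "trades_per_month": 2, "method": "broker", "group": 3},
    {"account_type": "custodial", "trades_per_month": 1, "method": "broker", "group": 2},
]

# A rule shaped like G3: AccountType = Joint and Trades/Month > 5 and Online.
def g3_rule(r):
    return (r["account_type"] == "joint"
            and r["trades_per_month"] > 5
            and r["method"] == "online")

accuracy, coverage = rule_stats(records, g3_rule, lambda r: r["group"] == 3)

if __name__ == "__main__":
    print(f"accuracy={accuracy:.2f} coverage={coverage:.2f}")
```

On this toy data, three records satisfy the rule and two of them are in group 3, while group 3 has three members in total, so both measures come out to 2/3.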

  19. 1.3 Is Data Mining Appropriate for My Problem?

  20. Data Mining or Data Query? Shallow Knowledge (SQL); Multidimensional Knowledge (OLAP); Hidden Knowledge (DM); Deep Knowledge (human).

  21. Data Mining vs. Data Query: An Example. Use data query if you already know roughly what you are looking for. Use data mining to find regularities in data that are not obvious.

  22. 1.4 Expert Systems or Data Mining?

  23. Figure 14-2 Detailed diagram of the expert system architecture

  24. Expert System: A computer program that emulates the problem-solving skills of one or more human experts.

  25. Knowledge Engineer: A person trained to interact with an expert in order to capture their knowledge.

  26. Figure 1.2 Data mining vs. expert systems

  27. 1.5 A Simple Data Mining Process Model

  28. Figure 1.3 A simple data mining process model

  29. Assembling the Data: The Data Warehouse; Relational Databases and Flat Files.

  30. Mining the Data

  31. Interpreting the Results

  32. Result Application

  33. 1.6 Why Not Simple Search? Nearest Neighbor Classifier (i.e., CBA): adds a new instance to a class based on similarity; time consuming and entropy independent. K-nearest Neighbor Classifier: forms a class consisting of the K nearest neighbors.
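
A minimal sketch of the idea on slide 33, assuming similarity is measured by counting attributes whose values match and that the K nearest neighbors decide the class by majority vote. The training rows below are hypothetical and are not the data of Table 1.1; the names `matches` and `knn_classify` are illustrative.

```python
from collections import Counter


def matches(a, b, attributes):
    """Count the attributes on which two instances have the same value."""
    return sum(1 for name in attributes if a[name] == b[name])


def knn_classify(training, new_instance, attributes, k=1, label="diagnosis"):
    """Classify new_instance by majority vote among its k most similar rows."""
    # Every stored instance is compared with the new one, which is why
    # simple search becomes time consuming as the data grows.
    ranked = sorted(
        training,
        key=lambda row: matches(row, new_instance, attributes),
        reverse=True,
    )
    votes = Counter(row[label] for row in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training instances, made up for illustration only.
training = [
    {"fever": "yes", "swollen_glands": "yes", "diagnosis": "strep throat"},
    {"fever": "yes", "swollen_glands": "no", "diagnosis": "cold"},
    {"fever": "no", "swollen_glands": "no", "diagnosis": "allergy"},
    {"fever": "no", "swollen_glands": "no", "diagnosis": "allergy"},
]
attributes = ["fever", "swollen_glands"]
new_patient = {"fever": "no", "swollen_glands": "no"}

if __name__ == "__main__":
    print(knn_classify(training, new_patient, attributes, k=1))
    print(knn_classify(training, new_patient, attributes, k=3))
```

Note that every attribute counts equally in `matches`, so an attribute with no bearing on the class influences the result as much as a decisive one. A larger `k` makes the vote less sensitive to a single unusual neighbor.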

  34. Assignment 4. A new instance: Patient ID = 14, Sore Throat = Yes, Fever = No, Swollen Glands = No, Congestion = No, Headache = No. Comparison: with one matched attribute: ID = 1, 9; with two matched attributes: ID = 2, 5, 10; with three matched attributes: ID = 3, 6, 7, 8; with four matched attributes: ID = 4 (strep throat?). The correct diagnosis should be allergy, using the decision tree. Q: Try the K-nearest Neighbor Classifier.

  35. 1.7 Data Mining Applications

  36. Customer Intrinsic Value

  37. Figure 1.4 Intrinsic vs. actual customer value
