Knowledge Discovery in Database (KDD). Knowledge Discovery Process. The whole process of extracting implicit, previously unknown, and potentially useful knowledge from a large database. It includes data selection, cleaning, enrichment, coding, data mining, and reporting.
Knowledge Discovery Process • The whole process of extracting implicit, previously unknown, and potentially useful knowledge from a large database • It includes data selection, cleaning, enrichment, coding, data mining, and reporting • Data mining is the key stage of the knowledge discovery process • Data mining: the process of finding the desired information in a large database
Knowledge Discovery Process • Example: the database of a magazine publisher that sells five types of magazines: cars, houses, sports, music, and comics • Data mining: find interesting customer properties • What is the profile of a reader of a car magazine? • Is there any correlation between an interest in cars and an interest in comics? • Apply the knowledge discovery process
Data Selection • Select the information about people who have subscribed to a magazine
Cleaning • Pollution: typing errors, people moving from one place to another without notifying a change of address, people giving incorrect information about themselves • Pattern recognition algorithms can help detect such pollution
Cleaning • Lack of domain consistency
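The two cleaning problems above can be illustrated with a minimal sketch. It assumes that a near-identical name at the same address indicates a typing error, and that purchase dates outside an assumed valid range indicate a domain inconsistency. The records, the similarity threshold, and the date range are all invented for illustration; they do not come from the publisher's data.

```python
from datetime import date
from difflib import SequenceMatcher

# Hypothetical subscription records, invented for illustration.
records = [
    {"name": "Johnson", "address": "1 Downing Street", "purchased": date(1994, 4, 15)},
    {"name": "Jonson", "address": "1 Downing Street", "purchased": date(1996, 6, 21)},
    {"name": "Clinton", "address": "2 Boulevard", "purchased": date(1901, 1, 1)},
]

# Assumed valid domain for purchase dates; adjust to the real business rules.
EARLIEST_VALID = date(1990, 1, 1)
LATEST_VALID = date(1999, 12, 31)


def similar(a, b, threshold=0.85):
    """True when two strings are nearly identical (a likely typing error)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def likely_duplicates(rows):
    """Index pairs of records with the same address and near-identical names."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            same_address = rows[i]["address"] == rows[j]["address"]
            if same_address and similar(rows[i]["name"], rows[j]["name"]):
                pairs.append((i, j))
    return pairs


def out_of_domain(rows):
    """Indexes of records whose purchase date falls outside the valid range."""
    return [
        i for i, r in enumerate(rows)
        if not EARLIEST_VALID <= r["purchased"] <= LATEST_VALID
    ]


duplicate_pairs = likely_duplicates(records)
suspect_dates = out_of_domain(records)
print(duplicate_pairs, suspect_dates)
```

Flagged records are candidates for manual review or correction rather than automatic deletion, since a similar name at the same address may also be a different family member.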
Enrichment • Need extra information about the clients consisting of date of birth, income, amount of credit, and whether or not an individual owns a car or a house
Enrichment • The new information needs to be easily joined to the existing client records • This makes it possible to extract more knowledge
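As a rough sketch of the join, the example below attaches purchased extra information to client records through a shared key. The key name `client_id`, the field names, and all values are hypothetical; clients without extra information keep `None` in the new fields so that they can be filtered out later.

```python
# Hypothetical client records and extra information, invented for illustration.
clients = [
    {"client_id": 23003, "name": "Johnson"},
    {"client_id": 23009, "name": "Clinton"},
]

extra_info = {
    23003: {
        "birth_date": "1959-04-13",
        "income": 18500,
        "credit": 17500,
        "car_owner": False,
        "house_owner": True,
    },
}

EXTRA_FIELDS = ("birth_date", "income", "credit", "car_owner", "house_owner")


def enrich(client_rows, extra_by_id):
    """Left join: every client is kept; missing extra fields become None."""
    enriched = []
    for client in client_rows:
        row = dict(client)  # copy so the original record is not modified
        found = extra_by_id.get(client["client_id"], {})
        for field in EXTRA_FIELDS:
            row[field] = found.get(field)
        enriched.append(row)
    return enriched


enriched_clients = enrich(clients, extra_info)
print(enriched_clients)
```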
Coding • We select only those records that have enough information to be of value (row) • Project the fields in which we are interested (column)
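The row selection and column projection described above might look like the following sketch. Which fields count as "enough information" and which fields are of interest are assumptions made for this example, as are the records themselves.

```python
# Hypothetical enriched records, invented for illustration.
records = [
    {"client_id": 23003, "name": "Johnson", "birth_date": "1959-04-13",
     "income": 18500, "car_owner": False},
    {"client_id": 23009, "name": "Clinton", "birth_date": None,
     "income": None, "car_owner": True},
]

# Assumed rules: a record is useful only if these fields are filled in.
REQUIRED_FIELDS = ("birth_date", "income")
# Assumed fields of interest for the mining step.
WANTED_FIELDS = ("client_id", "birth_date", "income", "car_owner")


def select_rows(rows, required):
    """Keep only records in which every required field has a value."""
    return [r for r in rows if all(r.get(f) is not None for f in required)]


def project(rows, fields):
    """Keep only the listed fields of each record."""
    return [{f: r.get(f) for f in fields} for r in rows]


selected = project(select_rows(records, REQUIRED_FIELDS), WANTED_FIELDS)
print(selected)
```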
Coding • Code the information which is too detailed • Address to region • Birth date to age • Divide income by 1000 • Divide credit by 1000 • Convert cars yes-no to 1-0 • Convert purchase date to month numbers starting from 1990 • The way in which we code the information will determine the type of patterns we find • Coding has to be performed repeatedly in order to get the best results
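The coding steps listed above can be sketched as a single transformation per record. The region lookup table, the reference date used to compute age, and the sample record are assumptions made for illustration. Month numbers are counted so that January 1990 is month 1.

```python
from datetime import date

# Hypothetical mapping from city to region code, invented for illustration.
REGION_CODES = {"London": 1, "Leeds": 2}

# Assumed reference date for computing ages.
REFERENCE_DATE = date(1995, 1, 1)


def age_on(birth, today):
    """Age in whole years on the given day."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def month_number(d, base_year=1990):
    """Months counted from January of base_year, which is month 1."""
    return (d.year - base_year) * 12 + d.month


def code_record(rec):
    """Apply the coding steps to one record and return the coded record."""
    return {
        "client_id": rec["client_id"],
        "region": REGION_CODES[rec["city"]],              # address to region
        "age": age_on(rec["birth_date"], REFERENCE_DATE),  # birth date to age
        "income": rec["income"] / 1000,                    # in thousands
        "credit": rec["credit"] / 1000,                    # in thousands
        "car_owner": 1 if rec["car_owner"] else 0,         # yes-no to 1-0
        "month": month_number(rec["purchase_date"]),       # date to month number
    }


record = {
    "client_id": 23003,
    "city": "London",
    "birth_date": date(1959, 4, 13),
    "income": 18500,
    "credit": 17500,
    "car_owner": False,
    "purchase_date": date(1994, 4, 15),
}

coded = code_record(record)
print(coded)
```

Because the choice of coding determines which patterns can be found, functions like these are typically adjusted and rerun several times, for example by switching from exact age to age classes.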
Coding • We are interested in the relationships between readers of different magazines • Perform a flattening operation: one record per client, with one field per magazine
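A flattening operation can be sketched as follows, assuming the input holds one (client, magazine) row per subscription and the output should hold one row per client with a 0/1 field for each of the five magazine types. The subscription data and the helper `share_also_reading` are invented for illustration.

```python
# The five magazine types from the publisher example.
MAGAZINES = ["cars", "houses", "sports", "music", "comics"]

# Hypothetical subscriptions: one (client_id, magazine) row per subscription.
subscriptions = [
    (1, "cars"),
    (1, "comics"),
    (2, "cars"),
    (3, "music"),
    (3, "comics"),
]


def flatten(rows):
    """Turn subscription rows into one 0/1 record per client."""
    flat = {}
    for client_id, magazine in rows:
        record = flat.setdefault(client_id, {m: 0 for m in MAGAZINES})
        record[magazine] = 1
    return flat


def share_also_reading(flat, first, second):
    """Fraction of readers of `first` who also read `second`."""
    readers = [r for r in flat.values() if r[first] == 1]
    if not readers:
        return 0.0
    return sum(r[second] for r in readers) / len(readers)


flat_records = flatten(subscriptions)
cars_and_comics = share_also_reading(flat_records, "cars", "comics")
print(flat_records)
print(cars_and_comics)
```

With the flattened form, a question such as whether an interest in cars goes together with an interest in comics reduces to comparing two fields of the same record.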
Steps of a KDD Process • Learning the application domain: relevant prior knowledge and goals of the application • Creating a target data set: data selection • Data cleaning and enrichment: may take 60% of the effort • Data reduction and transformation (coding): find useful features, dimensionality reduction • Choosing functions of data mining: summarization, association, classification, regression, clustering, … • Choosing the mining algorithm(s) • Data mining: search for patterns of interest • Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc. • Use of discovered knowledge
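Exercises 1 • What is the RFM indicator? What is it used for? • What is data mining? What are its goals? • Why do small companies not need data mining? • How can large companies get to know their customers? • Describe a real application of data mining (it must differ from the examples in the slides). • List and explain the steps of the Knowledge Discovery in Database (KDD) process.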