
Introduction to DATA MINING


chaim

Presentation Transcript


  1. Introduction to DATA MINING

    MIS2502 Data Analytics
  2. The Information Architecture of an Organization

    Data entry -> Transactional Database -> Data extraction -> Analytical Data Store -> Data analysis
    Transactional Database: stores real-time transactional data
    Analytical Data Store: stores historical transactional and summary data
    Now we’re here…
  3. The difference between data mining and OLAP

    OLAP can tell you what is happening, or what has happened.
    Data mining can tell you why it is happening, and help predict what will happen.
    Analytical Data Store: the (dimensional) data warehouse feeds both…
  4. The Evolution of Data Analytics
  5. Origins of Data Mining

    Data mining draws ideas from
      Artificial intelligence
      Pattern recognition
      Statistics
      Database systems
    Traditional techniques may not work because of
      Sheer amount of data
      High dimensionality of data
      Heterogeneous, distributed nature of data
  6. What data mining is…
  7. What data mining is not…

    If these aren’t data mining examples, then what are they?
  8. Data Mining Tasks

    from Fayyad et al., Advances in Knowledge Discovery and Data Mining, 1996
  9. Case Study

    You are a marketing manager for a brokerage company.
    Problem: High churn (i.e., customers leave)
      Turnover (after 6 month introductory period) is 40%
      They get a reward (average cost: $160) to open an account
      Giving more incentives to everyone who might leave is expensive and wasteful
      And getting a customer back after they leave is difficult and costly
  10. …a solution
  11. Data Mining Tasks
  12. Decision Trees

    Used to classify data according to a pre-defined outcome
    Based on characteristics of that data
    Uses
      Predict whether a customer should receive a loan
      Flag a credit card charge as legitimate
      Determine whether an investment will pay off
    http://www.mindtoss.com/2010/01/25/five-second-rule-decision-chart/
  13. Ok…here’s a real one

    Will a customer buy some product given their demographics?
    What are the characteristics of customers who are likely to buy?
    http://onlamp.com/pub/a/python/2006/02/09/ai_decision_trees.html
  14. Clustering

    Used to determine distinct groups of data
    Based on data across multiple dimensions
    Uses
      Customer segmentation
      Identifying patient care groups
      Performance of business sectors
    Here you have four clusters of web site visitors. What does this tell you?
    from http://www.datadrivesmedia.com/two-ways-performance-increases-targeting-precision-and-response-rates/
  15. Association Rules

    Used to determine which events occur together
    Usually that “event” is a product purchase
    Uses
      Determine which products are bought together
      Determine which web sites are likely to be visited in a single session
      Find sets of customization options that should be bundled
    What features should be sold in a discounted bundle?
  16. Bottom line

    In large sets of data, these patterns aren’t obvious
    And we can’t just figure it out in our head
    We need analytics software
    We’ll be using SAS to perform these three analyses on large sets of data
      Decision Trees
      Clustering
      Association Rules
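
The three analyses named in the slides can be sketched on toy data to show what each one computes. The slides state that the course uses SAS, so the Python below is only an illustrative stand-in, not the course's method. All function names, data sets, and starting centroids are hypothetical, and each function is a deliberately minimal version of its technique: one Gini-based split rather than a full decision tree, plain k-means with fixed starting centroids, and support/confidence for a single association rule.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Decision-tree building block: find the (feature, threshold) whose
    'value <= threshold' test gives the lowest weighted Gini impurity."""
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best[2]:
                best = (f, t, score)
    return best[0], best[1]


def kmeans(points, centroids, iterations=10):
    """Clustering: plain k-means on 2-D points from the given starting centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids, groups


def rule_stats(transactions, antecedent, consequent):
    """Association rules: support and confidence of antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence


# Hypothetical toy data (not from the slides).
customers = [(25, 30), (30, 40), (45, 80), (50, 90)]   # (age, income in $1000s)
bought = ["no", "no", "yes", "yes"]
visitors = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]

if __name__ == "__main__":
    print("split on (feature, threshold):", best_split(customers, bought))
    print("cluster centers:", kmeans(visitors, [(0, 0), (10, 10)])[0])
    print("bread -> milk (support, confidence):", rule_stats(baskets, {"bread"}, {"milk"}))
```

On data sets this small the patterns are visible by eye, which is the point of the bottom line: the same computations on large data sets need software.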