
Section 1.1


isra




Presentation Transcript


  1. Section 1.1 Background

  2. Objectives • Discuss some of the history of data mining. • Define data mining and its uses.

  3. Defining Characteristics • 1. The Data • Massive, operational, and opportunistic • 2. The Users and Sponsors • Business decision support • 3. The Methodology • Computer-intensive “ad hockery” • Multidisciplinary lineage

  4. Data Mining, circa 1963 • IBM 7090 • 600 cases • “Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”

  5. Since 1963 • Moore’s Law: • The information density on silicon-integrated circuits doubles every 18 to 24 months. • Parkinson’s Law: • Work expands to fill the time available for its completion.

  6. Data Deluge • hospital patient registries • electronic point-of-sale data • remote sensing images • tax returns • stock trades • OLTP • telephone calls • airline reservations • credit card charges • catalog orders • bank transactions

  7. The Data: Experimental vs. Opportunistic • Purpose: Research vs. Operational • Value: Scientific vs. Commercial • Generation: Actively controlled vs. Passively observed • Size: Small vs. Massive • Hygiene: Clean vs. Dirty • State: Static vs. Dynamic

  8. Business Decision Support • Database Marketing • Target marketing • Customer relationship management • Credit Risk Management • Credit scoring • Fraud Detection • Healthcare Informatics • Clinical decision support

  9. Multidisciplinary • Data Mining draws on: Statistics, Pattern Recognition, Neurocomputing, Machine Learning, AI, Databases, and KDD

  10. Tower of Babel • “Bias” • STATISTICS: the expected difference between an estimator and what is being estimated • NEUROCOMPUTING: the constant term in a linear combination • MACHINE LEARNING: a reason for favoring any model that does not fit the data perfectly
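The first two senses of “bias” can be made concrete in a few lines. This is a minimal Python sketch under assumed inputs: `statistical_bias` enumerates every equally likely sample from a small hypothetical population to compute E[estimator] minus the true variance exactly, and `neuron` adds a constant term to a weighted sum. Both function names are hypothetical. The machine-learning sense (a preference among models) is a property of the learning method rather than a number, so it is not shown.

```python
from fractions import Fraction
from itertools import product


def variance(sample, ddof):
    """Variance of a sample using divisor len(sample) - ddof (exact arithmetic)."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)


def statistical_bias(population, n, ddof):
    """STATISTICS sense: E[estimator] - true value.

    Computed exactly by enumerating every equally likely sample of size n
    drawn with replacement from a (hypothetical) small population.
    """
    samples = list(product(population, repeat=n))
    expected = sum(variance(s, ddof) for s in samples) / len(samples)
    true_variance = variance(population, ddof=0)  # population variance
    return expected - true_variance


def neuron(x, weights, bias):
    """NEUROCOMPUTING sense: the constant term added to a linear combination."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias
```

With divisor n, the variance estimator for the population {0, 1} and n = 2 is biased by -1/8 (that is, -σ²/n); with divisor n - 1 the bias is zero.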

  11. Steps in Data Mining/Analysis • 1. Specific Objectives • In terms of the subject matter • 2. Translation into Analytical Methods • 3. Data Examination • Data capacity • Preliminary results • 4. Refinement and Reformulation

  12. Required Expertise • Domain • Data • Analytical Methods

  13. Nuggets “If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.” — Herb Edelstein

  14. What Is Data Mining? • IT • Complicated database queries • ML • Inductive learning from examples • Stat • What we were taught not to do

  15. Problem Translation • Predictive Modeling • Supervised classification • Cluster Analysis • Association Rules • Something Else

  16. Predictive Modeling [Figure: a data table with one row per case, several input columns, and one target column]

  17. Types of Targets • Supervised Classification • Event/no event (binary target) • Class label (multiclass problem) • Regression • Continuous outcome • Survival Analysis • Time-to-event (possibly censored)
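When translating a problem, the form of the target column often suggests which of these types applies. The helper below is a hypothetical heuristic, not a rule from the course or from any tool: it assumes survival targets are stored as (time, event_observed) pairs and treats a handful of distinct integer codes as class labels, which can misfire for count data.

```python
from numbers import Real


def target_type(values, max_classes=10):
    """Guess the kind of predictive-modeling target (hypothetical heuristic)."""
    if not values:
        raise ValueError("target column is empty")
    # Survival: assumed (time, event_observed) pairs; event_observed False = censored.
    if all(isinstance(v, tuple) and len(v) == 2 for v in values):
        return "survival"
    distinct = set(values)
    # Binary: exactly two distinct values, e.g. event / no event.
    if len(distinct) == 2:
        return "binary"
    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in distinct)
    # Multiclass: non-numeric labels, or a few distinct integer codes.
    if not numeric or (
        len(distinct) <= max_classes and all(float(v).is_integer() for v in distinct)
    ):
        return "multiclass"
    # Regression: anything else is treated as a continuous outcome.
    return "regression"
```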

  18. Section 1.2 SEMMA

  19. Objectives • Define SEMMA. • Introduce the tools available in Enterprise Miner.

  20. SEMMA • Sample • Explore • Modify • Model • Assess
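The five SEMMA steps can be read as a pipeline. The following is a toy sketch in plain Python under stated assumptions, not Enterprise Miner itself: cases are dictionaries, the function names are hypothetical, Modify is reduced to median replacement of missing values, and Model is a single-threshold rule chosen to maximize training accuracy.

```python
import random
from statistics import mean, median


def sample(rows, fraction, seed=0):
    """Sample: draw a seeded random subset of cases so runs are reproducible."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows, column):
    """Explore: summarize one input, counting missing (None) values."""
    present = [r[column] for r in rows if r[column] is not None]
    return {
        "n": len(rows),
        "missing": len(rows) - len(present),
        "mean": mean(present) if present else None,
    }


def modify(rows, column):
    """Modify: replace missing values of one input with its median (assumed rule)."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def model(rows, column, target):
    """Model: predict an event when the input exceeds a threshold.

    The threshold is the observed value that maximizes training accuracy.
    """
    candidates = sorted({r[column] for r in rows})

    def accuracy(threshold):
        return mean(int((r[column] > threshold) == bool(r[target])) for r in rows)

    best = max(candidates, key=accuracy)
    return lambda r: int(r[column] > best)


def assess(predict, rows, target):
    """Assess: misclassification rate of a fitted rule on the given cases."""
    return mean(int(predict(r) != r[target]) for r in rows)
```

Assessing on the same cases used to fit the model overstates performance; held-out cases are the usual remedy.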

  21. Sample • Input Data Source • Sampling • Data Partition
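The job of a data-partition step, splitting cases into training, validation, and test sets, can be sketched as a seeded shuffle followed by slicing. The function name `partition` and the 40/30/30 default shares are illustrative assumptions, not settings taken from the course.

```python
import random


def partition(rows, train=0.4, validate=0.3, seed=12345):
    """Split cases into training, validation, and test sets (hypothetical helper).

    A copy is shuffled with a seeded generator so the split is reproducible;
    whatever remains after the train and validation shares becomes the test set.
    """
    if train < 0 or validate < 0 or train + validate > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validate = round(len(shuffled) * validate)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validate],
        shuffled[n_train + n_validate:],
    )
```

Fixing the seed means a rerun produces the same three sets, so model comparisons are made on identical validation cases.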

  22. Explore • Distribution Explorer • Multiplot • Insight • Association • Variable Selection • Link Analysis

  23. Modify • Data Set Attributes • Transform Variables • Filter Outliers • Replacement • Clustering • SOM/Kohonen • Time Series
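Two of the Modify operations can be sketched in a few lines. The sketch assumes Tukey's fences (values more than 1.5 interquartile ranges beyond the quartiles) as the outlier rule and log(1 + x) as the transformation; these are common choices used here for illustration, not necessarily what the Filter Outliers or Transform Variables nodes apply. Function names are hypothetical.

```python
import math
from statistics import quantiles


def filter_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences, assumed rule).

    Quartiles use statistics.quantiles with its default 'exclusive' method.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def log_transform(values):
    """Transform: log(1 + x) to reduce right skew in non-negative inputs."""
    return [math.log1p(v) for v in values]
```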

  24. Model • Regression • Tree • Neural Network • Princomp/Dmneural • User Defined Model • Ensemble • Memory Based Reasoning • Two-Stage Model

  25. Assess • Assessment • Reporter

  26. Other Types of Nodes: Scoring Nodes • Score • C*Score

  27. Other Types of Nodes: Utility Nodes • Group Processing • Data Mining Database • SAS Code • Control Point • Subdiagram
