Chapter 1: Introduction to Data Mining, Warehousing, and Visualization


Presentation Transcript


  1. Chapter 1: Introduction to Data Mining, Warehousing, and Visualization. Modern Data Warehousing, Mining, and Visualization: Core Concepts, by George M. Marakas. Spring 2012. Modern Data Warehousing, Mining & Visualization, 2003, George Marakas

  2. Objectives
     • What is the purpose and motivation for developing a Data Warehouse (DW)?
     • Position of the DW within the IT infrastructure
     • Relationship between the DW and business data marts
     • What can a DW do?
     • Foundations for data mining
     • Steps in a typical data mining project
     • What is a “correlation”? [KEY CONCEPT]
     • History of data visualization vis-à-vis the DW

  3. 1-1: The Modern Data Warehouse
     • A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting.
     • Note that the data warehouse contains a copy of the transactions. These copies are not updated or changed later by the transaction system.
     • Also note that this data is specially structured, and may have been transformed when it was placed in the warehouse.

  4. 1-2: Data Warehouse Roles and Structures
     The DW has the following primary functions:
     • It is a direct reflection of the business rules of the enterprise.
     • It is the collection point for strategic information.
     • It is the historical store of strategic information.
     • It is the source of information later delivered to data marts.
     • It is the source of stable data regardless of how the business processes may change.

  5. Elements of a DW: Extract, Transform, Store [ETS]
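
The extract, transform, store sequence can be sketched in a few lines of Python. This is a minimal illustration only, assuming an in-memory SQLite database stands in for both the transaction system and the warehouse. The table names (`orders`, `dw_sales`), the columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) transaction table."""
    return conn.execute(
        "SELECT order_id, region, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: tidy region names and convert cents to dollars."""
    return [
        (order_id, region.strip().upper(), cents / 100.0)
        for order_id, region, cents in rows
    ]


def store(conn, rows):
    """Store: write the transformed copy into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_sales "
        "(order_id INTEGER, region TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_ets():
    """Build a toy source table, run the three steps, return warehouse rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount_cents INTEGER)"
    )
    # Made-up sample transactions, for illustration only.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " east ", 1250), (2, "West", 9900)],
    )
    store(conn, transform(extract(conn)))
    return conn.execute(
        "SELECT order_id, region, amount_usd FROM dw_sales ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_ets())
```

The source rows are left untouched. The warehouse receives a restructured copy, which matches the definition of a DW given on slide 3.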

  6. Position of the Data Warehouse Within the Organization – Figure 1-2

  7. Data Mining Example: Service Quality vs. Training. Courtesy: MicroStrategy (2005)

  8. Examples of Common DW Applications Table 1-1

  9. Comparison of Typical DW Costs and Benefits

  10. 1-4: The Cost of DW
     • Expenditures can be categorized as one-time initial costs or as recurring, ongoing costs.
     • The initial costs can be further divided into hardware and software costs.
     • Expenditures can also be categorized as capital costs (associated with acquisition of the warehouse) or as operational costs (associated with running and maintaining the warehouse).
     • Cost of a data warehouse, rule of thumb: $1 million per terabyte of data.
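
The rule of thumb is simple arithmetic, shown here as a small Python sketch. The function name `estimate_dw_cost` is made up for illustration, and the result is only as reliable as the rule of thumb itself.

```python
# Rule of thumb quoted on the slide: $1 million per terabyte of data.
COST_PER_TERABYTE_USD = 1_000_000


def estimate_dw_cost(terabytes):
    """Return a rough warehouse cost in US dollars for a given data volume."""
    if terabytes < 0:
        raise ValueError("data volume cannot be negative")
    return terabytes * COST_PER_TERABYTE_USD


if __name__ == "__main__":
    # A hypothetical 2.5 TB warehouse.
    print(f"${estimate_dw_cost(2.5):,.0f}")
```

The estimate does not separate one-time from recurring costs, or capital from operational costs. Those breakdowns need the itemized expenditures listed in Table 1-3.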

  11. Expenditures Associated with Building a DW Table 1-3

  12. 1-5: Data Mining: Farmers and Explorers
     • Every corporation has two types of DW users.
     • Farmers [traditional statistical hypothesis testing] know what they want before they set out to find it. They submit small queries and retrieve small nuggets of information.
     • Explorers [data mining] are quite unpredictable. They often submit large queries. Sometimes they find nothing, sometimes they find priceless “golden” nuggets.
     • Cost justification for the DW is usually done on the basis of the results obtained by farmers, since explorers are unpredictable.

  13. 1-6: Foundations of Data Mining
     • Data mining is the process of using raw data to infer important business relationships.
     • Despite a consensus on the value of data mining, a great deal of confusion exists about what it is.
     • It is a collection of powerful techniques intended for analyzing large datasets.
     • There is no single data mining approach, but rather a set of techniques that can be used in combination with each other.

  14. 1-6 & 1-7: The Foundations of Data Mining
     • Data mining has roots in practice dating back over 30 years, using standard statistics [e.g., biostatistics].
     • In the early 1960s, data mining was called statistical analysis, and the pioneers were statistical software companies such as SAS and SPSS.
     • By the 1980s, the traditional techniques had been augmented by new methods such as fuzzy logic, heuristics and neural networks.
     • DSS tools also came into popular use in the 1980s, with tools such as Lotus 1-2-3 and Excel.

  15. Data Mining: A General Approach
     Although all data mining endeavors are unique, they share a common set of process steps:
     • Infrastructure preparation: choice of hardware platform, database system and one or more mining tools.
     • Exploration: looking at summary data, sampling and applying intuition [data visualization is useful here].
     • Analysis: each discovered pattern is analyzed for significance and trends.

  16. A General Approach (continued)
     • Interpretation: once patterns have been discovered and analyzed, the next step is to interpret them. Considerations include business cycles, seasonality and the population the pattern applies to.
     • Exploitation: this is both a business and a technical activity. One way to exploit a pattern is to use it for prediction. Others are to package, price or advertise the product in a different way.
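
For the Analysis step, one common way to quantify a candidate relationship between two variables is the Pearson correlation coefficient. The slides name correlation as a key concept but do not prescribe a formula, so the sketch below is an assumption about how such a check might look. The training-hours and service-score figures are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Hypothetical data: training hours per employee vs. service quality score.
    training_hours = [2, 4, 6, 8, 10]
    service_score = [55, 60, 71, 74, 88]
    print(round(pearson_r(training_hours, service_score), 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship. Correlation alone does not establish cause, which is why the Interpretation step still has to weigh seasonality, business cycles and the population involved.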

  17. The Data Warehouse and Data Mining
     • Data mining does not require the use of a data warehouse (DW); however, DWs are designed with data mining in mind.
     • The data in the DW is integrated and stable (non-volatile).
     • By contrast, data changes continuously in an operational database.
     • If multiple analyses are run in sequence, the data needs to be held constant (as in a DW).

  18. Volumes of Data: The Biggest Challenge
     • The largest challenge a “data miner” may face is the sheer volume of data in the warehouse.
     • It is quite important, then, that summary data also be available to get the analysis started.
     • A major problem is that this sheer volume may mask the important relationships the analyst is interested in.
     • The ability to overcome the volume and visualize the data becomes quite important.
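
Summary data of the kind mentioned above can be as simple as per-group counts, totals and means computed from the detail rows. The following sketch assumes detail rows arrive as `(region, amount)` pairs. That layout, the function name `summarize` and the sample values are hypothetical.

```python
from collections import defaultdict


def summarize(rows):
    """Roll detail rows of (region, amount) up into per-region summary data."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for region, amount in rows:
        totals[region]["count"] += 1
        totals[region]["total"] += amount
    # Add the mean so an analyst can compare regions at a glance.
    return {
        region: {**stats, "mean": stats["total"] / stats["count"]}
        for region, stats in totals.items()
    }


if __name__ == "__main__":
    # Made-up detail rows, for illustration only.
    detail = [("East", 10.0), ("East", 30.0), ("West", 5.0)]
    for region, stats in summarize(detail).items():
        print(region, stats)
```

Starting from a few summary rows per region rather than every transaction is one way to keep the volume from masking the relationships of interest.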

  19. 1-9: Foundations of Data Visualization [DV]
     • One of the earliest known examples of data visualization comes from London during the 1854 cholera epidemic. A map (next slide) helped to identify the source of the disease.
     • Modern visualization techniques grew from the twin technologies of computer graphics and high-performance computing in the 1970s and 1980s.

  20. Dr. John Snow used a map to show that the source of the cholera outbreak was a water pump, thus showing that the disease was waterborne. [Figure: map marking the Broad Street pump]

  21. DV: Opportunity and Timing
     • Alternative input devices (light pen, sketch pad and mouse) began to appear in the 1960s.
     • In the 1970s, flight simulators became much more realistic when graphics replaced film.
     • In the same decade, special-effects computers became entrenched in the entertainment industry.
     • In the 1980s, visualization grew more dynamic with applications like the animation of weather patterns.

  22. Data Visualization – Sales by Region Typical Spreadsheet Graphic

  23. Data Visualization – Total Precipitation

  24. DV & DM: Future Success Drivers
     • In the 1990s, rapid advances in chip technology, both in the CPU and in the graphics processor, put data visualization everywhere.
     • Ongoing reductions in the cost of computing:
         • Each new generation brings a 10X-100X improvement in performance per cost.
         • Approximately every 18 months [Moore’s Law].
     • Web-based e-commerce:
         • Business-to-consumer and consumer-to-consumer commerce [B to C and C to C].
         • Generates billions and even trillions of characters per reporting period.
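
To get a feel for how quickly such gains compound, the sketch below treats the 18-month figure as a doubling period, which is how Moore's Law is commonly quoted. That reading is an assumption rather than something the slide states, and the function names are illustrative.

```python
from math import log2

# Assumed doubling period: 18 months, expressed in years.
DOUBLING_PERIOD_YEARS = 1.5


def improvement_factor(years, doubling_period_years=DOUBLING_PERIOD_YEARS):
    """Return the cumulative improvement after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)


def years_to_reach(factor, doubling_period_years=DOUBLING_PERIOD_YEARS):
    """Return how many years steady doubling needs to reach `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return log2(factor) * doubling_period_years


if __name__ == "__main__":
    for target in (10, 100):
        print(f"{target}X takes about {years_to_reach(target):.1f} years")
```

Under this doubling assumption, a 10X gain takes roughly five years and a 100X gain roughly ten, so the 10X-100X range on the slide would span several doubling periods.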

  25. The End
