1 / 24

Introduction to Data Warehousing

Introduction to Data Warehousing. From DBMS to Decision Support. DBMSs widely used to maintain transactional data Attempts to use of these data for analysis, exploration, identification of trends etc. has led to Decision Support Systems. Rapid Growth since mid 70’s

Télécharger la présentation

Introduction to Data Warehousing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Data Warehousing

  2. From DBMS to Decision Support • DBMSs widely used to maintain transactional data • Attempts to use of these data for analysis, exploration, identification of trends etc. has led to Decision Support Systems. • Rapid Growth since mid 70’s • DBMSs vendors have answered this trend by adding new features to existing products • Rarely enough

  3. DBs for Decision Support • Trend towards Data Warehousing • Data Warehousing – consolidation of data from several databases which are in turn maintained by individual business units along with historical and summary information

  4. Characteristics of TPSs Characteristic OLTP Typical operation Update Level of analytical requirements Low Screens Unchanging Amount of data per transaction Small Data level Detailed Age of data Current Orientation Records

  5. TPS vs Decision Support OLTP Information to supportday-to-day service Data stored at transactionlevel Database design: Normalized Complex Analysis Historical informationto analyze Data needs to be integrated Database design: Denormalized, star schema

  6. MIS and Decision Support Ad hoc access Productionplatforms Decision makers Operational reports • MIS systems provided business data • Reports were developed on request • Reports provided little analysis capability • no personal ad hoc access to data

  7. Analyzing Data from Operational Systems • Data structures are complex • Systems are designed for high performance and throughput • Data is not meaningfully represented • Data is dispersed • TPS systems unsuitable for intensive queries ERP Productionplatforms Operational reports

  8. Data Extract Processing • End user computing offloaded from the operational environment • User’s own data Operational systems Extracts Decision makers

  9. Management Issues Extract explosion • Duplicated effort • Multiple technologies • Obsolete reports • No metadata Operational systems Extracts Decision makers

  10. Data Quality Issues • No common time basis • Different calculation algorithms • Different levels of extraction • Different levels of granularity • Different data field names • Different data field meanings • Missing information • No data correction rules • No drill-down capability

  11. From Extract to Warehouse DSS • Controlled • Reliable • Quality information • Single source of data Data warehouse Decision makers Internal andexternal systems

  12. Data Warehousing Architecture External Data Sources Visualisation Extract Clean Transform Load Refresh Metadata respository Serves OLAP Operational Databases Data Warehouse Data Mining

  13. Business Motivators • Provide superior services and products • Know the business • New products • Invest in customers • Retain customers • Invest in technology • Reinvent to face new challenges

  14. Centralised data warehouse Federated data warehouse

  15. Tiered data warehouse

  16. Data Mart Data Warehouses Vs Data Marts Data Warehouse Property Scope Subjects Data Source Size (typical) Implementation time Data Warehouse Enterprise Multiple Many 100 GB to > 1 TB Months to years Data Mart Department Single-subject Few < 100 GB Months

  17. End-user Access Tools • High performance is achieved by pre-planning the requirements for joins, summations, and periodic reports by end-users. • There are five main groups of access tools: • Data reporting and query tools • Application development tools • Executive information system (EIS) tools • Online analytical processing (OLAP) tools • Data mining tools

  18. Data Usage - $1000 questions Need to complement RDBMS technology with a flexible, multidimensional view of data

  19. The Functionality of OLAP • Rotate and drill down • Create and examine calculated data • Determine comparative or relative differences. • Perform exception and trend analysis. • Perform advanced analytical functions

  20. The star structure

  21. Multidimensional Database Model Customer Store The data is found at the intersection of dimensions. Store Time Time FINANCE SALES Product

  22. Data Mining

  23. Data mining functions • Associations • 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta • Sequential patterns • 32 percent of female customers who order a red jacket within six months buy a gray skirt • Classifying • Frequent customers are those with incomes about $50,000 and having two or more children • Clustering • Market segmentation • Predicting • predict the revenue value of a new customer based on that personal demographic variables

More Related