html5-img
1 / 28

Data Warehouses

Data Warehouses. Definition. Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization. Need for Data Warehousing.

lazaro
Télécharger la présentation

Data Warehouses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouses

  2. Definition • Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.

  3. Need forData Warehousing • Integrated, company-wide view of high-quality information. • Separation of operational and informational systems and data • operational system: a system that is used to run a business in real time, based on current data • informational system: systems designed to support decision making based on stable point-in-time or historical data

  4. Factors AllowingData Warehousing • Relational DBMS. • Advances in hardware: speed and storage capacity. • End-user computing interfaces and tools.

  5. Data Warehouse Architectures • Two-level • source system files containing operational data • transformed and integrated data warehouse • Three-level • Operational data. • Enterprise data warehouse (EDW)- single source of data for decision making. • Data marts - limited scope; data selected from EDW; customized decision-support for individual user groups

  6. Generic data warehouse architecture

  7. Three-layer architecture

  8. Reasons for theThree-Level Architecture • EDW and data marts have different purposes and data architectures. • Data transformation is complex and is best performed in two steps. • Data marts customized decision support for different groups • Architecture • Operational data, reconciled data, Derived data.

  9. Three-layer data architecture

  10. Data Characteristics • Status vs. Event data. • A transaction is a business activity that triggers one or more business events: event data captures them • Transient vs. Periodic data. • Transient: data in which changes to existing records are written over previous records, thus destroying previous data content • periodic data: data that are never physically altered or deleted once added

  11. Example of DBMS log entry

  12. Transient operational data

  13. Reconciled DataCharacteristics • Detailed • Historical • Normalized • Enterprise-wide • Quality controlled

  14. The Data Reconciliation Process • Capture: capture the relevant data from source files to fill EDW • Static - initial load. • Incremental - ongoing update. • Scrub or data cleansing • missing data, name reconciliation • Pattern recognition and other artificial intelligence techniques.

  15. Steps in data reconciliation

  16. The Data Reconciliation Process • Transform • Convert the data format from the source to the target system. • Record-Level Functions • Selection. • Joining. • Aggregation (for data marts). • Field-Level Functions • Single-field transformation • Multi-field transformation

  17. The Data Reconciliation Process • Load and Index • Refresh Mode • When the warehouse is first created. • Static data capture. • Update Mode • Ongoing update of the warehouse. • Incremental data capture.

  18. Derived DataCharacteristics • Type of data • Detailed, possibly periodic. • Aggregated. • Distributed to departmental servers. • Implemented in star schema.

  19. Star Schema • Also called the dimensional model. • Fact and dimension tables. • Fact table: consists of factual or quantitative data about the business • Dimension table: hold descriptive data • Grain of a fact table - time period for each record.

  20. Components of a star schema

  21. Star schema example

  22. Star schema with sample data

  23. Example of snowflake sample

  24. Size of the fact table • Total number of stores: 1,000 • Total number of products: 10,000 • Total number of periods: 24 • Total rows: 1000 * 10,000 * 24 = 240,000,000 • On average 50% items record sales, • no of rows = 120,000,000

  25. Types of Data Marts • Dependent - Populated from the EDW. • Independent - Data taken directly from the operational databases.

  26. The User Interface • The role of metadata. • Traditional query and reporting tools. • On-line analytical processing (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.

  27. The User Interface • Slicing a cube. • Pivot • Rotate the view for a particular data point to obtain another perspective. • E.g. take a value from the units column and obtain by-store values. • Drill-down

  28. Slicing a data cube

More Related