1 / 26

Data Warehouses

Data Warehouses. Mert Terzihan CmpE 422 Database Systems. Outline. What is a Data Warehouse? Architecture Design and Implementation of a DW Logical Design Metadata in DW Data Marts Data Staging Testing Data Warehouses. Data Warehouse.

buck
Télécharger la présentation

Data Warehouses

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouses Mert Terzihan CmpE 422 Database Systems

  2. Outline • What is a Data Warehouse? • Architecture • Design and Implementation of a DW • Logical Design • Metadata in DW • Data Marts • Data Staging • Testing Data Warehouses

  3. Data Warehouse • “Subject-oriented, integrated, timevarying, non-volatile collection of data that is used primarily in organizational decision making” • Historical Data for decision support • Seperate from organization’s operational databases (OLTP)

  4. Data Warehouse vs. OLTP

  5. Architecture

  6. Design and Implementation • Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers, and tools. • Integrate the servers, storage, and client tools. • Design the warehouse schema and views. • Define the physical warehouse organization, data placement, partitioning, and access methods.

  7. Design and Implementation • Connect the sources using gateways, ODBC drivers, or other wrappers. • Design and implement scripts for data extraction, cleaning, transformation, load, and refresh. • Populate the repository with the schema and view definitions, scripts, and other metadata. • Design and implement end-user applications. • Roll out the warehouse and applications.

  8. Design and Implementation: Buttom-Up Design • First data marts are created, then DW • Drill across: Integration of multiple data marts • Advantages: Faster and easier implementation, less risk of failure • Disadvantages: Each data mart has a narrow view of data, redundant data in data marts

  9. Design and Implementation: Top-Down Design • DW is designed using a normalized enterprise data model. • Advantages: Enterprise view of data, centralized rules and control • Disadvantages: Longer to build, high risk of failure

  10. Logical Design: Dimensional Data Modeling • Critical operation: Read • Join operations decrease speed • Normalization means more tables and more joins • Minimize join operations -> Quick answers to queries • Centralized approach with only 1 join

  11. Logical Design: Dimensional Data Modeling Star Schema • Fact Table: - Dimension Attributes (FK-PK relationships with dimension tables) - Degenerate Dimensions (Dimension attributes without related to any dimension table) - Measures (contain values to be aggregated) • Dimension Tables

  12. Logical Design: Dimensional Data Modeling • Star Schema

  13. Star Schema: Example Queries • Selecting rows from fact table: - Select cost and sell for all jobs where the Ship Month is January 2005 • Group rows for summary information - Select the total cost and sell for each Ship Month in the Ship Year 2005

  14. Star Schema • Dimension tables allow users to examine different level of details (Hierarchy): i.e. allowing rows to be grouped by Ship Date, Month, Quarter, or Year • Drill-down operation (examine more detailed data-moving down a hierarchy) • Roll-up operation (summarize details-moving up a hierarchy)

  15. Logical Design: Dimensional Data Modeling • Snowflake Schema - Normalization of dimension tables - Each hierarchical level has its own table

  16. Logical Design: Dimensional Data Modeling • Snowflake Schema

  17. Metadata in Data Warehouse • Administrative metadata (information for setting up and using a DW) • Business metadata (business terms and definitions, ownership of data, and charging policies) • Operational metadata (information collected during operations of DW )

  18. Data Marts • Scaled-down DW • Serves data to the users • Oriented to specific business line or team • Advantages: increased speed, reduced cost, complexity and risks • Disadvantages: non-centralized, limited scope

  19. Data Staging • Data coming from sources (operational databases, web etc.) needs to be changed, converted, and made ready in a format that is suitable to be stored for querying and analysis. • Data Extraction • Data Transformation • Data Loading • Together ETL processes

  20. Data Staging: Extract • Extract data from different sources (RDBs, flat files, etc.) • Each one may have different data formats • Parse and convert data for transformation

  21. Data Staging: Transform • Apply series of rules to extracted data for loading • Cleaning (duplicates, default values for missing elements, etc.) • Standardization (data types, synonyms, etc.) • Combining • Summarizing

  22. Data Staging: Load • Load the data into DW • Additional preprocessing is required • Checking integrity constraints • Sorting, aggregation • Other computations to build the tables stored in data warehouse

  23. Data Staging: ETL

  24. Testing Data Warehouses • Check the quality of the data • Data Completeness: Ensures that all the expected data is loaded • Data Transformation: Ensures that all data is transformed correctly according to business rules and/or design specifications.

  25. Challenges for Testing a Data Warehouse • Data selection from multiple source systems. • Volume and the complexity of the data. • Inconsistent and redundant data in a data warehouse. • Non-Availability of comprehensive test bed. • Critical data for Business.

  26. References • Database Modeling and Design, T. Teorey, Morgan Kaufmann Publishers • An Overview of Data Warehousing and OLAP Technology, S. Chaudhuri, Microsoft Research • Data Warehousing with Oracle, M. A. Shahzad Thank you for listening...

More Related