1 / 33

Designing a Data Warehousing System

Designing a Data Warehousing System. Overview. Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing Dimensions Establishing a Fact Table Implementing a Star Schema. Identifying Business Drivers and Objectives.

Télécharger la présentation

Designing a Data Warehousing System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Designing a Data Warehousing System

  2. Overview • Business Analysis Process • Data Warehousing System • Modeling a Data Warehouse • Choosing the Grain • Establishing Dimensions • Establishing a Fact Table • Implementing a Star Schema

  3. Identifying Business Drivers and Objectives Gathering and Analyzing Information Identifying a Business Process Establishing a Conceptual Data Model Identifying Sources and Performing Transformations Establishing Duration Business Analysis Process

  4. Data from Operational Systems DataWarehouse Data Marts Data inOLAPEnvironment Sales Production OLTP Accounting EnterpriseData OLTP OLTP Purchasing Data Warehousing System

  5. Defines Entities That Are Fully Normalized Follows Third Normal Form or Greater Produces a Complex Database Design Stores Data at the Lowest Level of Transactional Detail Increases the Number of Joined Tables in Queries Is Typically Static Defines Entities That Are Denormalized Produces a Simple Database Design That Is More Easily Understood by Users Stores Data Transactional level Summarized level Decreases the Number of Joined Tables in Queries Is Dynamic Comparing Database Modeling Environments Operational: OLTP Analytical: Data Warehouse

  6. Modeling a Data Warehouse • Data Warehouse Modeling Components • Using a Star Schema • Components of a Star Schema • Using a Snowflake Schema • Choosing a Schema

  7. DimensionTables Dimension Geographic Measures Fact Table Geographic Product Time Units $ Facts Product Time Data Warehouse Modeling Components

  8. Dimension Table Fact Table Time_Dim Shipper_Dim Employee_Dim Customer_Dim Product_Dim ProductKey CustomerKey ShipperKey EmployeeKey TimeKey Sales_Fact CustomerID . . . ShipperID . . . EmployeeID . . . TheDate . . . ProductID . . . TimeKey EmployeeKey ProductKey CustomerKey ShipperKey RequiredDate . . . Using a Star Schema

  9. Employee_Dim EmployeeKey EmployeeID . . . Time_Dim Product_Dim TimeKey Sales_Fact ProductKey TheDate . . . TimeKey EmployeeKey ProductKey CustomerKey ShipperKey ProductID . . . Multipart Key RequiredDate . . . Measures Shipper_Dim Customer_Dim ShipperKey CustomerKey ShipperID . . . CustomerID . . . Components of a Star Schema EmployeeKey TimeKey ProductKey TimeKey EmployeeKey Dimensional Keys ProductKey CustomerKey ShipperKey ShipperKey CustomerKey

  10. Primary Dimension Table Sales_Fact Product_Dim TimeKey EmployeeKey ProductKey CustomerKey ShipperKey ProductKey Product Name Product Size RequiredDate . . . Product Brand ID Product_Brand_Id Secondary Dimension Tables Product Brand Product Category ID Product_Category_Id Product Category Product Category ID Using a Snowflake Schema

  11. Star Snowflake Model Understandability Easier More Difficult Number of Tables Less More Query Complexity Simpler More Complex Query Performance Quicker Slower Choosing a Schema

  12. Choosing the Grain • Determining Data Requirements • Choosing the Lowest Level of Detail • Requires disk space • Involves more process time • Provides detailed data analysis capability • Conforming Measures to the Stated Grain • Design Considerations

  13. Establishing Dimensions • Defining Dimension Characteristics • Identifying Dimension Hierarchies • Defining Conventional Dimensions • Sharing Dimensions Among Other Data Marts • Defining Other Types of Dimensions

  14. Defining Dimension Characteristics • Applying Characteristics to Dimension Tables • Define a primary key • Include highly correlated and descriptive character columns • Designing for Usability and Extensibility • Minimize or avoid using codes or abbreviations • Create columns that are useful for levels of aggregation • Avoid missing or null values • Minimize the number of rows that change over time

  15. Consolidated Hierarchy Separate Hierarchy Store Location Store Location Continent Continent Country Region City Store Continent Country Country Region Region City City Store Store 01 Identifying Dimension Hierarchies

  16. Defining Conventional Dimensions • Time Dimension • Break time down into individual attributes • Represent time as work days, weekends, holidays, seasons, or fiscal periods • Is limited to the grain of the fact table • Geographic Dimension • Product Dimension • Customer Dimension

  17. Production Multiple instances exist inindividual data marts One instance exist and isshared among data marts Time Sharing Dimensions Among Other Data Marts Sales Purchasing

  18. Defining Other Types of Dimensions • Defining Degenerate Dimensions • Useful for consolidated reporting at the business event level • Defining Junk Dimensions • Useful for capturing important information without increasing the size of the fact table

  19. Establishing a Fact Table • Defining the Fact Table • Defining Precalculations • Minimizing Fact Table Size • Balancing Size and Performance

  20. Defining the Fact Table • Applying the Grain • Ensuring Consistency Between Measures • Using Additive and Numeric Values • Summarizing Data

  21. Single Row Precalculations Fact Table Values are derived from measures within that row Time Keys Product Keys Price Discount Rebate Extended Price 7 2 20.00 .10 5.00 13.00 13 30 ~ ~ ~ ~ ((Price - (Price X Discount)) - Rebate) = Extended Price 25 8 ~ ~ ~ ~ 7 5 10.00 .10 5.00 4.00 ... ... ... ... ... ... Multiple Row Precalculations Fact Table Time Keys Product Keys Year-to-date sales 7 2 20,000.00 Values are derived from multiple rows 10 2 25,000.00 ~ ~ ~ SUM(Extended Price) = Year-to-date Sales ... ... ... Defining Precalculations

  22. Minimizing Fact Table Size • Reducing the Number of Columns • Data is redundant • Data is not required for analysis • Reducing the Size of Each Column • Use surrogate keys • Ensure that character and binary data is variable length

  23. Balancing Size and Performance • Storing Large Fact Tables • Designing Star Schema • Fact tables—long and narrow • Dimension tables—short and wide • Including Precalculated Data • Improves query performance but increases the size of a fact table • Moving Fact Table Columns to Another Table • Reduces fact table size but may affect query performance

  24. Lab A: Designing a Star Schema

  25. Implementing a Star Schema • Estimating Size of the Data Warehouse • Creating a Database • Creating Tables • Creating Constraints • Creating Indexes

  26. Description Calculation Method Value Number of rows in fact table 10,000 x 4 x 365 x 5 73,000,000 Estimated row size of fact table (7 IDs x 4 bytes) + (5 measures x 4 bytes) ~48 bytes Estimated data warehouse size 48 bytes x 73,000,000 rows ~3.5 GB Estimating Size of the Data Warehouse • Size of Fact Table • Grain • Bytes per Row • Variables: • Years of data = 5 • Customers = 10,000 • Average number of transactions per customer per day = 4

  27. Creating a Database • Using CREATE DATABASE Options • SIZE • MAXSIZE • FILEGROWTH • Setting Database Options • Trunc. log on chkpt. • SELECT INTO/Bulkcopy

  28. Creating Tables • Creating a Table • Specifying NULL or NOT NULL • Generating Column Values

  29. Creating Constraints • Using PRIMARY KEY Constraints • Does not allow duplicate values • Allows index to be created • Does not allow null values • Using FOREIGN KEY Constraints • Is the multipart key stored in the fact table • Defines a reference to a column with a PRIMARY KEY or UNIQUE constraint • Specifies the data values that are acceptable to update

  30. 1 2 3 4 Creating Indexes • Steps for Creating Data Warehouse Indexes Define primary key in dimension tables Declare foreign key relationships Define primary key in fact table Define indexes on each foreign key in fact table • Using Surrogate Keys • Using Clustered Indexes • Using Nonclustered Indexes • Creating Composite Indexes

  31. Use Star Schema to Model Data Mart or Data Warehouse Database Do Not Mix Grain in Individual Fact Table Attributes Use Single Element Surrogate Keys When Defining Dimensions Define Shared Dimensions Use Facts That Are Both Numeric and Additive Choose Grain Recommended Practices

  32. Lab B: Implementing a Star Schema

  33. Review • Business Analysis Process • Data Warehousing System • Modeling a Data Warehouse • Choosing the Grain • Establishing Dimensions • Establishing a Fact Table • Implementing a Star Schema

More Related