330 likes | 458 Vues
Designing a Data Warehousing System. Overview. Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing Dimensions Establishing a Fact Table Implementing a Star Schema. Identifying Business Drivers and Objectives.
E N D
Overview • Business Analysis Process • Data Warehousing System • Modeling a Data Warehouse • Choosing the Grain • Establishing Dimensions • Establishing a Fact Table • Implementing a Star Schema
Identifying Business Drivers and Objectives Gathering and Analyzing Information Identifying a Business Process Establishing a Conceptual Data Model Identifying Sources and Performing Transformations Establishing Duration Business Analysis Process
Data from Operational Systems DataWarehouse Data Marts Data inOLAPEnvironment Sales Production OLTP Accounting EnterpriseData OLTP OLTP Purchasing Data Warehousing System
Defines Entities That Are Fully Normalized Follows Third Normal Form or Greater Produces a Complex Database Design Stores Data at the Lowest Level of Transactional Detail Increases the Number of Joined Tables in Queries Is Typically Static Defines Entities That Are Denormalized Produces a Simple Database Design That Is More Easily Understood by Users Stores Data Transactional level Summarized level Decreases the Number of Joined Tables in Queries Is Dynamic Comparing Database Modeling Environments Operational: OLTP Analytical: Data Warehouse
Modeling a Data Warehouse • Data Warehouse Modeling Components • Using a Star Schema • Components of a Star Schema • Using a Snowflake Schema • Choosing a Schema
DimensionTables Dimension Geographic Measures Fact Table Geographic Product Time Units $ Facts Product Time Data Warehouse Modeling Components
Dimension Table Fact Table Time_Dim Shipper_Dim Employee_Dim Customer_Dim Product_Dim ProductKey CustomerKey ShipperKey EmployeeKey TimeKey Sales_Fact CustomerID . . . ShipperID . . . EmployeeID . . . TheDate . . . ProductID . . . TimeKey EmployeeKey ProductKey CustomerKey ShipperKey RequiredDate . . . Using a Star Schema
Employee_Dim EmployeeKey EmployeeID . . . Time_Dim Product_Dim TimeKey Sales_Fact ProductKey TheDate . . . TimeKey EmployeeKey ProductKey CustomerKey ShipperKey ProductID . . . Multipart Key RequiredDate . . . Measures Shipper_Dim Customer_Dim ShipperKey CustomerKey ShipperID . . . CustomerID . . . Components of a Star Schema EmployeeKey TimeKey ProductKey TimeKey EmployeeKey Dimensional Keys ProductKey CustomerKey ShipperKey ShipperKey CustomerKey
Primary Dimension Table Sales_Fact Product_Dim TimeKey EmployeeKey ProductKey CustomerKey ShipperKey ProductKey Product Name Product Size RequiredDate . . . Product Brand ID Product_Brand_Id Secondary Dimension Tables Product Brand Product Category ID Product_Category_Id Product Category Product Category ID Using a Snowflake Schema
Star Snowflake Model Understandability Easier More Difficult Number of Tables Less More Query Complexity Simpler More Complex Query Performance Quicker Slower Choosing a Schema
Choosing the Grain • Determining Data Requirements • Choosing the Lowest Level of Detail • Requires disk space • Involves more process time • Provides detailed data analysis capability • Conforming Measures to the Stated Grain • Design Considerations
Establishing Dimensions • Defining Dimension Characteristics • Identifying Dimension Hierarchies • Defining Conventional Dimensions • Sharing Dimensions Among Other Data Marts • Defining Other Types of Dimensions
Defining Dimension Characteristics • Applying Characteristics to Dimension Tables • Define a primary key • Include highly correlated and descriptive character columns • Designing for Usability and Extensibility • Minimize or avoid using codes or abbreviations • Create columns that are useful for levels of aggregation • Avoid missing or null values • Minimize the number of rows that change over time
Consolidated Hierarchy Separate Hierarchy Store Location Store Location Continent Continent Country Region City Store Continent Country Country Region Region City City Store Store 01 Identifying Dimension Hierarchies
Defining Conventional Dimensions • Time Dimension • Break time down into individual attributes • Represent time as work days, weekends, holidays, seasons, or fiscal periods • Is limited to the grain of the fact table • Geographic Dimension • Product Dimension • Customer Dimension
Production Multiple instances exist inindividual data marts One instance exist and isshared among data marts Time Sharing Dimensions Among Other Data Marts Sales Purchasing
Defining Other Types of Dimensions • Defining Degenerate Dimensions • Useful for consolidated reporting at the business event level • Defining Junk Dimensions • Useful for capturing important information without increasing the size of the fact table
Establishing a Fact Table • Defining the Fact Table • Defining Precalculations • Minimizing Fact Table Size • Balancing Size and Performance
Defining the Fact Table • Applying the Grain • Ensuring Consistency Between Measures • Using Additive and Numeric Values • Summarizing Data
Single Row Precalculations Fact Table Values are derived from measures within that row Time Keys Product Keys Price Discount Rebate Extended Price 7 2 20.00 .10 5.00 13.00 13 30 ~ ~ ~ ~ ((Price - (Price X Discount)) - Rebate) = Extended Price 25 8 ~ ~ ~ ~ 7 5 10.00 .10 5.00 4.00 ... ... ... ... ... ... Multiple Row Precalculations Fact Table Time Keys Product Keys Year-to-date sales 7 2 20,000.00 Values are derived from multiple rows 10 2 25,000.00 ~ ~ ~ SUM(Extended Price) = Year-to-date Sales ... ... ... Defining Precalculations
Minimizing Fact Table Size • Reducing the Number of Columns • Data is redundant • Data is not required for analysis • Reducing the Size of Each Column • Use surrogate keys • Ensure that character and binary data is variable length
Balancing Size and Performance • Storing Large Fact Tables • Designing Star Schema • Fact tables—long and narrow • Dimension tables—short and wide • Including Precalculated Data • Improves query performance but increases the size of a fact table • Moving Fact Table Columns to Another Table • Reduces fact table size but may affect query performance
Implementing a Star Schema • Estimating Size of the Data Warehouse • Creating a Database • Creating Tables • Creating Constraints • Creating Indexes
Description Calculation Method Value Number of rows in fact table 10,000 x 4 x 365 x 5 73,000,000 Estimated row size of fact table (7 IDs x 4 bytes) + (5 measures x 4 bytes) ~48 bytes Estimated data warehouse size 48 bytes x 73,000,000 rows ~3.5 GB Estimating Size of the Data Warehouse • Size of Fact Table • Grain • Bytes per Row • Variables: • Years of data = 5 • Customers = 10,000 • Average number of transactions per customer per day = 4
Creating a Database • Using CREATE DATABASE Options • SIZE • MAXSIZE • FILEGROWTH • Setting Database Options • Trunc. log on chkpt. • SELECT INTO/Bulkcopy
Creating Tables • Creating a Table • Specifying NULL or NOT NULL • Generating Column Values
Creating Constraints • Using PRIMARY KEY Constraints • Does not allow duplicate values • Allows index to be created • Does not allow null values • Using FOREIGN KEY Constraints • Is the multipart key stored in the fact table • Defines a reference to a column with a PRIMARY KEY or UNIQUE constraint • Specifies the data values that are acceptable to update
1 2 3 4 Creating Indexes • Steps for Creating Data Warehouse Indexes Define primary key in dimension tables Declare foreign key relationships Define primary key in fact table Define indexes on each foreign key in fact table • Using Surrogate Keys • Using Clustered Indexes • Using Nonclustered Indexes • Creating Composite Indexes
Use Star Schema to Model Data Mart or Data Warehouse Database Do Not Mix Grain in Individual Fact Table Attributes Use Single Element Surrogate Keys When Defining Dimensions Define Shared Dimensions Use Facts That Are Both Numeric and Additive Choose Grain Recommended Practices
Review • Business Analysis Process • Data Warehousing System • Modeling a Data Warehouse • Choosing the Grain • Establishing Dimensions • Establishing a Fact Table • Implementing a Star Schema