0 Chapter 1Adamson & Venerable Spring 2012
0 Dimensional Modeling • Dimensional Model Basics • Fact & Dimension Tables • Star Schema • Granularity • Facts and Measures • Multiple Processes • Aggregation
Transaction Systems • They capture critical detailed info about business events: • A product is manufactured • An account is credited while another is debited • A seat is reserved • Most businesses have a variety of OLTP systems • An OLTP system looks at the micro-level transactions that drive the process • OLTP’s normalized design ensures transaction consistency
0 Dimensional Modeling • Dimensional Model Basics • Relational Model vs. Dimensional Model • ERDs vs. Star Schema • Normalization issues • Reports & Queries in Relational Models can be complex: • Especially when seeking historical comparisons • Business Process Measurements (Not in RDBMS) • Gross Margins • Average Aged A/R balances • Inventory levels compared to Sales by Warehouse Location • ROI, etc.
Business Questions • Questions managers ask about a business: • What are the best-selling products last week? • How often did we fulfill an order in more than two shipments? • What is the quarterly trend on orders by region? • Show me gross margin by product category • How does inventory level compare to sales by warehouse? • Answering these questions requires a different design than in OLTP • The dimensional model serves this purpose
Process Metrics • Each question centers around a business process: • Orders, Account Management, Inventory Management, Accounts Receivable, Returns • Each question reveals how a process is measured: • Gross Margin measures the sales process for distributors • Banks monitor balances as part of their account mgt. • Warehouse managers track inventory levels • Accountants quantify receivables • For some processes there is a single metric • Some others have multiple metrics
The Dimensional Model • Reflects the way business people think • Facts or Measures: The measurements that are of importance to a business • Dimensions: The parameters by which a measurement can be viewed or referred to • Any fact (measure) can be combined with any dimension • EXAMPLE: An Order Entry Process
0 Fact & Dimension Tables
0 Dimensional Modeling • Fact & Dimension Tables • Use Normalization approach to determine appropriate dimension tables • Functional dependencies • Customer_key Name, address, city, state, zip • Salesperson_key Name, code, territory, region • Date_key date, day_week, day_of_month, etc. • Order_key product_key, order_date_key, salesperson_key, customer_key, order_dollars, extended_cost, margin_dollars, quantity_ordered, order_number, order_line
0 Star Diagram for Orders
0 Star Schema • Primary Keys • Foreign Keys • Orders [Order_Date_key, Product_key, salesperson_key, customer_key, ….] • Surrogate Key • Orders [Order_ID, Order_Date_key, Product_key, Salesperson_key, Customer_key, etc….] • Avoid Keys from other Systems • Granularity (the Grain) • What level of detail is recorded on the Fact Table • Individual Transactions • Daily summaries, weekly summaries, monthly summaries, etc. • Must be uniform throughout
Ensuring a chosen Grain • Foreign keys relating a row in the fact table to dimension rows should never be null • Optional relationships to a dimension are usually a sign of a grain problem • The rule against optionality can be relaxed when a dimension is not a part of the grain of the fact table. Then, add “fake” rows such as a product called Not-a-Product or a promotion table item called Not-a-Promotion
0 The Dimensions: WIDE tables • Produce “Wide” dimension tables: 100 or 200 attributes (dimensions) are common • Relationships among dimensions in a dimension table does not have to be strong: • Customer table may include region, regional data, sales force data, etc. • Resist the urge to normalize dimension tables! • Role of dimensions attributes are to qualify queries/break out measures • Spreading dimensional attributes across several tables would complicate and slow down queries and users would have to wait longer • Dimension hierarchies are flattened out: year 1998 may appear 365 times in the time dimension table, departments may appear multiple times in a salesperson dimension table
0 Dimensional Modeling • Slowly changing dimensions • Type 1 Change • Correct an error or a value that has no significance in future analysis • E.g., phone number change for customer • Type 2 Change • Add new row in dimension table • Significant change, such as a new location (address) for customer • Type 3 Change • Add new dimension or modify schema (add columns) • New organization structure for firm; want to be able to compare new and old
0 The Fact table: A DEEP table 3 Kinds of Measures • Fully additive • Facts may be added equally well across any dimension • E.g., gross sales dollars, extended cost, margin dollars, quantity sold • Non additive • Facts that may not be added. Generally ratios. E.g., Margin rate • Need to break components into individual facts: margin dollars, sales dollars • Semi-additive • Fact that can be summarized across some dimensions • E.g., Banking account balances can be totaled for point in time, but not across time periods.
0 Dimensional Modeling • Multiple Processes • E.g., Orders vs. Shipments • Different process • Different grains • E.g., Time dimension • Hour • Day • Week • Month • Year
0 Multiple Processes: Orders & Shipments
0 Dimensional Modeling • Aggregation • As size of Data Warehouse grows we may need to aggregate date in order to improve query performance • E.g., change from Daily to Monthly summaries • Keep same Dimension Tables • Time table may be modified • Fact table keeps same measures as before • Facts now represent monthly summarizations of data.
0 Aggregate Table De-normalize from Day to Month Adjust primary key accordingly
0 Summary • Dimensional Model Basics • Fact & Dimension Tables • Star Schema • Granularity • Facts and Measures • Multiple Processes • Aggregation