100 likes | 197 Vues
Learn about managing slowly changing and rapidly changing dimensions in data warehousing, including strategies for multiple fact tables and different scenarios. Understand techniques like overwriting, adding new rows, columns, and split tables.
E N D
Data Warehousing DSCI 4103 Dr. Mennecke Chapter 4
Complex, multi-stage transactions • In some instances, processes might involve multiple conditional transactions (e.g., one type of payment requires a credit check, another requires specialized invoicing, etc.) • A choice needs to be made about whether to have multiple fact tables associated with each process
Multiple fact tables • Criteria in deciding • What are the user’s analytical requirements • Will separate processes be examined together or separately? • Are there really multiple separate business processes? • If so, then two or more fact tables are probably more appropriate since these processes may operate independently • Are multiple source systems involved? • If yes, use separate fact tables • How do the processes relate to dimensions? • If separate processes have different dimensions that they interact with, then separate fact tables may be implied
Slowly changing dimensions • What do we do when dimension information changes over time? (e.g., a product category changes)
Dealing with slowly changing dimensions • Overwrite the old attribute value with the new attribute value • Advantage: easy to manage • Disadvantage: lose historical information
Dealing with slowly changing dimensions • Add a new, unique dimension row (e.g., product description 2 will be added to product dimension 1) • Advantage: this maintains the historical information and is more in line with the goal of the data warehouse • Disadvantage: • Requires the creation and management of a new surrogate key for the new dimension record • May result in the creation of very large dimension tables • Tracking across the dimension change may be difficult
Dealing with slowly changing dimensions • Add a new column to the dimension table to hold information about the old dimension information (e.g., the old product category) • Advantage: allows tracking of dimensions over the change in dimensional information • Disadvantage: becomes unwieldy when multiple changes occur
Dealing with slowly changing dimensions • Predictable changes with multiple version overlays creates columns representing the dimension attributes value for each time period • Advantage: allows tracking of dimensions over time • Disadvantage: only handles periodic and predictable changes
Dealing with rapidly changing dimensions • Quickly changing dimensions cannot easily be handled using the previous techniques • Split quickly changing attributes into a separate dimension table • Minidimension: A foreign key for the split table is located in the fact table (e.g., a demographic category such as age would be applied to the customer when the purchase was made) • Outrigger: a foreign key for the split table is located in the slowly changing dimension