450 likes | 823 Vues
Designing the data warehouse / data marts. Methodologies and Techniques. Basic principles. Warehouse Database. Life cycle of the DW. First time load. Operational Databases. Refresh. Refresh. Purge or Archive. Refresh. Oracle Medi`. Oracle Warehouse Components. Any Source. Any Data.
E N D
Designing the data warehouse/ data marts Methodologies and Techniques
Warehouse Database Life cycle of the DW First time load Operational Databases Refresh Refresh Purge or Archive Refresh
Oracle Medi` Oracle Warehouse Components Any Source Any Data Any Access Relational tools Relational /Multidimensional Operational data OLAP tools Text, image Spatial Audio,video External data Web Applications/ Web
Oracle Intelligence Tools IS develops user’s Views Business users Analysts Current Tactical Strategic Oracle Reports Oracle Discoverer Oracle Express
Oracle Data Mart Suite Data Modeling Oracle Data Mart Designer OLTP Databases Data Mart Database OLTP Engines Ware- housing Engines Oracle8 SQL*PLUS Data Access & Analysis Discoverer & Oracle Reports Data Extraction Oracle Data Mart Builder Data Management Oracle Enterprise Manager
“Big Bang” Approach:Advantages and Disadvantages • Advantages: • warehouse built as part of major project (eg: BPR) • Having a “big picture” of the data warehouse before starting the data warehousing project • Disadvantages: • Involves a high risk, takes a longer time • Runs the risk of needing to change requirements • Costly and harder to get support for from users
Strategy Definition Analysis Design Build Production Incremental Approach to Warehouse Development • Multiple iterations • Shorter implementations • Validation of each phase
Benefits of an Incremental Approach • Delivers a strategic data warehouse solution through incremental development efforts • Provides extensible, scalable architecture • Quickly provides business benefits and ensures a much earlier return of investment • Allows a data warehouse to be built based on a subject or application area at a time • Allows the construction of an integrated data mart environment
Data Mart • A subset of a data warehouse that supports the requirements of a particular department or business function. • Characteristics include: • Do not normally contain detailed operational data unlike data warehouses. • May contain certain levels of aggregation
Marketing Marketing Sales Finance Human Resources Dependent Data Mart Flat Files Operational Systems Sales Data Warehouse Finance Data Marts External Data
Independent Data Mart Operational Systems Flat Files Sales or Marketing External Data
Reasons for Creating a Data Mart • To give users more flexible access to the data they need to analyse most often. • To provide data in a form that matches the collective view of a group of users • To improve end-user response time. • Potential users of a data mart are clearly defined and can be targeted for support
Reasons for Creating a Data Mart • To provide appropriately structured data as dictated by the requirements of the end-user access tools. • Building a data mart is simpler compared with establishing a corporate data warehouse. • The cost of implementing data marts is far less than that required to establish a data warehouse.
Data Marts Issues • Data mart functionality • Data mart size • Data mart load performance • Users access to data in multiple data marts • Data mart Internet / Intranet access • Data mart administration • Data mart installation
Example of DW tool OLAP • Rotate and drill down to successive levels of detail. • Create and examine calculated data interactively on large volumes of data. • Determine comparative or relative differences. • Perform exception and trend analysis. • Perform advanced analytical functions for example forecasting, modeling, and regression analysis
Original OLAP Rules 1. Multidimensional conceptual view 2. Transparency 3. Accessibility 4. Consistent reporting performance 5. Client-server architecture
Original OLAP Rules 6. Multiuser support 7. Unrestricted cross-dimensional operations 8. Intuitive data manipulation 9. Flexible reporting 10. Unlimited dimensions and aggregation levels
Relational Database Model Attribute 1Name Attribute 2Age • Attribute 3Gender • Attribute 4Emp No. Anderson Green Lee Ramos 31 42 22 32 1001 1007 1010 1020 F M M F Row 1 Row 2 Row 3 Row 4 The table above illustrates the employee relation.
Multidimensional Database Model Customer Store The data is found at the intersection of dimensions. Store Time Time FINANCE SALES Product GL_Line
Specialised Multidimensional tool • Benefits: • Quick access to very large volumes of data • Extensive and comprehensive libraries of complex functions • analysis • Strong modeling and forecasting capabilities • Can access multidimensional and relational database structures • Caters for calculated fields • Disadvantages: • Difficulty of changing model • Lack of support for very large volumes of data • May require significant processing power
MOLAP Server • The application layer stores data in a multidimensional structure • The presentation layer provides the multidimensional view DSS client MOLAP Engine • Efficient storage and processing • Complexity hidden from the user • Analysis using preaggregated summaries and precalculated measures Application layer Warehouse
ROLAP Server • The warehouse stores atomic data. • The application layer generates SQL for the three- dimensional view. • The presentation layer provides the multidimensional view. DSS client ROLAP engine Application layer Multiple SQL Warehouseserver
MOLAP MDDB Query Periodic load Data Warehouse Express Server Express user
ROLAP Cache Live fetch Query Data cache Data Express user Express Server Warehouse Also Hybrid (HOLAP)
MOLAP ROLAP Choosing a Reporting Architecture • Business needs • Potential for growth • interface • enterprise architecture • Network architecture • Speed of access • Openness Good Query Performance OK Simple Complex Analysis
Data Acquisition • Identify, extract, transform, and transport source data • Consider internal and external data • Perform gap analysis between source data and target database objects • Plan move of data between sources and target • Define first-time load and refresh strategy • Define tool requirements • Build, test, and execute data acquisition modules
Modeling • Warehouses differ from operational structures: • Analytical requirements • Subject orientation • Data must map to subject oriented information: • Identify business subjects • Define relationships between subjects • Name the attributes of each subject • Modeling is iterative • Modeling tools are available
Select a business process Physical model Modeling the Data Warehouse 1 • Defining the business model • Creating the dimensional model • Modeling summaries 4. Creating the physical model 2, 3 4
Identifying Business Rules Location Geographic proximity 0 - 1 miles 1 - 5 miles > 5 miles Product Type Monitor Status PC 15 inch New Server 17 inch Rebuilt 19 inch Custom None Time Month > Quarter > Year Store Store > District > Region
Creating the Dimensional Model Identify fact tables • Translate business measures into fact tables • Analyze source system information for additional measures • Identify base and derived measures • Document additivity of measures Identify dimension tables Link fact tables to the dimension tables Create views for users
Product Channel Facts (units, price) Customer Time Dimension Tables Dimension tables have the following characteristics: • Contain textual information that represents the attributes of the business • Contain relatively static data • Are joined to a fact table through a foreign key reference
Fact Tables Fact tables have the following characteristics: • Contain numeric measures (metrics) of the business • May contain summarized (aggregated) data • May contain date-stamped data • Are typically additive • Have key value that is typically a concatenated key composed of the primary keys of the dimensions • Joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Product Channel Facts (units, price) Customer Time Dimensional Model (Star Schema) Fact table Dimension tables
Star Schema Model Product Table Product_id Product_desc … Store Table Store_id District_id ... • Central fact table • Radiating dimensions • Denormalized model Sales Fact Table Product_id Store_id Item_id Day_id Sales_dollars Sales_units ... Time Table Day_id Month_id Period_id Year_id Item Table Item_id Item_desc ...
Star Schema Model • Easy for users to understand • Fast response to queries • Simple metadata • Supported by many front end tools • Less robust to change • Slower to build • Does not support history
Snowflake Schema Model Product Table Product_id Product_desc Store Table Store_id Store_desc District_id District Table District_id District_desc Sales Fact Table Item_id Store_id Sales_dollars Sales_units Time Table Week_id Period_id Year_id Dept Table Dept_id Dept_desc Mgr_id Mgr Table Dept_id Mgr_id Mgr_name Item Table Item_id Item_desc Dept_id
Snowflake Schema Model • Direct use by some tools • More flexible to change • Provides for speedier data loading • May become large and unmanageable • Degrades query performance • More complex metadata
Using Summary Data Phase 3: Modeling summaries • Provides fast access to precomputed data • Reduces use of I/O, CPU, and memory • Is distilled from source systems and precalculated summaries • Usually exists in summary fact tables
Average Maximum Total Percentage Designing Summary Tables Units Sales(€) Store Product A Total Product B Total Product C Total
Summary Tables Example SALES FACTS Sales Region Month 10,000 North Jan 99 12,000 South Feb 99 11,000 North Jan 99 15,000 West Mar 99 18,000 South Feb 99 20,000 North Jan 99 10,000 East Jan 99 2,000 West Mar 99 SALES BY MONTH/REGION Month Region Tot_Sales$ Jan 99 North 41,000 Jan 99 East 10,000 Feb 99 South 40,000 Mar 99 West 17,000 SALES BY MONTH Month Tot_Sales Jan 99 51,000 Feb 99 40,000 Mar 99 17,000
Summary advisor Summary usage Space requirements Summary recommendations Summary Management in Oracle8i Salessummary Sales Region State City Time Product
The Time Dimension • Time is critical to the data warehouse. • A consistent representation of time is required for extensibility. Sales fact Timedimension How and where should it be stored?