1 / 45

Designing the data warehouse / data marts

Designing the data warehouse / data marts. Methodologies and Techniques. Basic principles. Warehouse Database. Life cycle of the DW. First time load. Operational Databases. Refresh. Refresh. Purge or Archive. Refresh. Oracle Medi`. Oracle Warehouse Components. Any Source. Any Data.

mliss
Télécharger la présentation

Designing the data warehouse / data marts

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Designing the data warehouse/ data marts Methodologies and Techniques

  2. Basic principles

  3. Warehouse Database Life cycle of the DW First time load Operational Databases Refresh Refresh Purge or Archive Refresh

  4. Oracle Medi` Oracle Warehouse Components Any Source Any Data Any Access Relational tools Relational /Multidimensional Operational data OLAP tools Text, image Spatial Audio,video External data Web Applications/ Web

  5. Oracle Intelligence Tools IS develops user’s Views Business users Analysts Current Tactical Strategic Oracle Reports Oracle Discoverer Oracle Express

  6. Oracle Data Mart Suite Data Modeling Oracle Data Mart Designer OLTP Databases Data Mart Database OLTP Engines Ware- housing Engines Oracle8 SQL*PLUS Data Access & Analysis Discoverer & Oracle Reports Data Extraction Oracle Data Mart Builder Data Management Oracle Enterprise Manager

  7. “Big Bang” Approach:Advantages and Disadvantages • Advantages: • warehouse built as part of major project (eg: BPR) • Having a “big picture” of the data warehouse before starting the data warehousing project • Disadvantages: • Involves a high risk, takes a longer time • Runs the risk of needing to change requirements • Costly and harder to get support for from users

  8. Strategy Definition Analysis Design Build Production Incremental Approach to Warehouse Development • Multiple iterations • Shorter implementations • Validation of each phase

  9. Benefits of an Incremental Approach • Delivers a strategic data warehouse solution through incremental development efforts • Provides extensible, scalable architecture • Quickly provides business benefits and ensures a much earlier return of investment • Allows a data warehouse to be built based on a subject or application area at a time • Allows the construction of an integrated data mart environment

  10. Data Mart • A subset of a data warehouse that supports the requirements of a particular department or business function. • Characteristics include: • Do not normally contain detailed operational data unlike data warehouses. • May contain certain levels of aggregation

  11. Marketing Marketing Sales Finance Human Resources Dependent Data Mart Flat Files Operational Systems Sales Data Warehouse Finance Data Marts External Data

  12. Independent Data Mart Operational Systems Flat Files Sales or Marketing External Data

  13. Reasons for Creating a Data Mart • To give users more flexible access to the data they need to analyse most often. • To provide data in a form that matches the collective view of a group of users • To improve end-user response time. • Potential users of a data mart are clearly defined and can be targeted for support

  14. Reasons for Creating a Data Mart • To provide appropriately structured data as dictated by the requirements of the end-user access tools. • Building a data mart is simpler compared with establishing a corporate data warehouse. • The cost of implementing data marts is far less than that required to establish a data warehouse.

  15. Data Marts Issues • Data mart functionality • Data mart size • Data mart load performance • Users access to data in multiple data marts • Data mart Internet / Intranet access • Data mart administration • Data mart installation

  16. Example of DW tool OLAP • Rotate and drill down to successive levels of detail. • Create and examine calculated data interactively on large volumes of data. • Determine comparative or relative differences. • Perform exception and trend analysis. • Perform advanced analytical functions for example forecasting, modeling, and regression analysis

  17. Original OLAP Rules 1. Multidimensional conceptual view 2. Transparency 3. Accessibility 4. Consistent reporting performance 5. Client-server architecture

  18. Original OLAP Rules 6. Multiuser support 7. Unrestricted cross-dimensional operations 8. Intuitive data manipulation 9. Flexible reporting 10. Unlimited dimensions and aggregation levels

  19. Relational Database Model Attribute 1Name Attribute 2Age • Attribute 3Gender • Attribute 4Emp No. Anderson Green Lee Ramos 31 42 22 32 1001 1007 1010 1020 F M M F Row 1 Row 2 Row 3 Row 4 The table above illustrates the employee relation.

  20. Multidimensional Database Model Customer Store The data is found at the intersection of dimensions. Store Time Time FINANCE SALES Product GL_Line

  21. Two dimensions

  22. Three dimensions

  23. Specialised Multidimensional tool • Benefits: • Quick access to very large volumes of data • Extensive and comprehensive libraries of complex functions • analysis • Strong modeling and forecasting capabilities • Can access multidimensional and relational database structures • Caters for calculated fields • Disadvantages: • Difficulty of changing model • Lack of support for very large volumes of data • May require significant processing power

  24. MOLAP Server • The application layer stores data in a multidimensional structure • The presentation layer provides the multidimensional view DSS client MOLAP Engine • Efficient storage and processing • Complexity hidden from the user • Analysis using preaggregated summaries and precalculated measures Application layer Warehouse

  25. ROLAP Server • The warehouse stores atomic data. • The application layer generates SQL for the three- dimensional view. • The presentation layer provides the multidimensional view. DSS client ROLAP engine Application layer Multiple SQL Warehouseserver

  26. MOLAP MDDB Query Periodic load Data Warehouse Express Server Express user

  27. ROLAP Cache Live fetch Query Data cache Data Express user Express Server Warehouse Also Hybrid (HOLAP)

  28. MOLAP ROLAP Choosing a Reporting Architecture • Business needs • Potential for growth • interface • enterprise architecture • Network architecture • Speed of access • Openness Good Query Performance OK Simple Complex Analysis

  29. Data Acquisition • Identify, extract, transform, and transport source data • Consider internal and external data • Perform gap analysis between source data and target database objects • Plan move of data between sources and target • Define first-time load and refresh strategy • Define tool requirements • Build, test, and execute data acquisition modules

  30. Modeling • Warehouses differ from operational structures: • Analytical requirements • Subject orientation • Data must map to subject oriented information: • Identify business subjects • Define relationships between subjects • Name the attributes of each subject • Modeling is iterative • Modeling tools are available

  31. Select a business process Physical model Modeling the Data Warehouse 1 • Defining the business model • Creating the dimensional model • Modeling summaries 4. Creating the physical model 2, 3 4

  32. Identifying Business Rules Location Geographic proximity 0 - 1 miles 1 - 5 miles > 5 miles Product Type Monitor Status PC 15 inch New Server 17 inch Rebuilt 19 inch Custom None Time Month > Quarter > Year Store Store > District > Region

  33. Creating the Dimensional Model Identify fact tables • Translate business measures into fact tables • Analyze source system information for additional measures • Identify base and derived measures • Document additivity of measures Identify dimension tables Link fact tables to the dimension tables Create views for users

  34. Product Channel Facts (units, price) Customer Time Dimension Tables Dimension tables have the following characteristics: • Contain textual information that represents the attributes of the business • Contain relatively static data • Are joined to a fact table through a foreign key reference

  35. Fact Tables Fact tables have the following characteristics: • Contain numeric measures (metrics) of the business • May contain summarized (aggregated) data • May contain date-stamped data • Are typically additive • Have key value that is typically a concatenated key composed of the primary keys of the dimensions • Joined to dimension tables through foreign keys that reference primary keys in the dimension tables

  36. Product Channel Facts (units, price) Customer Time Dimensional Model (Star Schema) Fact table Dimension tables

  37. Star Schema Model Product Table Product_id Product_desc … Store Table Store_id District_id ... • Central fact table • Radiating dimensions • Denormalized model Sales Fact Table Product_id Store_id Item_id Day_id Sales_dollars Sales_units ... Time Table Day_id Month_id Period_id Year_id Item Table Item_id Item_desc ...

  38. Star Schema Model • Easy for users to understand • Fast response to queries • Simple metadata • Supported by many front end tools • Less robust to change • Slower to build • Does not support history

  39. Snowflake Schema Model Product Table Product_id Product_desc Store Table Store_id Store_desc District_id District Table District_id District_desc Sales Fact Table Item_id Store_id Sales_dollars Sales_units Time Table Week_id Period_id Year_id Dept Table Dept_id Dept_desc Mgr_id Mgr Table Dept_id Mgr_id Mgr_name Item Table Item_id Item_desc Dept_id

  40. Snowflake Schema Model • Direct use by some tools • More flexible to change • Provides for speedier data loading • May become large and unmanageable • Degrades query performance • More complex metadata

  41. Using Summary Data Phase 3: Modeling summaries • Provides fast access to precomputed data • Reduces use of I/O, CPU, and memory • Is distilled from source systems and precalculated summaries • Usually exists in summary fact tables

  42. Average Maximum Total Percentage Designing Summary Tables Units Sales(€) Store Product A Total Product B Total Product C Total

  43. Summary Tables Example SALES FACTS Sales Region Month 10,000 North Jan 99 12,000 South Feb 99 11,000 North Jan 99 15,000 West Mar 99 18,000 South Feb 99 20,000 North Jan 99 10,000 East Jan 99 2,000 West Mar 99 SALES BY MONTH/REGION Month Region Tot_Sales$ Jan 99 North 41,000 Jan 99 East 10,000 Feb 99 South 40,000 Mar 99 West 17,000 SALES BY MONTH Month Tot_Sales Jan 99 51,000 Feb 99 40,000 Mar 99 17,000

  44. Summary advisor Summary usage Space requirements Summary recommendations Summary Management in Oracle8i Salessummary Sales Region State City Time Product

  45. The Time Dimension • Time is critical to the data warehouse. • A consistent representation of time is required for extensibility. Sales fact Timedimension How and where should it be stored?

More Related