1 / 15

Adv. DBMS Data Warehouse CSC5301

Adv. DBMS Data Warehouse CSC5301. Review Hachim Haddouti. Do You Remember?. DSS. Data cube. MD. OLTP. RollUp. drill down. Slice/dice. MD. ROLAP. Star schema. MOLAP. Data mining. Data extraction. Fact table. Data Warehouses.

randy
Télécharger la présentation

Adv. DBMS Data Warehouse CSC5301

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Adv. DBMSData WarehouseCSC5301 Review Hachim Haddouti

  2. Do You Remember? DSS Data cube MD OLTP RollUp drill down Slice/dice MD ROLAP Star schema MOLAP Data mining Data extraction Fact table

  3. Data Warehouses • “Subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process” Inmon (AP = analytical processing is missing) • Used for analysis of existing data • Resolves performance issues suffered by operational RDBMSs and OLTPs

  4. Sizing DW? Mining of mobile phone calls: (Caller, Callee, Time, Duration, Geogr. Location) ~ 100 B/tuple In Germany 107 users * 10 calls/(day*user) * 100 B/call = = 1010 B/day ~ 3*1012 B/year = 3 TB/year Scanning data at 107 B/s takes 3*1012/107 = 3*105 s > 3 days

  5. Data Warehouse Architecture

  6. ER Model a disaster for querying a huge amount of data (time) not understandable for users and they can not be navigated usefully by DBMS software. hard to visualize; many possible connections between tables, To avoid redundancy MD Model better performance Better data organisation Better visualization Business queries (why, what if) Data model

  7. Typical DWH Analyses/Queries • What are the consequences of new orders for production capacity w.r. to investment, personnel, maintenance, extra hours, ... • Seasonal adaptions, e.g. when to produce how many skis, bikinis, convertibles, ... • Influence of external financing on profits

  8. Operations: • aggregation • slice • dice (cube) • rollup to coarser level • drill down to more detailed level • grouping • sorting

  9. Data Cube Representation

  10. Steps to build a DWH • Acquisition of data • Data cleansing • Storage • Processing: AP • Maintenance, ... Not possible with classical DB-technology alone

  11. On-Line Analytical Processing • OLTP (online transaction processing) for operational data of enterprise, e.g. in relational DBMS, IMS, SAP/R3, ... • DSS:Decision Support System to store data/information for strategic management decisions: aggregations, summaries, etc. • Optimized to work with data warehouses • Used to answer questions • Allows users to perceive data as a multidimensional data cube • Data mining

  12. OLTP versus OLAP Thematic focus • OLTP: many small transactions (microscopic view of business processes, individual steps at lowest level, single order, delivery) • OLAP: finances in general, personnel in general, ... • OLAP requires integration and unification of many detailed data into big picture • Time orientation • Durability: data extracted once, no updates

  13. Technical Comparison OLTP vs OLAP • OLTP: high rate of updates, several thousand t/s • OLAP: read only transactions, very complex, DWH is loaded at certain time intervals, e.g. after the end of the month, quarter • Compute intensive • Special systems with new access methods, e.g. multidimensional data organization and access methods • Special OLAP systems necessary to offload OLTP systems

  14. ROLAP and MOLAP Solution 1: ROLAP relational online analytical processing, built on top of relational DBS, additional middleware or client front end (star schema) Solution 2: MOLAP: multidimensional online analytical processing • new model • new data organizations • new algorithms • new query languages • new optimization techniques

  15. DW Review • degenerate dimension • big dimensions • hierarchies • snow falcking • Slowly changing dimensions • dirty dimensions • Hetegrogeneous prodcuts (core and custom) • Factless Fact table

More Related