An analysis of the publication "An Overview of Data Warehousing and OLAP Technology" by Chaudhuri and Dayal, presented by Michael Goshey. The slides cover the problem addressed, major contributions, key concepts, validation methodology, and key assumptions, and consider how a 2006 rewrite would change terminology, tools, and vendors in Business Intelligence. Topics include data warehouses, OLTP vs. OLAP, relational vs. dimensional data models, bitmap indices, and slowly changing dimensions.
An Analysis of the Publication "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal • Michael Goshey • University of Minnesota, Fall 2006 • CSci 8701: Overview of Database Research
Outline • Introduction • Problem Addressed • Major Contributions • Key Concepts • Validation Methodology • Assumptions • 2006 Rewrite Michael Goshey: 9/19/2006
Introduction • Selected paper • S. Chaudhuri and U. Dayal, An Overview of Data Warehousing and OLAP Technology, SIGMOD Record 26(1): 65-74 (1997). • Motivation • Personal Interest
Problem Addressed • Problem Statement • Survey: organizing the data warehousing space • Differing requirements between OLTP and OLAP • Significance • Growth area • Reference work establishing consensus on terms, architectures and issues
Major Contributions • Bridging the gulf between industry and academia • OLTP vs. OLAP: clarifying the differences • Concise survey of relevant issues, architectures and tools • Concrete list of data warehouse design and build steps
Key Concepts • Data warehouses and data marts • OLTP, OLAP (ROLAP vs. MOLAP) • Relational and dimensional data models • Bitmap indices • ETL • Metadata • Managed query vs. ad hoc environments • Materialized views • SQL extensions (cube, rollup, rank, percentile, etc.)
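The cube and rollup extensions listed above can be illustrated with a small sketch. This is not SQL and not taken from the paper; it is a plain-Python approximation in which the helper names (`aggregate`, `rollup`, `cube`), the `ALL` placeholder, and the sample `sales` rows are all assumptions made for illustration. Rollup aggregates over each prefix of the dimension list, while cube aggregates over every subset.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def aggregate(rows, dims, grouping, measure):
    """Sum `measure`, grouping only by the dimensions named in `grouping`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] if d in grouping else ALL for d in dims)
        totals[key] += row[measure]
    return dict(totals)


def rollup(rows, dims, measure):
    """Aggregate over each prefix of dims: (d1, d2), (d1,), ()."""
    result = {}
    for n in range(len(dims), -1, -1):
        result.update(aggregate(rows, dims, dims[:n], measure))
    return result


def cube(rows, dims, measure):
    """Aggregate over every subset of dims."""
    result = {}
    for n in range(len(dims) + 1):
        for grouping in combinations(dims, n):
            result.update(aggregate(rows, dims, grouping, measure))
    return result


# Hypothetical fact rows, invented for this example.
sales = [
    {"region": "east", "year": 2005, "amount": 10},
    {"region": "east", "year": 2006, "amount": 20},
    {"region": "west", "year": 2006, "amount": 5},
]
dims = ("region", "year")

rolled = rollup(sales, dims, "amount")
cubed = cube(sales, dims, "amount")

print(rolled[("east", ALL)])   # subtotal for east across all years
print(cubed[(ALL, 2006)])      # per-year total, present only in the cube
```

The per-year totals such as `(ALL, 2006)` appear only in the cube result, because rollup never drops `region` while keeping `year`.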
Data Warehouse, Data Mart
Relational or Dimensional?
Relational or Dimensional? (image from http://www.laynetworks.com)
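A dimensional model is commonly realized as a star schema: one fact table of measures surrounded by descriptive dimension tables. The following is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and rows are all hypothetical and are not taken from the slides or the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- The fact table holds measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    units       INTEGER,
    revenue     REAL);

INSERT INTO dim_product VALUES
    (1, 'widget', 'hardware'), (2, 'gadget', 'hardware'), (3, 'manual', 'books');
INSERT INTO dim_date VALUES (10, 2005, 'Q4'), (11, 2006, 'Q1');
INSERT INTO fact_sales VALUES
    (1, 10, 3, 30.0), (2, 10, 1, 25.0), (3, 11, 2, 8.0), (1, 11, 4, 40.0);
""")

# A typical OLAP-style query: join the fact table to its dimensions,
# then aggregate a measure by dimension attributes.
rows = conn.execute("""
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()

for category, year, revenue in rows:
    print(category, year, revenue)
```

Every analytical query follows the same shape (fact joined to dimensions, then grouped), which is what makes the dimensional layout convenient for ad hoc analysis compared with a fully normalized relational design.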
Bitmap Indices • cardinality: unique values/total rows • B-Tree vs. bitmap: 1% rule, uniqueness • Boolean algebra directly on indices
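The points above can be sketched in a few lines of Python. This is an illustrative toy, not how any particular database implements bitmap indices: each distinct column value maps to an integer used as a bit vector, with bit i set when row i holds that value. The function names and the sample `region` and `status` columns are assumptions made for the example. The slide's "1% rule" is read here as the usual rule of thumb that bitmaps suit columns whose cardinality ratio is very low.

```python
from collections import defaultdict


def cardinality(values):
    """Ratio of distinct values to total rows, as defined on the slide."""
    return len(set(values)) / len(values)


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality columns of a six-row table.
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open': a single bitwise AND.
east_and_open = region_idx["east"] & status_idx["open"]

# WHERE region = 'west' OR region = 'north': a single bitwise OR.
west_or_north = region_idx["west"] | region_idx["north"]

print(rows_of(east_and_open))
print(rows_of(west_or_north))
print(cardinality(region))
```

The predicates are evaluated on the index bitmaps alone; the base table is touched only to fetch the rows that survive the Boolean combination.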
Validation Methodology • Survey paper goals • Academic and industry citations • Referencing tools, vendors • Case studies
Assumptions • Read-only environments • Shortcomings • (occasional) transactional commitments • the data revision problem
2006 Rewrite • Changes in terminology, tools, vendors • Fact constellations -> conformed dimensions • Decision support -> BI • Vendors and tools in BI, ETL, OLAP • Multiple user constituencies • Data history difficulties • petabyte databases -> very large warehouses common • data expiry challenges • slowly changing dimensions
Slowly Changing Dimensions • Before • After: Type 1 • After: Type 2 • After: Type 3
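The before/after states named on this slide can be sketched as follows. The slide's own tables did not survive extraction, so the customer dimension, its column names (`key`, `customer_id`, `city`, `valid_from`, `valid_to`, `previous_city`), and the sample values are all hypothetical. The three functions follow the commonly used definitions: Type 1 overwrites, Type 2 adds a new versioned row, and Type 3 keeps the prior value in an extra column.

```python
from copy import deepcopy
from datetime import date

# "Before": a hypothetical one-row customer dimension.
BEFORE = [{"key": 1, "customer_id": "C42", "city": "Duluth"}]


def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite the attribute in place; history is lost."""
    rows = deepcopy(dim)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return rows


def scd_type2(dim, customer_id, new_city, changed_on):
    """Type 2: expire the current row and append a new row with a new
    surrogate key. Assumes the customer already exists in the dimension."""
    rows = deepcopy(dim)
    for row in rows:
        row.setdefault("valid_from", None)  # None = start date unknown
        row.setdefault("valid_to", None)    # None = row is current
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = changed_on
    rows.append({
        "key": max(r["key"] for r in rows) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": changed_on,
        "valid_to": None,
    })
    return rows


def scd_type3(dim, customer_id, new_city):
    """Type 3: keep only the immediately previous value in an extra column."""
    rows = deepcopy(dim)
    for row in rows:
        row.setdefault("previous_city", None)
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return rows


type1 = scd_type1(BEFORE, "C42", "St. Paul")
type2 = scd_type2(BEFORE, "C42", "St. Paul", date(2006, 9, 19))
type3 = scd_type3(BEFORE, "C42", "St. Paul")

for label, rows in (("Type 1", type1), ("Type 2", type2), ("Type 3", type3)):
    print(label, rows)
```

Only Type 2 preserves full history, at the cost of extra rows and a surrogate key that fact rows must reference; Type 1 is simplest but rewrites the past, and Type 3 remembers exactly one prior value.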
Questions?