Michael Goshey's analysis (University of Minnesota, Fall 2006) of the paper "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal, covering its key concepts, major contributions, and validation methodology. The paper surveys data warehouses, the differing requirements of OLTP and OLAP, and data models, and it bridges the gulf between academia and industry. Its validation methodology rests on academic and industry citations, case studies, and references to tools and vendors. The analysis also outlines how the terminology, tools, and vendors had changed by 2006.
An Analysis of the Publication "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal • Michael Goshey • University of Minnesota, Fall 2006 • CSci 8701: Overview of Database Research
Outline • Introduction • Problem Addressed • Major Contributions • Key Concepts • Validation Methodology • Assumptions • 2006 Rewrite Michael Goshey: 9/19/2006
Introduction • Selected paper • S. Chaudhuri and U. Dayal, An Overview of Data Warehousing and OLAP Technology, SIGMOD Record 26(1): 65-74 (1997). • Motivation • Personal Interest
Problem Addressed • Problem Statement • Survey: organizing the data warehousing space • Differing requirements between OLTP and OLAP • Significance • Growth area • Reference work establishing consensus on terms, architectures and issues
Major Contributions • Bridging the gulf between industry and academia • OLTP vs. OLAP: clarifying the differences • Concise survey of relevant issues, architectures and tools • Concrete list of data warehouse design and build steps
Key Concepts • Data warehouses and data marts • OLTP vs. OLAP, ROLAP vs. MOLAP • Relational and dimensional data models • Bitmap indices • ETL • Metadata • Managed query vs. ad hoc environments • Materialized views • SQL extensions (cube, rollup, rank, percentile, etc.)
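To make the cube and rollup extensions listed above concrete, here is a minimal Python sketch of what the two operators compute. The fact rows, the dimension names, and the `"ALL"` placeholder for a rolled-up dimension are assumptions made for illustration; they do not come from the slides or the paper, and a real database evaluates these operators inside its SQL engine.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, region, sales).
ROWS = [
    ("widget", "east", 10),
    ("widget", "west", 5),
    ("gadget", "east", 7),
]
N_DIMS = 2  # number of dimension columns preceding the measure


def cube(rows, n_dims=N_DIMS):
    """Sum the measure for every subset of the dimensions (CUBE semantics)."""
    totals = defaultdict(int)
    for *keys, measure in rows:
        for r in range(n_dims + 1):
            for kept in combinations(range(n_dims), r):
                # Dimensions not in `kept` are rolled up to "ALL".
                group = tuple(
                    keys[i] if i in kept else "ALL" for i in range(n_dims)
                )
                totals[group] += measure
    return dict(totals)


def rollup(rows, n_dims=N_DIMS):
    """Sum the measure for each prefix of the dimension list (ROLLUP semantics)."""
    totals = defaultdict(int)
    for *keys, measure in rows:
        for r in range(n_dims + 1):
            # Keep the first r dimensions, roll up the rest.
            group = tuple(keys[i] if i < r else "ALL" for i in range(n_dims))
            totals[group] += measure
    return dict(totals)
```

With two dimensions, `cube` produces groupings for all four subsets (including the grand total), while `rollup` produces only the three hierarchical prefixes, so a group such as `("ALL", "east")` appears in the cube but not in the rollup.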
Data Warehouse, Data Mart
Relational or Dimensional?
Relational or Dimensional? (image from http://www.laynetworks.com)
Bitmap Indices • cardinality: unique values/total rows • B-Tree vs. bitmap: 1% rule, uniqueness • Boolean algebra directly on indices
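The bullets above can be illustrated with a small Python sketch of a bitmap index, using one integer per distinct value as the bit vector. The column data and function names are hypothetical, and production systems use compressed bitmaps instead of plain integers; the point is only to show cardinality as a ratio and Boolean algebra applied directly to the indices.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def cardinality_ratio(values):
    """Unique values divided by total rows; low ratios favor bitmap indices."""
    return len(set(values)) / len(values)


def matching_rows(bitmap):
    """Decode a bitmap back into the list of row ids whose bit is set."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical low-cardinality columns of a five-row table.
GENDER = ["M", "F", "F", "M", "F"]
REGION = ["east", "east", "west", "west", "east"]

gender_idx = build_bitmap_index(GENDER)
region_idx = build_bitmap_index(REGION)

# WHERE gender = 'F' AND region = 'east' becomes a bitwise AND.
f_and_east = matching_rows(gender_idx["F"] & region_idx["east"])
# WHERE gender = 'M' OR region = 'west' becomes a bitwise OR.
m_or_west = matching_rows(gender_idx["M"] | region_idx["west"])
```

The predicates are combined without touching the table rows: only the final bitmap is decoded into row ids, which is why bitmap indices suit the multi-attribute filters common in OLAP queries.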
Validation Methodology • Survey paper goals • Academic and industry citations • Referencing tools, vendors • Case studies
Assumptions • Read-only environments • Shortcomings • (occasional) transactional commitments • the data revision problem
2006 Rewrite • Changes in terminology, tools, vendors • Fact constellations -> conformed dimensions • Decision support -> BI • Vendors and tools in BI, ETL, OLAP • Multiple user constituencies • Data history difficulties • petabyte databases -> very large warehouses common • data expiry challenges • slowly changing dimensions
Slowly Changing Dimensions • Before • After: Type 1 • After: Type 2 • After: Type 3
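The slide's before/after tables did not survive extraction, so the following Python sketch reconstructs the general idea under stated assumptions and is not the presenter's example. The customer row, the column names (`key`, `current`, `previous_city`), and the function names are all hypothetical. It follows the commonly used definitions: Type 1 overwrites the attribute, Type 2 adds a new row under a new surrogate key, and Type 3 adds a column holding the prior value.

```python
from copy import deepcopy

# Hypothetical customer dimension before the customer moves.
BEFORE = [{"key": 1, "customer_id": "C42", "city": "Duluth"}]


def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    out = deepcopy(dim)
    for row in out:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return out


def scd_type2(dim, customer_id, new_city):
    """Type 2: retire the current row and add a new row with a new surrogate key."""
    out = deepcopy(dim)
    next_key = max(row["key"] for row in out) + 1
    new_rows = []
    for row in out:
        row.setdefault("current", True)
        if row["customer_id"] == customer_id and row["current"]:
            row["current"] = False
            new_rows.append(
                {**row, "key": next_key, "city": new_city, "current": True}
            )
            next_key += 1
    return out + new_rows


def scd_type3(dim, customer_id, new_city):
    """Type 3: keep one row, moving the old value into a 'previous' column."""
    out = deepcopy(dim)
    for row in out:
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return out
```

Calling each function with `BEFORE`, `"C42"`, and a new city shows the trade-off: Type 1 keeps no history, Type 2 keeps full history at the cost of extra rows, and Type 3 keeps exactly one prior value.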
Questions?