Modeling Issues for Data Warehouses

Presentation Transcript


  1. Modeling Issues for Data Warehouses
     CMPT 455/826 - Week 7, Day 1 (based on Trujillo)
     Sept-Dec 2009 – w7d1

  2. This is a tough paper
     • This is the toughest paper that we’ve dealt with so far
     • It introduces
       • a number of concepts that are very important
       • in ways that are often difficult to follow
       • with a combination of standard and homemade terms
     • So, for today,
       • rather than concentrate on critique items,
       • we need to concentrate on the concepts

  3. Multidimensional modeling
     • Ties together the concepts of:
       • a data warehouse
       • a multidimensional database (MDB)
       • online analytical processing (OLAP)
     • What are dimensions?
     • What are
       • data warehouses?
       • multidimensional databases (MDBs)?
       • online analytical processing (OLAP)?

  4. Multidimensional modeling
     • Structures information into
       • facts
       • dimensions
     • Facts are described by a set of attributes called measures or fact attributes, which
       • can be atomic or derived
       • are contained in cells or points within the data cube
     • We base this set of measures on a set of dimensions that derive from the granularity chosen for representing the facts.
     • These dimensions thus provide the context for analyzing the facts.
     • Dimension attributes
       • provide the specifics that characterize dimensions.
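
As a rough illustration of facts, measures, and dimensions, the sketch below models a hypothetical retail sale at the grain of one product, store, customer, and day. The class name `SalesFact`, its fields, `PRODUCT_ATTRIBUTES`, and the sample values are all assumptions made for illustration, not the paper's notation; `revenue` stands in for a derived measure computed from the atomic measures `quantity` and `unit_price`.

```python
from dataclasses import dataclass

# Hypothetical fact class: the chosen grain is one product sold in one
# store to one customer on one day. The four key fields are dimensions.
@dataclass(frozen=True)
class SalesFact:
    product: str       # dimension
    store: str         # dimension
    customer: str      # dimension
    day: str           # dimension
    quantity: int      # atomic measure
    unit_price: float  # atomic measure

    @property
    def revenue(self) -> float:
        # Derived measure: computed from atomic measures, not stored.
        return self.quantity * self.unit_price

# Hypothetical dimension attributes characterizing the product dimension.
PRODUCT_ATTRIBUTES = {"milk": {"category": "dairy"}, "bread": {"category": "bakery"}}

def build_cube(facts):
    """Place each fact in the cube cell addressed by its dimension values."""
    return {(f.product, f.store, f.customer, f.day): f for f in facts}

facts = [
    SalesFact("milk", "store_1", "c1", "2009-10-19", 2, 1.50),
    SalesFact("bread", "store_1", "c1", "2009-10-19", 1, 2.25),
]
cube = build_cube(facts)
cell = cube[("milk", "store_1", "c1", "2009-10-19")]
print(cell.revenue, PRODUCT_ATTRIBUTES[cell.product]["category"])  # 3.0 dairy
```

The dimension values act as the cell's coordinates, while the dimension attributes live outside the fact and describe each dimension member.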

  5. Multidimensional modeling
     • Facts
       • represent many-to-many relationships between all dimensions
       • have a many-to-one relationship with every particular dimension
         • e.g. a product sale relates to exactly one product, sold in one store to one customer at one time
       • can represent many-to-many relationships between particular dimensions
         • e.g. one sales slip can contain many products, and one product can appear on many sales slips
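
The sales-slip example can be made concrete with a few hypothetical fact rows. Each row references exactly one slip and one product (many-to-one from the fact to each dimension), yet grouping the rows in both directions exposes the many-to-many relationship between slips and products. The slip and product names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (sales_slip, product). Each row points to exactly
# one slip and one product, i.e. many-to-one from the fact to each dimension.
rows = [
    ("slip1", "milk"),
    ("slip1", "bread"),
    ("slip2", "milk"),
]

# Through the fact rows, slips and products become many-to-many.
products_on_slip = defaultdict(set)
slips_for_product = defaultdict(set)
for slip, product in rows:
    products_on_slip[slip].add(product)
    slips_for_product[product].add(slip)

print(sorted(products_on_slip["slip1"]))  # ['bread', 'milk']
print(sorted(slips_for_product["milk"]))  # ['slip1', 'slip2']
```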

  6. Multidimensional modeling
     • The additivity / summarizability concept
       • A measure (fact attribute) is additive along a dimension
         • if we can use the SUM operator to aggregate its values along all hierarchies defined on that dimension
       • The aggregation of some fact attributes
         • called roll-up in OLAP terminology
         • might not be semantically meaningful for all measures along all dimensions
       • e.g. the number of clients
         • estimated by counting the number of purchase receipts for a given product, customer, day, and store
         • is not additive along the product dimension: because the same receipt (ticket) can include other products, adding up the number of clients for two or more products would count some customers more than once and lead to inconsistent results.
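
A small worked example of why the number of clients is not additive along the product dimension. The receipts below are hypothetical; as in the slide, the number of clients for a set of products is estimated by counting the distinct receipts that contain any of those products.

```python
# Hypothetical receipts (one customer, day, and store each): id -> products.
receipts = {
    "r1": {"milk", "bread"},
    "r2": {"milk"},
    "r3": {"bread"},
}

def clients_for(products):
    """Estimate clients as the number of distinct receipts that contain
    at least one of the given products."""
    return len({r for r, items in receipts.items() if items & products})

milk = clients_for({"milk"})                    # r1, r2
bread = clients_for({"bread"})                  # r1, r3
milk_or_bread = clients_for({"milk", "bread"})  # r1, r2, r3

# Rolling up with SUM along the product dimension counts receipt r1 twice.
print(milk + bread, milk_or_bread)  # 4 3
```

In this sketch the consistent roll-up is a distinct count over receipts, recomputed at the higher level, rather than a SUM of the per-product counts.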

  7. Multidimensional modeling
     • The strictness concept
       • an object at a hierarchy’s lower level
       • belongs to only one higher-level object
       • e.g. a province can relate to only one country
     • The completeness concept
       • all members belong to one higher-class object, and
       • that object consists of those members only
       • e.g. only the recorded provinces can form a country
       • In a “complete” classification hierarchy between the country and province levels,
         • all the recorded provinces form the country, and
         • all the provinces that form the country have been recorded
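
Strictness and completeness can be phrased as checks on the recorded parent-child links between two hierarchy levels. This is a sketch under assumptions: the member names (P1, C1, city_a, ...) are hypothetical, and completeness is checked against a separately supplied reference list of the members that should form each parent, which a real warehouse would have to obtain from outside the recorded data.

```python
def is_strict(links):
    """Strict: every lower-level member rolls up to exactly one parent."""
    parents = {}
    for child, parent in links:
        parents.setdefault(child, set()).add(parent)
    return all(len(ps) == 1 for ps in parents.values())

def is_complete(links, reference):
    """Complete: for each parent in the (hypothetical) reference list, the
    recorded children are exactly the members that should form it."""
    recorded = {}
    for child, parent in links:
        recorded.setdefault(parent, set()).add(child)
    return all(recorded.get(parent, set()) == set(members)
               for parent, members in reference.items())

# Hypothetical province -> country links.
province_country = [("P1", "C1"), ("P2", "C1")]
# A city recorded under two provinces violates strictness.
city_province = [("city_a", "P1"), ("city_b", "P1"), ("city_b", "P2")]

print(is_strict(province_country))  # True
print(is_strict(city_province))     # False

# If country C1 should consist of P1, P2 and P3, the hierarchy is incomplete.
print(is_complete(province_country, {"C1": {"P1", "P2"}}))        # True
print(is_complete(province_country, {"C1": {"P1", "P2", "P3"}}))  # False
```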

  8. Multidimensional modeling
     • Categorization of dimensions
       • some attributes are normally valid for all elements within a dimension,
       • while others are only valid for a subset of elements
       • e.g. the attributes alcohol percentage and volume would only be valid for drink products and would be null for food products
     • A proper multidimensional data model
       • should consider attributes only when necessary,
       • depending on the categorization of dimensions
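
One way to reflect categorization in an object-oriented sketch is to move category-specific attributes into subclasses, so that food products simply do not carry drink-only attributes instead of holding nulls. The class and field names are hypothetical, and this is an illustration of the idea rather than the paper's own notation.

```python
from dataclasses import dataclass, fields

@dataclass
class Product:
    # Attributes valid for every member of the product dimension.
    name: str
    unit_price: float

@dataclass
class Drink(Product):
    # Attributes valid only for the drink category.
    alcohol_pct: float
    volume_ml: int

@dataclass
class Food(Product):
    # No drink-only attributes, so nothing has to be stored as null.
    pass

def attribute_names(member):
    """List the attributes that are valid for this dimension member."""
    return [f.name for f in fields(member)]

beer = Drink("beer", 3.00, 5.0, 355)
bread = Food("bread", 2.25)
print(attribute_names(beer))   # ['name', 'unit_price', 'alcohol_pct', 'volume_ml']
print(attribute_names(bread))  # ['name', 'unit_price']
```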

  9. Multidimensional modeling
     • Recommended modeling approach
       • Clearly separate the structure of a multidimensional model into
         • facts
         • dimensions
     • Fact classes
       • are composite classes
       • “in a shared-aggregation relationship of n dimension classes”
       • i.e. they relate instances from all dimensions
     • A fact object instance
       • is always related to object instances from all dimensions
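
The rule that a fact instance is always related to instances from all dimensions can be enforced when the fact is created. In the sketch below, `DIMENSIONS`, `DimensionMember`, and `Fact` are hypothetical names. Two facts referencing the same product object hint at what shared aggregation means here: a dimension instance is shared by many facts rather than owned by one.

```python
from dataclasses import dataclass

# Hypothetical set of n = 4 dimensions for a sales fact class.
DIMENSIONS = ("product", "store", "customer", "time")

@dataclass(frozen=True)
class DimensionMember:
    dimension: str
    key: str

class Fact:
    def __init__(self, members, **measures):
        by_dim = {m.dimension: m for m in members}
        missing = [d for d in DIMENSIONS if d not in by_dim]
        unknown = sorted(set(by_dim) - set(DIMENSIONS))
        # Reject facts that skip a dimension, repeat one, or name an unknown one.
        if missing or unknown or len(by_dim) != len(members):
            raise ValueError(f"missing={missing} unknown={unknown}")
        self.members = by_dim
        self.measures = measures

milk = DimensionMember("product", "milk")
store = DimensionMember("store", "store_1")
day = DimensionMember("time", "2009-10-19")

f1 = Fact([milk, store, DimensionMember("customer", "c1"), day], quantity=2)
f2 = Fact([milk, store, DimensionMember("customer", "c2"), day], quantity=1)
print(f1.members["product"] is f2.members["product"])  # True: shared, not copied

try:
    Fact([milk, store, day], quantity=1)  # no customer dimension
except ValueError as err:
    print(err)  # missing=['customer'] unknown=[]
```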

  10. Multidimensional modeling
     • Given the basics of their modeling approach,
       • they then go on to explain how they annotate
         • derived measures (with a “/”)
         • attributes that form part of a table’s primary key / object identifier (“OID”)
         • attributes that function as descriptors (“D”)
         • constraints on additivity (between braces near the fact table)
         • additivity and derivation rules (separate from the diagram)
         • that a dimension is a directed acyclic graph (“DAG”)
       • they also use various other UML notations
     • Is this perhaps a little much semantic loading?
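
Marking a dimension as a DAG allows a level to roll up along more than one path, for example a time dimension where days roll up to both weeks and months. A sketch of checking that property with Python's standard `graphlib` module (Python 3.9+); the level names are illustrative assumptions, and passing roll-up targets as the sorter's predecessors is a choice made here only because cycle detection does not depend on edge direction.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical time-dimension hierarchy: level -> levels it rolls up to.
# "day" has two roll-up paths, so this is a DAG rather than a tree.
time_levels = {
    "day": {"month", "week"},
    "week": {"year"},
    "month": {"quarter"},
    "quarter": {"year"},
    "year": set(),
}

def is_dag(levels):
    """Return True if the roll-up relation among levels has no cycle."""
    try:
        # Consuming static_order() runs prepare(), which raises on a cycle.
        tuple(TopologicalSorter(levels).static_order())
        return True
    except CycleError:
        return False

print(is_dag(time_levels))                       # True
print(is_dag({**time_levels, "year": {"day"}}))  # False: year -> day closes a cycle
```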

  11. Multidimensional modeling
     • Regardless of how we model these various concepts,
       • it is important that they be considered
       • in the design of data warehouses

  12. Dimensional Modeling (based on Jones)

  13. Characteristics for Using Patterns
     • The problem that the pattern addresses is identified, recognized, and defined from real-world situations.
     • A pattern provides an approach for formulating a solution to a real-world problem.
     • The approach must be defined with respect to the real-world context from which the problem emanates.
     • The approach is reusable because it has been successfully used to solve recurring real-world problems.
     • A pattern endures over time.

  14. Dimensional Data Patterns (DDPs)
     • involve a commonly known & recognized mental model
       • with the intent of increasing the practitioner's ability to understand, remember, and apply the DDPs
     • facilitate the identification of commonly used entities
       • thereby providing a greater potential for improving design correctness with the initial model
     • are common across many dimensional models
       • thus reusability is improved and design time may be decreased

  15. Mental Models for DDPs
     • Using a story as the basis for Domain DDPs
       • Who: the characters involved in the story
       • What: the important entities and the ideas for those entities
       • When: a particular time frame involved
       • Where: the location / setting of the story
       • Why: the motivation or the reasons behind the story

  16. Domain DDPs
     • A high-level set of domains can then be constructed:
       • temporal (when)
       • location (where)
       • stakeholder (who)
       • action (what is done or accomplished)
       • object (what)
       • qualifier (why)

  17. Commonality of DDPs
     • The basic domains can apply to any story
     • Experience across many stories reveals commonalities
     • Individual stories may contain unique components
       • however, many of these components will take on similar patterns
       • despite the components having different names
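
The domain DDPs and their commonality can be illustrated by tagging the dimensions of two different stories with the same six domains. Both stories and every dimension name below are hypothetical; the point is only that differently named components fall into the same domain slots.

```python
from enum import Enum

class Domain(Enum):
    TEMPORAL = "when"
    LOCATION = "where"
    STAKEHOLDER = "who"
    ACTION = "what is done"
    OBJECT = "what"
    QUALIFIER = "why"

# Two hypothetical stories: dimension name -> domain.
retail_sale = {
    "sale_date": Domain.TEMPORAL,
    "store": Domain.LOCATION,
    "customer": Domain.STAKEHOLDER,
    "purchase": Domain.ACTION,
    "product": Domain.OBJECT,
    "promotion": Domain.QUALIFIER,
}
course_enrolment = {
    "term": Domain.TEMPORAL,
    "campus": Domain.LOCATION,
    "student": Domain.STAKEHOLDER,
    "registration": Domain.ACTION,
    "course": Domain.OBJECT,
    "program_requirement": Domain.QUALIFIER,
}

def by_domain(story):
    """Invert a story so each domain points to the component that fills it."""
    return {domain: name for name, domain in story.items()}

def uncovered(story):
    """Domains the story does not (yet) fill."""
    return [d.name for d in Domain if d not in story.values()]

retail, enrol = by_domain(retail_sale), by_domain(course_enrolment)
for domain in Domain:
    print(f"{domain.name:<12} {retail[domain]:<12} {enrol[domain]}")
print(uncovered(retail_sale), uncovered(course_enrolment))  # [] []
```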