
Aggregate management




  1. Aggregate management
     References:
     Lawrence Corr
     • Aggregate improvement, www.intelligententerprise.com/011004/415warehouse1_1.jhtml
     • Lost, shrunken, and collapsed, www.intelligententerprise.com/020101/501warehouse1_1.jhtml
     Ralph Kimball
     • Aggregate navigation with (almost) no metadata, www.dbmsmag.com/9608d54.html
     ACS-4904 Ron McFadyen

  2. Aggregate Navigation
     From “20 criteria for dimensional friendly systems”:
     #4. Open Aggregate Navigation. The system uses physically stored aggregates as a way to enhance performance of common queries. These aggregates, like indexes, are chosen silently by the database if they are physically present. End users and application developers do not need to know what aggregates are available at any point in time, and applications are not required to explicitly code the name of an aggregate. All query processes accessing the data, even those from different application vendors, realize the full benefit of aggregate navigation.

  3. Aggregation
     • Created for performance reasons
     • Using one base star schema, many aggregates can be defined
     • Some dimensions could be lost, some shrunken or collapsed – see Corr’s articles
     Consider the transaction-level schema:
     [Diagram: fact table with measures Price, Cost, Tax, Qty; dimensions Customer, Date, Product, Transaction, Store]

  4. Aggregation
     Transaction-level schema:
     [Diagram: fact table with measures Price, Cost, Tax, Qty; dimensions Customer, Date, Product, Transaction, Store]
     Aggregate schema:
     [Diagram: fact table with measures Price, Cost, Tax, Qty; dimensions Month, Product, Store]
     • The aggregate schema has:
       • lost dimensions (Transaction, Customer)
       • a shrunken dimension (Month, derived from Date)
       • metrics accumulated over all transactions and customers within each month

  5. Aggregation
     • Suppose you must create the schema:
       [Diagram: fact table with measures Price, Cost, Tax, Qty; dimensions Month, Product, Store]
     • What are the steps to do so?
     • Product and Store already exist, but the others must be created
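
One possible sequence of steps for the schema on this slide is sketched below: first derive the Month dimension from Date, then build the aggregate fact table by grouping the transaction-level facts. It uses Python's sqlite3 module so that it runs as-is. All table and column names (date_dim, month_dim, sales_fact, sales_month_fact) and the sample rows are assumptions made for illustration, and "price" is treated as an additive amount so that it can be summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base tables; Product and Store are assumed to exist already.
CREATE TABLE date_dim (date_sk INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE sales_fact (
    date_sk INTEGER, product_sk INTEGER, store_sk INTEGER,
    customer_sk INTEGER, transaction_sk INTEGER,
    price REAL, cost REAL, tax REAL, qty INTEGER);
INSERT INTO date_dim VALUES (1, '2001-10-04', '2001-10'),
                            (2, '2001-10-05', '2001-10'),
                            (3, '2001-11-01', '2001-11');
INSERT INTO sales_fact VALUES
    (1, 10, 100, 7, 1, 5.0, 3.0, 0.5, 1),
    (2, 10, 100, 8, 2, 5.0, 3.0, 0.5, 2),
    (3, 10, 100, 7, 3, 6.0, 3.5, 0.6, 1);

-- Step 1: create the shrunken Month dimension from Date.
CREATE TABLE month_dim (
    month_sk INTEGER PRIMARY KEY AUTOINCREMENT, month TEXT UNIQUE);
INSERT INTO month_dim (month)
    SELECT DISTINCT month FROM date_dim ORDER BY month;

-- Step 2: create the aggregate fact table. Customer and Transaction are
-- simply left out (lost dimensions); Date is replaced by Month.
CREATE TABLE sales_month_fact AS
    SELECT m.month_sk, f.product_sk, f.store_sk,
           SUM(f.price) AS price, SUM(f.cost) AS cost,
           SUM(f.tax) AS tax, SUM(f.qty) AS qty
    FROM sales_fact f
    JOIN date_dim d ON d.date_sk = f.date_sk
    JOIN month_dim m ON m.month = d.month
    GROUP BY m.month_sk, f.product_sk, f.store_sk;
""")

rows = conn.execute(
    "SELECT month_sk, product_sk, store_sk, price, qty "
    "FROM sales_month_fact ORDER BY month_sk").fetchall()
print(rows)
```

The three transaction rows collapse into two aggregate rows, one per month, product, and store combination.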

  6. Aggregation Design Goal 1
     • Aggregates must be stored in their own fact tables; each distinct aggregation level must occupy its own unique fact table

  7. Aggregation Design Goal 2
     • The dimension tables attached to the aggregate fact tables must be shrunken versions of the dimension tables associated with the base fact table
     Example: Category is a shrunken version of Product; its attribute names come from Product.
     [Diagram: Product (ProductSK, SKU, Description, Brand, Category, Department); Category (SK?, Category, Department)]
     • What values does the PK of Category assume?
     • What name should the PK of Category have?
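
The slide leaves both questions about the Category key open. The sketch below shows one possible answer, offered as an assumption rather than as the slide's intended one: the key values are freshly generated surrogate integers, and the key column keeps the base dimension's name (product_sk here) so that a query can be redirected by changing table names only. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_sk INTEGER PRIMARY KEY, sku TEXT, description TEXT,
    brand TEXT, category TEXT, department TEXT);
INSERT INTO product VALUES
    (1, 'A1', 'Cola 355ml',  'FizzCo',  'Soft drinks', 'Beverages'),
    (2, 'A2', 'Lemon 355ml', 'PopCo',   'Soft drinks', 'Beverages'),
    (3, 'B1', 'Salted 200g', 'Crunchy', 'Chips',       'Snacks');

-- Shrunken dimension: only the attributes that still make sense at the
-- category level, with the same attribute names as in product.
-- Assumption: the key is a new surrogate but is still called product_sk.
CREATE TABLE category (
    product_sk INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT, department TEXT,
    UNIQUE (category, department));
INSERT INTO category (category, department)
    SELECT DISTINCT category, department
    FROM product ORDER BY department, category;
""")

cats = conn.execute(
    "SELECT product_sk, category, department "
    "FROM category ORDER BY product_sk").fetchall()
print(cats)
```

Three product rows shrink to two category rows, and SKU, Description, and Brand disappear because they are not defined at the category level.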

  8. Aggregation Design Goal 3
     • The base atomic fact table and all of its related aggregate fact tables must be associated together as a “family of schemas” so that the aggregate navigator knows which tables are related to one another.
     • Kimball’s paper presents an algorithm/technique for mapping an SQL statement from a base schema to an aggregate schema

  9. Aggregation Design Goal 4
     • Force all SQL created by any end-user data access tool or application to refer exclusively to the base fact table and its associated full-size dimension tables.

  10. Kimball’s Aggregate Navigation Algorithm
     1. For any given SQL statement presented to the DBMS, find the smallest fact table that has not yet been examined in the family of schemas referenced by the query. "Smallest" in this case means the least number of rows. Finding the smallest unexamined schema is a simple lookup in the aggregate navigator metadata table. Choose the smallest schema and proceed to step 2.
     2. Compare the table fields in the SQL statement to the table fields in the particular Fact and Dimension tables being examined. This is a series of lookups in the DBMS system catalog. If all of the fields in the SQL statement can be found in the Fact and Dimension tables being examined, alter the original SQL by simply substituting destination table names for original table names. No field names need to change. If any field in the SQL statement cannot be found in the current Fact and Dimension tables, then go back to step 1 and find the next larger Fact table.
     3. Run the altered SQL. It is guaranteed to return the correct answer because all of the fields in the SQL statement are present in the chosen schema.
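
The three steps above can be sketched in a few lines of Python with sqlite3. This is a simplified illustration under stated assumptions, not Kimball's implementation: the FAMILY list stands in for the navigator metadata table, row counts are obtained with COUNT(*) instead of a metadata lookup, the catalog lookup uses SQLite's PRAGMA table_info, and queries must reference every field as table.column with no aliases or dotted string literals. All table names and data are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_sk INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE sales_fact (product_sk INTEGER, qty INTEGER);
CREATE TABLE category (product_sk INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_category_fact (product_sk INTEGER, qty INTEGER);
INSERT INTO product VALUES
    (1, 'FizzCo', 'Soft drinks'), (2, 'PopCo', 'Soft drinks'), (3, 'Crunchy', 'Chips');
INSERT INTO sales_fact VALUES (1, 2), (2, 3), (3, 1), (1, 4);
INSERT INTO category VALUES (1, 'Soft drinks'), (2, 'Chips');
INSERT INTO sales_category_fact VALUES (1, 9), (2, 1);
""")

BASE_FACT = "sales_fact"
# Hypothetical family of schemas: each entry maps a base table name to the
# table that plays the same role in that schema.
FAMILY = [
    {"sales_fact": "sales_fact", "product": "product"},            # base
    {"sales_fact": "sales_category_fact", "product": "category"},  # aggregate
]
QUALIFIED = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def columns(conn, table):
    """Catalog lookup: the set of column names of one table."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def row_count(conn, table):
    """Stand-in for the row count a real navigator would keep in metadata."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def navigate(conn, sql):
    """Return sql redirected to the smallest schema that has every field."""
    refs = QUALIFIED.findall(sql)
    # Step 1: examine schemas from the smallest fact table upward.
    for schema in sorted(FAMILY, key=lambda s: row_count(conn, s[BASE_FACT])):
        # Step 2: every referenced field must exist in the candidate tables.
        if all(t in schema and c in columns(conn, schema[t]) for t, c in refs):
            names = "|".join(map(re.escape, schema))
            # Substitute table names only; field names stay unchanged.
            return re.sub(rf"\b({names})\b", lambda m: schema[m.group(1)], sql)
    return sql  # nothing matched: leave the statement alone


by_category = ("SELECT product.category, SUM(sales_fact.qty) FROM sales_fact "
               "JOIN product ON sales_fact.product_sk = product.product_sk "
               "GROUP BY product.category ORDER BY product.category")
by_brand = by_category.replace("product.category", "product.brand")

rewritten = navigate(conn, by_category)
# Step 3: run the altered SQL.
print(rewritten)
print(conn.execute(rewritten).fetchall())
```

The category query is redirected to the aggregate tables, while the brand query falls back to the base schema because the shrunken category table has no brand column.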

  11. Aggregate Navigation
     www.intelligententerprise.com/000929/feat1.jhtml
     “Other data mart capabilities worth mentioning are aggregation and aggregate navigation. The nature of multidimensional analysis dictates the extensive use of aggregation. … Both UDB and O8i provide a database-resident means for navigating user queries to the optimum aggregated tables”

  12. Materialized Views
     • An MV is a view whose result set is pre-computed and stored in the database
     • Periodically an MV must be re-computed to account for updates to its source tables
     • MVs represent a performance enhancement to a DBMS, as the result set is immediately available when the view is referenced in an SQL statement
     • MVs are used in some cases to provide summaries or aggregates for data warehousing transparently to the end user
     • Query optimization is a DBMS feature that may rewrite an SQL statement supplied by a user
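
The store-then-refresh behaviour described above can be imitated in a small sketch. SQLite has no materialized views, so the example below emulates one with an ordinary table that is rebuilt on demand; the names sales, sales_by_store, and refresh_summary are assumptions made for illustration, and a DBMS with real MV support would handle the refresh itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (store TEXT, qty INTEGER);
INSERT INTO sales VALUES ('North', 2), ('North', 3), ('South', 1);
""")


def refresh_summary(conn):
    """Re-compute the stored result set from the source table."""
    conn.executescript("""
    DROP TABLE IF EXISTS sales_by_store;
    CREATE TABLE sales_by_store AS
        SELECT store, SUM(qty) AS qty FROM sales GROUP BY store;
    """)


def summary(conn):
    """Read the pre-computed result; no aggregation happens at query time."""
    return conn.execute(
        "SELECT store, qty FROM sales_by_store ORDER BY store").fetchall()


refresh_summary(conn)
before = summary(conn)

# An update to the source table is not visible until the next refresh.
conn.execute("INSERT INTO sales VALUES ('South', 4)")
stale = summary(conn)

refresh_summary(conn)
after = summary(conn)
print(before, stale, after)
```

The stale read between the insert and the second refresh is the reason an MV must be re-computed periodically.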

  13. Materialized Views
     Oracle example:
       CREATE MATERIALIZED VIEW hr.mview_employees AS
         SELECT employees.employee_id, employees.email
           FROM employees
         UNION ALL
         SELECT new_employees.employee_id, new_employees.email
           FROM new_employees;
     SQL Server has “indexed views” … see http://www.databasejournal.com/features/mssql/article.php/2119721
