1 / 28

Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7

Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7. Client. Client. Query & Analysis. Metadata. Warehouse. Integration. Source. Source. Source. Processing. ROLAP servers vs. MOLAP servers Index Structures Cube computation What to Materialize? Algorithms. ROLAP

zytka
Télécharger la présentation

Cube Computation and Indexes for Data Warehouses CPS 196.03 Notes 7

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cube Computation and Indexes for Data Warehouses CPS 196.03Notes 7

  2. Client Client Query & Analysis Metadata Warehouse Integration Source Source Source Processing • ROLAP servers vs. MOLAP servers • Index Structures • Cube computation • What to Materialize? • Algorithms

  3. ROLAP server utilities relational DBMS ROLAP Server • Relational OLAP Server tools Special indices, tuning; Schema is “denormalized”

  4. Sales City B A milk soda eggs soap Product 1 2 3 4 Date utilities MOLAP Server • Multi-Dimensional OLAP Server M.D. tools multi-dimensional server could also sit on relational DBMS

  5. Date 2Qtr 1Qtr sum 3Qtr 4Qtr TV Product U.S.A PC VCR sum Canada Country Mexico sum All, All, All MOLAP Total annual sales of TV in U.S.A.

  6. C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A MOLAP B

  7. Challenges in MOLAP • Storing large arrays for efficient access • Row-major, column major • Chunking • Compressing sparse arrays • Creating array data from data in tables • Efficient techniques for Cube computation Topics are discussed in the paper for reading

  8. Index Structures • Traditional Access Methods • B-trees, hash tables, R-trees, grids, … • Popular in Warehouses • inverted lists • bit map indexes • join indexes • text indexes

  9. Inverted Lists . . . data records inverted lists age index

  10. Using Inverted Lists • Query: • Get people with age = 20 and name = “fred” • List for age = 20: r4, r18, r34, r35 • List for name = “fred”: r18, r52 • Answer is intersection: r18

  11. Bit Maps . . . age index data records bit maps

  12. Bitmap Index • Index on a particular column • Each value in the column has a bit vector: bit-op is fast • The length of the bit vector: # of records in the base table • The i-th bit is set if the i-th row of the base table has the value for the indexed column • not suitable for high cardinality domains Base table Index on Region Index on Type

  13. Using Bit Maps • Query: • Get people with age = 20 and name = “fred” • List for age = 20: 1101100000 • List for name = “fred”: 0100000001 • Answer is intersection: 010000000000 • Good if domain cardinality small • Bit vectors can be compressed

  14. Join • “Combine” SALE, PRODUCT relations • In SQL: SELECT * FROM SALE, PRODUCT WHERE ...

  15. Join Indexes join index

  16. Cube Computation for Data Warehouses

  17. Counting Exercise • How many cuboids are there in a cube? • The full or nothing case • When dimension hierarchies are present • What is the size of each cuboid?

  18. day 2 day 1 Lattice of Cuboids 129 all city product date city, product city, date product, date city, product, date

  19. Dimension Hierarchies all state city

  20. Dimension Hierarchies all product city date product, date city, product city, date state city, product, date state, date state, product state, product, date not all arcs shown...

  21. Efficient Data Cube Computation • Data cube can be viewed as a lattice of cuboids • The bottom-most cuboid is the base cuboid • The top-most cuboid (apex) contains only one cell • How many cuboids in an n-dimensional cube with L levels? • Materialization of data cube • Materialize every (cuboid) (full materialization), none (no materialization), or some (partial materialization) • Selection of which cuboids to materialize • Based on size, sharing, access frequency, etc.

  22. Derived Data • Derived Warehouse Data • indexes • aggregates • materialized views (next slide) • When to update derived data? • Incremental vs. refresh

  23. does not exist at any source Idea of Materialized Views • Define new warehouse tables/arrays

  24. Efficient OLAP Processing • Determine which operations should be performed on available cuboids • Transform drill, roll, etc. into corresponding SQL and/or OLAP operations, e.g., dice = selection + projection • Determine which materialized cuboid(s) should be selected for OLAP: • Let the query to be processed be on {brand, province_or_state} with the condition “year = 2004”, and there are 4 materialized cuboids available: 1) {year, item_name, city} 2) {year, brand, country} 3) {year, brand, province_or_state} 4) {item_name, province_or_state} where year = 2004 Which should be selected to process the query? • Explore indexing structures & compressed vs. dense arrays in MOLAP

  25. What to Materialize? • Store in warehouse results useful for common queries • Example: total sales day 2 . . . day 1 129 materialize

  26. Materialization Factors • Type/frequency of queries • Query response time • Storage cost • Update cost Will study a concrete algorithm later

  27. Iceberg Cube • Computing only the cuboid cells whose count or other aggregates satisfying the condition like HAVING COUNT(*) >= minsup • Motivation • Only a small portion of cube cells may be “above the water’’ in a sparse cube • Only calculate “interesting” cells—data above certain threshold

  28. Challenges in MOLAP • Storing large arrays for efficient access • Row-major, column major • Chunking • Compressing sparse arrays • Creating array data from data in tables • Efficient techniques for Cube computation Topics are discussed in the paper for reading

More Related