1 / 13

20.5 Data Cubes

20.5 Data Cubes. Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222. Outline. Introduction The Cube Operator Aggregation Cube implementation by Materialized Views Materialized Views Queries using materialized view The Lattice of Views Lattice of a materialized view. Introduction.

osmond
Télécharger la présentation

20.5 Data Cubes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 20.5Data Cubes Instructor : Dr. T.Y. Lin Chandrika Satyavolu 222

  2. Outline • Introduction • The Cube Operator • Aggregation • Cube implementation by Materialized Views • Materialized Views • Queries using materialized view • The Lattice of Views • Lattice of a materialized view

  3. Introduction • A formal data cube precomputes all possible aggregates in a systematic way. • Amount of extra storage required is tolerable. • If the warehoused data does not change, there is no penalty incurred to keep the aggregates up-to-date.

  4. The Cube Operator • Given : a fact table F, an augmented table would be CUBE(F) and it adds a new value ‘*’, to each dimension. • * means any. It aggregates dimension in which it appears. • Border represents only a small addition to the volume of the CUBE. • A tuple of CUBE(F) that has ‘*’ in one or more dimension will have sum of values of that dependent attribute in all tuples in that dimension.

  5. Aggregation Example of aggregation : Given fact table: • Sales (serialNo, date, dealer, price) • serialNo is a key, so aggregating over all dates, dealers or prices will still give just one result with that serialNo • Replace serialNo by two attributes : model and color (both non-key) • It could give total sales price for • a given model • a given color • a given dealer • a given date • To get average sales, add a new dimension cnt and replace price with val i.e. sum of prices over a category • Relation Sales now becomes: Sales (model, color, date, dealer, val, cnt)

  6. Example of a tuple in both Sales and CUBE(Sales) would be: (‘Gobi’, ‘red’, ‘2001-05-21’, ‘Friendly-Fred’, 45000, 2) • Aggregating over all colors: (‘Gobi’, *, ‘2001-05-21’, ‘Friendly-Fred’, 152000, 7) • Aggregating over all colors and dealers: (‘Gobi’, *, ‘2001-05-21’, *, 2348000, 100) • Aggregating over all colors, times and dealers: (‘Gobi’, *, *, *, 1339800000, 58000) • Aggregating over all colors, times, dealers and models: (*, *, *, *, 3521727000, 198000) Aggregating over all dimensions tells us that the total sales of all models in all colors, over all time at all dealers is 198000 cars for a total price of $3521727000

  7. Query and Tuple: • If the query specifies a value v for attribute a, then the tuple t has v in its component for a. • If the query groups by an attribute a, then t has a non-* value in its component for a. • If the query neither groups by an attribute a nor specifies a value a, then t has a * value in its component for a. • Example: • Query: SELECT color, AVG(price) FROM Sales WHERE model = ‘Gobi’ GROUP BY color; • Tuple: (‘Gobi’, c, *, *, v, n)

  8. Cube implementation by Materialized Views • Until now, we assumed that queries choose to aggregate completely in a dimension or not to aggregate at all. • Some dimensions can have many degrees of granularity. Example: • days can be aggregated by weeks, months, quarters or years • As number of choices of grouping increases, cost of storing the results increases too. • Answer to this is : chose some materialized views of the data cube.

  9. Materialized Views • A materialized view is the result of some query, which we choose to store in the database, rather than reconstructing it as needed in response to queries. • Coarser the partition, the less space materialized view takes. • To use a view to answer a query, the view must not partition any more coarser than the query does. • Example: Group dates by months and dealers by city • INSERT INTO SalesV1 SELECT model, color, month, city, SUM(val) AS val, SUM(cnt) AS cnt FROM Sales JOIN Dealers on dealer = name GROUP BY model, color, month, city; • Example: Group dates by weeks and dealers by state and over all colors • INSERT INTO SalesV2 SELECT model, week, state, SUM(val) AS val, SUM(cnt) AS cnt FROM Sales JOIN Dealers on dealer = name GROUP BY model, week, state;

  10. Queries using materialized view Q1 : SELECT model, SUM(val) FROM Sales GROUP BY model; can be answered by: SELECT model, SUM(val) FROM SalesV1 GROUP BY model; or SELECT model, SUM(val) FROM SalesV2 GROUP BY model; Q2 : SELECT model, year, state, SUM(val) FROM Sales JOIN Dealers ON dealer = name GROUP BY model, year, state; can only be answered by: SELECT model, year, state, SUM(val) FROM SalesV1 GROUP BY model, year, state; Q3 : SELECT model, color, date, SUM(val) FROM Sales GROUP BY model, color, date; can be answered by neither SalesV1 nor SalesV2.

  11. The Lattice of Views • Lattice arranges groupings of similar dimensions in a hierarchical order. • Points of lattice are the ways we can partition a dimension. • If partition P1 is below partition P2 (P1≤P2), if each group of P1 is contained within some group of P2. • Examples of lattice: Lattice of partitions on time interval Lattice of partitions on dealers

  12. Lattice of a materialized view • If V1 and V2 are two different views, then V1≤ V2 means in each dimension, the partition P1 from view V1 is at least as fine as partition P2 from view V2; that is P1≤P2. • General rule: We can answer a query Q using view V if and only if V ≤ Q. Example of view lattice:

  13. Thank You.

More Related