Efficient Cube Computation Methods for Multi-way Array Aggregation

Cube Computation Prof. Navneet Goyal Computer Science & Information Systems Department BITS, Pilani

Cuboids Corresponding to the Cube all 0-D(apex) cuboid country product date 1-D cuboids product,date product,country date, country 2-D cuboids 3-D(base) cuboid product, date, country

Efficient Data Cube Computation • Data cube can be viewed as a lattice of cuboids • The bottom-most cuboid is the base cuboid • The top-most cuboid (apex) contains only one cell • How many cuboids in an n-dimensional cube with L levels? • Materialization of data cube • Materialize every (cuboid) (full materialization), none (no materialization), or some (partial materialization) • Selection of which cuboids to materialize • Based on size, sharing, access frequency, etc.

Cube Computation: ROLAP-Based Method • Efficient cube computation methods • ROLAP-based cubing algorithms (Agarwal et al’96) • Array-based cubing algorithm (Zhao et al’97) • Bottom-up computation method (Bayer & Ramarkrishnan’99) • ROLAP-based cubing algorithms • Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples • Grouping is performed on some subaggregates as a “partial grouping step” • Aggregates may be computed from previously computed aggregates, rather than from the base fact table

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Partition arrays into chunks (a small subcube which fits in memory). • Compressed sparse array addressing: (chunk_id, offset) • Compute aggregates in “multiway” by visiting cube cells in the order which minimizes the # of times to visit each cell, and reduces memory access and storage cost. What is the best traversing order to do multi-way aggregation?

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation Example:3-D data array containing 3 dimensions A, B, & C • Array is partitioned into small, memory-based chunks • 64 chunks • Dimension A is organized into 4 equi-sized partitions a0-a3 • Same for B & C • Chunks 1, 2,…,64 correspond to the subcubes a0b0c0, a1b0c0,…a3b3c3.

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation Example:3-D data array containing 3 dimensions A, B, & C • Cardinality of the dimensions: A-40, B-400, & C-4000 • Size of each partition in A, B, & C is therefore 10, 100, 1000.

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation Example:3-D data array containing 3 dimensions A, B, & C • Many possible orderings with which chunks can be read into memory • Suppose we want to computer b0c0 chunk of the BC cuboid • By scanning chunks 1-4 of ABC, the b0c0 chunk is computed • Cells for b0c0 are aggregated over a0 to a3.

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Chunk memory can now be assigned to the next chunk b1c0, which completes its aggregation after scanning the next 4 chunks of ABC: 5-8 • Continuing in this manner, the entire BC cuboid can be computed • ONLY 1 chunk of BC needs to be in memory for the computation of all chunks of BC

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • In computing entire BC cubiod, we will have examined all the 64 chunks • Is there a way to avoid having to rescan all of these chunks for the computation of other cuboids? • YES • MULTIWAY COMPUTATION OR SIMULTANEOUS AGGREGATION

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • For example, when chunk 1 (a0b0c0) is being scanned (say for the computation of the 2D chunk b0c0 of BC), all of the other 2D chunks relating to a0b0c0 can be simultaneoulsy computed. • That is, when a0b0c0 is being scanned, each of the three chunks, b0c0, a0c0, & a0b0, on the three 2D aggregation planes, BC, AC, & AB, should be computed then as well

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Multiway computation simultaneously aggregates to each of the 2D planes while a 3D chunk is in memory. • Largest 2D plane is BC (400*4000=1600000), then AC (40*4000=160000) & finally AB (40*400=16000). • Scan the chunks in the order shown below.

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation B b0c0 chunk

Multi-way Array Aggregation for Cube Computation C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 B 56 9 b2 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Suppose chunks are scanned in the order shown below • One chunk of the largest 2D plane BC is fully computed for each row scanned • b0co is fully aggregated after scanning 1-4 • Similarly b1co is fully aggregated after scanning 5-8 & so on • Complete computation of 1 chunk of 2nd largest plane AC, requires 13 chunks, given the ordering 1-64

C c3 61 62 63 64 c2 45 46 47 48 c1 29 30 31 32 c 0 B 60 13 14 15 16 b3 44 28 56 9 b2 B 40 24 52 5 b1 36 20 1 2 3 4 b0 a0 a1 a2 a3 A Multi-way Array Aggregation for Cube Computation • Complete computation of 1 chunk of 2nd largest plane AC, requires 13 chunks, given the ordering 1-64 • a0c0 is fully aggregated after scanning of 1,5,9, & 13. • For smallest plane AB, the chunk a0b0 requires scanning 49 chunks. ( 1,17,33, & 49) • NOTE that AB requires the longest scan of chunks

Multi-Way Array Aggregation for Cube Computation (Cont.) • Method: the planes should be sorted and computed according to their size in ascending order. • Idea: keep the smallest plane in the main memory, fetch and compute only one chunk at a time for the largest plane • Limitation of the method: computing well only for a small number of dimensions • If there are a large number of dimensions, “bottom-up computation” and iceberg cube computation methods can be explored

Multi-Way Array Aggregation for Cube Computation (Cont.) BEST (156000 memory units) WORST (1641000 memory units)

References • S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. In Proc. 1996 Int. Conf. Very Large Data Bases, 506-521, Bombay, India, Sept. 1996. • K. Beyer and R. Ramakrishnan. Bottom-Up Computation of Sparse and Iceberg CUBEs. In Proc. 1999 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'99), 359-370, Philadelphia, PA, June 1999. • V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data, pages 205-216, Montreal, Canada, June 1996. • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In Proc. 1997 ACM-SIGMOD Int. Conf. Management of Data, 159-170, Tucson, Arizona, May 1997.

Q & A

Thank You

Efficient Cube Computation Methods for Multi-way Array Aggregation