1. M.Tech.(CS) By Research 1st Sem Seminar Title of the Project : Computing & Querying Data Cube A Parallel Approach.
Supervisors : Dr.A.K. Ramani (Prof. & Head, SCS, DAV Indore )
Dr. B.S.Panda (Associate Professor, Maths Department ,IIT Delhi)
2. Overview Traditional database systems are tuned to many small, simple queries.
Some new applications use fewer, more time-consuming, analytic queries.
New architectures have been developed to handle analytic queries efficiently.
3. Background DSS (Decision Support System)
Gain competitiveness for business
Data warehouse
Maintain historical information
Use Data Cube to summarize results
Identify trends
Performance issue (time and space)
Need to reuse result (materialization of view)
4. The Data Warehouse The most common form of Data integration.
Copy sources into a single DB (warehouse) and try to keep it up-to-date.
Usual method: periodic reconstruction of the warehouse, perhaps overnight.
Frequently essential for analytic queries.
5. OLTP Most Database operations involve Online Transaction Processing (OLTP).
Short, simple, frequent queries and/or modifications, each involving a small number of tuples.
Examples: Answering queries from web interface, sales at cash registers, selling airline tickets.
6. OLAP On-Line Analytical Processing (OLAP, or analytic) queries are, typically:
Few, but complex queries --- may run for hours.
Queries do not depend on having an absolutely up-to-date database.
7. OLAP Examples Amazon analyses purchases by its customers to come up with an individual screen with products of likely interest to the customer.
Analysts at Wal-Mart look for items with increasing sales in some region.
8. Common Architecture Databases at store branches handle OLTP.
Local store databases copied to a central warehouse overnight.
Analysts use the warehouse for OLAP.
9. ROLLUP & CUBE The ROLLUP operator delivers aggregates and superaggregates within a GROUP BY.
Used by report writers to extract statistics and summary information from result sets.
The cumulative aggregates can be used in reports, charts and graphs.
Without ROLLUP, subtotals can be produced with UNION ALL. For n grouping columns this takes n+1 SELECT statements.
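The UNION ALL workaround can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the sales table, its columns and its rows are hypothetical and not taken from the slides.

```python
import sqlite3

# Hypothetical sales table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "pen", 10), ("East", "ink", 5), ("West", "pen", 7)],
)

# n = 2 grouping columns, so n + 1 = 3 SELECT statements:
# (region, product), then (region), then the grand total.
# NULL marks a column that has been rolled up.
query = """
SELECT region, product, SUM(amount) FROM sales GROUP BY region, product
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
"""
rollup_rows = conn.execute(query).fetchall()
for row in rollup_rows:
    print(row)
```

Each extra grouping column adds one more SELECT, which is the bookkeeping ROLLUP removes.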
10. Introduction of Datacube Datacube
Dimensionality (number of GROUP-BYs)
Aggregated data: Values in each cell
Dimension of datacube: Detail of summary
Higher Dimension: Higher detail
Common Operations
Drill down: Look in more detail
Roll Up: Look in less detail
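Drill down and roll up can be sketched on a tiny fact table. The rows, dimension names and helper functions below are hypothetical, chosen only to show that drilling down adds a grouping dimension and rolling up removes one.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, units sold).
DIMENSIONS = ("year", "region", "product")
facts = [
    (2023, "East", "pen", 10),
    (2023, "East", "ink", 5),
    (2023, "West", "pen", 7),
    (2024, "East", "pen", 4),
]

def group_by(rows, dims):
    """Sum units over the given dimensions (one GROUP BY)."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)

def roll_up(summary, keep_positions):
    """Collapse an existing summary onto fewer key positions."""
    totals = defaultdict(int)
    for key, value in summary.items():
        totals[tuple(key[i] for i in keep_positions)] += value
    return dict(totals)

by_year = group_by(facts, ("year",))                  # less detail
by_year_region = group_by(facts, ("year", "region"))  # drill down: add region
rolled_up = roll_up(by_year_region, [0])              # roll up: drop region again
print(by_year_region)
print(rolled_up)
```

Rolling up from the finer summary gives the same totals as grouping the raw rows by year.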
11. Definition of Datacube Users of DSS often see data in the form of Data Cubes.
A Cube can be 2 dimensional,
3 dimensional,or higher dimensional
A Data cube is defined to be data abstraction that allows one to view aggregated data from a number of perspectives or views.
12. Our Problem Physically materialize the whole data cube
Best query response
Heavy pre-computing, large storage space
i.e. time efficient but space inefficient
Materialize nothing
Worst query response
Dynamic query evaluation, less storage space
i.e. space efficient but time inefficient
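The two extremes can be compared in a short sketch. It assumes a hypothetical fact table and SUM as the aggregate: the full cube stores all 2^n group-bys up front, while the on-demand path stores nothing and rescans the facts for every query.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, units sold).
DIMENSIONS = ("year", "region", "product")
facts = [
    (2023, "East", "pen", 10),
    (2023, "East", "ink", 5),
    (2023, "West", "pen", 7),
    (2024, "East", "pen", 4),
]

def aggregate(rows, dims):
    """Sum units over the given dimensions by scanning the raw rows."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[-1]
    return dict(totals)

def materialize_full_cube(rows):
    """Precompute every one of the 2^n group-bys."""
    cube = {}
    for k in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, k):
            cube[dims] = aggregate(rows, dims)
    return cube

# Materialize everything: a query is a lookup, but all cells are stored.
full_cube = materialize_full_cube(facts)
stored_cells = sum(len(cells) for cells in full_cube.values())
from_cube = full_cube[("region",)][("East",)]

# Materialize nothing: no extra storage, but every query rescans the facts.
on_demand = aggregate(facts, ("region",))[("East",)]
print(len(full_cube), stored_cells, from_cube, on_demand)
```

Even with four fact rows the full cube holds several times as many cells as the base data, which is the storage cost the slide refers to.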
13. Problem on materialized views Materialize only part of the data cube
Balance the storage space and response
What is the best subset to materialize?
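One known way to pick such a subset is a greedy, benefit-based selection in the spirit of the heuristic described by Harinarayan, Rajaraman, and Ullman. The sketch below is a simplified illustration, not the project's method: the view sizes are hypothetical row counts, and the cost of a query is taken to be the size of the smallest materialized view that can answer it.

```python
DIMENSIONS = ("year", "region", "product")

# Hypothetical row counts for each of the 2^3 views; not measured data.
SIZES = {
    frozenset(): 1,
    frozenset({"year"}): 10,
    frozenset({"region"}): 50,
    frozenset({"product"}): 1000,
    frozenset({"year", "region"}): 400,
    frozenset({"year", "product"}): 8000,
    frozenset({"region", "product"}): 30000,
    frozenset(DIMENSIONS): 100000,
}

def query_cost(view, materialized):
    """Rows scanned to answer `view`: its smallest materialized ancestor."""
    return min(SIZES[m] for m in materialized if view <= m)

def benefit(candidate, materialized):
    """Total rows saved over all views that `candidate` can answer."""
    return sum(
        max(0, query_cost(view, materialized) - SIZES[candidate])
        for view in SIZES
        if view <= candidate
    )

def greedy_select(k):
    """Pick k views, each time taking the one with the largest benefit."""
    materialized = [frozenset(DIMENSIONS)]  # the base cuboid is always stored
    for _ in range(k):
        candidates = [v for v in SIZES if v not in materialized]
        best = max(candidates, key=lambda v: benefit(v, materialized))
        materialized.append(best)
    return materialized

chosen = greedy_select(2)
for view in chosen:
    print(sorted(view), SIZES[view])
```

With these assumed sizes the first pick is (year, region), because it cheaply answers four views; the second pick then favours a view that covers queries the first one cannot.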
14. Data? View? We use the data cube to model aggregate data
So what do we use to model views?
Lattice
15. Lattice Multidimensional data can be viewed as a lattice of cuboids.
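The lattice can be sketched as a small graph. In this illustration, which uses hypothetical dimension names, each cuboid is a tuple of dimensions and points to the cuboids obtained by dropping exactly one of them.

```python
from itertools import combinations

DIMENSIONS = ("year", "region", "product")  # hypothetical dimension names

def build_lattice(dimensions):
    """Map each cuboid to the cuboids reached by dropping one dimension."""
    lattice = {}
    for k in range(len(dimensions), -1, -1):
        for dims in combinations(dimensions, k):
            lattice[dims] = [
                tuple(d for d in dims if d != dropped) for dropped in dims
            ]
    return lattice

lattice = build_lattice(DIMENSIONS)
for cuboid, children in lattice.items():
    print(cuboid or "()", "->", children)
```

For n dimensions the lattice has 2^n cuboids, from the base cuboid with all dimensions at the top down to the empty cuboid (the grand total) at the bottom.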
16. Data Cube Technology Sequential Computation
PipeSort algorithm: developed by Sunita Sarawagi and others of the IBM Research Center
Other algorithms: PipeHash and array-based
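A common idea in sequential cube computation is that a cuboid need not be computed from the raw data: it can be aggregated from any already computed parent cuboid that contains its dimensions, and a smaller parent means less work. The sketch below shows only that idea, with hypothetical data and helper names; it does not reproduce PipeSort or PipeHash.

```python
from collections import defaultdict

def aggregate_from_parent(parent, parent_dims, child_dims):
    """Build a child cuboid by summing the cells of a parent cuboid."""
    positions = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(int)
    for key, total in parent.items():
        child[tuple(key[i] for i in positions)] += total
    return dict(child)

def smallest_parent(child_dims, computed):
    """Among computed cuboids that contain child_dims, pick the fewest cells."""
    candidates = [d for d in computed if set(child_dims) < set(d)]
    return min(candidates, key=lambda d: len(computed[d]))

# Hypothetical base cuboid keyed by (year, region, product).
base_dims = ("year", "region", "product")
base = {
    (2023, "East", "pen"): 10,
    (2023, "East", "ink"): 5,
    (2023, "West", "pen"): 7,
    (2024, "East", "pen"): 4,
}
computed = {base_dims: base}
computed[("year", "region")] = aggregate_from_parent(
    base, base_dims, ("year", "region")
)

# (year) is built from (year, region), which has fewer cells than the base.
parent_dims = smallest_parent(("year",), computed)
by_year = aggregate_from_parent(computed[parent_dims], parent_dims, ("year",))
print(parent_dims, by_year)
```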
17. Parallel Computation As organisations' data volumes have grown rapidly, researchers have put increasing effort into parallel solutions to the problem.
The algorithm uses a cluster-based approach: multiple processors, each with its own memory and processing capability, are grouped together and perform computations simultaneously for faster results.
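The slides do not spell out the algorithm, so the following is only a generic partition-and-merge sketch under stated assumptions: the fact rows are hypothetical, the aggregate is SUM, and threads stand in for cluster nodes. Python threads do not give a real speed-up for this kind of work; the point is the structure, in which each worker aggregates its own partition and the partial results are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact rows: (year, region, product, units sold).
facts = [
    (2023, "East", "pen", 10),
    (2023, "East", "ink", 5),
    (2023, "West", "pen", 7),
    (2024, "East", "pen", 4),
]

def partial_aggregate(chunk, positions):
    """Work done by one node: aggregate only its own partition."""
    partial = Counter()
    for row in chunk:
        partial[tuple(row[i] for i in positions)] += row[-1]
    return partial

def parallel_group_by(rows, positions, workers=2):
    """Partition the rows, aggregate each part, then merge the partial sums."""
    chunks = [rows[i::workers] for i in range(workers)]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(partial_aggregate, chunks, [positions] * workers):
            merged.update(partial)  # SUM is distributive, so partials add up
    return dict(merged)

by_region = parallel_group_by(facts, positions=[1])
print(by_region)
```

The merge step works because SUM can be computed from partial sums; aggregates such as median would need a different scheme.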
18. References 1) S. Chaudhuri and K. Shim. Including Group-By in Query Optimization. In Proceedings of the Twentieth International Conference on Very Large Databases (VLDB)
2) S. Chaudhuri et al. An Overview of Data Warehousing and OLAP Technology. ACM SIGMOD Record
19. References 3) J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. In Proceedings of the 12th Intl. Conference on Data Engineering
4) V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of the ACM SIGMOD Conference on Management of Data