
Approximate Query Processing using Wavelets

Approximate Query Processing using Wavelets. Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc). Presented at the 26th VLDB Conference, Cairo, Egypt. Presented by Supriya Sudheendra.


Presentation Transcript


  1. Approximate Query Processing using Wavelets • Kaushik Chakrabarti (Univ. of Illinois), Minos Garofalakis (Bell Labs), Rajeev Rastogi (Bell Labs), Kyuseok Shim (KAIST and AITrc) • Presented at the 26th VLDB Conference, Cairo, Egypt • Presented by Supriya Sudheendra

  2. Outline

  3. Introduction • Approximate Query Processing is a viable solution for: • Huge amounts of data • High query complexities • Stringent response-time requirements • Decision Support Systems (DSS) • Support business and organizational decision-making activities • Help decision makers compile useful information from raw data, solve problems and make decisions

  4. Introduction… • DSS users pose very complex queries to the DBMS • These require complex operations over GBs or TBs of disk-resident data • Exact answers take a very long time to execute and produce • In a number of scenarios, users prefer fast, approximate answers

  5. Prior Work • Previous Approximate query processing techniques • Focused on specific forms of aggregate queries • Data reduction mechanism – how to obtain the synopses of data • Sampling-based Techniques • A join-operator on 2 uniform random samples results in a non-uniform sample having very few tuples • For non-aggregate queries, it produces a small subset of the exact answer which might be empty when joins are involved.

  6. Prior Work… • Histogram Based Techniques • Problematic for high-dimensional data • Storage overhead • High construction cost • Wavelet Based Techniques • Mathematical tool for hierarchical decomposition of functions • Apply wavelet decomposition to input data collection –> data synopsis • Avoids high construction costs and storage overhead

  7. Contribution of the Paper • Viability and effectiveness of wavelets as a generic tool for high-dimensional DSS • New, I/O-efficient wavelet decomposition algorithm for relational tables • Novel query processing algebra for wavelet-coefficient data synopses • Extensive experiments

  8. Background • Mathematical tool to hierarchically decompose functions • Coarse overall approximation together with detail coefficients that influence function at various scales • Haar wavelets are conceptually simple, fast to compute • Variety of applications like image editing and querying

  9. One-Dimensional Haar Wavelets • How to compute, given a data array: • Average the values together pairwise to get a “lower-resolution” representation of the data • Detail coefficients -> differences of the (second of the) averaged values from the computed pairwise average • Reconstruction of the data array is possible • Why detail coefficients?

  10. One-dimensional Haar Wavelets • Wavelet transform: the overall average followed by the detail coefficients in increasing order of resolution. Each entry is a wavelet coefficient • WA = [4, -2, 0, -1] • For vectors containing similar values, most detail coefficients have small values that can be eliminated • Eliminating them introduces only small errors
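
The pairwise averaging and differencing described on slides 9 and 10 can be sketched in Python as follows. This is a minimal illustration, not the presenters' code: the input array [2, 2, 5, 7] is an assumption, chosen only because it reproduces the transform WA = [4, -2, 0, -1] quoted on slide 10. The function names `haar_decompose`, `haar_reconstruct` and `threshold` are hypothetical, the input length is assumed to be a power of two, and `threshold` ranks coefficients by raw magnitude without any normalization.

```python
def haar_decompose(data):
    """Return [overall average, detail coefficients from coarsest to finest]."""
    averages = list(data)
    details = []  # accumulated detail coefficients, coarsest level first
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Detail = pairwise average minus the second value = (a - b) / 2.
        level_details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        details = level_details + details
    return averages + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose: expand each average with its detail coefficient."""
    values = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        level_details = coeffs[pos:pos + len(values)]
        pos += len(values)
        expanded = []
        for avg, det in zip(values, level_details):
            expanded.extend([avg + det, avg - det])
        values = expanded
    return values


def threshold(coeffs, keep):
    """Zero out all but the `keep` largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


wa = haar_decompose([2, 2, 5, 7])
print(wa)                                  # [4.0, -2.0, 0.0, -1.0]
print(haar_reconstruct(wa))                # [2.0, 2.0, 5.0, 7.0]
print(haar_reconstruct(threshold(wa, 2)))  # [2.0, 2.0, 6.0, 6.0]
```

Keeping only two of the four coefficients in this example changes the last two values from 5 and 7 to 6 and 6, which illustrates the small error introduced by dropping small detail coefficients.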

  11. One-dimensional Haar Wavelets • Overall average more important than any detail coefficient • To normalize the final entries of WA, each wavelet coefficient is divided by √(2^l) • l: level of resolution at which the coefficient appears (l = 0 is the coarsest level) • Normalized WA = [4, -2, 0, -1/√2]

  12. Multi-dimensional Haar Wavelets • Haar wavelets can be extended to multi-dimensional arrays • Standard Decomposition • Fix an ordering for the data dimensions (1, 2, …, d) • For each dimension k in that order, apply the complete 1-D wavelet transform to each 1-D row of array cells along dimension k
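
As a rough sketch of the standard decomposition for the two-dimensional case, the complete 1-D transform can be applied first to every row and then to every column of the result. The helper names `haar_1d` and `standard_decompose_2d` are hypothetical, the choice of rows as dimension 1 and columns as dimension 2 is an assumption, and both side lengths are assumed to be powers of two.

```python
def haar_1d(data):
    """Complete 1-D Haar transform: [average, details coarsest to finest]."""
    averages, details = list(data), []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def standard_decompose_2d(array):
    """Standard decomposition of a 2-D array given as a list of rows."""
    # Dimension 1: complete 1-D transform of every row.
    rows = [haar_1d(row) for row in array]
    # Dimension 2: complete 1-D transform of every column of that result.
    cols = [haar_1d(col) for col in zip(*rows)]
    # Transpose back so the output is again a list of rows.
    return [list(row) for row in zip(*cols)]


print(standard_decompose_2d([[1, 2], [3, 4]]))
# [[2.5, -0.5], [-1.0, 0.0]]
```

The entry at position [0][0] is the overall average of the input, here 2.5.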

  13. Nonstandard Decomposition • Alternates between dimensions during successive steps of pairwise averaging and differencing: one step is performed for each 1-D row of array cells along dimension k, for every dimension k • The process is then repeated recursively on the quadrant containing the averages across all dimensions

  14. Nonstandard Decomposition • Pairwise averaging and differencing for one positioning of the 2x2 box with root [2·i_1, 2·i_2] • Distribution of the results in the wavelet transform array • The process recurses on the lower-left quadrant of WA
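
The nonstandard decomposition on slides 13 and 14 can be sketched for a square 2-D array as follows. This is an illustrative sketch under stated assumptions rather than the paper's algorithm: the function name `nonstandard_decompose_2d` is hypothetical, the side length is assumed to be a power of two, and index [0][0] plays the role of the lower-left corner, so the averages are collected in the quadrant with the smallest indices.

```python
def nonstandard_decompose_2d(array):
    """Nonstandard Haar decomposition of a square 2-D array (list of rows)."""
    w = [[float(v) for v in row] for row in array]
    n = len(w)
    while n > 1:
        half = n // 2
        # One averaging/differencing step along every row of the active block.
        for r in range(n):
            row = w[r][:n]
            w[r][:half] = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(half)]
            w[r][half:n] = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(half)]
        # One averaging/differencing step along every column of the active block.
        for c in range(n):
            col = [w[r][c] for r in range(n)]
            for i in range(half):
                w[i][c] = (col[2 * i] + col[2 * i + 1]) / 2
                w[half + i][c] = (col[2 * i] - col[2 * i + 1]) / 2
        # Recurse on the quadrant that now holds the averages.
        n = half
    return w


grid = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
print(nonstandard_decompose_2d(grid)[0][0])  # 7.5, the overall average
```

Unlike the standard decomposition, each pass performs only a single level of averaging and differencing per dimension before moving on to the smaller quadrant of averages.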

  15. Example Decomposition of a 4 X 4 Array

  16. Multi-dimensional Haar coefficients: Semantics and Representation • D-dimensional Haar basis function corresponding to Wavelet w is defined by: • D-dimensional rectangular support region • Quadrant sign information

  17. Support Regions for 16 Nonstandard 2-D Haar Basis Functions • Blank areas – regions of A whose reconstruction is independent of the coefficient • WA[0,0] – overall average • WA[3,3] – contributes only to upper right quadrant

  18. Haar Coefficients: Semantics and Representation • W = <R, S, v> • W.R – d-dimensional support hyper-rectangle of W; encloses all cells in A to which W contributes • Hyper-rectangle – represented by low and high boundaries across each dimension j, 1 <= j <= d • W.R.boundary[j].lo and W.R.boundary[j].hi • W contributes to each data cell A[i_1, …, i_d] where W.R.boundary[j].lo <= i_j <= W.R.boundary[j].hi for all j

  19. W.S – sign information for all d-dimensional quadrants of W.R • Denoted by W.S.sign[j].lo and W.S.sign[j].hi corresponding to lower and upper half of W.R’s extent along j • Computed as the product of d sign-vector entries that map to that quadrant • W.v – scalar magnitude of W • Quantity that W contributes to all data array cells enclosed in W.R
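
The <R, S, v> representation from slides 18 and 19 can be sketched as a small Python data structure. The class name `Coefficient`, the method `contribution` and the helper `reconstruct_cell` are hypothetical, and the sketch assumes that the lower and upper halves of a support region split at the integer midpoint of its extent. The four coefficients in the usage example are a hand-built 1-D encoding of WA = [4, -2, 0, -1] over a four-cell array, which is also an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Coefficient:
    boundary: List[Tuple[int, int]]  # W.R: (lo, hi) per dimension
    sign: List[Tuple[int, int]]      # W.S: (sign of lower half, sign of upper half)
    value: float                     # W.v: scalar magnitude

    def contribution(self, cell):
        """Signed amount this coefficient adds to the given cell (0 if outside W.R)."""
        total_sign = 1
        for idx, (lo, hi), (s_lo, s_hi) in zip(cell, self.boundary, self.sign):
            if not lo <= idx <= hi:
                return 0.0
            mid = (lo + hi) // 2  # assumed split point between the two halves
            total_sign *= s_lo if idx <= mid else s_hi
        return total_sign * self.value


def reconstruct_cell(coefficients, cell):
    """Sum the contributions of all coefficients to one data cell."""
    return sum(c.contribution(cell) for c in coefficients)


coeffs = [
    Coefficient([(0, 3)], [(1, 1)], 4.0),    # overall average
    Coefficient([(0, 3)], [(1, -1)], -2.0),  # coarsest detail
    Coefficient([(0, 1)], [(1, -1)], 0.0),   # finer detail, left half
    Coefficient([(2, 3)], [(1, -1)], -1.0),  # finer detail, right half
]
print([reconstruct_cell(coeffs, (i,)) for i in range(4)])  # [2.0, 2.0, 5.0, 7.0]
```

Each cell is recovered by adding only the coefficients whose support region encloses it, with the sign picked per dimension and multiplied together.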

  20. Building Wavelet Coefficient Synopses • Relation R with d attributes X1, X2, …, Xd • Can represent R as a d-dimensional array AR • The j-th dimension is indexed by the values of attribute Xj • Cells contain the count of tuples in R having the corresponding combination of attribute values • AR – joint frequency distribution of all attributes of R
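
A minimal sketch of building the joint frequency array described on slide 20, assuming attribute values have already been mapped to small non-negative integers. The function names `joint_frequency` and `to_dense_2d` are hypothetical, and the dense form is shown only for the two-attribute case.

```python
from collections import Counter


def joint_frequency(tuples):
    """Count how many tuples carry each combination of attribute values."""
    return Counter(tuple(t) for t in tuples)


def to_dense_2d(freq, size1, size2):
    """Materialize a 2-attribute frequency map as a dense size1 x size2 array."""
    return [[freq[(i, j)] for j in range(size2)] for i in range(size1)]


relation = [(0, 1), (0, 1), (2, 0)]  # three tuples over attributes X1, X2
freq = joint_frequency(relation)
print(to_dense_2d(freq, 3, 2))  # [[0, 2], [0, 0], [1, 0]]
```

Most cells of such an array are typically zero for sparse relations, which is why the sparse `Counter` form is kept alongside the dense one in this sketch.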

  21. Chunk-based organization of relational tables • Joint frequency array AR – split into d-dimensional chunks • Tuples of R belonging to the same chunk are stored contiguously on disk • If R is not chunked, one extra pre-processing step is needed to reorganize R on disk

  22. ComputeWavelet Algorithm • When a chunk is loaded for the first time, ComputeWavelet can perform the entire computation required for decomposing it • Pairwise averaging and differencing is performed as soon as 2^d averages are accumulated • Memory efficient – no more than one active sub-array at a time for each level of resolution
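
The idea of averaging and differencing as soon as enough averages have accumulated can be illustrated with a much simplified one-dimensional, one-pass sketch, where a detail coefficient is emitted whenever two averages are available at the same level. This is not the ComputeWavelet algorithm itself: there is no chunking or disk I/O, the function name `streaming_haar` is hypothetical, the input length is assumed to be a non-zero power of two, and `height` counts up from the raw data (height 0 is a pair of raw values).

```python
def streaming_haar(values):
    """Consume values one at a time; return (overall average, details).

    details is a list of (height, detail) pairs in emission order.
    """
    pending = []  # pending[h]: one average waiting for its partner at height h
    details = []
    for x in values:
        carry, height = float(x), 0
        while True:
            if height == len(pending):
                pending.append(None)
            if pending[height] is None:
                pending[height] = carry  # nothing to pair with yet, so wait
                break
            left, pending[height] = pending[height], None
            details.append((height, (left - carry) / 2))
            carry = (left + carry) / 2   # pass the new average one level up
            height += 1
    # For a power-of-two input only the top slot is still occupied.
    return pending[-1], details


print(streaming_haar([2, 2, 5, 7]))
# (4.0, [(0, 0.0), (0, -1.0), (1, -2.0)])
```

At any time the sketch holds at most one pending average per level, which mirrors the memory argument on the slide in a one-dimensional setting.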

  23. Processing Relational Queries in Wavelet Coefficient Domain • [Diagram: two paths from the wavelet-coefficient synopses WT1, WT2, …, WTk to the approximate result relation S] • Wavelet domain: Op(WT1, …, WTk) -> RS of wavelet coefficients WS -> Render(WS) -> approximate result relation S • Relational domain: Render(WT1, …, WTk) -> approximate relations T1, T2, …, Tk -> Op(T1, T2, …, Tk) -> approximate result relation S

  24. Selection Operator • Our selection operator has the general form select_pred(WT), where pred represents a generic conjunctive predicate on a subset of the d attributes in T; that is, pred = (l_i1 ≤ X_i1 ≤ h_i1) ∧ … ∧ (l_ik ≤ X_ik ≤ h_ik), where l_ij and h_ij denote the low and high boundaries of the selected range along each selection dimension D_ij, j = 1, 2, …, k, k ≤ d.

  25. Relation Selection - Relational Domain • [Figure: joint data distribution array over dimensions D1 and D2, with the query range marked] • In the relational domain, we are interested only in the cells inside the query range • In the wavelet domain, we are interested only in the coefficients that contribute to those cells
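
The filtering step of selection in the wavelet domain can be sketched as an overlap test between each coefficient's support region and the query range. This is a simplified illustration only: coefficients are modelled as plain dictionaries with a hypothetical `boundary` field holding one (lo, hi) pair per dimension, the function names `overlaps` and `select` are assumptions, and any further processing of the retained coefficients is left out.

```python
def overlaps(boundary, query):
    """True if the support region intersects the query range.

    boundary: list of (lo, hi) per dimension.
    query: dict mapping a selection dimension index to its (lo, hi) range;
           dimensions not mentioned are unrestricted.
    """
    return all(
        boundary[j][0] <= q_hi and q_lo <= boundary[j][1]
        for j, (q_lo, q_hi) in query.items()
    )


def select(coefficients, query):
    """Keep only the coefficients that can contribute to cells in the query range."""
    return [c for c in coefficients if overlaps(c["boundary"], query)]


synopsis = [
    {"boundary": [(0, 3), (0, 3)], "value": 4.0},
    {"boundary": [(0, 1), (2, 3)], "value": 1.0},
    {"boundary": [(2, 3), (0, 1)], "value": -1.0},
]
print([c["value"] for c in select(synopsis, {0: (2, 3)})])          # [4.0, -1.0]
print([c["value"] for c in select(synopsis, {0: (0, 1), 1: (0, 1)})])  # [4.0]
```

A coefficient whose support region misses the query range in even one selection dimension cannot affect any selected cell, so it is dropped.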

  26. Projection Operator

  27. Projection- Wavelet Domain

  28. Join Operator

  29. Join Operator- Wavelet Domain

  30. Experimental Study • Improved answer quality • Low synopsis construction costs • Fast query execution

  31. Query Execution Times

  32. SELECT-JOIN-SUM

  33. SELECT Query errors on real-life data

  34. Conclusion • Multidimensional wavelets as an effective tool for general purpose approximate query processing in modern, high dimensional applications • The query processing algorithms operate directly on the wavelet-coefficient synopses of relational data, thus allowing for very fast processing of arbitrarily complex queries entirely in the wavelet-coefficient domain • Extensive experimental study with synthetic as well as real-life data sets that verifies the effectiveness of the wavelet-based approach compared to both sampling and histograms

  35. Thank you
