
Caching Dynamic Skyline Queries



Presentation Transcript


  1. Caching Dynamic Skyline Queries • D. Sacharidis (1), P. Bouros (1), T. Sellis (1,2) • (1) National Technical University of Athens • (2) Institute for Management of Information Systems – R.C. Athena • HDMS'08

  2. Outline • Introduction • Skyline (SL) and dynamic skyline queries (DSL) • Related work • Evaluating dynamic skyline queries • Computing orthant skylines (OSL) • Computing dynamic skyline via caching • LRU, LFU, LPP cache replacement policies • Experimental evaluation • Conclusions and Future work

  3-7. Skyline queries (SL) • Given a dataset of d-dimensional points • SL contains the points that are not dominated by any other point • x dominates y iff x is as good as y in all dimensions and strictly better in at least one • Example • Dataset of hotels • Prefer cheap hotels close to the sea • [Figure: hotels plotted by price against distance from sea, with the skyline points highlighted; p1 and p2 are labeled]
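
The dominance definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming that smaller values are better in every dimension; the hotel data and the function names `dominates` and `skyline` are hypothetical and not taken from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(x: Point, y: Point) -> bool:
    """x dominates y: as good (<=) in all dimensions, strictly better (<) in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def skyline(points: List[Point]) -> List[Point]:
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(o, p) for o in points)]

# Hypothetical hotels as (price, distance from sea); lower is better in both.
hotels = [(50, 8), (60, 3), (90, 1), (70, 5), (100, 4)]
print(skyline(hotels))
```

Here (70, 5) and (100, 4) drop out because (60, 3) is both cheaper and closer to the sea.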

  8-11. Dynamic skyline queries (DSL) • Extension of skyline queries • Given a query point q • DSL contains the points not dynamically dominated by any other point w.r.t. q • x dynamically dominates y iff x is as preferable as y w.r.t. q in all dimensions and strictly more preferable w.r.t. q in at least one • Can be treated as a static SL • Transform points w.r.t. q • Example • User defines an “ideal” hotel q • [Figure: price against distance from sea, with the query point q and the dynamic skyline points highlighted; p4 and p5 are labeled]
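
The reduction to a static skyline can be sketched as follows. The sketch assumes that "preferable w.r.t. q" means a smaller absolute coordinate distance to q, so each point p is mapped to |p - q| per dimension before the ordinary dominance test; the data and the names `transform` and `dynamic_skyline` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def dominates(x: Point, y: Point) -> bool:
    """Static dominance with smaller-is-better coordinates."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def transform(p: Point, q: Point) -> Point:
    """Map p to its per-dimension absolute distance from the query point q."""
    return tuple(abs(a - b) for a, b in zip(p, q))

def dynamic_skyline(points: List[Point], q: Point) -> List[Point]:
    """Static skyline of the transformed points, reported as the original points."""
    mapped = [transform(p, q) for p in points]
    return [p for p, tp in zip(points, mapped)
            if not any(dominates(to, tp) for to in mapped)]

# Hypothetical hotels as (price, distance from sea) and an "ideal" hotel q.
hotels = [(50, 8), (60, 3), (90, 1), (70, 5), (100, 4)]
q = (70, 4)
print(dynamic_skyline(hotels, q))
```

Note that the result differs from the static skyline of the same data: the hotels closest to the user's ideal survive, not the cheapest or nearest ones.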

  12. Intuition (1) • Traditional SL algorithms need to run anew for each DSL query • Our idea • Exploit results from past queries to reduce processing cost for future DSL queries • Cache past queries • Decide which queries in cache are useful

  13-14. Intuition (2) • 2 past DSL queries • qa, qb • Each query partitions the space into 4 quadrants • [Figure: price against distance from sea, with the past query points qa and qb]

  15. Intuition (3) • A new query q arrives • Consider DSL for qa • p1 is contained in DSL(qa) • p1 dominates p2, p3, p4 • p1 lies in the upper right quadrant w.r.t. qa • qa lies in the upper right quadrant w.r.t. q • p1 also dominates p2, p3, p4 w.r.t. q • Exclude p2, p3, p4 from the dominance test for DSL(q) • [Figure: price against distance from sea, with q, qa, qb and points p1 to p4; the shaded area denotes the points dominated by p1]

  16. Contribution in brief • Caching only the results of past DSL queries cannot reduce processing cost for future ones • We need more information about dominance relationships • Introduce orthant skylines (OSL) and examine their relationship with DSL • Extend the Bitmap algorithm to compute OSL in parallel with DSL • Cache OSL to enhance DSL query evaluation • Present 3 cache replacement policies • LRU, LFU, LPP • Experimental evaluation of the caching mechanism

  17. Related work • Non-indexed methods • Block-Nested Loops (BnL) • Bitmap • Multidimensional Divide and Conquer (DnC) • Sort First Scan (SFS) • Index-based methods • B-tree • sort points according to the lowest valued coordinate • R-tree • Nearest neighbor based (NN) • Branch and bound (BBS)

  19. Bitmap • BnL variant • Suitable for discrete domains with low cardinality • In brief • Computes a bitmap representation of the points in the dataset • Examines each point separately (dominance test) • Checks whether it is contained in the skyline or not • Exploits fast bitwise operations OR/AND

  20. Bitmap – Dominance test • For each point p • Define A = A1 & A2 & … & Ad, where Ai marks the points as good as p in dimension i • Denotes the points as good as p in all dimensions • Define B = B1 | B2 | … | Bd, where Bi marks the points strictly better than p in dimension i • Denotes the points strictly better than p in at least one dimension • Dominance test: • If C = A & B has all bits set to 0 then p is in SL
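
The dominance test above can be sketched with Python integers acting as bitmaps (bit k stands for point k). This is a simplified illustration and not the original Bitmap implementation: the per-dimension slices are built by brute force, smaller values are assumed to be better, and the names `build_slices` and `bitmap_skyline` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]

def build_slices(points: List[Point]):
    """For each dimension i and distinct value v, precompute two bitmaps:
    le[i][v] marks points with coordinate i <= v, lt[i][v] those with coordinate i < v."""
    d = len(points[0])
    le: List[Dict[int, int]] = [dict() for _ in range(d)]
    lt: List[Dict[int, int]] = [dict() for _ in range(d)]
    for i in range(d):
        for v in {p[i] for p in points}:
            le[i][v] = sum(1 << k for k, p in enumerate(points) if p[i] <= v)
            lt[i][v] = sum(1 << k for k, p in enumerate(points) if p[i] < v)
    return le, lt

def bitmap_skyline(points: List[Point]) -> List[Point]:
    le, lt = build_slices(points)
    result = []
    for p in points:
        A = -1  # all bits set; AND-ing narrows it to "as good in all dimensions"
        B = 0   # OR-ing collects "strictly better in at least one dimension"
        for i, v in enumerate(p):
            A &= le[i][v]
            B |= lt[i][v]
        if A & B == 0:  # C = A & B is empty: no point dominates p
            result.append(p)
    return result

# Hypothetical points over a small discrete domain {1, 2, 3}.
points = [(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)]
print(bitmap_skyline(points))
```

The point p itself is always in A but never in B, so it cannot cause a false rejection.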

  21-24. Orthant skyline (OSL) • OSL provides more information about dominance relationships than DSL • Useful for pruning • Given a dataset of d-dimensional points and a query point q • Space is partitioned into 2^d orthants • The o-th orthant skyline (OSL) of q contains the points of the o-th orthant that are not dynamically dominated w.r.t. q by other points inside orthant o • [Figure: price against distance from sea, with the query point q splitting the plane into Quadrant 0 (upper right), Quadrant 1 (upper left), Quadrant 2 (lower left) and Quadrant 3 (lower right); the Quadrant 2 skyline points are highlighted]
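
A minimal sketch of the orthant skyline definition follows. The orthant numbering used here (bit i is set when a point lies below q in dimension i) is an assumption made for illustration and need not match the quadrant labels on the slides; the data and the names `orthant_of`, `dyn_dominates` and `orthant_skylines` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]

def orthant_of(p: Point, q: Point) -> int:
    """Orthant index of p relative to q (assumed encoding: bit i set iff p[i] < q[i])."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a < b)

def dyn_dominates(x: Point, y: Point, q: Point) -> bool:
    """x dynamically dominates y w.r.t. q, comparing per-dimension distances to q."""
    tx = [abs(a - c) for a, c in zip(x, q)]
    ty = [abs(b - c) for b, c in zip(y, q)]
    return all(a <= b for a, b in zip(tx, ty)) and any(a < b for a, b in zip(tx, ty))

def orthant_skylines(points: List[Point], q: Point) -> Dict[int, List[Point]]:
    """Group the points by orthant, then keep in each group the points that no
    other point of the same group dynamically dominates."""
    groups: Dict[int, List[Point]] = {}
    for p in points:
        groups.setdefault(orthant_of(p, q), []).append(p)
    return {o: [p for p in members
                if not any(dyn_dominates(x, p, q) for x in members)]
            for o, members in groups.items()}

points = [(50, 8), (60, 3), (90, 1), (70, 5), (100, 4), (80, 9)]
q = (70, 4)
print(orthant_skylines(points, q))
```

In this data only (80, 9) is dropped: it shares an orthant with (70, 5), which is closer to q in both dimensions.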

  25-31. OSL and DSL relationship • Map points from quadrants 1, 2, 3 to points inside quadrant 0 • Compute DSL w.r.t. q • The union of all OSLs is a superset of the DSL w.r.t. q • [Figure: the four quadrants around q, price against distance from sea; p1, p2 and p3 are labeled]

  32. Computing orthant skylines • Algorithm DBM • Extends Bitmap to compute DSL and OSLs at the same time • Method: • Compute bitmap representation • Transform each point's coordinates w.r.t. query q • Dominance test for point p in orthant o, with three possible outcomes: • p not in OSLo and not in DSL • p not in DSL, but in OSLo • p in DSL and in OSLo
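
The three outcomes of the dominance test can be illustrated with a naive nested-loop stand-in for DBM's bitmap operations. The orthant encoding, the data, the label strings and the function name `classify` are assumptions made for this sketch.

```python
from typing import List, Tuple

Point = Tuple[float, ...]

def orthant_of(p: Point, q: Point) -> int:
    """Assumed encoding: bit i is set iff p[i] < q[i]."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a < b)

def dyn_dominates(x: Point, y: Point, q: Point) -> bool:
    """x dynamically dominates y w.r.t. q."""
    tx = [abs(a - c) for a, c in zip(x, q)]
    ty = [abs(b - c) for b, c in zip(y, q)]
    return all(a <= b for a, b in zip(tx, ty)) and any(a < b for a, b in zip(tx, ty))

def classify(points: List[Point], q: Point) -> List[Tuple[Point, str]]:
    """Label each point with one of the three outcomes of the dominance test."""
    labeled = []
    for p in points:
        o = orthant_of(p, q)
        dominators = [x for x in points if dyn_dominates(x, p, q)]
        if not dominators:
            label = "in DSL and in OSL"    # dominated by nobody
        elif all(orthant_of(x, q) != o for x in dominators):
            label = "in OSL only"          # dominated only from other orthants
        else:
            label = "in neither"           # dominated inside its own orthant
        labeled.append((p, label))
    return labeled

points = [(50, 8), (60, 3), (90, 1), (70, 5), (100, 4), (80, 9)]
for p, label in classify(points, (70, 4)):
    print(p, label)
```

Since a point with no dominators at all has none inside its own orthant either, every DSL point is also an OSL point, which is why the union of the OSLs is a superset of the DSL.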

  33. Dynamic skylines via caching • Cache OSLs instead of DSLs • Query cache contains (query point qj, OSLs) • OSLs are encoded as bitmaps • Algorithm cDBM • OSL contains information about the dominance test inside an orthant • Discard points inside orthants from dominance tests • Method: • Compute bitmap representation • For each point p consider its position (orthant) w.r.t. the cached queries qj • If p is in the same orthant o w.r.t. qj as qj is w.r.t. q, and p is not in OSLo(qj), then exclude it from OSLo(q) and DSL(q)
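
The pruning rule in the last bullet can be sketched as below. This is an illustration under stated assumptions rather than cDBM itself: the cache holds plain Python sets instead of bitmaps, the orthant encoding (bit i set iff the coordinate lies below the reference point) is assumed, and the data and the names `prunable` and `dynamic_skyline` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
CacheEntry = Tuple[Point, Dict[int, Set[Point]]]  # (cached query qj, its OSLs)

def orthant_of(p: Point, q: Point) -> int:
    """Assumed encoding: bit i is set iff p[i] < q[i]."""
    return sum(1 << i for i, (a, b) in enumerate(zip(p, q)) if a < b)

def dyn_dominates(x: Point, y: Point, q: Point) -> bool:
    tx = [abs(a - c) for a, c in zip(x, q)]
    ty = [abs(b - c) for b, c in zip(y, q)]
    return all(a <= b for a, b in zip(tx, ty)) and any(a < b for a, b in zip(tx, ty))

def orthant_skylines(points: List[Point], q: Point) -> Dict[int, Set[Point]]:
    groups: Dict[int, List[Point]] = {}
    for p in points:
        groups.setdefault(orthant_of(p, q), []).append(p)
    return {o: {p for p in m if not any(dyn_dominates(x, p, q) for x in m)}
            for o, m in groups.items()}

def prunable(p: Point, q: Point, cache: Sequence[CacheEntry]) -> bool:
    """True if some cached query qj shows that p cannot be in DSL(q)."""
    for qj, osl in cache:
        o = orthant_of(qj, q)  # where the cached query sits relative to q
        if orthant_of(p, qj) == o and p not in osl.get(o, set()):
            return True        # p is dominated inside orthant o of qj, hence w.r.t. q too
    return False

def dynamic_skyline(points: List[Point], q: Point,
                    cache: Sequence[CacheEntry] = ()) -> List[Point]:
    """Dominance tests run only among the points that survive cache pruning."""
    candidates = [p for p in points if not prunable(p, q, cache)]
    return [p for p in candidates
            if not any(dyn_dominates(x, p, q) for x in candidates)]

points = [(50, 8), (60, 3), (90, 1), (70, 5), (100, 4), (80, 9), (75, 6), (90, 8)]
qa = (72, 5)                                   # a past query
cache = [(qa, orthant_skylines(points, qa))]
q = (70, 4)                                    # the new query
print(dynamic_skyline(points, q, cache))
```

Restricting the dominance tests to the surviving candidates is safe because only points outside DSL(q) are pruned, and every dominated point is dominated by some DSL point, which is never pruned.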

  34. Cache Replacement Policies • General idea • Limited cache space • Identify least useful query point in cache • Replace it with new one

  35-39. Usage-based policies • Only a few queries in cache are useful • Log cache query usage • Given a new query q • Consider as input the query points in cache Q • Only the query points in the OSL of Q w.r.t. q are useful; the others are redundant • Update cache - remove: • Least Recently Used (LRU) query point • Least Frequently Used (LFU) query point • [Figure: cached query points qa, qb, qc and qd around the new query q, price against distance from sea; the redundant queries are marked]
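
The two usage-based policies can be sketched with a small bookkeeping class. The class name `QueryCache`, its methods and the logical clock are hypothetical; deciding which cached queries count as "used" for a new query (those in the OSL of Q w.r.t. q) is left to the caller, who reports each use through `record_use`.

```python
from typing import Any, Dict, Hashable

class QueryCache:
    """Fixed-capacity cache of past query points with LRU or LFU eviction."""

    def __init__(self, capacity: int, policy: str = "LRU") -> None:
        self.capacity = capacity
        self.policy = policy
        self.entries: Dict[Hashable, Any] = {}    # query point -> cached OSLs
        self.last_used: Dict[Hashable, int] = {}  # query point -> time of last use
        self.use_count: Dict[Hashable, int] = {}  # query point -> number of uses
        self.clock = 0                            # logical time, advanced on every event

    def record_use(self, qj: Hashable) -> None:
        """Log that cached query qj was useful for the current query."""
        self.clock += 1
        self.last_used[qj] = self.clock
        self.use_count[qj] += 1

    def insert(self, q: Hashable, payload: Any) -> None:
        """Add a new query, evicting one entry first if the cache is full."""
        if q not in self.entries:
            if len(self.entries) >= self.capacity:
                self._evict()
            self.use_count[q] = 0
        self.clock += 1
        self.entries[q] = payload
        self.last_used[q] = self.clock

    def _evict(self) -> None:
        if self.policy == "LRU":
            victim = min(self.entries, key=lambda k: self.last_used[k])
        else:  # LFU, with ties broken by least recent use
            victim = min(self.entries,
                         key=lambda k: (self.use_count[k], self.last_used[k]))
        for table in (self.entries, self.last_used, self.use_count):
            del table[victim]

cache = QueryCache(capacity=2, policy="LFU")
cache.insert("qa", None)
cache.insert("qb", None)
cache.record_use("qa")
cache.record_use("qa")
cache.record_use("qb")
cache.insert("qc", None)   # LFU evicts qb (1 use); LRU would evict qa instead
print(sorted(cache.entries))
```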

  40-45. Pruning power-based policy • Usage-based policies do not indicate usefulness • Useful cached query • Great pruning power • Probability that a query can prune points of the dataset from the DSL computation • Depends on • Points dominated by the query in an orthant j • Points contained in the antisymmetric orthant of j • Update cache – remove • Query point with the least pruning power (LPP) • [Figure: cached query qa and new query q, price against distance from sea, with a pair of numbers annotated in each quadrant of qa]

  46. Experimental Evaluation • Synthetic datasets • Distribution types • Independent, correlated, anti-correlated • Number of points N • 10k, 20k, 50k, 100k • Dimensionality • d = {2,3,4,5,6} • Domain size per dimension • |D| = {10,20,50} • Compare • Bitmap (NO-CACHE) • cDBM with LFU, LRU, LPP cache replacement policies • Query cache • |Q| = {10,20,30,40,50} past query points • Cache size is |Q|*N bits uncompressed
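
As a quick worked example of the cache size formula, the snippet below converts |Q|*N bits into bytes for the smallest and largest settings listed above. The function name `cache_size_bytes` is hypothetical, and rounding up to whole bytes is an assumption.

```python
def cache_size_bytes(num_queries: int, num_points: int) -> int:
    """Uncompressed cache size: |Q| * N bits, rounded up to whole bytes."""
    bits = num_queries * num_points
    return (bits + 7) // 8

print(cache_size_bytes(10, 10_000))    # smallest setting: |Q| = 10, N = 10k
print(cache_size_bytes(50, 100_000))   # largest setting: |Q| = 50, N = 100k
```

The largest setting therefore needs 5,000,000 bits, or 625,000 bytes, before any compression.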

  47. Varying query cache size • Dataset: N = 50k points, with d = 4 dimensions of |D| = 20 domain size • LFU and LRU cache queries that are not representative of future ones • LPP caches queries with great pruning power • [Figure: two panels, independent and anti-correlated data]

  48. Effect of distribution parameters • Relative improvement in running time over NO-CACHE • Vary number of points N • d = 4 dimensions of |D| = 20 domain size • Vary number of dimensions d • N = 50k, |D| = 20 • [Figure: two panels on correlated data, one varying N and one varying d]

  49. Conclusions and Future work • Conclusions • Introduced orthant skylines (OSLs) and discussed their relationship with DSL • Extended Bitmap to compute OSLs and DSL at the same time (DBM algorithm) • Proposed a caching mechanism of OSLs to reduce cost for future DSL queries • LRU, LFU, LPP cache replacement policies • Experimentally verified the efficiency of the caching mechanism • Future work • Apply the caching mechanism to index-based methods • Further increase the pruning power of cached queries

  50. Questions?
