
On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications

On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications. 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU. IEEE International Conference on Data Engineering (ICDE). A premium international conference on databases


Presentation Transcript


  1. On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications 陳良弼 Arbee L.P. Chen National Chengchi University 9/21/2012 at NCHU

  2. IEEE International Conference on Data Engineering (ICDE) • A premium international conference on databases • Inaugural conference held at Los Angeles in 1984 • Held in Taiwan in 1995

  3. ICDE2012 Research Papers Distribution • System Aspects: Privacy and Security 8%, Storage Management and Performance 7%, Entity resolution/Versioning 7% • Query Processing 31%: Top-k query 9%, Distributed/parallel/map-reduce 8%, Location-aware 5%, Execution Plan 5%, Graph indexing 4%

  4. Text/Web/Keyword Search 19% • Stream/Trajectory/Sequence/Spatio-Temporal 10% • Social Media 7% • Uncertain Database 6% • Data Mining 5%

  5. Efficient Dual-Resolution Layer Indexing for Top-k Queries, ICDE2012 (figure: hotels H1 to H9)

  6. (price, distance to the airport) → score for hotels H1 to H9; each score equals 0.5 × price + 0.5 × distance • (0.45, 0.6) → 0.525 • (0.6, 0.2) → 0.4 • (0.55, 0.4) → 0.475 • (0.55, 0.3) → 0.425 • (0.7, 0.4) → 0.55 • (0.3, 0.7) → 0.5 • (0.3, 0.6) → 0.45 • (0.5, 0.5) → 0.5 • (0.2, 0.7) → 0.45

  7. (price, distance to the airport) → score for hotels H1, H4, H5, H6, H7 • (0.6, 0.2) → 0.4 • (0.55, 0.4) → 0.475 • (0.55, 0.3) → 0.425 • (0.3, 0.6) → 0.45 • (0.2, 0.7) → 0.45

  8. Answering Why-not Questions on Top-k Queries, ICDE2012 • Top-k query over (Cleanliness, Deliciousness, Parking spaces) with Top-2(0.4, 0.5, 0.1) • p1 (95,80,40) → 82 • p2 (70,20,30) → 41 • p5 (50,90,60) → 71 • p6 (58,20,30) → 36.2 • p3 and p4: (85,60,60) → 69 and (75,70,50) → 70

  9. Why-not question • (Cleanliness, Deliciousness, Parking spaces), scores under Top-2(0.4, 0.5, 0.1) → scores under Top-2(0.5, 0.4, 0.1) • p1 (95,80,40): 82 → 83.5 • p2 (70,20,30): 41 → 46 • p5 (50,90,60): 71 → 67 • p6 (58,20,30): 36.2 → 40 • p3 and p4: (85,60,60): 69 → 71.7 and (75,70,50): 70 → 70.5 • Why is p5 not in my top-2 query list? Does p5 not exist? Should I revise my query to look for top-5 hotels? Should I change my weights?
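The two slides above can be replayed with a short sketch of a linear top-k query. The tuples and weight vectors are taken from the slides; the helper names (`score`, `ranking`, `top_k`, `rank_of`) and the assignment of the labels p3 and p4 are assumptions made for illustration. For (85,60,60) the code yields 70 and 72.5 rather than the 69 and 71.7 shown on the slides, so that tuple may have been transcribed imprecisely.

```python
# Hotels as (cleanliness, deliciousness, parking spaces), from the slides.
# Which tuple is p3 and which is p4 is an assumption: the transcript is ambiguous.
hotels = {
    "p1": (95, 80, 40),
    "p2": (70, 20, 30),
    "p3": (85, 60, 60),  # assumed label
    "p4": (75, 70, 50),  # assumed label
    "p5": (50, 90, 60),
    "p6": (58, 20, 30),
}

def score(attrs, weights):
    """Weighted sum of the attributes; a higher score is better."""
    return sum(a * w for a, w in zip(attrs, weights))

def ranking(points, weights):
    """Names of all points ordered from best to worst score."""
    return sorted(points, key=lambda name: score(points[name], weights), reverse=True)

def top_k(points, weights, k):
    return ranking(points, weights)[:k]

def rank_of(points, weights, name):
    """1-based position of `name` in the ranking."""
    return ranking(points, weights).index(name) + 1

weights_slide8 = (0.4, 0.5, 0.1)
weights_slide9 = (0.5, 0.4, 0.1)

print(top_k(hotels, weights_slide8, 2))        # p5 is in this top-2
print(top_k(hotels, weights_slide9, 2))        # p5 is missing here
print(rank_of(hotels, weights_slide9, "p5"))   # smallest k that would include p5
```

The last line hints at the two repairs the why-not question asks about: enlarge k until it reaches the rank of p5, or change the weights so that p5 moves up.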

  10. The Min-dist Location Selection Query, ICDE2012 • Minimize the nearest facility distance (figure: points c1 to c8, f1, f2, p1, p2)

  11. Nearest facility distance (figure: points c1 to c8, f1, f2, with p1)

  12. Nearest facility distance (figure: points c1 to c8, f1, f2, with p2)

  13. Introduction • kNN (k-Nearest Neighbors) Queries • Assume k = 3: kNN(q) = {a, b, c}

  14. Introduction • RkNN (Reverse k-Nearest Neighbors) Queries • Assume k = 3: RkNN(q) = {a, …} (figure: points a, q, d)

  15. Introduction • BRkNN (Bi-chromatic Reverse k-Nearest Neighbors) Queries • Two types of data • Assume k = 3: BRkNN(q) = {a, …} (figure: points a, q, d)
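As a rough illustration of the three query types introduced above, here is a brute-force sketch. The function names and the sample coordinates are hypothetical and not from the talk; the code only spells out the definitions and makes no attempt at the indexing techniques discussed later.

```python
from math import dist

def knn(q, points, k):
    """The k points closest to q (q itself is excluded)."""
    others = [p for p in points if p != q]
    return sorted(others, key=lambda p: dist(p, q))[:k]

def rknn(q, points, k):
    """Monochromatic RkNN: points that have q among their own k nearest neighbors."""
    universe = list(dict.fromkeys(points + [q]))  # make sure q appears exactly once
    return [p for p in universe if p != q and q in knn(p, universe, k)]

def brknn(q, goals, conditions, k):
    """Bichromatic RkNN: condition points whose k nearest goal points include q."""
    universe = list(dict.fromkeys(goals + [q]))
    return [c for c in conditions
            if q in sorted(universe, key=lambda g: dist(c, g))[:k]]

# Hypothetical sample data on a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(knn((1, 0), points, 2))    # the two nearest neighbors of (1, 0)
print(rknn((1, 0), points, 1))   # points whose nearest neighbor is (1, 0)

goals = [(0, 0), (4, 0)]
conditions = [(1, 0), (3, 0), (5, 0)]
print(brknn((4, 0), goals, conditions, 1))  # conditions whose nearest goal is (4, 0)
```

Note that kNN is not symmetric: (10, 0) has (2, 0) as its nearest neighbor, yet (10, 0) is not the nearest neighbor of any point, so it never appears in an R1NN answer.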

  16. Application I • Which location is the best? (figure legend: shop, customer)

  17. Top-n Reverse kNN Queries • Given two types of data, G (goal) and C (condition), retrieve the n data points from G that have the largest BRkNN values • Example: n = 2, k = 2 • BR2NN value of g1 = 4 • BR2NN value of g2 = 9 • BR2NN value of g3 = 5 • BR2Top-2 = {g2, g3}
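A minimal brute-force sketch of this query follows. The coordinates are hypothetical, chosen only so that the counts match the slide's example (4, 9, 5); the function names are likewise assumptions, and the talk's own method relies on Voronoi-based filtering rather than this exhaustive scan.

```python
import heapq
from math import dist

def brknn_values(goals, conditions, k):
    """For each goal point, count the condition points that have it among their k nearest goals."""
    counts = {name: 0 for name in goals}
    for c in conditions:
        nearest = heapq.nsmallest(k, goals, key=lambda name: dist(c, goals[name]))
        for name in nearest:
            counts[name] += 1
    return counts

def top_n_brknn(goals, conditions, k, n):
    """The n goal points with the largest BRkNN values."""
    counts = brknn_values(goals, conditions, k)
    return heapq.nlargest(n, goals, key=counts.get)

# Hypothetical layout: three goal points and nine condition points on a line.
goals = {"g1": (0, 0), "g2": (5, 0), "g3": (10, 0)}
conditions = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 0), (7, 0), (8, 0), (9, 0), (9.5, 0)]

print(brknn_values(goals, conditions, k=2))
print(top_n_brknn(goals, conditions, k=2, n=2))
```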

  18. Voronoi Diagram of G (figure legend: goal point (VD-node), condition point)

  19. A Filter-Refinement Framework for Solving BRkNN Queries • Assume k = 2 • Lower-bound region of VDi (layer 0) • Upper-bound region of VDi (layer 0 ~ layer (k-1)) (figure: VDi with layer 0 and layer 1)

  20. Filter Phase • Assume k = 2 • Construct bisectors layer by layer to reduce the region

  21. Refinement Phase • Assume k = 2 • For a data point p, we check the VDs at layer 1 ~ layer 2 to determine whether VDi is one of the 2NN of p

  22. Refinement Phase • Assume k = 2 • VDi: (VD13, 1.2) (VD26, 1.4) (VD27, 1.7) (VD3, 1.7) (VD4, 1.8) (VD30, 2.1) (VD5, 2.5) (VD7, 4.8) … • With dist(p, VDi) = 0.9 and dist(VDi, VD30) = 2.1: dist(p, VD30) > 1.2

  23. Refinement Phase • Assume k = 2 • VDi: (VD13, 1.2) (VD26, 1.4) (VD27, 1.7) (VD3, 1.7) (VD4, 1.8) (VD30, 2.1) (VD5, 2.5) (VD7, 4.8) … • Pruning condition: dist(VDi, VDj) > 2 dist(VDi, p), with dist(VDi, p) = 0.9 and dist(VDi, VD30) = 2.1 in the figure
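The pruning condition follows from the triangle inequality: dist(p, VDj) ≥ dist(VDi, VDj) − dist(VDi, p), so whenever dist(VDi, VDj) > 2 dist(VDi, p), VDj is strictly farther from p than VDi and cannot displace VDi from the kNN of p. The sketch below applies that test to the neighbor list on the slide; the function name `split_candidates` is hypothetical, and reading 0.9 as dist(VDi, p) is an interpretation of the figure.

```python
# Neighbors of VDi with dist(VDi, neighbor), in ascending order, as listed on the slide.
neighbors_of_vdi = [
    ("VD13", 1.2), ("VD26", 1.4), ("VD27", 1.7), ("VD3", 1.7),
    ("VD4", 1.8), ("VD30", 2.1), ("VD5", 2.5), ("VD7", 4.8),
]
dist_vdi_p = 0.9  # read from the figure

def split_candidates(neighbors, d_vdi_p):
    """Split neighbors into those that still need an exact distance check
    and those pruned by dist(VDi, VDj) > 2 * dist(VDi, p)."""
    threshold = 2 * d_vdi_p
    to_check = [name for name, d in neighbors if d <= threshold]
    pruned = [name for name, d in neighbors if d > threshold]
    return to_check, pruned

to_check, pruned = split_candidates(neighbors_of_vdi, dist_vdi_p)
print(to_check)  # neighbors that may be closer to p than VDi
print(pruned)    # neighbors guaranteed to be farther from p than VDi
```

Because the list is sorted by distance, a real implementation could stop scanning at the first pruned entry instead of filtering the whole list.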

  24. Application II • Maximum Coverage BRkNN Queries • Retrieve 2 points from dataset G • Assume k = 2

  25. BRkNN value = 9

  26. BRkNN value = 8

  27. total = 12

  28. total = 14

  29. Maximum Coverage BRkNN Queries • Given: a set of goal points (G), a set of condition points (C), and k, the k value of BRkNN • Goal: find n points g1, g2, …, gn from G that maximize $\left|\bigcup_{i=1}^{n} \mathrm{BRkNN}(g_i, G, C)\right|$

  30. Application III • Find n Most Favorite Products based on Reverse Top-k Queries

  31. Airlines • Hotels • All candidate packages • Which are the most favorite packages?

  32. Top-k Queries (Customer’s View) • All candidate packages, scored with the customer preferences • C1: (a1, h1): 0.8×0 + 0.2×0.2 + 0.4×0.5 + 0.6×0.1 + 0.4×0.2 = 0.38; (a1, h2): 0.8×0 + 0.2×0.2 + 0.4×0.5 + 0.6×0.1 + 0.6×0.2 = 0.42; … • C2: (a1, h1): 0.8×0.1 + 0.2×0.3 + 0.4×0.1 + 0.6×0.3 + 0.4×0.2 = 0.44; (a1, h2): 0.8×0.1 + 0.2×0.3 + 0.4×0.1 + 0.6×0.3 + 0.6×0.2 = 0.48; …

  33. Reverse Top-k Queries (Travel Agency’s View) • All candidate packages and customer preferences • Retrieve the customers whose top-2 favorites contain (a1, h2) → {c3} • The number of customers in the reverse top-k result for a product is a good estimate of how strongly the market favors the product
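The customer's view and the travel agency's view can be contrasted in a few lines. This sketch uses only the two packages and two customers whose numbers appear in the score computations of the previous slide, read as (package attributes) × (customer weights); that reading, the assumption that a higher score means a more favored package, and the function names are all illustrative choices rather than the talk's definitions.

```python
# Attribute vectors of two packages, taken from the first factors on the slide.
packages = {
    ("a1", "h1"): (0.8, 0.2, 0.4, 0.6, 0.4),
    ("a1", "h2"): (0.8, 0.2, 0.4, 0.6, 0.6),
}
# Preference weights of two customers, taken from the second factors on the slide.
customers = {
    "c1": (0.0, 0.2, 0.5, 0.1, 0.2),
    "c2": (0.1, 0.3, 0.1, 0.3, 0.2),
}

def score(package, weights):
    """Weighted sum; assumed here: higher means more favored."""
    return sum(x * w for x, w in zip(package, weights))

def top_k(packages, weights, k):
    """Customer's view: the k best packages for one weight vector."""
    return sorted(packages, key=lambda name: score(packages[name], weights), reverse=True)[:k]

def reverse_top_k(target, packages, customers, k):
    """Agency's view: the customers whose top-k contains `target`."""
    return {c for c, w in customers.items() if target in top_k(packages, w, k)}

print(round(score(packages[("a1", "h1")], customers["c1"]), 2))
print(reverse_top_k(("a1", "h2"), packages, customers, k=1))
print(reverse_top_k(("a1", "h1"), packages, customers, k=1))
```

With only these two packages and k = 1, both customers prefer (a1, h2), so its reverse top-1 set contains both of them while that of (a1, h1) is empty.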

  34. All candidate packages and customer preferences • k (#packages considered by customers) = 2 • n (#packages to be offered by the travel agency) = 2 • (a1, h2): {c3} • (a1, h5): {c3, c4} • (a2, h5): {c4} • (a3, h2): {c2} • (a3, h5): {c2, c4} • (a3, h6): {c1, c5} • (a4, h6): {c5} • (a5, h6): {c1}

  35. Problem Definition of n-k MFP • Given a set of component tables T1, T2, …, and Tx, which form the set of candidate products P, a set of customers C with different preferences on the products, and two positive integers k and n • RTOPk(cp, P, C): the set of customers whose top-k favorites contain the candidate product cp • Retrieve the minimum subset P’ of P such that |P’| ≤ n and $\left|\bigcup_{cp \in P'} \mathrm{RTOPk}(cp, P, C)\right|$ is maximized • Maximum coverage problem: NP-hard
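Since maximum coverage is NP-hard, a common fallback is the standard greedy heuristic, which repeatedly picks the product that adds the most uncovered customers. The sketch below runs it on the reverse top-2 sets listed on the preceding slide with n = 2. It is a generic heuristic shown for illustration, not necessarily the procedure used in the talk, and the name `greedy_cover` is hypothetical.

```python
# Reverse top-2 sets from the slide: product -> customers who have it in their top-2.
rtop = {
    ("a1", "h2"): {"c3"},
    ("a1", "h5"): {"c3", "c4"},
    ("a2", "h5"): {"c4"},
    ("a3", "h2"): {"c2"},
    ("a3", "h5"): {"c2", "c4"},
    ("a3", "h6"): {"c1", "c5"},
    ("a4", "h6"): {"c5"},
    ("a5", "h6"): {"c1"},
}

def greedy_cover(sets, n):
    """Pick up to n keys, each time the one covering the most not-yet-covered customers."""
    chosen, covered = [], set()
    for _ in range(n):
        best = max(sets, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # nothing new can be covered, so stop early
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

chosen, covered = greedy_cover(rtop, n=2)
print(chosen)
print(sorted(covered))
```

On this input no set has more than two customers, so two products can cover at most four; the greedy choice reaches that bound here, although in general the heuristic only guarantees an approximation.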

  36. Skyline • An object p is said to dominate another object q if and only if p is larger than or equal to q on all dimensions and p is larger than q on at least one dimension • Given a set of multi-dimensional objects, the skyline consists of the objects which are not dominated by any other object (figure axes: A1, A2)
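The two definitions above translate almost word for word into code. This is a quadratic-time sketch meant only to make the definitions concrete; the sample points are hypothetical.

```python
def dominates(p, q):
    """p dominates q: p >= q on every dimension and p > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(objects):
    """Objects that are not dominated by any other object."""
    return [p for p in objects if not any(dominates(q, p) for q in objects)]

# Hypothetical 2-D objects (A1, A2).
objects = [(1, 5), (3, 4), (2, 2), (5, 1), (3, 3)]
print(skyline(objects))
```

Here (2, 2) and (3, 3) are both dominated by (3, 4), so they drop out of the skyline.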

  37. Property 1 • Only the component tuples dominated by at most (k-1) other tuples in the same component table can be part of a top-k product for a customer c (figure: Airlines and Hotels tables)
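Property 1 amounts to keeping, for each component table, the tuples with fewer than k dominators. The intuition, assuming each customer's score is monotone in every attribute, is that a tuple with k or more dominators can be swapped for any of them inside a product without lowering the score, so k better products always exist. The sketch below uses a hypothetical airline table and a hypothetical function name `reduce_table`.

```python
def dominates(p, q):
    """p dominates q: p >= q on every dimension and p > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def reduce_table(table, k):
    """Keep the tuples dominated by at most k-1 other tuples of the same table."""
    return [t for t in table if sum(dominates(o, t) for o in table) <= k - 1]

# Hypothetical airline tuples with two attributes each.
airlines = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.6), (0.5, 0.5), (0.2, 0.3)]

print(reduce_table(airlines, k=3))  # tuples with at most 2 dominators
print(reduce_table(airlines, k=1))  # k = 1 reduces to the skyline of the table
```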

  38. Reduce component tables (figure: Airlines and Hotels tables)

  39. Property 2 • The candidate products in the n-k MFP must be in Skyline(P) • For any two candidate products cp1 and cp2 in P, if cp1 dominates cp2, RTOPk(cp2, P, C) ⊆ RTOPk(cp1, P, C) • For any candidate product cp in P, if cp ∉ Skyline(P), cp ∉ n-k MFP (figure axes: A1, A2)

  40. Property 2 (cont.) • A candidate product cp ∈ Skyline(P) if and only if cp belongs to the set of candidate products generated from Skyline(T1), Skyline(T2), …, and Skyline(Tx) [VLDB’09] • Only the skyline tuples of each component table can be part of a candidate product in the n-k MFP (figure: Airlines and Hotels tables)

  41. Property 3 • The upper bounds of the remaining candidate packages • RTOPk(cp, Skyline(P), C) is an upper bound of RTOPk(cp, P, C) • Only the customers in RTOPk(cp, Skyline(P), C) can possibly be members of RTOPk(cp, P, C)

  42. Refinement • The top-2 favorites of C3: {(a1, h5), (a1, h2)} • The top-2 favorites of C4: {(a1, h5), (a2, h5), (a3, h5)} • P’: {(a1, h5)}

  43. Refinement • The top-2 favorites of C1: {(a3, h6), (a4, h6)} • The top-2 favorites of C5: {(a3, h6), (a4, h6)} • P’: {(a1, h5)} → {(a1, h5), (a3, h6)}

  44. Application IV • Find Most Favorite Products by Top-k Reverse Skyline Queries • k = 1 (figure: products and user preferences u1, u2; axes: Year, Mileage)

  45. Thank you for your attention!
