1 / 26

Reverse Top- k Queries

Reverse Top- k Queries. Akrivi Vlachou * , Christos Doulkeridis * , Yannis Kotidis # , Kjetil Nørvåg * *Norwegian University of Science and Technology (NTNU), Trondheim, Norway # Athens University of Economics and Business (AUEB), Greece. Outline. Motivation & Preliminaries

natara
Télécharger la présentation

Reverse Top- k Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Reverse Top-k Queries Akrivi Vlachou*, Christos Doulkeridis*, Yannis Kotidis#, Kjetil Nørvåg* *Norwegian University of Science and Technology (NTNU), Trondheim, Norway #Athens University of Economics and Business (AUEB), Greece

  2. Outline • Motivation & Preliminaries • Monochromatic Reverse Top-k Queries • Bichromatic Reverse Top-k Queries • Threshold-based Algorithm • Materialized Views • Experimental Evaluation • Conclusions & Future Work

  3. Rank-aware Query Processing • Huge amount of available data • Users prefer to retrievea limited set of k ranked data objects that best match their preferences (top-k queries)

  4. Top-k Query • Given a scoring function f(), retrieve the k object that best match the user preferences • Linear scoring function f w(p)= Σw[i]*p[i] • Weight w[i]: • relative importance of attribute i • Definition TOPk(w): Given a weighting vector w and a positive integer k, find the k data points p with the minimumf(p) scores Query line of w at point p: defines the score of p Query space of w defined by point p: number of enclosed points determines the rank of p

  5. customer customer customer customer Reversing the Top-k Query • From the perspective of manufacturers: • it is important that a product is returned in the highest ranked positions for as many user preferences as possible • estimate the impact of a product compared to their competitors products • advertise a product to potential customers Which customers would be interested? sales representative

  6. customer customer customer customer Reversing the Top-k Query • Reverse top-k query: Given a potential product q and a positive integer k, which are the weighting vectors w for which q is in the top-k query result set? • Two different versions Monochromatic: • no knowledge of user preferences Bichromatic: • a dataset with user preferences is given sales representative

  7. Car Database Example • A database containing information about different cars • Different users have different preferences • Bob prefers a cheap car, and does not care much about the age • the best choice (top-1) for Bob is the car p1 with score 2.5 • Tom prefers a newer car rather than a cheap car • the best choice for Tom and Max is the car p2

  8. Car Database Example Query point q=p2, k=1: • Bichromatic reverse top-k: {(0.2,0.8), (0.5,0.5)} • advertise product to Tom and Max • Monochromatic reverse top-k: line segment w[price]=[1/7,5/6] • estimate the impact of p2 as 69% Query point q=p3, k=1: empty result set for the bichromatic query

  9. Outline • Motivation & Preliminaries • Monochromatic Reverse Top-k Queries • Bichromatic Reverse Top-k Queries • Threshold-based Algorithm • Materialized Views • Experimental Evaluation • Conclusions & Future Work

  10. mRTOP1(q) Monochromatic Reverse Top-k Query • mRTOPk(q): Given a point q, a positive number k and a dataset S, the result set of the monochromatic reverse top-k query is the locus for which there exists p in TOPk(wi) such that fwi(q) ≤ fwi(p). • The solution spaceW can be split into a finite set of non-adjacent partitions such that query point q has the same rank for all the weighting vectors. • For the monochromatic case: we focus on the 2-d space 2 1 2 Solution space

  11. Geometric Interpretation d=2, k =1 • If q belongs to the convex hull, then there exists exactly one partition in mRTOP1(q) • Weighting vectors that are perpendicular to pq and qr define the line segment • For weighting vectors with smaller and larger slopes than w1, the relative order of p and q changes • Monochromatic reverse top-k,k>1: • The solution space may contain more than 1 partition

  12. Outline • Motivation & Preliminaries • Monochromatic Reverse Top-k Queries • Bichromatic Reverse Top-k Queries • Threshold-based Algorithm • Materialized Views • Experimental Evaluation • Conclusions & Future Work

  13. Bichromatic Reverse Top-k Query • bRTOPk(q): Given a point q, a positive number k and two datasets S and W, where S represents data points and W is a dataset containing different weighting vectors, a weighting vector wibelongs to the result set, if and only if there exists p in TOPk(wi) such that fwi(q) ≤ fwi(p) • Naïve approach: • for each weighting vector process the top-k query • test if query point q is in the top-k list

  14. Threshold-based Algorithm (RTA) • Goal: • reduce the number of top-k evaluations by discarding weighting vectors • Threshold-based Algorithm (RTA): • sort the weighting vectors based on pairwise similarity • top-k queries defined by similar vectors, have similar result sets • evaluate the first top-k query, calculate a threshold • For each weighting vector • possibly prune based on threshold • refine threshold

  15. p2 p3 p1 p4 p6 p5 p8 p9 p10 p7 w3 w1 w2 Example of RTA Algorithm (k=2) Buffer: p1, p2 • Evaluate top-2 query for w1 • Set threshold based on w2 • fw2(q) > threshold  discard w2 • Refine threshold for w3 q W=[ w1, w2, w3 ]

  16. Materialized Views • Threshold-based Algorithm (RTA) • reduce the top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set • process at least as many top-k evaluations as the cardinality of the result set • Materialized Views • find weighting vectors that belong definitely to the result without top-k evaluation

  17. w1, w2, w3 Materialized Views • Grid-based space partitioning • cell Ci • lower left corner CiL • upper right corner CiU • We store for each cell Cithe results of reverse top-k queries for corners CiLand CiU

  18. w1, w2, w3 w1, w2, w3 , w4 Materialized Views • Given a point q enclosed in cell Ci • all weighting vectors in RTOPk(CiU) belong to the result set of q • only weighting vectors in RTOPk(CiL) - RTOPk(CiU) have to be examined • Materialized views can be generalized for arbitrary k<Kvalues

  19. Outline • Motivation & Preliminaries • Monochromatic Reverse Top-k Queries • Bichromatic Reverse Top-k Queries • Threshold-based Algorithm • Materialized Views • Experimental Evaluation • Conclusions & Future Work

  20. Experimental Setup • Comparison between Naïve and RTA (varying dimensionality, cardinality, data distribution – real data) • Queries: uniform and k-skyband points • Metrics: • time • I/Os • number of top-k evaluations

  21. RTA vs. Naïve uniform distribution of S and uniform weights W |S|=10K, |W|=10K, top-k=10, skyband query points • RTA outperforms naive by 1 to 2 orders of magnitude • as dimensionality increases, |RTOPk(q)| decreases leading to fewer top-k evaluations

  22. naive requires |W| top-k query evaluations |W|=5K, correlated dataset: RTA needs on 544 out of 5000 top-k evaluations (saves 89.12% of the cost) the average size of the result set is 459 Scalability of RTA Algorithm various distributions (UN, AC, CO) of S and uniform weights W |S|=10K or|W|=10K, d=5, top-k=10, skyband query points

  23. Performance of RTA on Real Data NBAconsists of 17265 tuples, d=5(number of points scored,rebounds, assists, steals and blocks) HOUSE consists of 127930 tuples, d=6 (income spent on gas, electricity, water, heating, insurance, and property tax) • uniform and clustered weights W (|W|=10K) • clustered weights lead to fewer top-k evaluations

  24. Outline • Motivation & Preliminaries • Example of Reverse Top-k Queries • Monochromatic Reverse Top-k Queries • Bichromatic Reverse Top-k Queries • Threshold-based Algorithm • Materialized Views • Experimental Evaluation • Conclusions & Future Work

  25. Conclusions and Future Work • We introduced reverse top-k queries • geometric interpretation of the solution space • efficient algorithm for bichromatic reverse top-k query • materialized reverse top-k views • Future Work • interpretation of solution space for higher dimensions (monochromatic reverse top-k) • improve the performance of the bichromatic reverse top-k computation

  26. Thank you! Related work: Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries" Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries" More information: http://www.idi.ntnu.no/~vlachou/

More Related