
Context-sensitive ranking

Context-sensitive ranking. Rakesh Agrawal (Microsoft Search Labs), Ralf Rantzau (IBM Silicon Valley Lab), Evimaria Terzi (University of Helsinki & Microsoft Search Labs). Work done largely while the authors were at IBM Almaden. The curse of abundance: too many data and too many answers.


Presentation Transcript


  1. Context-sensitive ranking • Rakesh Agrawal, Microsoft Search Labs • Ralf Rantzau, IBM Silicon Valley Lab • Evimaria Terzi, University of Helsinki & Microsoft Search Labs • Work done largely while the authors were at IBM Almaden • SIGMOD 2006

  2. The curse of abundance: too many data and too many answers • Query shopping.com for a digital camera [result screenshot] • Query Froogle for a tennis racquet [result screenshot]

  3. Ranking query results • Algorithms for ranking web pages have been quite successful [BP’98, Kleinberg’98] • Key idea: exploit the graph of hyperlinks between web pages • Can we take a similar approach to ranking database query results? • This requires a graph structure that accurately describes the relationships between tuples in the database - Past attempts derive the graph from schema and key constraints or from queries [BHP’04, BHNCS’02, GMT’04] - But are these graphs natural, or do they reflect design and optimization decisions?

  4. Using preferences to induce a graph of tuples • Drama > Sci-Fi • Kidman > Reeves • Matrix > Birth [Preferences are predicates of the form “X=x1 > X=x2”] [Figure: tuples t1, t2, t3; the preferences induce the edges t1>t3 and t2>t3 (Drama > Sci-Fi), t1>t3 (Kidman > Reeves), and t3>t1 (Matrix > Birth)]

  5. Augment preferences with context • In general (*): English > Spanish | * • But in the context of comedies: Spanish > English | Comedies [Contexts are predicates of the form “Y=a”]

  6. Preferences in the past • Preferences expressed via a numeric score [AW’00, KI’04, KI’05] - Nicole Kidman: 0.9 - Penelope Cruz: 0.4 - Dramas: 0.8 - Comedies: 0.3 • Pairwise preferences in the ML literature [CSS’97] • Preferences as partial orders [Kießling’02] • Preferences as first-order formulas [Chomicki’03]

  7. Contextual preferences • P1 = {G=Drama > G=Sci-Fi | L=English} • P2 = {A=Kidman > A=Reeves | L=English} • P3 = {T=Matrix > T=Birth | L=English} [Figure: in the context L=English, the preferences induce t1>t3|En and t2>t3|En (P1), t1>t3|En (P2), and t3>t1|En (P3), which are combined into a weighted preference graph over t1, t2, t3]

  8. Obtaining preferences • Users provide preferences voluntarily, in the same way they rate products and services • Preferences can be collected automatically via browser plug-ins or toolbars (with user permission) • Preferences can be learned from past data • Preferences can also be learned from the data itself (e.g., using association-rule mining) • Preferences obtained from these various sources can contain cycles and contradictions, which are resolved democratically

  9. Overview • Question: How can users’ preferences be incorporated when ranking query results? • Approach: - Accumulate contextual preferences of the form i1 > i2 | X - Order the answer tuples so that the preferences are maximally respected, giving higher weight to preferences whose contexts more closely match the query

  10. Issues • How should the similarity between a query and a context be defined? - See the paper for the distance function • Can we create orders in an offline step and use them at query time? - Should we save all orders? - How do we combine the saved orders while answering queries?

  11. Problem decomposition • [Problem 1] (Ordering): For every context X, build an order τX • [Problem 2] (ClusterOrders): Given a set of orders Tm = {τ1, …, τm}, find ℓ representative orders Tℓ - Assign each input order to its closest representative - Associate with each representative σ a set of contexts Yσ • [Problem 3] (Querying): Provide the top-k results for a query Q - respecting the representative orders, and - weighting each representative according to the similarity between the query and its contexts

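  12. Problem 1: The Ordering problem • For a given context X and a set of preferences PX over the tuples D = {t1, …, tn}, find an ordering τ of D that maximizes $\mathrm{Agree}(\tau, P_X) = \sum_{t_i, t_j \in D:\ \tau(t_i) < \tau(t_j)} P_X(t_i, t_j)$, where τ(t) is the position of t in τ and PX(ti, tj) is the weight of the preference ti > tj [Figure: example preference graph with PX(t1, t2) = PX(t2, t1) = 1/2, PX(t1, t3) = 2/3, PX(t3, t1) = 1/3, PX(t2, t3) = 1; the order (t2, t1, t3) has Agree = 1 + 1/2 + 2/3 = 13/6]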
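As a minimal sketch of the Agree objective, the Python below scores orders against a weighted preference graph. The weights in `W` are an assumption, reconstructed from the slide's example figure, and the names `W` and `agree` are illustrative. Exhaustive search over all permutations is used only because the example has three tuples; the problem itself is NP-hard.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical example graph (reconstructed from the slide figure):
# W[(u, v)] is the weight of the preference "u > v".
W = {
    ("t1", "t2"): Fraction(1, 2), ("t2", "t1"): Fraction(1, 2),
    ("t1", "t3"): Fraction(2, 3), ("t3", "t1"): Fraction(1, 3),
    ("t2", "t3"): Fraction(1),
}

def agree(order, weights):
    """Sum the weights of all preferences u > v with u placed before v in `order`."""
    pos = {t: i for i, t in enumerate(order)}
    return sum((w for (u, v), w in weights.items() if pos[u] < pos[v]), Fraction(0))

# Exhaustive search: fine for three tuples, hopeless in general.
best = max(permutations(["t1", "t2", "t3"]), key=lambda p: agree(p, W))
print(agree(("t2", "t1", "t3"), W))  # 13/6
print(agree(best, W))                # 13/6
```

Here (t1, t2, t3) ties with (t2, t1, t3), because t1 and t2 are preferred over each other with equal weight, so either order is optimal.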

  13. Problem 2: The ClusterOrders problem • Given m orders Tm = {τ1, …, τm}, each corresponding to a single context Xi, find ℓ representative orders Tℓ such that cost(Tℓ) is minimized, where $\mathrm{cost}(T_\ell) = \sum_{\tau \in T_m} d(\tau, T_\ell)$ and $d(\tau, T_\ell) = \min_{\sigma \in T_\ell} d(\tau, \sigma)$ • We use the standard Spearman footrule and Kendall tau distances for comparing orderings
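The two distances and the clustering cost can be sketched in Python as follows. Orders are represented as strings of item labels (an assumption for brevity), and `footrule`, `kendall_tau` and `cost` are illustrative names rather than code from the paper.

```python
from itertools import combinations

def footrule(a, b):
    """Spearman footrule: total displacement of each item between orders a and b."""
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(abs(i - pos_b[x]) for i, x in enumerate(a))

def kendall_tau(a, b):
    """Kendall tau distance: number of item pairs that a and b order differently."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(
        1
        for x, y in combinations(pos_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )

def cost(inputs, reps, dist=kendall_tau):
    """cost(T_l): each input order pays its distance to the closest representative."""
    return sum(min(dist(t, s) for s in reps) for t in inputs)

orders = ["abc", "acb", "cba"]  # hypothetical input orders T_m
print(kendall_tau("abc", "cba"), footrule("abc", "cba"))    # 3 4
print(cost(orders, ["abc"]), cost(orders, ["abc", "cba"]))  # 4 1
```

Adding the second representative "cba" lowers the cost from 4 to 1, because "cba" now pays nothing and "acb" stays one swap away from "abc".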

  14. The ClusterOrders problem: Example [Figure: input orders over the items a to f, each assigned to the closer of two representative orders τ1 and τ2, with individual distances 1, 0, 1, 0, 1] • Cost(τ1) = 2 • Cost(τ2) = 1 • Cost(τ1, τ2) = 2 + 1 = 3

  15. Problem 3: The Querying problem • Provide the top-k results for a query Q, respecting the representative orders and weighting each one using its corresponding set of contexts

  17. Constructing orders from preferences [Problem 1] • The problem is NP-hard, so heuristics are needed • PickPerm algorithm: pick a random permutation, reverse it, and keep the better of the two [Figure: a random permutation of t1, t2, t3 and its reverse, scored A = 11/6 and A = 5/6] [Inspired by the 2-approximation algorithm for finding the maximum acyclic subgraph of a given graph]
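A minimal PickPerm sketch follows, reusing a hypothetical example graph `W` (reconstructed from the Ordering slide's figure). Every weighted preference u > v is respected by exactly one of a permutation and its reverse, so the two Agree values sum to the total weight and the better one gets at least half of it; since no order can exceed the total weight, this is where the factor of 2 comes from. The function names are illustrative.

```python
import random
from fractions import Fraction

# Hypothetical example graph: W[(u, v)] is the weight of "u > v".
W = {
    ("t1", "t2"): Fraction(1, 2), ("t2", "t1"): Fraction(1, 2),
    ("t1", "t3"): Fraction(2, 3), ("t3", "t1"): Fraction(1, 3),
    ("t2", "t3"): Fraction(1),
}

def agree(order, weights):
    """Sum the weights of all preferences u > v with u placed before v."""
    pos = {t: i for i, t in enumerate(order)}
    return sum((w for (u, v), w in weights.items() if pos[u] < pos[v]), Fraction(0))

def pick_perm(items, weights, rng=random):
    """PickPerm: a random permutation or its reverse, whichever agrees more."""
    perm = list(items)
    rng.shuffle(perm)
    rev = perm[::-1]
    return max(perm, rev, key=lambda p: agree(p, weights))

order = pick_perm(["t1", "t2", "t3"], W, random.Random(7))
# The winner always collects at least half of the total preference weight.
assert agree(order, W) >= sum(W.values()) / 2
print(order, agree(order, W))
```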

  18. Greedy algorithm [CSS’97] • At the i-th iteration, pick the i-th element of the output permutation • At each iteration, pick the tuple t with the highest s_val(t) = OutDegree(t) - InDegree(t) (weighted) in the remaining preference graph, then remove t [Figure: on the example graph, s_val(t2) = 1, s_val(t1) = 1/3 and s_val(t3) = -4/3, so t2 is picked first; in the remaining graph s_val(t1) = 1/3 and s_val(t3) = -1/3, so t1 is picked next, giving the order (t2, t1, t3)]
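The greedy rule can be sketched as below, again on a hypothetical example graph `W` reconstructed from the slide figures. `s_val` and `greedy_order` are illustrative names; ties are broken by sorting the remaining tuples, which is an arbitrary choice made only to keep the output deterministic.

```python
from fractions import Fraction

# Hypothetical example graph: W[(u, v)] is the weight of "u > v".
W = {
    ("t1", "t2"): Fraction(1, 2), ("t2", "t1"): Fraction(1, 2),
    ("t1", "t3"): Fraction(2, 3), ("t3", "t1"): Fraction(1, 3),
    ("t2", "t3"): Fraction(1),
}

def s_val(t, remaining, weights):
    """Weighted out-degree minus in-degree of t in the graph induced by `remaining`."""
    out_w = sum((w for (u, v), w in weights.items() if u == t and v in remaining), Fraction(0))
    in_w = sum((w for (u, v), w in weights.items() if v == t and u in remaining), Fraction(0))
    return out_w - in_w

def greedy_order(items, weights):
    remaining = set(items)
    order = []
    while remaining:
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(remaining), key=lambda t: s_val(t, remaining, weights))
        order.append(best)
        remaining.remove(best)  # removing t also removes its edges
    return order

print(greedy_order(["t1", "t2", "t3"], W))  # ['t2', 't1', 't3']
```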

  19. MC algorithm • Reverse the directions of the edges of the preference graph • Run a random walk (with random restarts) on the reversed graph • Rank the tuples according to the stationary distribution
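A small sketch of the MC idea, computed by power iteration rather than by simulating the walk. After reversal, each step moves from a tuple toward the tuples preferred over it, with probability proportional to the preference weight, so preferred tuples accumulate stationary mass. The restart probability 0.15, the 100 iterations, and the "stay put" rule for tuples without outgoing reversed edges are all assumptions, not values from the slides.

```python
def mc_rank(items, weights, restart=0.15, iters=100):
    """Rank tuples by the stationary distribution of a random walk with restarts
    on the reversed preference graph."""
    items = list(items)
    n = len(items)
    # Reverse every edge u -> v ("u > v") into v -> u, so probability mass
    # flows from less preferred tuples toward more preferred ones.
    rev = {t: {} for t in items}
    for (u, v), w in weights.items():
        rev[v][u] = rev[v].get(u, 0.0) + w
    pi = {t: 1.0 / n for t in items}
    for _ in range(iters):                      # power iteration
        nxt = {t: restart / n for t in items}   # uniform random restart
        for t in items:
            total = sum(rev[t].values())
            if total == 0:                      # no outgoing edge: stay put (assumption)
                nxt[t] += (1 - restart) * pi[t]
                continue
            for u, w in rev[t].items():
                nxt[u] += (1 - restart) * pi[t] * w / total
        pi = nxt
    return sorted(items, key=lambda t: -pi[t]), pi

# Hypothetical example graph: W[(u, v)] is the weight of "u > v".
W = {("t1", "t2"): 0.5, ("t2", "t1"): 0.5, ("t1", "t3"): 2 / 3,
     ("t3", "t1"): 1 / 3, ("t2", "t3"): 1.0}
ranking, pi = mc_rank(["t1", "t2", "t3"], W)
print(ranking)  # ['t1', 't2', 't3']
```

On this toy graph the walk puts t1 ahead of t2; (t1, t2, t3) and (t2, t1, t3) have the same Agree value here, so both are optimal orders.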

  20. Performance • Data generation - Fix an order on the tuples - Generate preferences that respect this order - pc: the probability that a preference is generated between a pair of tuples • Observations - For small pc values, many orders are compatible with the preferences, and all algorithms perform well - For large pc values, MC and Greedy find the optimal order
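The data-generation step can be sketched as follows. `generate_prefs` is a hypothetical helper, and giving every generated preference weight 1.0 is an assumption. Because every preference agrees with the hidden order, that order attains the maximum possible Agree value, which makes it a convenient reference for judging the heuristics.

```python
import random
from itertools import combinations

def generate_prefs(hidden_order, pc, rng=random):
    """Synthetic preferences: for each pair (u, v) with u before v in the hidden
    order, emit the preference u > v (weight 1.0, an assumption) with probability pc."""
    return {(u, v): 1.0 for u, v in combinations(hidden_order, 2) if rng.random() < pc}

prefs = generate_prefs(["t1", "t2", "t3", "t4"], pc=0.5, rng=random.Random(0))
print(sorted(prefs))
```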

  22. Reducing the number of orders [Problem 2] • Finding ℓ representative orders is NP-hard • Choosing the ℓ representatives from among the input orders gives a good approximation, but is still hard • Heuristics are needed - Greedy algorithm: always pick the input order whose addition results in the minimum cost - Furthest algorithm: start by picking a random order τ and adding it to the output set Tℓ; then, for ℓ-1 iterations, pick the order that is furthest away from the orders already in Tℓ
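Both heuristics can be sketched with the Kendall tau cost. The orders are hypothetical strings of item labels, and `greedy_reps` and `furthest_reps` are illustrative names; the sketch assumes ℓ (here `ell`) does not exceed the number of input orders.

```python
import random
from itertools import combinations

def kendall_tau(a, b):
    """Number of item pairs that a and b order differently."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(1 for x, y in combinations(pos_a, 2)
               if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0)

def cost(inputs, reps):
    return sum(min(kendall_tau(t, s) for s in reps) for t in inputs)

def greedy_reps(inputs, ell):
    """Greedy: repeatedly add the input order whose addition gives the lowest cost."""
    reps = []
    for _ in range(ell):
        candidates = [t for t in inputs if t not in reps]
        reps.append(min(candidates, key=lambda t: cost(inputs, reps + [t])))
    return reps

def furthest_reps(inputs, ell, rng=random):
    """Furthest: random start, then repeatedly add the order farthest from the chosen set."""
    reps = [rng.choice(inputs)]
    for _ in range(ell - 1):
        candidates = [t for t in inputs if t not in reps]
        reps.append(max(candidates, key=lambda t: min(kendall_tau(t, s) for s in reps)))
    return reps

orders = ["abcd", "abdc", "bacd", "dcba", "dcab"]  # hypothetical T_m
g = greedy_reps(orders, 2)
f = furthest_reps(orders, 2, random.Random(1))
print(g, cost(orders, g))  # ['abdc', 'dcba'] 4
print(f, cost(orders, f))
```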

  23. Refine the representative orders • Given the set of representative orders Tℓ, assign each input order τ ∈ Tm to its closest representative in Tℓ (this partitions Tm into ℓ partitions)* • Discrete refinement: for each partition, pick the best representative from among the orders in the partition • Continuous refinement [DKNS’01]: for each partition, find the best representative order of the partition, not necessarily one of its members *Note the resemblance between this problem and the Catalog Segmentation problem of [KPR’04]
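Discrete refinement can be read as a k-medoids-style (Lloyd) iteration: assign each order to its closest representative, replace each representative by the partition member with the smallest total distance to the others, and repeat until nothing changes. The sketch below uses that reading with Kendall tau; the round limit and all names are assumptions. Continuous refinement, which searches over all permutations, is not shown.

```python
from itertools import combinations

def kendall_tau(a, b):
    """Number of item pairs that a and b order differently."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    return sum(1 for x, y in combinations(pos_a, 2)
               if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0)

def assign(inputs, reps):
    """Partition the input orders by their closest representative."""
    parts = [[] for _ in reps]
    for t in inputs:
        i = min(range(len(reps)), key=lambda j: kendall_tau(t, reps[j]))
        parts[i].append(t)
    return parts

def discrete_refine(inputs, reps, max_rounds=20):
    """Replace each representative by the member of its partition with the
    smallest total distance to the other members, until stable."""
    reps = list(reps)
    for _ in range(max_rounds):
        parts = assign(inputs, reps)
        new = [
            min(members, key=lambda c: sum(kendall_tau(c, t) for t in members)) if members else rep
            for rep, members in zip(reps, parts)
        ]
        if new == reps:
            break
        reps = new
    return reps, assign(inputs, reps)

orders = ["abcd", "abdc", "bacd", "dcba", "dcab"]  # hypothetical T_m
reps, parts = discrete_refine(orders, ["abcd", "abdc"])
print(reps, parts)  # ['abcd', 'dcba'] [['abcd', 'abdc', 'bacd'], ['dcba', 'dcab']]
```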

  24. Performance • Data generation - Fix ℓ underlying orders T - Generate other orders from T by picking an order in T and adding noise (swaps) - Compute the cost of the solution with respect to the ground truth • Observations - Without refinements, Greedy performs consistently better than Furthest - With refinements, both algorithms are equally good, and the groupings they produce are equivalent
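The noisy-order generation could look like the sketch below. The slides only say "swaps", so using random transpositions of two positions is an assumption, as is the helper name `noisy_orders`. The returned labels record which underlying order each input came from and can serve as the ground-truth grouping.

```python
import random

def noisy_orders(base_orders, m, swaps, rng=random):
    """Generate m input orders: pick a base order at random and apply `swaps`
    random transpositions (an assumed noise model) to a copy of it."""
    orders, labels = [], []
    for _ in range(m):
        k = rng.randrange(len(base_orders))
        o = list(base_orders[k])
        for _ in range(swaps):
            i, j = rng.sample(range(len(o)), 2)
            o[i], o[j] = o[j], o[i]
        orders.append(o)
        labels.append(k)  # ground-truth group of this order
    return orders, labels

orders, labels = noisy_orders(["abcdef", "fedcba"], m=6, swaps=2, rng=random.Random(0))
print(["".join(o) for o in orders], labels)
```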

  26. Problem 3: The Querying problem • Use a variation of the Threshold Algorithm (TA) [FLN’02, FKS’03] • Assume k = 2 and a query Q such that sim(Q, Y1) = 0.5, sim(Q, Y2) = 0.3, sim(Q, Y3) = 0.1 [Figure: three sorted lists, one per representative order, weighted 0.5, 0.3 and 0.1]

  27. Problem 3: The Querying problem • At each sequential access: a. Set the threshold TH to the weighted aggregate of the scores seen in this access • TH = 0.5*5 + 0.3*5 + 0.1*5 = 4.5

  28. Problem 3: The Querying problem • At each sequential access: b. Do random accesses and compute the scores of the objects seen • TH = 0.5*5 + 0.3*5 + 0.1*5 = 4.5

  30. Problem 3: The Querying problem • At each sequential access: c. Maintain a list of the top-k objects seen so far • TH = 0.5*5 + 0.3*5 + 0.1*5 = 4.5

  31. Problem 3: The Querying problem • At each sequential access: d. When the scores of all top-k objects are greater than or equal to the threshold, stop • TH = 0.5*4 + 0.3*4 + 0.1*4 = 3.6
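Steps a to d can be sketched as a plain Threshold Algorithm loop. The scoring rule, where the tuple at position p of an order of n tuples scores n - p, is an assumption chosen because it reproduces the slides' scores of 5 and then 4 for five tuples. The three orders below are hypothetical, and the paper's actual variation may differ.

```python
import heapq

def ta_top_k(orders, weights, k):
    """Threshold-Algorithm-style top-k over representative orders.

    orders:  representative orders, each a sequence of the same n tuples
    weights: sim(Q, Y_i) for each order
    Assumed scoring: the tuple at position p of an order scores n - p.
    """
    n = len(orders[0])
    scores = [{t: n - p for p, t in enumerate(o)} for o in orders]
    seen = {}
    for depth in range(n):
        for o in orders:                       # one sequential access per list
            t = o[depth]
            if t not in seen:                  # random accesses give the full score
                seen[t] = sum(w * s[t] for w, s in zip(weights, scores))
        threshold = sum(w * (n - depth) for w in weights)   # step a
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])  # step c
        if len(top) == k and top[-1][1] >= threshold:       # step d
            return top, depth + 1
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), n

top, depth = ta_top_k(["abcde", "bacde", "cabde"], [0.5, 0.3, 0.1], k=2)
print([t for t, _ in top], depth)  # ['a', 'b'] 2
```

The thresholds computed at the first two depths are 4.5 and 3.6, matching the slides; with these hypothetical lists the run stops after the second sequential access.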

  32. Accuracy of top-k results • IMDB dataset • Preferences are generated automatically via association-rule mining: ‘A1=a’ > ‘A1=b’ | X if conf(X ⇒ a) > conf(X ⇒ b) • Solk: top-k results obtained after clustering • Gk: top-k results without clustering
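The association-rule criterion can be sketched for a single-predicate context X = (attribute, value). The rows are hypothetical (they echo the Comedies example from slide 5), `contextual_prefs` is an illustrative name, and only values that actually co-occur with X are compared.

```python
from itertools import combinations

def contextual_prefs(rows, attr, context):
    """Derive preferences 'attr=a' > 'attr=b' | X from rule confidences.

    rows:    list of dicts (one per tuple)
    attr:    attribute whose values are compared (A1 on the slide)
    context: (attribute, value) pair playing the role of X
    Returns (a, b) pairs meaning a > b | X, i.e. conf(X => a) > conf(X => b).
    """
    cattr, cval = context
    in_ctx = [r for r in rows if r.get(cattr) == cval]
    if not in_ctx:
        return []
    counts = {}
    for r in in_ctx:
        counts[r[attr]] = counts.get(r[attr], 0) + 1
    # conf(X => a) = support(X and a) / support(X)
    conf = {v: c / len(in_ctx) for v, c in counts.items()}
    prefs = []
    for a, b in combinations(sorted(conf), 2):
        if conf[a] != conf[b]:
            prefs.append((a, b) if conf[a] > conf[b] else (b, a))
    return prefs

movies = [  # hypothetical rows
    {"genre": "Comedy", "lang": "Spanish"},
    {"genre": "Comedy", "lang": "Spanish"},
    {"genre": "Comedy", "lang": "English"},
    {"genre": "Drama", "lang": "English"},
]
print(contextual_prefs(movies, "lang", ("genre", "Comedy")))  # [('Spanish', 'English')]
```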

  33. Accuracy of top-k results

  34. Recap • Notion of contextual preferences • Use of contextual preferences to order database results • Use of association rules to obtain contextual preferences • Experimental validation of the effectiveness of the proposed techniques using both synthetic and real data

  35. Conclusions and future work • The framework of contextual preferences is both intuitive and practical • The framework is easily extended to accommodate top-k lists and bucket orders • Scalability of the algorithms needs further investigation

  36. Questions?