Diversifying Search Results

Diversifying Search Results. Rakesh Agrawal (rakesha@microsoft.com), Sreenivas Gollapudi (sreenig@microsoft.com), Alan Halverson, Samuel Ieong. Search Labs, Microsoft Research.

Presentation Transcript


  1. Diversifying Search Results • Rakesh Agrawal (rakesha@microsoft.com) • Sreenivas Gollapudi (sreenig@microsoft.com) • Alan Halverson (alanhal@microsoft.com) • Samuel Ieong (saieong@microsoft.com) • Search Labs, Microsoft Research • WSDM ’09

  2. Outline • Introduction • Problem Formulation • A Greedy Algorithm for DIVERSIFY(K) • Performance Metrics • Evaluation • Conclusions

  3. Introduction • Minimize the risk of dissatisfaction of the average user • Assume that there exist • a taxonomy • a model of user intents • Consider both the relevance of the documents and the diversity of the search results • Trade off relevance against diversity

  4. Problem Formulation • Choosing the number of results to show for each category in proportion to the percentage of users interested in that category may perform poorly • Example: Flash • Technology: 0.6

  5. Problem Formulation • The DIVERSIFY(k) objective scores an unordered set of results • Our algorithm is nevertheless designed to generate an ordering of results rather than just a set of results

  6. Problem Formulation • DIVERSIFY(k) is NP-hard • The optimal set for DIVERSIFY(k-1) need not be a subset of the optimal set for DIVERSIFY(k) • Example: p(c1|q) = p(c2|q) = 0.5 • DIVERSIFY(1): d1, d2, d3 • DIVERSIFY(2): d2, d3, d1
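For reference, the DIVERSIFY(k) objective can be written as below. This is a sketch of the formulation rather than a transcription of the slide: the symbol V(d|q,c), read as the probability that document d satisfies a user who issues query q with intent c, does not appear in the transcript and is an assumption here. The goal is to pick a set S of k documents that maximizes the probability that the average user finds at least one satisfying result.

```latex
P(S \mid q) \;=\; \sum_{c} P(c \mid q)\,
  \Bigl( 1 - \prod_{d \in S} \bigl( 1 - V(d \mid q, c) \bigr) \Bigr),
\qquad |S| = k
```

The product term is the probability that every document in S fails a user with intent c, so each further document in an already well-covered category contributes less than the previous one.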

  7. A Greedy Algorithm for DIVERSIFY(K)
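The slide body did not survive extraction, so the following is only a minimal sketch of a greedy selection in the spirit of IA-Select, not the authors' exact pseudocode. It assumes each document carries a per-category quality value V(d|q,c) in [0, 1]; the function name `ia_select`, the dictionary layout, and the example numbers are all hypothetical. At each step it picks the document with the largest gain weighted by how likely each intent is to be still unsatisfied, then discounts the intents that document covers.

```python
from typing import Dict, List


def ia_select(k: int,
              intent_prob: Dict[str, float],
              quality: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedily pick up to k documents (hypothetical sketch).

    intent_prob maps category c to P(c|q).
    quality maps document d to {category c: V(d|q,c)}; missing entries count as 0.
    """
    # Probability mass of each intent that is still unsatisfied by the picks so far.
    unsatisfied = dict(intent_prob)
    remaining = set(quality)
    selected: List[str] = []

    while remaining and len(selected) < k:
        def gain(doc: str) -> float:
            # Marginal utility: quality weighted by the still-unsatisfied intents.
            return sum(u * quality[doc].get(c, 0.0) for c, u in unsatisfied.items())

        # Iterate in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)

        # Intents covered by the chosen document become less urgent.
        for c in unsatisfied:
            unsatisfied[c] *= 1.0 - quality[best].get(c, 0.0)

    return selected


# Made-up example: two strong documents for c1, one moderate document for c2.
example_intents = {"c1": 0.7, "c2": 0.3}
example_quality = {
    "d1": {"c1": 0.9},
    "d2": {"c1": 0.8},
    "d3": {"c2": 0.6},
}
top2 = ia_select(2, example_intents, example_quality)
```

In this made-up example a pure relevance ranking would return d1 and d2, both for c1. The greedy rule instead picks d1 and then d3, because after d1 is chosen the remaining unsatisfied mass of c1 drops to 0.07 while c2 is still at 0.3.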

  8. Performance Metrics • NDCG, MRR, and MAP do not take into account the value of diversification • Intent-aware measure example: p(c2|q) >> p(c1|q) • d1 is Excellent for c1 (but unrelated to c2) • d2 is Good for c2 (but unrelated to c1) • Classical IR metrics: d1, d2 • Intent-aware measures: d2, d1

  9. Intent Aware Measure
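The slide body is missing here as well. One way to read an intent-aware measure is as the classical metric computed separately for each category and then averaged with weights P(c|q), for example NDCG-IA(k) = Σ_c P(c|q) · NDCG(k | c). The sketch below illustrates that reading on the d1/d2 example from the Performance Metrics slide. The numeric grades (Excellent = 3, Good = 2), the probabilities 0.1 and 0.9, the linear-gain DCG variant, and all function names are assumptions made for illustration.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    # Linear-gain DCG with a log2(rank + 1) discount (one common variant).
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking: List[str], rel: Dict[str, float], k: int) -> float:
    # NDCG@k against the relevance grades of a single category.
    gains = [rel.get(doc, 0.0) for doc in ranking[:k]]
    ideal = dcg(sorted(rel.values(), reverse=True)[:k])
    return dcg(gains) / ideal if ideal > 0 else 0.0


def ndcg_ia(ranking: List[str],
            intent_prob: Dict[str, float],
            relevance: Dict[str, Dict[str, float]],
            k: int) -> float:
    # Intent-aware NDCG: per-category NDCG weighted by P(c|q).
    return sum(p * ndcg(ranking, relevance.get(c, {}), k)
               for c, p in intent_prob.items())


# Assumed numbers for the slide's example, where p(c2|q) >> p(c1|q).
intent_prob = {"c1": 0.1, "c2": 0.9}
relevance = {
    "c1": {"d1": 3.0},  # d1 is Excellent for c1, unrelated to c2
    "c2": {"d2": 2.0},  # d2 is Good for c2, unrelated to c1
}
score_d1_first = ndcg_ia(["d1", "d2"], intent_prob, relevance, k=2)
score_d2_first = ndcg_ia(["d2", "d1"], intent_prob, relevance, k=2)
```

Under these assumptions the ordering d2, d1 scores higher than d1, d2, matching the slide's point that an intent-aware measure prefers to lead with the document that serves the dominant intent even though its grade is lower.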

  10. Evaluation • Evaluate our approach against three commercial search engines • Conduct three sets of experiments • The experiments differ in how the distributions of intents and the relevance of the documents are obtained

  11. Experiment 1 • The distributions of intents for both queries and documents are obtained via standard classifiers • The relevance of documents comes from a proprietary repository of human judgments that we have been granted access to • Dataset: 10,000 random queries with the top 50 documents for each • Many of the top 10 documents for each query have human judgments

  12. Experiment 1 • Sample about 900 queries such that • each query belongs to at least two categories • a significant fraction of the associated documents have human judgments

  13. Experiment 1

  14. Experiment 2 • Obtain the distributions of intents for queries and the document relevance using the Amazon Mechanical Turk platform • Sample 200 queries from the dataset • Each query belongs to at least three categories • Submit these queries, along with the three most likely categories as estimated by the classifier and the top five results produced by IA-Select, to the Turks

  15. Experiment 2

  16. Experiment 3 • IA-Select: p(c|q) from the Amazon Mechanical Turk platform • Metrics: p(c|q) and document relevance are the same as those used in Experiment 1

  17. Conclusions • Provide a greedy algorithm with good approximation guarantees • To evaluate the effectiveness of our approach, we proposed generalizations of well-studied metrics that take into account the intentions of the users • Our approach outperforms results produced by commercial search engines over all of the metrics