
Random Walks on the Click Graph

Nick Craswell and Martin Szummer, Microsoft Research Cambridge. SIGIR 2007.


Presentation Transcript


  1. Random Walks on the Click Graph Nick Craswell and Martin Szummer Microsoft Research Cambridge SIGIR 2007

  2. Introduction (1/2) • A search engine can track which of its search results were clicked for which query • Click records of query-document pairs can be viewed as a weak indication of relevance • The user decided to at least view the document, based on its description in the search results • We can use the clicks of past users to improve the current search results • The clicked set of documents is likely to differ from the current user’s relevance set

  3. Introduction (2/2) • From the perspective of a user conducting a search: • Documents that are clicked but not relevant constitute noise • Documents that are relevant but not clicked constitute sparsity in the click data • Power law distribution: most queries in the click log have a small number of clicked documents • This paper focuses on the sparsity problem by giving a Markov random walk model, although the model also has noise reduction properties

  4. Algorithm on the Click Graph • The current model uses click data alone, without considering document content or query content • The click graph: • Bipartite • Two types of nodes: queries and documents • An edge connects a query and a document if a click for that query-document pair is observed • The edge may be weighted according to the total number of clicks from all users
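The click graph described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_click_graph`, the tagged-tuple node representation, and the sample click log are all assumptions made for the example.

```python
from collections import defaultdict

def build_click_graph(click_log):
    """Aggregate (query, url) click records into a weighted bipartite graph.

    Returns a dict mapping each node to {neighbor: click_count}. Nodes are
    tagged tuples, ("q", text) or ("d", url), so that a query string and a
    URL can never collide.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        q, d = ("q", query), ("d", url)
        graph[q][d] += 1  # edge weight = total clicks over all users
        graph[d][q] += 1  # store both directions: the edge is undirected
    return {node: dict(neighbors) for node, neighbors in graph.items()}

# Hypothetical click log, invented for illustration only.
log = [
    ("panda", "a.example/panda.jpg"),
    ("panda", "a.example/panda.jpg"),
    ("panda", "b.example/bear.jpg"),
    ("giant panda", "b.example/bear.jpg"),
]
graph = build_click_graph(log)
print(graph[("q", "panda")])
```

Storing each edge in both directions keeps the later random-walk steps symmetric: a walk can move from a query to a document and back using the same click counts.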

  5. Click Graph Example

  6. Application Areas for Algorithms on Click Graph • Query-to-document ‘search’ • Given a query, find relevant documents, as in ad hoc search • Query-to-query ‘suggestion’ • Given a query, find other queries that the user might like to run • Document-to-query ‘annotation’ • Given a document, attach related queries to it • Document-to-document ‘relevance feedback’ • Given an example document that is relevant to the user, find additional relevant documents
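The four application areas differ only in the type of the start node and the type of the nodes that are returned. A small sketch, assuming walk scores over all nodes are already available (the scores, node names, and the helper `rank_nodes` are hypothetical):

```python
def rank_nodes(scores, node_type, start, target_type, top_n=5):
    """Return the top_n nodes of target_type by walk score, excluding start."""
    candidates = [
        (node, score)
        for node, score in scores.items()
        if node_type[node] == target_type and node != start
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [node for node, _ in candidates[:top_n]]

# Hypothetical scores for a walk started at the query "panda".
node_type = {
    "panda": "query",
    "giant panda": "query",
    "panda.jpg": "doc",
    "bear.jpg": "doc",
}
scores = {"panda": 0.40, "giant panda": 0.10, "panda.jpg": 0.35, "bear.jpg": 0.15}

search = rank_nodes(scores, node_type, "panda", "doc")        # query-to-document
suggestion = rank_nodes(scores, node_type, "panda", "query")  # query-to-query
print(search, suggestion)
```

Starting the walk from a document instead of a query would give the annotation and relevance-feedback cases with the same filter.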

  7. Random Walk Model • A basic query formulation model • Imagine a document (information need) • Think of a query associated with the document • Issue the query or imagine another document related to the query • Iterative thought process (noise process) • A Markov random walk which describes a probability distribution over queries • The retrieval model is obtained by inverting the query formulation model • Starts from an observed query, and attempts to undo the noise, inferring the underlying information need • Backward walks

  8. Random Walk Computation • $C_{jk}$: click counts associating node $j$ to node $k$ • Define transition probabilities $P_{t+1|t}(k \mid j)$ from $j$ to $k$, where $s$ is the self-transition probability, which corresponds to the user favoring the current query or document • Transition matrix $[A]_{jk} = P_{t+1|t}(k \mid j)$, so the $t$-step probabilities are $P_{t|0}(k \mid j) = [A^t]_{jk}$ • $[A^t]_{jk}$ is a measure of the volume of paths between $j$ and $k$
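The explicit formula for the transition probabilities did not survive in this transcript. The sketch below assumes the natural normalization: for $k \ne j$, $P_{t+1|t}(k \mid j) = (1-s)\,C_{jk} / \sum_i C_{ji}$, and $P_{t+1|t}(j \mid j) = s$. The function names, the handling of isolated nodes, and the click counts are assumptions for illustration.

```python
def transition_matrix(counts, s):
    """Row-stochastic matrix A with A[j][k] = P(k | j) for a single step.

    counts: symmetric square list of lists with a zero diagonal,
            counts[j][k] = clicks between nodes j and k.
    s: self-transition probability, 0 <= s < 1.
    """
    n = len(counts)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(counts[j])
        for k in range(n):
            if k == j:
                A[j][k] = s
            elif total > 0:
                A[j][k] = (1.0 - s) * counts[j][k] / total
        if total == 0:
            A[j][j] = 1.0  # isolated node stays put (an assumption)
    return A

def mat_mul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def mat_power(A, t):
    """A multiplied by itself t times; entry [j][k] is P_{t|0}(k | j)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, A)
    return result

# Nodes 0 and 1 are queries, nodes 2 and 3 are documents (hypothetical counts).
C = [
    [0, 0, 2, 1],
    [0, 0, 0, 1],
    [2, 0, 0, 0],
    [1, 1, 0, 0],
]
A = transition_matrix(C, s=0.1)
At = mat_power(A, 3)
print(A[0])
```

Every row of `A`, and therefore of `At`, sums to 1, so each row can be read as a probability distribution over end nodes.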

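  9. Random Walk Model for Retrieval • Backward random walk for retrieval: given that a $t$-step walk ended at node $j$, find the probability that it started at node $k$, $P_{0|t}(k \mid j)$ • Bayes' rule: $P_{0|t}(k \mid j) = P_{t|0}(j \mid k)\,P_0(k) / P_t(j)$ • Assuming a uniform prior $P_0(k) = 1/N$, we have $P_t(j) = \frac{1}{N}\sum_i [A^t]_{ij}$; the $1/N$ factors cancel, giving $P_{0|t}(k \mid j) = [A^t Z^{-1}]_{kj}$, where $Z$ is diagonal with $Z_{jj} = \sum_i [A^t]_{ij}$ • Forward random walk: $P_{t|0}(k \mid j) = [v_j A^t]_k$, where $v_j$ is the row vector with a 1 in position $j$ and 0 elsewhere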
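The two walks on this slide can be sketched without forming the full matrix power: the forward walk repeatedly multiplies a row vector by the transition matrix, and the backward walk repeatedly multiplies the matrix by a column vector and then normalizes by $Z_{jj}$. The matrix values below are hypothetical, and the function names are assumptions.

```python
def forward_walk(A, j, t):
    """P_{t|0}(k | j): distribution over end nodes k after t steps from j."""
    n = len(A)
    v = [float(i == j) for i in range(n)]  # row vector v_j
    for _ in range(t):
        v = [sum(v[i] * A[i][k] for i in range(n)) for k in range(n)]
    return v

def backward_walk(A, j, t):
    """P_{0|t}(k | j): distribution over start nodes k given the walk ended
    at j, under a uniform prior over start nodes."""
    n = len(A)
    u = [float(i == j) for i in range(n)]  # column vector e_j
    for _ in range(t):
        u = [sum(A[k][i] * u[i] for i in range(n)) for k in range(n)]
    z = sum(u)  # Z_jj = sum over i of [A^t]_ij
    return [x / z for x in u]

# Hypothetical row-stochastic transition matrix over 4 nodes
# (nodes 0 and 1 are queries, nodes 2 and 3 are documents).
A = [
    [0.10, 0.00, 0.60, 0.30],
    [0.00, 0.10, 0.00, 0.90],
    [0.90, 0.00, 0.10, 0.00],
    [0.45, 0.45, 0.00, 0.10],
]
fwd = forward_walk(A, 0, 3)
bwd = backward_walk(A, 0, 3)
print(fwd)
print(bwd)
```

The forward result is row `j` of $A^t$, while the backward result is column `j` of $A^t$ rescaled to sum to 1, which is why the two rankings can differ.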

  10. Forward vs. Backward Walks • PageRank: a query-independent forward random walk on the link graph, which proceeds to its stationary distribution • In statistics, the backward walk model is referred to as diagnostic, and in contrast, the forward walk model is predictive • When t → ∞: • The forward random walk approaches the stationary distribution • Gives high probability to nodes with a large number of clicks • The backward random walk approaches the prior starting distribution, which we have taken to be uniform
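The limiting behavior can be checked numerically on a toy graph. The sketch below assumes a connected graph, transition probabilities proportional to click counts with self-transition probability `s`, and hypothetical counts; it compares a long forward walk with each node's share of the total clicks and a long backward walk with the uniform distribution.

```python
def step_matrix(counts, s):
    """One-step transition matrix; assumes every node has at least one click
    and that counts has a zero diagonal."""
    A = []
    for j, row_counts in enumerate(counts):
        total = sum(row_counts)
        row = [(1.0 - s) * c / total for c in row_counts]
        row[j] = s
        A.append(row)
    return A

def mat_mul(X, Y):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Hypothetical symmetric click counts on a connected 4-node graph.
counts = [
    [0, 0, 2, 1],
    [0, 0, 0, 1],
    [2, 0, 0, 0],
    [1, 1, 0, 0],
]
n = len(counts)
A = step_matrix(counts, s=0.1)

At = [[float(i == j) for j in range(n)] for i in range(n)]
for _ in range(200):  # a long walk, standing in for t -> infinity
    At = mat_mul(At, A)

forward = At[0]                         # row 0: end-node distribution
column = [At[k][0] for k in range(n)]   # column 0: unnormalized start-node scores
backward = [x / sum(column) for x in column]

total_clicks = sum(sum(row) for row in counts)
click_share = [sum(row) / total_clicks for row in counts]

print(forward, click_share)
print(backward)
```

The example uses `s > 0` on purpose: with `s = 0` a walk on a bipartite graph alternates between the query side and the document side at every step, so the forward distribution oscillates instead of converging.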

  11. Clustering Effect • Given an end node that is part of a cluster, we have similar probabilities of having started the walk from any node in the cluster

  12. Walk Parameters Figure: Probability distribution of non-self transitions under different combinations of t and s
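Because each step is a self-transition with probability s independently of the others, the number of non-self transitions in a t-step walk follows a binomial distribution with parameters t and 1 − s. The sketch below computes that distribution; the function name is an assumption, and whether the original figure plots exactly this quantity is inferred from its caption.

```python
from math import comb

def non_self_transition_pmf(t, s):
    """P(exactly m of the t steps are non-self transitions), for m = 0..t."""
    p = 1.0 - s
    return [comb(t, m) * p**m * s**(t - m) for m in range(t + 1)]

def mean(pmf):
    """Expected value of a distribution given as a list indexed by outcome."""
    return sum(m * prob for m, prob in enumerate(pmf))

short_walk = non_self_transition_pmf(11, 0.0)   # 11 steps, no self-transitions
long_walk = non_self_transition_pmf(101, 0.9)   # 101 steps, s = 0.9

print(mean(short_walk), mean(long_walk))
```

Under these assumptions, an 11-step walk with s = 0 always makes 11 moves, while a 101-step walk with s = 0.9 makes about 10 moves on average but with a spread around that value.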

  13. Experiment Data • A 14-day click log of web image search engines • Judged images at distance 1 from the query had a precision of 75% • Pruning: remove URLs connected to only one query and remove queries connected to only one URL • After pruning: 505,000 URLs, 202,000 queries and 1.1 million edges • 45 queries uniformly sampled for evaluation • TREC-style pooled relevance judgments to depth 20 • 2278 relevance judgments identify 818 relevant images
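The pruning step can be sketched as two filters over the set of query-URL edges. The slide does not say whether the pruning is applied once or repeated until nothing changes; the sketch below assumes a single pass in the stated order, and the function name and sample edges are hypothetical.

```python
def prune_click_edges(edges):
    """edges: set of (query, url) pairs. Returns the pruned set of pairs."""
    # Pass 1: drop URLs that are connected to only one distinct query.
    queries_per_url = {}
    for q, u in edges:
        queries_per_url.setdefault(u, set()).add(q)
    edges = {(q, u) for q, u in edges if len(queries_per_url[u]) > 1}

    # Pass 2: drop queries that are left with only one distinct URL.
    urls_per_query = {}
    for q, u in edges:
        urls_per_query.setdefault(q, set()).add(u)
    return {(q, u) for q, u in edges if len(urls_per_query[q]) > 1}

# Hypothetical edges, invented for illustration only.
edges = {
    ("q1", "u1"), ("q1", "u2"), ("q1", "u3"),
    ("q2", "u1"), ("q2", "u2"),
    ("q3", "u2"),
}
pruned = prune_click_edges(edges)
print(sorted(pruned))
```

Here `u3` is dropped because only `q1` clicked it, and `q3` is then dropped because it is connected only to `u2`.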

  14. Experiment Result-1 Table 1. The furthest node from any of our test queries is at distance 41 (‘101-0.9-backward’). ‘dist’ and ‘1-0-forward’ are the baselines.

  15. Experiment Result-2 Figure: The number of images retrieved at different distances from the query for each method. The 101-step walk with zero-self-transition possibly goes too far, returning too few distance-1 images.

  16. Experiment Result-3 Figure: The precision at different distances from the query for each method.

  17. Experiment Result-4 Figure: Precision-recall curves of forward and backward walks, with zero self-transition probability (1000 URLs retrieved)

  18. Experiment Result-5 Figure: Parameter sensitivity for a backwards walk. Each contour shows a 0.01 variation in MAP@20. Grid intersections indicate the parameter combinations tried. The large plateau has the highest MAP@20 (0.56-0.57)
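For readers unfamiliar with the metric, MAP@20 can be sketched as the mean over queries of average precision computed on the top 20 results. The slides do not give the exact normalization used; the version below divides by the total number of relevant items for the query, which is one common convention and is an assumption here, as are the function names and sample data.

```python
def average_precision_at_k(ranked_relevance, num_relevant, k=20):
    """ranked_relevance: list of booleans, best-ranked result first.
    num_relevant: total number of relevant items known for the query."""
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_relevant

def mean_average_precision_at_k(runs, k=20):
    """runs: list of (ranked_relevance, num_relevant) pairs, one per query."""
    return sum(average_precision_at_k(r, n, k) for r, n in runs) / len(runs)

# Hypothetical judgments for two queries.
runs = [
    ([True, False, True, False], 2),
    ([False, True], 1),
]
map_at_20 = mean_average_precision_at_k(runs)
print(map_at_20)
```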

  19. Conclusion • We have applied a Markov random walk model to the click graph, giving us a high-quality ranking of documents for a given query, including those as-yet unclicked for that query • A backward walk was more effective than a forward walk, which supports the notion underlying our backward walk • We got the best results from a walk of 11 steps, or 101 steps with high self-transition probability • We have studied ad hoc retrieval in this paper, and the model could be effective and easily applied in the other applications listed • Given our model, another possible step would be to incorporate document content and query content through a language model, aiming to find documents that are not yet part of the click graph
