
Improving Web Search Results Using Affinity Graph


Presentation Transcript


  1. Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research Asia SIGIR 2005

  2. INTRODUCTION • The top search results can hardly cover a sufficient variety of topics (redundancy) • a re-ranking method based on MMR • There is no indication of how informative a returned document is on the query topic (coverage) • a subtopic retrieval method • two novel metrics: diversity and information richness

  3. BACKGROUND • The most famous work on link analysis • the PageRank and HITS algorithms • Explicit link analysis and implicit link analysis • two web pages are implicitly linked if they are visited sequentially by the same end-user • DirectHit and Small Web Search

  4. AFFINITY RANKING

  5. AFFINITY RANKING • Diversity: Given a set of documents R, we use diversity Div(R) to denote the number of different topics contained in R. • Information Richness: Given a document collection D = {d1…dn}, we use information richness InfoRich(di) to denote the richness of information contained in document di with respect to the entire collection D.

  6. Affinity Graph Construction • According to the vector space model, the similarity between a document pair di and dj can be calculated as the cosine of their term vectors • To further measure the significance of the similarity between each document pair, we define the affinity of dj to di
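The construction above can be sketched in Python. This is a minimal sketch, assuming term vectors stored as `{term: weight}` dicts; the 0.1 affinity threshold is an illustrative assumption, not a value given in the slides.

```python
import math

def cosine_sim(u, v):
    """Standard vector-space cosine similarity between two sparse
    term-weight vectors represented as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity_graph(docs, threshold=0.1):
    """Directed affinity graph over the collection: the edge di -> dj
    carries the similarity, kept only when it clears the threshold
    (threshold value is an assumption for illustration)."""
    n = len(docs)
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine_sim(docs[i], docs[j])
                if s > threshold:
                    aff[i][j] = s
    return aff
```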

  7. Information Richness Computation • After obtaining the Affinity Graph, we apply a link analysis algorithm similar to PageRank • M is normalized so that the sum of each row equals 1.

  8. Information Richness Computation • the score of document di can be deduced from those of all other documents linked to it • With damping factor c (similar to the random-jump factor in PageRank):

  9. Information Richness Computation • information can choose where to flow according to the following two rules: • With probability c, the information will flow into the document nodes that di links to • With probability 1 − c, the information will randomly flow into any document in the collection.
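The three slides above describe a PageRank-style power iteration on the row-normalized affinity matrix. A sketch, assuming c = 0.85 (the usual PageRank damping choice, not necessarily the paper's value):

```python
def info_richness(aff, c=0.85, iters=100):
    """PageRank-style iteration:
        InfoRich(di) = (1 - c)/n + c * sum_j M[j][i] * InfoRich(dj)
    where M is the affinity matrix row-normalized so each row sums to 1,
    i.e. information flows out of dj along its affinity edges with
    probability c and jumps to a random document with probability 1 - c."""
    n = len(aff)
    M = []
    for row in aff:
        s = sum(row)
        # Dangling rows (no outgoing affinity) spread uniformly.
        M.append([x / s for x in row] if s else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - c) / n + c * sum(M[j][i] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores
```

Because M is row-stochastic, the scores stay a probability distribution across iterations, mirroring the "information flow" reading on the slide.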

  10. Diversity Penalty
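The diversity penalty can be sketched as a greedy loop that repeatedly promotes the highest-scoring document and then discounts its close neighbors in the affinity graph, so later picks cover new topics. The exact penalty term used here (affinity times the picked document's information richness) is an assumption, since the slide's formula is not reproduced in the transcript.

```python
def diversity_rank(info_rich, aff):
    """Greedy diversity penalty (sketch): pick the document with the
    highest remaining score, then penalize each unpicked document by
    the information it shares with the pick, and repeat."""
    n = len(info_rich)
    scores = list(info_rich)
    ranked = []
    remaining = set(range(n))
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        ranked.append(best)
        remaining.remove(best)
        for j in remaining:
            # Assumed penalty: subtract the affinity-weighted richness
            # of the picked document from its neighbors.
            scores[j] -= aff[j][best] * info_rich[best]
    return ranked
```

With this loop, a near-duplicate of an already-picked document drops down the ranking even if its raw richness score is high.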

  11. Re-ranking Method • The re-ranking mechanism is a combination of results from full-text search and Affinity Ranking • score-combination

  12. Re-ranking Method • rank-combination
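The two combination schemes on these slides can be sketched as follows. The α = β = 0.5 defaults and the max-normalization are illustrative assumptions; the paper's experiments tune these weights.

```python
def score_combination(ft_scores, ar_scores, alpha=0.5, beta=0.5):
    """Score-combination: weighted sum of the normalized full-text
    relevance score and the normalized Affinity Ranking score."""
    def norm(xs):
        m = max(xs) or 1.0
        return [x / m for x in xs]
    ft, ar = norm(ft_scores), norm(ar_scores)
    return [alpha * f + beta * a for f, a in zip(ft, ar)]

def rank_combination(ft_rank, ar_rank, alpha=0.5, beta=0.5):
    """Rank-combination: weighted sum of the two rank positions
    (lower combined value = better final position)."""
    return [alpha * r1 + beta * r2 for r1, r2 in zip(ft_rank, ar_rank)]
```

Rank-combination ignores the score magnitudes, which makes it robust when the two rankers produce scores on incomparable scales.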

  13. EXPERIMENTS • Yahoo! Directory • contained a total of 292,216 categories (including leaf and non-leaf categories) • All categories are organized into a 16-level hierarchy • we downloaded 792,601 documents in total • ODP (Open Directory Project) • We downloaded the directory in August 2004. ODP includes a total of 172,565 categories • we downloaded 1,547,000 documents in total

  14. EXPERIMENTS • Newsgroup dataset • The Newsgroup data is composed of 256,449 posts collected from 117 commercial application newsgroups, with a total size of about 400 MB • Title and content of each post are given a 3:1 weighting ratio in the indexing process • There are no explicit links among the posts • a large number of posts are very likely devoted to the same topic

  15. Affinity Ranking vs. K-Means Clustering

  16. Affinity Ranking vs. K-Means Clustering • The top 1000 search results of each query are passed to the AR or K-Means algorithm to re-rank the top 10 results • For the K-Means algorithm, we set K = 10 and use the top document of each cluster to construct the top 10 results

  17. Affinity Ranking vs. K-Means Clustering

  18. Affinity Ranking in Newsgroup dataset • Query • We compare our approach with the Okapi system in three aspects: diversity, information richness and relevance

  19. Affinity Ranking in Newsgroup dataset • Four researchers were hired to label the top 50 search results for each of the 20 queries based on the following steps:

  20. Affinity Ranking in Newsgroup dataset • N is the number of users • X could be diversity, information richness, or relevance of the top search results • A and F represent results from our ranking scheme and full-text search, respectively
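The evaluation quantity described above is assumed here to be the mean relative improvement of the A results over the F results across the N users (an assumption; the slide's equation is not reproduced in the transcript). A sketch:

```python
def avg_improvement(scores_A, scores_F):
    """Assumed metric: mean over N users of (X_A - X_F) / X_F, where
    X is diversity, information richness, or relevance for one user."""
    pairs = list(zip(scores_A, scores_F))
    return sum((a - f) / f for a, f in pairs) / len(pairs)
```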

  21. Improvement in Top 10 Search Results • As the top 10 search results always receive the most attention from end-users • In this experiment, we use the rank-combination scheme with α = 0 and β = 1

  22. Improvement within Top 50 Search Results

  23. Improvement within Top 50 Search Results

  24. A Case Study • This example is extracted from our experiments on the Newsgroup search for the query “Outlook print error”

  25. CONCLUSIONS • Proposed two new metrics, diversity and information richness • A novel ranking scheme, Affinity Ranking, is proposed to re-rank the search results • Our experiments showed that the proposed metrics and new ranking method can effectively improve the search performance • Future work includes scaling our Affinity Ranking computation, for example, to the Web scale
