
Extending Relevance Model for Relevance Feedback


Presentation Transcript


  1. Extending Relevance Model for Relevance Feedback
Le Zhao, Chenmin Liang, Jamie Callan
Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

[Figure 1. Flowchart of our relevance feedback model. Components: Initial Query, Retrieval, Top Docs, User Feedback, Term Weighting, Relevance Model, Feedback.]

Introduction
The TREC 2008 Relevance Feedback track defines a testbed for evaluating relevance feedback algorithms. It includes different levels of feedback, from only one relevant feedback document to over 100 judgments with at least three relevant documents per topic.

Goal
The design of feedback algorithms is most challenging when the amount of feedback information is minimal. We therefore aim to design a robust relevance feedback algorithm that can exploit even a small number of feedback documents to achieve robust performance.

The Relevance Model
• A distribution over terms, given the information need I (Lavrenko and Croft 2001).
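The estimate itself is not shown on the transcribed poster; a standard form of the Lavrenko and Croft (2001) relevance model, consistent with the surrounding description (F denotes the set of feedback or top-ranked documents), is:

\[
P(r \mid I) \;\approx\; \frac{\sum_{D \in F} P(r \mid D)\, P(Q \mid D)\, P(D)}{P(I)},
\qquad
P(I) \;\approx\; \sum_{D \in F} P(Q \mid D)\, P(D).
\]

Since P(I) does not depend on the term r, dropping it only rescales all term weights by the same factor; in the extended model below, however, it can no longer be dropped without cost (see Modeling P(I)).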
• For a term r, P(I) can be dropped without affecting the relative term weights.
• The top n terms form the relevance-model Indri query #weight( w1 r1 w2 r2 ... wn rn ), where wi = P(ri | I).
• Interpolation with the original query: #weight( w Original_Query (1-w) Relevance_Model_Query ).

The Extended Relevance Model
Problem setup:
• Weight feedback terms according to both the relevant feedback documents and the pseudo-relevant documents, instead of building two queries and combining them.
• Use a single tuning parameter to control how much more important the true relevant documents should be than the pseudo-relevant ones.
Goal: separate out the factors that affect term weights from the two sources (number of feedback documents, number of relevant documents, P(I), etc.), so that the tuning parameter stays stable across topics.
Key problem: modeling P(I), which can no longer be dropped without cost.

Extended Relevance Model (decomposed)
• Empirical judged relevance for the judged feedback documents; a uniform empirical document distribution, 1/|Pseudo|, for the pseudo-relevant documents.
• The empirical distributions normalize out factors such as the number of feedback documents and the number of relevant documents, and thus correct the bias toward the majority source.

Modeling P(I)
• Generated from the collection model: P(I | C), approximated with P(Q | C).
• Considering documents in the collection: max over D in C of P(I | D), approximated with max over D in C of P(Q | D). Intuition: a relevant document is as good as the best document in C.
• Average over D in TopN of P(I | D), approximated with the average over D in TopN of P(Q | D). Intuition: a relevant document is as good as the average of the TopN documents in C.
• The goal is to make the tuning parameter stable across topics with different P(I | D) values.

Experiments
Baseline:
• Dependency-model queries, for increased top precision.
• Pseudo relevance feedback (the relevance model) for better recall.
• Among the best runs in the 2005 and 2006 Terabyte tracks.
Data set:
• Documents: the GOV2 collection.
• Topics: 50 topics from previous Terabyte tracks and 150 topics from Million Query tracks.
• Feedback: top documents ranked by systems from the previous tracks; judgments also come from the previous tracks.
Extended relevance model:
• Stability of the optimal tuning parameter: tuning it on a per-topic basis gives only a 3-4% improvement on feedback set C or D, which suggests tuning the interpolation of the extended relevance model with the original query instead. Training topics come from the previous Terabyte (TB) and Million Query (MQ) tracks (unlike the test setting, which uses TB only), and feedback documents are randomly sampled from the judgments (unlike the test setting, which uses the top documents ranked by previous TREC runs); the resulting curve is almost flat.
• The optimal interpolation weight is around 0.7-0.8 and is significantly better than relevance feedback alone when only one (the top) relevant document is used for feedback (p < 0.004 by a paired sign test). There is no significant difference between the merged model with top-relevant-document feedback and pseudo relevance feedback (PRF).
• As the amount of feedback information increases, PRF gains a lot; do we need lower-ranked relevant documents for effective feedback?

Conclusions & Future Work
• The extended relevance model works well; otherwise the tuning parameter would vary with the number of relevant documents.
• One randomly sampled relevant document is more informative than a top-ranked relevant document.
• Merging relevance feedback and PRF is significantly better than relevance feedback alone.
• Top-ranked negative feedback documents probably carry more information for the system than top-ranked relevant feedback documents (future work).
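To make the pipeline described above concrete, the following Python sketch estimates relevance-model term weights from feedback documents, mixes the judged and pseudo-relevant sources with a single tuning parameter, and emits the interpolated Indri #weight query. This is a minimal illustration, not the authors' implementation: the estimate follows Lavrenko and Croft (2001), merged_term_weights is only a loose single-parameter stand-in for the decomposed extended model, and all names (relevance_model, merged_term_weights, indri_feedback_query, mu) are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def term_distribution(doc: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model P(r | D) for one tokenized document."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def relevance_model(docs: List[List[str]],
                    query_likelihoods: List[float]) -> Dict[str, float]:
    """RM1-style estimate: P(r | I) ~ sum over docs D of P(r | D) * P(Q | D).

    query_likelihoods[i] is P(Q | D_i) for the i-th feedback document, P(D)
    is assumed uniform, and the normalizer plays the role of P(I).
    """
    normalizer = sum(query_likelihoods) or 1.0
    weights: Dict[str, float] = defaultdict(float)
    for doc, p_q_given_d in zip(docs, query_likelihoods):
        for term, p_r_given_d in term_distribution(doc).items():
            weights[term] += p_r_given_d * p_q_given_d / normalizer
    return dict(weights)


def merged_term_weights(judged: Dict[str, float],
                        pseudo: Dict[str, float],
                        mu: float) -> Dict[str, float]:
    """Single-parameter mix of the two sources; mu > 1 favours judged relevance.

    A loose stand-in for the poster's decomposed extended model, not its
    exact formulation.
    """
    merged: Dict[str, float] = defaultdict(float)
    for term, weight in judged.items():
        merged[term] += mu * weight
    for term, weight in pseudo.items():
        merged[term] += weight
    total = sum(merged.values()) or 1.0
    return {term: weight / total for term, weight in merged.items()}


def indri_feedback_query(original_query: str,
                         term_weights: Dict[str, float],
                         n_terms: int = 20,
                         w: float = 0.7) -> str:
    """Build #weight( w original (1-w) relevance-model query ), as on the poster.

    Wrapping the original query in #combine is an assumption made here.
    """
    top = sorted(term_weights.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]
    rm_query = "#weight( " + " ".join(f"{wt:.4f} {term}" for term, wt in top) + " )"
    return (f"#weight( {w:g} #combine( {original_query} ) "
            f"{1 - w:g} {rm_query} )")


# Toy usage; in practice the pseudo-relevant documents come from the initial
# retrieval and the judged documents from the TREC feedback judgments:
# rm_judged = relevance_model(judged_docs, judged_likelihoods)
# rm_pseudo = relevance_model(pseudo_docs, pseudo_likelihoods)
# weights = merged_term_weights(rm_judged, rm_pseudo, mu=5.0)
# print(indri_feedback_query("example query terms", weights))
```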
