Section Based Relevance Feedback
Section Based Relevance Feedback Student: Nat Young Supervisor: Prof. Mark Sanderson
Relevance Feedback • A search engine (SE) user marks document(s) as relevant • E.g. “find more like this” • Terms are extracted from the full document • The whole document may not be relevant • Could marking a sub-section relevant be better?
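As a rough illustration of the contrast between the two feedback modes, the sketch below selects feedback terms once from a whole document and once from a single section. The function name `top_terms`, the toy text, and the use of raw term frequency as the selection score are all assumptions made for illustration; the slides do not say how terms are weighted.

```python
from collections import Counter

def top_terms(text, k=3):
    """Return the k most frequent whitespace-separated terms (hypothetical scoring)."""
    counts = Counter(text.lower().split())
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

# Toy document made of two sections (invented text, not from the collection).
sections = {
    "abstract": "relevance feedback query expansion feedback",
    "method": "citation graph citation database citation graph",
}
full_document = " ".join(sections.values())

# Standard RF: terms come from the whole document.
full_terms = top_terms(full_document)
# Section-based RF: terms come only from the section marked relevant.
section_terms = top_terms(sections["abstract"])
```

In this toy case the full-document terms are dominated by the longer section, while the section-level terms stay closer to the part the user marked as relevant.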
Test Collections • Simulate a real user’s search process • Submit queries in batch mode • Evaluate the result sets • Relevance Judgments • QREL: <topicId, docId> pairs (1 … n) • Traditionally produced by human assessors
Building a Test Collection • Documents • 1,388,939 research papers • Stop words removed • Porter Stemmer applied • Topics • 100 random documents • Their sub-sections (6 per document)
Building a Test Collection • In-edges • Documents that cite paper X • Found 943 using the CiteSeerX database • Out-edges • Documents cited by paper X • Found 397 using pattern matching on titles
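The edge-based judgments described above can be sketched as follows. This is a minimal sketch that assumes every citing or cited document counts as one relevance judgment for the topic paper; the function name `build_qrels`, the identifiers, and the de-duplication of documents that appear as both in-edge and out-edge are assumptions, not details given in the slides.

```python
def build_qrels(topic_id, in_edges, out_edges):
    """Return a set of (topic_id, doc_id) pairs derived from citation edges."""
    # Union of documents that cite the topic paper and documents it cites.
    related = set(in_edges) | set(out_edges)
    return {(topic_id, doc_id) for doc_id in related}

# Hypothetical identifiers for one topic paper.
in_edges = ["d1", "d2"]    # documents that cite the topic paper
out_edges = ["d3", "d4"]   # documents cited by the topic paper
qrels = build_qrels("t1", in_edges, out_edges)

# Counts reported on the slide: 943 in-edges and 397 out-edges.
total_edges = 943 + 397
```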
QRELs • Total • 1,340 QRELs • Avg. 13.4 QRELs per document • Previous work: • Anna Richie et al. (2006) • 82 Topics, Avg. 11.4 QRELs • 196 Topics, Avg. 4.5 QRELs • Last year • 71 Topics, Avg. 2.9 QRELs
Section Queries • RQ1 Do the sections return different results?
Section Queries • RQ2 Do the sections return different relevant results? • Avg. = the average number of relevant results returned in the top 20 (@ 20) • E.g. Abstract queries returned an average of 2 QRELs
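The "relevant results returned @ 20" measure can be sketched as below: count how many of the top 20 results for a topic are in its QRELs, then average over topics. The function names, the dictionary layout, and the toy rankings are assumptions for illustration only.

```python
def relevant_at_k(ranked_doc_ids, relevant_doc_ids, k=20):
    """Count how many of the top-k ranked documents are judged relevant."""
    return sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_doc_ids)

def average_relevant_at_k(runs, qrels, k=20):
    """Average relevant@k over topics.

    runs:  topic_id -> ranked list of doc ids
    qrels: topic_id -> set of relevant doc ids
    """
    counts = [
        relevant_at_k(ranking, qrels.get(topic_id, set()), k)
        for topic_id, ranking in runs.items()
    ]
    return sum(counts) / len(counts)

# Toy data (invented): two topics with short rankings.
runs = {"t1": ["d1", "d9", "d2"], "t2": ["d7", "d8"]}
qrels = {"t1": {"d1", "d2"}, "t2": {"d8", "d5"}}
avg = average_relevant_at_k(runs, qrels)
```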
Section Queries • Average intersection sizes of relevant results • E.g. Avg(|Abstract ∩ All|) = 0.63 • Avg(|Abstract \ All|) = 1.37 • 100 - ((0.63 / 2) * 100) = 68.5% difference
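The worked example above can be reproduced with a few lines of arithmetic. In this sketch the function name `percent_difference` is an assumption; the inputs 0.63, 1.37 and 2 are the values quoted on the slides, where 2 is the average number of relevant results returned by Abstract queries.

```python
def percent_difference(avg_intersection, avg_relevant):
    """Share of a section's relevant results not shared with the other query, in percent."""
    return 100 - (avg_intersection / avg_relevant) * 100

avg_intersection = 0.63   # Avg(|Abstract ∩ All|)
avg_complement = 1.37     # Avg(|Abstract \ All|)

# The intersection and the complement together make up the section's relevant results.
avg_relevant = avg_intersection + avg_complement

abstract_vs_all = percent_difference(avg_intersection, 2)
```

The same figure follows from the complement directly, since 1.37 out of 2 relevant results is also 68.5%.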
Section Queries • Average set complement % of relevant results • E.g. n% of the relevant results returned by section X were not returned by section Y
Next • Practical Significance • Does SRF provide benefits over standard RF?