This paper presents a probabilistic model for fine-grained expert search aimed at improving the identification of domain experts. It addresses the challenges associated with traditional coarse-grained expert search methods, focusing on a robust evidence extraction process. By utilizing various document types, including emails and general documents, it enhances the matching of queries to expert candidates. Experimental results from the TREC dataset demonstrate the model's effectiveness, offering valuable insights for refining expert search mechanisms and improving search engine usability.
A Probabilistic Model for Fine-Grained Expert Search
Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu
June 16–18, 2008, Columbus, Ohio
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Introduction
• Expert Search: "Who is an expert on X?"
• A user issues a query to the search engine, which returns a ranked list of experts.
• Example: Who are experts on Semantic Web Search Engine?
Introduction • Pioneering Expert Search Systems • Log data in software development • Kautz et al., 1996; Mockus and Herbsleb, 2002; McDonald and Ackerman, 1998; etc. • Email communications • Campbell et al., 2003; Dom et al., 2003; Sihn and Heeren, 2001; etc. • General documents • Yimam, 1996; Davenport and Prusak, 1998; Steer and Lochbaum, 1988; Mattox et al., 1999; Hertzum and Pejtersen, 2000; Craswell et al., 2001; etc.
Introduction • Expert Search at TREC • A new task at TREC 2005, 2006, 2007 • Craswell et al., 2005; • Soboroff et al., 2006; • Bailey et al., 2007; • Many approaches have been proposed • Two generative models, Balog et al. 2006 • Prior distribution, relevance feedback, Fang et al. 2006 • Hierarchical language model, Petkova and Croft 2006 • Voting and data fusion, Macdonald and Ounis 2006 • …
Introduction
• Coarse-grained approach: expert search is carried out at the granularity of whole documents.
• However, different blocks of an electronic document have different functions and qualities, and hence different impacts on expert search.
• At document granularity, further improvements are hard to achieve.
Examples • Windowed-Section Relation (figure: person mentions inside a window around the queried topic count as relevant evidence; mentions outside it do not)
Examples • Title-Author Relation (figure: a document whose title matches the query "Timed Text", linked to its author)
Examples • Reference-Section Relation
Examples • Section Title-Body Relation (figure: a section whose <H1>/<H2> title matches the query "W3C Management Team", linked to persons in the section body)
Fine-grained Expert Search -- Evidence Extraction
Fine-grained evidence: <topic, person, relation, document>
• Document-001: "…a high-level plan of the architecture of the semantic web by Tim Berners-Lee…" "…later, Berners-Lee describes a semantic web search engine experience…"
• Query: Who are experts on Semantic Web Search Engine?  Candidate: Tim Berners-Lee
• E1: <semantic web, Tim Berners-Lee, same-section, document-001>
• E2: <semantic web search engine, Berners-Lee, same-section, document-001>
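The evidence quadruples lend themselves to a simple data representation. Below is a minimal Python sketch of extracting same-section evidence of the kind shown above; the section splitting, substring matching, and helper names are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    """Fine-grained evidence quadruple <topic, person, relation, document>."""
    topic: str
    person: str
    relation: str
    document: str

def same_section_evidence(doc_id, sections, topics, persons):
    """Emit an Evidence tuple whenever a topic and a person mention co-occur
    inside the same section of a document. Section splitting and name
    matching here are deliberately simplistic stand-ins."""
    found = []
    for section in sections:
        low = section.lower()
        for topic in topics:
            if topic.lower() not in low:
                continue
            for person in persons:
                if person.lower() in low:
                    found.append(Evidence(topic, person, "same-section", doc_id))
    return found

# The two evidences from the slide, reproduced as data:
E1 = Evidence("semantic web", "Tim Berners-Lee", "same-section", "document-001")
E2 = Evidence("semantic web search engine", "Berners-Lee", "same-section", "document-001")
```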
Fine-grained Expert Search -- Search Model
Query (q) → Evidence Matching Model → fine-grained evidence <topic, person, relation, document> (t, p, r, d) → Expert Matching Model → Expert Candidate (c)
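One natural way to read this pipeline as a formula is the decomposition below, which marginalizes over the fine-grained evidence; it is a sketch of the structure implied by the slide, not necessarily the exact estimation used in the paper.

```latex
P(c \mid q) \;=\; \sum_{(t,p,r,d)}
  \underbrace{P\bigl(c \mid t, p, r, d\bigr)}_{\text{expert matching}}
  \cdot
  \underbrace{P\bigl(t, p, r, d \mid q\bigr)}_{\text{evidence matching}}
```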
Fine-grained Expert Search -- Expert Matching <topic, person, relation, document> (<t, p, r, d> for short)
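Matching the extracted person field p against an expert candidate c has to tolerate name variants (e.g., "Berners-Lee" vs. "Tim Berners-Lee" in E1/E2 above). A minimal sketch of such person matching follows; the variant rules and scores are illustrative assumptions only, not the paper's model.

```python
def name_variants(full_name):
    """Generate simple surface variants of a candidate's full name.
    Real systems would also use email addresses, initials, etc."""
    parts = full_name.split()
    variants = {full_name.lower()}
    if len(parts) >= 2:
        variants.add(parts[-1].lower())                         # "Berners-Lee"
        variants.add((parts[0][0] + ". " + parts[-1]).lower())  # "T. Berners-Lee"
    return variants

def person_match_score(mention, candidate_full_name):
    """Crude P(c | p)-style score: 1.0 for an exact match, 0.8 for a known
    variant, 0 otherwise (the values are placeholders, not estimates)."""
    m = mention.lower()
    if m == candidate_full_name.lower():
        return 1.0
    if m in name_variants(candidate_full_name):
        return 0.8
    return 0.0

# Example: the mention "Berners-Lee" from E2 matches candidate "Tim Berners-Lee".
assert person_match_score("Berners-Lee", "Tim Berners-Lee") == 0.8
```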
Experimental Results
• W3C Corpus: 331,307 web pages
• Topics: 10 training topics of TREC 2005; 50 test topics of TREC 2005; 49 test topics of TREC 2006
• Evaluation Metrics: mean average precision (MAP), R-precision (R-P), top-N precision (P@N)
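For reference, the three metrics can be computed from a ranked candidate list and a relevance set as in the short sketch below; these are the standard definitions, not the paper's evaluation scripts.

```python
def average_precision(ranked, relevant):
    """AP for one topic: mean of precision values at each relevant hit."""
    hits, precisions = 0, []
    for i, cand in enumerate(ranked, start=1):
        if cand in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant experts."""
    r = len(relevant)
    return sum(1 for c in ranked[:r] if c in relevant) / r if r else 0.0

def precision_at_n(ranked, relevant, n):
    """P@N: fraction of the top N ranked candidates that are relevant."""
    return sum(1 for c in ranked[:n] if c in relevant) / n

# MAP is the mean of average_precision over all test topics.
```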
Experimental Results • Query Matching
Experimental Results • Person Matching
Experimental Results • Multiple Relations
Experimental Results • Evidence Quality
Conclusion • Fine-grained expert search • Probabilistic model and its implementation • Evaluation on the TREC data set