Advanced Expert Finding Framework Using Language Modeling Techniques
This paper presents a language modeling framework for finding experts within organizations. The authors propose two approaches: a candidate-based approach that builds a profile of each expert and ranks candidates by how well their profile matches the query topic, and a document-based approach that retrieves documents relevant to the query and then associates them with experts. Experiments comparing the two models found that the document-based model outperforms the candidate-based model in terms of average precision and reciprocal rank. The study highlights the importance of effective information retrieval in expert finding tasks.
Presentation Transcript
A language modeling framework for expert finding • Presenter: Lin, Shu-Han • Authors: Krisztian Balog, Leif Azzopardi, Maarten de Rijke • Information Processing and Management (IPM) 45 (2009) 1–19
Outline • Motivation • Objective • Methodology • Experiments • Conclusion • Comments
Motivation • Expert finding: finding experts on a given topic. • Yellow pages: • Profiles: employees self-assess their skills. • Keywords, e.g., marketing • Problems: • Information: outdated • Keywords: restricted
Objectives • Within the organization… • Mine published intranet documents. • Search all kinds of expertise. • ‘Who are the experts on topic “Internet marketing and internet advertising” in my organization?’
Methodology–Overview • To capture the association between a candidate expert and an area of expertise, ask: “What is the probability of a candidate ca being an expert given the query topic q?” • Bayes’ theorem: $p(ca \mid q) = \frac{p(q \mid ca)\, p(ca)}{p(q)}$, where the prior $p(ca)$ is assumed uniform and $p(q)$ is constant for a given query, so candidates can be ranked by $p(q \mid ca)$. • Model 1: candidate-based (query-independent) approach. Idea: build a profile of each candidate expert, then rank the candidates based on the query. • Model 2: document-based (query-dependent) approach. Idea: find the query-relevant documents, then associate them with experts.
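Methodology–Model 1 • Build a textual representation (model) of a person’s knowledge from the documents associated with them, then estimate the probability of the query given the candidate’s model: $p(q \mid \theta_{ca}) = \prod_{t \in q} p(t \mid \theta_{ca})^{n(t,q)}$, where $n(t,q)$ is the number of times term $t$ occurs in $q$. • Smoothed: $p(t \mid \theta_{ca}) = (1 - \lambda)\, p(t \mid ca) + \lambda\, p(t)$, where $p(t)$ is the background (collection) probability of $t$. • Weighted: $p(t \mid ca) = \sum_{d} p(t \mid d)\, p(d \mid ca)$. • e.g., $p(\text{Internet marketing} \mid \theta_{ca}) = p(\text{“Internet”} \mid \theta_{ca}) \cdot p(\text{“marketing”} \mid \theta_{ca})$ • e.g., $p(\text{Internet marketing and internet advertising} \mid \theta_{ca}) = p(\text{“internet”} \mid \theta_{ca})^{2} \cdot p(\text{“marketing”} \mid \theta_{ca}) \cdot p(\text{“and”} \mid \theta_{ca}) \cdot p(\text{“advertising”} \mid \theta_{ca})$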
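A minimal Python sketch of Model 1, assuming the standard form: the candidate profile is $p(t \mid ca) = \sum_d p(t \mid d)\, p(d \mid ca)$, it is smoothed with a background model $p(t)$ using weight λ, and the query likelihood is the product over query terms, each raised to its query count $n(t,q)$. Whitespace tokenization, the fixed λ = 0.5, the toy documents, and the function names (`term_probs`, `collection_model`, `model1_log_score`) are all hypothetical choices for illustration; $p(d \mid ca)$ is simply passed in.

```python
import math
from collections import Counter


def term_probs(text):
    """Maximum-likelihood p(t|d): relative frequency of each term in one document."""
    tokens = text.lower().split()
    return {t: c / len(tokens) for t, c in Counter(tokens).items()}


def collection_model(docs):
    """Background model p(t): relative frequency of each term over all documents."""
    tokens = " ".join(docs.values()).lower().split()
    return {t: c / len(tokens) for t, c in Counter(tokens).items()}


def model1_log_score(query, docs, assoc, bg, lam=0.5):
    """log p(q|theta_ca) for one candidate under Model 1.

    docs:  doc_id -> text
    assoc: doc_id -> p(d|ca) for this candidate (assumed to be given)
    bg:    term -> p(t), the background model
    lam:   smoothing weight lambda (a hypothetical fixed value)
    """
    # Candidate profile: p(t|ca) = sum_d p(t|d) * p(d|ca)
    p_t_ca = Counter()
    for d, p_d_ca in assoc.items():
        for t, p in term_probs(docs[d]).items():
            p_t_ca[t] += p * p_d_ca

    # Query likelihood in log space; n is n(t, q), the count of t in the query
    score = 0.0
    for t, n in Counter(query.lower().split()).items():
        p = (1 - lam) * p_t_ca[t] + lam * bg.get(t, 0.0)
        if p == 0.0:
            return float("-inf")  # term unseen even in the background model
        score += n * math.log(p)
    return score


docs = {
    "d1": "internet marketing internet advertising",
    "d2": "database systems and indexing",
}
bg = collection_model(docs)
# Hypothetical candidates: A is associated only with d1, B only with d2
score_a = model1_log_score("internet marketing", docs, {"d1": 1.0}, bg)
score_b = model1_log_score("internet marketing", docs, {"d2": 1.0}, bg)
print(score_a > score_b)  # True
```

Working in log space keeps long queries from underflowing; the ranking is the same as ranking by the product itself.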
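Methodology–Model 1B • Estimate $p(t \mid d, ca)$ from the text around the candidate’s mentions in $d$, instead of $p(t \mid d)$ from the whole document, and weight it by the document-candidate association: $p(t \mid ca) = \sum_{d} p(t \mid d, ca)\, p(d \mid ca)$, e.g., $p(\text{“Internet”} \mid \text{“Mail No. 43”}, \text{“John”})$. • Candidate identifier: mentions of a candidate are replaced by a unique identifier, e.g., “… John (john@gmail.com) is a major in marketing. …” becomes “… <731842> (<731842>) is a major in marketing. …” • Window size (w): only terms within w positions of an identifier are used; the closer a term is to the candidate, the stronger the evidence.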
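The window idea can be sketched in Python as follows, assuming the document is already tokenized by whitespace and candidate mentions have already been replaced by an identifier such as `<731842>`. The function name `window_term_probs`, the toy sentence, and the decision to count a token only once when two windows overlap are hypothetical choices, not details taken from the slides.

```python
from collections import Counter


def window_term_probs(tokens, cand_id, w):
    """Estimate p(t|d,ca): term distribution within w tokens of each mention of cand_id.

    tokens:  the document as a token list, with candidate mentions already
             replaced by an identifier such as "<731842>"
    cand_id: the candidate identifier
    w:       window size (tokens on each side of a mention)
    Tokens covered by overlapping windows are counted once.
    Returns {} when the candidate is not mentioned in the document.
    """
    positions = set()
    for i, tok in enumerate(tokens):
        if tok == cand_id:
            positions.update(range(max(0, i - w), min(len(tokens), i + w + 1)))
    counts = Counter(tokens[j] for j in positions if tokens[j] != cand_id)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


doc = "<731842> is a major in marketing and enjoys internet advertising".split()
p5 = window_term_probs(doc, "<731842>", 5)
print(p5["marketing"], "internet" in p5)  # 0.2 False
print(window_term_probs(doc, "<731842>", 8)["internet"])  # 0.125
```

A small window keeps only the terms closest to the candidate; widening it admits more distant (and usually weaker) evidence such as "internet" here.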
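Methodology–Model 2 • $p(q \mid ca) = \sum_{d} p(q \mid \theta_d)\, p(d \mid ca)$ • $p(q \mid \theta_d) = \prod_{t \in q} p(t \mid \theta_d)^{n(t,q)}$ • Smoothed: $p(t \mid \theta_d) = (1 - \lambda)\, p(t \mid d) + \lambda\, p(t)$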
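For contrast with Model 1, here is a minimal Python sketch of the document-based model, assuming $p(q \mid ca) = \sum_d p(q \mid \theta_d)\, p(d \mid ca)$ with each document model smoothed as $(1-\lambda)\, p(t \mid d) + \lambda\, p(t)$. As before, whitespace tokenization, the fixed λ = 0.5, the toy documents, and the function names are hypothetical; the slides do not show how λ was set.

```python
from collections import Counter


def collection_model(docs):
    """Background model p(t) over all documents."""
    tokens = " ".join(docs.values()).lower().split()
    return {t: c / len(tokens) for t, c in Counter(tokens).items()}


def doc_query_likelihood(query_terms, doc_tokens, bg, lam=0.5):
    """p(q|theta_d) = prod_t ((1 - lam) p(t|d) + lam p(t)) ** n(t, q)."""
    tf = Counter(doc_tokens)
    p = 1.0
    for t, n in Counter(query_terms).items():
        p_t_d = tf[t] / len(doc_tokens)
        p *= ((1 - lam) * p_t_d + lam * bg.get(t, 0.0)) ** n
    return p


def model2_score(query, docs, assoc, bg, lam=0.5):
    """p(q|ca) = sum_d p(q|theta_d) * p(d|ca), over the candidate's documents."""
    q = query.lower().split()
    return sum(
        doc_query_likelihood(q, docs[d].lower().split(), bg, lam) * p_d_ca
        for d, p_d_ca in assoc.items()
    )


docs = {
    "d1": "internet marketing internet advertising",
    "d2": "database systems and indexing",
}
bg = collection_model(docs)
print(model2_score("internet marketing", docs, {"d1": 1.0}, bg))  # 0.0703125
print(model2_score("internet marketing", docs, {"d2": 1.0}, bg))  # 0.0078125
```

The key difference from Model 1 is where the sum sits: Model 2 scores each document against the query first and then aggregates over documents, so a standard document index can do most of the work. For long queries, the per-document product would need log-space arithmetic to avoid underflow.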
Methodology–Model 2B • [Figure: Model 2 vs. Model 2B]
Methodology–Document-candidate associations • Document importance • Senior member of the organization • Association methods: Boolean model, TF-IDF
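The two association methods named on the slide can be sketched in Python as below. The Boolean model gives a document a weight of 1 if it mentions the candidate. The TF-IDF variant here (mention count normalized by document length, times $\log(N/\mathrm{df})$) is one common formulation and an assumption, since the slide does not show the exact weighting; normalizing the scores into $p(d \mid ca)$ is likewise an assumption. Function names and toy data are hypothetical.

```python
import math


def boolean_assoc(doc_tokens, cand_id, all_docs):
    """Boolean model: a(d, ca) = 1 if the candidate is mentioned in d, else 0.
    (all_docs is unused; it is kept so both methods share one signature.)"""
    return 1.0 if cand_id in doc_tokens else 0.0


def tfidf_assoc(doc_tokens, cand_id, all_docs):
    """TF-IDF: mention frequency in d, discounted by how many documents mention ca."""
    tf = doc_tokens.count(cand_id) / len(doc_tokens)
    df = sum(1 for toks in all_docs if cand_id in toks)
    return tf * math.log(len(all_docs) / df) if df else 0.0


def p_d_given_ca(cand_id, all_docs, assoc):
    """Normalize association scores over all documents into p(d|ca)."""
    scores = [assoc(toks, cand_id, all_docs) for toks in all_docs]
    z = sum(scores)
    return [s / z for s in scores] if z else [0.0] * len(all_docs)


all_docs = [
    ["<1>", "marketing", "<1>", "sales"],
    ["<1>", "databases", "query", "indexing"],
    ["<2>", "indexing"],
]
print(p_d_given_ca("<1>", all_docs, boolean_assoc))  # [0.5, 0.5, 0.0]
print(p_d_given_ca("<1>", all_docs, tfidf_assoc))    # about [0.667, 0.333, 0.0]
```

The Boolean model treats every mentioning document equally, while the frequency-based variant shifts weight toward documents that mention the candidate more densely. Note that under this IDF, a candidate mentioned in every document gets all-zero weights.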
Experiments • Evaluation measures: • MAP (mean average precision) • MRR (mean reciprocal rank): e.g., if the first relevant expert appears at ranks 3, 2, and 1 for three queries, MRR = (1/3 + 1/2 + 1)/3 = 11/18.
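Both measures can be computed with a short Python sketch using the standard definitions (reciprocal rank of the first relevant result; average precision as the mean of precision at each relevant rank, divided by the number of relevant candidates). Exact fractions are used so the MRR example of 11/18 can be reproduced; the three rankings are hypothetical.

```python
from fractions import Fraction


def reciprocal_rank(ranking, relevant):
    """1/rank of the first relevant candidate, or 0 if none is retrieved."""
    for rank, cand in enumerate(ranking, start=1):
        if cand in relevant:
            return Fraction(1, rank)
    return Fraction(0)


def average_precision(ranking, relevant):
    """Sum of precision@k at each rank k holding a relevant candidate, over |relevant|."""
    hits, total = 0, Fraction(0)
    for rank, cand in enumerate(ranking, start=1):
        if cand in relevant:
            hits += 1
            total += Fraction(hits, rank)
    return total / len(relevant) if relevant else Fraction(0)


def mean(values):
    return sum(values, Fraction(0)) / len(values)


# Three hypothetical queries whose single relevant expert "a" is ranked 3rd, 2nd, 1st
runs = [
    (["c", "b", "a"], {"a"}),
    (["b", "a", "c"], {"a"}),
    (["a", "b", "c"], {"a"}),
]
print(mean([reciprocal_rank(r, rel) for r, rel in runs]))    # 11/18
print(mean([average_precision(r, rel) for r, rel in runs]))  # 11/18
```

With exactly one relevant expert per query, average precision equals reciprocal rank, so MAP and MRR coincide here; they diverge once a query has several relevant experts.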
Experiments • [Results: Model 1 vs. Model 2; window-based models]
Experiments • [Results: association methods; parameter sensitivity]
Conclusions • Model 1: build a profile of each candidate expert, then rank the candidates based on the query. • Model 2: find the query-relevant documents, then associate them with experts. • Model 2 is preferred over Model 1: • Effectiveness: higher average precision and reciprocal rank • Implementation: requires only a regular document index • Window-based extensions improved effectiveness, especially on top of Model 1. • Frequency-based (TF-IDF) document-candidate associations are helpful.
Comments • Advantage • Integrates ideas • Drawback • … • Application • …