Personalizing Web Search
Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR
Personalizing Web Search • Motivation • Algorithms • Results • Future Work
Study of Personal Relevancy • 15 SIS users x ~10 queries • Evaluate 50 results • Highly relevant / Relevant / Irrelevant • Query selection • Previously issued query • Chose from 10 pre-selected queries • Collected evaluations for 137 queries • 53 of pre-selected queries (2-9/query)
Relevant Results Have Low Rank • [Chart; legend: Highly relevant / Relevant / Irrelevant]
Same Query, Different Intent • Different meanings • “Information about the astronomical/astrological sign of cancer” • “information about cancer treatments” • Different intents • “is there any new tests for cancer?” • “information about cancer treatments”
Same Intent, Different Evaluation • Query: Microsoft • “information about microsoft, the company” • “Things related to the Microsoft corporation” • “Information on Microsoft Corp” • 31/50 results rated as not irrelevant • For only 6 of those 31 does more than one person agree • All three agree only for www.microsoft.com
More to Understand • Do people cluster? • Even if they can’t state their intention • How are the differences reflected? • Can they be seen from the information on a person’s computer? • Can we do better than the ranking that would make everyone the most happy? • Best common ranking: +38% • Best personalized ranking: +55%
Personalizing Web Search • Motivation • Algorithms • Results • Future Work
Personalization Algorithms • Standard IR • Related to relevance feedback • Query expansion v. result re-ranking • [Diagram labels: Query, Document, User; Server, Client]
Result Re-Ranking • Takes full advantage of SIS • Ensures privacy • Good evaluation framework • Look at light weight user models • Collected on server side • Sent as query expansion
BM25 with Relevance Feedback • Score = $\sum_i tf_i \cdot w_i$ • $w_i = \log \frac{N}{n_i}$ • [Diagram labels: $N$, $n_i$, $R$, $r_i$]
BM25 with Relevance Feedback • Score = $\sum_i tf_i \cdot w_i$ • $w_i = \log \frac{(r_i+0.5)(N-n_i-R+r_i+0.5)}{(n_i-r_i+0.5)(R-r_i+0.5)}$ • [Diagram labels: $N$, $n_i$, $R$, $r_i$]
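The weight on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the usual reading of the symbols: N is the number of documents in the corpus, n_i the number containing term i, R the number of documents with relevance feedback, and r_i the number of those containing term i. The function names and the counts in the example are hypothetical.

```python
import math

def rf_term_weight(N, n_i, R=0, r_i=0):
    """Relevance-feedback term weight w_i as written on the slide.

    Assumes n_i >= r_i and N - n_i >= R - r_i, so both products stay positive.
    """
    numerator = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    denominator = (n_i - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)

def score(doc_term_freqs, weights):
    """Score = sum over terms of tf_i * w_i; terms without a weight count as 0."""
    return sum(tf * weights.get(term, 0.0) for term, tf in doc_term_freqs.items())

# Hypothetical counts, for illustration only.
N, R = 1000, 10
weights = {
    "cancer": rf_term_weight(N, n_i=50, R=R, r_i=8),   # rare overall, common in feedback
    "the": rf_term_weight(N, n_i=900, R=R, r_i=9),     # common everywhere
}
doc = {"cancer": 3, "the": 12}
print(score(doc, weights))
```

A term that is rare in the corpus but frequent in the feedback documents gets a large positive weight, while a term that is common everywhere gets a weight near or below zero.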
User Model as Relevance Feedback • Score = $\sum_i tf_i \cdot w_i$ • Substitute $N' = N + R$ and $n_i' = n_i + r_i$ • Before: $w_i = \log \frac{(r_i+0.5)(N-n_i-R+r_i+0.5)}{(n_i-r_i+0.5)(R-r_i+0.5)}$ • After: $w_i = \log \frac{(r_i+0.5)(N'-n_i'-R+r_i+0.5)}{(n_i'-r_i+0.5)(R-r_i+0.5)}$
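The substitution on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the system described in the talk: R and r_i are taken to be counts over the user's own documents, the re-ranking helper and all counts and URLs are hypothetical, and results are represented simply as (URL, term-frequency) pairs.

```python
import math

def personalized_weight(N, n_i, R, r_i):
    """Term weight with the corpus counts extended by the user's documents."""
    N_prime = N + R        # N' = N + R
    n_prime = n_i + r_i    # n_i' = n_i + r_i
    numerator = (r_i + 0.5) * (N_prime - n_prime - R + r_i + 0.5)
    denominator = (n_prime - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)

def rerank(results, weights):
    """Sort (url, term_freqs) pairs by descending sum of tf_i * w_i."""
    def score(term_freqs):
        return sum(tf * weights.get(t, 0.0) for t, tf in term_freqs.items())
    return sorted(results, key=lambda r: score(r[1]), reverse=True)

# Hypothetical counts: a user whose documents mention "treatment" often.
N, R = 10000, 200
weights = {
    "treatment": personalized_weight(N, n_i=500, R=R, r_i=40),
    "zodiac": personalized_weight(N, n_i=500, R=R, r_i=1),
}
results = [
    ("http://example.com/astrology", {"zodiac": 2}),
    ("http://example.com/oncology", {"treatment": 2}),
]
for url, _ in rerank(results, weights):
    print(url)
```

After the substitution the terms cancel so that the numerator is (r_i+0.5)(N-n_i+0.5) and the denominator is (n_i+0.5)(R-r_i+0.5). Both stay positive whenever n_i ≤ N and r_i ≤ R, even if the user's documents are not a subset of the corpus.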
User Model as Relevance Feedback • Score = $\sum_i tf_i \cdot w_i$ • [Diagram, built up over four slides: World ($N$, $n_i$); User ($R$, $r_i$); World related to query ($N$, $n_i$); User related to query ($R$, $r_i$); labels: Query Focused Matching, World Focused Matching]
Parameters • Matching • User representation • World representation • Query expansion
Parameters • Matching: Query focused / World focused • User representation • World representation • Query expansion
User Representation • Stuff I’ve Seen (SIS) index • Recently indexed documents • Web documents in SIS index • Query history • Relevance judgments • None
Parameters • Matching: Query focused / World focused • User representation: All SIS / Recent SIS / Web SIS / Query history / Relevance feedback / None • World representation • Query expansion
World Representation • Document Representation • Full text • Title and snippet • Corpus Representation • Web • Result set – title and snippet • Result set – full text
Parameters • Matching: Query focused / World focused • User representation: All SIS / Recent SIS / Web SIS / Query history / Relevance feedback / None • World representation: Document: Full text / Title and snippet; Corpus: Web / Result set – full text / Result set – title and snippet • Query expansion
Query Expansion • All words in document • Query focused • Example text: “The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ...”
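The difference between the two options can be sketched in code. The slide does not define "query focused", so this sketch assumes it means keeping only the words that fall within a small window around each occurrence of a query term. The function name, the tokenizer, and the window size are all hypothetical choices made for illustration.

```python
import re

def query_focused_terms(text, query_terms, window=2):
    """Keep only tokens within `window` positions of any query term (assumed reading)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    wanted = {q.lower() for q in query_terms}
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in wanted:
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            keep.update(range(start, stop))
    return [tokens[i] for i in sorted(keep)]

text = ("The American Cancer Society is dedicated to eliminating cancer "
        "as a major health problem by preventing cancer, saving lives, "
        "and diminishing suffering")

all_words = re.findall(r"[a-z0-9']+", text.lower())   # "All words in document"
focused = query_focused_terms(text, ["cancer"])       # "Query focused"
print(len(all_words), len(focused))
print(focused)
```

With a narrow window, words far from the query term (such as "dedicated" or "suffering" in the example text) drop out, which shortens the term list that has to be matched against the user model.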
Parameters • Matching: Query focused / World focused • User representation: All SIS / Recent SIS / Web SIS / Query history / Relevance feedback / None • World representation: Document: Full text / Title and snippet; Corpus: Web / Result set – full text / Result set – title and snippet • Query expansion: All words / Query focused
Personalizing Web Search • Motivation • Algorithms • Results • Future Work
Baselines • Best possible • Random • Text based ranking • Web ranking • URL boost • Example URL: http://mail.yahoo.com/inbox/msg10 (shown on the slide with +1 boosts)
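A URL boost of this kind might look like the sketch below. The slide gives only an example URL annotated with "+1", so the matching levels here are assumptions: one point when the result's host has been visited before, and one more when the exact URL has. The function name and the visited-URL set are hypothetical.

```python
from urllib.parse import urlsplit

def url_boost(result_url, visited_urls, boost=1.0):
    """Return an additive boost for a result based on previously visited URLs.

    Assumed scheme: +boost for a matching host, +boost again for an exact URL match.
    """
    visited_hosts = {urlsplit(u).hostname for u in visited_urls}
    visited_hosts.discard(None)  # ignore visited entries that have no host
    total = 0.0
    host = urlsplit(result_url).hostname
    if host is not None and host in visited_hosts:
        total += boost
    if result_url in visited_urls:
        total += boost
    return total

visited = {"http://mail.yahoo.com/inbox/msg10"}
print(url_boost("http://mail.yahoo.com/inbox/msg10", visited))
print(url_boost("http://mail.yahoo.com/inbox/msg11", visited))
print(url_boost("http://example.org/", visited))
```

The boost is additive so that it can be combined with a text-based or Web-rank score without changing how those scores are computed.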
Best Parameter Settings • Richer user representation better • SIS > Recent > Web > Query History > None • Suggests rich client important • Efficiency hacks don’t hurt • Snippets query focused • Length normalization not an issue • Query focus good
Text Alone Not Enough • Better than some baselines • Better than random • Better than no user representation • Better than relevance feedback • Worse than Web results • Blend in other features • Web ranking • URL boost
Good, but Lots of Room to Grow • Best combination: 9.1% improvement • Best possible: 51.5% improvement • Assumes best Web combination selected • Only improves results 2/3 of the time
Personalizing Web Search • Motivation • Algorithms • Results • Future Work
Finding the Best Parameter Setting • Almost always some parameter setting that improves results • Use learning to select parameters • Based on individual • Based on query • Based on results • Give user control?
Further Exploration of Algorithms • Larger parameter space to explore • More complex user model subsets • Different parsing (e.g., phrases) • Tune BM25 parameters • What is really helping? • Generic user model or personal model • Use different indices for the queries • Deploy system
Practical Issues • Efficiency issues • Can interfaces mitigate some of the issues? • Merging server and client • Query expansion • Get more relevant results in the set to be re-ranked • Design snippets for personalization