
Personalizing Web Search

Jaime Teevan, MIT, with Susan T. Dumais and Eric Horvitz, MSR


Presentation Transcript


  1. Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR

  2. Demo

  3. Personalizing Web Search • Motivation • Algorithms • Results • Future Work

  4. Personalizing Web Search • Motivation • Algorithms • Results • Future Work

  5. Study of Personal Relevancy • 15 SIS users x ~10 queries each • Each evaluated 50 results per query • Ratings: Highly relevant / Relevant / Irrelevant • Query selection • A previously issued query, or • One chosen from 10 pre-selected queries • Collected evaluations for 137 queries • 53 of them on pre-selected queries (2 to 9 evaluations per query)

  6. Relevant Results Have Low Rank • [Figure: results rated Highly Relevant / Relevant / Irrelevant, by rank]

  7. Same Query, Different Intent • Different meanings • “Information about the astronomical/astrological sign of cancer” • “information about cancer treatments” • Different intents • “is there any new tests for cancer?” • “information about cancer treatments”

  8. Same Intent, Different Evaluation • Query: Microsoft • “information about microsoft, the company” • “Things related to the Microsoft corporation” • “Information on Microsoft Corp” • 31/50 results rated as not irrelevant • More than one evaluator agrees on only 6 of those 31 • All three agree only for www.microsoft.com

  9. More to Understand • Do people cluster? • Even if they can’t state their intention • How are the differences reflected? • Can they be seen from the information on a person’s computer? • Can we do better than the single ranking that makes the group as a whole happiest? • Best common ranking: +38% • Best personalized ranking: +55%
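
A small sketch can make the "best common ranking" versus "best personalized ranking" comparison concrete. It assumes DCG as the quality measure and gains of 2, 1 and 0 for Highly relevant, Relevant and Irrelevant; the slide does not state which measure or baseline produced the +38% and +55% figures, and the ratings in the example are invented.

```python
import math

def dcg(gains):
    """Discounted cumulative gain, discounting rank k (1-based) by log2(k + 1)."""
    return sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))

def best_common_dcg(ratings):
    """Summed DCG of the single ranking shared by all users.

    ratings maps user -> {doc: gain}. Ordering documents by their total gain
    maximizes the summed DCG because the discounts decrease with rank.
    """
    docs = set().union(*ratings.values())
    total = {d: sum(r.get(d, 0) for r in ratings.values()) for d in docs}
    order = sorted(docs, key=total.get, reverse=True)
    return sum(dcg([r.get(d, 0) for d in order]) for r in ratings.values())

def best_personalized_dcg(ratings):
    """Summed DCG when every user gets their own ideal ordering."""
    return sum(dcg(sorted(r.values(), reverse=True)) for r in ratings.values())

# Invented ratings: two users who disagree about documents "a" and "b".
ratings = {"u1": {"a": 2, "b": 0, "c": 1}, "u2": {"a": 0, "b": 2, "c": 2}}
print(best_common_dcg(ratings), best_personalized_dcg(ratings))
```

With these invented numbers the personalized total (about 5.89) exceeds the best common one (about 5.26): once users disagree, no single ordering can give each of them their ideal list.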

  10. Personalizing Web Search • Motivation • Algorithms • Results • Future Work

  11. Personalization Algorithms • Standard IR • Related to relevance feedback • Query expansion v. result re-ranking • [Diagram: User, Client, Query, Server, Document]

  12. Result Re-Ranking • Takes full advantage of SIS • Ensures privacy • Good evaluation framework • Also look at lightweight user models • Collected on the server side • Sent as query expansion

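  13. BM25 with Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$, with $w_i = \log \frac{N}{n_i}$ • $N$: documents in the corpus; $n_i$: documents containing term $i$; $R$: relevant documents; $r_i$: relevant documents containing term $i$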

  14. BM25 with Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$, with $w_i = \log \frac{(r_i + 0.5)(N - n_i - R + r_i + 0.5)}{(n_i - r_i + 0.5)(R - r_i + 0.5)}$
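
The weight above is the Robertson/Sparck Jones relevance weight, and the score as written on the slide omits BM25's usual term-frequency saturation and length normalization. A direct transcription of the two formulas, with hypothetical function names:

```python
import math
from collections import Counter

def rsj_weight(N, n_i, R, r_i):
    """Relevance weight w_i for term i.

    N: documents in the corpus, n_i: those containing term i,
    R: relevant documents, r_i: relevant documents containing term i.
    """
    return math.log(((r_i + 0.5) * (N - n_i - R + r_i + 0.5)) /
                    ((n_i - r_i + 0.5) * (R - r_i + 0.5)))

def score(doc_terms, weights):
    """Score = sum_i tf_i * w_i over the document's terms; unknown terms weigh 0."""
    tf = Counter(doc_terms)
    return sum(count * weights.get(term, 0.0) for term, count in tf.items())
```

With no relevance information (R = r_i = 0) the weight reduces to log((N - n_i + 0.5) / (n_i + 0.5)), a smoothed IDF similar to the log(N / n_i) on the previous slide.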

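  15. User Model as Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$ • $w_i = \log \frac{(r_i + 0.5)(N - n_i - R + r_i + 0.5)}{(n_i - r_i + 0.5)(R - r_i + 0.5)}$ becomes $w_i = \log \frac{(r_i + 0.5)(N' - n_i' - R + r_i + 0.5)}{(n_i' - r_i + 0.5)(R - r_i + 0.5)}$, where $N' = N + R$ and $n_i' = n_i + r_i$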
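
Working through the substitution shows what treating the user's documents as extra corpus documents does to the weight: the $R$ and $r_i$ terms on the world side cancel. This algebra is a worked step added here, not taken from the slide.

```latex
\begin{aligned}
N' - n_i' - R + r_i + 0.5 &= (N + R) - (n_i + r_i) - R + r_i + 0.5 = N - n_i + 0.5 \\
n_i' - r_i + 0.5 &= (n_i + r_i) - r_i + 0.5 = n_i + 0.5 \\
w_i &= \log \frac{(r_i + 0.5)(N - n_i + 0.5)}{(n_i + 0.5)(R - r_i + 0.5)}
\end{aligned}
```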

  16. User Model as Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$ • [Diagram: World and User; counts $N$, $n_i$, $R$, $r_i$]

  17. User Model as Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$ • [Diagram: World, User, World related to query; counts $N$, $n_i$, $R$, $r_i$]

  18. User Model as Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$ • [Diagram, Query Focused Matching: World, User, World related to query, User related to query; counts $N$, $n_i$, $R$, $r_i$]

  19. User Model as Relevance Feedback • $\text{Score} = \sum_i tf_i \cdot w_i$ • [Diagram, World Focused Matching vs. Query Focused Matching: World, User, Web related to query, User related to query; counts $N$, $n_i$, $R$, $r_i$]
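
A minimal client-side re-ranking sketch using the user-model weight with $N' = N + R$ and $n_i' = n_i + r_i$ substituted in, under stated assumptions: the world is the returned result set, the user model is the user's documents that share at least one word with the query (one hypothetical reading of "user related to query"), documents are whitespace-tokenized strings, and all function names are illustrative.

```python
import math
from collections import Counter

def user_model_weights(world_docs, user_docs):
    """w_i = log((r_i+0.5)(N-n_i+0.5) / ((n_i+0.5)(R-r_i+0.5))).

    world_docs: list of term sets standing in for the world (N, n_i).
    user_docs:  list of term sets standing in for the user model (R, r_i).
    """
    N, R = len(world_docs), len(user_docs)
    weights = {}
    for t in set().union(*world_docs, *user_docs):
        n_i = sum(t in d for d in world_docs)
        r_i = sum(t in d for d in user_docs)
        weights[t] = math.log((r_i + 0.5) * (N - n_i + 0.5) /
                              ((n_i + 0.5) * (R - r_i + 0.5)))
    return weights

def rerank_query_focused(query, results, user_index):
    """Re-rank result texts by sum_i tf_i * w_i (hypothetical query-focused matching)."""
    q = set(query.lower().split())
    world = [set(r.lower().split()) for r in results]
    user = [d for d in (set(u.lower().split()) for u in user_index) if q & d]
    w = user_model_weights(world, user)

    def score(text):
        tf = Counter(text.lower().split())
        return sum(c * w.get(t, 0.0) for t, c in tf.items())

    return sorted(results, key=score, reverse=True)

results = ["cancer society donations", "cancer astrology sign", "cancer treatment trials"]
user_index = ["chemotherapy treatment notes cancer", "clinical trials treatment", "horoscope"]
print(rerank_query_focused("cancer", results, user_index))
```

Terms that the user's query-related documents share with a result (here "treatment") push that result up, while a word present in every result, such as the query term itself, adds the same amount to each score and leaves the order unchanged.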

  20. Parameters • Matching • User representation • World representation • Query expansion

  21. Parameters • Matching (Query focused, World focused) • User representation • World representation • Query expansion

  22. Parameters • Matching (Query focused, World focused) • User representation • World representation • Query expansion

  23. User Representation • Stuff I’ve Seen (SIS) index • Recently indexed documents • Web documents in SIS index • Query history • Relevance judgments • None

  24. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation • Query expansion

  25. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation • Query expansion

  26. World Representation • Document Representation • Full text • Title and snippet • Corpus Representation • Web • Result set – title and snippet • Result set – full text

  27. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation (document: Full text, Title and snippet; corpus: Web, Result set – full text, Result set – title and snippet) • Query expansion

  28. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation (document: Full text, Title and snippet; corpus: Web, Result set – full text, Result set – title and snippet) • Query expansion

  29. Query Expansion • All words in document • Query focused • Example text, shown once for each option: “The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ...”
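
The slide does not define "query focused" beyond the example text. One hypothetical reading, sketched below, keeps only the words within a fixed window of a query-term occurrence; the window parameter and the simple lowercase tokenizer are assumptions.

```python
import re

def expansion_terms(text, query, window=None):
    """Words a document contributes to the user model.

    window=None keeps all words in the document; an integer keeps only words
    within `window` tokens of an occurrence of a query term (hypothetical).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if window is None:
        return tokens
    query_terms = set(query.lower().split())
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in query_terms:
            keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    return [tokens[i] for i in sorted(keep)]

snippet = ("The American Cancer Society is dedicated to eliminating cancer "
           "as a major health problem")
print(expansion_terms(snippet, "cancer", window=1))
```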

  30. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation (document: Full text, Title and snippet; corpus: Web, Result set – full text, Result set – title and snippet) • Query expansion (All words, Query focused)

  31. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation (document: Full text, Title and snippet; corpus: Web, Result set – full text, Result set – title and snippet) • Query expansion (All words, Query focused)

  32. Parameters • Matching (Query focused, World focused) • User representation (All SIS, Recent SIS, Web SIS, Query history, Relevance feedback, None) • World representation (document: Full text, Title and snippet; corpus: Web, Result set – full text, Result set – title and snippet) • Query expansion (All words, Query focused)
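
The options on the Parameters slides form a grid. Enumerating it, under the assumption that the document and corpus choices for the world representation vary independently, gives the number of combinations; the talk does not say how many of them were actually evaluated.

```python
from itertools import product

# Options as listed on the Parameters slides (dictionary keys are illustrative).
PARAMETERS = {
    "matching": ["Query focused", "World focused"],
    "user_representation": ["All SIS", "Recent SIS", "Web SIS",
                            "Query history", "Relevance feedback", "None"],
    "document_representation": ["Full text", "Title and snippet"],
    "corpus_representation": ["Web", "Result set – full text",
                              "Result set – title and snippet"],
    "query_expansion": ["All words", "Query focused"],
}

# One dict per combination: 2 * 6 * 2 * 3 * 2 settings.
settings = [dict(zip(PARAMETERS, combo)) for combo in product(*PARAMETERS.values())]
print(len(settings))
```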

  33. Personalizing Web Search • Motivation • Algorithms • Results • Future Work

  34. Baselines • Best possible • Random • Text-based ranking • Web ranking • URL boost (example: http://mail.yahoo.com/inbox/msg10, shown with two +1 boosts)
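
The URL-boost slide shows only an example URL with two +1 increments. A hypothetical reading, assuming one point when the result's host was visited before and another when the exact URL was:

```python
from urllib.parse import urlparse

def url_boost(url, visited):
    """Hypothetical URL boost for one result.

    visited: set of URLs the user has seen before. Adds 1 if the result's host
    appears among them and 1 more if the exact URL does.
    """
    boost = 0
    if urlparse(url).netloc in {urlparse(v).netloc for v in visited}:
        boost += 1
    if url in visited:
        boost += 1
    return boost

history = {"http://mail.yahoo.com/inbox/msg10"}
print(url_boost("http://mail.yahoo.com/inbox/msg10", history))
```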

  35. Best Parameter Settings • Richer user representations are better • SIS > Recent > Web > Query history > None • Suggests a rich client is important • Efficiency hacks don’t hurt • Snippets (query focused) • Length normalization not an issue • Query focus is good

  36. Text Alone Not Enough • Better than some baselines: random, no user representation, relevance feedback • Worse than Web results • Blend in other features: Web ranking, URL boost
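
One simple way to "blend in other features" is a weighted sum. The sketch assumes the text score is already on a comparable scale, turns the Web rank into a 1/log2(rank + 1) score, and uses hypothetical weights alpha and beta; the talk does not give the combination rule.

```python
import math

def blend(results, alpha=1.0, beta=1.0):
    """Re-rank by text score + alpha * Web-rank score + beta * URL boost.

    results: list of (url, text_score, boost) in the original Web order, so
    list position (1-based) supplies the Web ranking feature.
    """
    def combined(item):
        web_rank, (url, text_score, boost) = item
        return text_score + alpha / math.log2(web_rank + 1) + beta * boost

    ranked = sorted(enumerate(results, start=1), key=combined, reverse=True)
    return [url for _, (url, _, _) in ranked]

print(blend([("a", 0.0, 0), ("b", 0.5, 0), ("c", 0.0, 1)]))
```

Setting beta to 0 drops the URL boost and leaves only the text score blended with the Web ranking, which makes it easy to compare feature subsets.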

  37. Good, but Lots of Room to Grow • Best combination: 9.1% improvement • Best possible: 51.5% improvement • Assumes best Web combination selected • Only improves results 2/3 of the time

  38. Personalizing Web Search • Motivation • Algorithms • Results • Future Work

  39. Finding the Best Parameter Setting • There is almost always some parameter setting that improves results • Use learning to select parameters • Based on the individual • Based on the query • Based on the results • Give the user control?
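
The simplest version of selecting parameters "based on the individual" is a per-user argmax over past evaluations, shown here as a baseline rather than a learned model; the (user, setting, score) record format and the scoring are assumptions.

```python
from collections import defaultdict

def best_setting_per_user(evals):
    """Pick, for each user, the setting with the highest mean score.

    evals: iterable of (user, setting, score) tuples from past queries,
    where setting is any hashable label and score is, e.g., a DCG gain.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user, setting, score in evals:
        sums[user][setting] += score
        counts[user][setting] += 1
    return {
        user: max(per_setting, key=lambda s: per_setting[s] / counts[user][s])
        for user, per_setting in sums.items()
    }

evals = [("u1", "A", 0.2), ("u1", "B", 0.5), ("u1", "A", 0.4),
         ("u2", "A", 0.9), ("u2", "B", 0.1)]
print(best_setting_per_user(evals))
```

Selecting by query or by result set would need features of the query or results instead of a user key, which is where an actual learned model would come in.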

  40. Further Exploration of Algorithms • Larger parameter space to explore • More complex user model subsets • Different parsing (e.g., phrases) • Tune BM25 parameters • What is really helping? • A generic user model or a personal model? • Use different indices for the queries • Deploy the system

  41. Practical Issues • Efficiency issues • Can interfaces mitigate some of the issues? • Merging server and client • Query expansion • Get more relevant results in the set to be re-ranked • Design snippets for personalization

  42. Thank you!
