
Organizing Search Results

Susan Dumais, Microsoft Research




Presentation Transcript


  1. Organizing Search Results Susan Dumais Microsoft Research

  2. Organizing Search Results • Algorithms and interfaces that improve the effectiveness of search • Beyond ranked lists • Main goal to support search • Also information analysis and discovery • Example applications • SWISH, results classification • GridViz, results summarization • SIS, personal landmarks for context

  3. Searching with Information Structured Hierarchically (SWISH) • Collaborators • Edward Cutrell, Hao Chen (Berkeley) • Key Themes • Going beyond long lists of results • Classification algorithms • UI techniques • More about it • http://research.microsoft.com/~sdumais

  4. Organizing Search Results • Query: “jaguar” • (Figure: List Organization vs. SWISH Category Organization; category labels shown: Shopping, Automotive, Computers)

  5. Web Directory • LookSmart Directory Structure • ~400k pages; 17k categories; 7 levels • 13 top-level categories; 150 second-level categories • Top-level Categories • Automotive • Business & Finance • Computers & Internet • Entertainment & Media • Health & Fitness • Hobbies & Interests • Home & Family • People & Chat • Reference & Education • Shopping & Services • Society & Politics • Sports & Recreation • Travel & Vacations • (Figure: subcategory listing) • Buy or Sell a Car • Chat • Finance & Insurance • Magazines & Books • Maintenance & Repair • Makes, Models & Clubs • Motorcycles • New Car Showrooms • Off-Road, 4X4 & RVs • Other Auto Interests • Shows & Museums • Trucks & Tractors • Vintage & Classic

  6. SWISH System • Combines the advantages of • Directories - Manually crafted structure but small <~3 million pages> • Search engines - Broad coverage but limited metadata <~3 billion pages> • Project search engine results to category structure • Two main components • Text classification models • UI for integrating search results and structure • Context (category structure) plus focus (search results)
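As a rough illustration of projecting a ranked result list onto a category structure, here is a minimal Python sketch. It is not the SWISH implementation: the keyword lists, the `classify` helper, and the sample snippets are all hypothetical stand-ins, and a real system would use a trained text classifier in place of keyword overlap.

```python
from collections import OrderedDict

# Hypothetical keyword lists standing in for a real text classifier.
CATEGORY_KEYWORDS = {
    "Automotive": {"car", "dealer", "xj"},
    "Computers & Internet": {"software", "mac", "os"},
    "Shopping & Services": {"buy", "store", "price"},
}


def classify(snippet):
    """Return the category whose keywords overlap the snippet most."""
    words = set(snippet.lower().split())
    best, best_overlap = "Other", 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def group_results(ranked_snippets, per_category=2):
    """Group a ranked list by category, keeping rank order inside each group."""
    groups = OrderedDict()
    for rank, snippet in enumerate(ranked_snippets, start=1):
        groups.setdefault(classify(snippet), []).append((rank, snippet))
    # Show only the first few hits per category (the "focus" within the "context").
    return OrderedDict((c, hits[:per_category]) for c, hits in groups.items())


# Made-up result snippets for the query "jaguar".
results = [
    "Jaguar car dealer locator",
    "Jaguar Mac OS software update",
    "Buy Jaguar parts at our store",
    "Jaguar XJ car review",
]
grouped = group_results(results)
for category, hits in grouped.items():
    print(category)
    for rank, snippet in hits:
        print(f"  {rank}. {snippet}")
```

Categories appear in the order of their best-ranked result, so the original ranking still drives what the user sees first.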

  7. SWISH Architecture • (Diagram labels) • Train (offline): manually classified web pages, SVM model • Classify (online): web search results, local search results, ...
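The train-offline, classify-online split can be sketched in a few lines of Python. Note the substitution: this example uses a simple perceptron as a stand-in for the SVM named on the slide, because it fits in a short stdlib-only example; both produce a linear model over words. The training pages and function names are made up for illustration.

```python
from collections import defaultdict


def tokenize(text):
    return text.lower().split()


def train_perceptron(labeled_pages, positive_label, max_epochs=100):
    """Offline step: learn word weights separating one category from the rest."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in labeled_pages:
            target = 1 if label == positive_label else -1
            score = bias + sum(weights[w] for w in tokenize(text))
            if target * score <= 0:  # misclassified: nudge weights toward target
                for w in tokenize(text):
                    weights[w] += target
                bias += target
                mistakes += 1
        if mistakes == 0:  # every training page is classified correctly
            break
    return dict(weights), bias


def score(model, text):
    """Online step: score a new page or snippet with the stored model."""
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) for w in tokenize(text))


# Hypothetical manually classified pages.
training_pages = [
    ("used car dealer and auto parts", "Automotive"),
    ("motorcycle repair and vehicle maintenance", "Automotive"),
    ("windows software downloads for pc", "Computers & Internet"),
    ("web hosting provider for users", "Computers & Internet"),
]
automotive_model = train_perceptron(training_pages, "Automotive")

print(score(automotive_model, "jaguar car dealer") > 0)         # Automotive-like
print(score(automotive_model, "jaguar software downloads") > 0)  # not Automotive-like
```

Training cost is paid once, offline; classifying a new search result at query time is only a sum over its words.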

  8. Learning & Classification • Support Vector Machine (SVM) • Accurate and efficient for text classification (Dumais et al., Joachims) • Model = weighted vector of words • “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche … • “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ... • Hierarchical models for LS directory • 1 model for top level; N models for second • Very useful in conjunction w/ user interaction
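To make "model = weighted vector of words" and the two-level hierarchy concrete, here is a small Python sketch. The word lists for the top level are taken from the slide, but every numeric weight, the second-level word lists, and the function names are assumptions chosen only for illustration; a trained SVM would supply the real weights.

```python
def score(weights, text):
    """Dot product between a word-weight vector and a bag-of-words document."""
    return sum(weights.get(word, 0.0) for word in text.lower().split())


def best_category(models, text):
    """Pick the category whose weight vector scores the text highest."""
    return max(models, key=lambda name: score(models[name], text))


# One top-level model: words from the slide, weights made up.
TOP_LEVEL = {
    "Automotive": {
        "motorcycle": 1.0, "vehicle": 0.9, "parts": 0.6, "automobile": 1.0,
        "harley": 0.9, "car": 1.0, "auto": 0.8, "honda": 0.7, "porsche": 0.7,
    },
    "Computers & Internet": {
        "rfc": 0.8, "software": 1.0, "provider": 0.5, "windows": 0.8,
        "pc": 0.9, "hosting": 0.7, "os": 0.6, "downloads": 0.7,
    },
}

# Second-level models, one set per top-level category (hypothetical weights;
# only Automotive is filled in here).
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.0, "harley": 1.0},
        "Maintenance & Repair": {"parts": 1.0, "repair": 1.0},
    },
}


def classify_hierarchical(text):
    """Choose a top-level category first, then a second-level one beneath it."""
    top = best_category(TOP_LEVEL, text)
    children = SECOND_LEVEL.get(top)
    second = best_category(children, text) if children else None
    return top, second


print(classify_hierarchical("harley motorcycle parts"))
print(classify_hierarchical("windows pc software"))
```

Splitting the decision by level keeps each model small: the second-level models only have to distinguish siblings, not all categories at once. Ties (including an all-zero score) fall to the first category listed, which a real system would handle more carefully.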

  9. User Interface Experiments • List Organization • Category Organization

  10. (Figure: experimental interface conditions) • Group Interface • List Interface • Presentation: Hover, Inline, Browse • Category names: No Cat Names, + Cat Names

  11. Effect of Query Difficulty • Easy queries are faster (p<0.01) • Group faster than List (p<0.01) • Benefit is larger for hard queries (p<0.06) • (Chart: Group vs. List, for easy and hard queries)

  12. SWISH: Summary and Design Implications • Text Classification • Learn accurate category models • Classify new web pages on-the-fly • Organize search results • User Interface • Tightly couple search results with category structure • User manipulation of presentation of category structure

  13. GridViz • Collaborators • George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech) • Key Themes • Abstract beyond individual results • Highly interactive interface to support understanding of trends and relationships • More about it • http://research.microsoft.com/~sdumais

  14. GridViz • Summarize the results of a search • Grid-based design • Axes represent topic, time, people • Cells encode frequency, recency • Supports activities like: • What newsgroups are active (on topic x)? • What people are active, authoritative (on topic x)? • When did I last interact w/ people?
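A grid summary of this kind can be approximated with a simple aggregation. The Python sketch below is an assumption about the data shape, not GridViz itself: each search hit is a dict with made-up `newsgroup`, `author`, and `date` fields, and each cell stores a count (frequency) and the latest date seen (recency).

```python
from collections import defaultdict
from datetime import date


def build_grid(items, row_key, col_key):
    """Summarize items into cells holding frequency and recency."""
    cells = defaultdict(lambda: {"count": 0, "latest": None})
    for item in items:
        cell = cells[(item[row_key], item[col_key])]
        cell["count"] += 1
        if cell["latest"] is None or item["date"] > cell["latest"]:
            cell["latest"] = item["date"]
    return dict(cells)


# Hypothetical newsgroup posts returned by a search.
posts = [
    {"newsgroup": "comp.ir", "author": "alice", "date": date(2003, 5, 1)},
    {"newsgroup": "comp.ir", "author": "alice", "date": date(2003, 6, 12)},
    {"newsgroup": "comp.ir", "author": "bob", "date": date(2003, 4, 3)},
    {"newsgroup": "comp.hci", "author": "bob", "date": date(2003, 6, 20)},
]

# Rows are topics (newsgroups), columns are people.
grid = build_grid(posts, "newsgroup", "author")
for (row, col), cell in sorted(grid.items()):
    print(f"{row:10} {col:6} count={cell['count']} latest={cell['latest']}")
```

Swapping `row_key` and `col_key` for other attributes (for example a month bucket) gives the other axis combinations the slide lists.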

  15. GridViz Demo

  16. User Interface Experiments • List View • GridViz

  17. GridViz Summary • Abstracting beyond individual results • Highly interactive interface • Grid-based design • Axes represent people, topic, time • Cells encode frequency, recency • Preliminary but promising

  18. Stuff I’ve Seen (SIS) • Collaborators • Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford) • Key Themes • Your content • Information re-use • Integration across sources • More about it • … internal for now

  19. Search Today … • Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet) • Often slow

  20. Search with SIS • Unified index of stuff you’ve seen • Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc. • Full-text index of content plus metadata attributes (e.g., creation time, author, title, size) • Automatic and immediate update of index • Rich UI possibilities, since it’s your content • Architecture • Client side indexing and storage • Built using MS Search components
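The idea of one index over several sources, with full text plus metadata, can be sketched as a tiny in-memory inverted index. This is a toy illustration in Python, not the SIS architecture (which the slide says is built on MS Search components); the `Item` fields, the `UnifiedIndex` class, and the sample items are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    source: str   # e.g. "mail", "web", "file"
    title: str
    author: str
    created: date
    text: str


class UnifiedIndex:
    """One full-text index over items from any source."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of items containing it
        self._items = []

    def add(self, item):
        """Index an item immediately, so it is searchable as soon as it is seen."""
        item_id = len(self._items)
        self._items.append(item)
        for word in f"{item.title} {item.author} {item.text}".lower().split():
            self._postings[word].add(item_id)

    def search(self, query, source=None):
        """Return items containing every query word, newest first."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self._postings.get(w, set()) for w in words))
        hits = [self._items[i] for i in ids
                if source is None or self._items[i].source == source]
        return sorted(hits, key=lambda item: item.created, reverse=True)


index = UnifiedIndex()
index.add(Item("mail", "Budget review", "alice", date(2003, 3, 2), "draft budget attached"))
index.add(Item("web", "Conference page", "sigir", date(2003, 5, 9), "call for papers"))
index.add(Item("file", "budget.xls", "bob", date(2003, 4, 18), "quarterly budget numbers"))

print([item.title for item in index.search("budget")])
print([item.title for item in index.search("budget", source="mail")])
```

Because author and title are indexed alongside the body text, a person's name works as a query term without any special handling.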

  21. SIS Demo

  22. SIS Alpha Observations • 800+ internal users • Usage logs (incl different interfaces), survey data • File types opened • 76% Email • 14% Web pages • 10% Files • Age of items accessed • 7% today • 22% within the last week • 46% within the last month

  23. SIS Alpha Observations • Use of other search tools • Non-SIS search for web, email, and files decreases • Importance of people • 25% of the queries involve people’s names • Importance of time • Date by far the most popular sort field, followed by rank, author, title • Even when rank is the default

  24. SIS UI Innovations: Timeline w/ Landmarks • Importance of time • Timeline interface • Contextualize results using important landmarks as pointers into human memory • General: holidays, world events • Personal: important photos, appointments
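One simple way to place a search result in the context of landmarks is to look up the landmarks immediately before and after its date. The Python sketch below assumes landmarks are `(date, label)` pairs; the function name and the sample landmarks are invented for illustration and say nothing about how the SIS timeline actually selects or renders them.

```python
from bisect import bisect_right
from datetime import date


def nearest_landmarks(result_date, landmarks):
    """Return the landmarks just before (or on) and just after a result's date."""
    ordered = sorted(landmarks)
    dates = [d for d, _ in ordered]
    i = bisect_right(dates, result_date)
    before = ordered[i - 1] if i > 0 else None
    after = ordered[i] if i < len(ordered) else None
    return before, after


# Hypothetical landmarks: two general (holidays), one personal (appointment).
landmarks = [
    (date(2003, 7, 4), "Independence Day"),
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 3, 14), "Dentist appointment"),
]

before, after = nearest_landmarks(date(2003, 4, 2), landmarks)
print(before)
print(after)
```

A result dated 2 April 2003 would then be shown between the dentist appointment and Independence Day, which may be easier to recall than the date itself.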

  25. Milestones in Time Demo

  26. Milestones in Timeline

  27. SIS Summary • Unified index of stuff you’ve seen • Fast access to full-text and metadata, from heterogeneous sources • Automatic and immediate update of index • Rich UI possibilities • Next steps • Better support for tagging -> “flatland” • Implicit queries for finding related info, and identifying “Stuff I Should See” • Integration with richer activity-based info, Eve

  28. Organizing Search Results • Algorithms and interfaces to improve search • Use structure and context • Examples and key themes • SWISH … grouping • GridViz … abstraction • SIS … personal content and landmarks • Also • Important attributes: People, topics, time • Interaction • Evaluation • More information • http://research.microsoft.com/~sdumais • sdumais@microsoft.com • Christopher Lee of (SIG)IR … • http://www.cdvp.dcu.ie/SIGIR/index.html
