Organizing Search Results Susan Dumais Microsoft Research
Organizing Search Results • Algorithms and interfaces that improve the effectiveness of search • Beyond ranked lists • Main goal: support search • Also: information analysis and discovery • Example applications • SWISH: results classification • GridViz: results summarization • SIS: personal landmarks for context
Searching with Information Structured Hierarchically (SWISH) • Collaborators • Edward Cutrell, Hao Chen (Berkeley) • Key Themes • Going beyond long lists of results • Classification algorithms • UI techniques • More about it • http://research.microsoft.com/~sdumais
Organizing Search Results (figure): List Organization vs. SWISH Category Organization for the query “jaguar”; in the list, results labeled Shopping, Automotive, Computers, Automotive appear interleaved
Web Directory • LookSmart Directory Structure • ~400k pages; 17k categories; 7 levels • 13 top-level categories; 150 second-level categories • Top-level Categories • Automotive • Business & Finance • Computers & Internet • Entertainment & Media • Health & Fitness • Hobbies & Interests • Home & Family • People & Chat • Reference & Education • Shopping & Services • Society & Politics • Sports & Recreation • Travel & Vacations • Second-level categories under Automotive • Buy or Sell a Car • Chat • Finance & Insurance • Magazines & Books • Maintenance & Repair • Makes, Models & Clubs • Motorcycles • New Car Showrooms • Off-Road, 4X4 & RVs • Other Auto Interests • Shows & Museums • Trucks & Tractors • Vintage & Classic
SWISH System • Combines the advantages of • Directories: manually crafted structure, but small (~3 million pages) • Search engines: broad coverage, but limited metadata (~3 billion pages) • Projects search engine results onto the category structure • Two main components • Text classification models • UI for integrating search results and structure • Context (category structure) plus focus (search results)
SWISH Architecture (figure) • Train (offline): manually classified web pages -> SVM model • Classify (online): web search results are classified by the SVM model and shown as local search results
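The "project search engine results onto the category structure" step can be sketched as a simple grouping pass. This is a minimal sketch, not the SWISH implementation: `group_by_category` and `toy_classify` are hypothetical names, the keyword rule merely stands in for a trained classifier, and ordering categories by their highest-ranked result is an assumption (SWISH could equally use a fixed directory order or group size).

```python
from typing import Callable, Dict, List


def group_by_category(
    ranked_results: List[str], classify: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Project a ranked result list onto a category structure.

    Results keep their original rank order inside each category. Because
    Python dicts preserve insertion order, categories appear in the order
    of their highest-ranked result (an assumed ordering policy).
    """
    groups: Dict[str, List[str]] = {}
    for result in ranked_results:
        groups.setdefault(classify(result), []).append(result)
    return groups


# Hypothetical keyword rule standing in for a trained text classifier.
def toy_classify(snippet: str) -> str:
    return "Computers & Internet" if "os" in snippet.split() else "Automotive"


# Illustrative ranked results for the query "jaguar" (invented data).
RESULTS = ["jaguar cars dealer", "jaguar os x review", "used jaguar parts"]
GROUPS = group_by_category(RESULTS, toy_classify)
```

Keeping rank order within each group preserves the search engine's relevance ordering, while the groups supply the category context.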
Learning & Classification • Support Vector Machine (SVM) • Accurate and efficient for text classification (Dumais et al., Joachims) • Model = weighted vector of words • “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche … • “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ... • Hierarchical models for the LookSmart (LS) directory • 1 model for the top level; N models for the second level • Very useful in conjunction with user interaction
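To make "model = weighted vector of words" and the two-level scheme concrete, here is a sketch of the classification step only. It assumes linear one-vs-rest scorers over binary word-presence features (the form a trained linear SVM takes at prediction time); SVM training is not shown, and the weights, `classify_hierarchical`, `TOP_MODELS`, and `SECOND_MODELS` are all hypothetical.

```python
from typing import Dict, List, Tuple

# A linear model: per-word weights plus a bias term ("weighted vector of words").
LinearModel = Tuple[Dict[str, float], float]


def score(model: LinearModel, words: List[str]) -> float:
    """Linear score over binary word-presence features: sum of weights + bias."""
    weights, bias = model
    return sum(weights.get(w, 0.0) for w in set(words)) + bias


def classify_hierarchical(
    words: List[str],
    top_models: Dict[str, LinearModel],
    second_models: Dict[str, Dict[str, LinearModel]],
) -> Tuple[str, str]:
    """Choose the best top-level category, then the best second-level
    category among that category's children only ("" if it has none)."""
    top = max(top_models, key=lambda c: score(top_models[c], words))
    children = second_models.get(top, {})
    if not children:
        return top, ""
    second = max(children, key=lambda c: score(children[c], words))
    return top, second


# Illustrative, hand-set weights; real ones would come from SVM training.
TOP_MODELS: Dict[str, LinearModel] = {
    "Automotive": ({"car": 1.0, "porsche": 1.2, "harley": 1.1, "auto": 0.9}, -0.5),
    "Computers & Internet": ({"software": 1.0, "windows": 1.1, "os": 0.8, "pc": 0.9}, -0.5),
}
SECOND_MODELS: Dict[str, Dict[str, LinearModel]] = {
    "Automotive": {
        "Motorcycles": ({"harley": 1.5, "motorcycle": 1.5}, -0.3),
        "New Car Showrooms": ({"porsche": 1.0, "car": 0.8, "dealer": 1.0}, -0.3),
    },
}
```

Routing to second-level models only under the chosen top-level category means each page is scored by the top-level scorers plus one category's children, rather than by every second-level model.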
User Interface Experiments • List Organization vs. Category Organization
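Interfaces compared (figure): Group Interface and List Interface; variants without and with category names (No Cat Names, + Cat Names) and with summaries shown on hover, inline, or via browsing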
Effect of Query Difficulty • Easy queries are faster (p<0.01) • Group is faster than List (p<0.01) • The advantage of Group over List is larger for hard queries (p<0.06) • (Figure: Easy vs. Hard queries in the Group and List interfaces)
SWISH: Summary and Design Implications • Text Classification • Learn accurate category models • Classify new web pages on-the-fly • Organize search results • User Interface • Tightly couple search results with category structure • User manipulation of presentation of category structure
GridViz • Collaborators • George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech) • Key Themes • Abstract beyond individual results • Highly interactive interface to support understanding of trends and relationships • More about it • http://research.microsoft.com/~sdumais
GridViz • Summarize the results of a search • Grid-based design • Axes represent topic, time, people • Cells encode frequency, recency • Supports activities like: • Which newsgroups are active (on topic x)? • Which people are active or authoritative (on topic x)? • When did I last interact with people?
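A grid of this kind can be built by aggregating results into cells keyed by axis values. The sketch below assumes a people × topic grid, with time encoded as recency inside each cell (a time axis could be bucketed the same way). `Message`, `Cell`, `build_grid`, `last_interaction`, and `most_active` are hypothetical names, not the GridViz code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Message:
    person: str
    topic: str
    sent: date


@dataclass
class Cell:
    frequency: int = 0              # how many results fall in this cell
    most_recent: date = date.min    # newest result in this cell (recency)


Grid = Dict[Tuple[str, str], Cell]


def build_grid(results: List[Message]) -> Grid:
    """Aggregate search results into a (person, topic) grid of cells."""
    grid: Grid = defaultdict(Cell)
    for m in results:
        cell = grid[(m.person, m.topic)]
        cell.frequency += 1
        cell.most_recent = max(cell.most_recent, m.sent)
    return dict(grid)


def last_interaction(grid: Grid, person: str) -> Optional[date]:
    """'When did I last interact with this person?' across all topics."""
    dates = [c.most_recent for (p, _), c in grid.items() if p == person]
    return max(dates) if dates else None


def most_active(grid: Grid, topic: str) -> List[str]:
    """People ranked by how many results they have on `topic`."""
    counts = [(c.frequency, p) for (p, t), c in grid.items() if t == topic]
    return [p for _, p in sorted(counts, reverse=True)]
```

Each question on the slide then becomes a scan over one row or column of cells instead of a pass over the raw result list.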
User Interface Experiments • List View vs. GridViz
GridViz Summary • Abstracting beyond individual results • Highly interactive interface • Grid-based design • Axes represent people, topic, time • Cells encode frequency, recency • Preliminary but promising
Stuff I’ve Seen (SIS) • Collaborators • Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford) • Key Themes • Your content • Information re-use • Integration across sources • More about it • … internal for now
Search Today … • Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet) • Often slow
Search with SIS • Unified index of stuff you’ve seen • Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc. • Full-text index of content plus metadata attributes (e.g., creation time, author, title, size) • Automatic and immediate update of index • Rich UI possibilities, since it’s your content • Architecture • Client side indexing and storage • Built using MS Search components
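A unified index of this kind pairs a full-text inverted index with per-item metadata, updated as items arrive. The toy in-memory version below only illustrates the idea; it is not built on MS Search components, and `Item`, `UnifiedIndex`, and the source labels are hypothetical. Results can be sorted by any metadata field, with dates newest first.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class Item:
    item_id: str
    source: str       # e.g. "mail", "file", "web" (illustrative labels)
    title: str
    author: str
    created: datetime
    text: str


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnifiedIndex:
    """Toy in-memory full-text index over items from any source, plus metadata."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self.postings: Dict[str, Set[str]] = {}

    def add(self, item: Item) -> None:
        """Index (or re-index) an item; it is searchable as soon as this returns."""
        if item.item_id in self.items:
            self._remove_postings(self.items[item.item_id])
        self.items[item.item_id] = item
        for term in set(tokenize(item.title + " " + item.text)):
            self.postings.setdefault(term, set()).add(item.item_id)

    def _remove_postings(self, item: Item) -> None:
        for term in set(tokenize(item.title + " " + item.text)):
            self.postings.get(term, set()).discard(item.item_id)

    def search(self, query: str, sort_by: str = "created") -> List[Item]:
        """Return items containing every query term, sorted by a metadata field.

        Dates sort newest first; other fields (e.g. author, title) sort ascending.
        """
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [self.items[i] for i in ids]
        return sorted(hits, key=lambda it: getattr(it, sort_by),
                      reverse=(sort_by == "created"))
```

Re-indexing an item removes its old postings first, so an edited file or message never matches on text it no longer contains.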
SIS Alpha Observations • 800+ internal users • Usage logs (including different interfaces), survey data • File types opened • 76% Email • 14% Web pages • 10% Files • Age of items accessed • 7% today • 22% within the last week • 46% within the last month
SIS Alpha Observations • Use of other search tools • Non-SIS search for web, email, and files decreases • Importance of people • 25% of the queries involve people’s names • Importance of time • Date is by far the most popular sort field, followed by rank, author, title • Even when rank is the default
SIS UI Innovations: Timeline with Landmarks • Importance of time • Timeline interface • Contextualize results using important landmarks as pointers into human memory • General: holidays, world events • Personal: important photos, appointments
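One simple way to use landmarks as memory cues is to describe each result's date relative to the nearest preceding landmark. This is an assumed technique for illustration, not the documented SIS behavior; `nearest_landmark_before`, `describe`, and the landmark list (including the "team offsite" entry) are hypothetical.

```python
import bisect
from datetime import date
from typing import List, Optional, Tuple

Landmark = Tuple[date, str]

# Illustrative landmarks, sorted by date: general (holidays) and personal.
LANDMARKS: List[Landmark] = [
    (date(2002, 11, 28), "Thanksgiving"),
    (date(2002, 12, 25), "Christmas"),
    (date(2003, 1, 15), "team offsite"),
]


def nearest_landmark_before(when: date, landmarks: List[Landmark]) -> Optional[Landmark]:
    """Latest landmark on or before `when`; `landmarks` must be sorted by date."""
    dates = [d for d, _ in landmarks]
    i = bisect.bisect_right(dates, when)
    return landmarks[i - 1] if i > 0 else None


def describe(when: date, landmarks: List[Landmark]) -> str:
    """Label a result date relative to a landmark, falling back to ISO format."""
    lm = nearest_landmark_before(when, landmarks)
    if lm is None:
        return when.isoformat()
    days = (when - lm[0]).days
    return f"{days} day(s) after {lm[1]}" if days else f"on {lm[1]}"
```

Using `bisect_right` makes a result dated on a landmark day attach to that landmark rather than the previous one.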
SIS Summary • Unified index of stuff you’ve seen • Fast access to full-text and metadata, from heterogeneous sources • Automatic and immediate update of index • Rich UI possibilities • Next steps • Better support for tagging -> “flatland” • Implicit queries for finding related info, and identifying “Stuff I Should See” • Integration with richer activity-based info, Eve
Organizing Search Results • Algorithms and interfaces to improve search • Use structure and context • Examples and key themes • SWISH … grouping • GridViz … abstraction • SIS … personal content and landmarks • Also • Important attributes: People, topics, time • Interaction • Evaluation • More information • http://research.microsoft.com/~sdumais • sdumais@microsoft.com • Christopher Lee of (SIG)IR … • http://www.cdvp.dcu.ie/SIGIR/index.html