Designing and Evaluating Search Interfaces

Designing and Evaluating Search Interfaces Prof. Marti Hearst School of Information UC Berkeley

Outline • Why is Supporting Search Difficult? • What Works? • How to Evaluate?

Why is Supporting Search Difficult? • Everything is fair game • Abstractions are difficult to represent • The vocabulary disconnect • Users’ lack of understanding of the technology • Clutter vs. Information

Everything is Fair Game • The scope of what people search for is all of human knowledge and experience. • Other interfaces are more constrained (word processing, formulas, etc.) • Interfaces must accommodate human differences in: • Knowledge / life experience • Cultural background and expectations • Reading / scanning ability and style • Methods of looking for things (pilers vs. filers)

Abstractions Are Hard to Represent • Text describes abstract concepts • Difficult to show the contents of text in a visual or compact manner • Exercise: • How would you show the preamble of the US Constitution visually? • How would you show the contents of Joyce’s Ulysses visually? How would you distinguish it from Homer’s The Odyssey or McCourt’s Angela’s Ashes? • The point: it is difficult to show text without using text

Vocabulary Disconnect • If you ask a set of people to describe a set of things there is little overlap in the results.

The Vocabulary Problem Data sets examined (and # of participants) • Main verbs used by typists to describe the kinds of edits that they do (48) • Commands for a hypothetical “message decoder” computer program (100) • First word used to describe 50 common objects (337) • Categories for 64 classified ads (30) • First keywords for each of a set of recipes (24) Furnas, Landauer, Gomez, Dumais: The Vocabulary Problem in Human-System Communication. Commun. ACM 30(11): 964-971 (1987)

The Vocabulary Problem These are really bad results • If one person assigns the name, the probability of it NOT matching with another person’s is about 80% • What if we pick the most commonly chosen words as the standard? Still not good: Furnas, Landauer, Gomez, Dumais: The Vocabulary Problem in Human-System Communication. Commun. ACM 30(11): 964-971 (1987)
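A minimal sketch of the arithmetic behind a mismatch rate like this one. The name shares below are invented for illustration and are not data from Furnas et al.; the function and variable names are hypothetical. The sketch only shows how a spread-out vocabulary produces a low chance of agreement, and why standardizing on the single most popular word still misses most people.

```python
def agreement_probability(name_shares):
    """Probability that two people independently choose the same name.

    name_shares maps each candidate name to the fraction of people using it.
    """
    return sum(share * share for share in name_shares.values())


# Hypothetical shares of the words people use for one editing operation
# (assumed values, not taken from the study).
shares = {"delete": 0.30, "remove": 0.25, "erase": 0.20, "cut": 0.15, "kill": 0.10}

p_match = agreement_probability(shares)   # two random people agree
p_miss = 1.0 - p_match                    # two random people disagree

# If the system adopts only the most popular word, it serves this share of users.
best_single_name_hit = max(shares.values())

print(f"match: {p_match:.3f}, mismatch: {p_miss:.3f}, best single name: {best_single_name_hit:.2f}")
```

With these assumed shares, two people agree less than a quarter of the time, and even the most popular word covers fewer than a third of users.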

Lack of Technical Understanding • Most people don’t understand the underlying methods by which search engines work.

People Don’t Understand Search Technology A study of 100 randomly-chosen people found: • 14% never type a URL directly into the address bar • Several tried to use the address bar, but did it wrong • Put spaces between words • Combinations of dots and spaces • “nursing spectrum.com” “consumer reports.com” • Several use the search form with no spaces • “plumber’slocal9” “capitalhealthsystem” • People do not understand the use of quotes • Only 16% use quotes • Of these, some use them incorrectly • Around all of the words, making results too restrictive • “lactose intolerance –recipies” • Here the – excludes the recipes • People don’t make use of “advanced” features • Only 1 used “find in page” • Only 2 used Google cache Hargittai, Classifying and Coding Online Actions, Social Science Computer Review 22(2), 2004, 210-227.

People Don’t Understand Search Technology Without appropriate explanations, most of 14 people had strong misconceptions about: • ANDing vs ORing of search terms • Some assumed ANDing search engine indexed a smaller collection; most had no explanation at all • For empty results for query “to be or not to be” • 9 of 14 could not explain in a method that remotely resembled stop word removal • For term order variation “boat fire” vs. “fire boat” • Only 5 out of 14 expected different results • Understanding was vague, e.g.: • “Lycos separates the two words and searches for the meaning, instead of what’re your looking for. Google understands the meaning of the phrase.” Muramatsu & Pratt, “Transparent Queries: Investigating Users’ Mental Models of Search Engines,” SIGIR 2001.
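The three behaviors that confused participants (ANDing vs. ORing, stop word removal, and term order) can be sketched with a toy bag-of-words index. This is an illustration only, not how any engine named in the study worked; the stop word list, the documents, and all function names are assumptions.

```python
# Assumed stop word list; real engines used their own.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_index(docs):
    """Map each remaining term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, mode="and"):
    """AND returns documents with every term; OR returns documents with any."""
    terms = tokenize(query)
    if not terms:  # the whole query was stop words
        return set()
    postings = [index.get(t, set()) for t in terms]
    if mode == "and":
        return set.intersection(*postings)
    return set.union(*postings)


docs = {
    1: "fire boat rescues crew",
    2: "boat fire destroys the marina",
    3: "fire station opens",
    4: "to be or not to be",
}
index = build_index(docs)

and_hits = search(index, "boat fire", mode="and")  # smaller set, same collection
or_hits = search(index, "boat fire", mode="or")    # larger set, same collection
empty = search(index, "to be or not to be")        # nothing left after stop words
same = search(index, "fire boat") == search(index, "boat fire")
print(and_hits, or_hits, empty, same)
```

In this sketch, ANDing returns fewer results because it intersects postings, not because it indexes a smaller collection. Term order makes no difference here because the index ignores word positions; an engine that scores phrases or proximity would treat “boat fire” and “fire boat” differently.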

What Works?

Cool Doesn’t Cut It • It’s very difficult to design a search interface that users prefer over the standard • Some ideas have a strong WOW factor • Examples: • Kartoo • Groxis • Hyperbolic tree • But they don’t pass the “will you use it” test • Even some simpler ideas fall by the wayside • Example: • Visual ranking indicators for results set listings

Early Visual Rank Indicators

Metadata Matters • When used correctly, text to describe text, images, video, etc. works well • “Searchers” often turn into “browsers” with appropriate links • However, metadata has many perils • The Kosher Recipe Incident

Small Details Matter • UIs for search especially require great care in small details • In part due to the text-heavy nature of search • A tension between more information and introducing clutter • How and where to place things is important • People tend to scan or skim • Only a small percentage reads instructions

Small Details Matter • UIs for search especially require endless tiny adjustments • In part due to the text-heavy nature of search • Example: • In an earlier version of the Google Spellchecker, people didn’t always see the suggested correction • Used a long sentence at the top of the page: “If you didn’t find what you were looking for …” • People complained they got results, but not the right results. • In reality, the spellchecker had suggested an appropriate correction. • Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html

Small Details Matter • The fix: • Analyzed logs, saw people didn’t see the correction: • clicked on first search result, • didn’t find what they were looking for (came right back to the search page) • scrolled to the bottom of the page, did not find anything • and then complained directly to Google • Solution was to repeat the spelling suggestion at the bottom of the page. • More adjustments: • The message is shorter, and different on the top vs. the bottom • Interview with Marissa Mayer by Mark Hurst: http://www.goodexperience.com/columns/02/1015google.html

Small Details Matter • Layout, font, and whitespace for information-centric interfaces require very careful design • Example: • Photo thumbnails • Search results summaries

What Works for Search Interfaces? • Query term highlighting • in results listings • in retrieved documents • Term Suggestions (if done right) • Sorting of search results according to important criteria (date, author) • Grouping of results according to well-organized category labels (see Flamenco) • DWIM only if highly accurate: • Spelling correction/suggestions • Simple relevance feedback (more-like-this) • Certain types of term expansion • So far: not really visualization Hearst et al: Finding the Flow in Web Site Search, CACM 45(9), 2002.

Highlighting Query Terms • Boldface or color • Adjacency of terms with relevant context is a useful cue.

[Figure: query term hits highlighted using the Google toolbar. Example queries: US Blackout, PGA, Microsoft. Annotations: “found!”, “don’t know”.]
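A minimal sketch of query term highlighting in a result snippet. It assumes whole-word, case-insensitive matching and HTML bold tags; the function name and the sample snippet are hypothetical, and this is not a description of how the Google toolbar does it.

```python
import re


def highlight(snippet, query, open_tag="<b>", close_tag="</b>"):
    """Wrap each whole-word occurrence of a query term in bold tags."""
    terms = [re.escape(t) for t in query.split() if t]
    if not terms:
        return snippet
    # \b keeps matches to whole words; IGNORECASE matches "Microsoft" for "microsoft".
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, snippet)


result = highlight("Microsoft responds to the US blackout", "blackout microsoft")
print(result)
```

Because the matched text is reinserted unchanged, the original capitalization of the snippet is preserved and the surrounding context stays readable.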

How to Introduce New Features? • Example: Yahoo “shortcuts” • Search engines now provide groups of enriched content • Automatically infer related information, such as sports statistics • Accessed via keywords • User can quickly specify very specific information • united 570 (flight arrival time) • map “san francisco” • We’re heading back to command languages!

Introducing New Features • A general technique: scaffolding • Scaffolding: • Facilitate a student’s ability to build on prior knowledge and internalize new information. • The activities provided in scaffolding instruction are just beyond the level of what the learner can do already. • Learning the new concept moves the learner up one “step” on the conceptual “ladder”

Scaffolding Example • The problem: how do people learn about these fantastic but unknown options? • Example: scaffolding the definition function • Where to put a suggestion for a definition? • Google used to simply hyperlink it next to the statistics for the word. • Now a hint appears to alert people to the feature.

Unlikely to notice the function here

Scaffolding to teach what is available

Query Term Suggestions

Query Reformulation • Query reformulation: • After receiving unsuccessful results, users modify their initial queries and submit new ones intended to more accurately reflect their information needs. • Web search logs show that searchers often reformulate their queries • A study of 985 Web user search sessions found • 33% went beyond the first query • Of these, ~35% retained the same number of terms while 19% had 1 more term and 16% had 1 fewer Use of query reformulation and relevance feedback by Excite users, Spink, Jansen & Ozmultu, Internet Research 10(4), 2001

Query Reformulation • Many studies show that if users engage in relevance feedback, the results are much better. • In one study, participants did 17-34% better with RF • They also did better if they could see the RF terms than if the system did it automatically (DWIM) • But the effort required for doing so is usually a roadblock. • Before the web and in most research, searchers had to select MANY relevant documents or MANY terms. Koenemann & Belkin, A Case for Interaction: A Study of Interactive Information Retrieval Behavior and Effectiveness, CHI’96

Query Reformulation • What happens when a web search engine suggests new terms? • Web log analysis study using the Prisma term suggestion system: Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.

Query Reformulation Study • Feedback terms were displayed to 15,133 user sessions. • Of these, 14% used at least one feedback term • For all sessions, 56% involved some degree of query refinement • Within this subset, use of the feedback terms was 25% • By user id, ~16% of users applied feedback terms at least once on any given day • Looking at a 2-week session of feedback users: • Of the 2,318 users who used it once, 47% used it again in the same 2-week window. • Comparison was also done to a baseline group that was not offered feedback terms. • Both groups ended up making a page-selection click at the same rate. Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.
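The session percentages above fit together, which a short calculation makes visible. This sketch assumes that every session using a feedback term also counts as a refinement session; the variable names are hypothetical and the rounded counts are derived here, not quoted from Anick.

```python
sessions_shown = 15_133        # sessions in which feedback terms were displayed
used_feedback_share = 0.14     # used at least one feedback term
refined_share = 0.56           # involved some degree of query refinement

used_feedback = sessions_shown * used_feedback_share
refined = sessions_shown * refined_share

# Assumption: feedback sessions are a subset of refinement sessions,
# so uptake within the refining subset is the ratio of the two shares.
uptake_within_refined = used_feedback / refined

# Users who came back to the feature within the 2-week window.
repeat_users = round(2_318 * 0.47)

print(round(used_feedback), round(refined), f"{uptake_within_refined:.0%}", repeat_users)
```

Under that assumption, 14% of all sessions divided by 56% of all sessions gives the 25% uptake reported for the refining subset.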

Query Reformulation Study Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.

Query Reformulation Study • Other observations • Users prefer refinements that contain the initial query terms • Presentation order does have an influence on term uptake Anick, Using Terminological Feedback for Web Search Refinement – A Log-based Study, SIGIR’03.

Prognosis: Query Reformulation • Researchers have always known it can be helpful, but the methods proposed for user interaction were too cumbersome • Had to select many documents and then do feedback • Had to select many terms • Was based on statistical ranking methods which are hard for people to understand • RF is promising for web-based searching • The dominance of AND-based searching makes it easier to understand the effects of RF • Automated systems built on the assumption that the user will only add one term now work reasonably well • This kind of interface is simple

Supporting the Search Process • We should differentiate among searching: • The Web • Personal information • Large collections of like information • Different cues useful for each • Different interfaces needed • Examples • The “Stuff I’ve Seen” Project • The Flamenco Project

The “Stuff I’ve Seen” project • Did intense studies of how people work • Used the results to design an integrated search framework • Did extensive evaluations of alternative designs • The following slides are modifications of ones supplied by Sue Dumais, reproduced with permission. Dumais, Cutrell, Cadiz, Jancke, Sarin and Robbins, Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.

Searching Over Personal Information • Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, notes) Slide adapted from Sue Dumais.

The “Stuff I’ve Seen” project • Unified index of items touched recently by user • All types of information, e.g., files of all types, email, calendar, contacts, web pages, etc. • Full-text index of content plus metadata attributes (e.g., creation time, author, title, size) • Automatic and immediate update of index • Rich UI possibilities, since it’s your content • Search only over things already seen • Re-use vs. initial discovery Slide adapted from Sue Dumais.
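A minimal sketch of the idea of one index over everything the user has touched, combining full text with metadata filters. It is a toy illustration under stated assumptions, not the SIS implementation: the `Item` fields, the `UnifiedIndex` class, and the sample items are all hypothetical, and re-adding an item does not remove its old postings.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    item_id: str
    kind: str          # e.g. "email", "file", "web"
    title: str
    author: str
    created: datetime
    content: str


class UnifiedIndex:
    """One full-text index across item types, with metadata kept per item."""

    def __init__(self):
        self.items = {}
        self.postings = {}

    def add(self, item):
        """Index an item immediately, when the user first touches it."""
        self.items[item.item_id] = item
        for term in set((item.title + " " + item.content).lower().split()):
            self.postings.setdefault(term, set()).add(item.item_id)

    def search(self, query, kind=None, author=None):
        """AND the query terms, filter on metadata, return newest first."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        hits = [self.items[i] for i in ids]
        if kind is not None:
            hits = [h for h in hits if h.kind == kind]
        if author is not None:
            hits = [h for h in hits if h.author == author]
        return sorted(hits, key=lambda h: h.created, reverse=True)


index = UnifiedIndex()
index.add(Item("m1", "email", "Budget review", "ana", datetime(2003, 5, 2), "draft budget attached"))
index.add(Item("f1", "file", "budget.xls", "me", datetime(2003, 4, 20), "quarterly budget numbers"))
index.add(Item("w1", "web", "Travel policy", "hr", datetime(2003, 3, 1), "travel budget limits"))

all_hits = [h.item_id for h in index.search("budget")]
mail_hits = [h.item_id for h in index.search("budget", kind="email")]
print(all_hits, mail_hits)
```

A single query reaches mail, files, and web pages at once, and the metadata supports the kind of filtering and sorting by date that is only possible because the content is the user's own.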

SIS Interface Slide adapted from Sue Dumais

Search With SIS Slide adapted from Sue Dumais
