Interfaces for Information Retrieval
Ray Larson & Warren Sack
IS202: Information Organization and Retrieval
Fall 2001, UC Berkeley, SIMS
Lecture authors: Marti Hearst, Ray Larson, Warren Sack
Today
• What is HCI?
• Interfaces for IR using the standard model of IR
• Interfaces for IR using new models of IR and/or different models of interaction
What is HCI?
Human-Computer Interaction (HCI)
• Human
  • the end user of a program
• Computer
  • the machine the program runs on
• Interaction
  • the user tells the computer what they want
  • the computer communicates results
(slide adapted from James Landay)
What is HCI?
[Figure: diagram with the labels Organizational & Social Issues, Task, Design, Technology, Human]
(slide by James Landay)
Shneiderman on HCI
• Well-designed interactive computer systems:
  • promote positive feelings of success, competence, and mastery
  • allow users to concentrate on their work, rather than on the system
Usability Design Goals
• Ease of learning
  • faster the second time and so on...
• Recall
  • remember how from one session to the next
• Productivity
  • perform tasks quickly and efficiently
• Minimal error rates
  • if they occur, good feedback so user can recover
• High user satisfaction
  • confident of success
(slide by James Landay)
Who builds UIs?
• A team of specialists
  • graphic designers
  • interaction / interface designers
  • technical writers
  • marketers
  • test engineers
  • software engineers
(slide by James Landay)
How to Design and Build UIs
• Task analysis
• Rapid prototyping
• Evaluation
• Implementation
Iterate at every stage!
[Figure: cycle with the labels Design, Prototype, Evaluate]
(slide adapted from James Landay)
Task Analysis
• Observe existing work practices
• Create examples and scenarios of actual use
• Try out new ideas before building software
Task = Information Access
• The standard interaction model for information access
  • (1) start with an information need
  • (2) select a system and collections to search on
  • (3) formulate a query
  • (4) send the query to the system
  • (5) receive the results
  • (6) scan, evaluate, and interpret the results
  • (7) stop, or
  • (8) reformulate the query and go to step 4
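The steps of the standard model can be read as a loop. The sketch below is only an illustration under stated assumptions: `search`, `information_access`, and the two callbacks `is_satisfied` and `reformulate` are hypothetical names, and the retrieval function is a toy "all terms must match" filter, not any real IR system.

```python
from typing import Callable, List, Tuple


def search(collection: List[str], query: str) -> List[str]:
    """Toy retrieval (assumption): keep documents that contain every query term."""
    terms = query.lower().split()
    return [doc for doc in collection
            if all(term in doc.lower().split() for term in terms)]


def information_access(
    collection: List[str],                          # step 2: the chosen collection
    first_query: str,                               # step 3: the initial query
    is_satisfied: Callable[[List[str]], bool],      # step 6: the user's judgment
    reformulate: Callable[[str, List[str]], str],   # step 8: the user's new query
    max_rounds: int = 5,
) -> List[Tuple[str, List[str]]]:
    """Run steps 3-8 and return the (query, results) pair of every round."""
    query = first_query
    history = []
    for _ in range(max_rounds):
        results = search(collection, query)         # steps 4-5: send query, receive results
        history.append((query, results))
        if is_satisfied(results):                   # step 6: scan, evaluate, interpret
            break                                   # step 7: stop
        query = reformulate(query, results)         # step 8: reformulate, back to step 4
    return history


docs = [
    "database reliability report",
    "user interface design",
    "interface evaluation methods",
]
history = information_access(
    docs,
    "interface usability",
    is_satisfied=lambda results: len(results) > 0,
    reformulate=lambda query, results: query.split()[0],  # fall back to the first term
)
for query, results in history:
    print(repr(query), "->", results)
```

In this run the first query matches nothing, so the simulated user drops a term and the second round returns the two interface documents. The interface questions in this lecture correspond to the places where a human, not a callback, has to act.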
HCI Interface questions using the standard model of IR
• Where does a user start? Faced with a large set of collections, how can a user choose one to begin with?
• How will a user formulate a query?
• How will a user scan, evaluate, and interpret the results?
• How can a user reformulate a query?
Interface design: Is it always HCI or the highway?
• No, there are other ways to design interfaces, including using methods from
  • Art
  • Architecture
  • Sociology
  • Anthropology
  • Narrative theory
  • Geography
Information Access: Is the standard IR model always the model?
• No, other models have been proposed and explored, including
  • Berrypicking (Bates, 1989)
  • Sensemaking (Russell et al., 1993)
  • Orienteering (O’Day and Jeffries, 1993)
  • Intermediaries (Maglio and Barrett, 1996)
  • Social Navigation (Dourish and Chalmers, 1994)
  • Agents (e.g., Maes, 1992)
• And don’t forget experiments like (Blair and Maron, 1985)
IR+HCI Question 1: Where does the user start?
[Figure: dialog box for choosing sources in the old Lexis-Nexis interface]
Where does a user start?
• Supervised (Manual) Category Overviews
  • Yahoo!
  • HiBrowse
  • MeSHBrowse
• Unsupervised (Automated) Groupings
  • Clustering
  • Kohonen Feature Maps
Incorporating Categories into the Interface
• Yahoo! is the standard method
• Problems:
  • Hard to search; meant to be navigated
  • Only one category per document (usually)
More Complex Example: MeSH and MedLine
• MeSH Category Hierarchy
  • Medical Subject Headings
  • ~18,000 labels
  • manually assigned
  • ~8 labels/article on average
  • avg depth: 4.5, max depth: 9
• Top Level Categories: anatomy, diagnosis, related disc, animals, psych, technology, disease, biology, humanities, drugs, physics
MeSHBrowse (Korn & Shneiderman 95)
Only the relevant subset of the hierarchy is shown at one time.
HiBrowse (Pollitt 97)
Browsing several different subsets of category metadata simultaneously.
Large Category Sets
• Problems for User Interfaces
  • Too many categories to browse
  • Too many docs per category
  • Docs belong to multiple categories
  • Need to integrate search
  • Need to show the documents
Text Clustering
• Finds overall similarities among groups of documents
• Finds overall similarities among groups of tokens
• Picks out some themes, ignores others
Scatter/Gather (Cutting, Pedersen, Tukey & Karger 92, 93; Hearst & Pedersen 95)
• How it works
  • Cluster sets of documents into general “themes”, like a table of contents
  • Display the contents of the clusters by showing topical terms and typical titles
  • User chooses subsets of the clusters and re-clusters the documents within
  • Resulting new groups have different “themes”
• Originally used to give collection overview
• Evidence suggests more appropriate for displaying retrieval results in context
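The scatter, summarize, gather, re-scatter cycle can be sketched in a few lines. This is a minimal illustration, not the clustering procedure of the cited papers: it swaps in a plain k-means-style loop over bag-of-words vectors with naive seeding, and the names `scatter`, `summarize`, and `gather` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def scatter(docs, k, rounds=5):
    """Group docs into at most k clusters (k-means-style stand-in, an assumption)."""
    k = min(k, len(docs))
    vectors = [vectorize(d) for d in docs]
    centroids = [vectors[i] for i in range(k)]      # naive seeding: the first k docs
    assignment = [0] * len(docs)
    for _ in range(rounds):
        # Assign each document to its most similar centroid.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each centroid as the summed vector of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centroids[c] = sum(members, Counter())
    clusters = [[d for d, a in zip(docs, assignment) if a == c] for c in range(k)]
    return [cluster for cluster in clusters if cluster]


def summarize(cluster, n_terms=3):
    """Topical terms for a cluster: its most frequent words."""
    counts = sum((vectorize(d) for d in cluster), Counter())
    return [term for term, _ in counts.most_common(n_terms)]


def gather(clusters, chosen):
    """Merge the clusters the user picked into one working set."""
    return [doc for index in chosen for doc in clusters[index]]


docs = [
    "star galaxy telescope",
    "galaxy star orbit",
    "film star award",
    "film award actor",
]
clusters = scatter(docs, k=2)                 # scatter: show themes
for cluster in clusters:
    print(summarize(cluster), cluster)
picked = gather(clusters, [0])                # gather: user picks cluster 0
subclusters = scatter(picked, k=2)            # re-scatter inside the selection
```

Because the selection is re-clustered on its own, the sub-groups can surface themes that were invisible at the level of the whole collection, which is the point made in the slide.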
Another use of clustering
• Use clustering to map the entire huge multidimensional document space into a huge number of small clusters.
• “Project” these onto a 2D graphical representation
  • Group by doc: SPIRE/Kohonen maps
  • Group by words: Galaxy of News/HotSauce/Semio
[Figure: Clustering Multi-Dimensional Document Space (image from Wise et al. 95)]
[Figure: Kohonen Feature Maps on Text (from Chen et al., JASIS 49(7))]
Summary: Clustering
• Advantages:
  • Get an overview of main themes
  • Domain independent
• Disadvantages:
  • Many of the ways documents could group together are not shown
  • Not always easy to understand what they mean
  • Different levels of granularity
IR+HCI Question 2: How will a user formulate a query?
Query Specification
• Interaction Styles (Shneiderman 97)
  • Command Language
  • Form Fill
  • Menu Selection
  • Direct Manipulation
  • Natural Language
• What about gesture, eye-tracking, or implicit inputs like reading habits?
Command-Based Query Specification
• command attribute value connector …
  • find pa shneiderman and tw user#
• What are the attribute names?
• What are the command names?
• What are allowable values?
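To make the "command attribute value connector …" pattern concrete, here is a small parser sketch. It is not the grammar of any particular system: the connector set `and`/`or`/`not` and the names `Clause` and `parse_command` are assumptions, and attribute codes and values are passed through without interpretation.

```python
from typing import List, NamedTuple, Optional, Tuple

CONNECTORS = {"and", "or", "not"}  # assumed connector vocabulary


class Clause(NamedTuple):
    connector: Optional[str]  # how this clause joins the previous one (None for the first)
    attribute: str            # attribute code, kept verbatim
    value: str                # search value, kept verbatim


def parse_command(text: str) -> Tuple[str, List[Clause]]:
    """Split 'command attribute value [connector attribute value ...]' into parts."""
    tokens = text.split()
    if len(tokens) < 3:
        raise ValueError("expected: command attribute value [connector attribute value ...]")
    command, rest = tokens[0], tokens[1:]
    clauses = []
    connector = None
    i = 0
    while i < len(rest):
        attribute = rest[i]
        i += 1
        # A value may span several words; it ends at the next connector.
        values = []
        while i < len(rest) and rest[i].lower() not in CONNECTORS:
            values.append(rest[i])
            i += 1
        if not values:
            raise ValueError(f"attribute {attribute!r} has no value")
        clauses.append(Clause(connector, attribute, " ".join(values)))
        if i < len(rest):
            connector = rest[i].lower()
            i += 1
            if i >= len(rest):
                raise ValueError("query ends with a dangling connector")
    return command, clauses


command, clauses = parse_command("find pa shneiderman and tw user#")
print(command)
for clause in clauses:
    print(clause)
```

The parser accepts any attribute code, which mirrors the usability problem on the slide: nothing in the syntax tells the user which commands, attributes, or values the system actually allows.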
[Figure: Form-Based Query Specification (AltaVista)]
[Figure: Form-Based Query Specification (Melvyl)]
[Figure: Form-Based Query Specification (Infoseek)]
[Figure: Direct Manipulation Specification: VQUERY (Jones 98)]
[Figure: Menu-Based Query Specification (Young & Shneiderman 93)]
IR+HCI Question 3: How will a user scan, evaluate, and interpret the results?
Display of Retrieval Results
Goal: minimize time/effort for deciding which documents to examine in detail
Idea: show the roles of the query terms in the retrieved documents, making use of document structure
Putting Results in Context
• Interfaces should
  • give hints about the roles terms play in the collection
  • give hints about what will happen if various terms are combined
  • show explicitly why documents are retrieved in response to the query
  • summarize compactly the subset of interest
Putting Results in Context
• Visualizations of Query Term Distribution
  • KWIC, TileBars, SeeSoft
• Visualizing Shared Subsets of Query Terms
  • InfoCrystal, VIBE, Lattice Views
• Table of Contents as Context
  • Superbook, Cha-Cha, DynaCat
• Organizing Results with Tables
  • Envision, SenseMaker
• Using Hyperlinks
  • WebCutter
KWIC (Keyword in Context)
• An old standard, ignored by internet search engines
• Used in some intranet engines, e.g., Cha-Cha
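A keyword-in-context display is simple to produce. The sketch below is one possible rendering under assumptions: the function name `kwic`, the `width` parameter (words of context on each side), and the bracket markup are all illustrative choices.

```python
def kwic(text, keyword, width=3):
    """Return one line per occurrence of keyword, with `width` words of context."""
    words = text.split()
    target = keyword.lower()
    lines = []
    for i, word in enumerate(words):
        # Compare case-insensitively, ignoring surrounding punctuation.
        if word.lower().strip(".,;:!?\"'()") == target:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


sample = ("The user tells the computer what they want. "
          "The computer communicates results.")
for line in kwic(sample, "computer", width=2):
    print(line)
# prints:
# tells the [computer] what they
# want. The [computer] communicates results.
```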
TileBars
• Graphical Representation of Term Distribution and Overlap
• Simultaneously Indicate:
  • relative document length
  • query term frequencies
  • query term distributions
  • query term overlap
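The data behind a TileBar is a small grid: one row per query term, one column per document segment, each cell holding the term's frequency in that segment. The sketch below assumes fixed-size word windows as segments, which is a simplification chosen for illustration; `tilebar`, `overlap_columns`, and `render` are hypothetical names.

```python
def tilebar(text, query_terms, segment_size=5):
    """Count each query term per segment; the number of columns reflects document length."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    segments = [words[i:i + segment_size]
                for i in range(0, len(words), segment_size)]
    return {term: [segment.count(term.lower()) for segment in segments]
            for term in query_terms}


def overlap_columns(grid):
    """Indices of the segments in which every query term occurs."""
    rows = list(grid.values())
    if not rows:
        return []
    return [i for i in range(len(rows[0])) if all(row[i] > 0 for row in rows)]


def render(grid):
    """Text rendering: darker characters stand for higher term frequency."""
    shades = " .:#"
    lines = []
    for term, counts in grid.items():
        cells = "".join(shades[min(c, len(shades) - 1)] for c in counts)
        lines.append(f"{term:<12}|{cells}|")
    return "\n".join(lines)


text = ("dbms reliability matters. a dbms logs writes. banks care about "
        "money. reliability of dbms counts")
grid = tilebar(text, ["DBMS", "reliability"])
print(grid)                   # {'DBMS': [2, 0, 1], 'reliability': [1, 0, 1]}
print(overlap_columns(grid))  # [0, 2]
print(render(grid))
```

The grid covers all four properties on the slide: row length shows relative document length, cell values show frequency, the pattern along a row shows distribution, and columns filled in every row show overlap.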
TileBars Example
Query terms: DBMS (Database Systems), Reliability
What roles do they play in retrieved documents?
• Mainly about both DBMS & reliability
• Mainly about DBMS, discusses reliability
• Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability
• Mainly about high-tech layoffs
[Figure: SeeSoft: Showing Text Content using a linear representation and brushing and linking (Eick & Wills 95)]
[Figure: David Small: Virtual Shakespeare]