This document discusses various interfaces for retrieving information, focusing on selecting collections, submitting requests, and balancing expressiveness with usability. It compares command line, graphical, and natural language interfaces while exploring methodologies for evaluating retrieval results. Strategies for understanding documents through metadata, document structure, and thumbnails are examined, alongside methods like KWIC and TileBars for visualizing document-query relations. Additionally, it highlights systems like DynaCat and Envision for categorizing and presenting results effectively to users.
Information Retrieval Activities • Selecting a collection • Talked about last class • Lists, overviews, wizards, automatic selection • Submitting a request • Balancing expressiveness and usability • Command line, graphical, and NL interfaces • Examining the response • Comprehension • Contextual displays
Evaluating Retrieval Results • Selecting among returned documents • Requires partial understanding of documents without looking at whole document • To provide understanding of documents: • Show relations to query terms • Show in collection overviews • Provide descriptive metadata • Indicate document structure • Indicate the hyperlink structure • Indicate relations between returned documents
Document Surrogates • Returned documents are represented by partial information about each document • Important metadata (title, date, source) • Selected chunks of the document • Thumbnail images of documents • Some systems provide both short and long document surrogates. • Normally, clicking on a surrogate causes the full document to be displayed.
Document Relation to Query • Simple ways to indicate relation: • Select snippet with query terms • Highlight query terms in document display (thumbnail or whole) • Scroll to first occurrence of query term
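The highlighting and scroll-to-first-occurrence ideas on this slide can be sketched in a few lines. This is a minimal illustration only: the function names `highlight_terms` and `first_occurrence`, the bracket markers, and the whole-word, case-insensitive matching are assumptions made for the example, not something the slides specify.

```python
import re


def _term_pattern(terms):
    # Whole-word, case-insensitive match for any query term (an assumed policy).
    alternatives = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


def highlight_terms(text, terms, open_mark="[", close_mark="]"):
    """Wrap each occurrence of a query term in marker strings."""
    if not terms:
        return text
    return _term_pattern(terms).sub(
        lambda m: open_mark + m.group(0) + close_mark, text
    )


def first_occurrence(text, terms):
    """Return the character offset to scroll to, or -1 if no term occurs."""
    if not terms:
        return -1
    match = _term_pattern(terms).search(text)
    return match.start() if match else -1


if __name__ == "__main__":
    doc = "Cats chase a cat toy"
    print(highlight_terms(doc, ["cat"]))
    print(first_occurrence(doc, ["cat"]))
```

A real display would typically match stemmed or expanded forms of the query terms as well; the exact-word policy here is only the simplest choice.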
Keyword in Context (KWIC) • KWIC document surrogates • Phrases and sentences containing query terms are extracted • These snippets are presented along with metadata • Design issues • Deciding how many and which occurrences of keywords to show • Use query term weights, if any • Evidence suggests selecting the text segments that contain the largest number of query terms and appear near the beginning of the document
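One way to act on the design issues above is to score each candidate window by how many distinct query terms it contains and break ties in favor of earlier positions. The sketch below assumes a hypothetical function `kwic_snippets`, word-level windows, and no term weights; it illustrates the selection heuristic and is not the method of any particular system.

```python
import re


def kwic_snippets(text, terms, window=4, max_snippets=2):
    """Pick non-overlapping word windows around query-term hits.

    Windows with more distinct query terms rank first; ties go to the
    window that starts earlier in the document. Punctuation is dropped
    because the text is split into word tokens.
    """
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    wanted = {t.lower() for t in terms}

    candidates = []
    for i, tok in enumerate(lowered):
        if tok in wanted:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            distinct_hits = len(wanted & set(lowered[start:end]))
            candidates.append((-distinct_hits, start, end))
    candidates.sort()  # most hits first, then earliest start

    chosen = []
    for _, start, end in candidates:
        overlaps = any(start < c_end and end > c_start for c_start, c_end in chosen)
        if not overlaps:
            chosen.append((start, end))
        if len(chosen) == max_snippets:
            break
    return [" ".join(tokens[s:e]) for s, e in chosen]


if __name__ == "__main__":
    sample = "alpha beta gamma delta epsilon zeta eta theta iota kappa beta gamma"
    for snippet in kwic_snippets(sample, ["beta", "gamma"], window=1):
        print("...", snippet, "...")
```

If the retrieval system supplies query term weights, the count of distinct hits could be replaced by a weighted sum.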
TileBars • TileBars is a compact visualization of documents’ relation to query terms. • Document surrogate is a rectangular bar divided into a matrix/table • Rows correspond to query facets • Columns are sections of document • Darkness in each row/column position indicates the occurrence of that facet in that portion of the document.
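The facet-by-section matrix behind a TileBar can be sketched as follows. The assumptions here are loud ones: the document is cut into equal-length word sections (the slide only says "sections of document"), each facet is a plain list of words, and darkness is approximated with text characters. The names `tilebar_matrix` and `render_tilebar` are hypothetical.

```python
import re


def tilebar_matrix(text, facets, n_sections=4):
    """Count facet-term occurrences per section.

    Returns one row per facet and one column per section.
    """
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    size = max(1, -(-len(tokens) // n_sections))  # ceiling division
    sections = [tokens[i * size:(i + 1) * size] for i in range(n_sections)]

    matrix = []
    for facet in facets:
        wanted = {w.lower() for w in facet}
        matrix.append([sum(tok in wanted for tok in sec) for sec in sections])
    return matrix


def render_tilebar(matrix, shades=" .:#"):
    """Map counts to characters; larger counts get darker characters."""
    darkest = len(shades) - 1
    return "\n".join(
        "".join(shades[min(count, darkest)] for count in row) for row in matrix
    )


if __name__ == "__main__":
    doc = "cat cat dog fish bird cat dog dog"
    m = tilebar_matrix(doc, [["cat"], ["dog", "fish"]], n_sections=4)
    print(m)
    print(render_tilebar(m))
```

Reading down a column shows whether several facets co-occur in the same part of the document, which is the main thing the bar is meant to reveal.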
SeeSoft • Visualization where each line of document is visualized as line in graphical column • Color indicates characteristics of the line. • Originally developed to help understand program code • Applied to document analysis and text retrieval
Relative Query Term Relations • The preceding systems present individual documents and their relation to query terms • To present a larger number of results • Visually represent sets of documents • Indicate each set's relations to query terms • Examples • InfoCrystal • VIBE
SuperBook • Uses a table of contents to indicate where query terms appear • Requires document structure
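The table-of-contents idea can be illustrated by annotating each heading with the number of query-term hits in its section. This is a rough sketch under stated assumptions: the document is already available as a list of (heading, body) pairs, and the function name `toc_with_hits` is made up for the example. It does not reproduce any system's actual interface.

```python
import re


def toc_with_hits(sections, terms):
    """Return (heading, hit_count) pairs in table-of-contents order."""
    wanted = {t.lower() for t in terms}
    annotated = []
    for heading, body in sections:
        hits = sum(tok.lower() in wanted for tok in re.findall(r"\w+", body))
        annotated.append((heading, hits))
    return annotated


if __name__ == "__main__":
    book = [
        ("1 Introduction", "Search interfaces help users find documents."),
        ("2 Queries", "A query is a set of terms. Query terms may be weighted."),
        ("3 Results", "Results are ranked."),
    ]
    for heading, hits in toc_with_hits(book, ["query", "terms"]):
        print(f"{hits:3d}  {heading}")
```

The dependence on document structure noted on the slide shows up directly: without headings and section boundaries there is nothing to attach the counts to.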
Categories for Retrieval Results • Present results in groups based on some categorization • Categories can be based on metadata • Categories can be inferred • Categories can be chosen based on query type (DynaCat)
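The simplest case on this slide, categories based on metadata, amounts to grouping the result list by one metadata field. The sketch below assumes each result is a dictionary with a `title` key plus arbitrary metadata; the field names and the function `group_by_field` are hypothetical. Inferred categories (clustering) and query-type categories as in DynaCat need more machinery and are not shown.

```python
from collections import defaultdict


def group_by_field(results, field, missing="(uncategorized)"):
    """Group result titles by the value of one metadata field."""
    groups = defaultdict(list)
    for doc in results:
        groups[doc.get(field, missing)].append(doc["title"])
    return dict(groups)


if __name__ == "__main__":
    hits = [
        {"title": "Intro to IR", "source": "textbook"},
        {"title": "TileBars demo", "source": "paper"},
        {"title": "Indexing basics", "source": "textbook"},
        {"title": "Untitled notes"},
    ]
    for category, titles in group_by_field(hits, "source").items():
        print(category)
        for title in titles:
            print("   ", title)
```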
Hyperlinks for Retrieval Results • Present navigational links between retrieved documents • Relies on links between documents • Most often used for searching a single web site (or similar repository) • Examples • Cha-Cha • Mapuccino
Table Views for Retrieval Results • Category and link views present only one type of interdocument relation • Documents have many different potential relations • Tabular views can provide an overview of a set of relations • Each row is a document • Each column is an attribute (metadata field or other) • Content of table indicates values and relations between values • Examples • Envision • TableLens
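A tabular view in the sense described above (one row per document, one column per attribute) can be sketched as plain text. Everything specific here is an assumption for illustration: the function name `table_view`, the dictionary-based documents, and sorting by the string form of a column's value. It is not how Envision or TableLens are implemented.

```python
def table_view(docs, columns, sort_by=None):
    """Render documents as a text table, optionally sorted by one column."""
    if sort_by:
        # Sorting on the string form keeps mixed or missing values comparable.
        rows = sorted(docs, key=lambda d: str(d.get(sort_by, "")))
    else:
        rows = list(docs)
    cells = [[str(d.get(c, "")) for c in columns] for d in rows]

    # Each column is as wide as its header or its longest value.
    widths = [
        max([len(c)] + [len(r[i]) for r in cells]) for i, c in enumerate(columns)
    ]
    header = " | ".join(c.ljust(w) for c, w in zip(columns, widths))
    rule = "-+-".join("-" * w for w in widths)
    body = [" | ".join(v.ljust(w) for v, w in zip(r, widths)) for r in cells]
    return "\n".join([header, rule] + body)


if __name__ == "__main__":
    results = [
        {"title": "B doc", "year": 2001},
        {"title": "A doc", "year": 1999},
    ]
    print(table_view(results, ["title", "year"], sort_by="year"))
```

Re-sorting by a different column is what lets a reader compare several interdocument relations in turn, rather than being fixed to a single category or link view.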
Summary • Users must partly understand retrieval results to select which to view. • Techniques: • Highlighting and scrolling indications of relations to search terms (snippets, Google cache, Popout Prism) • Set-based views in relation to search terms (InfoCrystal, VIBE) • Visualization of search terms in sections (TileBars, SeeSoft, SuperBook) • Categorization of results (DynaCat, clusty.com) • Hyperlinks between results (Cha-Cha, Mapuccino) • Table views of results (Envision, TableLens)