
Slides

Interactions. LBSC 796/CMSC 838o. Daqing He, Douglas W. Oard. Session 5, March 8, 2004. Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt or www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf.





Presentation Transcript


  1. Slides • Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf

  2. Interactions LBSC 796/CMSC 838o Daqing He, Douglas W. Oard Session 5, March 8, 2004

  3. Slides • Please download the slides from www.umiacs.umd.edu/~daqingd/lbsc796-w5.ppt www.umiacs.umd.edu/~daqingd/lbsc796-w5.rtf

  4. Agenda • Interactions in retrieval systems • Query formulation • Selection • Examination • Document delivery

  5. System-Oriented Retrieval Model [Diagram: Acquisition → Collection → Indexing → Index; Query → Search (over the Index) → Ranked List]

  6. Whose Process Is It? • Who initiates a search process? • Who controls the progress? • Who ends a search process?

  7. User-Oriented Retrieval Model [Diagram: User → Source Selection → Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination → Document Delivery; the IR system side remains Acquisition → Collection → Indexing → Index]

  8. Taylor’s Conceptual Framework • Four levels of “information needs” • Visceral • What you really want to know • Conscious • What you recognize that you want to know • Formalized (e.g., TREC topics) • How you articulate what you want to know • Compromised (e.g., TREC queries) • How you express what you want to know to a system [Taylor 68]

  9. Belkin’s ASK model • Users are concerned with a problem • But do not clearly understand • the problem itself • the information need to solve the problem  Anomalous State of Knowledge • Need clarification process to form a query [Belkin 80, Belkin, Oddy, Brooks 82]

  10. What are humans good at? • Sense low-level stimuli • Recognize patterns • Reason inductively • Communicate through multiple channels • Apply multiple strategies • Adapt to changes or unexpected events From Ben Shneiderman’s “Designing the User Interface”

  11. What are computers good at? • Sense stimuli outside the human range • Calculate quickly and mechanically • Store large quantities and recall accurately • Respond rapidly and consistently • Perform repetitive actions reliably • Maintain performance under heavy load and over extended time From Ben Shneiderman’s “Designing the User Interface”

  12. What should Interaction be? Synergistic • Humans do the things that humans are good at • Computers do the things that computers are good at • The strength of one covers the weakness of the other

  13. Source Selection • People have their own preferences • Different tasks require different sources • Possible choices: ask for help from people or machines; browsing, search, or a combination; general-purpose vs. domain-specific IR systems; different collections

  14. Query Formulation [Diagram: User → Query Formulation → Query → Search, over Collection → Indexing → Index]

  15. User’s Goals • User’s goals • Identify the right query for the current need • conscious/formalized need => compromised need • How can the user achieve this goal? • Infer the right query terms • Infer the right composition of terms

  16. System’s Goals • Help the user • build links between needs • know more about the system and the collection

  17. How does System Achieve Its Goals? • Ask more from the user • Encourage long/complex queries • Provide a large text entry area • Use forms filling or direct manipulation • Initiate interactions • Ask questions related to the needs • Engage a dialogue with the user • Infer from relevant items • Infer from previous queries • Infer from previous retrieved documents

  18. Query Formulation Interaction Styles • Shneiderman 97 • Command Language • Form Fillin • Menu Selection • Direct Manipulation • Natural Language Credit: Marti Hearst

  19. Form-Based Query Specification (Melvyl) Credit: Marti Hearst

  20. Form-based Query Specification (Infoseek) Credit: Marti Hearst

  21. Direct Manipulation Spec.VQUERY (Jones 98) Credit: Marti Hearst

  22. High-Accuracy Retrieval of Documents (HARD) [Diagram: Topic Statement → Search Engine → Baseline Results and Clarification Questions; Answers to Clarification Questions → Search Engine → HARD Results]

  23. UMD HARD 2003 retrieval model [Diagram: clarification questions elicit preferences among subtopic areas (→ query expansion), recently viewed relevant documents (→ document reranking), preferences for sub-collections or genres, and desired result formats (→ passage retrieval); ranked list merging yields the refined ranked list] [He & Demner, 2003]

  24. Dialogues in Need Negotiation [Diagram: Information Need → 1. Formulate a Query → 2. Need negotiation → 3. Find Documents Matching the Query (Search Engine over the Document Collection) → Search Results]

  25. Personalization through User’s Search Contexts [Diagram: queries such as “Casablanca” and “African Queen” feed an Incremental Learner that builds a “Romantic Films” context for the context-aware Information Retrieval System] [Goker & He, 2000]

  26. Things That Hurt • Obscure ranking methods • Unpredictable effects of adding or deleting terms • Only single-term queries avoid this problem • Counterintuitive statistics • “clis”: AltaVista says 3,882 docs match the query • “clis library”: 27,025 docs match the query! • Every document with either term was counted
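The counterintuitive counts on the slide follow from OR-semantics hit counting: if the engine counts every document containing *any* query term, adding a term can only raise the reported number. A minimal sketch; the tiny document set and the `or_count`/`and_count` helpers are hypothetical, purely to illustrate the behavior:

```python
# Each "document" is the set of terms it contains.
docs = {
    1: {"clis"},
    2: {"clis", "library"},
    3: {"library"},
    4: {"archives"},
}

def or_count(query_terms):
    """Count documents containing at least one query term (OR semantics)."""
    return sum(1 for terms in docs.values() if terms & query_terms)

def and_count(query_terms):
    """Count documents containing every query term (AND semantics)."""
    return sum(1 for terms in docs.values() if query_terms <= terms)

print(or_count({"clis"}))              # 2
print(or_count({"clis", "library"}))   # 3 -- the count grew after adding a term
print(and_count({"clis", "library"}))  # 1 -- a conjunctive count would shrink
```

Under AND semantics the count can only shrink as terms are added, which matches most users' intuition; the slide's AltaVista numbers show the OR behavior instead.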

  27. Browsing the Retrieved Set [Diagram: User → Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination, with loops back for Query Reformulation and Document Reselection]

  28. Indicative vs. Informative • Terms often applied to document abstracts • Indicative abstracts support selection • They describe the contents of a document • Informative abstracts support understanding • They summarize the contents of a document • Applies to any information presentation • Presented for indicative or informative purposes

  29. User’s Browsing Goals • Identify documents for some form of delivery • An indicative purpose • Query Enrichment • Relevance feedback (indicative) • User designates “more like this” documents • System adds terms from those documents to the query • Manual reformulation (informative) • Better approximation of visceral information need
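The indicative "more like this" feedback above can be sketched as a simple term-addition step; the `expand_query` helper and its inputs are hypothetical, and a full Rocchio-style scheme would also reweight terms rather than just append them:

```python
from collections import Counter

def expand_query(query_terms, feedback_docs, top_k=3):
    """Append the most frequent new terms from user-designated
    "more like this" documents to the query."""
    counts = Counter()
    for doc in feedback_docs:
        # Count terms not already in the query.
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(top_k)]

q = expand_query(
    {"retrieval"},
    ["interactive retrieval systems", "retrieval systems and interfaces"],
)
print(q)  # ['retrieval', 'systems', 'interactive', 'and']
```

A real system would filter stopwords such as "and" before expansion; that step is omitted here to keep the sketch short.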

  30. System’s Goals • Assist the user to • Identify relevant documents • Identify potential useful terms • for clarifying the right information need • for generating better queries

  31. Browsing the Retrieved Set [Diagram: User → Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination, with loops back for Query Reformulation and Document Reselection]

  32. A Selection Interface Taxonomy • One dimensional lists • Content: title, source, date, summary, ratings, ... • Order: retrieval status value, date, alphabetic, ... • Size: scrolling, specified number, RSV threshold • Two dimensional displays • Construction: clustering, starfields, projection • Navigation: jump, pan, zoom • Three dimensional displays • Contour maps, fishtank VR, immersive VR

  33. Extraction-Based Summarization • Robust technique for making disfluent summaries • Four broad types: • Single-document vs. multi-document • Term-oriented vs. sentence-oriented • Combination of evidence for selection: • Salience: similarity to the query • Selectivity: IDF or chi-squared • Emphasis: title, first sentence • For multi-document, suppress duplication
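The combination of evidence above (salience, selectivity, emphasis) can be sketched as a sentence scorer; the weights and the `extractive_summary` helper are assumptions for illustration, not any particular published system:

```python
import math
import re
from collections import Counter

def extractive_summary(text, query, n=2):
    """Rank sentences by query overlap (salience), rarity of their
    terms (selectivity), and a first-sentence bonus (emphasis);
    return the top-n sentences in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    toks = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    df = Counter(t for ts in toks for t in ts)  # sentence frequency per term

    qset = set(re.findall(r"\w+", query.lower()))

    def score(i):
        salience = len(toks[i] & qset)                 # similarity to the query
        selectivity = sum(math.log(len(sentences) / df[t]) for t in toks[i])
        emphasis = 1.0 if i == 0 else 0.0              # first-sentence bonus
        return 2.0 * salience + 0.1 * selectivity + emphasis

    best = sorted(range(len(sentences)), key=score, reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

Because only existing sentences are copied out, the result is exactly the kind of robust-but-disfluent summary the slide describes.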

  34. Generated Summaries • Fluent summaries for a specific domain • Define a knowledge structure for the domain • Frames are commonly used • Analysis: process documents to fill the structure • Studied separately as “information extraction” • Compression: select which facts to retain • Generation: create fluent summaries • Templates for initial candidates • Use language model to select an alternative

  35. Google’s KWIC Summary • For Query “University of Maryland College Park”
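A KWIC (keyword-in-context) summary of the kind shown on the slide extracts a window of words around a query-term hit. A minimal sketch; the `kwic_snippet` helper and the window size are assumptions:

```python
import re

def kwic_snippet(text, query_terms, window=4):
    """Return a keyword-in-context snippet: the words around the
    first occurrence of any query term, with the hit uppercased."""
    words = text.split()
    for i, w in enumerate(words):
        if re.sub(r"\W", "", w.lower()) in query_terms:
            left = max(0, i - window)
            ctx = words[left:i] + [w.upper()] + words[i + 1:i + 1 + window]
            return "... " + " ".join(ctx) + " ..."
    return ""
```

Production engines merge several such windows and bold rather than uppercase the hits, but the windowing idea is the same.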

  36. Teoma’s Query Refine Suggestions url: www.teoma.com

  37. Vivisimo’s Clustering Results url: vivisimo.com

  38. Kartoo’s Cluster Visualization url: kartoo.com

  39. Cluster Formation • Based on inter-document similarity • Computed using the cosine measure, for example • Heuristic methods can be fairly efficient • Pick any document as the first cluster “seed” • Add the most similar document to each cluster • Adding the same document will join two clusters • Check to see if each cluster should be split • Does it contain two or more fairly coherent groups? • Lots of variations on this have been tried
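One of the heuristic variants the slide alludes to is single-pass clustering over cosine similarity: each document joins the most similar existing cluster seed or starts a new one. A sketch, assuming term-frequency dictionaries as document vectors and a hypothetical similarity threshold:

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency dicts."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass_cluster(docs, threshold=0.3):
    """Assign each document to the most similar cluster seed,
    or start a new cluster if nothing is similar enough."""
    clusters = []  # list of (seed_vector, [member indices])
    for i, d in enumerate(docs):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(d, c[0])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append((d, [i]))
        else:
            best[1].append(i)
    return [members for _, members in clusters]
```

This variant never splits or merges clusters after the fact; the checks on the slide (joining clusters that share a document, splitting incoherent clusters) would be additional passes.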

  40. Starfield

  41. Dynamic Queries: • IVEE/Spotfire/Filmfinder (Ahlberg & Shneiderman 93)

  42. Constructing Starfield Displays • Two attributes determine the position • Can be dynamically selected from a list • Numeric position attributes work best • Date, length, rating, … • Other attributes can affect the display • Displayed as color, size, shape, orientation, … • Each point can represent a cluster • Interactively specified using “dynamic queries”

  43. Projection • Depict many numeric attributes in 2 dimensions • While preserving important spatial relationships • Typically based on the vector space model • Which has about 100,000 numeric attributes! • Approximates multidimensional scaling • Heuristic approaches are reasonably fast • Often visualized as a starfield • But the dimensions lack any particular meaning
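A fast heuristic of the kind the slide mentions is a random linear projection from the term space down to two dimensions; it roughly preserves relative distances, and, as the slide notes, the two output axes carry no intrinsic meaning. The `project_2d` helper is a hypothetical sketch, not a full multidimensional-scaling implementation:

```python
import random

def project_2d(vectors, seed=0):
    """Project sparse high-dimensional term vectors (dicts) to 2-D
    using a fixed random Gaussian linear map."""
    rng = random.Random(seed)
    terms = sorted({t for v in vectors for t in v})
    # One random axis per output dimension.
    axes = [{t: rng.gauss(0, 1) for t in terms} for _ in range(2)]
    return [
        tuple(sum(v[t] * axis[t] for t in v) for axis in axes)
        for v in vectors
    ]
```

Identical documents land on identical points, and nearby documents tend to stay nearby, which is all a starfield display of the projection needs.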

  44. Contour Map Displays • Display cluster density as terrain elevation • Fit a smooth opaque surface to the data • Visualize in three dimensions • Project to 2-D and allow manipulation • Use stereo glasses to create a virtual “fishtank” • Create an immersive virtual reality experience • Head-mounted stereo monitors and head tracking • “Cave” with wall projection and body tracking

  45. ThemeView Credit to: Pacific Northwest National Laboratory

  46. Browsing the Retrieved Set [Diagram: User → Query Formulation → Query → Search → Ranked List → Document Selection → Document → Document Examination, with loops back for Query Reformulation and Document Reselection]

  47. Full-Text Examination Interfaces • Most use scroll and/or jump navigation • Some experiments with zooming • Long documents need special features • “Best passage” function helps users get started • Overlapping 300 word passages work well • “Next search term” function facilitates browsing • Integrated functions for relevance feedback • Passage selection, query term weighting, …
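The "best passage" function above can be sketched by sliding overlapping fixed-size word windows over the document and keeping the window with the most query-term hits; the `best_passage` helper and the scoring are assumptions, with the 300-word window taken from the slide:

```python
def best_passage(text, query_terms, size=300, step=150):
    """Scan overlapping word windows of `size` words (stepping by
    `step`, so consecutive windows overlap) and return the window
    containing the most query-term occurrences."""
    words = text.split()
    best, best_hits = "", -1
    for start in range(0, max(1, len(words) - size + step), step):
        window = words[start:start + size]
        hits = sum(1 for w in window if w.lower().strip(".,") in query_terms)
        if hits > best_hits:
            best, best_hits = " ".join(window), hits
    return best
```

Jumping the viewer to this passage gives users a starting point inside a long document, after which "next search term" navigation takes over.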

  48. A Long Document

  49. Document lens Robertson & Mackinlay, UIST'93, Atlanta, 1993

  50. TileBar [Hearst et al 95]
