
2ID10: Information Retrieval Lecture 2: IR Evaluation & Queries


Presentation Transcript


  1. 2ID10: Information Retrieval, Lecture 2: IR Evaluation & Queries Lora Aroyo, 4 April 2006

  2. Lecture 1 Summary [Diagram: the user's Information Need is formulated as a Query; the IR System compares the information need with the information and generates a ranking which reflects relevance, returning a Ranked list of documents; the user provides feedback to the system.] Lecture 2: Query Languages & Operations

  3. Lecture 1: Summary • IR Classic Models • Document Representation • Query representation • Indexing • Weighting & Similarity • TF-IDF Lecture 2: Query Languages & Operations

  4. Lecture 2: Overview • Types of evaluation • Relevance and test collections • Effectiveness measures • Recall and Precision • Significance tests • Query languages Lecture 2: Query Languages & Operations

  5. Types of IR Evaluation • Assistance in formulating queries • Speed of retrieval • Resources required • Presentation of documents • Ability to find relevant documents • Appealing to users (market evaluation) • Evaluation generally comparative • System A vs. B or A vs A’ • Cost-benefit analysis possible • Most common evaluation: retrieval effectiveness Lecture 2: Query Languages & Operations

  6. IR Evaluation • Functional analysis • Test each system functionality (includes error analysis) • Performance analysis • Response Time & Space Required (balance/tradeoffs) • shorter response time + smaller space used → better system • Performance evaluation • Performance of indexing structures, OS interactions, delays • Retrieval performance evaluation • How precise is the answer set • For a given retrieval strategy S: similarity between retrieved docs & expert-judged docs → goodness of S Lecture 2: Query Languages & Operations

  7. IR Evaluation • Effectiveness • the ability of an IR system to retrieve relevant documents and suppress non-relevant documents • related to the relevancy of retrieved items • Relevancy • typically not binary • Subjective: Depends upon a specific user's judgment • Situational: Relates to the user's current needs • Cognitive: Depends on human perception and behavior • Dynamic: Changes over time Lecture 2: Query Languages & Operations

  8. Relevancy • Relevant (not relevant) according to User • Relevant (not relevant) according to System • Four main situations: • User – Relevant & System – Not Relevant • User – Not Relevant & System – Relevant • User – Not Relevant & System – Not Relevant • User – Relevant & System – Relevant Lecture 2: Query Languages & Operations

  9. Relevancy Aspects • Logical relevancy • “Bosch” (trade mark) vs. “Den Bosch” • Usability • Date and origin of the document • Format of the document • Other users Lecture 2: Query Languages & Operations

  10. Test collection • Real collections • never know full set of relevant documents • Compare retrieval performance with a Test collection • set of documents • set of queries • set of relevance judgments (which docs relevant to each query) Lecture 2: Query Languages & Operations

  11. Test Collections • To compare the performance of two techniques • each technique used to evaluate test queries • results (set or ranked list) compared using some performance measure • most common measures - precision and recall • Usually - use multiple measures to get different views of performance • Usually - test with multiple collections - performance is collection dependent Lecture 2: Query Languages & Operations

  12. Sample Test Collection [table shown as an image, not preserved in the transcript] Lecture 2: Query Languages & Operations

  13. Test collection creation • Manual method • Every document judged against every query by experts • Pooling method • Queries run against several IR systems first • Results pooled, top proportion chosen for judging • Only top documents are judged Lecture 2: Query Languages & Operations
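
In code, pooling is just a union of top-k prefixes over several ranked runs; a minimal Python sketch (run data hypothetical):

```python
def build_pool(runs, depth=100):
    """Merge the top-`depth` documents from several ranked runs
    into a single pool of documents to be judged by assessors."""
    pool = set()
    for ranked_docs in runs:          # one ranked list per IR system
        pool.update(ranked_docs[:depth])
    return pool

# Only documents in the pool are judged; everything else is
# assumed non-relevant for this query.
pool = build_pool([["d3", "d1", "d7"], ["d1", "d9", "d2"]], depth=2)
print(sorted(pool))  # ['d1', 'd3', 'd9']
```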

  14. Text REtrieval Conference (TREC) • Established in 1992 to evaluate large-scale IR • Retrieving documents from a gigabyte collection • Run by NIST’s Information Access Division • Initially sponsored by DARPA as part of Tipster program • Now supported by many, including DARPA, ARDA, and NIST • Most well known IR evaluation setting • Proceedings available at http://trec.nist.gov Lecture 2: Query Languages & Operations

  15. Text REtrieval Conference (TREC) • Consists of IR research tracks • Ad-hoc retrieval, routing, cross-language, scanned documents, speech recognition, query, video, filtering, Spanish, question answering, novelty, Chinese, high precision, interactive, Web, database merging, NLP, … • Each track works on roughly the same model • NIST carries out evaluation • How well your site did • How others tackled the problem • Successful approaches generally adopted in next cycle Lecture 2: Query Languages & Operations

  16. Lecture 2: Overview • Types of evaluation • Relevance and test collections • Effectiveness measures • Recall and Precision • Significance tests • Query languages Lecture 2: Query Languages & Operations

  17. Precision & Recall The purpose of every IR system is to retrieve relevant information. [Venn diagram: the whole document collection, with the set of Relevant documents and the set of Retrieved documents overlapping; the overlap is the retrieved documents that are relevant.] • Precision: the ability of the search to retrieve top-ranked documents that are mostly relevant (retrieved documents that are relevant) • Recall: the ability of the search to find all of the relevant documents in the corpus (relevant documents that are retrieved) Lecture 2: Query Languages & Operations

  18. Query Match [Table: retrieved vs. not retrieved × relevant vs. irrelevant – retrieved & relevant, retrieved & irrelevant, not retrieved but relevant, not retrieved & irrelevant] • Match = a retrieved document satisfying (relevant to) the information need • character strings in descriptor and query keywords match • Miss = a not-retrieved document satisfying (relevant to) the information need • character strings in descriptor and query keywords do not match (though semantically similar) • False match = a retrieved document which satisfies the query but is not relevant to the information need • character strings in descriptor and query keywords match but are semantically different Lecture 2: Query Languages & Operations

  19. Retrieval Evaluation Setting • Q – query • R – set of relevant documents • |R| – number of relevant documents • S(Q) → A – answer set • |A| – number of documents in the answer set • Ra = R ∩ A – relevant documents in the answer set • |Ra| – number of docs in R ∩ A [Venn diagram: Answer Set |A| and Relevant Docs |R| overlapping in the Relevant Documents in Answer Set |Ra|] Lecture 2: Query Languages & Operations

  20. Precision • Fraction of the retrieved documents (A) which are relevant • high precision when there are relatively few False Matches • can be determined exactly • Precision = |Ra| / |A| = (relevant documents retrieved) / (all documents retrieved) = (System & User: Yes) / [(System & User: Yes) + (User: No & System: Yes)] Lecture 2: Query Languages & Operations

  21. Recall • Fraction of the relevant documents (R) which are retrieved • high recall when there are relatively few Misses • cannot be determined exactly – requires knowledge of all relevant documents in the collection • Recall = |Ra| / |R| = (relevant documents retrieved) / (all relevant documents) = (System & User: Yes) / [(System & User: Yes) + (User: Yes & System: No)] Lecture 2: Query Languages & Operations
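
Both measures reduce to set operations over the answer set and the judged-relevant set; a minimal Python sketch with hypothetical document IDs:

```python
retrieved = {"d1", "d2", "d3", "d4"}   # answer set A
relevant = {"d1", "d3", "d5", "d6"}    # relevant set R (expert judgments)

ra = retrieved & relevant              # Ra = R ∩ A
precision = len(ra) / len(retrieved)   # |Ra| / |A| = 2/4 = 0.5
recall = len(ra) / len(relevant)       # |Ra| / |R| = 2/4 = 0.5
print(precision, recall)
```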

  22. Determining Recall is Difficult • Total number of relevant items is sometimes not available: • Sample across the database and perform relevance judgment on these items • Apply different retrieval algorithms to the same database for the same query. The aggregate of relevant items is taken as the total relevant set Lecture 2: Query Languages & Operations

  23. Trade-off between Recall & Precision [Graph: Precision (0–1) vs. Recall (0–1); the ideal is the top-right corner. High precision / low recall returns relevant documents but misses many useful ones too; high recall / low precision returns most relevant docs but includes lots of junk.] We aim to obtain the highest value for both • an IR system trying to increase the number of relevant docs retrieved will also retrieve increasing numbers of non-relevant docs • efforts to increase one measure tend to decrease the other Lecture 2: Query Languages & Operations

  24. Computing Recall/Precision Points • For a given query • produce the ranked list of retrievals • Adjust a threshold on this ranked list • produces different sets of retrieved documents • and therefore different recall/precision measures • Mark each document in the ranked list that is relevant • Compute a recall/precision pair for each position • in the ranked list that contains a relevant document Lecture 2: Query Languages & Operations

  25. Computing Example Let total # of relevant docs = 6. Check each new recall point: R=1/6=0.167; P=1/1=1 R=2/6=0.333; P=2/2=1 R=3/6=0.5; P=3/4=0.75 R=4/6=0.667; P=4/6=0.667 R=5/6=0.833; P=5/13=0.38 One relevant document is missing, so we never reach 100% recall. Lecture 2: Query Languages & Operations
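
The same walk down the ranked list can be scripted; the relevance vector below is a reconstruction consistent with the slide's numbers (relevant documents at ranks 1, 2, 4, 6 and 13, with a sixth relevant document never retrieved):

```python
relevance = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1]  # 1 = relevant at that rank
total_relevant = 6

found = 0
for rank, rel in enumerate(relevance, start=1):
    if rel:
        found += 1
        recall = found / total_relevant
        precision = found / rank
        print(f"rank {rank}: R={recall:.3f}, P={precision:.3f}")
# The sixth relevant document is never retrieved, so recall never reaches 1.0.
```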

  26. Example http://www.googlewhack.com/ – find that elusive query (two words, no quote marks) with a single, solitary result! http://www.webology.ir/2005/v2n2/a12.html – a comparison of precision and recall in search engines Lecture 2: Query Languages & Operations

  27. Low Recall & Solutions • Words exist in several forms, e.g. limit, limits, limited, limitation • Stemming to increase recall • Suffix removal allows word variants to match • e.g. word roots often precede modifiers • Boolean systems often allow manual truncation • Stemming does automatic truncation Lecture 2: Query Languages & Operations
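
A crude suffix-stripping stemmer shows the idea; the suffix list here is purely illustrative (real systems use, e.g., the Porter stemmer):

```python
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "s")  # illustrative only

def crude_stem(word):
    """Strip the longest matching suffix so word variants conflate."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# 'limit', 'limits', 'limited', 'limitation' all conflate to 'limit'
print({crude_stem(w) for w in ["limit", "limits", "limited", "limitation"]})
```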

  28. Low Recall & Solutions • Synonymy • Many words with similar meanings: • Synonym(w1, w2) ↔ ∃m [w1 Means m ∧ w2 Means m] • Recall increased by: • Thesaurus-based query expansion • Latent semantic indexing • Polysemy • One word has dissimilar meanings • PolySem(w) ↔ ∃m1 ∃m2 [w Means m1 ∧ w Means m2 ∧ m1 ≠ m2] • Precision increased by word sense disambiguation • Indexing word meanings rather than words • Context provides clues to word meaning Lecture 2: Query Languages & Operations
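
Thesaurus-based query expansion can be sketched with a hand-built synonym table (the table and terms are hypothetical):

```python
THESAURUS = {  # hypothetical synonym sets
    "car": ["automobile", "vehicle"],
    "movie": ["film", "picture"],
}

def expand_query(terms):
    """Add thesaurus synonyms to the query to raise recall."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(THESAURUS.get(term, []))
    return expanded

print(expand_query(["car", "rental"]))
# ['car', 'rental', 'automobile', 'vehicle']
```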

  29. Query Languages (QL) • Which queries can be formulated • Dependent on the underlying IR model • Use: • content (semantics) • content structure (text syntax) • to find relevant documents • Query enhancement techniques • e.g. synonyms, thesauri, stemming, etc. • Query • Formulation of the user’s info need • Words or combination of words & operations Lecture 2: Query Languages & Operations

  30. Keyword-based Querying • Keywords: • contained in documents • Retrieval Unit: • retrieved document • contains the answer to the query • Intuitive • Easy to express • Allow for fast ranking • Basic queries (single & multiple words) Lecture 2: Query Languages & Operations

  31. Single-word queries • Text documents → search for the keywords • Set of docs, ranked according to the degree of similarity to the query • Ranking • word occurrences inside the text • term frequency – counts the number of times a word appears inside a document Lecture 2: Query Languages & Operations
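
A minimal term-frequency ranking for a single-word query over a toy collection (all names and documents hypothetical):

```python
docs = {  # toy collection
    "d1": "retrieval of information and retrieval of data",
    "d2": "information systems",
    "d3": "data structures and data compression and data mining",
}

def rank_by_tf(query_word):
    """Rank documents by how often the query word occurs in them."""
    scores = {d: text.split().count(query_word) for d, text in docs.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: scores[d], reverse=True)

print(rank_by_tf("data"))  # ['d3', 'd1']
```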

  32. Context queries • Complement single-word queries with a search for 'context' – words that are near other words • Phrase context query • sequence of single-word queries • Proximity context query • more relaxed version of the phrase query • sequence of single-word queries with a maximum allowed distance between them • distance measured in characters or words Lecture 2: Query Languages & Operations

  33. Examples: Context Queries • Phrase query: 'information retrieval' • Proximity queries (distance = number of intervening words): • 'information about retrieval' – distance 1 • 'information with respect to the retrieval' – distance 4 • Ranking similar to single-word queries Lecture 2: Query Languages & Operations
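
A proximity query can be checked by comparing word positions; a sketch that, like the examples above, counts intervening words as the distance:

```python
def proximity_match(text, w1, w2, max_distance):
    """True if w1 occurs before w2 with at most max_distance words between."""
    words = text.lower().split()
    pos1 = [i for i, w in enumerate(words) if w == w1]
    pos2 = [i for i, w in enumerate(words) if w == w2]
    return any(0 <= j - i - 1 <= max_distance
               for i in pos1 for j in pos2 if j > i)

# 'information about retrieval': 1 intervening word -> matches at distance 1
print(proximity_match("information about retrieval",
                      "information", "retrieval", 1))   # True
# 'information with respect to the retrieval': distance 4 -> no match at 1
print(proximity_match("information with respect to the retrieval",
                      "information", "retrieval", 1))   # False
```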

  34. Boolean Queries • Oldest form of keyword query • words + operators • atoms (basic queries) + Boolean operators • A or B, A and B, A not B • Query syntax tree [tree: AND at the root with children OR and 'white'; the OR node covers 'paper' and 'chocolate' – i.e. (paper OR chocolate) AND white] Lecture 2: Query Languages & Operations

  35. Boolean Query Mechanics • Basic Query: • Find X → return all documents containing term X • X = single words or phrases • simple text or string matching • Complex Query: • Boolean connectors and, or, not Lecture 2: Query Languages & Operations

  36. Boolean IR • Boolean operators approximate natural language • e.g. find documents about colour printers that are not made by Hewlett-Packard • AND can denote relationships between concepts • e.g. colour AND printer • OR can denote alternate terminology • e.g. colour AND (printer OR laser-printer) • NOT can exclude alternate meanings • e.g. colour AND (printer OR laser-printer) NOT (Hewlett-Packard OR HP) Lecture 2: Query Languages & Operations
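
With an inverted index mapping each term to its posting set, these connectives become plain set operations; a minimal sketch with hypothetical postings:

```python
index = {  # hypothetical postings: term -> set of doc ids containing it
    "colour": {1, 2, 3, 5},
    "printer": {2, 3, 4},
    "laser-printer": {5},
    "hp": {3},
}

def postings(term):
    return index.get(term, set())

# colour AND (printer OR laser-printer) NOT (hewlett-packard OR hp)
exclude = postings("hewlett-packard") | postings("hp")
result = (postings("colour")
          & (postings("printer") | postings("laser-printer"))) - exclude
print(result)  # {2, 5}
```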

  37. Google Search • Google basic search • http://www.google.com/help/basics.html • Google advanced search • http://www.google.com/help/refinesearch.html Lecture 2: Query Languages & Operations

  38. Natural Language Queries • Enumeration of words & context queries • All docs matching a portion of the query are retrieved • Higher ranking to all docs matching more parts of the query • Negation – user determines words to be eliminated → lower ranking • Threshold to cut off docs ranked too low • Boolean queries are a simplified version of NL queries • Vector of term weights (doc & query) Lecture 2: Query Languages & Operations
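
A sketch of this ranking scheme, counting matched query words, penalizing negated words, and applying a cutoff threshold (the exact scoring is an illustrative assumption):

```python
def nl_rank(query_words, negated_words, docs, threshold=1):
    """Score each document by query words matched minus negated words present."""
    results = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = len(words & set(query_words)) - len(words & set(negated_words))
        if score >= threshold:          # drop docs ranked too low
            results.append((score, doc_id))
    return [d for s, d in sorted(results, reverse=True)]

docs = {"d1": "query languages for retrieval",
        "d2": "boolean query syntax",
        "d3": "retrieval of images"}
print(nl_rank(["query", "retrieval"], ["boolean"], docs))  # ['d1', 'd3']
```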

  39. Natural Language Queries [Slides 39–41: examples shown as images, not preserved in the transcript] Lecture 2: Query Languages & Operations

  42. Pattern Matching • More specific query formulation • Based on the concept of a pattern • a set of syntactic features that occur in a text segment • segments that fulfil the pattern specification are pattern matches • Retrieve pieces of text that have some property • Useful for linguistics, text statistics, data extraction • Pattern types: • words, prefixes, suffixes, substrings, ranges, errors, regular expressions, extended patterns Lecture 2: Query Languages & Operations

  43. Examples: Pattern Matching • Words – a string as a sequence of chars • Prefixes – 'program' → programmer • Suffixes – 'er' → computer, monster, poster • Substrings – 'tal' → coastal, talk, metallic • 'any flow' → will match 'many flowers' • Ranges – a pair of strings which matches any word lying between them in lexicographical order – e.g. the range between the words held and hold will retrieve strings such as hoax, hissing, helm, help, etc. Lecture 2: Query Languages & Operations
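
These pattern types map directly onto basic string operations; a small sketch using the slide's examples:

```python
words = ["programmer", "computer", "coastal", "metallic", "hoax", "help"]

prefix_hits = [w for w in words if w.startswith("program")]   # prefixes
suffix_hits = [w for w in words if w.endswith("er")]          # suffixes
substr_hits = [w for w in words if "tal" in w]                # substrings
range_hits = [w for w in words if "held" <= w <= "hold"]      # lexicographic range

print(prefix_hits)  # ['programmer']
print(suffix_hits)  # ['programmer', 'computer']
print(substr_hits)  # ['coastal', 'metallic']
print(range_hits)   # ['hoax', 'help']
```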

  44. Examples Pattern Matching • Allowing errors • word together with an error threshold • retrieves all text words similar to a given word • errors are caused by typing, spelling, etc. • most accepted model is the Levenshtein distance or edit distance Lecture 2: Query Languages & Operations

  45. Examples: Pattern Matching • Regular expression – a general pattern built up from simple strings & operators (union '|', concatenation, repetition '*') • 'pro(blem|tein)(s|ε)(0|1|2)*' • will match words like: • problem02 • proteins • Extended patterns • a subset of the regular expressions • conditional expressions (a part of the pattern may not always appear) • wildcard characters matching any sequence in the text Lecture 2: Query Languages & Operations
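
The slide's expression translates into a standard regular expression, with ε (the empty string) becoming an optional group; a sketch:

```python
import re

# pro(blem|tein)(s|ε)(0|1|2)*  -- the (s|ε) alternative becomes 's?'
pattern = re.compile(r"pro(blem|tein)s?[012]*")

for word in ["problem02", "proteins", "protein121", "procedure"]:
    print(word, bool(pattern.fullmatch(word)))
# problem02 True / proteins True / protein121 True / procedure False
```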

  46. Example • Edit distance between: • COLOR and COLOUR is 1 • SURVEY and SURGERY is 2 • The query must specify the maximum number of allowed errors for a word to match the pattern Lecture 2: Query Languages & Operations
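
A standard dynamic-programming implementation of the Levenshtein distance reproduces these numbers:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(edit_distance("COLOR", "COLOUR"))    # 1
print(edit_distance("SURVEY", "SURGERY"))  # 2
```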

  47. Structural Queries • Based on structure of the text • Structure in text usually very restrictive • Languages to represent structured documents (HTML) • 3 structures • Fixed (form-like) • Hypertext • Hierarchical • Current query languages integrate both contents and structural queries Lecture 2: Query Languages & Operations

  48. Fixed Structure • Docs have fixed set of fields • Some fields are not present in all docs • No nesting or overlap between fields is allowed • Each model refers to a concrete structure of a collection Lecture 2: Query Languages & Operations

  49. Hypertext • Maximum freedom with respect to structuring power • Directed graph where the nodes hold some text and the links represent connections between nodes or positions inside nodes • User manually traverses the hypertext nodes following links to search • http://xanadu.com/zigzag/ Lecture 2: Query Languages & Operations

  50. Hierarchical Structure • Intermediate model • between the fixed structure and hypertext • Recursive decomposition of text • typical for many text collections • Simplification from hypertext to a hierarchy • allows for faster algorithms to solve queries • The more powerful the model, the less efficiently it can be implemented • Example: retrieve a figure on a page whose title is 'car' and whose introduction contains 'blue' [diagram: a page hierarchy with title, introduction, and sections containing figures] Lecture 2: Query Languages & Operations
