
Precision & Recall


teddy

Presentation Transcript


  1. Precision & Recall
     • Relevancy: the likelihood that a search result (i.e. “a hit”) meets the user’s expected information need; the result fulfills or partially satisfies the need, or answers the question at hand
     • Precision: the proportion of retrieved documents (or of a pre-defined subset of them) that are relevant, as opposed to irrelevant
     • Recall: the proportion of all relevant documents (or of a pre-defined subset of them) that are successfully retrieved
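The two definitions can be made concrete with a minimal sketch. The function name `precision_recall` and the document IDs are made up for illustration; returning 0.0 for an empty set is an assumption, not something the slides specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    # Assumption: report 0.0 rather than fail when a set is empty.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 documents retrieved, 5 relevant documents exist.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5", "d6", "d7"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Two of the four retrieved documents are relevant (precision 2/4), and two of the five relevant documents were found (recall 2/5).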

  2. Precision & Recall [Diagram labels: Retrieved; Relevant Documents]

  3. High Precision, Low Recall [Diagram labels: Retrieved; Relevant Documents; Retrieved but not relevant]

  4. Low Precision, High Recall [Diagram labels: Relevant but not retrieved; Retrieved; Relevant Documents]

  5. Precision & Recall
     • There is a trade-off between precision and recall: raising precision tends to lower recall, and raising recall tends to lower precision
     • Both rely on a good query
     • To get the best recall and precision, the user must have:
       • An understanding of the optimal search syntax accepted by the system
       • An understanding of what types of content are stored in the system
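One way to see the trade-off is to take a ranked result list and vary how many results are kept. This is a sketch under assumptions: the ranking, the relevance judgments, and the helper `precision_recall_at_k` are all invented for illustration.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top-k ranked results are kept."""
    if k <= 0:
        raise ValueError("k must be at least 1")
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k, hits / len(relevant)

# Hypothetical ranked output and relevance judgments.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d2", "d4", "d7"}

for k in (2, 4, 8):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
# k=2: precision=1.00 recall=0.50
# k=4: precision=0.75 recall=0.75
# k=8: precision=0.50 recall=1.00
```

In this toy data, keeping more results recovers more of the relevant documents but lets in more irrelevant ones. Real result lists do not always behave this smoothly.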

  6. Indexing
     • Manual: done by professional indexers with subject knowledge; intellectually labor-intensive; expensive
       • Experience with using thesauri
       • Understanding of the formatting styles to be extracted from and conformed to
     • Machine: done with computer algorithms; can process much more data but is prone to mistakes, which cause loss of recall/precision on the user’s end; little or no labor; relatively inexpensive

  7. Search Types
     • Thesaurus
       • Controlled vocabulary, uniform headings, etc.
     • Fields
       • Author, Title, Publication Year, etc.
     • Full-Text
       • Natural Language, Keyword (Google)
     • Cited Reference
     • Related (Similar)
       • Find documents related to or similar to a target document
     • A comprehensive search requires all of these

  8. Specialized Search Types
     • Image matching
       • Find documents with similar or matching images
     • Chemical Structure
       • Find documents with similar or matching chemical structures
     • LaTeX
       • Scientific & mathematical formulas
     • Video & Audio

  9. Search Modifiers & Operators
     • Boolean (AND, OR and NOT)
       • Combinations of search terms
       • rifle AND pistol; rifle OR pistol; rifle NOT pistol
       • Correct syntax is important
     • Nesting
       • Combining terms in one search field with terms in another search field
       • SU=(Biology OR Ecology) AND AU=White
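The Boolean operators map directly onto set operations over the documents that contain each term. The sketch below assumes a tiny made-up collection and a simple whitespace tokenizer; real databases parse queries and index text in their own ways, and the `SU`/`AU` records are hypothetical.

```python
from collections import defaultdict

# Made-up documents for illustration only.
docs = {
    1: "rifle and pistol safety course",
    2: "antique rifle collection",
    3: "pistol shooting competition",
    4: "archery equipment guide",
}

def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

index = build_index(docs)
rifle = index.get("rifle", set())
pistol = index.get("pistol", set())

both = rifle & pistol        # rifle AND pistol
either = rifle | pistol      # rifle OR pistol
rifle_only = rifle - pistol  # rifle NOT pistol

# Nesting across fields, in the spirit of SU=(Biology OR Ecology) AND AU=White.
records = [
    {"id": "r1", "SU": {"biology"}, "AU": "White"},
    {"id": "r2", "SU": {"ecology"}, "AU": "Black"},
    {"id": "r3", "SU": {"ecology", "botany"}, "AU": "White"},
    {"id": "r4", "SU": {"chemistry"}, "AU": "White"},
]
nested = [
    rec["id"]
    for rec in records
    if (rec["SU"] & {"biology", "ecology"}) and (rec["AU"] == "White")
]

print(sorted(both), sorted(either), sorted(rifle_only), nested)
# [1] [1, 2, 3] [2] ['r1', 'r3']
```

The parentheses in the nested condition do the same job as the parentheses in the search string: the OR is resolved within the subject field before the AND with the author field is applied.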

  10. Boolean Concepts
     • Each circle represents a different subject concept
     • AND: finds citations containing both
     • OR: finds citations containing at least one
     • NOT: finds citations which contain A but not B
     • Stimulate 4, October 2004, VUB Brussel

  11. Search Modifiers & Operators
     • Exact Phrase
       • Usually quotation marks: “pride and prejudice”
     • Proximity
       • Search for terms adjacent to each other or within a specified distance of each other; order may or may not matter
       • Christmas ADJ Eve
       • Adam NEAR\7 Eve
       • Meteorite SAME Earth
       • Emergency BEFORE\2 responder
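Proximity operators can be sketched as checks on word positions. Operator semantics differ between databases, so everything here is an assumption: the helper `within`, the sample sentence, and the reading of "within N" as "positions differ by at most N words".

```python
def positions(tokens, term):
    """Indexes at which `term` occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def within(tokens, a, b, max_gap, ordered=False):
    """True if `a` and `b` occur at most `max_gap` positions apart.

    With ordered=True, `a` must come before `b`.
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            distance = j - i
            if ordered and 1 <= distance <= max_gap:
                return True
            if not ordered and 1 <= abs(distance) <= max_gap:
                return True
    return False

# Made-up sentence for illustration.
tokens = "emergency medical responder arrived on christmas eve".split()

print(within(tokens, "christmas", "eve", 1, ordered=True))        # ADJ-style: True
print(within(tokens, "emergency", "responder", 2, ordered=True))  # BEFORE\2-style: True
print(within(tokens, "responder", "emergency", 2, ordered=True))  # wrong order: False
print(within(tokens, "responder", "emergency", 2))                # NEAR-style, any order: True
print(within(tokens, "emergency", "eve", 2))                      # too far apart: False
```

An exact-phrase search is the strictest case of the same idea: every word must be adjacent to the next, in order.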

  12. Search Modifiers & Operators
     • Truncation (wildcard)
       • Search for variant forms of a word or stem
       • Some databases offer auto-stemming, which is similar to but not the same as truncation

  13. Search Modifiers & Operators
     • Truncation: right, left and center; not all databases offer all types
       • Right: environment?, hydrolog?
       • Left: ?phobia
       • Center: Wom?n, Organi?ation, Col?r
     • Some databases allow for mixed truncation
       • ?librar?
     • Some have special symbols to represent:
       • Exactly one character
       • One or more characters
       • Zero or more characters
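Truncation patterns can be sketched by translating each wildcard into a regular expression. The symbol assignments are assumptions for illustration: `?` is read as zero or more characters (which fits the slide's examples), while `+` for one or more and `#` for exactly one are hypothetical symbols, since each database defines its own.

```python
import re

# Assumed wildcard symbols; real databases differ.
WILDCARDS = {
    "?": r"\w*",  # zero or more characters
    "+": r"\w+",  # one or more characters (hypothetical symbol)
    "#": r"\w",   # exactly one character (hypothetical symbol)
}

def wildcard_to_regex(pattern):
    """Translate a truncation pattern into a compiled, case-insensitive regex."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """True if the whole word matches the truncation pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

print(matches("environment?", "environmental"))  # right truncation: True
print(matches("?phobia", "claustrophobia"))      # left truncation: True
print(matches("Col?r", "colour"))                # center truncation: True
print(matches("?librar?", "interlibrary"))       # mixed truncation: True
print(matches("wom#n", "women"))                 # exactly one character: True
print(matches("wom#n", "womn"))                  # no character to fill '#': False
```

Auto-stemming is a different mechanism: it reduces words to a common stem by linguistic rules rather than by matching a character pattern.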

  14. Search Logs
     • The practice of recording search strategies, search strings, keywords used, fields searched, search modifiers, etc.
     • Many modern databases have built-in features to help you keep track
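Where a database has no built-in history, a search log can be kept by hand. The sketch below is one possible shape for such a record, not a standard format: the class `SearchLogEntry`, its field names, the database name, and the hit count are all invented for illustration.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class SearchLogEntry:
    """One executed search, with enough detail to repeat it later."""
    database: str
    query: str
    fields: list
    modifiers: list
    hits: int
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_json_line(entry):
    """Serialize an entry as one JSON line, suitable for appending to a log file."""
    return json.dumps(asdict(entry))

# Hypothetical entry; the database name and hit count are made up.
entry = SearchLogEntry(
    database="ExampleDB",
    query="SU=(Biology OR Ecology) AND AU=White",
    fields=["SU", "AU"],
    modifiers=["Boolean", "nesting"],
    hits=42,
)
print(to_json_line(entry))
```

One JSON line per search keeps the log easy to append to and easy to filter later.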
