Relevance and Evaluation of IR Performance Dr. Bilal IS 530 Fall 2009
Searching for Information • Imprecise • Incomplete • Tentative • Challenging
Why Are Some Items Not Retrieved? • Indexing errors • Wrong search terms • Wrong database • Language variations • Other (to be answered by students)
Boolean Operators • OR increases recall • AND increases precision • NOT increases precision by elimination
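The effect of the three operators can be illustrated with set operations on a toy inverted index. This is a minimal sketch, not any particular system's implementation: the index contents and the helper names `search_or`, `search_and`, and `search_not` are hypothetical.

```python
# Toy inverted index: term -> set of document IDs (hypothetical data).
index = {
    "cat": {1, 2, 3, 5},
    "dog": {2, 3, 4},
    "toy": {3, 5, 6},
}

def search_or(a, b):
    """OR: union of both posting sets, so the result can only grow."""
    return index[a] | index[b]

def search_and(a, b):
    """AND: intersection, so only documents containing both terms remain."""
    return index[a] & index[b]

def search_not(a, b):
    """NOT: documents containing a, minus those that also contain b."""
    return index[a] - index[b]

or_hits = search_or("cat", "dog")    # broader set, favors recall
and_hits = search_and("cat", "dog")  # narrower set, favors precision
not_hits = search_not("cat", "toy")  # narrowed by elimination

print(sorted(or_hits), sorted(and_hits), sorted(not_hits))
```

The OR result is a superset of each term's own results, while the AND and NOT results are subsets, which is why the first tends to raise recall and the other two tend to raise precision.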
Relevance • A match between a query and information retrieved • A judgment • Can be judged by anyone who is informed of the query and views the retrieved information
Relevance • Judgment is dynamic • Documents can be ranked by likely relevance • In practice, not easy to measure • Not focused on user needs
Pertinence • Based on information need rather than a match between a query and retrieved documents • Can only be judged by user • May differ from relevance judgment
Pertinence • Transient, varies with many factors • Not often used in evaluation • May be used as a measure of satisfaction • User-based, as opposed to relevance
Relevance Judgment • Users base it on: • Topicality • Aboutness • Utility • Novelty • Satisfaction
IR Performance Recall Ratio = (the number of relevant documents retrieved) / (the total number of relevant documents)
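As a worked example, the recall ratio can be computed from two sets of document IDs. This is a minimal sketch: the function name `recall`, the sample IDs, and the choice to return 0.0 when there are no relevant documents are assumptions made for illustration.

```python
def recall(retrieved, relevant):
    """Relevant documents retrieved / total relevant documents."""
    if not relevant:
        return 0.0  # assumed convention for an empty relevant set
    return len(retrieved & relevant) / len(relevant)

# Hypothetical judgments: 8 relevant documents exist in the database.
relevant = {1, 2, 3, 4, 5, 6, 7, 8}
# The search returned 6 documents, 4 of which are relevant.
retrieved = {1, 2, 3, 4, 9, 10}

recall_ratio = recall(retrieved, relevant)
print(recall_ratio)  # 4 relevant retrieved out of 8 relevant
```

In practice the denominator is hard to know, since it requires relevance judgments for documents the search did not retrieve.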
Recall and Precision in Practice • Tend to be inversely related • Search strategies are designed for high precision or high recall (or a balance of the two) • User needs dictate whether a search strategy favors recall or precision • With practice, searchers learn to adjust queries to favor recall or precision
Recall and Precision [Figure: plot of recall against precision, each axis scaled to 1.0]
IR Performance Precision Ratio = (the number of relevant documents retrieved) / (the total number of documents retrieved)
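The precision ratio can be sketched the same way, here comparing a narrow and a broad search over the same hypothetical judgments. The function name `precision`, the sample IDs, and the 0.0 result for an empty retrieved set are assumptions for illustration.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved / total documents retrieved."""
    if not retrieved:
        return 0.0  # assumed convention for an empty result set
    return len(retrieved & relevant) / len(retrieved)

# Hypothetical judgments: documents 1..8 are relevant.
relevant = set(range(1, 9))

narrow = {1, 2, 3}          # small result set, every hit relevant
broad = set(range(1, 17))   # documents 1..16, half of them relevant

narrow_precision = precision(narrow, relevant)
broad_precision = precision(broad, relevant)
print(narrow_precision, broad_precision)
```

The narrow search scores higher on precision but leaves five relevant documents unretrieved, while the broad search retrieves all eight at the cost of eight irrelevant hits.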
High Precision Search • Use these strategies, as appropriate: • Controlled vocabulary • Limit feature (e.g., specific fields, major descriptors, date(s), language) • AND operator • Proximity operators (used carefully) • Truncation (used carefully)
High Recall Search • Use these strategies, as appropriate: • OR logic • Keyword searching • No or minimal limit to specific field(s) • Truncation • Broader terms
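The sketch below illustrates why truncation raises recall and why it needs care in a high-precision search. It is a simplified model, assuming truncation behaves like a word-prefix match; the sample documents and the helper names `search_exact` and `search_truncated` are hypothetical.

```python
# Hypothetical mini-collection: document ID -> text.
docs = {
    1: "the cat sat",
    2: "cats and dogs",
    3: "library catalog records",
    4: "cattle ranch report",
}

def search_exact(term):
    """IDs of documents containing the term as a whole word."""
    return {doc_id for doc_id, text in docs.items()
            if term in text.split()}

def search_truncated(stem):
    """IDs of documents containing any word that starts with the stem."""
    return {doc_id for doc_id, text in docs.items()
            if any(word.startswith(stem) for word in text.split())}

exact_hits = search_exact("cat")          # matches only "cat"
truncated_hits = search_truncated("cat")  # also cats, catalog, cattle

print(sorted(exact_hits), sorted(truncated_hits))
```

The truncated search picks up the plural "cats", which helps recall, but it also pulls in "catalog" and "cattle", which a user interested in cats would likely judge irrelevant.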
Improving IR Performance • Good mediation of search topic before searching • User presence during search, if possible • Preliminary search judged by user • Evaluation during search (by searcher or by searcher and user)
Improving IR Performance • Refinement of search strategies • Searcher to evaluate final results • User to evaluate final results
Improving IR Performance • Better system design • Better indexing and word parsing • Better structure of thesauri • Better user interface (e.g., more effective help feature) • Better error recovery feedback • User-centered design
Relevance in Information Science • Dr. Tefko Saracevic's talk at SIS: “Relevance in information science” • Streaming video and slide show: • http://mediabeast.ites.utk.edu/mediasite4/Viewer/?peid=fb8f84cb-9f82-499f-b12c-9a56ab5cf5ba