
On the Evaluation of Snippet Selection for Information Retrieval


Presentation Transcript


  1. On the Evaluation of Snippet Selection for Information Retrieval A. Overwijk, D. Nguyen, C. Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong

  2. Contents • Properties of a good evaluation method • Evaluation method of WebCLEF • Approach • Results • Analysis • Conclusion

  3. Good evaluation method • Reflects the quality of the system • Reusability

  4. Evaluation method of WebCLEF • Recall • The sum of the character lengths of all spans in the system’s response that are linked to nuggets (i.e., aspects the user would include in his or her article), divided by the total sum of span lengths in the responses for that topic over all submitted runs. • Precision • The number of characters that belong to at least one span linked to a nugget, divided by the total character length of the system’s response.
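As a rough illustration of these two definitions, the sketch below computes both scores from character spans. The span list, the pooled span length and all variable names are assumptions made for this example; this is not WebCLEF’s actual evaluation script.

    use strict;
    use warnings;
    use List::Util qw(sum0);

    # Spans in the system's response that assessors linked to nuggets,
    # given as [start, end) character offsets (assumed non-overlapping).
    my @linked_spans  = ( [ 0, 80 ], [ 120, 200 ] );
    my $response_len  = 500;     # total character length of the system's response
    my $pool_span_len = 2000;    # total span length over all submitted runs for this topic

    my $linked_len = sum0 map { $_->[1] - $_->[0] } @linked_spans;

    # Recall: linked characters over the total span length in all runs for the topic.
    my $recall = $pool_span_len ? $linked_len / $pool_span_len : 0;

    # Precision: characters covered by at least one linked span over the response length.
    # (Overlapping spans would have to be merged first so no character is counted twice.)
    my $precision = $response_len ? $linked_len / $response_len : 0;

    printf "recall=%.3f precision=%.3f\n", $recall, $precision;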

  5. Approach • Better system, better performance scores? • Similar system, same performance scores? • Worse system, lower performance scores?

  6. Better system • Last year’s best performing system contains a bug:

    our %stopwords = qw(
        's
        a
        …
        zwischen);

    for my $w … {
        next if exists $stopwords{$w};
        …
    }
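The slides do not say what the bug actually is. One plausible pitfall with a construct like this, offered here only as an illustrative assumption, is that assigning a qw() word list directly to a hash pairs the words up as alternating keys and values, so exists $stopwords{$w} matches only every other stopword:

    use strict;
    use warnings;

    # Assigning a flat word list to a hash turns it into
    # key => value, key => value, ... pairs, so only every
    # other word becomes a key that exists() can find.
    our %stopwords = qw( a an and are );   # a => an, and => are

    for my $w (qw( a an and are )) {
        if ( exists $stopwords{$w} ) {
            print "$w: treated as a stopword\n";   # only 'a' and 'and'
        }
        else {
            print "$w: slips through\n";           # 'an' and 'are'
        }
    }

    # A common way to build a lookup set where every word is a key:
    our %stopset = map { ( $_ => 1 ) } qw( a an and are );

Whatever the exact defect was, fixing it is presumably what produces the “better system” variant whose scores are compared on the next slide.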

  7. Better system

  8. Similar system • General idea • Almost identical snippets should receive almost the same precision and recall • Experiment • Remove the last word from every snippet in the output of last year’s best performing system
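A minimal sketch of that perturbation, assuming one snippet per input line; a real WebCLEF run also carries topic and document identifiers, which are omitted here:

    use strict;
    use warnings;

    # Read snippets on STDIN and drop the last word of each one,
    # producing an "almost identical" run for comparison.
    while ( my $snippet = <STDIN> ) {
        chomp $snippet;
        my @words = split /\s+/, $snippet;
        pop @words if @words;              # remove the last word
        print join( ' ', @words ), "\n";
    }

If the evaluation method behaved as expected, the scores of this perturbed run would stay close to those of the original run.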

  9. Similar system

  10. Worse system • Delivering snippets based on position of occurrence • 1st snippet = 1st paragraph of 1st document • 2nd snippet = 2nd paragraph of 2nd document • ... • No different from a search engine, except that documents are split up into snippets
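A sketch of that positional baseline; the document list, the paragraph splitting on blank lines and all names below are placeholders rather than the system’s actual code:

    use strict;
    use warnings;

    # Positional baseline: the i-th snippet is the i-th paragraph of the
    # i-th retrieved document. @docs stands in for an ordinary ranked list.
    my @docs = (
        "First document, paragraph one.\n\nFirst document, paragraph two.",
        "Second document, paragraph one.\n\nSecond document, paragraph two.",
    );

    my @snippets;
    for my $i ( 0 .. $#docs ) {
        my @paragraphs = split /\n\s*\n/, $docs[$i];
        push @snippets, $paragraphs[$i] if defined $paragraphs[$i];
    }

    print "$_\n" for @snippets;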

  11. Worse system

  12. Analysis • Pool of snippets • Implementation • Assessments

  13. Conclusion • The evaluation method is not sufficient: • Biased towards participating systems • The correctness criterion for snippets is too strict • Recommendations: • N-grams (e.g. ROUGE) • Multiple assessors per topic
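To make the n-gram recommendation concrete, here is a rough ROUGE-N-style recall between a snippet and a nugget. It is only an illustration of the idea, not the official ROUGE implementation, and the example strings are invented:

    use strict;
    use warnings;
    use List::Util qw(min sum0);

    # Count the n-grams of a lowercased, whitespace-tokenised string.
    sub ngram_counts {
        my ( $text, $n ) = @_;
        my @tokens = split /\s+/, lc $text;
        my %counts;
        $counts{ join( ' ', @tokens[ $_ .. $_ + $n - 1 ] ) }++ for 0 .. $#tokens - $n + 1;
        return \%counts;
    }

    # ROUGE-N-style recall: clipped n-gram matches over the reference n-grams.
    sub rouge_n_recall {
        my ( $candidate, $reference, $n ) = @_;
        my $cand    = ngram_counts( $candidate, $n );
        my $ref     = ngram_counts( $reference, $n );
        my $overlap = sum0 map { min( $cand->{$_} // 0, $ref->{$_} ) } keys %$ref;
        my $total   = sum0 values %$ref;
        return $total ? $overlap / $total : 0;
    }

    my $nugget  = "the euro was introduced in 2002";
    my $snippet = "the euro was introduced in January 2002";
    printf "ROUGE-1 recall: %.2f\n", rouge_n_recall( $snippet, $nugget, 1 );   # 1.00
    printf "ROUGE-2 recall: %.2f\n", rouge_n_recall( $snippet, $nugget, 2 );   # 0.80

Unlike exact character-span matching, an overlap measure like this gives partial credit to snippets that convey a nugget in slightly different wording, and using multiple assessors per topic would further soften the strict correctness criterion noted above.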

  14. Questions
