Lessons Learned from Information Retrieval
Chris Buckley
Sabir Research
chrisb@sabir.com
Legal E-Discovery
• Important, growing problem
• Current solutions not fully understood by the people using them
• Imperative to find better solutions that scale
• Evaluation required
  • How do we know we are doing better?
  • Can we prove a level of performance?
Lack of Shared Context
• The basic problem of both search and e-discovery
• The searcher does not necessarily know beforehand the “vocabulary” or background of either the author or the intended audience of the documents to be searched
Relevance Feedback
• A human judges some documents as relevant; the system finds others based on those judgements
• The only general technique for improving the system’s knowledge of context that has proven successful (sketched below)
  • Works from the small collections of the 1970s to the large collections of the present (TREC HARD track)
• Difficult to apply to discovery
  • Need to change the entire discovery process
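The slides do not formalize relevance feedback; one classic vector-space formulation is Rocchio's query modification. Below is a minimal sketch of that formulation, not anything from the talk: the weights alpha, beta, gamma and the toy vectors are illustrative choices.

```python
import numpy as np

def rocchio_feedback(query_vec, relevant, nonrelevant,
                     alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query modification: move the query vector toward
    judged-relevant documents and away from judged-non-relevant ones."""
    q = alpha * query_vec
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return np.clip(q, 0, None)  # negative term weights are usually dropped

# Toy example: 3-term vocabulary, one relevant and one non-relevant document.
query = np.array([1.0, 0.0, 0.0])
rel = np.array([[0.9, 0.8, 0.0]])
nonrel = np.array([[0.0, 0.0, 1.0]])
print(rocchio_feedback(query, rel, nonrel))
```

The feedback step expands the query with terms from judged documents, which is how the system acquires the context it lacked at the start of the search.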
Toolbox of other techniques
• Many other aids to search
  • Ontologies, linguistic analysis, semantic analysis, data mining, term relationships
• Good techniques for IR uniformly:
  • Give big wins for some searches
  • Give mild losses for others
• Need a set of techniques, a toolbox
• In practice for IR research, the issue is not finding big wins, but avoiding the losses (see the sketch below)
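A sketch of the per-search win/loss analysis this slide describes, with made-up scores: a tool can post big wins on a couple of searches while mild losses everywhere else nearly cancel them on average.

```python
def win_loss_summary(baseline, with_tool):
    """Per-search score deltas between a baseline run and a baseline+tool run.
    A tool can show impressive individual wins yet barely move the average."""
    deltas = [t - b for b, t in zip(baseline, with_tool)]
    return {
        "mean_delta": sum(deltas) / len(deltas),
        "biggest_win": max(deltas),
        "n_wins": sum(d > 0 for d in deltas),
        "n_losses": sum(d < 0 for d in deltas),
    }

# Made-up scores: two big wins, nearly cancelled by mild losses on the rest.
baseline  = [0.30, 0.25, 0.40, 0.35, 0.50, 0.20]
with_tool = [0.55, 0.50, 0.30, 0.25, 0.40, 0.10]
print(win_loss_summary(baseline, with_tool))
```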
Implications of toolbox
• No silver-bullet AI solution is expected
• Boolean search will not expand to accommodate combinations of solutions
• Test collections are critical
Test Collection Importance
• Needed to develop tools
• Needed to develop decision procedures for when to use tools
• The toolbox requirement means we need to distinguish a good overall system from one with a single good tool
  • All systems are able to show searches on which individual tools work well
  • A good system shows a performance gain on the entire set of searches
Test Collection Composition
• A large set of realistic documents
• A set (at least 30) of topics or information needs
• A set of judgements: which documents are responsive (or non-responsive) to each topic (a minimal reader for the standard format is sketched below)
• Judgements are expensive and limit how test collection results can be interpreted
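The slides do not name a file format, but TREC judgements are conventionally distributed as "qrels" files with one `topic iteration docno relevance` line per judged document. A minimal reader under that assumption; the file name in the comment is hypothetical.

```python
from collections import defaultdict

def read_qrels(path):
    """Parse a TREC-style qrels file: 'topic iter docno relevance' per line.
    Returns {topic: {docno: relevance}} covering only judged documents."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            topic, _iteration, docno, rel = line.split()
            qrels[topic][docno] = int(rel)
    return qrels

# e.g. qrels = read_qrels("qrels.legal07.txt")  # hypothetical file name
```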
Incomplete Judgements
• Judgements are too time-consuming and expensive to be complete (i.e., to judge every document)
• Pool the retrieved documents from a variety of systems (sketched below)
• Feasible, but:
  • Known to be incomplete
  • We can’t even accurately estimate how incomplete
Inexact Judgements
• Humans differ substantially in their judgements
• Standard TREC collections:
  • Topics include 1–3 paragraphs describing what makes a document relevant
  • Given the same pool of documents, two humans overlap on 70% of their relevant sets (one way to compute this is sketched below)
• 76% agreement on the small TREC Legal test
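The slide does not say how overlap is measured; a common definition is the size of the intersection of the two assessors' relevant sets over the size of their union. A sketch, with toy sets chosen to land at 70%:

```python
def relevant_overlap(rel_a, rel_b):
    """Overlap of two assessors' relevant sets, measured as
    |intersection| / |union| (one common definition)."""
    a, b = set(rel_a), set(rel_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Toy illustration: 7 documents judged relevant in common, 10 in the union.
a = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
b = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d9", "d10"}
print(f"{relevant_overlap(a, b):.0%}")  # 70%
```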
Implications of Judgements
• No gold standard of perfect performance is even possible
• Any system claiming better than 70% precision at 70% recall is working on a problem other than general search (see the sketch below)
• Almost impossible to get useful absolute measures of performance
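For concreteness, a sketch of precision and recall computed against a single assessor's judgement set; the slide's point is that any such absolute score inherits the roughly 70% assessor agreement as a ceiling.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the judged-relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Scores computed against one assessor's judgements carry that assessor's
# disagreement with other humans along with them.
print(precision_recall({"d1", "d2", "d3"}, {"d1", "d2", "d4"}))  # ≈ (0.67, 0.67)
```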
Comparative Evaluation
• Comparisons between systems on moderate-size collections (several GBytes) are solid
• Comparative results on larger collections (500 GBytes) are showing strains
  • Believable, but with a larger error margin
  • An active area of research
• The overall goal for e-discovery has to be comparative evaluation (a per-topic comparison is sketched below)
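Comparative evaluation is done per topic: score both systems on the same topic set, then ask whether the per-topic differences are consistent rather than comparing two grand averages. A sketch, assuming SciPy is available and using made-up per-topic scores:

```python
from scipy.stats import ttest_rel

def compare_systems(scores_a, scores_b):
    """Paired comparison of two systems evaluated on the same topics.
    Pairing per topic factors out topic difficulty; the question is whether
    system A beats system B consistently, not just on average."""
    t, p = ttest_rel(scores_a, scores_b)
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    return t, p, wins

# Made-up average-precision scores on 5 topics (real collections use 30+):
a = [0.32, 0.41, 0.10, 0.55, 0.28]
b = [0.30, 0.35, 0.15, 0.50, 0.25]
print(compare_systems(a, b))
```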
Sabir TREC Legal Results
• Submitted 7 runs
• Very basic approach (1995 technology)
• 3 tools from my toolbox
• 3 query variations
• One of the top systems
• All results basically the same
  • The tools did not help on average