Sabir at TREC 2007 Legal Workshop
Chris Buckley, Sabir Research, cabuckley@sabir.com
Legal Interactive Track
• Track goal: identify the relevant docs for 1 to 12 topics.
• Score +1 point per relevant document found.
• Score -1/2 point per non-relevant document found (see the sketch after this list).
• My goal: find as many new relevant docs as possible in 15 minutes per topic.
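For concreteness, a minimal sketch of that scoring rule; the function name and the example counts are invented for illustration, not taken from the track guidelines:

```python
# Hypothetical illustration of the Legal Interactive Track scoring rule:
# +1 per relevant doc found, -1/2 per non-relevant doc found.
def score_run(num_relevant: int, num_nonrelevant: int) -> float:
    return num_relevant - 0.5 * num_nonrelevant

# Example: 40 relevant and 10 non-relevant docs found scores 35.0,
# i.e. every two non-relevant docs retrieved cancel one relevant doc.
print(score_run(40, 10))  # 35.0
```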
Interactive Run Strategy
• SMART version 16.0 with a (very) hacked-together user interface.
• Perform Rocchio relevance feedback based on the TREC 2006 judgments plus additional judgments (a sketch of the feedback step follows this list).
• Current query shown in an editable window.
• Retrieve 10 large document snippets each iteration; each snippet has a pointer to the full document.
• Judge each snippet/doc as Relevant, Not Relevant, or Undetermined.
• The only human actions are judging docs and modifying the query.
• Stop new iterations 15 minutes after first seeing the query.
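As a rough illustration of the feedback step, here is a textbook Rocchio update over TF-IDF vectors. This is a sketch under assumed defaults (alpha = 1.0, beta = 0.75, gamma = 0.15), not SMART's actual implementation or weights:

```python
import numpy as np

def rocchio(query_vec: np.ndarray,
            rel_docs: np.ndarray,     # shape (n_rel, vocab_size)
            nonrel_docs: np.ndarray,  # shape (n_nonrel, vocab_size)
            alpha: float = 1.0, beta: float = 0.75,
            gamma: float = 0.15) -> np.ndarray:
    """Move the query vector toward judged-relevant docs and away
    from judged-non-relevant docs (classic Rocchio feedback)."""
    new_q = alpha * query_vec
    if len(rel_docs):
        new_q = new_q + beta * rel_docs.mean(axis=0)
    if len(nonrel_docs):
        new_q = new_q - gamma * nonrel_docs.mean(axis=0)
    # Negative term weights are conventionally clipped to zero.
    return np.maximum(new_q, 0.0)

# Toy example over a 3-term vocabulary; all vectors are invented.
q = np.array([1.0, 0.0, 0.5])
rel = np.array([[0.8, 0.4, 0.0], [1.0, 0.2, 0.1]])
nonrel = np.array([[0.0, 0.9, 0.0]])
print(rocchio(q, rel, nonrel))  # weights shift toward the relevant docs
```

Each judging iteration would re-run an update like this on the newly judged snippets and re-retrieve, consistent with the iterate-judge-requery loop described above.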
Interactive Run Submissions
Interactive Run Results
Historical Disagreement
• Given two assessors and a large pool of judgments over many topics, the expected overlap of their relevant doc sets is only 40-60%. (NIST: TREC-4; Waterloo: TRECs 6 and 7; Zobel: SIGIR 1998.)
• About half the topics show little disagreement, but some show massive disagreement.
• Disagreements make no difference when comparing systems averaged over all topics.
• Disagreements come (mostly) from blunders and scope:
  • Blunder: the differing assessor would agree they made a mistake.
  • Scope: one assessor is more lenient than the other on some aspect of the topic.
• My personal estimate on TREC newswire is a 5% blunder rate per assessor; most of the rest is scope. The 5% rate is measured on the union of the two assessors' relevant docs. (Overlap measured as sketched below.)
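The overlap figure above can be read as the intersection of the two assessors' relevant sets over their union; a minimal sketch, with sample doc IDs invented purely for illustration:

```python
def overlap(rel_a: set, rel_b: set) -> float:
    """Fraction of the union of two assessors' relevant docs
    that both assessors agree on (intersection over union)."""
    union = rel_a | rel_b
    return len(rel_a & rel_b) / len(union) if union else 1.0

# Invented sample: assessor A and assessor B on one topic.
a = {"d1", "d2", "d3", "d4", "d5"}
b = {"d2", "d3", "d4", "d5", "d6", "d7"}
print(f"{overlap(a, b):.0%}")  # 57% -- inside the historical 40-60% range
```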
Interactive Disagreement
Obvious "Blunder" Examples
• <RequestText>All documents discussing or referencing the California Cartwright Act.</RequestText>
• <FinalQuery>California w/3 (antitrust OR monopol! OR anticompetitive OR restraint OR "unfair competition" OR "Cartwright")</FinalQuery>
• Doc gdc90e00 (quoted with its original OCR errors): "In the original complaint filed July 16, 1980, plaintiff charged Philip Morris with breach of contract, unfair competition, and violation of the Cartwright Act. After eighteen months of exhaustive discovery, thep raintiff amended his complaint and deleted all causes of action against Philip Morris exce t the Cartwriqht Act claim,"
• Doc mae78c00 (likewise OCR): "False Claims Act (CaI. Gov't Code 9*****-12655) (id. at 24); and damages equivalent to the State's Medi-Cal expenditures for alleged Relief Rectuested: Prohibitory injunctive relief (id. At 3-24); civil fines and penalties under the IICA and the California action for violation of the Cartwright Act (?d. 1170-74) and one cause of action for violations of the False Claims Act. (Id. 1$75-80)."
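For readers unfamiliar with the query syntax above: "!" is truncation (prefix match) and "w/N" means within N words. A naive sketch of these two operators follows, with invented helper names; this is not the track's actual query processor:

```python
import re

def tokens(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())

def matches(term: str, token: str) -> bool:
    # "monopol!" matches any token beginning with "monopol".
    return token.startswith(term[:-1]) if term.endswith("!") else token == term

def within(doc: str, left: str, rights: list, n: int) -> bool:
    """True if `left` occurs within n words of any term in `rights`."""
    toks = tokens(doc)
    left_pos = [i for i, t in enumerate(toks) if matches(left, t)]
    right_pos = [i for i, t in enumerate(toks)
                 for r in rights if matches(r, t)]
    return any(abs(i - j) <= n for i in left_pos for j in right_pos)

# Invented example sentence; the real docs are OCR'd and far noisier.
doc = "violation of the California Cartwright Act"
print(within(doc, "california",
             ["antitrust", "monopol!", "cartwright"], 3))  # True
```

Note that OCR noise like "Cartwriqht" defeats this kind of exact matching, so such queries depend on at least one cleanly recognized occurrence of a term.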
Obvious "Blunder" Example
• <RequestText>All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes.</RequestText>
• <FinalQuery>(cand! OR chocolate) w/10 cigarette!</FinalQuery>
• Doc vqg61e00 (quoted with its original OCR errors): "... your letter of April 18, 1961 (by Mr. Shigihara) and the proposed candy packages bearing CLD *****, KENT and PIE`APaRT label facsir>:iles as they are proposed to be amended by overprint or substitution of such legends as "8ubble Oum", "Chocolate Cigarettes" or like **********, we approve of the proposed packages and are willing to grant you permission to market them in Japan, subject to the following conditionsi 1. Packages containing the bubble gum or chocolate cigarettes will conform precisely to the sample packages which ************ your letter and may not be changed without our written consent."
Non-obvious "Blunder" Example
• <RequestText>All documents that refer or relate to pigeon deaths during the course of animal studies.</RequestText>
• <FinalQuery>(research OR stud! OR "in vivo") AND pigeon AND (death! OR dead OR die! OR dying)</FinalQuery>
• Doc gwp94f00: "We have successfully conducted and met the protocol requirements for periodic three-month sacrifices over a now eighteen consecutive months of tobacco smoke inhalation in pigeons."
• 7 copies of this 8-page document.
"Maybe" for Topic 45 • <RequestText>All documents that refer or relate to pigeon deaths during the course of animal studies.</RequestText> • <FinalQuery>(research OR stud! OR "in vivo") AND pigeon AND (death! OR dead OR die! OR dying)</FinalQuery> • Explicit pigeon autopsies considered relevant • My "maybe"s include • 5 implied autopsies (eg liver or heart examinations) • 4 explicit autopsies proposed • 12 implied autopsies proposed Chris Buckley – TREC 2007
Maybe Relevant Example
• <RequestText>All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated.</RequestText>
• <FinalQuery>((guide! OR strateg! OR approv!) AND (place! OR promot!)) AND (("G-rated" OR "G rated" OR family) W/5 (movie! OR film! OR picture!))</FinalQuery>
• Doc alp15f00: "6. You agree not to place the Products in any motion picture that is made for or intended for display on television or any motion picture intended for or likely to appeal to an audience under the age of twenty-one."
• 10 docs contain this text (2 different form letters); only 2 were judged relevant.
Interactive Disagreement within “Essentially” Duplicates
Discussion
• The overall disagreement rate is a bit higher than expected, but reasonable.
• The blunder rate is noticeably higher than the historical rate, plausibly because of:
  • OCR docs
  • non-newswire docs
  • non-professional writers
  • essentially duplicate docs, which increase borderline errors
  • volunteer assessors (though they spent more time per doc)
• Even so, the blunder rate is less than the scope error rate.
Implications of Large Disagreements
• TREC-style comparison of systems is still valid: we have enough topics.
• But judgments on individual topics are questionable, and lawyers are less interested in averages over topics than in the worst case (their case).
• The blunder rate comes from uncertainty in judging an individual document. It can be directly addressed by "good practices". Should TREC help develop those good practices?
• The scope error rate comes from uncertainty in what the topic means. How should this be addressed? Do we need to involve both sides in a case, with formal involvement of both during development of relevance feedback searches?