This paper discusses structured queries and their effectiveness in the legal search domain, using the TREC 2007 Legal Track as a case study.
Structured Queries for Legal Search: TREC 2007 Legal Track. Yangbo Zhu, Le Zhao, Jamie Callan, Jaime Carbonell. Language Technologies Institute, School of Computer Science, Carnegie Mellon University. 11/06/2007
Agenda • Introduction • Main task – ad hoc search • Routing task – relevance feedback
What is legal search • Goal: retrieve all documents for production requests. • Production request: describes a set of documents that the plaintiff requires the defendant to produce. • Recall-oriented: high risk in missing, and high value in finding, important documents. • Sample request text: All documents discussing, referencing, or relating to company guidelines, strategies, or internal approval for placement of tobacco products in movies that are mentioned as G-rated. • Final query: [Figure: Boolean query tree combining AND, OR, and W/5 operators over the terms guide, strategy, approval, family, "G rated", movie, film]
Data set • 7 million business records from tobacco companies and research institutes. • Metadata: title, author, organizations, etc. • OCR text: contains errors • 50 topics generated from four hypothetical complaints created by lawyers
Main task – Ad hoc search: Indri query formulation • Without Boolean constraints: #combine(ranking function) • With Boolean constraints: #filreq( #band(boolean constraint) #combine(ranking function) )
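The two query templates on this slide can be assembled mechanically. The following is a minimal sketch, not the authors' tooling: the helper names (`combine`, `band`, `build_query`) and the example terms are assumptions chosen for illustration, and only the Indri operators `#combine`, `#band`, and `#filreq` come from the slide.

```python
def combine(terms):
    """Return an Indri #combine expression (the ranking function)."""
    return "#combine( " + " ".join(terms) + " )"


def band(clauses):
    """Return an Indri #band expression (the Boolean constraint)."""
    return "#band( " + " ".join(clauses) + " )"


def build_query(ranking_terms, boolean_clauses=None):
    """Build the query with or without a Boolean filter.

    Without clauses the result is just the ranking function; with clauses
    the ranking function is wrapped in #filreq so that only documents
    satisfying the constraint are ranked.
    """
    ranking = combine(ranking_terms)
    if not boolean_clauses:
        return ranking
    return "#filreq( " + band(boolean_clauses) + " " + ranking + " )"


# Hypothetical example terms, used only to show the two shapes.
unconstrained = build_query(["guide", "movie", "film"])
constrained = build_query(["guide", "movie", "film"], ["guide", "movie"])
print(unconstrained)
print(constrained)
```

Keeping the constraint and the ranking function as separate arguments mirrors the slide's point that the filter decides which documents are eligible while `#combine` decides their order.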
Boolean constraint • Translate the Final Query [Figure: Boolean query tree of the Final Query, with AND, OR, and W/5 operators]
Ranking functions • Bag of words: (guide strategy approval family G rated movie film) • Respect phrase operators: (guide strategy approval family #1(G rated) movie film) • Group synonyms together: (#syn(guide strategy approval) #syn(family #1(G rated)) #syn(movie film))
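The three ranking-function variants can be generated from one grouped term list. This is a sketch under assumptions: the function names and the `groups` data structure are hypothetical, and the grouping of terms is read off the slide's synonym example rather than produced by any automatic parser.

```python
def phrase(term):
    """Wrap a multi-word term in Indri's exact-phrase operator #1."""
    return "#1(" + term + ")" if " " in term else term


def bag_of_words(groups):
    """Ignore all structure: every word becomes an independent term."""
    words = [w for group in groups for term in group for w in term.split()]
    return "#combine(" + " ".join(words) + ")"


def with_phrases(groups):
    """Keep multi-word terms together as phrases, but ignore synonym groups."""
    terms = [phrase(term) for group in groups for term in group]
    return "#combine(" + " ".join(terms) + ")"


def with_synonyms(groups):
    """Keep phrases and treat each group of alternatives as one #syn concept."""
    syns = ["#syn(" + " ".join(phrase(t) for t in group) + ")" for group in groups]
    return "#combine(" + " ".join(syns) + ")"


# Term groups taken from the slide's example request.
groups = [["guide", "strategy", "approval"], ["family", "G rated"], ["movie", "film"]]
print(bag_of_words(groups))
print(with_phrases(groups))
print(with_synonyms(groups))
```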
Experiments and findings • Boolean constraints improve recall and precision • Structured queries outperform bag-of-words ones • Note: B is the number of documents matching the Final Query. Its average value is 5000.
Per-topic performance (difference from the median of 29 manual runs) • [Figure: est_RB per topic] • [Figure: est_PB per topic]
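To make the cutoff B concrete, the sketch below computes plain recall and precision at depth B for one topic. It is a simplification: the track's est_RB and est_PB are estimated values, and the estimation procedure is not described on these slides, so this code assumes complete relevance judgments. The function name and the document IDs are hypothetical.

```python
def recall_precision_at_b(ranked_doc_ids, relevant_doc_ids, b):
    """Return (recall, precision) over the top-b ranked documents.

    Assumes every relevant document is known, which is not true of the
    sampled judgments behind the estimated est_RB / est_PB measures.
    """
    relevant = set(relevant_doc_ids)
    top = ranked_doc_ids[:b]
    hits = sum(1 for doc_id in top if doc_id in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / b if b > 0 else 0.0
    return recall, precision


# Toy data: four ranked documents, three relevant ones, cutoff B = 2.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d4", "d9"}
recall, precision = recall_precision_at_b(ranked, relevant, 2)
print(recall, precision)
```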
Routing task of Legal Track 2007 • Structured queries are known to be hard to construct. • Not with supervision. • Questions • Do weighted queries help? • Do metadata and annotations help? • A definitive answer from Supervised Structured Query Construction
Structured query • #weight( w1 t1 w2 t2 … wn tn)
Supervised Structured Query Construction • Relevance feedback => supervised learning • Train a linear SVM with keyword and keyword.field features • SVM classifier • fi: training weights for terms, chosen to be tf-idf/LM scores • Retrieval: #weight( w1 t1 w2 t2 … ) • fi: tf-idf/LM scores for terms • Advantages • Given enough training data, we know for sure whether a given type of feature helps
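The last step, turning learned term weights into a #weight query, might look like the sketch below. It assumes the linear classifier has already been trained and that its per-feature weights are available as a dictionary; the weights shown are invented for illustration, and the choices to drop non-positive weights and to cap the query at `top_k` terms are assumptions, not something the slides state.

```python
def weight_query(term_weights, top_k=None):
    """Format learned feature weights as an Indri #weight query.

    term_weights maps a feature (a keyword, or keyword.field) to its
    learned weight. Non-positive weights are dropped (an assumption of
    this sketch), and the remaining terms are sorted by descending weight.
    """
    positive = [(term, w) for term, w in term_weights.items() if w > 0]
    positive.sort(key=lambda pair: (-pair[1], pair[0]))
    if top_k is not None:
        positive = positive[:top_k]
    body = " ".join(f"{w:.3f} {term}" for term, w in positive)
    return "#weight( " + body + " )"


# Invented weights; television.title is a keyword.field feature.
learned = {
    "chocolate": 1.25,
    "cigarette": 0.9,
    "candy": 0.75,
    "television.title": 0.1,
    "memo": -0.4,
}
query = weight_query(learned)
print(query)
```

Because each feature maps to exactly one weighted query term, comparing runs with and without a feature type (for example, all keyword.field features) is one way to see whether that type helps.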
Example Query • <RequestNumber>13</RequestNumber> • <RequestText>All documents to or from employees of a tobacco company or tobacco organization referring to the marketing, placement, or sale of chocolate candies in the form of cigarettes.</RequestText> • <FinalQuery>(cand! OR chocolate) w/10 cigarette!</FinalQuery>
Annotations • NE: bush.person • sentence: violate.sent • meta: television.title • Feedback query:
Performance • On 39 topics of Legal 2006 (2/3 of judged documents for training, the rest for testing) • On 10 topics of the Legal 2007 routing task
Routing Conclusions • A principled way of constructing structured queries • Annotations • Query term weights • Answers from a supervised learning algorithm • Weights help; annotations help less.
Thank you! Questions?