
How to Make Manual Conjunctive Normal Form Queries Work in Patent Search

Le Zhao and Jamie Callan, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.


Presentation Transcript


  1. How to Make Manual Conjunctive Normal Form Queries Work in Patent Search
     • Le Zhao and Jamie Callan
     • Language Technologies Institute, School of Computer Science, Carnegie Mellon University

  2. Technology Survey Task @ Chem
     • Document Collection
       • 1.3 million patents + 0.18 million scientific articles
       • Tend to be long, have XML field structure
     • Topics
       • 6 topics (last year only 2 groups submitted runs, not reusable)
       • About use/detection of chemicals (in certain applications)
       • Similar to ad hoc retrieval queries

  3. Example Topic: TS-20
     <title>tests for HCG hormone</title>
     <narrative>The hormone Human Chorionic Gonadotrophin (HCG) is produced when a women becomes pregnant. Tests are usually carried out by analysing blood or urine. We are looking for articles and patents on these pregnancy test kits or the chemical tests used to produce them.</narrative>
     <details>
       <chemicals>Human Chorionic Gonadotrophin OR HCG</chemicals>
       <condition>pregnancy</condition>
       <target>Human Chorionic Gonadotrophin OR HCG</target>
     </details>
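A minimal sketch of how the fields of such a topic could be pulled apart before building queries. This is not the authors' code: the `<topic>` wrapper element and the helper names `topic_fields` and `split_alternatives` are assumptions made for illustration, since the slide shows only the inner elements.

```python
import xml.etree.ElementTree as ET

# Topic TS-20 as shown on the slide, wrapped in a hypothetical <topic> root
# (an assumption) so that it parses as one well-formed XML document.
TOPIC_XML = """<topic>
<title>tests for HCG hormone</title>
<narrative>The hormone Human Chorionic Gonadotrophin (HCG) is produced when a women becomes pregnant. Tests are usually carried out by analysing blood or urine. We are looking for articles and patents on these pregnancy test kits or the chemical tests used to produce them.</narrative>
<details>
<chemicals>Human Chorionic Gonadotrophin OR HCG</chemicals>
<condition>pregnancy</condition>
<target>Human Chorionic Gonadotrophin OR HCG</target>
</details>
</topic>"""


def topic_fields(xml_text):
    """Return a dict mapping each leaf element name to its stripped text."""
    root = ET.fromstring(xml_text)
    fields = {}
    for elem in root.iter():
        # Leaf elements have no children; containers such as <details> are skipped.
        if len(elem) == 0 and elem.text and elem.text.strip():
            fields[elem.tag] = elem.text.strip()
    return fields


def split_alternatives(value):
    """Split a field value such as 'A OR B' into its alternatives."""
    return [part.strip() for part in value.split(" OR ")]


fields = topic_fields(TOPIC_XML)
chemicals = split_alternatives(fields["chemicals"])
```

Here `chemicals` holds the two alternatives from the `<chemicals>` field, which is a natural seed for one OR-clause of a manual query.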

  4. Our Runs
     • Automatic Queries
       • Unweighted bag-of-words baseline
       • Weighting and combining words from different query fields
     • Manual Queries
       • Interactive search using Boolean CNF queries
       • (test OR check OR detection OR detect) AND (HCG OR “Human Chorionic Gonadotrophin” OR “Chorionic Gonadotropin” OR Choriogonadotropin OR Choriogonin)
       • Effective; used by lawyers, librarians, and in medical search and IR
       • Built with thesauri (MeSH etc.) and interaction: check the top-ranked results
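To make the semantics of the CNF query on this slide concrete, here is a small sketch of Boolean CNF matching: a document matches when every AND-ed clause has at least one of its OR-ed terms present. It is an illustration only, not the retrieval engine used for the runs; the function names and the simple lowercase tokenizer (no stemming) are assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_term(tokens, term):
    """True if the (possibly multi-word) term occurs as a contiguous token run."""
    words = tokenize(term)
    if not words:
        return False
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def matches_cnf(text, cnf):
    """CNF semantics: every clause (AND) needs at least one matching term (OR)."""
    tokens = tokenize(text)
    return all(any(contains_term(tokens, term) for term in clause) for clause in cnf)


# The query from the slide: one inner list per OR-clause.
QUERY = [
    ["test", "check", "detection", "detect"],
    ["HCG", "Human Chorionic Gonadotrophin", "Chorionic Gonadotropin",
     "Choriogonadotropin", "Choriogonin"],
]

hit = matches_cnf("Detection of chorionic gonadotropin in blood", QUERY)
miss = matches_cnf("HCG is a hormone", QUERY)
```

The first text satisfies both clauses, so `hit` is true; the second mentions HCG but no test-related term, so `miss` is false. Because this sketch does not stem, a word like "detecting" would not satisfy the first clause, which is one reason each clause lists several word forms.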

  5. Lemur CGI
     • Identify synonyms
     • 0.5 hours per topic

  6. Results at Large (xinfAP)
     • Not much difference on average
     • Worst manual queries have reasonable AP
     • Manual queries lower some high-AP topics slightly
     • Figure credit: Mihai Lupu
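For readers unfamiliar with the AP values discussed on this slide, the sketch below computes plain average precision for one ranked list. This is a simplified stand-in: xinfAP is an estimate of AP computed from sampled relevance judgments, and that estimation step is not reproduced here. The function name and the document IDs are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Plain (non-inferred) average precision for a single ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


# Hypothetical ranking: relevant documents at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
```

With relevant documents at ranks 1 and 3, the value is (1/1 + 2/3) / 2.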

  7. Observations
     • Weighting different query fields helped.
     • Boolean CNF query (manual interaction)
       • Good
         • Expressive
         • Helps a lot for hard (low AP) queries
       • Bad
         • Takes time & care to create & interact
         • Manual error in formulating those queries
         • Phrase or window restrictions improve top precision, but destroy lower level recall/precision
         • Difficult to identify from top rank, new tools needed
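The trade-off noted above for phrase or window restrictions can be illustrated with a small unordered-window check: a tight window accepts only near-adjacent mentions, while a wide one also accepts looser wording. This is a sketch under assumed names (`tokenize`, `in_window`) and made-up example sentences, not the operator implementation of any particular search engine.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def in_window(tokens, words, width):
    """True if all words occur, in any order, within `width` consecutive tokens."""
    wanted = {w.lower() for w in words}
    for start in range(len(tokens)):
        if wanted <= set(tokens[start:start + width]):
            return True
    return False


tight_doc = tokenize("human chorionic gonadotrophin was measured")
loose_doc = tokenize("chorionic tissue releases the hormone gonadotrophin")
pair = ["chorionic", "gonadotrophin"]

tight_in_2 = in_window(tight_doc, pair, 2)   # adjacent words fit a window of 2
loose_in_2 = in_window(loose_doc, pair, 2)   # words are 5 positions apart
loose_in_6 = in_window(loose_doc, pair, 6)   # a window of 6 covers both
```

The second document is rejected by the window of 2 but accepted by the window of 6, which shows how a strict restriction can drop documents that a looser match would have kept.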

  8. Comparisons with Best Runs
     • Fraunhofer-SCAI
       • Semantic search (similar to our CNF queries)
       • IPC classification filtering
       • Doc field based term weighting
     • Topics where our manual queries did better
       • TS-22 detect => detection test predict check determine determination
       • TS-29 minimum inhibitory concentration => …
       • Expanded all terms, but not all resulted in
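The TS-22 expansion listed above can be sketched as a lookup that widens each OR-clause with synonyms and then prints the query in the Boolean form used earlier in the deck. The synonym list comes from the slide; the dictionary layout and the function names `expand` and `format_cnf` are assumptions for illustration, and in the talk the synonyms were identified manually.

```python
# Synonyms for "detect" as listed for TS-22 on the slide.
SYNONYMS = {
    "detect": ["detection", "test", "predict", "check", "determine", "determination"],
}


def expand(clauses, synonyms):
    """Add known synonyms to every clause, keeping order and dropping duplicates."""
    expanded = []
    for clause in clauses:
        terms = []
        for term in clause:
            for candidate in [term] + synonyms.get(term, []):
                if candidate not in terms:
                    terms.append(candidate)
        expanded.append(terms)
    return expanded


def format_cnf(clauses):
    """Render clauses as '(a OR b) AND (c OR "d e")', quoting multi-word terms."""
    def quote(term):
        return f'"{term}"' if " " in term else term

    return " AND ".join(
        "(" + " OR ".join(quote(term) for term in clause) + ")"
        for clause in clauses
    )


expanded = expand([["detect"]], SYNONYMS)
query_text = format_cnf([["test", "check"], ["HCG", "Human Chorionic Gonadotrophin"]])
```

Terms without an entry in `SYNONYMS` pass through unchanged, so a clause can be expanded selectively.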

  9. Thanks to track organizers
     • NSF grant IIS-1018317
     • Questions?
