
Applying the KISS Principle with Prior-Art Patent Search

Learn how simplicity leads to better results in prior-art patent search, with findings on structured search, filtering, and the combination of terms and phrases. Key findings from the CLEF-IP 2009 study at Dublin City University are presented.


Presentation Transcript


  1. Applying the KISS Principle with Prior-Art Patent Search
     CLEF-IP, 22 Sep 2010
     Walid Magdy, Gareth Jones
     Dublin City University

  2. DCU participation in CLEF-IP 2009
     • The more text, the better the results
     • Structured search does not help
     • Filtering helps
     • Combination of terms and phrases does better
     • Word matching for search is not the best
     • Blind relevance feedback is ineffective
     • Part of the answer is within the question

  3. KISS
     • Keep It Simple and Straightforward
     • Three simple runs submitted:
       1. IR run (simple search)
       2. Cit run (straightforward citation extraction)
       3. IR+Cit run (combination of the IR and Cit runs)
     • Evaluation results (25 submitted runs):
       1. IR run: 3rd in recall
       2. Cit run: 1st in precision
       3. IR+Cit run: 2nd in MAP, recall, and PRES

  4. IR run
     • Different document versions of a patent are merged
     • Only the English parts are indexed (title, abstract, description, and claims)
     • The query is constructed from the same fields as follows:
       - unigrams with frequency > 2 from the "description" field
       - bigrams with frequency > 3 from all fields
     • French and German topics are translated using Google Translate
     • The first three levels of the classification are used to filter results
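The query construction and classification filter on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tokenizer, the function names `build_query` and `filter_by_classification`, and the idea that the first four characters of a classification code cover its first three levels are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercased alphabetic tokens; the slides do not describe tokenization.
    return re.findall(r"[a-z]+", text.lower())

def build_query(fields, min_unigram=2, min_bigram=3):
    """Build query terms from a topic patent.

    fields maps a field name to its text, e.g. {"title": ..., "description": ...}.
    """
    # Unigrams are counted in the description only and kept when frequency > 2.
    unigram_counts = Counter(tokenize(fields.get("description", "")))
    unigrams = [t for t, c in unigram_counts.items() if c > min_unigram]

    # Bigrams are counted over all fields and kept when frequency > 3.
    bigram_counts = Counter()
    for text in fields.values():
        tokens = tokenize(text)
        bigram_counts.update(zip(tokens, tokens[1:]))
    bigrams = [" ".join(b) for b, c in bigram_counts.items() if c > min_bigram]
    return unigrams, bigrams

def filter_by_classification(results, topic_codes, prefix_len=4):
    """Keep results that share a classification prefix with the topic.

    results is a list of (doc_id, [classification codes]) pairs.
    Assumption: the first `prefix_len` characters of a code (e.g. "G06F")
    correspond to its first three levels.
    """
    prefixes = {code[:prefix_len] for code in topic_codes}
    return [doc_id for doc_id, codes in results
            if any(code[:prefix_len] in prefixes for code in codes)]
```

Counting bigrams per field, as done here, avoids creating spurious bigrams across field boundaries; whether the original system did the same is not stated on the slide.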

  5. Cit and IR+Cit runs
     • All patent IDs are extracted from the description section of the patent topics
     • IDs that do not exist in the collection are filtered out
     • The remaining IDs are considered relevant documents
     • Only 771 out of 2,005 topics had citations that could be extracted from their text (2,307 citations)
     • The IR run is appended to the Cit run after removing duplicates to create the IR+Cit run
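The citation extraction and run combination described on this slide might look something like the sketch below. The ID pattern (a two-letter office code followed by digits), the normalized `XX-NNNNNNN` form, and the function names are hypothetical; the slides do not specify how IDs were matched.

```python
import re

# Hypothetical ID pattern: two-letter office code, optional space or hyphen, then digits.
PATENT_ID = re.compile(r"\b([A-Z]{2})[\s-]?(\d{5,10})\b")

def extract_citations(description, collection_ids):
    """Return cited patent IDs found in the description that exist in the collection."""
    seen, cited = set(), []
    for office, number in PATENT_ID.findall(description):
        pid = f"{office}-{number}"  # assumed normalized form used by the collection
        # Drop IDs that are not in the collection, and repeated mentions.
        if pid in collection_ids and pid not in seen:
            seen.add(pid)
            cited.append(pid)
    return cited

def combine_runs(cit_run, ir_run):
    """Append the IR run to the Cit run, skipping documents already listed."""
    seen = set(cit_run)
    merged = list(cit_run)
    for pid in ir_run:
        if pid not in seen:
            seen.add(pid)
            merged.append(pid)
    return merged
```

Placing the extracted citations ahead of the IR results keeps the high-precision items at the top of the ranked list, while the IR results supply recall for topics where few or no citations were found.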

  6. Results

  7. Conclusion & Future Work
     • When simpler approaches achieve better results than sophisticated ones: much research is still needed in this area
     • Extracted citations can be useful for relevance feedback
     • Better translations can be used for FR/DE topics
     • Faster translation techniques can be used to translate FR/DE documents

  8. Simply, this was the KISS principle with patent search
     Thank you
