
Overview of Patent Retrieval Task at NTCIR-3

Overview of Patent Retrieval Task at NTCIR-3. (2001/4 ~ 2002/10). Organizers: Makoto Iwayama (TIT / Hitachi) Atsushi Fujii (Univ. of Tsukuba / JST) Noriko Kando (NII) Akihiko Takano (NII) Collaboration with: JIPA (Japan Intellectual Property Association) Data provider: PATOLIS Co.



Presentation Transcript


  1. Overview of Patent Retrieval Task at NTCIR-3 (2001/4 ~ 2002/10) Organizers: Makoto Iwayama (TIT / Hitachi) Atsushi Fujii (Univ. of Tsukuba / JST) Noriko Kando (NII) Akihiko Takano (NII) Collaboration with: JIPA (Japan Intellectual Property Association) Data provider: PATOLIS Co.

  2. What is the Patent Retrieval Task? - Evaluation workshop for Patent Retrieval - • Evaluating/comparing patent retrieval techniques of participating systems • Investigating evaluation methods for patent retrieval techniques • Constructing/providing test collections for patent retrieval • Providing a forum for research groups working on patent retrieval

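  3. How to Construct Test Collection (Basic Idea) [Diagram: a search topic is given to each participating system (system1, system2), which searches the search target (doc. collection) and returns its search results (search results1, search results2) as runs; the runs are pooled into the pooled results; assessors (human experts) make relevance judgments on the pooled results to identify the relevant docs.; the relevant docs. are used for evaluation (recall/precision). The diagram labels the whole as the Test Collection.]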
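The pooling step in the diagram above can be sketched as follows. This is a minimal illustration, not the organizers' actual tooling: the function name `build_pool`, the run names, and the document IDs are all hypothetical, and the pool depth is left as a parameter (slide 29 mentions pooling at rank 30).

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Return the union of the top-`depth` documents of every run for one topic."""
    pool: Set[str] = set()
    for ranked_docs in runs.values():
        # Only the highest-ranked documents of each run enter the pool.
        pool.update(ranked_docs[:depth])
    return pool


# Hypothetical ranked results of two systems for a single search topic.
runs = {
    "system1": ["d3", "d1", "d7", "d2"],
    "system2": ["d1", "d9", "d3", "d5"],
}

pool = build_pool(runs, depth=3)
print(sorted(pool))  # ['d1', 'd3', 'd7', 'd9']
```

Only documents in the pool are shown to the assessors, so the pool depth controls the trade-off between judging effort and the chance of missing relevant documents.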

  4. Why Patents? • Patent information processing is a good test bed for research on information access • Real task, real user, real information need • Variety of tasks (IR, CLIR, Filtering, Summarization, Text Mining) • Unique document collection, different from other well-studied document collections • Searching patents may be different from searching news articles (in TREC, CLEF) or searching technical abstracts (in NTCIR).

  5. Characteristics of Patent Documents • Structured documents: claims, purposes, effects, embodiments, etc. • Unusual style for claims (esp. Japanese): many sub-topics in a single long sentence • Large variation in document length: the longest patent in our collection has 30,000 unique words!

  6. Characteristics of Patent Documents (cont.) • Many technical terms • Many new terms (defined by applicants) • General terms in claims (ex. floppy disk → external storage) • Classifications: IPC (International Patent Classification); FI (File Index) and FT (File Forming Term) for Japanese patents

  7. Characteristics of Patent Retrieval • Various search purposes: high recall is necessary in some situations; claim interpretation is necessary in some situations • Differences between industries: chemical formulas/materials are important in chemistry; images/diagrams are important in machinery; IPC is primary in some industries, but not in others (ex. business method patents)

  8. Main Task (Technical Survey) • patent survey before product development • patents as technical documents (cf. as legal documents) • cross-DB retrieval [Diagram labels: non-expert (manager), clipping + reading news articles; memo for supplementing, focusing, etc.; expert (patent searcher)]

  9. How to Construct Test Collection (Basic Idea) [Diagram repeated from slide 3: search topic → systems → search results (runs) → pooling → pooled results → relevance judgment by assessors (human experts) → relevant docs. → evaluation (recall/precision)]

  10. Example of Search Topic (JA)
     <TOPIC>
     <NUM>0004</NUM>
     <LANG>JA</LANG>
     <PURPOSE>技術動向調査</PURPOSE>
     <TITLE>バーコードなどの符号を比較し優劣を判定する装置</TITLE>
     <ARTICLE>
     <A-DOC>
     <A-DOCNO>JA-981031179</A-DOCNO>
     <A-LANG>JA</A-LANG>
     <A-SECTION>社会</A-SECTION>
     <A-AE>無</A-AE>
     <A-WORDS>189</A-WORDS>
     <A-HEADLINE>エポック社の特許侵害訴訟、バンダイが敗訴--東京地裁</A-HEADLINE>
     <A-DATE>1998-10-31</A-DATE>
     <A-TEXT> カードゲームの特許を侵害されたとして、玩具(がんぐ)製造会社のエポック社がバンダイに2億6400万円の損害賠償を求めた訴訟で、東京地裁は30日、約1億1400万円の支払いを命じた。 森義之裁判長は、バンダイが1992年7月~93年3月に製造・販売した小型ゲーム機「スーパーバーコードウォーズ」のキー操作などの機能について「エポック社が持つ特許の技術的範囲に属する」と指摘した。</A-TEXT>
     </A-DOC>
     </ARTICLE>
     <SUPPLEMENT>バーコードなどを読み込み、これに基づく数値を比較して勝敗を決定していればよい。</SUPPLEMENT>
     <DESCRIPTION>バーコードなどの符号を複数読み込ませ、これら符号に対応する数値を比較することにより、これらの優劣/勝敗の判定を行うことで対戦を行う装置にはどのようなものがあるか。</DESCRIPTION>
     <NARRATIVE>「スーパーバーコードウォーズ」とは、小型ゲーム機の一種であり、キャラクターなどが描かれたカードに記録されたバーコードを読み込ませ、プレーヤーが攻撃や防御などのキー操作を行うことで、半リアルタイムに対戦を行うものである。符号の例としては、バーコードや磁気コードなどがあるが、これらに限定するものではない。</NARRATIVE>
     <CONCEPT>符号 バーコード コード 優劣 勝敗 比較 判定</CONCEPT>
     <PI>PATENT-KKH-G-H01-333373</PI>
     </TOPIC>
     Field labels: <TITLE> <ARTICLE> <SUPPLEMENT> <DESCRIPTION> <NARRATIVE> <CONCEPT> <PI>

  11. Search Topics • 31 Japanese topics created by JIPA • English, Korean and Chinese (simplified and traditional) topics translated from Japanese

  12. How to Construct Test Collection (Basic Idea) [Diagram repeated from slide 3: search topic → systems → search results (runs) → pooling → pooled results → relevance judgment by assessors (human experts) → relevant docs. → evaluation (recall/precision)]

  13. Patent Collections (from PATOLIS Co.) • kkh: Publication of unexamined patent applications • jsh: JAPIO Patent Abstracts • paj: Patent Abstracts of Japan

  14. Relationships between Patent Collections • kkh (98, 99): full texts with author’s abstracts (in Japanese) • jsh (95-99): abstracts (in Japanese); modification of the original abstracts by JAPIO experts, with length normalization (around 400 words) and term normalization • paj (95-99): abstracts (in English); translation by JAPIO experts

  15. How to Construct Test Collection (Basic Idea) [Diagram repeated from slide 3: search topic → systems → search results (runs) → pooling → pooled results → relevance judgment by assessors (human experts) → relevant docs. → evaluation (recall/precision)]

  16. Runs • Mandatory runs • Search topic fields: <ARTICLE> + <SUPPLEMENT> only (i.e. cross-DB) • Search target: full texts (98, 99) • manual or automatic retrieval • mono-lingual or cross-lingual retrieval • Optional runs • TREC-styled ad-hoc runs recommended: search from <DESCRIPTION> (+ <NARRATIVE>)
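The field selection for the two run types can be illustrated with a short sketch. It assumes the topic is well-formed XML with the tags shown on slide 10. The topic content below is a shortened, hypothetical English stand-in, and `query_text` and the run-type names are illustrative only. Taking just the headline and body text from <ARTICLE> is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical topic using the tag names from the slide-10 example.
TOPIC_XML = """<TOPIC>
<NUM>0004</NUM>
<TITLE>barcode comparison device</TITLE>
<ARTICLE><A-DOC>
<A-DOCNO>JA-981031179</A-DOCNO>
<A-HEADLINE>toy maker wins patent suit</A-HEADLINE>
<A-TEXT>court orders damages over a barcode card game</A-TEXT>
</A-DOC></ARTICLE>
<SUPPLEMENT>reads barcodes and compares values</SUPPLEMENT>
<DESCRIPTION>devices that compare codes to decide a winner</DESCRIPTION>
<NARRATIVE>codes include barcodes and magnetic codes</NARRATIVE>
</TOPIC>"""


def _clean(text):
    """Collapse whitespace; tolerate missing text."""
    return " ".join((text or "").split())


def query_text(topic_xml, run_type, use_narrative=False):
    """Build the query string for one topic according to the run type."""
    topic = ET.fromstring(topic_xml)
    parts = []
    if run_type == "mandatory":
        # Cross-DB run: news article (headline + body, an assumption) + supplement.
        for tag in ("A-HEADLINE", "A-TEXT"):
            parts.extend(_clean(el.text) for el in topic.iter(tag))
        parts.append(_clean(topic.findtext("SUPPLEMENT")))
    elif run_type == "adhoc":
        # TREC-style ad-hoc run: description, optionally with the narrative.
        parts.append(_clean(topic.findtext("DESCRIPTION")))
        if use_narrative:
            parts.append(_clean(topic.findtext("NARRATIVE")))
    else:
        raise ValueError(f"unknown run type: {run_type}")
    return " ".join(p for p in parts if p)


print(query_text(TOPIC_XML, "mandatory"))
print(query_text(TOPIC_XML, "adhoc", use_narrative=True))
```

Keeping the field choice in one function makes it easy to produce a mandatory run and an ad-hoc run from the same topic file, which is what allows the two to be compared.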

  17. How to Construct Test Collection (Basic Idea) [Diagram repeated from slide 3: search topic → systems → search results (runs) → pooling → pooled results → relevance judgment by assessors (human experts) → relevant docs. → evaluation (recall/precision)]

  18. Manual Search before Pooling [Diagram: experts from JIPA (Japan Intellectual Property Association) carry out a preliminary search; the systems' search results are pooled into the original pool, which is skimmed into the skimmed pool; assessors (human experts) make relevance judgments, and the resulting relevant docs. are used for evaluation. Relevant docs. are labelled as found by systems and experts, by experts only, or by systems only.]

  19. Evaluation • Recall/Precision graph • Average precision • R-precision
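The three measures listed above can be computed from a ranked result list and a set of relevant documents. The sketch below uses the standard textbook definitions. The function names and the toy data are hypothetical, and the set of relevant documents would be either the "A" documents or the "A+B" documents, depending on which judgment level is being reported.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def r_precision(ranking, relevant):
    """Precision after R documents, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def interpolated_precision(ranking, relevant, levels=11):
    """Interpolated precision at evenly spaced recall levels (0.0 to 1.0)."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    curve = []
    for i in range(levels):
        level = i / (levels - 1)
        # Highest precision observed at any recall >= this level.
        candidates = [p for rec, p in points if rec >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


# Hypothetical run and judgments for one topic.
ranking = ["d1", "d4", "d3", "d8", "d2"]
relevant = {"d1", "d2", "d3"}

print(round(average_precision(ranking, relevant), 4))
print(round(r_precision(ranking, relevant), 4))
print([round(p, 2) for p in interpolated_precision(ranking, relevant)])
```

Averaging `average_precision` over all topics gives the mean average precision of a run. The interpolated values are the points plotted in a recall/precision graph.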

  20. Submitted Runs • 36 runs from 8 groups • 6 groups from Japan, 2 groups from overseas • 5 groups from universities, 3 groups from companies • 20 mandatory runs, 16 optional runs • 5 cross-lingual runs (EJ) from 2 groups

  21. Recall/Precision -- mandatory, A --

  22. Recall/Precision -- mandatory, A+B --

  23. Recall/Precision -- optional, A --

  24. Recall/Precision -- optional, A+B --

  25. Comparable Runs from the same group (brkly): ad-hoc run from Japanese; cross-DB run from Japanese; ad-hoc run from English; cross-DB run from English

  26. Some Results • Cross-DB retrieval is more difficult than ad-hoc retrieval • Cross-lingual retrieval (EJ) is more difficult than mono-lingual retrieval (JJ) • Pseudo relevance feedback may be effective • Refer to our SIGIR 2003 paper (to appear) for more details of the glass-box evaluations

  27. Median of Mean Average Precisions

  28. Human Experts vs. Systems - breakdown of relevant documents -

  29. Human Experts vs. Systems - recall of the “A” relevant documents experts found - pooling (rank = 30)
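The two comparisons named on slides 28 and 29 come down to set operations. This is a sketch under assumptions: the function names, document IDs, and runs are hypothetical, and the pool depth is a parameter (slide 29 uses rank 30).

```python
def breakdown(expert_found, system_found):
    """Split relevant documents by who found them."""
    return {
        "systems_and_experts": expert_found & system_found,
        "experts_only": expert_found - system_found,
        "systems_only": system_found - expert_found,
    }


def expert_recall_in_pool(expert_found, runs, depth):
    """Fraction of expert-found relevant docs that appear in the pool at `depth`."""
    if not expert_found:
        return 0.0
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return len(expert_found & pool) / len(expert_found)


# Hypothetical data for one topic.
expert_found = {"d1", "d2", "d6"}
runs = {
    "system1": ["d3", "d1", "d7", "d2"],
    "system2": ["d1", "d9", "d3", "d5"],
}

print(breakdown(expert_found, {"d1", "d3"}))
print(expert_recall_in_pool(expert_found, runs, depth=2))
print(expert_recall_in_pool(expert_found, runs, depth=4))
```

Computing the recall for increasing depths shows how many of the documents found in the experts' manual search the systems would have surfaced at a given pooling depth.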

  30. Results 2 • Medians of Mean Average Precisions vary topic by topic. • Numbers of relevant documents systems/experts found vary topic by topic.
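One way to see the per-topic variation noted above is to take the median, over all runs, of the average precision on each topic. Reading "median of mean average precisions" this way is an assumption of this sketch, and the run names, topic numbers, and scores are made up.

```python
from statistics import median

# Hypothetical average precision of each run on each topic: ap[run][topic].
ap = {
    "runA": {"0001": 0.40, "0002": 0.10},
    "runB": {"0001": 0.20, "0002": 0.30},
    "runC": {"0001": 0.60, "0002": 0.05},
}


def median_ap_per_topic(scores_by_run):
    """For each topic, the median over runs of that topic's average precision."""
    topics = sorted({t for scores in scores_by_run.values() for t in scores})
    return {
        # Runs that did not report a topic are skipped for that topic.
        t: median(s[t] for s in scores_by_run.values() if t in s)
        for t in topics
    }


print(median_ap_per_topic(ap))  # {'0001': 0.4, '0002': 0.1}
```

A low median on a topic suggests that the topic was hard for most systems, rather than for one particular run.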

  31. Proposal-based task was also supported • CRL: Using the Diff Command in Patent Documents • TIT: Rhetorical Structure Analysis of Japanese Patent Claims using Cue Phrases

  32. Summary • The first evaluation workshop for patent retrieval • Three kinds of patent collections • Technical survey task • Cross-DB / Cross-lingual retrieval • Relevance judgment by expert patent searchers (JIPA)

  33. Patent Retrieval Task at NTCIR-4 (currently on “Dry Run”) • Joint project with JIPA • 10 years of Japanese full texts (5 years as the search target) • 10 years of English abstracts • Invalidity search from claims • Feasibility task for automatically creating a “Patent Map”
