J-Class: a hybrid patent classification system

  IPC Workshop
25/02/2013

  2. Plan • Jouve • J-Class • Pre-processing • Similarity method • Semantic method • Combined method • Evaluations • Naïve Questions?

  3. Jouve • The Jouve Group provides customers with cross-media solutions for designing, enriching, showcasing and distributing content. By offering innovative turnkey solutions for publishing, digitization, business process outsourcing, IT and printing, we help our customers develop flexible strategies to gain the competitive edge in the digital market. • 3,000 employees • 25 locations, 15 in France • Around €150,000,000 turnover • 25% of sales come from exports

  4. Pre-processing • Classic linguistic pre-processing: sentence segmentation, tokenization, POS-tagging, lemmatization • Patent-specific pre-processing: "key-phrase" tagging • Key-phrase = part of the description that concisely describes the topic of the patent document • Detection of language inconsistencies: a small number of documents were ignored as a result

  5. Semantic method • 1. Construction of semantic models • term extraction; • semantic relation verification; • several filtering methods: representative terms, polysemy reduction, etc. • Language-specific methods are used (linguistic pre-processing, term extraction, lexical resources) • 2. Training • annotation of patent documents with the extracted terms; • calculation of a value for each term according to its frequency and position; • feeding the values to an SVM classifier.
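
The "value calculation according to frequency and position" step could look roughly like the sketch below. The slides do not give the formula, so the section names, the weights in `SECTION_WEIGHTS` and the function `term_features` are all assumptions made for illustration, and only single-word terms are handled. The resulting dictionary is the kind of feature vector that could then be passed to an SVM.

```python
from collections import Counter

# Hypothetical position weights: the slides only say that position matters,
# not which sections are used or how they are weighted.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "claims": 1.5, "description": 1.0}


def term_features(sections, terms):
    """Return {term: position-weighted frequency} for one patent document.

    sections: dict mapping a section name to its list of lemmas.
    terms: set of terms kept by the semantic model.
    """
    features = Counter()
    for name, lemmas in sections.items():
        weight = SECTION_WEIGHTS.get(name, 1.0)
        for lemma in lemmas:
            if lemma in terms:
                # Each occurrence counts once, scaled by where it appears.
                features[lemma] += weight
    return dict(features)


# Made-up toy document, already lemmatized.
doc = {
    "title": ["transgenic", "maize", "plant"],
    "abstract": ["maize", "plant", "resistant", "herbicide"],
    "description": ["maize", "seed", "field"],
}
features = term_features(doc, {"maize", "herbicide", "seed"})
```

Here "maize" scores highest because it occurs in the title, the abstract and the description, while "seed" only occurs once in the description.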

  6. Learning • Diagram labels: Terms Extraction; Classified Documents; Selected Terms; Semantic Network (WordNet); International Patent Classification; Semantic Consolidation; Sub Semantic Network; Relevant Terms

  7. Run • Diagram labels: Documents to be classified; Semantic Annotation; Annotated Documents; Classifier; Relevant Terms; Relevant Concepts; Relevant Patterns; Classified Documents

  8. Similarity method • Indexing – retrieval method using the Lemur System • Principle: • 1. Build an index using the target data • 2. Query using the test data: retrieval of the most similar patents • 3. Calculate the class of the query patent, using the classes of the retrieved indexed documents • 1 index per language • language-specific stop-word lists
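
Step 3 of the principle above can be sketched as a score-weighted vote over the retrieved patents. The slides do not say how the class is actually calculated, so the voting rule is an assumption, as are the function name `rank_classes` and the input format; no Lemur API is used here, the retrieval results are simply given as `(score, class)` pairs.

```python
from collections import defaultdict


def rank_classes(retrieved, top_n=3):
    """Rank candidate classes from retrieved (score, ipc_class) pairs.

    Each retrieved patent votes for its own class with its retrieval
    score; classes are ordered by total score (ties broken by name).
    """
    totals = defaultdict(float)
    for score, ipc_class in retrieved:
        totals[ipc_class] += score
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up retrieval results for one query patent.
hits = [(0.75, "F24D"), (0.5, "E03B"), (0.5, "F24D"), (0.25, "F17C"), (0.125, "F24H")]
candidates = rank_classes(hits)
```

With `top_n=3`, the function returns the three best candidate classes, which is the number of candidates each method hands to the hybrid system.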

  9. Hybrid system • Input: the 3 best candidates from each of the 2 methods above • from 3 to 6 distinct candidates • build classifiers on the fly, through 1 vs 1 training • Final score = sum of probability values obtained for each binary classifier • Diagram labels: Similarity; Classified Document; Decision; Document; Semantic
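
One possible reading of the final-score rule is sketched below: the candidates of both methods are merged, every pair of candidates is compared by a binary classifier, and each class accumulates the probability it receives in each comparison. The on-the-fly training itself is not shown. `pair_probability` is a hypothetical stand-in for a trained 1 vs 1 classifier, and the probability values are made up.

```python
from itertools import combinations


def merge_candidates(similarity_top, semantic_top):
    """Union of the two candidate lists, order preserved (3 to 6 classes)."""
    merged = []
    for ipc_class in list(similarity_top) + list(semantic_top):
        if ipc_class not in merged:
            merged.append(ipc_class)
    return merged


def hybrid_scores(candidates, pair_probability):
    """Sum, per class, the probabilities from every 1 vs 1 comparison.

    pair_probability(a, b) is a hypothetical callable standing in for a
    binary classifier trained on the fly; it returns P(a) against b.
    """
    scores = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        p_a = pair_probability(a, b)
        scores[a] += p_a
        scores[b] += 1.0 - p_a
    return scores


# Toy stand-in for the binary classifiers (made-up probabilities).
TOY = {
    ("F24D", "E03B"): 0.75, ("F24D", "F17C"): 0.75, ("F24D", "F24H"): 0.5,
    ("E03B", "F17C"): 0.5, ("E03B", "F24H"): 0.25, ("F17C", "F24H"): 0.5,
}

merged = merge_candidates(["F24D", "E03B", "F17C"], ["F24D", "F24H", "E03B"])
scores = hybrid_scores(merged, lambda a, b: TOY[(a, b)])
best = max(scores, key=scores.get)
```

In this toy run the two methods share two candidates, so four distinct classes are compared in six binary comparisons.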

  10. Fine-grained classification • Biotechnology domain (A01H) • 19 classes • Learning with 4,005 documents, evaluation with 1,251 documents • Use of a validated terminology of the domain (INRA/MIG)

  11. CLEF IP Results • 2.7M documents (1.3M patents) used for the learning stage • 600 classes • Learning with documents dated before 2002 • Evaluation with documents dated after 2002

  12. Pre-classification results • 4.5M patents used for the learning stage • 100 classes • Evaluation with 130,000 patents

  13. Naïve Questions • How can we improve our system's performance? • more patents -> better results • What does 80% mean for patent office staff? • What is the inter-annotator agreement between examiners? • What is the best achievable performance for an automatic classification system? • it is not 100%, that is for sure

  14. Suspicious references • US2007215593 : Diaper rash prevention apparatus • IPC : A21B1/00 (Bakers’ ovens) • ECLA : A47K11/02 (Sanitary equipment) • US6459426B1 : Monolithic integrated circuit implemented in a digital display unit for generating digital data elements from an analog display signal received at high frequencies. “The present invention relates to digital display units used in computer systems” • IPC : B60R25/10F; B62H5/00; B62H5/20 (Vehicles; Cycles)

  15. Non mutually exclusive classes • ECLA : E05D13/04 -> IPC class E05C17/60 • but ECLA class E05C17/60 also exists!!! • ECLA : E05D13/04 Fasteners specially adapted for holding sliding wings open • E05D13/06 with notches • E05D13/06 acting by friction • ECLA : E05C17/60 holding sliding wings open • E05C17/62 using notches • E05C17/64 by friction

  16. Some confusions • US2006140886 : Tanning Aids - Claim 1. A tanning aid comprising a polymethyl methacrylate shaped body which comprises 0.1 to 1.5% … • EPO class : A61K8 Cosmetic preparations • J-Class : C08F Macromolecular compounds… • US2007282244 - Glaucoma Implant with Anchor - Claim 1. A method for reducing intraocular pressure… • EPO class : A61M27 Implants devices for drainage of body fluids from one part of the body to the other (intraocular A61F9/00) • J-Class : A61F9 Method or devices for treatment of the eyes • A61F9/007V Apparatus for modifying intraocular pressure, e.g. for glaucoma treatment

  17. Some confusions • GB1077771 : Improvements in or relating to hot water storage containers - Claim 1. A hot water storage container of double walled construction, the walls being spaced apart to provide a cavity to receive heat insulation material… • EPO class : E03B11 Arrangements or adaptations of tanks for water supply • J-Class : F24D Domestic hot-water supply systems; Elements or Components therefor

  18. True confusions • EPO : F24H1/18 Water-storage heaters • J-Class : F17C Liquefied gas containers

  19. True confusion • EPO : B62K25 Axle suspension • J-Class : B61F Rail vehicle suspension

