
Mr. JOTL: A User Friendly Matching Software



Presentation Transcript


  1. 2nd "NameGame" APE-INV workshop. Mr. JOTL: A User Friendly Matching Software. Stéphane Lhuillery, Julio Raffo & Fernando Lladós

  2. Outline
     • Background
     • Objectives & Rationale
     • Results
     • User Friendly Software
     • Concept
     • Alpha test
     • Further steps

  3. Background
     • Automatic patent retrieval is becoming a necessity given the size of data sets.
     • A growing literature looks at this NameGame:
        • On firms' names: Derwent, 2002; Magerman et al., 2006; Hall, 2006; Thoma et al., 2007.
        • On inventors' names: Trajtenberg et al., 2006; Hoisl, 2006; Lissoni et al., 2006; Mariani et al., 2007; Raffo & Lhuillery, 2009; etc.
     • Our ESF Project outcomes:
        • New matching best practices
        • APE-INV database

  4. Objectives of the NameGame
     Maximizing true positives:
     • Minimize false negatives (= higher recall)
     • Minimize false positives (= higher precision)
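
The two objectives on this slide can be made concrete with a small sketch. It assumes that a matching run is summarized as a set of record-ID pairs judged to refer to the same person, and that a hand-checked set of true pairs is available; the function name `precision_recall` and the sample IDs are hypothetical and do not come from the slides.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for two collections of record-ID pairs.

    Pairs are treated as unordered, so ("a1", "a2") equals ("a2", "a1").
    """
    predicted = {frozenset(p) for p in predicted}
    truth = {frozenset(p) for p in truth}

    true_pos = len(predicted & truth)    # pairs found and correct
    false_pos = len(predicted - truth)   # pairs found but wrong
    false_neg = len(truth - predicted)   # correct pairs that were missed

    # Fewer false positives -> higher precision.
    precision = true_pos / (true_pos + false_pos) if predicted else 1.0
    # Fewer false negatives -> higher recall.
    recall = true_pos / (true_pos + false_neg) if truth else 1.0
    return precision, recall


# Hypothetical example: three pairs proposed, two of them correct,
# and one true pair missed.
predicted_pairs = {("a1", "a2"), ("a1", "a3"), ("b1", "b2")}
true_pairs = {("a1", "a2"), ("b1", "b2"), ("c1", "c2")}
precision, recall = precision_recall(predicted_pairs, true_pairs)
```

Here both measures come out at 2/3: one of the three proposed pairs is a false positive, and one of the three true pairs is a false negative.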

  5. Rationale: a three-step game

  6. Examples on matching (EPFL)

  7. Examples on filtering (EPFL)

  8. What have we learned so far?
     • General
        • Matching algorithms are not perfect, but they considerably improve the results.
     • Cleaning step
        • The origin of the data substantially changes the data preparation process.
     • Matching step
        • There is a hierarchy among algorithms, although it is specific to each particular case.
     • Filtering step
        • The availability of supplementary data enhances or constrains the disambiguation process.
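
The three steps named above (cleaning, matching, filtering) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the algorithms used in Mr. JOTL: the similarity measure is Python's standard `difflib.SequenceMatcher` used as a stand-in, the 0.85 threshold is arbitrary, the filter uses a hypothetical `city` field as supplementary data, and all records are invented.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def clean(name):
    """Cleaning step: strip accents and punctuation, lowercase, sort tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.sub(r"[^a-z ]", " ", no_accents.lower()).split()
    return " ".join(sorted(tokens))


def match(records, threshold=0.85):
    """Matching step: keep pairs whose cleaned names are similar enough."""
    pairs = []
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, clean(a["name"]), clean(b["name"])).ratio()
        if score >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs


def filter_pairs(pairs, records):
    """Filtering step: keep only pairs that agree on supplementary data (city)."""
    by_id = {r["id"]: r for r in records}
    return [(i, j) for i, j in pairs if by_id[i]["city"] == by_id[j]["city"]]


# Invented records, for illustration only.
records = [
    {"id": 1, "name": "Müller, Hans", "city": "Zurich"},
    {"id": 2, "name": "Hans Muller", "city": "Zurich"},
    {"id": 3, "name": "Hans Muller", "city": "Berlin"},
    {"id": 4, "name": "Anna Rossi", "city": "Zurich"},
]
candidates = match(records)               # name-based candidate pairs
kept = filter_pairs(candidates, records)  # pairs surviving the city filter
```

In this toy run the matching step proposes all three pairs among records 1 to 3, and the filter keeps only the pair (1, 2). With no supplementary field to filter on, the Berlin record could not be told apart, which is the sense in which data availability constrains disambiguation.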

  9. Why create user-friendly software?
     [Diagram; labels: ISI Thomson Survey PATSTAT / APE-INV Database PATVAL SCOPUS EU FWProgram]

  10. Concept behind Mr. JOTL
     • Intuitive for beginner users
     • Flexible on inputs and their preparation
     • Fair variety of standard matching processes
     • Adaptable on the disambiguation filters
     • But soundly customizable for advanced users
     • Conceived and coded to be expanded in the future by multiple developers

  11. From concept to reality
     • (OK, for the moment just an alpha!)

  12. Inputs

  13. Parsing

  14. Matching

  15. Disambiguation SSM

  16. Let's test it!

  17. Technical notes
     • OS supported (so far):
        • Windows XP, Vista, 7 (Server & x64)
     • Coded in C#
        • Pros:
           • Free development environment
           • Low cost of entry
           • Large developer community
        • Cons:
           • Proprietary language and libraries
           • Less efficient memory management
     • Libraries needed:
        • Scintilla: open source lexer and syntax highlighter
     • Customizable code:
        • C# & VBA
     • Suggested environment for future development:
        • Visual Studio (Express version is free to use)
        • Mono on Linux

  18. Further developments
     • Fully coding the existing algorithms.
     • Testing performance against large data sets (> 1 million records).
     • Pre-setting standard routines (as XML).
     • Drafting documentation (+ video).
     • Proof-testing with first-time users (at EPFL).

  19. Openness and its governance
     • How to share it?
        • GitHub?
        • Forums
     • How to develop a dynamic sharing community?

  20. Thank you!
