
Combining Resources: Taxonomy Extraction from Multiple Dictionaries

Rogelio Nazar & Maarten Janssen, IULA, Universitat Pompeu Fabra, Barcelona



Presentation Transcript


  1. Combining Resources: Taxonomy Extraction from Multiple Dictionaries
     Rogelio Nazar & Maarten Janssen, IULA, Universitat Pompeu Fabra, Barcelona

  2. Information from Dictionaries
     • Dictionaries are a good source of information
     • Long tradition of taxonomy extraction
     • Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008)
     • Exploiting machine-readable dictionaries
     • Parsing definitional phrases
     • Pattern extraction, shallow parsing
     • Full treatment of a single dictionary

  3. Combining Resources
     • There is a lot of information available
     • Hand-crafted, high-quality resources
     • Combining yields new data
     • Taxonomy from multiple dictionaries
     • Language-independent shallow method
     • Combining definitions of the same word
     • Various dictionaries, online versions
     • DRAE, DGLE, Clave, DEM
     • Frequency-based

  4. Consolidated Genus Terms
     • Dictionaries differ
     • Different lexicon and definitions
     • Even if only for legal reasons
     • Hyperonym should be the same
     • A cat is an animal
     • Unless there is uncertainty in the hyperonym
     • Most dictionaries should use the same genus
     • Statistically relevant
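
Editor's sketch of the idea on this slide: if several dictionaries define the same headword, the genus term should recur across their definitions, so a plain frequency count over the pooled definitions brings it to the top. The function name `candidate_counts` and the three definitions of "gato" are invented for illustration and are not taken from DRAE, DGLE, Clave or DEM; the tokenization is a simplifying assumption.

```python
import re
from collections import Counter

def candidate_counts(definitions):
    """Raw frequency of every word over all definitions of one headword."""
    counts = Counter()
    for text in definitions:
        # Lowercase and split on non-word characters; no parsing involved.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

# Hypothetical definitions of "gato", one per dictionary (invented data).
definitions = [
    "animal doméstico de la familia de los félidos",
    "mamífero felino, animal de compañía",
    "animal pequeño que caza ratones",
]

counts = candidate_counts(definitions)
print(counts.most_common(3))
```

Function words such as "de" score as high as the genus "animal" here, which is why a cleanup step with a stop-list is still needed.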

  5. Example
     3x: ablandabrevas, persona
     2x: com., inútil
     1x: substantivo, común, fig.

  6. Raw HTML input
     • Directly from harvested text
     • With begin/end tags
     • No textual analysis
     • More than definitions
     • Examples, multiple senses, etc.
     • Sense matching impossible
     • Entries unsystematic
     • Dictionaries do not match in senses

  7. Cleanup
     • Minimum number of dictionaries
     • Raw frequency count
     • Hyperonym tends to be repeated
     • Candidates have to be words
     • Of the same word-class
     • Use of a stop-list
     • Dictionary-generated
     • Words that occur in more than 10% of entries
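
Editor's sketch of the cleanup steps listed on this slide, under stated assumptions: the stop-list is built from the dictionary text itself (words found in more than 10% of entries), headwords defined in too few dictionaries are skipped, and the remaining words are ranked by raw frequency. All names (`build_stop_list`, `rank_candidates`, `min_dictionaries`) and all sample entries are hypothetical, the threshold of three dictionaries is an arbitrary choice, and the word-class filter is left out.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def build_stop_list(entries, max_share=0.10):
    """Words that occur in more than max_share of all dictionary entries."""
    entry_freq = Counter()
    for text in entries:
        # A set, so each entry contributes at most once per word.
        entry_freq.update(set(tokenize(text)))
    limit = max_share * len(entries)
    return {word for word, n in entry_freq.items() if n > limit}

def rank_candidates(definitions, stop_list, min_dictionaries=3):
    """Raw frequency ranking of non-stop words for one headword."""
    if len(definitions) < min_dictionaries:
        return []  # too few dictionaries to trust the counts
    counts = Counter(
        word
        for text in definitions
        for word in tokenize(text)
        if word not in stop_list
    )
    return counts.most_common()

# Invented entries standing in for a whole dictionary.
entries = [
    "acción de descubrir", "baile popular de Colombia", "teoría filosófica",
    "cuerno de toro", "persona inútil", "animal doméstico",
    "palo de bandera", "instrumento musical", "planta tropical", "mineral duro",
]
stop_list = build_stop_list(entries)

# Invented definitions of one headword, one per dictionary.
definitions = ["baile de origen africano", "baile popular", "danza o baile de parejas"]
ranking = rank_candidates(definitions, stop_list)
print(ranking[0])
```

Deriving the stop-list from the dictionaries themselves keeps the method language-independent, since no external list of function words is required.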

  8. # deconstrucción (3 dictionaries)
       teoría 2 1
       EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;

     # descubrimiento (5 dictionaries)
       acción 3 3
       cosa 3 5
       efecto 2 -
       EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;

     # cumbia (5 dictionaries)
       danza 2 -
       EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;

     # asta (5 dictionaries)
       mar 6 -
       lanza 6 -
       media 5 -
       toro 5 -
       cuerno 5 -
       bandera 4 -
       EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;

  9. WordNet Verification
     • WordNet (still) best available taxonomy
     • Not the best resource for evaluation
     • Automatic verification
     • 100 random nouns
     • Best 5 hyperonym candidates
     • Match when candidate is in the chain
     • Only about 50% accuracy
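
Editor's sketch of the automatic check described on this slide: a noun counts as verified when one of its best five candidates appears in its EWN hyperonym chain. The chains and candidates below are abridged from the output example on slide 8. Exact string matching and the names `is_verified` and `accuracy` are assumptions; the slides do not state how a match was decided.

```python
def is_verified(candidates, chain, top_n=5):
    """True if any of the top_n candidates occurs in the hyperonym chain."""
    # Assumption: exact match, so "danza" does not match "danza popular".
    return any(candidate in chain for candidate in candidates[:top_n])

def accuracy(candidates_by_noun, chain_by_noun, top_n=5):
    """Share of nouns with at least one candidate found in the chain."""
    hits = sum(
        is_verified(candidates, chain_by_noun[noun], top_n)
        for noun, candidates in candidates_by_noun.items()
    )
    return hits / len(candidates_by_noun)

# Abridged EWN chains from slide 8, with the depth numbers dropped.
chain_by_noun = {
    "descubrimiento": {"descubrimiento", "logro", "revelación", "realización",
                       "acción", "hecho", "acto"},
    "cumbia": {"cumbiamba", "cumbia", "baile regional", "danza popular",
               "baile social", "baile"},
}
# Candidates in the order they are listed on slide 8.
candidates_by_noun = {
    "descubrimiento": ["acción", "cosa", "efecto"],
    "cumbia": ["danza"],
}

print(accuracy(candidates_by_noun, chain_by_noun))
```

With exact matching, "danza" misses "danza popular" even though a human would accept it, which illustrates why the chain lookup can understate the quality of the candidates.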

  10. Manual post-verification

  11. WordNet vs. Dictionary
     • WordNet
         • Many intermediate/artificial levels
         • Compulsory hyperonym
         • Contains proper names
     • Dictionaries
         • More word-senses
         • Alternative definitions (synonymy, paraphrase, …)
     • Differences
         • Different choice of hyperonym
         • Different lexicon

  12. Human post-evaluation

  13. Effect of the number of dictionaries

  14. Thank you • Questions?
