Combining Resources: Taxonomy Extraction from Multiple Dictionaries Rogelio Nazar & Maarten Janssen IULA, Universitat Pompeu Fabra, Barcelona
Information from Dictionaries • Dictionaries are a good source of information • Long tradition of taxonomy extraction • Calzolari (1977), Amsler (1981), Chodorow et al. (1985), Fox et al. (1988), Alshawi (1989), Boguraev (1991), Barrière & Popowich (1996), Chang (1998), Renau & Battaner (2008) • Exploiting machine-readable dictionaries • Parsing definitional phrases • Pattern extraction, shallow parsing • Full treatment of a single dictionary
Combining Resources • There is a lot of information available • Hand-crafted, high-quality resources • Combining yields new data • Taxonomy from multiple dictionaries • Language-independent shallow method • Combining definitions of the same word • Various dictionaries, online versions • DRAE, DGLE, Clave, DEM • Frequency-based
Consolidated Genus Terms • Dictionaries differ • Different lexicon and definitions • Even if only for legal reasons • Hyperonym should be the same • A cat is an animal • Unless there is uncertainty in the hyperonym • Most dictionaries should use the same genus • Statistically relevant
Example: ablandabrevas • 3x: persona • 2x: com., inútil • 1x: substantivo, común, fig.
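The consolidation idea above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `candidate_counts` is hypothetical, the three "cat" definitions are invented toy data, and counting each word at most once per dictionary is an assumption about how the frequencies are computed.

```python
from collections import Counter
import re

def candidate_counts(definitions):
    """Count, for each word, how many dictionary entries contain it.

    `definitions` holds one raw text per dictionary for the same headword.
    Assumption: a word is counted at most once per dictionary.
    """
    counts = Counter()
    for text in definitions:
        tokens = set(re.findall(r"\w+", text.lower()))
        counts.update(tokens)
    return counts

# Invented toy definitions of "cat" (not taken from any real dictionary).
toy_definitions = [
    "small domestic animal that hunts mice",
    "a carnivorous animal kept as a pet",
    "feline animal; a domestic pet",
]
counts = candidate_counts(toy_definitions)
best_word, best_count = counts.most_common(1)[0]
```

With this toy input, the word shared by all three definitions ("animal") ends up as the top candidate, which mirrors the "3x persona" pattern in the example slide.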
Raw HTML input • Directly from harvested text • With begin/end tags • No textual analysis • More than definitions • Examples, multiple senses, etc. • Sense matching impossible • Entries unsystematic • Dictionaries do not match in senses
Cleanup • Minimum number of dictionaries • Raw frequency count • Hyperonym tends to be repeated • Candidates have to be words • Of the same word class • Use of a stop-list • Generated from the dictionaries • Words that occur in more than 10% of entries
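A minimal sketch of the stop-list and frequency filters listed above, under stated assumptions: `build_stoplist` and `filter_candidates` are hypothetical names, the 10% threshold is read as "strictly more than 10% of all entries", and `min_count=2` is an illustrative default that the slides do not specify. The word-class check is not covered here.

```python
from collections import Counter

def build_stoplist(entries, threshold=0.10):
    """Return words that occur in more than `threshold` of all entries.

    `entries` is a list of token lists, one list per dictionary entry.
    """
    entry_freq = Counter()
    for tokens in entries:
        entry_freq.update(set(tokens))  # count each word once per entry
    limit = threshold * len(entries)
    return {word for word, n in entry_freq.items() if n > limit}

def filter_candidates(counts, stoplist, min_count=2):
    """Keep candidates seen at least `min_count` times and not stop-listed."""
    return [(word, n) for word, n in counts.most_common()
            if n >= min_count and word not in stoplist]

# Invented toy corpus of 20 entries: "de" occurs in 5, "animal" in 2.
toy_entries = (
    [["de", f"w{i}"] for i in range(5)]
    + [["animal", f"x{i}"] for i in range(2)]
    + [[f"y{i}"] for i in range(13)]
)
stoplist = build_stoplist(toy_entries)
kept = filter_candidates(Counter({"de": 4, "animal": 3, "pet": 1}), stoplist)
```

Generating the stop-list from the dictionaries themselves keeps the method language-independent: no hand-made list of function words is needed.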
# deconstrucción (3 dictionaries)
teoría 2 1
EWN: 0.desconstrucción; 0.deconstrucción; 1.teoría filosófica; 1.doctrina filosófica; 2.filosofía; 3.creencia; 4.contenido mental; 5.conocimiento; 5.cognición; 6.rasgo psicológico;

# descubrimiento (5 dictionaries)
acción 3 3
cosa 3 5
efecto 2 -
EWN: 0.descubrimiento; 1.logro; 1.presentación; 1.revelación; 2.realización; 2.información; 2.exposición; 3.acción; 3.hecho; 3.acto de habla; 3.comunicación visual; 4.acto; 4.actividad humana; 4.comunicación; 5.relación social; 6.relación; 7.abstracción;

# cumbia (5 dictionaries)
danza 2 -
EWN: 0.cumbiamba; 0.cumbia; 1.baile regional; 1.danza popular; 2.baile social; 3.baile; 4.recreación; 4.diversión; 5.actividad; 6.acto; 6.actividad humana;

# asta (5 dictionaries)
mar 6 -
lanza 6 -
media 5 -
toro 5 -
cuerno 5 -
bandera 4 -
EWN: 0.cuerno; 0.asta; 1.tomadero; 1.materia animal; 1.cogedero; 1.bastón; 1.agarradera; 1.asimiento; 1.asidero; 1.asa; 2.materia; 2.apéndice; 2.vara; 2.palo; 3.porción; 3.sustancia; 3.parte; 3.herramienta; 4.utillaje; 5.artefacto; 6.objeto físico; 6.cosa; 6.objeto; 6.objeto inanimado; 7.competente; 7.respirar; 7.capaz; 7.entidad;
WordNet Verification • WordNet (still) best available taxonomy • Not the best resource for evaluation • Automatic verification • 100 random nouns • Best 5 hyperonym candidates • Match when candidate in chain • Only about 50% accuracy
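The verification step can be sketched as follows. This is a hedged illustration rather than the evaluation script used for the reported figure: `matches_chain` and `accuracy` are hypothetical names, the hyperonym chains are passed in as plain lists of terms (no WordNet API is called), matching is assumed to be exact and case-insensitive, and the toy data is invented.

```python
def matches_chain(candidates, chain, top_n=5):
    """True if any of the best `top_n` candidates occurs in the chain.

    `candidates` is ordered best-first; `chain` is the list of terms on
    the noun's hyperonym chain. Assumption: exact, case-insensitive match.
    """
    chain_terms = {term.lower() for term in chain}
    return any(c.lower() in chain_terms for c in candidates[:top_n])

def accuracy(candidates_by_noun, chain_by_noun, top_n=5):
    """Fraction of nouns with at least one top candidate in their chain."""
    nouns = list(candidates_by_noun)
    if not nouns:
        return 0.0
    hits = sum(
        matches_chain(candidates_by_noun[n], chain_by_noun.get(n, []), top_n)
        for n in nouns
    )
    return hits / len(nouns)

# Invented toy data: one noun matches its chain, the other does not.
toy_candidates = {"cat": ["animal", "pet"], "cumbia": ["danza"]}
toy_chains = {
    "cat": ["feline", "carnivore", "mammal", "animal"],
    "cumbia": ["baile regional", "baile social", "baile"],
}
score = accuracy(toy_candidates, toy_chains)
```

An exact-match criterion like this one is strict: a valid dictionary hyperonym that WordNet expresses with a different word or a multiword term counts as a miss, which is one reason a low score need not mean the extracted candidates are wrong.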
WordNet vs. Dictionary • WordNet • Many intermediate/artificial levels • Compulsory hyperonym • Contains proper names • Dictionaries • More word senses • Alternative definitions (synonymy, paraphrase, …) • Differences • Different choice of hyperonym • Different lexicon
Thank you • Questions?