
Complete and Consistent Annotation of WordNet with the Top Concept Ontology

Javier Álvez, Jordi Atserias, Jordi Carrera, Salvador Climent, Egoitza Laparra, Antoni Oliver and German Rigau. Basque Country Univ., Pompeu Fabra Univ. (Barcelona), Open Univ. of Catalonia (Barcelona).


Presentation Transcript


  1. Complete and Consistent Annotation of WordNet with the Top Concept Ontology • Javier Álvez, Jordi Atserias, Jordi Carrera, Salvador Climent, Egoitza Laparra, Antoni Oliver and German Rigau • Basque Country Univ., Pompeu Fabra Univ. (Barcelona), Open Univ. of Catalonia (Barcelona)

  2. Introduction • Four years of work • Full annotation of WordNet's nouns with semantic features (EWN TCO) • Intended to be an important semantic resource for NLP (selectional preferences, synset clustering, reasoning…)

  3. Result • 65,989 noun concepts (synsets) = 116,364 noun lexemes (variants), consistently annotated • Average of 6.47 features per synset • Features organized in a multilevel hierarchy

  4. Structure of the talk • Methodology • Examples and Discussion • Conclusions

  5. Methodology • Annotation of the Inter-Lingual Index (= EnWN1.6, SpaWN, mapping to other WNs…) with the nodes/features of the TCO (a shallow ontology defined in the EWN Project [Vossen et al. 1998]) • Methodology based on: • INCOMPATIBILITY OF ONTOLOGICAL INFORMATION • SUBSUMPTION BLOCKAGE POINTS

  6. The Top Concept Ontology • Organized in three orders of entities: • 1st Order (physical entities) • 2nd Order (situations) • 3rd Order (abstract entities)

  7. The Top Concept Ontology • 1st Order entities organized in four Qualia-like features: • Origin (Artifact, Natural…) • Form (Object, Substance…) • Composition (Group, Part) • Function (Building, Container, Vehicle…)

  8. The Top Concept Ontology • 2nd Order Entities organized in two dimensions: • Situation Type: Dynamic (Bounded Events, Unbounded Events) & Static (Properties, Relations) • Situation Component: (Cause, Manner, Modal…) • 3rd Order Entities: not further subdivided

  9. Methodology • We modify the structure of neither the TCO nor WN (=> future work). We just annotate. • We declared pairs of TCO properties as incompatible (e.g. Natural vs. Artifact, Substance vs. Object) • Initial annotation situation: in EWN, TCO features were manually assigned to a basic set of 1024 EWN synsets (= Base Concepts)

  10. Methodology • We automatically annotated the rest of the Top Synsets (from the BCs up to the Top) using a table of equivalence between WordNet Semantic Files and TCO features (e.g. NounAct <=> Agentive, NounAttribute <=> Property) • We performed a fully automatic top-down expansion of this information via the WN1.6 hierarchy (feature inheritance)
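
The top-down expansion described on this slide can be pictured roughly as follows. This is only a minimal sketch, not the authors' implementation: the function name `expand_top_down`, the dictionary layout, and the toy synsets ("act", "activity", "game") are assumptions made for illustration; only the NounAct <=> Agentive equivalence comes from the slide.

```python
from collections import deque

def expand_top_down(hyponyms, seed_features):
    """Propagate every feature of a synset down to all of its hyponyms.

    hyponyms: dict mapping a synset to the list of its direct hyponyms.
    seed_features: dict mapping annotated synsets to their TCO features.
    Both structures are hypothetical stand-ins for the WN1.6 data.
    """
    features = {s: set(f) for s, f in seed_features.items()}
    queue = deque(seed_features)
    while queue:
        synset = queue.popleft()
        for child in hyponyms.get(synset, ()):
            child_feats = features.setdefault(child, set())
            new = features[synset] - child_feats
            if new:
                # The child gained something, so its own hyponyms
                # must be revisited as well.
                child_feats |= new
                queue.append(child)
    return features

# Toy hierarchy (assumed): a top synset seeded via NounAct <=> Agentive.
hyponyms = {"act": ["activity"], "activity": ["game"]}
seeds = {"act": {"Agentive"}}
expanded = expand_top_down(hyponyms, seeds)
print(sorted(expanded["game"]))
```

Because feature sets only ever grow, the loop terminates even if a synset is reachable through several hypernyms.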

  11. Methodology • This caused feature incompatibilities to arise: • about 225,000 conflicts in 25,000 synsets • Causes: • Wrong manual annotation in EWN • Wrong TCO-SF equivalence • ... but basically: • Subsumption in WN does not always work • ISA Overloading etc. • Multiple inheritance in WN

  12. Methodology • We manually checked all feature incompatibilities in order to: • (i) add and/or delete ontological features • (ii) set inheritance blockage points. • A blockage point is an annotation in WN1.6 which breaks the ISA relation between two synsets, so that no inheritance is allowed.
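
Detecting such conflicts amounts to checking each synset's feature set against the declared incompatible pairs. The sketch below is illustrative only: the name `find_conflicts` and the toy synsets are assumptions, and the two pairs listed are just the examples given earlier in the talk (Natural vs. Artifact, Substance vs. Object), not the full list used by the authors.

```python
# Example incompatible pairs taken from the talk; the real list is longer.
INCOMPATIBLE = {
    frozenset({"Natural", "Artifact"}),
    frozenset({"Substance", "Object"}),
}

def find_conflicts(features, incompatible=INCOMPATIBLE):
    """Return, per synset, the incompatible feature pairs it carries."""
    conflicts = {}
    for synset, feats in features.items():
        clashes = sorted(
            tuple(sorted(pair)) for pair in incompatible if pair <= set(feats)
        )
        if clashes:
            conflicts[synset] = clashes
    return conflicts

# Hypothetical annotation after expansion.
annotated = {
    "synset_a": {"Natural", "Artifact", "Object"},
    "synset_b": {"Natural", "Object"},
}
print(find_conflicts(annotated))
```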

  13. A simple example • island; city; Java; Bandung

  14. A simple example • island =NATURAL; city =ARTIFACT; Java; Bandung

  15. A simple example • island =NATURAL; city =ARTIFACT; Java +NATURAL; Bandung +NATURAL +ARTIFACT

  16. A simple example • island =NATURAL; city =ARTIFACT; Java +NATURAL; Bandung +ARTIFACT
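
The effect of a blockage point on this example can be reproduced with a small sketch. The ISA links used here (Java ISA island, Bandung ISA city, Bandung ISA Java) are an assumption read off the slide labels, as is the choice of the Bandung-to-Java link as the blocked one; the function name `inherit` and the data layout are likewise hypothetical.

```python
def inherit(hypernyms, own, blocked=frozenset()):
    """Compute own plus inherited features, skipping blocked ISA links.

    hypernyms: dict mapping a synset to its direct hypernyms (assumed acyclic).
    own: dict mapping a synset to its manually assigned features.
    blocked: set of (hyponym, hypernym) pairs marked as blockage points.
    """
    resolved = {}

    def features_of(synset):
        if synset not in resolved:
            feats = set(own.get(synset, ()))
            for parent in hypernyms.get(synset, ()):
                if (synset, parent) not in blocked:
                    feats |= features_of(parent)
            resolved[synset] = feats
        return resolved[synset]

    for synset in set(hypernyms) | set(own):
        features_of(synset)
    return resolved

# Assumed ISA links for the slide's example.
hypernyms = {"Java": ["island"], "Bandung": ["city", "Java"]}
own = {"island": {"Natural"}, "city": {"Artifact"}}

before = inherit(hypernyms, own)
after = inherit(hypernyms, own, blocked={("Bandung", "Java")})
print(sorted(before["Bandung"]))  # ['Artifact', 'Natural']
print(sorted(after["Bandung"]))   # ['Artifact']
```

With the link blocked, Java still inherits Natural from island, while Bandung keeps only Artifact, which matches the labels on the last slide of the example.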

  17. Methodology: Information used for decision making • Relational information regarding every synset and its neighbours, i.e. the WN structure • Synsets' glosses as provided by EWN • Glosses, descriptions and examples of the TCO features as provided in [Alonge et al. 1998] • Usual word-substitution tests to acknowledge hyponymy, as in [Cruse 1986]

  18. Methodology • When all incompatibilities were fixed, a new automatic re-expansion was launched, which resulted in a new (smaller) number of conflicts. • Following this iterative and incremental approach, inheritance was re-calculated and the data were re-examined several times. • The task finished when a new cycle of re-expansion of properties did not result in new conflicts.

  19. Methodology • Then, two final steps were applied: • Since the TCO is itself a hierarchy, for every synset, its annotation was expanded up-feature; e.g. Animal expands to Living, Natural, Origin and 1stOrderEntity • The whole hierarchy was checked for consistency using formal theorem provers like Vampire and E-prover • This step resulted in a number of new conflicts, which were finally fixed.
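
The up-feature expansion can be sketched as a walk up the TCO hierarchy from each assigned feature. This is a hedged illustration: the slide only says that Animal expands to Living, Natural, Origin and 1stOrderEntity, so the exact parent chain in `TCO_PARENT`, the single-parent assumption, and the name `expand_up` are all assumptions.

```python
# Assumed fragment of the TCO hierarchy: each feature maps to one parent.
TCO_PARENT = {
    "Animal": "Living",
    "Living": "Natural",
    "Natural": "Origin",
    "Origin": "1stOrderEntity",
}

def expand_up(features, parent=TCO_PARENT):
    """Add every ancestor of each feature in the (assumed) TCO hierarchy."""
    expanded = set()
    for feature in features:
        current = feature
        # Stop at the root or as soon as we reach an already expanded node.
        while current is not None and current not in expanded:
            expanded.add(current)
            current = parent.get(current)
    return expanded

print(sorted(expand_up({"Animal"})))
```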

  20. Typology of miscategorizations (IS-A Overload) (in black: original typology of [Guarino 1998]) • Overgeneralization • Reduction of sense • Confusion of senses • Suspect Type-to-role relationship • Extensional ambiguity • 3rd Order Entities vs Mental 2nd Order Entities (TCO labels) • Technical inconsistencies

  21. Typology of miscategorizations • Overgeneralization = Hypernym has more features than Hyponym should have • Reduction of sense = Hypernym fails to capture part of the Hyponym's meaning • Confusion of senses = Multiple inheritance where hypernyms are incompatible

  22. Typology of miscategorizations • Extensional ambiguity = e.g. "layer": is it an object or a substance? • 3rd Order Entities vs Mental 2nd Order Entities (TCO labels) = e.g. "discipline" (process, thus 2ndOrder) IS-A "knowledge domain" (3rdOrder) • Technical inconsistencies = e.g. Hyponymy-Meronymy confusion

  23. Conclusions • WN1.6 (= ILI) fully and consistently annotated for nouns with 60 semantic features organized in a shallow ontology • 65,000 synsets, 116,000 variants • Average of 6.48 TCO features per synset • 350 inheritance-blocking points detected in WN • 28,000 synsets have at least one in their hypernymy chain [= they are affected by WN hierarchy mistakes or inadequacies] • The resource is free. It can be downloaded from our web site (see the proceedings)

  24. object =OBJECT; abstraction =CONCEPT; artifact +OBJECT; shape +CONCEPT; art +OBJECT; figure +CONCEPT; impressionism +OBJECT; sculpture =IMAGE_REPRESENTATION +CONCEPT +OBJECT; monument +OBJECT; The Statue of Liberty +OBJECT +IMAGE_REPRESENTATION +CONCEPT

  25. object =OBJECT; abstraction =CONCEPT; artifact +OBJECT; shape +CONCEPT; art =CONCEPT; figure +CONCEPT; impressionism +CONCEPT; sculpture =IMAGE_REPRESENTATION =OBJECT; monument +OBJECT; The Statue of Liberty +OBJECT +IMAGE_REPRESENTATION
