
A Three Level Representation Model for Concepts, Terms and Linguistic Objects



Presentation Transcript


  1. A Three Level Representation Model for Concepts, Terms and Linguistic Objects
     Thierry Declerck (DFKI GmbH, LT-Lab) & Piroska Lendvai (Research Institute for Linguistics, Hungarian Academy of Sciences)
     Clarin/ISO/FlaReNet Meeting, Nijmegen, 08.01.2009

  2. Introduction
     • Ontology/taxonomy classes are often introduced with natural language expressions (terms) that:
       • reflect the "meaning" of the class in a human-readable way
       • indicate how the class can be realised in textual documents
     • Those terms are ideally encoded in a "label" feature associated with the ontology class. A (simplified and modified) example from RadLex:
       • Class = RID1382
       • Label: lang="en" string="right inferior pulmonary ligament"
       • Label: lang="de" string="Rechtes inferiores Lungenligament"
     • In most cases, no linguistic information is associated with the natural language expressions used in the labels
     • But models for combining terminological or lexical information with conceptual information within (domain) ontologies have recently been proposed (the OTR meta-model by Axel Reymonet et al., IC2007, or the LexInfo approach by Buitelaar et al., ESWC 2009)

  3. OTR Model

  4. LexInfo Model

  5. An Approach for a Standardised Linguistic Annotation of Ontology Labels
     • Stand-off annotation of ontology labels with linguistic information (while seeking compatibility with LexInfo or other models)
     • No overloading of the ontology with additional linguistic information, which can be quite substantial when the content of the label is a larger phrase or even a full sentence
     • The linguistic information externally associated with the labels can be organised in an ontological structure that mirrors the original domain ontology
     • Proposal first discussed at a FlaReNet meeting in Pisa, September 2009, and further developed on that basis

  6. Possible Benefits of the Approach
     • Possibility to link the linguistic constituents of a label string, which may themselves have no corresponding concepts in the current ontology, to possibly related classes in other ontologies, or to suggest adding such classes to the current ontology
       • If the noun/noun compound "pulmonary ligament" is used in other labels, internal or external to the current ontology, one can investigate whether the corresponding classes are related
       • The compound noun "lymph node" occurs as the linguistic head of noun phrases in many labels in the RadLex ontology, but never as the only term of a class label. One can then:
         • suggest a new class, and/or
         • check whether this class is part of another (to-be-related) ontology

  7. Possible Benefits of the Approach
     • Linguistic annotation of ontology/taxonomy labels can ease the corresponding semantic annotation of text
     • One can better model the variation in the surface realisation of concept labels: instead of searching for a 100% string match, search for compatible linguistic annotation (for example, terms in the class label and in the text that share the same lemmas, even in a different word order)

  8. Relation to ISO TC37/SC4
     • Apply the ISO strategy for linguistic annotation (Linguistic Annotation Framework, LAF), including feature structures (in XML) and a multi-layered stand-off annotation approach
     • Point from the feature structure representing a multi-layered annotation graph to the label of the ontology class
     • Map the tagset of the linguistic annotation to the morpho-syntactic and syntactic data categories defined in ISOcat, or use the data categories of ISO directly

  9. Linguistic Annotation that can be associated with an Ontology Label
     • Label: Skelettmuskel des medialen Oberschenkels
     • Categorial information for the whole term
       • "hasCat" => "NP",
     • Dependency information for the whole term
       • "hasHead" => "Skelettmuskel",
       • "hasModifier" => "des medialen Oberschenkels",
       • "hasModifierType" => "PostModGen",
     • Recursive dependency information
       • "hasModHead" => "Oberschenkel", # head of the modifying phrase
       • "hasModMod" => "medialen", # modifier within the modifying phrase
     • Recursive constituency and morpho-syntactic information
       • "hasHeadPos" => "Noun",
       • "hasHeadCase" => "Nominative|Accusative",
       • "hasHeadCompound" => "Skelett Muskel",
       • "hasHeadLemma" => "skelett muskel",
       • "hasModCat" => "NP", # cat of the modifying phrase
       • "hasModHeadCompound" => "Ober Schenkels",
       • "hasModHeadLemma" => "ober schenkel",
       • "hasModHeadPoS" => "Noun",
       • "hasModHeadCase" => "Gen",
       • "hasModModPoS" => "Adj",

  10. PreModAdjectival: Feature Structure for Pre-nominal Adjectival Modification
      • For labels like "Axillärer Lymphknoten"
        • "hasCat" => "NP",
        • "hasHead" => "Lymphknoten",
        • "hasHeadPoS" => "N",
        • "hasHeadCompound" => "Lymph Knoten",
        • "hasHeadLemma" => "knoten",
        • "hasModifier" => "Axillärer",
        • "hasModifierPoS" => "Adj",
        • "hasModifierLemma" => "axillär",

  11. Parsing Problem for Certain Surface Realisations in the Corpus
      • For example, expressions like "Lymphknoten axillär, mediastinal und hilär."
        • Incomplete sentence
        • Sequence "Noun + Coord Adverbs", which is normally not considered a (correct) syntactic chunk

  12. EnumAdvCoord: Feature Structures for Surface Realisation in the Corpus
      • "Lymphknoten axillär, mediastinal und hilär."
      • Noun
        • "hasString" => "Lymphknoten",
        • "hasCompound" => "Lymph Knoten",
        • "hasLemma" => "knoten",
      • Coordinated adverbs
        • "hasCoord" => "und",
        • "hasAdvStrings" => <axillär, mediastinal, hilär>
        • "hasAdvLemmas" => <axillär, mediastinal, hilär>
      • A rule is needed for grouping/unifying the coordinated adverbs with the noun, and for transforming the generated structure into three equivalent PreModAdjectival feature structures that can unify with the corresponding labels in the ontology.
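The grouping rule called for on this slide could be sketched as follows. This is only an illustration under stated assumptions: the function name `expand_enum_adv_coord` and the use of plain Python dictionaries are hypothetical, and the slides do not say how the rule is actually implemented. The sketch pairs the noun with each coordinated adverb and emits one PreModAdjectival-style structure per adverb, leaving out part-of-speech features because the corpus form is an adverb while the label form is an adjective.

```python
def expand_enum_adv_coord(fs):
    """Turn one 'noun + coordinated adverbs' structure into one
    PreModAdjectival-style structure per adverb (hypothetical helper)."""
    return [
        {
            "hasCat": "NP",
            "hasHead": fs["hasString"],
            "hasHeadCompound": fs["hasCompound"],
            "hasHeadLemma": fs["hasLemma"],
            "hasModifier": adv_string,
            "hasModifierLemma": adv_lemma,
        }
        for adv_string, adv_lemma in zip(fs["hasAdvStrings"], fs["hasAdvLemmas"])
    ]


# Feature values taken from the slide; the dict layout is an assumption.
enum_adv_coord = {
    "hasString": "Lymphknoten",
    "hasCompound": "Lymph Knoten",
    "hasLemma": "knoten",
    "hasCoord": "und",
    "hasAdvStrings": ["axillär", "mediastinal", "hilär"],
    "hasAdvLemmas": ["axillär", "mediastinal", "hilär"],
}

expanded = expand_enum_adv_coord(enum_adv_coord)
for structure in expanded:
    print(structure["hasModifierLemma"], structure["hasHeadLemma"])
```

Each resulting structure carries the head lemma "knoten" and one modifier lemma, so it can be compared lemma by lemma with a label such as "Axillärer Lymphknoten".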

  13. Unification of the FSs of Labels and the FSs in Text
      • Identify the proper segment in the text
      • Collect the linguistic feature structures associated with the words in the text, or the feature structure associated with a chunk
      • Search the FSs of the labels for lemmas shared with the FSs of the strings in the text
      • If the lemma of a noun in the text FSs matches the lemma of a (head) noun in the labels, search for modifiers around the noun in the text (within the segment) and match them against the modifiers in the corresponding labels (identified by the (head) noun)
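The matching procedure on this slide might look roughly like the sketch below. It is a simplification under stated assumptions: real feature-structure unification is reduced to set comparisons over lemmas, and the function name `find_matching_labels`, the `modifierLemmas` key and the class identifiers `CLASS_A` to `CLASS_C` are hypothetical placeholders, not RadLex identifiers.

```python
def find_matching_labels(segment, labels):
    """Return the ids of labels whose head lemma matches a noun in the
    segment and whose modifier lemmas all occur in the same segment."""
    noun_lemmas = {tok["lemma"] for tok in segment if tok["pos"] == "Noun"}
    all_lemmas = {tok["lemma"] for tok in segment}
    hits = []
    for class_id, label_fs in labels.items():
        # Step 1: the (head) noun of the label must occur in the segment.
        if label_fs["hasHeadLemma"] not in noun_lemmas:
            continue
        # Step 2: every modifier lemma of the label must be found around it.
        if set(label_fs["modifierLemmas"]) <= all_lemmas:
            hits.append(class_id)
    return hits


# Segment "Lymphknoten axillär, mediastinal und hilär." after lemmatisation.
segment = [
    {"lemma": "knoten", "pos": "Noun"},
    {"lemma": "axillär", "pos": "Adv"},
    {"lemma": "mediastinal", "pos": "Adv"},
    {"lemma": "und", "pos": "Conj"},
    {"lemma": "hilär", "pos": "Adv"},
]

# Hypothetical label annotations, keyed by placeholder class ids.
labels = {
    "CLASS_A": {"hasHeadLemma": "knoten", "modifierLemmas": ["axillär"]},
    "CLASS_B": {"hasHeadLemma": "knoten", "modifierLemmas": ["inguinal"]},
    "CLASS_C": {"hasHeadLemma": "muskel", "modifierLemmas": []},
}

matches = find_matching_labels(segment, labels)
print(matches)
```

Because the comparison is on lemma sets rather than strings, word order in the text does not matter, which is the variation in surface realisation the approach aims to cover.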

  14. Consistency Checking of Ontologies
      • In RadLex:
        • "Ligamentum des Handgelenks" (ligament of wrist joint) Is_A "Handgelenk" (wrist joint)
        • "Ligamentum des Ellenbogengelenks" (ligament of elbow joint) Part_Of "Ellenbogengelenk" (elbow joint)
      • Labels are annotated with constituency and dependency information: "Ligamentum des Handgelenks" is an NP with HEAD "Ligamentum" and Post-Nominal-Genitive-Mod "des Handgelenks".
      • This annotated label is related to the label "Handgelenk", since they share the HEAD of an NP. We observe in the ontology that the two labels stand in an "Is_A" relation.
      • But for the very similar second example, which differs only in one lexical item ("Ellenbogen"), the ontology has a "Part_Of" relation between the two classes.
      • We adopt the principle that "linguistic regularities always characterise the same kind of knowledge, such as semantic relations" (Aussenac-Gilles and Jacques, 2008). We asked a domain specialist about the small problem described above and received positive feedback: both examples should be instances of the same relation.
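One way to automate the check described on this slide is to group class pairs by the linguistic pattern of their labels and flag any pattern that co-occurs with more than one ontology relation. The sketch below assumes that the pattern has already been extracted from the label annotation; the function name `inconsistent_patterns` and the pattern string "Ligamentum + PostModGen" are hypothetical.

```python
from collections import defaultdict


def inconsistent_patterns(observations):
    """Group (pattern, label, relation) triples by pattern and return the
    patterns that occur with more than one ontology relation."""
    relations_by_pattern = defaultdict(set)
    for pattern, _label, relation in observations:
        relations_by_pattern[pattern].add(relation)
    return {
        pattern: sorted(relations)
        for pattern, relations in relations_by_pattern.items()
        if len(relations) > 1
    }


# The two RadLex examples from the slide, with a hypothetical pattern name.
observations = [
    ("Ligamentum + PostModGen", "Ligamentum des Handgelenks", "Is_A"),
    ("Ligamentum + PostModGen", "Ligamentum des Ellenbogengelenks", "Part_Of"),
]

flagged = inconsistent_patterns(observations)
print(flagged)
```

A flagged pattern is only a candidate inconsistency; as on the slide, a domain specialist still has to decide which relation is the intended one.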

  15. Proposed Model CTL
      Domain_Class:
        hasId: RID2694
        hasREL: Part_Of
        hasSuperclass: RID2660  # the corresponding term: Oberschenkel
      Term_Class:
        hasId: Term:1767
        hasString: Skelettmuskel des medialen Oberschenkels
        hasTokens: [t1 Skelettmuskel] [t2 des] [t3 medialen] [t4 Oberschenkels]
        hasClass: Class:RID2694
      Linguistic_Class:
        hasId: LO:14
        hasName: Ling:postNominalGenitiveModification  # POINTER to ISOCAT
        hasTerm: Term:1767_hasTokens[t1-t4]
        # and by transitivity
        hasClass: Class:RID2694
      Linguistic_Class:
        hasId: LO:215
        hasName: NP_Genitive  # POINTER to ISOCAT
        hasTermTokens: Term:1767_has_Tokens[t2-t4]
      Linguistic_Class:
        hasId: LO:213
        hasName: NOUN_Nominative  # POINTER to ISOCAT
        hasTermTokens: Term:1767_Tokens[t1]
      …
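The three levels of the model (concept, term, linguistic object) can be illustrated with a small data-structure sketch. The Python class and field names below are assumptions chosen for readability and are not part of the proposed model; only the identifiers and strings are taken from the slide. The sketch shows how a linguistic object points at a token span of a term, and how the domain class is reached from it by transitivity through the term.

```python
from dataclasses import dataclass


@dataclass
class DomainClass:
    id: str
    relation: str
    superclass: str


@dataclass
class Term:
    id: str
    string: str
    tokens: list
    domain_class: str  # id of the DomainClass this term labels


@dataclass
class LinguisticObject:
    id: str
    name: str
    term: str          # id of the annotated Term
    first_token: int   # 1-based, inclusive (t1 is token 1)
    last_token: int    # 1-based, inclusive


def covered_tokens(obj, terms):
    """Tokens of the term that the linguistic object annotates."""
    return terms[obj.term].tokens[obj.first_token - 1 : obj.last_token]


def domain_class_of(obj, terms):
    """Domain class reached by transitivity via the annotated term."""
    return terms[obj.term].domain_class


concept = DomainClass("RID2694", "Part_Of", "RID2660")
terms = {
    "Term:1767": Term(
        "Term:1767",
        "Skelettmuskel des medialen Oberschenkels",
        ["Skelettmuskel", "des", "medialen", "Oberschenkels"],
        concept.id,
    )
}
np_genitive = LinguisticObject("LO:215", "NP_Genitive", "Term:1767", 2, 4)

print(covered_tokens(np_genitive, terms))
print(domain_class_of(np_genitive, terms))
```

Keeping the linguistic objects in their own structure, with only pointers into the terms, mirrors the stand-off design: the domain ontology itself is never modified.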

  16. Relations between Data Categories
      • Complex relations should not be defined within ISOcat, but through the linked terminologies and ontologies used in an application
      • Role of ISOcat: ease access to the data categories for such applications
      • ISOcat could perhaps also store the resulting "linguistic ontologies" that mirror the application/domain ontologies
      • In ISOcat (as in the OTR model), define only what a linguistic object is (has_PoS, has_Segment, has_Lang, etc.)

  17. CTL Model

  18. Thanks for your attention
      • Contact: Thierry Declerck, declerck@dfki.de
