
Overview

Ontologies: Contributions from Language Technology. Paul Buitelaar, DFKI GmbH, Language Technology Lab, DFKI Competence Center Semantic Web, Saarbrücken, Germany.


Presentation Transcript


  1. Ontologies: Contributions from Language Technology. Paul Buitelaar, DFKI GmbH, Language Technology Lab, DFKI Competence Center Semantic Web, Saarbrücken, Germany

  2. Overview
     • Ontologies and the Semantic Web: Semantic Web Intro; Ontologies and Knowledge Markup; Ontology Development; Ontology Lifecycle & Language Technology
     • Language Technology: Levels of Automatic Linguistic Analysis
     • Ontologies in Multilingual Information Access - A Medical Example: MuchMore Project: Semantic Resources in the Medical Domain; Demo MuchMore System; Language Technology in Annotation and Indexing
     • Conclusions: MuchMore for the Legal Domain…

  3. Semantic Web (diagram labels: Semantic Web Services; Semantic Web; Knowledge Markup; Ontologies; Intelligent Man-Machine Interface)

  4. Ontology-based Knowledge Markup
     Semantic Metadata
     • Metadata, e.g. Dublin Core: Title, Author, etc.
     • Semantic: Formal Properties of Objects of Class Author
     Knowledge Markup
     <xmlns:jobs="http://www.jobs.org/daml+oil-jobs-ontology#"> <jobs:systems-analyst> John Smith </jobs:systems-analyst>
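The markup fragment on this slide is not well-formed as transcribed, so it cannot be parsed directly. The Python sketch below shows how such knowledge markup could be read once it is wrapped in a complete document. The namespace URI and the `jobs:systems-analyst` element come from the slide; the `<markup>` wrapper element and the `extract_instances` helper are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI as given on the slide.
JOBS_NS = "http://www.jobs.org/daml+oil-jobs-ontology#"

# The <markup> wrapper is hypothetical; it only makes the fragment well-formed.
DOCUMENT = f"""
<markup xmlns:jobs="{JOBS_NS}">
  <jobs:systems-analyst> John Smith </jobs:systems-analyst>
</markup>
""".strip()


def extract_instances(xml_text, namespace):
    """Return (class_name, instance_text) pairs for elements in `namespace`."""
    root = ET.fromstring(xml_text)
    prefix = "{" + namespace + "}"  # ElementTree writes qualified tags as {uri}local
    pairs = []
    for element in root.iter():
        if element.tag.startswith(prefix):
            class_name = element.tag[len(prefix):]
            pairs.append((class_name, (element.text or "").strip()))
    return pairs


instances = extract_instances(DOCUMENT, JOBS_NS)
print(instances)
```

The element name is read as the ontology class and the element text as the instance, which is one plausible reading of the slide rather than a documented convention.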

  5. Semantic Web Architecture: Layered Architecture (Tim Berners-Lee)

  6. Knowledge Markup Languages (Syntax: Semantics)
     • XML, XML Schema, Namespaces: Interpretation Context, Data Types
     • RDF, RDF Schema: Formalization of Classes (Inheritance), Properties
     • OWL (DAML+OIL): Formalization of Classes, Class Definitions, Properties, Property Types (e.g. Transitivity)
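The slide names transitivity as an example of a property type that OWL (DAML+OIL) can express. As a rough illustration of what can be derived once a relation is declared transitive, the Python sketch below computes the transitive closure of a few asserted pairs. The class names and the `transitive_closure` helper are hypothetical and do not come from the presentation.

```python
def transitive_closure(pairs):
    """Return all pairs implied by treating the asserted relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # If a->b and b->d are known, a->d follows by transitivity.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical subclass assertions, invented for illustration only.
asserted = {("SystemsAnalyst", "Analyst"), ("Analyst", "Employee")}
inferred = transitive_closure(asserted) - asserted
print(sorted(inferred))
```

A real reasoner does considerably more than this; the sketch only shows the single inference pattern that a transitivity declaration licenses.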

  7. Ontologies: Basic Idea
     • Definition: “… Explicit, Formal Specification of a Shared Conceptualization of a Domain of Interest” (T. Gruber, Towards principles for the design of ontologies used for knowledge sharing. Int. J. of Human and Computer Studies, 1994)
     • Purpose: Knowledge Sharing (e.g. between Agents); Inference (over Sets of Instances)
     • Related Areas, e.g.: Terminologies, Controlled Vocabulary, Thesauri, Taxonomies, Semantic Lexicons, Wordnets, etc.; Conceptual Models, Schemas, etc.

  8. Ontologies: Applications, e.g.
     • Semantic Web Services: Interoperability for (Semantic) Web Services
     • Intelligent Agents: Domain Models for Intelligent Agents
     • Text Interpretation: Ontology-aware Information Extraction
     • Multimedia Integration: Ontology-based Alignment of Extracted Objects in Text, Audio, Video
     • Intelligent Search/Navigation: Ontology-based Indexing in Web-Retrieval

  9. Ontologies: Development
     • Ontology Editor / KB Management
     • Most Widely Used: Protégé (Stanford University, Medical Informatics, USA); Originally for Development and Maintenance of Medical Expert Systems
     • Other, e.g.: KAON (University of Karlsruhe - AIFB, Germany); WebOde (UPM – Ontology Group, Madrid, Spain); WebOnto (Open University - KMI, UK)
     • Overview at XML.com by Michael Denny: Ontology Building: A Survey of Editing Tools

  10. Screenshot labels: Class Hierarchy; Slot Descriptions; http://dmag.upf.es/ontologies/2003/12/ipronto.owl

  11. Ontology Lifecycle (stages as labelled in the diagram): Populating, Validating, Creating, Deploying, Evolving, Maintaining

  12. LT in the Ontology Lifecycle
     Language Technology (LT) for Ontology:
     • Creating & Evolving: Linguistic Analysis to Extract Classes / Relations
     • Populating (Knowledge Base Generation): Linguistic Analysis to Extract Instances
     Diagram labels: Documents (Text); Classes, Relations/Properties; Instances; Ontology (Knowledge)
     Language Technology = Automated Linguistic Analysis

  13. Linguistic Analysis: Example
     "The Dell computer with a flat screen had to be rejected because of a failure in the motherboard."
     Diagram labels: flat screen; Dell computer; has-a; reject; has-a; animate-entity; motherboard; failure; location-of

  14. Part-of-Speech, Morphology
     Part-of-Speech
     • e.g.: noun, verb, adjective, preposition, …
     • PoS tag sets may have between 10 and 50 (or more) tags
     Morphology
     • Most languages have inflection and declension, e.g.: Singular/Plural: computer, computers; Present/Past: reject, rejected
     • Many languages also have complex (de)composition, e.g.: Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
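The decomposition of Flachbildschirm can be illustrated with a small lexicon-driven splitter. This is a minimal Python sketch and not the analyser used in the presentation: the `split_compound` function and the four-word lexicon are assumptions, and the sketch ignores linking elements and inflectional endings, which a real German analyser has to handle.

```python
def split_compound(word, lexicon):
    """Return every segmentation of `word` into entries of `lexicon`."""
    word = word.lower()
    if not word:
        return [[]]  # one way to segment the empty string: no parts
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in lexicon:
            # Keep this head and segment the remainder recursively.
            for rest in split_compound(word[end:], lexicon):
                results.append([head] + rest)
    return results


# Toy lexicon built only from the parts shown on the slide.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}

splits = split_compound("Flachbildschirm", LEXICON)
for parts in splits:
    print(" + ".join(parts))
```

Both levels of decomposition from the slide appear among the results, because the lexicon contains Bildschirm as well as its parts; choosing between such alternatives is a separate problem that the sketch does not address.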

  15. Phrases, Terms, Named Entities
     Semantic Units
     • Phrases (e.g. nominal - NP, prepositional - PP): NP: a flat screen; PP: with a flat screen; NP (recursive): the Dell computer with a flat screen, a failure in the motherboard
     • Terms (domain-specific phrases): Dell computer; Dell computer with a flat screen
     • Named Entities (phrases corresponding to dates, names, …): COMPANY: Dell; COMPANY: Dell Computer Corporation; PERSON: Michael Dell

  16. Dependency Structure
     Semantic Structure
     • Dependencies between Predicates and Arguments
     "the Dell computer with a flat screen had to be rejected"
     PRED: reject; ARG1: ENTITY; ARG2: ‘the Dell computer with a flat screen’
     ‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …
     "The Dell computer that has been rejected was claimed to have suffered from handling."
     reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
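The second logical form on this slide can be held as a list of predicate-argument atoms, which makes it easy to ask which predicates share a variable. The Python sketch below is only one possible encoding; the `Atom` type and the `render` and `atoms_about` helpers are hypothetical names, while the atoms themselves are copied from the slide.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    predicate: str
    args: Tuple[str, ...]


def render(atoms: List[Atom]) -> str:
    """Write the atoms as a conjunction in the slide's notation."""
    return " & ".join(f"{a.predicate}({','.join(a.args)})" for a in atoms)


def atoms_about(atoms: List[Atom], variable: str) -> List[Atom]:
    """Return the atoms that mention `variable` as one of their arguments."""
    return [a for a in atoms if variable in a.args]


# Atoms copied from the slide's second example.
LOGICAL_FORM = [
    Atom("reject", ("e1", "x1", "y1")),
    Atom("animate-entity", ("x1",)),
    Atom("Dell_computer", ("y1",)),
    Atom("claim", ("e2", "x2", "e3")),
    Atom("animate-entity", ("x2",)),
    Atom("suffer_from", ("e3", "y1", "y2")),
    Atom("handling", ("y2",)),
]

about_computer = [a.predicate for a in atoms_about(LOGICAL_FORM, "y1")]
print(render(LOGICAL_FORM))
print(about_computer)
```

Looking up `y1` shows that the same entity is the thing rejected, a Dell computer, and the thing that suffered from handling, which is the link the shared variable encodes.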

  17. MuchMore Project (http://muchmore.dfki.de)
     • Demonstration Prototype: Real-Life Medical Scenario for Cross-Lingual Information Retrieval
     • Research & Development: Combined Data- and Knowledge-Driven
     • Performance Evaluation: Performance Comparison of Existing and Novel Methods

  18. Semantic Resources
     Medical Domain
     • UMLS: Unified Medical Language System
     • Medical MetaThesaurus (only MeSH2001 is used): English, German, Spanish, …; 730.000 Concepts; 9 Relations (Broader, Narrower, …)
     • Semantic Network: 134 Semantic Types; 54 Semantic Relations
     General
     • WordNet (EN), GermaNet (DE), EuroWordNet (“linked”)

  19. MetaThesaurus, SemNet
     C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
     C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
     C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
     C0019682|ENG|S|L0020128|VWS|S0098727|Virus, Human Immunodeficiency|0|
     C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
     C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
     C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
     C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|
     Concept Names: 1.734,706 ENGLISH 1.462,202 GERMAN 66,381 other languages
     • Each CUI (Concept Unique Identifier) is mapped to one out of 134 Semantic Types or TUI (Type Unique Identifier)
     • Clozapine: C0009079 -> Pharmacologic Substance: T121
     • Semantic Types are organized in a Network through 54 Relations
     • T121|T154|T047
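The pipe-delimited rows on this slide can be grouped by concept and language with a few lines of Python. The sketch below uses four of the rows verbatim. The field names in `FIELDS` are an assumption based on the older UMLS concept-name file layout and are not stated on the slide; `parse_row` and `names_by_language` are hypothetical helpers.

```python
from collections import defaultdict

# Assumed column names; the slide shows only the raw values.
FIELDS = ("cui", "language", "term_status", "lui",
          "string_type", "sui", "string", "restriction_level")

# Rows copied from the slide.
ROWS = [
    "C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|",
    "C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|",
    "C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|",
    "C0019682|GER|P|L0413854|PF|S0538136|HIV|3|",
]


def parse_row(line):
    """Split one row into a dict; the trailing '|' yields an empty field that is dropped."""
    values = line.split("|")[:len(FIELDS)]
    return dict(zip(FIELDS, values))


def names_by_language(rows):
    """Map each CUI to its concept names, grouped by language code."""
    index = defaultdict(lambda: defaultdict(list))
    for line in rows:
        record = parse_row(line)
        index[record["cui"]][record["language"]].append(record["string"])
    return index


names = names_by_language(ROWS)
print(dict(names["C0019682"]))
```

Grouping by CUI is what makes the resource usable across languages: the English, French and German strings all resolve to the same concept identifier.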

  20. Annotation & Indexing
     • Token (with Part-of-Speech): German: Kreuzbandes; English: ligaments
     • Lemma (or Sequence of Lemmas - Decomposition): German: Faserknorpel > Faser + Knorpel; English: ligament
     • UMLS Concept Code and Semantic Type: ligament: C0022745_T030
     • MeSH Code: A2.513
     • Semantic Relation (over a Pair of UMLS Concepts): C0022745_T030 interconnects C0047693_T065

  21. Relations
     • UMLS Semantic Network specifies 54 types of relations between 134 semantic types, e.g.: Pharmacologic Substance affects Cell Function
     • Relations are generic and potentially false: Therapeutic Procedure method_of Occupation, Discipline; *discectomy method_of history
     • Relations are ambiguous: Therapeutic Procedure prevents Neoplastic Process; Therapeutic Procedure complicates Neoplastic Process; Therapeutic Procedure affects Neoplastic Process; Therapeutic Procedure treats Neoplastic Process
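The ambiguity noted on this slide shows up as soon as the type-level triples are indexed by type pair. The Python sketch below uses only the triples listed on the slide; `build_index` and `candidate_relations` are hypothetical helper names, and the actual Semantic Network contains many more triples than these.

```python
from collections import defaultdict

# Type-level triples copied from the slide.
TRIPLES = [
    ("Pharmacologic Substance", "affects", "Cell Function"),
    ("Therapeutic Procedure", "method_of", "Occupation, Discipline"),
    ("Therapeutic Procedure", "prevents", "Neoplastic Process"),
    ("Therapeutic Procedure", "complicates", "Neoplastic Process"),
    ("Therapeutic Procedure", "affects", "Neoplastic Process"),
    ("Therapeutic Procedure", "treats", "Neoplastic Process"),
]


def build_index(triples):
    """Index relation names by (subject type, object type)."""
    index = defaultdict(list)
    for subject_type, relation, object_type in triples:
        index[(subject_type, object_type)].append(relation)
    return index


def candidate_relations(index, subject_type, object_type):
    """Return every relation the network allows between the two types."""
    return list(index.get((subject_type, object_type), []))


INDEX = build_index(TRIPLES)
ambiguous = candidate_relations(INDEX, "Therapeutic Procedure", "Neoplastic Process")
print(ambiguous)
```

Four candidates come back for a single type pair, so the network alone cannot say which relation holds between two concrete concepts in a given sentence.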

  22. Example
     "Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs."

  23. Example: Terms/Concepts
     "Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs."
     Terms: C0019134 Heparin; C0005790 Blood coagulation tests; C0013227 Pharmaceutical preparations

  24. Example: Relations
     "Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs."
     Terms: C0019134 Heparin; C0005790 Blood coagulation tests; C0013227 Pharmaceutical preparations
     • Relations: C0019134 interacts_with C0013227
     • C0005790 analyses C0019134
     • C0005790 analyses C0013227
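Relation annotation for a sentence can be sketched as a lookup over every ordered pair of concepts found in it. In the Python sketch below, the concept codes and the three relations are copied from the slide. The flat concept-level `RELATION_TABLE` is a simplifying assumption, since the system presumably derives such relations from semantic types, and `annotate_relations` is a hypothetical helper.

```python
from itertools import permutations

# Concepts annotated in the example sentence, as listed on the slide.
SENTENCE_CONCEPTS = ["C0019134", "C0005790", "C0013227"]

# Concept-level relations copied from the slide (simplified to a flat table).
RELATION_TABLE = {
    ("C0019134", "C0013227"): "interacts_with",
    ("C0005790", "C0019134"): "analyses",
    ("C0005790", "C0013227"): "analyses",
}


def annotate_relations(concepts, table):
    """Return (subject, relation, object) for each ordered concept pair in the table."""
    return [
        (subject, table[(subject, obj)], obj)
        for subject, obj in permutations(concepts, 2)
        if (subject, obj) in table
    ]


relations = annotate_relations(SENTENCE_CONCEPTS, RELATION_TABLE)
for triple in relations:
    print(*triple)
```

Ordered pairs are used because the relations are directional: the blood coagulation tests analyse heparin, not the reverse.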

  25. Conclusions: MuchMore for the Legal Domain…
     Resources
     • Legal Domain Ontology with…
     • …Large-scale Terminology for Multiple Languages, or if not available…
     • …Large Legal Domain Corpora in Multiple Languages for Term Extraction…
     • …and for Relation Extraction if Ontology Needs to be Constructed/Adapted
     Tools
     • Linguistic Analysis (PoS, Morphology, Term Grammars, etc.)…
     • …for Multiple Languages…
     • …Tuned to the Legal Domain…
     • Information Retrieval Infrastructure, Interface Design, etc.
