Rohit Kate

Computational Intelligence in Biomedical and Health Care Informatics HCA 590 (Topics in Health Sciences) Rohit Kate Knowledge Representation: Description Logics

Reading • An Introduction to Description Logics, by D. Nardi and R. J. Brachman. Chapter 1 in F. Baader et al. (Eds.), The Description Logic Handbook. Cambridge: Cambridge University Press, 2003. (Skip Sections 1.6 & 1.7) • Additional reading (for more formal definitions): Section 2.2.1 of Chapter 2 in the above book, by F. Baader and W. Nutt.

What is Knowledge Representation (KR)? • Intelligence is impossible without knowledge • Computational intelligence is impossible without knowledge encoded in computer processable form • Knowledge Representation is representing knowledge of a domain, e.g. medical domain, in a computer processable form to enable computational intelligence • Automated reasoning • Automated discovery

Motivating Example [From Trotter & Uhlman, 2011] • Query: “Find records of all patients infected with Gram Positive Cocci” • Suppose they are at increased risk of developing kidney infection if they have been treated with a certain class of antibiotic • If a patient record has “Pneumococcal Pneumonia” as the disease name, then a keyword-based search will not find this patient • But if the disease name is recorded as, or converted into, a SNOMED CT expression, then a “subsumption” search will find the patient record

Motivating Example [From Trotter & Uhlman, 2011] • [Figure: SNOMED CT graph around Pneumococcal Pneumonia. Concepts shown: Gram Positive Coccus, Streptococcus, Pneumococcal infectious disease, Pneumococcal Pneumonia, Lung structure, Inflammation. Relations shown: Is_a, Causative agent, Finding site, Associated Morphology. An arrow labeled "Inference" marks the derived link to Gram Positive Coccus.] • SNOMED CT has been developed in the description logic formalism [Baader et al. 2003] and hence is highly amenable to automated reasoning.
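The difference between the keyword search and the subsumption search can be sketched in a few lines of Python. This is only an illustration: the Is_a and Causative agent edges below are assumptions chosen to mirror the slide, not actual SNOMED CT content, and the patient records and function names are hypothetical.

```python
# Illustrative fragment only: these edges are assumptions for the sketch,
# not actual SNOMED CT content.
IS_A = {
    "Pneumococcal Pneumonia": ["Pneumococcal infectious disease"],
    "Streptococcus": ["Gram Positive Coccus"],
}
CAUSATIVE_AGENT = {
    "Pneumococcal infectious disease": "Streptococcus",
}

def ancestors(concept):
    """Return the concept together with everything reachable via Is_a."""
    seen, stack = set(), [concept]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(IS_A.get(c, []))
    return seen

def caused_by(disease, agent_class):
    """True if the disease, or one of its Is_a ancestors, has a causative
    agent that is subsumed by agent_class."""
    for d in ancestors(disease):
        agent = CAUSATIVE_AGENT.get(d)
        if agent is not None and agent_class in ancestors(agent):
            return True
    return False

# Hypothetical patient records: patient id -> recorded disease name.
records = {"patient-1": "Pneumococcal Pneumonia", "patient-2": "Viral meningitis"}

# Keyword search looks only at the text of the disease name.
keyword_hits = [p for p, d in records.items() if "gram positive" in d.lower()]

# Subsumption search follows the concept graph instead.
subsumption_hits = [p for p, d in records.items()
                    if caused_by(d, "Gram Positive Coccus")]
```

Under these assumed edges the keyword search finds no record, while the subsumption search finds the pneumonia patient.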

What are the different formalisms of KR? • Propositional Logic (Propositional Calculus) • First-order Logic (Predicate Logic, Predicate Calculus) • Description Logics • Semantic Networks and Ontologies • Often based on Description Logics

Description Logics (DL) • First-order logic is often a more powerful and more expressive framework than is needed in many domains • If one only needs to encode categories, objects and relations in a domain, then something simpler will suffice and will be more efficient • Description logics (DL) are one such very widely used framework, especially for encoding medical knowledge • “Describes” things in a domain • More expressive than propositional logic, but less expressive than first-order logic • Some things are easily expressible in description logics but are difficult or awkward to express in first-order logic • E.g. all the patients who visited the clinic at least twice but not more than five times

Concepts • Description Logics work in terms of concepts and roles • Concepts: Represent classes, i.e. sets of entities, for example Disease, Person, Female, Male, Mother, etc. • An instance of Disease is COLD • Represented as Disease(COLD) • It means COLD is a Disease

Concept Constructors • Complex concepts can be built from simpler concepts using concept constructors • Description languages differ according to the concept constructors they provide • Hence the plural in "description logics" • The presence or absence of particular concept constructors affects the computational complexity of reasoning in a description language

Concept Constructors • Intersection (⊓) represents the intersection of concepts, e.g. Person ⊓ Female • An instance will be (Female ⊓ Person)(ANNA) • ANNA is a Female and a Person • Union (⊔) represents the union of concepts, e.g. Female ⊔ Male • Negation (¬) represents all those individuals that are not in the concept class, e.g. ¬Female
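One way to read these constructors is as set operations over a domain of individuals. The sketch below assumes a small, made-up domain and made-up extensions for the atomic concepts; only the three operations themselves come from the slide.

```python
# Hypothetical interpretation: a small domain and the set of individuals
# belonging to each atomic concept.
DOMAIN = {"ANNA", "JACOPO", "MARIA", "REX"}
Person = {"ANNA", "JACOPO", "MARIA"}
Female = {"ANNA", "MARIA"}
Male = {"JACOPO", "REX"}

def intersection(c, d):
    """C ⊓ D: individuals that belong to both concepts."""
    return c & d

def union(c, d):
    """C ⊔ D: individuals that belong to at least one of the concepts."""
    return c | d

def negation(c):
    """¬C: individuals of the domain that do not belong to C."""
    return DOMAIN - c

woman = intersection(Person, Female)   # Person ⊓ Female
not_female = negation(Female)          # ¬Female
```

Note that negation is taken relative to the whole domain, so ¬Female here also contains REX, who is not a Person.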

Atomic Concepts • Complex concepts can be given a name and defined with the ≡ symbol • Woman ≡ Person ⊓ Female • Atomic concepts: Concepts that are not defined in terms of other concepts • Woman is not an atomic concept here

Description Logics • Roles: Represent relations between pairs of instances, e.g. hasChild • An instance will be hasChild(ANNA, JACOPO) • ANNA has a child JACOPO • Roles can also be used to represent concepts • For example, the role hasChild can be used to form the concept of all those who have a child

Description Logics • Roles can be used with quantification to represent concepts • Existential (∃) • ∃hasChild.Female represents all those who have a female child • For all (∀) • ∀hasChild.Female represents individuals all of whose children are female • Number restriction • (≥ 3 hasChild) ⊓ (≤ 2 hasFemaleRelative) represents all individuals who have at least three children and at most two female relatives • (≥ 2 clinicVisits) ⊓ (≤ 5 clinicVisits) represents all patients who visited the clinic at least twice but not more than five times
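The quantified constructors can be read the same way, with a role as a set of pairs. The following is a minimal sketch over a made-up domain; the individuals LUCIA, MARIA and PAOLO and all function names are assumptions for illustration.

```python
# Hypothetical interpretation: a domain, one concept and one role.
DOMAIN = {"ANNA", "JACOPO", "MARIA", "LUCIA", "PAOLO"}
Female = {"ANNA", "MARIA", "LUCIA"}
HAS_CHILD = {("ANNA", "JACOPO"), ("MARIA", "LUCIA"), ("MARIA", "PAOLO")}

def successors(role, x):
    """All individuals y such that role(x, y) holds."""
    return {b for (a, b) in role if a == x}

def exists(role, concept):
    """∃role.concept: individuals with at least one role filler in concept."""
    return {x for x in DOMAIN if successors(role, x) & concept}

def forall(role, concept):
    """∀role.concept: individuals all of whose role fillers are in concept."""
    return {x for x in DOMAIN if successors(role, x) <= concept}

def at_least(n, role):
    """(≥ n role): individuals with at least n role fillers."""
    return {x for x in DOMAIN if len(successors(role, x)) >= n}

def at_most(n, role):
    """(≤ n role): individuals with at most n role fillers."""
    return {x for x in DOMAIN if len(successors(role, x)) <= n}

exactly_one_child = at_least(1, HAS_CHILD) & at_most(1, HAS_CHILD)
```

One point the sketch makes visible: ∀hasChild.Female also contains every individual with no children at all, because the condition holds vacuously for them.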

An Example • [Figure: concept hierarchy over Person, Female, Woman, Parent, Mother and the restriction ≥ 1 hasChild] • Woman ≡ Person ⊓ Female • Parent ≡ Person ⊓ (≥ 1 hasChild) • Mother ≡ Female ⊓ Parent • Is every Mother a Woman? Not explicit, but implicit here. • Reasoning: Every Mother is a Female and a Parent. Every Parent is a Person, so every Mother is a Female and a Person, which is the definition of a Woman. • A knowledge representation system should be able to determine such relations automatically. This could be a complex task in some domains.
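The reasoning on this slide can be mechanized for this small example by expanding each defined concept into its atomic parts and comparing the results. The sketch below is an assumption-laden simplification, not how production DL reasoners work: it only handles definitions that are plain conjunctions, requires the definitions to be acyclic, and treats the number restriction as an opaque atom.

```python
# Definitions from the slide, each written as a set of conjuncts.
# ">=1 hasChild" is treated as an opaque atom in this sketch.
TBOX = {
    "Woman": {"Person", "Female"},
    "Parent": {"Person", ">=1 hasChild"},
    "Mother": {"Female", "Parent"},
}

def unfold(concept):
    """Expand a concept name into the set of atomic conjuncts defining it."""
    if concept not in TBOX:
        return {concept}          # atomic concept: nothing to expand
    atoms = set()
    for part in TBOX[concept]:
        atoms |= unfold(part)
    return atoms

def subsumed_by(c, d):
    """C is subsumed by D if every atomic conjunct of D also occurs in C."""
    return unfold(d) <= unfold(c)

mother_is_woman = subsumed_by("Mother", "Woman")
woman_is_mother = subsumed_by("Woman", "Mother")
```

Here Mother unfolds to Female, Person and ≥ 1 hasChild, which contains everything Woman requires, so every Mother is a Woman, while the converse does not hold.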

Reasoning in DL • The basic inference on concept expressions in DL is subsumption, written as C ⊑ D, meaning everyone in C is also in D • The basic query in DL is whether a concept C is subsumed by another concept D, e.g. is every Mother a Woman • In DL, subsumption checking is: • Sound (there is an algorithm such that whenever it returns “yes”, the subsumption holds) • Complete (whenever that algorithm returns “no”, the subsumption does not hold) • Efficient (for many description languages, such algorithms run fast in practice) • Because of these theoretical results, DL is very widely used in practice • There are several versions of DL, which usually differ in what operators and quantifications are defined over concepts and roles; the efficiency of reasoning varies accordingly

A Well Known Trade-off in Knowledge Representation • There is a trade-off between the expressiveness of a representation language and the difficulty of reasoning over it [Brachman and Levesque, 1984] • The more expressive a language is, the more computationally difficult the reasoning is • Description logic languages are a good compromise, and their expressiveness often suffices for many applications • Among different description logic languages, the same trade-off holds

DL Knowledge Base • A DL knowledge base consists of two parts: TBox and ABox • TBox (terminology box): Describes general properties of concepts, e.g. Person, Female, Woman • ABox (assertion box): Specifies individuals of the domain, e.g.: • (Female ⊓ Person)(ANNA) • ANNA is a Female and a Person • hasChild(ANNA, JACOPO) • ANNA has a child JACOPO

Basic Reasoning in DL • TBox • Subsumption • Whether a concept subsumes another • Classification • Where to put a new concept in the hierarchy of concepts • Determined using subsumption: place it between the most specific concepts that subsume it and the most general concepts that it subsumes • There are algorithms to do the above automatically
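Classification as described above can be sketched on top of a subsumption test. The code below reuses a simplified test that only works for acyclic, conjunction-only definitions, and it assumes that no two existing concepts are equivalent; the concept definitions are the Woman, Parent and Mother examples from earlier slides, and the function names are hypothetical.

```python
# Conjunction-only definitions; ">=1 hasChild" is an opaque atom here.
TBOX = {
    "Woman": {"Person", "Female"},
    "Parent": {"Person", ">=1 hasChild"},
    "Mother": {"Female", "Parent"},
}

def unfold(concept):
    """Expand a concept name into its atomic conjuncts."""
    if concept not in TBOX:
        return {concept}
    atoms = set()
    for part in TBOX[concept]:
        atoms |= unfold(part)
    return atoms

def subsumed_by(c, d):
    """C is subsumed by D if every atomic conjunct of D also occurs in C."""
    return unfold(d) <= unfold(c)

def classify(new, existing):
    """Return (most specific subsumers, most general subsumees) of `new`
    among the `existing` concepts."""
    subsumers = [d for d in existing if subsumed_by(new, d)]
    subsumees = [c for c in existing if subsumed_by(c, new)]
    # Keep a subsumer only if no other subsumer lies below it.
    most_specific = [d for d in subsumers
                     if not any(e != d and subsumed_by(e, d) for e in subsumers)]
    # Keep a subsumee only if no other subsumee lies above it.
    most_general = [c for c in subsumees
                    if not any(e != c and subsumed_by(c, e) for e in subsumees)]
    return most_specific, most_general

parents, children = classify("Parent", ["Person", "Female", "Woman", "Mother"])
```

In this example Parent is placed directly below Person and directly above Mother.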

Basic Reasoning in DL • ABox • Instance checking • Whether an individual is an instance of a concept • Knowledge base consistency • Whether every concept admits at least one individual • Realization • Find the most specific concept an individual object is an instance of • Retrieval • Find the individuals that are instances of a concept • The last three can be accomplished through instance checking • There are algorithms to do all the above automatically
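A minimal sketch of instance checking, retrieval and realization over the ANNA and JACOPO assertions follows. It is a toy under stated assumptions: definitions are acyclic conjunctions, a number restriction "≥ 1 role" is derived from any asserted role filler, and the data layout and function names are hypothetical rather than taken from any DL system.

```python
# TBox: conjunction-only definitions (">=1 hasChild" is an opaque atom).
TBOX = {
    "Woman": {"Person", "Female"},
    "Parent": {"Person", ">=1 hasChild"},
    "Mother": {"Female", "Parent"},
}
# ABox: concept assertions and role assertions about individuals.
ABOX_CONCEPTS = {"ANNA": {"Female", "Person"}, "JACOPO": {"Person"}}
ABOX_ROLES = {("hasChild", "ANNA", "JACOPO")}
INDIVIDUALS = set(ABOX_CONCEPTS) | {x for (_, a, b) in ABOX_ROLES for x in (a, b)}

def unfold(concept):
    """Expand a concept name into its atomic conjuncts."""
    if concept not in TBOX:
        return {concept}
    atoms = set()
    for part in TBOX[concept]:
        atoms |= unfold(part)
    return atoms

def known_atoms(individual):
    """Atomic facts that follow from the ABox for one individual."""
    atoms = set()
    for c in ABOX_CONCEPTS.get(individual, set()):
        atoms |= unfold(c)
    for (role, subject, _) in ABOX_ROLES:
        if subject == individual:
            atoms.add(f">=1 {role}")   # at least one filler is asserted
    return atoms

def is_instance(individual, concept):
    """Instance checking: does the individual satisfy every conjunct?"""
    return unfold(concept) <= known_atoms(individual)

def retrieve(concept):
    """Retrieval: all individuals that are instances of the concept."""
    return {i for i in INDIVIDUALS if is_instance(i, concept)}

def realize(individual, concepts):
    """Realization: the most specific concepts the individual belongs to."""
    cands = [c for c in concepts if is_instance(individual, c)]
    return [c for c in cands
            if not any(unfold(c) < unfold(d) for d in cands)]

CONCEPTS = ["Person", "Female", "Woman", "Parent", "Mother"]
anna_most_specific = realize("ANNA", CONCEPTS)
```

As the slide notes, retrieval and realization here are both built from repeated calls to instance checking.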

DL in Medicine • Large-scale knowledge bases (hundreds of thousands of concepts) are common in medicine • GALEN (Generalized Architecture for Languages, Encyclopedias, and Nomenclatures in Medicine) [Rector et al. 1993] is a terminology resource for clinical systems built using a Description Logic • SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is a comprehensive biomedical terminology also developed in a Description Logic

GALEN • GALEN: Generalized Architecture for Languages, Encyclopedias, and Nomenclatures in Medicine, to represent “all and only sensible medical concepts” • Developed as a European Union project (1992-99) • Uses a specialized description logic language (GRAIL: GALEN Representation and Integration Language), also available in OWL (Web Ontology Language) • OpenGALEN, the publicly accessible version, has about 25,000 concepts

SNOMED CT • SNOMED: Systematized Nomenclature of Medicine (SNOMED CT: Clinical Terms) • Developed by the College of American Pathologists • Most comprehensive biomedical ontology • Contains 269,864 classes and 407,510 names • Available for free as part of UMLS • Uses the description logic formalism

SNOMED CT • A concept is described in terms of roles and other concepts • [Figure: SNOMED CT description of the concept Viral meningitis (synonyms: Abacterial meningitis; Aseptic meningitis, viral; Unique ID: 58170007)] • Is_a: Viral infections of the central nervous system • Is_a: Infective meningitis • Roles: • Causative agent: Virus • Associated morphology: Inflammation • Finding site: Meninges structure • Course: Courses • Episodicity: Episodicities • Onset: Sudden onset, Gradual onset • Severity: Severities

Conclusions • Knowledge representation formalisms enable automated reasoning and query answering • Once knowledge is properly represented, reasoning over it can be done through symbol manipulation, hence it can be automated using a computer • Different formalisms have varying expressive power and computational complexity • Knowledge in any domain of biomedicine is typically large (huge terminology) and complex, so its systematic organization and automated reasoning are indispensable
