Ontologies. Computer-interpretable Guidelines. Hi6 Artificial Intelligence in Medicine. Maged N Kamel Boulos, MIM Centre, City University London, UK. E-mail: M.Nabih-Kamel-Boulos@city.ac.uk
Lecture Outline • Introduction and Definitions • Ontology (Knowledge) is Power • Why Do We Need Ontologies? • Moving Beyond If-Then Rules • Ontology Application Areas • Description Logics • Ontology Modelling Languages and Tools • Types of Ontologies • RDF as a Semantic Web Ontology Representation Language • Protégé-2000 • OILEd • Medical Examples/Computer-interpretable Guidelines • Practical Points • Conclusions • References
Introduction • Originally a philosophical discipline, ontology has become a hot topic in computer science in recent years. Research on ontologies was initiated by the Artificial Intelligence community (knowledge representation), but ontologies are now also used in research by the WWW community (especially the Semantic Web Initiative, http://www.w3.org/2001/sw/), the database community, the natural language processing community, and the machine learning community. • Ontologies are also important ingredients of research into multi-agent systems, information brokering and retrieval, and knowledge discovery and management (transforming document repositories into proper knowledge repositories), all of which are in turn important in developing the Semantic Web.
Definitions - 1 • An ontology is a consensual, shared and formal description of the concepts that are important in a given domain, their properties (attributes) and the relations between them, i.e., it is a conceptual knowledge model or a specification of a conceptualisation. • Property constraints, facts, assertions, axioms* and rules are also part of an ontology. * Axioms are statements that assert additional facts about ontology classes. For example, an axiom might state that the classes male and female are disjoint (i.e., have no instances in common); given a statement like “Rami is of type male”, a reasoner can then infer that the statement “Rami is of type female” must be false. Axioms are a key ingredient of ontology definitions and one of the major benefits of ontology applications.
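The disjointness axiom on this slide can be illustrated with a minimal sketch. The data structures and the function name `violates_disjointness` are assumptions made for illustration; a real reasoner works on a formal logic, not on Python sets.

```python
# Toy knowledge base; class and individual names are illustrative only.
DISJOINT = {frozenset({"male", "female"})}  # axiom: male and female share no instances


def violates_disjointness(asserted_types, new_type):
    """Return True if adding new_type would place an individual in two disjoint classes."""
    return any(frozenset({t, new_type}) in DISJOINT for t in asserted_types)


types = {"Rami": {"male"}}

# "Rami is of type female" contradicts the axiom, so it is rejected.
rejected = violates_disjointness(types["Rami"], "female")
# An unrelated (hypothetical) class such as "physician" raises no conflict.
accepted = not violates_disjointness(types["Rami"], "physician")
print(rejected, accepted)
```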
Definitions - 2 • There is some similarity between ontologies and conceptual schemas in databases, but ontologies can do more (e.g., support inheritance). • A database can be considered a kind of ontology, but the main problem is that relational tables do not support inheritance by default. Inheritance can be implemented, at a price, using workarounds so that relational tables can store ontologies. Object-oriented databases attempt to solve these deficiencies of the relational model.
Definitions - 3 • Typically, an ontology identifies classes or categories of objects that are important in a domain, and organises these classes in a subclass hierarchy. • Each class is characterised by properties shared (inherited) by all elements in that class. • Important relations between classes or between the elements of these classes are also part of the ontology.
Definitions - 4 • Ontologies are content theories about the sorts of objects, properties of objects, and relations between objects that are possible in a specified domain of knowledge (Chandrasekaran et al, 1999). • N.B. Theories in AI fall into two broad categories: mechanism theories and content theories.
Ontology (Knowledge) is Power - 1 • The previous descriptions remind us of taxonomies. However, the real power of ontologies depends on the presence of inference and deduction rules, and reasoning and classification services. • An ontology of the domain is usually not a goal in itself. Developing an ontology is like defining a set of data and their structure (e.g., a database) for other programs to use. • Problem-solving methods/inference engines, domain-independent applications, and software agents use ontologies and knowledge bases built from ontologies as data (much like clinical terminologies, e.g., SNOMED-CT, and their description logics used for inference and classification).
Ontology (Knowledge) is Power - 2: Knowledge Engineers are now called Ontologists • Slide from a Medinfo 2001 presentation by Professor Mark Musen (Stanford) showing the photo of a female ontologist working for Yahoo! Ontologies (and their granularity/level of detail) are what make these online services excel.
Why Do We Need Ontologies? - 1 (Noy and McGuinness, 2001) • Ontologies establish a common terminology (grounding language) between members of a community of interest. • These members can be human or automated agents. • For example, suppose several different Web sites contain medical information or provide medical e-commerce services. If these Web sites share and publish the same underlying ontology of the terms they all use, then computer agents can extract and aggregate information from these different sites. The agents can use this aggregated information to answer user queries or as input data to other applications.
Why Do We Need Ontologies? - 2 (Noy and McGuinness, 2001) • Problems might arise if two ontologies used within the same project refer to the same thing in different ways (e.g., sex and gender, or zip code and postal code). • This confusion can be resolved if ontologies provide equivalence relations: one or both ontologies may contain the information that, for example, ‘sex’ is equivalent to ‘gender’.
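As a rough sketch of how such equivalence relations might be used, the snippet below maps equivalent terms to one canonical name before records from two sources are compared. The helper names (`canonical_map`, `normalise`) and the sample record are assumptions for illustration, and the mapping handles simple pairs only, not long chains of equivalences.

```python
# Equivalence statements taken from the slide's examples.
EQUIVALENT = [("sex", "gender"), ("zip code", "postal code")]


def canonical_map(pairs):
    """Map every term in each equivalence pair to a single canonical term."""
    canon = {}
    for a, b in pairs:
        # Reuse an existing canonical term if either side already has one.
        root = canon.get(a, canon.get(b, a))
        canon[a] = root
        canon[b] = root
    return canon


def normalise(record, canon):
    """Rename a record's keys to their canonical terms; unknown keys are kept."""
    return {canon.get(key, key): value for key, value in record.items()}


canon = canonical_map(EQUIVALENT)
record_from_site_b = {"gender": "F", "postal code": "EC1V"}  # hypothetical data
print(normalise(record_from_site_b, canon))
```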
Why Do We Need Ontologies? - 3 (Noy and McGuinness, 2001) • Ontologies also enable reuse of domain knowledge. • For example, models for many different domains need to represent the notion of time. This representation includes the notions of time intervals, points in time, relative measures of time, and so on. If one group of researchers develops such an ontology in detail, others can simply reuse it for their domains. • Additionally, if we need to build a large ontology, we can integrate several existing ontologies describing portions of the large domain. We can also reuse a general ontology and extend/customise it to describe our domain of interest.
Merging/Integrating Ontologies • This is usually necessary to solve real-world problems. KMI: http://kmi.open.ac.uk/
Why Do We Need Ontologies? - 4 (Noy and McGuinness, 2001) • Ontologies help make domain assumptions explicit. • This makes it possible to change these assumptions easily if our knowledge about the domain changes. • Hard-coding assumptions about the world in programming-language code makes these assumptions not only hard to find and understand but also hard to change, in particular for someone without programming expertise. • In addition, explicit specifications of domain knowledge are useful for new users who must learn what terms in the domain mean and how the domain is structured.
Why Do We Need Ontologies? - 5 (Noy and McGuinness, 2001) • Ontologies help separate domain knowledge from operational knowledge. • We can describe a task of configuring a product from its components according to a required specification and implement a program that does this configuration independently of the products and components themselves. • We can then develop an ontology of PC components and characteristics and apply the algorithm to configure made-to-order PCs. We can also use the same algorithm to configure elevators if we “feed” an elevator component ontology to it.
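The separation described above can be sketched as one generic configuration routine that is fed two different component ontologies. Everything here is a simplifying assumption: the `rating` and `price` attributes, the component names, and the "cheapest component that meets the minimum rating" strategy are invented for illustration.

```python
def configure(component_ontology, spec):
    """For each required kind of component, pick the cheapest one meeting the minimum rating.

    The routine knows nothing about PCs or elevators; all domain knowledge
    lives in the component_ontology argument.
    """
    chosen = {}
    for kind, minimum in spec.items():
        candidates = [c for c in component_ontology[kind] if c["rating"] >= minimum]
        if not candidates:
            raise ValueError(f"no {kind} with rating >= {minimum}")
        chosen[kind] = min(candidates, key=lambda c: c["price"])["name"]
    return chosen


# Two hypothetical domain descriptions sharing the same structure.
PC_COMPONENTS = {
    "cpu": [{"name": "cpu-basic", "rating": 2, "price": 100},
            {"name": "cpu-fast", "rating": 4, "price": 250}],
    "disk": [{"name": "disk-small", "rating": 1, "price": 40},
             {"name": "disk-large", "rating": 3, "price": 90}],
}
ELEVATOR_COMPONENTS = {
    "motor": [{"name": "motor-5kW", "rating": 5, "price": 900},
              {"name": "motor-10kW", "rating": 10, "price": 1500}],
    "cable": [{"name": "cable-std", "rating": 8, "price": 300}],
}

pc = configure(PC_COMPONENTS, {"cpu": 3, "disk": 1})
lift = configure(ELEVATOR_COMPONENTS, {"motor": 6, "cable": 8})
print(pc, lift)
```

Only the data passed in changes between the two calls; the algorithm is untouched.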
Why Do We Need Ontologies? - 6 (Noy and McGuinness, 2001) • Ontologies also help us analyse domain knowledge. This is possible once a declarative specification of the terms is available. • Formal analysis of terms is extremely valuable when both attempting to reuse existing ontologies and extending them.
Moving Beyond Rules (Musen, 2000) - 1 • Many developers in AI believed that large knowledge-based systems could be constructed simply by adding more and more “modular” if–then statements to the disordered collection of production rules that characterised most knowledge-based systems at the time. • Clancey (1983, cited in Musen, 2000) is credited with demonstrating early on the limitations of constructing knowledge-based systems simply by amassing collections of production rules. His careful analyses of the MYCIN knowledge base showed how the developers of that system’s rule base had to construct production rules that would interact with one another in rather arcane ways in order to coerce the system to demonstrate desired problem-solving behaviour.
Moving Beyond Rules (Musen, 2000) - 2 • For example, the sequencing of the clauses in the left-hand side of a rule often needed to be considered very carefully; changing the order of the conditions might radically change the way in which the program would gather and process information about the case under consideration. • Although members of the MYCIN project never documented such programming practices explicitly, knowledge-base builders would carefully tinker with the ordering of rule clauses in order to achieve the necessary system performance. When subsequent developers made seemingly innocent changes to the rule base, surprising changes in the system’s problem-solving behaviour could result.
Moving Beyond Rules (Musen, 2000) - 3 • Although an official claim was often made that rules such as those in MYCIN are “independent and modular,” it became clear that developers needed to view production rules as elements of a very high-level programming language. Developers intentionally (although perhaps subconsciously) created dependencies among the various rules in a knowledge base in order to effect the desired problem-solving behaviour. • Because the rules in most rule-based systems generally are not annotated, classified, or organised, the purpose of individual rules and the relationships among them can be difficult to determine by direct inspection of a knowledge base. Rule-based knowledge bases can rapidly become unmanageable.
Ontology Application Areas - 1 • Ontologies can be used in: • Information retrieval systems. • Digital libraries. • Integrating heterogeneous information sources. • Internet search engines. • Ontologies can enhance Web searches, relate the information on a page to associated knowledge structures and inference rules, and help us develop agents that can address complicated questions whose answers do not reside on a single Web page. Search engines also use ontologies to find pages with words that are syntactically different but semantically similar.
Ontology Application Areas - 2 • A good example of a well-understood ontology is the categorisation that Yahoo! provides users for searching the Internet (Musen, 2000). • The Yahoo! ontology defines broad categories of entries on the Web. Users understand the ontology, and apply it to locate the concepts that define their interests; the Yahoo! search engine can also process the ontology, and uses it to locate corresponding Web pages. • The relationships among the concepts in the Yahoo! ontology generally are taxonomic (i.e., they are primarily class–subclass relationships). Although the ontology does include some part–whole relationships, in general the goal is simply to provide an enumeration of searchable concept descriptions.
Ontology Application Areas - 3 • Ontologies can be used in (Cont’d): • Object-oriented design of software systems. • Natural-language understanding. • Knowledge-based problem solving. • e-commerce to enable machine-based communication between buyers and sellers. • In medicine: computer-interpretable guidelines. Clinical terminologies are also ontologies.
Mark Musen, Medinfo 2001: It’s not just for e-commerce anymore • The notion of ontology is central to the controlled medical terminologies that have formed the centrepiece of work in medical informatics since the nineteenth century. • Ontologies form the basis by which clinicians communicate with electronic patient record systems and enter case descriptions into decision-support systems. • Ontologies make it possible to build large, maintainable knowledge bases that can codify what we know about specific areas of clinical practice in precise, unambiguous terms. • Ontologies define how nurses record observations regarding their patients. • Ontologies define the terms with which health-care consumers interact with online information resources. • The development, use, and adaptation of ontologies are foundational to all work in informatics.
Description Logics - 1 • A description logic (DL) lies at the heart of any clinical terminology, e.g., SNOMED. • A DL is a language that allows reasoning about information, in particular supporting the classification of descriptions. It can infer knowledge implied by an ontology. • A DL models a domain in terms of individuals (modelled objects), concepts (descriptions of groups of objects sharing common characteristics) and roles (the relationships between concepts or individuals). Individuals are instances of the concepts that represent them.
Description Logics - 2 • DLs allow us to reason about the concepts and work out how they relate to one another. The subsumption or kind-of relationship is the most important, e.g., “Surgical Operation” subsumes “Kidney Transplant” as every “Kidney Transplant” is a “Surgical Operation.” • A collection of descriptions can be organised into a classification using the subsumption relationship, forming a hierarchy of descriptions, ranging from general to specific. • Classification is dynamic as new descriptions can get their position in an existing hierarchy determined by a classifier.
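A toy illustration of subsumption-based classification follows. It assumes each concept is described by a plain set of required features, so that a general concept subsumes a specific one when its features are a subset of the specific concept's features. This is a deliberate simplification for teaching purposes; real DL classifiers use far richer logics, and the feature names are invented.

```python
# Each concept is described by the features every instance must have.
CONCEPTS = {
    "Procedure": frozenset({"procedure"}),
    "Surgical Operation": frozenset({"procedure", "surgical"}),
    "Kidney Transplant": frozenset({"procedure", "surgical", "transplant", "site:kidney"}),
}


def subsumes(general, specific, concepts):
    """general subsumes specific if it demands no feature that specific lacks."""
    return concepts[general] <= concepts[specific]


def direct_parents(name, concepts):
    """Most specific subsumers of a concept, i.e. its position in the hierarchy."""
    ancestors = [c for c in concepts if c != name and subsumes(c, name, concepts)]
    return sorted(a for a in ancestors
                  if not any(b != a and subsumes(a, b, concepts) for b in ancestors))


print(subsumes("Surgical Operation", "Kidney Transplant", CONCEPTS))

# Classification is dynamic: a new description finds its place automatically.
CONCEPTS["Transplant"] = frozenset({"procedure", "surgical", "transplant"})
print(direct_parents("Transplant", CONCEPTS))
print(direct_parents("Kidney Transplant", CONCEPTS))
```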
Description Logics - 3 • We can make assertions about individuals (objects) which tell us facts about them and we can relate two individuals (objects) using a role. This may change the classification of concerned individuals. • Given a concept definition or description, we can ask for or retrieve all the individuals that are instances of that concept. The hierarchy can be used during retrieval to allow different types of queries with very crisp results. • Given an individual, we can also determine the most specific concepts that the individual is an instance of, taking into account any assertions that have been made about the individual. This is known as realisation.
Description Logics - 4 • A DL is often described as being split into two parts: the T-box and the A-box. • The T-box is concerned with reasoning about the concept definitions, providing subsumption and classification services. • The A-box reasons about relationships between individuals (instances), providing retrieval and realisation services.
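The A-box services, retrieval and realisation, can be sketched in the same simplified style, with concepts and individuals both described by feature sets. The concept names, individuals and features below are hypothetical, and the set-inclusion test stands in for real DL reasoning.

```python
# T-box: concept definitions as sets of required features (illustrative only).
TBOX = {
    "Person": {"person"},
    "Patient": {"person", "under_care"},
    "Diabetic Patient": {"person", "under_care", "has:diabetes"},
}
# A-box: assertions made about individuals.
ABOX = {
    "anna": {"person"},
    "omar": {"person", "under_care", "has:diabetes"},
}


def retrieve(concept, tbox, abox):
    """Retrieval: all individuals that are instances of the concept."""
    return sorted(i for i, facts in abox.items() if tbox[concept] <= facts)


def realise(individual, tbox, abox):
    """Realisation: the most specific concepts the individual is an instance of."""
    matching = [c for c in tbox if tbox[c] <= abox[individual]]
    return sorted(c for c in matching
                  if not any(tbox[c] < tbox[d] for d in matching))


patients_before = retrieve("Patient", TBOX, ABOX)
anna_before = realise("anna", TBOX, ABOX)

# A new assertion about an individual can change its classification.
ABOX["anna"] = ABOX["anna"] | {"under_care"}
patients_after = retrieve("Patient", TBOX, ABOX)
anna_after = realise("anna", TBOX, ABOX)
print(anna_before, anna_after)
```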
Ontology Modelling Languages and Tools - 1 • Ontology modelling languages and tools like Protégé-2000 (Stanford, US) and OILEd (Manchester, UK) supply the modelling primitives necessary to provide adequate power of expression and clarity. • These primitives are derived from first-order logic, description logics and frame-based systems, based on the notions of classes (concepts or frames) having properties (roles or slots).
Ontology Modelling Languages and Tools - 2 • Slots can have their own properties (constraints), e.g., domain and range*, and can be arranged in a hierarchy (i.e., subslots). • Not all languages are equally powerful. For example, OIL** can represent axioms while raw RDF(S)*** cannot (see later). * Besides strings and numbers, classes and instances of classes can act as slot values for instances of other classes. The classes allowed as values for a slot of type Instance are often called the range of the slot. The classes to which a slot is attached, or whose property the slot describes, are called the domain of the slot. ** Ontology Inference Layer *** Resource Description Framework/Schema
Types of Ontologies (Stoffel et al, 1997) • Traditional Ontologies: ontologies consisting of only the definitions. • Hybrid Ontologies: ontologies combining both ontological relations and the instances defined thereon. • Protégé-2000 supports hybrid ontologies and saves class definitions and instances in two separate files (under the same project).
RDF as a Semantic Web Ontology Representation Language - 1 • In his original 1989 proposal that gave rise to the World Wide Web, Tim Berners-Lee mentioned some ideas very closely related to those formalised nearly a decade later as RDF (Resource Description Framework). In particular, he suggested a directed, labelled graph model with link and node types that encompass metadata applications, in addition to simpler document linking. • RDF is a W3C (World Wide Web Consortium) recommendation for metadata. • RDF is based on XML (the eXtensible Markup Language), and builds on a well-known branch of mathematics: graph theory, plus the experiences of the knowledge acquisition and representation community.
RDF as a Semantic Web Ontology Representation Language - 2 • RDF can represent relationships, while raw XML cannot. • RDF statements can be viewed in three mathematically equivalent representations: • as a labelled directed graph. This is a good representation for humans; • as triples: (Object, Attribute, Value) or, equivalently, (Subject, Predicate, Object). The first element is usually a resource identified by a URI (Uniform Resource Identifier), the second is a property of that resource, and the third is the value of that property. Values can be atomic, other resources (URIs) or even metadata instances. Triples are accessible to application software; and • as an XML-based representation for exchange between computers.
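The correspondence between the triple view and the graph view can be sketched in a few lines of plain Python. The `http://example.org/` URIs and property names are placeholders, and no RDF library is used; this only shows that the same statements can be read as a list of triples or as a labelled directed graph.

```python
from collections import defaultdict

EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

# Each statement is a (subject, predicate, object) triple.
# Objects may be other resources (URIs) or atomic values (plain strings here).
triples = [
    (EX + "lecture1", EX + "creator", EX + "staff/author1"),
    (EX + "lecture1", EX + "title", "Ontologies"),
    (EX + "staff/author1", EX + "worksAt", EX + "org/university1"),
]


def as_graph(statements):
    """View the triples as a labelled directed graph: node -> [(edge label, node)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


def objects(statements, subject, predicate):
    """All values of a given property of a given resource."""
    return [o for s, p, o in statements if s == subject and p == predicate]


graph = as_graph(triples)
creators = objects(triples, EX + "lecture1", EX + "creator")
# Follow an edge: because the value is itself a resource, it can be queried further.
employers = objects(triples, creators[0], EX + "worksAt")
print(creators, employers)
```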
RDF as a Semantic Web Ontology Representation Language - 3 • RDF is a very simple format for predicate logic, making it possible to use it for modelling ontologies and drawing conclusions by generalising from assertions or from combining several assertions. • The difference from traditional predicate logic is that the syntax of RDF is declared in the RDF schema (RDFS), which means it is specific to the application instead of general, like predicate logic. • The RDF schema is used to define the set of resources that may be used by a model, including constraints for resources (e.g., range and domain) and literal values (constants or string values). • It creates the structure which users later fill with their descriptions (instances) and which can be used for consistency checks (that the actual RDF triples follow the defined constraints).
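The consistency check mentioned on this slide might look roughly like the sketch below, which treats domain and range as constraints to validate, as the slide describes. Note that under the formal RDFS semantics, domain and range are used to infer the types of resources and do not reject data, so this is an application-level reading. The schema, the type assignments and the function name `check` are all hypothetical.

```python
# Hypothetical schema: each property declares the class of its subject (domain)
# and the class of its value (range).
SCHEMA = {"treats": {"domain": "Physician", "range": "Patient"}}

# Hypothetical type assertions for the resources used below.
TYPES = {"dr_lee": "Physician", "sam": "Patient", "aspirin": "Drug"}


def check(triples, schema, types):
    """Return (triple, problem) pairs for statements that break the schema."""
    problems = []
    for subject, predicate, obj in triples:
        rule = schema.get(predicate)
        if rule is None:
            problems.append(((subject, predicate, obj), "unknown property"))
            continue
        if types.get(subject) != rule["domain"]:
            problems.append(((subject, predicate, obj), "domain violation"))
        if types.get(obj) != rule["range"]:
            problems.append(((subject, predicate, obj), "range violation"))
    return problems


good = [("dr_lee", "treats", "sam")]
bad = [("dr_lee", "treats", "aspirin"), ("sam", "likes", "dr_lee")]
print(check(good, SCHEMA, TYPES))
print(check(bad, SCHEMA, TYPES))
```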
RDF as a Semantic Web Ontology Representation Language - 4 • RDF offers only the most basic ontology-modelling primitives; it only knows binary relations (properties), and so cannot model ontological axioms, which correspond to n-ary relations between class expressions, where n is two or greater. • Newer languages, namely OIL (Ontology Inference Layer) and DAML+OIL (the DARPA Agent Markup Language merged with OIL), which build on RDFS, offer full support for axioms.
Building Ontologies: Protégé-2000 - 1 http://protege.stanford.edu/
Protégé-2000’s “tabbed” top-level GUI design integrates (1) the modelling of an ontology of classes describing a particular subject, (2) the creation of a knowledge-acquisition tool for collecting knowledge (Forms), (3) the entering of specific instances of data and creation of a knowledge base, and (4) the execution of applications.
Protégé-2000 - 3 • The ontology defines the set of concepts and their relationships (classes and slots; a slot is an attribute of a class, e.g., a physician class might have name, title, and phone number as slots). • Slots can have different facets describing the value type, allowed values/default value, the number of values (cardinality), and other features of the values the slot can take. • A value-type facet describes what types of values can fill in the slot. Common value types include string, number, Boolean, and enumerated (or symbol slots—list of allowed slot values). • Classes and instances of classes can also act as slot values for instances of other classes. • The knowledge-acquisition tool (Forms) is designed to be domain-specific, allowing domain experts to easily and naturally enter their knowledge of the area (entering of specific instances of data and creation of a knowledge base). Protégé-2000 is FREE for anyone to download: http://protege.stanford.edu
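The idea of slots and facets can be approximated in a short sketch. This is not Protégé-2000's actual data model or API; the `Slot` class, its field names and the allowed titles are assumptions chosen to mirror the facets named on the slide (value type, allowed values, cardinality).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    """A slot with a few facets constraining the values it may take."""
    name: str
    value_type: type                 # value-type facet
    allowed: Optional[Tuple] = None  # enumerated (symbol) values, if any
    min_card: int = 0                # cardinality facets
    max_card: Optional[int] = 1      # None means "any number of values"


# The physician class from the slide, with hypothetical facet settings.
PHYSICIAN = [
    Slot("name", str, min_card=1),
    Slot("title", str, allowed=("Dr", "Prof")),
    Slot("phone", str, max_card=None),
]


def validate(instance, slots):
    """Check an instance (slot name -> list of values) against the slot facets."""
    problems = []
    for slot in slots:
        values = instance.get(slot.name, [])
        if len(values) < slot.min_card:
            problems.append(f"{slot.name}: too few values")
        if slot.max_card is not None and len(values) > slot.max_card:
            problems.append(f"{slot.name}: too many values")
        for value in values:
            if not isinstance(value, slot.value_type):
                problems.append(f"{slot.name}: wrong type")
            elif slot.allowed is not None and value not in slot.allowed:
                problems.append(f"{slot.name}: value not allowed")
    return problems


ok = validate({"name": ["A. Smith"], "title": ["Dr"], "phone": ["555-0100", "555-0101"]}, PHYSICIAN)
faulty = validate({"title": ["Mr"]}, PHYSICIAN)
print(ok, faulty)
```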
Protégé-2000 - 4 • The resulting knowledge base can then be used with a Problem-Solving Method (PSM - a computer program used in conjunction with a knowledge base) or inference engine to answer questions and solve problems regarding the domain. • Finally, an application is the end product created when the knowledge base is used in solving an end-user problem employing appropriate problem-solving, expert-system, or decision-support methods.
Protégé-2000 - 5 • Protégé-2000 can be extended with many useful plug-ins like the UMLS tab. • The UMLS tab is a handy Protégé-2000 plug-in that connects from within Protégé-2000 to the UMLS Knowledge Source Server (KSS) of the US National Library of Medicine (users must first register their IP address with the UMLS KSS). It allows browsing and searching UMLS, and directly annotating an ontology in Protégé-2000 with terms imported from UMLS as classes or instances (Li et al, 2000).
Screenshot of Protégé-2000 showing UMLS search results/ narrow tree for ‘diabetes mellitus’ in the UMLS tab. The 2001 edition of the Unified Medical Language System (UMLS) Metathesaurus includes about 800,000 concepts and 1.9 million concept names in over 60 different biomedical source vocabularies, some in multiple languages, all available through the UMLS tab in Protégé-2000. The UMLS Semantic Network is indeed a huge ontology on its own merits. A system using UMLS can (if properly implemented) support concept synonyms, multilingual concepts, concept qualifiers, and semantic relationships (related concepts) like ancestors, descendants, parents, children, siblings, narrower, broader and other related concepts.
Protégé-2000 - 7 • Protégé-2000 supports hybrid ontologies and saves class definitions and instances in two separate files (under the same project). • The native Protégé-2000 format is CLIPS text files (C Language Integrated Production System). • A Protégé JDBC database back-end permits the storage of Protégé-2000 knowledge bases in a relational database. Users can access the database outside the Protégé-2000 environment, and will benefit from the faster access and search facilities that a relational database offers. • An RDF (Resource Description Framework) Schema back-end plug-in is also available that allows saving and opening projects in RDFS format.
RDF Support in Protégé-2000 (Musen, 2000) • Knowledge bases and ontologies encoded in RDF and RDF-schema promise to interact with Internet-based problem solvers in ways that will greatly expand the scope of what we commonly view as a “knowledge-based system.” • Intelligent systems promise to insinuate themselves ubiquitously into the very fabric of the World Wide Web. As the software-engineering community seeks scalable solutions to the problem of developing systems for this highly dynamic, distributed environment, large-grained abstractions such as domain ontologies and reusable problem-solving methods will inevitably be important components of Semantic Web-based software systems.
OILEd - 1 Screenshot of OILEd, the free OIL editor developed at the University of Manchester, UK (http://img.cs.man.ac.uk/oil/). FaCT (Fast Classification of Terminologies), also developed at the University of Manchester, is a Description Logic (DL) classifier that can also be used for modal logic satisfiability testing (http://www.cs.man.ac.uk/~horrocks/FaCT/). It can be invoked from within OILEd (Reasoner Menu). In the screenshot on the right, FaCT detected that “tasty_plant” (highlighted in red near the bottom of the screenshot) has a problem [it has “carnivore” as a value for the slot “is_eaten_by”, which conflicts with the definition of the class “carnivore” (which only eats animals, whereas “tasty_plant” is a subclass of “plant”)]. N.B.: An important feature of DL expressions is that they can be described in a mathematically precise way, enabling reasoning with concept descriptions and the automatic derivation of classification taxonomies.
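The kind of conflict reported for “tasty_plant” can be imitated with a hand-written check. This is not how FaCT works internally (it is a general DL reasoner); the dictionaries below simply hard-code the few facts from the example, and the assumption that “plant” and “animal” are disjoint is made explicit because the conflict depends on it.

```python
# Facts from the example, encoded by hand (illustrative only).
SUBCLASS_OF = {"tasty_plant": "plant", "cow": "animal"}
DISJOINT = {frozenset({"plant", "animal"})}   # assumed: nothing is both plant and animal
EATS_ONLY = {"carnivore": "animal"}           # carnivores eat only animals
IS_EATEN_BY = {"tasty_plant": ["carnivore"], "cow": ["carnivore"]}


def ancestors(cls):
    """The class itself followed by its chain of superclasses."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def conflicts(food):
    """Eaters whose 'eats only X' restriction is incompatible with the food's class."""
    found = []
    for eater in IS_EATEN_BY.get(food, []):
        only = EATS_ONLY.get(eater)
        if only and any(frozenset({a, only}) in DISJOINT for a in ancestors(food)):
            found.append(eater)
    return found


print(conflicts("tasty_plant"))
print(conflicts("cow"))
```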
OILEd - 2 The same result is obtained in Protégé-2000 with the OIL classifier tab, which allows the classification of OIL ontologies by calling the same FaCT description logic classifier.
Medical Examples - 1 • The National Cancer Institute (NCI), Bethesda, MD, USA developed an “eligibility criteria writer tab” in collaboration with the Protégé Team at Stanford Medical Informatics. • Each clinical trial protocol includes a long list of patient criteria that determine eligibility or exclusion. For given types of tumours, certain eligibility criteria can be inferred from well-characterised clinical states. Each eligibility criterion must be linked to standard patient data that can be used to determine the value of that criterion. • You may download the Protégé EligibilityWriter from http://protege.stanford.edu/download/old_releases/0.7_nci_demo/Install.exe • See also: http://protege.stanford.edu/plugins/eligwritertab/EligWriterJan01.zip and http://protege.stanford.edu/plugins/eligscreeningtab/eligibility_screening_tab.html
Medical Examples - 3 (Eligibility Criteria Writer Tab) Linking eligibility criteria to “Common Data Elements” (CDE); cf. the cancer dataset in the UK
Medical Examples - 5 • The Digital Anatomist at the University of Washington had its ontology developed in Protégé. • The project aims at supporting human anatomy teaching, and radiotherapy planning. • See: http://medicine.ucsd.edu/f99/D005771.htm and http://www9.biostr.washington.edu/da.html
Computer-interpretable Guidelines - 1 • Computer-interpretable guidelines (CIGs) aim to deliver patient-specific recommendations that are integrated with electronic patient records and health information systems at the point of care, i.e., integrated into workflow. • CIGs are used to generate automated reminders/alerts; in decision support and task management; to perform retrospective analysis to test if patients were treated appropriately; to check order entry appropriateness and referral criteria; and for background monitoring, execution of care plans and quality review. • Individual patient data from the electronic patient record are matched to guideline terms and flowcharts; the recommendations in guidelines are matched to actions in the order entry system or for prescription printing. • The ultimate goal is to fully apply guidelines to clinical practice, and continually evaluate their application and modify/refine guidelines accordingly (protocol-guided care). • See: http://www.smi.stanford.edu/projects/intermed-web/guidelines/GLIF3/GLIF_Tutorial.pdf
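The matching of patient data to guideline terms can be sketched as below. This is only a schematic illustration of reminder generation: the guideline steps, intervals and record fields are invented placeholders with no clinical authority, and the structure is not GLIF or any other real guideline format.

```python
# Hypothetical guideline steps: which diagnosis they apply to, which check they
# monitor, and the (placeholder) maximum interval before a reminder is raised.
GUIDELINE = [
    {"applies_to": "diabetes mellitus", "check": "eye examination",
     "max_interval_months": 12, "action": "Reminder: eye examination is due"},
    {"applies_to": "hypertension", "check": "blood pressure review",
     "max_interval_months": 6, "action": "Reminder: blood pressure review is due"},
]


def due_recommendations(patient, guideline):
    """Match a patient's record to guideline terms and return the actions that apply.

    A check that was never recorded counts as overdue.
    """
    never = float("inf")
    return [step["action"] for step in guideline
            if step["applies_to"] in patient["diagnoses"]
            and patient["months_since"].get(step["check"], never) > step["max_interval_months"]]


# Hypothetical extracts from an electronic patient record.
overdue_patient = {"diagnoses": {"diabetes mellitus"}, "months_since": {"eye examination": 14}}
up_to_date_patient = {"diagnoses": {"diabetes mellitus"}, "months_since": {"eye examination": 3}}

print(due_recommendations(overdue_patient, GUIDELINE))
print(due_recommendations(up_to_date_patient, GUIDELINE))
```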