Ontologies

Ontologies Presented by: Rokhlenko Oleg olegro@cs.technion.ac.il Data Integration Seminar Spring 2002 Supervisor: Ron Y. Pinter

References
• Ontologies: Principles, Methods and Applications, by Mike Uschold (The University of Edinburgh, Scotland) and Michael Gruninger (University of Toronto, Canada)
• Ontology-Driven Integration of Scientific Repositories, by Vassilis Christophides, Catherine Houstis, Spyros Lalis and Hariklia Tsalapata (Department of CS, University of Crete, Greece)

Agenda • Why Ontologies and what are they? • Uses of Ontologies. • A skeletal methodology for building ontologies. • Ontologies in practice.

What are the problems? • The lack of shared understanding leads to: • Poor communication within and between people and their organizations • Difficulties in identifying requirements and thus in defining a specification of the system • Disparate modeling methods, paradigms, languages and software tools severely limit: • Inter-operability • The potential for re-use and sharing • => much wasted effort re-inventing the wheel

How can we solve them? The way to address these problems is to reduce or eliminate conceptual and terminological confusion and come to a shared understanding. Such an understanding can function as a unifying framework for the different viewpoints and serve as the basis for: • Communication between people. • Inter-Operability between systems. • System Engineering benefits such as: • Re-usability • Reliability • Specification

Examples (1) • Unifying Research Fields Situation/Problem: Researchers in the different but related fields of AI Planning, Decision Theory and Distributed Systems Theory cannot readily make use of each other’s results. This is because they have different perspectives on, and use different terms for, the same underlying ideas. Solution: Develop a unifying conceptual framework which enables research results in one field to be applied to the other fields.

Examples (2) • Semiconductor Fabrication Situation/Problem: Software bought in from the outside includes a WIP tracking system and a production line simulation package. The simulation package requires as input a very large description of a model of the product flow in the factory, which incorporates various details of the WIP tracking mechanism. When new versions of the simulation package are released, or if a new supplier is chosen, the model must be converted to a new format. This conversion is both time-consuming and error-prone. Solution: Automate the process of converting the model when new external software is introduced. This both saves time and ensures model fidelity.

Examples (3) • Spacecraft Mission Operations Situation/Problem: Various knowledge-based systems were developed independently to assist in different aspects of spacecraft operations (e.g. in planning, anomaly detection, diagnosis). Each uses its own approach to structuring and representing the relevant concepts in a large knowledge base. It is desirable to integrate these systems, so that each can make use of the knowledge of the others. Solution: Use a federated agent-based approach to knowledge sharing. The overall system is called ATOS: Advanced Technology Operations System.

What is an ontology? • From Greek: Ontos = being, logos = science • ‘Ontology’ is the term used to refer to the shared understanding of some domain of interest, which may be used as a unifying framework to solve the problems described above. • An ontology necessarily entails or embodies some sort of world view with respect to a given domain. The world view is often conceived as a set of concepts (e.g. entities, attributes, processes), their definitions and their interrelationships; this is referred to as a conceptualization.

What is an ontology? (cont.) • Such a conceptualization may be implicit, e.g. existing only in someone’s head, or embodied in a piece of software. For example, an accounting package presumes some world view encompassing such concepts as an invoice and a department in an organization. The word ‘ontology’ is sometimes used to refer to this implicit conceptualization. • However, the more standard usage, and the one we will adopt, is that the ontology is an explicit account or representation of [some part of] a conceptualization.

What does an ontology look like? • An [explicit] ontology may take a variety of forms, but necessarily it will include a vocabulary of terms and some specification of their meaning (i.e. definitions). The degree of formality by which a vocabulary is created and meaning is specified varies considerably: • highly informal: expressed loosely in natural language • semi-informal: expressed in a restricted and structured form of natural language • semi-formal: expressed in an artificial, formally defined language • rigorously formal: meticulously defined terms with formal semantics, theorems and proofs of such properties as soundness and completeness.

What have we covered so far? • Why Ontologies and what are they? • Uses of Ontologies. • A skeletal methodology for building ontologies. • Ontologies in practice.

Uses of ontologies
We identify three main categories of uses for ontologies. Within each, other distinctions may be important, such as the nature of the software, who the intended users are, and how general the domain is.
• COMMUNICATION between people and organizations
• INTER-OPERABILITY between systems
• SYSTEM ENGINEERING: Reusable Components, Reliability, Specification

Communication • Normative Models • Within any large-scale integrated software system, different people must have a shared understanding of the system and its objectives. • Networks of Relationships • We can also use ontologies to create a network of relationships, keep track of what is linked, and explore and navigate through this network. • Consistency and Lack of Ambiguity • One of the most important roles an ontology plays in communication is that it provides unambiguous definitions for terms used in a software system. • Integrating Different User Perspectives • If we have a system with multiple communicating agents, this integration through shared understanding becomes vital.

Inter-Operability • Many applications of ontologies address the issue of inter-operability, in which different users need to exchange data or are using different software tools. A major theme for the use of ontologies in domains such as enterprise modeling and multi-agent architectures is the creation of an integrating environment for different software tools.

Example
The term ‘procedure’ used by one tool is translated into the term ‘method’ used by the other via the ontology, whose term for the same underlying concept is ‘process’.
[Figure: a procedure viewer asks "give me the procedure for…"; its translator looks up procedure = process in the ontology and passes on "give me the process for…"; the method library's translator uses METHOD = process and asks "give me the METHOD for…". The answer travels back the same way: "here is the METHOD for…", "here is the process for…", "here is the procedure for…".]
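The procedure/process/method example can be sketched in a few lines of Python. This is only an illustration: the two term tables and the `translate` helper are hypothetical names, and a real translator would map whole requests rather than single words.

```python
# Hypothetical term tables: each tool maps its own term to the ontology's term.
VIEWER_TERMS = {"procedure": "process"}   # procedure viewer -> ontology
LIBRARY_TERMS = {"method": "process"}     # method library  -> ontology


def translate(term, source_terms, target_terms):
    """Translate a tool-specific term into another tool's term via the ontology."""
    concept = source_terms[term]                       # e.g. procedure -> process
    from_ontology = {c: t for t, c in target_terms.items()}
    return from_ontology[concept]                      # e.g. process -> method


print(translate("procedure", VIEWER_TERMS, LIBRARY_TERMS))
print(translate("method", LIBRARY_TERMS, VIEWER_TERMS))
```

Neither tool needs to know the other's vocabulary; each only maintains its own mapping to the shared ontology term.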

Ontologies as Inter-Lingua
To assist inter-operability, ontologies can be used to support translation between different languages and representations.
One approach is to design unique translators for every two-party exchange; however, this would require O(n²) translators for n different ontologies.
Using ontologies as an interlingua to support translation, we can reduce the number of translators to O(n) for n different ontologies, since it only requires translators from each native ontology into the interchange ontology.
[Figure: on one side, languages L1 to L4 connected pairwise; on the other, L1 to L4 each connected through a translator (T1 to T4) to a central Interlingua.]
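The O(n²) versus O(n) claim can be checked with simple counting. The sketch below assumes each translator is bidirectional, as the four translators T1 to T4 for four languages in the figure suggest; with one-way translators both counts double, and the asymptotic comparison stays the same. The function names are illustrative.

```python
def pairwise_translators(n):
    """One bidirectional translator per unordered pair of representations."""
    return n * (n - 1) // 2


def interlingua_translators(n):
    """One bidirectional translator between each representation and the interlingua."""
    return n


for n in (2, 4, 10, 50):
    print(n, pairwise_translators(n), interlingua_translators(n))
```

For the four languages in the figure this gives 6 pairwise translators against 4 with an interlingua, and the gap widens quadratically as n grows.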

System Engineering • The applications of ontologies that we have considered to this point have focused on the role that ontologies play in the operation of software systems. In this section we consider applications of ontologies that support the design and development of the software systems themselves: • Specification • Reliability • Reusability

1. Specification • A shared understanding of the problem and the task at hand can assist in the specification of software systems. The ontology’s role in specification varies with the degree of formality within the system design methodology: • In an informal approach, ontologies facilitate the process of identifying the requirements of the system and understanding the relationships among the components of the system. This is particularly important for systems involving distributed teams of designers working in different domains. • In a formal approach, an ontology provides a declarative specification of a software system, which allows us to reason about what the system is designed for, rather than how the system supports this functionality.

2. Reliability • Informal ontologies can improve the reliability of software systems by serving as a basis for manual checking of the design against the specification. • Using formal ontologies enables [semi-]automated consistency checking of the software system with respect to the declarative specification. In addition, formal ontologies can be used to make explicit the various assumptions made by different components of a software system, facilitating their integration. • Declaratively specified assumptions may explicitly restrict the applicability of a particular ontology to a problem domain. By proving that the ontology is capable of supporting various reasoning problems, we can demonstrate the reliability of the software system within the domain.

3. Reusability • To be effective, ontologies must also support reusability, so that we can import and export modules among different software systems. • The problem is that when software tools are applied to new domains, they may not perform as expected, since they relied on assumptions that were satisfied in the original applications but not in the new ones. • By characterizing classes of domains and tasks within these domains, ontologies provide a framework for determining which aspects of an ontology are reusable between different domains and tasks.

A Skeletal Methodology for Building Ontologies • Although there is much collective experience in developing and using ontologies, there are no standard methodologies for building ontologies. • A proposed comprehensive methodology for developing ontologies includes the following stages: • Identifying Purpose and Scope; • Building the Ontology; • Evaluation; • Documentation.

Purpose and Scope • It is important to be clear about why the ontology is being built and what its intended uses are. The previous section explores the space of possible uses; this can be a starting point in identifying the purpose of an ontology yet to be constructed. • It will also be useful to identify and characterize the range of intended users of the ontology.

Building the Ontology • The identification of the purpose and scope of the ontology, at least in general terms, serves to provide a reasonably well-defined target for building the ontology. • Three aspects to this are: • capture, • coding, • and integration of existing ontologies.

Capture • By ontology capture, we mean: 1) identification of the key concepts and relationships in the domain of interest; (scoping) 2) production of precise unambiguous text definitions for such concepts and relationships; 3) agreeing on all of the above.

1) Scoping • Brainstorming: Have a brainstorming session to produce all potentially relevant terms and phrases. • Grouping: Structure the terms loosely into work areas corresponding to naturally arising subgroups. • Connecting: Identify semantic cross-references between the areas, i.e. concepts that are likely to refer to or be referred to by concepts in other areas. This information can be used to help identify which work area to tackle first to minimize the likelihood of rework.

2) Produce Definitions • Determining Meta-Ontology: Let the careful consideration of the concepts and their interrelationships determine the requirements for the meta-ontology. Keep in mind various possibilities, and use words and phrases in a consistent manner where appropriate (e.g. role, entity, relationship, type, instance). • Work Areas: Address each work area in turn. Start with work areas that have the most semantic overlap with other work areas. • Terms: Proceed in a middle-out fashion rather than top-down or bottom-up. That is, define the most fundamental terms in each work area before moving on to more abstract and more specific terms within a work area. • The idea of what is fundamental, or basic, is a psychological phenomenon. For example, ‘dog’ is basic, ‘mammal’ is a generalization, and ‘cocker spaniel’ is a specialization.
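The advice to start with the work areas that have the most semantic overlap can be made concrete by counting cross-references per area. The sketch below is only an illustration; the work-area names and the cross-reference pairs are hypothetical.

```python
from collections import Counter

# Hypothetical cross-references between work areas, found during scoping.
cross_references = [
    ("Activities", "Resources"),
    ("Activities", "Organisation"),
    ("Activities", "Time"),
    ("Organisation", "Strategy"),
    ("Resources", "Time"),
]


def order_work_areas(pairs):
    """Return work areas sorted by number of cross-references, highest first."""
    overlap = Counter()
    for a, b in pairs:
        overlap[a] += 1
        overlap[b] += 1
    # Ties are broken alphabetically so the order is stable.
    return sorted(overlap, key=lambda area: (-overlap[area], area))


print(order_work_areas(cross_references))
```

Tackling the most connected area first means its definitions are settled before the areas that depend on them are written, which is the rework argument made on the scoping slide.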

Why Middle-Out Approach? Bottom-Up Approach: • High level of detail • Difficult to spot commonality • Inconsistency • Re-work & more effort

Why Middle-Out Approach? Top-Down Approach: • Better control of the level of detail • Choosing arbitrary high-level categories • Less stability • Re-work & more effort

Why Middle-Out Approach? Middle-Out Approach: • Starting with the most important concepts • Balance in terms of the level of detail • Capture commonality • Stability • Consistency & Accuracy • Less re-work & less effort

3) Reaching Agreement • There is considerable variation in the degree of effort required to agree on definitions and terms for underlying concepts. For some terms, consensus on the definition of a single concept can be fairly easy. In other cases, several terms seem to correspond to one concept definition. • In practice, there are only a few cases where a commonly used term has significantly different informal usages but no usefully distinct definitions can be agreed. Such cases should be recorded in notes against the definition. • Finally, some highly ambiguous terms are identified as corresponding to several closely related, but different concepts. In this situation, the term itself gets in the way of a shared understanding.

Coding • By coding, we mean explicit representation of the conceptualization captured in the previous stage in some formal language. This will involve: • committing to the basic terms that will be used to specify the ontology (e.g. class, entity, relation); this is often called a ‘meta-ontology’ because it is, in essence, the [underlying] ontology of representational terms that will be used to express the main ontology; • choosing a representation language (which is capable of supporting the meta-ontology); • writing the code.
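The three coding steps can be illustrated with a deliberately small sketch. It assumes a meta-ontology with a single representational term, `Concept`, and one relation, generalization, and it uses Python as a stand-in representation language; the class, the helper and the text definitions are all illustrative rather than taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    """Meta-ontology term: a named concept with a text definition."""
    name: str
    definition: str
    generalizes_to: Optional["Concept"] = None  # link to the more abstract concept


def generalizations(concept):
    """Walk upwards and list the names of all more abstract concepts."""
    chain = []
    current = concept.generalizes_to
    while current is not None:
        chain.append(current.name)
        current = current.generalizes_to
    return chain


# Main ontology, coded middle-out: 'dog' is the basic term.
mammal = Concept("mammal", "illustrative definition of mammal")
dog = Concept("dog", "illustrative definition of dog", mammal)
cocker_spaniel = Concept("cocker spaniel", "illustrative definition of a dog breed", dog)

print(generalizations(cocker_spaniel))
```

A real project would normally pick a dedicated ontology or logic language instead; the point here is only the separation between the representational terms and the domain terms expressed with them.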

Integrating Existing Ontologies • During either or both of the capture and coding processes, there is the question of how and whether to use [all or part of] ontologies that already exist. In general this is a very difficult problem. One way forward is to make explicit all assumptions underlying the ontology. • Overall, provision of guidance and tools in this area may be one of the biggest challenges in developing a comprehensive methodology for building ontologies. It is easy enough to identify synonyms, and to extend an ontology where no concepts readily exist. However, when there are obviously similar concepts defined in existing ontologies, it is rarely clear how and whether such concepts can be adapted and reused.

Evaluation • Gómez-Pérez provides a good definition of evaluation in the context of knowledge sharing technology: “to make a technical judgment of the ontologies, their associated software environment, and documentation with respect to a frame of reference … The frame of reference may be requirements specifications, competency questions, and/or the real world.” • Some detailed work has been done on the evaluation of ontologies which could contribute to a comprehensive methodology for building ontologies. The approach taken in some of this work is to look first at what has been done in the field of KBS, and to adapt it for ontologies.

Documentation • It may be desirable to have established guidelines for documenting ontologies, possibly differing according to the type and purpose of the ontology. • As pointed out by Skuce, one of the main barriers to effective knowledge sharing is the inadequate documentation of existing knowledge bases and ontologies. To address these problems, all important assumptions should be documented, both about the main concepts defined in the ontology and about the primitives used to express the definitions in the ontology (i.e. the meta-ontology).

A scenario for Coastal Zone Management (CZM) • Environmental scientists and public institutions working on CZM often need to extract and combine data from different scientific disciplines, such as marine biology, physical and chemical oceanography, geology and engineering, stored in distributed repositories. • For example: the transport of waste in a particular coastal area given a pollution source. Local authorities could require this information to determine the best location for installing a waste pipeline. • This data is typically generated through a 2-step process, involving the execution of 2 different programs:

Example
Combination of data and programs for producing waste transport data:
[Figure: Currents and Bathymetry feed the Ocean Circulation Model, which produces Sea Circulation; Sea Circulation, Bathymetry and the Pollution Source feed the Waste Transport Model, which produces Waste.]

A scenario for CZM (cont.) Provided that the user has no knowledge of this information, the following actions are necessary to discover which productions can be used to obtain Waste data for a particular coastal area using the available resources: • Locate Waste data stored in the distributed repositories. • Determine the usability of such search results. • If no Waste data that satisfies the user requirements is available, locate programs capable of producing this data. • Having identified an appropriate program, i.e. a Waste Transport model, determine the required input. • For each input, locate appropriate sources or determine ways to produce the corresponding data sets.

An Ontology for the Waste Transport Scenario
[Figure: ontology diagram. Node labels: CZM; Chemical Oceanography; Physical Oceanography; Waste; Global Currents; Sea Circulation; Bathymetry; Waste Transport Model; Ocean Circulation Model.]

The Knowledge Base • In order to allow reasoning on the combination alternatives between data and programs, we advocate a definition of the ontology notions in a KBS using Horn Clauses. • An ontology notion N is defined as a clause N(A1, A2, …, An) where A1, A2, …, An are its attributes. • Relations between concepts are expressed as rules of the form: N(A1,A2,…,An) :- N1(A1,…,An), … , Nn(A1,…,An), Expr(A1,…,An) where “:-” denotes implication (the head on the left holds if the body on the right holds) and “,” denotes conjunction.

The Knowledge Base (cont.) N(A1,A2,…,An) :- N1(A1,…,An), … , Nn(A1,…,An), Expr(A1,…,An) • The rule body includes program and data concepts Ni as well as constraints Expr, e.g. parameter restrictions, for deducing the notion appearing as a consequent in the rule head. • Exactly one literal in the body describes the corresponding program notion. The rest of the literals stand for the description of input data required by that program. • The following clauses define the notions introduced in the above ontology:
Bathymetry(Location, GridRes)
ExtCurrents(Location, GridRes)
SeaCirc(Location, GridRes)
OceanCircModel(Location, GridRes)
WasteTransportModel(Location, GridRes)

The Knowledge Base (cont.) • The ontology relations are formalized using 2 rules: • SeaCirc (Location, GridRes) :- OceanCircModel (Location, GridRes) , ExtCurrents (Location, GridRes’) , Bathymetry (Location, GridRes’’) , GridRes <= GridRes’ , GridRes <= GridRes’’ where “<=” denotes higher or equal grid resolution. Rule 1 states that Sea Circulation data for a specific location and grid resolution can be derived from local Bathymetry and external Current data using the Ocean Circulation program.

The Knowledge Base (cont.) • Waste (Location, GridRes) :- WasteTransportModel (Location, GridRes) , SeaCirc (Location, GridRes’) , Bathymetry (Location, GridRes’’) , GridRes <= GridRes’ , GridRes <= GridRes’’ Rule 2 states that Waste data for a specific location and grid resolution can be produced by combining Sea Circulation with local Bathymetry data via a Waste Transport program.
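A single application of Rule 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the grid resolution is simplified to a cell size in metres (smaller means higher resolution), the slides' 2-D/3-D distinction is ignored, and the names `Fact`, `at_least_as_fine` and `derive_waste` are hypothetical.

```python
from collections import namedtuple

# A fact is an instance of a notion: notion name, location, grid cell size.
Fact = namedtuple("Fact", ["notion", "location", "grid_res"])


def at_least_as_fine(required, available):
    """True when `available` has higher or equal resolution than `required`.

    Assumption: grid_res is a cell size in metres, so smaller means finer.
    """
    return available <= required


def derive_waste(program, sea_circ, bathymetry):
    """Apply Rule 2 once; return the derived Waste fact, or None if it does not fire."""
    expected = ("WasteTransportModel", "SeaCirc", "Bathymetry")
    if (program.notion, sea_circ.notion, bathymetry.notion) != expected:
        return None
    if not (program.location == sea_circ.location == bathymetry.location):
        return None
    # Both inputs must be at least as fine as the resolution the program works at.
    if at_least_as_fine(program.grid_res, sea_circ.grid_res) and \
            at_least_as_fine(program.grid_res, bathymetry.grid_res):
        return Fact("Waste", program.location, program.grid_res)
    return None


bathymetry = Fact("Bathymetry", "HER", 10)
sea_circ = Fact("SeaCirc", "HER", 25)
print(derive_waste(Fact("WasteTransportModel", "HER", 50), sea_circ, bathymetry))
print(derive_waste(Fact("WasteTransportModel", "HER", 10), sea_circ, bathymetry))
```

With these illustrative values the 50 m program can use the 25 m Sea Circulation data, while the 10 m program cannot, because its input would be coarser than its output.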

The Knowledge Base (cont.) • Clauses without a body, called facts, are instances of abstract notions. For example: • SeaCirc (HER, 10m3) stands for 3-D Sea Circulation for the area of Heraklion with a grid resolution of ten cubic meters. • WasteTransportModel (HER, 1m2) stands for a Waste Transport program that computes 2-D Waste data for the area of Heraklion with a grid resolution of one square meter. • Facts are either extensional, indicating available data sets or programs, or intentional, denoting data sets that can be generated through programs. • There is no need to explicitly store facts in the KBS. Intentional facts are dynamically deduced through rules. Extensional facts can be constructed “on-the-fly” via a metadata search engine that locates the corresponding resources.

On-Demand Generation of Data Production Paths • Given this formal representation of the ontology, requests for data productions translate into queries to the knowledge base. • A query is a description of the desired resource in terms of an ontology concept. It must be satisfied through extensional or intentional facts, the latter being sub-queries that require further expansion. This iterative matching process takes into account all possible combinations of rules and extensional facts. • The result is a set of trees, whose inner nodes are intentional facts and whose leaves are extensional facts, embodying all valid production paths through which data for the queried concept can be generated.

Example • To illustrate the on-demand generation of data production paths, let us assume that the following resources are available in the system repositories, expressed as extensional facts: • Bathymetry (HER, 10m2) • ExtCurrents (HER, 10m3) • OceanCircModel (HER, 10m2) • SeaCirc (HER, 25m3) • WasteTransportModel (HER, 10m2) • WasteTransportModel (HER, 50m3) • The user can inquire about the concept of Waste without restricting any attributes by posing the query Waste(X, Y)

Example
Production for Waste data as presented to the user by the GUI. Legend: intentional fact, extensional fact, program.
[Figure: two production trees.
Waste (HER, 10m2) is produced by WasteTransportModel (HER, 10m2) from Bathymetry (HER, 10m2) and SeaCirc (HER, 10m2); SeaCirc (HER, 10m2) is in turn produced by OceanCircModel (HER, 10m2) from Bathymetry (HER, 10m2) and ExtCurrents (HER, 10m3).
Waste (HER, 50m3) is produced by WasteTransportModel (HER, 50m3) from SeaCirc (HER, 25m3) and Bathymetry (HER, 10m2).]
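The iterative matching process can be sketched as a small backward-chaining search in Python over the extensional facts of the example. It is an illustration under explicit assumptions, not the authors' system: resolutions are reduced to a plain cell size in metres (smaller means higher resolution), the m2/m3 distinction is dropped, and `FACTS`, `RULES`, `productions` and `show` are hypothetical names. Because of that simplification, the enumeration may list alternatives beyond the trees drawn on the slide.

```python
from itertools import product

# Extensional facts from the example: (notion, location, cell size in metres).
FACTS = {
    ("Bathymetry", "HER", 10),
    ("ExtCurrents", "HER", 10),
    ("OceanCircModel", "HER", 10),
    ("SeaCirc", "HER", 25),
    ("WasteTransportModel", "HER", 10),
    ("WasteTransportModel", "HER", 50),
}

# Head notion -> (program notion, input notions), following Rules 1 and 2.
RULES = {
    "SeaCirc": ("OceanCircModel", ("ExtCurrents", "Bathymetry")),
    "Waste": ("WasteTransportModel", ("SeaCirc", "Bathymetry")),
}


def productions(notion, location=None, max_cell=None):
    """Yield a tree (fact, program, children) for every way to obtain `notion`.

    None leaves an attribute unrestricted; max_cell keeps only facts whose
    resolution is higher than or equal to the one required.
    """
    # Extensional facts that match the query are leaves.
    for name, loc, cell in sorted(FACTS):
        if name == notion and location in (None, loc) \
                and (max_cell is None or cell <= max_cell):
            yield ((name, loc, cell), None, ())
    # Otherwise expand the query through a rule, one program fact at a time.
    if notion in RULES:
        program_notion, inputs = RULES[notion]
        for program in sorted(FACTS):
            name, loc, cell = program
            if name != program_notion or location not in (None, loc):
                continue
            if max_cell is not None and cell > max_cell:
                continue
            # Each input becomes a sub-query at the program's location and resolution.
            alternatives = [list(productions(i, loc, cell)) for i in inputs]
            for children in product(*alternatives):
                yield ((notion, loc, cell), program, children)


def show(tree, depth=0):
    """Print a production tree with indentation."""
    fact, program, children = tree
    note = "extensional" if program is None else "via %s" % (program,)
    print("  " * depth + "%s [%s]" % (fact, note))
    for child in children:
        show(child, depth + 1)


for tree in productions("Waste"):   # the query Waste(X, Y)
    show(tree)
```

Calling `productions("Waste", max_cell=10)` restricts the query to results with a cell size of at most 10 m, which corresponds to constraining the second attribute instead of leaving it open.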

Architecture (Overview)
[Figure: architecture diagram. Component labels: Graphical User Interface; Middleware system; Knowledge Base System; Metadata Search Engine; Workflow Runtime; wrapper; data sets; model. Flow labels: query; productions; Workflow specification; export resources; Invoke/access resources.]
