Introduction to Knowledge Representation and Conceptual Modeling
Martin Doerr
Institute of Computer Science
Foundation for Research and Technology – Hellas
Heraklion – Crete, Greece
Knowledge Representation: Outline
• Introduction
• From tables to Knowledge Representation: individual concepts and relationships
• Instances, classes and properties
• Generalization
• Multiple IsA and instantiation
• A simple data model and notation
Knowledge Representation: Introduction – Basic Notions
• Knowledge Representation: the representation of concepts, items (particulars) and their interrelations as perceived by humans and expressed in terms of a formal language, such as logic, conceptual graphs, etc. The intended meaning (semantics) is the interpretation (identification) of the symbols used as things and categories in the described universe ("domain", "world", "real world"), and the interpretation of expressions that use those symbols as statements about their structure and interrelations (early Wittgenstein). A set of related knowledge representation expressions is called a model (of a domain).
• IT jargon (due to limited scope):
  • "KNOWLEDGE" instead of model
  • "SEMANTICS" instead of expressions
Knowledge Representation: Introduction – Reservations
• Limitations: "Principles of equivalence": given a model accepted as correct by a human, logical (automatable) inferences from the model should conform to the expectations of that human. Only in this sense does knowledge representation (KR) represent knowledge. KR is a means of communication. Expressions are rarely, if ever, definitions, but partial constraints (see also late Wittgenstein, Eleanor Rosch, George Lakoff). Formal languages only partially fit the way we think.
• Psychological obstacles to creating KR: the true structure of our thoughts is unconscious. Beware of compressions (Gilles Fauconnier, "The Way We Think"); see "jargon" above. Methodological questions reveal part of it (e.g. a change of context).
Knowledge Representation: From Forms to Classes (a "decompression")
Relational database tables:
- an abstraction from forms,
- a model for (statistical) information processing.
[Diagram: table "Patient" with attributes (sometimes called "part-of") and value types: Name (String), Weight (Number), Birth date (Time), Birth Place (String), Address (String)]
• What does that mean as statements about the world?
• Is it correct, e.g., "Address"?
Knowledge Representation: From Forms to Classes (a "decompression")
[Diagram: Patient (Name: String, Weight: Number, Birth Date: Time, Birth Place: String) "has" (∞:∞) an Address table]
Address:
• Shared with others
• Changes over time
• Can be multiple
• Independent entity
What about Birth Date?
Knowledge Representation: From Forms to Classes (a "decompression")
[Diagram: Patient (Name: String, Weight: Number) "has" (∞:∞) an Address; "has" (∞:1) a Birth (Date: Time, Place: String)]
Birth Date, Birth Place:
• Shared with others
• Birth shared with others (twins)!
• Independent entity
Knowledge Representation: From Forms to Classes (a "decompression")
[Diagram: Patient (Name: String) "has" (∞:∞) an Address; "has" (∞:1) a Birth (Date: Time, Place: String); "has" (1:∞) a Patient's Weight]
Weight:
• Similar but not shared!
• Multiple units, measurements
• Dependent, but distinct entity
What about the name?
Knowledge Representation: From Forms to Classes (a "decompression")
[Diagram: Patient "has" (∞:∞) an Address; "has" (∞:1) a Birth (Date: Time, Place: String); "has" (1:∞) a Patient's Weight; "has" (∞:∞) a Name (String)]
Name:
• Shared
• Context specific
• Independent entity
Who is the Patient then?
Knowledge Representation: From Forms to Classes (a "decompression")
• Summary:
• In the end, no "private" attributes are left over.
• Widening or changing the context reveals them as hidden, distinct entities.
• The "table" becomes a graph of related but distinct entities, a MODEL (see the sketch below).
• Things are identified only by unique keys – and by knowledge of the reality!
• Do we describe a reality now? Are we closer to reality? Do we agree that this is correct? ("Ontological commitment"). For a database schema, a projection (birth!) of perceived reality can be sufficient and more efficient. For exchange of knowledge, it is misleading. Even for a database schema, it can hinder extension.
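To make the "table becomes a graph" point concrete, here is a minimal sketch using the rdflib Python library. The namespace and the identifiers and property names (has_name, dwells_at, born_by, weighs) are hypothetical illustrations, not part of the slides' model.

from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/hospital/")   # hypothetical namespace
g = Graph()

# The former "Patient" row decomposes into distinct, related entities.
g.add((EX.patient_42, RDF.type, EX.Patient))

g.add((EX.patient_42, EX.has_name, EX.name_1))       # names: shared, context specific
g.add((EX.name_1, EX.value, Literal("George")))

g.add((EX.patient_42, EX.dwells_at, EX.address_7))   # addresses: shared, change over time
g.add((EX.address_7, EX.value, Literal("Odos Evans 6, GR71500 Heraklion")))

g.add((EX.patient_42, EX.born_by, EX.birth_3))       # a birth can be shared (twins)
g.add((EX.birth_3, EX.took_place_at, EX.heraklion))
g.add((EX.birth_3, EX.date, Literal("1965-10-02")))

g.add((EX.patient_42, EX.weighs, EX.weight_9))       # a weight: dependent but distinct
g.add((EX.weight_9, EX.value, Literal("85 Kg")))

print(g.serialize(format="turtle"))                  # rdflib 6+ returns a string

Each node (name, address, birth, weight) now has its own identity and can be shared or extended independently of the patient record.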
Knowledge Representation: Classes and Instances
• In KR we call these distinct entities classes:
• A class is a category of items that share one or more common traits serving as criteria to identify the items belonging to the class. These properties need not be explicitly formulated in logical terms, but may be described in a text (here called a scope note) that refers to a common conceptualisation of domain experts. The sum of these traits is called the intension of the class. A class may be the domain or range of none, one or more properties formally defined in a model. The formally defined properties need not be part of the intension of their domains or ranges: such properties are optional. An item that belongs to a class is called an instance of this class. A class is associated with an open set of real-life instances, known as the extension of the class. Here "open" is used in the sense that it is generally beyond our capabilities to know all instances of a class in the world, and indeed the future may bring about new instances at any time. (Related terms: universals, categories, sortal concepts. See the sketch below.)
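As an illustration of the definition above, a tiny sketch (hypothetical names, rdflib/RDFS) declaring a class, attaching its scope note as a comment, and listing two members of its open, never fully enumerable extension:

from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/hospital/")   # hypothetical namespace
g = Graph()

# The class and a textual scope note hinting at its intension.
g.add((EX.Patient, RDF.type, RDFS.Class))
g.add((EX.Patient, RDFS.comment,
       Literal("Scope note (hypothetical): persons currently under medical care.")))

# Two known instances; the extension stays open, we never know all of it.
g.add((EX.george, RDF.type, EX.Patient))
g.add((EX.maria, RDF.type, EX.Patient))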
Knowledge Representation: Particulars
• Distinguish particulars from universals as a perceived truth. Particulars do not have specializations. Universals have instances, which can be either particulars or universals.
• Particulars: me, "hello", 2, WW II, the Mona Lisa, the text on the Rosetta Stone, 2-10-2006, 34N 26E.
• Universals: patient, word, number, war, painting, text.
• "Ambiguous" particulars: numbers, saints, measurement units, geopolitical units.
• "Strange" universals: colors, materials, mythological beasts.
• Dualisms:
  • Texts as equivalence classes of documents containing the same text.
  • Classes as objects of discourse, e.g. "chaffinch" and 'Fringilla coelebs Linnaeus, 1758' as Linné defined it.
Knowledge Representation: Classes and Instances
In KR, instances are independent units of models, not restricted to the records of one table. Identity is separated from description. We can do "multiple instantiation" (see the sketch below).
[Diagram: instances such as "George" and "Costas" are instances of both Doctor and Patient; "weighs" links George to the Weight instance "85 Kg"; "dwells at" links him to the Address instance "Odos Evans 6, GR71500 Heraklion, Crete, Greece"]
• What do doctors and patients have in common?
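A minimal sketch of multiple instantiation in the same hypothetical rdflib vocabulary: one identifier, typed by two classes, with description kept separate from identity.

from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/hospital/")   # hypothetical namespace
g = Graph()

# One individual, two classes: George is described both as Doctor and as Patient.
g.add((EX.george, RDF.type, EX.Patient))
g.add((EX.george, RDF.type, EX.Doctor))
g.add((EX.george, EX.weighs, Literal("85 Kg")))
g.add((EX.george, EX.dwells_at, Literal("Odos Evans 6, GR71500 Heraklion, Crete, Greece")))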
Knowledge Representation: Generalization and Inheritance
An instance of a class is an instance of all its superclasses. A subclass inherits the properties of all its superclasses. (Properties "move up" to the most general class that carries them; see the sketch below.)
[Diagram: Physical Object "weighs" Weight; Person (isA Physical Object) "dwells at" Address; Doctor and Patient are subclasses of Person; the instance "George" weighs "85 Kg" and dwells at "Odos Evans 6, GR71500 Heraklion, Crete, Greece"]
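A sketch of the same idea with rdfs:subClassOf (hypothetical class names). The small helper computes the transitive closure an RDFS reasoner would give you, so an instance of Doctor is also recognized as a Person and a Physical Object.

from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/hospital/")   # hypothetical namespace
g = Graph()

# isA hierarchy: Doctor and Patient are Persons, Persons are Physical Objects.
g.add((EX.Doctor, RDFS.subClassOf, EX.Person))
g.add((EX.Patient, RDFS.subClassOf, EX.Person))
g.add((EX.Person, RDFS.subClassOf, EX.PhysicalObject))

# Properties are declared once, on the most general class that carries them.
g.add((EX.weighs, RDFS.domain, EX.PhysicalObject))
g.add((EX.dwells_at, RDFS.domain, EX.Person))

g.add((EX.george, RDF.type, EX.Doctor))

def classes_of(graph, item):
    # An instance of a class is an instance of all its superclasses.
    found = set(graph.objects(item, RDF.type))
    queue = list(found)
    while queue:
        c = queue.pop()
        for super_c in graph.objects(c, RDFS.subClassOf):
            if super_c not in found:
                found.add(super_c)
                queue.append(super_c)
    return found

print(classes_of(g, EX.george))   # URIs of Doctor, Person and PhysicalObject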
Knowledge Representation: Ontology and Information Systems
• "An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualization of the world. The intended models* of a logical language using such a vocabulary are constrained by its ontological commitment. An ontology indirectly reflects this commitment (and the underlying conceptualization) by approximating these intended models." Nicola Guarino, Formal Ontology and Information Systems, 1998. (* "Models" here means models of possible states of affairs.)
• Ontologies pertain to a perceived truth: a model commits to a conceptualization, typically of a group, of how we imagine things in the world are related.
• Any information system compromises perceived reality with what can be represented in a database (dates!) and with what is performant. An RDF Schema is then no longer a "pure" ontology. Using RDF does not by itself make an ontology.
Knowledge Representation: Limitations
• Complex logical rules may become difficult to identify for the domain expert and difficult to handle for an information system or user community.
• Distinguish between modeling knowing (epistemology) and modeling being (ontology): necessary properties may nevertheless be unknown. Knowledge may be inconsistent or express alternatives.
• Human knowledge does not fit First Order Logic: there are prototype effects (George Lakoff), counterfactual reasoning (Gilles Fauconnier), analogies, fuzzy concepts. KR is an approximation.
• Concepts only become discrete if restricted to a context and a function! Paul Feyerabend maintains they must not be fixed.
Ontology Engineering: Scope Constraints for Ontology Activities
[Diagram of scope constraints on ontology activities: research disciplines contribute a conceptual framework (viewpoints) and domain work; the ontology maps the conceptual framework in order to serve communication and talks about real-world things; current domain priorities guide how to select precision/detail; constraint: affordable technical complexity]
E53 Place
• A place is an extent in space, determined diachronically with regard to a larger, persistent constellation of matter (often continents) by coordinates, geophysical features, artefacts, communities, political systems, objects, but not identical to them.
• A "CRM Place" is not a landscape, not a seat; it is an abstraction from temporal changes: "the place where…"
• A means to reason about the "where" in multiple reference systems.
• Examples:
  • figures from the bow of a ship
  • African dinosaur footprints appearing in Portugal by continental drift
  • where Nelson died
Knowledge Representation: A Graphical Annotation for Ontologies
[Class/property diagram centred on E53 Place, with classes E12 Production Event, E9 Move, E18 Physical Stuff, E19 Physical Object, E24 Physical Man-Made Stuff, E44 Place Appellation, E45 Address, E46 Section Definition, E47 Spatial Coordinates, E48 Place Name, and properties P7 took place at (witnessed), P26 moved to (was destination of), P27 moved from (was origin of), P25 moved (moved by), P88 consists of (forms part of), P108 has produced (was produced by), P87 identifies (is identified by), P53 has former or current location (is former or current location of), P59 is located on or within (has section), P58 defines section of (has section definition)]
Knowledge Representation: A Graphical Annotation for Ontology Instances
How I came to Madrid…
[Instance diagram, roughly: "My walk 16-9-2006 13:45" (E9 Move) P25 moved Martin Doerr (E21 Person) and P26 moved to / P27 moved from places including Frankfurt Airport-B10 (E53 Place); "Flight JK 126" (E9 Move) P25 moved Spanair EC-IYG (E19 Physical Object), P27 moved from Frankfurt Airport-B10 and P26 moved to Madrid Airport (E53 Place); EC-IYG seat 4A (E53 Place) is linked to the aircraft via P59 has section; see the triples sketched below]
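One plausible reading of the instance diagram, written as rdflib triples. The CIDOC CRM namespace and the E/P identifier spellings follow one common RDFS encoding of the CRM; treat the exact URIs, and the reconstructed arrows, as assumptions rather than the slide's authoritative content.

from rdflib import Graph, Namespace, Literal, RDF, RDFS

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")   # assumed CRM RDFS namespace
EX = Namespace("http://example.org/trip/")               # hypothetical instance namespace

g = Graph()

# "My walk 16-9-2006 13:45": an E9 Move that took Martin to gate B10.
g.add((EX.my_walk, RDF.type, CRM.E9_Move))
g.add((EX.my_walk, RDFS.label, Literal("My walk 16-9-2006 13:45")))
g.add((EX.my_walk, CRM.P25_moved, EX.martin_doerr))
g.add((EX.my_walk, CRM.P26_moved_to, EX.frankfurt_airport_b10))

# "Flight JK 126": an E9 Move that moved the aircraft from Frankfurt to Madrid.
g.add((EX.flight_jk_126, RDF.type, CRM.E9_Move))
g.add((EX.flight_jk_126, CRM.P25_moved, EX.spanair_ec_iyg))
g.add((EX.flight_jk_126, CRM.P27_moved_from, EX.frankfurt_airport_b10))
g.add((EX.flight_jk_126, CRM.P26_moved_to, EX.madrid_airport))

# Seat 4A is a Place that is a section of the aircraft: the "where" of the
# passenger is expressed relative to the moving object.
g.add((EX.ec_iyg_seat_4a, RDF.type, CRM.E53_Place))
g.add((EX.spanair_ec_iyg, CRM.P59_has_section, EX.ec_iyg_seat_4a))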
Ontology Engineering: Process Planning
• Define methods of decision taking and revision (=> consensus, conflict resolution by analysis of the implicit/unconscious purposes of defenders, and examples)
• Engineer the vocabulary – get rid of the language bias (words are not concepts: "the child is safe", "I bought a book").
• Carry out a bottom-up process for IsA hierarchies – monotony!
• Make scope notes and definitions (but note the limitations of definition!)
• Do experimental "overmodelling" of the domain to understand the impact of simplifications
Ontology Engineering: The Bottom-Up Structuring Process
• Take a list of intuitive, specific terms, typically from domain documents ("practical scope").
  • Concepts that are too abstract are often mistaken or missed!
• Create a list of properties for these terms
  • essential properties to infer identity (coming into being, ceasing to be)
  • relevant properties (behaviour) for the discourse (change the mental context!)
  • split terms into concepts if necessary ("Where was the university when it decided to take more students?")
• Detect new classes from property ranges.
  • Typically strings, names, numbers hide concepts.
  • Identify concepts independently from the relation: "Who can be a creator?"
Ontology Engineering: The Bottom-Up Structuring Process
• Detect entities hidden in attributes, find their properties
• Property consistency test
• Test domain queries
• Revise properties and classes
• Create the class hierarchy
• Revise properties and classes
• Create property hierarchies
• Revise properties and classes
• Closing up the model – reducing the model
  • delete properties and classes not needed to implement the required functions.
Ontology Engineering: Evidence, Relevance and Evaluation
An ontology must be
• capable of representing the relevant senses appearing in a source (empirical base)
  • analyzing texts, dictionaries,
  • mapping database schemata.
• capable of answering queries useful for its purpose/function
  • "Where was Martin on Sept. 16, 15:00?" (see the sketch after this list)
Its concepts should be
• valid under change of context:
  • e.g., is "This object has name 'pencil'" valid in a pencil shop?
• objectively recognizable and likely to be recognized (useful for integration)
  • e.g., hero or criminal?
• relevant, as measured by dominance/frequency of occurrence in a source collection.
  • Balance subject coverage! Do not let experts get lost in details!
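A small sketch of testing such a domain query over the hypothetical trip data, using a SPARQL query in rdflib. It simplifies by asserting directly that the flight moved Martin; a full answer to "where at 15:00" would also need the time-span of each move, which is omitted here.

from rdflib import Graph, Namespace, RDF

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")   # assumed CRM namespace, as before
EX = Namespace("http://example.org/trip/")               # hypothetical data

g = Graph()
g.add((EX.flight_jk_126, RDF.type, CRM.E9_Move))
g.add((EX.flight_jk_126, CRM.P25_moved, EX.martin_doerr))
g.add((EX.flight_jk_126, CRM.P26_moved_to, EX.madrid_airport))

# "Which moves took Martin where?" serves as a proxy for the competency question.
q = """
PREFIX crm: <http://www.cidoc-crm.org/cidoc-crm/>
PREFIX ex:  <http://example.org/trip/>
SELECT ?move ?destination WHERE {
    ?move a crm:E9_Move ;
          crm:P25_moved ex:martin_doerr ;
          crm:P26_moved_to ?destination .
}
"""
for row in g.query(q):
    print(row.move, row.destination)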
Ontology Engineering: Practical Tips – Theory of Identity
• A class of individual particulars must define for them a substance of their own (scope note!), without depending on relations: "Learning Object", "Subtask", "Expression", "Manifestation" are not classes! "Work" is a class…
• We must know how to decide when they come into / go out of existence.
• We must know what holds them together (unity criterion, scope note!), but we need not be able to decide on all parts!
• Instances of a class must not have conflicting properties! (Dead and alive, at one place and at many places? Is a collection material or immaterial? Is "Germany" one thing?)
• Essential properties of a class may be unknown! (Plato's man). The scope note only "reminds" of a common concept, restricted by limited variation in the real world.
Ontology Engineering: Practical Tips – How to Use IsA
• No repetition of properties: what do a Doctor, Person, Animal, Physical Object have in common? (Count the freight weight of a plane.)
• Dangers of multiple IsA: don't confuse polysemy with multiple nature: is the Zugspitze a place? Can a museum take decisions?
• Identify "primitives". Distinguish constraints from definitions: Plato's man. A "washing machine" may be any machine that washes.
• IsA is a decrease of knowledge: "If I don't know whether he's a hero, I still know he's a human…"
• In an open world, never define a complement: open number of siblings! Caution with disjoint classes!
Ontology Engineering: Other Tips
• Avoid concepts depending on accidental and uncontextual properties ("creator", "museum object", "Buhmann").
• Maintain independence from scale: "hamlet – village"; at least introduce a scale-independent superclass.
• Maintain independence from point of view: buying – selling.
• Most non-binary relationships acquire substance as temporal entities. Never model activities as links.
• Epistemological bias: distinguish the structure of what you can know (information system) from what you believe is (ontology). Quantification!