Department of Computer Science & Engineering, University of California, San Diego. CSE-291: Ontologies in Data Integration, Spring 2003. Ontologies in Action. Amarnath Gupta (GUPTA@SDSC.EDU)
Overview • Information Integration • Querying with Ontologies • Registering Into Ontologies • Ontologies of Processes • An Application Scenario • A Disease Map • A look at a theory
Ontologies in Information Integration • Why is information integration with ontologies different from “regular” information integration? • Regular Information Integration • Assume relational sources S1, S2 • S1 exports relation R1(patientID, brain_region, brain_vol) • S2 exports relation R2(species, brain_region, protein, density) • Define an “integrated view” • V1(B, V, P, D) if R1(_, B, V), R2(“human”, B, P, D) • A query against the view • Ans(Brain_region, Protein) if V1(Brain_region, V, Protein, D), D > 0, V < 0.25
Information Integration under GAV • Start from the query against the view: Ans(Brain_region, Protein) if V1(Brain_region, V, Protein, D), D > 0, V < 0.25 • Unfold the view definition: Ans(Brain_region, Protein) if R1(_, Brain_region, V), R2(“human”, Brain_region, Protein, D), D > 0, V < 0.25 • Make the join explicit: Ans(Brain_region1, Protein) if R1(_, Brain_region1, V), R2(“human”, Brain_region2, Protein, D), Brain_region1 = Brain_region2, D > 0, V < 0.25 • Group each selection with its relation: Ans(Brain_region1, Protein) if R1(_, Brain_region1, V), V < 0.25, R2(“human”, Brain_region2, Protein, D), D > 0, Brain_region1 = Brain_region2 • Assign each group of subgoals to the site that evaluates it: Ans(Brain_region1, Protein) if [R1(_, Brain_region1, V), V < 0.25] @S1, [R2(“human”, Brain_region2, Protein, D), D > 0] @S2, [Brain_region1 = Brain_region2] @mediator
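The last rewriting step can be sketched in Python as follows. This is a minimal illustration, not the mediator's actual implementation: the tuples in R1 and R2 are made-up toy data, and the function names query_s1, query_s2 and mediator_join are hypothetical. Each source evaluates its own selection, and the mediator performs only the equality join.

```python
# Toy data (hypothetical); tuples follow the export schemas on the slide.
R1 = [  # S1: (patientID, brain_region, brain_vol)
    ("p1", "cerebellum", 0.20),
    ("p2", "hippocampus", 0.40),
    ("p3", "thalamus", 0.10),
]
R2 = [  # S2: (species, brain_region, protein, density)
    ("human", "cerebellum", "calbindin", 0.7),
    ("human", "thalamus", "neuN", 0.0),
    ("mouse", "thalamus", "neuN", 0.9),
]


def query_s1(rows, max_vol):
    """Subquery evaluated at S1: R1(_, Brain_region1, V), V < max_vol."""
    return {region for (_pid, region, vol) in rows if vol < max_vol}


def query_s2(rows, species):
    """Subquery evaluated at S2: R2(species, Brain_region2, Protein, D), D > 0."""
    return {
        (region, protein)
        for (sp, region, protein, density) in rows
        if sp == species and density > 0
    }


def mediator_join(regions, region_proteins):
    """Join evaluated at the mediator: Brain_region1 = Brain_region2."""
    return sorted((r, p) for (r, p) in region_proteins if r in regions)


answers = mediator_join(query_s1(R1, 0.25), query_s2(R2, "human"))
print(answers)
```

Pushing the selections V < 0.25 and D > 0 to the sources keeps the intermediate results that travel to the mediator small, which is the point of the reordering.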
[Figure: ontology fragment for the cerebellum. Node labels: brain, brain stem, cerebellum, corpus cerebelli, cortex, medullary center, r. cerebellar hemisphere, l. cerebellar hemisphere, vermis, paravermal zone, flocculonodular lobe, flocculus, anterior lobe, posterior lobe, primary fissure, posterolateral fissure, folia, deep cerebellar nuclei, interposed nucleus, dentate nucleus, globose nucleus, fastigial nucleus, inf. olive nucleus, cerebellar peduncle (Sup. CP, Mid. CP, Inf. CP), molecular layer, Purkinje cell layer, granular layer, fiber bundle, neuron, Purkinje cell, granule cell, compartment, dendrite, axon, cell body. Edge labels include receives_afferent_from and attaches(cp, cerebellum, bstem).]
Effect of an Ontology in GAV Integration • Ontologies provide • relations (subclass, part-of, …) over terms, and axioms about relations • Part-of can be of different kinds • member-collection (axons are part of a fiber bundle) • component-object (compartments like the axon are components of a neuron) • portion-mass (myelin sheaths around axons constitute white matter of the brain) • stuff-object (cytosol is a constituent part of cytoplasm) • phase-activity (metastasis is a phase of cancer) • place-area (Manhattan is a place in New York) • feature-event • Each flavor of part-of is transitive within itself (giving a relation part-of-tr), but transitivity does not necessarily hold across flavors • An arm is part of a musician, and a musician is part of an orchestra, BUT an arm is *not* part of an orchestra! • constraints in the form of logic statements • Intensional (derived) relations: • inside(a,b) if part_of(mc)(a,b) ∨ part_of(co)(a,b) ∨ part_of(pm)(a,b) ∨ spatially_in(a,b) • Integrity constraints • The protein “neuN” is not expressed in Purkinje cells
Effect of an Ontology in GAV Integration • Consider the same case • S1 exports relation R1(patientID, brain_region, brain_vol) • S2 exports relation R2(species, brain_region, protein, density) • Ontology source Ont exports all relations and constraints shown before • Define an “integrated view” • V1(B1, V, P, D) if R1(_, B1, V), R2(“human”, B2, P, D), part-of-tr(B2, B1) • A query against the view • Ans(Brain_region, Protein) if V1(Brain_region, V, Protein, D), D > 0, V < 0.25 • After unfolding and assigning subgoals to sites: Ans(Brain_region1, Protein) if [R1(_, Brain_region1, V), V < 0.25] @S1, [R2(“human”, Brain_region2, Protein, D), D > 0] @S2, [part-of-tr(Brain_region2, Brain_region1)] @Ont • Issues • The possibility of recursive queries and recursive views • Smart use of constraints in query evaluation
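A small Python sketch of the ontology-aware join, under stated assumptions: part-of-tr(B2, B1) is read as "B2 is a transitive part of B1", only a single flavor of part-of is closed, and the ontology fragment, the data and the function names are all hypothetical. The naive fixpoint loop also shows where the recursion mentioned under "Issues" comes from.

```python
def transitive_closure(edges):
    """Naive fixpoint: all (part, whole) pairs reachable through one part-of flavor."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical ontology fragment exported by Ont, as (part, whole) pairs.
PART_OF = {
    ("Purkinje cell layer", "cerebellar cortex"),
    ("cerebellar cortex", "cerebellum"),
}
R1 = [("p1", "cerebellum", 0.20)]                        # S1 toy data
R2 = [("human", "Purkinje cell layer", "calbindin", 0.7),  # S2 toy data
      ("human", "hippocampus", "neuN", 0.5)]

part_of_tr = transitive_closure(PART_OF)


def answer(r1, r2, closure, max_vol=0.25):
    """Ans(B1, P) if R1(_, B1, V), V < max_vol, R2("human", B2, P, D), D > 0, part-of-tr(B2, B1)."""
    out = set()
    for (_pid, b1, vol) in r1:
        if vol >= max_vol:
            continue
        for (species, b2, protein, density) in r2:
            if species == "human" and density > 0 and (b2, b1) in closure:
                out.add((b1, protein))
    return sorted(out)


print(answer(R1, R2, part_of_tr))
```

Without the closure, the measurement taken in the Purkinje cell layer would never be matched to the volume recorded for the cerebellum, because the two sources name regions at different levels of granularity.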
Using Constraints in Query Evaluation • Techniques from Semantic Query Optimization • V1(B1, B2, P, D) if R1(_, B1, V), R2(“human”, B2, P, D), part-of-tr(B2, B1) • Ans(Brain_region, Density) if V1(“cerebellum”, Brain_region, “neuN”, Density) • IC1: Density = 0 if R2(“human”, “Purkinje_cell”, “neuN”, Density) • IC2: Density2 = 0 if R2(S, B1, P, 0), R2(S, B2, P, Density2), part-of-tr(B2, B1) • Modify the query • Ans(Brain_region, Density) if V1(“cerebellum”, Brain_region, “neuN”, Density), not(Brain_region = “Purkinje_cell”) • Residue (derived predicate): How would you compute a residue? How complex/feasible is this computation? • But more importantly • How do you control evaluation of a recursive predicate in Ont by supplying integrity constraints from the mediator or a data source? • By invoking general recursion control mechanisms? OPEN RESEARCH PROBLEM
The Registration Problem • Suppose a semantic mediator system already exists with n sources • A new source Sn+1 wants to join the mediator such that • The mediator can simply “read in” the source’s model without any disruptions • All existing integrated views can make “best effort” use of the new source seamlessly • Problems: • What does the source need to declare about itself to the mediator? • How does the mediator use this information to assimilate the new source?
Source Description • Conceptual Model • Local Ontology (ONT) – the terminological vocabulary used by the schema • Properties of relationships in the ontology • Object Model (OM) – the export schema • Ontological Grounding (ONTG) – relationship between export schema and local ontology • Contextualization (CON) – relationship of OM and ONT with mediator’s knowledge base ONT(M) • CSL: a language to express CON formulae
An Example • Objects: Image, Structure, Deposit • Associations: surrounds(Structure.ID, Structure.ID) • Functions: deposit_loc(Deposit.ID) → Structure.ID • Local Ontology: [Figure: graph over the concepts cell, nucleus, cytoplasm, substance, mitochondrion, cytosol, endosome, membrane, matrix and inner membrane, with edges labeled “has” and “stores”] • Property of Local Ontology: tc_has(X) = trans_closure(has(X)) • Ontological Grounding: dom(Image.Struct) in tc_has(cytoplasm); Structure.Name stores Protein
Roles of Ontological Grounding • Semantic Constraints on Attribute Domains • Image.Struct has to be below Cytoplasm • Refinement of Local Ontology • Cytoplasm stores substances, but instances of the exported object called Structure store only proteins • Intensional Definitions • DENATURED_PROTEIN(ProtName) IF DEPOSIT(ID, ProtName, protein, dark, _), deposit_in_structure(ID) ≠ NULL;
Contextualization • WHAT: Local schema elements are expressed as views over the mediator’s ontology (local-as-view, LAV) • Recall: integrated views are still defined in a global-as-view (GAV) fashion • WHY: The LAV technique allows new sources to join, while queries against the GAV views do not require an inverse-rule rewriting
Context Specification Language • Types of local schema elements • From Object Model: classes(S), attributes(S), associations(S), instances(S) • From Local Ontology: concepts(S), relationships(S) • From Both: constraints(S) • Types of mediator’s schema elements • concepts(M), relationships(M), constraints(M) • Context specification: map (correspondence relation)(X1, …, Xn) IF type declarations, body • Correspondence relation: the name of the mapping • X1, …, Xn: the S elements and the M elements • Type declarations: types of the S and M elements • Body: the actual mapping definition
Context Specification Language • map (subconcept)(cytoplasm, cell_compartment) IF cytoplasm:concepts(CCDB), cell_compartment: concepts(mediator) • Relates a concept of the local ontology (cytoplasm) to that of the mediator’s ontology(cell_compartment) – cytoplasm is a cell compartment • Consider a query at the mediator • “Which cell_compartments have associated images?” • The mapping will enable the mediator to ask the CCDB source “Which ‘isa descendants’ of ‘cytoplasm’ have associated images?” • Using ontological grounding the source can translate this to a query against the Image class
Some Example Cases • map (concept-concept)(regulates(nejire, CREB)) IF nejire:concepts(mediator), CREB:concepts(CCDB) • The mapping instantiates a relation (regulates) between the mediator’s concept nejire and CCDB’s concept CREB • Query enabled: “Find images with deposits of nejire-regulated proteins” • map (concept-concept)(tc_regulates(nejire, CREB)) IF nejire:concepts(mediator), CREB:concepts(CCDB) • Query enabled: “Find images with deposits of proteins that are indirectly regulated by nejire” • The query will traverse the “regulates” edges in the mediator and the source to find all paths between nejire in the mediator and CREB in CCDB. The concepts on these paths will then be used to answer the query.
Some Example Cases • Relating edges • map (assoc-rel)(surrounds(s1, s2), inverse(inside(s2, s1))) IF surrounds(s1, s2):assoc(CCDB), inside(s2, s1):relationships(mediator), not has_part(s1, s2) • The mediator’s ontology has a relationship “inside” and the source’s object model has an association called “surrounds” • They are almost inverses of each other • A surrounds B ⇔ B inside A, unless B part_of A • This brings out the conceptual difference between the source’s semantics of a relationship and the mediator’s semantics of the same relationship • The mapping will force the mediator to test the has_part condition before pushing a (rewritten) query to the CCDB source
Registration at Mediator • The source sends the mediator its conceptual model, including the CSL mappings • The mediator • Stores the description in a global registry • Updates ONT(M) with new relationships or rules about the relationships, duly tagged by the source name • Translates ontological groundings to executable rules • domain(STRUCTURE.volume) in [0,300] becomes false :– X:structure[volume → V], not (0 ≤ V ≤ 300)
Registration at Mediator • Translates each CSL statement to two rules • map (subrelation)(has(co), has_part) IF has(co):relationships(CCDB), has_part:relationships(mediator) translates to: • has_part(X,Y) :– CCDB.has(co)(X,Y) (derive) • false :– CCDB.has(co)(X,Y), not has_part(X,Y) (denial) • The first rule is an IDB rule for has_part • The second rule is an integrity constraint
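The two generated rules can be sketched over sets of pairs in Python. This is an illustration only: the fact sets and the function names derive and denial_violations are hypothetical, and the sketch assumes the denial is checked against the has_part facts the mediator holds at the time of the check.

```python
# Hypothetical facts. CCDB exports has(co); the mediator already holds some has_part facts.
ccdb_has_co = {("neuron", "axon"), ("neuron", "dendrite")}
mediator_has_part = {("neuron", "axon"), ("brain", "cerebellum")}


def derive(source_rel, mediator_rel):
    """Derive rule: has_part(X, Y) :- CCDB.has(co)(X, Y)."""
    return set(mediator_rel) | set(source_rel)


def denial_violations(source_rel, mediator_rel):
    """Denial rule: false :- CCDB.has(co)(X, Y), not has_part(X, Y).

    Returns the source pairs that make the body true, i.e. the violations.
    """
    return sorted(set(source_rel) - set(mediator_rel))


print(denial_violations(ccdb_has_co, mediator_has_part))
extended = derive(ccdb_has_co, mediator_has_part)
print(denial_violations(ccdb_has_co, extended))
```

Used as a check, the denial flags source facts the mediator does not yet know. Once the derive rule has been applied, the same check passes, which is why the two rules play different roles (definition versus integrity constraint).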
Ontologies of Processes • What is a Process? • From Merriam-Webster: 2 a (1): a natural phenomenon marked by gradual changes that lead toward a particular result <the process of growth> (2): a natural continuing activity or function <such life processes as breathing> b: a series of actions or operations conducing to an end; especially: a continuous operation or treatment • Revisiting the Central Theme of Formal Ontology • Given a logical language L ... • ... a conceptualization is a set of models of L which describes the admissible (intended) interpretations of its non-logical symbols (the vocabulary) • ... an ontology is a (possibly incomplete) axiomatization of a conceptualization • Theory of formal distinctions among things and relations • Basic tools • Theory of parthood • Theory of integrity • Theory of identity • Theory of dependence
Disease Maps: “Designing” an Ontology • On-going work (Gupta, Ludäscher, Martone, Grethe) • Goal: to characterize the processes, manifestations and outcomes of a specific disease (or family of diseases) • A node- and edge-labeled multigraph where logical formulae can be constructed over a subset of edge labels to describe • Transitive relations • Temporal relations • Causal relations • … • Views • A subgraph that reflects the viewpoint of a specific discipline • Elaborations and Abstractions • A “zoom in” ability where a subgraph may be the detail of another, smaller subgraph • Query Support • Path and subgraph extraction, closure computation, graph aggregates, homomorphic graph matching, consequence derivation • Open questions: Can such an ontology be constructed with one formalism? How do you combine different formalisms and still obtain the right conclusions?
Apoptosis (Suicide of a Cell) • [Figure: phases of apoptosis. Labels: Triggering Event, Receipt of Death Signal, Degeneration, Disintegration; sub-events: shrink, mitochondria break down, release of cytochrome c, bleb development on surface, degradation of chromatin in nucleus] • Processes have phases (temporal part-of) • Every process P goes through the phases initiate, progress, terminate • Every phase can be progressively divided into finer sub-phases
An Intuitive Attempt to Formalize • Let S0 be an initial situation • Let occurs be a distinguished binary function symbol • occurs(a, s) denotes the successor situation to situation s resulting from event a • Events may be parameterized • degrades(chromatin, nucleus) may mean that chromatin degrades in the nucleus • occurs(degrades(chromatin, nucleus), s) denotes the situation that results from the degradation of chromatin when the current situation is s • occurs(degrades(chromatin, nucleus), occurs(bleb_development, occurs(release(cytochrome_c), S1))) refers to the situation reached from S1 by the sequence of events [release(cytochrome_c), bleb_development, degrades(chromatin, nucleus)]
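The reading of a nested occurs term as an event sequence can be sketched in Python. The class and function names (Initial, Occurs, history) are hypothetical, and events are kept as plain strings for brevity. The innermost event happened first, so the term is unwound from the outside in and then reversed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Initial:
    """A situation constant such as S0 or S1."""
    name: str = "S0"


@dataclass(frozen=True)
class Occurs:
    """occurs(event, situation): the situation reached when event happens in situation."""
    event: str
    situation: "Situation"


Situation = Union[Initial, Occurs]


def history(s: Situation) -> List[str]:
    """Return the events of a situation term in the order they occurred."""
    events = []
    while isinstance(s, Occurs):
        events.append(s.event)  # outermost event = most recent
        s = s.situation
    events.reverse()
    return events


term = Occurs(
    "degrades(chromatin, nucleus)",
    Occurs("bleb_development", Occurs("release(cytochrome_c)", Initial("S1"))),
)
print(history(term))
```

A situation is therefore nothing more than a history of events over an initial situation, which is what makes it usable as a temporal "part-of" ordering later in the deck.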
A Step Back: Second Order Logic • First order logic permits • quantification over individuals • Second order logic permits • quantification over predicates and functions • Thus a second order language has • Predicate variables X^n_i: infinitely many n-place predicate variables for each n • Function variables F^n_i: infinitely many n-place function variables for each n • Second order logic is incomplete! • It is not possible to have an axiomatization and rules of inference that can recursively enumerate all and only the valid second-order sentences • However, second order theories and their special cases are useful for developing the ontological basis for processes
Situation Calculus [McCarthy, Reiter, Levesque] (adapted for our purpose) • L_sitcalc is a second order language with equality • Sorts: events, situations, objects • Logical symbols and quantifiers: the usual connectives (e.g., ¬, ∧, ∨) and ∀, ∃ • Function symbols of sort situation: • Constant symbol S0, called the initial situation • Binary function occurs: event × situation → situation • Binary predicate symbol ⊏: situation × situation • Defines an ordering relation (temporal part-of) on situations • Binary predicate symbol poss: event × situation • poss(a, s) means it is possible for event a to occur in situation s • Countably infinitely many symbols for • n-ary predicates over (event ∪ object)^n • Functions (event ∪ object)^n → object and (event ∪ object)^n → event
Situation Calculus (adapted) • Relational Fluents • Infinitely many predicate symbols of sort (event ∪ object)^n × situation • They are situation-dependent relations, i.e., predicates with a situation-dependent truth value • binds-to(FasL, cell-surface) is a relationship between FasL and cell-surface, but it is not always true • binds-to(FasL, cell-surface, occurs(bound(toxic-T-cell, target), s)) • Functional Fluents • Infinitely many function symbols of sort (event ∪ object)^n × situation → event ∪ object • Since chromatin-content(cell) varies with the state of apoptosis, represent it as chromatin-content(cell, s), where s is a situation term
Examples of Fluents • Scenario: Neurotoxin ‘MPTP’ is converted to ‘MPP+’ by ‘MAOB’ in the synaptic cleft. The active form ‘MPP+’ is picked up by the dopamine transporter (DAT) and released inside the neuron, where it accumulates in mitochondria. This leads to inhibition of complex I (a component of the mitochondrial electron transport chain), which leads to free radical generation. • Functional fluent location(MPP+) • initially: location(MPP+) = synaptic_cleft • occurs(uptake_by(DAT)): location(MPP+) = bound_to(DAT) • occurs(release_by(DAT)): location(MPP+) = inside(neuron) • occurs(transport_to(DAT,nucleus)): location(MPP+) = inside(mitochondria) • Relation content(Organelle, Substance, Concentration) • initially: content(mitochondria, MPP+, 0) • occurs(transport_to(nucleus)): content(mitochondria, MPP+, inc(0)) • occurs(transport_to(nucleus)): content(mitochondria, MPP+, inc(inc(0)))
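The location fluent can be sketched as a fold over the event history in Python. The table EFFECTS simply transcribes the event/value pairs listed on the slide; the function name location_mpp and the rule that unlisted events leave the value unchanged are assumptions of this sketch.

```python
# Effect table transcribed from the slide: event -> new value of location(MPP+).
EFFECTS = {
    "uptake_by(DAT)": "bound_to(DAT)",
    "release_by(DAT)": "inside(neuron)",
    "transport_to(DAT,nucleus)": "inside(mitochondria)",
}


def location_mpp(events, initial="synaptic_cleft"):
    """Value of the functional fluent location(MPP+) after a sequence of events."""
    location = initial
    for event in events:
        # Assumption: an event without a listed effect leaves the fluent unchanged.
        location = EFFECTS.get(event, location)
    return location


print(location_mpp([]))
print(location_mpp(["uptake_by(DAT)", "release_by(DAT)"]))
print(location_mpp(["uptake_by(DAT)", "release_by(DAT)", "transport_to(DAT,nucleus)"]))
```

The default branch in the loop is an implicit frame assumption: it states, for every event not in the table, that the fluent keeps its value.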
The Frame Problem • If there are E events and F fluents, on the order of 2 × E × F frame axioms may be needed! • Events have • Preconditions • Poss(breakdown(mitochondria), s) ⊃ releases(Bcl2, Apaf1, s) ∧ leaks(cytochrome-c, mitochondria, s) • Effect Axioms • An effect axiom states how an event affects the value of a fluent • membrane(x, cell) ∧ ion(y) ∧ permeable(x, y) ⊃ enters(y, cell, occurs(high-conc(y, outside(x)), s)) • Fluents have • Frame Axioms • A frame axiom specifies the event invariants (fluents that are not affected by an event) of a domain • Positive frame axiom • content(mitochondria, y, V, s) ⊃ content(mitochondria, y, V, occurs(enters(y, cell), s)) • Negative frame axiom • ¬high-conc(x, cell, s) ∧ x ≠ y ⊃ ¬high-conc(x, cell, occurs(high-conc(y, outside(cell)), s))
Toward a Conclusion • Solutions for the Frame Problem • Causal Completeness Assumption • We know all preconditions under which an event causes a fluent to change its value in a successor situation • Explanation Closure Assumption • We know all events that may cause a fluent to change its value • Unique Name Assumption • Identical events have identical attributes • Then, for E events and F fluents, the number of axioms can be reduced to the order of E + F, provided • conditional, iterative, recursive and nondeterministic events do not occur • For a multi-theory ontology like a disease map • We need much more than a description logic and a situation calculus
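As a rough worked count (the numbers are hypothetical, and the naive figure assumes one positive and one negative frame axiom for every event-fluent pair), take E = 10 events and F = 20 fluents:

```latex
\[
\underbrace{2 \times E \times F}_{\text{naive frame axioms}}
  = 2 \times 10 \times 20 = 400
\qquad\text{versus}\qquad
\underbrace{E + F}_{\text{under the three assumptions}}
  = 10 + 20 = 30 .
\]
```

The gap widens quickly, since the naive count grows with the product of E and F while the reduced count grows with their sum.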
References • D. Leivant, “Higher Order Logic”, in D. M. Gabbay, C. J. Hogger and J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming, pp. 229-321, Clarendon Press, Oxford, 1994. • A. Gupta, B. Ludäscher, M. E. Martone, “Registering Scientific Information Sources for Semantic Mediation”, 21st International Conference on Conceptual Modeling (ER), Tampere, Finland, pp. 182-198, October 2002. • J. McCarthy, “Situations, actions and causal laws”, Tech. Report, Stanford Univ., 1968. • R. Reiter, Knowledge in Action, The MIT Press, Cambridge, MA, 2001. • P. Godfrey, J. Grant, J. Gryz, and J. Minker, “Integrity constraints: Semantics and applications”, in J. Chomicki and G. Saake (eds.), Logics for Databases and Information Systems, Kluwer, 1998. • U. Chakravarthy, J. Grant, and J. Minker, “Logic-based approach to semantic query optimization”, ACM Transactions on Database Systems, 15(2), pp. 162-207, 1990. • R. Kowalski and M. Sergot, “A logic-based calculus of events”, New Generation Computing, 4, pp. 67-95, 1986.