Bertram Ludäscher ludaesch@sdsc Data and Knowledge Systems San Diego Supercomputer Center

Towards Semantic Mediation for GEON: Facilitating Scientific Data Integration using Knowledge Representation. Bertram Ludäscher, ludaesch@sdsc.edu, Data and Knowledge Systems, San Diego Supercomputer Center, U.C. San Diego.

Presentation Transcript


  1. Towards Semantic Mediation for GEON: Facilitating Scientific Data Integration using Knowledge Representation. Bertram Ludäscher, ludaesch@sdsc.edu, Data and Knowledge Systems, San Diego Supercomputer Center, U.C. San Diego

  2. Acknowledgements
     • GEON Metamorphism Equation: Geoscientists + Computer Scientists +/- Energy → Igneous Geoinformaticists
     • “Smart” Geologic Map Prototype: Kai Lin, klin@sdsc.edu, Data and Knowledge Systems, San Diego Supercomputer Center
     • Geo-Knowledge-Engineer: Boyan Brodaric, brodaric@NRCan.gc.ca, Natural Resources Canada
     • ... and many GEONites: Dogan, Krishna, ..., State Geologic Surveys, Chaitan, Ilya, Michalis, Ashraf, ... (upcoming demo)

  3. GEON and “Semantic” Data Integration (map of the two regions: Midatlantic Region, Rocky Mountains)

  4. What is Knowledge Representation? Relating Theory to the World via Formal Models (figure source: John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations). “All models are wrong, but some are useful!”

  5. What is (an) “Ontology”??? (... what CS graduate students need to know ...)
     1. Ontology as a philosophical discipline
     2. Ontology as an informal conceptual system
     3. Ontology as a formal semantic account
     4. Ontology as a specification of a “conceptualization”
     5. Ontology as a representation of a conceptual system via a logical theory
        5.1 characterized by specific formal properties
        5.2 characterized only by its specific purposes
     6. Ontology as the vocabulary used by a logical theory
     7. Ontology as a (meta-level) specification of a logical theory
     [Guarino’95] http://ontology.ip.rm.cnr.it/Papers/KBKS95.pdf

  6. What is an Ontology? (CSE-291 cont’d ;-)
     • Given a logical language L ...
     • ... a conceptualization is a set of models of L which describes the admissible (intended) interpretations of its non-logical symbols (the vocabulary)
     • ... an ontology is a (possibly incomplete) axiomatization of a conceptualization.
     (Figure labels: set of all models M(L); logic theories; ontology; conceptualization C(L).)
     [Guarino96] http://www-ksl.stanford.edu/KR96/Guarino-What/P003.html
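The model-theoretic picture in slide 6 can be made concrete on a toy scale. The sketch below is a minimal illustration, not Guarino's own formalization: it assumes a tiny propositional vocabulary (`felsic`, `igneous`, `sedimentary`, chosen only for illustration), treats every truth assignment as a model in M(L), picks a hypothetical set of intended models as the conceptualization C(L), and uses a deliberately incomplete axiom set as the ontology.

```python
from itertools import product
from typing import Callable, Dict, List

Model = Dict[str, bool]

# Non-logical symbols (the vocabulary) of a tiny propositional language L; illustrative only.
VOCAB = ("felsic", "igneous", "sedimentary")


def all_models(vocab) -> List[Model]:
    """M(L): every truth assignment to the vocabulary."""
    return [dict(zip(vocab, values)) for values in product([False, True], repeat=len(vocab))]


def intended(m: Model) -> bool:
    """C(L): a hypothetical set of intended interpretations (the conceptualization)."""
    felsic_is_igneous = (not m["felsic"]) or m["igneous"]
    exclusive_genesis = not (m["igneous"] and m["sedimentary"])
    return felsic_is_igneous and exclusive_genesis


# The ontology: an incomplete axiomatization that states only the first constraint.
ONTOLOGY: List[Callable[[Model], bool]] = [lambda m: (not m["felsic"]) or m["igneous"]]


def satisfies(m: Model, axioms: List[Callable[[Model], bool]]) -> bool:
    return all(axiom(m) for axiom in axioms)


models = all_models(VOCAB)
conceptualization = [m for m in models if intended(m)]
ontology_models = [m for m in models if satisfies(m, ONTOLOGY)]

# Every intended model is a model of the ontology, but not vice versa.
assert all(m in ontology_models for m in conceptualization)
print(len(models), len(conceptualization), len(ontology_models))  # 8 4 6
```

Here the ontology admits six of the eight models while only four are intended; that gap is what “possibly incomplete” means. Adding the missing axiom (igneous and sedimentary are exclusive) would shrink the ontology's models to exactly C(L).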

  7. Problem: Scientific Data Integration ... from Questions to Queries ...
     • Example questions: What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host-rock structures?
     • “Complex Multiple-Worlds” Mediation across GeoPhysical (gravity contours), Geologic Map (Virginia), GeoChronologic (Concordia), Foliation Map (structure DB), and GeoChemical data
     (Figure layers, top to bottom: domain knowledge; Information Integration and Knowledge Representation: ontologies, concept spaces; Database mediation; Data modeling; raw data.)

  8. Got Glue? Which one? What for?
     • XML (common syntax)
        • flexible (semistructured) data model
        • used at all levels: data/metadata exchange, message exchange (SOAP), schemas & data types (XML Schema), Semantic Web & web ontologies (RDF(S), OWL), ...
     • Grid infrastructure (system interoperation)
        • distributed computing and data management
        • web services
     • Controlled Vocabularies (“joins”)
        • data level: joins across different data sets
        • but metadata and ontologies (concept names, relationship names, ...) are also data!
     • Integrated View Definitions (mediated views/virtual databases)
        • declarative specification of “integration logic”: XQuery, Datalog, ...
     • Thesauri (translators for retrieving related information)
        • synonyms, broader/narrower terms, e.g., UMLS (a meta-thesaurus, “ontology”)
     • Taxonomies (classification)
        • shared vocabulary, concept hierarchy (is-a)
     • Ontologies (classification + additional semantics)
        • formal specification of a conceptualization, shared meaning
        • facilitate “smart querying”, semantic mediation
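The controlled-vocabulary “join” from slide 8 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `THESAURUS` table, the attribute names, and the rows are all made up, and a real system would draw synonyms from a curated thesaurus rather than a hard-coded dictionary. Both data sets are normalized to preferred terms before joining, so `"Tert"` and `"TERTIARY"` match.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical thesaurus: variant terms (lower-cased) -> preferred term.
THESAURUS = {
    "tert": "Tertiary",
    "tertiary": "Tertiary",
    "quat": "Quaternary",
    "quaternary": "Quaternary",
}


def preferred(term: str) -> str:
    """Map a term to its preferred form; unknown terms pass through unchanged."""
    return THESAURUS.get(term.strip().lower(), term)


def vocab_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join two data sets on `key` after normalizing both sides through the thesaurus."""
    index: Dict[str, List[Row]] = {}
    for right_row in right:
        index.setdefault(preferred(right_row[key]), []).append(right_row)
    joined = []
    for left_row in left:
        term = preferred(left_row[key])
        for right_row in index.get(term, []):
            # Merge both rows and store the preferred term under the join key.
            joined.append({**left_row, **right_row, key: term})
    return joined


map_units = [{"unit": "Unit 1", "age": "Tert"}, {"unit": "Unit 2", "age": "Quat"}]
samples = [{"sample": "S-1", "age": "TERTIARY"}]
print(vocab_join(map_units, samples, "age"))
# [{'unit': 'Unit 1', 'age': 'Tertiary', 'sample': 'S-1'}]
```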

  9. Information Integration Challenges (layers: System aspects, Syntax, Structure, Semantics)
     • reconciling S⁴ heterogeneities (system, syntax, structure, semantics)
     • “gluing” together multiple data sources
     • bridging information and knowledge gaps computationally
     • System aspects: “Grid” Middleware
        • distributed data & computing
        • Web Services, WSDL/SOAP, OGSA, …
        • sources = functions, files, data sets, …
     • Syntax & Structure: (XML-Based) Data Mediators
        • wrapping, restructuring
        • (XML) queries and views
        • sources = (XML) databases
     • Semantics: Model-Based/Semantic Mediators
        • conceptual models and declarative views
        • Knowledge Representation: ontologies, description logics (RDF(S), OWL, ...)
        • sources = knowledge bases (DB + CMs + ICs, i.e., data plus conceptual models and integrity constraints)

  10. Standard (XML-Based) Mediator Architecture
     • USER/Client poses Query Q ( G (S1,..., Sk) ) against the Integrated Global (XML) View G
     • MEDIATOR: holds the Integrated View Definition G(..) over S1(..)…Sk(..); exchanges (XML) queries & results with the sources' (XML) views
     • Wrappers (implemented as web services) expose sources S1, S2, ..., Sk as (XML) views
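The query path Q(G(S1, ..., Sk)) of slide 10 can be sketched as follows, assuming the simplest global-as-view setting: the integrated view G is just the union of the wrapped sources, and the user query Q is a selection. Everything here is hypothetical (the `make_wrapper` and `Mediator` names, and the in-memory rows standing in for web-service wrappers and XML views); the point is that the mediator can unfold Q over G's definition and push it to each source.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
Predicate = Callable[[Row], bool]
Wrapper = Callable[[Optional[Predicate]], List[Row]]


def make_wrapper(rows: List[Row]) -> Wrapper:
    """Hypothetical wrapper: exposes a source as a view and evaluates pushed-down selections.
    In the slide's architecture this would be a web service returning an XML view."""
    def wrapper(pred: Optional[Predicate] = None) -> List[Row]:
        return [dict(r) for r in rows if pred is None or pred(r)]
    return wrapper


class Mediator:
    """Global view G(S1, ..., Sk) defined as the union of the wrapped sources."""

    def __init__(self, wrappers: Dict[str, Wrapper]) -> None:
        self.wrappers = wrappers  # source name -> wrapper

    def view_g(self, pred: Optional[Predicate] = None) -> List[Row]:
        result: List[Row] = []
        for name, wrapper in self.wrappers.items():
            for row in wrapper(pred):  # the selection (if any) runs at each source
                result.append({**row, "source": name})  # keep provenance
        return result

    def query(self, pred: Predicate) -> List[Row]:
        """Answer Q(G(S1, ..., Sk)) for a selection Q on source attributes. Selection
        distributes over union, so Q is pushed to every source instead of materializing G."""
        return self.view_g(pred)


mediator = Mediator({
    "S1": make_wrapper([{"formation": "Formation A", "age": "Tertiary"},
                        {"formation": "Formation B", "age": "Cretaceous"}]),
    "S2": make_wrapper([{"formation": "Formation C", "age": "Tertiary"}]),
})


def is_tertiary(r: Row) -> bool:
    return r["age"] == "Tertiary"


pushed = mediator.query(is_tertiary)
naive = [r for r in mediator.view_g() if is_tertiary(r)]  # materialize G, then filter
assert pushed == naive
print([r["formation"] for r in pushed])  # ['Formation A', 'Formation C']
```

Pushing the selection down works here only because selection distributes over union and the predicate uses source attributes; richer view definitions (joins, restructuring in XQuery or Datalog) need a genuine query-rewriting step.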

  11. XML-Based vs. Semantic Mediation
     • XML-based: Integrated-DTD := XQuery(Src1-DTD,...); XML models with structural constraints (DTDs: parent, child, sibling, ...), e.g., A = (B*|C),D; B = ...; XML elements/(XML) objects over raw data; no semantics / domain constraints
     • Semantic: Integrated-CM := CM-QL(Src1-CM,...); conceptual models with classes, relations, is-a, has-a, ... (e.g., classes C1, C2, C3 linked by a relation R); semantics and constraints in logic (IF ... THEN ... rules); “glue maps”: ontologies, concept spaces
     • CM ~ {Descr. Logic, ER, UML, RDF(S), …}; CM-QL ~ {F-Logic, …}
     • Example raw data record: 0.0155381,1.54906,2,140,29,Tertiary,Trc,CHINLE FORMATION,59,57

  12. GEON Framework for Interoperability in the Geosciences
     • Systems level: GEON Grid ...
        • enable sharing of data and tools via grid services
        • based on the Open Grid Services Architecture (OGSA)
        • acquisition of cluster endpoints and initial deployment underway at several sites, including SDSC, UTEP, VT, ...
     • Syntactic and schema level: data integration via (meta)data standards (often XML-based)
        • database mediators create integrated virtual databases => dynamic creation and automatic update of data warehouses
     • Semantic level: data integration via “semantic” mediation
        • situating 4-D data in context → spatio-temporal, thematic, and process contexts can be represented as “concept spaces”
        • specifically: use of ontologies and logic-based knowledge representation
        • development guided/driven by specific scientific data integration problems

  13. Towards Shared Conceptualizations: High-level Domain Ontology & Standard Data Model
     • Adoption of a standard (meta)data model => wrap data sets into unified virtual views
     • Source: NADAM Team (Boyan Brodaric et al.)

  14. Towards Shared Conceptualizations: Data Contextualization via Concept Spaces

  15. Towards Knowledge Sharing: Rock-type “Ontology” (classification facets: Genesis, Fabric, Composition, Texture)

  16. Getting Formal: Source Contextualization & Ontology Refinement in Logic (example from the Biomedical Informatics Research Network, http://nbirn.net)

  17. Querying with domain knowledge (knowledge representation): the AGE ONTOLOGY, Nevada example
     • Show formations where AGE = ‘Paleozoic’ (with age ontology)
     • Show formations where AGE = ‘Paleozoic’ (without age ontology)

  18. Querying with Multiple Classifications/Ontologies: Age, Composition, Texture, Fabric, Genesis

  19. What to do with the “KR Glue”?
     • Conceptual-level information, concept spaces, ontologies, and other KR techniques for ...
        • ... smart data discovery
        • ... browsing and querying by themes, disciplines, ...
        • ... defining virtual/mediated databases at the conceptual level
        • ... supporting the “plugging together” of “data and information experiments” into Scientific Workflows (a.k.a. Analytical Pipelines in the SEEK ITR)
        • ... smarter user interfaces
        • Is “find felsic sedimentary rocks” a meaningful (satisfiable) query?
        • ...
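The satisfiability question in slide 19 can be checked mechanically once the ontology states disjointness and domain constraints. The sketch below assumes, purely for illustration, that genesis classes are pairwise disjoint, that composition classes are pairwise disjoint, and that the composition terms apply only to igneous rocks. These constraints are hypothetical stand-ins, not the GEON rock-type ontology; a production system would hand such axioms to a description-logic reasoner.

```python
from typing import Dict, Iterable, Set

# Hypothetical constraints for illustration only; NOT taken from the GEON ontologies.
GENESIS: Set[str] = {"igneous", "sedimentary", "metamorphic"}              # assumed pairwise disjoint
COMPOSITION: Set[str] = {"felsic", "intermediate", "mafic", "ultramafic"}  # assumed pairwise disjoint
# Assumed domain constraint: these composition terms classify igneous rocks only.
IMPLIED_GENESIS: Dict[str, str] = {c: "igneous" for c in COMPOSITION}


def satisfiable(terms: Iterable[str]) -> bool:
    """A conjunctive class query is unsatisfiable if it forces two disjoint classes."""
    term_set = set(terms)
    genesis = (term_set & GENESIS) | {IMPLIED_GENESIS[t] for t in term_set if t in IMPLIED_GENESIS}
    composition = term_set & COMPOSITION
    return len(genesis) <= 1 and len(composition) <= 1


print(satisfiable({"felsic", "sedimentary"}))  # False under these assumptions
print(satisfiable({"felsic", "igneous"}))      # True
```

A user interface could run such a check before dispatching a query and explain why “felsic sedimentary rocks” returns nothing, instead of silently returning an empty map.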

  20. Some enabling operations on “ontology data”
     • Concept expansion: what else to look for when asking for ‘Mafic’? (Composition ontology)

  21. Some enabling operations on “ontology data”
     • Generalization: finding data that is “like” X and Y (Composition ontology)
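Generalization can be read as finding the least common ancestor of X and Y in a concept hierarchy, then searching for data classified under it. The Composition hierarchy shown on slide 21 is not spelled out in the text, so this sketch swaps in a small, simplified fragment of the geologic time scale, using the Tertiary/Quaternary split that the age examples in these slides use. The `PARENT` table and the function names are illustrative.

```python
from typing import Dict, List, Optional

# Parent links of a simplified geologic-age fragment (standing in for the Composition ontology).
PARENT: Dict[str, str] = {
    "Paleozoic": "Phanerozoic",
    "Mesozoic": "Phanerozoic",
    "Cenozoic": "Phanerozoic",
    "Triassic": "Mesozoic",
    "Jurassic": "Mesozoic",
    "Cretaceous": "Mesozoic",
    "Tertiary": "Cenozoic",
    "Quaternary": "Cenozoic",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its ancestors, up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(x: str, y: str) -> Optional[str]:
    """Least common ancestor: the most specific concept that covers both x and y."""
    covers_y = set(ancestors(y))
    for concept in ancestors(x):
        if concept in covers_y:
            return concept
    return None


print(generalize("Tertiary", "Quaternary"))  # Cenozoic
print(generalize("Cretaceous", "Tertiary"))  # Phanerozoic
```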

  22. Towards Knowledge Sharing: Rock-type Ontology (classification facets: Genesis, Fabric, Composition, Texture)

  23. DEMO... do NOT click this ... http://kbis.sdsc.edu/GEON/ahm03-demo.html

  24. Architecture of Integrated Geologic Map Prototype System
     • request/response between the user and an HTTP Server (Java Server Page) holding the Map Definition
     • MapServer (Minnesota) combining local layers and remote layers
     • Mediator (Java application) over the source databases (Arizona, Montana)
     • Global Ontology Definitions: rock classification, geologic age

  25. Data Source Wrapping and Integration (figure: each state source, Arizona, Colorado, Idaho, Montana East, Montana West, Nevada, New Mexico, Utah, and Wyoming, uses its own attribute names, such as ABBREV, NAME, FORMATION, FMATN, TYPE, PERIOD, AGE, TIME_UNIT, and LITHOLOGY; sample values include “Livingston formation”, “Tertiary-Cretaceous”, and “andesitic sandstone”)
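Wrapping a source, as in slide 25, largely means renaming its local attributes to a shared schema so that the mediator can query all states uniformly. In this minimal sketch, the global attribute names `formation` and `age`, the per-source `MAPPINGS`, and the rows are hypothetical. The local names FMATN, TIME_UNIT, FORMATION, and PERIOD come from the slide, but their pairing with particular sources is assumed.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical mappings from each source's local attribute names to a global schema.
# The local names appear on the slide; which source uses which is an assumption.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "Nevada": {"FMATN": "formation", "TIME_UNIT": "age"},
    "Montana East": {"FORMATION": "formation", "PERIOD": "age"},
}


def wrap(source: str, rows: List[Row]) -> List[Row]:
    """Rename local attributes to global ones; attributes without a mapping are dropped."""
    mapping = MAPPINGS[source]
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


nevada_rows = [{"FMATN": "Formation X", "TIME_UNIT": "Tertiary"}]
montana_rows = [{"FORMATION": "Formation Y", "PERIOD": "Quaternary", "EXTRA": "ignored"}]
integrated = wrap("Nevada", nevada_rows) + wrap("Montana East", montana_rows)
print(integrated)
# [{'formation': 'Formation X', 'age': 'Tertiary'}, {'formation': 'Formation Y', 'age': 'Quaternary'}]
```

Renaming resolves only the structural differences; values such as “Tertiary-Cretaceous” would still have to be related to the age ontology, which this sketch does not attempt.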

  26. Ontology-Enabled Query Processing
     • User: “Show formations from Cenozoic!”
     • Query rewriting with the Age Ontology (Cenozoic → Quaternary, Tertiary): select FORMATION where AGE=“Tertiary” or AGE=“Quaternary”
     • The rewritten query runs against sources such as Arizona and Montana West (attributes such as ABBREV, FORMATION, PERIOD, LITHOLOGY); results go to Map Rendering with a Color Definition
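The rewriting step in slide 26, which expands “Cenozoic” into its sub-concepts before querying the sources, can be sketched as a recursive walk of the age ontology. This is a minimal sketch, assuming the ontology is a simple tree stored in a hypothetical `CHILDREN` table and that sources store ages as plain strings; the query string mirrors the slide's pseudo-SQL rather than any particular database dialect. The Mesozoic entry is an added illustration.

```python
from typing import Dict, List

# Fragment of the age ontology from the slide (Cenozoic -> Tertiary, Quaternary),
# plus an illustrative Mesozoic entry.
CHILDREN: Dict[str, List[str]] = {
    "Cenozoic": ["Tertiary", "Quaternary"],
    "Mesozoic": ["Triassic", "Jurassic", "Cretaceous"],
}


def leaves(concept: str) -> List[str]:
    """All most-specific sub-concepts of `concept` (the concept itself if it has none)."""
    kids = CHILDREN.get(concept)
    if not kids:
        return [concept]
    return [leaf for kid in kids for leaf in leaves(kid)]


def rewrite(age_concept: str) -> str:
    """Rewrite 'formations from <age_concept>' into a disjunction over the leaf ages."""
    conditions = " or ".join(f'AGE="{age}"' for age in leaves(age_concept))
    return f"select FORMATION where {conditions}"


print(rewrite("Cenozoic"))
# select FORMATION where AGE="Tertiary" or AGE="Quaternary"
```

Like the slide, this lists only the sub-periods; a fuller rewriting might also match records tagged with the broader term itself or with intermediate concepts.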

  27. Integration Challenges
     • MANY! non-available or non-interoperable data
     • “Dirty data”, no controlled vocabularies
     • Many different controlled vocabularies! (“clean data”)
     • What is entailed by a vocabulary? → Formal Ontologies → Extensible Ontologies

  28. What’s next?
     • YOU!
     • GEON-SCI:
        • Science questions waiting to be turned into queries!
     • GEON-KR Working Group activities
        • guided (if not driven) by geoscientists
        • marry KR technologies to standards (W3C, Semantic Web: RDF, OWL, ...)
        • collect GEON-able KR resources (data models, controlled vocabularies, ontologies, ...)
     • GEON-DEV:
        • generalize and merge the current KR/semantic mediation architecture with the standard Grid architecture
        • building systems
