
Semantic Mediation of Scientific Data via Logic-Based Data Federation Software




    1. Semantic Mediation of Scientific Data via Logic-Based Data Federation Software
       Amarnath Gupta, Bertram Ludäscher, Reagan Moore
       San Diego Supercomputer Center, University of California, San Diego

    2. Information Integration / Mediation
       Goal: combine data from different sources such that the integrated whole is more than the sum of its isolated parts
       => SDSC/CSE MIX project (Mediation of Information in XML)
       Standard Scenarios:
       - C2B, e.g. comparison shopping: AddAll := IntegratedView(amazon, barnes&noble, ...)
       - B2B, e.g. marketplaces: Virt_Market := IntegratedView(supplier_1, ... supplier_n)
       - C2M, e.g. home-buyer: Full_Picture := IntegratedView(Realtor, Crime, Schools, ...)
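
The IntegratedView(...) notation on this slide can be read as a function that takes several sources and returns one queryable view. The following is only a minimal sketch of that reading for the comparison-shopping case, not the MIX implementation: the `Offer` record, the `make_source` helper, the store names, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Offer:
    source: str   # which (hypothetical) store the offer came from
    title: str
    price: float


# A source is modeled as a function from a title to the offers it has.
Source = Callable[[str], Iterable[Offer]]


def make_source(name: str, catalog: Dict[str, float]) -> Source:
    """Build a toy source backed by an in-memory catalog (illustrative only)."""
    def lookup(title: str) -> Iterable[Offer]:
        if title in catalog:
            yield Offer(name, title, catalog[title])
    return lookup


def integrated_view(*sources: Source) -> Callable[[str], List[Offer]]:
    """Combine sources into one view that returns all offers, cheapest first."""
    def query(title: str) -> List[Offer]:
        offers = [offer for src in sources for offer in src(title)]
        return sorted(offers, key=lambda offer: offer.price)
    return query


store_a = make_source("store_a", {"Foundations of Databases": 61.0})
store_b = make_source("store_b", {"Foundations of Databases": 58.5})
add_all = integrated_view(store_a, store_b)
cheapest = add_all("Foundations of Databases")[0]
```

The point of the sketch is that the client queries `add_all` and never addresses an individual store; ranking by price is one way the combined view offers more than each source alone.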

    3. MIX Mediation Challenges
       MIX Mediator Architecture (middleware)
       - wrappers: wrap different data into common format (XML)
       - mediator: combines sources’ XML views into IntegratedView
       MIX Mediator Components
       - declarative mediator view definition language: XMAS (XML Matching And Structuring); language, algebra, and first prototype ~1999 [SIGMOD99, EDBT00, ...]
       - query composition and rewriting, esp. with limited source capabilities
       - on-demand (“lazy”) query processing of virtual XML docs (DOM-VXD)
       - Blended Browsing and Querying user interface (BBQ)
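
The wrapper/mediator split and the on-demand ("lazy") processing named on this slide can be illustrated with Python generators. This is a sketch under stated assumptions, not the XMAS or DOM-VXD machinery: the `item` element, the `source` attribute, and the `traced` helper (which only records which rows were actually fetched) are invented for illustration.

```python
import xml.etree.ElementTree as ET


def wrap(source_name, rows):
    """Wrapper: expose each source row as an XML element, one at a time."""
    for row in rows:
        elem = ET.Element("item", {"source": source_name})
        for key, value in row.items():
            ET.SubElement(elem, key).text = str(value)
        yield elem


def mediate(*wrapped_views):
    """Mediator: concatenate the sources' XML views, pulling only on demand."""
    for view in wrapped_views:
        yield from view


pulled = []  # records every row actually read from a source


def traced(rows):
    """Hypothetical helper that logs each row as it is fetched."""
    for row in rows:
        pulled.append(row)
        yield row


source_1 = [{"name": "a", "size": 1}, {"name": "b", "size": 2}]
source_2 = [{"name": "c", "size": 3}]
view = mediate(wrap("s1", traced(source_1)), wrap("s2", traced(source_2)))

first = next(view)              # the client navigates to the first element only
rows_fetched_so_far = len(pulled)
remaining = list(view)          # forcing the rest pulls the remaining rows
```

After the first `next(view)` only one source row has been read; the other rows are fetched when the client asks for them, which is the behavior a virtual XML document is meant to give.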

    4. New MIX Challenges from Scientific Applications
       Complex Data (S2S)
       - SDSC’s Scientific Data Applications (current/planned, e.g. Neurosciences: SciDAC/SDM, NCMIR, NIH BIRN, Earth sciences, ...) show that syntactic/structural integration is insufficient for ...
       Complex Multiple-World Mediation Problems:
       - complex, disjoint, seemingly unrelated data
       - “hidden semantics” in complex, indirect relationships
       => Semantic (aka Model/Knowledge-Based) Mediation
       - lift mediation to the level of conceptual models (CMs)
       - use domain experts’ knowledge formalized as rules over CMs
       => Specialized Extensions
       - temporal, geospatial, statistical, DQ/accuracy ... operations
       => Extend Mediation Scope and Power via Deductive Rules
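
To make "rules over CMs" concrete, the sketch below evaluates three Datalog-style rules bottom-up over a tiny domain map and then uses the derived relation to find data registered under indirectly related concepts. Everything here is an assumption for illustration: the rules, the `is_a`/`has_a` relation names, the concept names, and the data identifiers are not taken from the slides.

```python
def derive_contains(is_a, has_a):
    """Naive fixpoint evaluation of three illustrative rules:

        contains(X, Z) :- has_a(X, Z).
        contains(X, Z) :- contains(X, Y), has_a(Y, Z).
        contains(X, Z) :- is_a(X, Y), contains(Y, Z).
    """
    contains = set(has_a)
    while True:
        new = set()
        for (x, y) in contains:
            for (y2, z) in has_a:
                if y == y2:
                    new.add((x, z))
        for (x, y) in is_a:
            for (y2, z) in contains:
                if y == y2:
                    new.add((x, z))
        if new <= contains:      # nothing new derived: fixpoint reached
            return contains
        contains |= new


def data_for(concept, contains, registered):
    """Collect data registered at the concept or at anything it contains."""
    concepts = {concept} | {z for (x, z) in contains if x == concept}
    return sorted(d for c in concepts for d in registered.get(c, []))


# Hypothetical domain-map fragment and registered data.
is_a = {("purkinje_cell", "neuron")}
has_a = {("neuron", "dendrite"), ("dendrite", "spine")}
registered = {"spine": ["src_a:row_17"], "dendrite": ["src_b:image_3"]}

contains = derive_contains(is_a, has_a)
hits = data_for("purkinje_cell", contains, registered)
```

Neither source mentions `purkinje_cell` directly, yet both are returned for it; the link exists only through the rules, which is the kind of "hidden semantics" a purely structural integration would miss.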

    5. A Neuroscience Question
       - protloc = NCMIR, excel + images
       - morphometry (measurement) = NCMIR, excel + txt + images
       - neurotrans (stimulate then electrical responses, probes) = RDB, SENSELAB, Yale
       - CaBP (chemical structure, PDB links, function of CaBP, found-in...) = Web, Vanderbilt U
       - Expasy (Protein-info as Sequence data) = Web, Europe

    6. Example for Formalizing Domain Knowledge: Domain Map (Ontology) for SYNAPSE and NCMIR

    7. Domain Map Refinement

    8. Semantic Annotation Tool for Domain Scientists

    9. Extended Mediator Architecture for Semantic Mediation

    10. ANATOM Domain Map with Registered Data

    11. Query Processing

    12. Mediator System Architecture

    13. Mediation Services: Source Registration (System Issues)

    14. Mediation Services: Source Registration (Semantics Issues)
        Domain Map Registration
        - provide concept space/ontology
          … as a private object (“myANATOM”)
          … merge with others (give “semantic bridges”)
          … and check for conflicts
        Conceptual Model Registration
        - schema: classes, associations, attributes
        - domain constraints
        - “put data into context” (linking data to the domain map)
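
The "merge with others ... and check for conflicts" step can be sketched as follows. This is one possible reading, not the registration service itself: domain maps are reduced to sets of `is_a` edges, a semantic bridge is modeled as a plain renaming from a private concept to a shared one, and "conflict" is narrowed to a cycle in the merged hierarchy. All concept names are hypothetical.

```python
def merge_maps(shared, private, bridges):
    """Merge a private domain map into a shared one.

    shared, private: sets of (child, parent) is_a edges.
    bridges: dict mapping a private concept to the shared concept it equals.
    """
    merged = set(shared)
    for child, parent in private:
        merged.add((bridges.get(child, child), bridges.get(parent, parent)))
    return merged


def find_conflicts(is_a):
    """Return concepts that become their own ancestor (an is_a cycle)."""
    parents = {}
    for child, parent in is_a:
        parents.setdefault(child, set()).add(parent)

    def ancestors(concept):
        seen, stack = set(), list(parents.get(concept, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return seen

    return sorted(c for c in parents if c in ancestors(c))


shared = {("neuron", "cell"), ("purkinje_cell", "neuron")}

# A consistent private map: adds a new subclass under the shared "neuron".
ok_map = merge_maps(shared, {("my_spiny_neuron", "my_neuron")},
                    {"my_neuron": "neuron"})

# An inconsistent one: its bridges make "cell" a kind of "purkinje_cell".
bad_map = merge_maps(shared, {("my_cell", "my_purkinje")},
                     {"my_cell": "cell", "my_purkinje": "purkinje_cell"})

ok_conflicts = find_conflicts(ok_map)
bad_conflicts = find_conflicts(bad_map)
```

A real conflict check would also cover domain constraints such as disjointness; cycle detection is used here only because it is the simplest inconsistency to show.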

    15. Mediation Services: Client Registration

    16. Other Existing Infrastructure
        - Transparent Access to Remote Data Collections: Storage Resource Broker (SRB) and Metadata Catalog (MCAT)
        - “Production-Level” Software
        - PPDG: interface to LBNL Storage Manager, collection creation, replication management
        - Use of manual and automatic wrapper technology (Minerva, Roadrunner, V. Crescenzi, Università di Roma Tre) => XWrap Elite

    17. SRB and the Particle Physics Data Grid

    18. Year 1 Deliverables
        - define interface metadata format (Critchlow)
        - extend XWrap to generate wrappers using the interface metadata description instead of requiring human interaction (GT)
        - develop a canonical XML-based query and response format as a dynamic interface between query engine and wrappers (Critchlow, GT, SDSC)
        - communication via agent protocols? How about using digital library infrastructure (e.g. Simple Digital Library Interoperability Protocol, SDLIP)?
        - use extended XWrap to create wrappers for the genomics domain for evaluation (GT)
        - extend the SDSC query and metadata architecture to interoperate with the LLNL DataFoundry (SDSC, Critchlow)
        - ...
        - interoperation at the wrapper level: Minerva wrappers, XWrap
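
The slides do not define the canonical XML-based query and response format, so the following only sketches what such an exchange between a query engine and a wrapper might look like. The element names (`query`, `condition`, `response`, `record`, `field`), the attribute names, and the sample genomics rows are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def build_query(source, conditions):
    """Query engine side: serialize equality conditions as an XML query."""
    query = ET.Element("query", {"source": source})
    for field, value in conditions.items():
        ET.SubElement(query, "condition", {"field": field, "equals": value})
    return ET.tostring(query, encoding="unicode")


def answer(query_xml, rows):
    """Wrapper side: parse the query, filter rows, return an XML response."""
    query = ET.fromstring(query_xml)
    conditions = {c.get("field"): c.get("equals")
                  for c in query.findall("condition")}
    response = ET.Element("response", {"source": query.get("source", "")})
    for row in rows:
        if all(row.get(f) == v for f, v in conditions.items()):
            record = ET.SubElement(response, "record")
            for name, value in row.items():
                ET.SubElement(record, "field", {"name": name}).text = value
    return ET.tostring(response, encoding="unicode")


# Hypothetical rows held by a genomics wrapper.
rows = [
    {"gene": "g1", "organism": "mouse"},
    {"gene": "g2", "organism": "rat"},
]
query_xml = build_query("genomics_db", {"organism": "mouse"})
response = ET.fromstring(answer(query_xml, rows))
matched_gene = response.find("record/field[@name='gene']").text
```

Because both directions are plain XML strings, the query engine and the wrappers only need to agree on the format, not on each other's internals.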

    19. References
        - Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, April 2001.
        - Model-Based Information Integration in a Neuroscience Mediator System, B. Ludäscher, A. Gupta, M. E. Martone, demonstration track, 26th Intl. Conference on Very Large Databases (VLDB), Cairo, Egypt, September 2000.
        - Knowledge-Based Integration of Neuroscience Data Sources, A. Gupta, B. Ludäscher, M. E. Martone, 12th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany, IEEE Computer Society, July 2000.
