The S-words and the O-words

The S-words and the O-words. Panel – ESDSWG, Thursday Oct. 21, 2010. Peter Fox (RPI), pfox@cs.rpi.edu, Tetherless World Constellation (tw.rpi.edu). Themes: Future Web, Web Science, Policy, Social (Hendler); Xinformatics, Data Science, Semantic eScience.


Presentation Transcript


  1. The S-words and the O-words Panel – ESDSWG Thursday Oct. 21 2010 Peter Fox (RPI) pfox@cs.rpi.edu Tetherless World Constellation

  2. Tetherless World Constellation (tw.rpi.edu)
     Themes:
     • Hendler: Future Web, Web Science, Policy, Social
     • Fox: Xinformatics, Data Science, Semantic eScience, Data Frameworks
     • McGuinness: Semantic Foundations, Knowledge Provenance, Ontology Engineering Environments, Inference, Trust
     Multiple depts/schools/programs, ~35 (Post-doc, Staff, Grad, Ugrad)

  3. Working premise
     Scientists (actually ANYONE) should be able to access a global, distributed knowledge base of scientific data that:
     • appears to be integrated
     • appears to be locally available
     But… data and information are obtained by multiple means (instruments, models, analysis) using various (often opaque) protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) metadata. It may be inconsistent, incomplete, evolving, and distributed AND created in a form that facilitates generation, not use (except by accident).
     And… significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

  4. Ideas
     • Getting answers to science questions and solving real problems
     • Solving new problems:
       • Open-world not closed-world
       • Solving the standards ‘mash-up’ ‘problem’ in a declarative way
       • Implementing science ‘logic’
       • Staying distributed
       • Having a spectrum of choices

  5. Use Cases
     • What calibrations have been applied to this image? [1]
     • What were the cloud cover and seeing conditions during the observation period of this image? [1]
     • Why does this image look bad? [1]
     • What are the processing and parameter differences between the MODIS Daily AOT Data Product vs. the MODIS Monthly AOT Data Product? [2]
     [1] SPCDIS: http://tw.rpi.edu/portal/SPCDIS
     [2] MDSA: http://tw.rpi.edu/portal/MDSA

  6. Multi-domain Knowledge Base

  7. Examining a Use Case
     • What calibrations have been applied to this image?
     [Figure labels: Raw Image; Flat-field Calibration; Optics Calibration Process; Angle of Incidence Calibration; Data Calibration Process; Junk Data Filter; Data Filtering Process; Data Product. Concept layers: Science concepts, Data Processing concepts, Provenance concepts]

  8. Knowledge Base with Provenance and Domain Models in Alignment
     [Figure: RDF graph. Classes: Instrument, Source, SourceUsage, Data Capture, Rule, NodeSet, Justification, CSR Image, Conclusion. Individuals: #CHIP, #He-1083 nm Continuum Image Capture, #MyImage_justification, #MyImage. Literal: 2009-12-16T17:30:00-08:00 (rdf:datatype xsd:DateTime). Edge labels: rdf:type, hasInferenceRule, hasSourceUsage]

  9. Concept Alignment (PML)
     [Figure labels: Instrument, SourceUsage, Source, Observation Period, DateTime, Data Capture, Rule, hasSourceUsage, NodeSet, Justification, Raw Data, Conclusion, Calibration, Rule, hasAntecedentList, Engine, Data Calibration, NodeSet, Justification, Conclusion, Data Product]

  10. Alignment via Ontology Constructs
     • Use ontology constructs to map a relationship between concepts in different domains
     • Can be defined in a separate ontology from the models being mapped
     • Does not require a change to the source models!
     • OWL: owl:equivalentClass, owl:equivalentProperty, owl:sameAs
     • RDFS: rdfs:subClassOf, rdfs:subPropertyOf
     Examples:
     • Instrument rdfs:subClassOf Source
     • Calibration rdfs:subClassOf Rule
     • Data Product rdfs:subClassOf Conclusion
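The rdfs:subClassOf alignments on slide 10 can be sketched in plain Python. This is only an illustration, not the panel's implementation: the dictionary layout and the `inferred_types` helper are assumptions made here, and a real system would use an RDF store and reasoner. The point is that the mapping lives in its own structure, so neither source model is edited.

```python
# Alignment axioms from slide 10, kept separate from both source models.
# Keys and values are class names; each entry reads "key rdfs:subClassOf value".
SUBCLASS_OF = {
    "Instrument": "Source",
    "Calibration": "Rule",
    "Data Product": "Conclusion",
}


def inferred_types(asserted_type, subclass_of=SUBCLASS_OF):
    """Return the asserted class followed by every superclass reachable
    through the subclass_of mapping (hypothetical helper, single parent only)."""
    types = [asserted_type]
    seen = {asserted_type}
    current = asserted_type
    # Walk up the chain; the 'seen' set guards against accidental cycles.
    while current in subclass_of and subclass_of[current] not in seen:
        current = subclass_of[current]
        types.append(current)
        seen.add(current)
    return types


# An individual typed as Instrument in the domain model can also be
# treated as a Source in the provenance model.
print(inferred_types("Instrument"))
```

Under this sketch, a query written against the provenance vocabulary (Source, Rule, Conclusion) can match individuals that were only ever typed with domain classes.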

  11. Direct Alignment using Rules*
     • Rules provide conditional logic on semantic constructs outside application logic
     • Rules can be updated or tweaked without requiring an application update
     • Easily shared and managed
     • Provides for more complex mapping than ontology constructs
     Examples:
     • ex:Instrument(?x) → pmlp:Sensor(?x)
     • pmlp:Information(?x) ^ pmlp:hasURL(?x, ?url) ^ swrlb:endsWith(?url, ".hsh.fts") → ex:CHIPIntensityImage(?x)
     *Many rule systems exist; this slide uses the Semantic Web Rule Language (SWRL)
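A rough Python sketch of what the two rules on slide 11 do when they fire. It is not SWRL and not the panel's tooling: the `RULES` table, the `apply_rules` function, and the example URLs are all hypothetical. It does mirror the slide's main point, that the rules are data that can be edited without touching the application code that evaluates them.

```python
# Each rule: if an individual has 'if_type' (and, optionally, a URL ending in
# 'url_suffix'), then it also gets 'then_type'. Hypothetical encoding.
RULES = [
    {"if_type": "ex:Instrument", "url_suffix": None,
     "then_type": "pmlp:Sensor"},
    {"if_type": "pmlp:Information", "url_suffix": ".hsh.fts",
     "then_type": "ex:CHIPIntensityImage"},
]


def apply_rules(individuals, rules=RULES):
    """Add inferred types in place until no rule produces anything new."""
    changed = True
    while changed:
        changed = False
        for props in individuals.values():
            for rule in rules:
                if rule["if_type"] not in props["types"]:
                    continue
                suffix = rule["url_suffix"]
                url = props.get("url") or ""
                if suffix is not None and not url.endswith(suffix):
                    continue
                if rule["then_type"] not in props["types"]:
                    props["types"].add(rule["then_type"])
                    changed = True
    return individuals


# Illustrative individuals; the URLs are made up for this example.
individuals = {
    "#CHIP": {"types": {"ex:Instrument"}, "url": None},
    "#MyImage": {"types": {"pmlp:Information"},
                 "url": "http://example.org/data/20091216.hsh.fts"},
    "#Notes": {"types": {"pmlp:Information"},
               "url": "http://example.org/notes.txt"},
}
result = apply_rules(individuals)
print(sorted(result["#MyImage"]["types"]))
```

Changing the file suffix or adding a third mapping means editing `RULES` only, which is the maintenance benefit the slide claims for rule-based alignment.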

  12. Querying/Interrogating the Knowledge Base
     • Back to the use case: What calibrations have been applied to this image?
     • We construct a query that returns any individuals of type Calibration used as the InferenceRule in the justification of any artifact the current artifact was derived from.
     • We assume that any calibration applied to an artifact the current artifact was derived from can also be considered as ‘applied’ to the current artifact, and that the wasDerivedFrom property is transitive.
     [Figure: derivation chain. Artifacts: #RawImage, #Intermediate1, #Intermediate2, #Image. Justifications: #_A0, #_A0, #_A1, #_A2. Rules: #Flat Field Calibration, #Angle of Incidence Calibration (rdf:type Calibration). Edge labels: wasDerivedFrom, hasInferenceRule, rdf:type]
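The query on slide 12 amounts to a transitive walk over wasDerivedFrom, collecting inference rules of type Calibration along the way. The sketch below uses plain Python dictionaries instead of a triple store. Which justification carries which rule cannot be read reliably from the flattened figure, so the assignments here (including the `#JunkDataFilter` step) are assumptions for illustration, as are the function and variable names.

```python
# artifact -> artifacts it was directly derived from
WAS_DERIVED_FROM = {
    "#Image": ["#Intermediate2"],
    "#Intermediate2": ["#Intermediate1"],
    "#Intermediate1": ["#RawImage"],
}

# artifact -> inference rule used in its justification (assumed assignment)
INFERENCE_RULE = {
    "#Intermediate1": "#FlatFieldCalibration",
    "#Intermediate2": "#AngleOfIncidenceCalibration",
    "#Image": "#JunkDataFilter",
}

# rule -> rdf:type
RDF_TYPE = {
    "#FlatFieldCalibration": "Calibration",
    "#AngleOfIncidenceCalibration": "Calibration",
    "#JunkDataFilter": "DataFilter",
}


def calibrations_applied(artifact):
    """Collect Calibration-typed rules from the artifact's own justification
    and from every artifact it was (transitively) derived from."""
    found = []
    seen = set()
    stack = [artifact]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        rule = INFERENCE_RULE.get(node)
        if rule is not None and RDF_TYPE.get(rule) == "Calibration" and rule not in found:
            found.append(rule)
        # Treating wasDerivedFrom as transitive: keep walking upstream.
        stack.extend(WAS_DERIVED_FROM.get(node, []))
    return found


print(calibrations_applied("#Image"))
```

In a SPARQL-capable store, the same traversal would more likely be written declaratively, for example with a property path over wasDerivedFrom, rather than with an explicit stack.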

  13. But back to reality
     Fragmentation, Disconnection, Encapsulation … all are bad for … the illusion of … transparency
     (20080602 Fox VSTO et al.)

  14. What is the ecosystem?
     • Just a few elements, and they are scattered
     • But these are what enable scientists to explore/confirm/deny their ‘hunches’
     Elements: Accountability, Identity, Explanation, Justification, Verifiability, Proof, Trust, Provenance, Transparency

  15. Provenance
     • Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
     • Knowledge provenance: enrich with semantics (especially the relations between concepts previously isolated, and retaining context) and semantically-aware tools

  16. Provenance-aware faceted search
     Data + Provenance + Ontologies + RDF + RDFa + SPARQL

  17. MODIS Terra & Aqua vs. AIRS Cloud Top Pressure
     [Figure: correlation maps for Jan 1-16, 2008. Panels: AIRS vs. MODIS Terra; AIRS vs. MODIS Aqua; MODIS Aqua vs. MODIS Terra]
     Impact: Findings using aerosol data apply to other geophysical parameters!

  18. Semantic Advisor
     Your Selected Options:
     • Spatial Area: Longitude (-30, 150), Latitude (-10, 60)
     • Parameters: A: MYD08_D3.005 Aerosol Optical Depth at 550 nm; B: MOD08_D3.005 Aerosol Optical Depth at 550 nm
     • Temporal Range: Begin Date: Jan 01 2008; End Date: Jan 31 2008
     • Visualization Function: Lat-Lon map, Time-averaged
     About your selected parameters:
     • Known Issues: The difference of EQCT and Day Time Node, modulated by data-day definition, caused the included overpass time difference, which makes the artifact difference.
     • See sample images: MODIS Terra vs. MODIS Aqua AOD Correlation; Included Overpass Time Difference
     Actions: Continue process to display image; Return to selection page

  19. Knowledge Integration
     • This is where we are
     • “Just” need to:
       • Train more people in how to do this
       • Implement and evaluate
       • Be gearing up for establishing trust in NASA science data (pixels, granules and products)

  20. What TIWG/SW has done

  21. 2007-2008 Hype Cycle for Emerging Semantic Web Technologies v0.7
     [Figure: hype cycle, Visibility vs. Time. Phases: Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity. Legend (estimated years to mainstream adoption in Earth science): < 2 years, 2-5 years, 5-10 years, > 10 years, Obsolete before plateau]
     Technologies plotted: Semantic Web Services; Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial; XML; Semantic Wiki; Ontology editor, SWOOP; Concept map, Cmap; Query Lang, SPARQL; Smart search, e.g. NOESIS; RDF; OWL 1.0; Protégé; Mid-level ES domain ontologies, e.g. GEON; Tagging / annotation; Rules/Logic, SWRL; DL Reasoners, e.g. Pellet, Racer; SKOS, FOAF; Species Validators; Query Lang, OWL-QL; Upper level ontologies, e.g. ABC, DOLCE, SUMO; Mid-level ES domain ontologies, e.g. SWEET; OWL 1.1; Natural Language Ontologies; Query Lang, Commercial and embedded QL; Managing modular ontologies (ES and general)
     Produced for NASA TIWG semantic web subgroup

  22. 2010-2011 Hype Cycle for Emerging Semantic Web Technologies v0.81
     [Figure: hype cycle, Visibility vs. Time. Phases: Technology trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity. Legend (estimated years to mainstream adoption in Earth science): < 2 years, 2-5 years, 5-10 years, > 10 years, Obsolete before plateau]
     Technologies plotted: SKOS, FOAF; RDFa; Drupal 7; Tagging / annotation; Rules/Logic, SWRL; OWL 2.0; Smart search, e.g. NOESIS; Triple stores, e.g. Jena, Sesame, Mulgara, Oracle Spatial; Query Lang, SPARQL; Semantic Web Services; Managing modular ontologies (ES and general); XML; RDF; Ontology editor, SWOOP; Linked data; Concept map, Cmap; Faceted Search; OWL 1.0; Query Lang, Commercial and embedded QL; Protégé; Mid-level ES domain ontologies, e.g. GEON; DL Reasoners, e.g. Pellet, Racer; Natural Language Ontologies; Species Validators; Semantic Wiki; Query Lang, OWL-QL; Upper level ontologies, e.g. ABC, DOLCE, SUMO; RDFa; RDF 2 ??; Mid-level ES domain ontologies, e.g. SWEET
     Produced for NASA TIWG semantic web subgroup

  23. Semantic Web Roadmap - Gap Analysis
     Time horizons: Current, Near Term (0-2 yrs), Mid Term (2-5 yrs), Long Term (5+ yrs)
     Gap legend:
     • Yellow - okay, or some effort, not proven
     • Orange - fair, definite gap, effort needed
     • Red - none or poor, serious gap, effort required
     Results, Outcome:
     • Improved Information Sharing
     • Increased Collaboration & Interdisciplinary Science
     • Acceleration of Knowledge Production
     • Revolutionizing how science is done
     Results, Output:
     • Geospatial semantic services established
     • Geospatial semantic services proliferate
     • Scientific semantic assisted services
     • Autonomous inference of science results
     Capability, Assisted Discovery & Mediation:
     • Some common vocabulary based product search and access
     • Semantic geospatial search & inference, access
     • Semantic agent-based searches
     • Semantic agent-based integration
     Capability, Interoperable Information Infrastructure:
     • Local processing + data exchange
     • Basic data tailoring services (data as service), verification/validation
     • Interoperable geospatial services (analysis as service), results explanation service
     • Metadata-driven data fusion (semantic service chaining), trust
     Technology, Vocabulary:
     • SWEET core 1.0 based on GCMD/CF
     • SWEET core 2.0 based on best practices decided from community
     • Reasoners able to utilize SWEET 4.0
     • SWEET 3.0 with semantic callable interfaces via standard programming languages
     Technology, Languages/Reasoning:
     • RDF, OWL, OWL-S
     • Geospatial reasoning, OWL-Time
     • Numerical reasoning
     • Scientific reasoning

  24. Semantic Web Roadmap (expanded capability)
     Time horizons: Current, Near Term (0-2 yrs), Mid Term (2-5 yrs), Long Term (5+ yrs)
     Assisted Discovery & Mediation:
     • Some common vocabulary based product search and access
     • Semantic geospatial search & inference, access
     • Semantic agent-based searches
     • Semantic agent-based integration
     Assisted Knowledge Building:
     • Ontologies for data mining, visualization and analysis emerging/maturing
     • Common terminology captured in ontologies, crossing domains
     • Provenance/annotation with ontologies in user tools
     • Some metadata and limited provenance available
     Verifiable Information Quality:
     • Verification is manual with minimal tool support
     • Ontologies for information quality developed
     • Domain and range properties in ontologies used in tools
     • Service ontologies carry quality provenance
     Responsive Information Delivery:
     • Services must be hardwired and service agreements established
     • Services annotated with resource descriptions
     • Dynamic service discovery and mediation, and data scheduling
     • Semantic markup of data latency (time lags) which adapt dynamically
     Interoperable Information services:
     • Local processing + data exchange
     • Basic data tailoring services (data as service), verification/validation
     • Interoperable geospatial services (analysis as service), results explanation service
     • Metadata-driven data fusion (semantic service chaining), trust
     Interactive Data Analysis:
     • Limited metadata passed to analysis applications
     • Tag properties, non-jargon vocabulary for non-specialist use
     • Shared terminology for the visual properties of interface objects and graph types...
     • Semantic fields to describe tag key modal functions
     Seamless Data Access:
     • Access mediated by agreed standard vocabularies, hard-wired connections
     • Access mediated by common ontologies
     • Mediation aided by services with domain/range properties
     • Key data access services are semantically mediated
     September 2008

  25. Roadmap - getting from near-term to mid-term
     Time horizons: Near Term (0-2 yrs), Mid Term (2-5 yrs)
     Legend: first priority, second priority, third priority
     Assisted Discovery & Mediation:
     • Semantic geospatial search & inference, access
     • Semantic agent-based searches
     -> requires agent development and vocabulary for agent characterization
     Assisted Knowledge Building:
     • Common terminology captured in ontologies, crossing domains
     • Ontologies for data mining, visualization and analysis emerging/maturing
     -> requires mature (domain and data-type) ontologies with community endorsement and governance and a robust integration framework
     Verifiable Information Quality:
     • Ontologies for information quality developed
     • Domain and range properties in ontologies used in tools
     -> requires mature quality and uncertainty ontologies with domain and range properties added and populated
     Responsive Information Delivery:
     • Dynamic service discovery and mediation, and data scheduling
     • Services annotated with resource descriptions
     -> requires semantic service (ontology) registry
     Interoperable Information services:
     • Interoperable geospatial services (analysis as service), results explanation service
     • Basic data tailoring services (data as service), verification/validation
     -> requires service to implement v/v, new descriptions of analyses, developing explanation
     Interactive Data Analysis:
     • Shared terminology for the visual properties of interface objects and graph types...
     • Tag properties, non-jargon vocabulary for non-specialist use
     -> requires development of portal modal function vocabulary and ontology, link to domain context and data structure
     Seamless Data Access:
     • Mediation aided by services with domain/range properties
     • Access mediated by common ontologies
     -> requires adding properties to classes in ontologies and populating instances with expert agreement

  26. Semantic Web: Roadmap Details
     Time horizons: Current, Near Term (0-2 yrs), Mid Term (2-5 yrs), Long Term (5+ yrs)
     Legend: first priority, second, third, done by others (comp. sci.), in place
     Discovery:
     • Competing catalog schemas
     • Common semantic service catalog established
     • Enhanced semantic search into search engines
     • Automatic knowledge discovery and mining
     Workflow:
     • Standard workflow language (BPEL)
     • Semantic framework for Web Services
     • Semantic service chaining
     • Intelligent algorithm programming chaining
     Inference:
     • Built into code logic and in the head of the user
     • Basic semantics (DL, FOL)
     • High degree of semantic understanding
     • Intelligent message routing (SOL)
     Earth Science Standards:
     • GCMD, CF, ESML, GML, etc.
     • SWEET Core 1.0
     • VSTO, MMI, others
     • SWEET core 2.0 + domain and math plug-in
     • SWEET 3.0 + science applications plug-in
     Languages:
     • XML, RDF
     • OWL-DL, OWL-Full
     • OWL-S, RIF
     • PML
     Language layers: Syntax, Semantics, Explanation/Rules, Proof/Trust

  27. Semantic Web: Roadmap Details – TRL estimate
     Time horizons: Current, Near Term (0-2 yrs), Mid Term (2-5 yrs), Long Term (5+ yrs)
     Discovery (TRL: 8, 6, 6, 2):
     • Competing catalog schemas
     • Common semantic service catalog established
     • Enhanced semantic search into search engines
     • Automatic knowledge discovery and mining
     Workflow (TRL: 8, 4, 4, 2):
     • Standard workflow language (BPEL)
     • Semantic framework for Web Services
     • Semantic service chaining, Intelligent algorithm programming chaining
     Inference (TRL: 8, 4, 2):
     • Built into code logic and in the head of the user
     • Basic semantics (DL, FOL)
     • High degree of semantic understanding
     • Intelligent message routing (SOL)
     Earth Science Standards (TRL: 8, 8, 4, 2):
     • GCMD, CF, ESML, GML, etc.
     • SWEET Core 1.0
     • VSTO, MMI, others
     • SWEET core 2.0 + domain and math plug-in
     • SWEET 3.0 + science applications plug-in
     Languages (TRL: 4, 4, 9, 8):
     • XML, RDF
     • OWL-DL, OWL-Full
     • OWL-S, RIF
     • PML
     Language layers: Syntax, Semantics, Explanation/Rules, Proof/Trust

  28. What NASA could fund now – priorities
     Legend: first priority, second priority, third priority
     • Basic data tailoring services (data as service), verification/validation
     • Ontologies for information quality developed
     • Services annotated with resource descriptions
     • Mediation aided by services with domain/range properties
     • Access mediated by common ontologies
     • Ontologies for data mining, visualization and analysis emerging/maturing
     • Domain and range properties in ontologies used in tools
     • Common terminology captured in ontologies, crossing domains
     • Semantic geospatial search & inference, access
     • Semantic agent-based searches
     • Shared terminology for the visual properties of interface objects and graph types...
     • Interoperable geospatial services (analysis as service), results explanation service
     • Dynamic service discovery and mediation, and data scheduling
     • Tag properties, non-jargon vocabulary for non-specialist use
     September 2008

  29. Curriculum: Data Science, Semantic eScience, Xinformatics

  30. Semantic Advisor RPI
