Geoscience Knowledge Representation Using the SWEET Ontologies Rob Raskin Jet Propulsion Laboratory
Transforming Data into Knowledge • Continuum: Data, Information, Knowledge • Basic Elements: Bytes, Numbers, Models, Facts • Services: Ingest, Archive, Visualize, Infer, Understand, Predict • Storage: File, Database, HDF-EOS, GIS, MIS, Ontology, Mind • Interoperability: Syntactic, OPeNDAP, WMS/WCS, Semantic • Volume/Density: High/Low, Low/High • Statistics: Checksum, Moments, Descriptive, Inferential • Analysis: Fourier, Wavelet, EOF, SSA • Methodology: Exploratory-analysis, Model-based-mining • Syntax, Semantics
What is Knowledge? • Facts, relations, meanings, contexts • Organized information • Core ingredient in “common sense” • Common understanding • In a form to apply reasoning/inference • Dynamic • Expandable
Semantic Understanding is Difficult! • Sea surface temperature: measured 3 m above surface • Sea surface temperature: measured at surface • Data quality = 5 • Variable t: temperature • Variable t: time • Let’s eat, Grandma. • Let’s eat Grandma. • Time flies like an arrow. • Fruit flies like a pie. • LA Times headline: “Mission accomplished. Major combat operations in Iraq have ended”
Database vs Knowledge Base • Database • Entities and Relations • Closed world • All facts included • Knowledge base • Classes and Properties • Collection of facts • Captures corporate memory • Open world • Facts not stated may be either true or untrue
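The closed-world versus open-world distinction above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database or reasoner works; the fact set and the function names `closed_world_ask` and `open_world_ask` are assumptions made for the example.

```python
from typing import Optional

# Hypothetical facts for illustration, written as (subject, relation, object).
FACTS = {
    ("AIRS", "measures", "Temperature"),
    ("Ocean", "hasSubstance", "Water"),
}

def closed_world_ask(fact: tuple) -> bool:
    """Database-style answer: anything not stored is treated as false."""
    return fact in FACTS

def open_world_ask(fact: tuple) -> Optional[bool]:
    """Knowledge-base-style answer: True if stated, otherwise unknown (None)."""
    return True if fact in FACTS else None

query = ("AIRS", "measures", "Humidity")
print(closed_world_ask(query))  # False
print(open_world_ask(query))    # None
```

The same unstated fact is "false" under the closed-world reading and "unknown" under the open-world reading.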
PO.DAAC Knowledge Bases Public access Documents People Roles/Tasks Data Processing (Docushare) Data Products Metadata Tools/Services Web Pages Science Concepts Missions Instruments Organizations Applications Announcements Inquiries Computers
Relations • People have roles • Instruments measure science parameters • Inquiries relate to data products • etc.
Example of Knowledge-Assisted Service • Yellow Page Lookup: • cars vs automobiles • Hotels vs motels vs resorts
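A lookup of this kind might be sketched as query expansion over groups of related terms. The groups, the listings, and the function names `expand` and `lookup` are all hypothetical; a real service would presumably draw the term relations from an ontology rather than a hard-coded list.

```python
# Hypothetical groups of related terms, taken from the slide's examples.
RELATED_TERMS = [
    {"cars", "automobiles"},
    {"hotels", "motels", "resorts"},
]

# Hypothetical directory listings: business name -> category it was filed under.
LISTINGS = {
    "Ace Auto Sales": "automobiles",
    "Seaside Inn": "motels",
    "Grand Plaza": "hotels",
}

def expand(term: str) -> set:
    """Return the term plus every term grouped with it."""
    term = term.lower()
    for group in RELATED_TERMS:
        if term in group:
            return set(group)
    return {term}

def lookup(term: str) -> list:
    """Find listings filed under the term or any related term."""
    wanted = expand(term)
    return sorted(name for name, cat in LISTINGS.items() if cat in wanted)

print(lookup("cars"))    # ['Ace Auto Sales']
print(lookup("hotels"))  # ['Grand Plaza', 'Seaside Inn']
```

A search for "cars" finds a business filed under "automobiles" even though the literal strings never match.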
Semantic-based Service Example: Google • Type into Google: “gymnasiums in Seattle” • Generates map of Seattle with dots locating gyms • Google understands that • Seattle is a place • Gymnasiums is a place-based service • Google understands semantics so that the search results also could include • locations near Seattle • Similar services (e.g., health club)
Assertion of Facts as Triples Subject-Verb-Object representation • Flood subClassOf WeatherPhenomena • HDF subClassOf FileFormat • Pressure subClassOf PhysicalProperty • Ocean hasSubstance Water • AIRS measures Temperature
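The triples above can be held in a plain list and queried directly. The sketch below is illustrative only: the helper names `objects` and `superclasses` are assumptions, and the last triple is a hypothetical addition (not on the slide) included so that a two-step subClassOf chain can be followed.

```python
TRIPLES = [
    ("Flood", "subClassOf", "WeatherPhenomena"),
    ("HDF", "subClassOf", "FileFormat"),
    ("Pressure", "subClassOf", "PhysicalProperty"),
    ("Ocean", "hasSubstance", "Water"),
    ("AIRS", "measures", "Temperature"),
    # Hypothetical extra triple, added only to show a two-step chain.
    ("WeatherPhenomena", "subClassOf", "Phenomena"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def superclasses(cls):
    """All ancestors reachable by following subClassOf links."""
    found, stack = [], [cls]
    while stack:
        for parent in objects(stack.pop(), "subClassOf"):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found

print(objects("AIRS", "measures"))  # ['Temperature']
print(superclasses("Flood"))        # ['WeatherPhenomena', 'Phenomena']
```

Following subClassOf transitively is the simplest form of the inference the later slides attribute to ontology-aware tools.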
Applications • Software tools can find “meaning” in resources for • Discovery • Fusion • Lineage • … • Requirements • Data products associated with objects in “science concept space” • Richer descriptions than DIFs • Data services associated with objects in “service concept space” • Richer descriptions than SERFs • Search/fusion tools that exploit ontologies
Semantic Web Vision • Web page creators place XML tags around technical terms on web pages • XML tags point to knowledge base where term is “defined” • Search tools use this information to provide value-added services • Common search engines (Google) use these capabilities only minimally, at present
Ontologies • Current preferred method to store “facts” • General definition: “all that is known” • Computer science definition: Machine-readable definition of terms and how they relate to one another • As with a dictionary, terms are defined in terms of other terms • Provide shared understanding of concepts • Support knowledge reuse • Support machine-to-machine communications with deeper semantics than controlled vocabulary
XML-based Ontology Languages • XML satisfies desired properties for language syntax • Readable by both humans and machines • However, there are too many possible ways that XML tags can be named and used • No standardization of XML tag meanings as in HTML (<b> </b> pair => renders in bold) • Additional standardized semantics needed to exploit shared understanding of concepts
RDF and OWL • W3C has adopted languages that specialize XML • Resource Description Framework (RDF) • Web Ontology Language (OWL) • Languages predefine specific tags • RDF: Class, subclass, property, subproperty, … • RDF and OWL form a nested collection of languages, each roughly a specialization of the preceding language with further shared understanding • XML • RDF • RDFS • OWL Lite • OWL DL • OWL Full
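To make the idea of predefined tags concrete, the sketch below parses a small RDF/XML fragment with Python's standard library and pulls out the subclass assertions. The three namespace URIs are the standard W3C ones; the fragment itself, the class names reused from the triples slide, and the helper `subclass_pairs` are assumptions for illustration. A dedicated RDF library would normally be used instead of raw XML parsing.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Illustrative document; the layout is an assumption about how the
# "Flood subClassOf WeatherPhenomena" triple might be serialized.
DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="#WeatherPhenomena"/>
  <owl:Class rdf:about="#Flood">
    <rdfs:subClassOf rdf:resource="#WeatherPhenomena"/>
  </owl:Class>
</rdf:RDF>
"""

def subclass_pairs(xml_text):
    """Return (class, superclass) pairs found in owl:Class elements."""
    root = ET.fromstring(xml_text)
    pairs = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        for sup in cls.findall(f"{{{RDFS}}}subClassOf"):
            pairs.append((cls.get(f"{{{RDF}}}about"), sup.get(f"{{{RDF}}}resource")))
    return pairs

print(subclass_pairs(DOC))  # [('#Flood', '#WeatherPhenomena')]
```

Because `owl:Class` and `rdfs:subClassOf` have agreed meanings, any tool can extract the hierarchy without knowing who wrote the file.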
Semantic Web for Earth and Environmental Terminology (SWEET) • SWEET is a concept space • Enables scalable classification of Earth system science concepts • Currently being expanded to Space science • Anybody can import, expand, and specialize the work of others • No need to regenerate a physics, chemistry, or math ontology • Concept space is translatable into other languages/cultures using “sameAs” notions
SWEET Ontologies and Their Interrelationships • [Diagram] Groupings: Faceted Ontologies, Integrative Ontologies • Ontologies shown: Living Substances, Non-Living Substances, Natural Phenomena, Physical Processes, Human Activities, Earth Realm, Data, Physical Properties, Space, Time, Units, Numerics
SWEET as an Upper Level Earth Science Ontology • [Diagram] Math, Physics, Chemistry • import • SWEET: Space, Property, EarthRealm, Process, Phenomena, Substance, Data, Time • import • Specialized domains: Stratospheric Chemistry, Biogeochemistry
Why an Upper-Level Ontology for Earth System Science? • Many common concepts used across Earth Science disciplines (such as properties of the Earth) • Provides common definitions for terms used in multiple disciplines or communities • Provides common language in support of community and multidisciplinary activities • Provides common “properties” (relations) for tool developers • Reduced burden (and barrier to entry) on creators of specialized domain ontologies • Only need to create ontologies for incremental knowledge
How SWEET was Initially Populated • Initial sources • GCMD • Over 10,000 datasets • Over 1000 keywords • Data providers submit far more than the 1000 terms for “free-text” search • CF • Over 500 keywords • Very long term names • surface_downwelling_photon_spherical_irradiance_in_sea_water • Decomposed into facets
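Decomposing a long CF name into facets might look like the sketch below. The facet labels in `FACETS` and the function `decompose` are hypothetical and chosen only for illustration; the actual facet assignments used when SWEET was populated are not given on the slide and may differ.

```python
# Hypothetical facet vocabulary; the real SWEET facet assignments may differ.
FACETS = {
    "surface": "location",
    "downwelling": "direction",
    "photon": "quantity_type",
    "spherical": "geometry",
    "irradiance": "property",
}

def decompose(cf_name):
    """Split a CF-style name into facet -> list of tokens."""
    # Text after "_in_" is treated as the medium (an assumption of this sketch).
    head, _, medium = cf_name.partition("_in_")
    facets = {}
    for token in head.split("_"):
        facets.setdefault(FACETS.get(token, "unclassified"), []).append(token)
    if medium:
        facets["medium"] = [medium.replace("_", " ")]
    return facets

result = decompose("surface_downwelling_photon_spherical_irradiance_in_sea_water")
for facet, tokens in result.items():
    print(facet, tokens)
```

Each token then becomes a reusable concept in its own facet instead of one opaque keyword.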
Spatial Ontology • Concepts of 0-D, 1-D, 2-D, and 3-D objects • Default coordinate system: lat/lon/up • Polygons used to store spatial extents • Spatial attributes added (population, area, etc.) • Scientific applications include geology, to represent 3-D structure
Numerical Ontologies • Numerics • Extents: interval, point, 0, positiveIntegers, … • Relations: lessThan, greaterThan, … • SpatialEntities • Extents: country, Antarctica, equator, inlet, … • Relations: above, northOf, … • TemporalEntities • Extents: duration, century, season, … • Relations: after, before, …
Numerical Ontologies (cont.) • Numeric concepts defined in OWL only through standard XML XSD spec • Intervals defined as restrictions on real line • Numerical relations defined in SWEET • lessThan, max, … • Cartesian product (multidimensional spaces) added in SWEET • Numeric ontologies used to define spatial and temporal concepts
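The ideas of intervals as restrictions on the real line, a lessThan relation, and Cartesian products for multidimensional spaces can be sketched as follows. The class `Interval`, the function `in_product`, and the variable names are illustrative and are not SWEET identifiers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A closed interval [low, high], i.e. a restriction on the real line."""
    low: float
    high: float

    def contains(self, x: float) -> bool:
        return self.low <= x <= self.high

    def less_than(self, other: "Interval") -> bool:
        """True when every point of self lies below every point of other."""
        return self.high < other.low

# Illustrative extents for a two-dimensional lat/lon space.
latitude = Interval(-90.0, 90.0)
longitude = Interval(-180.0, 180.0)

def in_product(point, intervals) -> bool:
    """Membership in the Cartesian product of several intervals."""
    return len(point) == len(intervals) and all(
        iv.contains(x) for x, iv in zip(point, intervals)
    )

print(in_product((34.2, -118.2), (latitude, longitude)))  # True
print(in_product((95.0, 0.0), (latitude, longitude)))     # False
print(Interval(0, 1).less_than(Interval(2, 3)))           # True
```

Spatial and temporal extents can then be expressed as products of such intervals, which is the sense in which the numeric ontologies underpin the spatial and temporal ones.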
Conceptual Ontologies • Phenomena • ElNino, Volcano, Thunderstorm, Deforestation • Each has an associated spatial/temporal extent, EarthRealms, PhysicalProperties, etc. • Specific instances included • e.g., 1997-98 ElNino • Human Activities • Fisheries, IndustrialProcessing, Economics, Public Good • State • History or state of planet or component
SWEET Users • ESML - Earth Science Markup Language • ESIP - Earth Science Information Partner Federation • GEON - Geosciences Network • GENESIS - Global Environmental & Earth Science Information System • IRI - International Research Institute (Columbia) • LEAD - Linked Environments for Atmospheric Discovery • MMI - Marine Metadata Initiative • NOESIS • PEaCE - Pacific Ecoinformatics and Computational Ecology • SESDI - Semantically Enabled Science Data Integration • VSTO - Virtual Solar-Terrestrial Observatory
Collaboration Web Site • Discussion tools • Blog, wiki, moderated discussion board • Version Control/ Configuration Management • Trace dependencies on external ontologies • Tools to search for existing concepts in registered ontologies • Ontology Validation Procedure • W3C note is formal submission method • Registry/discovery of ontologies • Support workflows/services for ontology development
Community Issues • Content • Maintain alignment given expansion of classes and properties • Standards and Conventions • Agreement on standards for use of OWL • Fuzzy representation conventions • Review Board • Who will oversee and maintain in perpetuity (or at least through the next funding cycle) • ESIP Federation? ESSI? • Global Support • Provide tools to visualize and appreciate the big picture
Update/Matching Issues • No removal of terms except for spelling or factual errors • Subscription service to notify affected ontologies when changes made • Must avoid contradictions • Additions can create redundancy if sameAs not used • Humans must oversee “matching” • CF has established moderator to carry out analogous additions • OWL “import” imports entire file • Associate community with ontology terms • Community tagging
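One way to keep track of sameAs links, so that added terms do not become silent duplicates, is a union-find structure over term names. This is a sketch under assumptions: the class `SameAs` and the prefixed term names are hypothetical, and in practice a human would still approve each match, as the slide notes.

```python
class SameAs:
    """Union-find over term names linked by sameAs assertions."""

    def __init__(self):
        self.parent = {}

    def find(self, term):
        """Return the representative term of the group containing `term`."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]  # path halving
            term = self.parent[term]
        return term

    def assert_same(self, a, b):
        """Record that a and b denote the same concept."""
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)

# Hypothetical term names from three vocabularies.
links = SameAs()
links.assert_same("cf:sea_surface_temperature", "sweet:SeaSurfaceTemperature")
links.assert_same("gcmd:SEA SURFACE TEMPERATURE", "sweet:SeaSurfaceTemperature")

print(links.same("cf:sea_surface_temperature", "gcmd:SEA SURFACE TEMPERATURE"))  # True
print(links.same("cf:sea_surface_temperature", "sweet:Pressure"))                # False
```

Two terms that were each matched to the same SWEET concept are recognized as equivalent to each other without a direct link between them.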
Best Practices • Keep ontologies small, modular • Be careful that “owl:imports” imports everything • Use higher level ontologies where possible • Identify hierarchy of concept spaces • Model schemas • Try to keep dependencies unidirectional
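The advice on imports and unidirectional dependencies can be checked mechanically. The sketch below computes everything pulled in by one import and tests an import graph for cycles; the file names in `IMPORTS` and the functions `closure` and `has_cycle` are hypothetical and do not describe the actual SWEET file layout.

```python
# Hypothetical import graph: ontology file -> files it imports directly.
IMPORTS = {
    "stratospheric_chemistry.owl": ["sweet_substance.owl", "sweet_process.owl"],
    "sweet_process.owl": ["sweet_property.owl"],
    "sweet_substance.owl": ["sweet_property.owl"],
    "sweet_property.owl": ["sweet_units.owl"],
    "sweet_units.owl": [],
}

def closure(start, graph):
    """Everything loaded, directly or indirectly, when `start` is imported."""
    seen, stack = set(), [start]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def has_cycle(graph):
    """True if any ontology ends up importing itself."""
    return any(node in closure(node, graph) for node in graph)

print(sorted(closure("stratospheric_chemistry.owl", IMPORTS)))
print(has_cycle(IMPORTS))  # False
```

Small, modular files keep the closure short, and a cycle check of this kind flags dependencies that have stopped being unidirectional.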