INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES

Mark Williams, University of Colorado

Presentation Transcript


  1. INTEGRATED DATA SYSTEM FOR CRITICAL ZONE OBSERVATORIES Mark Williams, University of Colorado

  2. The water information value ladder (Data >>> Information >>> Insight; >>> Increasing value >>>). From top to bottom: Forecasting, Reporting, Analysis, Integration (done poorly); Distribution, Aggregation (done poorly to moderately); Quality assurance, Collation, Monitoring (sometimes done well, by many groups, but could be vastly improved). Slide courtesy CSIRO, BOM, WMO, Ilya, Dozier

  3. Provenance and transparency

  4. CZOs as platforms for research • Integrating satellite & ground measurements with modeling • CZO measurements provide the basis for advances in multiple Earth sciences • CZOs are DATA-RICH places to develop & test Earth system models

  5. Challenges to CZO Data Management: Atmosphere, Biosphere, Hydrosphere, Lithosphere; Minutes, Decades, Millennia, Eons; Hillslope, Catchment, Watershed • Many Object & Data Types! • Diverse media • Sensor-based • Stationary • Mobile • Spectra/photos • Sample-based • Sub-samples • Preparations/Fractions • Numeric & Categorical

  6. Sample Fractions for Soil Geochemistry: Adapting SESAR IGSN for CZO Ziplock (~500g) Bulk soil horizon or depth increment glass vial: <2mm fines dry sieved EA-IRMS FTIR SA DRY SIEVE 2 mm <2mm SA WET SIEVE, or DENSITY, or SETTLING (with or without sonication) glass vial: sand + small detritus XRD CEC The choice here is important. Do we want aggregates or not? EA-IRMS FTIR SPEX mill ICP-MS after Li-borate fusion >2mm: glass vial: plant detritus milled (1) Pick out plant roots & detritus, rinse with DI water, oven dry, mill (SPEX?) SA glass vial: silt + clay EA-IRMS FTIR XRD CEC EA-IRMS FTIR SPEX mill glass vial: pebbles hard ground EA-IRMS FTIR (2) Remaining pebbles & rocks, hard grind ICP-MS after Li-borate fusion ICP-MS after Li-borate fusion XRD? Al Can (~70 g) For Gamma Counting 137Cs Extractions Dithionite-Citrate extraction Na pyrophosphate extraction Ammonium oxalate extraction Christina River CZO example

  7. Overall Approach • Do not reinvent the wheel! Build on • CUAHSI HIS, EarthChemDB, LTER, etc • Consistent data presentation on web • Metadata • Data values • Central data system for data discovery • Harvested by SDSC (pull system)

  8. CZO data principles and policies • Each CZO will operate and be responsible for its own local data management system for collecting, organizing, quality controlling and publishing data through its web site. • Different philosophy than CUAHSI ODM • Each CZO is master of its own data • We don’t care what goes on under the hood • Each site uses its own protocols, databases, etc. • Allows CZO to honor site legacy data

  9. CZO data principles and policies • Each CZO publishes its data on the web in ASCII format with sufficient metadata so that the data can be unambiguously interpreted • Metadata follows a prescribed format • Data managers just need rules to follow • Easy to harvest by central portal • Makes it simple at the site level so scientists comply • Addresses the chokepoint that is getting data/metadata from the scientists to data managers

  10. Data Management Team • David Tarboton, Utah State. PI on the CUAHSI Hydrologic Information System (HIS) • Kerstin Lehnert, Columbia. PI on EarthChemDB • Ilya Zaslavsky, Lead, SDSC Spatial Information Systems Lab; hosts CUAHSI HIS. • Mark Williams, CU-Boulder. PI Niwot Ridge LTER • Anthony Aufdenkampe, co-I Christina River Basin CZO

  11. Integrated CZO data system: synthesizing information management experience and software from CZO partners and neighboring earth science projects into a standards-based system for publishing environmental data, to emphasize the “critical zone” nature of our shared data sets

  12. CZO Data Publication System Local CZO DB Local CZO DB Local CZO DB CZO Data Repository and Indexing (CZO Central) External cross-project registries CZO Data Products Standard CZO Services DataNet, NEON CZO Desktop Applications Harvester Ontology Archive Shared vocabularies CZO Metadata CZO Web-based Data Discovery System CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Spatial, hydrologic, geophysical, geochemical, imagery, spectral…

  13. Data Publication Process (for hydrologic time series) CZO Desktop ODM WaterML Service CZO Display File CZO Central Catalog Catalog Search Service Raw display file metadata is registered with the CZO data portal, to assure original data is discoverable and downloadable. WFS Service is registered with the CZO data portal. OGC WFS Service: broader internet community accessing data using standard protocols. OGC CSW Service: CZO Portal utilizes the OGC CSW (Catalog Services for the Web)

  14. CZO data interoperability: what does it mean? System components Levels of interoperability Data discovery portal • Find and download CZO resources: files and file collections, services, documents – organized by CZO thematic category and by type • Data available in compatible semantics: ontologies, controlled vocabularies • Data available via the same service interfaces (e.g. WFS, SOS) but different information models • Compatibility at the level of domain information models and databases Different types of data collected by CZOs Shared vocabularies and ontology management Wider variety of data Deeper integration Service administration (CZO Central) Well-understood data with formal information models available via standard services CZO Desktop, others

  15. Data disclaimer

  16. Data Catalogue • Biogeochemistry: Including: anything on C (Carbon), N (Nitrogen), P (Phosphorus), nutrients, microbes • Climatology/Meteorology: Including: Met tower, temps, snow • Ecology/Biology: Including: microbial, land use • Geology/Chronology: Including: geologic, descriptions of rocks-mineralogy, CRN ages/rates • Geomorphology: Including: topography, chronological data, sediment flux, fracture space • Geophysics: Including: seismic refraction etc. • Geospatial: Including: GIS/RS, imagery, geologic map, Gordon Gulch and GLV cameras

  17. Water Chemistry
      • Header group (/doc): Title, Abstract, Investigator, Variable names, Keywords, Methods, Instrument, Citation, Publications, Comments
      • Header group, column information
        • COL1. label=ValueAttribute, value=site
        • COL2. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm"
        • COL3. label=ValueAttribute, value=pH, units=pH, SampleMedium=water, units=pH units, missing value indicator=, methods=method1, etc.
      • Header group, column (series) defaults that apply to all columns (e.g. site below)
      • Data (/data)
        • GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77,24.68,21.75,10.23,25.389,,58.296,83.200,,,,,,,,,,,,,,,,,,
        • GREENLAKE4,820422,5.7,18,90.15,2.00,,99.80,24.68,17.40,12.79,9.591,,72.870,44.928,,,,,,,,,,,,,,,,,,
      • Automatically harvested using WaterML and EML
      • ASCII format, metadata and comma-delimited data
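A minimal sketch of how the data rows on this slide could be read, assuming they are plain comma-separated values in which an empty field marks a missing value. The function name `parse_rows` and the generic names `col4`, `col5`, ... are hypothetical; the slide names only the first three columns (site, DateTime, pH). The sample row is a shortened copy of the first row above, and values are kept as strings because the slide does not give their types.

```python
import csv
import io

def parse_rows(text, columns):
    """Parse comma-delimited data rows; empty fields become None.

    `columns` names the leading fields. Any further fields are kept
    under generic names col4, col5, ... (an assumption, since the
    slide documents only the first three columns).
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record = {}
        for i, raw in enumerate(row):
            name = columns[i] if i < len(columns) else f"col{i + 1}"
            record[name] = raw.strip() or None
        records.append(record)
    return records

# Shortened copy of the first data row on the slide.
sample = "GREENLAKE4,820311,6.4,18,88.51,0.40,,114.77\n"
rows = parse_rows(sample, ["site", "DateTime", "pH"])
print(rows[0]["site"], rows[0]["pH"], rows[0]["col7"])
```

Mapping empty fields to `None` keeps a missing measurement distinct from a measured zero, which matters for chemistry columns such as these.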

  18. CZO Data Management Web Administration Interface • CZO data managers use this web-based system to register display files, edit service metadata, initiate data retrieval, validate the data against shared vocabularies, and update hydrologic time series services • The administration system will be extended to geochemical samples and other data • http://central.criticalzone.org

  19. Services edited and validated by CZO data managers • Editable service definitions and management interface for each CZO data service • Data managers control how their data is annotated • Ingestion of display files is triggered on the server by the data manager • Display file ingestion log
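The pull-style ingestion with a log described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the CZO Central implementation: the `harvest` function, the checksum-based change detection, and the example URL are all hypothetical, and `fetch` stands in for whatever HTTP retrieval the server actually uses.

```python
import hashlib

def harvest(registered, fetch, seen):
    """Pull each registered display file and ingest only changed content.

    registered: iterable of display-file URLs
    fetch:      callable mapping a URL to its content as bytes
    seen:       dict of URL -> last SHA-256 digest, updated in place
    Returns a list of (url, status) ingestion-log entries.
    """
    log = []
    for url in registered:
        try:
            body = fetch(url)
        except OSError as exc:  # network or file errors
            log.append((url, f"error: {exc}"))
            continue
        digest = hashlib.sha256(body).hexdigest()
        if seen.get(url) == digest:
            log.append((url, "unchanged"))
        else:
            seen[url] = digest
            log.append((url, "ingested"))
    return log

# In-memory stand-in for a CZO web site; the URL is hypothetical.
files = {"http://example.org/a.csv": b"site,pH\n"}
seen = {}
first = harvest(files, files.__getitem__, seen)
second = harvest(files, files.__getitem__, seen)
print(first[0][1], second[0][1])
```

Because the central system pulls and compares digests, a site only has to keep its files published; it does not need to notify the portal of changes.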

  20. CZO Central Catalog Statistics, March 24, 2011 (time series services only)

  21. New Development: Central CZO Data Discovery Portal Registered data are organized by CZO thematic categories

  22. Display files from CZO web sites are registered to the data discovery portal automatically. In addition, display files of known types are expressed as data services, which are also registered in the portal. The portal is CSW-compliant (CSW = Catalog Services for the Web) and can be federated with other catalogs, including data.gov. It supports search by location, resource type, thematic category, keywords, plus full-text abstract search. Federation with CUAHSI HydroCatalog allows search of hydrologic data from ~70 networks.
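As a rough illustration of what CSW compliance offers a client, the sketch below builds a CSW 2.0.2 GetRecords request in key-value form with a full-text CQL filter. The endpoint URL and the function name `csw_getrecords_url` are hypothetical, and the exact set of parameters and queryables the CZO portal accepts is an assumption; the code only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def csw_getrecords_url(endpoint, text, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL searching all text fields.

    `text` is inserted into the CQL filter as-is, so this sketch is
    only suitable for simple keywords without quote characters.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText like '%{text}%'",
    }
    return f"{endpoint}?{urlencode(params)}"

# Hypothetical endpoint, for illustration only.
url = csw_getrecords_url("http://example.org/csw", "snow")
query = parse_qs(urlsplit(url).query)
print(query["request"][0], query["constraint"][0])
```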

  23. Shared Vocabulary Local CZO DB Local CZO DB Local CZO DB CZO Data Repository and Indexing (CZO Central) External cross-project registries CZO Data Products Shared Vocabulary DataNet CZO Desktop Applications Harvester Ontology Archive Shared vocabularies CZO Metadata CZO Web-based Data Discovery System CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Spatial, hydrologic, geophysical, geochemical, imagery, spectral…

  24. CZO Shared Vocabulary System • Purpose: To promote the consistent use of terminology • http://sv.critialzone.org • Builds on CUAHSI HIS

  25. Data Managers and SV CSV Data File CSV Data File Local CZO Website Observation Database SV Database ❷ Unknown Term Email Data Managers ❶ Request Term Web Page ❸ XML SV List
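The check implied by this workflow, where terms in a data file are compared against the shared vocabulary and unknown terms are flagged so that a data manager can request them, might look like the sketch below. The function `check_terms`, the case-insensitive matching, and the example terms are assumptions for illustration; the slide does not describe how matching is done.

```python
def check_terms(used_terms, vocabulary):
    """Split terms into known and unknown against a controlled vocabulary.

    Matching is case-insensitive here, which is an assumption.
    """
    known_lower = {term.lower() for term in vocabulary}
    known, unknown = [], []
    for term in used_terms:
        if term.lower() in known_lower:
            known.append(term)
        else:
            unknown.append(term)
    return known, unknown

# Hypothetical vocabulary entries and file terms, for illustration only.
variable_names = {"pH", "Temperature", "Discharge"}
known, unknown = check_terms(["ph", "Discharge", "Snow depth"], variable_names)
print(known)
print(unknown)
```

In the workflow on the slide, each entry in `unknown` would correspond to step ❷ (an unknown term is emailed to the data managers), who could then request the term through the web page.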

  26. Preferred vocabularies. Moderators to be designated by CZO with expertise in each category • Variable names (extended from CUAHSI HIS) • Units (extended from CUAHSI HIS) (e.g. m, g/L) • Value type (from CUAHSI HIS) (e.g. field observation, derived value, model output) • Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil) • Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic) • Data level (based on Ameriflux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled) • Spatial references (extensible based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11) • KEY: CZO expands ODM controlled vocabularies to a larger audience using “preferred vocabularies”

  27. Methods • Major problem for metadata • Solution: lookup table that is part of the controlled vocabulary • Three parts: sample collection, sample preparation, analytical procedure • Up and running, needs moderators
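A minimal sketch of such a lookup table, assuming each method identifier (for example the `method1` reference in the water chemistry column header) resolves to the three parts named on the slide. The `Method` class, the `describe` helper, and the entry contents are hypothetical placeholders, not actual CZO vocabulary entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Method:
    """One lookup-table entry with the three parts named on the slide."""
    collection: str   # how the sample was collected
    preparation: str  # how the sample was prepared
    analysis: str     # analytical procedure

# Hypothetical entry, for illustration only.
METHODS = {
    "method1": Method("grab sample", "filtered", "pH electrode"),
}

def describe(method_id):
    """Return a one-line description of a method, or raise KeyError."""
    method = METHODS[method_id]
    return f"{method.collection}; {method.preparation}; {method.analysis}"

print(describe("method1"))
```

Keeping methods in one table means a data file only needs to carry the short identifier, while the full description is maintained once by the vocabulary moderators.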

  28. CZO Spatial Data Local CZO DB Local CZO DB Local CZO DB CZO Data Repository and Indexing (CZO Central) Standard CZO Services Spatial Data CZO Desktop Applications Harvester Ontology Archive Shared vocabularies CZO Metadata CZO Web-based Data Discovery System CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Spatial, hydrologic, geophysical, geochemical, imagery, spectral…

  29. Metadata and Spatial View • Metadata • Multi File control • Spatial Extent • Ex: LiDAR flights, transects, etc. • Point data (collected at particular location). • Uses Google Maps API • KML functionality Spatial View Guo lab, UC Merced

  30. Local CZO DB Local CZO DB Local CZO DB Geochemical Samples (based on CZEN) Depth-resolved geochemistry EarthChem Data Engine & Portal Geochemical web services, EarthChemDB CZO Desktop Applications Harvester IGSN management Archive Shared vocabularies Metadata CZO Web-based Geochemical DB CZO Desktop Matlab R Standard CZO data display formats Excel Web site Web site Web site ArcGIS Modeling Geochemical samples

  31. Sample 1 Preparat./Treatment 2 Chemical Phys. Minr Others Personcontributor Meta-Data Publication Var-Lookup/Unit Country/State Landuse/Veg. Project Sources Methods Loc_info/Climate Geo-Info SMPLTime Series Precision Lab-Info Sub-sample Main Data Sub-smpl 2 ... Lab Analysis Sub-smpl n Location(Watershed) Sampling Site(Soil / Water) Sample(Layer/Depth) Sub-Sample Preparation/Treatment Analysis Data CZO Chemistry Database Conceptual Model – (CZOCHEMDB) Penn State lead

  32. Progress • Database is accessible at www.czo.psu.edu • PSU CZO students and post-docs have used template for data entry • Susan Melzar (Colorado State) has used template and data has been entered into database • Published data from Muhs et al. (2001), Harden (1987), White et al. (2008) • Current version contains 1391 records, representing 17,604 data values • Ran webinar August 24th to show database capabilities and usage of data entry template • 15 participated with representation from all 6 CZOs • User guide is in progress

  33. Integration with EarthChemDB EarthChem Portal EarthChem XML DB Topical Data Collections Geochemical Resource Library GEOROC External Databases datasets (original data & derived products) GCDM DB Metadata catalog NAVDAT USGS GfG Data Entry User Submission Kerstin Lehnert

  34. EarthChem Portal GEOROC NAVDAT USGS Others PetDB XML XML XML XML XML EarthChem Data Engine Database Partner databases encode their data & metadata in XML and send them to the EarthChem portal database in Kansas. Queries submitted at the EarthChem portal search the contents of the EarthChem Portal Database. EarthChem Data Engine Search & Visualization Similar to our ODM hydrology portal

  35. INTERNATIONAL GEOSAMPLE NUMBER D3-1 ? • Purpose: Unique identification for samples and related sampling features in the Earth Sciences • To allow unambiguous referencing of data to samples in publications and data systems • To allow tracking samples through repositories & labs • To allow integration of distributed data for samples

  36. Sample 1 Sample 1 Sample 2 Sample 2 Sample 3 Sample 3 Parent Child Parent Child Child Parent Core Section 1 Fossil separate Sample 1 Microprobe mount IGSN:XXX0065B3 Sample 2 Core Section 2 Rock powder Core IGSN:ABC0L653X Mineral conc. IGSN:ABC078HGB IGSN:XXX000120 IGSN:ABC0L53NW IGSN:XXX07ST4K Leachate IGSN:ABC0L98SW Core Section 3 IGSN:XXX9K23G6 IGSN:XYZ0G693M Geoinformatics for Geochemistry

  37. IGSN International Organization Managing Agent: SESAR ExoPlanet (invented example) Near Space Observatory (invented example) Registrar IEDA USGS Geoscience Australia CZO ICDP Registration Agents: Registrants: Analytical Lab Repository Investigator

  38. ADAPTING IGSN for CZO • Register any type of sample: pedons, hand specimens, mineral concentrates, etc. • Register any type of material: soil, rock, sediment, fluid, gas, bio, etc. • Register ‘sample-related features’: sites, wells, cores, dredges, etc. • Register relations (parent – children): e.g. site → pedon → mineral
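The parent-child relations described here can be modelled as a simple tree in which each registered item points to its parent. The sketch below is an assumption-laden illustration, not the SESAR registration API: `SampleRegistry` and its methods are hypothetical, and the identifiers are made-up placeholders rather than real IGSNs, which are assigned at registration.

```python
class SampleRegistry:
    """Toy registry linking samples and sampling features to parents."""

    def __init__(self):
        self._parent = {}  # sample_id -> parent id, or None for a root
        self._kind = {}    # sample_id -> kind, e.g. "site" or "pedon"

    def register(self, sample_id, kind, parent=None):
        """Register an item; its parent must already be registered."""
        if sample_id in self._kind:
            raise ValueError(f"{sample_id} already registered")
        if parent is not None and parent not in self._kind:
            raise KeyError(f"unknown parent {parent}")
        self._kind[sample_id] = kind
        self._parent[sample_id] = parent

    def lineage(self, sample_id):
        """Return the chain of ids from the root down to sample_id."""
        chain = []
        current = sample_id
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        return list(reversed(chain))

    def children(self, sample_id):
        """Return the ids registered directly under sample_id."""
        return sorted(s for s, p in self._parent.items() if p == sample_id)

# Placeholder identifiers following the slide's site -> pedon -> mineral example.
registry = SampleRegistry()
registry.register("SITE-1", "site")
registry.register("PED-1", "pedon", parent="SITE-1")
registry.register("MIN-1", "mineral concentrate", parent="PED-1")
print(registry.lineage("MIN-1"))
```

Walking the lineage is what allows an analysis on a sub-sample to be traced back to the pedon and site it came from.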

  39. Exploring A More General Data Model: ODM 2.0 • To achieve interoperability between EarthChem, CUAHSI ODM, LTER EML • Better support for samples and unique identifiers (IGSN/SESAR) • Extensibility to table attributes • Better annotation and provenance • Enable integrated web service based publication of a broader class of CZO data

  40. ODM 2.0 – Field Sensor Extension to support field sensor deployments and in situ observations • Sensor deployment details • Attributes of sensor • Data series from sensor

  41. ODM 2.0 – Provenance and Annotations Extensions Better support for storing provenance of observational data

  42. General Extensibility: provides capability to record information (add fields) in tables that was not anticipated a priori
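One common way to allow fields that were not anticipated in advance is a companion table of name/value pairs keyed to the main record. The sketch below uses SQLite to illustrate that pattern; the table names, column names, and the `extra_fields` helper are hypothetical and are not taken from the ODM 2.0 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE samples (
    sample_id TEXT PRIMARY KEY,
    medium    TEXT NOT NULL
);
-- Extension table: one row per extra field per sample.
CREATE TABLE sample_extension (
    sample_id TEXT NOT NULL REFERENCES samples(sample_id),
    name      TEXT NOT NULL,
    value     TEXT,
    PRIMARY KEY (sample_id, name)
);
""")
conn.execute("INSERT INTO samples VALUES (?, ?)", ("S1", "soil"))
# A field nobody planned for when the samples table was designed.
conn.execute(
    "INSERT INTO sample_extension VALUES (?, ?, ?)",
    ("S1", "sieve_size_mm", "2"),
)

def extra_fields(connection, sample_id):
    """Return the extension fields of one sample as a dict."""
    cursor = connection.execute(
        "SELECT name, value FROM sample_extension "
        "WHERE sample_id = ? ORDER BY name",
        (sample_id,),
    )
    return dict(cursor.fetchall())

print(extra_fields(conn, "S1"))
```

The trade-off of this pattern is that extension values are stored as untyped text, so units and types have to be documented elsewhere, for example through the shared vocabulary.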

  43. Web-based User Access CZO Web Discovery CZO Desktop Other client systems GeoChemDB Search EarthChem Portal EarthChemXML CZO-Services Geochem Services (IEDA) USGS NAVDAT GEOROC EarthChemXML CZO-Central GeoChemDB [ODM 2.0] Geochemical database IEDA Data Publication Service (DataCite) GfG Data Validation & Ingest CZO Data Display Format IEDA Long-Term Archiving Service CZchemDB Sample Registration SESAR

  44. Where we are today • Each site has a data manager • Data sets are posted to the web • Consistent metadata and ASCII format in progress • We’ve prototyped harvesting data and posting to a central data portal • Shared vocabulary system in place • Developed protocol for unique sample ID • Partnering with EarthChemDB • Expanding ODM to become more general • Way beyond what I thought possible

  45. Work plan for next two years • Extending the CZO data publication model to geochemical and GIS data; then to other types of data • towards deeper interoperability • Integration based on service and information model standards (WaterML, EarthChemXML, EML, OGC services) • Requirements gathering from all CZOs, data modeling, display file format specification, services specification, development and validation • Upgrade to WaterML 2 once approved as international standard (~Q3, 2011) • Registering more hydrologic time series data via CZO Central • Regularly harvesting registered files and updating CZO services; keeping provenance information • Enhancing parameter-based search across CZOs, with a shared parameter ontology • Making CZO central data system more robust • Currently a single server with 24/7 monitoring; need redundant setup • Enhancing role of Data Managers
