
CZO Integrated Data Management: Data Model and Metadata

CZO Integrated Data Management: Data Model and Metadata. David Tarboton. Based on CUAHSI HIS.


Presentation Transcript


  1. CZO Integrated Data Management: Data Model and Metadata. David Tarboton

  2. Based on CUAHSI HIS, an Internet-based system to support the sharing of hydrologic data. It comprises hydrologic databases and servers connected through web services, and software for data publication, discovery and access.
     [Diagram: "CUAHSI HIS: Sharing hydrologic data". Labels: HIS Central (metadata, data discovery and integration); HydroServer (data publication); HydroDesktop (data synthesis and research, analysis); ODM; geo data; WaterML; GML; OGC services.]
     Support: EAR 0622374

  3. Data System Overview
     [Diagram labels: CZO Servers (Boulder, Shale, Sierra, Luquillo, Jemez, Christina); ASCII text; standardized web-based display; Harvester; CZO Central; Data Store; WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues); WaterML; CZO Desktop]

  4. Requirements
     • Sufficient metadata for published CZO data to be unambiguously interpreted and used
     • Each CZO operates its own local data management system
     • The format used to present data and metadata should be identical across CZOs and should support heterogeneous local systems
     • Local systems are autonomous, with local control over the release and publication of data

  5. Access
     • Users are required to agree to CZO data use policies
     • The same data use agreement applies to all CZOs
     • Data are freely accessible to registered users who have agreed to the policy

  6. Information Hierarchy
     • National CZO
       • Experimental Watershed
         • Sites
           • Variables
             • Series
               • Data values
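
One way to picture this hierarchy is as nested containers. The following is a minimal Python sketch, not a CZO or CUAHSI specification: the class names (NationalCZO, Watershed, Site, Variable, Series), their fields and the sample values are all hypothetical, and for brevity a series is distinguished here by its method alone.

```python
from dataclasses import dataclass, field

# Hypothetical containers mirroring the slide's hierarchy, innermost first.
@dataclass
class Series:
    method: str  # in this sketch a series differs from its siblings only by method
    values: list[tuple[str, float]] = field(default_factory=list)  # (date-time, value)

@dataclass
class Variable:
    name: str
    units: str
    series: list[Series] = field(default_factory=list)

@dataclass
class Site:
    code: str
    variables: list[Variable] = field(default_factory=list)

@dataclass
class Watershed:
    name: str
    sites: list[Site] = field(default_factory=list)

@dataclass
class NationalCZO:
    watersheds: list[Watershed] = field(default_factory=list)

    def iter_values(self):
        """Walk down every level and yield one flat tuple per data value."""
        for shed in self.watersheds:
            for site in shed.sites:
                for var in site.variables:
                    for s in var.series:
                        for time, value in s.values:
                            yield shed.name, site.code, var.name, s.method, time, value

# Illustrative data only.
czo = NationalCZO([
    Watershed("Boulder", [
        Site("GREEN LAKE 4", [
            Variable("StreamFlow", "m3/s", [
                Series("method1", [("20100601 12:00", 0.42), ("20100601 13:00", 0.40)]),
            ]),
        ]),
    ]),
])
rows = list(czo.iter_values())
```

Flattening with iter_values shows that every data value inherits its context from the levels above it, which is what lets a single row carry complete metadata.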

  7. Abstract data model
     • (where) location, object or platform identifier
     • (when) date and time
     • (what) attribute (or identifier of attribute)
     • THE VALUE
     • (how) method (or identifier of method)
     • (who) creator (or identifier of creator or data source)
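
A single observation under this abstract model can be sketched as one record with one field per question. The field names below (location, time, variable, value, method, source) are hypothetical, chosen only to mirror the slide, and the example values, including the source identifier "Boulder CZO", are made up:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Observation:
    """A single data value together with its where/when/what/how/who context."""
    location: str   # where: location, object or platform identifier
    time: datetime  # when: date and time
    variable: str   # what: attribute, or identifier of the attribute
    value: float    # THE VALUE
    method: str     # how: method, or identifier of the method
    source: str     # who: creator, or identifier of creator or data source

# Made-up example; "Boulder CZO" is a hypothetical source identifier.
obs = Observation(
    location="GREEN LAKE 4",
    time=datetime(2010, 6, 1, 12, 0),
    variable="StreamFlow",
    value=0.42,
    method="method1",
    source="Boulder CZO",
)
```

Freezing the record is a design choice in this sketch: once a value is stored with its context, none of the six parts can be changed silently.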

  8. Data series
     • Used as an organizing construct
     • A logical grouping of data values (usually from a column in a table)
     • Commonly, but not limited to, time series (e.g. a series with depth)
     • Properties we control become identifying series-level attributes
     • Properties we measure become variables or variable-level attributes
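
The choice of which attributes identify a series determines what kind of series you get. This hedged Python sketch groups hypothetical pH records two ways: holding depth fixed gives time series, while holding time fixed gives depth profiles. The field names, the function name group_into_series and the values are illustrative assumptions:

```python
from collections import defaultdict

# Hypothetical pH readings at two depths and two times.
records = [
    {"site": "GREEN LAKE 4", "variable": "pH", "method": "method1", "depth_m": 0.5, "time": "20100601 12:00", "value": 7.1},
    {"site": "GREEN LAKE 4", "variable": "pH", "method": "method1", "depth_m": 2.0, "time": "20100601 12:00", "value": 6.9},
    {"site": "GREEN LAKE 4", "variable": "pH", "method": "method1", "depth_m": 0.5, "time": "20100602 12:00", "value": 7.0},
]

def group_into_series(rows, key_fields):
    """Group rows into series. The key_fields identify a series (series-level
    attributes); the remaining fields are kept with each value."""
    series = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        series[key].append({f: v for f, v in row.items() if f not in key_fields})
    return dict(series)

# Time series: depth is part of the identity, so time varies within each series.
by_time = group_into_series(records, ("site", "variable", "method", "depth_m"))
# Depth profile: time is part of the identity, so depth varies within each series.
by_depth = group_into_series(records, ("site", "variable", "method", "time"))
```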

  9. Why an Observations Data Model
     • Syntactic consistency (file types and formats)
     • Semantic consistency
       • Language for observation attributes (structural)
       • Language to encode observation attribute values (contextual)
     • Publishing and sharing research data
     • Metadata to facilitate unambiguous interpretation
     • Enhance analysis capability
     What are the basic attributes to be associated with each single data value, and how can these best be organized?

  10. Community Design Requirements (from comments of 22 reviewers)
     • Incorporate sufficient metadata to identify provenance and give an exact definition of the data for unambiguous interpretation
     • Spatial location of measurements
     • Scale of measurements (support, spacing, extent)
     • Depth/offset information
     • Censored data
     • Classification of data type to guide appropriate interpretation
     • Continuous
     • Indication of gaps
     • Indicate data quality
     http://www.neng.usu.edu/cee/faculty/dtarb/HydroObsDataModelReview.pdf
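
Two of these requirements, indication of gaps and censored data, are easy to make concrete. The sketch below is an illustration under stated assumptions, not part of ODM: gaps are found by comparing consecutive timestamps with an expected spacing, and values below a hypothetical detection limit are reported as the limit with illustrative censor codes ("lt" for less than, "nc" for not censored), which is one common convention.

```python
from datetime import datetime, timedelta

def find_gaps(times, spacing):
    """Return (before, after) timestamp pairs where consecutive sorted timestamps
    are farther apart than the expected spacing, i.e. where values are missing."""
    times = sorted(times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > spacing]

def censor(values, detection_limit):
    """Pair each value with a censor code. Values below the detection limit are
    reported as the limit with code "lt" instead of as exact numbers; other
    values get "nc". The codes and the limit are illustrative assumptions."""
    return [(detection_limit, "lt") if v < detection_limit else (v, "nc") for v in values]

# Hourly readings with hours 3 and 4 missing (made-up data).
hourly = [datetime(2010, 6, 1, h) for h in (0, 1, 2, 5, 6)]
gaps = find_gaps(hourly, timedelta(hours=1))
censored = censor([0.8, 0.03, 1.2], detection_limit=0.05)
```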

  11. Observations Data Model
     [Diagram labels: Groundwater levels; Streamflow; Precipitation & Climate; Soil moisture data; Flux tower data; Water Quality]
     • A relational database at the single observation level
     • Common persistence model for observations data
     • Metadata for unambiguous interpretation
     • Traceable heritage from raw measurements to usable information
     • Promote syntactic and semantic consistency
     • Cross-dimension retrieval and analysis
     Horsburgh et al., 2008, WRR 44: W05406

  12. CUAHSI Observations Data Model
     http://his.cuahsi.org/odmdatabases.html
     Horsburgh, J. S., D. G. Tarboton, D. R. Maidment and I. Zaslavsky (2008), A Relational Model for Environmental and Water Resources Data, Water Resour. Res., 44: W05406, doi:10.1029/2007WR006392.

  13. Stage and Streamflow Example

  14. Water Chemistry from a profile in a lake

  15. Water Chemistry from Laboratory Sample

  16. [Diagram: ODM schema with tables numbered 1 to 7.]
     Work from out to in.
     At last …
     And don’t forget …
     CUAHSI Observations Data Model http://www.cuahsi.org/his/odm.html

  17. HydroServer: A Platform for Managing and Publishing Experimental Watershed Data
     [Diagram labels: HydroServer Website (Map Server, Time Series Analyst); HydroServer Capabilities Web Service; HydroServer Database Configuration Tool; HydroServer Database (ODM databases and WaterOneFlow web services); Spatial Services (ArcGIS Server spatial data services)]
     http://hydroserver.codeplex.com/

  18. Dynamic shared vocabulary moderation system
     [Diagram labels: ODM Data Manager; ODM Website; ODM Tools; ODM Shared Vocabulary Moderator; Master ODM Shared Vocabulary; XML; Local ODM Database; ODM Shared Vocabulary Web Services; Local Server]
     http://his.cuahsi.org/mastercvreg.html (from Jeff Horsburgh)

  19. Data System Overview
     [Diagram labels: CZO Servers (Boulder, Shale, Sierra, Luquillo, Jemez, Christina); ASCII text; standardized web-based display; Harvester; CZO Central; Data Store; WaterOneFlow Web Service (GetSites, GetSiteInfo, GetVariableInfo, GetValues); WaterML; CZO Desktop; Web service Collaboratory]

  20. CUAHSI HIS – looking ahead
     • A “data sharing/social networking” site for hydrologic data (and possibly models)
     • Simple and easy to use
     • Find, create, share, connect, integrate, work together online. Collaborate
     • Hydro value added

  21. CZO web-based file format
     • Time series display files
       • The data: time series in columns
     • Methods files
       • A single file listing the methods used by the CZO
     • Measurement location files (the term agreed for what used to be called a site; other names considered were station, node, monitoring point and platform)
       • A single file listing the measurement locations at which the CZO makes measurements
       • Need a concept of spatial grouping for locations
       • Identify the groups that locations belong to; this implies a need for a location groups file (measurement groups)
     The slides from this one onward contain edits made during the presentation, e.g. the change from “site” to “measurement location”. As a result they may not be entirely consistent, but they reflect where things stood at the end of the meeting.

  22. Time series display file
     • Header
       • Doc group
       • Default parameter group
       • Column header group
     • Data
       • Columns of data
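
The file layout can be read back by classifying each line. A minimal sketch, assuming the line prefixes DEFAULT_PARAMETER. and COLn. from the examples on the following slides; the DOC. prefix for the doc group, the tab-separated data columns and the function name split_display_file are hypothetical, since the slides do not fix them:

```python
import re

# Only DEFAULT_PARAMETER. and COLn. appear on the slides; DOC. is a placeholder.
HEADER_PATTERNS = {
    "doc": re.compile(r"^DOC\."),
    "default": re.compile(r"^DEFAULT_PARAMETER\."),
    "column": re.compile(r"^COL\d+\."),
}

def split_display_file(text):
    """Split a display file into its three header groups and its data rows.
    Data rows are assumed (hypothetically) to be tab-separated columns."""
    groups = {"doc": [], "default": [], "column": [], "data": []}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        for name, pattern in HEADER_PATTERNS.items():
            if pattern.match(line):
                groups[name].append(line)
                break
        else:  # no header prefix matched, so this is a row of data
            groups["data"].append(line.split("\t"))
    return groups

sample = "\n".join([
    'DOC. title="Example file"',
    'DEFAULT_PARAMETER. site ="GREEN LAKE 4"',
    "COL1. label=ValueAttribute, value=DateTime",
    "COL2. label=VariableName, value=StreamFlow, units=m3/s",
    "20100601 12:00\t0.42",
    "20100601 13:00\t0.40",
])
parts = split_display_file(sample)
```

Tabs are used here only because the example date format "YYYYMMDD hh:mm" contains a space, which rules out splitting data rows on whitespace.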

  23. Doc group

  24. Default parameters pertain to all data in the file except where overridden by a specific column header (to encourage specifying each value only once). Examples:
     DEFAULT_PARAMETER. site  ="GREEN LAKE 4"
     DEFAULT_PARAMETER. offset_value ="2", offsetUnits = "meters", offset_description= "this is vertical offset from the ground level down"
     DEFAULT_PARAMETER. quality_control_level ="0"
     DEFAULT_PARAMETER. missing_value_indicator  ="-9999"
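
A minimal sketch of reading the default parameter group, assuming every value is double-quoted as in the examples on this slide. Keys are kept verbatim, so the mixed naming styles (offset_value, offsetUnits) survive; the function name parse_defaults is hypothetical:

```python
import re

# key = "quoted value" pairs, with optional spaces around "=" as in the examples.
PAIR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')

def parse_defaults(lines):
    """Collect DEFAULT_PARAMETER. lines into one dict; later lines win on repeats."""
    defaults = {}
    for line in lines:
        if line.startswith("DEFAULT_PARAMETER."):
            body = line[len("DEFAULT_PARAMETER."):]
            defaults.update(PAIR.findall(body))  # findall yields (key, value) tuples
    return defaults

example = [
    'DEFAULT_PARAMETER. site  ="GREEN LAKE 4"',
    'DEFAULT_PARAMETER. offset_value ="2", offsetUnits = "meters", '
    'offset_description= "this is vertical offset from the ground level down"',
    'DEFAULT_PARAMETER. quality_control_level ="0"',
    'DEFAULT_PARAMETER. missing_value_indicator  ="-9999"',
]
defaults = parse_defaults(example)
```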

  25. Column headers. Examples:
     COL1. label=ValueAttribute, value=DateTime, UTCOffset=-7, Timezone=MST, format="YYYYMMDD hh:mm"
     COL2. label=VariableName, value=StreamFlow, units=m3/s, TimeSupport= 3, TimeSupportUnits=hr, NoDataValue=-9999, SampleMedium=water, method=method1, Offsetvalue = 3, OffsetValueUnits=m , offsetDescription = "Depth below surface"
     COL3. label=VariableName, value=pH, units=pH units, missing value indicator=-9999
     COL4. label=VariableName, value=conductance, units=uS/cm @ 25 degrees C
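
Column headers use mostly unquoted key=value pairs separated by commas, and some keys and values contain spaces. The sketch below parses one header line and lays it over the file defaults so that column settings win. It assumes values never contain commas; the function names and the key normalization (dropping spaces and underscores, lowercasing), which reconciles names such as "missing value indicator" and missing_value_indicator, are hypothetical:

```python
def normalize_key(key):
    """Hypothetical normalization so 'missing value indicator',
    'missing_value_indicator' and 'MissingValueIndicator' compare equal."""
    return key.replace(" ", "").replace("_", "").lower()

def parse_column_header(line):
    """Parse 'COLn. key=value, key=value, ...' into (n, {normalized key: value})."""
    prefix, _, body = line.partition(".")
    index = int(prefix[len("COL"):])
    attrs = {}
    for item in body.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments without "="
            # Strip straight and curly double quotes around the value.
            attrs[normalize_key(key)] = value.strip().strip('"\u201c\u201d')
    return index, attrs

def effective_attributes(defaults, column):
    """Column-header attributes override file-wide defaults."""
    merged = {normalize_key(k): v for k, v in defaults.items()}
    merged.update(column)
    return merged

# Defaults as they might be parsed from DEFAULT_PARAMETER lines (values from the slides).
defaults = {"site": "GREEN LAKE 4", "missing_value_indicator": "-9999", "quality_control_level": "0"}
index, col3 = parse_column_header(
    "COL3. label=VariableName, value=pH, units=pH units, missing value indicator=-9999"
)
attrs = effective_attributes(defaults, col3)
```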

  26. Series level attributes
     • Required metadata for each data value in a CZO time series display file: SiteCode, Units, Method, OffsetValue, OffsetDescription, SampleType, VariableName, SampleMedium, ValueType, TimeSupport, TimeSupportUnits, DataType, DataLevel, NoDataValue, UTCOffset, TimeZone, OffsetUnits, CensorCode
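
Checking a column against this list is straightforward once names are compared loosely. In the sketch below, REQUIRED copies the attribute names from the slide; the function name missing_attributes, the case- and underscore-insensitive comparison and the sample dictionary (COL2 from the column header examples plus a SiteCode) are assumptions for illustration:

```python
# The required series-level attributes listed on the slide.
REQUIRED = (
    "SiteCode", "Units", "Method", "OffsetValue", "OffsetDescription",
    "SampleType", "VariableName", "SampleMedium", "ValueType", "TimeSupport",
    "TimeSupportUnits", "DataType", "DataLevel", "NoDataValue", "UTCOffset",
    "TimeZone", "OffsetUnits", "CensorCode",
)

def missing_attributes(attrs):
    """Return required attributes not present in attrs, comparing names
    case-insensitively and ignoring underscores and spaces (an assumption)."""
    def norm(name):
        return name.replace("_", "").replace(" ", "").lower()
    present = {norm(k) for k in attrs}
    return [name for name in REQUIRED if norm(name) not in present]

# COL2 attributes from the column header examples, plus a SiteCode from the defaults.
col2 = {
    "SiteCode": "GREEN LAKE 4", "VariableName": "StreamFlow", "units": "m3/s",
    "TimeSupport": "3", "TimeSupportUnits": "hr", "NoDataValue": "-9999",
    "SampleMedium": "water", "method": "method1", "Offsetvalue": "3",
    "OffsetValueUnits": "m", "offsetDescription": "Depth below surface",
}
missing = missing_attributes(col2)
```

In this example the check flags OffsetUnits because COL2 spells it OffsetValueUnits, and it flags UTCOffset and TimeZone because the column header slide attaches them to the DateTime column (COL1) rather than to each variable column.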

  27. Series level attribute definitions 1

  28. Series level attribute definitions 2

  29. Value level attributes
     • Any value-level attribute that is the same for an entire series may be promoted to a series-level attribute and placed in the column header
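
Promotion can be automated: compare each attribute across all values of a series and move the ones that never change into the series-level header. A hedged sketch with a hypothetical function name and made-up values; DateTime and Value are always kept per value, so that a one-value series does not promote its own data:

```python
def promote_constant_attributes(rows, keep=("DateTime", "Value")):
    """Split per-value attribute dicts into (series_level, value_level).
    An attribute is promoted when every row has it with the same value,
    except for the fields named in keep, which always stay per value."""
    if not rows:
        return {}, []
    series_level = {
        key: val for key, val in rows[0].items()
        if key not in keep and all(key in row and row[key] == val for row in rows)
    }
    value_level = [{k: v for k, v in row.items() if k not in series_level} for row in rows]
    return series_level, value_level

# Illustrative values: CensorCode and QualityControlLevel never change, so both
# can move to the column header; DateTime and Value stay with each value.
rows = [
    {"DateTime": "20100601 12:00", "Value": 7.1, "CensorCode": "nc", "QualityControlLevel": "0"},
    {"DateTime": "20100601 13:00", "Value": 7.0, "CensorCode": "nc", "QualityControlLevel": "0"},
]
series_level, value_level = promote_constant_attributes(rows)
```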

  30. Measurement Locations file
     • Sampling feature refers to the feature of interest.

  31. Methods file
     • Is further subdivision needed to elicit specific method elements?

  32. Shared vocabularies
     • Variable names, grouped into categories, with a keyword list associated with each name (e.g. Precipitation, Streamflow, Nitrogen, Soil moisture). Fields for keywords and categories need to be added to the present CUAHSI HIS system
     • Units (extended from CUAHSI HIS) (e.g. m, g/L)
     • Value type (from CUAHSI HIS) (e.g. field observation, derived value, model output)
     • Sample type (from CUAHSI HIS) (e.g. stream water, ground water, rock, soil)
     • Data type (from CUAHSI HIS) (e.g. average over interval, cumulative, continuous, sporadic)
     • Data level (based on the AmeriFlux list) (e.g. level 0 = raw data, level 4 = fully infilled and quality controlled)
     • Spatial references (extensible, based on EPSG) (e.g. NAD 1983, WGS84, UTM zone 11)
     • Censor code (from CUAHSI HIS) (e.g. less than, not censored, non-detect)
     • Qualifier code (in CUAHSI HIS qualifiers are not a PV; a CZO-specific set of qualifiers will need to be developed)
     • Vertical datum (from CUAHSI HIS) (e.g. Mean Sea Level, NGVD29)
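
A display file can be checked against the shared vocabularies before publication. The sets below contain only the example terms listed on this slide, not the full CUAHSI HIS vocabularies, and the attribute keys (Units, ValueType and so on) and the function name unknown_terms are assumptions:

```python
# Excerpts built only from the slide's examples; the real vocabularies are longer.
VOCABULARIES = {
    "Units": {"m", "g/L"},
    "ValueType": {"Field observation", "derived value", "model output"},
    "SampleType": {"stream water", "ground water", "rock", "soil"},
    "DataType": {"average over interval", "cumulative", "continuous", "sporadic"},
    "CensorCode": {"less than", "not-censored", "non detect"},
    "VerticalDatum": {"Mean Sea Level", "NGVD29"},
}

def unknown_terms(attrs, vocabularies=VOCABULARIES):
    """List (attribute, term) pairs whose term is not in the matching vocabulary.
    Attributes without a vocabulary are not checked."""
    return [
        (name, term)
        for name, term in attrs.items()
        if name in vocabularies and term not in vocabularies[name]
    ]

problems = unknown_terms({"Units": "m", "SampleType": "lava", "SiteCode": "GREEN LAKE 4"})
```

Terms that fail the check would be candidates for the moderated addition process rather than for silent acceptance.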

  33. Ilya’s unresolved issues
     • Policies and best practices for generating display files and setting up data folders, and how we detect what is new
     • Update frequency
     • Semantic tagging (how automated?)
     • How shall we handle situations where data are removed or overwritten?
     • Need more examples and test cases
     • What information is needed in log files
     • How to present data use agreements in services
     • How to deal with different types of data

  34. Other issues
     • Other data types
       • Maps, GIS data (OGC web services?)
       • Geophysical data, images, geochemistry data, geological data, soil profile data
       • Simple capability to store and share arbitrary digital objects with metadata, e.g. using Catalog Services for the Web
       • LIDAR data (just use SDSC OpenTopography or NCALM)
     • Archiving
     • Questions, additional needs (wishes)
