This presentation outlines the creation of a Common Data Access Model (CDM) that integrates various scientific data formats, including NetCDF, HDF5, and OPeNDAP. It discusses the development of coordinate systems, data types, and the machine-independent file formats for self-describing scientific data. The session highlights the implementation of the CDM, efficient subsetting of multidimensional arrays, and access through the THREDDS Data Server. This model aims to simplify and standardize data access for scientific communities, facilitating the discovery and utilization of diverse datasets.
Unidata’s Common Data Model and the THREDDS Data Server • John Caron, Unidata/UCAR, Boulder CO • Jan 6, 2006 • ESIP Winter 2006
Outline • Definitions • Creating a Common Data (Access) Model from NetCDF, HDF5, OPeNDAP • CDM Coordinate Systems, Data Types • CDM implementation • NetCDF Markup Language (NcML) • The THREDDS Data Server
NetCDF-3 • Machine and OS independent file format for “self-describing” scientific data • C library (Fortran, C++, Perl, IDL, MatLab, Python, Ruby), Java library • Efficient subsetting of multidimensional arrays. • > 20,000 downloads last year
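The “efficient subsetting of multidimensional arrays” bullet can be illustrated with a minimal sketch, assuming a 2-D array stored flat in row-major order. `HyperslabSketch` and `section` are hypothetical names for illustration and are not part of the netCDF C or Java libraries; expressing the request as origin, count, and stride is likewise an assumption made for this example.

```java
/** Hypothetical helper for illustration; not part of any netCDF library. */
final class HyperslabSketch {

    /**
     * Copies a strided rectangular section out of a 2-D array that is
     * stored flat in row-major order.
     *
     * @param data   flat array of length shape[0] * shape[1]
     * @param shape  {rows, cols} of the full array
     * @param origin {row, col} of the first element to read
     * @param count  {rows, cols} of the section to return
     * @param stride {row step, col step} between selected elements
     */
    static double[] section(double[] data, int[] shape, int[] origin,
                            int[] count, int[] stride) {
        double[] out = new double[count[0] * count[1]];
        int k = 0;
        for (int i = 0; i < count[0]; i++) {
            int row = origin[0] + i * stride[0];
            for (int j = 0; j < count[1]; j++) {
                int col = origin[1] + j * stride[1];
                // Row-major offset: skip `row` full rows, then `col` elements.
                out[k++] = data[row * shape[1] + col];
            }
        }
        return out;
    }
}
```

For a 3×4 array holding the values 0 to 11, origin {0, 1}, count {2, 2}, and stride {2, 2} select rows 0 and 2 and columns 1 and 3, so only four of the twelve elements are touched.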
HDF5 • Machine and OS independent file format for “self-describing” scientific data • C library (Fortran, Java, PyTables) • Evolution from HDF4, but different. • HDF-EOS, HDF5-EOS, standard formats for EOSDIS, ASCI, NPOESS • Parallel-IO, chunked storage, compression filters, many data types. • Developed at NCSA, now independent
NetCDF-4 • Project funded by NASA to create new version of netCDF using the HDF5 file format. • “Extend and merge” netCDF and HDF5 • Widespread use and simplicity of netCDF • Generality and performance of HDF5
NetCDF-Java 2.2 (nj22) • 100% Java library • Prototype implementation of CDM • File formats: • General: NetCDF, HDF5, OPeNDAP • Grids: GRIB1, GRIB2 • Radar: NEXRAD, NIDS, DORADE • Satellite: DMSP, GINI • Access to THREDDS catalogs
OPeNDAP • Client-server protocol for scientific data access • C++ client and server, Java client and server libraries. • Current version 2.0; NASA ESE standard • Working on new 4.0 protocol spec
THREDDS • Originally funded by NSDL • “discovery and use of scientific data” • Middleware between data providers and users • Dataset Inventory Catalogs (XML) • Now part of Unidata core funding • Data Serving (pull)
What’s a Data Model? • It’s about scientific data: storing and accessing it • It’s an abstraction • Equivalent to an abstract object model in OOP • An Abstract Data Model describes data objects and what methods you can use on them
What’s a Data Model? • An API is the interface to the Data Model for a specific programming language • A file format is a way to persist the objects in the Data Model • A data access protocol plays the role of a file format • The Abstract Data Model removes the details of any particular API and the persistence format
Creating a Common Data Access Model from NetCDF, HDF5, OPeNDAP
NetCDF-3 Data Model
HDF5 Data Model
Common Data Model Layers • Scientific Datatypes: Point, Trajectory, Station, Radial, Grid, Swath • Coordinate Systems • Data Access
Coordinate Systems needed • NetCDF, OPeNDAP, HDF data models do not have integrated coordinate systems • so georeferencing not part of API • Need conventions to specify (eg CF-1, COARDS, etc) • Contrast GRIB, HDF-EOS, other specialized formats • Must be done in a general way
Coordinate Systems • Same underlying mathematics as VisAD, ASCII
Scientific DataTypes • Based on datasets Unidata is familiar with • APIs are evolving • How are data points connected? • Intended to scale to large, multifile collections • Intended to support “specialized queries” • Space, Time • Corresponding “standard” NetCDF file conventions
PointObsDataset Methods
// returns a Collection of StructureData
Collection getData(LatLonRect boundingBox, Date start, Date end);
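A space and time query of this kind can be sketched as a simple filter. This is a minimal illustration, assuming observations are held in memory; `PointObsSketch`, `Obs`, and `Box` are hypothetical stand-ins and are not the netCDF-Java `StructureData` or `LatLonRect` classes. The box test assumes the region does not cross the date line.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Hypothetical stand-ins for illustration; not the netCDF-Java classes. */
final class PointObsSketch {

    /** One observation: a location, a time, and a single measured value. */
    static final class Obs {
        final double lat, lon, value;
        final Date time;

        Obs(double lat, double lon, Date time, double value) {
            this.lat = lat;
            this.lon = lon;
            this.time = time;
            this.value = value;
        }
    }

    /** Lat/lon bounding box; assumes it does not cross the date line. */
    static final class Box {
        final double latMin, latMax, lonMin, lonMax;

        Box(double latMin, double latMax, double lonMin, double lonMax) {
            this.latMin = latMin;
            this.latMax = latMax;
            this.lonMin = lonMin;
            this.lonMax = lonMax;
        }

        boolean contains(double lat, double lon) {
            return lat >= latMin && lat <= latMax
                && lon >= lonMin && lon <= lonMax;
        }
    }

    /** Returns the observations inside the box and within [start, end]. */
    static List<Obs> getData(List<Obs> all, Box box, Date start, Date end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) {
            boolean inTime = !o.time.before(start) && !o.time.after(end);
            if (inTime && box.contains(o.lat, o.lon)) {
                hits.add(o);
            }
        }
        return hits;
    }
}
```

A linear scan like this would not scale to large, multifile collections; it only shows what the query means, not how an implementation would index the data.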
TrajectoryObs Methods
int getNumPoints();
StructureData getData(int point);
StationObs Methods
// returns a List of Station
List getStations();
// returns a List of StructureData
List getData(Station s, Date start, Date end);
Radial methods
interface Radial {
  int getNumGates();
  float getData(int gate);
  float getStartingGate();
  float getGateSize();
  float getElevation();
  float getAzimuth();
  double getTime();
}
Grid methods
interface GridCoordSys {
  CoordinateAxis getTaxis();
  CoordinateAxis getXaxis();
  CoordinateAxis getYaxis();
  CoordinateAxis getZaxis();
  Projection getProjection();
}
Array getDataCube(Range time, Range z, Range y, Range x);
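Before a call such as `getDataCube` can be made, a request expressed in coordinate values has to be turned into index ranges, which is where the coordinate axes come in. A minimal sketch of that step follows, assuming a 1-D, strictly increasing axis. `AxisSketch` and `indexRange` are hypothetical names and do not come from the netCDF-Java API.

```java
/** Hypothetical helper for illustration; not part of netCDF-Java. */
final class AxisSketch {

    /**
     * Finds the inclusive index range of axis values lying in [lo, hi].
     * Assumes the axis is 1-D and strictly increasing.
     *
     * @return {first, last} indices, or null if no axis value is in range
     */
    static int[] indexRange(double[] axis, double lo, double hi) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < axis.length; i++) {
            if (axis[i] >= lo && axis[i] <= hi) {
                if (first < 0) {
                    first = i; // first value inside the interval
                }
                last = i;      // keeps advancing while values stay inside
            }
        }
        return first < 0 ? null : new int[] {first, last};
    }
}
```

With a latitude axis of {30, 32.5, 35, 37.5, 40}, a request for 32 to 38 degrees maps to indices 1 through 3, which could then be passed on as the y range of a data cube request.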
Standardizing NetCDF Formats • Grid: CF-1 Convention • Need improvements for regional models (WRF), GIS info • Radar: “Radar Exchange Format” • With radar community (led by NCAR ATD) • Point Observations • Unidata Observation Dataset Conventions
NetCDF-4 C Library • netCDF-3 Interface • netCDF-4 Library • HDF5 Library
NetCDF-4 Status • 4.0 Beta implements CDM access layer • complete, but waiting for HDF5 release 1.8 to finalize file format • 4.1: adding Coordinate Systems • 4.?: merge OPeNDAP access (pending funding)
NetCDF-Java 2.2 (nj22) • Prototype implementation of CDM • File formats: • General: NetCDF, HDF5, OPeNDAP • Grids: GRIB1, GRIB2 • Radar: NEXRAD, NIDS, DORADE • Satellite: DMSP, GINI • Access to THREDDS catalogs • Implements NcML
Common Data Model • Scientific Datatypes: Point, Trajectory, Station, Radial, Grid, Swath • Coordinate Systems • Data Access
NetCDF-Java version 2.2 architecture (diagram labels): THREDDS, Catalog.xml, Application, Scientific Datatypes, Datatype Adapter, NetcdfDataset, CoordSystem Builder, NetcdfFile, ADDE, I/O service provider, OPeNDAP, NetCDF-3, NIDS, NetCDF-4, GRIB, HDF5, GINI, Nexrad, DMSP, …
NetCDF-Java 2.2 Status • Data Access layer: Beta quality • also waiting for HDF5 release to finish NetCDF-4, commit to API • Coordinate Systems: early Beta • Finishing docs, runtime pluggability • Data Types: Alpha, still experimenting with APIs
NetCDF Markup Language (NcML) • XML representation of netCDF metadata (like ncdump -h) • Create new netCDF files (like ncgen) • Modify existing datasets • Add/delete/rename • Create logical sections of existing variables. • Create unions and aggregations of multiple existing datasets.
NcML example
<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2" location="/data/nids/N0R_20041119_2147">
  <attribute name="DataType" value="Radar" />
  <remove type="attribute" name="password" />
  <variable name="Reflectivity" orgName="R34768">
    <attribute name="units" value="dBZ" />
  </variable>
</netcdf>
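The effect of an NcML wrapper on metadata can be sketched as building a modified view while leaving the underlying dataset alone. This is a minimal illustration, assuming global attributes are modeled as a string map; `NcmlWrapSketch` and `wrap` are hypothetical names, and this is not how netCDF-Java implements NcML.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of NcML attribute edits; not the netCDF-Java API. */
final class NcmlWrapSketch {

    /**
     * Returns the attributes a client would see after the wrapper is applied.
     * The original map is copied, so the underlying dataset is not modified.
     *
     * @param original attributes read from the existing dataset
     * @param toAdd    attributes added or overridden by the wrapper
     * @param toRemove names of attributes hidden by the wrapper
     */
    static Map<String, String> wrap(Map<String, String> original,
                                    Map<String, String> toAdd,
                                    List<String> toRemove) {
        Map<String, String> view = new LinkedHashMap<>(original);
        for (String name : toRemove) {
            view.remove(name);
        }
        view.putAll(toAdd);
        return view;
    }
}
```

Applied to the example above, `toAdd` would carry `DataType=Radar` and `toRemove` would carry `password`; the file on disk keeps its original attributes.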
NcML Aggregation • Union • Join Existing • Join New • Forecast Model Run
NcML Aggregation Example
<netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinNew">
    <variableAgg name="Temperature"/>
    <variableAgg name="Pressure"/>
    <scan location="C:/data/goes/" suffix=".gini"/>
  </aggregation>
</netcdf>
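The difference between the Join Existing and Join New aggregation types can be sketched with plain arrays. This is a minimal illustration, assuming each input dataset contributes one 2-D variable; `AggregationSketch`, `joinExisting`, and `joinNew` are hypothetical names, not netCDF-Java calls.

```java
/** Hypothetical model of two NcML aggregation types; not the netCDF-Java API. */
final class AggregationSketch {

    /**
     * Join Existing: concatenates the inputs along their existing outer
     * dimension. Inputs shaped [n1][m] and [n2][m] give [n1 + n2][m].
     */
    static double[][] joinExisting(double[][]... parts) {
        int total = 0;
        for (double[][] p : parts) {
            total += p.length;
        }
        double[][] out = new double[total][];
        int k = 0;
        for (double[][] p : parts) {
            for (double[] row : p) {
                out[k++] = row.clone();
            }
        }
        return out;
    }

    /**
     * Join New: stacks the inputs along a new outer dimension.
     * Two inputs shaped [n][m] give a result shaped [2][n][m].
     */
    static double[][][] joinNew(double[][]... parts) {
        double[][][] out = new double[parts.length][][];
        for (int i = 0; i < parts.length; i++) {
            out[i] = new double[parts[i].length][];
            for (int j = 0; j < parts[i].length; j++) {
                out[i][j] = parts[i][j].clone();
            }
        }
        return out;
    }
}
```

In the NcML example above, `type="joinNew"` with `dimName="time"` corresponds to the second method: each scanned file becomes one slice along a new `time` dimension.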
THREDDS Data Server • Integrates data access with THREDDS catalogs and services • Tomcat/Servlet, 100% Java, single war file • Data input is netCDF Java 2.2 library • Data output: • OPeNDAP • HTTP Server • OGC Web Coverage Server (gridded)
THREDDS Data Server (diagram labels): Application; HTTP; Tomcat Server; hostname.edu; Catalog.xml; THREDDS Server (OPeNDAP, HTTPServer, WCS); NetCDF-Java library; Datasets; IDD Data
TDS as WCS Gateway (diagram labels): Application; HTTP; Tomcat Server; hostname.edu; Catalog.xml; THREDDS Server (OPeNDAP, HTTPServer, WCS); NetCDF-Java library; OPeNDAP Server; anotherHost.org
TDS and NcML (diagram labels): Application; HTTP; Tomcat Server; hostname.edu; Catalog.xml; THREDDS Server (OPeNDAP, WCS); Netcdf-Java; NcML; Datasets
TDS and NcML • Server serves the dataset “wrapped” by the NcML • Client sees OPeNDAP or WCS, not NcML • Can “fix” metadata problems • Can augment metadata • Use NcML aggregation on the TDS • replaces the old “Aggregation Server”
TDS and Digital Libraries (diagram labels): Application; HTTP; Tomcat Server; hostname.edu; Catalog.xml; THREDDS Server (OPeNDAP, HTTPServer, WCS); NetCDF-Java library; Datasets; OPeNDAP Server; otherhost.gov; OAI Harvester; DL Records
TDS and Digital Libraries • Framework to add metadata • By hand (collection level) • Automatic extraction from datasets • Send records to existing DLs • No search • Both collection and inventory level