
Data Formats and Tools

Deutscher Wetterdienst. Data Formats and Tools. R.W. Mueller, R.Hollmann, C.Träger-Chatterjee. Content. Overview HDF5 netCDF Binary ASCII Conclusion. Overview. New data formats have been developed


Presentation Transcript


  1. Deutscher Wetterdienst Data Formats and Tools R.W. Mueller, R.Hollmann, C.Träger-Chatterjee

  2. Content • Overview • HDF5 • netCDF • Binary • ASCII • Conclusion

  3. Overview • New data formats have been developed • better handling of manifold information provided by satellite data, reanalysis or model data • optimise computing performance (I/O process) • reduce disk space needed • Requirements for the data format: storage of … • … the data itself with high resolution in space and time • Different data layers possible • … the meta-information, e.g. • Calibration coefficients • Geolocation and projection • Statistical error information • Gain and offset • Whatever the operator would like to add as meta-information

  4. Overview – the favourites • For satellite data two formats are important. They are different but related, and both have an associated data model • HDF: Hierarchical Data Format • netCDF: Network Common Data Form • Further formats for satellite data: • HRIT raw data format: not discussed here, the focus is on products • Specific binary format: always possible, but no common data model • ASCII: no data model, quite seldom used

  5. HDF – Hierarchical Data Format • HDF5 - general purpose library and file format for storing scientific data • Create and store almost any kind of scientific data structure • e.g. images, arrays of vectors, structured and unstructured grids, … • one can also mix and match different data formats in HDF5 files • Efficient storage and I/O • created to address the data management needs of high performance, data intensive computing environments • As a result, library and format emphasize storage and I/O efficiency (especially on parallel machines), including file compression

  6. HDF – Hierarchical Data Format • The most recent version is HDF5, but a lot of data are still in HDF4 format. • Both are machine independent (no big / little endian problem) • Information, tools, examples and the HDF software (library) available at http://hdf.ncsa.uiuc.edu/HDF5 and http://hdf.ncsa.uiuc.edu/hdf4.html • Widely used, e.g.: • MODIS (HDF4) • Eumetsat, e.g. all SAFs (HDF5)

  7. HDF command line tools • No backward compatibility • many HDF5 command line tools and interfaces (e.g. implemented in Fortran 90 or C programs) cannot be used for HDF4 files. • h5dump - displays the content of the HDF5 file as ASCII • h5ls - lists the contents of a file, enables fast checks whether the needed data is in there • h5import - imports ASCII to HDF5 • a configuration file is needed, hence some basic knowledge about the HDF data model and structure is required

  8. HDF5 as ASCII using h5dump • Common data model, but in detail the output can look quite different. The slide's comments (red in the original) are shown after "<--".

      HDF5 "TRS_SR_20040708_1200_V000.hdf" {                     <-- filename
      GROUP "/" {                                                 <-- definition of a group
         GROUP "Data" {
            DATASET "TRS" {                                       <-- definition of the dataset
               DATATYPE  H5T_STD_I16BE                            <-- def. of the data type
               DATASPACE  SIMPLE { ( 3712, 3712 ) / ( 3712, 3712 ) }   <-- the dimension
               DATA {                                             <-- the data
               (0,0): -32767, -32767, -32767, -32767, -32767, -32767, -32767,
               (0,7): -32767, -32767, -32767, -32767, -32767, -32767, -32767,….
               (883,707): 495, 455, 436, 436, 378, 323, 378, 416, 342, 277, 296, ……
               }
               ATTRIBUTE "Gain" { …..                             <-- definition of attributes

      (continued on the next slide)

  9. HDF5 as ASCII using h5dump • Gain and offset are used to reduce the disk space needed (the data can then be saved as integers). • Attributes are also used for unit, title, …

               ATTRIBUTE "Gain" { ….
                  DATATYPE  H5T_IEEE_F32BE
                  DATASPACE  SCALAR
                  DATA {
                  (0): 0.25
                  }
               }
               ATTRIBUTE "Offset" {….. DATATYPE and DATASPACE….
                  DATA {
                  (0): 0
                  }
               }
               ATTRIBUTE "nodatavalue" {…. DATATYPE and DATASPACE……
                  DATA {
                  (0): -32767
                  }

  10. HDF5 as ASCII using h5dump • Definition of a new group and of the dataset needed to define the projection. A group usually consists of several datasets.

      GROUP "Geolocation" {                                       <-- a new group
         DATASET "projection" {                                   <-- dataset defining the projection
            DATATYPE  H5T_COMPOUND {
               H5T_STRING {
                  STRSIZE 128;
                  STRPAD H5T_STR_NULLTERM;
                  CSET H5T_CSET_ASCII;
                  CTYPE H5T_C_S1;
               } "reference ellipsoid";
               H5T_ARRAY { [10] H5T_IEEE_F32LE } "parameter";
            }
            DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
            DATA {
            (0): {
                  "geostationary view",
                  "WGS-84",
                  [ 1856, 1856, 667.204, 667.204, -1, -1, -1, -1, -1, -1 ]
               }
            }}
         DATASET "region" {                                       <-- a further dataset of the same group
         }}}
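
The Gain, Offset and nodatavalue attributes in the dump above indicate how the stored 16-bit integers map back to physical values. The following is a minimal Python sketch, assuming the common linear convention physical = raw * gain + offset (the slides do not spell out the formula). The function name `decode_counts` is hypothetical; the attribute values and sample counts are the ones shown in the dump.

```python
from typing import List, Optional


def decode_counts(raw: List[int], gain: float, offset: float,
                  nodata: int) -> List[Optional[float]]:
    """Convert stored integer counts to physical values.

    Assumes physical = raw * gain + offset (a common convention that the
    slides do not confirm). Counts equal to `nodata` become None.
    """
    return [None if v == nodata else v * gain + offset for v in raw]


# Attribute values as shown in the h5dump output.
GAIN, OFFSET, NODATA = 0.25, 0.0, -32767

# One no-data pixel followed by a few counts from the TRS dataset.
counts = [-32767, 495, 455, 436]
values = decode_counts(counts, GAIN, OFFSET, NODATA)
print(values)  # [None, 123.75, 113.75, 109.0]
```

With a gain of 0.25 each value is stored in 2 bytes instead of the 4 bytes a 32-bit float would need, which is the disk-space saving the slide refers to.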

  11. HDF GUI Tools -HDFView- • The complex data model might act as a deterrent for beginners • The graphical user interface HDFView overcomes this handicap. It is a tool for browsing and editing HDF4 and HDF5 files using a GUI • Relatively easy to install and available for many platforms, e.g. Windows, Solaris, AIX, Linux • Everything can be managed with buttons and mouse clicks • Data can be saved as an ASCII table • Images can be generated and saved. • http://www.hdfgroup.org/hdf-java-html/hdfview/index.html

  12. HDF Tools – CM-SAF GUI • Software available for CM-SAF customers via www.cmsaf.eu • Features: • visualisation of CM-SAF products (in HDF5 format) • simple data analysis • Export (ASCII, lat/lon grid) • Uses the free IDL Virtual Machine

  13. HDF Tools CM-SAF GUI

  14. HDF Tools CM-SAF GUI More on this topic in the exercise session

  15. netCDF • Information, tools, examples and the netCDF library are available at: http://www.unidata.ucar.edu/software/netcdf/ • Widely used, e.g.: • Reanalysis data of the National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ERA-40) • HOAPS, Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data • CM-SAF selected monthly means

  16. netCDF command line tools • ncdump - shows the content of the netCDF file as ASCII (CDL) • ncgen - converts ASCII (CDL) to netCDF, i.e. the reverse of ncdump • sounds easy, but a configuration file (CDL file) is needed • some basic knowledge about the netCDF data model and structure is required • however, easier to handle for beginners than HDF5 • example of an ASCII CDL configuration file:

  17. netCDF as ASCII

      netcdf SRBmm200604 {
      dimensions:
          lat = 501 ;
          lon = 741 ;
          time = UNLIMITED ; // (0 currently)
      variables:
          float lat(lat) ;
              lat:long_name = "latitude" ;
              lat:units = "degree" ;
          float lon(lon) ;
              lon:long_name = "longitude" ;
              lon:units = "degrees" ;
          float Z(lat, lon) ;
              Z:units = "Watt" ;
              Z:valid_range = 0., 1400. ;
      data:
          lat = 35, 35.05, 35.1, 35.15, 35.2, 35.25, 35.3, 35.35, 35.4, … ;
          lon = 44, 45, …. ;
          Z = 300, 340, … ;
      }
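
Since a CDL file such as the one above is plain text, it can be produced by a short script and then handed to ncgen. The Python sketch below builds a small CDL string of the same shape. The function name `make_cdl`, the dataset name `example` and the grid values are illustrative assumptions, not part of any CM-SAF product definition.

```python
from typing import Sequence


def make_cdl(name: str, lat: Sequence[float], lon: Sequence[float],
             z: Sequence[Sequence[float]]) -> str:
    """Return CDL text for a 2-D field Z(lat, lon) with its coordinates."""
    if len(z) != len(lat) or any(len(row) != len(lon) for row in z):
        raise ValueError("Z must have shape (len(lat), len(lon))")

    def fmt(values) -> str:
        # Comma-separated list of floating-point literals.
        return ", ".join(repr(float(v)) for v in values)

    # Flatten Z row by row, so that lon varies fastest.
    flat_z = [v for row in z for v in row]
    return "\n".join([
        f"netcdf {name} {{",
        "dimensions:",
        f"    lat = {len(lat)} ;",
        f"    lon = {len(lon)} ;",
        "variables:",
        "    float lat(lat) ;",
        '        lat:long_name = "latitude" ;',
        "    float lon(lon) ;",
        '        lon:long_name = "longitude" ;',
        "    float Z(lat, lon) ;",
        "data:",
        f"    lat = {fmt(lat)} ;",
        f"    lon = {fmt(lon)} ;",
        f"    Z = {fmt(flat_z)} ;",
        "}",
        "",
    ])


# Made-up 2 x 3 grid for illustration.
cdl = make_cdl("example", [35.0, 35.05], [44.0, 45.0, 46.0],
               [[300, 340, 310], [305, 335, 320]])
print(cdl)
```

Saved as `example.cdl`, this text could then be converted with `ncgen -o example.nc example.cdl`.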

  18. netCDF Tools – Integrated Data Viewer (IDV) • Free GIS tool • Display data / generate maps • Imports netCDF

  19. netCDF GUI Tool CDAT • Open source integrated environment for data analysis and visualisation. • Mainly netCDF, but can also deal with GRIB and HDF. • Import of binary and ASCII data possible. • Available for different platforms but not for Windows!

  20. Binary Data • Usually used instead of ASCII • to reduce disk space and to increase the computing performance. • Machine-readable format, not readable by humans • Usually files with or without a header, with the data stored as a defined data type, e.g. float (2.44) or integer (4) • Reading and writing with e.g. C, C++, Fortran • Formats are not common, so individual read / write routines are needed • some tools can read and visualise binary data, e.g. • CDAT, GrADS, IDL • data is not self-explanatory: the length of the header and the data type have to be known
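
Because such a file does not describe itself, the reader has to hard-code the header length, the data type and the byte order. The following Python sketch uses only the standard library. The layout (a 16-byte header followed by big-endian 32-bit floats) and the function names are made-up assumptions for illustration, not the format of any product named in these slides.

```python
import os
import struct
import tempfile

HEADER_BYTES = 16      # assumed header length; must be known in advance
VALUE_FORMAT = ">f"    # assumed data type: big-endian 32-bit float


def write_binary(path, header, values):
    """Write a fixed-length header followed by the raw values."""
    if len(header) > HEADER_BYTES:
        raise ValueError("header too long")
    with open(path, "wb") as f:
        f.write(header.ljust(HEADER_BYTES, b"\x00"))  # pad header with zeros
        for v in values:
            f.write(struct.pack(VALUE_FORMAT, v))


def read_binary(path):
    """Skip the header and decode every value that follows it."""
    with open(path, "rb") as f:
        f.seek(HEADER_BYTES)
        payload = f.read()
    return [v[0] for v in struct.iter_unpack(VALUE_FORMAT, payload)]


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    write_binary(path, b"DEMO", [2.5, 4.0, -1.0])
    print(read_binary(path))  # [2.5, 4.0, -1.0]
```

Reading the same file with the wrong byte order (for example `"<f"`) raises no error but returns meaningless numbers, which is why self-describing formats such as HDF5 and netCDF are easier to exchange.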

  21. Binary Data and ASCII • Examples for binary data: • International Satellite Cloud Climatology Project (ISCCP). http://isccp.giss.nasa.gov • AVHRR based USGS land use maps.

  22. ASCII • readable with a text editor • a quite unusual format • sometimes provided by the data centre for subsets of the data on request, e.g. CM-SAF • 2006 9 27 6 0 71.93 • 2006 9 27 6 15 109.75 • 2006 9 27 6 30 73.28 • 2006 9 27 6 45 96.04 • 2006 9 27 7 0 84.16 • 2006 9 27 7 15 91.51 • 2006 9 27 7 30 110.54 • 2006 9 27 7 45 122.44 • 2006 9 27 8 0 166.66
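
A table like this can be parsed in a few lines. The Python sketch below assumes the columns are year, month, day, hour, minute and value (the slide does not label them) and uses a hypothetical function name `parse_rows`.

```python
from datetime import datetime


def parse_rows(text):
    """Parse whitespace-separated rows into (timestamp, value) pairs.

    Assumed column order: year month day hour minute value.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        year, month, day, hour, minute = map(int, fields[:5])
        records.append((datetime(year, month, day, hour, minute),
                        float(fields[5])))
    return records


# First three rows of the example on the slide.
sample = """\
2006 9 27 6 0 71.93
2006 9 27 6 15 109.75
2006 9 27 6 30 73.28
"""
records = parse_rows(sample)
for when, value in records:
    print(when.isoformat(), value)
```

The column meaning has to be agreed on outside the file, which is the "no data model" limitation of ASCII mentioned earlier.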

  23. Conclusion • HDF5 • Header describing the data. Data in binary format • HDFView, CM-SAF GUI • Official format of CM-SAF data • netCDF • Header describing the data. Less cryptic than HDF5. Data in binary format • Various GIS and viewers, e.g. ArcView, Integrated Data Viewer, CDAT • On demand, some CM-SAF data can be provided in netCDF. • Binary • Instead of ASCII, to reduce disk space • ASCII • Readable with a text editor.
