  1. Creating File Format Guidelines: The Aura Experience David Cuddy Jet Propulsion Laboratory California Institute of Technology October 20, 2009 ESDSWG, Wilmington, DE

  2. Agenda • Authors/Affiliations • Aura Instruments • Introduction • Aura Format Guidelines • Aura Swath Data File Structure • Background into the Process • Validating and Verifying • Items to Standardize • Team Organization • Process • Summary • Web Sites

  3. Authors/Affiliations • Cheryl Craig – HIRDLS (NCAR) • Ken Stone – HIRDLS (UofColorado) • Nathaniel Livesey – MLS (JPL) • Steve Friedman – TES (JPL) • David Cuddy – MLS (JPL) • Doug Ilg – OMI (RITSS) • Pepijn Veefkind – OMI (KNMI – Netherlands) • Scott Lewicki – TES (JPL) • Peter Leonard – OMI (ADNET) • Al Fleig – OMI (PITA) • Paul Wagner – MLS (JPL) • Christina Vuu – MLS (Raytheon) • Doug Shepard – TES (JPL) • Silent Authors: • Steve Larson – TES (JPL) • Joost Carpay – OMI (KNMI – Netherlands) • Susan Paradise – TES (JPL)

  4. Aura Instruments • HIRDLS (High Resolution Dynamics Limb Sounder) • Limb infrared sounder • University of Colorado and Oxford • MLS (Microwave Limb Sounder) • Limb microwave sounder • JPL and University of Edinburgh • OMI (Ozone Monitoring Instrument) • Nadir wide-field-imaging spectrometer • Netherlands, Finland and US • TES (Tropospheric Emission Spectrometer) • Nadir and limb infrared-imaging spectrometer • JPL All instruments have world-wide co-investigators

  5. Introduction • “Creating File Format Guidelines: The Aura Experience” • Creation of a common file format developed and used by the individual teams working on the four instruments on NASA’s Aura satellite • Each team was independent and under no mandate to use a common file format • The decision and its implementation were a grassroots effort • Accepted by all of the PIs and leading scientists • Early on in the Aura program, the teams realized that common data and file formats would greatly facilitate the sharing of data • This presentation describes the process used to develop the guidelines, the lessons learned, and the keys to its success • Future NASA missions can build on this technical note of the Aura experience to develop their own set of guidelines

  6. Aura Format Guidelines • The teams agreed to: • HDF5/HDF-EOS5 data format • Specific details within the file • Names, data types and dimension order of fields • File-, group- and field-level attributes to include in each product file • A file-naming convention • HDF-EOS library allows flexibility - further constraints desirable • Data fields which are common are stored in same way, and with same name • Identified attributes which aid data use • By sharing format of data sets across all Aura instrument teams: • Ease development of software • Make data sets easier to understand • Used common standard library

  7. Aura Swath Data File Structure • File Level Attributes: InstrumentName, ProcessLevel, GranuleMonth, GranuleDay, GranuleYear, TAI93At0zOfGranule, PGEVersion • Swath Name: Instrument Specific • Swath Level Attributes: Pressure, VerticalCoordinate • Dimensions: nTimes, nLevels, nWavel, nXtrack, nLayers • Geolocation Fields: Time, Latitude, Longitude, Pressure, Solar Zenith Angle, Local Solar Time, etc. (see the valids for the complete listing of possible geolocation fields) • Geolocation Field Attributes: MissingValue, Title, Units, UniqueFieldDefinition, ScaleFactor (only if applicable), Offset (only if applicable) • Data Fields: Temperature, O3, etc. (see the valids for the complete listing of possible data fields) • Data Field Attributes: MissingValue, Title, Units, UniqueFieldDefinition, ScaleFactor (only if applicable), Offset (only if applicable) • Swath Name2: Instrument Specific • Additional swaths may occur in a file
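The layout on this slide can be pictured as a nested mapping. The sketch below is only an illustration, assuming a plain Python dictionary stands in for the HDF-EOS5 file. The attribute, dimension and field names are taken from the slide; the swath name `ExampleSwath`, the helpers `make_field` and `outline`, and every value are hypothetical.

```python
def make_field(title, units, values, missing=-999.0):
    """Bundle a field's values with the per-field attributes named on the slide."""
    return {
        "attributes": {
            "MissingValue": missing,
            "Title": title,
            "Units": units,
            "UniqueFieldDefinition": "placeholder",  # invented value
        },
        "values": values,
    }


# Hypothetical stand-in for one swath file; all values are invented.
example_file = {
    "file_attributes": {
        "InstrumentName": "MLS",
        "ProcessLevel": "L2",
        "GranuleMonth": 10,
        "GranuleDay": 20,
        "GranuleYear": 2009,
        "TAI93At0zOfGranule": 0.0,  # placeholder, not a real TAI93 time
        "PGEVersion": "placeholder",
    },
    "swaths": {
        "ExampleSwath": {
            "attributes": {"VerticalCoordinate": "Pressure",
                           "Pressure": [100.0, 10.0, 1.0]},
            "dimensions": {"nTimes": 2, "nLevels": 3},
            "geolocation_fields": {
                "Time": make_field("Time", "s", [0.0, 1.0]),
                "Latitude": make_field("Latitude", "deg", [10.0, 10.5]),
                "Longitude": make_field("Longitude", "deg", [20.0, 20.5]),
                "Pressure": make_field("Pressure", "hPa", [100.0, 10.0, 1.0]),
            },
            "data_fields": {
                "Temperature": make_field(
                    "Temperature", "K",
                    [[210.0, 230.0, 260.0], [211.0, -999.0, 261.0]],
                ),
            },
        }
    },
}


def outline(file_model):
    """Return text lines that mirror the slide's file/swath/field hierarchy."""
    lines = ["File attributes: " + ", ".join(file_model["file_attributes"])]
    for swath_name, swath in file_model["swaths"].items():
        lines.append(f"Swath {swath_name}")
        lines.append("  Dimensions: " + ", ".join(swath["dimensions"]))
        lines.append("  Geolocation fields: " + ", ".join(swath["geolocation_fields"]))
        lines.append("  Data fields: " + ", ".join(swath["data_fields"]))
    return lines
```

Calling `outline(example_file)` lists the same groups the slide shows, one swath at a time; a second swath would simply be another key under `"swaths"`.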

  8. Background into the Process • The standard each guideline must meet: • Does it help the end user to develop one universal reader to read the primary data within the Aura teams’ data files? • Items not affecting the reading of the data were not standardized • Example is compression • Instrument specific data fell outside of the standardization process • Instrument teams were free to add any additional fields • A feature of HDF files: • Additional information can be added to a file and it does not impact a reader • Unless that data is required to be read

  9. Validating and Verifying • Validating the Guidelines • A preliminary guidelines document had been circulated within the instrument teams and with a representative from the GES-DISC • V1.0 of the document described the Level 2 data files sufficiently for development of these data products to proceed. • A validation tool was developed specifically to check Aura Level 2 data files for compliance • Verifying Files • Teams shared their data files with the other Aura instrument teams as development progressed • The teams then verified that the data file structures matched other teams’ structures • This verification was an important part of the process, because the coauthors were not confident that the guidelines were defined adequately • Early versions had one glaring omission - the data type of fields • By the time this was discovered, teams had already developed their data files, but fortunately all teams had chosen to use the same data types • This guideline was actually fleshed out after the initial data file development was completed
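A compliance check of the kind described here might look roughly like the following. This is not the Aura validation tool, only a sketch under stated assumptions: a swath is modelled as a Python dictionary, the required attribute names are those shown on the file structure slide, and the expected data types in `expected_dtypes` are hypothetical.

```python
# Per-field attributes listed on the file structure slide.
REQUIRED_FIELD_ATTRS = ("MissingValue", "Title", "Units", "UniqueFieldDefinition")


def check_field(name, field, expected_dtype=None):
    """Return a list of human-readable problems found for one field."""
    problems = []
    attrs = field.get("attributes", {})
    for attr in REQUIRED_FIELD_ATTRS:
        if attr not in attrs:
            problems.append(f"{name}: missing attribute {attr}")
    # Data types were the omission in early guideline versions, so check them too.
    if expected_dtype is not None and field.get("dtype") != expected_dtype:
        problems.append(
            f"{name}: dtype {field.get('dtype')!r}, expected {expected_dtype!r}"
        )
    return problems


def check_swath(swath, expected_dtypes):
    """Check every geolocation and data field in a swath model."""
    problems = []
    for group in ("geolocation_fields", "data_fields"):
        for name, field in swath.get(group, {}).items():
            problems.extend(check_field(name, field, expected_dtypes.get(name)))
    return problems


# Hypothetical swath: Latitude is compliant, Temperature is not.
swath = {
    "geolocation_fields": {
        "Latitude": {
            "dtype": "float32",
            "attributes": {"MissingValue": -999.0, "Title": "Latitude",
                           "Units": "deg", "UniqueFieldDefinition": "placeholder"},
        },
    },
    "data_fields": {
        "Temperature": {
            "dtype": "float64",
            "attributes": {"MissingValue": -999.0, "Title": "Temperature",
                           "UniqueFieldDefinition": "placeholder"},
        },
    },
}
expected_dtypes = {"Latitude": "float32", "Temperature": "float32"}  # assumed values
problems = check_swath(swath, expected_dtypes)
```

Running the check on files exchanged between teams, rather than only on one's own output, is what turns the written guideline into something verifiable.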

  10. Items to Standardize • If a self-describing data format such as HDF, HDF-EOS or netCDF is being used, then standardization should include: • Names of fields (including capitalization and spacing) • Names and ordering of dimensions for each field • Data types and sizes for each field (for instance integer, 32 bit) • Attributes for each field and their types and definitions. • Additional benefits can be realized by standardizing the following contents as well: • Units for each field • Coordinates: the actual values of any fields which describe the location of data (such as latitudes if a gridded product, pressure levels, etc.) • File naming scheme
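The items listed on this slide can be written down as a small record per field, which makes disagreements between teams easy to spot. The following is a minimal sketch; the `FieldSpec` class, the `compare_specs` function and the two example definitions are hypothetical and not part of the Aura guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str    # exact spelling, including capitalization and spacing
    dims: tuple  # dimension names in storage order
    dtype: str   # data type and size, for instance "float32"
    units: str   # units string for the field


def compare_specs(a, b):
    """List the standardized properties on which two definitions differ."""
    diffs = []
    if a.name != b.name:
        diffs.append("name")
    if a.dims != b.dims:
        diffs.append("dims")
    if a.dtype != b.dtype:
        diffs.append("dtype")
    if a.units != b.units:
        diffs.append("units")
    return diffs


# Two hypothetical teams describing what is meant to be the same field.
team_one = FieldSpec("Temperature", ("nTimes", "nLevels"), "float32", "K")
team_two = FieldSpec("temperature", ("nLevels", "nTimes"), "float32", "K")
differences = compare_specs(team_one, team_two)
```

Here the two definitions agree on type and units but differ in capitalization and dimension order, either of which would be enough to break a shared reader.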

  11. Team Organization • Commitment from every team to the process at the outset • Significant amount of time and compromise involved • The guidelines were a voluntary effort • Be willing to commit to the process for the long haul • Acceptance of the effort needs to be at all levels of management, especially the leading scientists • Every team must have at least one dedicated author and representative • Appoint a dedicated group leader • Have a forum for gathering the team members interested in data issues together

  12. Process • Document needs to be detailed • Use of a direct access, self-describing data storage library (like HDF and HDF-EOS) eases the standardization process • The data fields which are in common between two or more instrument teams are the only ones which need to be standardized • Allow flexibility • Modify the document to incorporate every team’s input • Be willing to compromise. • Look for creative solutions to attain compromise. • Exchange data sets early on • Create a strawman draft, early

  13. Communicate, Communicate, Communicate • Communication is essential; start early • At every DSWG Meeting, items that needed agreement were discussed • When extensive discussion was required, splinter meetings took place at the DSWG • Email was the primary tool for discussion and reaching consensus • At times, conference calls were used • Some issues were tabled until the next DSWG meeting • The email list contained both named and unlisted authors • Anyone who wished to be included on the email list was added to it • All discussions were sent using the general email list (openness was important) • When an agreement needed to be reached, everyone was entitled to respond, but authors whose names were on the document were required to respond • Controversial items were taken to their individual instrument teams for discussion and approval/disapproval • The results of these discussions were then reported back to the group • Every major version release was agreed upon by all of the named authors

  14. Summary • Aura instrument teams developed their own set of file format guidelines • Aura instrument teams presented common data in a standardized way but let instrument specific information vary • Because of this effort, generic readers could be written to read the standardized data from any Aura instrument • Future instruments can build on these procedures to develop their own guidelines for their instruments or use the Aura Guidelines as they stand

  15. Web Sites NASA web site Aura Guidelines: Creating the Aura Guidelines: