
Improving long-term preservation of EOS data by independently mapping HDF4 data objects


Presentation Transcript


  1. Improving long-term preservation of EOS data by independently mapping HDF4 data objects. The HDF Group

  2. Mapping project team members • The HDF Group: Ruth Aydt, Mike Folk, Joe Lee, Elena Pourmal, Binh-Minh Ribler, Muqun (Kent) Yang • NASA: Ruth Duerr and Luis Lopez (NSIDC), Chris Lynnes (GES DISC) • Raytheon: Evelyn Nakamura • Many others. Annual HDF Briefing to NASA

  3. Recap • Problem: • The complex byte layout of HDF files makes long-term readability of HDF data dependent on the long-term availability of HDF software. • Solution: • Create a map of the layout of data objects in an HDF file, so that a simple reader can be written to access the data. • Implement tools to create layout maps for EOS data products. • Deploy the tools at select EOS data centers.
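The core idea of the solution is that once a map records where an object's bytes live, a reader needs only a seek, a read, and a decode step, with no HDF4 library involved. A minimal sketch in Python, assuming the map supplies an absolute byte offset, a byte count, and a numeric type for an uncompressed, contiguously stored object, and assuming big-endian storage (the usual HDF4 default for numeric data). The function names are hypothetical, not part of any HDF tool.

```python
import struct
from typing import BinaryIO


def read_bytes(f: BinaryIO, offset: int, nbytes: int) -> bytes:
    """Read exactly `nbytes` bytes starting at absolute byte `offset`."""
    f.seek(offset)
    data = f.read(nbytes)
    if len(data) != nbytes:
        raise ValueError(
            f"expected {nbytes} bytes at offset {offset}, got {len(data)}"
        )
    return data


def decode_big_endian(data: bytes, fmt_char: str) -> list:
    """Decode `data` as a flat array of big-endian values.

    `fmt_char` is a single struct format character, e.g. 'f' for
    32-bit float, 'h' for 16-bit int, 'i' for 32-bit int.
    """
    size = struct.calcsize(">" + fmt_char)
    if len(data) % size != 0:
        raise ValueError("byte count is not a multiple of the element size")
    return list(struct.unpack(f">{len(data) // size}{fmt_char}", data))
```

For example, `decode_big_endian(read_bytes(f, offset, 12), "f")` would return three floats. Reshaping into the object's dimensions, and handling compressed or chunked objects, are left out of this sketch.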


  5. HDF4 mapping workflow: h4mapwriter, linked with the HDF4 library, reads an HDF4 file and writes an HDF4 map file (XML document) describing groups, data objects, structural and application metadata, and the locations of object data. A reader program then uses the map file to read the object data from the HDF4 file.
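The workflow splits the two roles: only h4mapwriter needs the HDF4 library, while the reader program needs nothing more than an XML parser and file I/O. A minimal sketch of the reader's first step, assuming a simplified, hypothetical map layout in which `<dataset>` elements carry `name`, `type`, and `shape` attributes and hold `<byteStream offset="…" nBytes="…"/>` children. The real mapping schema defines its own element names and namespace, which this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class MappedObject:
    name: str
    dtype: str
    shape: tuple[int, ...]
    streams: list[tuple[int, int]]  # (offset, nBytes) pairs, in document order


def parse_map(xml_text: str) -> list[MappedObject]:
    """Collect every <dataset> element and the byte ranges that hold its data."""
    root = ET.fromstring(xml_text)
    objects = []
    for ds in root.iter("dataset"):
        streams = [
            (int(bs.get("offset")), int(bs.get("nBytes")))
            for bs in ds.iter("byteStream")
        ]
        shape = tuple(int(n) for n in ds.get("shape", "").split())
        objects.append(MappedObject(ds.get("name"), ds.get("type"), shape, streams))
    return objects
```

Each `(offset, nBytes)` pair can then be handed to a plain seek-and-read routine; no HDF4 code runs at this stage.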

  6. Phase 1: Build a prototype (completed in 2009)

  7. Phase 2: Productize the HDF4 mapping schema and tools for deployment

  8. Phase 2 tasks • Investigate integration of the mapping schema with existing standards • Determine HDF-EOS2 requirements • Redesign and expand the XML schema • Implement a production-quality map writer • Develop a demo map reader • Deploy tools at select NASA data centers


  10. Task A: Investigate integration of the mapping schema with existing standards

  11. Investigate existing standards • Investigated: • METS, PREMIS, ESML, NcML, and CSML • Concluded: • Existing standards serve different purposes from the mapping schema • None meets all the needs of the mapping project • Develop a new schema tailored to project goals • Harmonize with PREMIS • Leverage terminology and approaches from all • Status: • Need to write a report • Need to include some PREMIS-like data, such as HDF4 file size and possibly an MD checksum

  12. Task B: Determine HDF-EOS2 requirements

  13. Background: An HDF-EOS2 file is an HDF4 file, so an HDF4 mapping file can be created to archive it. However, for some HDF-EOS2 files it may be extremely difficult to retrieve correct geolocation information from the mapping files. Those files may need special HDF-EOS2 mapping files.

  14. Categorize HDF-EOS2 data products • Created a data pool from NASA data centers: • GES DISC, NSIDC, LAADS, LP DAAC • LaRC, PO.DAAC, GHRC, OBPG • Analyzed the data and reported options for adding HDF-EOS2 content to the mapping file • Conclusion: no special mapping needs to be done for HDF-EOS2 • However, the study uncovered some important shortcomings in certain HDF-EOS products

  15. Status and Plans • Status: Complete • Detailed descriptions of sample data: • http://hdfeos.org/zoo/Data_Collection/index.php • Documents and reports at wiki: • http://wiki.hdfgroup.org/MappingPhase2_TaskB • Plans • We plan to recommend a future task in which these issues are made known to the project

  16. Task C: Redesign schema

  17. Design priorities and assumptions • Mapping files should: • Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files • Contain enough information to stand on their own • Be as simple as possible • Mapping schema: • Describes the mapping files • Is used for validation and documentation • May not be available to the target user

  18. Status and Plans • Status: • All HDF4 objects found in EOS products are now handled by the mapping schema. • Plans: • Complete schema elements for HDF4 file description information • File size, MD checksum (?), HDF4 library version stamp (?) • Finalize schema documentation • Address any additional HDF4 objects found during the remainder of the project, either by updating the schema and map writer or, if a substantial amount of effort is required, through a follow-on proposal.
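The planned file-description elements would let a future reader confirm that an archived HDF4 file is the same file its map describes. A minimal sketch, assuming the map records the file size in bytes and an MD5 hex digest. The slides leave the checksum algorithm open ("MD checksum (?)"), so MD5 is only an assumption here, and both function names are hypothetical.

```python
import hashlib
import os


def file_fixity(path: str, chunk_size: int = 1 << 20) -> tuple[int, str]:
    """Return (size in bytes, hex MD5 digest) of the file at `path`.

    The file is hashed in chunks so large EOS granules need not fit in memory.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


def matches_map(path: str, expected_size: int, expected_md5: str) -> bool:
    """True if the file still has the size and digest recorded in its map."""
    size, digest = file_fixity(path)
    return size == expected_size and digest == expected_md5.lower()
```

Checking size first would be cheaper. The combined check is kept here for brevity.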

  19. Task D: Implement map writer

  20. Map Writer Requirements • Retrieve the needed information from the HDF4 file • Write out the corresponding XML file • Quality requirements: • Completeness: • Don’t miss any objects in the file • Report on objects or features not handled by the writer • Accuracy: don’t give wrong information • Readability: provide adequate instructions in the file
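The completeness requirement means the writer should enumerate every object in the file and report, rather than silently skip, anything it cannot map. A minimal sketch of that reporting step, assuming the enumeration has already produced `(kind, name)` pairs. The set of supported kinds and the function name are hypothetical and do not describe h4mapwriter's actual coverage.

```python
from collections import Counter

# Hypothetical set of object kinds that one writer version can map.
SUPPORTED_KINDS = {"SDS", "Vdata", "Vgroup", "Attribute"}


def completeness_report(found: list[tuple[str, str]]) -> dict[str, int]:
    """Count enumerated objects whose kind the writer cannot map.

    `found` holds a (kind, name) pair for every object in the file, so an
    unsupported object is tallied for the report instead of being dropped.
    An empty result means every object was covered.
    """
    unhandled = Counter(kind for kind, _ in found if kind not in SUPPORTED_KINDS)
    return dict(unhandled)
```

A report such as `{"RIS": 2}` would tell the data center which objects still need schema and writer support.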

  21. Activities • Implement functions to facilitate map creation • Develop writer requirements based on new XML schema and additional deployment needs • Implement new functions as needed • Include functions in library as appropriate • Implement writer: h4mapwriter • Interpret map requirements according to schema • Implement writer • Package for deployment • Support deployment

  22. Status and Plans • Status • Implement functions to facilitate map creation • All functions implemented • Implement writer • Handles all objects • Available as alpha-2 release • Being tested by GES DISC, NSIDC, Raytheon • Plans • Functions to facilitate map creation • Include in future HDF4 releases • Writer • Finish HDF4 file description elements • Complete testing and documentation • Support deployment, fix bugs and add features as needed

  23. Task E: Implement demo reader

  24. Demo Reader Requirements • Multiplatform command-line tool • Easy to use: clear arguments and output • Must validate that objects in the mapping file are actually in the HDF4 file • Developed in a well-supported high-level language (Python) • Well documented • Available as open source
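One cheap part of the validation requirement is confirming that every byte range the map claims for an object actually falls inside the HDF4 file. This is a necessary check, not a sufficient one: it does not prove the bytes hold the described object. A minimal sketch, assuming byte ranges are given as `(offset, nBytes)` pairs. The function name is hypothetical.

```python
def validate_streams(file_size: int, streams: list[tuple[int, int]]) -> list[str]:
    """Describe each (offset, nBytes) pair that cannot lie inside a file
    of `file_size` bytes. An empty list means every range passed."""
    problems = []
    for offset, nbytes in streams:
        if offset < 0 or nbytes < 0:
            problems.append(f"negative offset or length at offset {offset}")
        elif offset + nbytes > file_size:
            problems.append(
                f"stream at offset {offset} ({nbytes} bytes) runs past "
                f"the end of the file ({file_size} bytes)"
            )
    return problems
```

In a command-line wrapper, `file_size` would come from `os.path.getsize` on the HDF4 file, and a non-empty result would be printed and turned into a non-zero exit status.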

  25. Demo reader activities • Develop requirements, based on the new schema and identification of additional deployment needs • Design the reader, based on the requirements and a review of the prototype design • Implement and document the reader • Test the reader on the EOS file “zoo” • Deposit the reader, documentation, and tests in an open source repository, probably SourceForge

  26. Demo Reader Status • Status • Support provided so far for Vdata, SDS, Group, and Attribute • Current source code available at http://sourceforge.net/projects/hdf4mapreader/ • Documentation at http://hdf4mapreader.sourceforge.net/ • Plans • Add raster image (RIS) and palette support
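Vdata support means turning a mapped byte range into table records. A minimal sketch, assuming each record's fields are packed back to back in big-endian order with no padding and that every field is a single scalar. Real Vdata fields can have an order greater than one, which this sketch does not handle, and `decode_vdata` is a hypothetical name, not the demo reader's API.

```python
import struct


def decode_vdata(data: bytes, fields: list[tuple[str, str]]) -> list[dict]:
    """Split packed Vdata bytes into one dict per record.

    `fields` lists (name, struct format character) in record order, e.g.
    [("time", "d"), ("count", "i")]. Each format must be a single scalar
    character so that values and field names line up one to one.
    """
    # '>' selects big-endian with standard sizes and no alignment padding.
    record = struct.Struct(">" + "".join(fmt for _, fmt in fields))
    if len(data) % record.size != 0:
        raise ValueError("data length is not a whole number of records")
    names = [name for name, _ in fields]
    return [dict(zip(names, values)) for values in record.iter_unpack(data)]
```

The field names and types would come from the map file. Only the byte offsets and record layout are needed to recover the table.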

  27. Task G: Deploy

  28. Task G: Deploy • Begin in April 2011, complete in June • The HDF Group • Provide h4mapwriter map generation tool • Maintain tool during deployment and validation • Assist GES DISC, NSIDC, and Raytheon with deployment and validation • Raytheon • Validate HDF4 map software in anticipation of future deployment • GES DISC and NSIDC: see next slide

  29. What about GES DISC and NSIDC? • Activities (formerly): • GES DISC • Incorporate into the existing archive ingest system • Manage the retrofit into existing metadata files • NSIDC • Support implementation in NSIDC’s ECS system • Other ESDCs • Encouraged to join in • However, deployment to other centers is expected only after the project ends • Ruth Duerr’s observation: • The task for NSIDC is to assist in the ECS implementation at NSIDC, which won't take place until 2012 • Task G only includes the work up to the handoff to ECS • Thus, NSIDC's work needs to extend beyond the period of performance of this award • How do we resolve that issue?

  30. Beyond July 15

  31. Future work • NSIDC • assist in the ECS deployment at NSIDC • GES DISC: • ? • The HDF Group • Monitor deployment activities by Raytheon and others to identify • Unsupported objects and tags occurring in products • Software defects • Feature requests • As needed, fix defects, add features, and add support for new objects and tags • Address performance issues • Add h4mapwriter tool and supporting API to regular HDF4 testing regime • Perform other services in support of the software as needed • All • Perform post mortem and identify lessons learned • Write paper summarizing the project • Investigate HDF5 mapping

  32. The End

  33. Acknowledgements This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
