
Preservation DataStores (PDS): A demonstration of Preservation Aware Storage


Presentation Transcript


  1. Preservation DataStores (PDS): A demonstration of Preservation Aware Storage. Contacts: Simona Cohen, Michael Factor, Dalit Naor. http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml

  2. PDS Overview
     [Figure labels: Preservation Descriptive Information (Reference, Provenance, Context, Fixity); Content Information (Content Data Object, Representation Information)]
     • OAIS-based, preservation-aware, media-agnostic, generic storage to support logical preservation
     • Manage preservation-specific metadata
     • Compute fixity
     • Update technical provenance
     • Manage PDI RepInfo
     • Ensure referential integrity
     • Storlet container
     • Module container that can execute restricted modules with predefined interfaces for data-intensive functions, e.g., transformations, fixity calculations
     • Optimal scheduling
     • Update PDS modules, e.g., fixity algorithm, packaging format
     • Managing availability / data loss
     • Physically co-locate data and metadata
     • Cluster related AIPs on the same media unit, based upon their relative importance
     • AIP identifier generation: globally unique identifiers
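The fixity and identifier bullets above can be illustrated with a minimal sketch. It assumes SHA-256 as the default fixity algorithm and a UUID URN as the globally unique AIP identifier; the slides name neither, and the function names and record layout are hypothetical. Passing the algorithm by name loosely mirrors the slide's point that the fixity algorithm can be updated.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def compute_fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Return a fixity record for a byte stream (hypothetical record layout)."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {
        "algorithm": algorithm,
        "digest": digest,
        "computed_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(data: bytes, record: dict) -> bool:
    """Recompute the digest with the recorded algorithm and compare."""
    return hashlib.new(record["algorithm"], data).hexdigest() == record["digest"]


def generate_aip_id() -> str:
    """Generate a globally unique identifier (assumption: a UUID4 URN)."""
    return uuid.uuid4().urn
```

Storing the algorithm name inside the record means an old record can still be verified after the default algorithm changes.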

  3. PDS Architecture
     [Figure labels: ingestAIP, accessAIP, transformAIP, getPreservationPolicy]
     • Layered approach based on open standards: OAIS, XAM, OSD
     • In CASPAR, all layers are utilized; in other embodiments, only some layers may be used
     • Utilizes XAM to provide a logical abstraction of containers (XSets)
     • Offloads preservation functionality to storage

  4. AIP Representation in the Preservation Engine
     • The basic building block of an AIP is the Information Object
     • A single instance of data is a byte stream
     • Zero or more instances of RepInfo to interpret the data
     • RepInfo is also an AIP
     • Shared resource with unique ID
     • RepInfo recursive network
     • Keep the design simple
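The recursive RepInfo network described above can be sketched as a small data structure. This is an illustration only: the class name, field names, and the example identifiers are hypothetical, and an in-memory dictionary stands in for the store.

```python
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """One byte stream plus references to the RepInfo AIPs that interpret it."""
    aip_id: str
    data: bytes
    rep_info_ids: list[str] = field(default_factory=list)


def resolve_rep_info(aip_id: str, store: dict[str, InformationObject]) -> list[str]:
    """Return the IDs of all RepInfo reachable from aip_id, depth-first.

    The visited set handles shared RepInfo and guards against cycles.
    """
    seen: set[str] = {aip_id}
    order: list[str] = []

    def visit(current: str) -> None:
        for rid in store[current].rep_info_ids:
            if rid not in seen:
                seen.add(rid)
                order.append(rid)
                visit(rid)

    visit(aip_id)
    return order


# Hypothetical example: a data AIP interpreted by a format description,
# which is itself an AIP interpreted by another description.
store = {
    "aip:data": InformationObject("aip:data", b"...", ["aip:format-spec"]),
    "aip:format-spec": InformationObject("aip:format-spec", b"...", ["aip:doc-spec"]),
    "aip:doc-spec": InformationObject("aip:doc-spec", b"..."),
}
```

Because RepInfo is referenced by ID rather than embedded, one RepInfo AIP can be shared by many data AIPs, which matches the "shared resource with unique ID" bullet.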

  5. Preservation of an AIP
     • Ingest AIP
     • Storing an AIP in PDS
     • Bit and logical preservation of an AIP
     • Bit migrations
     • Format migrations (data transformation)
     • Transformation modules are packed as AIPs and preserved
     • Transformation result is a new version of the original AIP
     • Migrations are documented as Provenance records
     • During migrations PDS performs operations on the AIP
     • Update PDI (e.g., fixity calculation, additional provenance events)
     • Execute previously loaded storlets
     • Access AIP
     • Retrieval of an AIP
     • By retrieval time, the original AIP may have several versions and copies

  6. Demo

  7. MST Data
     Natural Environment Research Council (NERC) Mesosphere-Stratosphere-Troposphere (MST) Radar at Aberystwyth
     • Provides measurements of vertical and horizontal components of the wind
     • Highly complex data
     • Preserved for the atmospheric scientific community
     • Needs long-term preservation for research purposes
     • Community is highly dependent on interoperability

  8. British Atmospheric Data Centre and Self-Describing Data Formats
     • BADC strongly advocate the use of self-describing files to capture provenance and semantic information
     • Two self-describing file formats compete in the Atmospheric Science Community
     • NASA AMES (ASCII)
     • NetCDF (binary)
     • The current user community finds NetCDF more suitable

  9. Demo Flow: View AIP -> Ingest AIP -> (10 years later) Load and apply transformation -> (20 years later) Access and view transformed data

  10. pds_view Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml

  11. Ingest AIP flow in PDS
      [Figure: layered call flow; the assignment of steps to layers below is reconstructed from the slide labels]
      • PDS web service: ingest (AIP), returns the AIP ID
      • Preservation Engine: unpack AIP; generate AIP ID; compute fixity; add “ingest” provenance event; store (XUID, AIP ID) mapping persistently
      • XAM: create XSet; write fields (AIP data); commit; returns the XUID
      • XAM to OSD (VIM): create OSD objects; write (XSet data); store (OID, XUID) mapping persistently
      • OSD
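As a rough illustration of the ingest flow on this slide, the sketch below chains the two mappings (AIP ID to XUID, XUID to OID) through three toy layers. All class and method names are hypothetical and are not the XAM or OSD APIs; in-memory dictionaries stand in for persistent mappings, SHA-256 is an assumed fixity algorithm, and the unpack step is omitted.

```python
import hashlib
import uuid


class ToyObjectStore:
    """Stand-in for the OSD layer: maps object IDs (OIDs) to bytes."""

    def __init__(self):
        self.objects = {}

    def write(self, payload: bytes) -> str:
        oid = f"oid-{len(self.objects) + 1}"
        self.objects[oid] = payload
        return oid


class ToyXSetLayer:
    """Stand-in for XAM plus the XAM-to-OSD layer: keeps the (OID, XUID) mapping."""

    def __init__(self, osd: ToyObjectStore):
        self.osd = osd
        self.xuid_to_oid = {}

    def create_and_commit(self, payload: bytes) -> str:
        xuid = f"xuid-{uuid.uuid4()}"
        self.xuid_to_oid[xuid] = self.osd.write(payload)
        return xuid


class ToyPreservationEngine:
    """Stand-in for the Preservation Engine: keeps the (XUID, AIP ID) mapping."""

    def __init__(self, xam: ToyXSetLayer):
        self.xam = xam
        self.aip_to_xuid = {}
        self.metadata = {}

    def ingest(self, content: bytes) -> str:
        aip_id = uuid.uuid4().urn                      # generate AIP ID
        self.metadata[aip_id] = {
            "fixity": hashlib.sha256(content).hexdigest(),  # compute fixity
            "provenance": ["ingest"],                  # add "ingest" event
        }
        self.aip_to_xuid[aip_id] = self.xam.create_and_commit(content)
        return aip_id
```

Each layer only knows the identifier of the layer directly below it, so the caller holds nothing but the AIP ID.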

  12. Demo Flow: View AIP -> Ingest AIP -> (10 years later) Load and apply transformation -> (20 years later) Access and view transformed data

  13. pds_ingest Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml

  14. 10 Years Later...

  15. Format Change
      • Converting NetCDF to NASA-AMES
      • The MST NetCDF file version is no longer supported by the NetCDF foundations libraries, and files are not backwards compatible with this NetCDF version
      • The skills base of the atmospheric scientist in the future is heavily biased towards the use of ASCII-based file formats
      • The NASA-AMES format has become the preferred data format of the atmospheric scientists’ community
      • Converting NetCDF to PNG
      • Provide additional quick image view of the data for users with limited access permissions

  16. Load transformation flow in PDS
      • PDS web service: loadTransformation (AIP), returns the AIP ID
      • Preservation Engine: register transformation; ingest transformation AIP

  17. Transform AIP flow in PDS
      • PDS web service: transformAip (targetAIPID, transformationAIPID), returns the AIP ID
      • Preservation Engine: access transformation AIP; access target AIP; execute transformation; ingest result as new AIP, a version of the target AIP
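The load and transform flows on slides 16 and 17 can be sketched together. This is a toy model under stated assumptions: a Python callable stands in for the transformation module, the class, method, and field names are hypothetical, and the byte-uppercasing transformation is a placeholder for a real format conversion.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Aip:
    aip_id: str
    data: bytes
    version_of: Optional[str] = None
    provenance: list = field(default_factory=list)


class ToyEngine:
    def __init__(self):
        self.aips: dict[str, Aip] = {}
        self.transformations: dict[str, Callable[[bytes], bytes]] = {}

    def ingest(self, data: bytes, version_of=None, provenance=None) -> str:
        aip_id = uuid.uuid4().urn
        self.aips[aip_id] = Aip(aip_id, data, version_of, list(provenance or []))
        return aip_id

    def load_transformation(self, module_bytes: bytes,
                            fn: Callable[[bytes], bytes]) -> str:
        # Ingest the transformation as an AIP, then register it under that ID.
        t_id = self.ingest(module_bytes)
        self.transformations[t_id] = fn
        return t_id

    def transform_aip(self, target_id: str, transformation_id: str) -> str:
        fn = self.transformations[transformation_id]  # access transformation AIP
        target = self.aips[target_id]                 # access target AIP
        result = fn(target.data)                      # execute transformation
        event = {"event": "transformation",
                 "source": target_id,
                 "transformation": transformation_id}
        # Ingest the result as a new AIP that is a version of the target.
        return self.ingest(result, version_of=target_id,
                           provenance=target.provenance + [event])


engine = ToyEngine()
original_id = engine.ingest(b"abc")
upper_id = engine.load_transformation(b"<module bytes>", bytes.upper)
new_id = engine.transform_aip(original_id, upper_id)
```

The original AIP is left untouched; the new version points back to it through `version_of` and carries the migration as a provenance event.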

  18. Demo Flow: View AIP -> Ingest AIP -> (10 years later) Load and apply transformation -> (20 years later) Access and view transformed data

  19. pds_xform Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml

  20. 20 Years Later...

  21. Access AIP flow in PDS
      [Figure: layered call flow; the assignment of steps to layers below is reconstructed from the slide labels]
      • PDS web service: access (AIP ID), returns the AIP
      • Preservation Engine: validate AIP ID; lookup XUID by AIP ID; validate fixity; add “access” provenance event; package AIP
      • XAM: open XSet (XUID); read fields; returns the AIP data
      • XAM to OSD (VIM): lookup OSD OID by XUID; read OSD objects; returns the XSet data
      • OSD
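The Preservation Engine steps of the access flow (validate the AIP ID, validate fixity, add an "access" provenance event, package the AIP) might look like the following sketch. The store layout, the `FixityError` class, and the returned package shape are all hypothetical, and SHA-256 is an assumed fixity algorithm.

```python
import hashlib
from datetime import datetime, timezone


class FixityError(Exception):
    """Raised when stored data no longer matches its recorded digest."""


def access_aip(aip_id: str, store: dict) -> dict:
    # Validate AIP ID.
    if aip_id not in store:
        raise KeyError(f"unknown AIP ID: {aip_id}")
    record = store[aip_id]

    # Validate fixity before handing any data back.
    actual = hashlib.sha256(record["data"]).hexdigest()
    if actual != record["fixity"]:
        raise FixityError(f"fixity mismatch for {aip_id}")

    # Add "access" provenance event.
    record["provenance"].append(
        {"event": "access", "at": datetime.now(timezone.utc).isoformat()}
    )

    # Package AIP (here: a plain dictionary with a copy of the provenance).
    return {
        "aip_id": aip_id,
        "data": record["data"],
        "fixity": record["fixity"],
        "provenance": list(record["provenance"]),
    }
```

Checking fixity before recording the access event means a corrupted object never gains an "access" entry in its provenance.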

  22. Demo Flow: View AIP -> Ingest AIP -> (10 years later) Load and apply transformation -> (20 years later) Access and view transformed data

  23. pds_access Video: http://www.haifa.il.ibm.com/projects/storage/ltdp/index.shtml

  24. Thank you! Haifa preservation team: Simona Cohen, Michael Factor, Aner Hamama, Ealan Henis, Dalit Naor, Petra Reshef, Shahar Ronen
