Chronopolis: Preserving Our Digital Heritage

David Minor, UC San Diego, San Diego Supercomputer Center


Presentation Transcript


  1. Chronopolis: Preserving Our Digital Heritage
     David Minor, UC San Diego, San Diego Supercomputer Center

  2. What is Chronopolis?
     • A digital preservation network developed by a national consortium, with initial funding from the Library of Congress / National Digital Information Infrastructure and Preservation Program (NDIIPP).
     • Chronopolis partners are:
       • San Diego Supercomputer Center (SDSC) and the UC San Diego (UCSD) Libraries
       • University of Maryland Institute for Advanced Computer Studies (UMIACS)
       • National Center for Atmospheric Research (NCAR) in Boulder, Colorado
     http://chronopolis.sdsc.edu

  3. Chronopolis Fast Facts
     • Digital preservation environment using a data grid framework
     • Designed to leverage capabilities at multiple institutions
     • Emphasizes heterogeneous and redundant data storage systems
     • Has a current storage capacity of 150 TB (50 TB at each of 3 nodes)
     • Has geographically distributed copies of all data
     • Includes detailed monitoring and monthly auditing of all data

  4. Institutional Roles
     • All partners provide:
       • Storage, network support
       • Complete copy of all data
       • SRB support
     • UCSD Libraries:
       • Metadata expertise
     • SDSC:
       • Project management
       • Finances, contracts, etc.
     • UMIACS:
       • Preservation tool development
       • Storage technology testing
     • NCAR:
       • Data portal development

  5. Data Providers
     • California Digital Library
       • 12 TB of data
       • Crawls of political and government web sites
       • ARC files, uniform size
       • BagIt protocol for data transfer
     • Inter-university Consortium for Political and Social Research (ICPSR)
       • 10 TB of data
       • 40+ years of social science research
       • Millions of files
       • Already using SRB

  6. Data Providers
     • North Carolina State University Libraries
       • 6 TB of data
       • State and local geospatial data
       • BagIt protocol for data transfer
     • Scripps Institution of Oceanography
       • 1 TB of data
       • 50 years of data from SIO research cruises
       • Already using SRB
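As a quick check on the figures in slides 3, 5, and 6, the sketch below sums the listed collection sizes and compares the total with the per-node capacity. It assumes that every node holds one full copy of all data (as slide 4 states) and that the rounded sizes on the slides are exact; the variable names are illustrative only.

```python
# Collection sizes in TB, as listed on slides 5 and 6.
collections_tb = {
    "California Digital Library": 12,
    "ICPSR": 10,
    "North Carolina State University Libraries": 6,
    "Scripps Institution of Oceanography": 1,
}

NODE_CAPACITY_TB = 50  # slide 3: 150 TB total, 50 TB per node
NODE_COUNT = 3

total_tb = sum(collections_tb.values())      # size of one full copy
replicated_tb = total_tb * NODE_COUNT        # assumption: one full copy per node
headroom_tb = NODE_CAPACITY_TB - total_tb    # unused capacity at each node

print(f"one copy: {total_tb} TB, all copies: {replicated_tb} TB, "
      f"headroom per node: {headroom_tb} TB")
```

Under these assumptions the four collections occupy 29 TB per node, or 87 TB of the 150 TB overall.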

  7. Core Chronopolis Tools
     • Storage Resource Broker (SRB)
     • BagIt
     • SRB Replication Monitor
     • Auditing Control Environment (ACE)
     • Chronopolis Web Portal

  8. Storage Resource Broker
     • The underlying infrastructure of Chronopolis
     • Each site is a separate zone with its own MCAT and management
     • Data is replicated at each zone
     • Will be moving to iRODS in the next few months

  9. BagIt
     BagIt is a hierarchical file packaging format for the exchange of generalized digital content.
     • There is no software to install
     • Consists of a base directory with a manifest file and a subdirectory with content
     • The manifest file has a row for each content file with:
       • Full path in the content directory
       • A checksum for the file
     Holey Bags
     • Have an additional 'fetch.txt' file in the base directory and an empty content directory
     • URLs for each content file are listed in the fetch.txt file
     • Can reduce transfer time by fetching content in parallel
     http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
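The manifest layout described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling Chronopolis used: it assumes a content directory named `data`, SHA-256 checksums, and a manifest named `manifest-sha256.txt`, and it omits the `bagit.txt` declaration that a complete bag also carries. The function names and the sample file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Write one 'checksum  path' row per file under bag_dir/data."""
    rows = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            rows.append(f"{file_checksum(path)}  {relative}")
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> list[str]:
    """Return paths that are missing or whose checksum differs from the manifest."""
    failures = []
    manifest = bag_dir / "manifest-sha256.txt"
    for row in manifest.read_text(encoding="utf-8").splitlines():
        if not row.strip():
            continue
        expected, relative = row.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or file_checksum(target) != expected:
            failures.append(relative)
    return failures


def demo() -> tuple[list[str], list[str]]:
    """Build a one-file bag, verify it, corrupt the file, and verify again."""
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example_bag"
        (bag / "data").mkdir(parents=True)
        payload = bag / "data" / "crawl-0001.arc"  # hypothetical file name
        payload.write_bytes(b"example content")
        write_manifest(bag)
        before = verify_manifest(bag)
        payload.write_bytes(b"corrupted content")
        after = verify_manifest(bag)
        return before, after
```

Calling `demo()` returns an empty failure list for the intact bag and the path of the altered file after corruption, which is the check a receiving site can run once a transfer completes.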

  10. BagIt (diagram; image not included in the transcript)
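The holey-bag idea from slide 9 (a fetch.txt file listing a URL for each content file, fetched in parallel) might look like the sketch below. It assumes the line layout of the BagIt specification, `URL LENGTH FILENAME`, with `-` allowed for an unknown length. The download step is passed in as a function so the example runs without network access; `fetch_one`, the worker count, and the example.org URLs are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple, Optional


class FetchEntry(NamedTuple):
    url: str
    length: Optional[int]  # None when fetch.txt gives "-" for the length
    path: str              # destination path inside the bag


def parse_fetch_txt(text: str) -> list[FetchEntry]:
    """Parse 'URL LENGTH FILENAME' lines, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        url, length, path = line.split(maxsplit=2)
        entries.append(FetchEntry(url, None if length == "-" else int(length), path))
    return entries


def fetch_all(entries: list[FetchEntry],
              fetch_one: Callable[[str], bytes],
              workers: int = 4) -> dict[str, bytes]:
    """Fetch every entry with a thread pool and map bag paths to content."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        payloads = pool.map(lambda entry: fetch_one(entry.url), entries)
        return {entry.path: data for entry, data in zip(entries, payloads)}


# Placeholder fetch.txt content and a stand-in for a real HTTP download.
sample = (
    "http://example.org/crawl-0001.arc 15 data/crawl-0001.arc\n"
    "http://example.org/crawl-0002.arc - data/crawl-0002.arc\n"
)
entries = parse_fetch_txt(sample)
contents = fetch_all(entries, fetch_one=lambda url: url.encode("utf-8"))
```

In a real transfer, `fetch_one` would perform the download, and the fetched files would then be checked against the bag's manifest.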

  11. SRB Replication Monitor
     • Product of UMIACS
     • A webapp that watches registered directories and ensures that copies exist at designated mirrors
     • The monitor stores enough information to know whether files have been added to or removed from the master site, and when each file was last seen
     • Any action that the webapp takes on files is logged
     • The monitor does NOT do any type of integrity checking; this is the responsibility of other components (e.g., ACE)
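The bookkeeping described on this slide can be illustrated with a small sketch. It is not the UMIACS implementation: directory listings are modeled as plain sets of file names, the "actions" are only log entries, and `check_mirror` is a hypothetical name. In keeping with the slide, nothing here inspects file content.

```python
from datetime import datetime, timezone


def check_mirror(master: set[str], mirror: set[str],
                 last_seen: dict[str, datetime], log: list[str],
                 now: datetime) -> tuple[list[str], list[str]]:
    """Compare one master listing with one mirror listing.

    Returns (files missing at the mirror, files no longer on the master).
    """
    for name in master:
        last_seen[name] = now  # record when each file was last observed
    missing = sorted(master - mirror)
    removed = sorted(set(last_seen) - master)
    for name in missing:
        log.append(f"{now.isoformat()} copy {name} to mirror")
    for name in removed:
        log.append(f"{now.isoformat()} {name} no longer on master")
    return missing, removed


# Two monitoring passes with arbitrary example timestamps.
last_seen: dict[str, datetime] = {}
log: list[str] = []
t1 = datetime(2009, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2009, 1, 2, tzinfo=timezone.utc)

first = check_mirror({"a.arc", "b.arc"}, {"a.arc"}, last_seen, log, t1)
second = check_mirror({"a.arc"}, {"a.arc", "b.arc"}, last_seen, log, t2)
```

The first pass reports `b.arc` as missing from the mirror. The second reports that `b.arc` has disappeared from the master, while `last_seen` still records when it was last observed.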

  12. Replication Process (diagram labeled "Replication Monitor"; image not included in the transcript)

  13. Auditing Control Environment (ACE)
     • Product of UMIACS
     • Software to protect the integrity of digital assets in the long term
     • Underpinnings are based on rigorous cryptographic techniques
     • Scalable, cost-effective, can interoperate with any archiving architecture

  14. ACE – Overview
     [Diagram labels: object; Hash (obj); Client; ACE-AM (Audit Manager); ACE-IMS (Integrity Management Service); Integrity Token; 3rd Party Auditor]

  15. ACE Audit
     • Can audit millions of files and TBs of data
     • Two types of audit:
       • File audit: checks files in registered directories against stored hashes to ensure files have not been corrupted
       • Token audit: checks the stored hashes against a remote Integrity Management Server to ensure nobody has tampered with the stored hashes
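A toy model can show why the two audit types complement each other. It is a deliberate simplification and not ACE's actual protocol: files and hashes live in dictionaries, the remote Integrity Management Server is represented by a plain copy of the hashes rather than by integrity tokens, and all names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def file_audit(files: dict[str, bytes], stored_hashes: dict[str, str]) -> list[str]:
    """Names of files that are missing or no longer match their stored hash."""
    return sorted(
        name for name, expected in stored_hashes.items()
        if name not in files or sha256_hex(files[name]) != expected
    )


def token_audit(stored_hashes: dict[str, str],
                remote_records: dict[str, str]) -> list[str]:
    """Names whose locally stored hash disagrees with the remote record."""
    return sorted(
        name for name, local in stored_hashes.items()
        if remote_records.get(name) != local
    )


files = {"a.dat": b"alpha", "b.dat": b"beta"}
stored = {name: sha256_hex(data) for name, data in files.items()}
remote = dict(stored)  # stand-in for records held by the remote server

# Case 1: a file is silently corrupted; the file audit catches it.
files["b.dat"] = b"bet4"
corrupted = file_audit(files, stored)

# Case 2: a file AND its stored hash are both altered; the file audit
# no longer flags a.dat, but the token audit does.
files["a.dat"] = b"forged"
stored["a.dat"] = sha256_hex(b"forged")
still_corrupted = file_audit(files, stored)
tampered = token_audit(stored, remote)
```

In the second case the file audit alone would accept `a.dat`, because the file matches its rewritten hash; only the comparison against the independent remote record exposes the change.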

  16. ACE Audit
     1. Each digital object is audited locally using the integrity token, according to the policy set by the local manager.
     2. The integrity management system periodically audits the integrity tokens according to its policies.
     3. Cryptographic summaries are audited as necessary using the published witness values.
     [Diagram labels: Object; Integrity Token; Cryptographic Summary Information; Witness]

  20. Web Portal
     • Designed to give data providers an in-depth look at their holdings
     • Shows where data is in all locations
     • Unifies information from SRB, ACE and the Replication Monitor

  21. Chronopolis Metadata
     • Working with team from UCSD Libraries
     • What technical metadata is the system tracking?
     • What descriptive metadata is present?
     • What are the significant events?

  22. [Diagram; labeled events, in numeric order]
     • ET-1 Service Level Agreement
     • ET-2 Acquisition Transfer
     • ET-3 Acquisition Validation
     • ET-4 Acquisition Registration to SRB MCAT
     • ET-5 Acquisition Registration into ACE
     • ET-6 Inter-Node Inventory Check
     • ET-7 Acquisition Replication
     • ET-8 File Integrity Check
     [Other diagram labels: DP; Node 1; Node 2; Node 3; ACE; Replication Monitor; Manifest; Data]

  23. Future directions
     • Updated auditing procedures
     • Updated portal
     • Automation of collection ingest
     • New collections and storage nodes
     • Fully-fledged business model
     • TRAC certification

  24. http://chronopolis.sdsc.edu
     minor@sdsc.edu
