
Auditing Distributed Digital Preservation

Website: http://safearchive.org Contact: Jonathan Crabtree <jonathan_crabtree@unc.edu>, Micah Altman <micah_altman@harvard.edu>



Presentation Transcript


  1. Auditing Distributed Digital Preservation. Micah Altman (MIT Libraries), Jonathan Crabtree (Odum Institute, UNC Chapel Hill). Prepared for Aligning Digital Preservation across Nations, Amsterdam 2013. Website: http://safearchive.org Contact: Jonathan Crabtree <jonathan_crabtree@unc.edu>, Micah Altman <micah_altman@harvard.edu>

  2. Collaborators* (*and co-conspirators) • Micah Altman, Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Tom Carsey, Thu-Mai Christian, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Sophia Lafferty-Hess, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrarra, Terry Rowland, Akio Sone, Bob Treacy • Research Support: thanks to the IMLS (LG-05-09-0041-09), the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.

  3. Related Work (reprints available from: http://futurelib.org) • Altman, M., and Crabtree, J. 2011. "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS." Proceedings of Archiving 2011. • Christian, T., Crabtree, J., McGovern, N., et al. (Forthcoming). "Overview of SafeArchive: An Open-Source System for Automatic Policy-Based Collaborative Archival Replication." Proceedings of iPres 2011. • Altman, M., Beecher, B., and Crabtree, J.; with Andreev, L., Bachman, E., Buchbinder, A., Burling, S., King, P., and Maynard, M. 2009. "A Prototype Platform for Policy-Based Archival Replication." Against the Grain 21(2): 44-47. • Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., and Young, C. 2009. "Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist 72(1): 169-182.

  4. Managing copies can be challenging

  5. Why distributed digital preservation?

  6. Potential Nexuses for Preservation Failure (Source: Reich & Rosenthal 2005) • Technical: media failure (storage conditions, media characteristics); format obsolescence; preservation infrastructure software failure; storage infrastructure software failure; storage infrastructure hardware failure • External Threats to Institutions: third-party attacks; institutional funding; change in legal regimes • Quis custodiet ipsos custodes?: unintentional curatorial modification; loss of institutional knowledge and skills; intentional curatorial de-accessioning; change in institutional mission

  7. Why was SafeArchive Created? Verified, geographically distributed replication of content is an essential component of any comprehensive digital preservation plan. This requirement has become a necessity for recognition and certification as a trusted repository.

  8. What can you do with SafeArchive? • Analyze any existing set of public LOCKSS systems or a Private LOCKSS Network • which collections are replicated? • when were they last verified and updated? • identify potential problems with the storage network • Create formal TRAC policies • create operational policies for replication and distribution • create advisory policies for all TRAC criteria • Audit your storage network against your policies • verify that collections are currently replicated, verified, and updated • create historical audit trails and evidence of long-term compliance • Replicate content from web sites or digital repository systems • use SafeArchive/DVN plugins to replicate content in the Dataverse Network • use SafeArchive/LOCKSS plugins to replicate content through OAI or HTTP • Automatically deploy and repair LOCKSS replication based on policy

  9. Why use SafeArchive? • SafeArchive provides the reliability of a top-down replication system with the resiliency of a peer-to-peer model • SafeArchive automates high-level replication and distribution policies • SafeArchive automates multi-institutional replication • SafeArchive facilitates sharing TRAC policies • SafeArchive provides verification and audit trails for replication policies • SafeArchive is open source and integrates with LOCKSS and the Dataverse Network • SafeArchive is standards-based and supports DDI, OAI-PMH, and TRAC

  10. Latest Research: Lessons Learned • Lesson 1: Replication agreement does not prove collection integrity → seek external evidence of correct harvesting • Lesson 2: Replication disagreement does not prove collection corruption → seek diagnostics • Lesson 3: Distributed digital preservation works, with evidence-based tuning and adjustment

  11. Lessons Learned (cont.) • Lesson 4: All networks had substantial and unrecognized gaps → trust, but continuously verify • Lesson 5: Don't aim for 100% performance; aim for 100% compliance • Lesson 6: Many different things can go wrong in distributed systems without easily recognizable external symptoms → distributed preservation requires distributed auditing and analysis • Lesson 7: External information on system operation and collection characteristics is important for analyzing results → transparency helps preservation

  12. Potential Alignment Areas • Sharing experiences and solutions • Sharing auditing tools • Expanding tool sets to additional audit standards • Developing standardized audit interfaces to distributed digital preservation networks

  13. Future SafeArchive Possibilities • Support additional audit standards: Data Seal of Approval; ISO 16363 • Support additional replication networks: iRODS; Data Conservancy; others? • Audit other policy sets: data management policies; IRB policies

  14. Questions • Website: www.safearchive.org • SourceForge: http://safearchive.sourceforge.net/ • Contacts: Micah.Altman@gmail.com, Jonathan_Crabtree@unc.edu
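
Editorial illustration for slide 8 ("Audit your storage network against your policies"): the slides do not show how an audit is computed, so the following is only a minimal sketch of the general idea, not SafeArchive's implementation or schema. The names `ReplicaStatus`, `Policy`, and `audit`, and the three policy parameters, are all hypothetical. The sketch counts only recently verified copies and reports where a collection falls short of the required number of replicas or regions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReplicaStatus:
    """Hypothetical status report for one copy of a collection on one host."""
    collection: str
    host: str
    region: str
    last_verified: datetime


@dataclass(frozen=True)
class Policy:
    """Hypothetical operational policy: how many copies, where, how fresh."""
    min_replicas: int
    min_regions: int
    max_verification_age: timedelta


def audit(collection, replicas, policy, now):
    """Return a list of policy violations for one collection (empty if compliant)."""
    # Only copies verified recently enough count toward the policy.
    fresh = [
        r for r in replicas
        if r.collection == collection
        and now - r.last_verified <= policy.max_verification_age
    ]
    findings = []
    if len(fresh) < policy.min_replicas:
        findings.append(
            f"{collection}: {len(fresh)} verified replicas, "
            f"policy requires {policy.min_replicas}"
        )
    # Geographic distribution is judged on the fresh copies only.
    regions = {r.region for r in fresh}
    if len(regions) < policy.min_regions:
        findings.append(
            f"{collection}: {len(regions)} regions, "
            f"policy requires {policy.min_regions}"
        )
    return findings
```

Running `audit` on a schedule and storing each result with its timestamp would give a simple form of the historical audit trail mentioned on the same slide. A copy whose last verification is older than `max_verification_age` is treated as missing, which is one way to apply the "trust, but continuously verify" lesson from slide 11.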
