Preserving Digital Records for the Long-Term: Building a Trustworthy Digital Repository at the Archives of Ontario
Association for Manitoba Archives – April 29th, 2011
Ryan Carpenter, Senior Coordinator, Archival Electronic Records, Archives of Ontario
ontario.ca/archives
Agenda
• Archives of Ontario – A Brief Introduction
• Digital Preservation Challenge
• Digital Preservation at the Archives of Ontario
• Trustworthy Digital Repository (TDR)
  • What is it
  • Why do we need it
  • What has been done
  • What is being done
  • What’s next
• TDR & ECM
• Digital Preservation Collaboration
Archives of Ontario: A Brief Introduction
• The Archives was established in 1903.
• Provides leadership to collect, manage and preserve the records of Ontario and to promote and facilitate their use by present and future generations.
• Recently became part of the Information, Privacy and Archives Division of the Corporate Chief Information Office.
• The Archives is made up of three integrated program delivery areas: Collections Development and Management; Customer Service and Outreach; Recordkeeping Support.
The Digital Environment • Digital records encompass email, audiovisual recordings, textual documents, websites, images, etc. • Digital records are pervasive in all aspects of our personal and working life. • The creation of digital information is exploding at an exponential rate. • There are some similarities but many differences between digital and analog records.
The Digital Environment – Government • Ontario Public Service (OPS) digital records experience mirrors what is happening in other jurisdictions. • Currently, 98% of new information created in the OPS is in digital format only. • The implementation of the Enterprise Content Management (ECM) system will shift government recordkeeping from paper to electronic media across the OPS, with the electronic form of the record, rather than the paper record, considered authoritative. • The complexity of long-term digital preservation, coupled with the explosive growth of archival digital records in the next few years, presents the Archives with a critical challenge: the volume of potentially archival digital records across the OPS is roughly estimated to be 100 terabytes by 2013. • Under the Archives and Recordkeeping Act, 2006, the Archives is mandated to preserve and make available archival electronic records for as long as required.
Long-term Digital Preservation - Volume Impact to the Archives • Volume • Approximately 85 TB of electronic information created in OPS in 2011 is of archival value and will potentially have to be transferred to the AO eventually. (Literature suggests that 3-5% of government records (paper records) are archival) • The current total volume of digital records collections in the Archives is 5.5 TB. • The average annual volume increase rate is approximately 400% (1998-2010) • With future ECM implementations in the OPS, there will be more rigour in transferring archival electronic records to the Archives. The OPS is managing about 1.7PB of electronic information in 2011. (Source: Managing Information Assets in the OPS: The Future is Now)
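The 85 TB figure is consistent with applying the upper end of the 3-5% archival share to the 1.7 PB total. A quick check of that arithmetic, assuming decimal units (1 PB = 1000 TB), which the slides do not specify:

```python
# Rough check of the archival-volume estimate quoted on the slide.
# Assumption: decimal units (1 PB = 1000 TB); the slides do not state the convention.
TB_PER_PB = 1000


def archival_volume_tb(total_pb, archival_share):
    """Return the archival portion, in TB, of a total volume given in PB."""
    return total_pb * TB_PER_PB * archival_share


# 3-5% is the archival share the slide cites from the literature on paper records.
low = archival_volume_tb(1.7, 0.03)
high = archival_volume_tb(1.7, 0.05)
print(f"Archival share of 1.7 PB: {low:.0f} to {high:.0f} TB")
```

The same function can be rerun with a different share if the paper-records ratio turns out not to hold for digital records.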
What is Digital Preservation? • Digital Preservation is the management of digital information to ensure it is accessible and understandable over time. OR • Digital Preservation encompasses a broad range of activities designed to extend the usable life of digital files and protect them from media failure, physical loss, and obsolescence. • However, it is one thing to preserve a bitstream, but quite another to preserve the content, form, style, appearance, and functionality.
Digital Preservation Threats • File Format and Software Obsolescence • Hardware and Media Obsolescence • Physical Threats
Digital Preservation Strategies
Basic
• Bitstream Copying (backups)
• Refreshing
• Durable/Persistent Media (e.g. Gold CDs)
• Analog Backups (e.g. microfilm)
Expensive – Not Feasible
• Technology Preservation (‘computer museum’)
• Digital Archaeology (data recovery)
Preferred Approaches
• Migration (most preferred approach currently)
• Normalization (reliance on standard format – PDF/A)
• Emulation (e.g. Universal Virtual Computer)
• Encapsulation (‘wrapping’)
Digital Preservation Standards - ISO • ISO 14721:2003 - Open Archival Information System (OAIS) - Reference model • Metrics for Digital Repository Audit and Certification RED BOOK, CCSDS. Oct 2009 • ISO/TR 18492:2005 - Long-term preservation of electronic document-based information • ISO 19005-1:2005 - Document management - Electronic document file format for long-term preservation - Part 1: Use of PDF 1.4 (PDF/A-1) • ISO 15801 - Electronic imaging - Information stored electronically - Recommendations for trustworthiness and reliability
Digital Preservation at the Archives of Ontario
Existing Archival Digital Records Program Program has existed since 1997. Program is focused on the long-term preservation of archival digital records. 2 full-time employees – Senior Coordinators, Archival Electronic Records. Created Electronic Records Online section of AO website in 2009.
Existing Archival Digital Repository The existing digital repository is on a virtual server maintained by Infrastructure Technology Services (ITS). Current digital holdings are about 5.5 TB, consisting of some 1.5 TB of archival born-digital records and 4 TB of digitized images (mostly VS records). These digital records are in various formats: MS Office documents, e-mails, HTML, digital audio and video files, databases, digital images, websites, etc. The existing repository is not adequate to meet future operational requirements, as it offers little functionality to preserve and secure the digital records properly or to make them accessible online.
Transfer of Digital Records • The Archives of Ontario currently acquires archival digital records from Ontario public bodies and private donors. • The Guideline for Transferring Electronic Records to the Archives of Ontario was revised in September 2009. • The guideline assists with the transfer of archival digital records to the Archives in accordance with an approved records series that has a final disposition of ‘Transfer to Archives’. • It applies to all Ontario government public bodies that are subject to the requirements of the Archives and Recordkeeping Act, 2006.
Transfer of Digital Records – Cont’d Originating public bodies are responsible for ensuring that all digital records in their custody remain readable, accessible, secure, free of viruses, and are able to satisfy legal and evidentiary requirements throughout their lifecycle. Digital records are to be transferred in a software independent format whenever possible, or in a format the Archives finds acceptable. In general, the Archives will not acquire specialized software applications and their ongoing licenses.
Transfer of Digital Records – Cont’d
Transfer Procedures
1. Consult with Archives
2. Identify Records for Transfer
3. Complete a Test Transfer
4. Transfer Official Records and Documentation
5. Confirm Receipt of Records Transfer
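The test transfer and receipt confirmation steps could be supported by comparing checksums taken before and after the transfer. The sketch below is an illustration only, not the Archives' documented procedure: the function names, the choice of SHA-256, and the manifest layout are all assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large records do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(sent, received):
    """Report files that went missing, appeared unexpectedly, or changed in transit."""
    shared = set(sent) & set(received)
    return {
        "missing": sorted(set(sent) - set(received)),
        "unexpected": sorted(set(received) - set(sent)),
        "altered": sorted(name for name in shared if sent[name] != received[name]),
    }
```

In this hypothetical workflow the originating public body would run `build_manifest` before sending, the Archives would run it on receipt, and an empty result from `compare_manifests` in all three categories would support confirming receipt.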
Trustworthy Digital Repository (TDR) – What is it? Definition: ‘a mission to provide reliable, long-term access to managed digital resources to its Designated Community, now and into the future’ Taken from ‘Audit and Certification of Trustworthy Digital Repositories’ - October 2009
TDR - What is it - Cont’d A TDR is a long-term solution for the preservation of digital records of archival value. It will be driven by the Archives’ business requirements and will be modelled on ISO standards and other best practices as well.
TDR - What is it - Key Components The TDR will be modelled on ISO standards: the OAIS Reference Model, and the Audit and Certification of Trustworthy Digital Repositories. The Archives’ TDR will be certified once an international/national certification process is developed.
TDR - Why do we need it? Ensures the Archives meets its mandated statutory obligations as per the Archives and Recordkeeping Act, 2006. Meets the priority for long-term digital preservation as identified in Ontario’s Five Year Corporate I&IT Plan (2008-2013). Meets the government’s priority of strengthening front-line service delivery by greatly improving services to the public at the Archives. TDR will provide ‘anytime, anywhere’ remote 24/7 online access to archival digital records.
TDR - Why do we need it? Cont’d • To preserve any type of electronic record, • Created using any type of application, • On any computing platform, • Delivered on any digital media, • From any public body in the Ontario Government and any private donor, • To provide discovery and delivery to anyone with an interest and legal right of access, • For present and future generations … … Revised from: http://www.archives.gov/era (U.S. National Archives and Records Administration Electronic Records Archives)
TDR - What has been done? • Full Business Case • Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a Commercial Off-the-Shelf (COTS) solution • Request for Information (RFI) for a trusted digital repository solution • Identified 5 vendors with viable long-term digital preservation repository solutions • High-level Functional Requirement Analysis for the future trustworthy digital repository • For main entities and functions of digital repository • IT Governance Process • Gate 0 approval and Gate 1 GGRC endorsement
TDR – What has been done - Full Business Case • Main recommendation: Acquire a Modifiable Off-the-Shelf (MOTS) solution or a Commercial Off-the-Shelf (COTS) solution • Other options which have been analyzed for the development of a TDR are: • Utilize an integrated open source software (OSS) solution • Acquire a commercial custom system • Develop a digital preservation system in-house • Rely on OPS public bodies to preserve archival digital records
TDR – What has been done - Request for Information • The RFI has been well received by potential vendors, with none finding difficulty with the concepts and constructs (such as the OAIS Reference Model and TDR) contained in the RFI document. A wealth of valuable information was received from the 7 respondents. • All 5 TDR-focused submissions meet or exceed the basic requirements for a TDR as outlined in the RFI and demonstrate the availability of modifiable off-the-shelf (MOTS) products on the digital repository market. • The estimated cost of purchasing and implementing such a solution (including software, hardware, customization, integration, and implementation, etc.) varies from $400,000 to $2,000,000. • The adoption of Open Source Software (OSS) applications seems inevitable. Among the 5 TDR-focused submissions, 3 solutions include OSS components, while the other 2 are made up entirely of OSS applications. • The OAIS Reference Model and the other TDR-related standards and best practices are widely accepted and followed by the solution providers. • The use of any proposed solution alone will not guarantee the TDR’s compliance with the OAIS Reference Model and Trustworthy Repositories Audit & Certification.
TDR – What has been done - High-level Functional Requirement Analysis 35 Use Cases were developed for main Entities and Functions of a TDR: • Ingest (7) • Archival Storage (8) • Data Management (4) • Access (4) • Administration (7) • Preservation Planning (5)
TDR - What has been done - High-level Functional Requirement Analysis cont’d [Figure: Use Case Template]
TDR - What has been done - High-level Functional Requirement Analysis cont’d Reengineering of digital records management process is one of the biggest challenges we are facing. We mapped the archival process into OAIS Entities and Functions.
TDR - What has been done - High-level Functional Requirement Analysis cont’d
[Diagram: TDR Structure of Policies and Procedures]
The Archives Fundamental Digital Preservation Policies & Procedures
• Digital Preservation Policy
• Digital Preservation Strategic Plan
• Digital Collection Policy
• Digital Records Selection and Culling Guideline
• Digital Records File Format Guideline
• Digital Preservation Method
• …
TDR Overall Policies & Procedures
• TDR Mission Statement
• TDR Security Policy
• Backup and Recovery Policy
• System Configuration Manual
• TDR Naming / Numbering Convention
• TDR User Access Control
• TDR Contingency Plan
• …
Entity-specific Policies & Procedures (Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning)
• Digital Records Transfer Guideline
• Recommended TDR Media Management Guideline
• TDR Database administration policy
• DIP Packaging Standard
• TDR Import and Export Guideline
• Technology Monitoring Guideline
• TDR AIP Packaging Standard
• TDR AIP Migration Procedure
• …
TDR- What is being done - Open Source Software (OSS) Experiments OSS testing: objectives • Test functionalities of various products • Assess the feasibility of utilizing these tools for interim • Validate and refine the detailed functional requirements for the TDR • Inform revisions to the Archives’ existing digital records guidelines and associated policies • Determine appropriate preservation tools • Further understand our existing electronic records, identify preservation risks, and potential mitigation approaches
TDR - What is being done - Open Source Software (OSS) Experiments – Cont’d
OSS testing: tools to be tested
• Tools which validate file formats and extract technical metadata:
  • DROID (created by The National Archives of the UK)
  • JHOVE (created by Harvard University)
  • NLNZ Metadata Extraction Tool (created by the National Library of New Zealand)
• Tools which convert digital objects to open formats:
  • XENA (created by the National Archives of Australia)
• Tools which manage the object assessment and ingest process:
  • Archivematica (created by Artefactual Systems)
• Preservation testbed environment and project management software:
  • Planets Comparator, Planets Testbed, Planets Plato
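Format identification tools of this kind generally work by matching a file's leading bytes against known signatures rather than trusting its extension. The sketch below illustrates that idea only; it does not reproduce the API or signature registry of DROID, JHOVE, or any other tool listed, and the small `SIGNATURES` table and function names are assumptions made for illustration.

```python
# Illustrative signature table (hypothetical and deliberately tiny);
# production tools rely on much larger, maintained signature registries.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (e.g. OOXML, ODF)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy MS Office)"),
]


def identify_bytes(header):
    """Return the first format whose signature matches the start of header."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path, header_length=16):
    """Read only the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(header_length))
```

A file named `report.doc` that actually begins with `%PDF-` would be reported as PDF, which is the kind of mismatch a technical inventory needs to surface.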
TDR- What is being done - Open Source Software (OSS) Experiments – Cont’d Technical Inventory of Digital Records in the AO’s e-Repository • Identify the file formats and the other technical features of digital records in the Archives holdings • Identify records requiring immediate preservation action • Assess preservation risks of digital records in the Archives’ holdings • Determine priorities for future preservation operations • Inform revisions to current procedures
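A first pass at such an inventory can be as simple as tallying files and bytes per extension and flagging formats on a watch list. This is a minimal sketch under stated assumptions: the function names are hypothetical, extensions are only a rough proxy for format, and any watch list of at-risk formats would have to come from the Archives' own risk assessment.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Tally file counts and total bytes per lower-cased extension under root."""
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            extension = path.suffix.lower() or "(none)"
            counts[extension] += 1
            sizes[extension] += path.stat().st_size
    return counts, sizes


def at_risk(counts, watch_list):
    """Return the counts for extensions that appear on the watch list."""
    return {ext: n for ext, n in counts.items() if ext in watch_list}
```

For example, `at_risk(counts, {".wpd"})` would list how many files use that (hypothetically flagged) extension, giving a starting point for setting preservation priorities.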
TDR – Next Steps? • Work will proceed in-house on developing detailed functional requirements for the TDR. • Explore options for the development of the TDR. • Creation of long-term digital preservation strategy. • Creation of long-term digital preservation policy.
TDR - Detailed Requirements – Preliminary Plan • Deliverables • Detailed requirement specifications for all 6 Entities (Ingest, Archival Storage, Data Management, Access, Preservation Planning and Administration) of a future TDR to be developed and validated • Detailed workflow for the management of archival digital records, starting from receiving, selection, accessioning, through archival description, storage to search and ordering etc. to be developed and validated • Objectives • Provide a sound foundation for the future development and implementation of a TDR in the Archives; • Ensure the future TDR can fit well into the overall Archives business environment, meet actual business requirements, work smoothly with the other IT applications already in place, and • Follow related ISO standards and digital preservation/TDR best practices.
Linkages with ECM • Long-term digital preservation begins at the desktop, with active records. • Proper recordkeeping during all stages of the IM lifecycle will ensure that records can be properly managed in the TDR. • A preservation policy is required to mitigate risks to legacy digital records. • IT and information management areas need to partner to address challenges, incorporating recordkeeping requirements.
Linkages with ECM Cont’d • Elements of a TDR can be applied to non-archival active/semi-active records that have long-term retention requirements. • The TDR ensures the sustainability of an Enterprise Content Management (ECM) strategy by providing a trustworthy exporting channel and permanent repository for archival digital records initially managed by the ECM system.
TDR vs. ECM/RDMS TDR ≠ ECM • Have different objectives. • Use different standards. • Look forward to future developments such as an integrated solution with both records management and long-term digital preservation capabilities.
TDR vs. ECM/RDMS Cont’d
[Diagram: records lifecycle and repositories]
• Lifecycle stages: Active, Semi-active, Inactive
• Repositories: ECM Repositories, Archives’ TDR
• Record groups shown: almost all public electronic records; public electronic records with long retention periods; all archival electronic records that have fulfilled their retention periods
• Flow: transfer of archival electronic records into the Archives' Repository
Digital Preservation Collaboration: Pan-Canadian Efforts & External/Internal Partnerships
Collaboration - Goals • Similar to the Archives of Ontario, other archives and many areas of government are facing preservation challenges. • Promote the awareness of long-term digital preservation. • Bring key stakeholders together. • Collectively share the knowledge gained from the important work being done in the Archives and across government.
National Digital Preservation Working Group (NDPWG)
• The group was established by the Archives of Ontario in August 2008. 8 meetings have been held to date.
• The mandate of the group is to provide a forum for practitioners in the field of digital preservation to share ideas and expertise, and to discuss best practices and lessons learned.
• The membership includes: Saskatchewan, Manitoba, Nova Scotia, Nunavut, Northwest Territories, Yukon, Alberta, and Library and Archives Canada.
• The Archives of Ontario is the current chair of the NDPWG.
Canadian Preservation Cooperation Strategy • Library and Archives Canada (LAC) visited the Archives on July 27th, 2010, to discuss a number of digital preservation projects where they could work collaboratively with the Archives. • Subsequent to the meeting, the Archives, LAC and the Saskatchewan Archives Board agreed to develop a Canadian Preservation Cooperation Strategy on Digital Preservation that outlines the principles of the group and its proposed projects. • Meetings have been held to develop work plans and other planning documents. • The Canadian Preservation Cooperation Strategy was presented at the National, Provincial and Territorial Archivists Conference (NPTAC) on Friday 22 October 2010. • The first joint project is the Canadian Registry of Digital Storage Media; the final draft has been completed.