
Stephen Abrams Harvard University Library stephen_abrams@harvard

Designing Storage Architectures for Preservation Collections Library of Congress, September 17-18, 2007 Preservation and Access Repository Storage Architecture. Stephen Abrams Harvard University Library stephen_abrams@harvard.edu. Digital preservation at Harvard.


Presentation Transcript


  1. Designing Storage Architectures for Preservation Collections. Library of Congress, September 17-18, 2007. Preservation and Access Repository Storage Architecture. Stephen Abrams, Harvard University Library, stephen_abrams@harvard.edu

  2. Digital preservation at Harvard
     • Obligation to ensure the ongoing usability of library digital assets over time
     • Digital Repository Service (DRS)
       • Managed preservation and access repository
       • Seven years of production operation
       • 6.7 million assets (27 TB)
     • Primary strategy: redundancy and heterogeneity
     • Primary challenge: scaling

  3. Scaling: linear or exponential?

  4. Storage classification
     • All managed assets are assigned a storage classification
       • Public use (U): high availability, fast response
       • Archival storage (A): high capacity, low cost
     • Use assets are optimized for web-friendly delivery
     • Archival assets are optimized for longevity
     • Asset classification is known at the point of acquisition

  5. Architectural requirements
     • Each asset is stored:
       • In at least 3 physical locations
       • On at least 2 storage media
       • With at least 2 on-line copies (U) / 1 on-line copy (A)
       • With at least 1 off-line copy
     • Ongoing auditing for bit-level error detection and correction
     • Virtualization layer with uniform interface to all assets, regardless of physical medium
     • Application interface exposed as NFS-mountable file systems
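The copy requirements above can be expressed as a simple check. The following is a minimal sketch, not part of the presentation: the `Copy` record, its field names, and the site and medium labels are hypothetical, and only the numeric minimums come from the slide.

```python
from dataclasses import dataclass

# Minimums taken from the slide; everything else here is illustrative.
# "U" = public use, "A" = archival storage.
MIN_ONLINE = {"U": 2, "A": 1}
MIN_LOCATIONS = 3
MIN_MEDIA = 2
MIN_OFFLINE = 1


@dataclass(frozen=True)
class Copy:
    location: str  # hypothetical site label
    medium: str    # hypothetical medium label, e.g. "disk" or "tape"
    online: bool   # True if the copy is on-line


def policy_violations(storage_class: str, copies: list[Copy]) -> list[str]:
    """Return the unmet requirements for one asset (empty list if compliant)."""
    problems = []
    if len({c.location for c in copies}) < MIN_LOCATIONS:
        problems.append("fewer than 3 physical locations")
    if len({c.medium for c in copies}) < MIN_MEDIA:
        problems.append("fewer than 2 storage media")
    online = sum(c.online for c in copies)
    if online < MIN_ONLINE[storage_class]:
        problems.append(f"fewer than {MIN_ONLINE[storage_class]} on-line copies")
    if len(copies) - online < MIN_OFFLINE:
        problems.append("no off-line copy")
    return problems
```

A check of this kind could run alongside the bit-level audit, so that an asset that has lost a copy is flagged before the remaining copies are relied upon.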

  6. Storage architecture

  7. Storage architecture
     • QFS cache and primary U disk archive on EMC CX3-40 (FC/SATA, RAID-1/RAID-5) at on-campus data center
       • Redundant switched FC data paths to primary/fail-over Sun T2000/Solaris file servers running SAM-QFS
     • Primary A/secondary U disk archive on EMC CX3-80 (FC/SATA, RAID-1/RAID-5) at off-campus data center
       • Redundant FC data paths to T2000 file server running SAM-QFS
     • Secondary A/tertiary U tape archive on StorageTek SL500 (LTO-3), FC-attached to primary on-campus T2000
     • Tertiary A/quaternary U tape archive on LTO-3 media at off-campus managed storage facility
     • Disk archives are UFS file systems containing Tar files; even if the SAM infrastructure is lost, they can be fully (if slowly) recovered with standard Unix/Linux tools
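The last point, recovery without the SAM infrastructure, can be illustrated with a short sketch. It assumes only what the slide states: that the disk archive is a directory of Tar files readable by standard tooling. The function names and the directory layout are hypothetical, and the script scans every archive in turn, which is why such a recovery would be slow.

```python
import tarfile
from pathlib import Path


def find_in_archives(archive_dir: str, member_name: str):
    """Return (tar path, member) for the first archive holding member_name, else None."""
    for path in sorted(Path(archive_dir).iterdir()):
        # Skip anything in the directory that is not a readable tar archive.
        if not path.is_file() or not tarfile.is_tarfile(path):
            continue
        with tarfile.open(path, "r") as tar:
            try:
                return path, tar.getmember(member_name)
            except KeyError:
                continue  # not in this archive; try the next one
    return None


def recover(archive_dir: str, member_name: str, dest_dir: str) -> Path:
    """Extract member_name from whichever archive holds it into dest_dir."""
    hit = find_in_archives(archive_dir, member_name)
    if hit is None:
        raise FileNotFoundError(member_name)
    tar_path, member = hit
    with tarfile.open(tar_path, "r") as tar:
        tar.extract(member, path=dest_dir)
    return Path(dest_dir) / member.name
```

The same result could be had from the command line with `tar -tf` to list and `tar -xf` to extract; the Python version is shown only to make the search-then-extract steps explicit.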

  8. Storage virtualization
     • SAM-QFS reader/writer on primary on-campus T2000 file server
     • SAM-QFS reader on fail-over on-campus/off-campus T2000 file servers
     • All U and A assets written to QFS cache on CX3-40
     • Immediate creation of all UFS disk and LTO-3 tape archive copies
     • Immediate release from cache with “stage never”
     • SAM manages all copies of all assets; externally each asset appears as a single file in an NFS-mountable file system
     • Application access requests are initiated by NFS reads and are fulfilled directly from primary disk archive copy without staging to cache
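The idea of one logical file backed by several physical copies can be sketched in a few lines. This is a deliberately simplified illustration and not how SAM-QFS is implemented: it assumes each copy is a plain file under a hypothetical archive root, whereas the real archives hold Tar files managed by SAM. The function name and parameters are invented for the example.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_copy(logical_name: str, archive_roots: Iterable[str]) -> Optional[Path]:
    """Return the first existing physical copy, trying roots in priority order.

    archive_roots is ordered from most to least preferred, e.g. primary
    disk archive first, then secondary (hypothetical ordering).
    """
    for root in archive_roots:
        candidate = Path(root) / logical_name
        if candidate.is_file():
            return candidate
    return None
```

The caller sees a single name; which copy satisfies the read is decided by the ordering of the roots, mirroring the slide's point that reads are served from the primary disk archive copy when it is available.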

  9. Issues
     • Disk vs tape
     • LTO-3 vs LTO-4
     • Tape archive media pooling
     • All hardware/software installed; currently engaged in configuration and preliminary unit/integration testing
     • Need to establish benchmarks for system performance
     • Planning for migration from existing storage solution
     • Automated data classification
     • Response to an anticipated escalating rate of asset acquisition
       • Google mass digitization
       • Web archiving
       • Audio/video content
       • Scientific data sets
