
Digital Preservation at HUL & DRS 2

Andrea Goethals, HMS Countway Library, July 20, 2009


Presentation Transcript


  1. Digital Preservation at HUL & DRS 2 HMS Countway Library Andrea Goethals July 20, 2009

  2. Agenda • The problem • What are we doing about it? • DRS 2 • Open for questions

  3. 1. The problem …

  4. The problem is twofold 1. Keeping the bits safe 2. Keeping the bits useful to people

  5. Keeping the bits safe • Digital things are amazingly easy to destroy • Bad people • Software or hardware failure • Human mistakes • Destruction is not always apparent • Data not used frequently is at risk of unnoticed damage • Some damage is not noticeable to human eyes and ears

  6. Keeping the bits useful to people • Digital material is fragile • Humans are dependent on technology to interpret the content... • Technologies must understand the format of the content • Technologies age and disappear!

  7. Using information [Diagram: an analog book allows unmediated use (information content, language, symbols, HW (paper)); a digital book requires technology-mediated use (information content, formats, bits, SW, HW)]

  8. Formats are key to determining usability • Formats are the bridge between the content we want to preserve and supporting technologies [Diagram labels: information content, bits, formats, SW, HW; groupings: digital content, supporting technologies]

  9. 2. What are we doing about it?

  10. Keeping the bits safe • Store the bits in multiple copies, in multiple places • Make sure the bits are not corrupt • Replace media periodically • Restrict who can access the bits • Be able to recover the bits!

  11. Keeping the bits safe at HUL • 3-4 copies of each file, 2 different media • 1-2 (tape and sometimes disk): 60 Oxford Street, Cambridge • 1 (disk): Summer Street, Boston • 1 (tape): Southborough

  12. Keeping the bits safe at HUL • Automated integrity monitoring • Drscheck script • Compares the MD5 of each file at the Summer Street location to the MD5 stored in a database • Also checks the 60 Oxford Street disk copy • A copy of each file checked ~every 2 weeks • Recent enhancement: Trigger on database update of MD5 • Storage media replaced every 4-5 years
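The integrity check described on this slide can be illustrated with a minimal sketch. This is not the Drscheck script itself, whose implementation the slides do not show. The function names `md5_of` and `check_fixity` are hypothetical, and a plain dictionary stands in for the database of stored MD5 values.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(expected: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths whose current MD5 differs from the recorded one.

    `expected` is a hypothetical stand-in for the database of stored MD5s.
    A file that is missing from `root` is reported as damaged as well.
    """
    damaged = []
    for relative_path, recorded_md5 in sorted(expected.items()):
        target = root / relative_path
        if not target.is_file() or md5_of(target) != recorded_md5:
            damaged.append(relative_path)
    return damaged
```

Run on a schedule against each storage location, a check of this kind would surface the silent damage the earlier slides warn about, since it does not depend on anyone opening the file.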

  13. Keeping the bits safe at HUL • Overseen by OIS and UIS IT staff • Just-in-case plans • Disaster recovery • Server fail-overs • Software failure • Tape libraries • Fabric switches • Lost or damaged tapes • Data recovery (corruption)

  14. It’s safe - but is it usable??? • It’s not enough to preserve the bits if the format of the bits is obsolete! • WordStar? AppleWorks? Excel 1.0? • For digital content we are dependent on software that can understand the format…

  15. The importance of format • Understanding formats is fundamental to preservation ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74 6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00 0000007800000000004800480000 000002f40240ffeeffee03060252 0347052803fc0002000000480048 0000000002d80228000100000064 000000010003030300000001270f 0001000100000000000000000000 0000600800190190000000000000 0000000000000000000000000000 0000000000000000000000003842 494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200 ...

  16. The importance of format • Understanding formats is fundamental to preservation ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74 6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00 0000007800000000004800480000 000002f40240ffeeffee03060252 0347052803fc0002000000480048 0000000002d80228000100000064 000000010003030300000001270f 0001000100000000000000000000 0000600800190190000000000000 0000000000000000000000000000 0000000000000000000000003842 494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200 ... SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2 ...

  17. The importance of format • Understanding formats is fundamental to preservation ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74 6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00 0000007800000000004800480000 000002f40240ffeeffee03060252 0347052803fc0002000000480048 0000000002d80228000100000064 000000010003030300000001270f 0001000100000000000000000000 0000600800190190000000000000 0000000000000000000000000000 0000000000000000000000003842 494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200 ... SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2 ...
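To make the point of the last three slides concrete, here is a small sketch that turns the first bytes of the hex dump into the marker names shown beside it. It assumes the standard JPEG segment layout (a 0xFF marker byte, a marker code, then a two-byte big-endian length that counts itself). The function names and the partial marker table are illustrative only, and this is not how FITS or JHOVE are implemented.

```python
import struct
from typing import Optional

# Names for the JPEG markers listed on the slide; not a complete table.
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}


def list_segments(data: bytes) -> list[tuple[str, int]]:
    """Return (marker name, declared length) pairs from the start of a JPEG.

    Stops at SOS (entropy-coded data follows) or when the bytes run out.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [("SOI", 0)]
    offset = 2
    while offset + 4 <= len(data) and data[offset] == 0xFF:
        code = data[offset + 1]
        (length,) = struct.unpack(">H", data[offset + 2:offset + 4])
        segments.append((MARKER_NAMES.get(code, f"0x{code:02X}"), length))
        if code == 0xDA:
            break
        offset += 2 + length  # the length field counts itself, not the marker
    return segments


def jfif_version(data: bytes) -> Optional[str]:
    """Return 'major.minor' if an APP0/JFIF segment directly follows SOI."""
    if data[2:4] == b"\xff\xe0" and data[6:11] == b"JFIF\x00":
        return f"{data[11]}.{data[12]}"
    return None


# The first 24 bytes of the hex dump shown on the slide.
header = bytes.fromhex("ffd8ffe000104a46494600010201008300830000ffed0fb0")
print(list_segments(header))  # [('SOI', 0), ('APP0', 16), ('APP13', 4016)]
print(jfif_version(header))   # 1.2
```

Without knowledge of the format, the same 24 bytes are just numbers. With it, they yield the SOI, APP0, JFIF 1.2 and APP13 labels from the slide.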

  18. Keeping the bits useful to people • Know what formats you have • Make sure there’s technology to support the formats! • Provide ways for people to find it • Provide ways for curators to manage it • Keep records of significant events • Repair, replace
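"Know what formats you have" can start with something as simple as checking the leading bytes of each file against well-known signatures. The sketch below is a deliberately naive illustration. The signature table is a small hand-picked sample, the function name `identify` is hypothetical, and real identification tools go much further than this.

```python
# A few well-known file signatures ("magic numbers"); far from exhaustive.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP"),
]


def identify(first_bytes: bytes) -> str:
    """Return a format name based on the leading bytes, or 'unknown'."""
    for signature, name in SIGNATURES:
        if first_bytes.startswith(signature):
            return name
    return "unknown"


print(identify(bytes.fromhex("ffd8ffe000104a464946")))  # JPEG
print(identify(b"%PDF-1.4\n"))                          # PDF
print(identify(b"WordStar document"))                   # unknown
```

An "unknown" result is itself useful information: it flags content that may have no supporting technology, which is the risk the second bullet on this slide describes.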

  19. Can we approach the problem differently? • In a way that’s more proactive? • And more efficient? • And less expensive? Yes…

  20. The content production matters! • The least expensive, and most effective preservation measure is to think about the future when digital content is created! • It makes good sense to try to influence the content creation process

  21. Preservation lifecycle • Create digital content • Ingest into a preservation repository • Continuous cycle of: • Monitoring • Planning • Intervention • Subject to collection management decisions • Transfer to next generation of the repository or to a different repository

  22. Keeping the bits useful to people at HUL • Guidelines • More ‘preservable’ files • formats: standard, well-understood, well-supported, open • Recommended supplementary documentation (metadata) • Tools • FITS, JHOVE: check quality of files, automated metadata extraction • Staff available to consult

  23. Keeping the bits useful to people at HUL • Collection management applications • Discoverable content • Catalogs • Persistent names • Search engines • Extensive metadata • Administrative, Technical, Structural, Provenance • Suite of delivery applications…

  24. Keeping the bits useful to people at HUL • Suite of delivery services • Delivery applications created and maintained at OIS • IDS, PDS, SDS, ADS, FTS • Third party middleware maintained at OIS • RealServer, Luratech JPEG 2000 Server • Third party rendering applications on users’ desktops • Web browsers, RealAudio Players, TIFF viewers, ZIP utilities

  25. Involvement in broader preservation community efforts • E-journal archiving • Technical metadata • Still images, audio, documents • METS (package for metadata and digital objects) • PDF/A • PREMIS (preservation metadata) • AIHT (repository interaction demonstration) • Registry of digital masters • Repository certification • Formats registry (UDFR)

  26. 4. DRS 2 …

  27. DRS 2 changes Why? • To better support digital preservation • To better support needs of DRS depositors, curators and collection managers

  28. DRS 2 changes • New conceptual foundation • Objects, content models • User improvements • Opaque objects, new file formats, tools, guidance • A new approach to metadata • Increased preservation planning and activities

  29. Objects • Currently only a file level in the DRS • All management has to be done at the individual file level • Objects are aggregations of files • Page-turned object • Still image object • More intuitive unit for management, reporting and searching • Example: How many Page-turned objects do I have in the DRS?

  30. Content models • Types of objects • Example: audio content model
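The object and content-model ideas on the last two slides can be pictured with a small data-structure sketch. The class and field names below are hypothetical and are not the DRS 2 data model. They only show how grouping files into typed objects makes a question like "How many Page-turned objects do I have?" a one-line query.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """An aggregation of files managed as one unit (hypothetical model)."""
    object_id: str
    content_model: str                  # the type of object, e.g. "page-turned"
    files: list[str] = field(default_factory=list)


def count_by_content_model(objects: list[DigitalObject]) -> Counter:
    """Count objects per content model instead of counting individual files."""
    return Counter(obj.content_model for obj in objects)


holdings = [
    DigitalObject("obj-1", "page-turned", ["p001.tif", "p002.tif", "ocr.txt"]),
    DigitalObject("obj-2", "still image", ["photo.jp2"]),
    DigitalObject("obj-3", "page-turned", ["p001.tif"]),
]
print(count_by_content_model(holdings)["page-turned"])  # 2
```

With only a file level, the same question would require inferring which of the five files belong together.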

  31. Support for opaque objects • A special content model • Allows files in any format • Digital equivalent of buying time at HD • Content can be minimally processed, or can be fully processed by depositors but not yet supported by the DRS • Must be intended for long-term preservation • Will receive some preservation services • Will be on a path to fuller DRS preservation

  32. Support for new file formats • PDF • Audio • MP3, MP4/AAC • Drawings • AutoCAD • Adobe Illustrator • Video • What’s next?

  33. Deposit, management & delivery tools • Enhanced Batch Builder • Integrated with File Information Tool Set (FITS) • Enhanced DRS Web Admin • Better searching • Richer management and reporting • Ability to perform batch updates • File Delivery Service (FDS) • Created for PDF delivery • Delivers a file to user’s web browser

  34. Future of http://hul.harvard.edu/ois/

  35. Guidance & user community New website for digital preservation • Formats central • Content models • DRS practices • HUL digital preservation projects • Emerging standards and best practices • Tools, services, registries • Resources & Experts

  36. A new approach to metadata • Moving towards community-standard schemas • PREMIS, MODS, MIX, textMD, etc. • Metadata files on the file system alongside content files • “object descriptor files” • Preservation, rights, descriptive metadata • More reliance on embedded metadata • Automatic extraction at deposit time by FITS • Third party delivery applications are becoming aware of file-embedded metadata

  37. Increased preservation planning and activities • More granular format identification • Sub-file characterization • Preservation plans per content model • Digital first aid (content & metadata) • “Localization,” migrations, normalizations • Technology watch • Virus checking

  38. 5. Open questions …
