
Harvesting and DAMS

Presentation Transcript


  1. Harvesting and DAMS • Glen Robson, DAMS Manager, National Library of Wales

  2. What do we do when it gets here? • Normalise metadata • Migrate? • Storage • Access

  3. Normalise Metadata • Consistency • Convert to NLW standards (METS) • Consistent METS between projects • Add technical metadata • Link file format to the PRONOM registry • Automatic technical metadata • JHOVE or the National Library of New Zealand Metadata Extraction Tool • Add preservation metadata (PREMIS) • Object history
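
As an illustration of the automatic technical metadata step on slide 3, here is a minimal Python sketch that records a file's size and SHA-256 checksum and guesses its format from leading byte signatures. It is not JHOVE or the NLNZ tool, and the function names and signature table are assumptions: only PDF and the OLE2 container used by legacy Word and Excel files are recognised, and treating any valid UTF-8 as plain text is a crude heuristic. A production workflow would hand identification to a registry-backed tool so the result can be linked to a PRONOM identifier.

```python
import hashlib
from pathlib import Path

# Hypothetical signature table: leading bytes -> MIME type.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    # OLE2 compound-file header, shared by legacy .doc and .xls files,
    # so on its own it cannot tell Word from Excel.
    (bytes.fromhex("D0CF11E0A1B11AE1"), "application/x-ole-storage"),
]


def sniff_format(data: bytes) -> str:
    """Guess a MIME type from magic bytes, falling back to a UTF-8 text check."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    try:
        data.decode("utf-8")
        return "text/plain"  # crude heuristic: any valid UTF-8 counts as text
    except UnicodeDecodeError:
        return "application/octet-stream"  # unidentified binary


def technical_metadata(path: Path) -> dict:
    """Return a minimal technical-metadata record for one file."""
    data = path.read_bytes()
    return {
        "file": path.name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "mime_guess": sniff_format(data),
    }
```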

  4. Harvesting • Take a copy of the metadata and the thesis • Different formats • PDF, Word and text • Complex objects • E.g. one PDF per chapter
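
A complex object such as a thesis delivered as one PDF per chapter can be held together by a METS structural map. The sketch below builds a skeletal METS document with Python's standard ElementTree, one file entry and one logical chapter division per PDF. The function name, the `USE` and `TYPE` values and the file naming are illustrative assumptions, not NLW's METS profile, and a real record would also carry descriptive and administrative (technical and PREMIS) sections.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def thesis_mets(title: str, chapter_files: list[str]) -> ET.Element:
    """Build a skeletal METS document: one file and one logical div per chapter PDF."""
    root = ET.Element(m("mets"), {"LABEL": title})
    file_sec = ET.SubElement(root, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": "original"})
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "logical"})
    thesis_div = ET.SubElement(struct_map, m("div"), {"TYPE": "thesis", "LABEL": title})
    for n, href in enumerate(chapter_files, start=1):
        file_id = f"FILE{n:03d}"
        f = ET.SubElement(file_grp, m("file"), {"ID": file_id, "MIMETYPE": "application/pdf"})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
        # Each chapter div points at its file, preserving reading order via ORDER.
        chapter = ET.SubElement(thesis_div, m("div"), {"TYPE": "chapter", "ORDER": str(n)})
        ET.SubElement(chapter, m("fptr"), {"FILEID": file_id})
    return root
```

For example, `thesis_mets("Example thesis", ["ch1.pdf", "ch2.pdf"])` yields two `mets:file` entries and two ordered chapter `div`s, each pointing at its file by `FILEID`.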

  5. Migration • Input (files by MIME type): • 221 application/msword • 4 application/octet-stream • 114 application/pdf • 3 application/vnd.ms-excel • 340 text/plain

  6. Now or later? • Migrate on ingest • How do you choose the format? • Storage Cost • Migrate on obsolescence • Tools available?

  7. Migration • Microsoft Word • Can open it now • Have to have a copy of Word • application/octet-stream • Can’t open now
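
The input counts on slide 5 and the ingest-versus-obsolescence question on slides 6 and 7 can be combined into a simple triage table. In this sketch the counts come from slide 5, but the `POLICY` mapping and its wording are hypothetical, and anything the policy does not recognise (such as `application/octet-stream`) is flagged for investigation rather than migrated.

```python
# Counts of harvested files by MIME type, as listed on slide 5.
HARVEST = {
    "application/msword": 221,
    "application/octet-stream": 4,
    "application/pdf": 114,
    "application/vnd.ms-excel": 3,
    "text/plain": 340,
}

# Hypothetical policy: what to do with each format we can already open.
POLICY = {
    "application/pdf": "keep; migrate on obsolescence",
    "text/plain": "keep; migrate on obsolescence",
    "application/msword": "openable now (needs Word); consider migrating on ingest",
    "application/vnd.ms-excel": "openable now (needs Excel); consider migrating on ingest",
}


def triage(counts: dict[str, int]) -> list[tuple[str, int, str]]:
    """Pair each MIME type with its file count and the action the policy suggests."""
    rows = []
    for mime, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        # Anything not covered by the policy cannot be handled yet.
        action = POLICY.get(mime, "unidentified; investigate before deciding")
        rows.append((mime, n, action))
    return rows


for mime, n, action in triage(HARVEST):
    print(f"{n:>4}  {mime:<26} {action}")
print(f"{sum(HARVEST.values()):>4}  total")
```

The five categories total 682 files; only the four `application/octet-stream` files fall through to the investigate branch.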

  8. Storage • LOCKSS • University copy • NLW copy • Archive copy on tape • Archive copy on optical disc • Archive copy offsite • Access copy • EThOS copy
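
Keeping many copies (slide 8) only helps if damaged copies are noticed. A minimal sketch, in the spirit of LOCKSS's comparison of copies but not its actual polling protocol: hash every copy, treat the strict-majority digest as the reference, and list the copies that disagree. The copy names are hypothetical, and a real audit would stream large files and compare against stored checksums rather than hold every copy in memory.

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Hash every copy, take the strict-majority digest as reference, list copies that differ."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    reference, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # A tie or plurality is not enough to decide which copy is authentic.
        raise ValueError("no majority: copies disagree too much to pick a reference")
    damaged = sorted(name for name, d in digests.items() if d != reference)
    return reference, damaged
```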

  9. Access • Convert to MARC • Digital and Print in MARC • Single Point of access for all collections • Mostly automated • Best use of resources
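
Converting harvested thesis metadata to MARC (slide 9) can be sketched as a field mapping that emits MARCXML. The choice of fields (100 for the author, 245 for the title, 502 for the dissertation note, 856 for the online copy), the indicator values and the example inputs are assumptions for illustration; a catalogue-ready record would also need a leader, control fields such as 008, and cataloguing-rule punctuation.

```python
import xml.etree.ElementTree as ET

MARC = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC)


def datafield(record: ET.Element, tag: str, ind1: str, ind2: str, subfields) -> ET.Element:
    """Append a MARC datafield built from (code, value) subfield pairs."""
    df = ET.SubElement(record, f"{{{MARC}}}datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        ET.SubElement(df, f"{{{MARC}}}subfield", {"code": code}).text = value
    return df


def thesis_to_marcxml(author: str, title: str, degree_note: str, url: str) -> ET.Element:
    """Map a harvested thesis record onto a few core MARC 21 fields (leader and 008 omitted)."""
    record = ET.Element(f"{{{MARC}}}record")
    datafield(record, "100", "1", " ", [("a", author)])       # main entry: personal name
    datafield(record, "245", "1", "0", [("a", title)])        # title statement
    datafield(record, "502", " ", " ", [("a", degree_note)])  # dissertation note
    datafield(record, "856", "4", "0", [("u", url)])          # electronic location (HTTP)
    return record
```

Serialising the result with `ET.tostring(record, encoding="unicode")` gives a `marc:record` with the four datafields in the order above.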

  10. Lessons Learnt and Problems Encountered • Started using Fedora in 2004 • Ingested 3 digitisation projects and 2 mass digitisation projects • Ingesting video and radio programmes • Started with a pilot • Purchased VITAL (based on Fedora) • Project driven

  11. Lesson 1: Physical carriers degrade or become obsolete

  12. Lesson 1: Physical carriers degrade or become obsolete

  13. Lesson 1: Physical carriers degrade or become obsolete

  14. Lesson 1: Physical carriers degrade or become obsolete

  15. Why is this a problem for the library? • Deposit • Sometimes no choice on carrier • Depositors aren’t in a position to change the carrier

  16. Lesson 1: Physical carriers degrade or become obsolete • Age • Storage conditions • Sunlight • Temperature • “Widely differing claims have been made for the life expectancy of CD-Rs, but it is generally accepted that they will last longer than the associated technology and are therefore suitable for preservation purposes. CD-Rs offer storage capacities of 650 MB to 700 MB. CD-RW is based upon a different recording process to CD-R, and is not recommended for archival storage.” • http://www.nationalarchives.gov.uk/documents/media_care.rtf

  17. Practical Example • Deposit of CDs from Cliff McLucas and the Brith Gof theatre company • 22% of the Cliff McLucas CDs and 60% of the Brith Gof CDs could not be copied or read • According to the sleeves, many of the Brith Gof discs contain material relating to performances between about 1989 and 1992 • The only real solution is to copy data from the carrier as soon as possible
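
The conclusion of slide 17, copy data off the carrier as soon as possible, suggests a rescue step that salvages whatever is readable instead of stopping at the first error. This sketch assumes the disc is mounted and its files are listed in advance; it copies each file, records a SHA-256 of the copy as a fixity baseline, and logs failures. Flattening everything into one directory is a simplification that would clash on duplicate file names, and badly degraded discs usually need specialist imaging tools rather than file-level copying.

```python
import hashlib
import shutil
from pathlib import Path


def rescue_copy(sources: list[Path], dest_dir: Path) -> dict[str, list]:
    """Copy every readable file off a carrier, recording checksums and failures."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    report = {"copied": [], "failed": []}
    for src in sources:
        try:
            target = dest_dir / src.name
            shutil.copyfile(src, target)  # raises OSError on missing or unreadable files
            # Checksum of the copy serves as the fixity baseline from now on.
            digest = hashlib.sha256(target.read_bytes()).hexdigest()
            report["copied"].append((src.name, digest))
        except OSError as err:
            # Keep going: one bad file should not stop the rest of the disc being rescued.
            report["failed"].append((src.name, str(err)))
    return report
```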

  18. CDAS

  19. Lesson 2: Digital can get BIG • Wills Project • 182,404 wills • 816,325 images • 998,729 Fedora objects • Welsh Journals • 50 titles • Thousands of pages • Off-air • 40,000 records • SCIF Newspapers and Magazines • 2 million pages • Repository: 3 million plus objects
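
The Wills figures on slide 19 are consistent with one Fedora object per will plus one per page image; this is an inference from the arithmetic, not something the slide states:

```latex
182{,}404\ \text{wills} + 816{,}325\ \text{images} = 998{,}729\ \text{Fedora objects},
\qquad \frac{816{,}325}{182{,}404} \approx 4.5\ \text{images per will}
```

On that reading, each additional will adds roughly five and a half objects, which is one reason object counts outrun item counts.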

  20. Problems • Processing takes time • Management • Discovery • Cost • Cataloguing / Metadata

  21. Lesson 2: Digital can get BIG • Sgrîn – Cardiff Media Company • Company closing down (2006) • Collect data from shared drive • Stats: • 29.2 GB • 68,446 files • Microsoft Word Documents: 32,086 • JPEG Images: 18,093 • Rich Text Format: 2,707 • Microsoft Excel Documents: 2,498 • Microsoft Works Word Documents: 2,127 • Files with no file extension: 2,036 • Selection? • Cataloguing?
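
A first pass over a collection like the Sgrîn shared drive is usually an inventory by file extension, which is one way counts like those on slide 21 can be produced. This minimal sketch walks a directory tree and counts lower-cased extensions, grouping files without one under `(none)`. The function name is hypothetical, and extensions are only a hint: the files with no extension would still need signature-based identification.

```python
from collections import Counter
from pathlib import Path


def extension_inventory(root: Path) -> Counter:
    """Count files under root by lower-cased extension; files without one count as '(none)'."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

The six categories listed on the slide account for 59,547 of the 68,446 files, so 8,899 files fall into other types.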

  22. Lesson 3: Metadata is expensive • Accessioning: • Depositor adds metadata (Roda) • Deposit comes with metadata (EThOS) • Digitisation • Structure / Context • From catalogue • Write once, use many times • Automate as much as possible

  23. Lesson 4: You can’t automate everything • Off-air recording • Original plan: • BOB system records programmes • Metadata from EPG • Harvest from BOB and create MARC record • Ingest • Totally automated

  24. Lesson 4: You can’t automate everything • Spanners in the works: • Duplicate recordings • Failed recordings • EPG errors • New workflow: • BOB system records programmes • Metadata from EPG • Fix records that fail validation (human process) • Harvest from BOB and create MARC record • Ingest
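
The revised off-air workflow on slide 24 adds a human step only for the recordings that fail validation. A sketch of that routing, assuming a hypothetical `Recording` record with channel, start time, EPG title and captured file size: duplicates are detected by channel and start time, a zero-byte file counts as a failed recording, and a blank title stands in for an EPG error. Real checks would be richer; the point is that only the exceptions reach a person.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical off-air recording as it might arrive from the recorder plus EPG."""
    channel: str
    start: str       # broadcast start time, e.g. "2008-03-01T21:00"
    title: str       # programme title taken from the EPG
    size_bytes: int  # size of the captured file


def route(recordings: list[Recording]) -> tuple[list[Recording], list[tuple[Recording, str]]]:
    """Split recordings into those safe to ingest automatically and those needing a person."""
    seen = set()
    auto, manual = [], []
    for rec in recordings:
        key = (rec.channel, rec.start)
        if key in seen:
            manual.append((rec, "duplicate recording"))
        elif rec.size_bytes == 0:
            manual.append((rec, "failed recording"))
        elif not rec.title.strip():
            manual.append((rec, "EPG error: missing title"))
        else:
            auto.append(rec)  # would go on to MARC record creation and ingest
        seen.add(key)
    return auto, manual
```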

  25. Lesson 5: Things Change

  26. Ingest Early • Items managed early • Missing items picked up earlier • Change and creation at the same point • One interface, rather than one for creation and another for editing • Preserve but allow change • Systems make it difficult
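
"Preserve but allow change" on slide 26 can be met by never overwriting a record: every edit appends a new version, so creation and later correction happen in one place and the history survives. A minimal plain-Python sketch of the idea; the class and field names are hypothetical and this is not how Fedora itself stores versions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VersionedRecord:
    """Metadata record that never overwrites: each edit appends a new version."""
    versions: list[dict] = field(default_factory=list)

    def update(self, changes: dict, who: str) -> dict:
        """Create a new version from a copy of the latest data plus the given changes."""
        current = dict(self.versions[-1]["data"]) if self.versions else {}
        current.update(changes)
        version = {
            "n": len(self.versions) + 1,
            "when": datetime.now(timezone.utc).isoformat(),
            "who": who,
            "data": current,
        }
        self.versions.append(version)
        return version

    @property
    def latest(self) -> dict:
        """Current state of the record (empty if nothing has been created yet)."""
        return self.versions[-1]["data"] if self.versions else {}
```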

  27. Lesson 6: Workflows not Projects • Project-specific workflows • Have to be customised each time • Symptom of project-based funding • Digitisation workflow • Generic services • Technical metadata • Checksums
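
Slide 27 contrasts project-specific workflows with generic services such as technical metadata and checksums. One way to picture the difference is a workflow assembled from reusable service functions that any project can run. The service names, the object layout and the `STANDARD_INGEST` list are hypothetical.

```python
import hashlib
from typing import Callable

# A service takes an object record and returns an enriched copy; any project can reuse it.
Service = Callable[[dict], dict]


def size_service(obj: dict) -> dict:
    """Add the content length in bytes."""
    return {**obj, "size_bytes": len(obj["content"])}


def checksum_service(obj: dict) -> dict:
    """Add a SHA-256 checksum of the content."""
    return {**obj, "sha256": hashlib.sha256(obj["content"]).hexdigest()}


def run_workflow(obj: dict, services: list[Service]) -> dict:
    """Apply the same generic services, in order, whichever project the object came from."""
    for service in services:
        obj = service(obj)
    return obj


STANDARD_INGEST = [size_service, checksum_service]
```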

  28. Preservation Paranoia • Lessons we may learn: • How much metadata is too much? • How much technical metadata should we have? • Migrations of MS Word: • PDF • Text • Image of each page • OpenOffice • XML
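
Producing several derivatives of an MS Word file (slide 28) can be scripted with LibreOffice in headless mode; LibreOffice, a fork of OpenOffice.org, is swapped in here as an assumption. The sketch assumes the `soffice` binary is on the PATH (some installations name it `libreoffice`) and covers only formats LibreOffice can write directly, such as `pdf` and `odt`; page images and custom XML would need other tools.

```python
import subprocess
from pathlib import Path


def convert_command(src: Path, target: str, outdir: Path) -> list[str]:
    """Build a LibreOffice headless conversion command, e.g. target='pdf' or 'odt'."""
    return ["soffice", "--headless", "--convert-to", target, "--outdir", str(outdir), str(src)]


def migrate(src: Path, targets: list[str], outdir: Path) -> None:
    """Produce one derivative per target format; the source file is only read, never modified."""
    outdir.mkdir(parents=True, exist_ok=True)
    for target in targets:
        subprocess.run(convert_command(src, target, outdir), check=True)
```

For example, `migrate(Path("thesis.doc"), ["pdf", "odt"], Path("derivatives"))` would write `thesis.pdf` and `thesis.odt` into `derivatives/`, leaving the original in place.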

  29. Summary • Physical carriers degrade or become obsolete • Digital can get BIG • Metadata is expensive • You can’t automate everything • Things change • Workflows not Projects • Preservation Paranoia

  30. Questions
