
Metadata Tools for JISC Digitisation Projects of still images and text

Ed Fay, BOPCRIS, Hartley Library, University of Southampton. Overview of BOPCRIS today: a move to work natively with standards (interoperability, preservation) and to design project procedures from the ground up with metadata in mind.


Presentation Transcript


  1. Metadata Tools for JISC Digitisation Projects of still images and text • Ed Fay • BOPCRIS, Hartley Library • University of Southampton

  2. Overview: BOPCRIS today • Move to work natively with standards • Interoperability • Preservation • Design project procedures from ground up with metadata in mind • File-naming and directory structuring • Metadata capture processes • Production workflow that automates where possible • Minimize possibility for human error / subjectivity • “Final package” of digital object that records preservation information on the “digital shelf” and aims for maximum interoperability between systems, all in one place

  3. Overview: technical details • File-naming / directory structure • Incorporating project-specific “unique ids” • Final package (digital object) • Internally consistent “tarball” [*.TAR] • Relative path-naming conventions • METS wrapper • Extension formats for metadata: descriptive (MODS); technical (MIX); process (PREMIS) • Production workflow • Automated production of final package • Metadata recording • Dynamic input by scanner operators
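The "internally consistent tarball" with relative path-naming can be illustrated with a minimal sketch. It assumes the digital object has already been assembled in one directory; the function name `build_package` and the directory layout are illustrative, not taken from the BOPCRIS tooling.

```python
import tarfile
from pathlib import Path


def build_package(object_dir, tar_path):
    """Pack every file under object_dir into an uncompressed TAR.

    Member names are stored relative to object_dir, so references inside
    the package stay valid wherever the archive is unpacked.
    tar_path is assumed to lie outside object_dir.
    """
    object_dir = Path(object_dir)
    members = []
    with tarfile.open(str(tar_path), "w") as tar:
        for path in sorted(object_dir.rglob("*")):
            if path.is_file():
                # Relative, forward-slash name, e.g. "master/0001.tif"
                arcname = path.relative_to(object_dir).as_posix()
                tar.add(str(path), arcname=arcname)
                members.append(arcname)
    return members
```

Because no absolute paths are stored, a METS file inside the archive can point at its sibling files with relative locations only.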

  4. History • Eighteenth Century Parliamentary Papers • Project under Phase 1 of JISC Digitization Programme • Proprietary system and data formats (Agora) • Manual input of metadata • Descriptive and Structural • Advantages and Disadvantages

  5. History: Advantages • Proprietary system with advanced functionality: • OCR workflow • Web presentation • Highly customizable • Metadata fields specified and modified at will

  6. History: Disadvantages • Non-standard metadata fields • No mapping to standard formats • → difficulties: interoperability; metadata harvesting • Translation • Between systems, or between “use” and “archive” formats • → introduces possibility of versioning issues • No scope for preservation metadata • Separation between workflow / presentation system and preservation strategy • Resulted in disparate collection of scripts and tools to manage data

  7. Present: Metadata Standards • Bibliographic database export • File-system level • Directory structure • File-naming conventions • Scanning level • TIFF headers • Additional descriptive metadata • METS profile • Tailored to project needs • Extension formats (MODS, MIX, PREMIS) • Checksums (MD5)
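The MD5 checksum in the list above can be computed with the standard library alone. This is a minimal sketch; the function name `md5_of_file` and the chunk size are assumptions, and the slides do not say how the checksum is stored or verified.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks.

    Chunked reading keeps memory use flat for large TIFF masters or
    whole-package TAR files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recomputing the digest later and comparing it with the recorded value shows whether the file has changed since it was packaged.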

  8. Present: Metadata Origins
     PRECURSORS: File-naming • Directory structure • Bibliographic metadata (MARC21 / MODS / etc.)
     GENERATED: Scanned images: TIFF headers, MIX (Z39.87) • Other metadata: process, additional descriptive, PREMIS, custom dmdSec • OCR (Agora / ABBYY) • METS • File formats: TIFF master / derived JPEG; flat text (TXT) & word-co-ordinated OCR (TAR)

  9. Present: Digital Object (“final package”)
     (1) ID.TAR
         • METS XML: ./ID.XML
           • dmdSec: MODS XML
           • amdSec: MIX, PREMIS XML
           • fileSec: ./master (TIFF), ./derived (JPEG), ./txt (plain text), ./idx (word-co-ordinated)
           • structMap: physical, logical
         • Master images (TIFF): ./master/
         • Derived images (JPEG): ./derived/
         • Text OCR (TXT): ./txt/
         • Word-co-ordinated OCR (IDX): ./idx/
     (2) ID.CHECKSUM (MD5)
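The METS wrapper in this package can be sketched as a skeleton built with Python's standard XML library. The section names (dmdSec, amdSec, fileSec, structMap) come from the slide; the function `build_mets`, the ID scheme, and the rule that the nth file in each group belongs to the nth page are assumptions made for illustration. The descriptive and administrative sections are left empty, so the output is not schema-valid METS.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id, files_by_use):
    """Build a skeleton METS tree.

    files_by_use maps a fileGrp USE (e.g. "master") to an ordered list of
    relative paths. Assumption: the nth path in every group is page n.
    """
    root = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(root, q("dmdSec"), {"ID": "dmd1"})  # MODS would be wrapped here
    ET.SubElement(root, q("amdSec"), {"ID": "amd1"})  # MIX and PREMIS would go here
    file_sec = ET.SubElement(root, q("fileSec"))
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    top = ET.SubElement(struct_map, q("div"), {"TYPE": "volume"})
    pages = {}
    for use, paths in files_by_use.items():
        group = ET.SubElement(file_sec, q("fileGrp"), {"USE": use})
        for order, rel_path in enumerate(paths, start=1):
            file_id = f"{use}_{order:04d}"
            file_el = ET.SubElement(group, q("file"), {"ID": file_id})
            ET.SubElement(
                file_el,
                q("FLocat"),
                {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": rel_path},
            )
            # One physical div per page, shared by all file groups.
            page = pages.get(order)
            if page is None:
                page = ET.SubElement(
                    top, q("div"), {"TYPE": "page", "ORDER": str(order)}
                )
                pages[order] = page
            ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return root
```

Calling `ET.tostring(build_mets("ID", {"master": ["./master/0001.tif"]}))` yields the serialised skeleton; only relative locations appear in it, which matches the relative path-naming convention of the package.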

  10. Future • One tool for entire process, from scanned images to METS • Tool would: • Extract technical metadata • Include descriptive metadata • Build flat-structure METS • Tool would require: • File-naming, directory-structuring conventions • Image file sources
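How such a tool might rely on file-naming and directory-structuring conventions can be sketched as follows. The convention used here (one file per page in each of `master`, `derived`, `txt` and `idx`, sharing a sequence number as file stem, with the extensions shown) is hypothetical, as are the names `collect_pages` and `find_gaps`; a real project would substitute its own rules.

```python
from pathlib import Path

# Hypothetical convention: <object_dir>/<subdir>/<sequence><ext>
EXPECTED = {"master": ".tif", "derived": ".jpg", "txt": ".txt", "idx": ".idx"}


def collect_pages(object_dir):
    """Group files by page sequence, using only their names and locations."""
    object_dir = Path(object_dir)
    pages = {}
    for subdir, ext in EXPECTED.items():
        folder = object_dir / subdir
        if not folder.is_dir():
            continue
        for path in sorted(folder.glob(f"*{ext}")):
            rel_path = path.relative_to(object_dir).as_posix()
            pages.setdefault(path.stem, {})[subdir] = rel_path
    return pages


def find_gaps(pages):
    """Report, per page, which expected representations are missing."""
    gaps = {}
    for sequence, found in sorted(pages.items()):
        missing = sorted(set(EXPECTED) - set(found))
        if missing:
            gaps[sequence] = missing
    return gaps
```

A check like `find_gaps` runs before any METS is built, so an incomplete scan is caught by the convention itself rather than by manual inspection.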

  11. Future: Advantages • Abstraction = standardization • All digitization projects will produce metadata in similar formats → interoperability • Certain technical base-standards will be present → preservation • Any centrally developed preservation or presentation systems would be able to ingest output from any project • Saves wasted effort developing similar solutions many times, when one solution can be developed once and adapted

  12. Future: Questions… • Usefulness of such a tool? • Relevance to your project? • Problems / obstacles? • How much flexibility is necessary? • Manual input / editing? • Main points: • Abstraction, functionality, flexibility

  13. Further information • Ed Fay, Software Developer • BOPCRIS, Hartley Library • University of Southampton • ef1@soton.ac.uk • 023 8059 3575
