
Developing an Ingest Service for Fedora

Developing an Ingest Service for Fedora. Ryan Scherle Muzaffer Ozakca. IUDL infrastructure project. 2-year project funded by University Information Technology Services to reengineer digital library infrastructure around Fedora


Presentation Transcript


  1. Developing an Ingest Service for Fedora • Ryan Scherle • Muzaffer Ozakca

  2. IUDL infrastructure project • 2-year project funded by University Information Technology Services to reengineer digital library infrastructure around Fedora • Builds on experience with Fedora in context of EVIA Digital Archive (ethnomusicology video) • 2 full-time staff, plus part-time from many others • Dozens of legacy collections with roughly 100,000 objects • New collections: some content-focused, some research-focused

  3. Diversity • Multiple media types • Multiple brands • Multiple tools

  4. The goal [Diagram: several blocks of placeholder text connected through a central "Ingest" step]

  5. Required features • Ingest common content types: • Images • Paged documents • Textual documents • Allow for easy creation of new content types • Must support several workflows • Metadata or media may be primary • Most objects include derived media • Systematic changes to metadata may be desired • May need to connect with external tools for metadata generation, validation, etc. • A workflow engine may sit on top of the ingest system

  6. Existing Ingest Tools

  7. Criteria • Ease of install • Native content models • Custom content models (e.g. paged) • Workflow neutrality, including object modification • Batch ingest (Remember: we’re evaluating object ingest only, not object delivery!)

  8. But first, some disclaimers… • This is not an objective evaluation, just our experiences • We’re not experts in these systems • We’re evaluating ingest only, not delivery! • We’re evaluating ingest with a focus on our needs • We believe in community

  9. Fedora admin client • Comes with Fedora • Geared towards admins rather than end users • No systematic way of entering data or attaching files • Very flexible • The only way to create disseminators • Tedious

  10. Fez • End-to-End GUI system • Highly customizable content models, workflow, security • Customizable role and group based access control • Growing community • Originally developed as an Institutional Repository • Many preset content models • Can create “extension” metadata based on an XSD • External MySQL database for workflow/vocabulary data • GPL

  11. Fez - ingest • Single object ingest • Through Web UI • ImageMagick/JHOVE integration • Bulk ingest: • Upload files to a directory • Can also import existing Fedora objects in bulk • Templates for metadata common to all objects, manual updates for the rest • Batches possible, but only one file per object • No disseminators • Custom metadata can be stored as a simple XML file • Objects must use “compound” content model [Diagram label: Fedora]

  12. Fez – object organization

  13. Elated overview • End-to-end complete system for digital collections • Simple customizable metadata and a simple workflow supported • GPL • “Elated is a lightweight, general-purpose application for managing digital files. ELATED is built on top of the Fedora Repository System, and could be used as a digital assets management system, an institutional repository, or to meet other collection archiving, publishing and searching needs.”

  14. Elated ingest • Single object ingest • Through Web UI • Focused on DC metadata; custom fields can be added • Multi-object ingest via zipped folders and files • Metadata from a template plus manual entry • Batches possible, but only one file per object • Simple content model • Manually-attached disseminators [Diagram label: Fedora]

  15. Elated object organization

  16. Valet for ETDs • A component of the VTLS VITAL product focused on ETD submission • Allows submission of thesis and a simple workflow for approval • Part of a larger framework • Highly focused on ETDs

  17. DirIngest overview • Ingests objects from a structured ZIP file • Highly flexible • User must create METS structure by hand • Doesn’t handle disseminators • Can create some RELS-EXT data, but not fully flexible • Cannot modify existing objects/collections • Easy to use OhioLink Bulk Ingest

  18. DirIngest [Diagram labels: Zip Archive, METS.xml, Crules.xml, Fedora]

  19. Batch modify • A method of controlling API-M with simple XML statements • Can create “empty” objects and change them in systematic ways. • Requires manual (or programmatic) creation of the modify scripts • Can be used in conjunction with other tools…

  20. Summary

  21. Indiana Ingest Tool

  22. Indiana Ingest Tool • A structured interface between a workflow management or repository management GUI and the Fedora repository • Focused on simple input formats for maximum flexibility • Keeps the tools independent of the repository architecture • Builds the FOXML, rather than requiring a full structure to be pre-built • Binds disseminators • Creates RELS-EXT relationships • Can create and/or alter items in a collection • Auto-generates technical metadata with JHOVE or XSLT.
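
As an illustration of the "Creates RELS-EXT relationships" bullet, here is a minimal sketch of the RDF/XML that a RELS-EXT datastream typically carries. It is not the Indiana tool's code. The function name `build_rels_ext` and the item PID `iudl:7` are hypothetical, and the choice of `isMemberOfCollection` as the relationship is an assumption; the slides do not say which relationships the tool writes.

```python
import xml.etree.ElementTree as ET

# Namespaces of RDF and of Fedora's external-relations ontology.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
REL_NS = "info:fedora/fedora-system:def/relations-external#"


def build_rels_ext(item_pid, collection_pid):
    """Return RDF/XML stating that an item is a member of a collection.

    Hypothetical helper: the relationship used here is an assumption,
    not something the slides confirm.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rel", REL_NS)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"info:fedora/{item_pid}"},
    )
    ET.SubElement(
        description,
        f"{{{REL_NS}}}isMemberOfCollection",
        {f"{{{RDF_NS}}}resource": f"info:fedora/{collection_pid}"},
    )
    return ET.tostring(rdf, encoding="unicode")


# "iudl:7" is an invented item PID; "iudl:6" is the collection PID
# that appears in the sample collection config on slide 25.
rels_ext = build_rels_ext("iudl:7", "iudl:6")
print(rels_ext)
```

Generating this fragment from two PIDs is the kind of work that lets cataloging tools stay unaware of the repository's internal structure.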

  23. [Diagram labels: Image Cataloging Tool, Sheet Music Cataloging Tool; inputs EAD, JPG, MODS, PDF, SIP; Ingest Tool; FOXML and Datastreams; Fedora]

  24. Performing an ingest • Place source metadata in an accessible location (filesystem, website) • Place media files (both master and derivative) in an accessible location • Define the "collection configuration" • Run the ingest process • Receive report
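
The steps above end with "Receive report". The slides do not describe what that report contains, so the following is only a sketch of one plausible pre-ingest check: for every master file, confirm that each expected derivative is present. The function `dry_run_report`, the report layout, and the file names in the demo are all hypothetical; the three derivative suffixes are taken from the sample collection config on slide 25.

```python
import tempfile
from pathlib import Path

# Derivative suffixes as listed in the sample collection config (slide 25).
DERIVATIVE_SUFFIXES = {
    "thumb": "-thumb.jpg",
    "screen": "-screen.jpg",
    "large": "-full.jpg",
}


def dry_run_report(master_dir, derivative_dir, master_ext=".tif",
                   suffixes=DERIVATIVE_SUFFIXES):
    """List masters whose derivatives are complete and those missing some.

    Hypothetical check; the real tool's report format is not shown
    in the slides.
    """
    master_dir, derivative_dir = Path(master_dir), Path(derivative_dir)
    report = {"ready": [], "incomplete": {}}
    for master in sorted(master_dir.glob(f"*{master_ext}")):
        stem = master.stem
        missing = [
            item
            for item, suffix in suffixes.items()
            if not (derivative_dir / f"{stem}{suffix}").exists()
        ]
        if missing:
            report["incomplete"][stem] = missing
        else:
            report["ready"].append(stem)
    return report


# Demo with throwaway files: page1 is complete, page2 lacks two derivatives.
with tempfile.TemporaryDirectory() as tmp:
    masters = Path(tmp, "masters")
    derivatives = Path(tmp, "derivatives")
    masters.mkdir()
    derivatives.mkdir()
    for name in ("page1.tif", "page2.tif"):
        (masters / name).touch()
    for name in ("page1-thumb.jpg", "page1-screen.jpg",
                 "page1-full.jpg", "page2-thumb.jpg"):
        (derivatives / name).touch()
    report = dry_run_report(masters, derivatives)

print(report)
```

Running a check like this before the ingest itself would surface missing files early, before any objects are created or altered in the repository.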

  25. Sample collection config file

      <!-- Collection definition -->
      <cc:collectionName>Hoagy Carmichael Correspondence</cc:collectionName>
      <cc:contentModel>paged</cc:contentModel>
      <cc:collectionID>hoagy</cc:collectionID>
      <cc:collectionPid>iudl:6</cc:collectionPid>
      <!-- What to do if item exists -->
      <cc:existingItem>
        <cc:fedoraItemExists action="alter"/>
      </cc:existingItem>
      <!-- File definition -->
      <cc:masterContent type="image" subtype="tiff">
        <cc:source location="localfs">{path to master images}</cc:source>
        <cc:extension>.tif</cc:extension>
      </cc:masterContent>
      <cc:derivedContent derivativeType="images">
        <cc:source location="localfs">{path to derivative images here}</cc:source>
        <cc:extension item="thumb">-thumb.jpg</cc:extension>
        <cc:extension item="screen">-screen.jpg</cc:extension>
        <cc:extension item="large">-full.jpg</cc:extension>
      </cc:derivedContent>
      <!-- Descriptive metadata -->
      <cc:descriptiveMetadata>
        <cc:metadataItem type="ead" authoritative="true" level="collection">
          <cc:source location="localfs">{path to ead}</cc:source>
        </cc:metadataItem>
        ...
      <!-- Technical metadata -->
      <cc:technicalMetadata>
        <cc:metadataItem type="mix" authoritative="true" level="masterContent">
        </cc:metadataItem>
        ...
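
To show how a configuration of this shape might be consumed, here is a minimal Python sketch that reads a few of its fields. Several things are assumptions: the slide shows neither the root element nor the namespace URI bound to `cc`, so `cc:collectionConfig` and `urn:example:collection-config` are placeholders, the source paths are invented, and `read_collection_config` is a hypothetical helper rather than part of the Indiana tool.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI and root element; the slide shows neither.
CC_NS = "urn:example:collection-config"
NS = {"cc": CC_NS}

SAMPLE = f"""
<cc:collectionConfig xmlns:cc="{CC_NS}">
  <cc:collectionName>Hoagy Carmichael Correspondence</cc:collectionName>
  <cc:contentModel>paged</cc:contentModel>
  <cc:collectionID>hoagy</cc:collectionID>
  <cc:collectionPid>iudl:6</cc:collectionPid>
  <cc:existingItem>
    <cc:fedoraItemExists action="alter"/>
  </cc:existingItem>
  <cc:masterContent type="image" subtype="tiff">
    <cc:source location="localfs">/hypothetical/masters</cc:source>
    <cc:extension>.tif</cc:extension>
  </cc:masterContent>
  <cc:derivedContent derivativeType="images">
    <cc:source location="localfs">/hypothetical/derivatives</cc:source>
    <cc:extension item="thumb">-thumb.jpg</cc:extension>
    <cc:extension item="screen">-screen.jpg</cc:extension>
    <cc:extension item="large">-full.jpg</cc:extension>
  </cc:derivedContent>
</cc:collectionConfig>
""".strip()


def read_collection_config(xml_text):
    """Extract the fields an ingest run would need from the config XML."""
    root = ET.fromstring(xml_text)

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    exists = root.find("cc:existingItem/cc:fedoraItemExists", NS)
    return {
        "name": text("cc:collectionName"),
        "content_model": text("cc:contentModel"),
        "collection_pid": text("cc:collectionPid"),
        # What to do when the item is already in Fedora (e.g. "alter").
        "on_existing": exists.get("action") if exists is not None else None,
        "master_extension": text("cc:masterContent/cc:extension"),
        # Map each derivative role (thumb, screen, large) to its file suffix.
        "derived_extensions": {
            ext.get("item"): ext.text.strip()
            for ext in root.findall("cc:derivedContent/cc:extension", NS)
        },
    }


config = read_collection_config(SAMPLE)
print(config)
```

Keeping the file-naming rules in the configuration, rather than in code, fits the stated aim of simple input formats: a new collection needs a new config file, not a new program.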

  26. Example – Sheet Music [Diagram labels: Ingest Tool, FOXML, Fedora; datastreams: Images, METS, RELS-EXT]

  27. Example – preservation package [Diagram labels: SIP, Ingest Tool, FOXML, Fedora; datastreams: Images, METS, RELS-EXT]

  28. Summary

  29. Major difficulties in any ingest tool • Providing flexibility in “style” of content model • Matching filenames with metadata records • Indicating the sequence of files in complex objects • Abstracting over differing local metadata standards (even in our own collections)
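
Two of these difficulties, matching filenames with metadata records and indicating the sequence of files, can be made concrete with a small sketch. It assumes a naming convention of `<record id>-<page number>.tif`, which is invented for illustration; the slides do not state how the IUDL collections name their files, and `group_pages` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumption: files are named "<record id>-<page number>.tif".
PAGE_FILE = re.compile(r"^(?P<record>.+)-(?P<seq>\d+)\.tif$")


def group_pages(filenames, record_ids):
    """Attach page files to metadata records and order them numerically.

    Returns (ordered pages per record, files matching no record,
    records with no files).
    """
    pages = defaultdict(list)
    unmatched = []
    for name in filenames:
        match = PAGE_FILE.match(name)
        if match and match.group("record") in record_ids:
            pages[match.group("record")].append((int(match.group("seq")), name))
        else:
            unmatched.append(name)
    # Sort on the integer page number so that page 10 follows page 2.
    ordered = {
        record: [name for _, name in sorted(items)]
        for record, items in pages.items()
    }
    missing = sorted(set(record_ids) - set(ordered))
    return ordered, unmatched, missing


files = ["hoagy001-10.tif", "hoagy001-2.tif", "hoagy001-1.tif",
         "hoagy002-1.tif", "notes.txt"]
records = {"hoagy001", "hoagy002", "hoagy003"}
ordered, unmatched, missing = group_pages(files, records)
print(ordered, unmatched, missing)
```

A plain alphabetical sort would place `hoagy001-10.tif` before `hoagy001-2.tif`, which is one reason page sequence has to be handled explicitly. Collections that follow a different naming scheme would need a different pattern, which is where the flexibility problem on this slide comes from.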

  30. Topics for future discussion • What is the best structure for an ingest tool? • Is our tool of interest to others? • Would it be better to combine our capabilities with an existing tool? • Can we agree on some core content models?

  31. Thank You! • Infrastructure project wiki: • http://wiki.dlib.indiana.edu/confluence/display/INF • Contact info: • Ryan Scherle rscherle@indiana.edu • Muzaffer Ozakca mozakca@indiana.edu
