Learn about the ongoing revision of the Portico Preservation Metadata 2.0 project, its design goals, choices, and implications. Understand the shift towards optimized metadata handling and event documentation. Stay updated on the progress and future direction of this essential project.
ITHAKA Preservation Metadata 2.0: Revising the Event Model
A last-minute presentation on work currently in progress
Evan Owens
VP, Content Management
ITHAKA (JSTOR / Portico)
evan.owens@ithaka.org
Background
• Portico Preservation Metadata designed & implemented in 2002-2003
• Inspired by PREMIS working group participation
• Operational before PREMIS was completed!
• Portico Archive as of October 2009
  • >14 Million E-Journal Articles plus other content
  • ~150 Million Files
  • ~1 Billion Events
  • Only 1K manual events; 99.999% system generated
  • Over 1 TB of Preservation Metadata
• Portico / JSTOR / Ithaka merger in 2009
2.0 PMD Revision Project
• Begun in 2008; implementation now underway
• Design Goals for Revision to Events:
  • Consistent editorial/coding practices (capitalization, verb tenses, etc.)
  • Clarify what event goes with which object and why
  • Eliminate redundant information where possible
  • Make explicit all data constraints not currently expressed in our schemas
  • Synchronize event metadata with the high-level preservation metadata so that the events properly document changes in the core metadata
  • Establish a clean baseline for future expansion of events metadata
PMD 2.0 Design Choices
• Use our own data model / information architecture
• Optimized for Java, Oracle, and XML instantiations
• XML designed to reduce future versioning:
  • XSD schema for frame (syntax) only
  • All business rules (semantics) expressed in Schematron
• Not METS, not DIDL, not PREMIS XML
• PREMIS compliant
• Optimized for size and speed
• Fully relationally normalized
• Inheritable attributes / metadata
• Events attached to objects
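The split between a syntax-only frame and separately maintained business rules can be illustrated with a small sketch. This is not XSD or Schematron, and it is not Portico's schema: the element names (`event`, `type`, `object`, `outcome`) and both rules are hypothetical, chosen only to show why new rules can be added without touching the frame.

```python
import xml.etree.ElementTree as ET

# Frame (syntax): which children each element may contain.
# Element names are hypothetical; the slides do not show the real vocabulary.
FRAME = {
    "event": {"type", "object", "outcome"},
}

def check_frame(root):
    """Syntax check only: a known element with no unexpected children."""
    if root.tag not in FRAME:
        return [f"unknown element <{root.tag}>"]
    return [
        f"unexpected child <{child.tag}> in <{root.tag}>"
        for child in root
        if child.tag not in FRAME[root.tag]
    ]

# Business rules (semantics): kept in a separate table, standing in for the
# role Schematron plays in the slides. Both rules are invented examples.
RULES = [
    ("event must name its type",
     lambda e: bool((e.findtext("type") or "").strip())),
    ("a Check event must record an outcome",
     lambda e: e.findtext("type") != "Check" or e.find("outcome") is not None),
]

def check_rules(root):
    """Return the message of every business rule the element violates."""
    return [message for message, holds in RULES if not holds(root)]

def validate(xml_text):
    """Run the frame check first, then the business rules."""
    root = ET.fromstring(xml_text)
    return check_frame(root) + check_rules(root)
```

Under these assumptions, adding a constraint means appending to `RULES`; the frame, and any documents already valid against it, stay unchanged.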
Processing Record
“master” for each processing pass
Bring together information common to all the events from a given processing pass; e.g., initial ingest, future migration, etc.
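A minimal sketch of the processing-record idea: details shared by every event in a pass are stored once, and each event carries only a reference to them. The field names (`purpose`, `agent`, `started`) and identifiers are assumptions for illustration, not the fields of the actual Portico model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessingRecord:
    """Information common to all events of one processing pass (hypothetical fields)."""
    record_id: str
    purpose: str   # e.g. "initial ingest" or "migration"
    agent: str     # assumed: the software that ran the pass
    started: str   # assumed: ISO 8601 timestamp

@dataclass(frozen=True)
class Event:
    """An event stores only what is specific to it plus a pointer to the pass."""
    event_type: str
    object_id: str
    processing_record_id: str

def describe(event, records):
    """Rebuild the full picture of an event by joining it to its processing record."""
    rec = records[event.processing_record_id]
    return (f"{event.event_type} on {event.object_id} "
            f"during {rec.purpose} by {rec.agent} at {rec.started}")
```

With events numbering in the billions, storing the shared fields once per pass rather than once per event is the kind of redundancy the design goals aim to remove.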
Not a real event!
Example XML serialization showing all possible child elements to illustrate the information model
Event Types
• Check: Virus, Fixity, …
• Characterize: File, …
• Generate: Desc. MD, Tech. MD, Fixity, …
• Edit: Desc. MD, …
• Set: Status, Format, Preservation Level, …
• Ingest: into Archive
• Add, Create, Remove File
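One way to hold event naming to a consistent form is a controlled vocabulary of verb and target pairs, sketched below. The table contains only the entries listed on the slide; the slide's lists end in "…", so it is incomplete by construction. Reading "Add, Create, Remove File" as three verbs sharing the target "File" is an assumption, as is the `event_label` helper.

```python
# Verb -> allowed targets, copied from the slide. The slide's lists are
# open-ended ("..."), so this table is deliberately partial.
EVENT_TYPES = {
    "Check": ["Virus", "Fixity"],
    "Characterize": ["File"],
    "Generate": ["Desc. MD", "Tech. MD", "Fixity"],
    "Edit": ["Desc. MD"],
    "Set": ["Status", "Format", "Preservation Level"],
    "Ingest": ["into Archive"],
    # Assumed reading of "Add, Create, Remove File".
    "Add": ["File"],
    "Create": ["File"],
    "Remove": ["File"],
}

def event_label(verb, target):
    """Return a consistently formed event name, rejecting unknown combinations."""
    if verb not in EVENT_TYPES:
        raise ValueError(f"unknown event verb: {verb!r}")
    if target not in EVENT_TYPES[verb]:
        raise ValueError(f"{target!r} is not a known target for {verb!r}")
    return f"{verb} {target}"
```

Generating every label through one function keeps capitalization and verb form uniform across system-generated events.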
Observations
• Large-scale automated events feel very different from human events
• ITHAKA archive will quadruple in 2010
  • Likely 3-5 billion events . . .
• Every bit of metadata has to be need-justified
• Events have proved their value
  • An entire talk on that subject alone
• Nothing is easy in quantities of billions
• We still have to work on full lifecycle events
• THIS IS STILL A WORK IN PROGRESS!