Versioning of Digital Objects in a Fedora-based Repository
Versioning of Digital Objects in a Fedora-based Repository. Matthias Razum FIZ Karlsruhe DORSDL Workshop Alicante September 21, 2006. Outline. Motivation Versioning Concepts in eSciDoc Content Models Technical Approach Conclusion. Project Setup and Mission.
Presentation Transcript
Versioning of Digital Objects in a Fedora-based Repository • Matthias Razum, FIZ Karlsruhe • DORSDL Workshop, Alicante • September 21, 2006
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Project Setup and Mission • eSciDoc is a joint project of the Max-Planck-Society (MPS) and FIZ Karlsruhe • Five-year grant of 6 million € (2004 – 2009) from the German Federal Ministry of Education and Research • It aims to build an integrated information, communication and publishing platform for web-based scientific work, with multi-disciplinary applications in the MPS serving as the demonstration case • eSciDoc is not a mere research project, but aims at establishing an innovative production system
Repositories for eScience • The contents of an institutional repository or a digital library form the ‘institutional memory’ of an organization • And just like human memory, they should allow for associating information objects in novel contexts, thus creating new scholarship • Interdisciplinary work is becoming increasingly important, so systems have to span scientific disciplines • Repositories should be open, application-independent and flexible, thus laying the ground today for repurposing the information in future applications
Turning Static Objects into ‘Living’ Knowledge • e-Scholarship makes it possible to publish all intermediate results of knowledge generation, from first ideas, theories and discussions with peers to final results • Institutional Repositories and Digital Libraries need to support scholars from the early steps of this process, thus enabling their users to share their work in progress with peers • Thinking a step further leads to interactive authoring environments with support for collaboration and annotations • As a result, objects lose their static nature and become ‘active nodes’ in a network of knowledge
Implications • The concept of ‘ownership’ of an artifact is loosened and partly replaced by an ongoing authoring process which spans persons, places, and time • Collaborative authoring raises an issue familiar to software developers: versioning of digital objects • All intermediate or working versions of artifacts should become part of the repository, not just the final versions • Good Scientific Practice requires provenance data for objects and versioning
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Versioning on Object Level • Fedora’s basic object model – as defined in FOXML – is composed of an identifier, some key descriptive properties and a set of datastreams • Currently, each change to a datastream leads to a new version of the datastream, but not of the object itself. • On the other hand, authors and editors perceive objects as one coherent entity, not as a set of datastreams. • They request a ‘whole-object’ versioning which complies with their mental model.
Fixed and Floating Object References • Scholarly work relies strongly on citations and external references to existing material (e.g. primary data and supplementary material) • In the context of digital repositories, these associations are expressed as object relations. • Versioning of objects then raises the question of how to handle relations pointing to a versioned object. • eSciDoc implements two approaches: fixed relations, which point to exactly one given version of an object, and floating relations, which always point to the latest version of an object.
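The difference between the two relation types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `VersionedObject`, `Relation` and `resolve` are hypothetical and are not part of eSciDoc or Fedora. A relation that carries a version identifier is treated as fixed; one without is treated as floating.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedObject:
    """Hypothetical stand-in for a repository object with a version history."""
    pid: str
    versions: List[str] = field(default_factory=list)  # oldest first


@dataclass(frozen=True)
class Relation:
    """A relation to another object; version_id=None means 'floating'."""
    target_pid: str
    version_id: Optional[str] = None


def resolve(relation: Relation, repository: Dict[str, VersionedObject]) -> Tuple[str, str]:
    """Return (pid, version_id) that the relation currently points to."""
    target = repository[relation.target_pid]
    if relation.version_id is None:
        # Floating relation: always follow the latest version.
        return target.pid, target.versions[-1]
    if relation.version_id not in target.versions:
        raise KeyError(f"{target.pid} has no version {relation.version_id}")
    # Fixed relation: stay on the version that was cited.
    return target.pid, relation.version_id


repo = {"item:1": VersionedObject("item:1", ["1.0", "1.1"])}
fixed = Relation("item:1", "1.0")
floating = Relation("item:1")

repo["item:1"].versions.append("1.2")  # the cited object is edited again
```

After the extra version is appended, `resolve(fixed, repo)` still yields version `1.0`, while `resolve(floating, repo)` follows the object to `1.2`.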
Internal and Public Versions • Versions represent intermediate work statuses and are only visible to authors of digital objects • Revisions are published versions of objects with persistent identifiers. • Creating a revision is an intellectual step which most often includes some form of quality assurance, whereas versioning is an automated process.
Container Objects • eSciDoc allows the grouping of objects by means of container objects like collections or bundles. • A change to one of the contained objects substantially changes the container object as well. Therefore, any change to a contained object should lead to a new version of the container object. • The same applies to revisioning: container objects are citable objects with their own persistent identifier. Revisioning of contained objects forces a new revision of the container object too.
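The propagation rule for containers can be sketched as a walk up the containment hierarchy. The sketch assumes, for simplicity, that every object has at most one container and that versions and revisions are plain counters; `Node`, `new_version` and `new_revision` are hypothetical names, not eSciDoc API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical content item or container with simple counters."""
    pid: str
    version: int = 1
    revision: int = 0
    parent: Optional["Node"] = None  # enclosing container, if any


def new_version(node: Node) -> None:
    """A change to an object versions the object and every enclosing container."""
    current: Optional[Node] = node
    while current is not None:
        current.version += 1
        current = current.parent


def new_revision(node: Node) -> None:
    """Publishing a revision forces a new revision of every enclosing container."""
    current: Optional[Node] = node
    while current is not None:
        current.revision += 1
        current = current.parent


collection = Node("collection:1")
bundle = Node("bundle:1", parent=collection)
item = Node("item:1", parent=bundle)
sibling = Node("item:2", parent=collection)

new_version(item)     # item, bundle and collection each get a new version
new_revision(bundle)  # bundle and collection each get a new revision
```

Objects outside the changed path, such as `sibling`, keep their version; only the ancestors of the modified object are touched.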
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Content Models in General • An important part of implementing a Fedora repository is modeling the different classes or “genres” of digital objects that will be created, stored, and managed in the repository. • A content model will typically describe the following: • Datastream composition • the number and kinds of datastreams that must be present in the digital object • the format(s) for those datastreams, either MIME or format identifiers • whether each kind of datastream is required or optional • whether each kind of datastream has cardinality constraints • Semantic identifiers for each kind of datastream relationships • in the cases where a content model is a “graph” of related content models • Disseminators (optional)
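The datastream-composition part of a content model can be read as a set of rules that an object either satisfies or violates. The following sketch checks kind, format and cardinality; the rule structure, the datastream kinds and `ITEM_RULES` are illustrative assumptions and do not reproduce any actual Fedora or eSciDoc content model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DatastreamRule:
    """Hypothetical rule for one kind of datastream."""
    kind: str
    formats: Tuple[str, ...]      # allowed MIME types
    min_count: int = 1            # 0 makes the datastream optional
    max_count: Optional[int] = 1  # None means unbounded


def validate(datastreams: List[Tuple[str, str]],
             rules: List[DatastreamRule]) -> List[str]:
    """Check (kind, mime) pairs against the rules; return a list of problems."""
    problems: List[str] = []
    by_kind = {rule.kind: rule for rule in rules}
    counts = Counter(kind for kind, _ in datastreams)

    for kind, mime in datastreams:
        rule = by_kind.get(kind)
        if rule is None:
            problems.append(f"unexpected datastream kind: {kind}")
        elif mime not in rule.formats:
            problems.append(f"{kind}: format {mime} not allowed")

    for rule in rules:
        n = counts[rule.kind]
        if n < rule.min_count:
            problems.append(f"{rule.kind}: expected at least {rule.min_count}, found {n}")
        if rule.max_count is not None and n > rule.max_count:
            problems.append(f"{rule.kind}: expected at most {rule.max_count}, found {n}")
    return problems


# Illustrative rules only: one metadata record, one or more content
# streams, and an optional license.
ITEM_RULES = [
    DatastreamRule("escidoc-md", ("text/xml",)),
    DatastreamRule("content", ("application/pdf", "image/jpeg"), 1, None),
    DatastreamRule("license", ("text/xml",), 0, 1),
]
```

An object with one `escidoc-md` datastream and one PDF `content` datastream passes with an empty problem list; an object missing its metadata record, or carrying content in a format that is not listed, is reported.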
Structural View of Content Item [diagram] • Entities: Content Item, Essential Properties, eSciDoc Metadata, Metadata, Content Component, License, CC Metadata, CC License • Relations (with cardinalities 1 or *): hasProperties, hasDefaultMD, hasRevision, hasMD, hasComponent, hasLicense
Content Item Modeled as Fedora Object [diagram] • Content Item datastreams: RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD • Content Component datastreams: RELS-EXT, CC MD, License1 ... Licensen, Content Stream • Relation: Content Item hasComponent * Content Component
Container Modeled as Fedora Object [diagram] • Container datastreams: RELS-EXT, eSciDoc MD, MD1 ... MDn, Structure Map, WOV MD • Content Item datastreams: RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD • Relation: Container hasMember * Content Item
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Whole-Object Versioning Metadata • Fedora versioning works automatically within objects, at the level of individual datastreams • The eSciDoc middleware keeps track of whole-object versions via objectVersion metadata • The eSciDoc middleware can also tag particular whole-object versions as “revisions”, which are the officially published views of the object
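A minimal sketch of this bookkeeping follows, assuming that each whole-object version simply records which timestamped version of every component it refers to. The class and method names are hypothetical, and the `major.minor` version numbering is an assumption; this is not the eSciDoc middleware itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectVersion:
    version_id: str
    components: Dict[str, str]         # component PID -> datastream timestamp
    revision_id: Optional[str] = None  # set only for published revisions


@dataclass
class WholeObjectVersionLog:
    """Hypothetical log of whole-object versions for one parent object."""
    pid: str
    major: int = 1
    versions: List[ObjectVersion] = field(default_factory=list)

    def record(self, components: Dict[str, str]) -> ObjectVersion:
        """Automated step: snapshot the current component versions."""
        version_id = f"{self.major}.{len(self.versions)}"
        version = ObjectVersion(version_id, dict(components))
        self.versions.append(version)
        return version

    def tag_revision(self, version_id: str, revision_id: str) -> ObjectVersion:
        """Intellectual step: publish one recorded version as a revision."""
        for version in self.versions:
            if version.version_id == version_id:
                version.revision_id = revision_id
                return version
        raise KeyError(version_id)

    def latest_revision(self) -> Optional[ObjectVersion]:
        """Most recent published view, or None if nothing is published yet."""
        for version in reversed(self.versions):
            if version.revision_id is not None:
                return version
        return None


log = WholeObjectVersionLog("parent:1")
log.record({"child:1": "t0", "child:2": "t0"})                   # 1.0
log.record({"child:1": "t0", "child:2": "t1"})                   # 1.1
log.record({"child:1": "t0", "child:2": "t1", "child:3": "t2"})  # 1.2
log.tag_revision("1.2", "x.y/rev:1")
log.record({"child:1": "t4", "child:2": "t1", "child:3": "t2"})  # 1.3
```

Because recording and tagging are separate calls, later working versions (here `1.3`) do not change what `latest_revision()` returns until another version is explicitly published.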
Animated View • Whole-object versions of the Content Item (PID: parent:1) over time, with the version of each Content Component (CC) they reference; the version created at t3 is tagged as a revision
t0 • VersionID: 1.0, DOI: -- • CC1 child:1 at t0 • CC2 child:2 at t0
t1 • VersionID: 1.1, DOI: -- • CC1 child:1 at t0 • CC2 child:2 at t1
t2 • VersionID: 1.2, DOI: -- • CC1 child:1 at t0 • CC2 child:2 at t1 • CC3 child:3 at t2
t3 • VersionID: 1.3, DOI: x.y/rev:1 (revision) • CC1 child:1 at t0 • CC2 child:2 at t1 • CC3 child:3 at t2
t4 • VersionID: 1.4, DOI: -- • CC1 child:1 at t4 • CC2 child:2 at t1 • CC3 child:3 at t2
Object Version XML
<objectVersion versionID="1.0">
  <comment>this is the first whole object version</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
</objectVersion>
<objectVersion versionID="1.1" revisionID="doi:10.11.1234">
  <comment>child:5 is the same; child:6 modified; child:7 ingested</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
  <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
</objectVersion>
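Metadata of this shape can be read with Python's standard `xml.etree.ElementTree`. The sketch below wraps the two `objectVersion` elements in an `objectVersions` root element so that the text parses as a single document; that wrapper, like the function names, is an assumption made for the example and is not taken from the slides.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# The <objectVersions> wrapper is an assumption; the slide shows only
# the two <objectVersion> elements.
WOV_XML = """
<objectVersions>
  <objectVersion versionID="1.0">
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
  </objectVersion>
  <objectVersion versionID="1.1" revisionID="doi:10.11.1234">
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
    <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
  </objectVersion>
</objectVersions>
"""


def load_versions(xml_text: str) -> Dict[str, dict]:
    """Map each versionID to its revisionID and component timestamps."""
    root = ET.fromstring(xml_text.strip())
    versions: Dict[str, dict] = {}
    for ov in root.findall("objectVersion"):
        versions[ov.get("versionID")] = {
            "revisionID": ov.get("revisionID"),  # None if not a revision
            "components": {
                c.get("PID"): c.get("dateTime") for c in ov.findall("component")
            },
        }
    return versions


def changed_components(versions: Dict[str, dict], old: str, new: str) -> List[str]:
    """PIDs that are new or carry a different timestamp in the newer version."""
    before = versions[old]["components"]
    after = versions[new]["components"]
    return sorted(pid for pid, stamp in after.items() if before.get(pid) != stamp)


versions = load_versions(WOV_XML)
```

Comparing the timestamps of versions `1.0` and `1.1` identifies `child:6` (modified) and `child:7` (newly ingested) as the components that differ, while `child:5` is unchanged.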
Outline • Motivation • Versioning Concepts in eSciDoc • Content Models • Technical Approach • Conclusion
Conclusion • Versioning is essential for repositories which cover the whole object lifecycle • Fedora already comes with a powerful versioning mechanism, but it cannot fulfill all requirements of eSciDoc • Atomistic content models make versioning even more complex • The proposed approach provides a solution for advanced versioning requirements and at the same time demonstrates Fedora’s flexibility and adaptability
Acknowledgements The concepts in this presentation are based on • eSciDoc’s Logical Data Model, created by Natasa Bulatovic (ZIM, Max Planck Society) • a joint workshop of ZIM and FIZ with Sandy Payette and Carl Lagoze
Questions • matthias.razum@fiz-karlsruhe.de • www.escidoc-project.de/homepage.html