This presentation discusses the development of "Pergamos", a digital library system based on FEDORA and created at the University of Athens. The motivation behind this initiative stems from the UoA's need for an efficient way to manage over one million digitized objects across diverse collections. The system addresses automation in content ingestion and simplifies web-based cataloging while emphasizing effective management of various digital object types. The implementation details and challenges, along with a preview of Pergamos, are presented, highlighting the collaboration of a dedicated team.
European FEDORA User Meeting, Copenhagen, 28 September 2005 Introducing “Pergamos”: A FEDORA-based Digital Library System utilizing Digital Object Prototypes Kostas Saidis, saiko@di.uoa.gr Libraries Computer Center, Department of Informatics & Telecommunications, University of Athens
Outline • Motivation – The University of Athens (UoA) DL • Digital Objects (DOs) • DO Storage (FEDORA) • DO Manipulation (DL Application Logic) • Digital Object Prototypes • Automatic DO Type Conformance • Scope of Prototypes & Collection Management • Implementation Details • A Preview of Pergamos • Discussion
The UoA DL Project • Over 1 million objects originating from 8 disparate collections • Folklore notebooks, Ancient papyri, UoA Historical Archive, Byzantine music manuscripts, Theatrical photos & brochures, Informatics research papers and dissertations, Medical images, Press articles • Heterogeneous material, in terms of content type, metadata, structure, user requirements • Mostly digitized material, requiring detailed cataloging
UoA DL Project Metadata • Build a Web-based DL System to handle all material • Centralized DL approach due to • Existing hardware infrastructure • Funding restrictions • Administration simplicity • FEDORA is our DO Repository
UoA DL Project Metadata Contd. • Small Team • 2.5 developers, 1 librarian, 1 manager • Requirements, Specifications, Development, Digitization & Cataloging Management … • … while everyday tasks keep running! • Cataloging Personnel • Scholars & Experts in each collection’s domain (not librarians) • Strict Schedule • First Collection deadline: early 2006 • Project deadline: end of 2006
Motivation • Simplify & speed up the cataloging process • Provide effective Web-based cataloging interfaces • Automate content ingestion • Decrease development time • Avoid custom coding for each content variation • Elaborate on reusable and configurable DL modules • Provide the means to treat content variations in a unified manner
Digital Objects • A Digital Object is a human generated artifact consisting of the digital content and related information
FEDORA • FEDORA Digital Object Model • Content Models, Datastreams, Behavior Definitions, Mechanisms & Disseminators • FEDORA is a DO Repository • Focus on how each DO part is encoded & stored • Handles effectively issues related to storage, preservation & versioning, searching & indexing, interoperability
DL Application Logic • Cataloging, Workflows, Collection Building & Management, User Interfaces, etc. • DL Modules manipulate DOs at a higher level of abstraction • Focus on the overall behavior of the DO (what are the DO parts and how do they behave) • DOs reflect the underlying “real world” objects – they behave according to their nature, their essence, their type
DO Typing information Do we effectively capture, express and utilize the nature (type) of DOs?
An example – Theatrical Collection • Albums containing photos of National Theater Performances • What is a Photo DO? • A digital image • stored in various formats (e.g. high quality, www quality, thumbnail) • accompanied by the metadata required for describing the picture • What is an Album DO? • A container of Photo DOs accompanied by theatrical play metadata
A 2nd example – Historical Archive • University’s Senate Session Proceedings > Folders > Sessions > Items • What is an Item DO? • A digital image (capturing 1 or 2 pages) • stored in various formats (e.g. high quality, www quality, thumbnail) • What is a Session DO? • A container of Item DOs + metadata • What is a Folder DO? • A container of Session DOs + metadata
DO Typing Information • FEDORA Content Models express DO Typing information • Content Models are metadata attributes (e.g. “photo”, “album”) that we use as a guide • Humans interpret Content Models, not the DL System • Manual resolution of DO Typing issues
Problems • Catalogers carry out manual XML editing at a low level of abstraction, with overly technical, complex & over-detailed semantics • Developers produce ad hoc, custom, non-reusable implementations of the behavioral variations of DO types • DL modules exhibit limited evolution and configuration capabilities
DO Typing Information The DL System should resolve DO Typing issues automatically (in a manner transparent to the DL Application Logic)
Automatic DO Type Conformance • The designer specifies the various DO types… • … and the DL System makes DOs conform to these type specifications automatically • How?
The OO Viewpoint • In the OO model an object is itself aware of its “nature” and behaves accordingly • Objects are conceived as instances of a type, automatically conforming to the type’s definitions & specifications • OO types are separate entities (named either classes or prototypes)
Digital Object Prototypes • A DO Prototype is a DO Type Specification, a separate entity that defines the DO’s: • Constitutional parts – metadata sets, files, structure, etc • Private behaviors – DO internal operations such as serializations, validations, assignment of default values, content conversions, etc • Public behaviors (behavior schemes) – the DO external interface, consisting of high level operations such as Detail view, Browse View, Edit View, etc
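The three groups of definitions above can be pictured as a small data structure. The Python sketch below is illustrative only: the class names (DOPrototype, MDSetSpec, FileSpec), the behavior names and the default language value are assumptions made for this example and are not taken from Pergamos.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MDSetSpec:
    """A metadata set the DO must carry (hypothetical structure)."""
    name: str
    fields: List[str]


@dataclass
class FileSpec:
    """A file held by the DO, plus the derived formats to generate."""
    name: str
    derived_formats: List[str] = field(default_factory=list)


@dataclass
class DOPrototype:
    """A DO type specification, kept separate from the DOs themselves."""
    type_name: str
    # Constitutional parts
    md_sets: List[MDSetSpec] = field(default_factory=list)
    files: List[FileSpec] = field(default_factory=list)
    allowed_children: List[str] = field(default_factory=list)
    # Private behaviors: internal operations, keyed by name
    private_behaviors: Dict[str, Callable[[dict], dict]] = field(default_factory=dict)
    # Public behaviors (behavior schemes): the external interface
    public_behaviors: Dict[str, Callable[[dict], str]] = field(default_factory=dict)


def assign_defaults(data: dict) -> dict:
    """Example private behavior: fill in an assumed default language."""
    result = dict(data)
    result.setdefault("language", "el")
    return result


def detail_view(data: dict) -> str:
    """Example public behavior: render a one-line detail view."""
    return f"{data['title']} ({data['language']})"


photo = DOPrototype(
    type_name="dl.theatre.photo",
    md_sets=[MDSetSpec("DC", ["title", "language"])],
    files=[FileSpec("image", ["www quality", "thumbnail"])],
    private_behaviors={"assignDefaults": assign_defaults},
    public_behaviors={"detailView": detail_view},
)

data = photo.private_behaviors["assignDefaults"]({"title": "Antigone, Act 1"})
print(photo.public_behaviors["detailView"](data))
```

Keeping behaviors inside the prototype, rather than in each DL module, is what lets a module ask for "detailView" without knowing which kind of DO it is holding.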
DO Prototypes & Instances • The designer carries out the definition of DO Prototypes – the DL System handles the rest • DO Prototypes represent the realization of the Content Model notion in an OO fashion: • The process of generating a DO from a Prototype is called instantiation • The resulting object is an instance of the Prototype • A DO instance automatically conforms to the Prototype’s specifications • Stored DOs vs DO instances
Digital Object Dictionary • The runtime environment in which DO instances and Prototypes operate: • Instantiation of DOs based on the prototype specifications (private behaviors: load & parse XML, assign default values, etc.) • Exposure of the public DO behaviors through a high-level, uniform API (for use by DL Modules) • Serialization of the DO instance back to FEDORA (private behaviors: serialize data structures in XML, perform validations, etc.)
A DL Module performs the following steps: • Acquire the DO instance: do = dictionary.acquireObject("type"); do = dictionary.acquireObject("uoadl:1024") • Perform operations upon it: do.getMDSet("DC").getField("title"); dictionary.executeBehavior(do, "editView") • Store the DO in the repository: dictionary.saveObject(do) • Cleaner, simpler, more effective expression of DL Application Logic
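The acquire / operate / store steps above can be sketched as a small runnable example. This is a minimal sketch under stated assumptions, not the Pergamos implementation: a plain dict stands in for FEDORA, the method names are snake_case counterparts of the calls on the slide, metadata sets are plain dicts, and the rule that a known type name yields a new instance while any other key is treated as a pid is a guess at the behavior of the two acquireObject calls.

```python
class DOInstance:
    """An in-memory DO that knows its own type (hypothetical structure)."""

    def __init__(self, pid, type_name, md_sets):
        self.pid = pid
        self.type_name = type_name
        self.md_sets = md_sets  # e.g. {"DC": {"title": "..."}}

    def get_md_set(self, name):
        return self.md_sets[name]


class DODictionary:
    """Mediates between DL modules and the repository.

    A plain dict stands in for FEDORA; no real FEDORA API is called.
    """

    def __init__(self, prototypes, repository):
        self._prototypes = prototypes  # type name -> prototype spec
        self._repository = repository  # pid -> stored record
        self._next_id = 1

    def acquire_object(self, key):
        # Assumption: a known type name yields a new instance,
        # anything else is treated as the pid of a stored DO.
        if key in self._prototypes:
            proto = self._prototypes[key]
            md = {n: dict(defaults) for n, defaults in proto["md_sets"].items()}
            pid = f"demo:{self._next_id}"  # made-up pid scheme
            self._next_id += 1
            return DOInstance(pid, key, md)
        record = self._repository[key]
        md = {n: dict(values) for n, values in record["md_sets"].items()}
        return DOInstance(key, record["contentModel"], md)

    def execute_behavior(self, do, behavior):
        # The behavior is looked up in the DO's prototype, not in the module.
        return self._prototypes[do.type_name]["behaviors"][behavior](do)

    def save_object(self, do):
        self._repository[do.pid] = {
            "contentModel": do.type_name,
            "md_sets": {n: dict(v) for n, v in do.md_sets.items()},
        }


prototypes = {
    "dl.theatre.photo": {
        "md_sets": {"DC": {"title": "", "language": "el"}},
        "behaviors": {"editView": lambda do: f"Edit view of {do.pid}"},
    }
}
repository = {}
dictionary = DODictionary(prototypes, repository)

do = dictionary.acquire_object("dl.theatre.photo")   # acquire
do.get_md_set("DC")["title"] = "Antigone, Act 1"     # operate
dictionary.save_object(do)                           # store

reloaded = dictionary.acquire_object(do.pid)
print(reloaded.get_md_set("DC")["title"])
print(dictionary.execute_behavior(reloaded, "editView"))
```

The module code at the bottom never touches XML or datastreams; everything type-specific is resolved through the dictionary.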
3-tier DL Architecture • Separation of Concerns • Composition of DO behaviors • DO Typing & Instantiation • Storage
Pergamos If it sounds like Greek…
Scope of Prototypes • Should we have global DO Types? • Collection-pertinent types: A DO Prototype is defined in the context of a Collection • Support fine grained definition of collection specific kinds of material • Hierarchical naming scheme for types • Theatrical Collection Photo: dl.theatre.photo • Medical Collection Photo: dl.medical.photo • Stored in the “contentModel” metadata attribute • Avoid type collisions
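The hierarchical naming scheme can be illustrated with a short Python sketch. The registry class and its method names are hypothetical; only the dotted type names and the use of the "contentModel" attribute as the lookup key come from the slide.

```python
def split_type_name(type_name):
    """Split 'dl.theatre.photo' into ('dl.theatre', 'photo')."""
    parts = type_name.split(".")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a hierarchical type name: {type_name!r}")
    return ".".join(parts[:-1]), parts[-1]


class PrototypeRegistry:
    """Maps fully qualified type names to prototypes (hypothetical)."""

    def __init__(self):
        self._by_name = {}

    def register(self, type_name, prototype):
        split_type_name(type_name)  # rejects malformed names
        if type_name in self._by_name:
            raise KeyError(f"type already defined: {type_name}")
        self._by_name[type_name] = prototype

    def lookup(self, content_model):
        """Resolve the value stored in a DO's contentModel attribute."""
        return self._by_name[content_model]

    def types_in(self, collection):
        """List the types defined in the context of one collection."""
        return sorted(
            name for name in self._by_name
            if split_type_name(name)[0] == collection
        )


registry = PrototypeRegistry()
registry.register("dl.theatre.photo", {"md_sets": ["DC", "play"]})
registry.register("dl.theatre.album", {"md_sets": ["DC"]})
registry.register("dl.medical.photo", {"md_sets": ["DC", "clinical"]})

# Two "photo" types coexist because the collection is part of the name.
print(registry.lookup("dl.theatre.photo"))
print(registry.lookup("dl.medical.photo"))
print(registry.types_in("dl.theatre"))
```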
Collection Management • DL = Hierarchy of DO instances • Collections are also DOs • The DL itself is a DO, representing the “super-collection” (the collection of all the collections) • Easily add new collections & sub-collections • All content is modeled in a unified manner & can be characterized • Allow the DL designer to work out the details of each collection independently, yet in a uniform manner
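A rough Python sketch of the "hierarchy of DO instances" idea follows. All pids and the type names given to collection DOs are made up for illustration; the point is only that the DL, its collections and their content are walked and extended with the same operations.

```python
from typing import Dict, Iterator, List, Tuple

# pid -> (type name, pids of contained DOs); every pid and type name
# below is hypothetical.
DL: Dict[str, Tuple[str, List[str]]] = {
    "demo:dl": ("dl", ["demo:theatre", "demo:archive"]),
    "demo:theatre": ("dl.theatre", ["demo:album1"]),
    "demo:album1": ("dl.theatre.album", ["demo:photo1", "demo:photo2"]),
    "demo:photo1": ("dl.theatre.photo", []),
    "demo:photo2": ("dl.theatre.photo", []),
    "demo:archive": ("dl.archive", []),
}


def walk(pid: str, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, pid, type) for a DO and everything it contains."""
    type_name, children = DL[pid]
    yield depth, pid, type_name
    for child in children:
        yield from walk(child, depth + 1)


def add_collection(parent_pid: str, pid: str, type_name: str) -> None:
    """Adding a collection is just adding one more DO under a parent."""
    DL[pid] = (type_name, [])
    DL[parent_pid][1].append(pid)


add_collection("demo:dl", "demo:medical", "dl.medical")

for depth, pid, type_name in walk("demo:dl"):
    print("  " * depth + f"{pid} [{type_name}]")
```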
Implementation details • DO Prototypes are • Specified in XML form • Stored in the “TEMPLATE” datastream of the appropriate Collection DO • Loaded, parsed & interpreted by the DO Dictionary in its bootstrap procedure • Transparent to FEDORA • DO Instances are supplied with the “CONTAINER” datastream, containing the pids of the DOs they “contain”
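To make the bootstrap step more concrete, here is a hedged Python sketch of loading prototypes from an XML document. The slides do not show the actual Pergamos prototype schema, so the element and attribute names below (prototypes, prototype, mdset, field, file, children) are invented for this example; in the real system the XML would come from a Collection DO's "TEMPLATE" datastream rather than a string constant.

```python
import xml.etree.ElementTree as ET

# Invented schema, for illustration only.
TEMPLATE_XML = """
<prototypes collection="dl.theatre">
  <prototype name="photo">
    <mdset id="DC">
      <field name="title" mandatory="true"/>
      <field name="description" mandatory="false"/>
    </mdset>
    <file id="image" convert="jpeg thumb"/>
  </prototype>
  <prototype name="album">
    <mdset id="DC">
      <field name="title" mandatory="true"/>
    </mdset>
    <children allowed="photo"/>
  </prototype>
</prototypes>
"""


def load_prototypes(xml_text):
    """Parse one collection's prototypes into plain dicts."""
    root = ET.fromstring(xml_text)
    collection = root.get("collection")
    prototypes = {}
    for proto in root.findall("prototype"):
        # Types are qualified with the name of the owning collection.
        full_name = f"{collection}.{proto.get('name')}"
        md_sets = {
            mdset.get("id"): {
                f.get("name"): f.get("mandatory") == "true"
                for f in mdset.findall("field")
            }
            for mdset in proto.findall("mdset")
        }
        files = {
            f.get("id"): f.get("convert", "").split()
            for f in proto.findall("file")
        }
        children = proto.find("children")
        allowed = children.get("allowed").split() if children is not None else []
        prototypes[full_name] = {
            "md_sets": md_sets,
            "files": files,
            "allowed_children": [f"{collection}.{c}" for c in allowed],
        }
    return prototypes


protos = load_prototypes(TEMPLATE_XML)
for name, spec in sorted(protos.items()):
    print(name, spec)
```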
DO Prototypes in detail • MD Sets • Specification of each individual field (label, description, multi-value, mandatory, UI characteristics) • Serialization information (how to store it in FEDORA) • Field mappings (under development) • Files: Automatic conversions (tiff -> jpeg + thumb) • Batch Import: automatically create DOs from zip bundles • Structure: allowed children types • Browsers: browse field • Indices: e.g. subject catalog • Behavior schemes: atomic DO elements
Pergamos • Historical Archive (production) • Folklore Notebooks (testing) • Theatrical Collection, Medical Images & Byzantine music manuscripts (finalization of requirements & specifications) • Undergoing development … the remaining collections are coming next • Historical Archive will be published in early 2006… • … with a multi-lingual UI, hopefully!
Future Work • Fully implement the OO paradigm • OO Inheritance for DO Prototypes (e.g. the Notebook type derives from the Book type) • OO Polymorphism for DO instances (e.g. the DO “uoadl:1234” is both a Notebook & a Book) • Supply general purpose linking capabilities that exceed structural relations (FEDORA Metadata for Object-to-Object Relationships?) • Deliver on schedule…
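The field-level and structural parts of a prototype lend themselves to a small worked example. The Python sketch below assumes a simplified field specification (label, mandatory, multi-value, default) and a simple table of allowed children types; the names FieldSpec, apply_defaults, validate and can_contain are hypothetical, as are the sample fields and default value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldSpec:
    """Specification of one metadata field (simplified, hypothetical)."""
    name: str
    label: str
    mandatory: bool = False
    multi_value: bool = False
    default: Optional[str] = None


def apply_defaults(specs: List[FieldSpec], values: Dict[str, List[str]]):
    """Return a copy of values with missing defaults filled in."""
    result = dict(values)
    for spec in specs:
        if spec.name not in result and spec.default is not None:
            result[spec.name] = [spec.default]
    return result


def validate(specs: List[FieldSpec], values: Dict[str, List[str]]) -> List[str]:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    known = {spec.name for spec in specs}
    for spec in specs:
        entries = values.get(spec.name, [])
        if spec.mandatory and not entries:
            errors.append(f"{spec.label}: required")
        if not spec.multi_value and len(entries) > 1:
            errors.append(f"{spec.label}: only one value allowed")
    for name in values:
        if name not in known:
            errors.append(f"{name}: unknown field")
    return errors


def can_contain(allowed_children: Dict[str, List[str]], parent: str, child: str) -> bool:
    """Structure check: may a DO of type parent contain a DO of type child?"""
    return child in allowed_children.get(parent, [])


specs = [
    FieldSpec("title", "Title", mandatory=True),
    FieldSpec("subject", "Subject", multi_value=True),
    FieldSpec("language", "Language", default="el"),
]
allowed = {"dl.theatre.album": ["dl.theatre.photo"]}

values = apply_defaults(specs, {"subject": ["tragedy", "masks"]})
print(validate(specs, values))
print(can_contain(allowed, "dl.theatre.album", "dl.theatre.photo"))
```

Because the checks are driven by the specification, a cataloging form and a batch importer could share them without any per-collection code.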
Conclusions • If in doubt, use FEDORA • Flexible & Extensible (they mean it) • 1 year of Pergamos development, 2 months of testing & 3 months of production use (Historical Archive) with no serious problems • Though, Sandy & Carl, I’d be grateful for some minutes of your time!!! • DO Prototypes: a realization of Content Models in OO terms, implemented on top of FDOM to handle DO Typing issues automatically • Detailed report on Pergamos to appear…
Thank You • Questions? • Comments? • For details: • "On the Effective Manipulation of Digital Objects: A Prototype-based Instantiation Approach", Kostas Saidis, George Pyrounakis, Mara Nikolaidou, Proc. 9th European Conference on Research and Advanced Technology for Digital Libraries, ECDL 2005, Vienna, Austria, September 2005 • email: saiko@di.uoa.gr