eSciDoc, VIRR and Digitization Lifecycle - insights into an infrastructure for management of digitized resources. Natasa Bulatovic, Max Planck Digital Library, Research and Development.
The Max Planck Digital Library (MPDL) in a Nutshell • Max Planck Digital Library (MPDL) is a service unit within the Max Planck Society (MPG) • MPG consists of about 80 institutes in three scientific sections • the Chemistry, Physics and Technology Section • the Biology and Medicine Section • the Human Sciences Section • The core activities of the MPDL lie in building up service infrastructure and tools for publications and research data • MPDL develops software solutions in close cooperation with scientists, librarians and technicians • In the Human Sciences Section several institutes have digitized cultural artefacts and want to make them open access
How? • PubMan – Publication Management • VIRR – Textual digitized resources management • IMEJI – Image management
VIRR is about • Collaboration of the MPDL with the Max Planck Institute for European Legal History • Motivation: The period of the Holy Roman Empire produced an enormous corpus of legislative sources. Until now, no complete collection of these works exists.
ViRR Key features • Web-based collaborative application • Editor (bibliographic metadata, table of contents and structural metadata) • Viewer (online representation) • Browser
ViRR Editor • Combines a set of tools • Paginator • Table of Contents Editor • Metadata Editor • One complex, but flexible workspace • No default order for the usage of the tools
ViRR Editor - Paginator • Assign the logical page numbers to the physical ones • Choose between different formats (Arabic, Latin, custom) • Paginate manually or automatically
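The automatic mode of the Paginator can be pictured as mapping ranges of physical scans to logical page labels. The following is a minimal sketch under stated assumptions, not ViRR's actual implementation: the names `PageRange`, `paginate`, and `to_roman` are hypothetical, and Roman numerals stand in for the format the slide calls "Latin".

```python
from dataclasses import dataclass


def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


@dataclass
class PageRange:
    """A run of consecutive physical scans sharing one numbering format (hypothetical type)."""
    first_scan: int          # 1-based index of the first physical scan
    last_scan: int           # inclusive
    style: str               # "arabic", "roman", or "custom"
    start: int = 1           # logical number given to first_scan
    custom_label: str = ""   # used only when style == "custom"


def paginate(ranges):
    """Return a dict mapping each physical scan index to its logical page label."""
    labels = {}
    for r in ranges:
        for offset, scan in enumerate(range(r.first_scan, r.last_scan + 1)):
            number = r.start + offset
            if r.style == "arabic":
                labels[scan] = str(number)
            elif r.style == "roman":
                labels[scan] = to_roman(number)
            elif r.style == "custom":
                labels[scan] = r.custom_label
            else:
                raise ValueError(f"unknown style: {r.style}")
    return labels


# Example: a title page, three front-matter pages, then the main text.
labels = paginate([
    PageRange(1, 1, "custom", custom_label="[title]"),
    PageRange(2, 4, "roman"),
    PageRange(5, 7, "arabic"),
])
```

Manual pagination would then amount to overwriting single entries of the resulting mapping.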
ViRR Editor - ToC Editor • Gather the logical structure of a work by breaking it down into structural elements • Arrange the hierarchical order of structural elements in the tree • Assign scans to structural elements • Choose from fine-grained structural element types (over sixty)
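The structure the ToC Editor produces can be thought of as a tree of typed elements, each with scans attached. The sketch below is only an illustration: the class `StructuralElement`, the helper functions, and the element type names ("monograph", "chapter", "section") are hypothetical and are not taken from ViRR's list of element types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuralElement:
    """One node of the logical structure (hypothetical model)."""
    element_type: str
    title: str
    scans: List[int] = field(default_factory=list)  # physical scan indices
    children: List["StructuralElement"] = field(default_factory=list)

    def add(self, child: "StructuralElement") -> "StructuralElement":
        """Attach a child element and return it, so calls can be nested."""
        self.children.append(child)
        return child


def outline(element, depth=0):
    """Return the tree as indented text lines, one per structural element."""
    lines = [f"{'  ' * depth}{element.element_type}: {element.title}"]
    for child in element.children:
        lines.extend(outline(child, depth + 1))
    return lines


def all_scans(element):
    """Collect the scans assigned anywhere in the subtree, sorted and de-duplicated."""
    scans = set(element.scans)
    for child in element.children:
        scans.update(all_scans(child))
    return sorted(scans)


work = StructuralElement("monograph", "Example work", scans=[1])
chapter = work.add(StructuralElement("chapter", "Chapter 1", scans=[2, 3]))
chapter.add(StructuralElement("section", "Section 1.1", scans=[3, 4]))
```

A scan may belong to more than one element here (scan 3 closes the chapter opening and starts the section), which is why `all_scans` de-duplicates.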
ViRR Editor – Metadata Editor • Assign descriptive metadata to structural elements • Detailed description of every structural element • Systematic browsing • Dedicated search will be possible
ViRR Viewer • Browse by ToC • Navigate to page • View metadata of structural element • Browse by scan • Page (web resolution) • Page (full resolution) on click
ViRR: Sharing and reuse http://virr.mpdl.mpg.de
From ViRR to Digitization Lifecycle Project • Goal • support the complete Digitization Lifecycle with guidelines, standards, tools and a publishing platform • Partners: • MPI for European Legal History, Frankfurt • Kunsthistorisches Institut, Florence (KHI) • Bibliotheca Hertziana, Rome • MPI for Human Development, Berlin • Related projects: • ViRR (see http://colab.mpdl.mpg.de/mediawiki/ViRR:_Virtueller_Raum_Reichsrecht) • XML-Workflow (see http://colab.mpdl.mpg.de/mediawiki/MPDL_Project_XML_Workflow)
Imeji: repository of digital images • Organized into • Collections: created and defined by the institution, project, or working group • Albums: created and defined by the researcher
Imeji: what is so different about it? • Imeji is not Flickr, nor Facebook... • Freely definable metadata profiles at collection level • Controlled vocabularies may be integrated • Smart search for dates, ranges (based on the metadata type) • Helps gather metadata more effectively • Focuses on collaboration and metadata quality • Repository: data can be exported at any time
eSciDoc core infrastructure (areas: Statistics, Security, Resources & Data) • Report Handler • Report Definition Handler • Aggregation Definition Handler • Statistics Data Handler • Scope Handler • Admin Handler • Set Handler (OAI-PMH) • Item Handler • Container Handler • Content Relation Handler • Context Handler • Organizational Unit Handler • Content Model Manager • User Account Handler • Role Handler • Group Handler
CoNE Service • Manages named entities • Journals • Persons • Dewey Decimal Classification (3 public levels) • Creative Commons Licenses (CC licenses) • ISO 639-3 Languages • MIME Types • PACS classification • Custom classifications • Reuse • Data delivered in multiple formats (JSON, HTML, RDF/XML, Options list) • Motivation • Metadata quality: autosuggest components in solutions during metadata editing • Disambiguation: each entity is a named graph • Data linking: CoNE identifiers in publication metadata • Technical facilitation: all lists in one place • Persons: Researcher Portfolio • Extensions • Refresh data from external sources
CoNE – Control of Named Entities • http://cone.mpdl.mpg.de/ • http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450 • Content negotiation supported
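Content negotiation means that the same entity URL can be asked for different representations through the HTTP `Accept` header. The sketch below only builds such requests with Python's standard library and does not send them. The helper name `negotiated_request` and the exact media types are assumptions; the slides state the formats (JSON, HTML, RDF/XML) but not which `Accept` values the service honours.

```python
from urllib.request import Request

# Person resource URL as given on the slide.
CONE_PERSON = "http://pubman.mpdl.mpg.de/cone/persons/resource/persons2450"

# Assumed mapping from a short format name to a standard media type.
ACCEPT = {
    "rdf": "application/rdf+xml",
    "json": "application/json",
    "html": "text/html",
}


def negotiated_request(url: str, fmt: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    if fmt not in ACCEPT:
        raise ValueError(f"unsupported format: {fmt}")
    return Request(url, headers={"Accept": ACCEPT[fmt]})


rdf_request = negotiated_request(CONE_PERSON, "rdf")
json_request = negotiated_request(CONE_PERSON, "json")
```

Passing one of these objects to `urllib.request.urlopen` would perform the actual request; whether the service is still reachable at this address is not something the slides can confirm.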
Transformation Service • Transforms textual data formats • Metadata • Resources • Standard formats • Specific formats (e.g. EndNote custom fields) • Motivation • Migration of data from MPI • Exports and dissemination • Imports • Continuous interoperability enhancement • Implement once, use wherever needed
Search & Export Service / Citation Style Manager • Searches and exports results • Citation styles (Citation Style Manager) • EndNote • BibTeX • … • Reuse • Data delivered in multiple formats (PDF, HTML, XML, ODT) • By external systems (content management, WordPress) • Motivation • Search results should be available in various outputs • One service – many presentations (e.g. WordPress plug-in) • One interface – easy inclusion of various export formats
Syndication Service • Provides the latest data updates • RSS • Atom • Reuse • Subscription to feeds and data reuse • By any external clients • Extensions • Media RSS • Feeds: <feed> <!--The title of the feed --> <title>Recent releases in repository</title> <!--Feed's description --> <description>Recent releases in repository (item versions)</description> … </feed> • Diagram: Syndication Service (1, 4) • 2: Get feed definition • 3: Search/retrieve items (eSciDoc Repository)
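The flow in the diagram (get the feed definition, search the repository, return a feed) can be sketched as follows. This is an illustration under assumptions, not the service's code: `serve_feed`, `render_feed`, the `query` key, and the `search` callback are hypothetical, steps 1 and 4 are assumed to be the client request and response, and the output is a minimal Atom-style document that leaves out elements a validator would require (feed id, updated, author).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def _atom(tag: str) -> str:
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def render_feed(definition, items) -> str:
    """Render a feed definition plus retrieved items as Atom-style XML text."""
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("title")).text = definition["title"]
    ET.SubElement(feed, _atom("subtitle")).text = definition["description"]
    for item in items:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("id")).text = item["id"]
        ET.SubElement(entry, _atom("title")).text = item["title"]
        ET.SubElement(entry, _atom("updated")).text = item["updated"]
    return ET.tostring(feed, encoding="unicode")


def serve_feed(feed_name, definitions, search) -> str:
    """Handle one feed request (assumed step 1) and return the feed (assumed step 4)."""
    definition = definitions[feed_name]      # step 2: get feed definition
    items = search(definition["query"])      # step 3: search/retrieve items
    return render_feed(definition, items)
```

Keeping the repository search behind a callback means the same rendering code could serve RSS or Atom clients and be exercised without a running repository.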
Validation service • Semantic validation • Contextual validation • Validation rule editor (upcoming)
Data acquisition service • Fetches data from known sources via identifier (unAPI interface) • Transforms data to other format
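unAPI defines three request shapes on a single endpoint: no parameters (formats offered in general), `id` alone (formats available for one object), and `id` plus `format` (the object itself). The sketch below builds those URLs; the function name `unapi_url`, the endpoint `http://example.org/unapi`, and the identifier are placeholders, not sources the service is known to use.

```python
from urllib.parse import urlencode


def unapi_url(base, identifier=None, fmt=None):
    """Build one of the three unAPI request URLs for the endpoint `base`."""
    if fmt is not None and identifier is None:
        raise ValueError("unAPI needs an id whenever a format is given")
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return base if not params else f"{base}?{urlencode(params)}"


BASE = "http://example.org/unapi"  # hypothetical endpoint

list_all_formats = unapi_url(BASE)
list_object_formats = unapi_url(BASE, "example:12345")
fetch_object = unapi_url(BASE, "example:12345", "bibtex")
```

The fetched record would then be handed to a transformation step to obtain the target format, as the second bullet on the slide describes.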
PubMan SWORD Server • Deposit of data packages (metadata and full texts) • Logic implements a PubMan-specific workflow
PID Cache manager • Fetches Handles from the GWDG Handle System (dummy resolution) • Assigns a pre-fetched handle to the resource • Synchronizes the assigned handle so that it resolves to the resource in the Handle System • EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland, http://www.pidconsortium.eu/ )
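The three bullets describe a pre-fetch pattern: handles are obtained in advance, handed out immediately when a resource needs one, and pointed at the real resource later. The sketch below models that pattern with in-memory queues. The class `PidCache` and the two callbacks are hypothetical stand-ins for calls to a handle service; no real Handle System API is used.

```python
from collections import deque


class PidCache:
    """Pool of pre-fetched handles with deferred synchronization (hypothetical model)."""

    def __init__(self, fetch_handles, update_handle, batch=3):
        self._fetch = fetch_handles    # callable: n -> list of new handles
        self._update = update_handle   # callable: (handle, url) -> None
        self._batch = batch
        self._pool = deque()           # pre-fetched, still unassigned handles
        self._unsynced = deque()       # (handle, url) pairs awaiting synchronization

    def assign(self, resource_url):
        """Hand out a pre-fetched handle without waiting for the handle service."""
        if not self._pool:
            self._pool.extend(self._fetch(self._batch))
        if not self._pool:
            raise RuntimeError("handle service returned no handles")
        handle = self._pool.popleft()
        self._unsynced.append((handle, resource_url))
        return handle

    def synchronize(self):
        """Point every assigned handle at its resource; return how many were updated."""
        count = 0
        while self._unsynced:
            handle, url = self._unsynced.popleft()
            self._update(handle, url)
            count += 1
        return count
```

The point of the design is that assigning an identifier never blocks on the external service, while the slower update of the resolution target can run in the background.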
A note on the metadata profiles • DCAP based (Dublin Core Application Profile) • DC terms (identified by URIs) • eSciDoc solution specific terms (identified by URIs) • METS/MODS • Publicly available • Functional description http://colab.mpdl.mpg.de/mediawiki/ESciDoc_Application_Profiles • Schemas http://metadata.mpdl.mpg.de/escidoc/metadata/schemas/0.1/ • Interoperability levels • Shared term definitions (done) • Semantic interoperability (done) • Description set syntactic interoperability (prepared) • Description set profile interoperability (prepared)
Premises • Applications • Web-based • Internationalized • Integrated help system • Easy to use • Easy to install • Services and infrastructure • Reusable, interoperable, composed, technology-independent • Extensible, scalable and performant • Data • Persistently identified, versioned, discoverable, with provenance and authenticity information and fine-grained authorization • Described with published metadata profiles • Interoperable and enabled for reuse and repurposing
Related projects and new developments • DARIAH Digital Research Infrastructure for Arts and Humanities (see http://dariah.eu) • Imeji • AWOB • Astronomers Workbench • Resource Registries • ECHO – European Cultural Heritage Online (see http://echo.mpiwg-berlin.mpg.de/home )
Thank you! • bulatovic@mpdl.mpg.de http://colab.mpdl.mpg.de http://escidoc.org