1 / 19

Distributed Open Archives

Distributed Open Archives. Dr. Heinrich Stamerjohanns Institute for Science Networking at the University of Oldenburg. Goals of PhysDoc. dissemenation of articles (objects) low-barrier interoperability framework. Approaches (I) unstructured data. self-archiving (on web servers)

paloma
Télécharger la présentation

Distributed Open Archives

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Open Archives Dr. Heinrich Stamerjohanns Institute for Science Networking at the University of Oldenburg

  2. Goals of PhysDoc • dissemenation of articles (objects) • low-barrier interoperability framework

  3. Approaches (I) unstructured data • self-archiving (on web servers) • crawl, harvest articles by Harvester • make searchable through unified search interface • try to extract metadata and/or extend unstructured data by metadata • approaches taken by Harvest, mnogosearch

  4. PhysDoc together with Harvest documents documents documents WWW-Client Gatherer Search Interface Summarizer Filesystem with SOIF Records

  5. Approaches (II)structured data in homogenous environment • store data in relational databases • replicate data with proprietary protocol • can either be synchronous ar asynchronous • or use distributed database • but • same definitions everywhere • same data layout everywere

  6. Approaches (III)structured data in heterogenous environment • collect data from different databases through web search interfaces • meta-search engines • succesful implementation has been done: MetaPhys • but: • relies heavily on layout of presented data • a lot of adjusting needs to be done

  7. Approaches (IV)structured data in heterogenous environment • should: • be in machine-readable format not for humans • use strict formats which can be validated • support various content-models (metadata formats) • use existing technologies • easy to implement • easy to adopt

  8. Low-barrier framework • Transport protocol • HTTP • Data format • XML • Metadata format • interoperability • at least Dublin Core • extensibility • communities can use the metadata format which fits their needs

  9. Open Archive Initiative • OAI defines such a protocol: OA-PMH • is not intended to replace more complete interoperability protocols such as Z39.50 • distinguishes between two classes • Data providers expose metadata about their content • Service providers harvest metadata from data providers by using OA-PMH and offer value-added services such as the possibility to search through the collected data

  10. PhysDoc as OAI-Data-Provider • PMH v2.0 has been implemented by us • phpoai2 written in PHP • open source (GNU license) • supports various SQL databases through PEAR (PHP Extension and Application Repository) • supports on-the-fly XML output compression, which greatly reduces bandwith needs • easily configurable and adaptable to different metadata standards

  11. Metadata container as SQL Database documents documents Gatherer Summarizer Mapper Filesystem with SOIF Records OAI-Gateway Mapper Quality function Normalizer DC, MARC PhysDoc as OAI Data-Provider XML on-the-fly offline

  12. PhysDoc together with ??? • Use of metadata container yields many advantages • consistency check of data • quality assurance • static HTML export • any desired export metadata format besides DC possible  is prepared for any other exchange protocols than OAI

  13. OADPhysDoc as Service-Provider • PhysDoc will offer services to the physics community through Open Archives • articles are collected through OAI from various OAI Data-Providers • other publishers are and will be incorporated through proprietary interfaces. • these interfaces do not depend on layout of the offered data

  14. Metadata Container as SQL DB Mapper Normalizer Scheduler XML Parser WWW Search Interface OAI Data-Provider OAI Data-Provider OADPhysDoc as Service-Provider

  15. OADPhysDoc as Service-Provider • uses expat library to parse XML • currently supports only PMH-1.1 • cannot be easily adapted by other sites • support for PMH-2.0 is in progress

  16. Technical Details • local development • implementation also written in PHP • scheduler is based on database • expat library is used as XML-Parser for OAI and proprietary interfaces • database is again mySQL • with “tricks” • full text extensions • cannot be easily adapted by other sites • support for PMH-2.0 is in progress

  17. Technical Details • successful implementation by testing on the local data-provider • Added another data-provider within five minutes • normalization is again necessary (might raise further technical, textual and legal problems) • [but yet problems • vagueness in protocol definition • 503 flow control… • bad choice, because it depends on layout]

  18. Thank you • OAI am Institute for Science Networking, Oldenburg: http://physnet.uni-oldenburg.de/oai/ • stamer@uni-oldenburg.de

  19. OADDistributed Open Archives • joint project by Virginia Tech and University of Oldenburg Aims: • setup prototype service based on Open Archives which focuses on physics • design and implementation of prototype implementations which run the OAI protocol for metadata harvesting (PMH) • enable establishment and scalable interoperation of hundreds of Open Archives

More Related