OAI from the needle box

Thomas Krichel, Palmer School of Library and Information Science, Long Island University. With apologies to Carl Lagoze. Humboldt Universität Berlin, March 20, 2002.


Presentation Transcript


  1. OAI from the needle box Thomas Krichel Palmer School of Library and Information Science Long Island University With apologies to Carl Lagoze Humboldt Universität Berlin, March 20, 2002

  2. Where I come from... • Trained economist • Early (1991) visionary of free online scholarship • Creator of NetEc in 1993 • Principal founder of RePEc in 1997 • Largest distributed academic DL in the world • Collection that is open for • Contribution • Usage • Grown to over 200 archives, over 10 partly interoperable user services

  3. Metadata collection process • Metadata is expensive to collect. • Free online scholarship requires academic self-documentation. • Building a free metadata collection is difficult: • no established business model • no established funding channels • Only a collaborative effort will succeed.

  4. The example of eprint servers • attractive building block for the transformation of scholarly communication • but isolated efforts do not make for a scholarly communication system • need to federate archives • need to interoperate with other scholarly communication components

  5. Example: e-print accessibility (diagram: five e-print servers)

  6. Example: e-print accessibility (diagram: five e-print servers)

  7. metadata harvesting (diagram: e-print servers; metadata)

  8. metadata harvesting (diagram: e-print servers; metadata) • Author • Title • Abstract • Identifier

  9. other examples • within the area of scholarly communication • already implemented in RePEc • Sharing of log data between service providers • Provision of non-document data for document data providers • personal data • institutional data

  10. core concepts in OAI 1.1 • low-barrier interoperability • data-provider / service-provider model • metadata harvesting model: OAI 1.1 protocol • HTTP based • Reply: XML Schema, self-contained • shared metadata format: Dublin Core • parallel metadata formats: community specific

  11. harvester / repository (diagram: harvester, OAI protocol, repository, items; support data, harvesting data)

  12. OAI protocol requests (diagram: harvester at the service provider, repository at the data provider) • Supporting protocol requests: • Identify • ListMetadataFormats • ListSets • Harvesting protocol requests: • ListRecords • ListIdentifiers • GetRecord

  13. HTTP encoding - requests • BASE-URL: an.oa.org/OAI-script • keyword arguments: verb=ListIdentifiers&set=S1
     GET:
       http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
     POST:
       POST http://an.oa.org/OAI-script HTTP/1.0
       Content-Length: 78
       Content-Type: application/x-www-form-urlencoded
       verb=ListIdentifiers&set=S1
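As a minimal sketch of the request encoding on this slide, the following Python builds both the GET and the POST form from the same keyword arguments. The base URL and the arguments come from the slide; the helper names `build_get_url` and `build_post_body` are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide


def build_get_url(base_url, **arguments):
    """GET form: base URL, then '?', then the URL-encoded keyword arguments."""
    return base_url + "?" + urlencode(arguments)


def build_post_body(**arguments):
    """POST form: the same keyword arguments as a form-encoded body.

    Returns the body bytes and the two headers that describe them.
    """
    body = urlencode(arguments).encode("ascii")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    }
    return body, headers


get_url = build_get_url(BASE_URL, verb="ListIdentifiers", set="S1")
post_body, post_headers = build_post_body(verb="ListIdentifiers", set="S1")
print(get_url)
print(post_headers)
print(post_body)
```

Both forms carry the same keyword arguments; only the transport differs, so a harvester can switch between them without changing how it assembles a request.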

  14. HTTP encoding - responses (callouts: xml namespaces, response header, response data)
     <?xml version="1.0" encoding="UTF-8" ?>
     <GetRecord xmlns="http://oai.namespace.uri"
                xmlns:xsi="http://w3.namespace.uri"
                xsi:schemaLocation="http://oai.namespace.uri http://oai.schemaURL">
       <responseDate>2000-19-01T19:30:30-04:00</responseDate>
       <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
       <record>record contents</record>
       additional records
     </GetRecord>
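A harvester has to separate the response header (date and request URL) from the response data (the records). The sketch below does this with Python's standard XML parser. The namespace URI is the placeholder used on the slide, and the sample response, including its `responseDate` value, is invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://oai.namespace.uri"  # placeholder namespace from the slide

# Hypothetical response, shaped like the one on the slide.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GetRecord xmlns="http://oai.namespace.uri">
  <responseDate>2002-03-20T10:00:00+01:00</responseDate>
  <requestURL>http://an.oa.org/OAI-script?verb=GetRecord&amp;identifier=oai%3AarXiv%3A0001&amp;metadataPrefix=oai_dc</requestURL>
  <record>record contents</record>
</GetRecord>"""


def split_response(xml_bytes):
    """Return (header, records): header fields as a dict, records as elements."""
    root = ET.fromstring(xml_bytes)
    ns = {"oai": OAI_NS}
    header = {
        # The root element is named after the verb that was requested.
        "verb": root.tag.split("}", 1)[-1],
        "responseDate": root.findtext("oai:responseDate", namespaces=ns),
        "requestURL": root.findtext("oai:requestURL", namespaces=ns),
    }
    records = root.findall("oai:record", ns)
    return header, records


header, records = split_response(SAMPLE_RESPONSE)
print(header["verb"], header["responseDate"])
print(header["requestURL"])
print(len(records), "record(s)")
```

Note that the parser turns each `&amp;` in `requestURL` back into a plain `&`, so the harvester sees the request exactly as it was sent.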

  15. record (callouts: protocol support, format-specific metadata, community-specific record data)
     <record>
       <header>
         <identifier>oai:eg:001</identifier>
         <datestamp>1999-01-01</datestamp>
       </header>
       <metadata>
         <dc xmlns="http://purl.org/dc">
           <title>My Example</title>
         </dc>
       </metadata>
       <about>
         <ea xmlns="http://www.arXiv.org/ea">
           <usage>No restrictions</usage>
         </ea>
       </about>
     </record>
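The three parts of a record can be read independently: the header supports the protocol, the metadata is format specific, and the about container is community specific. A minimal sketch, using the record from the slide (with the `ea` start tag closed) and a hypothetical helper `read_record`:

```python
import xml.etree.ElementTree as ET

SAMPLE_RECORD = """<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <dc xmlns="http://purl.org/dc">
      <title>My Example</title>
    </dc>
  </metadata>
  <about>
    <ea xmlns="http://www.arXiv.org/ea">
      <usage>No restrictions</usage>
    </ea>
  </about>
</record>"""

# Prefixes are local to this sketch; the URIs are the ones on the slide.
NAMESPACES = {"dc": "http://purl.org/dc", "ea": "http://www.arXiv.org/ea"}


def read_record(xml_text):
    """Pull one field from each of the three parts of a record."""
    record = ET.fromstring(xml_text)
    return {
        # header: protocol support
        "identifier": record.findtext("header/identifier"),
        "datestamp": record.findtext("header/datestamp"),
        # metadata: format specific (here a Dublin Core title)
        "title": record.findtext("metadata/dc:dc/dc:title", namespaces=NAMESPACES),
        # about: community specific data about the record
        "usage": record.findtext("about/ea:ea/ea:usage", namespaces=NAMESPACES),
    }


fields = read_record(SAMPLE_RECORD)
print(fields)
```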

  16. selective harvesting - datestamps (diagram: harvest within date range; repository, records)

  17. selective harvesting - sets (diagram: harvest within set; repository with sets S1 and S2, records)
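The two kinds of selective harvesting can be combined. The sketch below shows the selection a repository might perform on its side. The record list, the function name `select`, and the choice to treat both date bounds as inclusive are assumptions for illustration, not taken from the slides.

```python
from datetime import date

# Hypothetical repository contents: (identifier, datestamp, sets the record is in)
RECORDS = [
    ("oai:eg:001", date(1999, 1, 1), {"S1"}),
    ("oai:eg:002", date(2000, 6, 15), {"S1", "S2"}),
    ("oai:eg:003", date(2001, 11, 30), {"S2"}),
]


def select(records, from_=None, until=None, set_=None):
    """Return identifiers of records inside the date range and the set.

    Any criterion left as None is not applied. Both bounds are inclusive
    (an assumption of this sketch).
    """
    selected = []
    for identifier, datestamp, sets in records:
        if from_ is not None and datestamp < from_:
            continue
        if until is not None and datestamp > until:
            continue
        if set_ is not None and set_ not in sets:
            continue
        selected.append(identifier)
    return selected


print(select(RECORDS, from_=date(2000, 1, 1)))
print(select(RECORDS, set_="S1"))
print(select(RECORDS, from_=date(2000, 1, 1), until=date(2000, 12, 31), set_="S2"))
```

A harvester that remembers the date of its last run can pass it as the lower bound and fetch only what has changed since.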

  18. Communication re OAI • lists: subscribe via http://www.openarchives.org • oai-general list • oai-implementers list • web: http://www.openarchives.org • FAQ: http://www.openarchives.org/faq.htm • mail: openarchives@openarchives.org

  19. revision of specifications • Version 1.1: specifications frozen for 12-18 months • stable for experimentation; not definitive • minimize risk for early adopters • maximize chances for future interoperability across communities • The technical committee is working on the "definitive" specifications. They are due out on 2002-05-01.

  20. The technical committee - Herbert Van de Sompel (LANL) - Carl Lagoze (Cornell U) - Thomas Krichel (Long Island U & RePEc) - Jeff Young (OCLC) - Tim Cole (U of Illinois at Urbana-Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U & arXiv) - Michael Nelson (NASA & NACA) - Caroline Arms (Library of Congress) - Muhammad Zubair (Old Dominion U & ARC) - Steven Bird (U Penn & Open Language Archive Community) - Robert Tansley (MIT & DSpace) - Andy Powell (UKOLN, UK) - Mogens Sandfær (DTV, Denmark) - Thomas Severiens (Oldenburg U & Physnet) - Thomas Baron (CERN) - Les Carr (U of Southampton) - Thomas Place (Tilburg U)

  21. Issues in front of the committee • Error handling • SOAP • Harvesting granularity • Mandatory DC • Set semantics and collection description • XML Schema • Result set filtering • Flow control, result set cardinality, response level container • Awareness mechanisms • Multiple metadata return and "best" metadata selection • Machine-readable rights management • From GetRecord to GetRecords • Deduping issues • Idempotency of base URLs • XML format for mini-archives • Response compression

  22. Thank you for your attention! Thomas Krichel Palmer School of Library and Information Science 720 Northern Boulevard Brookville NY 11548-1300 USA http://openlib.org/home/krichel Krichel@openlib.org

  23. Error handling • badArgument • badGranularity • badResumptionToken • badVerb • cannotDisseminateFormat • idDoesNotExist • noRecordsMatch • noSetHierarchy
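To show how two of these error codes might arise, here is a minimal sketch of a repository checking an incoming request. The six verbs come from the slides; the table of allowed arguments per verb is a simplified assumption made for illustration, and `check_request` is a hypothetical helper.

```python
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListRecords", "ListIdentifiers", "GetRecord",
}

# Simplified, assumed table of the arguments each verb accepts.
ALLOWED_ARGUMENTS = {
    "Identify": set(),
    "ListMetadataFormats": {"identifier"},
    "ListSets": {"resumptionToken"},
    "ListRecords": {"from", "until", "set", "metadataPrefix", "resumptionToken"},
    "ListIdentifiers": {"from", "until", "set", "metadataPrefix", "resumptionToken"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def check_request(arguments):
    """Return a list of error codes for a request; empty if none apply."""
    verb = arguments.get("verb")
    if verb not in VERBS:
        # Missing or unknown verb.
        return ["badVerb"]
    unexpected = set(arguments) - {"verb"} - ALLOWED_ARGUMENTS[verb]
    if unexpected:
        # An argument this verb does not accept.
        return ["badArgument"]
    return []


print(check_request({"verb": "ListIdentifiers", "set": "S1"}))
print(check_request({"verb": "ListEverything"}))
print(check_request({"verb": "Identify", "set": "S1"}))
```

The remaining codes depend on repository state (for example, whether an identifier exists or a metadata format can be disseminated) and are left out of this sketch.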

  24. SOAP • SOAP is a mechanism to transmit service requests over the Internet. • As yet it is not a fully matured protocol. • A SOAP compatible version of the protocol may be written later.

  25. Harvesting granularity • The from and until arguments may allow finer timestamps, down to one second. • The level supported is chosen by the data provider and stated in the response to the Identify verb. • All times are expressed in UTC.
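A minimal sketch of what the two granularity levels could look like in practice: a timestamp is first converted to UTC, then written at day or at second resolution. The format strings and the name `format_datestamp` are assumptions for illustration; the slide fixes only the finest level (one second) and the use of UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumed textual forms for the two levels of granularity.
FORMATS = {
    "day": "%Y-%m-%d",
    "second": "%Y-%m-%dT%H:%M:%SZ",
}


def format_datestamp(moment, granularity):
    """Express a timezone-aware datetime in UTC at the given granularity."""
    if moment.tzinfo is None:
        raise ValueError("moment must carry a timezone so it can be converted to UTC")
    utc_moment = moment.astimezone(timezone.utc)
    return utc_moment.strftime(FORMATS[granularity])


# 23:30 in a zone four hours behind UTC is already the next day in UTC.
local = datetime(2002, 3, 20, 23, 30, 0, tzinfo=timezone(timedelta(hours=-4)))
print(format_datestamp(local, "day"))
print(format_datestamp(local, "second"))
```

The example shows why the UTC rule matters: at day granularity, a local timestamp late in the evening falls on a different date once it is converted.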
