This document provides detailed implementation notes on the Open Archives Initiative (OAI) for three prominent collections: NASA's Langley Technical Report Server (LTRS), the National Advisory Committee for Aeronautics (NACA), and the Open Video Project. It highlights the significance of OAI in enhancing the accessibility and exposure of digital libraries within NASA, as well as in offering a collaborative platform for Open Video research. This resource discusses metadata structures, protocols, and implementation challenges encountered during the project's development.
OAI Implementation Notes for LTRS, NACA and Open Video Michael L. Nelson NASA Langley Research Center & University of North Carolina mln@ils.unc.edu http://www.ils.unc.edu/~mln/ OAI Open Meeting, Washington DC, January 23, 2001
Collections Represented • NASA • LTRS (Langley Technical Report Server) • ~2300 reports, begun in 1992 • http://techreports.larc.nasa.gov/ltrs/ • OAI: http://techreports.larc.nasa.gov/ltrs/oai/ • NACA (National Advisory Committee for Aeronautics) • NACA was the predecessor organization to NASA, operating from 1917-1958 • ~6300 reports, begun in 1996 • http://naca.larc.nasa.gov/ • OAI: http://naca.larc.nasa.gov/oai/
Collections Represented • University of North Carolina • The Open Video Project • ~200 public domain video segments, project begun in 1998 • http://www.open-video.org/ • OAI: http://buckets.dsi.internet2.edu/openvideo/oai/ • Open Video contents and OAI services are still strictly experimental
NASA: Why is OAI Important? • NASA builds DLs out of necessity, but ultimately NASA is a publisher • Interested in maximum exposure of and accessibility to its “unrestricted, unlimited” contents • In the NASA DLs, we left our “dark matter” partially exposed • individual reports were spidered by robots anyway… • OAI provides a more formal interface & protocol for exposing contents
UNC: Why is OAI Important? • goal is to grow Open Video into a TREC-like corpus of video segments to share with the research community • a standard collection of short (10 seconds – 1 hour) video segments on which to perform content-based video retrieval • variability in video types: color/b&w, sound/no sound, high/low motion, etc. • currently in MPEG-1 • other formats in the future
OAI Implementation • Protocol only specifies CGI stub • many implementations possible • I used a “bucket” for each: LTRS, NACA & Open Video • buckets are aggregative, computational entities normally used for data storage • generally, 1 bucket per “report” • buckets = metadata + data + methods
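The "CGI stub" and the one-method-per-verb design can be illustrated with a minimal sketch. This is not the bucket code from the talk (which the slides suggest is Perl); the handler names, the `VERBS` table, and the placeholder response bodies are all assumptions made for illustration, and answering an unknown verb with HTTP 400 is likewise an assumption of this sketch.

```python
from urllib.parse import parse_qs
from xml.sax.saxutils import escape


def identify(args):
    # Hypothetical handler: a real one would describe the repository.
    return "<Identify/>"


def list_identifiers(args):
    # Hypothetical handler: a real one would consult the collection.
    return "<ListIdentifiers/>"


# One entry per OAI verb, mirroring "each OAI verb is a separate method".
VERBS = {"Identify": identify, "ListIdentifiers": list_identifiers}


def handle_request(query_string):
    """Map an OAI query string to an (HTTP status, body) pair."""
    # parse_qs returns lists of values; keep the first value of each argument.
    args = {key: values[0] for key, values in parse_qs(query_string).items()}
    verb = args.pop("verb", None)
    handler = VERBS.get(verb)
    if handler is None:
        return 400, "unknown or missing verb: " + escape(str(verb))
    return 200, handler(args)
```

Adding a verb in this layout means adding one function and one table entry, which is the property the bucket design gets from treating each verb as its own method.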
OAI Bucket Structure • Bucket: index.cgi, the default bucket packages, and an "oai" element • default bucket packages: • _method.pkg: source files for methods • _http.pkg: http dependency files • _log.pkg: logs • _tc.pkg: terms and conditions • _md.pkg: metadata • _state.pkg: bucket state • oai element: oai.pl is a support library that defines access for the specific DL • bucket payload is the DL-specific support library • in addition to the ~30 bucket methods, each OAI verb is implemented as a separate method
NACA OAI Implementation • [Diagram: normal WWW use and OAI requests both reach the NACA file system; OAI requests go through the OAI server's index.cgi] • NACA file system: year directories (1917, 1918, … 1958) containing reports (e.g. naca-tn-1), each with refer metadata, thumbnail GIFs, and full size GIFs • OAI responses are built from examining the structure of the NACA filesystem • LTRS, NACA, and Open Video have different file structures, metadata formats, etc.
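Building a response by examining the filesystem might look roughly like the sketch below. It assumes one directory per year with one subdirectory per report, which is how the diagram reads; the `oai:naca:` identifier prefix and the function name are hypothetical and not taken from the talk.

```python
import os


def list_identifiers(root, prefix="oai:naca:"):
    """Return identifiers for every report directory found under year directories.

    Assumed layout: root/<year>/<report>/ (e.g. root/1917/naca-tn-1/).
    The identifier prefix is a hypothetical choice for this sketch.
    """
    identifiers = []
    for year in sorted(os.listdir(root)):
        year_dir = os.path.join(root, year)
        # Skip anything that is not a year directory (scripts, stray files).
        if not (year.isdigit() and os.path.isdir(year_dir)):
            continue
        for report in sorted(os.listdir(year_dir)):
            if os.path.isdir(os.path.join(year_dir, report)):
                identifiers.append(prefix + report)
    return identifiers
```

Because the listing is derived from the directory tree on every call, no separate catalogue has to be kept in sync with the collection, at the cost of a filesystem walk per request.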
Implementation • Did not implement sets • possible set candidates: • NACA: years, report type • LTRS: NASA STI subject classification • Only supporting Dublin Core • DC not sufficient for targeted applications • Did not implement resumptionToken
Load Balancing • Interactive users on main DL machine should not be impacted by metadata harvesting • don't take deliveries through the front door • [Diagram: the harvester sends http://blah/oai/?verb=ListIdentifiers to the OAI server at naca.larc.nasa.gov/oai/; if load > 0.05, that server redirects the request with HTTP status code 302 to the OAI server at buckets.dsi.internet2.edu/naca/oai/, which returns <?xml version="1.0" encoding="UTF-8"?> … <ListIdentifiers> … </ListIdentifiers>]
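The redirect rule in the diagram can be sketched as a small routing function. The 0.05 threshold and the mirror URL come from the slide; reading "load" as the 1-minute system load average, and the function and constant names, are assumptions of this sketch.

```python
import os

LOAD_THRESHOLD = 0.05  # threshold shown on the slide
MIRROR_BASE = "http://buckets.dsi.internet2.edu/naca/oai/"  # mirror shown on the slide


def route_request(query_string, load=None):
    """Return (status, headers): 302 to the mirror when the machine is busy."""
    if load is None:
        # Assumption: "load" means the 1-minute load average (Unix only).
        load = os.getloadavg()[0]
    if load > LOAD_THRESHOLD:
        # Preserve the harvester's query so the mirror can answer it unchanged.
        return 302, {"Location": MIRROR_BASE + "?" + query_string}
    # Otherwise the main server answers the request itself.
    return 200, {}
```

For example, `route_request("verb=ListIdentifiers", load=0.5)` yields a 302 whose `Location` header points at the mirror with the same query string, so a harvester that follows redirects needs no special handling.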
Metadata Quality • XML is very brittle: 1 bad character in the metadata and an entire ListIdentifiers message can be damaged • yes, my DLs should be more diligent about scrubbing their metadata, but… • author-contributed metadata is particularly a problem (e.g. control characters from copy-n-paste) • one advantage of resumptionToken is that it compartmentalizes bad data
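One way to scrub metadata before it goes into a response is to drop every character that XML 1.0 does not allow and then escape the markup characters. This is a minimal sketch of that idea, not the approach used in the DLs described here; the function name `scrub` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Characters permitted by the XML 1.0 "Char" production; everything else
# (e.g. most control characters pasted in by authors) is removed.
_INVALID_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub(text):
    """Remove characters that are illegal in XML 1.0, then escape &, < and >."""
    return escape(_INVALID_XML.sub("", text))
```

Applying `scrub` to each metadata value at response time leaves the stored records untouched, so a single stray control character no longer invalidates the whole message.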
OAI Impact • Can use OAI to build our own generalized services • updates, alerts • Finally have a clean method to export metadata, both to: • the general community for unrestricted data • closed communities with restricted data • Los Alamos, Air Force Research Laboratory, NASA