Harvesting Process
Isovera Consulting, Feb. 2006
Internet consulting for non-profits

Quarterly Harvesting Process

Harvesting Overview
• Goal: To collect metadata for learning objects, which is distributed across various collections, into a central “portal” database.
• Requirements:
  • The Portal and the Collections must agree upon a metadata format, i.e., the BEN Metadata Specification.
  • Collections must provide metadata that adheres to the Portal’s specifications, i.e., is of sufficient quality, includes all required fields, etc.
  • The Portal and the Collections must agree upon a protocol (communication procedure), i.e., the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
  • The Portal must implement a “thresher” program that requests metadata from the Collections.
  • Each Collection must implement a “reaper” program that responds to thresher requests.

Quarterly Harvesting Process, Part 1
• The BEN Project Manager asks each Collaborator to determine whether a significant body of resources has been added or modified in the past quarter and has been properly peer reviewed.
• The Collaborator responds, indicating whether or not it is ready for harvesting.
• BEN Technical Staff use the Harvester/Thresher administrative tools to harvest resources from the Collaborator to the BEN staging site.
• BEN Technical Staff prepare a report of all resources harvested from the Collaborator.

Quarterly Harvesting Process, Part 2
• The BEN Project Manager and the Collaborator Project Manager review the resources on the staging site.
• If some resources have technical problems, BEN Technical Staff re-harvest to the staging site, and another round of review begins.
• If the resources appear to be technically sound, BEN Technical Staff use the Harvester/Thresher administrative tools to harvest resources from the Collaborator to the BEN production site.
• Once resources have been harvested to the production site, BEN Portal end users can view them.

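The two-part review loop can be sketched as a small control function. This is a hypothetical model, not BEN tooling: `harvest` and `is_technically_sound` stand in for the Harvester/Thresher administrative tools and for the joint review by the two project managers, and the `max_rounds` cap is an assumption, since the process itself sets no limit on review rounds.

```python
from typing import Callable, List

def run_quarterly_cycle(
    collaborator: str,
    is_ready: bool,
    harvest: Callable[[str, str], List[str]],
    is_technically_sound: Callable[[List[str]], bool],
    max_rounds: int = 5,  # assumed cap; not part of the described process
) -> str:
    """Hypothetical model of one collaborator's quarterly cycle.

    harvest(collaborator, target) stands in for the Harvester/Thresher
    administrative tools and returns the harvested resource IDs;
    is_technically_sound(resources) stands in for the staging-site review.
    """
    # Part 1: skip collaborators who report that they are not ready.
    if not is_ready:
        return "skipped: collaborator not ready"

    for round_no in range(1, max_rounds + 1):
        # Harvest to staging and report what was harvested, for review.
        resources = harvest(collaborator, "staging")
        print(f"{collaborator} round {round_no}: {len(resources)} resources on staging")

        # Part 2: only technically sound resources go on to production.
        if is_technically_sound(resources):
            harvest(collaborator, "production")
            return f"published after {round_no} staging round(s)"

    return "stopped: review still failing after max_rounds"


if __name__ == "__main__":
    verdicts = iter([False, True])  # first review finds problems, second passes
    print(run_quarterly_cycle(
        "Example Collaborator",
        True,
        lambda who, target: ["res-1", "res-2"],
        lambda resources: next(verdicts),
    ))
```

Passing the harvest and review steps in as functions keeps the sketch independent of how the real tools are invoked.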
Requirements for Harvested Resources
• Metadata includes all required fields (see the Metadata presentation).
• Resource is peer reviewed for scientific accuracy and educational value.
• URL is well-formed and resolves to a proper digital resource.
• Metadata is accurate and free of spelling and grammatical errors.

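Two of these requirements can be partly checked by machine before the human review. The sketch below assumes a metadata record has already been flattened into a Python dict; the names in `REQUIRED_FIELDS` are placeholders, since the actual required fields are defined in the BEN Metadata Specification. It only tests that the URL is syntactically well-formed; checking that it resolves, and judging peer review, accuracy, and spelling, are left to a separate HTTP check and to people.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Placeholder field names; the real list comes from the BEN Metadata Specification.
REQUIRED_FIELDS = ("title", "description", "url")

def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one flattened metadata record."""
    problems = []

    # Requirement: all required fields are present and non-empty.
    for name in REQUIRED_FIELDS:
        if not record.get(name, "").strip():
            problems.append(f"missing required field: {name}")

    # Requirement: the URL is well-formed (absolute http/https URL with a host).
    url = record.get("url", "").strip()
    if url:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"malformed URL: {url}")

    return problems


if __name__ == "__main__":
    print(check_record({"title": "Mitosis", "description": "", "url": "www.example.org"}))
    # ['missing required field: description', 'malformed URL: www.example.org']
```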
Process Technical Overview
1. The BEN Harvester/Thresher issues an “Identify” request.
2. The Collaborator Harvester/Reaper responds with identifying information.
3. The BEN Harvester/Thresher issues a “ListIdentifiers” request.
4. The Collaborator Harvester/Reaper responds with a list of identifiers for all resources created or modified since the last harvest.
   • An identifier is usually just a number that the database generates automatically as a primary key.
5. The BEN Harvester/Thresher issues a “GetRecord” request for each resource listed in Step 4.
6. The Collaborator Harvester/Reaper responds with a BEN-LOM XML document for the requested resource.

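On the thresher side, this request sequence amounts to building OAI-PMH request URLs and reading identifiers out of the “ListIdentifiers” response. The sketch below is a hypothetical illustration, not the BEN Harvester code: `fetch` is any function that returns a response body for a URL (injected so the example runs offline), `last_harvest` is assumed to be the date of the previous harvest passed as the OAI-PMH `from` argument, and `resumptionToken` paging of long lists is ignored.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def oai_url(base_url: str, **params: str) -> str:
    """Build an OAI-PMH request URL; keyword order is kept in the query string."""
    return f"{base_url}?{urlencode(params)}"

def parse_identifiers(list_identifiers_xml: str) -> List[str]:
    """Pull every <identifier> out of the <header> elements of a ListIdentifiers response."""
    root = ET.fromstring(list_identifiers_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]

def plan_harvest(
    base_url: str,
    fetch: Callable[[str], str],
    last_harvest: str,
    prefix: str = "oai_BEN",
) -> List[str]:
    """Issue Identify and ListIdentifiers, then return one GetRecord URL per identifier."""
    # Identify: the response is ignored in this sketch.
    fetch(oai_url(base_url, verb="Identify"))
    # ListIdentifiers: records created or modified since the (assumed) last-harvest date.
    # "from" is a Python keyword, so it is passed through a dict.
    listing = fetch(oai_url(base_url, verb="ListIdentifiers",
                            metadataPrefix=prefix, **{"from": last_harvest}))
    # GetRecord: one request per listed identifier.
    return [oai_url(base_url, verb="GetRecord", identifier=ident, metadataPrefix=prefix)
            for ident in parse_identifiers(listing)]


if __name__ == "__main__":
    sample = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        "<header><identifier>123</identifier><datestamp>2006-01-15</datestamp></header>"
        "</ListIdentifiers></OAI-PMH>"
    )
    for url in plan_harvest("http://www.collaborator.org/reaper.cgi", lambda u: sample, "2005-11-01"):
        print(url)
    # http://www.collaborator.org/reaper.cgi?verb=GetRecord&identifier=123&metadataPrefix=oai_BEN
```

The generated GetRecord URL has the same shape as the example request shown on the reaper slide.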
How the Harvester/Reaper Works
• The Harvester/Thresher issues an HTTP request, e.g.:
  http://www.collaborator.org/reaper.cgi?verb=GetRecord&identifier=123&metadataPrefix=oai_BEN
• The Collaborator's web server executes the Harvester/Reaper CGI program (a Perl script).
• The Harvester/Reaper parses the HTTP request.
• The Harvester/Reaper requests the record from the XML-DBMS library.
• The XML-DBMS library reads and parses a map file that maps the database structure to the XML document structure.
• XML-DBMS issues SQL queries and transforms the query results into raw XML.
• The Harvester/Reaper transforms the raw XML into well-formed BEN-LOM.
• The Harvester/Reaper wraps the BEN-LOM document in an OAI-PMH envelope.
• The Harvester/Thresher opens the OAI-PMH envelope, reads the BEN-LOM document, and inserts the metadata into the BEN Portal database.

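The request-parsing and envelope steps of this chain can be sketched with Python's standard library. This is an illustration only: it does not reproduce the Perl reaper or the XML-DBMS library, the `<lom><title>` element is a stand-in for a real BEN-LOM document, and the SQLite table `resources` is a hypothetical substitute for the BEN Portal database. The reaper side parses the query string and wraps a record in an OAI-PMH GetRecord envelope; the thresher side opens the envelope and stores the metadata.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import parse_qs

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # namespace prefix for tag names

def parse_request(query_string: str) -> dict:
    """Reaper: parse 'verb=GetRecord&identifier=123&metadataPrefix=oai_BEN'."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    if params.get("verb") != "GetRecord":
        raise ValueError("this sketch only handles the GetRecord verb")
    for arg in ("identifier", "metadataPrefix"):
        if arg not in params:
            raise ValueError(f"missing required argument: {arg}")
    return params

def wrap_record(base_url: str, params: dict, lom: ET.Element, datestamp: str) -> bytes:
    """Reaper: wrap one metadata element in an OAI-PMH GetRecord envelope."""
    root = ET.Element(OAI + "OAI-PMH")
    ET.SubElement(root, OAI + "responseDate").text = (
        datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    ET.SubElement(root, OAI + "request", params).text = base_url  # echoes the request
    record = ET.SubElement(ET.SubElement(root, OAI + "GetRecord"), OAI + "record")
    header = ET.SubElement(record, OAI + "header")
    ET.SubElement(header, OAI + "identifier").text = params["identifier"]
    ET.SubElement(header, OAI + "datestamp").text = datestamp
    ET.SubElement(record, OAI + "metadata").append(lom)
    return ET.tostring(root, encoding="utf-8")

def unwrap_and_store(envelope: bytes, db: sqlite3.Connection) -> str:
    """Thresher: open the envelope and insert (or update) the record in the portal DB."""
    record = ET.fromstring(envelope).find(f"{OAI}GetRecord/{OAI}record")
    identifier = record.findtext(f"{OAI}header/{OAI}identifier")
    lom = record.find(f"{OAI}metadata")[0]  # the single metadata child element
    db.execute(
        "INSERT OR REPLACE INTO resources (identifier, title, xml) VALUES (?, ?, ?)",
        (identifier, lom.findtext("title"), ET.tostring(lom, encoding="unicode")),
    )
    db.commit()
    return identifier


if __name__ == "__main__":
    portal = sqlite3.connect(":memory:")
    portal.execute("CREATE TABLE resources (identifier TEXT PRIMARY KEY, title TEXT, xml TEXT)")
    req = parse_request("verb=GetRecord&identifier=123&metadataPrefix=oai_BEN")
    doc = ET.fromstring("<lom><title>Mitosis animation</title></lom>")
    envelope = wrap_record("http://www.collaborator.org/reaper.cgi", req, doc, "2006-01-15")
    unwrap_and_store(envelope, portal)
    print(portal.execute("SELECT identifier, title FROM resources").fetchall())
    # [('123', 'Mitosis animation')]
```

Keying `INSERT OR REPLACE` on the identifier means a resource modified since the previous harvest overwrites its earlier row instead of being duplicated, which matters because each harvest returns modified resources as well as new ones.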
References
• Open Archives Initiative Protocol for Metadata Harvesting - http://www.openarchives.org/OAI/openarchivesprotocol.html
• Metadata Harvester Quick Start Guide - http://www.biosciednet.org/docs/BENMetadataHarvesterQuickStartGuide2.0.pdf
• BEN Collaborators Peer Review Policies - http://www.biosciednet.org/project_site/PeerReviewProcessOfBENPartners.pdf