Open Archives Initiative. Where we are, Where we are going. Carl Lagoze, 4th OAF Workshop, September 2003. Where we are now. De facto standard for Internet information exchange. Deployed extensively and internationally: (digital) libraries, museums, eprint repositories, research projects.
Open Archives Initiative: Where we are, Where we are going • Carl Lagoze • 4th OAF Workshop • September 2003
Where we are now • De facto standard for Internet information exchange • Deployed extensively and internationally • (digital) libraries • Museums • Eprint repositories • Research projects
Protocol Stability • OAI-PMH has been stable since release • No functional changes, just typographic edits • Validation of leadership/participation model • No plans for a 3.0 release • Core protocol will not be extended • Minor 2.x release could occur (more later) • Additional implementation guidelines (more later)
The NSDL Context • National STEM (Science, Technology, Engineering, Mathematics, Medicine) Digital Library • Major National Science Foundation project targeted at the application of web and Internet to (STEM) education • $25M over six years to over 100 projects • Collections • Services • Targeted Research • Core Integration
NSDL technical guidelines • Aggregation rather than collection • Core integration team will not manage any collections • Spectrum of interoperability • Accommodate diversity of participation models • Open interfaces and standards permitting plug in of array of value-added services • One library many portals • Accommodate multiple quality and selection metrics • Tailor presentation of content and nature of services to audience needs
Spectrum of interoperability (Level: Agreements; Example) • Federation: strict use of standards (syntax, semantic, and business); example: AACR, MARC, Z39.50 • Harvesting: digital libraries expose metadata, with a simple protocol and registry; example: Open Archives metadata harvesting • Gathering: digital libraries do not cooperate, so services must seek out information; example: Web crawlers and search engines
Translating to initial goals • This is a big task that no one has done before! • Work on the priorities • Focus on one point on the spectrum of interoperability • Metadata harvesting • Incorporate NSF-funded collections and selected other collections • Leverage existing (or at least emerging) technologies and protocols • OAI, uPortal, Shibboleth, SDLIP, InQuery • Provide reliable base-level services • Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence • Plant some seeds for the future • Machine-assisted metadata generation • Automated collection aggregation • Web gathering strategies
Metadata Repository • Central storage of all metadata about all resources in the NSDL • Defines the extent of the NSDL collection • Metadata includes collections, items, annotations, etc. • MR main functions • Aggregation • Normalization • Redistribution • Ingest of metadata by various means • Harvesting, manual, automatic, cross-walking • Open access to MR contents for service builders via OAI-PMH
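To make "open access to MR contents via OAI-PMH" concrete, here is a minimal sketch of how a service builder might form a ListRecords request and read the response. The verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from OAI-PMH 2.0; the function names and the sample response are assumptions made up for illustration, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    # Only header identifiers live in the OAI-PMH namespace.
    identifiers = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical, heavily abbreviated response used only for this example.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would fetch `list_records_url(base)`, parse the result, and repeat with the returned token until no token comes back.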
Metadata Strategy • Collect and redistribute any native (XML) metadata format • Provide crosswalks to Dublin Core from standard formats • DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD • Concentrate on collection-level metadata • Use automatic generation to augment item-level metadata
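A crosswalk of the kind mentioned above can be pictured as a field-to-element mapping. The sketch below is an assumption-laden illustration: the `MARC_TO_DC` table is a simplified, made-up subset and not an official crosswalk, and `crosswalk` is a hypothetical helper name.

```python
# Hypothetical, simplified mapping from MARC tags to unqualified Dublin Core
# element names. For illustration only; not an official crosswalk table.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
    "260": "publisher",
}


def crosswalk(native_record, mapping=MARC_TO_DC):
    """Translate {native_field: [values]} into {dc_element: [values]}.

    Returns the Dublin Core dict plus the list of native fields that had
    no mapping, so the caller can see what was lost in translation.
    """
    dc = {}
    unmapped = []
    for field, values in native_record.items():
        element = mapping.get(field)
        if element is None:
            unmapped.append(field)
            continue
        # Several native fields may feed the same DC element.
        dc.setdefault(element, []).extend(values)
    return dc, unmapped
```

Because a crosswalk to unqualified DC loses detail, keeping the native record alongside the DC version, as the strategy above proposes, lets downstream services choose the richer format.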
Importing metadata into the MR • [Diagram; labels: Collections, Harvest, Staging area, Cleanup and crosswalks, Database load, Metadata Repository]
NSDL and OAI-PMH Two years later • Concepts are good, practice is hard • Issues • Metadata is hard • http://www.well.com/~doctorow/metacrap.htm • XML is hard • Protocols are hard • Static repositories (more later) • IP is relevant (more later)
Some Essential Metadata Questions • Review original (DC) metadata assumptions • Metadata is essential for good resource discovery • “Joe Sixpack” could create metadata • Account for current realities • 2003 is not 1994 • Google, etc. keep getting better
Reconsidering the Dublin Core Requirement • Questions about utility of unqualified DC • The conundrum…. • Specification too loose to serve intended interoperability goal • But more complex metadata may be too hard • Limited energy for interoperability • Data providers implement required DC at expense of better metadata • Use of protocol for purposes other than resource discovery
Rethinking record-oriented model Implications for record-oriented harvesting????
Topology Evolution Simple Data Provider, Service Provider Topology
Topology Evolution (cont.) Metadata Aggregator
Topology Evolution (cont.) OAI-PMH p2p network
OAI-P2pMH Issues • Document (metadata) location • Exploit unique identifiers, use efficient key-based location mechanisms (distributed hash tables) • Provenance-based queries • Metadata records may go through refinement and/or translation phases as they move through value-added aggregators. • Exploit provenance guidelines • Network harvesting • Broadcast query (Gnutella) inefficient • Exploit techniques for efficient routing of queries (P-trees)
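As a rough illustration of key-based location, the sketch below uses consistent hashing, one common building block of distributed hash tables. It is not taken from the talk: the `PeerRing` class, the peer names, and the choice of SHA-1 are all assumptions.

```python
import bisect
import hashlib


def _key(text):
    """Map a string to a position on the hash ring."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class PeerRing:
    """Toy consistent-hash ring: each identifier is owned by one peer."""

    def __init__(self, peers):
        if not peers:
            raise ValueError("at least one peer is required")
        # Sort peers by ring position so lookup is a binary search.
        self._ring = sorted((_key(p), p) for p in peers)
        self._keys = [k for k, _ in self._ring]

    def locate(self, identifier):
        """Return the peer responsible for a record identifier."""
        # First peer clockwise from the identifier's position, wrapping around.
        idx = bisect.bisect_right(self._keys, _key(identifier)) % len(self._ring)
        return self._ring[idx][1]
```

Any node holding the same peer list computes the same owner for `oai:example.org:42` without broadcasting a query, which is the contrast with Gnutella-style flooding drawn above.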
OAI-PMH and Intellectual Property • Protocol exists in a context where information providers have concerns about use of intellectual property • OAI-PMH is nominally about metadata, but… • Rich metadata is an intellectual product • The protocol can be used to transmit anything (e.g. content) that can be encoded in XML • Generally metadata leads to content so….
OAI-rights effort • Goal is to investigate and develop means of expressing rights about metadata and resources in the OAI framework. • The result will be an addition to the OAI implementation guidelines that specifies mechanisms for rights expressions within OAI-PMH. • No changes to core protocol
OAI-rights Effort (cont.) • Extensible, providing a general framework for expressing rights statements within OAI-PMH. • Not an effort to develop a new rights expression language • Use Creative Commons licenses as a motivating and deployable example. • Release of specification by 2nd quarter ’04 • Invited OAI-rights group • Standard OAI development model
Dimensions of OAI-PMH and rights: Entity Association • Metadata: concern in NSDL for (re)use of rich metadata • Content: predominant application of the protocol to resource discovery and ultimate access makes this important
Dimensions of OAI-PMH and rights: Aggregation Association • OAI-PMH aggregations • Repository • Set • Item • Rights association with an aggregation may provide shortcut (e.g., the rights for all resources in a repository/set…) • Cost of shortcut is pseudo-statefulness, possibly complex overriding rules
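To show why aggregation-level rights imply overriding rules, here is a minimal sketch under one assumed policy: the most specific statement wins (item, then set, then repository). The policy, the `resolve_rights` name, and the data layout are hypothetical; the slide leaves the actual rules open.

```python
def resolve_rights(item_id, item_sets, rights):
    """Pick the rights statement that applies to one item.

    rights is a dict with optional keys:
      "items":      {item identifier: statement}
      "sets":       {setSpec: statement}
      "repository": statement for everything else
    Assumed policy: item-level beats set-level beats repository-level.
    """
    if item_id in rights.get("items", {}):
        return rights["items"][item_id]
    # If an item belongs to several sets, the first listed set with a
    # statement wins. This is an arbitrary tie-break, and exactly the
    # kind of complication the shortcut introduces.
    for set_spec in item_sets:
        if set_spec in rights.get("sets", {}):
            return rights["sets"][set_spec]
    return rights.get("repository")
```

A harvester using this shortcut has to remember repository-level and set-level statements between requests, which is the pseudo-statefulness noted above.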
Dimensions of OAI-PMH and rights: Binding • Choices • Exploit mechanisms in metadata formats, e.g., DC-rights • Restrict the rights statements to some more specific protocol mechanism • Allow some mixture of these methods • DC-rights problems • Semantics are restricted to rights about the resource • Can’t embed XML in a DC value • What if DC is not required? • Burden on harvesters if rights embedding is not explicit but scattered across several locations
OAI-PMH Static Repositories • Provide a lightweight mechanism for data provider participation • Intended for relatively small and static collections • Two components • Static Repository XML format • Semantically equivalent to Identify and ListRecords • Invisible to harvester • Static Repository Gateway • Virtual data provider for static repository data • Unique baseURL for each “contained” static repository
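One way a gateway could give each "contained" static repository its own baseURL is to append the static file's host and path to the gateway's address. The sketch below assumes that scheme for illustration; the function name and example URLs are hypothetical, and the slide itself does not say how the baseURL is derived.

```python
from urllib.parse import urlsplit


def gateway_base_url(gateway_url, static_repo_url):
    """Derive a per-repository baseURL from the static file's location.

    Assumed scheme: <gateway URL>/<static repository host><path>.
    """
    parts = urlsplit(static_repo_url)
    return gateway_url.rstrip("/") + "/" + parts.netloc + parts.path
```

Harvesters then send ordinary OAI-PMH requests to the derived baseURL and never see the static XML file directly, which is what makes the format invisible to them.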
Static Repositories Open Issue Relationship to RSS?????
Conclusions • Interoperability and lowest common denominator • Rapid advances in automated methods • Moore’s law • Smart algorithms • Benefits and issues of scale • Combining human effort and automated methods • Extracting order from chaos • Learning from order • Move beyond resource discovery