Making Metadata Work for the NSDL
Starting from Sept. 2001 with ...
• A prototype (http://siteforscience.org) with little behind it that could be reused
• Lots of good ideas based on that prototype
• An Oracle license
• A very small group of people with many different visions of what we were doing
• The management structure of a research project (e.g., none)
... jump to Dec. 2, 2002, when you will see (http://nsdl.org):
• A Metadata Repository with roughly 250,000 metadata records (items and collections)
• A uPortal-based user interface, containing:
  • a search service
  • a simple topic browse of collections
  • featured collection exhibits
  • views of future enhancements
• A developing plan for the future
Getting from there to here
• Designing the Metadata Repository
• Working with unfinished standards
  • Dublin Core in transition
  • XML schema for qualified DC in early stages
  • OAI 2.0 not yet cooked
• Concerns from partners and funders around quality issues
• Envisioning Simple Metadata-Based Services (SiMBaS)
The Metadata Repository
• Designed to scale
• Based on an automated harvest/expose model with OAI at each end
• A notion of “normalized metadata” with qualified Dublin Core as its base
• Transformations on the way in; both native and transformed records re-exposed
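The harvest end of that model can be sketched as an OAI-PMH 2.0 ListRecords loop that follows resumptionTokens until a list is exhausted. This is a minimal illustration, not the NSDL harvester: the function names are hypothetical, and `fetch` is any callable that turns a URL into response text (for example a wrapper around `urllib.request.urlopen`), injected so the loop can run offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, token=None):
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective (incremental) re-harvest
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        header = rec.find("oai:header", OAI_NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            "deleted": header.get("status") == "deleted",
            "metadata": rec.find("oai:metadata", OAI_NS),  # None for deleted records
        })
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token

def harvest(base_url, fetch, **kwargs):
    """Yield every record, following resumptionTokens page by page."""
    token = None
    while True:
        url = list_records_url(base_url, token=token, **kwargs)
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            break
```

The same parsing applies at the exposure end: a downstream harvester of the repository sees exactly this response shape, which is what lets the model use OAI at both ends.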
Standards at the bleeding edge
• Metadata strategy based on crosswalking from 8 formats to one (NSDL-DC)
• The reality: a Baskin-Robbins model of “standard metadata”
• Standards badly documented, little organized support, very little training available at any price
• Projects not obligated to offer metadata, even if they had it (in whatever form)
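A crosswalk is, at its simplest, a per-format lookup table from source fields to target elements. The sketch below assumes two illustrative source formats with simplified, hypothetical field keys (a few MARC tags and a few IEEE LOM paths); it is not the NSDL's actual crosswalk set. Unmapped fields are returned rather than dropped, since with many loosely followed "standards" the leftovers are where quality problems show up.

```python
from collections import defaultdict

# Illustrative subset only, not the NSDL's crosswalk tables.
# Keys are simplified, hypothetical flattenings of each source format.
CROSSWALKS = {
    "marc": {"245": "title", "100": "creator", "650": "subject", "856u": "identifier"},
    "lom": {"general.title": "title", "general.description": "description",
            "general.language": "language", "technical.format": "format"},
}

def crosswalk(source_format, fields):
    """Map (key, value) pairs from one source record onto DC element names.

    Returns (dc, unmapped). dc maps element -> list of values because DC
    elements are repeatable; unmapped keeps fields with no target so they
    can be reviewed instead of silently lost."""
    table = CROSSWALKS[source_format]
    dc = defaultdict(list)
    unmapped = []
    for key, value in fields:
        target = table.get(key)
        if target is None:
            unmapped.append((key, value))
        else:
            dc[target].append(value)
    return dict(dc), unmapped
```

For example, `crosswalk("marc", [("245", "Plate Tectonics"), ("650", "Geology"), ("999", "local note")])` yields a title and a subject, and reports the local `999` field as unmapped.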
OAI in transition: the story in 2002
• Version 1.1 was not yet widely used
• Version 2 not yet available; NSDL became beta-tester (!)
• Final version of OAI 2.0 delayed by NSDL needs (definition of change)
• Now working with collection partners to bring up servers, ensure validation
• Lower-end option for OAI on the way
The DC schema wars
• DC-Architecture group working primarily on RDF schema
• Gang of Five began work outside DC, presented version for comment to DC-Architecture Oct. 2002
• Process of approval not yet complete; NSDL using “final” version
A few schema issues ...
• Three namespaces
• Restricted to “simple literal” values
• Refinements expressed as elements
• Encoding schemes expressed as new complexTypes (schemes not limited to a single element)
• NSDL Schema types:
  • NSDL-DC
  • NSDL-Search
  • NSDL-All
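What those schema decisions look like in an instance record can be sketched with ElementTree: a refinement is its own element (`dct:created` rather than `dc:date` plus a qualifier), a value is plain text, and an encoding scheme is named with `xsi:type`. The root element's namespace URI below is a hypothetical placeholder (the published schema defines the real one), and pairing it with dc and dcterms is an assumption about which three namespaces the slide means.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ROOT = "urn:example:nsdl_dc"  # hypothetical placeholder, not the real NSDL namespace

for prefix, uri in (("nsdl_dc", ROOT), ("dc", DC), ("dct", DCT), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)

def build_record(fields):
    """fields: iterable of (namespace URI, element name, text, scheme or None).

    Caveat: a scheme like "dct:W3CDTF" is a prefixed name inside an attribute
    value, and ElementTree only declares the dct prefix if some element or
    attribute in the record actually uses that namespace."""
    root = ET.Element(f"{{{ROOT}}}nsdl_dc")
    for ns, name, text, scheme in fields:
        el = ET.SubElement(root, f"{{{ns}}}{name}")
        el.text = text  # "simple literal": plain text, no child elements
        if scheme:
            el.set(f"{{{XSI}}}type", scheme)  # encoding scheme via xsi:type
    return root

record = build_record([
    (DC, "title", "Plate Tectonics", None),
    (DCT, "created", "2002-12-02", "dct:W3CDTF"),  # refinement of date, as an element
    (DC, "format", "text/html", "dct:IMT"),
])
print(ET.tostring(record, encoding="unicode"))
```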
The Process
• Data harvesting
• Data evaluation
• Transform specification
• DB_insert file creation
• Database ingest
• OAI re-exposure
Data evaluation
• XML validity
• DC conformance (whether simple or qualified)
  • Emphasis on Date, Type, Format, Identifier
• Potential problem areas:
  • Special characters
  • “funky text”
• Tools: XML Spy, Spotfire
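Some of these checks can also be scripted. The sketch below is an assumed, simplified version of that kind of evaluation, not a reconstruction of the XML Spy/Spotfire workflow: it tests well-formedness (not schema validity), checks date-like elements against W3CDTF, and flags two common kinds of "funky text": UTF-8 decoded as Latin-1 (pairs like "Ã©") and markup or entities that survived as literal text. The function name and the list of date elements are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# W3CDTF: YYYY, YYYY-MM, YYYY-MM-DD, optionally followed by a time with a zone.
W3CDTF = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$")
# UTF-8 bytes mis-decoded as Latin-1 show up as pairs such as "Ã©".
MOJIBAKE = re.compile("[\u00c2\u00c3][\u0080-\u00bf]")
# Markup or entities that survived as literal text.
LEFTOVER_MARKUP = re.compile(r"<[^>]+>|&[A-Za-z]+;")

def evaluate(xml_text, date_elements=("date", "created", "issued", "modified")):
    """Return a list of human-readable problems found in one metadata record."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for el in root.iter():
        name = el.tag.rsplit("}", 1)[-1]  # drop any {namespace} prefix
        text = (el.text or "").strip()
        if not text:
            continue
        if name in date_elements and not W3CDTF.match(text):
            problems.append(f"{name}: not a W3CDTF date: {text!r}")
        if MOJIBAKE.search(text):
            problems.append(f"{name}: possible mis-encoded characters: {text!r}")
        if LEFTOVER_MARKUP.search(text):
            problems.append(f"{name}: leftover markup or entities: {text!r}")
    return problems
```

Returning messages rather than raising keeps a whole harvest reviewable in one pass, which suits the "evaluate, then specify the transform" order of the process.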
Specifying transform
• Simple transforms (DC simple → DC qualified)
• Scheme identification for standard values (date, type, format, language)
• Quality transforms
  • improving functionality of search limits by ensuring appropriate values for type and format
  • improving user experience by deleting funky text and special characters that affect display
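A few of these transforms can be sketched as small pure functions. The assumptions here are loud: `TYPE_MAP` is a hypothetical table of observed values mapped onto DCMI Type Vocabulary terms, format normalization relies on Python's `mimetypes` guesses for file extensions, and the text cleanup is a crude tag/entity/whitespace pass (it would also mangle a literal "a < b > c"). A scheme is attached only when the value was actually normalized, so search limits see controlled values.

```python
import html
import mimetypes
import re

# Hypothetical mapping of observed values onto DCMI Type Vocabulary terms.
TYPE_MAP = {
    "text": "Text", "article": "Text", "web page": "Text",
    "image": "Image", "photograph": "StillImage",
    "applet": "InteractiveResource", "simulation": "InteractiveResource",
    "dataset": "Dataset", "data": "Dataset",
}
TAG = re.compile(r"<[^>]+>")
SPACE = re.compile(r"\s+")

def clean_text(text):
    """Crude 'funky text' removal: decode entities, drop tags, collapse whitespace."""
    text = html.unescape(text)  # "&nbsp;" -> U+00A0, "&lt;b&gt;" -> "<b>"
    text = TAG.sub(" ", text)
    return SPACE.sub(" ", text).strip()  # \s also matches U+00A0

def normalize_type(value):
    """Return (value, scheme); scheme is set only for a DCMIType term."""
    term = TYPE_MAP.get(value.strip().lower())
    return (term, "dct:DCMIType") if term else (value, None)

def normalize_format(value):
    """Map a MIME-like value or a file extension to an IMT type when possible."""
    value = value.strip()
    if "/" in value and " " not in value:
        return value.lower(), "dct:IMT"  # already looks like a MIME type
    guessed, _ = mimetypes.guess_type("x." + value.lstrip(".").lower())
    return (guessed, "dct:IMT") if guessed else (value, None)
```

For instance, `normalize_type("Simulation")` gives `("InteractiveResource", "dct:DCMIType")`, while an unrecognized value such as "Lesson plan" passes through with no scheme for a human to review.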
DB_Insert file
• Header
  • First harvest?
  • Category (item, collection, annotation ...)
  • Harvest date
  • Source
  • Link to “native” metadata
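The header fields listed above can be modeled as a small record type. The field names, the XML element names, and the closed set of categories are all hypothetical; the slide gives only the list of fields, and its category list is open-ended ("..."). Harvest dates are written as UTC W3CDTF timestamps so they compare cleanly with OAI datestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

CATEGORIES = {"item", "collection", "annotation"}  # assumed; the slide's list is open-ended

@dataclass
class InsertHeader:
    """Hypothetical DB_Insert header; names and layout are illustrative."""
    first_harvest: bool
    category: str
    harvest_date: datetime  # should be timezone-aware
    source: str             # e.g. the provider's OAI base URL
    native_metadata: str    # link to the record as originally harvested

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")

    def to_xml(self):
        header = ET.Element("header")
        ET.SubElement(header, "firstHarvest").text = "true" if self.first_harvest else "false"
        ET.SubElement(header, "category").text = self.category
        ET.SubElement(header, "harvestDate").text = (
            self.harvest_date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
        ET.SubElement(header, "source").text = self.source
        ET.SubElement(header, "nativeMetadata").text = self.native_metadata
        return header
```

Rejecting an unknown category at construction time keeps a typo from reaching database ingest, at the cost of a code change whenever a new category (such as annotation) comes online.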
OAI exposure
• OAI “About”
• OAI “Provenance”
• Metadata origin and rights assertions
• Alterations to originally harvested data
• Re-harvest information
• Collection (& brand) association
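OAI-PMH defines a provenance container, carried in a record's `about` section, for exactly the origin, alteration, and re-harvest information above: an `originDescription` with `harvestDate` and `altered` attributes, the source's `baseURL`, `identifier`, `datestamp`, and `metadataNamespace`, and an optional nested `originDescription` for an earlier hop. The sketch below builds one with ElementTree; the function names and sample values are hypothetical, and rights or collection association would need separate containers.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.openarchives.org/OAI/2.0/provenance"
ET.register_namespace("prov", PROV)

def origin_description(base_url, identifier, datestamp, metadata_namespace,
                       harvest_date, altered, earlier=None):
    """One hop of provenance; pass `earlier` to chain a previous harvest."""
    desc = ET.Element(f"{{{PROV}}}originDescription",
                      harvestDate=harvest_date, altered="true" if altered else "false")
    for name, value in (("baseURL", base_url), ("identifier", identifier),
                        ("datestamp", datestamp), ("metadataNamespace", metadata_namespace)):
        ET.SubElement(desc, f"{{{PROV}}}{name}").text = value
    if earlier is not None:
        desc.append(earlier)  # nested originDescription goes after the four children
    return desc

def provenance(desc):
    """Wrap an originDescription in the provenance container for <about>."""
    root = ET.Element(f"{{{PROV}}}provenance")
    root.append(desc)
    return root

p = provenance(origin_description(
    "http://example.org/oai", "oai:example.org:1", "2002-10-01",
    "http://www.openarchives.org/OAI/2.0/oai_dc/", "2002-11-15T08:00:00Z", altered=True))
print(ET.tostring(p, encoding="unicode"))
```

Setting `altered="true"` is how a re-exposed transformed record signals that it no longer matches what the provider served, while the untransformed native record can be exposed with `altered="false"`.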
Still to do ...
• Currently running on “manual”
• Ingest process not yet completed or documented
• data validation routines
• additional metadata types (annotation) and services linked to metadata
Automation opportunities
• Collection registration
• assignment of unique identities
• “responsible entities”
• harvest/re-harvest, transform/re-transform, and associated record keeping
• integrating/linking new information with metadata record (service model)
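Two of these opportunities, stable identities and re-harvest record keeping, fit together naturally. In this sketch (all names hypothetical, and `uuid5` over base URL plus OAI identifier is an assumed identity scheme, not the NSDL's), the same provider record always maps to the same local id, so a re-harvest can be sorted into records to insert, re-transform, delete, or skip by comparing datestamps.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class HarvestLog:
    """Hypothetical record keeping for harvest/re-harvest and transform/re-transform."""
    seen: dict = field(default_factory=dict)  # local id -> datestamp last processed

    @staticmethod
    def local_id(base_url, oai_identifier):
        # Deterministic: re-harvesting the same record yields the same id.
        return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{base_url}#{oai_identifier}"))

    def plan(self, base_url, headers):
        """Sort harvested headers into insert / retransform / delete / skip."""
        actions = {"insert": [], "retransform": [], "delete": [], "skip": []}
        for h in headers:
            lid = self.local_id(base_url, h["identifier"])
            previous = self.seen.get(lid)
            if h.get("deleted"):
                if previous is not None:
                    actions["delete"].append(lid)
                    del self.seen[lid]
                else:
                    actions["skip"].append(lid)  # deleted before we ever saw it
            elif previous is None:
                actions["insert"].append(lid)
                self.seen[lid] = h["datestamp"]
            elif h["datestamp"] > previous:
                # ISO 8601 strings of equal granularity sort chronologically.
                actions["retransform"].append(lid)
                self.seen[lid] = h["datestamp"]
            else:
                actions["skip"].append(lid)
        return actions
```

A fuller version would update `seen` only after a transform succeeds; recording it up front keeps the sketch short.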
Other challenges
• Educating data providers and aggregators about GOOD METADATA
• Better techniques for evaluation and transformation
• Coping with users? (Uh-oh)
For more information
• The NSDL Metadata Primer (http://metamanagement.comm.nsdlib.org/outline.html)
• NSDL XML schema (http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc_v1.00.xsd)