This presentation outlines tools and services developed to implement the PREMIS standard within the METS container format, emphasizing the PREMIS in METS Toolbox. It features open-source tools for validation, conversion, and description of metadata. Additionally, it discusses the importance of using controlled vocabularies through the authorities and vocabularies web service, providing access to linked data for enhanced resource discovery. Key topics include SKOS for knowledge organization, URI usage, and the technological infrastructure supporting these systems.
PREMIS Tools and Services
Rebecca Guenther
Network Development & MARC Standards Office, Library of Congress
rgue@loc.gov
NDIIPP Partners Meeting, July 21, 2010
Outline of presentation
• PREMIS in METS Toolbox (PiM)
• Authorities and vocabularies web service (id.loc.gov)
PREMIS in METS Toolbox
• Developed by the Florida Center for Library Automation under contract with LC
• A set of open-source tools to support the implementation of PREMIS, especially in the METS container format
• Three components: validate, convert, describe
• Source code is being made available: http://pimtoolbox.sourceforge.net
Describe: uses the DAITSS description service
[Diagram: a file (/a/real/file) is run through DROID/JHOVE to produce a PREMIS description (<premis> <ext></premis>)]
Convert: PREMIS to PREMIS in METS, or PREMIS in METS to PREMIS
[Diagram: <mets> <premis></mets> and <premis/> are converted in either direction with XSLT]
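The two directions of the convert step can be sketched in a few lines. This is only an illustration in Python's standard library, not the XSLT that the toolbox itself uses. It assumes the PREMIS 2.x namespace (info:lc/xmlns/premis-v2), places the whole PREMIS container in a single digiprovMD section, and omits other parts a complete METS document needs (such as a structMap). The function names and the DIGIPROV1 identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x documents

ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def wrap_premis_in_mets(premis_xml: str, section_id: str = "DIGIPROV1") -> str:
    """Wrap a standalone PREMIS document in a skeletal METS container."""
    premis_root = ET.fromstring(premis_xml)
    mets = ET.Element(f"{{{METS_NS}}}mets")
    amd_sec = ET.SubElement(mets, f"{{{METS_NS}}}amdSec")
    # Assumption: the whole PREMIS container goes into one digiprovMD section.
    digiprov = ET.SubElement(amd_sec, f"{{{METS_NS}}}digiprovMD", {"ID": section_id})
    md_wrap = ET.SubElement(digiprov, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "PREMIS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_root)
    return ET.tostring(mets, encoding="unicode")


def extract_premis_from_mets(mets_xml: str) -> str:
    """Pull the first wrapped PREMIS container back out of a METS document."""
    mets = ET.fromstring(mets_xml)
    premis = mets.find(f".//{{{METS_NS}}}xmlData/{{{PREMIS_NS}}}premis")
    if premis is None:
        raise ValueError("no PREMIS container found in METS document")
    return ET.tostring(premis, encoding="unicode")
```

Round-tripping a document through both functions should return a PREMIS container with the same root element and attributes as the input.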
Validate: checks a PREMIS in METS document and returns confirmation or errors
[Diagram: <mets> <premis/></mets> is checked with Schematron]
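Schematron validation works by asserting rules against a document and reporting each rule that fails. The sketch below imitates that pattern in plain Python rather than Schematron, and its two rules are illustrative assumptions, not the rules shipped with the toolbox: the root must be a METS element, and every mdWrap whose MDTYPE starts with "PREMIS" must actually contain PREMIS content. The function name is hypothetical and the PREMIS 2.x namespace is assumed.

```python
import xml.etree.ElementTree as ET
from typing import List

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumption: PREMIS 2.x documents


def check_premis_in_mets(mets_xml: str) -> List[str]:
    """Return a list of error messages; an empty list means all rules passed."""
    errors: List[str] = []
    root = ET.fromstring(mets_xml)

    # Rule 1 (illustrative): the document root must be a METS element.
    if root.tag != f"{{{METS_NS}}}mets":
        return ["root element is not mets"]

    # Rule 2 (illustrative): each PREMIS mdWrap must hold PREMIS elements.
    wraps = [
        w for w in root.iter(f"{{{METS_NS}}}mdWrap")
        if w.get("MDTYPE", "").startswith("PREMIS")
    ]
    if not wraps:
        errors.append("no mdWrap with a PREMIS MDTYPE found")
    for wrap in wraps:
        xml_data = wrap.find(f"{{{METS_NS}}}xmlData")
        children = list(xml_data) if xml_data is not None else []
        if not any(c.tag.startswith(f"{{{PREMIS_NS}}}") for c in children):
            errors.append("mdWrap declares PREMIS but contains no PREMIS element")
    return errors
```

Returning a list of messages instead of raising on the first failure mirrors the "confirmation or errors" behaviour: callers get every failed rule in one pass.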
Demo: http://pim.fcla.edu/
• Audio file:
  http://lcweb2.loc.gov/diglib/ihas/loc.natlib.ihas.200150574/default.html
  http://lcweb2.loc.gov/natlib/ihas/service/sousa/200150574/0001.mp3
• PDF file: describe demo.pdf
• Image: http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.09601/default.html
Authorities and vocabularies web service
• id.loc.gov
• Makes authorities and vocabularies owned and maintained by LC available as Linked Data
• Allows both human-oriented and programmatic access to LC-promulgated authorities and vocabularies
• First offering was LCSH; additional vocabularies were added later
• Search and download available
Why establish controlled vocabularies?
• Control values that occur in metadata
• Reduce ambiguity
• Control synonyms
• Document and publish for reuse
• Test and validate terms
• Establish formal relationships among terms (where appropriate)
• Includes enumerated values in schemas, formal thesauri, code lists, etc.
Standards maintained at LC that contain controlled vocabularies
• LCSH/NAF
• Thesaurus of Graphic Materials
• MARC Code lists: GACs, countries, languages
• ISO 639-2 and ISO 639-5 (language codes)
• Other MARC controlled lists
• Enumerated lists in XML schemas
• MODS enumerated values
• METS enumerated values
• MIX (Technical metadata for digital still images)
• PREMIS controlled vocabularies
• Others…
Simple Knowledge Organization System (SKOS)
• RDF application used to express knowledge organization systems, such as thesauri and taxonomies, and the concepts within them
• SKOS has a defined element set that is particularly relevant for controlled vocabularies
• Relationships can be expressed between concepts in one concept scheme (e.g. broader, narrower) and between concepts in different schemes
• Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards
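To make the broader/narrower idea concrete, here is a minimal sketch that models SKOS statements as plain (subject, predicate, object) tuples instead of using an RDF library. The SKOS namespace is the real one; the example.org concept URIs and the hierarchy between the two concepts are hypothetical and are not taken from any LC vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/vocab/"  # hypothetical concept scheme, for illustration only

# Each triple is (subject URI, predicate URI, object URI or literal).
GRAPH = {
    (EX + "dogs", SKOS + "prefLabel", "Dogs"),
    (EX + "huntingDogs", SKOS + "prefLabel", "Hunting dogs"),
    (EX + "huntingDogs", SKOS + "broader", EX + "dogs"),
}


def pref_label(concept, graph):
    """Return the skos:prefLabel of a concept, or None if it has none."""
    for subject, predicate, obj in graph:
        if subject == concept and predicate == SKOS + "prefLabel":
            return obj
    return None


def narrower(concept, graph):
    """Concepts narrower than `concept`, derived as the inverse of skos:broader."""
    return sorted(
        subject for subject, predicate, obj in graph
        if predicate == SKOS + "broader" and obj == concept
    )


def broader(concept, graph):
    """Concepts that `concept` names as skos:broader."""
    return sorted(
        obj for subject, predicate, obj in graph
        if predicate == SKOS + "broader" and subject == concept
    )
```

Because every concept is identified by a URI, the same lookup functions would work unchanged for statements that link concepts in two different schemes.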
“Linked Data”
• A feature of the “Semantic Web” in which links are made between resources
• Goes beyond hypertext links (i.e. links between web pages) to links between any kind of object or concept
• From Wikipedia: “a term used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web”
• Users can follow links to find similar resources and aggregate results
• Interaction between data relies on URIs
Reasons for developing a web service for vocabularies
• Facilitate development and maintenance process for vocabularies
• Make controlled lists openly available
• Provide comprehensive information about controlled terms
• Experiment with semantic web technologies and linked data
• Expose vocabularies to wider communities
URIs in id.loc.gov
• Interaction with any given individual term and vocabulary is through its URI
• Some examples of URIs:
  http://id.loc.gov/vocabulary/relators/art
  http://id.loc.gov/vocabulary/graphicMaterials/tgm005222
  http://id.loc.gov/vocabulary/preservationEvents/migration
  http://id.loc.gov/authorities/sh85063136
• Known-label searches: use when you know the label but not the identifier
  http://id.loc.gov/vocabulary/relators/label/artist
  http://id.loc.gov/authorities/label/hunting%20dogs
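The URI patterns shown on this slide can be built programmatically. The sketch below only constructs the strings and makes no network request; the helper names are hypothetical, and the patterns are inferred from the example URIs on the slide rather than from a published API specification.

```python
from urllib.parse import quote

BASE = "http://id.loc.gov"


def term_uri(scheme_path: str, identifier: str) -> str:
    """URI for a term whose identifier is known, e.g. ('vocabulary/relators', 'art')."""
    return f"{BASE}/{scheme_path}/{quote(identifier, safe='')}"


def label_uri(scheme_path: str, label: str) -> str:
    """Known-label URI; spaces and other reserved characters are percent-encoded."""
    return f"{BASE}/{scheme_path}/label/{quote(label, safe='')}"


if __name__ == "__main__":
    print(term_uri("authorities", "sh85063136"))
    print(label_uri("authorities", "hunting dogs"))
```

Percent-encoding the label is what turns "hunting dogs" into the "hunting%20dogs" form used in the known-label example.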
Technical infrastructure
• Django (Python)
• LCSH
  • MySQL
  • SKOS RDF generated at time of request
  • Operates, more or less, as a traditional relational DB
  • MARC mapped to relational DB tables
• Everything else
  • RDFlib (Python library, uses MySQL as triplestore)
  • Runs on triples
  • XML converted to SKOS RDF/XML before ingest
  • XSL, XQuery used
Next steps
• MADS OWL Schema to enable identification of facets, e.g. Aeronautics--Soviet Union--History
• Enhance existing vocabularies to show relationships
  • Broader/narrower relator terms
  • Matches to other vocabulary terms (e.g. MARC vs. ISO 3166 country codes)
• Add new vocabularies
  • PREMIS controlled vocabularies
  • MARC country, geographic area, languages
  • ISO 639-2 and 639-5
  • Name authorities
• Enhance PiM to validate PREMIS vocabulary terms