
A Registry for controlled vocabularies at the Library of Congress

A Registry for controlled vocabularies at the Library of Congress. Rebecca Guenther, Network Development & MARC Standards Office, Library of Congress. October 29, 2008. Outline of presentation: types of controlled vocabularies; vocabularies maintained at LC; an introduction to SKOS.


Presentation Transcript


  1. A Registry for controlled vocabularies at the Library of Congress
     Rebecca Guenther, Network Development & MARC Standards Office, Library of Congress
     October 29, 2008

  2. Outline of presentation
     • Types of controlled vocabularies
     • Vocabularies maintained at LC
     • An introduction to SKOS
     • Establishing concept databases at LC
     • Examples of concept schemes: ISO 639-2 and PREMIS event type
     • Providing the registry as a web service
     ASIST 2008

  3. Why establish controlled vocabularies?
     • Control values that occur in metadata
     • Document and publish for reuse
     • Reduce ambiguity
     • Control synonyms
     • Establish formal relationships among terms (where appropriate)
     • Test and validate terms

  4. Types of Controlled Vocabularies used in metadata standards
     • Lists of enumerated values
     • Code lists (e.g. language, country)
     • Taxonomies
     • Formal Thesauri
     • Locally controlled enumerated lists

  5. Enumerated lists
     • Simple list of terms used in a pull-down menu or Web site pick list
     • Values enumerated in an XML schema
     • Little additional information or structure about each value
     • Examples:
        • Code and value from a MARC 21 fixed field, e.g. code “e” in Leader/06 is “cartographic material”
        • Enumerated value “MD5” for METS CHECKSUMTYPE
        • Enumerated value “born digital” in MODS digitalOrigin

  6. Code lists
     • Some established as ISO standards and used worldwide in many communities for many purposes
     • The standard standardizes the code, not a particular name for it
     • Codes are used as identifiers
     • Examples (maintained by LC):
        • ISO 639-2 (language codes)
        • MARC relator codes
        • MARC country codes

  7. Thesauri
     • A thesaurus is a controlled vocabulary with multiple types of relationships
     Example:
        Rice
          UF paddy
          BT Cereals
          BT Plant products
          NT Brown rice
          RT Rice straw
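To make the relationship types concrete, here is a minimal Python sketch that stores the Rice entry from the slide and uses the UF ("used for") relationship to resolve a synonym to its preferred term. The dictionary layout and the function name `preferred_term` are illustrative assumptions, not part of any LC system.

```python
# Relationships for the entry "Rice", copied from the slide.
# The dictionary layout is an assumption made for illustration.
THESAURUS = {
    "Rice": {
        "UF": ["paddy"],                      # used for (non-preferred synonym)
        "BT": ["Cereals", "Plant products"],  # broader terms
        "NT": ["Brown rice"],                 # narrower terms
        "RT": ["Rice straw"],                 # related terms
    }
}


def preferred_term(term, thesaurus=THESAURUS):
    """Return the preferred term for `term`, following UF entries.

    Matching is case-insensitive. Returns None when the term is unknown.
    """
    wanted = term.casefold()
    for preferred, relations in thesaurus.items():
        if wanted == preferred.casefold():
            return preferred
        if wanted in (synonym.casefold() for synonym in relations.get("UF", [])):
            return preferred
    return None


print(preferred_term("paddy"))
```

This is the "control synonyms" function of a vocabulary: a cataloguer or search interface can accept "paddy" and still index under "Rice".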

  8. Standards maintained at LC that use controlled vocabularies
     • MARC (including code lists)
     • MODS
     • METS
     • MIX (XML schema for Z39.87 Technical metadata for digital still images)
     • PREMIS
     • ISO 639-2 (language codes)
     • Thesaurus of Graphic Materials
     • LCSH
     • … and some others

  9. SKOS: What is it?
     Simple Knowledge Organisation System(s)
     SKOS is …
     • for declaring and publishing taxonomies, thesauri or classification schemes, for use in a distributed, decentralised information system (i.e. a semantic web)
     • for describing Concepts and creating relationships between Concepts and Terms
     • a practical application of RDF
     • a formal language for representing controlled, structured vocabularies

  10. The SKOS data model
      …views a knowledge organization system as a concept scheme comprising a set of conceptual resources (concepts).
      • These concept schemes and conceptual resources are identified by URIs.
      • The model is multilingual and extensible

  11. Concepts can be…
      labeled with any number of strings. One label, in any given language, can be indicated as the "preferred" label for that language, and others as "alternate" labels, "hidden" labels, or using a notation:
      • skos:prefLabel
      • skos:altLabel
      • skos:hiddenLabel
      • skos:notation

  12. Concepts can be…
      linked to other concepts within the same concept scheme.
      • Hierarchical links:
         • skos:broader and skos:narrower
         • skos:broaderTransitive and skos:narrowerTransitive
      • Associative links:
         • skos:related
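The difference between skos:broader (a direct link) and skos:broaderTransitive (everything reachable by following direct links) can be sketched in a few lines of Python. The terms come from the Rice example on slide 7; the dictionary representation and the function name `broader_transitive` are assumptions for illustration only.

```python
# Direct skos:broader links, taken from the Rice example on slide 7
# (Brown rice is a narrower term of Rice, so Rice is broader than Brown rice).
BROADER = {
    "Brown rice": {"Rice"},
    "Rice": {"Cereals", "Plant products"},
}


def broader_transitive(concept, broader=BROADER):
    """Return every concept reachable from `concept` via direct broader links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:  # also guards against cycles in bad data
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(broader_transitive("Brown rice")))
```

"Brown rice" has only "Rice" as a direct broader concept, but its transitive set also includes "Cereals" and "Plant products".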

  13. Concepts can be…
      grouped into collections, which can be labeled and/or ordered. A concept can be in one or more collections.
      • skos:Collection
      • skos:OrderedCollection
      • skos:member
      • skos:memberList

  14. Concepts can be…
      mapped to other concepts in different concept schemes.
      • Hierarchical mapping:
         • skos:broadMatch
         • skos:narrowMatch
      • Associative mapping:
         • skos:relatedMatch
         • skos:closeMatch
         • skos:exactMatch

  15. Advantages to using SKOS
      • SKOS has a defined element set which is particularly relevant for controlled vocabularies
      • Relationships between entries in a thesaurus can be expressed (broader, narrower, etc.)
      • Relationships between entries in different thesauri can be expressed (exactMatch, relatedMatch)
      • Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards

  16. Controlled vocabularies registry at LC
      • Library of Congress is establishing databases with controlled vocabulary values for standards that it maintains
      • Controlled lists are represented using SKOS as well as alternative syntaxes
      • Lists currently in progress:
         • ISO 639-2 and MARC language code list
         • MARC geographic area codes
         • MARC country code list
         • MARC relators
         • PREMIS controlled value lists
         • Thesaurus of Graphic Materials
      • Other possibilities:
         • Enumerated values in MODS schema
         • Coded and uncoded value lists in MARC

  17. Reasons for developing a registry
      • Facilitate development and maintenance process
      • Make controlled lists openly available
      • Develop a web service where comprehensive information about controlled terms is available
      • Experiment with semantic web technologies
      • Expose vocabularies to wider communities

  18. http://www.loc.gov:8081/standards/registry/lists.html

  19. Example: ISO 639-2 vocabulary
      • One in the family of ISO 639 language coding standards
      • Has a close relationship with other language coding standards (ISO 639-1 and -3, MARC)
      • LC is maintenance agency
      • The standard is the CODE, not the language name; multiple names are given

  20. ISO 639-2 language code example

      <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
        <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
        <skos:prefLabel xml:lang="x-notation">por</skos:prefLabel>
        <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
        <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
        <skos:notation rdf:datatype="xs:string">por</skos:notation>
        <skos:definition xml:lang="en-Latn">This Concept has not yet been defined.</skos:definition>
        <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-2"/>
        <vs:term_status>stable</vs:term_status>
        <skos:historyNote rdf:datatype="xs:dateTime">2006-07-19T08:41:54.000-05:00</skos:historyNote>
        <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
        <skos:changeNote rdf:datatype="xs:dateTime">2008-07-09T13:49:05.321-04:00</skos:changeNote>
      </rdf:Description>
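As a rough illustration of how a consumer might read such a record, the Python sketch below parses a trimmed copy of the Portuguese entry using only the standard library. The slide omits the namespace declarations, so the enclosing rdf:RDF element and its xmlns attributes are assumptions (the SKOS namespace is taken from the rdf:type value in the slide), and the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: RDF is the standard one; SKOS is the one used in the slide.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2008/05/skos#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the slide's record. The rdf:RDF wrapper and the xmlns
# declarations are assumed, since the slide does not show them.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/iso639-2/por">
    <skos:altLabel xml:lang="en-Latn">Portuguese</skos:altLabel>
    <skos:altLabel xml:lang="fr-Latn">portugais</skos:altLabel>
    <skos:notation rdf:datatype="xs:string">por</skos:notation>
    <skos:exactMatch rdf:resource="http://www.loc.gov/standards/registry/vocabulary/iso639-1/pt"/>
  </rdf:Description>
</rdf:RDF>"""


def summarize(rdf_xml):
    """Extract the URI, code, language names and exact matches of one concept."""
    root = ET.fromstring(rdf_xml)
    concept = root.find(f"{{{RDF}}}Description")
    return {
        "uri": concept.get(f"{{{RDF}}}about"),
        "code": concept.findtext(f"{{{SKOS}}}notation"),
        "names": {
            label.get(XML_LANG): label.text
            for label in concept.findall(f"{{{SKOS}}}altLabel")
        },
        "exact_matches": [
            match.get(f"{{{RDF}}}resource")
            for match in concept.findall(f"{{{SKOS}}}exactMatch")
        ],
    }


summary = summarize(RECORD)
print(summary["code"], summary["names"])
```

The result reflects the point made on slide 19: the code "por" identifies the concept, while the English and French names are alternate labels attached to it.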

  21. PREMIS controlled lists
      • PREMIS Data Dictionary for Preservation Metadata
      • Some semantic units call for controlled vocabularies and have suggested lists
      • A central registry could document and make them available
      • Users could submit their own terms
      • PREMIS schema could be enhanced with dynamically generated enumerated values for validation

  22. PREMIS event type example

      <rdf:Description rdf:about="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/creation">
        <rdf:type rdf:resource="http://www.w3.org/2008/05/skos#Concept"/>
        <skos:prefLabel xml:lang="en-latn">creation</skos:prefLabel>
        <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/migration"/>
        <skos:narrower rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents/normalization"/>
        <skos:definition xml:lang="en-latn">the act of creating a new object</skos:definition>
        <skos:inScheme rdf:resource="http://www.loc.gov/standards/registry/vocabulary/preservationEvents"/>
      </rdf:Description>
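Slide 21 suggests that the PREMIS schema could carry enumerated values generated from the registry. One way this might look, sketched in Python: take the event types named in the record above and emit an XML Schema simple type with one xs:enumeration per term. The type name `eventTypeValues` and the function name are hypothetical, and nothing here reflects how LC actually generates its schemas.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"


def enumeration_type(type_name, values):
    """Build an xs:simpleType restricting xs:string to the given values."""
    ET.register_namespace("xs", XS)  # serialize with the conventional prefix
    simple_type = ET.Element(f"{{{XS}}}simpleType", {"name": type_name})
    restriction = ET.SubElement(
        simple_type, f"{{{XS}}}restriction", {"base": "xs:string"}
    )
    for value in values:
        ET.SubElement(restriction, f"{{{XS}}}enumeration", {"value": value})
    return ET.tostring(simple_type, encoding="unicode")


# Event types that appear in the slide's example record.
EVENT_TYPES = ["creation", "migration", "normalization"]

# "eventTypeValues" is a made-up type name used only for this sketch.
print(enumeration_type("eventTypeValues", EVENT_TYPES))
```

Regenerating this fragment whenever the registry changes would keep schema validation in step with the published vocabulary.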

  23. Registry Web service
      • User: sends HTTP request
      • XML Database using XQuery (eXist): interprets URI; formulates SPARQL query
      • RDF Triple Store (Sesame): runs query; gets results; sends back to database and then to user
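The "interprets URI, formulates SPARQL query" step could look roughly like the Python sketch below. The path layout is inferred from the example URIs on slides 20 and 22; the function names and the shape of the query are assumptions, since the presentation does not show the service's actual XQuery or SPARQL.

```python
from urllib.parse import urlparse

# Path prefix inferred from the example URIs in the slides.
BASE_PATH = "/standards/registry/vocabulary/"


def interpret_uri(uri):
    """Split a registry URI into (scheme_name, term); term is None for a scheme URI."""
    path = urlparse(uri).path
    if not path.startswith(BASE_PATH):
        raise ValueError(f"not a registry vocabulary URI: {uri}")
    parts = [part for part in path[len(BASE_PATH):].split("/") if part]
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"unexpected path layout: {uri}")
    return parts[0], (parts[1] if len(parts) == 2 else None)


def sparql_for(uri):
    """Formulate a query returning every property and value of the resource."""
    interpret_uri(uri)  # reject URIs outside the registry
    if any(ch in uri for ch in '<>"{}|\\^` '):
        raise ValueError("URI contains characters not allowed in a SPARQL IRI")
    return f"SELECT ?property ?value WHERE {{ <{uri}> ?property ?value }}"


uri = "http://www.loc.gov/standards/registry/vocabulary/iso639-2/por"
print(interpret_uri(uri))
print(sparql_for(uri))
```

The resulting query string would then be sent to the triple store, and the results passed back through the XML database to the user, as the slide describes.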

  24. Further development
      • Consider programming changes to improve speed
      • Develop mechanisms to output all public documentation from database
      • Include additional coding about relationships to other concept schemes and controlled vocabularies (facilitating crosswalks)
      • Encourage experimentation

  25. Questions?
      • Contacts:
         • Rebecca Guenther: rgue@loc.gov
         • Clay Redding: cred@loc.gov
