An Introduction to Dublin Core
Making Sense of Metadata, Society of Archivists EAD/Data Exchange SIG, London, Thursday 17 November 2005
Pete Johnston, Research Officer, UKOLN, University of Bath
UKOLN is supported by: www.bath.ac.uk
An Introduction to Dublin Core
• A brief history
• What is Dublin Core, really?
• The DCMI Abstract Model
• Encoding Dublin Core metadata
• DC Application Profiles
• DC in practice
A brief history (1)
• Mid-1990s: rapid growth of the World Wide Web
• The challenge of resource discovery
  • search engines providing many hits, but little precision
  • recognition that the library approach to cataloguing could not scale to Web resources
• 1995: OCLC/NCSA Workshop in Dublin, Ohio
  • interdisciplinary consensus on 13 "metadata elements"
  • for discovery of "document-like objects"
  • relatively simple, usable by non-cataloguers
• 1996: OCLC/CNI Workshop in Dublin, Ohio
  • expansion to 15 elements
  • explicitly cross-domain
  • for discovery of a broad range of "resources"
The Dublin Core Metadata Element Set
• Title
• Subject
• Description
• Creator
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
A brief history (2)
• 1997-2000: development of the notion of "qualification"
  • tension between simplicity and complexity
  • element refinement
    • narrows the meaning of a DC element
    • e.g. "date modified" v. "date"
  • encoding scheme
    • provides additional information about a value
    • e.g. that a subject is a Library of Congress Subject Heading
  • the "Dumb-Down" principle
    • rules for transforming a "qualified" description into a "simple" description
  • the "One-to-One" rule
    • a DC description describes exactly one resource
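The dumb-down principle can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a DCMI algorithm: statements are modelled as (property, value string, encoding scheme) tuples using prefixed names, the refinement table holds only a small illustrative subset of DCMI refinements, and discarding properties that have no DCMES counterpart (such as `dcterms:audience`) is a choice made for this sketch.

```python
# A small, illustrative subset of DCMI element refinements, each mapped to
# the DCMES element it refines (e.g. "date modified" refines "date").
REFINES = {
    "dcterms:modified": "dc:date",
    "dcterms:created": "dc:date",
    "dcterms:alternative": "dc:title",
    "dcterms:abstract": "dc:description",
    "dcterms:isPartOf": "dc:relation",
    "dcterms:references": "dc:relation",
}


def dumb_down(statements):
    """Reduce (property, value string, encoding scheme) triples to simple DC pairs.

    - an element refinement is replaced by the element it refines;
    - any encoding scheme is dropped, keeping the value string as-is;
    - properties that are neither DCMES elements nor known refinements are
      discarded (an assumption made for this sketch).
    """
    simple = []
    for prop, value, _scheme in statements:
        element = REFINES.get(prop, prop)
        if element.startswith("dc:"):
            simple.append((element, value))
    return simple


qualified = [
    ("dc:title", "A guide to DC metadata", None),
    ("dcterms:modified", "2005-11-17", "dcterms:W3CDTF"),
    ("dc:subject", "Metadata", "dcterms:LCSH"),
    ("dcterms:audience", "information managers", None),
]
print(dumb_down(qualified))
# [('dc:title', 'A guide to DC metadata'), ('dc:date', '2005-11-17'), ('dc:subject', 'Metadata')]
```

The "date modified" statement survives as a plain date, and the LCSH heading survives as a plain subject string: a "simple DC" consumer loses precision but not correctness.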
A brief history (3)
• 1997-2000: what is a "resource"?
  • e.g. can the DCMES be applied to people?
  • DCMI Type Vocabulary
    • Collection, Dataset, Event, Image (Still or Moving), Interactive Resource, Service, Software, Sound, Text, Physical Object
  • but still fairly non-prescriptive
• 1998 onwards: emergence of the Resource Description Framework (RDF)
• 2000-2001: "Grammatical Principles" as an informal data model
A brief history (4)
• 2000-2005: development of the notion of a DC "Application Profile"
  • tailoring metadata standards for a particular context
  • providing local guidelines and constraints
  • combining components from different sources
• 2003-2005: formalisation of the DCMI Abstract Model
  • the concepts used in DC metadata
  • the different types of terms used in DC metadata
  • how those terms are used in combination to construct descriptions
Dublin Core is...
• a conceptual framework/set of rules (the DCMI Abstract Model)...
  • which describes how to use certain types of terms...
  • ... to make statements...
  • ... that form descriptions (of resources)
• a "core" vocabulary/set of terms...
  • managed by DCMI (through its Usage Board)
  • growing (relatively) slowly as new requirements arise
  • each term identified by a Uniform Resource Identifier (URI)
• a set of specifications for representing or encoding DC metadata descriptions in various formats
DCMI Abstract Model
• A description
  • describes exactly one resource
  • may specify a resource URI
  • consists of a set of statements
[Figure: DCMI Abstract Model: Descriptions. A description, which may carry a resource URI, made up of statements.]
DCMI Abstract Model
• A statement must contain
  • a reference to a property
    • property URI
    • all DC "elements" are properties
    • properties may be defined by agencies other than DCMI
  • a reference to a second resource (the value)
    • value URI, and/or
    • one or more value representations
      • value string
      • rich representation
[Figure: DCMI Abstract Model: Statements. Each statement pairs a property URI with a value URI, a value string or a rich representation.]
DCMI Abstract Model
• A statement may contain
  • a reference to a vocabulary encoding scheme
    • vocabulary encoding scheme URI
    • indicates the set of resources (e.g. a controlled vocabulary such as LCSH) to which the value belongs
  • a reference to a syntax encoding scheme
    • syntax encoding scheme URI
    • indicates how the value string is to be interpreted
[Figure: DCMI Abstract Model: Statements. A property URI with a value URI and vocabulary encoding scheme URI; a property URI with a value string and syntax encoding scheme URI; a property URI with a rich representation.]
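The structure of descriptions and statements described on these slides can be modelled with Python dataclasses. This is a minimal sketch: the class and field names are hypothetical (they are not defined by DCMI), rich representations are left out, and the only rule enforced is that a statement must reference its value by URI and/or by at least one value string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueString:
    text: str
    # Optional syntax encoding scheme URI: how the string is to be interpreted
    syntax_encoding_scheme: Optional[str] = None


@dataclass
class Statement:
    property_uri: str                            # every statement references one property
    value_uri: Optional[str] = None              # optional reference to the value resource
    vocab_encoding_scheme: Optional[str] = None  # optional URI of the value's vocabulary
    value_strings: List[ValueString] = field(default_factory=list)
    # Rich representations are left out of this sketch.

    def __post_init__(self):
        # The value must be given by URI and/or at least one representation.
        if self.value_uri is None and not self.value_strings:
            raise ValueError("a statement needs a value URI or a value string")


@dataclass
class Description:
    statements: List[Statement]          # the statements making up the description
    resource_uri: Optional[str] = None   # a description may identify its resource


DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

doc = Description(
    resource_uri="http://example.org/doc/1234/",
    statements=[
        Statement(DC + "title", value_strings=[ValueString("A Guide to DC Metadata")]),
        # Vocabulary encoding scheme: the subject is an LCSH heading
        Statement(DC + "subject", vocab_encoding_scheme=DCTERMS + "LCSH",
                  value_strings=[ValueString("Metadata")]),
        # Syntax encoding scheme: "eng" is an ISO 639-2 code
        Statement(DC + "language",
                  value_strings=[ValueString("eng", DCTERMS + "ISO639-2")]),
        # Value given by URI only
        Statement(DCTERMS + "references",
                  value_uri="http://dublincore.org/documents/dcq-html"),
    ],
)
```

Attaching the syntax encoding scheme to the value string, rather than to the statement, reflects the slide's point that it governs how that string is interpreted.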
DCMI Abstract Model
• A description describes one resource
• Applications are typically based on description sets
  • groups of descriptions
  • where the described resources may be related in some way
• Description sets are encoded or serialised as records
  • according to the rules of a binding
[Figure: Description Set containing two descriptions, each with a resource URI and statements built from property URIs, value URIs, vocabulary encoding scheme URIs, value strings, syntax encoding scheme URIs and rich representations.]
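A description set groups descriptions whose resources may be related. The Python sketch below, under simplifying assumptions, finds such relationships: a statement is reduced to a (property URI, value URI, value string) tuple, and two descriptions count as related when one has a statement whose value URI is the other's resource URI. The class names, the `links` method, the `example.org` URIs and the `NAME` property are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A statement is simplified here to (property URI, value URI or None, value string or None).
Stmt = Tuple[str, Optional[str], Optional[str]]


@dataclass
class Description:
    resource_uri: Optional[str]
    statements: List[Stmt] = field(default_factory=list)


@dataclass
class DescriptionSet:
    descriptions: List[Description]

    def links(self) -> List[Tuple[str, str, str]]:
        """Find relationships inside the set: statements whose value URI is
        the resource URI of another description in the same set."""
        described = {d.resource_uri for d in self.descriptions if d.resource_uri}
        found = []
        for d in self.descriptions:
            for prop, value_uri, _ in d.statements:
                if value_uri in described and value_uri != d.resource_uri:
                    found.append((d.resource_uri, prop, value_uri))
        return found


DC = "http://purl.org/dc/elements/1.1/"
NAME = "http://example.org/terms/name"  # hypothetical property, for illustration only

collection = Description("http://example.org/collection/1", [
    (DC + "title", None, "An example collection"),
    (DC + "publisher", "http://example.org/agent/1", None),
])
agent = Description("http://example.org/agent/1", [
    (NAME, None, "An example agency"),
])
record_set = DescriptionSet([collection, agent])
print(record_set.links())
# [('http://example.org/collection/1', 'http://purl.org/dc/elements/1.1/publisher', 'http://example.org/agent/1')]
```

Each description still describes exactly one resource (the "One-to-One" rule); the relationship between collection and agency is carried by a value URI, not by merging the two descriptions.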
DCMI Abstract Model and Bindings
• For transfer between applications, descriptions must be represented as digital objects
• A binding maps between the constructs of the conceptual model and the components of a digital format
  • two-way
    • encoding application: description set -> record
    • decoding application: record -> description set
• DCMI currently provides three "encoding guidelines" specifications
• Other agencies may also provide bindings
Using X/HTML meta & link elements
• The set of meta/link elements represents a single DC description.
• The resource described is the X/HTML document in which the metadata is embedded.
• Each meta/link element represents a single statement.
• Property and encoding scheme URIs are encoded as prefixed names.

<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" content="A guide to DC metadata" />
<meta name="DCTERMS.audience" content="information managers" />
<meta name="DC.language" scheme="DCTERMS.ISO639-2" content="eng" />
<link rel="DCTERMS.references" href="http://dublincore.org/documents/dcq-html" />
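The decoding direction of this binding (record -> description) can be sketched with Python's standard `html.parser`. This is a minimal sketch, not a reference implementation of the DCMI guidelines: the `DCMetaParser` class and the dictionary shape it returns are hypothetical, prefixes are matched case-sensitively, and every decoded statement is taken to describe the page itself.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Decode DC statements embedded in X/HTML meta and link elements.

    Prefixes declared with <link rel="schema.PREFIX" href="..."> are used to
    expand names such as "DC.title" into property URIs.
    """

    def __init__(self):
        super().__init__()
        self.prefixes = {}  # e.g. "DC" -> "http://purl.org/dc/elements/1.1/"
        self.raw = []       # (prefixed property, value, value is a URI?, prefixed scheme)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = a.get("rel") or ""
        if tag == "link" and rel.startswith("schema."):
            self.prefixes[rel[len("schema."):]] = a.get("href") or ""
        elif tag == "link" and rel and a.get("href"):
            self.raw.append((rel, a["href"], True, None))  # value URI
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.raw.append((a["name"], a["content"], False, a.get("scheme")))  # value string

    def expand(self, prefixed):
        """'DC.title' -> full URI, or None if the name has no declared prefix."""
        if not prefixed or "." not in prefixed:
            return None
        prefix, local = prefixed.split(".", 1)
        base = self.prefixes.get(prefix)
        return base + local if base else None

    def statements(self):
        out = []
        for name, value, is_uri, scheme in self.raw:
            prop = self.expand(name)
            if prop is None:
                continue  # not a DC-style prefixed name, e.g. rel="stylesheet"
            key = "value_uri" if is_uri else "value_string"
            out.append({"property": prop, key: value, "scheme": self.expand(scheme)})
        return out


page = """<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<meta name="DC.title" content="A guide to DC metadata" />
<meta name="DCTERMS.audience" content="information managers" />
<meta name="DC.language" scheme="DCTERMS.ISO639-2" content="eng" />
<link rel="DCTERMS.references" href="http://dublincore.org/documents/dcq-html" />
</head>"""

parser = DCMetaParser()
parser.feed(page)
for s in parser.statements():
    print(s)
```

Statement expansion is deferred until `statements()` is called, so the sketch still works if a `schema.*` link appears after the meta elements that use its prefix.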
Using the DC-XML format
• Supports only a limited subset of the Abstract Model (revision forthcoming)
• The container element, here <meta>, represents a single DC description.
• Each child element represents a single statement.
• Property URIs and encoding scheme URIs are encoded as XML QNames.

<?xml version="1.0"?>
<meta xmlns="http://www.ukoln.ac.uk/metadata/dcdot/"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:identifier>http://example.org/doc/1234/</dc:identifier>
  <dc:title>A Guide to DC Metadata</dc:title>
  <dc:language xsi:type="dcterms:ISO639-2">eng</dc:language>
  <dcterms:references>http://dublincore.org/documents/dcq-html</dcterms:references>
</meta>
Using the Resource Description Framework (RDF)
• Specifications for DC in RDF do exist…
• … but work is currently in progress to
  • resolve ambiguities
  • revise them in light of the DCAM
DC Application Profile
• Implementers adapt metadata standards to the context of their application
• Tension between localisation and interoperability
• A DC Application Profile
  • specifies the terms (properties, vocabulary/syntax encoding schemes) used in a class of description sets
  • describes how those terms are used
    • supplementary information on how properties are applied/interpreted in context
    • constraints on the occurrence of properties
    • constraints on values and value representations (encoding schemes)
DC Application Profiles: Examples
• "Simple Dublin Core"
  • uses the 15 properties of the DCMES
  • all optional and repeatable
  • values represented by value strings
  • no vocabulary or syntax encoding schemes
• UK e-Government Metadata Standard (eGMS)
  • uses selected properties from DCMI vocabularies, plus additional properties
  • guidelines on the use of properties
  • some properties mandated/recommended
  • some vocabulary encoding schemes mandated/recommended
  • guidance on the content of value strings
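Some of the constraints an application profile expresses (which properties are used, how often they may occur, which encoding schemes are allowed) can be checked mechanically. The Python sketch below encodes "Simple DC" as described above, plus a hypothetical stricter profile; it is not eGMS or IESR, and the profile format, property names without prefixes, and the `check` function are all assumptions made for illustration.

```python
from collections import Counter

# The 15 DCMES properties (prefix omitted for brevity)
DCMES = ["title", "creator", "subject", "description", "publisher",
         "contributor", "date", "type", "format", "identifier",
         "source", "language", "relation", "coverage", "rights"]

# A profile maps each property it uses to
# (minimum occurrences, maximum occurrences or None for unbounded, allowed encoding schemes).
# "Simple DC": all 15 properties optional and repeatable, no encoding schemes.
SIMPLE_DC = {name: (0, None, frozenset()) for name in DCMES}


def check(description, profile):
    """List the ways a description breaks a profile's constraints.

    description: list of (property, value string, encoding scheme or None).
    """
    problems = []
    for prop, _value, scheme in description:
        if prop not in profile:
            problems.append(f"{prop}: not part of this profile")
        elif scheme is not None and scheme not in profile[prop][2]:
            problems.append(f"{prop}: encoding scheme {scheme} not allowed")
    counts = Counter(prop for prop, _, _ in description)
    for prop, (low, high, _schemes) in profile.items():
        n = counts[prop]
        if n < low:
            problems.append(f"{prop}: at least {low} required, found {n}")
        if high is not None and n > high:
            problems.append(f"{prop}: at most {high} allowed, found {n}")
    return problems


# A hypothetical stricter profile: exactly one title, and language values
# may carry the ISO639-2 encoding scheme.
STRICT = dict(SIMPLE_DC, title=(1, 1, frozenset()),
              language=(0, None, frozenset({"ISO639-2"})))

desc = [("title", "A Guide to DC Metadata", None), ("language", "eng", "ISO639-2")]
print(check(desc, SIMPLE_DC))  # ['language: encoding scheme ISO639-2 not allowed']
print(check(desc, STRICT))     # []
```

The same description is valid under one profile and not the other, which is the localisation/interoperability tension the slides describe.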
DC Application Profiles: Examples
• JISC Information Environment Service Registry (IESR) Metadata Schema
  • supports description of several related resources (Collection, Service, Agent)
  • uses selected properties from DCMI vocabularies, selected properties from the RSLP Collection Description (CD) vocabularies, and some properties created for IESR
  • for each type of resource described, guidelines on the use of properties
    • some properties mandated/recommended
    • many vocabulary encoding schemes mandated/recommended
[Figure: two DC Application Profiles, A ("Simple DC") and B (IESR), drawing on DCMI properties, DCMI vocabulary encoding schemes, IESR properties and IESR vocabulary encoding schemes.]
Dublin Core in X/HTML
• Initial implementation focused on DC-in-HTML
  • a robot crawls individual HTML pages to extract metadata
• But today there is little or no use by large Web search engines
  • problems of spamming/trust
  • lack of take-up by authors/publishers
  • success of full-text crawling/indexing, especially Google!
• However, some use in controlled domains
  • intranets
  • trusted groups of providers (e.g. eGMS)
• Embedding DC in XHTML is useful if you know a search engine exploits it
[Figure: a harvester retrieving pages from Web sites using HTTP GET.]
Picture Australia
• images "related to all things Australian" from 40+ cultural agencies
• central search service based (initially at least) on crawling HTML-embedded DC metadata
• providers migrating to OAI-PMH
• currently a hybrid approach?
http://www.pictureaustralia.org/
Dublin Core and OAI-PMH
• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Fairly simple mechanism for sharing metadata records between applications
  • has origins in the "e-prints" community
  • built on HTTP and XML
• Allows a harvester to ask a repository for all or some of its metadata records (in a specified metadata format)
  • i.e. supports "incremental harvesting"
  • "Give me all your records updated since yyyy-mm-dd"
• "OAI-DC" (Simple DC) is the mandatory format
  • but there is no limitation on the formats that can be transferred (as long as they can be described by an XML Schema)
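The harvester's side of an incremental harvest can be sketched in Python with the standard library: build a `ListRecords` request URL for the mandatory `oai_dc` format, then read titles and the resumption token from a response page. The repository base URL is hypothetical, the function names are illustrative, and the sample response is heavily trimmed (a real response also carries `responseDate`, `request` and record `header` elements).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request (sent with HTTP GET)."""
    if resumption_token is not None:
        # resumptionToken is exclusive: it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # oai_dc (Simple DC) is the format every repository must support.
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date  # records with a datestamp on or after this date
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (dc:title values, resumptionToken or None) from one response.
    An absent or empty resumptionToken means the list is complete."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC + "title")]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Guide to DC Metadata</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.org/oai", from_date="2005-11-01"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2005-11-01
print(parse_page(SAMPLE))  # (['A Guide to DC Metadata'], 'page-2')
```

A harvester would loop: request the first page, then keep requesting with the returned token until `parse_page` reports `None`.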
[Figure: a harvester collecting metadata records from repositories using OAI-PMH.]
OAIster (University of Michigan)
• "academically-oriented digital resources"
• "5,947,627 records from 557 institutions" (2005-11-15)
http://oaister.umdl.umich.edu/
Summary
• DCMES/"Simple DC" serves as a "core" for discovery of a wide range of resources
• "Simple DC" is, by definition, simple!
  • limitations in terms of the functions/services that can be offered
• The DCMI Abstract Model provides a framework for extensibility and modularity
• A DC Application Profile describes a real-world usage of that model