This presentation by Gordon Dunsire at the Cataloguing and Indexing Group Scotland seminar delves into the intersection of libraries and linked data within the framework of the Semantic Web. Dunsire explores how relational records and RDA vocabularies influence metadata practices, emphasizing the importance of disaggregated, machine-readable data. Key concepts include RDF, triples, and the potential for libraries to leverage linked data for enhanced authority control and metadata reuse. This talk provides insights about simplifying metadata for efficient machine processing, ultimately leading to improved bibliographic control.
Introduction to linked data Gordon Dunsire Presented at the Cataloguing and Indexing Group Scotland seminar “Linked data and the Semantic Web: what have libraries got to do with it?”, Edinburgh, National Library of Scotland, 17 June 2011
Overview • Relational records • Influence of RDA vocabularies • Disaggregated, distributed “records” • Logical conclusion: simple metadata statement • RDF • Triples, etc. • Linked data • Chains, clusters
Bibliographic record: 12345 • Title: Cataloguing is fun! • Author: Mary MacDonald (8765) • Content type: text (1234) • Carrier type: microfiche (5432) • LCSH: Cataloging (5432)
Name authority record: 8765 • Heading: MacDonald, Mary • Place of birth: Edinburgh (9876)
LCSH authority record: 5432 • Heading: Cataloging • See also: Books (65443)
RDA content type record: 1234 • Term: text • Definition: Content expressed through a form of notation for language intended to be perceived visually.
RDA carrier type record: 5432 • Term: microfiche • Definition: A sheet of film bearing a number of microimages in a two-dimensional array.
Bibliographic record: 12345 • Title: Cataloguing is fun! • Author: 8765 • Content type: 1234 • Carrier type: 5432 • LCSH: 5432
Name authority record: 8765 • Heading: MacDonald, Mary • Place of birth: 9876
Annotations • Stop! Ambiguous: link not safe. • Identifier: ok to link.
Statements • 12345 – Author – 8765 • 8765 – Heading – “MacDonald, Mary” • 8765 – Place of birth – 9876 • 9876 – Name – “Edinburgh” • 9876 – Country – 4567
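The step from linked records to individual statements can be sketched in Python. This is only an illustration, assuming the records are held as plain dictionaries keyed by their local record numbers; the `records` layout and the helper names `to_statements` and `follow` are hypothetical, not part of any library system.

```python
# Hypothetical in-memory copies of three records, using values from the slide.
records = {
    "12345": {"Title": "Cataloguing is fun!", "Author": "8765"},
    "8765": {"Heading": "MacDonald, Mary", "Place of birth": "9876"},
    "9876": {"Name": "Edinburgh"},
}

def to_statements(records):
    """Break each record into single (record id, attribute, value) statements."""
    statements = []
    for record_id, fields in records.items():
        for attribute, value in fields.items():
            statements.append((record_id, attribute, value))
    return statements

def follow(start, attribute, statements):
    """Return every value linked from `start` by `attribute`."""
    return [v for s, a, v in statements if s == start and a == attribute]

statements = to_statements(records)

# Follow a chain of links: book -> author -> place of birth -> place name.
author = follow("12345", "Author", statements)[0]
birthplace = follow(author, "Place of birth", statements)[0]
place_name = follow(birthplace, "Name", statements)[0]
print(place_name)
```

Once the records are broken up, nothing in the code depends on which "record" a statement came from; only the identifiers matter.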
Linked data is not a new idea! • It extends concepts of authority control • “Preferred” labels • Change once; link many times • Re-use of metadata • More than one “attribute” associated with a “heading” • E.g. Place of birth of person with name heading • Concepts can be applied to authority records • As well as bibliographic description records • Full extension leads to “record” disaggregation • All “records” in bibliographic control systems
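"Change once; link many times" can be illustrated with a small Python sketch. It assumes a dictionary standing in for an authority file; the second bibliographic record and the revised heading are invented purely for the example.

```python
# Hypothetical authority file: one preferred label per identifier.
authority = {"8765": "MacDonald, Mary"}

# Bibliographic records store the identifier, not the label itself.
bib_records = [
    {"id": "12345", "title": "Cataloguing is fun!", "author": "8765"},
    {"id": "12346", "title": "An invented second title", "author": "8765"},
]

def display_author(record):
    """Resolve the author link to the current preferred label."""
    return authority[record["author"]]

# Change the preferred label once (the new form is invented for illustration)...
authority["8765"] = "Macdonald, Mary"

# ...and every record that links to it shows the new label.
labels = [display_author(r) for r in bib_records]
print(labels)
```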
Linked data and RDF • Resource Description Framework (RDF) • Designed for machine-processing of metadata at global scale • 24/7/365 • Trillions of operations per second • Everything must be disambiguated • Machines are dumb • Simplicity helps! • Machine-readable identifiers
RDF triple • Metadata expressed as “atomic” statements • A simple, single, irreducible statement • The title of this book is “Cataloguing is fun!” • Constructed in 3 parts • “Triple” • The title of this book is “Cataloguing is fun!” • Subject of the statement = Subject: This book • Nature of the statement = Predicate: has title • Value of the statement = Object: “Cataloguing is fun!” • This book – has title – “Cataloguing is fun!” • subject – predicate – object
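The three-part structure can be written down directly in Python. This is a minimal sketch that still uses the human-readable labels from the slide; the `Triple` class is a hypothetical name introduced for illustration.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One atomic metadata statement in three parts."""
    subject: str
    predicate: str
    obj: str  # the RDF "object"; named obj to avoid clashing with Python's built-in

title_statement = Triple(
    subject="This book",
    predicate="has title",
    obj="Cataloguing is fun!",
)

# Print the statement in subject, predicate, object order.
print(" – ".join(title_statement))
```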
Identifiers • Need unambiguous way of identifying each part of the triple for efficient machine-processing • Human labels (“This book”, “has title”) no good • Same thing, different labels; different things, same label • Exploit the utility of the URL • Machine-readable, regular syntax, unambiguous • Uniform Resource Identifier (URI)
Uniform Resource Identifier • Can be any unique combination of numbers and letters • No intrinsic meaning; it’s just an identifying label • Can look like a URL • http://iflastandards.info/ns/isbd/elements/P1001 • But does not lead to a Web page (in principle ...) • RDF requires the subject and predicate of a triple to be URIs • Object can be a URI, or a literal string (“Cataloguing is fun!”)
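The rule about which parts of a triple must be URIs can be sketched as a simple check. The `URI` and `Literal` classes and the `make_triple` helper are hypothetical, and the `example.org` identifiers are placeholders invented for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str

def make_triple(subject, predicate, obj):
    """Build a triple, enforcing the rule stated on the slide."""
    if not isinstance(subject, URI) or not isinstance(predicate, URI):
        raise TypeError("subject and predicate must be URIs")
    if not isinstance(obj, (URI, Literal)):
        raise TypeError("object must be a URI or a literal")
    return (subject, predicate, obj)

# Placeholder identifiers; they are assumptions, not published URIs.
book = URI("http://example.org/book/12345")
has_title = URI("http://example.org/terms/hasTitle")

triple = make_triple(book, has_title, Literal("Cataloguing is fun!"))
```

Passing a human label such as `Literal("This book")` as the subject raises `TypeError`, which mirrors the point that labels are no good as identifiers.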
Namespaces • URI can be constructed from a base plus a unique, identifying suffix • http://iflastandards.info/ns/isbd/elements/ • + P1001 • Base is known as a namespace • Can be abbreviated by human programmer • “isbd” = http://iflastandards.info/ns/isbd/elements/ • isbd:P1001 • Machine expands abbreviation for processing
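The expansion a machine performs on an abbreviated URI can be sketched as follows. The namespace and suffix are the ones given on the slide; the `NAMESPACES` table and the `expand` function are hypothetical names for illustration.

```python
# Prefix-to-namespace table; the "isbd" entry is taken from the slide.
NAMESPACES = {
    "isbd": "http://iflastandards.info/ns/isbd/elements/",
}

def expand(abbreviated, namespaces=NAMESPACES):
    """Expand 'prefix:suffix' into a full URI using the namespace table."""
    prefix, sep, suffix = abbreviated.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"unknown or missing prefix in {abbreviated!r}")
    return namespaces[prefix] + suffix

print(expand("isbd:P1001"))
```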
Everything as triples in RDF • Every aspect of the metadata must be expressed in RDF to be machine-processable • Metadata about real-world objects (books, people, etc.) • Metadata about the predicates (definition, label, scope, etc.) • Common predicates apply to many types of thing (human-readable label, etc.) • High-level RDF namespaces (rdfs, owl, skos) • RDF is expressed in RDF (“bootstrap”)
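A short sketch shows metadata about a book and metadata about a predicate sharing the same triple shape. The `rdf` and `rdfs` namespace URIs are the standard W3C ones; the `example.org` identifiers and the `label_of` helper are assumptions made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/terms/"  # placeholder namespace invented for this sketch

triples = [
    # Metadata about a real-world object (a book).
    ("http://example.org/book/12345", EX + "hasTitle", "Cataloguing is fun!"),
    # Metadata about the predicate itself, in exactly the same form.
    (EX + "hasTitle", RDF + "type", RDF + "Property"),
    (EX + "hasTitle", RDFS + "label", "has title"),
]

def label_of(uri, triples):
    """Return the first rdfs:label recorded for a URI, or None if there is none."""
    for s, p, o in triples:
        if s == uri and p == RDFS + "label":
            return o
    return None

print(label_of(EX + "hasTitle", triples))
```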
RDF properties • Predicates are called properties in RDF • “Verbal” part of the metadata statement • E.g. “A has title ...”, “B is author of C”, “D is embodiment of E” • Properties link specific instances of two things • A = a specific book, B = a specific person, etc. • ... = a specific label, character string, annotation • => a “literal” • Properties are the links in linked data, the pathways through the Semantic Web
Domains and ranges • A property can specify the types of thing it links • E.g. Bibliographic resources, Persons, Places, etc. • Types of thing are RDF classes • A domain is the class of the subject of the property • E.g. The domain of “is embodiment of” is Manifestation (FRBR) • A range is the class of the object of the property • E.g. The range of “is embodiment of” is Expression (FRBR)
Inferencing • RDF enables semantic inferencing • Deducing additional, unstated triples from an existing statement or set of statements • E.g. “D is embodiment of E” + “(is embodiment of) has domain Manifestation” => “D is a Manifestation” • And “D is embodiment of E” + “(is embodiment of) has range Expression” => “E is an Expression”
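The inference pattern can be sketched in a few lines of Python. To keep the example neutral, it uses a hypothetical "has place of birth" property whose domain (Person) and range (Place) are assumptions made for this sketch, applied to the identifiers from the earlier authority record example.

```python
# Hypothetical schema: the property, its domain and its range are invented here.
DOMAIN = {"has place of birth": "Person"}
RANGE = {"has place of birth": "Place"}

def infer_types(triples):
    """Derive unstated 'is a' triples from property domains and ranges."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in DOMAIN:
            # The subject of the property belongs to the domain class.
            inferred.add((subject, "is a", DOMAIN[predicate]))
        if predicate in RANGE:
            # The object of the property belongs to the range class.
            inferred.add((obj, "is a", RANGE[predicate]))
    return inferred

stated = [("8765", "has place of birth", "9876")]
inferred = infer_types(stated)
for triple in sorted(inferred):
    print(triple)
```

Neither inferred triple was stated anywhere; both follow only from the single stated triple plus the domain and range declarations.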
The truth • There is no test of veracity for a single triple in RDF • Anybody can say Anything about Anything (AAA) • Inferencing only tests for logical inconsistency • E.g. If it results in “E is a Manifestation” + “E is not a Manifestation” • Library linked data must choose and apply its properties/links with care • To maintain our reputation for reliability, quality, etc. • In a web of user-, machine-, and politically-generated metadata
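A consistency test of this kind can be sketched as a search for a statement and its negation. The "is a" and "is not a" predicates are written as plain strings, and the `find_contradictions` helper is a hypothetical name; real RDF tooling expresses negation differently, so treat this as an illustration of the idea only.

```python
def find_contradictions(triples):
    """Return (subject, class) pairs asserted both as 'is a' and 'is not a'."""
    positive = {(s, o) for s, p, o in triples if p == "is a"}
    negative = {(s, o) for s, p, o in triples if p == "is not a"}
    return positive & negative

statements = [
    ("E", "is a", "Manifestation"),
    ("E", "is not a", "Manifestation"),
    ("This book", "has title", "Cataloguing is fun!"),
]

conflicts = find_contradictions(statements)
print(conflicts)
```

The check reports that the two statements about E cannot both hold, but it cannot say which one is true, and it has nothing to say about the title statement at all.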
Thank you • To be continued ...