Introduction to linked data

Gordon Dunsire. Presented at the Cataloguing and Indexing Group Scotland seminar “Linked data and the Semantic Web: what have libraries got to do with it?”, Edinburgh, National Library of Scotland, 17 June 2011

Overview • Relational records • Influence of RDA vocabularies • Disaggregated, distributed “records” • Logical conclusion: simple metadata statement • RDF • Triples, etc. • Linked data • Chains, clusters

Bibliographic record: 12345 • Title: Cataloguing is fun! • Author: Mary MacDonald (8765) • Content type: text (1234) • Carrier type: microfiche (5432) • LCSH: Cataloging (5432)
Name authority record: 8765 • Heading: MacDonald, Mary • Place of birth: Edinburgh (9876)
LCSH authority record: 5432 • Heading: Cataloging • See also: Books (65443)
RDA content type record: 1234 • Term: text • Definition: Content expressed through a form of notation for language intended to be perceived visually.
RDA carrier type record: 5432 • Term: microfiche • Definition: A sheet of film bearing a number of microimages in a two-dimensional array.

Bibliographic record: 12345 • Title: Cataloguing is fun! • Author: 8765 • Content type: 1234 • Carrier type: 5432 • LCSH: 5432
Name authority record: 8765 • Heading: MacDonald, Mary • Place of birth: 9876
Diagram annotations: “Stop! Ambiguous: link not safe.” • “Identifier: ok to link.”
Statements drawn as links: 12345 – Author – 8765 • 8765 – Heading – “MacDonald, Mary” • 8765 – Place of birth – 9876 • 9876 – Name – “Edinburgh” • 9876 – Country – 4567

Linked data is not a new idea! • It extends concepts of authority control • “Preferred” labels • Change once; link many times • Re-use of metadata • More than one “attribute” associated with a “heading” • E.g. Place of birth of person with name heading • Concepts can be applied to authority records • As well as bibliographic description records • Full extension leads to “record” disaggregation • All “records” in bibliographic control systems
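As a rough sketch of record disaggregation, the bibliographic record from the earlier slide can be split into one statement per attribute. The dictionary layout and the function name `disaggregate` are illustrative assumptions, not part of any cataloguing system; the identifiers are the ones used on the slides.

```python
# Hypothetical in-memory form of bibliographic record 12345 from the slides.
record = {
    "id": "12345",
    "Title": "Cataloguing is fun!",
    "Author": "8765",        # identifier of the name authority record
    "Content type": "1234",  # identifier of the RDA content type record
}


def disaggregate(rec):
    """Split one record into (record id, attribute, value) statements."""
    rec_id = rec["id"]
    return [(rec_id, attr, value) for attr, value in rec.items() if attr != "id"]


statements = disaggregate(record)
for statement in statements:
    print(statement)
```

Each resulting statement can be stored, shared, and linked independently of the record it came from.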

Linked data and RDF • Resource Description Framework (RDF) • Designed for machine-processing of metadata at global scale • 24/7/365 • Trillions of operations per second • Everything must be disambiguated • Machines are dumb • Simplicity helps! • Machine-readable identifiers

RDF triple • Metadata expressed as “atomic” statements • A simple, single, irreducible statement • The title of this book is “Cataloguing is fun!” • Constructed in 3 parts • “Triple” • The title of this book is “Cataloguing is fun!” • Subject of the statement = Subject: This book • Nature of the statement = Predicate: has title • Value of the statement = Object: “Cataloguing is fun!” • This book – has title – “Cataloguing is fun!” • subject – predicate – object
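The three-part statement above can be sketched as a small data structure. This is only an illustration: the class name `Triple` and its field names are assumptions, and the human-readable labels stand in for the identifiers that the following slides introduce.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One atomic metadata statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid confusion with Python's built-in "object"


t = Triple("This book", "has title", "Cataloguing is fun!")
print(" – ".join(t))
```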

Identifiers • Need unambiguous way of identifying each part of the triple for efficient machine-processing • Human labels (“This book”, “has title”) no good • Same thing, different labels; different things, same label • Exploit the utility of the URL • Machine-readable, regular syntax, unambiguous • Uniform Resource Identifier (URI)

Uniform Resource Identifier • Can be any unique combination of numbers and letters • No intrinsic meaning; it’s just an identifying label • Can look like a URL • http://iflastandards.info/ns/isbd/elements/P1001 • But does not lead to a Web page (in principle ...) • RDF requires the subject and predicate of a triple to be URIs • Object can be a URI, or a literal string (“Cataloguing is fun!”)
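The rule on this slide (subject and predicate are URIs, object is a URI or a literal) can be checked with a few lines of code. This is a minimal sketch under stated assumptions: it only recognises URIs that look like HTTP URLs, and the `http://example.org/...` identifiers and function names are hypothetical.

```python
def is_http_uri(value):
    """Simplified test: treat only http(s) identifiers as URIs."""
    return value.startswith(("http://", "https://"))


def is_valid_triple(subject, predicate, obj):
    """Subject and predicate must be URIs; the object may be a URI or a literal string."""
    return is_http_uri(subject) and is_http_uri(predicate) and isinstance(obj, str)


BOOK = "http://example.org/book/12345"          # hypothetical subject URI
HAS_TITLE = "http://example.org/terms/hasTitle"  # hypothetical predicate URI

print(is_valid_triple(BOOK, HAS_TITLE, "Cataloguing is fun!"))
print(is_valid_triple("This book", "has title", "Cataloguing is fun!"))
```

The second call fails the check because human labels are not identifiers.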

Namespaces • URI can be constructed from a base plus a unique, identifying suffix • http://iflastandards.info/ns/isbd/elements/ • + P1001 • Base is known as a namespace • Can be abbreviated by human programmer • “isbd” = http://iflastandards.info/ns/isbd/elements/ • isbd:P1001 • Machine expands abbreviation for processing
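The expansion step that the machine performs can be sketched as a lookup plus string concatenation. The prefix table and the function name `expand` are illustrative assumptions; the namespace and the abbreviation `isbd:P1001` are the ones shown on the slide.

```python
# Prefix table: abbreviation -> namespace (base URI).
PREFIXES = {"isbd": "http://iflastandards.info/ns/isbd/elements/"}


def expand(abbreviated, prefixes=PREFIXES):
    """Expand an abbreviated name such as 'isbd:P1001' into a full URI."""
    prefix, sep, suffix = abbreviated.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {abbreviated!r}")
    return prefixes[prefix] + suffix


print(expand("isbd:P1001"))
# prints http://iflastandards.info/ns/isbd/elements/P1001
```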

Everything as triples in RDF • Every aspect of the metadata must be expressed in RDF to be machine-processable • Metadata about real-world objects (books, people, etc.) • Metadata about the predicates (definition, label, scope, etc.) • Common predicates apply to many types of thing (human-readable label, etc.) • High-level RDF namespaces (rdfs, owl, skos) • RDF is expressed in RDF (“bootstrap”)
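To illustrate, the same triple structure can describe both a book and the predicate used to describe it. This is a sketch only: the `http://example.org/...` URIs and the helper `describe` are hypothetical, while the `rdf` and `rdfs` namespaces are the standard W3C ones.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/terms/"  # hypothetical namespace for this example

triples = [
    # Metadata about a real-world object (a book).
    ("http://example.org/book/12345", EX + "hasTitle", "Cataloguing is fun!"),
    # Metadata about the predicate itself, using high-level namespaces.
    (EX + "hasTitle", RDF + "type", RDF + "Property"),
    (EX + "hasTitle", RDFS + "label", "has title"),
]


def describe(resource, graph):
    """Return every (predicate, object) pair stated about a resource."""
    return [(p, o) for s, p, o in graph if s == resource]


for predicate, obj in describe(EX + "hasTitle", triples):
    print(predicate, "->", obj)
```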

RDF properties • Predicates are called properties in RDF • “Verbal” part of the metadata statement • E.g. “A has title ...”, “B is author of C”, “D is embodiment of E” • Properties link specific instances of two things • A = a specific book, B = a specific person, etc. • ... = a specific label, character string, annotation • => a “literal” • Properties are the links in linked data, the pathways through the Semantic Web

Domains and ranges • A property can specify the types of thing it links • E.g. Bibliographic resources, Persons, Places, etc. • Types of thing are RDF classes • A domain is the class of the subject of the property • E.g. The domain of “is embodiment of” is Manifestation (FRBR) • A range is the class of the object of the property • E.g. The range of “is embodiment of” is Expression (FRBR)

Inferencing • RDF enables semantic inferencing • Deducing additional, unstated triples from an existing statement or set of statements • E.g. “D is embodiment of E” + “(is embodiment of) has domain Manifestation” => “D is a Manifestation” • And “D is embodiment of E” + “(is embodiment of) has range Expression” => “E is an Expression”
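Domain and range inferencing can be sketched as a single pass over the triples. The example assumes the FRBR reading in which a Manifestation is the embodiment of an Expression; the `ex:` names are hypothetical, and the function handles only one domain and one range per property, which is a simplification.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

graph = {
    ("ex:D", "ex:isEmbodimentOf", "ex:E"),
    # Assumed declarations about the property itself.
    ("ex:isEmbodimentOf", RDFS_DOMAIN, "ex:Manifestation"),
    ("ex:isEmbodimentOf", RDFS_RANGE, "ex:Expression"),
}


def infer_types(triples):
    """Deduce unstated rdf:type triples from domain and range declarations."""
    domains = {s: o for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {s: o for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        if p in domains:  # the subject belongs to the domain class
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:   # the object belongs to the range class
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred - set(triples)


for triple in sorted(infer_types(graph)):
    print(triple)
```

Neither type statement was asserted directly; both follow from one statement plus the declarations about its property.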

The truth • There is no test of veracity for a single triple in RDF • Anybody can say Anything about Anything (AAA) • Inferencing only tests for logical inconsistency • E.g. If it results in “E is a Manifestation” + “E is not a Manifestation” • Library linked data must choose and apply its properties/links with care • To maintain our reputation for reliability, quality, etc. • In a web of user-, machine-, and politically-generated metadata
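The difference between consistency and truth can be sketched as follows. The check below only looks for a resource typed with two classes that have been declared disjoint; the `ex:` names and the disjointness declaration are assumptions made for the example (`owl:disjointWith` is the OWL property for such declarations).

```python
def find_clashes(triples):
    """Report resources typed with two classes that are declared disjoint."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    disjoint = {frozenset((s, o)) for s, p, o in triples if p == "owl:disjointWith"}
    clashes = []
    for resource, classes in types.items():
        for pair in disjoint:
            if pair <= classes:  # both disjoint classes are claimed
                clashes.append((resource, tuple(sorted(pair))))
    return clashes


inconsistent = [
    ("ex:Manifestation", "owl:disjointWith", "ex:Expression"),  # assumed
    ("ex:E", "rdf:type", "ex:Expression"),
    ("ex:E", "rdf:type", "ex:Manifestation"),
]
# A false statement that contradicts nothing else passes unnoticed.
false_but_consistent = [
    ("ex:book12345", "ex:hasTitle", "Cataloguing is boring!"),
]

print(find_clashes(inconsistent))
print(find_clashes(false_but_consistent))
```

The second graph produces no clash, which is why care in choosing and applying properties cannot be delegated to the machine.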

Thank you • To be continued ...
