
A Perspective on Preservation of Linked Data


Presentation Transcript


  1. A Perspective on Preservation of Linked Data
     Richard Cyganiak, DERI, NUI Galway

  2. How is Linked Data preservation different?
     • Easier because RDF is (sometimes) self-describing
     • Representation information and context tend to be explicit and machine-processable
     • Harder because it is tied to a particular technology infrastructure
     • If the domain name is lost, a dataset can no longer be LD (cf. TimBL's four principles)
     • That doesn't mean the data is no longer useful

  3. Why think about preservation of LD?
     • Can the preservation community teach us how to make data more self-describing?
     • Preservation requires packaging. LD needs better data packaging
     • Preservation requires versioning. LD needs better versioning
     • LD datasets do go offline. How can we deal with it? Preserving the bits is not necessarily the hardest problem!

  4. Access and formats
     • Multiple methods of publishing/accessing LD:
     • Dereferenceable URIs
     • SPARQL endpoints
     • RDF dumps (triple/quad)
     • Embedding into web pages (RDFa, microdata)
     • Focus on RDF dumps to keep things tractable and to maximise usefulness for non-RDF data

  5. Vocabularies
     • Meaning of an LD dataset depends on the vocabularies (a.k.a. ontologies) it uses
     • Most important representation information
     • Vocabularies can change and disappear too
     • Need to be preserved alongside the data
     • Vocabularies would be a good starting point for LD preservation
     • Note: LOV already archives versions of 100s of vocabularies (http://lov.okfn.org/)
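To preserve vocabularies alongside the data, one first has to know which vocabularies a dump actually uses. The following is a minimal sketch, assuming the dump is in N-Triples with one triple per line. It only looks at predicate IRIs (classes used as objects of rdf:type are ignored) and guesses the namespace by cutting the IRI at its last "#" or "/". The function names and the sample triples are made up for illustration.

```python
import re
from collections import Counter

# Captures the predicate IRI of a simple N-Triples line:
# an IRI or blank-node subject, whitespace, then <predicate>.
_PREDICATE = re.compile(r'^\s*(?:<[^>]*>|_:\S+)\s+<([^>]*)>')


def namespace_of(iri: str) -> str:
    """Heuristic: cut the IRI after whichever of '#' or '/' occurs last."""
    cut = max(iri.rfind('#'), iri.rfind('/'))
    return iri[:cut + 1] if cut >= 0 else iri


def vocabularies_used(ntriples_lines) -> Counter:
    """Count triples per predicate namespace in an iterable of N-Triples lines."""
    counts = Counter()
    for line in ntriples_lines:
        if line.lstrip().startswith('#'):  # skip comment lines
            continue
        match = _PREDICATE.match(line)
        if match:
            counts[namespace_of(match.group(1))] += 1
    return counts


# Hypothetical sample data, not taken from any real dataset.
SAMPLE = [
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .',
    '<http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .',
]

if __name__ == "__main__":
    for namespace, n in vocabularies_used(SAMPLE).most_common():
        print(n, namespace)
```

The resulting namespace list is a candidate set of vocabularies to archive together with the dump. A real tool would likely use a proper RDF parser rather than a regular expression.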

  6. Versioning
     • How to package individual versions of a dataset in an explicit, machine-readable way?
     • There is no strong notion of versioning in the RDF community
     • Books have editions. Software products have releases. This is important for data too. What version of Dataset X are you using?
     • “Dependencies” between datasets and vocabularies, incl. versions?
     • See also: Memento
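One way to make "what version of Dataset X are you using?" answerable is to publish a small machine-readable record with every release of a dump. The sketch below is an assumption about what such a record might contain, not an existing standard: the field names (`dataset`, `version`, `sha256`, `depends_on`) and the example dataset and vocabulary names are hypothetical.

```python
import hashlib
import json


def describe_version(name, version, dump_bytes, depends_on=()):
    """Build a version record for one release of an RDF dump.

    depends_on is an iterable of (name, version) pairs, for example the
    vocabulary versions the dump was produced against.
    """
    return {
        "dataset": name,
        "version": version,
        # The checksum lets a consumer verify they hold exactly this release.
        "sha256": hashlib.sha256(dump_bytes).hexdigest(),
        "size_bytes": len(dump_bytes),
        "depends_on": [{"name": n, "version": v} for n, v in depends_on],
    }


if __name__ == "__main__":
    # Hypothetical dump content and dependency, for illustration only.
    dump = b'<http://example.org/a> <http://example.org/p> "x" .\n'
    record = describe_version("dataset-x", "2.0", dump, [("example-vocab", "1.2")])
    print(json.dumps(record, indent=2))
```

The checksum identifies the bits, while the explicit version label and dependency list carry the "edition" information that a hash alone cannot express.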

  7. Cataloging and packaging
     • How can the various parts of a dataset and its surrounding information be packaged and held together in an explicit, machine-readable way?
     • What metadata needs to be recorded about these packages to preserve context and make them findable?
     • Potential benefit: Tooling for setting up a local copy of a published/archived dataset including all its dependencies
     • See also: OKFN's data packages (http://www.dataprotocols.org/en/latest/data-packages.html)
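The tooling benefit mentioned above can be sketched as a dependency resolver: given package descriptors that declare what they require, compute the order in which to fetch and load them. This is only an illustration under stated assumptions. The catalog layout, the `requires` field, and all package names are hypothetical and are not taken from the data packages specification. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical catalog: each package lists its files and the packages it requires.
CATALOG = {
    "dataset-x": {"version": "2.0", "files": ["dataset-x.nq"],
                  "requires": ["vocab-a", "vocab-b"]},
    "vocab-a": {"version": "1.1", "files": ["vocab-a.ttl"],
                "requires": ["vocab-b"]},
    "vocab-b": {"version": "0.9", "files": ["vocab-b.ttl"],
                "requires": []},
}


def install_order(catalog, target):
    """Return the packages needed for target, dependencies first."""
    # Collect the transitive closure of requirements starting from target.
    needed = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed[name] = catalog[name]["requires"]
        stack.extend(needed[name])
    # TopologicalSorter takes a mapping of node -> predecessors and
    # raises graphlib.CycleError if the requirements are circular.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    for name in install_order(CATALOG, "dataset-x"):
        print(name, CATALOG[name]["version"], CATALOG[name]["files"])
```

A local-copy tool would then download and load each package's files in that order, so that every vocabulary is in place before the data that depends on it.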

  8. Existing relevant (?) standards
     • VoID: Metadata standard for RDF datasets
     • DCAT: Upcoming W3C standard for data catalogs
     • PROV: W3C standard for provenance
     • DDI Discovery Vocabulary (Disco): Used by data archives to document statistical microdata, survey data, etc.
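As a rough illustration of how these vocabularies could be combined, the sketch below writes a small Turtle description of one dataset version. It uses VoID terms (`void:Dataset`, `void:dataDump`, `void:vocabulary`), Dublin Core's `dcterms:title`, and PROV's `prov:wasRevisionOf` to point at the previous version. The helper name `void_description` and all example IRIs are hypothetical, and building Turtle by string formatting is a shortcut here, not a recommendation.

```python
def void_description(dataset_iri, title, dump_url, vocabularies, previous_version=None):
    """Return a Turtle document describing one dataset version."""
    # Minimal escaping for the string literal; IRIs are assumed to be valid as given.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "",
        f"<{dataset_iri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        f"    void:dataDump <{dump_url}> ;",
    ]
    # One void:vocabulary statement per vocabulary the dataset uses.
    lines += [f"    void:vocabulary <{v}> ;" for v in vocabularies]
    if previous_version:
        lines.append(f"    prov:wasRevisionOf <{previous_version}> ;")
    # Terminate the final statement with '.' instead of ';'.
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(void_description(
        "http://example.org/dataset-x/2.0",
        "Dataset X",
        "http://example.org/dumps/dataset-x-2.0.nt",
        ["http://xmlns.com/foaf/0.1/"],
        previous_version="http://example.org/dataset-x/1.0",
    ))
```

Shipping such a description inside the dump's package keeps the link between data, vocabularies, and earlier versions explicit even if the original site goes offline.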

  9. Summary
     • The most important repository for LD preservation will be one that versions vocabularies
     • Focus on bulk RDF (dumps, not SPARQL endpoints or deref URI crawling)
     • Work towards good practices for making data self-describing and for metadata?
     • Work towards standards and good practices for packaging, versioning, dependencies?
     • Use existing standards: VoID, DCAT, PROV, Disco
     • Preservation across time…
     • But also preservation across space and communities
