
Preservation Through Evolution Management: The DIACHRON Approach

This presentation discusses the preservation of data through change detection and an understanding of how the data evolves. It covers the challenges and architecture of the DIACHRON system, shows how changes are detected in a pilot dataset, explains how changes are represented, and closes with a summary of the DIACHRON approach.


Presentation Transcript


  1. Preservation Through Evolution Management: The DIACHRON Approach • DIACHRON Final Dissemination Workshop, 24.03.2016 • Giorgos Flouris (FORTH), fgeo@ics.forth.gr

  2. Preservation and Evolution Management • Two sides of the same coin • Understanding evolution allows preservation • Preservation through change detection • Terminology changes (e.g., Yugoslavia) • Modelling changes (e.g., Pluto is a Planet) • Trace back our understanding at a given point in time, by “reverse engineering” changes • Equivalent to keeping the old versions, but: • Cheaper (in terms of space) • Helps understand (not just access) older versions

  3. DIACHRON Architecture

  4. Change Detection

  5. Change Detection Challenges • Change detection for evolution management • Identifying changes between versions • Challenges • Going beyond simple “delta” solutions • High-level deltas • More intuitive lists of changes • Without loss of formal rigor

  6. Additional challenges in DIACHRON • Change detection challenges (in DIACHRON) • Diverse data models • Dynamic datasets • Recoverable versions • Changes as first-class citizens • Cross-snapshot queries

  7. Change Detection in DIACHRON • [Diagram labels: Pilot dataset Version 1, Pilot dataset Version 2, DIACHRON, Change]

  8. Defining Changes: Layers • Low-level: universal • Simple: model-specific • Complex: user-specific

  9. Change Hierarchy: Low-level (1/3) • Low-level changes • DIACHRON model, for internal use • Fixed: Add, Delete • Just additions and deletions of triples • Simple set difference
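The "simple set difference" behind low-level changes can be sketched in a few lines. This is a minimal illustration, not the DIACHRON implementation: triples are modelled as plain Python tuples, and the function name `low_level_delta` and the `ex:` identifiers are assumptions made for the example.

```python
from typing import Set, Tuple

# Assumption for illustration: a triple is a (subject, predicate, object) tuple.
Triple = Tuple[str, str, str]


def low_level_delta(old: Set[Triple], new: Set[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, deleted) triples between two versions via set difference."""
    added = new - old      # triples present only in the new version
    deleted = old - new    # triples present only in the old version
    return added, deleted


# Hypothetical two-version dataset.
v1 = {("ex:Pluto", "rdf:type", "ex:Planet"), ("ex:Mars", "rdf:type", "ex:Planet")}
v2 = {("ex:Pluto", "rdf:type", "ex:DwarfPlanet"), ("ex:Mars", "rdf:type", "ex:Planet")}

added, deleted = low_level_delta(v1, v2)
```

Unchanged triples (here the one about `ex:Mars`) appear in neither set, which is why storing the delta can be cheaper than storing both versions.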

  10. Change Hierarchy: Simple (2/3) • Pilot terminology: • Add_SuperClass, Add_Dimension • Fixed, pre-defined • Composed of low-level changes • Partitioning is perfect • Complete and unambiguous
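One way to read "complete and unambiguous" is that every low-level change ends up in exactly one simple change. The sketch below illustrates that property only; the predicate-to-change mapping and the catch-all name `Add_Property_Instance` are hypothetical, and the real simple changes are defined by SPARQL queries rather than a lookup table.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping from predicate to a simple change name.
SIMPLE_CHANGE_BY_PREDICATE = {
    "rdfs:subClassOf": "Add_SuperClass",
    "rdfs:label": "Add_Label",
}
FALLBACK = "Add_Property_Instance"  # hypothetical catch-all, keeps the grouping complete


def group_additions(added: Set[Triple]) -> Dict[str, List[Triple]]:
    """Assign every added triple to exactly one simple change type."""
    groups: Dict[str, List[Triple]] = {}
    for triple in sorted(added):
        name = SIMPLE_CHANGE_BY_PREDICATE.get(triple[1], FALLBACK)
        groups.setdefault(name, []).append(triple)
    return groups


def is_partition(added: Set[Triple], groups: Dict[str, List[Triple]]) -> bool:
    """True if the groups cover all triples (complete) with no overlap (unambiguous)."""
    flat = [t for triples in groups.values() for t in triples]
    return len(flat) == len(set(flat)) and set(flat) == added


added = {
    ("ex:A", "rdfs:subClassOf", "ex:B"),
    ("ex:A", "rdfs:label", "a"),
    ("ex:A", "ex:p", "ex:C"),
}
groups = group_additions(added)
```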

  11. Change Hierarchy: Complex (3/3) • Pilot terminology: • Add_Synonym, Mark_As_Obsolete • Totally custom, pilot-specific (defined at run-time)

  12. Detecting Changes • Based on SPARQL queries

      Mark_As_Obsolete (complex):

      INSERT INTO <changesOntology> {
        ?mao a co:Mark_As_Obsolete;
             co:mao_p1 ?a;
             co:mao_p2 ?x;
             co:consumes ?asc;
             co:consumes ?al.
      }
      WHERE {
        GRAPH <changesOntology> {
          ?asc a co:Add_Superclass;
               co:asc_p1 ?asc1;
               co:asc_p2 ?asc2.
          FILTER NOT EXISTS { ?mao co:consumes ?asc. }.
          FILTER (?asc2 = <http://www.geneontology.org/formats/oboInOwl#ObsoleteClass>).
          BIND(?asc1 as ?a).
          OPTIONAL {
            ?al a co:Add_Label;
                co:al_p1 ?al1;
                co:al_p2 ?al2.
            FILTER NOT EXISTS { ?mao co:consumes ?al. }.
            FILTER(?al1 = ?asc1).
            FILTER(regex(str(?al2), 'obsolete_')).
          }
          BIND(concat(str(?a), str(?x)) as ?url) .
          filter ('v1'=?v1).
          filter ('v2'=?v2).
          BIND(IRI(CONCAT('http://mao/',SHA1(?url))) AS ?mao).
        }
      }

      Add_SuperClass (simple):

      INSERT INTO <changesOntology> {
        ?asc a co:Add_Superclass;
             co:asc_p1 ?a;
             co:asc_p2 ?b.
      }
      WHERE {
        GRAPH <v2> {
          ?r diachron:subject ?a;
             diachron:hasRecordAttribute ?ratt.
          ?ratt diachron:predicate rdfs:subClassOf;
                diachron:object ?b.
        }
        FILTER NOT EXISTS {
          GRAPH <v1> {
            ?r diachron:hasRecordAttribute ?ratt.
            ?ratt diachron:predicate rdfs:subClassOf;
                  diachron:object ?b.
          }
        }
        FILTER NOT EXISTS {
          GRAPH <assoc> {
            {?assoc1 co:new_value ?a.} UNION {?assoc2 co:new_value ?b.}
          }
        }
        BIND(IRI('v1') as ?v1).
        BIND(IRI('v2') as ?v2).
        BIND(concat(str(?a), str(?b), str(?v1), str(?v2)) as ?url) .
        BIND(IRI(CONCAT('http://asc/',SHA1(?url))) AS ?asc).
      }
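The logic of the Mark_As_Obsolete query can be mirrored in memory to make the "consumes" idea easier to follow. This is a simplified sketch, not a translation of the SPARQL: the classes `SimpleChange` and `MarkAsObsolete` and the function `detect_mark_as_obsolete` are hypothetical names, version filters are ignored, and all matching labels are collected into one complex change rather than producing one row per label as an OPTIONAL pattern would.

```python
from dataclasses import dataclass
from typing import List

OBSOLETE = "http://www.geneontology.org/formats/oboInOwl#ObsoleteClass"


@dataclass(frozen=True)
class SimpleChange:
    kind: str  # "Add_Superclass" or "Add_Label"
    p1: str    # subject the change applies to
    p2: str    # new superclass, or new label text


@dataclass
class MarkAsObsolete:
    subject: str
    consumes: List[SimpleChange]


def detect_mark_as_obsolete(changes: List[SimpleChange]) -> List[MarkAsObsolete]:
    """Build complex changes from simple ones that are not yet consumed."""
    consumed = set()
    result = []
    for asc in changes:
        # Required part: a new superclass equal to ObsoleteClass.
        if asc.kind != "Add_Superclass" or asc.p2 != OBSOLETE or asc in consumed:
            continue
        parts = [asc]
        # Optional part: labels on the same subject containing 'obsolete_'.
        for al in changes:
            if (al.kind == "Add_Label" and al.p1 == asc.p1
                    and "obsolete_" in al.p2 and al not in consumed):
                parts.append(al)
        consumed.update(parts)
        result.append(MarkAsObsolete(asc.p1, parts))
    return result


# Hypothetical simple changes detected between two versions.
simple_changes = [
    SimpleChange("Add_Superclass", "ex:T1", OBSOLETE),
    SimpleChange("Add_Label", "ex:T1", "obsolete_foo"),
    SimpleChange("Add_Superclass", "ex:T2", "ex:Other"),
]
complex_changes = detect_mark_as_obsolete(simple_changes)
```

The third simple change is left unconsumed, so it would still be reported as a plain Add_Superclass.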

  13. Representing Changes: Motivation • Interesting motivating query • Return all countries for which the unemployment rate of their capital city increased faster than the average increase of the country as a whole, in the last 5 versions • Requires • Access to both the changes and the data • Access to multiple versions • Changes are first-class citizens • Necessary for preservation

  14. Representing Changes: Ontology • [Diagram labels] DIACHRON, Change, Data • Schema level (D/changes/App1/schema): Simple_Change (Add SuperClass, …), Complex_Change (Mark as Obsolete, …), sparql_info (INSERT …) • Data level (D/changes/v1-v2): SC1, asc_p1 EFO_001927, asc_p2 ObsoleteClass

  15. Putting it All Together • DIACHRON data model contains all versions as well as changes • In a compact form (ontology of changes) • Detection based on SPARQL queries • Provided at deployment time (for simple) • Generated at creation time (for complex) • Recoverability • Allows moving back and forth between versions (important for preservation, and also for archiving)
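Recoverability can be illustrated with the low-level delta alone: given one version plus the added and deleted triples, the neighbouring version can be rebuilt in either direction. The sketch below assumes triples are plain tuples and that the delta was computed by set difference; `apply_delta` and `revert_delta` are hypothetical helper names.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_delta(version: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Move forward: drop the deleted triples, then add the new ones."""
    return (version - deleted) | added


def revert_delta(version: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Move backward: drop the added triples, then restore the deleted ones."""
    return (version - added) | deleted


# Hypothetical versions and their delta.
v1 = {("ex:A", "rdfs:subClassOf", "ex:B"), ("ex:A", "rdfs:label", "a")}
v2 = {("ex:A", "rdfs:subClassOf", "ex:C"), ("ex:A", "rdfs:label", "a")}
added, deleted = v2 - v1, v1 - v2

rebuilt_v2 = apply_delta(v1, added, deleted)
rebuilt_v1 = revert_delta(v2, added, deleted)
```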

  16. Summary of Changes • Problem • Lots of changes in a single version pair • Look at only a subset of the delta • Need for more intuitive deltas • Solution • Pinpoint locations in the ontology where “important” changes happened • Assessment strategies for “change summaries” • Number of changes, change of centrality/relevance, importance of position, hybrid strategies
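The simplest of the assessment strategies listed, the number of changes, can be sketched as a count per affected entity. This is an illustration under assumptions: each change is reduced to a (change name, subject) pair, and `summarize_by_count` is a hypothetical helper; the centrality, position, and hybrid strategies are not covered.

```python
from collections import Counter
from typing import List, Tuple


def summarize_by_count(changes: List[Tuple[str, str]], k: int) -> List[Tuple[str, int]]:
    """Return the k subjects touched by the most changes, with their counts."""
    counts = Counter(subject for _, subject in changes)
    return counts.most_common(k)


# Hypothetical (change name, subject) pairs from one version pair.
changes = [
    ("Add_SuperClass", "ex:A"),
    ("Add_Label", "ex:A"),
    ("Add_Label", "ex:B"),
    ("Add_Label", "ex:A"),
    ("Add_SuperClass", "ex:C"),
    ("Add_Label", "ex:C"),
]
hotspots = summarize_by_count(changes, 2)
```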

  17. D2V Demo • D2V tool for: • Creating and managing complex changes • Visualizing the evolution history of a dataset • Demonstration video • https://www.youtube.com/watch?v=oY7qBBfcHYg • http://www.diachron-fp7.eu/videos.html • Online (live) demo • http://www.diachron-fp7.eu/demos.html

  18. Conclusion • Main DIACHRON message • (Linked) data preservation is related to evolution management • DIACHRON challenges • Diverse data models • Dynamic datasets • Recoverable versions • Changes as first-class citizens • Cross-snapshot queries • Solutions • DIACHRON data model (#1) • Appropriate change definition and detection (#2, #3) • Changes and data represented at the same level (#4, #5) • Work with high potential (e.g., summaries)
