
High-Level Change Detection in the Semantic Web

Giorgos Flouris (fgeo@ics.forth.gr), Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece. Joint work with Vicky Papavassiliou, Irini Fundulaki, Dimitris Kotzinos and Vassilis Christophides.


Presentation Transcript


  1. High-Level Change Detection in the Semantic Web • Giorgos Flouris (fgeo@ics.forth.gr) • Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece • Joint work with: Vicky Papavassiliou, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides

  2. World Wide Web • WWW (and HTML) focus on human readability • Page presentation (fonts, colors, images, …) • Human understanding • Presentation ≠ semantic content • Content is not formally described (for a machine to understand) • WWW contains documents, not data

  3. Problems with Current Web • Search and access become difficult • Software is ignorant of the semantic content of a web page • Keyword search • High recall, low precision • Terminological issues • Synonyms (heart disease = cardiac disease) • Hyponyms/hypernyms (parliament members are politicians) • Queries on the semantic content cannot be made • Fetch articles that support B. Obama’s foreign policy • Fetch the home pages of all members of the Greek Parliament

  4. Semantic Web • The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee et al., 2001) • The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (http://www.w3.org/2001/sw/) • [Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners (http://www.w3.org/2001/sw/)

  5. Semantic Web in Practice • Web of data, rather than documents • HTML for presentation • Semantic languages for semantic content • Readable and understandable by humans and machines • Semantic Web languages, protocols, etc. • Web page annotation (metadata descriptions etc.) • Publication of data on the Internet • Efficient communication and manipulation of data over the Internet • Different applications • Efficient searching • Sharing of data (e-science, e-government, remote learning, …)

  6. Ontologies • Backbone of the Semantic Web • Ontologies allow the description of data • Annotation and metadata regarding web pages • Terminological relations (synonyms, hyponyms, …) • Communication and description of data, ideas, beliefs • An ontology is an explicit specification of a shared conceptualization of a domain (Gruber, 1993) • Precise, logical account of the intended meaning of terms, data structures etc. • Common (shared) interpretation of terms • Formal vocabulary for information exchange (for humans and machines)

  7. Ontologies in Practice • Basic structures: • Classes (or concepts): collections of objects (e.g., Actor, Politician) • Properties (or roles): binary relationships between objects (e.g., started_on, member_of) • Instances (or individuals): objects (e.g., Giorgos, B. Obama) • Relations between them • Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), … • The allowed relations and their semantics depend on the language • Different representation languages for ontologies • RDF, RDFS, DAML+OIL, OWL, OWL-DL, OWL-Lite, OWL2, DLs, … • Usually triple-based

  8. Visualization, Triples, Serialization • Visualization: [Figure: graph with classes Period, Event, Onset, Birth, Actor, Existing, Stuff; properties participants and started_on; individuals G_Birth and Giorgos; edges for subsumption and instantiation] • Triple representation: • Define classes: [Period type Class] • Define properties: [participants type Property] [participants domain Onset] [participants range Actor] • Instantiate/define individuals: [G_Birth type Birth] [Giorgos type Actor] [G_Birth participants Giorgos] • Define hierarchies: [Event subClass Period] • Serialization (RDF/XML): <rdfs:Class rdf:ID="Period"> </rdfs:Class> <rdf:Property rdf:ID="participants"> <rdfs:domain rdf:resource="Onset"/> <rdfs:range rdf:resource="Actor"/> </rdf:Property> <Birth rdf:about="G_Birth"> <participants> <Actor rdf:about="Giorgos"/> </participants> </Birth> <rdfs:Class rdf:ID="Event"> <rdfs:subClassOf rdf:resource="Period"/> </rdfs:Class>

  9. Ontology Dynamics • Ontologies change constantly • World changes (dynamic models) • View on the world changes (new knowledge, measurements, etc.) • Perspective and usage changes • Example: the Gene Ontology (GO) changes daily • GO: information about gene products (biology) • Must find a way to cope with changes • Ontology evolution (modify an ontology in response to a change) • Ontology versioning (keep track of versions and their relations) • … • We deal with a peripheral problem (change detection)

  10. What is Change? • [Figure: Real World, Ontology Evolution Algorithm, Ontology; example changes: Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]

  11. What is Change Detection? • [Figure: Real World, Change Detection Algorithm, Ontology; example detected changes: Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]

  12. Keeping Track of Changes • [Figure: versions V1 to V5 connected by the changes C1 to C4] • Purpose of this work: change detection • A posteriori detect the differences (delta or diff) between versions in a concise, intuitive and correct way • It is important to store the changes between versions • Visualization of differences • Efficient storage and/or communication • Evolution history • Record changes as they happen (manual or automatic) • Error-prone, difficult (often impossible)

  13. Sample Evolution • [Figure: Version 1 (V1) and Version 2 (V2) drawn as graphs with subsumption and instantiation edges, linked by an arrow labeled Evolution. V1: classes Period, Event, Onset, Birth, Actor, Existing, Stuff; properties participants, started_on; individuals G_Birth, Giorgos. V2: classes Event, Onset, Birth, Actor, Persistent, Stuff; properties participants, started_on; individuals G_Birth, Giorgos]

  14. Analyzing the Evolution (Using Triples) • Triples in V1 (partial list): [Event type Class] [Period type Class] [Event subclass Period] [participants type Property] [participants domain Onset] [participants range Actor] [Giorgos type Actor] [Existing type Class] [Stuff subclass Existing] [started_on domain Existing] [Onset subclass Event] [Birth subclass Onset] … • Triples in V2 (partial list): [Event type Class] [participants type Property] [participants domain Event] [participants range Actor] [Giorgos type Actor] [Persistent type Class] [Stuff subclass Persistent] [started_on domain Persistent] [Onset subclass Event] [Birth subclass Event] …

  15. Low-Level Delta • Triples in V2 but not in V1 (added triples): [participants domain Event] [Persistent type Class] [Stuff subclass Persistent] [started_on domain Persistent] [Birth subclass Event] • Triples in V1 but not in V2 (deleted triples): [Period type Class] [Event subclass Period] [participants domain Onset] [Existing type Class] [Stuff subclass Existing] [started_on domain Existing] [Birth subclass Onset] • Low-level delta: Add([participants domain Event]), Add([Persistent type Class]), …, Del([Period type Class]), …
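
The low-level delta on this slide amounts to two set differences. A minimal Python sketch, assuming triples are encoded as (subject, predicate, object) tuples (an encoding chosen here for illustration, not taken from the slides) and writing the new domain triple as [participants domain Event], the form used in the deltas on later slides:

```python
# Triples as (subject, predicate, object) tuples; this encoding is an assumption.
V1 = {
    ("Event", "type", "Class"),
    ("Period", "type", "Class"),
    ("Event", "subclass", "Period"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Onset"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
    ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Onset"),
}
V2 = {
    ("Event", "type", "Class"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Event"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Event"),
}


def low_level_delta(v1, v2):
    """Return (added, deleted): triples only in v2, and triples only in v1."""
    return v2 - v1, v1 - v2


added, deleted = low_level_delta(V1, V2)
for s, p, o in sorted(added):
    print(f"Add([{s} {p} {o}])")
for s, p, o in sorted(deleted):
    print(f"Del([{s} {p} {o}])")
```

On the partial triple lists of the running example this yields the five added and seven deleted triples listed on the slide.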

  16. Analyzing the Evolution (Visually) • [Figure: the Version 1 (V1) and Version 2 (V2) graphs of the sample evolution, side by side] • High-level delta: Generalize_Domain(participants, Onset, Event); Pull_Up_Class(Birth, Onset, Event); Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø); Rename_Class(Existing, Persistent)

  17. Comparing the Deltas • [Figure: the Version 1 (V1) and Version 2 (V2) graphs of the sample evolution, side by side] • Low-level delta: Del([participants domain Onset]), Add([participants domain Event]), Del([Period type Class]), Del([Event subclass Period]), Del([Birth subclass Onset]), Add([Birth subclass Event]) • High-level delta: Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø); Generalize_Domain(participants, Onset, Event); Pull_Up_Class(Birth, Onset, Event)

  18. Associations (Partitioning)

  19. Low-Level Versus High-Level Deltas • Purpose: • A posteriori detect the differences (delta or diff) between versions in a concise, intuitive and correct way • Low-level deltas • Easier to get • High-level deltas • More concise (e.g., Rename_Class) • More intuitive (e.g., Pull_Up_Class) • Carry additional information (e.g., Generalize_Domain) • Objective: detection of high-level deltas

  20. Language of Changes and Algorithm • Deltas based on some language of changes • A set of formal definitions that describe the changes that can be understood and detected • Can be high-level or low-level • Must be coupled with a corresponding detection algorithm • Low-level languages easy to define (Add(t), Del(t)) • High-level languages more complicated • Several proposals; no standard • Challenges for high-level languages • Must be deterministic (exactly one high-level delta) • Must be fine-grained enough to capture subtle changes • Must be coarse-grained enough to be concise

  21. Proposed Language L • The formal definition of a change consists of: • Changes required in the low-level delta (added/deleted triples) • Conditions that should hold in V1 and/or V2 • Example: Generalize_Domain(P, X, Y) • Required low-level changes: Del([P domain X]), Add([P domain Y]) • Conditions: P is an existing property in both V1 and V2; X and Y are existing classes in both V1 and V2; X is a subclass of Y in both V1 and V2 • In the running example, Generalize_Domain(participants, Onset, Event) is detectable • Similarly for the other changes in L (about 120 in total)
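
The definition of Generalize_Domain can be read as a pattern over the low-level delta plus conditions on both versions. The following Python sketch is one possible reading, not the authors' implementation. It assumes triples are (subject, predicate, object) tuples, checks only direct subclass triples (real RDF/S subsumption is transitive), and adds the triples [Onset type Class] and [started_on type Property], which do not appear in the partial lists of the slides:

```python
def holds_in_both(triple, v1, v2):
    """True if the triple is present in both versions."""
    return triple in v1 and triple in v2


def detect_generalize_domain(v1, v2):
    """Find all (P, X, Y) matching the definition of Generalize_Domain(P, X, Y)."""
    added, deleted = v2 - v1, v1 - v2
    detected = []
    for p, pred_del, x in deleted:
        if pred_del != "domain":
            continue
        for q, pred_add, y in added:
            if pred_add != "domain" or q != p:
                continue
            conditions = [
                (p, "type", "Property"),  # P is a property in V1 and V2
                (x, "type", "Class"),     # X is a class in V1 and V2
                (y, "type", "Class"),     # Y is a class in V1 and V2
                (x, "subclass", y),       # X subclass of Y (direct only: simplification)
            ]
            if all(holds_in_both(t, v1, v2) for t in conditions):
                detected.append(("Generalize_Domain", p, x, y))
    return sorted(detected)


# Hypothetical excerpt of the running example; Onset's class triple and
# started_on's property triple are assumed, not listed in the slides.
COMMON = {
    ("participants", "type", "Property"),
    ("started_on", "type", "Property"),
    ("Event", "type", "Class"),
    ("Onset", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
V1 = COMMON | {
    ("participants", "domain", "Onset"),
    ("Existing", "type", "Class"),
    ("started_on", "domain", "Existing"),
}
V2 = COMMON | {
    ("participants", "domain", "Event"),
    ("Persistent", "type", "Class"),
    ("started_on", "domain", "Persistent"),
}

result = detect_generalize_domain(V1, V2)
print(result)
```

The domain change of started_on is not reported because Existing and Persistent each exist in only one version, so the class conditions fail; the slides attribute that change to Rename_Class instead.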

  22. Results on L: Granularity • Granularity problem: solved by defining levels of changes • Basic Changes: fine-grained, roughly correspond to low-level • Composite Changes: coarse-grained, group several basic changes together • Heuristic Changes: based on heuristics, necessary for Rename, Merge, Split etc. • Problems with determinism • One evolution could correspond to different sets of basic/composite changes • Priorities in detection • Heuristic > Composite > Basic

  23. Results on L: Types of Changes • Low-Level changes: Add, Del • High-Level changes: • Basic (e.g., Delete_Subclass, Delete_Domain) • Composite (e.g., Pull_Up_Class, Change_Domain) • Heuristic (e.g., Rename_Class, Split_Class)

  24. Results on L: Determinism • Each low-level change is associated with exactly one detectable high-level change • Full partitioning of low-level changes into high-level ones • Each pair of versions (V1, V2) is associated with: • Exactly one low-level delta • Exactly one high-level delta • Determinism is necessary • More than one would lead to ambiguities • Fewer than one would make some inputs (V1, V2) irresolvable
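
The partitioning property can be checked mechanically: the sets of low-level changes associated with the detected high-level changes must be pairwise disjoint and together cover the whole low-level delta. A small Python sketch follows. The tuple encoding of changes is an assumption, and the grouping of triples under each high-level change is read off the running example rather than taken from the formal definitions:

```python
from itertools import combinations


def is_partition(low_level_delta, associations):
    """Check that the associated sets are disjoint and cover the low-level delta."""
    groups = list(associations.values())
    disjoint = all(a.isdisjoint(b) for a, b in combinations(groups, 2))
    covered = set().union(*groups) == low_level_delta
    return disjoint and covered


# Low-level changes as (operation, subject, predicate, object); assumed encoding.
ASSOCIATIONS = {
    "Rename_Class(Existing, Persistent)": {
        ("Del", "Existing", "type", "Class"),
        ("Del", "Stuff", "subclass", "Existing"),
        ("Del", "started_on", "domain", "Existing"),
        ("Add", "Persistent", "type", "Class"),
        ("Add", "Stuff", "subclass", "Persistent"),
        ("Add", "started_on", "domain", "Persistent"),
    },
    "Generalize_Domain(participants, Onset, Event)": {
        ("Del", "participants", "domain", "Onset"),
        ("Add", "participants", "domain", "Event"),
    },
    "Delete_Class(Period, ...)": {
        ("Del", "Period", "type", "Class"),
        ("Del", "Event", "subclass", "Period"),
    },
    "Pull_Up_Class(Birth, Onset, Event)": {
        ("Del", "Birth", "subclass", "Onset"),
        ("Add", "Birth", "subclass", "Event"),
    },
}
LOW_LEVEL_DELTA = set().union(*ASSOCIATIONS.values())

print(is_partition(LOW_LEVEL_DELTA, ASSOCIATIONS))
```

Dropping one high-level change leaves some low-level changes unexplained, and assigning one low-level change to two high-level changes creates an ambiguity. In both cases the check fails.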

  25. Results on L: Application • [Figure: the Version 1 (V1) and Version 2 (V2) graphs of the sample evolution, with arrows labeled Detect C, Apply C and Apply C⁻¹ between them]

  26. Results on L: Deltas Keep Version History • [Figure: versions V1 to V5 connected by the changes C1 to C4] • Can reproduce all versions as long as you keep (any) one version and the deltas • Deltas are more concise than the versions themselves • Storage and communication efficiency
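
The idea of keeping one version plus the deltas can be illustrated at the low level, where a delta is a pair of added and deleted triple sets and its inverse simply swaps the two. This Python sketch uses that simplification; the slides apply high-level deltas, and the function names and the three toy versions here are hypothetical:

```python
def diff(v1, v2):
    """Low-level delta from v1 to v2 as (added, deleted)."""
    return v2 - v1, v1 - v2


def apply_delta(version, delta):
    """Apply a delta: remove the deleted triples, then add the added ones."""
    added, deleted = delta
    return (version - deleted) | added


def invert(delta):
    """Inverse delta: what was added is deleted, and vice versa."""
    added, deleted = delta
    return deleted, added


def rebuild_history(stored_version, deltas):
    """Reproduce successive versions from one stored version and a list of deltas."""
    versions = [stored_version]
    for delta in deltas:
        versions.append(apply_delta(versions[-1], delta))
    return versions


# Toy history with three versions (hypothetical data).
VA = {("Birth", "subclass", "Onset"), ("Onset", "subclass", "Event")}
VB = {("Birth", "subclass", "Event"), ("Onset", "subclass", "Event")}
VC = VB | {("Persistent", "type", "Class")}
deltas = [diff(VA, VB), diff(VB, VC)]

# Forward: keep only the first version and the deltas.
forward = rebuild_history(VA, deltas)
# Backward: keep only the last version and apply the inverse deltas in reverse order.
backward = rebuild_history(VC, [invert(d) for d in reversed(deltas)])
print(forward == [VA, VB, VC], backward == [VC, VB, VA])
```

Either end of the history, or any version in between, is enough as a starting point, which is the sense in which the slide says any one version can be kept.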

  27. Detection Algorithm for L (1/2) • Triples in V1 (partial list): [Period type Class] [Event subclass Period] [participants type Property] [participants domain Onset] [participants range Actor] [Existing type Class] [Stuff subclass Existing] [started_on domain Existing] [Onset subclass Event] … • Triples in V2 (partial list): [Event type Class] [participants type Property] [participants domain Event] [participants range Actor] [Giorgos type Actor] [Persistent type Class] [Stuff subclass Persistent] [started_on domain Persistent] [Onset subclass Event] [Birth subclass Event] … • Calculate low-level delta • Triples in delta (step 1: low-level): Del([participants domain Onset]) Del([Birth subclass Onset]) Del([Event subclass Period]) Del([Existing type Class]) Del([Stuff subclass Existing]) Del([started_on domain Existing]) Del([Period type Class]) Add([Birth subclass Event]) Add([participants domain Event]) Add([Persistent type Class]) Add([Stuff subclass Persistent]) Add([started_on domain Persistent]) • Run matcher (external) • List of mappings: <V1:Existing> is matched with <V2:Persistent> • Compute heuristic changes • Heuristic changes: Rename_Class(Existing, Persistent)

  28. Detection Algorithm for L (2/2) • Triples in V1 and V2 (partial lists): as in part 1/2 • Delta (step 2: heuristic): Del([participants domain Onset]) Del([Birth subclass Onset]) Del([Event subclass Period]) Del([Period type Class]) Add([Birth subclass Event]) Add([participants domain Event]); detected: Rename_Class(Existing, Persistent) • Find associated change for Del([participants domain Onset]): Generalize_Domain(participants, Onset, Event) is DETECTABLE • Delta (step 3: basic and composite): Del([Birth subclass Onset]) Del([Event subclass Period]) Del([Period type Class]) Add([Birth subclass Event]); detected: Rename_Class(Existing, Persistent), Generalize_Domain(participants, Onset, Event) • Delta (step 4: result): Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø), Pull_Up_Class(Birth, Onset, Event), Rename_Class(Existing, Persistent), Generalize_Domain(participants, Onset, Event)

  29. Find Associated Change • Candidate operations and the outcome for each: • Pull_Up_Class(*,*,*): not in the table • Delete_Property(participants,*,*): necessary triples not found • Specialize_Domain(participants, Onset, Event): conditions not true • Generalize_Domain(participants, Onset, Birth): wrong parameter (triples not found) • Generalize_Domain(participants, Onset, Event): DETECTABLE (ASSOCIATED) • Delete_Domain(participants, Onset): rejected, composite changes have priority

  30. Implementation • Algorithm implemented for experiments and evaluation • Uses the APIs of SWKM • Platform for efficient and scalable management of dynamic RDF/S ontologies and data • Query, update, low-level delta, high-level delta, versioning, …

  31. Performance • Complexity: O(max{N1, N2, N²}) • Linear in the average case • Highly dependent on the detected changes (type, number)

  32. Evaluation: Usefulness and Intuitiveness • L is well-defined (changes used in practice) • GO: add/delete class, comments changing • CIDOC: add/delete/rename properties • Results confirmed by literature/editor notes

  33. Evaluation: Conciseness • Basic ≈ Low-Level • Basic+Composite+Heuristic << Low-Level

  34. Manual Change Recording (CIDOC) • Editor notes: Delete class: 3; Add property: 54; Delete property: 16; Rename property: 24; Redirect properties (domain): 14; Redirect properties (range): 14 • Detection result: Delete class: 6; Add property: 58; Delete property: 18; Rename property: 30; Generalize_Domain: 13; Specialize_Domain: 1; Generalize_Range: 14; Specialize_Range: 1; Change_Range: 1

  35. Conclusion • High-level change detection • A posteriori detection (input: V1, V2) • No further information needed (e.g., logs, change recording etc.) • Formal semantics • Formal results (reversibility, determinism, …) • Not heuristic-based (except for heuristic changes) • No need for precision and recall evaluation • Efficient, sound and complete detection algorithm • Nice informal properties • Conciseness, intuitiveness • Future work: more operations, evaluation on other datasets, evaluation with real users

  36. References • Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. On Detecting High-Level Changes in RDF/S KBs. In Proceedings of the 8th International Semantic Web Conference (ISWC-09), to appear, 2009 • Vicky Papavassiliou, Giorgos Flouris, Irini Fundulaki, Dimitris Kotzinos, Vassilis Christophides. Formalizing High-Level Change Detection for RDF/S KBs. Technical Report TR-398, FORTH-ICS, 2009 • Thank You
