1 / 35

Building scalable semantic PDMS: the SomeWhere approach.

Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué, Gia-Hien Nguyen, Laurent Simon. Focus of this talk. How to make semantic approaches scalable to the web ?.

Télécharger la présentation

Building scalable semantic PDMS: the SomeWhere approach.

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Building scalable semantic PDMS: the SomeWhere approach. Marie-Christine Rousset Joint work with Philippe Adjiman, Philippe Chatalic, François Goasdoué, Gia-Hien Nguyen, Laurent Simon

  2. Focus of this talk How to make semantic approaches scalable to the web ? • A data centered vision of the Semantic Web • viewed as a huge semantic and distributed data management system • SomeWhere • a peer to peer infrastructure • based on simple personalized ontologies and mappings distributed at large scale

  3. P2P Data Management Systems • Logical network of peers (≠ physical network) • each peer is characterized by • its physical address (IP) • a description of the stored resources • its neighbors in the network • the peers to which it can transmit messages (queries,...) • Various topologies • random and dynamic (Gnutella) • fixed (Chord, Hypercube) • guided by the semantics • SON, Edutella, Piazza, DRAGO, coDB, Somewhere

  4. SomeWhere logical networks • The topology is not fixed • Guided by mappings • A peer • joins by declaring mappings between its ontology and the ontologies of some peers that it knows • leaves by removing the mappings with its acquaintances in the network

  5. SomeWhere in a nutshell • Simple data model based on a propositional language of classes • for defining ontologies, mappings, and queries • a sublanguage of OWL DL (W3C) • Scales up to one thousand peers • logical network : « small world »

  6. A0 A1 A2 A3 A4 A5 St_A5 St_A3 Ontology: hierarchy of intentional classes Storage description: extensional classes More complex inclusion statement: St_A1 A1¬A2 SomeWhere Data Model Schema+Data Data Data

  7. Queries: Logical combination of class literals: A1 ¬A3 Mappings: Q1 Q2 A1 B3 B3 A1 A A1 A2 A3 A1 (A2 A3) B1 ¬B3 B B1 B2 Q B3 SomeWhere Data Model

  8. Semantics • Standard FO logical semantics • one single domain of interpretation • a distributed set of formulas interpreted in the same way as if they were not distributed • in contrast with some other approaches • coDB: epistemic logic • DRAGO: distributed semantics of DDL or DFOL based on a collection of domains of interpretations • Our assumption: • the objects have a unique URI • objects stored at different peers and having the same URI are interpreted as being the same

  9. Rock Pop Pop_Rock Radio Mouv Nostalgie Music St_MouvSt_Nostalgie Pop_Rock Classical Tchaikovsky Tchai St_Pop_Rock Ru It St_Ru Tchai Tchai Tchaikovsky St_Tchai Data model: example P1 Musique P2 Rock Pop Classique Mouv Rock Français US St_pop Tchaikovsky St_FrançaisSt_USSt_Tchaikovsky

  10. Mouv Rock Mouv? Rock ? Pop? Rock Pop Pop_Rock Radio Mouv Nostalgie St_MouvSt_Nostalgie Tchaikovsky Tchai St_FrancaisSt_Pop St_USSt_Pop Tchai Tchaikovsky St_Mouv St_Pop Pop_Rock ? Query answering : illustration P1 Musique P2 Rock Pop Classique Français US St_pop Tchaikovsky St_FrançaisSt_USSt_Tchaikovsky St_Mouv St_Mouv St_Français St_Français St_US St_US St_Pop St_Pop St_Pop St_Pop Music Rewritings Pop_Rock Classical St_Pop_Rock St_Pop_Rock Ru It St_Ru Tchai St_Tchai

  11. Query answering in SomeWhere • Decomposition of queries/recombination of answers • only atomic queries are transmitted to peers • a complex query is splitted into atomic queries • each solicited peer processes a given atomic query q and incrementally sends back intentional answers for it • (conjunction of) extensional classes that are rewritings of q • intentional answers of different atomic queries resulting from the split of a complex query must be recombined • intentional answers can combine extensional classes of different peers • Can be reduced to a consequence finding problem in distributed clausal propositional theories • Ontologies and mappings are encoded as clauses • The maximal conjunctive rewritings of a conjunctive query Q correspond exactly to the negation of the clauses that are proper prime implicates of the negation of Q w.r.t the union of the local theories and the mappings

  12. Query answering algorithm • Message based local algorithm running on each peer • query, answer, and termination messages • Global properties • soundness • completeness • termination (even for cyclic networks)

  13. http://www.lri.fr/~adjiman/somewhere/ Flash demo of the SomeWhere extension

  14. MyBookmarks DVD Action Suspense Thriller Animation Action Suspense P5 P4 P2 P1 P6 P3 MyBookmarks MyBookmarks Animation Cartoons BruceWillis Action Comedy Humor DVD Movies Humor Comedy Humor Comedy Romance Thriller Cartoons Adult Friends Humor BenStiller Comedy Drama Thriller MyBookmarks MyBookmarks IMDB DramaComedy Drama MyBookmarks Actors Drama Comedy TVShows Genre BruceWillis BenStiller JuliaRoberts Drama Fantastic Gore Friends Alias

  15. Action Suspense Thriller P3 P1 P2 P5 P6 P4 BruceWillis Action Animation Cartoons Comedy Humor Humor Comedy Drama Thriller DramaComedy Drama Classes extensions Friends Humor BenStiller Comedy

  16. Q1: Thriller ? Q2: NOT Adult ? Q3: Thriller AND Comedy ? MyBookmarks DVD Animation Action Suspense P4 P6 P5 P3 P1 MyBookmarks MyBookmarks Animation Cartoons BruceWillis Action Comedy Humor Movies DVD Humor Comedy Comedy Humor Thriller Cartoons Adult Romance Drama Thriller MyBookmarks P2 MyBookmarks IMDB DramaComedy Drama MyBookmarks Actors Drama Comedy TVShows Genre BruceWillis BenStiller JuliaRoberts Drama Fantastic Gore Friends Alias Action Suspense Thriller Friends Humor BenStiller Comedy

  17. Thriller ? Rewritings: MyBookmarks P3:Thriller P1:Action P1:Suspense P5:Drama P6:DramaComedy P2:BruceWillis P1:Suspense DVD Animation Action Suspense P3 P1 P2 P6 P4 P5 MyBookmarks MyBookmarks BruceWillis Action Animation Cartoons Comedy Humor Movies DVD Humor Comedy Comedy Humor Thriller Cartoons Adult Romance Drama Thriller MyBookmarks MyBookmarks IMDB DramaComedy Drama MyBookmarks Actors Drama Comedy TVShows Genre BruceWillis BenStiller JuliaRoberts Drama Fantastic Gore Friends Alias Action Suspense Thriller Friends Humor BenStiller Comedy

  18. Navigational Navigational P3 P1 P5 P6 P5 Navigational Rewritings of Thriller: evaluation Local P3:Thriller P1:Action P1:Suspence P5:Drama P6:DramaComedy P2:BruceWillis P1:Suspense Integration

  19. 1 machine N peers N machines K peers per machine N machines 1 peer per machine SomeWhere infrastructure

  20. Zoom on one machine SomeWhere infrastructure 100 % JAVA 1.5 somewhere.jar ~ 250 Ko

  21. Scalability experiments [IJCAI 05] • on randomly generated networks • 1000 peers deployed on a cluster of 75 machines • small world topology • Close to the topology of the web • peers • ontologies • random clauses of length 2 • mappings • random clauses of length 2 or 3

  22. Varying topologies 1000 peers Ring, 10 neighbours/peer P = 0.01 P = 0.1 P = 1 Small world Random graph P: probability of redirecting an edge Model of Watts and Strogatz

  23. Scalability results Varying parameters • Number of mappings between peers • complexity of mappings • ratio of clauses of length 3 (0%, 20%, 100%) timeout : 30 s/query Depth of query processing • Small depth (less than 7) even on the hard cases Time to produce a number of answers • In 90% cases, the first answer is produced within 2 seconds • Easy cases (simple mappings): • few answers per query (5 on average) • very fast (less than 0.1s) to compute all the answers without timeouts • Hard cases (complex and more mappings per edge) • around 1000 answers per query (but > 30% queries not complete : timeouts) • quite fast to obtain them (less than 20s)

  24. MuseumName http://www.louvre.fr "Le Louvre" Located CityName " Paris" http://www.paris.fr Ongoing work (1) • Extending the data model to RDF(S) • W3C recommendation for describing web resources • Classes and (binary) relations between objects • each object is identified by a URI Triple notation: <resource, property, value> Relational notation: property(resource, value)

  25. CulturalPlace Is-a Contains MadeBy MuseumName Literal Work Artist Museum WorkName Located ArtistName Is-a Is-a Literal Literal CityName Literal City ModernMuseum ArcheologyMuseum RDFS

  26. SomeRDFS: data model a simple fragment of RDFS distributed throughsimplemappings (using the same constructors)

  27. Query rewriting • Propositionalisation of RDFS statements • Query rewriting using SomeWhere C1dom  C2dom C1range C2range P1rel P2rel Prel  Cdom Prel  Crange

  28. P2.refersTorel P2.Workdom P2.Workrange SomeWhere rewriting SomeWhere rewriting SomeWhere rewriting P2.Paintingdom … P1.Paintsrel … P1.belongsTorel … P1.belongsTo(X,Y) P2.Painting(X) P1.Paints(Z,X) R1(X,Y): P2.Painting(X)P1.belongsTo(X,Y) illustration Q(X,Y): P2.Work(X)P2.refersTo(X,Y)

  29. P2.refersTorel P2.Workdom P2.Workrange SomeWhere rewriting SomeWhere rewriting SomeWhere rewriting P2.Paintingdom … P1.Paintsrel … P1.belongsTorel … P1.belongsTo(X,Y) P2.Painting(X) P1.Paints(Z,X) R2(X,Y): P1.Paints(Z,X)P1.belongsTo(X,Y) illustration Q(X,Y): P2.Work(X)P2.refersTo(X,Y)

  30. B’ A’ A B A’ A B’ B Ongoing work (2) • Handling inconsistencies • how to define them ? • insatisfiability (no model) => inconsistency • not a necessary condition • how to check consistency? • at each join of a new peer • how to deal with inconsistency? • correct it or reason with it ? for each A, there exists a model in which A is non empty: S | A

  31. 2005 m0 AIPubliTheory AIPubli BDPubli 2005Conf TheoryJournal m1 m2 Publi P3 >--< Conf Journal illustration path m1: AIPubli is a subclass of Conf. Article P1 P2 Theory path m0 -> m2: AIPublic is a subclass of Journal. Expe Conf and Journal are disjoint, therefore AIPUbli is necessarily empty inconsistencies are caused by mappings.

  32. ¬2005v Conf ¬Theoryv Journal m2 m1 P2P detecting of inconsistencies ¬AIPubli v 2005 ¬BDPubli v 2005 ¬Theory v Article ¬Expe v Article ¬AIPubliv Theory ¬Conf v Publi ¬Journal v Publi ¬Journal v ¬Conf • Propagation of m2: { ¬TheoryvJournal;¬AIPublivJournal;…..;¬AIPubli ;…;¬AIPubliv¬Conf}.Production of a unit clause Inconsistency{m1,m2} is a NoGood stored at P3 • Propagation of m1: { ¬AIPublivConf;¬AIPublivPubli;¬AIPubliv¬Journal; ¬BDPublivConf; ¬BDPublivPubli;¬BDPubliv¬Journal}. No production of unit clause No inconsistency

  33. { } { } M*1 M*2 … M*n { } Distributed storage of the NoGoods

  34. P2P well-founded reasoning • Principle: • avoid the inconsistencies when constructing answers • Semantics of « well-founded » answer: • obtained from a consistent subset of formulas • Algorithm: • for each answer, • build its set of mapping supports and return the set of NoGoods encountered during the reasoning, • discard the mapping supports including a NoGood • return the answers having a not empty set of mapping supports It will be presented in more details at ECAI 06

  35. Perspectives • Coupling SomeWhere to a DHT for optimizing lookup queries • Adapting the SomeWhere algorithm to support the epistemic semantics • Modeling and handling trust in P2P Semantic overlay networks • based on a logical approach • P2P discovery and composition of smart devices • based on a semantic description of the functionality, inputs and outputs of devices

More Related