Answering queries across mappings

Answering queries across mappings
Grigoris Karvounarakis, University of Pennsylvania
WPE-II Presentation

Data integration
[Figure: heterogeneous data sources S1, S2, ..., Sn with instances I1, I2, ..., In are related by mappings M1, M2, ..., Mn to a global (virtual) mediated schema T, over which a query Q is posed.]

Data exchange
[Figure: source schema S with instance I, mapping M, target schema T with target constraints Σ_T and target instance J.]
J is a data exchange solution if:
• ⟨I,J⟩ ⊨ M
• J ⊨ Σ_T

Query answering (basic problem setting)
[Figure: source schema S with instance I, mapping M, target schema T, query Q over the target.]
• Given source and target schemas (S, T), a mapping M, source instance(s) I, and a query Q_T over the target, evaluate Q using data from I
• Query reformulation: compute a reformulation Q’ of Q that refers only to source relations
• Data exchange: compute a data exchange solution J such that Q can be evaluated directly on J

Outline
• Preliminaries
  • Mapping languages
  • Semantics of query answering
• Query reformulation
• Query answering using data exchange
• Comparison

Mapping languages
• Two approaches:
  • Containment between conjunctive queries
  • Dependencies (logical assertions)

Query containment
• Definition: A query Q1 is contained in a query Q2, denoted Q1 ⊑ Q2, if for all database instances I: Q1(I) ⊆ Q2(I).
• Two queries Q1 and Q2 are equivalent if Q1 ⊑ Q2 and Q2 ⊑ Q1.
• When Q1 and Q2 are over different schemas, related through a mapping M:
  • M ⊨ Q1 ⊑ Q2 if ∀I,J such that ⟨I,J⟩ ⊨ M: Q1(I) ⊆ Q2(J)
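The single-schema definition above can be tested for conjunctive queries with the classical canonical-database check: freeze the variables of Q1 into constants, evaluate Q2 on the resulting facts, and see whether the frozen head of Q1 comes back. The following is a minimal sketch under that assumption. The query encoding (head variables plus a list of `(relation, variables)` atoms) and the names `evaluate` and `contained_in` are illustrative and not part of the presentation.

```python
def evaluate(body, head_vars, facts):
    """Evaluate a conjunctive query over a set of facts (relation, values)."""
    bindings = [{}]
    for rel, args in body:
        extended = []
        for b in bindings:
            for frel, fargs in facts:
                if frel != rel or len(fargs) != len(args):
                    continue
                nb = dict(b)
                # Bind each variable, or check it agrees with an earlier binding.
                if all(nb.setdefault(v, c) == c for v, c in zip(args, fargs)):
                    extended.append(nb)
        bindings = extended
    return {tuple(b[v] for v in head_vars) for b in bindings}


def contained_in(q1, q2):
    """True if q1 is contained in q2 (both conjunctive, same schema)."""
    head1, body1 = q1
    head2, body2 = q2
    freeze = lambda v: "c_" + v  # turn each variable of q1 into a constant
    canonical = {(rel, tuple(freeze(v) for v in args)) for rel, args in body1}
    frozen_head = tuple(freeze(v) for v in head1)
    return frozen_head in evaluate(body2, head2, canonical)


# Hypothetical queries: Q1(x) :- T(x,y), T(y,x) and Q2(x) :- T(x,y)
q1 = (("x",), [("T", ("x", "y")), ("T", ("y", "x"))])
q2 = (("x",), [("T", ("x", "y"))])
print(contained_in(q1, q2), contained_in(q2, q1))
```

The sketch assumes every head variable also occurs in the body and does not cover containment across a mapping M.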

Containment mappings
• General form (GLAV):
  • Q_S(x,y) ⊑ Q_T(x,z) (sound, Open World Assumption)
  • Q_S(x,y) ≡ Q_T(x,z) (exact, Closed World Assumption)
  • Q_S, Q_T are conjunctions of relational atoms over S and T, respectively
• Special cases:
  • GAV (global-as-view): the target is specified as a view of the source(s)
    • Q_S(x,y) ⊑ T(x) (sound, OWA)
    • Q_S(x,y) ≡ T(x) (exact, CWA)
  • LAV (local-as-view): the sources are specified as views of the virtual mediated schema
    • S(x) ⊑ Q_T(x,y) (sound, OWA)
    • S(x) ≡ Q_T(x,y) (exact, CWA)

Dependencies
• Tuple-generating dependencies (tgds): ∀x,z φ(x,z) → ∃y ψ(x,y) (where φ, ψ are conjunctions of relational atoms and x, y, z are vectors of variables)
• Equality-generating dependencies (egds): ∀x φ(x) → x_i = x_j

Data exchange schema mappings
• Source-to-target tgds: ∀x,z φ(x,z) → ∃y ψ(x,y)
  • φ is a conjunction of atoms over S and ψ is a conjunction of atoms over T
• Target tgds
  • Both φ and ψ are conjunctions of atoms over T
• Target egds: ∀x φ(x) → x_i = x_j
  • φ is a conjunction of atoms over T

Containment mappings vs. source-to-target tgds
• A source-to-target tgd of the form ∀x,z Q_S(x,z) → ∃y Q_T(x,y) is equivalent to the sound GLAV mapping Q_S(x,z) ⊑ Q_T(x,y)
• Sound GAV and LAV mappings can also be expressed by source-to-target tgds.
• But exact mappings also include a target-to-source direction:
  • E.g.: S(x,z) ≡ T1(x,y), T2(y,z) is equivalent to:
    ∀x,z S(x,z) → ∃y T1(x,y) ∧ T2(y,z) (source-to-target) and
    ∀x,y,z T1(x,y) ∧ T2(y,z) → S(x,z) (target-to-source)

Incompleteness
• Mappings do not specify the target instance completely
• E.g.: ∀x,z S(x,z) → ∃y T(x,y) ∧ T(y,z) does not specify the values of y
• E.g., if I = {S(a,b)}:
  J1 = {T(a,a), T(a,b)}
  J2 = {T(a,b), T(b,b)}
  J3 = {T(a,X), T(X,b)}
  J4 = {T(a,X), T(X,b), T(a,Y), T(Y,b)}
  ...
[Figure: a single source instance I of S is related by M to many target instances J1, J2, J3, ... of T.]

Semantics of query answering
What do we expect as answers to queries over the target schema?
• “Possible worlds” semantics: for every instance I of S, consider all possible instances J of the target schema T such that ⟨I,J⟩ ⊨ M
• Convention: certain answers
  certain_{M,I}(Q_T) = ⋂ { Q_T(J) : ⟨I,J⟩ ⊨ M }
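As a small illustration of the intersection in this definition, the sketch below evaluates one query on the four solutions listed on the Incompleteness slide (for I = {S(a,b)}) and intersects the results. The query Q(x,z) :- T(x,y), T(y,z) is a hypothetical example chosen here, and intersecting over only four solutions gives an upper bound on the certain answers in general, since the definition ranges over all solutions.

```python
def evaluate(body, head_vars, facts):
    """Evaluate a conjunctive query over a set of facts (relation, values)."""
    bindings = [{}]
    for rel, args in body:
        extended = []
        for b in bindings:
            for frel, fargs in facts:
                if frel != rel or len(fargs) != len(args):
                    continue
                nb = dict(b)
                if all(nb.setdefault(v, c) == c for v, c in zip(args, fargs)):
                    extended.append(nb)
        bindings = extended
    return {tuple(b[v] for v in head_vars) for b in bindings}


# The solutions J1..J4 from the slide; X and Y are labeled nulls.
solutions = {
    "J1": {("T", ("a", "a")), ("T", ("a", "b"))},
    "J2": {("T", ("a", "b")), ("T", ("b", "b"))},
    "J3": {("T", ("a", "X")), ("T", ("X", "b"))},
    "J4": {("T", ("a", "X")), ("T", ("X", "b")),
           ("T", ("a", "Y")), ("T", ("Y", "b"))},
}

# Hypothetical query Q(x,z) :- T(x,y), T(y,z)
body = [("T", ("x", "y")), ("T", ("y", "z"))]
answers = [evaluate(body, ("x", "z"), J) for J in solutions.values()]
common = set.intersection(*answers)
print(common)
```

Here the tuple (a,b) survives the intersection, while answers such as (a,a) from J1 or (b,b) from J2 do not.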

Outline
• Preliminaries
  • Mapping languages
  • Semantics of query answering
• Query reformulation
• Query answering using data exchange
• Comparison

Equivalent reformulation
Definition: Q’_S is an equivalent reformulation of Q_T across M (denoted M ⊨ Q_T ≡ Q’_S) if, for every pair of instances I, J of S, T such that ⟨I,J⟩ ⊨ M: Q’_S(I) = Q_T(J)

Equivalent reformulations may not exist
• Example: mapping ∀x S(x) ↔ T(x,x), query Q(x) :- T(x,y)
• Any reformulation over S can only return values v such that T(v,v)
• But there are instances J such that T contains tuples T(a,b) in which a ≠ b
• … even if the mapping is exact
[Figure: source fact S(c); target fact T(a,b).]

Contained reformulation
Definition: Q’_S is a contained reformulation of Q_T across M (denoted M ⊨ Q’_S ⊑ Q_T) if, for every pair of instances I, J of S, T such that ⟨I,J⟩ ⊨ M: Q’_S(I) ⊆ Q_T(J)

Maximally-contained reformulation
• Definition: Q_S^max is a maximally-contained reformulation of Q_T across M if:
  • M ⊨ Q_S^max ⊑ Q_T, and
  • Q’_S ⊑ Q_S^max for every Q’_S such that M ⊨ Q’_S ⊑ Q_T
• The union of all contained reformulations is a maximally-contained reformulation:
  Q_S^max ≡ reform_M(Q_T) ≡ ⋃ { Q’_S : M ⊨ Q’_S ⊑ Q_T }

Maximally-contained reformulations compute certain answers
Proposition ([AD98], [FKMP03], [T05]): Let certain_M(Q) = λI. certain_{M,I}(Q). Then:
  certain_M(Q) ≡ reform_M(Q)
  (i.e., ∀I: reform_M(Q)(I) = certain_{M,I}(Q))
• Note that the above holds for any mapping (i.e., not necessarily conjunctive)

Reformulation algorithms (GAV)
• Sound/exact GAV mappings: e.g. Q_S(x,y) ⊑ T(x)
• Reformulation:
  • For every relation Ti(x) of the target schema, let r_i be the set of rules with Ti in their head (there may be more than one).
  • Let Q_Ti(x) be the union of the conjunctive queries in the bodies of the rules in r_i
  • Substitute the Ti(x) atoms in Q by Q_Ti(x)
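The substitution step above is ordinary view unfolding. The sketch below shows one way to do it, assuming each GAV rule is encoded as a pair of head variables and source atoms, and that the head variables of a rule are distinct. The relation names (`T1`, `T2`, `S1`, `S2`, `S3`) and the function `unfold` are hypothetical, chosen only for illustration.

```python
from itertools import count, product


def unfold(query_body, gav_rules):
    """Replace each target atom by the bodies of its GAV rules.

    Returns a list of conjunctive-query bodies over the source schema,
    one per combination of rule choices (i.e., a union of CQs).
    """
    fresh = count()
    choices = []
    for rel, args in query_body:
        alternatives = []
        for head_vars, body in gav_rules[rel]:
            sub = dict(zip(head_vars, args))  # head variable -> query variable
            suffix = next(fresh)

            def rename(v, sub=sub, suffix=suffix):
                # Non-head variables get fresh names to avoid clashes.
                return sub.get(v, f"{v}_{suffix}")

            alternatives.append(
                [(r, tuple(rename(v) for v in a)) for r, a in body]
            )
        choices.append(alternatives)
    return [[atom for part in combo for atom in part]
            for combo in product(*choices)]


# Hypothetical rules: S1(u,y) ⊑ T1(u), S2(u) ⊑ T1(u), S3(u,w) ⊑ T2(u)
gav_rules = {
    "T1": [(("u",), [("S1", ("u", "y"))]), (("u",), [("S2", ("u",))])],
    "T2": [(("u",), [("S3", ("u", "w"))])],
}
# Hypothetical query Q(x) :- T1(x), T2(x)
reformulation = unfold([("T1", ("x",)), ("T2", ("x",))], gav_rules)
for cq in reformulation:
    print(cq)
```

Because T1 has two rules and T2 has one, the result is a union of two conjunctive queries over the sources.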

Reformulation algorithms (LAV/GLAV)
• Sound LAV/GLAV mappings:
  r: S1(x,y),…,Sn(x,y) ⊑ T1(x,z),…,Tm(x,z)
  (note: the Ti’s are not necessarily distinct relational atoms)
  (equivalent tgd: ∀x,y S1(x,y),…,Sn(x,y) → ∃z T1(x,z),…,Tm(x,z))
• Inverse rules ([DG97]):
  • For every rule r and every i ∈ [1..m] define a rule:
    Ti(x, f_{r,z1}(x,y), …, f_{r,zk}(x,y)) :- S1(x,y),…,Sn(x,y)
    (tgd: ∀x,y S1(x,y),…,Sn(x,y) → Ti(x, f_{r,z1}(x,y), …, f_{r,zk}(x,y)); skolemization of the existential variables)

Inverse rules: Example
• r: S1(x,y), S2(y,w) ⊑ T1(x,z), T1(z,w)
• Inverse rules:
  • T1(x, f_{r,z}(x,y,w)) :- S1(x,y), S2(y,w)
  • T1(f_{r,z}(x,y,w), w) :- S1(x,y), S2(y,w)
• Observe that the same Skolem term (f_{r,z}(x,y,w)) represents the common existential variable (z) of the two atoms
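The construction in this example can be sketched mechanically: every target variable that does not occur in the source atoms is replaced by a Skolem term over all source variables. In the sketch below, Skolem terms are encoded as tuples `(function_name, var1, var2, ...)`. That encoding, the naming scheme `f_<rule>_<var>`, and the function `inverse_rules` are assumptions made for illustration.

```python
def inverse_rules(name, source_atoms, target_atoms):
    """Build one inverse rule per target atom of a sound LAV/GLAV mapping."""
    source_vars = []
    for _, args in source_atoms:
        for v in args:
            if v not in source_vars:
                source_vars.append(v)

    def term(v):
        if v in source_vars:
            return v
        # Existential variable: Skolem term over all source variables.
        return (f"f_{name}_{v}",) + tuple(source_vars)

    return [
        ((rel, tuple(term(v) for v in args)), list(source_atoms))
        for rel, args in target_atoms
    ]


# r: S1(x,y), S2(y,w) ⊑ T1(x,z), T1(z,w)
rules = inverse_rules(
    "r",
    [("S1", ("x", "y")), ("S2", ("y", "w"))],
    [("T1", ("x", "z")), ("T1", ("z", "w"))],
)
for head, body in rules:
    print(head, ":-", body)
```

Both generated heads share the same term for z, which is the observation made on the slide.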

Query reformulation using inverse rules
• Create a logic program P_Q consisting of:
  • the query Q
  • the inverse rules of all mappings M
• Let P(I) be the result of evaluating a logic program P on a set of facts I
• Theorem ([DG97, AD98]): Let P_Q^+ be a logic program such that, for every set of facts I, P_Q^+(I) is the result of discarding from P_Q(I) all tuples that contain Skolem terms. Then:
  • P_Q^+ is a maximally-contained reformulation
  • P_Q^+(I) = certain_{M,I}(Q)
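A minimal sketch of this evaluation, using the two inverse rules from the example slide: apply the inverse rules to the source facts, evaluate the query over the derived target facts, then discard answers containing Skolem terms. The source instance, the two queries, and the helper names are hypothetical, and a single pass over the rules suffices here only because the inverse rules have source atoms in their bodies.

```python
def matches(body, facts):
    """Return all variable bindings that satisfy a conjunction of atoms."""
    bindings = [{}]
    for rel, args in body:
        extended = []
        for b in bindings:
            for frel, fargs in facts:
                if frel != rel or len(fargs) != len(args):
                    continue
                nb = dict(b)
                if all(nb.setdefault(v, c) == c for v, c in zip(args, fargs)):
                    extended.append(nb)
        bindings = extended
    return bindings


def instantiate(term, b):
    if isinstance(term, tuple):  # Skolem term: (function name, variables...)
        return (term[0],) + tuple(b[v] for v in term[1:])
    return b[term]


def answer(query_head, query_body, rules, source):
    # Step 1: derive target facts with the inverse rules.
    target = {
        (rel, tuple(instantiate(t, b) for t in terms))
        for (rel, terms), body in rules
        for b in matches(body, source)
    }
    # Step 2: evaluate the query over the derived facts.
    result = {tuple(b[v] for v in query_head)
              for b in matches(query_body, target)}
    # Step 3: discard tuples that contain Skolem terms.
    return {t for t in result if not any(isinstance(v, tuple) for v in t)}


f = ("f_r_z", "x", "y", "w")
body = [("S1", ("x", "y")), ("S2", ("y", "w"))]
rules = [(("T1", ("x", f)), body), (("T1", (f, "w")), body)]
source = {("S1", ("a", "b")), ("S2", ("b", "c"))}  # hypothetical instance

# Hypothetical queries: Q1(x,w) :- T1(x,z), T1(z,w) and Q2(x,z) :- T1(x,z)
ans1 = answer(("x", "w"), [("T1", ("x", "z")), ("T1", ("z", "w"))], rules, source)
ans2 = answer(("x", "z"), [("T1", ("x", "z"))], rules, source)
print(ans1, ans2)
```

The join in Q1 goes through the shared Skolem term and yields a tuple of constants, whereas every answer to Q2 contains a Skolem term and is dropped.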

Peer Data Management Systems
• LAV source-to-peer mappings
• P2P mappings: inclusion (sound) or equality (exact) GLAV + definitional (GAV)
• Queries can be issued at any peer
• Every peer can be both source and target w.r.t. different mappings
• Pairs of peers may be indirectly connected (by paths of mappings)
[Figure: peers P1, P2, P3, ..., Pn with sources S1, S2, S3, ..., Sn and instances I1, I2, I3, ..., In, connected by peer mappings M12, M23, M31, Mn3.]

Simple PDMS example
Q(n1,n2) :- SameProj(n1,n2,p), Author(n1,p), Author(n2,p)
r0: SameProj(n1,n2,p) = ProjMem(n1,p), ProjMem(n2,p)
r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a)
r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p)
[Figure: peers P1 and P2 with peer relations ProjMem, SameProj, Area, Author; sources S1 and S2 with instances I1 and I2.]

Mapping Graph
r0a: SameProj(n1,n2,p) ⊇ ProjMem(n1,p), ProjMem(n2,p)
r0b: SameProj(n1,n2,p) ⊆ ProjMem(n1,p), ProjMem(n2,p)
r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a)
r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p)
[Figure: mapping graph over the relations ProjMem, SameProj, Area, Author of peers P1 and P2 and the sources S1, S2 (instances I1, I2), with edges labeled r0a, r0b, r1, r2.]

Query answering in PDMS
Theorem ([HIST05]):
• In general, query answering in PDMS is undecidable
  • Reason: cycles in the mapping graph
• For an acyclic mapping graph, query answering is in PTIME
• Still in PTIME for a limited form of cycles (i.e., exact mappings with some restrictions)
  • Allows chains of sound (“LAV”) mappings and exact (“GAV”) mappings without projections
Piazza reformulation algorithm:
• Sound and complete for acyclic mapping graphs and the limited form of cycles
• Sound in general (computes a subset of the certain answers)

Piazza reformulation algorithm (1)
q: Q(n1,n2) :- SameProj(n1,n2,p), Author(n1,w), Author(n2,w)
r0: SameProj(n1,n2,p) :- ProjMem(n1,p), ProjMem(n2,p)
r1: S1(n,p,a) ⊆ ProjMem(n,p), Area(p,a)
  ir1a: ProjMem(n,p) :- S1(n,p,a)
r2: S2(n1,n2) ⊆ Author(n1,p), Author(n2,p)
  ir2a: Author(n1,f(n1,n2)) :- S2(n1,n2)
  ir2b: Author(n2,f(n1,n2)) :- S2(n1,n2)
[Figure: rule-goal tree rooted at q with goals SameProj(n1,n2,p), Author(n1,w), Author(n2,w). SameProj expands via r0 to ProjMem(n1,p) and ProjMem(n2,p), which expand via ir1a to S1(n1,p,_) and S1(n2,p,_). Each Author goal expands via ir2a and ir2b to S2(n1,n2) and S2(n2,n1).]

Piazza reformulation algorithm (2)
[Figure: the rule-goal tree for q, with leaves S1(n1,p,_), S1(n2,p,_), S2(n1,n2), S2(n2,n1).]
Q(n1,n2) :- (S1(n1,p,_) ∧ S1(n2,p,_)) ∧ (S2(n1,n2) ∪ S2(n2,n1)) ∧ (S2(n2,n1) ∪ S2(n1,n2))
  ≡ (S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n1,n2)) ∪ (S1(n1,p,_) ∧ S1(n2,p,_) ∧ S2(n2,n1))
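The last step on this slide, turning a conjunction of unions into a union of conjunctive queries, can be sketched as a Cartesian product followed by pruning. This is an illustrative simplification, not the Piazza implementation: atoms are plain strings, and a conjunct is dropped when its atom set strictly contains that of another conjunct with the same head, since the larger conjunction is then contained in the smaller one and is redundant in the union.

```python
from itertools import product


def expand(conjuncts):
    """Distribute a conjunction of unions into a union of conjunctions.

    conjuncts: list of unions; each union is a list of alternatives;
    each alternative is a frozenset of atoms (strings).
    """
    combos = {frozenset().union(*choice) for choice in product(*conjuncts)}
    # Drop any conjunction that strictly contains another one.
    return [c for c in combos if not any(other < c for other in combos)]


base = frozenset({"S1(n1,p,_)", "S1(n2,p,_)"})
s2a = frozenset({"S2(n1,n2)"})
s2b = frozenset({"S2(n2,n1)"})

# (S1 ∧ S1) ∧ (S2(n1,n2) ∪ S2(n2,n1)) ∧ (S2(n2,n1) ∪ S2(n1,n2))
union_of_cqs = expand([[base], [s2a, s2b], [s2b, s2a]])
for cq in union_of_cqs:
    print(sorted(cq))
```

The product yields three distinct conjunctions. The one containing both S2 atoms is pruned, leaving the two conjunctive queries shown on the slide.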

Universal solutions
• Data exchange setting S, T, M; instance I of S
• An instance J of T is a universal solution of the data exchange setting above if it is a solution and has homomorphisms to all other solutions
• Solutions contain constants (i.e., values that appear in I) and variables (labeled nulls)
• Homomorphism h: J1 → J2 between target instances:
  • h(c) = c, for every constant c
  • If R(a1,…,am) is in J1, then R(h(a1),…,h(am)) is in J2

Universal solutions
[Figure: source schema S with instance I, mapping M, target schema T. The universal solution J has homomorphisms h1, h2, h3, ... to the solutions J1, J2, J3, ...]

Universal solutions example
• M: ∀x,z S(x,z) → ∃y T(x,y) ∧ T(y,z)
• I = {S(a,b)}
• Solutions:
  J1 = {T(a,a), T(a,b)} is not universal
  J2 = {T(a,b), T(b,b)} is not universal
  J3 = {T(a,X), T(X,b)} is universal
  J4 = {T(a,X), T(X,b), T(a,Y), T(Y,b)} is universal
  J5 = {T(a,X), T(X,b), T(Y,Y)} is not universal
  ...
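The classification in this example can be partly checked by searching for homomorphisms among the five listed solutions. The sketch below assumes that labeled nulls are written with an uppercase first letter (X, Y) and constants in lowercase. The function names are illustrative. Failing to map into some listed solution proves non-universality, whereas mapping into all five is only consistent with universality, because the definition quantifies over every solution.

```python
def is_null(v):
    return v[0].isupper()  # assumption: nulls are uppercase, constants lowercase


def homomorphism(j1, j2):
    """Return a null assignment witnessing a homomorphism j1 -> j2, or None."""
    facts = sorted(j1)

    def extend(i, h):
        if i == len(facts):
            return h
        rel, args = facts[i]
        for rel2, args2 in j2:
            if rel2 != rel or len(args2) != len(args):
                continue
            h2, ok = dict(h), True
            for a, b in zip(args, args2):
                if is_null(a):
                    if h2.setdefault(a, b) != b:  # null already mapped elsewhere
                        ok = False
                        break
                elif a != b:  # constants must map to themselves
                    ok = False
                    break
            if ok:
                found = extend(i + 1, h2)
                if found is not None:
                    return found
        return None

    return extend(0, {})


T = lambda x, y: ("T", (x, y))
solutions = {
    "J1": {T("a", "a"), T("a", "b")},
    "J2": {T("a", "b"), T("b", "b")},
    "J3": {T("a", "X"), T("X", "b")},
    "J4": {T("a", "X"), T("X", "b"), T("a", "Y"), T("Y", "b")},
    "J5": {T("a", "X"), T("X", "b"), T("Y", "Y")},
}
maps_to_all = [
    name for name, j in solutions.items()
    if all(homomorphism(j, other) is not None for other in solutions.values())
]
print(maps_to_all)
```

Note that a successful search can return an empty assignment (when the instance has no nulls), so results are compared against `None` rather than tested for truthiness.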

Computing universal solutions
Apply the chase procedure on the joint instance ⟨I,∅⟩
• Source-to-target dependencies only: terminates in PTIME and produces a joint instance ⟨I,J⟩, where J is a universal solution (chase(I))
• Target dependencies: not guaranteed to terminate
  • If it does, it computes a universal solution
  • If it fails, no universal solution exists

Example chase sequence
d1: ∀x,y,z S(x,y) ∧ S(y,z) → ∃w T(x,z,w)
⟨{S(a,b),S(b,c),S(b,d),S(a,e),S(e,c)}, ∅⟩
  ⇒(h1) ⟨{S(a,b),S(b,c),S(b,d),S(a,e),S(e,c)}, {T(a,c,X1)}⟩
  ⇒(h2) ⟨{S(a,b),S(b,c),S(b,d),S(a,e),S(e,c)}, {T(a,c,X1), T(a,d,X2)}⟩
h1: x → a, y → b, z → c; extend to h1’: w → X1
h2: x → a, y → b, z → d; extend to h2’: w → X2
h3: x → a, y → e, z → c; extends to h3’: w → X1, so the chase step is not applicable!
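The sequence above can be reproduced with a small chase for source-to-target tgds. This is a minimal sketch, assuming tgds are encoded as pairs of atom lists and fresh nulls are named X1, X2, ...; the function names are illustrative. A single pass over the matches suffices only because the left-hand sides mention source relations exclusively, so target tgds and egds are not handled.

```python
from itertools import count


def matches(body, facts, start=None):
    """All extensions of the binding `start` that satisfy the atoms in `body`."""
    bindings = [dict(start or {})]
    for rel, args in body:
        extended = []
        for b in bindings:
            for frel, fargs in facts:
                if frel != rel or len(fargs) != len(args):
                    continue
                nb = dict(b)
                if all(nb.setdefault(v, c) == c for v, c in zip(args, fargs)):
                    extended.append(nb)
        bindings = extended
    return bindings


def chase(source, tgds):
    """Chase a source instance with source-to-target tgds (lhs, rhs)."""
    target = []
    fresh = count(1)
    for lhs, rhs in tgds:
        for h in matches(lhs, source):
            # The step is applicable only if h cannot be extended so that
            # the right-hand side already holds in the target.
            if matches(rhs, target, h):
                continue
            ext = dict(h)
            for _, args in rhs:
                for v in args:
                    if v not in ext:
                        ext[v] = f"X{next(fresh)}"  # fresh labeled null
            target.extend((rel, tuple(ext[v] for v in args)) for rel, args in rhs)
    return target


S = lambda x, y: ("S", (x, y))
I = [S("a", "b"), S("b", "c"), S("b", "d"), S("a", "e"), S("e", "c")]
d1 = ([("S", ("x", "y")), ("S", ("y", "z"))], [("T", ("x", "z", "w"))])
J = chase(I, [d1])
print(J)
```

With the facts listed in the slide's order, the matches are visited as h1, h2, h3, and the third one adds nothing because T(a,c,X1) already witnesses the existential.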

Universal solutions and query answering
Theorem ([FKMP]):
• If Q is a conjunctive query, I is a source instance and J is a universal solution: Q(J)^+ = certain_{M,I}(Q)
• Any solution J for which the above holds for every conjunctive query is universal
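A small sketch of how the theorem is used, reading the "+" as "discard tuples that contain labeled nulls" (an assumption made here by analogy with the Skolem-term case for inverse rules). It evaluates two hypothetical queries on the universal solution J3 = {T(a,X), T(X,b)} from the earlier example, with nulls written in uppercase.

```python
def evaluate(body, head_vars, facts):
    """Evaluate a conjunctive query over a set of facts (relation, values)."""
    bindings = [{}]
    for rel, args in body:
        extended = []
        for b in bindings:
            for frel, fargs in facts:
                if frel != rel or len(fargs) != len(args):
                    continue
                nb = dict(b)
                if all(nb.setdefault(v, c) == c for v, c in zip(args, fargs)):
                    extended.append(nb)
        bindings = extended
    return {tuple(b[v] for v in head_vars) for b in bindings}


def null_free(tuples):
    """Keep only tuples made of constants (assumption: nulls are uppercase)."""
    return {t for t in tuples if not any(v[0].isupper() for v in t)}


J3 = {("T", ("a", "X")), ("T", ("X", "b"))}

# Hypothetical queries: Q1(x,z) :- T(x,y), T(y,z) and Q2(x,y) :- T(x,y)
q1 = null_free(evaluate([("T", ("x", "y")), ("T", ("y", "z"))], ("x", "z"), J3))
q2 = null_free(evaluate([("T", ("x", "y"))], ("x", "y"), J3))
print(q1, q2)
```

Q1 joins through the null and returns a tuple of constants, while every answer to Q2 mentions X and is discarded, so Q2 has no certain answers on this instance.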

Using inverse rules to compute universal solutions
• For every relation Ti of T, let P_{M,Ti} be the reformulation of the query Q(x) :- Ti(x), using the inverse rules algorithm.
Proposition: ⋃_i P_{M,Ti}(I) ≅ chase(I)
• Crux: every step of a chase sequence corresponds to a step in the evaluation of the logic program using SLD resolution
Corollary: ⋃_i P_{M,Ti}(I) is a universal solution

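Applying data exchange in GAV/LAV settings
[Figure: sources S1, S2, ..., Sn with instances I1, I2, ..., In (jointly S and I) are mapped by M1, M2, ..., Mn to solutions J1, J2, ..., Jn, which together form the solution J over the target schema T on which query Q is evaluated.]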

Performance tradeoffs: data exchange
- Requires computing a solution (polynomial in the size of the instance I)
- Updates in the sources need to be propagated
  - This may require recomputing the whole universal solution
+ But then query evaluation is easy and efficient
+ If the query load is large, the cost of computing the solution may be amortized

Performance tradeoffs: reformulation
+ No “startup” cost
+ No need to propagate updates
- Adds overhead to query processing (although reformulations for “common” queries can be precomputed/cached)
- Requires a distributed query evaluation engine (but there is room for optimization, e.g., adaptive query processing)
- Generated reformulations are generally not minimal

Conclusions
• Two approaches for answering queries across mappings
  • Reformulation (data integration)
  • Universal solutions (data exchange)
• Different problems
  • Data exchange is concerned with other aspects, e.g., identifying the appropriate solution to materialize
• Same answers (certain answers)
• Performance tradeoffs
• Tight relationship between chase and inverse rules techniques
