Query Answering using Views based on [Pottinger and Halevy, VLDB 2000].
Setting 1 Def. Q – a CQ over base data; V1, …, Vn – conjunctive views. (We may eventually allow built-ins, but consider pure conjunctive queries for now.) Rewrite Q into Q’ solely in terms of V1, …, Vn such that ∀D: Q(D) = Q’(V1(D), …, Vn(D)). • Exact equivalence is required. • Appropriate for query optimization and physical data independence.
An example s = sameTopic, c = cites. Q: q(X,Y) :- s(X,Y), c(X,Y), c(Y,X). V1: v1(A,B) :- c(A,B), c(B,A). V2: v2(C,D) :- s(C,D), c(C,D’), c(D,C’). An equivalent rewriting – Q’: q(X,Y) :- v1(X,Y), v2(X,Y). Expand Q’: q(X,Y) :- c(X,Y), c(Y,X), s(X,Y), c(X,Y’), c(Y,X’). The last two subgoals are redundant, so Q’ ≡ Q. Note: Asking for an equivalent rewriting in the information integration (II) context is not realistic!
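The equivalence claim can be checked mechanically by searching for containment mappings in both directions. A minimal sketch, assuming atoms are (predicate, argument-tuple) pairs, every term is a variable, and the fresh variables Y’ and X’ are written Y1 and X1. The helper names find_mapping and contained_in are illustrative, not from the paper, and the search is brute force.

```python
def find_mapping(src, dst, mapping):
    """Extend `mapping` so that every atom of `src` lands on some atom of `dst`."""
    if not src:
        return mapping
    (pred, args), rest = src[0], src[1:]
    for dpred, dargs in dst:
        if dpred != pred or len(dargs) != len(args):
            continue
        trial = dict(mapping)
        # setdefault binds a new variable or returns its existing image
        if all(trial.setdefault(a, b) == b for a, b in zip(args, dargs)):
            found = find_mapping(rest, dst, trial)
            if found is not None:
                return found
    return None


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping q2 -> q1 exists."""
    (head1, body1), (head2, body2) = q1, q2
    seed = {}
    for a, b in zip(head2, head1):  # the head of q2 must map onto the head of q1
        if seed.setdefault(a, b) != b:
            return False
    return find_mapping(body2, body1, seed) is not None


Q = (("X", "Y"), [("s", ("X", "Y")), ("c", ("X", "Y")), ("c", ("Y", "X"))])
# Expansion of Q': q(X,Y) :- v1(X,Y), v2(X,Y)
Q_EXPANDED = (("X", "Y"), [("c", ("X", "Y")), ("c", ("Y", "X")), ("s", ("X", "Y")),
                           ("c", ("X", "Y1")), ("c", ("Y", "X1"))])
# Expansion of a rewriting that uses v1 alone (no s subgoal)
V1_ONLY = (("X", "Y"), [("c", ("X", "Y")), ("c", ("Y", "X"))])

print(contained_in(Q, Q_EXPANDED) and contained_in(Q_EXPANDED, Q))  # True
print(contained_in(V1_ONLY, Q))  # False
```

The second call shows why v1 alone is not enough: nothing in its expansion can be the target of s(X,Y).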
Setting 2 L – the query language (QL) of interest. Def. Q and the vi’s as before. Q’ is a maximally contained rewriting of Q provided (1) ∀D, ∀Vi ⊆ vi(D), 1 ≤ i ≤ n: Q’(V1, …, Vn) ⊆ Q(D), and (2) ¬∃ query Q” ∈ L such that (a) ∀D, ∀Vi ⊆ vi(D): Q’(V1, …, Vn) ⊆ Q”(…) ⊆ Q(D), and (b) ∃D, ∃Vi ⊆ vi(D): Q’(…) ⊊ Q”(…). Just what is this definition saying?
An example S1: v1(A,B) :- c(A,B), c(B,A), inS(A), inS(B). S2: v2(A,B) :- … inV(A), inV(B). S3: v3(A,B) :- s(A,B), c(A,B’), c(B,A’). Q: q(X,Y) :- s(X,Y), c(X,Y), c(Y,X). Two rewrites – Q’: q(X,Y) :- v3(X,Y), v1(X,Y). Q”: q(X,Y) :- v3(X,Y), v2(X,Y). Q’ ∪ Q” – the maximally contained rewriting (MCR). Q’ ∪ Q” ⊆ Q. Intuitively: the best you can do.
Difference between the two settings • Query optimization and maintaining physical data independence: need an equivalent rewriting; goal – the cheapest plan. • Information integration (using LAV): an MCR is the most realistic thing you can hope for. • QAV theory and algorithms next.
QAV Theory E.g.: global schema relations – phone(E,P), emp(E), office(E,O), mgr(E,M), dept(E,D), abbreviated below as p, e, o, m, d. Source views – S1: v1(E,P,M) :- e(E), p(E,P), m(E,M). S2: v2(E,O,D) :- e(E), o(E,O), d(E,D). S3: v3(E,P) :- e(E), p(E,P), d(E,'toy'). Q: q(P,O) :- p('mary',P), o('mary',O). R1: r1(P,O) :- v1('mary',P,M), v2('mary',O,D). R2: r2(P,O) :- v2('mary',O,D), v3('mary',P). When is a rewrite good? Soundness – what does this mean? Irredundancy – what does this mean? How do we check these properties? [Explained in class.]
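One way to check soundness is to unfold each view subgoal into the view's body and then look for a containment mapping from Q into that expansion. A minimal sketch under stated assumptions: terms starting with an upper-case letter are variables, everything else is a constant, view and query heads have distinct variables, and v1's phone subgoal is taken to be p(E,P). The helper names expand, find_mapping and is_sound are illustrative only.

```python
from itertools import count


def is_var(term):
    """Assumed convention: variables start with an upper-case letter."""
    return term[:1].isupper()


VIEWS = {
    "v1": (("E", "P", "M"), [("e", ("E",)), ("p", ("E", "P")), ("m", ("E", "M"))]),
    "v2": (("E", "O", "D"), [("e", ("E",)), ("o", ("E", "O")), ("d", ("E", "D"))]),
    "v3": (("E", "P"), [("e", ("E",)), ("p", ("E", "P")), ("d", ("E", "toy"))]),
}


def expand(body, views):
    """Replace each view subgoal by the body of its view definition."""
    fresh = count(1)
    atoms = []
    for name, args in body:
        head, view_body = views[name]
        sub = dict(zip(head, args))  # view head variable -> argument used in the rewrite
        for pred, vargs in view_body:
            new_args = []
            for t in vargs:
                if is_var(t) and t not in sub:
                    sub[t] = f"F_{next(fresh)}"  # fresh existential variable
                new_args.append(sub.get(t, t))
            atoms.append((pred, tuple(new_args)))
    return atoms


def find_mapping(src, dst, mapping):
    """Map every atom of `src` onto an atom of `dst`; constants map to themselves."""
    if not src:
        return mapping
    (pred, args), rest = src[0], src[1:]
    for dpred, dargs in dst:
        if dpred != pred or len(dargs) != len(args):
            continue
        trial = dict(mapping)
        if all((trial.setdefault(a, b) == b) if is_var(a) else a == b
               for a, b in zip(args, dargs)):
            found = find_mapping(rest, dst, trial)
            if found is not None:
                return found
    return None


def is_sound(rewrite, query, views):
    """True if the expansion of `rewrite` is contained in `query`."""
    (r_head, r_body), (q_head, q_body) = rewrite, query
    seed = dict(zip(q_head, r_head))  # query head maps onto rewrite head
    return find_mapping(q_body, expand(r_body, views), seed) is not None


Q = (("P", "O"), [("p", ("mary", "P")), ("o", ("mary", "O"))])
R1 = (("P", "O"), [("v1", ("mary", "P", "M")), ("v2", ("mary", "O", "D"))])
R2 = (("P", "O"), [("v2", ("mary", "O", "D")), ("v3", ("mary", "P"))])

print(is_sound(R1, Q, VIEWS), is_sound(R2, Q, VIEWS))  # True True
```

Irredundancy could be probed the same way, by dropping one view subgoal at a time and testing whether the shorter rewrite is still sound.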
Another example S1: v1(X,Y) :- p(X,Y). S2: v2(X,Y) :- p(X,Y). Q: q(X,Y) :- p(X,Y). What are the sound & irredundant (and MCR) rewritings?
LMSS Theorem Theorem [Levy et al. PODS’95]: If Q has k subgoals, it suffices to consider rewrites with ≤ k view subgoals. Other rewrites are guaranteed to be redundant. Why is this so? Proof: Q: q(…) :- g1( ), …, gk( ). R: r(…) :- v1( ), …, vm( ) (m > k). Expansion of R: r(…) :- gi( ), …, gj( ), … . Since R is contained in Q, there is a containment mapping (c.m.) from Q to the expansion. At least one view subgoal's expansion is not the target of any g in Q under this c.m. (Why?)
LMSS Theorem Proof (contd.) Example (revisit the emp, office, … example & illustrate). So we can remove any vi whose expansion is “untouched” by the c.m. The resulting rewrite, say R’, contains R but is itself contained in Q. But then R’ has ≤ k (view) subgoals. Note: Rewrites are allowed to introduce built-ins. Q.E.D.
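The pruning step of the proof can be illustrated on the sameTopic/cites query with views v1(A,B) :- c(A,B), c(B,A) and v2(C,D) :- s(C,D), c(C,D’), c(D,C’). A minimal sketch, assuming a hypothetical rewrite with m = 4 view subgoals against k = 3 query subgoals, and an expansion written out by hand with each atom tagged by the view subgoal it came from. The names find_targets, REWRITE and EXPANSION are illustrative.

```python
QUERY_HEAD = ("X", "Y")
QUERY_BODY = [("s", ("X", "Y")), ("c", ("X", "Y")), ("c", ("Y", "X"))]

# Hypothetical rewrite with more view subgoals (4) than query subgoals (3).
REWRITE = ["v1(X,Y)", "v2(X,Y)", "v1(Y,X)", "v2(Y,X)"]

# Hand-written expansion; the integer is the index of the originating view
# subgoal in REWRITE, and F1..F4 are fresh existential variables.
EXPANSION = [
    (0, ("c", ("X", "Y"))), (0, ("c", ("Y", "X"))),
    (1, ("s", ("X", "Y"))), (1, ("c", ("X", "F1"))), (1, ("c", ("Y", "F2"))),
    (2, ("c", ("Y", "X"))), (2, ("c", ("X", "Y"))),
    (3, ("s", ("Y", "X"))), (3, ("c", ("Y", "F3"))), (3, ("c", ("X", "F4"))),
]


def find_targets(src, tagged_dst, mapping):
    """For each query subgoal, return the index of the view subgoal it maps into."""
    if not src:
        return []
    (pred, args), rest = src[0], src[1:]
    for idx, (dpred, dargs) in tagged_dst:
        if dpred != pred or len(dargs) != len(args):
            continue
        trial = dict(mapping)
        if all(trial.setdefault(a, b) == b for a, b in zip(args, dargs)):
            tail = find_targets(rest, tagged_dst, trial)
            if tail is not None:
                return [idx] + tail
    return None


seed = {v: v for v in QUERY_HEAD}  # head variables map to themselves
targets = find_targets(QUERY_BODY, EXPANSION, seed)
touched = set(targets)  # at most k distinct view subgoals can be hit
pruned = [v for i, v in enumerate(REWRITE) if i in touched]
print(pruned)  # ['v1(X,Y)', 'v2(X,Y)']
```

Only k query subgoals are mapped, so at most k view subgoals can be touched. The untouched ones are dropped, and the same mapping still witnesses containment of the pruned rewrite in Q.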
QAV – Bucket Algorithm E.g.: q1(X) :- c(X,Y), c(Y,X), s(X,Y). v4(A) :- c(A,B), c(B,A). v5(C,D) :- s(C,D). v6(F,H) :- c(F,G), c(G,H), s(F,G). Create a bucket per query subgoal. Should we list v4(X) twice in the buckets?
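Bucket creation for this example can be sketched as follows. This is a simplified illustration, not the full algorithm: a view enters a subgoal's bucket when one of its subgoals has the same predicate and every distinguished query variable in the subgoal lands on a distinguished view variable. Constants and equalities between variables are ignored, and "_" stands for a view head position left unconstrained. The names bucket_entry and build_buckets are hypothetical.

```python
QUERY_HEAD = ("X",)
QUERY_BODY = [("c", ("X", "Y")), ("c", ("Y", "X")), ("s", ("X", "Y"))]

VIEWS = {
    "v4": (("A",), [("c", ("A", "B")), ("c", ("B", "A"))]),
    "v5": (("C", "D"), [("s", ("C", "D"))]),
    "v6": (("F", "H"), [("c", ("F", "G")), ("c", ("G", "H")), ("s", ("F", "G"))]),
}


def bucket_entry(goal, view_name, view_head, view_atom):
    """Return the view head atom to place in goal's bucket, or None if unusable."""
    (gpred, gargs), (vpred, vargs) = goal, view_atom
    if gpred != vpred or len(gargs) != len(vargs):
        return None
    to_query = {}  # view variable -> query variable
    for g, v in zip(gargs, vargs):
        if g in QUERY_HEAD and v not in view_head:
            return None  # a distinguished query variable would be projected out
        if to_query.setdefault(v, g) != g:
            return None  # simplification: no equalities are introduced here
    return (view_name, tuple(to_query.get(h, "_") for h in view_head))


def build_buckets():
    """One bucket (list of candidate view atoms) per query subgoal."""
    buckets = {}
    for goal in QUERY_BODY:
        entries = buckets.setdefault(goal, [])
        for name, (head, body) in VIEWS.items():
            for atom in body:
                entry = bucket_entry(goal, name, head, atom)
                if entry is not None and entry not in entries:
                    entries.append(entry)
    return buckets


BUCKETS = build_buckets()
for goal, entries in BUCKETS.items():
    print(goal, entries)
```

In this sketch v4(X) lands in the buckets of both c subgoals, which is the situation the question on the slide is about.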
BA (contd.) Consider all combinations and check each for containment (of its expansion) in Q. • v4(X) cannot be combined with anything. (Why?) • Try q1(X) :- v4(X), v4(X), v5(X,Y). • Try q1(X) :- v4(X), v6(X,Y), v5(X,Y). • Do any of these work? • When can we discard a view from consideration?
BA (contd.) • Here is a successful rewrite: q1(X) :- v6(X,Y), v6(X,Y), v5(X,Y). It is redundant. By itself it is not contained in q1. But with the subgoal X=Y added, it is! And after “shrinking” the expansion, this is nothing but q1(X) :- v6(X,X).
BA (concluded) Remarks: • You get to add built-ins to a rewrite. • v4 didn’t contribute to any rewrite, but BA doesn’t recognize this ahead of time. • Consider q2(X,Y) :- c(X,Y), c(Y,X). Then both c’s can be folded into v4, which is not recognized by BA. • Inverse Rules algorithm – see paper.
MiniCon (Revisit previous example) E.g.: q1(X) :- c(X,Y), c(Y,X), s(X,Y). v4(A) :- c(A,B), c(B,A). v5(C,D) :- s(C,D). v6(F,H) :- c(F,G), c(G,H), s(F,G). Form buckets more intelligently: • Ask which query SGs this view can “cover”. • Ask what is the minimal set of query SGs that must be covered (via mappings) by this view. E.g.: can v4(A) cover any SG in q1? Hint: look at the “join conditions” in q1.
MiniCon ex. (contd.) • Consider v5(C,D): it can cover s(X,Y) (and nothing else). To do this, we use h: C→C, D→D; φ: X→C, Y→D. • h imposes any additional equalities on the distinguished vars of v5 (none in this case). • φ says how the query SG(s) may be covered by v5.
MiniCon ex. (contd.) • Consider v6(F,H): it can cover s(X,Y). But note: Y is an existential var! So X must be mapped to F and Y to G. Can we stop there? No: because of the join conditions, v6 must also cover c(X,Y) and c(Y,X), and covering c(Y,X) with c(G,H) requires mapping X to H as well, which is only possible if we equate F to H. Key insight: join obligations in the query via existential vars cannot be fulfilled by another view (unless those vars are mapped to the view’s DVs)! The current view should fulfill them all.
MiniCon ex. (contd.) • So, what’s the story for v6(F,H)? v6(F,H) can cover the block c(X,Y), c(Y,X), s(X,Y) in q1 provided we impose the condition F=H on the distinguished vars of v6. That is, our h: F→F, H→F; φ: X→F, Y→G. • Since we cover all query SGs with v6(F,F), the (unique) MCR is q1(X) :- v6(X,X).
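The reasoning about v4, v5 and v6 can be replayed with a small checker. A minimal sketch, assuming the candidate mapping φ and head homomorphism h are supplied by hand (the view is passed in already rewritten under h). It tests only the two conditions discussed above: query head vars must land on distinguished view vars, and a query var sent to an existential view var forces every query SG mentioning it into the covered set. The function name mcd_ok is hypothetical.

```python
QUERY_HEAD = {"X"}
QUERY_BODY = [("c", ("X", "Y")), ("c", ("Y", "X")), ("s", ("X", "Y"))]


def mcd_ok(phi, view_head, view_body, covered):
    """Check a candidate (phi, covered) against one view, already rewritten under h."""
    # phi must send every covered query subgoal onto some view subgoal
    for i in covered:
        pred, args = QUERY_BODY[i]
        if (pred, tuple(phi[a] for a in args)) not in view_body:
            return False
    for x, target in phi.items():
        if target in view_head:
            continue  # mapped to a distinguished view variable: no obligation
        if x in QUERY_HEAD:
            return False  # a query head variable would be lost
        # x goes to an existential view variable, so this view must cover
        # every query subgoal that mentions x
        for i, (_, args) in enumerate(QUERY_BODY):
            if x in args and i not in covered:
                return False
    return True


V4 = (("A",), [("c", ("A", "B")), ("c", ("B", "A"))])
V5 = (("C", "D"), [("s", ("C", "D"))])
V6 = (("F", "H"), [("c", ("F", "G")), ("c", ("G", "H")), ("s", ("F", "G"))])
V6_H = (("F", "F"), [("c", ("F", "G")), ("c", ("G", "F")), ("s", ("F", "G"))])  # h: H -> F

print(mcd_ok({"X": "A", "Y": "B"}, *V4, covered={0, 1}))       # False
print(mcd_ok({"X": "C", "Y": "D"}, *V5, covered={2}))          # True
print(mcd_ok({"X": "F", "Y": "G"}, *V6, covered={2}))          # False
print(mcd_ok({"X": "F", "Y": "G"}, *V6_H, covered={0, 1, 2}))  # True
```

v4 fails because Y is mapped to the existential B, yet v4 has no s subgoal to cover s(X,Y). v6 succeeds only after F and H are equated.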
miniCon Algorithm Key steps: • For every query SG and view SG, form an MCD = (h(view predicate), h, φ, query SGs covered). • h – the least restrictive head homomorphism (what is that?). • φ – the partial c.m. enabled by h.
miniCon Algorithm (contd.) • For every possible way of covering all query SGs, “chain” the corresponding MCDs of the views in the covering, and form a rewriting. The union of all such rewritings is the MCR of Q w.r.t. the given views. • See paper for details of the algorithm and properties of the miniCon approach. • miniCon – best known algorithm for QAV; empirically shown to scale to 1000s of views (i.e., sources).
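The combination step can be sketched as picking sets of MCDs whose covered subgoals are pairwise disjoint and together make up the whole query. A minimal sketch, assuming each MCD is reduced to a (label, covered-subgoal-indices) pair and that all subsets are enumerated by brute force, which the actual algorithm need not do. The name combine and the labels are illustrative.

```python
from itertools import combinations


def combine(mcds, n_subgoals):
    """Return tuples of MCD labels that partition subgoals 0..n_subgoals-1."""
    everything = frozenset(range(n_subgoals))
    result = []
    for r in range(1, len(mcds) + 1):
        for combo in combinations(mcds, r):
            sets = [covered for _, covered in combo]
            # sizes adding up to n plus full coverage means the sets are disjoint
            if (sum(len(s) for s in sets) == n_subgoals
                    and frozenset().union(*sets) == everything):
                result.append(tuple(label for label, _ in combo))
    return result


# Subgoals of q1: 0 = c(X,Y), 1 = c(Y,X), 2 = s(X,Y).
# MCDs from the running example: v5 covers only s, v6 (with F=H) covers all three.
MCDS = [("v5(X,Y)", frozenset({2})), ("v6(X,X)", frozenset({0, 1, 2}))]

print(combine(MCDS, 3))  # [('v6(X,X)',)]
```

v5's MCD never appears in a rewriting here because no other MCD covers exactly the two c subgoals.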
CQs with built-ins. E.g.: y = year. q2(X) :- inS(X), c(X,Y), y(X,R1), y(Y,R2), R1>=1990, R2<=1985. v7(A,S1) :- inS(A), c(A,B), y(A,S1), y(B,S2), S2<=1983. v8(A,S1) :- inS(A), c(A,B), y(A,S1), y(B,S2), S2<=1987. v7 covers all ordinary SGs via h (identity) and φ: X→A, Y→B, R1→S1, R2→S2. R2<=1985 is satisfied by S2<=1983. But what about R1>=1990? We can impose S1>=1990. Question: could we impose it if the view were instead v7(A) with the same body? What happens when v8 is considered?
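The case analysis for comparison subgoals can be sketched as a three-way classification. A minimal sketch, assuming comparisons are (variable, operator, constant) triples with only <= and >=: a query comparison is "implied" if the view already enforces a bound at least as tight on the mapped variable, "impose" if the mapped variable is distinguished so the rewrite can add the comparison itself, and "fail" otherwise. The function name classify and the labels are hypothetical.

```python
def classify(query_cmp, phi, view_cmps, view_head):
    """Classify one query comparison with respect to a single view."""
    var, op, bound = query_cmp
    target = phi[var]  # view variable the query variable is mapped to
    for v, vop, vbound in view_cmps:
        if v == target and vop == op:
            if (op == "<=" and vbound <= bound) or (op == ">=" and vbound >= bound):
                return "implied"  # the view's own bound is at least as tight
    # otherwise the rewrite can only filter on variables the view exposes
    return "impose" if target in view_head else "fail"


PHI = {"X": "A", "Y": "B", "R1": "S1", "R2": "S2"}
QUERY_CMPS = [("R1", ">=", 1990), ("R2", "<=", 1985)]

V7 = (("A", "S1"), [("S2", "<=", 1983)])
V8 = (("A", "S1"), [("S2", "<=", 1987)])
V7_NARROW = (("A",), [("S2", "<=", 1983)])  # the "v7(A), same body" variant

for name, (head, cmps) in [("v7", V7), ("v8", V8), ("v7(A)", V7_NARROW)]:
    print(name, [classify(c, PHI, cmps, head) for c in QUERY_CMPS])
# v7 ['impose', 'implied']
# v8 ['impose', 'fail']
# v7(A) ['fail', 'implied']
```

Under these assumptions v8 is unusable because S2<=1987 does not guarantee R2<=1985 and S2 is not exposed in the head, and v7(A) is unusable because S1 is no longer available to filter on.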