180 likes | 287 Vues
In this talk, we explore the intricacies of view matching algorithms specifically tailored for outer joins. We begin with the motivation behind utilizing outer joins in database queries, especially in OLAP and hierarchical views. We define key concepts, present the join-disjunctive normal form, and discuss a robust view-matching algorithm. We also share experimental results that illustrate the efficiency of our approach in optimizing queries involving outer joins, including applications in generating negative subqueries. This research was presented at VLDB 2005.
E N D
View Matching for Outer-Join Views Paul Larson and Jingren ZhouMicrosoft Research
Outline of talk • Motivation • Definitions and notation • Join-disjunctive normal form • View-matching algorithm • Experimental results VLDB 2005
Outer-join usage Outer joins can be used for several purposes • OLAP queries that preserve tuples from dimension tables • Construct hierarchical views (XML) that preserve objects with no children • Computing negative subqueries (NOT EXISTS, NOT IN, ALL…) SELECT Department.dname FROM Department WHERE NOT EXISTS (SELECT * FROM Employee WHERE Department.dno = Employee.dno) SELECT Department.dname FROM Department LEFT OUTER JOIN Employee ON (Department.dno = Employee.dno) WHERE Employee.dno IS NULL VLDB 2005
Motiving example SELECT ... FROM Part LEFT OUTER JOIN Lineitem ON (p_partkey=l_partkey) ? View V: SELECT * FROM Part LEFT OUTER JOIN (Orders LEFT OUTER JOIN Lineitem ON (o_orderkey=l_orderkey)) ON (p_partkey=l_partkey) NOTE:l_orderkey -> o_orderkey l_partkey -> p_partkey Rewrite:SELECT ... FROM VWHERE p_partkey is not null ? SELECT ... FROM Orders LEFT OUTER JOIN Lineitem ON (o_orderkey=l_orderkey) VLDB 2005
Definitions and notation (1) • Predicate P is strong (null-rejecting) if P is false when any referenced column is null • Outer union R ⊎ S • Schema is union of columns in R and S • Null-extend input tuples and take union VLDB 2005
Definitions and notation (2) • A tuple t1 subsumes as tuple t2 if • t1 agrees with t2 on all non-null columns and • t1 contains fewer null values than t2 • Removal of subsumed tuples T⇓ eliminates all subsumed tuples from T • Minimum union: R ⊕ S = (R ⊎ S)⇓ • Is associative and commutative • Left outer join: T1 o⋈p T2 = (T1 ⋈p T2) ⊕ T1 • Full outer join: T1 o⋈op T2 = (T1 ⋈p T2) ⊕ T1 ⊕ T2 VLDB 2005
Normalize σp1∧p2(P×O×L) σp1(O×L) O P Join-disjunctive normal form Full outer join P2 is null-rejecting σp2(P,O), σp1∧p2(P,O,L); P; O, σp1(O,L); Left outer join o⋈op2 Part o⋈p1 O; σp1(O,L) Orders Lineitem Minimum union σp1∧p2(P×O×L) ⊕ σp1(O×L)⊕O ⊕ P C. Galindo-Legaria, Sigmod, 1994 VLDB 2005
? Matching term by term View V Query 1 Query 2 o⋈p2 o⋈p1 o⋈p2 Part o⋈p1 Part Lineitem Orders Lineitem Foreign-key join from L to O Orders Lineitem σp1(O,L) ⊕ O σp2(P,L) ⊕ P σp1∧p2(P,O,L) ⊕ P VLDB 2005
Outer-join view matching View Step 1: Convert to normal form Step 2: Check containment of terms Step 3: Recover required terms - selects with not-null predicates Step 4: Select desired tuples - apply residual query predicates Step 5: Eliminate subsumed tuples - selects + [duplicate elimination] Step 6: Combine partial results - outer union of partial results - reduce no of scans Query VLDB 2005
Example view and query View V1: Select lok, ln, lq, lp, ok, od, otp, ck, cn, cnkfrom (select * from C where cnk < 10) Cr right outer join ((select * from O where otp > 50 ) Or full outer join (select * from L where lq < 100) Lr on (ok = lok) ) OLj on (ck = ock) Query Q1: Select lok, lq, lp, od, otpfrom (select * from O where otp > 150 ) Or right outer join (select * from L where lq < 100) Lr on (ok = lok) VLDB 2005
Step 1: Convert to normal form V1 = σ[cnk<10 & otp >50 & lq <100 & jp_co & jp_ol] (C,O,L) ⊕ σ[cnk<10 & otp >50 & jp_co] (C,O) ⊕ σ[otp >50 & lq < 100 & jp_ol] (O,L) ⊕σ[otp >50 ] (O) ⊕σ[lq < 100] (L) Q1 = σ[otp >150 & lq < 100 & jp_ol] (O,L) ⊕σ[lq < 100] (L) VLDB 2005
Step 2: Check containment Is every query tuple contained in the view? Test each term: pred(Q) ⇒pred(V) ? (O,L)-term: (otp >150 & lq < 100 & jp_ol) ⇒(otp >50 & lq < 100 & jp_ol) (L)-term: (lq < 100) ⇒ (lq < 100) VLDB 2005
Step 3: Term recovery • (O,L)-term: • Duplicate elimination not required because the (C,O,L) term has the same hub as the (O,L) term • Non-null columns available for O and L • σ[ok ≠ null & ln ≠ null] V1 • (L)-term: • Duplicate elimination not required because the terms (C,O,L) and (O,L) have the same hub as the (L) term • Non-null columns available for L • σ[ln ≠ null] V1 VLDB 2005
Step 4: Select desired tuples • (O,L)-term: • View predicate: (otp > 50 & lq < 100 & jp_ol) • Query predicate: (otp >150 & lq < 100 & jp_ol) • σ[otp > 150]σ[ok ≠ null & ln ≠ null] V1 • (L)-term: • View predicate: (lq < 100) • Query predicate: (lq < 100) • σ[ln ≠ null] V1 VLDB 2005
Step 5: Eliminate subsumed tuples • (O,L)-term: • Maximal term contains no subsumed tuples • σ[otp > 150 & ok ≠ null & ln ≠ null] V1 • (L)-term: • An (L)-tuple may be subsumed by a tuple in the (O,L)-term • Discard all tuples that also satisfy predicate of (O,L)-term • σ[ln ≠ null & ~(otp > 150 & ok ≠ null & ln ≠ null) ] V1 VLDB 2005
Step 6: Combine partial results • Project each term onto required columns • Combine using outer union • Q1 =Πlok, lq, lp, od, otpσ[otp > 150 & ok ≠ null & ln ≠ null] V1⊎Πlok, lq, lp σ[ln ≠ null & ~(otp > 150 & ok ≠ null & ln ≠ null) ] V1 • The two predicates are mutually exclusive • Combine the two scans to produce final substitute expression • Q1 =Πlok, lq, lp, cstmtσ[ln ≠ null ) ] V1 • cstmt = case when ln ≠ null & ~(otp > 150 & ok ≠ null & ln ≠ null) then null, null else od, otp end VLDB 2005
Experimental results Database: TPC-R, 1GB View: σ(C)⊕ σ(C,O) ⊕σ(C,O,L) Query 1: σ(C,O,L) Query 2: σ(C,O) ⊕σ(C,O,L) Query 3: σ(C,O) ⊕σ(C,O,L), two scans, one with duplicate elimination Query 4: σ(C,O), duplicate elimination VLDB 2005
Conclusion • First general view-matching algorithm for outer-join views • Produces efficient substitute expressions • Extended to outer-join views with aggregation (see paper and tech report) • Additional results and proofs in tech report (MSR-TR-2005-78) available at www.research.microsoft.com VLDB 2005