290 likes | 397 Vues
This discussion revolves around advanced techniques in SQL query optimization through logical plan rewriting. The focus is on transforming logical queries into efficient relational algebra representations while applying transformation rules to preserve equivalence and reduce execution costs. We explore how to estimate result sizes and costs for different query plans, incorporating statistics and physical plan considerations. The presentation includes practical examples and heuristics for identifying good transformations, emphasizing early selection and rule application that facilitate effective query processing.
E N D
CS4432: Database Systems II Logical Plan Rewriting
SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} query processing
Query in SQL Query Plan in Algebra (logical) Other Query Plan in Algebra (logical)
Query plan 1 (in relational algebra) B,D sR.A=“c” S.E=2 R.C=S.C X R S
Query plan 2 (in relational algebra) B,D sR.A = “c”sS.E = 2 R S natural join on R.C=S.C
Relational algebra optimization • What are transformation rules ? • preserve equivalence • What are good transformations? • reduce query execution costs
Can also write as trees, e.g.: T R R S S T Rules:Natural join rewriting. R S = S R (R S) T = R (S T)
Rules: Other binary operators ? R S = S R (R S) T = R (S T) • What about : • Cross product? • Condition join? • Union? • Intersection ? • Difference ?
Rules: Selects sp1p2(R)= sp1 [ sp2 (R)] [ sp1 (R)] U [ sp2 (R)] sp1vp2(R) =
Bags vs. Sets R = {a,a,b,b,b,c} S = {b,b,c,c,d} What about union R U S = ? • Option 1 SUM • R U S = {a,a,b,b,b,b,b,c,c,c,d} • Option 2 MAX • R U S = {a,a,b,b,b,c,c,d}
Which option makes this rule work ?sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c • sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,c} Let us try MAX():
Which option makes this rule work ?sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c • sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,b,b,b,c} What about Sum()?
Which option makes this rule work ?sp1p2(R)= sp1 [ sp2 (R)] Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c What about MAX versus SUM ?
Yet another example ! Senators (……) Reps (……) T1 = pyr,state Senators; T2 = pyr,state Reps T1 Yr State T2 Yr State 97 CA 99 CA 99 CA 99 CA 98 AZ 98 CA Union? “Sum” option makes more sense!
Executive Decision -> Use “SUM” option for bag unions -> CAREFUL ! Some rules cannot be used for bags
Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)]
[sp (R)] S R [sq (S)] Rules:s + combined Let p = predicate with only R attributes q = predicate with only S attributes m = predicate with both R and S attribs sp (R S) = sq (R S) =
Rules:s + combined spq (R S) = ? Rule can be derived !
Derivation for rule : spq (R S) = sp [sq (R S) ] = sp[ R sq (S) ] = [sp (R)] [sq (S)]
Rules:s + combined (continued) More Rules can be Derived: spq (R S) = spqm (R S) = spvq (R S) =
We did one, do others on your own : spq (R S) = [sp (R)] [sq (S)] spqm (R S) = sm[(sp R) (sq S)] spvq (R S) = [(sp R) S] U [R(sq S)]
pxz px Rules:p,s combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = {sp [ px (R) ]}
pxy{[pxz (R) ][pyz (S) ]} Rules:p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S)=
Conventional wisdom: do projects early Example: relation R(A,B,C,D,E) predicate P: (A=3) (B=“cat”) pE {sp(R)} vs. pE {sp{pABE(R)}}
What if we have A, B indexes? But B = “cat” A=3 Intersect pointers to get pointers to matching tuples! Then better to do projection later !
Which are “good” transformations? sp1p2 (R) sp1 [sp2 (R)] sp (R S) [sp (R)] S R S S R px [sp(R)] px {sp [pxz(R)]}
Bottom line: • Some heuristics : • Early selection is usually good • No transformation is always good • Rule application defines a search space • Need cost criteria to make decision
In textbook: more transformations • Chapter 16.2, 16.3.3 • More rewrite rules • Other operations, such as, duplicate elimination, etc.