CS4432: Database Systems II

CS4432: Database Systems II Logical Plan Rewriting

SQL query parse parse tree convert answer logical query plan execute apply laws statistics Pi “improved” l.q.p pick best estimate result sizes {(P1,C1),(P2,C2)...} l.q.p. +sizes estimate costs consider physical plans {P1,P2,…..} query processing

Query in SQL  Query Plan in Algebra (logical)  Other Query Plan in Algebra (logical)

Query plan 1 (in relational algebra) B,D sR.A=“c” S.E=2  R.C=S.C X R S

Query plan 2 (in relational algebra) B,D sR.A = “c”sS.E = 2 R S natural join on R.C=S.C

Relational algebra optimization • What are transformation rules ? • preserve equivalence • What are good transformations? • reduce query execution costs

Can also write as trees, e.g.: T R R S S T Rules:Natural join rewriting. R S = S R (R S) T = R (S T)

Rules: Other binary operators ? R S = S R (R S) T = R (S T) • What about : • Cross product? • Condition join? • Union? • Intersection ? • Difference ?

Rules: Selects sp1p2(R)= sp1 [ sp2 (R)] [ sp1 (R)] U [ sp2 (R)] sp1vp2(R) =

Bags vs. Sets R = {a,a,b,b,b,c} S = {b,b,c,c,d} What about union R U S = ? • Option 1 SUM • R U S = {a,a,b,b,b,b,b,c,c,c,d} • Option 2 MAX • R U S = {a,a,b,b,b,c,c,d}

Which option makes this rule work ?sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c • sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,c} Let us try MAX():

Which option makes this rule work ?sp1vp2 (R) = sp1(R) U sp2(R) Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c • sp1vp2 (R) = {a,a,b,b,b,c}sp1(R) = {a,a,b,b,b}sp2(R) = {b,b,b,c}sp1(R) U sp2 (R) = {a,a,b,b,b,b,b,b,c} What about Sum()?

Which option makes this rule work ?sp1p2(R)= sp1 [ sp2 (R)] Example: R={a,a,b,b,b,c} P1 satisfied by a,b; P2 satisfied by b,c What about MAX versus SUM ?

Yet another example ! Senators (……) Reps (……) T1 = pyr,state Senators; T2 = pyr,state Reps T1 Yr State T2 Yr State 97 CA 99 CA 99 CA 99 CA 98 AZ 98 CA Union? “Sum” option makes more sense!

Executive Decision -> Use “SUM” option for bag unions -> CAREFUL ! Some rules cannot be used for bags

Rules: Project Let: X = set of attributes Y = set of attributes XY = X U Y pxy (R) = px [py (R)]

[sp (R)] S R [sq (S)] Rules:s + combined Let p = predicate with only R attributes q = predicate with only S attributes m = predicate with both R and S attribs sp (R S) = sq (R S) =

Rules:s + combined spq (R S) = ? Rule can be derived !

Derivation for rule : spq (R S) = sp [sq (R S) ] = sp[ R sq (S) ] = [sp (R)] [sq (S)]

Rules:s + combined (continued) More Rules can be Derived: spq (R S) = spqm (R S) = spvq (R S) =

We did one, do others on your own : spq (R S) = [sp (R)] [sq (S)] spqm (R S) = sm[(sp R) (sq S)] spvq (R S) = [(sp R) S] U [R(sq S)]

pxz px Rules:p,s combined Let x = subset of R attributes z = attributes in predicate P (subset of R attributes) px[sp (R) ] = {sp [ px (R) ]}

pxy{[pxz (R) ][pyz (S) ]} Rules:p, combined Let x = subset of R attributes y = subset of S attributes z = intersection of R,S attributes pxy (R S)=

Which are “good” transformations?

Conventional wisdom: do projects early Example: relation R(A,B,C,D,E) predicate P: (A=3)  (B=“cat”) pE {sp(R)} vs. pE {sp{pABE(R)}}

What if we have A, B indexes? But B = “cat” A=3 Intersect pointers to get pointers to matching tuples! Then better to do projection later !

Which are “good” transformations? sp1p2 (R) sp1 [sp2 (R)] sp (R S)  [sp (R)] S R S  S R px [sp(R)] px {sp [pxz(R)]}

Bottom line: • Some heuristics : • Early selection is usually good • No transformation is always good • Rule application defines a search space • Need cost criteria to make decision

In textbook: more transformations • Chapter 16.2, 16.3.3 • More rewrite rules • Other operations, such as, duplicate elimination, etc.

CS4432: Database Systems II

CS4432: Database Systems II

Presentation Transcript

Chapter 8 Physical Database Design

Principles of Database Management Systems 4.2: Hashing Techniques

Mobile Database Systems

Chapter 12: Database Managers on z/OS

CS216 Advanced Database Systems

Database Systems: Design, Implementation, and Management Tenth Edition

Advanced Database Systems

Database Systems: Design, Implementation, and Management Tenth Edition

CPS216: Advanced Database Systems Notes 07:Query Execution (Sort and Join operators)

COP5725 Advanced Database Systems

CSC 545: Database Systems

CPS216: Advanced Database Systems

Introduction to Database Management Systems (DBMS)

Fundamentals of Database Systems Transaction Management

Database II

CS216 Advanced Database Systems

CS216 Advanced Database Systems

Mobile Database Systems

Database Management Systems Session 9