100 likes | 252 Vues
Query Optimization Advanced Databases By Dr. Akhtar Ali. Overview of Query Optimization. Logical Optimization: Using some transformation rules and algebraic equivalences To choose different join orders R ⋈ S S ⋈ R (Commutative) (R ⋈ S) ⋈ T R ⋈ ( S ⋈ T) (Associative)
E N D
Query Optimization Advanced Databases By Dr. Akhtar Ali
Overview of Query Optimization • Logical Optimization: Using some transformation rules and algebraic equivalences • To choose different join orders • R ⋈S S ⋈R (Commutative) • (R ⋈S) ⋈T R ⋈ (S ⋈T) (Associative) • To push selections and projections bellow of joins • (attr1 op value (R) ) ⋈S attr1 op value(R ⋈S ) • attr1, attr2, attr3 ((attr1, attr2 (R) ) ⋈S ) attr1, attr2,attr3 (R ⋈S ) • Physical Optimization: For a given R.A expression • There could be several different plans possible using different implementation of the R.A operators • Calculate the cost of these different plans and choose the best • Ideally, we want the best plan, but • Practically, we should avoid worst plans
Pushing Selection & Projection below Join • A join is quite expensive operation. • The cost can be reduced by reducing the sizes of the input relations. • The sizes of the Input Relations can be reduced by applying: • Selection: restricting the input relation. • Projection: reducing the number of columns. • Usually Selection reduces the size of the input relation more than Projection. • Projection before the Join should be done quite carefully as the cost of Projection could increase the overall cost if it does not reduce by a good factor the size of the input.
Estimating the Evaluation Cost of a Plan • For each plan, we should be able to estimate the overall cost. • For each node in the query tree, we estimate the cost of performing the corresponding operation; • For each node in the query tree, we estimate the size of the result, which is used by the operation in the parent node; • In order to correctly estimate the cost of each operation and the size of its result, the optimizer uses certain statistical information maintained by the DBMS.
Statistics Maintained by a DBMS • Cardinality • The number of tuples for each relation. • Size • The number of pages for each relation. • Index Cardinality • The number of distinct key values for each index. • Index Size • The number of pages for each index. • Index Height • The number of non-leaf levels for each tree index. • Index Range • The minimum present key value and the maximum present key value for each index.
A motivating example SELECT S.sname FROM Reserves R, Sailors S Where R.sid = S.sid AND R.bid = 100 AND S.rating > 5 • Cost = 500 + 500 * 1000 I/Os (Using nested loops – page-oriented) • 500,500 I/Os • This is not an efficient plan. • We could have pushed selections down before join, no index is used. • Goal of Optimization: • To find more efficient plans that compute the same answer.
Optimization: Alternative 1 (no index) • Push Selects before join • Assuming that there are 1000 tuples in Reserves with bid=100 and 20000 tuples in Sailors with rating > 5. So Cost of Selections: • Scan Reserves (1000 pages) and write the selected 1000 tuples to temp relation T1 (10 pages), so in total 1010 I/Os • Scan Sailors (500 pages) and write the selected 20000 tuples to temp relation T2 (250 pages), so in total 750 I/Os • Total cost = 1010 + 750 = 1760 I/Os so far • Using SNL the cost = 10 + 10 * 250 = 2510 I/Os • Total cost = 1760 + 2510 = 4270 I/Os about 117 times less than the cost of the initial plan i.e. 500,500 I/Os
Optimization: Alternative 2 (uses indexes) • The same RA as alternative 1 • Using a clustered Hash index on Reserves: • A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples • Cost of Selection = 1 + 10 = 11 I/Os • Cost of Writing to T1 = 10 I/Os • Sub-Total = 11 +10 = 21 • The size of T1 = 10 pages • Using a clustered B+ tree index on Sailors: • Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os, Cost of Writing to T2 = 250 I/Os, Sub-Total = 252 + 250 = 502, and the size of T2 = 250 pages • Using SNL the cost is: • Cost = 10 + 10 * 250 = 2510 I/Os • Total cost = 21 + 502 + 2510 = 3033 I/Os, which is 165 times less than the cost of the initial plan i.e. 500,500 I/Os
Optimization: Alternative 3(using pipelining i.e. intermediate results are not written to disk) • The same RA as alternative 1 • Using a clustered Hash index on Reserves: • A hash index will take 1 plus 10 to retrieve the 1000 qualifying tuples • Cost of Selection = 1 + 10 = 11 I/Os • The size of T1 = 10 pages • Using a clustered B+ tree index on Sailors: • Cost of Selection = 2 (the constant cost) + 250 = 252 I/Os • The size of T2 = 250 pages • Using Block-Nested Loops join with 7 buffer pages: • Cost = • Total cost = 11 + 252 + 510 = 773 I/Os, which is 647 times faster than the initial plan and about 4 times faster than the previous one.
Summary • Query optimization is very important component of a DBMS. • We must understand the principals of optimization to know how it influences the performance of a database system. • Two ways of optimization: • Logical: Push projection/selection below the join • Physical: Using indexes and better implementation of relational operators.