1 / 44

Determining the Cost of Algebraic Operators

Determining the Cost of Algebraic Operators. Background: Memory, Files, Indexes. Selection. External Sort. Join. Evaluating Individual Operators. Other. Finding Efficient Query Plans. Optimizer Architecture. Cost of Combining Operators. Enumerating Plans. Relational Query Optimization.

rossa
Télécharger la présentation

Determining the Cost of Algebraic Operators

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Determining the Cost of Algebraic Operators Background: Memory, Files, Indexes Selection External Sort Join Evaluating Individual Operators Other Finding Efficient Query Plans Optimizer Architecture Cost of Combining Operators Enumerating Plans

  2. Relational Query Optimization Evaluating Queries with Several Operators

  3. Topic • Until now, we have seen how to optimize queries with a single operator • Most queries have several operators • We now discuss how queries with several operators can be optimized

  4. Simplest Way to Implement Complex Queries • Evaluate operators, one at a time • After each evaluation, write temporary results to disk • Read temporary results from disk, as input to the next operator

  5. Simplest Way to Implement Complex Queries • Example: • Compute join and write result T1 to disk • Read T1, compute selection, and write result T2 to disk • Read T2, compute projection, and return to user SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid and R.bid = 100 and S.rating>5 sname(bid=100 and rating>5(Reserves Sailors))

  6. Simplest Way to Implement Complex Queries • Compute join and write result T1 to disk • Read T1, compute selection, and write result T2 to disk • Read T2, compute projection, and return to user • A query plan: sname(bid=100 and rating>5(Reserves Sailors)) sname File scan, write to T2 rating > 5 bid=100 Block nested loops, write to T1 sid=sid Sailors Reserves 6

  7. Simplest Way to Implement Complex Queries • Think about it: • Does this seem efficient? • No. join and select can be calc together, can print the name while choosing it. • 2) Can we use an index for the selection? No. the selection is for temp table that has no index. • 3) How would we implement the projection? • as we said in last part (last slide). • Compute join and write result T1 to disk • Read T1, compute selection, and write result T2 to disk • Read T2, compute projection, and return to user • A query plan: sname(bid=100 and rating>5(Reserves Sailors)) sname File scan, write to T2 rating > 5 bid=100 Block nested loops, write to T1 sid=sid Sailors Reserves 7

  8. Query Optimization: Basic Architecture Query Parser Query Optimizer Plan Generator Plan Cost Estimator Catalog Manager Query Plan Evaluator

  9. Simplifications • SQL Queries may be composed of several blocks • Each block is optimized separately • In our discussion, we assume a single block SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid and R.bid = B.bid and B.color = ‘red’ and S.rating = (SELECT MAX(rating) FROM Sailors) GROUP BY S.sid HAVING count(*)>2

  10. Simplifications • Queries may contain group-by and aggregation • Are applied as a final step in evaluation (how? Hash/ Sort) • We only consider queries without aggregation SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid and R.bid = B.bid and B.color = ‘red’ and S.rating = (SELECT MAX(rating) FROM Sailors) GROUP BY S.sid HAVING count(*)>2

  11. Simplifications • By making the simplifications discussed, we derive queries that can be translated into relational algebra. We show how to evaluate such queries. SELECT S.sid, MIN(R.day) FROM Sailors S, Reserves R, Boats B WHERE S.sid = R.sid and R.bid = B.bid and B.color = ‘red’ and S.rating = (SELECT MAX(rating) FROM Sailors) GROUP BY S.sid HAVING count(*)>2

  12. Query Plans

  13. Translating to Algebra Trees SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid and R.bid = 100 and S.rating>5 sname(bid=100 and rating>5(Reserves Sailors)) sname rating > 5 bid=100 sid=sid Sailors Reserves

  14. sname rating > 5 bid=100 sid=sid Sailors Reserves (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Block Nested Loops) sid=sid Sailors Reserves Query Plans • A query plan is a relational algebra tree, annotated with access and evaluation methods • Convention: In trees when using block/index nested loops join, left child is outer relation

  15. (On-the-fly) sname (Sort-Merge Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Block Nested Loops) sid=sid Sailors Reserves Alternative Plans • There may be many different query plans for the same query. Query optimizer must: • enumerate such plans • choose the best plan

  16. Pipelined Evaluation Evaluation “On the Fly” פעולת ביניים לא נכתבת אל הדיסק אלא מועברת ישירות לפעולה הבאה

  17. Pipelining • When evaluating a query, the results of one operator may have to be fed into another operator. This can be done in two ways: • Materializing: the first operator creates a temporary relation, which is read by the second operator • Pipelining (on the fly computation): the results of the first operator are fed directly into the second operator • Pipelining can save on the I/O costs! • We save the cost of writing a temporary result to disk! • We save the cost of reading for the following operation!

  18. Pushing Selections/Projections and Pipelining Why? • BNLO: R is read once. While reading check the selection cond. If it does add it to the projectile. Without writing to the disk. • INLO: same. • When both possible it is always better to push and pipeline

  19. When is Pipelining Impossible • Impossible if: • Input must be read several times • Block nested loops join – inner relation • Input must be read via anindex • Index nested loops join – inner relation • Input must be read in a specific order (and the previous operator does not output in that order • Sort merge join

  20. Example 1: Pipelining in Unary Operators • Suppose we want to apply two selections on the table Reserves: date<05/05/05 and bid=100. We have a BTree index on date: • Materializing: Find tuples matching date condition using BTree. Store in temporary relation. Scan relation to find tuples matching bid condition • Pipelining: Find tuples matching date condition using BTree. As tuples are retrieved, apply second condition • Pipelining into unary operators is called on-the-flyevaluation

  21. C B A Example 2: Pipelining in Joins • Suppose we want to compute (A B) C • Materializing: Compute A B, store and then read and join with C • Pipelining: The joins can be pipelined if some type of nested loops join is used. • Read blocks of tuples from A • Find matching tuples from B (using a scan or index) • Find matching tuples from C

  22. C B A Pipelining Joins: Questions • Can the execution of the join be pipelined if the join with C is performed using sort-merge join? • No. • Example: R(A, B), S(B, C), T(C, D) • Sort-merge (R, S) will be sorted by B • Sort-merge (S, T) will be sorted by C • Therefore C must be read from the file, Meaning it must have been written to the file

  23. Fully Pipelined Execution • The execution is fully pipelinedif no temporary relations are written to disk throughout the entire query processing

  24. D D C C D B A C B A B A Pipelining Joins: Questions • Which of the following trees can allow for fully pipelined execution (e.g., using a nested loops join)? Middle one.

  25. D D C C D B A C B A B A Left Deep Plans • A plan is left-deep if the inner relation of each join (i.e., the right child of each join operator) is a base relation. Which of the following are left deep? • Left deep plans allow for fully pipelined execution

  26. Alternative Plans Motivating Example

  27. Motivating Example • We will consider the cost of various plans for the following query. • NOTE: No distinct in SELECT clause! SELECT S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid and R.bid = 100 and S.rating>5

  28. What is the cost of this plan, if the buffer is of size 5? BR + BS * BR / (B-2) 1000+500*1000/3= 168,000 What would you suggest to change in this plan? Switch between the join and the select Switch between S and R Do the projection early (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Block Nested Loops) sid=sid Sailors Reserves

  29. (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Block Nested Loops) sid=sid Sailors Reserves Alternative Plan: Pushing Selections • Selections generally reduce the sizes of the relations • Often a good strategy to push the selections, i.e., apply early, so that the join is over smaller relations • Which selections could be pushed in this query?

  30. (On-the-fly) sname (Block NL Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors • This is an plan has fully-pushed selections כל הבחירות נעשות כמה שיותר מוקדם

  31. (On-the-fly) sname (Block NL Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors • Find cost, assuming: • 5 buffer pages • 100 different boats, uniform distribution on bid • rating between 1 and 10, uniformly distributed

  32. (On-the-fly) sname (Block NL Join) sid=sid (Scan; (Scan; write to write to rating > 5 bid=100 temp T2) temp T1) Reserves Sailors • Finding cost: קריאת reserve: 1000 כתיבת T1: גודל T1 הוא כ 1000/100. לכן עולה כ 10 קריאת salors: 500 כתיבת T2: 500/2 = 250. לכן כתיבת T2 תעלה כ 250 Block Nested Loop: 10 + 250*10/3 = 1010 סה"כ העלות היא: 2770 = 1000+10+500+250+1010

  33. Pushing Selections/Projectionsand Pipelining If we tried to push selection/projection we would no longer have an index

  34. (On-the-fly) sname (On-the-fly) rating > 5 bid=100 (Block Nested Loops) sid=sid Sailors Reserves Alternative Plan: Pushing Projections • Projections also reduce the sizes of the relations • Often a good strategy to push the projections, i.e., apply early, so that the join is over smaller relations • Which projections could be pushed in this query? • Note: in general, duplicate elimination is performed only in the outermost projection

  35. This is an plan has fully-pushed selections and projections (On-the-fly) sname (Block NL Join) sid=sid (On-the-fly; write to T2) (On-the-fly; write to T1) sid sname,sid (Scan) (Scan) rating > 5 bid=100 Reserves Sailors

  36. נניח ש sid תופס 10 byte, ונניח sid, sname תופס 25 byte חישוב עלות קריאת Reserve: 1000 כתיבת T1: 1000/(100*4) = 3 (ערך עליון) קריאת Salors: 500 כתיבת T2: 500/(2*2) = 125 חישוב BNL: 3 + 125*3/3 = 128 סה"כ העלות: 1000+3+500+125+128 = 1756 (On-the-fly) sname (Block NL Join) sid=sid (On-the-fly; write to T2) (On-the-fly; write to T1) sid sname,sid (Scan) (Scan) rating > 5 bid=100 Reserves Sailors

  37. Alternative Plan: Using Indexes • In there are indexes, then it may be preferable to access one (or more) of the relations via the indexes, instead of by using a sequential scan • Suppose we have a clustered hash index on bid of Reserves and a hash index on sid of Sailors

  38. Find cost, assuming that access to hash table indices costs 1.2 I/Os Is the selection fully pushed? No.Is this good? No (index). (On-the-fly) sname (On-the-fly) rating > 5 (Index Nested Loops, with pipelining ) sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Alternative Plan: Using Indexes

  39. חישוב עלויות עלות INL: BR+TR(time to find matching raws in S) עלות קריאת R היא בזמן הבחירה, לכן אין צורך לקרוא אותו שוב בשלב ה INL. קריאת Reserve: 1.2 + 1000*100/(100*100) = 11.2 חלוקה אחת ב 100 כי בהסתברות 0.01 נקבל bid=100 חלוקה שניה ב 100 היא בגלל שהאינדקס מכווץ. עלות INL: 1000*(1.2+1) סה"כ העלות: 2211.2 (On-the-fly) sname (On-the-fly) rating > 5 (Index Nested Loops, with pipelining ) sid=sid (Use hash Sailors bid=100 index; do not write result to temp) Reserves Alternative Plan: Using Indexes

  40. Determining the Cost of Algebraic Operators Background: Memory, Files, Indexes Selection External Sort Join Evaluating Individual Operators Other Finding Efficient Query Plans Optimizer Architecture Cost of Combining Operators Enumerating Plans

  41. Calculating the Costs: Block NL • N is number of blocks in the outer relation • M is number of blocks in the inner relation • B is number of buffer blocks • Cost, selections/projections are not pipelined: N + M(N/(B-2)) • Cost, selections/projections are pipelined on outer relation: M(N/(B-2)) • Notes: If selections/projections are pushed through the join, then N and M are the sizes of the relations AFTER the selections/projections

  42. Calculating the Costs: Index NL • N is number of blocks in the outer relation • T is number of tuples in the outer relation • S is the cost of traversing the index • Tmis number of tuples in the inner relation that match each tuple in the outer relation • P is number of tuples in the inner relation that fit into a page • Notes: If selections/projections are pushed through the join, then N and T are the sizes AFTER the selections/projections

  43. Calculating the Costs: Sort-Merge Join • N is number of blocks in the outer relation • M is number of blocks in the inner relation • B is number of buffer blocks • Cost: 2N(1+ logB-1(N/B)) + 2M(1+ logB-1(M/B)) + N + M • If outer relation is already sorted: 2N(1+ logB-1(N/B)) + N + M • If inner relation is already sorted: 2M(1+ logB-1(M/B)) + N + M • Bother relations are already sorted: N + M • Notes: If selections are pushed through the join, then N and M are the sizes of the relations AFTER the selections

  44. Determining the Cost of Algebraic Operators Background: Memory, Files, Indexes Selection External Sort Join Evaluating Individual Operators Other Finding Efficient Query Plans Optimizer Architecture Cost of Combining Operators Enumerating Plans

More Related