1 / 61

Cse 344

Cse 344. February 12 th – RDBMS Internals. Administrivia. HW5 out tonight OQ5 out Wednesday Both due February 21 (11:30 & 11:00) Exam grades on canvas by Wednesday Handed back in section on Thursday. Today. Back to RDBMS ”Query plans” and DBMS planning

msawyer
Télécharger la présentation

Cse 344

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cse 344 February 12th– RDBMS Internals

  2. Administrivia • HW5 out tonight • OQ5 out Wednesday • Both due February 21 (11:30 & 11:00) • Exam grades on canvas by Wednesday • Handed back in section on Thursday

  3. Today • Back to RDBMS • ”Query plans” and DBMS planning • Management between SQL and execution • Optimization techniques • Indexing and data arrangement

  4. Query Evaluation Steps SQL query Parse & Rewrite Query Logicalplan (RA) Select Logical Plan Queryoptimization Select Physical Plan Physicalplan Query Execution Disk

  5. Logical vs Physical Plans • Logical plans: • Created by the parser from the input SQL text • Expressed as a relational algebra tree • Each SQL query has many possible logical plans • Physical plans: • Goal is to choose an efficient implementation for each operator in the RA tree • Each logical plan has many possible physical plans

  6. Review: Relational Algebra Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) SELECTsname FROM Supplier x, Supply y WHEREx.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ πsname σscity=‘Seattle’and sstate=‘WA’ andpno=2 sid = sid Relational algebra expression is also called the “logical query plan” Supplier Supply

  7. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 1 (On the fly) πsname A physical query plan is a logical query plan annotated with physical implementation details (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 SELECTsname FROM Supplier x, Supply y WHEREx.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Nested loop) sid = sid Supplier Supply (File scan) (File scan)

  8. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 2 (On the fly) πsname Same logical query plan Different physical plan (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 SELECTsname FROM Supplier x, Supply y WHEREx.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (Hash join) sid = sid Supplier Supply (File scan) (File scan)

  9. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Physical Query Plan 3 Different but equivalent logical query plan; different physical plan πsname (d) (On the fly) SELECTsname FROM Supplier x, Supply y WHEREx.sid = y.sid and y.pno = 2 and x.scity = ‘Seattle’ and x.sstate = ‘WA’ (c) (Sort-merge join) sid = sid (Scan & write to T1) (a) σscity=‘Seattle’and sstate=‘WA’ (b) σpno=2 (Scan &write to T2) Supplier Supply (File scan) (File scan) CSE 344 - 2017au

  10. Query Optimization Problem • For each SQL query… many logical plans • For each logical plan… many physical plans • Next: we will discuss physical operators;how exactly are query executed?

  11. Physical Operators Each of the logical operators may have one or more implementations = physical operators Will discuss several basic physical operators, with a focus on join

  12. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sidSupply Propose three physical operators for the join, assuming the tables are in main memory:

  13. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sidSupply Propose three physical operators for the join, assuming the tables are in main memory: • Nested Loop Join O(??) • Merge join O(??) • Hash join O(??) CSE 344 - 2017au

  14. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Main Memory Algorithms Logical operator: Supplier ⨝sid=sidSupply Propose three physical operators for the join, assuming the tables are in main memory: • Nested Loop Join O(n2) • Merge join O(n log n) • Hash join O(n) … O(n2) CSE 344 - 2017au

  15. BRIEF Review of Hash Tables Separate chaining: A (naïve) hash function: Duplicates OK WHY ?? h(x) = x mod 10 Operations: find(103) = ?? insert(488) = ??

  16. BRIEF Review of Hash Tables • insert(k, v) = inserts a key k with value v • Many values for one key • Hence, duplicate k’s are OK • find(k) = returns the list of all values v associated to the key k CSE 344 - 2017au

  17. Iterator Interface Each operator implements three methods: • open() • next() • close() CSE 344 - 2017au

  18. Iterator Interface Example “on the fly” selection operator interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Iterator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; while (!found) { Tuple in = child.next(); if (in == EOF) return EOF; found = p(in); } return in; } void close () { child.close(); } }

  19. Iterator Interface Example “on the fly” selection operator interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; while (!found) { Tuple in = child.next(); if (in == EOF) return EOF; found = p(in); } return in; } void close () { child.close(); } }

  20. Iterator Interface Example “on the fly” selection operator interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { } }

  21. Iterator Interface Example “on the fly” selection operator interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(in); } } }

  22. Iterator Interface Example “on the fly” selection operator interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(in); } return r; } }

  23. Iterator Interface Example “on the fly” selection operator interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } class Select implements Operator {... void open (Predicate p, Operator child) { this.p = p; this.child = child; } Tuple next () { boolean found = false; Tuple r = null; while (!found) { r = child.next(); if (r == null) break; found = p(in); } return r; } void close () { child.close(); } }

  24. Iterator Interface interfaceOperator { // initializes operator state // and sets parameters void open (...); // calls next() on its inputs // processes an input tuple // produces output tuple(s) // returns null when done Tuple next (); // cleans up (if any) void close (); } Operator q = parse(“SELECT ...”); q = optimize(q); q.open(); while (true) { Tuple t = q.next(); if (t == null) break; else printOnScreen(t); } q.close(); Query plan execution

  25. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join (On the fly) πsname (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  26. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join open() (On the fly) πsname (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  27. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  28. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 open() (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  29. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 open() (Nested loop) sno = sno open() Suppliers Supplies (File scan) (File scan)

  30. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join open() (On the fly) πsname open() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 open() (Nested loop) sno = sno open() open() Suppliers Supplies (File scan) (File scan)

  31. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join next() (On the fly) πsname (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  32. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  33. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 next() (Nested loop) sno = sno Suppliers Supplies (File scan) (File scan)

  34. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 next() (Nested loop) sno = sno next() Suppliers Supplies (File scan) (File scan)

  35. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 next() (Nested loop) sno = sno next() next() Suppliers Supplies (File scan) (File scan)

  36. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss: open/next/closefor nested loop join next() (On the fly) πsname next() (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 next() (Nested loop) sno = sno next() next() next() Suppliers Supplies (File scan) (File scan)

  37. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss hash-joinin class (On the fly) πsname (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Hash Join) sno = sno Suppliers Supplies (File scan) (File scan)

  38. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss hash-joinin class (On the fly) πsname (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Hash Join) sno = sno Tuples fromhere arepipelined Suppliers Supplies (File scan) (File scan)

  39. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Pipelining Discuss hash-joinin class (On the fly) πsname (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 Tuples fromhere are“blocked” (Hash Join) sno = sno Tuples fromhere arepipelined Suppliers Supplies (File scan) (File scan)

  40. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Blocked Execution (On the fly) πsname Discuss merge-joinin class (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Merge Join) sno = sno Suppliers Supplies (File scan) (File scan)

  41. Supplier(sid, sname, scity, sstate) Supply(sid, pno, quantity) Blocked Execution (On the fly) πsname Discuss merge-joinin class (On the fly) σscity=‘Seattle’and sstate=‘WA’ andpno=2 (Merge Join) sno = sno Blocked Blocked Suppliers Supplies (File scan) (File scan)

  42. Pipelined Execution • Tuples generated by an operator are immediately sent to the parent • Benefits: • No operator synchronization issues • No need to buffer tuples between operators • Saves cost of writing intermediate data to disk • Saves cost of reading intermediate data from disk • This approach is used whenever possible

  43. Query Execution Bottom Line • SQL query transformed into physical plan • Access path selection for each relation • Scan the relation or use an index (next lecture) • Implementation choice for each operator • Nested loop join, hash join, etc. • Scheduling decisions for operators • Pipelined execution or intermediate materialization • Pipelined execution of physical plan

  44. Recall: Physical Data Independence • Applications are insulated from changes in physical storage details • SQL and relational algebra facilitate physical data independence • Both languages input and output relations • Can choose different implementations for operators

  45. Query Performance • My database application is too slow… why? • One of the queries is very slow… why? • To understand performance, we need to understand: • How is data organized on disk • How to estimate query costs • In this course we will focus on disk-based DBMSs

  46. Student Data Storage • DBMSs store data in files • Most common organization is row-wise storage • On disk, a file is split into blocks • Each block contains a set of tuples In the example, we have 4 blocks with 2 tuples each block 1 block 2 block 3

  47. Student Data File Types The data file can be one of: • Heap file • Unsorted • Sequential file • Sorted according to some attribute(s) called key

  48. Student Data File Types The data file can be one of: • Heap file • Unsorted • Sequential file • Sorted according to some attribute(s) called key Note: key here means something different from primary key: it just means that we order the file according to that attribute. In our example we ordered by ID. Might as well order by fName, if that seems a better idea for the applications running onour database. CSE 344 - 2017au

  49. Index • An additional file, that allows fast access to records in the data file given a search key

  50. Index • An additional file, that allows fast access to records in the data file given a search key • The index contains (key, value) pairs: • The key = an attribute value (e.g., student ID or name) • The value = a pointer to the record

More Related