1 / 40

CS411 Database Systems

CS411 Database Systems. Kazuhiro Minami. 15: Final Review. Final exam. 7:00pm-10:00pm on Thursday, December 17 1404 Siebel Center Bring UIUC photoid, pen, pencil, and eraser Closed book, no notes Don't cheat. Final exam (Subject to changes). 10 - 15 True-False questions Data Storage

goddardc
Télécharger la présentation

CS411 Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS411Database Systems Kazuhiro Minami 15: Final Review

  2. Final exam 7:00pm-10:00pm on Thursday, December 17 1404 Siebel Center • Bring UIUC photoid, pen, pencil, and eraser • Closed book, no notes • Don't cheat

  3. Final exam (Subject to changes) • 10 - 15 True-False questions • Data Storage • Indexes • Query execution • Query optimization • Logging & Recovery • Concurrency control

  4. Concurrency Control

  5. Concurrency Control – Basic concepts • What is a transaction? • Which actions do we consider in a transaction? • How to represent a transaction? • What is a schedule? • What is the goal of concurrency control? • What is a serial schedule? • What is a serializable schedule? • What is a conflict-serializable schedule? • What are conflicting swaps? • How to determine whether a schedule is conflict-serializable?

  6. How to draw a precedence graph? Why the precedence-graph test works? • r1(A); r2(A);r3(B);w1(A);r2(C);r2(B);w2(B);w1(C) • r1(A); w1(B);r2(B);w2(C);r3(C);w3(A);

  7. Enforcing Serializability by Locks • What is a lock? • What is a lock table? What kind of information is stored there? • What is the consistency of transactions? • What is the legality of transactions? • What is the notations for actions of locking and unlocking? • What is the job of the locking scheduler? • What is the two-phase locking (2PL) condition? What type of serializable schedules are produced with this approach?

  8. Two-phase locked schedule T1: r1(A);w1(B); T2: r2(A);w2(A);w2(B); • Make T1 and T2 consistent, but not 2PL by adding lock and unlock actions • Give a legal, but not serializable schedule of T1 and T2 in question 1 • Make T1 and T2 consistent and 2PL by adding lock and unlock actions • Give a legal schedule of T1 and T2 in question 2

  9. Why 2PL works? • Lock actions li(A), li(B), and li(C) precede unlock action ui(A) • Unlock action ui(A) precedes the other unlock actions ui(B) and ui(C) • When ui(A) occurs, transaction Ti holds all locks on A, B, and C. • We can freely move read or write action on X during the period where Ti holds a lock on X li(A) lj(A) ui(A) uj(A) ui(B) uj(B) li(B) lj(B) Ti Tj li(C) ui(C) Time

  10. Enforcing Serializability with Timestamps • What is a timestamp? • When is a timestamp assigned to a transaction? • What’s the notation for transaction T’s timestamp? • What information do we maintain for each database element X? • What type of serializable schedules are produced with this approach? • How the timestamp-based approach solve the problem of “dirty” read?

  11. Assumed Serial Schedule • Conflict serializable schedule that is equivalent to a serial schedule in which the timestamp order of transactions is the order to execute them Actual schedule T starts U starts V starts Serial schedule T starts U starts V starts

  12. Transaction T is aborted if • T tries to read data written by transaction U, which started later than T • Q: Why? • Q: How to detect this situation? • T tries to write data to database element X, which is already read by transaction U, which started later than T • Q: Why? • Q: How to detect this situation?

  13. Transaction T reads X to too late • Since T starts earlier than U, we want to make sure that schedule S is equivalent to serial schedule (T, U) • But, S = (wU(X), rT(X)) cannot be converted to • S’ = (rT(X), wU(X)) • Thus, we abort T when it tries to read X • How do we detect X is already written by U, which starts later than T? • U needs to leave some footprint (i.e., timestamp TS(U)) on element X U write X T read X T start U start

  14. How can you detect T’s writing to late? U read X T write X T start U start

  15. Prevention of Dirty Read • How to prevent transaction T from reading data written by uncommitted transaction U? • We want T to wait until U commits U write X T read X U start T start U aborts

  16. Thomas Write Rule • Why can we skip transaction T’s write on element X, which is already modified by transaction U? • What if there is another transaction V starting after T but before U? • What if V starts after U? U writes X T writes X T start U start

  17. Another Problem with Dirty Data • Thomas write rule: T’s write can be skipped if • But, we want T to wait until U commits TS(T) < WT(X) U writes X T writes X We need to restore the previous value for X, but… T start U start T commits U aborts

  18. Concurrency Control by Timestamps • Tell me what happens as each executes • st1; st2; r1(A); r2(B); w2(A); w1(B) B T2 T1 A 100 200 RT=0 RT=0 WT=0 WT=0 r1(A) RT=100 r2(B) RT=200 w2(A) WT=200 w1(B) Write too late; need to roll back.

  19. B T2 T1 A 100 200 RT=0 RT=0 WT=0 WT=0 r1(A) RT=100 w2(B) WT=200 r2(A) RT=200 w1(B) Concurrency Control by Timestamps • Tell me what happens as each executes • st1; r1(A); st2; w2(B); r2(A); w1(B) OK, but needs to wait until T2 commits

  20. Concurrency Control by Validation

  21. Concurrency Control by Validation • Another type of optimistic concurrency control • Maintain a record of what active transactions are doing • Just before a transaction starts to write, it goes through a “validation phase” • If a there is a risk of physically unrealizable behavior, the transaction is rolled back Read actions Write actions Validation Transaction T

  22. Validation-based Scheduler • Keep track of each transaction T’s • Read set RS(T): the set of elements T read • Write set WS(T): the set of elements T write • Execute transactions in three phases: • Read. T reads all the elements in RS(T) • Validate. Validate T by comparing its RS(T) an WS(T) with those in other transactions. If the validation fails, T is rolled back • Write. T writes its values for the elements in WS(T)

  23. Scheduler Maintains Information Sets • START: the set of transactions that have started, but not yet completed validation. For each T, maintain (T, START(T)) • VAL: the set of transactions that have been validated, but not yet finished. For each T, maintain (T, START(T), VAL(T)) • FIN: the set of transaction that have completed. For each T, maintain (T, START(T), VAL(T), FIN(T))

  24. Assumed Serial Schedule for Validation • We may think of each transaction that successfully validates as executing at the moment that it validates Actual schedule T validates U validates V validates Serial schedule T validates U validates V validates

  25. Potential Violation: Read too Early • Transactions T and U such that • U has validated • START(T) < FIN(U) • RS(T)  WS(U) is not empty T reads X U writes X U start T start U validated T validating

  26. Another Potential Violation: Write too Early • Two transactions T and U such that • U is in VAL • VAL(T) < FIN(U) • WS(T)  WS(U) is not empty T writes X U writes X U finish U validated T validating

  27. Validation Rules To validate a transaction T, • Check that RS(T)  WS(U) is an empty set for any validated U and START(T) < FIN(U) • Check that WS(T)  WS(U) is an empty set for any validated U that did not finish before T validated, i.e., if VAL(T) < FIN(U)

  28. RS={A,B} WS={A} RS={B,C} WS={A} T1 T2 T3 RS={C,D} WS={B} Example Problem In the following sequence of events, tell what happens when each sequence is processed by a validation-based scheduler. R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(B); T1’s write Read Too early! T3’s read

  29. RS={A,B} WS={A} RS={B,C} WS={A} T1 T2 T3 RS={C,D} WS={B} Example Problem In the following sequence of events, tell what happens when each sequence is processed by a validation-based scheduler. R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(B); T1’s write Write too early! T3’s write

  30. RS={A,B} WS={A} RS={B,C} WS={A} T1 T2 T3 RS={C,D} WS={B} Example Problem In the following sequence of events, tell what happens when each sequence is processed by a validation-based scheduler. R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(B);

  31. Comparison of Three Mechanisms • Storage utilization • Locks: space in the lock table is proportional to the number of database elements locked • Timestamps: Read and write times for recently accessed database elements • Validation: timestamps and read/write sets for each active transaction, plus a few more transactions that finished after some currently active transaction began

  32. Comparison of Three Mechanisms • Delay • Locking delays transactions but avoids rollbacks, even when interaction is high • If interference is low, neither timestamps nor validation will cause many transactions • When a rollback is necessary, timestamps catch some problems earlier than validation

  33. Serial Schedules Conflict-serializable Schedules Serializable Schedules Summary Timestamping Validation Two-Phase Locking

  34. Storage: T/F Questions • Secondary storage is volatile • Spanned records refer to records that are longer than a single block and therefore are broken into fragments • When we store multiple fixed-length records in a block, we do not need to have a record header for each record. • A block header always contains pointers to each record in the block • When we insert a new record into a relation, we sometimes need to create an overflow block even if records in that relation are not sorted. • When a client requests a record that contains a BLOB, the database server returns the entire record at a time

  35. Index: T/F Questions • We can only use a dense index if the data file is sorted by the search key. • When we add a second level of index, the second-level index must be dense. • B-trees sometimes need overflow blocks. • B-trees can be used to perform ranged queries. • Extensible hash tables sometimes have overflow blocks. • In a linear hash table, each data block has the “nub” indicating how many bits of the hash function’s sequence is used.

  36. Query execution • Cost parameters • Sort methods • Hash methods • Index method • 1 phase vs. 2 phase • Pipelining vs. materialization • Estimation

  37. How to do joins Know the major ways to do joins • hash join • simple-sort-based join • sort join (merge join) • indexed join • nested loop join and when to pick one over the other Make sure to understand how each algorithm works, its required memory, and I/O cost

  38. True/False questions • Duplicate elimination is a physical operator. • The cost of scanning a unclustered relation R is B(R) • A selection operator can be always executed with a one-pass algorithm. • We can sort any number of blocks with 2-way merge sort using memory buffer of size M. • When we sort two relations R and S with sort-merge-join, its total cost would be 5B(R) + 5B(S). • Two-pass hash-based join algorithm has a size requirement only on the smaller of the two input relations

  39. Query Optimization • Do you know how to apply algebraic laws to get a better logical plan? • Do you know how to perform a cost-based optimization on a logical plan? • Do you know how to convert a logical plan to a physical plan? • Do you know how to estimate the size of an intermediate operation?

  40. Logging & Recovery • What’s the correctness principle? • What are primitive four operations of transactions? • How does undo logging work? • Log records • Undo-logging rules • Recovery procedure • How does redo logging work? • How does undo/redo logging work? • What is a checkpoint? • What is a nonquiescent checkpoint? • Do you know when <END CKPT> is added in each method? • Do you know recovery procedures with a checkpointed log? • What are advantages and disadvantages of each method?

More Related