Concurrent execution of user programs is essential for good DBMS performance.

Concurrent Execution • Concurrent execution of user programs is essential for good DBMS performance. • Disk accesses are frequent, and relatively slow. • Want to keep the CPU working on several user programs concurrently. • Challenges • Concurrency Control: How do the DBMSs handle concurrent transactions? • Crash Recovery: How do the DBMSs handle partial transactions because of machine crashes or users abort the transactions ? P1 P3 P2 R/W DBMS R/W DB

Transaction Management

Definition of Transaction: An execution of a user program in a DBMS • Executing the same program several times generates several transactions. • From the DBMS’s point of view, a transaction contains a sequence of reads and writes of database objects (e.g., pages, records). • A user’s program may have many operations on the data retrieved from the database, but the DBMS is only concerned about what data is read/written from/to the database. P1 R/W DBMS R/W DB

Notation RT(O): A transaction T reading an object O into a program variable O in memory WT(O): A transaction T writing an object O to disks: T1 T2 R(A) W(A) R(B) W(B) Commit R(A) W(A) R(B) W(B) abort Each transaction consists of a final action which is either commit or abort. Commit: Transaction is completed successfully. Abort: Transaction is terminated and all actions done so far are undone. AbortT denotes the action of T aborting. CommitT denotes T committing.

Users submit transactions, and can think of each transaction as executing by itself. • Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions. • Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. Transaction DB DB’ Inconsistency is allowed. In a consistent state In a consistent state

Properties of Transactions: ACID • ATOMICITY: All actions in a transaction are carried out or none are. • CONSISTENCY: Each transaction with no concurrent execution of other transactions must preserve the consistency of the database. (Users have to ensure this). • ISOLATION: Transactions are isolated from the effects of other concurrently executing transactions. • DURABILITY: Once the transaction has been successfully completed, its effects should persist if the system crashes before all its changes are reflected on disk.

Schedule: A list of actions from a set of transactions and the order in which any two actions of a transaction T appear in a schedule must be the same order as they appear in T. T1 T2 R(A) W(A) R(B) W(B) Commit R(C) W(C) Commit Time Read object A into a variable A. Write object B to a disk.

Schedule: A list of actions from a set of transactions and the order in which any two actions of a transaction T appear in a schedule must be the same order as they appear in T. T1 T2 R(A) W(A) R(B) W(B) Commit R(C) W(C) Commit Time Read object A into a variable A. Write object B to a disk. A complete schedulecontains either an abort or commit for each transaction in the schedule.

Schedule: A list of actions from a set of transactions and the order in which any two actions of a transaction T appear in a schedule must be the same order as they appear in T. T1 T2 R(A) W(A) R(B) W(B) Commit R(C) W(C) Commit Time Read object A into a variable A. Write object B to a disk. A complete schedulecontains either an abort or commit for each transaction in the schedule. Not all schedules are “good” schedules!!!

Scheduling Transactions • Serial schedule: Schedule that does not interleave the actions of different transactions. • There is no guarantee on the order of which transactions are executed. Given a set of n transactions, there are n! possible execution results. T1 T2 Tn DB0 DB1 DB2 DBn

Scheduling Transactions • Serial schedule: Schedule that does not interleave the actions of different transactions. • Given a set of n transactions, there are n! possible execution results. • Serializable schedule: A schedule whose effect on any consistency must be identical to that of some complete serial schedule (will be refined later on). • The result must be equal to one of n! results. T1 T2 Tn DB0 DB1 DB2 DBn We know the requirement, the problem now is how!

Example of Concurrent Executions • T1 is transferring $100 from A’s account to B’s account. T2 is crediting both accounts with a 50% interest payment. • There is no guarantee that T1 will execute before T2 or vice-versa, if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order. T1: BEGIN A=A-100, B=B+100 END T2: BEGIN A=1.5*A, B=1.5*B END

A=100,B=100 A=100,B=100 T2 T1 T1 T2 A=0,B=300 A=50,B=250 • Consider interleaving schedule • Serial Schedules T1 T2 A= A-100 A=A*1.5 B=B+100 B=B*1.5 T1 T2 R(A) W(A) R(A) W(A) R(B) W(B) Commit R(B) W(B) Commit This schedule is OK.

A=100,B=100 T1 T2 A= A-100 A=A*1.5 B=B*1.5 B=B+100 A=0, B=250 T1 T2 R(A) W(A) R(A) W(A) R(B) W(B) Commit R(B) W(B) Commit This schedule is not OK.

What causes anomalies with interleaved execution? • write operations • abort/commit operations • RW Conflicts • WR Conflicts • WW Conflicts No abort in any transaction. Some abort in some transaction.

Anomalies: Unrepeatable Reads (RW Conflicts): A has value 5 initially; T1: Increment A; T2: Decrement A. The right value of A is 5. T1 T2 R(A) R(A) W(A) Commit W(A) Commit Value of A 5 5 6 (T1’s view of A) (T2’s view of A) (T1’s view of A) (T2’s view of A) 4 The effect of this schedule is different from any serial schedule of T1 and T2

A=100,B=100 A=100,B=100 T2 T1 T1 T2 A=0,B=300 A=50,B=250 Correct values • WR Conflicts; “dirty reads”: Schedule I T1 T2 T1 T2 A=A-100 A=A*1.5 B=B*1.5 B=B+100 R(A) W(A) R(A) W(A) R(B) W(B) Commit R(B) W(B) Commit A=100,B=100 I A=0,B=250 Wrong !!

Anomalies: WW Conflicts • T1 sets A and B to 10; T2 sets A and B to 20. • Consistency constraint: A and B must have the same value. Value of A T1 T2 W(A) W(A) W(B) Commit W(B) Commit 10 20 20 10 A =20 while B=10. Blind write: Write without reading the value of the objects.

Scheduling Involving Aborted Transactions Problems • If T2 has not been committed • Cascade abort: abort T2; Other transactions reading data updated by T2 are also aborted. • If T2 has been committed, T2 cannot be aborted: • Unrecoverable: T2 cannot be aborted • Lost: Rolling back T2 undoes the effect of T2, but T2 will be not be executed again T1 T2 R(A) W(A) R(A) W(A) Commit Abort Unrecoverable schedule!

Serializable Schedule: A schedule whose effect on any consistency must be identical to that of some complete serial schedule over the set of committed transactions in S. Recoverable Schedule: A schedule in which transactions commit only after all transactions whose changes they read commit. W(X) . . Commit R(X) . Commit Time A DBMS must ensure that only serializable and recoverable schedules are allowed

Serial schedule:Once a transaction starts, no other transactions can be started until it either commits or aborts. Strict schedule: 1) Once a transaction reads a value, then before it commits/aborts, no other transactions are allowed to write the value; 2) Once a transaction writes a value, then before it commits or aborts, no other transactions are allowed to read or write the value

Serial schedule:Once a transaction starts, no other transactions can be started until it either commits or aborts. Strict schedule: 1) Once a transaction reads a value, then before it commits/aborts, no other transactions are allowed to write the value; 2) Once a transaction writes a value, then before it commits or aborts, no other transactions are allowed to read or write the value Time Time T T R(X) . . Commit or Abort W(X) . . Commit or Abort No R(X) or W(X) allowed No W(X) allowed Strict schedules are serializable and recoverable • It avoids RW, WR, WW conflicts, and • It does not require cascading aborts, and actions of aborted transaction can be undone.

A serial schedule must be a strict schedule, but not vice versa. S12 R(A) R(A) W(A) Commit Commit S13 R(A) Commit R(A) W(A) Commit T1 T2 strict schedule serial schedule Not a serial schedule!

Implementing Strict Schedule Strict Two-phase Locking (Strict 2PL) Protocol: • Each transaction must obtain an S (shared) lock on object before reading, and an X (exclusive) lock on object before writing. If a transaction holds an X lock on an object, no other transaction can get a lock (S or X) on that object. • All locks held by a transaction are released when the transaction completes. Requests to acquire and release locks are automatically inserted into transactions by DBMSs.

R W OID Lock Status Holders Suspended C DBMS O1 N A O2 S T1, T4 T2 : : : : O1, ::, On On X T1 T2, T3 T1 T2 T4 T3

ST(O): Shared lock on object O XT(O): Exclusive lock on object O T1: R(A)  W(A) T2: R(A)  W(A) In this case, strict 2PL results in serial execution of the two transactions. T1 T2 T1 T2 X(A) R(A) W(A) Commit X(A) R(A) W(A) Commit T1 T2 X(A) R(A) W(A) Commit X(A) R(A) W(A) Commit X(A) R(A) W(A) T2 tries to do X(A) and cannot ! T2 has to be suspended until T1 is done. All locks are released.

Example of strict 2PL with interleaved actions. Schedule T3 T4 S(A) R(A) S(A) R(A) X(B) R(B) W(B) Commit X(C) R(C) W(C) Commit T3 T4 T3 T4 R(A) R(C) W(C) Commit R(A) R(B) W(B) Commit S(A) R(A) X(C) R(C) W(C) Commit S(A) R(A) X(B) R(B) W(B) Commit

Strict 2PL Strict 2PL ensures strict schedules (why?)

Deadlocks • Deadlock: Cycle of transactions waiting for locks to be released by each other. • Two ways of dealing with deadlocks: • Deadlock prevention • Deadlock detection T1 T2 X(A) W(A) X(B) W(B) X(B) X(A)

Deadlock Detection • Transaction manager maintains a waits-for graph: • Nodes correspond to active transactions. • Add an edge from Ti to Tj iff Ti is waiting for Tj to release a lock. • Remove an edge when a lock request is granted. • Periodically check for cycles in the waits-for graph. • Use a timeout mechanism: If a transaction has been waiting for too long, abort the transaction.

T1 T2 T3 T4 Wait for graph (Wait for B) S(A) R(A) X(B) W(B) S(B) S(C) R(C) X(C) X(B) X(A) T1 T2 (Wait for C) (Wait for B) T4 T3 T1 T2 Cyclic Deadlock T3 T3

Deadlock Prevention • Assign priorities based on timestamps • The lower the timestamp, the higher is transaction’s priority • Assume Ti wants a lock that Tj holds. • Wait-die: (older waits for the younger) • If Ti has higher priority (older), Ti waits for Tj; • Otherwise, abort Ti. • Wound-wait: (younger waits for the older) • If Ti has higher priority (older), abort Tj; • Otherwise, Ti waits. • If a transaction re-starts (younger transaction restarts), make sure it has its original timestamp so that no transaction is perennially aborted.

Performance of Locking • Locked-based schemes resolve conflict using blocking and aborting, both incurring performance penalty • Blocked transactions may hold locks that force other transactions to wait • Aborted transactions need to be rolled back and restarted

Performance of Locking • Locked-based schemes resolve conflict using blocking and aborting, both incurring performance penalty • Blocked transactions may hold locks that force other transactions to wait • Aborted transactions need to be rolled back and restarted • Increasing the number of transactions will initially increase the concurrency, but when the number of deadlocks increase to certain level (i.e., thrashing), the performance starts to downgrade

Relevant Questions with Lock-Based Concurrency Control • Should we use deadlock prevention or deadlock detection ? • How frequently should we check for deadlocks? • When deadlock occurs, which transaction should be aborted? • Detection-based schemes work well in practice. • Choice of deadlock victim to be aborted: • Transaction with fewest locks. • Transaction that has done the least work • Transaction that is farthest from completion. • There is a rich literature on this topic.

Strict schedule is sufficient but not necessary for serializability and recoverability • being too strict reduces the concurrency Not strict but still serializable and recoverable Strict and therefore serializable and recoverable Time T T W(X) . R(X) Commit W(X) . . Commit or Abort No R(X) or W(X) allowed W(X) . Commit R(X) or W(X) . Commit

Conflict Equivalent Schedules • Two schedules are conflict equivalent if: • They involve the same actions of the same transactions. • Every pair of conflicting actions of two committed transactions is ordered the same way. • Two actions conflict if they operate on the same data object and at least one of them is write.

Conflict Equivalent Schedules • Two schedules are conflict equivalent if: • They involve the same actions of the same transactions. • Every pair of conflicting actions of two committed transactions is ordered the same way. • Two actions conflict if they operate on the same data object and at least one of them is write. T1 T2 T1 T2 R1(A) W1(A) R2(A) W2(A) R1(B) W1(B) R1(A) W1(A) R1(B) W1(B) R2(A) W2(A)

Conflict Equivalent Schedules • Two schedules are conflict equivalent if: • They involve the same actions of the same transactions. • Every pair of conflicting actions of two committed transactions is ordered the same way. • Two actions conflict if they operate on the same data object and at least one of them is write. • If two schedules are conflict equivalent, they have the same effect on a database • The order of the conflicting actions determines the final state of a database • Swapping nonconflicting actions does not affect the final state of a database  allow more concurrency

Conflict Serializable Schedules • Schedule S is conflict serializable if S is conflict equivalent to some serial schedule. • A conflict serializable schedule must be serializable assuming that a set of objects does not grow or shrink. • A serializable schedule may not be a conflict serializable

Conflict Serializable Schedules • Schedule S is conflict serializable if S is conflict equivalent to some serial schedule. • A conflict serializable schedule must be serializable assuming that a set of objects does not grow or shrink. • A serializable schedule may not be a conflict serializable Schedule I Schedule II: (serial schedule) T1 T2 T3 T1 T2 T3 R(A) W(A) Commit W(A) Commit W(A) Commit R(A) W(A) Commit W(A) Commit W(A) Commit A serializable schedule (schedule I = T1T2T3 or T2T1T3) but it is not conflict serializable (the conflicting pairs are in different order)

To determine if a schedule does not result in anomaly, we just need to make sure it is conflict equivalent to some serial schedule

To determine if a schedule does not result in anomaly, we just need to make sure it is conflict equivalent to some serial schedule • How can we know if a schedule is conflict equivalent to some serial schedule? • Using precedence graph or serializability graph.

Precedence Graph (Serializability Graph) The precedence graph for a schedule S contains: • A node for each committed transaction in S. • An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj ’s actions.

T1 T2 T3 R(A) W(A) Commit W(A) Commit W(A) Commit Precedence Graph (Serializability Graph) The precedence graph for a schedule S contains: • A node for each committed transaction in S. • An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj ’s actions.

T1 T2 T3 R(A) W(A) Commit W(A) Commit W(A) Commit T1 T2 T3 Precedence Graph (Serializability Graph) The precedence graph for a schedule S contains: • A node for each committed transaction in S. • An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj ’s actions. Cycle Not conflict serializable!

Theorem A schedule is conflict serializable if and only if its dependency graph is acyclic.

Theorem A schedule is conflict serializable if and only if its dependency graph is acyclic. Strict 2PL ensures strict schedules and conflict serializable schedules (why??)

T1: … W(A) … T2: … R(A) … (the first conflicting pair) Schedule 1 Schedule 2 Time Time T1 T T2 S(A) R(A) X(A) W(A) . . Commit or Abort No W(A) allowed No R(A) or W(A) allowed . Commit or Abort S(A) R(A) . Commit X(A) W(A)

T1: … W(A) … T2: … R(A) … (the first conflicting pair) Schedule 1 Schedule 2 Time Time T1 T T2 S(A) R(A) X(A) W(A) . . Commit or Abort No W(A) allowed No R(A) or W(A) allowed . Commit or Abort S(A) R(A) . Commit X(A) W(A) Strict 2PL ensures that the precedent graph for any schedule that it allows is acyclic -- the arrow direction is determined by the execution order of the first conflicting pair.

Concurrent execution of user programs is essential for good DBMS performance.

Concurrent execution of user programs is essential for good DBMS performance.

Presentation Transcript

A Framework for Testing Concurrent Programs

ConcJUnit: Unit Testing for Concurrent Programs

A Foundation for Verifying Concurrent Programs

Random Testing of Concurrent Programs

Why Computer Repair is Essential For User?

QED: A Simplifier for Concurrent Programs

Performance Evaluation of a DBMS

Testing Concurrent Programs

Optimizations for Faster Execution of Esterel Programs

Testing Concurrent Programs

ConcJUnit: Unit Testing for Concurrent Programs

Reachability testing for concurrent programs

Hoare Logic for Concurrent Programs

Testing Concurrent Programs

Algorithmic Verification of Concurrent Programs

Debugging Techniques For Concurrent Programs

Concurrent Execution

Testing Concurrent Programs

Testing Concurrent Programs

A Framework for Testing Concurrent Programs

Attending Good, Informative Programs Is Often Essential for Success