Transaction Management

Transaction Management Department of Computer Science and Engineering, HKUST 1

Introduction • Time sharing systems executes more than one program at the same time by interleaving the execution of the programs • In DBMS, we consider transactions, not programs • A transaction is a database program that must be completed entirely in order to retain the consistency of the database; if the transaction cannot be completed, the database should remain at the same state as if the transaction hadn’t been executed at all • Concerned only with interleaved execution of transactions If the database is initially in consistent state (or empty), a sequence of transactions would bring the database from one consistent state to another Department of Computer Science and Engineering, HKUST 2

Assumptions and Basic Operations • A database is a set of data items accessed and modified by transactions • A transaction accesses or modifies the contents of a database • read_item(X) or R(X): Reads a database item X • write_item(X) or W(X): Writes a value into the database item X • Additional operations: • Commit - the transaction is successful and the data items value must be changed (if any) on the database permanently • Rollback/Abort - the transaction is not successful, do not change any of the data values • BEGIN_TRANSACTION, END_TRANSACTION DBMS may need UNDO and REDO for database recovery • A data item X could be: • An attribute • A column • A row • A page (of tuples) • A table Department of Computer Science and Engineering, HKUST 3

Correctness Requirement of Transactions • A database state consists of the complete set of data values in the database • A database state is consistent if the database obeys all the integrity constraint • A transaction brings the database from one consistent state to another consistent state, whether or not: • The transaction is executed alone or concurrently with other transactions • Failures occur in any transaction during its execution Database in a consistent state Database may be temporarily in an inconsistent state during execution Database in a consistent state Execution of Transaction Begin Transaction End Transaction Department of Computer Science and Engineering, HKUST 4

Read/Write Begin transaction End transaction commit Partially committed committed active abort abort failed terminated States of a Transaction Active: transaction is started and is issuing reads and writes to the database Partially committed: operations are done and values are ready to be written to the database Committed: writing to the database is permitted and successfully completed Abort: the transaction or the system detects a fatal error Terminated: transaction leaves the system Department of Computer Science and Engineering, HKUST 5

Other Properties of a Transaction • Incomplete transactions cannot reveal its results to other transactions before commitment • Committed updates are persistent (won’t be lost) • Once a transaction commits, the system must guarantee that the results of its operations will never be lost (database recovery) Department of Computer Science and Engineering, HKUST 6

Requirements on the DBMS • To maintain database consistency over time (consistency, atomicity) • To ensure multiple users can simultaneously access and modify the data without creating inconsistency (isolation) • To make the changes to data permanent (durability) • Consistency, atomicity and durability are needed even if transactions are executed serially (i.e., no interleaving) Department of Computer Science and Engineering, HKUST 7

Why is Concurrency Control Needed? • Several problems occur when concurrent transactions execute in an uncontrolled manner • A schedule of concurrent transactions is a particular sequence of interleaving of their read or write operations • In general a transaction, has a set of data items it accesses (read set), and a set of data items it modifies (write set) Department of Computer Science and Engineering, HKUST 8

Problem 1: Lost Update Problem A transaction overwrites a data item modified by other transactions Transaction 1 (HK ATM) R1(Balance) Balance=Balance + 500 W1(Balance) Transaction 2 (Kln ATM) R2(Balance) Balance=Balance - 700 W2(Balance) Schedule 2 Balance R1(Balance) 1000 R2(Balance) 1000 W2(Balance) 300 W1(Balance) 1500 Schedule 1 Balance R1(Balance) 1000 R2(Balance) 1000 W1(Balance) 1500 W2(Balance) 300 overwrite overwrite The correct (consistent) value of Balance is 800, when initial Balance is 1000 Department of Computer Science and Engineering, HKUST 9

Problem 2: Dirty Read A transaction reads uncommitted modified data item values updated by other transactions. Transaction 1 (HK ATM) R1(Balance) Balance=Balance + 500 W1(Balance) Abort Transaction 2 (Kln ATM) R2(Balance) Balance=Balance -1200 W2(Balance) Commit Schedule R1(Balance) 1000 W1(Balance) 1500 R2(Balance) 1500 W2(Balance) 300 Abort T1 Commit T2 300 For a consistent database state, Transaction 2 should also be aborted T2 read a “dirty value” from T1 Department of Computer Science and Engineering, HKUST 10

Problem 3: Incorrect Summary A transaction reads partially updated data item values from other transactions Transaction 1 (HK ATM) R1(Balance2) Balance2=Balance2 - 500 W1(Balance2) R1(Balance1) Balance1=Balance1+500 W1(Balance1) Transaction 2 (HK ATM) R2(Balance2) R2(Balance1) Sum=Balance1+Balance2 Schedule Bal2 Bal1 R1(Balance2) 2000 1000 W1(Balance2) 1500 R2(Balance2) 1500 R2(Balance1) 1000 R1(Balance1) 1000 W1(Balance1) 1500 The correct sum calculated by T2 should be 3000 Department of Computer Science and Engineering, HKUST 11

Why is Recovery Needed? • When a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either • all the operations in the transaction are completed successfully and their effect is recorded permanently in the database, OR • the transaction has no effect whatsoever on the database or on any other transactions • The DBMS must not permit some operations of a transaction to be applied to the database while other operations of the transaction are not, even when a transaction fails after executing some of its operations but before executing all of them. Department of Computer Science and Engineering, HKUST 12

Types of Failures • A computer failure (system crash) • A transaction error (due to system or the transaction itself) • Concurrency control enforcement • Disk failure • Physical problems and catastrophes • The first three types are more common types of failures, and the system must keep sufficient information to recover from the failure. Department of Computer Science and Engineering, HKUST 13

Serializability Theory and Concurrency Control Department of Computer Science and Engineering, HKUST 14

Introduction • Since we allow multiple transactions to execute concurrently by interleaving their operations, we need a way to maintain the properties of the transactions • Serializability theory, concept of schedules forms the basis of transaction processing Department of Computer Science and Engineering, HKUST 15

R(x),R(y) W(z) Time C Schedules Read operations can be done in parallel while read/write and write/write have to be done according to the specified order Example: T = { R(x), R(y), z = x+y, W(z), C } R(x) and R(y) can be done in parallel but W(z) must follow R(x)/R(y) Department of Computer Science and Engineering, HKUST 16

R(x) C W(z) R(y) Order along the time axis No particular ordering Representation of a Schedule The schedule for a transaction T can be represented by a DAG (Directed Acyclic Graph) Department of Computer Science and Engineering, HKUST 17

Serializability Theory • For every transaction T, we have a set of operations executed by the transaction, like R(x), W(x), C, A • For each transaction these operations are executed in some order according to a schedule • The serializability theory defines the correctness of a schedule and provides a mechanism for testing the correctness Department of Computer Science and Engineering, HKUST 18

Serializability Theory (Cont.) • Two operations in a set of transactions are conflicting if they both operate on the same data item and at least one of them is a write • Given Ti and Tj, the following pairs contain conflicting opreations • Ri(x), Wj(x) • Wi(x), Wj(x) • Conflicting operations among transactions must be ordered • Ri(x) precedes Wj(x), or Wj(x) precedes Ri(x) • Wi(x) precedes Wj(x), or Wj(x) precedes Wi(x) Department of Computer Science and Engineering, HKUST 19

Note that this is NOT a precedence graph (introduced next) T1 R1(X) W1(X) C1 T2 R2(X) W2(X) C2 A Schedule Example 1 T1: {R1(x), x = x+1, W1(x), C1} T2: {R2(x), x = x+1, W2(x), C2} Conflicting operations between T1 and T2: { (R1(x), W2(x)) , (R2(x), W1(x)) , (W1(x), W2(x)) } We need to order these conflicting operations, e.g., R1(x)  W2(x) R2(x)  W1(x) W2(x)  W1(x) Department of Computer Science and Engineering, HKUST 20

Another Way of Representing Schedules T1 R1(X) W1(X) C1 T2 R2(X) W2(X) C2 Show time dependency but not ALL time ordering Note the explicit ordering of R1(X) and R2(X) and C1 and C2 in the table representation Linear form: compact but hard to read Department of Computer Science and Engineering, HKUST 21

T2 R2(x)  W2(y)  W2(x)  C2 T3 R3(y)  W3(x)  W3(y)  W3(z)  C3 T1 R1(x)  W1(x)  C1 Example 2 T1 = R1(x)  W1(x)  C1 T2 = R2(x)  W2(y)  W2(x)  C2 T3 = R3(y)  W3(x)  W3(y)  W3(z)  C3 This is just ONE of many possible schedules. At this point, we are not concerned with the “correctness” of the schedule or why this schedule is generated by the DBMS Department of Computer Science and Engineering, HKUST 22

T2 R2(x)  W2(y)  W2(x)  C2 T3 R3(y)  W3(x)  W3(y)  W3(z)  C3 T1 R1(x)  W1(x)  C1 Example 2 in Table Form Department of Computer Science and Engineering, HKUST 23

Precedence Graph • A set of transactions may have many possible schedules, which ones are correct? • Answer: use precedence graph • Precedence graph is a directed graph: • nodes/vertices as transactions • edges denoting precedence relationship between conflicting operations of different transactions. • For a pair of conflicting operations Oij Ti and Okl  Tk, • if Oij Okl then have an edge from Ti to Tk, else • if Okl Oij then have an edge from Tk to Ti • If there are more than one edge between two nodes, retain only one of the edges Department of Computer Science and Engineering, HKUST 24

R1(X) W1(X) C1 T1 T2 R2(X) W2(X) C2 Precedence Graph of Example 1 Since there are two transactions, we have two nodes T1 and T2 R1(X)  W2(X)  edge T1 T2; R2(X)  W1(x)  edge T2 T1, T1 T2 W2(X)  W1(X)  edge T2 T1 A cycle in Precedence Graph implies potential for database inconsistency Department of Computer Science and Engineering, HKUST 25

R1(X) W1(X) C1 T1 T2 R2(X) W2(X) C2 Why is it NOT a “Correct” Schedule? Because it is not equivalent to any SERIAL schedule! time There are only two possible serial schedules: T1 then T2, or T2 then T1 In either case, you can find a pair of out-of-order operations (back arrows in the figures) Department of Computer Science and Engineering, HKUST 26

R2(x)  W2(y)  W2(x)  C2 T2 T3 R3(y)  W3(x)  W3(y)  W3(z)  C3 T1 R1(x)  W1(x)  C1 Precedence Graph of Example 2 T1 = R1(x)  W1(x)  C1 T2 = R2(x)  W2(y)  W2(x)  C2 T3 = R3(y)  W3(x)  W3(y)  W3(z)  C3 T1 T3 T2 T1  T3 because W1(x)  W3(x); T3  T2 because W3(x)  R2(x); T1  T2 because W1(x)  W2(x) No cycle in precedence graph so no database inconsistency. Department of Computer Science and Engineering, HKUST 27

R2(x)  W2(y)  W2(x)  C2 T1  T3  T2 R3(y)  W3(x)  W3(y)  W3(z)  C3 R1(x)  W1(x)  C1 Using Another Schedule for Example 2 T1 = R1(x)  W1(x)  C1 T2 = R2(x)  W2(y)  W2(x)  C2 T3 = R3(y)  W3(x)  W3(y)  W3(z)  C3 T2 T3 T1 PG has a cycle; the database may not be consistent Department of Computer Science and Engineering, HKUST 28

T1 T2 T3 Another Example on Serializability Schedule is NOT serializable Department of Computer Science and Engineering, HKUST 29

time R2(x)  W2(y) Schedule = R3(y)  W3(x)  W3(y) not exist yet! T1 T3 T2 R1(x)  W1(x)  C1 Run-time Construction of Precedence Graph T1 = R1(x)  W1(x)  C1 T2 = R2(x)  W2(y)  W2(x)  C2 T3 = R3(y)  W3(x)  W3(y)  W3(z)  C3 • The scheduler keeps building the Precedence Graph as the operations of active transactions are scheduled so that the PG is acyclic. • The scheduler delays scheduling those operation that can cause a cycle in PG Department of Computer Science and Engineering, HKUST 30

Serial and Serializable Schedules • Serial Schedule • all actions of a transaction occur consecutively; • no interleaving of transactions • Serializable Schedule • the net effect of a schedule on a database is equivalent to some serial schedule; • PG is acyclic • also known as conflict serializable schedules Department of Computer Science and Engineering, HKUST 31

S Wj(x) Ri(x) Wi(x) commit S’ Wj(x) Ri(x) Wi(x) commit View Serializable Schedules Two schedules S and S’ are view equivalent schedules if • The same set of transactions participates in S and S’, and include same set of operations of those transactions • For any Ri(X) of Ti in S, if the value read by the operation has been written by an operation Wj(X) of Tj (or original value), the same condition must hold for the value of X read by operation Ri(X) in Ti in S’ • If Wk(Y) of Tk is the last operation to write item Y in S, then Wk(Y) of Tk must also be the last operation to write Y in S’ A schedule S is view serializable if it is view equivalent to a serial schedule. Department of Computer Science and Engineering, HKUST 32

T2 commits before T1 T2 reads from T1 T2 reads from T1 T’: … W(X) … … … … … abort! T: … … … R(X) ... commit T’: … W(X) … commit T: … … … R(X) … … .. commit T2 commits after T1 Recoverable Schedules • A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed Sc: r1(X);w1(X); r2(X); r1(Y); w2(X); c2; a1 Sc is not recoverable because T2 reads from T1, but commits before T1. Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); c1; c2 is recoverable In a recoverable schedule no committed transaction needs to be rolled back Department of Computer Science and Engineering, HKUST 33

3) but T2 already read the “dirty” value of X 2) X recovered 1) T1 aborts Cascading Rollback When a transaction aborts, it causes other transactions which have read an item written by the aborted transaction. Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y), a1 • T2 has to be rolled back because T1 is aborted. • A schedule avoids cascading rollback if every transaction in the schedule only reads items that were written by committed transaction Department of Computer Science and Engineering, HKUST 34

This is a blind write Strict Schedules • Transactions in a strict schedule can neither read nor write an item X until the last transaction that wrote X has committed or aborted.Sf: w1(X=5), w2(X=8), a1 (suppose original value of X is 9) • If T1 aborts, the system rolls back the value of X to 9 • T2 loses update w2(X=8) • Strict schedule would not have allowed w2(X) before T1 commits or aborts • A recoverable schedule cannot detect this situation because T2 didn’t readＸ before writing to it (a blind write). Department of Computer Science and Engineering, HKUST 35

Lock-Based Protocols Department of Computer Science and Engineering, HKUST 36

Lock compatibility: Lock-based Protocols • Locking protocols control concurrent access to data items • The objective is to guarantee serializability of the schedule • Data items can be locked in two modes: • exclusive (X) mode: no one else can share it; I.e, you can do read or write on the locked item • shared (S) mode: you can only read the locked item, since others may be reading it at the same time Department of Computer Science and Engineering, HKUST 37

Lock-based Protocols • Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted. • A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions.Request Lock Existing LockR(x) R(x) or noneW(x) None • Any number of transactions cab hold shared locks on an item. • If one transaction holds an exclusive lock on the item, no other transaction may hold any lock on the item. • If a lock cannot be granted, the requesting transaction is blocked till all incompatible locks held by other transactions have been released. The lock is then granted. Department of Computer Science and Engineering, HKUST 38

Updated: A=50 B=100 Simple Locking • Example of a transaction performing locking: lock-S(A);read(A);unlock(A);lock-S(B);read(B);unlock(B);displayed(A+B) Original: A=100 B=200 A=100 B=100 A+B=200 A+B should be either 300 or 150 • Locking as above is not sufficient to guarantee serializability • if A and B get updated in-between the read of A and B, the displayed sum would be wrong. Department of Computer Science and Engineering, HKUST 39

Locking Protocols • A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. • Locking protocols restrict the set of possible schedules • E.g., a protocol which requires a transaction to lock all items before it starts and release them only when it finishes • Interleaving of reads and updates in the previous slide won’t occur, but such a “restrictive” protocol will reduce concurrency and system throughput Department of Computer Science and Engineering, HKUST 40

Pitfalls of Lock-based Protocols • Consider the partial schedule • Neither T3 nor T4 can make progress • executing lock-S(B) causes T4 to wait for T3 to release its lock on B • executing lock-X(A) causes T3 to wait for T4 to release its lock on A. • Such a situation is called a deadlock. To handle a deadlock either T3 or T4 must be rolled back to release its locks. • Deadlock exists in most locking protocols, but it is a necessary evil • Deadlock detection: if deadlock is detected, kill one of the transactions Department of Computer Science and Engineering, HKUST 41

abort abort another new transaction Starvation Starvation is also possible if concurrency control manager is unfair. E.g., • A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item. • The same transaction is repeatedly rolled back due to deadlocks. • Concurrency control manager can be designed to prevent starvation. Department of Computer Science and Engineering, HKUST 42

Acquire locks Release locks No of locks acquired time The Two-Phase Locking protocol • Two-phase locking ensures conflict-serializable schedules. • Phase 1: Growing Phase • transaction may obtain locks but are not allowed to release locks • Phase 2: Shrinking Phase • transaction may release locks but are not allowed to get new locks Department of Computer Science and Engineering, HKUST 43

The Two-Phase Locking Protocol • The protocol assures serializability. It can be proven that the transaction can be serialized in the order of their lock points(I.e. the point where a transaction acquired its final lock). • Two-phase locking does not ensure deadlocks won’t occur T4  T3 Department of Computer Science and Engineering, HKUST 44

Keep all excl locks Acquire locks Number of locks acquired Release locks time Strict Two-Phase Locking • Cascading roll-back is possible under two-phase locking • Strict two-phase locking: a transaction must hold all its exclusive locks untill it commits/aborts. • Avoids cascade roll-back but reduces concurrency Department of Computer Science and Engineering, HKUST 45

Rigorous Two-Phase Locking • Rigorous two-phase locking is even stricter: here all locks (both exclusive and non-exclusive) are held till commit/abort. In this protocol transaction can be serialized in the order in which they commit. Department of Computer Science and Engineering, HKUST 46

Two-Phase Locking and Conflict Serializability • There can be conflict serializable schedules that cannot be obtained if two-phase locking is used. • E.g., two transactions accessing different sets of data items won’t need any lock at all. • Since a transaction has no idea about how other transactions are going to access the data item, 2-phase locking is an assurance of conflict serializability Department of Computer Science and Engineering, HKUST 47

Data items W-TS(Q), R-TR(Q) TS(T1) T1 W-TS(R), R-TR(R) T2 TS(T2) W-TS(S), R-TR(S) Timesstamp- Based Protocols • Each transaction is issued a timestamp when it enters the system. If an older transaction Ti has time-stamp TS(Ti), a younger transaction Tj is assigned time-stamped TS(Tj) such that TS(Ti) < TS(Tj). • The protocol manages concurrent execution such that the time-stamps determine the serializability order. • For each data item Q, two timestamps are maintained: • W-timestamp(Q) is the largest time-stamp of any transaction that executed write(Q) successfully. • R- timestamp(Q) is the largest time-stamp of any transaction that executed read(Q) successfully. Department of Computer Science and Engineering, HKUST 48

TS(Ti) = W-timestamp(Q) means Ti is the latest transaction writing to Q Timestamp-Based Protocol • The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order. • Suppose a transaction Ti issues a read(Q) • If TS(Ti) < W-timestamp(Q) • Ti wants to read a value of Q that has been written by a “younger” transaction • Hence, the read operation is rejected, and Ti is rolled back. • If TS(Ti)  W-timestamp(Q) • then the read operation is executed • R-timestamp(Q) is set to the maximum of R-timestamp(Q) and TS(Ti). Department of Computer Science and Engineering, HKUST 49

Timestamp-Based Protocol Suppose transaction Ti issues write(Q): • If TS(Ti) < R-timestamp(Q) • Q is already read by a “younger” transaction • if Ti writes to Q, then it may conflict with this younger transaction • Hence, the write operation is rejected, and Ti is rolled back. • If TS(Ti) < W-timestamp(Q) • Ti is attempting to write an obsolete value of Q since a “newer” value has already been written onＱ • Hence, this write operation is rejected, and Ti is rolled back. • Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti). write(Q) succeeds only if TS(Ti)  Q’s R-TS and W-TS Department of Computer Science and Engineering, HKUST 50

Transaction Management