Transaction Management Overview

Transaction Management Overview

What is a transaction? • Suppose you have embedded SQL code • You can have start transaction, a sequence of SQL statements, then commit transaction. • In JDBC, we can write: • Connection connection = DriverManager.getConnection(…); • connection.setAutoCommit(false); • …. • connection.commit(); or connection.rollback(); • Disk content is not changed until commit is executed!

Transactions • Concurrent execution of user programs is essential for good DBMS performance. • Because disk accesses are frequent, and relatively slow, it is important to keep the cpu humming by working on several user programs concurrently. • A user’s program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned about what data is read/written from/to the database. • A transactionis the DBMS’s abstract view of a user program: a sequence of reads and writes.

ACID properties (desirable) • A transaction MUST satisfy the ACID property • A: atomic: all or nothing • the state of the database should be such that either all database operations are executed or non are • C: consistent: if the database instance is consistent before the start of the transaction, it should be consistent after the transaction commits. • I: isolation: even though other transaction may be executing at the same time, the result of executing the transaction on the database should be as though no other transactions are running. • D: durability: if the transaction commits, then the changes should be permanent, even if the system crashes.

Concurrency in a DBMS • Users submit transactions, and can think of each transaction as executing by itself. • Concurrency is achieved by the DBMS, which interleaves actions (reads/writes of DB objects) of various transactions. • Each transaction must leave the database in a consistent state if the DB is consistent when the transaction begins. • DBMS will enforce some ICs, depending on the ICs declared in CREATE TABLE statements. • Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed). • Issues:Effect of interweaving transactions, and crashes.

Atomicity of Transactions • A transaction mightcommitafter completing all its actions, or it could abort(or be aborted by the DBMS) after executing some actions. • A very important property guaranteed by the DBMS for all transactions is that they are atomic.That is, a user can think of a transaction as always executing all its actions in one step, or not executing any actions at all. • DBMS logs all actions so that it can undothe actions of aborted transactions.

Assumptions • Transactions interact with each other only through the database (they don't send messages to each other) • A database is a fixed collection of independent objects.

Overview • Scheduler manages read/write requests from transactions; allows execution in an order that is “serializable,” i.e., it “looks like” the transactions executed one at a time (serial)

Scheduler • In most cases, the scheduler will execute the read or write request immediately. • However, in some cases, a read or a write needs to be deferred in time.

Example • Assume A = B is required for consistency. • T1 and T2 individually preserve DB consistency.

A Serial Schedule S1 • Assume initially A = B = 25. Here is one way to execute (S1= T1; T2) so they do not interfere.

Another Serial Schedule S2 • Here, transactions are executed as (S2=T2; T1). The result is different, but consistency is maintained.

Interweaving Does Not Necessarily Hurt (S3)

But Then Again, It Might!

The Semantics of transactions is also important. Suppose T2 multiplies by 1

Example with new notation • r1(A); w1(A); r2(A); w2(A); r1(B); w1(B), r2(B); w2(B); • is this schedule serializable? • Note: We require that the order of actions in each transaction are not changed in a schedule.

Transactions and Schedules • A transaction is a sequence of r and w actions on database elements. • A schedule is a sequence of reads/writes actions performed by a collection of transactions. • Serial Schedule = All actions for each transaction are consecutive. • r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B); …

Transactions and Schedules (Cont’d) • Serializable Schedule: A schedule whose “effect” is equivalent to that of some serial schedule. • We will now present a condition that is sufficient to assume that a schedule is serializable (but not necessary).

Conflicts • Suppose for fixed DB elements X & Y, • ri(X); rj(Y) is part of a schedule, and we flip the order of these operations. • ri(X); rj(Y) ≡ rj(Y); ri(X) … In what sense? • This holds always (even when X=Y) • We can flip ri(X); wj(Y), as long as X≠Y • That is, ri(X); wj (X) wj(X); ri (X) • In the RHS, Ti reads the value of X written by Tj, whereas this is not the case on the LHS.

Conflicts (Cont’d) • We can flip wi(X); wj(Y); provided X≠Y • However, wi(X); wj(X) ≢ wj(X); wi(X); • The final value of X may be different depending on which write occurs last. • There is a conflict if 2 conditions hold. • A read and a write of the same X, or • Two writes of X conflict in general and may not be swapped in order. • All other events (reads/writes) may be swapped without changing the effect of the schedule (on the DB).

Rough Idea • We want to test if a schedule is conflict-serializable. • That is, is it conflict-equivalent to a serial schedule. • We can keep swapping actions that do not conflict until we reach a serial schedule.

Elaborating ... • If by swapping pairs of non-conflicting actions in a schedule S, we end up in a serial schedule, then S is called a conflict-serializable schedule. • S:r1(A); w1(A);r2(A); w2(A); r1(B); w1(B);r2(B); w2(B); • Conflict-serializability is a sufficient condition for serializability but not necessary. • That is, conflict-serializability is a stronger notion than serializability.

Why Conflict-Serializable is not Necessary for Serializability • Consider the following serial schedule: • S1 =w1(Y); w1(X); w2(Y); w2(X); w3(X); • X has the value written by T3, Y has the value written by T2 • Next, consider the following schedule: • S2=w1(Y); w2(Y); w2(X); w1(X); w3(X); • S2 is serializable (equivalent to the serial schedule S1) • However, it is not conflict-serializable (w1(X) and w2(X) are conflicting)

Precedence Graphs Serializability/precedence graph for a schedule: S Nodes: transactions {T1,…,Tk} Arcs: There is an arc from Ti to Tj if they have conflicting access to the same database element X and Ti is accessing this element first. Ti <STj. Conditions: • Action A1 is before action A2 in S • Both actions involve the same element and at least of the actions is a write

Is there is a cycle in the graph? • If yes, then there is no serial schedule which is conflict-equivalent to S. • Each arc represents a requirement on the order of transactions in a conflict equivalent serial schedule. • If there is no cycle in the graph • Then anytopological order of the graph suggests a conflict-equivalent schedule.

Why the Precedence-Graph Test Works? • Idea: if the precedence graph is acyclic, then we can swap actions to form a serial schedule consistent with the same partial order; • Proof: By induction on n, number of transactions. • Basis: n = 1. That is, S={T1}; then S is already serial. • Induction: S={T1,T2,…,Tn}. Given that SG(S) is acyclic, then pick TiinS such that no Tjin S is dependent on. • We swap all actions of Tito the front (of S). • (Actions of Ti)(Actions of the other n-1 transactions) • The tail is a precedence graph that is the same as the original without Ti, i.e. it has n-1 nodes.  By the induction hypothesis, we can reorder the actions of the other transactions to turn it into a serial schedule

Schedulers • A scheduler takes requests from transactions for reads and writes and decides if it is “OK” to allow them to operate on DB or defer them until it is safe to do so. • Ideal: a scheduler forwards a request iffit cannot lead to inconsistency of DB • Too hard to decide this in real time. • Real: it forwards a request if it cannot result in a violation of conflict-serializability. • We thus need to develop schedulers which ensure conflict-serializability.

The use of lock must be proper in 2 senses: • Consistency of Transactions: • Read or write X only when hold a lock on X. • ri(X) or wi(X) must be preceded by some li(X) with no intervening ui(X). • If Ti locks X, then Ti must eventually unlock X. • Every li(X) must be followed by ui(X). • Legality of Schedules: • Two transactions may not have locked the same element X without one having first released the lock. • A schedule with li(X) cannot have another lj(X) until ui(X) appears in between

Legal Schedule DOES NOT Mean Serializable

Two Phase Locking There is a simple condition, which guarantees confict-serializability: In every transaction, all lock requests (phase 1) precede all unlock requests (phase 2).

Why 2PL Works? • Precisely: a legal schedule S of 2PL transactions is conflictserializable. • Proof is an induction on n, the number of transactions. • Remember: conflicts involve only read/write actions, not locks, but the legality of the transaction requires that the r/w's be consistent with the l/u's.

Why 2PL Works (Cont’d) • Basis:if n=1, then S={T1}, and hence S is conflict-serializable. • Induction: S={T1,…,Tn}. Find the first transaction, say Ti, to perform an unlock action, say ui(X). • Can we show that the r/w actions of Ti can be moved to the front of the other transactions without conflict? • Consider some action such as wi(Y). Can it be preceded by some conflicting action wj(Y) or rj(Y)? In such a case we cannot swap them. • If so, then uj(Y) and li(Y) must intervene, as wj(Y)...uj(Y)...li(Y)...wi(Y). • Since Ti is the first to unlock, ui(X) appears before uj(Y). • But then li(Y) appears after ui(X), contradicting 2PL. • Conclusion:wi(Y) can slide forward in the schedule without conflict; similar argument for a ri(Y) action. Therefore, produce: • (Actions of Ti)(Actions of the other n-1 transactions)

Strict two-phase locking • All locks are released immediately before the transaction commits. • Advantage: other transactions cannot access uncommitted data • Prevents cascading rollbacks • Does not guarantee a dead-lock free schedule

Deadlocks • When we cannot proceed forward. • T1 holds a lock on X but needs a lock on Y • T2 holds a lock on Y but needs a lock on X • neither can proceed forward if 2PL is used • Simple solution: if no progress after 5 seconds, shoot (i.e., abort) a random transaction • problem: cascading aborts if uncommitted data is read

Conclusion • You should know what is a transaction • What is a schedule, what is a serializable schedule • How concurrent transaction execution is supported by the scheduler • What is 2PL, why is it needed • How can deadlocks occur

Transaction Management Overview