260 likes | 286 Vues
Transactions. What is it?. Transaction - a logical unit of database processing Motivation - want consistent change of state in data Transactions developed in 1950's banking activities Idea of transaction formalized in 1970's Transactions in clouds in 2010’s?. Why?.
E N D
What is it? • Transaction - a logical unit of database processing • Motivation - want consistent change of state in data • Transactions developed in 1950's • banking activities • Idea of transaction formalized in 1970's • Transactions in clouds in 2010’s?
Why? Want to interleave operations- • increases throughput of the system (number of transactions that can finish in any given period) • interleave I/Os and CPU time
Problems 1) Inconsistent result if crash in middle of transaction crash due to: hardware failure, system error, exception condition 2) Can have error if concurrent execution 3) Uncertainty when changes permanent. -- Write to disk every time? Concurrency control and Recovery from failures are needed to solve these problems
ACID properties – Jim Gray • ACID properties • Atomicity - transaction is indivisible, • Consistency - correct execution takes DB from one consistent state to another • Isolation – transactions affect each other as if not concurrent • Durability - once a transaction completes (commits), changes made to data are permanent and transaction is recoverable
Systems guarantees 4 properties • ACID properties • Atomicity • all or nothing, performs transaction in entirety • solves 1) Inconsistent result if crash in middle of transaction • Consistency • a logical property based on some consistency rule, implied by isolation • If isolation is implemented properly • solves 2) correct execution takes DB from one consistent state to another • Isolation • equivalent to serial schedule (serializability) • T2 -> T1 or T1 -> T2 • takes care of 2) • Durability • changes are never lost due to subsequent failures – • solves 3) Uncertainty when changes permanent
ACID properties • Atomicity ensures recovery • Consistency ensures programmer and DBMS enforce consistency • Isolation ensures concurrency control is utilized • Durability ensures recovery
Operations of transactions • Read and Write of data items • Granularity can be rows of a table or a table (Granularity can affect concurrency) • Each transaction has an identifier i assigned to it (Ti) • Transaction commit statement - causes transaction to end (commit) successfully (Ci) • Transaction abort (rollback) statement - all writes undone (Ai) • System or User can specify commit and abort
R’s and W’s Notation: R1(X) - transaction 1 reads data item X W2(Y) - transaction 2 writes to data item Y C1 - transaction 1 commits BL1 – transaction 1 blocks A2 - transaction 2 aborts, rollback • A series of R's and W's is a schedule (history) • Allow multiple users to execute simultaneously to access tables in a common DB • Concurrent access - concurrency
Lost Update Problem – Dirty Write Dirty write - some value of DB is incorrect T1 T2 where A=10 R1(A) R2(A) (A=10) A=A-10 W1(A) A=A+20 W2(A) R1(A)R2(A)W1(A)C1W2(A)C2 The value of A is 30 when T1 and T2 are done, but it should be 20
Dirty Read Problem Transaction updates DB item, then fails T1 T2 where A=10 R1(A) A=A-10 W1(A) R1(C) R2(A) (A=0, reads value T1 wrote) A1 - T1 fails R2(B) B=B+A W2(B) R1(A)W1(A)R1(C)R2(A)W2(A)A1 R2(B)W2(B)C2 When T1 is aborted, A is reset to value of 10, values B written by T2 are incorrect
Unrepeatable Read Transaction reads data item twice, in between values change T1 T2 where A=10 R1(A) R1(B) B=B+A W1(B) R2(A) A=A+20 W2(A) R1(A) R1(C) C=C+A W1(C) R1(A)R1(B)W1(B)R2(A)W2(A)C2R1(A)R1(C)W1(C)C1 When T1 first reads A it has a value of 10, then a value of 30
Degrees of Isolation DBs try to achieve the following: degree 0 - doesn't overwrite data updated (dirty data) by other transactions with degree at least 1 degree 1 - no lost updates (no dirty writes) degree 2 - no lost updates and no dirty reads degree 3 - degree 2 plus repeatable reads degree 4 - serializability
Serializable A transaction history is serializable if it is equivalent to some serial schedule (does not mean it is a serial schedule) The important word here is ‘some’ Example of two serial schedules: R1(X)W1(X)C1 R2(X)W2(X)R2(Y)W2(Y)C2 T1<< T2 R2(X)W2(X)R2(Y)W2(Y)C2 R1(X)W1(X)C1 T2 << T1
Equivalence • Is a given schedule equivalent to some serial schedule? R2(X)W2(X)R1(X)W1(X)R2(Y)W2(Y)C1C2 • How do we answer this question? • What is equivalence? • Need same number of operations • How to define equivalence
Result equivalence Result equivalent if produce same final state R1(X) (X*2)W1(X)C1 R2(X)(X+Xmod10)W2(X)C2 if X=10 result is X=20 R1(X)R2(X)(T2:X+Xmod10)W2(X)C2(T1: X*2)W1(X) if X=10 result is X=20 Same results if X=10, but next try X= 7: (results are 18 and 14)
Conflict equivalence First define conflicting operations R and W conflict if: 1. from 2 different transactions 2. reference same data item 3. at least one is a write
Conflicting operations The conflicting operations are: Ri(A) << Wj(A) read followed by a write Wi(A) << Rj(A) write followed by a read Wi(A) << Wj(A) write followed by a write Ri(A) << Rj(A) don't conflict Ri(A) << Wj(B) don't conflict
Conflict equivalence Conflict equivalent if order of any 2 conflicting operations is the same in both schedules R1(A)W1(A)C1 R2(A)W2(A)C2 or in reverse order R1(A)<<W2(A) R2(A)<<W1(A) W1(A)<<R2(A) W2(A)<<R1(A) W1(A)<<W2(A) W2(A)<<W1(A)
Conflict equivalence R1(A)R2(A)W1(A)C1W2(A)C2 R1(A)<<W2(A) R2(A)<<W1(A) W1(A)<<W2(A) NOT serializable - example of lost update
Example Is this schedule serializable? R2(X)W2(X)R1(X)W1(X)R2(Y)W2(Y)C1C2 R2(X)<<W1(X) W2(X)<<R1(X) W2(X)<<W1(X) T2<<T1 Is this schedule serializable? R2(X)R1(X)W2(X)R2(Y)W1(X)C1W2(Y)C2 R2(X)<<W1(X) R1(X)<<W2(X) W2(X)<<W1(X) not equivalent
Strategy • There is a simple algorithm for determining conflict serializability of a schedule • What is the algorithm?
Testing for Equivalence Construct a precedence graph (aka serialization graph) 1. For each Ti in S, create node Ti in graph 2. For all Rj(X) after Wi(X) create an edge Ti->Tj 3. For all Wj(X) after Ri(X) create an edge Ti->Tj 4. For all Wj(X) after Wi(X) create an edge Ti-> Tj 5. Serializable iff no cycles If a cycle in the graph, no equivalent serial history
Example Is this the following schedule serializable? R1(A)R2(A)W1(A)W2(A)C1C2 T1 T2
Serializability • If a schedule is serializable, we can say it is correct • Serializable does not mean it is serial • Difficult to test for conflict serializability - interleaving of operations is determined by the operating system scheduler • Concurrency control methods don't test for it. • Instead, protocols developed that guarantee a schedule is serializable.
Concurrency Control Techniques Protocols - set of rules that guarantee serializability • locking • Timestamps • multiversion • validation or certification - optimistic