Chapter 16: Transaction Management • Transaction Processing Overview • Transaction Concept • Transaction Definition in SQL • Transaction State • ACID Properties • Concurrent Executions • Serializability • Recoverability • Implementation of Isolation • Testing for Serializability.
Transaction Processing - Basics • A transaction is a logical unit of a database processing • Transaction processing systems include large databases and hundreds of concurrent users • Examples of these systems are: • airline reservations, • banking, • credit card processing, • supermarket checkout, and • similar systems
Multi - User Database Systems • One way to classify DBMSs is according to the number of concurrent users: • single user • multi-user • Majority of database systems are of a multi - user type • Concurrent (or simultaneous from the user point of view) database usage is possible thanks to computer multiprogramming • Multiprogramming operating systems execute some commands of one process, then suspend this process and execute some commands of another process • After a while, the execution of the first process is resumed at the point where it was interrupted • This type of process execution is called interleaving
Interleaved and Parallel Processes process 1 process 1 Inter- leaved process 2 process 2 t t5 t3 t2 t4 t1 process 1 Parallel process 2 t t2- t1+ t4- t3 t3- t2+ t5- t4
A Question for You • Are multiprogramming and interleaving more efficient than monoprogramming with serial program execution? • Answers: • Yes, but only from a user’s point of view • No, because multiprogramming means interrupting and resuming programs, which introduce OS overheads • Yes, from the point of view of both users and computers, because an OS interrupts execution of a program when it issues an i/o operation, thus saving long idle times, and resumes its execution when the i/o has been finished
Transaction Concept • A transaction is a single logical unit of a database processing that includes one or more database access operations (read and write) • A transactionis a unit of program execution that accesses and possibly updates various data items. • It may involve one or more operations on the database • It could be as simple as a single SQL command • Or as complex as the set of accesses performed by an application program • E.g. transaction to transfer $50 from account A to account B: 1. read(A) 2. A := A – 50 3. write(A) 4. read(B)` 5. B := B + 50 6. write(B) • Here our transaction consists of four database accesses (reads and writes), and twp non-database activities (decreasing value of A & increasing value of B) • A transaction should always transform the database from one consistent state to another. • If a transaction finishes successfully, all data it has changed are visible to other transactions • If a transaction fails for any reason, DBMS has to undo all the changes that the transaction made against the database
Transactions (continued) • In multi – user transaction processing systems, users execute database transactions concurrently • Most often, concurrent means interleaved • The users can attempt to modify the same database items at the same time, and that is potential source of databaseinconsistency • Checking database integrity constraints is not enough to protect a database from threats induced by its concurrent usage • Two main issues to deal with: • Failures of various kinds, such as hardware failures and system crashes, due to which transaction is incomplete • Concurrent execution of multiple transactions
Transaction Outcomes • A transaction can have one of two outcomes: • If it completes successfully, the transaction is said to have committed and the database reaches a new consistent state. • If it does not execute successfully, the transaction is aborted.– In this case, the database must be restored to the consistent state it was in before the transaction started.– This is known as roll-back. • Whatever the outcome, the database is in a consistent state at the end of the transaction
Transaction Support • Once a transaction has committed, it cannot be aborted. • Thus, if we decide that a committed transaction was a mistake, then we must perform another transaction to reverse it. • On the other hand, an aborted transaction can be restarted later, and depending on the cause of failure, may successfully execute and commit at that time. • A DBMS has no way of knowing which updates are grouped together to form a single, logical transaction. • Therefore, the user must be provided with a way to indicate the boundaries of each transaction. • For example, there may be keywords such as BEGIN_TRANSACTION, COMMIT, and ROLLBACK to delimit a transaction. • If such delimiters are not used, the whole program is usually treated as a single transaction with the DBMS automatically performing a COMMIT upon successful termination, or ROLLBACK if not.
Transaction Execution with SQL • Transaction support provided by • COMMIT • ROLLBACK • COMMIT statement is reached when all changes are permanently recorded within the database, it indicates successful end of transaction • ROLLBACK statement is reached in which all changes are aborted and database is rolled back to previous consistent stage, it indicates unsuccessful end of transaction.
Transaction State • Active –the initial state; the transaction stays in this state while it is executing • Partially committed –after the final statement has been executed. • Failed -- after the discovery that normal execution can no longer proceed. • Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: • restart the transaction • can be done only if no internal logical error • kill the transaction • Committed – after successful completion. Effects of committed transactions cannot be undone
Transaction State (Cont.) Commit End transaction Abort Begin Transaction Abort
Transaction Properties (ACID) To preserve the integrity of data the database system must ensure: • Atomicity. Either all operations of the transaction are executed and properly reflected in the database or none are. • Consistency. A transaction transforms the database from one consistent state to another. • Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions. • That is, for every pair of transactions Tiand Tj, it appears to Tithat either Tj, finished execution before Ti started, or Tj started execution after Ti finished. • Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Example of Fund Transfer • Transaction to transfer $50 from account A to account B: 1. read(A) 2. A := A – 50 3. write(A) 4. read(B) 5. B := B + 50 6. write(B) • Atomicity requirement • if the transaction fails after step 3 and before step 6, money will be “lost” leading to an inconsistent database state • Failure could be due to software or hardware • the system should ensure that updates of a partially executed transaction are not reflected in the database • Either all the operations of the transaction be completed or if not then the transaction should be aborted
Example of Fund Transfer • Transaction to transfer $50 from account A to account B: 1. read(A) 2. A := A – 50 3. write(A) 4. read(B) 5. B := B + 50 6. write(B) • Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures. • Ensuring durability is the responsibility of recovery management component
Example of Fund Transfer (Cont.) • Transaction to transfer $50 from account A to account B: 1. read(A) 2. A := A – 50 3. write(A) 4. read(B) 5. B := B + 50 6. write(B) • Consistency requirement : • The sum of A and B is unchanged by the execution of the transaction • A transaction must see a consistent database. Execution of a transaction must leave the database in either a new stable state or revert to old stable state. • During transaction execution the database may be temporarily inconsistent. • When the transaction completes successfully the database must be consistent • Erroneous transaction logic can lead to inconsistency • In general, consistency requirements include • Explicitly specified integrity constraints such as primary keys and foreign keys • Implicit integrity constraints • e.g. sum of balances of all accounts, minus sum of loan amounts must equal value of cash-in-hand
Example of Fund Transfer (Cont.) • Isolation requirement — if between steps 3 and 6, another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).T1 T2 1. read(A) 2. A := A – 50 3. write(A) read(A), read(B), print(A+B) 4. read(B) 5. B := B + 50 6. write(B • Data used during execution of a transaction cannot be used by a second transaction, till the first one is completed • Isolation can be ensured trivially by running transactions serially, that is, one after the other. • However, executing multiple transactions concurrently has significant benefits, as we will see later.
Reason for Incomplete transactions • It can be aborted or terminated unsuccessfully because of some problem during execution. • System may crash while one or more transactions are in progress. • Transaction may encounter an unexpected situation (read an unexpected data value or unable to access some disk) and decide to abort. • DBMS ensures transactions atomicity by undoing actions of incomplete transactions. • It maintains a records, called log, of all the writes to the database. It ensures durability also. • If system crashes before changes made by complete Transaction are written to the disk, log is used to remember and restore changes when system restarts.
Log File • Typically, a log file contains records with the following contents: [start_transaction, T ] (*T is transaction id*) [write_item, T, X, old_value, new_value] [read_item,T, X ] (*optional*) [commit, T ] [abort, T ]
Sources of Database Inconsistency • Uncontrolled execution of database transactions in a multi – user environment can lead to database inconsistency • There is a number of possible sources of database inconsistency • The typical ones are: • lost update problem, • dirty read problem, and • unrepeatable read problem
Lost Update Problem T2 T1 read_item (X) X = X + M write_item (X) read_item ( X ) X = X – N write_item (X ) • Generally, lost update • problem is characterized by: • T2 reads X, • T1 writes X, and • T2 writes X t i m e Aftertermination of T2, X = X + M. T1's update to X is lost because T2 wrote over X
Lost update problem • Imagine that a customer wants to withdraw £30 from a bank account. • At the same time, the bank is crediting this month’s salary. Time T1 (withdrawal) T2 (credit salary) Balance t1 begin_transaction 100 t2 begin_transaction read(balance) 100 t3 read(balance) balance=balance+1000 100 t4 balance=balance-30 write(balance) 1100 t5 write(balance) commit 70 t6 commit 70 • Both transactions occur at roughly the same time and read the same initial balance. • The last transaction to commit overwrites the update made by the first.
Uncommitted Dependency : Dirty Read Problem • Generally, dirty read • problem is characterized • by: • T1 writes X, • T2 reads X, and • T1 fails • Since T1failed, DBMS is • going to undo the changes • it made against the • database • T2 has already read item X = • X - N value, and that value is • going to be altered by DBMS • back to X T2 T1 read_item (X) X = X + M write_item (X) read_item ( X ) X = X – N write_item (X ) read_item ( Y ) T1 fails t i m e
The Uncommitted dependency :Dirty Read problem • The uncommitted dependency problem (dirty read) occurs when one transaction sees the intermediate results of another (aborted) transaction. Time T1 (withdrawal) T2 (credit salary) Balance t1 begin_transaction 100 t2 read(balance) 100 t3 balance=balance – 30 100 t4 write(balance) begin_transaction 70 t5 read(balance) 70 t6 rollback balance = balance + 1000 70 t7 write(balance) 1070 t8 commit 1070 • For some reason, the withdrawal transaction is aborted, but the salary credit transaction has already seen the update. • When T2 commits, the balance is incorrect (it should be 1100).
Unrepeatable Read / Inconsistent Retrieval problem T2 T1 read_item (X) X = X + M write_item (X) read_item ( X ) read_item (X ) • Generally, unrepeatable read • problem is characterized by: • T1 reads X, • T2 writes X, and • T1 reads X t i m e Transaction T1has got two different values of X in two subsequent reads, because T2has changed it in the meantime Even if T1 didn't execute the second read command, it would use a stale X value, and that's another form of the unrepeatable read problem
Unrepeatable Read / Inconsistent Retrieval problem • The previous problems involved simultaneous updates to the database. However, problems can also result if a transaction merely reads the result of an uncommitted transaction. • Below, one transaction (T1) is transferring £10 from account Balw to Balz, and at the same time, another transaction (T2) is summing all the accounts (Balw, Balx, Baly and Balz). Try to figure out what’s gone wrong…. Time T1 (transfer funds) T2 (sum accounts) Balw Balx Baly Balz Sum t1 begin_transaction 100 50 10 25 t2 begin_transaction sum=0 100 50 10 25 0 t3 read(Balw) read(Balw) 100 50 10 25 0 t4 balw=Balw - 10 sum = sum + Balw 100 50 10 25 100 t5 write(Balw) read(Balx) 90 50 10 25 100 t6 read(Balz) sum = sum + Balx 90 50 10 25 150 t7 balz = Balz + 10 read(Baly) 90 50 10 25 150 t8 write(Balz) sum = sum + Baly 90 50 10 35 160 t9 commit read(balz) 90 50 10 35 160 t10 sum = sum+Balz 90 50 10 35 195 t11 commit 90 50 10 35 195 ….here, the £10 transferred by T1 has been counted twice by T2, making its result too large by £10.
A Question for You • What is the difference between: • Dirty read and • Unrepeatable read
A Question for You • What is the difference between: • Dirty read and • Unrepeatable read Answers: • There is no difference. • Even if there is a difference, I can't recall what it is. • The difference is: • The dirty read is a consequence of reading updates made by a transaction before it has successfully finished (and has even failed later). • The unrepeatable read is a consequence of allowing a transaction to read data that the other one is altering.
Prevention of Concurrency Anomalies • Lost update, dirty read and unrepeatable read are called concurrency anomalies • The concurrency control part of a DBMS has the task to prevent these problems • DBMS is responsible to ensure that either all operations of a transaction are successfully executed and their effect is permanently stored in the database, or it happens as if the transaction were even not started • The effect of a partially executed transaction has to be undone
Advantages of Concurrent Execution of Transactions • Advantages of concurrent execution are: • Increased Throughput and Resource Utilization, leading to better transaction throughput • E.g. one transaction can be using the CPU while another is reading from or writing to the disk (processor and disk spend less time idle.) • Because I/O activity can be done in parallel with CPU activity. • Reduced Average Response Time: The average time for a transaction to be completed after it has been submitted • For transactions: short transactions need not wait behind long ones. Concurrent execution reduces unpredictable delays in running trans.
Implementation of Atomicity and Durability • The recovery-management component of a database system implements the support for atomicity and durability. • E.g. A very simple but extremely inefficient scheme is the shadow-copy scheme: • all updates are made on a shadow copy of the database • db_pointer is made to point to the updated shadow copy after • the transaction reaches partial commit and • all updated pages have been flushed to disk.
Implementation of Atomicity and Durability (Cont.) • db_pointer always points to the current consistent copy of the database. • In case transaction fails, old consistent copy pointed to by db_pointer can be used, and the new copy can be deleted. • The shadow-database scheme: • Assumes that only one transaction is active at a time. • Assumes disks do not fail • Useful for text editors, but • extremely inefficient for large databases (why?) • Does not handle concurrent transactions
Transaction Manager Scheduler Recovery Manager Buffer Manager Database Architecture Low Level DBMS
Database Architecture • The components of a DBMS that manage transactions are as follows: • The transaction manager coordinates transactions on behalf of application programs. • It communicates with the scheduler, which implements a particular strategy for concurrency control • The scheduler tries to maximize concurrency without allowing transactions to interfere with one another. • If failure occurs during a transaction, the recovery manager ensures that the database is restored to the state it was in before the start of the transaction. • The buffer manager is responsible for the transfer of data between disk storage and main memory
Transaction Management • Both concurrency and recovery control are required to protect the database from data inconsistencies and data loss. • Many DBMSs allow users to undertake simultaneous operations on the database. • If these operations are not controlled, they can interfere with each other, and the database may become inconsistent. • To overcome this the DBMS implements Concurrency control schemes– i.e. mechanisms to achieve isolation • That is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database • Database recovery is the process of restoring the database to a correct state following a failure– The failure may be the result of a system crash, media failure, software error, or accidental or malicious destruction of data. • Whatever the cause of failure, the DBMS must be able to restore the database to a consistent state
Concurrency Control • Concurrency control is the process of managing simultaneous operations on the database without having them interfere with one another. • Concurrency control is needed because many users are able to access the database simultaneously. • Note that managing concurrent access is easy if all users are only reading data. • There is no way that such uses can interfere with one another. • However, when two or more users are accessing the databases simultaneously, and at least one of them is updating data, there may be interference that can cause inconsistencies.
DB (consistency constraints) Concurrency Control How to prevent harmful interference btw transactions? T1 T2 … Tn • To prevent such problems a DBMS must implement concurrency control techniques based on - locks - timestamps and validation
Schedules • Schedule – A sequences of instructions that specify the chronological order (possibly interleaving) in which instructions of concurrent transactions are executed • Suppose there are n transactions T1, T2,…,Tn • A scheduleS of these n transactions is an ordering of their operations such that for each Tiparticipating in S, the operations of Ti in S appear in the same order as in Ti itself • A schedule for a set of transactions must consist of all instructions of those transactions • It must preserve the order in which the instructions appear in each individual transaction.
Complete Schedules • A transaction that successfully completes its execution will have a commit instructions as the last statement • by default transaction assumed to execute commit instruction as its last step • A transaction that fails to successfully complete its execution will have an abort instruction as the last statement • A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called complete schedule.
Serial Schedules • A schedule S is said to be a serial schedule if for each Ti in S, all operations of Ti are executed consecutively in S • If the actions of different transactions are not interleaved-that is transactions are executed from start to end one by one • Serial schedules are considered to be correct, i.e. that they do not exhibit concurrency anomalies, like: • lost update • unrepeatable read • dirty read • But, serial schedules mean no interleaving and hence are considered inefficient
Serial Schedules : Schedule 1 • Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. • A serial schedule in which T1 is followed by T2:
Schedule 2 • A serial schedule where T2 is followed by T1
Schedule 3 • Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, its concurrent schedule. But it is equivalent to Schedule 1. In Schedules 1, 2 and 3, the sum A + B is preserved.
Schedule 4 • The following concurrent schedule does not preserve the value of (A + B).
Examples of Legal Schedules • T < S • 1. avoid lost update problem • T: transfer $100 from A to C: R(A) W(A) R(C) W(C) • S: transfer $100 from B to C: R(B) W(B) R(C) W(C) • 2. avoid inconsistent retrievals problem • T: transfer $100 from A to C: R(A) W(A) R(C) W(C) • S: compute total balance for A and C: R(A) R(C) • 3. avoid non-repeatable reads • T: transfer $100 from A to C R(A) W(A) R(C) W(C) • S: check balance and withdraw $100 from A: R(A) R(A) W(A)
Defining the Legal Schedules • 1. To be serializable, the conflicting operations of T and S must be ordered as if either T or S had executed first. • We only care about the conflicting operations: everything else will take care of itself. • 2. Suppose T and S conflict over some shared item(s) x. • 3. In a serial schedule, T’s operations on x would appear before S’s, or vice versa....for every shared item x. • As it turns out, this is true for all the operations, but again, we only care about the conflicting ones. • 4. A legal (conflict-serializable) interleaved schedule of T and S must exhibit the same property. • Either T or S “wins” in the race to x; serializability dictates that the “winner take all”.
Serializable Schedules • Basic Assumption – Each transaction preserves database consistency. • Thus serial execution of a set of transactions preserves database consistency. • Schedules that allow interleaving to some extent and are correct are called serializableschedules • A serializable schedule is considered to be correct if it is equivalent to a serial schedule • Different forms of schedule equivalence 1. conflict serializability 2. view serializability
Conflict Serializable Schedules • Two operations conflict if • they belong to different transactions • at least one of them is a write • they access the same item X • A serializiable schedule is conflict equivalent to a serial schedule if • order of any two conflicting operations is the same in both schedules • Simplified view of transactions • We ignore operations other than read and write instructions • We assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. • Our simplified schedules consist of only read and write instructions.
Conflicting Instructions • Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q. 1. li = read(Q), lj = read(Q). li and ljdon’t conflict. 2. li = read(Q), lj = write(Q). They conflict. 3. li = write(Q), lj = read(Q). They conflict 4. li = write(Q), lj = write(Q). They conflict • Intuitively, a conflict between liand lj forces a (logical) temporal order between them. • If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.