Chapter 9 Transaction Support
Transaction • Transaction • An action or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database. • A transaction is a logical unit of work on the database. • It may involve any number of operations on the database. • The execution of an application program can be seen as a series of transactions with non-database processing done in between.
Transaction states • A transaction transforms a database from one consistent state into another. • During the transaction inconsistent states are possible. • Outcomes: • Committed : database reaches a new consistent state. • Aborted : database must be restored to the consistent state it was in before the transaction started. • An aborted transaction is rolled-back or undone (message independence). • A committed transaction cannot be aborted. • Transactions cannot be nested. • The DBMS has no way of finding the transaction boundaries but most DML have appropriate keywords available: • Begin Transaction • Commit • Rollback
Properties of a transaction • Atomicity • The “all or nothing” property . • A transaction is performed in its entirety or not performed at all. • Consistency • A transaction must transform a database from one consistent state into another consistent state. • Independence • Transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions. • Durability • The effects of committed transactions should not be lost because of subsequent failure.
Access methods File manager System buffers Database and data dictionary Database Architecture Transaction manager Scheduler Buffer manager Recovery manager Transaction Manager : Coordinates transactions on behalf of the applications. Scheduler : Responsible for implementing a strategy for concurrency control. Recovery Manager : Ensures that the database remains in a consistent state. Buffer Manager : Responsible for the transfer of data between disk storage and main memory.
Concurrency Control • Concurrency Control • The process of managing simultaneous operations on the database without having them interfere with one another. • Many users work simultaneously. • Database records are first transferred to central memory where the modifications are performed and then rewritten to the disk. • Transactions can be interleaved due to program interrupts. • This leads to concurrency problems: • Lost update problem; • Uncommitted dependency problem; • Inconsistent analysis problem.
Lost Update Problem • Successfully completed update is overwritten by another user. • T1 withdrawing €10 from an account with balance x, initially € 100. • T2 depositing € 100 into same account. • Serially, final balance would be € 190.
Lost Update Problem Time Transaction T1 Transaction T2 value x t1 Begin transaction t2 Begin Transaction Read R1(x) x=100 t3 Read R1(x) t4 x=x+100 t5 x=x-10 t6 write R(x) x=200 t7 write R(x) x=90 t8 commit t9 commit
Uncommitted Dependency Problem • Occurs when one transaction can see intermediate results of another transaction before it has committed. • T4 updates x to €200 but it aborts, so balx should be back at original value of €100. • T3 has read new value of x (€200) and uses value as basis of €10 reduction, giving a new balance of €190, instead of €90.
Uncommitted Dependency Problem Time Transaction T3 Transaction T4 value x t1 Begin transaction t2 Read R1(x) x=100 t3 x = x + 100 t4 write R1(x) x=200 t5 Begin transaction t6 read R1(x) t7 Rollback x=100 t8 x = x - 10 x=190 t9 write R1(x) t10 Commit should be 90
Inconsistent Analysis Problem • Occurs when transaction reads several values but second transaction updates some of them during execution of first. • Sometimes referred to as dirty read or unrepeatable read. • T6 is totaling balances of account x (€100), account y (€50), and account z (€25). • Meantime, T5 has transferred €10 from x to z, so T6 now has wrong result (€10 too high).
Inconsistent Analysis Problem Time Transaction T5 Transaction T6 x y z sum t1 Begin transaction 100 50 25 t2 Begin Transaction sum=0 100 50 25 0 t3 read x read x 100 50 25 0 t4 x = x - 10 sum = sum + x 100 50 25 100 t5 write x read y 90 50 25 100 t6 read z sum = sum + y 90 50 25 150 t7 z = z + 10 90 50 25 150 t8 write z 90 50 35 150 t9 commit read z 90 50 35 150 t10 sum = sum + z 90 50 35 185 t11 commit 90 50 35 185
Serializability • Objective of a concurrency control protocol is to schedule transactions in such a way as to avoid any interference. • Could run transactions serially, but this limits degree of concurrency or parallelism in system. • Serializability identifies those executions of transactions guaranteed to ensure consistency.
Serializability • Schedule • A transaction consists of a sequence of reads and writes to the database. The sequence of reads and writes by a set of concurrent transactions taken together is the schedule. • Serial Schedule • A schedule where the operations of each transaction are executed consecutively without any interleaved operation from other transactions. • Nonserial Schedule • A schedule where the operations from a set of concurrent transactions are interleaved.
Nonserial Schedule • Schedule where operations from set of concurrent transactions are interleaved. • Objective of serializability is to find nonserial schedules that allow transactions to execute concurrently without interfering with one another. • In other words, want to find nonserial schedules that are equivalent to some serial schedule. Such a schedule is called serializable.
Serializable Schedule • Serializable Schedule • If a set of transactions executes concurrently, we say that the schedule is correct (serializable), if it produces the same result as some serial execution. • The ordering of reads and writes is important in serializability • if two transactions only read a data item, they do not conflict and order is not important; • if two transactions either read or write completely separate data items, they do not conflict and order is not important ; • if one transaction writes a data item and another either reads or writes the same data item, the order of execution is important.
Recoverability • Serializability identifies schedules that maintain database consistency, assuming no transaction fails. • Could also examine recoverability of transactions within schedule. • If transaction fails, atomicity requires effects of transaction to be undone. • Durability states that once transaction commits, its changes cannot be undone (without running another, compensating, transaction).
Recoverable Schedule A schedule where, for each pair of transactions Ti and Tj, if Tj reads a data item previously written by Ti, then the commit operation of Ti precedes the commit operation of Tj.
Concurrency Control Techniques • Two basic concurrency control techniques: • Locking • Time-stamping • Both are conservative approaches: delay transactions in case they conflict with other transactions. • Optimistic methods assume conflict is rare and only check for conflicts at commit.
Concurrency Control Techniques • Locking • A procedure used to control concurrent access to data. When one transaction is accessing the database, a lock may deny access to other transactions to prevent incorrect updates. • Data items of various sizes, ranging from the entire database down to a field , may be locked. • The size of the item determines the granularity of the lock. • Implementation can be done by • setting a bit in the data item; • keeping a list of locked parts; • other techniques.
Lock Types • Read lock • If a transaction has a read lock on a data item, it can read the item but not update it. • Write lock • If a transaction has a write lock on the data item , it can both read and update the item.
Using Locks • Any transaction that needs to access the data item must first lock the item, requesting a read lock for read only access or a write lock for both read and write access. • If the item is not already locked by another transaction , the lock will be granted. • If the item is currently locked, the DBMS determines whether the request is compatible with the existing lock : • a read request on an item with a read lock will be granted; • for other requests the transaction must wait until the existing lock is released. • A transaction continues to hold a lock until it explicitly releases it , either during execution or when it terminates. • It is only when the write lock has been released that the effects of the write operation will be made visible to other transactions.
Two-phase Locking • 2PL • A transaction follows the two-phase locking protocol if all locking operations precede the first unlock operation in the transaction. • With this protocol every transaction has two phases: • growing phase: where no locks can be released; • shrinking phase: where no locks can be acquired. • Some systems allow upgrades ( in the growing phase ) or downgrades ( in the shrinking phase ) of a lock. • With 2PL serializability of schedules can be granted.
2PL: Lost update solution Time Transaction T1 Transaction T2 value x t1 Begin transaction t2 Begin Transaction Read R1(x) WL x=100 t3 request WL t4 wait x=x+100 t5 wait write R(x) x=200 t6 wait unlock(x) UL t7 Read R1(x) WL commit t8 x=x-10 t9 write R(x) x=190 t10 unlock(x) UL t11 commit
2PL: Uncommitted Dependency Time Transaction T1 Transaction T2 value x t1 Begin transaction t2 Read R1(x) WL x=100 t3 x = x + 100 t4 write R1(x) x=200 t5 Begin transaction t6 request WL x=200 t7 wait Rollback UL x=100 t8 read R1(x) WL x=100 t9 x = x - 10 x=90 t10 write(x) x=90 t11 Commit
Deadlock Deadlock • An impasse that may result when two or more transactions are each waiting for locks held by the other to be released. Time Transaction 1 Transaction 2 t1 begin transaction t2 write-lock (x) begin transaction t3 read (x) write-lock (y) t4 x = x - 10 read (y) t5 write (x) y = y + 100 t6 write-lock (y) write (y) t7 wait write-lock (x) t8 wait wait t9 wait wait
Deadlock • Only one way to break deadlock: abort one or more of the transactions. • Deadlock should be transparent to user, so DBMS should restart transaction(s). • Two general techniques for handling deadlock: • Deadlock prevention. • Deadlock detection and recovery.
Deadlock Prevention • DBMS looks ahead to see if transaction would cause deadlock and never allows deadlock to occur. • Could order transactions using transaction timestamps: • Wait-Die - only an older transaction can wait for younger one, otherwise transaction is aborted (dies) and restarted with same timestamp.
Deadlock Prevention • Wound-Wait - only a younger transaction can wait for an older one. If older transaction requests lock held by younger one, younger one is aborted (wounded).
Timestamping • Timestamp • A unique identifier created by the DBMS that indicates the relative starting time of a transaction. • Data items can get read-timestamp or a write-timestamp. • Timestamping • A concurrency control protocol in which the fundamental goal is to order transactions globally in such a way that older transactions (with smaller time stamps) get priority in the event of conflict . • If a transaction attempts to read or write a data item , it can only proceed if the last update on that data item was carried out by an older transaction; otherwise , the transaction is restarted and given a new timestamp.
Timestamping • Read/write proceeds only if last update on that data item was carried out by an older transaction. • Otherwise, transaction requesting read/write is restarted and given a new timestamp. • Also timestamps for data items: • read-timestamp - timestamp of last transaction to read item. • write-timestamp - timestamp of last transaction to write item.
Timestamping - Read(x) • Consider a transaction T with timestamp ts(T): ts(T) < write_timestamp(x) • x already updated by younger (later) transaction. • Transaction must be aborted and restarted with a new timestamp.
Timestamping - Read(x) ts(T) < read_timestamp(x) • x already read by younger transaction. • Roll back transaction and restart it using a later timestamp. ( must not be aborted because it cannot have read a wrong version since the write has not yet been done).
Granularity of Data Items • Size of data items chosen as unit of protection by concurrency control protocol. • Ranging from coarse to fine: • The entire database. • A file. • A page (or area or database spaced). • A record. • A field value of a record.
Granularity of Data Items • Tradeoff: • coarser, the lower the degree of concurrency. • finer, more locking information that is needed to be stored. • Best item size depends on the types of transactions.
File Recovery • Two Complementary Techniques. • Backup. • Periodical copy of the Database on an archive file. • full backup. • incremental backup. • Recovery ( more than 10% of code of DBMS ). • after a failure bring the database back in a reliable state. • Redundancy is needed. • Time factor is crucial.
Restore Actions (simplified) In case of failure: DB is unreadable DB is unreliable Last Archive New DB REDO New DB UNDO Logfile
UNDO • UNDO a started transaction can occur in case of: • insert • delete • update. • Direct access device is needed • active log Continuous transition • archive log • Image before only on active log. • }
Database Recovery • Process of restoring database to a correct state in the event of a failure. • Need for Recovery Control • Two types of storage: volatile (main memory) and nonvolatile. • Volatile storage does not survive system crashes. • Stable storage represents information that has been replicated in several nonvolatile storage media with independent failure modes.
Recovery Techniques • If database has been damaged: • Need to restore last backup copy of database and reapply updates of committed transactions using log file. • If database is only inconsistent: • Need to undo changes that caused inconsistency. May also need to redo some transactions to ensure updates reach secondary storage. • Do not need backup, but can restore database using before- and after-images in the log file.
Types of failures • System crashes, resulting in loss of main memory. • Media failures, resulting in loss of parts of secondary storage. • Application software errors. • Natural physical disasters. • Carelessness or unintentional destruction of data or facilities. • Sabotage.
Transactions and Recovery • Transactions represent basic unit of recovery. • Recovery manager responsible for atomicity and durability. • If failure occurs between commit and database buffers being flushed to secondary storage then, to ensure durability, recovery manager has to redo (rollforward) transaction's updates.
Transactions and Recovery • If transaction had not committed at failure time, recovery manager has to undo (rollback) any effects of that transaction for atomicity. • Partial undo - only one transaction has to be undone. • Global undo - all transactions have to be undone.
System interruption • I/O buffers lost • DB intact • UNDO of current transactions is needed but those transactions are difficult to identify. • Concept of " Checkpoint " • periodically the following steps must be performed: • Step 1: Log-buffers emptied on the logfile. • Step 2: Checkpoint record written on the Logfile. • Step 3: Database buffers emptied on the database. • Step 4: Address of checkpoint record written on the "restart file".
Log File • Contains information about all updates to database: • Transaction records. • Checkpoint records. • Often used for other purposes (for example, auditing).
Log File • Transaction records contain: • Transaction identifier. • Type of log record, (transaction start, insert, update, delete, abort, commit). • Identifier of data item affected by database action (insert, delete, and update operations). • Before-image of data item. • After-image of data item. • Log management information.
Log File • Log file may be duplexed or triplexed. • Log file sometimes split into two separate random-access files. • Potential bottleneck; critical in determining overall performance.