Database Transactions

Database Transactions

Transaction • Atransactionis an interaction between a user (or application) and a database. • A transaction may involve simply accessing the database and reading some of its data. • A transaction may involve updating (changing, mutating) some of the data. • A transaction may involve more than a single field of a single record of a single table. It is a set of accessing or updating steps that can be viewed as a unit. • E.g. entering a new employee as opposed to entering a new employee’s last name.

Transactions and Update Anomalies • Recall that we normalized tables to reduce redundancy, not so much to reduce the amount of storage necessary, but to avoid update anomalies. • A transaction should not place the database into an inconsistent state(a state violating some integrity constraint). • An individual step in a transaction may put the database into an inconsistent state, but the set of steps that make up the transaction should not.

Ex: Firing a broker from the firm • A decision would have to be made about what happens to the clients of a stock broker if that broker leaves the company. • They may be re-assigned. • They may be deleted. • They may be made inactive. • Whatever choice is made, the full set of steps necessary to maintain the consistency of the database would go into the BrokerLeaves transaction.

Committed or aborted • If a transaction is successfully completed, it takes the database from one consistent state to another consistent state. A transaction is then said to be committed. • If a transaction for some reason cannot be completed (e.g. if it violates an integrity constraint or if the system crashes), it is said to be aborted. This may leave the database in an inconsistent state.

Roll back • If an aborted transaction leaves a database in an inconsistent state, one must return the database to its previous (pre-aborted transaction) consistent state. • One says that the transaction is rolled backor undone.

Compensating transaction • Sometimes a transaction is committed and leads to a consistent state, but that state is nevertheless wrong. • We may have fired the broker and reassigned the clients but find out the broker is not fired after all. • In such an instance, we need a compensating transactionthat returns the database to its previous state.

BEGIN TRANSACTION, COMMIT, ROLLBACK • There may be many Data Manipulation Language (DML) statements within a transaction, so there must be a way to indicate where a transaction starts and where it ends (i.e. a way to “delimit” a transaction). • The DML keywords • BEGIN TRANSACTION • COMMIT • ROLLBACK • Etc serve this purpose.

ACID test • Atomic: a transaction should be all or nothing, an indivisible unit. • Consistent: a transaction should take the database from consistent state to consistent state. • Isolated: a transaction should be independent, the constituent steps of one transaction should not interfere with those of another. • Durable: once committed, the result of a transaction should be permanent (including if the system goes down and has to be recovered).

Some Database Architecture

Some Database Architecture (Cont.) • Transaction manager: handles transactions at the interface with users/applications • Scheduler: handles the timing of transactions, tries to allow as many simultaneous transactions (concurrency) as possible provided they do not interfere • Recovery manager: makes sure database is returned to a consistent state if failure occurs • Buffer manager: handles the transfer of data between memory and storage

Concurrency problem: the bank example • To update someone’s account, you determine his/her balance and then add/subtract the amount of the deposit/withdrawal and write that in as the new balance. • balance = getBalance(accountInfo) • addDeposit(accountInfo, balance, depositAmount)

TRANSACTION 1 getBalance(myAccount) addDeposit(myAccount,$100) TRANSACTION 2 getBalance(myAccount) addDeposit(myAccount, $50) Concurrency problem: the bank example (lost update) State $500 $500 $600 $550 The first update, the deposit of $100 is lost. Cure: Do not allow the second transaction to read the balance until the first update has occurred.

TRANSACTION 1 getBalance(myAccount) addDeposit(myAccount,$100) Transaction failure (ROLLBACK) TRANSACTION 2 getBalance(myAccount) addDeposit(myAccount, $50) Concurrency problem: the bank example (uncommitted dependency, dirty read) State $500 $600 $600 $500 $650 The first update, the deposit of $100 is kept even though the transaction was rolled back. Cure: Do not allow the second transaction to read the balance until the first transaction has been committed.

TRANSACTION 1 Sum =0 Sum+=myAccnt1.balance Sum+=myAccnt2.balance TRANSACTION 2 getBalance(myAccnt1) addDeposit(myAccnt1, $50) Concurrency problem: the bank example (inconsistent analysis) State $500 $100 $500 $100 $550 $100 $550 $100 The update was made after the analysis (in this case summing) was started, the deposit of $50 is not seen in the analysis.

Serial Scheduling • One way to prevent two transactions from interfering would be to perform all of the transactions in serial, that is, to allow only one transaction (up to and including commitment or rollback) at a time. • The order of transactions can still be important. Performing addInterest() and then deposit(amount) has a different outcome from deposit(amount) and then addInterest(). But either way the results are consistent.

Multi-user database • Complete serialization of a database is unsatisfactory if there are multiple, simultaneous database users. • Some transactions are independent and do not interfere. • We not only want the guaranteed non-interference provided by complete serialization but also want to allow parallel transactions to occur so long as they do not interfere. • An allowable schedule of transactions should be consistent with a completely serialized set of transactions.

Parallel transactions • Allowed: • If two transactions only read data, they cannot interfere. • If two transactions involve (reading or writing) disjoint sets of data, they cannot interfere. • Disallowed: • If one transaction writes some data and a second transaction reads or writes the same data, they can interfere.

Serializabilty Completely Serialized These parts of transactions 7 and 8 do not interfere.

Ex. Schedule that is not consistent with a completely serialized schedule. Cannot be moved earlier because of read(balx) Cannot be moved later because of read(baly)

Locking • In order to prevent two transactions from interfering, a transaction can place a lock on some data. • Any subsequent transaction must respect the lock until it is released by the original transaction.

Shared Lock/Shared Lock • Recall that two transactions can read the same data without causing any interference. • In such a case, the first transaction would request a shared lockon the data in question. • (Assuming there are no previous locks on the data it will be granted.) • The second transaction would also request a shared lock. • This lock will also be granted and both transactions can read the data, and eventually each releases its lock on the data.

Shared Lock/Exclusive Lock • Suppose the first transaction only wants to read the data in question, it requests a shared lock. • (Assuming there are no previous locks on the data it will be granted.) • Now suppose the second transaction wants to read and update the data, it would request an exclusive lock. • The exclusive lock would be denied, and only after the first transaction has read the data and released its lock can the second transaction receive its exclusive lock on the data.

Exclusive Lock/Shared Lock • Suppose the first transaction wants to read and update the data in question, it requests an exclusive lock. • (Assuming there are no previous locks on the data it will be granted.) • Now suppose the second transaction wants only to read the data, it would request a shared lock. • The shared lock would be denied and only after the first transaction has read/updated the data and released its lock can the second transaction receive its shared lock on the data.

Exclusive Lock/Exclusive Lock • Suppose the first transaction wants to read and update the data in question, it requests an exclusive lock. • (Assuming there are no previous locks on the data it will be granted.) • Now suppose the second transaction also wants to read and update the data, it would also request an exclusive lock. • The second exclusive lock would be denied and only after the first transaction has read/updated the data and released its lock can the second transaction receive its exclusive lock on the data.

Exclusive/Shared Locks • Only shared locks can be shared. • Exclusive locks must be exclusive. • The DBMS must keep track of which data is locked by what transaction and what kind of lock it is. • The DBMS must also ensure that locks are released when transactions are committed or rolled back.

Upgrading and downgrading locks • Some systems allow a transaction that has a shared lock to request an exclusive lock in mid transaction, this is known as upgrading the lock. • One might read some data, test some condition and then decide to update. • Similarly some systems allow a transaction to change an exclusive lock to a shared lock, i.e. to downgrade the lock. • If one assumed some data would be updated, but then decided not to update that particular set of data.

Simple locking does not cure all problems. The schedule below is still not consistent with the serialized transaction. T9 locks balx. T9 releases balx. T10 locks balx T10 releases balx T10 locks baly T10 releases baly T9 locks baly. T9 releases baly.

Two-phase locking • The scenario on the previous slide can be avoided by insisting on what is called two-phase locking. • In two-phase locking, a transaction cannot release a lock until it has completed all of the locking needed.

Not allowed by two-phase locking. T9 locks balx. T9 releases balx. T10 locks balx T10 releases balx T10 locks baly T10 releases baly T9 locks baly. T9 releases baly. T9 cannot release balx until after it locks baly

Growing and shrinking phases • Two phase locking is so-called since initially the amount of data locked by the transaction can only increase. This is known as the growing phase. • But once data begins to be released it can only be released, then the amount of data locked by the transaction can only decrease. This is known as the shrinking phase.

Upgrade/downgrade two phase • If upgrading a lock is allowed, it is only allowed during the growing phase. • If downgrading a lock is allowed, it is only allowed during the shrinking phase.

Lost data prevented T1 has to wait until T2 releases the lock on balx.

Uncommitted dependency (dirty read) prevented T3 must wait until T4 releases lock, which is done after the rollback.

Inconsistent analysis prevented Analysis put on hold until updating is complete.

Cascading Rollback • If a transaction does not wait until the end to release its locks, then some subsequent transaction may be reading or updating that data. Then if the original transaction is rolled back, so too must the subsequent transaction be rolled back. This phenomenon is known as cascading rollback.

Preventing Cascading Rollbacks • One way to prevent cascading rollbacks is to delay the shrinking phase until the very end of the transaction. No locks are released until the end. (This is known as rigorous two phase locking.) • Another way is to allow shared locks to be released in the middle of a transaction but only allow exclusive locks to be released at the end. (This is known as strict two phase locking.)

Deadlock • If two transactions are both in their growing phase, the first transaction may request a lock on data already locked by the second transaction. Meanwhile the second transaction may be requesting a lock on data already locked by the first. Neither transaction can proceed. • This situation is called deadlock.

Deadlock Example

Breaking the tie • The DBMS must have a way to get out of deadlock. • There are various scenarios. • One of the simplest is setting a limit on the amount of time a transaction will wait for a lock before stopping the transaction and releasing any holds it has. The transaction is then tried again later (by the DBMS, not by the user/application).

References • Database Systems (Rob and Coronel) • Database Systems (Connolly and Begg)

Database Transactions