410 likes | 612 Vues
F28DM Database Management Systems Transaction Management. Monica Farrow monica@macs.hw.ac.uk Room: EMG30, Ext: 4160 Material on Vision Content taken from HW & GLA lecturers. In this lecture. All about transactions What is a transaction? Problems with conflicting read/writes
E N D
F28DM Database Management SystemsTransaction Management Monica Farrow monica@macs.hw.ac.uk Room: EMG30, Ext: 4160 Material on Vision Content taken from HW & GLA lecturers Transaction Management
In this lecture • All about transactions • What is a transaction? • Problems with conflicting read/writes • Serialisation and precedence graphs • Locks and deadlocks • Timestamps • MVCC • Exercises Transaction Management
Supporting Concurrent Access • Many applications require a lot of users to access the data simultaneously (e.g. airline booking systems) • Uncontrolled simultaneous access can result in chaos, so some controlling mechanism is required • A transaction is a logical unit of work which takes the DB from one consistent state to another, i.e. obeying constraints • It will probably be made up of smaller operations which temporarily cause inconsistency Transaction Management
Database Transactions • Database transactions are logical units of work which must ALL be performed to maintain data integrity • E.g. Move money from one account to another • UPDATE Account SET balance = balance – 100WHERE accountNo = 123; • UPDATE Account SET balance = balance + 100WHERE accountNo = 124; • Another example would be a purchase • Create order, decrease stock quantity, add payment Transaction Management
More generally, in databases • A transaction is the execution of a program that accesses the DB and • starts with a BEGIN operation, • followed by a sequence of READ and WRITE operations, • ending with a COMMIT or ABORT operation. • An update, for example adding 10 to a value, will actually • begin • first read the value, • calculate the new value, • and then write the new value • commit Transaction Management
How transactions are used • Transactions are used for three purposes in DBMS: • to determine when integrity constraint checks should occur (only at the end of transactions) • to control concurrent access. Gives a single user the illusion of being the sole user of the database • to manage recovery from system crashes Transaction Management
User transactions (aside) • In database requirements, the word ‘transaction’ is used in a general way, • meaning a user’s logical unit of work, without necessarily requiring this to use database transactions discussed in this lecture. • e.g. • Show data to user • Ask for response • Use response in next sql command Transaction Management
User transactions and time • There is a problem with this sort of transaction where the user is involved • because of the length of time taken. • It is impractical to treat them within the DBMS in the same way as transactions not involving a user. • There are various partial solutions not discussed in this course. Transaction Management
ACID Properties of Transactions • Atomicity • ALL operations in a transaction must be completed. If not, the transaction is aborted. The entire transaction is treated as a single, indivisible unit of work which must be performed completely or not at all. • Consistency • A successful transaction takes the database from one state that is consistent with the rules to another state that is also consistent with the rules. • If an operation is executed that violates the database’s integrity constraints, the entire transaction will be rolled back. Transaction Management
ACID Properties of Transactions • Isolation • Data used within a transaction cannot be used by another transaction until the first transaction is completed. (or it must appear that this happened!). The partial effects of incomplete transactions should not be visible to other transactions. • Durability • Once the transaction changes have been made, they will survive failure. The recovery system must ensure this. Transaction Management
Transactions in MySQL • To successfully transfer money, use the START TRANSACTION and COMMIT commands: START TRANSACTION UPDATE Account SET balance = balance – 100WHERE accountNo = 123; UPDATE Account SET balance = balance + 100WHERE accountNo = 124; COMMIT; • The database is not updated until the COMMIT command is executed Transaction Management
Transactions in MySQL • Transactions can consist of one or more SQL commands. • Not all storage engines support transactions, but InnoDB does • You should be specifying InnoDB in your Create Table statements because it supports foreign key constraints and transactions. Transaction Management
Implicit Transactions in MySQL • MySQL runs by default with auto-commit enabled. • Each MySQL statement is treated as a single transaction, with an implicit COMMIT at the end. • In this case, UPDATE Account SET balance = balance + 100; • is the same as START TRANSACTION UPDATE Account SET balance = balance + 100; COMMIT; Transaction Management
Rollback • The DBMS maintains a transaction log. • If the computer crashes in the middle of a transaction, the DBMS will rollback the database to the last completed transaction Transaction Management
Rollback • Also, you can use the MySQLROLLBACK command • This is most useful for testing updates – your database is restored to the state immediately before the last transaction. Check it worked, then rollback START TRANSACTION UPDATE Account SET bal = bal – 100 WHERE accountNo = 123; UPDATE Account SET bal = bal + 100 WHERE accountNo = 124; SELECT balance FROM Account WHERE accountNo = 123; SELECT balance FROM Account WHERE accountNo = 124; ROLLBACK; Transaction Management
Concurrent access • In introducing many users, we can either serialise their transactions or interleave them A | Get V | Add 10 | Put V B || Get X | Add 20| put X V 55 5 5 515 15 X 1010 10 10 10 10 30 Serialised schedule Interleaved schedule A Get V Add 10 Put V B || | Get X Add 20 put X V 55 15 15 15 15 X 1010 10 10 10 30 Transaction Management
Concurrent access • We wish to interleave transactions, as we want to use the processor to perform other work while one transaction waits for a disc access • However, we must not allow the transactions to conflict with each other • Conflict may occur when two transactions are trying to use the same piece of data and at least one of them is trying to change it Transaction Management
Schedules • The execution of a set of transactions is called a schedule • If each transaction is executed entirely before the next transaction is started, the schedule is said to be serial • Non-serial schedules are called interleaved schedules • A schedule is serializable if it has the same effect on the database as a serial schedule • Here are some examples of problems with non-serial schedules Transaction Management
A | Get V | Add 10 | Put V B || Get V | Add 20| put V V 55 5 5 515 25 Problem 1 - Lost updates A and B both take a copy of the original value of V They both change the value in memory A puts back its new value first and then B puts back its new value which immediately overwrites A’s change A's update is lost! Consider two transactions A and B which add 10 and 20 respectively to a value V Transaction Management
A | Store 20 in V | Crash &Rollback B || Get V | | Use wrongvalue of V| V 520 20 55 5 Problem 2 - Temporary Update A updates V B uses A's updated value A aborts and V's old value is restored B continues with erroneous value! Transaction Management
Problem 3 - Incorrect Summary A updates all the values in a set V B calculates an average while A is half-way through B uses inconsistent data B || | Read all Vs | Calculate avg | A | Update V1 Update V2 . . . . . Update Vn Transaction Management
Determining if schedule is serialisable • A precedence graph is a good way of checking whether a schedule is serialisable • If there are no cycles in the diagram, the schedule is serialised. • i.e. the output of one transaction does not depend on the output of another transaction cyclic acyclic T1 T2 T1 T2 T3 T3 Transaction Management
Drawing a precedence graph • Draw a node for each transaction in the schedule • For each action, check to see if • A read follows an earlier write by a different transaction • A write follows an earlier read or write by a different transaction • In each case, draw an arrow from the earlier transaction to the later transaction B a4/a7 T1 T2 A a1/a5 C a6/a8 A a3/a5 T3 Transaction Management
Drawing a precedence graph • It is not usual to put the action numbers (a1/a5) in precedence graphs, they are just there to help you understand which actions the lines have come from • This graph is acyclic, so the schedule is serialisable B a4/a7 T1 T2 A a1/a5 C a6/a8 A a3/a5 T3 Transaction Management
Drawing a precedence graph • You can check your graphs herehttp://www-stud.uni-due.de/~selastoe/?mdl=dbms&mode=precedence#graph • Another example • This graph is cyclic, so the schedule is not serialisable B a2/a6 T1 T2 B a4/a5 Transaction Management
Concurrency control Algorithms • The scheduler component of a DBMS applies some concurrency control algorithm (enforces a protocol) to ensure that only serializable schedules are permitted • Concurrency control algorithms can be divided into • Locking vs timestamp protocols • Pessimistic and optimistic • One commonly used algorithm is 2-phase locking Transaction Management
Locking • Every time a transaction makes use of a piece of data it notifies the DBMS of this and acquires a lock on that item • This gives it certain access rights, usually one of two types: • an exclusive Lock (X-lock) means that no-one else can use it • a shared Lock (S-lock) means that anyone else can also have an S-lock but not an X-lock • (NB – some DBMS have more than this) • "One writer or many readers" • When updating, the transaction needs an X-lock • When retrieving, the transaction only needs an S-lock Transaction Management
Locking continued • If a transaction tries to acquire a lock but someone else already holds an incompatible lock, the transaction must wait • The database system might provide locks at different levels of granularity: • e.g. locking a cell, a record, a page, the whole table, the whole database • the bigger the locking unit the more the system will be slowed down by blocked transactions • the smaller the locking unit the more lock management needs to be done Transaction Management
2-Phase Locking protocol • If a lock request cannot be granted, the transaction must wait. • Rather than acquiring and releasing locks whenever they are required, a 2-phase locking protocol if often used. • The transaction passes through 2 phases • a growth phase, acquiring locks and not releasing any • a shrink phase, releasing locks and not acquiring more. • There are variations – in strict 2PL, all write locks are released at the end. Transaction Management
Demonstrating locks in MySQL • MySQL uses row-level locking • The update in Terminal 2 cannot be processed until the commit is processed in Terminal 1 Terminal 1 START TRANSACTION; Terminal 2 UPDATE Account SET balance = 100 WHERE accountNo = 23525; UPDATE Account SET balance = balance + 10 WHERE accountNo = 23525; UPDATE Account SET balance = 100 WHERE accountNo = 23526; COMMIT; Transaction Management
Solving Lost Updates Transaction A Transaction B Value of V Request X-lock on V 5 Request X-lock on V 5 Acquire X-lock on V 5 Wait 5 Get V .... 5 .... 5 Update V Wait 15 Release X-lock on V 15 Acquire X-lock on V 15 Get V 15 Update V 35 Release X-lock on V 35 Transaction Management
Solving Temporary update Transaction A Transaction B Value of V Request X-lock on V 5 Acquire X-lock on V Request S-lock on V 5 Wait 5 Set V to 20 20 Crash 20 Roll back 5 Release lock 5 Acquire an S-lock on V 5 Get V 5 Transaction Management
Solving incorrect summary Transaction A Transaction B Request X-lock on V1 Acquire X-lock on V1 Request S-lock on V1 Update V1 Wait Request X-lock on V2 . . . . . . . Release all locks Acquire S-lock on V1 etc. Transaction Management
Deadlocks • 2PL still leads to deadlocks • A deadlock is a cycle of transactions waiting for locks to be released by each other • E.g. • T1 holds excl(Y) and requests shrd(Z) • T2 is holding excl(Z) and requests shrd(Y) • Deadlocks can be timed-out, prevented or detected Transaction Management
Demonstrating deadlocks in MySQL • You can create a deadlock in MySQL as shown below • The 2nd update in Terminal 2 causes deadlock and an error message is produced – aborted, try again Terminal 1 START TRANSACTION; Terminal 2 Update row in Account table START TRANSACTION; where accountNo = 23525 (row locked) Update row In Employee table where staffNo = 301 (row locked) Update row where staffNo = 301 In Employee table (has to wait) Update row in Account table where accountNo = 23525 (deadlock) Transaction Management
Deadlock handling • Time-outs • Assume that if a transaction is blocked longer than a certain period of time, it must be involved in a deadlock. Abort. • Easy to implement, but may abort some transactions unnecessarily • Prevention • Order transactions by timestamps, apply rules as to whether transactions are allowed to wait or must be restarted. • Detection • Periodically check for deadlock. Create a Wait-For Graph (similar to Precedence Graphs, but for locks) • Deadlock exists if there are cycles in the graph. Abort transactions until cycles vanish Transaction Management
Time Based Concurrency Control: Idea • Lock-based concurrency control is pessimistic • Assumption – conflicts are likely to happen, and locking prevents this • Timestamp-based concurrency control is optimistic • Assign timestamps to transactions and the data that they access • Validate timestamps and abort if incompatible Transaction Management
Timestamps • Each transaction T gets a ‘timestamp’ TS at startup • This may be from the system clock or simply by incrementing a logical counter for each transaction • Each data item contains timestamp data • Read-timestamp: timestamp of the transactions which read the data most recently • Write-timestamp: timestamp of the transactions which wrote the data most recently Transaction Management
Timestamps • R/W operation is only allowed if the last update on a data item was carried out by an older transaction. • If not, the transaction requesting the operation is aborted and restarted with a new timestamp. • No locking means no waiting and no deadlocks Transaction Management
Multiversion concurrency control • MVCC provides each user connected to the database with a "snapshot" of the database for that person to work with. • A sequence of read actions will use the same snapshot, so that the data is consistent • Read queries do not use locks and so do not block updates. • MySQL InnoDB and many other DBMS, such as Oracle, use MVCC. Transaction Management