1 / 21

# Query Optimization

Query Optimization. Ex.: SELECT DISTINCT Orders.Customer FROM Orders, Contains WHERE Orders.O_No = Contains.O_No AND Contains.Product = 'Brie' Assumptions: 100,000 tuples in Orders, 1000 bytes each

Télécharger la présentation

## Query Optimization

E N D

### Presentation Transcript

1. Query Optimization • Ex.: • SELECT DISTINCT Orders.Customer • FROM Orders, Contains • WHERE Orders.O_No = Contains.O_No • AND Contains.Product = 'Brie' • Assumptions: • 100,000 tuples in Orders, 1000 bytes each • 1,000,000 tuples in Contains, 100 bytes each • 1,000 tuples in Contains concern Brie • 100 MB main memory O. Günther: Database Management Systems

2. Query Optimization (cont.) • Strategy 1: • 1) Compute cartesian product Orders  Contains • 2) Select all tuples with Orders.O_No = Contains.O_No • 3) Select all tuples with Contains.Product = 'Brie' • 4) Project to Customer • Strategy 2: • 1) Select all tuples from Contains with Product = 'Brie' • 2) Compute cartesian product with Orders • 3) Select all tuples with Orders.O_No = Contains.O_No • 4) Project to Customer O. Günther: Database Management Systems

3. Query Optimization (cont.) • Analysis Strategy 1: • (1)+(2): Tuple-I/Os for Orders: • Tuple-I/Os for Contains: • (3)+(4): Tuple-I/Os: • Tuple-I/Os in total: • Analysis Strategy 2: • (1): Tuple-I/Os for Contains: • (2)-(4): Tuple-I/Os: • Tuple-I/Os in total: • Strategy 2 is clearly superior (Factor?) O. Günther: Database Management Systems

4. Query Optimization (cont.) • Which (meta)data should be stored? (Statistics) • - number of tuples for each relation • - number of columns for each relation • - number of different values per column • - occurence frequencies of particular values • More information facilitates query optimization but slows down updates • Automatical optimization preferable because • - statistics always up-to-date • - more cost-efficient • - dynamic • Important strength of relational systems O. Günther: Database Management Systems

5. Transaction Processing • Transaction (TA) • - logical unit of work • - should be executed either completely or not at all • - atomic, i.e., not decomposable • Recovery • Concurrency O. Günther: Database Management Systems

6. Recovery • Recovery: restart after system fault • System faults • - program crash • - arithmetic mistakes (e.g. overflow) • - disk crash • - power failure • Ex.: • DELETE • FROM Contains • WHERE O_No = 1024 • What happens in case of system fault „in the middle“ O. Günther: Database Management Systems

7. Recovery (cont.) • COMMIT • - operation to terminate a TA successfully • - all updates are stored in the database permanently • - storage on „safe“ storage medium • - transaktion is finalized • - bundling of several COMMIT operations in checkpoints • ROLLBACK • - operation to abort a TA in case of system fault • - changes in CPU registers and storage are reversed • Important for ROLLBACK • - logging each single modification • - storing the log on a „safe“ medium O. Günther: Database Management Systems

8. recovery checkpoint checkpoint checkpoint error Recovery (cont.) (Updates are stored on some “safe” medium) O. Günther: Database Management Systems

9. Recovery (cont.) • 3 types of transactions • - transactions that already completed and whose results have been made • permanent: T1 • - transactions that have already completed but whose results have not • yet been made permanent: T2, T4  REDO (i.e. re-run, after recovery • these transactions will have completed) • - transactions that started but that did not finish: T3, T5  UNDO (i.e. • reversal of all modifications, ROLLBACK of each transaction concerned; • after recovery these transactions will not have completed) O. Günther: Database Management Systems

10. transaction A transaction B update R.X read from R.X action on basis of R.X commit B Problem! ROLLBACK A R .. relation R.X .. attributes of R Concurrency: Dirty Read Problem O. Günther: Database Management Systems

11. transaction A transaction B Concurrency: Lost Update Problem A reads R.X B reads R.X double R.X B adds 2 to R.X A writes new value of R.X B writes new value of R.X Commit A Commit B transaction A transaction B A changes R.X A reads R.X B changes R.X A Rollback B Commit O. Günther: Database Management Systems

12. Concurrency: Possible Solutions • Timestamps to coordinate transactions • Locks: temporary blocking of parts of the database • - exclusive lock (X-Lock): read/write lock, i.e. no other TA • is allowed to read or write the blocked data • - shared lock (S-Lock): write lock, i.e., others can read but not write • If a TA wants to read, it first has to ask for an S-lock for the required data • If a TA wants to write, it first has to ask for an X-lock for the required data • compatibility of locks • S+S ... OK • S+X ... Not OK • X+X ... Not OK O. Günther: Database Management Systems

13. Locks: Application to Dirty Read N N Yes N Yes Yes Yes Yes Yes O. Günther: Database Management Systems

14. Locks: Application to Dirty Read (cont.) TA A obtains an X-lock for the field R.X to prepare for the planned update TA B asks for an S-lock to prepare for the planned read operation  REJECTED ROLLBACK A  locks are released TA A obtains S-lock TA B performs read operation + COMMIT restart TA A • Ex. 1: O. Günther: Database Management Systems

15. Locks: Application to Dirty Read (cont.) TA A requests X-Lock for R.X TA A obtains X-Lock, updates R.X TA B requests S-Lock  REJECTED, TA B waits TA A ROLLBACK TA B obtains S-Lock, reads R.X TA B COMMIT restart TA A, re-obtains X-Lock • Ex. 2: O. Günther: Database Management Systems

16. TA A wants to read R.X, asks for S-lock TA A obtains S-lock, reads R.X TA B also wants to read R.X, asks for S-Lock TA B obtains S-Lock, reads R.X TA A wants to update R.X, asks for X-Lock TA A does not obtain X-Lock because TA B holds an S-Lock  A waits TA B wants to update R.X, asks for X-Lock TA B does NOT obtain X-Lock  B waits Locks: Application to Lost Update DEADLOCK  break via Rollback of some TA O. Günther: Database Management Systems

17. Deadlocks • Problem: How to recognize deadlocks? • How to treat deadlocks involving several TAs? • Searching for cycles in the WAIT-FOR graph wait for O. Günther: Database Management Systems

18. Serializability • Given a set of TAs, which possible events should be considered correct? • Convention: a schedule is considered correct if it is serializable • Serializability means that the result of the schedule is identical to the result • of some serial schedule • Ex.: (TA1) A := A + 1  read A into main memory add 1 write A back into the DB (TA2) A := 2 * A  read A into main memory multiply by 2 write A back into the DB (TA3) write A  read A into main memory display A on the screen set A to 1 in the DB O. Günther: Database Management Systems

19. Serializability - An Example • Assumption: A = 1 • TA1, TA2, TA3: • TA1, TA3, TA2: • TA2, TA3, TA1: • TA2, TA1, TA3: • TA3, TA1, TA2: • TA3, TA2, TA1: O. Günther: Database Management Systems

20. Concurrency: 2-Phase Locking • 2-Phase locking protocol • for each transaction one first asks for all required locks (phase I) • processing ... • then all locks are (gradually) released (phase II) number of locks TA2: no 2-phase-locking O. Günther: Database Management Systems

21. Concurrency and 2-Phase Locking Theorem: 2-Phase Locking Protokoll for each transaction Serializability of the schedule all „reasonable“ possibilities serializable 2-phase-locking equivalent to FIFO serial O. Günther: Database Management Systems

More Related