Chapter 20: Recovery

Failure Types
• Transaction failures: local recovery
• System failure: global recovery
  • Main memory is lost
  • Disk survives
• Media failure

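Motivation
• Atomicity: all-or-nothing
  • Transactions may abort (“rollback”).
• Durability:
  • Changes survive a server crash.
• Desired behavior after the system restarts:
  • T1, T2 and T3 should be durable.
  • T4 and T5 should be aborted (their effects are not seen).
(Figure: timeline of transactions T1 to T5 ending in a crash; T1, T2 and T3 commit (c) and T5 aborts (a) before the crash, while T4 is still active when the crash happens.)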

Update Approaches
• In-place updating
  • Change the value of an object and write it back to the same place on stable storage
  • Used in most database systems
• Multiversion system
  • The old version of an object is kept and a new version is created
  • E.g., in PostgreSQL
  • A vacuum procedure deletes, from time to time, old versions that are no longer needed
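To make the multiversion idea concrete, here is a minimal sketch, assuming each object keeps a list of (commit timestamp, value) versions and that vacuum may drop every version older than the newest one still visible to the oldest running reader. `VersionedObject` and its methods are hypothetical illustrations, not PostgreSQL's actual storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VersionedObject:
    # Versions as (commit_ts, value) pairs, oldest first (hypothetical layout).
    versions: list[tuple[int, str]] = field(default_factory=list)

    def write(self, commit_ts: int, value: str) -> None:
        # Multiversion update: create a new version instead of overwriting in place.
        self.versions.append((commit_ts, value))

    def read(self, snapshot_ts: int) -> str | None:
        # A reader sees the newest version committed at or before its snapshot.
        visible = [value for ts, value in self.versions if ts <= snapshot_ts]
        return visible[-1] if visible else None

    def vacuum(self, oldest_snapshot_ts: int) -> None:
        # Keep the newest version visible to the oldest running reader and all
        # newer versions; anything older can no longer be read by anyone.
        keep_from = 0
        for i, (ts, _) in enumerate(self.versions):
            if ts <= oldest_snapshot_ts:
                keep_from = i
        del self.versions[:keep_from]


obj = VersionedObject()
obj.write(10, "a")
obj.write(20, "b")
obj.write(30, "c")
print(obj.read(25))  # b
obj.vacuum(25)
print(obj.versions)  # [(20, 'b'), (30, 'c')]
```

The version with timestamp 10 is removed because a reader whose snapshot is 25 or later already sees version 20.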

Handling the Buffer Pool
• Force/NoForce
  • Force: if transaction T has written object X on page P, force page P to disk (flush) before T commits
    • All changes of T are in the database on stable storage before T commits
  • NoForce: flushing pages to disk is determined only by the replacement policy of the buffer manager
    • Some of the changes of T might not be in the stable database at commit
• Steal/NoSteal
  • NoSteal: if transaction T has updated object X on page P, do NOT flush P before T commits
    • No change of an active, uncommitted transaction is on stable storage
  • Steal: the replacement strategy is allowed to replace and flush a page even if the page contains updates of an uncommitted transaction
    • Changes of uncommitted transactions might be in the stable database
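The four policies differ only in when the buffer manager is allowed, or obliged, to write a page. A minimal sketch, assuming a hypothetical `Page` that records which uncommitted transactions have updated it; `may_flush` and `pages_to_force` are illustrative names, not a real buffer-manager API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    page_id: str
    dirty: bool = False
    # Transactions that updated this page and have not committed yet.
    uncommitted_writers: set[str] = field(default_factory=set)


def may_flush(page: Page, steal: bool) -> bool:
    """May the replacement policy write this page to disk right now?"""
    if steal:
        return True  # Steal: uncommitted changes may reach stable storage
    return not page.uncommitted_writers  # NoSteal: wait until all writers commit


def pages_to_force(pages: list[Page], txn: str, force: bool) -> list[str]:
    """Pages that must be flushed before txn is allowed to commit."""
    if not force:
        return []  # NoForce: only the replacement policy decides when pages are flushed
    return [p.page_id for p in pages if p.dirty and txn in p.uncommitted_writers]


p3 = Page("P3", dirty=True, uncommitted_writers={"T1"})
p5 = Page("P5", dirty=True)
print(may_flush(p3, steal=False), may_flush(p3, steal=True))  # False True
print(pages_to_force([p3, p5], "T1", force=True))  # ['P3']
```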

Combinations
            | NoSteal                      | Steal
Force       | Atomic flush at commit time  | Flush any time before commit
No Force    | Flush any time after commit  | Flush any time

More on Steal
• STEAL (why enforcing atomicity is hard)
  • To steal frame F: the current page in F (say P) is written to disk while some transaction T holds a lock on object A on P.
  • What if T, holding the lock on A, aborts? What if the system fails directly after the flush and before T commits?
  • We must remember the old value of A at steal time (to support UNDOing the write to page P).
  • Crash case: we have to do something about transaction T4, which was ACTIVE at the time of the crash.
• NoSteal (no uncommitted changes are in the stable database)
  • At restart after a failure we are sure that none of the changes of T4 are in the DB
  • Nothing to do for ACTIVE transactions

More on Force
• Force (write changes to disk before the transaction commits)
  • At restart after a failure we are sure that the changes of T1, T2 and T3 are in the DB and the changes of T5 are NOT; nothing to do for TERMINATED transactions
• NoForce (why enforcing durability is hard)
  • Assume transaction T has modified a tuple on page P and T has committed, but the update is not yet in the stable database. What if the system now crashes before the modified page is written to disk?
  • Write as little as possible, in a convenient place, at commit time, to support REDOing modifications.

Combinations
• Ideal: FORCE/NO-STEAL
  • Nothing has to be done at recovery
  • Problem: basically not possible with update-in-place
• In reality: mostly NO-FORCE/STEAL

Basic Idea: Logging
• A log is a read/append data structure maintained on stable storage (it survives failures)
• UNDO information: when transaction T updates an object, it stores the old version of the object (before-image); when the transaction aborts, the before-image is copied to the object's current location.
• REDO information: when transaction T updates an object, it stores the new version of the object (after-image); it can be used to redo updates of transactions that committed.
• In total: whenever a transaction T updates an object, both the before- and the after-image are written as one log record and appended to the log.
• Additionally: when a transaction starts, a BEGIN record is appended to the log; when a transaction commits/aborts, a commit/abort record is appended to the log.
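A minimal sketch of such a log, assuming one record type for begin/update/commit/abort entries and an in-memory list standing in for stable storage. The field names, including the `lsn` sequence number, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LogRecord:
    lsn: int                  # position in the log (hypothetical field)
    txn: str
    kind: str                 # "begin", "update", "commit" or "abort"
    obj: str | None = None    # updated object (update records only)
    before: Any = None        # before-image: used to UNDO
    after: Any = None         # after-image: used to REDO


class Log:
    """Append-only log; kept in memory here, on stable storage in a real DBMS."""

    def __init__(self) -> None:
        self.records: list[LogRecord] = []

    def append(self, txn: str, kind: str, obj: str | None = None,
               before: Any = None, after: Any = None) -> LogRecord:
        rec = LogRecord(len(self.records) + 1, txn, kind, obj, before, after)
        self.records.append(rec)
        return rec


log = Log()
log.append("T255", "begin")
log.append("T255", "update", "x", before=5, after=7)  # T255: w(x)
log.append("T255", "commit")
for r in log.records:
    print(r.lsn, r.txn, r.kind, r.obj, r.before, r.after)
```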

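Architecture II
(Figure: the upper layer accesses data through the cache/buffer manager and its buffer pool; the buffer manager reads and writes pages on stable secondary storage (random access, access cost ~15 ms) and writes the log to a separate log disk (append/read, access cost ~1 ms).)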

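DB pages and Log pages
(Figure: DB page i is a slotted page whose records are addressed by Rid = (i,1), (i,2), ..., (i,N); a slot directory holds the number of slots and each record's offset within the page. A log page holds records such as "T255: begin", "T3: commit", "T3: before(y), after(y)", "T255: before(z), after(z)" and "T255: before(x), after(x)"; the write T255: w(x) appends its before/after-image record at the log tail.)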

When to flush a log page
• The Write-Ahead Logging (WAL) protocol:
  • #1: The log entry for an update must be forced to disk before the corresponding data page gets to disk.
  • #2: All log entries of a Xact must be written before it commits.
• Note: flushing a log page is much cheaper than flushing a DB page!
• #1 guarantees atomicity
  • Assume active T has changed X, and the page with X gets flushed to disk (steal); now the system crashes before T commits => T's changes must be undone; we need the before-image of X!
• #2 guarantees durability
  • Assume T has changed X and committed, but the page with X has not been flushed to disk (no-force); now the system crashes => T's changes must be redone; we need the after-image of X!
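Both WAL rules can be enforced in two places in the buffer manager: just before a data page is written and at commit. A minimal sketch, assuming each buffered page remembers the LSN of the last log record that changed it (`page_lsn`) and the log tracks how far it has been flushed (`flushed_lsn`); all names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WalLog:
    records: list[tuple[int, str, str]] = field(default_factory=list)  # (lsn, txn, text)
    flushed_lsn: int = 0  # all records with lsn <= flushed_lsn are on stable storage

    def append(self, txn: str, text: str) -> int:
        lsn = len(self.records) + 1
        self.records.append((lsn, txn, text))  # lands in the in-memory log tail
        return lsn

    def flush(self, up_to_lsn: int) -> None:
        # The log is sequential: flushing up to an LSN also flushes everything before it.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


@dataclass
class BufferedPage:
    page_id: str
    page_lsn: int = 0  # LSN of the last log record that changed this page


def write_page(log: WalLog, page: BufferedPage, disk: dict[str, int]) -> None:
    # WAL rule #1: force the page's log records before the page itself reaches disk.
    if page.page_lsn > log.flushed_lsn:
        log.flush(page.page_lsn)
    disk[page.page_id] = page.page_lsn  # stand-in for writing the page image


def commit(log: WalLog, txn: str) -> None:
    # WAL rule #2: all of txn's log records, ending with the commit record, are forced.
    lsn = log.append(txn, "commit")
    log.flush(lsn)


log = WalLog()
disk: dict[str, int] = {}
p5 = BufferedPage("P5")
p5.page_lsn = log.append("T1", "before(A), after(A)")  # T1 writes A on P5
write_page(log, p5, disk)  # steal: P5 goes to disk, so log record 1 is forced first
log.append("T2", "before(B), after(B)")  # T2 writes B on P3 (P3 stays in the pool)
commit(log, "T2")  # no-force: only the log is flushed at commit
print(log.flushed_lsn, disk)  # 3 {'P5': 1}
```

Flushing up to the commit record's LSN covers all earlier records of T2, because the log is written sequentially.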

Types of Recovery
• Local UNDO during normal processing
  • Whenever a transaction aborts, undo the updates of the aborted Xact by installing before-images.
  • The log records are probably still in main memory; scan backwards starting from the log tail.
• Global UNDO: at restart after a system crash
  • Xacts that aborted before the crash (we find an abort record in the log)
  • Xacts that were active at the time of the crash (we find neither an abort nor a commit record in the log)
  • Whenever pages on disk contain updates of such Xacts (we say the update is reflected in the database), undo these updates by installing before-images
  • Pages contain additional information to detect this!
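Local UNDO can be sketched as a backward scan from the log tail that stops at the transaction's BEGIN record. This assumes a hypothetical record layout (txn, kind, obj, before, after) and a dict standing in for the objects in the buffer pool.

```python
from __future__ import annotations

# Hypothetical log layout: each record is (txn, kind, obj, before, after).


def local_undo(log: list[tuple], db: dict[str, object], txn: str) -> None:
    """Roll back txn during normal processing by installing before-images."""
    # Scan backwards from the log tail; these records are probably still in memory.
    for rec_txn, kind, obj, before, _after in reversed(log):
        if rec_txn != txn:
            continue  # records of other transactions are skipped
        if kind == "begin":
            break  # reached the start of txn: nothing older belongs to it
        if kind == "update":
            db[obj] = before  # install the before-image
    log.append((txn, "abort", None, None, None))


def update(log: list[tuple], db: dict[str, object], txn: str, obj: str, value: object) -> None:
    log.append((txn, "update", obj, db[obj], value))  # log first, then change the object
    db[obj] = value


db = {"x": 1, "y": 2}
log = [("T1", "begin", None, None, None)]
update(log, db, "T1", "x", 10)
log.append(("T2", "begin", None, None, None))
update(log, db, "T1", "x", 11)
update(log, db, "T1", "y", 20)
local_undo(log, db, "T1")
print(db)  # {'x': 1, 'y': 2}
```

Because the scan runs from newest to oldest, x is first reset to 10 and then to its original value 1.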

Types of Recovery
• Partial REDO: at restart after a system crash
  • Xacts that committed before the crash (we find a commit record in the log)
  • Whenever pages on disk do not contain the updates of such Xacts (we say the update is not reflected in the database), redo the updates by installing after-images
  • Pages contain additional information so that we can detect this.
• Global REDO: after a disk failure
  • Make a snapshot of the database (once a day / once a week)
  • Duplicate the log and keep it on two disks
  • Keep the log on a second storage device
  • After a disk failure: start with the snapshot and then apply the log
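Global REDO after a disk failure can be sketched as "copy the snapshot, then re-apply after-images of committed transactions in log order". This assumes the snapshot is transaction-consistent (taken while no transaction was running) and that the log written since the snapshot survived on a second disk; under that assumption no undo is needed, because uncommitted updates are simply never re-applied. The record layout (txn, kind, obj, before, after) is hypothetical.

```python
from __future__ import annotations

import copy


def media_recovery(snapshot: dict[str, object], log: list[tuple]) -> dict[str, object]:
    """Rebuild the database after a disk failure from a snapshot plus the log."""
    db = copy.deepcopy(snapshot)  # start with the archived snapshot
    committed = {rec[0] for rec in log if rec[1] == "commit"}
    # Apply the log: re-install after-images of committed transactions in log order.
    for txn, kind, obj, _before, after in log:
        if kind == "update" and txn in committed:
            db[obj] = after
    return db


snapshot = {"a": 1, "b": 2}
log = [
    ("T1", "begin", None, None, None),
    ("T1", "update", "a", 1, 5),
    ("T1", "commit", None, None, None),
    ("T2", "begin", None, None, None),
    ("T2", "update", "b", 2, 9),  # T2 never committed: skipped
]
print(media_recovery(snapshot, log))  # {'a': 5, 'b': 2}
```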

Recovery after Crash
• Simple procedure:
• Backward pass: scan the log from tail to head; for each record:
  • If it is the commit of T, include T in the list of committed transactions C
  • If it is the abort of T, include T in the list of aborted transactions A
  • If it is an update record of T and T is in neither A nor C, include T in A
  • If it is an update record of T on object X and T is in A:
    • Read in page P with object X
    • If the update on X was performed, install the before-image
• Forward pass: scan the log from head to tail; for each record:
  • If it is an update record of T on object X and T is in C:
    • Read in page P with object X
    • If the update on X was not yet performed, install the after-image

Example of Recovery
Log (record number: entry):
1: update: T1 writes A on P5
2: update: T2 writes B on P3
3: T1 abort
4: update: T3 writes C on P1
5: update: T3 writes D on P3
6: update: T2 writes Z on P5
7: T3 commit
CRASH, RESTART
Buffer manager (BM): P5 is flushed after record 1 (before T1's undo), and P3 is flushed after record 5.
• Backward pass:
  • 7: put T3 in C
  • 6: put T2 in A; read P5; nothing has to be done
  • 4, 5: nothing
  • 3: put T1 in A
  • 2: read P3; install the before-image of B
  • 1: read P5; install the before-image of A (the write on A was flushed to disk, but the undo performed during normal processing was not)
• Forward pass:
  • 4: read P1; install the after-image of C
  • 5: read P3; nothing has to be done
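The simple two-pass procedure can be sketched as follows, replaying the example above. The "additional information" on pages is modeled here, as an assumption, by storing next to each object on disk the LSN of the last update reflected in it; undoing record r sets that LSN to r - 1 ("state before r"), so a later committed update on the same object is still recognized as missing in the forward pass. The values a0/a1, b0/b1 and so on are hypothetical before- and after-images.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rec:
    lsn: int
    txn: str
    kind: str        # "update", "commit" or "abort"
    obj: str = ""
    page: str = ""
    before: str = ""
    after: str = ""


def recover(log: list[Rec], disk: dict[str, dict[str, tuple[str, int]]]) -> None:
    """disk maps page -> object -> (value, lsn of the last update reflected there)."""
    committed: set[str] = set()  # C
    aborted: set[str] = set()    # A
    # Backward pass (tail to head): build C and A, undo reflected updates of A.
    for r in reversed(log):
        if r.kind == "commit":
            committed.add(r.txn)
        elif r.kind == "abort":
            aborted.add(r.txn)
        elif r.kind == "update":
            if r.txn not in committed and r.txn not in aborted:
                aborted.add(r.txn)  # active at the crash: treat as aborted
            if r.txn in aborted:
                _, lsn = disk[r.page][r.obj]  # read in page P with object X
                if lsn >= r.lsn:  # the update is reflected on disk
                    # Install the before-image; mark X as holding the state before r.
                    disk[r.page][r.obj] = (r.before, r.lsn - 1)
    # Forward pass (head to tail): redo committed updates missing on disk.
    for r in log:
        if r.kind == "update" and r.txn in committed:
            _, lsn = disk[r.page][r.obj]
            if lsn < r.lsn:  # the update is not reflected on disk
                disk[r.page][r.obj] = (r.after, r.lsn)


log = [
    Rec(1, "T1", "update", "A", "P5", "a0", "a1"),
    Rec(2, "T2", "update", "B", "P3", "b0", "b1"),
    Rec(3, "T1", "abort"),
    Rec(4, "T3", "update", "C", "P1", "c0", "c1"),
    Rec(5, "T3", "update", "D", "P3", "d0", "d1"),
    Rec(6, "T2", "update", "Z", "P5", "z0", "z1"),
    Rec(7, "T3", "commit"),
]
# Disk at the crash: P5 flushed after record 1, P3 after record 5, P1 never.
disk = {
    "P5": {"A": ("a1", 1), "Z": ("z0", 0)},
    "P3": {"B": ("b1", 2), "D": ("d1", 5)},
    "P1": {"C": ("c0", 0)},
}
recover(log, disk)
print(disk)  # A and B back to before-images, C redone, D and Z unchanged
```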

Checkpointing
• The log becomes longer and longer => recovery has to read the entire log!
• Periodically, the DBMS creates a checkpoint in order to minimize the time taken to recover in the event of a system crash.
• Simple checkpoint:
  • Goal: only the log written after the checkpoint has to be analyzed
  • Algorithm:
    • Prevent new transactions from starting
    • Wait until all transactions have terminated
    • Flush all dirty pages to stable storage
    • Write a checkpoint log entry
    • Start new transactions
  • Upon recovery: the backward pass only goes back to the last checkpoint entry
• In real life this is more complicated: transaction processing is not interrupted, and there is no big flush in one step
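A minimal sketch of the simple checkpoint and of how restart uses it, assuming log records are (txn, kind) pairs, a checkpoint record of kind "checkpoint", and that blocking new transactions happens outside these functions. All names are hypothetical.

```python
from __future__ import annotations


def take_checkpoint(active: set[str], dirty_pages: set[str],
                    flushed: list[str], log: list[tuple[str, str]]) -> None:
    """Simple (quiescent) checkpoint; new transactions are assumed to be blocked."""
    if active:
        raise RuntimeError("wait until all transactions have terminated")
    flushed.extend(sorted(dirty_pages))  # flush all dirty pages to stable storage
    dirty_pages.clear()
    log.append(("-", "checkpoint"))  # then write the checkpoint log entry


def records_to_analyze(log: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Log suffix that restart has to scan: everything after the last checkpoint."""
    for i in range(len(log) - 1, -1, -1):  # backward pass from the log tail
        if log[i][1] == "checkpoint":
            return log[i + 1:]
    return log  # no checkpoint yet: the whole log must be read


log = [("T1", "update"), ("T1", "commit"), ("T2", "update"), ("T2", "abort")]
dirty, flushed = {"P3", "P5"}, []
take_checkpoint(set(), dirty, flushed, log)
log += [("T3", "update"), ("T4", "update"), ("T3", "commit")]
print(flushed)                  # ['P3', 'P5']
print(records_to_analyze(log))  # [('T3', 'update'), ('T4', 'update'), ('T3', 'commit')]
```

Everything before the checkpoint can be skipped because, at that point, no transaction was running and every dirty page had reached stable storage.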

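Example
(Figure: timeline of transactions T1 to T5 around a simple checkpoint, marking the checkpoint steps "start", "Flush Buffer" and "Write Chkpt record"; transactions end with commit (c) or abort (a).)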

Further Issues
• Crash during recovery
• Logical logging: instead of a physical before-/after-image, log a redo operation and its inverse operation (e.g., increment by one / decrement by one)
• Hard disk failures:
  • Mirror disk, or
  • Archive copy (consistent copy of the database on tape, created e.g. once every night when there is no transaction processing) + archive log (similar to the log shown here)
• Real life: much more complicated; see the textbook on ARIES
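A minimal sketch of logical logging, assuming each log record stores an object, an operation name and an argument, and that every operation has a known inverse. The operation table and function names are hypothetical.

```python
from __future__ import annotations

# Hypothetical operation table: each logical operation with its inverse.
OPS = {
    "increment": (lambda v, n: v + n, lambda v, n: v - n),
    "decrement": (lambda v, n: v - n, lambda v, n: v + n),
}


def redo(db: dict[str, int], rec: tuple[str, str, int]) -> None:
    obj, op, arg = rec
    db[obj] = OPS[op][0](db[obj], arg)  # re-apply the logged operation


def undo(db: dict[str, int], rec: tuple[str, str, int]) -> None:
    obj, op, arg = rec
    db[obj] = OPS[op][1](db[obj], arg)  # apply the inverse operation


db = {"stock": 10}
t1 = ("stock", "increment", 1)
t2 = ("stock", "increment", 5)
redo(db, t1)  # T1: 10 -> 11
redo(db, t2)  # T2: 11 -> 16
undo(db, t1)  # roll back T1 only: 16 -> 15, T2's increment is kept
print(db)     # {'stock': 15}
```

If the locking scheme lets T2 increment the same object before T1 finishes, reinstalling T1's physical before-image (10) would also erase T2's change, whereas the inverse operation does not. The flip side is that redoing an operation twice gives a different result, so a logical redo must only run when the update is known not to be reflected in the database, which also matters when the system crashes during recovery.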
