
MVCC on Flash Memory

Fan Yulei, Lab of WAMDM, School of Information, Renmin University of China, Beijing, China, 2009-06-13.





Presentation Transcript


  1. MVCC on Flash Memory
     Fan Yulei, Lab of WAMDM, School of Information, Renmin University of China, Beijing, China, 2009-06-13

  2. Outline
     • Motivation
     • MVCC
     • Berkeley DB
     • PostgreSQL
     • Future work

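  3. Motivation
     • Characteristics of flash memory
       • No in-place update: unlike an HDD, flash can only write to erased pages, and erasure works on whole blocks, so an update is written to a new page
     [Figure: HDD vs. Flash]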
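
The "no in-place update" point can be made concrete with a toy page-mapping flash translation layer. This is a minimal sketch under assumed names (`FlashFTL`, a naive append-only allocator, no garbage collection or erase); it is not taken from the presentation. It shows that an update lands on a new physical page and leaves the superseded copy on flash until it is erased, which is presumably what makes multiversion schemes attractive to study on flash.

```python
class FlashFTL:
    """Toy page-mapping FTL: every write goes to a fresh physical page (no in-place update)."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages   # physical pages; programmed at most once until erased
        self.valid = [False] * num_pages  # True if the page holds the current copy of a logical page
        self.mapping = {}                 # logical page number -> physical page number
        self.next_free = 0                # naive append-only allocator (no garbage collection)

    def write(self, lpn, data):
        if self.next_free >= len(self.pages):
            raise RuntimeError("out of free pages: a real FTL would garbage-collect and erase")
        ppn = self.next_free
        self.next_free += 1
        self.pages[ppn] = data            # program a fresh page instead of overwriting
        self.valid[ppn] = True
        old = self.mapping.get(lpn)
        if old is not None:
            self.valid[old] = False       # previous copy is now stale but still physically present
        self.mapping[lpn] = ppn
        return ppn

    def read(self, lpn):
        return self.pages[self.mapping[lpn]]

    def stale_copies(self):
        """Superseded data still sitting on flash, i.e. implicit old versions."""
        return [self.pages[p] for p in range(self.next_free) if not self.valid[p]]


ftl = FlashFTL(num_pages=8)
ftl.write(0, "balance=100")
ftl.write(0, "balance=80")   # update of the same logical page
print(ftl.read(0))           # balance=80
print(ftl.stale_copies())    # ['balance=100']
```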

  4. Motivation
     [Figure: mind map of transaction-processing topics]
     • CC (concurrency control)
       • Kinds of lock
       • 2PL: 1st phase, acquire locks; 2nd phase, release locks
       • MVCC: multiple versions, snapshot isolation
       • Conflict graph: directed acyclic graph
       • Timestamp ordering
       • Index CC: B+-tree
     • Recovery
       • Log: log file & data file; checkpoint (D & S)
       • Transaction: read the log file, then undo & redo (under 2PL and with multiple versions)
       • Media: backup database; hot standby with mirrored media

  5. MVCC
     • Monoversion schedules
       • s = r1(x) w1(x) r2(x) w2(y) r1(y) w1(z) c1 c2 contains a conflict cycle between t1 and t2 (w1(x) precedes r2(x), and w2(y) precedes r1(y))
       • s’ = r1(x) w1(x) r2(x) r1(y) w2(y) w1(z) c1 c2 (no cycle)
     • Multiversion schedules
       • m = r1(x0) w1(x1) r2(x1) w2(y2) r1(y0) w1(z1) c1 c2
       • Version function h: h(ri(x)) = wj(x) and h(wi(x)) = wi(x)
     • Monoversion schedules as multiversion schedules
       • s = r1(x) w1(x) r2(x) w2(y) r1(y) w1(z) c1 c2 corresponds to m = r1(x0) w1(x1) r2(x1) w2(y2) r1(y2) w1(z1) c1 c2, where every read sees the most recent preceding write
       • A monoversion schedule is a special case of a multiversion schedule
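
The standard version function that turns a monoversion schedule into a multiversion one fits in a few lines of Python. The string notation follows the slides; the regular expression, the function name `to_multiversion`, and the convention that an item nobody has written yet is read as version x0 (written by an implicit t0) are assumptions made for illustration.

```python
import re

# One schedule step: r/w/c, transaction number, optional (item[version]).
STEP = re.compile(r"([rwc])(\d+)(?:\(([a-z])(\d*)\))?")

def to_multiversion(schedule):
    """Rewrite a monoversion schedule with the standard version function:
    wi(x) creates xi, and ri(x) reads the version of the latest preceding write
    on x (x0, the initial version, if there is none)."""
    last_writer = {}
    steps = []
    for kind, tx, item, _ in STEP.findall(schedule):
        if kind == "c":
            steps.append(f"c{tx}")
        elif kind == "w":
            last_writer[item] = tx
            steps.append(f"w{tx}({item}{tx})")
        else:
            steps.append(f"r{tx}({item}{last_writer.get(item, '0')})")
    return " ".join(steps)

s = "r1(x) w1(x) r2(x) w2(y) r1(y) w1(z) c1 c2"
print(to_multiversion(s))
# r1(x0) w1(x1) r2(x1) w2(y2) r1(y2) w1(z1) c1 c2
```

Applied to s from the slide, it reproduces the multiversion form listed there.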

  6. MVCC
     • Traditional conflicts
       • s = w0(x) c0 w1(x) c1 r2(x) w2(y) c2
       • m = w0(x0) c0 w1(x1) c1 r2(x0) w2(y2) c2 (t2 reads the older version x0, so it behaves as if it ran before t1)
     • View equivalence
       • Reads-from relation: RF(m) := {(ti, x, tj) | rj(xi) ∈ OP(m), ti, tj ∈ trans(m)}
       • m and m’ are view equivalent iff trans(m) = trans(m’) and RF(m) = RF(m’)
       • Example
         • m = w0(x0) w0(y0) c0 r3(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(y2) c2
         • m’ = w0(x0) w0(y0) c0 w1(x1) c1 r2(x1) r3(x0) w2(y2) w3(x3) c3 c2
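
RF(m) and the view-equivalence test translate directly into set comparisons. In this sketch the helper names are hypothetical, and version xi is taken to be written by ti, as in all the slide's examples, so the version number of a read identifies the transaction it reads from.

```python
import re

STEP = re.compile(r"([rwc])(\d+)(?:\(([a-z])(\d+)\))?")

def trans(m):
    """trans(m): the set of transaction numbers occurring in m."""
    return {int(tx) for _, tx, _, _ in STEP.findall(m)}

def reads_from(m):
    """RF(m) = {(ti, x, tj) | rj(xi) in OP(m)}; version xi is assumed to be written by ti."""
    return {(int(ver), item, int(tj))
            for kind, tj, item, ver in STEP.findall(m) if kind == "r"}

def view_equivalent(m1, m2):
    return trans(m1) == trans(m2) and reads_from(m1) == reads_from(m2)

m  = "w0(x0) w0(y0) c0 r3(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(y2) c2"
m2 = "w0(x0) w0(y0) c0 w1(x1) c1 r2(x1) r3(x0) w2(y2) w3(x3) c3 c2"
print(sorted(reads_from(m)))    # [(0, 'x', 3), (1, 'x', 2)]
print(view_equivalent(m, m2))   # True
```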

  7. MVCC
     • Multiversion view serializability
     • A serial schedule that m is not view equivalent to
       • m = w0(x0) w0(y0) c0 r1(x0) r1(y0) w1(x1) w1(y1) c1 r2(x0) r2(y1) c2
       • s = w0(x) w0(y) c0 r1(x) r1(y) w1(x) w1(y) c1 r2(x) r2(y) c2
       • s is serial, but in m t2 reads x0 and y1, while in s it reads both x and y from t1
     • MVSR: m ∈ MVSR iff there is a serial monoversion schedule m’ such that
       • trans(m) = trans(m’)
       • m and m’ are view equivalent
     • Example
       • m = w0(x0) w0(y0) c0 w1(x1) c1 r2(x1) r3(x0) w3(x3) c3 w2(y2) c2
       • m’ = w0(x0) w0(y0) c0 r3(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(y2) c2
       • s = w0(x) w0(y) c0 r3(x) w3(x) c3 w1(x) c1 r2(x) w2(y) c2 (m’ written as a monoversion schedule)
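
The MVSR definition suggests a brute-force check: try every serial order, run it as a monoversion schedule, and compare reads-from relations. This is only a sketch for tiny examples (deciding MVSR is NP-complete in general); the function names, and the assumption that t0 always runs first because it writes the initial versions, are illustrative choices.

```python
import re
from itertools import permutations

STEP = re.compile(r"([rwc])(\d+)(?:\(([a-z])(\d+)\))?")

def parse(m):
    """Read/write steps as (kind, tx, item, version); commits are dropped."""
    return [(k, int(t), i, int(v)) for k, t, i, v in STEP.findall(m) if k != "c"]

def reads_from(steps):
    return {(v, i, t) for k, t, i, v in steps if k == "r"}

def run_serially(steps, order):
    """Serial monoversion execution: each read sees the latest write before it."""
    last_writer, out = {}, []
    for tx in order:
        for k, t, i, _ in steps:
            if t != tx:
                continue
            if k == "w":
                last_writer[i] = t
                out.append(("w", t, i, t))
            else:
                out.append(("r", t, i, last_writer.get(i, 0)))
    return out

def mvsr_witness(m):
    """Return a serial order view equivalent to m, or None (tries every order)."""
    steps = parse(m)
    txs = sorted({t for _, t, _, _ in steps} - {0})
    target = reads_from(steps)
    for perm in permutations(txs):
        order = (0,) + perm   # t0 writes the initial versions, so it always runs first
        if reads_from(run_serially(steps, order)) == target:
            return order
    return None

m1 = "w0(x0) w0(y0) c0 r1(x0) r1(y0) w1(x1) w1(y1) c1 r2(x0) r2(y1) c2"
m2 = "w0(x0) w0(y0) c0 w1(x1) c1 r2(x1) r3(x0) w3(x3) c3 w2(y2) c2"
print(mvsr_witness(m1))   # None
print(mvsr_witness(m2))   # (0, 3, 1, 2)
```

For the second example, the witness order t0, t3, t1, t2 is exactly the serial schedule m’ shown on the slide.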

  8. MVCC
     [Figure: inconsistent reads r2(x0) r2(y1) and r2(x1) r2(y0)]
     • Conflict graph G(m) = (V, E)
       • V = trans(m)
       • E = {(ti, tj) | rj(xi) ∈ OP(m), ti, tj ∈ trans(m)}
       • If m and m’ are view equivalent, then G(m) = G(m’)
     • Version order
       • m = w0(x0) w0(y0) w0(z0) c0 r1(x0) r2(x0) r2(z0) r3(z0) w1(y1) w2(x2) w3(y3) w3(z3) c1 c2 c3 r4(x2) r4(y3) r4(z3) c4
       • Version order « = {x0 « x2, y0 « y1 « y3, z0 « z3}
     • Multiversion serialization graph: MVSG(m, «) = G(m) plus version-order edges
       • For each rk(xj) and wi(xi) with i, j, k pairwise distinct: if xi « xj then (ti, tj) ∈ E, else (tk, ti) ∈ E
       • m ∈ MVSR iff there exists a version order « such that MVSG(m, «) is acyclic
     [Figure: MVSG over T0, T1, T2, T3, T4]
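
The MVSG construction and its acyclicity test can be sketched with Python's standard `graphlib`. The version order is passed as a dict from item to writer numbers, oldest version first (an assumed encoding); the edge rule follows the slide, with i, j, k pairwise distinct. On the slide's example the given version order yields an acyclic graph, whereas swapping y1 and y3 in the version order creates the cycle t1 → t2 → t4 → t1, which is why the MVSR criterion ranges over version orders.

```python
import re
from graphlib import TopologicalSorter, CycleError

STEP = re.compile(r"([rwc])(\d+)(?:\(([a-z])(\d+)\))?")

def mvsg_edges(m, version_order):
    """MVSG(m, <<): reads-from edges of G(m) plus the version-order edges.
    version_order maps an item to its writers' numbers, oldest version first."""
    steps = [(k, int(t), i, int(v)) for k, t, i, v in STEP.findall(m) if k != "c"]
    writers = {}
    for k, t, i, _ in steps:
        if k == "w":
            writers.setdefault(i, set()).add(t)
    edges = set()
    for k, tk, x, j in steps:
        if k != "r":
            continue
        if j != tk:
            edges.add((j, tk))                    # reads-from edge: tk reads xj
        rank = version_order[x].index
        for i in writers.get(x, ()):
            if i in (j, tk):
                continue                          # rule needs i, j, k pairwise distinct
            if rank(i) < rank(j):
                edges.add((i, j))                 # xi << xj
            else:
                edges.add((tk, i))                # xj << xi
    return edges

def serialization_order(edges):
    """A topological order of the graph, or None if it has a cycle."""
    preds = {}
    for a, b in edges:
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)
    try:
        return list(TopologicalSorter(preds).static_order())
    except CycleError:
        return None

m = ("w0(x0) w0(y0) w0(z0) c0 r1(x0) r2(x0) r2(z0) r3(z0) w1(y1) w2(x2) "
     "w3(y3) w3(z3) c1 c2 c3 r4(x2) r4(y3) r4(z3) c4")
order = {"x": [0, 2], "y": [0, 1, 3], "z": [0, 3]}
swapped = {"x": [0, 2], "y": [0, 3, 1], "z": [0, 3]}
print(serialization_order(mvsg_edges(m, order)))     # [0, 1, 2, 3, 4]
print(serialization_order(mvsg_edges(m, swapped)))   # None
```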

  9. MVCC
     • Multiversion conflict: ri(xj) and wk(xk) on the same item with k ≠ i and ri(xj) < wk(xk)
     • Multiversion conflict serializability (MCSR): there is a serial monoversion schedule m’ such that
       • trans(m) = trans(m’)
       • every pair of conflicting operations occurs in the same order in m and m’
     • Multiversion conflict graph: E = {(ti, tk) | ri(xj) < wk(xk)}
     • m ∈ MCSR iff its multiversion conflict graph is acyclic
     [Figure: hierarchy of the classes all, MVSR, MCSR, VSR, CSR]
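
The multiversion conflict graph only needs the relative order of each read and the later writes on the same item. A sketch with hypothetical names that builds it and tests acyclicity: the version-order example schedule from the previous slide passes, while a made-up schedule in which t1 and t2 each read an item the other later overwrites does not.

```python
import re
from graphlib import TopologicalSorter, CycleError

STEP = re.compile(r"([rwc])(\d+)(?:\(([a-z])(\d+)\))?")

def mv_conflict_edges(m):
    """(ti, tk) for every ri(xj) that precedes a wk(xk) on the same item with k != i."""
    steps = [(k, int(t), i) for k, t, i, _ in STEP.findall(m) if k != "c"]
    edges = set()
    for pos, (kind, ti, x) in enumerate(steps):
        if kind != "r":
            continue
        for kind2, tk, x2 in steps[pos + 1:]:
            if kind2 == "w" and x2 == x and tk != ti:
                edges.add((ti, tk))
    return edges

def is_mcsr(m):
    """m is in MCSR iff the multiversion conflict graph has no cycle."""
    preds = {}
    for a, b in mv_conflict_edges(m):
        preds.setdefault(a, set())
        preds.setdefault(b, set()).add(a)
    try:
        list(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False

ok = ("w0(x0) w0(y0) w0(z0) c0 r1(x0) r2(x0) r2(z0) r3(z0) w1(y1) w2(x2) "
      "w3(y3) w3(z3) c1 c2 c3 r4(x2) r4(y3) r4(z3) c4")
bad = "w0(x0) w0(y0) c0 r1(x0) r2(y0) w1(y1) w2(x2) c1 c2"   # hypothetical schedule
print(sorted(mv_conflict_edges(ok)), is_mcsr(ok))     # [(1, 2), (2, 3)] True
print(sorted(mv_conflict_edges(bad)), is_mcsr(bad))   # [(1, 2), (2, 1)] False
```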

  10. MVCC
     • Limiting the number of versions: k = 2 (the last two versions of x are given in parentheses)
       • w0(x0) c0 r1(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(x2) c2 (x1, x2)
       • w0(x0) c0 r1(x0) w1(x1) c1 r2(x1) w2(x2) c2 w3(x3) c3 (x2, x3)
       • w0(x0) c0 r1(x0) w1(x1) c1 w3(x3) c3 r2(x3) w2(x2) c2 (x2, x3)
       • w0(x0) c0 r2(x0) w2(x2) c2 r1(x2) w1(x1) c1 w3(x3) c3 (x1, x3)
       • w0(x0) c0 r2(x0) w2(x2) c2 w3(x3) c3 r1(x3) w1(x1) c1 (x1, x3)
       • w0(x0) c0 w3(x3) c3 r1(x3) w1(x1) c1 r2(x1) w2(x2) c2 (x1, x2)
       • w0(x0) c0 w3(x3) c3 r2(x3) w2(x2) c2 r1(x2) w1(x1) c1 (x1, x2)
     • k-version view serializability (kVSR): serializable and view equivalent, with every read accessing one of the k newest versions
     [Figure: hierarchy relationship]
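
A bounded version store shows what the k = 2 limit means physically: once a third version of x is written, the oldest one is gone, so a read can only be mapped to one of the two newest versions. This hypothetical `KVersionStore` models storage only, not the kVSR test itself; it replays the first schedule from the slide and ends with the same two versions (x1, x2).

```python
from collections import deque

class KVersionStore:
    """Keeps only the k newest versions of each item; older versions are dropped."""

    def __init__(self, k):
        self.k = k
        self.versions = {}                  # item -> deque of (writer_tx, value), oldest first

    def write(self, item, tx, value):
        chain = self.versions.setdefault(item, deque(maxlen=self.k))
        chain.append((tx, value))           # a full deque(maxlen=k) discards its oldest entry

    def kept(self, item):
        return [f"{item}{tx}" for tx, _ in self.versions.get(item, ())]

    def read(self, item, writer_tx):
        for tx, value in self.versions.get(item, ()):
            if tx == writer_tx:
                return value
        raise LookupError(f"{item}{writer_tx} is no longer stored (k={self.k})")

# First schedule on the slide: w0(x0) c0 r1(x0) w3(x3) c3 w1(x1) c1 r2(x1) w2(x2) c2
store = KVersionStore(k=2)
store.write("x", 0, "x0")
print(store.read("x", 0))     # x0: r1(x0) is still possible
store.write("x", 3, "x3")
store.write("x", 1, "x1")     # x0 is evicted here
print(store.read("x", 1))     # x1: r2(x1)
store.write("x", 2, "x2")
print(store.kept("x"))        # ['x1', 'x2']
```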

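  11. MVCC
     • MVCC protocols
       • MVTO (multiversion timestamp ordering)
       • MV2PL, e.g. 2V2PL (two-version 2PL) with three kinds of locks: rl (read), wl (write), cl (certify)
       • MVSGT (MVSG testing)
       • ROMV: read-only multiversion protocol for read-only transactions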
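
The MVTO rules can be sketched as follows, using the textbook formulation rather than anything specific to the presentation: a read by ti returns the version with the largest write timestamp not exceeding ts(ti), and a write by ti is rejected if the version it would directly follow has already been read by a younger transaction. The class and exception names are hypothetical, timestamps are assumed to be positive integers (0 is reserved for the initial versions), and the commit ordering needed for recoverability is left out.

```python
import bisect

class Abort(Exception):
    """The transaction must be aborted and restarted with a new timestamp."""

class MVTO:
    def __init__(self, initial):
        # item -> versions sorted by writer timestamp: [write_ts, value, max_read_ts]
        # timestamp 0 is reserved for the initial versions (transaction t0)
        self.versions = {x: [[0, v, 0]] for x, v in initial.items()}

    def read(self, ts, x):
        """Read the version with the largest write timestamp <= ts; never blocks or aborts."""
        chain = self.versions[x]
        version = chain[bisect.bisect_right([w for w, _, _ in chain], ts) - 1]
        version[2] = max(version[2], ts)
        return version[1]

    def write(self, ts, x, value):
        """Create version x_ts, unless a younger reader already read the preceding version."""
        chain = self.versions[x]
        stamps = [w for w, _, _ in chain]
        pos = bisect.bisect_left(stamps, ts)
        if pos < len(chain) and stamps[pos] == ts:
            chain[pos][1] = value             # the transaction rewrites its own version
            return
        prev = chain[pos - 1]                 # version that x_ts would directly follow
        if prev[2] > ts:
            raise Abort(f"t{ts}: {x}{prev[0]} was already read by a younger transaction")
        chain.insert(pos, [ts, value, 0])

db = MVTO({"x": 10})
db.write(2, "x", 20)      # t2 creates x2
print(db.read(1, "x"))    # 10: the older t1 still reads x0, without waiting
print(db.read(3, "x"))    # 20

db2 = MVTO({"y": 0})
db2.read(5, "y")          # t5 reads y0
try:
    db2.write(3, "y", 7)  # t3 would have to precede t5's read of y
except Abort as e:
    print(e)              # t3: y0 was already read by a younger transaction
```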

  12. Berkeley DB
     • Five components, each available as a standalone utility and through one or more library interfaces
       • Deadlock detection: db_deadlock; DB_ENV->lock_detect, DB_ENV->set_lk_detect
       • Checkpoints: db_checkpoint; DB_ENV->txn_checkpoint
       • Database and log file archival: db_archive; DB_ENV->log_archive
       • Log file removal: db_archive; DB_ENV->log_archive
       • Recovery procedures: db_recover; DB_ENV->open
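
Deadlock detection, which db_deadlock and DB_ENV->lock_detect perform in Berkeley DB, amounts to finding a cycle in a waits-for graph. The sketch below is a generic depth-first search, not Berkeley DB's implementation; the graph encoding (a dict from a transaction to the transactions whose locks it waits for) is an assumption.

```python
def find_deadlock(waits_for):
    """Return one cycle of a waits-for graph {tx: txs it waits for}, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on the DFS path / finished
    color, path = {}, []

    def visit(tx):
        color[tx] = GRAY
        path.append(tx)
        for other in sorted(waits_for.get(tx, ())):
            state = color.get(other, WHITE)
            if state == GRAY:             # back edge: the path from `other` to tx is a cycle
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        color[tx] = BLACK
        return None

    for tx in sorted(waits_for):
        if color.get(tx, WHITE) == WHITE:
            cycle = visit(tx)
            if cycle:
                return cycle
    return None

# t1 waits for t2's lock, t2 for t3's, t3 for t1's
print(find_deadlock({1: {2}, 2: {3}, 3: {1}}))   # [1, 2, 3]
print(find_deadlock({1: {2}, 3: {2}}))           # None
```

A detector would then abort one transaction on the returned cycle; which one is a policy choice.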

  13. Berkeley DB
     • Transaction API
       • Transaction subsystem and related methods: DB_ENV->txn_checkpoint, DB_ENV->txn_recover, DB_ENV->txn_stat, DB_ENV->open, DB_ENV->close, DB_ENV->remove
       • Transaction subsystem configuration: DB_ENV->set_timeout, DB_ENV->set_tx_max, DB_ENV->set_tx_timestamp
       • Transaction operations: DB_ENV->txn_begin, DB_TXN->abort, DB_TXN->commit, DB_TXN->discard, DB_TXN->id, DB_TXN->prepare, DB_TXN->set_name, DB_TXN->set_timeout

  14. Berkeley DB
     • 2PL in Berkeley DB
       • Locks are released during DB_TXN->abort or DB_TXN->commit
     • Guidelines
       • If possible, use nested transactions to protect the parts of your transaction most likely to deadlock
     • Transaction limits
       • Transaction IDs: 31-bit unsigned integers (0x80000000)
       • Cursors cannot span transactions; each cursor must be opened and closed within a single transaction
     • Multiple threads of control:
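
The release rule on this slide (locks are released during DB_TXN->abort or DB_TXN->commit) corresponds to strict 2PL: the growing phase lasts the whole transaction and the shrinking phase happens all at once at the end. A minimal lock-table sketch with hypothetical names; unlike a real lock manager it raises on a conflict instead of making the requester wait, and it does not model undo.

```python
class LockConflict(Exception):
    pass

class StrictLockTable:
    """Shared (S) / exclusive (X) locks that are released only at commit or abort."""

    def __init__(self):
        self.owners = {}   # item -> {tx: "S" or "X"}
        self.held = {}     # tx -> set of items it has locked

    def acquire(self, tx, item, mode):
        owners = self.owners.setdefault(item, {})
        others = [m for t, m in owners.items() if t != tx]
        if (mode == "S" and "X" in others) or (mode == "X" and others):
            # a real lock manager would block here instead of raising
            raise LockConflict(f"t{tx} cannot get {mode} lock on {item}")
        if owners.get(tx) != "X":          # keep an existing X lock; otherwise set or upgrade
            owners[tx] = mode
        self.held.setdefault(tx, set()).add(item)

    def _release_all(self, tx):            # shrinking phase happens all at once
        for item in self.held.pop(tx, set()):
            del self.owners[item][tx]

    def commit(self, tx):
        self._release_all(tx)

    def abort(self, tx):
        self._release_all(tx)              # undo of tx's writes is not modelled

locks = StrictLockTable()
locks.acquire(1, "a", "S")
locks.acquire(2, "a", "S")                 # shared locks are compatible
try:
    locks.acquire(2, "a", "X")             # upgrade blocked while t1 still holds S
except LockConflict as e:
    print(e)                               # t2 cannot get X lock on a
locks.commit(1)                            # t1's locks go away only now
locks.acquire(2, "a", "X")
print(locks.owners["a"])                   # {2: 'X'}
```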

  15. Berkeley DB
     • Filesystem operations performed for a transaction
       • Disk seek to database file, database file read, disk seek to log file, log file write, disk seek to update log file metadata, log metadata write, flush log file information to disk, flush log file metadata to disk
     • Ways to increase transactional throughput
       • Berkeley DB supports group commit
       • Additional tuning parameters
         • Tune the size of the database cache
         • Put the database and the log files on different disks
         • Tune the filesystem configuration
         • Upgrade your hardware
         • Turn on the DB_TXN_WRITE_NOSYNC or DB_TXN_NOSYNC flags: ACI, but not D (durability is given up)
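
Group commit amortizes the log-flush steps listed on this slide: commits that arrive while the log device is busy share the next flush. A small discrete-time simulation with made-up arrival times and flush cost shows the effect; it is a sketch of the idea, not of Berkeley DB's implementation, and `group_commit` is a hypothetical name.

```python
import bisect

def group_commit(arrivals, flush_time):
    """Simulate one log device. A flush starts as soon as the device is free and
    makes durable every commit record that has arrived by then.
    Returns (number of flushes, acknowledgement time of each commit in arrival order)."""
    times = sorted(arrivals)
    flushes, acks, device_free_at, i = 0, [], 0, 0
    while i < len(times):
        start = max(times[i], device_free_at)
        j = bisect.bisect_right(times, start)   # everyone who arrived by `start` joins this flush
        end = start + flush_time
        acks.extend([end] * (j - i))            # commit returns only after its record is durable
        flushes += 1
        device_free_at = end
        i = j
    return flushes, acks

print(group_commit([0, 1, 2, 3, 4], flush_time=5))   # (2, [5, 10, 10, 10, 10])
```

Flushing each commit on its own would need five flushes, and the last commit would be acknowledged at time 25.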

  16. PostgreSQL
     • PG: each SQL statement sees a snapshot of the data
     • Reading never blocks writing, and writing never blocks reading
     • Three undesirable phenomena: dirty reads, non-repeatable reads, phantom reads
     • SQL transaction isolation levels

  17. PostgreSQL
     • Read Committed isolation level
       • The default isolation level
       • A SELECT query sees only data committed before the query began
       • The SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed
       • Two successive SELECTs can see different data if other transactions commit changes after the first SELECT starts and before the second one starts
       • Not adequate for many applications that do complex queries and updates
     • Serializable isolation level
       • This level emulates serial transaction execution
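
The practical difference between the two levels is when the snapshot is taken. In this toy model (class names are hypothetical; there are no row locks, no write-conflict detection, and no serialization failures), Read Committed takes a new snapshot for every statement, while Serializable, which PostgreSQL of this era implemented as snapshot isolation, keeps the snapshot from the transaction's first statement. A transaction's own uncommitted updates are visible to its later SELECTs in both cases.

```python
class VersionedTable:
    """Committed row versions, each stamped with a commit sequence number."""

    def __init__(self):
        self.clock = 0
        self.rows = {}                      # key -> [(commit_seq, value), ...]

    def commit_write(self, key, value):
        self.clock += 1
        self.rows.setdefault(key, []).append((self.clock, value))

    def read(self, key, snapshot):
        visible = [v for seq, v in self.rows.get(key, []) if seq <= snapshot]
        return visible[-1] if visible else None


class Transaction:
    def __init__(self, table, isolation):
        self.table, self.isolation = table, isolation
        self.snapshot = None                # serializable: fixed at the first statement
        self.own_writes = {}                # this transaction's uncommitted updates

    def _statement_snapshot(self):
        if self.isolation == "read committed":
            return self.table.clock         # fresh snapshot for every statement
        if self.snapshot is None:
            self.snapshot = self.table.clock
        return self.snapshot

    def select(self, key):
        snapshot = self._statement_snapshot()
        if key in self.own_writes:          # own earlier updates are visible
            return self.own_writes[key]
        return self.table.read(key, snapshot)

    def update(self, key, value):
        self._statement_snapshot()
        self.own_writes[key] = value        # no row locks or write-conflict checks here

    def commit(self):
        for key, value in self.own_writes.items():
            self.table.commit_write(key, value)
        self.own_writes.clear()


table = VersionedTable()
table.commit_write("a", 1)
rc = Transaction(table, "read committed")
ser = Transaction(table, "serializable")
print(rc.select("a"), ser.select("a"))   # 1 1
table.commit_write("a", 2)               # another transaction commits in between
print(rc.select("a"), ser.select("a"))   # 2 1
```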

  18. PostgreSQL
     • Data consistency checks at the application level
       • Readers in PostgreSQL don't lock data
       • To ensure the current existence of a row and protect it against concurrent updates, use SELECT FOR UPDATE or an appropriate LOCK TABLE statement (SELECT FOR UPDATE locks just the returned rows against concurrent updates, while LOCK TABLE protects the whole table)
     • Locks and tables
       • Table-level locks
       • Row-level locks: taken when rows are being updated
     • Locks and indexes
       • GiST and R-tree: locks released after the statement is done
       • Hash index: locks released after the page is processed
       • B-tree: locks released immediately after each index tuple is fetched or inserted

  19. Future work
     • Experiments: BDB & PG code
     • Transactions on flash memory
       • Concurrency control: MVCC
       • Recovery: log

  20. Thank you!
