Database Replication Using Generalized Snapshot Isolation
300 likes | 325 Vues
Learn about Snapshot Isolation in databases, its advantages, drawbacks, and use in replication. Explore Generalized Snapshot Isolation for improved scalability and transaction consistency in replicated environments.
Database Replication Using Generalized Snapshot Isolation
E N D
Presentation Transcript
Database Replication Using Generalized Snapshot Isolation Sameh Elnikety, EPFL Fernando Pedone, USI Willy Zwaenepoel, EPFL
Snapshot Isolation (SI) • Snapshot = committed state of database • On begin: • Snapshot(T) = latest snapshot at start(T) • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its start & commit times
Advantages of SI • Read-only T’s never block or abort • Read-only T’s never cause update T’s to block or abort • Compare to 2PL • No read-locks are used in SI • Important for read-dominated workloads
Drawbacks of SI • Not serializable • Permits certain anomalies • But • Anomalies are rare in practice • Conditions on workload can identify and avoid them • Developers use SI serializably
Summary of SI • SI is here to stay • Used in several databases, e.g., • Oracle • PostgreSQL • Microsoft SQL Server ( 2PL & SI ) • Borland InterBase
SI Replication • Replicate SI to scale performance for dynamic content Web servers • E.g., E-commerce, bulletin boards • Workload is suitable for SI • Read-only T’s dominate workload • Update T’s are short & few • How to maintain SI properties?
SI in Replicated Database • On begin: • Snapshot(T) = latest snapshot at start(T) • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its start & commit times
Strict SI in Replicated Database • On begin: • Snapshot(T) = latest snapshot at start(T) • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its start & commit times
Generalized Snapshot Isolation (GSI) • On begin: • Snapshot(T) = (latest) older snapshot • At replica, use latest local snapshot • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its (start) snapshot & commit times
Generalized Snapshot Isolation (GSI) • On begin: • Snapshot(T) = (latest) older snapshot • At replica, use latest local snapshot • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its (start) snapshot & commit times Certification for update T
Advantages of GSI • All T’s reads and writes are local • Important for replicated databases • Read-only T’s never block or abort • Read-only T’s never cause update T’s to block or abort • Important for read-dominated workloads
A - GSI Serializability • Not serializable • Permits certain anomalies as in SI But • Anomalies are rare in practice • Two serializability conditions (in the paper) • Static: examine transaction templates • Dynamic: at run time • Easy to verify workload is serializable • Easy to modify workload to be serializable
A - GSI Serializability • Not serializable • Permits certain anomalies as in SI But • Anomalies are rare in practice • Two serializability conditions (in the paper) • Static: examine transaction templates • Dynamic: at run time • Easy to verify workload is serializable • Easy to modify workload to be serializable Similar to what many OracleDBA’s already do
B - GSI Older Snapshots 1- On begin: Snapshot(T) = (latest) older snapshot • GSI uses older snapshots But • Clear definition, always consistent data • No new anomalies ( same as in SI ) • In replicated database • Transparent: db appears as running SI • Efficient: reads are non-blocking • Staleness: can be bounded
C - GSI Abort Rates • 3- On commit: • - Read-only T commits immediately • - Update T commits if no conflicting writes between its (start) snapshot & commit times • Potentially higher abort rate for updates But • Abort rates are small in target workloads • GSI Abort rates can be higher or lower Certification for update T
GSI in Replicated Databases • System consists of • Many SI replicas, full replication • Centralized certifier ( distributed in the paper ) • A client connects to one replica • Issues read and update transactions • Algorithm implements an instance GSI • Snapshot(T) = latest local snapshot at replica
Algorithm at Replica • On begin: • Provide T with a local Snapshot • Record T.version = Snapshot.version • On read or write operation: • Run transaction (reads/writes) locally • Record T.writeset • On commit: • IF ( T is read-only ) THEN { commit } • ELSE { Invoke certification ( T.version, T.writeset ). . . }
Algorithm at Certifier • Check for conflicting writes from committed T’s with larger version number • IF ( yes ) THEN { Reply ( abort ) } • ELSE { Advance certifier-version Record (writeset, certifier-version) to log Reply ( 1 - commit, 2 - certifier-version, 3 - “missing” writesets ) }
Algorithm at Replica (cont.) • On begin: . . . • On read or write operation: . . . • On commit: • IF ( T is read-only ) THEN { commit } • ELSE { Invoke certification (T.version, T.writeset )1- Apply “missing” writesets 2- Commit locally 3- Advance local version }
Performance Tradeoff GSI : SI • GSI • better response time • SI • “fresher” data (latest snapshot in the system) • lower abort rate for updates (?) • Analytical performance model • Model used by Jim Gray • Replicated database over WAN
Analytical Model • GSI • Execute T immediately • Updates are certified remotely (communication) • SI • Block T to obtain latest version (communication) • Updates are certified remotely (communication) • Objective is to compare GSI : SI • Response time • Abort rate
Analytical Equations • Parameters x = round trip delay / transaction length • Response time ratio (GSI : SI) Read-only update
Analytical Equations • Parameters x = round trip delay / transaction length t = snapshot age / transaction length • Response time ratio (GSI : SI) Read-only update • Abort rate ratio (GSI : SI) Read-only (never aborted!) update
Analytical Results • Parameters x = round trip delay / transaction length t = snapshot age / transaction length • X-axis x = round trip delay / transaction length x = 0 centralized database x is increasing as technology advances • Y-axis Response time ratio (for reads & updates) Abort ratio (updates)
Response Time Ratio of GSI : SI • . GSI is better
Abort Ratio of GSI : SI for Updates • . SI better GSI better Parameter t = ( snapshot age / transaction length )
Abort Ratio of GSI : SI for Updates • . t decreasing fresher snapshot SI better GSI better Parameter t = ( snapshot age / transaction length )
GSI : SI - Summary • GSI response times are better • Read-only T’s ratio : significantly better • Update T’s ratio : reaches ½ • GSI abort rate • maybe higher or lower • COST: observing older data in GSI • Favorable trade-off • Distributed environments • Read-dominated workloads
Conclusions • GSI is appealing for replication • All T’s read & write operations are local • Read-only T’s never block or abort • GSI can be made serializable • Algorithm for GSI in replicated databases • Analytical results are encouraging