1 / 36

ADVANCED TRANSACTION TOPICS

This article discusses reading uncommitted data, resolving deadlocks, and replication issues in advanced transaction topics in distributed databases.

cesarp
Télécharger la présentation

ADVANCED TRANSACTION TOPICS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ADVANCEDTRANSACTIONTOPICS Spring 2015

  2. Warning This is a first draft I welcome your corrections

  3. Transactions that read uncommitted data

  4. Reading uncommitted data • Can create cascading rollbacks • solutions • strict locking: T will not release any write lock (or increment lock) until the transaction has aborted or committed and the commit/abort record has been flushed to disk • recoverable schedule • Must still fix data in buffers

  5. Resolving deadlocks

  6. Deadlock detection by timeouts • Simplest method • A transaction is assumed to be in a deadlock with other transactions if it fails to make progress over a certain time interval • Must then decide which transaction to abort

  7. Deadlock detection by wait-for graphs (I) • We represent transactions by the summits of a graph that has an arc from summit T to summit U iff transactionT is waiting for a lock that is currently held by transaction U T U

  8. Deadlock detection by wait-for graphs (II) • A sufficient and necessary condition for a deadlock to exist is the existnce of a cycle in the wait for graph T U V W U, V and W are involved in a deadlock

  9. Limitations • Maintaining the wait-for graph is time-consuming • Especially for distributed DB • Can use cheaper heuristics • Prevent all deadlocks • Will abort from time to time transactions that do not need to be aborted

  10. Deadlock elimination heuristics • Attach to each transaction an immutable time stamp • Does not change when a transaction is rolled back • Two possible schemes • Wait-Die • Wound-Wait

  11. Wait-die scheme • When transaction T has to wait for a lock that is held by a transaction u • If T has a smaller timestampthan U • T is older than U • T is allowed to wait for the lock held by U • If U has a smaller timestampthan T • U is older than T • T is aborted and restarted Deadlocks are avoided because no transaction will ever wait for a lock held by an older transaction

  12. Wound-wait scheme • When transaction T has to wait for a lock that is held by a transaction u • If T has a smaller timestampthan U • T "wounds" U: it forces it to roll back unless U is very close to releasing its locks • If U has a smaller timestampthan T • T is allowed to wait for the lock held by U Deadlocks are avoided because no transaction will ever wait for a lock held by an newer transaction

  13. Distributed databases

  14. The idea • Storing relations or fragment of relations • on different servers • at different sites with the possibility of replacing some relations • Still maintaining central control • Not the same as federated databases

  15. Motivations (I) • Keeping data closer to their users • An industrial concern that has several plants might want information about its personnel to be kept at the plant where each employee works • EMPLOYEE relation will be horizontally partitioned

  16. Motivations (II) • Splitting data according to their primary usages • May want to split vertically EMPLOYEE relation by keeping some of its attributes in one table and moving the others to a different table

  17. Motivations (III) • Replicating data at different sites in order to: • Increase DB availability • Reduce risk of data loss • Speed up read-only queries • But not update queries

  18. Replication issues • Maintaining the replicated data in a "consistent" state • That is, keeping replicas identical • Selecting the proper level of replication and the locations of the replicas • Tradeoffs between faster read access and slower updates • Storage costs • Handling network partitions

  19. Availability and Reliability • Availability • Fraction of time the DBMS is operational • Often expressed in nines • 99.9 percent is 3 nines • 99.999 percent is five nines • Reliability • Function R(t) representing the probability the system will not fail over the time interval [0, t]

  20. Tradeoffs • Must distinguish between read availability and write availability • Will make tradeoff between • Availability and data replication cost • Higher storage cost • Higher update costs • Read availability and write availability

  21. Example CAN SKIPTHE MATH • Let A be the availability of a single server • To increase the read availability of a DB we can replicate it on two servers • Use write all/read any • Writes must update both replicas • Can read from either of them • Read availability = A×A + 2A(1-A) = 2A – A2 • Write availability = A×A= A2 • A2 < A !!!

  22. Example CAN SKIPTHE MATH • Can also replicate the DB on three servers and use write all/read any • Read availability =A3 + 3A2(1 – A) + 3A(1 – A)2 • Write availability = A3 • A3 < A !!!

  23. Example CAN SKIPTHE MATH • With three servers, we can use majority consensus voting • Two servers out of 3 • Required for all accesses • Read availability = write availability =A3 + 3A2(1 – A) = 3A2 – 2A3 • Majority consensus voting improves both read and write availabilities

  24. Updating distributed DBs • Updates can now involve several local updates at the different sites where the DB data are stored • Challenge is to handle partial failures that could result in partial updates that would leave the DB in an inconsistent state • Solution is two-phase commit protocol

  25. Two-phase commit protocol • Has nothing to do with two-phase locking. • Applies of distributed entities consisting of • A leader L that initiates the updates • Also called the coordinator • Followers F, which participate in the update and its validation • Lead and followers each have their own log

  26. First phase • Leader • Writes on its log <PREPARE T> • Sends to all followers a PREPARE T message • Followers that are ready to finalize the transaction • Write on their logs <READY T> • Send to leader a READY T message • Other followers • Write on their logs <DON'T COMMIT T> • Send to leader a DON'T COMMIT T message

  27. Second phase • If leader has received READY T messages from all sites involved in the transaction • Writes to its log < COMMIT T> • Sends to all followers a COMMIT T message • Followers commit T and write on their logs <COMMIT T> • Otherwise leader • Writes to its log < ABORT T> • Sends to all followers an ABORT T message • Followers abort T and write on their logs <ABORT T>

  28. Recovering from a crash • Sites that find on their log • <COMMIT T> can assume that the T is committed and replay it if needed • <ABORT T> or <DON'T COMMIT T> should abort T • <READY T>must check with all other sites in order to know the status of T • no mention of T can safely abort it

  29. Recovering from a leader failure • Followers should elect a new leader • Use arbitrary rule, such as site with lowest IP address • The rule is arbitrary but must result in the election of a singleleader • Compare with succession rule inWestern monarchies

  30. Recovering from a leader failure • New leader will poll its followers (and itself) about the status of transactions and will • Commit all transactions that are committed at some sites • Abort all transactions that are aborted or in <DON'T COMMIT T> state at some sites • Commit all transactions that in <READY T> state at all sites and abort them otherwise

  31. Distributed locking

  32. The three approaches (I) • Using centralized locking • Lock server • Use a primary copy approach • All updates are performed first on a single primary copies then distributed to secondary copies • Secondary copies are read-only and can be only used for operations that do not modify the data. • Locks only apply to primary copy

  33. The three approaches (II) • Implementing global locking with local locks • Useful for managing replicated data • Each replica has its own locks • In write all/read any • Global read lock requires acquiring a single local lock for the replicated entity • Global write lock requires acquiring all local locks for the replicated entity • In majority consensus voting • Global lock requires acquiring a majority of the local lock for the replicated entity

  34. Long-running transactions

  35. Long-running transactions • Pessimistic concurrency control require transactions to maintain locks for most of their duration • Two-phase locking • Not possible for long running transactions • Especially those involving human interactions • Approving travel plans and expenses

  36. The best solution • Decompose long-running transactions into smaller transactions that • Correspond to steps of long-running transactions • Can individually commit as soon as step is completed • Rely on compensating transactions—anti-transactions—to undo them if the long-running transaction must be aborted

More Related