
CSC 536 Lecture 10



  1. CSC 536 Lecture 10

  2. Outline • Recovery • Case study: Google Spanner

  3. Recovery

  4. Recovery • Error recovery: replace a present erroneous state with an error-free state • Backward recovery: bring the system into a previously correct state • Need to record the system's state from time to time (checkpoints) • Example: • Forward recovery: bring the system to a correct new state from which it can continue to execute • Only works with known errors • Example:

  5. Recovery • Error recovery: replace a present erroneous state with an error-free state • Backward recovery: bring the system into a previously correct state • Need to record the system's state from time to time (checkpoints) • Example: retransmit message • Forward recovery: bring the system to a correct new state from which it can continue to execute • Only works with known errors • Example: erasure correction
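To make the forward-recovery example concrete, here is a minimal sketch of erasure correction with a single XOR parity block (illustrative only; the function names are invented, not from the slides). One lost data block is rebuilt directly from the surviving blocks and the parity, with no rollback.

```python
# Forward recovery via XOR parity: one lost block is reconstructed in place.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(blocks: list[bytes]) -> bytes:
    """Compute the XOR parity over equal-length data blocks."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_blocks(parity, block)
    return parity

def recover(blocks: list[bytes | None], parity: bytes) -> list[bytes]:
    """Forward-recover at most one missing block (marked None)."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    assert len(missing) <= 1, "a single XOR parity can repair only one erasure"
    if missing:
        repaired = parity
        for b in blocks:
            if b is not None:
                repaired = xor_blocks(repaired, b)
        blocks[missing[0]] = repaired
    return blocks

if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(data)
    damaged = [b"AAAA", None, b"CCCC"]     # block 1 was lost
    print(recover(damaged, parity))        # [b'AAAA', b'BBBB', b'CCCC']
```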

  6. Backward recovery • Backward recovery is typically used • It is more general • However • Recovery is expensive • Sometimes we can't go back (e.g. a file is deleted) • Checkpoints are expensive • Solution for the last point: message logging • Sender-based • Receiver-based

  7. Checkpoints: Common approach • Periodically make a “big” checkpoint • Then, more frequently, make an incremental addition to it • For example: the checkpoint could be copies of some files or of a database • Looking ahead, the incremental data could be “operations” run on the database since the last transaction finished (committed)
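A hedged sketch of the "big checkpoint plus incremental log" idea on a toy key-value store (the `KVStore` class and file names are invented for illustration). A full snapshot is written periodically; between snapshots, each operation is appended to a log; recovery reloads the snapshot and replays the log.

```python
# Toy key-value store with periodic full checkpoints and an incremental op log.
import json, os

class KVStore:
    def __init__(self, snap_path="snapshot.json", log_path="ops.log"):
        self.snap_path, self.log_path = snap_path, log_path
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        with open(self.log_path, "a") as log:           # incremental addition
            log.write(json.dumps({"op": "put", "k": key, "v": value}) + "\n")

    def checkpoint(self):
        """Periodic 'big' checkpoint: dump full state, truncate the log."""
        with open(self.snap_path, "w") as snap:
            json.dump(self.data, snap)
        open(self.log_path, "w").close()

    def recover(self):
        """Backward recovery: reload the last checkpoint, replay logged ops."""
        if os.path.exists(self.snap_path):
            with open(self.snap_path) as snap:
                self.data = json.load(snap)
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    op = json.loads(line)
                    self.data[op["k"]] = op["v"]
```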

  8. Problems with checkpoints • [diagram: P and Q exchange a request and a reply] • P and Q are interacting • Each makes independent checkpoints now and then

  9. Problems with checkpoints • [diagram: P and Q exchange a request and a reply] • Q crashes and rolls back to checkpoint

  10. Problems with checkpoints • [diagram: P's request to Q] • Q crashes and rolls back to checkpoint • It will have “forgotten” the message from P

  11. Problems with checkpoints • [diagram: P and Q exchange a request and a reply] • … Yet Q may even have replied. • Who would care? Suppose reply was “OK to release the cash. Account has been debited”

  12. Two related concerns • First, Q needs to see that request again, so that it will reenter the state in which it sent the reply • We need to regenerate the input request • Second, if Q is non-deterministic, it might not repeat those actions even with identical input • So regenerating the input might not be “enough”

  13. Rollback can leave inconsistency! • In this example, we see that checkpoints must somehow be coordinated with communication • If we allow programs to communicate and don’t coordinate checkpoints with message passing, system state becomes inconsistent even if individual processes are otherwise healthy

  14. More problems with checkpoints • [diagram: P and Q exchange a request and a reply] • P crashes and rolls back

  15. More problems with checkpoints • [diagram: P and Q exchange a request and a reply] • P crashes and rolls back • Will P “reissue” the same request? Recall our non-determinism assumption: it might not!

  16. Solution? • One idea: if a process rolls back, roll others back to a consistent state • If a message was sent after checkpoint, … • If a message was received after checkpoint, … • Assumes channels will be “empty” after doing this

  17. Solution? • One idea: if a process rolls back, roll others back to a consistent state • If a message was sent after the checkpoint, roll the receiver back to a state before that message was received • If a message was received after the checkpoint, roll the sender back to a state prior to sending it • Assumes channels will be “empty” after doing this

  18. Solution? • [diagram: P and Q exchange a request and a reply] • Q crashes and rolls back

  19. Solution? • [diagram: Q has rolled back to a state before the request was received or the reply was sent] • Q crashes and rolls back

  20. Solution? • P must also roll back • Now it won’t upset us if P happens not to resend the same request

  21. Implementation • Implementing independent checkpointing requires that dependencies are recorded so processes can jointly roll back to a consistent global state • Let CPi(m) be the m-th checkpoint taken by process Pi and let INTi(m) denote the interval between CPi(m-1) and CPi(m) • When Pi sends a message in interval INTi(m) • Pi attaches to it the pair (i,m) • When Pj receives a message with attachment (i,m) in interval INTj(n) • Pj records the dependency INTi(m) → INTj(n) • When Pj takes checkpoint CPj(n), it logs this dependency as well • When Pi rolls back to checkpoint CPi(m-1): • we need to ensure that all processes that have received messages from Pi sent in interval INTi(m) are rolled back to a checkpoint preceding the receipt of such messages…

  22. Implementation • Implementing independent checkpointing requires that dependencies are recorded so processes can jointly roll back to a consistent global state • Let CPi(m) be the m-th checkpoint taken by process Pi and let INTi(m) denote the interval between CPi(m-1) and CPi(m) • When Pi sends a message in interval INTi(m) • Pi attaches to it the pair (i,m) • When Pj receives a message with attachment (i,m) in interval INTj(n) • Pj records the dependency INTi(m) → INTj(n) • When Pj takes checkpoint CPj(n), it logs this dependency as well • When Pi rolls back to checkpoint CPi(m-1): • Pj will have to roll back to at least checkpoint CPj(n-1) • Further rolling back may be necessary…
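A sketch of the dependency-recording scheme described on this slide, under the assumption that each process object can call its peers directly (class and method names are illustrative). Outgoing messages carry the pair (i, m); the receiver records the dependency INTi(m) → INTj(n) and logs it with its next checkpoint.

```python
# Dependency tracking for independent checkpointing.

class Process:
    def __init__(self, pid):
        self.pid = pid
        self.interval = 1        # current interval number n (after CP(n-1))
        self.deps = set()        # dependencies INT_i(m) -> INT_self(n)
        self.checkpoints = []    # list of (state, logged dependencies)

    def send(self, dest, payload):
        # Attach the pair (i, m) so the receiver can record the dependency.
        dest.receive((self.pid, self.interval), payload)

    def receive(self, tag, payload):
        i, m = tag
        # Record INT_i(m) -> INT_j(n): interval n of this process now depends
        # on interval m of process i.  (Payload handling is omitted here.)
        self.deps.add(((i, m), (self.pid, self.interval)))

    def take_checkpoint(self, state):
        # CP(n): log the recorded dependencies along with the state,
        # then start interval n+1 with an empty dependency set.
        self.checkpoints.append((state, frozenset(self.deps)))
        self.deps.clear()
        self.interval += 1
```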

  23. Problems with checkpoints • But now we can get a cascade effect

  24. Problems with checkpoints • Q crashes, restarts from checkpoint…

  25. Problems with checkpoints • Forcing P to roll back for consistency…

  26. Problems with checkpoints • New inconsistency forces Q to roll back ever further

  27. Problems with checkpoints • New inconsistency forces P to roll back ever further

  28. This is a “cascaded” rollback • Or “domino effect” • It arises when the creation of checkpoints is uncoordinated w.r.t. communication • Can force a system to roll back to initial state • Clearly undesirable in the extreme case… • Could be avoided in our example if we had a log for the channel from P to Q

  29. Sometimes action is “external” to the system, and we can’t roll back • Suppose that P is an ATM • P asks: “Can I give Ken $100?” • Q debits the account and says “OK” • P gives out the money • We can’t roll P back in this case since the money is already gone

  30. Bigger issue is non-determinism • P’s actions could be tied to something random • For example, perhaps a timeout caused P to send this message • After rollback these non-deterministic events might occur in some other order • Results in a different behavior, like not sending that same request… yet Q saw it, acted on it, and even replied!

  31. Issue has two sides • One involves reconstructing P’s message to Q in our examples • We don’t want P to roll back, since it might not send the same message • But if we had a log with P’s message in it we would be fine, could just replay it • The other is that Q might not send the same response (non-determinism) • If Q did send a response and doesn’t send the identical one again, we must roll P back

  32. Options? • One idea is to coordinate the creation of checkpoints and logging of messages • In effect, find a point at which we can pause the system • All processes make a checkpoint in a coordinated way: the consistent snapshot (seen that, done that) • Then resume
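A simplified sketch of the pause/checkpoint/resume idea from this slide (not the consistent-snapshot algorithm itself; `Worker` and `Coordinator` are invented names, and a real system would also have to drain in-flight messages before checkpointing).

```python
# Coordinated checkpointing by briefly pausing all processes.
import threading

class Worker:
    """Toy process: 'pause' just blocks its own sends via an event."""
    def __init__(self, name):
        self.name, self.state = name, 0
        self.running = threading.Event()
        self.running.set()

    def pause(self):      self.running.clear()
    def resume(self):     self.running.set()
    def checkpoint(self): return (self.name, self.state)

class Coordinator:
    def __init__(self, workers):
        self.workers = workers

    def coordinated_checkpoint(self):
        # 1. Quiesce: stop every worker from sending new messages.
        for w in self.workers:
            w.pause()
        # 2. Checkpoint: with no messages in flight, the set of local
        #    checkpoints forms a consistent global snapshot.
        snapshot = [w.checkpoint() for w in self.workers]
        # 3. Resume normal operation.
        for w in self.workers:
            w.resume()
        return snapshot

if __name__ == "__main__":
    workers = [Worker("p"), Worker("q")]
    print(Coordinator(workers).coordinated_checkpoint())   # [('p', 0), ('q', 0)]
```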

  33. Why isn’t this common? • Often we can’t control processes we didn’t code ourselves • Most systems have many black-box components • Can’t expect them to implement the checkpoint/rollback policy • Hence it isn’t really practical to do coordinated checkpointing if it includes system components

  34. Why isn’t this common? • Further concern: not every process can make a checkpoint “on request” • Might be in the middle of a costly computation that left big data structures around • Or might adopt the policy that “I won’t do checkpoints while I’m waiting for responses from black box components” • This interferes with coordination protocols

  35. Implications? • Ensure that devices, timers, etc., can behave identically if we roll a process back and then restart it • Knowing that programs will re-do identical actions eliminates the need for cascaded rollbacks

  36. Implications? • Must also cope with thread preemption • Occurs when we use lightweight threads, as in Java or C# • Thread scheduler might context switch at times determined by when an interrupt happens • Must force the same behavior again later, when restarting, or program could behave differently

  37. Determinism • Despite these issues, we often see mechanisms that assume determinism • Basically they are saying: • Either don’t use threads, timers, I/O from multiple incoming channels, shared memory, etc. • Or use a “determinism-forcing mechanism” (sketched below)
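One possible shape of a “determinism-forcing mechanism”, sketched under the assumption that the only non-determinism is the choice of which input channel to service next (everything here is illustrative): record each decision during the live run, then reuse the recorded decisions on replay so the process re-executes identically.

```python
# Record/replay of non-deterministic decisions.
import random

class DecisionLog:
    def __init__(self):
        self.recorded = []
        self.replay_index = None             # None = recording mode

    def choose(self, options):
        if self.replay_index is None:        # live run: decide and record
            choice = random.choice(options)
            self.recorded.append(choice)
            return choice
        choice = self.recorded[self.replay_index]   # replay: reuse decision
        self.replay_index += 1
        return choice

    def start_replay(self):
        self.replay_index = 0

if __name__ == "__main__":
    log = DecisionLog()
    first = [log.choose(["chan_A", "chan_B"]) for _ in range(3)]
    log.start_replay()
    second = [log.choose(["chan_A", "chan_B"]) for _ in range(3)]
    assert first == second                   # identical behavior on replay
```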

  38. With determinism… • We can revisit the checkpoint-rollback problem and do much better • Eliminates the need for cascaded rollbacks • But we do need a way to replay the identical inputs that were received after the checkpoint was made • This forces us to think about keeping logs of the channels between processes

  39. Two popular options • Receiver-based logging: log received messages; like an “extension” of the checkpoint • Sender-based logging: log messages when you send them, so you can resend them if needed
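A hedged sketch of the two options (the interfaces are invented for illustration): the sender keeps copies of what it sent so it can resend after the receiver rolls back; the receiver keeps a log that extends its checkpoint and is replayed locally after restart.

```python
class SenderBasedLog:
    """Sender-based logging: keep a copy of every sent message so it can be
    resent after the receiver rolls back."""
    def __init__(self):
        self.sent = []                       # (seq, dest, payload)

    def record_send(self, seq, dest, payload):
        self.sent.append((seq, dest, payload))

    def messages_to_resend(self, dest, last_seq_processed):
        # Everything dest had received but lost by rolling back.
        return [(s, p) for s, d, p in self.sent
                if d == dest and s > last_seq_processed]

class ReceiverBasedLog:
    """Receiver-based logging: the log acts as an 'extension' of the
    checkpoint; after restoring, logged input is replayed locally."""
    def __init__(self):
        self.received = []                   # messages since last checkpoint

    def record_receive(self, seq, payload):
        self.received.append((seq, payload))

    def replay(self, handler):
        # Re-feed logged input so a deterministic process re-reaches its
        # pre-crash state without rolling anyone else back.
        for seq, payload in self.received:
            handler(seq, payload)

if __name__ == "__main__":
    slog = SenderBasedLog()
    slog.record_send(1, "q", "request-1")
    slog.record_send(2, "q", "request-2")
    # q rolled back past message 1: resend everything after seq 0
    print(slog.messages_to_resend("q", 0))   # [(1, 'request-1'), (2, 'request-2')]
```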

  40. Why do these work? • Recall the reason for cascaded rollback • A cascade occurs if Q received a message and then rolls back to “before” that happened • With a message log, Q can regenerate the input and re-read the message

  41. With these varied options • When Q rolls back we can • Re-run Q with identical inputs if • Q is deterministic, or • Nobody saw messages from Q after checkpoint state was recorded, or • We roll back the receivers of those messages • An issue: deterministic programs often crash in the identical way if we force identical execution • But here we have flexibility to either force identical executions or do a coordinated rollback

  42. Google Spanner

  43. Google Spanner • Scalable globally-distributed multi-versioned database • Main features: • Focus on cross-datacenter data replication • for availability and geographical locality • Automatic sharding and shard migration • for load balancing and failure tolerance • Scales to millions of servers across hundreds of datacenters • and to database tables with trillions of rows • Schematized, semi-relational (tabular) data model • to handle more structured data (than Bigtable, say) • Strong replica consistency model • synchronous replication

  44. Google Spanner • Scalable globally-distributed database • Follow-up to Google’s Bigtable and Megastore • Detailed DB features • SQL-like query interface • to support schematized, semi-relational (tabular) data model • General-purpose distributed ACID transactions • even across distant data centers • Externally (strongly) consistent global write-transactions with synchronous replication • Lock-free read-only transactions • Timestamped multiple-versions of data
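To illustrate how timestamped multi-versioning enables lock-free read-only transactions, here is a simplified single-cell sketch (not Spanner's storage code; the names are invented): writers append (timestamp, value) versions, and a snapshot read returns the newest version at or below its chosen timestamp without blocking writers.

```python
# Multi-versioned cell with snapshot reads at a chosen timestamp.
import bisect

class MultiVersionCell:
    def __init__(self):
        self.timestamps = []                  # sorted commit timestamps
        self.values = []                      # values, parallel to timestamps

    def write(self, ts, value):
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def read_at(self, ts):
        """Newest value with commit timestamp <= ts, else None (no locks)."""
        i = bisect.bisect_right(self.timestamps, ts)
        return self.values[i - 1] if i else None

if __name__ == "__main__":
    cell = MultiVersionCell()
    cell.write(10, "v1")
    cell.write(20, "v2")
    print(cell.read_at(15))                   # 'v1' -- snapshot read
    print(cell.read_at(25))                   # 'v2'
```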

  45. Google Spanner • Scalable globally-distributed database • Follow-up to Google’s Bigtable and Megastore • Detailed DS features • Auto-sharding, auto-rebalancing, automatic failure response • Replication and external (strong) consistency model • App/user control of data replication and placement (see the sketch after this slide) • number of replicas and replica locations (datacenters) • how far the closest replica can be (to control reading latency) • how distant replicas are from each other (to control writing latency) • Wide-area system
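A purely hypothetical illustration of the kind of placement policy the slide above describes; the keys and values below are invented for this sketch and are not Spanner's actual configuration API.

```python
# Hypothetical per-application placement policy (invented names).
placement_policy = {
    "num_replicas": 5,                         # more replicas -> higher availability
    "replica_datacenters": ["us-east", "us-central", "us-west",
                            "eu-west", "asia-east"],
    "max_read_distance": "same-continent",     # closest-replica bound -> read latency
    "max_replica_spread": "global",            # replica spread -> write latency
}
```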

  46. Google Spanner • Scalable globally-distributed database • Follow-up to Google’s Bigtable and Megastore • Key implementation design choices • Integration of concurrency control, replication, and 2PC • Transaction serialization via global, wall-clock timestamps • using the TrueTime API • The TrueTime API uses GPS and atomic clocks to get accurate time • acknowledges clock uncertainty and guarantees a bound on it
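The interval-based TrueTime interface sketched below follows the description in the Spanner paper; the fixed uncertainty bound and the use of the local clock are illustrative simplifications, not the real implementation. The key point is commit wait: a commit timestamp is only exposed once it is guaranteed to be in the past on every clock.

```python
# TrueTime-style uncertainty intervals and commit wait (simplified sketch).
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float   # time that has definitely passed
    latest: float     # time that definitely has not passed

class TrueTime:
    def __init__(self, epsilon=0.007):        # assume a ~7 ms uncertainty bound
        self.epsilon = epsilon

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        """True only when t has definitely passed on every clock."""
        return self.now().earliest > t

def commit(tt: TrueTime):
    # Pick the commit timestamp at the top of the uncertainty interval, then
    # "commit wait" until that timestamp is guaranteed to be in the past
    # before making the write visible -- this is what the external-consistency
    # argument relies on.
    s = tt.now().latest
    while not tt.after(s):
        time.sleep(0.001)
    return s
```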

  47. Google Spanner • Scalable globally-distributed database • Follow-up to Google’s Bigtable and Megastore • Production use • Rolled out in Fall 2012 • Used by Google F1, Google’s advertising backend • Replaced a sharded MySQL database • 5 replicas across the US • A less critical app may need only 3 replicas in a single region, which would decrease latency (but also availability) • Future use: Gmail, Picasa, Calendar, Android Market, AppEngine, etc.

  48. Google Spanner • spanner-osdi2012.pptx
