
Moving away from the independent and identically distributed failure assumption

Moving away from the independent and identically distributed failure assumption. University of California, San Diego Flavio Junqueira Research Exam/Thesis Proposal Advisors: Keith Marzullo and Geoffrey M. Voelker. Motivation. Common approach for distributed systems: replicate!


Presentation Transcript


  1. Moving away from the independent and identically distributed failure assumption University of California, San Diego Flavio Junqueira Research Exam/Thesis Proposal Advisors: Keith Marzullo and Geoffrey M. Voelker

  2. Motivation • Common approach for distributed systems: replicate! • Cheaper than investing in ultra-reliable, specialized components • Enhances performance and availability • E.g. processes in software-based systems • Typical replication strategy • Compute a threshold t on the number of process failures • Determine the degree of replication required, depending on the problem (e.g. n > 3t for Consensus with arbitrary failures) • Replicate to this degree • Well suited for independent and identically distributed failures (IID failure assumption) • Non-negligible probability of t failures in any subset of size t+1 • Is it often a reasonable assumption?

  3. Where IID does not apply… • Systems for the Internet • Hosts execute the same popular software systems • Hosts share the same vulnerabilities • Some major outbreaks • Code Red: over 360,000 hosts [Moore02] • Sapphire: over 75,000 hosts [Moore03a] A threshold on the number of failures is unrealistic.

  4. Where IID does not apply… • Quorum systems in a wide-area network [Amir96] • Failures are strongly correlated • Power outages • Network partitions • Software bugs [Little01] • Single version • A demand may cause all replicas to crash • Multiple independently-developed versions • Difficulty of a demand: difficulty in handling it • Level of difficulty varies among the demands • More difficult demands tend to cause multiple versions to fail

  5. Where IID does not apply… • Multi-computer systems [Tang92] • Correlated failures due to shared resources • Network errors • Shared memory • Impact on availability, reliability, and performance • Grid computing • Master delegates computation • Waits for replies from slaves • Replicate to achieve fault-tolerance • Dependent failures: same sub-network, same software systems, etc.

  6. Outline • System model • Modeling failures • The classical approach: The threshold model • An alternative to the threshold model: Cores/Survivor sets • Applying it to problems: Consensus • Traditional results on Consensus • Consensus in the core/survivor set model • Generalizing the results for Consensus • General bounds on process replication • Coping with dependent failures in the real world • A few systems that assume dependent failures • An application: The Phoenix Recovery System

  7. System model • Set of processes Π = {p1, p2, …, pn} • A process is a unit of computation • Communicate by exchanging messages • Reliable channels • Validity: If a correct process p sends a message m to a correct process q, then q eventually receives m • Integrity: A process p receives a message m from some process q only if q sent m to p

  8. System model • Set Π of processes • Processes exchange messages • Channels are reliable • State • Distributed algorithm: collection of state machines • Step of a process • Atomic • Execution: sequence of steps of processes

  9. Distributed algorithm • Collection of state machines, one for each process p ∈ Π • Proceeds in steps of processes • In a step, a process p • Sends a message to a single process • Receives a message from a single process • Undergoes a state transition • Execution • Sequence of steps of processes in Π

  10. Timing assumptions • Synchronous systems • Clock drift, message delay, processor speed are bounded • Execution in synchronous rounds • In a synchronous round, a process • sends messages to any number of processes • receives messages from any number of processes • Undergoes a state transition • Asynchronous systems • No bounds on clock drift, message delay, or processor speed

  11. Failure modes for processes • Crash failures • For every faulty process p in some execution of an algorithm A, there is a time t_p after which p stops executing steps of A • Arbitrary failures • A faulty process can deviate arbitrarily from the specification of the algorithm • E.g. crashing, sending messages selectively, arbitrarily modifying the content of messages • Receive-omission failures • A faulty process either crashes or selectively fails to receive messages • Assumptions • Once a process fails, it does not recover • Probability of a total failure is negligible

  12. Modeling failures

  13. The threshold model • Threshold t on the number of process failures • Degree of reliability: R ∈ [0,1] • The probability of t+1 process failures is smaller than 1-R • Simple and compact representation (n > f(t)) • SIFT project [Wensley76] • Ultra-reliable computer system • Process failures are arbitrary, but non-malicious • Hardware designed to isolate faults (independent failures) • Similar hardware (identically distributed process failures) • IID failure assumption is valid • What if failures are not IID? • Still safe • t is the size of the largest subset of faulty processes in any execution • It does not hurt to consider more
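
The threshold rule on this slide can be made concrete with a small calculation. The sketch below assumes IID failures with a per-process failure probability `p` (a hypothetical parameter, not given in the slides) and searches for the smallest t such that the probability of t+1 or more failures is below 1-R. The function names are illustrative only.

```python
from math import comb


def prob_more_than(t: int, n: int, p: float) -> float:
    """P[more than t of n processes fail], assuming each process fails
    independently with the same probability p (the IID assumption)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1, n + 1))


def smallest_threshold(n: int, p: float, R: float) -> int:
    """Smallest t such that P[at least t+1 failures] < 1 - R."""
    for t in range(n + 1):
        if prob_more_than(t, n, p) < 1 - R:
            return t
    return n  # only reached when R == 1.0


# Hypothetical numbers: 4 processes, 1% failure probability, R = 0.999.
t = smallest_threshold(4, 0.01, 0.999)
enough_for_arbitrary_failures = 4 > 3 * t  # the n > 3t requirement
```

With these numbers the search yields t = 1, so four replicas satisfy n > 3t. Once failures are correlated the binomial formula no longer applies, which is the situation the following slides address.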

  14. Limitations of the threshold model • R: target degree of reliability • >R: subset of processes has reliability greater than R

  15. An alternative to the threshold model • Desirable properties • Expressive: scenarios in the previous slide • Flexible: not tied to any particular way of characterizing failures • General: widely applicable • Cores [JM03a] • A core c: minimal reliable subset of processes • At least one process in c is correct in every execution of the system • Generalize subsets of size t+1 • Survivor sets [JM03a] • A survivor set: contains all the correct processes of some execution • Generalize subsets of size n-t

  16. Cores and Survivor sets • R: desired degree of reliability • r(X), X ⊆ Π: evaluates to the reliability of X • A subset C ⊆ Π is a core of Π iff • r(C) ≥ R • ∀p ∈ C, r(C - {p}) < R • C_Π: set of cores of Π • A subset S ⊆ Π is a survivor set of Π iff • ∀C ∈ C_Π, S ∩ C ≠ ∅ • ∀p ∈ S, ∃C ∈ C_Π such that (p ∈ C) and ((S - {p}) ∩ C = ∅) • S_Π: set of survivor sets of Π • Cores and survivor sets are the dual of each other
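
The two definitions on this slide can be tried out by brute force on a small system. The sketch below is illustrative only: it assumes that r(X) is the probability that at least one process in X is correct, that failures are independent but not identically distributed, and that the failure probabilities in `fail_prob` are made-up values. Cores are the minimal subsets with reliability at least R; survivor sets are the minimal subsets that intersect every core.

```python
from itertools import combinations
from math import prod


def reliability(X, fail_prob):
    # Assumption: independent failures, r(X) = P[at least one process in X is correct].
    return 1.0 - prod(fail_prob[p] for p in X)


def cores(processes, r, R):
    """All subsets C with r(C) >= R such that removing any process drops r below R."""
    found = []
    for k in range(1, len(processes) + 1):
        for c in combinations(sorted(processes), k):
            C = frozenset(c)
            if r(C) >= R and all(r(C - {p}) < R for p in C):
                found.append(C)
    return found


def survivor_sets(processes, core_list):
    """All subsets S that intersect every core and are minimal with that property."""
    found = []
    for k in range(1, len(processes) + 1):
        for s in combinations(sorted(processes), k):
            S = frozenset(s)
            hits_all = all(S & C for C in core_list)
            minimal = all(
                any(p in C and not ((S - {p}) & C) for C in core_list) for p in S
            )
            if hits_all and minimal:
                found.append(S)
    return found


# Hypothetical system: one very reliable process and three ordinary ones.
fail_prob = {"a": 0.001, "b": 0.1, "c": 0.1, "d": 0.1}
cs = cores(fail_prob, lambda X: reliability(X, fail_prob), R=0.98)
ss = survivor_sets(fail_prob, cs)
```

In this made-up system the reliable process forms a core on its own, while any two of the ordinary processes form a core together; every survivor set therefore contains the reliable process plus two of the others. The enumeration is exponential and only meant for small examples.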

  17. An alternative definition • Design of algorithms • W: the set of allowed executions • up(w): the set of correct processes in execution w • A subset C ⊆ Π is a core of Π iff • ∀w ∈ W, C ∩ up(w) ≠ ∅ • ∀C' ⊂ C, ∃w ∈ W s.t. C' ∩ up(w) = ∅ • C_Π: set of cores of Π • A subset S ⊆ Π is a survivor set of Π iff • ∃w ∈ W s.t. S = up(w) • ∀S' ⊂ S, ∀w ∈ W, S' ≠ up(w) • S_Π: set of survivor sets of Π • ⟨Π, C_Π, S_Π⟩: system configuration
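
The execution-based definition can also be sketched directly. The code below assumes the set of allowed executions is summarized by the list of sets up(w), one per execution; the list `up_sets` is a hypothetical input chosen so that the result matches the five-process example used later in the talk. Survivor sets are the minimal up-sets, and cores are the minimal subsets that intersect every up-set.

```python
from itertools import combinations


def survivor_sets_from_executions(up_sets):
    """Minimal sets of correct processes over all allowed executions."""
    ups = {frozenset(u) for u in up_sets}
    return {S for S in ups if not any(T < S for T in ups)}


def cores_from_executions(processes, up_sets):
    """Minimal subsets that contain a correct process in every allowed execution."""
    ups = [frozenset(u) for u in up_sets]
    hitting = [
        frozenset(c)
        for k in range(1, len(processes) + 1)
        for c in combinations(sorted(processes), k)
        if all(frozenset(c) & u for u in ups)
    ]
    return {C for C in hitting if not any(D < C for D in hitting)}


# Hypothetical executions over processes a..e, given by their sets of correct processes.
up_sets = ["abcde", "abde", "abcd", "abce", "ade", "bde", "cde"]
S_from_W = survivor_sets_from_executions(up_sets)
C_from_W = cores_from_executions("abcde", up_sets)
```

Only the minimal up-sets matter: adding executions in which more processes stay correct changes neither the cores nor the survivor sets.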

  18. An example Blue, Red, and Yellow fail independently Failures of Yellow processes are highly correlated r({Red, Blue, Yellow})= R

  19. Another example Blue: highly-reliable server Red: client Failures of Blue and Red are negatively correlated Probability of more than 3 Red processes failing is negligible

  20. Determining cores and survivor sets • Probability models • E.g. Markov models used in the analysis of dynamic fault trees [Ren98] • To find cores: Minimal subset of processes s.t. probability of total failure in the subset is negligible • Often difficult in practice • Attribute-based model [JM02] • Processes characterized by attributes • Attributes determine failure correlation • Finding a core is NP-hard • Color-based model [JM02] • Single attribute characterizes a process • Polynomial time algorithm to find cores

  21. Cores/Survivor sets vs. Quorum systems • Cores, Survivor sets, Quorums • Subsets of processes • Quorums [Giff79] • Enforce mutual exclusion [GM85] • E.g. One-copy serializability • Quorums necessarily intersect • Execute operations on behalf of the system • Cores/Survivor sets • Do not necessarily execute operations on behalf of the system • Weaker than quorums: no intersection requirement a priori • Generalize objects commonly used in proofs and algorithms • Cores: subsets of size t+1 • Survivor sets: subsets of size n-t

  22. Consensus

  23. Motivation for Consensus • Replication often requires coordination • Coordination problems • Atomic broadcast • Clock synchronization • Agreement on fault-tolerant processors (FTP)

  24. Consensus specification • Each process begins with a proposed value v ∈ V • Goal: agree on a single value • Typical Consensus definition [Attiya98] • Agreement: No two correct processes decide on different values • Termination: Every correct process eventually decides • Validity: If a process p decides on value v, then v was proposed by some process q • Strong validity: If every process has v as its initial value, then v is the only possible decision value [Attiya98] • Vector validity: A correct process decides on a vector vect such that [Doudou98] • If pi is correct, then vect[i] has the initial value of pi or null • At least t+1 elements of vect are initial values of correct processes

  25. Synchronous systems - Crash failures • Solution for any number of failures • Full-information algorithm (t+1 rounds) • Early-deciding algorithms [LF82, CB00] • For any execution with f failures, correct processes decide in at most f+1 rounds • Clean round: round in which no process fails • A process receives messages from the same set of processes in two consecutive rounds • Message complexity: O(f·|Π|²)
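
To illustrate why t+1 rounds suffice under crash failures, here is a minimal simulation of a flooding-style algorithm in the spirit of the full-information approach. It is a sketch and not the algorithm from the cited papers: every process repeatedly broadcasts all values it knows and decides on the minimum after t+1 rounds. The `crash_schedule` format (the round in which a process crashes and the recipients it still reaches in that round) is an assumption made for this example.

```python
def flood_set(proposals, t, crash_schedule):
    """Simulate t+1 synchronous rounds of value flooding with crash failures.

    proposals: dict mapping process id -> proposed value
    crash_schedule: dict mapping process id -> (round, recipients reached in that round)
    Returns the decisions of the processes that never crashed.
    """
    known = {p: {v} for p, v in proposals.items()}
    alive = set(proposals)
    for rnd in range(1, t + 2):
        inbox = {p: set() for p in proposals}
        crashed_now = set()
        for p in alive:
            if p in crash_schedule and crash_schedule[p][0] == rnd:
                recipients = crash_schedule[p][1]  # partial send, then crash
                crashed_now.add(p)
            else:
                recipients = set(proposals)
            for q in recipients:
                inbox[q] |= known[p]
        alive -= crashed_now
        for p in alive:
            known[p] |= inbox[p]
    return {p: min(known[p]) for p in alive}


# Worst-case chain: p1 crashes in round 1 reaching only p2,
# p2 crashes in round 2 reaching only p3.
proposals = {1: 0, 2: 5, 3: 7, 4: 9}
schedule = {1: (1, {2}), 2: (2, {3})}
decisions = flood_set(proposals, t=2, crash_schedule=schedule)
too_few_rounds = flood_set(proposals, t=1, crash_schedule=schedule)
```

With t = 2 the third round is clean, so the surviving processes end up with the same set of values and agree. Running only two rounds against the same two crashes leaves them with different sets, which shows why the round count is tied to the failure bound.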

  26. In the core/survivor set model • Algorithm SyncCrash [JM03a, JM03d] • Choose a core C, preferably the smallest • Execute the early-deciding algorithm among the processes of C • Every process in Π has an array of |C| positions, one for each process in C • Processes in C send messages to processes in Π - C as well • A process decides when a round with no failures in C happens • Decision in at most |C| rounds • If |C|-1 < t, then this improves on the number of rounds • Message complexity: O(f·|C|·|Π|)

  27. Synchronous systems - Arbitrary failures • Strong Consensus: impossible if n ≤ 3·t [Lamport82] • Proof idea: a Consensus algorithm that solves the problem for |Π| ≤ 3·t ⇒ an execution in which agreement is violated • Assume |Π| ≤ 3·t • Partition (A, B, C) of Π s.t. each subset has at most t processes • Execution 1 (A, B, C: v) • Execution 2 (A, B, C: v’) • Execution 3 (A: v; B: v’, C: *)

  28. In the core/survivor set model • Lower bound on process replication [JM03a, JM03d] • Byzantine Partition: Every partition (A, B, C) of Π is such that at least one of the subsets contains a core • Byzantine Intersection: • The intersection of every pair of survivor sets in S contains a core • The intersection of every three survivor sets in S is not empty • Scenarios: (A, B, C: v); (A, B, C: v’); (A: v; B: v’, C: *)

  29. Equivalence of Byzantine Intersection and Partition • In a partition (A, B, C): no subset contains a core • S1 ∩ S2 ∩ S3 is empty

  30. Solving Consensus for arbitrary failures • In the threshold model: Lamport et al. [Lamport82] • Solution for n > 3·t in t+1 rounds • In the core/survivor set model • Modified algorithm by Lamport et al. • Solution for systems satisfying Byzantine Partition • Replace subsets of processes of size n-t by survivor sets • Replace majority by intersection of two survivor sets • Enables a solution for some systems • Π = {pa, pb, pc, pd, pe} • C = {papbpc, papd, pape, pbpd, pbpe, pcpd, pcpe, pdpe} • S = {papbpcpd, papbpcpe, papdpe, pbpdpe, pcpdpe}
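
The five-process example on this slide can be checked mechanically against the Byzantine Intersection property. The sketch below encodes the cores and survivor sets exactly as listed, using single letters for pa..pe, and tests both conditions. The comparison with the threshold model takes t to be the largest number of processes that can fail together, which is an interpretation added here for illustration.

```python
from itertools import combinations

CORES = [frozenset(c) for c in ["abc", "ad", "ae", "bd", "be", "cd", "ce", "de"]]
SURVIVORS = [frozenset(s) for s in ["abcd", "abce", "ade", "bde", "cde"]]


def contains_core(X, cores):
    return any(C <= X for C in cores)


def byzantine_intersection(survivor_sets, cores):
    """Every pair of survivor sets intersects in a superset of some core,
    and every three survivor sets have a non-empty intersection."""
    pairs_ok = all(
        contains_core(S1 & S2, cores) for S1, S2 in combinations(survivor_sets, 2)
    )
    triples_ok = all(
        S1 & S2 & S3 for S1, S2, S3 in combinations(survivor_sets, 3)
    )
    return pairs_ok and triples_ok


example_ok = byzantine_intersection(SURVIVORS, CORES)

# Threshold view of the same system: the smallest survivor set has 3 processes,
# so up to t = 2 processes may fail, and n > 3t would require more than 6 processes.
n = 5
t = n - min(len(S) for S in SURVIVORS)
threshold_ok = n > 3 * t
```

The property holds for the example even though n > 3t does not, which is the sense in which the core/survivor set model enables a solution for some systems that the threshold model rules out.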

  31. Lower bound on the number of rounds • Definitions • : replication requirement (e.g. Byzantine Partition) • is a subsystem of iff • satisfies  • A subsystem is minimal if there is no smaller subsystem • Theorem: Given a system [JM03a, JM03b] • is a minimal subsystem of sys • A is a Consensus algorithm

  32. Back to the example • Π = {pa, pb, pc, pd, pe} • C = {papbpc, papd, pape, pbpd, pbpe, pcpd, pcpe, pdpe} • S = {papbpcpd, papbpcpe, papdpe, pbpdpe, pcpdpe} • Crash failures • Lower bound on the number of rounds: • Arbitrary failures • Lower bound on the number of rounds: • Bound is different for crash and arbitrary failures!

  33. Asynchronous systems • No solution for purely asynchronous systems, even for a single crash failure [FLP85] • Slow process vs. faulty process: requires a liveness property • Common approaches • Partially synchronous systems [DLS88] • Extend model with failure detectors [CT96] • Crash failures (◇S [CT96]) • Crash Partition: Every partition (A, B) of Π is such that either A or B contains a core • Crash Intersection: The intersection of every two survivor sets contains a core (coterie [GM88]) • Arbitrary failures (◇M [Doudou98]) • Byzantine Partition/Intersection

  34. Related work - Hybrid failure models • Move away only from the identically distributed failure assumption • Different failure modes, one class for each mode [LR94] • Manifest (c): detectable failures (e.g. corrupted messages) • Symmetric (s): behavior deviates arbitrarily, but it is the same for every other processor (e.g. sending the same erroneous value to every other process) • Arbitrary (a): behavior deviates arbitrarily (e.g. sending different values to different processes) • Algorithm for the Oral Messages problem

  35. Replication requirements elsewhere • More general descriptions of failure scenarios • Fail-prone systems [Malkhi97] • Collusion and adversary structures (malicious players) [Hirt97] • Martin et al. [Martin02] • Confirmable writes in quorum systems • Property: for every subset B in a fail-prone system and every pair of quorums Q1, Q2, we have that (Q1 ∩ Q2) \ B ≠ ∅ ⇒ the intersection of every pair of quorums contains a core • Hirt and Maurer [Hirt97] • Secure multi-party protocols • Passive model: no pair of collusions can add up to the set of players ⇒ the set of correct players is a coterie • Active model: no three adversaries can add up to the set of players ⇒ the intersection of three sets of correct players is not empty

  36. Generalizing n > k·t (Work in progress)

  37. Motivation: k integer • Properties establishing bounds on process replication are similar across problems • Asynchronous crash Consensus (◇W) • TM: n > 2·t • C/SS: ∀S1, S2 ∈ S: S1 ∩ S2 ≠ ∅ • State-machine replication: arbitrary failures • TM: n > 2·t • C/SS: ∀S1, S2 ∈ S: S1 ∩ S2 ≠ ∅ • Synchronous arbitrary Consensus • TM: n > 3·t • C/SS: ∀S1, S2, S3 ∈ S: S1 ∩ S2 ∩ S3 ≠ ∅
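
The correspondence between the threshold bounds (TM) and the intersection properties (C/SS) can be verified by brute force for small systems. In the threshold model the survivor sets are the subsets of size n-t, so the sketch below checks that n > k·t holds exactly when every choice of k such subsets has a non-empty intersection. Repeated choices are allowed so that the check is not vacuous when fewer than k distinct subsets exist; that detail is a choice made for this sketch.

```python
from itertools import combinations, combinations_with_replacement


def k_intersection_holds(n: int, t: int, k: int) -> bool:
    """True if any k subsets of size n-t of an n-process system share a process."""
    survivor_sets = [frozenset(c) for c in combinations(range(n), n - t)]
    return all(
        frozenset.intersection(*chosen)
        for chosen in combinations_with_replacement(survivor_sets, k)
    )


# Compare against the closed-form bound for all small systems.
mismatches = [
    (n, t, k)
    for n in range(1, 7)
    for t in range(0, n)
    for k in (2, 3)
    if k_intersection_holds(n, t, k) != (n > k * t)
]
```

The list of mismatches comes out empty for these sizes, matching the counting argument: k sets that each miss at most t processes miss at most k·t processes in total.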

  38. Motivation: k rational • Consensus for synchronous systems with receive-omission faults • In the threshold model: • Proof idea • Execution 1 • Processes in B and C crash • Processes in A propose 0 and decide upon 0 • Execution 2 • Processes in A and C crash • Processes in B propose 1 and decide upon 1 • Execution 3 • Processes in A omit to receive messages from processes not in A • Processes in B omit to receive messages from processes not in B • Processes in A propose 0 and decide upon 0 • Processes in B propose 1 and decide upon 1 • Agreement is violated!

  39. Generalizing the partition and the intersection properties • (, )-Partition. For every partition of , there is a subset such that: • (, )-Intersection. For every : ,: subset of S

  40. Threshold Model ( ) Some intuition on the generalized properties • =3, =2 AC contains a core Core/Survivor set Model

  41. Bounds on process replication • Lower bound • Every set of processes  that satisfies , also satisfies (, )-Partition • In every partition of  into  subsets, there are  subsets s.t. the union contains at least t+1 processes • consequently a core • Upper bound (work in progress) • If a problem P can be solved by an algorithm A in a system satisfying , then P can be solved by a system satisfying (k,1)-Partition • Simulate a system under the threshold model • Rational k • Looking for a candidate algorithm to motivate

  42. Implications • Algorithms designed under the threshold model can be automatically translated to our model, for integer k • There is no need to rethink the whole FT distributed systems world • If it simplifies, one may design an algorithm under the threshold model and later translate using our technique

  43. Correlated failures in the real world (work in progress)

  44. Background: Systems considering dependent failures • OceanStore [WMK02] • Online mechanism to correlate failures • Identify subsets of maximally independent failures • Problem • Correlates failures only after they have happened • Not useful for malicious behavior • PASIS [BWWG02] • Survivable storage systems • Add correlation level to classical model of availability • Two models to determine correlation level • Conditional probabilities • Beta-binomial distribution • Problem • Requires the computation of failure distributions

  45. Coping with Internet catastrophes: Phoenix • Possible approaches • Contain Internet pathogens: very challenging [Moore03b] • Recover from catastrophes: replicate data • Typical replication strategy • Assume independent host failures • Compute a threshold t on the number of failures • Replicate to this degree • Shared vulnerabilities ⇒ dependent host failures • Independent host failures are not a suitable assumption • Threshold t on the number of host failures • From previous events, t can be large • Code Red worm infected over 360,000 hosts

  46. Our replication strategy • Desirable properties • Enable recovery of data after an Internet catastrophe • Small replica sets • Informed strategy for replica placement [JBMSV03] • Sets of hosts that fail independently • Hosts executing different sets of software systems • Classes of software systems: attributes • E.g. Operating system • Potentially vulnerable software systems: attribute values • E.g. Linux, Windows • Replicate data on a set of hosts that have different values for each attribute: cores
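
The placement idea on this slide can be sketched as a greedy search. The code below is a hypothetical heuristic written for illustration, not the heuristic used by Phoenix: starting from the host that owns the data, it keeps adding the host that differs from it on the most attributes not yet covered, until no single attribute value is shared by all chosen hosts. Host names and the Web server and browser values are made up; only Linux and Windows appear in the slides.

```python
def covers_every_attribute(hosts, config):
    """True if, for every attribute, the hosts do not all share one value."""
    n_attrs = len(config[hosts[0]])
    return all(len({config[h][i] for h in hosts}) > 1 for i in range(n_attrs))


def greedy_core(origin, config):
    """Greedily pick hosts that differ from `origin` on uncovered attributes.

    Returns a list of hosts starting with `origin`, or None if some attribute
    value of `origin` is shared by every other host.
    """
    chosen = [origin]
    uncovered = set(range(len(config[origin])))
    while uncovered:
        def gain(h):
            return sum(config[h][i] != config[origin][i] for i in uncovered)

        candidates = [h for h in sorted(config) if h not in chosen]
        best = max(candidates, key=gain, default=None)
        if best is None or gain(best) == 0:
            return None
        uncovered -= {i for i in uncovered if config[best][i] != config[origin][i]}
        chosen.append(best)
    return chosen


# Attribute order: (operating system, Web server, Web browser); values are hypothetical.
config = {
    "red": ("Linux", "Apache", "Firefox"),
    "green": ("Windows", "IIS", "IE"),
    "yellow": ("Windows", "Apache", "IE"),
    "blue": ("Linux", "IIS", "Firefox"),
}
with_green = greedy_core("red", config)
without_green = greedy_core("red", {h: c for h, c in config.items() if h != "green"})
```

When a host differing on every attribute exists, the result is a two-host set; otherwise more hosts are needed to cover all attributes. A greedy choice like this does not guarantee the smallest possible set.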

  47. An example • Attributes (two values each) • Operating system • Web server • Web browser • Cores • Red and Green (orthogonal core) • Red, Yellow, and Blue • [Figure: Phoenix hosts labeled with their attribute configurations]

  48. In this presentation… • Feasibility of this approach • What is the impact of diversity on storage overhead and load? • Diversity: distribution of attribute configurations • Storage overhead: size of the replica set (core) • Storage load: given a host h, number of cores h participates in • Simulations • Levels of diversity • Varying attribute sets

  49. System model • A set H of hosts • A set A of attributes • Every attribute has the same cardinality y • A mapping M from hosts to attribute configurations • Diversity • Determined by M • Often skewed in practice (93% Windows) [OneStat] • Modeling diversity • Single parameter f ∈ [0.5, 1) • A share f of the hosts has a share (1-f) of the attribute configurations • Attribute configurations: • Example 1: f = 0.5 • Example 2: f = 0.75
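
One way to read the diversity parameter f is sketched below. This is an interpretation for illustration only: a share f of the hosts is spread round-robin over the first (1-f) share of all attribute configurations, and the remaining hosts are spread over the remaining configurations. The function name and the attribute values are hypothetical.

```python
from collections import Counter
from itertools import product


def assign_configurations(num_hosts, attribute_values, f):
    """Build a skewed mapping M from hosts (0..num_hosts-1) to attribute configurations."""
    configs = list(product(*attribute_values))
    n_popular = max(1, round((1 - f) * len(configs)))
    n_popular_hosts = round(f * num_hosts)
    popular = configs[:n_popular]
    rare = configs[n_popular:] or popular
    mapping = {}
    for h in range(num_hosts):
        if h < n_popular_hosts:
            mapping[h] = popular[h % len(popular)]
        else:
            mapping[h] = rare[(h - n_popular_hosts) % len(rare)]
    return mapping


# Three attributes with two values each (8 configurations), 16 hosts.
attrs = [("os1", "os2"), ("ws1", "ws2"), ("br1", "br2")]
skewed = Counter(assign_configurations(16, attrs, f=0.75).values())
uniform = Counter(assign_configurations(16, attrs, f=0.5).values())
```

With f = 0.75, twelve of the sixteen hosts share two configurations; with f = 0.5 the hosts are spread evenly, two per configuration.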

  50. Heuristic to find cores • Attributes (two values each) • Operating system • Web server • Web browser • Cores • Red and Green • Red, Yellow, and Blue • [Figure: Phoenix hosts labeled with their attribute configurations]
