This study explores the concept of coterie availability in multi-site distributed systems, highlighting its significance in improving system reliability and performance. It presents a model for characterizing failures, utilizing survivor sets and quorum constructions to optimize availability metrics. The implementation of coteries aids in achieving better resource sharing, resilience against site failures, and supports mechanisms like distributed mutual exclusion and consensus protocols, notably using Paxos. Practical issues are examined with experimental insights from PlanetLab, providing a roadmap for enhanced multi-site resilience.
Coterie availability in sites • Flavio Junqueira and Keith Marzullo • University of California, San Diego • DISC, Krakow, Poland, September 2005
Multi-site systems • Emerging class of distributed systems • Collection of sites across a WAN • Multiple nodes in each site • Share resources • Data sets • Computational power • E.g. BIRN, Geon, TeraGrid, PlanetLab • Site failure • All the nodes in a site simultaneously unavailable
Site availability: BIRN • 10 sites experienced at least one outage • One site under 97% availability
Improving availability • Better availability through replication • Coteries • Set system of processes: a set of subsets of processes • Each subset is called a quorum • Minimal sets, pairwise intersect • Coteries are useful • Distributed mutual exclusion • Distributed registers • Consensus through Paxos • Coterie availability in multi-site systems
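The two coterie properties on this slide (quorums pairwise intersect, and no quorum contains another) can be checked mechanically. The following is a minimal sketch; the function name `is_coterie` and the process names are illustrative and not from the talk.

```python
from itertools import combinations

def is_coterie(quorums):
    """Return True if `quorums` is a coterie: non-empty sets that
    pairwise intersect and none of which contains another (minimality)."""
    qs = [frozenset(q) for q in quorums]
    if not qs or any(len(q) == 0 for q in qs):
        return False
    for a, b in combinations(qs, 2):
        if not (a & b):        # intersection property violated
            return False
        if a <= b or b <= a:   # minimality violated (also catches duplicates)
            return False
    return True

# Majority quorums over three processes form a coterie.
majority_of_three = [{"p1", "p2"}, {"p1", "p3"}, {"p2", "p3"}]
print(is_coterie(majority_of_three))        # True
print(is_coterie([{"p1"}, {"p2", "p3"}]))   # False: disjoint quorums
print(is_coterie([{"p1"}, {"p1", "p2"}]))   # False: not minimal
```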
Roadmap • System model • Availability metrics • Previous deterministic metrics not necessarily good • A new metric • Failure model • Characterize failures using survivor sets • Survivor sets: more expressive • Quorum construction • Multi-site hierarchical construction • Practical issues • Failure model in practice • PlanetLab experiment • Conclusions
System model • Set P of processes • Pairwise connected by quasi-reliable asynchronous channels • Process failure: crash • Processes can recover • Set B of sites • Partition of the set of processes • Site failure: simultaneous failure of all the processes in the site • Process failures are not independent • Execution • Sequence of steps of processes • E: set of all executions • In a step s • Available process in s • p ∈ P is available if p ∉ F(s)
Survivor sets • A set S ⊆ P is a survivor set iff (condition missing in the source) • Example: E = {E1, E2, E3, E4} • [Figure: processes and sites; steps labeled E1, E2: s1, E3: s2, E4: s1; NF(si); resulting survivor sets]
Availability metrics • Traditional deterministic metrics • Undirected graph: nodes = processes, edges = comm. links • Node vulnerability: Minimal number of nodes • Edge vulnerability: Minimal number of edges • Majority is optimal [Barbara and Garcia-Molina’86] • Complete graphs
A counterexample • Majority • Quorum: 5 processes • In some step, no quorum can be formed • Using Sp as quorums • In every step, at least one quorum can be formed • Majority is not optimal • [Figure: sites, processes, survivor sets]
Availability metrics • Traditional deterministic metrics • Undirected graph: nodes = processes, edges = comm. links • Node vulnerability: Minimal number of nodes • Edge vulnerability: Minimal number of edges • Majority is optimal [Barbara and Garcia-Molina’86] • Complete graphs • A new metric A(Q), Q is a coterie • Number of covered survivor sets in Q • A survivor set S is covered in Q if:
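The metric A(Q) can be sketched in a few lines. The slide text breaks off before stating when a survivor set is covered, so the sketch assumes that S is covered in Q when at least one quorum of Q is a subset of S. The process names and the toy survivor sets are hypothetical.

```python
from itertools import combinations

def availability(coterie, survivor_sets):
    """A(Q): number of survivor sets covered by the coterie.
    ASSUMPTION: a survivor set is 'covered' when it contains some quorum."""
    quorums = [frozenset(q) for q in coterie]
    return sum(
        1 for s in survivor_sets
        if any(q <= frozenset(s) for q in quorums)
    )

# Toy system: four processes, two survivor sets that intersect in "a".
processes = ["a", "b", "c", "d"]
survivor_sets = [{"a", "b"}, {"a", "c"}]

# Majority coterie: every subset of three processes.
majority = [set(c) for c in combinations(processes, 3)]

print(availability(majority, survivor_sets))       # 0: no 3-process quorum fits in a 2-process survivor set
print(availability(survivor_sets, survivor_sets))  # 2: using the survivor sets themselves as quorums
```

In this toy system the majority coterie covers no survivor set, while the survivor sets used as quorums cover both, which is the sense in which majority is not optimal under this metric.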
Failure model • Multi-site hierarchical model • A set Fs of subsets of B • Subsets of simultaneously faulty sites • An array Fp • One entry per site • Each entry: subsets of processes in the site • Subsets of simultaneously faulty processes at a site • A survivor set S: ∃FS ∈ Fs such that • ∀Bi ∉ FS: ∃FP ∈ Fp[i]: P \ FP ⊆ S • ∀Bi ∈ FS: Bi ∩ S = ∅ • Example: sites B = {B1, B2, B3}, each with processes 1, 2, 3 • Fs = {{B1}, {B2}, {B3}} • Fp[1] = Fp[2] = Fp[3] = {{i} : i ∈ {1,2,3}} • Sp = {{i, j, k, l} : i, j, k, l ∈ {1,2,3} ∧ i ≠ j ∧ k ≠ l}, with i, j in one site and k, l in another, for each of the three pairs of sites
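The example on this slide can be reproduced by enumerating survivor sets from Fs and Fp. This is a sketch under one reading of the model: for each FS in Fs, every site outside FS contributes its processes minus one faulty subset from its Fp entry, and sites in FS contribute nothing. The function name and the process naming scheme (`B1.p1` and so on) are hypothetical.

```python
from itertools import combinations, product

def enumerate_survivor_sets(sites, Fs, Fp):
    """sites: dict site name -> set of processes
    Fs:    list of sets of site names that can fail together
    Fp:    dict site name -> list of process sets that can fail together
    Returns the minimal survivor sets under the reading stated above."""
    result = set()
    for failed_sites in Fs:
        alive = [b for b in sites if b not in failed_sites]
        # Pick one faulty subset per surviving site.
        for choice in product(*(Fp[b] for b in alive)):
            survivors = frozenset().union(
                *(sites[b] - fp for b, fp in zip(alive, choice))
            )
            result.add(survivors)
    return result

# Three sites with three processes each, as in the slide's example.
sites = {f"B{i}": {f"B{i}.p{j}" for j in (1, 2, 3)} for i in (1, 2, 3)}
Fs = [{"B1"}, {"B2"}, {"B3"}]                    # any single site may fail
Fp = {b: [{p} for p in ps] for b, ps in sites.items()}  # any single process per site may fail

Sp = enumerate_survivor_sets(sites, Fs, Fp)
print(len(Sp))                                   # 27
print(all(len(s) == 4 for s in Sp))              # True
print(all(a & b for a, b in combinations(Sp, 2)))  # True: pairwise intersecting
```

Each survivor set holds two processes from each of two sites, matching the four-element sets {i, j, k, l} in the example.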
Quorum construction • Optimal availability with respect to A • Coterie Q: Sp = Q or Q dominates Sp • Survivor sets in Sp pairwise intersect • If not, then optimally discarding survivor sets is NP-Complete • A special case: Qsite • All subsets of B of size fs in Fs • All subsets of size t of Bi in Fp[i], for every i • E.g.: fs = 1, t = 1 • [Figure: quorums across Site 1, Site 2, Site 3]
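For the threshold special case, quorums can be built directly from fs and t: pick all but fs sites, then all but t processes in each picked site. A minimal sketch, assuming this reading of Qsite; the name `qsite_quorums` and the host labels are hypothetical.

```python
from itertools import combinations, product

def qsite_quorums(sites, fs, t):
    """Build threshold quorums: every choice of (number of sites - fs) sites,
    combined with every choice of (site size - t) processes in each chosen site."""
    names = sorted(sites)
    quorums = set()
    for alive in combinations(names, len(names) - fs):
        per_site = [
            [frozenset(c)
             for c in combinations(sorted(sites[b]), len(sites[b]) - t)]
            for b in alive
        ]
        for parts in product(*per_site):
            quorums.add(frozenset().union(*parts))
    return quorums

# Three sites with three processes each, fs = 1 and t = 1.
sites = {s: {f"{s}-{k}" for k in (1, 2, 3)}
         for s in ("site1", "site2", "site3")}
Q = qsite_quorums(sites, fs=1, t=1)

print(len(Q))                                     # 27
print(sorted(len(q) for q in Q)[0])               # 4: two processes from each of two sites
print(all(a & b for a, b in combinations(Q, 2)))  # True
```

With these parameters any two quorums share a site, and within that site each holds two of three processes, so they intersect. Larger fs or t can break this, which is why the pairwise check is worth keeping.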
Model in practice • Qsite • fs: Threshold on site failures • Data on site availability • t: Threshold on process failures • Markov chains • One Markov chain for each site • Transitions • Failure transitions: same probability, homogeneous processes • Repair transitions: variable probability, amount of resources used • [Figure: Markov chain with failure transitions and repair transitions]
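One way to turn a per-site Markov chain into a threshold t is to compute its long-run distribution over the number of failed processes. The sketch below is not the model from the talk. It assumes a discrete-time birth-death chain where state k means k failed processes, each step moves to k+1 with the same probability `p_fail` and to k-1 with a state-dependent probability `p_repair[k-1]`. All numbers are hypothetical.

```python
def stationary_distribution(n, p_fail, p_repair):
    """Stationary distribution over k = 0..n failed processes.
    ASSUMPTIONS: birth-death chain, at most one failure or repair per step,
    p_fail identical in every state, p_repair[k-1] used in state k.
    Uses detailed balance: pi[k] * p_fail == pi[k+1] * p_repair[k]."""
    weights = [1.0]
    for k in range(1, n + 1):
        weights.append(weights[-1] * p_fail / p_repair[k - 1])
    total = sum(weights)
    return [w / total for w in weights]

def prob_more_than_t_failed(pi, t):
    """Long-run probability that more than t processes are failed."""
    return sum(pi[t + 1:])

# Hypothetical site with 3 processes; repairs slow down as more processes fail.
pi = stationary_distribution(3, p_fail=0.01, p_repair=[0.5, 0.3, 0.2])
for t in range(3):
    print(t, prob_more_than_t_failed(pi, t))
```

Given a target probability, one could pick the smallest t whose tail probability falls below it.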
PlanetLab experiment • Toy application • Paxos: quorums of acceptors • Client accessing quorums • Hosts used • Three sites: three hosts from each site • One UCSD host: proposer, learner • Three settings • 3Sites: One acceptor per site • Quorum: two hosts • 3SitesMaj: All hosts • Quorum: four hosts, majority from each of two sites • SimpleMaj: All hosts • Quorum: any five processes • Results • 3SitesMaj has better availability • SimpleMaj has worse availability • [Figure labels: UC Davis, UC San Diego, Duke, UT Austin]
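The three quorum rules can be compared on a single failure scenario. The sketch below encodes the rules as stated on the slide. The site and host names are hypothetical, and the scenario (one site down plus one host down in each remaining site) is chosen for illustration and is not taken from the experiment's data.

```python
# Hypothetical layout: three sites, three acceptor hosts each.
SITES = {
    "site1": {"s1-a", "s1-b", "s1-c"},
    "site2": {"s2-a", "s2-b", "s2-c"},
    "site3": {"s3-a", "s3-b", "s3-c"},
}
# 3Sites uses a single acceptor per site.
ONE_PER_SITE = {"s1-a", "s2-a", "s3-a"}

def quorum_3sites(live):
    """3Sites: any two of the three single-site acceptors."""
    return len(live & ONE_PER_SITE) >= 2

def quorum_3sitesmaj(live):
    """3SitesMaj: a majority of hosts in each of at least two sites."""
    sites_with_majority = sum(
        1 for hosts in SITES.values() if len(live & hosts) > len(hosts) / 2
    )
    return sites_with_majority >= 2

def quorum_simplemaj(live):
    """SimpleMaj: a majority of all hosts (five of nine)."""
    all_hosts = set().union(*SITES.values())
    return len(live & all_hosts) > len(all_hosts) / 2

# Scenario: site3 is down, plus one host down in each of the other sites.
live = {"s1-a", "s1-b", "s2-a", "s2-b"}
print(quorum_3sites(live))     # True
print(quorum_3sitesmaj(live))  # True
print(quorum_simplemaj(live))  # False: only four of nine hosts are live
```

In this scenario the site-aware rules still find a quorum while the simple majority does not, which is consistent with the availability ordering reported on the slide.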
The Bimodal model • Sites are survivor sets • Sp is not a coterie • “Throw out” survivor sets • In general, optimal solution is NP-Complete • Simple solution for this model • Practical issues • Practical for two sites • More than two sites: open problem
Conclusions • Coteries for multi-site systems • Site failures: process failures not independent • A new metric • Counts covered survivor sets • Multi-site hierarchical construction • Practical • Illustrated with Markov model • Experiment shows better availability • Using majority quorums is not a good idea • Not optimal • Poor performance • Future work • More experiments, more constructions, real deployment
Failure models • The multi-site hierarchical model (MSH) • A set Fs of subsets of B • An array Fp • One entry per site • Each entry: subsets of processes in the site • A survivor set S: ∃FS ∈ Fs such that • ∀Bi ∉ FS: ∃FP ∈ Fp[i]: P \ FP ⊆ S • ∀Bi ∈ FS: Bi ∩ S = ∅ • The bimodal model (B) • A set Fs of subsets of B • There is one site that is in no element of Fs • An array Fp • A survivor set S • As in the previous model, or • ∃Bi ∈ B: S = Bi • Example: sites B1, B2, each with processes 1, 2, 3 • Fs = (value missing in the source) • Fp[1] = Fp[2] = {{i} : i ∈ {1,2,3}} • MSH: Sp = {{i, j, k, l} : i, j, k, l ∈ {1,2,3} ∧ i ≠ j ∧ k ≠ l} • B: Sp = {{i, j, k, l} : i, j, k, l ∈ {1,2,3} ∧ i ≠ j ∧ k ≠ l}, together with the whole-site survivor sets
Bimodal construction • Bimodal model • By construction: Not all pairs of survivor sets intersect • Discard survivor sets until the remaining ones intersect • Selecting optimally is NP-Complete • Solution: Remove |B|-1 survivor sets • Survivor sets containing processes from multiple sites pairwise intersect • Construction is also optimal with respect to metric A • A special case: Bsite • All elements of Fs have size fs • All elements of Fp[i] have the same size t, for every i • E.g.: fs = 1, t = 1 • [Figure: quorums across sites B1 and B2]
Site availability • Goals • Show that sites are unavailable frequently enough • BIRN - Biomedical Informatics Research Network • Test bed projects centered around brain imaging • Currently: 19 universities, 26 research groups • Availability • Monthly basis • Pings (BIRN-CC) • Storage broker logs • Site availability • Jan/04-Aug/04 • Availability under 100% • On average in 5 out of the 8 months
Causes of site failures • Misconfigured software • Shared resources • Storage • Power circuits • Cooling pipes • Air conditioning • Network