
On the Complexity of Asynchronous Gossip


Presentation Transcript


  1. On the Complexity of Asynchronous Gossip • C. Georgiou, S. Gilbert, R. Guerraoui, D. R. Kowalski • Presented by: Tamar Aizikowitz, Spring 2009

  2. Introduction • In previous lectures, we considered the problem of gossiping in synchronous systems. • It is common to argue that distributed applications are generally synchronous. However, sometimes… • Delay bounds are not known. • Known bounds may be conservative. • Today, we consider gossip in asynchronous systems: • No a priori bounds on message delay or relative processor speeds.

  3. Outline • Model Definition and Assumptions • Lower Bound and “Cost of Asynchrony” • Asynchronous Gossip Algorithms • EARS • SEARS • TEARS • Application for Randomized Consensus

  4. Model Definition and Assumptions

  5. Model Definitions • Asynchronous message-passing • Fixed set of n processes, with known unique identifiers: [n] = {1,…,n} • Direct communication between all processes • Physical communication network ignored • Up to f < n crash failures • No lost or corrupt messages

  6. Timing Assumptions • Time proceeds in discrete steps. • Each step, some subset of processes is scheduled to take a local step. • In each local step, a process receives pending messages, performs local computation, and may send messages. • As long as a process has not crashed, it will eventually be scheduled for a local step.

  7. Timing Bounds • For a given execution, we define bounds on delays: • d = maximum message delivery time • If p sent m to q at time t, then q will receive m no later than time t + d (assuming q is not crashed). • Simulates communication delay. • δ = maximum step size • Every δ time steps, every non-crashed process is scheduled at least once. • Simulates relative processor speeds.

  8. Gossip • Every process has a rumor it wants to spread. • A gossip protocol must satisfy: • Rumor gathering: eventually, every correct process has collected the rumors of all other correct processes. • Validity: any rumor added to a process's collection must be the initial rumor of some process. • Quiescence: eventually, every process stops sending messages forever.

  9. Gossip Continued… • Gossip completes when: • Each correct process has received the rumors of all other correct processes, • Each correct process has stopped sending messages, and • All other processes have crashed. • Note: In an asynchronous system, a process can never terminate. • It cannot be "sure" that it has received all messages. • It can, however, stop sending messages.

  10. Complexity Measures • Let A be an asynchronous gossip algorithm. • A has time complexity Tas(d,δ) and message complexity Mas(d,δ) if, for every infinite execution where bounds d and δ hold: • Every correct process completes by expected time Tas • The total number of messages sent is ≤ Mas • If d = δ = 1 are known a priori to the algorithm, then A is synchronous, and Ts, Ms are defined analogously.

  11. Adversary Models • We consider two adversary models… • Adaptive Adversary: • Schedules processes, message deliveries, and crashes, dynamically during the computation. • Determines d and δ bounds. • Knows the distribution of the algorithm’s random choices. • Oblivious Adversary: • Determines schedule beforehand.

  12. Lower Bound The “cost” of asynchrony.

  13. Background • Best results for synchronous gossip: • Time: O(polylog n) • Messages: O(n polylog n) • B.S. Chlebus, D.R. Kowalski, Time and Communication Efficient Consensus for Crash Failures (will be presented next week…) • Trivial algorithm for asynchronous gossip (sketched below): • Time: O(d+δ) • Messages: Θ(n²)
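
The trivial algorithm is simple enough to make concrete. Below is a minimal Python sketch (not from the paper): every process sends its rumor to all n processes exactly once and then only collects what arrives, which is where the Θ(n²) message bound and the O(d+δ) completion time come from. The send(src, dst, msg) transport hook is an assumed primitive.

    # Minimal sketch (not from the paper) of the trivial asynchronous gossip:
    # every process sends its rumor to everyone once, then only listens.
    # `send(src, dst, msg)` is an assumed transport primitive.

    class TrivialGossip:
        def __init__(self, pid, n, rumor, send):
            self.pid, self.n, self.send = pid, n, send
            self.rumors = {pid: rumor}     # validity: only initial rumors are stored

        def start(self):
            # One burst of n messages per process: Theta(n^2) messages in total.
            for q in range(self.n):
                self.send(self.pid, q, (self.pid, self.rumors[self.pid]))

        def on_receive(self, msg):
            # Gathering: record the origin's rumor; nothing is ever forwarded,
            # so each process is quiescent after start() and completes by O(d + delta).
            origin, rumor = msg
            self.rumors[origin] = rumor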

  14. Lower Bound • Theorem 1: For every gossip algorithm A, there exist d, δ ≥ 1 and an adaptive adversary that causes up to f < n failures such that, in expectation, either: • Mas(d,δ) = Ω(n + f²), or • Tas(d,δ) = Ω(f(d+δ)) • In other words… No randomized asynchronous gossip protocol can be both time and message efficient against an adaptive adversary. • Efficient = w.r.t. the best known synchronous protocol.

  15. Adversary Strategy • Main idea: two types of gossiping techniques… • Send to many: message inefficient. • Send to few: time inefficient.

  16. Proof of Lower Bound • The Ω(n) lower bound for the number of messages is straightforward: every process needs to send its rumor to at least one other process. • Therefore, we need to show Ω(f²) for the number of messages or Ω(f(d+δ)) for the time…

  17. Divide and Conquer • Set f = min{f, n/4}. • Partition [n] into two sets: • |S1| = n − f/2 and |S2| = f/2 • Execute set S1 with d = δ = 1 until all processes in S1 complete and cease to send messages.

  18. Choose Adversary Strategy • Let t be the time at which S1 completes. • If t > f: fail all processes in S2 ⇒ gossip is complete at time t. • As d = δ = 1 and t > f, t = Ω(f(d+δ)) ✓ • If t ≤ f, check whether most processes in S2 send "many" messages or "few" messages, and apply the appropriate adversarial strategy: • Many messages ⇒ Mas(d,δ) = Ω(f²) • Few messages ⇒ Tas(d,δ) = Ω(f(d+δ))

  19. Examine S2 • For each p in S2, simulate: • p receives all messages sent to it from S1 • p executes f/2 isolated steps, i.e., doesn't receive any messages • p is promiscuous if, in expectation, p sends at least f/32 messages. • Let P ⊆ S2 denote the set of promiscuous processes.

  20. S2 Mostly Promiscuous • Case 1: |P| ≥ f/4 (most processes are promiscuous) • At time t, deliver all messages from S1 to S2. • Schedule all processes from S2 in each of the next f/2 time steps ⇒ δ = 1 • Do not deliver any messages ⇒ d > f/2 • ⇒ All processes in S2 have taken f/2 isolated steps • ⇒ In expectation, each process in P sends f/32 messages • Mas(d,δ) = Ω(f/4 · f/32) = Ω(f²) ✓

  21. S2 Mostly Non-promiscuous • Case 2: |P| < f/4 (most processes are not promiscuous) • NonP = S2 − P, i.e., the non-promiscuous processes. • Main idea: find two processes in NonP with a constant probability of not communicating directly, and make sure they don't communicate for a "long" time.

  22. Finding Two Disconnected Processes • We need to find two processes with a constant probability of not communicating directly… • N(p) = all processes q s.t. p sends a message to q with probability < 1/4 during f/2 isolated steps. • For p ∈ NonP, the number of processes not in N(p) is less than f/8. • Else, p sends a message with probability ≥ 1/4 to at least f/8 processes ⇒ p sends ≥ f/32 messages in expectation, which contradicts p ∈ NonP.

  23. Finding Two Disconnected Processes • Claim: For p ∈ NonP, there are "many" processes in N(p) from NonP. • |NonP| ≥ f/4 and at most f/8 processes lie outside N(p) • ⇒ |N(p) ∩ NonP| ≥ f/4 − f/8 = f/8.

  24. Finding Two Disconnected Processes • Consider the following directed graph: • Nodes: processes from NonP ⇒ at least f/4 nodes • Edges: p → q if q ∈ N(p) • Each p has ≥ f/8 outgoing edges • ⇒ Total of ≥ f/8 · f/4 = f²/32 edges • There are C(f/4, 2) = (f/4)(f/4 − 1)/2 = f²/32 − f/8 pairs. • ⇒ There exists a bidirectional edge in the graph. • ⇒ There exist p, q s.t. p ∈ N(q) and q ∈ N(p). • p, q have a constant probability of not communicating. (The counting is spelled out below.)
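
The counting argument above, written out (the same numbers, only consolidated):

\[
\underbrace{\tfrac{f}{8}\cdot\tfrac{f}{4}}_{\text{edges}} = \frac{f^2}{32}
\;>\;
\underbrace{\binom{f/4}{2}}_{\text{pairs}} = \frac{\tfrac{f}{4}\left(\tfrac{f}{4}-1\right)}{2} = \frac{f^2}{32}-\frac{f}{8},
\]

so by the pigeonhole principle some pair of nodes carries two edges, one in each direction.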

  25. Isolating Two Processes • At time t, fail all processes in S2 except p and q. • Execute p, q for f/2 local steps with d = 1. • Fail all processes in S1 that p, q send messages to.

  26. Isolating Two Processes Continued… • Pr[p, q do not communicate] ≥ (1 − 1/4)(1 − 1/4) = 9/16 • All processes which receive messages in S1 are failed ⇒ p and q are isolated. • By Markov's inequality, the probability that p (or q) sends fewer than f/8 messages is at least 3/4: • Pr[X ≥ f/8] ≤ (f/32)/(f/8) = 1/4 • ⇒ With probability at least 9/16, p and q together send at most f/4 messages. • ⇒ Number failed ≤ f/4 + f/2 − 2 = 3f/4 − 2 < f. (The Markov step is spelled out below.)
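
The Markov step, written out: let X be the number of messages a non-promiscuous process sends during its f/2 isolated steps, so E[X] < f/32 by the definition of NonP; the 9/16 figure additionally assumes p's and q's random choices are independent:

\[
\Pr\!\left[X \ge \tfrac{f}{8}\right] \le \frac{\mathbb{E}[X]}{f/8} \le \frac{f/32}{f/8} = \frac{1}{4},
\qquad
\Pr\!\left[X_p < \tfrac{f}{8} \,\wedge\, X_q < \tfrac{f}{8}\right] \ge \left(\tfrac{3}{4}\right)^{2} = \tfrac{9}{16}.
\]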

  27. Proof of Lower Bound: Completion! • Using a union bound, the probability that p, q do not communicate and that they send no more than f/4 messages is at least 1 − (7/16 + 7/16) = 1/8. • In this case, gossip is not complete after f/2 local steps, as p and q do not know each other's rumor. • d = 1 and each local step takes δ ⇒ p, q run for time at least (d + δ)f/2 with probability at least 1/8. • ⇒ In expectation, Tas(d,δ) = Ω(f(d+δ)).

  28. Cost of Asynchrony • Consider the worst case ratio between asynchronous algorithms and synchronous ones: • CostT = Tas / min Ts • CostM = Mas / min Ms • Based on Theorem 1, we have: • CostT = Ω(f) • CostM = Ω(1 + f²/n) • Note: For f = Θ(n), we have either a Θ(n) slowdown or a Θ(n) increase in messages (substitution below).
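
A quick substitution check of the note above (nothing beyond Theorem 1 is used): with f = Θ(n),

\[
\mathrm{Cost}_T = \Omega(f) = \Omega(n)
\quad\text{or}\quad
\mathrm{Cost}_M = \Omega\!\left(1+\frac{f^2}{n}\right) = \Omega\!\left(1+\frac{\Theta(n^2)}{n}\right) = \Omega(n),
\]

so at least one of the two ratios is linear in n.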

  29. Gossip Algorithms EARS SEARS TEARS

  30. Epidemic Asynchronous Rumor Spreading • Each process has the following data: • rp = the rumor of process p • Vp = the set of all rumors known to p • Ip = a set of pairs (r,q) s.t. p knows r was sent to q • Lp = { q | ∃ r ∈ Vp s.t. (r,q) ∉ Ip } • Main idea: • Send Vp and Ip to a random process • Update Vp and Ip according to messages received • Use Lp to know when to "sleep"

  31. EARS(rp) • Init: Vp ← {rp} ; Ip ← Ø ; Lp ← [n] ; sleep_cnt ← 0 • repeat: • for every message m = ⟨V, I⟩ received do • Vp ← Vp ∪ m.V ; Ip ← Ip ∪ m.I • update Lp based on Vp and Ip • if Lp = Ø then sleep_cnt++ else sleep_cnt ← 0 • if sleep_cnt < Θ((n/(n−f)) log n) then • choose q uniformly at random from [n] • send m = ⟨Vp, Ip⟩ to q • for every r in Vp do Ip ← Ip ∪ {(r,q)} • update Lp based on Vp and Ip (A Python sketch follows.)
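
A minimal Python sketch of one EARS process, assuming a send(src, dst, payload) transport hook and rumors tagged by their origin; the constant in the sleep threshold is an arbitrary choice, since the slide only specifies Θ((n/(n−f)) log n):

    import math
    import random

    class EARS:
        def __init__(self, pid, n, f, rumor, send):
            self.pid, self.n, self.send = pid, n, send
            self.V = {(pid, rumor)}                  # Vp: rumors known to p
            self.I = set()                           # Ip: pairs (rumor, q) known sent to q
            self.sleep_cnt = 0
            self.limit = math.ceil((n / (n - f)) * math.log(n))

        def L(self):
            # Lp: processes that, as far as p knows, still miss some rumor in Vp.
            return {q for q in range(self.n)
                    if any((r, q) not in self.I for r in self.V)}

        def local_step(self, inbox):
            for V, I in inbox:                       # merge every received message
                self.V |= V
                self.I |= I
            self.sleep_cnt = self.sleep_cnt + 1 if not self.L() else 0
            if self.sleep_cnt < self.limit:          # not yet quiescent
                q = random.randrange(self.n)         # uniform random target
                self.send(self.pid, q, (set(self.V), set(self.I)))
                self.I |= {(r, q) for r in self.V}   # remember what q now knows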

  32. EARS Analysis • Rumor Gathering: • Every correct process eventually takes a local step and sends its rumor to another process. • Every process that receives this rumor will continue spreading it until it knows that all processes have received it. • Eventually, w.h.p., every process has received the rumor. • Validity: Only original rp values are gossiped. • Quiescence: After all processes have gathered all the rumors, all Lp sets will be empty, and eventually, w.h.p., all processes will go to sleep.

  33. EARS Analysis Continued… • Theorem 6: Algorithm EARS completes gossip w.h.p. under an oblivious adversary with • O((n/(n−f)) log²n (d+δ)) time complexity • O(n log³n (d+δ)) message complexity • Note: for small f and d = δ = 1, complexity is comparable to the best synchronous algorithm: • O(log²n) time complexity • O(n log³n) message complexity

  34. Spamming EARS (SEARS) • Same as EARS except: • Message is sent to Θ(n^ε log n) processes • Only one shut-down step • Theorem 7: For every constant ε < 1, algorithm SEARS has, w.h.p., • O((n/(ε(n−f)))(d+δ)) time complexity • O((n^{2ε}/(ε(n−f))) log n (d+δ)) message complexity • Note: for f < n/2 we have constant time w.r.t. n. • Intuition: Send more messages each round to save time, but pay with higher message complexity. (A sketch of the modified send step follows.)
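
Only the send step changes relative to EARS. A sketch of that step, where the constant c and the default ε = 0.5 are illustrative assumptions:

    import math
    import random

    def sears_targets(n, eps=0.5, c=1):
        # SEARS "spams" Theta(n^eps * log n) random targets per step
        # instead of EARS's single uniform target.
        fanout = min(n, math.ceil(c * n**eps * math.log(n)))
        return random.sample(range(n), fanout)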

  35. Two-hop EARS • Majority gossip: Each correct process receives rumors from at least a majority of the processes. • Assumption: f < n/2 • Majority gossip is useful for applications such as Consensus… • Main idea: two-phase algorithm: • Phase 1: send rumor to a set of processes • Phase 2: whenever a certain number of phase 1 messages has been received, send all known rumors to a set of processes

  36. TEARS(rp) Sketch • Init: • a ← 4n^{1/2} log n ; Vp ← {rp} ; first_cnt ← 0 • set1: ∀q, put q in set1 with probability a/n • set2: ∀q, put q in set2 with probability a/n • for every q in set1 do send m = ⟨Vp, first⟩ to q • for every m received do • Vp ← Vp ∪ m.V • if m.flag = first then first_cnt++ • if pred(first_cnt) then // check number of first messages • for every q in set2 do send m = ⟨Vp, second⟩ to q (A Python sketch follows.)
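
A Python rendering of the sketch above, again assuming a send(src, dst, payload) transport hook; the trigger pred is left abstract, exactly as in the slide:

    import math
    import random

    class TEARS:
        def __init__(self, pid, n, rumor, send, pred):
            self.pid, self.send, self.pred = pid, send, pred
            self.V = {(pid, rumor)}                    # rumors tagged by origin
            self.first_cnt = 0
            a = 4 * math.sqrt(n) * math.log(n)
            # Each q joins set1 / set2 independently with probability a/n.
            self.set1 = [q for q in range(n) if random.random() < a / n]
            self.set2 = [q for q in range(n) if random.random() < a / n]
            for q in self.set1:                        # phase 1: spread own rumor
                self.send(pid, q, (set(self.V), "first"))

        def on_receive(self, msg):
            V, flag = msg
            self.V |= V                                # merge rumors from the message
            if flag == "first":
                self.first_cnt += 1
                if self.pred(self.first_cnt):          # phase 2 trigger
                    for q in self.set2:                # relay all gathered rumors
                        self.send(self.pid, q, (set(self.V), "second"))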

  37. TEARS Correctness • Best case analysis: The sets set1 and set2 are of size a in expectation. Therefore: • Each process sends its rumor, in expectation, a times in the first phase. • ⇒ Every process eventually receives a first-level messages carrying processes' rumors. • If all processes receive all a first-level messages before sending their final second-level message, then they will send a second-level messages with a rumors each. • ⇒ Every process will receive a² = 16n log²n > n/2 rumors.

  38. TEARS Correctness Continued… • Worst case analysis: Using the Chernoff bound, it can be shown that, w.h.p.: • A sufficient number of rumors reach a sufficient number of processes in first-level messages before those processes finish their second phase. • These are called well-distributed rumors. • These rumors are then sent by "enough" processes in second-phase messages. • Therefore, w.h.p., each process receives enough additional rumors to complement the well-distributed rumors to at least a majority.

  39. TEARS Analysis • Theorem 12: Algorithm TEARS completes majority gossip w.h.p. under an oblivious adversary with: • Time complexity: O(d+δ) • Message complexity: O(n^{7/4} log²n) • Proof of time complexity: • By time δ, all 1st level messages have been sent. • By time δ+d, all these messages have arrived. • By time 2δ+d, all 2nd level messages have been sent. • By time 2δ+2d, all these messages have arrived. • ⇒ Gossip completes in O(d+δ).

  40. Randomized Consensus

  41. The Consensus Problem • n processes, each with an initial value vp. • Each process must choose an output value dp satisfying: • Agreement: All output values are the same. • Validity: Every output value is vp for some p. • Termination: Every process eventually decides and outputs a value, w.h.p. (preferably with probability 1). • Recall: Deterministic asynchronous consensus with even one crash failure is impossible.

  42. The Rabin-Canetti Framework • Initially: r ← 1 and prefer ← vp • while true do • votes ← get-core(⟨vote, prefer, r⟩) // get votes of majority • let v be the majority of phase r votes • if all phase r votes are v then dp ← v // decide v • outcomes ← get-core(⟨outcome, v, r⟩) • if all phase r outcome values are w then prefer ← w • else prefer ← common-coin() • r++

  43. Routine get-core • Initially: set1 = set2 = set3 = Ø ; values[j] = ⊥ for all j • when get-core(val) invoked do • values[i] ← val • broadcast(1, val) • when (1, v) received from pj do • values[j] ← v • add j to set1 • if |set1| = n−f then broadcast(2, values) • when (2, V) received from pj do • merge V into values • add j to set2 • if |set2| = n−f then broadcast(3, values) • when (3, V) received from pj do • merge V into values • add j to set3 • if |set3| = n−f then return(values) (A Python sketch follows.)
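
The same routine as a Python event-handler sketch; broadcast is an assumed primitive (realized via gossip on the next slide) and None stands in for the ⊥ placeholder:

    class GetCore:
        def __init__(self, i, n, f, broadcast):
            self.i, self.n, self.f, self.bcast = i, n, f, broadcast
            self.values = [None] * n                    # None plays the role of "bottom"
            self.sets = {1: set(), 2: set(), 3: set()}  # set1, set2, set3
            self.result = None

        def invoke(self, val):                          # get-core(val)
            self.values[self.i] = val
            self.bcast((1, val))

        def on_receive(self, j, msg):
            tag, payload = msg
            if tag == 1:
                self.values[j] = payload                # (1, v): a single value
            else:
                for k, v in enumerate(payload):         # (2, V) / (3, V): merge arrays
                    if v is not None:
                        self.values[k] = v
            self.sets[tag].add(j)
            if len(self.sets[tag]) == self.n - self.f:  # quorum of n - f reached
                if tag < 3:
                    self.bcast((tag + 1, list(self.values)))
                else:
                    self.result = list(self.values)     # return(values)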

  44. Implementing get-core Using Gossip • Replace broadcast sends with asynchronous gossip. • Majority gossip is sufficient. • Note: gossip starts asynchronously (not all processes finish phase 1 and start phase 2 at the same time). • Assuming a process begins gossiping as soon as it receives a rumor, the asymptotic complexity remains the same. • To do so, if a process receives a rumor from a gossip protocol it has not yet initiated, it adopts the state of the sender and proceeds to gossip accordingly.

  45. Analysis of Algorithms • Theorem 13: For an oblivious adversary and f < n/2, consensus algorithms based on EARS, SEARS, and TEARS using the Rabin-Canetti framework have the same complexity as the gossip protocols. • In particular, the algorithm based on TEARS has: • O(d+δ) time complexity • O(n^{7/4} log²n) message complexity • This is the first randomized asynchronous consensus algorithm to terminate in constant time w.r.t. n and with strictly sub-quadratic message complexity.

  46. Thank you!
