Byzantine-Resilient Random Membership Sampling in Unstructured Overlay Networks
400 likes | 567 Vues
This research explores Byzantine-resilient gossip-based membership sampling in large-scale unstructured overlay networks. We address the challenges of Byzantine failures, including arbitrary node behavior and statistics biasing, while ensuring connectivity among correct nodes. Our novel Brahms sampling approach guarantees uniform random samples of unique network elements, enhancing robustness against churn and malicious nodes. We analyze gossip protocols using simulated scenarios and propose adaptive strategies to mitigate vulnerabilities, validating our model's performance.
Byzantine-Resilient Random Membership Sampling in Unstructured Overlay Networks
E N D
Presentation Transcript
Brahms Byzantine-Resilient Random Membership Sampling Bortnikov, Gurevich, Keidar, Kliot, and Shraer
Edward (Eddie) Bortnikov Maxim (Max) Gurevich Idit Keidar Alexander (Alex) Shraer Gabriel (Gabi) Kliot
Why Random Node Sampling • Gossip partners • Random choices make gossip protocols work • Unstructured overlay networks • E.g., among super-peers • Random links provide robustness, expansion • Gathering statistics • Probe random nodes • Choosing cache locations
The Setting • Many nodes – n • 10,000s, 100,000s, 1,000,000s, … • Come and go • Churn • Every joining node knows some others • Connectivity • Full network • Like the Internet • Byzantine failures
Byzantine Fault Tolerance (BFT) • Faulty nodes (portion f) • Arbitrary behavior: bugs, intrusions, selfishness • Choose f ids arbitrarily • No CA, but no panacea for Cybil attacks • May want to bias samples • Isolate nodes, DoS nodes • Promote themselves, bias statistics
Previous Work • Benign gossip membership • Small (logarithmic) views • Robust to churn and benign failures • Empirical study [Lpbcast,Scamp,Cyclon,PSS] • Analytical study [Allavena et al.] • Never proven uniform samples • Spatial correlation among neighbors’ views [PSS] • Byzantine-resilient gossip • Full views [MMR,MS,Fireflies,Drum,BAR] • Small views, some resilience [SPSS] • We are not aware of any analytical work
Our Contributions • Gossip-based BFT membership • Linear portion f of Byzantine failures • O(n1/3)-size partial views • Correct nodes remain connected • Mathematically analyzed, validated in simulations • Random sampling • Novel memory-efficient approach • Converges to proven independent uniform samples The view is not all bad Better than benign gossip
Brahms Sampling - local component Gossip - distributed component Gossip view Sampler sample
Sampler Building Block • Input: data stream, one element at a time • Bias: some values appear more than others • Used with stream of gossiped ids • Output: uniform random sample • of unique elements seen thus far • Independent of other Samplers • One element at a time (converging) next Sampler sample
Sampler Implementation • Memory: stores one element at a time • Use random hash function h • From min-wise independent family [Broder et al.] • For each set X, and all , next init Sampler Choose random hash function Keep id with smallest hash so far sample
Component S: Sampling and Validation id streamfrom gossip init next Sampler Sampler Sampler Sampler using pings sample Validator Validator Validator Validator S
Gossip Process • Provides the stream of ids for S • Needs to ensure connectivity • Use a bag of tricks to overcome attacks
Gossip-Based Membership Primer • Small (sub-linear) local view V • V constantly changes - essential due to churn • Typically, evolves in (unsynchronized) rounds • Push: send my id to some node in V • Reinforce underrepresented nodes • Pull: retrieve view from some node in V • Spread knowledge within the network • [Allavena et al. ‘05]: both are essential • Low probability for partitions and star topologies
Brahms Gossip Rounds • Each round: • Sendpushes, pulls to random nodes from V • Wait to receive pulls, pushes • Update S with all received ids • (Sometimes) re-compute V • Tricky! Beware of adversary attacks
Problem 1: Push Drowning Push Alice A E M M Push Bob Push Mallory Push Carol M B M Push Ed Push Dana Push M&M D Push Malfoy M
Trick 1: Rate-Limit Pushes • Use limited messages to bound faulty pushes system-wide • E.g., computational puzzles/virtual currency • Faulty nodes can send portion p of them • Views won’t be all bad
Problem 2: Quick Isolation Ha! She’s out! Now let’s move on to the next guy! Push Alice A E M Push Bob Push Carol Push Mallory Push Ed Push Dana Push M&M Push Malfoy C D
Trick 2: Detection & Recovery • Do not re-compute V in rounds when too many pushes are received • Slows down isolation;does not prevent it Push Bob Push Mallory Hey! I’m swamped! I better ignore all of ‘em pushes… Push M&M Push Malfoy
Trick 3: Balance Pulls & Pushes • Control contribution of push - α|V| ids versus contribution of pull - β|V| ids • Parameters α, β • Pull-only eventually all faulty ids • Pull from faulty nodes – all faulty ids, from correct nodes – some faulty ids • Push-only quick isolation of attacked node • Push ensures: system-wide not all bad ids • Pull slows down (does not prevent) isolation
Trick 4: History Samples • Attacker influences both push and pull • Feedback γ|V| random ids from S • Parameters α + β + γ = 1 • Attacker loses control - samples are eventually perfectly uniform Yoo-hoo, is there any good process out there?
View and Sample Maintenance Pushed ids Pulled ids S |V| |V| |V| View V Sample
Key Property • Samples take time to help • Assume attack starts when samples are empty • With appropriate parameters • E.g., • Time to isolation > time toconvergence Prove lower bound using tricks 1,2,3(not using samples yet) Prove upper bound until some good sample persists forever Self-healing from partitions
History Samples: Rationale • Judicious use essential • Bootstrap, avoid slow convergence • Deal with churn • With a little bit of history samples (10%) we can cope with any adversary • Amplification!
Analysis Sampling - mathematical analysis Connectivity - analysis and simulation Full system simulation
Connectivity Sampling • Theorem:If overlay remains connected indefinitely, samples are eventuallyuniform
Sampling Connectivity Ever After • Perfect sample of a sampler with hash h: the id with the lowest h(id) system-wide • If correct, sticks once the sampler sees it • Correct perfect sample self-healing from partitions ever after • We analyze PSP(t) – probability of perfect sample at time t
Convergence to 1st Perfect Sample • n = 1000 • f = 0.2 • 40% unique ids in stream
Scalability • Analysis says: • For scalability, want small and constant convergence time • independent of system size, e.g., when
Connectivity Analysis 1: Balanced Attacks • Attack all nodes the same • Maximizes faulty ids in views system-wide • in any single round • If repeated, system converges to fixed point ratio of faulty ids in views, which is < 1 if • γ=0 (no history) and p < 1/3 or • History samples are used, any p There are always good ids in views!
Fixed Point Analysis: Push Local view node i Local view node 1 i Time t: lost push push push from faulty node 1 Time t+1: • x(t) – portion of faulty nodes in views at round t; • portion of faulty pushes to correct nodes : • p / ( p + ( 1 − p )( 1 − x(t) ) )
Fixed Point Analysis: Pull Local view node i Local view node 1 i Time t: pull from i: faulty with probability x(t) pull from faulty Time t+1: E[x(t+1)] = p / (p + (1 − p)(1 − x(t))) + ( x(t) + (1-x(t))x(t) ) + γf
Faulty Ids in Fixed Point Assumed perfect in analysis, real history in simulations With a few history samples, any portion of bad nodes can be tolerated Perfectly validated fixed pointsand convergence
Convergence to Fixed Point • n = 1000 • p = 0.2 • α=β=0.5 • γ=0
Connectivity Analysis 2:Targeted Attack – Roadmap • Step 1: analysis without history samples • Isolation in logarithmic time • … but not too fast, thanks to tricks 1,2,3 • Step 2: analysis of history sample convergence • Time-to-perfect-sample < Time-to-Isolation • Step 3: putting it all together • Empirical evaluation • No isolation happens
Targeted Attack – Step 1 • Q: How fast (lower bound) can an attacker isolate one node from the rest? • Worst-case assumptions • No use of history samples ( = 0) • Unrealistically strong adversary • Observes the exact number of correct pushes and complements it to α|V| • Attacked node not represented initially • Balanced attack on the rest of the system
Isolation w/out History Samples • n = 1000 • p = 0.2 • α=β=0.5 • γ=0 Isolation time for |V|=60 Depend on α,β,p
Step 2: Sample Convergence • n = 1000 • p = 0.2 • α=β=0.5, γ=0 • 40% unique ids Perfect sample in 2-3 rounds Empirically verified
Step 3: Putting It All TogetherNo Isolation with History Samples • n = 1000 • p = 0.2 • α=β=0.45 • γ=0.1 Works well despite small PSP
Sample Convergence (Balanced) • p = 0.2 • α=β=0.45 • γ=0.1 Convergence twice as fast with
Summary O(n1/3)-size views Resist Byzantine failures of linear portion Converge to proven uniform samples Precise analysis of impact of failures