BubbleStorm : Resilient, Probabilistic, and Exhaustive Peer-to-Peer Search

SIGCOMM 2007
By Wesley W. Terpstra, Jussi Kangasharju, Christof Leng and Alejandro P. Buchmann
Presented by Kyungbaek Kim (kyungbak@uci.edu)

Introduction
• Motivation
  • Limitations of key-matching P2P systems (especially DHTs)
    • Hard to support complex queries
    • Vulnerable to massive failures
• Proposal: a new topology and search method for P2P applications
  • Separates query evaluation from the network topology
  • Builds the topology and measures the network state
  • Replicates both queries and data using BubbleCast
  • Supports more sophisticated queries
  • More tolerant of massive crashes and leaves

Topology
[Figure: circle view and multigraph view of an overlay with nodes A, B, C and D]
• BubbleStorm's overlay is a random multigraph
• Each node's degree is proportional to its bandwidth
• Each node manages only its own neighbor information
• When a crash occurs, each node simply joins again to get new neighbors, so the topology is simple and self-stabilizing
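The slide does not say how a random multigraph with bandwidth-proportional degrees is built. A common way to do it, and as we understand it the approach BubbleStorm takes, is edge splitting: a joining node picks degree/2 random existing edges and inserts itself into each, so no existing node changes degree. The sketch below assumes that; `degree_for_bandwidth`, its `unit` parameter, and the 4-node seed ring are hypothetical illustration choices, not the paper's.

```python
import random

def degree_for_bandwidth(bandwidth, unit):
    # Hypothetical rule: one edge per `unit` of bandwidth, at least 2,
    # rounded down to an even number because edges are split in pairs.
    d = max(2, int(bandwidth // unit))
    return d - (d % 2)

def join(edges, node, degree, rng):
    """Add `node` to the multigraph `edges` (a list of (u, v) pairs) by
    splitting degree // 2 distinct, randomly chosen existing edges."""
    # Sample indices of pre-existing edges only, so none of them touches `node`.
    chosen = rng.sample(range(len(edges)), degree // 2)
    for i in chosen:
        u, v = edges[i]
        edges[i] = (u, node)      # u-v becomes u-node: u keeps its degree
        edges.append((node, v))   # plus node-v: v keeps its degree
    # `node` now has exactly `degree` edges; all other degrees are unchanged.

def degrees(edges):
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return counts

# Example: start from a 4-node ring (degree 2 each) and join four peers.
rng = random.Random(42)
edges = [(i, (i + 1) % 4) for i in range(4)]
bandwidths = {4: 1.0, 5: 4.0, 6: 2.5, 7: 6.0}
for node, bw in bandwidths.items():
    join(edges, node, degree_for_bandwidth(bw, unit=1.0), rng)
```

Because splitting never changes existing degrees, a crashed neighbor's partner can rejoin the same way without coordinating with anyone else, which matches the "just joins again" behavior on the slide.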

Join/Leave
• To keep the network topology consistent, all join and leave operations are serializable
• TCP is used for in-order delivery
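The inverse of an edge-splitting join is a leave in which the departing node reconnects its neighbors in pairs, so every remaining node keeps its degree. The sketch below assumes that rule (our reading of the design, not stated on the slide) and ignores the serialization and TCP messaging the slide mentions; it shows only the graph rewrite. `leave` and the K5 example are hypothetical.

```python
import random
from collections import Counter

def leave(edges, node, rng):
    """Remove `node` and reconnect its neighbors in random pairs so every
    remaining node keeps its degree (assumes `node` has even degree)."""
    ends, kept = [], []
    for u, v in edges:
        if u == node and v == node:
            continue                  # a self-loop simply disappears
        elif u == node:
            ends.append(v)            # dangling end at neighbor v
        elif v == node:
            ends.append(u)            # dangling end at neighbor u
        else:
            kept.append((u, v))
    rng.shuffle(ends)
    # Pair the dangling ends: each neighbor regains one edge per edge lost.
    # A neighbor appearing twice may get a self-loop, which a multigraph allows.
    kept.extend(zip(ends[0::2], ends[1::2]))
    return kept

def degrees(edges):
    c = Counter()
    for u, v in edges:
        c[u] += 1
        c[v] += 1
    return c

# Example: complete graph on 5 nodes (every degree is 4); node 0 leaves.
k5 = [(u, v) for u in range(5) for v in range(u + 1, 5)]
after = leave(k5, 0, random.Random(1))
```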

BubbleCast
[Figure: BubbleCast example; number of replicas: 17, split factor: 2]
• Hybridizes random walks with flooding
  • The precisely controlled replication of random walks
  • The low latency of flooding
• Input: an item (query Q or datum D), the number of replicas, and the split factor
• Unlike a random walk, it reaches an exponentially growing number of nodes per step
• Flooding can only control the logarithm of the replication (through its hop count), and its reach depends on node degrees
• Because of the split factor, all edges carry the same average traffic
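The figure's numbers (17 at the origin, then 8 and 8, then 4 and 3, ...) follow from a simple counter rule: a node receiving a counter w keeps one replica for itself and divides the remaining w − 1 as evenly as possible among `split` neighbors. The sketch below simulates that rule hop by hop; the function names, the random choice of neighbors, and the ring test graph are assumptions for illustration, not the paper's implementation.

```python
import random
from collections import deque

def split_counter(w, s):
    """Divide w remaining replicas among s children as evenly as possible."""
    base, extra = divmod(w, s)
    return [base + 1 if i < extra else base for i in range(s)]

def bubblecast(adj, origin, replicas, split, rng):
    """Return (nodes that processed a replica, number of hops used).
    Assumes every node has at least `split` neighbors."""
    processed = []
    depth = 0
    frontier = deque([(origin, replicas, 0)])   # (node, counter, hop)
    while frontier:
        node, w, hop = frontier.popleft()
        processed.append(node)                  # this node stores/evaluates the item
        depth = max(depth, hop)
        shares = [c for c in split_counter(w - 1, split) if c > 0]
        targets = rng.sample(adj[node], len(shares))
        for target, c in zip(targets, shares):
            frontier.append((target, c, hop + 1))
    return processed, depth

# Example: a 50-node ring, 17 replicas, split factor 2 (as in the figure).
ring = {i: [(i - 1) % 50, (i + 1) % 50] for i in range(50)}
visited, depth = bubblecast(ring, 0, 17, 2, random.Random(0))
print(len(visited), depth)  # 17 4
```

The counter guarantees exactly `replicas` processing steps (random-walk-like control), while the doubling per hop means only about log2(17) hops are needed (flood-like latency). In a real multigraph a node may receive more than one replica.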

Bubble Size
• Correctness in BubbleStorm depends on the number of query and data replicas $(q, d)$
• $1 - r = e^{-c^2} = e^{-qd/n}$ ($r$ = reliability, $c$ = certainty factor, $n$ = number of nodes), so $qd = c^2 n$
• $R_q$, $R_d$: the rates at which queries and data are injected into the system
• Balanced bubble sizes require $qR_q = dR_d$, which gives $q = c\sqrt{nR_d/R_q}$ and $d = c\sqrt{nR_q/R_d}$
• With the match threshold $T := D_1^2 / (D_2 - 2D_1)$, where $D_1$ is the sum of node degrees and $D_2$ the sum of squared node degrees: $q = c\sqrt{TR_d/R_q}$ and $d = c\sqrt{TR_q/R_d}$
• Cf. homogeneous case: $T = n \cdot \mathrm{deg} / (\mathrm{deg} - 2)$
• As heterogeneity increases, $T$ decreases
[Figure: rendezvous of query/datum pairs (Q1, D1), (Q1, D2), (Q2, D1), (Q2, D2) at nodes holding D1 or Q1 and D2 or Q2]
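The bubble-size formulas can be checked numerically. The sketch below solves $1 - r = e^{-c^2}$ for $c$, computes $T$ from a degree list, and applies the balanced-traffic sizes; the function names and the example degree distributions are hypothetical.

```python
import math

def certainty_factor(reliability):
    # From 1 - r = exp(-c^2):  c = sqrt(-ln(1 - r))
    return math.sqrt(-math.log(1.0 - reliability))

def match_threshold(degrees):
    # T := D1^2 / (D2 - 2*D1), D1 = sum of degrees, D2 = sum of squared degrees
    d1 = sum(degrees)
    d2 = sum(k * k for k in degrees)
    return d1 * d1 / (d2 - 2 * d1)

def bubble_sizes(c, threshold, rate_q, rate_d):
    # Balanced traffic q*R_q = d*R_d together with q*d = c^2 * T
    q = c * math.sqrt(threshold * rate_d / rate_q)
    d = c * math.sqrt(threshold * rate_q / rate_d)
    return q, d

homogeneous = [10] * 1000               # 1000 nodes of degree 10
heterogeneous = [5] * 500 + [15] * 500  # same total degree, more spread
c = certainty_factor(1 - math.exp(-4))  # r = 1 - e^{-4}, so c is (about) 2
T_hom = match_threshold(homogeneous)    # 1000 * 10 / (10 - 2) = 1250
T_het = match_threshold(heterogeneous)  # smaller than T_hom
q, d = bubble_sizes(c, T_hom, rate_q=1.0, rate_d=1.0)
```

With equal rates this gives $q = d = 2\sqrt{1250} \approx 70.7$ replicas each, and the heterogeneous network (same total degree) gets a lower threshold, as the slide states.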

Measurement Protocol
• It measures the number of query and datum replicas needed to ensure that for each (query, datum) pair there exists a rendezvous node
• In practice, it measures network-wide degree statistics over all nodes
• It piggy-backs on the periodic keep-alive messages the overlay uses to detect crashed neighbors
• A useful analogy is measuring a lake's volume
  • Each peer picks a random fish size and puts 1.0 fish into its region
  • When sending keep-alives, peer v mixes its lake region with those of its neighbors and itself
  • Bigger fish in a region consume all smaller fish
  • The biggest fish swims throughout the lake until only it remains
  • Every peer's ratio of water to fish then equals the sum being measured (the total water)
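The lake analogy can be simulated directly. Water is conserved by mixing, while only the biggest fish's 1.0 survives; once that fish has spread everywhere, water and fish are mixed by the same linear rule, so every peer's water/fish ratio converges to the total water. The sketch assumes synchronous rounds and equal-share mixing with a peer's closed neighborhood; the real protocol runs on asynchronous keep-alives, and `measure_sum` is a hypothetical name.

```python
import random

def measure_sum(adj, values, rounds, rng):
    """Gossip sketch of the lake analogy: returns every peer's water/fish ratio."""
    n = len(adj)
    water = [float(x) for x in values]           # the quantity being summed
    fish = [1.0] * n                             # every peer starts with 1.0 fish...
    size = [rng.random() for _ in range(n)]      # ...of a random size
    for _ in range(rounds):
        new_water = [0.0] * n
        new_fish = [0.0] * n
        new_size = [-1.0] * n                    # -1: no fish received yet
        for v in range(n):
            region = [v] + adj[v]                # mix with itself and its neighbors
            w_share = water[v] / len(region)
            f_share = fish[v] / len(region)
            for u in region:
                new_water[u] += w_share          # water is conserved
                if size[v] > new_size[u]:        # bigger fish eats what is there
                    new_size[u], new_fish[u] = size[v], f_share
                elif size[v] == new_size[u]:     # same fish: amounts add up
                    new_fish[u] += f_share
        water, fish, size = new_water, new_fish, new_size
    return [w / f for w, f in zip(water, fish)]

# Example: 10 peers on a ring; each peer's value could be, e.g., its degree.
ring = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
values = list(range(1, 11))                      # total is 55
estimates = measure_sum(ring, values, rounds=500, rng=random.Random(3))
```

Every peer ends up with the same estimate of the sum without knowing $n$ in advance, which is what makes the result usable for computing $T$ and the bubble sizes locally.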

Comparison of P2P search schemes

Massive Leave

Massive Crash

Critique
• Pros
  • A new topology that exploits heterogeneity and avoids heavy system-maintenance traffic
  • A new search method, BubbleCast, which hybridizes random walks with flooding
• Cons
  • Every new item must be replicated up to the required bubble size
  • There is no explicit replication management against churn
  • Trade-off between maintenance traffic and availability
  • When a crash occurs, availability drops temporarily
