Christos Gkantsidis, Milena Mihail, Amin Saberi Presented by Paul Bogdan February 28th, 2007

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” and “Random Walks in Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi Presented by Paul Bogdan February 28th, 2007

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi

Outline • Random Graph Models • Flooding and Normalization • Random Walks and Replication • Generalized Search Schemes • Experimental evaluation

Motivation • Flooding with a small time-to-live (TTL) performs well in regular graphs • Performance metric: number of messages exchanged per distinct response • Its performance degrades as the TTL increases and in irregular networks • Random walks perform better than flooding in terms of scalability and granularity • Hybrid and generalized search schemes: • random walks with lookahead, random walks with 1-step replication

Contribution • Random walks (RW) combined with shallow flooding perform well (with analytic justification) R1: In a random graph model with $O(n)$ nodes of constant degree and $O(n^{1/2})$ nodes of degree $O(n^{1/2})$, the expected time to discover $\Omega(n)$ nodes is $O(n^{1/2})$. R2: Random walks with lookahead 1 or with 1-step replication perform better when the degrees of the underlying topology are highly uneven. • Normalized Flooding (NF) R3: NF performs comparably to flooding in regular graphs. R4: NF with 1-step replication performs comparably to RW with 1-step replication. R5: Local information about the network (node degrees) yields a global benefit. • Generalized Search Schemes

Random Graph Models • Random regular graphs, $G_{n,d}$: a graph on $n$ nodes in which every node has degree $d$; the sum of degrees is $D = nd$. • Random graphs with supernodes, $G_{n,d,\alpha,\beta}$: for constants $\alpha$ and $\beta$, a graph with $\alpha n^{1/2}$ nodes of degree $\beta n^{1/2}$ (large vertices), all remaining nodes having degree $d$ (small vertices). To leading order, the sum of degrees is $D \approx (\alpha\beta + d)n$.
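To experiment with these two models, one option is the configuration model: give every node as many "stubs" as its degree and pair the stubs uniformly at random. The sketch below assumes that approach (the slides do not say how the graphs were generated) and tolerates the occasional self-loop or parallel edge it produces. The helper names `configuration_model`, `regular_degrees` and `supernode_degrees` are hypothetical.

```python
import math
import random
from collections import defaultdict


def configuration_model(degrees, seed=None):
    """Random graph with a prescribed degree sequence (configuration model).

    Node v gets degrees[v] "stubs"; the stubs are shuffled and paired.
    The result may contain a few self-loops or parallel edges.
    """
    if sum(degrees) % 2:
        raise ValueError("the sum of degrees must be even")
    rng = random.Random(seed)
    stubs = [v for v, deg in enumerate(degrees) for _ in range(deg)]
    rng.shuffle(stubs)
    adj = defaultdict(list)
    for u, v in zip(stubs[::2], stubs[1::2]):
        adj[u].append(v)
        adj[v].append(u)
    return {v: adj[v] for v in range(len(degrees))}


def regular_degrees(n, d):
    # G_{n,d}: every node has degree d, so D = n * d
    return [d] * n


def supernode_degrees(n, d, alpha, beta):
    # G_{n,d,alpha,beta}: alpha*sqrt(n) large nodes of degree beta*sqrt(n);
    # every remaining node is small, with degree d
    root = math.isqrt(n)
    num_large = int(alpha * root)
    return [int(beta * root)] * num_large + [d] * (n - num_large)


degs = supernode_degrees(10_000, d=4, alpha=1, beta=2)
print(sum(degs), (1 * 2 + 4) * 10_000)  # exact D vs. (alpha*beta + d) * n
```

For $n = 10{,}000$, $d = 4$, $\alpha = 1$, $\beta = 2$ the printed totals are 59600 and 60000: the exact sum is slightly below $(\alpha\beta + d)n$ because the $\alpha n^{1/2}$ large nodes take the place of that many small ones.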

Flooding and Normalization • Theorem 3.1: Consider a random regular graph $G_{n,d}$ and flooding from node $v$ with time-to-live $\tau$. Let $S$ be the set of distinct nodes queried by flooding, with $|S| \le |V|/2$. Claims: (1) (2) (3)

(1) • Proof:

(2) • Proof:

(3) • Proof:
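The quantities in Theorem 3.1 (distinct nodes queried and messages sent) can be measured directly with a small flooding simulator. This is a minimal sketch over an adjacency-list dict; the forwarding rule (forward only on first receipt, never back to the sender, and count duplicate deliveries as messages) is an assumed Gnutella-style rule, and `flood` is a hypothetical name.

```python
def flood(adj, source, ttl):
    """TTL-limited flooding from `source`.

    Assumed rule: a node forwards the query to all neighbors except the
    sender, and only the first time it receives it; duplicate deliveries
    still count as messages.
    Returns (distinct nodes reached, excluding the source; messages sent).
    """
    seen = {source}
    messages = 0
    frontier = [(source, None)]  # (node, node it received the query from)
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for nb in adj[node]:
                if nb == parent:
                    continue
                messages += 1
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return len(seen) - 1, messages


k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
print(flood(k4, 0, 2), flood(cycle6, 0, 3))
```

On the 4-clique, `flood(k4, 0, 2)` reaches all 3 other nodes but spends 9 messages, so most of them are duplicates; in the tree-like neighborhoods of a sparse random regular graph almost every message reaches a new node, so the ratio of messages to distinct responses stays close to 1.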

Flooding and Normalization • Theorem 3.2: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider flooding from a node $v$ of degree $d$ with time-to-live $\tau$. Claim: for some $\tau = O(\log\log n)$, the number of distinct responses is $\Omega(n)$. Proof: Consider flooding with $\tau = \log_{d-1}(c\log n) + 1$ and the set $S_{\tau-1}(v)$ of vertices visited with TTL $\tau - 1$. Assume that this set does not contain a large-degree vertex. As in $d$-regular graphs, at least $(d-1)^{\tau-1}$ edges leave this set. The probability that no vertex in $\Gamma(S_{\tau-1}(v))$ is a large vertex is bounded by $\left(\frac{d}{d+\alpha\beta}\right)^{(d-1)^{\tau-1}} = \left(\frac{d}{d+\alpha\beta}\right)^{c\log n}$, which vanishes polynomially in $n$, so with high probability a large vertex is reached within the first $O(\log\log n)$ steps.
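A quick numeric check of the proof of Theorem 3.2, assuming natural logarithms and the TTL $\tau = \log_{d-1}(c\log n) + 1$, chosen so that $(d-1)^{\tau-1} = c\log n$. Under that assumption the bound equals $n^{-c\ln((d+\alpha\beta)/d)}$, which is what the check compares against. The function names are illustrative only.

```python
import math


def flooding_ttl(n, d, c):
    # Assumed TTL: tau = log_{d-1}(c ln n) + 1, so that (d - 1)^(tau - 1) = c ln n
    return math.log(c * math.log(n), d - 1) + 1


def no_large_vertex_bound(n, d, alpha, beta, c):
    """Bound (d / (d + alpha*beta))^((d-1)^(tau-1)) on the probability that
    none of the (d-1)^(tau-1) edges leaving the flooded set reaches a large vertex."""
    tau = flooding_ttl(n, d, c)
    edges_out = (d - 1) ** (tau - 1)
    return tau, (d / (d + alpha * beta)) ** edges_out


n, d, alpha, beta, c = 10**6, 4, 1, 1, 2
tau, bound = no_large_vertex_bound(n, d, alpha, beta, c)
print(round(tau, 2), bound)
```

With $n = 10^6$, $d = 4$, $\alpha = \beta = 1$, $c = 2$, the TTL is only about 4.02, while the probability of missing every large vertex is about 0.002.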

Flooding and Normalization • Theorem 3.3: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider normalized flooding from node $v$ with TTL $\tau$. Then the number of distinct responses is $\Omega((d-1)^{\tau-1})$ and the number of messages per response is $O(1)$. Proof: By Theorem 3.1, the number of minigroups seen is $(d-1)^{\tau-1}$, so the expected number of small vertices seen is $Q = \frac{d\,(d-1)^{\tau-1}}{d+\alpha\beta}$. Let $X_i$, $i = 1, \dots, N$, be independent random variables with $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$. By a Chernoff bound, the probability that fewer than $Q/2$ small vertices are seen is vanishingly small.
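Theorem 3.3 refers to normalized flooding without restating its forwarding rule on this slide. The sketch below assumes the common formulation in which every node forwards the query to at most $k$ randomly chosen neighbors other than the sender, with $k$ the minimum degree of the graph; in a $d$-regular graph this is exactly standard flooding. The function name is hypothetical.

```python
import random


def normalized_flood(adj, source, ttl, seed=None):
    """Normalized flooding (NF) under an assumed rule: each node forwards the
    query, on first receipt only, to at most k = (minimum degree) randomly
    chosen neighbors other than the sender.
    Returns (distinct nodes reached, excluding the source; messages sent)."""
    rng = random.Random(seed)
    k = min(len(nbrs) for nbrs in adj.values())
    seen = {source}
    messages = 0
    frontier = [(source, None)]
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            targets = [nb for nb in adj[node] if nb != parent]
            if len(targets) > k:  # high-degree node: cap its fan-out at k
                targets = rng.sample(targets, k)
            for nb in targets:
                messages += 1
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append((nb, node))
        frontier = next_frontier
    return len(seen) - 1, messages


k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
star = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}
print(normalized_flood(k4, 0, 2, seed=0), normalized_flood(star, 1, 2, seed=0))
```

On the 5-leaf star queried from a leaf with TTL 2, standard flooding would reach all 5 other nodes with 5 messages, whereas this NF sketch reaches 2 nodes with 2 messages, because the hub's fan-out is capped at the minimum degree 1.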

Random Walks and Replication • Random walk with lookahead: • a random walk that performs shallow flooding at each step • RW with lookahead 1 visits $\Omega(n)$ nodes with response time $O(n^{1/2})$ • Theorem 4.2: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider a random walk from a node $v$. Then, in the 1-step replication scenario, the expected number of messages and the response time to obtain distinct responses is
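The three random-walk variants differ only in what a single step covers and what it costs. This sketch assumes one message per walk step, one extra message per neighbor for lookahead 1, and that 1-step replication lets a node answer for its neighbors' content at no extra message cost; `random_walk_search` and its `mode` values are hypothetical.

```python
import random


def random_walk_search(adj, source, steps, mode="plain", seed=None):
    """Random-walk search from `source` for a fixed number of steps.

    mode="plain":       only the nodes the walker visits are searched.
    mode="lookahead":   each visited node also queries all its neighbors
                        (lookahead 1), one extra message per neighbor.
    mode="replication": each node is assumed to index its neighbors'
                        content (1-step replication), so neighbors are
                        covered at no extra message cost.
    Returns (distinct nodes covered, excluding the source; messages sent).
    """
    rng = random.Random(seed)
    covered = {source}
    messages = 0
    node = source
    for _ in range(steps):
        node = rng.choice(adj[node])
        messages += 1
        covered.add(node)
        if mode in ("lookahead", "replication"):
            covered.update(adj[node])
            if mode == "lookahead":
                messages += len(adj[node])
    return len(covered) - 1, messages


star = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}
for mode in ("plain", "lookahead", "replication"):
    print(mode, random_walk_search(star, 1, 1, mode=mode, seed=0))
```

On the star queried from a leaf, the single step lands on the hub: the plain walk covers 1 node for 1 message, lookahead covers 5 nodes for 6 messages, and 1-step replication covers the same 5 nodes for 1 message. This toy case illustrates why a large degree discrepancy favors these variants (R2).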

Theorem 4.3: Let $G_{n,d,\alpha,\beta}$ be a random graph with supernodes, and consider normalized flooding from $v$ with TTL $\tau \approx \frac{\log n}{2\log(d-1)}$. Then, in the 1-step replication scenario, the number of distinct responses is at least and the number of messages is at most Proof: The number of minigroups seen is $(d-1)^{\tau-1}$, and by Chernoff bounds there will be minigroups corresponding to large vertices.
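The TTL in Theorem 4.3 is chosen so that $(d-1)^{\tau} \approx n^{1/2}$, i.e. the number of minigroups $(d-1)^{\tau-1}$ is about $n^{1/2}/(d-1)$. A short arithmetic check (the base of the logarithm cancels in the ratio); the function name is hypothetical.

```python
import math


def nf_replication_ttl(n, d):
    # tau ~ log n / (2 log(d - 1)), i.e. (d - 1)^tau ~ sqrt(n)
    return math.log(n) / (2 * math.log(d - 1))


n, d = 10**6, 4
tau = nf_replication_ttl(n, d)
minigroups = (d - 1) ** (tau - 1)  # minigroups seen: (d - 1)^(tau - 1)
print(round(tau, 2), round(minigroups, 1))
```

For $n = 10^6$ and $d = 4$ this gives $\tau \approx 6.29$ and about 333 minigroups, i.e. $\sqrt{n}/(d-1)$.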

Generalized Search Schemes • Search procedure: • A node of degree $d$ initiates a search with a budget $k$, the number of messages that may be propagated in the network • The node assigns amounts $k_1, k_2, \dots, k_d$ to its $d$ neighbors such that $k_1 + k_2 + \dots + k_d = k$ • The initiating node forwards the message to each neighbor $i$ with budget $k_i$ (if $k_i = 0$, the message is not sent) • Each neighbor $i$ reduces the budget by 1 and repeats the process as long as the budget is greater than 0 • A node that receives the message a second time, from another neighbor, also forwards it with the corresponding budget • Random walks and flooding arise as special cases
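The budget-splitting procedure can be sketched as follows. The split rule is an assumption, since the slide leaves the choice of $k_1, \dots, k_d$ open: the budget is spread as evenly as possible over `width` randomly chosen neighbors, so `width=1` reproduces a random walk and `width` equal to the degree gives flooding-like behavior. `split_budget` and `budget_search` are hypothetical names.

```python
import random
from collections import deque


def split_budget(budget, neighbors, rng, width):
    """Hypothetical split rule: divide the budget as evenly as possible among
    `width` randomly chosen neighbors (k_i = 0 for all the others).
    Returns a list of (neighbor, k_i) pairs."""
    chosen = rng.sample(neighbors, min(width, len(neighbors)))
    q, r = divmod(budget, len(chosen))
    return [(nb, q + (1 if i < r else 0)) for i, nb in enumerate(chosen)]


def budget_search(adj, source, budget, width, seed=None):
    """Generalized budget-based search. Each transmitted message carries a
    budget; the receiver spends one unit and splits the remainder among its
    neighbors. Repeated receipts are forwarded too, as in the procedure above.
    Returns (distinct nodes reached, excluding the source; messages sent)."""
    rng = random.Random(seed)
    seen = {source}
    messages = 0
    pending = deque([(source, budget)])
    while pending:
        node, b = pending.popleft()
        if b <= 0:
            continue
        for nb, share in split_budget(b, adj[node], rng, width):
            if share == 0:
                continue  # k_i = 0: the message is not transmitted
            messages += 1
            seen.add(nb)
            pending.append((nb, share - 1))  # the neighbor spends one unit
    return len(seen) - 1, messages


k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
print(budget_search(k4, 0, 10, width=3, seed=1))  # second value is always 10
```

Because each transmitted message consumes exactly one unit of budget, the total number of messages always equals the initial budget $k$, which is what makes the budget a fine-grained cost control.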

Experimental Evaluation • Methodology • Performance Metrics • Median and Mean number of distinct peers discovered (hits) • Minimum, Maximum, Standard Deviation of the number of hits • Number of messages • Granularity of number of messages • Response time • Topologies • Random d-Regular Graphs • Power Law Graphs • Bimodal topologies • Clustered topologies
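The hit statistics listed above are straightforward to compute from per-query results. A minimal sketch, assuming the population standard deviation and reporting messages per distinct hit as the efficiency measure from the motivation slide; the function name is hypothetical.

```python
import statistics


def summarize_searches(hits, messages):
    """Summary statistics over repeated searches.

    hits[j]: distinct peers discovered by search j
    messages[j]: messages used by search j
    """
    return {
        "mean": statistics.mean(hits),
        "median": statistics.median(hits),
        "min": min(hits),
        "max": max(hits),
        "std": statistics.pstdev(hits),  # population standard deviation (assumed)
        "messages_per_hit": sum(messages) / max(1, sum(hits)),
    }


summary = summarize_searches([2, 4, 4, 4, 5, 5, 7, 9], [10] * 8)
print(summary)
```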

Normalized Flooding (NF) • Mean number of unique peers discovered as a function of the initial TTL • NF and Standard Flooding behave similarly in Regular Graphs • NF controls the number of messages and provides higher efficiency

Normalized Flooding (NF) • The number of unique peers increases exponentially with the TTL in the NF case • The number of peers increases faster than exponentially with the TTL in topologies with high-degree nodes

Random Walk with 1-step replication

Random Walk with LookAhead (RWLA) • RWLA performance is similar to long RW without lookahead (in terms of unique peers discovered) • RWLA response time is much smaller compared to standard RW

Edge Criticality & Searching with weights • Generalized Searching performs similarly to Standard Flooding in regular graphs • Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used.

Conclusions • Normalized Flooding (NF) could replace standard flooding in irregular graphs • RW with 1-step replication performs better than RW and NF in irregular graphs • Open for improvement: • generalized schemes (analytic investigation) • quantifying directional flooding

“Random Walks in Peer-to-Peer (P2P) Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi

Outline • Motivation • Statistical Estimation and Random Walks (RW) • Searching • Methodology and the Importance of Topologies • Construction and Summary

Motivation • Random walks (RW) have been proposed for building search and topology-maintenance protocols in P2P networks • RW improve search performance compared to flooding (Cao et al., 2002) • A random-walk approach to constructing and maintaining unstructured topologies yields good connectivity properties (i.e., constant degree, constant expansion) • Claim: the RW approach is a good candidate • to simulate uniform sampling • the number of simulation steps required can be as low as the number of samples needed by independent uniform sampling • Searching and Overlay Topology Construction • RW search performs better than flooding for the same number of messages, and in clustered and slowly changing topologies • Construction of P2P networks by random walks

Statistical Estimation & Random Walks • Coupon collection and Chernoff bounds • $n$ types of coupons; at each step one coupon is drawn uniformly at random • $T_n$: the time by which coupons of all $n$ types have been drawn • $T_{\alpha n}$: the time by which $\alpha n$ distinct types have been drawn, $0 < \alpha < 1$ • $X_1, \dots, X_k$: independent Bernoulli trials with $P[X_i = 1] = p_i$ and $P[X_i = 0] = 1 - p_i$ • $p$: the probability that a randomly drawn object has a particular property • the probability that the property is observed substantially less often than its frequency in the search space predicts, and the quality of the estimator $X/k$, are bounded by
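The coupon-collector quantities have closed forms: $E[T_m] = \sum_{i=0}^{m-1} \frac{n}{n-i}$ for $m$ distinct types, so $E[T_n] = nH_n \approx n\ln n$ and $E[T_{\alpha n}] \approx n\ln\frac{1}{1-\alpha}$. The sketch below computes them, checks one by simulation, and evaluates the standard multiplicative Chernoff lower-tail bound for equal $p_i = p$; that particular form of the bound is assumed, since the slide's own formula is not shown. All function names are hypothetical.

```python
import math
import random


def expected_draws(n, m):
    """Expected number of uniform draws until m distinct coupon types (out of n)
    have been seen: sum_{i=0}^{m-1} n / (n - i). For m = n this is n * H_n."""
    return sum(n / (n - i) for i in range(m))


def simulate_draws(n, m, rng):
    # Draw uniform coupons until m distinct types have appeared.
    seen, draws = set(), 0
    while len(seen) < m:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def chernoff_lower_tail(k, p, delta):
    # Standard multiplicative Chernoff bound for X = sum of k independent
    # Bernoulli(p) trials: P[X <= (1 - delta) p k] <= exp(-delta^2 p k / 2)
    return math.exp(-delta ** 2 * p * k / 2)


rng = random.Random(7)
avg = sum(simulate_draws(50, 50, rng) for _ in range(200)) / 200
print(expected_draws(50, 50), avg)
print(expected_draws(1000, 500), 1000 * math.log(2))  # alpha = 1/2
```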

Statistical Estimation & Random Walks • Random Walks (RW), Convergence and Cover Time • $G = (V,E)$ an undirected graph, $|V| = n$, $d_i$ the degree of vertex $i$ • $A_{ij}$ the adjacency matrix, $P$ the transition matrix, which satisfies $P_{ij} = A_{ij}/d_i$ • $f: V \to \{0,1\}$, which satisfies • Convergence rate: the rate at which the RW approaches its stationary distribution • Cover time: the time by which all nodes have been visited • Trajectory sample average: the rate at which the value of $f$, averaged over successive vertices of the RW trajectory, approaches $p$

Statistical Estimation & Random Walks • The convergence rate is related to the second eigenvalue of $P$ (1) • $y_t$: the vertex visited by the RW at time $t$ • Cover time (2) • Trajectory sample average (3) (1): [11], (2): [12, 13], (3): [3, 4, 5, 6]
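The convergence-rate result ties the walk to the second eigenvalue of $P$. For small graphs it can be estimated without external libraries: $P = D^{-1}A$ has the same spectrum as the symmetric matrix $D^{-1/2}AD^{-1/2}$, whose top eigenvector is proportional to $\sqrt{d_i}$. The sketch assumes a connected graph with no isolated nodes and estimates the largest remaining eigenvalue in absolute value by power iteration; the function name is hypothetical.

```python
import math
import random


def second_eigenvalue(adj, iters=500, seed=0):
    """Estimate the second-largest eigenvalue magnitude of P = D^{-1} A.

    P is similar to N = D^{-1/2} A D^{-1/2}, which is symmetric and whose top
    eigenvector is proportional to sqrt(degree). Power iteration on N with that
    eigenvector projected out converges to the next-largest |eigenvalue|.
    """
    nodes = list(adj)
    index = {v: i for i, v in enumerate(nodes)}
    deg = [len(adj[v]) for v in nodes]
    top = [math.sqrt(dv) for dv in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    def apply_n(x):
        # y = N x, with N_ij = A_ij / sqrt(d_i d_j)
        y = [0.0] * len(nodes)
        for i, v in enumerate(nodes):
            for u in adj[v]:
                j = index[u]
                y[i] += x[j] / math.sqrt(deg[i] * deg[j])
        return y

    def project_out_top(x):
        dot = sum(a * b for a, b in zip(x, top))
        return [a - dot * b for a, b in zip(x, top)]

    def normalize(x):
        norm = math.sqrt(sum(a * a for a in x))
        return [a / norm for a in x], norm

    rng = random.Random(seed)
    x, _ = normalize(project_out_top([rng.gauss(0.0, 1.0) for _ in nodes]))
    estimate = 0.0
    for _ in range(iters):
        x, estimate = normalize(project_out_top(apply_n(x)))
    return estimate


k5 = {v: [u for u in range(5) if u != v] for v in range(5)}
cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
print(second_eigenvalue(k5), second_eigenvalue(cycle6))
```

For the complete graph $K_5$ this gives $1/4$ (fast mixing), while for the 6-cycle it gives 1: the cycle is bipartite, so the plain walk never converges, which is why lazy walks are often used in such analyses.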

Statistical Estimation & Random Walks • Second Eigenvalue, Expansion and Conductance • $S \subseteq V$; $C(S)$ the cutset of $S$ (edges with one endpoint in $S$ and the other in $V \setminus S$); $\mathrm{vol}(S)$ the sum of the degrees of the vertices in $S$ • Expansion • Conductance • Known bound [11, 14, 15, 16, 17, 18, 19]
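Using the standard definitions (expansion $\min |C(S)|/|S|$ over $|S| \le n/2$, conductance $\min |C(S)|/\mathrm{vol}(S)$ over $\mathrm{vol}(S) \le \mathrm{vol}(V)/2$), both quantities can be brute-forced on toy graphs. These definitions are assumed, since the slide's formulas are not shown. The standard Cheeger inequality relates the conductance $\Phi$ to the second-largest eigenvalue $\lambda_2$ of $P$ via $\Phi^2/2 \le 1 - \lambda_2 \le 2\Phi$; it may or may not be the exact bound the slide cites. Function names are hypothetical.

```python
import math
from itertools import combinations


def cut_size(adj, s):
    # |C(S)|: edges with one endpoint in S and the other in V \ S
    return sum(1 for v in s for u in adj[v] if u not in s)


def volume(adj, s):
    # vol(S): sum of the degrees of the vertices in S
    return sum(len(adj[v]) for v in s)


def expansion_and_conductance(adj):
    """Brute force over all proper nonempty subsets S (tiny graphs only).
    Expansion:   min |C(S)| / |S|     over |S| <= n/2
    Conductance: min |C(S)| / vol(S)  over vol(S) <= vol(V)/2
    """
    nodes = list(adj)
    n = len(nodes)
    total = volume(adj, nodes)
    best_expansion = best_conductance = math.inf
    for size in range(1, n):
        for combo in combinations(nodes, size):
            s = set(combo)
            cut = cut_size(adj, s)
            if size <= n / 2:
                best_expansion = min(best_expansion, cut / size)
            if volume(adj, s) <= total / 2:
                best_conductance = min(best_conductance, cut / volume(adj, s))
    return best_expansion, best_conductance


k4 = {v: [u for u in range(4) if u != v] for v in range(4)}
cycle6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
print(expansion_and_conductance(k4), expansion_and_conductance(cycle6))
```

For $K_4$ this returns expansion 2 and conductance $2/3$; for the 6-cycle, $2/3$ and $1/3$. The exponential number of subsets limits this approach to tiny graphs.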

Searching • Performance metrics for flooding and RW • average number of distinct copies of an item located by the search • number of messages used by the search algorithm • RW performs better than flooding for • multiple search requests for the same item in a slowly changing topology • peer clustering (see [20, 21, 22, 23, 24, 25] for details) • Searching analysis • Methodology • Flat topologies with uniformly distributed content • Topologies with peer clustering • Re-issuing the same query • Real topologies

Searching - Methodology • Performance metrics • mean number of distinct copies found (Mean) • spread around the mean (Std) and the failure probability • Cost • number of messages or queries used during the search • Peer-to-peer topologies (≈ 1 million nodes) • flat regular expanders, two-tier topologies with clustering, power-law graphs, samples from real topologies • Dynamic topologies • rewiring • Content placement • content clustering affects search performance
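A scaled-down version of the flat-topology comparison can be sketched as follows: copies of an item are placed uniformly at random, flooding runs with a fixed TTL, and a single random walker gets exactly as many messages as flooding used. Assumptions: the test graph is a union of two random Hamilton cycles (a 4-regular multigraph that is an expander with high probability) rather than the roughly 1-million-node topologies used in the study, and all function names are hypothetical.

```python
import random
import statistics


def random_cycles_graph(n, cycles, rng):
    # Union of random Hamilton cycles: a (2 * cycles)-regular multigraph
    adj = {v: [] for v in range(n)}
    for _ in range(cycles):
        order = list(range(n))
        rng.shuffle(order)
        for a, b in zip(order, order[1:] + order[:1]):
            adj[a].append(b)
            adj[b].append(a)
    return adj


def flood_reach(adj, source, ttl):
    # TTL-limited flooding (forward on first receipt, never back to the sender).
    # Returns the set of reached nodes and the number of messages sent.
    seen, frontier, messages = {source}, [(source, None)], 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for nb in adj[node]:
                if nb != parent:
                    messages += 1
                    if nb not in seen:
                        seen.add(nb)
                        next_frontier.append((nb, node))
        frontier = next_frontier
    return seen, messages


def walk_reach(adj, source, steps, rng):
    # A single random walker taking `steps` steps (one message per step).
    seen, node = {source}, source
    for _ in range(steps):
        node = rng.choice(adj[node])
        seen.add(node)
    return seen


def compare_search(adj, ttl, copies, trials, seed=0):
    """Flooding vs. a random walk with the same number of messages, with
    `copies` replicas placed uniformly at random (flat setting).
    Returns (mean hits, std of hits, failure probability) for each scheme."""
    rng = random.Random(seed)
    nodes = list(adj)
    flood_hits, walk_hits = [], []
    for _ in range(trials):
        holders = set(rng.sample(nodes, copies))
        source = rng.choice(nodes)
        reached, messages = flood_reach(adj, source, ttl)
        flood_hits.append(len(reached & holders))
        walk_hits.append(len(walk_reach(adj, source, messages, rng) & holders))

    def summary(hits):
        failure = sum(h == 0 for h in hits) / len(hits)
        return statistics.mean(hits), statistics.pstdev(hits), failure

    return summary(flood_hits), summary(walk_hits)


g = random_cycles_graph(2000, 2, random.Random(3))
flood_stats, walk_stats = compare_search(g, ttl=4, copies=50, trials=100, seed=1)
print("flooding:", flood_stats)
print("random walk:", walk_stats)
```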

Searching – Flat Topologies • Experiment: • one request in a network of 500K peers • Mean hits, minimum number of hits and Std are similar for Flooding and RW • the entire distribution of hits is similar for Flooding and RW

Searching - Topologies with Peer Clustering • The cluster topology consists of • 5 flat regular graphs of 40K nodes each; 1000 nodes picked at random from each of them form an additional flat regular graph • The number of hits for RW is more concentrated around the mean than for Flooding

Searching - Reissuing the Same Query • Experiment setup: repeat the following procedure 4 times • each peer sends a request and waits for the response • between requests, 2% of the links are rewired • each peer initiates a new search • RW performs better than Flooding • in mean hits and failure probability
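The slides do not say how the 2% of links are rewired between requests. One common choice, assumed here, is degree-preserving double edge swaps, which change the wiring while leaving every peer's degree intact; `rewire` is a hypothetical helper.

```python
import random
from collections import Counter


def rewire(edges, fraction, rng):
    """Rewire roughly `fraction` of the links.

    Assumed model: degree-preserving double edge swaps
    (a, b), (c, d) -> (a, d), (c, b), rejecting swaps that would create
    self-loops or parallel edges. Each accepted swap rewires two links.
    """
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    target = int(fraction * len(edges) / 2)
    done = attempts = 0
    while done < target and attempts < 100 * target:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


n = 1000
ring = [(v, (v + k) % n) for v in range(n) for k in (1, 2)]  # 4-regular, 2000 links
rewired = rewire(ring, 0.02, random.Random(0))
print(Counter(x for e in rewired for x in e) == Counter(x for e in ring for x in e))
```

Each accepted swap rewires two links, so on this 2000-link test graph 20 swaps change about 2% of the links.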

Searching - Reissuing the Same Query • The performance of successive searches depends • on the number of topology changes between consecutive searches • The performance of Flooding improves as the rate of topology changes increases • RW performance remains the same for small variations

Searching – Real Topologies • The number of hits for RW is more concentrated around the mean than for Flooding • P2P topologies have good expansion properties

Construction • P2P network construction must deal with: • peers arriving and leaving the network dynamically • strong and weak decentralization • low network overhead per addition or deletion

Baseline Construction of Expander Graphs • ABASE (undirected graph): • each of the $n$ vertices chooses $d$ vertices at random • total number of edges $= nd$, and the expected vertex degree is $2d$ • Theorem 4.1: Let $G(V,E)$ be a graph constructed by ABASE. Then $G$ is an expander with high probability, and for a positive constant $\alpha < 1$
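ABASE as described above fits in a few lines. Whether a vertex may pick itself, or the same vertex twice, is not specified; this sketch assumes independent uniform choices with replacement, so the result is a multigraph. The name `abase` is hypothetical.

```python
import random
from collections import Counter


def abase(n, d, seed=None):
    """ABASE sketch: each of the n vertices picks d vertices uniformly at
    random (with replacement, possibly itself; an assumption) and links to
    them. The result is an undirected multigraph with exactly n*d edges,
    so the average degree is 2d."""
    rng = random.Random(seed)
    return [(v, rng.randrange(n)) for v in range(n) for _ in range(d)]


edges = abase(1000, 3, seed=0)
deg = Counter(x for e in edges for x in e)  # a self-loop adds 2 to its vertex
print(len(edges), sum(deg.values()) / 1000)
```

The printed values are 3000 edges and average degree 6.0, i.e. $nd$ edges and degree $2d$ for $n = 1000$, $d = 3$.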

Baseline Construction of Expander Graphs with Constant Overhead in Random Bits • A’BASE construction algorithm: • start a RW at a random vertex of $H$, a constant-degree expander graph • whenever ABASE needs a random number, it is taken from the RW on $H$ • Theorem 4.2: Let $G(V,E)$ be a graph constructed by A’BASE. There are positive constants $\alpha$ and $0 < \beta < 0.5$ such that every subset $S$ with $\beta|V| \le |S| \le 0.5|V|$ has cutset expansion $\alpha$ almost surely.

Distributed Construction of Expanders with Constant Overhead on Network Resources • A’H construction • $d$ daemons, one for each Hamilton cycle • a newly arriving node contacts the daemon associated with the $i$-th Hamilton cycle • after $c$ steps, it attaches between the peer that currently hosts daemon $i$ and one of its neighbors in cycle $i$
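The A’H idea can be sketched as a union of $d$ Hamilton cycles stored as successor/predecessor maps, with one daemon per cycle. Assumptions not taken from the slides: each daemon takes $c$ random-walk steps on the whole overlay, the network starts as a 3-node ring, and the newcomer is inserted between the daemon's host and its successor in cycle $i$. `CycleOverlay` is a hypothetical class.

```python
import random


class CycleOverlay:
    """Sketch of an A'H-style overlay: the union of d Hamilton cycles with
    one daemon per cycle (see the assumptions stated above)."""

    def __init__(self, d, c, seed=None):
        self.rng = random.Random(seed)
        self.d, self.c = d, c
        self.succ, self.pred = [], []  # succ[i][v]: node after v in cycle i
        for _ in range(d):
            order = [0, 1, 2]
            self.rng.shuffle(order)
            succ = {order[k]: order[(k + 1) % 3] for k in range(3)}
            self.succ.append(succ)
            self.pred.append({nxt: v for v, nxt in succ.items()})
        self.daemon = [0] * d  # peer currently hosting daemon i
        self.next_id = 3

    def neighbors(self, v):
        # 2d entries: successor and predecessor of v in every cycle
        return [s[v] for s in self.succ] + [p[v] for p in self.pred]

    def join(self):
        new = self.next_id
        self.next_id += 1
        # Move every daemon c random-walk steps before touching the cycles,
        # so no walk can reach the half-inserted node.
        for i in range(self.d):
            host = self.daemon[i]
            for _ in range(self.c):
                host = self.rng.choice(self.neighbors(host))
            self.daemon[i] = host
        # Splice the new node between each host and its successor in cycle i.
        for i, host in enumerate(self.daemon):
            after = self.succ[i][host]
            self.succ[i][host], self.succ[i][new] = new, after
            self.pred[i][after], self.pred[i][new] = new, host
        return new

    def is_hamilton_cycle(self, i):
        start = next(iter(self.succ[i]))
        v, length = self.succ[i][start], 1
        while v != start:
            v, length = self.succ[i][v], length + 1
        return length == len(self.succ[i])


overlay = CycleOverlay(d=3, c=2, seed=1)
for _ in range(50):
    overlay.join()
print(all(overlay.is_hamilton_cycle(i) for i in range(3)))
```

Every join keeps each cycle Hamiltonian and every node at degree $2d$, and costs $cd$ walk steps plus $O(d)$ pointer updates, independent of the network size.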

Distributed Construction of Expanders with Constant Overhead on Network Resources • A’M construction • $d$ daemons, one for each Hamilton cycle • a new arrival consists of two nodes, X and Y, which contact the central server to discover the locations of the $d$ daemons • X becomes a neighbor of daemon $i$, and Y a neighbor of the daemon's initial neighbor

Summary • For searching • Random walks (RW) are superior to flooding • For construction • RW add new peers with constant overhead • Open problems • A strongly decentralized construction algorithm • Can we better handle deletions and the expansion of small sets? • How do P2P network parameters (e.g., capacities) affect the performance of RW?
