Study on Network Size Estimation Schemes for Peer-to-Peer Networks 2008/02/19 Hosik Cho firstname.lastname@example.org
Some Questions • How many people are in this room? • Why do you think so? • How many people are on this campus? • Can you count them all? • How many nodes are in a P2P network spanning the world?
Contents • Peer-to-peer networks • Network size estimation • Estimation methods • Unstructured P2P • Structured P2P • Conclusion
P2P networks • A peer-to-peer overlay network connects peers logically on top of IP. • Unstructured P2P: Gnutella, Freenet • Structured P2P: Chord, CAN, Pastry, … • P2P applications • File sharing systems (Kazaa, Gnutella) • Video over IP (CoolStreaming) • Voice over IP (Skype)
P2P networks • Characteristics • Scalable • Self-organizing • Resilient to failure • Fully decentralized • Because there is no central point, monitoring the system and obtaining global statistics become much more complex.
Network size estimation • Knowing the network size (N) helps with: • Load balancing • Restricted broadcasting • Setting network parameters • For unstructured P2P networks, most approaches are based on broadcasting. • For structured P2P networks, the size can be inferred directly from the density of identifiers.
Related Works • Unstructured P2P • Sample & Collide • Hops Sampling • Gossip-based aggregation • Structured P2P • Token passing • Neighbor sampling • Finger sampling
Sample&Collide (1) • Based on the “birthday paradox”: in a group of 23 people, the probability that at least two share a birthday exceeds 50%. • The initiator samples nodes uniformly at random until a sample returns a node that has already been selected. • The number of samples X needed is on the order of $\sqrt{2N}$ (more precisely, $E[X^2] \approx 2N$). • The system size is therefore estimated as $\hat{N} = X^2/2$.
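A minimal simulation of the estimator, assuming an ideal oracle that returns uniformly random node IDs (in a real overlay, the random walk on the next slide plays that role). The true size `n` is used only to drive that simulated sampler. A single $X^2/2$ has a standard deviation of roughly N, so the sketch averages many independent runs. The averaging and the names `samples_until_collision` and `sample_and_collide_estimate` are illustrative assumptions, not part of the described scheme.

```python
import random


def samples_until_collision(n: int, rng: random.Random) -> int:
    """Draw node IDs uniformly from range(n) until one repeats; return the draw count X."""
    seen: set[int] = set()
    while True:
        node = rng.randrange(n)
        if node in seen:
            return len(seen) + 1  # all distinct draws plus the repeated one
        seen.add(node)


def sample_and_collide_estimate(n: int, trials: int, seed: int = 1) -> float:
    """Average the single-run estimate X**2 / 2 over independent trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = samples_until_collision(n, rng)
        total += x * x / 2
    return total / trials


if __name__ == "__main__":
    # Hypothetical network of 10,000 nodes; the estimate should land near 10,000.
    print(round(sample_and_collide_estimate(10_000, trials=2_000)))
```

Averaging 2,000 runs on a 10,000-node network typically lands within a few percent of the true size, and each run costs only on the order of $\sqrt{N}$ samples.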
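Sample&Collide (2) • Samples are drawn with a continuous-time random walk. • The initiator sets a timer T > 0 and sends the message to a neighbor. • Each node i that receives the message picks U uniformly at random in (0, 1) and decrements T by $-\log(U)/d_i$, where $d_i$ is the node's degree. • If T > 0, it forwards the message to a neighbor chosen uniformly at random. • If T ≤ 0, it returns its ID to the initiator as the sample. • Dividing by $d_i$ offsets the bias of a plain random walk toward high-degree nodes, so a long walk ends at each node with roughly equal probability.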
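A sketch of the random-walk sampler on a hypothetical six-node overlay (`GRAPH`, `ctrw_sample`, and the timer value are assumptions for illustration). A plain random walk would return node 0 (degree 4) about four times as often as node 5 (degree 1); the $-\log(U)/d_i$ holding time removes that bias.

```python
import math
import random
from collections import Counter

# Hypothetical small overlay with uneven degrees (node 0 has 4 links, node 5 has 1).
GRAPH: dict[int, list[int]] = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0, 4],
    4: [0, 3, 5],
    5: [4],
}


def ctrw_sample(graph: dict[int, list[int]], start: int, timer: float,
                rng: random.Random) -> int:
    """Continuous-time random walk: return the node at which the timer T runs out."""
    node = rng.choice(graph[start])           # initiator hands the message to a neighbor
    t = timer
    while True:
        u = 1.0 - rng.random()                # U uniform in (0, 1]; avoids log(0)
        t -= -math.log(u) / len(graph[node])  # decrement T by -log(U)/d_i
        if t <= 0:
            return node                       # T <= 0: this node is the sample
        node = rng.choice(graph[node])        # T > 0: forward to a random neighbor


if __name__ == "__main__":
    rng = random.Random(7)
    counts = Counter(ctrw_sample(GRAPH, 0, 15.0, rng) for _ in range(6_000))
    print(sorted(counts.items()))             # each node should get roughly 1,000 hits
```

With T = 15, each of the six nodes should be returned close to 1,000 times out of 6,000. T must be long compared with the walk's mixing time; a timer that is too short leaves the sample biased toward the initiator's neighborhood.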
HopsSampling (1) • A probabilistic polling approach • An initiator spreads messages in the network and estimates the system size from the replies it receives. • If hopCount < minHopsReporting, a response is sent with probability 1. • Otherwise, the response is sent with probability $1/2^{(\mathrm{hopCount} - \mathrm{minHopsReporting})}$. • If minHopsReporting = 2, only 25% of nodes at distance 4 will report back.
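HopsSampling (2) • The initiator sets hopCount = 0 and sends the message to its neighbors; hopCount grows by one at each hop. • A node with hopCount < minHopsReporting always sends a response. • Otherwise, it sends a response with probability $1/2^{(\mathrm{hopCount} - \mathrm{minHopsReporting})}$.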
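A sketch of one polling round on a hypothetical overlay. BFS hop distances stand in for the hopCount each node records as the message spreads, and the initiator is assumed to count each reply from a node at hopCount h as $2^{\max(0,\, h - \mathrm{minHopsReporting})}$ nodes, the inverse of its reply probability, which makes the total an unbiased count. The overlay builder and function names are illustrative.

```python
import random
from collections import deque


def build_overlay(n: int, extra_links: int, rng: random.Random) -> list[list[int]]:
    """Hypothetical test overlay: a ring (keeps it connected) plus random extra links."""
    adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
    for i in range(n):
        for _ in range(extra_links):
            j = rng.randrange(n)
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return [sorted(s) for s in adj]


def hop_counts(adj: list[list[int]], initiator: int) -> list[int]:
    """Hop distance of each node from the initiator (BFS stands in for the spread)."""
    dist = [-1] * len(adj)
    dist[initiator] = 0
    queue = deque([initiator])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hops_sampling_estimate(adj: list[list[int]], initiator: int,
                           min_hops_reporting: int, rng: random.Random) -> int:
    """One polling round; each reply is weighted by 1 / (its reply probability)."""
    estimate = 0
    for hop in hop_counts(adj, initiator):
        excess = max(0, hop - min_hops_reporting)
        if rng.random() < 1 / 2 ** excess:  # reply with prob. 1 or 1/2^(hop - min)
            estimate += 2 ** excess          # one reply stands for 2^excess nodes
    return estimate


if __name__ == "__main__":
    rng = random.Random(3)
    overlay = build_overlay(2_000, 3, rng)
    runs = [hops_sampling_estimate(overlay, 0, 2, rng) for _ in range(20)]
    print(sum(runs) / len(runs))             # should be close to 2,000
```

Setting `min_hops_reporting` above the overlay's diameter makes every node reply and returns the exact size at the cost of one reply per node; lowering it trades accuracy for fewer replies.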
Gossip-based (1) • Epidemic-based approach • If exactly one node of the system holds the value 1 and all other nodes hold 0, the average is 1/N. • The initiator takes the value 1 and starts gossiping. • Nodes reached by the gossip join the process by setting their value to 0. • In each cycle, every node chooses one of its neighbors and exchanges its estimate with it.
Gossip-based (2) • At each exchange, both nodes set Estimate ← (Estimate + neighbor’s Estimate)/2; a node’s size estimate is then 1/Estimate. • To produce correct estimates, the algorithm must wait for a certain number of rounds before computing the size estimate. • This period is the time required for the gossip to propagate through the whole network and for the values to converge.
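A sketch of the averaging scheme on a hypothetical 500-node overlay. Because both peers keep the average after each exchange, the sum of all values stays 1, so every value drifts toward 1/N and each node reads off N as 1/value. The overlay, the cycle count, and the function names are assumptions; a node that still holds 0 reports infinity, which is why estimates are only meaningful after enough rounds.

```python
import random


def ring_with_chords(n: int, chords: int, rng: random.Random) -> list[list[int]]:
    """Hypothetical test overlay: a ring (keeps it connected) plus random chords."""
    adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
    for i in range(n):
        for _ in range(chords):
            j = rng.randrange(n)
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return [sorted(s) for s in adj]


def gossip_size_estimates(adj: list[list[int]], initiator: int, cycles: int,
                          rng: random.Random) -> list[float]:
    """Push-pull averaging; returns each node's size estimate 1/value."""
    n = len(adj)
    value = [0.0] * n
    value[initiator] = 1.0                    # initiator holds 1, everyone else 0
    for _ in range(cycles):
        for node in rng.sample(range(n), n):  # every node acts once per cycle
            peer = rng.choice(adj[node])
            avg = (value[node] + value[peer]) / 2
            value[node] = value[peer] = avg   # both keep the average: sum stays 1
    return [1 / v if v > 0 else float("inf") for v in value]


if __name__ == "__main__":
    rng = random.Random(11)
    overlay = ring_with_chords(500, 3, rng)
    estimates = gossip_size_estimates(overlay, 0, 100, rng)
    print(min(estimates), max(estimates))     # both should be close to 500
```

After 100 cycles every node's estimate should be within a fraction of a percent of 500; with only a handful of cycles, many nodes still hold 0 or far-off values, which is the waiting period described on the slide.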
N Estimation in S-P2P • Assumptions • IDs are uniformly distributed. • Each node knows the total number of nodes (N) in the system. • Nodes do not leave and join frequently.
Basic approaches (figure: (a) Token passing, (b) Neighbor sampling)
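The figure itself is not recoverable, so here is a minimal sketch of the two basic ideas, assuming identifiers normalized to a unit ring [0, 1) and that each node knows its successors. Token passing is read here as sending a token around the successor ring and counting messages; neighbor sampling applies the identifier-density idea: with uniform IDs, the arc from a node to its k-th successor has expected length k/N. Function names and parameters are hypothetical.

```python
import random


def token_passing_count(ids: list[float]) -> int:
    """Pass a token along successor pointers until it returns; count the messages.

    Exact, but it costs one message per node, i.e. O(N)."""
    ring = sorted(ids)
    successor = {ring[i]: ring[(i + 1) % len(ring)] for i in range(len(ring))}
    start = ring[0]
    messages, node = 1, successor[start]
    while node != start:
        messages += 1
        node = successor[node]
    return messages


def neighbor_sampling_estimate(ids: list[float], node_index: int, k: int) -> float:
    """Infer N from ID density: with uniform IDs, k successor gaps span about k/N."""
    ring = sorted(ids)
    start = ring[node_index]
    end = ring[(node_index + k) % len(ring)]  # the k-th successor (requires k < N)
    span = (end - start) % 1.0                # arc length on the unit ID ring
    return k / span


if __name__ == "__main__":
    rng = random.Random(5)
    ids = [rng.random() for _ in range(10_000)]  # uniformly distributed IDs
    print(token_passing_count(ids))              # 10000
    print(neighbor_sampling_estimate(ids, 0, 200))
```

Token passing is exact but needs N messages; neighbor sampling costs only k lookups, and its relative error shrinks roughly as $1/\sqrt{k}$, provided the uniformity assumption holds.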
N Estimation in S-P2P • In actually deployed systems: • Nodes join and leave frequently. • A node must estimate how long a query takes to reach its destination, which is O(log N) hops. • Proximity-based identifiers are adopted for efficient routing, e.g., derived from the AS number or geographic location.
Uniformity of Identifiers (figure panels: Myth vs. Real)
Estimation result (1) (figure panels: uniformly distributed IDs vs. proximity-based IDs)
Extended approach • Structured P2P systems maintain fingers, routing tables, contacts, etc. • This structural information can be used to estimate N more precisely.
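The slides do not spell out the extended scheme, so this is only a hypothetical sketch of one way to use routing state: instead of measuring density next to itself, a node pools the k-gap spans measured after the successor of each Chord-style finger target $x + 2^{-j}$. The skewed ID generator (half the nodes crowded into [0, 0.1)) is an illustrative assumption, not data from the figures. Because the targets are spread over the ring, a dense cluster around the node no longer dominates, although the pooled value is still not exactly unbiased under skew.

```python
import bisect
import random


def skewed_ids(n: int, rng: random.Random) -> list[float]:
    """Hypothetical proximity-style IDs: half of the nodes crowd into [0, 0.1)."""
    ids = [rng.uniform(0.0, 0.1) for _ in range(n // 2)]
    ids += [rng.uniform(0.1, 1.0) for _ in range(n - n // 2)]
    return sorted(ids)


def arc_span(ring: list[float], index: int, k: int) -> float:
    """Arc length on the unit ring from ring[index] to its k-th successor."""
    return (ring[(index + k) % len(ring)] - ring[index]) % 1.0


def local_estimate(ring: list[float], index: int, k: int) -> float:
    """Plain neighbor sampling: density of the node's own k successors."""
    return k / arc_span(ring, index, k)


def finger_estimate(ring: list[float], index: int, k: int, fingers: int) -> float:
    """Pool k-gap spans measured at the successor of each finger target x + 2^-j."""
    x = ring[index]
    total_count, total_span = 0, 0.0
    for j in range(1, fingers + 1):
        target = (x + 2.0 ** -j) % 1.0
        succ = bisect.bisect_left(ring, target) % len(ring)  # successor(target)
        total_count += k
        total_span += arc_span(ring, succ, k)
    return total_count / total_span


if __name__ == "__main__":
    rng = random.Random(9)
    ring = skewed_ids(10_000, rng)
    print(local_estimate(ring, 0, 32))      # node 0 sits inside the dense region
    print(finger_estimate(ring, 0, 32, 7))
```

In this setup the local estimate for a node in the dense region comes out several times too large, while the pooled estimate stays much closer to N.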
Estimation result (2) (figure panels: uniformly distributed IDs vs. proximity-based IDs)
Conclusion • For unstructured P2P • There is a tradeoff between the quality of the estimate and the associated overhead. • The algorithm should be chosen according to the system's objectives and applications. • For structured P2P • The distribution of identifiers may be skewed. • Using structural information makes the estimates more accurate.
References • D. Psaltoulis, D. Kostoulas, I. Gupta, K. Birman, and A. Demers, “Practical algorithms for size estimation in large and dynamic groups,” PODC 2004. • D. Kostoulas, D. Psaltoulis, I. Gupta, K. Birman, and A. Demers, “Decentralized schemes for size estimation in large and dynamic groups,” IEEE NCA 2005. • L. Massoulié, A.-M. Kermarrec, E. Le Merrer, and A. J. Ganesh, “Peer counting and sampling in overlay networks: random walk methods,” Technical Report MSR-TR-2005-156, 2005. • G. S. Manku, M. Bawa, and P. Raghavan, “Symphony: Distributed hashing in a small world,” USITS 2003.