This study explores the highly successful file-sharing system KaZaA, utilizing the FastTrack protocol to understand its overlay topology, peer selection mechanisms, and index management. With around three million active nodes and four main clients (KaZaA, KaZaA-lite, Grokster, and iMesh), the research aims to measure performance, availability, and scalability. Key findings include dynamic connection patterns, unique peer distributions, and insights into how peer selection is influenced by workload and IP prefix matching. The results have significant implications for the design of unstructured P2P overlays.
Understanding KaZaA • Jian Liang, Rakesh Kumar, Keith Ross • Polytechnic University, Brooklyn, N.Y.
KaZaA/FastTrack Operation • Top file-sharing system • 3 million active nodes • Four clients: KaZaA, KaZaA-lite, Grokster and iMesh • Good availability and scalability • Proprietary protocol; signaling traffic encrypted • In contrast with Gnutella and eMule
Purpose of Measurement Study • Understand a highly successful file-sharing system • Overlay topology and dynamics • Peer selection • Index management • Use KaZaA as a test bed for further research • Content pollution research (another paper)
Existing Tools and Projects • FastTrack encryption algorithm • available from a Web site: http://gift-fasttrack.berlios.de/ • KaZaA Media Desktop (KMD) software architecture • http://kazaasearch.narod.ru/
Big Picture of Overlay • Two-layer hierarchy • Ordinary Node (ON) • Super Node (SN)
Measurement Apparatus • KaZaA Sniffing Platform • KaZaA Probing Tool
KaZaA Sniffing Platform • Poly (Ethernet) • Home (cable modem)
KaZaA Probing Tool • Campus & home based probing • Node list • Workload
Signaling Protocol • ON-SN session initiation • SN-SN session initiation
TCP Connections Evolution • Poly campus: 4-6 hour measurement • Cable modem: 7-11 hour measurement
SN Workload • 7-11 hours: TCP connections evolution • 7-11 hours: workload values evolution
Port Dynamics and NAT • 19,637 unique SN addresses collected • Only 707 SNs (3.6%) use the default port number, 1214 • 18,887 SNs (96.3%) use non-default port numbers • Of 64,834 unique peers in total (SN + ON), 21,269 peers (ON) use private IP addresses
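The port and NAT figures on this slide come down to two simple checks per observed peer: whether it listens on the default port, and whether its address is private. The following is a minimal sketch of those checks, not the authors' probing tool; the `observed` sample and the helper names are hypothetical, and only the counts in the last two lines are taken from the slide.

```python
import ipaddress

DEFAULT_FASTTRACK_PORT = 1214  # default port number cited on the slide


def is_private_peer(ip: str) -> bool:
    """Return True if the address is in a private range (e.g. 10.0.0.0/8)."""
    return ipaddress.ip_address(ip).is_private


def share(part: int, total: int) -> float:
    """Percentage of `part` within `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Hypothetical (ip, port) observations, for illustration only.
observed = [("192.168.1.20", 1214), ("10.3.4.5", 2371), ("128.238.1.1", 3300)]

private_peers = [ip for ip, _ in observed if is_private_peer(ip)]
default_port_peers = [ip for ip, port in observed if port == DEFAULT_FASTTRACK_PORT]

# Shares recomputed from the counts reported on the slide.
default_port_share = share(707, 19637)    # SNs on the default port
private_ip_share = share(21269, 64834)    # peers behind private addresses
```

Recomputing the shares this way gives a quick consistency check on the reported counts; by this calculation roughly a third of all observed peers sit behind private addresses.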
Summary of Results • 20,000 to 40,000 active super nodes • Each SN connects to approx. 0.1% of the other SNs • Highly dynamic connections: over 35% of SN-SN connections last less than 30 seconds
Summary of Results • Peer selection uses IP prefix match, workload, RTT and freshness • No index exchange between SNs, only query forwarding • Skewed content distribution: 20% of peers provide 70% of the metadata for sharing
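Two of these figures are easy to restate as arithmetic: 0.1% of 20,000 to 40,000 super nodes implies a few dozen SN-SN neighbors per super node, and the "short-lived" share is the fraction of connection durations under 30 seconds. A small sketch follows, assuming a hypothetical list of measured durations; the function names and the sample are illustrative and not from the study.

```python
def neighbor_estimate(active_sns: int, fraction: float = 0.001) -> int:
    """Approximate number of SN neighbors if an SN connects to `fraction` of all SNs."""
    return round(active_sns * fraction)


def short_lived_fraction(durations_sec, threshold_sec: float = 30.0) -> float:
    """Fraction of connections whose duration is below `threshold_sec`."""
    if not durations_sec:
        return 0.0
    short = sum(1 for d in durations_sec if d < threshold_sec)
    return short / len(durations_sec)


# Range of SN-SN neighbors implied by the slide's population estimate.
low = neighbor_estimate(20_000)
high = neighbor_estimate(40_000)

# Hypothetical SN-SN connection durations in seconds, for illustration only.
sample = [5.0, 12.0, 29.0, 45.0, 300.0, 1800.0, 20.0, 600.0]
short_share = short_lived_fraction(sample)
```

Under these assumptions each super node keeps on the order of 20 to 40 SN-SN connections at a time.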
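The four peer-selection signals named on this slide can be combined into a single ranking score. Because the FastTrack protocol is proprietary, the sketch below is only one plausible way to do that: the weights, the normalizations, and the `Candidate` fields are assumptions for illustration, not the client's actual algorithm.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Candidate:
    ip: str
    workload: float  # assumed scale: 0.0 (idle) to 1.0 (saturated)
    rtt_ms: float    # measured round-trip time in milliseconds
    age_sec: float   # seconds since this entry was last refreshed


def common_prefix_bits(ip_a: str, ip_b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    return 32 - (a ^ b).bit_length()


def score(own_ip: str, c: Candidate) -> float:
    """Higher is better. Weights are hypothetical, chosen only for illustration."""
    prefix = common_prefix_bits(own_ip, c.ip) / 32.0  # locality
    load = 1.0 - c.workload                           # prefer lightly loaded SNs
    rtt = 1.0 / (1.0 + c.rtt_ms / 100.0)              # prefer low latency
    fresh = 1.0 / (1.0 + c.age_sec / 600.0)           # prefer recently seen SNs
    return 0.4 * prefix + 0.3 * load + 0.2 * rtt + 0.1 * fresh


def select_supernode(own_ip: str, candidates: list) -> Candidate:
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(own_ip, c))


# Two hypothetical candidates that differ only in address locality.
near = Candidate("128.238.200.9", workload=0.2, rtt_ms=20.0, age_sec=60.0)
far = Candidate("66.1.2.3", workload=0.2, rtt_ms=20.0, age_sec=60.0)
chosen = select_supernode("128.238.1.1", [far, near])
```

With all other signals equal, the candidate sharing a longer address prefix with the selecting node wins, which is the locality effect the slide describes.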
Design Principles for Unstructured P2P Overlays • Distributed design • No infrastructure • Avoiding legal attacks • Exploit heterogeneity • Hierarchy • Self organization • Load balancing: workload balancing • Explicit locality awareness • Shuffle connections in core overlay
Design Principles for Unstructured P2P Overlays • Properly designed gossip mechanisms • Peers have a fresh list of SNs • Firewall circumvention • Dynamic port numbers • Improves availability • NAT circumvention
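The gossip principle, that peers should always hold a fresh list of SNs, can be illustrated with a simple merge rule: keep the newest timestamp seen for each super node and evict the stalest entries once the list is full. This is a generic sketch under assumed data structures (a mapping from an address string to a last-seen timestamp); the `max_entries` cap is a hypothetical parameter, and none of this reflects the actual FastTrack message format.

```python
def merge_sn_lists(local: dict, received: dict, max_entries: int = 200) -> dict:
    """Merge two {address: last_seen_timestamp} maps, keeping the freshest entries.

    `max_entries` is a hypothetical cap on the cached list size.
    """
    merged = dict(local)
    for addr, last_seen in received.items():
        # Keep whichever timestamp is more recent for each super node.
        if last_seen > merged.get(addr, float("-inf")):
            merged[addr] = last_seen
    # Sort newest first and drop the stalest entries beyond the cap.
    freshest = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return dict(freshest[:max_entries])


# Hypothetical cached list and a list received from a neighbor.
local_list = {"1.2.3.4:2371": 100.0, "5.6.7.8:1214": 50.0}
received_list = {"5.6.7.8:1214": 120.0, "9.9.9.9:3300": 10.0}
merged_list = merge_sn_lists(local_list, received_list, max_entries=2)
```

Ordering by last-seen time means a peer that reconnects after a churn event tries the most recently confirmed super nodes first.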