1 / 64

Presentation by Nanda Kishore Lella Lella.2@wright

A Measurement Study of Peer-to-Peer File Sharing Systems by Stefan Saroiu P. Krishna Gummadi Steven D. Gribble. Presentation by Nanda Kishore Lella Lella.2@wright.edu. Outline. P2P Overview What is a peer? Example applications Benefits of P2P P2P Content Sharing Challenges

walden
Télécharger la présentation

Presentation by Nanda Kishore Lella Lella.2@wright

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Measurement Study of Peer-to-Peer File Sharing SystemsbyStefan SaroiuP. Krishna GummadiSteven D. Gribble Presentation by Nanda Kishore Lella Lella.2@wright.edu

  2. Outline • P2P Overview • What is a peer? • Example applications • Benefits of P2P • P2P Content Sharing • Challenges • Group management/data placement approaches • Measurement studies • Conclusion

  3. What is Peer-to-Peer (P2P)? • Most people think of P2P as music sharing Examples: • Napster • Gnutella

  4. What is a peer? • Contrasted with Client-Server model • Servers are centrally maintained and administered • Client has fewer resources than a server

  5. What is a peer? • A peer’s resources are similar to the resources of the other participants • P2P – peers communicating directly with other peers and sharing resources

  6. P2P Application Taxonomy P2P Systems Distributed Computing File Sharing Collaboration Platforms JXTA

  7. P2P Goals/Benefits • Cost sharing • Resource aggregation • Improved scalability/reliability • Increased autonomy • Anonymity/privacy • Dynamism • Ad-hoc communication

  8. P2P File Sharing • Content exchange • Gnutella • File systems • Oceanstore • Filtering/mining • Opencola

  9. Research Areas • Peer discovery and group management • Data location and placement • Reliable and efficient file exchange • Security/privacy/anonymity/trust

  10. Current Research • Group management and data placement • Chord, CAN, Tapestry, Pastry • Anonymity • Publius • Performance studies • Gnutella measurement study etc.

  11. Management/Placement Challenges • Per-node state • Bandwidth usage • Search time • Fault tolerance/resiliency

  12. Approaches • Centralized • Flooding • Document Routing

  13. Centralized Bob Alice • Napster model • Benefits: • Efficient search • Limited bandwidth usage • Efficient network handling • Drawbacks: • Central point of failure • Limited scale Judy Jane

  14. Flooding Carl Jane • Gnutella model • Benefits: • No central point of failure • Limited per-node state • Drawbacks: • Slow searches • Bandwidth intensive Bob Alice Judy

  15. Document Routing 001 012 • FreeNet, Chord, CAN, Tapestry, Pastry model • Benefits: • More efficient searching • Limited per-node state • Drawbacks: • Limited fault-tolerance vs redundancy 212 ? 212 ? 332 212 305

  16. Document Routing – CAN • Associate to each node and item a unique id in an d-dimensional space • Goals • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes • Properties • Routing table size O(d) • Guarantees that a file is found in at most d*n1/d steps, where n is the total number of nodes Slide modified from another presentation

  17. CAN Example: Two Dimensional Space • Space divided between nodes • All nodes cover the entire space • Each node covers either a square or a rectangular area • Example: • Node n1:(1, 2) first node that joins  cover the entire space 7 6 5 4 3 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  18. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins  space is divided between n1 and n2 7 6 5 4 3 n2 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  19. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins  space is divided between n1 and n2 7 6 n3 5 4 3 n2 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  20. CAN Example: Two Dimensional Space • Nodes n4:(5, 5) and n5:(6,6) join 7 6 n5 n4 n3 5 4 3 n2 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  21. CAN Example: Two Dimensional Space • Nodes: n1:(1, 2); n2:(4,2); n3:(3, 5); n4:(5,5);n5:(6,6) • Items: f1:(2,3); f2:(5,1); f3:(2,1); f4:(7,5); 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  22. CAN Example: Two Dimensional Space • Each item is stored by the node who owns its mapping in the space 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  23. CAN: Query Example • Each node knows its neighbors in the d-space • Forward query to the neighbor that is closest to the query id • Can route around some failures • some failures require local flooding • Example: assume n1 queries f4 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  24. 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 CAN: Query Example Slide modified from another presentation

  25. 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 CAN: Query Example Slide modified from another presentation

  26. 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 CAN: Query Example Slide modified from another presentation

  27. Node Failure Recovery • Simple failures • know your neighbor’s neighbors • when a node fails, one of its neighbors takes over its zone • More complex failure modes • simultaneous failure of multiple adjacent nodes • scoped flooding to discover neighbors • hopefully, a rare event Slide modified from another presentation

  28. N5 N10 N110 K19 N20 N99 N32 N80 N60 Document Routing – Chord • MIT project • Uni-dimensional ID space • Keep track of log N nodes • Search through log N nodes to find desired key

  29. Each node and key is assigned an id. If a node needs a key, searches in its table of n nodes for the key. If fails, goes to the last node of its table and repeats until it finds the key. Search through log N nodes to find desired key N5 N10 N110 K19 N20 N99 N32 N80 N60 Document Routing – Chord(2)

  30. 43FE 993E 13FE 73FE F990 04FE 9990 ABFE 239E 1290 Doc Routing – Tapestry/Pastry • Global mesh of meshes • Suffix-based routing • Uses underlying network distance in constructing mesh

  31. Every node has a 4 bit name similar to IP address Each bit in the name can hold 16 types Keys present at the node are in accordance with the node name. 43FE 993E 13FE 73FE F990 04FE 9990 ABFE 239E 1290 Naming in Tapestry

  32. 7598 B4F8 Msg to 4598 4598 9098 6789 B437 Tapestry Routing

  33. Remaining Problems? • Hard to handle highly dynamic environments • Methods don’t consider peer characteristics

  34. Measurement Studies Gnutella vs. Napster

  35. R R Q Q P napster.com P P D S S Q P P P P R S S P Q P P P Q D P P Q Napster Gnutella Q peer query P D file download R response server S

  36. Methodology 2 stages: • periodically crawl Gnutella/Napster • discover peers and their metadata • feed output from crawl into measurement tools: • bottleneck bandwidth – SProbe • latency – SProbe • peer availability – LF • degree of content sharing – Napster crawler

  37. Crawling • May 2001 • Napster crawl • query index server and keep track of results • query about returned peers • don’t capture users sharing unpopular content • Gnutella crawl • send out ping messages with large TTL

  38. Measurement Study • How many peers are server-like…client-like? • Bandwidth, latency • Connectivity • Who is sharing what?

  39. Graph results • CDF: cumulative distribution function • From this graph, we see that while 78% of the participating peers have downstream bottleneck bandwidths of at least 1000Kbps • Only 8% of the peers have upstream bottleneck bandwidths of at least 10Mbps. • 22% of the participating peers have upstream bottleneck bandwidths of 100Kbps or less.

  40. Reported Bandwidth

  41. Graph results • The percentage of Napster users connected with modems (of 64Kbps or less) is • About 25%, while the percentage of Gnutella users with similar connectivity is as low as 8%. • 50% of the users in Napster and 60% of the users in Gnutella use broadband connections • only about 20% of the users in Napster and 30% of the users in Gnutella have very high bandwidth connections • Overall, Gnutella users on average tend to have higher downstream bottleneck bandwidths than Napster users.

  42. Graph results • Approximately 20% of the peers have latencies of at least 280ms, • Another 20% have latencies of at most 70ms

  43. Graph results • This graph illustrates the presence of two clusters; a smaller one situated at (20-60Kbps, 100-1,000ms) and a larger one at over (1,000Kbps, 60-300ms). • horizontal lines in the graph they predicate that the latency also depends on the location of the peer for measuring system.

  44. Measured Uptime

More Related