
A Step Back Reflections on P2P Techniques

Indranil Gupta. March 16, 2006. CS 598IG, SP06.


Presentation Transcript


  1. A Step Back: Reflections on P2P Techniques. Indranil Gupta, March 16, 2006. CS 598IG, SP06.

  2. Let’s Keep It Short Today • 2 P2P or Not 2 P2P? • Scooped, Again

  3. 2 P2P or Not 2 P2P? Mema Roussopoulos Mary Baker David S. H. Rosenthal TJ Giuli Petros Maniatis Jeff Mogul

  4. Kerry Candidate problems • Internet Routing (RON) • Resource Sharing (PlanetLab) • Cooperative Web Caching • Internet Backup and Corporate Backup • Distributed Digital Libraries • Distributed Monitoring • Ad hoc Routing in Disaster Recovery • Metropolitan-area Cell Phone Forwarding

  5. Ideal P2P properties • Self Organizing • P2P routing • Discovery • Symmetric communication • Peers are approximately equal • Decentralized control • No single point of failure

  6. P2P Networks • Gnutella • Usenet • (Images from http://www.cybergeography.org/atlas/more_topology.html)

  7. 2 P2P or not P2P Budget Relevance Trust

  8. Budget (columns: Low | Effect | High) • Lowest possible cost per peer, rather than lowest global cost • BitTorrent, Gnutella, Freenet, etc. • SETI@home • Dictates how many peers join • Decides if P2P is viable for problem • Worries less about performance criticality • Favors centralized approaches, P2P irrelevant • Clusters, High performance computing

  9. Relevance (columns: Low | Effect | High) • Personal data • Private data • Internet backup • Corporate backup • Web caching • Relevance of resources encourages peers to join • “When resource relevance is high, cooperation in a P2P solution evolves naturally” • File sharing • Freenet • Content distribution • Internet routing • BitTorrent • Gnutella • Kazaa

  10. Trust (columns: Low | Effect | High) • Encryption • Anonymity • Freenet • OceanStore • Ivy • Timestamping • MojoNation • Mutual trust • Risks • Gnutella • Napster • Overlays • File sharing • Usenet

  11. Rate of Change (columns: Low | Effect | High) • Tangler • Freenet • LOCKSS • Timestamping • Content distribution • Usenet • Flash crowds • Churn • Timeliness • Consistency • Internet routing • Online net monitoring

  12. Criticality (columns: Low | Effect | High) • Usenet • Content distribution • Offline net study • File sharing • Centralized control • Accountability • Fault tolerance • Ad hoc disaster recovery • Flash crowds • Internet monitoring • Routing

  13. 2 P2P or not P2P Budget Relevance Trust

  14. Conclusion • Framework for analyzing P2P applications • Captures constraints and app requirements • Limited budget is motivating factor • Problems with low relevance are inappropriate for P2P • Same as our “Penny Lane” motivation for P2P systems

  15. Critique • Strengths • Quantifies application requirements and suitable use cases • Generically describes suitability of classes of P2P apps • Weaknesses • High churn: “p2p inappropriate”? Or most current non-Kelips solutions insufficient? • Why can’t p2p systems handle critical applications? It’s a question of developing the right, e.g., real-time, technologies. • Why is the order of preference budget > relevance > trust > churn > criticality? Why not a different ordering? • Fuzzy requirements not accounted for • Other requirements – will they evolve as new p2p applications emerge?
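
The ordering questioned in the critique (budget > relevance > trust > churn > criticality) can be read as a chain of checks applied one axis at a time. The sketch below is only an illustration of that reading, not the paper's actual decision tree: the `Problem` class, the `CHECKS` table, the "low"/"high" values, and the example problems are all hypothetical, and the trust rule in particular is an assumption, since the slides do not state how trust decides the outcome.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    """A candidate problem scored "low" or "high" on each axis (hypothetical encoding)."""
    name: str
    budget: str
    relevance: str
    trust: str
    rate_of_change: str
    criticality: str


# Axes in the order given on the slides. Each entry is
# (axis, value that argues against P2P, reason). The budget, relevance,
# churn, and criticality rules paraphrase the slides; the trust rule is
# an assumption made for this sketch.
CHECKS = [
    ("budget", "high", "ample budget favors a centralized solution"),
    ("relevance", "low", "low relevance gives peers little reason to join"),
    ("trust", "low", "low mutual trust adds protection costs (assumed rule)"),
    ("rate_of_change", "high", "high churn or fast-changing data is hard to serve"),
    ("criticality", "high", "critical applications want central accountability"),
]


def assess(problem):
    """Walk the axes in order and stop at the first one that argues against P2P."""
    for axis, bad_value, reason in CHECKS:
        if getattr(problem, axis) == bad_value:
            return False, f"{axis}: {reason}"
    return True, "no axis argues against P2P"


if __name__ == "__main__":
    # Hypothetical scores, chosen only to exercise the checks.
    examples = [
        Problem("example A", "low", "high", "high", "low", "low"),
        Problem("example B", "high", "high", "high", "low", "low"),
        Problem("example C", "low", "high", "high", "high", "low"),
    ]
    for p in examples:
        print(p.name, assess(p))
```

Reordering the entries in `CHECKS` changes which axis is blamed first for a problem that fails several checks, which is one concrete way to explore the "why not a different ordering?" question.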

  16. Scooped, Again Jonathan Ledlie Jeff Shneidman Margo Seltzer John Huth

  17. Outline • Introduction • Grid Computing • P2P Systems • Fallacies preventing cooperation • Shared and Disjoint Problems • Conclusions • (Callout: what they are, goals, manifestations)

  18. Introduction • Peer-to-Peer vs. Grid Computing • Overlapping problem domain • P2P focuses on research • Grid is concerned with concrete, tangible solutions • History, repeated – the Web

  19. Introduction – cont. • Current trends • Divergent, parallel development • Duplication of work • Grid: risk of non-optimal solutions • Missing out on P2P’s strong achievements (search and storage scalability, decentralization, anonymity, denial of service prevention) • Cooperation is the key

  20. Grids • What is the Grid? “a type of parallel and distributed system that enables the sharing, selection, and aggregation of resources distributed across multiple administrative domains based on the resources’ availability, capability, performance, cost, and user’s QoS requirements” • Short version: virtualizing computer resources • Large scale heterogeneous resource sharing (different platforms, hardware/software architectures, and computer languages) • Functional classification: • Computational grids (run batch jobs during idle times) • Data grids

  21. Grid Layout

  22. Grid Goals • Design goal: • Solve problems too big for a single supercomputer, but retain the flexibility to work on multiple smaller problems • Self-configuring, self-tuning, self-healing • Allow data sharing and support computation across administrative domains • Standardized programming interface • GGF (Global Grid Forum) • Globus toolkit – the de facto standard for grid middleware

  23. Grid Manifestations • Protocols: • Resource management: • Grid Resource Allocation & Management Protocol (GRAM) • Information services: • Monitoring and Discovery Service (MDS) • Security services: • Grid Security Infrastructure (GSI) • Data movement and management: • Global Access to Secondary Storage (GASS), GridFTP • Tools: • Grid Portal Software (GridPort, OGCE) • Grid Packaging Toolkit • Grid-enabled MPI (MPICH-G2) • Network Weather Service • Condor (CPU cycle scavenging) and Condor-G (job submission) • APIs: • Web Services: Open Grid Services Architecture (OGSA)

  24. P2P • What is P2P? “…a class of applications that take advantage of resources – storage, cycles, content, human presence – available at the edges of the Internet” • Decentralized, non-hierarchical node organization • Inherently untrusted (well…)

  25. P2P Goals • Cost sharing / reduction • Every peer responsible for its own cost • Reduction of file storage costs • Reduction of computation costs • Improved scalability / reliability • Lack of centralization allows new algorithms (CAN, Chord, etc.) to be designed for improved scalability • Resource Aggregation • Every peer lends its own resources to the network • Increased Autonomy • Tasks are performed locally – no central service provider

  26. P2P Goals – cont. • Anonymity / Privacy • FreeNet • Dynamism • Nodes enter and leave the system in a transparent way • Ad-hoc communication • Members can join and leave based on their physical location or interests

  27. Summary • Grids • Parallel, distributed systems concerned with resource sharing, selection, aggregation • Resource availability, capability, performance, cost, and user QoS requirements are considered • Self-configuring, self-tuning, self-healing • Idle cycle and storage utilization • P2P • Distributed systems that take advantage of resources scattered throughout the Internet • Decentralized, non-hierarchical node organization • Concerned with fault tolerance, scalability, availability, etc. • Idle cycle and storage utilization

  28. Summary – cont. • Grid • Distributed computation: distributed.net, SETI@home • Data production / aggregation • P2P • Distributed file sharing: Gnutella, KaZaA • Distributed computation: distributed.net • Anonymity: Freenet, Publius

  29. Outline • Introduction • Grid Computing • P2P Systems • Fallacies preventing cooperation • Shared and Disjoint Problems • Conclusions • (Callout: what they are, goals, manifestations)

  30. Fallacies preventing cooperation • “The technical problems in Grid systems are different from those in p2p systems” • Usage misconception: Grid for computing problems, P2P for file sharing • Data handling and data production in Grid systems have become important • P2P used in desktop collaboration and network computation • “Open problems” in both camps have striking similarities

  31. Fallacies preventing cooperation • “While the technical problems are similar, the architectures (physical topology, bandwidth availability and use, trust model, etc.) demand that the specific solutions be fundamentally different” • Solving common problems through sharing good ideas from each community • Application dependent – special requirements are tailored to application needs; however, the technical approaches for solving a particular problem could benefit both communities

  32. Fallacies preventing cooperation • “Grid projects do not have the flexibility to try new algorithms/ideas because they have to get real work done. P2P research is all about this flexibility” • Grid has room for flexible research, too • Testing new applications and protocols • Users willing to adopt different technologies to get the work done

  33. Outline • Introduction • Grid Computing • P2P Systems • Fallacies preventing cooperation • Shared and Disjoint Problems • Conclusions • (Callout: what they are, goals, manifestations)

  34. Shared problems • Topology Formation • Node join and neighbor discovery • Work has been done by both groups: • Grid: “On fully decentralized resource discovery in grid environments” • P2P: “Self-organization in p2p systems” • Grid infrastructure is not flexible – hard coded • Could benefit from P2P research prototypes

  35. Shared problems – cont. • Utilization • Resource discovery, data retrieval • P2P hash-based look-up schemes are useful • Resource management / optimization • How to “best” utilize resources in a network • Data replication/caching examined by both communities • Scheduling and handling of contention • P2P focus: bandwidth usage (e.g. Gnutella) • Grid focus: scheduling • Load balancing: break large tasks into distributed smaller ones
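
To make the "hash-based look-up" bullet concrete, here is a minimal sketch of consistent hashing, the idea underlying schemes such as Chord: nodes and keys are hashed onto the same identifier ring, and a key belongs to the first node at or after its identifier. This is a single-process simulation under stated assumptions. The `HashRing` class, `ring_id` helper, and node and key names are hypothetical, and a real Chord deployment routes lookups between peers with finger tables rather than consulting one sorted list.

```python
import bisect
import hashlib


def ring_id(name: str) -> int:
    """Map a node name or key onto the 160-bit SHA-1 identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hashing ring held in one process (illustration only)."""

    def __init__(self):
        self._ids = []    # sorted node identifiers
        self._names = {}  # identifier -> node name

    def join(self, node: str) -> None:
        node_id = ring_id(node)
        if node_id in self._names:
            raise ValueError(f"identifier collision for {node!r}")
        bisect.insort(self._ids, node_id)
        self._names[node_id] = node

    def leave(self, node: str) -> None:
        node_id = ring_id(node)
        self._ids.remove(node_id)
        del self._names[node_id]

    def lookup(self, key: str) -> str:
        """Return the successor node: first node id >= key id, wrapping around."""
        if not self._ids:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ids, ring_id(key))
        return self._names[self._ids[idx % len(self._ids)]]


if __name__ == "__main__":
    ring = HashRing()
    for node in ("node-a", "node-b", "node-c", "node-d"):
        ring.join(node)
    for key in ("dataset-1", "dataset-2", "dataset-3"):
        print(key, "->", ring.lookup(key))
```

When a node leaves, only the keys it owned move to its successor, and every other key keeps its owner. That property is what makes this kind of lookup attractive for resource discovery under churn, in P2P overlays and potentially in Grid settings as well.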

  36. Shared problems – cont. • Coping with Failure • P2P: lossy storage model (Freenet, Gnutella) • Considerations for Grid adaptability: • Different common loss model • Storage size (O(petabyte/month)) • Security-related issues • Authenticity: verification of data/computation • Availability: resilience to DoS attacks • Authorization: ACLs

  37. Shared problems – cont. • Maintenance • P2P: essentially no standards or APIs • Efforts by Berkeley BOINC, Google Compute, overlay standardization • Grid: pushes for a standardized API • GGF (Global Grid Forum) • OGSA (Open Grid Services Architecture) • Web services oriented API – Globus as reference implementation

  38. Disjoint Problems • Anonymity • Not really useful for Grid systems, yet

  39. Conclusions • A lot of overlap between the goals and research interests of the two communities • P2P community needs to consider the needs of the Grid users to see how existing research can be applied successfully to Grid problems • Aim for common standards as much as possible

  40. Critique • Since this paper was published (2003), a little bit of convergence has happened, but not as much as predicted by these authors and by Foster et al. • Will it just take more time? • (Skeptics’ Viewpoint) Really? Aren’t P2P and Grid two different areas? • They still have mostly-disjoint research communities • Or is that an opportunity for more research?

  41. Have a good Break! Remember – Midterm report (with initial experimental data) is due April 2!
