Data Management in Peer-to-Peer Systems • Qi Sun, Beverly Yang
Introduction • What is P2P? • Distributed nodes • Equal roles and functionality • Providing/exchanging resources • Why now? • PCs are becoming valuable resources! • Computing devices are becoming pervasive
Many Applications • Grid computing • e.g., SETI@home • Ubiquitous computing • Cell phones, wireless devices, handhelds • Cars, refrigerators, microwaves • Preservation/archival systems • File-sharing
File-sharing model • Data: (Title string, File blob) • Query: “Find songs by Madonna” • Result: • 63.274.18.3: Madonna – “Vogue” • 63.274.18.3: Madonna – “Beautiful Stranger” • 27.48.3.124: Madonna – “Like a Prayer” • 17.64.75.18: Madanna – “Vogue” • How is this “search” implemented?
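The (title, blob) data model and keyword query above can be sketched as follows. This is a minimal illustration, not how any of the systems below actually match queries: `SharedFile`, `search`, the rule that every query word must appear in the title, and the peer addresses are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedFile:
    """One shared item: a title string plus the file contents (the blob)."""
    peer_addr: str  # address of the peer storing the file (made-up values below)
    title: str      # the only field that queries are matched against
    blob: bytes     # raw file data, never inspected by search()


def search(files: list[SharedFile], query: str) -> list[tuple[str, str]]:
    """Return (peer, title) pairs whose title contains every query keyword,
    compared case-insensitively."""
    words = query.lower().split()
    return [(f.peer_addr, f.title) for f in files
            if all(w in f.title.lower() for w in words)]


catalog = [
    SharedFile("10.0.0.1", 'Madonna - "Vogue"', b"..."),
    SharedFile("10.0.0.1", 'Madonna - "Beautiful Stranger"', b"..."),
    SharedFile("10.0.0.2", 'Madanna - "Vogue"', b"..."),
]
print(search(catalog, "madonna"))
```

Because matching looks only at the title string, the misspelled “Madanna” entry is returned for “vogue” but not for “madonna”.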
Many Approaches • Napster • Gnutella • KaZaA • OverNet • BitTorrent
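Napster • “Hybrid” P2P system • [Figure: a central server keeps the index of the files on peers A to F; a peer's query (“?”) is answered by the server with the peers C, E, F]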
Napster • Benefits • Efficient • Comprehensive • Can handle complex queries • Disadvantages • Server is a single point of failure • Server is a performance bottleneck • Server costs money to maintain!!!
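A Napster-style central index can be sketched as below, assuming a simple keyword index; the class and method names are hypothetical. Every query is answered by the one server from its complete index, which is why results are comprehensive and also why the server is a single point of failure and a bottleneck.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style server: it indexes every peer's titles,
    while the files themselves are downloaded directly from the peers."""

    def __init__(self) -> None:
        # keyword -> set of (peer, title) entries
        self._by_word: dict = defaultdict(set)

    def register(self, peer: str, titles: list[str]) -> None:
        """A peer that joins uploads the titles it shares."""
        for title in titles:
            for word in title.lower().split():
                self._by_word[word].add((peer, title))

    def unregister(self, peer: str) -> None:
        """Remove every entry of a peer that has left."""
        for entries in self._by_word.values():
            entries.difference_update({e for e in entries if e[0] == peer})

    def query(self, text: str) -> set[tuple[str, str]]:
        """Answer a keyword query entirely from the server's index."""
        words = text.lower().split()
        if not words:
            return set()
        result = set(self._by_word.get(words[0], set()))
        for w in words[1:]:
            result &= self._by_word.get(w, set())
        return result


server = CentralIndex()
server.register("C", ["Madonna Vogue"])
server.register("E", ["Madonna Like a Prayer"])
server.register("F", ["Madonna Vogue"])
print(sorted(server.query("madonna vogue")))
```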
Gnutella • “Pure” P2P system • [Figure: peers connected by TCP links, forming an “overlay network”]
Gnutella • [Figure legend: source, forward query, processed query, found result, forward response]
Gnutella • Benefits • No server needed (cost) • Robust (nodes can come and go) • Can handle complex queries per node • Disadvantages • Not comprehensive (can miss results) • Inefficient! (many messages)
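Gnutella's query flooding can be approximated by a breadth-first traversal of the overlay with a hop limit (TTL). In this hypothetical sketch, `overlay` and `shared` are plain dictionaries, peers drop duplicate copies of a query, and a query is never sent back to the peer it came from. Real Gnutella messages carry more fields than are modeled here.

```python
from collections import deque


def flood_query(overlay: dict[str, list[str]], shared: dict[str, set[str]],
                source: str, keyword: str, ttl: int) -> tuple[set[str], int]:
    """Flood a keyword query from `source` over the overlay, up to `ttl` hops.

    overlay: peer -> neighboring peers (the TCP overlay links)
    shared:  peer -> keywords of the files that peer shares
    Returns (peers that found a result, number of query messages sent).
    """
    seen = {source}                          # peers that already got the query
    frontier = deque([(source, ttl, None)])  # (peer, hops left, sender)
    hits: set[str] = set()
    messages = 0
    while frontier:
        peer, hops_left, sender = frontier.popleft()
        if peer != source and keyword in shared.get(peer, set()):
            hits.add(peer)                   # its response goes back along the path
        if hops_left == 0:
            continue
        for nb in overlay.get(peer, []):
            if nb == sender:
                continue                     # never send the query straight back
            messages += 1                    # every forward costs a message...
            if nb not in seen:               # ...but duplicates are dropped on arrival
                seen.add(nb)
                frontier.append((nb, hops_left - 1, peer))
    return hits, messages


overlay = {"S": ["A", "B"], "A": ["S", "B", "C"], "B": ["S", "A", "D"],
           "C": ["A", "E"], "D": ["B"], "E": ["C"]}
shared = {"C": {"vogue"}, "E": {"vogue"}, "D": {"prayer"}}
print(flood_query(overlay, shared, "S", "vogue", ttl=2))
print(flood_query(overlay, shared, "S", "vogue", ttl=3))
```

With `ttl=2` the query misses peer `E`, three hops from `S`. With `ttl=3` it finds `E`, at the cost of an extra message.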
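KaZaA • “Super-peer” P2P system • [Figure: each super-peer keeps an index of its clients' files; clients query their super-peer like Napster, and super-peers query one another like Gnutella]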
KaZaA • Change the ratio of clients to super-peers • Napster: everyone (minus one) is a client • Gnutella: no one is a client • Combines strengths of hybrid and pure systems • Leverages heterogeneity of peers • e.g., bandwidth, memory, processing power
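The two-tier super-peer search can be sketched as follows, under simplifying assumptions. Each super-peer indexes its own files and its clients' files. Clients talk only to their super-peer, and super-peers flood among themselves with a TTL. Function and parameter names are hypothetical.

```python
def super_peer_search(clients_of: dict[str, list[str]],
                      sp_overlay: dict[str, list[str]],
                      shared: dict[str, set[str]],
                      start_sp: str, keyword: str, ttl: int) -> set[str]:
    """Search from super-peer `start_sp`, which received a client's query.

    clients_of: super-peer -> its client peers
    sp_overlay: super-peer -> neighboring super-peers
    shared:     peer -> keywords of the files it shares
    Returns the peers (clients or super-peers) holding a match.
    """
    def local_hits(sp: str) -> set[str]:
        # a super-peer answers from the index it keeps for itself and its clients
        members = [sp] + clients_of.get(sp, [])
        return {p for p in members if keyword in shared.get(p, set())}

    hits: set[str] = set()
    seen, frontier = {start_sp}, [start_sp]
    for _ in range(ttl + 1):                     # hop 0 is the client's own super-peer
        next_frontier = []
        for sp in frontier:
            hits |= local_hits(sp)
            for nb in sp_overlay.get(sp, []):    # Gnutella-like flooding among super-peers
                if nb not in seen:
                    seen.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return hits


clients_of = {"SP1": ["a", "b"], "SP2": ["c"], "SP3": ["d"]}
sp_overlay = {"SP1": ["SP2"], "SP2": ["SP1", "SP3"], "SP3": ["SP2"]}
shared = {"b": {"vogue"}, "c": {"vogue"}, "d": {"vogue"}}
print(sorted(super_peer_search(clients_of, sp_overlay, shared, "SP1", "vogue", ttl=1)))
```

In this example, client `a` sends its query only to `SP1`, which is the Napster-like step. With `ttl=1`, the flood reaches `SP2`, so clients `b` and `c` are found. Client `d`, which sits behind `SP3`, is not found.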
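OverNet • Uses all peers to build a distributed index • [Figure: each peer (W, X, Y, Z, …) is responsible for one range of the hash space (e.g., 0 to 10^6, 10^6 to 2×10^6, 3×10^6 to 4×10^6, 7×10^6 to 8×10^6); the index entry for a file titled “ABC” is stored at the peer whose range contains Hash(ABC) = 3561246]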
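Here is a minimal sketch of the hash-partitioned index. It assumes SHA-1 reduced modulo an 8×10^6 key space and a hypothetical assignment of four equal ranges to peers W, X, Y, Z. The figure does not specify its own hash function or full range layout. `key_of`, `responsible_peer`, and `publish` are illustrative names.

```python
import hashlib

KEY_SPACE = 8 * 10**6  # hash space 0 .. 8x10^6, as in the figure's ranges

# Hypothetical assignment of half-open key ranges [lo, hi) to peers.
RANGES = {"W": (0, 2 * 10**6), "X": (2 * 10**6, 4 * 10**6),
          "Y": (4 * 10**6, 6 * 10**6), "Z": (6 * 10**6, 8 * 10**6)}


def key_of(title: str) -> int:
    """Hash a title into [0, KEY_SPACE) (SHA-1 is an assumption)."""
    digest = hashlib.sha1(title.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEY_SPACE


def responsible_peer(key: int, ranges: dict[str, tuple[int, int]] = RANGES) -> str:
    """Return the peer whose range contains `key`."""
    for peer, (lo, hi) in ranges.items():
        if lo <= key < hi:
            return peer
    raise KeyError(f"no peer covers key {key}")


def publish(indexes: dict[str, dict[str, set[str]]], title: str, holder: str) -> str:
    """Record 'holder has title' in the index of the responsible peer."""
    peer = responsible_peer(key_of(title))
    indexes.setdefault(peer, {}).setdefault(title, set()).add(holder)
    return peer
```

A searcher computes `key_of(title)` the same way and goes straight to `responsible_peer(...)`, so no flooding is needed. Under these made-up ranges, the figure's example value 3561246 would land at `X`. The SHA-1-derived key for “ABC” need not equal that value.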
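OverNet: Searching • Given key k, which peer has the index? • Distributed Hash Table (DHT) • [Figure: identifier ring 0 to 31; peer 0 keeps links to peers 1, 2, 4, 8 and 16, and its search for k = 25 goes through peers 16 and 24 to 25]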
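The lookup in the figure can be reproduced with Chord-style links at power-of-two distances on a 32-position ring. This sketch assumes every identifier is a live peer and uses greedy forwarding. Real DHTs instead map a key to the nearest live peer. `fingers` and `route` are hypothetical names.

```python
RING_BITS = 5
RING_SIZE = 2 ** RING_BITS  # identifiers 0..31, as in the figure


def fingers(node: int) -> list[int]:
    """Links kept by `node`: the peers 1, 2, 4, 8 and 16 positions ahead."""
    return [(node + 2 ** i) % RING_SIZE for i in range(RING_BITS)]


def route(start: int, key: int) -> list[int]:
    """Greedy lookup: repeatedly follow the link that gets closest to `key`
    without passing it. Assumes every identifier is a live peer, so the
    peer whose identifier equals `key` holds that key's index entry."""
    if not 0 <= key < RING_SIZE:
        raise ValueError("key must be a ring identifier")
    path = [start]
    node = start
    while node != key:
        remaining = (key - node) % RING_SIZE
        usable = [f for f in fingers(node) if (f - node) % RING_SIZE <= remaining]
        node = max(usable, key=lambda f: (f - node) % RING_SIZE)
        path.append(node)
    return path


print(route(0, 25))  # [0, 16, 24, 25]
```

Each hop at least halves the remaining distance, so on this ring any key is reached in at most 5 hops.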
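BitTorrent • Downloading of a single file • [Figure: the file is split into blocks Blk 1, Blk 2, Blk 3, …, Blk n; the tracker tells a downloading peer which peers (here 2, 3, 6) to download from]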
BitTorrent: Downloading • Tit-for-Tat strategy • Choking Mechanism • Periodic un-choke • Rare blocks first • [Figure: block numbers listed for peers A, B, C in two snapshots: A: 1,2,3,4, B: 3,5, C: 2,3,4; then A: 1,2,3,4, B: 3, C: 4]
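Two of the mechanisms listed above can be sketched as follows: rare-blocks-first selection and tit-for-tat choking. This is a simplified illustration with hypothetical names. The caller chooses the number of upload slots and how often to re-run the choke decision (the periodic un-choke). Real clients use more detailed rules.

```python
import random
from collections import Counter
from typing import Optional


def rarest_first(have: set[int], neighbor_blocks: dict[str, set[int]]) -> Optional[int]:
    """Among blocks we lack that some neighbor offers, pick the one held by the
    fewest neighbors (ties broken by lowest block number); None if nothing is left."""
    counts = Counter(b for blocks in neighbor_blocks.values() for b in blocks)
    wanted = [b for b in counts if b not in have]
    if not wanted:
        return None
    return min(wanted, key=lambda b: (counts[b], b))


def choose_unchoked(upload_rate: dict[str, float], slots: int,
                    rng: random.Random) -> set[str]:
    """Tit-for-tat choking, simplified: upload to (unchoke) the `slots` peers that
    currently upload to us fastest, plus one random 'optimistic' unchoke so a
    newcomer can prove itself. All other peers stay choked. Meant to be re-run
    periodically with fresh rates."""
    ranked = sorted(upload_rate, key=lambda p: upload_rate[p], reverse=True)
    unchoked = set(ranked[:slots])
    others = ranked[slots:]
    if others:
        unchoked.add(rng.choice(others))
    return unchoked


neighbors = {"A": {1, 2, 3, 4}, "B": {3, 5}, "C": {2, 3, 4}}
print(rarest_first(set(), neighbors))  # 1
```

Suppose a peer has no blocks yet and its neighbors hold A: {1,2,3,4}, B: {3,5}, C: {2,3,4}. Blocks 1 and 5 are each held by only one neighbor, and the tie goes to the lower block number, so `rarest_first` picks block 1.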
Challenges • Performance, Performance, Performance! • Find rare/popular files quickly • Minimize maintenance cost • Spread workload evenly • Etc. • Zillions of heuristics/variants
Challenges (2) • Participation: Peers are selfish! • Do not want to “donate” bandwidth • Do not want to share their files • Do not care about others • Need some incentive mechanism!!
Challenges (3) • Authenticity of data • How do you know you have the right file? • Bogus copies • Corrupt copies • Need detection/correction mechanisms
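One common detection mechanism is to compare each downloaded block against a list of hashes obtained from a trusted source. This is similar in spirit to the per-piece hashes in BitTorrent metadata. The sketch below assumes SHA-256, a toy 4-byte block size, and hypothetical function names. How to obtain the trusted hash list securely is outside its scope.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size; real systems use far larger blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 hex digest of each fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def bad_blocks(received: bytes, trusted: list[str],
               block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks that do not match the trusted hash list.

    An empty result means the copy matches the trusted hashes; otherwise
    only the listed blocks need to be fetched again. Blocks present in
    only one of the two lists are also flagged.
    """
    got = block_hashes(received, block_size)
    bad = [i for i, (g, t) in enumerate(zip(got, trusted)) if g != t]
    bad += list(range(min(len(got), len(trusted)), max(len(got), len(trusted))))
    return bad


trusted = block_hashes(b"hello world!")      # published by a trusted source
print(bad_blocks(b"hellX world!", trusted))  # [1]
```

Only the flagged blocks need to be re-fetched from another peer, which turns detection into correction.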
Techniques • Performance • Routing Indices • Network Awareness • Participation • SLIC • Micropayments • Correctness • DoS Prevention • Reputation Systems
Routing Indices • [Figure: a query (“?”) arrives at a node: which neighbor should it be forwarded to?]
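Routing Indices (2) • [Figure: query “DB?” arrives at node 1, whose neighbors are nodes 2, 3 and 4. Each routing index maps a topic to the links that lead to matching documents. Node 1: DB → 2, 4; OS → 2; AI → 2, 3, 4; EE → 3. Node 2: DB → 5; OS → 5, 6, 7. Node 3: AI → 8, 9; EE → 10. Node 4: DB → 11, 13; AI → 11, 12. Documents at nodes 5 to 13: 5: DB, OS; 6: OS; 7: OS; 8: AI; 9: AI; 10: EE; 11: DB, AI; 12: AI; 13: DB]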
Routing Indices (3) • Benefits • Potentially reduce # messages • Drawbacks • Update cost (any time you have state) • Size of index
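A routing-index search can be sketched using the tables read off the figure, which is our interpretation of a garbled diagram. This is a simplified variant in which each entry lists the neighbors that lead to a topic. Richer routing indices can store other summaries, for example per-topic document counts per link. Names are hypothetical.

```python
# Routing index per node: topic -> neighbors through which matching documents
# can be reached (our reading of the slide's figure).
ROUTING_INDEX = {
    1: {"DB": [2, 4], "OS": [2], "AI": [2, 3, 4], "EE": [3]},
    2: {"DB": [5], "OS": [5, 6, 7]},
    3: {"AI": [8, 9], "EE": [10]},
    4: {"DB": [11, 13], "AI": [11, 12]},
}
# Topics of the documents stored at the leaf nodes.
LOCAL_TOPICS = {5: {"DB", "OS"}, 6: {"OS"}, 7: {"OS"}, 8: {"AI"}, 9: {"AI"},
                10: {"EE"}, 11: {"DB", "AI"}, 12: {"AI"}, 13: {"DB"}}


def ri_search(node: int, topic: str) -> tuple[set[int], int]:
    """Forward a query only along links whose routing index lists the topic.

    Returns (nodes holding matching documents, number of messages sent).
    """
    hits = {node} if topic in LOCAL_TOPICS.get(node, set()) else set()
    messages = 0
    for nb in ROUTING_INDEX.get(node, {}).get(topic, []):
        messages += 1                      # one message per link followed
        sub_hits, sub_msgs = ri_search(nb, topic)
        hits |= sub_hits
        messages += sub_msgs
    return hits, messages


print(ri_search(1, "DB"))  # ({5, 11, 13}, 5)
```

`ri_search(1, "DB")` finds nodes 5, 11 and 13 with 5 messages. Flooding this 13-node tree would send 12 messages. Keeping the tables current as documents change is the update cost listed above.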
Reputation Systems • [Figure: Alice asks “Who has file X?”; Bob replies “I do!” but sends file Y]
Reputation Systems • Have an “opinion list” • Based on personal experience? • Problem: sparse • [Figure: Node 0 and Nodes 1 to 8; Node 0's opinion of most other nodes is unknown (“?”)]
Reputation Systems • Have a “trust list” • Based on personal experience? • Problem: sparse • Ask friends • Efficient • Automatic • [Figure: a node's friends, Nodes 1, 2, 4, 6]
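The opinion-list and trust-list ideas can be combined. A peer could score another peer from its own experience when it has some. Otherwise, it could use a trust-weighted average of its friends' opinions. The weighting rule, the [0, 1] scale, and the names below are assumptions made for illustration.

```python
from typing import Optional


def reputation(target: str,
               my_opinions: dict[str, float],
               trust_in_friends: dict[str, float],
               friends_opinions: dict[str, dict[str, float]]) -> Optional[float]:
    """Score `target` in [0, 1]: use personal experience if we have any;
    otherwise ask friends and average their opinions, weighted by how much
    we trust each friend. Returns None if nobody we ask knows the target."""
    if target in my_opinions:
        return my_opinions[target]
    weighted, total_trust = 0.0, 0.0
    for friend, trust in trust_in_friends.items():
        opinion = friends_opinions.get(friend, {}).get(target)
        if opinion is not None:
            weighted += trust * opinion
            total_trust += trust
    return weighted / total_trust if total_trust > 0 else None


mine = {"Node 1": 0.9}
trust = {"Node 2": 1.0, "Node 4": 0.5}
theirs = {"Node 2": {"Node 7": 0.8}, "Node 4": {"Node 7": 0.2}}
print(reputation("Node 7", mine, trust, theirs))
```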
Micropayments • Only if you have money will people do things for you! • Like a vending machine • Goods are cheap • Security can’t be too expensive
Micropayments • Server is needed… • Handle accounts • Distribute and cash coins • Security • [Figure: the central server ($) is a scalability and performance bottleneck]
Micropayments • Peers can do work too! • Challenge: SECURITY
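The server's role in micropayments covers accounts, minting and cashing coins, and security. It can be sketched with a hypothetical broker that authenticates coins with an HMAC and refuses any serial number it has already redeemed. This is an illustration only. It omits transfers between peers, persistence, and the cryptographic details of real micropayment schemes.

```python
import hashlib
import hmac
import secrets


class Broker:
    """Hypothetical micropayment server: mints coins for account holders and
    redeems each coin at most once (to stop double spending)."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # known only to the broker
        self._spent: set[str] = set()
        self.balances: dict[str, int] = {}

    def _tag(self, serial: str) -> str:
        # only the broker can compute a valid tag, so coins cannot be forged
        return hmac.new(self._key, serial.encode(), hashlib.sha256).hexdigest()

    def mint(self, account: str) -> tuple[str, str]:
        """Debit one unit from `account` and hand back a coin (serial, tag)."""
        if self.balances.get(account, 0) < 1:
            raise ValueError("insufficient funds")
        self.balances[account] -= 1
        serial = secrets.token_hex(16)
        return serial, self._tag(serial)

    def redeem(self, coin: tuple[str, str], account: str) -> bool:
        """Credit `account` if the coin is genuine and not yet spent."""
        serial, tag = coin
        if not hmac.compare_digest(tag, self._tag(serial)) or serial in self._spent:
            return False
        self._spent.add(serial)
        self.balances[account] = self.balances.get(account, 0) + 1
        return True
```

Every `redeem` call goes through the one broker. This shows why the server becomes a scalability and performance bottleneck.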
SLIC: Link-based Incentive • Use quality of service as incentive • [Figure: nodes A and B link Fragment A and Fragment B of the network] • They need each other to reach more nodes ⇒ can retaliate
SLIC (2) • [Figure: node A with neighbors B, C, D and link weights W(A,B), W(A,C), W(A,D)] • Adjust weights, and use them to reward good neighbors and penalize bad ones
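The weight adjustment can be sketched as an exponential moving average of the service each neighbor provides. Query-processing capacity is then shared in proportion to the weights. The service measure, the smoothing factor `alpha`, and the proportional split are assumptions, not necessarily SLIC's exact rules.

```python
def update_weights(weights: dict[str, float], service: dict[str, float],
                   alpha: float = 0.5) -> dict[str, float]:
    """Move each neighbor's weight W(A, n) toward the service it recently gave
    us (e.g., the fraction of our queries it forwarded or answered)."""
    return {n: (1 - alpha) * w + alpha * service.get(n, 0.0)
            for n, w in weights.items()}


def allocate_capacity(weights: dict[str, float], capacity: float) -> dict[str, float]:
    """Split our query-processing capacity among neighbors in proportion to
    their weights: good neighbors get better service, bad ones get less."""
    total = sum(weights.values())
    if total == 0:
        return {n: 0.0 for n in weights}
    return {n: capacity * w / total for n, w in weights.items()}


w = update_weights({"B": 1.0, "C": 1.0, "D": 1.0}, {"B": 1.0, "C": 0.5, "D": 0.0})
print(w)                         # {'B': 1.0, 'C': 0.75, 'D': 0.5}
print(allocate_capacity(w, 90))  # B gets the most capacity, D the least
```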
Network Awareness • Overlay network can be poor! • [Figure: overlay links among peers in Palo Alto, San Francisco, and Timbuktu, Mali, Africa]
Network Awareness (2) • Form only “good” links • Probe a few and pick the best • [Figure: peers in Palo Alto, San Francisco, and Timbuktu, Mali, Africa]
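The rule “Probe a few and pick the best” can be sketched as follows: sample k candidates, measure each, and link to the fastest. The probe function and the latency numbers are hypothetical.

```python
import random
from typing import Callable


def pick_neighbor(candidates: list[str], probe: Callable[[str], float],
                  k: int, rng: random.Random) -> str:
    """Probe up to k randomly chosen candidates (probe() returns, e.g., a
    measured round-trip time) and return the one with the lowest value."""
    sample = rng.sample(candidates, min(k, len(candidates)))
    return min(sample, key=probe)


# Hypothetical round-trip times (ms) as seen from a new peer in Palo Alto.
rtt_ms = {"Palo Alto": 2.0, "San Francisco": 8.0, "Timbuktu": 400.0}
print(pick_neighbor(list(rtt_ms), rtt_ms.__getitem__, k=3, rng=random.Random(1)))
```

Probing only k candidates bounds the measurement cost, at the price of sometimes missing the closest peer.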
Network Awareness (3) • “Swap” peers around • [Figure: peers in Palo Alto, San Francisco, and Timbuktu, Mali, Africa]
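One possible reading of “swapping” peers: two peers exchange neighbors when doing so shortens their links. The sketch assumes a symmetric round-trip-time table. All values are made up for a Palo Alto peer (PA), a San Francisco peer (SF), and two Timbuktu peers (TB1, TB2).

```python
from typing import Callable

# Hypothetical symmetric round-trip times (ms) between peers in Palo Alto (PA),
# San Francisco (SF) and Timbuktu (TB1, TB2).
RTT = {frozenset(pair): ms for pair, ms in [
    (("PA", "SF"), 10.0), (("PA", "TB1"), 300.0),
    (("TB2", "SF"), 300.0), (("TB2", "TB1"), 20.0),
]}


def rtt(u: str, v: str) -> float:
    return RTT[frozenset((u, v))]


def should_swap(a: str, x: str, b: str, y: str,
                latency: Callable[[str, str], float]) -> bool:
    """Peer a links to x and peer b links to y. Swapping gives links a-y and
    b-x; accept the swap only if it lowers the total link latency."""
    before = latency(a, x) + latency(b, y)
    after = latency(a, y) + latency(b, x)
    return after < before


print(should_swap("PA", "TB1", "TB2", "SF", rtt))  # True: 600 ms -> 30 ms
```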
Denial of Service • Malicious peers can flood queries on unstructured networks • Rate limit • Incentive • Micropayments
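Rate limiting is commonly implemented with a token bucket per neighbor. This is one standard choice, not necessarily the one the slide has in mind. The sketch takes timestamps (in seconds) from the caller so that it is deterministic. `rate` and `burst` are hypothetical parameters.

```python
class TokenBucket:
    """Per-neighbor rate limiter: a neighbor may send `rate` queries per second
    on average, with bursts of up to `burst` queries. Timestamps passed to
    allow() must not decrease."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate, self.burst = rate, burst
        self.tokens = burst   # start full
        self.last = 0.0       # time of the previous call

    def allow(self, now: float) -> bool:
        """Refill tokens for the time elapsed since the last call, then admit
        the query if a whole token is available (otherwise drop it)."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2.0, burst=3.0)
print([bucket.allow(0.0) for _ in range(5)])  # [True, True, True, False, False]
```

With `rate=2.0` and `burst=3.0`, the first three back-to-back queries at time 0 are accepted. Further queries are dropped until enough tokens have refilled, which takes half a second per query.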
Denial of Service (2) • Malicious peers can drop queries and indices in structured networks • Tracing/Audit • Reorganization • Alternate paths
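Routing around peers that drop messages can be sketched as trying alternate routes in turn. The sketch assumes two things it does not model: how the alternate routes are computed, and how a dropping peer is detected (the tracing/audit part). `forward` is a hypothetical callback that reports whether a hop passed the message on.

```python
from typing import Callable, Optional, Sequence


def lookup_with_alternates(paths: Sequence[Sequence[str]],
                           forward: Callable[[str], bool]) -> Optional[Sequence[str]]:
    """Try candidate routes in order; a route fails if any hop on it drops the
    message (forward(peer) is False). Return the first route that delivers,
    or None if every route runs into a malicious or failed peer."""
    for path in paths:
        if all(forward(peer) for peer in path):
            return path
    return None


malicious = {"M"}
routes = [["P1", "M", "K"], ["P2", "P3", "K"]]
print(lookup_with_alternates(routes, lambda p: p not in malicious))  # ['P2', 'P3', 'K']
```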
Concluding Remarks • P2P provides a cheap infrastructure for leveraging the capacities of the masses. • P2P’s “openness” is both its strength and its weakness.