
Handling Churn in Less-structured P2P Systems: Elders Know Best



Presentation Transcript


  1. Handling Churn in Less-structured P2P Systems: Elders Know Best
     Yi Qiao & Fabián E. Bustamante
     Department of Electrical Engineering & Computer Science, Northwestern University
     {yqiao,fabianb}@cs.northwestern.edu

  2. Toward Massively Distributed Systems
     [Image caption: John Lennon, 1940-1980]
     • What scale may bring
       • Virtually infinite resources always available
       • Information everywhere at anytime
       • Power to the people!
     • … but not for free
       • Resource management
       • Heterogeneity
       • Naming
       • Administration
       • Measurement, testing & debugging in the midst of chaos
     Qiao & Bustamante, EE&CS, Northwestern U. IEEE P2P 2005

  3. Peers’ Transiency (a.k.a. Churn)
     • The problem with peers’ transiency
       • Very large peer populations
       • Autonomous nature of peers
       • Architectural mutual dependencies of P2P systems
       • Median session length from 1 hour to 1 minute [Saroiu ’02], [Bustamante ’03], [Rhea ’04] …
     • Why should you care?
       • E.g., for data-sharing applications: control traffic cost, spread of queries, cache effectiveness, degree of replication, …

  4. Peer Lifespan Distribution
     • Active probing of ~1 million peers’ lifespans
     • RCDF of peers with lifespan in [~22 min, 3.5 days]
     • Pareto distribution of the form λT^k (k < 0)
     • The Lifespan Approach: a peer’s expected remaining session length is proportional to the peer’s age
     • Basis for churn-resilient protocols and strategies!
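The proportionality between age and expected remaining session length is a standard property of Pareto-distributed lifetimes. A short derivation, under two assumptions not stated on the slide: the RCDF is $P(T > t) = \lambda t^{k}$ over the range of interest, and $k < -1$ so that the mean is finite (the slide only says $k < 0$). Here $a$ denotes the peer's current age.

```latex
\begin{align*}
P(T > t \mid T > a) &= \frac{\lambda t^{k}}{\lambda a^{k}} = \left(\frac{t}{a}\right)^{k}, \qquad t \ge a,\\[4pt]
E[\,T - a \mid T > a\,] &= \int_{a}^{\infty} \left(\frac{t}{a}\right)^{k} dt
  = a \int_{1}^{\infty} u^{k}\, du
  = \frac{a}{-(k+1)} = \frac{a}{|k| - 1}.
\end{align*}
```

Under these assumptions the expected remaining lifetime grows linearly with age $a$, which is the sense in which older peers are the better bet.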

  5. Outline
     • Motivation & background
     • Lifespan-based protocols and strategies
       • Organizational protocols
       • Query-related strategies
     • Evaluation
     • Conclusions

  6. Organizational Protocols
     • The way a peer-to-peer system is structured
       • Unstructured (UDP): all peers equal; e.g., Gnutella v0.4
       • Loosely structured (HDP): leaf & super-peers; e.g., Gnutella v0.6, Kazaa
       • Highly structured (DHT)
     • Lifespan-based organizational protocols
       • Opt for longer-lived peers when choosing neighbors and/or recommending peers to others [Bustamante02]
       • Lifespan UDP (LUDP): opt for older peers for connections; random recommendations
       • Lifespan HDP (LHDP): leaf and super-peers opt for older super-peers for connections
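The neighbor-selection idea on this slide ("opt for older peers for connections") can be sketched in a few lines. This is only an illustration: the `Peer` record, its `age_s` field, and the strict oldest-first ranking are assumptions made here, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    peer_id: str
    age_s: float  # hypothetical field: seconds since the peer joined


def pick_neighbors(candidates, max_neighbors):
    """Return up to max_neighbors candidates, oldest first."""
    ranked = sorted(candidates, key=lambda p: p.age_s, reverse=True)
    return ranked[:max_neighbors]
```

A real client would probably mix this ranking with other criteria (for example, connection capacity); the sketch isolates the age preference only.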

  7. Query, Caching & Replication Strategies
     • Flooding
       • Query is propagated to all neighbors within a radius
       • Inherently unscalable
     • k-random walks
       • k parallel query messages randomly forwarded at each hop [Lv02]
       • Improvement factoring in node’s degree [Adamic01], capacity [Lv03], …
     • Lifespan-based k-random walk query
       • Opt for older peers when forwarding a query walker
       • A simple weighted probabilistic approach works well
         • Avoids collision between walkers
         • Prevents hot spots
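One way to read the "simple weighted probabilistic approach" is to forward each walker to a neighbor chosen with probability proportional to that neighbor's age. The sketch below assumes exactly that weighting; the function name, the `came_from` exclusion, and the fallback to a uniform choice when all ages are zero are illustrative choices, not the authors' specification.

```python
import random


def next_hop(neighbor_ages, came_from=None, rng=random):
    """Pick the next hop for a query walker.

    neighbor_ages maps neighbor id -> age in seconds (assumed known).
    The neighbor the walker arrived from is skipped.
    """
    options = {p: a for p, a in neighbor_ages.items() if p != came_from}
    if not options:
        return None
    peers = list(options)
    weights = [options[p] for p in peers]
    if sum(weights) <= 0:
        # No usable age information: fall back to a uniform choice.
        return rng.choice(peers)
    # Probability of picking a neighbor is proportional to its age.
    return rng.choices(peers, weights=weights, k=1)[0]
```

Because the choice is randomized rather than "always the oldest", parallel walkers tend to spread over different neighbors, which fits the slide's points about avoiding walker collisions and hot spots.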

  8. Query, Caching & Replication Strategies
     • Neighbor Caching with incremental Update (NCU)
     • Path Caching with eXpiration (PCX) [Roussopoulos03]
       • Effectiveness not obvious for less-structured systems
     • Regional Caching with eXpiration (RCX) - new
       • Peers in query hit path push query hit entries to some of their neighbors
     • Lifespan-based RCX
       • Caching in older neighbors along the path
       • Expiration threshold for cached entries is based on age of target peer
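The slide says only that the expiration threshold is "based on" the target peer's age. A minimal sketch of one possible rule follows, assuming the time-to-live is a fixed fraction of the target's age, clamped to a range. The class name, `factor`, `min_ttl`, and `max_ttl` are all hypothetical parameters introduced for illustration.

```python
class AgeAwareCache:
    """Cache of query-hit entries whose lifetime scales with target age."""

    def __init__(self, factor=0.5, min_ttl=60.0, max_ttl=86400.0):
        self.factor = factor    # assumed: TTL = factor * target age
        self.min_ttl = min_ttl  # assumed lower clamp, in seconds
        self.max_ttl = max_ttl  # assumed upper clamp, in seconds
        self._entries = {}      # key -> (target_peer, expiry_time)

    def put(self, key, target_peer, target_age_s, now):
        ttl = min(max(self.factor * target_age_s, self.min_ttl), self.max_ttl)
        self._entries[key] = (target_peer, now + ttl)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        target_peer, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # drop the expired entry lazily
            return None
        return target_peer
```

The intuition matches the lifespan argument: an entry pointing at an old peer is likely to stay valid longer, so it can be kept longer.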

  9. Query, Caching & Replication Strategies
     • Simple replication: make replicas on requesters
     • Proactive replication (path replication) puts more replicas on multiple peers
     • Regional replication: more effective than path replication
       • Put replicas on some neighbors of each peer along the query path
     • Lifespan-based Regional Replication (LRRep)
       • Opt for in-the-path-region peers’ older neighbors for placing replicas
       • Upper bound for number of replicas each peer can store
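The LRRep placement rule can be sketched as follows: for each peer on the query path, look at its neighbors, skip those already at the replica upper bound, and prefer the oldest. The function signature, the `per_peer` count, and the dictionary-based inputs are assumptions made for this illustration; the slides do not give these details.

```python
def place_replicas(path, neighbors, ages, stored, per_peer=1, max_replicas=10):
    """Choose replica holders near a query path, favoring older neighbors.

    path         -- peer ids along the query path
    neighbors    -- peer id -> list of neighbor ids
    ages         -- peer id -> age in seconds (assumed known)
    stored       -- peer id -> number of replicas already held
    per_peer     -- replicas to place per path peer (hypothetical knob)
    max_replicas -- upper bound on replicas a peer may store
    """
    chosen = []
    for peer in path:
        candidates = [
            n for n in neighbors.get(peer, [])
            if stored.get(n, 0) < max_replicas and n not in chosen
        ]
        # Oldest neighbors first.
        candidates.sort(key=lambda n: ages.get(n, 0.0), reverse=True)
        chosen.extend(candidates[:per_peer])
    return chosen
```

The function only selects holders and leaves `stored` untouched, so the caller decides when the replica counts are actually updated.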

  10. Determining Peer’s Age
      • Effectiveness of lifespan-based approach depends on
        • Fitness of session length estimators
        • Accuracy of peers’ age information …
      • A lightweight distributed protocol for age determination
      • Some good characteristics
        • Age never directly requested from peer itself
        • Trimming/sampling reduces the probability of small cabals
      • P trying to determine C’s age
        1. Witness collection: get from C a list of potential witnesses & interaction windows
        2. Witness sampling & trimming
           a. Trim witnesses with suspiciously large interaction windows
           b. Sample final list W
        3. Collecting testimonies & determining age
           a. Validate C’s reported interaction windows by asking peers in W
           b. Determine C’s age
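The three protocol steps can be sketched as a single function run by P. Several pieces are assumptions introduced here because the slide does not specify them: the `confirm` callback standing in for asking a witness over the network, the `max_window` trimming threshold, and the final rule that C's age is the time since the earliest confirmed interaction.

```python
import random


def estimate_age(claims, confirm, now, max_window, sample_size, rng=random):
    """Estimate peer C's age from witness testimonies.

    claims      -- witness id -> (start, end) interaction window reported by C
    confirm     -- callable(witness, window) -> bool; hypothetical stand-in
                   for querying the witness
    max_window  -- windows longer than this are treated as suspicious
    sample_size -- number of witnesses to actually contact
    """
    # Step 2a: trim witnesses with suspiciously large interaction windows.
    trimmed = [w for w, (start, end) in claims.items()
               if end - start <= max_window]
    # Step 2b: sample the final witness list W.
    sample = rng.sample(trimmed, min(sample_size, len(trimmed)))
    # Step 3a: keep only windows that the witness confirms.
    starts = [claims[w][0] for w in sample if confirm(w, claims[w])]
    if not starts:
        return None  # no testimony could be validated
    # Step 3b (assumed rule): age is time since the earliest confirmed contact.
    return now - min(starts)
```

Note that C supplies the witness list but never its own age, which mirrors the slide's point that age is never requested directly from the peer itself.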

  11. Outline
      • Motivation & background
      • Lifespan-based protocols and strategies
        • Organizational protocols
        • Query-related strategies
      • Evaluation
      • Conclusions
      [Diagram labels: Query, Caching, Replication, Organizational protocol]

  12. Evaluation Setups
      • Simulation
        • Simulations driven by 4 of the 20 lifespan traces
        • ~150,000 peers, 3,000-4,000 online at any time
        • 4 query walkers, with TTL = 20
        • Simulated time 511,000 s (~6 days)
      • Wide-area
        • Modified open-source Gnutella client
        • 150 PlanetLab nodes
        • 200-300 online peers during experiment
        • 3 query walkers, with TTL = 7
        • Simulated time 511,000 s (~6 days)

  13. Basic Advantages of Lifespan Approach (Simulation)
      • Compared: Random Unstructured (UDP) vs. Lifespan-based Unstructured (LUDP), each with k-random-walk query (RQuery) and simple replication (SRep)
      • LUDP has 50-70% shorter query resolution time than UDP
      • … and 50-70% more query hits

  14. Basic Advantages of Lifespan Approach (Wide-Area)
      • Compared: Random Unstructured (UDP) vs. Lifespan-based Unstructured (LUDP), each with k-random-walk query (RQuery) and simple replication (SRep)
      • Comparable results in wide-area experiments
      • LUDP delivers >40% more query hits than UDP

  15. Basic Advantages of Lifespan Approach (Simulation)
      • And with hierarchical protocols
      • Compared: Random Hierarchical (HDP) vs. Lifespan-based Hierarchical (LHDP), each with k-random-walk query (RQuery) and simple replication (SRep)
      • Significantly faster query: 3x faster for 50% of queries
      • … and more query hits

  16. Lifespan-based Query-related Strategies (Wide-Area)
      • Compared, both on Unstructured (UDP): k-random-walk query (RQuery) with regional replication (RRRep) vs. lifespan k-random-walk query (LQuery) with lifespan-based regional replication (LRRep)
      • Significantly faster query: 2-3x faster for 50% of queries
      • … and more query hits

  17. Combined Strengths (Simulation)
      • Compared: Random Hierarchical (HDP) with k-random-walk query (RQuery) and regional replication (RRRep) vs. Lifespan-based Hierarchical (LHDP) with lifespan k-random-walk query (LQuery) and lifespan-based regional replication (LRRep)
      • >4x faster query resolution times
      • … and 3x improvement on query hits

  18. Conclusions & Future Work
      • Need to address churn resilience in massively distributed systems
      • Lifespan is a good basis for structurally resilient systems
      • Illustrative lifespan-based organizational protocols & strategies
      • Demonstrated effectiveness through trace-driven simulations & wide-area experiments
        • Lower control overhead
        • Faster query resolution
        • Higher query hits
      • Currently applying similar ideas to build structurally churn-resilient DHT systems


  20. Basic Advantages of Lifespan Approach (Simulation)
      • Relative query satisfaction: the percentage of queries achieving z-satisfaction (i.e., at least z query hits)
      • Why is lifespan-based LUDP better?
        • Queries are more likely to reach older peers,
          • which store more replicas,
          • cache indexes longer, and
          • are much less likely to break down query/reply paths
      • Using PCX, LUDP results in faster query resolution.
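The relative query satisfaction metric defined on this slide is straightforward to compute from per-query hit counts. The sketch below is an illustration only; the function name and the choice to return 0.0 for an empty input are assumptions.

```python
def query_satisfaction(hit_counts, z):
    """Percentage of queries that received at least z query hits."""
    if not hit_counts:
        return 0.0  # assumed convention for an empty workload
    satisfied = sum(1 for hits in hit_counts if hits >= z)
    return 100.0 * satisfied / len(hit_counts)
```

For instance, with hit counts `[0, 1, 3, 5]` and `z = 3`, two of the four queries qualify, giving 50.0.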

  21. Lifespan-based Query-related Strategies (Simulation)
      • Compared, both on Unstructured (UDP) with simple replication (SRep): k-random-walk query (RQuery) vs. lifespan k-random-walk query (LQuery)
      • Just from query: ~100% improvement on query resolution time & hit numbers

  22. Lifespan-based Query-related Strategies (Simulation)
      • Compared, both on Unstructured (UDP): k-random-walk query (RQuery) with regional caching (RRCX) and regional replication (RRRep) vs. lifespan k-random-walk query (LQuery) with lifespan-based regional caching (LRCX) and lifespan-based regional replication (LRRep)
      • Median query hit number: 25 to 60
      • 90% query resolution time: 0.2 sec to 0.55 sec
