
Unstructured P2P overlay


Presentation Transcript


  1. Unstructured P2P overlay

  2. P2P Application • Centralized model • e.g. Napster • global index held by central authority • direct contact between requestors and providers • Decentralized model • e.g. Gnutella, Freenet, Chord • no global index – local knowledge only (approximate answers) • contact mediated by chain of intermediaries

  3. Gnutella search mechanism • Each peer keeps a list of other peers that it knows about • These are its neighbors • Increasing the degree of the peers reduces the longest path from one peer to another, but requires more storage at each peer • Once a peer is connected to the overlay, it can exchange messages with the peers in its neighbor list.

  4. Gnutella Search Mechanism [Figure: seven-node overlay, nodes 1-7] • Steps: • Node 2 initiates search for file A

  5. Gnutella Search Mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors

  6. Gnutella Search Mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message

  7. Gnutella Search Mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A (here nodes 5 and 7) initiate a reply message

  8. Gnutella Search Mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated

  9. Gnutella Search Mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated

  10. Gnutella Search Mechanism • Steps: • Node 2 initiates search for file A • Sends message to all neighbors • Neighbors forward message • Nodes that have file A initiate a reply message • Query reply message is back-propagated • File download: node 2 downloads A

  11. Scalability • Whenever a node receives a message (ping/query), it sends copies out to all of its other connections. • Existing mechanisms to reduce traffic: • TTL counter • Nodes cache information about messages they have received, so that they don't forward duplicated messages.

  12. Gnutella Search Mechanism
  FloodForward(Query q, Source p)
    if (q.id ∈ oldIdsQ) return
    oldIdsQ = oldIdsQ ∪ {q.id}
    q.TTL = q.TTL - 1
    if (q.TTL <= 0) return
    foreach (s ∈ Neighbors)
      if (s ≠ p) send(s, q)
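The pseudocode above can be sketched in runnable form, simulating the node-2/file-A example of slides 4-10. The seven-node adjacency list is illustrative (the slides' figure is not reproduced here); nodes 5 and 7 hold file A, matching the A:5 and A:7 replies.

```python
# A runnable sketch of FloodForward: duplicate suppression via a
# seen-ids cache plus a per-hop TTL decrement. Topology is illustrative.
from collections import deque

topology = {1: [2, 4], 2: [1, 3, 6], 3: [2, 5], 4: [1, 7],
            5: [3], 6: [2], 7: [4]}
files = {5: {"A"}, 7: {"A"}}   # nodes 5 and 7 hold file A

def flood_search(source, wanted, ttl=7):
    """Flood a query from `source`; return the set of nodes that reply."""
    seen = {source}                    # plays the role of oldIdsQ
    hits = set()
    queue = deque([(source, ttl)])
    while queue:
        node, t = queue.popleft()
        if wanted in files.get(node, set()):
            hits.add(node)             # node initiates a QueryHit reply
        if t <= 0:
            continue                   # TTL exhausted: do not forward
        for nbr in topology[node]:
            if nbr not in seen:        # duplicate suppression
                seen.add(nbr)
                queue.append((nbr, t - 1))
    return hits

print(flood_search(2, "A"))   # -> {5, 7}
```

Note that every node the flood reaches processes the query, which is exactly the traffic problem the next slide quantifies.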

  13. Total Generated Traffic • Ripeanu has determined that Gnutella traffic totals 1 Gbps (or 330 TB/month)! • Compare to 15,000 TB/month in the US Internet backbone (Dec. 2000) • This estimate excludes actual file transfers • Reasoning: • QUERY and PING messages are flooded; they form more than 90% of generated traffic • Predominant TTL = 7
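As a quick sanity check on the figure above (assuming a 30-day month):

```python
# Back-of-the-envelope check of the 1 Gbps ~ 330 TB/month estimate.
bits_per_second = 1e9                       # 1 Gbps
seconds_per_month = 30 * 24 * 3600
tb_per_month = bits_per_second / 8 * seconds_per_month / 1e12
print(round(tb_per_month))                  # -> 324, close to the quoted 330
```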

  14. Mapping between Gnutella Network and Internet Infrastructure [Figure: nodes A-H, a perfect mapping between the overlay and the underlying network]

  15. Mismatch between Gnutella Network and Internet Infrastructure • Inefficient mapping • Link D-E needs to support six times higher traffic. [Figure: nodes A-H with an inefficient overlay-to-network mapping]

  16. Free Riding on Gnutella • 70% of Gnutella users share no files • 90% of users answer no queries • Those who have files to share may limit number of connections or upload speed, resulting in a high download failure rate. • If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.

  17. Query Expressiveness • Format of query not standardized • No standard format or matching semantics for the QUERY string. Its interpretation is completely determined by each node that receives it. • String literal vs. regular expression • Directory name, filename, or file contents • Malicious users may even return files unrelated to the query

  18. Conclusions • Gnutella is a self-organizing, large-scale P2P application that produces an overlay network on top of the Internet; it appears to work • Strengths: freedom, no central point of control • Weaknesses: high network traffic cost, scalability, file availability

  19. Random Walk • To avoid the message overhead of flooding, unstructured overlays can use some type of random walk. • A single query message is sent to a randomly selected neighbor • The message has a TTL that is decremented at each hop • Termination • The query locates a node with the desired object • Search timeout

  20. Random Walk • To improve the response time, several random walk queries can be issued in parallel.
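The two slides above can be sketched as follows. Each walker forwards a single query message to one randomly chosen neighbor per hop until it finds the object or its TTL expires; the topology and file placement are illustrative, not taken from the slides.

```python
# A sketch of random-walk search: k walkers, each with a TTL,
# terminating on a hit or when all walkers expire.
import random

topology = {1: [2, 4], 2: [1, 3, 6], 3: [2, 5], 4: [1, 7],
            5: [3], 6: [2], 7: [4]}
files = {5: {"A"}, 7: {"A"}}   # nodes 5 and 7 hold file A

def random_walk_search(source, wanted, walkers=4, ttl=20, rng=random):
    """Return the first node found holding `wanted`, or None if every
    walker's TTL expires (search timeout)."""
    for _ in range(walkers):           # issued in parallel in a real system
        node = source
        for _ in range(ttl):           # TTL decremented at each hop
            node = rng.choice(topology[node])
            if wanted in files.get(node, set()):
                return node            # query located the object
    return None
```

Compared with flooding, total traffic is bounded by walkers × TTL messages instead of growing with the size of the flooded neighborhood, at the price of a longer and less predictable search time.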

  21. Some References • [1] Eytan Adar and Bernardo A. Huberman, Free Riding on Gnutella http://www.firstmonday.dk/issues/issue5_10/adar/ • [2] Igor Ivkovic, Improving Gnutella Protocol: Protocol Analysis And Research Proposals http://www9.limewire.com/download/ivkovic_paper.pdf • [3] Jordan Ritter, Why Gnutella Can't Scale. No, Really. http://www.monkey.org/~dugsong/mirror/gnutella.html • [4] Matei Ripeanu, Peer-to-Peer Architecture Case Study: Gnutella network. http://www.cs.uchicago.edu/%7Ematei/PAPERS/gnutella-rc.pdf • [5] The Gnutella Protocol Specification v0.4 http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf

  22. Improving on Flooding and Random Walk

  23. Peer-to-peer Networks • Peers are connected by an overlay network. • Users cooperate to share files (e.g., music, videos, etc.)

  24. Overview • Disadvantages • Flooding: does not scale • Random walks: take a long time to find an object • Key ideas to improve performance • Query forwarding criteria • Overlay topology • Object placement

  25. Overview • Query forwarding • Use additional knowledge about where the object is likely to be • Overlay topology • Proximity of the peers in the network • Connecting with high-degree nodes • Shared properties of the peers • Object placement • Object popularity

  26. Overview • Metrics • Overlay hops • One overlay hop may correspond to many network hops! • Request hit rate • Latency

  27. Topics • Search strategies • Beverly Yang and Hector Garcia-Molina, “Improving Search in Peer-to-Peer Networks”, ICDCS 2002 • Arturo Crespo, Hector Garcia-Molina, “Routing Indices For Peer-to-Peer Systems”, ICDCS 2002 • Shortcuts • Kunwadee Sripanidkulchai, Bruce Maggs and Hui Zhang, “Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems”, INFOCOM 2003 • Replication • Edith Cohen and Scott Shenker, “Replication Strategies in Unstructured Peer-to-Peer Networks”, SIGCOMM 2002

  28. Improving Search in Peer-to-Peer Networks ICDCS 2002 Beverly Yang Hector Garcia-Molina

  29. Motivation • The purpose of a data-sharing P2P system is to accept queries from users, and locate and return data (or pointers to the data). • Metrics • Cost • Average aggregate bandwidth • Average aggregate processing cost • Quality of results • Number of results • Satisfaction: a query is satisfied if Z (a value specified by the user) or more results are returned • Time to satisfaction

  30. Current Techniques • Gnutella • BFS with depth limit D • Wastes bandwidth and processing resources • Freenet • DFS with depth limit D • Poor response time

  31. Broadcast policies • Iterative deepening (Expanding ring) • Directed BFS

  32. Iterative Deepening • In systems where satisfaction is the metric of choice, iterative deepening is a good technique • Under policy P = {a, b, c} and waiting time W: • A source node S first initiates a BFS of depth a • The query is processed and then becomes frozen at all nodes that are a hops from the source • S waits for a time period W

  33. Iterative Deepening • If the query is not satisfied, S starts the next iteration, initiating a BFS of depth b • S sends a “Resend” message with a TTL of a • A node that receives a Resend message will • simply forward the message, or • if the node is at depth a, drop the Resend message and unfreeze the corresponding query by forwarding the query message with a TTL of b - a to all its neighbors
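The policy above can be sketched as follows. Real nodes freeze the query at the previous frontier and resume it on a Resend; this simulation simply re-runs a deeper BFS, which finds the same result set. The overlay, file placement, policy P, and threshold Z are illustrative.

```python
# A sketch of iterative deepening: BFS to depth a, and if fewer than
# z results arrive, deepen to b, then c.
topology = {1: [2, 4], 2: [1, 3, 6], 3: [2, 5], 4: [1, 7],
            5: [3], 6: [2], 7: [4]}
files = {5: {"A"}, 7: {"A"}}   # nodes 5 and 7 hold file A

def bfs_hits(source, wanted, depth):
    """Nodes within `depth` hops of `source` that hold `wanted`."""
    seen, hits, frontier = {source}, set(), [source]
    for _ in range(depth):
        nxt = []
        for node in frontier:
            for nbr in topology[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    nxt.append(nbr)
                    if wanted in files.get(nbr, set()):
                        hits.add(nbr)
        frontier = nxt
    return hits

def iterative_deepening(source, wanted, policy=(1, 2, 3), z=1):
    """Deepen the BFS through policy P until at least z results are
    found (satisfaction); return (depth used, hits)."""
    hits = set()
    for depth in policy:
        hits = bfs_hits(source, wanted, depth)
        if len(hits) >= z:          # satisfied: stop iterating
            return depth, hits
    return policy[-1], hits

print(iterative_deepening(2, "A"))   # -> (2, {5})
```

With z = 1 the search stops at depth 2, sparing the nodes beyond; raising z forces the deeper (and costlier) iterations.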

  34. Directed BFS • If minimizing response time is important to an application, iterative deepening may not be appropriate • A source sends query messages to just a subset of its neighbors • A node maintains simple statistics on its neighbors • Number of results received from each neighbor • Latency of the connection

  35. Directed BFS (cont) • Candidate neighbors • The neighbor that returned the highest number of results • The neighbor whose response messages have taken the lowest average number of hops
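The selection heuristic above can be sketched in a few lines. The per-neighbor statistics below are made up for illustration; a node would accumulate them from past query responses.

```python
# A sketch of Directed BFS neighbor selection: forward the query only
# to the k statistically "best" neighbors instead of all of them.

neighbor_stats = {
    # neighbor id: (results returned so far, avg. hops of its replies)
    "n1": (42, 2.1),
    "n2": (7, 4.5),
    "n3": (30, 1.8),
}

def pick_neighbors(stats, k=2):
    """Rank neighbors by results returned (descending), breaking ties
    by lowest average reply hop count, and keep the top k."""
    ranked = sorted(stats, key=lambda n: (-stats[n][0], stats[n][1]))
    return ranked[:k]

print(pick_neighbors(neighbor_stats))   # -> ['n1', 'n3']
```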

  36. Routing Indices For Peer-to-Peer Systems Arturo Crespo, Hector Garcia-Molina Stanford University {crespo,hector}@db.Stanford.edu

  37. Motivation • A distributed-index mechanism • Routing Indices (RIs) • Give a “direction” towards the document, rather than its actual location • By using “routes” the index size is proportional to the number of neighbors

  38. Peer-to-peer Systems • A P2P system is formed by a large number of nodes that can join or leave the system at any time • Each node has a local document database that can be accessed through a local index • The local index receives content queries and returns pointers to the documents with the requested content

  39. Routing indices • The objective of a Routing Index (RI) is to allow a node to select the “best” neighbors to which to send a query • An RI is a data structure that, given a query, returns a list of neighbors ranked according to their goodness for the query • Each node has a local index for quickly finding local documents when a query is received. Nodes also have a Compound Routing Index (CRI) containing • the number of documents along each path • the number of documents on each topic

  40. Routing indices (cont.) • Thus, the number of results in a path can be estimated as: NumDocs * Π over query topics (CRI(topic) / NumDocs) • Example: search for documents containing (DB and L) • Goodness of • B: (20/100) * (30/100) * 100 = 6 • C: (0/1000) * (50/1000) * 1000 = 0 • D: (100/200) * (150/200) * 200 = 75 • Note that these numbers are just estimates and are subject to overcounts and/or undercounts • A limitation of using CRIs is that they do not take into account the difference in cost due to the number of “hops” necessary to reach a document
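The estimates above can be reproduced directly. The estimator (inferred from the slide's worked numbers) scales the path's total document count by the matching fraction for each query topic, assuming the topics occur independently.

```python
# Reproducing the CRI goodness estimates for a query on topics DB and L.

def cri_goodness(total_docs, topic_counts, query_topics):
    """Estimated number of matching documents along a path; subject to
    over- or under-counts since topics are assumed independent."""
    est = float(total_docs)
    for t in query_topics:
        est *= topic_counts[t] / total_docs
    return est

paths = {                      # path -> (total documents, per-topic counts)
    "B": (100,  {"DB": 20,  "L": 30}),
    "C": (1000, {"DB": 0,   "L": 50}),
    "D": (200,  {"DB": 100, "L": 150}),
}
for name, (total, counts) in paths.items():
    print(name, round(cri_goodness(total, counts, ["DB", "L"]), 2))
# prints B 6.0, C 0.0, D 75.0
```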

  41. Using Routing Indices

  42. Using Routing Indices (cont.) • t is the counter size in bytes, c the number of categories, N the number of nodes, and b the branching factor • A centralized index would require t × (c + 1) × N bytes • The total for the entire distributed system is t × (c + 1) × b × N bytes • Although the RIs require more storage space overall than a centralized index, the cost of the storage is shared among the network nodes

  43. Creating Routing Indices

  44. Maintaining Routing Indices • Maintaining RIs is identical to the process used for creating them • For efficiency, we may delay exporting an update for a short time so we can batch several updates, thus, trading RI freshness for a reduced update cost

  45. Hop-count Routing Indices

  46. Hop-count Routing Indices (cont.) • The estimator of a hop-count RI needs a cost model to compute the goodness of a neighbor • We assume that document results are uniformly distributed across the network and that the network is a regular tree with fanout F • We define the goodness (goodness_hc) of neighbor i with respect to query Q for a hop-count RI as the sum over hops h of (documents at hop h) / F^(h-1) • If we assume F = 3, the goodness of X for a query about “DB” documents would be 13 + 10/3 = 16.33 and for Y would be 0 + 31/3 = 10.33
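The F = 3 example above can be checked in two lines. The cost model, inferred from the worked numbers, discounts documents reachable at hop h by F^(h-1).

```python
# Reproducing the hop-count RI goodness computation.

def goodness_hc(counts_per_hop, fanout=3):
    """counts_per_hop[h] = documents reachable at hop h+1; each hop's
    count is discounted by fanout**h."""
    return sum(c / fanout ** h for h, c in enumerate(counts_per_hop))

print(round(goodness_hc([13, 10]), 2))   # X: 13 + 10/3 -> 16.33
print(round(goodness_hc([0, 31]), 2))    # Y: 0 + 31/3  -> 10.33
```

The discounting is why X, with fewer total documents (23 vs. 31), still ranks above Y: its documents are closer.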

  47. Exponentially aggregated RI • Each entry of the ERI for node N contains a value computed as: [formula not reproduced in transcript] • h is the height and F the fanout of the assumed regular tree, goodness() is the Compound RI estimator, N[j] is the summary of the local index of neighbor j of N, and T is the topic of interest of the entry

  48. Exponentially aggregated RI (cont.)

  49. Cycles in the P2P Network

  50. Cycles in the P2P Network • There are three general approaches for dealing with cycles: • No-op solution: no changes are made to the algorithms • only works with the hop-count and the exponential RI schemes • hop-count RI: cycles longer than the horizon will not affect the RI; however, shorter cycles will • exponential RI: updates are sent back to the originator; however, the effect of the cycle becomes smaller each time the update is sent back (due to the exponential decay)
