
Improving Search in P2P Networks

Improving Search in P2P Networks, by Shadi Lahham. Purpose of this lecture: a general understanding of P2P systems, appreciating the need for efficient search, and applying different search techniques to different scenarios. P2P basics: what P2P is, advantages of P2P, types of P2P systems.


Presentation Transcript


  1. Improving Search in P2P Networks By Shadi Lahham

  2. Purpose of This Lecture • General understanding of P2P systems • Appreciating the need for efficient search • Applying different search techniques to different scenarios Improving P2P Search

  3. Table of Contents • P2P Basics: What Is P2P, Advantages of P2P, Types of P2P Systems, Shortcomings • Search Methods: The Search Problem, Current Methods, Suggested Methods • Experimental Setup: Metrics, Data Collection, Calculating Costs • Analysis of Results • Conclusions

  4. Introduction: P2P Basics

  5. What is P2P • Distributed system • Peers (nodes) are servers and clients simultaneously • Peers have equal roles • Resources are shared across peers • No central server needed • Examples of P2P systems

  6. P2P Overview [Figure: key-to-file table mapping f1 to file1, f2 to file2, f3 to file3]

  7. Advantages of P2P • P2P vs. centralized servers • Distributes disk space / bandwidth • Inexpensively scalable • Self-organized (autonomous) • Load balancing • Adaptive / fault tolerant • Less susceptible to attacks • Allows for redundancy

  8. Types of P2P Systems • Hybrid (Napster) • Pure (Gnutella) • Super Peers (KaZaA)

  9. Hybrid (Napster)

  10. Pure (Gnutella)

  11. Super Peers (KaZaA) • Make use of heterogeneity • Powerful peers serve as super peers • Weaker peers act as clients • Super peers index clients’ files • Requires updates on join/leave/update • Queries handled at super-peer level • Saves query costs

  12. Super Peers (KaZaA)

  13. Hybrid - Shortcomings • High cost on the centralized index • Performance & scalability bottleneck • Needs maintenance • Vulnerable: a highly visible target

  14. Pure - Shortcomings • Inefficient search (flooding) • Heterogeneity of peers not considered • Bottlenecks (limited peers) • Fragmentation

  15. Super Peers - Shortcomings • Super peers might become bottlenecks for clients • Requires redundancy • A bad selection of super peers might cause even worse problems

  16. Search Methods

  17. The Search Problem • Connected graph • Might contain cycles • An individual node doesn’t know the structure • It only knows its neighbors • It has no idea where data can be found

  18. The Search Problem • Goal: find as many occurrences of the data as possible using minimal time and resources • Solution: • BFS? • Bounded BFS? • (naive approaches)

  19. Bounded BFS Search [Figure: example search with hops labeled TTL=2, TTL=1, TTL=0]

  20. Bounded BFS Search • Messages get a global TTL (time to live) • Algorithm: • The source broadcasts a message to a subset of its neighbors • Neighbors search locally; results are sent to the source if found • TTL = TTL – 1 • As long as TTL > 0, nodes forward the message to their neighbors • Downside: wastes bandwidth and processing
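The bounded BFS steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is an adjacency dict, `has_match` is a hypothetical stand-in for the local search, the source floods to all neighbors rather than a subset, and a `seen` set plays the role of duplicate-message detection. A TTL of `ttl` here reaches nodes up to `ttl` hops away.

```python
from collections import deque

def bounded_bfs(graph, source, ttl, has_match):
    """Flood a query from `source`; each hop uses up one unit of TTL.

    graph: dict mapping node -> list of neighbors (assumed representation)
    has_match: hypothetical callback, True if a node holds matching data
    Returns (nodes with results, number of query messages sent).
    """
    results = []
    seen = {source}            # stands in for duplicate-message detection
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if node != source and has_match(node):
            results.append(node)          # result is reported to the source
        if remaining == 0:
            continue                      # TTL exhausted: do not forward
        for neighbor in graph[node]:
            messages += 1                 # every forwarded copy costs bandwidth
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return results, messages
```

Note that `messages` also counts copies sent to nodes that have already seen the query, which is the wasted bandwidth the slide warns about.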

  21. Current Methods • Gnutella - BFS • High cost • Gets complete results (for depth D) • Relatively short time • Freenet - DFS • Poor response time • Minimizes bandwidth (BW) costs

  22. Suggested Methods • Iterative deepening • Directed BFS • Local Indices

  23. Iterative Deepening • Idea: • Search at a small depth and increase it if required • Aims to minimize the cost of BFS without detracting from its ability to satisfy queries • Notice that, given enough iterations, this method returns 100% of the results of BFS

  24. Iterative Deepening (cont…) • Elements: • Policies P={a,b,c,..} define the deepening behavior • BFS is run to depth a and frozen • If the source is satisfied, it stops the process • Otherwise it asks BFS to resume to depth b • The process is repeated until the source is satisfied or the last policy item is reached

  25. Iterative Deepening (cont…) • Elements: • We can specify how long to wait between iterations (the waiting time W) • We need a system-wide message ID to identify individual messages
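A minimal single-process sketch of the freeze-and-resume loop described on these slides, in Python. The assumptions are mine: the network is an adjacency dict, `has_match` is a hypothetical local-search callback, "satisfied" means at least `wanted` results, and the waiting time between iterations is left out. Keeping the frontier between iterations models the frozen query being resumed rather than restarted.

```python
def iterative_deepening(graph, source, policy, has_match, wanted):
    """Run BFS to each depth in `policy` until `wanted` results are found.

    graph: dict mapping node -> list of neighbors (assumed representation)
    policy: increasing list of depths, e.g. [1, 3, 4]
    Returns (nodes with results, depth actually reached).
    """
    results = []
    seen = {source}
    frontier = [source]        # nodes where the query is currently frozen
    depth = 0
    for limit in policy:
        # Resume the frozen search and push it out to depth `limit`.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
                        if has_match(neighbor):
                            results.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(results) >= wanted:
            break              # source is satisfied: no further deepening
    return results, depth
```

With the policy [1, 3, 4] on a chain of five nodes, a query satisfied by one result stops at depth 3, while one that needs two results continues to depth 4.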

  26. Example: P={1,3,4}, W=1

  27. Directed BFS • Idea: • Choose a subset of neighbors to query • Those neighbors will BFS as usual • Aims to provide a balance between good response time and results • Minimizes the costs of full BFS • Notice that only a subset of the possible results is returned, so we might fail to satisfy the query

  28. Directed BFS Example [Figure: example search with hops labeled TTL=2, TTL=1, TTL=0]

  29. Directed BFS (cont…) • But which neighbors to pick? • Maintain simple statistics on neighbors to derive heuristics • Highest past results • Lowest average hops • (close to nodes containing useful data) • High message count • (stable - can handle large flow) • Shortest message queue • (a long queue implies saturation) • More to come …
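The four heuristics listed on this slide can be read as four ways of ranking neighbors by per-neighbor statistics. The Python sketch below shows one possible shape for that. The `NeighborStats` fields, the heuristic names, and `pick_neighbors` are all hypothetical, introduced only for illustration; how the statistics are gathered is not shown.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    """Hypothetical per-neighbor statistics kept by a node."""
    name: str
    past_results: int      # results returned through this neighbor so far
    avg_hops: float        # average hops travelled by those results
    message_count: int     # messages received through this neighbor
    queue_length: int      # current length of its message queue

# Each heuristic is a sort key; smaller key means better candidate.
HEURISTICS = {
    "most_results": lambda s: -s.past_results,
    "fewest_hops": lambda s: s.avg_hops,
    "most_messages": lambda s: -s.message_count,
    "shortest_queue": lambda s: s.queue_length,
}

def pick_neighbors(stats, heuristic, k=1):
    """Return the names of the k best neighbors under one heuristic."""
    ranked = sorted(stats, key=HEURISTICS[heuristic])
    return [s.name for s in ranked[:k]]
```

The chosen neighbors then continue with ordinary BFS, so only the first hop is directed.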

  30. Local Indices • Idea: • Nodes hold the metadata of all nodes within radius r • Can process a query at only a few nodes but get the same number of results • Aims to balance satisfaction / costs

  31. Local Indices • Elements: • Policies P={a,b,c,..} define the depths at which we search • Example: P={1,5,6} • Nodes at depth 1 process the query • Nodes at depth 2,3,4 forward without processing • Policy ends at depth 6 • System-wide radius r (small, ~50K metadata)
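One way to read the policy example on this slide is as a per-depth rule: a node processes the query only if its depth is listed in P, and keeps forwarding while its depth is below the last entry. The Python helper below is a hypothetical illustration of that reading. That nodes at the final depth process but do not forward is my interpretation of "policy ends at depth 6".

```python
def role_at_depth(depth, policy):
    """Return (process, forward) for a node at `depth` hops from the source.

    policy: set of depths at which the query is processed, e.g. {1, 5, 6}
    """
    process = depth in policy          # search the local index here
    forward = depth < max(policy)      # pass the query on to neighbors
    return process, forward
```

For P={1,5,6}, depth 1 gives (True, True), depths 2 to 4 give (False, True), and depth 6 gives (True, False).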

  32. Example: P={1,4} [Figure legend: Process / Don’t process; r = ?]

  33. Local Indices (cont…) • Notice that now there is an overhead • On join • Send a join message with TTL = r • Direct exchange of metadata • On leave / timeout • Remove the metadata of departed / dead nodes • On update • Send an update message with TTL = r
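The join, leave, and update overhead listed on this slide can be simulated with a small Python class. This is a rough sketch under several assumptions of mine: the whole network lives in one process, metadata is just a set of file names, a message with TTL = r is modeled by computing the nodes within r hops, and the way a join or leave changes distances between other pairs of nodes is ignored. All names are hypothetical.

```python
from collections import deque

def nodes_within(graph, start, r):
    """Nodes reachable from `start` in at most r hops (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == r:
            continue
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    seen.discard(start)
    return seen

class LocalIndexNetwork:
    def __init__(self, r):
        self.r = r
        self.graph = {}    # node -> set of neighbors
        self.files = {}    # node -> its own metadata (set of file names)
        self.index = {}    # node -> {other node: copy of that node's metadata}

    def join(self, node, neighbors, files):
        self.graph[node] = set(neighbors)
        for n in neighbors:
            self.graph[n].add(node)
        self.files[node] = set(files)
        self.index[node] = {}
        # Join message with TTL = r, followed by a direct metadata exchange.
        for other in nodes_within(self.graph, node, self.r):
            self.index[other][node] = set(files)
            self.index[node][other] = set(self.files[other])

    def leave(self, node):
        for n in self.graph.pop(node):
            self.graph[n].discard(node)
        del self.files[node], self.index[node]
        for entries in self.index.values():
            entries.pop(node, None)    # drop metadata of the departed node

    def update(self, node, files):
        self.files[node] = set(files)
        # Update message with TTL = r refreshes the copies held nearby.
        for other in nodes_within(self.graph, node, self.r):
            self.index[other][node] = set(files)
```

For example, with r = 1 and a chain A, B, C built by successive joins, B ends up indexing both A and C while A and C each index only B.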

  34. Experimental Setup

  35. Metrics • How to compare methods? • Costs • Results • Time

  36. Metrics 1. Costs • We do not base cost on a specific query but rather calculate the average cost over Q_rep, a representative set of real submitted queries • It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network) • Therefore our two cost metrics are: • Average aggregate bandwidth • Average aggregate processing cost
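As a small illustration of "average aggregate" cost: for each query in the representative set, sum the cost over all nodes, then average those sums over the queries. The Python function below assumes, purely for the example, that per-node costs are already available as one dict per query.

```python
def average_aggregate_cost(per_query_node_costs):
    """Average, over all queries, of the cost summed over all nodes.

    per_query_node_costs: list with one dict per query in the
    representative set, mapping node -> cost incurred at that node.
    """
    aggregates = [sum(costs.values()) for costs in per_query_node_costs]
    return sum(aggregates) / len(aggregates)
```

The same function applies to either metric, depending on whether the per-node values are bandwidth or processing costs.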

  37. Metrics 2. Results Quality • Number of results • Satisfaction 3. Time to satisfaction

  38. Data Collection • Data gathered from the Gnutella network • Directly measured: iterative deepening, directed BFS • Performance data & analysis: local indices

  39. Data Collection: Collected Data

  40. Data Collection: Extracted Data

  41. Calculating Costs • We’ve seen two types of costs • Bandwidth (BW) costs • Processing costs • Calculations should take into account • Costs of sending a query • Costs of sending replies • An example of calculating BW costs

  42. Calculating Costs
  $$BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)$$
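The bandwidth formula on this slide can be evaluated directly once its per-depth quantities are known. The Python sketch below takes them as plain lists indexed by depth. The slide does not define the symbols, so the interpretation in the comments is my own reading, based on the previous slide's split into query costs and reply costs: the first term as the cost of sending the query at depth n, the second as the cost of replies, weighted by n because they travel n hops back.

```python
def bw_bfs(a, c, d, N, C, R, M):
    """Evaluate BW_bfs(Q) = sum over n = 1..D of
    a * (N[n] + C[n]) + n * (c * R[n] + d * M[n]).

    a, c, d: per-message size constants for query Q (assumed meaning)
    N, C, R, M: lists of length D; element n-1 holds the value at depth n
    """
    D = len(N)
    total = 0
    for n in range(1, D + 1):
        query_cost = a * (N[n - 1] + C[n - 1])            # sending the query
        reply_cost = n * (c * R[n - 1] + d * M[n - 1])    # replies travel n hops
        total += query_cost + reply_cost
    return total
```

With made-up values a=10, c=2, d=3, N=[2,4], C=[0,1], R=[1,2], M=[1,1], depth 1 contributes 25 and depth 2 contributes 64, for a total of 89.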

  43. Analysis of Results: Iterative Deepening

  44. Symbols Used

  45. Results – Iterative Deepening • Recall that iterative deepening policies P={a,b,c,..} define the deepening behavior • In order to have the same level of satisfaction as BFS, a policy must have D as its last depth • Also note the degenerate-case policy {D}, which is the bounded BFS we presented earlier

  46. Results – Iterative Deepening • Variables • Define: Pd = {d, d+1, …, D} • P = {Pd for d = 1,2,…,D} = {{1,2,…,D}, {2,3,…,D}, …, {D-1,D}, {D}} • W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
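The family of policies defined on this slide is easy to enumerate. A short Python sketch, with function names of my own choosing:

```python
def policy(d, D):
    """Pd = {d, d+1, ..., D}, as an ordered list of depths."""
    return list(range(d, D + 1))

def all_policies(D):
    """P = {Pd for d = 1..D}; the last entry, [D], is plain bounded BFS."""
    return [policy(d, D) for d in range(1, D + 1)]
```

For D = 3 this gives [[1, 2, 3], [2, 3], [3]].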

  47. Results – Iterative Deepening • Fixed values: Z = 50, Ng = 8 • Increasing Z: • Lower probability of satisfaction • Higher costs • More results • Decreasing Ng: • Slightly lower probability of satisfaction • Significantly lower costs

  48. Results – Iterative Deepening

  49. Results – Iterative Deepening • BW costs are the same for P7 for all values of W • As d increases, costs increase: the larger d is, the more likely the policy will “overshoot” • As W decreases, costs increase: with a small W, a premature determination of dissatisfaction again leads to overshooting

  50. Results – Iterative Deepening
