
Improving Search in P2P Networks. By Shadi Lahham. Purpose of this lecture: a general understanding of P2P systems, an appreciation of the need for efficient search, and applying different search techniques to different scenarios.

## Improving Search in P2P Networks


**Improving Search in P2P Networks**

By Shadi Lahham

**Purpose of This Lecture**

- General understanding of P2P systems
- Appreciating the need for efficient search
- Applying different search techniques to different scenarios

Improving P2P Search

**Table Of Contents**

- P2P Basics
  - What Is P2P
  - Advantages of P2P
  - Types of P2P Systems
  - Shortcomings
- Search Methods
  - The Search Problem
  - Current Methods
  - Suggested Methods
- Experimental Setup
  - Metrics
  - Data Collection
  - Calculating Costs
- Analysis of Results
- Conclusions

**Introduction**

P2P Basics

**What is P2P**

- Distributed system
- Peers (nodes) are servers and clients simultaneously
- Peers have equal roles
- Resources are shared across peers
- No central server needed
- Examples of P2P systems

**P2P Overview**

[Figure: P2P overview diagram. Key: f1 = file1, f2 = file2, f3 = file3]

**Advantages of P2P**

- P2P vs. centralized servers
- Distributes disk space / bandwidth
- Inexpensively scalable
- Self-organized (autonomous)
- Load balancing
- Adaptive / fault tolerant
- Less susceptible to attacks
- Allows for redundancy

**Types of P2P Systems**

- Hybrid (Napster)
- Pure (Gnutella)
- Super Peers (KaZaA)

**Hybrid (Napster)**

**Pure (Gnutella)**

**Super Peers (KaZaA)**

- Make use of heterogeneity
- Powerful peers serve as super peers
- Weaker peers act as clients
- Super peers index clients' files
- Requires updates on join/leave/update
- Queries handled at super-peer level
- Saves query costs

**Super Peers (KaZaA)**

**Hybrid - Shortcomings**

- High cost on centralized index
- Performance & scalability bottleneck
- Needs maintenance
- Vulnerable!
  Highly visible target

**Pure - Shortcomings**

- Inefficient search (flooding)
- Heterogeneity of peers not considered
- Bottlenecks (limited peers)
- Fragmentation

**Super Peers - Shortcomings**

- Super nodes might become bottlenecks for clients
  - Requires redundancy
- Bad selection of super nodes might cause even worse problems

**The Search Problem**

- Connected graph
- Might contain cycles
- An individual node doesn't know the structure
- It only knows its neighbors
- No idea where data can be found

**The Search Problem**

- Goal: find as many occurrences of the data as possible using minimal time and resources
- Solution:
  - BFS?
  - Bounded BFS?
  - (naive approaches)

**Bounded BFS Search**

[Figure: bounded BFS example; successive hops labeled TTL=2, TTL=1, TTL=0]

**Bounded BFS Search**

- Messages get a global TTL (time to live)
- Algorithm
  - Source broadcasts a message to a subset of neighbors
  - Neighbors search locally; results are sent to the source if found
  - TTL = TTL - 1
  - As long as TTL > 0, nodes forward the message to their neighbors
- Downside: wastes bandwidth / processing

**Current Methods**

- Gnutella - BFS
  - High cost
  - Gets complete results (for depth D)
  - Relatively short time
- Freenet - DFS
  - Poor response time
  - Minimizes BW costs

**Suggested Methods**

- Iterative deepening
- Directed BFS
- Local indices

**Iterative Deepening**

- Idea:
  - Search at a small depth and increase it if required
  - Aims to minimize the cost of BFS without detracting from its ability to satisfy queries
  - Notice that, given enough iterations, this method returns 100% of the results of BFS

**Iterative Deepening (cont…)**

- Elements:
  - Policies P={a,b,c,..} define the deepening behavior
  - BFS is run to depth a and frozen
  - If the source is satisfied, it stops the process
  - Otherwise it asks BFS to resume to depth b
  - The process is repeated until the source is satisfied or we reach the last policy item
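The policy-driven deepening described above can be sketched in a few lines of Python. This is a minimal single-process simulation, not the lecture's implementation: the adjacency-dict `graph`, the `holders` set (nodes that hold the data), and the `wanted` result count used as the satisfaction test are all assumptions made for illustration. The saved `frontier` stands in for the "frozen" search, and the `visited` set stands in for duplicate-message suppression.

```python
def iterative_deepening(graph, source, holders, policy, wanted):
    """Run BFS to each depth in `policy`, stopping once `wanted` results are found.

    Returns (results, depth_reached). All parameter names are hypothetical.
    """
    visited = {source}      # nodes that have already seen the query
    frontier = [source]     # the "frozen" edge of the search
    results = []
    depth = 0
    for target_depth in policy:
        # Resume from the saved frontier instead of restarting at the source.
        while depth < target_depth and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.append(neighbor)
                        if neighbor in holders:
                            results.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(results) >= wanted:
            break           # source is satisfied; no further deepening
    return results, depth


# Hypothetical six-node network; "b" and "e" hold the requested data.
graph = {
    "s": ["a", "b"],
    "a": ["s", "c"],
    "b": ["s", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c"],
}
holders = {"b", "e"}

print(iterative_deepening(graph, "s", holders, policy=[1, 3, 4], wanted=1))
print(iterative_deepening(graph, "s", holders, policy=[1, 3, 4], wanted=2))
```

With `wanted=1` the search stops after the first policy depth; with `wanted=2` it has to resume to depth 3, which is the extra cost the policy is trying to avoid for easily satisfied queries.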
**Iterative Deepening (cont…)**

- Elements:
  - We can specify how long to wait between iterations
  - We need a system-wide message ID to identify individual messages

**Example P={1,3,4} W=1**

**Directed BFS**

- Idea:
  - Choose a subset of neighbors to query
  - Neighbors will BFS as usual
  - Aims to provide a balance between good response time and results
  - Minimizes the costs of full BFS
  - Notice that only a subset of the possible results is returned, so we might fail to satisfy the query

**Directed BFS Example**

[Figure: directed BFS example; successive hops labeled TTL=2, TTL=1, TTL=0]

**Directed BFS (cont…)**

- But which neighbors to pick?
- Maintain simple statistics on neighbors to derive heuristics
  - Highest past results
  - Lowest average hops (close to nodes containing useful data)
  - High message count (stable, can handle large flow)
  - Shortest message queue (a long queue implies saturation)
- More to come …

**Local Indices**

- Idea:
  - Nodes hold metadata of all nodes within radius r
  - Can process the query at a few nodes, yet get the same number of results
  - Aims to balance satisfaction / costs

**Local Indices**

- Elements:
  - Policies P={a,b,c,..} define the depths at which we search
  - Example P={1,5,6}
    - Nodes at depth 1 process the query
    - Nodes at depths 2, 3, 4 forward without processing
    - Policy ends at depth 6
  - System-wide radius r (small, ~50K metadata)

**Example P={1,4}**

[Figure: local indices example; legend: Process / Don't process; r = ?]

**Local Indices (cont…)**

- Notice that now there is an overhead
  - On join
    - Send a join message with TTL = r
    - Direct exchange of metadata
  - On leave / timeout
    - Remove metadata of gone / dead nodes
  - On update
    - Send an update message with TTL = r

**Metrics**

- How to compare methods?
  - Costs
  - Results
  - Time

**Metrics**

1. Costs
   - We do not base cost on a specific query but rather calculate the average cost over Q_rep, a representative set of real queries submitted
   - It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network)
   - Therefore our two cost metrics are
     - Average aggregate bandwidth
     - Average aggregate processing cost

**Metrics**

2. Results quality
   - Number of results
   - Satisfaction
3. Time to satisfaction

**Data Collection**

- Data gathered from the Gnutella network
- Directly measured
- Iterative deepening
- Directed BFS
- Performance data & analysis
- Local indices

**Data Collection**

Collected Data

**Data Collection**

Extracted Data

**Calculating Costs**

- We've seen two types of costs
  - Bandwidth (BW) costs
  - Processing costs
- Calculations should take into account
  - Costs of sending a query
  - Costs of sending replies
- An example of calculating BW costs

**Calculating Costs**

$$
BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)
$$

**Analysis of Results**

Iterative Deepening

**Symbols Used**

**Results – Iterative Deepening**

- Recall that iterative deepening policies P={a,b,c,..} define the deepening behavior
- In order to have the same level of satisfaction as BFS, a policy must have D as its last depth
- Also note the degenerate case, policy {D}, which is the bounded BFS we presented earlier

**Results – Iterative Deepening**

- Variables
  - Define: P_d = { d, d+1, …, D }
  - P = { P_d for d = 1, 2, …, D } = { {1,2,…,D}, {2,3,…,D}, …, {D-1,D}, {D} }
  - W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)

**Results – Iterative Deepening**

- Fixed values: Z = 50, Ng = 8
- Increasing Z
  - Lower probability of satisfaction
  - Higher costs
  - More results
- Decreasing Ng
  - Slightly lower probability of satisfaction
  - Significantly lower costs
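The policy family defined on the Variables slide is easy to enumerate, which makes the size of the experiment grid concrete. The following is a small sketch under assumptions: the function name `policy_family` and the value `MAX_DEPTH = 7` are chosen here for illustration only; the waiting times are the ones listed on the slide.

```python
from itertools import product


def policy_family(max_depth):
    """Map each starting depth d to the policy P_d = [d, d+1, ..., max_depth]."""
    return {d: list(range(d, max_depth + 1)) for d in range(1, max_depth + 1)}


MAX_DEPTH = 7                    # assumption for illustration; stands in for D
WAIT_TIMES = [1, 2, 4, 6, 150]   # seconds, as listed on the Variables slide

policies = policy_family(MAX_DEPTH)

# Every (policy, waiting time) pair is one configuration to evaluate.
configurations = list(product(policies, WAIT_TIMES))

print(policies[1])          # the most gradual policy: every depth from 1 to D
print(policies[MAX_DEPTH])  # the degenerate policy {D}, i.e. plain bounded BFS
print(len(configurations))
```

Note that the last policy contains a single depth, so it never waits between iterations and the choice of W cannot affect it.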
**Results – Iterative Deepening**

**Results – Iterative Deepening**

- BW costs are the same for P7 for all values of W
- As d increases, costs increase: the larger d is, the more likely the policy will "overshoot"
- As W decreases, costs increase: with a small W, premature determination of unsatisfaction again leads to overshooting

**Results – Iterative Deepening**
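The bandwidth costs compared in these results build on the per-query BFS formula from the Calculating Costs slide. As a rough sketch, the formula can be evaluated directly once the per-depth quantities are known. Two assumptions apply: the sum is read as running over depths n = 1..D with the reply term weighted by n, and the symbols are treated as opaque numbers because the Symbols Used slide is not reproduced in this transcript. The function name `bw_bfs` and the sample values are hypothetical.

```python
def bw_bfs(a_q, c, d, per_depth):
    """Evaluate BW_bfs(Q) = sum over n of a(Q)*(N + C) + n*(c*R + d*M).

    `per_depth` holds one (N, C, R, M) tuple per depth, starting at depth 1,
    so its length plays the role of D. Symbol meanings are not defined here.
    """
    total = 0
    for n, (N, C, R, M) in enumerate(per_depth, start=1):
        query_term = a_q * (N + C)       # first term of the summand
        reply_term = n * (c * R + d * M) # second term, scaled by the depth n
        total += query_term + reply_term
    return total


# Made-up numbers for a depth-2 search, purely to show the arithmetic.
sample = [(4, 1, 1, 2), (10, 3, 2, 5)]
print(bw_bfs(a_q=2, c=3, d=1, per_depth=sample))
```

Because the second term is multiplied by n, the same quantities cost more bandwidth the deeper they occur, which is consistent with deeper starting policies being more expensive.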
