Information Retrieval Techniques For Peer-To-Peer Networks



Presentation Transcript


  1. Information Retrieval Techniques For Peer-To-Peer Networks Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos Presented By Ranjan Dash IR Techniques For P2P Networks

  2. Layout
     • Introduction
     • P2P Network IR Techniques
     • PeerWare Infrastructure and experiments

  3. Introduction
     • Major challenge
       • Efficiently searching the content of other peers
     • Definition
       • A large number of peers collaborate dynamically, in an ad hoc manner, and share information in large-scale distributed environments without centralized coordination
     • P2P environment characteristics
       • Each peer has a database or collection of documents
       • A query contains a set of keywords
       • A reply message contains pointers to matching documents
     • Different from static data environments
       • No central repository
       • Nodes join and leave dynamically, in an ad hoc manner

  4. P2P Network IR Techniques
     • Breadth-First Search (BFS)
     • Random Breadth-First Search (RBFS)
     • Intelligent Search Mechanism (ISM)
     • Directed BFS and >RES
     • Random Walker Searches
     • Randomized Gossiping
     • Local Routing Indices
     • Centralized Approaches
     • Searching Object Identifiers
     • Distributed IR

  5. P2P Network IR Techniques
     • Breadth-First Search (BFS)
       • Widely used in file-sharing systems
       • Each peer propagates the query to all of its neighbors except the sender
       • The QueryHit message (number of documents, bandwidth information) travels back along the same path
       • Simple; guarantees a high hit rate
       • Poor performance and network utilization
       • A low-bandwidth node becomes a bottleneck
       • Can be improved by giving each query a time-to-live (TTL)
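As a minimal sketch of the flooding behaviour on this slide, the Python below models BFS with a TTL on a small overlay. The graph `overlay`, the function name `bfs_flood`, and the way messages are counted are illustrative assumptions, not taken from any real protocol implementation.

```python
from collections import deque


def bfs_flood(graph, source, ttl):
    """Flood a query from `source`; return (peers reached, messages sent)."""
    reached = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (peer, sender, remaining hops)
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL expired: this peer does not forward further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # never send the query back to its sender
            messages += 1
            if neighbor not in reached:  # duplicate queries are dropped
                reached.add(neighbor)
                queue.append((neighbor, peer, hops_left - 1))
    return reached, messages


# Hypothetical five-peer overlay (undirected adjacency lists).
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B"],
    "E": ["C"],
}
```

In this toy overlay a TTL of 1 reaches only B and C with 2 messages, while a TTL of 2 reaches all five peers with 6 messages, two of which are duplicates that the receivers drop.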

  6. P2P Network IR Techniques
     • Random Breadth-First Search (RBFS)
       • Dramatic improvement over BFS
       • Each peer forwards the query to only a fraction of its peers, selected at random
       • Needs no global knowledge; decisions are local and therefore fast
       • Probabilistic: might not reach some large network segments
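The random selection step can be sketched as follows. This is an assumption-laden illustration: the `rbfs` function, the `fraction` parameter, the rounding with `math.ceil`, and the `ring` topology are all hypothetical choices made for the example.

```python
import math
import random
from collections import deque


def rbfs(graph, source, ttl, fraction, rng):
    """Flood a query, forwarding to a random fraction of neighbors per peer."""
    reached = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (peer, sender, remaining hops)
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        candidates = [n for n in graph[peer] if n != sender]
        # Local decision only: pick a random subset of the neighbors.
        count = math.ceil(fraction * len(candidates))
        for neighbor in rng.sample(candidates, count):
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append((neighbor, peer, hops_left - 1))
    return reached, messages


# Hypothetical six-peer ring with two chords (undirected adjacency lists).
ring = {
    0: [1, 5, 3],
    1: [0, 2, 4],
    2: [1, 3],
    3: [2, 4, 0],
    4: [3, 5, 1],
    5: [4, 0],
}
```

Calling `rbfs(ring, 0, 3, 0.5, random.Random(1))` never sends more messages than the same call with `fraction=1.0`, which behaves like plain BFS, but it may leave some peers unreached.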

  7. P2P Network IR Techniques
     • Intelligent Search Mechanism (ISM)
       • Quick and efficient; aims to minimize communication cost
       • Propagates a query only to the peers most likely to reply
       • Consists of two components that run in each peer
         • Profile mechanism
         • Relevance rank
       • Works well when queries exhibit locality
       • Always forwarding to the same neighbors starves new peers
       • Solution: add a small random subset of peers to the most relevant set

  8. P2P Network IR Techniques
     • Profile mechanism
       • Each peer builds a profile for each of its neighboring peers
       • A profile holds the T most recent queries and their QueryHits, with the number of results
       • A least-recently-used (LRU) replacement policy keeps the most recent queries
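One way to picture the per-neighbor profile is a small LRU table. The sketch below is an assumption about how such a profile could be stored; the class name `NeighborProfile` and its methods are hypothetical, and `capacity` plays the role of T.

```python
from collections import OrderedDict


class NeighborProfile:
    """Keeps the most recent queries a neighbor answered, LRU-evicted."""

    def __init__(self, capacity):
        self.capacity = capacity       # T in the slide
        self.entries = OrderedDict()   # query text -> number of results

    def record(self, query, num_results):
        # A repeated query becomes the most recently used entry.
        if query in self.entries:
            self.entries.move_to_end(query)
        self.entries[query] = num_results
        # Evict the least recently used query once the table is full.
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


profile = NeighborProfile(capacity=2)
profile.record("a", 1)
profile.record("b", 2)
profile.record("a", 3)   # refreshes "a"
profile.record("c", 4)   # evicts "b", the least recently used
```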

  9. P2P Network IR Techniques
     • Relevance rank
       • Ranks a peer's neighbors to decide which ones to forward a query to
       • Computes a ranking of each peer Pi for a query q
       • Qsim is the cosine similarity between two queries
       • With α = 0, only the number of results returned in the past matters, as in >RES
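The ranking formula itself did not survive in this transcript. The sketch below assumes it has the form RR(Pi, q) = Σ_j Qsim(q_j, q)^α · S(Pi, q_j), where the q_j are past queries answered by Pi and S is the number of results returned. That form, the exponent name `alpha`, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def qsim(query_a, query_b):
    """Cosine similarity between two keyword queries (term-count vectors)."""
    a = Counter(query_a.lower().split())
    b = Counter(query_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevance_rank(profile, query, alpha=1.0):
    """Score a neighbor from its profile: a list of (past_query, num_results)."""
    # In Python 0.0 ** 0.0 == 1.0, so alpha = 0 ignores similarity entirely
    # and the score reduces to the total number of past results.
    return sum(qsim(past, query) ** alpha * hits for past, hits in profile)


linux_peer = [("linux kernel", 10)]
jazz_peer = [("jazz music", 40)]
```

For the query "linux kernel patch", `alpha=1.0` ranks `linux_peer` above `jazz_peer`, while `alpha=0.0` ranks them purely by result counts (10 versus 40).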

  10. P2P Network IR Techniques
     • Directed BFS and >RES
       • Each peer forwards a query to a subset of its peers based on aggregated statistics
       • >RES sends the query to the k peers that returned the most results for the last m queries
       • With k = 1 and m = 10, BFS turns into a depth-first search (DFS)
       • Similar to ISM, but simpler
       • Does not specifically explore nodes that contain content related to the query
       • Performs well because it routes queries to the larger network segments
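A minimal sketch of the >RES selection rule, assuming each peer keeps a sliding window of result counts per neighbor. The class `ResultHistory` and the alphabetical tie-break are hypothetical details added for the example.

```python
from collections import deque


class ResultHistory:
    """Tracks how many results each neighbor returned for its last m queries."""

    def __init__(self, m):
        self.m = m
        self.history = {}  # neighbor -> deque of recent result counts

    def record(self, neighbor, num_results):
        window = self.history.setdefault(neighbor, deque(maxlen=self.m))
        window.append(num_results)  # the oldest count drops out automatically

    def top_k(self, k):
        """Return the k neighbors with the most results in their window."""
        totals = {n: sum(counts) for n, counts in self.history.items()}
        return sorted(totals, key=lambda n: (-totals[n], n))[:k]


stats = ResultHistory(m=2)
for count in (5, 0, 0):      # only the last two counts are kept for A
    stats.record("A", count)
for count in (1, 1):
    stats.record("B", count)
stats.record("C", 3)
```

Here `stats.top_k(2)` selects C and B: neighbor A returned 5 results once, but that count has already left its window of m = 2 queries.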

  11. P2P Network IR Techniques
     • Random-Walker Searches
       • Each node forwards a query message, called a walker, to one randomly chosen peer
       • Can be extended from 1 walker to k walkers
       • Resembles RBFS, but the number of messages grows only linearly
       • Like RBFS, does not use content relevance to guide the query
       • Adaptive Probabilistic Search (APS) is similar
         • Uses feedback from previous searches to probabilistically guide future walkers
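The k-walker idea can be sketched as below. The stopping rule (a walker stops at its first hit or after `max_steps` hops), the function name, and the two-peer example are assumptions made for illustration.

```python
import random


def k_walker_search(graph, source, has_match, walkers, max_steps, rng):
    """Send `walkers` independent random walkers; return (hits, messages)."""
    hits = set()
    messages = 0
    for _ in range(walkers):
        current = source
        for _ in range(max_steps):
            if not graph[current]:
                break  # dead end: no peer to forward to
            current = rng.choice(graph[current])  # one random neighbor
            messages += 1
            if has_match(current):
                hits.add(current)
                break  # assumed rule: a walker stops at its first hit
    return hits, messages


# Hypothetical two-peer overlay in which only B holds a matching document.
pair = {"A": ["B"], "B": ["A"]}
found, sent = k_walker_search(
    pair, "A", lambda peer: peer == "B", walkers=3, max_steps=5,
    rng=random.Random(0),
)
```

The message count is bounded by `walkers * max_steps`, which is the linear growth the slide contrasts with RBFS.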

  12. P2P Network IR Techniques
     • Randomized Gossiping (PlanetP)
       • A global inverted index is built in parts: each node summarizes its local index as a Bloom filter
       • Each node propagates its Bloom filter to the rest of the network through gossiping
       • Advantages of the Bloom filter
         • Smaller messages
         • Savings in network I/O
       • PlanetP has a scalability problem
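To show why a Bloom filter keeps gossip messages small, here is a generic Bloom filter sketch. It is not PlanetP's implementation; the sizes, the use of SHA-256, and the class name are assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size summary of a term set; may give false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def __contains__(self, term):
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


local_index = BloomFilter()
for term in ("peer", "network", "retrieval"):
    local_index.add(term)
```

A peer that receives `local_index` can test `"peer" in local_index` without seeing the full term list. Added terms are always found, while absent terms are occasionally reported as present.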

  13. P2P Network IR Techniques
     • Local Routing Indices
       • Proposed by Arturo Crespo and Hector Garcia-Molina
       • Hybrid technique that uses local indices containing the “direction” toward the documents
       • Three techniques
         • Compound routing index (CRI)
         • Hop-count routing index (HRI)
         • Exponentially aggregated index (ERI)
       • Good for topologies where only a few nodes have very large numbers of neighbors (tree, tree with cycles)
       • The routing indices are similar to the routing tables used by the Bellman–Ford algorithm
       • CRI: a node q maintains statistics for each neighbor that indicate how many documents are reachable through that neighbor
       • HRI: a CRI for each of k hops; the storage cost is prohibitive for large k
       • ERI: addresses the storage issue of HRI by aggregating the HRI with a cost formula
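A rough sketch of how a node might consult a CRI-style table when forwarding a query. Keeping per-topic counts, the table contents, and the function name `rank_neighbors` are assumptions for illustration; the slide only states that the index records how many documents are reachable through each neighbor.

```python
def rank_neighbors(routing_index, topic):
    """Order neighbors by how many `topic` documents are reachable via each."""
    return sorted(
        routing_index,
        key=lambda neighbor: (-routing_index[neighbor].get(topic, 0), neighbor),
    )


# Hypothetical index kept by one node: neighbor -> {topic: document count}.
routing_index = {
    "B": {"databases": 20, "networks": 5},
    "C": {"databases": 0, "networks": 50},
    "D": {"databases": 100},
}

best_for_databases = rank_neighbors(routing_index, "databases")[0]
```

The node forwards a "databases" query toward D first, since the most matching documents are reachable in that direction.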

  14. P2P Network IR Techniques
     • Centralized Approaches
       • Maintain an inverted index over all the documents in the participating hosts’ collections (Google, Yahoo, Napster)
       • Each joining peer A uploads an index of all its shared documents to the central repository R
       • A querying node B searches A’s documents through R
       • B can then communicate with A directly (using an out-of-band protocol such as HTTP)
       • Kazaa is slightly different: it uses a set of more powerful peers that act as central repositories
       • A different kind of approach from the other techniques
       • Simple, robust, short search time, guaranteed to find all results
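The upload-then-search flow through a central repository R can be sketched with a small inverted index. The class and method names are hypothetical, and real systems index far richer metadata than whitespace-separated words.

```python
from collections import defaultdict


class CentralRepository:
    """Inverted index over the documents shared by all joined peers."""

    def __init__(self):
        self.index = defaultdict(set)  # keyword -> {(peer, document name)}

    def upload(self, peer, documents):
        """Index a joining peer's documents (mapping of name -> text)."""
        for name, text in documents.items():
            for word in text.lower().split():
                self.index[word].add((peer, name))

    def search(self, query):
        """Return pointers to documents that contain every query keyword."""
        words = query.lower().split()
        if not words:
            return set()
        results = set(self.index.get(words[0], set()))
        for word in words[1:]:
            results &= self.index.get(word, set())
        return results


repo = CentralRepository()
repo.upload("A", {"p2p.txt": "peer to peer search"})
repo.upload("B", {"ir.txt": "information retrieval search"})
```

A query for "peer search" returns the pointer `("A", "p2p.txt")`; the querying node would then fetch the file from A directly.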

  15. P2P Network IR Techniques
     • Searching Object Identifiers
       • Distributed file indexing systems: Chord, OceanStore, Content-Addressable Network (CAN), and Freenet
       • Efficient searches that use object identifiers (a hash code of a file's name) rather than keywords
       • An object lookup operation returns the address (an IP address) of the node that stores the object
       • Optimizes object retrieval by minimizing the number of messages and hops required
       • Disadvantage: can search only for object identifiers, so it cannot capture the relevance of a document
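The identifier-based lookup can be illustrated with a simplified hash ring. This sketch assumes a global view of all nodes and resolves a name in one step; it does not model the hop-by-hop routing of systems such as Chord or CAN. The 16-bit identifier space and all names are assumptions.

```python
import bisect
import hashlib


def object_id(name, bits=16):
    """Hash a file name (or node address) into a small identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class IdentifierRing:
    """Maps each object to the first node at or after its identifier."""

    def __init__(self, node_addresses):
        self.nodes = sorted((object_id(addr), addr) for addr in node_addresses)
        self.ids = [node_id for node_id, _ in self.nodes]

    def lookup(self, name):
        # Wrap around to the first node when the id is past the last node.
        index = bisect.bisect_left(self.ids, object_id(name)) % len(self.nodes)
        return self.nodes[index][1]


addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
dht = IdentifierRing(addresses)
owner = dht.lookup("song.mp3")
```

The lookup needs the exact name: a query for a keyword that merely appears in the name hashes to an unrelated identifier, which is the disadvantage noted in the last bullet.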

  16. P2P Network IR Techniques
     • Distributed IR
       • With distributed databases, the main IR problem is deciding which databases are most likely to contain the most relevant documents
       • Good results are achievable for conceptually separated collections
       • However, this assumes that the querying party has some statistical knowledge about each database’s contents (such as word frequencies in documents) and therefore a global view of the system

  17. PeerWare Infrastructure and experiments
     • Evaluation metrics
       • Recall rate: the fraction of documents each search mechanism retrieves
       • Efficiency: the number of messages needed to find the results
     • Only algorithms that require local knowledge when searching for documents were implemented
       • BFS (the baseline)
       • RBFS, >RES (k = 0.5 * d and m = 100, where d is the degree of a node), and ISM
       • These three techniques forward query messages to half the neighbors that BFS contacts
       • >RES and ISM use previous knowledge to decide which peers to forward the query to
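The two metrics and the fan-out setting k = 0.5 * d can be written down as a small sketch. Measuring recall against the set of documents a baseline run finds, and rounding the fan-out down with a minimum of 1, are assumptions; the slide does not spell out either detail.

```python
def recall_rate(retrieved, baseline):
    """Fraction of the baseline documents that a search mechanism retrieved."""
    baseline = set(baseline)
    if not baseline:
        return 0.0
    return len(set(retrieved) & baseline) / len(baseline)


def fanout(degree, fraction=0.5):
    """Number of neighbors to contact: k = fraction * d, at least 1 (assumed)."""
    return max(1, int(fraction * degree))


# Hypothetical document identifiers for one query.
found_by_baseline = {"d1", "d2", "d3", "d4"}
found_by_candidate = {"d1", "d2", "d3"}
candidate_recall = recall_rate(found_by_candidate, found_by_baseline)
```

Efficiency is then simply the message count recorded for each run, compared between mechanisms.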

  18. PeerWare Infrastructure and experiments
     • BFS requires almost 2.5 times as many messages as its competitors

  19. PeerWare Infrastructure and experiments
     • ISM found the most documents
     • ISM achieved almost a 90 percent recall rate while using only 38 percent of the messages BFS required
     • ISM improves its knowledge over time
     • Both >RES and ISM started out with a low recall rate (around 40 to 50 percent) because they initially choose their neighbors at random
