P2pDating for Building semantic overlay networks

Bachelor thesis: P2pDating for Building semantic overlay networks
Tim Benke
Supervisors: Josiane Xavier Parreira, Sebastian Michel

Why P2P-Networks?
• Decentralisation
• No single point of failure
• No content-control
• Distribution of content, computing power, bandwidth

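Querying in P2P-networks
[Figure: the query "Dirk Nowitzki" is flooded with TTL 4; the remaining hop count drops from 4 hops to 0 hops as the message travels through the network.]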

Idea: Semantic Overlay Networks
• Querying in unstructured P2P-networks: message flooding with a TimeToLive (TTL), which causes many redundant messages
• Idea: group peers according to their content
• Querying in a Semantic Overlay Network (SON): only ask the nodes of the specific content field
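To make the cost of flooding concrete, here is a minimal sketch of TTL-limited flooding on a small graph. The five-peer topology and the function name `flood` are invented for illustration and are not taken from the thesis. A peer forwards the query to every neighbour except the one it received the query from, and each forward counts as one message.

```python
from collections import deque

def flood(graph, start, ttl):
    """Flood a query from `start`; return (peers reached, messages sent)."""
    reached = {start}
    messages = 0
    # Each queue entry: (peer, peer it got the query from, hops left).
    queue = deque([(start, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL expired: this peer does not forward the query
        for neighbour in graph[peer]:
            if neighbour == sender:
                continue  # never send the query straight back
            messages += 1  # counted even if the neighbour already saw the query
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append((neighbour, peer, hops_left - 1))
    return reached, messages

# Hypothetical topology, for illustration only.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached, messages = flood(graph, "A", ttl=2)
print(sorted(reached), messages)
```

In this toy network the query reaches three new peers but costs six messages, so half of the traffic is redundant. A SON tries to avoid this by contacting only peers of the relevant topic.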

Querying in a SON
[Figure: the query "Dirk Nowitzki" and peer groups labelled computer science, basketball, flowers, geology.]

How to build a SON: Dating
• Contact another peer P
• If isFriend(P):
  • Add P to the list of friends
  • Add P's friends to the list of candidates
• isFriend(P) is judged by:
  • How high is the similarity?
  • How small is the overlap?
  • How well did P cooperate?

Process of P2P-dating
• The peer to contact is chosen from 3 lists: friends, candidates, random
• Send a check-alive message to friends
• Send a contact message to candidates and random peers
• Receive synopses of collections and compute scores
• Friend and candidate lists have fixed lengths: add peers until full, then drop the worst peers
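The list handling described on this slide can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the class `Peer`, the method `date`, and the list sizes are hypothetical, a higher score is assumed to mean a better friend, and the candidate list is trimmed by age because the slides do not say how candidates are ranked.

```python
class Peer:
    def __init__(self, name, max_friends=2, max_candidates=3):
        self.name = name
        self.max_friends = max_friends
        self.max_candidates = max_candidates
        self.friends = {}     # peer name -> score (assumption: higher is better)
        self.candidates = []  # peer names to contact in a later round

    def date(self, other, score, others_friends):
        """Process the answer to one contact message sent to `other`."""
        self.friends[other] = score
        # Fixed-length friend list: when it overflows, drop the worst peer.
        if len(self.friends) > self.max_friends:
            worst = min(self.friends, key=self.friends.get)
            del self.friends[worst]
        # Only a peer that made it into the friend list recommends its friends.
        if other in self.friends:
            for name in others_friends:
                known = name == self.name or name in self.friends
                if not known and name not in self.candidates:
                    self.candidates.append(name)
            # Assumption: keep only the most recently learned candidates.
            self.candidates = self.candidates[-self.max_candidates:]

p = Peer("P")
p.date("A", 0.9, ["X"])
p.date("B", 0.5, ["Y"])
p.date("C", 0.7, ["Z"])   # list is full: B has the lowest score and is dropped
p.date("D", 0.1, ["W"])   # D is worse than every friend, so it is not kept
print(p.friends, p.candidates)
```

Because D never becomes a friend, its recommendation W is not added to the candidate list, which mirrors the "If isFriend(P)" condition of the dating step.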

Search in SON
• Peer P sends queries to peers with a similar interest profile, i.e. all friends
• Each peer only sends its top-k results back
• When all answers have arrived, P merges the results, removes duplicates and delivers the top-k results
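The merge step at the querying peer could look like the sketch below. It assumes that every friend answers with a list of (URL, score) pairs and that scores from different peers are directly comparable; neither detail is stated on the slides, and the function name `merge_top_k` is hypothetical.

```python
import heapq

def merge_top_k(answers, k):
    """Merge per-peer result lists, drop duplicates, return the k best."""
    best = {}
    for results in answers:
        for url, score in results:
            # Duplicate URL: keep the highest score reported by any peer.
            if url not in best or score > best[url]:
                best[url] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

# Hypothetical answers from two friends.
answers = [
    [("a.html", 0.9), ("b.html", 0.4)],
    [("b.html", 0.6), ("c.html", 0.5)],
]
print(merge_top_k(answers, 2))
```

Since each peer returns at most k results, the querying peer handles at most k results per friend, whatever the size of the friends' collections.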

Strategies for scores
Similarity(A,B): 0 = the same; greater than 0, up to ∞ = the collections differ
• Similarity Only
• Overlap Only
• Weighted Sum
• Random: no score computed

Overlap Measure
• Min-wise Independent Permutations
• The overlap is measured with a formula over the hashes of the documents
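As an illustration of the technique named on this slide, the sketch below builds a min-wise synopsis from document hashes and estimates the resemblance (Jaccard coefficient) of two collections from the fraction of matching minima. Everything concrete here is an assumption: the linear hash functions `(a*h + b) mod PRIME` only approximate min-wise independent permutations, and the number of permutations, the use of SHA-1, and the conversion from resemblance to overlap size may differ from what the thesis used.

```python
import hashlib
import random

PRIME = (1 << 61) - 1  # Mersenne prime used as modulus for the linear hashes

def doc_hash(doc):
    """Map a document identifier (e.g. a URL) to an integer below PRIME."""
    digest = hashlib.sha1(doc.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PRIME

def make_permutations(count, seed=0):
    """All peers must use the same (a, b) pairs, hence the fixed seed."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(count)]

def synopsis(docs, permutations):
    """One minimum per permutation; this short vector is what peers exchange."""
    hashes = [doc_hash(d) for d in docs]
    return [min((a * h + b) % PRIME for h in hashes) for a, b in permutations]

def resemblance(syn_a, syn_b):
    """Fraction of positions with equal minima, an estimate of |A∩B| / |A∪B|."""
    matches = sum(1 for x, y in zip(syn_a, syn_b) if x == y)
    return matches / len(syn_a)

def estimated_overlap(r, size_a, size_b):
    """Solve r = I / (size_a + size_b - I) for the intersection size I."""
    return r * (size_a + size_b) / (1 + r)

perms = make_permutations(256)
coll_a = [f"http://example.org/{i}" for i in range(150)]
coll_b = [f"http://example.org/{i}" for i in range(50, 200)]
r = resemblance(synopsis(coll_a, perms), synopsis(coll_b, perms))
print(r, estimated_overlap(r, len(coll_a), len(coll_b)))
```

The two example collections share 100 of 200 distinct URLs, so the estimate should land near 0.5. The point of the synopsis is that peers compare 256 numbers instead of exchanging their full document lists.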

Similarity Measure
• Kullback-Leibler Divergence / Relative Entropy
• Similarity(A,B): 0 = the same; greater than 0, up to ∞ = the collections differ
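A small sketch of the Kullback-Leibler divergence between the term distributions of two collections follows. The slides do not say which distributions are compared or how zero probabilities are handled, so the add-one smoothing over a shared vocabulary and the toy term lists are assumptions made for this example.

```python
import math
from collections import Counter

def term_distribution(terms, vocabulary, smoothing=1.0):
    """Smoothed relative term frequencies, so that no probability is zero."""
    counts = Counter(terms)
    total = len(terms) + smoothing * len(vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}

def kl_divergence(p, q):
    """D(p || q) = sum over t of p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

# Hypothetical term lists for two peers.
basketball = ["playoffs", "nba", "nba", "dunk"]
flowers = ["rose", "tulip", "rose", "garden"]
vocabulary = set(basketball) | set(flowers)

p = term_distribution(basketball, vocabulary)
q = term_distribution(flowers, vocabulary)
print(kl_divergence(p, p), kl_divergence(p, q))
```

The divergence of a distribution from itself is 0 and it grows as the collections differ, which matches the reading "0 = the same" on the slide. Note that the measure is not symmetric: D(p || q) and D(q || p) generally differ.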

PASTRY: network infrastructure
• Distributed Hash Table
• Maps keys to the peers currently responsible for that key
• MINERVA uses PASTRY
• O(log(N)) hops for any message to reach any destination
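The key-to-peer mapping can be illustrated with the sketch below, which picks the peer whose identifier is numerically closest to the key in a circular 128-bit identifier space. This is not the FreePastry API: the function names are hypothetical, and the sketch looks at all peer identifiers at once instead of routing hop by hop, so it shows only who is responsible for a key, not how the O(log(N)) routing works.

```python
import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS

def to_id(name):
    """Hash a key or peer address to an identifier in [0, 2^128)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

def circular_distance(a, b):
    """Distance on the identifier ring, going whichever way is shorter."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_peer(key_id, peer_ids):
    """The peer with the identifier numerically closest to the key."""
    return min(peer_ids, key=lambda pid: circular_distance(pid, key_id))

# Small hand-picked identifiers, for illustration only.
peers = [10, 1000, 1 << 127]
print(responsible_peer(900, peers))           # closest to 1000
print(responsible_peer(ID_SPACE - 5, peers))  # wraps around the ring to 10
```

When a peer leaves, only the keys it was responsible for move to its neighbours on the ring, which is why the slide speaks of the peers "currently" responsible for a key.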

Local Collections
• Index file saved on hard disk
• The LUCENE index is an inverted index for terms occurring in websites obtained by
  • the user, while surfing (e.g. via a plugin)
  • a crawler running on bookmarks
• Allows additions and deletions
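For readers unfamiliar with the data structure, the sketch below shows an in-memory inverted index that supports additions and deletions. It only illustrates the idea and does not use or imitate the LUCENE API; the class name, the whitespace tokenisation and the example URLs are assumptions.

```python
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> URLs containing the term
        self.documents = {}               # URL -> terms, needed for deletion

    def add(self, url, text):
        self.delete(url)  # re-adding a page replaces its old entry
        terms = set(text.lower().split())
        self.documents[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def delete(self, url):
        for term in self.documents.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]  # remove empty posting lists

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))

index = InvertedIndex()
index.add("http://example.org/nba", "Dirk Nowitzki playoffs")
index.add("http://example.org/roses", "rose garden")
print(index.search("playoffs"))
index.delete("http://example.org/nba")
print(index.search("playoffs"))
```

Keeping the term set per URL is what makes deletion cheap: the index does not have to scan every posting list to find the entries of a removed page.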

Experimental Setup
• NUTCH was used as crawler
• Seeds: 14-16 start URLs on a certain topic from del.icio.us and dmoz.org
• Depth: 2
• Each peer: ~400 pages
• Topics: peers 1-4 Basketball, peers 5-7 Computer Science, peers 8-10 Flowers, peers 11-12 Geology
• Queries for peer 1: "playoffs", "Dirk Nowitzki"
• Queries for peer 7: "thesis"
• Queries for peer 12: "earth science"

Chart 1
• Comparison over 75 iterations between
  - 5 random peers
  - p2pdating with 5 friends, weighted sum strategy, alpha=0.8
• y-axis: recall; x-axis: iterations in steps of 5
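The recall plotted on the y-axis can be computed as in the sketch below. The slides do not state what the reference result set is; the sketch assumes it is the set of results that a query over all peers would return, and the function name and example URLs are hypothetical.

```python
def recall(retrieved, relevant):
    """Share of the reference results that the asked peers actually returned."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

# Hypothetical result sets for one query.
reference = ["a.html", "b.html", "c.html", "d.html"]
from_friends = ["a.html", "b.html", "x.html"]
print(recall(from_friends, reference))
```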

Chart 1

Chart 2
• Comparison over 50 iterations between
  - random peers asked
  - p2pdating with x friends, weighted sum strategy, alpha=0.8
• y-axis: recall; x-axis: number of peers asked

Chart 2

Conclusion
• Use of PASTRY as underlying routing/networking infrastructure
• Implementation of the details of the peer-to-peer network and the p2pdating algorithm
• Message handling: several message types, protocol for sending and receiving messages
• Adaptation of NUTCH for crawling
• Use of LUCENE to query indexes
• Experiments show the benefit of the P2PDating algorithm

Future Work
• Further experiments:
  • real-world data from bookmark lists of active del.icio.us users
  • Firefox or proxy plugin for on-the-fly indexing, querying and display of results
• Further applications:
  • Adaptation to MINERVA P2P Web Search

Thank you for your interest
Tim Benke

FreePastry
• Free open source version of PASTRY under BSD license, called FreePastry
• FreePastry provides an application-level interface to the underlying P2P network
• API for Java 1.5
• Version used: 2.0 Beta

Overview
• Basics of P2P-networks
• Querying in P2P-networks
• Overlap and Similarity Computation
• Process of P2P-dating
• Application examples: Firefox plugin, del.icio.us

Chart 2
• Comparison over 50 iterations between
  - random peers asked
  - p2pdating with x-1 friends and 1 stranger, weighted sum strategy, alpha=0.8
  - K-L divergence only
• y-axis: recall; x-axis: number of peers asked

Chart 1
• Comparison over 75 iterations between
  - 5 random peers
  - p2pdating with 4 friends and 1 stranger, weighted sum strategy, alpha=0.8
  - K-L divergence only
• y-axis: recall; x-axis: iterations in steps of 5

O:P2P-Dating Project
• Internet crawls performed with the APACHE project NUTCH provide the collections
• Collections are indexed by NUTCH and a LUCENE index is produced
• 1 similarity measure and 1 overlap measure are used to determine whether a node is a friend

Process of P2P-dating
[Figure labels: FriendList, Michael Jordan]
