
P2pDating for Building semantic overlay networks

Tim Benke. Supervisors: Josiane Xavier Parreira, Sebastian Michel. Bachelor thesis: P2pDating for Building semantic overlay networks. Why P2P-Networks? Decentralisation; no single point of failure; no content-control; distribution of content, computing power, bandwidth.


Presentation Transcript


  1. Tim Benke • Supervisors: Josiane Xavier Parreira, Sebastian Michel • Bachelor thesis: P2pDating for Building semantic overlay networks

  2. Why P2P-Networks? • Decentralisation • No single point of failure • No content-control • Distribution of content, computing power, bandwidth

  3. Querying in P2P-networks [Figure: query "Dirk Nowitzki" flooded through the network; TTL labels count down from 4 hops to 0 hops]

  4. Idea: Semantic Overlay Networks • Querying in unstructured P2P-networks: message flooding with TimeToLive produces many redundant messages • Group peers according to their content • Querying in a Semantic Overlay Network (SON): ask only the nodes of the relevant content field
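
To make the cost of TTL-limited flooding concrete, here is a minimal sketch. The toy graph and the `flood` helper are illustrative assumptions, not part of the thesis; for simplicity a peer forwards to all neighbours, including the one it received the query from.

```python
from collections import deque

def flood(graph, start, ttl):
    """Return (peers reached, messages sent) for a TTL-limited flood."""
    reached = {start}
    messages = 0
    frontier = deque([(start, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: this peer does not forward
        for neighbour in graph[peer]:
            messages += 1  # every forward costs a message, even a redundant one
            if neighbour not in reached:
                reached.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return reached, messages

# Hypothetical five-peer network (adjacency lists).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached, messages = flood(graph, "A", ttl=2)
print(sorted(reached), messages)
```

In this toy network eight messages are sent to reach three new peers, which illustrates the redundancy that grouping peers by content is meant to avoid.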

  5. Querying in a SON [Figure: query "Dirk Nowitzki"; peer groups labelled computer science, basketball, flowers, geology]

  6. How to build a SON: Dating • Contact another peer P • If isFriend(P): • Add P to the list of friends • Add P's friends to the list of candidates • isFriend(P) is judged by: • How high is the similarity? • How small is the overlap? • How well did P cooperate?
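
The dating step above can be sketched as follows. The `Peer` class and the `is_friend` callback are hypothetical names chosen for illustration; how friendship is actually decided (similarity, overlap, cooperation) is left to the caller.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    friends: set = field(default_factory=set)
    candidates: set = field(default_factory=set)

def date(me, other, is_friend):
    """One dating step: befriend `other` if it qualifies and learn its friends."""
    if not is_friend(me, other):
        return False
    me.friends.add(other.name)
    me.candidates.discard(other.name)
    # Friends of a friend become candidates for future dates.
    me.candidates |= other.friends - me.friends - {me.name}
    return True

alice = Peer("alice")
bob = Peer("bob", friends={"carol", "alice"})
accepted = date(alice, bob, is_friend=lambda a, b: True)
print(accepted, alice.friends, alice.candidates)
```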

  7. Process of P2P-dating • The peer to contact is chosen from 3 lists: friends, candidates, random • Send a check-alive message to friends • Send a contact message to candidates and random peers • Receive synopses of collections and compute scores • Friend and candidate lists have fixed lengths • Add peers until a list is full, then drop the worst peers
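
A fixed-length list that drops its worst entry when full could look like the sketch below. It assumes that a higher score means a better peer; the helper name and the capacity are illustrative.

```python
def add_bounded(peers, name, score, capacity):
    """Insert a scored peer, then evict the lowest-scored peers beyond capacity."""
    peers[name] = score
    while len(peers) > capacity:
        worst = min(peers, key=peers.get)
        del peers[worst]

friends = {}
for name, score in [("a", 0.9), ("b", 0.2), ("c", 0.5)]:
    add_bounded(friends, name, score, capacity=2)
print(friends)
```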

  8. Search in SON • Peer P sends queries to peers with a similar interest profile, i.e. all its friends • Each peer sends back only its top-k results • When all answers have arrived, P merges the results, removes duplicates and delivers the top-k results
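
The merge step can be sketched as below, assuming each answer is a list of (document, score) pairs and that scores from different peers are directly comparable, which is a simplification.

```python
import heapq

def merge_top_k(answers, k):
    """Merge per-peer result lists, keep one entry per document, return the k best."""
    best = {}
    for results in answers:
        for doc, score in results:
            if doc not in best or score > best[doc]:
                best[doc] = score  # duplicate: keep the higher score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

answers = [
    [("d1", 0.9), ("d2", 0.4)],  # top-k from one friend
    [("d2", 0.7), ("d3", 0.8)],  # top-k from another friend
]
merged = merge_top_k(answers, k=2)
print(merged)
```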

  9. Strategies for scores • Similarity(A,B): 0 = the same; > 0 up to ∞ = differs • Similarity Only: • Overlap Only: • Weighted Sum: • Random: no score computed
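
The formulas for the individual strategies are not given here. As one possible reading of the weighted-sum strategy, the sketch below combines a similarity term and an overlap term with a weight alpha. The mapping of the divergence to a score in (0, 1], the use of 1 - overlap as a novelty term, and the function name are all assumptions, not the thesis' definitions.

```python
def weighted_sum_score(kl, overlap, alpha=0.8):
    """Higher is better: similar content (low divergence) and little overlap."""
    similarity = 1.0 / (1.0 + kl)  # assumed mapping: 0 divergence -> 1.0
    novelty = 1.0 - overlap        # assumed: overlap is a fraction in [0, 1]
    return alpha * similarity + (1.0 - alpha) * novelty

print(weighted_sum_score(kl=0.1, overlap=0.2))
print(weighted_sum_score(kl=2.0, overlap=0.2))
```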

  10. Overlap Measure • Minwise Independent Permutations measure the overlap with formula: = hashes of documents
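
For intuition, a common way to estimate set resemblance with min-wise permutations is sketched below. It uses linear hash functions as a stand-in for true min-wise independent permutations, and integer document hashes as input; the function names, the number of permutations and the prime modulus are assumptions for illustration.

```python
import random

PRIME = (1 << 61) - 1  # modulus for the linear "permutations"

def make_permutations(count, seed=0):
    """Random (a, b) pairs defining h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(count)]

def signature(doc_hashes, permutations):
    """Synopsis of a collection: the minimum permuted hash per permutation."""
    return [min((a * h + b) % PRIME for h in doc_hashes) for a, b in permutations]

def resemblance(sig_a, sig_b):
    """Fraction of matching minima, an estimate of |A ∩ B| / |A ∪ B|."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)

perms = make_permutations(64)
a = set(range(100))       # hypothetical document hashes of peer A
b = set(range(100, 200))  # disjoint collection
c = set(range(50, 150))   # shares half of its documents with A
print(resemblance(signature(a, perms), signature(c, perms)))
```

Only the short signatures need to be exchanged between peers, not the collections themselves.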

  11. Similarity Measure • Kullback-Leibler Divergence / Relative Entropy • Similarity(A,B): 0 = the same; > 0 up to ∞ = differs
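
A minimal sketch of the Kullback-Leibler divergence between the term distributions of two collections follows. The add-one smoothing over a shared vocabulary is an assumption made here to keep the value finite; it is not taken from the thesis.

```python
import math
from collections import Counter

def term_distribution(terms, vocabulary, smoothing=1.0):
    """Smoothed relative term frequencies over a shared vocabulary."""
    counts = Counter(terms)
    total = len(terms) + smoothing * len(vocabulary)
    return {t: (counts[t] + smoothing) / total for t in vocabulary}

def kl_divergence(p, q):
    """D(P || Q) = sum_t P(t) * log(P(t) / Q(t)); 0 only if P equals Q."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)

basketball = "playoffs dunk playoffs court".split()
flowers = "rose tulip rose garden".split()
vocab = set(basketball) | set(flowers)
p = term_distribution(basketball, vocab)
q = term_distribution(flowers, vocab)
print(kl_divergence(p, p), kl_divergence(p, q))
```

Note that the divergence is not symmetric: D(P || Q) and D(Q || P) generally differ.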

  12. PASTRY: network infrastructure • Distributed Hash Table • maps keys to peers currently responsible for that key • MINERVA uses PASTRY • O( log(N) ) hops for any message to reach any destination
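
The key-to-peer mapping can be illustrated as follows. In Pastry a key is handled by the live node whose identifier is numerically closest to it; the sketch uses a small 16-bit circular identifier space instead of Pastry's 128-bit one and does not model routing or the FreePastry API. All names are illustrative.

```python
import hashlib

ID_BITS = 16
ID_SPACE = 1 << ID_BITS

def to_id(text):
    """Hash a string (e.g. a term or a peer address) into the identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")

def circular_distance(a, b):
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def responsible_peer(key_id, peer_ids):
    """Peer whose identifier is closest to the key (ties go to the lower id)."""
    return min(peer_ids, key=lambda pid: (circular_distance(pid, key_id), pid))

peers = [10, 90, 60000]
print(responsible_peer(100, peers))
print(responsible_peer(to_id("basketball"), peers))
```

When a peer joins or leaves, only the keys closest to its identifier change hands, which is why the mapping always refers to the peer currently responsible.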

  13. Local Collections • Index file saved on hard disk • The LUCENE index is an inverted index for terms occurring in websites obtained by: • the user while surfing (e.g. via a plugin) • a crawler working on bookmarks • Allows additions and deletions
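
To illustrate what an inverted index with additions and deletions does, here is a small in-memory sketch. It is not the LUCENE API; the class and its whitespace tokenisation are simplifications chosen for illustration.

```python
from collections import defaultdict

class InvertedIndex:
    """Maps each term to the set of page URLs that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.documents = {}

    def add(self, url, text):
        self.remove(url)  # re-adding a page replaces its old entry
        terms = set(text.lower().split())
        self.documents[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def remove(self, url):
        for term in self.documents.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))

index = InvertedIndex()
index.add("http://example.org/nba", "Playoffs schedule and playoffs results")
index.add("http://example.org/tulips", "Growing tulips")
print(index.search("playoffs"))
```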

  14. Experimental Setup • NUTCH was used as crawler • Seeds: 14-16 start URLs on a certain topic from del.icio.us and dmoz.org • Depth: 2 • Each peer: ~400 pages • Peers 1-4: Basketball • Peers 5-7: Computer Science • Peers 8-10: Flowers • Peers 11-12: Geology • Queries for peer 1: "playoffs", "Dirk Nowitzki" • Queries for peer 7: "thesis" • Queries for peer 12: "earth science"

  15. Chart 1 • Comparison over 75 iterations between 5 random peers and p2pdating with 5 friends (weighted sum strategy, alpha=0.8) • y-axis: recall • x-axis: iterations in steps of 5
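
Recall, the quantity on the y-axis, is the fraction of the relevant results that were actually retrieved. The sketch below shows the computation; which reference set counts as "relevant" in the experiments is not stated here, so the example sets are purely illustrative.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear among the retrieved ones."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved)) / len(relevant)

value = recall(retrieved=["d1", "d2", "d9"], relevant=["d1", "d2", "d3", "d4"])
print(value)
```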

  16. Chart 1

  17. Chart 2 • Comparison over 50 iterations between random peers asked and p2pdating with x friends (weighted sum strategy, alpha=0.8) • y-axis: recall • x-axis: #peers asked

  18. Chart 2

  19. Conclusion • Use of PASTRY as underlying routing/networking infrastructure • Implementation of the details of the peer-to-peer network and the p2pdating algorithm • Message handling: several message types, protocol for sending and receiving messages • Adaptation of NUTCH for crawling • Use of LUCENE to query indexes • Experiments show the benefit of the P2PDating algorithm

  20. Future Work • Further Experiments: • real-world data from bookmark lists of active del.icio.us users • Firefox or proxy plugin for on-the-fly indexing, querying and display of results • Further Applications: • Adaptation to MINERVA P2P Web Search

  21. Thank you for your interest Tim Benke PLAGIA

  22. FreePastry • Free open source version under BSD-license called FreePastry • FreePastry provides application level interface to underlying P2P-Network • API for Java 1.5 • Version used: 2.0 Beta

  23. Overview • Basics of P2P-networks • Querying in P2P-networks • Overlap and Similarity Computation • Process of P2P-dating • Application examples: Firefox plugin, del.icio.us

  24. Chart 2 • Comparison over 50 iterations between random peers asked, p2pdating with x-1 Friends and 1 Stranger (weighted sum strategy, alpha=0.8), and only K-L-Divergence • y-axis: recall • x-axis: #peers asked

  25. Chart 1 • Comparison over 75 iterations between 5 random peers, p2pdating with 4 Friends and 1 Stranger (weighted sum strategy, alpha=0.8), and only K-L-Divergence • y-axis: recall • x-axis: iterations in steps of 5

  26. O:P2P-Dating Project • Internet Crawls performed with APACHE-Project NUTCH provides collections • Collections are indexed by NUTCH and a LUCENE index is produced • 1 similarity measure and 1 overlap measure used to determine if node is a Friend

  27. Process of P2P-dating [Figure labels: FriendList, Michael Jordan]
