Search in Distributed Networks

Search in Distributed Networks. Lecture: Peer-to-peer networks. Professor: Dr. Robert Tolksdorf. Elena Antonenko (elena.Antonenko@web.de), Malte Münchert (muencher@inf.fu-berlin.de), Jing Zhao (zhao@inf.fu-berlin.de), Shunfeng Zhang (zhang@inf.fu-berlin.de).


Presentation Transcript


  1. Search in Distributed Networks • Lecture: Peer-to-peer networks • Professor: Dr. Robert Tolksdorf • Elena Antonenko, elena.Antonenko@web.de • Malte Münchert, muencher@inf.fu-berlin.de • Jing Zhao, zhao@inf.fu-berlin.de • Shunfeng Zhang, zhang@inf.fu-berlin.de

  2. Language of the talk: • English instead of German! • Comment: German is also a very beautiful language! • Questions can be asked in German!

  3. Structure of our talk: • Introduction • Content-Agnostic Search (Shunfeng) • Content-Based Search (Elena) • Pastry (Malte) • JXTA Search (Jing)

  4. Introduction • Most applications (file sharing, instant messaging, chat) involve: • finding objects and resources of interest • exchanging resources with other peers • This is accomplished by a system of advertisements and queries

  5. Introduction • Advertisement/query model: • Resource providers publish resources and resource consumers send search queries; • Resource seekers advertise needs on the network and resource providers query the network for those needs

  6. Introduction • The problem reduces to: • querying a dynamic and distributed directory of advertisements by advertisement consumers • The distributed directory is built using a subset of all the peers in the network

  7. Content-Agnostic Search >>>basic concept • The organization of the peers does not depend on the resources they index or point to

  8. Content-Agnostic Search >>> central mediator • Peers register content with the central server • Peers query the central server for information • Roles of the central server: • Matchmaker • Broker

  9. Content-Agnostic Search >>> central mediator as Matchmaker • [Diagram with participants Requester, Matchmaker, and Peer; message labels: Advertise, Unadvertise, ASK-ALL ("who can help?"), Reply (name1 + info1 …), STREAM-ALL "request", REPLY …]

  10. Content-Agnostic Search >>> central mediator as Matchmaker • Requester: an agent with an objective that it wants to be achieved by some other agent. • Matchmaker: an agent that • knows the names of many agents • and their corresponding capabilities. • Server: an agent that has committed itself to fulfilling objectives on behalf of other agents.

  11. Content-Agnostic Search >>> central mediator as Matchmaker

  12. Content-Agnostic Search >>> central mediator as Broker • [Diagram with participants Requester, Broker, and Peer; message labels: Advertise, Unadvertise, STREAM-ALL "Request", REPLY]

  13. Content-Agnostic Search >>>central mediator as Broker • Requester: an agent that has an objective that it wants to have achieved by another agent. • Broker: • an agent that knows the names of some other agents and their corresponding capabilities, • and advertises its own capabilities as some function of the capabilities of these other agents. • Brokered Server: an agent that has committed to the broker to take on a predetermined class of objectives.
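
The difference between the two mediator roles can be sketched in a few lines of Python. This is only an illustration: the message names (Advertise, Unadvertise, ASK-ALL, STREAM-ALL) come from the slides, but the class layout, the method signatures, and the idea of modelling a peer as a callable are assumptions made for this sketch.

```python
class Matchmaker:
    """Keeps a registry of which peer advertised which capability."""

    def __init__(self):
        self.registry = {}  # capability -> set of peer names

    def advertise(self, name, capability):
        self.registry.setdefault(capability, set()).add(name)

    def unadvertise(self, name, capability):
        self.registry.get(capability, set()).discard(name)

    def ask_all(self, capability):
        # Matchmaker role: return names only; the requester contacts peers itself.
        return sorted(self.registry.get(capability, set()))


class Broker(Matchmaker):
    """Same registry, but requests are forwarded on the requester's behalf."""

    def __init__(self, peers):
        super().__init__()
        self.peers = peers  # hypothetical: peer name -> callable handling a request

    def stream_all(self, capability, request):
        # Broker role: pass the request to every matching peer and collect replies.
        return [self.peers[name](request) for name in self.ask_all(capability)]


peers = {
    "p1": lambda req: f"p1 handled {req}",
    "p2": lambda req: f"p2 handled {req}",
}
broker = Broker(peers)
broker.advertise("p1", "mp3-search")
broker.advertise("p2", "mp3-search")
broker.unadvertise("p2", "mp3-search")
```

With a matchmaker the requester receives names and talks to the peers directly; with a broker the requester only ever talks to the broker.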

  14. Content-Agnostic Search >>>central mediator • Advantages: comprehensive; fast updates; minimal message exchange • Disadvantages: single point of failure; not scalable; needs a central authority • Comment: these problems can be solved with decentralized mediators

  15. Content-Agnostic Search

  16. Content-Agnostic Search >>>Networks forming randomly connected graphs • Nodes are connected to a few random neighbors • Example: Gnutella network • Already covered in the 2nd talk of the lecture • Power Law Networks: the search takes advantage of the power-law link distribution of naturally occurring networks

  17. Content-Agnostic Search >>>Power Law Networks

  18. Content-Agnostic Search >>>Power Law Networks • Power law distribution: • few nodes have very high connectivity • many nodes have very low connectivity

  19. Content-Agnostic Search >>>Power Law Networks

  20. Content-Agnostic Search >>>Power Law Networks • Rule: at each step, one new node joins with two edges, which connect preferentially to nodes of higher degree

  21. Content-Agnostic Search >>>Power Law Networks

  22. Content-Agnostic Search >>>Power Law Networks • Power law graphs are constructed dynamically • Nodes are not rewired at random; they attach preferentially to the most connected nodes
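
The growth rule (one new node per step, two edges, preferential attachment) can be tried out with a small simulation. This is a minimal sketch, not the construction used in the talk: the starting graph of two connected nodes, the function names, and the fixed random seed are assumptions.

```python
import random


def grow_power_law_graph(n_nodes, edges_per_node=2, seed=0):
    """Grow a graph one node at a time using preferential attachment."""
    rng = random.Random(seed)
    edges = [(0, 1)]       # assumed start: two nodes joined by one edge
    endpoints = [0, 1]     # every node appears once per incident edge
    for new in range(2, n_nodes):
        wanted = min(edges_per_node, new)  # cannot exceed the existing nodes
        targets = set()
        while len(targets) < wanted:
            # Picking from the endpoint list selects a node with
            # probability proportional to its current degree.
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degrees(edges):
    """Count how many edges touch each node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


edges = grow_power_law_graph(200)
deg = degrees(edges)
```

Sorting `deg.values()` for a larger graph typically shows the pattern described above: a handful of highly connected nodes and a long tail of nodes with only two or three links.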

  23. Content-Agnostic Search >>>Power Law Networks • Power law search algorithm • needs modification to the basic Gnutella approach;

  24. Content-Agnostic Search >>>Power Law Networks • The Gnutella approach: • broadcasts to all neighbors • can exchange with every neighbor • Modified Gnutella: • forwards to the neighbor with the highest number of connections • exchanges with the first- and second-degree neighbors
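
A minimal sketch of the modified approach, under two assumptions that the slide does not spell out: "exchange with the first- and second-degree neighbors" is read as "a node knows what its neighbors and their neighbors store", and a query is never sent back to a node that has already seen it. The graph, the item name, and the function name are made up for illustration.

```python
def high_degree_search(graph, holdings, start, item, max_hops=50):
    """Walk towards well-connected nodes until someone nearby holds the item."""
    visited = {start}
    current = start
    path = [start]
    for _ in range(max_hops):
        # Nodes whose content the current node knows about:
        # itself, its neighbors, and its neighbors' neighbors.
        known = {current} | set(graph[current])
        for neighbor in graph[current]:
            known |= set(graph[neighbor])
        for node in sorted(known):
            if item in holdings.get(node, ()):
                return node, path
        candidates = [n for n in graph[current] if n not in visited]
        if not candidates:
            return None, path
        # Instead of broadcasting, forward to the best-connected neighbor.
        current = max(candidates, key=lambda n: len(graph[n]))
        visited.add(current)
        path.append(current)
    return None, path


graph = {
    "s": ["a"],
    "a": ["s", "H"],
    "H": ["a", "b", "c", "d"],   # the hub
    "b": ["H"],
    "c": ["H"],
    "d": ["H", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
holdings = {"f": {"song.mp3"}}
holder, path = high_degree_search(graph, holdings, "s", "song.mp3")
```

In this toy graph the query travels s, a, H, d and is answered at d, which knows that its second-degree neighbor f holds the item; only one message is sent per hop rather than one per neighbor.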

  25. Content-Agnostic Search >>>Power Law Networks • Advantages of PLN • Networks of decentralized mediators • Broadcasting queries to all neighbors avoided • Search cost reduced

  26. Content-Based Search: Introduction • Content of queries is used to efficiently route the messages to the most relevant peers • Search techniques include: • Content-mapping networks; • Some variations of publish/subscribe networks; Content-Based Search

  27. Content-Mapping Search Networks • Every peer in the network indexes a "zone" of the advertisement space • The zone is dynamic • The size of the zone depends on the number of peers • Peers map advertisement content to the space • Mapping is performed using hash functions • Examples include: CAN, Chord, Tapestry, Pastry

  28. Distributed Hash Table (DHT) • A DHT provides the same functionality as a traditional hash table • A DHT stores key-value pairs • The data structure is distributed over different nodes • Provides functions: • insert(id, item); • item = query(id); • An item can be anything: a data object, document, file, pointer to a file

  29. Content Addressable Network (CAN) • CAN is based on a virtual d-dimensional coordinate space • Each node and each item is assigned a unique id in this d-dimensional space • Goals: • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes
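
The insert/query interface can be illustrated with a toy in-process version. This is a sketch under simplifying assumptions: the nodes are plain dictionaries inside one object, and the responsible node is picked as hash modulo the number of nodes, which real DHTs such as CAN or Chord avoid because it reshuffles almost every key when a node joins or leaves. The class name `ToyDHT` and the node names are hypothetical.

```python
import hashlib


class ToyDHT:
    """Hash-table interface whose entries are spread over several nodes."""

    def __init__(self, node_names):
        self.names = sorted(node_names)
        self.stores = {name: {} for name in self.names}  # one table per node

    def _node_for(self, key):
        # Hash the id and map it onto one of the nodes (simplified placement).
        digest = hashlib.sha1(str(key).encode("utf-8")).digest()
        return self.names[int.from_bytes(digest, "big") % len(self.names)]

    def insert(self, key, item):
        self.stores[self._node_for(key)][key] = item

    def query(self, key):
        return self.stores[self._node_for(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.insert("jingle-bells.mp3", "pointer to the file on peer 7")
```

A caller only sees `insert` and `query`; which node actually stores the pair is decided by the hash of the id.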

  30. CAN Example: Two Dimensional Space • The space is divided between the nodes • Together, the nodes cover the entire space • Each node covers either a square or a rectangular area • Example: node n1:(1, 2) is the first node that joins and covers the entire space

  31. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins; the space is divided between n1 and n2

  32. CAN Example: Two Dimensional Space • Node n3:(3, 5) joins too

  33. CAN Example: Two Dimensional Space • Nodes n4:(5, 5) and n5:(6, 6) join

  34. CAN Example: Two Dimensional Space • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6) • Items: f1:(2, 3); f2:(5, 0); f3:(2, 1); f4:(7, 5)

  35. CAN Example: Two Dimensional Space • Each item is stored by the node that owns its mapping in the space
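
The join sequence and the item placement can be replayed in code. The figures are not part of this transcript, so the sketch rests on assumptions: an 8 x 8 space, zones split in half along their longer side (x on a tie), and the joining node taking the half that contains its own point. The node and item coordinates are the ones listed on the slides; everything else is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    owner: str
    lo: tuple  # inclusive lower-left corner (x, y)
    hi: tuple  # exclusive upper-right corner (x, y)

    def contains(self, point):
        return all(l <= c < h for l, c, h in zip(self.lo, point, self.hi))


def join(zones, name, point, size=8):
    """Add a node: split the zone that contains its point in half."""
    if not zones:
        zones.append(Zone(name, (0, 0), (size, size)))  # first node owns everything
        return
    old = next(z for z in zones if z.contains(point))
    width, height = old.hi[0] - old.lo[0], old.hi[1] - old.lo[1]
    dim = 0 if width >= height else 1           # split the longer side
    mid = (old.lo[dim] + old.hi[dim]) / 2
    lower_hi = tuple(mid if i == dim else h for i, h in enumerate(old.hi))
    upper_lo = tuple(mid if i == dim else l for i, l in enumerate(old.lo))
    lower, upper = (old.lo, lower_hi), (upper_lo, old.hi)
    # The new node takes the half containing its point; the old owner keeps the rest.
    new_half, kept_half = (upper, lower) if point[dim] >= mid else (lower, upper)
    old.lo, old.hi = kept_half
    zones.append(Zone(name, *new_half))


def owner_of(zones, point):
    return next(z.owner for z in zones if z.contains(point))


zones = []
for name, point in [("n1", (1, 2)), ("n2", (4, 2)), ("n3", (3, 5)),
                    ("n4", (5, 5)), ("n5", (6, 6))]:
    join(zones, name, point)

items = {"f1": (2, 3), "f2": (5, 0), "f3": (2, 1), "f4": (7, 5)}
placement = {item: owner_of(zones, point) for item, point in items.items()}
```

Under these assumptions f1 and f3 end up on n1, f2 on n2, and f4 on n5; a different split rule would give different zones.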

  36. CAN: Query Example • Each node knows its neighbors in the d-dimensional space • A query is forwarded to the neighbor that is closest to the query id • Example: assume n1 queries f4

  37. CAN Routing • For d dimensions with n equal zones, each node has 2d neighbors • Routing table size: O(d) • Guarantees that a file is found in at most d × n^(1/d) steps, where n is the total number of nodes • Algorithm: choose the neighbor nearest to the destination

  38. CAN: Multi-Dimension • Increasing the dimension reduces the path length
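
The greedy rule and the step bound can be checked on an idealized CAN. The sketch assumes n = k^d equal zones arranged as a grid of k cells per side whose edges wrap around, so that every zone has exactly 2d neighbors; the function names and the choice k = 4, d = 2 are illustrative.

```python
from itertools import product


def torus_distance(a, b, k):
    """Hop distance between two cells when every dimension wraps around."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def neighbors(cell, k):
    """The 2d zones adjacent to a cell: one step up or down per dimension."""
    for dim in range(len(cell)):
        for step in (-1, 1):
            moved = list(cell)
            moved[dim] = (moved[dim] + step) % k
            yield tuple(moved)


def route(src, dst, k):
    """Greedy routing: always hand over to the neighbor nearest the destination."""
    path = [src]
    while path[-1] != dst:
        path.append(min(neighbors(path[-1], k),
                        key=lambda cell: torus_distance(cell, dst, k)))
    return path


k, d = 4, 2   # 16 equal zones in two dimensions, so n^(1/d) = 4
cells = list(product(range(k), repeat=d))
worst = max(len(route(s, t, k)) - 1 for s in cells for t in cells)
```

Here `worst` is the largest number of hops over all source/destination pairs; it stays below the bound d × n^(1/d) = 8 quoted on the slide.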

  39. Chord: Introduction • Chord is a distributed lookup protocol • Given a key (data item), it maps the key onto a node (peer) • A hash function assigns each node and key an m-bit identifier • A node's identifier is defined by hashing the node's IP address • A key identifier is produced by hashing the key • ID(node) = hash(196.178.0.1) • ID(key) = hash(“jingle-bells.mp3”)

  40. Chord: Data Structure • Identifiers are ordered on a virtual ring of size 2^m • Each node maintains: • a finger table: entry i in the finger table of node n is the first node that succeeds or equals n + 2^i (mod 2^m), i.e. successor(n + 2^i) • its predecessor node • An item identified by id is stored on the successor node of id
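
A minimal sketch of how such identifiers can be computed. Chord as published uses SHA-1; truncating the digest to m bits by taking it modulo 2^m, and the function name `chord_id`, are choices made for this illustration.

```python
import hashlib


def chord_id(value, m):
    """Hash a string (an IP address or a key) to an m-bit identifier."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = chord_id("196.178.0.1", 16)
key_id = chord_id("jingle-bells.mp3", 16)
```

Nodes and keys share one identifier space, which is what allows a key to be assigned to the node whose identifier follows it on the ring.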
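
The finger-table definition can be written down directly. The sketch below uses the nodes of the deck's example (0, 1, 2 and 6 in an identifier space 0..7, so m = 3) and computes the tables from a global list of nodes, which a real node would not have; the function names are illustrative.

```python
def successor(ident, nodes, m):
    """First node whose identifier equals or follows ident on the ring."""
    ident %= 2 ** m
    ring = sorted(nodes)
    # If no node has an identifier >= ident, wrap around to the smallest one.
    return next((n for n in ring if n >= ident), ring[0])


def finger_table(n, nodes, m):
    """Entry i points to successor(n + 2^i)."""
    return [successor(n + 2 ** i, nodes, m) for i in range(m)]


nodes = [0, 1, 2, 6]
m = 3
tables = {n: finger_table(n, nodes, m) for n in nodes}
# Items are stored on the successor of their identifier.
item_owner = {item: successor(item, nodes, m) for item in (1, 7)}
```

For node 1 the targets are 2, 3 and 5, giving the fingers 2, 6, 6; item 7 wraps around the ring and is stored on node 0.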

  41. Chord: Example • Assume an identifier space 0..7 • Node n1:(1) joins; all entries in its finger table are initialized to itself

  42. Chord: Example • Nodes n2:(2), n0:(0), n6:(6) join

  43. Chord: Example • Nodes: n0:(0), n1:(1), n2:(2), n6:(6) • Items: f1:(1), f7:(7)

  44. Chord: Example • Upon receiving a query for item id, a node: • checks whether it stores the item locally • if not, forwards the query to the largest node in its successor table that does not exceed id

  45. Chord: Properties • Routing table size O(log(N)), where N is the total number of nodes • Guarantees that a file is found in O(log(N)) steps
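
The query procedure can be simulated on the example ring. This sketch reads "largest node that does not exceed id" as the finger that most closely precedes id on the ring, so that wrap-around past 0 is handled, and hands the query to the immediate successor once id falls between a node and its successor. Finger tables are recomputed from a global node list for brevity; the helper names are illustrative.

```python
def successor(ident, nodes, m):
    ident %= 2 ** m
    ring = sorted(nodes)
    return next((n for n in ring if n >= ident), ring[0])


def finger_table(n, nodes, m):
    return [successor(n + 2 ** i, nodes, m) for i in range(m)]


def in_ring_interval(x, a, b, right_closed):
    """True if x lies in (a, b), or in (a, b] if right_closed, going clockwise."""
    if right_closed and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past 0


def lookup(start, key, nodes, m):
    """Return the node storing key and the nodes visited on the way."""
    ring = sorted(nodes)
    current, path = start, [start]
    while True:
        pred = ring[ring.index(current) - 1]
        # Step 1: a node stores every id between its predecessor and itself.
        if in_ring_interval(key, pred, current, right_closed=True):
            return current, path
        fingers = finger_table(current, nodes, m)
        if in_ring_interval(key, current, fingers[0], right_closed=True):
            nxt = fingers[0]  # the immediate successor stores the key
        else:
            # Step 2: forward to the finger that most closely precedes the key.
            nxt = next((f for f in reversed(fingers)
                        if in_ring_interval(f, current, key, right_closed=False)),
                       fingers[0])
        current = nxt
        path.append(current)


nodes, m = [0, 1, 2, 6], 3
owner, path = lookup(1, 7, nodes, m)
```

Starting at node 1, the query for item 7 is forwarded to node 6 and then to node 0, which stores it.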

  46. Pastry - Introduction • Decentralized and scalable DHT-network • Designed for efficient message routing between nodes

  47. What does DHT mean? • Distributed Hash Table • Hash value for every peer • Every peer has knowledge of some other peers (stored in a hash table) • All hash tables from all peers represent a complete map for all peers

  48. The Pastry namespace • Peers reside on a virtual circle made up of all possible addresses • Blue points represent peers • [Figure: circle labelled 2^128 and 2^0]

  49. Pastry routing • A message is sent to the (known) node that is numerically closest to the target node • The procedure is repeated until the target node is reached • [Figure labels: Origin, Closest to target, Distance, Destination]

  50. Pastry routing • A message is sent to the (known) node that is numerically closest to the target node • The procedure is repeated until the target node is reached • [Figure labels: Origin, Destination]
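
The simplified routing rule on these slides can be sketched as follows. Note that Pastry itself routes by matching identifier prefixes with a routing table and a leaf set; the code below only mirrors the "forward to the numerically closest known node" description. The tiny 16-address namespace, the `known` tables, and the function names are made up for illustration.

```python
def circular_distance(a, b, size):
    """Numerical distance between two addresses on a circle of the given size."""
    diff = abs(a - b) % size
    return min(diff, size - diff)


def route(origin, target, known, size):
    """Repeatedly forward to the known node closest to the target."""
    path = [origin]
    while path[-1] != target:
        current = path[-1]
        best = min(known[current] | {current},
                   key=lambda n: (circular_distance(n, target, size), n))
        if best == current:
            break  # no known node is closer; the message stops here
        path.append(best)
    return path


# Hypothetical peers and the other peers each of them knows about.
known = {1: {4, 12}, 4: {1, 6}, 6: {4, 7}, 7: {6}, 12: {1}}
path = route(1, 7, known, size=16)
```

Each hop strictly reduces the distance to the target, so the loop always terminates, either at the target or at the closest node that can be reached with the available knowledge.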
