
Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and

Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell’Informazione. Outline. Introduction to P2P systems Common topologies Data location Churn Newscast algorithm Security.



  1. Complex Adaptive Systems C.d.L. Informatica – Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell’Informazione

  2. Outline • Introduction to P2P systems • Common topologies • Data location • Churn • Newscast algorithm • Security

  3. Peer-to-peer vs. client-server • Client-server: • Server well connected to the “center” of the Internet • Servers carry out critical tasks • Clients only talk to the server • Peer-to-peer: • Only nodes located on the “periphery” of the Internet • Tasks distributed across all nodes • Clients talk to other clients

  4. Example – Video sharing • Client-server: YouTube • Advantages • Client can disconnect after upload • Uploader needs little bandwidth • Other users can find the file easily (just use search on the server webpage) • Disadvantages • Server may not accept the file, or may remove it later (according to content policy) • Whole system depends on the server (what if it is shut down, like Napster?) • Server storage and bandwidth are expensive! [Figure: client-server, one uploader and several downloaders]

  5. Example – Video sharing • Peer-to-peer: BitTorrent • Advantages • Does not depend on a central server • Bandwidth shared across nodes (downloaders also act as uploaders) • High scalability, low cost • Disadvantages • Seeder must remain on-line to guarantee file availability • Content is more difficult to find (downloaders must find the .torrent file) • Freeloaders cheat in order to download without uploading [Figure: peer-to-peer, one seeder and several downloaders]

  6. Comparison: P2P vs. client-server • Client-server • Asymmetric: clients and servers carry out different tasks • Global knowledge: servers have a global view of the network • Centralization: communications and management are centralized • Single point of failure: a server failure brings down the system • Limited scalability: servers are easily overloaded • Expensive: server storage and bandwidth capacity are not cheap • Peer-to-peer • Symmetric: each node carries out the same tasks • Local knowledge: nodes only know a small set of other nodes • Decentralization: nodes must self-organize in a decentralized way • Robustness: several nodes may fail with little or no impact • High scalability: high aggregate capacity, load distribution • Low-cost: storage and bandwidth are contributed by users

  7. Characterizing peer-to-peer systems • The main characteristics of P2P systems are: • decentralization (i.e., no central server) • self-organization (e.g., adding new nodes and removing disconnected ones) • symmetric communications (e.g., peers act as clients and servers) • scalability (thanks to high aggregate capacity and load distribution) • shared ownership (i.e., storage and bandwidth are contributed by peers) • overlay construction and routing (i.e., nodes form a logical network on top of the underlying IP network) [Figure: a message from one peer to another is sent through the underlying IP network]

  8. P2P environment • P2P systems are deployed in a challenging environment: • High latency and low bandwidth between nodes - a high hop count will result in a high end-to-end latency - transferring large files may take a long time • Churn - nodes may disconnect temporarily - new nodes are constantly joining the system, while others leave the overlay permanently • Security - P2P clients run on machines under full control of their users - data sent to other nodes may be erased, corrupted, disclosed, etc. - malicious users may try to bring down the system (e.g., routing attack) • Selfishness - users may run hacked P2P clients in order to avoid contributing resources

  9. Problems • Some of the problems that a P2P system designer must face: • Overlay construction and maintenance - maintain a given overlay topology (e.g., random, two-level, ring, etc.) • Data location - locate a given data object among a large number of nodes • Data dissemination - propagate data in an efficient and robust manner • Per-node state - keep the amount of state per node small • Tolerance to churn - maintain system invariants (e.g., topology, data location, data availability) despite node arrivals and departures

  10. Topology • Some common topologies: • Flat unstructured: a node can connect to any other node - only constraint: maximum degree dmax - fast join procedure - usually very tolerant to churn - good for data dissemination, bad for location • Two-level unstructured: nodes connect to a supernode - supernodes form a small overlay - used for indexing and forwarding - large state and high load on supernodes • Flat structured: constraints based on node ids - allows for efficient data location - constraints require long join and leave procedures - less robust in high-churn environments

  11. Data location - Flooding • Problem: find the set of nodes S that store a copy of object O • Solutions: • (1) Flooding: send a search message to all nodes [first Gnutella protocol] • A search message contains either keywords or an object id • Advantages: • - simplicity • - no topology constraints • Disadvantages: • - high network overhead (huge traffic generated by each search request) • - flooding is stopped by a TTL (which produces a search horizon) • - only applicable to a small number of nodes

  12. Data location - Flooding • (1) Flooding (cont.) • Flooding in a flat unstructured network [Figure: a search message, its search horizon for TTL = 2, and an object] • Objects that lie outside of the horizon are not found
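
The TTL-limited flooding described in slides 11 and 12 can be sketched as follows. This is a minimal illustration, not the Gnutella protocol itself: the overlay is assumed to be a plain adjacency dictionary, the function name `flood_search` is hypothetical, and each node is assumed to drop duplicate copies of a search it has already seen.

```python
from collections import deque


def flood_search(neighbors, holders, source, ttl):
    """Flood a search from `source`, forwarding it at most `ttl` hops.

    neighbors: dict mapping a node to the list of its overlay neighbors
    holders:   set of nodes that store a copy of the object
    Returns (nodes where the object was found, number of nodes reached).
    """
    found = set()
    visited = {source}                 # assumption: duplicates are dropped
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node in holders:
            found.add(node)
        if remaining == 0:             # TTL exhausted: the search horizon
            continue
        for peer in neighbors[node]:
            if peer not in visited:
                visited.add(peer)
                queue.append((peer, remaining - 1))
    return found, len(visited)


# Chain overlay A - B - C - D, with the object stored on D only.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(flood_search(overlay, {"D"}, "A", ttl=2))  # D is beyond the horizon
print(flood_search(overlay, {"D"}, "A", ttl=3))  # D is reached
```

With TTL = 2 the search started at A stops at C, so the copy on D is not found, which is the search-horizon effect named on the slide.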

  13. Data location - Superpeers • (2) Two-level overlay: use superpeers to index the locations of an object [eMule, Gnutella 2, BitTorrent] • Each node connects to a superpeer and advertises the list of objects it stores • Search requests are sent to the superpeer, which forwards them to other superpeers • Advantages: • - highly scalable • Disadvantages: • - superpeers must be reliable, powerful and well connected to the Internet (expensive) • - superpeers must maintain large state • - the system relies on a small number of superpeers

  14. Data location - Superpeers • (2) Two-level overlay (cont.) [Figure: a request for an object is sent to a superpeer, and the response is returned] • A two-level overlay is a partially centralized system • In some systems superpeers do not connect to each other (e.g., BitTorrent)
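
The superpeer indexing scheme of slides 13 and 14 can be sketched with an in-memory index. The class name `Superpeer`, its methods, and the one-hop forwarding rule are assumptions made for illustration; they are not taken from eMule, Gnutella 2 or BitTorrent.

```python
class Superpeer:
    """Hypothetical superpeer that indexes the objects stored by its nodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}   # object id -> set of node addresses storing it
        self.peers = []   # other superpeers this one forwards searches to

    def advertise(self, node, object_ids):
        """Called by a node to announce the list of objects it stores."""
        for object_id in object_ids:
            self.index.setdefault(object_id, set()).add(node)

    def search(self, object_id, forward=True):
        """Answer from the local index, then ask the other superpeers once."""
        locations = set(self.index.get(object_id, ()))
        if forward:
            for other in self.peers:
                # forward=False keeps the request from bouncing back
                locations |= other.search(object_id, forward=False)
        return locations


sp1, sp2 = Superpeer("SP1"), Superpeer("SP2")
sp1.peers.append(sp2)
sp2.peers.append(sp1)
sp1.advertise("node-1", ["video.avi"])
sp2.advertise("node-2", ["video.avi", "song.mp3"])
print(sp1.search("video.avi"))  # both nodes are returned
```

The sketch also makes the stated disadvantage visible: every advertised object ends up in the state of some superpeer, so the index grows with the number of nodes attached to it.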

  15. Data location - KBR • (3) Structured networks: use a routing algorithm that implements Key-Based Routing [Overnet, Kad, BitTorrent trackerless] • Key-Based Routing (also known as Distributed Hash Tables, or DHTs) works as follows: • each node is given a unique node identifier, or nodeid • given a key k, the node whose nodeid is numerically closest to k among all nodes in the network is known as the root of key k • given a routing key k, a KBR algorithm can route a message to the root of k in a small number of hops, usually O(log N) • the location of an object with id objectid is tracked by the root of k = objectid • thus, one can find the location of an object by routing a message to the root of k = objectid and querying the root for the location of the object

  16. Data location - KBR (cont.) • Key-Based Routing [Pastry] • Source node id: 04F2 • k = object id: 8955 • Hops taken by route(k=8955, msg):

      Hop #   Hop id   Shared prefix length
      0       04F2     0
      1       85E0     1
      2       8909     2
      3       8957     3
      4       8954     3 (root of k)

      • Object 8955 is tracked by node 8954, which knows of two copies stored at nodes 620F and C52A • Overlay address space: [0000,FFFF] [Figure: overlay with nodes 04F2, E25A, C52A, 3A79, AC78, 5230, 8957, 620F, 8954, 8909, 85E0 and 8821]
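
The Pastry example on slide 16 can be checked with a few lines of code. This is only a sketch with global knowledge of all node ids, which a real node does not have; the helper names `shared_prefix_len` and `root_of` are hypothetical, and "numerically closest" is assumed to mean plain absolute difference, without wrap-around of the address space.

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits that two identifiers have in common."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def root_of(key, node_ids):
    """Node whose id is numerically closest to the key (no wrap-around)."""
    return min(node_ids, key=lambda n: abs(int(n, 16) - int(key, 16)))


# Node ids and route taken from slide 16.
NODES = ["04F2", "E25A", "C52A", "3A79", "AC78", "5230",
         "8957", "620F", "8954", "8909", "85E0", "8821"]
PATH = ["04F2", "85E0", "8909", "8957", "8954"]
KEY = "8955"

print(root_of(KEY, NODES))                         # 8954
print([shared_prefix_len(h, KEY) for h in PATH])   # [0, 1, 2, 3, 3]
```

Each hop shares at least as many leading digits with the key as the previous one, and the last hop is the root: nodes 8957 and 8954 both share the prefix 895, but 8954 is numerically closer to 8955.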

  17. Data location - KBR (cont.) • Routing table for node 4F28 [Pastry] [Figure: routing table of node 4F28, with one set of entries used to find the next hop with a longer shared prefix, and another used to find the nodeid closest to a key that is close to the local nodeid] • In this example the routing table size is 4 x 15 = 60 entries, for a maximum network size of N = 65536 nodes. • The average route length in this case is 4 hops.
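
The figures quoted on slide 17 follow from the identifier format. The sketch below assumes one routing-table row per identifier digit and one column per other digit value; the function name `pastry_table_shape` is hypothetical.

```python
def pastry_table_shape(id_digits, base):
    """Return (routing table entries, maximum network size).

    Assumes one row per identifier digit and, in each row, one entry for
    every digit value other than the node's own digit at that position.
    """
    rows = id_digits
    columns = base - 1
    return rows * columns, base ** id_digits


# 4 hexadecimal digits, as in the nodeid 4F28.
print(pastry_table_shape(id_digits=4, base=16))  # (60, 65536)
```

Since every hop fixes at least one more digit of the key, a route needs at most as many hops as there are digits, which is 4 here and equals log16(65536).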

  18. Data location - KBR (cont.) • (3) Structured networks (cont.) • Advantages: • - completely decentralized (no need for superpeers) • - routing algorithm achieves low hop count for large network sizes • Disadvantages: • - each object must be tracked by a different node • - objects are tracked by unreliable nodes (i.e., which may disconnect) • - keyword-based searches are more difficult to implement than with superpeers (because objects are located by their objectid) • - the overlay must be structured according to a given topology in order to achieve a low hop count • - routing tables must be updated every time a node joins or leaves the overlay

  19. Data location - Loosely structured overlays • (4) Loosely structured networks: use hints on the location of objects [Freenet] • Nodes locate objects by sending search requests containing the object id • Requests are propagated using a technique similar to flooding • Objects with similar identifiers are grouped on the same nodes [Figure: nodes A to F storing objects AE5J, 5B20 and AF02; a request for AE5J is propagated and a response is returned]

  20. Data location - Loosely structured overlays • (4) Loosely structured networks (cont.) • A search response leaves routing hints on the path back to the source • Hints are used when propagating future requests for similar object ids [Figure: nodes A to F with hint tables such as “AE5J: B”, “AE5J: D, 5B20: E” and “AE5J: F”; a request for AF02 is propagated using the hints left for AE5J]

  21. Data location - Loosely structured overlays • (4) Loosely structured networks (cont.) • Advantages: • - no topology constraints, flat architecture • - searches are more efficient than with plain flooding • Disadvantages: • - does not support keyword-based searches • - search requests have a TTL • - contrary to structured overlays, loosely structured overlays do not guarantee a low number of hops, nor that the object will be found
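
The hint mechanism of slides 19 to 21 can be illustrated with a next-hop choice based on identifier similarity. The similarity measure is an assumption (number of shared leading characters), as are the names `similarity` and `next_hop`; Freenet's actual routing rule is not given in the slides.

```python
def similarity(id_a, id_b):
    """Assumed similarity measure: number of shared leading characters."""
    count = 0
    for x, y in zip(id_a, id_b):
        if x != y:
            break
        count += 1
    return count


def next_hop(hints, object_id):
    """Pick the neighbor hinted for the known object id most similar to
    the requested one; return None when the node has no hints at all."""
    if not hints:
        return None
    best_known = max(hints, key=lambda known: similarity(known, object_id))
    return hints[best_known]


# One of the hint tables shown on slide 20: AE5J via D, 5B20 via E.
hints = {"AE5J": "D", "5B20": "E"}
print(next_hop(hints, "AF02"))  # D, since AF02 resembles AE5J more than 5B20
```

A node without a useful hint would have to fall back to flooding-like propagation, which is why this scheme gives no guarantee on the number of hops.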

  22. Data location - Summary • The location schemes described previously can be classified according to: • degree of structure • decentralization

  23. Effects of churn • Churn (node arrivals and departures) can have several effects on a P2P system: • data objects may become unavailable if all replicas disconnect • routing tables may become inconsistent (e.g., entries may point to nodes which have disconnected) • the overlay may become partitioned if several nodes suddenly disconnect

  24. Churn tolerance • Node arrivals and departures must not disrupt the normal behavior of the system • system invariants must be maintained • - connected overlay (i.e., no partitions), low average path length • - data objects accessible from anywhere in the network • two types of churn tolerance: • - dynamic repair: ability to react to changes in the overlay to maintain system invariants (e.g., heal partitions) • - static resilience: ability to continue operating correctly before adaptation occurs (e.g., route messages through alternate paths)

  25. Churn – Preventing partitions • A simple way to prevent partitions is to increase the node degree • Ring partitions can be avoided by keeping a list of successor nodes

  26. Churn – Static resilience (Ring topology) • Percentage of failed paths for various numbers of successor nodes [Figure: plots for 1, 16 and 48 successors] • More successors reduce the chances of “opening” the ring
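
The effect of successor lists on a ring can be sketched with a simple check. The model is an assumption made for illustration: nodes sit at fixed ring positions, each node knows its next `num_successors` positions, and the ring counts as “open” when some live node has no live successor left. It does not reproduce the measurements plotted on the slide.

```python
def ring_is_open(alive, num_successors):
    """Return True if some live node has lost all of its successors.

    alive:          list of booleans, one per ring position
    num_successors: length of each node's successor list,
                    assumed to be smaller than the ring size
    """
    n = len(alive)
    for i in range(n):
        if not alive[i]:
            continue
        successors = [alive[(i + k) % n] for k in range(1, num_successors + 1)]
        if not any(successors):
            return True
    return False


# Ring of 6 nodes where positions 1 and 2 have failed.
ring = [True, False, False, True, True, True]
for s in (1, 2, 3):
    print(s, ring_is_open(ring, s))
```

With one or two successors, node 0 loses every entry of its list and the ring opens; with three successors it can still reach node 3, so the ring stays closed.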

  27. Churn – Static resilience (various topologies) • Percentage of failed paths for various overlay topologies [Figure: plot not included in the transcript] • Some topologies provide more alternate paths between nodes than others

  28. Churn – Dynamic repair (Ring topology) • Dynamically update routing information to adapt to overlay changes • Two types of repair algorithms: • - reactive: start the maintenance procedure immediately after a change is detected • - periodic: execute the maintenance procedure periodically • A reactive algorithm can bring down the system instead of repairing it: • Each node disconnection triggers a maintenance procedure on a set of nodes • Above a given churn rate, the maintenance traffic congests the network • The network congestion leads to nodes being considered as disconnected • This triggers even more maintenance procedures (positive feedback), eventually bringing down the network • This effect may be avoided using a periodic maintenance algorithm

  29. Churn – Summary • Churn is an important issue in P2P overlays: • Data may become unavailable, and routing information outdated • Static resilience - depends on the topology (i.e., the number of alternate paths) - increases with the average node degree • Dynamic repair protocols must be carefully designed - reactive protocols are usually faster - periodic protocols can handle higher churn rates

  30. Newscast • Peer-to-peer protocol that creates and maintains an unstructured overlay • Highly resilient to churn • Can be used to propagate information • Extremely simple design based on information gossip: • - Each node only knows about a small set of other nodes (its view) • - Each node periodically picks a random node from this set • - Both nodes exchange their views and update them • The random view exchange makes the algorithm very robust to failures and changes in the overlay

  31. Newscast - View exchange • Each node maintains a view v containing c entries, where • entry = {node address, timestamp} • Each node executes the following code every T seconds: • 1. select random entry r from local view • 2. send local view plus an entry with the local address to node r • 3. retrieve view of node r and merge it with local view • 4. keep the c entries with the most recent timestamps • The view of a node changes on each round

  32. Newscast - View exchange (cont.) • Example view exchange initiated by node A • View of node A (c = 6): B,10 D,5 E,3 X,8 S,12 W,2 [Figure: node A linked to nodes B, D, E, X, S and W]

  33. Newscast - View exchange (cont.) • 1. Select random node • View of node A: B,10 D,5 E,3 X,8 S,12 W,2 (selected node: E) • 2. Exchange views (plus local entry) with selected node • Sent by node E: E,15 J,14 C,9 D,8 H,10 L,14 Z,2 • Sent by node A: A,15 B,10 D,5 E,3 X,8 S,12 W,2

  34. Newscast - View exchange (cont.) • 3. Merge views, and order by timestamp • Merge result: E,15 J,14 L,14 S,12 H,10 B,10 C,9 X,8 D,8 D,5 E,3 W,2 Z,2 • 4. Keep c most recent entries • New view of node A (c = 6): E,15 J,14 L,14 S,12 H,10 B,10
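
The merge step of the view exchange (steps 3 and 4 on slides 31 to 34) can be sketched as follows, using the views of nodes A and E from the example. The function name `merge_views` is hypothetical, and two details are assumptions not stated in the slides: a node appearing twice keeps only its freshest timestamp, and a node never stores an entry for itself.

```python
def merge_views(local_view, received_view, own_address, c):
    """Merge two views and keep the c entries with the newest timestamps.

    Views are lists of (node address, timestamp) pairs.
    """
    freshest = {}
    for address, timestamp in list(local_view) + list(received_view):
        if address == own_address:
            continue                       # assumption: no entry for oneself
        if address not in freshest or timestamp > freshest[address]:
            freshest[address] = timestamp  # assumption: keep freshest copy
    ranked = sorted(freshest.items(), key=lambda entry: entry[1], reverse=True)
    return ranked[:c]


# Views from the example: node A selects node E and receives E's view
# together with E's own fresh entry (E, 15).
view_a = [("B", 10), ("D", 5), ("E", 3), ("X", 8), ("S", 12), ("W", 2)]
from_e = [("E", 15), ("J", 14), ("C", 9), ("D", 8),
          ("H", 10), ("L", 14), ("Z", 2)]

new_view_a = merge_views(view_a, from_e, own_address="A", c=6)
print(new_view_a)
```

The resulting view contains E, J, L, S, H and B, the same six nodes as in the example; only the relative order of entries with equal timestamps may differ. Node E would run the same function on its side with the view sent by A.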

  35. Newscast - Average path length • Newscast overlays have a low average path length (average number of hops between any two nodes), i.e., O(log N)

  36. Newscast - Static resilience • The overlay shows high static resilience

  37. Newscast - Summary • Simple peer-to-peer algorithm • Nodes use only local information • Periodic peer-wise data exchanges • Emergent properties: • - low average path length • - resistance to high churn

  38. Security • Security in peer-to-peer systems is hard to enforce: • Users have full control over their computers • Modified clients may not follow the standard protocol • Communications may be eavesdropped • Data may be corrupted • Private data stored on remote computers may be disclosed

  39. Security - Weak identities • The user may leave the system and rejoin it with a new identity (i.e., user id) • If an attack is detected, the attacker can reenter the system with a new id • An attacker may create a large number of false identities (Sybil attack) • Example of Sybil Attack: • Nodes S1 to S6 are actually 6 instances of the P2P client running on the same machine • The attacker can intercept all traffic coming from or going to node A [Figure: node A surrounded by nodes S1 to S6]

  40. Security - Strong identities • The user cannot change his identity • Solution: use a centralized, trusted Certification Authority (CA) • - Each new user must obtain an identity certificate: • certificate = { user id, IP address, user’s public key, CA signature } • - The certificate is digitally signed by the CA, whose public key is known by all users • - A certificate cannot be forged (would require the CA’s private key) • To prove his identity, a user signs a message with his private key, and attaches the corresponding certificate signed by the CA • Strong identities prevent Sybil Attacks • If an attacker is caught, it cannot easily rejoin the system

  41. Security – Weak vs. strong identities • Strong identities require a centralized CA • new nodes must contact the CA before joining the network: • - the CA response may be slow (a few days) • - if the CA is unavailable, new nodes cannot join the system • the security of the system depends on the CA: • - the CA must correctly verify the identity of the requester • - the CA’s private key must be kept secret • Many P2P systems use weak identities • IP address already gives some identity information • Some systems ensure anonymity (e.g., Freenet)

  42. The end • You can get these slides from • http://www.cs.unibo.it/~picconi/slides • For any questions, e-mail me at: • picconi@cs.unibo.it
