Seminar: The Economics of Peer-to-Peer Architectures

P2P Networks are Distributed Hash Tables • Thomas Zahn

Motivation • Peer-to-Peer used in an ever-increasing number of areas • first-generation P2P networks (Gnutella, etc.) do NOT scale • => more efficient look-up service needed • DHTs provide guaranteed upper bounds on the cost of a key look-up

Table of Contents • First-generation P2P systems: Napster, Gnutella • General concepts of DHTs • Case study I: Chord • Case study II: Pastry • Comparison / Conclusion

Napster • was used primarily for file sharing • NOT a pure peer-to-peer network • => hybrid system • peer turns to central DB for querying (client/server) • peer downloads directly from other peer(s) (peer-to-peer)

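Napster [Figure: peers 1 to 6 around a central DB. Steps: 1. Query (peer to central DB) • 2. Response (central DB to peer) • 3. Download Request (peer to peer) • 4. File (peer to peer)]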

Gnutella - overview • pure peer-to-peer • used for file sharing • very popular => practically proven ? • very simple protocol • no routing "intelligence" • messages are always broadcast

Gnutella - PING/PONG [Figure: node 1 sends Ping 1, which is forwarded through the network; Pong replies from nodes 2 to 8 travel back to node 1. Known hosts at node 1 afterwards: 2 • 3,4,5 • 6,7,8] • Query/Response analogous
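
The flooding behind Ping (and, analogously, Query) can be sketched as below. The topology is an assumption inferred from the Pong labels on this slide (node 1 linked to 2, 3 and 6; node 3 to 4 and 5; node 6 to 7 and 8), and `flood_ping` is a hypothetical helper, not part of any Gnutella implementation.

```python
from collections import deque

# Assumed topology (inferred from the slide, not given explicitly).
GRAPH = {
    1: [2, 3, 6], 2: [1], 3: [1, 4, 5], 4: [3],
    5: [3], 6: [1, 7, 8], 7: [6], 8: [6],
}


def flood_ping(graph, origin, ttl):
    """Broadcast a Ping from `origin`; return (hosts discovered, messages sent)."""
    seen = {origin}
    queue = deque([(origin, None, ttl)])
    sent = 0
    while queue:
        node, came_from, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL exhausted, do not forward further
        for neighbour in graph[node]:
            if neighbour == came_from:
                continue  # never send the Ping back to its sender
            sent += 1
            if neighbour not in seen:  # duplicates are dropped by the receiver
                seen.add(neighbour)
                queue.append((neighbour, node, hops_left - 1))
    return sorted(seen - {origin}), sent


print(flood_ping(GRAPH, 1, ttl=2))
```

Every Ping touches every reachable link within the TTL, which is why message volume grows with the size of the network rather than with the size of the answer.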

General Concepts of DHTs • every object has a (hash) key • an object is stored at the node responsible for its key • every node stores and maintains part of the hash table • ALL DHTs provide one elementary function: • lookup(key) → node
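
A minimal sketch of these concepts, under stated assumptions: `key_of` and `ToyDHT` are hypothetical names, keys are SHA-1 hashes truncated to 16 bits, and the responsibility rule (first node ID at or after the key, wrapping around) is only a placeholder for whatever rule a concrete DHT defines.

```python
import hashlib


def key_of(name: str, bits: int = 16) -> int:
    """Hash an object name to a `bits`-bit key (SHA-1, truncated)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Each node keeps only the part of the hash table it is responsible for."""

    def __init__(self, node_ids, bits=16):
        self.bits = bits
        self.tables = {n: {} for n in sorted(node_ids)}

    def lookup(self, key: int) -> int:
        # Assumed rule: first node ID >= key, wrapping to the smallest ID.
        ids = sorted(self.tables)
        return next((n for n in ids if n >= key), ids[0])

    def put(self, name, value):
        k = key_of(name, self.bits)
        self.tables[self.lookup(k)][k] = value

    def get(self, name):
        k = key_of(name, self.bits)
        return self.tables[self.lookup(k)].get(k)


dht = ToyDHT([100, 20000, 45000])
dht.put("song.mp3", "peer-7")
print(dht.get("song.mp3"))
```

In a real DHT `lookup` is itself a distributed operation; here it is a local function so that the store/retrieve split stays visible.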

Case Study I: Chord (1) • consistent hashing (e.g. based on SHA-1) assigns each node and object an m-bit ID • IDs are ordered on an ID circle ranging from 0 to 2^m - 1 • new nodes assume slots on the ID circle according to their ID • key k is assigned to the first node whose ID ≥ k, wrapping around past 2^m - 1 • => successor(k)

Case Study I: Chord (2) [Figure: ID circle with slots 0 to 15. Examples: successor(0)=0 • successor(1)=1 • successor(2)=3 • successor(6)=6 • successor(9)=9 • successor(10)=13 • successor(14)=15]
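
The successor values on this slide can be reproduced with a few lines. This is a sketch that assumes m = 4 and the node set {0, 1, 3, 6, 9, 13, 15}; the node set is inferred from the successor values shown, and the function name is illustrative.

```python
from bisect import bisect_left


def successor(k: int, nodes: list[int], m: int) -> int:
    """First node whose ID >= k on a 2**m ID circle, wrapping to the smallest ID."""
    ring = sorted(nodes)
    i = bisect_left(ring, k % (2 ** m))
    return ring[i % len(ring)]  # i == len(ring) means "wrap around"


NODES = [0, 1, 3, 6, 9, 13, 15]  # assumed node IDs
for k in (0, 1, 2, 6, 9, 10, 14):
    print(f"successor({k}) = {successor(k, NODES, 4)}")
```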

Case Study I: Chord (3) • each node n maintains a routing table with at most m entries => finger table • i-th entry = first node s that succeeds n by at least 2^(i-1) • => s = successor(n.ID + 2^(i-1))
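
As a sketch of the finger-table rule, the function below computes all m fingers of one node from a global view of the ring. The node set and m = 4 are assumptions for illustration; a real node has no global view and fills the table through look-ups.

```python
def finger_table(n, nodes, m):
    """Return [(i, start, finger)] with start = n + 2^(i-1) mod 2^m."""
    ring = sorted(nodes)
    size = 2 ** m
    table = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % size
        # successor(start): first node at or after start, wrapping around
        finger = next((x for x in ring if x >= start), ring[0])
        table.append((i, start, finger))
    return table


NODES = [0, 1, 3, 6, 9, 13, 15]  # assumed node IDs
for row in finger_table(1, NODES, 4):
    print(row)
```

Note that for node 1 the first two fingers coincide (both are node 3), which is the effect that keeps the real cost of filling the table below m separate look-ups.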

Case Study I: Chord (4) [Figure: ID circle, slots 0 to 15]

Case Study I: Chord (5) Routing when node n searches for key k: • if successor(k) is unknown, ask the closest match (most immediately preceding k) n' in the finger table • if successor(k) is also unknown to n', n' asks the closest match in its finger table • => continues until successor(k) is found • => at each step the distance to successor(k) is halved • => O(log N)
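
The routing rule can be simulated on a static ring. This is a sketch under assumptions: the node set and m = 4 are illustrative, `build_fingers` uses a global view to prepare the tables, and a node that holds ID k itself answers directly.

```python
def between(x, a, b):
    """True if x lies in the open circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past 0


def build_fingers(nodes, m):
    """Finger i of node n is successor(n + 2^(i-1)); computed with a global view."""
    ring, size = sorted(nodes), 2 ** m

    def succ(k):
        return next((x for x in ring if x >= k % size), ring[0])

    return {n: [succ(n + 2 ** (i - 1)) for i in range(1, m + 1)] for n in ring}


def lookup(start, k, fingers):
    """Route from node `start` towards key k; return (successor(k), nodes asked)."""
    n, path = start, [start]
    while True:
        nxt = fingers[n][0]  # first finger = immediate successor of n
        if k == n:
            return n, path
        if k == nxt or between(k, n, nxt):
            return nxt, path
        # closest match: the finger that most immediately precedes k
        n = next((f for f in reversed(fingers[n]) if between(f, n, k)), nxt)
        path.append(n)


FINGERS = build_fingers([0, 1, 3, 6, 9, 13, 15], 4)
print(lookup(1, 14, FINGERS))
```

Starting at node 1 and searching for key 14, the query visits nodes 1, 9 and 13 before node 13 reports its successor 15 as the owner.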

Case Study I: Chord (6) [Figure: ID circle, slots 0 to 15]

Case Study I: Chord (7) • new node n joins by connecting to any existing node z • z looks up all m fingers for n • => would yield an effort of O(m * log N) • BUT: often the i-th finger is also the (i+1)-th finger • => actual effort can be shown to be O(log^2 N)

Case Study I: Chord (8) • new node n also has to be inserted into the finger tables of other nodes • n becomes the i-th finger of node p iff: • (1) p.ID ≤ n.ID - 2^(i-1) • (2) p.finger[i].ID ≥ n.ID • => for each i = 1 to m, node n finds the immediate predecessor of n.ID - 2^(i-1) AND • => inserts itself (n) as that node's i-th finger (if need be) • => can be shown to be O(log^2 N)
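
Conditions (1) and (2) can be checked in circular form: n becomes the i-th finger of p exactly when n lies at or after p.ID + 2^(i-1) and before p's current i-th finger, going clockwise. The sketch below applies that test to every node, which is a brute-force scan and not the O(log^2 N) predecessor walk described on the slide. Node IDs, m = 4 and all function names are assumptions.

```python
def build_fingers(nodes, m):
    """Finger i of node n is successor(n + 2^(i-1)); computed with a global view."""
    ring, size = sorted(nodes), 2 ** m

    def succ(k):
        return next((x for x in ring if x >= k % size), ring[0])

    return {n: [succ(n + 2 ** (i - 1)) for i in range(1, m + 1)] for n in ring}


def fingers_to_update(new, fingers, m):
    """Return [(p, i)] for every finger entry that must point to `new` after it joins."""
    size = 2 ** m
    updates = []
    for p, table in sorted(fingers.items()):
        for i, old in enumerate(table, start=1):
            start = (p + 2 ** (i - 1)) % size
            # Clockwise from start: new must come strictly before the old finger.
            if (new - start) % size < (old - start) % size:
                updates.append((p, i))
    return updates


BEFORE = build_fingers([0, 1, 3, 6, 9, 13, 15], 4)
print(fingers_to_update(11, BEFORE, 4))
```

For a new node 11 only four entries change: finger 4 of node 3, finger 3 of node 6, and fingers 1 and 2 of node 9.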

Case Study I: Chord (9) [Figure: ID circle, slots 0 to 15]

Case Study I: Chord (10) • each node n runs a stabilize procedure periodically • n asks its successor for the successor's predecessor p • n checks whether p ought to be its successor instead • n also periodically refreshes a random finger x by (re)locating successor(n.ID + 2^(x-1))
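
A minimal in-memory sketch of the stabilize step. The `notify` call, with which a node tells its successor about itself, is not on the slide; it is taken from the published Chord protocol description. The class layout and node IDs are assumptions, and finger refreshing is left out.

```python
def between(x, a, b):
    """True if x lies in the open circular interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past 0


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self):
        # Ask the successor for its predecessor p and adopt p if it sits between us.
        p = self.successor.predecessor
        if p is not None and between(p.id, self.id, self.successor.id):
            self.successor = p
        self.successor.notify(self)

    def notify(self, other):
        # `other` believes it is our predecessor; accept if it is closer than the current one.
        if self.predecessor is None or between(other.id, self.predecessor.id, self.id):
            self.predecessor = other


# Existing two-node ring 1 -> 9 -> 1, then node 6 joins knowing only its successor 9.
a, b, c = Node(1), Node(9), Node(6)
a.successor, a.predecessor = b, b
b.successor, b.predecessor = a, a
c.successor = b
c.stabilize()  # node 9 learns that 6 is its predecessor
a.stabilize()  # node 1 learns that 6 is its new successor
print(a.successor.id, c.successor.id, b.successor.id)
```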

Case Study II: Pastry (1) • similar to Chord • organizes nodes into a circular ID space ranging from 0 to 2^128 - 1 • each node and object is assigned a 128-bit hash key • BUT: routing is based on prefix matching rather than numerical difference • takes (topology) proximity into account

Case Study II: Pastry (2) • each (object/node) ID is a sequence of digits in base 2^b (b=1: binary, b=4: hex) • each node maintains a: • routing table • neighborhood set • leaf set
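
To make the digit view concrete, here is a sketch that splits an ID into base-2^b digits. The function name and the shortened 16-bit and 24-bit example IDs are assumptions chosen to keep the output readable; Pastry IDs have 128 bits.

```python
def digits(ident: int, b: int, bits: int = 128) -> list[int]:
    """Split a `bits`-bit ID into base-2**b digits, most significant first."""
    count = -(-bits // b)  # number of digits, rounded up
    mask = 2 ** b - 1
    return [(ident >> (b * (count - 1 - i))) & mask for i in range(count)]


# 16-bit example with b = 2: the base-4 digit string 10233102
print(digits(int("10233102", 4), b=2, bits=16))
# 24-bit example with b = 4: the hex string d46a1c
print(digits(0xD46A1C, b=4, bits=24))
```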

Case Study II: Pastry (3) Routing Table: • log_(2^b) N rows • each row has 2^b - 1 entries • each entry in row n shares an n-digit prefix with the current node • but each such node differs in the (n+1)-th digit • each such node has one of the 2^b - 1 other digits at position n + 1
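
The row and column in which another node would be filed follow directly from the shared prefix. In this sketch rows are counted from 0, IDs are digit strings, and the second ID in each example is made up for illustration.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two ID strings have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def table_slot(own: str, other: str) -> tuple[int, int]:
    """(row, column) of `other` in the routing table of `own`.

    row    = length of the shared prefix
    column = the digit of `other` right after that prefix
    """
    row = shared_prefix_len(own, other)
    if row == len(own):
        raise ValueError("identical IDs have no routing table slot")
    return row, int(other[row], 16)


print(table_slot("10233102", "10231000"))  # shares the prefix 1023, next digit 1
print(table_slot("10233102", "32100000"))  # shares nothing, next digit 3
```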

Case Study II: Pastry (4) Routing Table: NodeID 10233102, b = 2

Case Study II: Pastry (5) Neighborhood Set: • contains m closest nodes (according to the proximity metric) • NOT used for routing • useful in maintaining locality properties

Case Study II: Pastry (6) Leaf Set: • contains l numerically closest nodes • l/2 numerically smaller nodes • l/2 numerically larger nodes
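
A sketch of how a leaf set could be picked from a global view of the ring. Small integer IDs, the wrap-around at the ends of the ID space, and the function name are assumptions for illustration; the ring must hold more than l nodes.

```python
def leaf_set(n, nodes, l):
    """Return (l/2 numerically smaller, l/2 numerically larger) neighbours of n."""
    ring = sorted(nodes)
    i = ring.index(n)
    half = l // 2
    # Indices wrap around because the ID space is circular.
    smaller = [ring[(i - j) % len(ring)] for j in range(half, 0, -1)]
    larger = [ring[(i + j) % len(ring)] for j in range(1, half + 1)]
    return smaller, larger


NODES = [0, 1, 3, 6, 9, 13, 15]  # assumed node IDs
print(leaf_set(9, NODES, 4))
print(leaf_set(0, NODES, 4))
```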

Case Study II: Pastry (7) NodeID 10233102, b = 2

Case Study II: Pastry (8) Routing (1): • node n checks its leaf set for key k • if a leaf node l covers k, forward the msg directly to node l • otherwise, check the routing table for a node whose ID shares a prefix with k that is (at least) one digit longer than the prefix k shares with n • if no such node exists, find a node in the leaf set that is numerically closer to k than n
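
The three cases can be sketched as one routing decision. Simplifications, all of them assumptions: IDs are base-4 digit strings, the routing table is a flat list `known` instead of rows and columns, leaf-set coverage ignores wrap-around of the ID space, and all node IDs other than 10233102 are made up.

```python
def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(own, key, leaves, known, base=4):
    """Pick the next node for `key` at node `own`; returns `own` if it is the target."""
    val = lambda s: int(s, base)
    dist = lambda s: abs(val(s) - val(key))

    # 1. Leaf set covers the key: hand over to the numerically closest node.
    if leaves and min(map(val, leaves)) <= val(key) <= max(map(val, leaves)):
        return min(leaves + [own], key=dist)

    # 2. Routing table: a node sharing a longer prefix with the key than we do.
    p = shared_prefix_len(own, key)
    longer = [x for x in known if shared_prefix_len(x, key) > p]
    if longer:
        return max(longer, key=lambda x: shared_prefix_len(x, key))

    # 3. Fallback: a leaf that is numerically closer to the key than we are.
    closer = [x for x in leaves if dist(x) < dist(own)]
    return min(closer, key=dist) if closer else own


LEAVES = ["10233033", "10233021", "10233120", "10233122"]
KNOWN = ["02212102", "13021022", "10323302", "10200230"]
print(next_hop("10233102", "10233113", LEAVES, KNOWN))  # case 1
print(next_hop("10233102", "10313000", LEAVES, KNOWN))  # case 2
print(next_hop("10233102", "10313000", LEAVES, []))     # case 3
```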

Case Study II: Pastry (9) Routing (2): • process continues until key k is found • intuitively: at each routing step, the matching prefix grows by one digit • at most log_(2^b) 2^128 = 128/b digits per key • => O(log_(2^b) N)
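
A short worked example of these two bounds. The values b = 4 and N = 10^6 are assumptions chosen for illustration.

```latex
\[
  \log_{2^b} 2^{128} = \frac{128}{b}
  \qquad\Longrightarrow\qquad
  b = 4:\ \frac{128}{4} = 32 \text{ hex digits per ID}
\]
\[
  \text{hops} = O\!\left(\log_{2^b} N\right),
  \qquad
  N = 10^{6},\ b = 4:\quad
  \log_{16} 10^{6} = \frac{6}{\log_{10} 16} \approx 4.98
\]
```

So although an ID has 32 digits, a network of a million nodes needs only about 5 of them to be matched by routing-table hops; the leaf set resolves the rest.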

Case Study II: Pastry (10) [Figure: ID circle from 0 to 2^128 - 1. Route(d46a1c) issued at node 65a1fc; other labelled nodes: d4213f, d462ba, d467c4, d471f1]

Case Study II: Pastry (11) • new node n asks any existing node z to route a message keyed with n's ID to the numerically closest node x • every node on the path z → x sends its complete routing table (RT) to n • the i-th row of n's RT is initialized with the i-th row of the i-th node on the path z → x • finally, n sends a copy of its resulting state to each node in its RT, leaf set (LS), and neighborhood set (NHS) • => O(log_(2^b) N)

Case Study II: Pastry (12) Locality • Pastry tries to have each entry in node X's RT refer to a node that is near X • new node n chooses a nearby node z (according to the proximity metric) to join through • by induction it can be shown that RT initialization sustains this locality invariant • a node periodically requests state from nodes in its NHS to update its RT

Other DHTs • CAN: uses a d-dimensional Cartesian coordinate space; if d = log N, it also guarantees key look-up in O(log N) • Tapestry: VERY similar to Pastry; also uses prefix-based routing; differs in its approach to achieving locality; differs in replication support
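
Why d = log N gives logarithmic look-up in CAN can be seen from its path-length bound. The bound O(d N^(1/d)) is taken from the published CAN design and is not stated on the slide; logarithms are base 2.

```latex
\[
  \text{hops}_{\mathrm{CAN}} = O\!\left(d \cdot N^{1/d}\right)
\]
\[
  d = \log_2 N
  \;\Longrightarrow\;
  N^{1/d} = 2^{\frac{\log_2 N}{\log_2 N}} = 2
  \;\Longrightarrow\;
  \text{hops}_{\mathrm{CAN}} = O(2 \log_2 N) = O(\log N)
\]
```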

Comparison / Conclusion • Chord and Pastry both guarantee an O(log N) upper bound on key look-up • Chord appears less complex • Chord needs less state information • Pastry (heuristically) considers proximity • => both VERY similar • => interchangeable

Questions ?
