Peer-to-Peer and Distributed Hash Tables

Distributed Hash Tables
Challenge: To design and implement a robust and scalable distributed system composed of inexpensive, individually unreliable computers in unrelated administrative domains. (Partial thanks to Idit Keidar.)
CS 271
Searching for distributed data
• Goal: Make billions of objects available to millions of concurrent users
  • e.g., music files
• Need a distributed data structure to keep track of objects on different sites
  • map objects to locations
• Basic operations:
  • Insert(key)
  • Lookup(key)
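As a concrete (if still centralized) starting point, here is a minimal sketch of that insert/lookup interface; the class and names (DistributedDirectory, the location strings) are illustrative, not from the lecture, and the rest of the deck is about distributing this map across many unreliable nodes:

    # Minimal sketch of the directory interface a DHT must provide.
    # Names (DistributedDirectory, insert, lookup) are illustrative only.

    class DistributedDirectory:
        def __init__(self):
            # key -> set of locations (e.g., "host:port") where the object lives
            self.entries = {}

        def insert(self, key, location):
            """Record that the object named `key` is available at `location`."""
            self.entries.setdefault(key, set()).add(location)

        def lookup(self, key):
            """Return the set of locations currently holding `key` (empty if unknown)."""
            return self.entries.get(key, set())

    # Usage: a publisher inserts, a client looks up.
    directory = DistributedDirectory()
    directory.insert("title.mp3", "123.2.21.23:6699")
    print(directory.lookup("title.mp3"))   # {'123.2.21.23:6699'}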
Searching
(Figure: a Client on the Internet issues Lookup("title") while a Publisher on one of the nodes N1..N6 stores Key="title", Value=MP3 data; the question is which node to ask.)
Simple Solution
• First there was Napster
  • Centralized server/database for lookup
  • Only file sharing is peer-to-peer; lookup is not
• Launched in 1999, peaked at 1.5 million simultaneous users, and shut down in July 2001
Napster: Publish
(Figure: a peer at 123.2.21.23 announces "I have X, Y, and Z!"; the central server records insert(X, 123.2.21.23).)
Napster: Search
(Figure: a client asks the central server "Where is file A?"; the server replies search(A) --> 123.2.0.18, and the client fetches the file directly from that peer.)
Overlay Networks
• A virtual structure imposed over the physical network (e.g., the Internet)
• A graph, with hosts as nodes, and some edges
(Figure: keys and node IDs are both mapped by a hash function into the overlay network's identifier space.)
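One way to realize the two hash arrows in the figure, roughly as Chord does with SHA-1 (the helper name chord_id and the example inputs are illustrative):

    # Sketch: map both keys and node identifiers into the same m-bit ID space
    # with a cryptographic hash (SHA-1 here, as Chord uses).
    import hashlib

    M = 160                      # number of bits in the identifier space

    def chord_id(name: str) -> int:
        """Hash an arbitrary name (key or node address) to an m-bit identifier."""
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** M)

    print(chord_id("node:18.26.4.9:4001"))   # a node's position on the ring
    print(chord_id("some-song.mp3"))         # a key's position on the ring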
Unstructured Approach: Gnutella
• Build a decentralized, unstructured overlay
  • Each node has several neighbors
  • Each node holds several keys in its local database
• When asked to find a key X:
  • Check the local database to see whether X is known
  • If yes, return it; if not, ask your neighbors
  • Use a limiting threshold (a TTL) to bound propagation
Gnutella: Search
(Figure: the query "Where is file A?" floods from the requester through its neighbors; nodes that have file A send Reply messages back along the query path.)
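A minimal sketch of the flooding search just described, assuming each node object knows its neighbors and its local key set (all names here are illustrative):

    # Gnutella-style flooding search with a TTL-bounded propagation.

    class Node:
        def __init__(self, name):
            self.name = name
            self.neighbors = []      # other Node objects
            self.local_keys = set()  # keys stored locally

        def search(self, key, ttl=4, seen=None):
            """Return the name of a node holding `key`, flooding up to `ttl` hops."""
            seen = seen if seen is not None else set()
            if self.name in seen:            # do not revisit a node
                return None
            seen.add(self.name)
            if key in self.local_keys:       # check the local database first
                return self.name
            if ttl == 0:                     # propagation threshold reached
                return None
            for neighbor in self.neighbors:  # otherwise ask the neighbors
                hit = neighbor.search(key, ttl - 1, seen)
                if hit is not None:
                    return hit
            return None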
Structured vs. Unstructured
• The examples we described are unstructured:
  • There is no systematic rule for how edges are chosen; each node "knows some" other nodes
  • Any node can store any data, so the searched-for data might reside at any node
• Structured overlay:
  • The edges are chosen according to some rule
  • Data is stored at a pre-defined place
  • Tables define the next hop for lookup
Hashing
• Data structure supporting the operations:
  • void insert( key, item )
  • item search( key )
• Implementation uses a hash function for mapping keys to array cells
• Expected search time O(1)
  • provided that there are few collisions
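For reference, a small non-distributed hash table with chaining, the structure a DHT generalizes (an illustrative sketch, not from the slides):

    # Array of buckets indexed by hash(key) mod table size, with chaining.

    class HashTable:
        def __init__(self, size=101):
            self.buckets = [[] for _ in range(size)]

        def _index(self, key):
            return hash(key) % len(self.buckets)

        def insert(self, key, item):
            bucket = self.buckets[self._index(key)]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, item)   # overwrite an existing entry
                    return
            bucket.append((key, item))

        def search(self, key):
            for k, item in self.buckets[self._index(key)]:
                if k == key:
                    return item
            return None                        # expected O(1) with few collisions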
Distributed Hash Tables (DHTs)
• Nodes store table entries
• lookup( key ) returns the location of the node currently responsible for this key
• We will mainly discuss Chord [Stoica, Morris, Karger, Kaashoek, and Balakrishnan, SIGCOMM 2001]
• Other examples: CAN (Berkeley), Tapestry (Berkeley), Pastry (Microsoft Cambridge), etc.
CAN [Ratnasamy et al.]
• Map nodes and keys to coordinates in a multi-dimensional Cartesian space; each node owns a zone of the space
• Routing proceeds zone by zone along the shortest Euclidean path from the source toward the key
• For d dimensions, routing takes O(d * n^(1/d)) hops
Chord Logical Structure (MIT)
• m-bit ID space (2^m IDs), usually m = 160
• Nodes are organized in a logical ring according to their IDs
(Figure: ring with nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56.)
DHT: Consistent Hashing
(Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; each key sits at the first node that follows it on the ring.)
• A key is stored at its successor: the node with the next-higher ID
(Thanks to CMU for the animation.)
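The successor rule can be sketched directly: keep the node IDs sorted and binary-search for the first ID at or after the key, wrapping around the ring (the function name and the example IDs, taken from the figure, are illustrative):

    # Consistent hashing: a key is stored at its successor node.
    import bisect

    def successor(node_ids, key_id):
        """Return the node responsible for key_id (node_ids sorted; ring wraps)."""
        i = bisect.bisect_left(node_ids, key_id)
        return node_ids[i % len(node_ids)]   # wrap past the largest ID to the smallest

    nodes = sorted([32, 90, 105])            # N32, N90, N105 from the figure
    print(successor(nodes, 5))    # 32  -> K5  stored at N32
    print(successor(nodes, 20))   # 32  -> K20 stored at N32
    print(successor(nodes, 80))   # 90  -> K80 stored at N90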
Consistent Hashing Guarantees
• For any set of N nodes and K keys:
  • A node is responsible for at most (1 + ε)K/N keys
  • When an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands
DHT: Chord Basic Lookup
(Figure: node N10 asks "Where is key 80?"; the query is forwarded around the ring through N32 and N60 until the answer "N90 has K80" comes back; N105 and N120 also appear on the ring.)
• Each node knows only its successor
• Routing goes around the circle, one node at a time
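A sketch of this successor-only routing, assuming each node knows just its successor; the interval helper and the 7-bit ring size are my choices for the example (and the sketch assumes at least two nodes and a key that is not itself a node ID):

    def in_interval(x, start, end, size=128):
        """True if x lies in the ring interval (start, end], identifiers mod size."""
        return 0 < (x - start) % size <= (end - start) % size

    def basic_lookup(start_node, key_id, successor_of, size=128):
        """Walk the ring until key_id falls between a node and its successor."""
        hops, node = 0, start_node
        while not in_interval(key_id, node, successor_of[node], size):
            node = successor_of[node]    # one node at a time around the circle
            hops += 1
        return successor_of[node], hops  # that successor holds the key

    # The ring from the figure: N10, N32, N60, N90, N105, N120.
    succ = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}
    print(basic_lookup(10, 80, succ))    # (90, 2): N10 -> N32 -> N60; N90 holds key 80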
DHT: Chord "Finger Table"
(Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring.)
• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the i-th finger points 2^i / 2^m (i.e., 1/2^(m-i)) of the way around the ring
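A sketch of building a finger table under this definition (the helper name finger_table and the example node set, taken from the earlier ring figure, are illustrative):

    # finger[i] = first node that succeeds or equals (n + 2^i) mod 2^m.
    import bisect

    def finger_table(n, node_ids, m):
        """node_ids: sorted list of all node IDs currently in the ring."""
        fingers = []
        for i in range(m):
            target = (n + 2 ** i) % (2 ** m)
            j = bisect.bisect_left(node_ids, target)
            fingers.append(node_ids[j % len(node_ids)])   # wrap around the ring
        return fingers

    # Example in a 6-bit space (m = 6) using the ring N1..N56 shown earlier:
    ring = sorted([1, 8, 10, 14, 21, 30, 38, 42, 48, 51, 56])
    print(finger_table(8, ring, 6))   # [10, 10, 14, 21, 30, 42]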
DHT: Chord Join
• Assume an identifier space of 8 IDs, 0..7 (m = 3)
• Node n1 joins; it is the only node, so every finger points back to n1
  n1's successor table (entries are (n1 + 2^i) -> successor, for i = 0, 1, 2):
    2 -> 1,  3 -> 1,  5 -> 1
(Figure: ring with positions 0..7, only n1 present.)
DHT: Chord Join
• Node n2 joins; n1's first finger now points to n2
    n1: 2 -> 2,  3 -> 1,  5 -> 1
    n2: 3 -> 1,  4 -> 1,  6 -> 1
(Figure: ring with positions 0..7, nodes n1 and n2 present.)
DHT: Chord Join
• Nodes n0 and n6 join, and the successor tables are updated
    n0: 1 -> 1,  2 -> 2,  4 -> 6
    n1: 2 -> 2,  3 -> 6,  5 -> 6
    n2: 3 -> 6,  4 -> 6,  6 -> 6
    n6: 7 -> 0,  0 -> 0,  2 -> 2
(Figure: ring with positions 0..7, nodes n0, n1, n2, n6 present.)
DHT: Chord Join
• Nodes: n1, n2, n0, n6
• Items: f7, f1
• Each item is stored at the successor of its ID: f1 at n1, f7 at n0
    n0: 1 -> 1,  2 -> 2,  4 -> 6    (stores f7)
    n1: 2 -> 2,  3 -> 6,  5 -> 6    (stores f1)
    n2: 3 -> 6,  4 -> 6,  6 -> 6
    n6: 7 -> 0,  0 -> 0,  2 -> 2
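A short script that recomputes the successor tables and item placement in this example (the names M, SIZE, and successor are mine; the node set {0, 1, 2, 6} and items f1, f7 are from the slides):

    M = 3
    SIZE = 2 ** M

    def successor(node_ids, target):
        """First node ID that succeeds or equals `target`, wrapping around the ring."""
        for k in range(SIZE):
            candidate = (target + k) % SIZE
            if candidate in node_ids:
                return candidate

    nodes = {0, 1, 2, 6}
    for n in sorted(nodes):
        table = [((n + 2 ** i) % SIZE, successor(nodes, (n + 2 ** i) % SIZE))
                 for i in range(M)]
        print(f"n{n}: {table}")                 # pairs of (n + 2^i, successor)

    for item in (1, 7):
        print(f"item f{item} stored at n{successor(nodes, item)}")
    # f1 is stored at n1; f7 wraps around and is stored at n0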
DHT: Chord Routing
• Upon receiving a query for item id, a node:
  • Checks whether it stores the item locally
  • If not, forwards the query to the largest node in its successor table that does not exceed id
(Figure: example routing of query(7) across the ring of n0, n1, n2, n6; item f7 is stored at n0.)
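A sketch of this forwarding rule on the same example; note the fallback, left implicit in the one-line rule above, that when no table entry precedes the item the query goes to the node's immediate successor (all function and variable names are illustrative):

    def route(node, item_id, tables, items, size=8):
        """Follow the query hop by hop; return (node holding the item, hop path)."""
        path = [node]
        while item_id not in items.get(node, set()):
            dist = (item_id - node) % size
            # table entries that do not overshoot the item (ring distance from `node`)
            candidates = [s for s in tables[node] if 0 < (s - node) % size <= dist]
            if candidates:
                node = max(candidates, key=lambda s: (s - node) % size)
            else:
                node = tables[node][0]   # no closer entry: the item is at the successor
            path.append(node)
        return node, path

    # Successor tables and item placement from the join example above.
    tables = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
    items = {0: {7}, 1: {1}}
    print(route(1, 7, tables, items))    # (0, [1, 6, 0]): item f7 found at n0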
Chord Data Structures
• Finger table
  • First finger is the successor
• Predecessor
• What if each node knew all other nodes?
  • O(1) routing
  • Expensive updates
Routing Time
• Node n looks up a key stored at node p
• p is in n's i-th interval: p ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]
• n contacts f = finger[i]
• The interval is not empty, so: f ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]
• f is at least 2^(i-1) away from n
• p is at most 2^(i-1) away from f
• The distance is halved at each hop
(Figure: arc of the ring from n + 2^(i-1) to n + 2^i, showing f = finger[i] and p.)
Routing Time
• Assuming a uniform node distribution around the circle, the number of nodes in the search space is halved at each step
• Expected number of steps: log N
• Note that:
  • m = 160
  • For 1,000,000 nodes, log N ≈ 20
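The halving argument, written out as a short standard calculation (not verbatim from the slides):

    % The search space shrinks by at least half with every hop, so after t hops
    % at most N / 2^t candidate nodes remain:
    \frac{N}{2^{t}} \le 1 \quad\Longrightarrow\quad t \ge \log_2 N .
    % With N = 10^6 nodes, \log_2 10^6 \approx 19.93, i.e., about 20 hops,
    % even though identifiers are m = 160 bits long.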
P2P Lessons
• Decentralized architecture: avoid centralization
• Flooding can work
• Logical overlay structures provide strong performance guarantees
• Churn is a problem
• Useful in many distributed contexts