
Scalable Content-Addressable Networks

Prepared by Kuhan Paramsothy, March 5, 2007.


Presentation Transcript


  1. Scalable Content-Addressable Networks. Prepared by Kuhan Paramsothy, March 5, 2007

  2. High-Level Overview
     • Hash tables (which map keys to values) are heavily used in building software applications
     • The concept of a Content-Addressable Network (CAN) provides hash-table-like functionality at Internet-like scales
     • A CAN is:
       • Scalable
       • Robust/fault-tolerant
       • Self-organizing
       • Low-latency
     ECE 1770 – Content-Addressable Networks

  3. Hash Tables and CAN
     • A hash table is a data structure that efficiently maps keys onto values
     • CANs are a form of distributed, Internet-scale hash table

  4. What CAN would do for us
     • CAN would improve peer-to-peer systems
       • Napster: the process of locating a file is centralized
         • The central repository is expensive to scale and is a single point of failure
       • Gnutella: decentralizes the file location process (the network self-organizes into an application-layer mesh)
         • Requests for files are flooded, which does not scale and may not find the content
       • Conclusion: P2P systems need a scalable indexing mechanism
     • CAN would improve large data repositories
       • These systems need efficient insertion and retrieval
     • CAN would enable large-scale name resolution services that are not tied to a particular naming scheme (i.e., unlike DNS)
       • No more location-dependent naming schemes

  5. Basic Operations Performed On CANs
     • Basic operations
       • Insertion of (key, value) pairs
       • Lookup of (key, value) pairs
       • Deletion of (key, value) pairs
     • Each CAN node
       • Stores a piece (called a zone) of the entire hash table
       • Holds information about a small number of adjacent zones in the table
     • Routing in a CAN
       • Requests for a key are forwarded by intermediate CAN nodes towards the CAN node whose zone contains that key
     • The CAN design is
       • Distributed (requires no centralized control or coordination)
       • Scalable (nodes hold only a small amount of state, which does not grow with the network)
       • Fault-tolerant (nodes can route around failures)
       • Free of any naming hierarchy
       • Entirely application-layer

  6. CAN Design
     • Centers on a virtual d-dimensional Cartesian coordinate space on a d-torus
     • At any time, the entire coordinate space is dynamically partitioned among all the nodes in the system
     • Each node owns a distinct zone
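
A minimal sketch of what the wrap-around of the d-torus means for distances. It assumes each coordinate lies in [0, size); the function name `torus_distance` is hypothetical and not taken from the slides.

```python
import math

def torus_distance(p, q, size=1.0):
    """Euclidean distance between points p and q on a d-torus of side `size`."""
    total = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b) % size
        # On a torus the coordinate wraps, so going "around the edge" may be shorter.
        delta = min(delta, size - delta)
        total += delta * delta
    return math.sqrt(total)
```

With this definition, the points (0.1, 0.5) and (0.9, 0.5) are about 0.2 apart rather than 0.8, because the first coordinate wraps around.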

  7. CAN Design (2)
     • To store a (key, value) pair, key K1 is mapped to a point P in the coordinate space via a uniform hash function
     • The pair is then stored at the node that owns the zone in which P lies
     • To retrieve the entry for K1, any node can apply the same hash function to map K1 to P and request the value from the node that owns P's zone
     • A node learns and maintains the IP addresses of the nodes that hold adjoining coordinate zones
     Efficient routing is critical to a useful CAN
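
The mapping from key to point to owning zone can be sketched as follows. This is an illustration under stated assumptions, not the scheme from the slides: SHA-256 stands in for the "uniform hash function", the space is the unit square, and the four zones and the names `key_to_point`, `zone_contains`, and `owner_of` are all hypothetical.

```python
import hashlib

def key_to_point(key, d=2):
    """Map a key deterministically to a point in [0, 1)^d (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 4 bytes of the 32-byte digest per coordinate, so d must be at most 8.
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )

def zone_contains(zone, point):
    """A zone is a tuple of half-open (low, high) intervals, one per dimension."""
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

# Hypothetical 2-d space split into four equal zones, one per node.
zones = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.5, 1.0), (0.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "D": ((0.5, 1.0), (0.5, 1.0)),
}

def owner_of(key):
    """Return the name of the node whose zone contains the key's point."""
    point = key_to_point(key, d=2)
    return next(name for name, zone in zones.items() if zone_contains(zone, point))
```

Because every node applies the same function, any node computes the same point for a given key; here the owner is found by scanning a global table only to keep the sketch short, whereas a real CAN node would route towards the point.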

  8. Routing in a CAN
     • Routing in a Content-Addressable Network works by following the straight-line path through the Cartesian space from source to destination coordinates
     • A CAN node maintains a coordinate routing table that holds the IP address and virtual coordinate zone of each of its immediate neighbors in the coordinate space
     Average path length = $(d/4)\,n^{1/d}$
     Individual nodes have $2d$ neighbors
     Average path length grows as $O(n^{1/d})$
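
A small simulation can make the path-length figure concrete. The sketch below assumes the idealized case of n = k^d equal zones arranged as a k-per-side grid on the torus, with each node identified by integer grid coordinates. Stepping one dimension at a time stands in for greedy forwarding; on such a grid both take the same number of hops. The names `next_hop`, `route`, and `average_hops` are hypothetical.

```python
import itertools

def next_hop(current, dest, k):
    """Step to the adjacent zone along the first dimension that still differs,
    going whichever way around the torus is shorter."""
    for i, (c, t) in enumerate(zip(current, dest)):
        if c != t:
            forward = (t - c) % k
            step = 1 if forward <= k - forward else -1
            return current[:i] + ((c + step) % k,) + current[i + 1:]
    return current

def route(src, dest, k):
    """Return the list of zones visited from src to dest, inclusive."""
    path = [src]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest, k))
    return path

def average_hops(k, d):
    """Average hop count from one node to every node (itself included)."""
    nodes = list(itertools.product(range(k), repeat=d))
    total = sum(len(route(nodes[0], t, k)) - 1 for t in nodes)
    return total / len(nodes)
```

For k = 4 and d = 2 (n = 16 nodes), `average_hops(4, 2)` gives 2.0, which matches (d/4) n^(1/d) = (2/4) * 4 = 2.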

  9. Construction of a CAN Overlay
     • The entire CAN space is divided amongst the nodes currently in the system
     • The incremental construction process takes three steps
       • The new node finds a node already in the CAN
       • Using the CAN routing mechanisms, it finds a node whose zone will be split
       • The neighbors of the split zone are notified so that routing can include the new node
     • Bootstrapping: CAN bootstrap nodes are associated with a DNS domain name
     Node insertion affects only O(number of dimensions) existing nodes
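
The zone split in the second step might look like the sketch below. It assumes zones are axis-aligned boxes given as (low, high) intervals per dimension and that the split halves the longest side; that choice of axis and the name `split_zone` are assumptions for illustration, not something the slides specify.

```python
def split_zone(zone):
    """Split a zone in half along its longest side.

    Returns (kept_half, new_half): the existing node keeps the first half
    and the joining node takes the second.
    """
    widths = [hi - lo for lo, hi in zone]
    axis = widths.index(max(widths))  # first axis wins when widths tie
    lo, hi = zone[axis]
    mid = (lo + hi) / 2
    kept = zone[:axis] + ((lo, mid),) + zone[axis + 1:]
    new = zone[:axis] + ((mid, hi),) + zone[axis + 1:]
    return kept, new
```

Starting from the whole unit square, the first join splits along the first axis and the next join into the kept half splits along the second axis, so zones stay roughly square.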

  10. Maintenance of a CAN Overlay
     • Graceful node departure: the node explicitly hands over its zone and the associated (key, value) database to one of its neighbors
     • Abrupt node disappearance: an immediate takeover algorithm ensures that one of the "failed" node's neighbors takes over the zone
     • Under normal conditions, a node sends periodic update messages to each of its neighbors, each containing a list of its own neighbors and their zone coordinates
     • Prolonged absence of an update message from a neighbor signals its failure
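
Failure detection from missing updates can be sketched as a table of last-seen times. The class name `NeighborTable`, its methods, and the timeout value are hypothetical; times are passed in explicitly so the example stays deterministic.

```python
class NeighborTable:
    """Tracks when each neighbor last sent an update (illustrative only)."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence before a neighbor is suspected
        self.last_seen = {}     # neighbor id -> time of its most recent update

    def record_update(self, neighbor, now):
        self.last_seen[neighbor] = now

    def suspected_failures(self, now):
        """Neighbors whose most recent update is older than the timeout."""
        return sorted(
            n for n, seen in self.last_seen.items() if now - seen > self.timeout
        )
```

For example, with a 30-second timeout, a neighbor last heard from at t = 0 is suspected at t = 45, while one heard from at t = 40 is not. A node that suspects a neighbor would then start the takeover procedure described above.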

  11. Design Improvements
     • The basic CAN algorithm provides
       • Low per-node state ($O(d)$ for a d-dimensional space)
       • Short path lengths ($O(d\,n^{1/d})$ hops for d dimensions and n nodes)
     • The problem is that these are application-layer hops, not IP-layer hops
       • The latency of each hop might be substantial

  12. Design Improvements (2)
     • Improvement: Multi-dimensional Coordinate Spaces
       • Increasing the dimensions of the CAN coordinate space reduces the routing path length and path latency for a small increase in the size of the coordinate routing table
       • Path length scales as $O(d\,n^{1/d})$
       • Fault-tolerance improves
     • Improvement: Multiple Coordinate Spaces (a.k.a. Multiple Realities)
       • Maintain multiple independent coordinate spaces, with each node in the system being assigned a different zone in each coordinate space (each coordinate space is a reality)
       • Fault-tolerance improves
       • Low per-node state ($O(d)$ for a d-dimensional space)
       • Short path lengths ($O(d\,n^{1/d})$ hops for d dimensions and n nodes)
     • Which is better?
       • Increasing the dimensions
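
The trade-off between path length and routing-table size can be tabulated from the figures quoted on the routing slide, an average path length of (d/4) n^(1/d) and 2d neighbors per node. Both assume equally sized zones; the function names are hypothetical.

```python
def average_path_length(n, d):
    """Average hops for n equal zones in d dimensions: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)

def neighbors_per_node(d):
    """Each node has two neighbors per dimension."""
    return 2 * d

def tradeoff_table(n, dimensions):
    """Return (d, average hops, neighbors) rows for each d in `dimensions`."""
    return [(d, average_path_length(n, d), neighbors_per_node(d)) for d in dimensions]
```

For n = 2^20 nodes this gives roughly 512 hops with 4 neighbors at d = 2, 32 hops with 8 neighbors at d = 4, and 10 hops with 20 neighbors at d = 10, which illustrates why a modest increase in per-node state buys a much shorter path.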

  13. Design Improvements (3)
     • Improvement: Better CAN Routing Metrics
       • Have each node measure the network-level round-trip time (RTT) to each of its neighbors, then route messages accordingly
       • Favors lower-latency paths and avoids unnecessarily long hops
     • Improvement: Caching and Replication
       • A CAN node can maintain a cache of the data keys it recently accessed
       • A CAN node can replicate a data key at each of its neighboring nodes
       • Both schemes need an associated time-to-live field so that entries eventually expire
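
The time-to-live requirement for cached or replicated entries can be sketched with a small cache that stores an expiry time next to each value. The class name `TTLCache` and its interface are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```python
class TTLCache:
    """Cache whose entries expire `ttl` seconds after being stored (illustrative)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}  # key -> (value, expiry time)

    def put(self, key, value, now):
        self._entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        """Return the cached value, or None if it is absent or has expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[key]  # drop the stale entry on access
            return None
        return value
```

A node would consult such a cache before forwarding a lookup; once an entry expires, the request goes to the node that owns the key's zone again.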

  14. Related Systems
     • Domain Name System
       • CANs are more general than DNS because DNS closely ties the naming scheme to the manner in which a name is resolved to an IP address
     • Peer-to-Peer
       • A simple example: keys are analogous to URLs
       • Will improve robustness
       • The key difference is that content within the CAN can always be located by any other node, because there is a clear "home" (point) in the CAN for that content and every other node knows what the home is and how to reach it

  15. Discussion
     • Security?
       • Better or worse with CAN?
     • Any other design improvements?
     • Is the communication overhead significant?

  16. References
     • A Scalable Content-Addressable Network, Ratnasamy, University of California – Berkeley, http://www.sigcomm.org/sigcomm2001/p13-ratnasamy.pdf
