
Computing in the Reliable Array of Independent Nodes
Vasken Bohossian, Charles Fan, Paul LeMahieu, Marc Riedel, Lihao Xu, Jehoshua Bruck
California Institute of Technology
IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, May 5, 2000


Presentation Transcript


  1. Computing in the Reliable Array of Independent Nodes. Vasken Bohossian, Charles Fan, Paul LeMahieu, Marc Riedel, Lihao Xu, Jehoshua Bruck. California Institute of Technology. IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, May 5, 2000.

  2. RAIN Project Collaboration: • Caltech’s Parallel and Distributed Computing Group www.paradise.caltech.edu • JPL’s Center for Integrated Space Microsystems www.csmt.jpl.nasa.gov

  3. RAIN Platform. Heterogeneous network of nodes and switches. [diagram: nodes attached to switches and a bus network]

  4. RAIN Testbed. 10 Pentium boxes with multiple NICs, connected by 4 eight-way Myrinet switches. www.paradise.caltech.edu

  5. Proof of Concept: Video Server. Video client & server on every node. [diagram: nodes A–D connected through two switches]

  6. Limited Storage. Insufficient storage to replicate all the data on each node. [diagram: nodes A–D connected through two switches]

  7. k-of-n Code. Erasure-correcting code: recover the data from any k of n columns. Example layout with four columns, each holding one data symbol and one parity (+ is XOR): column 1: a, d+c; column 2: b, d+a; column 3: c, a+b; column 4: d, b+c. Recover data via the parities, e.g. b = (a+b) + a and d = (d+c) + c.
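The recovery rule on this slide can be sketched in code. The following is an illustrative rendering, not the project's implementation: a 2-of-4 array code in which each column holds one data symbol and one XOR parity, and any two surviving columns recover all four symbols by repeatedly "peeling" a parity that has exactly one unknown symbol.

```python
# Illustrative 2-of-4 XOR array code matching the slide's column layout:
#   col 1: (a, d^c)   col 2: (b, d^a)   col 3: (c, a^b)   col 4: (d, b^c)
# Any 2 of the 4 columns suffice to recover a, b, c, d.
from itertools import combinations

DATA = ['a', 'b', 'c', 'd']
# (column index, data symbols covered by that column's parity)
PARITIES = [(0, ('d', 'c')), (1, ('d', 'a')), (2, ('a', 'b')), (3, ('b', 'c'))]

def encode(a, b, c, d):
    """Return the four (data, parity) columns."""
    return [(a, d ^ c), (b, d ^ a), (c, a ^ b), (d, b ^ c)]

def decode(columns, alive):
    """Recover [a, b, c, d] from the surviving columns listed in `alive`."""
    known = {DATA[i]: columns[i][0] for i in alive}       # surviving data
    parities = {i: columns[i][1] for i in alive}          # surviving parity
    changed = True
    while changed and len(known) < 4:
        changed = False
        for col, (x, y) in PARITIES:
            if col not in parities:
                continue
            unknown = [s for s in (x, y) if s not in known]
            if len(unknown) == 1:                         # peel one symbol
                other = y if unknown[0] == x else x
                known[unknown[0]] = parities[col] ^ known[other]
                changed = True
    return [known[s] for s in DATA]

cols = encode(0x41, 0x42, 0x43, 0x44)
for alive in combinations(range(4), 2):                   # every 2-subset
    assert decode(cols, alive) == [0x41, 0x42, 0x43, 0x44]
```

Peeling always terminates here because every 2-column subset leaves each missing symbol reachable through a surviving parity.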

  8. Encoding. Encode video using a 2-of-4 code. [diagram: nodes A–D connected through two switches]

  9. Decoding. Retrieve data and decode. [diagram: nodes A–D connected through two switches]

  10. Node Failure. [diagram: nodes A–D connected through two switches]

  11. Node Failure. [diagram: a node fails]

  12. Node Failure. Dynamically switch to another node. [diagram: nodes A–D connected through two switches]

  13. Link Failure. [diagram: nodes A–D connected through two switches]

  14. Link Failure. [diagram: a link fails]

  15. Link Failure. Dynamically switch to another network path. [diagram: nodes A–D connected through two switches]

  16. Switch Failure. [diagram: nodes A–D connected through two switches]

  17. Switch Failure. [diagram: a switch fails]

  18. Switch Failure. Dynamically switch to another network path. [diagram: nodes A–D connected through two switches]

  19. Node Recovery. [diagram: nodes A–D connected through two switches]

  20. Node Recovery. Continuous reconfiguration (e.g., load-balancing). [diagram: nodes A–D connected through two switches]

  21. Features (Certified Buzz-Word Compliant). High availability: tolerates multiple node/link/switch failures; no single point of failure. Efficient use of resources: multiple data paths, redundant storage, graceful degradation. Dynamic scalability/reconfigurability.

  22. RAIN Project: Goals. Key building blocks for efficient, reliable distributed computing and storage systems. [diagram: layered stack of Applications, Storage, Communication, Networks]

  23. Today’s Talk: Topics. Fault-tolerant interconnect topologies, connectivity, group membership, distributed storage. [diagram: layered stack of Applications, Storage, Communication, Networks]

  24. Interconnect Topologies. Goal: lose at most a constant number of nodes for a given network loss. [diagram: computing/storage nodes N attached to a network]

  25. Resistance to Partitions. Large partitions are problematic for distributed services/computation. [diagram: computing/storage nodes N attached to a network]

  26. Resistance to Partitions. Large partitions are problematic for distributed services/computation. [diagram: the network splits, partitioning the nodes]

  27. Related Work. Embedding hypercubes, rings, meshes, and trees in fault-tolerant networks: Hayes et al., Bruck et al., Boesch et al. Bus-based networks resistant to partitioning: Ku and Hayes, 1997, “Connective Fault-Tolerance in Multiple-Bus Systems”.

  28. A Ring of Switches: a naïve solution. Degree-2 compute nodes, degree-4 switches. [diagram: compute nodes N attached to a ring of switches S]

  29. A Ring of Switches: a naïve solution. Degree-2 compute nodes, degree-4 switches. [diagram: compute nodes N attached to a ring of switches S]

  30. A Ring of Switches: a naïve solution. Degree-2 compute nodes, degree-4 switches; easily partitioned. [diagram: switch failures split the ring into partitions]
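A small brute-force check makes the "easily partitioned" claim concrete. The wiring below is a hypothetical rendering of the naïve design (compute node i attached to the adjacent switches i and i+1; not taken from the paper): failing just two non-adjacent switches already separates the surviving compute nodes into disconnected groups.

```python
# Partition check for the naive ring: n switches ('S', i) in a ring,
# compute node ('N', i) attached to adjacent switches i and i+1.

def naive_ring(n):
    """Undirected adjacency for the naive ring-of-switches topology."""
    adj = {}
    for i in range(n):
        s, t = ('S', i), ('S', (i + 1) % n)
        node = ('N', i)
        adj.setdefault(s, set()).update({t, node})   # ring edge + node link
        adj.setdefault(t, set()).update({s, node})
        adj[node] = {s, t}
    return adj

def components(adj, dead):
    """Connected components after deleting the vertices in `dead`."""
    seen, comps = set(dead), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.add(v)
            stack.extend(adj[v])
        comps.append(comp)
    return comps

adj = naive_ring(8)
# Fail switches 0 and 2: nodes 0 and 1 survive only through switch 1,
# which is now cut off from the rest of the ring.
comps = components(adj, dead={('S', 0), ('S', 2)})
with_nodes = [c for c in comps if any(kind == 'N' for kind, _ in c)]
assert len(with_nodes) > 1   # surviving compute nodes are partitioned
```

The diagonal construction on the following slides avoids exactly this failure mode by not attaching any node to two adjacent switches.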

  31. Resistance to Partitioning. Degree-2 compute nodes, degree-4 switches, with nodes on the diagonals. [diagram: two rings of eight switches, numbered 1–8, with nodes placed on diagonals]

  32. Resistance to Partitioning. Degree-2 compute nodes, degree-4 switches, with nodes on the diagonals. [diagram: two rings of eight switches, numbered 1–8, with nodes placed on diagonals]

  33. Resistance to Partitioning. Degree-2 compute nodes, degree-4 switches, with nodes on the diagonals: tolerates any 3 switch failures (optimal), and generalizes to arbitrary node/switch degrees. Details: paper IPPS’98, www.paradise.caltech.edu. [diagram: two rings of eight switches with nodes on diagonals]

  34. Isomorphic. The two drawings of the diagonal construction are isomorphic. Details: paper IPPS’98, www.paradise.caltech.edu. [diagram: two isomorphic eight-switch topologies, switches numbered 1–8]

  35. Point-to-Point Connectivity. Is the path from A to B up or down? [diagram: node A and node B at opposite ends of a network of nodes]

  36. Connectivity. Bi-directional communication: the link is seen as up or down ({U,D}) by each endpoint. Each node sends out pings; a node may time out, deciding the link is down.

  37. Consistent History. [table: link-state histories of nodes A and B over time, each entry U (up) or D (down); the two nodes record a consistent sequence of up/down transitions]

  38. The Slack. With slack n=2, a node makes at most 2 unacknowledged transitions before it waits. [diagram: node-state timelines for A and B; A is 1 ahead, then 2 ahead; now A will wait for B to transition]
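The slack rule can be modeled minimally. The `ChannelEnd` class below is a hypothetical illustration (the names and structure are assumptions, not the paper's protocol code): an endpoint refuses to run more than n=2 unacknowledged up/down transitions ahead of its peer, which is what forces A to wait in the timeline above.

```python
# Minimal sketch of the slack rule: at most SLACK unacknowledged
# up/down transitions before an endpoint must wait for its peer.
SLACK = 2

class ChannelEnd:
    def __init__(self):
        self.transitions = 0   # up/down flips recorded locally
        self.acked = 0         # flips the peer has acknowledged

    def transition(self):
        """Record a link-state flip, or refuse if too far ahead."""
        if self.transitions - self.acked >= SLACK:
            return False       # must wait for the peer to catch up
        self.transitions += 1
        return True

    def ack(self):
        """The peer acknowledges one transition."""
        self.acked += 1

a = ChannelEnd()
assert a.transition()          # A is 1 ahead
assert a.transition()          # A is 2 ahead
assert not a.transition()      # blocked: 2 unacknowledged transitions
a.ack()                        # B acknowledges one transition
assert a.transition()          # A may proceed again
```

Bounding the slack is what keeps the two histories consistent: neither side can drift arbitrarily far from what the other has seen.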

  39. Consistent History. Consistency in error reporting: if A sees a channel error, then B sees a channel error. Birman et al.: “Reliability Through Consistency”. Details: paper IPPS’99, www.paradise.caltech.edu.

  40. Group Membership. A consistent global view given local, point-to-point connectivity information, despite link/node failures and dynamic reconfiguration. [diagram: nodes A–D, each holding the membership view ABCD]

  41. Related Work. Theory: Chandra et al., impossibility of group membership in an asynchronous environment. Systems: Totem, Isis/Horus, Transis.

  42. Group Membership. Token-ring based group membership protocol. [diagram: nodes A–D arranged in a ring]

  43. Group Membership. Token-ring based protocol; the token carries the group membership list and a sequence number. [diagram: token (1: ABCD) at node A]

  44. Group Membership. Token-ring based protocol; the token carries the group membership list and a sequence number. [diagram: node A records sequence number 1; token (1: ABCD) circulates]

  45. Group Membership. Token-ring based protocol; the token carries the group membership list and a sequence number. [diagram: token (2: ABCD); node B records sequence number 2]

  46. Group Membership. Token-ring based protocol; the token carries the group membership list and a sequence number. [diagram: token (3: ABCD); node C records sequence number 3]

  47. Group Membership. Token-ring based protocol; the token carries the group membership list and a sequence number. [diagram: token (4: ABCD); node D records sequence number 4]

  48. Group Membership. Token-ring based protocol; the token carries the group membership list and a sequence number. [diagram: token back at node A with sequence number 5]

  49. Group Membership. A node or link fails. [diagram: the ring with per-node sequence numbers 5, 2, 3, 4]

  50. Group Membership. A node or link fails. [diagram: one node drops out of the ring; sequence numbers 5, 3, 4 remain]
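The token-ring behavior in slides 42–50 can be sketched as follows. The `Ring` class is a hypothetical illustration (names and structure assumed, not the project's protocol code): the token carries a membership list and a sequence number, and when the next node in the ring is unreachable the sender drops it from the list and forwards the token to the following live node.

```python
# Sketch of token-ring group membership: the token circulates, incrementing
# a sequence number; an unreachable node is removed from the membership list.

class Ring:
    def __init__(self, members):
        self.members = list(members)   # membership list carried by the token
        self.alive = set(members)      # nodes currently reachable
        self.seq = 0                   # token sequence number

    def pass_token(self, holder):
        """Advance the token from `holder`; return the new holder."""
        i = self.members.index(holder)
        while True:
            nxt = self.members[(i + 1) % len(self.members)]
            if nxt in self.alive:
                break
            self.members.remove(nxt)   # drop failed node from membership
            i = self.members.index(holder)
        self.seq += 1
        return nxt

ring = Ring("ABCD")
holder = "A"
for _ in range(4):                     # one full circulation: A->B->C->D->A
    holder = ring.pass_token(holder)
assert (ring.seq, holder) == (4, "A")

ring.alive.discard("C")                # C fails
holder = ring.pass_token("B")          # token skips C, membership shrinks
assert holder == "D" and ring.members == ["A", "B", "D"]
```

The sequence number plays the role shown on the slides: each node can tell from the token's number whether it has missed a circulation.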

More Related