Fine-Grained Failover Using Connection Migration

Alex C. Snoeren, David G. Andersen, Hari Balakrishnan. MIT Laboratory for Computer Science.

Presentation Transcript


  1. Fine-Grained Failover Using Connection Migration. Alex C. Snoeren, David G. Andersen, Hari Balakrishnan. MIT Laboratory for Computer Science

  2. The Problem: Servers Fail. More often than users want to know… [Diagram labels: Client, Content server]

  3. Solution: Server Redundancy. Use a healthy one at all times.

  4. Failover Components
     • Health Monitoring
     • Connection Resumption
     • Server Selection

  5. Today’s Replication Technology
     • DNS/Content Routing
       • Wide-area replication
       • Need client awareness
     • Layer 4/Web Switches
       • Transparent, possibly mid-stream failover
       • Requires co-location
     [Diagram labels: DNS, Web Switch]

  6. Ideal Technology
     • Wide area replication
       • Yet somehow synchronize replica servers
     • Transparent failover
       • Enable other servers to continue connections

  7. Migrate Architecture
     • Stream Mapping
       • Infer application state from transport layer information
     • Connection Migration
       • Transparently hand off sessions between servers
     [Diagram labels: Stream Mapper (shown three times)]

  8. Stream Mapping
     Client:
       GET /StreamingContent.mpg HTTP/1.1
     Server Response:
       HTTP 1.1 200 OK
       Content-Length: 328987
       ...
       Content-Type: video/mpeg
     Stream Map:
       TCP ISS 083521
       TCP SeqNo 083346
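The stream-mapping idea on this slide can be illustrated with TCP sequence arithmetic: given the server's initial sequence number (ISS) and the length of the HTTP response header, the number of body bytes delivered so far follows from the latest acknowledged sequence number. The following is a minimal Python sketch under those assumptions. The `StreamMap` class, its field names, and the sample numbers are hypothetical and are not taken from the Migrate software.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


@dataclass(frozen=True)
class StreamMap:
    """Hypothetical per-connection record; field names are illustrative."""
    server_iss: int      # server's initial send sequence number
    header_len: int      # bytes of HTTP response header sent before the body
    content_length: int  # value of the Content-Length response header

    def body_offset(self, acked_seq: int) -> int:
        """Return how many body bytes the client has acknowledged."""
        # The SYN consumes one sequence number, so data starts at ISS + 1.
        stream_bytes = (acked_seq - self.server_iss - 1) % SEQ_SPACE
        body_bytes = stream_bytes - self.header_len
        # Clamp: the ACK may still fall inside the header, or cover a FIN.
        return max(0, min(body_bytes, self.content_length))


if __name__ == "__main__":
    m = StreamMap(server_iss=1000, header_len=100, content_length=328987)
    print(m.body_offset(1000 + 1 + 100 + 5000))  # 5000 body bytes acknowledged
```

A server taking over the connection could use such an offset to decide where in the object to resume sending. How the real system performs that step is not shown on the slide.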

  9. Anatomy of Failover [Diagram labels: Client, Support Group, Initial Connection, Migrated Connection]

  10. Support Groups
      • Set of partially mirrored servers
      • All servers able to provide same content
      • Can be topologically diverse
      • Synchronize on per-connection basis
      • Servers need not be complete mirrors
      • Connections from a failed server can be handled by a different support server
      • Connections may have distinct support groups
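Because servers need not be complete mirrors, the support group depends on which object a connection is fetching. A minimal Python sketch of that idea follows. It assumes each server's holdings are known as a set of object paths. The function names and the "first healthy server in sorted order" policy are placeholders, since the slides do not specify a selection policy.

```python
from typing import Dict, Optional, Set


def support_group(holdings: Dict[str, Set[str]], obj: str, current: str) -> Set[str]:
    """Servers other than `current` that hold `obj` (hypothetical helper)."""
    return {server for server, objs in holdings.items()
            if obj in objs and server != current}


def pick_successor(group: Set[str], healthy: Set[str]) -> Optional[str]:
    """Pick a healthy member of the support group, or None if there is none.

    Sorting is only a deterministic placeholder policy, not the authors' method.
    """
    candidates = sorted(group & healthy)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    holdings = {
        "A": {"/a.mpg", "/b.mpg"},
        "B": {"/a.mpg"},
        "C": {"/b.mpg"},
    }
    # Two connections to server A have distinct support groups.
    print(support_group(holdings, "/a.mpg", current="A"))
    print(support_group(holdings, "/b.mpg", current="A"))
```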

  11. Soft State Synchronization
      • Synchronize within support groups
        • Periodic advertisements
        • Advertise client application object requests
        • Communicate initial transport layer state
      • Only initial state need be communicated
        • Current info inferred from transport layer
        • Clients will reject redundant migrates from stale support servers
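Soft state here means that an advertisement stays valid only while it keeps being refreshed. A minimal Python sketch of a table a support server might keep is shown below. The record layout, the connection key, and the `lifetime` parameter are assumptions made for illustration. The slides state only that object requests and initial transport state are advertised periodically.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ConnKey = Tuple[str, int]  # (client address, client port); hypothetical key


@dataclass
class Advertisement:
    obj: str          # application object the client requested
    server_iss: int   # initial transport-layer state (server ISS)
    last_seen: float  # time this advertisement was last refreshed


class SoftStateTable:
    """Entries disappear unless refreshed within `lifetime` seconds."""

    def __init__(self, lifetime: float) -> None:
        self.lifetime = lifetime
        self.entries: Dict[ConnKey, Advertisement] = {}

    def refresh(self, key: ConnKey, obj: str, server_iss: int, now: float) -> None:
        # A periodic advertisement creates the entry or resets its timer.
        self.entries[key] = Advertisement(obj, server_iss, now)

    def lookup(self, key: ConnKey, now: float) -> Optional[Advertisement]:
        # Drop everything that has not been refreshed recently, then look up.
        stale = [k for k, ad in self.entries.items()
                 if now - ad.last_seen > self.lifetime]
        for k in stale:
            del self.entries[k]
        return self.entries.get(key)


if __name__ == "__main__":
    table = SoftStateTable(lifetime=3.0)
    key = ("10.0.0.1", 4000)
    table.refresh(key, "/StreamingContent.mpg", server_iss=1000, now=0.0)
    print(table.lookup(key, now=2.0))  # still present
    print(table.lookup(key, now=6.0))  # expired without a refresh
```

Only the initial state is stored. Current progress would be inferred from the transport layer at failover time, as the slide notes.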

  12. TCP Connection Migration (client, server)
      1. Initial SYN
      2. SYN/ACK
      3. ACK (with data)
      4. Normal data transfer
      5. Migrate SYN
      6. Migrate SYN/ACK
      7. ACK (with data)

  13. TCP Connection Migration (client, server)
      1. Initial SYN
      2. SYN/ACK
      3. ACK (with data)
      4. Normal data transfer
      5. Migrate SYN
      6. Migrate SYN/ACK
      7. ACK (with data)

  14. TCP Connection Migration (client, server, failover server)
      1. Initial SYN
      2. SYN/ACK
      3. ACK (with data)
      4. Normal data transfer
      5. Migrate SYN
      6. Migrate SYN/ACK
      7. ACK (with data)
      [Diagram annotations, in slide order: SYN 083521:083521(0) (migrate T, R); stale; SYN 533525:533525(0) ack 545968; current; 545968:546414(536) ack 533526]
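The earlier soft-state slide says that clients reject redundant migrates from stale support servers. One way this could work is sketched below in Python. The slides show "(migrate T, R)" without defining the fields, so the sketch assumes T is a token identifying the connection and R is a request number that increases with each migration. That reading, and every name in the code, is an assumption and not the documented option format.

```python
from dataclasses import dataclass


@dataclass
class ClientConnection:
    """Hypothetical client-side view of one migratable connection."""
    token: int             # T: assumed to be agreed when the connection was set up
    last_request: int = 0  # highest R accepted so far (assumed semantics)

    def on_migrate_syn(self, token: int, request: int) -> bool:
        """Return True if the Migrate SYN should be answered with a SYN/ACK."""
        if token != self.token:
            return False  # does not refer to this connection
        if request <= self.last_request:
            return False  # redundant or stale migrate request
        self.last_request = request
        return True


if __name__ == "__main__":
    conn = ClientConnection(token=0xBEEF)
    print(conn.on_migrate_syn(0xBEEF, 1))  # first failover server: accepted
    print(conn.on_migrate_syn(0xBEEF, 1))  # second server, same request: rejected
```

With this rule, if two support servers both react to the same failure, only the first Migrate SYN wins and the other is ignored.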

  15. Implementation
      • Software “Wedge”
      • Stream Mapping
      • Synchronization
      [Diagram labels: Client, Wedges, Wedge, Server App, Stream Mapping]
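A wedge sitting between the client and the server application sees the response bytes as they pass, so it can record the two numbers that stream mapping needs: the header length and the Content-Length. The Python sketch below illustrates only that parsing step. `parse_response_head` is a hypothetical helper and not part of the Migrate wedge. It assumes a well-formed numeric Content-Length value.

```python
from typing import Optional, Tuple


def parse_response_head(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (header_len, content_length) for an HTTP response prefix.

    Returns None if the header is not complete yet or carries no
    Content-Length field.
    """
    end = data.find(b"\r\n\r\n")
    if end < 0:
        return None  # header terminator not seen yet; wait for more bytes
    header_len = end + 4  # include the blank line that ends the header
    # Skip the status line, then scan the header fields.
    for line in data[:end].split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"content-length":
            return header_len, int(value.strip())
    return None


if __name__ == "__main__":
    response = (b"HTTP/1.1 200 OK\r\n"
                b"Content-Length: 328987\r\n"
                b"Content-Type: video/mpeg\r\n"
                b"\r\n"
                b"...body bytes...")
    print(parse_response_head(response))
```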

  16. Wedge Overhead [Plot: microseconds per request (log scale, 1000 to 1e+07) vs. request size in Kbytes (log scale, 1 to 10000); series: Wedge, Direct]

  17. Experimental Topology
      • Client initiates a transfer to A…
      • then migrates to B…
      • and back to A…
      [Diagram labels: Linux/Apache 1.3 (two servers), 128Kbs links]

  18. Varying Oscillation Rates [Plot: goodput (bytes, 0 to 1e+06) vs. time (secs, 0 to 60); series: No Oscillations, 2 sec, 5 sec, 10 sec, 12 sec]

  19. Benefits & Limitations
      • Enable wide area server replication
        • Low server synchronization overhead
        • Infer current state from transport layer
      • Robust even under adverse loads
        • Health monitors can be overly reactive
        • Gracefully handle cascaded failures
      • Leverages connection migration
        • Requires modern transport stack

  20. Software available on the web: http://nms.lcs.mit.edu/software/migrate (Networks and Mobile Systems)
