
A Framework for Highly-Available Real-Time Internet Services


Presentation Transcript


  1. A Framework for Highly-Available Real-Time Internet Services Bhaskaran Raman, EECS, U.C. Berkeley

  2. Problem Statement • Real-time services with long-lived sessions • Need to provide continued service in the face of failures [Figure: content server and clients across the Internet, with real-time services such as a video smoothing proxy and a transcoding proxy in the path]

  3. Problem Statement • Goals: • Quick recovery • Scalability [Figure: video-on-demand server and client, showing a service failure, a network path failure, a service replica, and monitoring]

  4. Our Approach • Service infrastructure • Computer clusters deployed at several points on the Internet • For service replication & path monitoring • (path = path of the data stream in a client session)

  5. Summary of Related Work • Fail-over within a single cluster • Active Services (e.g., video transcoding) • TACC (web-proxy) • Only service failure is handled • Web mirror selection • E.g., SPAND • Does not handle failure during a session • Fail-over for network paths • Internet route recovery • Not quick enough • ATM, Telephone networks, MPLS • No mechanism for wide-area Internet No architecture to address network path failure during a real-time session

  6. Research Challenges • Wide-area monitoring • Feasibility: how quickly and reliably can failures be detected? • Efficiency: per-session monitoring would impose significant overhead • Architecture • Who monitors whom? • How are services replicated? • What is the mechanism for fail-over?

  7. Is Wide-Area Monitoring Feasible? • Monitor path liveness using a keep-alive heartbeat • Failure: detected by timeout (no heartbeat within the timeout period) • False positive: a failure detected incorrectly (heartbeats resume after the timeout period) • There’s a trade-off between time-to-detection and rate of false-positives [Figure: heartbeat timelines illustrating a detected failure and a false positive]
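To make the timeout trade-off concrete, here is a minimal sketch (not the talk's implementation) that classifies gaps in a heartbeat arrival trace as detected failures or false positives; the function names, thresholds, and the synthetic trace are illustrative assumptions.

    # Classify gaps in a heartbeat arrival trace.  A gap longer than the
    # detection timeout counts as a detection; if heartbeats resume before a
    # longer "real failure" window, that detection was a false positive.
    def classify_gaps(arrival_times, timeout=2.0, failure_window=30.0):
        detections, false_positives = 0, 0
        for prev, cur in zip(arrival_times, arrival_times[1:]):
            gap = cur - prev
            if gap > timeout:
                detections += 1
                if gap < failure_window:      # heartbeats resumed: path was alive
                    false_positives += 1
        return detections, false_positives

    # Synthetic trace: heartbeats every 0.3 s with a 1.8 s and a 5.8 s interruption.
    def hb_run(start, n, period=0.3):
        return [start + period * i for i in range(n)]

    trace = hb_run(0.0, 30) + hb_run(10.5, 30) + hb_run(25.0, 30)
    for t in (1.0, 2.0, 4.0, 8.0):                  # larger timeout -> fewer false
        print(t, classify_gaps(trace, timeout=t))   # positives, but slower detection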

  8. Is Wide-Area Monitoring Feasible? • False-positives due to: • Simultaneous losses • Sudden increase in RTT • Related studies: • Internet RTT study, Acharya & Saltz, UMD 1996 • RTT spikes are isolated • TCP RTO study, Allman & Paxson, SIGCOMM 1999 • Significant RTT increase is quite transient • Our experiments: • Ping data from ping servers • UDP heartbeats between Internet hosts

  9. Ping measurements • Ping servers: 12 geographically distributed servers chosen • Approximation of a keep-alive stream • Count the number of loss runs with > 4 simultaneous losses • Could be an actual failure or just intermittent losses • If we have 1-second heartbeats and time out after losing 4 heartbeats, this count gives an upper bound on the number of false positives [Figure: Berkeley host → HTTP → ping server → ICMP → Internet host]
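A sketch of the counting described on this slide, under the assumption that the ping trace is available as a per-second list of reply/no-reply flags (the data layout and example trace are invented):

    # Count loss runs of more than `threshold` consecutive lost pings.  With
    # 1-second heartbeats and a 4-heartbeat timeout, each such run would have
    # triggered a detection, so the count upper-bounds the number of false
    # positives (some runs may of course be real failures).
    def count_long_loss_runs(replies, threshold=4):
        runs, current = 0, 0
        for got_reply in replies:          # replies: True if the ping was answered
            if got_reply:
                current = 0
            else:
                current += 1
                if current == threshold + 1:   # run just exceeded the threshold
                    runs += 1
        return runs

    # Example trace: one run of 6 losses and one run of 2 losses -> 1 long run.
    trace = [True] * 10 + [False] * 6 + [True] * 20 + [False] * 2 + [True] * 5
    print(count_long_loss_runs(trace))   # 1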

  10. Ping measurements

  11. UDP-based keep-alive stream • Geographically distributed hosts: Berkeley, Stanford, UIUC, TU-Berlin, UNSW • UDP heartbeat every 300 ms • Measure gaps between receipt of successive heartbeats • False positive: no heartbeat received for > 2 seconds, but received again before 30 seconds • Failure: no heartbeat for > 30 seconds
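For concreteness, a rough sketch of a sender and receiver for this kind of experiment; the port number, message format, and blocking structure are placeholders rather than the original measurement code, and gaps are classified only when heartbeats resume, which is enough for offline gap statistics.

    import socket
    import time

    HB_INTERVAL = 0.3          # one heartbeat every 300 ms
    PORT = 9999                # placeholder port

    def send_heartbeats(peer_host):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        seq = 0
        while True:
            sock.sendto(str(seq).encode(), (peer_host, PORT))
            seq += 1
            time.sleep(HB_INTERVAL)

    def receive_and_measure_gaps():
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        last = None
        while True:
            sock.recvfrom(64)
            now = time.time()
            if last is not None:
                gap = now - last
                if gap > 30.0:        # gap longer than 30 s: treat as a failure
                    print("failure: %.1f s without heartbeats" % gap)
                elif gap > 2.0:       # resumed within 30 s: a false positive
                    print("false positive: %.1f s gap" % gap)
            last = now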

  12. UDP-based keep-alive stream

  13. What does this mean? • With a failure detection scheme using a timeout of 2 sec, false positives can be as low as once a day for many pairs of Internet hosts • In comparison, BGP route recovery: • Takes > 30 seconds • Can take up to tens of minutes (Labovitz & Ahuja, SIGCOMM 2000) • Worse with multi-homing (an increasing trend)

  14. Architectural Requirements • Efficiency: • Per-session monitoring → too much overhead • End-to-end monitoring → more latency → monitoring less effective • Need aggregation • Client-side aggregation: using a SPAND-like server • Server-side aggregation: clusters • But not all clients have the same server & vice-versa • Service infrastructure to address this • Several service clusters on the Internet

  15. Architecture • Service cluster: compute cluster capable of running services • Overlay topology: nodes = service clusters, links = monitoring channels between clusters • Source → Client: routed via monitored paths • Could go through an intermediate service • Local recovery using a backup path [Figure: source and client connected across the Internet through service clusters exchanging keep-alive streams]

  16. Architecture • Monitoring within a cluster for process/machine failure • Monitoring across clusters for network path failure • Peering of service clusters – to serve as backups for one another – or to monitor the path between them

  17. Architecture: Advantages • 2-level monitoring → process/machine failures still handled within the cluster; common failure cases do not require wide-area mechanisms • Aggregation of monitoring across clusters → efficiency • Model works for cascaded services as well [Figure: Source → S1 → S2 → Client]

  18. Architecture: Potential Criticism • Potential criticism: • Does not handle resource reservation • Response: • Related issue, but could be orthogonal • Aggregated/hierarchical reservation schemes (e.g., the Clearing House) • Even if reservation is solved (or is not needed), we still need to address failures

  19. Architecture: Issues (1) Routing • Given the overlay topology, need a routing algorithm to go from source to destination via service(s) • Also need: (a) wide-area service discovery (WA-SDS), (b) the closest service cluster to a given host [Figure: source and client connected across the Internet]
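One way to make the routing problem concrete: treat the overlay as a graph whose links are the monitored paths, and pick the lowest-latency path from the cluster near the source to the cluster near the client that passes through some cluster running the required service. The graph, latencies, and service placement below are invented for illustration; this is not the talk's algorithm.

    import heapq

    def dijkstra(graph, src):
        dist, heap = {src: 0.0}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in graph[u]:
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return dist

    def route_via_service(graph, src_cluster, dst_cluster, service_clusters):
        d_src = dijkstra(graph, src_cluster)
        d_dst = dijkstra(graph, dst_cluster)   # monitored links assumed symmetric
        # choose the service cluster minimizing total overlay latency
        return min(service_clusters,
                   key=lambda s: d_src.get(s, float("inf")) + d_dst.get(s, float("inf")))

    # nodes = service clusters, links = monitored paths with measured latency (ms)
    overlay = {
        "berkeley":  [("stanford", 5), ("uiuc", 40)],
        "stanford":  [("berkeley", 5), ("uiuc", 45), ("tu-berlin", 170)],
        "uiuc":      [("berkeley", 40), ("stanford", 45), ("tu-berlin", 120)],
        "tu-berlin": [("stanford", 170), ("uiuc", 120)],
    }
    print(route_via_service(overlay, "berkeley", "tu-berlin", {"uiuc", "stanford"}))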

  20. Architecture: Issues (2) Topology • How many service clusters? (nodes) • How many monitored paths? (links)

  21. Ideas on Routing: Similarities with BGP • BGP: border routers, peering sessions, heartbeats; IP routes are destination-based • Service infrastructure: service clusters, peering sessions, heartbeats; routes across clusters are based on the destination and the intermediate service(s)

  22. Ideas on Routing • How is routing in the overlay topology different from BGP? • Overlay topology: more freedom than the physical topology • Constraints on the graph can be imposed more easily • For example, can have local recovery [Figure: cluster S (Berkeley) with a nearby backup S’ (San Francisco)]

  23. Ideas on Routing • BGP exchanges a lot of information; it increases with the number of APs – O(100,000) • Service clusters need to exchange very little information • The problems of (a) service discovery and (b) knowing the nearest service cluster are decoupled from routing [Figure: Source, AP1, AP2, AP3, Client]

  24. Ideas on Routing • BGP routers do not have per-session state • But service clusters can maintain per-session state • Can have local recovery pre-determined for each session • Finally, we probably don’t need as many nodes as there are BGP routers • Can have a more aggressive routing algorithm It is feasible to make fail-over in the overlay topology quicker than Internet route recovery with BGP
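A hypothetical sketch of what per-session state with pre-determined local recovery could look like at a single cluster; the session and cluster names are made up, and the table layout is only one plausible realization of the idea.

    # Each session records a primary next-hop cluster and a pre-computed backup,
    # so local recovery needs no wide-area route computation at failure time.
    sessions = {
        "client42-stream": {"primary": "stanford", "backup": "uiuc", "active": "stanford"},
    }

    def on_peer_failure(failed_cluster):
        """Called when keep-alive monitoring declares a peer cluster dead."""
        for sid, state in sessions.items():
            if state["active"] == failed_cluster:
                state["active"] = state["backup"]     # local, pre-determined fail-over
                print("session %s switched to backup %s" % (sid, state["backup"]))

    on_peer_failure("stanford")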

  25. Ideas on topology • Decision criteria: • Additional latency for the client session • Sparse topology → more latency • Monitoring overhead • Many monitored paths → more overhead, but additional flexibility → might reduce end-to-end latency

  26. Ideas on topology • Number of nodes: • Claim: the upper bound is one per AS • This will ensure that there is a service cluster “close” to every client • Topology will be close to the Internet backbone topology • Number of monitoring channels: • # ASes: ~7000 as of 1999 • Monitoring overhead: ~100 bytes/sec • Can easily have ~1000 peering sessions per service cluster • Dense overlay topology possible (to minimize additional latency)
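A back-of-envelope check of the numbers on this slide (the figures are taken from the slide itself):

    num_clusters      = 7000    # upper bound: one per AS (~7000 ASes as of 1999)
    hb_bytes_per_sec  = 100     # monitoring overhead per peering session
    peers_per_cluster = 1000    # peering sessions a single cluster can sustain

    per_cluster_bw = peers_per_cluster * hb_bytes_per_sec      # bytes/sec
    total_links    = num_clusters * peers_per_cluster // 2     # each link shared by 2 clusters
    print("per-cluster monitoring traffic: %d KB/s" % (per_cluster_bw // 1000))  # 100 KB/s
    print("monitored links in a dense overlay: %d" % total_links)                # 3,500,000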

  27. Implementation + Performance [Figure: a service cluster with a manager node, exchanging session information and monitoring heartbeats with a peer cluster]

  28. Implementation + Performance • PCM → GSM codec service • Overhead of a “hot” backup service: 1 process (idle) • Service re-instantiation time: ~100 ms • End-to-end recovery over the wide area has three components: • Detection time – O(2 sec) • Communication with the replica – one RTT – O(100 ms) • Replica activation – ~100 ms
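Adding up the three components gives the ballpark end-to-end recovery time:

    detection          = 2.0     # O(2 s) timeout-based failure detection
    notify_replica_rtt = 0.1     # O(100 ms) wide-area round trip to the replica
    replica_activation = 0.1     # ~100 ms to (re)activate the backup service
    total = detection + notify_replica_rtt + replica_activation
    print("end-to-end recovery ~ %.1f s" % total)
    # ~2.2 s, dominated by failure detection -- well under BGP's 30 s and up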

  29. Implementation + Performance • Overhead of monitoring: • Keep-alive heartbeat: one every 300 ms in our implementation, O(100 bytes/sec) • Overhead of a false positive in failure detection: • Session transfer: one message exchange across adjacent clusters, a few hundred bytes

  30. Summary • Real-time applications with long-lived sessions • No support exists for path-failure recovery (unlike, say, the PSTN) • Service infrastructure to provide this support • Wide-area monitoring for path liveness: O(2 sec) failure detection with a low rate of false positives • Peering and replication model for quick fail-over • Interesting issues: • Nature of the overlay topology • Algorithms for routing and fail-over

  31. Questions • What are the factors in deciding the topology? • Strategies for handling failure near the client/source • Currently deployed mechanisms for RIP/OSPF and ATM/MPLS failure recovery: what is the time to recover? • Applications: video/audio streaming • More? Games? Proxies for games on hand-held devices? http://www.cs.berkeley.edu/~bhaskar (Presentation running under VMware under Linux)
