Enhancing TCP Performance with Integrated Congestion Control

TCP Performance Phil Cayton CSE 581 01/12/02

Papers read for this discussion • TCP Behavior of a Busy Internet Server: Analysis and Improvements • H. Balakrishnan, V. Padmanabhan, S. Seshan, M. Stemm, R. Katz • An Integrated Congestion Management Architecture of Internet Hosts • H. Balakrishnan, H. Rahul, S. Seshan • Endpoint Admission Control: Architectural Issues and Performance • L. Breslau, E. Knightly, S. Shenker, I. Stoica, H. Zhang • TCP Congestion Control with a Misbehaving Receiver • S. Savage, N. Cardwell, D. Wetherall, T. Anderson

Agenda • Overview of Loss Recovery and Congestion Control Behavior • Issues with TCP and Current Trends & Techniques to solve them • Attacks against & Defenses for TCP Vulnerabilities • Integrated Congestion Management • Endpoint Admission Control • Summary • Discussion

Changing Nature of Net Traffic • Historically – single, long-running xfers • Telnet, FTP, etc. • Currently – multiple, concurrent, short-lived transfers and use of transports that do not adapt to congestion • Short transfers do not give TCP time to adapt to network congestion • Receiver modifications exacerbate the problem by circumventing congestion control mechanisms • Misbehaving receivers can hog bandwidth or commit DOS attacks • PROBLEM – Current trends could affect long-term stability of the Internet

AIMD • Slow Start • Start the congestion window (CW) at the size of a single segment & send it. For each segment acked before timeout, add one segment to the CW, up to the slow-start threshold. • Congestion Avoidance • Once CW reaches the threshold, add only one segment per estimated RTT. • Multiplicative Decrease • Reduce CW size by half for each retransmit
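The three phases above can be sketched as a per-RTT update of the congestion window. This is a minimal illustration, not a real TCP stack: the names `next_cwnd`, `trace`, and `ssthresh` are assumptions, the window is counted in segments, and every loss is treated as a halving (timeouts that reset the window to one segment are ignored).

```python
def next_cwnd(cwnd, ssthresh, event):
    """Return (cwnd, ssthresh) in segments after one round-trip.

    event is "ack" (the whole window was acknowledged) or "loss".
    """
    if event == "loss":
        # Multiplicative decrease: halve the window, never below one segment.
        half = max(cwnd / 2.0, 1.0)
        return half, half
    if cwnd < ssthresh:
        # Slow start: one extra segment per ack doubles the window each RTT.
        return min(cwnd * 2.0, ssthresh), ssthresh
    # Congestion avoidance: one extra segment per RTT.
    return cwnd + 1.0, ssthresh


def trace(events, cwnd=1.0, ssthresh=8.0):
    """Replay a list of per-RTT events and record the window after each one."""
    history = [cwnd]
    for event in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, event)
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(trace(["ack"] * 5 + ["loss"] + ["ack"] * 2))
```

With the assumed starting values, the window doubles up to the threshold, then grows by one segment per RTT, and is cut in half when the loss occurs.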

Problems with Current TCP Congestion Control • Loss Recovery Techniques not effective in dealing with packet losses • Default Socket Buffer Size too small • Receiver window becomes a bottleneck • Parallel connections less responsive to congestion-induced losses • Ack-Compression can artificially increase perceived queue-length

Solving Problems and Improving TCP Performance • Integrated Congestion Control/Loss Recovery • Apps can use multiple TCP connections • Single integrated CW for the set of TCP connections • Eliminates slow-start for new connections • Decreases overall effect of congestion as all connections are informed of and react to the congestion • Data-driven loss recovery integrated across set of TCP connections • Connections know if later segments arrive on other connections, so a lost segment can be detected without waiting for a timeout.

Solving Problems and Improving TCP Performance • Increase default socket buffer size • Increases maximum receiver advertised window on connection est. • Allows for higher potential performance • Calculate Ack-Compression factor • If ack-compression is significant, slow the data rate to limit danger of packet loss

TCP Vulnerabilities and Misbehaving Receivers • Ack Division • Receiver can split the acknowledgment of one segment into many acks, each covering part of it (up to one ack per byte) • Leads Sender to grow CW faster than normal. • [Figure: receiver sends a bunch of acks; sender bursts 1 RTT later]

TCP Vulnerabilities and Misbehaving Receivers • Solution to Ack Division • Modify congestion control to guarantee segment-level granularity • Only increment CW by one MSS when a valid Ack arrives for the entire segment.
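To see why segment-level granularity defeats ack division, the sketch below compares a sender that grows its window on every ack with one that grows it only when an ack completes a whole segment. It is an illustration under stated assumptions: the `Sender` class and `cwnd_after` helper are hypothetical, the segment size of 1460 bytes is assumed, all segments are full-sized and aligned on multiples of that size, and only slow-start growth is modelled.

```python
MSS = 1460  # assumed segment size in bytes


class Sender:
    """Slow-start sender; `naive` selects the vulnerable per-ack growth rule."""

    def __init__(self, naive):
        self.naive = naive
        self.cwnd = MSS          # congestion window in bytes
        self.highest_acked = 0   # cumulative ack point in bytes

    def on_ack(self, ack_no):
        if ack_no <= self.highest_acked:
            return  # old or duplicate ack: no new data acknowledged
        if self.naive:
            # Vulnerable rule: any ack that advances the ack point adds one MSS.
            self.cwnd += MSS
        else:
            # Hardened rule: add one MSS per *whole* segment newly covered.
            full_before = self.highest_acked // MSS
            full_after = ack_no // MSS
            self.cwnd += (full_after - full_before) * MSS
        self.highest_acked = ack_no


def cwnd_after(naive, acks):
    """Feed a sequence of cumulative ack numbers to a fresh sender."""
    sender = Sender(naive)
    for ack_no in acks:
        sender.on_ack(ack_no)
    return sender.cwnd


if __name__ == "__main__":
    divided = [365, 730, 1095, 1460]  # one segment acked in four pieces
    print(cwnd_after(True, divided), cwnd_after(False, divided))
```

In this toy run the naive sender ends up with a window of five segments after a single segment is delivered, while the hardened sender ends up with two, the same as for one honest ack.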

TCP Vulnerabilities and Misbehaving Receivers • Duplicate Ack Spoofing • Receiver sends multiple acks for the same sequence # • Sender has no way to tell which segment is being acked • Causes sender to enter fast-recovery mode and increase CW • [Figure: receiver sends a burst of dup acks; sender enters Fast Recovery and bursts 1 RTT later]

TCP Vulnerabilities and Misbehaving Receivers • Solution to Duplicate Ack Spoofing • Add new fields to TCP headers. • “nonce & nonce-reply” – random values sent with segments and replies. Only increment congestion Window for replies to previously unacked packets (determined by nonce/reply) • Oops…requires us to modify servers & clients • Server maintains count of un-acked segments. Only incr cwnd while count > 0
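The server-side count of un-acked segments mentioned above can be sketched as follows. This is a simplified illustration, not the actual fix as deployed in any stack: `FastRecoverySender` is a hypothetical name, the window is counted in segments, and the three-duplicate-ack trigger and the initial halving of fast recovery are left out so that only the counting rule is visible.

```python
class FastRecoverySender:
    """Inflates cwnd on duplicate acks only while in-flight segments can explain them."""

    def __init__(self, cwnd_segments, outstanding):
        self.cwnd = cwnd_segments
        # Segments sent beyond the missing one; each can cause at most one dup ack.
        self.outstanding = outstanding

    def on_duplicate_ack(self):
        """Return True if the duplicate ack was accepted and cwnd was inflated."""
        if self.outstanding == 0:
            return False  # no in-flight segment could have triggered this ack
        self.outstanding -= 1
        self.cwnd += 1
        return True


if __name__ == "__main__":
    sender = FastRecoverySender(cwnd_segments=4, outstanding=3)
    accepted = sum(sender.on_duplicate_ack() for _ in range(10))
    print(accepted, sender.cwnd)
```

A receiver that spoofs ten duplicate acks gets credit for only the three segments that were actually in flight, so the window cannot be inflated beyond what an honest receiver could cause.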

TCP Vulnerabilities and Misbehaving Receivers • Optimistic Acking • Send acks for segments not yet received • Decreases perceived RTT, so CW grows faster • [Figure: acks are sent before the corresponding segments arrive]

TCP Vulnerabilities and Misbehaving Receivers • Solution to Optimistic Acking • Again use nonce/nonce-reply as spoofer cannot guess random-nonce values • Oops… does not take into account cumulative losses • Cumulative Nonce • Oops… still requires modification of TCP • “Slightly” randomized segment sizes • Spoofer is unable to correctly anticipate segment boundaries, so acks with incorrect sequence numbers can be ignored.
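One way to picture the cumulative nonce is to have each ack carry the sum of the nonces of every segment it covers, so a receiver cannot acknowledge data it has not seen. The sketch below assumes that summing rule, a 32-bit nonce, and segment-count ack numbers; `NonceSender` and its methods are hypothetical names, not fields or calls from any real TCP implementation.

```python
import secrets

NONCE_BITS = 32            # assumed nonce width
MODULUS = 2 ** NONCE_BITS


class NonceSender:
    """Attaches a random nonce to each segment and checks cumulative replies."""

    def __init__(self):
        self.nonces = []  # nonces[i] belongs to segment i

    def send_segment(self):
        """Record and return the nonce carried by the next segment."""
        nonce = secrets.randbits(NONCE_BITS)
        self.nonces.append(nonce)
        return nonce

    def ack_is_valid(self, acked_segments, nonce_reply):
        """Accept an ack covering the first `acked_segments` segments
        only if its reply equals the sum of their nonces."""
        if acked_segments > len(self.nonces):
            return False  # acks data that was never sent
        expected = sum(self.nonces[:acked_segments]) % MODULUS
        return nonce_reply == expected


if __name__ == "__main__":
    sender = NonceSender()
    seen = [sender.send_segment() for _ in range(2)]  # receiver got segments 0 and 1
    sender.send_segment()                              # segment 2 is still in transit
    print(sender.ack_is_valid(2, sum(seen) % MODULUS))  # honest ack
    print(sender.ack_is_valid(3, sum(seen) % MODULUS))  # optimistic ack, nonce 2 unknown
```

The optimistic ack for the third segment succeeds only if the receiver guesses the missing nonce, which under these assumptions happens with probability about 1 in 2^32 per attempt.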

Integrated Congestion Management • TCP-Friendly approach for end-system Congestion management that • Enables efficient multiplexing of concurrent flows • Enables apps/xports to adapt to congestion • Ensures proper and stable congestion behavior • Delivers trusted intermediary between flows for bandwidth management • Provides per-host/per-domain statistics (bandwidth, loss rates, RTT)

Integrated Congestion Management • Goals • Ensure stable net behavior • Enable shared state learning for app adaptation • Guiding Principles • Put app in control • App decides what to transmit • App decides how to apportion allocated bandwidth • Accommodate traffic heterogeneity • TCP bulk xfers, short xactions, RT-flows at various rates, layered streams, new apps • Accommodate application heterogeneity (Synchronous or Asynchronous) • Learn from the application

Integrated Congestion Management • CM Functions • Query path status • Schedule data transmissions • Update variables on congestion • CM Algorithms • Rate-based AIMD control • Loss-resilient, light-weight feedback protocol • Exponential aging when feedback infrequent • Scheduler to apportion bandwidth between flows
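The CM functions and algorithms listed above can be pictured as one shared rate that all flows to the same destination draw from. The sketch below is a toy model under stated assumptions: `CongestionManagerSketch` and its method names are hypothetical and are not the CM API, the rate is counted in packets per RTT, aging halves the rate once per feedback-free interval, and the scheduler simply serves queued requests in arrival order.

```python
from collections import deque


class CongestionManagerSketch:
    """Shared congestion state for a group of flows (toy model)."""

    def __init__(self, rate=10.0, increase=1.0, floor=1.0):
        self.rate = rate          # shared sending rate, packets per RTT
        self.increase = increase  # additive increase step
        self.floor = floor        # never drop below this rate
        self.queue = deque()      # pending transmission requests

    def update(self, lost):
        """Rate-based AIMD driven by receiver feedback."""
        if lost:
            self.rate = max(self.rate / 2.0, self.floor)
        else:
            self.rate += self.increase

    def age(self, silent_intervals):
        """Exponential aging: halve the rate for each interval without feedback."""
        self.rate = max(self.rate * 0.5 ** silent_intervals, self.floor)

    def request(self, flow_id):
        """A flow asks for permission to send one packet."""
        self.queue.append(flow_id)

    def schedule(self):
        """Grant up to int(rate) requests for this RTT, in arrival order."""
        grants = []
        budget = int(self.rate)
        while self.queue and budget > 0:
            grants.append(self.queue.popleft())
            budget -= 1
        return grants


if __name__ == "__main__":
    cm = CongestionManagerSketch(rate=4.0)
    for flow in ["a", "b", "a", "b", "a"]:
        cm.request(flow)
    print(cm.schedule())   # four grants shared between flows a and b
    cm.update(lost=True)   # congestion halves the shared rate
    cm.age(1)              # one silent interval halves it again
    print(cm.schedule())   # only one grant is left in the budget
```

Because the rate lives in one place, a loss seen by any flow slows all of them, which is the behavior the slides attribute to an integrated congestion window.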

Integrated Congestion Management • [Figure: Web traces for a web-like workload with 4 concurrent connections using TCP-Reno; note high variance] • [Figure: same workload with TCP/CM; note extremely low variance]

Integrated Congestion Management • More efficient if sender and receiver use CM, but not required • CM improves Reliability & consistency to make apps better netizens • CM Ensures proper & stable congestion behavior

Endpoint Admission Control • Goal of AC – Provide QOS to soft-RT flows • Traditional Approaches • Integrated Services: per-flow router based AC • Flows must request service from the network. • Acceptance depends on the level of available resources • Limited scalability (routers must keep per-flow state & process per-flow reservation messages) • Differentiated Services: routers use priority/buffering mechanisms based on DS field in packet headers • No per-flow admission control or signaling • Routers do not maintain per-flow state • Good scalability • No AC – QOS degrades if resources are limited • Goal – IntServ QOS & DiffServ scalability while maintaining compatibility w/ Best Effort traffic

Endpoint Admission Control • Router Scheduling Mechanisms • FIFO • no flow admitted if the load while probing is greater than the capacity (no stolen bandwidth) • Fair Queuing • Isolate flows to give each a fair share. Flow acceptance could impair service to others. • Possibility of lower than optimal utilization • Coexisting w/ Best Effort Traffic • Don’t “borrow” bandwidth from best-effort traffic • Don’t allow best-effort traffic to preempt AC traffic • Multiple Levels of Service • All probes (regardless of priority) at same level • Probes at different level than any data traffic to avoid “stealing” service between levels

Endpoint Admission Control • Endpoint Probing Algorithms • Acceptance Threshold • Too low threshold leads to higher blocking • All AC flows should adopt uniform acceptance threshold • Accuracy • In-Band Probing – probe packets same priority as data (shorter set-up times, no starvation) • Out-of-Band Probing – probe packets lower priority than data packets (no data packet loss) • Thrashing • Accepted flow levels low, but high probe-traffic prevents further admission • Use slow-start probing
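Slow-start probing against an acceptance threshold can be sketched as a loop that doubles the probe rate and gives up as soon as the measured loss exceeds the threshold. Everything named here is an assumption made for illustration: `slow_start_probe`, the five-stage ramp, and the `measure_loss` callback (which stands in for sending probe packets and counting drops or marks) do not come from the papers.

```python
def slow_start_probe(target_rate, threshold, measure_loss, stages=5):
    """Ramp the probe rate up to target_rate by doubling.

    Returns (admitted, last_rate_probed). The flow is rejected at the first
    stage whose measured loss fraction exceeds the acceptance threshold.
    """
    for stage in range(stages):
        rate = target_rate / 2 ** (stages - 1 - stage)
        if measure_loss(rate) > threshold:
            return False, rate  # early reject: stop adding probe load
    return True, target_rate


def make_link(capacity, load):
    """Toy bottleneck: loss fraction is the share of offered traffic over capacity."""
    def measure_loss(rate):
        offered = load + rate
        return max(0.0, (offered - capacity) / offered)
    return measure_loss


if __name__ == "__main__":
    link = make_link(capacity=100.0, load=60.0)
    print(slow_start_probe(32.0, 0.01, link))   # fits in the spare capacity
    print(slow_start_probe(256.0, 0.01, link))  # rejected well below full rate
```

In the second call the flow is turned away at a quarter of its target rate, so the link never sees the full probe load. That is the property that makes the slow-start variant less prone to thrashing.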

Endpoint Admission Control • Simulations • Probing: slow-start, early reject and simple • Options: in-band probing, out-of-band probing, signal congestion with packet drops, signal congestion with congestion marks

Endpoint Admission Control • Thrashing • Loss rate for simple & early reject probing algorithms with in-band probing is substantially worse than that of the IntServ “MBAC” benchmark • Slow-start based algorithm, in contrast, is much closer to MBAC • OOB dropping has similar loss-load frontiers to MBAC • Robustness • In-band dropping has highest, OOB marking lowest dropping rates. • Heterogeneous Thresholds • Lower thresholds for higher QOS->higher block rates->lower QOS • Uniform thresholds seem to yield higher overall QOS • Heterogeneous Traffic • Traditional admission control discriminates against bigger flows. • Endpoint admission control discriminates less and can admit too much • Multi-hop • Drop rates higher for longer/multi-hop paths, but admission accuracy unaffected

Endpoint Admission Control • Cons • Substantial setup delay • QOS unpredictable across settings • No mechanism to enforce uniform admission thresholds • Users could forge admission fields and be granted higher QOS without being subject to AC • Pros • Scalable scheduling for soft-RT applications • Provides multiple QOS levels along with Best-effort • Early tests show good TCP-friendliness

Summary • TCP not designed for prevalent traffic patterns • TCP designed for cooperative environment and contains vulnerabilities • Modest (server side) changes can make TCP more robust for current traffic and less vulnerable to spoofs and DOS attacks • More substantial (sender and receiver) changes can further add congestion management and improve overall stability and security

Discussion
