
Buffer requirements for TCP


Presentation Transcript


  1. Buffer requirements for TCP. Damon Wischik, www.wischik.com/damon. DARPA grant W911NF05-1-0254.

  2. Motivation • Internet routers have buffers, • to accommodate bursts in traffic • to keep utilization high • Large buffers are challenging: • optical packet buffers are hard to build • large electronic buffers are unsustainable, since data volumes double every 10–12 months [Odlyzko, 2003] and CPU speeds double every 18 months, but DRAM memory access speeds double every 10 years • How big do the buffers need to be, to accommodate TCP traffic? • 3 GByte? Rule of thumb says buffer = bandwidth×delay • 300 MByte? buffer = bandwidth×delay/√#flows [Appenzeller, Keslassy, McKeown, 2004] • 30 kByte? constant buffer size, independent of line rate [Kelly, Key, etc.] (a worked example of these three sizes follows below)
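
To make the three candidate sizes concrete, here is a small sketch that evaluates the three formulas. The link speed, round trip time, flow count, and packet size are illustrative assumptions, not figures taken from the talk:

    #include <cstdio>
    #include <cmath>

    int main() {
        // Assumed example parameters (hypothetical, chosen only for illustration)
        const double C   = 10e9;     // line rate: 10 Gbit/s
        const double RTT = 0.25;     // round trip time: 250 ms
        const double N   = 10000;    // number of concurrent TCP flows

        double rule_of_thumb = C * RTT / 8;             // bytes: bandwidth x delay
        double small_buffer  = rule_of_thumb / sqrt(N); // Appenzeller, Keslassy, McKeown 2004
        double tiny_buffer   = 30 * 1500;               // ~30 packets of 1500 bytes, independent of line rate

        printf("bandwidth x delay          : %.0f MByte\n", rule_of_thumb / 1e6);
        printf("bandwidth x delay / sqrt(N): %.1f MByte\n", small_buffer / 1e6);
        printf("constant (30 pkt)          : %.0f kByte\n", tiny_buffer / 1e3);
    }

With these particular numbers the three rules give roughly 310 MByte, 3 MByte, and 45 kByte; the slide's 3 GByte / 300 MByte / 30 kByte figures presumably assume a faster link and a different flow count.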

  3. History of TCP Transmission Control Protocol • 1974: First draft of TCP/IP [“A Protocol for Packet Network Intercommunication”, Vint Cerf and Robert Kahn] • 1983: ARPANET switches on TCP/IP • 1986: Congestion collapse • 1988: Congestion control for TCP [“Congestion avoidance and control”, Van Jacobson] [“A Brief History of the Internet”, the Internet Society]

  4. End-to-end congestion control Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981] [Diagram: the User sends a request to the Server (TCP)]

  5. End-to-end congestion control Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981] [Diagram: the Server (TCP) sends data to the User]

  6. End-to-end congestion control Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981] The TCP algorithm, running on the server, decides how fast to send data. [Diagram: the Server (TCP) sends data to the User; acknowledgements flow back from User to Server]

  7. TCP algorithm: the ACK-processing step of TCP congestion control (fast retransmit and fast recovery). The member variables (_cwnd, _ssthresh, _dupacks, etc.) belong to the enclosing connection object, which is not shown on the slide.

    if (seqno > _last_acked) {
        // The ACK acknowledges new data.
        if (!_in_fast_recovery) {
            // Normal case: record progress, grow the congestion window, keep sending.
            _last_acked = seqno;
            _dupacks = 0;
            inflate_window();
            send_packets(now);
            _last_sent_time = now;
            return;
        }
        if (seqno < _recover) {
            // Partial ACK during fast recovery: deflate the window by the amount of
            // newly acknowledged data and retransmit the next missing segment.
            uint32_t new_data = seqno - _last_acked;
            _last_acked = seqno;
            if (new_data < _cwnd) _cwnd -= new_data; else _cwnd = 0;
            _cwnd += _mss;
            retransmit_packet(now);
            send_packets(now);
            return;
        }
        // Full ACK: everything outstanding at the loss has been acknowledged,
        // so leave fast recovery and deflate the window.
        uint32_t flightsize = _highest_sent - seqno;
        _cwnd = min(_ssthresh, flightsize + _mss);
        _last_acked = seqno;
        _dupacks = 0;
        _in_fast_recovery = false;
        send_packets(now);
        return;
    }
    if (_in_fast_recovery) {
        // Duplicate ACK while in fast recovery: inflate the window by one segment.
        _cwnd += _mss;
        send_packets(now);
        return;
    }
    _dupacks++;
    if (_dupacks != 3) {
        // Fewer than three duplicate ACKs: not yet treated as a loss.
        send_packets(now);
        return;
    }
    // Third duplicate ACK: assume a loss, halve ssthresh, retransmit,
    // and enter fast recovery.
    _ssthresh = max(_cwnd / 2, (uint32_t)(2 * _mss));
    retransmit_packet(now);
    _cwnd = _ssthresh + 3 * _mss;
    _in_fast_recovery = true;
    _recover = _highest_sent;
    }   // end of the ACK-handling routine (its opening is not shown on the slide)

[Plot on the slide: traffic rate (0–100 kB/sec) vs time (0–8 sec) produced by this algorithm]

  8. Buffer sizing I: TCP sawtooth • The traffic rate produced by a TCP flow follows a ‘sawtooth’ • To prevent the link’s going idle, the buffer must be big enough to smooth out a sawtooth • buffer = bandwidth×delay • But is this a worthwhile goal? Why not sacrifice a little utilization, so that virtually no buffer is needed? • When there are many TCP flows, the sawteeth should average out, so a smaller buffer is sufficient • buffer = bandwidth×delay/√N, where N = number of TCP flows [Appenzeller, Keslassy, McKeown, 2004] • But what if they don’t average out, i.e. they remain synchronized? [Figure: TCP sawtooth, rate vs time]

  9. Buffer sizing I: TCP sawtooth • The traffic rate produced by a TCP flow follows a ‘sawtooth’ • To prevent the link’s going idle, the buffer must be big enough to smooth out a sawtooth • buffer = bandwidth×delay • But is this a worthwhile goal? Why not sacrifice some utilization, so that virtually no buffer is needed? • When there are many TCP flows, the sawteeth should average out, so a smaller buffer is sufficient • buffer = bandwidth×delay/√N, where N = number of TCP flows [Appenzeller, Keslassy, McKeown, 2004] • But what if they don’t average out, i.e. they remain synchronized? (A one-line derivation of the bandwidth×delay rule is sketched below.) [Figure: TCP sawtooth, rate vs time]
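
The derivation behind the bandwidth×delay rule, sketched here using the standard single-flow argument (not spelled out on the slides): just before a loss the congestion window fills both the pipe and the buffer, W_max = C·RTT + B; after the loss TCP halves the window, and the link stays fully utilized only if the halved window still covers the pipe:

    \[
      \tfrac{1}{2} W_{\max} \;=\; \tfrac{1}{2}\,(C \cdot RTT + B) \;\ge\; C \cdot RTT
      \quad\Longleftrightarrow\quad B \;\ge\; C \cdot RTT .
    \]

With N desynchronized flows the sawtooth fluctuations partially cancel, shrinking the aggregate fluctuation by roughly a factor of √N, which is where the buffer = bandwidth×delay/√N rule of Appenzeller, Keslassy and McKeown comes from.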

  10. Buffer sizing II: TCP dynamics • When the aggregate data rate is less than the service rate, the queue stays small • No packet drops, so TCPs increase their data rate [Plot: aggregate data rate vs service rate, and queue size (0–160 pkt) vs time (0–3.5 s)]

  11. Buffer sizing II: TCP dynamics • When the aggregate data rate is less than the service rate, the queue stays small • No packet drops, so TCPs increase their data rate • Eventually the aggregate data rate exceeds the service rate, and a queue starts to build up • When the queue is full, packets start to get dropped [Plot: aggregate data rate vs service rate, drops marked, and queue size (0–160 pkt) vs time (0–3.5 s)]

  12. Buffer sizing II: TCP dynamics • When the aggregate data rate is less than the service rate, the queue stays small • No packet drops, so TCPs increase their data rate • Eventually the aggregate data rate exceeds the service rate, and a queue starts to build up • When the queue is full, packets start to get dropped • One round trip time later, TCPs respond and cut back. They may overreact, leading to synchronization, i.e. periodic fluctuations [Plot: aggregate data rate vs service rate, and queue size (0–160 pkt) vs time (0–3.5 s)]

  13. Buffer sizing II: TCP dynamics • When the aggregate data rate is less than the service rate, the queue stays small • No packet drops, so TCPs increase their data rate • Eventually the aggregate data rate exceeds the service rate, and a queue starts to build up • When the queue is full, packets start to get dropped • One round trip time later, TCPs respond and cut back. They may overreact, leading to synchronization, i.e. periodic fluctuations • Is it possible to keep the link slightly underutilized, so that the queue size remains small? (A toy simulation of this feedback loop is sketched below.) [Plot: aggregate data rate vs service rate, and queue size (0–160 pkt) vs time (0–3.5 s)]
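
The feedback loop described on slides 10–13 can be reproduced with a toy model. The sketch below is purely illustrative: the parameters are invented, the loss signal is applied immediately rather than one round trip later, and every flow reacts together, which is exactly the synchronized overreaction the slides warn about. It is not the simulator behind the plots.

    #include <cstdio>
    #include <vector>

    int main() {
        const int    N   = 500;     // number of TCP flows (assumed)
        const double RTT = 0.12;    // round trip time, seconds (assumed)
        const double C   = 50000;   // bottleneck service rate, packets/second (assumed)
        const double B   = 160;     // buffer size, packets (assumed)
        const double dt  = 0.01;    // simulation time step, seconds

        std::vector<double> w(N, 1.0);   // per-flow congestion windows, in packets
        double q = 0;                    // bottleneck queue length, in packets

        for (double t = 0; t < 3.5; t += dt) {
            double rate = 0;
            for (double wi : w) rate += wi / RTT;    // aggregate arrival rate, pkt/s

            q += (rate - C) * dt;                    // queue grows when rate exceeds service
            bool overflow = q > B;                   // a full buffer means packets are dropped
            if (q < 0) q = 0;
            if (q > B) q = B;

            for (double& wi : w) {
                if (overflow) wi /= 2;               // every flow halves: synchronized cut-back
                else          wi += dt / RTT;        // additive increase: one packet per RTT
            }
            printf("%.2f  rate=%.0f pkt/s  queue=%.1f pkt\n", t, rate, q);
        }
    }

Running this shows the pattern on the slides: the aggregate rate ramps up, the queue fills once the rate crosses the service rate, a burst of drops makes every flow cut back at once, and the cycle repeats as periodic oscillations.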

  14. Buffer sizing III: packet burstiness • TCP traffic is made up of packets • there may be packet clumps, if the access network is fast • or the packets may be spaced out • Even if we manage to keep total data rate < service rate, packet-level bursts will still cause small, rapid fluctuations in queue size • If the buffer is small, these fluctuations will lead to packet loss • which makes TCPs keep their data rates low • which keeps the queue size small, & reduces the chance of synchronization [Figure: packet-level traffic (clumped vs spaced out), aggregate data rate vs time, and queue size (0–16 pkt)]

  15. Mathematical models for the queue • Over short timescales (<1ms), TCP traffic is approximately Poisson [“Internet traffic tends toward Poisson and independent as the load increases”, Cao, Cleveland, Lin, Sun, 2002] • With small buffers, queue fluctuations are rapid (average duration of a busy period is inversely proportional to line rate) [Cao, Ramanan, 2002] [Plot: queue size (0–16 pkt) vs time (0–0.1 s); 500 flows, service rate 0.8 Mbit/s (100 pkt/s per flow), average RTT 120 ms, buffer size 16 pkt]

  16. Mathematical models for the queue • Over short timescales (<1ms), TCP traffic is approximately Poisson [“Internet traffic tends toward Poisson and independent as the load increases”, Cao, Cleveland, Lin, Sun, 2002] • With small buffers, queue fluctuations are rapid (average duration of a busy period is inversely proportional to line rate) [Cao, Ramanan, 2002] • Therefore, we model small-buffer routers using Poisson traffic models. Long-range dependence, self-similarity, and heavy tails are irrelevant! • The packet loss probability in an M/D/1/B queue, at link utilization ρ, is roughly p = ρ^B (two illustrative evaluations follow below) [Plot: queue size (0–16 pkt) vs time (0–0.1 s); 500 flows, service rate 0.8 Mbit/s (100 pkt/s per flow), average RTT 120 ms, buffer size 16 pkt]
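
To get a feel for the scale of this approximation, two illustrative evaluations (the utilizations and buffer size are values chosen here, not taken from the slide):

    \[
      p \approx \rho^{B}: \qquad
      \rho = 0.95,\; B = 20\ \text{pkt} \;\Rightarrow\; p \approx 0.95^{20} \approx 0.36, \qquad
      \rho = 0.90,\; B = 20\ \text{pkt} \;\Rightarrow\; p \approx 0.90^{20} \approx 0.12 .
    \]

So with buffers of only a few dozen packets the loss probability is large enough that TCP's congestion control gets frequent, prompt feedback, which is the regime the rest of the talk analyses.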

  17. Mathematical models for TCP • Over long timescales (>10ms), TCP traffic can be modelled using differential equations & control theory [Misra, Gong, Towsley, 2000] • Use this to • calculate equilibrium link utilization (the “TCP throughput formula”) • work out whether the system is synchronized or stable • evaluate the effect of changing buffer size • investigate alternatives to TCP [Equation on the slide, with its terms annotated as: average data rate at time t; round trip time (“delay”); pkt drop probability at time t−RTT (calculated from the Poisson model for the queue); available bandwidth per flow]
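
The slide annotates the equation's terms, but the equation itself does not survive in this transcript. For reference, the commonly quoted fluid model from the cited Misra, Gong and Towsley paper, written for the per-flow congestion window W(t), is (a sketch of the standard form, not necessarily the exact variant shown on the slide):

    \[
      \frac{dW(t)}{dt} \;=\; \frac{1}{RTT}
        \;-\; \frac{W(t)}{2}\,\frac{W(t-RTT)}{RTT}\; p(t-RTT),
    \]

where the first term is the additive increase of one packet per round trip, the second term is the multiplicative decrease triggered by drops that happened one round trip earlier, and the average data rate is x(t) = W(t)/RTT; the drop probability p is taken from the Poisson queue model of the previous slides.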

  18. Instability plot [Plot: link utilization ρ against log10 of pkt loss probability p, showing the TCP throughput formula (C = bandwidth per flow, RTT = average round trip time, C×RTT = TCP window size) and the loss probability for the M/D/1/B queue]

  19. Instability plot [Plot: as slide 18, link utilization ρ against log10 of pkt loss probability p, with the extent of oscillations in the TCP throughput formula marked; C = bandwidth per flow, RTT = average round trip time, C×RTT = TCP window size; loss probability for the M/D/1/B queue]

  20. Instability plot [Plots: link utilization ρ against log10 of pkt loss probability p for C×RTT = 4 pkts, 20 pkts, and 100 pkts, with buffer B = 20 pkts] (The operating point these plots depict can be computed numerically; see the sketch below.)
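
One way to read these plots: the operating point sits where the loss probability implied by the TCP throughput formula meets the loss probability of the M/D/1/B queue. The sketch below finds that intersection numerically for the three window sizes on slide 20, using the square-root throughput formula w ≈ √(2/p) and the approximation p ≈ ρ^B; both the choice of formula and the damped iteration are assumptions made here for illustration, not code from the talk.

    #include <cstdio>
    #include <cmath>

    // Fixed point of:  rho = sqrt(2/p) / (C*RTT)    (per-flow TCP throughput formula)
    //                  p   = rho^B                  (M/D/1/B loss approximation)
    int main() {
        const double B = 20;                           // buffer size, packets (as on slide 20)
        const double window_sizes[] = {4, 20, 100};    // C*RTT in packets (as on slide 20)

        for (double crtt : window_sizes) {
            double rho = 0.5;                          // initial guess for utilization
            for (int i = 0; i < 1000; ++i) {
                double p    = pow(rho, B);             // loss probability at this utilization
                double wnd  = sqrt(2.0 / p);           // window implied by the throughput formula
                double next = wnd / crtt;              // utilization implied by that window
                if (next > 1) next = 1;                // cannot exceed the line rate
                rho = 0.9 * rho + 0.1 * next;          // damped iteration so it converges
            }
            printf("C*RTT = %3.0f pkt  ->  utilization ~ %.2f, loss ~ %.3f\n",
                   crtt, rho, pow(rho, B));
        }
    }

Whether TCP's delayed feedback makes this operating point stable or oscillatory is exactly what the instability plots and the control-theoretic analysis address.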

  21. Different buffer size / TCP • Intermediate buffers: buffer = bandwidth×delay/√#flows, or Large buffers: buffer = bandwidth×delay • Large buffers with AQM: buffer = bandwidth×delay × {¼, 1, 4} • Small buffers: buffer = {10, 20, 50} pkts • Small buffers, ScalableTCP: buffer = {50, 1000} pkts [Vinnicombe 2002, T. Kelly 2002]

  22. Results There is a virtuous circle: desynchronization permits small buffers; small buffers induce desynchronization.

  23. Limitations/concerns • Surely bottlenecks are at the access network, not the core network? • Unwise to rely on this! • The small-buffer theory works fine for as few as 20 flows • If the core is underutilized, it definitely doesn’t need big buffers • The Poisson model sometimes breaks down • because of short-timescale packet clumps (not LRD!) • need more measurement of short-timescale Internet traffic statistics • Limited validation so far [McKeown et al. at Stanford, Level3, Internet2] • Proper validation needs • goodly amount of traffic • full measurement kit • ability to control buffer size

  24. Conclusion • Buffer sizes can be very small • a buffer of 25 pkt gives link utilization > 90% • small buffers mean that TCP flows get better feedback, so they can better judge how much capacity is available • use Poisson traffic models for the router, differential equation models for aggregate traffic • Further questions: What is the impact of packet ‘clumpiness’? How well would non-FIFO buffers work? • TCP can be improved with simple changes • e.g. spacing out TCP packets • e.g. modify the window increase/decrease rules (sketched below) [ScalableTCP: Vinnicombe 2004, Kelly 2004; XCP: Katabi, Handley, Rohrs 2002] • any future transport protocol should be designed along these lines • improved TCP may find its way into Linux/Windows within 5 years
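
On the point about modifying the increase/decrease rules: the difference between standard TCP and Scalable TCP is only the per-ACK and per-loss window update. A minimal sketch of the two rules, using the constants a = 0.01 and b = 0.125 proposed for Scalable TCP (illustrative, not code from the talk):

    // Congestion window updates, with cwnd measured in packets.
    // Standard TCP (AIMD):  +1/cwnd per ACK,  halve on loss.
    // Scalable TCP:         +0.01  per ACK,   multiply by (1 - 0.125) on loss.
    struct StandardTcp {
        double cwnd = 1;
        void on_ack()  { cwnd += 1.0 / cwnd; }   // roughly one packet per round trip
        void on_loss() { cwnd *= 0.5; }
    };
    struct ScalableTcp {
        double cwnd = 1;
        void on_ack()  { cwnd += 0.01; }         // constant per-ACK increase
        void on_loss() { cwnd *= 1.0 - 0.125; }  // gentler, fixed-fraction decrease
    };

Because Scalable TCP's increase per ACK is a constant and its decrease is a fixed fraction, its recovery time after a loss does not grow with the bandwidth-delay product, which is what makes it attractive on fast links with small buffers.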
