
FAST TCP


Presentation Transcript


  1. FAST TCP Bartek Wydrowski Steven Low netlab.CALTECH.edu

  2. Acks & Collaborators • Internet2 • Almes, Shalunov • Abilene GigaPoP’s • GATech, NCSU, PSC, Seattle, Washington • Cisco • Aiken, Doraiswami, McGugan, Smith, Yip • Level(3) • Fernes • LANL • Wu • Caltech • Bunn, Choe, Doyle, Hegde, Jin, Li, Low, Newman, Papadopoulos, Ravot, Singh, Tang, J. Wang, Wei, Wydrowski, Xia • UCLA • Paganini, Z. Wang • StarLight • deFanti, Winkler • CERN • Martin • SLAC • Cottrell • PSC • Mathis

  3. Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST

  4. Performance at large windows • ns-2 simulation: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows (J. Wang, Caltech, June 02) • DataTAG Network: CERN (Geneva) – StarLight (Chicago) – SLAC/Level3 (Sunnyvale); capacity = 1Gbps; 180 ms round trip latency; 1 flow; average utilization 95% (FAST) vs. 19%–27% (Linux TCP, txq = 100 and 10000) (C. Jin, D. Wei, S. Ravot, etc., Caltech, Nov 02)

  5. Average Queue vs Buffer Size Dummynet • capacity = 800Mbps • delay = 200ms • 1 flow • Buffer size: 50, …, 8000 pkts (S. Hegde, B. Wydrowski, etc., Caltech)

  6. Is large queue necessary for high throughput?

  7. Congestion control • Example congestion measure pl(t): • Loss (Reno) • Queueing delay (Vegas) [diagram: links feed back congestion measure pl(t); sources adjust rate xi(t)]

  8. TCP/AQM • Congestion control is a distributed asynchronous algorithm to share bandwidth • It has two components • TCP: adapts sending rate (window) xi(t) to congestion; e.g. Reno, Vegas • AQM: adjusts & feeds back congestion information pl(t); e.g. DropTail, RED, REM/PI, AVQ • They form a distributed feedback control system • Equilibrium & stability depend on both TCP and AQM • And on delay, capacity, routing, #connections

  9. ACK: W  W + 1/W Loss: W  W – 0.5W • Packet level • Flow level • Equilibrium • Dynamics pkts Packet & flow level Reno TCP (Mathis formula)

  10. Reno TCP • Packet level • Designed and implemented first • Flow level • Understood afterwards • Flow level dynamics determines • Equilibrium: performance, fairness • Stability • Design flow level equilibrium & stability • Implement flow level goals at packet level

  11. Reno TCP • Packet level • Designed and implemented first • Flow level • Understood afterwards • Flow level dynamics determines • Equilibrium: performance, fairness • Stability Packet level design of FAST, HSTCP, STCP guided by flow level properties

  12. ACK: W  W + 1/W Loss: W  W – 0.5W • Reno AIMD(1, 0.5) ACK: W  W + a(w)/W Loss: W  W – b(w)W • HSTCP AIMD(a(w), b(w)) ACK: W  W + 0.01 Loss: W  W – 0.125W • STCP MIMD(a, b) • FAST Packet level

  13. Flow level: Reno, HSTCP, STCP, FAST • Similar flow level equilibrium (throughput in pkts/sec), with constant a = 1.225 (Reno), 0.120 (HSTCP), 0.075 (STCP)

  14. Flow level: Reno, HSTCP, STCP, FAST • Common flow level dynamics! window adjustment = (control gain) × (distance from flow level goal) • Different gain k and utility Ui • They determine equilibrium and stability • Different congestion measure pi • Loss probability (Reno, HSTCP, STCP) • Queueing delay (Vegas, FAST)
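
A hedged reconstruction of this common dynamics, following the FAST TCP papers: each source i adjusts its window in proportion to how far its congestion measure q_i is from the flow-level goal u_i = U_i'(x_i) (the marginal utility),

\[ \dot{w}_i(t) \;=\; \kappa_i(t)\left(1 - \frac{q_i(t)}{u_i(t)}\right), \qquad u_i(t) = U_i'(x_i(t)), \]

so at equilibrium q_i = U_i'(x_i). The protocols differ in the gain \kappa_i, the utility U_i, and whether q_i is loss probability or queueing delay.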

  15. Implementation strategy • Common flow level dynamics: window adjustment = (control gain) × (distance from flow level goal) • Small adjustment when close, large when far away • Need to estimate how far the current state is from the target • Scalable • Window adjustment independent of pi • Depends only on current window • Difficult to scale

  16. Difficulties at large window • Equilibrium problem • Packet level: AI too slow, MD too drastic • Flow level: required loss probability too small • Dynamic problem • Packet level: must oscillate on binary signal • Flow level: unstable at large window

  17. Problem: no target • Reno: AIMD(1, 0.5) ACK: W → W + 1/W Loss: W → W – 0.5W • HSTCP: AIMD(a(w), b(w)) ACK: W → W + a(w)/W Loss: W → W – b(w)W • STCP: MIMD(1/100, 1/8) ACK: W → W + 0.01 Loss: W → W – 0.125W

  18. Solution: estimate target • FAST: scalable to any w* [diagram: FAST state machine with Slow Start, Conv, Equil and Loss Rec states]

  19. Difficulties at large window • Equilibrium problem • Packet level: AI too slow, MD too drastic • Flow level: required loss probability too small • Dynamic problem • Packet level: must oscillate on binary signal • Flow level: unstable at large window

  20. TCP Problem: binary signal oscillation

  21. Solution: multibit signal FAST stabilized

  22. Difficulties at large window • Equilibrium problem • Packet level: AI too slow, MD too drastic • Flow level: required loss probability too small • Dynamic problem • Packet level: must oscillate on binary signal • Flow level: unstable at large window Use multi-bit signal! Stabilize flow dynamics!

  23. Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST

  24. Architecture [diagram: loss recovery operates at sub-RTT (<RTT) timescale; other components at RTT timescale]

  25. Architecture Each component • designed independently • upgraded asynchronously

  26. Architecture Each component • designed independently • upgraded asynchronously [diagram highlights: Window Control]

  27. Window control algorithm • Full utilization • regardless of bandwidth-delay product • Globally stable • exponential convergence • Fairness • weighted proportional fairness • parameter a

  28. Window control algorithm

  29. Window control algorithm [figure: the update drives the measured backlog toward the target backlog]
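
A minimal sketch of the window update reported in the FAST TCP papers (run roughly once per RTT), which drives the measured backlog toward the target backlog a; the variable names and the gain value gamma = 0.5 below are illustrative assumptions, not the exact kernel implementation.

    def fast_window_update(w, base_rtt, avg_rtt, alpha, gamma=0.5):
        """One FAST window update (roughly once per RTT):
        w <- min{ 2w, (1 - gamma) w + gamma (base_rtt/avg_rtt * w + alpha) }.
        (base_rtt/avg_rtt) * w estimates packets in flight on the wire;
        alpha is the target number of packets queued inside the network."""
        target = (base_rtt / avg_rtt) * w + alpha
        return min(2.0 * w, (1.0 - gamma) * w + gamma * target)

    # Example: 100 ms propagation delay, 110 ms measured RTT, alpha = 200 pkts.
    # (In a real network avg_rtt would itself respond to the window.)
    w = 1000.0
    for _ in range(20):
        w = fast_window_update(w, base_rtt=0.100, avg_rtt=0.110, alpha=200)
    print(f"window after 20 updates: {w:.0f} pkts")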

  30. Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST

  31. Dynamic sharing: 3 flows [plots: FAST, Linux] Dynamic sharing on Dummynet • capacity = 800Mbps • delay = 120ms • 3 flows • iperf throughput • Linux 2.4.x (HSTCP: UCL)

  32. Dynamic sharing: 3 flows [plots: steady throughput for FAST, Linux, HSTCP, BIC]

  33. Dynamic sharing on Dummynet [plots: 30 min queue, loss, throughput for FAST, Linux, HSTCP, STCP] • capacity = 800Mbps • delay = 120ms • 14 flows • iperf throughput • Linux 2.4.x (HSTCP: UCL)

  34. Room for mice! [plots: 30 min queue, loss, throughput for FAST, Linux, HSTCP, BIC]

  35. Aggregate throughput [plots: small window (800 pkts) vs large window (8000 pkts)] Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

  36. Fairness Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

  37. Stability: stable in diverse scenarios. Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

  38. Responsiveness Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts

  39. I2LSR, SC2004 Bandwidth Challenge Harvey Newman’s group, Caltech http://dnae.home.cern.ch/dnae/lsr4-nov04 [map: OC48, OC192 links] November 8, 2004 Caltech and CERN transferred • 2,881 GBytes in one hour (6.86Gbps) • between Geneva - US - Geneva (25,280 km) • through LHCnet/DataTag, Abilene and CENIC backbones • using 18 FAST TCP streams • on Linux 2.6.9 kernel with a 9000-byte MTU • at 174 petabit-meters per second (Pb·m/s)

  40. Internet2 Abilene Weather Map OC48 OC192 7.1G: GENV-PITS-LOSA-SNVA-STTL-DNVR-KSCY-HSTON-ATLA-WASH-NYCM-CHIN-GENV Newman’s group, Caltech

  41. “Ultrascale” protocol development: FAST TCP • Based on TCP Vegas • Uses end-to-end delay and loss to dynamically adjust the congestion window • Defines an explicit equilibrium • Capacity = OC-192 9.5Gbps; 264 ms round trip latency; 1 flow [plots: bandwidth use of Linux TCP, Westwood+, BIC TCP and FAST: 30%, 40%, 50%, 79%] (Yang Xia, Caltech)

  42. FAST backs off to make room for Reno Periodic losses every 10mins (Yang Xia, Harvey Newman, Caltech)

  43. Linux Experiment by Yusung Kim KAIST, Korea, Oct 2004 • Dummynet • Capacity = 622Mbps • Delay=200ms • Router buffer size = 1BDP (11,000 pkts) • 1 flow • Application: iperf • BIC, FAST, HSTCP, STCP, Reno (Linux), CUBIC http://netsrv.csc.ncsu.edu/yskim/single_traffic/curves/

  44. Throughput (Yusung Kim, KAIST, Korea, 10/2004) [plots: throughput and RTT for FAST, HSTCP, BIC; RTT = 400ms, double the baseRTT] • All can achieve high throughput except Reno • FAST adds negligible queueing delay • Loss-based control (almost) fills the buffer … • adding delay and reducing the ability to absorb bursts

  45. Queue and cwnd (Yusung Kim, KAIST, Korea, 10/2004) [plots: queue and FAST cwnd; FAST, HSTCP, BIC] • FAST needs smaller buffers at both routers and hosts • Loss-based control limited at the host in these expts

  46. Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST

  47. Loss Recovery Section Overview • Linux & TCP loss recovery have problems, especially in environments with non-congestion loss. • New Loss Architecture: • Determining packet loss & PIF (packets in flight) • Decoupled window control • Testing in high loss environment • Receiver window issues • Forward Retransmission • SACK processing optimization • Reorder Detection • Testing in small buffer environment

  48. New Loss Recovery Architecture • New architecture for loss recovery motivated by new environments: • High loss: wireless, 802.11, Satellite • Low loss, but large BDP • Measure of path ‘difficulty’ should be extended • BDLP: Bandwidth x Delay x (1/(1 - Loss))
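
A quick worked example of the BDLP measure (illustrative numbers, not from the slides): for a 1 Gbps path with 100 ms delay and 1% loss,

\[ \mathrm{BDLP} = 10^{9}\,\mathrm{bit/s} \times 0.1\,\mathrm{s} \times \frac{1}{1 - 0.01} \approx 1.01 \times 10^{8}\,\mathrm{bits}, \]

so 1% loss barely inflates the plain bandwidth-delay product, while 50% loss would double it; the 1/(1 - Loss) factor presumably reflects the expected number of transmissions needed per delivered packet.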

  49. Periodic losses every 10mins (Yang Xia, Harvey Newman, Caltech)

  50. Haystack - 1 Flow (Atlanta -> Japan) • Iperf used to generate traffic. • Sender is a Xeon 2.6 GHz. • Window was constant. • Burstiness in rate due to host processing and ACK spacing.
