
Web100/Net100 at Oak Ridge National Lab



Presentation Transcript


  1. Web100/Net100 at Oak Ridge National Lab
     Tom Dunigan (thd@ornl.gov)
     October 21, 2002

  2. Net100: developing network-aware operating systems
     • DOE-funded (Office of Science) project ($1M/yr, 3 yrs beginning 9/01)
     • Principal investigators
       • Matt Mathis, PSC (mathis@psc.edu)
       • Brian Tierney, LBNL (bltierney@lbl.gov)
       • Tom Dunigan, ORNL (thd@ornl.gov), with Florence Fowler and Nagi Rao
     • Objectives
       • measure and understand end-to-end network and application performance
       • tune network applications (grid and bulk transfer)
       • first-year emphasis: bulk transfer over high delay/bandwidth nets
     • Components (leverage Web100)
       • Network Tool Analysis Framework (NTAF)
         • tool design and analysis
         • active network probes and passive sensors
         • network metrics database
       • transport protocol analysis
       • tuning daemon (WAD) to tune network flows based on network metrics
     www.net100.org

  3. TCP tuning with Web100+/Net100
     • Path characterization (NTAF)
       • both active and passive measurement
       • database of measurement data
       • NTAF/Web100 hosts at PSC, NCAR, LBL, ORNL
     • Application tuning (tuning daemon, WAD)
       • Web100 extensions
         • disable Linux 2.4 caching/SendStall
         • event notification
         • more tuning options
       • daemon tunes the application at startup
         • static tuning information
         • query NTAF and calculate optimum TCP parameters
       • dynamically tune the application (Web100 feedback)
         • adjust parameters during the flow
         • split the optimum among parallel flows
     • Transport protocol optimizations
       • what to tune?
       • is it fair? stable?

  4. Motivation
     • Poor network performance …
       • high-bandwidth paths, but applications are slow
       • is it the application, the OS, or the network? … Yes
     • Changing: bandwidths
       • 9.6 Kbs … 1.5 Mbs … 45 … 100 … 1000 … ? Mbs
     • Unchanging TCP:
       • speed of light (RTT)
       • MTU (still 1500 bytes)
       • TCP congestion avoidance
     • TCP is lossy by design!
       • 2x overshoot at startup, sawtooth
       • recovery after a loss can be very slow on today’s high delay/bandwidth links
       • recovery rate proportional to MSS/RTT²
     [Figure: ORNL-to-NERSC ftp trace showing instantaneous and average bandwidth, early startup losses, and linear recovery at 0.5 Mb/s]
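The MSS/RTT² recovery claim above can be made concrete with a back-of-envelope script. The link rate and RTT are taken from the ns simulation mentioned on the next slide (500 Mbs, 80 ms); the 1460-byte MSS is an assumption, not from the slides:

```python
# Rough AIMD recovery time on a long fat pipe: after a loss, standard TCP
# halves cwnd and regains only one MSS per RTT, so recovery scales as MSS/RTT^2.
MSS = 1460          # bytes (assumed Ethernet-sized segment)
RTT = 0.080         # seconds
link = 500e6        # bits/s

bdp_segments = link * RTT / (8 * MSS)   # window (in segments) needed to fill the pipe
rtts_to_recover = bdp_segments / 2      # +1 MSS per RTT, starting from half window
seconds = rtts_to_recover * RTT
print(round(seconds))                   # roughly 137 s; ~double with delayed ACKs
```

This is why the slide's figure shows minutes of linear recovery from a single early loss.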

  5. Net100 TCP tuning
     • TCP performance
       • reliable/stable/fair
       • needs buffer = bandwidth × RTT
         • ORNL/NERSC (80 ms, OC12) needs 6 MB
       • TCP slow-start and loss recovery proportional to MSS/RTT²
         • slow on today’s high delay/bandwidth paths
       • TCP is lossy by design
     • TCP tuning
       • set optimal (?) buffer size
       • avoid losses
         • modified slow-start
         • reduce bursts
         • anticipate (Vegas?) loss
         • reorder threshold
       • speed recovery
         • bigger MTU or “virtual MSS”
         • modified AIMD (0.5, 1)
         • delayed ACKs and initial window
     [Figure: ns simulation, 500 Mbs link, 80 ms RTT; packet loss early in slow start. Standard TCP with delayed ACK takes 10 minutes to recover!]
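The buffer = bandwidth × RTT rule above is a one-line calculation. A minimal sketch, treating OC-12 as roughly 600 Mbs of usable payload (SONET overhead glossed over):

```python
# Bandwidth-delay product: the socket buffer needed to keep a path full.
def bdp_bytes(bandwidth_bps, rtt_s):
    # bits in flight over one RTT, converted to bytes
    return bandwidth_bps * rtt_s / 8

buf = bdp_bytes(600e6, 0.080)     # ORNL/NERSC figures from the slide
print(round(buf / 1e6, 1))        # → 6.0 (MB), matching the slide's 6 MB
```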

  6. Net100 TCP tuning
     • Work-around Daemon (WAD)
       • tunes an unknowing sender/receiver at startup and/or during the flow
       • Web100 kernel extensions
         • uses netlink to alert the daemon of socket open/close
         • besides existing Web100 buffer tuning, new code and WAD_* variables
         • knobs to disable Linux 2.4 caching and sendstall
       • config file with static tuning data
         • mode specifies dynamic tuning (Floyd AIMD, NTAF buffer size, concurrent streams)
       • daemon periodically polls NTAF for fresh tuning data
       • written in C (LBL has a Python version)

     WAD config file:
       [bob]
       src_addr: 0.0.0.0
       src_port: 0
       dst_addr: 10.5.128.74
       dst_port: 0
       mode: 1
       sndbuf: 2000000
       rcvbuf: 100000
       wadai: 6
       wadmd: 0.3
       maxssth: 100
       divide: 1
       reorder: 9
       delack: 0
       floyd: 1
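The WAD config format above is INI-like (the real daemon is written in C). A hypothetical sketch of reading it with Python's stdlib configparser, using the section and key names from the slide:

```python
# Hypothetical reader for a WAD-style config; configparser accepts the
# "key: value" delimiter used in the slide's example.
import configparser

WAD_CONF = """
[bob]
src_addr: 0.0.0.0
src_port: 0
dst_addr: 10.5.128.74
dst_port: 0
mode: 1
sndbuf: 2000000
rcvbuf: 100000
wadai: 6
wadmd: 0.3
"""

cfg = configparser.ConfigParser()
cfg.read_string(WAD_CONF)
flow = cfg["bob"]                                    # one tuned flow per section
print(flow.getint("sndbuf"), flow.getfloat("wadmd"))  # → 2000000 0.3
```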

  7. WAD tuning results (your mileage may vary …)
     • Classic buffer tuning: ORNL to PSC, OC12, 80 ms RTT
       • network-challenged app gets 10 Mbs
       • same app, WAD/NTAF-tuned buffer gets 143 Mbs
     • Virtual MSS
       • tune TCP’s additive increase (WAD_AI)
       • add k segments per RTT during recovery
       • k = 6 is like a GigE jumbo frame
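The virtual-MSS effect above is easy to quantify: growing cwnd by k segments per RTT shortens recovery by a factor of k. A sketch, with the window size assumed (roughly the 500 Mbs / 80 ms case, 1460-byte segments):

```python
# Why WAD_AI ("virtual MSS") speeds recovery: AIMD(0.5, k) must regain
# half a window, at k segments per RTT instead of 1.
def rtts_to_regain(window_segments, ai_segments_per_rtt):
    return (window_segments / 2) / ai_segments_per_rtt

full_window = 3400                      # segments; assumed long-fat-pipe window
print(rtts_to_regain(full_window, 1))   # → 1700.0 RTTs for standard TCP
print(rtts_to_regain(full_window, 6))   # WAD_AI = 6: 6x faster, like a jumbo frame
```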

  8. WAD tuning
     • Modified slow-start and AI: ORNL to NERSC, OC12, 80 ms RTT
       • often losses in slow start
       • WAD-tuned Floyd slow-start (WAD_MaxThresh) and AI (6)
     • WAD-tuned AIMD and slow start: ORNL to CERN, OC12, 150 ms RTT
       • parallel streams: AIMD (1/(2k), k)
       • WAD-tuned single stream: (0.125, 4) via WAD_MD
     • Can a tuned single stream compete with parallel streams?
       • pre-tune Floyd AIMD or dynamically adjust
       • tune concurrent flows -- subdivide buffer
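The slide's rule of thumb maps k parallel streams onto one stream with AIMD(1/(2k), k): gentler multiplicative decrease, steeper additive increase. A minimal sketch:

```python
# Single-stream AIMD parameters roughly equivalent to k parallel TCP streams.
def equivalent_aimd(k):
    md = 1 / (2 * k)   # multiplicative decrease on loss
    ai = k             # additive increase, segments per RTT
    return md, ai

print(equivalent_aimd(4))   # → (0.125, 4), the single-stream tuning on the slide
```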

  9. Net100 TCP tuning
     • Reorder threshold
       • seeing more out-of-order packets
       • WAD tunes a bigger reorder threshold
       • Linux 2.4 does a good job already
       • LBL to ORNL (using our TCP-over-UDP): the dup3 case had 289 retransmits, but all were unneeded!
     • WAD could turn off delayed ACKs
       • 2x improvement in recovery rate and slow-start
       • Linux 2.4 already turns off delayed ACKs for the initial slow-start
       • WARNING: could be unfair, though probably stable; use only on an intranet
     Web100 has proven very useful for experimenting with TCP tuning options.

  10. Web100 tools
     • Java applet bandwidth/client tester
       • measures in/out data rates
       • reports flow characteristics
       • try it: http://firebird.ccs.ornl.gov:7123
       • INSIGHTS: what happened, and what you can expect
         • from the server log: 25,755 flows; 53% with loss, 23% with timeouts
     • Post-transfer statistics
       • ttcp100/iperf100
     • Web100 daemon
       • avoids modifying applications
       • logs designated paths/ports/variables
       • INSIGHTS: later …

  11. Web100 tools
     • Tracer daemon
       • collects Web100 variables at 0.1-second intervals
       • config file specifies
         • source/port and dest/port
         • Web100 variables (current/delta)
       • logs to disk with timestamp and CID
       • C and Python (LBL-based) versions
     • INSIGHTS:
       • watch uninstrumented apps (GridFTP)
       • analyze flow dynamics with plots (cwnd, ssthresh, re-xmits, RTT …)
       • analyze tuned flows
       • aggregate parallel-flow data

     traced config file:
       # local lport remote rport
       0.0.0.0 0 124.55.182.7 0
       0.0.0.0 0 134.67.45.9 0
       # v=value d=delta
       d PktsOut
       d PktsRetrans
       v CurrentCwnd
       v SampledRTT
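A hypothetical parser for the traced config format above: four-field lines name a flow to watch, and "v"/"d" lines select Web100 variables as current values or per-interval deltas. This is a sketch of the format as shown on the slide, not the daemon's actual code:

```python
TRACED_CONF = """\
# local lport remote rport
0.0.0.0 0 124.55.182.7 0
0.0.0.0 0 134.67.45.9 0
# v=value d=delta
d PktsOut
d PktsRetrans
v CurrentCwnd
v SampledRTT
"""

flows, variables = [], []
for line in TRACED_CONF.splitlines():
    line = line.strip()
    if not line or line.startswith("#"):       # skip blanks and comments
        continue
    fields = line.split()
    if len(fields) == 4:                       # a flow: local lport remote rport
        flows.append((fields[0], int(fields[1]), fields[2], int(fields[3])))
    elif fields[0] in ("v", "d"):              # a variable: value vs delta
        variables.append((fields[0], fields[1]))

print(len(flows), len(variables))   # → 2 4
```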

  12. PIX SACK problem
     • Web100 reports timeouts into ORNL, but not at other sites??
     • Theory 1: yet another Linux 2.4 TCP feature
       • our TCP-over-UDP: no timeouts
     • tcpdump/tcptrace/xplot of the flow, both inside and outside ORNL
       • a tcptrace bug -- SACK blocks wrong for one of the dumps … NOT.
     • The ORNL PIX firewall was randomizing TCP sequence numbers but failed to adjust the SACK blocks
     • RESULT: TCP timeouts
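The failure mode above can be illustrated with hypothetical numbers: a sequence-randomizing firewall must apply the same offset to SACK blocks in returning ACKs, or the SACK ranges no longer match the sender's sequence space and loss recovery falls back to timeouts.

```python
# Illustration (assumed offset and sequence numbers, not from the slides).
OFFSET = 0x1A2B3C4D          # hypothetical per-connection randomization offset
MOD = 2 ** 32                # TCP sequence space wraps at 2^32

def rewrite_seq(seq):
    return (seq + OFFSET) % MOD

sent_block = (rewrite_seq(1000), rewrite_seq(2460))   # bytes as seen on the wire
sack_untranslated = (1000, 2460)                      # what the broken PIX forwarded
sack_translated = (rewrite_seq(1000), rewrite_seq(2460))

print(sack_untranslated == sent_block)   # → False: SACK info is useless, timeout
print(sack_translated == sent_block)     # → True: a consistent rewrite recovers fast
```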

  13. Futures (www.net100.org)
     • Net100
       • analyze effectiveness of current tuning options
       • NTAF probes -- characterizing a path to tune a flow
       • additional tuning algorithms (Vegas)
       • parallel/multipath selection/tuning
       • WAD-to-WAD tuning
     • Web100 extensions
       • Web100 trace files -- log all data efficiently
       • variable for a count of duplicate data segments at the receiver
       • remove the wscale restriction
     • ESnet
       • jumbo frames
       • router/switch data
