Net100 PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL

Presentation Transcript


  1. Net100 PIs: Wendy Huntoon/PSC, Tom Dunigan/ORNL, Brian Tierney/LBNL
     Net100 Novel Ideas
     • Net100 will tune network-UNaware applications based on recent and current link characteristics
     • Net100 will tune more than just transport buffer sizes, such as
       • TCP AIMD parameters
       • DUP threshold
       • Delayed ACK
     • Net100 will determine optimal paths and whether to use multiple streams and/or multiple paths
     • Net100 kernel utilizes passive monitoring from the Web100 kernel
     Impact and Connections
     • IMPACT:
       • increase throughput of bulk transfers over high-delay, high-bandwidth networks (like DOE’s ESnet)
       • select optimal paths and transport parameters for distributed (Grid) applications (e.g., GridFTP)
       • provide network performance data base from active and passive monitoring
     • CONNECTIONS:
       • SciDAC: Astrophysics, Bandwidth Estimation, Data Grid, INCITE, Logistical Networking
       • Base: Network Monitoring, Data Grid, Transport Protocols
     Milestones/Dates/Status (target Mon/Yr, then date DONE where reported)
     • Network probes and sensors
       - initial sensor and tool deployment: 12/01 (DONE 12/01)
       - data base design: 4/02
       - initial data base implementation: 9/02
       - final sensor/data base: 6/03
     • Transport protocol optimizations
       - protocol analysis: 11/02
       - initial tuning daemon: 3/02
       - bulk transfer tuning demos: 8/02
       - final tuning daemon: 6/03
     • Multipath support
       - analytical analysis: 8/02
       - proof-of-principle routing daemons: 12/02
       - grid applications demos: 4/03
     High-Performance Network Research - SciDAC/Base
     NET100: Developing network-aware operating systems
     Tasks:
       - develop/deploy network probes/sensors
       - develop network metrics data base
       - develop transport protocol optimizations
       - develop network-tuning daemon
     www.net100.org
     MICS Program Manager: Thomas Ndousse
     Date Prepared: 1/7/02

  2. Net100 project
     • New DOE-funded (Office of Science) project ($1M/yr, 3 yrs)
     • Principal investigators
       • Wendy Huntoon and the NCAR/PSC/Web100 team (Matt Mathis)
       • Brian Tierney, LBNL
       • Tom Dunigan, ORNL
     • Objective: develop network-aware operating systems
       • optimize and understand end-to-end network and application performance
       • eliminate the “wizard gap”
     • Motivation
       • DOE has a large investment in high-speed networks (ESnet) and distributed applications
       • many network applications are not utilizing the available bandwidth

  3. Net100 approach
     • Develop Network Tools Analysis Framework (NTAF)
     • collect data for network tuning
     • Develop/evaluate/deploy network tools (Enable, NWS, iperf, pipechar, …)
     • aggregate and transform output from tools and Web100
     • Store/query/archive performance data
     • evaluate network applications over DOE’s ESnet (OC12, OC48, 10GigE, …)
     • bulk transfers over high bandwidth/delay network
     • distributed applications (grid)
     • Investigate TCP optimizations
     • simulate/emulate/deploy
     • Linux kernel mods
     • Autotune network applications
     • WAD (workaround daemon)

  4. Web100 summary
     • NSF funded (NCAR/PSC), web100.org
     • Modified Linux kernel (2.4.9)
       • instrumented kernel to read/set TCP variables for a specific flow
       • readable: RTT, counts (bytes, pkts, retransmits, dups), state (SACKs, windowscale, cwnd, ssthresh) (115 variables!)
       • settable: buffer sizes
     • GUI to display/modify a flow’s TCP variables, real-time
     • API for network-aware applications
     • Early evaluators: ANL, SLAC, LBNL, ORNL, universities

  5. Motivation
     • bulk transfers are slow
       • faster links (OC12, OC48, 10GigE), but long delay
       • classic TCP tuning problem
       • also broken TCP stacks
     • Under-provisioned routers/switches
     • TCP is lossy, slow to recover
       • tune it or replace it?
     • Compute/data grids
       • sense/probe link bandwidths/latencies
       • schedule/configure distributed application
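
The "classic TCP tuning problem" on this slide is usually stated in terms of the bandwidth-delay product: a single TCP flow cannot exceed its window divided by the round-trip time, so the socket buffer has to hold roughly bandwidth × RTT bytes to fill the path. The sketch below works through that arithmetic. The link rates are nominal line rates, and the 70 ms RTT and 64 KiB default window are assumed example values, not figures from the slides.

```python
# Nominal line rates in Mb/s; usable payload rates are somewhat lower.
LINK_RATES_MBPS = {"OC12": 622.08, "OC48": 2488.32, "10GigE": 10000.0}


def bdp_bytes(rate_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    bits_in_flight = rate_mbps * 1e6 * (rtt_ms / 1e3)
    return round(bits_in_flight / 8)


def max_throughput_mbps(buffer_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one flow's throughput when the window is capped by the buffer."""
    return buffer_bytes * 8 / (rtt_ms / 1e3) / 1e6


if __name__ == "__main__":
    rtt_ms = 70.0  # assumed wide-area round-trip time, for illustration only
    for name, rate in LINK_RATES_MBPS.items():
        needed_mib = bdp_bytes(rate, rtt_ms) / 2**20
        print(f"{name}: about {needed_mib:.1f} MiB of buffer to fill the path")
    capped = max_throughput_mbps(64 * 1024, rtt_ms)  # assumed untuned 64 KiB window
    print(f"64 KiB window at {rtt_ms:.0f} ms RTT: at most {capped:.1f} Mb/s")
```

Running the script shows why untuned applications fall far short of the link rate on long paths: the window, not the link, sets the ceiling.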

  6. TCP losses
     [Figure: packet losses during startup, followed by linear recovery. Plot labels: instantaneous, average, 0.5 Mbs, packet loss, early packet drops.]

  7. TCP tuning (workarounds)
     • Avoid losses
       • retain/probe for “optimal” buffer sizes
       • ECN capable routers/hosts
       • reduce bursts (TCP Vegas)
     • Faster recovery
       • bigger MSS (jumbo frames)
       • speculative recovery (D-SACK)
       • modified congestion avoidance?
     • Autotune (WAD variables)
       • Buffer size
       • Dupthresh
       • Del ACK, Nagle
       • AIMD
       • Virtual MSS
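
To make the AIMD knob concrete, here is a minimal sketch of additive-increase/multiplicative-decrease in congestion avoidance. Standard TCP adds one segment per RTT and halves the window on loss; the `increase` and `decrease` parameters stand in for the kind of values a tuning daemon might adjust. The function names and the example path (OC12, 70 ms RTT, 1460-byte MSS) are assumptions for illustration, and slow start, timeouts, and delayed ACKs are deliberately left out.

```python
def recovery_rtts(window_segments: float, increase: float = 1.0,
                  decrease: float = 0.5) -> float:
    """RTTs needed to grow back to the pre-loss window after one loss event."""
    after_loss = window_segments * decrease
    return (window_segments - after_loss) / increase


def simulate_aimd(rtts: int, loss_rtts: set, increase: float = 1.0,
                  decrease: float = 0.5, start: float = 1.0) -> list:
    """Trace of cwnd (in segments), one value per RTT, congestion avoidance only."""
    cwnd = start
    trace = []
    for t in range(rtts):
        if t in loss_rtts:
            cwnd = max(1.0, cwnd * decrease)  # multiplicative decrease on loss
        else:
            cwnd += increase                  # additive increase per RTT
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    rtt_s = 0.070                              # assumed RTT
    window = 622.08e6 * rtt_s / 8 / 1460       # segments needed to fill an OC12
    for inc, dec in [(1.0, 0.5), (8.0, 0.5), (1.0, 0.875)]:
        seconds = recovery_rtts(window, inc, dec) * rtt_s
        print(f"increase={inc}, decrease={dec}: about {seconds:.0f} s to recover")
```

The comparison in the main block shows the trade-off behind "modified congestion avoidance?": a larger increase or a gentler decrease shortens recovery, at the cost of being more aggressive toward competing flows.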

  8. Tuning opportunities
     • Parallel streams (psockets)
       • how to choose number of streams, buffer sizes?
       • autotune?
     • Application routing daemons
       • indirect TCP
       • alternate path (Wolski, UCSB)
       • multipath (Rao, ORNL)
     • Other protocols (SCTP, DCP)
       • Out of order delivery
       • rate-based
     • Are these fair?
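
One hedged way to approach "how to choose number of streams" is the well-known Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS/RTT) · C/√p with C ≈ √(3/2). The sketch below assumes that N parallel streams see independent loss and therefore scale the aggregate roughly N-fold, which is an idealization; the slides do not say that Net100 selects stream counts this way, and the function names are illustrative.

```python
import math

MATHIS_C = math.sqrt(3.0 / 2.0)  # constant for periodic loss without delayed ACKs


def mathis_rate_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Approximate loss-limited throughput of one TCP stream, in Mb/s."""
    if loss_rate <= 0:
        raise ValueError("loss_rate must be positive for this model")
    segment_rate_bps = mss_bytes * 8 / (rtt_ms / 1e3)
    return segment_rate_bps * MATHIS_C / math.sqrt(loss_rate) / 1e6


def streams_needed(target_mbps: float, mss_bytes: int, rtt_ms: float,
                   loss_rate: float) -> int:
    """Streams required to reach target_mbps if the aggregate scales linearly."""
    per_stream = mathis_rate_mbps(mss_bytes, rtt_ms, loss_rate)
    return max(1, math.ceil(target_mbps / per_stream))


if __name__ == "__main__":
    # Assumed path: 70 ms RTT, 0.01% loss; compare standard and jumbo-frame MSS.
    for mss in (1460, 8960):
        rate = mathis_rate_mbps(mss, 70.0, 1e-4)
        n = streams_needed(500.0, mss, 70.0, 1e-4)
        print(f"MSS {mss}: {rate:.1f} Mb/s per stream, {n} streams for 500 Mb/s")
```

The same model bears on the slide's closing question: N streams behave roughly like one flow that claims N shares of a congested link, which is why fairness is raised as a concern.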

  9. Work-around Daemon (WAD)
     • Version 0
       • passively collect flow data
       • tune unknowing sender/receiver
       • config file with “tuning info”?
       • Based on Web100/Linux 2.4
     • To be done
       • collecting tuning info
       • adding more knobs to kernel
     • Related work
       • Feng’s Dynamic Right Sizing
       • Linux 2.4 auto-tuning/caching
       • Mathis TCP buffer tuning
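
The slide leaves the "tuning info" config file as an open question. As a purely hypothetical illustration, the sketch below stores per-destination tuning parameters in an INI-style file keyed by network prefix and picks the most specific match for a flow's remote address. The file format, key names (`sndbuf`, `rcvbuf`, `dupthresh`), and addresses are all invented for this example; applying the values to a live flow would go through the Web100 kernel interface, which is not shown here.

```python
import configparser
import ipaddress

# Hypothetical tuning file: one section per destination prefix.
EXAMPLE_CONFIG = """
[10.0.0.0/8]
sndbuf = 4194304
rcvbuf = 4194304

[10.1.2.0/24]
sndbuf = 8388608
rcvbuf = 8388608
dupthresh = 6
"""


def load_tuning(text: str) -> list:
    """Parse the config into a list of (network, {parameter: int}) pairs."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    table = []
    for section in parser.sections():
        network = ipaddress.ip_network(section)
        params = {key: int(value) for key, value in parser[section].items()}
        table.append((network, params))
    return table


def lookup(table: list, remote_addr: str) -> dict:
    """Return parameters for the longest matching prefix, or {} if none match."""
    addr = ipaddress.ip_address(remote_addr)
    matches = [(net, params) for net, params in table if addr in net]
    if not matches:
        return {}
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    table = load_tuning(EXAMPLE_CONFIG)
    for remote in ("10.1.2.7", "10.9.9.9", "192.168.1.1"):
        print(remote, "->", lookup(table, remote) or "leave flow untuned")
```

Keying on the remote prefix fits the "tune unknowing sender/receiver" idea: the daemon can decide from the flow's endpoints alone, without any cooperation from the application.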

  10. Network Tool Analysis Framework (NTAF)
      • Configure and launch network tools
        • measure bandwidth/latency (iperf, pchar, pipechar)
        • collect passive data (SNMP from routers, OS/Web100 counters)
        • forecast bandwidth/latency for grid resource scheduling
        • augment tools to report Web100 data
      • Collect and transform tool results into a common format
      • Save results for short-term auto-tuning and archive for later analysis
        • compare predicted to actual performance
        • measure effectiveness of tools and auto-tuning
      • Auto-tune network applications
        • WAD (WorkAround Daemon)
        • tunable TCP stack
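
A minimal sketch of the collect, forecast, and compare steps listed above, under stated assumptions: the `Measurement` record is a hypothetical common format (the slides do not define one), and the forecaster is a plain exponentially weighted moving average chosen for brevity, not the method used by NWS or any other tool named here.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical common record for results from different tools."""
    tool: str         # e.g. "iperf" or "pipechar"
    src: str
    dst: str
    timestamp: float  # seconds since the epoch
    metric: str       # e.g. "bandwidth_mbps" or "rtt_ms"
    value: float


def history(records: list, src: str, dst: str, metric: str) -> list:
    """Values for one path and metric, in time order, across all tools."""
    selected = [r for r in records
                if (r.src, r.dst, r.metric) == (src, dst, metric)]
    return [r.value for r in sorted(selected, key=lambda r: r.timestamp)]


def ewma_forecast(values: list, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average; recent samples weigh more."""
    forecast = values[0]
    for v in values[1:]:
        forecast = alpha * v + (1 - alpha) * forecast
    return forecast


def relative_error(predicted: float, actual: float) -> float:
    """Fractional gap between a forecast and what was later measured."""
    return abs(predicted - actual) / actual


if __name__ == "__main__":
    records = [
        Measurement("iperf", "hostA", "hostB", 1.0, "bandwidth_mbps", 410.0),
        Measurement("pipechar", "hostA", "hostB", 2.0, "bandwidth_mbps", 390.0),
        Measurement("iperf", "hostA", "hostB", 3.0, "bandwidth_mbps", 450.0),
    ]
    predicted = ewma_forecast(history(records, "hostA", "hostB", "bandwidth_mbps"))
    print(f"forecast {predicted:.1f} Mb/s, "
          f"error vs. a 430 Mb/s transfer: {relative_error(predicted, 430.0):.1%}")
```

Keeping every tool's output in one record type is what makes the later steps possible: the same history feeds short-term tuning decisions and the archived comparison of predicted against actual performance.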

  11. Net100 interactions
      • Net100 is both a producer and consumer of network performance data
        • Active probes (Claffy Bandwidth Estimation, INCITE)
        • Passive sensors (LBL Network monitoring)
      • Auto-tuning
        • TCP optimizations (Feng/LANL, Linux 2.4)
        • smart transfer (IQecho, Logistical networking)
        • non-TCP protocols (DCP, STP, SCTP, rate-based, ?)
      • Net100 tuning could be applied to distributed applications
        • Climate/Probe, SuperNova, DataGrids
        • interact with Grid metaware (forecasting, scheduling, tuning)
      http://www.net100.org
