Develop a "single shot" diagnostic tool that measures performance to users' desktops, analyzes the connection, and identifies network problems. The tool can pinpoint performance bottlenecks, provide hard evidence to reduce finger-pointing, and detect duplex mismatches and faulty hardware or links. Installation instructions and current deployments are provided. Results cover faulty hardware identification, a new link detection algorithm based on Packet-Pair timing and statistical analysis, cases where the Mathis et al. formula fails, and usage statistics.
Developing the Web100 Based Network Diagnostic Tool (NDT) E2EpiPEs/Web100 Joint Session April 9, 2002 by Rich Carlson Argonne National Laboratory
Motivation for work • Develop “single shot” diagnostic tool that doesn’t use historical data • Measure performance to users’ desktops • Combine numerous Web100 variables to analyze connection • Develop network signatures for ‘typical’ network problems
NDT Benefits • End-user based view of network • Can be used to identify performance bottlenecks (could be host problem) • Provides some ‘hard evidence’ to users and network administrators to reduce finger pointing • Doesn’t rely on historical data
Network Signatures • Duplex Mismatch Detection • Good results in Campus environment • Faulty Hardware/Link • Few reports, needs more work
Network Signatures • Bottleneck Link Type • New detection algorithm being developed • Link Duplex setting • Needs more work • Normal Congestion • Needs more work
Current Deployment • 3 servers at ANL • Miranda: externally visible • Ophelia, Cordelia: ANL internal only • Non-ANL Servers • Swiss Education and Research Network (SWITCH) • University of Michigan - Flint, MI • University of California - Santa Cruz, CA • Rochester Institute of Technology - Rochester, NY • StarLight peering point (coming soon)
Availability • Tools available via anonymous ftp from: achilles.ctd.anl.gov/pub/web100 directory • Contains source code and executables • Email discussion list <ndt@anl.gov> • Majordomo list <majordomo@achilles.ctd.anl.gov> • subscribe ndt
Installation and Configuration • Download and Build Web100 kernel/lib • grab base kernel from ftp.kernel.org • apply web100 patch • run favorite ‘kernel config’ command • enable experimental code • enable web100 specific code • make and install web100lib{.a|.so} • reboot and you’re ready to rock & roll
Installation and Configuration • Download web100-tools.tar{.gz} from ANL anonymous FTP server (achilles.ctd.anl.gov) • decide whether to run the pre-compiled binaries or ‘make’ your own • grab the Java SDK from Sun for the javac compiler • ensure web100srv program can access web100lib routines • change LD_LIBRARY_PATH environment variable • edit /etc/ld.so.conf and add /usr/local/lib, run ldconfig • start fakewww & web100srv programs and you’re off to the races (start.ndt script provided)
Results and Observations • Faulty Hardware identification • New Link Detection algorithm & preliminary results • Mathis et al. formula fails • Usage statistics • Demo
Effect of Faulty HW & Congestion
Link          Condition     Speed   Ave Rtt   %loss   loss/sec
100 Mbps FD   Good          94.09     5.41    0.00      0.03
100 Mbps FD   Bad NIC       22.50     1.38    0.78     15.11
100 Mbps FD   Bad reverse   82.66     6.16    0.00      0.03
100 Mbps FD   Congestion    33.61    14.82    0.00      0.10
10 Mbps       Good           6.99    72.80    0.01      0.03
10 Mbps       Bad NIC        7.15     8.84    0.75      4.65
New Link Detection Algorithm • Uses Packet-Pair timing • Small Libpcap program captures data • Timing taken for each transmit/receive pair • Results quantized into unique bins • Statistical analysis on resulting bin counts • Will compare results with Paxson’s “Receiver-Side Estimation Algorithm”
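The binning step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the NDT implementation: the bin edges and labels in `LINK_BINS`, the function names, and the use of the most-populated bin as the "statistical analysis" are all hypothetical, since the slides do not give those details. Capture of the timings (done by the libpcap program in NDT) is left out; the sketch starts from (packet size, inter-packet gap) samples.

```python
from collections import Counter

# Hypothetical bin edges (upper bound in Mbps, label); the slides do not
# list the real bins. Edges sit above nominal rates to leave headroom.
LINK_BINS = [
    (0.1, "dial-up modem"),
    (5.0, "cable/DSL"),
    (12.0, "10 Mbps Ethernet"),
    (50.0, "45 Mbps T3"),
    (120.0, "100 Mbps Ethernet"),
    (700.0, "622 Mbps OC-12"),
    (1200.0, "Gigabit Ethernet"),
]
UNKNOWN = "faster than largest bin"


def pair_rate_mbps(size_bytes, gap_seconds):
    """Rate implied by one packet pair: packet bits divided by the gap."""
    return size_bytes * 8 / gap_seconds / 1e6


def bin_for_rate(rate_mbps):
    """Quantize a rate into the first bin whose upper edge covers it."""
    for upper_mbps, label in LINK_BINS:
        if rate_mbps <= upper_mbps:
            return label
    return UNKNOWN


def classify_link(pairs):
    """pairs: iterable of (size_bytes, gap_seconds). Returns (label, counts)."""
    counts = Counter(
        bin_for_rate(pair_rate_mbps(size, gap))
        for size, gap in pairs
        if gap > 0  # skip pairs with unusable timestamps
    )
    if not counts:
        return None, counts
    # Stand-in for the statistical analysis: pick the most-populated bin.
    label, _ = counts.most_common(1)[0]
    return label, counts
```

For example, `classify_link([(1500, 0.0012), (1500, 0.0012), (1500, 0.00012)])` puts two samples in the 10 Mbps bin and one in the 100 Mbps bin, so the slower link is reported as the bottleneck.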
Mathis et al. Formula fails • Estimate = (K * MSS) / (RTT * sqrt(loss)) • old-loss = (Retrans - FastRetran) / (DataPktsOut - AckPktsOut) • new-loss = CongestionSignals / PktsOut • Estimate < Measured (K = 1) • old-loss 91/443 (20.54%) • new-loss 35/443 (7.90%) • old agrees with new 26/35 (74.29%)
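The formulas on this slide can be written out as a short Python sketch. The function and parameter names are illustrative only; they mirror the variable names on the slide but are not the Web100 library API. The conversion to bits per second (MSS in bytes times 8) and RTT in seconds are assumptions, since the slide does not state units.

```python
import math


def mathis_estimate_bps(mss_bytes, rtt_seconds, loss, k=1.0):
    """Estimate = (K * MSS) / (RTT * sqrt(loss)), returned in bits/s."""
    if rtt_seconds <= 0 or loss <= 0:
        raise ValueError("RTT and loss must be positive")
    return k * mss_bytes * 8 / (rtt_seconds * math.sqrt(loss))


def old_loss(retrans, fast_retran, data_pkts_out, ack_pkts_out):
    """old-loss = (Retrans - FastRetran) / (DataPktsOut - AckPktsOut)."""
    return (retrans - fast_retran) / (data_pkts_out - ack_pkts_out)


def new_loss(congestion_signals, pkts_out):
    """new-loss = CongestionSignals / PktsOut."""
    return congestion_signals / pkts_out


def estimate_below_measured(measured_bps, mss_bytes, rtt_seconds, loss, k=1.0):
    """True when the estimate falls below the measured throughput."""
    return mathis_estimate_bps(mss_bytes, rtt_seconds, loss, k) < measured_bps
```

With made-up inputs of a 1460-byte MSS, a 10 ms RTT, and a loss rate of 0.0001, the estimate with K = 1 comes to roughly 116.8 Mbps; a measured rate above that would count as an "Estimate < Measured" case.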
Demo http://miranda.ctd.anl.gov:7123
Disclosure/Disclaimer • This work was supported (in part) by the Office of Science, U.S. Department of Energy under Contract W-31-109-ENG-38 • Packet-Pair work was supported by the Cisco University Research Program Work-for-Others Contract P-03008