IEPM-BW Deployment Experiences

IEPM-BW Deployment Experiences Connie Logg SLAC Joint Techs Workshop February 4-9, 2006

Background • Originally conceived September 13, 2001 and developed as an exhibit for SC2001 in November 2001 on Solaris • Looked to be useful so development continued • After SC2001, it was installed on a Solaris host shared with other applications. Other applications interfered. • Original configuration file was a set of perl commands which defined the nodes, their configuration information, and the probes and parameters for each node. Very hard to understand, maintain, modify, and manage. • Quick port to Linux and moved to its own host. Still used perl commands as configuration database.

Background - continued • As the development proceeded, it became obvious that the configuration information for nodes and probes was no longer manageable. Enter the MySQL database. • Whole package was redesigned around the MySQL database • Node specifications, monitoring host specifications, probe specifications, plot specifications, path specifications, and data are all maintained in the MySQL database. Much more readable (web pages display the contents of the monitoring configuration information), manageable, and adaptable to changing needs and specifications

Conceptual Changes/Challenges The conception of what IEPM-BW should do and which probes it should use has changed over time. • Monitor with ping, traceroute, abwe • Added iperf • Added file transfers to the tests: bbcp, bbftp, gridftp – Discontinued because: • Performance tracked iperf • Disk speed is the overriding factor in throughput • Monitoring and target hosts not likely to be equipped with high speed disks • Disk latency studies are important, but should not be part of IEPM-BW • Added Pathload for available bandwidth measurements • Removed Pathload (suggestion that it was too intense) • Added Pathchirp • Added Thrulay to compare with Iperf • Added Pathload back in per suggestions from collaborators at Ultralight Meeting

Currently • Removed abwe as it was too noisy and did not work well on gigabit networks. • Evaluating Pathload vs Pathchirp and may remove Pathchirp…more likely will not run it to all nodes, just the ones for which it works well. • May have to use different types of probes for different types of networks and distances between nodes – ONE SIZE DOES NOT FIT ALL. All probes, presentation, and analysis are evolving as we understand more about the networking environments…which are themselves evolving.

Analysis and Presentation Analysis and data presentation ideas change • Timeseries plots were the first plots we had • Of interest from the plot: • Pathchirp not very good in some cases – reports > 1Gb throughput • Pathload more stable and probably more accurate • RTT change in ping very clear – it seems to have no effect on throughput in this case, but does in others – note that it correlated with a traceroute change

Analysis and Presentation • Added diurnal analysis to see how time-of-day patterns might be useful in event detection (bandwidth change) and possibly prediction
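A minimal sketch of one way to build a diurnal profile, assuming measurements arrive as (epoch seconds, value) pairs. This is an illustration in Python, not the IEPM-BW code; the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

def hourly_profile(measurements):
    """Map hour of day (0-23, UTC) to the mean of the values seen in that hour."""
    buckets = defaultdict(list)
    for epoch_s, value in measurements:
        hour = datetime.fromtimestamp(epoch_s, tz=timezone.utc).hour
        buckets[hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}

# Hypothetical samples: two at 00:00 UTC on consecutive days, one at 14:00 UTC
profile = hourly_profile([(0, 100), (86400, 200), (14 * 3600, 50)])
print(profile)
```

A new measurement can then be compared with the typical value for its own hour rather than with a single global average, which is one plausible way to keep normal daily dips from looking like events.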

Analysis and Presentation • Scatterplots – useful for looking at correlations • Cross-plots: pathchirp and iperf on the Y-axis vs thrulay on the X-axis

Analysis and Presentation • Added histograms to provide frequency distribution and CDF • Shows possible multimodal distribution of achievable throughput measurements via thrulay • But available bandwidth for the same node (by pathchirp) is stable
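The frequency distribution and CDF can be sketched in a few lines. The following Python is an illustration only (IEPM-BW itself uses Perl and gnuplot); the bin width and the sample throughput values are made-up assumptions chosen to show two modes.

```python
from collections import Counter

def histogram_and_cdf(samples, bin_width):
    """Return a list of (bin_start, count, cumulative_fraction) tuples."""
    if not samples:
        return []
    counts = Counter(int(s // bin_width) for s in samples)
    total = len(samples)
    rows = []
    running = 0
    for b in sorted(counts):
        running += counts[b]
        rows.append((b * bin_width, counts[b], running / total))
    return rows

# Hypothetical throughput samples (Mbit/s) forming two clusters
samples = [310, 325, 330, 340, 610, 620, 635, 640, 645, 650]
for start, count, cdf in histogram_and_cdf(samples, 100):
    print(f"{start:5d} {'#' * count:<8} {cdf:.2f}")
```

Empty bins between the two clusters are simply absent from the result, which is what makes a multimodal distribution stand out in this form.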

Analysis and Presentation • Packet Loss

Traceroute Visualization • One compact page per day • One row per host, one column per hour • One character per traceroute to indicate pathology or change (usually a period (.) = no change) • Identify unique routes with a number • Be able to inspect the route associated with a route number • Provide for analysis of long term route evolutions • Figure annotations: route # at start of day gives an idea of route stability; multiple route changes (due to GEANT), later restored to original route; period (.) means no change
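The row layout described above can be sketched as follows. This is a simplified Python illustration, not the IEPM-BW implementation: it only distinguishes "same route" from "different route" and ignores the other pathology characters, and the hop addresses are placeholder values.

```python
def route_row(traceroutes, route_ids):
    """Render one host's traceroutes for a day as a compact string.

    traceroutes: list of hop tuples in time order (one per probe).
    route_ids: dict mapping hop tuple -> route number, shared across calls
               so that a route keeps its number once it has been seen.
    """
    cells = []
    previous = None
    for hops in traceroutes:
        if hops not in route_ids:
            route_ids[hops] = len(route_ids) + 1
        number = route_ids[hops]
        # The first probe shows its route number; after that a period
        # means "same route as the previous probe".
        cells.append("." if number == previous else str(number))
        previous = number
    return " ".join(cells)

# Placeholder hop lists (documentation address range)
route_a = ("192.0.2.1", "192.0.2.9")
route_b = ("192.0.2.1", "192.0.2.17", "192.0.2.9")
known_routes = {}
print(route_row([route_a, route_a, route_b, route_b, route_a], known_routes))
```

Passing the same `known_routes` dictionary for every host and day keeps route numbers stable, so a number seen on the page can be looked up to inspect the hop list behind it.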

Event Detection (throughput drops) • Must clearly define what you are looking for • How much change and in what time period • How to determine if it is time to alert again (don’t want repeated alerts for same drop) • Use the above to figure out how often you want to probe. • Do not overprobe…try to establish necessary frequency, and if that does the job, that is enough
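To make the questions above concrete, here is a hedged Python sketch of one possible detector. It is not the IEPM-BW algorithm: the window sizes and the 30% drop fraction are arbitrary assumptions. It compares a recent window against a baseline window, alerts once per drop, and re-arms only after the recent values recover.

```python
from statistics import mean

def detect_drops(values, baseline_n=5, recent_n=3, drop_fraction=0.3):
    """Return the indices at which a new throughput drop is first flagged."""
    alerts = []
    baseline = None  # frozen while a drop is in progress
    for i in range(baseline_n + recent_n, len(values) + 1):
        recent = mean(values[i - recent_n:i])
        if baseline is None:
            candidate = mean(values[i - recent_n - baseline_n:i - recent_n])
            if recent < (1 - drop_fraction) * candidate:
                alerts.append(i - 1)   # index of the newest sample
                baseline = candidate   # remember the pre-drop level
        elif recent >= (1 - drop_fraction) * baseline:
            baseline = None            # recovered; allow future alerts
    return alerts

# Hypothetical series: steady, a sustained drop, recovery, a second drop
series = [100] * 8 + [50] * 6 + [100] * 8 + [40] * 4
print(detect_drops(series))
```

Freezing the baseline during a drop is the design choice that prevents repeated alerts for the same event; the probe interval then follows from how quickly `recent_n` samples must accumulate for the alert to be timely.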

Implementation Challenges • Functions such as ping require different options and parsing on different OSs. • When upgrading versions of the probe software, processing code may need to be modified because of output format changes. • Must upgrade not only the probe software on the monitoring host but also the server versions on the target hosts • Tracking what is working and what is not, and troubleshooting when code performance changes for the worse. • Automating distribution and maintenance • Which versions of gnuplot and drivers, MySQL and perl are available? Do they meet our needs? • Keeping the servers alive (target kit) • Monitoring and target hosts losing disks or having the OSs upgraded. • Maintaining proper TCP buffer sizes

Implementation Challenges • Many probes have to be run one at a time rather than concurrently. Do not run iperf, thrulay, and pathload at the same time. • Do not want to overload the network with probing activities – this constrains the number and frequency of probes that can be made • Currently high impact probes are short (20 seconds or less) and the code allows at most one probe to run within a minute. • If a process (probe, script, gnuplot, etc.) cannot hang…it will hang – Time everything out and watch for hangs so they can be automatically cleaned up.
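"Time everything out" can be sketched with a small wrapper. The Python below is an illustration under assumptions (the function name and status strings are hypothetical, and IEPM-BW itself is written in Perl); it relies on the standard `subprocess.run` timeout, which kills the child process when the limit is exceeded.

```python
import subprocess
import sys

def run_probe(argv, timeout_s):
    """Run a command, killing it if it exceeds timeout_s seconds.

    Returns (status, output) where status is "ok", "failed" or "timeout".
    """
    try:
        done = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    return ("ok" if done.returncode == 0 else "failed"), done.stdout

# Stand-in for a real probe: a child Python process that prints one line
status, output = run_probe([sys.executable, "-c", "print('probe done')"], 10)
print(status, output.strip())
```

Note that the timeout only kills the direct child; anything that child spawned may linger, which is one reason a separate cleanup pass for long-running processes is still worth having.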

Current Implementation MySQL tables for all configuration information • NODES – contains node definitions and path information for that node; all nodes, target and monitoring hosts are defined in this table • MONHOST – monitoring host specific information and plotting spec for all the data • TOOLSPECS – specification for each probe as well as a plotting spec for the data and a ‘last run’ field. • PLOTSPECS – miscellaneous plotting specifications (scatterplots, timeseries plots, other plot types)

Current Implementation MySQL tables for data storage • ABWEDATA – being discontinued (first data table) • BWDATA – all bandwidth data is stored here; contains fields for: • RTT min, max, average, standard deviation • Throughput min, max, average, standard deviation, and final throughput • Number of streams, window size • Text results from probe • Time of probe. Not all fields are used for all data types

Current Implementation Tables for Traceroute data • ROUTENO – each route seen is given a unique identifier (routeno), and the row contains srcnode, destnode, firstseen, lastseen, ip hop list • ROUTEDATA – routeno (from ROUTENO table), text of traceroute output, number of hops, ip hop list, time of probe • Historical route data may be interesting to analyze for route changes over time, but no one has had the time or interest to do it. • NEW Coming Soon: ASN tables to store ASN info for hops – this is useful as it speeds up interactive drawing and display, and analysis of the traceroutes

Current Implementation • SCHEDULE table holds the scheduling information for each probe, and tracks what state it is in. • Each and every probe made (including ping) has a unique schedule ID which identifies the probe and all the parameters of the probe. • Scheduler checks the TOOLSPECS table to ascertain what probes are due to be run and inserts them in the SCHEDULE table. • Scheduled probes are only run if they are within the “current” time period. This prevents a large number of probes from being stacked up and flooding the network for a long time.
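The scheduling flow above can be sketched with plain in-memory structures standing in for the MySQL tables. Everything here is an assumption for illustration: the field names (`interval_s`, `due`, `state`), the "expired" state, and the 60-second window are hypothetical and not taken from the IEPM-BW schema; only the 'last run' idea comes from the TOOLSPECS description.

```python
def schedule_due_probes(toolspecs, schedule, now):
    """Append a SCHEDULE entry for every probe whose interval has elapsed."""
    for spec in toolspecs:
        if now - spec["last_run"] >= spec["interval_s"]:
            schedule.append({
                "id": len(schedule) + 1,   # unique schedule ID
                "tool": spec["tool"],
                "target": spec["target"],
                "due": now,
                "state": "pending",
            })
            spec["last_run"] = now

def runnable(schedule, now, window_s):
    """Return pending entries still inside the current window; expire the rest."""
    ready = []
    for entry in schedule:
        if entry["state"] != "pending":
            continue
        if now - entry["due"] <= window_s:
            ready.append(entry)
        else:
            entry["state"] = "expired"  # stale: skip rather than stack up
    return ready

# Hypothetical probe spec: iperf to one target once an hour
toolspecs = [{"tool": "iperf", "target": "node1.example.org",
              "interval_s": 3600, "last_run": 0}]
schedule = []
schedule_due_probes(toolspecs, schedule, now=3600)
print([e["tool"] for e in runnable(schedule, now=3630, window_s=60)])
```

Marking stale entries as expired instead of running them late is what keeps a backlog from flooding the network once the monitoring host catches up.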

Troubleshooting • Every script has a log file where it records errors and performance information such as how long it took to make a pass. These log files are rotated nightly and kept for 7 days (easily changed) • Hanging processes (probes) are a fact of life. • Time out all probes • Create a cleanup script that looks for processes which have been active longer than they should be and kills them
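A cleanup pass of the kind described above might look like the following Python sketch. The per-command limits are hypothetical, and the `ps` invocation assumes a Linux procps-style `ps` that supports the `etimes` (elapsed seconds) column; other systems would need a different listing command.

```python
import os
import signal
import subprocess

# Hypothetical per-command limits in seconds; not IEPM-BW's actual values.
LIMITS = {"iperf": 60, "pathload": 120, "gnuplot": 300}

def find_overdue(ps_text, limits):
    """Return [(pid, command, elapsed_s)] for processes over their limit.

    ps_text holds one "PID ELAPSED_SECONDS COMMAND" triple per line.
    """
    overdue = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip the header and any malformed lines
        pid, elapsed, command = int(parts[0]), int(parts[1]), parts[2].strip()
        if command in limits and elapsed > limits[command]:
            overdue.append((pid, command, elapsed))
    return overdue

def cleanup(limits=LIMITS):
    """Kill overdue processes (assumes ps supports the etimes column)."""
    listing = subprocess.run(["ps", "-eo", "pid,etimes,comm"],
                             capture_output=True, text=True).stdout
    for pid, command, elapsed in find_overdue(listing, limits):
        print(f"killing {command} pid {pid} after {elapsed}s")
        os.kill(pid, signal.SIGKILL)

# Parsing demonstrated on a made-up listing; cleanup() is not called here.
sample = "  PID ELAPSED COMMAND\n  101      30 iperf\n  102     400 iperf\n  103     500 sshd\n"
print(find_overdue(sample, LIMITS))
```

Keeping the parsing separate from the killing makes it easy to run the script in a report-only mode first and to log what would have been killed.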

Troubleshooting • Lingering tasks report – A report showing scheduled probes that were not run is generated every day. This is important: if many probes are not being run on time, it may mean that too many are being scheduled or that there is a performance problem. • Logging Report – A report showing the number of successful probes made, database write failures, and other failure modes is generated. The info for this report is taken from the data logging log files.

Troubleshooting • NETFLOW records are a valuable tool • Code had been running fine for years • TCP orphan socket messages appeared and the machine crashed • Netflow records for some 20 second iperf probes were lasting for > 1 minute (some 4 minutes) • Change in behavior from the past, when they lasted 20-25 seconds • Disabled iperf probes and system stabilized • Now need to figure out what goes on with iperf probes…not all troublesome, just a few nodes

Performance Issues When probes show degradation in network performance • Is it the network? • Is it the monitoring node? – very bad experience with a JAVA program • Is it the target node? Recommendation: • Have a local target host as a sanity check – also good to use as a target host from other monitoring hosts • The monitoring hosts should be dedicated systems • Monitor monitoring host load with Lisa, Ganglia, Nagios, APmon to MonALISA, etc.

Performance Issue Example – Bad JAVA Program • Caltech monitoring host as seen from iepm-bw@slac • CALTECH target host as seen from iepm-bw@slac • SLAC target host as seen from iepm-bw@slac

Problems • Node name disappears from DNS • Ports suddenly get blocked • Disks crash (lost the entire CALTECH database – backup was on same physical disk) – need separate physical disk for local backup • Monitoring and target hosts get OS upgrades without warning • Installed code disappears • Databases get zapped. We are now working on backing up databases and source code configuration information to SLAC once a day. • Utility packages (gnuplot, for example) get silently upgraded. There has been discussion about distributing our own.

Future Directions • Automate installation and configuration process • Manage code with CVS and distribute via pacman cache • Deploy IEPM-BW for LHC monitoring – see if it is useful and/or relevant – if so, it can be expanded and developed to meet changing needs • Upload monitoring data and alerts to MonALISA • Implement OWAMP and BWCTL • Look at Pathneck • Implement min and max (maybe also average) RTT analysis and integrate it with other change analysis

Summary • Are you monitoring to determine problems or monitoring for forecasting? They are very different but can both be done with the same monitoring • With respect to real disk to disk transfers – the disk latency is the overwhelming factor. The monitoring can tell you how the network is performing, but this is not necessarily related to application performance. • Bearing this in mind, I do not think we need to perform disk to disk transfers or intensive network testing with the monitoring systems • Be prepared to be flexible in your architecture. Networks themselves are constantly evolving and so the probes, analysis, and presentation must also evolve. What would I have done differently along the way? In hindsight, not a lot. It has been a constant process of learning. The code adapted fairly well to the research we needed to do – remember it started as an exhibit for SC2001 and has been a research and learning tool since then. More manpower would have been very useful; if it had been available, the code, package structure, and documentation would be more professional, and the change analysis and prediction/forecasting would be more complete.

References: • http://www-iepm.slac.stanford.edu/ • http://www.slac.stanford.edu/comp/net/iepm-bw.slac.stanford.edu/slac_wan_bw_tests.html • Papers/web pages on web100, netflow, and active measurement correlation: • http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-9641.pdf • http://www.slac.stanford.edu/comp/net/bandwidth-tests/web100/ • Recommended monitoring and target host configurations • IEPM-BW Installation and PLM (being updated and reorganized) Contributors: Les Cottrell, Jerrod Williams, Mahesh Chhaparia, I-Heng Mei, Manish Bhargava, Jiri Navratil, Yee Ting-Li, all at SLAC now or in the past; Maxim Grigoriev (FNAL); and developers of the probes we use. QUESTIONS?

Extra Slides

Installation Challenges - perl • Perl is not located in the same place on all machines. • Tried to settle on /usr/bin/perl • Needed to install various perl modules • Some conflict with already installed modules. Haven’t done this, but one possibility is to have a private version of perl which has been configured for this application.

Installation Challenges - MySQL • Is MySQL already installed? • What version? 3.x.y is not compatible with 4.x.y and 5.x.y – may have to install it • Used RPMs. Some releases of RPMs were buggy (5.0.16 vs 5.0.18). Install from source? • Need to install perl bundle for MySQL. Where to install it (/usr/local/bin/perl or /usr/bin/perl or ?) • Could not install it in /usr/local/bin/perl at SLAC because it had an old version and was in AFS. • Selected /usr/bin/perl. • Installation of MySQL and perl drivers requires attention to the details and order is important
