
Netbench: Testing network devices with real-life traffic patterns


Presentation Transcript


  1. Netbench: Testing network devices with real-life traffic patterns. HEPiX Fall/Autumn 2017 Workshop, stefan.stancu@cern.ch

  2. Outline • The problem: evaluate network devices • Test approaches • Netbench • Design • Statistics visualization • Sample results

  3. The problem. [Diagram: north-south and east-west traffic paths through the network.]
  • Network performance is key: datacentre, campus, etc.
  • Selecting new devices → evaluation is crucial: required features, required performance, traffic patterns

  4. Luxurious approach. [Diagram: a tester port on every DUT port, driving partial-mesh and full-mesh traffic. Tst = Tester, DUT = Device Under Test.]
  • Use one tester port for each device port → cost explosion (tester port $$$): N tester ports (tens or a few hundred)
  • Exercises all ports: line-rate, packet-size scan
  • Forwarding of traffic with complex distributions [3]: partial mesh, full mesh
  • Buffering only lightly exercised: RFC tests [1][2] create synthetic, reproducible traffic patterns, with synchronized traffic generation and minimal congestion
  • Such tests are rarely performed and are likely biased: manufacturers rent the equipment, and third parties test at the manufacturers' request

  5. Typical approach (affordable) – snake test. [Diagram: traffic looped through all DUT ports along a sequential or a random path. Tst = Tester, DUT = Device Under Test.]
  • Snake test: use 2 tester ports and loop the traffic back through every DUT port
  • Contained cost (tester port $$$): only 2 tester ports
  • Exercises all ports: line-rate, packet-size scan
  • Forwarding only on a simple linear path: a sequential path is easy to predict and optimize for, a random path is "impossible" to predict
  • Buffering is not exercised: no congestion on a linear path

  6. Netbench. [Diagram: commodity servers attached to the DUT ports. Tst = Tester, DUT = Device Under Test.]
  • Use commodity servers and NICs; orchestrate TCP flows (e.g. iperf3 [4]), as in the sketch below
  • Manageable cost: N server NIC ports (tens or a few hundred), and the servers can be time-shared
  • Exercises all ports, mostly with maximum-size packets, similar to real-life traffic
  • Forwarding of traffic with complex distributions [3]: partial mesh, full mesh
  • Buffering exercised: multiple TCP flows, as in real-life traffic, with congestion from competing TCP flows
  • A reasonably sized testbed becomes affordable
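For illustration only, here is a minimal Python sketch of the idea on this slide: fan TCP flows out across commodity servers with iperf3. The host names, ports and ssh transport are assumptions made for the sketch (Netbench itself uses the central/agent design on the next slide), and each destination is assumed to already run one `iperf3 -s -p <port>` per incoming flow, since an iperf3 server handles a single test at a time.

```python
"""Sketch: drive a full mesh of iperf3 TCP flows between test servers."""
import itertools
import subprocess

HOSTS = ["srv01", "srv02", "srv03", "srv04"]  # hypothetical server names
BASE_PORT = 5201                              # iperf3's default port
DURATION = 60                                 # seconds per flow

procs = []
for i, (src, dst) in enumerate(itertools.permutations(HOSTS, 2)):
    # One TCP flow per ordered (src, dst) pair -> a full mesh through the DUT.
    # A dedicated server port per pair, since one iperf3 server = one test.
    cmd = ["ssh", src, "iperf3", "-c", dst, "-p", str(BASE_PORT + i),
           "-t", str(DURATION), "--json"]
    procs.append((src, dst, subprocess.Popen(cmd, stdout=subprocess.PIPE)))

# Collect the per-flow JSON statistics once all flows have finished.
for src, dst, p in procs:
    out, _ = p.communicate()
    print(f"{src} -> {dst}: {len(out)} bytes of iperf3 JSON")
```

Capping the TCP window, as on slide 16, would only add `-w 64k` to the client command line.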

  7. Netbench design. [Diagram: a central orchestrator controls per-server agents over XML-RPC; the agents run iperf3 flows through the DUT, and traffic statistics are exposed via a web server (REST).]
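A hedged sketch of the agent half of this design, assuming the XML-RPC control channel from the diagram; the method name, port and parameters are invented for illustration:

```python
"""Sketch of a Netbench-style agent: exposes iperf3 over XML-RPC."""
import subprocess
from xmlrpc.server import SimpleXMLRPCServer

def start_flow(dst, seconds=60, window=None):
    """Run one iperf3 client flow towards dst and return its JSON stats."""
    cmd = ["iperf3", "-c", dst, "-t", str(seconds), "--json"]
    if window:                     # e.g. "64k" to cap the TCP window
        cmd += ["-w", window]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
server.register_function(start_flow)
server.serve_forever()
```

The central orchestrator would then call `start_flow` on every agent (e.g. via `xmlrpc.client.ServerProxy`) and aggregate the returned statistics behind the REST web server.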

  8. Graphs/plots – expected flows
  • Plot the difference between expected and observed flows
  • Goal: flat at 0 (i.e. all flows have started correctly) [Chart: netbench TCP fairness testing]
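A small sketch of this check, assuming the orchestrator records which (src, dst) flows it launched and which ones reported statistics back; the node names and the missing flow are fabricated:

```python
"""Sketch: diff the expected flow set against the flows actually seen."""
from itertools import permutations

nodes = [f"n{i:02d}" for i in range(8)]        # hypothetical node names
expected = set(permutations(nodes, 2))         # full mesh of ordered pairs
seen = set(expected)
seen.discard(("n03", "n05"))                   # pretend one flow never started

missing = expected - seen
print(f"expected={len(expected)} seen={len(seen)} diff={len(missing)}")
for src, dst in sorted(missing):               # goal: nothing printed (flat 0)
    print(f"  missing flow {src} -> {dst}")
```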

  9. Graphs/plots – per-node BW
  • Plot the per-node overall TX/RX bandwidth
  • Goal: flat on all nodes (fair treatment of all the nodes)

  10. Graphs/plots – per-node flow BW
  • Plot the per-node average flow bandwidth (and stdev)
  • Goal: flat on all nodes, with small stdev

  11. Graphs/plots – per-pair flow BW
  • Plot the per-pair average flow bandwidth
  • Goal: a flat (same-colour) image, with no hot/cold spots (a plotting sketch follows below)
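A rough plotting sketch covering the per-node and per-pair views from the last two slides, with random numbers standing in for real measurements (the node count and bandwidths are made up):

```python
"""Sketch: per-node flow BW (mean and stdev) and per-pair BW heatmap."""
import numpy as np
import matplotlib.pyplot as plt

nodes = [f"n{i:02d}" for i in range(8)]            # hypothetical node names
rng = np.random.default_rng(0)
# Per-pair average flow bandwidth matrix (Gb/s), src x dst; diagonal unused.
bw = rng.normal(1.0, 0.05, size=(len(nodes), len(nodes)))
np.fill_diagonal(bw, np.nan)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Per-node average flow bandwidth with stdev: goal is flat, small error bars.
ax1.errorbar(range(len(nodes)), np.nanmean(bw, axis=1),
             yerr=np.nanstd(bw, axis=1), fmt="o")
ax1.set_xticks(range(len(nodes)))
ax1.set_xticklabels(nodes)
ax1.set_ylabel("avg flow BW per node (Gb/s)")

# Per-pair matrix: goal is a flat, single-colour image (no hot/cold spots).
im = ax2.imshow(bw)
ax2.set_xlabel("dst node")
ax2.set_ylabel("src node")
fig.colorbar(im, ax=ax2, label="avg flow BW (Gb/s)")
plt.tight_layout()
plt.show()
```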

  12. Sample results
  • CERN has tendered (RFP) for datacentre routers
  • TCP flow fairness evaluation: Netbench with 64x 40G NICs

  13. Free-running TCP (1)
  • Per-node BW (Tx/Rx): nice and flat on all nodes ☺

  14. Free-running TCP (2). [Chart: per-node flow BW grouped by line card; groups A, B and C across line cards LC1 and LC2.]
  • Groups A and C: all ports used, small stdev
  • Group B: only 8/12 ports used, bigger stdev

  15. Free-running TCP (3)
  • Clear pattern in the unfairness
  • RTT latency (ping): regions of ~42 ms, regions of ~35 ms, and regions of 0.2–25 ms
  • TCP streams compete freely → device buffers are fully exercised
  • Most ports are fully congested; TCP needs to see drops in order to back off
  • A good measure of the DUT's buffers!

  16. Capped TCP window* (1)
  • Per-node BW (Tx/Rx): nice and flat on all nodes for all tests ☺
  * Capped TCP window: iperf3 -w 64k

  17. Capped TCP window (2). [Chart: per-node flow BW for groups A, B and C.]
  • Uniform distribution
  • Small stdev

  18. Capped TCP window (3)
  • All nodes achieve ~line-rate
  • Stable, flat flow distribution: small stdev, latency < 1.1 ms
  • A good measure of TCP flow fairness, when network congestion is controlled

  19. Summary
  • Netbench: affordable, large-scale testing of network devices with traffic patterns that closely resemble real-life conditions
  • Some manufacturers have expressed strong interest in the tool. Anybody else?

  20. References
  [1] S. Bradner and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544.
  [2] R. Mandeville and J. Perser, "Benchmarking Methodology for LAN Switching Devices", RFC 2889.
  [3] R. Mandeville, "Benchmarking Terminology for LAN Switching Devices", RFC 2285.
  [4] iperf3, http://software.es.net/iperf/

  22. Backup material: server tuning for fair flows

  23. iperf and IRQ affinity (1)

  24. iperf and IRQ affinity (2)
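For illustration, a minimal sketch of the IRQ-pinning idea on Linux, assuming the NIC's interrupt lines carry the interface name in /proc/interrupts; the interface name and core list are placeholders, and the script needs root:

```python
"""Sketch: spread a NIC's IRQs over a fixed set of cores for fair flows."""
import re

IFACE = "eth0"        # placeholder interface name: adjust to your NIC
CPUS = [0, 1, 2, 3]   # cores reserved for interrupt handling

def nic_irqs(iface):
    """Return IRQ numbers whose /proc/interrupts line mentions the NIC."""
    irqs = []
    with open("/proc/interrupts") as f:
        for line in f:
            if iface in line:
                m = re.match(r"\s*(\d+):", line)
                if m:
                    irqs.append(int(m.group(1)))
    return irqs

for i, irq in enumerate(nic_irqs(IFACE)):
    cpu = CPUS[i % len(CPUS)]
    # Round-robin the NIC queue interrupts over the chosen cores.
    with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
        f.write(str(cpu))
    print(f"IRQ {irq} -> CPU {cpu}")

# iperf3 itself can then be pinned to the remaining cores with its own
# -A/--affinity option, keeping the traffic generators off the IRQ cores.
```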
