00:00

Efficient Traffic Management for ML Workloads in Datacenter Networks

The paper discusses the importance of network efficiency for AI/ML workloads in datacenter networks, highlighting challenges such as bursty traffic and flow collisions. It explores the need for high bandwidth and low latency for both AI training and inference tasks, emphasizing the impact of network performance on training step time and throughput. The introduction of CSIG (Congestion Signaling) protocol is proposed as a practical and effective in-band signaling solution for congestion control, traffic management, and network debuggability. CSIG offers fixed-length simple summaries of path bottlenecks and is designed for brownfield deployment with backward compatibility and interoperability.

escamochero
Télécharger la présentation

Efficient Traffic Management for ML Workloads in Datacenter Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Congestion Signaling (CSIG) Simple and Effective In-band Network Signals for Efficient Traffic Management in Datacenter Networks Paul Bottorff (HPE) Jai Kumar (Broadcom) IEEE 802 Nendica Meeting November 16, 2023 1

  2. Networking for ML Workloads ● Trend: More accelerators, more bandwidth per accelerator Horizontal scaling → exercises shared data center network fabric ○ ● Network efficiency is critical for AI/ML Bursty traffic, large # of connections, flow collisions Large-scale synchronized bursts stress the network unlike traditional compute / storage applications Training → throughput and latency sensitive, Inference / serving → latency sensitive ○ ○ ○ ○ ● Network performance directly impacts Training step time Barrier collectives in training sensitive to 100p network latency Throughput sensitive where compute/communication overlap. ○ ○ 2

  3. Networking for ML Workloads continued… ❑ 2 or 3 stage CLOS or Dragonfly+ topologies ❑ High bandwidth, high radix, ultra low latency switches • AI Inference: Needs Latency SLA aimed at distributed models • AI Training: Needs High bandwidth, low latency for small models ❑ 256K+ network endpoints ❑ Low tail latency and FCT ❑ High performant transport in tandem with efficient end-to-end congestion control ❑ Network signals for optimal bandwidth utilization, flow ramp up and scheduling 3

  4. Limitations of Existing INT Technologies • PMTU overhead • Increased switch latency, power budget and hardware complexity • Difficulty in working with tunneled domains and complexity in propagating INT metadata stack • Not very well suited for brownfield deployment • Protocols using IPv6 Hop-by-Hop options are processed in slow path • Considerable overhead for small sized packets Enters CSIG 4

  5. CSIG: Practical & Effective In-band Signaling protocol CSIG Reflection ● Provides fixed-length simple summaries from the path bottlenecks Custom Transpor t HW TCP Offloaded Transport ● Designed for Congestion Control, Traffic Management and Network debuggability use-cases UDP ● Designed for brownfield deployment with backward compatibility / interoperability IP ● Link to IETF Draft - https://datatracker.ietf.org/doc/draft-ravi-ippm-csig/ CSIG-tag Ethernet

  6. CSIG Tag 4B Version CSIG 1 | 2 | 3 | 4 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CSIG-ethtype | T |R| Signal | Locator | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 1-4 16 bits 3 bits queue] 1 bit 5 bit 7 bits sender] - CSIG-ethtype: IEEE allocated code point for 4Byte CSIG frame - T: Signal Type (0:minABW, 1: minABW/C, 2:maxDelay) [maxQueueSize/C][sumTrimTxByte per port or per trim - reserved - S: Signal: Bucketed Signal Value (fully configurable) - LM: Locator Metadata of bottleneck link (fully configurable per port information)[Port down notification back to

  7. CSIG Tag 8B Version CSIG 1 | 2 | 3 | 4 5 | 6 | 7 | 8 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CSIG-ethtype | Locator | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | T | Signal Value | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 1-4 16 bits 16 bits - CSIG-ethtype: IEEE allocated code point for 8Byte CSIG frame - LM: Locator Metadata of bottleneck link (fully configurable per port information) Byte 5-8 4 bits 20 bits 8 bits - T: Signal Type (0:minABW, 1: minABW/C, 2:maxDelay) - S: Signal: Quantized Signal Value - Reserved 8B: Fixed step function BW quantization 4B: Fully configurable 32 BW range buckets

  8. Ether Type Layering ARPA dstmac / srcmac / csig-tag / ethertype / payload 802.1q: dstmac / srcmac / vlan-tag / csig-tag / ethertype / payload 802.1ad dstmac / srcmac / vlan-tag / vlan-tag / csig-tag / ethertype / payload 802.1ad tunnel dstmac / srcmac / vlan-tag / vlan-tag / vlan-tag / vlan-tag / csig-tag / ethertype / payload 802.1ae dstmac / srcmac / security-tag / vlan-tag / csig-tag / ethertype / payload Bridge Domain will forward the traffic and “MAY” update the CSIG Tag if bridging device is capable Routed Domain need to understand the new ethtype and router will treat CSIG tag either as opaque tag for forwarding or will update the tag.

  9. Backed By Industry Leaders ✔ Google ✔ Broadcom ✔ Cisco ✔ HPE 9

  10. Thank You 10

More Related