
Sockets Direct Protocol Over InfiniBand

Dror Goldenberg, Senior Architect. Gilad Shainer, Technical Marketing. gdror @ mellanox.co.il. shainer @ mellanox.com.


Presentation Transcript


  1. Sockets Direct Protocol Over InfiniBand Dror Goldenberg, Senior Architect Gilad Shainer, Technical Marketing gdror @ mellanox.co.il shainer @ mellanox.com

  2. Sockets Direct Protocol Over InfiniBand Dror Goldenberg, Senior Architect Gilad Shainer, Technical Marketing gdror @ mellanox.co.il shainer @ mellanox.com

  3. Agenda • Introduction to InfiniBand • Sockets Direct Protocol (SDP) overview • SDP in WinIB stack • SDP performance

  4. Introduction To InfiniBand

  5. Introducing InfiniBand [Fabric diagram: end nodes, I/O nodes, switches, and a router forming a switched fabric] • Standard interconnect • Defined by the InfiniBand Trade Association • Defined to facilitate low-cost and high-performance implementations • From 2.5Gb/s to 120Gb/s • Low latency • Reliability • Scalability • Channel-based I/O • I/O consolidation • Communication, computation, management and storage over a single fabric

  6. InfiniBand Offload Capabilities • Transport offload • Reliable/unreliable • Connection/datagram • RDMA and atomic operations • Direct access to remote node memory • Network-resident semaphores • Kernel bypass • Direct access from application to the HCA hardware
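
Slide 6's kernel-bypass and RDMA items rest on the HCA keeping memory translation and protection tables for buffers an application registers. As a rough illustration of that registration step, here is a minimal sketch using the OpenFabrics libibverbs C API on Linux; the WinIB stack discussed later exposes its own verbs interface, so the calls below stand in for the concept rather than the deck's actual API. The buffer size and access flags are arbitrary.

```c
/* Minimal sketch: registering memory for RDMA access with the OpenFabrics
 * libibverbs C API (illustrative only; the WinIB verbs entry points differ).
 * Build with: gcc rdma_reg.c -libverbs
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no HCA found\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* first HCA */
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;   /* protection domain */
    if (!pd) { fprintf(stderr, "failed to open device / alloc PD\n"); return 1; }

    /* Register a buffer so the HCA can translate and protect it, and a remote
     * peer can target it with RDMA read/write (keys exchanged out of band). */
    size_t len = 64 * 1024;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { fprintf(stderr, "registration failed\n"); return 1; }

    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```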

  7. InfiniBand Today • Clustering and failover • Native InfiniBand storage • Operating systems • InfiniBand landed on motherboard • Switches and infrastructure • InfiniBand blade servers • Servers • Storage • Embedded, communications, military, and industrial (*partial lists)

  8. InfiniBand Roadmap • InfiniBand’s roadmap outpaces other proprietary and standards-based I/O technologies in both pure performance and price/performance

  9. Commodities-Based Supercomputer • Clustering efficiency • High-performance I/O • Scales with CPU performance • Scalability • 1000s and 10,000s of nodes • Price/performance • 20Gb/s fully offloaded • 40Gb/s in H2 2006 • $69 – IC, $125 – adapter • Volume OEM price • Single-port SDR • Industry standard • Horizontal market

  10. Fabric Consolidation • Single fabric fits all • Communication • Computation • Storage • Management • Reduces fabric Total Cost of Ownership (TCO) • Independent traffic virtual lanes • Quality of service support • Logical partitioning • High capacity (BW↑, latency↓) • InfiniBand truly enables fabric consolidation

  11. Data Center Virtualization [Roadmap figure: today, 2006/2007, 2007/2008]

  12. The InfiniBand Fabric • Switch fabric, links from 2.5 to 120Gb/s • HW transport protocol: reliable and unreliable, connected and datagram • Kernel bypass: memory translation and protection tables, memory exposed to remote RDMA-read and RDMA-write • Quality of service: process at the host CPU level, QP at the adapter level, virtual lane at the link level • Scalability and flexibility: up to 48K nodes in a subnet, up to 2^128 in a network • Network partitioning: multiple networks on a single wire • Reliable, lossless, self-managing fabric: end-to-end flow control, link-level flow control, multicast support, congestion management, automatic path migration • I/O virtualization with channel architecture: dedicated services to guest OSes, HW-assisted protection and inter-process isolation, enables I/O consolidation

  13. The Vision • Computing and storage as a utility • Exactly the same as electricity

  14. Sockets Direct Protocol (SDP) Overview

  15. Sockets Direct Protocol Motivation • SDP enables existing socket-based applications to transparently utilize InfiniBand capabilities and achieve superior performance • Better bandwidth • Lower latency • Lower CPU utilization

  16. SDP In The Network Stack [Stack diagram: unmodified applications → WinSock API (user) → SDP (kernel) → Access Layer → Verbs Provider Driver → HCA hardware] • Standardized wire protocol • Interoperable • Transparent • No need for API changes or recompilation • Socket semantics maintained • Leverages InfiniBand capabilities • Transport offload – reliable connection • Zero copy – using RDMA • Kernel bypass* (*implementation dependent)
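
Because SDP keeps socket semantics and needs no API changes or recompilation (slide 16), an ordinary blocking Winsock client like the sketch below would run unchanged whether the connection is carried by TCP or redirected onto InfiniBand by SDP. The peer address and port are placeholders.

```c
/* Unmodified blocking Winsock client: with SDP in the stack, this same code
 * can be carried over InfiniBand with no API changes or recompilation.
 * The peer address and port below are placeholders. */
#include <winsock2.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return 1;

    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);   /* stream socket */

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5001);                             /* placeholder port */
    peer.sin_addr.s_addr = inet_addr("192.168.0.2");         /* placeholder address */

    if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
        const char msg[] = "hello over SDP";
        send(s, msg, (int)sizeof(msg), 0);    /* BCopy or ZCopy underneath */

        char reply[256];
        int n = recv(s, reply, (int)sizeof(reply), 0);
        printf("received %d bytes\n", n);
    }

    closesocket(s);
    WSACleanup();
    return 0;
}
```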

  17. SDP Data Transfer Modes • Buffer copy (BCopy) • Zero copy (ZCopy) • Read ZCopy • Write ZCopy

  18. Buffer Copy (BCopy) [Diagram: the data source copies from the app buffer into SDP private buffers, sends SDP data messages over the offloaded transport, and the data sink copies from its SDP buffers into the app buffer] • Transport offload • SDP stack performs data copy

  19. Read ZCopy [Diagram: the data source sends a SrcAvail message, the data sink issues an RDMA Read of the source app buffer and returns an RdmaRdCompl message] • Transport offloaded by HCA • True zero copy

  20. Write ZCopy [Diagram: the data sink sends a SinkAvail message, the data source issues an RDMA Write into the sink app buffer and returns an RdmaWrCompl message]
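
Slides 19 and 20 name the control messages that bracket each zero-copy transfer. The commented C fragment below only restates that sequence; the message names come from the slides, while the enum values and the field-free form are illustrative and not the SDP wire format.

```c
/* Sketch of the ZCopy control flow described on slides 19-20. Only the
 * message names (SrcAvail, SinkAvail, RdmaRdCompl, RdmaWrCompl) come from
 * the deck; the enum values and comments are illustrative, not the SDP
 * wire format. */
enum sdp_zcopy_msg {
    SDP_SRC_AVAIL,      /* data source advertises a pinned app buffer       */
    SDP_SINK_AVAIL,     /* data sink advertises a pinned app buffer         */
    SDP_RDMA_RD_COMPL,  /* sink tells source the RDMA Read has completed    */
    SDP_RDMA_WR_COMPL   /* source tells sink the RDMA Write has completed   */
};

/* Read ZCopy (slide 19):
 *   source -> sink  : SrcAvail (describes the source app buffer)
 *   sink            : issues RDMA Read directly into its app buffer
 *   sink -> source  : RdmaRdCompl, source may reuse its buffer
 *
 * Write ZCopy (slide 20):
 *   sink -> source  : SinkAvail (describes the sink app buffer)
 *   source          : issues RDMA Write directly from its app buffer
 *   source -> sink  : RdmaWrCompl, sink may consume the data
 */
```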

  21. BCopy And ZCopy Summary

  22. SDP Versus WSD * SDP on Windows XP SP2 and Windows Server 2003 SP1 is supported by Mellanox; it is unsupported by Microsoft

  23. SDP In WinIB Stack

  24. WinIB • Mellanox InfiniBand software stack for Windows • Based on open source development (OpenFabrics) • InfiniBand HCA verbs driver and Access Layer • InfiniBand subnet management • IP over IB (IPoIB) driver • SDP driver • WinSock Direct (WSD) driver • SCSI RDMA Protocol (SRP) driver • Windows Server 2003, Windows Compute Cluster Server 2003, Windows Server “Longhorn”* * WinIB on Windows XP SP2 is supported by Mellanox – it is unsupported by Microsoft

  25. WinIB Software Stack [Stack diagram – user mode: applications, MPI2*, management tools, Winsock socket switch, SDP SPI, WSD SAN provider, WinSock provider, Access Layer library, Verbs Provider library; kernel mode: TCP/UDP/ICMP, IP, NDIS, SDP, StorPort, SRP miniport, IPoIB miniport, Access Layer, Verbs Provider driver (kernel-bypass path); hardware: HCA] * Windows Compute Cluster Server 2003

  26. Current APIs And I/O Supported

  27. SDP In Windows [Stack diagram – user mode: unmodified applications → WinSock API → Winsock socket switch → SDP SPI; kernel mode: SDP, NDIS, IPoIB miniport, Access Layer, Verbs Provider driver; hardware: HCA]

  28. SDP Socket Provider • User-mode library • Implements Winsock Service Provider Interface (SPI) • Supports SOCK_STREAM socket types • WSPxxx function for each socket call • Socket switch implemented in the library • Policy-based selection of SDP versus TCP • SDP calls are redirected to the SDP module (ioctl) • Makes the routing decision and performs ARP
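
Slide 28 describes a Winsock SPI provider whose WSPxxx entry points decide, per connection, between SDP and TCP and hand SDP work to the kernel module through an ioctl. The fragment below is a minimal sketch of that selection logic only: the WSPConnect-style signature is the standard SPI one, but the device path, IOCTL code, policy check, and TCP fallback handling are hypothetical placeholders rather than the WinIB provider's actual code.

```c
/* Sketch of the provider-side selection slide 28 describes: a WSPConnect-style
 * entry point either redirects the operation to the kernel SDP module via an
 * ioctl or falls back to the stock TCP provider. The device path, IOCTL code,
 * and policy function are hypothetical placeholders. */
#include <winsock2.h>
#include <ws2spi.h>
#include <windows.h>

#define IOCTL_SDP_CONNECT 0x12345678            /* hypothetical IOCTL code */

static BOOL PolicySelectsSdp(const struct sockaddr *addr, int len)
{
    /* Placeholder for the policy-based SDP-versus-TCP decision
     * (e.g. per-port or per-subnet rules). */
    (void)addr; (void)len;
    return TRUE;
}

int WSPAPI MyWSPConnect(SOCKET s, const struct sockaddr *name, int namelen,
                        LPWSABUF callerData, LPWSABUF calleeData,
                        LPQOS sQOS, LPQOS gQOS, LPINT lpErrno)
{
    if (PolicySelectsSdp(name, namelen)) {
        /* Redirect to the kernel SDP module (hypothetical device + ioctl). */
        HANDLE sdp = CreateFileW(L"\\\\.\\SdpDevice",
                                 GENERIC_READ | GENERIC_WRITE,
                                 0, NULL, OPEN_EXISTING, 0, NULL);
        DWORD bytes = 0;
        BOOL ok = (sdp != INVALID_HANDLE_VALUE) &&
                  DeviceIoControl(sdp, IOCTL_SDP_CONNECT,
                                  (LPVOID)name, namelen, NULL, 0, &bytes, NULL);
        if (sdp != INVALID_HANDLE_VALUE) CloseHandle(sdp);
        if (ok) return 0;
        *lpErrno = WSAECONNREFUSED;
        return SOCKET_ERROR;
    }
    /* Otherwise fall through to the next (TCP) provider in the chain
     * (chaining logic omitted from this sketch). */
    *lpErrno = WSAEOPNOTSUPP;
    return SOCKET_ERROR;
}
```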

  29. SDP Module • Kernel module • Implemented as a high-level driver • Connection establishment/teardown • Mapping of MAC address to GID through the IPoIB miniport • Path record query • IB Connection Management (CM) • Data transfer mechanism • Operations are implemented as asynchronous

  30. Buffer Copy Implementation • Only asynchronous mode is implemented in the kernel • Synchronous calls are converted into overlapped operations and wait for their completion • SDP private buffers • Mapped through a physical MR • 16KB buffers for send and receive • Data copy performed in • Caller’s context, preferably • Dedicated helper thread per process, otherwise
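
A rough picture of the BCopy send path slides 18 and 30 describe: data is copied, at most 16KB at a time, into pre-registered SDP private buffers and each chunk is posted as a send, in the caller's context when possible or in a per-process helper thread otherwise. Everything named below (the struct, the two helper stubs, the function) is a hypothetical sketch under those assumptions, not the WinIB driver's code.

```c
/* Sketch of the BCopy send path from slides 18 and 30. All names and the
 * two helper stubs are hypothetical illustrations, not driver internals. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SDP_BCOPY_BUF_SIZE (16 * 1024)      /* 16KB private buffers (slide 30) */

struct sdp_buf {
    char data[SDP_BCOPY_BUF_SIZE];          /* stands in for a pre-registered buffer */
};

/* Hypothetical stand-ins for "grab a free private buffer" and "post an
 * InfiniBand send of that buffer on the connection's QP". */
static struct sdp_buf *sdp_get_free_buf(void) { return malloc(sizeof(struct sdp_buf)); }
static int sdp_post_send(struct sdp_buf *b, size_t len)
{
    printf("posted send of %zu bytes\n", len);
    free(b);                                /* a real stack recycles it on completion */
    return 0;
}

/* Copy the caller's data into private buffers, 16KB at a time, and post a
 * send per chunk. */
int sdp_bcopy_send(const char *app_buf, size_t app_len)
{
    size_t off = 0;
    while (off < app_len) {
        size_t chunk = app_len - off;
        if (chunk > SDP_BCOPY_BUF_SIZE)
            chunk = SDP_BCOPY_BUF_SIZE;

        struct sdp_buf *b = sdp_get_free_buf();
        if (!b)
            return -1;                      /* out of buffers/credits */

        memcpy(b->data, app_buf + off, chunk);  /* the data copy of BCopy */
        if (sdp_post_send(b, chunk) != 0)
            return -1;

        off += chunk;
    }
    return 0;
}

int main(void)
{
    char payload[40 * 1024] = {0};          /* 40KB -> three chunks */
    return sdp_bcopy_send(payload, sizeof(payload));
}
```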

  31. Current APIs And I/O Supported • socket, WSASocket • connect, WSAConnect • bind • listen • accept, AcceptEx • close • send, WSASend, recv, WSARecv • Synchronous and overlapped, including IOCompletionPort • getsockname • getpeername • getsockopt, setsockopt – partially • WSAIoctl • Data transfer modes: buffer copy
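
Slide 31 lists overlapped I/O, including I/O completion ports, among the supported operations. The fragment below is the ordinary Win32 completion-port receive pattern, shown only to illustrate what "overlapped including IOCompletionPort" means for an application; the buffer size is arbitrary and the socket is assumed to be an already connected SOCK_STREAM socket.

```c
/* Ordinary overlapped receive with an I/O completion port; this standard
 * Win32 pattern is among the calls slide 31 lists as supported over SDP. */
#include <winsock2.h>
#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

void recv_with_iocp(SOCKET s)   /* s: a connected SOCK_STREAM socket */
{
    /* Associate the socket with a new completion port. */
    HANDLE iocp = CreateIoCompletionPort((HANDLE)s, NULL, (ULONG_PTR)s, 0);

    char data[16 * 1024];
    WSABUF buf;
    buf.len = sizeof(data);
    buf.buf = data;

    WSAOVERLAPPED ov;
    memset(&ov, 0, sizeof(ov));
    DWORD flags = 0;

    /* Post an overlapped receive; completion is reported on the port. */
    if (WSARecv(s, &buf, 1, NULL, &flags, &ov, NULL) == SOCKET_ERROR &&
        WSAGetLastError() != WSA_IO_PENDING) {
        printf("WSARecv failed: %d\n", WSAGetLastError());
        CloseHandle(iocp);
        return;
    }

    DWORD bytes = 0;
    ULONG_PTR key = 0;
    LPOVERLAPPED done = NULL;
    if (GetQueuedCompletionStatus(iocp, &bytes, &key, &done, INFINITE))
        printf("received %lu bytes\n", bytes);

    CloseHandle(iocp);
}
```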

  32. Future Plans • Zero copy • Improve administrable policy (SDP versus TCP) • Performance tuning (latency, bandwidth, CPU%) • Automatic Path Migration • Quality of Service • Additional functionality

  33. SDP Performance

  34. Platform • Hardware: two HP ProLiant DL145 G2 servers, dual AMD Opteron 64-bit, 2.2GHz, 1MB cache, 4GB RAM, NVIDIA nForce 2200 MCP, Mellanox InfiniHost III Ex DDR (FW 4.7.600), InfiniBand 4X DDR 20Gb/s link • Software: prediction for Windows Server “Longhorn”, WinIB 1.3.0 (pre-release) • Benchmarks: bandwidth – nttcp 2.5, latency – netpipe 3.6.2

  35. Bandwidth [Chart: bandwidth (MB/s) versus message size (1B–2MB) for 1 socket and 2 sockets]

  36. Summary Of Results • Latency – 17.90us for 1B message • Bandwidth – 1316 MB/s at 128KB, 2 sockets

  37. Zero Copy • ZCopy addition • Increases BW • Reduces CPU% • Better scalability with number of sockets

  38. Call To Action • Download WinIB • http://www.mellanox.com/products/win_ib.php • http://windows.openib.org/downloads/binaries/ • OpenFabrics InfiniBand Windows drivers development – sign up to contribute • http://windows.openib.org/openib/contribute.aspx

  39. Additional Resources • Web Resources • Specs: http://www.infinibandta.org/specs/ • White Papers: http://www.mellanox.com/support/whitepapers.php • Presentations: http://www.mellanox.com/support/presentations.php • Open Fabrics • http://www.openfabrics.org/ • https://openib.org/tiki/tiki-index.php?page=OpenIB+Windows • http://openib.org/mailman/listinfo/openib-windows • Feedback: Gdror @ mellanox.co.il

  40. © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
