
InfiniBand: Today and Tomorrow


Presentation Transcript


  1. InfiniBand: Today and Tomorrow • Jamie Riotto, Sr. Director of Engineering, Cisco Systems (formerly Topspin Communications) • jriotto@cisco.com

  2. Agenda • InfiniBand Today • State of the market • Cisco and InfiniBand • InfiniBand products available now • Open source initiatives • InfiniBand Tomorrow • Scaling InfiniBand • Future Issues • Q&A

  3. InfiniBand Maturity Milestones • High adoption rates • Currently shipping > 10,000 IB ports / Qtr • Cisco acquisition will drive broader market adoption • End-to-end price points of <$1000. • New Cluster scalability proof-points • 1000 to 4000 nodes

  4. Cisco Adopts InfiniBand • Cisco acquired Topspin on May 16, 2005 • Adds InfiniBand to Switching Portfolio • Network Switches, Storage Switches, now Server Switches • Creates independent Business Unit to promote InfiniBand & Server Virtualization • New Product line of Server Fabric Switches (SFS) • SFS 7000 Series InfiniBand Server Switches • SFS 3000 Series Multifabric Server Switches

  5. Cisco and InfiniBand: The Server Fabric Switch • [Diagram: clients and network resources (Internet, printer, server) attach through the network switch, storage (SAN) through the storage switch, and servers through the server switch, which forms the server fabric]

  6. Cisco HPC Case Studies

  7. Real Deployments Today: Wall Street Bank with 512-Node Grid • Fibre Channel and GigE connectivity built seamlessly into the cluster • Grid I/O: 2 TS-360s with Ethernet and Fibre Channel gateways to the existing SAN/LAN • Core fabric: 2 96-port TS-270s • Edge fabric: 23 24-port TS-120s • 512 server nodes

  8. Tungsten 2: 520-Node Supercomputer (NCSA, National Center for Supercomputing Applications) • Deployed: November 2004 • Core fabric: 6 72-port TS270s, 174 uplink cables • Edge fabric: 29 24-port TS120s, 512 1m cables • 520 dual-CPU nodes (1,040 CPUs) • Parallel MPI codes for commercial clients • Point-to-point 5.2us MPI latency

  9. D.E. Shaw Bioinformatics: 1,066-Node Supercomputer • 1,066-node fully non-blocking, fault-tolerant IB cluster • Fault-tolerant core fabric: 12 96-port TS-270s, 1,068 5m/7m/10m/15m uplink cables • Edge fabric: 89 24-port TS-120s, 1,066 1m cables

  10. Large Government Lab: World's Largest Commodity Server Cluster – 4,096 Nodes • Application: high-performance supercomputing cluster • Environment: 4,096 Dell servers, 50% blocking ratio, 8 TS-740s, 256 TS-120s • Benefits: compelling price/performance; largest cluster ever built (by approx. 2x); expected to be 2nd-largest supercomputer in the world by node count • Core fabric: 8 SFS TS740s, 288 ports each, 2,048 uplinks (7m/10m/15m/20m) • Edge: 256 TS120s, 24 ports each • 8,192-processor, 60 TFlop SuperCluster

  11. InfiniBand Products Available Today

  12. InfiniBand Switches and HCAs • Fully non-blocking switch building blocks available in sizes from 24 up to 288 ports. • Blade servers offer integrated switches and pass-through modules • HCAs available in PCI-X and PCI-Express • IP & Fibre-Channel Gateway Modules

  13. Integrated InfiniBand for Blade Servers: Create a “Wire-Once” Fabric • Integrated 10Gbps InfiniBand switches provide a unified “wire-once” fabric • Optimize density, cooling, space, and cable management • Option of integrated InfiniBand switch (ex: IBM BC) or pass-through module (ex: Dell 1855) • Virtual I/O provides shared Ethernet and Fibre Channel ports across blades and racks • [Diagram: blade chassis with integrated InfiniBand switches, an HCA per blade, 10Gbps and 30Gbps links]

  14. Ethernet and Fibre Channel Gateways: Unified “Wire-Once” Fabric • Single InfiniBand link per server for both storage and network • Fibre Channel-to-InfiniBand gateway for storage (SAN) access • Ethernet-to-InfiniBand gateway for LAN/WAN access

  15. InfiniBand Price / Performance • Myrinet pricing data from the Myricom web site (Dec 2004) • InfiniBand pricing data based on Topspin average sales price (Dec 2004) • Myrinet, GigE, and IB performance data from the June 2004 OSU study • Note: latency figures are MPI processor-to-processor latency; switch latency is lower (a ping-pong measurement sketch follows below)
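For context on how such MPI latency figures are typically measured, here is a minimal ping-pong sketch in the style of the OSU latency benchmark; the message size and iteration count are arbitrary choices, not the study's settings.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, iters = 10000;
    char buf[1] = {0};                       /* 1-byte message: pure latency test */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {                     /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {              /* rank 1 echoes every message back */
            MPI_Recv(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0)                           /* one-way latency = round trip / 2 */
        printf("avg one-way latency: %.2f us\n",
               (t1 - t0) * 1e6 / (2.0 * iters));
    MPI_Finalize();
    return 0;
}
```

Run it with two ranks placed on different nodes (for example, mpirun -np 2 with one rank per host) so the round trip actually crosses the fabric rather than shared memory.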

  16. InfiniBand Cabling • CX4 copper (up to 15m) • Flexible 30-gauge copper (up to 3m) • Fiber optics (up to 150m)

  17. Host Drivers for Standard Protocols • Open source strategy = reliability at low cost • IPoIB: legacy TCP/IP applications • SDP: reliable socket connections (optional RDMA) • MPI: leading edge HPCC applications (RDMA) • SRP: block storage access (RDMA) • uDAPL: User level RDMA
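To illustrate the "legacy TCP/IP applications" point: an ordinary sockets program such as the sketch below runs unmodified over IPoIB simply by connecting to an address configured on the IPoIB interface, and with the commonly used libsdp LD_PRELOAD shim the same binary can be redirected onto SDP without recompilation. The address and port here are placeholders.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Plain TCP socket: nothing here is InfiniBand-specific.  Pointing it at
       an IPoIB interface address carries the traffic over IB; preloading
       libsdp (LD_PRELOAD=libsdp.so) can transparently map it onto SDP. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(5001) };
    inet_pton(AF_INET, "192.168.10.2", &peer.sin_addr);   /* placeholder IPoIB address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) == 0) {
        const char msg[] = "hello over the fabric\n";
        write(fd, msg, sizeof msg - 1);
    } else {
        perror("connect");
    }
    close(fd);
    return 0;
}
```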

  18. OS Support • Operating Systems Available: • Linux (Red Hat, SuSE, Fedora, Debian, etc.) • Windows 2000 and 2003 • HP-UX (Via HP) • Solaris (Via Sun)

  19. The InfiniBand Driver Architecture • [Diagram: applications reach the fabric through BSD sockets (TCP/IP over IPoIB, or SDP), the file system (NFS-RDMA, SCSI block storage over SRP/FCP), and uDAPL/DAT, all layered on the verbs interface of the InfiniBand HCA; InfiniBand switches form the server fabric, with Ethernet and Fibre Channel gateways bridging to the LAN/WAN and SAN]

  20. Open Software Initiatives • OpenIB.org • Topspin is the primary author of major portions, including IPoIB, SDP, SRP, and the TS-API; Cisco will continue to invest • Current protocol development is nearing production-quality code; release expected by end of year • Charter has been expanded to include Windows and iWARP • MPI will be available in the near future (MVAPICH 0.96) • OpenSM • OpenMPI

  21. InfiniBand Tomorrow

  22. Looking into the future • Cost • Speed • Distance Limitations • Cable Management • Scalability • IB and Ethernet

  23. Speed: InfiniBand DDR / QDR, 4X / 12X • DDR available end of 2005 • Doubles the 4X wire speed from 10 Gb/s to 20 Gb/s (see the rate arithmetic below) • PCI-Express DDR • Distances of 5-10m using copper • Distances of 100m using fiber • QDR: availability date still unknown • 12X (30 Gb/s) available for over one year!! • Not interesting until 12X HCAs exist • Not interesting until > 16X PCIe
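For reference, the usual rate arithmetic, assuming the 2.5 Gb/s-per-lane signaling and 8b/10b encoding of the IB spec: a 4X SDR link is 4 x 2.5 Gb/s = 10 Gb/s of signaling (8 Gb/s of data), so 4X DDR doubles that to 20 Gb/s (16 Gb/s of data) and 4X QDR reaches 40 Gb/s (32 Gb/s of data), while 12X SDR is 12 x 2.5 Gb/s = 30 Gb/s, matching the figure on the slide.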

  24. Future InfiniBand Cables • InfiniBand over CAT5 / CAT6 / CAT7 • Shielded cable distances up to ??? • Leverage existing 10-GigE cabling • 10-GigE too expensive?

  25. IB Distance Scaling • IB Short Haul • New copper drivers • 25-50 meters (KeyEye) • 75-100 meters (IEEE 10GE) • IB WAN • Same subnet over distance (300 km target) • Buffer / credit / timeout issues (see the estimate below) • Applications: disaster recovery, data mirroring • IB Long Haul • IB over IP (over SONET?) • Utilizes existing public plant (WDM, debugging, etc.)
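To see why buffer credits become the issue at distance, here is a rough bandwidth-delay estimate for a 4X SDR (10 Gb/s) link over the 300 km WAN target, assuming light propagates at roughly two thirds of c in fiber; the propagation speed is an assumption, not a figure from the slides.

```c
#include <stdio.h>

int main(void) {
    const double rate_bps   = 10e9;     /* 4X SDR signaling rate              */
    const double distance_m = 300e3;    /* 300 km WAN target from the slide   */
    const double c_fiber    = 2.0e8;    /* ~2/3 of c in glass (assumption)    */

    /* One-way propagation delay, and the data in flight that the receiver
       must be able to buffer before link-level credits can be returned.    */
    double delay_s         = distance_m / c_fiber;         /* ~1.5 ms        */
    double in_flight_bytes = rate_bps * delay_s / 8.0;     /* ~1.9 MB        */

    printf("one-way delay: %.2f ms, data in flight: %.1f MB\n",
           delay_s * 1e3, in_flight_bytes / 1e6);
    return 0;
}
```

Ordinary IB switch and HCA buffers are sized for a few meters of cable, so a link carrying nearly 2 MB in flight each way needs far deeper buffers and longer timeouts, which is exactly the buffer / credit / timeout problem the slide names.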

  26. Scaling InfiniBand • Subnet Management • Host-side Drivers • MPI • IPoIB • SRP • Memory Utilization

  27. IB Subnet Manager • Subnets are getting bigger • 4,000 -> 10,000 nodes • Topology convergence times • Topology disturbance times • Topology disturbance minimization

  28. Subnet Management Challenges • Cluster Cold Start times • Template Routing • Persistent Routing • Cluster Topology Change Management • Intentional Change - Maintenance • Unintentional Change – Dealing with Faults • How to impact minimum number of connections • Predetermine fault reaction strategy? • Topology Diagnostic Tools • Link/Route Verification • Built-in BERT testing • Partition Management

  29. Multiple Routing Models • Minimum Latency Routing: • Load-Balanced Shortest-Path Routing • Minimum Contention Routing: • Lowest-Interference Divergent-Path Routing • Template Driven Routing: • Supports Pre-Determined Routing Topology • For example: Clos Routing, Matrix Row/Column, etc • Automatic Cabling Verification for Large Installations

  30. IB Routing Challenges • Static / Dynamic Routing • IB implements static routing through linear forwarding tables at each switch chip (see the sketch below) • Multi-LID routing enables dynamic routing • Credit Loops • Cost-Based Routing • Speed mismatches cause store-and-forward (vs. cut-through): SDR <> DDR <> QDR, 4X <> 12X, short haul <> long haul
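A minimal sketch of what static routing through a linear forwarding table (LFT) means in practice: each switch chip holds a table indexed by destination LID that names exactly one output port, so the path for a given destination never changes until the subnet manager reprograms the table. The table size and types below are illustrative, not taken from any particular switch.

```c
#include <stdint.h>

#define LFT_SIZE (48 * 1024)   /* unicast LID space; 48K entries used here for illustration */

/* One linear forwarding table per switch chip: DLID -> output port.
   The subnet manager computes and writes these tables; the switch itself
   does only a table lookup per packet, which is why the routing is static. */
typedef struct {
    uint8_t out_port[LFT_SIZE];
} lft_t;

static inline uint8_t route_packet(const lft_t *lft, uint16_t dlid) {
    return lft->out_port[dlid];   /* no per-packet path choice at the switch */
}
```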

  31. Multi-LID Source-Based Routing Support • Applications can implement “dynamic” routing for contention avoidance, failover, and parallel data transfer (see the sketch below) • [Diagram: a source selects among LIDs 1-4 for the same destination, each routed through a different spine switch between the leaf switches]
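One way this is typically realized is through the LID Mask Control (LMC) field: with LMC = n, a port answers to 2^n consecutive LIDs, and the subnet manager can route each of those LIDs through a different spine. A hypothetical sketch of an application striping transfers across the alternate paths follows; the helper name is illustrative, not part of any standard API.

```c
#include <stdint.h>

/* With LMC = n, the destination port owns 2^n consecutive LIDs starting at
   base_lid, and the subnet manager may route each LID along a different
   spine switch.  An application can pick among them per transfer to spread
   load, or switch LIDs after an error to fail over to another path.       */
static inline uint16_t pick_path_lid(uint16_t base_lid, uint8_t lmc,
                                     unsigned transfer_idx) {
    uint16_t n_paths = (uint16_t)1 << lmc;        /* e.g. LMC=2 -> 4 paths  */
    return (uint16_t)(base_lid + (transfer_idx % n_paths));
}
```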

  32. New IB Peripherals • CPUs? • Storage • SAN • NFS-RDMA • Memory (coherent / non-coherent) • Purpose built Processors? • Floating Point Processors • Graphics Processors • Pattern Matching Hardware • XML Processor

  33. THANK YOU! • Questions & Answers
