Cluster computing connects standard PCs or workstations with a fast network, giving a good price/performance ratio by exploiting existing idle machines or new dedicated ones. Clusters and traditional supercomputers offer similar processing power, since both are built from microprocessors; the key difference has been communication performance, a gap that modern networks have largely closed. This overview covers our departmental clusters DAS-1, DAS-2, and DAS-3, their specifications and performance, and the low-level network interface protocols for Myrinet that provide high-throughput, low-latency communication.
Introduction • Cluster computing • Standard PCs or workstations connected by a fast network • Good price/performance ratio • Exploit existing (idle) machines or use (new) dedicated machines • Cluster computers versus supercomputers • Processing power is similar: based on microprocessors • Communication performance was the key difference • Modern networks (Myrinet, Infiniband) have bridged this gap
Overview • Cluster computers at our department • DAS-1: 128-node Pentium-Pro / Myrinet cluster (gone) • DAS-2: 72-node dual-Pentium-III / Myrinet-2000 cluster • DAS-3: 85-node dual dual-core Opteron / Myrinet-10G cluster • Part of a wide-area system: Distributed ASCI Supercomputer • Network interface protocols for Myrinet • Low-level systems software • Partly runs on the network interface card (firmware)
DAS-1 Node configuration • 200 MHz Pentium Pro • 128 MB memory • 2.5 GB disk • Fast Ethernet 100 Mbit/s • Myrinet 1.28 Gbit/s (full duplex) • Operating system: Red Hat Linux
DAS-2 Cluster (2002-2006) • 72 nodes, each with 2 CPUs (144 CPUs in total) • 1 GHz Pentium-III • 1 GB memory per node • 20 GB disk • Fast Ethernet 100 Mbit/s • Myrinet-2000 2 Gbit/s (crossbar) • Operating system: Red Hat Linux • Part of wide-area DAS-2 system (5 clusters with 200 nodes in total) (diagram: each node connects to an Ethernet switch and a Myrinet switch)
DAS-3 Cluster (Sept. 2006) • 85 nodes, each with 2 dual-core CPUs (340 cores in total) • 2.4 GHz AMD Opterons (64 bit) • 4 GB memory per node • 250 GB disk • Gigabit Ethernet • Myrinet-10G 10 Gb/s (crossbar) • Operating system: Scientific Linux • Part of wide-area DAS-3 system (5 clusters with 263 nodes in total), using SURFnet-6 optical network with 40-80 Gb/s wide-area links
DAS-3 Networks (diagram): the 85 compute nodes each connect via 1 Gb/s Ethernet to a Nortel 5530 + 3 * 5510 Ethernet switch (with a 1 or 10 Gb/s campus uplink) and via 10 Gb/s Myrinet to a Myri-10G switch; the headnode (10 TB mass storage) attaches over 10 Gb/s Ethernet; a 10 Gb/s Ethernet blade in the Myri-10G switch provides 8 * 10 Gb/s fiber links to a Nortel OME 6500 with DWDM blade, giving 80 Gb/s DWDM connectivity to SURFnet6.
DAS-1 Myrinet Components: • 8-port switches • Network interface card for each node (on PCI bus) • Electrical cables: reliable links Myrinet switches: • 8 x 8 crossbar switch • Each port connects to a node (network interface) or another switch • Source-based, cut-through routing • Less than 1 microsecond switching delay
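To make the source-based, cut-through routing concrete, the sketch below models a source route as a list of per-hop output ports that each crossbar switch consumes as the packet passes through. This is an illustrative model only, not the actual Myrinet header format; the structure and field names are assumptions.

#include <stdio.h>

/* Illustrative model of source-based routing: the sender prepends one
 * routing byte per hop; each crossbar switch pops the first byte, uses
 * it to select an output port, and cuts the rest of the packet through
 * immediately (no store-and-forward). Names and encoding are assumed. */
#define MAX_HOPS 8

struct packet {
    unsigned char route[MAX_HOPS]; /* output port to take at each switch */
    int hops_left;                 /* routing bytes not yet consumed     */
    const char *payload;
};

/* What a switch conceptually does with the head of the packet. */
static int switch_forward(struct packet *p)
{
    if (p->hops_left == 0)
        return -1;                      /* packet has arrived at a node */
    int out_port = p->route[0];
    /* shift remaining routing bytes forward (the header shrinks per hop) */
    for (int i = 1; i < p->hops_left; i++)
        p->route[i - 1] = p->route[i];
    p->hops_left--;
    return out_port;
}

int main(void)
{
    /* A 3-hop route: leave via port 5, then port 2, then port 7. */
    struct packet p = { {5, 2, 7}, 3, "hello" };
    int port;
    while ((port = switch_forward(&p)) >= 0)
        printf("switch forwards on port %d\n", port);
    printf("delivered: %s\n", p.payload);
    return 0;
}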
128-node DAS-1 cluster • Ring topology would have: • 22 switches • Poor diameter: 11 • Poor bisection width: 2
Topology 128-node cluster • 4 x 8 grid with wrap-around • Each switch is connected to 4 other switches and 4 PCs • 32 switches (128/4) • Diameter: 6 • Bisection width: 8
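As a sanity check on the numbers above, the short calculation below derives the switch count, diameter, and bisection width for both the ring and the 4 x 8 torus from the 8-port switch constraint. It only reproduces the figures on the slides using standard closed-form expressions.

#include <stdio.h>

int main(void)
{
    const int pcs = 128, ports = 8;

    /* Ring: each switch spends 2 ports on neighbour switches,
     * leaving 6 for PCs -> ceil(128/6) = 22 switches.            */
    int ring_switches = (pcs + (ports - 2) - 1) / (ports - 2);
    int ring_diameter = ring_switches / 2;   /* worst case: halfway round     */
    int ring_bisection = 2;                  /* cutting a ring severs 2 links */

    /* 4 x 8 torus (grid with wrap-around): each switch uses 4 ports
     * for neighbour switches and 4 for PCs -> 128/4 = 32 switches.  */
    int rows = 4, cols = 8;
    int torus_switches = pcs / 4;
    int torus_diameter = rows / 2 + cols / 2;  /* 2 + 4 = 6                         */
    int torus_bisection = 2 * rows;            /* cut across the long dimension: 8  */

    printf("ring : %d switches, diameter %d, bisection %d\n",
           ring_switches, ring_diameter, ring_bisection);
    printf("torus: %d switches, diameter %d, bisection %d\n",
           torus_switches, torus_diameter, torus_bisection);
    return 0;
}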
Myrinet interface board Hardware • 40 MHz custom CPU (LANai 4.1) • 1 MByte SRAM • 3 DMA engines (send, receive, to/from host) • Full duplex Myrinet link • PCI bus interface Software • LANai Control Program (LCP)
Network interface protocols for Myrinet • Myrinet has programmable Network Interface processor • Gives much flexibility to protocol designer • NI protocol: low-level software running on NI and host • Used to implement higher-level programming languages and libraries • Critical for performance • Want few μsec latency, 100s MB/sec throughput • Map network interface (NI) into user space to avoid OS overhead • Goal: give supercomputer communication performance to clusters
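To illustrate what mapping the NI into user space buys, here is a minimal, hypothetical send path: the NI's memory is mapped into the process, so sending a small message is just copying it into an NI buffer and writing a doorbell word, with no system call or interrupt on the critical path. Structure names, slot layout, and the doorbell convention are all assumptions for illustration, not the real LCP/host interface; anonymous memory stands in for the mapped NI SRAM so the sketch runs stand-alone.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical layout of a send ring in NI memory; in a real system this
 * region would be NI SRAM mapped into the process by the device driver.
 * Here we substitute anonymous memory so the sketch compiles and runs.  */
#define SLOT_SIZE 2048
#define NUM_SLOTS 8

struct send_slot {
    volatile uint32_t doorbell;   /* host sets to length; NI clears when sent */
    uint8_t           data[SLOT_SIZE];
};

static struct send_slot *ni_send_ring;   /* stands in for mapped NI memory */
static unsigned next_slot;

/* User-space send: no system call, no interrupt - just a copy + a store. */
static int ni_send(const void *msg, uint32_t len)
{
    struct send_slot *s = &ni_send_ring[next_slot];
    if (len > SLOT_SIZE || s->doorbell != 0)
        return -1;                        /* slot still owned by the NI: would poll */
    memcpy(s->data, msg, len);            /* programmed I/O style copy              */
    __sync_synchronize();                 /* order data before the doorbell write   */
    s->doorbell = len;                    /* NI firmware would poll this word       */
    next_slot = (next_slot + 1) % NUM_SLOTS;
    return 0;
}

int main(void)
{
    ni_send_ring = mmap(NULL, NUM_SLOTS * sizeof(struct send_slot),
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ni_send_ring == MAP_FAILED)
        return 1;

    const char msg[] = "hello over Myrinet (sketch)";
    if (ni_send(msg, sizeof msg) == 0)
        printf("message queued in slot 0, doorbell = %u bytes\n",
               ni_send_ring[0].doorbell);
    return 0;
}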
Enhancements (see paper) • Optimizing throughput using Programmed I/O instead of DMA • Making communication reliable using flow control • Use combination of polling and interrupts to reduce message receipt overhead • Efficient multicast communication in firmware
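The polling/interrupt combination mentioned above can be sketched roughly as follows: the receiver spins on a status flag for a bounded interval (cheap when a message is already in flight) and only falls back to blocking for an interrupt when the spin budget runs out. The flag, the spin budget, and the helper thread standing in for the NI are illustrative placeholders, not the actual firmware/host interface.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

/* Sketch of the poll-then-interrupt receive strategy. A helper thread
 * stands in for the NI delivering a message; in reality the flag would
 * live in host memory written by the NI, and the blocking path would be
 * a kernel wait for the NI's interrupt. All names are illustrative.     */

static atomic_int msg_ready;            /* "message arrived" flag */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  irq  = PTHREAD_COND_INITIALIZER;

static void *fake_ni(void *arg)
{
    (void)arg;
    usleep(50 * 1000);                  /* message shows up after 50 ms */
    pthread_mutex_lock(&lock);
    atomic_store(&msg_ready, 1);
    pthread_cond_signal(&irq);          /* the "interrupt" */
    pthread_mutex_unlock(&lock);
    return NULL;
}

static void receive(void)
{
    /* Phase 1: poll for a bounded number of iterations (no kernel entry). */
    for (int spins = 0; spins < 100000; spins++) {
        if (atomic_load(&msg_ready)) {
            printf("received via polling after %d spins\n", spins);
            return;
        }
    }
    /* Phase 2: give up the CPU and wait for the interrupt. */
    pthread_mutex_lock(&lock);
    while (!atomic_load(&msg_ready))
        pthread_cond_wait(&irq, &lock);
    pthread_mutex_unlock(&lock);
    printf("received via interrupt\n");
}

int main(void)
{
    pthread_t ni;
    pthread_create(&ni, NULL, fake_ni, NULL);
    receive();
    pthread_join(ni, NULL);
    return 0;
}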
Multicast • Implement a spanning-tree forwarding protocol on the NIs • Reduces forwarding latency • No interrupts on hosts (diagram: multicast tree over nodes 1-4)
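A rough sketch of how such NI-level forwarding might be organised: each NI that receives a multicast packet delivers it to its host and re-sends it to its children in a precomputed spanning tree, so the host CPUs never take a forwarding interrupt. The binary-tree layout and the print stubs below are assumptions for illustration, not the firmware's actual tree construction.

#include <stdio.h>

/* Illustrative spanning-tree multicast over N nodes, using a simple
 * binary tree rooted at the sender (node 0): node i forwards to
 * children 2*i+1 and 2*i+2. In the real system this logic runs in the
 * NI firmware; here the "sends" are just recursive calls with prints. */
#define N 8

static void deliver_to_host(int node)
{
    printf("node %d: delivered to host (no forwarding interrupt)\n", node);
}

static void ni_forward(int node)
{
    deliver_to_host(node);
    int left = 2 * node + 1, right = 2 * node + 2;
    if (left < N) {
        printf("node %d: NI forwards copy to node %d\n", node, left);
        ni_forward(left);
    }
    if (right < N) {
        printf("node %d: NI forwards copy to node %d\n", node, right);
        ni_forward(right);
    }
}

int main(void)
{
    ni_forward(0);   /* multicast originating at node 0 */
    return 0;
}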
Performance • DAS-2: • 9.6 μsec 1-way null-latency • 168 MB/sec throughput • DAS-3: • 2.6 μsec 1-way null-latency • 950 MB/sec throughput
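For a rough sense of what these numbers mean end-to-end, a simple linear model T(n) = latency + n / bandwidth can be applied; the calculation below does this for a hypothetical 64 KB message on DAS-2 and DAS-3, ignoring fragmentation and software overheads, so it is only a first-order estimate.

#include <stdio.h>

/* First-order transfer-time estimate T(n) = L + n/B, using the one-way
 * null-latency and peak throughput quoted above. The 64 KB message size
 * is an arbitrary example; real protocols add per-fragment and host
 * overheads that are not modelled here.                                 */
int main(void)
{
    const double n = 64.0 * 1024.0;   /* message size in bytes */
    const struct { const char *name; double lat_us, bw_mb; } sys[] = {
        { "DAS-2", 9.6, 168.0 },
        { "DAS-3", 2.6, 950.0 },
    };

    for (int i = 0; i < 2; i++) {
        double t_us = sys[i].lat_us + n / (sys[i].bw_mb * 1e6) * 1e6;
        printf("%s: ~%.0f usec for a 64 KB message\n", sys[i].name, t_us);
    }
    return 0;
}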
MareNostrum: largest Myrinet cluster in the world • IBM system at Barcelona Supercomputer Center • 4812 PowerPC 970 processors, 9.6 TB memory