AVERAGES, DISTRIBUTIONS AND SCALABILITY OF MPI COMMUNICATION TIMES FOR ETHERNET AND MYRINET NETWORKS. Nor Asilah Wati Abdul Hamid and Paul Coddington Presented by: Ibrahim Saidu GS22854 Kumane Saed GS24433 Cheng Kian Yong GS24460 Luay GS 21605. Lecturer: Dr. Nor Asilah Wati Abdul Hamid.
INTRODUCTION • In the past few years, commodity clusters have become the dominant architecture for high performance computing. • Most parallel programs that run on clusters use the Message Passing Interface (MPI) for communicating data between nodes of the cluster. • It is well known that Myrinet with GM has significant advantages over Fast Ethernet with TCP. • In the case of Ethernet with TCP, retransmit timeouts (RTOs) can also occur.
PROBLEM STATEMENT • Most modern parallel computers are clusters using Myrinet or Ethernet communication networks. • Several studies have compared the performance of these two networks for parallel computing; however, they focus on average performance and do not address the distributions of communication times, which can have long tails due to contention effects. • In the case of Ethernet with TCP, retransmit timeouts (RTOs) can also occur.
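The point about averages hiding long tails can be illustrated with a small sketch. The sample below is synthetic and purely illustrative (it is not measured data), and the helper name `summarize` is hypothetical:

```python
import statistics


def summarize(times_ms):
    """Return the mean, median and maximum of a list of communication times (ms)."""
    ordered = sorted(times_ms)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


# Synthetic sample (an assumption for illustration only):
# 990 messages complete in 25 ms, 10 messages are delayed to 225 ms.
sample = [25.0] * 990 + [225.0] * 10

print(summarize(sample))
```

In this made-up sample the mean moves only slightly above the median, while the maximum is far larger, which is why the distribution is reported as well as the average.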
OBJECTIVES • To investigate the effect of retransmit timeouts (RTOs) on Ethernet performance and how much could be gained from reducing the effects of RTOs. • To analyze the distributions of communication times for standard MPI routines on Ethernet with TCP and Myrinet with GM communication networks on the same cluster. • To study the scalability of the distributions as the number of communicating processes increases.
RELATED WORK • Previous studies [4,5,6,7] measure only the average times for point-to-point (ping-pong) communications between two nodes. • [3] studied the effects of TCP retransmit timeouts (RTOs) on MPI communications over Ethernet networks, including collective communications. • [3,4,5,6] compare network performance using application benchmarks such as the NAS Parallel Benchmarks. • [3,4] analyzed the effects of tuning Ethernet drivers or TCP configuration to improve MPI performance on Ethernet networks.
RELATED WORK (Cont..) • [8] used MPIBench to compare the MPI performance (including distributions of communication times) of Ethernet and Myrinet networks, but these were not direct comparisons. • [9] compared the performance of different Ethernet network topologies in commodity clusters and showed that there were significant problems with the performance of collective communications in MPICH version 1.2.0 on Fast Ethernet networks. • [11] used a later version of MPICH for the collective communication routines, which gives much better performance on Ethernet networks and perhaps reduces the number of RTOs.
IBM eServer 1350 Linux Cluster • Fast Ethernet Architecture
METHODOLOGY Benchmark • Measurements of MPI communication times were obtained using MPIBench [1,2,8]. • All measurements were run with dedicated access to the cluster, so no other processes affected the results.
Send/Receive (Cont..) • Communication times on Fast Ethernet are about 10 times higher than on Myrinet. • For larger message sizes the difference is primarily due to the difference in bandwidth between the two networks. • For Ethernet there is a jump between 64 and 128 CPUs (32 to 64 nodes), which is due to the communication no longer being between processors connected by a single switch.
Send/Receive (Cont..) • Delayed messages are attributed to the TCP retransmit timeout (RTO), which the TCP specifications say should be given by RTO = SRTT + 4 * RTTVAR. • This corresponds to the average communication time without an RTO (SRTT = 25 ms) plus the 200 ms minimum value for 4 * RTTVAR set by the Linux kernel, i.e. about 225 ms. • Longer delays are presumably caused by communications that suffer 2 or 3 RTOs before finally being completed.
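The RTO arithmetic on this slide can be sketched as follows. This is a minimal illustration, assuming the 200 ms floor applies to the 4 * RTTVAR term as stated on the slide; the function name `retransmit_timeout_ms` and the example RTTVAR value of 1 ms are hypothetical:

```python
def retransmit_timeout_ms(srtt_ms, rttvar_ms, min_variance_term_ms=200.0):
    """Compute RTO = SRTT + 4 * RTTVAR, with a lower bound on the 4 * RTTVAR term.

    The 200 ms default is the minimum quoted on the slide for the Linux kernel.
    """
    variance_term = max(4.0 * rttvar_ms, min_variance_term_ms)
    return srtt_ms + variance_term


# SRTT = 25 ms is taken from the slide; RTTVAR = 1 ms is an assumed value
# small enough that the 200 ms floor applies.
print(retransmit_timeout_ms(25.0, 1.0))  # 225.0
```

With a small RTTVAR the floor dominates, so the timeout is the 25 ms average time plus 200 ms.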
Combined Send/Receive (Cont..) • Results are approximately a factor of 2 larger than those for MPI_Send/MPI_Recv. • This indicates that the duplex capability of these networks is not being utilized.
Barrier (Cont…) • The big jump in the Ethernet results is probably due to a different algorithm being used in the MPICH 1.2.6 code. • Ethernet is approximately 4-5 times slower than Myrinet.
Broadcast (Cont…) • When communication goes through a single Ethernet switch, rather than between switches, there are no RTOs for broadcast. • Myrinet distributions have quite long tails, which are caused by a small number of repetitions of the benchmark.
Alltoall (Cont…) • The average completion time for Myrinet increases gradually with message size and number of processes. • Ethernet performance for more than 32 CPUs shows the effect of retransmit timeouts (RTOs).
CONCLUSIONS • As expected, the Myrinet network performs significantly better than Fast Ethernet. • The TCP RTO on the Ethernet network does affect communication performance, but only for large message sizes and large numbers of processors, where the network becomes saturated. • The effects are much less serious than previous measurements indicated.
Faculty of Computer Science and Information Technology • Thank you