Enhancing MPI Intra-node Communication for Improved Performance in Multicore and Manycore Systems

This presentation addresses the growing challenges that increasingly complex multicore and manycore architectures pose for MPI intra-node communication. Nodes now have many cores and deep memory hierarchies, so efficient intra-node communication has a significant impact on application performance.

The work combines two components:
• a multi-tuning framework: hwloc-based discovery of the runtime communication pattern, a rule table of communication parameters, and runtime parameter setting;
• KNEM coll: a kernel-assisted collective component built on the KNEM kernel copy module, which copies data directly between processes without an intermediate shared-memory buffer.

The results compare ping-pong bandwidth for vanilla MPICH2, vanilla Open MPI and multi-tuning Open MPI. They also compare broadcast runtime for the shared-memory modules (Basic, SM and Tuned) against KNEM coll.

Presentation Transcript


Smart MPI Intra-node Communication among Multicore and Manycore Machines
Teng Ma, George Bosilca, Aurelien Bouteiller and Jack J. Dongarra

• MPI intra-node communication faces a major challenge from increasingly complex multicore/manycore architectures: more cores, deeper memory hierarchies and more complex interconnects.
• Each node has a large number of cores, and MPI users favor the one-process-per-core approach. Improving intra-node communication therefore has a significant effect on application performance.

Multi-tuning framework
• Hwloc: used to find the runtime communication pattern (http://www.open-mpi.org/projects/hwloc/)
• Rule table: finds the best set of communication parameters (from OTPO or models)
• Runtime parameter setting

Kernel-assisted collective: KNEM coll
• KNEM: a kernel copy module (http://runtime.bordeaux.inria.fr/knem/)
• No intermediate shared-memory buffer
• Single copy between processes
• Memory copies are offloaded to non-root processes, which avoids sequential copies at the root process

Fig 1. Bandwidth of ping-pong test for vanilla MPICH2, vanilla Open MPI and multi-tuning Open MPI. Panels: (a) Tigerton inter-socket, (b) Nehalem inter-socket, (c) Tigerton intra-socket, (d) Nehalem intra-socket.
Fig 2. Performance comparison of broadcast operations between shared-memory-based modules (Basic, SM and Tuned) and KNEM coll, normalized to the Basic module runtime (lower is better). Panels: (a) Tigerton, (b) Nehalem EP, (c) Nehalem EX, (d) Istanbul.
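The KNEM coll bullets rest on two ideas:
• use a single copy instead of a copy through a shared buffer;
• let the non-root processes perform their own copies.

The Python toy model below sketches why both help. It is not the authors' implementation. The following are all assumptions made for illustration:
• the function names;
• the assumed shared-memory scheme, in which the root writes the message into one shared segment that every receiver then copies out of;
• the idealized timing, which ignores latency, setup cost and memory-bandwidth contention.

```python
# Toy cost model (an illustrative assumption, not the authors' code) of the two
# ideas behind KNEM coll: single-copy transfers and copy offloading.


def bytes_copied(nprocs: int, msg_bytes: int, via_shared_buffer: bool) -> int:
    """Total bytes moved by memory copies during one intra-node broadcast."""
    receivers = nprocs - 1
    if via_shared_buffer:
        # Assumed shared-memory scheme: the root copies the message into one
        # shared segment, then each receiver copies it out (two-copy path).
        return msg_bytes + receivers * msg_bytes
    # Kernel-assisted single copy: each receiver buffer is filled directly
    # from the root's buffer, with no intermediate segment.
    return receivers * msg_bytes


def single_copy_bcast_time(nprocs: int, msg_bytes: int, copy_bw: float,
                           offload_to_receivers: bool) -> float:
    """Idealized broadcast time; ignores latency, setup and bandwidth contention."""
    one_copy = msg_bytes / copy_bw
    if offload_to_receivers:
        # Each non-root process performs its own copy, so the copies overlap.
        return one_copy
    # The root performs every copy itself, one after another.
    return (nprocs - 1) * one_copy


if __name__ == "__main__":
    # Hypothetical inputs: 8 processes, a 1 MiB message, 5 GB/s per copy.
    P, M, BW = 8, 1 << 20, 5e9
    print(bytes_copied(P, M, True), bytes_copied(P, M, False))
    print(single_copy_bcast_time(P, M, BW, False)
          / single_copy_bcast_time(P, M, BW, True))
```

With 8 processes, the shared-buffer path moves one extra message's worth of data. Offloading shortens the idealized broadcast by a factor of P - 1 = 7. Real gains would be smaller, because concurrent copies compete for memory bandwidth.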
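The multi-tuning framework combines three steps: topology discovery, a rule table, and runtime parameter setting. A minimal sketch of the lookup step follows. It assumes a rank-to-socket map of the kind that hwloc topology information could provide.

Everything else in the sketch is a hypothetical illustration, not a value from the poster, OTPO or Open MPI:
• the rule table entries and size thresholds;
• the protocol names;
• the `Params`, `locality` and `lookup` helpers.

```python
# Hypothetical rule-table lookup: process locality (e.g. derived from hwloc
# topology data) plus message size selects a communication parameter set.
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Params:
    protocol: str        # hypothetical protocol name
    fragment_bytes: int  # hypothetical fragment size in bytes


# Hypothetical rule table: locality -> rules sorted by minimum message size.
# Real entries would come from tuning runs (e.g. OTPO) or performance models.
RULES = {
    "intra_socket": [(0, Params("eager", 4096)),
                     (64 * 1024, Params("single_copy", 256 * 1024))],
    "inter_socket": [(0, Params("eager", 8192)),
                     (16 * 1024, Params("single_copy", 1024 * 1024))],
}


def locality(rank_a: int, rank_b: int, socket_of: Dict[int, int]) -> str:
    """Classify a process pair using a rank -> socket map."""
    same_socket = socket_of[rank_a] == socket_of[rank_b]
    return "intra_socket" if same_socket else "inter_socket"


def lookup(rank_a: int, rank_b: int, msg_bytes: int,
           socket_of: Dict[int, int]) -> Params:
    """Return the parameter set for this pair and a non-negative message size."""
    rules = RULES[locality(rank_a, rank_b, socket_of)]
    thresholds = [size for size, _ in rules]
    # Pick the last rule whose threshold does not exceed the message size.
    idx = bisect_right(thresholds, msg_bytes) - 1
    return rules[idx][1]


if __name__ == "__main__":
    # Hypothetical placement: ranks 0-1 on socket 0, ranks 2-3 on socket 1.
    sockets = {0: 0, 1: 0, 2: 1, 3: 1}
    print(lookup(0, 1, 32 * 1024, sockets))  # same socket, below 64 KiB rule
    print(lookup(0, 2, 32 * 1024, sockets))  # across sockets, above 16 KiB rule
```

Keying the table on locality as well as message size reflects the poster's separate inter-socket and intra-socket bandwidth panels. The best switch point can differ between the two cases.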