This paper presents thread load balancing strategies for fully permutable loops in hybrid parallel programming, targeted at SMP clusters. Hybrid parallelization is attractive here because the SPMD (Single Program Multiple Data) style suffers from intrinsic load imbalance. We introduce two static load balancing techniques, constant and variable balancing, that improve the performance of coarse-grain funneled parallelization. An experimental evaluation on micro-kernel benchmarks demonstrates the effectiveness of these methods against message passing, fine-grain hybrid, and unbalanced coarse-grain hybrid models.
Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops
Nikolaos Drosinos and Nectarios Koziris
National Technical University of Athens, Computing Systems Laboratory
{ndros,nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr
Motivation
• fully permutable loops: a recurring computational challenge for HPC
• hybrid parallelization is attractive for DSM architectures
• popular free message passing libraries currently provide limited multi-threading support
• SPMD hybrid parallelization suffers from intrinsic load imbalance
ICPP-HPSEC 2005
Contribution
• two static thread load balancing schemes (constant and variable) for coarse-grain funneled hybrid parallelization of fully permutable loops
  • generic
  • simple to implement
• experimental evaluation against micro-kernel benchmarks of different programming models
  • message passing
  • fine-grain hybrid
  • coarse-grain hybrid (unbalanced, balanced)
Algorithmic model

foracross tile_1 do
  …
  foracross tile_{n-1} do
    for tile_n do
      Receive(tile);
      Compute(A, tile);
      Send(tile);

Restrictions:
• fully permutable loops
• unitary inter-process dependencies
Message passing parallelization
• tiling transformation
• (optionally overlapped) computation and communication phases
• pipelined execution
• portable
• scalable
• highly optimized
Hybrid parallelization
So… why bother?

Hybrid parallelization: why bother I
shared memory programming model vs. message passing programming model on a shared memory architecture

Hybrid parallelization: why bother II
DSM architectures are popular!
Fine-grain hybrid parallelization
• incremental parallelization of computational loops
• relatively easy to implement
• popular
• Amdahl's law restricts parallel efficiency
• overhead of thread structure re-initialization
• restrictive programming model for many applications
Coarse-grain hybrid parallelization
• generic SPMD programming style
• good parallelization efficiency
• no thread re-initialization overhead
• more difficult to implement
• intrinsic load imbalance, assuming the common funneled thread support level
[timeline figure, not preserved: fine-grain hybrid alternates computation and communication phases (comp, comm, comp, comm, …), while coarse-grain hybrid runs a single computation phase with communication funneled through the master thread]

MPI thread support levels
• single
• masteronly
• funneled
• serialized
• multiple
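Under the funneled level, the coarse-grain scheme amounts to one OpenMP parallel region for the whole pipelined loop, with only the master thread issuing MPI calls. The following is a non-runnable sketch, not the paper's code: it needs an MPI installation, and `Receive`, `Compute_my_share`, `Send`, and `ntiles` are placeholders for the application's communication and tile computation, not real APIs:

```c
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) {   /* funneled support unavailable */
        MPI_Finalize();
        return 1;
    }

    int ntiles = 0;   /* placeholder: tiles along the sequential dimension */

    #pragma omp parallel           /* one parallel region, SPMD style */
    for (int tile = 0; tile < ntiles; tile++) {
        #pragma omp master
        Receive(tile);             /* MPI communication: master thread only */
        #pragma omp barrier
        Compute_my_share(tile);    /* all threads compute their share */
        #pragma omp barrier
        #pragma omp master
        Send(tile);
    }

    MPI_Finalize();
    return 0;
}
```

The barriers make every thread wait while the master communicates; with an even split of the tile's work this idle time is exactly the intrinsic load imbalance the balancing schemes target.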
Load balancing
Idea: the master thread assumes a smaller fraction of the process tile computational load than the other threads do.
Load balancing (2)
T … total number of threads
p … current process id
[the balancing formulas appeared here as slide images and are not preserved]
Load balancing (3)
[figure not preserved]
Experimental results
• 8-node dual-SMP Linux cluster (800 MHz Pentium III, 256 MB RAM, kernel 2.4.26)
• MPICH v1.2.6 (--with-device=ch_p4, --with-comm=shared, P4_SOCKBUFSIZE=104KB)
• Intel C++ compiler 8.1 (-O3 -static -mcpu=pentiumpro)
• Fast Ethernet interconnection network
Alternating Direction Implicit (ADI)
• stencil computation used for solving partial differential equations
• unitary data dependencies
• 3D iteration space (X × Y × Z)
ADI
[results figures not preserved]

Synthetic benchmark
[results figures not preserved]
Conclusions
• fine-grain hybrid parallelization is inefficient
• unbalanced coarse-grain hybrid parallelization is also inefficient
• balancing improves hybrid model performance
• the variable balanced coarse-grain hybrid model is the most efficient approach overall
• the relative performance improvement increases as communication needs grow relative to computation
Thank You! Questions?