Adaptive MPI: Flexible Parallel Programming for Dynamic Applications
Presentation Transcript
Adaptive MPI
Chao Huang, Parallel Programming Lab, UIUC
Motivation
• Issues
  • Highly dynamic parallel applications
  • Usually limited availability of supercomputing platforms
  • Load imbalance and programming complexity
• The AMPI approach
  • Little change to a standard MPI program
  • Virtual processors: the +vp option runs the program on any desired number of virtual processors
  • Adaptive overlap of computation and communication
  • Automatic load balancing
Outline
• Motivation
• Implementation
• Features
• Ongoing Work
• Future Work
AMPI: MPI with Virtualization
[Figure: MPI “processes” implemented as virtual processors (user-level migratable threads) mapped onto real processors]
• Each virtual process is implemented as a user-level thread associated with a message-driven object
Virtualization
• Basic idea
  • Virtual MPI processors are mapped onto physical processors
  • Typically, # virtual processors > # physical processors
• Advantages
  • Run a program on any given number of processors
  • Adaptive overlap of computation and communication
  • The mapping strategy helps load balancing
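As a sketch of how the +vp option is used in practice, an AMPI binary (here a hypothetical program named pgm) could be launched with the Charm++ charmrun launcher on 8 physical processors while the program itself sees 64 MPI ranks:

```shell
# 8 physical processors (+p8), 64 virtual processors (+vp64):
# virtualization ratio = 64 / 8 = 8
./charmrun +p8 ./pgm +vp64
```

Because vp > p, several user-level threads share each physical processor, which is what enables the adaptive overlap of computation and communication and migration-based load balancing.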
Adaptive Overlapping
Problem setup: 3D Jacobi, problem size 240³, run on LeMieux with virtualization ratios 1 and 8 (p=8; vp=8 and 64).
Speedup
Problem setup: 3D Jacobi, problem size 240³, run on LeMieux. Shows AMPI with virtualization ratios of 1 and 8.
Virtualization
Problem setup: 3D Jacobi, problem size 240³, run on LeMieux. AMPI runs on any given number of PEs (e.g. 19, 33, 105), whereas native MPI needs a cube number of processors.
Load Balancing
• Dynamic load balancing
  • Maps and re-maps objects as needed
  • Re-mapping strategies help adapt to dynamic variations
• Load balancing by object migration: MPI_Migrate()
  • A collective call informing the load balancer that the thread is ready to be migrated, if needed
  • The load balancer migrates the objects
  • Packing, transferring, and unpacking (PUP)
Load Balancing Example
An AMR (adaptive mesh refinement) application. The load balancer is activated at time steps 20, 40, 60, and 80.
Asynchronous Communications
• Collective communications in MPI are complex and time-consuming
  • MPI_Alltoall, etc.
  • Implemented as blocking calls in MPI
• Asynchronous calls enable overlapping computation with communication
  • Powered by a communication optimization library

  MPI_Ialltoall(...)
  /* some computation here */
  MPI_Waitall(...)
Communication Optimization
Alltoall time on 1K processors
Communication Optimization
Alltoall CPU overhead on 1K processors
Ongoing Work
• Automatic checkpointing
  • Improves robustness of large-scale applications
• Performance prediction via direct simulation
  • Facilitates performance tuning without continuous access to a large machine
Future Work
• Support for visualization
• Full compliance with the MPI-1.1 standard
• Support for the MPI-2 standard