Adaptive MPI: Flexible Parallel Programming for Dynamic Applications
Presentation Transcript
Adaptive MPI
Chao Huang, Parallel Programming Lab, UIUC
Motivation
• Issues
  • Highly dynamic parallel applications
  • Usually limited availability of supercomputing platforms
  • Load imbalance and programming complexity
• The AMPI approach
  • Little change to a standard MPI program
  • Virtual processors: the +vp option runs the program on any desired number of virtual processors
  • Adaptive overlap of computation and communication
  • Automatic load balancing
Outline
• Motivation
• Implementation
• Features
• Ongoing Work
• Future Work
AMPI: MPI with Virtualization
[Figure: MPI “processes” implemented as virtual processors (user-level migratable threads) mapped onto real processors]
• Each virtual process is implemented as a user-level thread associated with a message-driven object
Virtualization
• Basic idea
  • Virtual MPI processors are mapped onto physical processors
  • Typically, # virtual processors > # physical processors
• Advantages
  • Run a program on any given number of processors
  • Adaptive overlap of computation and communication
  • The mapping strategy helps load balancing
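As a sketch of how the +vp option is used in practice, an AMPI binary (here a hypothetical program named pgm) could be launched with the Charm++ charmrun launcher on 8 physical processors while the program itself sees 64 MPI ranks:

```shell
# 8 physical processors (+p8), 64 virtual processors (+vp64):
# virtualization ratio = 64 / 8 = 8
./charmrun +p8 ./pgm +vp64
```

Because vp > p, several user-level threads share each physical processor, which is what enables the adaptive overlap of computation and communication and migration-based load balancing.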
Adaptive Overlapping
Problem setup: 3D Jacobi, problem size 240³, run on LeMieux with virtualization ratios 1 and 8 (p=8; vp=8 and 64).
Speedup
Problem setup: 3D Jacobi, problem size 240³, run on LeMieux. Shows AMPI with virtualization ratios of 1 and 8.
Virtualization
Problem setup: 3D Jacobi, problem size 240³, run on LeMieux. AMPI runs on any given number of PEs (e.g. 19, 33, 105), whereas native MPI needs a cube number of processors.
Load Balancing
• Dynamic load balancing
  • Maps and re-maps objects as needed
  • Re-mapping strategies help adapt to dynamic variations
• Load balancing by object migration: MPI_Migrate()
  • A collective call informing the load balancer that the thread is ready to be migrated, if needed
  • The load balancer migrates the objects
  • Packing, transferring, and unpacking (PUP)
Load Balancing Example
An AMR (adaptive mesh refinement) application. The load balancer is activated at time steps 20, 40, 60, and 80.
Asynchronous Communications
• Collective communications in MPI are complex and time-consuming
  • MPI_Alltoall, etc.
  • Implemented as blocking calls in MPI
• Asynchronous calls enable overlapping computation with communication
  • Powered by a communication optimization library

  MPI_Ialltoall(...)
  /* some computation here */
  MPI_Waitall(...)
Communication Optimization
Alltoall time on 1K processors
Communication Optimization
Alltoall CPU overhead on 1K processors
Ongoing Work
• Automatic checkpointing
  • Improves robustness of large-scale applications
• Performance prediction via direct simulation
  • Facilitates performance tuning without continuous access to a large machine
Future Work
• Support for visualization
• Full compliance with the MPI-1.1 standard
• Support for the MPI-2 standard