Enhancing Parallel Computing Performance: Key Strategies and Applications

Parallel Computing Research
L.V. (Sanjay) Kale, Professor, Dept. of Computer Science
http://www.ks.uiuc.edu/Research/namd

Overview
• Research at PPL (Parallel Programming Laboratory)
• Develop technology that improves:
  • performance of parallel applications
  • programmer productivity
• Load balancing issues
• Communication optimizations
• Parallel algorithms
• Collaboration: CSE applications

Diagram: Charm++ (parallel objects, adaptive runtime system) with its libraries and tools at the center, linked to applications in protein folding, quantum chemistry (QM/MM), molecular dynamics, computational cosmology, crack propagation, dendritic growth, space-time meshes, and rocket simulation.
The enabling CS technology of parallel objects and intelligent runtime systems has led to several collaborative applications in CSE.

Charm++ in wider use
• Applications are using Charm++
  • Adding to its stability and robustness
• Rocket simulation (ASCI center)
• Computational cosmology (astrophysics)
• QM (Car-Parrinello method)
• Crack propagation
• Space-time meshes in process simulation
• Large data visualization

Blue Gene
• Blue Gene/L
  • 64K dual-processor nodes
  • Targeted peak performance: 180/360 TF/s
• Simulation and performance prediction
• Demonstrated efficient parallelization of a skeletal MD program
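A quick arithmetic check on the Blue Gene/L targets can make the two peak figures concrete. This sketch assumes that "64K" means 65,536 nodes and that 180 TF/s counts one processor per node while 360 TF/s counts both; that reading is our assumption, not something the slide states.

```python
# Arithmetic check of the Blue Gene/L peak targets quoted on the slide.
# Assumption: "64K" = 65,536 nodes; 180 TF/s counts one processor per node,
# 360 TF/s counts both processors in every node.
NODES = 64 * 1024
PEAK_ONE_PER_NODE = 180e12   # flop/s
PEAK_TWO_PER_NODE = 360e12   # flop/s

# Implied peak rate of a single processor under each reading.
per_proc_one = PEAK_ONE_PER_NODE / NODES
per_proc_two = PEAK_TWO_PER_NODE / (2 * NODES)

print(f"{per_proc_one / 1e9:.2f} GF/s per processor")
```

Under this reading both targets imply the same per-processor rate, about 2.75 GF/s, which is why the slide can quote them as a pair.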

Collective Communication
• Performance impediment
• Issues
  • Communication latencies not scaling with bandwidth and processor speeds
  • High software overhead (α)
  • Synchronous operations (MPI_Alltoall) do not utilize the co-processor effectively
• All-to-all personalized communication (AAPC)
  • Each processor has P-1 messages to send, one to every other processor
  • For small messages, the cost is dominated by software overhead
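Why software overhead dominates AAPC for small messages can be seen from the standard latency-bandwidth ("alpha-beta") cost model, where each message costs a fixed overhead α plus β per byte. The sketch below applies it to a processor that sends an m-byte message directly to each of the other P-1 processors; the α and β values are illustrative assumptions, not measurements of any machine.

```python
def direct_aapc_time(p: int, m: int, alpha: float, beta: float) -> float:
    """Estimated time for one processor to send a distinct m-byte message
    directly to each of the other p - 1 processors (alpha-beta model)."""
    return (p - 1) * (alpha + m * beta)


def overhead_fraction(p: int, m: int, alpha: float, beta: float) -> float:
    """Share of that time spent on per-message software overhead (alpha)."""
    return (p - 1) * alpha / direct_aapc_time(p, m, alpha, beta)


# Illustrative values only (assumed, not measured):
# 10 microseconds of overhead per message, 1 ns per byte (about 1 GB/s).
ALPHA, BETA = 10e-6, 1e-9

for m in (100, 1_000_000):
    t = direct_aapc_time(1024, m, ALPHA, BETA)
    f = overhead_fraction(1024, m, ALPHA, BETA)
    print(f"m={m:>9} bytes: {t * 1e3:9.2f} ms, alpha share {f:.1%}")
```

With these numbers α accounts for about 99% of the time when m = 100 bytes and about 1% when m = 1 MB, which is why reducing the number of messages matters most for small-message AAPC.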

Optimizing AAPC
• Message combining for small messages
  • Reduces the total number of messages
• Messages sent along a virtual topology
  • Multistage algorithm to send messages
  • Groups of messages are combined and sent to an intermediate processor, which then forwards them to their final destinations
  • Using virtual topologies reduces the software overhead of sending messages
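The combining step can be sketched as grouping outgoing messages by their intermediate processor, keeping each final destination so the intermediate knows where to forward it. The `next_hop` function stands in for whatever virtual topology is used; the "group leader" topology below (groups of 4 ranks, first rank forwards) is a hypothetical example chosen for brevity, not the scheme from the slides.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (final destination rank, payload)
Message = Tuple[int, bytes]


def combine_by_hop(messages: List[Message],
                   next_hop: Callable[[int], int]) -> Dict[int, List[Message]]:
    """Group small messages into one combined buffer per intermediate rank.

    Each entry keeps its final destination so the intermediate can forward it.
    """
    bundles: Dict[int, List[Message]] = defaultdict(list)
    for dest, payload in messages:
        bundles[next_hop(dest)].append((dest, payload))
    return dict(bundles)


# Hypothetical virtual topology: ranks form groups of 4, and the first
# rank of each group (its "leader") forwards messages within the group.
GROUP_SIZE = 4


def group_leader(dest: int) -> int:
    return (dest // GROUP_SIZE) * GROUP_SIZE


# Rank 0 has a small message for each of the other 15 ranks.
outgoing = [(d, f"msg-{d}".encode()) for d in range(1, 16)]
bundles = combine_by_hop(outgoing, group_leader)
print(len(outgoing), "messages ->", len(bundles), "combined buffers")
```

Here 15 small messages become 4 combined buffers; the buffer keyed by rank 0 stays local, so only 3 messages pay the per-message overhead in the first stage, at the cost of a second forwarding stage.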

Virtual Topology: Mesh
• Organize processors in a 2D (virtual) mesh
• Message from (x1,y1) to (x2,y2) goes via (x1,y2)
• About 2√P messages per processor instead of P-1
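The 2D mesh scheme can be checked with a small simulation. This sketch assumes P is a perfect square and maps rank r to mesh coordinates (x, y) = divmod(r, √P); that coordinate convention is our assumption. Phase 1 routes each message from (x1,y1) to the intermediate (x1,y2) and phase 2 forwards it to (x2,y2), combining everything that shares a next hop. It verifies that every message arrives and counts the combined messages each processor sends.

```python
import math
from collections import defaultdict


def mesh_aapc(p: int):
    """Simulate AAPC on a virtual sqrt(p) x sqrt(p) mesh with two-phase
    message combining.

    Returns (delivered, sent): delivered[d] lists the sources whose message
    reached rank d; sent[r] counts the combined messages rank r put on the
    network (hand-offs to itself are local and not counted).
    """
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("this sketch assumes p is a perfect square")

    def coord(r: int):
        return divmod(r, side)          # rank -> (x, y), assumed layout

    def rank(x: int, y: int) -> int:
        return x * side + y             # (x, y) -> rank

    sent = [0] * p

    # Phase 1: source (x1, y1) bundles all messages for (x2, y2) into one
    # combined message for the intermediate (x1, y2).
    inbox = defaultdict(list)           # intermediate rank -> [(src, dst)]
    for src in range(p):
        x1, _ = coord(src)
        bundles = defaultdict(list)
        for dst in range(p):
            if dst != src:
                _, y2 = coord(dst)
                bundles[rank(x1, y2)].append((src, dst))
        for hop, msgs in bundles.items():
            if hop != src:
                sent[src] += 1
            inbox[hop].extend(msgs)

    # Phase 2: each intermediate regroups what it holds by final
    # destination and forwards one combined message per destination.
    delivered = defaultdict(list)
    for hop in range(p):
        bundles = defaultdict(list)
        for src, dst in inbox[hop]:
            bundles[dst].append(src)
        for dst, srcs in bundles.items():
            if dst != hop:
                sent[hop] += 1
            delivered[dst].extend(srcs)

    return delivered, sent


delivered, sent = mesh_aapc(16)
print(sent[0], "combined messages per processor vs", 16 - 1, "direct")
```

Each processor sends 2(√P - 1) combined messages (6 for P = 16, versus 15 direct). The trade-off is that most data crosses the network twice, so the scheme pays off when per-message overhead α, not bandwidth, dominates.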

NAMD Performance on Lemieux
