This research focuses on improving the performance of parallel computing applications through advanced technology and efficient algorithms. Key areas of exploration include enhancing programmer productivity, optimizing load balancing, and refining communication methods. Collaborative efforts span various domains such as molecular dynamics, computational cosmology, and simulation of complex systems, leveraging the Charm++ framework. Noteworthy results demonstrate effective parallelization techniques that address communication latencies and software overhead, optimizing performance in large-scale computations like protein folding and rocket simulations.
Parallel Computing Research
L.V. (Sanjay) Kale
Professor, Dept. of Computer Science
http://www.ks.uiuc.edu/Research/namd
Overview
• Research at PPL
  • Develop technology that improves:
    • performance of parallel applications
    • programmer productivity
  • Load balancing issues
  • Communication optimizations
  • Parallel algorithms
• Collaboration: CSE applications
[Figure: Charm++ (parallel objects, adaptive runtime system, libraries and tools) and the applications built on it: protein folding, quantum chemistry (QM/MM), molecular dynamics, computational cosmology, crack propagation, dendritic growth, space-time meshes, rocket simulation]
The enabling CS technology of parallel objects and intelligent runtime systems has led to several collaborative applications in CSE.
Charm++ in wider use
• Applications are using Charm++, adding to its stability and robustness:
  • Rocket simulation (ASCI center)
  • Computational cosmology (astrophysics)
  • QM (Car-Parrinello method)
  • Crack propagation
  • Space-time meshes in process simulation
  • Large data visualization
Blue Gene
• Blue Gene/L
  • 64K dual-processor nodes
  • Targeted peak performance: 180/360 TF/s
• Simulation and performance prediction
  • Demonstrated efficient parallelization of a skeletal MD program
Collective Communication
• A performance impediment
• Issues:
  • Communication latencies do not scale with bandwidth and processor speed
  • High software overhead (α)
  • Synchronous operations (MPI_Alltoall) do not use the co-processor effectively
• All-to-all personalized communication (AAPC):
  • Each processor has P-1 messages to send
  • Dominated by software overhead
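One way to see why small-message AAPC is dominated by software overhead is the standard α-β (latency-bandwidth) cost model, where α is the per-message software overhead named above and β is an assumed per-byte transfer time. The sketch below assumes that model; the function names and the parameter values in the usage block are hypothetical placeholders, not measurements of any real machine.

```python
def direct_aapc_time(P: int, m: float, alpha: float, beta: float) -> float:
    """Per-processor time for direct all-to-all personalized communication
    under the alpha-beta model (an assumption, not a measured cost).

    Each processor sends P - 1 separate messages of m bytes; each send
    costs alpha (software overhead) plus m * beta (per-byte transfer time).
    """
    return (P - 1) * (alpha + m * beta)


def overhead_fraction(m: float, alpha: float, beta: float) -> float:
    """Fraction of one send's cost that is fixed software overhead."""
    return alpha / (alpha + m * beta)


if __name__ == "__main__":
    # Hypothetical placeholder values: 10 us overhead, 1 ns/byte, 1024 processors.
    alpha, beta, P = 10e-6, 1e-9, 1024
    for m in (16, 1024, 65536):
        t = direct_aapc_time(P, m, alpha, beta)
        frac = overhead_fraction(m, alpha, beta)
        print(f"m={m:6d} B  time={t * 1e3:8.3f} ms  overhead={frac:.1%}")
```

With these placeholder values, α accounts for over 99% of the cost of a 16-byte message but only about 13% of a 64 KB one, which is why reducing the number of messages matters most for small messages.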
Optimizing AAPC
• Message combining for small messages
  • Reduces the total number of messages
• Messages are sent along a virtual topology
• Multistage algorithm to send messages
  • Groups of messages are combined and sent to an intermediate processor, which then forwards them to their final destinations
• Using virtual topologies reduces the software overhead of sending messages
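The combining step can be sketched as grouping one processor's outgoing messages by the next processor on their route and sending each group as a single message, so α is paid once per group. This is a minimal illustration under assumptions: `combine_by_next_hop` and the `next_hop` rule in the usage block (relay via the first processor of the destination's group of 4) are hypothetical, not the routing used by Charm++.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Message = Tuple[int, int, str]  # (source, destination, payload)


def combine_by_next_hop(
    messages: List[Message], next_hop: Callable[[int], int]
) -> Dict[int, List[Message]]:
    """Group outgoing messages by the processor that receives them next.

    Each group would go out as one combined message, so the per-message
    software overhead (alpha) is paid once per group rather than once per
    original message.
    """
    batches: Dict[int, List[Message]] = defaultdict(list)
    for msg in messages:
        _, dst, _ = msg
        batches[next_hop(dst)].append(msg)
    return dict(batches)


if __name__ == "__main__":
    P, GROUP, src = 16, 4, 5
    outgoing = [(src, dst, f"data-{dst}") for dst in range(P) if dst != src]
    # Hypothetical routing rule, for illustration only: send every message
    # via the first processor of the destination's group of 4.
    batches = combine_by_next_hop(outgoing, lambda dst: (dst // GROUP) * GROUP)
    print(f"{len(outgoing)} messages -> {len(batches)} combined sends")
```

Here the 15 messages leaving processor 5 become 4 combined sends; the choice of `next_hop` is exactly what a virtual topology, such as the mesh below, defines.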
Virtual Topology: Mesh
• Organize processors in a 2D (virtual) mesh
• A message from (x1,y1) to (x2,y2) goes via (x1,y2)
• About 2√P messages per processor instead of P-1
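The mesh scheme can be checked with a small simulation. This sketch assumes P is a perfect square and numbers processors row-major, so processor p sits at (x, y) = divmod(p, √P); the name `mesh_aapc` is hypothetical. Stage 1 combines each sender's messages by intermediate (x1, y2) in the sender's row; stage 2 forwards them along the column to (x2, y2).

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Msg = Tuple[int, int]  # (source rank, destination rank)


def mesh_aapc(P: int) -> Tuple[Dict[int, List[Msg]], int]:
    """Simulate AAPC over a virtual sqrt(P) x sqrt(P) mesh using the route
    (x1, y1) -> (x1, y2) -> (x2, y2).

    Returns (delivered, max_sends): delivered[p] lists the messages that
    reached processor p; max_sends is the largest number of combined
    messages any single processor sent over both stages.
    """
    side = math.isqrt(P)
    if side * side != P:
        raise ValueError("this sketch assumes P is a perfect square")

    def coords(p: int) -> Tuple[int, int]:
        return divmod(p, side)  # row-major: (x, y) = (row, column)

    def rank(x: int, y: int) -> int:
        return x * side + y

    sends: Dict[int, int] = defaultdict(int)

    # Stage 1: combine each sender's messages by intermediate (x1, y2),
    # a processor in the sender's own row.
    held: Dict[int, List[Msg]] = defaultdict(list)
    for src in range(P):
        x1, _ = coords(src)
        by_mid: Dict[int, List[Msg]] = defaultdict(list)
        for dst in range(P):
            if dst != src:
                by_mid[rank(x1, coords(dst)[1])].append((src, dst))
        for mid, batch in by_mid.items():
            if mid != src:  # keeping a batch locally costs no send
                sends[src] += 1
            held[mid].extend(batch)

    # Stage 2: each intermediate combines what it holds by final
    # destination (x2, y2), a processor in the intermediate's column.
    delivered: Dict[int, List[Msg]] = defaultdict(list)
    for mid in range(P):
        by_dst: Dict[int, List[Msg]] = defaultdict(list)
        for src, dst in held[mid]:
            by_dst[dst].append((src, dst))
        for dst, batch in by_dst.items():
            if dst != mid:
                sends[mid] += 1
            delivered[dst].extend(batch)

    return dict(delivered), max(sends.values(), default=0)


if __name__ == "__main__":
    for P in (16, 64, 256):
        _, max_sends = mesh_aapc(P)
        print(f"P={P}: {max_sends} combined sends per processor vs {P - 1} direct")
```

Each processor sends exactly 2(√P - 1) combined messages (6, 14 and 30 for P = 16, 64 and 256), which is the "about 2√P" above. The trade-off is that most messages cross two hops, so total data volume nearly doubles; the scheme pays off when α dominates, i.e. for small messages.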