CMAQ Parallel Performance
This document summarizes parallel performance results for the September 2003 release of CMAQ, run on a 118 x 118 horizontal grid (2 km) with 21 vertical layers and 84 species. On a Linux cluster of dual-processor Xeon nodes, a new version of the code simulates 1 day in 6.7 hours on one processor (about 102 days per simulated year) and in 26 minutes on 32 processors (about 6.8 days per simulated year). The document also lists ideas that may improve serial or parallel performance: an alternative to numerical quadrature in the aerosol module, a damped Newton-Raphson solver for chemistry, cache-friendly array orderings in vertical diffusion, a reduced-communication formulation of the PPM advection scheme, and faster I/O.
CMAQ Parallel Performance
• CMAQ Sept 2003 release, cb4_ae3_aq chemical mechanism
• 118 x 118 xy grid (2 km), 21 z-layers, 84 species
• 24-hour run, output every 2 hrs
• HP Linux cluster, 3.0 GHz Xeon (dual-proc), 4 GB/node, Myrinet, PVFS
• New version on 1 proc: 1 day in 6.7 hrs; 1 year in 102 days
• New version on 32 procs: 1 day in 26 min; 1 year in 6.8 days
[Chart: run time, original vs. modified version; legend: Original full, Computation, Original write, Modified write, Modified full]
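The timings above can be turned into the usual parallel metrics. This is a small worked calculation using only the slide's numbers; it assumes the cost per simulated day is constant, so a year is simply 365 times one day.

```python
# Figures taken from the slide (new version, 24-hour simulation).
SERIAL_HOURS_PER_DAY = 6.7     # 1 processor
PARALLEL_MIN_PER_DAY = 26.0    # 32 processors
NPROCS = 32

def days_per_simulated_year(minutes_per_simulated_day):
    """Wall-clock days for 365 simulated days, assuming cost scales linearly."""
    return 365 * minutes_per_simulated_day / (60 * 24)

serial_min = SERIAL_HOURS_PER_DAY * 60          # 402 minutes per simulated day
speedup = serial_min / PARALLEL_MIN_PER_DAY     # T(1 proc) / T(32 procs)
efficiency = speedup / NPROCS                   # fraction of ideal linear speedup

print(f"speedup    ~ {speedup:.1f}x on {NPROCS} procs")
print(f"efficiency ~ {efficiency:.0%}")
print(f"1 proc:   {days_per_simulated_year(serial_min):.0f} days per simulated year")
print(f"32 procs: {days_per_simulated_year(PARALLEL_MIN_PER_DAY):.1f} days per simulated year")
```

The serial extrapolation reproduces the slide's 102 days. Extrapolating 26 min per day gives about 6.6 days per year, slightly below the slide's 6.8 days; the slide does not explain the difference. The speedup of about 15.5x on 32 processors corresponds to roughly 48% parallel efficiency.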
Performance Improvement Ideas
May affect serial or parallel performance:
• Aerosol: alternative to numerical quadrature
• Chemistry: try a damped Newton-Raphson solver
• Vdiff: array orderings for cache effects
• Xadv/Yadv: alternate formulation of PPM (piecewise parabolic method) for reduced communication
• I/O:
  - binary vs. NetCDF (10x)
  - multiple procs read & write
  - pre- and post-processing of files
[Chart: run-time comparison; surviving labels: 6.7 hrs, 40 min]
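The slide does not say which aerosol integrals are evaluated by quadrature. As a minimal sketch, assuming a lognormal mode (the modal form used by CMAQ's aerosol module) and using its k-th moment as a stand-in integral, the code compares midpoint-rule quadrature with the closed-form identity M_k = N Dg^k exp(k^2 ln^2(sigma_g) / 2). The function names and parameter values are hypothetical.

```python
import math

def lognormal_moment_quadrature(k, n_total, dg, sigma_g, n_points=200, width=8.0):
    """k-th moment of a lognormal mode by midpoint-rule quadrature in x = ln(D).

    Integrates D**k times the normal density of ln(D) over
    mu +/- width * ln(sigma_g); the tails outside that window are neglected.
    """
    mu = math.log(dg)
    s = math.log(sigma_g)
    a, b = mu - width * s, mu + width * s
    h = (b - a) / n_points
    total = 0.0
    for i in range(n_points):
        x = a + (i + 0.5) * h  # midpoint of the i-th sub-interval
        density = math.exp(-0.5 * ((x - mu) / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)
        total += math.exp(k * x) * density * h
    return n_total * total

def lognormal_moment_closed_form(k, n_total, dg, sigma_g):
    """Same moment from the standard lognormal identity: no quadrature needed."""
    return n_total * dg ** k * math.exp(0.5 * k ** 2 * math.log(sigma_g) ** 2)

# Hypothetical mode: number concentration, geometric mean diameter (um), geometric std. dev.
N, DG, SG = 1.0e3, 0.1, 2.0
for k in (0, 2, 3):
    q = lognormal_moment_quadrature(k, N, DG, SG)
    c = lognormal_moment_closed_form(k, N, DG, SG)
    print(k, q, c, abs(q - c) / c)
```

The closed form costs a few operations per call where the quadrature costs hundreds of function evaluations, which is the kind of saving the slide's idea points at; whether the actual CMAQ integrals have such closed forms is not stated in the source.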
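A damped Newton-Raphson solver shortens the Newton step whenever a full step would make the residual worse. The sketch below applies one backward-Euler step to a hypothetical two-species toy mechanism (not cb4) and halves the step until the residual norm decreases and concentrations stay non-negative; the mechanism, rate constants, and function names are all assumptions made for illustration.

```python
import math

# Hypothetical toy mechanism (not cb4):  A -> 2B (rate K1*A),  B + B -> A (rate K2*B^2)
K1, K2 = 1.0, 1.0e3

def rates(y):
    a, b = y
    return [-K1 * a + K2 * b * b, 2.0 * K1 * a - 2.0 * K2 * b * b]

def residual(y, y0, dt):
    """Backward-Euler residual F(y) = y - y0 - dt * f(y)."""
    f = rates(y)
    return [y[i] - y0[i] - dt * f[i] for i in range(2)]

def jacobian(y, dt):
    """dF/dy = I - dt * df/dy for the toy mechanism."""
    a, b = y
    return [[1.0 + dt * K1, -dt * 2.0 * K2 * b],
            [-dt * 2.0 * K1, 1.0 + dt * 4.0 * K2 * b]]

def solve2(m, r):
    """Solve the 2x2 linear system m x = r by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - r[0] * m[1][0]) / det]

def norm2(v):
    return math.sqrt(sum(x * x for x in v))

def damped_newton_step(y0, dt, tol=1e-10, max_iter=100):
    """One backward-Euler step solved by Newton-Raphson with step halving."""
    y = list(y0)
    for _ in range(max_iter):
        f = residual(y, y0, dt)
        fnorm = norm2(f)
        if fnorm < tol:
            return y
        dy = solve2(jacobian(y, dt), [-f[0], -f[1]])
        lam = 1.0
        while True:
            trial = [y[i] + lam * dy[i] for i in range(2)]
            ok = min(trial) >= 0.0 and norm2(residual(trial, y0, dt)) < fnorm
            if ok or lam < 1e-6:
                break
            lam *= 0.5  # damping: halve the step until the residual shrinks
        y = trial
    raise RuntimeError("damped Newton did not converge")

y0 = [1.0, 0.0]
y1 = damped_newton_step(y0, dt=10.0)
print(y1, 2 * y1[0] + y1[1])  # 2A + B is conserved by this mechanism
```

With the stiff rate K2 and a large time step, the undamped first step overshoots badly; halving it a few times keeps the iteration stable, after which full Newton steps converge quickly.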
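Vertical diffusion works column by column, so what matters for the cache is how far apart successive layers of one column sit in memory. CMAQ is written in Fortran, where the first array index varies fastest. The sketch computes that layer stride for two hypothetical orderings of a concentration array with this run's dimensions; which ordering CMAQ actually uses is not stated on the slide.

```python
def fortran_strides(dims):
    """Element strides for a column-major (Fortran) array: the first index varies fastest."""
    strides, step = [], 1
    for n in dims:
        strides.append(step)
        step *= n
    return strides

NCOLS, NROWS, NLAYS, NSPC = 118, 118, 21, 84

# Two hypothetical orderings of a concentration array:
layer_last = fortran_strides([NCOLS, NROWS, NLAYS, NSPC])   # (col, row, lay, spc)
layer_first = fortran_strides([NLAYS, NCOLS, NROWS, NSPC])  # (lay, col, row, spc)

# Vertical diffusion sweeps k = 1..NLAYS within one column, so the stride
# along the layer index determines how many cache lines each sweep touches.
print("layer stride, (col,row,lay,spc):", layer_last[2], "elements")
print("layer stride, (lay,col,row,spc):", layer_first[0], "elements")
```

With the layer index last, consecutive layers are 13,924 elements apart, so each layer access lands on a different cache line; with the layer index first, a column is contiguous.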
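PPM reconstructs each cell's profile from several neighbouring cells, so in a domain decomposition each processor must receive a band of ghost columns from its neighbours before x-advection. The sketch counts that traffic for one interior subdomain. It assumes a hypothetical 8 x 4 processor grid for the 32-processor run, 4-byte values, and illustrative ghost widths of 3 and 2; none of these are taken from CMAQ, and the slide does not describe the alternate formulation itself.

```python
import math

NCOLS, NROWS, NLAYS, NSPC = 118, 118, 21, 84

def xadv_halo_values(ghost, local_rows, nlays=NLAYS, nspc=NSPC):
    """Values received by one interior subdomain per x-advection exchange.

    Two x-faces (west and east), each `ghost` columns deep, for every local
    row, layer and species.
    """
    return 2 * ghost * local_rows * nlays * nspc

ROWS = math.ceil(NROWS / 4)   # hypothetical 8 x 4 decomposition -> at most 30 rows per subdomain
for ghost in (3, 2):
    nbytes = 4 * xadv_halo_values(ghost, ROWS)   # assuming 4-byte REAL values
    print(f"ghost={ghost}: {nbytes / 1e6:.2f} MB per exchange")
```

Because the exchanged volume is proportional to the ghost width, any reformulation that narrows the stencil reduces communication by the same ratio (one third, going from 3 to 2 columns).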
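The idea behind "multiple procs read & write" is that each process writes its own part of a field directly, instead of sending everything to one process first. The sketch assumes a hypothetical row-block decomposition and a raw float32 file layout (not CMAQ's netCDF format, and not a reproduction of the slide's 10x figure): each simulated rank computes its byte offset and writes only its rows. The ranks run sequentially here; a real implementation would use MPI-IO or a parallel file system such as the PVFS listed on the first slide.

```python
import array
import os
import tempfile

FLOAT_BYTES = array.array("f").itemsize  # 4 bytes (float32) on common platforms

def row_block(rank, nranks, nrows):
    """Rows [start, stop) owned by `rank` in a block decomposition by rows."""
    base, extra = divmod(nrows, nranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def write_rows(path, start_row, ncols, values):
    """Write a contiguous block of rows at its byte offset in a shared raw file."""
    with open(path, "r+b") as f:
        f.seek(start_row * ncols * FLOAT_BYTES)
        array.array("f", values).tofile(f)

def read_all(path, nrows, ncols):
    data = array.array("f")
    with open(path, "rb") as f:
        data.fromfile(f, nrows * ncols)
    return data

NROWS, NCOLS, NRANKS = 118, 118, 32
path = os.path.join(tempfile.mkdtemp(), "layer.bin")
with open(path, "wb") as f:
    f.write(bytes(NROWS * NCOLS * FLOAT_BYTES))   # preallocate the file with zeros

# Each "rank" writes only its own rows; in a real run these writes would be concurrent.
for rank in range(NRANKS):
    start, stop = row_block(rank, NRANKS, NROWS)
    values = [float(r * NCOLS + c) for r in range(start, stop) for c in range(NCOLS)]
    write_rows(path, start, NCOLS, values)

field = read_all(path, NROWS, NCOLS)
print(field[:3], field[-1])
```

Because every offset is computed locally, no rank needs to know what the others are writing, which is what allows the writes to proceed in parallel.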