ZioLib, Parallel I/O Library
Woo-Sun Yang and Chris Ding, Computational Research Division, Lawrence Berkeley National Laboratory. Figures: parallel netCDF write (256 × 256 × 256) and parallel netCDF read (256 × 256 × 256). Axes: Height (Z), Latitude (Y), Longitude (X).
Presentation Transcript
ZioLib, Parallel I/O Library
Woo-Sun Yang and Chris Ding
Computational Research Division, Lawrence Berkeley National Laboratory
ZioLib uses I/O staging processors for Z-decomposition
[Figure: axes are Height (Z), Latitude (Y) and Longitude (X). The distributed array, in (X,Z,Y) index order, is remapped at the I/O staging PEs into (X,Y,Z) index order. The I/O staging PEs then write the global field in parallel.]
• Relieves memory limitations of a PE
• Relieves congestion on I/O nodes
• Writes/reads in large blocks (no seeks) in parallel
• Eliminates gather/scatter from user codes
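The remap on this slide can be sketched without MPI as a small pure-Python simulation. This is only an illustration of the index arithmetic, not ZioLib's implementation: the function names are hypothetical, and the sketch assumes the compute PEs hold latitude (Y) slabs and that X varies fastest in memory, as in Fortran.

```python
# Pure-Python simulation of the (X,Z,Y) -> (X,Y,Z) remap.
# No MPI is used; every name here is hypothetical, not part of ZioLib.

def make_field(nx, ny, nz):
    """Map (x, y, z) to its linear offset in a global (X,Y,Z)-ordered file."""
    return {(x, y, z): x + nx * (y + ny * z)
            for z in range(nz) for y in range(ny) for x in range(nx)}

def split_range(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal sub-ranges."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append(range(start, stop))
        start = stop
    return bounds

def local_xzy_block(field, nx, nz, y_range):
    """Flatten one compute PE's latitude slab in (X,Z,Y) order, X fastest."""
    return [field[(x, y, z)]
            for y in y_range for z in range(nz) for x in range(nx)]

def remap_to_staging(blocks, y_ranges, nx, ny, nz, z_ranges):
    """Build each staging PE's Z slab in (X,Y,Z) order from the local blocks."""
    staging = []
    for z_range in z_ranges:
        slab = [None] * (nx * ny * len(z_range))
        for block, y_range in zip(blocks, y_ranges):
            for j, y in enumerate(y_range):        # j: local Y index on the compute PE
                for k, z in enumerate(z_range):    # k: local Z index on the staging PE
                    for x in range(nx):
                        src = x + nx * (z + nz * j)  # offset in local (X,Z,Y) block
                        dst = x + nx * (y + ny * k)  # offset in staging (X,Y,Z) slab
                        slab[dst] = block[src]
        staging.append(slab)
    return staging

nx, ny, nz = 4, 6, 3
field = make_field(nx, ny, nz)
y_ranges = split_range(ny, 3)   # assumption: three compute PEs, Y-decomposition
z_ranges = split_range(nz, 2)   # assumption: two staging PEs, Z-decomposition
blocks = [local_xzy_block(field, nx, nz, r) for r in y_ranges]
staging = remap_to_staging(blocks, y_ranges, nx, ny, nz, z_ranges)

# Each staging slab is one contiguous piece of the (X,Y,Z)-ordered file,
# so concatenating the slabs reproduces the global field with no seeks.
file_image = [v for slab in staging for v in slab]
```

Because each staging PE owns a contiguous range of Z levels, its slab maps to a single large block of the output file, which is the property the bullets above rely on.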
Current status of ZioLib
• A set of Fortran 90 modules supporting
    • netCDF I/O (serial and parallel)
    • direct-access unformatted I/O (serial and parallel)
    • sequential-access unformatted I/O (serial)
• Works for arrays of any number of dimensions of integer*4, real*4 and real*8
• Reads or writes in any array index order
• Works with any parallel decomposition
• Can handle ghost nodes
• Uses MPI-1 routines only, so it can still work for serial I/O on machines without a parallel file system, a parallel netCDF library or MPI-2
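The slide does not say how ghost nodes are handled. As a minimal sketch of what that typically involves, the hypothetical helper below drops a halo of ghost cells from a 2-D local block so that only interior values would be passed on for I/O. The function name and the list-of-rows layout are assumptions made for illustration.

```python
def strip_ghosts(block, ghost):
    """Return the interior of a 2-D block (list of rows) that carries
    `ghost` halo cells on every side. Hypothetical helper, not ZioLib's API."""
    rows = block[ghost:len(block) - ghost]
    return [row[ghost:len(row) - ghost] for row in rows]

# A 2 x 3 interior surrounded by one layer of ghost cells (marked -1).
local_block = [
    [-1, -1, -1, -1, -1],
    [-1, 10, 11, 12, -1],
    [-1, 20, 21, 22, -1],
    [-1, -1, -1, -1, -1],
]
interior = strip_ghosts(local_block, ghost=1)
```

Here `interior` holds only the two rows of owned values, with the halo removed on all four sides.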
Direct-access write (256 × 256 × 256; XZY to XYZ)
[Figure; legend text: transpose global array total remap]
Direct-access write (256 × 256 × 256; XZY to XYZ): speed-up w.r.t. existing MPI + single-PE I/O
More on testing
• Direct-access I/O with T42L26 resolution (128 × 64 × 26: 1.625 MB)
    • Write: speed-up of 3 to 4 times
    • Read: speed-up of 6 to 7 times
• CAM2.0 history I/O with 8, 16 and 32 processors
    • with EUL (T42L26, Y-decomposition) and FV (B26, 2D-decomposition), load-balancing chunking turned off
    • used the serial netCDF with one staging processor
    • speed-up of 1.5 to 2.5 times (with serial netCDF only)
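The 1.625 MB figure quoted for the T42L26 test can be checked with a few lines of arithmetic, reading the grid as 128 × 64 × 26 points. The sketch assumes 8-byte (real*8) values and 1 MB = 2^20 bytes; the slide states neither, but the numbers agree exactly under those assumptions.

```python
nx, ny, nz = 128, 64, 26      # T42 horizontal grid with 26 levels
bytes_per_value = 8           # assumption: real*8 data
size_bytes = nx * ny * nz * bytes_per_value
size_mb = size_bytes / 2**20  # assumption: 1 MB = 2**20 bytes
print(size_mb)                # 1.625
```

At this size a field is small enough that per-call overheads matter, which is worth keeping in mind when comparing these speed-ups with the 256 × 256 × 256 results.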