ZioLib, Parallel I/O Library

Presentation Transcript

  1. ZioLib, Parallel I/O Library
     Woo-Sun Yang and Chris Ding
     Computational Research Division, Lawrence Berkeley National Laboratory

  2. Parallel netCDF write (256 × 256 × 256)

  3. Parallel netCDF read (256 × 256 × 256)

  4. ZioLib uses I/O staging processors for Z-decomposition
     [Figure: the distributed array, in (X,Z,Y) index order with X = longitude, Y = latitude and Z = height, is remapped at the I/O staging PEs into (X,Y,Z) index order; the I/O staging PEs then write the global field in parallel.]
     • Relieves memory limitations of a PE
     • Relieves congestion on I/O nodes
     • Writes/reads in large blocks (no seeks) in parallel
     • Eliminates gather/scatter from user codes
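The remap in slide 4 can be illustrated with a small, serial Python sketch. It assumes Fortran-style column-major storage (first index varies fastest), real*8 elements, and an as-even-as-possible split of the Z levels across the staging PEs; the function names are hypothetical and are not ZioLib's API. It reorders an (X,Z,Y) buffer into (X,Y,Z) order and shows that each staging PE's Z-slab then occupies one contiguous byte range of a direct-access file, which is why a staging PE can write its block without seeks.

```python
# Hypothetical sketch (not ZioLib's API): remap (X,Z,Y) -> (X,Y,Z) and
# assign contiguous Z-slabs to I/O staging PEs.
# Assumes column-major (Fortran) storage: the first index varies fastest.

def remap_xzy_to_xyz(buf, nx, ny, nz):
    """Reorder a flat column-major (X,Z,Y) buffer into (X,Y,Z) order."""
    out = [None] * (nx * ny * nz)
    for y in range(ny):
        for z in range(nz):
            for x in range(nx):
                src = x + nx * (z + nz * y)   # offset in (X,Z,Y) layout
                dst = x + nx * (y + ny * z)   # offset in (X,Y,Z) layout
                out[dst] = buf[src]
    return out


def z_slabs(nz, n_staging):
    """Split nz levels into n_staging contiguous slabs, as evenly as possible."""
    base, extra = divmod(nz, n_staging)
    slabs, start = [], 0
    for pe in range(n_staging):
        count = base + (1 if pe < extra else 0)
        slabs.append((start, count))
        start += count
    return slabs


def slab_byte_range(nx, ny, z_start, z_count, elem_size=8):
    """Byte range [first, last) of a Z-slab in an (X,Y,Z) direct-access file.

    elem_size=8 assumes real*8 data.
    """
    plane = nx * ny * elem_size          # bytes in one XY plane
    return z_start * plane, (z_start + z_count) * plane


if __name__ == "__main__":
    nx, ny, nz = 4, 3, 5
    # Tag each element with its (x, y, z) so the reordering is visible.
    xzy = [None] * (nx * ny * nz)
    for y in range(ny):
        for z in range(nz):
            for x in range(nx):
                xzy[x + nx * (z + nz * y)] = (x, y, z)
    xyz = remap_xzy_to_xyz(xzy, nx, ny, nz)
    print(xyz[:3])
    for pe, (z0, count) in enumerate(z_slabs(nz, 2)):
        print(pe, (z0, count), slab_byte_range(nx, ny, z0, count))
```

With a 4 × 3 × 5 array and two staging PEs, PE 0 receives levels 0 to 2 and writes bytes 0 to 288, while PE 1 receives levels 3 and 4 and writes bytes 288 to 480: the two blocks abut, so neither PE needs to seek within its own block.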

  5. Current status of ZioLib
     • A set of Fortran 90 modules supporting:
        • netCDF I/O (serial and parallel)
        • direct-access unformatted I/O (serial and parallel)
        • sequential-access unformatted I/O (serial)
     • Works for arrays of any number of dimensions of type integer*4, real*4 and real*8
     • Reads or writes in any array index order
     • Works with any parallel decomposition
     • Can handle ghost nodes
     • Uses MPI-1 routines only; can still work for serial I/O on machines without a parallel file system, a parallel netCDF library or MPI-2
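Supporting any decomposition with ghost nodes implies that each PE contributes only the interior of its local block to the global field. The Python sketch below extracts that interior for an array of any number of dimensions. It assumes column-major storage and a fixed ghost-layer width on both sides of each dimension; the function name `interior` and its arguments are hypothetical and do not describe ZioLib's actual interface.

```python
from itertools import product


def interior(local, dims, ghost):
    """Return the interior (non-ghost) points of a column-major local block.

    Hypothetical helper, for illustration only.
    local -- flat list holding the local block, ghost layers included
    dims  -- local extents including ghost layers, e.g. (nx + 2*g, ny + 2*g)
    ghost -- ghost-layer width on each side, one entry per dimension
    """
    inner = [d - 2 * g for d, g in zip(dims, ghost)]
    out = []
    # product() varies its LAST argument fastest, so iterate over the
    # dimensions in reverse to keep the first index fastest (column-major).
    for rev_idx in product(*(range(n) for n in reversed(inner))):
        idx = rev_idx[::-1]
        off, stride = 0, 1
        for i, g, d in zip(idx, ghost, dims):
            off += (i + g) * stride   # shift past the ghost layer
            stride *= d               # stride uses the full local extent
        out.append(local[off])
    return out
```

For a 4 × 3 local block with one ghost layer on each side, `interior(list(range(12)), (4, 3), (1, 1))` returns `[5, 6]`, the two interior points at (1, 1) and (2, 1). The result is in column-major order, so it can be fed directly to an index-order remap such as the one illustrated for slide 4.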

  6. Direct-access write (256 × 256 × 256; XZY to XYZ)
     [Chart with series labeled transpose, global array, total and remap]

  7. Direct-access write (256 × 256 × 256; XZY to XYZ): speed-up with respect to the existing MPI + single-PE I/O

  8. More on testing
     • Direct-access I/O at T42L26 resolution (128 × 64 × 26: 1.625 MB)
        • Write: speed-up of 3-4
        • Read: speed-up of 6-7
     • CAM2.0 history I/O with 8, 16 and 32 processors
        • with EUL (T42L26, Y-decomposition) and FV (B26, 2D decomposition), with load-balancing chunking turned off
        • used serial netCDF with one staging processor
        • speed-up of 1.5-2.5 (with serial netCDF only)
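The 1.625 MB field size in slide 8 is consistent with one real*8 field on the 128 × 64 × 26 grid if MB is read as 2^20 bytes. The 8-byte element size is an assumption; the slide does not state the data type.

```latex
\[
128 \times 64 \times 26 = 212\,992 \text{ points}, \qquad
212\,992 \times 8\ \text{bytes} = 1\,703\,936\ \text{bytes}
= \frac{1\,703\,936}{2^{20}}\ \text{MiB} = 1.625\ \text{MiB}.
\]
```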