
WRF Software Development and Performance


Presentation Transcript


1. WRF Software Development and Performance
John Michalakes, NCAR
NCAR: W. Skamarock, J. Dudhia, D. Gill, A. Bourgeois, W. Wang, C. Deluca, R. Loft
NOAA/NCEP: Tom Black, Jim Purser, S. Gopal
NOAA/FSL: T. Henderson, J. Middlecoff, L. Hart
U. Oklahoma: M. Xue
AFWA: J. Wegiel, D. McCormick
Also: C. Coats (MCNC), J. Schmidt (NRL), V. Balaji (GFDL), S. Chen (UC Davis), J. Edwards (IBM)
Acknowledgement: significant funding for WRF software development from the DoD HPCMO CHSSI program (CWO6)

2. Goals
• Community model
• Good performance
• Portable across a range of architectures
• Flexible, maintainable, understandable
• Facilitates code reuse
• Multiple dynamics/physics options
• Run-time configurable
• Nesting

Aspects of the design
• Single-source code
• Fortran90 modules, dynamic memory, structures, recursion
• Hierarchical software architecture
• Multi-level parallelism
• CASE: the Registry
• Package-neutral APIs for I/O, data formats, and communication
• IKJ storage order (vectors aren't dead yet; see the loop sketch after this slide)
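The IKJ storage order called out above is worth a concrete illustration: fields are dimensioned (i, k, j), and because Fortran is column-major the first index varies fastest in memory, so keeping the i loop innermost gives unit-stride access that suits vector units and cache-based microprocessors alike. A minimal sketch (illustrative only, not WRF source):

    program ikj_order
      implicit none
      integer, parameter :: ide = 64, kde = 35, jde = 48
      real    :: t(ide, kde, jde)   ! (i,k,j) ordering: i is fastest-varying
      integer :: i, k, j

      do j = 1, jde                 ! slowest-varying index outermost
        do k = 1, kde               ! vertical levels
          do i = 1, ide             ! unit-stride inner loop
            t(i,k,j) = real(i + k + j)
          end do
        end do
      end do
      print *, 'checksum:', sum(t)
    end program ikj_order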

3. Software Architecture
[Diagram: the hierarchical software architecture. A package-independent driver layer (driver, config inquiry, DM comm, I/O API) sits above a mediation layer (solve, OMP, config module), which calls the WRF model layer of tile-callable subroutines; package-dependent external packages provide data formats/parallel I/O, message passing, and threads.]
• Driver layer: I/O, communication, multiple nests, state data
• Model-layer routines: computational, tile-callable, thread-safe (a schematic interface follows this slide)
• Mediation layer: the interface between the model and driver layers
• Interfaces to external packages
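To make "tile-callable, thread-safe" concrete, the sketch below shows a model-layer routine in the spirit of that convention: it does computation only (no I/O, no communication), it touches only the tile index range it is handed, and it receives domain, memory (allocated), and tile dimensions as arguments. The routine name, argument list, and placeholder computation are illustrative, not actual WRF source.

    ! Schematic model-layer routine: thread-safe because it writes only
    ! within its tile range and keeps no state of its own.
    subroutine relax_tile( f, tend,                          &
                           ids, ide, kds, kde, jds, jde,     &  ! domain dims
                           ims, ime, kms, kme, jms, jme,     &  ! memory dims
                           its, ite, kts, kte, jts, jte )       ! tile dims
      implicit none
      integer, intent(in) :: ids, ide, kds, kde, jds, jde
      integer, intent(in) :: ims, ime, kms, kme, jms, jme
      integer, intent(in) :: its, ite, kts, kte, jts, jte
      real,    intent(in)  :: f(ims:ime, kms:kme, jms:jme)
      real,    intent(out) :: tend(ims:ime, kms:kme, jms:jme)
      integer :: i, k, j

      ! Domain dims are carried so boundary-aware code could test against
      ! them; this placeholder computation does not need to.
      do j = jts, jte
        do k = kts, kte
          do i = its, ite
            tend(i,k,j) = -0.1 * f(i,k,j)   ! placeholder tendency
          end do
        end do
      end do
    end subroutine relax_tile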

4. WRF Multi-Layer Domain Decomposition
A single version of the code executes efficiently on distributed-memory machines, shared-memory machines, clusters of SMPs, and both vector and microprocessor architectures.
[Diagram: a logical domain decomposed into patches, with one patch divided into multiple tiles; inter-processor communication takes place across patch boundaries.]
Model domains are decomposed for parallelism at two levels:
• Patch: the section of the model domain allocated to a distributed-memory node
• Tile: the section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model-layer subroutine
• Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches (see the driver sketch after this slide)
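A minimal sketch of how the two levels combine, reusing the relax_tile routine sketched above: distributed-memory parallelism (MPI, not shown) has already assigned this patch to the process, and shared-memory parallelism appears as an OpenMP loop over the patch's tiles. The tile bookkeeping arrays here are made up for the sketch and are not WRF's actual data structures.

    ! Schematic mediation/driver-level loop over tiles for one patch.
    subroutine solve_patch( f, tend, num_tiles,              &
                            i_start, i_end, j_start, j_end,  &
                            ids, ide, kds, kde, jds, jde,    &
                            ims, ime, kms, kme, jms, jme )
      implicit none
      integer, intent(in) :: num_tiles
      integer, intent(in) :: i_start(num_tiles), i_end(num_tiles)
      integer, intent(in) :: j_start(num_tiles), j_end(num_tiles)
      integer, intent(in) :: ids, ide, kds, kde, jds, jde
      integer, intent(in) :: ims, ime, kms, kme, jms, jme
      real,    intent(in)  :: f(ims:ime, kms:kme, jms:jme)
      real,    intent(out) :: tend(ims:ime, kms:kme, jms:jme)
      integer :: ij

      ! Each tile goes to a shared-memory thread; tiles do not overlap,
      ! so the model-layer calls are independent.
      !$omp parallel do
      do ij = 1, num_tiles
        call relax_tile( f, tend,                            &
                         ids, ide, kds, kde, jds, jde,       &
                         ims, ime, kms, kme, jms, jme,       &
                         i_start(ij), i_end(ij),             &
                         kds, kde,                           &  ! tiles span the full column here
                         j_start(ij), j_end(ij) )
      end do
      !$omp end parallel do
    end subroutine solve_patch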

5. I/O Architecture
Requirements of the I/O infrastructure:
• Efficiency: the key concern for operations
• Flexibility: the key concern for research
• Both types of user institution are already heavily invested in I/O infrastructure: operations in GRIB and BUFR, research in NetCDF and HDF
• "Portable I/O": adaptable to a range of uses and installations without affecting WRF or the other programs that use the I/O infrastructure

6. I/O Architecture
WRF I/O API:
• Package-independent interface to NetCDF, fast binary, and HDF (planned)
• Random access of fields by timestamp and name
• Full transposition to arbitrary memory order
• Built-in support for read/write of parallel file systems (planned)
• Data-set centric rather than file-centric (planned); grid computing
Additional WRF model functionality:
• Collection/distribution of decomposed data to serial datasets
• Fast, asynchronous "quilt-server" I/O from the NCEP Eta model
(A sketch of the package-neutral idea follows this slide.)
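The package-neutral idea can be sketched as a single entry point that addresses a field by name and timestamp and dispatches to whichever format backend was selected, so the model itself never calls NetCDF, fast-binary, or HDF routines directly. The module and routine names below are invented for the sketch and are not the WRF I/O API.

    ! Illustrative shape of a package-neutral I/O layer.
    module io_neutral
      implicit none
      integer, parameter :: IO_NETCDF = 1, IO_BINARY = 2
    contains
      subroutine write_field( package, fieldname, timestamp, field )
        integer,          intent(in) :: package
        character(len=*), intent(in) :: fieldname, timestamp
        real,             intent(in) :: field(:,:,:)
        select case (package)
        case (IO_NETCDF)
          print *, 'netCDF backend writes ', fieldname, ' at ', timestamp
          ! ... NetCDF-specific calls would go here ...
        case (IO_BINARY)
          print *, 'fast-binary backend writes ', fieldname, ' at ', timestamp
          ! ... fast-binary calls would go here ...
        end select
      end subroutine write_field
    end module io_neutral

A caller would then write, for example, call write_field( IO_NETCDF, 'T', '2002-04-01_00:00:00', t ) and could switch formats by changing only the package argument; field access by timestamp and name, memory-order transposition, and the planned parallel-file-system support all live behind the same interface.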

7. I/O Performance
[Chart: output rate in bytes/second versus number of I/O servers (0, 1, 4) for netCDF and binary output; the y-axis runs from 0 to 120,000,000 bytes/second, with rates of 5 MB/s and 16 MB/s annotated.]

8. WRF Performance
Platforms:
• IBM SP (blackforest.ucar.edu): 293 nodes, each with 4 x 375 MHz Power3 CPUs; peak 1500 Mflop/s per CPU
• Compaq TCS (lemieux.psc.edu): 750 nodes, each with 4 x 1 GHz EV68 CPUs; peak 2000 Mflop/s per CPU
Benchmark case: 12 km CONUS, 425 x 300 x 35 grid (4.5 million cells), about 22 Gflop per time step; the 48-hour forecast runs in 21 minutes on 128 processors and 8 minutes on 512 processors (I/O time not included).
Results:
• Scaling efficiency (32 to 512 processors): IBM 69%, Compaq 57%
• Efficiency relative to peak: IBM 7%, Compaq 20% at 32 processors; IBM 5%, Compaq 11% at 512 processors
• Sustained performance: IBM 39 Gflop/s, Compaq 110 Gflop/s
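As a consistency check using only the numbers on this slide: 512 Compaq processors at a 2000 Mflop/s peak give 1024 Gflop/s, so 110 Gflop/s sustained is about 11% of peak; 512 IBM processors at 1500 Mflop/s give 768 Gflop/s, so 39 Gflop/s sustained is about 5% of peak, matching the quoted efficiencies.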

9. Model Performance
Efficiency with respect to other models:
• WRF costs about 2x the NCEP Eta model (as of mid-2001)
• Complexity: WRF performs about 1.6 times more operations for a given period of integration
• Code efficiency: WRF runs at about 0.78 of Eta's rate
• Scientific or forecast efficiency...?
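The two factors account for the headline figure: performing about 1.6 times the operations at about 0.78 of Eta's per-operation rate gives a relative cost of roughly 1.6 / 0.78 ≈ 2.1, consistent with "about 2x the cost of Eta".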

10. Summary
Status:
• Third release, WRF V1.2, April 2002
• Systems: IBM, Compaq, SGI, PC/Alpha Linux
• Nesting and 3DVAR: first implementations this summer
The WRF software architecture is designed to support development and maintenance as a community model, serving operational and research users over a range of applications and on a variety of computing architectures.
Additional information: http://www.wrf-model.org
