
Computational Needs Going Forward


Presentation Transcript


  1. Computational Needs Going Forward
     Quentin F. Stout

  2. Some Differences
     • We have changed
       • Initially, most computation was for code development
       • Models: 1D, 2D, 3D
       • Did not need all resources provided
       • Now in transition to production UQ runs
       • Will have extensive, continual usage
     • Within PSAAP, we are somewhat unique
       • Appear to be more involved with large development runs (other than DATs), thus encounter more problems:
         • scheduling
         • cluster performance
         • bandwidth
         • I/O . . .

  3. Access and Performance
     • Scheduling overly optimized for small jobs
       • DATs suitable for production use of large jobs
       • DATs poor for code development
       • Other than DATs, on Lobo (LANL) large jobs (1000 cores, 16 hours) can only be iterated twice per week
         • ≈ 75-hour wait in queue
     • System errors more likely to appear on large jobs
       • Stressed the I/O system, encountered serious problems
       • Uncovered node performance problems on Hera (LLNL)
       • Overall, Lobo has many more problems
     • Situation forced us to use local resources to produce the 3D results we have presented
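The twice-per-week iteration rate on this slide follows directly from the queue wait and run time it quotes. A minimal arithmetic sketch (using only the slide's own figures):

```python
# Rough check of how often a large job can be iterated on Lobo,
# given the ~75 hr queue wait and 16 hr run time quoted on the slide.
queue_wait_hr = 75   # observed wait in queue for a 1000-core job
run_time_hr = 16     # wall-clock time of one large run

hours_per_iteration = queue_wait_hr + run_time_hr   # 91 hr per cycle
iterations_per_week = 168 / hours_per_iteration     # 168 hr in a week

print(hours_per_iteration)              # 91
print(round(iterations_per_week, 1))    # 1.8, i.e. about twice per week
```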

  4. Application Mean Time to Interrupt
     John T. Daly, "Performance Challenges for Extreme Scale Computing," 2007

  5. Data Pathway Difficulties
     • Major impediment
     • ≈ 1 Mb/sec from Hera (LLNL)
       • timeout: 12 hr
     • ≈ 40 Mb/sec from Lobo (LANL)
       • timeout: 4 hr
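The bandwidth and timeout figures above bound how much data can leave each machine in a single transfer. A quick sketch of that bound, assuming Mb means megabits:

```python
# Upper bound on data moved over each pathway before the connection
# times out, from the bandwidths and timeouts quoted on the slide.
def max_transfer_gb(mbit_per_sec, timeout_hr):
    bits = mbit_per_sec * 1e6 * timeout_hr * 3600
    return bits / 8 / 1e9  # gigabytes

print(max_transfer_gb(1, 12))   # Hera: 5.4 GB per transfer window
print(max_transfer_gb(40, 4))   # Lobo: 72.0 GB per transfer window
```

Either bound is far below the output of a large 3D run, which is why the slide calls the pathway a major impediment.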

  6. Production CRASH UQ Runs
     • UQ will guide needs; current estimate:
       • 2D multigroup: > 1000 runs / year
         • each: 256 cores × 16 hr
       • 3D gray: > 100 runs / year
         • each: 1000 cores × 24 hr
     • Additional runs, such as sensitivity studies
     • 1000 × 2D-mg + 100 × 3D-g ≈ Hera + Lobo replacement
       • If Lobo replacement allocation is as expected
       • If everything works, and we get timely access
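The annual core-hour budget implied by these per-run sizes can be tallied directly (a sketch using the slide's lower-bound run counts; sensitivity studies would add more):

```python
# Core-hours per year implied by the CRASH UQ estimates on the slide.
mg_2d = 1000 * 256 * 16    # 1000 2D multigroup runs, 256 cores x 16 hr
gray_3d = 100 * 1000 * 24  # 100 3D gray runs, 1000 cores x 24 hr

print(mg_2d)            # 4,096,000 core-hours
print(gray_3d)          # 2,400,000 core-hours
print(mg_2d + gray_3d)  # ~6.5 M core-hours/year, before sensitivity studies
```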

  7. Production PDT UQ Runs
     • Based on UQ needs and scaling studies
       • 2D: 15 weekend DATs / year
         • each: 2048 cores × 16 hr
       • 3D: 10 weekend DATs / year
         • each: 8192 cores × 60 hr
     • 15 × 2D-PDT + 10 × 3D-PDT ≈ Hera + Lobo replacement
       • If everything works …
     • In addition to the production runs, UM + TAMU need ≈ 1 M core-hours for code development, scaling, etc.
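The same tally for the PDT runs, again a sketch from the slide's own per-DAT sizes:

```python
# Core-hours per year implied by the PDT UQ estimates on the slide.
pdt_2d = 15 * 2048 * 16   # 15 weekend DATs, 2048 cores x 16 hr
pdt_3d = 10 * 8192 * 60   # 10 weekend DATs, 8192 cores x 60 hr
dev = 1_000_000           # UM + TAMU development, scaling, etc.

print(pdt_2d)                   # 491,520 core-hours
print(pdt_3d)                   # 4,915,200 core-hours
print(pdt_2d + pdt_3d + dev)    # ~6.4 M core-hours/year in total
```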

  8. 3D Multigroup
     • Well-resolved 3D multigroup would be quite useful
       • However, ≈ 1 month on 1000 cores
     • Feasible on BlueGene?
       • Perhaps ≈ 2 days on 32K cores
     • Scaling is a serious concern
       • Strong scaling on Hera is poor
       • Reasonable weak scaling on Pleiades
       • Requires tuning for BG
       • Have run MHD on small BG
     • Initially investigate with modest effort
       • If it appears feasible, then will decide how to proceed
     • Not on critical path
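The "≈ 2 days on 32K cores" estimate is consistent with the one-month figure once scaling and core-speed losses are allowed for. A sketch, assuming a month is ~30 days (BG core speed and scaling efficiency are not modeled here):

```python
# Wall-clock implications of the 3D multigroup estimate.
core_hours = 30 * 24 * 1000            # ~1 month on 1000 cores
ideal_hours_on_bg = core_hours / 32768 # perfect scaling on 32K cores

print(core_hours)                # 720,000 core-hours
print(round(ideal_hours_on_bg))  # ~22 hr under ideal scaling; the slide's
                                 # "~2 days" leaves headroom for slower BG
                                 # cores and imperfect strong scaling
```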

  9. 3D PDT
     • Moderately resolved 3D PDT is even more daunting
       • 3D at 512 × 512 × 1024 ≈ 2.5 M core-hours
     • Only hope is BG, or a worthy successor
       • 2.5 M ≈ 32K cores × 80 hr
     • Again, scaling is a concern
       • BG scaling studies start next month
     • If it scales as hoped, this will be the largest Alliance use of BG
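A quick consistency check of the BG sizing on this slide (2.5 M core-hours spread across 32K cores):

```python
# Check the slide's BlueGene sizing for 3D PDT.
core_hours = 2.5e6
cores = 32 * 1024   # 32K cores

wall_hours = core_hours / cores
print(round(wall_hours, 1))  # 76.3 hr, consistent with the "~80 hr" quoted
```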

  10. Computational Challenges of the Coming Year
      • Continue transition to production UQ; continue code development, efficiency improvements, …
      • Scaling CRASH and PDT to BG (if it will remain available)
      • Challenges of:
        • Obtaining sufficient allocation
        • Improved end-to-end performance
        • Scheduling
        • Bandwidth
        • I/O
        • …
