1 / 32

UNEDF 2011 ANNUAL/FINAL MEETING


Presentation Transcript


  1. UNEDF 2011 ANNUAL/FINAL MEETING. Progress report on the BIGSTICK configuration-interaction code. Calvin Johnson (1), Erich Ormand (2), Plamen Krastev (1,2,3). (1) San Diego State University, (2) Lawrence Livermore Lab, (3) Harvard University. Supported by DOE Grants DE-FG02-96ER40985, DE-FC02-09ER41587, and DE-AC52-07NA27344.

  2. We have good news and bad news... both the same thing: the postdoc (Plamen Krastev) got a permanent staff position in scientific computing at Harvard.

  3. BIGSTICK: from REDSTICK to BIGSTICK
• General-purpose M-scheme configuration-interaction (CI) code
• On-the-fly calculation of the many-body Hamiltonian
• Fortran 90, MPI and OpenMP
• 35,000+ lines in 30+ files and 200+ subroutines
• Faster set-up
• Faster Hamiltonian application
• Rewritten for "easy" parallelization
• New parallelization scheme

  4. BIGSTICK: from REDSTICK to BIGSTICK
• Flexible truncation scheme: handles 'no-core' ab initio Nħω truncation, valence-shell (sd & pf shell) orbital truncation, np-nh truncations, and more.
• Applied to ab initio calculations, valence-shell calculations (in particular level densities, random-interaction studies, and benchmarking projected HF), cold atoms, and electronic structure of atoms (benchmarking RPA and HF for atoms).
Version 6.5 is available at NERSC: unedf/lcci/BIGSTICK/v650/

  5. BIGSTICK uses a factorization algorithm that reduces storage of the Hamiltonian arrays. [Figure: comparison of nonzero matrix storage with and without factorization; slide reused from TRIUMF, Feb 2011.]
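The storage saving behind factorization can be illustrated with a toy sketch (invented names and data layout; not BIGSTICK's actual Fortran data structures): matrix elements are reconstructed on the fly from separately stored proton and neutron "jumps", so storage scales with the number of jumps rather than with the number of nonzero matrix elements.

```python
import numpy as np

def matvec_factorized(x, proton_jumps, neutron_jumps, n_n):
    """Apply H to x without ever storing H.

    A many-body basis state is a (proton config ip, neutron config in_)
    pair, flattened to index ip * n_n + in_. Each jump is a tuple
    (from, to, amplitude); a nonzero of H is the product of one proton
    jump and one neutron jump, so storing J_p + J_n jumps regenerates
    J_p * J_n matrix elements on the fly.
    """
    y = np.zeros_like(x)
    for (pi, pf, pa) in proton_jumps:
        for (ni, nf, na) in neutron_jumps:
            y[pf * n_n + nf] += pa * na * x[pi * n_n + ni]
    return y
```

In this toy model the memory cost is linear in the jump lists, while a stored sparse matrix would be quadratic in them; that is the heart of the storage comparison shown on the slide.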

  6. BIGSTICK: Micah Schuster, Physics MS project [figure]

  7. BIGSTICK: Joshua Staker, Physics MS project [figure]

  8.-11. BIGSTICK [figure slides]

  12. Major accomplishment as of last year: excellent scaling of the mat-vec multiply. This demonstrates that our factorization algorithm, as predicted, facilitates efficient distribution of mat-vec operations.

  13. Major accomplishments after the last UNEDF meeting:
• Rebalanced workload with an additional constraint on the dimension of local Lanczos vectors (Krastev)
• Fully distributed Lanczos vectors with hermiticity on (Krastev)
• Major steps toward distributing Lanczos vectors with suppressed hermiticity (Krastev)
• OpenMP implementation in the matrix-vector multiply (Ormand & Johnson)
• Significant progress on the 3-body implementation (Johnson & Ormand)
• Added a restart option (Johnson)
• Implemented in-lined 1-body density matrices (Johnson)

  14. Highlighted accomplishments for 2010-2011:
• Add OpenMP
• Reduce memory load per node
  -- Lanczos vectors
  -- matrix information (matrix elements/jumps)
• Speed up reorthogonalization
  -- I/O is the bottleneck

  15. Highlighted accomplishments for 2010-2011: Add OpenMP
-- Crude 1st generation by Johnson (about 70-80% efficiency)
-- 2nd generation by Ormand (nearly 100% efficiency)
Hybrid OpenMP+MPI implemented; full testing delayed due to reorthogonalization issues.

  16. Highlighted accomplishments for 2010-2011: Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
We break up the Lanczos vectors so that only part resides on each node.
Future: separate forward/backward multiplication.

  17.-20. Lanczos vector distribution: hermiticity on. [Diagram, built up over four slides: input vector Vin and output vector Vout split into proton/neutron sectors 1-4, connected by forward and backward application of H.] Each compute node needs at a minimum TWO sectors from the initial and TWO sectors from the final Lanczos vector.
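The "hermiticity on" scheme can be sketched in a toy single-process form (illustrative only; names are invented, and real BIGSTICK works with jumps rather than explicit triples):

```python
import numpy as np

def matvec_hermiticity_on(x, triangle, diag):
    """Apply a real symmetric H stored only as its strict lower triangle.

    triangle: iterable of (i, j, v) with i > j. Each stored element is
    used twice in the SAME pass: forward (y[i] += v*x[j]) and backward
    (y[j] += v*x[i]). A node holding this block of H therefore needs
    the pieces of both the input and output vectors covering index
    ranges i and j -- the "two sectors in, two sectors out" requirement
    on the slide.
    """
    y = diag * x
    for (i, j, v) in triangle:
        y[i] += v * x[j]  # forward application of H
        y[j] += v * x[i]  # backward (transpose) application
    return y
```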

  21.-23. Lanczos vector distribution: hermiticity off. [Diagram, built up over three slides: forward application of H on one node and backward application of H on another; Vin and Vout split into proton/neutron sectors 1-2.] Each compute node needs ONE sector from the initial and ONE sector from the final Lanczos vector.
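With hermiticity suppressed, the two halves of that same pass can be placed on different nodes. A hypothetical single-process sketch (a reduction over nodes would sum the partial outputs in the real distributed code):

```python
import numpy as np

def forward_half(x, triangle, n):
    """"Node A": forward application only (y[i] += v*x[j]).
    Touches one input sector (the j range) and one output sector (i)."""
    y = np.zeros(n)
    for (i, j, v) in triangle:
        y[i] += v * x[j]
    return y

def backward_half(x, triangle, n):
    """"Node B": backward (transpose) application only (y[j] += v*x[i]).
    Also touches only one input and one output sector."""
    y = np.zeros(n)
    for (i, j, v) in triangle:
        y[j] += v * x[i]
    return y
```

Summing diag*x plus the two halves reproduces the full H x; each "node" now needs only one sector of each Lanczos vector, at the cost of holding that block of matrix information on two nodes.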

  24.-25. Comparison of memory requirements for distributing Lanczos vectors: memory required to store 2 Lanczos vectors (double precision) on a node. [Table/figure.] The distribution scheme with suppressed hermiticity is the most memory efficient; this is our scheme of choice.
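The memory comparison behind this slide reduces to simple arithmetic; a sketch with an invented function name and a crude equal-size-sector model:

```python
def lanczos_vector_bytes_per_node(dim, n_nodes, sectors_per_node=1):
    """Bytes per node to hold 2 double-precision Lanczos vectors (in + out).

    Replicated storage corresponds to sectors_per_node = n_nodes (every
    node holds everything). In this toy model the hermiticity-on scheme
    needs roughly two sectors of each vector per node, the
    suppressed-hermiticity scheme only one -- which is why the latter is
    the most memory efficient.
    """
    sector = dim // n_nodes                   # crude equal-size sectors
    return 2 * sectors_per_node * sector * 8  # 2 vectors, 8 bytes/double
```

For a basis dimension of 10^9 on 1000 nodes, this gives 16 GB/node replicated versus 16 MB/node with one sector per node.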

  26. Highlighted accomplishments for 2010-2011 (recap):
• Add OpenMP
• Reduce memory load per node
  -- Lanczos vectors
  -- matrix information (matrix elements/jumps)
• Speed up reorthogonalization
  -- I/O is the bottleneck

  27. Highlighted accomplishments for 2010-2011: speed up reorthogonalization -- I/O is the bottleneck.
We (i.e., PK) spent time trying to make MPI-IO efficient for our needs via striping, etc. Analysis by Rebecca Hartman-Baker (ORNL) suggests our I/O is still running sequentially rather than in parallel. We will now store all Lanczos vectors in memory, a la MFDn (which makes restarting an interrupted run difficult).
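The trade-off can be seen in a minimal Lanczos sketch with full reorthogonalization (illustrative only; invented names, not MFDn's or BIGSTICK's code):

```python
import numpy as np

def lanczos_in_memory(apply_h, dim, n_iter, seed=0):
    """Lanczos with full reorthogonalization, all vectors kept in RAM.

    Reorthogonalization needs EVERY previous Lanczos vector at each
    iteration; keeping them all in memory (n_iter * dim * 8 bytes)
    instead of on disk removes the I/O bottleneck, but a restart must
    then rebuild or re-read the whole set.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((n_iter, dim))      # all Lanczos vectors, held in RAM
    alpha = np.zeros(n_iter)
    beta = np.zeros(n_iter)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for k in range(n_iter):
        V[k] = v
        w = apply_h(v)
        alpha[k] = v @ w
        # full reorthogonalization against ALL stored vectors
        # (also removes the alpha[k] and beta[k-1] components)
        w -= V[:k + 1].T @ (V[:k + 1] @ w)
        beta[k] = np.linalg.norm(w)
        if beta[k] < 1e-12:          # invariant subspace reached
            break
        v = w / beta[k]
    m = k + 1
    T = (np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1)
         + np.diag(beta[:m - 1], -1))
    return np.linalg.eigvalsh(T)     # Ritz values approximate H's spectrum
```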

  28. Next steps for the remainder of the project period:
• Store Lanczos vectors in RAM (end of summer)
• Write paper on the factorization algorithm (drafted; finish by 9/2011)
• Fully implement the MPI/OpenMP hybrid code (11/2011)
• Write up paper for publication of the code (early 2012)

  29. UNEDF deliverables for BIGSTICK
• The LCCI project will deliver final UNEDF versions of the LCCI codes, scripts, and test cases, completed and released. Current version (6.5) is at NERSC; expect the final version by end of year; plans to publish in CPC or a similar venue.
• Improve the scalability of the BIGSTICK CI code up to 50,000 cores. The main barrier was reorthogonalization; now putting Lanczos vectors in memory to minimize I/O.
• Use the BIGSTICK code to investigate isospin breaking in the pf shell. Delayed due to a problem with I/O hardware on Sierra.

  30. SciDAC-3 possible deliverables for BIGSTICK
• (End of SciDAC-2: 3-body forces on 100,000 cores)
• Run with 3-body forces on up to 1,000,000 cores on Sequoia, Nmax = 10/12 for 12,14C
• Add in 4-body forces; investigate alpha clustering with effective 4-body forces (via SRG or Lee-Suzuki)
• Currently interfaces with Navratil's TRDENS to generate densities, spectroscopic factors, etc., needed for RGM reaction calculations; will improve this: develop fast post-processing with factorization
• Investigate general unitary-transform effective interactions, adding constraints on observables

  31. Sample application: cold atomic gases at unitarity in a harmonic trap. Using only 1 generator (d/dr), very much like UCOM. Fit to A = 3 (1-, 0+) and A = 4 (0+, 1+, 2+): starting rms = 2.32, final rms = 0.58. [Slide reused from UNEDF -- MSU, June 2010.]

  32. Cross-fertilization of the LCCI project. [Diagram linking BIGSTICK, MFDn, and NuShellX: on-the-fly construction of basis states and matrix elements; reorthogonalization and Lanczos vector management; J-projected basis.]
