1 / 32

UNEDF 2011 ANNUAL/FINAL MEETING


Presentation Transcript


  1. UNEDF 2011 ANNUAL/FINAL MEETING. Progress report on the BIGSTICK configuration-interaction code. Calvin Johnson (1), Erich Ormand (2), Plamen Krastev (1,2,3). (1) San Diego State University, (2) Lawrence Livermore Lab, (3) Harvard University. Supported by DOE Grants DE-FG02-96ER40985, DE-FC02-09ER41587, and DE-AC52-07NA27344.

  2. We have good news and bad news... both the same thing: the postdoc (Plamen Krastev) got a permanent staff position in scientific computing at Harvard.

  3. BIGSTICK: from REDSTICK to BIGSTICK
• General-purpose M-scheme configuration-interaction (CI) code
• On-the-fly calculation of the many-body Hamiltonian
• Fortran 90, MPI and OpenMP
• 35,000+ lines in 30+ files and 200+ subroutines
• Faster set-up
• Faster Hamiltonian application
• Rewritten for "easy" parallelization
• New parallelization scheme

  4. BIGSTICK: from REDSTICK to BIGSTICK
• Flexible truncation scheme: handles 'no-core' ab initio Nħω truncation, valence-shell (sd & pf shell) orbital truncation, np-nh truncations, and more.
• Applied to ab initio calculations, valence-shell calculations (in particular level densities, random-interaction studies, and benchmarking projected HF), cold atoms, and electronic structure of atoms (benchmarking RPA and HF for atoms).
Version 6.5 is available at NERSC: unedf/lcci/BIGSTICK/v650/

  5. BIGSTICK uses a factorization algorithm that reduces storage of the Hamiltonian arrays. [Figure: comparison of nonzero matrix storage with and without factorization; slide reused from TRIUMF, Feb 2011.]
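The storage saving behind factorization can be illustrated with a toy sketch (invented names and data layout; not BIGSTICK's actual Fortran data structures): matrix elements are reconstructed on the fly from separately stored proton and neutron "jumps", so storage scales with the number of jumps rather than with the number of nonzero matrix elements.

```python
import numpy as np

def matvec_factorized(x, proton_jumps, neutron_jumps, n_n):
    """Apply H to x without ever storing H.

    A many-body basis state is a (proton config ip, neutron config in_)
    pair, flattened to index ip * n_n + in_. Each jump is a tuple
    (from, to, amplitude); a nonzero of H is the product of one proton
    jump and one neutron jump, so storing J_p + J_n jumps regenerates
    J_p * J_n matrix elements on the fly.
    """
    y = np.zeros_like(x)
    for (pi, pf, pa) in proton_jumps:
        for (ni, nf, na) in neutron_jumps:
            y[pf * n_n + nf] += pa * na * x[pi * n_n + ni]
    return y
```

In this toy model the memory cost is linear in the jump lists, while a stored sparse matrix would be quadratic in them; that is the heart of the storage comparison shown on the slide.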

  6. BIGSTICK: Micah Schuster, Physics MS project [figure]

  7. BIGSTICK: Joshua Staker, Physics MS project [figure]

  8.-11. BIGSTICK [figure slides]

  12. Major accomplishment as of last year: excellent scaling of the mat-vec multiply. This demonstrates that our factorization algorithm, as predicted, facilitates efficient distribution of mat-vec operations.

  13. Major accomplishments after the last UNEDF meeting:
• Rebalanced workload with an additional constraint on the dimension of local Lanczos vectors (Krastev)
• Fully distributed Lanczos vectors with hermiticity on (Krastev)
• Major steps toward distributing Lanczos vectors with suppressed hermiticity (Krastev)
• OpenMP implementation in the matrix-vector multiply (Ormand & Johnson)
• Significant progress on the 3-body implementation (Johnson & Ormand)
• Added a restart option (Johnson)
• Implemented in-lined 1-body density matrices (Johnson)

  14. Highlighted accomplishments for 2010-2011:
• Add OpenMP
• Reduce memory load per node
  -- Lanczos vectors
  -- matrix information (matrix elements/jumps)
• Speed up reorthogonalization
  -- I/O is the bottleneck

  15. Highlighted accomplishments for 2010-2011: Add OpenMP
-- Crude 1st generation by Johnson (about 70-80% efficiency)
-- 2nd generation by Ormand (nearly 100% efficiency)
Hybrid OpenMP+MPI implemented; full testing delayed due to reorthogonalization issues.

  16. Highlighted accomplishments for 2010-2011: Reduce memory load per node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
We break up the Lanczos vectors so that only part resides on each node.
Future: separate forward/backward multiplication.

  17.-20. Lanczos vector distribution: hermiticity on. [Diagram, built up over four slides: input vector Vin and output vector Vout split into proton/neutron sectors 1-4, connected by forward and backward application of H.] Each compute node needs at a minimum TWO sectors from the initial and TWO sectors from the final Lanczos vector.
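The "hermiticity on" scheme can be sketched in a toy single-process form (illustrative only; names are invented, and real BIGSTICK works with jumps rather than explicit triples):

```python
import numpy as np

def matvec_hermiticity_on(x, triangle, diag):
    """Apply a real symmetric H stored only as its strict lower triangle.

    triangle: iterable of (i, j, v) with i > j. Each stored element is
    used twice in the SAME pass: forward (y[i] += v*x[j]) and backward
    (y[j] += v*x[i]). A node holding this block of H therefore needs
    the pieces of both the input and output vectors covering index
    ranges i and j -- the "two sectors in, two sectors out" requirement
    on the slide.
    """
    y = diag * x
    for (i, j, v) in triangle:
        y[i] += v * x[j]  # forward application of H
        y[j] += v * x[i]  # backward (transpose) application
    return y
```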

  21.-23. Lanczos vector distribution: hermiticity off. [Diagram, built up over three slides: forward application of H on one node and backward application of H on another; Vin and Vout split into proton/neutron sectors 1-2.] Each compute node needs ONE sector from the initial and ONE sector from the final Lanczos vector.
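With hermiticity suppressed, the two halves of that same pass can be placed on different nodes. A hypothetical single-process sketch (a reduction over nodes would sum the partial outputs in the real distributed code):

```python
import numpy as np

def forward_half(x, triangle, n):
    """"Node A": forward application only (y[i] += v*x[j]).
    Touches one input sector (the j range) and one output sector (i)."""
    y = np.zeros(n)
    for (i, j, v) in triangle:
        y[i] += v * x[j]
    return y

def backward_half(x, triangle, n):
    """"Node B": backward (transpose) application only (y[j] += v*x[i]).
    Also touches only one input and one output sector."""
    y = np.zeros(n)
    for (i, j, v) in triangle:
        y[j] += v * x[i]
    return y
```

Summing diag*x plus the two halves reproduces the full H x; each "node" now needs only one sector of each Lanczos vector, at the cost of holding that block of matrix information on two nodes.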

  24.-25. Comparison of memory requirements for distributing Lanczos vectors: memory required to store 2 Lanczos vectors (double precision) on a node. [Table/figure.] The distribution scheme with suppressed hermiticity is the most memory efficient; this is our scheme of choice.
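The memory comparison behind this slide reduces to simple arithmetic; a sketch with an invented function name and a crude equal-size-sector model:

```python
def lanczos_vector_bytes_per_node(dim, n_nodes, sectors_per_node=1):
    """Bytes per node to hold 2 double-precision Lanczos vectors (in + out).

    Replicated storage corresponds to sectors_per_node = n_nodes (every
    node holds everything). In this toy model the hermiticity-on scheme
    needs roughly two sectors of each vector per node, the
    suppressed-hermiticity scheme only one -- which is why the latter is
    the most memory efficient.
    """
    sector = dim // n_nodes                   # crude equal-size sectors
    return 2 * sectors_per_node * sector * 8  # 2 vectors, 8 bytes/double
```

For a basis dimension of 10^9 on 1000 nodes, this gives 16 GB/node replicated versus 16 MB/node with one sector per node.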

  26. Highlighted accomplishments for 2010-2011 (recap):
• Add OpenMP
• Reduce memory load per node
  -- Lanczos vectors
  -- matrix information (matrix elements/jumps)
• Speed up reorthogonalization
  -- I/O is the bottleneck

  27. Highlighted accomplishments for 2010-2011: speed up reorthogonalization -- I/O is the bottleneck.
We (i.e., PK) spent time trying to make MPI-IO efficient for our needs via striping, etc. Analysis by Rebecca Hartman-Baker (ORNL) suggests our I/O is still running sequentially rather than in parallel. We will now store all Lanczos vectors in memory, a la MFDn (which makes restarting an interrupted run difficult).
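The trade-off can be seen in a minimal Lanczos sketch with full reorthogonalization (illustrative only; invented names, not MFDn's or BIGSTICK's code):

```python
import numpy as np

def lanczos_in_memory(apply_h, dim, n_iter, seed=0):
    """Lanczos with full reorthogonalization, all vectors kept in RAM.

    Reorthogonalization needs EVERY previous Lanczos vector at each
    iteration; keeping them all in memory (n_iter * dim * 8 bytes)
    instead of on disk removes the I/O bottleneck, but a restart must
    then rebuild or re-read the whole set.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((n_iter, dim))      # all Lanczos vectors, held in RAM
    alpha = np.zeros(n_iter)
    beta = np.zeros(n_iter)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for k in range(n_iter):
        V[k] = v
        w = apply_h(v)
        alpha[k] = v @ w
        # full reorthogonalization against ALL stored vectors
        # (also removes the alpha[k] and beta[k-1] components)
        w -= V[:k + 1].T @ (V[:k + 1] @ w)
        beta[k] = np.linalg.norm(w)
        if beta[k] < 1e-12:          # invariant subspace reached
            break
        v = w / beta[k]
    m = k + 1
    T = (np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1)
         + np.diag(beta[:m - 1], -1))
    return np.linalg.eigvalsh(T)     # Ritz values approximate H's spectrum
```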

  28. Next steps for the remainder of the project period:
• Store Lanczos vectors in RAM (end of summer)
• Write paper on the factorization algorithm (drafted; finish by 9/2011)
• Fully implement the MPI/OpenMP hybrid code (11/2011)
• Write up paper for publication of the code (early 2012)

  29. UNEDF deliverables for BIGSTICK
• The LCCI project will deliver final UNEDF versions of the LCCI codes, scripts, and test cases, completed and released. Current version (6.5) is at NERSC; expect the final version by end of year; plans to publish in CPC or a similar venue.
• Improve the scalability of the BIGSTICK CI code up to 50,000 cores. The main barrier was reorthogonalization; now putting Lanczos vectors in memory to minimize I/O.
• Use the BIGSTICK code to investigate isospin breaking in the pf shell. Delayed due to a problem with I/O hardware on Sierra.

  30. SciDAC-3 possible deliverables for BIGSTICK
• (End of SciDAC-2: 3-body forces on 100,000 cores)
• Run with 3-body forces on up to 1,000,000 cores on Sequoia, Nmax = 10/12 for 12,14C
• Add in 4-body forces; investigate alpha clustering with effective 4-body forces (via SRG or Lee-Suzuki)
• Currently interfaces with Navratil's TRDENS to generate densities, spectroscopic factors, etc., needed for RGM reaction calculations; will improve this: develop fast post-processing with factorization
• Investigate general unitary-transform effective interactions, adding constraints on observables

  31. Sample application: cold atomic gases at unitarity in a harmonic trap. Using only 1 generator (d/dr), very much like UCOM. Fit to A = 3 (1-, 0+) and A = 4 (0+, 1+, 2+): starting rms = 2.32, final rms = 0.58. [Slide reused from UNEDF -- MSU, June 2010.]

  32. Cross-fertilization of the LCCI project. [Diagram linking BIGSTICK, MFDn, and NuShellX: on-the-fly construction of basis states and matrix elements; reorthogonalization and Lanczos vector management; J-projected basis.]
