
Stochastic Optimization of Complex Energy Systems on High-Performance Computers


Presentation Transcript


  1. Stochastic Optimization of Complex Energy Systems on High-Performance Computers Cosmin G. Petra Mathematics and Computer Science Division Argonne National Laboratory petra@mcs.anl.gov SIAM CSE 2013 Joint work with Olaf Schenk (USI Lugano), Miles Lubin (MIT), Klaus Gaertner (WIAS Berlin)

  2. Outline • Application of HPC to power-grid optimization under uncertainty • Parallel interior-point solver (PIPS-IPM) • structure exploiting • Revisiting linear algebra • Experiments on BG/P with the new features

  3. Stochastic unit commitment with wind power • Wind forecast – WRF (Weather Research and Forecasting) model • Real-time grid-nested 24h simulation • 30 samples require 1h on 500 CPUs (Jazz@Argonne) • (Figure: thermal generators and wind farms) • Slide courtesy of V. Zavala & E. Constantinescu

  4. Stochastic Formulation • Discrete distribution leads to block-angular LP

  5. Large-scale (dual) block-angular LPs (extensive form) • In the terminology of stochastic LPs: • First-stage variables (decision now): x0 • Second-stage variables (recourse decisions): x1, …, xN • Each diagonal block is a realization of a random variable (a scenario)
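To make the dual block-angular structure concrete, here is a minimal Python/SciPy sketch of how the extensive-form constraint matrix is laid out: a first-stage block acting on x0, plus one block row per scenario coupling x0 to that scenario's own variables. The block names (A, T_i, W_i) are generic stochastic-LP notation and the data is synthetic; this is not the PIPS-IPM data structure.

```python
# Minimal sketch (synthetic data, generic notation) of the extensive-form
# constraint matrix of a two-stage stochastic LP with dual block-angular
# structure: only the x0 columns are shared across block rows.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n0, m0 = 4, 2          # first-stage variables / constraints
n, m, N = 6, 3, 5      # per-scenario variables / constraints, scenarios

A = sp.random(m0, n0, density=0.5, random_state=rng)     # first-stage block
T = [sp.random(m, n0, density=0.5, random_state=rng)     # coupling blocks
     for _ in range(N)]
W = [sp.random(m, n, density=0.5, random_state=rng)      # recourse blocks
     for _ in range(N)]

# Rows: first-stage constraints, then one block row per scenario.
# Columns: x0 first, then x1, ..., xN.
blocks = [[A] + [None] * N]
for i in range(N):
    row = [T[i]] + [None] * N
    row[1 + i] = W[i]
    blocks.append(row)
con_matrix = sp.bmat(blocks, format="csr")
print(con_matrix.shape)   # (m0 + N*m, n0 + N*n)
```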

  6. Computational challenges and difficulties • May require many scenarios (100s, 1,000s, 10,000s, …) to accurately model uncertainty • “Large” scenarios (Wi up to 250,000 x 250,000) • “Large” 1st stage (1,000s–10,000s of variables) • Easy to build a practical instance that requires 100+ GB of RAM to solve ⇒ requires distributed memory • Real-time solution needed in our applications

  7. Linear algebra of primal-dual interior-point methods (IPM) • A convex quadratic problem (minimize a quadratic objective subject to linear constraints) yields a KKT linear system at each IPM iteration • 2 solves per IPM iteration: predictor directions and corrector directions • Two-stage SP ⇒ arrow-shaped linear system (modulo a permutation); multi-stage SP ⇒ nested arrow-shaped systems • N is the number of scenarios

  8. Special Structure of KKT System (Arrow-shaped)
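A small sketch of the arrow-shaped (bordered block-diagonal) pattern referred to on this slide: scenario KKT blocks on the diagonal, border blocks coupling each scenario to the first stage, and the first-stage block in the corner. The block names (K_i, B_i, K0) are generic notation assumed here and the data is synthetic.

```python
# Assemble a tiny arrow-shaped KKT matrix and print its sparsity pattern.
import numpy as np
import scipy.sparse as sp

def random_sym(n, rng):
    M = sp.random(n, n, density=0.3, random_state=rng)
    return M + M.T + sp.identity(n)          # symmetric, nonzero diagonal

rng = np.random.default_rng(1)
N, n_i, n0 = 4, 5, 3                         # scenarios, block sizes
K  = [random_sym(n_i, rng) for _ in range(N)]             # scenario blocks
B  = [sp.random(n_i, n0, density=0.4, random_state=rng)   # border blocks
      for _ in range(N)]
K0 = random_sym(n0, rng)                     # first-stage block

rows = []
for i in range(N):
    row = [None] * (N + 1)
    row[i], row[N] = K[i], B[i]
    rows.append(row)
rows.append([b.T for b in B] + [K0])         # bottom border + corner block
KKT = sp.bmat(rows, format="csr")
print((KKT.toarray() != 0).astype(int))      # arrow-shaped sparsity pattern
```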

  9. Parallel Solution Procedure for KKT System • “Scenario-based decomposition” • Steps 1 and 5 are trivially parallel • Steps 1, 2, and 3 account for >95% of the total execution time (a schematic version of the steps is sketched below)
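The following dense, serial sketch spells out one common reading of Steps 1–5; the step numbering follows the slide, but the concrete formulas are my reconstruction with generic names (K_i, B_i, K0), not the PIPS-IPM source: per-scenario Schur-complement contributions (Step 1), a reduction forming the global Schur complement (Step 2), its factorization and the first-stage solve (Steps 3–4), and per-scenario back-substitution (Step 5).

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_i, n0 = 3, 6, 4

def sym(n):
    M = rng.standard_normal((n, n))
    return M + M.T + 10 * np.eye(n)          # symmetric, well conditioned

K  = [sym(n_i) for _ in range(N)]            # scenario KKT blocks
B  = [rng.standard_normal((n_i, n0)) for _ in range(N)]   # border blocks
K0 = sym(n0)                                 # first-stage block
r  = [rng.standard_normal(n_i) for _ in range(N)]
r0 = rng.standard_normal(n0)

# Step 1 (per scenario, embarrassingly parallel): factor K_i, form its
# Schur-complement contribution B_i^T K_i^{-1} B_i and reduced RHS term.
S = [B[i].T @ np.linalg.solve(K[i], B[i]) for i in range(N)]
g = [B[i].T @ np.linalg.solve(K[i], r[i]) for i in range(N)]

# Step 2 (reduction across MPI processes): global Schur complement.
C = K0 - sum(S)

# Steps 3-4 (first stage): factor C and solve for the first-stage direction.
dx0 = np.linalg.solve(C, r0 - sum(g))

# Step 5 (per scenario, embarrassingly parallel): back-substitute.
dx = [np.linalg.solve(K[i], r[i] - B[i] @ dx0) for i in range(N)]

# Check the first-stage equation of the arrow-shaped system.
print(np.allclose(sum(B[i].T @ dx[i] for i in range(N)) + K0 @ dx0, r0))  # True
```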

  10. Components of Execution Time (chart) • Note the break in the y-axis scale

  11. Scenario Calculations – Steps 1 and 5 • Each scenario is assigned to an MPI process, which locally performs Steps 1 and 5. • The scenario matrices are sparse and symmetric indefinite (symmetric, with both positive and negative eigenvalues). • Computing each scenario’s Schur-complement contribution is very expensive: it requires solving with the factors of the scenario matrix against the non-zero columns of the border (coupling) block and multiplying the result from the left by that block’s transpose (see the sketch below). • 4 hours 10 minutes of wall time to solve a 4h-horizon problem with 8k scenarios on 8k nodes. • Need to run under strict time requirements – for example, solve a 24h-horizon problem in less than 1h.
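A sketch of why this step dominates: the factors of the sparse scenario matrix must be applied to every non-zero column of the border block, i.e. many forward/backward triangular solves per IPM iteration. SciPy's general-purpose LU stands in here for the symmetric indefinite solver used in practice; all names and sizes are synthetic.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(3)
n_i, n0 = 2000, 80
A   = sp.random(n_i, n_i, density=2e-3, random_state=rng)
K_i = (A + A.T + 20 * sp.identity(n_i)).tocsc()      # sparse scenario matrix
B_i = sp.random(n_i, n0, density=0.02, random_state=rng, format="csc")

lu = spla.splu(K_i)                        # factorization phase: scales well
cols = np.unique(B_i.nonzero()[1])         # non-zero columns of the border block
X = lu.solve(B_i[:, cols].toarray())       # many triangular solves: the bottleneck
S_i = np.zeros((n0, n0))
S_i[np.ix_(cols, cols)] = B_i[:, cols].T @ X   # contribution B_i^T K_i^{-1} B_i
```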

  12. Revisiting scenario computations for shared-memory • Multiple sparse right-hand sides • The triangular-solve phase is hard to parallelize in shared memory (multi-core) • The factorization phase speeds up very well and achieves a considerable fraction of peak performance • Our approach: an incomplete factorization of the augmented scenario matrix – stop the factorization after the elimination of the (1,1) block • The Schur-complement contribution is then left sitting in the (2,2) block (illustrated below)
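A dense toy version of that reformulation (not the PARDISO-SC implementation): augment the scenario matrix with its border block, run the elimination only over the (1,1) block, then stop and read the (negated) Schur-complement contribution off the (2,2) block.

```python
import numpy as np

rng = np.random.default_rng(4)
n_i, n0 = 50, 6
M   = rng.standard_normal((n_i, n_i))
K_i = M @ M.T + np.eye(n_i)                 # stand-in scenario block (SPD here)
B_i = rng.standard_normal((n_i, n0))

aug = np.block([[K_i, B_i],
                [B_i.T, np.zeros((n0, n0))]])

# "Incomplete" factorization: eliminate only the first n_i pivots, then stop.
for k in range(n_i):
    aug[k + 1:, k] /= aug[k, k]
    aug[k + 1:, k + 1:] -= np.outer(aug[k + 1:, k], aug[k, k + 1:])

schur = aug[n_i:, n_i:]                     # (2,2) block holds -B_i^T K_i^{-1} B_i
print(np.allclose(schur, -B_i.T @ np.linalg.solve(K_i, B_i)))   # True
```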

  13. Implementation • Requires modification of the linear solver: PARDISO (Schenk) → PARDISO-SC • Pivot perturbations during factorization are needed to maintain numerical stability • Errors due to the perturbations are normally absorbed by iterative refinement, which would be extremely expensive in our case (many right-hand sides) • Instead, we let the errors propagate into the “global” Schur complement C (Step 2) • Factorize the perturbed C (Step 3) • After Steps 1, 2, and 3, we therefore have the factorization of an approximation of the KKT matrix

  14. Pivot-error absorption by preconditioned BiCGStab • We still have to solve with the exact KKT matrix K • “Absorb” the errors by solving Kz = r using preconditioned BiCGStab • Numerical experiments showed it is more robust than iterative refinement • The preconditioner is the (already computed) factorization of the perturbed matrix • Each BiCGStab iteration requires 2 mat-vecs (Kz) and 2 applications of the preconditioner • One application of the preconditioner amounts to performing the “solve” Steps 4 and 5 with the perturbed factors (see the sketch below)
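An illustrative version of the error-absorption step using SciPy's BiCGStab: the factorization of a slightly perturbed copy of K stands in for the pivot-perturbed factors, and applying it as the preconditioner recovers the solution of the exact system. Synthetic data; this is not the PIPS-IPM / PARDISO-SC code.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(5)
n = 1000
A = sp.random(n, n, density=2e-3, random_state=rng)
K = (A + A.T + 20 * sp.identity(n)).tocsc()            # "exact" KKT matrix
K_pert = (K + 1e-6 * sp.diags(rng.random(n))).tocsc()  # pivot-perturbation stand-in

lu = spla.splu(K_pert)                                 # factor the perturbed matrix
M = spla.LinearOperator((n, n), matvec=lu.solve)       # preconditioner: "solve" Steps 4-5

r = rng.standard_normal(n)
z, info = spla.bicgstab(K, r, M=M)
print(info, np.linalg.norm(K @ z - r))                 # info == 0 means convergence
```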

  15. Summary of the new approach

  16. Test architecture • “Intrepid” Blue Gene/P supercomputer • 40,960 nodes • Custom interconnect • Each node has a quad-core 850 MHz PowerPC processor and 2 GB RAM • DOE INCITE Award 2012–2013 – 24 million core-hours

  17. Numerical experiments • 4h (UC4), 12h (UC12), and 24h (UC24) horizon problems • 1 scenario per node (4 cores per scenario) • Large-scale: 12h horizon, up to 32k scenarios and 128k cores (k = 1,024) • 16k scenarios – 2.08 billion variables, 1.81 billion constraints, KKT system size = 3.89 billion • LAPACK + SMP ESSL BLAS for first-stage linear systems • PARDISO-SC for second-stage linear systems

  18. Compute SC Times

  19. Time per IPM iteration • UC12, 32k scenarios, 32k nodes (128k cores) • BiCGStab iteration counts range from 0 to 1.5 • The cost of absorbing the factorization perturbation errors is between 10% and 30% of the total iteration cost

  20. Solve to completion – UC12 • Before: 4 hours 10 minutes of wall time to solve the UC4 problem with 8k scenarios on 8k nodes • Now: UC12 (timing results shown in a chart)

  21. Weak scaling

  22. Strong scaling

  23. Conclusions and Future Considerations • A multicore-friendly reformulation of the sparse linear algebra computations leads to execution times that are an order of magnitude faster • Fast factorization-based computation of the SC • Robust and cheap pivot-error absorption via Krylov iterative methods • The parallel efficiency of PIPS remains good • Performance evaluation on today’s supercomputers: IBM BG/Q, Cray XK7, XC30
