This talk is supported by Ewha University. High Performance Solvers for Semidefinite Programs. Makoto Yamashita @ Tokyo Tech, Katsuki Fujisawa @ Chuo Univ, Mituhiro Fukuda @ Tokyo Tech, Kazuhiro Kobayashi @ NMRI, Kazuhide Nakata @ Tokyo Tech, Maho Nakata @ RIKEN. KSIAM Annual Meeting @ Jeju, 2011/11/25 (2011/11/25-2011/11/26)
Our interests & SDPA Family • How fast can we solve SDPs? • How large an SDP can we solve? • How accurately can we solve SDPs? • SDPA family: SDPA (base solver), SDPARA (parallel), SDPA-C (structural sparsity), SDPARA-C (parallel, structural sparsity), SDPA-GMP (multiple precision), SDPA-M (Matlab) • SDPA Homepage http://sdpa.sf.net/
SDPA Online Solver • Log in to the online solver • Upload your problem • Push the 'Execute' button • Receive the result via Web/Mail • http://sdpa.sf.net/ ⇒ Online Solver
Outline • SDP Applications • Primal-Dual Interior-Point Methods • Inside of SDPARA (Large & Fast) • Inside of SDPA-GMP (Accurate) • Conclusion
SDP Applications • Control Theory • Quantum Chemistry • Sensor Network Localization Problem • Polynomial Optimization
SDP Applications 1. Control Theory • We want to keep the system stable against swinging. • Stability condition ⇒ Lyapunov condition ⇒ SDP INFORMS 2011 @ Charlotte
SDP Applications 2. Quantum Chemistry • Ground state energy • Locate electrons • Schrödinger equation ⇒ Reduced Density Matrix ⇒ SDP
SDP Applications 3. Sensor Network Localization • Distance information ⇒ Sensor locations • Protein structure
SDP Applications 4. Polynomial Optimization • For example, • NP-hard in general • Very good lower bound by the SDP relaxation method
SDP Applications • Control Theory • Quantum Chemistry • Polynomial Optimization • Sensor Network Localization Problem • Many applications ⇒ How Large & How Fast & How Accurate
Standard form • The variables are • The inner product is • The size is roughly determined by • Ordinary solver • Our target
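For reference, a sketch of the standard form as it is written in the SDPA user manual. That the slide used exactly these symbols is an assumption: x is a vector in R^m, X and Y are n x n symmetric matrices, and F_0, ..., F_m and c are the input data.

```latex
\begin{align*}
\text{(P)}\quad & \min_{x,\,X} \ \sum_{k=1}^{m} c_k x_k
  \quad \text{s.t.} \quad X = \sum_{k=1}^{m} F_k x_k - F_0,\quad X \succeq O, \\
\text{(D)}\quad & \max_{Y} \ F_0 \bullet Y
  \quad \text{s.t.} \quad F_k \bullet Y = c_k \ (k = 1,\dots,m),\quad Y \succeq O, \\
& A \bullet B = \sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij} B_{ij}.
\end{align*}
```

In this notation the problem size is roughly determined by m (the number of equality constraints in (D)) and n (the matrix dimension).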
Primal-Dual Interior-Point Methods (figure labels: Feasible region, Central Path, Target, Optimal)
Schur Complement Matrix • Schur Complement Equation • Schur Complement Matrix (SCM) • Two computational steps: 1. ELEMENTS (Evaluation of SCM) 2. CHOLESKY (Cholesky factorization of SCM)
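For reference, a sketch of the Schur complement equation in the notation of the SDPA papers. The symbols are assumptions about what the slide showed: F_i are the m input data matrices, X and Y are the current matrix iterates, dx is the search direction for x, r is a right-hand side built from the current residuals, and the bullet denotes the inner product A • B = Σ A_ij B_ij.

```latex
\begin{align*}
B\,dx &= r, \\
B_{ij} &= \left( X^{-1} F_i Y \right) \bullet F_j \qquad (i, j = 1, \dots, m).
\end{align*}
```

ELEMENTS fills in the m x m matrix B, and CHOLESKY factorizes it to solve for dx.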
Computation time on a single processor • SDPARA replaces these bottlenecks with parallel computation: row-wise distribution for ELEMENTS, two-dimensional block-cyclic distribution for CHOLESKY • Time unit is seconds; SDPA 7, Xeon 5460 (3.16GHz)
Row-wise distribution • Example (figure): successive rows are assigned to Processor1, Processor2, Processor3, Processor4, Processor1, Processor2, Processor3, Processor4 • All rows are independent • Assign processors in a cyclic manner • Simple idea ⇒ Very EFFICIENT • High scalability
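The cyclic row assignment above can be sketched in a few lines of Python. This is only an illustration of the idea, not SDPARA's code; the function name `cyclic_rows` and the 8-row, 4-processor sizes are chosen here to mirror the figure.

```python
def cyclic_rows(num_rows, num_procs):
    """Return {processor: [row indices]} for a cyclic row assignment."""
    owners = {p: [] for p in range(num_procs)}
    for row in range(num_rows):
        # Row i goes to processor i mod p, so consecutive rows
        # land on consecutive processors.
        owners[row % num_procs].append(row)
    return owners


assignment = cyclic_rows(num_rows=8, num_procs=4)
for proc, rows in assignment.items():
    # Print with 1-based numbering to match the slide labels.
    print(f"Processor{proc + 1}: rows {[r + 1 for r in rows]}")
```

Because each row can be evaluated without data from the other rows, no communication is needed while the rows are being computed.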
Block Algorithm for Cholesky factorization • Triangular factorization (U: upper triangular matrix) • Small Cholesky factorization • Block updates • Parallel computing
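A minimal, serial sketch of a blocked Cholesky factorization A = U^T U in pure Python, to show where the "small Cholesky" and "block update" steps occur. The function names, the block size, and the 4 x 4 test matrix are illustrative assumptions; a real solver would call ScaLAPACK instead.

```python
import math


def small_cholesky(u, lo, hi):
    """In-place upper Cholesky of the diagonal block u[lo:hi][lo:hi]."""
    for j in range(lo, hi):
        for i in range(lo, j + 1):
            s = u[i][j] - sum(u[k][i] * u[k][j] for k in range(lo, i))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                u[i][j] = math.sqrt(s)
            else:
                u[i][j] = s / u[i][i]


def block_cholesky(a, block):
    """Return upper triangular U with A = U^T U, processed block by block."""
    n = len(a)
    u = [list(map(float, row)) for row in a]
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        # 1. Small Cholesky factorization of the diagonal block.
        small_cholesky(u, lo, hi)
        # 2. Triangular solve for the block row to the right of it.
        for c in range(hi, n):
            for i in range(lo, hi):
                s = u[i][c] - sum(u[k][i] * u[k][c] for k in range(lo, i))
                u[i][c] = s / u[i][i]
        # 3. Block update of the trailing matrix (upper part only).
        #    The (i, j) updates are independent of each other, which is
        #    the part that can be spread over processors.
        for i in range(hi, n):
            for j in range(i, n):
                u[i][j] -= sum(u[k][i] * u[k][j] for k in range(lo, hi))
    for i in range(n):
        for j in range(i):
            u[i][j] = 0.0  # clear the unused lower triangle
    return u


A = [[4, 2, 2, 0],
     [2, 5, 3, 2],
     [2, 3, 6, 3],
     [0, 2, 3, 3]]
for row in block_cholesky(A, block=2):
    print(row)
```

The result does not depend on the block size; the block size only changes how the work is grouped.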
Two-dimensional block-cyclic distribution (TDBCD) • Example with Processor1, Processor2, Processor3, Processor4 • ScaLAPACK library • Converting from the row-wise distribution to TDBCD requires network communication • Cholesky on TDBCD is much faster than on the row-wise distribution
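A small sketch of how a two-dimensional block-cyclic layout maps a matrix element to a processor. The function name `block_cyclic_owner`, the 2 x 2 process grid, the block size 2, and the row-major numbering of the grid are assumptions made for illustration.

```python
def block_cyclic_owner(i, j, block, grid_rows, grid_cols):
    """Return the (grid row, grid column) of the process owning element (i, j)."""
    return ((i // block) % grid_rows, (j // block) % grid_cols)


def processor_number(i, j, block=2, grid_rows=2, grid_cols=2):
    """1-based processor label, numbering the grid row by row (an assumption)."""
    pr, pc = block_cyclic_owner(i, j, block, grid_rows, grid_cols)
    return pr * grid_cols + pc + 1


# Print the owner of every element of an 8 x 8 matrix.
for i in range(8):
    print(" ".join(str(processor_number(i, j)) for j in range(8)))
```

Every processor receives blocks from all parts of the matrix, so the work stays balanced as the factorization moves toward the lower-right corner.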
Numerical Results of SDPARA • Quantum Chemistry (m=7230, SCM=100%), middle size • SDPARA 7.3.1, Xeon X5460, 3.16GHz x2, 48GB memory • ELEMENTS: 15x speedup • CHOLESKY: 12x speedup • Total: 13x speedup • Very FAST!!
Acceleration by Multiple Threading • Modern processors have multiple cores • Multi-threading is becoming common • Two-level parallel computing: 2 processors x 2 threads on each processor • Example (figure): successive rows are assigned to Processor1:Thread1, Processor2:Thread1, Processor1:Thread2, Processor2:Thread2, and then the pattern repeats
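The two-level pattern in the figure can be written as a simple index mapping. This sketch reads the pattern off the slide; the function name `two_level_owner` is hypothetical, and the actual scheduling inside SDPARA may differ.

```python
def two_level_owner(row, num_procs, threads_per_proc):
    """Return (processor, thread) for a row, both 0-based."""
    proc = row % num_procs                          # first level: cycle over processors
    thread = (row // num_procs) % threads_per_proc  # second level: cycle over threads
    return proc, thread


for row in range(8):
    proc, thread = two_level_owner(row, num_procs=2, threads_per_proc=2)
    print(f"row {row + 1}: Processor{proc + 1}:Thread{thread + 1}")
```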
Comparison with PCSDP • PCSDP was developed by Ivanov & de Klerk • SDP: B.2P Quantum Chemistry (m = 7230, SCM = 100%) • Xeon X5460, 3.16GHz x2 (8 cores), 48GB memory • Time unit is seconds • SDPARA is 8x faster by MPI & Multi-Threading (two-level parallelization)
Extremely Large-Scale SDPs • 16 servers [Xeon X5670 (2.93GHz), 128GB memory] • Other solvers can handle only • The LARGEST solved SDP in the world
Numerical Accuracy • One weak point of PDIPM • PDIPM requires • Eventually, numerical trouble (often, Cholesky fails), for example,
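A toy illustration of how Cholesky can fail in double precision. The 2 x 2 matrix below is not from the talk; it is chosen here because it is positive definite in exact arithmetic for any eps > 0, yet becomes singular once eps falls below the resolution of a double.

```python
import math


def cholesky_2x2(a11, a12, a22):
    """Upper Cholesky factor of [[a11, a12], [a12, a22]] as (u11, u12, u22)."""
    u11 = math.sqrt(a11)
    u12 = a12 / u11
    pivot = a22 - u12 * u12  # must be positive for a positive definite matrix
    if pivot <= 0.0:
        raise ValueError("Cholesky failed: non-positive pivot")
    return u11, u12, math.sqrt(pivot)


def try_cholesky(eps):
    """Return True if the factorization of [[1, 1], [1, 1 + eps]] succeeds."""
    try:
        cholesky_2x2(1.0, 1.0, 1.0 + eps)
        return True
    except ValueError:
        return False


print(try_cholesky(1e-6))   # eps is well within double precision
print(try_cholesky(1e-20))  # 1.0 + 1e-20 rounds to 1.0, so the pivot is 0
```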
SDPA-GMP • Replace BLAS (Basic Linear Algebra Subprograms) with MPLAPACK (Multiple precision LAPACK) • Numerical precision • Ordinary double precision in C or C++: 64 bit = 1 bit (sign) + 11 bit (exponent) + 52 bit (fraction), which gives a 53-bit significand with the implicit leading bit; accuracy = 2^(-53) ≈ 10^(-16) • Arbitrary precision in the GMP library: we can arbitrarily set the number of bits of the fraction part (for example, 200 bit = 2^(-200) ≈ 10^(-60))
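The relation between fraction bits and decimal digits can be checked with a short calculation. The sketch below uses Python's standard `decimal` module as a stand-in for GMP, which is an assumption made so that the example needs no extra libraries; SDPA-GMP itself uses GMP through a multiple-precision linear algebra layer.

```python
import math
from decimal import Decimal, getcontext


def decimal_digits(fraction_bits):
    """Approximate number of decimal digits carried by a binary fraction."""
    return fraction_bits * math.log10(2)


print(round(decimal_digits(53), 1))   # double precision
print(round(decimal_digits(200), 1))  # a 200-bit fraction

# In double precision a perturbation of 1e-20 next to 1.0 is lost entirely.
lost = (1.0 + 1e-20) - 1.0

# With 60 significant decimal digits the same perturbation is kept.
getcontext().prec = 60
kept = (Decimal(1) + Decimal("1e-20")) - Decimal(1)

print(lost, kept)
```

A higher working precision keeps small differences that double precision rounds away, at the cost of slower arithmetic.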
Numerically Hard Problem • Test problem • PDIPM is stable if Slater's condition holds • The Graph Partition Problem has no interior • Small ⇒ Numerically Hard
Numerical Results of SDPA-GMP • Small ⇒ Numerically Hard • SDPA-GMP uses 300 digits • 24 digits even for the no-interior case
Conclusion • SDPARA ⇒ How Fast & How Large: 100 times & • SDPA-GMP ⇒ How Accurate • http://sdpa.sf.net/ & Online solver • Thank you very much for your attention.