Much Faster Algorithms for Matrix Scaling

Presentation Transcript


  1. Much Faster Algorithms for Matrix Scaling: Matrix Scaling and Balancing via Box-Constrained Newton's Method and Interior Point Methods. Zeyuan Allen-Zhu, Yuanzhi Li, Rafael Oliveira, Avi Wigderson. Michael Cohen, Aleksander Mądry, Dimitris Tsipras, Adrian Vladu

  2. Matrix Scaling and Matrix Balancing
  • Matrix Scaling: given a nonnegative matrix A and target vectors r, c, find positive diagonal matrices X, Y such that M = X A Y satisfies M 1 = r and Mᵀ1 = c
  • Matrix Balancing: find a positive diagonal matrix X such that M = X A X⁻¹ satisfies M 1 = Mᵀ1
  • (The slide illustrates both with a small 2×2 example.)
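
As a concrete illustration of the two definitions (not on the slides; the numerical data below is hypothetical), here is a minimal NumPy sketch: the scaling part just exhibits the notation M = XAY with its induced row and column sums, and the balancing part solves the 2×2 case in closed form.

```python
import numpy as np

# Hypothetical 2x2 data; not the slide's example.
A = np.array([[1.0, 2.0],
              [0.5, 1.0]])

# Matrix scaling: M = X A Y; its row sums are r = M 1 and column sums are c = M^T 1.
X = np.diag([1.0, 0.5])          # hypothetical diagonal scalings
Y = np.diag([0.4, 0.8])
M = X @ A @ Y
print("r =", M.sum(axis=1), " c =", M.sum(axis=0))

# Matrix balancing: M = X A X^{-1} with X = diag(exp(x)) should satisfy M 1 = M^T 1.
# For a 2x2 matrix only the off-diagonal entries move, so balance requires
# A12 * exp(x1 - x2) = A21 * exp(x2 - x1), i.e. x1 - x2 = 0.5 * ln(A21 / A12).
x = np.array([0.5 * np.log(A[1, 0] / A[0, 1]), 0.0])
Mb = np.diag(np.exp(x)) @ A @ np.diag(np.exp(-x))
print("row sums:", Mb.sum(axis=1), " col sums:", Mb.sum(axis=0))   # equal
```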

  3. Why Care?
  • Preconditioning linear systems: A z = b becomes (XAY) Y⁻¹z = X b
  • Approximating the permanent of nonnegative matrices: Per(A) = Per(XAY) / (Per(X) Per(Y)), and if XAY is doubly stochastic then exp(-n) ≤ Per(XAY) ≤ 1
  • Detecting perfect matchings: for A the adjacency matrix of a bipartite graph, a perfect matching exists ⇔ Per(A) ≠ 0
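
To make the permanent identity on this slide concrete, here is a tiny brute-force check (hypothetical 3×3 data; `permanent` is a throwaway helper, not a library routine). For diagonal X, Y the permanent factors, which rearranges to the formula on the slide.

```python
import numpy as np
from itertools import permutations

def permanent(B):
    """Brute-force permanent; only sensible for tiny matrices."""
    n = B.shape[0]
    return sum(np.prod([B[i, p[i]] for i in range(n)]) for p in permutations(range(n)))

A = np.array([[0.2, 0.5, 0.3],
              [0.7, 0.1, 0.9],
              [0.4, 0.6, 0.8]])      # hypothetical nonnegative matrix
X = np.diag([2.0, 0.5, 1.5])         # hypothetical diagonal scalings
Y = np.diag([0.8, 1.2, 0.4])

# Per(X A Y) = Per(X) * Per(A) * Per(Y) for diagonal X, Y, so
# Per(A) = Per(X A Y) / (Per(X) * Per(Y)) as on the slide.
print(permanent(X @ A @ Y), permanent(X) * permanent(A) * permanent(Y))
```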

  4. Why Care? • Intensively studied in scientific computing literature • [Wilkinson ’59], [Osborne ’60], [Sinkhorn’64], [Parlett, Reinsch’69], [Kalantari, Khachiyan’15], [Schulman, Sinclair ’15], … • Matrix balancing routines implemented in MATLAB, R • Generalizations (operator scaling) are related to Polynomial Identity Testing • [Gurvits’04],[Garg, Gurvits, Oliveira, Wigderson’17] , …

  5. Generalized Matrix Balancing Via Convex Optimization
  • Captures the problem's difficulty; solves matrix scaling via a simple reduction
  • M = exp(X) A exp(-X), with row sums r_M = M 1 and column sums c_M = Mᵀ1; goal: r_M - c_M = d (d = 0 is plain balancing)
  • f(x) = ∑_ij A_ij exp(x_i - x_j) - ∑_i d_i x_i is a nice convex function with ∇f(x) = r_M - c_M - d
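
A minimal NumPy sketch of this potential, assuming the conventions above (X = diag(x), M = exp(X) A exp(-X)); it evaluates f and confirms numerically that ∇f(x) = r_M - c_M - d. The random data is hypothetical.

```python
import numpy as np

def f_and_grad(A, d, x):
    """f(x) = sum_ij A_ij exp(x_i - x_j) - sum_i d_i x_i ;  grad f(x) = r_M - c_M - d."""
    M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]   # M = exp(X) A exp(-X)
    r, c = M.sum(axis=1), M.sum(axis=0)
    return M.sum() - d @ x, r - c - d

rng = np.random.default_rng(0)
n = 4
A = rng.random((n, n))           # hypothetical nonnegative matrix
d = np.zeros(n)                  # d = 0 is plain matrix balancing
x = rng.standard_normal(n)

_, g = f_and_grad(A, d, x)
eps = 1e-6
g_fd = np.array([(f_and_grad(A, d, x + eps * e)[0] - f_and_grad(A, d, x - eps * e)[0]) / (2 * eps)
                 for e in np.eye(n)])
print(np.max(np.abs(g - g_fd)))  # agreement up to finite-difference error
```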

  6. Equivalent Nonlinear Flow Problem
  • "Nonlinear Ohm's Law": f_uv = A_uv exp(x_u - x_v); classical Ohm's Law: f_uv = A_uv (x_u - x_v)
  • (The slide shows a small s-t flow network illustrating both laws; edge weights play the role of capacitances.)

  7. Generalized Matrix Balancing Via Convex Optimization
  • Captures the difficulty of both problems; solves matrix scaling via a simple reduction
  • M = exp(X) A exp(-X), r_M = M 1, c_M = Mᵀ1
  • Exact goal: r_M - c_M = d; approximate goal: |r_M - c_M - d| ≤ ε
  • f(x) = nice convex function with ∇f(x) = r_M - c_M - d

  8. Generalized Matrix Balancing Via Convex Optimization
  • f(x) = nice convex function, ∇f(x) = r_M - c_M - d
  • General convex optimization framework: f(x + Δ) = f(x) + ∇f(x)ᵀΔ + ½ ΔᵀH_xΔ + …
  • First-order methods: Δ = argmin of the linear model over a small region; second-order methods: Δ = argmin of the quadratic model over a small region
  • Sinkhorn/Osborne iterations are instantiations of this framework (coordinate descent)
  • Previous bounds for matrix balancing: first-order [Ostrovsky, Rabani, Yousefi '17] O(m + nε⁻²); second-order [Kalantari, Khachiyan, Shokoufandeh '97] Õ(n⁴ log ε⁻¹)
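
For context, the classical Sinkhorn alternation for the doubly stochastic case (r = c = 1), written as a few lines of NumPy; this is the coordinate-descent baseline the slide refers to, not the faster methods of these papers. The example matrix is hypothetical.

```python
import numpy as np

def sinkhorn(A, iters=200):
    """Alternately fix row sums and column sums to 1; returns diagonal scalings (x, y)."""
    n = A.shape[0]
    x, y = np.ones(n), np.ones(n)
    for _ in range(iters):
        x = 1.0 / (A @ y)        # make row sums of diag(x) A diag(y) equal to 1
        y = 1.0 / (A.T @ x)      # make column sums equal to 1
    return x, y

A = np.random.default_rng(1).random((5, 5)) + 0.1   # hypothetical strictly positive matrix
x, y = sinkhorn(A)
M = np.diag(x) @ A @ np.diag(y)
print(M.sum(axis=1), M.sum(axis=0))                  # both close to the all-ones vector
```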

  9. Our Results
  • A new second-order framework, the Box-Constrained Newton Method (essentially identical in both papers)
  • [AZLOW '17]: first-order, Accelerated Gradient Descent, O(m·n^(1/3)·ε^(-2/3)); second-order, Box-Constrained Newton Method, Õ((m + n^(4/3)) log κ(X*))
  • [CMTV '17]: second-order, Interior Point Method, Õ(m^(3/2) log ε⁻¹); Box-Constrained Newton Method, Õ(m log κ(X*))
  • κ(X*) = condition number of the matrix that yields perfect balancing

  10. Generalized Matrix Balancing Via Convex Optimization
  • f(x) = nice convex function; ∇f(x) = r_M - c_M - d, where M = exp(X) A exp(-X), r_M = M 1, c_M = Mᵀ1
  • Can we use second-order information to obtain a good solution in few iterations?
  • f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀH_xΔ (*), where H_x = diag(r_M + c_M) - (M + Mᵀ)
  • The Hessian matrix is a graph Laplacian, so H_x⁻¹b can be computed in Õ(m) time [Spielman-Teng '08, …]
  • If |Δ|∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  (* valid whenever the Hessian does not change too much along the segment between x and x+Δ)
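
A quick numerical sanity check of the Hessian formula (hypothetical data, same conventions as before): H_x is symmetric, has nonpositive off-diagonal entries, and its rows sum to zero, i.e. it is the Laplacian of a weighted graph, which is what makes fast Laplacian solvers applicable.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.random((n, n))
np.fill_diagonal(A, 0.0)                  # diagonal entries do not affect balancing
x = rng.standard_normal(n)

M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]
r, c = M.sum(axis=1), M.sum(axis=0)
H = np.diag(r + c) - (M + M.T)            # Hessian of f at x

print(np.allclose(H, H.T))                            # symmetric
print(np.allclose(H @ np.ones(n), 0.0))               # zero row sums
print(np.all(H - np.diag(np.diag(H)) <= 1e-12))       # nonpositive off-diagonals
```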

  11. Box-Constrained Newton's Method
  • f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀH_xΔ
  • Key idea: if |Δ|∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  • Suppose we can exactly minimize the second-order approximation over |Δ|∞ ≤ 1
  • Goal: show that moving to the minimizer inside the box makes a lot of progress
  • Let Δ be the minimizer of the quadratic approximation in the L∞ region and Δ* the minimizer of f in the L∞ region; then f(x) - f(x+Δ) ≥ (1/10)·(f(x) - f(x+Δ*))

  12. Box-Constrained Newton's Method
  • Define R∞ = max over {x : f(x) ≤ f(x₀)} of |x - x*|∞
  • Progress per step: f(x) - f(x+Δ) ≥ (1/10)·(f(x) - f(x*)) / |x - x*|∞, since by convexity a unit L∞ step from x toward x* already gains that much
  • |x - x*|∞ is bounded by the absolute upper bound R∞
  • So f(x) gets arbitrarily close to f(x*) in Õ(R∞) iterations
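
Writing out the arithmetic this slide compresses (the constant 1/10 is taken from the previous slide; Õ hides the logarithmic factor):

```latex
\[
  f\!\left(x + \tfrac{x^* - x}{\|x^* - x\|_\infty}\right) - f(x)
  \;\le\; \tfrac{f(x^*) - f(x)}{\|x^* - x\|_\infty}
  \qquad \text{(convexity; this step lies in the unit } \ell_\infty \text{ box),}
\]
\[
  \text{so}\quad
  f(x_{t+1}) - f(x^*)
  \;\le\; \Bigl(1 - \tfrac{1}{10\,R_\infty}\Bigr)\bigl(f(x_t) - f(x^*)\bigr),
  \quad\text{hence}\quad
  T = O\!\Bigl(R_\infty \log \tfrac{f(x_0) - f(x^*)}{\varepsilon}\Bigr)
  \ \text{iterations reach error } \varepsilon.
\]
```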

  13. Box-Constrained Newton's Method (R∞ = max over {x : f(x) ≤ f(x₀)} of |x - x*|∞)
  • f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀH_xΔ
  • Key idea: if |Δ|∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  • Suppose we can exactly minimize the second-order approximation over |Δ|∞ ≤ 1; with Δ the minimizer of the quadratic approximation in the L∞ region and Δ* the minimizer of f in the L∞ region, f(x) - f(x+Δ) ≥ (1/10)·(f(x) - f(x+Δ*))
  • This gives Õ(R∞) box-constrained quadratic minimizations in total (sketched in code below)
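
A schematic end-to-end sketch of the loop, for illustration only: `box_quadratic_min` below is a hypothetical placeholder that hands the box-constrained quadratic to a generic bounded solver, standing in for the k-oracles of the next two slides; it is not how either paper implements the step.

```python
import numpy as np
from scipy.optimize import minimize

def grad_and_hess(A, d, x):
    """Gradient r_M - c_M - d and Hessian diag(r_M + c_M) - (M + M^T) of the potential."""
    M = np.exp(x)[:, None] * A * np.exp(-x)[None, :]
    r, c = M.sum(axis=1), M.sum(axis=0)
    return r - c - d, np.diag(r + c) - (M + M.T)

def box_quadratic_min(g, H):
    """Placeholder step: roughly minimize g^T D + 0.5 D^T H D over |D|_inf <= 1
    with a generic bounded solver (NOT the k-oracles of [AZLOW '17] / [CMTV '17])."""
    n = len(g)
    res = minimize(lambda D: g @ D + 0.5 * D @ H @ D, np.zeros(n),
                   jac=lambda D: g + H @ D, bounds=[(-1.0, 1.0)] * n)
    return res.x

def box_newton(A, d, iters=30):
    x = np.zeros(A.shape[0])
    for _ in range(iters):
        g, H = grad_and_hess(A, d, x)
        x = x + box_quadratic_min(g, H)   # each step stays inside the unit L_inf box
    return x

A = np.random.default_rng(3).random((6, 6))   # hypothetical nonnegative matrix
np.fill_diagonal(A, 0.0)
x = box_newton(A, np.zeros(6))
g, _ = grad_and_hess(A, np.zeros(6), x)
print(np.abs(g).sum())                         # remaining imbalance |r_M - c_M|_1, should be small
```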

  14. Box-Constrained Newton's Method (R∞ = max over {x : f(x) ≤ f(x₀)} of |x - x*|∞)
  • f(x + Δ) ≈ f(x) + ∇f(x)ᵀΔ + ½ ΔᵀH_xΔ; key idea: if |Δ|∞ ≤ 1 then H_x ≈_O(1) H_{x+Δ}
  • Exactly minimizing the second-order approximation over |Δ|∞ ≤ 1 would give Õ(R∞) box-constrained quadratic minimizations, but it is unclear how to solve this subproblem fast
  • Instead, relax the L∞ constraint by a factor of k and outsource the subproblem to a k-oracle
  • This gives Õ(k·R∞) box-constrained quadratic minimizations

  15. k-oracle
  • Input: graph Laplacian L, vector b
  • Ideally: output the minimizer of bᵀΔ + ½ ΔᵀLΔ over |Δ|∞ ≤ 1
  • Instead: output some Δ with |Δ|∞ ≤ k whose value is comparable (up to a constant factor) to that minimum
  • [CMTV '17]: Õ(m), based on the Laplacian solver of [LPS '15]
  • [AZLOW '17]: Õ(m + n^(4/3)), based on the approximate max-flow algorithm of [CKMST '11]
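
To pin down what the oracle promises, here is a contract-level Python sketch with hypothetical helper names; the generic bounded solver used below is only a slow stand-in and has nothing to do with the constructions based on [LPS '15] or [CKMST '11].

```python
import numpy as np
from scipy.optimize import minimize

def quad(L, b, D):
    """The quadratic model b^T D + 0.5 * D^T L D that the oracle must (approximately) minimize."""
    return b @ D + 0.5 * D @ L @ D

def reference_k_oracle(L, b, k=2.0):
    """Slow stand-in: minimize the model directly over the k-times-larger box.
    The whole point of the fast oracles is achieving (a constant-factor version of)
    this guarantee in nearly linear time; this function only illustrates the contract."""
    n = len(b)
    res = minimize(lambda D: quad(L, b, D), np.zeros(n),
                   jac=lambda D: b + L @ D, bounds=[(-k, k)] * n)
    return res.x

def satisfies_contract(L, b, D, k=2.0):
    """D must lie in the k-box and be at least as good as the best point of the unit box."""
    n = len(b)
    best_unit = minimize(lambda z: quad(L, b, z), np.zeros(n),
                         jac=lambda z: b + L @ z, bounds=[(-1.0, 1.0)] * n).fun
    return np.abs(D).max() <= k + 1e-6 and quad(L, b, D) <= best_unit + 1e-6

rng = np.random.default_rng(4)
W = rng.random((5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W            # Laplacian of a small weighted graph
b = rng.standard_normal(5)
print(satisfies_contract(L, b, reference_k_oracle(L, b)))   # True
```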

  16. Conclusions and Future Outlook
  • Nearly-linear time algorithms for matrix scaling and balancing
  • New framework for second-order optimization: used Hessian smoothness while avoiding self-concordance
  • Can we use any of these ideas for faster interior point methods?
  • The dependence on the condition number, log κ(X*), comes from the R∞ bound; if we want to detect perfect matchings, R∞ = Θ(n)
  • Is there a way to improve this dependence, e.g. to (log κ(X*))^(1/2)?
  • We saw an extension of Laplacian solving; what else is there? Better primitives for convex optimization?

  17. Thank You!
