Multiplicative weights method: A meta-algorithm with applications to linear and semidefinite programming. Sanjeev Arora, Princeton University.

Presentation Transcript


  1. Multiplicative weights method: A meta-algorithm with applications to linear and semidefinite programming. Sanjeev Arora, Princeton University. Based upon: Fast algorithms for approximate SDP [FOCS '05]; $O(\sqrt{\log n})$-approximation to SPARSEST CUT in $\tilde{O}(n^2)$ time [FOCS '04]; The multiplicative weights update method and its applications ['05]. See also recent papers by Hazan and Kale.

  2. Multiplicative update rule (long history). Applications: approximate solutions to LPs and SDPs, flow problems, online learning (boosting), derandomization & Chernoff bounds, online convex optimization, computational geometry, metric embeddings, portfolio management… (see our survey). Setting: $n$ agents with weights $w_1, w_2, \ldots, w_n$. Update the weights according to performance: $w_i^{t+1} \leftarrow w_i^t \,(1 + \epsilon \cdot \text{performance of } i)$.

  3. Simplest setting – predicting the market • N “experts” on TV • 1$ for a correct prediction, 0$ for an incorrect one • Can we perform as well as the best expert?

  4. Weighted majority algorithm [LW '94]. “Predict according to the weighted majority.” Multiplicative update (initially all $w_i = 1$): • If expert $i$ predicted correctly: $w_i^{t+1} \leftarrow w_i^t$ • If incorrectly: $w_i^{t+1} \leftarrow w_i^t(1 - \epsilon)$. Claim: #mistakes by the algorithm $\approx 2(1+\epsilon)\cdot$(#mistakes by the best expert). Proof idea: • Potential: $\Phi^t = $ sum of weights $ = \sum_i w_i^t$ (initially $n$) • If the algorithm predicts incorrectly, then $\Phi^{t+1} \le \Phi^t - \epsilon\Phi^t/2$ • So $\Phi^T \le (1-\epsilon/2)^{m(A)}\, n$, where $m(A)$ = #mistakes by the algorithm • Also $\Phi^T \ge (1-\epsilon)^{m_i}$, where $m_i$ = #mistakes of expert $i$ • Hence $m(A) \le 2(1+\epsilon)\, m_i + O(\log n/\epsilon)$.
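
A minimal sketch of the weighted majority rule described above; the prediction interface, the choice of $\epsilon$, and the helper names are illustrative, not from the talk:

```python
def weighted_majority(expert_predictions, outcomes, eps=0.1):
    """Deterministic weighted majority [LW '94] (sketch).

    expert_predictions[t][i] is expert i's 0/1 prediction at time t;
    outcomes[t] is the true 0/1 outcome.  Returns the algorithm's mistake count.
    """
    n = len(expert_predictions[0])
    w = [1.0] * n                          # initially all weights are 1
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # predict with the weighted majority of the experts
        vote_one = sum(w[i] for i in range(n) if preds[i] == 1)
        vote_zero = sum(w[i] for i in range(n) if preds[i] == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != outcome:
            mistakes += 1
        # multiplicative update: penalize every expert that was wrong
        for i in range(n):
            if preds[i] != outcome:
                w[i] *= (1 - eps)
    return mistakes
```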

  5. Generalized weighted majority [A., Hazan, Kale '05]. Setup: a set of events (possibly infinite) and $n$ agents (experts); expert $i$ receives payoff $M(i,j)$ when event $j$ occurs.

  6. Generalized weighted majority [AHK '05]. Set of events (possibly infinite), $n$ agents. Algorithm: plays a distribution $(p_1, \ldots, p_n)$ on the experts. Payoff for event $j$: $\sum_i p_i\, M(i,j)$. Update rule: $p_i^{t+1} \leftarrow p_i^t\,(1 + \epsilon \cdot M(i,j))$, followed by renormalization so the $p_i$ again sum to 1. Claim: After $T$ iterations, algorithm payoff $\ge (1-\epsilon)\cdot$(best expert's payoff) $- O(\log n/\epsilon)$.
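
A minimal sketch of this generalized update, assuming payoffs $M(i,j)$ in $[-1,1]$ and a caller-supplied `event_for` callback that picks the event each round (both assumptions are mine, not the talk's):

```python
def generalized_mw(M, event_for, T, eps=0.1):
    """Generalized weighted majority [AHK '05] (sketch).

    M[i][j]   : payoff of expert i on event j, assumed to lie in [-1, 1].
    event_for : callback that takes the current distribution p and returns
                the index j of the event that occurs this round.
    Returns the algorithm's total payoff over T rounds.
    """
    n = len(M)
    w = [1.0] * n
    total = 0.0
    for _ in range(T):
        s = sum(w)
        p = [wi / s for wi in w]           # distribution played on the experts
        j = event_for(p)                   # event chosen (possibly adversarially)
        total += sum(p[i] * M[i][j] for i in range(n))
        # multiplicative update in the direction of each expert's payoff
        for i in range(n):
            w[i] *= (1 + eps * M[i][j])
    return total
```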

  7. One meta-algorithm, many connections: fast solutions to LPs and SDPs; Lagrangian relaxation; game playing and online optimization; gradient descent; Chernoff bounds; games with matrix payoffs; boosting.

  8. Common features of MW algorithms • “competition” amongst $n$ experts • appearance of terms like $\exp\!\big(-\epsilon \sum_t (\text{performance at time } t)\big)$ • time to get $\epsilon$-approximate solutions is proportional to $1/\epsilon^2$.

  9. Boosting and AdaBoost (Schapire '91, Freund–Schapire '97). “Weak learning $\Rightarrow$ strong learning.” Input: 0/1 vectors with a “Yes/No” label, e.g. (white, tall, vegetarian, smoker, …) $\mapsto$ “Has disease”; (nonwhite, short, vegetarian, nonsmoker, …) $\mapsto$ “No disease”. Desired: a short OR-of-ANDs (i.e., DNF) formula that describes a given fraction of the data (if one exists), e.g. (white $\wedge$ vegetarian) $\vee$ (nonwhite $\wedge$ nonsmoker). “Strong learning”: describes a $1-\epsilon$ fraction; “weak learning”: describes a $1/2+\epsilon$ fraction. Main idea in boosting: data points are the “experts”, weak learners are the “events.”
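
A rough sketch of the “data points are experts” view, assuming a `weak_learner(points, labels, dist)` callback that returns a hypothesis correct on slightly more than half of the weighted data; the round count, $\epsilon$, and the final majority vote are illustrative choices:

```python
def boost(points, labels, weak_learner, rounds=20, eps=0.1):
    """Boosting viewed as MW (sketch): weights live on the data points
    ("experts"); each weak-learner call plays the role of an "event"."""
    m = len(points)
    w = [1.0] * m
    hypotheses = []
    for _ in range(rounds):
        s = sum(w)
        dist = [wi / s for wi in w]
        h = weak_learner(points, labels, dist)   # correct on >= 1/2 + gamma of the weight
        hypotheses.append(h)
        # down-weight points the weak hypothesis got right, so later rounds
        # concentrate on the hard examples
        for i in range(m):
            if h(points[i]) == labels[i]:
                w[i] *= (1 - eps)
    # final "strong" hypothesis: majority vote of the weak hypotheses
    def strong(x):
        votes = sum(1 if h(x) == 1 else -1 for h in hypotheses)
        return 1 if votes >= 0 else 0
    return strong
```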

  10. Approximate solutions to convex programming: e.g., Plotkin–Shmoys–Tardos '91, Young '97, Garg–Koenemann '99, Fleischer '99. The MW meta-algorithm gives a unified view. Note: distribution $\leftrightarrow$ convex combination $\{p_i\}$ with $\sum_i p_i = 1$; if $P$ is a convex set and each $x_i \in P$, then $\sum_i p_i x_i \in P$.

  11. Solving LPs (feasibility). Constraints: $a_1 \cdot x \ge b_1$, $a_2 \cdot x \ge b_2$, …, $a_m \cdot x \ge b_m$, with $x \in P$, where $P$ is a convex domain. MW maintains weights $w_1, w_2, \ldots, w_m$ on the constraints. Event/Oracle: given the current weights, return $x \in P$ satisfying $\sum_k w_k (a_k \cdot x - b_k) \ge 0$.

  12. Solving LPs (feasibility), continued. The oracle returns a point $x_0 \in P$ with $\sum_k w_k (a_k \cdot x_0 - b_k) \ge 0$; the width is $\rho = \max_k |a_k \cdot x_0 - b_k|$. Weight update: $w_k \leftarrow w_k \,[\,1 - \epsilon\,(a_k \cdot x_0 - b_k)/\rho\,]$ for each constraint $k$. Final solution = average of the $x_0$ vectors over all iterations.

  13. Performance guarantees • In $O(\rho^2 \log(n)/\epsilon^2)$ iterations, the average $x$ is $\epsilon$-feasible. • Packing–covering LPs [Plotkin, Shmoys, Tardos '91] (covering problem): $\exists?\; x \in P$ with $a_j \cdot x \ge 1$ for $j = 1, 2, \ldots, m$. • Want to find $x \in P$ s.t. $a_j \cdot x \ge 1 - \epsilon$. • Assume: $\forall x \in P$: $0 \le a_j \cdot x \le \rho$. • The MW algorithm gets such an $x$ in $O(\rho \log(n)/\epsilon^2)$ iterations.
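
A minimal sketch of the MW-plus-oracle loop sketched on the last few slides; `oracle`, the width bound `rho`, and $\epsilon$ are supplied by the caller, and all names are illustrative:

```python
import numpy as np

def mw_lp_feasibility(A, b, oracle, rho, T, eps=0.1):
    """MW meta-algorithm for LP feasibility (sketch): find x in P with
    A x >= b approximately satisfied.

    oracle(w) : returns some x in P with sum_k w_k * (a_k . x - b_k) >= 0.
    rho       : width bound, max_k |a_k . x - b_k| over the oracle's answers.
    After roughly O(rho^2 log(m) / eps^2) rounds the returned average
    violates each constraint by only a small additive amount.
    """
    m = A.shape[0]
    w = np.ones(m)                         # one weight per constraint ("expert")
    answers = []
    for _ in range(T):
        x = oracle(w / w.sum())
        answers.append(x)
        slack = A @ x - b                  # per-constraint slack at the oracle's point
        # badly violated constraints (negative slack) gain weight
        w *= (1 - eps * slack / rho)
    return np.mean(answers, axis=0)        # final solution = average x vector
```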

  14. Randomized rounding (repeated O(log n) times), a la Raghavan–Thompson: solve the LP relaxation of the integer program to obtain a solution $(y_i)$; set $x_i \leftarrow 1$ with probability $y_i$ and $x_i \leftarrow 0$ with probability $1 - y_i$. Connection to Chernoff bounds and derandomization: deterministic approximation algorithms for 0/1 packing/covering problems, derandomized using pessimistic estimators of the form $\exp(\sum_i t \cdot f(y_i))$. Young ['95], “Randomized rounding without solving the LP”: the MW update rule mimics the pessimistic estimator.
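
A small sketch of that rounding step; the repetition count and the `feasible` check are illustrative stand-ins for the problem-specific constraints:

```python
import math
import random

def randomized_round(y, feasible, tries=None):
    """Round a fractional LP solution y in [0,1]^n to a 0/1 vector (sketch).

    Each x_i is independently set to 1 with probability y_i; repeat O(log n)
    times and keep the first rounding that the `feasible` predicate accepts.
    """
    n = len(y)
    if tries is None:
        tries = max(1, int(2 * math.log(n + 1)))   # O(log n) repetitions
    for _ in range(tries):
        x = [1 if random.random() < yi else 0 for yi in y]
        if feasible(x):
            return x
    return None    # unlikely if the Chernoff-bound analysis applies
```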

  15. Application 2: Semidefinite programming (Klein–Lu '97). Constraints: $a_1 \cdot x \ge b_1$, $a_2 \cdot x \ge b_2$, …, $a_m \cdot x \ge b_m$, $x \in P$, where the $a_j$ and $x$ are symmetric matrices in $\mathbb{R}^{n \times n}$ and $P = \{x : x \text{ is psd},\ \mathrm{tr}(x) = 1\}$. Oracle: maximize $\sum_j w_j (a_j \cdot x)$ over $P$ (an eigenvalue computation!).

  16. The promise of SDPs… 0.878-approximation to MAX-CUT (GW '94); 7/8-approximation to MAX-3SAT (KZ '00); $O(\sqrt{\log n})$-approximation to SPARSEST CUT and BALANCED SEPARATOR (ARV '04); $O(\sqrt{\log n})$-approximation to MIN-2CNF DELETION and MIN-UNCUT (ACMM '05), MIN-VERTEX SEPARATOR (FHL '06), etc. The pitfall… high running times, as high as $n^{4.5}$ in recent works.

  17. Solving SDP relaxations more efficiently using MW [AHK '05]

  18. Key difference between efficient and not-so-efficient implementations of the MW idea: width management. (E.g., the difference between PST '91 and GK '99.)

  19. Recall: the issue of width. MW on the constraints $a_1 \cdot x \ge b_1, \ldots, a_m \cdot x \ge b_m$, with an oracle returning $x \in P$ such that $\sum_k w_k (a_k \cdot x - b_k) \ge 0$: • $\tilde{O}(\rho^2/\epsilon^2)$ iterations to obtain an $\epsilon$-feasible $x$, where $\rho = \max_k |a_k \cdot x - b_k|$ • $\rho$ is too large!!

  20. Issue 1: Dealing with width. Same setup (constraints $a_k \cdot x \ge b_k$; oracle returns $x \in P$ with $\sum_k w_k (a_k \cdot x - b_k) \ge 0$). • Only a few constraints have high width • Run a hybrid of MW and the ellipsoid/Vaidya method • $\mathrm{poly}(m, \log(\rho/\epsilon))$ iterations to obtain an $\epsilon$-feasible $x$.

  21. Dealing with width (contd.) • Hybrid of MW and Vaidya, with a dual ellipsoid/Vaidya oracle • $\tilde{O}(L^2/\epsilon^2)$ iterations to obtain an $\epsilon$-feasible $x$, where $L \ll \rho$.

  22. Issue 2: Efficient implementation of the oracle: fast eigenvalues via matrix sparsification. • The Lanczos algorithm effectively uses the sparsity of $C$ • Random sampling: replace $C$ by a sparse $C'$ with $O(n \sum_{ij} |C_{ij}|/\epsilon)$ non-zero entries such that $\|C - C'\| \le \epsilon$ • Similar to Achlioptas, McSherry ['01], but better in some situations (also an easier analysis).
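
A hedged sketch of sparsification by randomized entry sampling: keep each entry with probability proportional to its magnitude and rescale kept entries so the sparsifier is unbiased. The exact sampling probabilities and the bound on $\|C - C'\|$ are as in the papers cited above, not in this toy code:

```python
import numpy as np

def sparsify(C, expected_entries):
    """Entry-wise sampling sparsifier (sketch): E[C'] = C.

    Keep C[i, j] with probability p_ij proportional to |C[i, j]| (capped at 1)
    and rescale kept entries by 1/p_ij.  `expected_entries` controls the
    expected number of retained non-zeros.
    """
    absC = np.abs(C)
    scale = expected_entries / absC.sum()
    P = np.minimum(1.0, scale * absC)         # keep-probabilities
    mask = np.random.random(C.shape) < P      # entries we retain
    Cp = np.zeros_like(C, dtype=float)
    Cp[mask] = C[mask] / P[mask]              # unbiased rescaling
    return Cp
```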

  23. An $O(n^2)$-time algorithm to compute an $O(\sqrt{\log n})$-approximation to SPARSEST CUT [A., Hazan, Kale '04] (a combinatorial algorithm; improves upon the $O(n^{4.5})$ algorithm of A., Rao, Vazirani).

  24. Sparsest Cut. [Figure: an example cut $S$ with sparsity $2/5$.] • $O(\log n)$ approximation [Leighton–Rao '88] • $O(\sqrt{\log n})$ approximation [A., Rao, Vazirani '04] • $O(\sqrt{\log n})$ approximation in $O(n^2)$ time (actually, finds expander flows) [A., Hazan, Kale '05].

  25. MW algorithm to find expander flows • Events: $\{(s, w, z)\}$ = weights on vertices, edges, and cuts • Experts: pairs of vertices $(i,j)$ • Payoff (for weights $d_{i,j}$ on the experts): defined via the shortest path between $i$ and $j$ under the edge weights $w_e$ and the cuts separating $i$ and $j$. Fact: If the events are chosen optimally, the distribution $d_{i,j}$ on experts converges to a demand graph which is an “expander flow” [by results of Arora–Rao–Vazirani '04, this suffices to produce an approximately sparsest cut].

  26. New: Online games with matrix payoffs (joint w/ Satyen Kale '06). The payoff is a matrix, and so is the “distribution” on experts! Uses matrix analogues of the usual inequalities: $1 + x \le e^x$ becomes $I + A \preceq e^A$. Can be used to give a new “primal-dual” view of SDP relaxations ($\Rightarrow$ easier, faster approximation algorithms).
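
A rough sketch of the matrix analogue of the MW weights: the “distribution” is a density matrix given by a matrix exponential of the accumulated payoffs. The normalization and the choice of $\epsilon$ are standard assumptions on my part, not details from the talk:

```python
import numpy as np
from scipy.linalg import expm

def matrix_mw_density(payoffs, eps=0.1):
    """Matrix-MW "weights" (sketch).

    payoffs : non-empty list of symmetric payoff matrices M_1, ..., M_t,
              eigenvalues assumed to lie in [-1, 1].
    Returns the density matrix P = exp(eps * sum_t M_t) / tr(exp(...)),
    the matrix analogue of w_i proportional to exp(eps * total payoff of i).
    """
    S = sum(payoffs)                  # accumulated (symmetric) payoff matrix
    E = expm(eps * S)                 # matrix exponential; psd with positive trace
    return E / np.trace(E)            # normalize to trace 1
```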

  27. New: Faster algorithms for online learning and portfolio management (Agarwal–Hazan '06, Agarwal–Hazan–Kalai–Kale '06). A framework for online optimization inspired by Newton's method (2nd-order optimization). (Note: MW $\approx$ gradient descent.) Fast algorithms for portfolio management and other online optimization problems.

  28. Open problems • Better approaches to width management? • Faster run times? • Lower bounds? THANK YOU

  29. Connection to Chernoff bounds and derandomization • Deterministic approximation algorithms a la Raghavan–Thompson • Packing/covering IP with variables $x_i \in \{0,1\}$: $\exists?\; x \in P$ such that $\forall j \in [m]$, $f_j(x) \ge 0$ • Solve the LP relaxation using variables $y_i \in [0,1]$ • Randomized rounding: $x_i = 1$ w.p. $y_i$, $x_i = 0$ w.p. $1 - y_i$ • Chernoff: $O(\log m)$ sampling iterations suffice.

  30. Derandomization [Young '95] • Can derandomize the rounding using $\exp(t \sum_j f_j(x))$ as a pessimistic estimator of the failure probability • By minimizing the estimator in every iteration, we mimic the random experiment, so $O(\log m)$ iterations suffice • The structure of the estimator obviates the need to solve the LP: randomized rounding without solving the linear program • Punchline: the resulting algorithm is the MW algorithm!
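
A toy sketch of the method of pessimistic estimators: fix the variables one at a time, each time picking the value that does not increase the current bound on the failure probability. The `estimator` interface here is illustrative; Young's construction ties this quantity to the MW weights:

```python
def derandomize(n, estimator):
    """Greedy derandomization with a pessimistic estimator (sketch).

    estimator(partial) : upper bound on the probability that a random
    completion of the partial 0/1 assignment `partial` fails.  Since the
    estimator averages over the next choice, one of the two extensions does
    not increase it, so the final bound stays below its initial value.
    """
    partial = []
    for _ in range(n):
        if estimator(partial + [1]) <= estimator(partial + [0]):
            partial.append(1)
        else:
            partial.append(0)
    return partial
```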

  31. Weighted majority [LW '94], the potential argument • If the algorithm loses at time $t$: $\Phi^{t+1} \le (1 - \tfrac{1}{2}\epsilon)\,\Phi^t$ • At time $T$: $\Phi^T \le (1 - \tfrac{1}{2}\epsilon)^{\#\text{mistakes}} \cdot \Phi^0$ • Overall: #mistakes $\le \log(n)/\epsilon + (1+\epsilon)\, m_i$, where $m_i$ = #mistakes of expert $i$.

  32. Semidefinite programming • The $a_j$ and $x$ are symmetric matrices in $\mathbb{R}^{n \times n}$ • $x \succeq 0$ • Assume: $\mathrm{Tr}(x) \le 1$ • Set $P = \{x : x \succeq 0,\ \mathrm{Tr}(x) \le 1\}$ • Oracle: maximize $\sum_j w_j (a_j \cdot x)$ over $P$ • Optimum: $x = vv^T$, where $v$ is the largest eigenvector of $\sum_j w_j a_j$.

  33. Efficiently implementing the oracle • Optimum: $x = vv^T$, where $v$ is the largest eigenvector of some matrix $C$ • It suffices to find a vector $v$ such that $v^T C v \ge 0$ • The Lanczos algorithm with a random starting vector is ideal for this • Advantage: uses only matrix–vector products • Exploits sparsity (also: the sparsification procedure) • Use the analysis of Kuczynski and Wozniakowski ['92].
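
A minimal sketch of such an oracle using only matrix–vector products; plain power iteration stands in for Lanczos here (a real implementation would use Lanczos, e.g. scipy.sparse.linalg.eigsh), and all names are illustrative:

```python
import numpy as np

def approx_top_eigenvector(matvec, n, iters=200, seed=0):
    """Approximate top eigenvector of a symmetric matrix C given only a
    matvec oracle, via power iteration from a random start (sketch).
    Assumes C's dominant eigenvalue is its largest one; Lanczos converges
    faster but needs more bookkeeping."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = matvec(v)                  # only matrix-vector products: exploits sparsity
        v /= np.linalg.norm(v)
    return v

# For the SDP oracle on the previous slide: C = sum_j w_j * a_j (sparse),
# matvec(v) = C @ v, and the candidate solution is x = v v^T.
```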
