
An Empirical Study of Optimal Noise and Runtime Distributions in Local Search

Lukas Kroc, Ashish Sabharwal, Bart Selman (Cornell University, USA). SAT 2010 Conference, Edinburgh, July 2010. Presented by: Holger H. Hoos.



  1. An Empirical Study of Optimal Noise and Runtime Distributions in Local Search. Lukas Kroc, Ashish Sabharwal, Bart Selman (Cornell University, USA). SAT 2010 Conference, Edinburgh, July 2010. Presented by: Holger H. Hoos

  2. Local Search Methods for SAT
  • A lot is known about Stochastic Local Search (SLS) methods [e.g. Hoos-Stutzle ’04], especially their behavior on random 3-SAT
  • Along with systematic search, one of the main SAT solution paradigms
  • Walksat: one of the first widely successful local search solvers
  • A biased random walk: combines greedy moves (downhill) with stochastic moves (possibly uphill) controlled by a “noise” parameter [0% .. 100%]
  • Yet, new surprising findings are still being discovered
  • Part of this work was motivated by the following observation: empirical evidence that Walksat's running time on large, random 3-SAT instances is quite predictable, and scales linearly with the number of variables for a specific setting of the noise parameter [Seitz-Alava-Orponen 2005]
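To make the algorithm concrete, here is a minimal, illustrative Python sketch of a Walksat-style biased random walk as described above. It is not the authors' implementation: the greedy criterion is simplified (production Walksat uses "break counts" and incremental bookkeeping instead of rescanning all clauses), and the helper names are my own.

```python
import random

def walksat(clauses, n_vars, noise=0.567, max_flips=10**6, rng=random):
    """Minimal Walksat-style biased random walk (illustrative sketch only).

    clauses: list of clauses, each a list of non-zero ints (DIMACS-style literals).
    noise:   probability of a random (possibly uphill) move instead of a greedy one.
    Returns a satisfying assignment (dict var -> bool) or None if max_flips runs out.
    """
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def num_unsat_after_flip(var):
        # number of clauses left unsatisfied if `var` were flipped
        assign[var] = not assign[var]
        cnt = sum(1 for c in clauses if not any(sat(l) for l in c))
        assign[var] = not assign[var]
        return cnt

    for _ in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if not unsat:
            return assign                       # solution found
        clause = rng.choice(unsat)              # pick a random unsatisfied clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))       # noisy move: random variable in the clause
        else:
            var = min((abs(l) for l in clause), # greedy move: flip the variable whose flip
                      key=num_unsat_after_flip) # leaves the fewest clauses unsatisfied
        assign[var] = not assign[var]
    return None
```

A real implementation keeps, for each variable, the clauses it occurs in and the current number of true literals per clause, so each flip costs time proportional to that variable's number of occurrences rather than to the whole formula.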

  3. Our Motivation
  Our work looks at Walksat again, on large, random 3-SAT formulas, and seeks answers to two questions:
  • Can we further characterize the “optimal noise” and the linear scaling behavior of Walksat? Key parameter: the clause-to-variable ratio, α
  • How do runtime distributions of Walksat behave at sub-optimal noise? Are they concentrated around the mean, or do they have “heavy tails” similar to complete search methods? Heavy tails ⇒ very long runs are more likely than we might expect. Heavy tails have not been reported in local search so far.
  Note: Walksat is still faster than current adaptive, dynamic-noise solvers on these formulas, so studying its behavior at the optimal static noise is of much interest.

  4. Summary of Results
  Walksat on large, random 3-SAT formulas:
  • Further characterization of the “optimal noise” and linear scaling:
  • A detailed analysis, showing a piece-wise linear fit for optimal noise as a function of α, with transitions at interesting points (extending the previous observation that ~57% is optimal for α=4.2)
  • Simple inverse polynomial dependence of runtime on α
  • Runtime distributions of Walksat at sub-optimal noise:
  • Exponential decay in the high-noise regime
  • Heavy tails in the low-noise regime: the first quantitative observation of heavy tails in local search [earlier insights: Hoos-Stutzle 2000]
  • Preliminary Markov Chain model

  5. A. Further Study of Optimal Noise and Linear Scaling

  6. Optimal Noise Setting vs. α
  Question:
  • How does the optimal noise setting vary with α and N?
  Experiment:
  • For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
  • For each, find the noise setting where Walksat is fastest (binary search; see the sketch below)
  • Average these optimal noise settings and plot them against α
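One plausible way to automate "find the noise setting where Walksat is fastest" is a bisection-style (here ternary) search over the noise parameter, assuming the median runtime is roughly unimodal in the noise. The exact search procedure used in the paper is not spelled out on the slide, so this sketch, and the hypothetical run_walksat wrapper it assumes (returning #flips for a single run), are illustrative only.

```python
import statistics

def median_flips(run_walksat, formula, noise, runs=25):
    """Median number of flips over several independent Walksat runs at a given noise."""
    return statistics.median(run_walksat(formula, noise) for _ in range(runs))

def optimal_noise(run_walksat, formula, lo=0.0, hi=1.0, tol=0.01):
    """Ternary search for the noise minimizing median #flips,
    assuming runtime is (roughly) unimodal in the noise parameter."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if median_flips(run_walksat, formula, m1) < median_flips(run_walksat, formula, m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2
```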

  7. Optimal Noise Setting vs. α
  • Data with 1 standard deviation bars
  • Optimal noise depends significantly on α (e.g., ~46% at α=3.9; ~57% at α=4.2)
  • Very good piece-wise linear fit
  • Transitions at interesting places:
  • α≈3: up to which the generalized unit clause (GUC) rule works almost surely [Frieze-Suen 1996]
  • α≈3.9: up to which greedy Walksat (GSAT) works (also where the “clustering structure” of the solution space is believed to change drastically: from one giant cluster to exponentially many small ones [Mezard-Mora-Zecchina 2005])
  [Figure: optimal noise vs. α, with annotations marking the points up to which the Generalized Unit Clause heuristic and greedy Walksat (GSAT) work]
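A hedged sketch of how such a piece-wise linear fit could be computed, with the breakpoints fixed at the α≈3 and α≈3.9 transitions named on the slide. Whether to enforce continuity across the breakpoints is a modeling choice; this simple version fits each segment independently, and any data passed in is the user's own (the paper's measurements are not reproduced here).

```python
import numpy as np

def piecewise_linear_fit(alpha, noise, breaks=(3.0, 3.9)):
    """Fit an independent straight line to each alpha-segment delimited by `breaks`.
    alpha, noise: 1-D numpy arrays of measurements.
    Returns one (slope, intercept) pair per segment."""
    edges = (-np.inf, *breaks, np.inf)
    fits = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (alpha > lo) & (alpha <= hi)
        fits.append(tuple(np.polyfit(alpha[mask], noise[mask], deg=1)))
    return fits
```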

  8. Linear Scaling at Optimal Noise
  Experiment:
  • For α in [1.5...4.2], generate random 3-SAT formulas with N in [100K..400K]
  • Measure Walksat's runtime at optimal noise (#flips till a solution is found)
  • Plot #flips/N against α (one point per run, no averaging)
  Results:
  • Inverse polynomial fit of #flips/N as a function of α, suggesting linear scaling for α < 4.235 [fig explained in paper]
  • Points with varying N fall on top of each other after rescaling by N, showing linearity w.r.t. N
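The slide reports an inverse polynomial fit of #flips/N versus α, suggesting linear scaling for α < 4.235. Below is a sketch of one such fit using scipy's curve_fit; the functional form c/(α₀ − α)^k, the initial guess, and the bounds are assumptions of mine, not necessarily the exact form used in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def inv_poly(alpha, c, alpha0, k):
    """flips/N ~ c / (alpha0 - alpha)^k, diverging as alpha approaches alpha0."""
    return c / (alpha0 - alpha) ** k

def fit_scaling(alpha, flips_per_var):
    """alpha: clause-to-variable ratios; flips_per_var: measured #flips / N at optimal noise."""
    p0 = (1.0, 4.235, 1.0)                     # initial guess; 4.235 taken from the slide
    lb, ub = (0.0, 4.21, 0.0), (np.inf, 5.0, 5.0)
    params, _cov = curve_fit(inv_poly, alpha, flips_per_var, p0=p0, bounds=(lb, ub))
    return dict(zip(("c", "alpha0", "k"), params))
```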

  9. B. Runtime Distribution of Local Search Methods

  10. Standard vs. Heavy-Tailed Distributions
  • Standard distributions: exponential or faster decay, e.g., the Normal distribution (finite mean & variance)
  • Heavy-tailed distributions: power-law decay, e.g., the Pareto-Levy distribution
  [Figure: survival functions contrasting exponential decay (standard distribution) with power-law decay]
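Stated as survival functions (a formalization added here for reference; the slide shows only the curves, and the tail index is written a to avoid clashing with the clause-to-variable ratio α):

```latex
\[
  \text{standard:}\;\; \Pr[X > x] \le C\, e^{-\lambda x} \ \text{(exponential or faster decay)}
  \qquad
  \text{heavy-tailed:}\;\; \Pr[X > x] \sim C\, x^{-a} \ \text{(power-law decay)}
\]
% A power-law tail with index a <= 2 has infinite variance, and with a <= 1 even the
% mean is infinite -- hence the "finite mean & variance" contrast on the slide.
```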

  11. Heavy-Tailed Distributions
  • Heavy-tailed distributions: power-law decay, e.g., the Pareto-Levy distribution
  • Signature: the tail of the distribution is a straight line in a log-log plot
  • Observed in systematic search solvers; the mechanism is well understood in terms of “bad” variable assignments that are hard to recover from [Gomes, Kautz and Selman ‘99, ’00]
  • Motivated key techniques such as search restarts and algorithm portfolios
  • Not previously observed in studies of local search methods

  12. Runtime Distributions of Walksat
  Experiment:
  • Generate a random 3-SAT formula with N=100K at α=4.2
  • Large formulas, free of small-size effects; very hard to solve, yet still less constrained than formulas at the phase transition (α ≈ 4.26)
  • Run 100K (!) runs of Walksat with noise settings around the optimal one
  • Plot the runtime distribution: probability of failure to find a solution as a function of #flips
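A sketch of how such a runtime distribution (probability of failure as a function of #flips) can be estimated and plotted from the recorded lengths of many independent runs. The plotting details are illustrative; run_lengths is assumed to be the list of #flips needed by each run.

```python
import numpy as np
import matplotlib.pyplot as plt

def survival_curve(run_lengths):
    """Empirical P[a run has not found a solution within t flips]."""
    t = np.sort(np.asarray(run_lengths, dtype=float))
    p_fail = 1.0 - np.arange(len(t)) / len(t)   # starts at 1, ends at 1/len(t)
    return t, p_fail

def plot_runtime_distribution(run_lengths):
    t, p_fail = survival_curve(run_lengths)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.loglog(t, p_fail)     # straight tail here => power-law decay (heavy tail)
    ax2.semilogy(t, p_fail)   # straight tail here => exponential decay
    for ax in (ax1, ax2):
        ax.set_xlabel("#flips")
        ax.set_ylabel("P[failure]")
    plt.show()
```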

  13. Runtime Distributions of Walksat [Setting: large, random 3-SAT formula with α=4.2]
  Summary of Results:
  • There is a qualitative difference between noise higher than optimal (>56.7%) and lower than optimal (<56.7%)
  • High-noise regime: the tail of P[failure] has an exponential distribution
  • Low-noise regime: the tail of P[failure] has a power-law distribution
  • Intuition captured by a (preliminary) Markov Chain model: high noise means “guessing the solution”; low noise (too greedy) leads the search into “local traps”; optimal noise is where the two effects balance

  14. Heavy Tails in Low-Noise Regimes
  [Figure: log-log plot of P[failure] vs. #flips; 100K data points plotted per curve (actual data points, no fitting; not all data points marked with o, x, +, etc. for clarity). On a LOG-LOG scale a straight line = power-law decay; a linear fit to the last 5% of the tail (5K points) has slope 0.38]
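The 0.38 slope reported on the plot is a straight-line fit to the last 5% of the tail in log-log coordinates. A sketch of that estimate, using the survival_curve output from the sketch above: the 5% cutoff follows the slide, and the value obtained depends entirely on the data supplied.

```python
import numpy as np

def tail_slope(t, p_fail, tail_fraction=0.05):
    """Least-squares slope of log P[failure] vs. log t over the last tail_fraction of points.
    For a power-law tail P[failure] ~ t^(-a), the returned slope is approximately -a."""
    n_tail = max(2, int(len(t) * tail_fraction))
    lt, lp = np.log(t[-n_tail:]), np.log(p_fail[-n_tail:])
    slope, _intercept = np.polyfit(lt, lp, deg=1)
    return slope
```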

  15. Heavy Tails in Low-Noise Regimes
  Same data as the previous plot, but with all 100K data points (per curve) marked with o, x, +, etc., and the full y-axis shown. As before, actual data points, no fitting.

  16. Qualitative Contrast: High vs. Low Noise Regimes
  [Figure: LOG-LOG scale plots for the high-noise and low-noise regimes; a straight line = power-law decay]
  • Low noise: straight line ⇒ heavy-tailed; extremely long runs are much more likely than one might expect!
  • High noise: not straight lines ⇒ not heavy-tailed; in fact, a log-linear plot reveals a clear exponential tail

  17. Understanding Variation with Noise Level and Power-Law Decay: Preliminary Insights

  18. Different “Search” at High, Low, and Optimal Noise
  Experiment:
  • Run Walksat at different noise levels on a formula with 100K vars, 420K clauses
  • Plot how the number of unsatisfied clauses evolves as the search progresses (0 on the y-axis = solution)
  Observations:
  • High noise: the search gets “stuck” at a relatively high value
  • Optimal noise: a gradual descent until a solution is found
  • Low noise: #unsat clauses decreases fast but gets “stuck” at a relatively low value
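A sketch of how the plotted quantity can be recorded: the walksat() sketch from earlier, instrumented to log the number of unsatisfied clauses every few flips. The sampling interval is arbitrary and only serves to keep the trace small.

```python
import random

def unsat_trajectory(clauses, n_vars, noise, max_flips, sample_every=100, rng=random):
    """Variant of the walksat() sketch above that records the number of unsatisfied
    clauses every `sample_every` flips (a 0 in the trace means a solution was found)."""
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda lit: assign[abs(lit)] == (lit > 0)

    def unsat_if_flipped(var):
        # number of clauses unsatisfied after hypothetically flipping `var`
        return sum(1 for c in clauses
                   if not any((sat(l) if abs(l) != var else not sat(l)) for l in c))

    trace = []
    for step in range(max_flips):
        unsat = [c for c in clauses if not any(sat(l) for l in c)]
        if step % sample_every == 0:
            trace.append(len(unsat))
        if not unsat:
            break
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))                              # noisy move
        else:
            var = min((abs(l) for l in clause), key=unsat_if_flipped)  # greedy move
        assign[var] = not assign[var]
    return trace
```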

  19. Markov Chain Model Capturing Power-Law Decay (preliminary)
  [details omitted; refer to the paper. Similar to the work of Hoos ’02]
  Key features:
  • States represent (roughly) the number of unsatisfied clauses; the left-most state = all solutions
  • Ladder structures capture falling into a “trap”; the farther the search keeps falling, the harder it gets to recover (recovery time = hitting time of a biased 1-dimensional Markov Chain)

  20. Markov Chain Model Capturing Power-Law Decay (preliminary)
  [details omitted; refer to the paper. Similar to the work of Hoos ’02]
  In the horizontal part of the chain:
  • High noise: avoids traps but is attracted towards the top-middle node; exponential time to convergence, very concentrated around the mean
  • Low noise: leftward drift but a good chance of falling into a trap; exponential time to convergence but power-law decay
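The "trap" component of the model reduces to the hitting time of a biased one-dimensional random walk. Below is a small simulation sketch of that component only; the transition probability p_up and the trap depth are placeholders, not the paper's model parameters.

```python
import random

def trap_recovery_time(depth, p_up, rng=random, max_steps=10**7):
    """Steps for a biased 1-D random walk started `depth` levels down a trap to reach
    level 0 (recovery). With p_up < 0.5 the walk drifts deeper, so some recovery
    times become extremely long."""
    pos, steps = depth, 0
    while pos > 0 and steps < max_steps:
        pos += -1 if rng.random() < p_up else 1
        steps += 1
    return steps

# Hypothetical usage: recovery times from a shallow trap with a mild downward drift
# times = [trap_recovery_time(depth=3, p_up=0.45) for _ in range(1000)]
```

With p_up below 1/2 the walk drifts away from the exit, so a small fraction of simulated runs take extremely long to recover; aggregated over many runs, this is the kind of mechanism the slide's model uses to produce power-law decay.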

  21. Summary
  • Further study of optimal noise for Walksat:
  • depends on the clause-to-variable ratio, α, in a piece-wise linear fashion with transitions at interesting points
  • allows for a simple inverse polynomial fit for the linearity constant
  • Runtime distributions in local search:
  • drastic change in behavior below and above the optimal noise
  • exponential decay for higher-than-optimal noise
  • power-law decay (heavy tails) for lower-than-optimal noise
  • Future directions:
  • a better understanding of when heavy tails appear and when they don't
  • an improved model capturing heavy tails in local search
  • ways of utilizing these insights to improve local search solvers (similar to restarts and algorithm portfolios for complete search)
