Provably hard problems below the satisfiability threshold

A sharp threshold in proof complexity yields lower bounds for satisfiability search
Paul Beame (Univ. of Washington), Dimitris Achlioptas (Microsoft Research), Michael Molloy (Univ. of Toronto)

CNF Satisfiability
• (x1 ∨ x2 ∨ x4) ∧ (x1 ∨ x3) ∧ (x3 ∨ x2) ∧ (x4 ∨ x3)
• NP-complete, but many heuristics exist because of its practical importance
  • presumably exponential in the worst case
• If you know the formula is satisfiable
  • How hard is it to find an assignment?
  • No lower bounds known for interesting heuristics.

Satisfiability Algorithms
• Local search (incomplete)
  • GSAT [Selman, Levesque, Mitchell 92]
  • Walksat [Kautz, Selman 96]
• Backtracking search (complete)
  • DPLL [Davis, Putnam 60] [Davis, Logemann, Loveland 62]
  • DPLL + “clause learning”

Backtracking search/DPLL
• Select* a literal l (some x or ¬x) (free step)
  • Remove all clauses containing l
  • Shrink all clauses containing ¬l
  • Yields ‘residual formula’
• While there are 1-clauses
  • Pick some (arbitrary) 1-clause, satisfy it and simplify
• If there is a 0-clause (contradiction)
  • Backtrack to last free step
* many options for select
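
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speakers' implementation: the integer encoding of literals (-v for the negation of v), the helper names `simplify` and `first_literal`, and the recursive structure are all assumptions made here. The sketch runs the 1-clause loop before each free step rather than after it, which visits the same search tree.

```python
from typing import Callable, Dict, List, Optional

Clause = List[int]      # assumed encoding: literal = nonzero int, -v negates v
Formula = List[Clause]


def simplify(formula: Formula, lit: int) -> Formula:
    """Set lit to True: remove clauses containing lit, shrink those containing -lit."""
    return [[x for x in c if x != -lit] for c in formula if lit not in c]


def first_literal(formula: Formula) -> int:
    """Placeholder select rule (hypothetical): first literal of the first clause."""
    return formula[0][0]


def dpll(formula: Formula,
         select: Callable[[Formula], int] = first_literal,
         assignment: Optional[Dict[int, bool]] = None) -> Optional[Dict[int, bool]]:
    """Return a satisfying (possibly partial) assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    while True:
        if any(len(c) == 0 for c in formula):
            return None                      # 0-clause: contradiction, backtrack
        unit = next((c[0] for c in formula if len(c) == 1), None)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0     # forced step: satisfy a 1-clause
        formula = simplify(formula, unit)
    if not formula:
        return assignment                    # no clauses left: satisfied
    lit = select(formula)                    # free step
    for choice in (lit, -lit):               # second value is tried on backtrack
        result = dpll(simplify(formula, choice), select,
                      {**assignment, abs(choice): choice > 0})
        if result is not None:
            return result
    return None
```

Any function mapping a residual formula to a literal can be passed as `select`, which is the only place where the algorithms discussed in this talk differ.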

Resolution
• Start with clauses of CNF formula F
• Resolution rule
  • Given (A ∨ x), (B ∨ ¬x) can derive (A ∨ B)
• F is unsatisfiable ⇔ 0-clause derivable
• Proof size = # of clauses
Running DPLL (with any select) on an unsatisfiable formula F results in a tree-resolution proof of F
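
As a small illustration of the resolution rule, the sketch below represents a clause as a frozenset of integer literals (an encoding assumed here, with -v for the negation of v) and derives the 0-clause from a three-clause unsatisfiable formula. The function name `resolve` is hypothetical.

```python
from typing import FrozenSet, Optional

Clause = FrozenSet[int]   # assumed encoding: -v is the negation of variable v


def resolve(c1: Clause, c2: Clause, x: int) -> Optional[Clause]:
    """Resolve (A ∨ x) with (B ∨ ¬x) on variable x, giving (A ∨ B).

    Returns None when x does not occur positively in c1 and negatively in c2.
    """
    if x in c1 and -x in c2:
        return (c1 - {x}) | (c2 - {-x})
    return None


# F = (x1) ∧ (¬x1 ∨ x2) ∧ (¬x2) is unsatisfiable; two steps reach the 0-clause.
f = [frozenset({1}), frozenset({-1, 2}), frozenset({-2})]
step = resolve(f[0], f[1], 1)    # derives (x2)
empty = resolve(step, f[2], 2)   # derives the 0-clause
```

Counting the input clauses plus the two derived ones, this refutation has size 5 under the "proof size = # of clauses" measure.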

Random CNF formulas
• Random 2-CNF formula with sn clauses
  • is satisfiable w.h.p. for s < 1
    • and simple DPLL will find a satisfying assignment in linear time w.h.p.
  • is unsatisfiable w.h.p. for s > 1
    • and simple DPLL will finish and yield a resolution proof of unsatisfiability in linear time w.h.p.
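
For experimenting with these densities, a random formula with sn 2-clauses and Dn 3-clauses can be generated as below. The slides do not spell out the sampling model, so this sketch assumes one common choice: each clause picks distinct variables uniformly at random and negates each independently with probability 1/2, and clauses are drawn independently. The function names are hypothetical.

```python
import random
from typing import List


def random_clause(n: int, k: int, rng: random.Random) -> List[int]:
    """One random k-clause over variables 1..n (distinct variables, random signs)."""
    variables = rng.sample(range(1, n + 1), k)
    return [v if rng.random() < 0.5 else -v for v in variables]


def random_mixed_cnf(n: int, s: float, d: float, seed: int = 0) -> List[List[int]]:
    """Random formula with round(s*n) 2-clauses followed by round(d*n) 3-clauses."""
    rng = random.Random(seed)
    two = [random_clause(n, 2, rng) for _ in range(round(s * n))]
    three = [random_clause(n, 3, rng) for _ in range(round(d * n))]
    return two + three
```

Setting d = 0 gives the random 2-CNF of this slide; setting s = 0 gives a random 3-CNF with Dn clauses.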

DPLL on random 3-CNF*
[Figure: # of DPLL backtracks and probability satisfiable (scale 0 to 1) plotted against D, the ratio of clauses to variables; marked value: 4.26.]
• Can prove 2^Ω(n/D^(1+e)) time is required for unsatisfiable formulas above the threshold
• What about satisfiable formulas below the threshold?
* n = 50 variables

Phase transitions and algorithmic complexity
• Easy connection
  • Hardest random problems will always be at a monotone sharp threshold bn if it exists
• Can randomly reduce satisfiable problems of lower density to those at the threshold
  • Given a formula with Dn clauses, D < b, can always add (b-D-e)n random clauses to make it a random problem nearly at the threshold and use that solution
• Can reduce unsatisfiable problems of larger density to those at the threshold
  • Given a formula with Dn clauses, D > b, ignore all but the first (b+e)n of them
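
The two reductions can be sketched as follows, assuming 3-clauses, integer literals, and the same sampling model for added clauses as one might use for the original formula; all names are hypothetical. The point of the sketch is only the direction of each reduction: a satisfying assignment of the padded formula also satisfies the original (its clauses are a subset), and a refutation of the truncated formula also refutes the original (which contains it).

```python
import random
from typing import List

Formula = List[List[int]]


def pad_to_threshold(formula: Formula, n: int, b: float, eps: float,
                     seed: int = 0) -> Formula:
    """Add random 3-clauses until the formula has about (b - eps) * n clauses."""
    rng = random.Random(seed)
    target = int((b - eps) * n)
    padded = [list(c) for c in formula]
    while len(padded) < target:
        variables = rng.sample(range(1, n + 1), 3)
        padded.append([v if rng.random() < 0.5 else -v for v in variables])
    return padded


def truncate_to_threshold(formula: Formula, n: int, b: float, eps: float) -> Formula:
    """Keep only the first (b + eps) * n clauses of a denser formula."""
    return [list(c) for c in formula[: int((b + eps) * n)]]
```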

Hard satisfiable formulas?
• With non-deterministic select we could simply guess n correct value assignments.
  .... How can a satisfiable formula possibly be hard?
• Any implementation of select must run in polynomial time.
  .... Very simple heuristics used in practice

Some standard select rules for DPLL algorithms
• UC
  • Pick variables in a fixed order
  • Always set True first
• UCwm
  • Pick variables in a fixed order
  • Apply a majority vote among 3-clauses for assigning each value
• GUC
  • Pick a variable v in a shortest clause C
  • Set v to satisfy C
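
The three rules can be written as functions from a residual formula to the literal that is set True first. This is a sketch under assumptions not stated on the slide: literals are nonzero integers, "fixed order" is taken to be increasing variable index, a tied majority vote defaults to True, and GUC takes the first literal of the first shortest clause. Each function expects a non-empty formula with no empty clause.

```python
from typing import List

Formula = List[List[int]]   # assumed encoding: -v is the negation of variable v


def select_uc(formula: Formula) -> int:
    """UC: lowest-indexed variable still present, set True first."""
    return min(abs(l) for c in formula for l in c)


def select_ucwm(formula: Formula) -> int:
    """UCwm: same variable as UC; sign chosen by majority vote among 3-clauses."""
    v = select_uc(formula)
    votes = sum((l > 0) - (l < 0)
                for c in formula if len(c) == 3
                for l in c if abs(l) == v)
    return v if votes >= 0 else -v


def select_guc(formula: Formula) -> int:
    """GUC: a literal from a shortest clause, so setting it True satisfies that clause."""
    return min(formula, key=len)[0]
```

For the residual formula (¬x1 ∨ x2 ∨ x3) ∧ (¬x1 ∨ ¬x2 ∨ x4) ∧ (x2 ∨ x5), UC sets x1 True, UCwm sets x1 False because both 3-clause occurrences are negative, and GUC sets x2 True to satisfy the 2-clause.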

Contributions
• These natural DPLL algorithms take exponential time on satisfiable formulas
• ∃ family of unsatisfiable random formulas parametrized by s s.t. w.h.p.
  • s > 1: linear size resolution proofs
  • s < 1: only exponential size resolution proofs possible

Key property of each of the select rules we’ve seen
• On random 3-CNF, before the first backtrack occurs, the residual formula is a uniformly random mix of 2-clauses and 3-clauses
  • If it has m2 2-clauses and m3 3-clauses then it is equally likely to be any formula with these properties
• key property ⇒ proofs of algorithms’ success without backtracking

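What do long runs look like?
[Figure: DPLL search tree with one node marked and a subtree of size 2^(rn) below it.]
• Residual formula at each node is a mix of 2- and 3-clauses
• Residual formula at the marked node is unsatisfiable
• Every resolution proof of its unsatisfiability, and so the algorithm’s, is exponentially long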

Proof Complexity
Theorem. A random CNF formula with Dn 3-clauses and sn 2-clauses, where s < 1, has no resolution refutation of size 2^(rn) w.h.p. [Chvátal-Szemerédi 88] [Achlioptas, B., Molloy 2001]
• Formula is unsatisfiable w.h.p. for
  • D ≥ 4.57
  • s ≥ 1-e and D ≥ ????

Non-rigorous results [Kirkpatrick, Monasson, Selman, Zecchina 97]
[Figure: phase diagram with SAT and UNSAT regions; axes: 3-clause ratio D and 2-clause ratio s; marked values: 1, 2/3, 4.26, 4.57.]
• We can add 2/3 n 3-clauses but not n 2-clauses

Rigorous results [Achlioptas, Kirousis, Kranakis, Krizanc 97]
[Figure: phase diagram with SAT and UNSAT regions and an unresolved region marked “?”; axes: 3-clause ratio D and 2-clause ratio s; marked values: 1, 2/3, 8/3, 2.28, 4.57.]
• We can add 2/3 n 3-clauses but not n 2-clauses

Proof Complexity
Theorem. A random CNF formula with Dn 3-clauses and sn 2-clauses, where s < 1, has no resolution refutation of size 2^(rn) w.h.p. [Achlioptas, B., Molloy 2001]
• Formula is unsatisfiable w.h.p. for
  • D ≥ 4.57
  • D ≥ 2.281 and s ≥ 1-e for e ≤ .0001
• Sharp threshold since resolution is linear for s ≥ 1+e

These DPLL algorithms follow trajectories [Chao, Franco 88] [Frieze, Suen 95] [Achlioptas 00] [Achlioptas, Sorkin 00]
[Figure: trajectories of UC and GUC in the plane of 3-clause ratio D and 2-clause ratio s; marked values: 1, 2/3, 8/3, 3.26.]

DPLL crossing into the bad zone
[Figure: algorithm trajectory in the plane of 3-clause ratio D and 2-clause ratio s, with regions labelled “Provably SAT & Easy” and “Provably UNSAT & Hard”; marked values: 1, 3.26, 4.26, 4.57.]

Exponential lower bounds far below the threshold
Theorem. Let A ∈ {UC, UCwm, GUC}. Let D_UC = 3.81, D_UCwm = 3.83, D_GUC = 4.01. W.h.p. algorithm A takes more than 2^(rn) steps on a random 3-CNF with D_A n clauses.
• Lower bound also applies to any resolution-based algorithm that extends the ‘first’ branch of the execution of A

Related Work
• Experiments suggested DPLL algorithms may not be polynomial all the way to the threshold
• [Cocco, Monasson 01] applied non-rigorous methods to suggest exponential GUC behavior below the threshold
  • Assumed every branch of GUC tree operates like an independent version of the first branch
  • Independent of our work

Implications for phase transitions and algorithmic complexity
• Difference between polynomial and exponential hardness is not necessarily a function of the phase transition
  • Applies in both phases, not just the over-constrained phase
• Algorithmically dependent
  • A good algorithm will have a transition in a different place from a bad algorithm
  • Can’t study the hardness transition in the absence of the study of algorithms

Proof Ideas
• Connection between pure literals and resolution proof size [Chvátal, Szemerédi 88] [Ben-Sasson, Wigderson 99]
  • pure literals are those that occur only positively or only negatively in a formula
• Digraph structure of random 2-CNF subformula
  • New graph-theoretic notion “clan”
    • generalization of connected component
  • Sharp concentration properties for clan size
    • moment generating function argument
• Amortization of pure literals across clans
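
The definition of a pure literal translates directly into code. The sketch below assumes the integer encoding of literals (-v negates v); the function name is hypothetical.

```python
from typing import List, Set


def pure_literals(formula: List[List[int]]) -> Set[int]:
    """Literals that occur in the formula while their negations never do."""
    literals = {l for clause in formula for l in clause}
    return {l for l in literals if -l not in literals}
```

For (x1 ∨ ¬x2) ∧ (x1 ∨ x3) ∧ (x2 ∨ ¬x3), only x1 is pure: x2 and x3 each occur with both signs.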

Resolution proof size and pure literals [Ben-Sasson, Wigderson 99]
• If formula has an a s.t.
  • Every subformula with ≤ a n clauses has at least one pure literal
  • Every subformula with between (1/2) a n and a n clauses has a linear # of pure literals
• Then
  • all resolution proofs of the formula require size 2^(rn)

Basic idea of argument
• By sparsity of the 2-clause part of the formula, any subset of the 2-clauses will have lots of pure literals
  • Clan size analysis & amortization
• In a subformula involving both 2-clauses and 3-clauses, either there are
  • so many 3-clauses that they create lots of new pure literals on their own (easy case), or
  • so few 3-clauses that they can’t cover all the pure literals in the 2-clauses (analysis of clans)

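2-CNF Digraph on literals
[Figure: digraph whose vertices are the literals of the variables x, c, y, w, z, d, with edges given by 2-clauses over the variable pairs {d, y}, {y, x}, {z, y}, {c, w}, {x, w}, {w, z}.]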
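
One standard way to build a digraph on literals from a 2-CNF, assumed here since the slide's exact edge convention is not recoverable from the transcript, is the implication graph: a clause (a ∨ b) contributes the edges ¬a → b and ¬b → a. Literals are nonzero integers and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def implication_digraph(two_clauses: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Adjacency sets of the implication digraph of a 2-CNF formula."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for a, b in two_clauses:
        graph[-a].add(b)   # if a is False then b must be True
        graph[-b].add(a)   # if b is False then a must be True
    return dict(graph)
```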

Hyper/Digraph on literals
[Figure: the digraph on literals extended with hyperedges from 3-clauses over the variable triples {a, b, z} and {f, g, w}.]

Pure literals
[Figure: pure literals in the hyper/digraph on the literals of x, c, y, w, z, d, f, g, a, b.]

Pure cycle
[Figure: a pure cycle in the hyper/digraph on the literals of x, c, y, w, z, d, f, g, a, b.]

Pure Items & Clans of G
• Clans
  • small subgraphs of G
  • one clan per vertex; they cover G
  • analog of connected components in sparse random graphs
  • pure items typically two per clan, like leaves in acyclic connected components in an ordinary graph
  • mostly constant size
  • never more than log^3 n vertices
  • if x ∈ clan(y) then y ∈ clan(x)

What are clans?
• Simpler notion first: in(y) for vertex y in an ordinary digraph

in(y) in ordinary digraph
[Figure: digraph on the vertices t, u, v, w, x, y, z.]
• in(y) = subgraph of vertices that can reach y = ancestors of y

clan(y) in ordinary digraph
[Figure: digraph on the vertices t, u, v, w, x, y, z.]
• clan(y) = descendants of ancestors of y
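
For an ordinary digraph, both notions reduce to reachability. The sketch below assumes the digraph is given as a dict of adjacency sets and that y counts as its own ancestor; the function names are hypothetical. It covers only the ordinary-digraph definitions, not the 2-CNF version or the bad events that come later in the talk.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def reachable(adj: Graph, sources: Iterable[Hashable]) -> Set[Hashable]:
    """All vertices reachable from any source (sources included), by BFS."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse(adj: Graph) -> Graph:
    """The digraph with every edge reversed."""
    rev: Graph = {}
    for u, targets in adj.items():
        for w in targets:
            rev.setdefault(w, set()).add(u)
    return rev


def in_set(adj: Graph, y: Hashable) -> Set[Hashable]:
    """in(y): the vertices that can reach y."""
    return reachable(reverse(adj), [y])


def clan(adj: Graph, y: Hashable) -> Set[Hashable]:
    """clan(y): the descendants of the ancestors of y."""
    return reachable(adj, in_set(adj, y))
```

With the made-up edges v → x, x → y, x → z, y → t, u → w (not the edges of the slide's figure), in(y) is {v, x, y} and clan(y) is {v, x, y, z, t}. Under this definition x ∈ clan(y) means x and y share an ancestor, so the relation is symmetric.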

clan(y) in 2-CNF digraph [figure]

A complication - bad events
[Figure: 2-CNF digraph on the literals of x, c, w, z, y, d, with 2-clauses over the variable pairs {d, y}, {z, y}, {c, w}, {x, w}, {w, z}, {w, d}.]

in(y) in a bad case [figure]

clan(y) in a bad case [figure]
• This can cascade and get even worse!

Analysis
• If we ignore bad edges, |in(y)| is dominated by a component process in a sub-critical random undirected graph
  • like trimmed out-trees [Bollobás, Borgs, Chayes, Kim, Wilson]
• Ignoring bad edges, |clan(y)| is dominated by a 2-level process
  • run a component process to get in(y)
  • take the union of |in(y)| independent component processes added to in(y)

Analysis
• w.h.p. no more than one bad event happens per clan
  • |in(y)| is always dominated by the 2-level component process
• w.h.p. no more than C log n bad events occur in the whole digraph
  • fewer than polylog n literals interact with bad clans
  • rest of clans dominated by 2-level process

Analysis
• Ordinary sub-critical component process on 2n vertices w.h.p.
  • # of vertices with component size ≥ i is at most 2n(1-s)^i for some fixed s > 0
• We show sub-critical 2-level component process on 2n vertices w.h.p.
  • for i  i0, # of vertices with 2-level size ≥ i is at most 2n(1-t)^i for some fixed t > 0
• This is false for a 3-level component process!

Open problem
Conjecture. For every D > 2/3 there exists an s < 1 such that a random (2,3)-CNF with Dn 3-clauses and sn 2-clauses is w.h.p. unsatisfiable
[Figure: phase diagram with SAT and UNSAT regions and an unresolved region marked “?”; marked values: 1, 2/3, 3.26, 4.57.]

Open problem
Conjecture. For every D > 2/3 there exists an s < 1 such that a random (2,3)-CNF with Dn 3-clauses and sn 2-clauses is w.h.p. unsatisfiable
Implies. For every card-game algorithm A there exists a critical density D_A such that for random 3-CNF formulas with Dn clauses
• For D < D_A w.h.p. A takes linear time
• For D > D_A w.h.p. A takes exponential time
