The Cook-Levin Theorem

Zeph Grunschlag

Announcements
• Last HW due Thursday
• Please give feedback about the course at oracle.seas.columbia.edu/wces

Agenda
• Proof from the definition that CSAT is NP-complete (Cook-Levin Theorem)
• $\text{CSAT} \le_P \text{3SAT}$, so 3SAT is NP-complete

Notation for NTM Branching
We need a way of dealing with all the possible choices that a nondeterministic TM can take at any point of the computation. Every NTM has a finite bound on the number of choices available at a single step, called the branching factor and denoted by $b$.
Q: What is $b$ for the following NTM?
[Figure: state diagram of an NTM with states 0, 1, 2, 3; edge labels include a|bR, bR and L.]

Notation for NTM Branching a|bR L bR a|bR 0 1 2 3 A: b= 2. This means that computation trees are binary: 0ababa a0baba ab0aba ab1aba aba0ba aba2ba crash abab0a abab1a ababa0 ababa2 crash abab3a

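Notation for NTM Branching
Given a state $q$ and tape symbol $a$ in an NTM, $\delta(q,a)$ is a set of possible state-symbol-direction outputs. Order this set and denote the $k$'th possibility by
$$\delta_k(q,a) = (q', a', D),$$
where $q'$, $a'$ and $D$ are the transition's output. When fewer than $b$ possibilities are defined, just add transitions that go directly to the reject state and write a blank:
$$\delta_k(q,a) = (q_{\mathrm{rej}}, \sqcup, D) \quad \text{for } |\delta(q,a)| < k \le b$$
(the direction $D$ is irrelevant, since the machine halts).
Q: What are the $\delta_k(q,a)$ for the NTM below?
[Figure: the same NTM with states 0, 1, 2, 3; edge labels include a|bR, bR and L.]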

Notation for NTM Branching a|bR A: L bR a|bR 0 1 2 3

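Boolean Uniqueness Expression
Let's define some handy boolean expressions in conjunctive normal form. The uniqueness expression $U$ says that exactly one of its arguments is true:
$$U(x_1, x_2, \ldots, x_n) = (x_1 \vee x_2 \vee \cdots \vee x_n) \wedge \bigwedge_{i<j} \overline{(x_i \wedge x_j)}$$
Here $\bigwedge_{i<j}$ stands for the conjunction over all pairs $i<j$.
Q: Why is each $\overline{(x_i \wedge x_j)}$ a 2-clause?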

Boolean Uniqueness Expression
A: $\overline{(x_i \wedge x_j)}$ and $(\overline{x_i} \vee \overline{x_j})$ are logically equivalent (De Morgan's law).
Q: What is the size of $U$ in terms of the number of variables $n$?

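Boolean Uniqueness Expression
A: $O(n^2)$: one $n$-clause plus $\binom{n}{2}$ 2-clauses.
It will also be convenient to define $U$ over a range of variables. For example, we could write $U(x_1, x_2, \ldots, x_n)$ instead as $U_{1 \le i \le n}(x_i)$.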
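A short Python sketch of $U$ as a list of clauses, with a brute-force check that it is satisfied exactly when one argument is true. Representing a literal as a `(variable, polarity)` pair is an assumption made for illustration. Counting literals gives $n + 2\binom{n}{2} = n + n(n-1) = n^2$, matching the $O(n^2)$ answer.

```python
from itertools import combinations, product


def uniqueness(xs):
    """CNF clauses for U(x_1, ..., x_n): exactly one of xs is true.

    One n-clause says at least one x_i is true; one 2-clause
    (NOT x_i OR NOT x_j) per pair i < j says no two are true.
    """
    clauses = [[(x, True) for x in xs]]
    for xi, xj in combinations(xs, 2):
        clauses.append([(xi, False), (xj, False)])
    return clauses


def satisfies(assignment, clauses):
    """True iff every clause contains at least one true literal."""
    return all(any(assignment[v] == pol for v, pol in clause) for clause in clauses)


if __name__ == "__main__":
    xs = ["x1", "x2", "x3", "x4"]
    cnf = uniqueness(xs)
    # Brute force over all 16 assignments: U holds iff exactly one x_i is true.
    for values in product([False, True], repeat=len(xs)):
        assert satisfies(dict(zip(xs, values)), cnf) == (sum(values) == 1)
    print(len(cnf))  # 1 + C(4, 2) = 7 clauses
```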

Cook-Levin Theorem
THM: Suppose we are given an NTM $N$ and a string $w$ of length $n$ which is decided by $N$ in $f(n)$ or fewer nondeterministic steps. Then there is an explicit CNF formula $\varphi$ of length $O(f(n)^3)$ which is satisfiable iff $N$ accepts $w$. In particular, when $f(n)$ is a polynomial, $\varphi$ has polynomial length in terms of $n$, so every language in NP reduces to CSAT in polynomial time. Thus CSAT is NP-hard. Finally, as CSAT is in NP, CSAT is NP-complete.

Confusing Indices
The most confusing thing about the proof of the Cook-Levin Theorem is all the indices. We use the following conventions:
• $a$: index for letters in the tape alphabet
• $q$: index for NTM states
• $i$: index for tape cells
• $t$: index for computation step (time)
• $k$: index for branching choice
Q: How many choices are there for each index?

Confusing Indices
A: The numbers of choices are:
• $a$: $|\Gamma|$, the size of the tape alphabet
• $q$: $|Q|$, the number of states
• $i$: $f(n)$, since $i$ ranges from 1 to $f(n)$
• $t$: $f(n)+1$, since $t$ ranges from 0 to $f(n)$
• $k$: $b$, since $k$ ranges from 1 to the branching factor $b$

Variables for Simulating NTM Computation
• T-variables track the possible symbols on the tape cells at each step of the computation: $T_{t,i,a}$ = true iff at time $t$ the $i$'th cell contains $a$
• S-variables track the active state at each step of the computation: $S_{t,q}$ = true iff at time $t$ the NTM is in state $q$

Variables for Simulating NTM Computation
• H-variables give the head's position: $H_{t,i}$ = true iff at time $t$ the head is reading the $i$'th cell
• C-variables give the nondeterministic choice at each step of the computation: $C_{t,k}$ = true iff the step from time $t$ to time $t+1$ takes choice $k$
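A solver-ready formula needs each $T$, $S$, $H$ and $C$ variable mapped to a distinct positive integer, as in the common DIMACS CNF convention. The sketch below is one hypothetical way to do it; the tuple names and the ordering are assumptions for illustration, not part of the proof. Since $C_{t,k}$ describes the step from $t$ to $t+1$, it is only created for $t < f(n)$.

```python
def make_indexer(f_n, states, alphabet, b):
    """Assign each Cook-Levin variable a distinct positive integer.

    Hypothetical names follow the slides: ("T", t, i, a), ("S", t, q),
    ("H", t, i) and ("C", t, k), with t in 0..f(n), i in 1..f(n) and
    k in 1..b. C-variables exist only for t < f(n), one per step t -> t+1.
    """
    times = range(f_n + 1)
    cells = range(1, f_n + 1)
    names = [("T", t, i, a) for t in times for i in cells for a in alphabet]
    names += [("S", t, q) for t in times for q in states]
    names += [("H", t, i) for t in times for i in cells]
    names += [("C", t, k) for t in range(f_n) for k in range(1, b + 1)]
    return {name: number for number, name in enumerate(names, start=1)}


if __name__ == "__main__":
    index = make_indexer(2, ["q0", "qacc", "qrej"], ["a", "_"], 2)
    print(len(index))             # 12 T + 9 S + 6 H + 4 C = 31
    print(index[("S", 0, "q0")])  # 13
```

The count is $(f(n)+1)\,(f(n)\,|\Gamma| + |Q| + f(n)) + f(n)\,b$, i.e. $O(f(n)^2)$ variables for a fixed machine.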

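Boolean Expressions for Simulating NTM Computation
Initialize the NTM. Insist that at time 0:
• The NTM is in the start state $q_{\mathrm{start}}$
• Given input $w = a_1 a_2 \ldots a_n$, cells 1 through $n$ contain the $a_i$ and all the rest are blank
• The head is at the left-most cell $i = 1$
Together these give:
$$\varphi_{\mathrm{start}} = S_{0,q_{\mathrm{start}}} \wedge \bigwedge_{i=1}^{n} T_{0,i,a_i} \wedge \bigwedge_{i=n+1}^{f(n)} T_{0,i,\sqcup} \wedge H_{0,1}$$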
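$\varphi_{\mathrm{start}}$ is just a list of $f(n)+2$ unit clauses. This Python sketch assumes $n \le f(n)$, represents literals as `(variable, polarity)` pairs, and uses hypothetical names `q0` for the start state and `_` for the blank.

```python
def phi_start(w, f_n, q_start="q0", blank="_"):
    """Unit clauses fixing the time-0 configuration on input w.

    Hypothetical variable names: ("S", t, q), ("T", t, i, a), ("H", t, i).
    """
    # Assumption: the input fits into the f(n) simulated cells.
    assert len(w) <= f_n, "input must fit in cells 1..f(n)"
    clauses = [[(("S", 0, q_start), True)]]          # start state
    for i in range(1, f_n + 1):
        symbol = w[i - 1] if i <= len(w) else blank  # a_i, then blanks
        clauses.append([(("T", 0, i, symbol), True)])
    clauses.append([(("H", 0, 1), True)])            # head on cell 1
    return clauses


if __name__ == "__main__":
    for clause in phi_start("ab", 4):
        print(clause)
```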

Boolean Expressions for Simulating NTM Computation
The ending configuration should accept:
$$\varphi_{\mathrm{end}} = S_{f(n),q_{\mathrm{acc}}}$$

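Boolean Expressions for Simulating NTM Computation
At each time step $t$:
• The machine is in a unique state: $\varphi_A = \bigwedge_{t} U_{q \in Q}(S_{t,q})$
• Each tape cell has a unique symbol: $\varphi_B = \bigwedge_{t} \bigwedge_{i} U_{a \in \Gamma}(T_{t,i,a})$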

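Boolean Expressions for Simulating NTM Computation
• The head is reading a unique cell: $\varphi_C = \bigwedge_{t} U_{1 \le i \le f(n)}(H_{t,i})$
• A unique choice is taken: $\varphi_D = \bigwedge_{t<f(n)} U_{1 \le k \le b}(C_{t,k})$
• If already accepted, accept at $t+1$: $\varphi_E = \bigwedge_{t<f(n)} (S_{t,q_{\mathrm{acc}}} \Rightarrow S_{t+1,q_{\mathrm{acc}}})$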
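The four uniqueness conditions $\varphi_A$ through $\varphi_D$ are all instances of $U$. Here is a Python sketch, with literals as `(variable, polarity)` pairs and hypothetical tuple names for the variables; the choice variables $C_{t,k}$ are indexed only for steps $t < f(n)$.

```python
from itertools import combinations


def exactly_one(variables):
    """U over a list of variables: one 'at least one' clause plus pairwise 2-clauses."""
    clauses = [[(v, True) for v in variables]]
    clauses += [[(u, False), (v, False)] for u, v in combinations(variables, 2)]
    return clauses


def phi_unique(f_n, states, alphabet, b):
    """phi_A..phi_D: unique state, cell symbols, head cell and choice."""
    clauses = []
    cells = range(1, f_n + 1)
    for t in range(f_n + 1):
        clauses += exactly_one([("S", t, q) for q in states])            # phi_A
        for i in cells:
            clauses += exactly_one([("T", t, i, a) for a in alphabet])  # phi_B
        clauses += exactly_one([("H", t, i) for i in cells])             # phi_C
    for t in range(f_n):
        clauses += exactly_one([("C", t, k) for k in range(1, b + 1)])   # phi_D
    return clauses


if __name__ == "__main__":
    print(len(phi_unique(2, ["q0", "qacc", "qrej"], ["a", "_"], 2)))  # 34
```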

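Boolean Expressions for Simulating NTM Computation
• If already rejected, reject at $t+1$: $\varphi_F = \bigwedge_{t<f(n)} (S_{t,q_{\mathrm{rej}}} \Rightarrow S_{t+1,q_{\mathrm{rej}}})$
• Cells which aren't being read remain the same at time $t+1$: $\varphi_G = \bigwedge_{t<f(n)} \bigwedge_{i} \bigwedge_{a} \big((\overline{H_{t,i}} \wedge T_{t,i,a}) \Rightarrow T_{t+1,i,a}\big)$
Q: Why can $(x \wedge y \Rightarrow z)$ be considered a clause?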

Boolean Expressions for Simulating NTM Computation
A: $(x \wedge y \Rightarrow z)$ is logically equivalent to $(\overline{x} \vee \overline{y} \vee z)$.
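The same rewriting turns every implication in $\varphi_E$, $\varphi_F$ and $\varphi_G$ into a clause: negate each premise and keep the conclusion. In this sketch a hypothetical helper `implies` does that, and `phi_persist` builds the three families; literals are `(variable, polarity)` pairs and the state names `qacc` and `qrej` are placeholders.

```python
def implies(premises, conclusion):
    """Clause for (p1 AND ... AND pm) => c, i.e. (NOT p1 OR ... OR NOT pm OR c)."""
    return [(v, not pol) for v, pol in premises] + [conclusion]


def phi_persist(f_n, alphabet, q_acc="qacc", q_rej="qrej"):
    """phi_E and phi_F (halting states persist), phi_G (unread cells unchanged)."""
    clauses = []
    for t in range(f_n):
        for q in (q_acc, q_rej):  # phi_E, phi_F: one 2-clause each
            clauses.append(implies([(("S", t, q), True)], (("S", t + 1, q), True)))
        for i in range(1, f_n + 1):  # phi_G: one 3-clause per (t, i, a)
            for a in alphabet:
                premises = [(("H", t, i), False), (("T", t, i, a), True)]
                clauses.append(implies(premises, (("T", t + 1, i, a), True)))
    return clauses


if __name__ == "__main__":
    print(implies([("x", True), ("y", True)], ("z", True)))
    # [('x', False), ('y', False), ('z', True)]
```

Note that the premise $\overline{H_{t,i}}$ becomes the positive literal $H_{t,i}$ in the clause.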

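Boolean Expressions for Simulating NTM Computation
Finally, given the branching choice $k$ and the current configuration, the next configuration should follow the $k$'th $\delta$-function choice. For every non-halting state $q$ and every $t < f(n)$, $i$, $a$, $k$ with $\delta_k(q,a) = (q', a', D)$:
$$\varphi_H = \bigwedge_{t,i,q,a,k} \big((S_{t,q} \wedge H_{t,i} \wedge T_{t,i,a} \wedge C_{t,k}) \Rightarrow (S_{t+1,q'} \wedge T_{t+1,i,a'} \wedge H_{t+1,i+D})\big)$$
where $D = -1$ for a left move and $D = +1$ for a right move. Each implication splits into three clauses, one per conjunct on the right.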
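A sketch of $\varphi_H$ in Python. It assumes `delta_k` is already a padded table from `(q, a, k)` to `(q', a', D)` with `D` in `{"L", "R"}`, that `states` lists only the non-halting states, and a boundary convention the slides do not specify: a left move at cell 1 leaves the head on cell 1, and a move past cell $f(n)$ (possible only on the final step of a run) adds no head clause. The example machine is hypothetical.

```python
def phi_transition(f_n, states, alphabet, b, delta_k):
    """Clauses of phi_H for time steps t = 0 .. f(n)-1.

    Each implication
    (S[t,q] & H[t,i] & T[t,i,a] & C[t,k]) => (S[t+1,q2] & T[t+1,i,a2] & H[t+1,j])
    becomes one 5-clause per conclusion: the negated guard plus that conclusion.
    """
    clauses = []
    for t in range(f_n):
        for i in range(1, f_n + 1):
            for q in states:
                for a in alphabet:
                    for k in range(1, b + 1):
                        q2, a2, d = delta_k[(q, a, k)]
                        # Assumed convention: a left move at cell 1 stays on cell 1.
                        j = max(1, i - 1) if d == "L" else i + 1
                        guard = [(("S", t, q), False), (("H", t, i), False),
                                 (("T", t, i, a), False), (("C", t, k), False)]
                        conclusions = [("S", t + 1, q2), ("T", t + 1, i, a2)]
                        # In a consistent run the head can only pass cell f(n)
                        # on the last step, where its position no longer matters.
                        if j <= f_n:
                            conclusions.append(("H", t + 1, j))
                        for var in conclusions:
                            clauses.append(guard + [(var, True)])
    return clauses


if __name__ == "__main__":
    # Hypothetical one-state machine: scan right over a's, accept on a blank.
    table = {("q0", "a", 1): ("q0", "a", "R"), ("q0", "_", 1): ("qacc", "_", "R")}
    print(len(phi_transition(2, ["q0"], ["a", "_"], 1, table)))  # 20
```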

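Boolean Expressions for Simulating NTM Computation
Requiring all of the component CNF formulas to be true simulates the NTM:
$$\varphi = \varphi_{\mathrm{start}} \wedge \varphi_A \wedge \varphi_B \wedge \varphi_C \wedge \varphi_D \wedge \varphi_E \wedge \varphi_F \wedge \varphi_G \wedge \varphi_H \wedge \varphi_{\mathrm{end}}$$
Finally, let's make sure that this is a poly-time reduction. The formulas are explicitly defined, so the running time is proportional to their total size: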

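Space complexity
• $\varphi_{\mathrm{start}}$: $O(f(n))$
• $\varphi_A$: $O(f(n) \cdot |Q|^2)$
• $\varphi_B$: $O(f(n)^2 \cdot |\Gamma|^2)$
• $\varphi_C$: $O(f(n) \cdot f(n)^2)$
• $\varphi_D$: $O(f(n) \cdot b^2)$
• $\varphi_E$: $O(f(n))$
• $\varphi_F$: $O(f(n))$
• $\varphi_G$: $O(f(n)^2 \cdot |\Gamma|)$
• $\varphi_H$: $O(f(n)^2 \cdot |\Gamma| \cdot |Q| \cdot b)$
• $\varphi_{\mathrm{end}}$: $O(1)$
Total: $\varphi$ is $O(f(n)^3)$, treating $|Q|$, $|\Gamma|$ and $b$ as constants of the fixed NTM $N$.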
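To see that the $\varphi_C$ term dominates, this sketch tallies upper bounds on literal counts per component. It assumes particular clause shapes (each $U$ over $m$ variables written with $m^2$ literals, each $\varphi_H$ implication split into three 5-literal clauses); the exact constants depend on encoding choices, so treat the numbers as illustrative. The parameter values in the example are arbitrary.

```python
def formula_size(f, Q, G, b):
    """Upper bounds on the number of literals in each component of phi.

    f = f(n), Q = |Q|, G = |Gamma|, b = branching factor. Assumes U over m
    variables uses m + m(m-1) = m^2 literals, unit clauses for phi_start and
    phi_end, 2-clauses for phi_E and phi_F, 3-clauses for phi_G, and three
    5-clauses per (t, i, q, a, k) for phi_H.
    """
    steps = f + 1  # time points 0 .. f(n)
    return {
        "start": f + 2,               # start state, f(n) cells, head position
        "A": steps * Q**2,            # unique state at each time
        "B": steps * f * G**2,        # unique symbol in each cell at each time
        "C": steps * f**2,            # unique head position at each time
        "D": f * b**2,                # unique choice for each step t -> t+1
        "E": 2 * f,                   # acceptance persists
        "F": 2 * f,                   # rejection persists
        "G": 3 * f * f * G,           # unread cells keep their symbol
        "H": 15 * f * f * Q * G * b,  # transition clauses
        "end": 1,                     # final state is accepting
    }


if __name__ == "__main__":
    # Arbitrary illustrative machine parameters.
    small = formula_size(10**5, 10, 5, 2)
    large = formula_size(2 * 10**5, 10, 5, 2)
    print(max(small, key=small.get))  # C
    # Doubling f(n) multiplies the total by just under 2**3 = 8.
    print(sum(large.values()) / sum(small.values()))
```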

Conclusion of Cook-Levin Proof
Thus, for any NTM that runs in time $f(n)$, its inputs of size $n$ can be converted into instances of CSAT of size $O(f(n)^3)$. So if $f(n)$ is a polynomial, the reduction runs in polynomial time on a RAM (since the cube of a polynomial is a polynomial). This proves that CSAT is NP-complete.

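Reducing CSAT to 3SAT
Each clause with more than 3 literals can be converted into a conjunction of clauses with at most 3 literals by introducing new variables. EG:
$$\begin{aligned}
(a \vee b \vee c \vee d \vee e \vee f)
&\to (a \vee b \vee x) \wedge (\overline{x} \vee c \vee d \vee e \vee f) \\
&\to (a \vee b \vee x) \wedge (\overline{x} \vee c \vee y) \wedge (\overline{y} \vee d \vee e \vee f) \\
&\to (a \vee b \vee x) \wedge (\overline{x} \vee c \vee y) \wedge (\overline{y} \vee d \vee z) \wedge (\overline{z} \vee e \vee f)
\end{aligned}$$
The results are not logically equivalent to the original clause, but each step preserves satisfiability: a satisfying assignment of $(a \vee b \vee c \vee \cdots)$ extends to one of the pair by setting $x$ true exactly when $a$ and $b$ are both false, and any assignment satisfying the pair satisfies the original clause.
Q: How many 3-clauses are needed to simulate an $n$-clause?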

Reducing CSAT to 3SAT
A: Each substitution splits off one 3-clause and shortens the remaining clause by one literal, until the remainder is itself a 3-clause. So an $n$-clause needs $n-2 = O(n)$ 3-clauses. Therefore, any instance of nSAT of size $k$ is converted to an instance of 3SAT of size $O(nk)$. Any instance of CSAT of size $k$ has clauses of at most $k$ literals, so in the worst case it can be converted to an instance of 3SAT of size $O(k^2)$. This completes the poly-time reduction from CSAT to 3SAT.
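The splitting procedure can be sketched in Python with DIMACS-style integer literals (`-v` for $\overline{v}$), which is an assumed representation. Fresh variables are numbered after the largest one already in use, and a brute-force `satisfiable` check (exponential, for tiny examples only) illustrates that splitting preserves satisfiability.

```python
from itertools import count, product


def to_3cnf(clauses):
    """Split every clause with more than 3 literals into a chain of 3-clauses.

    Literals are nonzero ints (-v means NOT v). A clause
    (l1 v l2 v l3 v ... v lm) becomes
    (l1 v l2 v x1)(-x1 v l3 v x2) ... (-x_{m-3} v l_{m-1} v lm),
    i.e. m - 2 clauses, where the x's are fresh variables.
    """
    used = max((abs(lit) for clause in clauses for lit in clause), default=0)
    fresh = count(used + 1)
    result = []
    for clause in clauses:
        rest = list(clause)
        while len(rest) > 3:
            x = next(fresh)
            result.append([rest[0], rest[1], x])  # split off a 3-clause
            rest = [-x] + rest[2:]                 # remainder is one literal shorter
        result.append(rest)
    return result


def satisfiable(clauses):
    """Brute-force satisfiability check (exponential; tiny inputs only)."""
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    for bits in product([False, True], repeat=len(variables)):
        value = dict(zip(variables, bits))
        if all(any(value[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return True
    return False


if __name__ == "__main__":
    # (a v b v c v d v e v f) with a..f numbered 1..6, as in the example above.
    print(to_3cnf([[1, 2, 3, 4, 5, 6]]))
    # [[1, 2, 7], [-7, 3, 8], [-8, 4, 9], [-9, 5, 6]]
```

Here 7, 8 and 9 play the roles of $x$, $y$ and $z$.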

Blackboard Exercise
• Show that 2SAT is in P.
