Time-Space Tradeoff Lower Bounds for Branching Programs: Part I

Techniques for Time-Space Tradeoff Lower Bounds for Branching Programs: Part I Paul Beame University of Washington joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun

Branching programs x1 To compute f:{0,1}n {0,1} on input (x1,…,xn) follow path from source to sink 1 x3 x2 0 x4 x5 x5 x=(1,1,0,1,...) Time T= length of longest path x3 x1 x2 x7 x7 x8 Space S = log2 (# of nodes) 0 1

Branching program properties • Simulate random-access machines • same time T and space S • Multi-way version for xi in domain D • good for modeling RAM input registers • BPs will be leveled wlog. • same time T • at most 2S nodes per level

Overall approach to lower bounds • Iff:Dn {0,1} is computed using small time and space • thenf-1(1)has a special combinatorial structure. • Lower bounds for f follow if f-1(1) does not have the structure How do we find such structures?

Levelled BPs and Layers v0 Assume time T  kn and wlog that theBP is levelled (2S nodes per level) L1 L2 Break BP into r layers L1,…,Lr of height kn/r kn Partition (a subset of) the layers Lj into sets 1, 2,…, p p  2 Lr 0 1

The Trace of an Input v0 Partition of (a subset of) the layers Lj into sets 1, 2,…, p p  2 L1 v1 L2 • The trace of input x • the sequence of nodes reached on input x as the computation moves from one set ito another • E.g. trace(x)=(v1,v2,v3) • a =length of trace = # of alternations in the partition • 2Sa possible traces v2 kn v3 L5 0 1

Branching program time-space lower bounds using these ideas • Oblivious - same variable queried per level • [Chandra-Furst-Lipton 83], [Alon-Maass 86], [Babai-Nisan-Szegedy 89] • (Syntactic) read k - no variable queried k times on any path • [Borodin-Razborov-Smolensky 89], [Okol’nishnikova 89] • General BP’s • [B-Jayram-Saks 98], [Ajtai 99a], [Ajtai 99b], [B-Saks-Sun-Vee 00], [B-Vee 02]

The Case of Oblivious BP’s Partition of the layers Lj into sets 1, 2,…, p p  2 v0 L1 v1 • When the BP is oblivious • Each i is associated with the subset Ai of variables read in levels in i • trace(x) can be used as the messages on input x in a communication protocol between p players computing f, where theith player has values of the variables in Ai L2 v2 kn v3 L5 0 1

The Oblivious Case • Let C=ip Ai be the common variables for the players and A’i = Ai - C • For any assignment s to C, the trace can be used to compute fs • Space bound • S  CC(fs;A’1,…,A’p)/a for any s • Want: n-|A’i| large for all i small # of alternations a

u v The Read-k Case • Wlog first make the read-k BP uniform • For any pair of nodes u,v the multi-set of variables queried between u and v is the same on any path • Call the set Auv • Then apply levelling etc. Add extra ‘dummy’ queries on each path if necessary

Read-k Case Argument Overview v0 • Variation of the usual argument • First fix the node sequence s=(v0,v1,…,vr)for the r layers • Defines sets of inputs Av0v1,…,Avr-1vrread during these layers • fs is an ANDof functions defined on these sets of variables • (k,r)-rectangle • Then choose a layer partition 1, 2 that is good for Av0v1,…,Avr-1vr • Subsequence of (v0,v1,…,vr) at alternations forms the trace - also good v1 v2 v3 v4 vr 0 1

Partitioning the layers • r layers (of height  kn/r) • Let Layers(x,i) be the set of layers in which variable xi is read on input x • |Layers(x,i)|k • For a set of layers, • unread(x, ) ={ i : Layers(x,i)  =} • core(x, ) = { i : Layers(x,i)  } • Partition is good if these are large for = 1, 2

How to partition the layers • Assign every layer to 1 or 2 • A = core(x, 1) = unread(x, 2) • B = core(x, 2) = unread(x, 1) • C = set of variables read in common • Two techniques, both using probabilistic method • [Borodin-Razborov-Smolensky 89] • |A|, |B| n/2k+1, a  r  k22k • [Okol’nishnikova 89] • |A|  n/kO(k), |B| n/2, a = 2k, r = 2k2

The Read-k Case: Fixing the Trace Fix a node sequence and then partition the layers Ljinto sets1, 2 yielding a tracet v0 L1 v1 Define ft(x)=1  f(x)=1 andx follows t L2 v2 Again, by uniformity, the trace determines which variables are read in each component of the partition kn v3 ft(x)=g(xAC)  h(xBC) ft-1(1) is a pseudo-rectangle L5 vf 0 1

Rectangles and Pseudo-rectangles • Ordinary combinatorial rectangle in {0,1}n • Partition [n] into A and B • RARB for sets RA {0,1}A and RB{0,1}B • Alternatively • {x : xA RA and xBRB} • Pseudo-rectangle • [n] =D E, sets RD{0,1}D and RE{0,1}E • {x : xD RD and xE RE} • Or, partition [n] into A, B and C • {x: xAC RAC and xBC RBC}

Read-k lower bounds If f is computed by a (nondeterministic) read k branching program of size 2S then The ones of f, f-1(1), can be covered by 2Sa pseudo-rectangles R with |A| and |B| large and f(R)=1 • |A|, |B| n/2k+1, ak22k[BRS 89] • |A|  n/kO(k), |B| n/2, a=2k [Okol 89] Prove upper bound F on # of inputs in any such pseudo-rectangle on which f is constant 1 2S(|f-1(1)|/F)1/a or S  log (|f-1(1)|/F)

Lower bounds for general BPs [BST 98] • Major problem to handle • Fixing the node sequence and the layer partition does not fix setsA = core(x, 1)or B = core(x, 2) • Solutions • Apply one layer partition for all inputs • Use extension of [BRS 89] partition method • Ignore inputs for which partition is bad • Prob method argument bounds # of bad inputs • Partition remaining inputs based on the values ofcore(x, 1) and core(x, 2) as well as on their traces

Lower bounds for general BPs [BST 98] • Number of rectangles increases • Multiply2Sa by the number of choices of core(x, 1)andcore(x, 2) • A priori bound is 3n since sets are disjoint • Observation • a pseudo-rectangle w.r.t A,B,C remains a pseudo-rectangle w.r.t A’,B’,C’ if A’A, B’B, and C’=C (A-A’)  (B-B’) • Partition based on only the first m=n/2k+1 elements of core(x, 1)andcore(x, 2) • # of choices is at most

Lower bounds for general BPs [BST 98] If f is computed by a (nondeterministic) time kn branching program of size 2S Then most of f-1(1) can be covered by 2Sa pseudo-rectangles with |A|=|B|=m=n/2k+1 whereak22k(the cover is a partition if the program is deterministic) # of pseudo-rectangles is at most 24log2(n/m) m+Sa= 24(k+1)m+Sa Is that good?

Using the Bound: Embedded Rectangles • Pseudo-rectangles are hard to reason about • Easier objects: Embedded rectangles • Start with an pseudo-rectangle on A,B,C • Fix an assignment to the common set C • we get a simpler object with • a combinatorial rectangle RAxRB on AxB • an assignment sto C=AB spine • Result is an embedded rectangle

Partition of most of f-1(1) into embedded rectangles • Input space is Dn • Each pseudo-rectangle can be partitioned into at most|D|n-2membedded rectanglesRwith |A|=|B|=m=n/2k+1A,Bfeet of R • Total number of such embedded rectangles partitioning most off-1(1)24(k+1)m+Sa|D|n-2m • Total number of inputs is|D|n • Non-trivial only if, e.g. |D|  23(k+1) large domain

Lower bound on embedded rectangle size for which f is constant • Suppose|f-1(1)|d|D|n • Since at most24(k+1)m+Sa|D|n-2m embedded rectangles, average size is at least d2-4(k+1)m-Sa-1 |D|2mand at least 1/4 of f-1(1)is covered by those d2-4(k+1)m-Sa-2 |D|2m • Such a rectangle defined by (s,A,B,RA,RB) must have |RA|/|Dm|,|RB|/|Dm| d2-4(k+1)m-Sa-2 • Typical 2-party communication complexity results* say|RA|/|Dm|,|RB|/|Dm|  |D|-em *With extra work to handlesand easiestA,B

The time space tradeoff lower bounds [BST 98] • Therefore for such a hardfd2-4(k+1)m-Sa-2  |D|-em • So if dis constant and|D|  29(k+1)/eSa  [e log |D| 4(k+1)] m  c (e/2) m log |D| • Sincem=n/2k+1andak22k for some C  1 S  C-k n log |D| • Therefore T/n=kc’log ((n log|D|)/S), i.e.

What functions are this hard? • ComputingxTMx 0 (mod q) qn [BST 98] • Non-optimal bound when M is Sylvester matrix • Letg 1/2andc 2/(1H2(g)) • HAMg:[nc]n {0,1}: Is any pair (xi,xj) close in Hamming distance D(xi,xj)gclog n? • Any two sets in [nc]m each of density n-bmcontain a pair of coordinates that are within gclog n of each other • Defined in [Ajtai 99a] where weaker lower bounds proved using generalization of [Okol 89] instead of [BRS 89] • Best bounds follow immediately from [BST 98]

What functions are this hard? • Computing xTMyx 0 (modq)forxGF(q)n,y GF(q)2n-1, qn • Function defined in [Ajtai 99b]and case q=2 used for Boolean lower bounds • Key to improvement: For some y, My has better rigidity propertiesthan Sylvester matrices have • Defining these matrices and analyzing their rigidity properties is the key contribution of [Ajtai 99b] • Most of the hard work in Boolean lower bounds is in the second half of [Ajtai 99a], much of which does not fit in the STOC version

Ajtai’s matrices y1 0 y2 Myis constant on anti-diagonals below the main diagonal y3 y4 yn yn+1 yn+2 y2n-2 y2n-1 My

xTMyxon an embedded(m,a)-rectangle B A x For every son AUB, f (xAUB,s,y) = xAT MABxB + g(xA,y) + h(xB,y) A B My x

Rectangles, rank, & rigidity • Largest rectangle on which xATMxB is constant has density  q-rank(M) • [BRS 89] • Lemma[Ajtai 99b] Can fix y s.t. every dndn minor MAB of My has rank(MAB) c dn/log2(1/d)  d1+en • better than comparable rigidity bound of d2n for Sylvester matrices[BRS 89],[BST 98]

How to partition the layers • Assign every layer to 1 or 2 • A = core(x, 1) = unread(x, 2) • B = core(x, 2) = unread(x, 1) • C = set of variables read in common • Two techniques for read-k case, both using probabilistic method • [Borodin-Razborov-Smolensky 89] • |A|, |B| n/2k+1, a  r  k22k • [Okol’nishnikova 89] • |A|  n/kO(k), |B| n/2, a = 2k, r = 2k2

v0 L1 v1 L2 v2 kn vr-1 Lr vr 0 1 Read-k case:Branching program with node sequence

Partitioning the layers • r layers (of height  kn/r) • Let Layers(x,i) be the set of layers in which variable xi is read on input x • |Layers(x,i)|k • For a set of layers, • unread(x, ) ={ i : Layers(x,i)  =} • core(x, ) = { i : Layers(x,i)  } • Partition is good if these are large for = 1, 2

Partitioning the layers [Okol’nishnikova 89] • Fix node sequencesandxthat followss • Choose a random subset 1ofkof therlayers • For each indexi • Thus • Fix a partition achieving the average

Partitioning the layers [Okol’nishnikova 89] • I.e., for each suchx • Onlyklayers of heightkn/r • At most a=2k alternations • Total  k2n/r n/2 vars read in 1 if r=2k2  core (x, 2)  n/2

Partitioning the layers [BRS 89] • Assign each layer independently • Pr[Li1]=Pr[Li2]=1/2 • for =1 or2 • Letci=1ifLayers(x,i)  and 0otherwise • Pr[ci]=Pr[Layers(x,i) ] 1/2k • each variable is read in at mostklayers • E[ici]=E[#{i: Layers(x,i) } ] n/2k • i.e., E[|core(x, )|] n/2k E[|unread(x, )|] n/2k

Modification for general BP [BST 98] • Let l(i) =|Layers(x,i)| • i l(i) kn • Pr[ci] = Pr[Layers(x,i)] = 2 l(i) • E[|core(x, )|] = E[ici] = i2l(i) • By arithmetic-geometric mean inequality this is 

Second Moment Method [BRS 89][BST 98] • If r is big enough |core(x,)| is concentrated around its mean • Bound Var[|core(x, )|] = Var[ici] • Events for ci, cjcorrelated only ifxiand xjread in the same layer • At most l(i)kn/r vars read in the same layer as xi • Each contributes at most Pr[ci]=1/2l(i) to variance • Var[ici] = (kn/r)il(i) 2l(i)  (k/r) (jl(j))i2l(i)  (k2n/r)i2l(i) = (k2n/r) E[|core(x, )|] FKG-like inequality of Chebyshev - terms are anti-correlated

The Boolean case is much harder • [BST 98]Showed onlyT 1.017nforS=o(n)for quadratic form problem • Uses pseudo-rectangles but specialized to splitting BP only at theT/2level, deterministic • [Ajtai 99a]Shows lower bounds for Element Distinctness over [n2] that work for density2-em • Embedded rectangles not pseudo-rectangles, deterministic • [Ajtai 99b]T=O(n)S=W(n)for Boolean BP’s!!! • [B-Saks-Sun-Vee 00] Improved bounds and extension to O(n/T)-error randomized case • Talk later

Power of the Large Domain Technique • For oblivious BPs, best bound using two-party CC is • T=W(n log (n/S))[Alon-Maass 86] • Bounds match for general BPs over large domains • Best oblivious BP bounds use multiparty CC • T=W(n log2(n/S))[Babai-Nisan-Szegedy 89] • [B-Vee 02] Matching bounds for general BPs over large domains • Erik Vee talk later

Time-Space Tradeoff Lower Bounds for Branching Programs: Part I