1 / 39

Techniques for Time-Space Tradeoff Lower Bounds for Branching Programs: Part I

Techniques for Time-Space Tradeoff Lower Bounds for Branching Programs: Part I. Paul Beame University of Washington. joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun. Branching programs. x 1. To compute f : {0,1} n  {0,1} on input ( x 1 ,…, x n ) follow path from

paytah
Télécharger la présentation

Techniques for Time-Space Tradeoff Lower Bounds for Branching Programs: Part I

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Techniques for Time-Space Tradeoff Lower Bounds for Branching Programs: Part I Paul Beame University of Washington joint work with Erik Vee, Mike Saks, T.S. Jayram, Xiaodong Sun

  2. Branching programs x1 To compute f:{0,1}n {0,1} on input (x1,…,xn) follow path from source to sink 1 x3 x2 0 x4 x5 x5 x=(1,1,0,1,...) Time T= length of longest path x3 x1 x2 x7 x7 x8 Space S = log2 (# of nodes) 0 1

  3. Branching program properties • Simulate random-access machines • same time T and space S • Multi-way version for xi in domain D • good for modeling RAM input registers • BPs will be leveled wlog. • same time T • at most 2S nodes per level

  4. Overall approach to lower bounds • Iff:Dn {0,1} is computed using small time and space • thenf-1(1)has a special combinatorial structure. • Lower bounds for f follow if f-1(1) does not have the structure How do we find such structures?

  5. Levelled BPs and Layers v0 Assume time T  kn and wlog that theBP is levelled (2S nodes per level) L1 L2 Break BP into r layers L1,…,Lr of height kn/r kn Partition (a subset of) the layers Lj into sets 1, 2,…, p p  2 Lr 0 1

  6. The Trace of an Input v0 Partition of (a subset of) the layers Lj into sets 1, 2,…, p p  2 L1 v1 L2 • The trace of input x • the sequence of nodes reached on input x as the computation moves from one set ito another • E.g. trace(x)=(v1,v2,v3) • a =length of trace = # of alternations in the partition • 2Sa possible traces v2 kn v3 L5 0 1

  7. Branching program time-space lower bounds using these ideas • Oblivious - same variable queried per level • [Chandra-Furst-Lipton 83], [Alon-Maass 86], [Babai-Nisan-Szegedy 89] • (Syntactic) read k - no variable queried k times on any path • [Borodin-Razborov-Smolensky 89], [Okol’nishnikova 89] • General BP’s • [B-Jayram-Saks 98], [Ajtai 99a], [Ajtai 99b], [B-Saks-Sun-Vee 00], [B-Vee 02]

  8. The Case of Oblivious BP’s Partition of the layers Lj into sets 1, 2,…, p p  2 v0 L1 v1 • When the BP is oblivious • Each i is associated with the subset Ai of variables read in levels in i • trace(x) can be used as the messages on input x in a communication protocol between p players computing f, where theith player has values of the variables in Ai L2 v2 kn v3 L5 0 1

  9. The Oblivious Case • Let C=ip Ai be the common variables for the players and A’i = Ai - C • For any assignment s to C, the trace can be used to compute fs • Space bound • S  CC(fs;A’1,…,A’p)/a for any s • Want: n-|A’i| large for all i small # of alternations a

  10. u v The Read-k Case • Wlog first make the read-k BP uniform • For any pair of nodes u,v the multi-set of variables queried between u and v is the same on any path • Call the set Auv • Then apply levelling etc. Add extra ‘dummy’ queries on each path if necessary

  11. Read-k Case Argument Overview v0 • Variation of the usual argument • First fix the node sequence s=(v0,v1,…,vr)for the r layers • Defines sets of inputs Av0v1,…,Avr-1vrread during these layers • fs is an ANDof functions defined on these sets of variables • (k,r)-rectangle • Then choose a layer partition 1, 2 that is good for Av0v1,…,Avr-1vr • Subsequence of (v0,v1,…,vr) at alternations forms the trace - also good v1 v2 v3 v4 vr 0 1

  12. Partitioning the layers • r layers (of height  kn/r) • Let Layers(x,i) be the set of layers in which variable xi is read on input x • |Layers(x,i)|k • For a set of layers, • unread(x, ) ={ i : Layers(x,i)  =} • core(x, ) = { i : Layers(x,i)  } • Partition is good if these are large for = 1, 2

  13. How to partition the layers • Assign every layer to 1 or 2 • A = core(x, 1) = unread(x, 2) • B = core(x, 2) = unread(x, 1) • C = set of variables read in common • Two techniques, both using probabilistic method • [Borodin-Razborov-Smolensky 89] • |A|, |B| n/2k+1, a  r  k22k • [Okol’nishnikova 89] • |A|  n/kO(k), |B| n/2, a = 2k, r = 2k2

  14. The Read-k Case: Fixing the Trace Fix a node sequence and then partition the layers Ljinto sets1, 2 yielding a tracet v0 L1 v1 Define ft(x)=1  f(x)=1 andx follows t L2 v2 Again, by uniformity, the trace determines which variables are read in each component of the partition kn v3 ft(x)=g(xAC)  h(xBC) ft-1(1) is a pseudo-rectangle L5 vf 0 1

  15. Rectangles and Pseudo-rectangles • Ordinary combinatorial rectangle in {0,1}n • Partition [n] into A and B • RARB for sets RA {0,1}A and RB{0,1}B • Alternatively • {x : xA RA and xBRB} • Pseudo-rectangle • [n] =D E, sets RD{0,1}D and RE{0,1}E • {x : xD RD and xE RE} • Or, partition [n] into A, B and C • {x: xAC RAC and xBC RBC}

  16. Read-k lower bounds If f is computed by a (nondeterministic) read k branching program of size 2S then The ones of f, f-1(1), can be covered by 2Sa pseudo-rectangles R with |A| and |B| large and f(R)=1 • |A|, |B| n/2k+1, ak22k[BRS 89] • |A|  n/kO(k), |B| n/2, a=2k [Okol 89] Prove upper bound F on # of inputs in any such pseudo-rectangle on which f is constant 1 2S(|f-1(1)|/F)1/a or S  log (|f-1(1)|/F)

  17. Lower bounds for general BPs [BST 98] • Major problem to handle • Fixing the node sequence and the layer partition does not fix setsA = core(x, 1)or B = core(x, 2) • Solutions • Apply one layer partition for all inputs • Use extension of [BRS 89] partition method • Ignore inputs for which partition is bad • Prob method argument bounds # of bad inputs • Partition remaining inputs based on the values ofcore(x, 1) and core(x, 2) as well as on their traces

  18. Lower bounds for general BPs [BST 98] • Number of rectangles increases • Multiply2Sa by the number of choices of core(x, 1)andcore(x, 2) • A priori bound is 3n since sets are disjoint • Observation • a pseudo-rectangle w.r.t A,B,C remains a pseudo-rectangle w.r.t A’,B’,C’ if A’A, B’B, and C’=C (A-A’)  (B-B’) • Partition based on only the first m=n/2k+1 elements of core(x, 1)andcore(x, 2) • # of choices is at most

  19. Lower bounds for general BPs [BST 98] If f is computed by a (nondeterministic) time kn branching program of size 2S Then most of f-1(1) can be covered by 2Sa pseudo-rectangles with |A|=|B|=m=n/2k+1 whereak22k(the cover is a partition if the program is deterministic) # of pseudo-rectangles is at most 24log2(n/m) m+Sa= 24(k+1)m+Sa Is that good?

  20. Using the Bound: Embedded Rectangles • Pseudo-rectangles are hard to reason about • Easier objects: Embedded rectangles • Start with an pseudo-rectangle on A,B,C • Fix an assignment to the common set C • we get a simpler object with • a combinatorial rectangle RAxRB on AxB • an assignment sto C=AB spine • Result is an embedded rectangle

  21. Partition of most of f-1(1) into embedded rectangles • Input space is Dn • Each pseudo-rectangle can be partitioned into at most|D|n-2membedded rectanglesRwith |A|=|B|=m=n/2k+1A,Bfeet of R • Total number of such embedded rectangles partitioning most off-1(1)24(k+1)m+Sa|D|n-2m • Total number of inputs is|D|n • Non-trivial only if, e.g. |D|  23(k+1) large domain

  22. Lower bound on embedded rectangle size for which f is constant • Suppose|f-1(1)|d|D|n • Since at most24(k+1)m+Sa|D|n-2m embedded rectangles, average size is at least d2-4(k+1)m-Sa-1 |D|2mand at least 1/4 of f-1(1)is covered by those d2-4(k+1)m-Sa-2 |D|2m • Such a rectangle defined by (s,A,B,RA,RB) must have |RA|/|Dm|,|RB|/|Dm| d2-4(k+1)m-Sa-2 • Typical 2-party communication complexity results* say|RA|/|Dm|,|RB|/|Dm|  |D|-em *With extra work to handlesand easiestA,B

  23. The time space tradeoff lower bounds [BST 98] • Therefore for such a hardfd2-4(k+1)m-Sa-2  |D|-em • So if dis constant and|D|  29(k+1)/eSa  [e log |D| 4(k+1)] m  c (e/2) m log |D| • Sincem=n/2k+1andak22k for some C  1 S  C-k n log |D| • Therefore T/n=kc’log ((n log|D|)/S), i.e.

  24. What functions are this hard? • ComputingxTMx 0 (mod q) qn [BST 98] • Non-optimal bound when M is Sylvester matrix • Letg 1/2andc 2/(1H2(g)) • HAMg:[nc]n {0,1}: Is any pair (xi,xj) close in Hamming distance D(xi,xj)gclog n? • Any two sets in [nc]m each of density n-bmcontain a pair of coordinates that are within gclog n of each other • Defined in [Ajtai 99a] where weaker lower bounds proved using generalization of [Okol 89] instead of [BRS 89] • Best bounds follow immediately from [BST 98]

  25. What functions are this hard? • Computing xTMyx 0 (modq)forxGF(q)n,y GF(q)2n-1, qn • Function defined in [Ajtai 99b]and case q=2 used for Boolean lower bounds • Key to improvement: For some y, My has better rigidity propertiesthan Sylvester matrices have • Defining these matrices and analyzing their rigidity properties is the key contribution of [Ajtai 99b] • Most of the hard work in Boolean lower bounds is in the second half of [Ajtai 99a], much of which does not fit in the STOC version

  26. Ajtai’s matrices y1 0 y2 Myis constant on anti-diagonals below the main diagonal y3 y4 yn yn+1 yn+2 y2n-2 y2n-1 My

  27. xTMyxon an embedded(m,a)-rectangle B A x For every son AUB, f (xAUB,s,y) = xAT MABxB + g(xA,y) + h(xB,y) A B My x

  28. Rectangles, rank, & rigidity • Largest rectangle on which xATMxB is constant has density  q-rank(M) • [BRS 89] • Lemma[Ajtai 99b] Can fix y s.t. every dndn minor MAB of My has rank(MAB) c dn/log2(1/d)  d1+en • better than comparable rigidity bound of d2n for Sylvester matrices[BRS 89],[BST 98]

  29. How to partition the layers • Assign every layer to 1 or 2 • A = core(x, 1) = unread(x, 2) • B = core(x, 2) = unread(x, 1) • C = set of variables read in common • Two techniques for read-k case, both using probabilistic method • [Borodin-Razborov-Smolensky 89] • |A|, |B| n/2k+1, a  r  k22k • [Okol’nishnikova 89] • |A|  n/kO(k), |B| n/2, a = 2k, r = 2k2

  30. v0 L1 v1 L2 v2 kn vr-1 Lr vr 0 1 Read-k case:Branching program with node sequence

  31. Partitioning the layers • r layers (of height  kn/r) • Let Layers(x,i) be the set of layers in which variable xi is read on input x • |Layers(x,i)|k • For a set of layers, • unread(x, ) ={ i : Layers(x,i)  =} • core(x, ) = { i : Layers(x,i)  } • Partition is good if these are large for = 1, 2

  32. Partitioning the layers [Okol’nishnikova 89] • Fix node sequencesandxthat followss • Choose a random subset 1ofkof therlayers • For each indexi • Thus • Fix a partition achieving the average

  33. Partitioning the layers [Okol’nishnikova 89] • I.e., for each suchx • Onlyklayers of heightkn/r • At most a=2k alternations • Total  k2n/r n/2 vars read in 1 if r=2k2  core (x, 2)  n/2

  34. Partitioning the layers [BRS 89] • Assign each layer independently • Pr[Li1]=Pr[Li2]=1/2 • for =1 or2 • Letci=1ifLayers(x,i)  and 0otherwise • Pr[ci]=Pr[Layers(x,i) ] 1/2k • each variable is read in at mostklayers • E[ici]=E[#{i: Layers(x,i) } ] n/2k • i.e., E[|core(x, )|] n/2k E[|unread(x, )|] n/2k

  35. Modification for general BP [BST 98] • Let l(i) =|Layers(x,i)| • i l(i) kn • Pr[ci] = Pr[Layers(x,i)] = 2 l(i) • E[|core(x, )|] = E[ici] = i2l(i) • By arithmetic-geometric mean inequality this is 

  36. Second Moment Method [BRS 89][BST 98] • If r is big enough |core(x,)| is concentrated around its mean • Bound Var[|core(x, )|] = Var[ici] • Events for ci, cjcorrelated only ifxiand xjread in the same layer • At most l(i)kn/r vars read in the same layer as xi • Each contributes at most Pr[ci]=1/2l(i) to variance • Var[ici] = (kn/r)il(i) 2l(i)  (k/r) (jl(j))i2l(i)  (k2n/r)i2l(i) = (k2n/r) E[|core(x, )|] FKG-like inequality of Chebyshev - terms are anti-correlated

  37. Second Moment Method [BRS 89][BST 98] • Var[|core(x, )|]  (k2n/r) E[|core(x, )|] = (k2n/r) m • By Chebyshev’s inequality • Pr[ m/2|core(x, )| 3m/2]  1  Var[|core(x, )|]/(m/2)2 1  4k22k/r sincem n/2k • Choose r=8k22k

  38. The Boolean case is much harder • [BST 98]Showed onlyT 1.017nforS=o(n)for quadratic form problem • Uses pseudo-rectangles but specialized to splitting BP only at theT/2level, deterministic • [Ajtai 99a]Shows lower bounds for Element Distinctness over [n2] that work for density2-em • Embedded rectangles not pseudo-rectangles, deterministic • [Ajtai 99b]T=O(n)S=W(n)for Boolean BP’s!!! • [B-Saks-Sun-Vee 00] Improved bounds and extension to O(n/T)-error randomized case • Talk later

  39. Power of the Large Domain Technique • For oblivious BPs, best bound using two-party CC is • T=W(n log (n/S))[Alon-Maass 86] • Bounds match for general BPs over large domains • Best oblivious BP bounds use multiparty CC • T=W(n log2(n/S))[Babai-Nisan-Szegedy 89] • [B-Vee 02] Matching bounds for general BPs over large domains • Erik Vee talk later

More Related