Extremal Properties of Polynomial Threshold Functions

Ryan O’Donnell (MIT / IAS) and Rocco Servedio (Columbia)

Representing boolean functions
Complexity theory studies dozens of different representations for boolean functions:
• circuits: boolean, algebraic, threshold; formulas, low-depth variants
• decision trees
• branching programs
• switching networks
• polynomials over various fields
• monotone span programs
• contact networks

Extremal bounds
For each representation, one can ask: what is the “size” of the hardest boolean function, or of a random function?
This is often a fairly easy problem: the upper bound comes from a “trivial” construction, the lower bound from counting. E.g., for circuit size:
• [Lupanov-58] Every function has a circuit of size $(1+o(1))\,2^n/n$.
• [Shannon-49] Almost every function requires circuits of size $2^n/n$.

Polynomial Threshold Functions
Let $f : \{+1,-1\}^n \to \{+1,-1\}$ be a boolean function and let $p : \mathbb{R}^n \to \mathbb{R}$ be a multilinear polynomial. We say that p is a polynomial threshold function (PTF) for f, or that p sign-represents f, if $f(x) = \mathrm{sgn}(p(x))$ for all $x \in \{+1,-1\}^n$.
• See the excellent survey “Slicing the hypercube” [Saks-93].
• PTFs correspond to the circuit class Threshold-Of-Parities.

PTF examples
• AND: $x_1 + x_2 + \cdots + x_n + (n-1)$
• OR: $x_1 + x_2 + \cdots + x_n - (n-1)$
• Majority: $x_1 + x_2 + \cdots + x_n$
• Parity: $x_1 x_2 \cdots x_n$
• $(x_1 \oplus x_2) \vee x_3$: $100\,x_1 x_2 + x_3 - 100$
There are two main size measures for PTFs:
• degree: number of variables in the biggest monomial (between 0 and n)
• density: number of monomials (between 1 and $2^n$)
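The example polynomials can be checked by brute force over the cube. The sketch below assumes the encoding that makes the AND and OR polynomials work, namely −1 for TRUE and +1 for FALSE, and it reads the last example as (x1 XOR x2) OR x3. The helper names (`sign_represents`, `XOR_OR`, and so on) are illustrative and do not come from the talk.

```python
from itertools import product
from math import prod

def sgn(v):
    # Returns 0 at v == 0, so a polynomial that vanishes on the cube fails the check.
    return (v > 0) - (v < 0)

def sign_represents(p, f, n):
    """True if sgn(p(x)) == f(x) for every x in {+1,-1}^n."""
    return all(sgn(p(x)) == f(x) for x in product((1, -1), repeat=n))

# Assumed encoding (not stated on the slide): -1 is TRUE, +1 is FALSE.
def AND(x):
    return -1 if all(v == -1 for v in x) else 1

def OR(x):
    return -1 if any(v == -1 for v in x) else 1

def MAJORITY(x):
    return -1 if sum(v == -1 for v in x) > len(x) / 2 else 1

def PARITY(x):
    return -1 if sum(v == -1 for v in x) % 2 == 1 else 1

def XOR_OR(x):
    # (x1 XOR x2) OR x3 on three variables
    return -1 if (x[0] != x[1] or x[2] == -1) else 1

n = 5  # odd, so Majority has no ties
results = {
    "AND": sign_represents(lambda x: sum(x) + (n - 1), AND, n),
    "OR": sign_represents(lambda x: sum(x) - (n - 1), OR, n),
    "Majority": sign_represents(lambda x: sum(x), MAJORITY, n),
    "Parity": sign_represents(lambda x: prod(x), PARITY, n),
    "(x1 XOR x2) OR x3": sign_represents(
        lambda x: 100 * x[0] * x[1] + x[2] - 100, XOR_OR, 3),
}
print(results)
```

Note the size trade-off visible here: the first three polynomials have degree 1 and density at most n + 1, while Parity uses a single monomial of degree n.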

Why PTFs?
• natural algebraic model of complexity
• degree upper bounds ⇒ machine learning algorithms [Klivans-S-01, O-S-03]; PP closed under intersection [Beigel-Reingold-Spielman-95]
• simultaneous degree/density lower bounds ⇒ oracle separations (e.g., $\mathrm{P}^{\mathrm{NP}^A} \neq \mathrm{PP}^A$, [Beigel-94])
• degree lower bounds ⇒ quantum decision tree lower bounds

The PTF extremal problem
Also, the PTF extremal problem is interesting!
• Are there functions that require PTF degree n?
• Do most functions have PTF degree $\ll n$?
• Does every function have PTF density somewhat smaller than $2^n$?
• Are there functions that require PTF density close to $2^n$?

Results in this talk
In this talk I will discuss two of our results:
• Degree upper bound: Almost every boolean function has PTF degree at most $n/2 + O(\sqrt{n \log n})$.
• Density upper bound: Every boolean function has PTF density at most $(1 - 1/O(n))\,2^n$.

Results not in this talk
Def: We say p is a weak PTF for f if, for all $x \in \{+1,-1\}^n$, either $p(x) = 0$ or $\mathrm{sgn}(p(x)) = f(x)$. (Also, p is not allowed to be identically 0!)
Saks asked whether almost all functions require weak PTF density $(1/2 - \varepsilon)\,2^n$. In fact, we show every function has weak PTF density $o(1) \cdot 2^n$ (Ramsey theory).
We show a couple of other bounds…

PTF Degree Bounds

Degree bounds: previous results
• [Minsky-Papert-68]: Parity and its negation require PTF degree n. [Aspnes-Beigel-Furst-Rudich-94] show these are the only such functions.
• [Wang-Williams-91], [ABFR-94]: Conjecture: almost every function has PTF degree $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$.
• Lower bound of $\lfloor n/2 \rfloor$ shown by a counting argument [Anthony-92], [Alon-93] based on a result of [Cover-65].

Progress on the upper bound
Towards the upper bound:
• [Razborov-Rudich-94] showed almost every function has PTF degree at most $0.95\,n$.
• [Alon-93] observed that the work of [Gotsman-89] implies a PTF degree upper bound of $0.89\,n$.
We show the conjecture is true up to lower-order terms:
Thm: Almost every function has PTF degree at most $n/2 + O(\sqrt{n \log n})$.

Fourier detour
It’s known that any function $f : \{+1,-1\}^n \to \mathbb{R}$ can be exactly represented as
$f(x) = \sum_{S \subseteq [n]} \hat f(S)\, x_S$,
where the $\hat f(S)$’s are real constants and the monomial $x_S$ is $\prod_{i \in S} x_i$. This is known as the Fourier representation.
Parseval’s identity: $\sum_{S \subseteq [n]} \hat f(S)^2 = \sum_{x \in \{+1,-1\}^n} f(x)^2 / 2^n$.
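A small numerical check of the Fourier representation and Parseval’s identity. The slide does not say how the coefficients are computed; the sketch assumes the standard formula, coefficient of S = 2^-n · Σ_x f(x)·x_S, and uses 3-bit majority purely as an illustrative test function. All function names are hypothetical.

```python
from itertools import product, combinations
from math import prod, isclose

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} with f_hat(S) = 2^-n * sum_x f(x) * x_S (standard formula)."""
    cube = list(product((1, -1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            coeffs[S] = sum(f(x) * prod(x[i] for i in S) for x in cube) / 2**n
    return coeffs

def evaluate(coeffs, x):
    """Evaluate sum_S f_hat(S) * x_S at the point x."""
    return sum(c * prod(x[i] for i in S) for S, c in coeffs.items())

n = 3
maj = lambda x: 1 if sum(x) > 0 else -1   # illustrative choice of f
cube = list(product((1, -1), repeat=n))
coeffs = fourier_coefficients(maj, n)

lhs = sum(c * c for c in coeffs.values())       # sum_S f_hat(S)^2
rhs = sum(maj(x) ** 2 for x in cube) / 2**n     # sum_x f(x)^2 / 2^n
print(lhs, rhs)
```

For a ±1-valued f the right-hand side is 1, which is the form of Parseval used later in the density argument.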

Our degree upper bound
We actually show a stronger fact:
Thm: Let $\mathcal{S}$ be any collection of $(1-1/n)\,2^n$ monomials. Then almost every function has a PTF over these monomials.
Cor: Almost every function has a PTF of degree at most $n/2 + O(\sqrt{n \log n})$.
Proof: For each $z \in \{+1,-1\}^n$, let $\delta_z : \{+1,-1\}^n \to \mathbb{R}$ be the “Dirac delta function”: $\delta_z(z) = 2^n$, and $\delta_z(x) = 0$ for $x \neq z$.
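One way to see how the corollary follows from the theorem is to take the collection of all monomials of degree at most d and ask how large d must be before the collection holds a (1 − 1/n) fraction of all 2^n monomials. The sketch below computes that d exactly and compares it with n/2 + sqrt(n ln(n)/2), the value suggested by Hoeffding's tail bound for a Binomial(n, 1/2) variable. The function names and the choice of constant are assumptions made for illustration, not taken from the talk.

```python
from math import comb, ceil, log, sqrt

def min_degree_for_fraction(n):
    """Smallest d such that monomials of degree <= d number at least (1 - 1/n) * 2^n."""
    count = 0
    for d in range(n + 1):
        count += comb(n, d)
        # Exact integer form of: count >= (1 - 1/n) * 2^n
        if n * count >= (n - 1) * 2**n:
            return d
    return n

def hoeffding_degree(n):
    """n/2 + sqrt(n ln(n) / 2), rounded up (illustrative constant from Hoeffding's bound)."""
    return ceil(n / 2 + sqrt(n * log(n) / 2))

for n in (10, 100, 1000):
    print(n, min_degree_for_fraction(n), hoeffding_degree(n))
```

The exact threshold never exceeds the Hoeffding estimate, so degree n/2 + O(sqrt(n log n)) always gives a collection of the size the theorem asks for.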

Proof sketch continued
Random $\pm 2^n$ functions are made by forming $\sum_{z \in \{+1,-1\}^n} f(z)\,\delta_z(x)$, where the f(z)’s are coin tosses.
The function $\delta_z(x)$ has a simple Fourier representation: $\delta_z(x) = \sum_{S \subseteq [n]} z_S x_S$.
Suppose we “approximate” each $\delta_z$ by deleting the summands outside $\mathcal{S}$: $\delta'_z(x) = \sum_{S \in \mathcal{S}} z_S x_S$.
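The Fourier representation of the delta function can be confirmed directly for small n. It holds because the sum over all S of z_S·x_S factors as the product over i of (1 + z_i·x_i), which is 2^n when x = z and 0 otherwise. The function name below is illustrative.

```python
from itertools import product, combinations
from math import prod

def delta_via_fourier(z, x):
    """Evaluate the sum over all S of z_S * x_S, where y_S is the product of y_i for i in S."""
    n = len(z)
    return sum(
        prod(z[i] * x[i] for i in S)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    )

n = 4
cube = list(product((1, -1), repeat=n))
ok = all(
    delta_via_fourier(z, x) == (2**n if x == z else 0)
    for z in cube
    for x in cube
)
print(ok)
```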

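[Figure: two plots over $\{+1,-1\}^n$. First, $\delta_z(\cdot)$: a spike of height $2^n$ at z and 0 elsewhere, with $\delta_z(x) = +1 + x_1 - x_2 + x_3 - x_1x_2 + x_1x_3 - x_2x_3 + \cdots$. Second, $\delta'_z(\cdot)$: a spike of height $|\mathcal{S}|$ at z and values within $\pm(1/n)\,2^n$ elsewhere, shown with the same expansion.]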

Proof sketch continued
We want to show that for any particular x, with very high probability, $\sum_{z \in \{+1,-1\}^n} f(z)\,\delta_z(x)$ and $\sum_{z \in \{+1,-1\}^n} f(z)\,\delta'_z(x)$ have the same sign. (Then union bound over x.)
Taking the z = x summand starts the sum off with $|\mathcal{S}|\,f(x) = (1-1/n)\,2^n f(x)$: good shape so far.
You get noise terms for all other z. But…! Key point: these are summed with random ± signs, so they get “dampened”.

Proof sketch completed
To show that a random ± sum of quantities, here $\{\delta'_z(x) : z \neq x\}$, is small w.h.p., the key is to show a) the numbers are bounded, and b) the sum of squares (variance) is small.
Both come easily: each number is at most $(1/n)\,2^n$ in absolute value, and the sum of the squares is easily calculated exactly using Parseval’s identity: independently of x, it equals $(1/n - 1/n^2)\,2^{2n}$. So SD ≈ $(1/\sqrt{n})\,2^n$.
Hence Hoeffding ⇒ $|\text{error}| < 0.5 \cdot 2^n$ w.v.h.p.
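The two facts used above (bounded noise terms, and an exact sum of squares) can be checked numerically. This is a sketch under several assumptions that are not in the talk: points and monomials are encoded as bitmasks (bit i set means x_i = −1, or i in S), the kept collection is taken to be the monomials of degree at most d, and n = 10, d = 7 and the random seed are arbitrary. With this encoding δ'_z(x) depends only on z XOR x, so a single Walsh-Hadamard transform of the indicator of the kept collection gives every noise term. For a kept collection of size K the sum of squares comes out as K·(2^n − K), which equals (1/n − 1/n²)·2^{2n} when K = (1 − 1/n)·2^n.

```python
import random

def wht(values):
    """Unnormalised Walsh-Hadamard transform: out[S] = sum_y values[y] * (-1)^popcount(S & y)."""
    a = list(values)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

n, d = 10, 7          # illustrative parameters
N = 2**n

# Indicator of the kept monomials: all S with |S| <= d (an assumed choice of collection).
keep = [1 if bin(S).count("1") <= d else 0 for S in range(N)]
size = sum(keep)

# kernel[y] = sum over kept S of (-1)^popcount(S & y), so delta'_z(x) = kernel[z XOR x].
kernel = wht(keep)
noise_sq = sum(v * v for v in kernel[1:])          # sum over z != x of delta'_z(x)^2
max_noise = max(abs(v) for v in kernel[1:])        # largest single noise term

# One random +-1 function f and the truncated sum  sum_z f(z) * delta'_z(x).
rng = random.Random(0)
f = [rng.choice((1, -1)) for _ in range(N)]
coeffs = wht(f)                                    # entry S holds 2^n * f_hat(S)
p = wht([c * k for c, k in zip(coeffs, keep)])     # p[x] = sum_z f(z) * delta'_z(x)
disagreements = sum(1 for x in range(N) if p[x] * f[x] <= 0)

print("kept:", size, "of", N)
print("noise sum of squares:", noise_sq, "max noise term:", max_noise)
print("points where the truncated sum fails to sign-represent f:", disagreements)
```

The disagreement count is for one random draw only; it illustrates the argument and is not evidence for the theorem at this small n.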

PTF Density Bounds

Density bounds: previous results
• [Gotsman-89] showed that every boolean function has PTF density at most $2^n - 2^{n/2}$.
• [Saks-93] observed that [Cover-65] implies that almost every boolean function requires PTF density at least $0.11 \cdot 2^n$.
• Our thm: Every boolean function has PTF density at most $(1 - 1/O(n))\,2^n$.
• We get to omit a $1/O(n)$ fraction of monomials, compared to [G89]’s $1/2^{n/2}$.

Proof sketch: density upper bound
Let $f : \{+1,-1\}^n \to \{+1,-1\}$ be any boolean function. Let $L_1(f) = \sum_{S \subseteq [n]} |\hat f(S)|$.
Since $\sum_{S \subseteq [n]} |\hat f(S)|^2 = \sum_{S \subseteq [n]} \hat f(S)^2 = 1$ (Parseval), by Cauchy-Schwarz, $L_1(f) \le 2^{n/2}$.
[Bruck-Smolensky-92] shows that f always has a PTF of density $2n\,L_1(f)^2$.
So we’re already done unless, say, $L_1(f) \ge (1/n)\,2^{n/2}$.
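Spelling out the two estimates on this slide (a worked restatement under the slide's own definitions, not additional material from the talk): the Cauchy-Schwarz step, and why a small $L_1(f)$ already gives a sparse PTF through the Bruck-Smolensky bound.

```latex
\[
L_1(f) \;=\; \sum_{S \subseteq [n]} |\hat f(S)| \cdot 1
\;\le\; \Bigl(\sum_{S \subseteq [n]} \hat f(S)^2\Bigr)^{1/2}
        \Bigl(\sum_{S \subseteq [n]} 1\Bigr)^{1/2}
\;=\; 1 \cdot (2^n)^{1/2} \;=\; 2^{n/2}.
\]
\[
L_1(f) < \tfrac{1}{n}\, 2^{n/2}
\;\Longrightarrow\;
2n\, L_1(f)^2 \;<\; 2n \cdot \tfrac{1}{n^2}\, 2^n \;=\; \tfrac{2}{n}\, 2^n .
\]
```

In that case the density is only a 2/n fraction of $2^n$, far below the target, so the interesting case is the one where $L_1(f)$ is within a factor n of its maximum.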

Proof sketch continued
If $L_1(f)$ is very close to its upper bound, $2^{n/2}$, then its coefficients must be very “spread out”: a handful may be “large,” but almost all must be close to $2^{-n/2}$.
Recall: $f(x) = \sum_{S \subseteq [n]} \hat f(S)\,x_S$. Let L be the set of coefficients that are “small.”
Fix x. We show that if you omit a random selection of $(1/O(n))\,2^n$ terms from L, the sum of what you omit is smaller than 1 w.p. $1 - 2^{-2n}$.

Proof sketch: completed
Consider the sum over the small coefficients, $\sum_{S \in L} \hat f(S)\,x_S$.
We’re adding up $N \approx 2^n$ numbers, $\hat f(S)\,x_S$. Each is not much more than $\pm 2^{-n/2} = \pm 1/\sqrt{N}$.
Their mean is very small, around $\pm \log(N)/N$: had we summed over all S we would have gotten $f(x) = \pm 1$, and we omitted few terms.
Hence (Hoeffding) if we sum a random subset of size $N/\log(N)$, the result has magnitude at least 1 w.p. at most $1/N^2$.

Open problems
For the problem of degree, the conjecture of Wang & Williams and ABFR is still open: Is the PTF degree of almost every function as low as n/2?
For the problem of density, we’re not even sure where the right answer lies: somewhere between $0.11 \cdot 2^n$ and $(1 - 1/O(n))\,2^n$.
Our conjecture: Almost every function has PTF density $0.5 \cdot 2^n$.
