

  1. Optimal Space Lower Bounds for All Frequency Moments David Woodruff MIT dpwood@mit.edu

  2. The Streaming Model [Figure: example stream 4 3 7 3 1 1 0 …] • Stream of elements a_1, …, a_q, each in {1, …, m} • Want to compute statistics on the stream • Elements arranged in adversarial order • Algorithms are given one pass over the stream • Goal: minimum-space algorithm

  3. Frequency Moments • q = stream size, m = universe size • f_i = # occurrences of item i • Define the k-th frequency moment: F_k = Σ_i f_i^k Applications • F_0 = # distinct elements in the stream, F_1 = q • F_2 = repeat rate • Computing self-join sizes in databases
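
For concreteness, here is a minimal (non-streaming) Python sketch of the definition above, run on the illustrative digits from the slide; the function name is mine, not the paper's.

```python
from collections import Counter

def frequency_moment(stream, k):
    """F_k = sum_i f_i^k, computed exactly (illustrates the definition, not a small-space algorithm)."""
    freq = Counter(stream)                      # f_i for every item i that appears
    if k == 0:
        return len(freq)                        # F_0 = number of distinct elements
    return sum(f ** k for f in freq.values())   # F_k for k >= 1

stream = [4, 3, 7, 3, 1, 1]                     # the illustrative digits from the slide
print(frequency_moment(stream, 0))              # 4 distinct elements
print(frequency_moment(stream, 1))              # 6 = stream length q
print(frequency_moment(stream, 2))              # F_2 = 1 + 4 + 1 + 4 = 10
```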

  4. The Best Deterministic Algorithm • Trivial algorithm for F_k • Store/update the frequency f_i of each item i • Space: m items i, log q bits for each f_i ⇒ total space = O(m log q) Can we do better? • Negative result [AMS96]: any algorithm computing F_k exactly must use Ω(m) space.
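
A sketch of that trivial one-pass algorithm (illustrative Python; the class name and interface are mine):

```python
class ExactFk:
    """The trivial one-pass algorithm: store every frequency f_i explicitly.
    With m possible items and log q bits per counter this is the O(m log q) bound from the slide."""

    def __init__(self, k):
        self.k = k
        self.freq = {}                      # one counter per distinct item seen so far

    def process(self, item):
        self.freq[item] = self.freq.get(item, 0) + 1

    def query(self):
        if self.k == 0:
            return len(self.freq)
        return sum(f ** self.k for f in self.freq.values())

alg = ExactFk(k=2)
for a in [4, 3, 7, 3, 1, 1]:                # one pass over the stream
    alg.process(a)
print(alg.query())                          # F_2 = 10
```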

  5. Approximating F_k • Negative result [AMS96]: any deterministic algorithm that outputs x with |F_k − x| < ε F_k must use Ω(m) space. What about randomized approximation algorithms? • A randomized algorithm A ε-approximates F_k if A outputs x with Pr[|F_k − x| < ε F_k] > 2/3

  6. Previous Work • Upper bounds: can ε-approximate F_0 [BJKST02], F_2 [AMS96], and F_k for k > 2 [CK04] in small space (the respective bounds were displayed as formulas on the slide) • Lower bounds: • [AMS96] ∀ k, ε-approximating F_k needs Ω(log m) space • [IW03] ε-approximating F_0 requires Ω(ε^{-2}) space when ε is not too small • Questions: Does the bound hold for k ≠ 0? Does it hold for F_0 for smaller ε?

  7. First Result • Optimal lower bound: ∀ k ≠ 1 and any ε = Ω(m^{-1/2}), any ε-approximator for F_k must use Ω(ε^{-2}) bits of space. • F_1 = q is computed trivially in log q space • F_k is computed exactly in O(m log q) space, so the bound can only hold for ε = Ω(m^{-1/2}) (otherwise Ω(ε^{-2}) would exceed the trivial upper bound) • Technique: reduction from a 2-party protocol for computing the Hamming distance Δ(x, y)

  8. Idea Behind Lower Bounds [Figure: Alice holds x ∈ {0,1}^m, Bob holds y ∈ {0,1}^m; Alice runs a (1 ± ε) F_k algorithm A on a stream s(x) and sends the internal state S of A to Bob, who continues running A on a stream s(y)] • Together they compute (1 ± ε) F_k(s(x) ∘ s(y)) w.p. > 2/3 • Idea: if this lets them decide f(x, y) w.p. > 2/3, the space used by A is at least the randomized 1-way comm. complexity of f(·,·)
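
The generic reduction can be sketched as follows, assuming a hypothetical streaming interface whose internal state can be serialized; the exact-counting placeholder stands in for a real (1 ± ε) F_k algorithm:

```python
import copy

class ExactState:
    """Placeholder for a (1 ± ε) F_k streaming algorithm; here the 'state' is just exact counts."""
    def __init__(self):
        self.counts = {}
    def process(self, item):
        self.counts[item] = self.counts.get(item, 0) + 1
    def estimate_Fk(self, k):
        return sum(f ** k for f in self.counts.values())

def alice_message(x):
    A = ExactState()
    for i, bit in enumerate(x):             # Alice streams s(x): universe element i once per set bit
        if bit:
            A.process(i)
    return copy.deepcopy(A.counts)          # the only message: the internal state S of A

def bob_output(state, y, k):
    A = ExactState()
    A.counts = state                        # Bob resumes A from Alice's state...
    for i, bit in enumerate(y):             # ...and continues with s(y)
        if bit:
            A.process(i)
    return A.estimate_Fk(k)                 # in the real protocol, a (1 ± ε) estimate of F_k(s(x) ∘ s(y))

x, y = [1, 0, 1, 1], [0, 0, 1, 0]
print(bob_output(alice_message(x), y, 2))   # F_2 of the combined stream: 1 + 2**2 + 1 = 6
```

The message length equals the size of A's state, which is why a communication lower bound for f transfers directly to a space lower bound for A.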

  9. Randomized 1-way comm. complexity • Boolean function f: X × Y → {0,1} • Alice has x ∈ X, Bob has y ∈ Y. Bob wants f(x, y) • Only 1 message is sent: it must go from Alice to Bob • Comm. cost of a protocol = expected length of the longest message sent, over all inputs • The δ-error randomized 1-way comm. complexity of f, R_δ(f), is the comm. cost of the optimal protocol computing f w.p. ≥ 1 − δ How do we lower bound R_δ(f)?

  10. The VC Dimension [KNR] • F = {f : X → {0,1}}, a family of Boolean functions • Each f ∈ F is a length-|X| bitstring • For S ⊆ X, the shatter coefficient SC(f_S) of S is |{f|_S : f ∈ F}| = # distinct bitstrings when F is restricted to S • SC(F, p) = max_{S ⊆ X, |S| = p} SC(f_S) • If SC(f_S) = 2^{|S|}, S is shattered by F • The VC dimension of F, VCD(F), is the size of the largest S shattered by F
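
A brute-force illustration of these definitions (tiny Python example with a hypothetical family of threshold functions; not part of the talk):

```python
from itertools import combinations

def shatter_coefficient(family, S):
    """SC of S: the number of distinct restrictions f|_S over f in the family.
    Each f is a dict from X to {0,1} (brute force, tiny examples only)."""
    return len({tuple(f[x] for x in S) for f in family})

def vc_dimension(family, X):
    """Size of the largest S ⊆ X that is shattered, i.e. SC(f_S) = 2^|S|."""
    best = 0
    for p in range(1, len(X) + 1):
        for S in combinations(X, p):
            if shatter_coefficient(family, S) == 2 ** p:
                best = p
    return best

X = [0, 1, 2]
# Hypothetical family of threshold functions f_t(x) = 1 iff x >= t.
family = [{x: int(x >= t) for x in X} for t in range(4)]
print(shatter_coefficient(family, (0, 2)))   # 3 of the 4 possible patterns appear on {0, 2}
print(vc_dimension(family, X))               # 1: single points are shattered, pairs are not
```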

  11. Shatter Coefficient Theorem • Notation: for f: X × Y → {0,1}, define f_X = { f_x : Y → {0,1} | x ∈ X }, where f_x(y) = f(x, y) • Theorem [BJKS]: for every f: X × Y → {0,1} and every p ≥ VCD(f_X), R_{1/3}(f) = Ω(log SC(f_X, p))

  12. Hamming Distance Decision Problem (HDDP) Set t = Θ(1/ε²). Alice has x ∈ {0,1}^t, Bob has y ∈ {0,1}^t • Promise problem: either Δ(x, y) ≤ t/2 − t^{1/2}, in which case f(x, y) = 0, OR Δ(x, y) > t/2, in which case f(x, y) = 1 • We will lower bound R_{1/3}(f) via SC(f_X, t), but first, a critical lemma…
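
As a sanity check on the promise, a small Python sketch of HDDP (function names are mine):

```python
def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def hddp(x, y):
    """Returns 0 if Δ(x,y) <= t/2 - t^(1/2), 1 if Δ(x,y) > t/2,
    and None when the input violates the promise (a protocol may then answer arbitrarily)."""
    t = len(x)
    d = hamming(x, y)
    if d <= t / 2 - t ** 0.5:
        return 0
    if d > t / 2:
        return 1
    return None

x = [1] * 16
print(hddp(x, [1] * 14 + [0] * 2))    # Δ = 2 <= 8 - 4, so 0
print(hddp(x, [0] * 9 + [1] * 7))     # Δ = 9 > 8, so 1
print(hddp(x, [0] * 6 + [1] * 10))    # Δ = 6 falls in the gap: outside the promise
```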

  13. Main Lemma • Show ∃ S ⊆ {0,1}^n with |S| = n s.t. there exist 2^{Ω(n)} "good" sets T ⊆ S so that: • ∃ a separator y ∈ {0,1}^n s.t. • ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2} for some c > 0 • ∀ t ∈ S − T, Δ(y, t) > n/2 [Figure: the points of S split by the separator y into T and S − T]

  14. Lemma Solves HDDP Complexity • Theorem: R_{1/3}(f) = Ω(t) = Ω(ε^{-2}). • Proof: • Alice gets y_T for a random good set T, applying the main lemma with n = t. • Bob gets a random s ∈ S • Let f: {y_T}_T × S → {0,1}. • Main Lemma ⇒ SC(f) = 2^{Ω(t)} • [BJKS] ⇒ R_{1/3}(f) = Ω(t) = Ω(ε^{-2})

  15. Back to Frequency Moments Idea: use an ε-approximator for F_k in a protocol to solve HDDP. Alice has y ∈ {0,1}^t, Bob has s ∈ S ⊆ {0,1}^t. The i-th universe element is included exactly once in the auxiliary stream a_y (resp. a_s) if and only if y_i (resp. s_i) = 1. [Figure: Alice runs the F_k algorithm on a_y and sends its state to Bob, who continues it on a_s]
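
The stream construction is simple enough to state in a few lines of illustrative Python:

```python
def aux_stream(bits):
    """a_y (or a_s): universe element i appears exactly once iff bit i is 1."""
    return [i for i, b in enumerate(bits) if b]

y = [1, 0, 1, 1, 0]
s = [1, 1, 0, 1, 0]
a_y, a_s = aux_stream(y), aux_stream(s)
print(a_y, a_s)      # [0, 2, 3] [0, 1, 3]
print(a_y + a_s)     # the combined stream a_y ∘ a_s fed through the F_k algorithm
```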

  16. Solving HDDP with F_k • Alice/Bob compute an ε-approximation to F_k(a_y ∘ a_s) • F_k(a_y ∘ a_s) = 2^k wt(y ∧ s) + 1^k Δ(y, s) • For k ≠ 1, Alice also transmits wt(y) in log m space, so Bob can solve for (an approximation of) Δ(y, s) • Conclusion: ε-approximating F_k(a_y ∘ a_s) decides HDDP, so the space for F_k is Ω(t) = Ω(ε^{-2})
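
To make the identity and the role of wt(y) concrete, here is a small exact-arithmetic check (illustrative Python). Since wt(y ∧ s) = (wt(y) + wt(s) − Δ(y, s))/2, knowing wt(y) and wt(s) lets Bob solve for Δ(y, s) whenever k ≠ 1; in the real protocol only a (1 ± ε)-approximation of F_k is available, which is what the √t promise gap is calibrated against.

```python
def wt(v):
    return sum(v)

def Fk_combined(y, s, k):
    """F_k(a_y ∘ a_s): items in both vectors have frequency 2, items in exactly one have frequency 1,
    so F_k = 2^k * wt(y ∧ s) + 1^k * Δ(y, s)."""
    both = sum(a & b for a, b in zip(y, s))         # wt(y ∧ s)
    delta = sum(a != b for a, b in zip(y, s))       # Δ(y, s)
    return (2 ** k) * both + delta

def delta_from_Fk(Fk, wy, ws, k):
    """Solve for Δ using wt(y ∧ s) = (wt(y) + wt(s) − Δ)/2; the coefficient of Δ vanishes when k = 1."""
    assert k != 1
    return (Fk - 2 ** (k - 1) * (wy + ws)) / (1 - 2 ** (k - 1))

y = [1, 0, 1, 1, 0, 1]
s = [1, 1, 0, 1, 0, 0]
for k in (0, 2, 3):
    Fk = Fk_combined(y, s, k)
    print(k, Fk, delta_from_Fk(Fk, wt(y), wt(s), k))   # Δ(y, s) = 3 is recovered for every k ≠ 1
```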

  17. But How to Prove the Main Lemma? • Recall: show ∃ S ⊆ {0,1}^n with |S| = n s.t. there exist 2^{Ω(n)} sets T ⊆ S so that: • ∃ a separator y ∈ {0,1}^n s.t. • (1) ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2} for some c > 0 • (2) ∀ t ∈ S − T, Δ(y, t) > n/2 • Key idea: the probabilistic method • For S, choose n random elements of {0,1}^n • Show the probability that an arbitrary T ⊆ S satisfies (1), (2) is > 2^{−zn} for a constant z < 1 • Hence the expected number of such T is 2^{Ω(n)} • So there exists an S with 2^{Ω(n)} such T

  18. Proving the Main Lemma Let T = {t_1, …, t_{n/2}} ⊆ S be arbitrary. Let y_i = majority(t_{1,i}, ..., t_{n/2,i}) for all i ∈ [n]. What is the probability p that both: • (1) ∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2} for some c > 0 • (2) ∀ t ∈ S − T, Δ(y, t) > n/2 For (1), let p1 = Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}]. For (2), let p2 = Pr[∀ t ∈ S − T, Δ(y, t) > n/2] = 2^{−n/2}. By independence, p = p1 · p2. It remains to lower bound p1…
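
A quick Monte Carlo experiment gives a feel for p1 (purely illustrative; the constant c and the parameters below are arbitrary choices of mine, not the ones from the analysis):

```python
import random

def estimate_p1(n, c=0.2, trials=2000):
    """Monte Carlo estimate of p1: draw T as n/2 uniform vectors in {0,1}^n, set y to their
    coordinate-wise majority, and check whether every t in T has Δ(y, t) <= n/2 - c*sqrt(n)."""
    half = n // 2
    threshold = n / 2 - c * n ** 0.5
    good = 0
    for _ in range(trials):
        T = [[random.randint(0, 1) for _ in range(n)] for _ in range(half)]
        y = [int(2 * sum(col) >= half) for col in zip(*T)]            # majority bit (ties go to 1)
        good += all(sum(a != b for a, b in zip(y, t)) <= threshold for t in T)
    return good / trials

print(estimate_p1(18))   # an empirical, noisy estimate of p1 for n = 18
```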

  19. The Matrix Problem • WLOG, assume y = 1^n (recall y is the majority word) • Want to lower bound Pr[∀ t ∈ T, Δ(y, t) ≤ n/2 − cn^{1/2}] • Equivalent to a matrix problem: [Figure: the rows t_1, t_2, …, t_{n/2} stacked as an n/2 × n binary matrix] Given a random n/2 × n binary matrix in which each column has majority 1, what is the probability that each row has at least n/2 + cn^{1/2} ones?

  20. Bipartite Graphs • Matrix problem ↔ bipartite graph counting problem: [Figure: bipartite graph with n/2 left and n right vertices; edge (i, j) present iff matrix entry (i, j) is 1] • How many bipartite graphs on n/2 by n vertices are there s.t. each left vertex has degree > n/2 + cn^{1/2} and each right vertex has degree > n/4 (a majority of its n/2 possible neighbors)?

  21. Second Result • Bipartite graph count: a probabilistic argument shows there are at least 2^{n²/2 − zn/2 − n} such bipartite graphs, for a constant z < 1. • The analysis generalizes to show that the number of bipartite graphs on m + n vertices with each left vertex having degree > n/2 and each right vertex having degree > m/2 is > 2^{mn − zm − n}. • Previously known count: 2^{mn − m − n} [MW, personal communication] • That bound follows easily from a correlation inequality of Kleitman. • Our proof also uses correlation inequalities, but needs a more involved analysis.
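
For very small parameters the generalized count stated above can be checked by brute force against the 2^{mn − m − n} benchmark (illustrative Python; exponential time, so tiny m, n only):

```python
from itertools import product

def count_majority_bipartite(m, n):
    """Brute-force count of bipartite graphs on m left + n right vertices in which every left
    vertex has degree > n/2 and every right vertex has degree > m/2 (feasible only for tiny m, n)."""
    count = 0
    for edges in product((0, 1), repeat=m * n):
        M = [edges[i * n:(i + 1) * n] for i in range(m)]              # adjacency matrix, left x right
        if all(sum(row) > n / 2 for row in M) and all(sum(col) > m / 2 for col in zip(*M)):
            count += 1
    return count

m, n = 3, 3
print(count_majority_bipartite(m, n), 2 ** (m * n - m - n))           # exact count vs. the 2^(mn-m-n) benchmark
```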

  22. Summary • Results: • Optimal lower bound: ∀ k ≠ 1 and any ε = Ω(m^{-1/2}), any ε-approximator for F_k must use Ω(ε^{-2}) bits of space. • Bipartite graph count: the number of bipartite graphs on m + n vertices with each left vertex having degree > n/2 and each right vertex having degree > m/2 is at least 2^{mn − zm − n} for a constant z < 1.
