Optimal Space Lower Bounds for all Frequency Moments

Optimal Space Lower Bounds for all Frequency Moments David Woodruff Based on SODA ’04 paper

4 3 7 3 1 1 0 The Streaming Model [AMS96] … • Stream of elements a1, …, aq each in {1, …, m} • Want to compute statistics on stream • Elements arranged in adversarial order • Algorithms given one pass over stream • Goal: Minimum space algorithm

Frequency Moments Notation • q = stream size, m = universe size • fi = # occurrences of item i k-th moment • F0 = # of Distinct elements • F1 = q • F2 = repeat rate Why are frequency moments important?

Applications • Estimating # distinct elts. w/ low space • Estimate selectivity of queries to DB w/o expensive sort • Routers gather # distinct destinations w/limited memory. • Estimating F2 estimates size of self-joins: ,

The Best Determininistic Algorithm • Trivial algorithm for Fk • Store/update fifor each item i, sum fik at end • Space = O(mlog q): m items i, log q bits to count fi • Negative Results [AMS96]: • Compute Fk exactly => (m) space • Any deterministic alg. outputs x with |Fk – x| <  must use (m) space What about randomized algorithms?

Randomized Approx Algs for Fk • Randomized alg. -approximates Fk if outputs x s.t. Pr[|Fk – x| <  Fk ] > 2/3 • Can -approximate F0 [BJKST02], F2 [AMS96], Fk [CK04], k > 2 in space: (big-Oh notation suppresses polylog(1/, m, q) factors) • Ideas: • Hashing: O(1)-wise independence • Sampling

Example: F0 [BJKST02] • Idea: For random function h:[m] -> [0,1] and distinct elts b1, b2, …, bF0, expect mini h(bi) ¼ 1/F0 Algorithm: • Choose 2-wise indep. hash function h: [m] -> [m3] • Maintain t = (1/2) distinct smallest values h(bi) • Let v be t-th smallest value • Output tm3/v as estimate for F0 • Success prob up to 1- => take median O(log 1/) copies • Space: O((log 1/)/2)

Example: F2 [AMS99] Algorithm: • Choose 4-wise indep. hash function h:[m] -> {-1,1} • Maintain Z = i in [m] fi¢ h(i) • Output Y = Z2 as estimate for F2 Correctness: Chebyshev’s inequality => O(1/2) space

Previous Lower Bounds: • [AMS96] 8 k, –approximating Fk => (log m) space • [Bar-Yossef] -approximating F0 => (1/) space • [IW03] -approximating F0 => space if • Questions: • Does the bound hold for k  0? • Does it hold for F0 for smaller ?

Our First Result • Optimal Lower Bound: 8 k  1, any  = (m-.5), -approximate Fk => (-2) bits of space. • F1 = q trivial in log q space • Fk trivial in O(m log q) space, so need  = (m-.5) • Technique: Reduction from 2-party protocol for computing Hamming distance (x,y) • Use tools from communication complexity

Lower Bound Idea Alice Bob y 2 {0,1}m x 2 {0,1}m Stream s(y) Stream s(x) S Internal state of A (1 §) Fk algorithm A (1 §) Fk algorithm A • Compute (1 §) Fk(s(x) ± s(y)) w.p. > 2/3 • Idea: If can decide f(x,y) w.p. > 2/3, space used • by A at least randomized 1-way comm. Complexity of f

Randomized 1-way comm. complexity • Boolean function f: X£Y! {0,1} • Alice has x 2 X, Bob y 2 Y. Bob wants f(x,y) • Only 1 message m sent: must be from Alice to Bob • Communication cost = maxx,y Ecoins [|m|] •  -error randomized 1-way communication complexity R(f), is cost of optimal protocol computing f with probability ¸ 1- Ok, but how do we lower bound R(f)?

Shatter Coefficients [KNR] • F = {f : X! {0,1}} function family, f 2F length-|X|bitstring • For S µX, shatter coefficientSC(fS) of S : |{f |S}f 2 F| = # distinct bitstrings when F restricted to S • SC(F, p) = maxS µ X, |S| = p SC(fS). If SC(fS) = 2|S|, S shattered • Treat f: X£Y! {0,1} as function family fX : • fX = { fx(y) : Y ! {0,1} | x 2X }, where fx(y) = f(x,y) • Theorem [BJKS]: For every f: X £ Y ! {0,1}, every integer p, R1/3(f) = (log(SC(fX, p)))

Warmup: (1/) Lower Bound [Bar-Yossef] • Alice input x 2R {0,1}m, wt(x) = m/2 • Bob input y 2R {0,1}m, wt(y) = m • s(x), s(y) any streams w/char. vectors x, y PROMISE: (1) wt(x Æ y) = 0 OR (2) wt(x Æ y) = m f(x,y) = 0 f(x,y) = 1 F0(s(x) ± s(y)) = m/2 + m F0(s(x) ± s(y)) = m/2 • R1/3(f) = (1/) [Bar-Yossef] (uses shatter coeffs) • (1+’)m/2 < (1 - ’)(m/2 + m) for ’ = () • Hence, can decide f ! F0 alg. uses (1/) space • Too easy! Can replace F0 alg. with a Sampler!

Our Reduction: Hamming Distance Decision Problem (HDDP) Set t = (1/2) Alice Bob x 2 {0,1}t y 2 {0,1}t • Promise Problem : • (x,y) · t/2 – (t1/2) (x,y) > t/2 • f(x,y) = 0 OR f(x,y) = 1 • Lower bound R1/3(f) via SC(fX, t), but need a lemma

Main Lemma S µ{0,1}n • 9 S µ {0,1}n with |S| = n s.t. exist 2(n) “good” sets T µ S s.t. • 9 y 2 {0,1}n s.t • 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 • 8 t 2 S – T, (y,t) > n/2 = T y = S-T

Lemma Resolves HDDP Complexity • Theorem: R1/3(f) = (t) = (-2). • Proof: • Alice gets yT for random good set T applying main lemma with n = t. • Bob gets random s 2 S • Let f: {yT }T£ S ! {0,1}. • Main Lemma =>SC(f) = 2(t) • [BJKS] => R1/3(f) = (t) = (-2) • Corollary: (1/2) space for randomized 2-party protocol to approximate (x,y) between inputs • First known lower bound in terms of !

Back to Frequency Moments Use -approximator for Fk to solve HDDP y2 {0,1}t s 2 S µ {0,1}t i-th universe element included exactly once in stream ay iff yi = 1 (as same) ay as Fk Alg Fk Alg State

Solving HDDP with Fk • Alice/Bob compute -approx to Fk(ay± as) • Fk(ay± as) = 2k wt(y Æ s) + 1k(y,s) • For k  1, • Alice also transmits wt(y) in log m space. Conclusion: -approximating Fk(ay± as) decides HDDP, so space for Fk is (t) = (-2)

Back to the Main Lemma • Recall: show 9 S µ {0,1}n with |S| = n s.t. 2(n) “good” sets T µ S s.t: • 9 y 2 {0,1}n s.t 1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 2. 8 t 2 S – T, (y,t) > n/2 • Probabilistic Method • Choose n random elts in {0,1}n for S • Show arbitrary T µ S of size n/2 is good with probability > 2-zn for constant z < 1. • Expected # good T is 2(n) • So exists S with 2(n) good T

Proving the Main Lemma • T ={t1, …, tn/2} µ S arbitrary • Let y be majority codeword of T • What is probability p that both: 1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 2. 8 t 2 S – T, (y,t) > n/2 • Put x = Pr[8 t 2 T, (y,t) · n/2 – cn1/2] • Put y = Pr[8 t 2 S-T, (y,t) > n/2] = 2-n/2 Independence => p = xy = x2-n/2

The Matrix Problem • Wlog, assume y = 1n (recall y is majority word) • Want lower bound Pr[8 t 2 T, (y,t) · n/2 – cn1/2] • Equivalent to matrix problem: t1 -> t2 -> … tn/2 -> 101001000101111001 100101011100011110 001110111101010101 101010111011100011 For random n/2 x n binary matrix M, each column majority 1, what is probablity each row ¸ n/2 + cn1/2 1s?

A First Attempt • Set family A µ 2^{0,1}n monotone increasing if S12 A, S1µ S2 => S22 A • For uniform distribution on S µ {0,1}n, and A, B monotone increasing families, [Kleitman] Pr[A Å B] ¸ Pr[A] ¢ Pr[B] • First try: • Let R be event M ¸ n/2 + cn1/2 1s in each row, C event M majority 1 in each column • Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R | C] = Pr[R Å C]/Pr[C] • M characteristic vector of subset of [.5n2] => R,C monotone increasing • => Pr[R Å C]/Pr[C] ¸ Pr[R]Pr[C]/Pr[C] = Pr[R] < 2-n/2 • But we need > 2-zn/2 for constant z < 1, so this fails…

A Second Attempt • Second Try: • R1: M ¸ n/2 + cn1/2 1s in first m rows • R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows • C: M majority 1 in each column • Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R1Å R2 | C] = Pr[R1Å R2Å C]/Pr[C] • R1, R2, C monotone increasing • => Pr[R1Å R2Å C]/Pr[C] ¸ Pr[R1Å C]Pr[R2]/Pr[C] = Pr[R1 |C] Pr[R2] • Want this at least 2-zn/2 for z < 1 • Pr[ Xi > n/2 + cn1/2] > ½ - c (2/pi)1/2 [Stirling] • Independence => Pr[R2] > (½ - c(2/pi)1/2)n/2 - m Remains to show Pr[R1 | C] large.

Computing Pr[R1 | C] • Pr[R1 | C] = Pr[M ¸ n/2 + cn1/2 1s in 1st m rows | C] • Show Pr[R1 | C] > 2-z’m for certain constant z’ < 1 • Ingredients: • Expect to get n/2 + (n1/2) 1s in each of 1st m rows | C • Use negative correlation of entries in a given row => show n/2 + (n1/2) 1s in a given row w/good probability for small enough c • A simple worst-case conditioning argument on these 1st m rows shows they all have ¸ n/2 + cn1/2 1s

Completing the Proof • Recall: what is probability p = xy, where 1. x = Pr[ 8 t 2 T, (y, t) · n/2 – cn1/2] • y = Pr[ 8 t 2 S – T, (y,t) > n/2] = 2-n/2 • R1: M ¸ n/2 + cn1/2 1s in first m rows • R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows • C: M majority 1 in each column • x ¸Pr[R1 | C] Pr[R2] ¸2-z’m (½ - c(2/pi)1/2)n/2 – m Analysis shows z’ small so this ¸ 2-z’’n/2, z’’ < 1 • Hence p = xy ¸ 2-(z’’+1)n/2 • Hence expected # good sets 2n-O(log n)p = 2(n) • So exists S with 2(n) good T

Bipartite Graphs • Matrix Problem  Bipartite Graph Counting Problem: … … • How many bipartite graphs exist on n/2 by n vertices s.t. each left vertex has degree > n/2 + cn1/2 and each right vertex degree > n/2?

Our Result on # of Bipartite Graphs • Bipartite graph count: • Argument shows at least 2n^2/2 – zn/2 –n such bipartite graphs for constant z < 1. • Main lemma shows # bipartite graphs on n + n vertices w/each vertex degree > n/2 is > 2n^2-zn-n • Can replace > with < • Previous knowncount: 2n^2-2n • [MW – personal comm.] • Follows easily from Kleitman inequality

Summary • Results: • Optimal Fk Lower Bound: 8 k  1 and any  = (m-1/2), any -approximator for Fk must use (-2) bits of space. • Communication Lower Bound of (-2) for one-way communication complexity of (, )-approximating (x, y) • Bipartite Graph Count: # bipartite graphs on n + n vertices w/each vertex degree > n/2 at least 2n^2-zn-n for constant z < 1.

Optimal Space Lower Bounds for all Frequency Moments