290 likes | 392 Vues
Explore lower bounds for computing statistics on streams of elements in adversarial order with minimal space algorithms for frequency moments. Learn about randomized algorithms, applications, previous bounds, and communication complexity. Dive into the technique used for optimal lower bounds and how it relates to the 2-party protocol for computing Hamming distance. Discover the shatter coefficients theory and how it influences the lower bounds of communication complexity.
 
                
                E N D
Optimal Space Lower Bounds for all Frequency Moments David Woodruff Based on SODA ’04 paper
4 3 7 3 1 1 0 The Streaming Model [AMS96] … • Stream of elements a1, …, aq each in {1, …, m} • Want to compute statistics on stream • Elements arranged in adversarial order • Algorithms given one pass over stream • Goal: Minimum space algorithm
Frequency Moments Notation • q = stream size, m = universe size • fi = # occurrences of item i k-th moment • F0 = # of Distinct elements • F1 = q • F2 = repeat rate Why are frequency moments important?
Applications • Estimating # distinct elts. w/ low space • Estimate selectivity of queries to DB w/o expensive sort • Routers gather # distinct destinations w/limited memory. • Estimating F2 estimates size of self-joins: ,
The Best Determininistic Algorithm • Trivial algorithm for Fk • Store/update fifor each item i, sum fik at end • Space = O(mlog q): m items i, log q bits to count fi • Negative Results [AMS96]: • Compute Fk exactly => (m) space • Any deterministic alg. outputs x with |Fk – x| <  must use (m) space What about randomized algorithms?
Randomized Approx Algs for Fk • Randomized alg. -approximates Fk if outputs x s.t. Pr[|Fk – x| <  Fk ] > 2/3 • Can -approximate F0 [BJKST02], F2 [AMS96], Fk [CK04], k > 2 in space: (big-Oh notation suppresses polylog(1/, m, q) factors) • Ideas: • Hashing: O(1)-wise independence • Sampling
Example: F0 [BJKST02] • Idea: For random function h:[m] -> [0,1] and distinct elts b1, b2, …, bF0, expect mini h(bi) ¼ 1/F0 Algorithm: • Choose 2-wise indep. hash function h: [m] -> [m3] • Maintain t = (1/2) distinct smallest values h(bi) • Let v be t-th smallest value • Output tm3/v as estimate for F0 • Success prob up to 1- => take median O(log 1/) copies • Space: O((log 1/)/2)
Example: F2 [AMS99] Algorithm: • Choose 4-wise indep. hash function h:[m] -> {-1,1} • Maintain Z = i in [m] fi¢ h(i) • Output Y = Z2 as estimate for F2 Correctness: Chebyshev’s inequality => O(1/2) space
Previous Lower Bounds: • [AMS96] 8 k, –approximating Fk => (log m) space • [Bar-Yossef] -approximating F0 => (1/) space • [IW03] -approximating F0 => space if • Questions: • Does the bound hold for k  0? • Does it hold for F0 for smaller ?
Our First Result • Optimal Lower Bound: 8 k  1, any  = (m-.5), -approximate Fk => (-2) bits of space. • F1 = q trivial in log q space • Fk trivial in O(m log q) space, so need  = (m-.5) • Technique: Reduction from 2-party protocol for computing Hamming distance (x,y) • Use tools from communication complexity
Lower Bound Idea Alice Bob y 2 {0,1}m x 2 {0,1}m Stream s(y) Stream s(x) S Internal state of A (1 §) Fk algorithm A (1 §) Fk algorithm A • Compute (1 §) Fk(s(x) ± s(y)) w.p. > 2/3 • Idea: If can decide f(x,y) w.p. > 2/3, space used • by A at least randomized 1-way comm. Complexity of f
Randomized 1-way comm. complexity • Boolean function f: X£Y! {0,1} • Alice has x 2 X, Bob y 2 Y. Bob wants f(x,y) • Only 1 message m sent: must be from Alice to Bob • Communication cost = maxx,y Ecoins [|m|] •  -error randomized 1-way communication complexity R(f), is cost of optimal protocol computing f with probability ¸ 1- Ok, but how do we lower bound R(f)?
Shatter Coefficients [KNR] • F = {f : X! {0,1}} function family, f 2F length-|X|bitstring • For S µX, shatter coefficientSC(fS) of S : |{f |S}f 2 F| = # distinct bitstrings when F restricted to S • SC(F, p) = maxS µ X, |S| = p SC(fS). If SC(fS) = 2|S|, S shattered • Treat f: X£Y! {0,1} as function family fX : • fX = { fx(y) : Y ! {0,1} | x 2X }, where fx(y) = f(x,y) • Theorem [BJKS]: For every f: X £ Y ! {0,1}, every integer p, R1/3(f) = (log(SC(fX, p)))
Warmup: (1/) Lower Bound [Bar-Yossef] • Alice input x 2R {0,1}m, wt(x) = m/2 • Bob input y 2R {0,1}m, wt(y) = m • s(x), s(y) any streams w/char. vectors x, y PROMISE: (1) wt(x Æ y) = 0 OR (2) wt(x Æ y) = m f(x,y) = 0 f(x,y) = 1 F0(s(x) ± s(y)) = m/2 + m F0(s(x) ± s(y)) = m/2 • R1/3(f) = (1/) [Bar-Yossef] (uses shatter coeffs) • (1+’)m/2 < (1 - ’)(m/2 + m) for ’ = () • Hence, can decide f ! F0 alg. uses (1/) space • Too easy! Can replace F0 alg. with a Sampler!
Our Reduction: Hamming Distance Decision Problem (HDDP) Set t = (1/2) Alice Bob x 2 {0,1}t y 2 {0,1}t • Promise Problem : • (x,y) · t/2 – (t1/2) (x,y) > t/2 • f(x,y) = 0 OR f(x,y) = 1 • Lower bound R1/3(f) via SC(fX, t), but need a lemma
Main Lemma S µ{0,1}n • 9 S µ {0,1}n with |S| = n s.t. exist 2(n) “good” sets T µ S s.t. • 9 y 2 {0,1}n s.t • 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 • 8 t 2 S – T, (y,t) > n/2 = T y = S-T
Lemma Resolves HDDP Complexity • Theorem: R1/3(f) = (t) = (-2). • Proof: • Alice gets yT for random good set T applying main lemma with n = t. • Bob gets random s 2 S • Let f: {yT }T£ S ! {0,1}. • Main Lemma =>SC(f) = 2(t) • [BJKS] => R1/3(f) = (t) = (-2) • Corollary: (1/2) space for randomized 2-party protocol to approximate (x,y) between inputs • First known lower bound in terms of !
Back to Frequency Moments Use -approximator for Fk to solve HDDP y2 {0,1}t s 2 S µ {0,1}t i-th universe element included exactly once in stream ay iff yi = 1 (as same) ay as Fk Alg Fk Alg State
Solving HDDP with Fk • Alice/Bob compute -approx to Fk(ay± as) • Fk(ay± as) = 2k wt(y Æ s) + 1k(y,s) • For k  1, • Alice also transmits wt(y) in log m space. Conclusion: -approximating Fk(ay± as) decides HDDP, so space for Fk is (t) = (-2)
Back to the Main Lemma • Recall: show 9 S µ {0,1}n with |S| = n s.t. 2(n) “good” sets T µ S s.t: • 9 y 2 {0,1}n s.t 1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 2. 8 t 2 S – T, (y,t) > n/2 • Probabilistic Method • Choose n random elts in {0,1}n for S • Show arbitrary T µ S of size n/2 is good with probability > 2-zn for constant z < 1. • Expected # good T is 2(n) • So exists S with 2(n) good T
Proving the Main Lemma • T ={t1, …, tn/2} µ S arbitrary • Let y be majority codeword of T • What is probability p that both: 1. 8 t 2 T, (y, t) · n/2 – cn1/2 for some c > 0 2. 8 t 2 S – T, (y,t) > n/2 • Put x = Pr[8 t 2 T, (y,t) · n/2 – cn1/2] • Put y = Pr[8 t 2 S-T, (y,t) > n/2] = 2-n/2 Independence => p = xy = x2-n/2
The Matrix Problem • Wlog, assume y = 1n (recall y is majority word) • Want lower bound Pr[8 t 2 T, (y,t) · n/2 – cn1/2] • Equivalent to matrix problem: t1 -> t2 -> … tn/2 -> 101001000101111001 100101011100011110 001110111101010101 101010111011100011 For random n/2 x n binary matrix M, each column majority 1, what is probablity each row ¸ n/2 + cn1/2 1s?
A First Attempt • Set family A µ 2^{0,1}n monotone increasing if S12 A, S1µ S2 => S22 A • For uniform distribution on S µ {0,1}n, and A, B monotone increasing families, [Kleitman] Pr[A Å B] ¸ Pr[A] ¢ Pr[B] • First try: • Let R be event M ¸ n/2 + cn1/2 1s in each row, C event M majority 1 in each column • Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R | C] = Pr[R Å C]/Pr[C] • M characteristic vector of subset of [.5n2] => R,C monotone increasing • => Pr[R Å C]/Pr[C] ¸ Pr[R]Pr[C]/Pr[C] = Pr[R] < 2-n/2 • But we need > 2-zn/2 for constant z < 1, so this fails…
A Second Attempt • Second Try: • R1: M ¸ n/2 + cn1/2 1s in first m rows • R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows • C: M majority 1 in each column • Pr[8 t 2 T, (y,t) · n/2 – cn1/2] = Pr[R1Å R2 | C] = Pr[R1Å R2Å C]/Pr[C] • R1, R2, C monotone increasing • => Pr[R1Å R2Å C]/Pr[C] ¸ Pr[R1Å C]Pr[R2]/Pr[C] = Pr[R1 |C] Pr[R2] • Want this at least 2-zn/2 for z < 1 • Pr[ Xi > n/2 + cn1/2] > ½ - c (2/pi)1/2 [Stirling] • Independence => Pr[R2] > (½ - c(2/pi)1/2)n/2 - m Remains to show Pr[R1 | C] large.
Computing Pr[R1 | C] • Pr[R1 | C] = Pr[M ¸ n/2 + cn1/2 1s in 1st m rows | C] • Show Pr[R1 | C] > 2-z’m for certain constant z’ < 1 • Ingredients: • Expect to get n/2 + (n1/2) 1s in each of 1st m rows | C • Use negative correlation of entries in a given row => show n/2 + (n1/2) 1s in a given row w/good probability for small enough c • A simple worst-case conditioning argument on these 1st m rows shows they all have ¸ n/2 + cn1/2 1s
Completing the Proof • Recall: what is probability p = xy, where 1. x = Pr[ 8 t 2 T, (y, t) · n/2 – cn1/2] • y = Pr[ 8 t 2 S – T, (y,t) > n/2] = 2-n/2 • R1: M ¸ n/2 + cn1/2 1s in first m rows • R2: M ¸ n/2 + cn1/2 1s in remaining n/2-m rows • C: M majority 1 in each column • x ¸Pr[R1 | C] Pr[R2] ¸2-z’m (½ - c(2/pi)1/2)n/2 – m Analysis shows z’ small so this ¸ 2-z’’n/2, z’’ < 1 • Hence p = xy ¸ 2-(z’’+1)n/2 • Hence expected # good sets 2n-O(log n)p = 2(n) • So exists S with 2(n) good T
Bipartite Graphs • Matrix Problem  Bipartite Graph Counting Problem: … … • How many bipartite graphs exist on n/2 by n vertices s.t. each left vertex has degree > n/2 + cn1/2 and each right vertex degree > n/2?
Our Result on # of Bipartite Graphs • Bipartite graph count: • Argument shows at least 2n^2/2 – zn/2 –n such bipartite graphs for constant z < 1. • Main lemma shows # bipartite graphs on n + n vertices w/each vertex degree > n/2 is > 2n^2-zn-n • Can replace > with < • Previous knowncount: 2n^2-2n • [MW – personal comm.] • Follows easily from Kleitman inequality
Summary • Results: • Optimal Fk Lower Bound: 8 k  1 and any  = (m-1/2), any -approximator for Fk must use (-2) bits of space. • Communication Lower Bound of (-2) for one-way communication complexity of (, )-approximating (x, y) • Bipartite Graph Count: # bipartite graphs on n + n vertices w/each vertex degree > n/2 at least 2n^2-zn-n for constant z < 1.