This talk presents optimal algorithms and lower bounds for estimating statistics of a streamed vector, such as the number of distinct elements, Lp-norms, and entropy, with the goal of a (1+ε)-approximation in limited memory. It covers optimal results and lower bounds for several streaming problems, including F0 estimation and entropy estimation, as well as a lower bound on the dimension of Johnson-Lindenstrauss transforms. It also presents a communication complexity separation and techniques such as Lopsided Set Intersection (LSI) and Augmented Lopsided Set Intersection (ALSI).
Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
T.S. Jayram, David Woodruff (IBM Almaden)
Data Stream Model
• Have a stream of m updates to an n-dimensional vector v: "add x to coordinate i"
• Insertion model: all updates x are positive
• Turnstile model: x can be positive or negative
• Stream length and update magnitudes are poly(n)
• Estimate statistics of v:
  • # of distinct elements F0
  • Lp-norm |v|_p = (Σ_i |v_i|^p)^{1/p}
  • entropy, and so on
• Goal: output a (1+ε)-approximation with limited memory
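A minimal sketch of the model, for concreteness: here the vector v is maintained explicitly and the statistics are computed exactly. The whole point of the paper is to approximate them in far less than n words of memory, so the function names below (process_stream, f0, lp_norm, entropy) are illustrative stand-ins, not the authors' algorithms.

```python
import math
from collections import defaultdict

def process_stream(updates):
    """updates: iterable of (i, x) pairs meaning 'add x to coordinate i'."""
    v = defaultdict(int)
    for i, x in updates:
        v[i] += x
        if v[i] == 0:
            del v[i]           # keep only non-zero coordinates
    return v

def f0(v):                     # number of distinct (non-zero) coordinates
    return len(v)

def lp_norm(v, p):             # |v|_p = (sum_i |v_i|^p)^(1/p)
    return sum(abs(x) ** p for x in v.values()) ** (1.0 / p)

def entropy(v):                # empirical entropy of |v_i| / |v|_1
    m = sum(abs(x) for x in v.values())
    return -sum((abs(x) / m) * math.log2(abs(x) / m) for x in v.values())

v = process_stream([(1, 3), (2, 5), (1, -3), (7, 2)])  # turnstile updates
print(f0(v), lp_norm(v, 2), entropy(v))
```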
Lots of "Optimal" Papers
• "An optimal algorithm for the distinct elements problem" [KNW]
• "Fast moment estimation in optimal space" [KNPW]
• "A near-optimal algorithm for estimating entropy of a stream" [CCM]
• "Optimal approximations of the frequency moments of data streams" [IW]
• "A near-optimal algorithm for L1-difference" [NW]
• "Optimal space lower bounds for all frequency moments" [W]
• This paper: "Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error"
What Is Optimal?
• F0 = # of non-zero entries of v
• "For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε^{-2} + log n) bits of space with 2/3 success probability… This probability can be amplified by independent repetition."
• If we want high probability, say 1 - 1/n, this increases the space by a multiplicative log n factor
• So "optimal" algorithms are only optimal relative to algorithms with constant success probability
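A minimal sketch of the standard amplification step mentioned above: run k independent copies of a 2/3-success estimator and return the median. By a Chernoff bound the failure probability drops to exp(-Ω(k)), so k = Θ(log 1/δ) repetitions give success probability 1-δ. The estimator here is a toy stand-in, not the [KNW] algorithm.

```python
import math, random, statistics

def amplify(estimator, delta):
    """Median of k = O(log 1/delta) independent runs of a 2/3-success
    estimator; correct with probability >= 1 - delta."""
    k = max(1, math.ceil(18 * math.log(1 / delta)))
    return statistics.median(estimator() for _ in range(k))

# Stand-in estimator: true value 100, returns a good answer w.p. 2/3.
def toy_estimator():
    return 100 if random.random() < 2 / 3 else random.choice([0, 10**6])

print(amplify(toy_estimator, delta=1e-9))  # 100 with overwhelming probability
```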
Can We Improve the Lower Bounds?
• x ∈ {0,1}^{1/ε²}, y ∈ {0,1}^{1/ε²}
• Gap-Hamming promise: either Δ(x,y) > (½ + ε)·(1/ε²) or Δ(x,y) < (½ - ε)·(1/ε²), where Δ is the Hamming distance
• Lower bound of Ω(ε^{-2}) with 1/3 error probability
• But upper bound of ε^{-2} with 0 error probability (Alice can just send all of x)
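For intuition on the scale of the gap (an illustrative simulation, not from the slides): for random x, y of length n = 1/ε², the Hamming distance concentrates around n/2 with fluctuations of order √n = εn, which is exactly the width of the promise gap.

```python
import random

eps = 0.1
n = int(1 / eps**2)                        # n = 100
x = [random.randint(0, 1) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]
delta = sum(a != b for a, b in zip(x, y))  # Hamming distance

# Promise: either delta/n > 1/2 + eps or delta/n < 1/2 - eps.
# Random pairs land near n/2 with deviations of order eps*n.
print(delta / n, "gap edges:", 0.5 - eps, 0.5 + eps)
```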
Streaming Results
• Independent repetition is optimal!
• Estimating the Lp-norm in the turnstile model up to 1+ε w.p. 1-δ:
  • Ω(ε^{-2} log n log 1/δ) bits for any p
  • [KNW] get O(ε^{-2} log n log 1/δ) for 0 ≤ p ≤ 2
• Estimating F0 in the insertion model up to 1+ε w.p. 1-δ:
  • Ω(ε^{-2} log 1/δ + log n) bits
  • [KNW] get O(ε^{-2} log 1/δ) for ε^{-2} > log n
• Estimating entropy in the turnstile model up to 1+ε w.p. 1-δ:
  • Ω(ε^{-2} log n log 1/δ) bits
  • Improves the Ω(ε^{-2} log n) bound [KNW]
Johnson-Lindenstrauss Transforms
• Let A be a random matrix such that with probability 1-δ, for any fixed q ∈ R^d: |Aq|_2 = (1 ± ε)|q|_2
• [JL] A can be an (ε^{-2} log 1/δ) × d matrix; Gaussians or sign variables work
• [Alon] A needs Ω(ε^{-2} log(1/δ) / log(1/ε)) rows
• Our result: A needs Ω(ε^{-2} log 1/δ) rows
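A minimal numeric sketch of the distributional JL guarantee (illustrative only; the row count follows the [JL] bound k = O(ε^{-2} log 1/δ) with an assumed constant):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, delta = 5_000, 0.25, 0.05
k = int(np.ceil(8 / eps**2 * np.log(1 / delta)))  # rows: O(eps^-2 log 1/delta)

# Gaussian JL matrix, scaled so that E[|Aq|_2^2] = |q|_2^2.
A = rng.normal(size=(k, d)) / np.sqrt(k)

q = rng.normal(size=d)                 # one fixed vector
ratio = np.linalg.norm(A @ q) / np.linalg.norm(q)
print(k, ratio)                        # ratio lies in [1-eps, 1+eps] w.h.p.
```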
Communication Complexity Separation
• One-way model: Alice holds x, Bob holds y; Alice sends a single message and Bob outputs f(x,y) ∈ {0,1}
• D_{ρ,1/3}(f) = communication of the best one-way deterministic protocol that errs w.p. 1/3 on distribution ρ
• [KNR]: R||_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)
Communication Complexity Separation (continued)
• VC dimension of f: the maximum number r of columns of the communication matrix for which all 2^r row patterns occur on these columns
• [KNR]: R||_{1/3}(f) = Θ(VC-dimension(f))
• Our result: there exist f and g, each with VC dimension k, but R||_δ(f) = Θ(k log 1/δ) while R||_δ(g) = Θ(k)
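A brute-force sketch of the VC-dimension definition above (illustrative; feasible only for tiny communication matrices): search for the largest set of columns on which the rows realize all 2^r patterns.

```python
from itertools import combinations, product

def vc_dimension(M):
    """VC dimension of a 0/1 communication matrix M (list of rows): the
    largest r such that some r columns are shattered, i.e. the rows
    restricted to those columns realize all 2^r patterns."""
    n_cols = len(M[0])
    best = 0
    for r in range(1, n_cols + 1):
        shattered = any(
            len({tuple(row[c] for c in cols) for row in M}) == 2 ** r
            for cols in combinations(range(n_cols), r)
        )
        if not shattered:
            break           # shattering is monotone: stop at first failure
        best = r
    return best

# INDEX on 2 bits: rows are Alice's strings x in {0,1}^2, column j asks x_j.
M = [list(x) for x in product([0, 1], repeat=2)]
print(vc_dimension(M))      # 2: both columns together are shattered
```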
Lopsided Set Intersection (LSI)
• Universe size U = (1/ε²) · (1/δ)
• Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1/δ
• Question: is S ∩ T = ∅?
• Alice cannot describe S with o(ε^{-2} log U) bits
• If S, T are uniformly random, then S ∩ T = ∅ with constant probability
• R||_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε^{-2} log 1/δ)
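A quick sanity check of the constant-probability-of-disjointness claim (an illustrative simulation, not part of the proof): with |S| = 1/ε², |T| = 1/δ, and U = (1/ε²)(1/δ), the expected intersection size is |S||T|/U = 1, so S ∩ T = ∅ happens with probability roughly e^{-1}.

```python
import random

eps, delta = 0.25, 0.05
U = int(1 / eps**2 / delta)            # universe size: (1/eps^2) * (1/delta)
s_size, t_size = int(1 / eps**2), int(1 / delta)

trials = 10_000
disjoint = 0
for _ in range(trials):
    S = set(random.sample(range(U), s_size))
    T = set(random.sample(range(U), t_size))
    disjoint += not (S & T)
print(disjoint / trials)               # roughly e^{-1}, about 0.37
```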
Lopsided Set Intersection (LSI2)
• Universe size U = (1/ε²) · (1/δ)
• Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1
• Question: is S ∩ T = ∅?
• R||_{δ/3}(LSI2) ≥ R||_{1/3}(LSI) = Ω(ε^{-2} log 1/δ)
• Union bound over the set elements in the LSI instance: Alice's one-way message is independent of T, so Bob can test each of his 1/δ LSI elements against it; per-element error δ/3 gives overall error 1/3
Low Error Inner Product
• Universe size U = (1/ε²) · (1/δ)
• Alice has x ∈ {0, ε}^U with |x|_2 = 1; Bob has y ∈ {0, 1}^U with |y|_2 = 1
• Question: does ⟨x, y⟩ = 0?
• Estimating ⟨x, y⟩ up to ε w.p. 1-δ solves LSI2 w.p. 1-δ
• R||_δ(inner-product_ε) = Ω(ε^{-2} log 1/δ)
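A minimal sketch of the embedding behind this reduction (illustrative): Alice encodes her LSI2 set S as x with value ε on the coordinates of S, and Bob encodes his single element j as the standard basis vector e_j. Then ⟨x,y⟩ = ε exactly when j ∈ S and 0 otherwise, so an additive-ε-accurate estimate (with the right constant) decides LSI2.

```python
import numpy as np

eps, delta = 0.25, 0.05
U = int(1 / eps**2 / delta)

S = set(range(int(1 / eps**2)))     # Alice's set, |S| = 1/eps^2
j = 3                               # Bob's single element

x = np.zeros(U); x[list(S)] = eps   # |x|_2 = eps * sqrt(1/eps^2) = 1
y = np.zeros(U); y[j] = 1.0         # |y|_2 = 1

ip = x @ y
print(ip)                           # eps if j in S, else 0
print("j in S?", ip > eps / 2)      # thresholding at eps/2 decides LSI2
```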
L2-estimation_ε
• The log 1/δ factor is new, but we want an Ω(ε^{-2} log n log 1/δ) lower bound; a known trick gives the extra log n factor
• Universe size U = (1/ε²) · (1/δ); Alice has x ∈ {0, ε}^U with |x|_2 = 1, Bob has y ∈ {0, 1}^U with |y|_2 = 1
• Question: what is |x-y|_2?
• |x-y|_2² = |x|_2² + |y|_2² - 2⟨x, y⟩ = 2 - 2⟨x, y⟩
• Estimating |x-y|_2 up to a (1+Θ(ε)) factor solves inner-product_ε
• So R||_δ(L2-estimation_ε) = Ω(ε^{-2} log 1/δ)
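A numeric check of the identity above (illustrative): since |x|_2 = |y|_2 = 1, a multiplicative (1+Θ(ε)) estimate of |x-y|_2 pins down ⟨x,y⟩ to additive Θ(ε).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50); x /= np.linalg.norm(x)   # unit vector
y = rng.normal(size=50); y /= np.linalg.norm(y)   # unit vector

lhs = np.linalg.norm(x - y) ** 2
rhs = 2 - 2 * (x @ y)                 # |x-y|^2 = |x|^2 + |y|^2 - 2<x,y>
print(lhs, rhs)                       # equal up to floating-point error

# Recovering the inner product from a (1+eps)-factor distance estimate:
est = np.linalg.norm(x - y) * 1.001
print("recovered <x,y>:", 1 - est**2 / 2, "true:", x @ y)
```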
Augmented Lopsided Set Intersection (ALSI2)
• Universe [U] = [(1/ε²) · (1/δ)]
• Alice has S_1, …, S_r ⊆ [U] with |S_i| = 1/ε² for all i
• Bob has j ∈ [U], an index i* ∈ {1, 2, …, r}, and the suffix S_{i*+1}, …, S_r
• Question: is j ∈ S_{i*}?
• R||_{1/3}(ALSI2) = Ω(r ε^{-2} log 1/δ)
Reduction of ALSI2 to L2-estimation_ε
• Alice encodes each S_i as a vector x_i and forms x = Σ_{i=1}^r 10^i x_i
• Bob encodes j as y_{i*} and, using his known suffix, forms y = 10^{i*} y_{i*} + Σ_{i=i*+1}^r 10^i x_i
• Then y - x = 10^{i*} y_{i*} - Σ_{i=1}^{i*} 10^i · x_i
• |y-x|_2 is dominated by the top-scale term 10^{i*} |y_{i*} - x_{i*}|_2
• Set r = Θ(log n)
• R||_δ(L2-estimation_ε) = Ω(ε^{-2} log n log 1/δ)
• Streaming space ≥ R||_δ(L2-estimation_ε)
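A toy sketch of this scaling trick (illustrative encodings: x_i here is just the 0/1 indicator of S_i and y_{i*} the indicator of {j}): the powers of 10 make the i*-th block dominate |y-x|_2, so a (1+ε)-approximation of |y-x|_2 reveals whether j ∈ S_{i*}.

```python
import numpy as np

U, r = 40, 4
rng = np.random.default_rng(2)
S = [set(rng.choice(U, size=5, replace=False)) for _ in range(r)]  # Alice
i_star, j = 2, next(iter(S[2]))                                    # Bob, j in S[2]

def ind(s):                      # 0/1 indicator vector of a subset of [U]
    v = np.zeros(U); v[list(s)] = 1.0; return v

# Alice: x = sum_i 10^i x_i.  Bob knows S_{i*+1..r}, so he builds
# y = 10^{i*} y_{i*} + sum_{i > i*} 10^i x_i, and the suffix cancels in y - x.
x = sum(10.0**i * ind(S[i]) for i in range(r))
y = 10.0**i_star * ind({j}) + sum(10.0**i * ind(S[i]) for i in range(i_star + 1, r))

# When j is in S_{i*}, one top-block coordinate cancels and the norm shrinks;
# compare against a y built from an element outside S_{i*}.
j_out = next(k for k in range(U) if k not in S[i_star])
y_out = 10.0**i_star * ind({j_out}) + sum(10.0**i * ind(S[i]) for i in range(i_star + 1, r))
print(np.linalg.norm(y - x), "<", np.linalg.norm(y_out - x))
```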
Lower Bounds for Johnson-Lindenstrauss
• Alice has x ∈ {-n^{O(1)}, …, n^{O(1)}}^t, Bob has y ∈ {-n^{O(1)}, …, n^{O(1)}}^t
• They use public randomness to agree on a JL matrix A
• Alice sends Ax; Bob forms Ax - Ay = A(x-y) and outputs |A(x-y)|_2, which estimates |x-y|_2 up to 1+ε w.p. 1-δ
• Sending Ax costs #rows(A) · O(log n) bits, so #rows(A) = Ω(r ε^{-2} log(1/δ) / log n)
• Set r = Θ(log n) to get #rows(A) = Ω(ε^{-2} log 1/δ)
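A sketch of this public-coin protocol (illustrative; rounding and bit-accounting are omitted): a shared seed fixes A, Alice sends Ax, and Bob estimates |x-y|_2 from |Ax - Ay|_2 = |A(x-y)|_2. Communication is #rows(A) · O(log n) bits, which is why a JL matrix with too few rows would contradict the L2-estimation lower bound.

```python
import numpy as np

t, k, seed = 1000, 500, 42

def jl_matrix(seed):             # public randomness: both parties derive A
    return np.random.default_rng(seed).normal(size=(k, t)) / np.sqrt(k)

rng = np.random.default_rng(7)
x = rng.integers(-100, 101, size=t).astype(float)  # Alice's input
y = rng.integers(-100, 101, size=t).astype(float)  # Bob's input

Ax = jl_matrix(seed) @ x         # Alice's single message: k numbers
Ay = jl_matrix(seed) @ y         # Bob computes this locally
est = np.linalg.norm(Ax - Ay)    # = |A(x - y)|_2, estimates |x - y|_2
print(est, np.linalg.norm(x - y))
```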
Low-Error Hamming Distance
• Universe = [n]; Δ(x,y) = Hamming distance between x ∈ {0,1}^n and y ∈ {0,1}^n
• R||_δ(Δ(x,y)_ε) = Ω(ε^{-2} log 1/δ log n)
• Via a reduction from ALSI2 (see the ALSI2-to-Hamming-distance slide below)
• Uses Gap-Hamming to LSI2 reductions that handle low error
• Implies our lower bounds for estimating any Lp-norm, distinct elements, and entropy
Conclusions
• Prove the first streaming space lower bounds that depend on the probability of error δ
• Optimal for Lp-norms and distinct elements; improves the lower bound for entropy
• Optimal dimensionality bound for JL transforms
• Adds several twists to augmented indexing proofs:
  • Augmented indexing with a small set in a large domain
  • Proof builds upon lopsided set disjointness lower bounds
  • Uses multiple Gap-Hamming to Indexing reductions that handle low error
ALSI2 to Hamming Distance
• Embed multiple copies by duplicating coordinates at different scales
• Setup as in ALSI2: Alice has S_1, …, S_r ⊆ [(1/ε²) · (1/δ)] with |S_i| = 1/ε² for all i; Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
• Per-copy gadget for a set S: let t = (1/ε²) log 1/δ
• Use the public coin to generate t random strings b_1, …, b_t ∈ {0,1}^U
• Alice sets x_i = majority_{k ∈ S} b_{i,k}; Bob sets y_i = b_{i,j}
• If j ∈ S, each x_i agrees with y_i w.p. ½ + Θ(ε); if j ∉ S, x and y are independent, so Δ(x,y) separates the two cases by Θ(εt)
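A simulation of this majority gadget for a single copy (illustrative; the helper names are mine): the majority of the 1/ε² bits indexed by S agrees with any one of them with probability ½ + Θ(ε), so over t = ε^{-2} log(1/δ) coordinates the Hamming distance separates j ∈ S from j ∉ S by Θ(εt).

```python
import math, random

eps, delta = 0.2, 0.05
U = int(1 / eps**2 / delta)                 # universe size: 500
t = int(1 / eps**2 * math.log(1 / delta))   # gadget length: about 74

def majority(bits):
    return int(sum(bits) * 2 >= len(bits))

def avg_hamming(j, trials=50):
    """Average Hamming distance between Alice's and Bob's gadget strings."""
    S = set(range(int(1 / eps**2)))                     # Alice's set
    total = 0
    for _ in range(trials):
        b = [[random.randint(0, 1) for _ in range(U)] for _ in range(t)]
        x = [majority([b[i][k] for k in S]) for i in range(t)]
        y = [b[i][j] for i in range(t)]
        total += sum(xi != yi for xi, yi in zip(x, y))
    return total / trials

print("j in S:    ", avg_hamming(0))        # about (1/2 - Theta(eps)) * t
print("j not in S:", avg_hamming(U - 1))    # about t / 2
```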