This talk presents optimal algorithms and lower bounds for estimating statistics of a streamed vector, such as the number of distinct elements, Lp-norms, and entropy, with the goal of a (1+ε)-approximation in limited memory. It covers optimal results and lower bounds for several streaming problems, including F0 estimation and entropy estimation, as well as a lower bound on the dimension of Johnson-Lindenstrauss transforms. It also presents a communication complexity separation and techniques such as Lopsided Set Intersection (LSI) and Augmented Lopsided Set Intersection (ALSI).
Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error
T.S. Jayram, David Woodruff (IBM Almaden)
Data Stream Model
• Have a stream of m updates to an n-dimensional vector v: "add x to coordinate i"
• Insertion model: all updates x are positive
• Turnstile model: x can be positive or negative
• Stream length and update magnitudes are poly(n)
• Estimate statistics of v:
  • # of distinct elements F0
  • Lp-norm |v|_p = (Σ_i |v_i|^p)^{1/p}
  • entropy, and so on
• Goal: output a (1+ε)-approximation with limited memory
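A minimal sketch of the model, for concreteness: here the vector v is maintained explicitly and the statistics are computed exactly. The whole point of the paper is to approximate them in far less than n words of memory, so the function names below (process_stream, f0, lp_norm, entropy) are illustrative stand-ins, not the authors' algorithms.

```python
import math
from collections import defaultdict

def process_stream(updates):
    """updates: iterable of (i, x) pairs meaning 'add x to coordinate i'."""
    v = defaultdict(int)
    for i, x in updates:
        v[i] += x
        if v[i] == 0:
            del v[i]           # keep only non-zero coordinates
    return v

def f0(v):                     # number of distinct (non-zero) coordinates
    return len(v)

def lp_norm(v, p):             # |v|_p = (sum_i |v_i|^p)^(1/p)
    return sum(abs(x) ** p for x in v.values()) ** (1.0 / p)

def entropy(v):                # empirical entropy of |v_i| / |v|_1
    m = sum(abs(x) for x in v.values())
    return -sum((abs(x) / m) * math.log2(abs(x) / m) for x in v.values())

v = process_stream([(1, 3), (2, 5), (1, -3), (7, 2)])  # turnstile updates
print(f0(v), lp_norm(v, 2), entropy(v))
```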
Lots of "Optimal" Papers
• "An optimal algorithm for the distinct elements problem" [KNW]
• "Fast moment estimation in optimal space" [KNPW]
• "A near-optimal algorithm for estimating entropy of a stream" [CCM]
• "Optimal approximations of the frequency moments of data streams" [IW]
• "A near-optimal algorithm for L1-difference" [NW]
• "Optimal space lower bounds for all frequency moments" [W]
• This paper: "Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error"
What Is Optimal?
• F0 = # of non-zero entries of v
• "For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε^{-2} + log n) bits of space with 2/3 success probability… This probability can be amplified by independent repetition."
• If we want high probability, say 1 - 1/n, this increases the space by a multiplicative log n factor
• So "optimal" algorithms are only optimal relative to algorithms with constant success probability
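A minimal sketch of the standard amplification step mentioned above: run k independent copies of a 2/3-success estimator and return the median. By a Chernoff bound the failure probability drops to exp(-Ω(k)), so k = Θ(log 1/δ) repetitions give success probability 1-δ. The estimator here is a toy stand-in, not the [KNW] algorithm.

```python
import math, random, statistics

def amplify(estimator, delta):
    """Median of k = O(log 1/delta) independent runs of a 2/3-success
    estimator; correct with probability >= 1 - delta."""
    k = max(1, math.ceil(18 * math.log(1 / delta)))
    return statistics.median(estimator() for _ in range(k))

# Stand-in estimator: true value 100, returns a good answer w.p. 2/3.
def toy_estimator():
    return 100 if random.random() < 2 / 3 else random.choice([0, 10**6])

print(amplify(toy_estimator, delta=1e-9))  # 100 with overwhelming probability
```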
Can We Improve the Lower Bounds?
• x ∈ {0,1}^{1/ε²}, y ∈ {0,1}^{1/ε²}
• Gap-Hamming promise: either Δ(x,y) > (½ + ε)·(1/ε²) or Δ(x,y) < (½ - ε)·(1/ε²), where Δ is the Hamming distance
• Lower bound of Ω(ε^{-2}) with 1/3 error probability
• But upper bound of ε^{-2} with 0 error probability (Alice can just send all of x)
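For intuition on the scale of the gap (an illustrative simulation, not from the slides): for random x, y of length n = 1/ε², the Hamming distance concentrates around n/2 with fluctuations of order √n = εn, which is exactly the width of the promise gap.

```python
import random

eps = 0.1
n = int(1 / eps**2)                        # n = 100
x = [random.randint(0, 1) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]
delta = sum(a != b for a, b in zip(x, y))  # Hamming distance

# Promise: either delta/n > 1/2 + eps or delta/n < 1/2 - eps.
# Random pairs land near n/2 with deviations of order eps*n.
print(delta / n, "gap edges:", 0.5 - eps, 0.5 + eps)
```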
Streaming Results
• Independent repetition is optimal!
• Estimating the Lp-norm in the turnstile model up to 1+ε w.p. 1-δ:
  • Ω(ε^{-2} log n log 1/δ) bits for any p
  • [KNW] get O(ε^{-2} log n log 1/δ) for 0 ≤ p ≤ 2
• Estimating F0 in the insertion model up to 1+ε w.p. 1-δ:
  • Ω(ε^{-2} log 1/δ + log n) bits
  • [KNW] get O(ε^{-2} log 1/δ) for ε^{-2} > log n
• Estimating entropy in the turnstile model up to 1+ε w.p. 1-δ:
  • Ω(ε^{-2} log n log 1/δ) bits
  • Improves the Ω(ε^{-2} log n) bound [KNW]
Johnson-Lindenstrauss Transforms
• Let A be a random matrix such that with probability 1-δ, for any fixed q ∈ R^d: |Aq|_2 = (1 ± ε)|q|_2
• [JL] A can be an (ε^{-2} log 1/δ) × d matrix; Gaussians or sign variables work
• [Alon] A needs Ω(ε^{-2} log(1/δ) / log(1/ε)) rows
• Our result: A needs Ω(ε^{-2} log 1/δ) rows
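A minimal numeric sketch of the distributional JL guarantee (illustrative only; the row count follows the [JL] bound k = O(ε^{-2} log 1/δ) with an assumed constant):

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, delta = 5_000, 0.25, 0.05
k = int(np.ceil(8 / eps**2 * np.log(1 / delta)))  # rows: O(eps^-2 log 1/delta)

# Gaussian JL matrix, scaled so that E[|Aq|_2^2] = |q|_2^2.
A = rng.normal(size=(k, d)) / np.sqrt(k)

q = rng.normal(size=d)                 # one fixed vector
ratio = np.linalg.norm(A @ q) / np.linalg.norm(q)
print(k, ratio)                        # ratio lies in [1-eps, 1+eps] w.h.p.
```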
Communication Complexity Separation
• One-way model: Alice holds x, Bob holds y; Alice sends a single message and Bob outputs f(x,y) ∈ {0,1}
• D_{ρ,1/3}(f) = communication of the best one-way deterministic protocol that errs w.p. 1/3 on distribution ρ
• [KNR]: R||_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)
Communication Complexity Separation (continued)
• VC dimension of f: the maximum number r of columns of the communication matrix for which all 2^r row patterns occur on these columns
• [KNR]: R||_{1/3}(f) = Θ(VC-dimension(f))
• Our result: there exist f and g, each with VC dimension k, but R||_δ(f) = Θ(k log 1/δ) while R||_δ(g) = Θ(k)
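A brute-force sketch of the VC-dimension definition above (illustrative; feasible only for tiny communication matrices): search for the largest set of columns on which the rows realize all 2^r patterns.

```python
from itertools import combinations, product

def vc_dimension(M):
    """VC dimension of a 0/1 communication matrix M (list of rows): the
    largest r such that some r columns are shattered, i.e. the rows
    restricted to those columns realize all 2^r patterns."""
    n_cols = len(M[0])
    best = 0
    for r in range(1, n_cols + 1):
        shattered = any(
            len({tuple(row[c] for c in cols) for row in M}) == 2 ** r
            for cols in combinations(range(n_cols), r)
        )
        if not shattered:
            break           # shattering is monotone: stop at first failure
        best = r
    return best

# INDEX on 2 bits: rows are Alice's strings x in {0,1}^2, column j asks x_j.
M = [list(x) for x in product([0, 1], repeat=2)]
print(vc_dimension(M))      # 2: both columns together are shattered
```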
Lopsided Set Intersection (LSI)
• Universe size U = (1/ε²) · (1/δ)
• Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1/δ
• Question: is S ∩ T = ∅?
• Alice cannot describe S with o(ε^{-2} log U) bits
• If S, T are uniformly random, then S ∩ T = ∅ with constant probability
• R||_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε^{-2} log 1/δ)
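A quick sanity check of the constant-probability-of-disjointness claim (an illustrative simulation, not part of the proof): with |S| = 1/ε², |T| = 1/δ, and U = (1/ε²)(1/δ), the expected intersection size is |S||T|/U = 1, so S ∩ T = ∅ happens with probability roughly e^{-1}.

```python
import random

eps, delta = 0.25, 0.05
U = int(1 / eps**2 / delta)            # universe size: (1/eps^2) * (1/delta)
s_size, t_size = int(1 / eps**2), int(1 / delta)

trials = 10_000
disjoint = 0
for _ in range(trials):
    S = set(random.sample(range(U), s_size))
    T = set(random.sample(range(U), t_size))
    disjoint += not (S & T)
print(disjoint / trials)               # roughly e^{-1}, about 0.37
```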
Lopsided Set Intersection (LSI2)
• Universe size U = (1/ε²) · (1/δ)
• Alice has S ⊆ {1, 2, …, U} with |S| = 1/ε²; Bob has T ⊆ {1, 2, …, U} with |T| = 1
• Question: is S ∩ T = ∅?
• R||_{δ/3}(LSI2) ≥ R||_{1/3}(LSI) = Ω(ε^{-2} log 1/δ)
• Union bound over the set elements in the LSI instance: Alice's one-way message is independent of T, so Bob can test each of his 1/δ LSI elements against it; per-element error δ/3 gives overall error 1/3
Low Error Inner Product
• Universe size U = (1/ε²) · (1/δ)
• Alice has x ∈ {0, ε}^U with |x|_2 = 1; Bob has y ∈ {0, 1}^U with |y|_2 = 1
• Question: does ⟨x, y⟩ = 0?
• Estimating ⟨x, y⟩ up to ε w.p. 1-δ solves LSI2 w.p. 1-δ
• R||_δ(inner-product_ε) = Ω(ε^{-2} log 1/δ)
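A minimal sketch of the embedding behind this reduction (illustrative): Alice encodes her LSI2 set S as x with value ε on the coordinates of S, and Bob encodes his single element j as the standard basis vector e_j. Then ⟨x,y⟩ = ε exactly when j ∈ S and 0 otherwise, so an additive-ε-accurate estimate (with the right constant) decides LSI2.

```python
import numpy as np

eps, delta = 0.25, 0.05
U = int(1 / eps**2 / delta)

S = set(range(int(1 / eps**2)))     # Alice's set, |S| = 1/eps^2
j = 3                               # Bob's single element

x = np.zeros(U); x[list(S)] = eps   # |x|_2 = eps * sqrt(1/eps^2) = 1
y = np.zeros(U); y[j] = 1.0         # |y|_2 = 1

ip = x @ y
print(ip)                           # eps if j in S, else 0
print("j in S?", ip > eps / 2)      # thresholding at eps/2 decides LSI2
```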
L2-estimation_ε
• The log 1/δ factor is new, but we want an Ω(ε^{-2} log n log 1/δ) lower bound; a known trick gives the extra log n factor
• Universe size U = (1/ε²) · (1/δ); Alice has x ∈ {0, ε}^U with |x|_2 = 1, Bob has y ∈ {0, 1}^U with |y|_2 = 1
• Question: what is |x-y|_2?
• |x-y|_2² = |x|_2² + |y|_2² - 2⟨x, y⟩ = 2 - 2⟨x, y⟩
• Estimating |x-y|_2 up to a (1+Θ(ε)) factor solves inner-product_ε
• So R||_δ(L2-estimation_ε) = Ω(ε^{-2} log 1/δ)
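A numeric check of the identity above (illustrative): since |x|_2 = |y|_2 = 1, a multiplicative (1+Θ(ε)) estimate of |x-y|_2 pins down ⟨x,y⟩ to additive Θ(ε).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50); x /= np.linalg.norm(x)   # unit vector
y = rng.normal(size=50); y /= np.linalg.norm(y)   # unit vector

lhs = np.linalg.norm(x - y) ** 2
rhs = 2 - 2 * (x @ y)                 # |x-y|^2 = |x|^2 + |y|^2 - 2<x,y>
print(lhs, rhs)                       # equal up to floating-point error

# Recovering the inner product from a (1+eps)-factor distance estimate:
est = np.linalg.norm(x - y) * 1.001
print("recovered <x,y>:", 1 - est**2 / 2, "true:", x @ y)
```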
Augmented Lopsided Set Intersection (ALSI2)
• Universe [U] = [(1/ε²) · (1/δ)]
• Alice has S_1, …, S_r ⊆ [U] with |S_i| = 1/ε² for all i
• Bob has j ∈ [U], an index i* ∈ {1, 2, …, r}, and the suffix S_{i*+1}, …, S_r
• Question: is j ∈ S_{i*}?
• R||_{1/3}(ALSI2) = Ω(r ε^{-2} log 1/δ)
Reduction of ALSI2 to L2-estimation_ε
• Alice encodes each S_i as a vector x_i and forms x = Σ_{i=1}^r 10^i x_i
• Bob encodes j as y_{i*} and, using his known suffix, forms y = 10^{i*} y_{i*} + Σ_{i=i*+1}^r 10^i x_i
• Then y - x = 10^{i*} y_{i*} - Σ_{i=1}^{i*} 10^i · x_i
• |y-x|_2 is dominated by the top-scale term 10^{i*} |y_{i*} - x_{i*}|_2
• Set r = Θ(log n)
• R||_δ(L2-estimation_ε) = Ω(ε^{-2} log n log 1/δ)
• Streaming space ≥ R||_δ(L2-estimation_ε)
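A toy sketch of this scaling trick (illustrative encodings: x_i here is just the 0/1 indicator of S_i and y_{i*} the indicator of {j}): the powers of 10 make the i*-th block dominate |y-x|_2, so a (1+ε)-approximation of |y-x|_2 reveals whether j ∈ S_{i*}.

```python
import numpy as np

U, r = 40, 4
rng = np.random.default_rng(2)
S = [set(rng.choice(U, size=5, replace=False)) for _ in range(r)]  # Alice
i_star, j = 2, next(iter(S[2]))                                    # Bob, j in S[2]

def ind(s):                      # 0/1 indicator vector of a subset of [U]
    v = np.zeros(U); v[list(s)] = 1.0; return v

# Alice: x = sum_i 10^i x_i.  Bob knows S_{i*+1..r}, so he builds
# y = 10^{i*} y_{i*} + sum_{i > i*} 10^i x_i, and the suffix cancels in y - x.
x = sum(10.0**i * ind(S[i]) for i in range(r))
y = 10.0**i_star * ind({j}) + sum(10.0**i * ind(S[i]) for i in range(i_star + 1, r))

# When j is in S_{i*}, one top-block coordinate cancels and the norm shrinks;
# compare against a y built from an element outside S_{i*}.
j_out = next(k for k in range(U) if k not in S[i_star])
y_out = 10.0**i_star * ind({j_out}) + sum(10.0**i * ind(S[i]) for i in range(i_star + 1, r))
print(np.linalg.norm(y - x), "<", np.linalg.norm(y_out - x))
```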
Lower Bounds for Johnson-Lindenstrauss
• Alice has x ∈ {-n^{O(1)}, …, n^{O(1)}}^t, Bob has y ∈ {-n^{O(1)}, …, n^{O(1)}}^t
• They use public randomness to agree on a JL matrix A
• Alice sends Ax; Bob forms Ax - Ay = A(x-y) and outputs |A(x-y)|_2, which estimates |x-y|_2 up to 1+ε w.p. 1-δ
• Sending Ax costs #rows(A) · O(log n) bits, so #rows(A) = Ω(r ε^{-2} log(1/δ) / log n)
• Set r = Θ(log n) to get #rows(A) = Ω(ε^{-2} log 1/δ)
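A sketch of this public-coin protocol (illustrative; rounding and bit-accounting are omitted): a shared seed fixes A, Alice sends Ax, and Bob estimates |x-y|_2 from |Ax - Ay|_2 = |A(x-y)|_2. Communication is #rows(A) · O(log n) bits, which is why a JL matrix with too few rows would contradict the L2-estimation lower bound.

```python
import numpy as np

t, k, seed = 1000, 500, 42

def jl_matrix(seed):             # public randomness: both parties derive A
    return np.random.default_rng(seed).normal(size=(k, t)) / np.sqrt(k)

rng = np.random.default_rng(7)
x = rng.integers(-100, 101, size=t).astype(float)  # Alice's input
y = rng.integers(-100, 101, size=t).astype(float)  # Bob's input

Ax = jl_matrix(seed) @ x         # Alice's single message: k numbers
Ay = jl_matrix(seed) @ y         # Bob computes this locally
est = np.linalg.norm(Ax - Ay)    # = |A(x - y)|_2, estimates |x - y|_2
print(est, np.linalg.norm(x - y))
```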
Low-Error Hamming Distance
• Universe = [n]; Δ(x,y) = Hamming distance between x ∈ {0,1}^n and y ∈ {0,1}^n
• R||_δ(Δ(x,y)_ε) = Ω(ε^{-2} log 1/δ log n)
• Via a reduction from ALSI2 (see the ALSI2-to-Hamming-distance slide below)
• Uses Gap-Hamming to LSI2 reductions that handle low error
• Implies our lower bounds for estimating any Lp-norm, distinct elements, and entropy
Conclusions
• Prove the first streaming space lower bounds that depend on the probability of error δ
• Optimal for Lp-norms and distinct elements; improves the lower bound for entropy
• Optimal dimensionality bound for JL transforms
• Adds several twists to augmented indexing proofs:
  • Augmented indexing with a small set in a large domain
  • Proof builds upon lopsided set disjointness lower bounds
  • Uses multiple Gap-Hamming to Indexing reductions that handle low error
ALSI2 to Hamming Distance
• Embed multiple copies by duplicating coordinates at different scales
• Setup as in ALSI2: Alice has S_1, …, S_r ⊆ [(1/ε²) · (1/δ)] with |S_i| = 1/ε² for all i; Bob has j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
• Per-copy gadget for a set S: let t = (1/ε²) log 1/δ
• Use the public coin to generate t random strings b_1, …, b_t ∈ {0,1}^U
• Alice sets x_i = majority_{k ∈ S} b_{i,k}; Bob sets y_i = b_{i,j}
• If j ∈ S, each x_i agrees with y_i w.p. ½ + Θ(ε); if j ∉ S, x and y are independent, so Δ(x,y) separates the two cases by Θ(εt)
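A simulation of this majority gadget for a single copy (illustrative; the helper names are mine): the majority of the 1/ε² bits indexed by S agrees with any one of them with probability ½ + Θ(ε), so over t = ε^{-2} log(1/δ) coordinates the Hamming distance separates j ∈ S from j ∉ S by Θ(εt).

```python
import math, random

eps, delta = 0.2, 0.05
U = int(1 / eps**2 / delta)                 # universe size: 500
t = int(1 / eps**2 * math.log(1 / delta))   # gadget length: about 74

def majority(bits):
    return int(sum(bits) * 2 >= len(bits))

def avg_hamming(j, trials=50):
    """Average Hamming distance between Alice's and Bob's gadget strings."""
    S = set(range(int(1 / eps**2)))                     # Alice's set
    total = 0
    for _ in range(trials):
        b = [[random.randint(0, 1) for _ in range(U)] for _ in range(t)]
        x = [majority([b[i][k] for k in S]) for i in range(t)]
        y = [b[i][j] for i in range(t)]
        total += sum(xi != yi for xi, yi in zip(x, y))
    return total / trials

print("j in S:    ", avg_hamming(0))        # about (1/2 - Theta(eps)) * t
print("j not in S:", avg_hamming(U - 1))    # about t / 2
```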