
How Robust are Linear Sketches to Adaptive Inputs?


Presentation Transcript


  1. How Robust are Linear Sketches to Adaptive Inputs? Moritz Hardt, David P. Woodruff IBM Research Almaden

  2. Two Aspects of Coping with Big Data • Efficiency: handle enormous inputs • Robustness: handle adverse conditions • Big Question: Can we have both?

  3. Algorithmic paradigm: Linear Sketches • Unifying idea: a small number of linear measurements applied to the data • Data vector x in R^n, linear map A, sketch y = Ax in R^r with r << n • Applications: compressed sensing, data streams, distributed computation, ... • For this talk: the output can be any (not necessarily efficient) function of y
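
As a toy illustration (ours, not from the talk), the paradigm in numpy: the algorithm keeps only the r-dimensional sketch y = Ax and never looks at x again.

    # Minimal linear-sketch example: compress x in R^n to y = Ax in R^r, r << n.
    # (Illustrative; the Gaussian choice of A is just one common option.)
    import numpy as np

    n, r = 10_000, 64                               # ambient vs. sketch dimension
    rng = np.random.default_rng(0)
    x = rng.standard_normal(n)                      # data vector x in R^n
    A = rng.standard_normal((r, n)) / np.sqrt(r)    # linear map A
    y = A @ x                                       # the sketch y = Ax in R^r
    print(y.shape)                                  # (64,) -- all output is computed from y alone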

  4. “For each” correctness • For each x: Pr{ Alg(x) correct } > 1 – 1/poly(n), probability over the randomly chosen matrix A • Does this imply correctness on many inputs? Only under a modeling assumption: inputs are non-adaptively chosen • No guarantee if input x_2 depends on Alg(x_1) for an earlier input x_1. Why not?

  5. Example: Johnson-Lindenstrauss Sketch • Goal: estimate |x|^2 from |Ax|^2 • JL Sketch: if A is a k x n matrix of i.i.d. N(0, 1/k) random variables with k > log n, then Pr[ |Ax|^2 = (1±1/2)|x|^2 ] > 1 – 1/poly(n) • Attack: 1. Query x = e_i and x = e_i + e_j for all standard unit vectors e_i and e_j; learn |A_i|^2, |A_j|^2, |A_i + A_j|^2, and hence <A_i, A_j> 2. Hence, learn A^T A, and learn the kernel of A 3. Query a vector x in kernel(A): then Ax = 0, so the sketch estimates |x|^2 as 0 even though x is nonzero
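
A small numpy sketch of this attack (ours; it assumes the adversary can read off the exact value |Ax|^2 on each query, and the helper name query is made up):

    # Adaptive attack on the JL sketch: learn A^T A from norm queries,
    # then query a vector in the kernel of A.
    import numpy as np

    n, k = 50, 10
    rng = np.random.default_rng(1)
    A = rng.normal(0, 1 / np.sqrt(k), size=(k, n))           # hidden JL matrix

    def query(x):
        """Oracle: the sketch-based norm estimate |Ax|^2."""
        return np.linalg.norm(A @ x) ** 2

    # Step 1: learn G = A^T A from queries on e_i and e_i + e_j.
    e = np.eye(n)
    col_norms = np.array([query(e[i]) for i in range(n)])     # |A_i|^2
    G = np.diag(col_norms)
    for i in range(n):
        for j in range(i + 1, n):
            # |A_i + A_j|^2 = |A_i|^2 + |A_j|^2 + 2 <A_i, A_j>
            G[i, j] = G[j, i] = (query(e[i] + e[j]) - col_norms[i] - col_norms[j]) / 2

    # Step 2: ker(A) = ker(A^T A); take an eigenvector with eigenvalue ~0.
    _, eigvecs = np.linalg.eigh(G)
    x_bad = eigvecs[:, 0]

    # Step 3: the sketch reports norm ~0 although |x_bad|^2 = 1.
    print(query(x_bad), np.linalg.norm(x_bad) ** 2)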

  6. Correlations arise in nearly any realistic setting • Benign/natural: monitor traffic using a sketch, re-route traffic based on its output, which affects future inputs • Adversarial: DoS attack on a network monitoring unit • Can we thwart the attack? Can we prove correctness? • In this work: broad impossibility results

  7. Benchmark Problem GapNorm(B): given x in R^n, decide if |x|^2 ≥ B (YES) or |x|^2 ≤ 1 (NO) • Goal: show impossibility for a very basic problem • Easily solvable for B = 1+ε using the “for each” guarantee by a JL sketch with O(log n/ε^2) rows

  8. Main Result Theorem. For every B, given oracle access to a linear sketch using dimension r ≤ n – log(Bn), we can find in time poly(r, B) a distribution over inputs on which the sketch fails to solve GapNorm(B) • Efficient attack (rules out cryptographic defenses); even a slightly non-trivial sketching dimension is impossible • Corollary. Same result for any l_p norm • Corollary. Same result even if the algorithm uses internal randomness on each query

  9. Application to Compressed Sensing • l2/l2 recovery: on input x, output x’ for which |x – x’|_2 ≤ C · min over k-sparse z of |x – z|_2 • Theorem. No linear sketch with o(n/C^2) rows guarantees l2/l2 sparse recovery with approximation factor C on a polynomial number of adaptively chosen inputs • Note: impossible to achieve with a deterministic matrix A, but possible with the “for each” guarantee with r = k log(n/k) • [Gilbert-Hemenway-Strauss-W-Wootters12] has some positive results

  10. Outline • Proof of Main Theorem for GapNorm • Proved using “Reconstruction Attack” • Sparse Recovery Result • By Reduction from GapNorm • Not in this talk

  11. Computational Model • Sketches Ax and U^T x are equivalent, where U^T has orthonormal rows and row-span(U^T) = row-span(A) • The sketch U^T x is in turn equivalent to the projection P_U x = U U^T x • Definition (Sketch): an r-dimensional sketch is any function f(x) that depends on x only through P_U x, for some r-dimensional subspace U • Why? The sketch may apply unbounded computational power on top of P_U x
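
A quick numpy check of this equivalence (ours): orthonormalize the rows of A with a QR decomposition to get U, and verify that Ax is a fixed linear function of U^T x.

    # Ax and U^T x carry the same information when row-span(U^T) = row-span(A).
    import numpy as np

    r, n = 5, 40
    rng = np.random.default_rng(2)
    A = rng.standard_normal((r, n))

    Q, _ = np.linalg.qr(A.T)       # columns of Q: orthonormal basis of row-span(A)
    U = Q                          # so U^T has orthonormal rows

    x = rng.standard_normal(n)
    # Every row of A lies in span(U), hence A = A U U^T and
    # Ax can be recovered linearly from the sketch U^T x:
    print(np.allclose(A @ x, (A @ U) @ (U.T @ x)))   # True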

  12. Algorithm (Reconstruction Attack) Input: Oracle access to sketch f using unknown subspace U of dimension r Put V_0 = {0}, the zero-dimensional subspace For t = 1 to t = r: (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1} (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1} (Progress) Put V_t = span{V_{t-1}, x} Output: Subspace V_r

  13. Algorithm (Reconstruction Attack) Input: Oracle access to sketch f using unknown subspace U of dimension r Put V_0 = {0}, the empty subspace For t = 1 to t = r: (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1} (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1} (Progress) Put V_t = V_{t-1} + span{x} Output: Subspace V_r
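
A minimal Python skeleton of this outer loop (our paraphrase of the slide; the sketch oracle f and the helper find_weakly_correlated, which stands in for the Conditional Expectation Lemma step, are assumed to be supplied by the attacker):

    import numpy as np

    def reconstruction_attack(f, n, r, find_weakly_correlated, m):
        """f: sketch oracle; n: ambient dimension; r: sketch dimension;
        find_weakly_correlated(f, V): one vector weakly correlated with the
        hidden subspace U and orthogonal to V; m: samples per round."""
        V = np.zeros((n, 0))                                   # V_0 = {0}
        for _ in range(r):
            # (Correlation Finding) m weakly correlated vectors, orthogonal to V_{t-1}
            X = np.stack([find_weakly_correlated(f, V) for _ in range(m)])
            # (Boosting) top right singular vector of the stacked samples
            _, _, VT = np.linalg.svd(X, full_matrices=False)
            x = VT[0]
            x = x - V @ (V.T @ x)                              # re-orthogonalize against V_{t-1}
            x = x / np.linalg.norm(x)
            V = np.column_stack([V, x])                        # (Progress) V_t = V_{t-1} + span{x}
        return V                                               # V_r: approximation of U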

  14. Conditional Expectation Lemma Lemma. Given a d-dimensional sketch f, we can find using poly(d) queries a query distribution g such that E[ |P_U g|^2 | f(g) = 1 ] ≥ E[ |P_U g|^2 ] + Δ (“advantage over random”) Moreover, 1. Δ ≥ poly(1/d) 2. g = N(0, σ)^n for a carefully chosen σ unknown to the sketching algorithm
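
The “advantage over random” can be estimated empirically; here is a Monte-Carlo sketch (ours, with made-up names; sigma is the per-coordinate standard deviation and U is assumed to be stored with orthonormal columns):

    import numpy as np

    def estimate_advantage(f, U, sigma, trials=100_000, seed=3):
        """Estimate E[|P_U g|^2 | f(g)=1] - E[|P_U g|^2] for i.i.d. Gaussian queries g.
        Assumes f answers 1 on a non-negligible fraction of queries."""
        rng = np.random.default_rng(seed)
        n = U.shape[0]
        G = sigma * rng.standard_normal((trials, n))           # rows: i.i.d. Gaussian queries
        proj_sq = np.linalg.norm(G @ U, axis=1) ** 2           # |P_U g|^2 per query
        ones = np.array([bool(f(g)) for g in G])               # the sketch's answers
        return proj_sq[ones].mean() - proj_sq.mean()           # conditional minus unconditional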

  15. Simplification Fact: If g is Gaussian, then P_U g = U U^T g is Gaussian as well Hence, we can think of the query distribution as choosing a random Gaussian g inside the subspace U. We drop the P_U projection operator for notational simplicity.

  16. The three step intuition (Symmetry) Since the queries are random Gaussian inputs g with an unknown variance, by spherical symmetry, sketch f learns nothing more about query distribution than norm |g| (Averaging) If |g| is larger than expected, the sketch is “more likely” to output 1 (Bayes) Hence, by sort-of-Bayes-Rule, conditioned on f(g)=1, expectation of |g| is likely to be larger

  17. Def. Let p(s) = Pr{ f(y) = 1 } for y in U uniformly random with |y|^2 = s Fact. If g is Gaussian with E|g|^2 = t, then Pr{ f(g) = 1 } = ∫_0^∞ p(s) v_t(s) ds, where v_t is the density of the χ²-distribution with expectation t and d degrees of freedom

  18. [Plot: p(s) = Pr(f(y) = 1), for y in U uniformly random with |y|^2 = s, plotted against the norm s. By correctness of the sketch, p(s) is near 0 for s around d/n and near 1 for s around Bd/n; the behavior in between (marked “?”) is unconstrained.]

  19. Algorithm (Reconstruction Attack) Input: Oracle access to sketch f using unknown subspace U of dimension r Put V_0 = {0}, the empty subspace For t = 1 to t = r: (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1} (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1} (Progress) Put V_t = V_{t-1} + span{x} Output: Subspace V_r

  20. Boosting small correlations • Sample m = poly(r) vectors x_1, x_2, ..., x_m using the Conditional Expectation Lemma • Stack them as the rows of a poly(r) x n matrix M • Compute the top (right) singular vector x of M • Lemma: |P_U x| > 1 – poly(1/r) • Proof: discretization + concentration
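
A toy numpy check (ours, with arbitrary parameters) of why this works: each sample is only weakly correlated with a hidden direction, but the top right singular vector of the stacked samples is far more correlated with it.

    import numpy as np

    n, m, eps = 500, 5000, 0.1
    rng = np.random.default_rng(4)
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)                                  # hidden direction (stand-in for U)

    samples = eps * u + rng.standard_normal((m, n)) / np.sqrt(n)   # each row: correlation ~eps with u
    _, _, VT = np.linalg.svd(samples, full_matrices=False)
    x = VT[0]                                               # top right singular vector
    print(abs(x @ u))                                       # much larger than eps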

  21. Implementation in poly(r) time • W.l.o.g. can assume n = r + O(log nB) • Restrict the host space to the first r + O(log nB) coordinates • Matrix M is now O(r) x poly(r) • Singular vector computation takes poly(r) time

  22. Iterating previous steps Generalize the Gaussian to a “subspace Gaussian” = a Gaussian vanishing on the maintained subspace V_t Intuition: each step reduces the sketch dimension by one. After r steps: 1. The sketch has no dimensions left! 2. The host space still has n – r = Ω(log nB) dimensions
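
One way to realize the subspace Gaussian (a sketch of ours, assuming V_t is stored as a matrix with orthonormal columns): sample an ordinary Gaussian query and project out V_t, so the query lives in the orthogonal complement of the subspace found so far.

    import numpy as np

    def subspace_gaussian(V, sigma, rng):
        """Gaussian query vanishing on the maintained subspace V_t.
        V: n x t matrix with orthonormal columns spanning V_t."""
        n = V.shape[0]
        g = sigma * rng.standard_normal(n)   # ordinary Gaussian query
        return g - V @ (V.T @ g)             # project onto the orthogonal complement of V_t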

  23. Problem Top singular vector not exactly contained in U Formally, sketch still has dimension r Can fix this by adding small amount of Gaussian noise to all coordinates

  24. Algorithm (Reconstruction Attack) Input: Oracle access to sketch f using unknown subspace U of dimension r Put V_0 = {0}, the empty subspace For t = 1 to t = r: (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1} (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1} (Progress) Put V_t = V_{t-1} + span{x} Output: Subspace V_r

  25. Open Problems • Achievable polynomial dependence still open • Optimizing efficiency alone may lead to non-robust algorithms - What is the trade-off between robustness and efficiency in various settings of data analysis?

  26. Sliding χ²-distributions • Δ(s) = ∫_r^l (s – t) v_t(s) dt • Δ(s) < 0 unless s > r – O(r^{1/2} log r) • ∫_0^∞ Δ(s) ds = ∫_0^∞ ∫_r^l (s – t) v_t(s) dt ds = 0, since each density v_t has mean t

  27. Averaging Argument • h(s) = E[ f(g_t) | |g_t|^2 = s ] • Correctness: for small s, h(s) ≈ 0, while for large s, h(s) ≈ 1 • ∫_0^∞ h(s) Δ(s) ds = ∫_0^∞ ∫_r^l h(s) (s – t) v_t(s) dt ds ≥ δ • Hence there exists t such that ∫_0^∞ h(s) s v_t(s) ds ≥ t ∫_0^∞ h(s) v_t(s) ds + δ/(l – r) • By a calculation, E[ |g_t|^2 | f(g_t) = 1 ] ≥ t + Δ
