Foundations of Privacy Lecture 5


Presentation Transcript


  1. Foundations of Privacy, Lecture 5. Lecturer: Moni Naor

  2. Desirable Properties of a Sanitization Mechanism • Composability: applying the sanitization several times yields a graceful degradation of privacy • Will see: t releases, each ε-DP, are t·ε-DP • Next class: roughly (ε√t + tε², δ)-DP • Robustness to side information: no need to specify exactly what the adversary knows; may assume the adversary knows everything except one row. Differential privacy satisfies both…

  3. Differential Privacy [Dwork, McSherry, Nissim & Smith 2006] Protect individual participants. [Figure: a curator/sanitizer M applied to two neighboring databases D1 and D2]

  4. Differential Privacy Protect individual participants: the probability of every bad event (or any event) increases only by a small multiplicative factor when I enter the DB, so I may as well participate in the DB… An ε-differentially private sanitizer M satisfies, for all DBs D, all individuals I, and all events T: e^{−ε} ≤ Pr_M[M(D+I) ∈ T] / Pr_M[M(D−I) ∈ T] ≤ e^{ε} ≈ 1+ε. Adjacency: D+I and D−I. Handles auxiliary input.

  5. Differential Privacy [Figure: Pr[response] curves for two databases differing in one user, with bad responses marked Z] Sanitizer M gives ε-differential privacy if: for all adjacent D1 and D2, and all A ⊆ range(M): Pr[M(D1) ∈ A] ≤ e^{ε} · Pr[M(D2) ∈ A] (ratio bounded) • Participation in the data set poses no additional risk

  6. Example of Differential Privacy X is a set of (name, tag ∈ {0,1}) tuples. One query: # of participants with tag = 1. Sanitizer: output # of 1's + noise • noise drawn from the Laplace distribution with parameter 1/ε • Pr[noise = k−1] ≈ e^{ε} · Pr[noise = k] [Figure: Laplace density centered at 0]
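A minimal sketch of this counting-query sanitizer (assuming NumPy; the function name is illustrative):

    import numpy as np

    def laplace_count(tags, eps):
        """Return the number of 1-tags plus Lap(1/eps) noise.

        A counting query has sensitivity 1 (adding or removing one
        participant changes the count by at most 1), so noise of
        scale 1/eps gives eps-differential privacy.
        """
        true_count = int(np.sum(tags))
        noise = np.random.laplace(loc=0.0, scale=1.0 / eps)
        return true_count + noise

    # Example: 100 participants, epsilon = 0.5
    tags = np.random.randint(0, 2, size=100)
    print(laplace_count(tags, eps=0.5))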

  7. (ε, δ)-Differential Privacy [Figure: Pr[response] curves with bad responses marked Z] Sanitizer M gives (ε, δ)-differential privacy if: for all adjacent D1 and D2, and all A ⊆ range(M): Pr[M(D1) ∈ A] ≤ e^{ε} · Pr[M(D2) ∈ A] + δ (ratio bounded) This course: δ negligible. Typical setting: ε constant and δ negligible.

  8. Example: NO Differential Privacy U is a set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1. Sanitizer A: choose and release a few random tags. Bad event T: only my tag is 1, and my tag is released. Then Pr_A[A(D+Me) ∈ T] ≥ 1/n while Pr_A[A(D−Me) ∈ T] = 0 • Not ε-differentially private for any ε, since no finite e^{ε} can bound the ratio of 1/n to 0 • It is (0, 1/n)-differentially private
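A minimal sketch of this bad mechanism (names are illustrative; the point is that the bad event has probability 0 without my record, so the ratio is unbounded):

    import random

    def release_random_tags(tags, k=1):
        """Non-private 'sanitizer': publish k randomly chosen tags verbatim."""
        return random.sample(tags, k)

    # With my record present (the single 1-tag), the bad event can occur;
    # with it absent it has probability exactly 0, so no eps works.
    db_with_me = [0] * 99 + [1]
    db_without_me = [0] * 99
    print(release_random_tags(db_with_me))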

  9. Counting Queries Database x of size n: n individuals, each contributing a single point in a universe U. A counting-query class Q is a set of predicates q: U → {0,1}. Query: how many participants in x satisfy q? Relaxed accuracy: answer the query within α additive error w.h.p. Not so bad: some error is anyway inherent in statistical analysis. (Sometimes we talk about the fraction rather than the count.)

  10. Bounds on Achievable Privacy Want to get bounds on: • Accuracy α: the responses of the mechanism to all queries are within α of the truth, except with some small failure probability • The number of queries t for which we can receive accurate answers • The privacy parameter ε for which ε-differential privacy, or (ε, δ)-differential privacy, is achievable

  11. Blatant Non-Privacy Mechanism M is blatantly non-private if there is an adversary A that, on any database D of size n, can select queries and use the responses M(D) to reconstruct D' such that ||D − D'||₁ ∈ o(n), i.e., D' agrees with D in all but o(n) of the entries. Claim: blatant non-privacy implies that M is not (ε, δ)-DP for any constant ε (with negligible δ).

  12. Sanitization Can't Be Too Accurate Usual counting queries • Query: q ⊆ [n] • Answer: Σ_{i∈q} d_i; Response = Answer + noise. Blatant non-privacy: the adversary guesses 99% of the bits. Theorem: if all responses are within o(n) of the true answer, then the algorithm is blatantly non-private. But: requires an exponential # of queries.

  13. Proof: Exponential Adversary • Focus on the column containing the super-private bit: "the database" is a vector d ∈ {0,1}^n • Assume all answers are within error bound α • Will show that α cannot be o(n) [Figure: a column of bits representing the database]

  14. Proof: Exponential Adversary for Blatant Non-Privacy Let M(S) denote the answer on query S. • Estimate the # of 1's in all possible sets: ∀S ⊆ [n]: |M(S) − Σ_{i∈S} d_i| ≤ α • Weed out "distant" DBs: for each candidate database c ∈ {0,1}^n, if for any S ⊆ [n]: |Σ_{i∈S} c_i − M(S)| > α, then rule out c • If c is not ruled out, halt and output c. Claim: the real database d won't be ruled out.
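A minimal brute-force sketch of this adversary (illustrative only; it is exponential in n, so run it with tiny n; answer is a caller-supplied oracle returning the noisy subset sums):

    import itertools

    def exponential_adversary(answer, n, alpha):
        """Reconstruct a database close to d from noisy subset-sum answers.

        answer(S) is assumed to return sum(d[i] for i in S) +/- alpha.
        Tries every candidate c in {0,1}^n and outputs the first one
        consistent with all 2^n subset queries.
        """
        subsets = [S for r in range(n + 1)
                   for S in itertools.combinations(range(n), r)]
        responses = {S: answer(S) for S in subsets}
        for c in itertools.product([0, 1], repeat=n):
            if all(abs(sum(c[i] for i in S) - responses[S]) <= alpha
                   for S in subsets):
                return c  # survives all queries; close to d by the claim below

By the claim on the next slide, any surviving candidate is within Hamming distance 4α of the real database d.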

  15. Proof: Exponential Adversary • Assume ∀S ⊆ [n]: |M(S) − Σ_{i∈S} d_i| ≤ α. Claim: for any c that has not been ruled out, the Hamming distance between c and d is ≤ 4α. Proof: let S₀ = {i : d_i = 0} and S₁ = {i : d_i = 1}. Since c was not ruled out, |M(S₀) − Σ_{i∈S₀} c_i| ≤ α and |M(S₁) − Σ_{i∈S₁} c_i| ≤ α. Combined with the assumption on d, the triangle inequality gives |Σ_{i∈S₀}(c_i − d_i)| ≤ 2α and |Σ_{i∈S₁}(d_i − c_i)| ≤ 2α, so c and d disagree in at most 2α positions on S₀ and at most 2α positions on S₁: at most 4α in total. [Figure: columns c and d split into S₀ and S₁]

  16. Impossibility for Exponential Queries [Figure: database → sanitizer → answers to query 1, query 2, …] The result means that we cannot sanitize the data and publish a data structure so that, for all queries, the answer can be deduced correctly with error in o(n). On the other hand: we will see that we can get accuracy up to roughly log |Q|.

  17. What Can We Do Efficiently? We allowed the adversary "too" much power: • Number of queries: exponential • Computation: exponential • On the other hand, we assumed no wild errors in the responses. Theorem: for any sanitization algorithm, if all responses are within o(√n) of the true answer, then it is blatantly non-private even against a polynomial-time adversary making O(n log² n) random queries.

  18. The Model • As before: the database d is a bit string of length n • Counting queries: a query is a subset q ⊆ {1, …, n}; the (exact) answer is a_q = Σ_{i∈q} d_i • α-perturbation: the response to a query is a_q ± α

  19. What If We Had Exact Answers? • Consider a mechanism with 0-perturbation: we receive the exact answer a_q = Σ_{i∈q} d_i. Then with n linearly independent queries (over the reals) we could reconstruct d precisely: obtain n linear equations a_q = Σ_{i∈q} c_i and solve uniquely. • With α-perturbations we only get inequalities: a_j − α ≤ Σ_{i∈q_j} c_i ≤ a_j + α. Idea: use linear programming. A solution must exist: d itself.
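A minimal sketch of the exact-answer case (assuming NumPy; query sets are encoded as rows of a 0/1 matrix, redrawn until invertible over the reals):

    import numpy as np

    n = 16
    d = np.random.randint(0, 2, size=n)          # the secret database

    # Draw random subset queries until the 0/1 query matrix is invertible.
    Q = np.random.randint(0, 2, size=(n, n))
    while np.linalg.matrix_rank(Q) < n:
        Q = np.random.randint(0, 2, size=(n, n))

    a = Q @ d                                    # exact answers a_q
    recovered = np.linalg.solve(Q, a).round().astype(int)
    assert (recovered == d).all()                # d reconstructed precisely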

  20. Privacy Requires Ω(√n) Perturbation Consider a mechanism with o(√n) perturbation. • Adversary makes t = n log² n random queries q_j, getting noisy answers a_j • Privacy-violating algorithm: construct a database c = {c_i}, 1 ≤ i ≤ n, by solving the linear program 0 ≤ c_i ≤ 1 for 1 ≤ i ≤ n, and a_j − α ≤ Σ_{i∈q_j} c_i ≤ a_j + α for 1 ≤ j ≤ t • Round the solution: if c_i > 1/2 set it to 1, and to 0 otherwise. A solution must exist: d itself. Note: for every query q_j, the answer according to c is at most 2α far from the (real) answer in d.
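A minimal sketch of this LP-based attack (assuming NumPy and SciPy's linprog; the feasibility LP is encoded with a zero objective, and all names are illustrative):

    import numpy as np
    from scipy.optimize import linprog

    def lp_reconstruct(Q, a, alpha):
        """Solve for c in [0,1]^n with a_j - alpha <= (Qc)_j <= a_j + alpha.

        Q is a t x n 0/1 matrix of random queries, a the noisy answers.
        Returns the rounded 0/1 reconstruction.
        """
        t, n = Q.shape
        # Encode Qc <= a + alpha and -Qc <= -(a - alpha) as A_ub x <= b_ub.
        A_ub = np.vstack([Q, -Q])
        b_ub = np.concatenate([a + alpha, -(a - alpha)])
        res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, 1)] * n, method="highs")
        return (res.x > 0.5).astype(int)

    # Example: noise far below sqrt(n), t = n log^2 n random queries.
    n = 64
    d = np.random.randint(0, 2, size=n)
    t = int(n * np.log(n) ** 2)
    Q = np.random.randint(0, 2, size=(t, n))
    alpha = 2
    a = Q @ d + np.random.uniform(-alpha, alpha, size=t)
    print((lp_reconstruct(Q, a, alpha) != d).mean())  # fraction of wrong bits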

  21. Bad Solutions to the LP Do Not Survive A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d: |Σ_{i∈q} c_i − Σ_{i∈q} d_i| > 2α • Idea: show that for a database c that is far away from d, a random query disqualifies c with some constant probability γ • Want to use the union bound: all far-away solutions are disqualified w.p. at least 1 − n^n(1 − γ)^t = 1 − neg(n) • How do we limit the solution space? Round each value to the closest multiple of 1/n.

  22. Privacy Requires Ω(√n) Perturbation A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d. Lemma: if c is far away from d, then a random query disqualifies c with some constant probability γ: if Pr_{i∈[n]}[|d_i − c_i| ≥ 1/3] > β, then there is a γ > 0 such that Pr_{q⊆[n]}[|Σ_{i∈q}(c_i − d_i)| ≥ 2α + 1] > γ. The proof uses Azuma's inequality.

  23. Privacy Requires Ω(√n) Perturbation Can discretize all potential databases c ∈ [0,1]^n: suppose we round each entry c_i to the closest fraction with denominator n, so that |c_i − w_i/n| ≤ 1/n. The response on any q then changes by at most 1. • If we disqualify all 'discrete' databases, then we also effectively eliminate all c ∈ [0,1]^n • There are n^n 'discrete' databases

  24. Privacy Requires Ω(√n) Perturbation A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d. Claim: if c is far away from d (count the number of entries far from d), then a random query disqualifies c with some constant probability γ. • Therefore t = n log² n queries leave a negligible probability for each far-away reconstruction • Union bound (applicable thanks to the discretization): all far-away candidates are disqualified w.p. at least 1 − n^n(1 − γ)^t = 1 − neg(n).
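To see why t = n log² n suffices, the union-bound arithmetic can be spelled out (a sketch, with γ the constant from the lemma):

    \[
    n^n (1-\gamma)^t \;=\; e^{\,n\ln n + t\ln(1-\gamma)} \;\le\; e^{\,n\ln n - \gamma t}
    \;=\; e^{\,n\ln n - \gamma\, n\log^2 n} \;=\; e^{-\Omega(n\log^2 n)},
    \]

using \ln(1-\gamma) \le -\gamma and the fact that n \ln n = o(n \log^2 n), so the failure probability is indeed negligible.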

  25. Review and Conclusion • When the perturbation is o(√n), choosing Õ(n) random queries gives enough information to efficiently reconstruct an o(n)-close database • The database is reconstructed using linear programming, in polynomial time • Hence mechanisms with o(√n) perturbation are blatantly non-private: the database is reconstructable in poly(n) time

  26. Composition Suppose we are going to apply a DP mechanism t times, perhaps on different databases. Want to argue that the combined result is differentially private. • A value b ∈ {0,1} is chosen • In each of the t rounds, adversary A picks two adjacent databases D_i^0 and D_i^1 and receives the result z_i of an ε-DP mechanism M_i on D_i^b • Want to argue that A's view is within a multiplicative factor of e^{εt} for both values of b • A's view: (z_1, z_2, …, z_t) plus the randomness used.

  27. Differential Privacy: Composition Composes naturally and handles auxiliary information. • A1(D) is ε1-diffP • for all z1, A2(D, z1) is ε2-diffP. Then A2(D, A1(D)) is (ε1+ε2)-diffP. Proof: write P[z1] = Pr_{z∼A1(D)}[z = z1], P'[z1] = Pr_{z∼A1(D')}[z = z1], P[z2] = Pr_{z∼A2(D,z1)}[z = z2], P'[z2] = Pr_{z∼A2(D',z1)}[z = z2]. For all adjacent D, D' and all (z1, z2): e^{−ε1} ≤ P[z1]/P'[z1] ≤ e^{ε1} and e^{−ε2} ≤ P[z2]/P'[z2] ≤ e^{ε2}, hence e^{−(ε1+ε2)} ≤ P[(z1,z2)]/P'[(z1,z2)] ≤ e^{ε1+ε2}.
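A minimal sketch of sequential composition with Laplace mechanisms (assuming NumPy; names are illustrative, and the per-query epsilons simply add up):

    import numpy as np

    def laplace_mech(db, query, eps):
        """eps-DP release of a sensitivity-1 counting query."""
        return query(db) + np.random.laplace(scale=1.0 / eps)

    def composed_release(db, queries, eps_each):
        """Answer t queries, each eps_each-DP; by the composition
        theorem the whole tuple of answers is (t * eps_each)-DP."""
        return [laplace_mech(db, q, eps_each) for q in queries]

    db = np.random.randint(0, 2, size=100)
    queries = [lambda d: int(d.sum()), lambda d: int(d[:50].sum())]
    answers = composed_release(db, queries, eps_each=0.1)
    print(answers, "total privacy cost:", len(queries) * 0.1)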

  28. Differential Privacy: Composition • If all mechanisms M_i are ε-DP, then for any view, the probabilities that A gets that view when b = 0 and when b = 1 are within a multiplicative factor of e^{εt} • Therefore results for a single query translate to results on several queries.

  29. Answering a Single Counting Query U is a set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1. Sanitizer A: output # of 1's + noise. Differentially private, if the noise is chosen properly: choose noise from the Laplace distribution.

  30. Laplacian Noise [Figure: Laplace density centered at 0] The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b)·e^{−|y|/b}. Standard deviation: O(b). Take b = 1/ε, and get that Pr[Y = y] ∝ e^{−ε|y|}.

  31. Laplacian Noise: ε-Privacy [Figure: two shifted Laplace densities] Take b = 1/ε, so Pr[Y = y] ∝ e^{−ε|y|}. Release: q(D) + Lap(1/ε). For adjacent D, D': |q(D) − q(D')| ≤ 1. For any output a: e^{−ε} ≤ Pr_{by D}[a] / Pr_{by D'}[a] ≤ e^{ε}.
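A small numerical check of this density-ratio bound (a sketch, assuming NumPy; the adjacent counts here are hypothetical and differ by the sensitivity, 1):

    import numpy as np

    eps = 0.5
    density = lambda a, center: 0.5 * eps * np.exp(-eps * np.abs(a - center))

    # Adjacent databases: true counts differ by at most 1 (sensitivity 1).
    qD, qDprime = 42, 43
    a = np.linspace(-20, 100, 10001)           # grid of possible outputs
    ratio = density(a, qD) / density(a, qDprime)
    print(ratio.max() <= np.exp(eps) + 1e-9)   # True: never exceeds e^eps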

  32. Laplacian Noise: ε-Privacy Theorem: the Laplace mechanism with parameter b = 1/ε is ε-differentially private (for sensitivity-1 queries). [Figure: Laplace density]

  33. Laplacian Noise: Õ(1/ε) Error [Figure: Laplace density] Take b = 1/ε, so Pr[Y = y] ∝ e^{−ε|y|}. Concentration of the Laplace distribution: Pr_{y∼Y}[|y| > k·1/ε] = O(e^{−k}). Setting k = O(log n): the expected error is O(1/ε), and w.h.p. the error is Õ(1/ε).
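An empirical check of this tail bound (a sketch, assuming NumPy; for Lap(1/ε) the tail probability Pr[|y| > k/ε] is exactly e^{−k}):

    import numpy as np

    eps, k = 0.5, 3.0
    samples = np.random.laplace(scale=1.0 / eps, size=1_000_000)
    empirical = (np.abs(samples) > k / eps).mean()
    print(empirical, "vs e^{-k} =", np.exp(-k))   # both close to 0.0498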
