
Personalized Privacy Preservation


Presentation Transcript


  1. Personalized Privacy Preservation Xiaokui Xiao, Yufei Tao City University of Hong Kong

  2. Privacy preserving data publishing Microdata • Purposes: • Allow researchers to effectively study the correlation between various attributes • Protect the privacy of every patient

  3. A naïve solution • Simply publish the microdata • It does not work, as shown next

  4. Inference attack An external database (a voter registration list) Published table Quasi-identifier (QI) attributes An adversary

  5. Generalization • Transform each QI value into a less specific form A generalized table An external database Information loss
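A minimal sketch of the QI transformation described above. The attribute names, interval width, and zipcode prefix length are illustrative choices, not the paper's scheme:

```python
def generalize_age(age, width=10):
    """Map an exact age to a width-year interval, e.g. 23 -> '[20, 29]'."""
    lo = (age // width) * width
    return f"[{lo}, {lo + width - 1}]"

def generalize_zip(zipcode, keep=3):
    """Keep only the first `keep` digits, e.g. '14850' -> '148**'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

# One microdata row before and after generalization (hypothetical values).
row = {"age": 23, "zipcode": "14850", "disease": "flu"}
published = {"age": generalize_age(row["age"]),
             "zipcode": generalize_zip(row["zipcode"]),
             "disease": row["disease"]}
print(published)  # {'age': '[20, 29]', 'zipcode': '148**', 'disease': 'flu'}
```

The coarser the intervals and prefixes, the larger the information loss the slide mentions.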

  6. k-anonymity • The following table is 2-anonymous Quasi-identifier (QI) attributes Sensitive attribute 5 QI groups
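The k-anonymity condition can be checked mechanically: group tuples by their QI values and require every group to hold at least k rows. The table below uses hypothetical attribute names, not the slide's data:

```python
from collections import Counter

def is_k_anonymous(table, qi_attrs, k):
    """True iff every QI group (tuples sharing all QI values) has >= k rows."""
    groups = Counter(tuple(row[a] for a in qi_attrs) for row in table)
    return all(count >= k for count in groups.values())

table = [
    {"age": "[20,29]", "zip": "148**", "disease": "flu"},
    {"age": "[20,29]", "zip": "148**", "disease": "gastritis"},
    {"age": "[30,39]", "zip": "210**", "disease": "dyspepsia"},
    {"age": "[30,39]", "zip": "210**", "disease": "flu"},
]
print(is_k_anonymous(table, ["age", "zip"], 2))  # True
```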

  7. Drawback of k-anonymity • What is the disease of Linda? A 2-anonymous table An external database

  8. A better criterion: l-diversity • Each QI-group • has at least l different sensitive values • even the most frequent sensitive value covers only a small fraction of the group's tuples A 2-diverse table An external database
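A check for the simplified "at least l different sensitive values" condition above (frequency-based l-diversity variants are stricter). Attribute names are illustrative:

```python
from collections import defaultdict

def is_l_diverse(table, qi_attrs, sa, l):
    """Distinct l-diversity: every QI group contains at least l
    different values of the sensitive attribute `sa`."""
    groups = defaultdict(set)
    for row in table:
        groups[tuple(row[a] for a in qi_attrs)].add(row[sa])
    return all(len(vals) >= l for vals in groups.values())

table = [
    {"age": "[20,29]", "zip": "148**", "disease": "flu"},
    {"age": "[20,29]", "zip": "148**", "disease": "gastritis"},
    {"age": "[30,39]", "zip": "210**", "disease": "dyspepsia"},
    {"age": "[30,39]", "zip": "210**", "disease": "flu"},
]
print(is_l_diverse(table, ["age", "zip"], "disease", 2))  # True
```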

  9. Motivation 1: Personalization • Andy does not want anyone to know that he had a stomach problem • Sarah does not mind at all if others find out that she had flu A 2-diverse table An external database

  10. Motivation 2: Non-primary case Microdata

  11. Motivation 2: Non-primary case (cont.) 2-diverse table An external database

  12. Motivation 3: SA generalization • How many female patients are there with age above 30? • Estimate from the generalized table: 4 ∙ (60 – 30 + 1) / (60 – 21 + 1) ≈ 3.1 • Real answer: 1 An external database A generalized table
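The slide's estimate assumes ages are uniformly distributed over the generalized interval; reproducing the arithmetic:

```python
# Uniform-distribution estimate for "age above 30" over a QI group of 4
# tuples whose age was generalized to [21, 60] (endpoints inclusive,
# counting ages 30..60 as the slide does).
group_size = 4
overlap = 60 - 30 + 1   # 31 ages in the overlapping range
width = 60 - 21 + 1     # 40 ages in the generalized interval
estimate = group_size * overlap / width
print(estimate)         # 3.1, versus the real answer of 1
```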

  13. Motivation 3: SA generalization (cont.) • Generalization of the sensitive attribute is beneficial in this case A better generalized table An external database

  14. Personalized anonymity • We propose • a mechanism to capture personalized privacy requirements • criteria for measuring the degree of security provided by a generalized table • an algorithm for generating publishable tables

  15. Guarding node • Andy does not want anyone to know that he had a stomach problem • He can specify “stomach disease” as the guarding node for his tuple • The data publisher should prevent an adversary from associating Andy with “stomach disease”

  16. Guarding node • Sarah is willing to disclose her exact symptom • She can specify Ø as the guarding node for her tuple

  17. Guarding node • Bill does not have any special preference • He can set the guarding node for his tuple to his sensitive value itself
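One way to represent guarding nodes is against the sensitive-attribute taxonomy: publishing an association breaches a tuple's requirement iff the associated value lies in the subtree rooted at the tuple's guarding node. The hierarchy below is a simplified, illustrative stand-in for the paper's disease taxonomy:

```python
# Sensitive-value taxonomy as a child -> parent map (illustrative nodes).
PARENT = {
    "gastric ulcer": "stomach disease",
    "dyspepsia": "stomach disease",
    "gastritis": "stomach disease",
    "flu": "respiratory infection",
    "pneumonia": "respiratory infection",
    "stomach disease": "any illness",
    "respiratory infection": "any illness",
}

def in_subtree(value, guard):
    """True iff `value` lies in the subtree rooted at the guarding node."""
    if guard is None:          # guarding node Ø: nothing is protected
        return False
    while value is not None:   # walk up from the value to the root
        if value == guard:
            return True
        value = PARENT.get(value)
    return False

# Andy guards "stomach disease": linking him to dyspepsia is a breach.
print(in_subtree("dyspepsia", "stomach disease"))  # True
# Sarah guards Ø: even her exact value "flu" may be disclosed.
print(in_subtree("flu", None))                     # False
```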

  18. A personalized approach

  19. Personalized anonymity • A table satisfies personalized anonymity with a parameter pbreach • iff no adversary can breach the privacy requirement of any tuple with a probability above pbreach • If pbreach = 0.3, then no adversary should have more than a 30% probability of finding out that: • Andy had a stomach disease • Bill had dyspepsia • etc.

  20. Personalized anonymity • Personalized anonymity with respect to a predefined parameter pbreach • every adversary can breach the privacy requirement of any tuple with probability at most pbreach • We need a method for calculating the breach probabilities What is the probability that Andy had some stomach problem?

  21. Combinatorial reconstruction • Assumptions • the adversary has no prior knowledge about each individual • every individual involved in the microdata also appears in the external database

  22. Combinatorial reconstruction • Andy does not want anyone to know that he had some stomach problem • What is the probability that the adversary can find out that “Andy had a stomach disease”?

  23. Combinatorial reconstruction (cont.) • Can each individual appear more than once? • No = the primary case • Yes = the non-primary case • Some possible reconstructions: the primary case the non-primary case

  25. Breach probability (primary) • 120 possible reconstructions in total • If Andy is associated with a stomach disease in nb reconstructions • then the probability that the adversary associates Andy with some stomach problem is nb / 120 • Andy is associated with • gastric ulcer in 24 reconstructions • dyspepsia in 24 reconstructions • gastritis in 0 reconstructions • nb = 48 • The breach probability for Andy’s tuple is 48 / 120 = 2 / 5
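The primary-case count can be reproduced by brute-force enumeration over permutations. Only the stomach diseases are named on the slide; the other three sensitive values below are illustrative placeholders:

```python
from itertools import permutations
from fractions import Fraction

# Toy primary case: Andy's QI group holds 5 tuples, and the group's 5
# sensitive values are assigned by permutation (each value used once).
values = ["gastric ulcer", "dyspepsia", "flu", "pneumonia", "bronchitis"]
guarded = {"gastric ulcer", "dyspepsia", "gastritis"}  # Andy's guarding subtree

recons = list(permutations(values))          # 5! = 120 reconstructions
nb = sum(r[0] in guarded for r in recons)    # slot 0 plays Andy's tuple
print(len(recons), nb, Fraction(nb, len(recons)))  # 120 48 2/5
```

Gastritis contributes 0 reconstructions simply because it does not occur among the group's values, matching the slide's tally.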

  26. Breach probability (non-primary) • 625 possible reconstructions in total • Andy is associated with gastric ulcer, dyspepsia, or gastritis in 225 reconstructions • nb = 225 • The breach probability for Andy’s tuple is 225 / 625 = 9 / 25

  27. Breach probability: Formal results

  28. Breach probability: Formal results

  29. More in our paper • An algorithm for computing generalized tables that • satisfy personalized anonymity with a predefined pbreach • reduce information loss by employing generalization on both the QI attributes and the sensitive attribute

  30. Experiment settings 1 • Goal: To show that k-anonymity and l-diversity do not always provide sufficient privacy protection • Real dataset • Pri-leaf • Nonpri-leaf • Pri-mixed • Nonpri-mixed • Cardinality = 100k

  31. Degree of privacy protection (Pri-leaf) pbreach = 0.25 (k = 4, l = 4)

  32. Degree of privacy protection (Nonpri-leaf) pbreach = 0.25 (k = 4, l = 4)

  33. Degree of privacy protection (Pri-mixed) pbreach = 0.25 (k = 4, l = 4)

  34. Degree of privacy protection (Nonpri-mixed) pbreach = 0.25 (k = 4, l = 4)

  35. Experiment settings 2 • Goal: To show that applying generalization on both the QI attributes and the sensitive attribute will lead to more effective data analysis

  36. Accuracy of analysis (no personalization)

  37. Accuracy of analysis (with personalization)

  38. Conclusions • k-anonymity and l-diversity are not sufficient for the Non-primary case • Guarding nodes allow individuals to describe their privacy requirements better • Generalization on the sensitive attribute is beneficial

  39. Thank you! Datasets and implementation are available for download at http://www.cs.cityu.edu.hk/~taoyf
