k-Anonymity: A Model for Protecting Privacy

Latanya Sweeney, Carnegie Mellon University
Intl. Journal on Uncertainty, 2002
Presented by Munawar Hafiz, March 14, 2006

Information Security
Internet and networking technologies have made access to information much easier, and correlating information from different sources can compromise privacy. Information security is also concerned with confidentiality, integrity and availability. Other considerations:
• User preference
• Usability
• Proof of compliance

Overview of the Presentation
• Re-identification of data
• Terminology
• Introduction to k-Anonymity
• Attacks against k-Anonymity
• l-Diversity

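Re-identification of Data
[Figure: linking Birth Date, Zip and Sex connects Name to Disease]
About 87% of the population of the USA can be uniquely identified by the combination of ZIP code, sex and date of birth. A table that omits names can therefore still be re-identified by joining these attributes with an external source that includes names.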
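The linking idea on this slide can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: both tables, the people's names, the diseases and the `link` helper are hypothetical and exist only for illustration. A row is treated as re-identified when its (zip, birth date, sex) combination matches exactly one row of the named list.

```python
# Hypothetical toy data: a "de-identified" medical table (no names) and a
# public list that includes names. Both share zip, birth date and sex.
medical = [
    {"zip": "53715", "dob": "1/21/76", "sex": "Male", "disease": "flu"},
    {"zip": "53703", "dob": "4/13/86", "sex": "Female", "disease": "asthma"},
]
public_list = [
    {"name": "Alice", "zip": "53703", "dob": "4/13/86", "sex": "Female"},
    {"name": "Bob", "zip": "53715", "dob": "1/21/76", "sex": "Male"},
    {"name": "Carol", "zip": "53706", "dob": "2/28/76", "sex": "Female"},
]

QUASI_IDENTIFIER = ("zip", "dob", "sex")


def link(private_rows, external_rows, qi=QUASI_IDENTIFIER):
    """Join two tables on the quasi-identifier and return (name, disease)
    pairs for private rows that match exactly one external row."""
    index = {}
    for row in external_rows:
        index.setdefault(tuple(row[a] for a in qi), []).append(row)
    matches = []
    for row in private_rows:
        candidates = index.get(tuple(row[a] for a in qi), [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0]["name"], row["disease"]))
    return matches


print(link(medical, public_list))  # [('Bob', 'flu'), ('Alice', 'asthma')]
```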

Terminology
• Tuple: a row of data.
• Attribute: a column; a semantic category with its own domain.
• Inference: belief in a new fact based on other information.
• Disclosure: explicit or inferable information about a person.
• Disclosure control: an attempt to identify and limit disclosures.
• Quasi-identifier: a minimal set of attributes in a table that can be joined with external information to re-identify individual records.

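Terminology (continued)
Frequency set: for a set of attributes, the mapping from each distinct combination of their values to the number of tuples that have that combination. For a patients table and the attributes {sex, zipcode}, it is the result of:
SELECT sex, zipcode, COUNT(*) FROM patients GROUP BY sex, zipcode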
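The same frequency set can be computed in Python with `collections.Counter`. A minimal sketch, assuming the table is a list of dicts; the `patients` rows and the `frequency_set` helper are hypothetical.

```python
from collections import Counter

# Hypothetical patients table.
patients = [
    {"sex": "Male", "zipcode": "53715"},
    {"sex": "Male", "zipcode": "53715"},
    {"sex": "Female", "zipcode": "53710"},
    {"sex": "Female", "zipcode": "53706"},
    {"sex": "Female", "zipcode": "53706"},
]


def frequency_set(rows, attributes):
    """Map each distinct combination of `attributes` values to its row count,
    like SELECT ..., COUNT(*) ... GROUP BY ... in SQL."""
    return Counter(tuple(row[a] for a in attributes) for row in rows)


freq = frequency_set(patients, ("sex", "zipcode"))
print(dict(freq))
# {('Male', '53715'): 2, ('Female', '53710'): 1, ('Female', '53706'): 2}
```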

k-Anonymity
A relation satisfies the k-anonymity property, and is called k-anonymous, if every count in its frequency set with respect to the quasi-identifier is greater than or equal to k. In plain English, no row in the table can be distinguished from at least k-1 other rows on the quasi-identifier attributes.
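The definition translates directly into a check over the frequency set. A minimal sketch: the `is_k_anonymous` helper and the generalized sample table are hypothetical, and the quasi-identifier is passed in explicitly rather than discovered.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifier, k):
    """True if every count in the frequency set over the quasi-identifier
    is at least k."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(c >= k for c in counts.values())


# Hypothetical generalized table: ZIP codes truncated, sex generalized.
table = [
    {"sex": "Person", "zip": "5371*", "disease": "flu"},
    {"sex": "Person", "zip": "5371*", "disease": "cold"},
    {"sex": "Person", "zip": "5370*", "disease": "flu"},
    {"sex": "Person", "zip": "5370*", "disease": "asthma"},
]
print(is_k_anonymous(table, ("sex", "zip"), 2))  # True
print(is_k_anonymous(table, ("sex", "zip"), 3))  # False
```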

Generalization and Suppression
Generalization: a value is replaced by a less specific, more general value that is faithful to the original.
Suppression: a new maximal element (*) is placed atop the old maximal element of each value generalization hierarchy; generalizing a value to that element suppresses it.
Example domains:
• Z0 = {53715, 53710, 53706, 53703}, Z1 = {5371*, 5370*}, Z2 = {537**}
• S0 = {Male, Female}, S1 = {Person}, or S1 = {*} if sex is simply suppressed
• B0 = {1/21/76, 2/28/76, 4/13/86}
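A sketch of generalization and suppression on the slide's ZIP and sex domains. It assumes hypothetical helpers `generalize_zip` and `generalize_sex`, where level 0 is the original value, each level up matches the next domain (Z1, Z2 or S1), and one extra level above the old maximum yields the suppression element "*".

```python
def generalize_zip(zipcode, level):
    """Z0 -> Z1 -> Z2 replace trailing digits with '*'; level 3 is the
    added maximal element, so the value is suppressed entirely."""
    if level >= 3:
        return "*"
    return zipcode[: len(zipcode) - level] + "*" * level


def generalize_sex(sex, level):
    """S0 -> S1 = {Person}; level 2 is the added suppression element."""
    return [sex, "Person", "*"][min(level, 2)]


print([generalize_zip("53715", lvl) for lvl in range(4)])
# ['53715', '5371*', '537**', '*']
print([generalize_sex("Female", lvl) for lvl in range(3)])
# ['Female', 'Person', '*']
```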

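Generalization and Suppression (continued)
• Domain generalization relationship $\le_D$: $D_i \le_D D_k$ means domain $D_k$ is a generalization of domain $D_i$, e.g. $Z_0 \le_D Z_1$.
• Value generalization function: $\gamma : D_i \to D_k$.
• Implied domain generalization: if $D_i \le_D D_k$ and $D_k \le_D D_m$, then $D_i \le_D D_m$.
• Domain generalization hierarchy: the domains of an attribute ordered by $\le_D$, e.g. $Z_0 \le_D Z_1 \le_D Z_2$ with $Z_0 = \{53715, 53710, 53706, 53703\}$, and $S_0 \le_D S_1$ with $S_0 = \{\text{Male}, \text{Female}\}$.
• Composite value generalization function $\gamma^+$: repeated application of $\gamma$, e.g. $\texttt{5371*} = \gamma(53715)$ and $\texttt{537**} \in \gamma^+(53715)$.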
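The value generalization function and its composite can be sketched for the ZIP hierarchy. A minimal sketch, assuming a hypothetical `gamma` that performs one step up the hierarchy (replacing the rightmost remaining digit with "*") and a hypothetical `gamma_plus` that collects every value reachable by repeated steps up to the top domain Z2.

```python
def gamma(value):
    """One step up the ZIP value generalization hierarchy:
    replace the rightmost remaining digit with '*', e.g. 53715 -> 5371*."""
    digits = value.rstrip("*")
    return digits[:-1] + "*" * (len(value) - len(digits) + 1)


def gamma_plus(value, top_level=2):
    """Composite generalization: the values reached from `value` by
    applying gamma repeatedly, up to `top_level` steps (Z2 here)."""
    result = []
    current = value
    for _ in range(top_level):
        current = gamma(current)
        result.append(current)
    return result


print(gamma("53715"))       # 5371*
print(gamma_plus("53715"))  # ['5371*', '537**']
```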

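Generalization Hierarchy
• ZIP: Z0 = {53715, 53710, 53706, 53703} → Z1 = {5371*, 5370*} → Z2 = {537**}. 53715 and 53710 generalize to 5371*, 53706 and 53703 generalize to 5370*, and both 5371* and 5370* generalize to 537**.
• Sex: S0 = {Male, Female} → S1 = {Person}. Male and Female both generalize to Person.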

Generalization Lattice
Each node combines one domain from each hierarchy and is labeled with its distance vector [sex level, ZIP level]; moving up one edge generalizes one attribute by one level.
<S1, Z2> [1, 2]
<S1, Z1> [1, 1]    <S0, Z2> [0, 2]
<S1, Z0> [1, 0]    <S0, Z1> [0, 1]
<S0, Z0> [0, 0]
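The lattice can be enumerated as the product of the two hierarchies' levels and searched for the lowest nodes whose generalized table is k-anonymous. This is a brute-force sketch under stated assumptions, not an optimized search from the literature: the eight-row table is hypothetical, and `generalize`, `lattice` and `minimal_k_anonymous` are illustrative helpers.

```python
from collections import Counter
from itertools import product

# Hypothetical private table over the slide's domains S0 (sex) and Z0 (ZIP).
TABLE = [
    ("Male", "53715"), ("Female", "53715"),
    ("Male", "53710"), ("Female", "53710"),
    ("Male", "53706"), ("Female", "53706"),
    ("Male", "53703"), ("Female", "53703"),
]
MAX_LEVELS = (1, 2)  # heights of the sex and ZIP hierarchies


def generalize(row, vector):
    """Apply distance vector [s, z] to one (sex, zip) row."""
    sex, zipcode = row
    s_level, z_level = vector
    sex = sex if s_level == 0 else "Person"
    zipcode = zipcode[: 5 - z_level] + "*" * z_level
    return (sex, zipcode)


def is_k_anonymous(rows, k):
    return all(c >= k for c in Counter(rows).values())


def lattice():
    """All distance vectors [s, z] with 0 <= s <= 1 and 0 <= z <= 2."""
    return list(product(*(range(m + 1) for m in MAX_LEVELS)))


def minimal_k_anonymous(table, k):
    """k-anonymous nodes with no k-anonymous node strictly below them."""
    ok = [v for v in lattice()
          if is_k_anonymous([generalize(r, v) for r in table], k)]

    def below(a, b):  # a is strictly below b, componentwise
        return a != b and all(x <= y for x, y in zip(a, b))

    return [v for v in ok if not any(below(w, v) for w in ok)]


print(minimal_k_anonymous(TABLE, 2))  # [(0, 1), (1, 0)]
print(minimal_k_anonymous(TABLE, 4))  # [(0, 2), (1, 1)]
```

Several nodes can be minimal at once, so choosing among them needs an extra criterion such as which attribute's precision matters more.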

Generalization Tables

Taxonomy of k-Anonymization Models
• Generalization vs. suppression: only suppress data, or also use intermediate generalization steps.
• Global vs. local recoding: generalize the values of the domain uniformly, or generalize individual data items separately.
• Hierarchy-based vs. partition-based: use a fixed value generalization hierarchy, or partition the domain into disjoint ranges.
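The difference between global and local recoding shows up when the same value appears in several rows. A minimal sketch with hypothetical helpers: `global_recode` applies one level to the whole domain (full-domain recoding, one form of global recoding), while `local_recode` takes a level per cell, so equal values may be recoded differently.

```python
def generalize_zip(zipcode, level):
    """Replace the last `level` digits of a ZIP code with '*'."""
    return zipcode[: len(zipcode) - level] + "*" * level


def global_recode(zipcodes, level):
    """Full-domain global recoding: every value goes to the same level,
    so equal inputs always map to equal outputs."""
    return [generalize_zip(z, level) for z in zipcodes]


def local_recode(zipcodes, levels):
    """Local recoding: each cell has its own level, so the same value
    may be generalized differently in different rows."""
    return [generalize_zip(z, lvl) for z, lvl in zip(zipcodes, levels)]


zipcodes = ["53715", "53715", "53715", "53710"]
print(global_recode(zipcodes, 1))            # ['5371*', '5371*', '5371*', '5371*']
print(local_recode(zipcodes, [0, 0, 1, 1]))  # ['53715', '53715', '5371*', '5371*']
```

Both outputs are 2-anonymous on ZIP, but the locally recoded one keeps two exact values; the price is that choosing per-cell levels is a harder search problem.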

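Attacks against k-Anonymity: Unsorted Matching
Unsorted matching attack: if different generalized releases of the same table keep their rows in the original order, an attacker can match rows by position across releases and combine their values to undo the generalization.
Solution: randomly shuffle the rows of each released table.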
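The shuffling defense can be sketched as a release step that generalizes rows and then permutes them. A minimal sketch: `release` and the sample rows are hypothetical, and `random.SystemRandom` is used so the permutation comes from OS entropy rather than a seedable generator.

```python
import random


def release(rows, generalize_row, rng=None):
    """Generalize every row, then shuffle the result so that a row's position
    in the release says nothing about its position in the original table."""
    if rng is None:
        rng = random.SystemRandom()  # OS entropy: the order is unpredictable
    released = [generalize_row(row) for row in rows]
    rng.shuffle(released)
    return released


pt = [("Male", "53715"), ("Female", "53710"), ("Male", "53706"), ("Female", "53703")]
gt = release(pt, lambda row: ("Person", row[1][:4] + "*"))
print(gt)  # the four generalized rows, in a random order
```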

Attacks against k-Anonymity: Complementary Release
Complementary Release Attack

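Attacks against k-Anonymity: Temporal
Temporal attack: data collections change over time, so a later release that is generalized independently of an earlier one can be linked with it to undo generalizations.
PT_t1 (private table at time t1):
Race | Birth | Gender | ZIP | Problem
black | 9/7/65 | male | 02139 | headache
black | 11/4/65 | male | 02139 | rash
GT_t1 (generalized table released at t1):
Race | Birth | Gender | ZIP | Problem
black | 1965 | male | 02139 | headache
black | 1965 | male | 02139 | rash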
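The step from PT_t1 to GT_t1 can be reproduced in Python. A sketch under stated assumptions: birth dates are m/d/yy with years in the 1900s, as in the slide's rows, and `to_year` and `generalize` are hypothetical helpers. It also confirms that GT_t1 is 2-anonymous on (race, birth, gender, ZIP).

```python
from collections import Counter

# PT_t1 from the slide: (race, birth date, gender, ZIP, problem).
PT_T1 = [
    ("black", "9/7/65", "male", "02139", "headache"),
    ("black", "11/4/65", "male", "02139", "rash"),
]


def to_year(date):
    """Generalize an m/d/yy birth date to its year (assumes the 1900s)."""
    return "19" + date.split("/")[-1]


def generalize(pt):
    return [(race, to_year(dob), sex, zipcode, problem)
            for race, dob, sex, zipcode, problem in pt]


GT_T1 = generalize(PT_T1)
qi_counts = Counter(row[:4] for row in GT_T1)  # quasi-identifier = first 4 columns
print(GT_T1)
print(min(qi_counts.values()))  # 2
```

A temporal attack would combine GT_t1 with a later release built from updated data; if that release generalizes these two people differently, the two releases together can narrow down who has which problem.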

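Attacks against k-Anonymity: Homogeneity
Homogeneity attack: k-anonymity can create groups that leak information because the sensitive attribute lacks diversity. If every tuple in a group of indistinguishable tuples has the same sensitive value, anyone who can place a person in that group learns that value.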
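Homogeneous groups are easy to detect once rows are grouped by quasi-identifier. A minimal sketch: `homogeneous_classes` and the 2-anonymous sample table are hypothetical.

```python
from collections import defaultdict


def homogeneous_classes(rows, quasi_identifier, sensitive):
    """Return the groups (keyed by quasi-identifier values) whose rows all
    share one sensitive value; membership alone reveals that value."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[a] for a in quasi_identifier)].add(row[sensitive])
    return {qi: next(iter(vals)) for qi, vals in classes.items() if len(vals) == 1}


# Hypothetical 2-anonymous table on (zip, age).
table = [
    {"zip": "5371*", "age": "2*", "disease": "heart disease"},
    {"zip": "5371*", "age": "2*", "disease": "heart disease"},
    {"zip": "5370*", "age": "3*", "disease": "flu"},
    {"zip": "5370*", "age": "3*", "disease": "cancer"},
]
print(homogeneous_classes(table, ("zip", "age"), "disease"))
# {('5371*', '2*'): 'heart disease'}
```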

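Attacks against k-Anonymity: Background Knowledge
Background knowledge attack: k-anonymity does not protect against an attacker who uses background knowledge, for example knowing that a person is very unlikely to have some of the sensitive values in their group, to rule those values out and narrow down the rest.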

l-Diversity

Discussion
• k-Anonymity in other domains?
• Complexity of k-anonymity?
• Trade-off between privacy guarantees and usefulness of collected data?

