1 / 9

M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi KDD Laboratory University of Pisa and ISTI-CNR

M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi KDD Laboratory University of Pisa and ISTI-CNR Speaker: Maurizio Atzori GeoPKDD, Venezia 18 October 2005. Pisa KDD Laboratory http://www-kdd.isti.cnr.it/. Anonymity Aware Data Mining. Our Taxonomy of Privacy Preserving Data Mining.

Télécharger la présentation

M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi KDD Laboratory University of Pisa and ISTI-CNR

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. M. Atzori, F. Bonchi, F. Giannotti, D. Pedreschi KDD Laboratory University of Pisa and ISTI-CNR Speaker: Maurizio Atzori GeoPKDD, Venezia 18 October 2005 Pisa KDD Laboratory http://www-kdd.isti.cnr.it/ Anonymity Aware Data Mining

  2. Our Taxonomy of Privacy Preserving Data Mining • Intesional Knowledge Hiding • Bertino’s approach, based on DB Sanitization • Extensional Knowledge Hiding • Agrawal’s approach, based on DB Randomization • Sweeney’s approach, based on DB Anonymization • Distributed Extensional Knowledge Hiding • Clifton’s approach based on Secure Multiparty Computation • Secure Intesional Knowledge Sharing • Clifton’s Public/Private/Unknown Attribute Framework • Zaiane’s Association Rule Sanitization (but also IKH) • Our approach

  3. A Motivating Example • Suppose that The Fair Hospital, by doing association analysis, discovered some interesting patterns in its patient data • Now, such an institution wants to: • publish those research results • NOT release any personal information of its patients (i.e. Information that regard only few patients)

  4. Association Analysis can Break Anonymity • a1  a2  a3  a4 • [sup = 80, conf = 98.7%]

  5. Association Analysis can Break Anonymity sup(a1, a2, a3, a4) conf • a1  a2  a3  a4 • [sup = 80, conf = 98.7%] • sup(a1,a2,a3) = = 81

  6. Association Analysis can Break Anonymity sup(a1, a2, a3, a4) conf • a1  a2  a3  a4 • [sup = 80, conf = 98.7%] • sup(a1,a2,a3) = = 81 • Therefore there is only one individual such that the pattern a1  a2  a3 ¬a4 holds • This information can be used as a quasi-identifier for that person, breaking her anonyimity

  7. Bad and Good Things • BAD THING: In general Data Mining results contain information related to few individuals • This is true even if we use high values for the support threshold (i.e. Forcing samples to be large) • GOOD THING: inference channels leading to anonymity breaches can be discovered and removed

  8. Our Approach: k-Anonymous Patterns • Based on Relational DB K-anonymity [Sweeney, 1998-2002] • We can transform a Relation into another more general s.t. Every row appears at least k times • From Data to Pattern k-anonymity • A set of patterns is k-anonymous if it is impossible to infer anything about a set of individuals of cardinality less than k • DB K-anonymity guarantees the extraction of only k-anonymous patterns, but: • Optimal DB k-anonymity is NP-Complete (very high computational cost) • DB k-anonymity destroys also safe information

  9. Summary: Present and Future • What we did: • Formalization of such anonymity threats in data mining models, based on the concept of Sweeney’s k-Anonymity • We developed an Inference Channel Detector able to discover all anonymity breaches [Atzori et al., PKDD05] • We developed Sanitization Strategies to slightly modify the output of data mining algorithm in order to guarantee [Atzori et al., ICDM05] • What we (hopefully) will do: • Apply k-anonymity framework to Spatio-Temporal Databases and Data Mining

More Related