
Inference Problem Privacy Preserving Data Mining


Presentation Transcript


  1. Inference Problem: Privacy Preserving Data Mining

  2. Readings and Assignments • Required: • Pfleeger: Chapter 7 • Interesting reading: • I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf • Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems http://www.acsac.org/secshelf/book001/book001.html, essay 24 CSCE 522 - Farkas

  3. Indirect Information Flow Channels • Covert channels • Inference channels

  4. Communication Channels • Overt Channel: designed into a system and documented in the user's manual • Covert Channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.

  5. Covert Channel • Timing channel: information is conveyed through the timing of system events • Storage channel: information is conveyed by means that do not depend on timing • Either kind can be converted into the other

  6. Inference Channels • Non-sensitive information + Meta-data = Sensitive information

  7. Inference Channels • Statistical Database Inferences • General Purpose Database Inferences

  8. Statistical Databases • Goal: provide aggregate information about groups of individuals • E.g., the average grade point of students • Security risk: disclosure of specific information about a particular individual • E.g., the grade point of student John Smith • Meta-data: • Working knowledge about the attributes • Supplementary knowledge (not stored in the database)

  9. Types of Statistics • Macro-statistics: collections of related statistics presented in 2-dimensional tables • Micro-statistics: individual data records used for statistics after identifying information is removed

  10. Statistical Compromise • Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8) • Partial compromise: find an estimate of an attribute value of an individual (e.g., John Smith’s GPA is between 3.5 and 4.0)

  11. Methods of Attacks and Protection • Small/Large Query Set Attack • C: characteristic formula that identifies groups of individuals • Suppose C identifies a single individual I, i.e., count(C) = 1 • Find out existence of a property: • If count(C and D) = 1, then I has property D • If count(C and D) = 0, then I does not have property D • OR find the value of a property: • sum(C, D) gives I's value of D
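
The small query set attack above can be sketched in a few lines of Python. This is only an illustrative sketch: the records, attribute names, and the helpers `count` and `total` are hypothetical and stand in for whatever aggregate interface an unprotected statistical database exposes.

```python
# Hypothetical toy data; names and values are illustrative only.
RECORDS = [
    {"name": "Ann", "dept": "CS", "year": 3, "probation": False, "gpa": 3.1},
    {"name": "Bob", "dept": "CS", "year": 3, "probation": True, "gpa": 2.0},
    {"name": "Cid", "dept": "CS", "year": 4, "probation": False, "gpa": 3.5},
    {"name": "Dee", "dept": "EE", "year": 4, "probation": True, "gpa": 3.8},
    {"name": "Eli", "dept": "EE", "year": 3, "probation": False, "gpa": 2.9},
    {"name": "Fay", "dept": "CS", "year": 4, "probation": False, "gpa": 3.3},
]


def count(pred):
    """COUNT statistic over the records matching the characteristic formula."""
    return sum(1 for r in RECORDS if pred(r))


def total(pred, attr):
    """SUM statistic of one attribute over the matching records."""
    return sum(r[attr] for r in RECORDS if pred(r))


# C: characteristic formula built from non-sensitive attributes.
def C(r):
    return r["dept"] == "EE" and r["year"] == 4


# D: the sensitive property the attacker wants to test.
def D(r):
    return r["probation"]


# Step 1: the attacker confirms that C isolates exactly one individual.
assert count(C) == 1

# Step 2a: existence of a property; the count is 1 if the individual has D, else 0.
has_property_d = count(lambda r: C(r) and D(r)) == 1

# Step 2b: value of a property; a SUM over a one-record query set is that record's value.
leaked_gpa = total(C, "gpa")

print(has_property_d, leaked_gpa)
```

No names are queried at any point: the attacker only needs a combination of public attributes that happens to single out one record.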

  12. Small/Large Query Set Attack cont. • Protection from small/large query set attack: query-set-size control • A query q(C) is permitted only if n ≤ |C| ≤ N - n, where n ≥ 0 is a parameter of the database and N is the total number of records in the database
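
A minimal sketch of query-set-size control, assuming records are held in a Python list and a query is a predicate over a record. The function name `controlled_count` and the choice of `PermissionError` for refusals are assumptions made for illustration.

```python
def controlled_count(records, pred, n):
    """Answer COUNT(pred) only if the query set size lies in [n, N - n]."""
    N = len(records)
    size = sum(1 for r in records if pred(r))
    if not (n <= size <= N - n):
        # Too small: the answer describes (almost) one individual.
        # Too large: the complement of the query set is too small.
        raise PermissionError("query set size outside [n, N - n]")
    return size


# Hypothetical database of 10 records, with n = 2.
db = list(range(10))
print(controlled_count(db, lambda x: x < 5, n=2))  # query set of size 5 is answered

for pred in (lambda x: x < 1, lambda x: x < 9):    # sizes 1 and 9 are refused
    try:
        controlled_count(db, pred, n=2)
    except PermissionError as exc:
        print("refused:", exc)
```

The upper bound matters because a query over nearly all records can be subtracted from a query over all records to recover a small set.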

  13. Tracker attack • q(C) is disallowed • C = C1 and C2 • Tracker: T = C1 and ~C2 • q(C) = q(C1) - q(T) • [Figure: Venn diagram of C1 and C2 showing C and the tracker T]

  14. Tracker attack • q(C and D) is disallowed • C = C1 and C2 • Tracker: T = C1 and ~C2 • q(C and D) = q(T or (C and D)) - q(T) • [Figure: Venn diagram of C1, C2, and D showing C, C and D, and the tracker T]
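
The two tracker identities can be checked on a toy database. The sketch below assumes COUNT queries protected by query-set-size control with n = 2; the `StatDB` class, the records, and the attribute names are hypothetical.

```python
class StatDB:
    """Toy statistical database answering COUNT under query-set-size control."""

    def __init__(self, records, n):
        self.records = records
        self.n = n

    def count(self, pred):
        size = sum(1 for r in self.records if pred(r))
        if not (self.n <= size <= len(self.records) - self.n):
            raise PermissionError("query set size outside [n, N - n]")
        return size


RECORDS = [
    {"dept": "CS", "year": 3, "probation": False},
    {"dept": "CS", "year": 3, "probation": True},
    {"dept": "CS", "year": 4, "probation": False},
    {"dept": "EE", "year": 4, "probation": True},   # the targeted individual
    {"dept": "EE", "year": 3, "probation": False},
    {"dept": "EE", "year": 3, "probation": False},
    {"dept": "CS", "year": 4, "probation": False},
    {"dept": "EE", "year": 2, "probation": True},
]
db = StatDB(RECORDS, n=2)


def C1(r):
    return r["dept"] == "EE"


def C2(r):
    return r["year"] == 4


def D(r):
    return r["probation"]


def C(r):            # C = C1 and C2, isolates one record, so q(C) is refused
    return C1(r) and C2(r)


def T(r):            # tracker T = C1 and ~C2
    return C1(r) and not C2(r)


# q(C) = q(C1) - q(T); both queries on the right are large enough to be answered.
count_c = db.count(C1) - db.count(T)

# q(C and D) = q(T or (C and D)) - q(T); T and (C and D) are disjoint.
count_c_and_d = db.count(lambda r: T(r) or (C(r) and D(r))) - db.count(T)

print(count_c, count_c_and_d)
```

Every query the attacker issues satisfies the size restriction, yet the differences reproduce exactly the answers the control was meant to withhold.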

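  15. Query overlap attack • q(John) = q(C1) - q(C2), where the query sets C1 and C2 differ only in John's record • [Figure: overlapping query sets C1 and C2 over the individuals Kathy, Paul, John, Eve, Max, Fred, Mitch] • Protection: query-overlap control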
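
A sketch of the overlap attack and of one possible form of query-overlap control. The names come from the slide, but the salary values, the membership of C1 and C2, the `OverlapControl` class, and the `max_overlap` threshold are all assumptions made for illustration.

```python
# Hypothetical salaries (in thousands) for the individuals named on the slide.
SALARY = {"Kathy": 50, "Paul": 60, "John": 70, "Eve": 55,
          "Max": 65, "Fred": 45, "Mitch": 80}


def q_sum(names):
    """Unprotected SUM statistic over a query set given as a set of names."""
    return sum(SALARY[name] for name in names)


# Assumed query sets: C2 is C1 without John.
C1 = {"Kathy", "Paul", "John"}
C2 = {"Kathy", "Paul"}
john_salary = q_sum(C1) - q_sum(C2)


class OverlapControl:
    """Refuse a query whose set overlaps an answered query set in too many records."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []          # query sets answered so far

    def q_sum(self, names):
        names = frozenset(names)
        for previous in self.answered:
            if len(names & previous) > self.max_overlap:
                raise PermissionError("query overlaps an earlier query too much")
        self.answered.append(names)
        return sum(SALARY[name] for name in names)


guard = OverlapControl(max_overlap=1)
guard.q_sum(C1)                     # first query is answered
try:
    guard.q_sum(C2)                 # shares two records with C1
    blocked = False
except PermissionError:
    blocked = True

print(john_salary, blocked)
```

The control has to remember every answered query set, which is one reason overlap control is costly to apply in practice.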

  16. Insertion/Deletion Attack • Observing changes over time • q1 = q(C) • insert(i) • q2 = q(C) • q(i) = q2 - q1 • Protection: insertions/deletions are performed in pairs
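
The insertion attack amounts to taking the difference of the same statistic before and after an update. The sketch below uses a SUM over a list of hypothetical salary values, and it reads "performed in pairs" as inserting two records between observable states, which is an assumption about the slide's intent.

```python
# Hypothetical salary column of a statistical database.
salaries = [50, 60, 70]

q1 = sum(salaries)          # q1 = q(C) before the update
salaries.append(42)         # insert(i): a single new record
q2 = sum(salaries)          # q2 = q(C) after the update
leaked_value = q2 - q1      # q(i) = q2 - q1 is exactly the new record's value

# Paired insertion: two records are added between the two observations.
paired = [50, 60, 70]
before = sum(paired)
paired.extend([42, 58])
after = sum(paired)
pair_total = after - before  # only the combined value of the pair is revealed

print(leaked_value, pair_total)
```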

  17. Statistical Inference Theory • Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)

  18. Inferences in General-Purpose Databases • Queries based on sensitive data • Inference via database constraints • Inferences via updates

  19. Queries based on sensitive data • Sensitive information is used in the selection condition but not returned to the user • Example: Salary: secret, Name: public; the query π_Name(σ_Salary=$25,000) returns only names, yet reveals the salary of everyone listed • Protection: apply the query to database views at different security levels

  20. Database Constraints • Integrity constraints • Database dependencies • Key integrity

  21. Integrity Constraints • Constraint: C = A + B • A = public, C = public, and B = secret • B can be calculated from A and C (B = C - A), i.e., secret information can be calculated from public data

  22. Database Dependencies • Metadata: • Functional dependencies • Multi-valued dependencies • Join dependencies • etc.

  23. Functional Dependency • FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B • Example: FD: Rank → Salary • Secret information: Name and Salary together • Query1: Name and Rank • Query2: Rank and Salary • Combine the answers to Query1 and Query2 to reveal Name and Salary together
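
The Rank → Salary example can be replayed with two query answers that are each permitted on their own. The tuples below are hypothetical; the only assumption carried over from the slide is that Rank functionally determines Salary.

```python
# Answer to Query1 (Name, Rank): permitted, because no salary is shown.
query1 = [("Ann", "Professor"), ("Bob", "Lecturer"), ("Cid", "Professor")]

# Answer to Query2 (Rank, Salary): permitted, because no name is shown.
query2 = [("Professor", 90000), ("Lecturer", 60000)]

# Because Rank -> Salary holds, each rank maps to exactly one salary,
# so joining the two answers on Rank is unambiguous.
salary_by_rank = dict(query2)
name_salary = {name: salary_by_rank[rank] for name, rank in query1}

print(name_salary)
```

Without the functional dependency a rank could map to several salaries, and the join would yield only a set of candidate salaries per name (a partial rather than exact disclosure).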

  24. Key integrity • Every tuple in the relation has a unique key • Users at different levels see different versions of the database • Users might attempt to update data that is not visible to them

  25. Example • [Figure: Secret View and Public View of the example relation]

  26. Updates • Public user: • Update Black’s address to Orlando • Add new tuple: (Red, 22,000, Manassas) • If the update is refused: covert channel • If the update is allowed: • Overwrite high data: the result may be incorrect • Create a new tuple (polyinstantiation): which data is correct? Violates key constraints

  27. Updates • Secret user: • Update Black’s salary to 45,000 • If the update is refused: denial of service • If the update is allowed: • Overwrite low data: covert channel • Create a new tuple (polyinstantiation): which data is correct? Violates key constraints
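
One way to picture polyinstantiation is a table whose effective key is (name, level) rather than name alone. The sketch below is a simplified model, not a description of any particular multilevel DBMS: the `PolyTable` class, the two-level lattice, the read rule (return the highest-level tuple the reader is cleared for), and the secret tuple for Red are all assumptions. Only the public tuple (Red, 22,000, Manassas) comes from the slides.

```python
LEVELS = {"public": 0, "secret": 1}


class PolyTable:
    """Toy polyinstantiated relation keyed by (name, classification level)."""

    def __init__(self):
        self.rows = {}   # (name, level) -> (salary, address)

    def write(self, name, level, salary, address):
        # A write at one level never overwrites or reveals a tuple at another level.
        self.rows[(name, level)] = (salary, address)

    def read(self, name, clearance):
        # Collect the tuples for this name that the reader is cleared to see.
        visible = [
            (LEVELS[level], values)
            for (row_name, level), values in self.rows.items()
            if row_name == name and LEVELS[level] <= LEVELS[clearance]
        ]
        if not visible:
            return None
        # Assumed policy: show the highest-level visible version.
        return max(visible, key=lambda item: item[0])[1]


table = PolyTable()
table.write("Red", "secret", 38000, "Miami")        # hypothetical secret tuple
table.write("Red", "public", 22000, "Manassas")     # public insert is accepted

print(table.read("Red", "public"))
print(table.read("Red", "secret"))
```

Accepting the public insert avoids the covert channel that a refusal would create, but the relation now holds two tuples named Red, so Name alone is no longer a key and the system cannot tell which version is correct.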

  28. Inference Problem • No general technique is available to solve the problem • Need assurance of protection • Hard to incorporate outside knowledge

  29. The Inference Problem • General Purpose Database: Non-confidential data + Metadata → Undesired Inferences • Web Enabled Data: Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences

  30. Correlated Inference • [Figure labels: Base, Place, Water source; Public, Confidential] • Ontology: Object[]. waterSource :: Object. basin :: waterSource. place :: Object. district :: place. address :: place. base :: Object. fort :: base.

  31. Inference Control • [Figure labels: Access Control; Organizational Data (Confidential, Public, Misinfo); Attacker; Data Integration and Inferences; Ontology; Web Data]

  32. Inference Control • [Figure labels: Organizational Data (Confidential, Public, Misinfo); Access and Inference Control Policy] • Logic-based inference detection • Exact and partial disclosure • Data and metadata protection • Heterogeneous data manipulation • Metadata discovery

  33. Data Mining and Privacy • Statistical inference: • K-anonymity • Correlation • General inference: • Pattern → metadata • Biased learning
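
K-anonymity is only named on the slide. As a rough illustration of the usual definition (every combination of quasi-identifier values that occurs in a released table is shared by at least k records), a checker can be sketched as follows. The function name, the attribute names, and the generalized sample rows are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if each quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


# Hypothetical released table with generalized ZIP code and age.
released = [
    {"zip": "292**", "age": "20-29", "disease": "flu"},
    {"zip": "292**", "age": "20-29", "disease": "cold"},
    {"zip": "293**", "age": "30-39", "disease": "flu"},
    {"zip": "293**", "age": "30-39", "disease": "asthma"},
]

print(is_k_anonymous(released, ["zip", "age"], k=2))
print(is_k_anonymous(released, ["zip", "age"], k=3))
```

This mirrors query-set-size control from the statistical database slides: no released group may be smaller than a fixed threshold.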

  34. Next Class • Software security
