Inference Problem Privacy Preserving Data Mining
Learn about covert, inference, and communication channels in secure database management systems. Understand statistical database inferences, attacks, and protection methods. Explore issues with queries in general-purpose databases. Enhance privacy through database constraints.
Presentation Transcript
Readings and Assignments • Required: • Pfleeger: Chapter 7 • Interesting reading: • I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf • Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems http://www.acsac.org/secshelf/book001/book001.html, essay 24 CSCE 522 - Farkas
Indirect Information Flow Channels • Covert channels • Inference channels
Communication Channels • Overt Channel: designed into a system and documented in the user's manual • Covert Channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.
Covert Channel • Timing channel: based on system time • Storage channel: communication that is not time-related • Each type can be turned into the other
Inference Channels • Non-sensitive information + Meta-data = Sensitive information
Inference Channels • Statistical Database Inferences • General Purpose Database Inferences
Statistical Databases • Goal: provide aggregate information about groups of individuals • E.g., average grade point of students • Security risk: specific information about a particular individual • E.g., grade point of student John Smith • Meta-data: • Working knowledge about the attributes • Supplementary knowledge (not stored in database)
Types of Statistics • Macro-statistics: collections of related statistics presented in 2-dimensional tables • Micro-statistics: Individual data records used for statistics after identifying information is removed
Statistical Compromise • Exact compromise: find exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8) • Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith’s GPA is between 3.5 and 4.0)
Methods of Attacks and Protection • Small/Large Query Set Attack • C: characteristic formula that identifies a group of individuals • If C identifies a single individual I, i.e., count(C) = 1: • Find out the existence of a property: if count(C and D) = 1, then I has property D; if count(C and D) = 0, then I does not have D • OR find the value of a property: sum(C, D) gives the value of D for I
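The small query set attack above can be sketched in a few lines of Python. This is only an illustration: the student records, the attribute names, and the helper functions `count` and `total` are made up and stand in for the statistical queries count(C) and sum(C, D) of an unprotected database.

```python
# Hypothetical student records; all names and values are invented.
RECORDS = [
    {"name": "Ann", "major": "CS", "sex": "F", "honors": True, "gpa": 3.8},
    {"name": "Bob", "major": "CS", "sex": "M", "honors": False, "gpa": 3.1},
    {"name": "Cat", "major": "EE", "sex": "F", "honors": False, "gpa": 3.4},
    {"name": "Dan", "major": "EE", "sex": "M", "honors": True, "gpa": 2.9},
]


def count(pred):
    """Number of records that satisfy the characteristic formula pred."""
    return sum(1 for r in RECORDS if pred(r))


def total(pred, attr):
    """Sum of attribute attr over the records that satisfy pred."""
    return sum(r[attr] for r in RECORDS if pred(r))


def C(r):
    # Uses only non-identifying attributes, yet matches a single individual.
    return r["major"] == "CS" and r["sex"] == "F"


def D(r):
    return r["honors"]


assert count(C) == 1                                  # C isolates individual I
has_property = count(lambda r: C(r) and D(r)) == 1    # existence of property D
leaked_gpa = total(C, "gpa")                          # value of attribute gpa
print(has_property, leaked_gpa)
```

No query names the individual, yet both answers describe exactly one person.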
Small/Large Query Set Attack cont. • Protection from small/large query set attack: query-set-size control • A query q(C) is permitted only if n ≤ |C| ≤ N − n, where n ≥ 0 is a parameter of the database and N is the number of records in the database
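A minimal sketch of the query-set-size check, assuming the database can compute |C| before answering. The function name `query_permitted` and the values N = 100 and n = 5 are illustrative assumptions.

```python
def query_permitted(query_set_size, total_records, n):
    """Return True only if n <= |C| <= N - n."""
    return n <= query_set_size <= total_records - n


N = 100  # assumed number of records in the database
n = 5    # assumed query-set-size parameter

print(query_permitted(1, N, n))   # False: the query set is too small
print(query_permitted(50, N, n))  # True
print(query_permitted(99, N, n))  # False: the query set is too large
```

The upper bound is needed because a very large query set has a very small complement: subtracting its answer from the statistic over all records would reveal the small set anyway.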
Tracker attack • q(C) is disallowed • C = C1 and C2 • Tracker: T = C1 and ~C2 • q(C) = q(C1) − q(T)
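The identity q(C) = q(C1) − q(T) can be tried out on a toy table. The sketch below assumes a hypothetical employee table, a sum query `q` guarded by query-set-size control, and a size parameter of 2. None of these come from the slides.

```python
# Hypothetical employee table: (department, sex, salary in $1000s).
RECORDS = [
    ("CS", "F", 52), ("CS", "M", 48), ("CS", "M", 61), ("CS", "M", 45),
    ("EE", "F", 50), ("EE", "F", 57), ("EE", "M", 49), ("EE", "M", 63),
]
MIN_SET_SIZE = 2  # assumed query-set-size parameter n


class QueryRefused(Exception):
    """Raised when the query set is too small or too large."""


def q(pred):
    """Sum of salaries over matching records, under query-set-size control."""
    matched = [r for r in RECORDS if pred(r)]
    if not (MIN_SET_SIZE <= len(matched) <= len(RECORDS) - MIN_SET_SIZE):
        raise QueryRefused(f"query set of size {len(matched)} is not permitted")
    return sum(salary for _, _, salary in matched)


def c1(r):
    return r[0] == "CS"


def c2(r):
    return r[1] == "F"


def c(r):
    return c1(r) and c2(r)        # C = C1 and C2, matches one person


def tracker(r):
    return c1(r) and not c2(r)    # T = C1 and ~C2


try:
    q(c)
    direct_query_refused = False
except QueryRefused:
    direct_query_refused = True

inferred_salary = q(c1) - q(tracker)  # q(C) = q(C1) - q(T)
print(direct_query_refused, inferred_salary)
```

Both q(C1) and q(T) pass the size check, so the control never sees a forbidden query even though their difference is the forbidden answer.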
Tracker attack • q(C and D) is disallowed • C = C1 and C2 • Tracker: T = C1 and ~C2 • q(C and D) = q(T or (C and D)) − q(T)
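The second tracker formula works because T and (C and D) never overlap, so padding the forbidden set with T and then subtracting q(T) leaves q(C and D). A sketch with count queries follows. The table, the boolean property D, and the size parameter are illustrative assumptions.

```python
# Hypothetical table: (department, sex, has_property_D).
RECORDS = [
    ("CS", "F", True), ("CS", "M", False), ("CS", "M", True), ("CS", "M", False),
    ("EE", "F", False), ("EE", "F", True), ("EE", "M", False), ("EE", "M", True),
]
MIN_SET_SIZE = 2  # assumed query-set-size parameter n


def q_count(pred):
    """Count of matching records, or None if the query is refused."""
    size = sum(1 for r in RECORDS if pred(r))
    if not (MIN_SET_SIZE <= size <= len(RECORDS) - MIN_SET_SIZE):
        return None
    return size


def c(r):
    return r[0] == "CS" and r[1] == "F"      # C = C1 and C2


def tracker(r):
    return r[0] == "CS" and r[1] != "F"      # T = C1 and ~C2


def c_and_d(r):
    return c(r) and r[2]                     # C and D


def padded(r):
    return tracker(r) or c_and_d(r)          # T or (C and D)


refused = q_count(c_and_d) is None
inferred = q_count(padded) - q_count(tracker)  # q(C and D)
print(refused, inferred)
```

A result of 1 means the single individual identified by C has property D, and a result of 0 means the individual does not.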
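Query overlap attack • q(John) = q(C1) − q(C2), where C1 and C2 cover the same individuals except that C1 also contains John • Figure (not preserved): query sets C1 and C2 drawn over Kathy, Paul, John, Eve, Max, Fred, Mitch • Protection: query-overlap control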
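A small sketch of the overlap attack and of one possible overlap control. The names come from the slide, but the salary values, the membership of C1 and C2, and the `OverlapControl` class with its `max_overlap` threshold are assumptions made for illustration.

```python
# Salaries in $1000s; the values are invented.
SALARY = {"Kathy": 50, "Paul": 42, "John": 61, "Eve": 55,
          "Max": 47, "Fred": 39, "Mitch": 58}


def q(names):
    """Sum query over an explicit query set."""
    return sum(SALARY[name] for name in names)


C1 = {"Kathy", "Paul", "John", "Eve"}
C2 = {"Kathy", "Paul", "Eve"}        # C1 without John

john_salary = q(C1) - q(C2)          # q(John) = q(C1) - q(C2)


class OverlapControl:
    """Refuse a query that shares too many records with an answered one."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []

    def query(self, names):
        names = frozenset(names)
        if any(len(names & prev) > self.max_overlap for prev in self.answered):
            return None              # refused
        self.answered.append(names)
        return q(names)


guard = OverlapControl(max_overlap=1)
first = guard.query(C1)
second = guard.query(C2)             # overlaps C1 in three records
print(john_salary, first, second)
```

The control has to remember every answered query set, which is the main practical cost of this protection.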
Insertion/Deletion Attack • Observing changes over time • q1=q(C) • insert(i) • q2=q(C) • q(i)=q2-q1 • Protection: insertion/deletion performed as pairs
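The steps above can be sketched with a toy statistical database. The `StatDB` class, its methods, and the salary values are hypothetical and show only how differencing two answers isolates the inserted record, and how a paired insertion blurs it.

```python
class StatDB:
    """Toy statistical database exposing only a sum query."""

    def __init__(self, salaries):
        self._salaries = list(salaries)

    def q_sum(self):
        return sum(self._salaries)

    def insert(self, salary):
        self._salaries.append(salary)

    def insert_pair(self, first, second):
        # Protection: records enter the database two at a time.
        self._salaries.extend([first, second])


# Attack: query, insert record i, query again.
db = StatDB([50, 42, 55])
q1 = db.q_sum()
db.insert(61)
q2 = db.q_sum()
leaked = q2 - q1                     # q(i) = q2 - q1

# With paired insertion the difference is only the sum of two records.
paired = StatDB([50, 42, 55])
before = paired.q_sum()
paired.insert_pair(61, 47)
masked = paired.q_sum() - before
print(leaked, masked)
```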
Statistical Inference Theory • Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)
Inferences in General-Purpose Databases • Queries based on sensitive data • Inference via database constraints • Inferences via updates
Queries based on sensitive data • Sensitive information is used in the selection condition but not returned to the user. • Example: Salary: secret, Name: public; the query π_Name(σ_Salary=$25,000) returns the names of employees whose salary is $25,000 • Protection: apply the query to database views at different security levels
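The leak and the view-based protection can be sketched as follows. The employee rows, the two-level scheme, and the function names are assumptions. The sketch shows only why the selection must be evaluated on the view for the user's level and not on the base table.

```python
# Hypothetical base table: Name is public, Salary is secret.
EMPLOYEES = [
    {"name": "Black", "salary": 25000},
    {"name": "Brown", "salary": 38000},
]


def view(level):
    """Rows as seen at a clearance level; Salary is hidden below 'secret'."""
    if level == "secret":
        return [dict(row) for row in EMPLOYEES]
    return [{"name": row["name"], "salary": None} for row in EMPLOYEES]


def names_unprotected(amount):
    # Selection runs on the base table; only the public Name is returned.
    return [row["name"] for row in EMPLOYEES if row["salary"] == amount]


def names_protected(level, amount):
    # Selection runs on the view that matches the user's level.
    return [row["name"] for row in view(level) if row["salary"] == amount]


print(names_unprotected(25000))          # reveals whose salary is 25,000
print(names_protected("public", 25000))  # nothing matches in the public view
print(names_protected("secret", 25000))
```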
Database Constraints • Integrity constraints • Database dependencies • Key integrity
Integrity Constraints • C=A+B • A=public, C=public, and B=secret • B can be calculated from A and C, i.e., secret information can be calculated from public data
Database Dependencies Metadata: • Functional dependencies • Multi-valued dependencies • Join dependencies • etc.
Functional Dependency • FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B. • Example: FD: Rank → Salary • Secret information: Name and Salary together • Query1: Name and Rank • Query2: Rank and Salary • Combine the answers to Query1 and Query2 to reveal Name and Salary together
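Combining the two query answers amounts to a join on Rank, which is unambiguous because Rank → Salary holds. A short sketch follows. The names, ranks, and salaries are invented.

```python
# Answer to Query1 (Name, Rank) and Query2 (Rank, Salary); both are allowed.
query1 = [("Smith", "Professor"), ("Jones", "Lecturer"), ("Lee", "Professor")]
query2 = [("Professor", 90000), ("Lecturer", 55000)]

# Rank -> Salary guarantees one salary per rank, so a dict lookup is enough.
salary_by_rank = dict(query2)

# Joining on Rank reconstructs the secret (Name, Salary) association.
name_salary = {name: salary_by_rank[rank] for name, rank in query1}
print(name_salary)
```

Neither answer is sensitive on its own. The disclosure comes from the dependency that links them.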
Key integrity • Every tuple in the relation has a unique key • Users at different levels see different versions of the database • Users might attempt to update data that is not visible to them
Example • Secret View and Public View (tables not preserved in the transcript)
Updates • Public user: • Update Black’s address to Orlando • Add new tuple: (Red, 22,000, Manassas) • If the update is refused: covert channel • If the update is allowed, either: • Overwrite high data (may be incorrect), or • Create a new tuple (polyinstantiation): unclear which data is correct, and key constraints are violated
Updates • Secret user: • Update Black’s salary to 45,000 • If the update is refused: denial of service • If the update is allowed, either: • Overwrite low data (covert channel), or • Create a new tuple (polyinstantiation): unclear which data is correct, and key constraints are violated
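Polyinstantiation can be sketched by treating the security level as part of the key, so an update from one level never overwrites, and is never refused because of, a tuple at another level. The `MLSRelation` class, the two levels, and the secret tuple for Red are assumptions made for illustration. Only the public tuple (Red, 22,000, Manassas) comes from the slides.

```python
LEVELS = {"public": 0, "secret": 1}


class MLSRelation:
    """Toy multilevel relation keyed by (name, level)."""

    def __init__(self):
        self.rows = {}  # (name, level) -> (salary, address)

    def insert(self, user_level, name, salary, address):
        # The insert always succeeds, so it reveals nothing about other levels.
        self.rows[(name, user_level)] = (salary, address)

    def view(self, user_level):
        """Tuples at or below the user's level, as (name, level, salary, address)."""
        clearance = LEVELS[user_level]
        return sorted(
            (name, level, salary, address)
            for (name, level), (salary, address) in self.rows.items()
            if LEVELS[level] <= clearance
        )


rel = MLSRelation()
rel.insert("secret", "Red", 35000, "Fairfax")    # hypothetical secret tuple
rel.insert("public", "Red", 22000, "Manassas")   # public user's new tuple

public_view = rel.view("public")
secret_view = rel.view("secret")
print(public_view)
print(secret_view)
```

The secret view now holds two tuples for Red, which is the price named on the slides: the name alone is no longer a key, and the secret user must decide which tuple is correct.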
Inference Problem • No general technique is available to solve the problem • Need assurance of protection • Hard to incorporate outside knowledge
The Inference Problem • General Purpose Database: Non-confidential data + Metadata → Undesired Inferences • Web Enabled Data: Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences
Correlated Inference • Ontology: Object[]. waterSource :: Object. basin :: waterSource. place :: Object. district :: place. address :: place. base :: Object. fort :: base. • Figure (not preserved): data items address, fort, district, and basin, with associations among Base, Place, and Water source labeled Confidential or Public
Inference Control • Figure (not preserved), with labels: Access Control; Organizational Data (Confidential, Public, Misinfo); Attacker; Data Integration and Inferences; Ontology; Web Data
Inference Control • Access and inference control policy over organizational data (Confidential, Public, Misinfo) • Logic-based inference detection • Exact and partial disclosure • Data and metadata protection • Heterogeneous data manipulation • Metadata discovery
Data Mining and Privacy • Statistical inference: • K-anonymity • Correlation • General inference: • Pattern metadata • Biased learning
Next Class • Software security