Tracker Problem in Statistical Database Security
These slides discuss the tracker problem in statistical databases: how confidential information about an individual can be compromised using queries built from logical formulas, with worked examples.
Query Size Restriction: The Database Tracker Problem
EECS710: Information Security and Assurance
Professor H. Saiedian
From: Denning et al., “The Tracker: A Threat to Statistical Database Security,” ACM TODS, 1979
A statistical database
• Construction of a characteristic formula C
  • A logical formula over attribute values, with operators AND, OR, NOT (~)
• Common queries
  • count (C): number of records satisfying C
  • sum (C; j): sum of field j over the records satisfying C
• Examples
  • count (M AND CS) = 3, short for count (Sex=‘M’ AND Dept=‘CS’)
  • sum (M OR ~CS; Salary) = $176K
  • sum (Salary <= 15K; Contributions) = $180
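The two query types can be sketched in a few lines of Python. The 12 records below are an assumption made for illustration (the slides do not reproduce the underlying table); they were chosen so that the counts and salary sums agree with the figures quoted on the slides. The names `Rec`, `DB`, `count`, and `total` are likewise hypothetical.

```python
from typing import Callable, NamedTuple


class Rec(NamedTuple):
    sex: str
    dept: str
    pos: str
    salary: int  # in $K


# Assumed records for illustration only, not taken from the slides.
DB = [Rec(*r) for r in [
    ("M", "CS", "Prof", 20), ("M", "Math", "Prof", 15),
    ("F", "Math", "Prof", 25), ("F", "CS", "Prof", 15),
    ("M", "Stat", "Prof", 18), ("F", "Stat", "Prof", 22),
    ("M", "CS", "Admin", 10), ("M", "Math", "Prof", 18),
    ("F", "CS", "Stu", 3), ("M", "Stat", "Admin", 20),
    ("F", "Math", "Prof", 25), ("M", "CS", "Stu", 3),
]]

# A characteristic formula is modelled as a predicate on one record.
Formula = Callable[[Rec], bool]


def count(c: Formula) -> int:
    """count(C): number of records satisfying C."""
    return sum(1 for r in DB if c(r))


def total(c: Formula, field: str) -> int:
    """sum(C; field): sum of `field` over records satisfying C."""
    return sum(getattr(r, field) for r in DB if c(r))


# count (M AND CS)
print(count(lambda r: r.sex == "M" and r.dept == "CS"))
# sum (M OR ~CS; Salary), in $K
print(total(lambda r: r.sex == "M" or r.dept != "CS", "salary"))
```

With these assumed records the two calls print 3 and 176, matching the slide's first two examples.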
Compromise
• Occurs when confidential information is deduced
  • Positive: deduce a value
  • Negative: learn that a value is not in a given field (e.g., Baker did not contribute $200)
• Secure: no compromise is possible
• Example: a person knows that Dodd is a female CS professor
  • count (F AND CS AND Prof) = 1 confirms that the formula identifies Dodd uniquely
  • count (F AND CS AND Prof AND Salary <= 15K) = 1, so Dodd’s salary is <= $15K
  • If that count were 0, Dodd’s salary would not be <= $15K
Setting a lower bound?
• Answering only queries whose count meets a lower bound helps, but not always
• We know count (~C) = n – count (C)
• Ask a tautology to learn n, then query the complement
  • count (Prof OR ~Prof) = 12
  • count (~(F AND CS AND Prof)) = 11
  • 12 – 11 = 1 female CS prof
  • sum (Prof OR ~Prof; Salary) = $194K
  • sum (~(F AND CS AND Prof); Salary) = $179K
  • Dodd’s salary = $194K – $179K = $15K
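The complement attack on a lower-bound-only policy can be sketched as follows. The records are assumed for illustration (chosen so the totals agree with the slide's figures), and `answer`, `Refused`, and the bound `K = 2` are hypothetical names and values, not part of the original system.

```python
from typing import Callable, NamedTuple, Optional


class Rec(NamedTuple):
    sex: str
    dept: str
    pos: str
    salary: int  # in $K


# Assumed records for illustration only, not taken from the slides.
DB = [Rec(*r) for r in [
    ("M", "CS", "Prof", 20), ("M", "Math", "Prof", 15),
    ("F", "Math", "Prof", 25), ("F", "CS", "Prof", 15),
    ("M", "Stat", "Prof", 18), ("F", "Stat", "Prof", 22),
    ("M", "CS", "Admin", 10), ("M", "Math", "Prof", 18),
    ("F", "CS", "Stu", 3), ("M", "Stat", "Admin", 20),
    ("F", "Math", "Prof", 25), ("M", "CS", "Stu", 3),
]]

K = 2  # assumed lower bound on the query set size


class Refused(Exception):
    """Raised when the query set is smaller than K."""


def answer(c: Callable[[Rec], bool], field: Optional[str] = None) -> int:
    """count(C) if field is None, else sum(C; field). Lower bound only."""
    matches = [r for r in DB if c(r)]
    if len(matches) < K:
        raise Refused
    if field is None:
        return len(matches)
    return sum(getattr(r, field) for r in matches)


def dodd(r: Rec) -> bool:  # C = F AND CS AND Prof
    return r.sex == "F" and r.dept == "CS" and r.pos == "Prof"


def everyone(r: Rec) -> bool:  # a tautology such as Prof OR ~Prof
    return True


def not_dodd(r: Rec) -> bool:  # ~C
    return not dodd(r)


# The direct query is refused, but the tautology and the complement are not.
count_c = answer(everyone) - answer(not_dodd)                      # 12 - 11
salary = answer(everyone, "salary") - answer(not_dodd, "salary")   # 194 - 179
print(count_c, salary)
```

Both large query sets pass the lower-bound check, so subtracting them isolates the single record the check was meant to protect.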
Need an upper bound also
• Respond to query (C) if k ≤ count (C) ≤ n – k; reject otherwise
• Note: k ≤ n/2 (otherwise all queries will be unanswerable)
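A minimal sketch of the two-sided size check, assuming n = 12 and k = 2 as in the example that follows; the function name `answerable` is hypothetical.

```python
def answerable(size: int, n: int, k: int) -> bool:
    """True if a query set of `size` records may be answered: k <= size <= n - k."""
    return k <= size <= n - k


n, k = 12, 2
print(answerable(1, n, k))    # C identifying one individual: refused
print(answerable(11, n, k))   # its complement ~C: refused
print(answerable(12, n, k))   # a tautology: refused
print(answerable(5, n, k))    # a mid-sized query set: answered

# If k > n/2 the interval [k, n - k] is empty, so nothing is answerable.
print(any(answerable(s, 12, 7) for s in range(13)))
```

The upper bound is what blocks the tautology and complement queries: a query set of size n or n – 1 is as revealing as one of size 0 or 1.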
What value for k?
• If a questioner knows (from external sources) that individual I is uniquely characterized by C, the questioner will try to find out whether I has characteristic α
• Assume k = 2
• Because count (C AND α) ≤ count (C) = 1 < k, both queries are rejected, so the questioner cannot ask them directly as in the earlier example
• The questioner may instead divide C into two parts to calculate count (C AND α)
The database tracker
• How? Divide C into C = C1 AND C2 such that count (C1 AND ~C2) and count (C1) are answerable
• T = C1 AND ~C2 is called a tracker of I
  • It tracks down additional characteristics of I
Calculating the tracker
• C = C1 AND C2
• T = C1 AND ~C2
• count (C) = count (C1) – count (T)
• count (C AND α) = count (T OR C1 AND α) – count (T)
• If count (C AND α) = 0: negative compromise (I does not have α)
• If count (C AND α) = count (C): positive compromise (I has α)
• If count (C) = 1, arbitrary statistics about I can be computed from query (C) = query (C1) – query (T)
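The two identities follow because C and T split C1 into disjoint parts. A short derivation, using the slide's symbols (the logical-connective notation is only a typesetting choice):

```latex
\begin{align*}
C_1 &\equiv (C_1 \wedge C_2) \vee (C_1 \wedge \neg C_2) \equiv C \vee T,
      \qquad C \wedge T \equiv \text{false} \\
\mathrm{count}(C_1) &= \mathrm{count}(C) + \mathrm{count}(T)
      \;\Longrightarrow\; \mathrm{count}(C) = \mathrm{count}(C_1) - \mathrm{count}(T) \\[4pt]
C_1 \wedge \alpha &\equiv (C \wedge \alpha) \vee (T \wedge \alpha) \\
T \vee (C_1 \wedge \alpha) &\equiv T \vee (C \wedge \alpha)
      \qquad \text{since } T \wedge \alpha \text{ is already contained in } T \\
\mathrm{count}\bigl(T \vee (C_1 \wedge \alpha)\bigr) &= \mathrm{count}(T) + \mathrm{count}(C \wedge \alpha)
      \;\Longrightarrow\; \mathrm{count}(C \wedge \alpha)
      = \mathrm{count}\bigl(T \vee (C_1 \wedge \alpha)\bigr) - \mathrm{count}(T)
\end{align*}
```

The same disjointness argument applies to sum, which is why query (C) = query (C1) – query (T) holds for either statistic.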
A tracker example
• Suppose k = 2 (with n = 12)
• Query (C) is answerable if 2 <= count (C) <= 10
• Questioner believes C = F AND CS AND Prof identifies Dodd
• Constructs T = C1 AND ~C2 where C1 = “F” and C2 = “CS AND Prof”
To verify the tracker
• count (F AND CS AND Prof) = count (F) – count (F AND ~(CS AND Prof)) = 5 – 4 = 1
• To find Dodd’s salary, apply query (C) = query (C1) – query (T):
  sum (F AND CS AND Prof; Salary) = sum (F; Salary) – sum (F AND ~(CS AND Prof); Salary) = $90K – $75K = $15K
Negative compromise also possible
• count (F AND CS AND Prof AND Salary > $15K) = count (F AND ~(CS AND Prof) OR F AND Salary > $15K) – count (F AND ~(CS AND Prof)) = 4 – 4 = 0
• So Dodd’s salary is not above $15K
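The whole example can be replayed against a size-restricted query function. This is a sketch under stated assumptions: the 12 records are made up for illustration (chosen so the counts and sums agree with the slide's figures), and `query`, `Refused`, and the helper predicates are hypothetical names.

```python
from typing import Callable, NamedTuple, Optional


class Rec(NamedTuple):
    sex: str
    dept: str
    pos: str
    salary: int  # in $K


# Assumed records for illustration only, not taken from the slides.
DB = [Rec(*r) for r in [
    ("M", "CS", "Prof", 20), ("M", "Math", "Prof", 15),
    ("F", "Math", "Prof", 25), ("F", "CS", "Prof", 15),
    ("M", "Stat", "Prof", 18), ("F", "Stat", "Prof", 22),
    ("M", "CS", "Admin", 10), ("M", "Math", "Prof", 18),
    ("F", "CS", "Stu", 3), ("M", "Stat", "Admin", 20),
    ("F", "Math", "Prof", 25), ("M", "CS", "Stu", 3),
]]

K = 2  # query set size must satisfy K <= size <= n - K


class Refused(Exception):
    """Raised when the query set size is outside [K, n - K]."""


def query(c: Callable[[Rec], bool], field: Optional[str] = None) -> int:
    """count(C) if field is None, else sum(C; field), with size restriction."""
    matches = [r for r in DB if c(r)]
    if not K <= len(matches) <= len(DB) - K:
        raise Refused
    if field is None:
        return len(matches)
    return sum(getattr(r, field) for r in matches)


def c1(r: Rec) -> bool:      # C1 = F
    return r.sex == "F"


def c2(r: Rec) -> bool:      # C2 = CS AND Prof
    return r.dept == "CS" and r.pos == "Prof"


def c(r: Rec) -> bool:       # C = C1 AND C2, the formula believed to be Dodd
    return c1(r) and c2(r)


def t(r: Rec) -> bool:       # tracker T = C1 AND ~C2
    return c1(r) and not c2(r)


def alpha(r: Rec) -> bool:   # characteristic under test: Salary > $15K
    return r.salary > 15


count_c = query(c1) - query(t)                        # 5 - 4
salary = query(c1, "salary") - query(t, "salary")     # 90 - 75
count_c_alpha = query(lambda r: t(r) or (c1(r) and alpha(r))) - query(t)  # 4 - 4
print(count_c, salary, count_c_alpha)
```

Every query issued here has a query set of 4 or 5 records, so each one passes the size check, while calling `query(c)` directly would raise `Refused`.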