Explore how auditors may inadvertently compromise your privacy through online auditing methods. Learn about statistical database auditing and the tradeoff between information privacy and data queries. Discover different auditing techniques and their limitations.
Online Auditing - How May Auditors Inadvertently Compromise Your Privacy
Kobbi Nissim, Microsoft
With Nina Mishra, HP/Stanford
Work in progress
The Setting
• Dataset: d = {d1,…,dn}
• Entries di: Real, Integer, Boolean
• Query: q = (f, i1,…,ik), answered with f(di1,…,dik)
• f: Min, Max, Median, Sum, Average, Count, …
• Bad users will try to breach the privacy of individuals
[Figure: a user sends q = (f, i1,…,ik) to the statistical database and receives f(di1,…,dik)]
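The query model on this slide can be sketched in a few lines of Python. This is only an illustration: the names `AGGREGATES` and `answer` are hypothetical, indices are 0-based here (the slide uses 1-based subscripts), and the sample dataset is made up.

```python
from statistics import mean, median

# Aggregate functions f listed on the slide, keyed by a hypothetical name.
AGGREGATES = {
    "min": min,
    "max": max,
    "median": median,
    "sum": sum,
    "average": mean,
    "count": len,
}

def answer(d, q):
    """Answer a query q = (f, (i1, ..., ik)) over dataset d with f(d[i1], ..., d[ik])."""
    f, indices = q
    return AGGREGATES[f]([d[i] for i in indices])

d = [4, 7, 1, 9]                      # made-up dataset d1..d4
q = ("sum", (0, 1, 2))                # sum over the first three entries
print(answer(d, q))
```

Note that a query names entries by position only; the user never sees the entries themselves, only the aggregate.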
The Data Privacy Game: an Information-Privacy Tradeoff
• Private functions:
  • Want to hide each individual entry di
• Information functions:
  • Want to reveal query answers f(di1,…,dik)
• Major question: what may be computed over d (and given to users) without breaching privacy?
• Confidentiality control methods
  • Perturbation methods: give “noisy” answers
  • Query restriction methods: limit the queries users may pose, usually imposing some structure (e.g. size/overlap restrictions)
Auditing
• [AW89] classify auditing as a query restriction method:
  • “Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued”
• Partial motivation: may allow for more queries to be posed, if no privacy threat occurs
• Early work: Hofmann 1977; Schlorer 1976; Chin, Ozsoyoglu 1981, 1986
• Recent interest: Kleinberg, Papadimitriou, Raghavan 2000; Li, Wang, Wang, Jajodia 2002; Jonsson, Krokhin 2003
Auditing
[Figure: the user submits a new query qi+1. The auditor, which keeps the query log q1,…,qi and sits in front of the statistical database, replies either “Here’s the answer” or “Query denied (as the answer would cause privacy loss)”.]
Design choices in Prior Work (1)
1. Privacy definition:
  • Privacy breached (only) when a database entry may be deduced fully, or within some accuracy
  • These privacy guarantees do not generally suffice:
    • Should take into account: adversary’s computational power, prior knowledge, access to other databases, …
2. Exact answers given
  • Auditors viewed as a way to give “quality” answers?
Design choices in Prior Work (2)
3. Which information is taken into account in the auditor decision procedure:
  • Decision made based on queries q1,…,qi, qi+1 and their answers a1,…,ai, ai+1
  • Denials ignored
4. Offline vs. online:
  • Offline auditing: queries and answers checked for compromise at the end of the day
    • Only detects breaches
  • Online auditing: answer/deny queries on the fly
    • Prevents breaches just before they happen
Example 1: Sum/Max auditing
• di real, sum/max queries, privacy breached if some di learned
User: q1 = sum(d1,d2,d3)
Auditor: sum(d1,d2,d3) = 15
User: q2 = max(d1,d2,d3)
Auditor: Denied (the answer would cause privacy loss)
User: Oh well…
Some Prior Work on Auditors

Auditor                    | Data        | Queries | Breach             | Complexity
Sum/Max [Chin]             | real        | sum/max | di learned         | NP-hard
Boolean [KPR00]            | 0/1         | sum     | di learned         | NP-hard*
Max [KPR00]                | real        | max     | di learned         | PTIME
Interval based [LWWJ02]    | di ∈ [a,b]  | sum     | di within accuracy | PTIME
Generalized results [JK03] |             |         |                    | NP-hard / PTIME

* Approx version in PTIME

Can we use the offline version for online auditing?
… After Two Minutes …
• di real, sum/max queries, privacy breached if some di learned
User: q1 = sum(d1,d2,d3)
Auditor: sum(d1,d2,d3) = 15
User: q2 = max(d1,d2,d3)
Auditor: Denied (the answer would cause privacy loss)
User: Oh well…
User (after two minutes): There must be a reason for the denial… q2 is denied iff d1 = d2 = d3 = 5. I win!
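The inference in this exchange rests on one fact: the maximum of k reals is at least their average, with equality only when all entries are equal. So after sum = 15 over three entries, a max answer of 5 would pin every entry to 5, while any larger answer leaves each entry undetermined. The sketch below models only this two-query exchange; `naive_max_decision` and `adversary_inference` are hypothetical names, and integer data is assumed so that the equality test is exact.

```python
def naive_max_decision(d, sum_answer):
    """Naive auditor's reply to max(d1..dk), given that sum(d1..dk) was already answered.

    It looks at the true answer and denies only when that answer would
    reveal an entry, which is what makes the denial itself informative.
    """
    k = len(d)
    m = max(d)
    # max >= average always; max == average forces every entry to equal it.
    if m * k == sum_answer:
        return "denied"
    return m

def adversary_inference(k, sum_answer, decision):
    """What the adversary learns from the auditor's reply alone."""
    if decision == "denied":
        return [sum_answer / k] * k   # every entry equals the average
    return None                       # no single entry is pinned down

print(naive_max_decision([5, 5, 5], 15))       # the slide's scenario
print(adversary_inference(3, 15, "denied"))
```

The denial is a function of the secret data, so it carries exactly the information the auditor was trying to withhold.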
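Example 2: Interval Based Auditing
• di ∈ [0,100], sum queries, accuracy parameter = 1 (PTIME)
User: q1 = sum(d1,d2)
Auditor: Sorry, denied
User: q2 = sum(d2,d3)
Auditor: sum(d2,d3) = 50
• Denial of q1 ⇒ d1,d2 ∈ [0,1] or d1,d2 ∈ [99,100]
• Together with sum(d2,d3) = 50 ⇒ d1,d2 ∈ [0,1] and d3 ∈ [49,50]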
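The two inference steps in this example can be checked with a small sketch. It is a simplified stand-in for the interval-based auditor, not the algorithm of [LWWJ02]: it handles a single two-entry sum query, and it assumes that a breach means an entry is confined to an interval of length at most 1. All function names are hypothetical.

```python
LO, HI = 0.0, 100.0    # each di lies in [LO, HI]
ACCURACY = 1.0         # assumed breach threshold: interval of length <= 1

def feasible_interval(s):
    """Range of di consistent with di + dj = s where both lie in [LO, HI]."""
    return max(LO, s - HI), min(HI, s - LO)

def naive_interval_decision(di, dj):
    """Deny sum(di, dj) when the true answer would confine an entry too tightly."""
    s = di + dj
    lo, hi = feasible_interval(s)
    return "denied" if hi - lo <= ACCURACY else s

def adversary_after_denial(s23):
    """Combine 'sum(d1,d2) was denied' with the answer sum(d2,d3) = s23.

    A denial means d1 + d2 <= 1 or d1 + d2 >= 199, so d2 lies in
    [0, 1] or in [99, 100]. Returns the surviving (d2 range, d3 range) pairs.
    """
    results = []
    for a, b in [(LO, LO + ACCURACY), (HI - ACCURACY, HI)]:
        lo2, hi2 = max(a, s23 - HI), min(b, s23 - LO)
        if lo2 <= hi2:
            results.append(((lo2, hi2), (s23 - hi2, s23 - lo2)))
    return results

print(naive_interval_decision(0.25, 0.5))   # denied: both entries are tiny
print(adversary_after_denial(50.0))         # only the [0, 1] case survives
```

With sum(d2,d3) = 50 the branch d2 ∈ [99,100] is impossible, which is why the adversary ends up with d3 confined to [49,50] even though that query was answered "safely".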
Sounds Familiar?
• Colonel Oliver North, on the Iran-Contra Arms Deal: On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights.
• David Duncan, former auditor for Enron and partner in Andersen: Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States.
Max Auditing
• di real
User: q1 = max(d1,d2,d3,d4)
Auditor: M1234
User: q2 = max(d1,d2,d3)
Auditor: M123 / denied
  • If denied: d4 = M1234
User: q3 = max(d1,d2)
Auditor: M12 / denied
  • If denied: d3 = M123
[Figure: the database entries d1,…,dn]
Adversary’s Success
User: q1 = max(d1,d2,d3,d4)
User: q2 = max(d1,d2,d3)
  • Denied with probability 1/4. If denied: d4 = M1234
User: q3 = max(d1,d2)
  • Denied with probability 1/3. If denied: d3 = M123
• Success probability: 1/4 + (1 - 1/4)·1/3 = 1/2
• Recover 1/8 of the database!
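The success probability above can be verified by exact enumeration rather than by the formula. The sketch assumes that the four entries are distinct and that all 24 relative orders are equally likely (an assumption about the data that the slide leaves implicit); `adversary_learns_entry` is a hypothetical name.

```python
from fractions import Fraction
from itertools import permutations

def adversary_learns_entry(d):
    """True if the three max queries on one block of four entries reveal an entry."""
    d1, d2, d3, d4 = d
    m1234 = max(d)                  # q1 is answered
    if d4 == m1234:                 # q2 denied, so d4 = M1234
        return True
    # q2 was answered, so M123 is known; q3 is denied iff d3 = M123.
    return d3 == max(d1, d2, d3)

orders = list(permutations([1, 2, 3, 4]))
success = Fraction(sum(adversary_learns_entry(o) for o in orders), len(orders))
print(success)        # probability of learning one entry per block
print(success / 4)    # expected fraction of the database recovered
```

Each block of four entries yields one revealed entry with probability 1/2, so repeating the attack on disjoint blocks recovers an expected 1/8 of the database.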
Boolean Auditing?
• di Boolean
User: q1 = sum(d1,d2)
Auditor: 1 / denied
User: q2 = sum(d2,d3)
Auditor: 1 / denied
…
• qi denied iff di = di+1 ⇒ learn the database or its complement
• Let di, dj, dk not all equal, where qi-1, qi, qj-1, qj, qk-1, qk were all denied
User: q = sum(di,dj,dk)
Auditor: 1 / 2
• Recover the entire database!
[Figure: the database entries d1,…,dn]
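The first stage of this attack, learning the database up to complement from the pattern of denials, can be sketched as follows. The auditor here is a naive one that denies exactly when the true pair sum is 0 or 2 (either would reveal both bits); the function names and the sample database are hypothetical.

```python
def naive_boolean_decision(d, i):
    """Naive auditor's reply to sum(d[i], d[i+1]) over Boolean entries."""
    s = d[i] + d[i + 1]
    # A sum of 0 or 2 reveals both bits, so it is denied; a sum of 1 is answered.
    return s if s == 1 else "denied"

def recover_up_to_complement(decisions):
    """Rebuild the database from replies alone, guessing the first bit is 0."""
    guess = [0]
    for reply in decisions:
        same = reply == "denied"            # denied iff neighbours are equal
        guess.append(guess[-1] if same else 1 - guess[-1])
    return guess

d = [1, 1, 0, 1, 0, 0]                      # made-up secret database
decisions = [naive_boolean_decision(d, i) for i in range(len(d) - 1)]
guess = recover_up_to_complement(decisions)
complement = [1 - b for b in guess]
print(guess, complement)
```

One of `guess` and `complement` equals the secret database; the final sum query over three entries that are not all equal, as described on the slide, is what tells the adversary which one.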
Two Problems
• Obvious problem: denied queries ignored
  • Algorithmic problem: not clear how to incorporate denials in the decision
• Subtle problem:
  • Query denials leak (potentially sensitive) information
  • Users cannot decide denials by themselves
[Figure: the set of possible assignments to {d1,…,dn}; inside it, the assignments consistent with (q1,…,qi, a1,…,ai); inside that, the assignments for which qi+1 is denied]
A Spectrum of Auditors
Auditors arranged by the information the decision on qi+1 depends on, from more privacy to more utility:
• q1,…,qi, qi+1 only: “Safe” (e.g. size/overlap restriction, algebraic structure)
• q1,…,qi, qi+1 and a1,…,ai: “Safe”
• q1,…,qi, qi+1 and a1,…,ai, ai+1: “Unsafe”*
* Note: can work in the “unsafe” region, but need to prove denials do not leak crucial information
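Simulatable Auditing*
• An auditor is simulatable if a simulator exists s.t. the simulator, given only q1,…,qi, a1,…,ai and qi+1, reaches the same deny/answer decision as the auditor, which has the query log q1,…,qi and the statistical database
• Simulation ⇒ denials do not leak information
* “self auditors” in [DN03]
[Figure: the auditor (with q1,…,qi and the statistical database) and the simulator (with q1,…,qi, a1,…,ai) both receive qi+1 and both output deny/answer]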
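One way to make the definition concrete is a brute-force decision procedure that never reads the real database. The sketch below is an assumption-laden illustration, not the construction from the talk: it is restricted to small Boolean datasets, it uses the prior-work privacy definition (breach = some entry fully determined), and it denies whenever any answer consistent with the past could cause a breach. The `parity` query is a hypothetical aggregate added only to show a query that gets answered.

```python
from itertools import product

def evaluate(d, query):
    """Apply query = (f, indices) to a candidate dataset d."""
    f, idx = query
    return f([d[i] for i in idx])

def consistent(n, log):
    """All Boolean datasets of size n that agree with the answered queries in log."""
    return [d for d in product((0, 1), repeat=n)
            if all(evaluate(d, q) == a for q, a in log)]

def some_entry_determined(datasets):
    """True if some position holds the same value in every remaining dataset."""
    return any(len({d[i] for d in datasets}) == 1 for i in range(len(datasets[0])))

def simulatable_decision(n, log, query):
    """Decide from past queries, past answers and the new query only."""
    candidates = consistent(n, log)
    for possible in {evaluate(d, query) for d in candidates}:
        narrowed = [d for d in candidates if evaluate(d, query) == possible]
        if some_entry_determined(narrowed):
            return "deny"      # some answer consistent with the past would breach
    return "answer"

def parity(xs):
    """Hypothetical query function, not one of the aggregates in the slides."""
    return sum(xs) % 2

print(simulatable_decision(3, [], (sum, (0, 1))))      # sum 0 or 2 would reveal bits
print(simulatable_decision(3, [], (parity, (0, 1))))   # no answer pins down a bit
```

Because the decision is computed from information the user already holds, the user could compute it too, so a denial tells them nothing new. The price is utility: here every pair-sum query over Boolean data is denied, including ones whose true answer would have been harmless.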
Why Do Simulatable Auditors Not Leak Information?
[Figure: the set of possible assignments to {d1,…,dn}; inside it, the assignments consistent with (q1,…,qi, a1,…,ai), labeled “qi+1 denied/allowed”]
Summary
• Improper usage of auditors may lead to privacy breaches, due to information leakage in the decision procedure
  • Cell suppression / some k-anonymity methods should be checked similarly
• Should make sure offline auditors do not leak information in decision
• Simulatable auditors provably don’t leak information
  • Give best utility while still “safe”
  • A launching point for further research on auditors
• Further research:
  • Auditors with more reasonable privacy guarantees