Sovereign Information Integration

Presentation Transcript


  1. Sovereign Information Integration • Rakesh Agrawal • Joint work with Srikant & Evfimievski

  2. Outline • Motivation • Problem Statement • Protocols • Challenges

  3. Information Integration Today • Assumption: Information in each database can be freely shared. [Figure: Centralized and Federated architectures; labels: Mediator, Q, R]

  4. Need for a new style of information sharing • Compute queries across databases so that no more information than necessary is revealed (without using a trusted third party). • Need is driven by several trends: • End-to-end integration of information systems across companies. • Simultaneously compete and cooperate. • Security: need-to-know information sharing

  5. Selective Document Sharing • R is shopping for technology. • S has intellectual property it may want to license. • First find the specific technologies where there is a match, and then reveal further information about those. [Figure: R's Shopping List and S's Technology List] • Example 2: Government agencies sharing information on a need-to-know basis.

  6. Medical Research • Validate a hypothesized link between an adverse reaction to a drug and a specific DNA sequence. • Researchers should not learn anything beyond 4 counts. [Figure: labels DNA Sequences, Mayo Clinic, Drug Reactions]

  7. Caveats • Schema Discovery & Heterogeneity • Multiple Queries

  8. [Figure: architecture diagram; labels: Minimal Necessary, Mediator, Q, R] • And of course… hybrids of Centralized, Federated, and Sovereign architectures

  9. Outline • Motivation • Problem Statement • Protocols • Challenges

  10. Minimal Necessary Sharing • R ∩ S: • R must not know that S has b & y • S must not know that R has a & x • Count(R ∩ S): • R & S do not learn anything except that the result is 2. [Figure: tables R and S with their intersection]

  11. Problem Statement: Minimal Sharing • Given: • Two parties (honest-but-curious): R (receiver) and S (sender) • Query Q spanning the tables R and S • Additional (pre-specified) categories of information I • Compute the answer to Q and return it to R without revealing any additional information to either party, except for the information contained in I • For intersection, intersection size & equijoin, I = { |R| , |S| } • For equijoin size, I also includes the distribution of duplicates & some subset of information in R ∩ S

  12. A Possible Approach • Secure Multi-Party Computation • Given two parties with inputs x and y, compute f(x,y) such that the parties learn only f(x,y) and nothing else. • Can be solved by building a combinatorial circuit, and simulating that circuit [Yao86]. • Prohibitive cost for database-size problems. • Intersection of two relations of a million records each would require 144 days (Yao’s protocol)

  13. Outline • Motivation • Problem Statement • Protocols • Challenges

  14. Intersection Protocol: Intuition • Want to encrypt the values in R and S and compare the encrypted values. • However, want an encryption function that can only be computed jointly by R and S, not by either party separately.

  15. Commutative Encryption • Commutative encryption F is a computable function f : Key F × Dom F → Dom F, satisfying: • For all e, e′ ∈ Key F, f_e ∘ f_e′ = f_e′ ∘ f_e (The result of encryption with two different keys is the same, irrespective of the order of encryption) • Each f_e is a bijection. (Two different values will have different encrypted values) • The distribution of <x, f_e(x), y, f_e(y)> is indistinguishable from the distribution of <x, f_e(x), y, z>, where x, y, z ∈r Dom F and e ∈r Key F (∈r: chosen at random). (Given a value x and its encryption f_e(x), for a new value y, we cannot distinguish between f_e(y) and a random value z. Thus we can neither encrypt y nor decrypt f_e(y).)

  16. Example Commutative Encryption • f_e(x) = x^e mod p, where • p: safe prime number, i.e., both p and q = (p-1)/2 are primes • encryption key e ∈ {1, 2, …, q-1} • Dom F: all quadratic residues modulo p • Commutativity: powers commute: (x^d mod p)^e mod p = x^(de) mod p = (x^e mod p)^d mod p • Indistinguishability follows from the Decisional Diffie-Hellman Hypothesis (DDH)
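
The power function on this slide can be tried out in a few lines of Python. This is only a toy sketch: the prime 2039 (with q = 1019) is far too small to be secure, and the names `keygen`, `to_domain`, and `f` are illustrative, not taken from the slides. Squaring an integer in 1..q is used here as a stand-in for mapping a value into the quadratic residues.

```python
import secrets

P = 2039            # toy safe prime (assumption): P and Q = (P - 1) // 2 are both prime
Q = (P - 1) // 2    # 1019

def keygen():
    """Pick an encryption key e from {1, ..., Q-1}."""
    return secrets.randbelow(Q - 1) + 1

def to_domain(v):
    """Map an integer in 1..Q to a quadratic residue mod P (toy stand-in for a hash)."""
    if not 1 <= v <= Q:
        raise ValueError("toy domain only covers integers 1..Q")
    return v * v % P

def f(e, x):
    """Commutative encryption f_e(x) = x^e mod P."""
    return pow(x, e, P)

e, d = keygen(), keygen()
x = to_domain(42)
# Encrypting with e then d gives the same result as d then e.
print(f(d, f(e, x)) == f(e, f(d, x)))
```

Because the quadratic residues modulo a safe prime form a group of prime order q, every key in 1..q-1 gives a bijection on that group, which is what the second property on the previous slide asks for.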

  17. Intersection Protocol • R holds secret key r; S holds secret key s. • f_s(S) is shorthand for { f_s(x) | x ∈ S }. • We apply f_s on h(S), where h is a hash function, not directly on S. [Figure: R and S, each with its secret key]

  18. Intersection Protocol • S sends f_s(S) to R. • R computes f_r(f_s(S)), which by the commutative property equals f_s(f_r(S)).

  19. Intersection Protocol • R holds f_s(f_r(S)). • R sends f_r(R) to S. • S returns <y, f_s(y)> for y ∈ f_r(R). • Since R knows <x, y = f_r(x)>, R obtains <x, f_s(f_r(x))> for x ∈ R. • A value x ∈ R is in R ∩ S exactly when f_s(f_r(x)) appears in f_s(f_r(S)).
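
The message flow of the intersection protocol can be simulated in one process as follows. This is a hedged sketch, not the authors' implementation: both parties run inside a single function, the prime 2039 is a toy size, values are restricted to integers 1..1019, and squaring replaces the hash h. All function names are hypothetical.

```python
import secrets

P = 2039            # toy safe prime (assumption), Q = (P - 1) // 2 is also prime
Q = (P - 1) // 2

def keygen():
    return secrets.randbelow(Q - 1) + 1

def h(v):
    """Toy stand-in for the hash: injective on integers 1..Q, lands in the quadratic residues."""
    return v * v % P

def f(e, x):
    return pow(x, e, P)

def intersection(r_vals, s_vals):
    r, s = keygen(), keygen()            # secret keys of R and S
    # S -> R: f_s(h(S)), sorted so the order reveals nothing about S's rows
    ys = sorted(f(s, h(v)) for v in s_vals)
    # R: apply its own key on top, giving f_r(f_s(h(S)))
    zs = {f(r, y) for y in ys}
    # R -> S: f_r(h(R)), sorted
    yr = sorted(f(r, h(v)) for v in r_vals)
    # S -> R: pairs <y, f_s(y)> for y in f_r(h(R))
    pairs = {y: f(s, y) for y in yr}
    # R knows <x, y = f_r(h(x))>, so it can test each of its own values
    return {v for v in r_vals if pairs[f(r, h(v))] in zs}

print(intersection({1, 5, 9, 100}, {5, 7, 100}))
```

In this simulation R ends up with the common values and sees only encrypted forms of S's other values; S sees only f_r(h(R)).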

  20. Intersection Size Protocol • S sends f_s(S) to R; R sends f_r(R) to S. • R computes f_r(f_s(S)). • S returns f_s(f_r(R)) = f_r(f_s(R)), not the pairs <y, f_s(y)> for y ∈ f_r(R). • R cannot map z ∈ f_r(f_s(R)) back to x ∈ R.
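
The size variant differs from the intersection protocol only in what S sends back. A minimal single-process sketch under the same toy assumptions (prime 2039, integer values 1..1019, squaring in place of the hash h, hypothetical function names):

```python
import secrets

P = 2039            # toy safe prime (assumption)
Q = (P - 1) // 2

def keygen():
    return secrets.randbelow(Q - 1) + 1

def h(v):
    """Toy stand-in for the hash, injective on integers 1..Q."""
    return v * v % P

def f(e, x):
    return pow(x, e, P)

def intersection_size(r_vals, s_vals):
    r, s = keygen(), keygen()
    ys = sorted(f(s, h(v)) for v in s_vals)   # S -> R: f_s(h(S))
    yr = sorted(f(r, h(v)) for v in r_vals)   # R -> S: f_r(h(R))
    z_s = {f(r, y) for y in ys}               # R computes f_r(f_s(h(S)))
    # S -> R: f_s(f_r(h(R))) as a re-sorted list, without the <y, f_s(y)> pairing,
    # so R cannot tell which of its own values produced which z
    z_r = sorted(f(s, y) for y in yr)
    return len(z_s.intersection(z_r))

print(intersection_size({1, 5, 9, 100}, {5, 7, 100}))
```

Re-sorting the doubly encrypted list is what breaks the link between z and x in this sketch.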

  21. Equijoin Protocol: Intuition • R needs some extra information ext(v) for values v ∈ R ∩ S. • ext(v): information about the other attributes in S for those records where S.A = v • S has a second secret key s′ • For each value v ∈ S, • S generates an encryption key κ = f_s′(v), and • encrypts ext(v) using encryption function K with key κ. • R learns f_s′(v) only for v ∈ R. • f_r⁻¹(f_s′(f_r(v))) = f_r⁻¹(f_r(f_s′(v))) = f_s′(v)
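
The key-recovery identity at the end of this slide can be checked numerically. The sketch below is illustrative only: it assumes the power-function encryption with a toy prime, inverts f_r by raising to r⁻¹ mod q, and uses an XOR pad derived from SHA-256 as a stand-in for the encryption function K, which the slide does not specify. It needs Python 3.8 or later for `pow(r, -1, Q)`.

```python
import hashlib
import secrets

P = 2039            # toy safe prime (assumption)
Q = (P - 1) // 2

def keygen():
    return secrets.randbelow(Q - 1) + 1

def h(v):
    """Toy stand-in for the hash, injective on integers 1..Q."""
    return v * v % P

def f(e, x):
    return pow(x, e, P)

def f_inv(e, y):
    """Invert f_e on the quadratic residues: exponent e^-1 mod Q."""
    return pow(y, pow(e, -1, Q), P)

def K(kappa, data):
    """Hypothetical symmetric cipher: XOR with a SHA-256 pad; applying it twice decrypts."""
    pad = hashlib.sha256(str(kappa).encode()).digest()
    if len(data) > len(pad):
        raise ValueError("toy cipher handles at most 32 bytes")
    return bytes(a ^ b for a, b in zip(data, pad))

r, s2 = keygen(), keygen()       # R's key r and S's second key s'
v = 42                           # a join value held by both parties
kappa = f(s2, h(v))              # S derives the key for ext(v)
ciphertext = K(kappa, b"ext(42)")

# R sends f_r(h(v)); S applies f_s'; R strips its own layer with f_r^-1.
recovered = f_inv(r, f(s2, f(r, h(v))))
print(recovered == kappa, K(recovered, ciphertext))
```

R can recover the key only for values it actually holds, since it must supply f_r(h(v)) itself.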

  22. Equijoin and Join Size • See the SIGMOD 2003 paper, which also gives the correctness proofs and the cost analysis of the protocols

  23. Related Work • [Naor & Pinkas 99]: Two protocols for list intersection problem • Oblivious evaluation of n polynomials of degree n each. • Oblivious evaluation of n² linear polynomials. • [Huberman et al 99]: find people with common preferences, without revealing the preferences. • Intersection protocols are similar • [Clifton et al, 2003]: Secure set union and set intersection • Similar protocols

  24. Summary and Challenges • New applications require us to go beyond traditional centralized and federated information integration: sovereign information integration • Need models of minimal disclosure and corresponding protocols for • other database operations • combination of operations • Need faster protocols • Need further study of tradeoff between efficiency and • additional information disclosed • approximation

  25. Privacy Preserving Data Mining • Insight: Preserve privacy at the individual level, while still building accurate data mining models at the aggregate level. • Add random noise to individual values to protect privacy. • EM algorithm to estimate original distribution of values given randomized values + randomization function. • Algorithms for building classification models and discovering association rules on top of privacy-preserved data with only small loss of accuracy. [Figure: original records (30 | 70K | ..., 50 | 40K | ...), labeled Alice's age, Alice's salary, and Bob's age, pass through Randomizers (30+35 gives 65) to produce randomized records (65 | 20K | ..., 25 | 60K | ...); the distributions of Age and Salary are reconstructed and fed to Data Mining Algorithms, which produce a Data Mining Model]
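
The randomization step can be illustrated with a short sketch. It shows only the noise-adding side, not the EM reconstruction mentioned on the slide. The uniform noise, its spread of 50, and the synthetic ages are all assumptions chosen for illustration.

```python
import random
import statistics

def randomize(values, spread, rng):
    """Add independent uniform noise from [-spread, +spread] to each value."""
    return [v + rng.uniform(-spread, spread) for v in values]

rng = random.Random(0)                                  # fixed seed, for repeatability only
ages = [rng.randint(20, 60) for _ in range(10_000)]     # hypothetical true ages
noisy = randomize(ages, spread=50, rng=rng)

true_mean = statistics.mean(ages)
noisy_mean = statistics.mean(noisy)
# Individual values move a lot, but the aggregate mean barely changes.
print(round(true_mean, 1), round(noisy_mean, 1))
```

Each individual value is heavily perturbed, while an aggregate such as the mean stays close to the original, which is the insight the slide states.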

  26. Hippocratic Database • Vision: Database systems that take responsibility for the privacy of data they manage, while not impeding the flow of information. • Architectural principles derived from current privacy legislation. [Figure: architecture diagram; labels: Queries, Privacy Policy, Data Collection, Other, Attribute Access Control, Data Collection Analyzer, Privacy Constraint Validator, Privacy Metadata Creator, Query Intrusion Detector, Data Retention Manager, Data Accuracy Analyzer, Audit Info, Audit Info Store, Privacy Metadata, Encryption Support, Record Access Control, Audit Trail; Table Size: 10 million, no index]
