
Microdata Sharing Via Pseudonymization

Microdata Sharing Via Pseudonymization. UNECE Work session on statistical data confidentiality, Manchester, 18 December 2007. Motivation: individual-level microdata is essential for empirical research.

hedwig

Presentation Transcript


  1. Microdata Sharing Via Pseudonymization • UNECE Work session on statistical data confidentiality • Manchester, 18 December 2007

  2. Motivation • Individual-level microdata is essential for empirical research • Releasing it directly compromises the privacy of the individuals • Goal: to build privacy-preserving microdata sharing systems through pseudonymization

  3. Problem statement • Suppliers own confidential microdata on individuals: (id1,D(id1)),…,(idn,D(idn)) • Researchers want to correlate microdata from different Suppliers • Example: a Researcher wants to find the correlation between drug prescriptions (Chemists) and traffic accidents (Insurers) • Question: how can Researchers correlate microdata without having access to sensitive information?

  4. Framework [Figure labels: "I want to correlate"; "Maybe de-identified data?"]

  5. Supplying de-identified data • If Suppliers de-identify the data by removing the identifier field and applying Statistical Disclosure Control (SDC) mechanisms, no sensitive information is leaked, but… • Matching is not possible!

  6. Pseudonymizing data via TTPs • Solution 1: a Trusted Third Party (TTP) replaces real identifiers with random identifiers (pseudonyms) P(id), where P(id) is random • The table of (id, P(id)) pairs is known only to the TTP • Matching is possible!
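
Solution 1 can be illustrated with a minimal Python sketch. The class name `TrustedThirdParty`, its methods, and the sample records are hypothetical and chosen only for illustration; the slides do not prescribe an implementation.

```python
import secrets


class TrustedThirdParty:
    """Toy TTP that maps each real identifier to a fresh random pseudonym."""

    def __init__(self):
        self._table = {}  # id -> P(id); this table is known only to the TTP

    def pseudonym(self, identifier: str) -> str:
        # Draw a random pseudonym the first time an identifier is seen and
        # reuse it afterwards, so one individual always gets the same P(id).
        if identifier not in self._table:
            self._table[identifier] = secrets.token_hex(16)
        return self._table[identifier]

    def pseudonymize(self, records: dict) -> dict:
        # Replace every identifier by its pseudonym; the data is untouched.
        return {self.pseudonym(i): data for i, data in records.items()}


ttp = TrustedThirdParty()
chemists = {"alice": "drug A", "bob": "drug B"}          # made-up records
insurers = {"bob": "1 accident", "carol": "0 accidents"}  # made-up records

chem_p = ttp.pseudonymize(chemists)
ins_p = ttp.pseudonymize(insurers)

# A Researcher can join the two databases on pseudonyms alone.
matched = {p: (chem_p[p], ins_p[p]) for p in chem_p.keys() & ins_p.keys()}
print(matched)
```

The table grows with the number of individuals and must stay secret for as long as the pseudonyms are in use.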

  7. Pseudonymizing data via TTPs (II) • Advantages: • Unconditional security (with respect to pseudonymization) • Matching is possible • Drawback: the TTP must secretly store a huge table • Solution 2: use a block cipher (Enc(K,·),Dec(K,·)) and set P(id)=Enc(K,id) • Advantage: • Only the key K must be stored secretly • Drawbacks: • Security is not unconditional • Different Researchers might not have the same access rights
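
Solution 2 can be sketched as follows. The slides do not name a cipher, so this sketch substitutes a toy 4-round Feistel network built from HMAC-SHA256 for the block cipher; it only illustrates the Enc/Dec interface and is not a vetted construction (a real deployment would use a standard cipher such as AES). The helper names `to_block` and `from_block` and the 16-byte identifier width are assumptions.

```python
import hashlib
import hmac
import secrets

BLOCK = 16   # assumed fixed identifier width in bytes
ROUNDS = 4   # toy round count


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    # Keyed round function: HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[: BLOCK // 2]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def enc(key: bytes, block: bytes) -> bytes:
    """Enc(K, id): Feistel rounds mapping (L, R) to (R, L xor F(R))."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(ROUNDS):
        left, right = right, _xor(left, _round(key, rnd, right))
    return left + right


def dec(key: bytes, block: bytes) -> bytes:
    """Dec(K, P(id)): the same rounds undone in reverse order."""
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for rnd in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, rnd, left)), left
    return left + right


def to_block(identifier: str) -> bytes:
    raw = identifier.encode()
    if len(raw) > BLOCK:
        raise ValueError("identifier does not fit in one block")
    return raw.ljust(BLOCK, b"\0")  # zero padding (assumes no trailing NULs)


def from_block(block: bytes) -> str:
    return block.rstrip(b"\0").decode()


K = secrets.token_bytes(32)           # the only secret the TTP has to store
pnym = enc(K, to_block("alice"))      # P(id) = Enc(K, id)
recovered = from_block(dec(K, pnym))  # only the holder of K can invert it
print(pnym.hex(), recovered)
```

Because Enc(K,·) is deterministic, the same identifier always yields the same pseudonym, so matching still works without any stored table.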

  8. Pseudonymizing data via TTPs (III) [Figure labels: "We share and win!"; "Not allowed to match Chemists and Insurers data"]

  9. Pseudonymizing data via TTPs (IV) • Solution 3: allocate a different key Ki to every Researcher Ri • Pseudonyms are destination-dependent: P(id,Ri)=Enc(Ki,id) • P(id*,R1) and P(id*,R2) look unrelated

  10. Pseudonymizing data via TTPs (V) • Advantage: • Disallowed matching among malicious Researchers is prevented • Drawbacks: • TTP must be on-line to perform sensitive operations (pseudonymization and matching) Let’s see why…

  11. Pseudonymization with symmetric encryption • Supplying pseudonymized data: • Supplier Sj sends the data blocks D(id1),…,D(idl) to Researcher Ri • Sj sends the identities id1,…,idl, in the same order, to the TTP • The TTP sends the list P(id1,Ri),…,P(idl,Ri), where P(id,Ri)=Enc(Ki,id), to Ri • Ri forms the pseudonymized database (P(id1,Ri),D(id1)),…,(P(idl,Ri),D(idl))

  12. Pseudonymization with symmetric encryption • Matching the pseudonymized databases of Ri and Rd: • Ri sends the data D(id1,i),…,D(idl,i) to Rd • Ri sends P(id1,Ri),…,P(idl,Ri) to the TTP • The TTP decrypts Dec(Ki,P(id,Ri))=id and re-encrypts P(id,Rd)=Enc(Kd,id); the result is sent to Rd • Rd matches the pseudonymized databases (P(id1,Rd),D(id1,i)),…,(P(idl,Rd),D(idl,i)) and (P(id1,Rd),D(id1,d)),…,(P(idm,Rd),D(idm,d)) • As a result, the TTP is a bottleneck of the system
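
The supply and matching steps with per-Researcher keys can be sketched in Python as below. As an assumption, a toy 4-round Feistel network over HMAC-SHA256 stands in for the unspecified block cipher; the class `TTP`, its method `translate`, and the sample records are hypothetical names and data used only for illustration.

```python
import hashlib
import hmac
import secrets

H = 8  # half-block size in bytes; identifiers are padded to 2 * H bytes


def _f(key, rnd, half):
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:H]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def enc(key, block):
    left, right = block[:H], block[H:]
    for rnd in range(4):
        left, right = right, _xor(left, _f(key, rnd, right))
    return left + right


def dec(key, block):
    left, right = block[:H], block[H:]
    for rnd in reversed(range(4)):
        left, right = _xor(right, _f(key, rnd, left)), left
    return left + right


def pad(identifier):
    return identifier.encode().ljust(2 * H, b"\0")


class TTP:
    def __init__(self, researchers):
        # One secret key Ki per Researcher Ri.
        self._keys = {r: secrets.token_bytes(32) for r in researchers}

    def pseudonymize(self, ids, researcher):
        # P(id, Ri) = Enc(Ki, id), returned in the order the ids arrived.
        return [enc(self._keys[researcher], pad(i)) for i in ids]

    def translate(self, pnyms, src, dst):
        # Dec(Ki, P(id, Ri)) = id, then P(id, Rd) = Enc(Kd, id).
        return [enc(self._keys[dst], dec(self._keys[src], p)) for p in pnyms]


ttp = TTP(["Ri", "Rd"])

# Supply: data goes straight to the Researcher, identities go to the TTP.
db_i = dict(zip(ttp.pseudonymize(["alice", "bob"], "Ri"), ["drug A", "drug B"]))
db_d = dict(zip(ttp.pseudonymize(["bob", "carol"], "Rd"), ["1 accident", "0 accidents"]))

# Matching: Ri sends its data to Rd and its pseudonyms to the TTP, which
# re-encrypts them under Kd and forwards the result to Rd.
translated = ttp.translate(list(db_i), "Ri", "Rd")
db_i_at_d = dict(zip(translated, db_i.values()))
matched = {p: (db_i_at_d[p], db_d[p]) for p in db_i_at_d.keys() & db_d.keys()}
print(matched)
```

Before translation, the two databases share no pseudonym even though both contain the same individual, which is the destination-dependence of Solution 3. Every `translate` call needs the TTP's keys, which is why the TTP must be on-line for each match.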

  13. Pseudonymization using public key crypto • Let G=<g> be a group of prime order p, and let H: {0,1}* → G be a hash function • The TTP assigns a secret key xi ∈ Zp to Researcher Ri • P(id,Ri)=H(id)^xi • Supplying pseudonymized data from Sj to Ri: • Supplier Sj and Researcher Ri jointly compute the pseudonymized database {P(id,Ri),D(id)} • The TTP allocates pseudonymizing keys (μ,ν) ∈ Zp × Zp such that μ·ν=xi; μ is sent to Sj, ν is sent to Ri • Sj computes and sends H(id1)^μ,…,H(idl)^μ to Ri • Ri computes (H(id)^μ)^ν=H(id)^xi=P(id,Ri) • Ri forms the pseudonymized database (P(id1,Ri),D(id1)),…,(P(idl,Ri),D(idl))
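
The split-key computation can be checked with a small Python sketch. All concrete choices are assumptions made for illustration: the group is the subgroup of squares modulo the toy safe prime q = 2039 (so p = 1019, far too small for any real security), and `hash_to_group` is a hypothetical stand-in for H. The call `pow(mu, -1, P)` requires Python 3.8 or later.

```python
import hashlib
import secrets

# Toy parameters (assumption): Q = 2P + 1 with P and Q prime, so the nonzero
# squares modulo Q form a group G of prime order P.
P = 1019
Q = 2 * P + 1  # 2039


def hash_to_group(identifier: str) -> int:
    """Hypothetical H: {0,1}* -> G: hash to [1, Q-1], then square modulo Q."""
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return pow(1 + h % (Q - 1), 2, Q)


def split_key(x_i: int):
    """TTP side: pick (mu, nu) in Zp x Zp with mu * nu = x_i (mod P)."""
    mu = secrets.randbelow(P - 1) + 1   # nonzero, hence invertible modulo P
    nu = (x_i * pow(mu, -1, P)) % P
    return mu, nu


x_i = secrets.randbelow(P - 1) + 1      # secret key of Ri, held by the TTP
mu, nu = split_key(x_i)                 # mu goes to Sj, nu goes to Ri

records = {"alice": "drug A", "bob": "drug B"}  # made-up supplier data

# Supplier Sj raises each hashed identifier to mu, in record order.
blinded = [pow(hash_to_group(i), mu, Q) for i in records]

# Researcher Ri finishes the exponentiation with nu and attaches the data.
pnym_db = [(pow(b, nu, Q), d) for b, d in zip(blinded, records.values())]

# Check against the definition P(id, Ri) = H(id)^x_i.
for (pnym, _), ident in zip(pnym_db, records):
    assert pnym == pow(hash_to_group(ident), x_i, Q)
print(pnym_db)
```

In this sketch neither party is handed xi directly: Sj receives only μ and Ri only ν, and Ri never sees the identifiers, only the blinded values H(id)^μ.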

  14. Pseudonymization with public key crypto (II) • Matching the pseudonymized databases of Ri and Rd: • This can be done by Ri and Rd with a 1-round interactive protocol, provided certain keys are obtained off-line from the TTP • Neither Ri nor Rd learns its pseudonymizing key (xi, xd), even if they collude • Rd only learns D(id,Ri) for the ids in the intersection • Security is based on the Decision Diffie-Hellman assumption

  15. Pseudonymization with public key crypto (III) • Advantages: • Matching is possible • Disallowed matching among malicious Researchers is prevented • The TTP is not a bottleneck (it only delivers off-line crypto keys) • Drawbacks: • Suppliers must collaborate in every pseudonymization • Protocols are interactive (on-line communication)

  16. Advanced setting

  17. Properties • Suppliers and Accumulators are assumed Honest-But-Curious • Researchers are assumed Malicious • Accumulators’ intersection and union operations are non-interactive • Two levels of pseudonymization corresponding to the different levels of trust • It uses ‘composite bilinear groups’

  18. Governance • From a functional perspective, permission to run these protocols is governed by a Regulatory Privacy Body (RPB). The RPB will enforce a strict licensing infrastructure describing: • Which parties are allowed to perform which protocols with each other • What kind of data can be exchanged • Which subsets of identities or pseudonyms are allowed as input to the protocols

  19. Thanks!
