Detecting Data Leakage

Presentation Transcript

  1. Detecting Data Leakage Panagiotis Papadimitriou papadimitriou@stanford.edu Hector Garcia-Molina hector@cs.stanford.edu

  2. Leakage Problem [Figure: user profiles (Name: Sarah, Sex: Female, …; Name: Mark, Sex: Male, …) for Kathryn, Jeremy, Sarah and Mark; applications U1 and U2; other sources, e.g. Sarah’s network] Stanford Infolab

  3. Outline • Problem Description • Guilt Models: Pr{U1 leaked data} = 0.7, Pr{U2 leaked data} = 0.2 • Distribution Strategies

  4. Problem Description • Guilt Models • Distribution Strategies

  5. Problem Entities

  6. Agents’ Data Requests • Sample: 100 profiles of Stanford people • Explicit: all people who added the application (the example used so far); all Stanford profiles

  7. Problem Description • Guilt Models • Distribution Strategies

  8. Guilt Models (1/3) • p: posterior probability that a leaked profile comes from other sources (e.g. Sarah’s network) • Guilty agent: an agent who leaks at least one profile • Pr{Gi|S}: probability that agent Ui is guilty, given the leaked set of profiles S

  9. Guilt Models (2/3) • Agents leak all their data items OR nothing • Agents leak each of their data items independently [Figure: cases labeled with probabilities p², p(1-p), (1-p)p and (1-p)²]

  10. Guilt Models (3/3) [Figure: two plots with axes Pr{G1} and Pr{G2}, one for agents that leak items independently and one for agents that do NOT leak independently]
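The independent-leak model can be made concrete with a small sketch. It rests on assumptions that the slides do not spell out: each leaked profile comes from other sources with probability p, otherwise exactly one of the agents holding it leaked it, each holder being equally likely, and profiles are treated independently. The function name `guilt_probability` and the example allocation are hypothetical.

```python
def guilt_probability(agent, leaked, allocations, p):
    """Estimate Pr{G_agent | S} under the assumed independent-leak model.

    allocations maps each agent to the set of profiles it received,
    leaked is the leaked set S, and p is the probability that a leaked
    profile came from other sources.
    """
    prob_innocent = 1.0
    for item in leaked & allocations[agent]:
        # Number of agents that were given this profile.
        holders = sum(1 for given in allocations.values() if item in given)
        # Assumption: with probability (1 - p) an agent leaked the profile,
        # and every holder is equally likely to be that agent.
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent


# Hypothetical example: U1 holds both leaked profiles, U2 holds only one.
allocations = {"U1": {"Sarah", "Mark"}, "U2": {"Sarah"}}
leaked = {"Sarah", "Mark"}
for agent in allocations:
    print(agent, round(guilt_probability(agent, leaked, allocations, p=0.2), 2))
```

In this example U1 comes out as far more likely to be guilty than U2, because Mark's profile was given to U1 only.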

  11. Problem Description • Guilt Models • Distribution Strategies

  12. The Distributor’s Objective (1/2) [Figure: agents U1 to U4 send requests to the distributor and receive sets R1 to R4; S is the leaked set] • Pr{G1|S} >> Pr{G2|S} • Pr{G1|S} >> Pr{G4|S}

  13. The Distributor’s Objective (2/2) • To achieve his objective, the distributor has to distribute sets R1, …, Rn that minimize the data shared among agents [formula not preserved in this transcript] • Intuition: minimizing data sharing among agents makes the leaked data reveal the guilty agents
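The transcript does not preserve the formula on this slide, so the sketch below uses one plausible overlap measure that matches the stated intuition: for each agent, count the profiles it shares with every other agent, normalize by the size of its own set, and sum over agents. This particular measure and the name `overlap_objective` are assumptions made for illustration, not the slide's definition.

```python
def overlap_objective(allocations):
    """Sum over agents of (profiles shared with other agents) / (own set size).

    allocations maps each agent to the non-empty set of profiles it received.
    A value of 0 means all sets are pairwise disjoint.
    """
    total = 0.0
    for agent, own in allocations.items():
        shared = sum(
            len(own & other)
            for other_agent, other in allocations.items()
            if other_agent != agent
        )
        total += shared / len(own)
    return total


# Hypothetical allocations: fully disjoint versus fully overlapping.
disjoint = {"U1": {"Kathryn", "Jeremy"}, "U2": {"Sarah", "Mark"}}
identical = {"U1": {"Kathryn", "Jeremy"}, "U2": {"Kathryn", "Jeremy"}}
print(overlap_objective(disjoint))   # 0.0
print(overlap_objective(identical))  # 2.0
```

A lower value means the agents share less data, so a leaked set points more clearly to one agent.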

  14. Distribution Strategies – Sample (1/4) • Set T has four profiles: Kathryn, Jeremy, Sarah and Mark • There are 4 agents: U1, U2, U3 and U4 • Each agent requests a sample of any 2 profiles of T for a market survey

  15. Distribution Strategies – Sample (2/4) [Figure: two allocations of profiles to U1, U2, U3 and U4, one labeled "Poor" and one labeled "Minimize"]

  16. Distribution Strategies – Sample (3/4) • Optimal Distribution • Avoid full overlaps and minimize [formula not preserved in this transcript] [Figure: allocation of profiles to U1, U2, U3 and U4]
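For the four-profile, four-agent example, a simple greedy heuristic is enough to avoid full overlaps. The sketch below is an illustration only, not the allocation algorithm from the talk: each agent's sample is built one profile at a time, always picking the profile that keeps the largest overlap with any earlier agent as small as possible, and breaking ties in favor of profiles that have been handed out least often. The function name `allocate_samples` is hypothetical.

```python
def allocate_samples(items, requests):
    """Greedy sample allocation (illustrative heuristic).

    items is the list of profiles in T; requests maps each agent to the
    number of profiles it asked for (assumed not to exceed len(items)).
    """
    allocations = {}
    times_given = {item: 0 for item in items}
    for agent, sample_size in requests.items():
        chosen = set()
        for _ in range(sample_size):

            def cost(item):
                # Worst-case overlap with any earlier agent if item is added.
                worst = max(
                    (len((chosen | {item}) & given) for given in allocations.values()),
                    default=0,
                )
                return (worst, times_given[item])

            candidates = [item for item in items if item not in chosen]
            pick = min(candidates, key=cost)
            chosen.add(pick)
            times_given[pick] += 1
        allocations[agent] = chosen
    return allocations


profiles = ["Kathryn", "Jeremy", "Sarah", "Mark"]
result = allocate_samples(profiles, {"U1": 2, "U2": 2, "U3": 2, "U4": 2})
for agent, sample in result.items():
    print(agent, sorted(sample))
```

With these inputs no two agents receive the same pair of profiles, and every profile is given to exactly two agents.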

  17. Distribution Strategies – Sample (4/4)

  18. Distribution Strategies • Sample Data Requests: The distributor has the freedom to select the data items to provide the agents with • General Idea: Provide agents with sets of data that are as disjoint as possible • Problem: There are cases where the distributed data must overlap, e.g., |R1|+…+|Rn|>|T| • Explicit Data Requests (NOT COVERED in this talk): The distributor must provide agents with the data they request • General Idea: Add fake data to the distributed data to minimize the overlap of the distributed data • Problem: Agents can collude and identify fake data

  19. Conclusions • Data leakage modeled as a maximum likelihood problem • Data distribution strategies that help identify the guilty agents

  20. Thank You!