
DISTRIBUTING DATA FOR SECURE DATA SERVICES

DISTRIBUTING DATA FOR SECURE DATA SERVICES. Vignesh Ganapathy, Dilys Thomas, Tomas Feder, Hector Garcia Molina, Rajeev Motwani. March 25, 2011. Stanford, TRDDC, TRUST. Road Map: Motivation for Secure Databases; Distributing Data; Encryption, Distribution; Privacy Constraints


Presentation Transcript


  1. DISTRIBUTING DATA FOR SECURE DATA SERVICES Vignesh Ganapathy, Dilys Thomas, Tomas Feder, Hector Garcia Molina, Rajeev Motwani March 25, 2011 Stanford, TRDDC, TRUST

  2. Road Map • Motivation for Secure Databases • Distributing Data • Encryption, Distribution • Privacy Constraints • Schema Decomposition • Query Partitioning • Cost Estimation • Where and Select clause processing • Query Decomposition • Experiments • Related Work

  3. Motivation 1: Data Privacy in Enterprises • Health: Personal medical details, Disease history, Clinical research data • Govt. Agencies: Census records, Economic surveys, Hospital records • Banking: Bank statement, Loan details, Transaction history • Manufacturing: Process details, Blueprints, Production data • Finance: Portfolio information, Credit history, Transaction records, Investment details • Outsourcing: Customer data for testing, Remote DB Administration, BPO & KPO • Insurance: Claims records, Accident history, Policy details • Retail Business: Inventory records, Individual credit card details, Audits

  4. Motivation 2: Government Regulations

  5. Motivation 3: Personal Information • Emails • Searches on Google/Yahoo • Profiles on Social Networking sites • Passwords / Credit Card / Personal information at multiple E-commerce sites / Organizations • Documents on the Computer / Network

  6. Data Privacy • Value disclosure: What is the value of attribute salary of person X? • Perturbation - Privacy Preserving OLAP • Identity disclosure: Is an individual present in the database table? • Randomization, K-Anonymity etc. - Data for Outsourcing / Research • Linkage disclosure: Linking columns from multiple sites

  7. Losses due to Lack of Privacy: ID-Theft • 3% of households in the US affected by ID-Theft • US $5-50B losses/year • UK £1.7B losses/year • AUD $1-4B losses/year

  8. Road Map • Motivation for Secure Databases • Distributing Data • Encryption, Distribution • Privacy Constraints • Schema Decomposition • Query Partitioning • Cost Estimation • Where and Select clause processing • Query Decomposition • Experiments • Related Work

  9. Two Can Keep a Secret: A Distributed Architecture for Secure Database Services • How to distribute data across multiple sites for redundancy and privacy, so that a single site being compromised does not lead to data loss • Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi, Motwani, Srivastava, Thomas, Xu, CIDR 2005

  10. Cloud Data Services • Data outsourcing growing in popularity • Cheap, reliable data storage and management • 1 TB at $399, i.e. < $0.5 per GB • $5000 for Oracle 10g / SQL Server • $68k/year for a DB administrator • Privacy concerns looming ever larger • High-profile thefts (often insiders) • UCLA lost 900k records • Berkeley lost laptop with sensitive information • Acxiom, JP Morgan, Choicepoint • www.privacyrights.org

  11. Present solutions • Application level: Salesforce.com • On-Demand Customer Relationship Management • $65/User/Month ---- $995 / 5 Users / 1 Year • Amazon Elastic Compute Cloud • 1 instance = 1.7 GHz x86 processor, 1.75 GB RAM, 160 GB local disk, 250 Mb/s network bandwidth • Elastic, Completely controlled, Reliable, Secure • $0.10 per instance hour • $0.20 per GB of data in/out of Amazon • $0.15 per GB-Month of Amazon S3 storage used • Google Apps for your domain • Small businesses, Enterprise, School, Family or Group

  12. Encryption Based Solution • [Diagram labels: Client, Encrypt, DSP, Query Q, Q', Client-side Processor, "Relevant Data", Answer] • Problem: Q' becomes "SELECT *"

  13. The Power of Two • [Diagram labels: DSP1, Client, DSP2]

  14. The Power of Two DSP1 Q1 Query Q Client-side Processor Q2 DSP2 Key: Ensure Cost (Q1)+Cost (Q2)  Cost (Q)

  15. Privacy Constraints • SB1386 Privacy • {Name, SSN}, {Name, LicenceNo}, {Name, CaliforniaID}, {Name, AccountNumber}, {Name, CreditCardNo, SecurityCode} are all to be kept private. • A set is private if at least one of its elements is "hidden". • An element in encrypted form is OK

  16. Techniques for Satisfying Privacy Constraints • Vertical Fragmentation • Partition attributes across R1 and R2 • E.g., to obey constraint {Name, SSN}, R1 ← Name, R2 ← SSN • Use tuple IDs for reassembly. R = R1 JOIN R2 • Encoding • One-time Pad • For each value v, construct random bit seq. r • R1 ← v XOR r, R2 ← r • Deterministic Encryption • R1 ← E_K(v), R2 ← K • Can detect equality and push selections with equality predicate • Random addition • R1 ← v + r, R2 ← r • Can push aggregate SUM
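
The one-time pad and random addition encodings on slide 16 can be illustrated with a minimal Python sketch. The function names, the 32-bit value width, and the use of modular arithmetic for the additive shares are assumptions made for this illustration; the slides only state R1 ← v XOR r and R1 ← v + r.

```python
import secrets

MODULUS = 2**32  # assumption: values fit in 32 bits; the slides do not fix a width


def split_xor(v: int, nbits: int = 32) -> tuple[int, int]:
    """One-time pad: R1 stores v XOR r, R2 stores the random pad r."""
    r = secrets.randbits(nbits)
    return v ^ r, r


def join_xor(share1: int, share2: int) -> int:
    """Reassemble a value at the client from its two XOR shares."""
    return share1 ^ share2


def split_add(v: int) -> tuple[int, int]:
    """Random addition: R1 stores v + r, R2 stores r (here modulo MODULUS)."""
    r = secrets.randbelow(MODULUS)
    return (v + r) % MODULUS, r


def pushed_sum(shares: list[tuple[int, int]]) -> int:
    """Each site sums its own column; the client subtracts the two totals."""
    sum_at_r1 = sum(a for a, _ in shares) % MODULUS
    sum_at_r2 = sum(b for _, b in shares) % MODULUS
    return (sum_at_r1 - sum_at_r2) % MODULUS


# Hypothetical salary column, split across the two sites.
salaries = [52000, 61000, 75500]
add_shares = [split_add(s) for s in salaries]
total = pushed_sum(add_shares)
```

Neither site sees a salary in the clear, yet SUM is computed mostly at the servers: the client only combines two numbers. The XOR encoding does not support this, which is why the slide lists the pushable operations per technique.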

  17. Example Schema & Privacy Constraints • An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode} • Privacy Constraints • {Telephone}, {Email} • {Name, Salary}, {Name, Position}, {Name, DoB} • {DoB, Gender, ZipCode} • {Position, Salary}, {Salary, DoB} • Will use just Vertical Fragmentation and Encoding.

  18. An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode} • Privacy Constraints • {Telephone}, {Email} • {Name, Salary}, {Name, Position}, {Name, DoB} • {DoB, Gender, ZipCode} • {Position, Salary}, {Salary, DoB} • Decomposed schema • R1: {TID, Name, Email, Telephone, Gender, Salary } • R2: {TID, Name, Email, Telephone, DoB, Position, ZipCode } • Encrypted Attributes E: {Telephone, Email, Name}
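
The decomposition on slide 18 can be checked against the privacy constraints mechanically. The sketch below assumes one reading of the rule from slide 15: a constraint is violated when all of its attributes appear unencrypted at a single site. The function names are hypothetical.

```python
def violated_at(constraint, fragment, encrypted):
    """True if every attribute of the constraint is visible in cleartext in this fragment."""
    visible = set(fragment) - set(encrypted)
    return set(constraint) <= visible


def satisfies_all(constraints, r1, r2, encrypted):
    """True if no constraint is fully exposed at either site."""
    return not any(
        violated_at(c, fragment, encrypted)
        for c in constraints
        for fragment in (r1, r2)
    )


# Constraints and decomposition as given on slides 17 and 18.
constraints = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
r1 = ["TID", "Name", "Email", "Telephone", "Gender", "Salary"]
r2 = ["TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"]
encrypted = {"Telephone", "Email", "Name"}

ok = satisfies_all(constraints, r1, r2, encrypted)

# A hypothetical bad variant: moving Position next to Salary exposes {Position, Salary}.
bad_r1 = r1 + ["Position"]
bad = satisfies_all(constraints, bad_r1, r2, encrypted)
```

With the slide's decomposition `ok` is true, while the variant that places Position beside Salary fails the check.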

  19. Partitioning, Execution • Partitioning Problem • Partition to minimize communication cost for given workload • Even simplified version hard to approximate • Hill Climbing algorithm after starting with weighted set cover • Query Reformulation and Execution • Consider only centralized plans • Algorithm to partition select and where clause predicates between the two partitions

  20. Hill Climbing Approach for Partitioning
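
A hill-climbing search of the kind named on slides 19 and 20 might look like the following sketch. It is not the authors' algorithm: the cost function (total weight of workload queries whose cleartext attributes span both sites), the single-attribute move, and the starting assignment are all assumptions for illustration, and the weighted set cover initialization mentioned on slide 19 is not implemented.

```python
def feasible(assign, constraints, encrypted):
    """A constraint fails if all its attributes are cleartext and sit at one site."""
    for c in constraints:
        if any(a in encrypted for a in c):
            continue  # an encrypted element hides the set
        if len({assign[a] for a in c}) < 2:
            return False
    return True


def cost(assign, workload):
    """Hypothetical cost proxy: weight of queries that need both sites."""
    return sum(
        weight
        for attrs, weight in workload
        if len({assign[a] for a in attrs if a in assign}) > 1
    )


def hill_climb(start, constraints, encrypted, workload):
    """Move one attribute at a time to the other site while the cost strictly drops."""
    assign = dict(start)
    best = cost(assign, workload)
    improved = True
    while improved:
        improved = False
        for attr in sorted(assign):
            trial = dict(assign)
            trial[attr] = 3 - trial[attr]  # flip between site 1 and site 2
            trial_cost = cost(trial, workload)
            if trial_cost < best and feasible(trial, constraints, encrypted):
                assign, best, improved = trial, trial_cost, True
    return assign, best


constraints = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
encrypted = {"Telephone", "Email", "Name"}
# Start from the cleartext placement shown on slide 18 (1 = R1, 2 = R2).
start = {"Gender": 1, "Salary": 1, "DoB": 2, "Position": 2, "ZipCode": 2}
# Hypothetical workload: (attributes used together, query weight).
workload = [
    ({"Salary", "Gender"}, 5),
    ({"DoB", "ZipCode"}, 3),
    ({"Gender", "ZipCode"}, 4),
]
final, final_cost = hill_climb(start, constraints, encrypted, workload)
```

On this toy workload the search moves ZipCode to site 1, lowering the proxy cost from 4 to 3, and then stops because every further single move is either more expensive or violates a constraint. Like any hill climb, it can end in a local optimum.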

  21. Road Map • Motivation for Secure Databases • Distributing Data • Encryption, Distribution • Privacy Constraints • Schema Decomposition • Query Partitioning • Cost Estimation • Where and Select clause processing • Query Decomposition • Experiments • Related Work

  22. Predicates for cost computation

  23. State Definitions for Bottom Up Evaluation • 0: condition clause cannot be pushed to either server • 1: condition clause can be pushed to Server 1 • 2: condition clause can be pushed to Server 2 • 3: condition clause can be pushed to both servers • 4: condition clause can be pushed to either server

  24. OR State Evaluation

  25. AND State Evaluation

  26. Query Partitioning • Original Query: SELECT Name, DoB, Salary FROM R WHERE (Name = 'Tom' AND Position = 'Staff') AND (ZipCode = '94305' OR Salary > 60000) • R1: {TID, Name, Email, Telephone, Gender, Salary} • R2: {TID, Email, Telephone, DoB, Position, ZipCode} • Query 1: SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom' • Query 2: SELECT TID, DoB, ZipCode FROM R2 WHERE Position = 'Staff'
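
The partitioning on slide 26 can be reproduced with a simplified sketch. It assumes the WHERE clause is already given as a list of top-level conjuncts, each tagged with the attributes it references, and it pushes a conjunct to a site only when that site holds all of those attributes. It does not model the five-state bottom-up evaluation of slides 23 to 25, and `partition_query` is a hypothetical name.

```python
R1 = ["TID", "Name", "Email", "Telephone", "Gender", "Salary"]
R2 = ["TID", "Email", "Telephone", "DoB", "Position", "ZipCode"]


def partition_query(select_attrs, conjuncts, r1, r2):
    """Split a conjunctive query into one sub-query per site plus client-side predicates.

    conjuncts: list of (sql_text, attributes_referenced) pairs.
    """
    pushed = {"R1": [], "R2": []}
    client = []
    needed = list(select_attrs)  # attributes the client must receive
    for text, attrs in conjuncts:
        if set(attrs) <= set(r1):
            pushed["R1"].append(text)
        elif set(attrs) <= set(r2):
            pushed["R2"].append(text)
        else:
            # Spans both sites: evaluate at the client, so fetch its attributes.
            client.append(text)
            needed += [a for a in attrs if a not in needed]
    queries = {}
    for name, fragment in (("R1", r1), ("R2", r2)):
        cols = ["TID"] + [a for a in fragment if a != "TID" and a in needed]
        sql = f"SELECT {', '.join(cols)} FROM {name}"
        if pushed[name]:
            sql += " WHERE " + " AND ".join(pushed[name])
        queries[name] = sql
    return queries, client


conjuncts = [
    ("Name = 'Tom'", ["Name"]),
    ("Position = 'Staff'", ["Position"]),
    ("(ZipCode = '94305' OR Salary > 60000)", ["ZipCode", "Salary"]),
]
queries, client_side = partition_query(["Name", "DoB", "Salary"], conjuncts, R1, R2)
```

The two generated sub-queries match Query 1 and Query 2 on the slide. The OR clause references ZipCode (held by R2) and Salary (held by R1), so it stays with the client, which joins the two results on TID and then applies it.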

  27. Distributed Query Plan

  28. Road Map • Motivation for Secure Databases • Distributing Data • Encryption, Distribution • Privacy Constraints • Schema Decomposition • Query Partitioning • Cost Estimation • Where and Select clause processing • Query Decomposition • Experiments • Related Work

  29. Number of Iterations

  30. Performance Gain Experiment

  31. Iterations Vs Privacy Constraints

  32. Papers • [CIDR05] Two Can Keep A Secret. • [SIGMOD05] Privacy Preserving OLAP. • [ICDT05] Anonymizing Tables. • [PODS06] Clustering For Anonymity. • [KDD07] Probabilistic Anonymity.

  33. Thank You!

  34. Acknowledgements: Collaborators • Stanford Privacy Group • TRDDC Privacy Group • PORTIA, TRUST, Google
