
A Data Masking Technique for Data Warehouses Ricardo Jorge Santos & Marco Vieira

International Database Engineering and Applications Symposium. A Data Masking Technique for Data Warehouses. Ricardo Jorge Santos & Marco Vieira (CISUC – DEI – FCTUC, University of Coimbra, Portugal); Jorge Bernardino (CISUC – DEIS – ISEC, Polytechnic Institute of Coimbra, Portugal).





Presentation Transcript


  1. INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM
     A Data Masking Technique for Data Warehouses
     Ricardo Jorge Santos & Marco Vieira, CISUC – DEI – FCTUC, University of Coimbra, Portugal
     Jorge Bernardino, CISUC – DEIS – ISEC, Polytechnic Institute of Coimbra, Portugal
     ISEL, Lisbon – September/2011

  2. Agenda
     • Background
     • Motivation
     • MOBAT: A MOD Based Data Masking Technique
     • Optimization Features
     • Experimental Results
     • Conclusions and Future Work
     Ricardo J. Santos – A Data Masking Technique for Data Warehouses – IDEAS 2011 – ISEL, Lisbon – September/2011

  3. Security Concerns in Data Warehousing
     • A Data Warehouse (DW) is a critical asset for many enterprises
     • It stores all the relevant historical and current business information needed to support decision making (sensitive data)
     • DWs are main targets for stealing or compromising sensitive data
     • The rate and complexity of attacks have increased in the recent past

  4. Data Security Domains
     • Data Confidentiality: only the right users should access the right data
     • Data Integrity: data should always be correct, authentic and consistent
     • Data Availability: users should always be able to access data whenever needed

  5. Data Privacy Issues in Today’s DWs (Our Focus)
     • Masking solutions are not considered acceptable
     • Encryption techniques introduce too much overhead in:
         • Storage space
         • Data loading time
         • Query response time

  6. Data Privacy Issues in Today’s DWs (Our Focus)
     • Important feature: facts in DWs are mainly numerical columns!

  7. MOBAT – MOd BAsed data masking Technique for DWs
     • MOBAT System Architecture

  8. MOBAT – MOd BAsed data masking Technique for DWs
     Suppose a table T with a set of N numerical columns to mask, $C_i \in \{C_1, C_2, C_3, \dots, C_N\}$, and a total set of M rows, $R_j \in \{R_1, R_2, R_3, \dots, R_M\}$.
     Each value to mask in the table is identified by a pair $(R_j, C_i)$, where $R_j$ and $C_i$ are respectively the row and the column to which the value refers. As the indices indicate, $K_1$ is a single key, $K_{2,i}$ is the key of column i and $K_{3,j}$ is the key of row j.
     Each new masked value $(R_j, C_i)'$ is obtained by applying formula (1) for row j and column i of table T:
     $$(R_j, C_i)' = (R_j, C_i) - \big((K_{3,j} \bmod K_1) \bmod K_{2,i}\big) + K_{2,i} \tag{1}$$
     The inverse formula (2) for retrieving the original value is:
     $$(R_j, C_i) = (R_j, C_i)' + \big((K_{3,j} \bmod K_1) \bmod K_{2,i}\big) - K_{2,i} \tag{2}$$
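Formulas (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `mobat_mask` and `mobat_unmask` are hypothetical, and the keys and the value in the usage lines are made up. It assumes non-negative integer keys, for which Python's `%` agrees with SQL `MOD`.

```python
def mobat_mask(value, k1, k2, k3):
    """Formula (1): masked = value - ((k3 mod k1) mod k2) + k2."""
    return value - ((k3 % k1) % k2) + k2


def mobat_unmask(masked, k1, k2, k3):
    """Formula (2): value = masked + ((k3 mod k1) mod k2) - k2."""
    return masked + ((k3 % k1) % k2) - k2


# Hypothetical keys and value, chosen only for illustration.
k1, k2, k3 = 9342, 12, 123456
masked = mobat_mask(17, k1, k2, k3)
restored = mobat_unmask(masked, k1, k2, k3)
print(masked, restored)
```

Because the same term is subtracted in (1) and added back in (2), the round trip is exact for integer values; non-integer values would be better handled with a decimal type than with binary floats.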

  9. MOBAT – Example Dataset
     Supposing $K_1 = 7432$, $K_{2,1} = 34$ and $K_{2,2} = 17252$
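The example table from this slide is not part of the transcript. As a rough worked illustration with the keys above, assume a hypothetical row key $K_{3,j} = 100000$ and hypothetical original values 50 (column 1) and 20000 (column 2); none of these three numbers comes from the presentation. The masking formula and its inverse then give:

```latex
\begin{align*}
K_{3,j} \bmod K_1 &= 100000 \bmod 7432 = 3384 \\
\text{Column 1:}\quad 3384 \bmod 34 &= 18 \\
(R_j, C_1)' &= 50 - 18 + 34 = 66 \\
(R_j, C_1)  &= 66 + 18 - 34 = 50 \\
\text{Column 2:}\quad 3384 \bmod 17252 &= 3384 \\
(R_j, C_2)' &= 20000 - 3384 + 17252 = 33868 \\
(R_j, C_2)  &= 33868 + 3384 - 17252 = 20000
\end{align*}
```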

  10. MOBAT – Example Dataset
      Supposing $K_1 = 9264$, $K_{2,1} = 12$ and $K_{2,2} = 78254$

  11. MOBAT – Querying
      Using the TPC-H benchmark with four numerical fact columns (i = 1, ..., 4) masked by MOBAT: L_Quantity, L_ExtendedPrice, L_Tax and L_Discount.
      A new column L_KeyK3 holds the K3,j key for each of the j rows of the LineItem table.
      Keys: K1 = 9342; K2,L_Quantity = 12; K2,L_ExtendedPrice = 51234; K2,L_Tax = 6; K2,L_Discount = 4

      Original query:
        SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue
        FROM LineItem
        WHERE L_ShipDate>=TO_DATE('1994-01-01','YYYY-MM-DD')
          AND L_ShipDate<TO_DATE('1995-01-01','YYYY-MM-DD')
          AND L_Discount BETWEEN 0.05 AND 0.07
          AND L_Quantity<24

      Rewritten query over the masked columns:
        SELECT SUM((L_ExtendedPrice+MOD(MOD(L_KeyK3,9342),51234)-51234) * (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4)) AS Total_Revenue
        FROM LineItem
        WHERE L_ShipDate>=TO_DATE('1994-01-01','YYYY-MM-DD')
          AND L_ShipDate<TO_DATE('1995-01-01','YYYY-MM-DD')
          AND (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4) BETWEEN 0.05 AND 0.07
          AND (L_Quantity+MOD(MOD(L_KeyK3,9342),12)-12)<24
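The rewriting step on this slide (each masked column replaced by its unmasking expression) can be approximated with a small Python sketch. The key values and column names are the ones given on the slide; everything else, including the names `unmask_expr` and `rewrite_query` and the regex-based substitution, is an assumption made for illustration and not the authors' rewriter. A plain regex does not parse SQL, so it would also rewrite column names that appear inside string literals or aliases.

```python
import re

# Keys from the slide; function and variable names are hypothetical.
K1 = 9342
K2 = {"L_Quantity": 12, "L_ExtendedPrice": 51234, "L_Tax": 6, "L_Discount": 4}


def unmask_expr(column, key_column="L_KeyK3"):
    """Build the SQL expression of the inverse formula for one masked column."""
    k2 = K2[column]
    return f"({column}+MOD(MOD({key_column},{K1}),{k2})-{k2})"


# A single alternation pattern, so already substituted text is never rescanned.
_MASKED = re.compile(r"\b(" + "|".join(map(re.escape, K2)) + r")\b")


def rewrite_query(sql):
    """Naively replace every masked column name with its unmasking expression."""
    return _MASKED.sub(lambda m: unmask_expr(m.group(1)), sql)


print(rewrite_query(
    "SELECT SUM(L_ExtendedPrice * L_Discount) FROM LineItem WHERE L_Quantity<24"
))
```

Columns that are not in the key table, such as L_ShipDate, pass through unchanged, which matches the rewritten query shown on the slide.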

  12. MOBAT – Optimizing Features & Performance
      • The inclusion of K3,j requires additional storage space
      • K3,j can be created in several ways, each with a different impact on performance:
          • Simply adding a new column to the existing fact table
          • Recreating the fact table, including K3,j from the start
          • Using a 128-bit integer column that already exists in the fact table (typically the primary key column)

  13. Experimental Evaluation
      • 2.8 GHz CPU, 2 GB RAM (512 MB for the Oracle SGA), 1.5 TB SATA HD
      • Oracle 11g DBMS
      • One standard benchmark and one real-world DW:
          • TPC-H Decision Support Benchmark at 1 GB and 10 GB scale
          • Real-world Sales DW (2 GB storage size)

  14. Experimental Evaluation

  15. Experimental Evaluation

  16. Experimental Evaluation

  17. Conclusions
      • Our technique decreases data storage space and processing overheads, while still providing a significant level of security
      • Transparent method with minimal network bandwidth overhead, since only the queries are rewritten
      • Extremely easy and simple to implement in any DBMS / DW, at low cost
      • Querying the database directly returns only realistic-looking masked values (stored data is masked at all times)

  18. Future Work
      • Extending the technique to also mask alphanumeric values
      • Assessing its security strength in comparison with other solutions
      • Extending the technique to increase its security strength:
          • Using larger keys
          • Enabling data integrity checks
          • Implementing false data injection

  19. INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM
      A Data Masking Technique for Data Warehouses
      THANK YOU! Questions and Comments?
      Ricardo Jorge Santos, lionsoftware.ricardo@gmail.com
      ISEL, Lisbon – September/2011
