Data Warehousing, Data Mining, Privacy. Reading. Data Warehousing. Repository of organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format. Data mart (single subject area). Enterprise data warehouse (integrated data marts). Metadata.
Reading CSCE 824 - Spring 2011
Data Warehousing
• Repository of organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
• Data mart (single subject area)
• Enterprise data warehouse (integrated data marts)
• Metadata
OLAP Analysis
• Aggregation functions
• Factual data access
• Complex criteria
• Visualization
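A minimal sketch of the aggregation bullet, assuming a tiny in-memory fact table. The dimension names, the sales figures, and the helper functions `roll_up` and `slice_facts` are hypothetical and only illustrate the idea of aggregating a measure over selected dimensions.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales_amount).
FACTS = [
    ("East", "Widget", "Q1", 100),
    ("East", "Widget", "Q2", 150),
    ("East", "Gadget", "Q1", 80),
    ("West", "Widget", "Q1", 120),
    ("West", "Gadget", "Q2", 60),
]

DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, group_by):
    """Sum the sales measure, grouping only by the listed dimensions."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[3]
    return dict(totals)


def slice_facts(facts, dimension, value):
    """Keep only the fact rows where one dimension equals a given value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


# Aggregate over product and quarter, keeping region.
by_region = roll_up(FACTS, ["region"])

# A simple "complex criterion": restrict to Q1, then aggregate by product.
q1_by_product = roll_up(slice_facts(FACTS, "quarter", "Q1"), ["product"])

print(by_region)
print(q1_by_product)
```

Keys in the result are tuples so that the same function can group by one dimension or by several.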
Warehouse Evaluation
• Enterprise-wide support
• Consistency and integration across diverse domains
• Security support
• Support for operational users
• Flexible access for decision makers
Data Integration
• Data access
• Data federation
• Change capture
• Need ETL (extraction, transformation, load)
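The ETL steps named above can be sketched as three small functions. This is an illustration only: the CSV layout, the `dim_customer` table, and the cleaning rules are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical operational export; the column names are illustrative only.
SOURCE_CSV = """customer_id,name,signup_date,country
1, Alice ,2011-01-15,us
2,Bob,2011-02-03,US
3,,2011-03-01,de
"""


def extract(text):
    """Extraction: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: trim whitespace, standardize codes, drop incomplete rows."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # reject rows with a missing name
        cleaned.append((
            int(row["customer_id"]),
            name,
            row["signup_date"],
            row["country"].strip().upper(),
        ))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, "
        "signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute(
    "SELECT customer_id, name, country FROM dim_customer ORDER BY customer_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions makes it possible to apply access control to each stage independently.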
Data Warehouse Users
• Internal users
  • Employees
  • Managerial
• External users
  • Reporting and auditing
  • Research
Data Mining
• Databases to be mined
• Knowledge to be mined
• Techniques used
• Applications supported
Data Mining Tasks
• Prediction tasks
  • Use some variables to predict unknown or future values of other variables
• Description tasks
  • Find human-interpretable patterns that describe the data
Common Tasks
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Mining [Descriptive]
• Sequential Pattern Mining [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
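As one concrete instance of a descriptive task from the list above, the following sketch computes support and confidence for association rules over a toy market-basket data set. The transactions and function names are invented for illustration, and the brute-force pair enumeration is not a full Apriori implementation.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


def frequent_pairs(transactions, min_count):
    """Brute-force search for item pairs occurring in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    return [
        pair for pair in combinations(items, 2)
        if count(pair, transactions) >= min_count
    ]


print(support({"diapers", "beer"}, TRANSACTIONS))
print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
print(frequent_pairs(TRANSACTIONS, min_count=3))
```

The rule "diapers -> beer" is human-interpretable, which is what places association rule mining among the description tasks.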
Security for Data Warehousing
• Establish the organization's security policies and procedures
• Implement logical access control
• Restrict physical access
• Establish internal control and auditing
Security for Data Warehousing (cont.)
• Security Issues in Data Warehousing and Data Mining: Panel Discussion
• Panelists: Bhavani Thuraisingham, The MITRE Corporation; Linda Schlipper, The MITRE Corporation; Pierangela Samarati, SRI International; T. Y. Lin, San Jose State University; Sushil Jajodia, George Mason University; Chris Clifton, The MITRE Corporation
• xanadu.cs.sjsu.edu/~tylin/publications/paperList/109_security.ps
Integrity
• Poor-quality data: inaccurate, incomplete, missing metadata
• Source data quality vs. derived data quality
Access Control
• Layered defense:
  • Access to processes that extract operational data
  • Access to the data and processes that transform operational data
  • Access to data and metadata in the warehouse
Access Control Issues
• Mapping from local to warehouse policies
• How to handle “new” data
• Scalability
• Identity management
Inference Problem
• Data mining discovers “new knowledge”: how do we evaluate the security risks?
• Example security risks:
  • Prediction of sensitive information
  • Misuse of information
  • Assurance of “discovery”
• Interesting Read: C. C. Aggarwal and P. S. Yu, Privacy-Preserving Data Mining: Models and Algorithms, http://charuaggarwal.net/toc.pdf
Privacy
• Large volume of private (personal) data
• Need:
  • Proper acquisition, maintenance, usage, and retention policy
  • Integrity verification
  • Control of analysis methods (aggregation may reveal sensitive data)
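The remark that aggregation may reveal sensitive data can be illustrated with a differencing attack: two aggregate queries, each harmless on its own, isolate one individual's value. The salary table, names, and the `min_group` threshold below are invented; the guard is a deliberately naive control, shown only to demonstrate that a minimum query-set size alone does not stop the attack.

```python
# Hypothetical salary table; names and figures are invented.
SALARIES = {"alice": 82000, "bob": 67000, "carol": 91000, "dave": 58000}


def total_salary(table, exclude=()):
    """An aggregate query that never returns an individual row."""
    return sum(v for k, v in table.items() if k not in exclude)


def guarded_total(table, exclude=(), min_group=3):
    """The same query with a naive minimum query-set-size control."""
    group = [v for k, v in table.items() if k not in exclude]
    if len(group) < min_group:
        raise PermissionError("query set too small")
    return sum(group)


# Differencing: two permitted aggregates isolate one person's value.
carol_salary = total_salary(SALARIES) - total_salary(SALARIES, exclude=("carol",))

# Both queries pass the size check (4 and 3 rows), so the guard does not help.
guarded_leak = guarded_total(SALARIES) - guarded_total(SALARIES, exclude=("carol",))

print(carol_salary, guarded_leak)
```

This is why the slide lists control of analysis methods, not just control of row-level access, as a privacy need.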
Privacy
• What is the difference between confidentiality and privacy?
• Identity, location, activity, etc.
• Anonymity vs. accountability
Legislation
• Privacy Act of 1974, U.S. Department of Justice (http://www.usdoj.gov/oip/04_7_1.html)
• Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education (http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html)
• Health Insurance Portability and Accountability Act of 1996 (HIPAA) (http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
• Telecommunications Consumer Privacy Act (http://www.answers.com/topic/electronic-communications-privacy-act)
Online Social Network
• Social relationship
  • Communication context changes social relationships
  • Social relationships maintained through different media grow at different rates and to different depths
  • No clear consensus on which medium is best
Internet and Social Relationships
Internet:
• Bridges distance at a low cost
• New participants tend to “like” each other more
• Less stressful than face-to-face meeting
• People focus on communicating their “selves” (except a few malicious users)
Social Network
• Description of the social structure between actors
• Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
• Support online interaction and content sharing
Social Network Analysis
• The mapping and measuring of relationships and flows between people, groups, organizations, computers, or other information-processing entities
• Behavioral profiling
• Note: social network signatures
  • User names may change; family and friends are more difficult to change
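The note on social network signatures can be made concrete with a small sketch: if a user changes their user name but keeps most of their contacts, the overlap between friend sets can still link the two accounts. The account names and friend lists are invented, and Jaccard similarity is one possible overlap measure chosen here for illustration; the slides do not prescribe a specific method.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical friend lists observed under old and new user names.
OLD_ACCOUNTS = {
    "jsmith": {"ann", "bo", "cy", "di"},
    "mlee": {"ed", "flo", "gus"},
}
NEW_ACCOUNTS = {
    "anon42": {"ann", "bo", "cy", "eve"},
    "quietuser": {"flo", "gus", "hal", "ida"},
}


def best_match(new_friends, old_accounts):
    """Return (old_name, score) for the old account with the most similar friend set."""
    name, friends = max(
        old_accounts.items(),
        key=lambda item: jaccard(new_friends, item[1]),
    )
    return name, jaccard(new_friends, friends)


links = {new: best_match(friends, OLD_ACCOUNTS) for new, friends in NEW_ACCOUNTS.items()}
print(links)
```

In practice a threshold on the score would be needed, since `best_match` always returns some account even when no friend set overlaps.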
Interesting Read:
• M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468
Next: Hippocratic Databases
Next Class: Stream Data