1 / 18

Privacy-Preserving Databases and Data Mining

Privacy-Preserving Databases and Data Mining. Y ü cel SAYGIN ysaygin@sabanciuniv.edu http://people.sabanciuniv.edu/~ysaygin/. Outline. Privacy : an informal discussion Overview of data mining Overview of privacy preserving databases and data mining Privacy preserving data mining

sandro
Télécharger la présentation

Privacy-Preserving Databases and Data Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Privacy-Preserving Databases and Data Mining Yücel SAYGIN ysaygin@sabanciuniv.edu http://people.sabanciuniv.edu/~ysaygin/

  2. Outline • Privacy : an informal discussion • Overview of data mining • Overview of privacy preserving databases and data mining • Privacy preserving data mining • Privacy protection against data mining • Privacy preserving databases • Future research directions

  3. Privacy: What, Why, and How • Privacy : Giving the people the right to be left alone • It is one of the fundamental rights of people in western civilizations • Privacy of data: Giving the data owners the right to say what can be done with their data

  4. Is data privacy something new? • Privacy has been one of the fundamental rights of people • Maybe termed differently but it has been studied in the past • Statistical databases, statistical disclosure control … • The inference problem

  5. Why privacy is a really big issue these days? • Technology is really integrated with our personal life • With new technology : Networking, WEB • New devices: Mobile Phones, RFID tags, Computers, digital cameras • Which means that data about us, and about what we are doing can be collected easily and at a fraction of the cost 10 years ago. • Navigation patterns in WEB • Location information (wireless phones, RFID tags) • Transactions (e-commerce, POS…) • Your emails (now scanned by gmail to display ads) (was a big discussion in the CFP conference at Berkeley this year )

  6. Why privacy is a really big issue these days? CAPPS II (Computer Assisted Passenger Prescreening System) collects flight reservation information as well as commercial information about passengers. This data, in turn, can be utilized by government security agencies. Although CAPPS represents US national data collection efforts, it also has an effect on other countries.

  7. Why privacy is a really big issue these days? The following sign at the KLM ticket desk in Amsterdam International Airport demonstrates the point: “Please note that KLM Royal Dutch Airlines and other airlines are required by new security laws in the US and several other countries to give security customs and immigration authorities access to passenger data. Accordingly any information we hold about you and your travel arrangements may be disclosed to the concerning authorities of these countries in your itinerary“.

  8. Why privacy is a really big issue these days? Some of the largest airline companies in US, including American, United and Northwest, turned over millions of passenger records to the FBI SSchwartz J. & Micheline M. (2004). Airlines Gave F.B.I. Millions of Records on Travelers After 9/11 NY Times, May 1.

  9. Why privacy is a really big issue these days? Total Information Awareness (TIA) project in US, which aims to build a centralized database that will store the credit card transactions, emails, web site visits, flight details of Americans was not funded by the Congress due to privacy concerns.

  10. Why is privacy a really big issue these days? • Data about us is being collected and stored somewhere • We need to have the right to control • what data is collected about us, • how long it should be stored, • who is going to see it • and how it is going to be used

  11. But we have all this security research going on for decades! • Security (Database, Network etc) is necessary but not sufficient to ensure full privacy. • Once someone has access to the data what can be done with it (e.g. giving your email to a third party, giving away your profile, shopping behavior etc.) needs to be regulated.

  12. Some of the past research in the context of security is useful for data privacy • Disclosure Control in statistical databases • The inference problem and proposed solutions • Encryption techniques • Secure multi party computation

  13. So what have Data Mining and Databases to do with Privacy? • They deal with data mostly about people. Therefore we need to integrate privacy into database systems and data mining tools. • Data mining is seen as a magic tool that can find secret information in piles of data, therefore there is some hesitation in public about data mining • This is partially true • But they are just tools designed by human beings, that need some good training data, and experts to interpret the results.

  14. Data mining and Privacy Issues Gained Momentum in US • “Pentagon has released a study that recommends the government to pursue specific technologies as potential safeguards against the misuse of data-mining systems similar to those now being considered by the government to track civilian activities electronically in the United States and abroad”. "Perhaps the strongest protection against abuse of information systems is Strong Audit mechanisms… we need to watch the watchers" Markoff J. (2002). Study Seeks Technology Safeguards for Privacy. NY Times, 19 December. • This shows us that even the most aggressive data collectors in the US are aware of the fact that the data mining tools could be misused and we need a mechanism to protect the confidentiality and privacy of people.

  15. Privacy Issues Gained Momentum among researchers. • More research funding • + More projects • = More sessions on privacy in database and data mining conferences • Search google : privacy data mining , you will have pages of results. It was not like that 3-4 years ago. • Centers for privacy research (IBM Almaden, Stanford, Purdue Univ. …)

  16. Overview of Data Mining • Data mining is a combination of statistics … • Data mining models • Patterns (associations, sequences,…) • Clusters • Classification

  17. Privacy preserving data mining • Privacy preserving classification model construction • Privacy preserving data clustering • Privacy preserving association rule mining

  18. Privacy preserving classification • Reference: Rakesh Agrawal and Ramakrishnan Srikant. “Privacy-Preserving Data Mining”. SIGMOD, 2000, Dallas, TX.

More Related