What’s the Gist? Privacy-Preserving Aggregation of User Profiles
Igor Bilogrevic (Google), Julien Freudiger (PARC), Emiliano De Cristofaro (UCL), Ersin Uzun (PARC)
Scott Kildall – Data Crystals
Data is the Crux of Internet Economy
• Corporations seek personal data for better targeting
• More data and more sensitive data
  • Credit card transactions
  • Interests
  • Political party
  • Apps usage
  • Browsing history
  • Mobility patterns
  • …
[Figure: Users, Data Brokers, Third Parties]
Issues with Current Approach
• Privacy
  • What personal data is collected?
  • How much and how good is it?
• Transparency
  • Who knows what about me? [1]
  • Where does this data come from?
• Remuneration
  • Users value their data
  • Users don’t get money for it
"Data Brokers: A Call for Transparency and Accountability", FTC, May 2014
[1] aboutthedata.com
“This question calls for Acxiom to provide information that would reveal business practices that are of a highly competitive nature. Acxiom cannot provide a list of each entity that has provided data from, or about, consumers to us.”
Acxiom
An Emerging Model
[Figure: Users, Participatory Data Brokers, Data Brokers, Third Parties]
• Benefits
  • Users retain control over who accesses what about them
  • Users decide what data can be monetized
  • Users get some revenue
“What if Facebook paid you? Several startups envision an era in which we are all the brokers, and beneficiaries, of our own personal data.”
David Zax, Is personal data the new currency? MIT Tech Review
Our Contribution
• What’s the Gist?
  • Method for monetization of user personal data with privacy
  • Users choose what to share
  • Brokers are not required to be trustworthy
• Idea
  • Rather than selling data as-is, monetize a model of the data
User data (age):
  User1  22
  User2  56
  User3  43
  User4  33
  …
[Figure: aggregate pdf of age, axis from 20 to 60]
System Architecture
[Figure: Third Party, Aggregator, Users]
1. Query
2. Select users
3. Queries
4. Extract features
5. Noisy encrypted answers
6. Aggregate, decrypt, sample, and monetize
7. Answer
• Interactive mode
  • Customer queries for certain desired aggregates
• Batch mode
  • Aggregator prepares certain aggregates
Users – Profile Computation
• Each user i has a profile p_i with K attributes {a_{i,j}}
• Each element a_{i,j} is an integer representing a value or a preference
p_i = (a_{i,1}, a_{i,2}, a_{i,3}, …, a_{i,K})
Example for user i:
  Age             28
  # of friends    223
  Action movies   5
  Drama movies    6
  …
  Rock music      2
  History books   3
Users – Feature Computation
• Features depend on the chosen probability model
• For a Gaussian model, each user i computes
  • f_i = {[a_{i,1}, a_{i,1}^2], …, [a_{i,K}, a_{i,K}^2]}
Example for profile p_i:
  Age             [28], [28^2]
  # of friends    [223], [223^2]
  Action movies   [5], [5^2]
  Drama movies    [6], [6^2]
  …
  Rock music      [2], [2^2]
  History books   [3], [3^2]
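The feature step for the Gaussian model can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `gaussian_features` is hypothetical, and the profile values are the ones from the example slide.

```python
from typing import List, Tuple


def gaussian_features(profile: List[int]) -> List[Tuple[int, int]]:
    """Return the pair (a, a^2) for every attribute value a in the profile."""
    return [(a, a * a) for a in profile]


# Example profile from the slide: age, # of friends, action movies,
# drama movies, rock music, history books.
profile = [28, 223, 5, 6, 2, 3]
features = gaussian_features(profile)
print(features[0])  # (28, 784)
```

Sending both the value and its square is what later lets the aggregator recover a mean and a variance from sums alone.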
Private Aggregation
• Assume
• Privacy
  • Differentially private noise r_i prevents the aggregator from deducing user data [1]
• Security
  • Aggregator can only decrypt the sum
  • No shared secret, no pairwise distributed computations
[Figure: what the Aggregator and User 1, …, User i, …, User n each know and compute]
[1] E. Shi et al. Privacy-Preserving Aggregation of Time-Series Data. NDSS, 2011
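The "aggregator can only decrypt the sum" property can be illustrated with a toy additive-masking scheme. This is a simplified stand-in and not the cited construction: it assumes a setup phase hands out keys that sum to zero modulo `M`, uses each key only once, and draws placeholder two-sided geometric noise whose parameter `p` is not calibrated to any privacy budget. All names (`M`, `setup_keys`, `sample_noise`, `user_encrypt`, `aggregate`) are hypothetical.

```python
import math
import random
import secrets
from typing import List, Tuple

M = 2 ** 64  # modulus of the toy masking (an assumption, not from the slides)


def setup_keys(n: int) -> Tuple[int, List[int]]:
    """Return (aggregator_key, user_keys) such that all keys sum to 0 mod M."""
    user_keys = [secrets.randbelow(M) for _ in range(n)]
    aggregator_key = (-sum(user_keys)) % M
    return aggregator_key, user_keys


def sample_noise(rng: random.Random, p: float = 0.5) -> int:
    """Placeholder noise r_i: difference of two geometric draws."""
    def geometric() -> int:
        u = 1.0 - rng.random()  # uniform in (0, 1]
        return int(math.log(u) / math.log(1.0 - p))
    return geometric() - geometric()


def user_encrypt(value: int, noise: int, key: int) -> int:
    """Mask the noisy value with the user's key; alone it looks random."""
    return (value + noise + key) % M


def aggregate(aggregator_key: int, ciphertexts: List[int]) -> int:
    """Keys cancel only when every ciphertext is summed: result is the noisy sum."""
    total = (aggregator_key + sum(ciphertexts)) % M
    # Map back to a signed integer in case the noisy sum is negative.
    return total - M if total >= M // 2 else total


rng = random.Random(0)
ages = [22, 56, 43, 33]
noises = [sample_noise(rng) for _ in ages]
k0, keys = setup_keys(len(ages))
ciphertexts = [user_encrypt(a, r, k) for a, r, k in zip(ages, noises, keys)]
noisy_sum = aggregate(k0, ciphertexts)
print(noisy_sum == sum(ages) + sum(noises))  # True
```

Because the masks cancel only in the full sum, the aggregator learns the noisy total but no individual contribution. A real deployment would need the cryptographic construction from [1] rather than this one-time masking.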
Aggregator – Gaussian Approximation
• Entities contribute
  • Enc[a_1], Enc[a_1^2], …, Enc[a_i], Enc[a_i^2]
• Broker aggregates to compute mean μ and variance σ^2
• Obtains Gaussian approximation N(μ, σ^2) for each attribute
[Figure: pdf of N(μ, σ^2) over age]
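The two aggregated sums are enough to recover both parameters. The derivation below is a sketch that assumes n contributing users and ignores the added noise; the worked numbers reuse the four ages (22, 56, 43, 33) from the "User data (age)" example.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} a_i ,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} a_i^2 - \mu^2

% Worked example with n = 4 and ages 22, 56, 43, 33:
\sum_i a_i = 154 ,
\qquad
\sum_i a_i^2 = 484 + 3136 + 1849 + 1089 = 6558

\mu = \frac{154}{4} = 38.5 ,
\qquad
\sigma^2 = \frac{6558}{4} - 38.5^2 = 1639.5 - 1482.25 = 157.25
```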
Aggregator - Attribute Ranking
• Assumption
  • Attributes with uniform distribution reveal less information about individual entities
• Measure divergence
  • Distance between two probability distributions
  • Jensen-Shannon (JS) divergence
  • Small JS distance means low value
[Figure: attribute pdf compared with a uniform distribution]
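The ranking idea can be sketched by scoring each attribute's distribution against a uniform one. This is a minimal sketch assuming discrete histograms with the same bins; the function names and the two example histograms are hypothetical and are not data from the evaluation. It computes the JS divergence in bits (base-2 logarithm), which lies between 0 and 1.

```python
import math
from typing import Dict, List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetrised KL against the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def divergence_from_uniform(p: Sequence[float]) -> float:
    """JS divergence between p and the uniform distribution on the same bins."""
    uniform = [1.0 / len(p)] * len(p)
    return js_divergence(p, uniform)


def rank_attributes(histograms: Dict[str, Sequence[float]]) -> List[str]:
    """Order attribute names from most to least divergent from uniform."""
    return sorted(histograms,
                  key=lambda name: divergence_from_uniform(histograms[name]),
                  reverse=True)


# Hypothetical normalised histograms for two attributes.
histograms = {
    "flat_attribute": [0.25, 0.25, 0.25, 0.25],
    "peaked_attribute": [0.70, 0.20, 0.05, 0.05],
}
print(rank_attributes(histograms))  # ['peaked_attribute', 'flat_attribute']
```

Under the slide's assumption, the peaked attribute ranks first because it departs most from uniform, while the flat attribute has divergence 0 and therefore low value.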
Performance
• Dataset and implementation
  • 100,000 real users from U.S. Census [data.gov, July 2013]
  • 3 types of attributes (income, education, age)
  • Java, measurements on Core i5 2.53 GHz, 8 GB RAM
• Metrics
  • Accuracy of Gaussian approximation
  • Information leakage for each attribute
  • Revenue
  • Overhead
[Figure: panels for Income, Education, and Age at 100 users, 1,000 users, and 100,000 users]
Gaussian Approximations
• Accuracy improves quickly with the number of users (100 is good)
• Fit for income and age is 3x better than for education
Information Leakage vs Uniform
• Maximum information leakage achieved at about 1,000 users
• Information leakage not necessarily increasing with number of users (stable after a while)
• Larger user samples do not necessarily provide better discriminating features
Revenue Model
• Value of user information: from $0.0005 [2] to $33 [1]
• Where w = 0.1 is the commission.
[1] J. P. Carrascal, C. Riederer, V. Erramilli, M. Cherubini, and R. de Oliveira. Your browsing behavior for a big mac: Economics of personal information online. WWW, 2013
[2] L. Olejnik, T. Minh-Dung, and C. Castelluccia. Selling off privacy at auction. NDSS, 2014
Revenue per Attribute
• Three privacy sensitivity distributions
• User revenue is small and does not increase with the number of participants
  • Revenue similar to Amazon Mechanical Turk
• Broker incentivized to collect as many users as possible ($0.07 to $2897)
• Third parties incentivized to select demographic group of size 100
Overhead
• User
  • 1 ms total
  • Independent of number of users
• Aggregator
  • 1.5 min for 100 users
  • 27.7 h for 100,000 users
  • Can and should be parallelized
Related Work
• Privacy-preserving aggregation
  • Modified version of the Paillier encryption scheme [1,2]
    • But P2P communications between participants
  • Homomorphic encryption and differential privacy [3,4]
    • But differential privacy by third party and contributions linkable to users before aggregation
[1] Z. Erkin and G. Tsudik. Private computation of spatial and temporal power consumption with smart meters. ACNS, 2012
[2] E. Shi, R. Zhang, Y. Liu, and Y. Zhang. PriSense: privacy-preserving data aggregation in people-centric urban sensing systems. INFOCOM, 2010
[3] R. Chen, I. E. Akkus, and P. Francis. SplitX: high-performance private analytics. SIGCOMM, 2013
[4] R. Chen, A. Reznichenko, P. Francis, and J. Gehrke. Towards statistical queries over distributed private user data. NSDI, 2012
Related Work
• Privacy-preserving monetization
  • Local user profile generation, categorization, and ad selection [1,2]
  • Anonymizing proxies to shield users’ behavioral data from third parties [3]
[1] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas. Adnostic: Privacy preserving targeted advertising. NDSS, 2010
[2] S. Guha, B. Cheng, and P. Francis. Privad: practical privacy in online advertising. NSDI, 2011
[3] C. Riederer, V. Erramilli, A. Chaintreau, B. Krishnamurthy, and P. Rodriguez. For sale: your data: by: you. HotNets, 2011
Conclusion
• Designed a method to monetize sensitive data with privacy
  • If data is the new currency, we are creating a marketplace
• Evaluation shows practical performance, good accuracy with as few as 100 users, and good incentives for the parties involved
• Future work
  • Enhance security features (range checks to thwart pollution attacks, fault-tolerance, efficient key establishment)
  • Enable targeting of users after aggregation
  • Enable subsequent collection of more than the model (i.e., black swan)