
Georeferencing in the Social Sciences – Promise and Peril

Georeferencing in the Social Sciences – Promise and Peril. Micah Altman, Harvard University: Archival Director, Henry A. Murray Research Archive; Associate Director, Harvard-MIT Data Center; Senior Research Scientist, Institute for Quantitative Social Sciences.


Presentation Transcript


  1. Georeferencing in the Social Sciences – Promise and Peril
     Micah Altman, Harvard University
     Archival Director, Henry A. Murray Research Archive
     Associate Director, Harvard-MIT Data Center
     Senior Research Scientist, Institute for Quantitative Social Sciences
     E: micah_altman@harvard.edu
     W: http://maltman.hmdc.harvard.edu/

  2. The Structural Challenges for Progress in Social Sciences
     • Pervasive Measurement Error
     • Scattered Data
     • Controlled Experiments not Available in Many Fields
     • Weak Theory

  3. Georeferencing Can Make Measurements Far More Accurate
     • E.g. travel, time spent exercising, commutes, time at work, agriculture, distance to voting booth
     Figure: Correlation between reported and real distance to tax office. Source: [McKenzie and Sakho, 2007, as quoted in Gibson and McKenzie, 2007]
     Figure: LA voting precincts relocated. Source: [Brady and Hui, 2006]
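The comparison of reported and real distance on this slide can be illustrated with a short sketch. It assumes each survey record carries GPS coordinates for the respondent and the destination plus a self-reported distance; the field names (`home`, `office`, `reported_km`), the sample coordinates, and the use of the haversine great-circle formula are all illustrative assumptions, not the method used in the cited studies.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def reporting_errors(records):
    """Reported minus GPS-measured distance (km) for each record."""
    return [
        r["reported_km"] - haversine_km(*r["home"], *r["office"])
        for r in records
    ]


# Hypothetical records: coordinates and reported distances are made up.
records = [
    {"home": (14.6928, -17.4467), "office": (14.7167, -17.4677), "reported_km": 5.0},
    {"home": (14.7500, -17.3800), "office": (14.7167, -17.4677), "reported_km": 6.0},
]

for rec, err in zip(records, reporting_errors(records)):
    print(f"reported {rec['reported_km']:.1f} km, error {err:+.2f} km")
```

Straight-line distance is only a proxy for travel distance; a road-network distance would be a closer match to what respondents actually experience.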

  4. Georeferencing Can Unify Data
     • Establishing comparability of most social science measurements is a major undertaking
     • Yet… most social science phenomena are unambiguously located in time and space
     • Complete georeferencing would link almost all datasets at a basic conceptual level
     • However, most social science data is not yet georeferenced … this is an engineering challenge
     • Once done, coincident concepts can be revealed …
     Figure source: [Weeks et al., 2007]
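As a rough illustration of how location can act as a common key between otherwise unrelated datasets, the sketch below links records from two hypothetical sources when they fall in the same latitude/longitude grid cell. The dataset names, field names, coordinates, and the 0.01-degree cell size are assumptions made for the example; real linkage would more likely use a spatial join against administrative or neighborhood polygons.

```python
import math
from collections import defaultdict


def cell_key(lat, lon, cell_deg=0.01):
    """Integer index of the cell_deg x cell_deg grid cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def link_by_cell(left, right, cell_deg=0.01):
    """Pair every record in `left` with every record in `right` sharing its grid cell."""
    index = defaultdict(list)
    for rec in right:
        index[cell_key(rec["lat"], rec["lon"], cell_deg)].append(rec)
    pairs = []
    for rec in left:
        for match in index.get(cell_key(rec["lat"], rec["lon"], cell_deg), []):
            pairs.append((rec, match))
    return pairs


# Hypothetical datasets that share nothing except location.
households = [
    {"id": "h1", "lat": 5.6037, "lon": -0.1870},
    {"id": "h2", "lat": 5.6512, "lon": -0.1055},
]
clinics = [
    {"id": "c1", "lat": 5.6041, "lon": -0.1865},
    {"id": "c2", "lat": 5.5560, "lon": -0.1969},
]

for household, clinic in link_by_cell(households, clinics):
    print(household["id"], "shares a cell with", clinic["id"])
```

A fixed grid misses nearby points that straddle a cell boundary, which is one reason the choice of spatial unit matters when linking.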

  5. Can Georeferencing Fix Experiments and Theory?
     • Not in general … although visualizations may help
     Figure sources: [Altman & McDonald, 2008]; [Snow, 1855]; [Calabrese et al., 2007; Real Time Rome Project, 2007]

  6. Mountains of Unified, Accurate Data… What’s not to like?
     • “The increasing use of linked social-spatial data has created significant uncertainties about the ability to protect the confidentiality promised to research participants... At this time, however, no known technical strategy … adequately resolves conflicts among the objectives of data linkage, open access, data quality, and confidentiality protection across datasets and data uses” -- [Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, National Research Council, 2007]

  7. Can Privacy Problems be Fixed?
     • Maybe not, some challenging findings…
     • Large, sparse datasets can “leak” private information when correlated with external data, even when significantly sub-sampled, perturbed, etc. [Narayanan and Shmatikov, 2008]
     • Repeated release of perturbation-masked geospatial point data leaks increasing amounts of information. Combining it with aggregation masking does not help [Zimmerman and Pavlik, 2008]
     • It is possible to identify other relationships in a network if you can generate seemingly innocuous relationships in the same network [Backstrom et al., 2007]
     • Pseudonymous communication can be linked through textual analysis [Novak et al., 2004]
     • K-anonymized data is still vulnerable if it is homogeneous, or if the attacker has enough background knowledge. L-diversity is offered as a replacement [Machanavajjhala et al., 2007]
     • Additional anonymization challenges for geospatial data
        • Very fine grained location – versus multi-state aggregation mask required by HIPAA, and large social science surveys
        • Background knowledge very likely
        • Easy to integrate with other datasets
        • Some data points may be directly observable
        • Sequences of locations even more challenging
           • May cross aggregation units
           • Repetitive, temporally correlated
           • Induce unique networks
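The homogeneity weakness of k-anonymity mentioned above can be shown with a small check. This is a minimal sketch on made-up records: the column names (`tract`, `age_band`, `diagnosis`) are hypothetical, and the diversity measure is the simplest "count of distinct sensitive values" reading of l-diversity, not the stronger variants defined in the cited paper.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values on all quasi-identifier columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class (the k the table satisfies)."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(g) for g in groups.values())


def distinct_l_diversity(rows, quasi_identifiers, sensitive):
    """Fewest distinct sensitive values found in any equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({r[sensitive] for r in g}) for g in groups.values())


# Hypothetical released table: location and age are already coarsened.
rows = [
    {"tract": "A", "age_band": "30-39", "diagnosis": "flu"},
    {"tract": "A", "age_band": "30-39", "diagnosis": "flu"},
    {"tract": "B", "age_band": "40-49", "diagnosis": "asthma"},
    {"tract": "B", "age_band": "40-49", "diagnosis": "diabetes"},
]

qi = ["tract", "age_band"]
print("k =", k_anonymity(rows, qi))                        # k = 2
print("l =", distinct_l_diversity(rows, qi, "diagnosis"))  # l = 1
```

The table is 2-anonymous, yet anyone known to live in tract A and be aged 30-39 is revealed to have the same diagnosis, which is exactly the homogeneity problem.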

  8. Managing Privacy Issues With Digital Libraries
     • Embedding all sensitive data access in a digital library can greatly improve subject privacy:
        • Authentication, vetting, and access control
        • Standardized license terms governing analysis (derived from metadata and data characteristics)
        • Models can be run on-line without access to raw data
        • Monitoring and auditing of data use
        • Limit sequence of analyses by a user, in some cases (for promising results, see [Dwork et al., 2006])
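One way to picture "models run on-line without access to raw data" combined with a limit on a user's sequence of analyses is a query interface that adds calibrated noise and tracks a privacy budget, in the spirit of [Dwork et al., 2006]. The sketch below is an assumption-laden illustration: the `PrivateCounter` class, its method names, and the budget values are hypothetical, and it is not a description of how any particular digital library works.

```python
import random


class PrivateCounter:
    """Answers counting queries with Laplace noise under a total epsilon budget."""

    def __init__(self, records, total_epsilon, seed=None):
        self._records = list(records)      # raw data never leaves this object
        self._remaining = total_epsilon    # budget shared by all queries
        self._rng = random.Random(seed)

    def noisy_count(self, predicate, epsilon):
        """Return a noisy count of records matching `predicate`, spending `epsilon`."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise PermissionError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # A counting query changes by at most 1 when one person is added or
        # removed (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
        rate = epsilon
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = self._rng.expovariate(rate) - self._rng.expovariate(rate)
        return true_count + noise


# Hypothetical usage: two queries fit in the budget, a third is refused.
people = [{"tract": "A"}, {"tract": "A"}, {"tract": "B"}]
counter = PrivateCounter(people, total_epsilon=1.0, seed=0)
print(counter.noisy_count(lambda r: r["tract"] == "A", epsilon=0.5))
print(counter.noisy_count(lambda r: r["tract"] == "B", epsilon=0.5))
```

Summing the epsilon spent across queries is the simplest accounting rule; this sketch also ignores floating-point subtleties that a hardened implementation would need to handle.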

  9. Federated and Virtually Hosted Digital Libraries
     http://dvn.iq.harvard.edu/

  10. Summary
     • Georeferencing would (partially) solve big problems for social sciences: measurement error, data integration
     • Privacy is likely the fundamental challenge for social scientists using this data
     • Privacy problem may never be fully solved mathematically
     • Digital libraries can provide leverage for management of data privacy issues with social, legal and technical means

  11. References
     • M. Altman, M.P. McDonald, 2008. "Better Automated Redistricting", Journal of Statistical Software, forthcoming.
     • H.E. Brady, I. Hui, 2006. "Is It Worth Going the Extra Mile to Improve Causal Inference?", Political Methodology Annual Meeting, Davis.
     • L. Backstrom, C. Dwork, J. Kleinberg, 2007. "Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography", Proc. 16th Intl. World Wide Web Conference.
     • F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, C. Ratti, 2007. "Real-Time Urban Monitoring Using Cellular Phones: a Case-Study in Rome", Working paper #1, SENSEable City Laboratory, MIT, Boston. http://senseable.mit.edu/papers/ [also see the Real Time Rome Project, http://senseable.mit.edu/realtimerome/]
     • C. Dwork, F. McSherry, K. Nissim, A. Smith, 2006. "Calibrating Noise to Sensitivity in Private Data Analysis", Proceedings of the 3rd IACR Theory of Cryptography Conference.
     • J. Gibson, D. McKenzie, 2007. "Using Global Positioning Systems in Household Surveys for Better Economics and Better Policy", The World Bank Research Observer 22(2): 217-241.
     • A. Machanavajjhala, D. Kifer, J. Gehrke, M. Venkitasubramaniam, 2007. "l-Diversity: Privacy Beyond k-Anonymity", ACM Transactions on Knowledge Discovery from Data 1(1): 1-52.
     • D. McKenzie, Y.S. Sakho, 2007. "Does It Pay Firms to Register for Taxes? The Impact of Formality on Firm Profitability", Washington, D.C.: World Bank.
     • A. Narayanan, V. Shmatikov, 2008. "Robust De-anonymization of Large Sparse Datasets", Proc. of 29th IEEE Symposium on Security and Privacy, forthcoming.
     • J. Novak, P. Raghavan, A. Tomkins, 2004. "Anti-aliasing on the Web", Proceedings of the 13th International Conference on World Wide Web.
     • Panel on Confidentiality Issues Arising from the Integration of Remotely Sensed and Self-Identifying Data, National Research Council, 2007. Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data. National Academies Press.
     • J. Snow, 1855. On the Mode of Communication of Cholera. London.
     • J.R. Weeks, A. Hill, D. Stow, A. Getis, D. Fugate, 2007. "Can we spot a neighborhood from the air? Defining neighborhood structure in Accra, Ghana", GeoJournal 69(1-2): 9-22.
     • D.L. Zimmerman, C. Pavlik, 2008. "Quantifying the Effects of Mask Metadata, Disclosure and Multiple Releases on the Confidentiality of Geographically Masked Health Data", Geographical Analysis 40: 52-76.

  12. Contact Information
     http://maltman.hmdc.harvard.edu/
     <Micah_Altman@harvard.edu>
