A Microdata Computation Centre for de-centralized data sources Anja Burghardt and David Schiller (FDZ of BA at IAB) IASSIST 2014, Toronto (Canada), June 4, 2014
Introduction • The European “Data without Boundaries” (DwB) project aims to improve the access to confidential microdata. • A European Remote Access Network (EuRAN) should ease data access. • A number of services will be provided to researchers. • One of them is a Microdata Computation Centre (MiCoCe).
Introduction • EuRAN with its SPA provides a secure infrastructure for accessing tools to analyze distributed data sources. • Restriction for confidential microdata: the data must physically remain in the facilities of the data providers. • This is due to legal restrictions: “I can only care about my data as long as it is stored in my institution or at least in my country (range of validity of my law)”. • Goal: enable analyses across multiple distributed data sources via the EuRAN SPA.
MiCoCe workshop in Nuremberg, Germany • DwB proposed the MiCoCe in its deliverable 4.2. • The need for such a tool is stressed by both sides: • Researchers, in order to answer European-level research questions • Data owners, in order to give secure access to their data sources • Fact: DwB had no clear idea of how to design and implement something like this. • Reaction: we organized a workshop with experts from different disciplines (held April 29-30, 2014).
Workshop participants Tim Mulcahy (NORC), Duncan Smith (University of Manchester), Paul Burton (University of Bristol), Amadou Gaye (University of Bristol), Claus Goran Hjelm (Statistics Sweden), Christian Boehme (GWDG), Thorsten Busert (DIPF), Gerald Mahlmeister (TBA21), Roxane Silberman (CNRS), Leo Engberts (CBS), Titus Purdea (Eurostat), Patricia Kelly Hall (Minnesota Population Center), Ørnulf Risnes (NSD), Andreas Nold (SAS), Lars Hvidberg (University of Southern Denmark), Gillian Raab (University of Edinburgh), Beata Nowok (University of Edinburgh), Alf Wachsmann (MDC for Molecular Medicine), Hans Irebäck (Statistics Sweden), Cosmin Basca (University of Zurich), Oliver Schmitt (GWDG), Peter-Paul de Wolf (CBS), Christoph Stallmann (University of Magdeburg), David Schiller (IAB), Anja Burghardt (IAB), Stefan Bender (IAB), Johanna Eberle (IAB), Thomas Rhein (IAB), Iris Dieterich (IAB), and Jörg Drechsler (IAB). http://dwbproject.org/events/workshop-micoce.html
Outline • Need for harmonization of data sources. • Trusted third party approaches. • Examples of solutions from different disciplines. • Big data and public/private sector cooperation. • Conclusion and outlook.
Need for harmonization of data sources Some statements: • “From a technical point of view you can run analyses on distributed data sources, but the results are likely nonsense if the data is not harmonized” • “As a researcher I don’t care if data is marked as comparable. Access to data is always better than no access and it is up to me to use the data in a useful way” • “The responsibility for output harmonization lies with the data provider but also with the researcher. However, the data provider should support the researcher by providing good (and understandable) data documentation!” • “Harmonization of microdata AND of metadata are both necessary” • “The VRE of EuRAN can be a secured environment to create international data sets”
Trusted third party approaches • Some kind of centralized infrastructure is needed in order to work with distributed data sources. • If there is a trusted party, there can be a chance to move the data physically to a central place. • Different ways of creating trust exist: • Trust in an organization or person (example: Tim Mulcahy from NORC, who was able to create a data set out of business information from US rating agencies). • Trust in secure (and sophisticated) organizational workflows (example: the German National Cohort, which works with confidential information about individuals coming from different sources).
Examples of solutions from different disciplines • Health Data (DataSHIELD) - Data Aggregation Through Anonymous Summary-statistics from Harmonized Individual levEL Databases. • Governmental Data (Statistics Sweden and Statistics Netherlands) – e.g. federated solutions. • Synthetic Data (University of Edinburgh) – creating non-confidential data to support research projects. • Database and File systems (DIPF and GWDG) – adapting current solutions to fit the needs of social science research. • Virtualization (SAS) – using approaches created for Big Data to solve data security issues. • Statistical approaches (CBS) – adapting statistical methods to solve the challenges of analyzing confidential microdata in a secure way. • Adaptive query processing over distributed linked data-endpoints (University of Zurich) – a privacy-preserving solution if adapted to social science needs. These approaches need to be analyzed and compared in order to support the needs of researchers and data providers.
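The idea behind aggregating anonymous summary statistics can be illustrated with a minimal sketch. It is not the DataSHIELD API or any workshop participant's implementation; the names `SiteSummary`, `summarize_locally`, `pool` and the threshold `MIN_CELL_SIZE` are hypothetical and chosen only for illustration. Each data provider computes a few aggregates inside its own facility, and only those aggregates are sent to the central party, which combines them into a pooled mean and variance.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical disclosure threshold: a site refuses to answer for tiny groups.
MIN_CELL_SIZE = 5


@dataclass(frozen=True)
class SiteSummary:
    """Non-disclosive aggregates that are allowed to leave a data provider."""
    n: int           # number of records
    total: float     # sum of the values
    total_sq: float  # sum of the squared values


def summarize_locally(values: Sequence[float],
                      min_cell_size: int = MIN_CELL_SIZE) -> SiteSummary:
    """Runs inside the data provider's facility; microdata never leaves."""
    if len(values) < min_cell_size:
        raise ValueError("too few records to release a summary")
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pool(summaries: Sequence[SiteSummary]) -> Tuple[float, float]:
    """Runs at the central party; returns the pooled mean and sample variance."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    # Made-up toy values standing in for two providers' confidential microdata.
    site_a = summarize_locally([1.0, 2.0, 3.0, 4.0, 5.0])
    site_b = summarize_locally([6.0, 7.0, 8.0, 9.0, 10.0])
    print(pool([site_a, site_b]))
```

The pooled result equals what a direct analysis of the combined records would give, although no individual record has left its provider. A real system would additionally need authentication, output checking and harmonized variable definitions.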
Big data and public/private sector cooperation • The use of Big Data in social science research is becoming an increasingly important topic. • Besides evaluating the usefulness of Big Data, an infrastructure to work with Big Data is needed. • MiCoCe and EuRAN can be adapted to work as such an infrastructure. • Big Data is often hosted by private sector companies. • Data sources from the public and private sectors need to be made available without harming the interests of either side. • MiCoCe and EuRAN can work as a trusted infrastructure to enable access to public and private sector data.
Conclusion and Outlook • This was a first summary of the workshop output. • A paper with more detailed discussions will be prepared by the workshop participants. • It will be a (small) part of the DwB output (European Data Access Forum (final event of DwB (I)) in March 2015). • Creating running tools can be a task for one or more follow-up project(s) of DwB. • Hopefully we will have good news on that at the 41st IASSIST in Minneapolis.
Thank you for your attention! Anja Burghardt, anja.burghardt2@iab.de David Schiller, david.schiller@iab.de This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 262608.