Services for Sensitive Research Data

Gard Thomassen, PhD, Head of Research Support Services, Group Leader of the “Services for Sensitive Data” project, University Center for Information Technology (USIT), University of Oslo.

Presentation Transcript


  1. Services for Sensitive Research Data. Gard Thomassen, PhD, Head of Research Support Services, Group Leader of the “Services for Sensitive Data” project, University Center for Information Technology (USIT), University of Oslo

  2. Outline • What is sensitive data? • Who has sensitive data? • Project background • Collaborators and reference group • System requirements • System outline • Technical and security details • Maintenance • Advantages and current status • International collaborations Gard Thomassen, TSD 2.0

  3. Who has sensitive data? • Faculty of Medicine / Oslo University Hospital • Faculty of Theology • Faculty of Educational Sciences • Faculty of Social Sciences • And the list continues, also outside UiO

  4. Project background • UiO has an open network structure, but still with a high level of security • Most UiO data is open • Various UiO/OUS researchers approached USIT asking for an eInfrastructure for sensitive data (mostly MR images and NGS data) • The pilot project TSD 1.0 was run

  5. Lessons learned • The need for our services far exceeded the scalability of our system • Too much hands-on maintenance and manual setup of new projects and new users • There is a need for a High Performance Computing (HPC) resource within a secure environment • Not very user-friendly (at both ends)

  6. Main collaborators on TSD 2.0. Collaborators: • Norwegian Storage Infrastructure (NorStore) • Norwegian Genetics Analysis Platform (GenAP) • Norwegian Dietary Registry (Faculty of Medicine) • Institute of Psychology (Faculty of Social Sciences) • Norwegian Cancer Sequencing Consortium (NCGC). Reference group: Oslo University Hospital, NorStore, Regional Ethical Committee, National Institute of Public Health, Norwegian Cancer Registry, Research Network at OUS, Elixir Norway, NCGC, GenAP and Institute of Psychology, UiO.

  7. System requirements • Security, isolation and access control as required by law • Large storage capacity • Multiple users • High performance computing resource • High bandwidth • Easy to maintain • Easy to use (including audio and video) • Some freedom within user space • Accessible from anywhere through authentication • A variety of software and public DBs must be available • Windows and Linux support (OS X if possible) • Data collection service • Data sharing service • National scope (so far)

  8. Solution outline

  9. System outline [Diagram; recoverable labels: Internet, Gateway, VM-server (marked “1” to “n”, “1 (project)”), HPC (Colossus), Storage (marked “1 (storage area)”), and a secure encrypted network to special high-volume data production sites.]

  10. Using TSD 2.0 for analysis [Diagram; recoverable labels: users B1 and B2 of project P1, GW, and inside TSD 2.0 the P1 DB, VM B1 P1, VM B2 P1 and the TSD disk for P1; a Colossus front end, Colossus, and the Colossus disk.]

  11. Data import and export using TSD 2.0 [Diagram with four numbered steps; recoverable labels: virtual “sluice server”, “sluice server”, “sluice HD” (NFS mount), project HD, virtual project server. Annotation on the diagram: data is copied here by ssh+scp or web drive (2-factor authentication); data is encrypted if sensitive.]

  12. Data collection using TSD 2.0 [Diagram; recoverable labels in order: minID, “Nettskjema”, encrypted XML (PGP), import mechanism, project VM and project disk inside TSD 2.0.]

  13. Data import for NGS centers and other large-scale data producers [Diagram; recoverable labels: HiSeq, TSD-controlled box on-site, encrypted connection, GW, and inside TSD 2.0 the project VM, /tmp/ storage and project disk.]

  14. Technical outline [Diagram; recoverable structure follows.] Closed network at USIT: HPC resource. Admin services: • Provisioning system • AD • Surveillance • Software repo • Cfengine • vCenter • Backup • Antivirus • Log service. Storage / DBs: • PostgreSQL • Archiving • Compartmentalized disk. Management: • Mgmt of storage • Mgmt of network • Mgmt of hardware • Mgmt of VMs. Access network: • National Health network • Terminal servers • Thin client servers • VPN. Clients (2-factor login): • Remote desktop clients • Thin clients on dedicated network • Special network for large-scale data production centers. Clinical health data projects. Other sensitive data projects. Publicly available network segment through “minID”: web questionnaire, web portal, electronic consent.

  15. Technical details • KVM for virtualization (Red Hat Linux) • Cerebrum for provisioning (a USIT application) • AD system administration guided by the provisioning system (duplicated) • FreeBSD firewall and gateway (duplicated) • Integration with ID-porten (Norwegian governmental eID system) for web enquiries and applications • Storage with separation between projects (Hitachi disk system and encrypted backup to tape) • IPv6 on the inside (and private IPv4)

  16. HPC resource: Colossus • At present about 500 cores • No project users are to log in on any nodes • One global job daemon to control data integrity (to ensure project data separation) • /tmp/ and /work/ will be per project and cleaned after each job finishes • As similar to Abel as possible • Separate disk and more nodes will come soon
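
Note: the per-project scratch areas that are cleaned after each job can be illustrated with a minimal Python sketch. The function name `project_scratch`, the directory layout, and the job identifier are assumptions made for illustration; the slides do not describe how Colossus actually implements this.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def project_scratch(root: Path, project: str, job_id: str):
    """Yield a private scratch directory for one job, then remove it.

    `root` stands in for a scratch file system such as /work/;
    the layout root/<project>/<job_id>-XXXX is hypothetical.
    """
    project_dir = root / project
    project_dir.mkdir(parents=True, exist_ok=True)
    # mkdtemp creates the directory readable and writable only by its owner.
    scratch = Path(tempfile.mkdtemp(prefix=f"{job_id}-", dir=project_dir))
    try:
        yield scratch
    finally:
        # Clean up whether the job succeeded or failed.
        shutil.rmtree(scratch, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_work:
        with project_scratch(Path(fake_work), "p01", "job42") as scratch:
            (scratch / "intermediate.dat").write_text("temporary result")
            print("job is using", scratch)
        print("scratch still exists after job:", scratch.exists())
```

Tying cleanup to the end of the job, rather than to a periodic sweep, means that no data from one project's job is left behind on a node that another project may use next.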

  17. Security details • OATH TOTP 2-factor authentication • Smartphones or programmable hardware tokens • Special roles for those allowed to export data • Import/export is under strict control • No open connection to the internet • Strong separation between projects (VLAN) • Special security measures with remote desktops • Extremely hardened FreeBSD gateway and firewall • Encrypted backup, one key per project • Sys admins use individual accounts (traceability) • Sys admins have to use the same authentication process • Most hardware is physically separated from other UiO hardware
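
Note: OATH TOTP is standardized in RFC 6238, which builds on HOTP (RFC 4226). The following is a minimal Python sketch of the algorithm using only the standard library. The function names, the 30-second step, the 6-digit default, and the one-step verification window are common defaults assumed here, not details taken from the slides, and this is not TSD's implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte select the window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from Unix time."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, candidate: str, now: Optional[float] = None,
           step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    if now is None:
        now = time.time()
    counter = int(now // step)
    for delta in range(-window, window + 1):
        if counter + delta < 0:
            continue
        expected = hotp(secret, counter + delta, digits)
        if hmac.compare_digest(expected, candidate):
            return True
    return False


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # test secret from RFC 6238
    code = totp(shared_secret)
    print("current code:", code, "valid:", verify(shared_secret, code))
```

The small verification window tolerates clock drift between the server and a phone or hardware token; a wider window is more forgiving but weakens the second factor.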

  18. Maintenance • Reuse as much as possible from the USIT eInfrastructure • Virtualize as much as possible • Management/surveillance data can be pushed, but not pulled (Nagios, Collectd) • Surveillance based on existing systems • Sys admins have different access levels

  19. Opportunities enabled by TSD 2.0 • NGS research on humans is possible • Large-scale imaging studies are possible • “HUNT-like” studies online for the respondents and the scientists • Off-site analysis of sensitive data • Secure storage for verification of published research • Electronic consent • Possible work area for making exams? • TSD to host all human NGS research data from UiO/OUS?

  20. Nordic collaboration opportunities • Laws are fairly similar (Norway's are very strict) • It is difficult to exchange data for research • We should learn from each other, as these systems demand very specialized IT knowledge • System development and system administration are non-sensitive and may be shared • Building TSD addresses many novel security questions in a university setting, which others can learn from • Large DBs of health data may enable very interesting research in the future (NeGI) • NeIC has shown interest in TSD 2.0 • TSD collaborates with CSC in Finland and with BILS / Elixir Sweden. BBMRI is interested

  21. Current status • Pilot project data is being transferred now • System is being prepared and finished for setting up new projects and going into production • Storage is up • Secure Nettskjema is up • Working on risk evaluation • Project registration when risk evaluation is finished • HPC resource in the 4th quarter of 2013 • Video and sound will be the main target during further work • System whitepaper (v1.0) written

  22. People involved. Project group / developers; Administration / associated: • IT-dir Lars Oftedal • Hans A. Eide • Märtha Felton • Dag-Erling Smørgrav • Petter Reinholtsen • Elisabeth Ytterdal • Tor Fuglerud • DBA (PostgreSQL team) • Cerebrum team • Morten Werner Forsbring • Espen Grøndahl • HPC (Colossus team) • Gard Thomassen

  23. Cost per project • First-year establishment price (per project) • Regular yearly project fee • License cost (licensed software usage) • Storage cost for storage exceeding the basic allocation • Cost of DB administration (if a DB is needed) • Cost of CPU hours on Colossus
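
Note: the cost components above add up as simple arithmetic, sketched below in Python. All rates and usage figures are hypothetical placeholders, since the slides give no prices. The sketch also assumes that the establishment price is charged in the first year in place of the regular yearly fee; the slides do not say whether it replaces the fee or is added to it.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical price list; the slides give no actual figures."""
    establishment: float   # first-year establishment price
    yearly_fee: float      # regular yearly project fee
    storage_per_tb: float  # price per TB above the basic allocation
    db_admin: float        # yearly DB administration, if a DB is needed
    cpu_hour: float        # price per CPU hour on the HPC resource


@dataclass
class Usage:
    first_year: bool
    license_cost: float
    storage_tb: float
    basic_allocation_tb: float
    needs_db: bool
    cpu_hours: float


def yearly_cost(rates: Rates, usage: Usage) -> float:
    """Sum the cost components listed on the slide for one project year."""
    # Assumption: establishment price replaces the yearly fee in year one.
    base = rates.establishment if usage.first_year else rates.yearly_fee
    # Only storage beyond the basic allocation is charged.
    extra_tb = max(0.0, usage.storage_tb - usage.basic_allocation_tb)
    db = rates.db_admin if usage.needs_db else 0.0
    return (base + usage.license_cost + extra_tb * rates.storage_per_tb
            + db + usage.cpu_hours * rates.cpu_hour)


if __name__ == "__main__":
    rates = Rates(establishment=1000.0, yearly_fee=500.0,
                  storage_per_tb=100.0, db_admin=200.0, cpu_hour=0.5)
    year_one = Usage(first_year=True, license_cost=50.0, storage_tb=12.0,
                     basic_allocation_tb=10.0, needs_db=True, cpu_hours=1000.0)
    print("first-year cost:", yearly_cost(rates, year_one))
```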

  24. Project administration in TSD 2.0 (technical) • Application through the national ID portal + Nettskjema • The project is created in Cerebrum with role categories • The project is connected to resources (VM + disk + VLAN + DB + HPC) • Users are created and given their roles • Username, password and one-time passwords are distributed • Accounts are kept on storage, HPC CPU time and additional VMs to enable control and book-keeping • NorStore may offer “free” storage within TSD (there might be a small security management overhead cost) • In the future there will be some level of self-service through a web portal within TSD
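
Note: the provisioning steps above (create the project with role categories, attach resources, create users with roles, keep accounts of usage) can be pictured as a small data model. The Python sketch below is purely illustrative: the class names, role names, and identifiers are invented for this example and do not reflect Cerebrum's actual data model or API. The export check reflects the special roles for data export mentioned under the security details.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set


class Role(Enum):
    """Hypothetical role categories; the slides do not name the real ones."""
    ADMIN = "admin"
    MEMBER = "member"
    EXPORTER = "exporter"  # stands in for the special data-export role


@dataclass
class Resources:
    """Resources a project is connected to (VM + disk + VLAN + DB + HPC)."""
    vm: str
    disk: str
    vlan: int
    db: Optional[str] = None
    hpc: bool = False


@dataclass
class Project:
    name: str
    resources: Resources
    members: Dict[str, Set[Role]] = field(default_factory=dict)
    cpu_hours_used: float = 0.0  # kept for control and book-keeping

    def add_user(self, username: str, *roles: Role) -> None:
        """Create a user in the project, or extend an existing user's roles."""
        self.members.setdefault(username, set()).update(roles)

    def may_export(self, username: str) -> bool:
        """Only users holding the export role may take data out."""
        return Role.EXPORTER in self.members.get(username, set())

    def record_cpu_hours(self, hours: float) -> None:
        """Account for HPC usage so it can be billed and audited later."""
        self.cpu_hours_used += hours


if __name__ == "__main__":
    project = Project("p01", Resources(vm="p01-vm", disk="/tsd/p01",
                                       vlan=101, hpc=True))
    project.add_user("p01-alice", Role.ADMIN, Role.EXPORTER)
    project.add_user("p01-bob", Role.MEMBER)
    print("alice may export:", project.may_export("p01-alice"))
    print("bob may export:", project.may_export("p01-bob"))
```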

  25. Conclusion • It is very hard to make something secure and user-friendly at the same time • Researchers want the freedom of using the internet while doing research on sensitive data • A thorough risk assessment must be made during and after the planning and implementation phase to make the best choices • What you cannot avoid should at least be detected by some surveillance mechanism • More (inter)national / local cooperation wanted

  26. Pilot project (TSD 1.0) • Secure storage for large amounts of NGS data and MR images (>100 TB) • Secure Windows “research server” enabling use of MS Office, STATA, SPSS etc. on sensitive data • Research server is based on an isolated system using VMware ESX • Two-factor login system • Encrypted backup

  27. “The ultimate goal is to be able to provide the same services that are available for researchers working with non-sensitive data, with the necessary security, with minimum impact on the user experience, and minimum extra overhead and cost.” Hans Eide, 2012 (my boss)
