Data linkage: the key to long term outcomes Professor Ronan Lyons Farr Institute – CIPHER Centre for Improvement in Population Health through E-records Research. Swansea University Biennial Scientific Meeting, Congenital Anomaly Registers: Utilizing a valuable resource Tuesday 7th October 2014 Dylan Thomas Centre, Swansea
Content of Presentation • Farr Institute • Data linkage in the UK • What is possible now and in the future • Long term outcomes
MRC’s vision for UK medical bioinformatics research Enabling technologies & infrastructure Developing capacity & expertise Funding for innovative research Patient groups Cohorts Trials High throughput data NHS Clinical Data Demographic data BioBanks Educational Environmental Social Data
Strengthening health informatics research • MRC coordinated 10-partner £19m call for e-health informatics research centres across the UK • Cutting edge research using data linkage • capacity building • Additional £20m capital to create Farr Institute • UK Health Informatics Research Network • Coordinate training, share good practice and develop methodologies • Engage with the public, collaborate with industry and the NHS Farr UCL Partners Farr Scotland Farr - CIPHER Farr N8 Manchester
Who is Farr? “Diseases are more easily prevented than cured and the first step to their prevention is the discovery of their exciting causes.” William Farr
Our Vision “To harness health data for patient and public benefit by setting the international standard in trustworthy reuse of electronic patient records and related linkable data for large-scale research.”
Our Ten Key Activities 1. Collaborative Leadership 2. Cutting edge Research 3. Public engagement 4. Governance (safe havens) 5. Methods development 6. Meta Data and Enabling Datasets 7. Harmonised eInfrastructure 8. Partnerships 9. Training/Capacity Building 10. Communications To deliver impact nationally and internationally
Various developments across the UK • Considerable number of initiatives • UK • Farr Institute • Administrative Data Research Centres/Network • England • Health and Social Care Information Centre • Clinical Practice Research Datalink • Northern Ireland • Northern Ireland Longitudinal study • Scotland • Information Services Division, ISD Scotland • Electronic Data Research and Innovation Service eDRIS • Wales • SAIL databank
Steps in utilising health information for research • Building trust, partnerships and collaboration • Development of anonymisation and linkage techniques • Quality assessment and appraisal of datasets • Use of datasets to support research SAIL uses a split file, trusted third party (TTP), multi-stage encryption, and a stepwise, restricted-field remote access analysis system to ensure privacy protection. Lyons RA, et al. The SAIL databank: linking multiple health and social care datasets. BMC Med Inform Decis Mak. 2009 Jan 16;9:3. http://www.biomedcentral.com/1472-6947/9/3
Secure Anonymised Information Linkage (SAIL) databank SAIL: a multi-sourced data bank of linkable anonymised data on the population of Wales: • health service operational systems • national databases • clinical and biological data • education, housing, social care, etc. Uses a trusted third party, split file and multiple encryption technologies to create Anonymised Linkage Fields (ALFs) for individuals and residences SAIL Gateway is a remote access analysis facility to curtailed data.
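The split file and trusted third party idea described above can be illustrated with a minimal sketch. This is not SAIL's actual implementation: the field names (`system_id`, `nhs_number`), the use of HMAC-SHA256 as the encryption step, and the function names are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def split_record(record, demographic_fields):
    """Data provider: split one record into a demographic and a clinical part.

    Both parts keep the (hypothetical) system_id so they can be recombined.
    """
    demographic = {k: v for k, v in record.items()
                   if k in demographic_fields or k == "system_id"}
    clinical = {k: v for k, v in record.items() if k not in demographic_fields}
    return demographic, clinical


def keyed_hash(value, secret_key):
    """Keyed one-way hash (HMAC-SHA256), standing in for an encryption stage."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def ttp_process(demographic, ttp_key):
    """Trusted third party: sees demographics only, returns system_id plus ALF."""
    return {"system_id": demographic["system_id"],
            "alf": keyed_hash(demographic["nhs_number"], ttp_key)}


def databank_load(ttp_output, clinical, databank_key):
    """Databank: recombine on system_id, encrypt the ALF again, drop system_id."""
    if ttp_output["system_id"] != clinical["system_id"]:
        raise ValueError("system_id mismatch: parts belong to different records")
    row = {k: v for k, v in clinical.items() if k != "system_id"}
    row["alf_e"] = keyed_hash(ttp_output["alf"], databank_key)
    return row


def anonymise(record, demographic_fields, ttp_key, databank_key):
    """Run the whole pipeline for one record."""
    demographic, clinical = split_record(record, demographic_fields)
    return databank_load(ttp_process(demographic, ttp_key), clinical, databank_key)
```

In this sketch the trusted third party never sees clinical content and the databank never sees identifiers, yet the same person receives the same `alf_e` in every dataset, which is what makes later linkage possible.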
SAIL split file/trusted third party methodology [Flow diagram; labels: Data Provider • Operational system • Demographic data only • Clinical / activity data • NHS Wales Informatics Service • Anonymisation process: Validate, Trace & Geo-code, Construct ALF • HIRU (Blue C) • Recombine • Encrypt and load • Validated, anonymised data • Other recombined data]
Datasets in SAIL (incomplete coverage) Administrative Health: Population Inpatients Outpatients Emergency Department Child Health Database Wales NHS Direct Wales Administrative Non-Health: Births Deaths Educational Attainment Social Services Housing Clinically rich databases: Specialty specific Cancer Incidence Cancer Screening Congenital Anomalies Arthropathies Myocardial Infarction Diabetes Etc. General GP Data Laboratory systems Study specific Embedded trials and cohorts
Particular difficulties with congenital anomaly research • Fetal deaths common with more severe malformations • Fetus does not have an ‘identity’ such as an NHS number • There may be multiple fetuses • Babies often leave hospital with incomplete name – ‘Baby Surname’ • Early neonatal deaths - not registered with GP • However, possible to link maternal and baby NHS numbers if systems like National Community Child Health Databases in Wales exist • NN4B
Informatics challenges • Modern cohorts/registries designed for multi-modal data linkage • Huge amounts of data • Different database structures/sizes • Major challenges when creating cross/cohort/platform analyses • Semantic interoperability /data harmonisation issues • Original metadata - standards • Variable definitions from baseline/laboratory results • Variable definitions from routine GP/hospital data • GP Read codes: UK/NZ, user variation+++ • UK Inpatient data – different in Wales/England/Scotland • Too difficult to move very large and complex data • Recipients would need to design/implement very complex data structures just to receive data • Privacy protection essential • Potential for ‘jigsaw’ attacks, threat from reidentification scientists • World-wide shortage of skills and expertise in managing these challenges • No single institution with all necessary skills • Need for international collaboration • Build upon existing expertise, developments and investments
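One common way to screen for the ‘jigsaw’ risk mentioned above is to look for combinations of indirectly identifying fields that are shared by very few people. The sketch below is an illustration only and is not taken from the presentation: the field names and the threshold `k=5` are assumptions.

```python
from collections import Counter


def risky_combinations(rows, quasi_identifiers, k=5):
    """Return each combination of quasi-identifier values held by fewer than k rows.

    rows: list of dicts, one per person.
    quasi_identifiers: field names that could identify someone when combined.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}
```

Any combination returned here would need coarsening (for example wider age bands or larger geographies) or suppression before results leave a secure environment.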
Cohort Data in UK Dementia Platform • 22 cohorts involved • UK Biobank – greatest variety • Baseline survey • Baseline anthropometrics/ physiological measurements (continuous/categorical) • Baseline biochemistry/haematology • Genomics – 821,000 SNPs • Imaging: retinal/MRI/US • Accelerometer data • Follow up • Death and cancer registry • Primary care • Hospital data • Disease registries • Self reported conditions/status • Functional/cognitive impairment
Remote analysis platform for multiple cohorts: UK Secure e-Research Platform (UK SeRP) • Built upon SAIL Gateway developments www.saildatabank.com • Built with MRC capital infrastructure for Farr Institute • bid supported by ALSPAC, UK Biobank, LifeStudy cohorts • A national / international resource delivered through FARR • A secure environment to enable research groups to conform to best practices of data management, security and information governance • A remote access large scale IT infrastructure with standard and bespoke analytical tools • Leaves data ownership with the cohorts • devolved account and access control • information governance responsibility & control with projects • Researchers focus on the science
Wales Electronic Cohort for Children (WECC) • Multidisciplinary collaborative project • Platform for translating routinely collected data into an anonymised population level child e-cohort • Investigate the widest possible range of social and environmental determinants of child health and social outcomes • Inform the development of interventions to reduce health inequalities of children in Wales • Two phases: - Phase 1: proof of concept - Phase 2: dynamic capabilities
WECC development WECC eligibility criteria applied • WDS • Child Health (NCCHD) • ALF_E • Birth records (ONS births) • Mortality records (ONS deaths) Data cleaning: rules for removal of duplicates and errors • Wales Electronic Cohort for Children • N=981,404 WDS: Welsh Demographic Service, NCCHD: National Community Child Health Database, ONS: Office for National Statistics
Links with health and education data via ALF_E • Links with maternal health data via mALF_E • Links with SAIL eGIS data via ALF_E/RALF_E Born in Wales n = 766,309 ♂: 392,959 (51.3%) ♀: 373,333 (48.7%) WECC core n = 981,404 ♂: 500,181 (51.0%) ♀: 481,205 (49.0%) Non-Welsh births n = 215,095 ♂: 107,222 (49.8%) ♀: 107,872 (50.2%) WECC derived tables: Environment • House Moves • Inpatient • GP consultations • Education • National dataset • Perinatal and Child health
Examples of analyses • Influence of maternal and child health factors on time to first admission with a respiratory disorder (Paranjothy S. et al (2013) Pediatrics 132:6 e1562-e1569) • Influence of head injuries on educational attainment at age 7 (Gabbe B.J. et al (2014) J Epidemiol Community Health 68:5 466-470) • Educational outcomes for frequent movers (Hutchings H. et al (2013) PLoS One. 8(8) e70601) • Influence of the physical and social environment on childhood obesity
Background to WECC phase 2 Poor educational attainment → unemployment and/or low salary → ill-health A greater understanding of factors underlying education inequalities is necessary to target interventions to protect future generations from poverty and ill health.
Research questions • Does moving to a less deprived community influence child health and educational outcomes? • To what extent do serious childhood or family health conditions affect educational outcomes? • Is poor educational attainment a risk factor for adverse health in adolescence? • Can a novel hybrid cohort study, embedding a traditional detailed survey cohort (e.g. the Millennium Cohort Study (MCS)) within D-WECC, be used to evaluate the strengths and weaknesses of using e-cohorts for epidemiological studies?
Data linkage and long term outcomes • Individual linkage • Mortality data: survival and cause of death • GP and hospital activity: health service impact/comorbidity • Laboratory and imaging systems: severity of condition/comorbidity • Education attainment: social impact of condition • Work and benefits: social impact/disability • Family/household linkage • Impact on the wider family
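Individual linkage of this kind amounts to joining datasets on a shared anonymised key. The following is a minimal sketch, assuming every dataset already carries the same encrypted linkage field (called `alf_e` here); the record layouts and field names are hypothetical.

```python
from collections import defaultdict


def link_outcomes(registry, deaths, admissions):
    """Attach mortality and hospital activity to each registry case by alf_e.

    registry:   list of dicts with keys 'alf_e' and 'anomaly'
    deaths:     list of dicts with keys 'alf_e' and 'cause'
    admissions: list of dicts with key 'alf_e' (one dict per admission)
    """
    death_by_alf = {d["alf_e"]: d for d in deaths}
    admissions_by_alf = defaultdict(list)
    for admission in admissions:
        admissions_by_alf[admission["alf_e"]].append(admission)

    linked = []
    for person in registry:
        alf = person["alf_e"]
        death = death_by_alf.get(alf)
        linked.append({
            "alf_e": alf,
            "anomaly": person["anomaly"],
            "died": death is not None,
            "cause_of_death": death["cause"] if death else None,
            # .get() avoids creating empty entries for people never admitted
            "n_admissions": len(admissions_by_alf.get(alf, [])),
        })
    return linked
```

The same pattern extends to education or benefits data: each extra source is another lookup keyed on the linkage field, with no names or NHS numbers involved.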
Time to the first emergency respiratory hospital admission • Risk decreased with each successive week in gestation up to 40 – 42 weeks. • Risk further increased for babies that were small for gestational age. • The increased risk is small for late preterm infants but the number affected is large and will impact on healthcare services.
Head injury and school performance For children entering school, what is the association between preceding head injury and KS1 (age 5-7 years) performance? [Cohort flow diagram] Born in Wales Sept 1998-Aug 2001: n=116,154 → Remaining in Wales: n=101,892 (Left Wales: n=14,262) → Valid KS1 result: n=90,661 → Head injury admission: n=290; No head injury: n=90,371 J Epidemiol Community Health 2014;68:466-470 doi:10.1136/jech-2013-203427
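The cohort counts on this slide can be checked with simple arithmetic. The sketch below uses only the numbers shown; the derived figures (children remaining in Wales without a valid KS1 result, and the percentage with a head injury admission) are calculated here and are not stated on the slide.

```python
def cohort_flow():
    """Check the head injury cohort counts and derive two summary figures."""
    born_in_wales = 116_154
    remaining, left_wales = 101_892, 14_262
    valid_ks1 = 90_661
    head_injury, no_head_injury = 290, 90_371

    # Each split should add back up to its parent group.
    if remaining + left_wales != born_in_wales:
        raise ValueError("residence split does not sum to births")
    if head_injury + no_head_injury != valid_ks1:
        raise ValueError("exposure split does not sum to valid KS1 results")

    return {
        # Remained in Wales but had no valid KS1 result (derived, not on the slide)
        "no_valid_ks1": remaining - valid_ks1,
        # Share of the analysed cohort with a head injury admission, in percent
        "head_injury_pct": round(100 * head_injury / valid_ks1, 2),
    }
```

Both splits are internally consistent, and fewer than 1 in 300 analysed children had a head injury admission, which shows why a whole-population linked cohort is needed to study an exposure this rare.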
Association between head injury and satisfactory performance on KS1
Soon - a tidal wave of data… • Full genome sequence ~£3,000 • Dropping in price 10x every 2-4 years • Existing NHS genetic test ~£1,000 • Disk cost to store an individual's variations ~10p • Development of continuous monitoring and remote sensors • Data from many other sources • New approaches needed for accessing, manipulating, visualizing • Requires entirely new perspective
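The price trend quoted above can be turned into a rough projection. This sketch assumes a smooth exponential decline, which the slide does not claim; the function names are illustrative only.

```python
import math


def projected_cost(cost_now, years, tenfold_period):
    """Cost after `years` if the price falls tenfold every `tenfold_period` years."""
    return cost_now * 10 ** (-years / tenfold_period)


def years_until(cost_now, target_cost, tenfold_period):
    """Years until the price reaches `target_cost` under the same assumption."""
    return tenfold_period * math.log10(cost_now / target_cost)


# Using the slide's figures: a ~£3,000 genome and a tenfold drop every 2 to 4 years.
fast = projected_cost(3000, years=4, tenfold_period=2)   # optimistic end of the range
slow = projected_cost(3000, years=4, tenfold_period=4)   # pessimistic end of the range
```

Under these assumptions a full genome would cost roughly £30 to £300 after four years, and would fall below the ~£1,000 price of an existing NHS genetic test within about one to two years.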
The future is bright • Expect further development of data linkage capabilities across the UK • However, capacity is a major issue • Amount of work needed is often underestimated • Ensuring that privacy is protected and that the public are engaged with and accept this research approach are key activities