On 17 June 2008, Iain Buchan of the University of Manchester presented the Shared Genomics platform during a Microsoft Research visit to Manchester. The talk argued that epidemiology needs to be extended, introduced two central studies, described the software engineering approach behind the platform, and explored extensions to statistical genetics. It stressed that major public health problems such as Type 2 diabetes have complex webs of causes, so the relevant data and structure extend beyond any single study's observations.
Shared Genomics: Extended Reasoning for Epidemiology. Iain Buchan, University of Manchester. Microsoft Research Visit, Manchester, 17th June 2008
Today • Identify the need to extend epidemiology and introduce the two central studies • Introduce the software engineering approach to building the shared genomics platform • Explore some extensions to statistical genetics and consider how to incorporate them • Times: 09:15, 10:30, 11:30
Thanks • Microsoft • Mark (Project Management) • Gareth (HPC Software Engineering) • Peter/Melandra Ltd (Annotation Bases) • David (Bioinformatics & Novel Analytics) • George (Bioinformatics) • Carole (Social Computing & myGrid) • Adnan, Angela & Fernando (MAAS Study) • John & Martin (Salford NHS)
def. Epidemiology: “the study of the distribution and determinants of disease and health-related states in populations” (JM Last, 2000)
Epidemiology 1600-1860 Imagination Summarisation Knowledge Observation
Epidemiology 1860-2000 Imagination Summarisation & Statistical Modelling Knowledge Observation ± Experimentation
Exhausted Epidemiology Platform • Problem 1: Dwindling hits from tools to detect independent “causes” • Problem 2: Knowledge can’t be managed by reading papers any more • The big public health problems, e.g. Type 2 Diabetes, have “complex webs of causes” • The “data-set” and structure extend beyond the study’s observations
[Diagram labels] Data sources: GP (×4), Hosp., Optometrist, Eye screening, Community nurses, Podiatry, Biomics Data, Deaths, Demographics etc. • Real-time Link on NHS number • Data Repository in PCT • FIREWALL • Person-identifiable and sensitive information removed • Anonymised Data Repository in PCT • 24-hourly updates • Trusted person poses question(s) • Outputs
Exposure (simple): Food intake & physical activity Modifying factors (e.g. sex) Exposure (compound): Sustained +ve energy balance Intermediate outcome: Overweight Intermediate outcome: Central Obesity Outcome (state): Type 2 diabetes Outcome (function): Early death Confounding factors (e.g. transport)
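The roles on this slide can be held in a small graph structure. The following is a minimal sketch only: the role labels come from the slide, but the edges are an assumption (a simple chain in the order the labels appear), because the arrows of the original diagram are not recoverable from the transcript. The modifying and confounding factors are left out for the same reason. The names `ROLES`, `edges` and `downstream` are illustrative.

```python
from collections import defaultdict

# Factor -> role, as labelled on the slide.
ROLES = {
    "Food intake & physical activity": "exposure (simple)",
    "Sustained +ve energy balance": "exposure (compound)",
    "Overweight": "intermediate outcome",
    "Central obesity": "intermediate outcome",
    "Type 2 diabetes": "outcome (state)",
    "Early death": "outcome (function)",
}

# ASSUMPTION: each factor causes the next one in slide order.
CHAIN = list(ROLES)
edges = defaultdict(set)
for cause, effect in zip(CHAIN, CHAIN[1:]):
    edges[cause].add(effect)


def downstream(node):
    """Return every factor reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    for factor in sorted(downstream("Overweight")):
        print(factor, "|", ROLES[factor])
```

Keeping roles and edges separate means a second expert's view of the same factors can reuse `ROLES` with a different edge set.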
Obesity Graphs (emerging) Foresight
Gene Association Studies • Which genetic variation is responsible for disease variation? • Single nucleotide polymorphisms (SNPs), e.g. …ATTTAGGACCAATAAGTCT… vs. …AATTAGGATCAATAAGTCT… (two differing sites, marked “?” on the slide) • Human genome = 3 billion bases • 3 million sites of variation • Current cohorts ≈ 5,000 individuals vs. 500,000 SNPs
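A single association test of the kind implied here can be sketched as a Pearson chi-square on a 2x2 table of disease status against variant-carrier status. This is an illustrative sketch, not the platform's code: the binary 0/1 coding of both variables and the function name `chi_square_2x2` are assumptions, and no continuity correction is applied.

```python
from typing import Sequence


def chi_square_2x2(disease: Sequence[int], carrier: Sequence[int]) -> float:
    """Pearson chi-square statistic (1 d.f.) for disease (0/1) vs. carrier (0/1)."""
    if len(disease) != len(carrier):
        raise ValueError("inputs must have the same length")
    # counts[d][c] = individuals with disease status d and carrier status c
    counts = [[0, 0], [0, 0]]
    for d, c in zip(disease, carrier):
        counts[d][c] += 1
    n = len(disease)
    row = [sum(counts[0]), sum(counts[1])]
    col = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    stat = 0.0
    for d in (0, 1):
        for c in (0, 1):
            expected = row[d] * col[c] / n
            if expected == 0:
                return 0.0  # an empty margin carries no information
            stat += (counts[d][c] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # 10 cases (8 carriers) and 10 controls (2 carriers)
    disease = [1] * 10 + [0] * 10
    carrier = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
    print(chi_square_2x2(disease, carrier))
```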
Crude Pan-Genome Scans (data matrix: Patients × SNPs) for (i = 1 to #random permutations) { for (j = 1 to #SNPs) { for (k = 1 to #patients) { disease status vs. locus status χ² } } } Given a typical 5k patients, 0.5m SNPs and 10k permutations: 20k χ² calcs per sec on a modern single core • ≈70 hrs for single SNPs • ≈1,980 years for [n*(n-1)]/2 SNP pairs
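The nested loops and the timing figures on this slide can be reproduced with a short sketch. It assumes binary 0/1 coding for disease and locus status, treats one chi-square calculation as one full pass over the patients, and takes a year as 365.25 days; the function names are illustrative and the code is not the platform's implementation.

```python
import random


def chi_square(disease, locus):
    """Pearson chi-square (1 d.f.) for binary disease vs. binary locus status."""
    n = len(disease)
    both = sum(1 for d, g in zip(disease, locus) if d and g)  # inner loop over patients
    n_dis, n_car = sum(disease), sum(locus)
    denom = n_dis * (n - n_dis) * n_car * (n - n_car)
    if denom == 0:  # an empty margin carries no information
        return 0.0
    # ad - bc of the 2x2 table simplifies to n*both - n_dis*n_car
    return n * (n * both - n_dis * n_car) ** 2 / denom


def permutation_scan(disease, genotypes, n_permutations, seed=0):
    """Empirical p-value per SNP; genotypes[j][k] is SNP j for patient k."""
    rng = random.Random(seed)
    observed = [chi_square(disease, snp) for snp in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(disease)
    for _ in range(n_permutations):          # loop over random permutations
        rng.shuffle(shuffled)
        for j, snp in enumerate(genotypes):  # loop over SNPs
            if chi_square(shuffled, snp) >= observed[j]:
                exceed[j] += 1
    return [(x + 1) / (n_permutations + 1) for x in exceed]


def scan_seconds(n_tests, n_permutations, tests_per_second):
    """Wall-clock seconds for n_tests chi-squares repeated per permutation."""
    return n_tests * n_permutations / tests_per_second


N_SNPS, N_PERM, RATE = 500_000, 10_000, 20_000  # figures from the slide
single_hours = scan_seconds(N_SNPS, N_PERM, RATE) / 3600
n_pairs = N_SNPS * (N_SNPS - 1) // 2
pair_years = scan_seconds(n_pairs, N_PERM, RATE) / (365.25 * 24 * 3600)

if __name__ == "__main__":
    print(f"single SNPs: {single_hours:.1f} hours")
    print(f"SNP pairs:   {pair_years:.0f} years")
```

The cost grows linearly in the number of tests and permutations, which is why moving from single SNPs to all SNP pairs turns hours into centuries on one core.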
Shared Views of Structure Pathway/expert 1 Pathway/expert 2 Causal pathway Modifiers Outcome Exposure Confounders
Evidence & Theory Data Configuration Algorithms Knowledge Management Visualisation Insight Abstract thinking Signal
Simple Algorithms [Diagram: panels 1), 2), 3) with nodes labelled z, G and P] Computational free-thinking, for insights from richly-observed health & environments
Shared Genomics Platform • Apr-Jun 08: statistical genetics algorithms for Win cluster; annotation-base prototype • Jul-Dec 08: basic statistical genetics platform; model genome-wide analyses with MSR & PIs • Jan 09 – Mar 10: epidemiologists driving genome-wide analyses; integrated modelling and annotation
Wider Opportunities • Text-mining for causal inference • Prototyping planned with NaCTeM • Stress-testing annotation workflow systems • Proposal to OMII ENGAGE • Novel visualisation of annotations • Novel statistical algorithms • Graph-based causality workbench • Potential grant applications
Published papers; unpublished papers; slides; abstracts; blogs; experts; workflows; statistical scripts; signposts to other relevant data... Catalogues, ontologies, search engines, text-mining, analytical services, social networks etc. Causality Workbench “Factors a, b and c are not in my study, but they cluster with it in various ways: Factor b is a potentially important measured confounder – I will add it...” Modifiers Causality? Exposures Confounders Errors Outcome Structure
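One way a workbench could flag a factor such as b is to search a shared causal graph for factors outside the study that cause the exposure and also reach the outcome by a route that avoids the exposure. The sketch below is hypothetical throughout: the graph, its edges, and the names `ancestors` and `candidate_confounders` are assumptions chosen to mirror the a, b, c example, not a description of the planned system.

```python
def ancestors(graph, node, blocked=frozenset()):
    """Causes of `node`, direct or indirect, ignoring paths through `blocked`.

    `graph` maps each cause to the set of its direct effects.
    """
    parents = {}
    for cause, effects in graph.items():
        for effect in effects:
            parents.setdefault(effect, set()).add(cause)
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen and parent not in blocked:
                seen.add(parent)
                stack.append(parent)
    return seen


def candidate_confounders(graph, exposure, outcome, in_study):
    """Factors not in the study that cause the exposure and, separately, the outcome."""
    causes_exposure = ancestors(graph, exposure)
    causes_outcome = ancestors(graph, outcome, blocked={exposure})
    return sorted((causes_exposure & causes_outcome) - set(in_study))


# HYPOTHETICAL shared graph: a affects only the exposure, c only the outcome,
# b affects both, so only b is a candidate confounder.
shared_graph = {
    "a": {"exposure"},
    "b": {"exposure", "outcome"},
    "c": {"outcome"},
    "exposure": {"outcome"},
}

if __name__ == "__main__":
    print(candidate_confounders(shared_graph, "exposure", "outcome",
                                in_study={"exposure", "outcome"}))
```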