A flexible, scalable genomics framework for integrating heterogeneous vector sequence data
Scott Emrich
Assistant Professor, Computer Science and Engineering
Scientific Manager, VectorBase
University of Notre Dame
Assembly required…
VectorBase is here to help (especially with -omics data). Please see me and/or Dan Lawson (EBI) any time during this meeting.
Anopheles gambiae M and S molecular forms
Lawniczak, Emrich et al. (2010, Science)
Some genomic regions display the footprint of strong, recent selection
Lawniczak, Emrich et al. (2010, Science)
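One common way to look for a footprint of strong, recent selection is a local drop in nucleotide diversity, since a selective sweep removes linked variation near the selected site. The Python sketch below computes per-base diversity in sliding windows from allele frequencies at segregating sites and flags unusually low windows. It is an illustration under assumed inputs (biallelic sites, a known number `n` of sampled chromosomes, an arbitrary cutoff of 25% of the mean, hypothetical toy data), not the analysis used in Lawniczak, Emrich et al.

```python
from typing import List, Tuple

def site_diversity(p: float, n: int) -> float:
    """Expected heterozygosity at one biallelic site with allele frequency p,
    using the small-sample correction n / (n - 1) for n sampled chromosomes."""
    return n / (n - 1) * 2 * p * (1 - p)

def window_diversity(sites: List[Tuple[int, float]], length: int,
                     window: int, step: int, n: int) -> List[Tuple[int, int, float]]:
    """Per-base diversity (pi) in sliding windows over a sequence of `length` bases.
    `sites` holds (1-based position, allele frequency) for segregating sites."""
    results = []
    start = 1
    while start + window - 1 <= length:
        end = start + window - 1
        total = sum(site_diversity(p, n) for pos, p in sites if start <= pos <= end)
        results.append((start, end, total / window))
        start += step
    return results

def low_diversity_windows(results: List[Tuple[int, int, float]],
                          fraction: float = 0.25) -> List[Tuple[int, int]]:
    """Flag windows whose diversity is below `fraction` of the mean over all windows."""
    mean_pi = sum(pi for _, _, pi in results) / len(results)
    return [(start, end) for start, end, pi in results if pi < fraction * mean_pi]

# Hypothetical toy data: a 40-base region, 10 sampled chromosomes.
EXAMPLE_SITES = [(2, 0.5), (5, 0.3), (8, 0.5), (13, 0.4), (17, 0.5),
                 (24, 0.1), (33, 0.5), (36, 0.2), (39, 0.4)]
EXAMPLE_WINDOWS = window_diversity(EXAMPLE_SITES, length=40, window=10, step=10, n=10)
print(low_diversity_windows(EXAMPLE_WINDOWS))
```

In this toy input, the window covering positions 21 to 30 holds a single low-frequency site and is the only one flagged; on real data, window size and cutoff would need to be chosen with care.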
FlexReseq: a tool for integrating diverse sequence data
Reference: ACGTCGT TACTGC
Sample_1:
ACGTC GATACTGC
ACGTCGATAT TGC
ACGTCGATAT TGC
AC GTCGAT ACTGC
ACGTCGAT ACTGC
Sample_2:
ACG TCGT TAT TGC
ACGTCGT TAT TGC
ACGTCGT TAT TGC
ACGTCGT TAT TGC
ACGTC GT TAT TGC
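To make the reference/sample comparison concrete, here is a minimal Python sketch of a per-position pileup. It assumes the spaces inside each read on the slide are layout only, so every read is a 13-base, ungapped alignment starting at the first reference base; the names `pileup` and `variant_sites` are hypothetical and are not part of FlexReseq.

```python
from collections import Counter

# Reads from the slide with layout spaces removed (assumed ungapped, aligned at position 1).
REFERENCE = "ACGTCGTTACTGC"
SAMPLES = {
    "Sample_1": ["ACGTCGATACTGC", "ACGTCGATATTGC", "ACGTCGATATTGC",
                 "ACGTCGATACTGC", "ACGTCGATACTGC"],
    "Sample_2": ["ACGTCGTTATTGC"] * 5,
}

def pileup(reads, ref_len, start=0):
    """Count the bases observed at each reference position (0-based columns)."""
    columns = [Counter() for _ in range(ref_len)]
    for read in reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < ref_len:
                columns[pos][base] += 1
    return columns

def variant_sites(reference, reads):
    """Return {1-based position: {base: count}} wherever any read disagrees with the reference."""
    sites = {}
    for i, counts in enumerate(pileup(reads, len(reference))):
        if any(base != reference[i] for base in counts):
            sites[i + 1] = dict(counts)
    return sites

for name, reads in SAMPLES.items():
    print(name, variant_sites(REFERENCE, reads))
```

Under these assumptions, every Sample_1 read carries A where the reference has T at position 7, while position 10 (reference C) is split between C and T in Sample_1 and is T in all Sample_2 reads.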
FlexReseq implementation
Genome Analysis Toolkit (GATK): a MapReduce framework that allows efficient access to large resequencing data sets (McKenna et al., Genome Research, 2010)
FlexReseq, a module for GATK:
• A configurable interface allows easy data exploration
• A modular implementation of rules allows easy extension of the software
• Saves you from writing lots of scripting (Perl) code!
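The map/reduce idea behind GATK, combined with FlexReseq-style configurable rules, can be sketched in a few lines of Python. This is a hypothetical illustration, not GATK's (Java) walker API or FlexReseq's code: each locus is "mapped" through a list of user-chosen rule predicates, and the passing positions are "reduced" into a result list. The rule names `min_depth` and `majority_differs` and the example loci are made up for the sketch.

```python
from collections import Counter
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, Iterable, List, Optional

@dataclass
class Locus:
    position: int                  # 1-based reference coordinate
    ref_base: str
    counts: Dict[str, Counter]     # sample name -> base counts at this locus

Rule = Callable[[Locus], bool]

def min_depth(n: int) -> Rule:
    """Hypothetical rule: every sample has at least n reads at the locus."""
    return lambda locus: all(sum(c.values()) >= n for c in locus.counts.values())

def majority_differs(a: str, b: str) -> Rule:
    """Hypothetical rule: the most common base differs between samples a and b."""
    def rule(locus: Locus) -> bool:
        ca, cb = locus.counts.get(a, Counter()), locus.counts.get(b, Counter())
        if not ca or not cb:
            return False
        return ca.most_common(1)[0][0] != cb.most_common(1)[0][0]
    return rule

def map_locus(locus: Locus, rules: List[Rule]) -> Optional[int]:
    """Map step: emit the locus position if it passes every configured rule."""
    return locus.position if all(rule(locus) for rule in rules) else None

def reduce_hits(acc: List[int], value: Optional[int]) -> List[int]:
    """Reduce step: collect the positions emitted by the map step."""
    if value is not None:
        acc.append(value)
    return acc

def walk(loci: Iterable[Locus], rules: List[Rule]) -> List[int]:
    """Traverse loci, mapping each through the rules and reducing into a list."""
    return reduce(reduce_hits, (map_locus(locus, rules) for locus in loci), [])

# Hypothetical loci for two samples.
EXAMPLE_LOCI = [
    Locus(7, "T", {"Sample_1": Counter("AAAAA"), "Sample_2": Counter("TTTTT")}),
    Locus(10, "C", {"Sample_1": Counter("CTTCC"), "Sample_2": Counter("TTTTT")}),
    Locus(3, "G", {"Sample_1": Counter("GG"), "Sample_2": Counter("G")}),
]
EXAMPLE_RULES = [min_depth(2), majority_differs("Sample_1", "Sample_2")]
print(walk(EXAMPLE_LOCI, EXAMPLE_RULES))
```

Adding a new filter means writing one more predicate and appending it to the rule list, without touching the traversal code; that separation is the kind of modular extension the slide describes.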
A malaria use case for FlexReseq
How did drug resistance evolve? Why are some parasites drug-resistant?
Goal: connect genotype (genome) to phenotype (drug response)
Samarakoon, Regier, et al., BMC Genomics, 2011
1. Whole-genome shotgun sequencing
• Parents HB3 and Dd2: parental genomes [shotgun libraries], NCBI Trace Archive [28]
• Genetic cross (Wellems et al. 1990 [24]): progeny recombinants SC05 and 7C126, progeny genomes [shotgun libraries], GS-FLX technology (454/Roche)
2. Reference genome mapping
• Reference genome 3D7, PlasmoDB (v5.4) [27]
• Mapped with SSAHA2 (http://www.sanger.ac.uk)
A more detailed map of P. falciparum
[Figure: parental origin (Dd2 or HB3) by chromosome (1 to 14) and chromosome position, for progeny (A) 7C126 and (B) SC05]
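One simple way to build a parental-origin map like this is to ask, at each site where the two parents differ, which parent a progeny's allele matches, then merge runs of same-parent sites into inherited blocks; a switch between blocks brackets a crossover. A minimal Python sketch follows. The site tuples are hypothetical toy data, and skipping sites that match neither parent (which may reflect sequencing or mapping errors) is a simplifying assumption, not necessarily how the published map was made.

```python
from typing import List, Optional, Tuple

# (position, HB3 allele, Dd2 allele, progeny allele) at one chromosome's sites
Site = Tuple[int, str, str, str]

def parental_origin(hb3: str, dd2: str, progeny: str) -> Optional[str]:
    """Return the parent whose allele the progeny carries, or None if uninformative."""
    if progeny == hb3 and progeny != dd2:
        return "HB3"
    if progeny == dd2 and progeny != hb3:
        return "Dd2"
    return None

def inheritance_blocks(sites: List[Site]) -> List[Tuple[str, int, int]]:
    """Merge consecutive informative sites with the same origin into (parent, start, end)."""
    blocks: List[Tuple[str, int, int]] = []
    for pos, hb3, dd2, prog in sorted(sites):
        origin = parental_origin(hb3, dd2, prog)
        if origin is None:
            continue  # parents identical, or progeny matches neither parent: skip
        if blocks and blocks[-1][0] == origin:
            parent, start, _ = blocks[-1]
            blocks[-1] = (parent, start, pos)
        else:
            blocks.append((origin, pos, pos))
    return blocks

def crossover_intervals(blocks: List[Tuple[str, int, int]]) -> List[Tuple[int, int]]:
    """Each crossover lies between the end of one block and the start of the next."""
    return [(a[2], b[1]) for a, b in zip(blocks, blocks[1:])]

# Hypothetical informative sites for one progeny chromosome.
EXAMPLE_SITES = [(100, "A", "G", "A"), (250, "C", "T", "C"), (400, "G", "A", "A"),
                 (520, "T", "C", "C"), (610, "A", "T", "G"), (700, "G", "C", "G")]
blocks = inheritance_blocks(EXAMPLE_SITES)
print(blocks)
print(crossover_intervals(blocks))
```

The site at position 610 matches neither parent and is ignored, so the toy chromosome resolves into HB3, Dd2 and HB3 blocks with two crossover intervals.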
Association of the 2La inversion with clines of aridity in Nigeria…
24,000 mosquitoes, 194 sampling localities
Modified from Coluzzi et al. (1979)
High-throughput sequencing
• Data from the Besansky lab
• Illumina Genome Analyzer
• 4 population pools (S form)
• SHRiMP alignment (BWA also works)
C. Cheng et al., unpublished
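With pooled sequencing, as for the four S-form pools, a site's allele frequency in each pool is commonly estimated from aligned read counts as alt / (ref + alt). A minimal Python sketch follows. The pool names, counts, and the depth cutoff of 10 are hypothetical, and the standard error shown reflects only binomial sampling of reads; it ignores sequencing error and unequal contributions of individuals to the pool.

```python
import math
from typing import Dict, Optional, Tuple

def pool_allele_frequency(ref_count: int, alt_count: int,
                          min_depth: int = 10) -> Optional[Tuple[float, float]]:
    """Estimate the alternate-allele frequency in a pool from read counts.
    Returns (frequency, binomial standard error), or None if depth < min_depth."""
    depth = ref_count + alt_count
    if depth < min_depth:
        return None
    p = alt_count / depth
    return p, math.sqrt(p * (1 - p) / depth)

def pool_frequencies(pools: Dict[str, Tuple[int, int]],
                     min_depth: int = 10) -> Dict[str, Optional[Tuple[float, float]]]:
    """Apply the estimate to each pool's (ref, alt) read counts at one site."""
    return {name: pool_allele_frequency(ref, alt, min_depth)
            for name, (ref, alt) in pools.items()}

# Hypothetical (ref, alt) read counts at one site for four pools.
EXAMPLE_POOLS = {"pool_1": (30, 10), "pool_2": (12, 28), "pool_3": (3, 2), "pool_4": (20, 20)}
for name, est in pool_frequencies(EXAMPLE_POOLS).items():
    print(name, "too shallow" if est is None else f"{est[0]:.2f} +/- {est[1]:.2f}")
```

Returning None for shallow sites, rather than a noisy estimate, keeps low-coverage pools from dominating downstream comparisons between pools.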
In situ error isolation
This has been shown to be important in ancient DNA-based ecology.
Thanks to…
• Notre Dame Bioinformatics Lab, Summer 2010
• VectorBase (NIH/NIAID)
• Dr. Nora Besansky (ND)
• Dr. Frank Collins (ND)
• Rory Carmichael, Andrew Shehan, Nate Konopinski, Dave Campbell (ND), others…
• Anopheles genome cluster group
• i5K Arthropod Genomics Consortium steering committee