
Towards Completion of the 1000 Genomes Project

Adam Auton¹ on behalf of the 1000 Genomes Project Consortium
¹ Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461



Overview
• Integrated haplotype map of 2,535 human genomes
  • 26 populations from Europe, Africa, Asia, and the Americas
  • Low coverage (~5X) and targeted exome sequencing (~80X), combined with high density SNP microarrays.
• 424 high coverage Complete Genomics genomes
  • 129 family trios, 6 duos, and 25 unrelated individuals at ~45X.
• 2 ‘Gold Standard’ trios
  • 1 CEU and 1 YRI trio sequenced to 60X with 250bp paired end reads using PCR-free libraries.
• Expected Release Date: Spring 2014

Samples in Final Release
[Figure: samples in the final release, by population. Panels: Low Coverage + Exome Sequencing; Complete Genomics Data. Notes: * Duplicated samples; new populations in bold.]
• CEU Trio: NA12891, NA12892, NA12878
• YRI Trio: NA19239, NA19238, NA19240

Data Processing
• Multiple Input Callsets
  • 8 SNP / Indel callsets
  • Multiple Structural Variant callsets
  • 2 Short Tandem Repeat callsets
  • 1 whole-genome de novo callset
• High quality variant sites selected via machine learning techniques
  • FDR controlled to be < 5%.
• Variants integrated, genotyped, and phased via statistical techniques

Method Developments
• Earlier releases of the 1000 Genomes focused on simple, biallelic variants.
  • SNPs, indels, deletions
• The final phase of the project is focusing on improving calling and integration methods to access more of the genome.
  • Multi-allelic indels, MEI, CNVs…
[Figure panels: Short tandem repeat calling (∆ lobSTR Repeat Number vs. ∆ Marshfield Repeat Number, r² = 0.75); Haplotype-based calling; Empirical Indel Error Modeling (proportion of calls vs. indel length); De novo assembly; Genotyping structural variation. Legend labels: Original, Recalibrated.]

Analysis
• The final dataset is expected to contain >66 million variants.
  • Approximately 80% more than the Phase 1 release.
• It is expected that over 95% of variants with frequency of at least 1% will have been characterized by the project.
[Figure panels: De novo mutation; Recombination; Germline mutation rate; The distribution of rare variants; Admixture; Genotype Accuracy (Genotyping without LD; Genotypes with LD).]

Community Events
• ASHG 1000 Genomes Project Data Tutorial
  • Boston Convention & Exhibition Center, Meeting Rooms 156ABC, Wednesday October 23rd, 7 – 9pm
• Community Meeting
  • Churchill College, Cambridge, UK. June 2014, with dates to be announced.

Visit http://1000genomes.org for more project details
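
The counts quoted in the Overview and Analysis sections can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up for this example, it assumes each trio contributes three individuals and each duo two, with no individual counted in more than one family, and the Phase 1 figure it derives is merely what "approximately 80% more" would imply for a final count of 66 million, not an official number.

```python
# Sanity-check figures quoted on the poster. Variable names are illustrative.

# Complete Genomics high-coverage set.
# Assumption: a trio is 3 individuals, a duo is 2, and no individual is shared.
trios, duos, unrelated = 129, 6, 25
complete_genomics_total = 3 * trios + 2 * duos + unrelated
print(complete_genomics_total)  # 424, matching the quoted total

# Final release: >66 million variants, approximately 80% more than Phase 1.
# Dividing by 1.8 gives the Phase 1 size this would imply (a rough estimate).
final_variants = 66_000_000
implied_phase1 = final_variants / 1.8
print(round(implied_phase1 / 1e6, 1))  # 36.7 (million)
```

Under these assumptions the family breakdown adds up exactly to the 424 Complete Genomics genomes, and the implied Phase 1 size is a rough estimate, since the poster gives the final count only as a lower bound and the increase only approximately.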
