The 1000 Genomes project, Data Availability and Accessibility

L Clarke, H Zheng Bradley, R Smith, I Streeter, E Kulesha, B Vaughan, P Flicek and The 1000 Genomes Project.
European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.

The Phase 1 Integrated Variant Set based on 1092 individuals

The 1000 Genomes data sets represent the largest public variation data resources available to the community. Providing coherent and useful resources based on the project data continues to be a key goal for the project Data Coordination Center (DCC). We present here a selection of the tools built on top of the 1000 Genomes data to make it as useful as possible to the wider community. This poster is available at http://www.1000genomes.org/ashg-2012-poster.

For more information
For more information about the work at the DCC, please see: Clarke L, et al. The 1000 Genomes Project: data management and community access. Nat Meth 9, 459-462 (2012).

Improved Variation Views
Our browser has been updated to Ensembl version 65, which includes improved variation views. The icons give rapid access to different sections of the variation view, including the population genotype views, which provide pie charts for the 1000 Genomes population genotypes.

Finding Data
With more than 250,000 files and 275 Tbytes of data, finding information on the 1000 Genomes FTP site can prove challenging. The DCC provides some tools to assist with this. At the root of our FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp | ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/), we present an index file, current.tree, which lists all the files and directories the FTP site contains. This is updated nightly. The file is a 5-column tab-delimited text file listing the following items for each file or directory:
• Relative file path (from the root of the FTP site)
• Type (file or directory)
• Size (bytes)
• Timestamp (time file was last updated)
• MD5 checksum
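The five-column layout of current.tree described above lends itself to simple scripted searches. The following is a minimal Python sketch, not a DCC tool: the names TreeEntry, read_tree and find_by_suffix are hypothetical, the sample rows are invented for illustration, and the timestamp is kept as plain text because its exact format is not stated here. It also assumes that the MD5 column may be empty for some rows, such as directories.

```python
import io
from typing import Iterable, Iterator, List, NamedTuple, Optional


class TreeEntry(NamedTuple):
    path: str            # relative to the root of the FTP site
    kind: str            # "file" or "directory"
    size: int            # size in bytes
    timestamp: str       # kept as text; the exact format is not assumed
    md5: Optional[str]   # None when the column is empty (assumption)


def read_tree(handle: Iterable[str]) -> Iterator[TreeEntry]:
    """Yield one TreeEntry per non-blank line of a current.tree-style file."""
    for line in handle:
        line = line.rstrip("\r\n")  # keep trailing tabs, drop only the newline
        if not line:
            continue
        # Pad short rows so a missing trailing column still unpacks cleanly.
        fields = (line.split("\t") + [""] * 5)[:5]
        path, kind, size, timestamp, md5 = fields
        yield TreeEntry(path, kind, int(size or 0), timestamp, md5 or None)


def find_by_suffix(entries: Iterable[TreeEntry], suffix: str) -> List[TreeEntry]:
    """Return the file entries whose path ends with the given suffix."""
    return [e for e in entries if e.kind == "file" and e.path.endswith(suffix)]


# Invented sample rows in the five-column layout (not real FTP content).
SAMPLE = (
    "ftp/example_dir\tdirectory\t4096\tMon Oct 29 10:00:00 2012\t\n"
    "ftp/example_dir/example.vcf.gz\tfile\t1024\tMon Oct 29 10:00:00 2012\t"
    "d41d8cd98f00b204e9800998ecf8427e\n"
    "ftp/example_dir/example.bam\tfile\t2048\tMon Oct 29 10:00:00 2012\t"
    "0cc175b9c0f1b6a831c399e269772661\n"
)

if __name__ == "__main__":
    entries = list(read_tree(io.StringIO(SAMPLE)))
    for entry in find_by_suffix(entries, ".vcf.gz"):
        print(entry.path, entry.size, entry.md5)
```

To use this on the real index, download current.tree from the root of the FTP site, open it in text mode and pass the file handle to read_tree in place of the in-memory sample.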
The current.tree file can also be searched easily on the website; see the screenshot above (http://www.1000genomes.org/ftpsearch).

In the browser, the Gene Variation tables now also contain the minor allele for each variant and its frequency and, as before, these tables are available in CSV format.

Tools
As part of our browser, we present several tools to aid access and analysis of the 1000 Genomes data sets (http://browser.1000genomes.org/tools.html). These tools include the Ensembl Variant Effect Predictor, the Data Slicer, the Variation Pattern Finder and the VCF to PED converter.
The Variant Effect Predictor can provide functional annotation of SNVs and indels. This can include SIFT and PolyPhen consequences for non-synonymous variants and overlap with high-information parts of transcription factor binding sites.
The Data Slicer allows users to get particular genomic subsections of both VCF and BAM files.
The Variation Pattern Finder allows easy discovery of shared inheritance patterns.
The VCF to PED converter can turn our VCF files into the PED and locus information files required by LD visualization tools like Haploview, allowing users to explore the LD and haplotype structure of our data.

Announcements and Help
It is also now easier than ever to find out about new releases of data from the project. We have created both RSS and Twitter feeds of our website announcements, and you can now subscribe to announcement emails from 1000Announce@1000genomes.org.
• http://twitter.com/1000genomes
• http://www.1000genomes.org/announcements/rss.xml
We also have a tutorial and a FAQ:
• http://www.1000genomes.org/using-1000-genomes-data
• http://www.1000genomes.org/faq
Please send any questions to info@1000genomes.org.

Acknowledgements: We would like to thank the Ensembl Variation Team, Don Preuss and Christopher Cope at the NCBI, and our funder, the Wellcome Trust.

