
The 1000 Genomes project, Data Availability and Accessibility

L Clarke, H Zheng Bradley, R Smith, I Streeter, E Kulesha, B. Vaughan, P. Flicek and The 1000 Genomes Project. European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK.





Presentation Transcript


  1. The Phase 1 Integrated Variant Set based on 1092 individuals

The 1000 Genomes data sets represent the largest public variation data resources available to the community. Providing coherent and useful resources based on the project data continues to be a key goal for the project Data Coordination Center (DCC). We present here a selection of the tools built on top of the 1000 Genomes data to make it as useful as possible to the wider community. This poster is available at http://www.1000genomes.org/ashg-2012-poster.

For more information

For more information about the work at the DCC, please see: The 1000 Genomes Project: data management and community access, Clarke L, et al. Nat Meth 9, 459-462 (2012).

Finding Data

With more than 250,000 files and 275 Tbytes of data, finding information on the 1000 Genomes FTP site can prove challenging. The DCC provides some tools to assist with this. At the root of our FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp | ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/) we present an index file, current.tree, which lists all the files and directories the FTP site contains. This is updated nightly. The file is a five-column, tab-delimited text file listing the following items for each file or directory:

• Relative file path (from the root of the FTP site)
• Type (file or directory)
• Size (bytes)
• Timestamp (time the file was last updated)
• MD5 checksum

We also present an easy way to search this file on the website (http://www.1000genomes.org/ftpsearch).

Improved Variation Views

Our browser has been updated to Ensembl version 65, which includes improved variation views. The icons give rapid access to different sections of the variation view, including the population genotype views, which provide pie charts for the 1000 Genomes population genotypes. The Gene Variation tables now also contain the minor allele for each variant and its frequency. As before, these tables are available in CSV format.

Tools

As part of our browser, we present several tools to aid access to and analysis of the 1000 Genomes data sets (http://browser.1000genomes.org/tools.html). These tools include the Ensembl Variant Effect Predictor, the Data Slicer, the Variation Pattern Finder and the VCF to PED converter.

• The Variant Effect Predictor can provide functional annotation of SNVs and indels. This can include SIFT and PolyPhen consequences for non-synonymous variants and overlap with high-information parts of transcription factor binding sites.
• The Data Slicer allows users to get particular genomic subsections of both VCF and BAM files.
• The Variation Pattern Finder allows easy discovery of shared inheritance patterns.
• The VCF to PED converter can turn our VCF files into the PED and locus information files required by LD visualization tools like Haploview, allowing users to explore the LD and haplotype structure of our data.

Announcements and Help

It is also now easier than ever to find out about new releases of data from the project. We have created both RSS and Twitter feeds of our website announcements, and you can now subscribe to announcement emails from 1000Announce@1000genomes.org.

• Twitter: http://twitter.com/1000genomes
• RSS: http://www.1000genomes.org/announcements/rss.xml

We also have a tutorial (http://www.1000genomes.org/using-1000-genomes-data) and a FAQ (http://www.1000genomes.org/faq). Please send any questions to info@1000genomes.org.

Acknowledgements: We would like to thank the Ensembl Variation Team, Don Preuss and Christopher Cope at the NCBI, and our funder, the Wellcome Trust.
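The five-column layout of current.tree described under Finding Data lends itself to a short script. The following is only a minimal sketch in Python, assuming the columns appear in the order listed (path, type, size, timestamp, MD5) and that the type column holds the literal words "file" or "directory". The names TreeEntry, parse_tree, find_files and md5_matches are hypothetical helpers written for this illustration, and the sample rows are invented rather than taken from the FTP site. The timestamp is kept as plain text because its exact format is not stated here.

```python
import csv
import hashlib
import io
from typing import Iterable, Iterator, NamedTuple


class TreeEntry(NamedTuple):
    """One row of the index file (hypothetical helper type)."""
    path: str       # relative to the root of the FTP site
    kind: str       # assumed to be "file" or "directory"
    size: int       # bytes
    timestamp: str  # kept as text; the format is not specified in the poster
    md5: str        # may be empty, e.g. for directories (assumption)


def parse_tree(handle: Iterable[str]) -> Iterator[TreeEntry]:
    """Yield one TreeEntry per non-empty, tab-delimited line."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        row = (row + [""] * 5)[:5]  # tolerate short rows by padding
        path, kind, size, timestamp, md5 = row
        yield TreeEntry(path, kind, int(size or 0), timestamp, md5)


def find_files(entries: Iterable[TreeEntry], text: str) -> Iterator[TreeEntry]:
    """Yield file entries whose relative path contains the given text."""
    for entry in entries:
        if entry.kind == "file" and text in entry.path:
            yield entry


def md5_matches(data: bytes, expected: str) -> bool:
    """Compare the MD5 of downloaded bytes with the listed checksum."""
    return hashlib.md5(data).hexdigest() == expected.strip().lower()


# Invented sample rows, for illustration only.
EXAMPLE_BYTES = b"hello world"
SAMPLE = (
    "ftp/data\tdirectory\t4096\t2012-10-01 00:00:00\t\n"
    "ftp/data/example.vcf.gz\tfile\t11\t2012-10-01 00:00:00\t"
    + hashlib.md5(EXAMPLE_BYTES).hexdigest()
    + "\n"
)

if __name__ == "__main__":
    entries = list(parse_tree(io.StringIO(SAMPLE)))
    for hit in find_files(entries, ".vcf"):
        print(hit.path, hit.size, md5_matches(EXAMPLE_BYTES, hit.md5))
```

To use this against the real index, one would open a downloaded copy of current.tree in text mode and pass the file handle to parse_tree in place of the in-memory sample.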
