1 / 1

Diversity Data at MaizeGDB

Diversity Data at MaizeGDB. Ethalinda KS Cannon 1 , Carson M. Andorf 2 , Bremen L. Braun 2 , Darwin A. Campbell 2 , Mary L. Schaeffer 3,4 , Cheng-Ting Yeh 5 , Patrick S. Schnable 5 , Carolyn J Lawrence 1,2

egan
Télécharger la présentation

Diversity Data at MaizeGDB

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Diversity Data at MaizeGDB Ethalinda KS Cannon1, Carson M. Andorf2, Bremen L. Braun2, Darwin A. Campbell2, Mary L. Schaeffer3,4, Cheng-Ting Yeh5, Patrick S. Schnable5, Carolyn J Lawrence1,2 1 Department of Genetics Development and Cell Biology, Iowa State University, Ames, IA 50011, USA 2 USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Ames, IA 50011, USA 3 USDA-ARS Plant Genetics Research Unit, University of Missouri, Columbia, MO 65211, USA 4 Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA 5 Department of Agronomy and Center for Plant Genomics Iowa State University, Ames, IA 50011, USA Introduction With the rapidly increasing quantity of diversity data, MaizeGDB is working with collaborators to lay the groundwork for storage, retrieval, visualization, and analysis of these datasets. There is need to handle not only Single Nucleotide Polymorphisms (SNPs) and Copy Number Variations + Presence/Absences variations (CNVs and PAVs), but also complex variation in which entire regions are rearranged along with transcripts that don’t appear on the reference genome because they don’t exist in B73, the sequenced reference genome. These data are useful for many research endeavors and would support Genome Wide Association Studies (GWAS), population studies, and to support high throughput phenotyping projects. Variations Types of data We plan to accept: • Associated phenotypes. (Also looking at NCBI’s ClinVar.) • Transcript assemblies and exome contigs. • Complex alternative alleles. • Assemblies and contigs that do not map to the reference genome. • Genetic maps. We plan to provide access to data stored outside MaizeGDB for: • Associated germplasm (note that Stock Center accessions will remain at MaizeGDB). • SNP collections (iPlant and dbSNP, also individual projects). • PAV & CNV collections in dbVar. (Also looking at NCBI’s ClinVar.) • Expression data, including RNA-seq and sequence-capture (PLEXdb, GEO, SRA, individual project sites, et cetera). Single Nucleotide Variations (SNPs) We recommend that these be stored at NCBI in the dbSNP database. Each SNP and assay is given a unique ID, can be aligned to a reference assembly, and has associated population frequency data. In addition, if the data are at NCBI, MaizeGDB and other informatics projects all will be enabled to access and work with the data. http://www.ncbi.nlm.nih.gov/projects/SNP/ Presence/Absence (PAVs) and Copy Number Variations (CNVs) We recommend that these be stored at NCBI in the dbVar database, which was designed for larger variations than dbSNP. Variations are grouped by study. We plan to work with GenBank and early submittors as dbVar does not yet contain any plant data. http://www.ncbi.nlm.nih.gov/dbvar/ Data access and visualization • Downloads (whole and partial datasets, for example, by chromosome coordinates). • Standard formats • GBrowse tracks Collaborators We are working with the Schnable lab, the Panzea project, iPlant Collaborative, the Genome Reference Consortium, and NCBI. Complex Alleles This is a form of variation in which an entire region can be duplicated, deleted, or rearranged. One maize example is R1. MaizeGDB contains 282 variants of this locus. We are talking with the Genome Reference Consortium (http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/) to learn how they are handling these variations for human, mouse, and zebrafish genome assemblies. • Survey • We created a survey to gather ideas from the community for what tools and views would be most helpful. Please help us by filling out the survey! • http://survey.maizegdb.org/diversity/ • Sample responses: • ‘In the NAM diversity lines, I would like to find all the SNPs and PAVs present in a particular gene-sized region. I would like to be able to get the data in a table that I could sort- like excel. It would also be nice if I could see the data on a genome browser, but relative to just one genome.’ • ‘I would like to develop genetic markers based on restriction site polymorphisms created by SNPs or PAV's (presence/absence/variations)’ • ‘Retrieve all polymorphisms compared to a reference strain’ • ‘Phenotype a ~500 line population for a trait of interest (disease resistance in my case) and be able to do high-resolution, genome-wide association analysis’ • ‘Given a putative pedigree relationship between two lines, give me an estimate of whether the pedigree relationship is correct or not’ Analysis • We are currently looking at tools like PLINK and TASSEL. • Some analyses will be pre-computed and the results made available. • We could provide analyses carried out by other labs. • We won’t provide computationally-heavy services but could potentially direct researchers to groups that might (for example, iPlant). Schematic Representation of the Sequence Relationship between Maize Inbreds B73 and Mo17 at Locus9002 on Chromosome 1L (Bin 1.08; Markers bz2, An1, and umc1446). The R1 locus Brunner S et al. Plant Cell 2005;17:343-360 Missing sequences As we all know, missing data can be due to a number of factors. For missing sequences, some may simply be present in the B73 genome but not yet included in the assembly, but others are entirely absent from B73. Instances of both will need to be recorded and made available, for example as FASTA sequence and in the MaizeGDB Genome Browser with each unaligned sequence as its own backbone. Acknowledgements Major funding from NSF grant #1027527: ‘Functional Structural Diversity among Maize Haplotypes’ and in-kind contributions from the USDA-ARS. We also thank Ed Buckler, Qi Sun, and Jeff Glaubitz at Panzea (NSF grant #0820619: ‘Genetic Architecture of Maize and Teosinte’); Eric Lyons and Nirav Merchant at iPlant (NSF grant #0735191: ‘The iPlant Collaborative: A Cyberinfrastructure-Centered Community for a New Plant Biology’).

More Related