
Gramene Scientific Advisory Board December 14, 2010

Introduction of SAB Members: David Marshall (SCRI), Paul Flicek (EBI), Michael Ashburner (Cambridge), Anna M McClung (USDA-ARS), Patricia Klein (Texas A&M), William Beavis (Iowa State), Tim Nelson (Yale), Georgia Davis (Missouri).


Presentation Transcript


  1. Gramene Scientific Advisory Board, December 14, 2010 Gramene SAB 2010

  2. Introduction of SAB Members • David Marshall (SCRI) • Paul Flicek (EBI) • Michael Ashburner (Cambridge) • Anna M McClung (USDA-ARS) • Patricia Klein (Texas A&M) • William Beavis (Iowa State) • Tim Nelson (Yale) • Georgia Davis (Missouri)

  3. Introduction of Gramene • Doreen Ware (CSHL, PI) • Susan McCouch (Cornell, PI) • Pankaj Jaiswal (OSU, PI) • Ed Buckler (Cornell, PI) • Vindhya Amarasinghe (OSU, Pathways) • Karthikeyan Athikkattuvalasu (Cornell, Diversity, Phenotypes) • Terry Casstevens (Cornell, Diversity) • Charles Chen (Cornell, Diversity) • Aaron Chuah (CSHL, Diversity) • Genevieve DeClerck (Cornell, Diversity) • Palitha Dharmawardhana (OSU, Pathways) • Marcela Monaco (CSHL, Pathways) • Will Spooner (CSHL, Genomes) • Joshua Stein (CSHL, Genomes) • Jim Thomason (CSHL, Germplasm, Website, Pathways, Genes) • Sharon Wei (CSHL, Genomes) • Ken Youens-Clark (CSHL, Project Manager, etc.)

  4. Aim 1: Genomes • Doreen Ware, PI • Sharon Wei, Will Spooner, Ken Youens-Clark, Jim Thomason, Marcela Monaco, Josh Stein (total full-time equivalent [FTE]: 3.5) • Note: hired 25% FTE (Josh) to replace Noel Yap, who left the project in the Cornell group • 1.5 FTE available from Ware, Dvorak NSF collaborations

  5. Suggestions From Last Year • Add Brachypodium: added in Release 29 • Add a basal plant, e.g. Selaginella: we chose Physcomitrella patens because it was better documented at the time (GB record and published); Selaginella now has a GB record and will be investigated for 2011 • Add a Solanaceae and/or legume: we are adding tomato in 2011 and are looking into either soybean or Medicago • Display RNAseq data: we can now display it as a DAS track (see maizesequence.org); data sources still need to be investigated

  6. Highlights in 2010 • Genomes: 3 new; many updates • Software: Ensembl 59 provides new visualizations (SNP view, SNP Mart, multi-species view, multi-sequence alignment) • New analyses: gene-centered synteny build, EPO multi-sequence alignment, split-gene detection • New development: GERP conservation (Sharon), GWAS views (Aaron, NSF 2010 collaboration), tandem arrays (Josh, Will)

  7. 17 Genomes in Release 32 • Physcomitrella (moss): Basal land plant • Updated assemblies of grapevine & poplar • Updated annotations of Indica rice & Arabidopsis • Updated assemblies & annotations of Oryza chr 3S projects

  8. Genome Plans 2011 • Planning: • Lycopersicon esculentum (tomato) • Oryza glaberrima (African domesticated rice) • Oryza brachyantha (wild rice) • Aegilops tauschii (wheat D, NSF #0701916) • Investigating: • Selaginella moellendorffii (basal vascular plant) • Triticum aestivum (hexaploid wheat) • Malus x domestica (apple) • Glycine max (soybean) or Medicago

  9. Collaborations Genomes • NSF PGI #0638820 PI Wing end 2009 (wild rice OMAP) • USDA ARS Grape end 2009 • NSF PGI PI Buckler end 2009 • NSF 2010 #0723510 PI Nordborg end 2011 (Arabidopsis thaliana, A. lyrata, Capsella) • NSF #0701916 PGI PI Dvorak end 2011 (wheat) • NSF PGI PI Wilson end 2010 (maize) • NSF PGI PI #0723510 Scanlon end 2012 (maize) • NSF PGI PI Springer to start this year (maize) • NSF PGI PI Wing end 2011 (wild rice OGE) • NSF PGI #1032105 PI McCombie end 2012 (wheat) • EBI BBSRC Paul Kersey (travel for coordination participants) • NSF PGI PI McCouch end 2014 (rice) • NSF XXX iPlant Steve Goff

  10. New Maps and Markers New maps in last year: • Sorghum genetic (Mace) • Barley genetic (Close) • Ae. tauschii genetic (Dvorak) • Switchgrass genetic (Tobias)

  11. More genomes in CMap Added two more fully sequenced genomes to CMap with seq/seq comparisons based on orthology (build 32).

  12. New SNP View • Shows functional consequences of polymorphism (new in Ensembl 56): • Synonymous coding • Non-synonymous coding • Stop gain/loss • Splice site • UTR • Intronic
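
The coding consequence classes on this slide can be illustrated with a minimal Python sketch. It assumes the standard nuclear genetic code and a single reference/alternate codon pair. The function name `coding_consequence` and the label strings are hypothetical and are not taken from the Ensembl variation code, which also assigns the splice-site, UTR, and intronic classes from transcript coordinates.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# Assumption: nuclear code; organellar codes differ.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_consequence(ref_codon, alt_codon):
    """Classify a codon change (hypothetical helper, illustrative labels)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous_coding"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non_synonymous_coding"
```

For example, `coding_consequence("TGG", "TGA")` falls in the stop-gain class because TGG encodes tryptophan and TGA is a stop codon.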

  13. SNP BioMart • Available for rice japonica, rice indica, Arabidopsis & grape datasets • Configure output fields and format (XLS, CSV, TSV, or HTML) • If HTML, link to Variation, Gene, or Browser pages • Filter on region, phenotype, strains, ID, consequence (e.g. introduced stop codon), and other attributes

  14. Whole Genome Alignments • BLASTZ-CHAIN-NET between 20 pairs of species (Schwartz S et al., Genome Res., 2003;13(1):103-7; Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9) • New & improved alignment viewer (Ensembl 56)

  15. Multispecies View • Stack any number of genomes aligned to a common reference by BLASTZ • Browse & zoom along any genome independently • Re-introduced in Ensembl 56

  16. Automated Detection of Split Genes • Special class of “paralog” since Ensembl 58 • Contiguous split paralog: non-overlapping, nearby (<1 Mb), same strand • Putative split paralog: non-overlapping, different regions (e.g. scaffolds) • Genome alignment confirms inconsistent annotation
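
The two split-paralog classes above can be sketched as a small Python classifier. This is only an illustration of the stated criteria, not the Ensembl Compara implementation. The `Gene` record, the function names, and the label strings are hypothetical. "Non-overlapping" is read here as non-overlapping genomic coordinates, which is an assumption: the slide may instead mean that the two proteins cover non-overlapping parts of the family alignment.

```python
from dataclasses import dataclass

MAX_GAP_BP = 1_000_000  # "nearby (<1 Mb)" threshold from the slide


@dataclass(frozen=True)
class Gene:
    region: str   # chromosome or scaffold name
    start: int    # 1-based inclusive start
    end: int      # 1-based inclusive end
    strand: int   # +1 or -1


def overlaps(a, b):
    """True if two genes share at least one base on the same region."""
    return a.region == b.region and a.start <= b.end and b.start <= a.end


def classify_split_pair(a, b):
    """Return a split-paralog class for a paralog pair, or None."""
    if overlaps(a, b):
        return None
    if a.region != b.region:
        return "putative_split"
    gap = max(a.start, b.start) - min(a.end, b.end)
    if a.strand == b.strand and gap < MAX_GAP_BP:
        return "contiguous_split"
    return None
```

A pair that passes either test is a candidate only; as the slide notes, the genome alignment is what confirms that the annotation is inconsistent.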

  17. Gene-Centered Synteny Build • 2010: Implemented with automated pipeline runnables • Release 31: monocots • Release 32: dicots • Pipeline steps shown: Compara orthologs; collinear mappings (DAGchainer); “in-range” mappings near collinear anchors; map

  18. Grape Reference Highlights Duplicated Regions in Arabidopsis and Poplar • Polyploid and segmental duplications manifest as co-syntenic regions • SyntenyView links to the browser, so users can easily navigate between duplicated regions

  19. EPO Multiple Alignment & Ancestor Reconstruction • Gramene implementation in 2010 • Release 32: 8-way EPO alignment • Rice japonica, indica, Brachypodium, sorghum, Arabidopsis, A. lyrata, grape, poplar Paten et al (2008) Genome Research 18:1814 Paten et al (2008) Genome Research 18:1829

  20. 2010 Genomes Development: Constrained Elements • Genomic Evolutionary Rate Profiling (GERP): measures purifying selection • Method testing using 4-way and 8-way EPO alignments as input with varying parameters • Input tree generated from 1301 ortholog sets • Planning release in 2011 • Cooper et al (2005) Genome Research 15:901

  21. 2010 Genomes Development

  22. Tandem Duplicate Detection • Adjacent paralogs with no more than 2 intervening unrelated genes • Increase gene dosage • Diversifying selection • Often species-specific • Figures: LRR-kinase species-specific expansions; LRR-kinase cluster in rice
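
The adjacency rule on this slide (paralogs separated by no more than 2 unrelated genes) can be sketched in Python as follows. This is a minimal illustration, not the Gramene pipeline: the input format (genes in chromosomal order, each tagged with a paralog family identifier) and the function name `tandem_clusters` are assumptions made for the example.

```python
from collections import defaultdict

MAX_INTERVENING = 2  # from the slide: no more than 2 intervening unrelated genes


def tandem_clusters(genes, max_intervening=MAX_INTERVENING):
    """Group same-family genes into tandem clusters.

    genes: list of (gene_id, family_id) tuples in chromosomal order;
    family_id is None for genes without a paralog family.
    Returns a list of clusters, each a list of gene ids (size >= 2).
    """
    # Record the ordinal position of every gene, per family.
    positions = defaultdict(list)
    for index, (gene_id, family) in enumerate(genes):
        if family is not None:
            positions[family].append((index, gene_id))

    clusters = []
    for members in positions.values():
        current = [members[0]]
        for member in members[1:]:
            intervening = member[0] - current[-1][0] - 1
            if intervening <= max_intervening:
                current.append(member)
            else:
                if len(current) > 1:
                    clusters.append([gene_id for _, gene_id in current])
                current = [member]
        if len(current) > 1:
            clusters.append([gene_id for _, gene_id in current])
    return clusters
```

Working on gene order rather than base-pair distance keeps the rule independent of gene density, which varies widely between the genomes listed earlier.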

  23. Collaboration with Ensembl Genomes • Share conference calls • Developers meeting (Hinxton, UK, Sept. 2010) • Co-authored papers/posters • Two releases • Ensembl Developer’s Workshop

  24. Website Improvements • Home facelift: quick entry-points • Migrated to Apache 2.0 in Release 31

  25. REST Interfaces • A new RESTful interface for the site gives users greater control over data views and formats

  26. New Oryza Pages • Highlights this genus with images, phylogeny, geographic origin, & traits of interest • Entry points to browsers, germplasm, markers, & taxonomy ontology

  27. Web Services • Distributed Annotation Server (DAS) serving Ensembl genes as well as Gramene markers, sequences, and QTL • Gramene Mart integration with Galaxy • Public MySQL server • Diversity data via Tassel and GDPC • Subversion for code access

  28. Browser Development 2011 Plans • Communicate/distinguish gene-confidence information • 28% of MSU6 rice genes are annotated as “TE_related” and 17% are in the poorly conserved “hypothetical” class • 20% of sorghum genes are “low-confidence” (TE, pseudogenes, etc.) • Color-code or display in separate tracks in browser • Color-code in gene-tree display • List/display detailed gene-level synteny information • Explicitly list syntenic genes from Gene Page • Indicate within the browser that a gene is syntenic to one or more genes of a different species (e.g. color-code or synteny track) • List co-syntenic genes • 2 genes (in separate blocks) having synteny to a common gene in another species arose from a large-scale duplication event (e.g. polyploidy or segmental) • Tandem array track • Indicate clusters of paralogous genes within browser • [Challenges of low-depth or highly fragmented genomes, e.g. wheat & Physcomitrella]

  29. 2010 Ongoing Development Work • miRNA pipeline runnable • Refine and automate steps in miRNA annotation • Vmatch alignment • mfold RNA secondary structure prediction • Filter based on secondary structure • Gene-Build with RNAseq evidence data • First pilot experiments performed

  30. Questions for the SAB? • Nominate genomes • New data types, e.g. RNAseq data available for current genomes that we may not be aware of • Any aspects of the web site needing improvement

  31. Aim 2: Pathways • Pankaj Jaiswal, PI • Palitha Dharmawardhana, Jim Thomason, Vindhya Amarasinghe, Liya Ren, AS Karthikeyan, Marcela Monaco • Note: Liya left the project this year and has been replaced by Marcela.

  32. Aim 2 Plan (2009-2010 / Year 3) • Continue curating rice and sorghum pathways • Release MaizeCyc and BrachyCyc • Add all available microarray probesets to MarkerDb and allow OMICS viewer to validate • Develop Reactome database for rice • Update the gene database schema to structure the allele-based annotations on function, phenotype and interactions • Maintain and develop ontologies

  33. • Added BrachyCyc, MaizeCyc. • Updated Pathway Tools twice to latest versions. • Updated the individual pathway databases twice to be consistent with the Pathway Tools version. • Rice pathways curated by addition of hydroxycinnamic acid and serotonin biosynthetic pathways, and updates to auxin biosynthesis and tryptophan biosynthesis. • Addition of 80 transport reactions and 477 transporters.

  34. Suggestions from last SAB • Concerns about supporting three technologies: Cyc, Reactome, WikiPathways. • Suggested moving to Reactome and allowing the Cyc and WikiPathways databases to be populated by automated exports using BioPAX.

  35. Reactome Database Build • Reactome: • Rice • Start with RiceCyc import and build on the existing Ensembl and Curated Genedb resources • Arabidopsis • After consulting with the Reactome project and the Arabidopsis Reactome group, this will become part of the renewal effort. The work will start with integrating it into the Reactome central database from its current location at JIC (www.arabidopsisreactome.org), followed by active curation. • Active curation will be done primarily in collaboration with Nick Provart’s group at the Univ. of Toronto. • This is a new international collaboration • The plan is to integrate the plant-specific Reactome database instances into the Reactome central database, but provide a modified user interface for users.

  36. Rice Reactome • Initial build of the Rice Reactome started by importing the complete (curated and predicted) RiceCyc data in BioPAX Level 2 format. • A test-v2 Rice Reactome is available from this link. • With some tweaking, the Reactome tools successfully imported 375 pathways and their child reactions • Efforts are now under way to integrate the mappings to • ChEBI, Ligand and PubChem for compounds/metabolites • KEGG for EC enzymes • UniProt • Drawing the network diagrams requires manual curation. • Priority is to draw networks for fully curated rice pathways by using the Reactome tools • Integrate predicted models of regulatory pathways for rice based on the reference pathway projections for cell cycle, transcription, translation, etc. • Curate test-case rice pathways • Organized a week-long workshop attended by curators from Gramene and BAR-Univ. of Toronto (Nick Provart’s group) • Mentored by Reactome co-PI Peter D’Eustachio • A test case of ABA metabolism and signaling was curated, which contained both the molecular and genetic interaction datasets.

  37. ABA metabolism and signaling pathway • Klingler et al. J. Exp. Bot. (2010) 61 (12): 3199-3210. • Reactome model: A prototype reaction network, ABA-mediated transcriptional regulation, was laid out using material from Nambara & Marion-Poll (2005 – PMID: 15862093) to supplement the pathways of ABA synthesis and catabolism available as RiceCyc templates, and the regulatory processes discussed by Xiong et al. (2002 – PMID: 11779861) (especially Figure 10) and Klingler et al. (2010 – PMID: 20522527).

  38. Automated Cyc and WikiPathways builds • Based on the SAB suggestions, progress has been made towards extending the annotation of the Cyc and WikiPathways versions of the pathway databases in an automated way. • However, to take that approach we have to streamline the data workflow and structure the current curated gene database as a central repository/aggregator of the necessary datasets. • The Curated Gene database schema was restructured to hold whole-genome-based annotations on genes and alleles and their associations to function, phenotype, germplasm, pathways, gene-to-gene interactions, gene products, and gene models, besides providing cross-references to sequencing project objects (like gene models from IRGSP-RAP, MSU-OSA, and BGI for rice, O. sativa) and published literature. • Use the aggregated datasets for automated Cyc builds using the standard Pathway Tools and provide the BioPAX and SBML dumps to the WikiPathways project for their users. • Gramene’s focus will be pathway curation and annotation in Reactome and functional annotation in the gene database.

  39. Outreach • Curated rice-specific pathways and compounds contributed to the PlantCyc and MetaCyc reference pathway database projects. • Organized workshops • Community Gene Annotation Workshop at Plant Biology 2010 (July 2010) • Jointly organized with the Plant Ontology (PO) project. • Provided meeting support by way of a website portal and onsite helping hands • Tool development (plant configurations of the Phenote annotation tool and ontologies) and funding provided by the PO project. • Attended by about 35 researchers, of whom 12 were awarded travel support by PO. • Reactome workshop at CSHL, 25-29 October 2010 • Attended by Gramene and BAR curators • Mentored by Reactome database (Peter D’Eustachio) • Hands-on curation of a test-case pathway. • Analysis of RiceCyc import and current Reactome annotation tools. • Development of curation strategy and annotation guidelines.

  40. Plans for 2010-2011 • Release Rice Reactome • Release curated gene database in new avatar as aggregator of gene information • Integrate microarray probeset mappings in OMICS validator for non-rice pathways • Conduct the gene and pathway annotation outreach workshops. • Develop test cases for the upcoming renewal and strategies for analyzing large-scale datasets generated by NextGen technologies on transcriptomics and metabolomics. • Maintain the current Cyc-based pathway views and upgrade to v14.5 and later of Pathway Tools (Ptools)

  41. Pathway Collaborations • MetaCyc/BioCyc (Peter Karp) • Reactome (Lincoln Stein, Peter D’Eustachio) • Arabidopsis Reactome (Nick Provart, Henning Hermjakob) • PlantCyc (Sue Rhee) • SolCyc and Solanaceae Genome Network (Lukas Mueller) • Phenote curation tool (Nomi Harris, Suzi Lewis) • Ontologies (GO, PO, OBO) • BrachyBase (Todd Mockler) • Sorghum Biofuel and Bioenergy Project (John Mullet) • MaizeSequence.org • MaizeGDB • Maize Pathways (Andrew Hanson) • C3-C4 project (Tim Nelson, Tom Brutnell, Chris Myer, R. Bruskiewich) • WikiPathways • Expression data (Todd Mockler, Tim Nelson, Tom Brutnell)

  42. Questions for SAB? • Nominate pathways • Types of analysis users are interested in • Potential collaborators (national and international)

  43. Aim 3: Gramene Diversity Module • Susan McCouch & Edward Buckler, PIs • Terry Casstevens, Genevieve DeClerck, Charles Chen, AS Karthikeyan, Jon Zhang, Qi Sun, Ken Youens-Clark

  44. Suggestions from last year • Integration with key tools • We provide new SNP query tool, Web-launched Tassel, and downloads to work with Flapjack, in formats like Plink, HapMap, etc. • How about genotype storage? • Implemented BLOBs to store SNPs
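
The slide does not say how the SNP BLOBs are encoded, so the following Python sketch is purely illustrative of the general idea: packing many genotype calls into one compact byte string that can be stored in a single BLOB column. The 4-bit code table, the `N` padding convention, and the function names are all assumptions, not the Gramene or Tassel encoding.

```python
# Hypothetical 4-bit codes; "N" marks a missing call and is also used as padding.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 15}
DECODE = {code: base for base, code in CODES.items()}


def pack_calls(calls):
    """Pack a string of base calls into bytes, two calls per byte."""
    codes = [CODES[base] for base in calls]
    if len(codes) % 2:
        codes.append(CODES["N"])  # pad to a whole number of bytes
    return bytes((high << 4) | low for high, low in zip(codes[0::2], codes[1::2]))


def unpack_calls(blob, n_calls):
    """Recover the first n_calls base calls from a packed byte string."""
    calls = []
    for byte in blob:
        calls.append(DECODE[byte >> 4])
        calls.append(DECODE[byte & 0x0F])
    return "".join(calls[:n_calls])
```

Storing the number of calls alongside the BLOB, as `unpack_calls` requires, lets the reader distinguish a padding `N` from a genuinely missing call at the end of the row.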

  45. New Data Sets • Arabidopsis • Atwell et al. Genotype, phenotype, association data. ~214,000 SNPs, 199 germplasm, 107 phenotypes. • Rice • Zhao et al., PLoS, May 2010, "1536 Assay": 1311 SNPs x 395 varieties, mapped to MSU6.0 • Gross B, et al., Mol Ecol., Aug 2010, SNP diversity study from PG • Maize • dbSNP IDs and AGPv2 coordinate update for current dataset (1.6 million SNPs x 27 NAM lines)

  46. Web Interface – SNP Query

  47. Downloads

  48. Tassel

  49. GWAS Visualization

  50. Tassel Development • New data structure significantly improving memory efficiency • Alignment viewer • User-friendly “wizards” • Progress monitoring with ability to cancel tasks • Import/export HapMap, Flapjack, Plink data formats • Auto-loading and analysis execution from web site startup • GLM and MLM: • GLM interface simplified. • Compression and faster P3D implemented for MLM, resulting in reduced runtime. • Matrix algebra library wrapper written to make switching to newer, faster libraries easier. • EJML matrix algebra library interface implemented. • Tassel 3.0 Pipeline… • Automates complex loading/analysis pipelines • Doesn't need Java coding to create • Has simultaneously executing pipeline segments • Works from web site launch, command line, and GUI
