
The Metagenomics RAST server: Annotation, Analysis, and Comparisons Perfect for Pyrosequencing

Roche Life Sciences Workshop, Sept 2008. The Metagenomics RAST server: Annotation, Analysis, and Comparisons Perfect for Pyrosequencing. Rob Edwards, Department of Computer Science, San Diego State University; Mathematics and Computer Science Division, Argonne National Laboratory.





Presentation Transcript


  1. Roche Life Sciences Workshop, Sept 2008. The Metagenomics RAST server: Annotation, Analysis, and Comparisons. Perfect for Pyrosequencing. Rob Edwards, Department of Computer Science, San Diego State University; Mathematics and Computer Science Division, Argonne National Laboratory. www.nmpdr.org www.theseed.org

  2. Outline • Metagenomics • Tools for analyzing sequences • Computational challenges • Does it work?

  3. How much has been sequenced? [Figure: number of known sequences vs. year. Milestones marked on the plot: first bacterial genome, 100 bacterial genomes, environmental sequencing, 1,000 bacterial genomes.]

  4. How much will be sequenced? [Figure: projected sequencing milestones: all cultured bacteria, most major microbial environments, 100 people, one genome from every species, everybody in San Diego, everybody in the USA.]

  5. Metagenomics (just sequence it). Starting material: 200 liters of water, 5-500 g of fresh fecal matter, or 50 g of soil. Steps: concentrate and purify bacteria, viruses, etc.; epifluorescent microscopy; extract nucleic acids; sequence; publish papers.

  6. Modern metagenomics. Metazoan-associated: corals, fish, human blood, human stool. Marine: near-shore water (~100 samples), off-shore water (~50 samples), near- and off-shore sediments. Freshwater: aquifer, glacial lake. Extreme: hot springs (84 °C; 78 °C), soda lake (pH 13), solar saltern (>35% salt). Terrestrial/soil (Terragenomics): Amazon rainforest, Konza prairie, Joshua Tree desert. Air.

  7. The problem: How do you generate consistent and accurate annotations for metagenomes?

  8. The SEED Family

  9. Annotations using subsystems. FIG (the Fellowship for Interpretation of Genomes) developed the notion of a subsystem: a generalization of a "pathway" as a collection of functional roles jointly involved in a biological process or complex. Subsystems were extended into FIGfams: protein families whose members perform the same function.
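To make the subsystem idea concrete, here is a minimal sketch, assuming each read has already been assigned a single functional role. The subsystem names and role sets below are hypothetical toy data, not taken from the SEED, and the counting scheme is only an illustration, not the mg-RAST implementation. It shows that a subsystem is simply a named set of roles, and that one role can belong to several subsystems.

```python
from collections import Counter, defaultdict

# Hypothetical toy subsystems (NOT SEED definitions): each subsystem is
# modelled as a named set of functional roles.
SUBSYSTEMS = {
    "Glycolysis (toy)": {"Hexokinase", "Enolase", "Pyruvate kinase"},
    "Gluconeogenesis (toy)": {"Enolase", "Fructose-1,6-bisphosphatase"},
    "Flagellum (toy)": {"Flagellin", "Flagellar motor protein MotA"},
}


def build_role_index(subsystems):
    """Invert subsystem -> roles into role -> list of subsystems."""
    index = defaultdict(list)
    for name, roles in subsystems.items():
        for role in roles:
            index[role].append(name)
    return index


def count_subsystem_hits(read_roles, subsystems=SUBSYSTEMS):
    """Count reads per subsystem, given the functional role assigned to each read.

    A read whose role sits in several subsystems is counted in each of them;
    reads whose role belongs to no subsystem are skipped.
    """
    index = build_role_index(subsystems)
    counts = Counter()
    for role in read_roles:
        for name in index.get(role, []):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    reads = ["Enolase", "Enolase", "Flagellin", "Hypothetical protein"]
    print(count_subsystem_hits(reads))
```

Because a shared role (here the toy "Enolase") is credited to every subsystem that contains it, per-subsystem counts can sum to more than the number of annotated reads.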

  10. Annotation of complete genomes (http://rast.nmpdr.org/). Automated processing of user-submitted genomes takes 1-7 hours, depending on the size and complexity of the genome. ~2,000 external submissions so far, including hundreds of genomes not yet publicly released. Reannotation of >500 genomes is complete. 1,000 users from 200 organizations in 25 countries.

  11. The metagenomics RAST server

  12. Automated Processing

  13. Summary View

  14. Metagenomics Tools: Annotation & Subsystems

  15. Metagenomics Tools: Annotation & KEGG maps

  16. Metagenomics Tools: Recruitment Plots

  17. Metagenomics Tools: Phylogenetic Reconstruction

  18. Metagenomics Tools: Comparative Tools

  19. Computational requirements: ~19 hours of compute per input megabyte. [Figure: hours of compute time vs. input size (MB).]

  20. How much so far: 986 metagenomes; 79,417,238 sequences; 17,306,834,870 bp (17 Gbp). Average: ~15-20 Mbp per metagenome. Compute time (on a single CPU): 328,814 hours ≈ 13,700 days ≈ 38 years. By platform: ~300 GS20, ~300 FLX, ~300 Sanger.
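The totals on this slide follow from the ~19 CPU-hours per input megabyte on the previous slide. The sketch below is a back-of-the-envelope check, assuming that 1 MB of input means 10^6 bp and that the input size is rounded down to whole megabytes. That rounding assumption is what reproduces the slide's 328,814 hours exactly (17,306 × 19). The function name `summarize` is hypothetical.

```python
"""Back-of-the-envelope check of the mg-RAST totals quoted on slides 19-20."""

# Figures copied from the slides.
METAGENOMES = 986
SEQUENCES = 79_417_238
TOTAL_BP = 17_306_834_870
HOURS_PER_MB = 19  # slide 19: ~19 CPU-hours per input megabyte


def summarize(total_bp=TOTAL_BP, n_metagenomes=METAGENOMES,
              n_sequences=SEQUENCES, hours_per_mb=HOURS_PER_MB):
    """Return derived totals.

    Assumption: 1 MB = 10**6 bp, and input size is floored to whole megabytes.
    """
    input_mb = total_bp // 1_000_000        # whole megabytes of input
    hours = input_mb * hours_per_mb         # single-CPU compute time
    days = hours / 24
    years = days / 365
    return {
        "input_mb": input_mb,
        "cpu_hours": hours,
        "cpu_days": days,
        "cpu_years": years,
        "mean_bp_per_metagenome": total_bp / n_metagenomes,
        "mean_read_length_bp": total_bp / n_sequences,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value:,.1f}")
```

The same figures also give an average of about 17.6 Mbp per metagenome, inside the slide's ~15-20 Mbp range, and a mean of roughly 218 bp per sequence.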

  21. Lots of sequences, all pyrosequencing

  22. Metagenomics Tools: Functional Heat Maps

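  23. From Sequences To Environments. [Figure: canonical discriminant analysis (CDA) of functional profiles; CDA axes 60.2% and 21.7%. Functional categories shown: stress, membrane transport, sulfur, signaling, capsule, motility, phosphorus, RNA, respiration. Environments shown: mine, saltern, marine, microbialites, fish, animals, coral, freshwater.] Dinsdale et al., Nature 2008.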
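Comparing functional profiles across metagenomes of very different sizes, as in a functional heat map, needs the raw counts put on a common scale first. The slides do not say how mg-RAST normalizes. This sketch assumes one simple choice, percent of each sample's annotated reads. The sample names and counts are made up for illustration, and `relative_abundance` and `heat_map_rows` are hypothetical helpers.

```python
def relative_abundance(profiles):
    """Convert raw per-category counts into percentages within each sample.

    profiles: {sample_name: {category: read_count}}
    Returns the same shape, with counts replaced by percent of that sample's total.
    """
    result = {}
    for sample, counts in profiles.items():
        total = sum(counts.values())
        result[sample] = {
            cat: (100.0 * n / total if total else 0.0) for cat, n in counts.items()
        }
    return result


def heat_map_rows(profiles, categories):
    """Build a samples x categories matrix of percentages (0.0 where a category is absent)."""
    rel = relative_abundance(profiles)
    return [[rel[s].get(c, 0.0) for c in categories] for s in profiles]


if __name__ == "__main__":
    # Made-up counts, purely illustrative.
    profiles = {
        "Saltern (made-up)": {"Stress": 30, "Motility": 10, "Sulfur": 60},
        "Coral (made-up)": {"Stress": 5, "Motility": 45},
    }
    categories = ["Stress", "Motility", "Sulfur"]
    for sample, row in zip(profiles, heat_map_rows(profiles, categories)):
        print(sample, row)
```

Each row of the resulting matrix is one heat-map row. A statistical comparison such as the CDA in the figure would then operate on these normalized profiles rather than on raw counts.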

  24. Workshops: free workshops on NMPDR, RAST, mg-RAST, and the SEED. Contact Leslie McNeil (lkmcneil@ncsa.uiuc.edu) or visit http://www.nmpdr.org/.

  25. Acknowledgements. FIG: Ross Overbeek, Veronika Vonstein, annotators. Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke. Statistics & web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat. Argonne sequencing: Marc Domanus, Areej Ammar. Environmental genomics: Forest Rohwer. All the labs that provided sequence. Artist: Paula Morris.

  26. Artist's impression: not all machines are known to explode

  27. Terragenomics

  28. Differences between soil samples
