Natalia Ivanova

Metagenome analysis: use case. Natalia Ivanova. MGM Workshop February 2, 2012.

Presentation Transcript


  1. Metagenome analysis: use case. Natalia Ivanova, MGM Workshop, February 2, 2012

  2. Minoan eruption and metagenomics. "…it seemed as though the sea was being sucked backwards, as if it were being pushed back by the shaking of the land…Behind us were frightening dark clouds, rent by lightning twisted and hurled, opening to reveal huge figures of flame. These were like lightning, but bigger." (From Pliny the Younger’s letter)

  3. Apart from Minoan eruption… (Diagram by Gary Massoth/PMEL; from Chernicoff & Stanley, Geology, 2007)

  4. Sampling sites: white mat and red mat. Key gradients, white vs red: temperature 60 vs 18 °C; CO2 tension >99% vs <1%

  5. This is what it looks like

  6. Chimney material may be of biological origin

  7. Standard JGI metagenome pipeline (flow diagram): DNA sample; DNA QC; shotgun libraries (454 standard, 454 long mate pair, Illumina standard, Illumina long mate pair); SSU pyrotags; Assembly; Analysis
     • SSU pyrotags (http://pyrotagger.jgi-psf.org): community composition; semi-quantitative – OTU abundance
     • Metagenome (IMG/M-ER, contigs + unassembled reads): community composition; functional analysis

  8. Pyrotag results – BLASTn against Greengenes database

  9. PhyloDistribution results – BLASTp of metagenome CDSs against isolates in IMG

  10. Pyrotags vs PhyloDistribution – white mat: big differences in abundance (an order of magnitude or more) of Bacteroidetes and Thermotogae
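
A minimal sketch of how "an order of magnitude or more" could be checked once relative abundances for a clade are available from both methods. The function names and the example fractions below are hypothetical illustrations, not values from the slides.

```python
import math


def log10_fold_difference(fraction_a: float, fraction_b: float) -> float:
    """Return log10 of the ratio between two relative abundances."""
    if fraction_a <= 0 or fraction_b <= 0:
        raise ValueError("both fractions must be positive")
    return math.log10(fraction_a / fraction_b)


def differs_by_order_of_magnitude(fraction_a: float, fraction_b: float) -> bool:
    """True when the two abundances differ tenfold or more, in either direction."""
    return abs(log10_fold_difference(fraction_a, fraction_b)) >= 1.0


# Hypothetical example: a clade at 40% of pyrotags but 2% of metagenome CDSs.
print(differs_by_order_of_magnitude(0.40, 0.02))
```

Working on the log scale treats over- and underrepresentation symmetrically, so a clade that is 20-fold higher in pyrotags and one that is 20-fold lower are flagged the same way.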

  11. Possible explanations
     • Amplification artifacts in pyrotags – well known for metagenome data
     • Sequencing GC bias in the metagenome – low-GC and high-GC sequences (<30% and >65%) are underrepresented in Illumina data
     • K-mer assembler problems: abundant populations may be underrepresented in the assembly if incorrect k-mer/coverage parameters are selected
     • Primer bias in pyrotags (against Proteobacteria)?

  12. PCR artifacts in metagenome data: 454 technology includes an emulsion PCR (emPCR) step, which may lead to artificial overrepresentation of certain sequences. Reason: presence of free beads during the library prep step; escaped emPCR products bind to free beads and are disproportionately amplified

  13. What about GC bias? Examples: medium GC (Arcanobacterium), high GC (Cellulomonas), low GC (Brachyspira)
     Question: how do you find average/max/min GC content for a clade?
     Answer: IMG=>Genome Browser=>View Phylogenetically=>click on green + to select the clade, then “Add selected to Genome Cart”=>Compare Genomes=>Genome Statistics
     Result: Thermotogae GC percent 41 average/47 max/31 min; Bacteroidetes GC percent 42.5 average/66 max/31 min
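
The slide obtains these numbers through the IMG interface. As a rough offline equivalent, the sketch below computes average/max/min GC percent for a set of genome sequences and checks a value against the <30% / >65% range that slide 11 gives as underrepresented in Illumina data. The function names and the toy sequences are hypothetical; real genomes would be read from FASTA files.

```python
from statistics import mean
from typing import Dict


def gc_percent(sequence: str) -> float:
    """Return GC content of a nucleotide sequence as a percentage."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N's do not skew the result.
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def clade_gc_summary(genomes: Dict[str, str]) -> Dict[str, float]:
    """Summarize per-genome GC percent as average, max and min."""
    values = [gc_percent(seq) for seq in genomes.values()]
    return {"average": mean(values), "max": max(values), "min": min(values)}


def in_underrepresented_range(gc: float, low: float = 30.0, high: float = 65.0) -> bool:
    """True if GC percent is below `low` or above `high` (thresholds from slide 11)."""
    return gc < low or gc > high


# Toy sequences standing in for whole genomes (hypothetical data).
toy_clade = {"genome_a": "ATATATGC", "genome_b": "GGCCGGAT", "genome_c": "ATGCATGC"}
summary = clade_gc_summary(toy_clade)
print(summary)
```

Applied to the values reported on the slide, the Thermotogae range (31 to 47) lies entirely inside the 30 to 65 window, and for Bacteroidetes only the 66 maximum falls just outside it.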

  14. Are there any abundant populations that could be filtered out in assembly? (Typical Pyrotagger output shown.) There are 2 highly abundant populations – just 2 clusters account for nearly all Bacteroidetes and Thermotogae in the sample

  15. Let’s take a closer look at the assemblies and unassembled reads

  16. JGI uses primer pair 946F-1492R
     1492R primer:               TACGCYTACCTTGTTACGACTT
     Sequence in the metagenome: TACGGTTACCTTGTTACGACTT (C/G mismatch)
     JGI did extensive testing on artificial communities – this problem was not detected
     It’s pyrotag bias after all!
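
The mismatch on this slide can be located programmatically. The sketch below compares the 1492R primer with the metagenome sequence position by position, expanding IUPAC degenerate codes such as Y (C or T). The function name is hypothetical, and the two sequences are assumed to be already aligned and in the same orientation, as they are shown on the slide.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer: str, target: str):
    """Return (0-based position, primer code, target base) for each mismatch."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return [
        (i, p, t)
        for i, (p, t) in enumerate(zip(primer.upper(), target.upper()))
        if t not in IUPAC[p]
    ]


PRIMER_1492R = "TACGCYTACCTTGTTACGACTT"      # from the slide
METAGENOME_SITE = "TACGGTTACCTTGTTACGACTT"   # from the slide

mismatches = primer_mismatches(PRIMER_1492R, METAGENOME_SITE)
print(mismatches)  # [(4, 'C', 'G')]
```

The degenerate Y at position 5 accommodates the T in the metagenome sequence, so the only mismatch is the C/G at position 4 (0-based), which is the one the slide points out.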

  17. Functional analysis: metagenome as a bag of functions
     Red mat is taxonomically more diverse. Is it more diverse functionally?
     Rarefaction curves: white mat is expected to have ~4000 different Pfams; red mat ~3600
     Question: where do you find this information?
     Answer: IMG=>Taxon Details=>Metagenome Statistics; Genes with Pfam=>Display as a list=>Export
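
Once the Pfam list is exported, a rarefaction curve can be computed from the per-Pfam gene counts. The sketch below uses the standard analytical (hypergeometric) rarefaction formula for the expected number of distinct families in a random subsample of n genes; it is not necessarily the method used to produce the curves on the slide, and the function name and toy counts are hypothetical.

```python
from math import comb
from typing import Sequence


def expected_richness(counts: Sequence[int], n: int) -> float:
    """Expected number of distinct families in a subsample of n genes.

    counts[i] is the number of genes assigned to family i. Family i is
    missed with probability C(N - counts[i], n) / C(N, n), where N is the
    total number of genes.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total gene count")
    denom = comb(total, n)
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


# Toy per-Pfam gene counts (hypothetical, not from the white or red mat).
toy_counts = [50, 30, 10, 5, 3, 1, 1]
curve = [(n, expected_richness(toy_counts, n)) for n in range(0, sum(toy_counts) + 1, 20)]
for n, richness in curve:
    print(n, round(richness, 2))
```

Comparing two samples at the same subsample size n removes the effect of unequal sequencing depth, which is the point of using rarefaction before asking which mat is functionally more diverse.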

  18. Abundance Comparisons: motility and chemotaxis genes are overrepresented in white mat (detected by both Pfams and COG Categories). Panels: white mat, red mat

  19. Is motility/chemotaxis common to all organisms in white mat?
     Scenario 1: the function/pathway is overrepresented because it is present in all members of the community, possibly at higher copy number
     Scenario 2: the function/pathway is overrepresented because it is present in one clade, which is absent from the second sample
     Question: can we distinguish between the two scenarios?
     Answer: click on the gene count for protein family/functional category, add all genes to Gene Cart=>add scaffolds to Scaffold Cart=>PhyloDistribution of all scaffolds in the Scaffold Cart
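
The PhyloDistribution step amounts to asking how the genes carrying the function are spread across clades. The sketch below illustrates that reasoning on a list of clade assignments, one per gene. The function names, the 0.8 dominance threshold, and the example labels are all assumptions made for illustration; IMG does not report a "scenario", and the call remains a judgment made from the histogram.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def clade_shares(gene_clades: List[str]) -> Dict[str, float]:
    """Fraction of the function's genes assigned to each clade."""
    if not gene_clades:
        raise ValueError("no genes supplied")
    counts = Counter(gene_clades)
    total = sum(counts.values())
    return {clade: n / total for clade, n in counts.items()}


def likely_scenario(gene_clades: List[str], dominance: float = 0.8) -> Tuple[str, Optional[str]]:
    """Suggest scenario 2 if one clade holds at least `dominance` of the genes.

    The threshold is an arbitrary assumption, not a value from the slides.
    """
    shares = clade_shares(gene_clades)
    top_clade, top_share = max(shares.items(), key=lambda item: item[1])
    if top_share >= dominance:
        return ("scenario 2", top_clade)
    return ("scenario 1", None)


# Hypothetical clade assignments for scaffolds carrying motility genes.
example = ["clade_A"] * 9 + ["clade_B"]
print(likely_scenario(example))
```

A result pointing to scenario 2 would still need to be checked against the second sample, since the scenario also requires that the dominant clade be absent there.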

  20. Are Sulfurimonas-like bacteria present in both samples?
     The total number of sequences in all clusters assigned to Epsilonproteobacteria is 50 in white mat and 66 in red mat. Largest cluster in white mat includes 125K+ sequences; largest cluster in red mat includes 14K+ sequences
     Question: what about the presence of Sulfurimonas-like bacteria in the metagenomes?
     Answer: go to Compare Genomes=>PhyloDistribution=>Genome vs Metagenomes, select the genome; the histogram shows the number of BLASTp hits from CDSs in all metagenomes to this genome

  21. Are there any methylotrophs in the white mat?

  22. Conclusions
     • Two communities have different composition; white mat sampled next to the hydrothermal vent has lower complexity
     • Community composition as sampled by pyrotags and the metagenome may be quite different due to a number of biases
     • Some protein families/functional categories are more abundant in one sample as compared to the other because of different community composition, and not necessarily because they are more important in this environment