greengenes.lbl.gov: 16S rRNA gene database and workbench compatible with ARB

Todd DeSantis, Phil Hugenholtz, Niels Larson, Igor Dubosarskiy, Jordan Moberg, Yvette Piceno, Ingrid Zubieta, Eoin Brodie, Gary Andersen. LBL - JGI.

Presentation Transcript


  1. greengenes.lbl.gov: 16S rRNA gene database and workbench compatible with ARB. Todd DeSantis, Phil Hugenholtz, Niels Larson, Igor Dubosarskiy, Jordan Moberg, Yvette Piceno, Ingrid Zubieta, Eoin Brodie, Gary Andersen. LBL - JGI

  2. Andersen Group Program Aims • Creating a microarray for the simultaneous differentiation and quantification of closely related prokaryotes in complex samples.

  3. The Biomarker: 16S rDNA. Identify and classify organisms by gene sequence variations. [Figure labels: 16S rDNA; rRNA (functional molecule); LSU; SSU]

  4. The Challenges • 16S sequence deposit rate is increasing. • Many sequences are mis-annotated and/or chimeric. • Sequence taxonomy updates lag years behind sequence availability (“Bacteria, Unclassified”). • Difficult to create and manage MSAs of all 16S sequence data (or even thousands of sequences) using Clustal/BioEdit/Arb. • Probe quality relies on excellent MSAs and taxonomy. • “Signatures” can erode as more sequences are discovered.

  5. greengenes.lbl.gov

  6. greengenes.lbl.gov: Stay current. Source: http://www.ncbi.nlm.nih.gov/ ‘16S NOT 1.16S NOT mitochondr* NOT 18S’

  7. greengenes.lbl.gov: Verify ‘16S-ness’

  8. NAST align step 1: find template. Hand curated MSA provided by Phil. • Alignment "template" is top BLAST HSP • q= -1, favors long match • Candidate trimmed of extra-16S sequence data • tRNA, intergenic spacer regions, and 23S rDNA • based on HSP boundaries • If the HSP paired opposite strands, the candidate is reverse complemented.
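The trimming and strand-correction part of this step can be sketched in a few lines of Python. This is a minimal illustration, not the NAST source: the function names, the 1-based inclusive HSP coordinates, and the `opposite_strand` flag are assumptions about how a BLAST hit might be handed to such a routine.

```python
# Hypothetical helpers; names and argument conventions are assumptions.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_candidate(candidate, hsp_start, hsp_end, opposite_strand):
    """Keep only the part of the candidate covered by the top HSP.

    hsp_start and hsp_end are assumed to be 1-based, inclusive positions
    on the candidate. Flanking data (tRNA, spacer, 23S rDNA) outside the
    HSP boundaries is discarded. If the HSP paired opposite strands, the
    trimmed candidate is reverse complemented.
    """
    if not 1 <= hsp_start <= hsp_end <= len(candidate):
        raise ValueError("HSP boundaries fall outside the candidate")
    trimmed = candidate[hsp_start - 1:hsp_end]
    return reverse_complement(trimmed) if opposite_strand else trimmed
```

For example, `trim_candidate("TTTTACGGTCGGGG", 5, 10, False)` keeps only the six bases inside the HSP.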

  12. NAST align step 2: gap removal. Preserves global MSA positions (columns) by allowing local misalignments.

      DEFINE
        St = post-Align0 template sequence.
        Sc = post-Align0 candidate sequence.
        Ht = alignment space (hyphen) inserted into St by Align0.
        Hc = alignment space (hyphen) inserted into Sc by Align0.
      WHILE (St contains one or more Ht) DO
        LHt = character index of distal 5' Ht within St
        L5' = character index of Hc within Sc which is 5' proximal to Ht
        L3' = character index of Hc within Sc which is 3' proximal to Ht
        IF ((LHt – L5') > (L3' – LHt))
          Delete Hc found at L3'
        ELSE
          Delete Hc found at L5'
        Delete template gap character.
      END WHILE

      Result: Largest MSA of full-length (>1250 nt) 16S rDNA genes.
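The gap-removal loop can be written out as a small Python function. This is a sketch under stated assumptions, not the NAST implementation: it treats every hyphen in the template as an Ht and every hyphen in the candidate as an Hc (that is, both sequences were gap-free before the pairwise alignment), and the function name and the error raised when no candidate gap is available are illustrative choices.

```python
def nast_remove_gaps(st, sc, gap="-"):
    """Remove Align0-inserted template gaps, one per loop iteration.

    st, sc: template and candidate strings of equal length after the
    pairwise alignment. Assumption: every gap character in st is an Ht
    and every gap character in sc is an Hc.
    """
    if len(st) != len(sc):
        raise ValueError("aligned sequences must have equal length")
    st, sc = list(st), list(sc)
    while gap in st:
        l_ht = st.index(gap)  # most 5' template gap
        # Nearest candidate gap on the 5' side and on the 3' side of l_ht.
        l5 = next((i for i in range(l_ht - 1, -1, -1) if sc[i] == gap), None)
        l3 = next((i for i in range(l_ht + 1, len(sc)) if sc[i] == gap), None)
        if l5 is None and l3 is None:
            # Behaviour for this case is not given on the slide.
            raise ValueError("no candidate gap available to remove")
        # Delete the 3' gap only when the 5' gap is strictly farther away.
        if l5 is None or (l3 is not None and (l_ht - l5) > (l3 - l_ht)):
            del sc[l3]
        else:
            del sc[l5]
        del st[l_ht]  # delete the template gap character
    return "".join(st), "".join(sc)
```

Each iteration shortens both sequences by one character, so the template returns to its original column count while the candidate absorbs a local misalignment between the two deleted positions.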

  13. greengenes.lbl.gov: Name generator • NCBI annotations are non-standardized • Determine if sequence is from an isolate or an environmental amplicon/metagenome • Concatenate useful terms • Effort to guide future GenBank submitters in clear record descriptions • http://www.jgi.doe.gov/16s/ [Flowchart, starting from a GenBank record. Steps and decision nodes: Is sequence from whole genome record? Glob text from “DEFINITION”, “source”, and “TITLE”. “Genus species” style name in DEFINITION or source>organism? (result “Gs yes” / “Gs no”) Does a source>isolate field exist? (“Isolate tag yes” / “Isolate tag no”) Text glob contains “clone” OR “uncultur”? Text glob contains “symbiont”? Isolate tag present? Strain tag is present. Outcomes: record is from an isolate, a clone, a symbiont, isolate_str, or undecided.]

  14. greengenes.lbl.gov: Chimera tracking • Amplicons from complex gDNA can contain partial sequence from more than one genome. • Up to 4% of sequences are deemed chimeric by Bellerophon2 • Flags are set to avoid using these questionable sequences in phylogeny assessments

  15. greengenes.lbl.gov: Maintain Taxonomy. JGI taxonomy organized in ARB using maximum parsimony tree insertions. Example: http://greengenes.lbl.gov/cgi-bin/User/show_one_record_v2.pl?prokMSA_id=82172
      • prokMSA_id: 82172
      • prokMSAname: termite gut clone Rs-050
      • GenBank ACCESSION: AB100461.1, GenBank GI: 28971862, RDP_id: S000122947, NCBI_tax_id: 203524, Study_id: 21358
      • G2_chip_tax_string=Bacteria; Firmicutes; Clostridia; Clostridiales; Peptostreptococcaceae; sf_5; otu_2988
      • JGI_tax_string=Bacteria; Firmicutes (incl. basal lineag; Firmicutes; Peptostreptococcaceae; Mogibacterium
      • JGI_tax_string_format_2=Bacteria; Firmicutes (incl. basal lineag; Firmicutes; Peptostreptococcaceae; Mogibacterium; otu_415
      • Pace_tax_string=Bacteria; Firmicutes; Clostridium et al.; Peptostreptococcaceae; Clostridium acidiurici et al.; Clostridium difficile et al.; Clostridium aminobutyricum et
      • RDP_tax_string=Bacteria; Firmicutes; Clostridia; Clostridiales; unclassified_Clostridiales.
      • ncbi_tax_string=Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacteriaceae; environmental samples
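Because one record carries several competing taxonomy strings, it can help to compare them rank by rank. The Python sketch below splits the semicolon-delimited strings from the example record and reports how many leading ranks two taxonomies agree on. The function names are hypothetical, and treating "; " separated fields as ranks is an assumption based only on the record shown.

```python
def parse_tax_string(tax_string):
    """Split a semicolon-delimited taxonomy string into a list of ranks."""
    return [rank.strip() for rank in tax_string.split(";") if rank.strip()]


def shared_depth(tax_a, tax_b):
    """Count the leading ranks on which two taxonomy strings agree."""
    depth = 0
    for rank_a, rank_b in zip(parse_tax_string(tax_a), parse_tax_string(tax_b)):
        if rank_a != rank_b:
            break
        depth += 1
    return depth


# Strings copied from the example record for prokMSA_id 82172.
g2_chip = ("Bacteria; Firmicutes; Clostridia; Clostridiales; "
           "Peptostreptococcaceae; sf_5; otu_2988")
ncbi = ("Bacteria; Firmicutes; Clostridia; Clostridiales; "
        "Eubacteriaceae; environmental samples")

print(shared_depth(g2_chip, ncbi))
```

In this record the two strings agree down to Clostridiales and diverge at the family level.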

  16. greengenes.lbl.govMaintain Taxonomy

  17. greengenes.lbl.gov Tools • BLAST • SimRank • Probe matcher • Text search • PCR primer design • Private NAST aligner

  18. greengenes.lbl.gov: Compatible with ARB • Entire database downloadable in ARB format. • Can import new records into a personal ARB database.

  19. How we use greengenes data to get our work done

  20. 16S Sequence clustering • Each sequence reduced to an array (list) of “probe-friendly” 25-mers which: • Have high complexity • Can be synthesized with 75 or fewer masks • Adequate H-bond potential • G+C content over 48% • Or empirical bond stability found in test arrays • Transitive clustering by fraction of 25mers in common • Cluster considered an Operational Taxonomic Unit (OTU)
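The clustering idea can be sketched as single-linkage (transitive) grouping over shared k-mers. The Python below is an illustration under stated assumptions, not the group's pipeline: the slide gives neither the exact definition of "fraction of 25mers in common" nor the cutoff, so the shared-fraction formula (shared k-mers divided by the smaller set) and the `threshold` default are assumptions, and the "probe-friendly" filter is reduced to a caller-supplied `keep` predicate.

```python
from itertools import combinations


def kmers(seq, k=25, keep=lambda kmer: True):
    """Return the set of k-mers in seq that pass the keep predicate."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if keep(seq[i:i + k])}


def shared_fraction(set_a, set_b):
    """Fraction of k-mers in common, relative to the smaller set (assumed)."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def transitive_clusters(seqs, k=25, threshold=0.9):
    """Group sequences transitively: A~B and B~C puts A, B, C together.

    seqs maps a sequence name to its nucleotide string. Each returned
    cluster plays the role of an OTU.
    """
    names = list(seqs)
    sets = {name: kmers(seqs[name], k) for name in names}
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if shared_fraction(sets[a], sets[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(cluster) for cluster in clusters.values())
```

The pairwise loop is quadratic in the number of sequences, which is fine for a demonstration but would need an index of k-mers to scale to tens of thousands of sequences.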

  21. Extended Bergey’s Taxonomy Bergey’s v0.9 with added nomenclature from Hugenholtz tree of environmental DNA • Each OTU assigned to one of 455 families • Families split into subfamilies where >15% sequence variation existed. • Results: (considering both domains) • 63 phyla • 136 classes • 262 orders • 455 families • 842 subfamilies (~94% identity) • 8,989 OTUs (~99% identity) • 30,627 sequences (each belong to only one OTU)

  22. Probe Design. Example of the location of probes used for the Desulfovibrio vulgaris probe set (Bacteria; Proteobacteria; Deltaproteobacteria; Desulfovibrionales; Desulfovibrionaceae; sf_1; otu_10051). [Figure: aligned sequences Desulfovibrio sp. str. DMB, Desulfovibrio sp. 'Bendigo A', and Desulfovibrio vulgaris DSM 644. Legend: sequence discrepancies; regions not unique to OTU; regions unique to OTU.]

  23. Locus Specific Prevalence Scoring. Example: proteobacteria OTU composed of 26 sequences. [Figure labels: 22/22, 25/25, 20/25]

  24. Probe selection objectives for each OTU • Find 11 or more 25-mers (targets) • >90% prevalent in an OTU’s sequences • dissimilar from sequences outside the OTU • >48% G+C or empirically responsive • from more than one locus within the 16S rDNA gene • Presumed cross-hybridizing probes were those 25-mers that contained a central 17-mer matching sequences in more than one OTU (Urakawa, Stahl et al. 2002); this avoids probes that were unique solely due to a mismatch in one of the outer four bases. • As each PM probe (Perfect Match to target) was chosen, it was paired with a control 25-mer (mismatching probe, MM), identical in all positions except the thirteenth base. • The MM probe did not contain an internal 17-mer complementary to sequences in any OTU.
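Several of these objectives reduce to simple string checks, sketched below in Python. The helper names are hypothetical, and two details are assumptions rather than statements from the slides: the MM probe is built by substituting the complement at the thirteenth base (the slide only says that position differs), and matching is done as a plain same-strand substring test.

```python
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(probe):
    """Fraction of G and C bases in the probe."""
    probe = probe.upper()
    return sum(base in "GC" for base in probe) / len(probe)


def prevalence(probe, otu_seqs):
    """Fraction of an OTU's sequences that contain the probe target."""
    return sum(probe in seq for seq in otu_seqs) / len(otu_seqs)


def central_17mer(probe):
    """The 25-mer with its outer four bases removed on each side."""
    if len(probe) != 25:
        raise ValueError("expected a 25-mer")
    return probe[4:21]


def may_cross_hybridize(probe, outside_seqs):
    """True if the central 17-mer occurs in any sequence outside the OTU."""
    core = central_17mer(probe)
    return any(core in seq for seq in outside_seqs)


def mismatch_probe(pm):
    """Build the MM control: identical to PM except at the 13th base.

    Assumption: the substituted base is the complement of the original.
    """
    if len(pm) != 25:
        raise ValueError("expected a 25-mer")
    return pm[:12] + _COMPLEMENT[pm[12].upper()] + pm[13:]
```

A candidate 25-mer would then be kept when `prevalence` exceeds 0.9 within its OTU, `may_cross_hybridize` is false against all other OTUs, and either `gc_fraction` exceeds 0.48 or the probe is known to respond empirically.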

  25. Overview of Sample Preparation • Extract genomic DNA • PCR amplify DNA • Fractionate DNA • End-label with biotin • Hybridize

  26. Image Capture and Data Reduction • Over 500,000 data points • Scores for each of 9000 OTUs

  27. Distribution of 16S rDNA sequences detected via cloning or microarray analysis: Clone Hits Only (8), Clone and Array Hits (73), Array Hits Only (97). Confirmed by specific PCR and sequencing: • Actinobacteria; Actinosynnemataceae; sf_1 • Nitrospira; Nitrospiraceae; sf_1 • Clostridia; Syntrophomonadaceae; sf_5 • Planctomycetes; Planctomycetaceae; sf_3 • Gammaproteobacteria; Pseudoalteromonadaceae; sf_1 • Acidobacteria; Ellin6075/11-25; sf_1 • Spirochaetes; Spirochaetaceae; sf_1 • Spirochaetes; Spirochaetaceae; sf_3 • Spirochaetes; Leptospiraceae; sf_3

  28. Array is quantitative r = 0.917

  29. Array is quantitative. [Figure labels: ~10^11 16S gene copies; ~10^7 16S gene copies]

  30. Example query against meteorological data: Does detection of Actinobacterium PENDANT-38 correlate with temperature?

  31. Real-time quantitative PCR confirmation of array monitoring. Uranium Bioremediation – is uranium re-oxidation under reducing conditions due to loss of metal reducers? • Species specific: Geothrix fermentans • Group specific: Geobacteraceae. [Figure panels: (a) Array quantitation; (b) qPCR quantitation]

  32. Real-time quantitative PCR confirmation – Urban Aerosol. Array hybridization signal correlates significantly with 16S copies in environmental aerosol DNA extract. Pseudomonas oleovorans example.

  33. FEMS Letters - pseudoshift

  34. Acknowledgements • Phil Hugenholtz – Taxonomy, Arb Interface, Chimera • Niels Larson – SimRank • Igor Dubosarskiy – JSP • Jordan Moberg – Microarrays, Cloning • Yvette Piceno – Microarrays, Primer Design • Ingrid Zubieta – PCR, Cloning • Eoin Brodie – Microarrays, QPCR • Gary Andersen – 16S Microarray Group Leader

  35. C. perfringens probe set identified in EPA sample 22 (N.Y. Spring). [Figure: tree with group labels C. AURANTIBUTYRICUM, C. THERMOBUTYRICUM_SUBGROUP, C. BUTYRICUM, C. ALGIDICARNIS, C. BOTULINUM_SUBGROUP, C. CADAVERIS, C. PERFRINGENS, C. BARATI_SUBGROUP, Clostridium, CFB, Cyan, High G+C, Proteo Bacteria, Bacil-Strep Gram +; a 16S rDNA map labelled 27, 1492, 420, 469 with probes 5 - 8; and an alignment of the probe region.]
      • Reference sequence in the alignment: ...CGTAAAGCTCTGTCTTTGGGGAAGATAATGACGGTACCCAAGGAGGAAGCCACGGCTAACT...
      • Aligned sequences: C. perf. str.CPN50, C. perf. resistant, Clostridium sp. AB&J, clone p-4636-2Wa2, C. perf. A, C. perf rrnA through rrnJ, C. perf str.13a, C. perf str.13b, clone OI1612, C. perf. B, Swine manure 37-3, Swine manure 37-4. All rows are shown as identical to the reference except one rrn operon row (next to the C. perf rrnE / rrnD labels), which carries a single T substitution.
      • Probe sequences shown: TAAAGCTCTGTCTTTGGGGAAGATA, tacccaaggaggaagccacggctaa, AAAGCTCTGTCTTTGGGGAAGATAA, AAGCTCTGTCTTTGGGGAAGATAAT, AGCTCTGTCTTTGGGGAAGATAATG
      • Ave Diff = 1891
      • Probe properties: 25-mer exists in 90% of the taxon’s sequences. Internal 21-mer exists only in one taxon.