This report presents the final results of a comparison of genome assembly pipelines for 454, Illumina and hybrid (454 + Illumina) data. It evaluates assemblers such as Newbler, CABOG and SOAP DeNovo, using FastQC for read quality control and GAGE for assembly evaluation. It analyzes parameter optimization and the effect of pre-processing on assembly performance, and it summarizes assembly scores, span ratios and break points, showing that de novo assembly achieved higher assembly scores than reference-guided assembly for the Vibrio vulnificus strains studied.
Final Results
Genome Assembly Team: Kelley Bullard, Henry Dewhurst, Kizee Etienne, Esha Jain, VivekSagar KR, Benjamin Metcalf, Raghav Sharma, Charles Wigington, Juliette Zerick
Original Pipeline
Inputs: 454 raw reads, Illumina raw reads
PRE-PROCESSING: NGS QC with FastQC and Prinseq; read statistics feed the statistical analysis
DENOVO ASSEMBLY (with parameter optimization):
• 454 de novo: Newbler, CABOG, SUTTA
• Illumina de novo: Allpaths LG, SOAP DeNovo, Velvet, Taipan, SUTTA
• Hybrid de novo: Ray, MIRA
• Evaluation: GAGE, Hawk-eye
CONTIG MERGING: all possible combinations of the contigs from the best 3 assemblers, merged with Minimus or MAIA; align Illumina reads against 454 contigs; unmapped reads
GENOME FINISHING: scaffolds and gap filling with MUMmer, PAGIT and Mauve, producing the finished genome
REFERENCE SELECTION: published genomes from public databases (V. vulnificus YJ016, V. vulnificus CMCP6, V. vulnificus MO6-24/O), compared by nucleotide identity with MacVector and CLC wb; Illumina reads aligned against each reference with bwa, and mapping statistics compared with samstats
REFERENCE BASED ASSEMBLY: Illumina (454?) reference-based assembly with AMOScmp, GRASS and built-in tools; unmapped reads; reference evaluation with MUMmer and DNA Diff, producing a draft or finished genome
Legend: 454 reads, Illumina reads, read stats, contigs
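The "all possible combinations of the best 3" step in the contig-merging stage can be enumerated mechanically. This is a minimal sketch, assuming that merge order does not matter and that merging is tried for every pair and for the full triple; the three assembler names are hypothetical placeholders, not the team's actual selection.

```python
from itertools import combinations


def merge_combinations(assemblers):
    """Return every combination of two or more contig sets to feed a merger.

    Assumes merge order is irrelevant, so (A, B) and (B, A) count once.
    """
    combos = []
    for size in range(2, len(assemblers) + 1):
        combos.extend(combinations(assemblers, size))
    return combos


# Hypothetical placeholders for the three best-scoring assemblers.
best_three = ["assembler_A", "assembler_B", "assembler_C"]
for combo in merge_combinations(best_three):
    print(" + ".join(combo))
```

With three inputs this yields three pairs plus one triple. If the merger turns out to be order-sensitive, `itertools.permutations` would be the natural swap.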
Read Visualization: spot the differences
Comparison of 454 reads for 08-2462 (low coverage) and 2541-90 (improved coverage)
Pipeline / Read Processing / Assembler Results / Contig Merging / Assembler Review / Pipeline / Final Results
Read Visualization: more is better!
Nav 08-2462 454 reads compared with Nav 08-2462 Illumina reads.
Read Visualization: cousins or siblings?
Coverage comparison of Nav_2541-90 and Vul_06-2432 (454 and Illumina reads).
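The coverage contrasts on these slides follow the standard Lander-Waterman estimate: mean depth C = N × L / G for N reads of mean length L on a genome of size G, with roughly e^(-C) of the bases expected to receive no reads under a Poisson model. The sketch below uses made-up read counts, not this project's data.

```python
import math


def mean_depth(num_reads, mean_read_length, genome_size):
    """Expected mean coverage depth C = N * L / G (Lander-Waterman)."""
    return num_reads * mean_read_length / genome_size


def expected_uncovered_fraction(depth):
    """Poisson approximation of the fraction of bases covered by zero reads."""
    return math.exp(-depth)


# Hypothetical run: 100,000 reads of ~400 bp on a ~5 Mb genome.
depth = mean_depth(100_000, 400, 5_000_000)
print(f"mean depth: {depth:.1f}x")
print(f"expected uncovered fraction: {expected_uncovered_fraction(depth):.5f}")
```

Doubling N doubles C and squares the expected uncovered fraction, which is one concrete sense in which more reads are better.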
Data Quality
Effect of pre-processing the data (using Prinseq)
V. navarrensis (454; non-pre-processed | pre-processed)
V. vulnificus (454; non-pre-processed | pre-processed)
V. navarrensis (Illumina; non-pre-processed | pre-processed)
V. vulnificus (Illumina; non-pre-processed | pre-processed)
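Pre-processing of the kind compared above typically trims low-quality 3' ends and drops reads that end up too short or too low in mean quality. This is a minimal sketch of that idea, assuming Phred+33 quality strings; the thresholds and function names are hypothetical, and it is not Prinseq's implementation or option set.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_right(seq, quality, min_q=20):
    """Trim bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def preprocess(reads, min_q=20, min_len=50, min_mean_q=25):
    """Return the (seq, quality) pairs that survive trimming and filtering."""
    kept = []
    for seq, quality in reads:
        seq, quality = trim_right(seq, quality, min_q)
        if not seq or len(seq) < min_len:
            continue  # too short after trimming
        scores = phred_scores(quality)
        if sum(scores) / len(scores) < min_mean_q:
            continue  # mean quality too low
        kept.append((seq, quality))
    return kept


reads = [
    ("ACGT" * 15, "I" * 55 + "#" * 5),  # Q40 bases with a Q2 tail
    ("A" * 40, "I" * 40),               # good quality but too short
]
print(len(preprocess(reads)))
```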
Assembly: Reference-guided and De Novo
Reference-Guided Assembly
Comparison of reference-guided assembly vs de novo assembly
ARE: Assembly Score
Reference-guided vs de novo assembly: ARE
Summary of Reference-guided Assembly
• Using the V. vulnificus (CMCP6) reference strain
• 84% coverage
• Overall, the de novo assemblers achieved higher assembly scores than reference-based assembly
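The slide does not say whether "84% coverage" means the share of the reference covered by reads or the share of reads that mapped. Under the first reading (breadth of coverage), it can be computed from alignment intervals as sketched below; the interval format (0-based, half-open) and the example coordinates are assumptions for illustration only.

```python
def breadth_of_coverage(intervals, reference_length):
    """Fraction of reference positions covered by at least one aligned read.

    intervals: (start, end) alignment coordinates, 0-based and half-open.
    """
    diff = [0] * (reference_length + 1)  # difference array of read depth
    for start, end in intervals:
        start, end = max(0, start), min(reference_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    covered = 0
    depth = 0
    for pos in range(reference_length):
        depth += diff[pos]
        if depth > 0:
            covered += 1
    return covered / reference_length


# Hypothetical alignments on a 100 bp reference.
print(breadth_of_coverage([(0, 40), (30, 60), (90, 100)], 100))
```

The difference array keeps the cost linear in reference length plus number of reads, rather than touching every base of every read.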
De Novo Assembly
De Novo Assembler Comparison (Optimal Parameters): ARE
Final Results: V. vulnificus, Span Ratio
Graph comparing assemblers on three criteria: assembly score, span ratio and 1/(break points). Higher values are preferable for all criteria. Newbler (dn) has been removed to show the variance among the other tools.
Final Results: V. vulnificus, 1000/(Break Points)
Graph comparing assemblers on three criteria: assembly score, span ratio and 1/(break points). Higher values are preferable for all criteria.
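The three criteria on these graphs can be put on a common scale for a side-by-side ranking. This is a minimal sketch, assuming min-max normalization per criterion and equal weights; neither choice comes from the slides, the assembler names and values are invented, and 1000/(break points) assumes every assembler has at least one break point.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_assemblers(results):
    """Rank assemblers on assembly score, span ratio and 1000/(break points).

    results maps name -> (assembly_score, span_ratio, break_points);
    higher is better for all three derived criteria.
    """
    names = list(results)
    score = [results[n][0] for n in names]
    span = [results[n][1] for n in names]
    inv_bp = [1000.0 / results[n][2] for n in names]  # requires break_points > 0
    columns = [min_max(col) for col in (score, span, inv_bp)]
    combined = {n: sum(col[i] for col in columns) / len(columns)
                for i, n in enumerate(names)}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Invented example values, not results from this study.
example = {
    "assembler_A": (-10.0, 0.90, 20),
    "assembler_B": (-12.0, 0.95, 10),
    "assembler_C": (-15.0, 0.50, 50),
}
for name, value in rank_assemblers(example):
    print(f"{name}: {value:.3f}")
```

Min-max scaling is used so that assembly scores on a negative scale (as likelihood-style scores often are) still combine sensibly; a different weighting would be a judgment call.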
Summary of De Novo Results
• OLC assemblers differed considerably in ARE from de Bruijn graph assemblers
• CABOG/Newbler vs SOAP DeNovo/Velvet
• The hybrid assembler, Ray, did not perform as well in terms of assembly score
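To make the OLC versus de Bruijn distinction concrete: OLC assemblers such as CABOG and Newbler compare whole reads for overlaps, whereas de Bruijn assemblers such as Velvet and SOAP DeNovo break reads into k-mers and link overlapping (k-1)-mers. Below is a toy graph builder; it omits the error handling, reverse complements and graph simplification that real assemblers perform.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn multigraph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge prefix -> suffix
    return dict(graph)


graph = de_bruijn_graph(["ACGTC", "CGTCA"], k=3)
for node, successors in graph.items():
    print(node, "->", ", ".join(successors))
```

Walking the path AC, CG, GT, TC, CA spells ACGTCA, the sequence both reads were drawn from; the doubled CG and GT edges record that two reads share those k-mers.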
Merging: Vul_06-2432
Merging: Nav_2541-90
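Contig mergers such as Minimus follow the overlap-layout-consensus idea. This toy sketch shows only the core step, joining two contigs on their longest exact suffix-prefix overlap. It is not Minimus's algorithm: it ignores mismatches, reverse complements and contained contigs, and the minimum overlap length is a hypothetical parameter.

```python
def longest_overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if length > 0 and a.endswith(b[:length]):
            return length
    return 0


def merge_contigs(a, b, min_len=20):
    """Join two contigs on an exact overlap in either order; None if none found."""
    overlap = longest_overlap(a, b, min_len)
    if overlap:
        return a + b[overlap:]
    overlap = longest_overlap(b, a, min_len)
    if overlap:
        return b + a[overlap:]
    return None


print(merge_contigs("AAAACCCCGGGG", "CCGGGGTTTT", min_len=4))
```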
Assembler Review
MIRA performed as well as our merged contigs, but it is impractical (40 h run time).
BB = branch-and-bound; OLC = overlap-layout-consensus; DBG = de Bruijn graph
ZEBRA
Final Pipeline
Inputs: 454 raw reads, Illumina raw reads
PRE-PROCESSING: FastQC and Prinseq; read statistics feed the statistical analysis
DENOVO ASSEMBLY:
• Illumina de novo: Velvet
• 454 de novo: Newbler, CABOG
• Hybrid de novo: Ray, MIRA
CONTIG MERGING (Minimus; align Illumina reads against 454 contigs):
• Merge Ray (hybrid) / Newbler
• Merge CABOG / Velvet
• MIRA (hybrid)
Output: draft genome
Legend: 454 reads, Illumina reads, read stats, contigs
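Aligning Illumina reads against the 454 contigs leaves some reads unmapped. Assuming the aligner (for example bwa, which appears elsewhere in this pipeline) writes SAM, unmapped reads are the records whose FLAG field has bit 0x4 set, per the SAM specification. A minimal sketch follows; in practice `samtools view -f 4` performs the same selection.

```python
def unmapped_read_names(sam_lines):
    """Yield QNAMEs of SAM records whose FLAG has bit 0x4 (unmapped) set."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            yield qname


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tcontig_1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(list(unmapped_read_names(sam)))
```

For paired-end data each mate is a separate record, so a pair with only one unmapped mate contributes a single name.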
Splinter Pipelines
Pipeline 1 | Pipeline 2
Visualization
Newbler | Ray hybrid | Merged