1. De novo assembly from Illumina/SOLEXA short reads: Assemblers and trends By: Urmi Trivedi
The Gene Pool
University of Edinburgh
2. Contents Challenge of assembling short reads
De novo assembly at The Gene Pool using several available assemblers
Comparison between short read assemblers
Assembly quality criteria
Factors affecting assembly
Concluding Remarks
3. De novo assembly of micro-reads: a challenge Large amounts of data, but new analytical methods are needed
Assembling micro-reads is challenging
Calculating the overlaps takes a large amount of computing power
Read-quality issues increase the chance of ambiguous assembly
4. Assemblers so far... Velvet
Edena
SSAKE
SHARCGS
SHRAP
ALLPATHS
EULER-SR
5. De novo assembly at The Gene Pool using several assemblers Single-end data have been generated for several genomes
Listeria monocytogenes
Rhodococcus equi
Campylobacter jejuni
Magnetospirillum magneticum (AMB-1)
Streptomyces cattleya
Photorhabdus temperata
Mainly used assemblers:
Velvet
Edena
SSAKE
7. Measures of assembly N50
Largest contig formed
% bases in contigs >= 1 kb
Total bases in contigs
Any other suggestions are welcome!
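The measures listed on this slide can be computed from contig lengths alone. The following is a minimal sketch, not the pipeline used at The Gene Pool. The function name `assembly_stats` and the `min_len` parameter are hypothetical, and the percentage is assumed to be relative to the total bases in contigs, not to the genome size.

```python
def assembly_stats(contig_lengths, min_len=1000):
    """Summarise an assembly from a list of contig lengths (in bases).

    Assumption: '% bases in contigs >= 1 kb' is taken relative to the
    total number of assembled bases, not to the genome size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        return {"n50": 0, "largest": 0, "total": 0, "pct_ge_min": 0.0}

    # N50: walk the contigs from longest to shortest and report the length
    # of the contig at which the running sum first reaches half the total.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    bases_ge_min = sum(length for length in lengths if length >= min_len)
    return {
        "n50": n50,
        "largest": lengths[0],
        "total": total,
        "pct_ge_min": 100.0 * bases_ge_min / total,
    }


# Made-up contig lengths, for illustration only.
stats = assembly_stats([5000, 3000, 2000, 800, 200])
print(stats["n50"], stats["largest"], stats["total"])  # 3000 5000 11000
```

Contig lengths would normally be read from the assembler's FASTA output; that step is left out to keep the sketch self-contained.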
8. De novo assembly of Listeria monocytogenes (~2.9 Mb) with Edena
9. De novo assembly of Streptomyces cattleya (~8.9 Mb) with Edena
10. Factors affecting assembly Assemblies vary from organism to organism
Biology matters as well as technology
Several factors may be affecting assembly:
Coverage
% GC
Genomic content (repetitive regions, transposons, etc.)
11. Effect of Coverage Coverage = (N × 36) / G
N = total number of reads
36 = read length in bases
G = genome size in bases
The higher the coverage, the better the assembly
Some cases differ
Responsible factors?
Possibly genomic content
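As a worked illustration of the coverage formula on this slide, here is a minimal sketch. The function name `coverage` is hypothetical, and the read count in the example is made up, not taken from the runs described here. The default read length of 36 bases comes from the slide's formula.

```python
def coverage(num_reads, genome_size, read_length=36):
    """Average depth of coverage: (number of reads x read length) / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: 4 million 36-base reads over a ~2.9 Mb genome.
depth = coverage(4_000_000, 2_900_000)
print(f"{depth:.1f}X")
```

This is an average only; it says nothing about how evenly the reads are spread across the genome.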
12. %GC and % bases in contigs >= 1 kb The higher the GC content, the poorer the assembly
Again, some cases differ
Why?
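For reference, %GC can be computed directly from a sequence. This is a minimal sketch with a hypothetical function name, `gc_percent`. It assumes that ambiguous bases such as N are excluded from the denominator, which is one common convention and not necessarily the one used for these slides.

```python
def gc_percent(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases to count
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


print(gc_percent("ATGC"))  # 50.0
```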
13. Limitations How far can you go with unpaired data? A fundamental limitation
Paired-end reads (ALLPATHS, an upcoming assembler)
Combined approach with longer reads (Sanger or 454)
Raising Coverage (~80X)
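To give a sense of what raising coverage to ~80X means in reads, the coverage relation (reads × read length / genome size) can be inverted. This is a minimal sketch. The function name `reads_for_coverage` is hypothetical, and the 36-base read length is assumed from the coverage formula used earlier in the talk.

```python
import math


def reads_for_coverage(target_coverage, genome_size, read_length=36):
    """Smallest number of reads giving at least the target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


# ~80X over a ~2.9 Mb genome with 36-base reads.
print(reads_for_coverage(80, 2_900_000))  # 6444445
```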
14. Concluding Remarks Surprising to get such long contigs from such short reads!
De novo assembly from short reads was thought to be impossible, but the number of papers published on it suggests otherwise.
Something will come along that does the job, or perhaps one of us here may be The One?