Genovo: De Novo Assembly for Metagenomes

Presentation Transcript

  1. Genovo: De Novo Assembly for Metagenomes Gao Song 2010/07/14

  2. Outline • Overview of Metagenomics • Current Assemblers • Genovo Assembly

  3. Overview of Metagenomics

  4. Motivation • Metagenomics is: • Why Do We Need Metagenomics? • Snapshot of bacterial community • Most microbes cannot be cultivated (<1% can)

  5. Applications • Monitoring the impact of pollutants on ecosystems • Discovery of new genes, enzymes… - Global Ocean Sampling Expedition • Human Microbiome Project • JGI sequenced Acid Mine Drainage sample

  6. Two Paradigms • Marker Gene Sequencing • 16S rRNA: • Two ways • Other marker genes: RuBisCO, NifH • Only composition • Whole Genome Sequencing (WGS) • Detailed picture of community

  7. Complex Communities [figure; surviving labels: x5000, >1000, 200 L, 1 million]

  8. Current Assemblers

  9. Current Status • Why not assemble reads? • ORFome assembler* • Three steps: • Putative ORFs are annotated for each read • ORFs are assembled using EULER • ORF homologs are searched for in the Integrated Microbial Genomes (IMG) database • Existing WGS assemblers • Sanger reads: Phrap, Celera, Arachne, JAZZ… • Short reads: Velvet, Newbler… * Y. Ye and H. Tang, "An ORFome assembly approach to metagenomics sequences analysis," Journal of Bioinformatics and Computational Biology, vol. 7, no. 3, pp. 455-471, June 2009

  10. Genovo: De Novo Assembly for Metagenomes • Jonathan Laserson, Vladimir Jojic and Daphne Koller. RECOMB 2010, LNBI 6044, pp. 341-356, 2010

  11. Main Idea • Propose a generative model for metagenome data • Optimize it with iterated conditional modes (ICM) • Apply hill-climbing steps iteratively • Design a score for evaluation

  12. Model • Initialize contigs: • Infinite contigs with infinite length • Partition the reads • Using Chinese Restaurant Process
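The read-partitioning step on this slide can be illustrated with a small simulation. This is only a minimal sketch of a generic Chinese Restaurant Process, not the authors' code: the function name `crp_partition` and the concentration parameter `alpha` are assumptions made for illustration. Each read joins an existing contig with probability proportional to the number of reads already there, or opens a new contig with probability proportional to `alpha`, so the number of contigs is unbounded in principle.

```python
import random

def crp_partition(n_reads, alpha, seed=0):
    """Assign reads to contigs with a Chinese Restaurant Process.

    Hypothetical helper for illustration; alpha > 0 is the concentration
    parameter (larger alpha tends to open more contigs).
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = contig index of read i
    counts = []       # counts[k] = number of reads currently in contig k
    for _ in range(n_reads):
        # Existing contig k has weight counts[k]; a new contig has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new contig
        else:
            counts[k] += 1     # join an existing contig
        assignments.append(k)
    return assignments, counts

if __name__ == "__main__":
    assignments, counts = crp_partition(20, alpha=1.0, seed=42)
    print("contig of each read:", assignments)
    print("reads per contig:   ", counts)
```

The "rich get richer" weighting is what lets the model start from an unbounded set of contigs while still concentrating reads in a few of them.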

  13. Model • Generate the starting point o_i • Generate the length of the read • Quality of assembly of each read

  14. Algorithm • Using ICM • Starting from initial condition, hill-climbing moves are performed iteratively • Move 1: Consensus Sequence: • Select the most frequent base
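Move 1 can be sketched as a per-column majority vote. The example below is a simplified illustration, assuming gap-free alignments where each read is given as a hypothetical `(offset, sequence)` pair; indels are ignored here. Columns not covered by any read are written as `N`, which is a choice made for this sketch only.

```python
from collections import Counter

def consensus(placed_reads):
    """Return the most frequent base in every covered column.

    placed_reads: list of (offset, sequence) pairs, gap-free (an assumption
    of this sketch; indels are not handled).
    """
    columns = {}
    for offset, seq in placed_reads:
        for j, base in enumerate(seq):
            columns.setdefault(offset + j, Counter())[base] += 1
    if not columns:
        return ""
    out = []
    for pos in range(min(columns), max(columns) + 1):
        col = columns.get(pos)
        # Uncovered columns get 'N'; covered ones get the majority base.
        out.append(col.most_common(1)[0][0] if col else "N")
    return "".join(out)

if __name__ == "__main__":
    reads = [(0, "ACGT"), (1, "CGTA"), (0, "ACCT")]
    print(consensus(reads))
```

In an ICM setting this is the value of the contig letters that maximizes the read likelihood when all read placements are held fixed.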

  15. Algorithm • Move 2: Read Mapping • For read i, first remove it, then recalculate its contig and alignment • First, for each potential location, compute the alignment • Then, select the location according to its probability • Filtering: only consider locations sharing a common 10-mer
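The two parts of Move 2 (filter candidate locations by a shared 10-mer, then choose among them by probability) can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the paper: alignments are gap-free, the per-base error rate `eps` and the 1/4 probability for bases hanging past the contig end are hypothetical, and all function names are invented for this example.

```python
import math
import random

def candidate_offsets(read, contig, k=10):
    """Offsets at which the read shares at least one exact k-mer with the contig."""
    index = {}
    for p in range(len(contig) - k + 1):
        index.setdefault(contig[p:p + k], []).append(p)
    offsets = set()
    for q in range(len(read) - k + 1):
        for p in index.get(read[q:q + k], []):
            offsets.add(p - q)  # contig position where the read would start
    return sorted(offsets)

def log_likelihood(read, contig, offset, eps=0.01):
    """Gap-free log-likelihood of the read placed at offset (illustrative model)."""
    ll = 0.0
    for j, base in enumerate(read):
        pos = offset + j
        if 0 <= pos < len(contig):
            ll += math.log(1 - eps) if contig[pos] == base else math.log(eps / 3)
        else:
            ll += math.log(0.25)  # base overhangs the contig: uniform over A/C/G/T
    return ll

def sample_location(read, contig, rng, k=10):
    """Pick one candidate offset with probability proportional to its likelihood."""
    offsets = candidate_offsets(read, contig, k)
    if not offsets:
        return None  # nothing passed the k-mer filter
    lls = [log_likelihood(read, contig, o) for o in offsets]
    m = max(lls)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(l - m) for l in lls]
    return rng.choices(offsets, weights=weights)[0]

if __name__ == "__main__":
    contig = "ACGTTGCAAGGCTTAACCGGTTAGCATCGA"
    read = contig[5:25]
    print(sample_location(read, contig, random.Random(0)))
```

The k-mer index keeps the expensive alignment scoring limited to a handful of plausible locations instead of every position in every contig.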

  16. Algorithm • Move 3: Update the geometric variable • Global moves: • Propose indels • Center • Merge contigs • Chimeric reads • Disassemble the dangling contigs

  17. Evaluation • BLAST • PFAM • Designed score • 1st term: quality of assembly • 2nd term: penalty for total length • 3rd term: prefer to merge when V > V0

  18. Results • Using 454 reads • Compare with Newbler, Velvet and EULER-SR • Single Genome

  19. Results • Metagenome data • Score • PFAM

  20. Discussion • New idea • Applies a mature algorithm to the assembly domain • Systematically describes and analyzes the problem and algorithm • Results are better

  21. Discussion • Slow: minutes vs. hours for 300k 454 reads • Main idea: try to extend contigs as long as possible, so they will have more BLAST hits • Why choose 20 for V0? • How to deal with branching? Repeats? • Model: • Why can it capture the properties of metagenomic data? • How to argue the correctness of the model? • The distribution of starting points

  22. Thank you