MAKER 2014 What It Is Where It’s Been Where It’s Going

Daniel Ence, Yandell Lab, University of Utah

Presentation Transcript


  1. MAKER 2014: What It Is, Where It’s Been, Where It’s Going • Daniel Ence, Yandell Lab, University of Utah

  2. What Are Annotations? • Annotations are descriptions of features of the genome • Structural: exons, introns, UTRs, splice forms, etc. • Coding and non-coding genes • Annotations should include an evidence trail • The evidence trail assists in quality control of genome annotations • Examples of evidence supporting a structural annotation: ab initio gene predictions, ESTs, protein homology
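To make the idea of an evidence trail concrete, here is a minimal sketch of an annotation that carries its supporting evidence and can report which exons lack support. The class names, field names, identifiers, and coordinates are all hypothetical and are not taken from MAKER; the three evidence kinds mirror the examples on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Evidence:
    kind: str    # e.g. "ab_initio", "EST", "protein_homology"
    source: str  # hypothetical identifier of the supporting record
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


@dataclass
class GeneAnnotation:
    gene_id: str
    exons: List[Tuple[int, int]]
    evidence: List[Evidence] = field(default_factory=list)

    def unsupported_exons(self) -> List[Tuple[int, int]]:
        """Return exons that no piece of evidence overlaps."""
        return [
            (s, e)
            for (s, e) in self.exons
            if not any(ev.start <= e and ev.end >= s for ev in self.evidence)
        ]


# Illustrative data only: two of three exons have supporting evidence.
gene = GeneAnnotation(
    gene_id="gene0001",
    exons=[(100, 200), (300, 400), (500, 600)],
    evidence=[
        Evidence("EST", "est_42", 90, 210),
        Evidence("protein_homology", "prot_7", 310, 390),
    ],
)
print(gene.unsupported_exons())  # [(500, 600)]
```

A quality-control pass could flag the unsupported exon for manual review, which is the kind of check an evidence trail makes possible.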

  3. Secondary Annotation • Protein Domains and Families • InterPro • Pfam • GO and other ontologies • Pathways

  4. Genome Project Overview

  5. Genome Project Overview

  6. Genome Project Overview

  7. Genome Project Overview • Example predicted protein, Smg5: MEVTFSSGGSSNASSECAIDGGTNRCRGLEPNNGTCILSQEVKDLYRSLYTASKQLDDAKRNVQSVGQLFQHEIEEKRSLLVQLCKQIIFKDYQSVGKKVREVMWRRGYYEFIAFV • SUCCESS

  14. MAKER An annotation pipeline and genome-database management tool for “next-generation” genome projects

  15. MAKER

  16. MAKER

  17. MAKER

  18. MAKER

  19. Beyond de novo annotation • mRNA-seq integration • Integrating new evidence into existing databases • Update/revise legacy annotation sets

  20. Beyond de novo annotation • Inputs: Legacy Annotation Set 1, Legacy Annotation Set 2, ..., Legacy Annotation Set n; new data; current assembly • Identify the legacy annotation most consistent with the new data • Automatically revise it in light of the new data • If no annotation exists, create a new one
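The selection step on this slide can be sketched as follows. The slides do not say how MAKER measures consistency, so this example substitutes a simple base-level Jaccard overlap between an annotation's exons and the new evidence; the function names, set names, and coordinates are hypothetical.

```python
def covered_bases(intervals):
    """Expand inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def consistency(annotation_exons, evidence_intervals):
    """Jaccard overlap of annotated bases and evidence bases (0.0 to 1.0)."""
    a = covered_bases(annotation_exons)
    b = covered_bases(evidence_intervals)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_annotation(legacy_sets, evidence_intervals):
    """Return (name, score) of the best legacy annotation.

    Returns (None, 0.0) when nothing overlaps the evidence, which is the
    case where a new annotation would have to be created.
    """
    best_name, best_score = None, 0.0
    for name, exons in legacy_sets.items():
        score = consistency(exons, evidence_intervals)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Illustrative data only: set2 matches the new evidence exactly.
legacy = {
    "set1": [(100, 200), (300, 400)],
    "set2": [(100, 200), (300, 450)],
}
new_evidence = [(100, 200), (300, 450)]
print(pick_annotation(legacy, new_evidence))  # ('set2', 1.0)
```

Expanding intervals into sets keeps the sketch short; a real implementation would work on interval arithmetic to handle genome-scale coordinates.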

  22. Distributed Parallelization • Supports the Message Passing Interface (MPI), a communication standard for computer clusters that lets multiple computers work together on a single job, much like one powerful machine.
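As a rough illustration of how an MPI-style job can divide work, the sketch below splits a list of contigs among processes using the usual rank and size values. It runs without any MPI library, simulating each rank in a loop. How MAKER actually schedules work across MPI processes is not described in the slides, so the round-robin scheme and all names here are assumptions.

```python
def contigs_for_rank(contigs, rank, size):
    """Round-robin share of the work for one process.

    In a real MPI program, `rank` and `size` would come from the MPI
    runtime; here they are plain integers so the sketch runs anywhere.
    """
    return contigs[rank::size]


# Hypothetical input: ten contigs shared among four processes.
contigs = [f"contig_{i}" for i in range(10)]
size = 4
assignment = {
    rank: contigs_for_rank(contigs, rank, size) for rank in range(size)
}

for rank, share in assignment.items():
    print(rank, share)
```

Every contig lands on exactly one rank, so the processes can annotate their shares independently and the results can be merged afterwards.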

  23. Data throughput

  24. What happened in 2013?

  25. What happened in 2013? • MAKER-P

  26. What happened in 2013? • MAKER-P • Plant

  27. What happened in 2013? • MAKER-P • Plant • Parallelized

  28. What happened in 2013? • MAKER-P • Plant • Parallelized • Publication

  29. What happened in 2013? • Publication: "MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations", Campbell, Law, Holt et al., Plant Phys. 2013

  30. MAKER-P at iPlant • Atmosphere • MPI enabled for parallel computation • Maximum instance size: 16 CPUs • http://www.iplantcollaborative.org • TACC Lonestar • Supercomputer with 22,656 CPUs • MPI enabled for parallel computation • Can complete the entire rice genome in ~2 hrs (1,152 cores) • 96 CPUs per chromosome • Currently being integrated into the iPlant Discovery Environment • http://www.iplantcollaborative.org • XSEDE • https://www.xsede.org
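The rice figures on this slide can be cross-checked with a little arithmetic. The numbers below are the ones quoted on the slide; the variable names are only illustrative.

```python
total_cores = 1152           # cores used for the rice run
cores_per_chromosome = 96    # cores assigned to each chromosome
lonestar_cores = 22656       # size of TACC Lonestar, as quoted

# Number of chromosomes that can be processed side by side.
chromosomes_in_parallel = total_cores // cores_per_chromosome

# Share of the whole machine used by this run.
machine_fraction = total_cores / lonestar_cores

print(chromosomes_in_parallel)       # 12
print(round(machine_fraction, 3))
```

The result of 12 matches the 12 chromosomes of the rice genome, so the run appears to have processed all chromosomes concurrently.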

  31. Data throughput • Performance on the Zea mays (maize) genome (~2 Gb)

  32. Pinus taeda • 8,640 CPUs on TACC • ~37 hours including queue time (runtime 14 hours 37 minutes) • Throughput of > 1 Gb/hour
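A short calculation shows what these figures imply in compute cost and queue time. It uses only the numbers quoted on the slide; since the total of 37 hours is approximate, the queue wait derived from it is approximate too.

```python
cpus = 8640
runtime_minutes = 14 * 60 + 37     # 14 hours 37 minutes of actual runtime
total_minutes = 37 * 60            # about 37 hours including the queue

# Total compute consumed, in CPU-hours.
cpu_hours = cpus * runtime_minutes / 60

# Approximate time spent waiting in the queue.
wait_minutes = total_minutes - runtime_minutes
wait_h, wait_m = divmod(wait_minutes, 60)

# Lower bound on sequence processed, given throughput above 1 Gb/hour.
min_gb_processed = 1.0 * runtime_minutes / 60

print(cpu_hours)                   # 126288.0
print(wait_h, wait_m)              # 22 23
print(round(min_gb_processed, 1))  # 14.6
```

So the run consumed roughly 126,000 CPU-hours, and more of the wall-clock time went to queueing than to computation.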

  33. Assembly & Annotation at iPlant

  34. Added to MAKER-P • non-coding RNA support • better repeat annotation • better pseudogene annotation

  35. Non-coding RNA annotation • tRNAscan support • Runs from inside MAKER • Is not installed automatically • snoscan support • Users can supply a data file for annotation • Runs from inside MAKER automatically • Is not installed automatically

  36. Better Repeat Annotation • In the past: • Custom repeat library • Generated de novo with RepeatModeler • Now: • RepeatModeler, but better • Step-by-step guide available at: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic • To be automated in the future

  37. What’s Coming in 2014? • Expanded ncRNA support • MAKER-EVM • Expanded Augustus/BAM support • Better integration with iPlant’s Discovery Environment

  38. Expanded ncRNA annotation • More of a feeling than a to-do list • lncRNAs

  39. MAKER Evidence Modeler Haas et al., Genome Biology 2008

  40. MAKER Evidence Modeler Cantarel et al., 2008; Holt and Yandell, 2010

  41. MAKER Evidence Modeler EVM Cantarel et al., 2008; Holt and Yandell, 2010

  42. Better Augustus support • MAKER gives Augustus hints • Augustus can take better hints from a BAM file • Users will be able to supply a BAM file in the MAKER control file • BAM files open up a world of possibilities!
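To show what a hint looks like, the sketch below formats a splice junction (as might be derived from spliced reads in a BAM file) as an intron hint line in the GFF-style layout that Augustus reads. The layout, the `b2h` source label, and the `mult`/`pri`/`src` attributes follow my understanding of the Augustus hints format and should be checked against the Augustus documentation. The function name and the example junction are hypothetical, and nothing here reflects how MAKER builds its hints internally.

```python
def intron_hint(seqid, start, end, mult, src="E", pri=4):
    """Format one intron hint as a tab-separated, nine-column GFF line.

    mult: number of reads supporting the junction (assumed meaning)
    src:  evidence source code; "E" is assumed here for transcript evidence
    pri:  priority used when hints conflict (assumed default)
    """
    attributes = f"mult={mult};pri={pri};src={src}"
    fields = [
        seqid,       # sequence name
        "b2h",       # source label (assumed convention)
        "intron",    # hint type
        str(start),  # 1-based intron start
        str(end),    # 1-based intron end
        "0",         # score (unused in this sketch)
        ".",         # strand left unspecified
        ".",         # frame
        attributes,
    ]
    return "\t".join(fields)


# Hypothetical junction supported by three reads.
line = intron_hint("chr1", 1234, 5678, mult=3)
print(line)
```

Collecting one such line per junction into a file would give Augustus an explicit record of where spliced reads support introns.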

  43. Assembly & Annotation at iPlant

  44. Future Annotations • Trichomonas vaginalis • Pinus taeda • Apis dorsata • Cronartium quercuum • Common pigeon • Cardiocondyla obscurior • Southern right whale • Tardigrade • Spotted gar • Gibbon • Turkey • Nine-spined stickleback • Golden eagle
