MAKER 2014: What It Is, Where It’s Been, Where It’s Going. Daniel Ence, Yandell Lab, University of Utah
What Are Annotations? • Annotations are descriptions of features of the genome • Structural: exons, introns, UTRs, splice forms, etc. • Coding & non-coding genes • Annotations should include an evidence trail • Assists in quality control of genome annotations • Examples of evidence supporting a structural annotation: • Ab initio gene predictions • ESTs • Protein homology
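One way to picture an annotation that carries its own evidence trail is a small record type. The sketch below is only an illustration: the class names, field names, and identifiers such as `EST_0001` are hypothetical, and the coverage score is a simple stand-in for quality control, not the metric MAKER itself uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kind: str    # e.g. "ab_initio", "est", "protein"
    source: str  # hypothetical ID of the prediction or aligned sequence
    start: int   # inclusive, 1-based
    end: int     # inclusive, 1-based


@dataclass
class Annotation:
    seqid: str
    feature_type: str  # e.g. "exon", "intron", "UTR"
    start: int
    end: int
    strand: str
    evidence: List[Evidence] = field(default_factory=list)

    def supported_fraction(self) -> float:
        """Fraction of this feature's bases covered by at least one evidence item."""
        covered = set()
        for ev in self.evidence:
            lo = max(self.start, ev.start)
            hi = min(self.end, ev.end)
            covered.update(range(lo, hi + 1))  # empty when there is no overlap
        return len(covered) / (self.end - self.start + 1)


# A 100-base exon supported by one EST alignment and one protein alignment.
exon = Annotation("contig1", "exon", 100, 199, "+", [
    Evidence("est", "EST_0001", 90, 149),
    Evidence("protein", "PROT_0001", 140, 179),
])
print(exon.supported_fraction())
```

Keeping the evidence attached to the feature means a reviewer can ask why an exon was called, not just where it is.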
Secondary Annotation • Protein Domains and Families • InterPro • Pfam • GO and other ontologies • Pathways
Genome Project Overview >Smg5 MEVTFSSGGSSNASSECAIDGGTNRCRGLEPNNGTCILSQEVKDLYRSLYTASKQLDDAKRNVQSVGQLFQHEIEEKRSLLVQLCKQIIFKDYQSVGKKVREVMWRRGYYEFIAFV SUCCESS
MAKER An annotation pipeline and genome-database management tool for “next-generation” genome projects
Beyond de novo annotation • mRNA-seq integration • Integrating new evidence into existing databases • Update/revise legacy annotation sets
Beyond de novo annotation Legacy Annotation Set 1 Legacy Annotation Set 2 Legacy Annotation Set n new data current assembly • Identify legacy annotation most consistent with new data • Automatically revise it in light of new data • If no existing annotation, create new one
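The three steps above can be sketched as a selection problem. The slides do not say how MAKER scores consistency, so this example assumes a plain Jaccard overlap of bases between each legacy gene model and the new evidence; the function names and coordinates are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Interval = Tuple[int, int]  # inclusive, 1-based exon coordinates


def bases(intervals: List[Interval]) -> Set[int]:
    """Expand a list of intervals into the set of positions they cover."""
    out: Set[int] = set()
    for start, end in intervals:
        out.update(range(start, end + 1))
    return out


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Shared bases divided by total bases covered by either model."""
    sa, sb = bases(a), bases(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pick_legacy(legacy_sets: Dict[str, List[Interval]],
                new_data: List[Interval]) -> Optional[str]:
    """Return the legacy set most consistent with the new data, or None
    when nothing overlaps (the case where a new annotation is created)."""
    best_name, best_score = None, 0.0
    for name, exons in legacy_sets.items():
        score = jaccard(exons, new_data)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


legacy = {
    "set1": [(100, 200), (300, 400)],
    "set2": [(100, 200), (300, 350)],
}
new_evidence = [(100, 200), (300, 360)]
print(pick_legacy(legacy, new_evidence))       # set2 is the closer match
print(pick_legacy(legacy, [(1000, 1100)]))     # no overlap, so None
```

The chosen model would then be the starting point for revision; the revision step itself is not shown here.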
Distributed Parallelization • Supports Message Passing Interface (MPI), a communication protocol for computer clusters which essentially allows multiple computers to act like a single powerful machine.
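The data-parallel idea behind an MPI run can be shown without any MPI calls: each process (rank) takes a share of the contigs and annotates them independently. The round-robin split below is an assumption made for illustration; the slides do not describe how MAKER actually schedules work across ranks.

```python
from typing import Dict, List


def assign_contigs(contigs: List[str], n_ranks: int) -> Dict[int, List[str]]:
    """Deal contigs out to ranks round-robin, like cards to players."""
    if n_ranks < 1:
        raise ValueError("need at least one rank")
    work: Dict[int, List[str]] = {rank: [] for rank in range(n_ranks)}
    for i, contig in enumerate(contigs):
        work[i % n_ranks].append(contig)
    return work


contigs = [f"contig{i}" for i in range(1, 8)]  # contig1 .. contig7
plan = assign_contigs(contigs, 3)
for rank, share in plan.items():
    print(rank, share)
```

In a real MPI job every process would compute the same plan and then work only on the list for its own rank, so no central coordinator is needed for this simple scheme.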
What happened in 2013? • MAKER-P • Plant • Parallelized • Publication
What happened in 2013 • Publication: MAKER-P: a tool-kit for the rapid creation, management, and quality control of plant genome annotations Campbell, Law, Holt et al., Plant Phys. 2013
MAKER-P at iPlant • Atmosphere • MPI enabled for parallel computation • Maximum instance size 16 CPU • http://www.iplantcollaborative.org • TACC Lonestar • Supercomputer with 22,656 CPU • MPI enabled for parallel computation • Can complete entire rice genome in ~2 hrs (1,152 cores) • 96 CPU per chromosome • Currently being integrated into the iPlant Discovery Environment http://www.iplantcollaborative.org • XSEDE https://www.xsede.org
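The core counts on this slide are consistent with each other, which a few lines of arithmetic make explicit. The only figure not taken from the slide is the rice chromosome count (12).

```python
RICE_CHROMOSOMES = 12        # not on the slide; rice has 12 chromosomes
CPUS_PER_CHROMOSOME = 96     # from the slide
LONESTAR_CPUS = 22_656       # from the slide

cores_used = RICE_CHROMOSOMES * CPUS_PER_CHROMOSOME
share_of_machine = cores_used / LONESTAR_CPUS

print(cores_used)                      # 1152, matching the 1,152 cores quoted
print(f"{share_of_machine:.1%} of Lonestar")
```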
Data throughput: performance on the Zea mays (maize) genome (~2 Gb)
Pinus taeda • 8,640 CPUs on TACC • ~37 hours including queue time (runtime 14 hours 37 minutes) • Throughput of > 1 Gb/hour
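The throughput claim can be checked with a short calculation. The genome size is an assumption here: the slide does not state it, and roughly 22 Gb is used as an approximate figure for the loblolly pine genome.

```python
GENOME_GB = 22.0                 # assumption: approximate genome size, not from the slide
RUNTIME_HOURS = 14 + 37 / 60     # 14 hours 37 minutes, from the slide
WALL_HOURS_WITH_QUEUE = 37.0     # from the slide

throughput_runtime = GENOME_GB / RUNTIME_HOURS
throughput_with_queue = GENOME_GB / WALL_HOURS_WITH_QUEUE

print(f"{throughput_runtime:.2f} Gb/hour while running")
print(f"{throughput_with_queue:.2f} Gb/hour including queue time")
```

Under that assumption the quoted figure of more than 1 Gb/hour holds for the runtime alone, while the rate including queue time falls below it.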
Added to MAKER-P • non-coding RNA support • better repeat annotation • better pseudogene annotation
Non-coding RNA annotation • tRNAscan-SE support • Will run from inside MAKER • Doesn’t install automatically • snoscan support • Can supply data file for annotation • Will run from inside MAKER automatically • Doesn’t install automatically
Better Repeat Annotation • In the past: • Custom repeat library • Generated de novo with RepeatModeler • Now: • RepeatModeler, but better. • Step-by-step guide available at: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Basic • To be automated in the future
What’s Coming in 2014? • Expanded ncRNA support • MAKER-EVM • Expanded Augustus/BAM support • Better integration with iPlant’s Discovery Environment
Expanded ncRNA annotation • More of a feeling than a to-do list • lncRNAs
MAKER Evidence Modeler Haas et al., Genome Biology 2008
MAKER Evidence Modeler EVM Cantarel et al., 2008; Holt and Yandell, 2010
Better Augustus support • MAKER gives Augustus hints • Augustus can take better hints from a BAM file • Users will be able to supply a BAM file in the MAKER control file • BAM files open up a world of possibilities!
Future Annotations • Trichomonas vaginalis • Pinus taeda • Apis dorsata • Cronartium quercuum • Common Pigeon • Cardiocondyla obscurior • Southern right whale • Tardigrade • Spotted Gar • Gibbon • Turkey • Nine-spined stickleback • Golden Eagle