Explore the evolving field of high-throughput comparative genomics presented by Joe Parker from Queen Mary University of London. This talk delves into the significance of phylogenomics, shedding light on its benefits over traditional genetics. Key topics include background motivations, practical analyses, and future developments in next-generation sequencing (NGS). Real-world examples and case studies demonstrate the application of genomic technologies. The session addresses current challenges and outlines innovative approaches on the horizon, making it essential for those interested in ecology, evolution, and genomic advancements.
High-throughput comparative genomics 24th October 2013 Joe Parker, Queen Mary University of London
Topics • Introduction • Background: why phylogenomics? • Examples • Practice • Case study • On the horizon • Over the horizon
Aims • Context of phylogenomics: Next-generation sequencing (NGS) • Why phylogenomics? • Practical analyses • Future developments
Lab Interests • Ecology and evolution of traits • Echolocation, sociality • NGS data for population genetics and phylogenomics
Activities • Phylogeny estimation/comparison • Molecular correlates of evolution: site substitutions, dN/dS, composition • Simulation • Dataset limitations • (R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey
Why phylogenomics, not -genetics? • Causes of discordant signal • Incomplete lineage sorting • Lateral transfer • Recombination • Introgression
Quantitative biology • Multiple configurations • Hyperparameters empirically investigated • Determine sensitivity of results
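A minimal sketch of what investigating hyperparameters empirically could look like: the same toy analysis is run under every combination of settings and the spread of results is reported. The function names, the two hyperparameters (a score threshold and a minimum taxon count) and all values are hypothetical and not taken from the talk.

```python
from itertools import product

def count_supported_loci(scores, taxa_per_locus, threshold, min_taxa):
    """Toy analysis: count loci whose score passes `threshold` and that
    have at least `min_taxa` taxa sampled."""
    return sum(
        1
        for score, n_taxa in zip(scores, taxa_per_locus)
        if score >= threshold and n_taxa >= min_taxa
    )

def sensitivity_sweep(scores, taxa_per_locus, thresholds, min_taxa_options):
    """Run the analysis under every configuration and collect the results."""
    return {
        (threshold, min_taxa): count_supported_loci(
            scores, taxa_per_locus, threshold, min_taxa
        )
        for threshold, min_taxa in product(thresholds, min_taxa_options)
    }

# Made-up per-locus values, for illustration only.
scores = [0.2, 0.9, 1.5, 3.1, 0.05, 2.2]
taxa_per_locus = [22, 18, 22, 10, 22, 20]

results = sensitivity_sweep(scores, taxa_per_locus, [0.5, 1.0, 2.0], [12, 20])
for config, n_loci in sorted(results.items()):
    print(config, n_loci)

# A large spread means the conclusion depends strongly on the settings.
spread = max(results.values()) - min(results.values())
print("spread across configurations:", spread)
```

If the headline result changes little across the grid, it is robust to those choices; if it changes a lot, the settings themselves need to be reported and justified.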
Distributions • Genome-scale data provides context • Identify outliers Genes / taxa / trees • Compare values across biological systems
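One way to identify outliers against a genome-wide distribution is a robust z-score based on the median and the median absolute deviation (MAD). This is a common general-purpose approach, shown here as an assumption and not necessarily the method used in the talk. Gene names, values and the 3.5 cutoff are made up for illustration.

```python
from statistics import median

def robust_z_scores(values):
    """Robust z-scores based on the median and median absolute deviation."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    # 1.4826 rescales the MAD to match the standard deviation
    # when the data are normally distributed.
    return [(v - centre) / (1.4826 * mad) for v in values]

def find_outliers(named_values, cutoff=3.5):
    """Return the names whose robust z-score exceeds `cutoff` in magnitude."""
    names = list(named_values)
    z_scores = robust_z_scores([named_values[name] for name in names])
    return {name: z for name, z in zip(names, z_scores) if abs(z) > cutoff}

# Hypothetical per-gene statistic (for example dN/dS); values are invented.
per_gene = {
    "geneA": 0.10, "geneB": 0.12, "geneC": 0.09, "geneD": 0.11,
    "geneE": 0.95, "geneF": 0.13, "geneG": 0.08,
}

outliers = find_outliers(per_gene)
print(sorted(outliers))
```

The same function works whether the units are genes, taxa or trees, as long as each has a single numeric summary.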
Integration with ‘Omics • Multiple databases • Functional data • Bibliographic information
Source material • Samples • Storage • Purification • Library prep
Sequencing • Genome • Sanger • Illumina • Pyro /454 • SOLiD • PacBio • Transcriptome / RNA-seq • MyBAITS • HiSeq / MiSeq • IonTorrent
Infrastructure • Desktop machines • Computing clusters • Grid systems • Cloud-based computation
Assembly, Annotation • Assembly • To reference (mapping) • De novo • Annotation • By homology • De novo • SOAPdenovo • MAKER • Velvet • Bowtie / Cufflinks / Tophat • Trinity
Alignment • PRANK • MUSCLE • MAFFT • Clustal
Phylogeny inference • MrBayes • RAxML • BEAST • MP-EST • STAR
Phylogenetic analysis • BEAST • HYPHY • PAML • Pipelines • LRT
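The LRT (likelihood ratio test) compares two nested models using their maximised log-likelihoods, such as those reported by PAML or HYPHY. The sketch below assumes the models differ by exactly one free parameter and that the test statistic follows a chi-square distribution with one degree of freedom under the null. Other degrees of freedom, or tests with parameters on a boundary, need a different reference distribution. The function name and the log-likelihood values are illustrative.

```python
import math

def likelihood_ratio_test(lnl_null, lnl_alt):
    """LRT for nested models that differ by one free parameter.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its p-value
    under a chi-square distribution with 1 degree of freedom.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic < 0:
        # The richer model should never fit worse; a negative value usually
        # points to an optimisation problem, so clamp it to zero.
        statistic = 0.0
    # Chi-square (1 d.f.) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Invented log-likelihoods for a null and an alternative model.
statistic, p_value = likelihood_ratio_test(-1000.0, -998.0795)
print(f"2*delta(lnL) = {statistic:.3f}, p = {p_value:.3f}")
```

When thousands of loci are tested this way, the resulting p-values also need a multiple-testing correction.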
Parker et al. (2013) • De novo genomes: • four taxa • 2,321 protein-coding loci • 801,301 codons • Published: • 18 genomes • ~69,000 simulated datasets • ~3,500 cluster cores
[Figure: three panels, labelled mean = 0.05, mean = -0.01 and mean = -0.08]
Development cycle • Design • Wireframe & specify tests • Implement • Review, refine & refactor • Alignment: loadSequences(), getSubstitutions() • Phylogeny: trimTaxa(), getMRCA() • DataSeries: calculateECDF(), randomise() • Regression: getResiduals(), predictInterval()
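A sketch of the "specify tests, then implement" step for one of the classes named on the slide. Only the class name DataSeries and the method names calculateECDF() and randomise() come from the slide. The Python rendering, the signatures and the behaviour are assumptions made for illustration.

```python
import random
from bisect import bisect_right

class DataSeries:
    """Hypothetical rendering of the slide's DataSeries class."""

    def __init__(self, values):
        self.values = list(values)

    def calculate_ecdf(self, x):
        """Empirical CDF: fraction of observations less than or equal to x."""
        if not self.values:
            raise ValueError("cannot compute an ECDF for an empty series")
        ordered = sorted(self.values)
        return bisect_right(ordered, x) / len(ordered)

    def randomise(self, seed=None):
        """Return a new DataSeries with the same values in shuffled order."""
        rng = random.Random(seed)
        shuffled = self.values[:]
        rng.shuffle(shuffled)
        return DataSeries(shuffled)

# Tests written from the specification, before trusting the implementation.
def test_ecdf_is_a_proportion():
    series = DataSeries([3, 1, 2, 4])
    assert series.calculate_ecdf(0) == 0.0
    assert series.calculate_ecdf(2) == 0.5
    assert series.calculate_ecdf(4) == 1.0

def test_randomise_preserves_values():
    series = DataSeries([3, 1, 2, 4])
    assert sorted(series.randomise(seed=1).values) == [1, 2, 3, 4]
    assert series.values == [3, 1, 2, 4]  # the original is left untouched

test_ecdf_is_a_proportion()
test_randomise_preserves_values()
print("all tests passed")
```

Keeping the tests alongside the class makes the "review, refine & refactor" step safer, because any change to the implementation is checked against the same specification.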
Models of computation • Cloud resources: Unlimited flexibility, finite time • Development trade-off • Off-the-shelf • Bespoke • Exploratory work • Real time genomic transects? • Essential fundamental data missing from nearly every system; • Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer
Serialisation • Process data remotely • Freeze-dry objects, download to desktop • Implement new methods directly on previously-analysed data
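A minimal sketch of the freeze-dry idea, using Python's standard pickle module as a stand-in for whatever serialisation mechanism the original pipeline used. The record layout, file name and function names are hypothetical. Here the "remote" and "desktop" steps run in one process, with a temporary directory standing in for the download.

```python
import pickle
import tempfile
from pathlib import Path

def freeze(obj, path):
    """Serialise an analysed object to disk (the 'freeze-dry' step)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)

def thaw(path):
    """Restore a previously serialised object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

def total_lnl(record):
    """A 'new method' written after the analysis was run."""
    return sum(record["site_lnl"])

# Invented result of an expensive remote analysis.
analysed = {"name": "geneA", "site_lnl": [-1.5, -2.0, -0.5]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "geneA.pkl"
    freeze(analysed, path)   # done on the cluster
    restored = thaw(path)    # done on the desktop after download

print(restored["name"], total_lnl(restored))
```

Pickle can also store instances of custom classes, provided the class definition is importable where the file is loaded. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.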
Over the horizon • Real-time phylogenetics • Field phylogenetics • Alignment-free analyses
Conclusions • Why phylogenomics? • Practice • Comparative approach • Statistical context
Thanks • Steve Rossiter (1), James Cotton (2), Elia Stupka (3) & Georgia Tsagkogeorga (1) • (1) School of Biological and Chemical Sciences, Queen Mary, University of London • (2) Wellcome Trust Sanger Institute • (3) Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan • Chris Walker & Dan Traynor, Queen Mary GridPP High-throughput Cluster • Chaz Mein & Anna Terry, Barts and The London Genome Centre • Mahesh Pancholi, School of Biological and Chemical Sciences • BBSRC (UK); Queen Mary, University of London
Resources • My email: Joe Parker (Queen Mary University of London): j.d.parker@qmul.ac.uk • Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511 • Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press. • Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130 • Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-1050. doi:10.1093/molbev/mst033 • Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476-482. doi:10.1038/nature10530 • Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009 • The Tree Of Life: http://phylogenomics.blogspot.co.uk/ • RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html • Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/ • OpenHelix: http://blog.openhelix.eu/ • Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)