MultiParanoid

Presentation Transcript

  1. MultiParanoid: Automatic Clustering of Orthologs and Inparalogs Shared by Multiple Proteomes. Andrey Alexeyenko, Ivica Tamas, Gang Liu, Erik L.L. Sonnhammer. Stockholm

  2. Homologs: orthologs and paralogs
     • Homologs: genes that have descended from a common ancestral gene. Homology is manifested by sequence similarity (for example, a BLAST hit with a low E-value between Gene 1 and Gene 2). We do not believe in sequence similarity without a shared ancestry.
     • Orthologs are related via a speciation (S).
     • Paralogs are related via a gene duplication (D). They may or may not be in the same species.

  3. Homologs: orthologs and paralogs. Reference: Sonnhammer ELL and Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends in Genetics, Volume 18, Issue 12, 1 December 2002, pages 619-620.
     • Inparalogs ~ co-orthologs: paralogs that were duplicated after the speciation and hence are orthologs to the other species' genes
     • Outparalogs = not co-orthologs: paralogs that were duplicated before the speciation
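
The two definitions on this slide reduce to one comparison: did the duplication happen after or before the speciation? A minimal sketch, assuming each event is given a hypothetical age in "time before present" (the function name and the age values are illustrative, not from the talk):

```python
def classify_paralogs(duplication_age: float, speciation_age: float) -> str:
    """Classify a paralog pair relative to one speciation event.

    Ages are hypothetical "time before present" values (larger = older).
    """
    if duplication_age == speciation_age:
        # The slide does not define this case, so refuse to guess.
        raise ValueError("ordering of duplication and speciation is ambiguous")
    if duplication_age < speciation_age:
        return "inparalogs"   # duplicated after the speciation
    return "outparalogs"      # duplicated before the speciation


print(classify_paralogs(duplication_age=10, speciation_age=50))
print(classify_paralogs(duplication_age=80, speciation_age=50))
```

The same pair of paralogs can therefore be inparalogs with respect to one speciation and outparalogs with respect to a more recent one.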

  4. Orthologs for functional genomics
     • Orthologs are more likely than outparalogs to have identical or similar biochemical functions and biological roles
     • Orthologs are optimal for discovering gene function via model organism counterparts
     • Hulsen T, Huynen MA, de Vlieg J, Groenen PM. Benchmarking ortholog identification methods using functional genomics data. Genome Biol. 2006;7(4):R31. Epub 2006 Apr 13: “…the InParanoid program is the best ortholog identification method in terms of identifying functionally equivalent proteins.”

  5. Outline • InParanoid • The world of ortholog resources • Why MultiParanoid • Limitations • Future development

  6. Homologs: orthologs and paralogs. [Tree diagram with speciation (S) and duplication (D) nodes, marking orthologs, inparalogs and outparalogs.]

  7. InParanoid. [Diagram: Proteome A and Proteome B; reciprocally best hits ~ seed orthologs; inparalogs.] Reference: Remm M, Storm CEV, Sonnhammer ELL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314(5), 14 December 2001, pages 1041-1052.
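
The seed-ortholog step named on this slide (reciprocally best hits between two proteomes) can be sketched as follows. This is only an illustration of the reciprocal-best-hit idea, not the InParanoid implementation: the score dictionaries, protein IDs and function names are hypothetical, and the later step of adding inparalogs around each seed pair is not shown.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    `scores` is a hypothetical dict {(query, target): similarity_score}.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores for proteome A against B and B against A (made-up values).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 198, ("b1", "a3"): 120, ("b2", "a1"): 150, ("b2", "a2"): 95}

seed_orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(seed_orthologs)
```

In the toy data only a1 and b1 pick each other: a2's best hit is b2, but b2 scores higher against a1, so that pair is not reciprocal.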

  8. Resources using InParanoid. [Figure labels: Eukaryotic Ortholog Groups; 3409 diseases.]

  9. Multi-species ortholog resources. Two groups: “massive download” friendly resources, and tree-based resources, which are best for detailed analysis. [Figure label: HOVERGEN release 47.]

  10. Any cluster of more than two species’ genes is controversial in terms of orthology, because a speciation gives rise to a pair of species. [Tree diagram with S and D nodes.]

  11. MultiParanoid algorithm
     1. Take more than two species with maximally close speciation points
     2. Generate two-species InParanoid clusters: A-B, B-C, A-C
     3. Find protein counterparts across the clusters
     [Diagram: InParanoid clusters A-B, B-C and A-C (?) combined into a MultiParanoid cluster.]
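
Step 3 above ("find protein counterparts across the clusters") can be pictured as joining pairwise clusters that share a protein. The sketch below uses a plain overlap-based merge as a stand-in. It is an assumption for illustration, not the published MultiParanoid procedure, which has to deal with conflicting memberships that this code ignores. Protein IDs such as "A:g1" are hypothetical.

```python
def merge_pairwise_clusters(pairwise_clusters):
    """Merge clusters that share at least one protein.

    `pairwise_clusters` is an iterable of sets of protein IDs, pooled
    from the A-B, B-C and A-C InParanoid runs.
    """
    merged = []  # list of mutually disjoint sets
    for cluster in pairwise_clusters:
        cluster = set(cluster)
        overlapping = [m for m in merged if m & cluster]
        for m in overlapping:
            cluster |= m       # absorb every earlier cluster it touches
            merged.remove(m)
        merged.append(cluster)
    return merged


clusters_ab = [{"A:g1", "B:g1"}, {"A:g9", "B:g9"}]
clusters_bc = [{"B:g1", "C:g1", "C:g2"}]
clusters_ac = [{"A:g1", "C:g1"}]

multi = merge_pairwise_clusters(clusters_ab + clusters_bc + clusters_ac)
for members in multi:
    print(sorted(members))
```

Here the A-B, B-C and A-C clusters around g1 collapse into one three-species cluster, while the g9 pair stays a two-species cluster because no B-C or A-C cluster contains either member.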

  12. MultiParanoid validation. [Figure: genes from fly, worm and human.]
     • The MultiParanoid output was benchmarked on a manually curated set of 221 human-fly-worm clusters:
     • - 214 MultiParanoid clusters found
     • - 177 (almost) identical
     • The rest were controversial, mainly due to:
     • differences between pairwise and multiple alignments
     • the curator’s perception and InParanoid settings
     However: tree conflicts with InParanoid cluster membership
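
The benchmark counts above are easier to compare as percentages. A small worked calculation, assuming that both 214 and 177 are counted out of the 221 curated clusters (the slide does not say so explicitly):

```python
curated = 221      # manually curated human-fly-worm clusters
found = 214        # MultiParanoid clusters found (assumed: out of the 221)
identical = 177    # clusters that were (almost) identical

found_pct = round(100 * found / curated, 1)
identical_pct = round(100 * identical / curated, 1)
identical_of_found_pct = round(100 * identical / found, 1)

print(f"found: {found_pct}% of curated clusters")
print(f"(almost) identical: {identical_pct}% of curated, "
      f"{identical_of_found_pct}% of found")
```

Under that assumption, about 96.8% of the curated clusters were found and about 80.1% were reproduced (almost) exactly.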

  13. MultiParanoid vs. [two resources shown as images; their names are not preserved in the transcript]

  14. Current MultiParanoid release: 40451 protein sequences classified into 7695 clusters. [Species tree labels: H. sapiens, C. elegans, D. melanogaster, C. intestinalis, ???]

  15. A solution: expansion of MultiParanoid clusters
     1. Process all the possible 3-species combinations
     2. Merge respective cluster members across the clades
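
Step 1 of the expansion is a plain enumeration of species triples. A minimal sketch using the standard library; the four species names are borrowed from the release slide purely as an example set, and the merging of step 2 is not shown:

```python
from itertools import combinations

# Example species set (an assumption for illustration).
species = ["H.sapiens", "C.elegans", "D.melanogaster", "C.intestinalis"]

# Every 3-species combination that a MultiParanoid run would process.
triples = list(combinations(species, 3))

for triple in triples:
    # Each triple needs the three pairwise InParanoid comparisons.
    pairs = list(combinations(triple, 2))
    print(triple, "->", pairs)
```

With n species there are n(n-1)(n-2)/6 triples, so the number of runs grows quickly as species are added.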

  16. But still, orthology is a pairwise concept! The speciation gives rise to a pair of species.

  17. How do the ortholog resources cope with it? Post-processing (bootstrap, synteny, tree manual curation, etc.). [Figure labels: HOVERGEN release 47; cluster size ~ outparalogs/orthologs ratio.]

  18. Overview and comparison of ortholog databases. Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) 3(2), 137-143. • EGO • COG/KOG • HomoloGene • InParanoid/MultiParanoid • HOPS • KEGG • OrthoMCL • ENSEMBL Compara • PhiGs • MGD • HOGENOM • HOVERGEN • INVHOGEN • TreeFam • OrthologID

  19. How to reconcile the demand for multi-species clusters and pair-wise gene relations? The common feature is a single ancestor gene at the root point. [Tree diagram with S and D nodes.]

  20. Two new terms:
     • Pseudo-proteome: a union of proteomes of the same clade
     • Cluster of pseudo-inparalogs: a within-clade gene family
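
The pseudo-proteome definition above is a set union over the species of one clade. A minimal sketch, with hypothetical species names and protein IDs; tagging each protein with its species is a design choice made here so that identical IDs from different species stay distinct:

```python
def build_pseudo_proteome(proteomes, clade_species):
    """Union of the proteomes of all species in one clade.

    `proteomes` maps species name -> set of protein IDs (hypothetical data).
    Each member is kept as a (species, protein) pair.
    """
    return {(sp, protein) for sp in clade_species for protein in proteomes[sp]}


proteomes = {
    "mouse": {"m1", "m2"},
    "rat": {"r1"},
    "human": {"h1", "h2"},
    "chimp": {"c1"},
}

rodents = build_pseudo_proteome(proteomes, ["mouse", "rat"])
primates = build_pseudo_proteome(proteomes, ["human", "chimp"])
print(sorted(rodents))
print(sorted(primates))
```

The two pseudo-proteomes can then be compared as if they were two ordinary proteomes, which keeps the comparison pairwise.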

  21. [Diagram: Pseudo-proteome A (reptiles) and Pseudo-proteome B (mammals).]

  22. Another view, “gene-family”-wise: all the members of the same cluster descend from a single gene in the last common ancestor (LCA) of the two major clades. [Tree diagram rooted at the LCA, with S and D nodes.]

  23. The clustering can be done at different levels. For example: Fungi vs. animals; Insects vs. mammals; Rodents vs. primates. [Tree diagram with S and D nodes, marking orthologs.]
     • Having more than one species in a pseudo-proteome reduces mis-assignments in case of gene loss.
     • Closer pseudo-proteomes increase resolution.
     • Lineage (~pseudo-proteome)-specific expansions should also be available

  24. Conclusions
     • Most of the ortholog resources may build clusters in the form of gene trees, but only InParanoid seems to correctly delineate ortholog/inparalog groups
     • The MultiParanoid algorithm has relieved the problem of “hidden outparalogs”, but the number/content of species remains limited
     • The “LCA-Paranoid” concept: the long-awaited solution?
     • Each of the two clade-specific cluster parts may be regarded as a multi-species cluster
     • When (in the future) all possible “clade<->clade” clustering solutions are found, each gene will receive a complete set of orthologs at a desired level of LCA
     • With a sufficient number of complete proteomes, it would be possible to date each gene pair’s point of divergence