Fitch WM. Trends Genet. 2000 May;16(5):227-31.
Homology: Orthology, Paralogy, Xenology
What is the relationship of proteins included in a Pfam family?
• Homologs, Orthologs, Paralogs, Xenologs?
• Functional equivalents?
• Varies from family to family
Analogy vs Homology
Analogy: The relationship of any two characters that have descended convergently from unrelated ancestors.
Homology: The relationship of any two characters that have descended, usually with divergence, from a common ancestral character.
Orthology: The relationship of any two homologous characters whose common ancestor lies in the cenancestor of the taxa from which the two sequences were obtained.
Paralogy: The relationship of any two homologous characters arising from a duplication of the gene for that character.
Xenology: The relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters.
Test Yourself
• A1 – B1
• A1 – B2
• A1 – C3
• B1 – C2
• C2 – C3
• B2 – C3
• C3 – AB1
On a genome scale
• Compare two or more genomes and infer:
  • Which genes were present in the cenancestor?
  • Which genes are carrying out the same or related functions?
  • How and when did phenotypes associated with particular genes arise?
* Homology applies to groups of characters larger and smaller than genes.
Predicting Orthologs from Genome Sequences
• BLASTP reciprocal best hits
  • Two proteins from different genomes that are each other's best BLASTP match
• Evolutionary distances
• Phylogenetic trees
• Genome contexts
Reciprocal Best Hits (SymBets)
Consider genomes Y and Z with genes 1–100. Do two searches:
BLASTP Y1-100 vs. Z1-100
BLASTP Z1-100 vs. Y1-100
Y1 matches Z40 best
Z40 matches Y1 best
Y1 and Z40 are orthologs.
Reciprocal Best Hits (SymBets)
Consider genomes Y and Z with genes 1–100. Do two searches:
BLASTP Y1-100 vs. Z1-100
BLASTP Z1-100 vs. Y1-100
Y1 matches Z40 best
Y6 matches Z40 best
Z40 matches Y6 best
Y6 and Z40 are orthologs. Y1 and Z40 are paralogs?
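The two-search procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: it assumes the BLASTP results have already been reduced to (query, subject, bit score) tuples, and the function names and the scores in the toy data are hypothetical. The toy data mirror the case where Y1 and Y6 both match Z40 best, but Z40 matches only Y6 best.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(y_vs_z, z_vs_y):
    """Return sorted (y, z) pairs that are each other's best hit (SymBets)."""
    forward = best_hits(y_vs_z)   # best Z match for every Y protein
    reverse = best_hits(z_vs_y)   # best Y match for every Z protein
    return sorted((y, z) for y, z in forward.items() if reverse.get(z) == y)


# Hypothetical scores: Y1 and Y6 both hit Z40 best, Z40 hits Y6 best.
y_vs_z = [
    ("Y1", "Z40", 250.0),
    ("Y1", "Z7", 80.0),
    ("Y6", "Z40", 310.0),
    ("Y6", "Z12", 60.0),
]
z_vs_y = [
    ("Z40", "Y6", 305.0),
    ("Z40", "Y1", 245.0),
    ("Z7", "Y1", 78.0),
]

symbets = reciprocal_best_hits(y_vs_z, z_vs_y)
print(symbets)
```

Only the Y6/Z40 pair is reported; Y1 is left without a SymBet partner, which is the situation in which Y1 and Z40 might be paralogs.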
Reciprocal Best Hits (SymBets)
Consider genomes Y and Z with genes 1–100. Do two searches:
BLASTP Y1-100 vs. Z1-100
BLASTP Z1-100 vs. Y1-100
Y1 matches Z40 best
Z40 matches Y1 best
Y1 and Z40 are orthologs. Or not.
Here B1 and C2 are paralogs and SymBets.
• Here B1 and C2 are paralogs and SymBets.
• Here AB1 is a SymBet xenolog of both B1 and C1. We might notice that AB1 is too similar to B1 and C1.
• Here we might conclude that either (both or neither) C2 or C3 is an ortholog of B1, but both are paralogs. As long as we allow multiple “best” hits, we should notice there are two homologs in C.
So?
• SymBets are imperfect
• Considering evolutionary distance is helpful
• Having a tree is helpful
• Knowing about deleted homologs is helpful
Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003 Sep;13(9):2178-89.
Evaluating performance
• No “gold standard” set of true orthologs
• Latent Class Analysis
• Agreement between methods provides confidence
• 27,562 proteins from 6 eukaryotes assigned to Pfam families
• Correct for dependencies
Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE. 2007 Apr 18;2(4):e383.
Our latest strategy for enterobacteria
• Empirically determine the range of expected scores, E-values, and percent identities for a given comparison to establish thresholds.
• Find “uncontested” BLASTP SymBets (only one best or equivalent match) that are close to full length (typically >60% aligned) and meet the threshold. Call these “Potential Orthologs”.
• Find “contested” candidates (more than one best or equivalent match) that otherwise meet the thresholds. Call these “Homologs”.
• Independently predict which genes are aligned using Mauve. Compare the BLASTP and Mauve results. Call pairs present in both sets “Approved Orthologs”. Call pairs in one set but not the other “Predicted Orthologs”.
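One way to read the classification steps above is sketched below in Python. It is an illustration under stated assumptions, not the presenters' pipeline: the `Hit` record, the function names, the 5% bit-score tolerance used to define an "equivalent" match, and the 70% identity cutoff are all hypothetical, and the Mauve result is represented simply as a set of gene pairs parsed elsewhere. Only the >60% aligned cutoff comes from the slide.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    bitscore: float
    pct_identity: float
    aligned_fraction: float  # fraction of the query covered by the alignment


def co_best(hits, tolerance=0.05):
    """Map each query to all subjects scoring within `tolerance` of its best hit."""
    top = {}
    for h in hits:
        top[h.query] = max(top.get(h.query, 0.0), h.bitscore)
    result = {}
    for h in hits:
        if h.bitscore >= top[h.query] * (1.0 - tolerance):
            result.setdefault(h.query, set()).add(h.subject)
    return result


def passes(h, min_identity, min_aligned):
    """Apply the (assumed) identity threshold and the >60% aligned rule."""
    return h.pct_identity >= min_identity and h.aligned_fraction > min_aligned


def classify(y_vs_z, z_vs_y, mauve_pairs, min_identity=70.0, min_aligned=0.60):
    fwd, rev = co_best(y_vs_z), co_best(z_vs_y)
    back = {(h.query, h.subject): h for h in z_vs_y}
    potential, homologs = set(), set()
    for h in y_vs_z:
        y, z = h.query, h.subject
        # Keep only pairs that are best (or equivalent) matches in both directions.
        if z not in fwd.get(y, ()) or y not in rev.get(z, ()):
            continue
        r = back[(z, y)]
        if not (passes(h, min_identity, min_aligned)
                and passes(r, min_identity, min_aligned)):
            continue
        if len(fwd[y]) == 1 and len(rev[z]) == 1:
            potential.add((y, z))   # uncontested SymBet
        else:
            homologs.add((y, z))    # contested: more than one equivalent match
    return {
        "homologs": homologs,
        "approved": potential & mauve_pairs,    # present in both sets
        "predicted": potential ^ mauve_pairs,   # present in exactly one set
    }


# Hypothetical toy data.
y_vs_z = [
    Hit("Y1", "Z1", 500, 95, 0.98),
    Hit("Y2", "Z2", 400, 90, 0.95),
    Hit("Y2", "Z3", 395, 89, 0.94),   # near-tie with Z2, so Y2 is contested
    Hit("Y4", "Z4", 300, 85, 0.90),
    Hit("Y5", "Z5", 200, 80, 0.40),   # alignment too short
]
z_vs_y = [
    Hit("Z1", "Y1", 498, 95, 0.97),
    Hit("Z2", "Y2", 401, 90, 0.95),
    Hit("Z3", "Y2", 390, 89, 0.94),
    Hit("Z4", "Y4", 299, 85, 0.91),
    Hit("Z5", "Y5", 199, 80, 0.42),
]
mauve_pairs = {("Y1", "Z1"), ("Y6", "Z6")}

result = classify(y_vs_z, z_vs_y, mauve_pairs)
print(result)
```

In this toy run Y1/Z1 is supported by both BLASTP and Mauve, Y4/Z4 and Y6/Z6 are each supported by one method only, and the Y2 pairs are reported as contested homologs. Whether thresholds should be applied before or after picking best hits is a design choice the slide leaves open; the sketch applies them after.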
Example: pectate lyases of soft-rot enterobacteria may be SymBets, but genome context suggests they may not be orthologs.
Discussion
• Should we expect orthologs to be functional equivalents?
• How are paralogs functionally related?
• What impact does “completeness” of the genome have on inference of orthology?