Phylogenetic Analysis

doane

Presentation Transcript


  1. Phylogenetic Analysis

  2. Phylogenetic Analysis Overview • Gives insight into evolutionary relationships • Infers or estimates these relationships, shown as the branches of a tree • Branch length and nesting reflect the degree of similarity between any two items (in our case, sequences)

  3. Phylogenetics and Cladistics • Clade = a set of descendants from a single ancestor (Greek word for branch) • Three basic assumptions • Any group of organisms is related by descent from a common ancestor • There is a bifurcating pattern of cladogenesis • Change in characteristics occurs in lineages over time

  4. More default assumptions • Correct sequences and origins • Shared ancestral origin • Homologous sequences • No mixtures of nuclear and organellar sequences • Sufficiently large taxon sample • Sample contains representative sequence variation • Sufficient sequence variation

  5. Basic Terminology • Clade: a group of organisms or genes that includes the most recent common ancestor of all of its members and all of the descendants of that most recent common ancestor • Taxa (singular: taxon): any named groups of organisms; not necessarily clades • Branches: branch lengths sometimes correspond to the degree of divergence • Node: a bifurcating branch point • [Figure: two example trees, one in which branch lengths are not significant and one in which branch lengths are significant]

  6. Basic Definitions • Homologous: sequences that share a common ancestor; in practice, homology is inferred when their similarity, determined by aligning matching bases, exceeds an arbitrary threshold • Similarity: a quantifiable measure of how alike sequences are; it does not necessarily reflect ancestry • Orthologs: homologs produced by speciation; derived from a common ancestor; tend to have similar functions • Paralogs: homologs produced by gene duplication within an organism; tend to have differing functions • Xenologs: homologs resulting from horizontal gene transfer between two organisms; difficult to verify; function varies but tends to be similar

  7. Phylogenetic Analysis Overview • Objective: • determine branch lengths and how the tree should be drawn • The most closely related sequences are drawn as neighboring branches

  8. Phylogenetic Analysis Overview • Depends on a good multiple sequence alignment • Groups sequences with similar patterns of substitutions in order to reconstruct a phylogenetic tree

  9. Phylogenetic Analysis Overview • Consider two related sequences • The ancestral sequence can be (partially) inferred • Additional sequences provide more information for a correct inference

  10. Phylogenetic Analysis Overview • Example: C-Terminal Motor Kinesin sequences • http://www.proweb.org/kinesin/BE4_Cterm.html

  11. Practical use of phylogenetic analysis • To prioritize the analysis of genes in the target family; gives insight into protein functions

  12. P. aeruginosa, a bacterium that is one of the top three causes of opportunistic infections, is noted for its resistance to antimicrobials and to detergents. • Three homologous outer membrane proteins, OprJ, OprM and OprN, were identified as playing a role in this antimicrobial resistance.

  13. Possible horizontal gene transfer • Figure 14.2: Example of a phylogenetic tree based on genes that does not match the organismal phylogeny, suggesting that horizontal gene transfer has occurred.

  14. Uses of Phylogenetic Analysis • Given a set of genes, determine which genes are likely to have equivalent functions • Follow changes occurring in a rapidly changing species such as a virus • Example: influenza • Study of rapidly changing genes • Next year’s strain can be predicted • Flu vaccination can be developed

  15. UCMP Glossary: Phylogenetics

  16. Tree of Life • Phylogenetics studies how species have evolved • Image: http://microbialgenome.org/primer/tree.html

  17. Tree of Life • Traditionally, morphological characters (visible features) have been used to classify organisms • Living organisms • Fossil records • Sequence data are beginning to take a larger role

  18. Tree of Life • Many different resources including: • NCBI taxonomy web sites • University of Arizona’s tree of life project

  19. NCBI Taxonomy Web Site • http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/ (taxonomy: classification; the science of classification)

  20. Tree of Life • http://tolweb.org/tree/phylogeny.html

  21. Evolutionary Trees • Two-dimensional graph showing evolutionary relationships among a set of items • Items can be organisms, genes, or sequences • Each unit is defined by a distinct branch on the tree

  22. Evolutionary Trees • Leaves represent the units (taxa) being studied • Nodes and branches represent the relationships among the taxa • Two taxa derived from the same common ancestor share a node in the graph

  23. Evolutionary Trees • The length of each branch may be drawn according to the number of sequence-level changes that occurred • This distance may not be directly proportional to evolutionary time • Analyses that assume a uniform rate of mutation rely on the molecular clock hypothesis

  24. Rooted Trees • One sequence (the root) is defined to be the common ancestor of all the other sequences • A unique path leads from the root node to any other node • The direction of the path indicates evolutionary time • The root is chosen as a sequence thought to have branched off earliest

  25. Rooted Trees • If the molecular clock hypothesis holds, it is possible to predict a root • As the number of sequences increases, the number of possible rooted trees increases very rapidly • In most cases, a bifurcating (binary) tree is the best model of evolutionary events

  26. Example Rooted Tree • From "Systematics and Molecular Phylogenetics" • Image source: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

  27. Unrooted Tree (Star) • Indicates evolutionary relationships without revealing the location of the oldest ancestor • For the same number of sequences, there are fewer possible unrooted trees than rooted trees
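To give a feel for how fast the number of candidate trees grows (slides 25 and 27), the sketch below uses the standard counts for bifurcating trees with labelled leaves: (2n - 3)!! rooted trees and (2n - 5)!! unrooted trees for n sequences. These formulas are not stated in the slides, and the function names are hypothetical.

```python
def double_factorial(k: int) -> int:
    """k * (k - 2) * (k - 4) * ... down to 1; returns 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def n_rooted_trees(n: int) -> int:
    """Rooted bifurcating trees with n labelled leaves (n >= 2): (2n - 3)!!"""
    return double_factorial(2 * n - 3)

def n_unrooted_trees(n: int) -> int:
    """Unrooted bifurcating trees with n labelled leaves (n >= 3): (2n - 5)!!"""
    return double_factorial(2 * n - 5)

for n in range(3, 11):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

For 10 sequences this gives 2,027,025 unrooted and 34,459,425 rooted trees. The rooted count is larger because each unrooted tree can be rooted on any of its 2n - 3 branches.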

  28. Example Unrooted Tree Image source: http://www.shef.ac.uk/english/language/quantling/images/quantling1.jpg

  29. Image: http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

  30. Methods for Determining Trees • Three main methods: • Maximum parsimony • Distance methods • Maximum likelihood

  31. Maximum Parsimony • Predicts the evolutionary tree that minimizes the number of steps required to generate the observed variation • A multiple sequence alignment must first be obtained

  32. Maximum Parsimony • For each position, the phylogenetic trees requiring the smallest number of evolutionary changes to produce the observed sequence changes are identified • The trees that require the smallest total number of changes over all sequence positions are identified
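The per-position counting step can be sketched with Fitch's small-parsimony algorithm, one standard way (not named in the slides) to find the minimum number of changes a fixed tree needs at one column. This is a minimal sketch assuming a rooted binary tree written as nested tuples of sequence indices; the encoding and function names are hypothetical. The Fitch count does not depend on where the root is placed, so an unrooted tree can be scored by rooting it on any branch.

```python
from typing import Sequence, Union

# Hypothetical tree encoding: a leaf is the index of a sequence,
# an internal node is a (left, right) pair of subtrees.
Tree = Union[int, tuple]

def fitch_site(tree: Tree, column: str) -> tuple[set[str], int]:
    """Fitch's small-parsimony pass for one alignment column.
    Returns the candidate states at this node and the changes counted below it."""
    if isinstance(tree, int):
        return {column[tree]}, 0
    left_states, left_cost = fitch_site(tree[0], column)
    right_states, right_cost = fitch_site(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: keep both options and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree: Tree, sequences: Sequence[str]) -> int:
    """Minimum number of substitutions the tree needs, summed over every column."""
    total = 0
    for pos in range(len(sequences[0])):
        column = "".join(seq[pos] for seq in sequences)
        total += fitch_site(tree, column)[1]
    return total

print(parsimony_score(((0, 1), 2), ["AC", "AT", "GT"]))  # 2
```

Comparing trees then means calling `parsimony_score` for each candidate topology and keeping the lowest score.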

  33. Maximum Parsimony • Time-consuming algorithm • Works well only if the sequences are strongly similar

  34. Maximum Parsimony Example
      1 AAGAGTGCA
      2 AGCCGTGCG
      3 AGATATCCA
      4 AGAGATCCG
      • Four sequences, three possible unrooted trees

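  35. Maximum Parsimony Example • Possible trees: [Figure: the three unrooted trees for four sequences, grouping 1 with 2 and 3 with 4, 1 with 3 and 2 with 4, and 1 with 4 and 2 with 3]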

  36. Maximum Parsimony Example • Some sites are informative and others are not • A site is informative if at least two different characters each occur in at least two of the sequences; such a site favors some trees over others • Only the informative sites need to be considered
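The informative-site rule can be checked mechanically. A minimal sketch, with hypothetical function names, applied to the four sequences from slide 34:

```python
from collections import Counter

def is_informative(column: str) -> bool:
    """True if at least two different characters each occur in at least two sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_columns(sequences: list[str]) -> list[int]:
    """1-based positions of the parsimony-informative columns of an alignment."""
    return [
        pos + 1
        for pos in range(len(sequences[0]))
        if is_informative("".join(seq[pos] for seq in sequences))
    ]

seqs = ["AAGAGTGCA", "AGCCGTGCG", "AGATATCCA", "AGAGATCCG"]
print(informative_columns(seqs))  # [5, 7, 9]
```

A column such as GCAA (position 3) is not informative: only A occurs twice, so every tree explains it with the same two changes.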

  37. Maximum Parsimony Example
      1 AAGAGTGCA
      2 AGCCGTGCG
      3 AGATATCCA
      4 AGAGATCCG
      • Three informative columns (positions 5, 7 and 9)

  38. Maximum Parsimony Example • Informative columns only:
      1 GGA
      2 GGG
      3 ACA
      4 ACG
      • [Figure: Column 1, Column 2 and Column 3 mapped onto each of the three possible trees; each mark on a branch is a substitution]
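The comparison in this example can be reproduced with a short script. It is a sketch under stated assumptions: sequences are numbered 1 to 4, each unrooted tree is written as the two pairs it separates (rooted on its internal branch for scoring), and changes are counted with Fitch's small-parsimony rule, which the slides do not name. Function and variable names are hypothetical.

```python
# Informative columns 5, 7 and 9 of the four-sequence example.
informative = {1: "GGA", 2: "GGG", 3: "ACA", 4: "ACG"}

# The three unrooted trees for four sequences, written as the pairs they separate.
trees = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

def fitch(node, site: dict[int, str]) -> tuple[set[str], int]:
    """Fitch's rule for one site: (candidate states, substitutions below this node)."""
    if isinstance(node, int):
        return {site[node]}, 0
    (left, lcost), (right, rcost) = fitch(node[0], site), fitch(node[1], site)
    if left & right:
        return left & right, lcost + rcost
    return left | right, lcost + rcost + 1

def tree_length(tree) -> int:
    """Total substitutions over all informative columns for one tree."""
    n_cols = len(informative[1])
    return sum(
        fitch(tree, {seq: chars[i] for seq, chars in informative.items()})[1]
        for i in range(n_cols)
    )

for tree in trees:
    print(tree, tree_length(tree))
```

It reports 4, 5 and 6 changes respectively, so the tree that groups sequence 1 with 2 and 3 with 4 is the most parsimonious. Uninformative columns add the same number of changes to every tree, which is why they can be left out.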

  39. Distance Method • Looks at the number of changes between each pair in a group of sequences • The goal is to identify a tree that positions neighbors correctly and whose branch lengths reproduce the original data as closely as possible

  40. Distance Method • CLUSTALW uses a neighbor-joining tree as the guide tree for multiple sequence alignment • The PHYLIP suite of programs includes neighbor-joining methods • http://evolution.genetics.washington.edu/phylip.html

  41. Distance Programs in Phylip • NEIGHBOR: estimates phylogenies using either: • neighbor-joining (no molecular clock assumed) • unweighted pair group method with arithmetic mean (UPGMA) (molecular clock assumed)
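To see where the molecular clock assumption enters UPGMA, here is a minimal sketch; it is not PHYLIP's NEIGHBOR code. It repeatedly merges the closest pair of clusters, sets the new cluster's distance to every other cluster as the size-weighted average of its members' distances, and places each merge at half the merge distance. The input is the pairwise mismatch counts for the alignment on slide 43; all names are hypothetical.

```python
from itertools import combinations

def upgma(dist: dict, names: list):
    """Minimal UPGMA sketch. dist maps frozenset({x, y}) -> distance.
    Returns the tree as nested tuples and the height of every merged cluster."""
    sizes = {name: 1 for name in names}  # cluster -> number of sequences in it
    d = dict(dist)
    heights = {}
    while len(sizes) > 1:
        # Merge the two closest clusters (the first pair found wins ties).
        x, y = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        # Molecular clock: the merge sits halfway along the x-y distance.
        heights[merged] = d[frozenset((x, y))] / 2
        for k in sizes:
            if k not in (x, y):
                # Size-weighted average of the members' distances to k.
                d[frozenset((merged, k))] = (
                    sizes[x] * d[frozenset((x, k))] + sizes[y] * d[frozenset((y, k))]
                ) / (sizes[x] + sizes[y])
        sizes[merged] = sizes.pop(x) + sizes.pop(y)
    (tree,) = sizes
    return tree, heights

table = {("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8,
         ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3}
dist = {frozenset(pair): value for pair, value in table.items()}
tree, heights = upgma(dist, ["A", "B", "C", "D"])
print(tree, heights[tree])
```

On these distances UPGMA joins A with B and C with D (each at height 1.5), then joins the two pairs at height 3.5, so every sequence ends up 3.5 units from the root: the clock assumption forces equal root-to-tip distances.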

  42. Distance Analysis • The distance score is counted as either • the number of mismatched positions in the alignment, or • the number of sequence positions that must be changed to generate the second sequence • Success depends on the degree to which the distances among a set of sequences can be made additive on a predicted evolutionary tree

  43. Example of Distance Analysis • Consider the alignment:
      A ACGCGTTGGGCGATGGCAAC
      B ACGCGTTGGGCGACGGTAAT
      C ACGCATTGAATGATGATAAT
      D ACACATTGAGTGATAATAAT

  44. Example of Distance Analysis • Distances (number of mismatched positions between each pair of sequences in the alignment) can be shown as a table:
           A   B   C   D
      A    -   3   7   8
      B        -   6   7
      C            -   3
      D                -
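The table can be computed directly from the alignment on slide 43 by counting mismatched positions for every pair, the first of the two distance scores listed on slide 42. A minimal sketch; the helper name is hypothetical.

```python
from itertools import combinations

# Alignment from slide 43.
alignment = {
    "A": "ACGCGTTGGGCGATGGCAAC",
    "B": "ACGCGTTGGGCGACGGTAAT",
    "C": "ACGCATTGAATGATGATAAT",
    "D": "ACACATTGAGTGATAATAAT",
}

def mismatches(s: str, t: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))

distances = {
    (x, y): mismatches(alignment[x], alignment[y])
    for x, y in combinations(sorted(alignment), 2)
}
for (x, y), value in distances.items():
    print(f"{x}-{y}: {value}")
```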

  45. Example of Distance Analysis • Using this information, a tree can be drawn: [Figure: unrooted tree joining A and B on one side and C and D on the other; branch lengths A = 2, B = 1, C = 1, D = 2, internal branch = 4]
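The slides do not say how the tree was obtained from the distances. One way to check it is neighbor-joining (slides 40 and 41), sketched minimally below; it is not PHYLIP's implementation. Assumptions: distances are keyed by unordered pairs, there are at least three sequences, internal nodes get the hypothetical names U1, U2, ..., and ties in the Q criterion go to the first pair found. The data are the mismatch counts for the alignment on slide 43.

```python
from itertools import combinations

def neighbor_joining(names: list, dist: dict) -> dict:
    """Minimal neighbor-joining sketch. dist maps frozenset({x, y}) -> distance;
    returns frozenset({x, y}) -> branch length."""
    nodes = list(names)
    d = dict(dist)
    edges = {}
    count = 0
    while len(nodes) > 3:
        n = len(nodes)
        # r[i]: total distance from i to every other current node.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair with the smallest Q = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        count += 1
        u = f"U{count}"
        edges[frozenset((i, u))] = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        edges[frozenset((j, u))] = dij - edges[frozenset((i, u))]
        # Distance from the new node u to each remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: join them at one centre by solving the three pairwise sums.
    x, y, z = nodes
    dxy, dxz, dyz = d[frozenset((x, y))], d[frozenset((x, z))], d[frozenset((y, z))]
    centre = f"U{count + 1}"
    edges[frozenset((x, centre))] = (dxy + dxz - dyz) / 2
    edges[frozenset((y, centre))] = (dxy + dyz - dxz) / 2
    edges[frozenset((z, centre))] = (dxz + dyz - dxy) / 2
    return edges

table = {("A", "B"): 3, ("A", "C"): 7, ("A", "D"): 8,
         ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3}
dist = {frozenset(pair): value for pair, value in table.items()}
edges = neighbor_joining(["A", "B", "C", "D"], dist)
for edge, length in edges.items():
    print(sorted(edge), length)
```

On these distances it returns A = 2, B = 1, C = 1, D = 2 and an internal branch of 4. Every path length through that tree equals the corresponding distance (for example A to C is 2 + 4 + 1 = 7), because these distances happen to be exactly additive.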

  46. Fitch and Margoliash Algorithm (3 sequences) • Uses the distance table • Sequences are combined in groups of three to • define the branches of the predicted tree • calculate the branch lengths of the tree

  47. Fitch and Margoliash Algorithm (3 sequences) • 1) Draw an unrooted tree with three branches originating from a common node: [Figure: A, B and C joined to one node by branches a, b and c]

  48. Fitch and Margoliash Algorithm (3 sequences) • 2) Calculate the lengths of the tree branches algebraically:
      distance from A to B = a + b = 22   (1)
      distance from A to C = a + c = 39   (2)
      distance from B to C = b + c = 41   (3)
      subtracting (2) from (3): (b + c) - (a + c) = 41 - 39, so b - a = 2   (4)
      adding (1) and (4): 2b = 24, so b = 12
      from (1): a + 12 = 22, so a = 10
      from (2): 10 + c = 39, so c = 29

  49. Fitch and Margoliash Algorithm (3 sequences) • 3) Resulting tree: [Figure: A, B and C joined at one node with branch lengths a = 10, b = 12, c = 29]
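Solving the three equations in general gives a closed form for each branch: a = (dAB + dAC - dBC) / 2, and likewise for b and c. A minimal sketch; the function name is hypothetical.

```python
def three_taxon_branches(d_ab: float, d_ac: float, d_bc: float) -> tuple[float, float, float]:
    """Branch lengths a, b, c of the three-branch tree joining A, B and C,
    solving a + b = d_ab, a + c = d_ac, b + c = d_bc."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c

print(three_taxon_branches(22, 39, 41))  # (10.0, 12.0, 29.0)
```

Each formula adds the two distances that include the branch and subtracts the one that does not, which is the same elimination carried out step by step on slide 48.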

  50. Fitch and Margoliash Algorithm (5 sequences) • [Figure: unrooted tree for A, B, C, D and E with terminal branches a, b, c, d, e and internal branches f and g] • The algorithm can be extended to more sequences. Consider the distances:
