The Genome Access Course Phylogenetic Analysis



Presentation Transcript

  1. The Genome Access Course Phylogenetic Analysis

  2. Phylogenetics • Developed by Willi Hennig (Grundzüge einer Theorie der Phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

  3. What is the ancestral sequence? • pfeffer • pepper • (pf/p)e(ff/pp)er

  4. Evolutionary Trees • A tree is a connected, acyclic 2D graph • Leaf: Taxon • Node: Vertex • Branch: Edge • Tree length = sum of all branch lengths • Phylogenetic trees are binary trees
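
The definition "tree length = sum of all branch lengths" can be illustrated with a minimal sketch. The nested-tuple layout `(name, branch_length_to_parent, children)`, the function name, and the numbers are assumptions made for this example, not part of the course material.

```python
def tree_length(node):
    """Sum the branch lengths of a tree given as (name, length, children)."""
    name, length, children = node
    return length + sum(tree_length(child) for child in children)


# Hypothetical tree: the root has no branch above it, so its length is 0.
example_tree = (
    "root", 0.0, [
        ("n1", 0.1, [("A", 0.2, []), ("B", 0.3, [])]),
        ("C", 0.5, []),
    ],
)

print(tree_length(example_tree))
```

Leaves are nodes with an empty child list, so the same recursion covers taxa and internal vertices.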

  5. A Generic Tree

  6. Evolutionary Trees • Rooted • common ancestor • unique path to any leaf • directed • Unrooted • root could be placed anywhere • fewer possible trees than in the rooted case
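
To make the rooted/unrooted comparison concrete, here is a small sketch that counts strictly binary, leaf-labeled trees using the standard double-factorial formulas: (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa. The function names are chosen for this example only.

```python
def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted binary trees on n_taxa labeled leaves (n_taxa >= 3)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted binary trees on n_taxa labeled leaves (n_taxa >= 2)."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Each unrooted tree on n taxa has 2n-3 branches where a root could be placed, which is why the rooted count is larger by exactly that factor.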

  7. Rooted Tree generated by DRAWGRAM (PHYLIP)

  8. Unrooted Tree generated by DRAWTREE (PHYLIP)

  9. Possible Evolutionary Trees

  10. Genes vs. Species • Sequences show gene relationships, but phylogenetic histories may be different for gene and species • Genes evolve at different speeds • Horizontal gene transfer

  11. Methods for Phylogenetic Analysis • Character-State • Maximum Parsimony • Maximum Likelihood • Genetic Distance • Fitch & Margoliash • Neighbor-Joining • Unweighted Pair Group

  12. Phylogenetic Software • PHYLIP • PAUP (Available in GCG) • TREE-PUZZLE • PhyloBLAST • Felsenstein maintains an extensive list of programs on the PHYLIP site

  13. PHYLIP Programs • dnapars/protpars • dnadist/protdist • dnaml (use fastDNAml instead) • neighbor • fitch/kitsch • drawtree/drawgram

  14. Maximum Parsimony • Most common method • Allows use of all evolutionary information • Build and score all possible trees • Each node is a transformation in a character state • Minimize tree length • Best tree requires the fewest changes to derive all sequences
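
Scoring one candidate tree by its number of required changes can be sketched with Fitch's small-parsimony algorithm. This is an illustrative sketch, not the procedure used by dnapars or PAUP: the tree encoding (leaf = name string, internal node = pair of subtrees), the function names, and the toy alignment are all assumptions.

```python
def fitch(node, column):
    """Return (possible ancestral states, changes needed) for one alignment column."""
    if isinstance(node, str):                 # leaf: its observed state, no changes
        return {column[node]}, 0
    left_states, left_cost = fitch(node[0], column)
    right_states, right_cost = fitch(node[1], column)
    shared = left_states & right_states
    if shared:                                # children agree: no extra change
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes over all columns of an equal-length alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical four-taxon alignment and two candidate topologies.
toy_alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA", "D": "TCTA"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_ab_cd, toy_alignment))  # 3 changes
print(parsimony_score(tree_ac_bd, toy_alignment))  # 4 changes
```

With these toy data the first topology needs fewer changes and would be preferred under parsimony. An exhaustive search would repeat this scoring for every possible topology.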

  15. Which is the more parsimonious tree? [Figure: two candidate trees, each with 3 nodes; one is labeled 9 node crossings, the other 8 node crossings]

  16. Maximum Likelihood • Reconstruction using an explicit evolutionary model • The likelihood of a tree is calculated separately for each nucleotide site. The product of the per-site likelihoods gives the overall likelihood of the observed data. • Computationally demanding • Slowest method • Use to test (or improve) an existing tree
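
The "product of per-site likelihoods" idea can be shown on the smallest possible case: two aligned sequences separated by a single branch of length d, under the Jukes-Cantor (JC69) model. The choice of JC69, the toy sequences, and the grid search are assumptions for this sketch. Real programs such as dnaml evaluate whole trees and richer models.

```python
import math


def jc69_log_likelihood(seq1, seq2, d):
    """Log-likelihood of two aligned sequences separated by branch length d (JC69)."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay      # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay      # probability of one specific different base
    log_l = 0.0
    for a, b in zip(seq1, seq2):
        site_likelihood = 0.25 * (p_same if a == b else p_diff)
        log_l += math.log(site_likelihood)   # product of sites = sum of logs
    return log_l


# Hypothetical sequences: 2 differences over 10 sites.
seq1 = "ACGTACGTAC"
seq2 = "ACGTTCGTAA"

# Crude grid search over branch lengths 0.01 .. 2.00 substitutions per site.
grid = [i / 100 for i in range(1, 201)]
best_d = max(grid, key=lambda d: jc69_log_likelihood(seq1, seq2, d))
print(best_d)
```

Summing logarithms instead of multiplying raw likelihoods avoids numerical underflow on long alignments. For this two-sequence case the maximum lies close to the analytic Jukes-Cantor distance.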

  17. Clustering Algorithms • Use distances to calculate phylogenetic trees • Trees are based on the relative numbers of similarities and differences between sequences • A distance matrix is constructed by computing pairwise distances for all sequences • Clustering links successively more distant taxa

  18. DNA Distances • Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences • Only works for pairs of sequences that are similar enough to be aligned • All base changes are considered equal • Insertions/deletions are generally given a larger weight than replacements (gap penalties) • Possible to correct for multiple substitutions at a single site, which are common in distant relationships and at rapidly evolving sites
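
A minimal sketch of the two ideas above: counting differences between aligned sequences, then correcting for multiple substitutions at a site. The Jukes-Cantor correction is used here as one common choice, which is an assumption, since the slides do not name a model. For simplicity this sketch skips gapped positions rather than applying a gap penalty.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Correct an observed proportion p for multiple hits at the same site."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the observed proportion, and the gap between the two grows as sequences diverge.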

  19. Amino Acid Distances • More difficult to compute • Substitutions have differing effects on structure • Some substitutions require more than one DNA mutation • Use replacement frequencies (PAM, BLOSUM)

  20. Fitch & Margoliash • 3 sequences are combined at a time to define branches and calculate their length • Additive branch lengths • Accurate for short branches

  21. Neighbor Joining • Most common method of tree construction • Distance matrix adjusted for each taxon depending on its rate of evolution • Good for simulation studies • Most efficient computationally
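
The rate adjustment can be sketched as a single neighbor-joining step: each pairwise distance is corrected by the total distance of both taxa to all others (the Q criterion), and the pair with the smallest corrected value is joined. The function name and the five-taxon distance table are illustrative assumptions. A full implementation would replace the joined pair with a new node and repeat.

```python
def neighbor_joining_step(names, pair_dist):
    """Pick the pair to join and the branch lengths to their new parent node.

    pair_dist maps (name1, name2) tuples to distances; needs at least 3 taxa.
    """
    def d(i, j):
        return pair_dist[(i, j)] if (i, j) in pair_dist else pair_dist[(j, i)]

    n = len(names)
    totals = {i: sum(d(i, j) for j in names if j != i) for i in names}
    best = None
    for x in range(n):
        for y in range(x + 1, n):
            i, j = names[x], names[y]
            q = (n - 2) * d(i, j) - totals[i] - totals[j]   # rate-adjusted distance
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    length_i = d(i, j) / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = d(i, j) - length_i
    return i, j, length_i, length_j


# Hypothetical distances between five taxa.
taxa = ["a", "b", "c", "d", "e"]
distances = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}

print(neighbor_joining_step(taxa, distances))  # ('a', 'b', 2.0, 3.0)
```

In this toy table the raw closest pair is d and e (distance 3), yet the adjusted criterion joins a and b first. This shows how the correction for each taxon's overall divergence can change the outcome.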

  22. UPGMA – Unweighted Pair Group Method Using Arithmetic Averages • Simplest method • Calculates branch lengths between most closely related sequences • Averages distance to next sequence or cluster • Predicts a position for the root
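
The UPGMA procedure described above can be sketched as a loop: join the closest pair, place the new node at half their distance, and average distances to every remaining cluster weighted by cluster size. The data structures, function name, and distance table are assumptions for illustration.

```python
from itertools import combinations


def upgma(names, pair_dist):
    """Return the list of merges as (cluster, node_height), closest pairs first.

    Clusters are nested tuples of leaf names; pair_dist maps (name1, name2)
    tuples to distances.
    """
    size = {name: 1 for name in names}                     # cluster -> number of leaves
    dist = {frozenset(pair): value for pair, value in pair_dist.items()}
    merges = []
    while len(size) > 1:
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        height = dist[frozenset((a, b))] / 2               # root-to-tip height of new node
        for other in size:
            if other not in (a, b):
                # Size-weighted average distance from the new cluster to each other one.
                dist[frozenset((merged, other))] = (
                    size[a] * dist[frozenset((a, other))]
                    + size[b] * dist[frozenset((b, other))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
        merges.append((merged, height))
    return merges


# Hypothetical distances between five taxa.
taxa = ["a", "b", "c", "d", "e"]
distances = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}

for cluster, height in upgma(taxa, distances):
    print(cluster, round(height, 3))
```

The last merge is the root, which is how UPGMA "predicts a position for the root". This only makes biological sense if all lineages evolve at roughly the same rate.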

  23. Phylogenetic Complications • Errors • Loss of function • Convergent evolution • Lateral gene transfer

  24. Validation • Use several different algorithms and data sets • NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood • Bootstrapping • Perturb data and note effect on tree • Repeat many times • If the tree is unchanged in ~90% of replicates, its correctness is supported
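
Bootstrapping can be sketched as resampling alignment columns with replacement and checking how often a result survives. To keep the example short, the "result" tracked here is only which pair of sequences is closest, standing in for a full tree. That simplification, the function names, and the toy alignment are assumptions.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build a pseudo-replicate by sampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the pair of names with the fewest differing sites (first one on ties)."""
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = sum(x != y for x, y in zip(alignment[a], alignment[b]))
            if best is None or diffs < best[0]:
                best = (diffs, (a, b))
    return best[1]


def bootstrap_support(alignment, replicates=200, seed=0):
    """Fraction of replicates that recover the closest pair of the original data."""
    rng = random.Random(seed)
    reference = closest_pair(alignment)
    hits = sum(
        closest_pair(bootstrap_columns(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return reference, hits / replicates


# Hypothetical alignment.
toy_alignment = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGAACGT",
    "C": "TCGAACTTACCT",
    "D": "TCGATCTTACCA",
}
print(bootstrap_support(toy_alignment))
```

In a real analysis each replicate would be passed to the tree-building method, and support would be reported per branch rather than for a single pair.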

  25. Are there bugs in our genome? N-acetylneuraminate lyase

  26. The End