Burkhard Morgenstern, Institut für Mikrobiologie und Genetik
Presentation Transcript

  1. Burkhard Morgenstern, Institut für Mikrobiologie und Genetik: Molecular Evolution and Reconstruction of Phylogenetic Trees, winter semester 2006/2007

  2. Goal: Phylogeny reconstruction based on molecular sequence data (DNA, RNA, protein sequences)

  3. Multiple sequence alignment • Molecular phylogeny reconstruction relies on comparative analysis of nucleic acid and protein sequences • Alignment is the most important tool for sequence comparison • A multiple alignment contains more information than a pairwise alignment

  4. Tools for multiple sequence alignment Y I M Q E V Q Q E R • A sequence is duplicated in evolutionary history (e.g. by a speciation event)

  5. Tools for multiple sequence alignment Y I M Q E V Q Q E R

  6. Tools for multiple sequence alignment Y I M Q E V Q Q E R Y I M Q E V Q Q E R

  7. Tools for multiple sequence alignment Y I M Q E A Q Q E R Y L M Q E V Q Q E R • Substitutions occur

  8. Tools for multiple sequence alignment Y I M Q E A Q Q E R Y L M Q E V Q Q E R

  9. Tools for multiple sequence alignment Y A I M Q E A Q Q E R Y L M - - V Q Q E R V • Insertions/deletions (indels) occur

  10. Tools for multiple sequence alignment Y A I M Q E A Q Q E R Y L M - - V Q Q E R V

  11. Tools for multiple sequence alignment Y A I M Q E A Q Q E R Y L M V Q Q E R V • Because of insertions/deletions, sequence similarity is no longer immediately visible!

  12. Tools for multiple sequence alignment Y A I M Q E A Q Q E R - Y - L M V - - Q Q E R V • Alignment brings related parts of the sequences together by inserting gaps

  13. Tools for multiple sequence alignment Y A I M Q E A Q Q E R - Y - L M V - - Q Q E R V

  14. Tools for multiple sequence alignment Y A I M Q E A Q Q E R - Y - L M V - - Q Q E R V • Mismatches correspond to substitutions • Gaps correspond to indels
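Read column by column, a pairwise alignment can be tallied into the events named on this slide. A minimal Python sketch, assuming the two rows are equal-length strings with `-` for gaps and, as a simplifying assumption, that each maximal run of gaps counts as one indel event; the function names are illustrative, not part of the lecture:

```python
def classify_columns(row1, row2):
    """Count (matches, mismatches, gap columns) in a pairwise alignment."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gap_columns = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gap_columns += 1      # gap column: part of an insertion/deletion
        elif a == b:
            matches += 1
        else:
            mismatches += 1       # mismatch: interpreted as a substitution
    return matches, mismatches, gap_columns


def count_gap_runs(row):
    """Count maximal runs of '-' (assumption: one run = one indel event)."""
    runs = 0
    in_gap = False
    for c in row:
        if c == "-" and not in_gap:
            runs += 1
        in_gap = c == "-"
    return runs


# Alignment from slides 12-14
row1 = "YAIMQEAQQER-"
row2 = "Y-LMV--QQERV"
print(classify_columns(row1, row2))                 # (6, 2, 4)
print(count_gap_runs(row1) + count_gap_runs(row2))  # 3
```

So this alignment postulates two substitutions and three indel events, the latter covering four gap columns.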

  15. Tools for multiple sequence alignment • Pairwise alignment: alignment of two sequences • Multiple alignment: alignment of N > 2 sequences

  16. Tools for multiple sequence alignment s1 R Y I M R E A Q Y E S A Q s2 R C I V M R E A Y E s3 Y I M Q E V Q Q E R s4 W R Y I A M R E Q Y E • Assumption: the sequence family is related by common ancestry; similarity is due to common history • Sequence similarity is not obvious (insertions and deletions may have happened)

  17. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - • Multiple alignment = arrangement of sequences by introducing gaps • Alignment reveals sequence similarities
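Because an alignment only arranges the sequences by introducing gaps, deleting the gaps from each row must give back the original sequence of slide 16. A minimal Python sketch of that check, assuming rows are stored as strings keyed by name and adopting the common convention (an assumption here) that no column may consist of gaps only; `is_alignment_of` is a hypothetical helper name:

```python
ALIGNMENT = {
    "s1": "-RYI-MREAQYESAQ",
    "s2": "-RCIVMREA-YE---",
    "s3": "--YI-MQEVQQER--",
    "s4": "WRYIAMRE-QYE---",
}
SEQUENCES = {
    "s1": "RYIMREAQYESAQ",
    "s2": "RCIVMREAYE",
    "s3": "YIMQEVQQER",
    "s4": "WRYIAMREQYE",
}


def is_alignment_of(rows, seqs):
    """True if rows are equal-length, gap-removal restores seqs, and no column is all gaps."""
    if rows.keys() != seqs.keys():
        return False
    if len({len(r) for r in rows.values()}) != 1:
        return False
    # assumed convention: a column consisting only of gaps is not allowed
    if any(all(c == "-" for c in col) for col in zip(*rows.values())):
        return False
    return all(rows[name].replace("-", "") == seqs[name] for name in seqs)


print(is_alignment_of(ALIGNMENT, SEQUENCES))  # True
```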

  18. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -

  19. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - -

  20. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - General information in a multiple alignment: • Functionally important regions are more conserved than non-functional regions • Local sequence conservation indicates functionality!
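Local conservation can be made visible by scoring each column of the alignment. A minimal Python sketch, assuming the simplest possible measure (the fraction of rows that carry the most frequent residue, with gaps never counting as a residue); real conservation scores usually also weight similar amino acids, which this sketch does not:

```python
from collections import Counter

ROWS = ["-RYI-MREAQYESAQ", "-RCIVMREA-YE---", "--YI-MQEVQQER--", "WRYIAMRE-QYE---"]


def column_identity(rows):
    """Per column: fraction of rows sharing the most frequent residue (gaps ignored)."""
    scores = []
    for col in zip(*rows):
        residues = [c for c in col if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(rows))
    return scores


def conserved_columns(rows):
    """0-based indices of gap-free columns in which every residue is identical."""
    return [j for j, score in enumerate(column_identity(rows)) if score == 1.0]


print(conserved_columns(ROWS))  # [3, 5, 7, 11]
```

The fully conserved columns carry I, M, E and E in all four sequences.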

  21. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Phylogeny reconstruction based on a multiple alignment: • Estimate pairwise distances between sequences (distance-based methods for tree reconstruction) • Estimate individual evolutionary events (parsimony and maximum-likelihood methods)
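The simplest way to turn an alignment into pairwise distances is the p-distance: the fraction of differing residues among the columns where neither sequence has a gap. A minimal Python sketch, assuming gap columns are simply skipped; distance-based tree methods typically apply a correction (e.g. Jukes-Cantor or Poisson) on top of this raw value, which is omitted here:

```python
from itertools import combinations

ROWS = {
    "s1": "-RYI-MREAQYESAQ",
    "s2": "-RCIVMREA-YE---",
    "s3": "--YI-MQEVQQER--",
    "s4": "WRYIAMRE-QYE---",
}


def p_distance(r1, r2):
    """Fraction of mismatches among columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(r1, r2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


for x, y in combinations(ROWS, 2):
    print(x, y, round(p_distance(ROWS[x], ROWS[y]), 3))
```

For s1 and s4, every gap-free column is identical, so their p-distance is 0.0; s1 and s3 differ in 4 of 10 comparable columns (0.4).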

  22. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Task in bioinformatics: find the best multiple alignment for a given set of sequences

  23. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Astronomical number of possible alignments!
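How fast the number of alignments grows can already be seen for just two sequences. A minimal Python sketch, assuming every column is either (residue, residue), (residue, gap) or (gap, residue) and that all-gap columns are not allowed; under this convention the counts are the Delannoy numbers. The slide's example with four sequences has far more possibilities than this pairwise count:

```python
def count_alignments(m, n):
    """Number of pairwise alignments of sequences of lengths m and n."""
    # exactly one alignment exists if either sequence is empty (all gaps on one side)
    f = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # the last column is residue/residue, residue/gap or gap/residue
            f[i][j] = f[i - 1][j - 1] + f[i - 1][j] + f[i][j - 1]
    return f[m][n]


print(count_alignments(2, 2))    # 13
print(count_alignments(10, 10))  # 8097453
```

Two sequences of only ten residues each already admit more than eight million alignments.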

  24. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - - - Y E - s3 Y I - - - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Astronomical number of possible alignments!

  25. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - - - Y E - s3 Y I - - - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - The computer has to decide: which one is best?

  26. Tools for multiple sequence alignment Questions in the development of alignment programs: (1) What is a good alignment? → objective function ('score') (2) How to find a good alignment? → optimization algorithm The first question is far more important!

  27. Tools for multiple sequence alignment Before defining an objective function (scoring scheme): • What is a biologically good alignment?

  28. Tools for multiple sequence alignment Criteria for alignment quality: • 3D structure: align residues at corresponding positions in the 3D structure of the protein!

  29. Tools for multiple sequence alignment Criteria for alignment quality:

  30. Tools for multiple sequence alignment Criteria for alignment quality: • 3D structure: align residues at corresponding positions in the 3D structure of the protein!

  31. Tools for multiple sequence alignment Species related by common history

  32. Tools for multiple sequence alignment Genes / proteins related by common history

  33. Tools for multiple sequence alignment Criteria for alignment quality: • 3D structure: align residues at corresponding positions in the 3D structure of the protein! • Evolution: align residues that descend from a common ancestral residue!

  34. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Alignment = hypothesis about sequence evolution • Mismatches correspond to substitutions • Gaps correspond to insertions/deletions

  35. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - - Y I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Alignment = hypothesis about sequence evolution • Search for the most plausible scenario! • Estimate probabilities for individual evolutionary events: insertions/deletions, substitutions

  36. Tools for multiple sequence alignment s1 - R Y I - M R E A Q Y E S A Q s2 - R C I V M R E A - Y E - - - s3 - Y - I - M Q E V Q Q E R - - s4 W R Y I A M R E - Q Y E - - - Alignment = hypothesis about sequence evolution • Search for the most plausible scenario! • Estimate probabilities for individual evolutionary events: insertions/deletions, substitutions

  37. Tools for multiple sequence alignment Compute a score $s(a,b)$ for the degree of similarity between amino acids $a$ and $b$, based on the probability $p_{a,b}$ of a substitution $a \to b$ (or $b \to a$) (extremely simplified!)
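The slide leaves open exactly how $s(a,b)$ is derived from $p_{a,b}$. One widely used construction, the log-odds idea behind PAM and BLOSUM matrices, compares the observed frequency of the pair with the frequency $q_a q_b$ expected by chance. A minimal Python sketch under that assumption; the frequencies are made-up toy numbers, and the `scale` factor (2 gives scores in half-bit units) is an assumed choice:

```python
import math


def log_odds_score(p_ab, q_a, q_b, scale=2.0):
    """Hypothetical log-odds score: observed pair frequency vs. chance expectation q_a * q_b.

    Positive if the pair is seen more often than by chance, negative if less often.
    """
    return round(scale * math.log2(p_ab / (q_a * q_b)))


# toy frequencies, not taken from any real substitution matrix
print(log_odds_score(0.02, 0.1, 0.1))    # 2   (pair twice as frequent as by chance)
print(log_odds_score(0.0025, 0.1, 0.1))  # -4  (pair four times rarer than by chance)
```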

  38. Tools for multiple sequence alignment

  39. Tools for multiple sequence alignment Reasons for different substitution probabilities $p_{a,b}$: • Amino acids have different physical and chemical properties • Amino acids with similar properties are more likely to be substituted for each other

  40. Tools for multiple sequence alignment Use a penalty for gaps introduced into the alignment • Simplest approach: linear gap costs, i.e. a penalty proportional to gap length • Non-linear gap penalties are more realistic: a long gap is often caused by a single insertion/deletion • Most frequently used: affine gap penalties, which are more realistic but still efficient to calculate!
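The difference between linear and affine gap costs shows up when one long gap is compared with several short ones. A minimal Python sketch, assuming the affine convention "opening penalty plus extension penalty for every further gap position" (some programs instead charge the extension penalty for every position); the penalty values are arbitrary examples:

```python
def linear_gap_cost(length, g):
    """Linear gap cost: penalty g for every gap position."""
    return g * length


def affine_gap_cost(length, gap_open, gap_extend):
    """Affine gap cost, assumed convention: gap_open + gap_extend * (length - 1)."""
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


# one gap of length 3 versus three separate gaps of length 1
print(linear_gap_cost(3, g=2), 3 * linear_gap_cost(1, g=2))    # 6 6
print(affine_gap_cost(3, 10, 1), 3 * affine_gap_cost(1, 10, 1))  # 12 30
```

Linear costs cannot distinguish the two cases, while affine costs make one long gap (one indel event) much cheaper than three separate ones. Alignments under affine costs can still be computed by dynamic programming in $O(l_1 \cdot l_2)$ time.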

  41. Traditional objective function: define the score of an alignment as • the sum of the individual similarity scores $s(a,b)$ • minus gap penalties Needleman-Wunsch scoring system for pairwise alignment (1970)

  42. Pairwise sequence alignment T Y W I V T - - L V Example: $\mathrm{Score} = s(T,T) + s(I,L) + s(V,V) - 2g$ Assumption: linear gap penalty, i.e. a penalty $g$ for every gap position!
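The example score can be evaluated directly once concrete values are chosen. A minimal Python sketch, assuming a hypothetical toy similarity function (4 for identical residues, 2 for the pairs I/L, I/V and L/V, -1 otherwise) and a linear gap penalty of g = 3; these numbers are illustrative only, not a real substitution matrix:

```python
# hypothetical "similar" pairs for the toy scoring function
SIMILAR = {frozenset("IL"), frozenset("IV"), frozenset("LV")}


def toy_s(a, b):
    """Hypothetical similarity score: 4 identical, 2 similar (I/L/V), -1 otherwise."""
    if a == b:
        return 4
    return 2 if frozenset((a, b)) in SIMILAR else -1


def score_alignment(row1, row2, s, g):
    """Sum of s(a,b) over gap-free columns minus g per gap position (linear gap penalty)."""
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= g
        else:
            total += s(a, b)
    return total


# s(T,T) + s(I,L) + s(V,V) - 2g = 4 + 2 + 4 - 6
print(score_alignment("TYWIV", "T--LV", toy_s, g=3))  # 4
```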

  43. Pairwise sequence alignment T Y W I V T - - L V A dynamic-programming algorithm finds the alignment with the best score (Needleman and Wunsch, 1970).

  44. Pairwise sequence alignment T Y W I V T - - L V • Running time proportional to the product of the sequence lengths • Time complexity $O(l_1 \cdot l_2)$
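The dynamic-programming idea of slides 43 and 44 fits in a few lines: fill an $(l_1+1) \times (l_2+1)$ table, where each cell holds the best score of aligning two prefixes, then trace back one optimal path. A minimal Python sketch with a linear gap penalty, reusing the hypothetical toy scoring function from the example above (an illustrative assumption, not a real substitution matrix); the two nested loops over the table are where the $O(l_1 \cdot l_2)$ running time comes from:

```python
SIMILAR = {frozenset("IL"), frozenset("IV"), frozenset("LV")}  # hypothetical


def toy_s(a, b):
    """Hypothetical similarity score: 4 identical, 2 similar (I/L/V), -1 otherwise."""
    if a == b:
        return 4
    return 2 if frozenset((a, b)) in SIMILAR else -1


def needleman_wunsch(x, y, s, g):
    """Global alignment with linear gap penalty g; returns (score, row_x, row_y)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -g * i                  # prefix of x aligned to gaps only
    for j in range(1, m + 1):
        F[0][j] = -g * j                  # prefix of y aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # match/mismatch
                          F[i - 1][j] - g,                          # gap in y
                          F[i][j - 1] - g)                          # gap in x
    # traceback from the bottom-right cell along one optimal path
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("TYWIV", "TLV", toy_s, g=3))  # (4, 'TYWIV', 'T--LV')
```

With these toy scores, the alignment shown on the slide is the unique optimum. With other scores, ties may exist, and the traceback then returns just one of the optimal alignments.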

  45. Pairwise sequence alignment • The algorithm for pairwise alignment can be generalized to multiple alignment of $N$ sequences • Time complexity $O(l_1 \cdot l_2 \cdots l_N)$ • Not feasible in practice (running time too long!) • Heuristics are necessary, i.e. fast algorithms that do not necessarily produce the mathematically optimal alignment
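A back-of-the-envelope count makes the infeasibility concrete. In the $N$-dimensional generalization, the table has one cell per combination of prefix lengths, and each cell is computed from up to $2^N - 1$ neighbouring cells (every non-empty subset of the sequences can advance by one residue). The O-notation above treats this factor as constant for fixed $N$. A minimal Python sketch, assuming hypothetical sequences of equal length 300:

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in the N-dimensional DP table: product of (l_i + 1)."""
    return prod(l + 1 for l in lengths)


def predecessors_per_cell(n):
    """Cells a table entry depends on: every non-empty subset of the n sequences advances."""
    return 2 ** n - 1


for n in (2, 3, 5, 10):
    lengths = [300] * n  # hypothetical: n sequences of length 300 each
    print(n, dp_cells(lengths) * predecessors_per_cell(n))
```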

  46. 'Progressive' Alignment Most popular approach to (global) multiple sequence alignment: • Progressive alignment, since the mid-1980s: Feng/Doolittle, Higgins/Sharp, Taylor, …

  47. 'Progressive' Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP

  48. 'Progressive' Alignment WCEAQTKNGQGWVPSNYITPVN WWRLNDKEGYVPRNLLGLYP AVVIQDNSDIKVVPKAKIIRD YAVESEAHPGSFQPVAALERIN WLNYNETTGERGDFPGTYVEYIGRKKISP Guide tree
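The slide only names the guide tree; how it is computed differs between programs. A minimal Python sketch, assuming UPGMA (average-linkage clustering on pairwise distances as on slide 21) as one simple choice, with made-up distances for four hypothetical sequences A to D. The nested tuples record the order in which progressive alignment would join sequences and groups, from the innermost pair outwards:

```python
from itertools import combinations


def upgma(names, dist):
    """Return a nested-tuple guide tree built by UPGMA (average-linkage) clustering.

    names: leaf labels; dist: {(a, b): distance} for every unordered pair of leaves.
    """
    size = {name: 1 for name in names}                    # cluster -> number of leaves
    d = {frozenset(pair): v for pair, v in dist.items()}
    while len(size) > 1:
        # join the two closest current clusters
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for k in size:
            if k != a and k != b:
                # size-weighted average of the distances from a and b to k
                d[frozenset((merged, k))] = (
                    size[a] * d[frozenset((a, k))] + size[b] * d[frozenset((b, k))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# hypothetical toy distances
TOY = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
print(upgma(["A", "B", "C", "D"], TOY))  # ('D', ('C', ('A', 'B')))
```

Here A and B would be aligned first, C would then be added to that pair, and D would be added last.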

  49. 'Progressive' Alignment WCEAQTKNGQGWVPSNYITPVN WW--RLNDKEGYVPRNLLGLYP- AVVIQDNSDIKVVP--KAKIIRD YAVESEASFQPVAALERIN WLNYNEERGDFPGTYVEYIGRKKISP Profile alignment: “once a gap, always a gap”
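“Once a gap, always a gap” means that when two already aligned groups are joined, gaps are only added as whole columns, and gaps already present inside a group are never moved or removed. A minimal Python sketch of this merge step, assuming the column-to-column path has already been computed (for example by a dynamic-programming profile alignment, not shown here). The path encoding with `M`, `1` and `2` is a hypothetical choice for this illustration:

```python
def merge_profiles(group1, group2, path):
    """Merge two aligned groups (lists of equal-length rows) along a column path.

    Hypothetical path encoding: 'M' pairs the next column of each group,
    '1' takes the next column of group1 against a new all-gap column in group2,
    '2' does the opposite. Existing gaps are copied unchanged.
    """
    out1 = [[] for _ in group1]
    out2 = [[] for _ in group2]
    i = j = 0
    for op in path:
        if op not in ("M", "1", "2"):
            raise ValueError(f"unknown path symbol {op!r}")
        if op in ("M", "1"):
            for r, row in enumerate(group1):
                out1[r].append(row[i])
            i += 1
        else:
            for r in out1:
                r.append("-")          # new gap column inserted into every row of group1
        if op in ("M", "2"):
            for r, row in enumerate(group2):
                out2[r].append(row[j])
            j += 1
        else:
            for r in out2:
                r.append("-")          # new gap column inserted into every row of group2
    if i != len(group1[0]) or j != len(group2[0]):
        raise ValueError("path does not consume every column")
    return ["".join(r) for r in out1 + out2]


print(merge_profiles(["AC-T", "A-GT"], ["AGT"], "M1MM"))  # ['AC-T', 'A-GT', 'A-GT']
```

The existing gaps in `AC-T` and `A-GT` survive unchanged; only a new column is added where the path requires it.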