Multiple Sequence Alignment

robertsong
Presentation Transcript


  1. Multiple Sequence Alignment

  2. Evolution at the DNA level
     Sequence edits: mutation, deletion
       …ACGGTGCAGTTACCA…
       …AC----CAGTCCACCA…
     Rearrangements: inversion, translocation, duplication

  3. Orthology and Paralogy
     • Orthologs: derived by speciation
     • Paralogs: everything else
     Figure labels: Yeast; HA1 Human; HA2 Human; WA Worm; HB Human; WB Worm

  4. Orthology, Paralogy, Inparalogs, Outparalogs

  5. Genome Evolution – Macro Events • Inversions • Deletions • Duplications

  6. Synteny maps: comparison of human and mouse

  7. Synteny maps

  8. Synteny maps

  9. Synteny maps

  10. Building synteny maps
      Recommended local aligners:
      • BLASTZ
        • Most accurate, especially for genes
        • Chains local alignments
      • WU-BLAST
        • Good tradeoff of efficiency/sensitivity
        • Best command-line options
      • BLAT
        • Fast, less sensitive
        • Good for comparing very similar sequences and for finding a rough homology map

  11. Index-based local alignment
      • Dictionary: all words of length k (~10)
      • Alignment initiated between words of alignment score ≥ T (typically T = k)
      • Alignment: ungapped extensions until the score falls below a statistical threshold
      • Output: all local alignments with score > statistical threshold
      Figure: the query is scanned against the DB.
      Question: using an idea from overlap detection, is there a better way to find all local alignments between two genomes?
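The seed-and-extend steps on slide 11 can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular aligner. It makes several simplifying assumptions that are not on the slide: seeds are exact word matches (with a match score of 1 this corresponds to T = k), scoring is +1/-1, extension stops with a hypothetical `xdrop` rule, and a fixed `min_score` stands in for the statistical threshold. All function and parameter names are made up for the example.

```python
from collections import defaultdict

def build_index(db, k):
    """Dictionary step: map every length-k word of db to its start positions."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index

def extend_ungapped(query, db, qpos, dpos, k, match=1, mismatch=-1, xdrop=3):
    """Extend an exact k-word hit left and right without gaps.
    Each direction stops once the running score is xdrop below its best."""
    def walk(qi, di, step):
        best = score = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= di < len(db):
            score += match if query[qi] == db[di] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= xdrop:
                break
            qi += step
            di += step
        return best, best_len

    right_score, right_len = walk(qpos + k, dpos + k, 1)
    left_score, left_len = walk(qpos - 1, dpos - 1, -1)
    score = k * match + left_score + right_score
    return score, qpos - left_len, dpos - left_len, k + left_len + right_len

def local_hits(query, db, k=4, min_score=6):
    """Return unique (score, query_start, db_start, length) tuples above min_score."""
    index = build_index(db, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], ()):
            hit = extend_ungapped(query, db, q, d, k)
            if hit[0] > min_score:  # placeholder for the statistical threshold
                hits.add(hit)
    return sorted(hits, reverse=True)

hits = local_hits("TTACGTACGGA", "CCACGTACGGTT")
```

Several seeds on the same diagonal extend to the same segment, which is why the hits are collected in a set.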

  12. Local Alignments

  13. After chaining

  14. Chaining local alignments
      • Find local alignments
      • Chain: O(N log N) L.I.S. (longest increasing subsequence)
      • Restricted DP
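The O(N log N) L.I.S. step on slide 14 might look like the sketch below. It assumes each local alignment is reduced to a single anchor point `(query_pos, db_pos)`, which ignores alignment lengths, scores, and strand; real chaining tools weigh those as well. The function name and the anchor representation are illustrative choices, not taken from the slides.

```python
from bisect import bisect_left

def chain_anchors(anchors):
    """Longest chain of anchors that strictly increases in both coordinates."""
    # Sort by query position; ties are ordered by descending db position so
    # that two anchors sharing a query position cannot both enter the chain.
    pts = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails_y = []    # tails_y[i]: smallest db position ending a chain of length i + 1
    tails_idx = []  # index into pts of the anchor that ends that chain
    prev = [None] * len(pts)  # back-pointers for reconstructing the chain
    for i, (_, y) in enumerate(pts):
        pos = bisect_left(tails_y, y)  # binary search gives the log N factor
        if pos > 0:
            prev[i] = tails_idx[pos - 1]
        if pos == len(tails_y):
            tails_y.append(y)
            tails_idx.append(i)
        else:
            tails_y[pos] = y
            tails_idx[pos] = i
    chain = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]

chain = chain_anchors([(1, 1), (2, 9), (3, 3), (4, 4), (5, 2), (6, 6)])
```

In this toy input the anchors `(2, 9)` and `(5, 2)` are off the main diagonal and are dropped from the chain. The gaps between consecutive chained anchors are the regions a restricted DP would then fill in.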

  15. Progressive Alignment
      • When the evolutionary tree is known:
        • Align closest first, in the order of the tree
        • In each step, align two sequences x, y, or profiles px, py, to generate a new alignment with associated profile presult
      Weighted version:
      • Tree edges have weights, proportional to the divergence in that edge
      • New profile is a weighted average of the two old profiles
      Example
      Profile: (A, C, G, T, -)
      px = (0.8, 0.2, 0, 0, 0)
      py = (0.6, 0, 0, 0, 0.4)
      s(px, py) = 0.8*0.6*s(A, A) + 0.2*0.6*s(C, A) + 0.8*0.4*s(A, -) + 0.2*0.4*s(C, -)
      Result: pxy = (0.7, 0.1, 0, 0, 0.2)
      s(px, -) = 0.8*1.0*s(A, -) + 0.2*1.0*s(C, -)
      Result: px- = (0.4, 0.1, 0, 0, 0.5)
      Figure: tree with leaves x, y, z, w.
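The profile arithmetic in the slide 15 example can be checked with a short sketch. The profiles `px` and `py` are the ones on the slide. The substitution function `s` is a made-up placeholder (match +1, mismatch -1, gap -2), since the slides do not give scoring values, and the equal weights of 0.5 are inferred from the slide's results rather than stated there.

```python
ALPHABET = "ACGT-"

def s(a, b):
    """Hypothetical scoring function; the values are illustrative only."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return -2.0
    return 1.0 if a == b else -1.0

def column_score(px, py):
    """Expected score of aligning two profile columns: sum of px[a] * py[b] * s(a, b)."""
    return sum(px[i] * py[j] * s(a, b)
               for i, a in enumerate(ALPHABET)
               for j, b in enumerate(ALPHABET))

def merge(px, py, wx=0.5, wy=0.5):
    """New profile as a weighted average of two old profile columns."""
    return tuple(wx * a + wy * b for a, b in zip(px, py))

px = (0.8, 0.2, 0.0, 0.0, 0.0)
py = (0.6, 0.0, 0.0, 0.0, 0.4)
gap = (0.0, 0.0, 0.0, 0.0, 1.0)  # a column consisting only of gaps

pxy = merge(px, py)         # approximately (0.7, 0.1, 0, 0, 0.2), as on the slide
px_gap = merge(px, gap)     # approximately (0.4, 0.1, 0, 0, 0.5), as on the slide
score_xy = column_score(px, py)
score_x_gap = column_score(px, gap)
```

In the weighted version described on the slide, `wx` and `wy` would be derived from the tree edge weights instead of being fixed at 0.5.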

  16. Threaded Blockset Aligner
      Figure labels: HMR – CD; Restricted Area; Profile Alignment; Human–Cow

  17. Reconstructing the Ancestral Mammalian Genome
      Figure (tree) leaves: Human: C; Baboon: C; Dog: G; Cat: C
      Other labels on the figure: C; G; C or G

  18. Neutral Substitution Rates

  19. Finding Conserved Elements (1)
      • Binomial method
        • 25-bp window in the human genome
        • Binomial distribution of k matches in N bases given the neutral probability of substitution
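One way to read the binomial method on slide 19 is as a tail probability: how likely is it to see at least k matching bases in an N-base window if every site evolves neutrally? The sketch below computes that under the assumption that sites are independent and share one neutral substitution probability. The value `p_sub=0.3` is a made-up placeholder, and the function names are hypothetical.

```python
from math import comb

def binom_tail(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

def window_pvalue(matches, window=25, p_sub=0.3):
    """Probability of at least `matches` identical bases in a window
    when each base is substituted independently with probability p_sub."""
    return binom_tail(matches, window, 1.0 - p_sub)

p_all_match = window_pvalue(25)  # all 25 bases conserved
p_typical = window_pvalue(17)    # close to the neutral expectation of 17.5 matches
```

A small tail probability marks the window as more conserved than the neutral model predicts.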

  20. Finding Conserved Elements (2)
      • Parsimony method
        • Count minimum # of mutations explaining each column
        • Assign a probability to this parsimony score given neutral model
        • Multiply probabilities across 25-bp window of human genome
      Figure labels: A, C, A, A, G
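The first step of the parsimony method on slide 20, counting the minimum number of mutations that explain one alignment column, is commonly done with Fitch's algorithm, and the sketch below uses it. Fitch's algorithm is named here as a standard technique, not as something the slides specify. The tree shape and the column are illustrative assumptions (the species names and bases are borrowed from slide 17), and the later steps of turning the count into a probability are not covered.

```python
def fitch(tree, column):
    """Return (candidate ancestral states, minimum number of mutations).

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    column -- dict mapping leaf name to the base observed in this column
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    common = left_states & right_states
    if common:
        # The children can agree on a state, so no extra mutation is needed.
        return common, left_cost + right_cost
    # The children disagree: keep both options and pay one mutation.
    return left_states | right_states, left_cost + right_cost + 1

# Hypothetical binary tree; the topology is an assumption for illustration.
tree = (("human", "baboon"), ("dog", "cat"))
column = {"human": "C", "baboon": "C", "dog": "G", "cat": "C"}

root_states, min_mutations = fitch(tree, column)
```

For this column one mutation suffices, on the lineage leading to dog. A fully conserved column would give a count of zero.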

  21. Finding Conserved Elements

  22. Finding Conserved Elements (3) GERP

  23. Phylo HMMs
      Figure labels: HMM; Phylogenetic Tree Model; Phylo HMM

  24. Finding Conserved Elements (3)

  25. How do the methods agree/disagree?

  26. Statistical Power to Detect Constraint
      Figure labels: N, L
      • C: cutoff # mutations
      • D: neutral mutation rate
      • [symbol not recovered]: constraint mutation rate relative to neutral

  27. Statistical Power to Detect Constraint
      Figure labels: N, L
      • C: cutoff # mutations
      • D: neutral mutation rate
      • [symbol not recovered]: constraint mutation rate relative to neutral
