1 / 17

An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

This paper presents an enhanced search algorithm aimed at optimizing multiple-sequence alignment (MSA). The study explores various heuristics for computing path costs while utilizing graph representation for effective alignment. The paper covers DNA and protein sequences, delving into the challenges of insertion, deletion, and point mutations in alignment. The experimental results demonstrate the efficacy of the proposed methods over existing techniques. Key applications include assessing common ancestry between species, identifying functional DNA segments, and predicting protein structures, emphasizing the biological implications of alignment efficiency.

hafwen
Télécharger la présentation

An Improved Search Algorithm for Optimal Multiple-Sequence Alignment

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Improved Search Algorithm for Optimal Multiple-Sequence Alignment • Paper by: Stefan Schroedl • Presentation by: Bryan Franklin

  2. Outline • Multiple-Sequence Alignment (MSA) • Graph Representation • Computing Path Costs • Heuristics • Experimental Results • Other Optimizations

  3. Multiple-Sequence-Alignment • Sequence: • DNA: String over alphabet {A,C,G,T} • Protein: String with |Σ|=20 (one symbol for each amino acid) • Alignment: • Insert gaps (_) into sequences to line up matching characters.

  4. Multiple-Sequence-Alignment Sequences: ABCB, BCD, DB

  5. Multiple-Sequence-Alignment • Indel: Insertion, Deletion, Point mutation (single symbol replacement) • Find a minimum set of indels between two or more sequences. • NP-Hard for an arbitrary number of sequences

  6. Multiple-Sequence-Alignment • Applications • Common ancestry between species • Locating useful portions of DNA • Predicting structure of folded proteins

  7. Graph Representation

  8. Computing G(n) • Considerations • Biological Meaning • Cost of computation • Sum-of-pairs • Substitution matrix ((|Σ|+1)2) • Sum alignment costs for all pairs • Costs can depend on neighbors

  9. Computing G(n) Sequences: ABCB, BCD, DB

  10. Computing G(n) 7 7 8 7 6

  11. Computing G(n)

  12. Heuristics • Methods Examined • 2-fold (hpair) • divide-conquer • 3-fold (h3,all) • 4-fold (h4,all)

  13. Heuristic Comparison

  14. Resource Usage

  15. Optimizations • Sparse Path Representation • Curve fitting for predicting threshold values • Sub-optimal paths periodically deleted

  16. The End • Any Questions?

More Related