Advanced Bioinformatics: Sequence Analysis and Algorithmic Techniques for Biological Data

This PhD course in bioinformatics covers advanced methods for analyzing biological sequences. Topics include a biological introduction, comparison of short and large sequences, and sequence assembly. Participants learn about efficient data structures and algorithms, with a focus on dot matrices and on pairwise and multiple sequence alignment. The course also covers how to measure the degree of identity between two sequences and how to judge whether the matches found are statistically significant.

ross-barton

Presentation Transcript


  1. Bioinformatics PhD course. Summary (approximate):
     • 1. Biological introduction
     • 2. Comparison of short sequences (< 10,000 bp)
     • 3. Comparison of large sequences (up to 250,000,000 bp)
     • 4. Sequence assembly
     • 5. Efficient data search structures and algorithms
     • 6. Proteins...

  2. 2. Comparison of short sequences (< 10,000 bp). Summary (more or less):
     • 2.1 Dot matrix
     • 2.2 Pairwise alignment
     • 2.3 Hash algorithms
     • 2.4 Multiple alignment

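  3. 2. Dot matrix. Given two sequences, how can we analyse their degree of identity? By searching for the parts that match: in a matrix with S1 along one axis (position x) and S2 along the other (position y), cell (x, y) holds 1 if both characters coincide and 0 otherwise.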

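  4. 2. Dot matrix (continued). Given two sequences, how can we analyse their degree of identity? By searching for the parts that match: cell (x, y) holds 1 if the characters S1(x) and S2(y) coincide and 0 otherwise. [Figure: dot matrices of S1 against S2, with matching cells marked by dots.]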

  5. 2.1 Dot matrix. Example sequences: accaccacaccacaacgagcata… and acctgagcgatat. With L = window length:
     • m(i,j) = 1 iff S1(i..i+L-1) = S2(j..j+L-1): exact matching
     • m(i,j) = 1 iff k out of L characters coincide: approximate matching
     • m(i,j) = k iff k out of L characters coincide: approximate matching
     What is the cost of the algorithm? When are the matches relevant?
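The three definitions of m(i,j) on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the course's implementation: the function name `dot_matrix` and the parameters `min_matches` and `counts` are hypothetical, indices are 0-based, and the thresholded variant assumes that "k out of L coincide" means "at least k".

```python
def dot_matrix(s1, s2, L=1, min_matches=None, counts=False):
    """Windowed dot matrix of s1 against s2 (hypothetical helper).

    m[i][j] compares the window s1[i:i+L] with the window s2[j:j+L].
    - counts=True:       m[i][j] = number of coinciding characters (0..L)
    - min_matches=k:     m[i][j] = 1 if at least k characters coincide (assumption)
    - default (neither): m[i][j] = 1 only if all L characters coincide (exact matching)
    """
    rows = max(len(s1) - L + 1, 0)
    cols = max(len(s2) - L + 1, 0)
    threshold = L if min_matches is None else min_matches
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count coinciding positions inside the two windows: O(L) work per cell.
            matches = sum(a == b for a, b in zip(s1[i:i + L], s2[j:j + L]))
            m[i][j] = matches if counts else int(matches >= threshold)
    return m


# Prefixes of the two example sequences shown on the slide.
s1 = "accacc"
s2 = "acctga"
for row in dot_matrix(s1, s2, L=3):
    print("".join("." if cell else " " for cell in row))
```

Every cell costs O(L) here, which is where the len(S1) · len(S2) · L cost asked about on the slide comes from.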

  6. 2.1 Dot matrix: algorithm cost
     • len(S1) · len(S2) · L, in other words $O(n^2 L)$
     • Can a cost of len(S1) · len(S2) be achieved?
     • Can we then also say that $O(n^2)$ is independent of L?
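One possible answer to the second question, offered as a sketch and not as the course's own solution: consecutive windows on the same diagonal share L-1 positions, so the match count at (i+1, j+1) can be derived from the count at (i, j) in constant time. The function names below are hypothetical. Only the first window of each diagonal is counted from scratch, so the total work is O(n^2 + nL), which is O(n^2) whenever L is at most n.

```python
def window_counts_naive(s1, s2, L):
    """Reference version: recount every window from scratch, O(n^2 * L)."""
    rows, cols = len(s1) - L + 1, len(s2) - L + 1
    if rows <= 0 or cols <= 0:
        return []
    return [[sum(s1[i + t] == s2[j + t] for t in range(L)) for j in range(cols)]
            for i in range(rows)]


def window_counts_sliding(s1, s2, L):
    """Same matrix, but each diagonal is filled with O(1) updates per cell."""
    rows, cols = len(s1) - L + 1, len(s2) - L + 1
    if rows <= 0 or cols <= 0:
        return []
    m = [[0] * cols for _ in range(rows)]
    # Every diagonal starts in the first row or in the first column.
    starts = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    for i, j in starts:
        # First window of the diagonal: counted directly, O(L).
        count = sum(s1[i + t] == s2[j + t] for t in range(L))
        m[i][j] = count
        while i + 1 < rows and j + 1 < cols:
            count -= s1[i] == s2[j]          # position leaving the window
            count += s1[i + L] == s2[j + L]  # position entering the window
            i, j = i + 1, j + 1
            m[i][j] = count
    return m


# Visible part of the example sequences on the slide (the first one is truncated there).
s1 = "accaccacaccacaacgagcata"
s2 = "acctgagcgatat"
assert window_counts_sliding(s1, s2, 5) == window_counts_naive(s1, s2, 5)
```

The exact and thresholded matrices follow from these counts by comparing each cell with L or with k.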

  7. 2.1 Dot matrix: signals. [Figure: three dot-matrix panels. A: transposons; B: S1 = S2; C: random.] When are signals statistically significant?

  8. 2.1 Dot matrix: statistical significance. Given L = window length, we need to define a random model against which to compare the signals. Define the random variable X as the number of characters that coincide within a window; if each position coincides independently with probability p, then $\Pr(X = k) = \binom{L}{k}\, p^k (1-p)^{L-k}$. What is its expected value?
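Under this binomial model the expected value is E[X] = L · p. The sketch below checks that numerically and also computes a tail probability Pr(X >= k), which is one common way to judge whether a window with k matches stands out from the random model. The function names are hypothetical, and p = 1/4 is an assumption (four equally likely nucleotides, independent positions), not a value given on the slide.

```python
from math import comb


def window_match_pmf(L, p):
    """Pr(X = k) for k = 0..L under the binomial model of the slide."""
    return [comb(L, k) * p**k * (1 - p)**(L - k) for k in range(L + 1)]


def expected_matches(L, p):
    """E[X] computed from the pmf; analytically this equals L * p."""
    return sum(k * prob for k, prob in enumerate(window_match_pmf(L, p)))


def tail_probability(L, p, k):
    """Pr(X >= k): chance that a random window shows at least k matches."""
    return sum(window_match_pmf(L, p)[k:])


L, p = 10, 0.25  # assumed values, for illustration only
print(expected_matches(L, p))     # approximately L * p = 2.5
print(tail_probability(L, p, 8))  # small: 8 or more matches out of 10 are rare by chance
```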
