Advanced Bioinformatics: Sequence Analysis and Algorithmic Techniques for Biological Data

Bioinformatics PhD Course
Summary (approximate)
• 1. Biological introduction
• 2. Comparison of short sequences (< 10,000 bp)
• 3. Comparison of large sequences (up to 250,000,000 bp)
• 4. Sequence assembly
• 5. Efficient data search structures and algorithms
• 6. Proteins...

2. Comparison of short sequences (< 10,000 bp)
Summary (more or less)
• 2.1 Dot matrix
• 2.2 Pairwise alignment
• 2.3 Hash algorithms
• 2.4 Multiple alignment

2.1 Dot matrix
[Figure: dot matrix with S1 along the x axis and S2 along the y axis.]
Given two sequences, how can we analyse their degree of identity? By searching for the parts that match: each cell holds 1 if both characters coincide and 0 otherwise.
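A minimal sketch of the 1/0 rule above, comparing single characters (no window yet). The function names `dot_matrix` and `render` are hypothetical; rows are indexed by S1 and columns by S2 so that `m[i][j]` follows the m(i, j) notation used on the following slides.

```python
def dot_matrix(s1: str, s2: str) -> list[list[int]]:
    """Character-level dot matrix: m[i][j] = 1 if s1[i] == s2[j], else 0.

    Rows are indexed by S1 and columns by S2, matching the m(i, j) notation;
    a plot would put S1 on the x axis and S2 on the y axis.
    """
    return [[1 if a == b else 0 for b in s2] for a in s1]


def render(m: list[list[int]]) -> str:
    """Draw 1 as '*' and 0 as '.', one text line per row."""
    return "\n".join("".join("*" if v else "." for v in row) for row in m)


if __name__ == "__main__":
    # Comparing a sequence with itself puts '*' along the main diagonal.
    print(render(dot_matrix("acgt", "acgt")))
```

With four distinct characters the output is a 4 × 4 grid with `*` only on the main diagonal; real sequences over a four-letter alphabet also produce many isolated random dots, which motivates the windowed variants.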


2.1 Dot matrix
[Figure: example dot matrix; horizontal sequence accaccacaccacaacgagcata… acctgagcgatat, vertical sequence a c c … t.]
• m(i,j) = 1 iff S1[i..i+L-1] = S2[j..j+L-1]: exact matching.
• m(i,j) = 1 iff at least k of the L positions coincide: approximate matching.
• m(i,j) = k iff k of the L positions coincide: approximate matching (the cell stores the score).
L = window length.
What is the cost of the algorithm? When are the matches relevant?
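The three window rules can be sketched in one function. This is an illustration under assumptions: `windowed_dot_matrix`, `window_score` and the `mode` strings are hypothetical names, windows are the L characters starting at i (in S1) and j (in S2), and "k over L coincide" in the threshold rule is read as "at least k of the L positions coincide".

```python
def window_score(s1: str, s2: str, i: int, j: int, L: int) -> int:
    """Number of positions t in 0..L-1 where s1[i + t] == s2[j + t]."""
    return sum(1 for t in range(L) if s1[i + t] == s2[j + t])


def windowed_dot_matrix(s1: str, s2: str, L: int, mode: str = "exact",
                        k: int = 0) -> list[list[int]]:
    """Dot matrix over windows of length L (sketch of the three variants).

    mode="exact":     m[i][j] = 1 iff all L positions coincide
    mode="threshold": m[i][j] = 1 iff at least k of the L positions coincide
    mode="score":     m[i][j] = number of coinciding positions
    """
    if mode not in ("exact", "threshold", "score"):
        raise ValueError(f"unknown mode: {mode}")
    rows = len(s1) - L + 1   # number of windows that fit in s1
    cols = len(s2) - L + 1   # number of windows that fit in s2
    m = []
    for i in range(rows):
        row = []
        for j in range(cols):
            c = window_score(s1, s2, i, j, L)   # L comparisons per cell
            if mode == "exact":
                row.append(1 if c == L else 0)
            elif mode == "threshold":
                row.append(1 if c >= k else 0)
            else:
                row.append(c)
        m.append(row)
    return m
```

For example, `windowed_dot_matrix("aaaa", "aaba", 2, mode="score")` gives three rows of `[2, 1, 1]`: the S2 windows are `aa`, `ab` and `ba`. Each cell calls `window_score`, which is where the factor L in the cost comes from.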

2.1 Dot matrix: algorithm cost
• The direct algorithm performs len(S1) · len(S2) · L character comparisons; in other words, O(n²L).
• Is a cost of len(S1) · len(S2) possible?
• Can we also say that O(n²) is independent of L?
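One possible way to remove the factor L, offered as a sketch rather than as the course's intended answer: windows (i, j) and (i + 1, j + 1) lie on the same diagonal and share L − 1 positions, so the score of the next window is the previous score plus the character pair that enters minus the pair that leaves. `windowed_scores_fast` is a hypothetical name; it computes the "score" variant of the matrix.

```python
def windowed_scores_fast(s1: str, s2: str, L: int) -> list[list[int]]:
    """m[i][j] = number of t in 0..L-1 with s1[i + t] == s2[j + t].

    Each diagonal is scanned once with a running count, so every cell
    costs O(1) after the first window of its diagonal.
    """
    rows, cols = len(s1) - L + 1, len(s2) - L + 1
    if rows <= 0 or cols <= 0:
        return []
    m = [[0] * cols for _ in range(rows)]
    # Every diagonal starts in the first row (i = 0) or the first column (j = 0).
    starts = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    for i0, j0 in starts:
        i, j = i0, j0
        # Full O(L) count only for the first window of this diagonal.
        c = sum(s1[i + t] == s2[j + t] for t in range(L))
        while True:
            m[i][j] = c
            if i + 1 >= rows or j + 1 >= cols:
                break
            # Slide one step: add pair (i + L, j + L), drop pair (i, j).
            c += (s1[i + L] == s2[j + L]) - (s1[i] == s2[j])
            i += 1
            j += 1
    return m
```

The sliding steps cost O(len(S1) · len(S2)) in total, and the initial counts cost O(L) per diagonal, i.e. O((len(S1) + len(S2)) · L). Since windows only exist when L ≤ min(len(S1), len(S2)), the whole computation stays O(len(S1) · len(S2)).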

2.1 Dot matrix: signals
[Figure: example dot plots. A: transposons; B: S1 = S2; C: random sequences.]
When are signals statistically significant?
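The figure's signals appear as diagonal segments: comparing a sequence with itself (S1 = S2) gives a full main diagonal, repeated elements such as transposons give shorter segments parallel to it, and random sequences give scattered dots. A minimal sketch, assuming a signal is simply a run of consecutive character matches along one diagonal; `diagonal_runs`, `min_run` and the toy sequence are hypothetical.

```python
def diagonal_runs(s1: str, s2: str, min_run: int) -> list[tuple[int, int, int]]:
    """Maximal runs of consecutive matches along each diagonal of the dot matrix.

    Returns (i, j, length) such that s1[i + t] == s2[j + t] for t in 0..length-1,
    keeping only runs with length >= min_run.
    """
    runs = []
    n1, n2 = len(s1), len(s2)
    for d in range(-(n1 - 1), n2):      # diagonal offset d = j - i
        i = max(0, -d)
        j = i + d
        start, length = (i, j), 0
        while i < n1 and j < n2:
            if s1[i] == s2[j]:
                if length == 0:
                    start = (i, j)
                length += 1
            else:
                if length >= min_run:
                    runs.append((start[0], start[1], length))
                length = 0
            i += 1
            j += 1
        if length >= min_run:           # run that reaches the matrix edge
            runs.append((start[0], start[1], length))
    return runs


if __name__ == "__main__":
    # Toy sequence (hypothetical): the element "gattaca" occurs twice.
    seq = "cc" + "gattaca" + "tt" + "gattaca" + "cc"
    for run in diagonal_runs(seq, seq, min_run=6):
        print(run)
```

For the toy sequence the output includes `(0, 0, 20)` for the main diagonal and `(2, 11, 7)` and `(11, 2, 7)` for the two copies of the repeated element, symmetric about the main diagonal. Choosing `min_run` so that random runs are unlikely is exactly the significance question of the next slide.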

2.1 Dot matrix: statistical significance
Given L = window length, we need to define a random model against which to compare the signals. Define the random variable X = number of characters that coincide in a window of length L. If p is the probability that two aligned characters coincide, then
$$P(X = k) = \binom{L}{k} p^{k} (1 - p)^{L - k}$$
What is its expected value?
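The binomial model can be evaluated directly. This sketch computes P(X = k), the mean E[X] (which for a binomial variable equals L · p) and the tail probability P(X ≥ k), i.e. how often a purely random window would reach a score of k. The function names are hypothetical, and p = 1/4 in the example assumes independent, uniformly distributed nucleotides.

```python
from math import comb


def prob_exactly(k: int, L: int, p: float) -> float:
    """P(X = k) = C(L, k) p^k (1 - p)^(L - k): exactly k of L positions coincide."""
    return comb(L, k) * p**k * (1 - p) ** (L - k)


def expected_matches(L: int, p: float) -> float:
    """E[X] computed directly as sum_k k * P(X = k); equals L * p."""
    return sum(k * prob_exactly(k, L, p) for k in range(L + 1))


def tail_probability(k: int, L: int, p: float) -> float:
    """P(X >= k): chance that a random window scores k or more."""
    return sum(prob_exactly(x, L, p) for x in range(k, L + 1))


if __name__ == "__main__":
    L, p = 10, 0.25                    # assumption: uniform, independent nucleotides
    print(expected_matches(L, p))      # L * p = 2.5 (up to rounding)
    print(tail_probability(8, L, p))   # about 4.16e-4
```

Under these assumptions a window of length 10 coincides in 2.5 positions on average, and 8 or more coincidences occur by chance in only about 4 of every 10,000 random windows.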
