
Pairwise Sequence Alignment Part 2


darius

Presentation Transcript


  1. Pairwise Sequence Alignment Part 2

  2. Outline • Global alignments (continuation) • Local versus global • BLAST algorithms • Evaluating significance of alignments

  3. Global Alignment (cont.)

  4. Needleman-Wunsch Alignment • Global alignment between sequences • Compare one entire sequence against another • Create a scoring table • Sequence A across the top, B down the left • The cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B • The global alignment score is in the bottom-right cell
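The table described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `needleman_wunsch_table` and the scoring values (match +1, mismatch -1, gap -1) are assumptions, since the slides do not state a scoring scheme.

```python
def needleman_wunsch_table(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the global-alignment scoring table.

    Sequence a runs across the top (columns), sequence b down the left
    (rows), as on the slide. The scoring values are assumed defaults.
    """
    rows, cols = len(b) + 1, len(a) + 1
    table = [[0] * cols for _ in range(rows)]

    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, cols):
        table[0][i] = i * gap
    for j in range(1, rows):
        table[j][0] = j * gap

    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(
                table[j - 1][i - 1] + pair,  # align a[i-1] with b[j-1]
                table[j - 1][i] + gap,       # b[j-1] against a gap
                table[j][i - 1] + gap,       # a[i-1] against a gap
            )
    return table


table = needleman_wunsch_table("ACGCTG", "CATGT")
print(table[-1][-1])  # bottom-right cell: -1 under the assumed scores
```

With these assumed scores, the alignment of ACGCTG against CATGT shown on slide 13 (three matches, one mismatch, three gaps) also scores -1, which agrees with the bottom-right cell.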

  5. A
     -

  6. ACGCTG
     ------

  7. -----
     CATGT

  8. A
     C

  9. AC
     -C

  10. ACG
      -C-

  11. ACGC    ACGC
      ---C    -C--

  12. ACG
      -CA

  13. ACGCTG-
      -C-ATGT

  14. ACGCTG-
      -CA-TGT

  15. -ACGCTG
      CATG-T-

  16. Global Alignment versus Local Alignment
      Global alignment:
        ATTGCAGTG-TCGAGCGTCAGGCT
        ATTGCGTCGATCGCAC-GCACGCT
      Local alignment:
        CATATTGCAGTGGTCCCGCGTCAGGCT
        TAAATTGCGT-GGTCGCACTGCACGCT

  17. Global vs. Local alignment
      Global alignment:
        DOROTHY--------HODGKIN
        DOROTHYCROWFOOTHODGKIN
      Local alignment:
        DOROTHY    HODGKIN
        DOROTHY    HODGKIN

  18. Local Alignment • Best score for aligning part of sequences • Often beats global alignment score • Similar algorithm: Smith-Waterman • Table cells never score below zero
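A rough sketch of the Smith-Waterman idea: the recurrence is the same as in the global case, except that every cell is clamped at zero and the result is the best cell anywhere in the table. The function name `smith_waterman_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for illustration.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the best local-alignment score between a and b.

    Same table layout as the global case, but no cell drops below zero,
    so an alignment may start and end anywhere. Scores are assumed values.
    """
    rows, cols = len(b) + 1, len(a) + 1
    table = [[0] * cols for _ in range(rows)]  # borders stay at zero
    best = 0
    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[j][i] = max(
                0,                           # restart the alignment here
                table[j - 1][i - 1] + pair,  # extend diagonally
                table[j - 1][i] + gap,       # gap in a
                table[j][i - 1] + gap,       # gap in b
            )
            best = max(best, table[j][i])
    return best


# The shared core TAA is found even though the flanks differ.
print(smith_waterman_score("GGTAAGG", "CCTAACC"))  # 3
```

Because a local alignment is free to drop poorly matching flanks, its score is never negative, which is one reason it often beats the global score for the same pair of sequences.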

  19. TAA TAA TACTA TAATA

  20. Problems with DP for sequence alignments • The complexity is very high • Given a score, how do we evaluate the significance of the alignment?

  21. Complexity • Complexity is determined by the size of the table • Aligning a sequence of length m against one of length n requires calculating (m × n) cells • Time of calculation: let's say we calculate 10^8 cells per second on a single-processor PC • Aligning two mRNA sequences of 8,000 bp requires 64,000,000 cells, or 0.64 seconds • Aligning an mRNA and a 10^7 bp chromosome requires ~10^11 cells, or 1,000 seconds (about 15 minutes)

  22. Complexity for large databases • Let's say a database contains 3 × 10^10 base pairs • Searching an mRNA against the database will require ~2.5 × 10^14 cells, or 2.5 × 10^6 seconds (about 1 month) • We need an efficient algorithm to cut down on alignment time
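The back-of-the-envelope figures on the two complexity slides can be reproduced directly. The sketch below assumes the slides' rate of 10^8 cells per second and an 8,000 bp mRNA; the helper name `dp_seconds` is made up for illustration.

```python
CELLS_PER_SECOND = 1e8  # rate assumed on the slide


def dp_seconds(m, n, rate=CELLS_PER_SECOND):
    """Time to fill an m x n dynamic-programming table at the given rate."""
    return m * n / rate


MRNA = 8_000               # bp, as on the slide
CHROMOSOME = 10**7         # bp
DATABASE = 3 * 10**10      # bp

print(dp_seconds(MRNA, MRNA))                # mRNA vs mRNA, in seconds
print(dp_seconds(MRNA, CHROMOSOME) / 60)     # mRNA vs chromosome, in minutes
print(dp_seconds(MRNA, DATABASE) / 86_400)   # mRNA vs database, in days
```

The exact products are 8 × 10^10 cells (800 seconds, roughly 13 minutes) for the chromosome and 2.4 × 10^14 cells (about 28 days) for the database; the slides round these to ~10^11 and ~2.5 × 10^14.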

  23. BLAST • Basic Local Alignment Search Tool • A set of tools developed at NCBI (BlastN, BlastP, ...) • BLAST benefits • Search speed • Ease of use • Statistical rigor

  24. BLAST • A good alignment contains subsequences of absolute identity: • First, identify very short (almost) exact matches. • Next, the best short hits from the 1st step are extended to longer regions of similarity. • Finally, the best hits are optimized using the Smith-Waterman algorithm.

  25. BLAST Algorithm (1) Break the query sequence into words of length W (default W = 11) (2) Compare the word list to the database and identify exact matches

  26. (3) For each word match, extend the alignment in both directions (4) Score the alignments using dynamic programming (5) Evaluate the statistical significance
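The word-matching and extension steps can be illustrated with a toy seed-and-extend routine. This is a simplified sketch and not the NCBI implementation: the extension is ungapped, the word size `w=4` is chosen only to suit short test strings, and the names `build_word_index`, `extend_one_side`, `seed_and_extend` and the drop-off threshold `x_drop` are all assumptions. The dynamic-programming and significance steps are left out.

```python
from collections import defaultdict


def build_word_index(subject, w):
    """Map every length-w word in the subject to its start positions."""
    index = defaultdict(list)
    for pos in range(len(subject) - w + 1):
        index[subject[pos:pos + w]].append(pos)
    return index


def extend_one_side(query, subject, qi, si, step, match, mismatch, x_drop):
    """Extend without gaps from (qi, si); return (best gain, length kept)."""
    score = best = length = best_len = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop:
            break  # score fell too far below the best seen; stop
        qi += step
        si += step
    return best, best_len


def seed_and_extend(query, subject, w=4, match=1, mismatch=-1, x_drop=3):
    """Return (score, query_start, subject_start, length) hits, best first."""
    index = build_word_index(subject, w)
    hits = set()
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], ()):  # exact word matches
            right, rlen = extend_one_side(
                query, subject, q + w, s + w, 1, match, mismatch, x_drop)
            left, llen = extend_one_side(
                query, subject, q - 1, s - 1, -1, match, mismatch, x_drop)
            hits.add((w * match + left + right, q - llen, s - llen,
                      w + llen + rlen))
    return sorted(hits, reverse=True)


print(seed_and_extend("ACGTACGT", "TTACGTACGTTT")[0])  # (8, 0, 2, 8)
```

Only positions that share an exact word are ever examined, which is where the speed-up over filling a full table for every database sequence comes from.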

  27. Database Searches • Using pairwise comparison, each database search normally yields two groups of scores, one for genuinely related sequences and one for unrelated (random) sequences, with some overlap between them. • A good search method should separate the two score groups completely.

  28. E-value • The number of hits with a similarity score at least as high that one can "expect" to see just by chance when searching the given string in a database of a particular size. • A higher E-value means lower similarity • "Sequences with E-value of less than 0.01 are almost always found to be homologous" • The lower bound is 0; the lowest E-values mark the best hits

  29. Expectation Values • Increase linearly with the length of the query sequence • Decrease exponentially with the score of the alignment • Increase linearly with the length of the database
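These three dependencies match the Karlin-Altschul form E = K · m · n · e^(-λS), where m is the query length, n the database length and S the alignment score. The sketch below only illustrates that relationship: the function name `expect_value` is made up, and the default values for `k` and `lam` are placeholders, because the real constants depend on the scoring system and are not given in the slides.

```python
import math


def expect_value(score, query_len, db_len, k=0.1, lam=1.0):
    """E = K * m * n * exp(-lambda * S).

    k and lam are placeholder constants (assumed); in practice they are
    derived from the scoring matrix and gap penalties in use.
    """
    return k * query_len * db_len * math.exp(-lam * score)


base = expect_value(score=40, query_len=8_000, db_len=3 * 10**10)

# Doubling the query or the database doubles E (linear dependence).
print(expect_value(40, 16_000, 3 * 10**10) / base)
print(expect_value(40, 8_000, 6 * 10**10) / base)

# Each extra score unit shrinks E by a factor of e**lam (exponential).
print(base / expect_value(41, 8_000, 3 * 10**10))
```

The same raw score therefore becomes less significant as the database grows, which is why E-values rather than raw scores are compared across searches.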
