Pairwise Sequence Alignment

malise
Presentation Transcript


  1. Pairwise Sequence Alignment

  2. WHAT?

  3. WHAT? • Given any two sequences (DNA or protein), we want to know to what extent they are similar
      Seq 1: CATATTGCAGTGGTCCCGCGTCAGGCT
      Seq 2: TAAATTGCGTGGTCGCACTGCACGCT
      Aligned:
      CATATTGCAGTGGTCCCGCGTCAGGCT
      TAAATTGCGT-GGTCGCACTGCACGCT

  4. WHY?

  5. Discover new function • Study evolution • Find crucial features within a sequence • Identify cause of diseases

  6. Discover function • Sequences that are similar probably have the same function

  7. Study evolution • If two sequences from different organisms are similar, they may have a common ancestor

  8. Find crucial features • Regions that are strongly conserved between different sequences can indicate functional importance • Example: conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse
      CATATTGCAGTGGTCCCGCGTCAGGCT
      TAAATTGCGT-GGTCGCACTGCACGCT

  9. Identify cause of disease • Comparison of sequences between individuals can detect changes that are related to diseases

  10. Sickle Cell Anemia • Caused by a single base substitution (an A replaced by a T), which puts valine in place of glutamic acid in hemoglobin • Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

  11. What makes sequences different?

  12. Sequence Modifications • Three types of changes • Substitution (point mutation): TCAGT → TCCGT • Insertion: TCAGT → TCGAGT • Deletion: TCAGT → TCGT • Insertions and deletions together are indels (replication slippage)

  13. How do we quantitate similarity?

  14. Scoring Similarity • Assume independent mutation model • Each site considered separately • Score at each site • Positive if the same • Negative if different • Sum to make final score • Can be positive or negative • Significance depends on sequence length GTAGTCCTAGCG

  15. Substitutions Only (not including indels) • Sequences compared base-by-base • Count the number of matches and mismatches • Matches score +2, mismatches score -1
      TTCGTCGTAGTCGGCTCGACCTG
      GTACGTCTAGCGAGCGTGATCCT
      9 matches: +18 • 14 mismatches: -14 • Total score: +4 (a weak match)
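The base-by-base count on this slide can be reproduced with a short sketch. The function name `score_ungapped` is hypothetical, and the code assumes the slide's run of 46 bases is two 23-base sequences written one after the other.

```python
def score_ungapped(seq_a, seq_b, match=2, mismatch=-1):
    """Compare two equal-length sequences position by position (no indels)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    mismatches = len(seq_a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


# Assumption: the slide's sequence run splits into these two 23-base sequences.
seq_1 = "TTCGTCGTAGTCGGCTCGACCTG"
seq_2 = "GTACGTCTAGCGAGCGTGATCCT"

print(score_ungapped(seq_1, seq_2))  # (9, 14, 4)
```

The result agrees with the slide: 9 matches, 14 mismatches, total score +4.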

  16. Including Indels • Create an 'alignment' • Count matches within alignment • Required if sequences are of different lengths
      TT-CGTCGTAGTCG-GC-TCGACC-TG
      GTACGTC-TAG-CGAGCGT-GATCCT-
      17 matches: +34 • 2 mismatches: -2 • 8 indels: -8 • Total score: +24 (a strong match)
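A gapped alignment can be scored column by column in the same way. This is a minimal sketch: `score_alignment` is a hypothetical name, and the indel penalty of -1 is inferred from the slide's "8 indels -8".

```python
def score_alignment(row_a, row_b, match=2, mismatch=-1, indel=-1):
    """Score an alignment given as two gapped rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    matches = mismatches = indels = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            indels += 1          # a base aligned against a gap
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    score = matches * match + mismatches * mismatch + indels * indel
    return matches, mismatches, indels, score


# Assumption: the slide's 54-character run is two 27-column alignment rows.
row_1 = "TT-CGTCGTAGTCG-GC-TCGACC-TG"
row_2 = "GTACGTC-TAG-CGAGCGT-GATCCT-"

print(score_alignment(row_1, row_2))  # (17, 2, 8, 24)
```

Allowing gaps raises the score of the same pair of sequences from +4 to +24.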

  17. Choosing an Alignment • Many different alignments are possible • Should consider all possible alignments • Take the best score found • There may be more than one best alignment
      TT-CGTCGTAGTCG-GC-TCGACC-TG
      GTACGTC-TAG-CGAGCGT-GATCCT-   score +24
      -TTCGT-CGTAGTC-GGCTCG-ACCTG
      GTAC-GTCTA-GCGAGCGT-GATCC-T   score 0

  18. Why is it hard? • Alignment without gaps requires a number of comparisons roughly proportional to the square of the average sequence length • If gaps are included, the number of comparisons becomes astronomical, because every placement of gaps gives a different alignment
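To get a feel for "astronomical", the number of gapped alignments can be counted with a small recurrence. This sketch assumes an alignment is a sequence of columns, each being letter over letter, letter over gap, or gap over letter, and that two alignments differing only in the order of adjacent gaps count as different. The name `count_alignments` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m, n):
    """Number of ways to align sequences of lengths m and n with gaps allowed."""
    if m == 0 or n == 0:
        return 1  # only one option left: pad the remaining letters with gaps
    return (
        count_alignments(m - 1, n - 1)  # last column pairs two letters
        + count_alignments(m - 1, n)    # last column: letter of the first sequence over a gap
        + count_alignments(m, n - 1)    # last column: gap over letter of the second sequence
    )


for length in (1, 2, 3, 10, 23):
    print(length, count_alignments(length, length))
```

Even for three bases against three bases there are already 63 alignments, and the count grows exponentially with length, which is why trying every alignment by brute force is not practical.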

  19. Algorithms for pairwise alignments • Dot plots: Gibbs and McIntyre • Dynamic programming • Local alignment: Smith-Waterman • Global alignment: Needleman-Wunsch

  20. Dot Plots • Early method • Sequences at top and left • Dots indicate matched bases • Diagonal series show matched regions
      TAGTCG
      TAG-CG
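A dot plot is simple enough to sketch in a few lines. The helper names `dot_matrix` and `render` are hypothetical, and the text rendering (`*` for a dot, `.` for an empty cell) is just one possible choice.

```python
def dot_matrix(top, left):
    """dots[r][c] is True when left[r] and top[c] are the same base."""
    return [[t == l for t in top] for l in left]


def render(top, left):
    """Draw the plot as text: sequence `top` across, sequence `left` down."""
    dots = dot_matrix(top, left)
    lines = ["  " + " ".join(top)]
    for base, row in zip(left, dots):
        lines.append(base + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("TAGTCG", "TAGCG"))
```

For the slide's sequences the plot shows a diagonal run for TAG, then a second diagonal run for CG shifted one column to the right. That shift is the single gap in the alignment TAGTCG / TAG-CG.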

  21. Dynamic Programming • A method for reducing a complex problem to a set of identical sub-problems • The best solution to one sub-problem is independent of the best solution to the other sub-problems

  23. What does it mean? • If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z

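  24. Example • Sequences: A = ACGCTG, B = CATGT • Empty scoring table, A across the top, B down the left (the slide marks a cell Z in the last row):
               A  C  G  C  T  G
               1  2  3  4  5  6
         C  1
         A  2
         T  3
         G  4
         T  5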

  25. Example • Sequences: A = ACGCTG, B = CATGT • Match: +2, Other: -1 • Score of best alignment between AC and CATG: -1 • …between ACG and CATG: 2 • …between AC and CATGT: -2 • Calculate score between ACG and CATGT: ?

  26. Needleman-Wunsch Example • Three ways to extend an alignment • Align the next letter in the sequences (3 over 5) • Insertion in the first sequence (- over 5) • Insertion in the second sequence (3 over -)

  27. Needleman-Wunsch Example • Sequences: A = ACGCTG, B = CATGT • Neighbouring cells: -1, 2, -2 • -1 from before plus -1 for mismatch of G against T: -2 • 2 from before plus -1 for aligning - against T: 1 • -2 from before plus -1 for aligning G against -: -3 • Cell gets the highest score of -2, 1, -3, which is 1
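The single-cell calculation on this slide can be written as one `max` over three candidates. This is a sketch with a hypothetical function name `cell_score`. It assumes, following the slides, that A runs across the top and B down the left, so the diagonal neighbour is AC vs CATG, the neighbour above is ACG vs CATG, and the neighbour to the left is AC vs CATGT.

```python
def cell_score(diag, up, left, a_char, b_char, match=2, other=-1):
    """Best score for one table cell, given its three already-filled neighbours."""
    pair = match if a_char == b_char else other
    return max(
        diag + pair,   # align a_char against b_char
        up + other,    # align a gap against b_char
        left + other,  # align a_char against a gap
    )


# Cell for ACG vs CATGT: last letters are G (from A) and T (from B).
print(cell_score(diag=-1, up=2, left=-2, a_char="G", b_char="T"))  # 1
```

Filling every cell of the table with this rule, row by row, is the whole of the scoring phase.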

  28. Needleman-Wunsch Example -1 2 -2 Sequences: A = ACGCTG, B = CATGT

  29. A
      -

  30. ACGCTG
      ------

  31. -----
      CATGT

  32. A
      C

  33. AC
      -C

  34. ACG
      -C-

  35. ACGC
      ---C

      ACGC
      -C--

  36. ACG
      -CA

  37. ACGCTG-
      -C-ATGT

  38. ACGCTG-
      -CA-TGT

  39. -ACGCTG
      CATG-T-

  40. Summary Needleman-Wunsch Alignment • Global alignment between sequences • Compare entire sequence against another • Create scoring table • Sequence A across top, B down left • Cell at column i and row j contains the score of best alignment between the first i elements of A and the first j elements of B • Global alignment score is bottom right cell
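The summary can be turned into a compact sketch: fill the table, then trace back from the bottom-right cell along every move that explains a cell's score. The function names are hypothetical, and the scoring (match +2, anything else -1, including gaps) is taken from the example slides. The table is stored as `table[j][i]`, row j for B and column i for A.

```python
def needleman_wunsch(a, b, match=2, other=-1):
    """table[j][i] = best score aligning the first i letters of a with the first j of b."""
    cols, rows = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        table[0][i] = i * other          # first i letters of a against gaps only
    for j in range(1, rows):
        table[j][0] = j * other          # first j letters of b against gaps only
    for j in range(1, rows):
        for i in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else other
            table[j][i] = max(
                table[j - 1][i - 1] + pair,  # letter against letter
                table[j - 1][i] + other,     # gap in a
                table[j][i - 1] + other,     # gap in b
            )
    return table


def optimal_alignments(a, b, table, match=2, other=-1):
    """Collect every alignment that achieves the bottom-right score."""
    results = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            results.append((top, bottom))
            return
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else other
            if table[j][i] == table[j - 1][i - 1] + pair:
                walk(i - 1, j - 1, a[i - 1] + top, b[j - 1] + bottom)
        if j > 0 and table[j][i] == table[j - 1][i] + other:
            walk(i, j - 1, "-" + top, b[j - 1] + bottom)
        if i > 0 and table[j][i] == table[j][i - 1] + other:
            walk(i - 1, j, a[i - 1] + top, "-" + bottom)

    walk(len(a), len(b), "", "")
    return results


table = needleman_wunsch("ACGCTG", "CATGT")
print(table[-1][-1])  # global alignment score: 2
for top, bottom in optimal_alignments("ACGCTG", "CATGT", table):
    print(top, bottom, sep="\n", end="\n\n")
```

For the example sequences the global score is 2 and three alignments tie for it, the same three shown on slides 37 to 39. The recursive traceback is fine for short sequences; for long ones an iterative traceback that returns a single alignment is the usual choice.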

  41. Local Alignment: Smith-Waterman • Best score for aligning part of the sequences • Never lower than the global alignment score under the same scoring, and often higher
      Global Alignment
      ATTGCAGTG-TCGAGCGTCAGGCT
      ATTGCGTCGATCGCAC-GCACGCT
      Local Alignment
      CATATTGCAGTGGTCCCGCGTCAGGCT
      TAAATTGCGT-GGTCGCACTGCACGCT
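A minimal sketch of the Smith-Waterman score follows. It differs from the global table in two ways: a cell is never allowed to drop below zero, and the answer is the largest cell anywhere rather than the bottom-right one. The function name and the scoring (match +2, anything else -1) are assumptions carried over from the earlier example, not taken from this slide.

```python
def smith_waterman_score(a, b, match=2, other=-1):
    """Best local alignment score between any part of a and any part of b."""
    best = 0
    prev = [0] * (len(a) + 1)            # row above the current one
    for j in range(1, len(b) + 1):
        curr = [0]                       # first column is always 0
        for i in range(1, len(a) + 1):
            pair = match if a[i - 1] == b[j - 1] else other
            cell = max(
                0,                       # start a fresh local alignment here
                prev[i - 1] + pair,      # letter against letter
                prev[i] + other,         # gap in a
                curr[i - 1] + other,     # gap in b
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(smith_waterman_score("AAAA", "TTAAAATT"))  # 8: the shared AAAA, flanks ignored
print(smith_waterman_score("AAAA", "CCCC"))      # 0: nothing worth aligning
```

Only two rows are kept because this sketch returns the score alone; recovering the aligned region itself needs the full table and a traceback that stops at the first zero.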

  42. Global vs. Local alignment
      Global alignment:
      DOROTHY--------HODGKIN
      DOROTHYCROWFOOTHODGKIN
      Local alignment:
      DOROTHY   HODGKIN
      DOROTHY   HODGKIN

  43. Global vs. Local alignment • Alignment of two genomic sequences
      >Human DNA
      CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
      >Mouse DNA
      CATGCGTCTGACgctttttgctagcgatatcggactATCGATATA

  44. Global vs. Local alignment • Alignment of two genomic sequences
      Global Alignment
      Human:CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
      Mouse:CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA
            ****** *****         *     ***      *  ****** ***
      Local Alignment
      Human:CATGCGACTGAC
      Mouse:CATGCGTCTGAC
      Human:ATCGATCATA
      Mouse:ATCGAT-ATA

  45. Global vs. Local alignment • Alignment of genomic DNA and mRNA
      >Human DNA
      CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
      >Human mRNA
      CATGCGACTGACATCGATCATA

  46. Global vs. Local alignment • Alignment of genomic DNA and mRNA
      Global Alignment
      DNA: CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
      mRNA:CATGCGACTGAC---------------------------ATCGATCATA
           ************                           **********
      Local Alignment
      DNA: CATGCGACTGAC
      mRNA:CATGCGACTGAC
      DNA: ATCGATCATA
      mRNA:ATCGATCATA
