
# The Basic Local Alignment Search Tool (BLAST)



### Presentation Transcript

1. The Basic Local Alignment Search Tool (BLAST): a rapid database search tool (1990). Idea: (1) Search for high-scoring segment pairs.

2. The Basic Local Alignment Search Tool (BLAST): A Y W T Y I V A L T – Q V R Q Y E A T S I L C I V M I Y S R A - Q Y R Y W R Y. Most local alignments contain highly conserved sections without gaps.

3. The Basic Local Alignment Search Tool (BLAST): A Y W T Y I V A L T – Q V R Q Y E A T S I L C I V M I Y S R A - Q Y R Y W R Y. Therefore: search for high-scoring segment pairs (HSPs), i.e. gap-free local alignments.
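
To make the notion of a gap-free segment pair concrete, here is a minimal Python sketch that finds the best-scoring gap-free segment along one diagonal of two sequences. The scoring values (`match=2`, `mismatch=-1`) and the function names are assumptions chosen for illustration; a real search would use a substitution matrix such as BLOSUM62.

```python
def residue_score(a, b, match=2, mismatch=-1):
    """Toy substitution score (assumed values, not a real matrix)."""
    return match if a == b else mismatch


def best_segment_on_diagonal(s, t, offset=0):
    """Return (score, start, end) of the best gap-free segment pair
    that aligns s[i] with t[i + offset]; end is exclusive."""
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    first = max(0, -offset)
    last = min(len(s), len(t) - offset)
    for i in range(first, last):
        if run_score <= 0:
            # A non-positive prefix can never improve a segment: restart here.
            run_score, run_start = 0, i
        run_score += residue_score(s[i], t[i + offset])
        if run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best
```

Because no gaps are allowed, each diagonal can be scanned in linear time, which is one reason segment pairs are cheap to search for.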

4. The Basic Local Alignment Search Tool (BLAST)

5. The Basic Local Alignment Search Tool (BLAST): A Y W T Y I V A L T – Q V R Q Y E A T S I L C I V M I Y S R A - Q Y R Y W R Y. Advantages: (a) speed; (b) a statistical theory of HSPs exists.

6. The Basic Local Alignment Search Tool (BLAST): a rapid database search tool (1990). Idea: (1) Search for high-scoring segment pairs. (2) Use word pairs as seeds.

7. Pair-wise sequence alignment. [Dot plot of T W L M H C A Q Y against I C I M H C T H Y, with X marking a word hit.] (1) Search for word pairs of length 3 with score > T and use them as seeds.

8. Pair-wise sequence alignment. A naïve algorithm would have a complexity of O(l1 * l2). Solution: preprocess the query sequence: • Compile a list of all words that have a score > T when aligned to a word in the query.

9. Pair-wise sequence alignment. A naïve algorithm would have a complexity of O(l1 * l2). Solution: preprocess the query sequence: • Compile a list of all words that have a score > T when aligned to a word in the query. Complexity: O(l1). • Organize the words in an efficient data structure (tree) for fast look-up.
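
The preprocessing step can be sketched as follows. This is a minimal illustration under stated assumptions: the scoring values (`match=5`, `mismatch=-2`), the threshold `T=7`, and all function names are hypothetical, and a Python dictionary (hash table) stands in for the tree mentioned on the slide. For a fixed word length the number of candidate words per query position is constant, so building the index is linear in the query length.

```python
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def word_score(u, v, match=5, mismatch=-2):
    """Toy score of two equal-length words (assumed values, not BLOSUM)."""
    return sum(match if a == b else mismatch for a, b in zip(u, v))


def build_word_index(query, w=3, T=7):
    """Map every word scoring > T against some query word to the
    query positions where that happens."""
    index = {}
    for pos in range(len(query) - w + 1):
        qword = query[pos:pos + w]
        for letters in product(ALPHABET, repeat=w):
            cand = "".join(letters)
            if word_score(qword, cand) > T:
                index.setdefault(cand, []).append(pos)
    return index


def find_seeds(index, subject, w=3):
    """Scan the subject once; each look-up yields (query_pos, subject_pos) seeds."""
    return [(qpos, spos)
            for spos in range(len(subject) - w + 1)
            for qpos in index.get(subject[spos:spos + w], [])]
```

With these toy values a word passes the threshold if it differs from a query word in at most one position. Scanning a database sequence then costs one look-up per position instead of a comparison against every query word.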

10. The Basic Local Alignment Search Tool (BLAST): a rapid database search tool (1990). Idea: (1) Search for high-scoring segment pairs. (2) Use word pairs as seeds. (3) Extend seed alignments until the score drops below a threshold value.

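11. Pair-wise sequence alignment. [Dot plot of T W L M H C A Q Y against I C I M H C T H Y, with X marking the seed.] Extend seeds until the score drops by X below the best score seen so far.

12. Pair-wise sequence alignment. [Same dot plot, with the X marks extended along the diagonal in both directions.] Extend seeds until the score drops by X below the best score seen so far.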
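
The extension step can be sketched like this. It is a minimal illustration, not BLAST's actual code: the scoring values, the drop-off `X=3`, and the function names are assumptions. The seed is extended in both directions without gaps, and each direction stops once the running score has fallen X or more below the best score reached in that direction.

```python
def _extend(query, subject, qi, si, step, X, score_fn):
    """Walk along the diagonal in one direction.
    Returns (best gain, number of positions kept)."""
    best_gain = gain = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        gain += score_fn(query[qi], subject[si])
        length += 1
        if gain > best_gain:
            best_gain, best_len = gain, length
        elif best_gain - gain >= X:
            break  # score dropped by X: give up in this direction
        qi += step
        si += step
    return best_gain, best_len


def extend_seed(query, subject, qpos, spos, w=3, X=3, match=2, mismatch=-1):
    """Extend a length-w seed at (qpos, spos) without gaps.
    Returns (score, query_start, query_end, subject_start); end is exclusive."""
    def score_fn(a, b):
        return match if a == b else mismatch

    seed = sum(score_fn(query[qpos + k], subject[spos + k]) for k in range(w))
    right_gain, right_len = _extend(query, subject, qpos + w, spos + w, +1, X, score_fn)
    left_gain, left_len = _extend(query, subject, qpos - 1, spos - 1, -1, X, score_fn)
    return (seed + left_gain + right_gain,
            qpos - left_len, qpos + w + right_len, spos - left_len)
```

Only the positions up to the best score are kept, so a trailing run of mismatches is trimmed from the reported segment.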

13. Pair-wise sequence alignment. The algorithm is not guaranteed to find the best segment pair (it is a heuristic), but it works well in practice.

14. The Basic Local Alignment Search Tool (BLAST): new BLAST version (1997). • Two-hit strategy

15. Pair-wise sequence alignment. [Dot plot of W L M H C A Q Y A R V against I M H C T H W A R V, with X marking two word hits on one diagonal.] Search for two word pairs on the same diagonal; use a lower threshold T.
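
A minimal sketch of the two-hit test follows. The slide only states that two word pairs must lie on the same diagonal; the window parameter `A` (maximum distance between the two hits), the rule that overlapping hits are skipped, and the function name are assumptions added for illustration.

```python
def two_hit_seeds(hits, w=3, A=40):
    """hits: iterable of (query_pos, subject_pos) word hits.
    Returns (diagonal, first_subject_pos, second_subject_pos) for every
    pair of non-overlapping hits on one diagonal at most A apart."""
    last_on_diag = {}  # diagonal -> subject position of the most recent hit
    triggers = []
    for qpos, spos in sorted(hits, key=lambda h: h[1]):
        diag = spos - qpos
        prev = last_on_diag.get(diag)
        if prev is not None:
            distance = spos - prev
            if distance < w:
                continue  # overlaps the previous hit: ignore it
            if distance <= A:
                triggers.append((diag, prev, spos))
        last_on_diag[diag] = spos
    return triggers
```

Requiring two hits lets the word threshold T be lowered (more hits, higher sensitivity) while still triggering fewer extensions than a single-hit scheme.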

16. The Basic Local Alignment Search Tool (BLAST): new BLAST version (1997). • Two-hit strategy • Gapped BLAST • Position-Specific Iterative BLAST (PSI-BLAST)

17. The Basic Local Alignment Search Tool (BLAST)

18. Multiple sequence alignment

        1aboA   1 .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
        1ycsB   1 kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
        1pht    1 gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
        1ihvA   1 .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
        1vie    1 .drvrkksga.........awqGQIVGWYctnlt.............peG

        1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
        1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
        1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
        1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
        1vie   28 YAVESeahpgsvQIYPVAALERIN......

19. Multiple sequence alignment First question: how to score multiple alignments? Possible scoring scheme: Sum-of-pairs score

20. Multiple sequence alignment. A multiple alignment implies pairwise alignments:

        1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
        1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
        1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
        1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
        1vie   28 YAVESeahpgsvQIYPVAALERIN......


22. Multiple sequence alignment. A multiple alignment implies pairwise alignments:

        1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
        1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......

23. Multiple sequence alignment. A multiple alignment implies pairwise alignments:

        1aboA  36 WCEAQtkngqGWVPSNYITPVN
        1ycsB  39 WWWARlndkeGYVPRNLLGLYP
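
The step from the two rows inside the multiple alignment to the stand-alone pairwise alignment can be sketched as follows: columns in which both rows carry a gap are dropped. The function name and the choice of gap characters (`.` and `-`) are assumptions for illustration.

```python
def project_pair(row_a, row_b, gap_chars=".-"):
    """Extract the pairwise alignment implied by two rows of a multiple
    alignment by removing columns where both rows have a gap."""
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a in gap_chars and b in gap_chars)]
    return ("".join(a for a, _ in kept), "".join(b for _, b in kept))


pair = project_pair("WCEAQt..kngqGWVPSNYITPVN......",
                    "WWWARl..ndkeGYVPRNLLGLYP......")
```

Columns where only one of the two rows has a gap are kept, since they represent a genuine insertion or deletion between that pair.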


26. Multiple sequence alignment. A multiple alignment implies pairwise alignments:

        1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
        1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp


29. Multiple sequence alignment. A multiple alignment implies pairwise alignments: use the sum of the scores of these pairwise alignments.

        1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
        1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
        1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
        1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
        1vie   28 YAVESeahpgsvQIYPVAALERIN......
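
A sum-of-pairs score can be sketched in a few lines. The scoring values (`match=2`, `mismatch=-1`, a linear gap cost `gap=-2`, and zero for a gap aligned with a gap) are assumptions for illustration, as are the function names.

```python
from itertools import combinations


def pair_score(a, b, match=2, mismatch=-1, gap=-2, gap_chars=".-"):
    """Score one column of a pairwise alignment (toy values, assumed)."""
    a_gap, b_gap = a in gap_chars, b in gap_chars
    if a_gap and b_gap:
        return 0          # shared gap column vanishes in the pairwise alignment
    if a_gap or b_gap:
        return gap
    return match if a.upper() == b.upper() else mismatch


def sum_of_pairs(rows):
    """Sum the scores of all pairwise alignments implied by the rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(pair_score(a, b)
               for r1, r2 in combinations(rows, 2)
               for a, b in zip(r1, r2))
```

For n rows this evaluates n(n-1)/2 pairwise alignments, so scoring a given multiple alignment is cheap; the expensive part is searching for the alignment that maximizes the score.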

30. Multiple sequence alignment. Goal: find the multiple alignment with the maximum score.

31. Multiple sequence alignment. • The Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment. • Multidimensional search space instead of a two-dimensional matrix!

32. Multiple sequence alignment

33. Multiple sequence alignment. Complexity: for three sequences of lengths l1, l2, l3: O(l1 * l2 * l3). For n sequences (average length l): O(l^n). Exponential complexity!

34. Multiple sequence alignment. • The Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment. • Optimal solution not feasible:

35. Multiple sequence alignment. • The Needleman-Wunsch scoring scheme can be generalized from pair-wise to multiple alignment. • Optimal solution not feasible: • -> Heuristics necessary
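
To get a feeling for these numbers, the following small sketch counts the cells of the multidimensional dynamic-programming table. The function names are hypothetical; the count assumes one table axis of size length + 1 per sequence, and that each cell looks at up to 2^n - 1 predecessor cells, as in the straightforward generalization of the pairwise recurrence.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in the n-dimensional DP table (one axis per sequence)."""
    return prod(length + 1 for length in lengths)


def dp_steps(lengths):
    """Cells times predecessors per cell (2^n - 1 for n sequences)."""
    return dp_cells(lengths) * (2 ** len(lengths) - 1)


for n in (2, 3, 5, 10):
    print(n, dp_cells([100] * n), dp_steps([100] * n))
```

Already for ten sequences of length 100 the table has more than 10^20 cells, which is why exact optimization is limited to a handful of sequences.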

36. Multiple sequence alignment. (A) Carrillo and Lipman (MSA): find the sub-space of the dynamic-programming matrix in which the optimal path can be found.

37. Multiple sequence alignment. (B) Stoye, Dress (DCA): • Divide the search space into small sub-spaces. • Calculate the optimal alignment for each sub-space. • Concatenate the sub-alignments.

38. Multiple sequence alignment (B) Stoye, Dress (DCA)


40. Multiple sequence alignment. Progressive alignment: carry out a series of pair-wise alignments.

41. Multiple sequence alignment. Most popular way of constructing multiple alignments: progressive alignment. Carry out a series of pair-wise alignments.

42. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN
        WWRLNDKEGYVPRNLLGLYP
        AVVIQDNSDIKVVPKAKIIRD
        YAVESEAHPGSFQPVAALERIN
        WLNYNETTGERGDFPGTYVEYIGRKKISP

43. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN
        WWRLNDKEGYVPRNLLGLYP
        AVVIQDNSDIKVVPKAKIIRD
        YAVESEAHPGSFQPVAALERIN
        WLNYNETTGERGDFPGTYVEYIGRKKISP

    Align the most similar sequences.

44. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN
        WW--RLNDKEGYVPRNLLGLYP-
        AVVIQDNSDIKVVP--KAKIIRD
        YAVESEASFQPVAALERIN
        WLNYNEERGDFPGTYVEYIGRKKISP

45. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN
        WW--RLNDKEGYVPRNLLGLYP-
        AVVIQDNSDIKVVP--KAKIIRD
        YAVESEASVQ--PVAALERIN------
        WLN-YNEERGDFPGTYVEYIGRKKISP

46. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN
        WW--RLNDKEGYVPRNLLGLYP-
        AVVIQDNSDIKVVP--KAKIIRD
        YAVESEASVQ--PVAALERIN------
        WLN-YNEERGDFPGTYVEYIGRKKISP

    Align sequence to alignment.

47. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN-
        WW--RLNDKEGYVPRNLLGLYP-
        AVVIQDNSDIKVVP--KAKIIRD
        YAVESEASVQ--PVAALERIN------
        WLN-YNEERGDFPGTYVEYIGRKKISP

    Align alignment to alignment.

48. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN--------
        WW--RLNDKEGYVPRNLLGLYP--------
        AVVIQDNSDIKVVP--KAKIIRD-------
        YAVESEA---SVQ--PVAALERIN------
        WLN-YNE---ERGDFPGTYVEYIGRKKISP

49. Multiple sequence alignment

        WCEAQTKNGQGWVPSNYITPVN--------
        WW--RLNDKEGYVPRNLLGLYP--------
        AVVIQDNSDIKVVP--KAKIIRD-------
        YAVESEA---SVQ--PVAALERIN------
        WLN-YNE---ERGDFPGTYVEYIGRKKISP

    Rule: "once a gap, always a gap"
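
The "once a gap, always a gap" rule can be illustrated with a small helper: when two alignments are merged, a new gap is inserted as a whole column into every row of an existing alignment, and gaps already present are never removed. The function name and the column-index convention are assumptions for this sketch; choosing where the gap columns go is the job of the profile alignment itself and is not shown.

```python
def insert_gap_columns(alignment, columns, gap="-"):
    """Insert one all-gap column before each listed column index
    (indices refer to the original alignment) in every row."""
    rows = [list(row) for row in alignment]
    # Work from right to left so earlier indices stay valid.
    for col in sorted(columns, reverse=True):
        for row in rows:
            row.insert(col, gap)
    return ["".join(row) for row in rows]


group = ["YAVESEASVQ--PVAALERIN------",
         "WLN-YNEERGDFPGTYVEYIGRKKISP"]
widened = insert_gap_columns(group, [7, 7, 7])
```

Here three gap columns are opened at position 7 in both rows at once, which reproduces the change in the last two sequences between the earlier and the final alignment shown above.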

50. Multiple sequence alignment. The order of the pair-wise profile alignments is determined by a phylogenetic tree (the guide tree) that is built from pair-wise similarity values.
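
One way to derive such a merge order is average-linkage clustering (UPGMA) on pairwise distances. The slide does not say which tree-building method is used, so UPGMA is an assumption here, as are the function name and the toy distances; a distance could be taken as, for example, one minus the fractional identity of a pairwise alignment.

```python
from itertools import combinations


def upgma_merge_order(names, dist):
    """names: sequence labels; dist: {(a, b): distance} for every pair.
    Returns the merges in the order a progressive aligner would do them;
    each merged cluster is represented as a nested tuple."""
    size = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(size) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c != a and c != b:
                # Size-weighted average distance to the new cluster.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
        merges.append(merged)
    return merges


toy = {("A", "B"): 0.1, ("A", "C"): 0.6, ("A", "D"): 0.7,
       ("B", "C"): 0.5, ("B", "D"): 0.8, ("C", "D"): 0.3}
order = upgma_merge_order(["A", "B", "C", "D"], toy)
```

With these toy distances the two closest sequences are aligned first, then the next closest pair, and finally the two resulting alignments are aligned to each other.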
