
Bioinformatics Tutorial I BLAST and Sequence Alignment




Presentation Transcript


  1. Bioinformatics Tutorial I BLAST and Sequence Alignment

  2. What is BLAST? • Online tool from the National Center for Biotechnology Information (NCBI) • A “Google” for protein and nucleotide sequences

  3. What can you use BLAST for? • Identify an unknown sequence • Characterize the gene/protein of interest • Function/activity (gene and protein) • Structure or shape (new protein) • Location or preferred location (protein) • Stability (gene/transcript or protein) • Origin of a gene or protein

  4. Sequence alignment approaches • Global alignment • Needleman and Wunsch, 1970 • Local alignment (used in BLAST) • Smith and Waterman, 1981
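
The difference between the two approaches can be sketched with one small dynamic-programming routine. This is a minimal illustration, not code from the tutorial: the function name `alignment_score`, the scoring values, and the simple linear gap penalty are all assumptions chosen for brevity.

```python
def alignment_score(a, b, local, match=1, mismatch=-2, gap=-2):
    """Best alignment score of sequences a and b.

    local=False: global alignment (Needleman-Wunsch style), end to end.
    local=True:  local alignment (Smith-Waterman style), best sub-region.
    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for gaps at the start of either sequence.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(score[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                       score[i - 1][j] + gap,        # gap in b
                       score[i][j - 1] + gap)        # gap in a
            if local:
                cell = max(cell, 0)                  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]
```

Calling it with `local=True` on two sequences that share only a short region rewards that region alone, whereas `local=False` also charges for every unmatched flanking residue.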

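  5. Global alignment • One approach to searching with a query sequence is to align the entire query, end to end, against every sequence in a database • Because the cost of each alignment grows with the product of the two sequence lengths, this approach is very slow and hence impractical for large databases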

  6. BLAST (Basic Local Alignment Search Tool) • A much faster approach • Divides your search query into short sequences (“words”) and initially looks for exact matches in the database. Once found, these word matches are then extended into longer alignments • Altschul, S.F. et al. Basic local alignment search tool. J Mol Biol. 215(3):403-10 (1990).

  7. BLAST algorithm • The query sequence is split into short words • Each word is then searched against the database • Word hits are extended in both directions to generate alignments with a score greater than the threshold score
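
The seed-and-extend idea can be sketched as follows. This is a simplified illustration under stated assumptions, not NCBI's implementation: the names `find_seeds` and `extend_seed`, the word size, the scores, and the drop-off rule (`x_drop`) are all hypothetical choices, and only exact word matches and ungapped extension are handled.

```python
def find_seeds(query, subject, w=4):
    """Return (query_pos, subject_pos) pairs where a w-letter word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=4, match=1, mismatch=-2, x_drop=3):
    """Extend a word hit without gaps in both directions.

    Returns (best_score, q_start, q_end) with q_end exclusive. Extension in a
    direction stops once the running score falls more than x_drop below the best.
    """
    score = best = w * match
    q_start, q_end = i, i + w
    qi, sj = i + w, j + w                      # extend to the right
    while qi < len(query) and sj < len(subject) and best - score <= x_drop:
        score += match if query[qi] == subject[sj] else mismatch
        qi += 1
        sj += 1
        if score > best:
            best, q_end = score, qi
    score = best                               # extend to the left from the best right end
    qi, sj = i - 1, j - 1
    while qi >= 0 and sj >= 0 and best - score <= x_drop:
        score += match if query[qi] == subject[sj] else mismatch
        if score > best:
            best, q_start = score, qi
        qi -= 1
        sj -= 1
    return best, q_start, q_end
```

A caller would run `extend_seed` on each pair returned by `find_seeds` and keep only the extensions whose score exceeds a chosen threshold.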

  8. BLAST “The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T” - Altschul et al., 1990

  9. How does BLAST work?

  10. Step 1: Get your sequence • From a database such as NCBI or UCSC • From a sequencing facility (unknown gene)

  11. Step 2: Choose BLAST program

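  12. The different BLAST programs • blastn (nucleotide BLAST): nucleotide query against a nucleotide database • blastp (protein BLAST): protein query against a protein database • blastx (translated BLAST): translated nucleotide query against a protein database • tblastn (translated BLAST): protein query against a translated nucleotide database • tblastx (translated BLAST): translated nucleotide query against a translated nucleotide database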

  13. Simplified visualization

  14. Why translate in 6 reading frames? • A DNA sequence can code for six different proteins (three reading frames on each strand)
      Forward strand: 5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
      Reverse strand: 3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
      Forward frames begin: 5’ CAT CAA • 5’ ATC AAC • 5’ TCA ACT
      Reverse frames begin: 5’ GTG GGT • 5’ TGG GTA • 5’ GGG TAG
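
A six-frame translation of the sequence on this slide might look like the sketch below. The helper names (`reverse_complement`, `translate`, `six_frames`) and the frame labels `+1` to `-3` are assumptions made for illustration; the codon table is the standard genetic code, and the input is assumed to contain only the letters A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to their translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    dna = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
    for label, protein in six_frames(dna).items():
        print(label, protein)
```

Frame `+1` starts with the codons CAT CAA and frame `-1` with GTG GGT, matching the frames listed on the slide.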

  15. Step 3: Search parameters

  16. Step 4: Search results

  17. Important: Tabular output

  18. Score • The sequence similarity score measures the quality of a pairwise alignment • The alignment score is the sum of the scores for each aligned position

  19. Score • Nucleotides • +1 score for each match • -2 score for each mismatch • Peptides • Each amino acid pair (identity or substitution) is given a score from a substitution matrix such as BLOSUM62

  20. Example (David Fristrom, Introduction to BLAST)
      AACGTTTCCAGTCCAAATAGCTAGGC
      ===--===   =-===-==-======
      AACCGTTC---TACAATTACCTAGGC
      • Hits (+1): 18 • Misses (-2): 5 • Gaps (existence -2, extension -1): 1, length 3 • Score = 18 × 1 + 5 × (-2) - 2 - 2 = 4, where the three-position gap costs -2 for its first position and -1 for each of the other two
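
The arithmetic of this example can be checked with a few lines of code. This is a sketch under one stated assumption about the gap convention: the first position of a gap costs the existence penalty and every further position costs the extension penalty. Under that convention the expression 18 × 1 + 5 × (-2) - 2 - 2 evaluates to 4. The function name `score_alignment` and the use of `-` for gap positions are illustrative choices.

```python
def score_alignment(top, bottom, match=1, mismatch=-2,
                    gap_existence=-2, gap_extension=-1):
    """Score a pairwise alignment given as two equal-length rows, '-' marking gaps.

    Assumed convention: the first position of a gap costs gap_existence,
    each additional position costs gap_extension.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    score = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap_extension if in_gap else gap_existence
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score


if __name__ == "__main__":
    top = "AACGTTTCCAGTCCAAATAGCTAGGC"
    bottom = "AACCGTTC---TACAATTACCTAGGC"
    print(score_alignment(top, bottom))  # 18 - 10 - 2 - 1 - 1
```

Other tools define gap costs differently (for example, charging the extension penalty for every gapped position on top of the existence penalty), which would give a different total for the same alignment.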

  21. E-value • E-value – expectation value; the number of different alignments with a similar or better score that would be expected to occur by chance alone when searching the database • Low E-value – sequences may be homologous • Statistical significance depends on: • Length of the query sequence • Size of the sequence database
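
The dependence on query length and database size can be made concrete with the Karlin-Altschul form E = K · m · n · e^(-λS), where S is the raw score, m the query length and n the total database length. The sketch below is illustrative only: the function name `expected_hits` is hypothetical, the values of λ and K in the usage lines are placeholders (the real values depend on the scoring system), and BLAST's length corrections are ignored.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score.

    Implements E = K * m * n * exp(-lambda * S). lam and k must be supplied by
    the caller because they depend on the scoring system in use.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Placeholder statistical parameters, assumed for illustration only.
    lam, k = 1.28, 0.46
    for db_len in (1_000_000, 2_000_000):
        print(db_len, expected_hits(30, 100, db_len, lam, k))
```

Doubling the database size doubles the E-value for the same score, while each extra point of score shrinks it exponentially.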

  22. Graphical output

  23. Taxonomy Results

  24. Graphical output

  25. References • Figures and text adapted from the following sources: • David Fristrom, Introduction to BLAST • Jonathan Pevsner, BLAST: Basic local alignment search tool • Joanne Fox, BLAST: Finding function by sequence similarity
