This lecture explores heuristic methods for Pairwise Sequence Alignment (PSA), focusing on the dot-matrix analysis and advanced searching techniques like FASTA and BLAST. It discusses the advantages of word-based alignment, including ktup-based strategies and rapid hash-table comparisons. The key differences between FASTA and BLAST are highlighted, such as word matching approaches and sensitivity. Detailed methodologies for finding diagonal runs, local search spaces, and iterative searching are presented, providing insights into optimizing sequence database searches for protein and DNA alignments.
Heuristic PSA
• “Words” to describe dot-matrix analysis
• Approaches
  • FASTA
  • BLAST
• Searching databases for sequence similarities
  • PSA
  • Alternative strategies
    • Iterative searching
    • Reverse searching
Lecture 7 CS566
“Words” for dot-matrix analysis
• Useful ideas from dot-matrix (DM) alignment
  • Diagonal represents a local match
  • Broken diagonal = intervening mismatch
  • Displaced diagonals = matches with gaps
• Advantages of word-based alignment
  • Faster algorithm
  • Word-list comparison is faster than sequence comparison
  • Hashes used for rapid comparison of words
• “Devil is in the details”
FASTA (Fast-All)
• Motivation: a rapid PSA method was needed to search databases for matches to a query sequence (1:n comparisons)
• ktup (k-tuple, or word) based alignment
  • Create hash tables for the sequences
  • Find matching ktups (“hot-spots”, i.e. short diagonals) in the pair of sequences
  • ktup size = 2 for protein (6 for DNA)
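The hash-table step above can be sketched as follows. This is a minimal illustration, not FASTA's actual code: the function names `build_ktup_index` and `find_hot_spots` are hypothetical, and positions are 0-based.

```python
from collections import defaultdict

def build_ktup_index(seq, ktup):
    """Map every k-tuple (word) in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(seq) - ktup + 1):
        index[seq[pos:pos + ktup]].append(pos)
    return index

def find_hot_spots(query, target, ktup=2):
    """Return (i, j) pairs where the ktup at query[i] exactly matches the ktup at target[j]."""
    index = build_ktup_index(target, ktup)
    hot_spots = []
    for i in range(len(query) - ktup + 1):
        # One hash lookup per query word replaces a scan of the whole target.
        for j in index.get(query[i:i + ktup], []):
            hot_spots.append((i, j))
    return hot_spots

print(find_hot_spots("ACDE", "XCDEA", ktup=2))
```

Building the index costs one pass over the target, after which each query word is resolved by a single lookup. That is where the speed-up over direct sequence comparison comes from.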
FASTA
• Find the 10 best “diagonal runs”
  • Group hot-spots by the (i - j) diagonal they lie on
  • Main diagonal is numbered 0
  • Positive diagonals lie above the main diagonal, negative ones lie below
  • Diagonal run = set of consecutive (not necessarily contiguous) hot-spots, penalized by the size of the intervening mismatch
  • Save the top 10 diagonal runs
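A rough sketch of the diagonal-run step, under stated assumptions: hot-spots are (i, j) pairs, each hot-spot earns a fixed bonus `hot_score`, and each intervening residue between consecutive hot-spots costs `gap_penalty`. Both parameters and the function name are hypothetical; the scoring FASTA really uses is more involved.

```python
from collections import defaultdict

def best_diagonal_runs(hot_spots, ktup=2, hot_score=5, gap_penalty=1, top=10):
    """Return up to `top` runs as (score, diagonal, query_start, query_end_exclusive)."""
    by_diag = defaultdict(list)
    for i, j in hot_spots:
        by_diag[i - j].append(i)          # group hot-spots by their (i - j) diagonal

    runs = []
    for diag, starts in by_diag.items():
        starts.sort()
        best = (0, None, None)            # best (score, start, end) on this diagonal
        score, run_start, prev_end = 0, None, None
        for i in starts:
            gap = 0 if prev_end is None else max(0, i - prev_end)
            extended = score - gap_penalty * gap + hot_score
            if run_start is None or extended < hot_score:
                score, run_start = hot_score, i   # a fresh run beats extending
            else:
                score = extended                  # extend across the mismatch
            prev_end = i + ktup
            if score > best[0]:
                best = (score, run_start, prev_end)
        runs.append((best[0], diag, best[1], best[2]))

    runs.sort(reverse=True)               # highest score first
    return runs[:top]

print(best_diagonal_runs([(0, 0), (2, 2), (10, 10), (1, 5)]))
```

In this example the hot-spot at (10, 10) lies on the main diagonal but is too far from the first two to raise the score, so the best run on diagonal 0 stops at query position 4.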
FASTA
• Find init1
  • init1 = best contiguous subsequence from the top 10 diagonal runs, scored with an amino acid substitution (AAS) matrix (default BLOSUM50)
• Define a local search space around init1
  • Include (32 / ktup) +/- diagonals in the search space
  • For ktup = 2, 16 diagonals around init1
• Perform Smith-Waterman PSA in the reduced space
• Report the resulting alignment as opt
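The reduced search space can be illustrated with a banded Smith-Waterman sketch. The assumptions here are mine, not the lecture's: the band is read as 32 / ktup diagonals on each side of the init1 diagonal, scoring uses a simple match/mismatch scheme with a linear gap penalty instead of BLOSUM50, and only the best score (not the alignment itself) is returned.

```python
def band_half_width(ktup):
    """Number of diagonals kept on each side of init1 (assumed reading of 32 / ktup)."""
    return 32 // ktup

def banded_smith_waterman(query, target, center_diag, ktup=2,
                          match=2, mismatch=-1, gap=-2):
    """Best local alignment score using only cells near diagonal center_diag (= i - j)."""
    half_width = band_half_width(ktup)
    prev = {}                              # previous row, band cells only, keyed by column
    best = 0
    for i in range(1, len(query) + 1):
        cur = {}
        j_lo = max(1, i - center_diag - half_width)
        j_hi = min(len(target), i - center_diag + half_width)
        for j in range(j_lo, j_hi + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            # Cells outside the band count as 0, i.e. a local alignment restart.
            score = max(0,
                        prev.get(j - 1, 0) + sub,   # diagonal move
                        prev.get(j, 0) + gap,       # gap in target
                        cur.get(j - 1, 0) + gap)    # gap in query
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best

print(banded_smith_waterman("ACGTACGT", "ACGTACGT", center_diag=0))
```

Only about 2 × (32 / ktup) + 1 cells per row are filled in, rather than the full row, which is the point of restricting Smith-Waterman to the neighbourhood of init1.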
BLAST (Basic Local Alignment Search Tool)
• Built upon ideas derived from FASTA, with new elements incorporated
• For every word in the query, generate a set of similar words
  • Use an AAS matrix for the similarity score between the query word and all possible words of the same size
  • Include all words exceeding the cut-off in the set
  • Example: for word DED and threshold 0, the word set includes DED, DDD, EEE, EDE, etc.
• For every query word, generate hot-spots based on its set of similar words
• Then merge contiguous words along the same diagonal (à la FASTA) to form high-scoring segment pairs (HSPs)
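The word-set idea can be tried out on the DED example with a small sketch. To keep it self-contained it uses a two-letter alphabet and a toy score table (`TOY_SCORES`, an assumption in the style of a BLOSUM matrix, not a full AAS matrix); `neighborhood` is a hypothetical helper name.

```python
from itertools import product

# Toy substitution scores over the alphabet {D, E}; illustrative values only.
TOY_SCORES = {("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2, ("E", "D"): 2}

def word_score(a, b, scores=TOY_SCORES):
    """Sum of position-by-position substitution scores between two equal-length words."""
    return sum(scores[(x, y)] for x, y in zip(a, b))

def neighborhood(word, threshold, alphabet="DE", scores=TOY_SCORES):
    """All words of the same length whose score against `word` exceeds `threshold`."""
    candidates = ("".join(c) for c in product(alphabet, repeat=len(word)))
    return sorted(c for c in candidates if word_score(word, c, scores) > threshold)

print(neighborhood("DED", 0))    # every word over {D, E} passes a cut-off of 0
print(neighborhood("DED", 12))   # a stricter cut-off keeps only close neighbours
```

Raising the threshold shrinks the word set, which trades sensitivity for speed: fewer similar words means fewer hot-spots to extend.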
FASTA versus BLAST
• Word matching is exact in FASTA but inexact (AAS-based) in BLAST
• Larger word size in BLAST
• FASTA more sensitive (Why?) but slower (Why?)
• BLAST handles “low-complexity” regions inline
  • Programs DUST and/or SEG used for filtering sequences
Variations on BLAST-based searching
• Mapping query to different alphabets
  • Protein versus DNA
  • DNA versus protein (multiple reading frames)
• PSI-BLAST: Position-Specific Iterative BLAST
  • Use query to find hits
  • Assemble hits into an on-the-fly position-specific scoring matrix (PSSM)
• RPS-BLAST: Reverse Position-Specific BLAST
  • Query is search space
  • Database of PSSMs used to search for match
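To make the PSSM step concrete, here is a simplified sketch of turning a set of hits into a position-specific scoring matrix. It assumes the hits are already aligned, ungapped and of equal length, uses a uniform background and a flat pseudocount, and scores as log-odds in bits. PSI-BLAST itself applies sequence weighting and more careful pseudocounts, so treat this only as an illustration; the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_hits, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score."""
    background = 1.0 / len(alphabet)                 # assumed uniform background
    total = len(aligned_hits) + pseudocount * len(alphabet)
    pssm = []
    for col in range(len(aligned_hits[0])):
        counts = Counter(hit[col] for hit in aligned_hits)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in alphabet
        })
    return pssm

def score_with_pssm(pssm, segment):
    """Score a segment by summing the column-specific score of each residue."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

pssm = build_pssm(["DED", "DED", "DDD"])
print(score_with_pssm(pssm, "DED") > score_with_pssm(pssm, "EEE"))
```

Unlike a fixed AAS matrix, the score for a residue now depends on its position. In the iterative scheme, the matrix built from one round's hits becomes the query for the next round.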