This text explores the significance of Position-Specific Scoring Matrices (PSSMs) and PSI-BLAST in detecting homologs of a specific gene product. It details how a PSSM is derived by measuring how often each amino acid appears at each aligned position and converting those frequencies into scores, and how it can accommodate insertions and deletions (INDELs). The iterative nature of PSI-BLAST is emphasized: each round refines the scoring matrix using the significant hits from the previous search, which improves the sensitivity of homology searches across large protein databases. Learn how these tools enhance protein sequence alignment.
Expect value (E-value) • The expected number of hits with an equivalent or better score that would be found by random chance in a database of the size searched.
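The E-value can be computed from an alignment score. A minimal sketch, assuming the standard Karlin-Altschul statistics used by BLAST: for a raw score S, E = K·m·n·e^(-λS), where m is the query length, n is the total database length, and λ and K are parameters of the scoring system. Converting to a bit score S' = (λS - ln K)/ln 2 gives the equivalent form E = m·n·2^(-S'). The function names and the λ, K values in the usage block are illustrative assumptions, and the sketch ignores BLAST's effective-length (edge-effect) corrections.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score into a bit score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, query_len: int, db_len: int) -> float:
    """Expected chance hits scoring at least `bits`: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


def evalue_from_raw(raw_score: float, lam: float, k: float,
                    query_len: int, db_len: int) -> float:
    """Same quantity in Karlin-Altschul form: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Illustrative parameters only (hypothetical, not taken from a real BLAST run).
    lam, k = 0.267, 0.041
    m, n = 300, 50_000_000
    bits = bit_score(120, lam, k)
    print(f"bits={bits:.1f}  E={evalue_from_bits(bits, m, n):.3g}")
    # Doubling the database size doubles the E-value for the same score.
    print(evalue_from_bits(bits, m, 2 * n) / evalue_from_bits(bits, m, n))
```

Because E grows linearly with n, the same alignment score is less significant in a larger database, which is why the definition refers to a database of the size searched.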
Conserved domains • Domain: a sequence of amino acids that typically folds into a stable tertiary structure. • Many proteins are multi-domain.
BLAST to PSI-BLAST • BLAST uses a general scoring matrix (e.g., BLOSUM62) derived from a large number of proteins. • What if you want to find homologs based on a specific gene product? • Develop a position-specific scoring matrix (PSSM).
PSSM
      M F W Y G A P V I L C R K E N D Q S T H
M     5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G     1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0
A     1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0
S     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 2 0
F     0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Determine the frequency of each substitution at each position (here, counts from five aligned sequences), then convert it to a log-odds score.
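The count-to-score step can be sketched in Python. This is a minimal illustration, not PSI-BLAST's own procedure: it assumes a uniform background frequency of 1/20 per amino acid and adds a simple pseudocount of 1 to every cell (PSI-BLAST itself derives pseudocounts from a substitution matrix). The counts are the five-sequence matrix from the slide; the names `ALIGNMENT_COUNTS` and `log_odds_pssm` are hypothetical.

```python
import math

# Column order used on the slide.
AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"

# Counts from the slide: five aligned sequences, query positions M, G, A, S, F.
ALIGNMENT_COUNTS = [
    ("M", {"M": 5}),
    ("G", {"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1}),
    ("A", {"M": 1, "A": 4}),
    ("S", {"S": 3, "T": 2}),
    ("F", {"F": 4, "Y": 1}),
]


def log_odds_pssm(columns, pseudocount=1.0, background=None):
    """Return one {amino_acid: log2-odds score} dict per alignment position."""
    if background is None:
        # Assumption: every amino acid has background frequency 1/20.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for _query_residue, column in columns:
        # Pseudocounts keep unseen residues from scoring -infinity.
        total = sum(column.values()) + pseudocount * len(AMINO_ACIDS)
        row = {}
        for aa in AMINO_ACIDS:
            freq = (column.get(aa, 0) + pseudocount) / total
            row[aa] = math.log2(freq / background[aa])
        pssm.append(row)
    return pssm


if __name__ == "__main__":
    pssm = log_odds_pssm(ALIGNMENT_COUNTS)
    for (query_residue, _), row in zip(ALIGNMENT_COUNTS, pssm):
        # Score of the query's own residue at each position.
        print(query_residue, round(row[query_residue], 2))
```

Under these assumptions the fully conserved first column scores M at log2(4.8) ≈ 2.26 bits, while the variable second column scores its query residue G at only log2(1.6) ≈ 0.68 bits.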
PSSM with INDEL
      M F W Y G A P V I L C R K E N D Q S T H
M     5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G     1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0
A     1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0
S     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 2 0
F     0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Indel 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
A PSSM can also include a score for permitting insertions and deletions at a specific position. Perhaps this position lies at a turn, where INDELs are common.
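Position-specific INDEL scores can be folded into a standard dynamic-programming alignment. The sketch below is a simplified, hypothetical illustration: it aligns a sequence globally against a profile using linear (not affine) gap penalties that differ per column, and it uses toy scores (+2 for the consensus residue, -1 otherwise) rather than real log-odds values. Lowering the gap penalty at one column models a position, such as a turn, where INDELs are common. The helpers `toy_profile` and `align_to_profile` are illustrative names.

```python
# Column order used on the slide.
AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"


def toy_profile(consensus):
    """Hypothetical profile: +2 for the consensus residue, -1 for any other."""
    return [{aa: (2.0 if aa == c else -1.0) for aa in AMINO_ACIDS} for c in consensus]


def align_to_profile(seq, profile, gap_pen):
    """Best global alignment score of `seq` against `profile`.

    profile[j] maps amino acid -> score for column j, and gap_pen[j] is the
    (linear) penalty for any gap touching column j.
    """
    n, m = len(seq), len(profile)
    neg_inf = float("-inf")
    f = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                # Residue i-1 aligned to column j-1.
                best = f[i - 1][j - 1] + profile[j - 1][seq[i - 1]]
            if j > 0:
                # Deletion: column j-1 has no residue in `seq`.
                best = max(best, f[i][j - 1] - gap_pen[j - 1])
            if i > 0:
                # Insertion: extra residue placed after column j-1 (or before column 0).
                best = max(best, f[i - 1][j] - gap_pen[max(j - 1, 0)])
            f[i][j] = best
    return f[n][m]


if __name__ == "__main__":
    profile = toy_profile("MGASF")
    uniform = [4.0, 4.0, 4.0, 4.0, 4.0]
    turn_at_g = [4.0, 0.5, 4.0, 4.0, 4.0]  # hypothetical: column G sits in a turn
    print(align_to_profile("MASF", profile, uniform))
    print(align_to_profile("MASF", profile, turn_at_g))
```

Here "MASF" lacks the G column, so the cheaper gap at G raises the best score from 4.0 to 7.5: the alignment pays 0.5 instead of 4.0 to skip that column.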
PSSM • In evaluating (scoring) alignments, PSSM approaches typically: • Reward matches in columns with conserved amino acids • Penalize mismatches in columns with conserved amino acids more than mismatches in variable columns
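Both properties fall out of log-odds scoring. The sketch below rebuilds the slide's count matrix under simplifying assumptions (uniform 1/20 background, pseudocount of 1 per cell; neither is PSI-BLAST's exact scheme) and scores ungapped variants of the query MGASF. The helper names `column_score` and `ungapped_score` are hypothetical.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"

# Slide counts for query positions M, G, A, S, F (five aligned sequences).
COUNTS = [
    {"M": 5},
    {"M": 1, "G": 1, "L": 1, "E": 1, "Q": 1},
    {"M": 1, "A": 4},
    {"S": 3, "T": 2},
    {"F": 4, "Y": 1},
]


def column_score(column, aa, pseudocount=1.0):
    """log2-odds of `aa` in one column, assuming a uniform 1/20 background."""
    total = sum(column.values()) + pseudocount * len(AMINO_ACIDS)
    freq = (column.get(aa, 0) + pseudocount) / total
    return math.log2(freq * len(AMINO_ACIDS))


def ungapped_score(seq):
    """Sum of per-column scores for a sequence the same length as the profile."""
    return sum(column_score(col, aa) for col, aa in zip(COUNTS, seq))


if __name__ == "__main__":
    query = ungapped_score("MGASF")
    cost_conserved = query - ungapped_score("WGASF")  # mismatch in conserved column 1
    cost_variable = query - ungapped_score("MWASF")   # mismatch in variable column 2
    print(round(cost_conserved, 3), round(cost_variable, 3))
```

Under these assumptions, substituting W for the fully conserved M costs log2(6) ≈ 2.58 bits, whereas substituting the same W at the variable second column costs only log2(2) = 1 bit.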
PSI-BLAST • Input a single query sequence. • The program executes a BLAST run. • It takes the significant hits and incorporates the matches into a PSSM. • Sequences >98% similar are not included (to avoid biasing the PSSM).
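A single round of this hit-selection step might look like the sketch below. It is hypothetical on several counts: it assumes hits arrive already aligned to the query without gaps, uses an E-value inclusion cutoff of 0.005 purely as an illustrative value, and applies the >98% rule as percent identity against the query and against hits already kept. The surviving sequences then supply the per-column counts from which a PSSM is built. `Hit`, `select_hits`, and `column_counts` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    aligned: str   # hit residues aligned to the query (assumed ungapped, same length)
    evalue: float


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions with identical residues."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def select_hits(query, hits, inclusion_evalue=0.005, max_identity=98.0):
    """Keep significant hits, skipping near-duplicates so they don't bias the PSSM."""
    kept = [query]
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > inclusion_evalue:
            continue  # not significant enough to include
        if any(percent_identity(hit.aligned, s) > max_identity for s in kept):
            continue  # >98% identical to a sequence already in the model
        kept.append(hit.aligned)
    return kept


def column_counts(aligned_seqs):
    """Count residues per column; these counts feed the log-odds conversion."""
    counts = [{} for _ in aligned_seqs[0]]
    for seq in aligned_seqs:
        for j, aa in enumerate(seq):
            counts[j][aa] = counts[j].get(aa, 0) + 1
    return counts


if __name__ == "__main__":
    hits = [
        Hit("copy_of_query", "MGASF", 1e-10),
        Hit("hit_a", "MLATF", 1e-6),
        Hit("hit_b", "MEASY", 1e-4),
        Hit("weak_hit", "MQMTF", 0.2),
    ]
    kept = select_hits("MGASF", hits)
    print(kept)
    print(column_counts(kept))
```

In this toy run the exact copy of the query is dropped by the identity filter and the weak hit is dropped by the E-value cutoff, so only the query and the two distinct significant hits contribute counts.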
Power of approach: • PSI-BLAST is iterative. • Each round takes the best hits and uses them to improve the scoring matrix.
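The iterative loop can be illustrated with a toy search. This is a hypothetical sketch, not PSI-BLAST: the "database" holds equal-length, ungapped sequences, a fixed raw-score cutoff stands in for the E-value threshold, and the PSSM uses simple assumptions (pseudocount of 1, uniform 1/20 background). Each round rebuilds the PSSM from the query plus the current hits, and the loop stops when the hit set stops changing. The names `build_pssm`, `score`, and `iterate_search` are illustrative.

```python
import math

AMINO_ACIDS = "MFWYGAPVILCRKENDQSTH"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """log2-odds PSSM from equal-length sequences (uniform 1/20 background assumed)."""
    pssm = []
    for j in range(len(aligned_seqs[0])):
        column = [seq[j] for seq in aligned_seqs]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return pssm


def score(seq, pssm):
    """Ungapped score of a sequence against the PSSM."""
    return sum(row[aa] for row, aa in zip(pssm, seq))


def iterate_search(query, database, threshold, max_iter=5):
    """Repeat search -> rebuild PSSM until the set of hits stops changing."""
    included, history = [], []
    for _ in range(max_iter):
        pssm = build_pssm([query] + included)
        hits = sorted(s for s in database if score(s, pssm) >= threshold)
        history.append(hits)
        if hits == included:
            break  # converged: this round returned the same hit set
        included = hits
    return included, history


if __name__ == "__main__":
    database = ["WWWWW", "MLATY", "GASFM", "MGATF"]
    final_hits, rounds = iterate_search("MGASF", database, threshold=2.5)
    for n, hits in enumerate(rounds, start=1):
        print(n, hits)
```

With these toy values, the first round (PSSM built from the query alone) finds only MGATF. Once that hit is folded into the PSSM, the more distant MLATY clears the cutoff in round 2, and round 3 returns the same two hits, so the loop stops.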
The PSSM will skew towards this region