Scoring Matrices

Jimmy

Presentation Transcript


  1. Scoring Matrices

  2. Different Scoring Rules Lead to Different Alignments • Example: Score = 5 × (# matches) + (-4) × (# mismatches) + (-7) × (total length of all gaps) • Example: Score = 5 × (# matches) + (-4) × (# mismatches) + (-5) × (# gap openings) + (-2) × (total length of all gaps)
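
A minimal Python sketch of the two scoring rules, applied to one fixed alignment written as two gapped strings. The function name `alignment_score` and its parameters are hypothetical. A gap of length L costs `gap_open + L * gap_extend`, so the first rule corresponds to `gap_open=0, gap_extend=-7` and the second to `gap_open=-5, gap_extend=-2`.

```python
def alignment_score(top, bottom, match=5, mismatch=-4,
                    gap_open=0, gap_extend=-7):
    """Score a pairwise alignment given as two equal-length gapped strings.

    A gap of length L costs gap_open + L * gap_extend (hypothetical helper).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0
    prev_gap_row = None  # which string had a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("column contains two gaps")
        if a == "-" or b == "-":
            gap_row = 0 if a == "-" else 1
            if gap_row != prev_gap_row:  # a new gap opens in this column
                score += gap_open
            score += gap_extend          # every gap column adds to the length
            prev_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


# Same alignment, two rules: 5 matches, 1 mismatch, one gap of length 2.
print(alignment_score("ACGT--TA", "ACGTCCTT"))                            # → 7
print(alignment_score("ACGT--TA", "ACGTCCTT", gap_open=-5, gap_extend=-2))  # → 12
```

Here the two-column gap costs 14 under the first rule but only 9 under the second. Because long gaps are relatively cheaper under the affine rule, an aligner using it can prefer a different alignment than one using the linear rule.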

  3. Scoring Rules/Matrices • Why are they important? • The choice of a scoring rule can strongly influence the outcome of sequence analysis • What do they mean? • Scoring matrices implicitly represent a particular theory of evolution • Elements of the matrices specify the similarity of one residue to another

  4. The S_ij Entries of a Scoring Matrix (as Log-Likelihood Ratios)

  5. The score of an alignment of two sequences is the log-likelihood ratio of that alignment under two models • Common ancestry • Chance

  6. Likelihood Ratio for Aligning a Single Pair of Residues • Numerator: the probability that the two residues are aligned because of evolutionary descent • Denominator: the probability that they are aligned by chance • p_i, p_j are the frequencies (abundances) of residues i and j across all sequences
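
Written as a formula, the single-pair ratio takes the following form. The symbol q_ij is introduced here for the probability that residues i and j are aligned because they descend from a common ancestor; the base of the logarithm only fixes the units of the score.

```latex
S_{ij} = \log \frac{q_{ij}}{p_i \, p_j}
```

S_ij is positive when the pair is observed in related sequences more often than chance predicts, and negative when it is observed less often.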

  7. Likelihood Ratio of Aligning Two Sequences
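
Assuming aligned positions are independent, the likelihood ratio of a whole ungapped alignment is the product of the per-pair ratios, so its logarithm is the sum of the per-pair log-odds scores. The sketch below uses a hypothetical two-letter alphabet with made-up probabilities `TOY_Q` and `TOY_P`, purely for illustration.

```python
import math

# Hypothetical toy model over a two-letter alphabet {A, B}.
TOY_P = {"A": 0.5, "B": 0.5}                  # background abundances p_i
TOY_Q = {("A", "A"): 0.4, ("B", "B"): 0.4,    # probability of each aligned pair
         ("A", "B"): 0.1, ("B", "A"): 0.1}    # under common ancestry


def pair_log_odds(q, p, a, b):
    """log2 of q_ab / (p_a * p_b): the score of one aligned residue pair."""
    return math.log2(q[(a, b)] / (p[a] * p[b]))


def alignment_log_odds(seq1, seq2, q, p):
    """Log-likelihood ratio (common ancestry vs chance) of an ungapped alignment.

    Positions are assumed independent, so the log of the product of
    per-pair ratios equals the sum of per-pair log-odds scores.
    """
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_log_odds(q, p, a, b) for a, b in zip(seq1, seq2))


print(alignment_log_odds("AAB", "AAA", TOY_Q, TOY_P))
```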

  8. Two classes of widely used protein scoring matrices • PAM = Percent Accepted Mutations: 1500 changes in 71 groups with > 85% similarity • BLOSUM = Blocks Substitution Matrix: 2000 “blocks” from 500 families

  9. PAM and BLOSUM matrices are all log-likelihood matrices • More specifically, in a matrix scaled in half-bit units (score = 2 log2 of the likelihood ratio, as in BLOSUM62), an alignment that scores 6 is 2^(6/2) = 8 times as likely to arise from common ancestry as by chance.
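
For a matrix in half-bit units, converting a score back into a likelihood ratio takes one line. The function and its `units_per_bit` parameter are hypothetical names; a value of 2 means half-bit units, so score = 2 log2(ratio).

```python
def likelihood_ratio(score, units_per_bit=2):
    """Invert score = units_per_bit * log2(ratio) to recover the ratio."""
    return 2 ** (score / units_per_bit)


print(likelihood_ratio(6))   # → 8.0
print(likelihood_ratio(-2))  # → 0.5
```

A matrix scaled differently needs a different `units_per_bit`, so the same raw score does not mean the same odds in every matrix.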

  10. Constructing BLOSUM (BLOcks SUbstitution Matrix) Matrices

  11. BLOSUM Matrices for Specific Similarity Levels • Sequences whose similarity is at or above a threshold are clustered. • If the clustering threshold is 62%, the final matrix is BLOSUM62
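
A minimal sketch of the clustering step, assuming ungapped, equal-length block sequences and single-linkage grouping (a sequence joins a cluster if its identity with any member reaches the threshold). The function names and the toy block are hypothetical, and the later down-weighting of each cluster, so that near-identical sequences do not dominate the pair counts, is not shown.

```python
from itertools import combinations


def percent_identity(s, t):
    """Percent of aligned positions with identical residues (ungapped, equal length)."""
    return 100.0 * sum(a == b for a, b in zip(s, t)) / len(s)


def cluster_sequences(seqs, threshold):
    """Single-linkage clusters of sequence indices at >= threshold percent identity."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if percent_identity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


block = ["AVTL", "AVTI", "SVRL", "GKRW"]  # hypothetical toy block
print(cluster_sequences(block, 62))  # → [[0, 1], [2], [3]]
print(cluster_sequences(block, 50))  # → [[0, 1, 2], [3]]
```

Lowering the threshold merges more sequences into fewer clusters, which is why a lower-numbered BLOSUM matrix reflects more distant relationships.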

  12. A toy example of constructing a BLOSUM matrix from 4 training sequences

  13. Constructing a BLOSUM matrix: 1. Counting mutations

  14. 2. Tallying mutation frequencies

  15. 3. Matrix of mutation probabilities

  16. 4. Calculating the abundance of each residue (marginal probability)
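
One common way to write steps 3 and 4, following the Henikoffs' formulation as we understand it. The symbols are introduced here: f_ij is the number of (i, j) residue pairs tallied in step 2, with each unordered pair counted once (i ≤ j), q_ij is the observed pair probability, and p_i is the marginal probability (abundance) of residue i.

```latex
q_{ij} = \frac{f_{ij}}{\sum_{k \le l} f_{kl}}, \qquad
p_i = q_{ii} + \sum_{j \ne i} \frac{q_{ij}}{2}
```

Each mixed pair contributes half of its probability to each of its two residues, which is why q_ij is halved in the marginal.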

  17. 5. Obtaining a BLOSUM matrix
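
The five toy steps can be sketched end to end in Python. This is a minimal illustration, not the Henikoffs' code: it takes a single hypothetical block, skips the clustering and weighting used for real BLOSUM matrices, compares q_ij with the chance expectation e_ij (p_i² for identical pairs, 2 p_i p_j otherwise), and rounds 2 log2(q_ij / e_ij) to half-bit integers. Pairs that never co-occur get no entry.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_from_block(block, scale=2):
    """BLOSUM-style log-odds scores from one ungapped block (no clustering)."""
    # Steps 1-2: count every unordered residue pair within each column.
    f = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            f[tuple(sorted((a, b)))] += 1
    total = sum(f.values())

    # Step 3: observed pair probabilities q_ij.
    q = {pair: n / total for pair, n in f.items()}

    # Step 4: marginal probabilities p_i = q_ii + sum_{j != i} q_ij / 2.
    p = Counter()
    for (a, b), qab in q.items():
        p[a] += qab / 2
        p[b] += qab / 2

    # Step 5: log-odds against the chance expectation e_ij, in half-bit units.
    scores = {}
    for (a, b), qab in q.items():
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(qab / e))
    return scores


toy_block = ["AAB", "AAB", "ABB", "AAB"]  # hypothetical 4-sequence block
print(blosum_from_block(toy_block))
# → {('A', 'A'): 1, ('A', 'B'): -3, ('B', 'B'): 2}
```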

  18. Constructing the real BLOSUM62 Matrix

  19. Steps 1, 2, 3: Mutation Frequency Table

  20. 4. Calculate Amino Acid Abundance

  21. 5. Obtaining BLOSUM62 Matrix

  22. BLOSUM matrices reference • S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919 • Training data: ~2000 conserved blocks from the BLOCKS database, i.e. ungapped, aligned protein segments; each block represents a conserved region of a protein family

  23. Break • Homework

  24. PAM Matrices (Point Accepted Mutations) • Mutations accepted by natural selection

  25. Constructing PAM Matrix: Training Data

  26. PAM: Phylogenetic Tree

  27. PAM: Accepted Point Mutation

  28. Mutability of Residue j

  29. Total Mutation Rate • The mutation rates of all amino acids combined, each weighted by that amino acid's abundance

  30. Normalize the Total Mutation Rate to 1% • This defines an evolutionary period: the period during which 1% of all amino acid residues undergo (accepted) mutations

  31. Mutation Probability Matrix Normalized Such that the Total Mutation Rate is 1%
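
A minimal sketch of the PAM1 construction in slides 28 to 31, under stated assumptions. `A` is a hypothetical symmetric table of accepted point mutations between residues. Relative mutability is taken as (changes involving j) / (abundance of j), and a scale factor λ is chosen so that the abundance-weighted total mutation rate equals 1%. Here `M[j][i]` is the probability that residue j becomes i in one PAM1 period, so rows sum to 1; other presentations, including Dayhoff's tables, may use the transposed orientation.

```python
def pam1_matrix(A, freq, target_rate=0.01):
    """PAM1-style mutation probability matrix (hypothetical helper).

    A[i][j]: accepted mutations between residues i and j (symmetric, diagonal ignored).
    freq[j]: abundance of residue j (assumed positive).
    Returns M with M[j][i] = P(residue j becomes i in one PAM1 period).
    """
    n = len(A)
    # Changes involving each residue j (off-diagonal row sums).
    changes = [sum(A[j][i] for i in range(n) if i != j) for j in range(n)]
    # Relative mutability: how often j changes, relative to how common it is.
    mutability = [changes[j] / freq[j] for j in range(n)]
    # Scale so that sum_j freq[j] * P(j mutates) equals target_rate.
    lam = target_rate / sum(f * m for f, m in zip(freq, mutability))

    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        p_mutate = lam * mutability[j]
        for i in range(n):
            if i == j:
                M[j][j] = 1.0 - p_mutate
            elif changes[j] > 0:
                # Split j's mutation probability in proportion to observed changes.
                M[j][i] = p_mutate * A[j][i] / changes[j]
    return M


def total_mutation_rate(M, freq):
    """Abundance-weighted probability that a residue mutates in one period."""
    return sum(f * (1.0 - M[j][j]) for j, f in enumerate(freq))


TOY_A = [[0, 6, 2],
         [6, 0, 4],
         [2, 4, 0]]          # hypothetical accepted-mutation counts
TOY_FREQ = [0.5, 0.3, 0.2]   # hypothetical abundances
M1 = pam1_matrix(TOY_A, TOY_FREQ)
print(total_mutation_rate(M1, TOY_FREQ))  # about 0.01 by construction
```

With this choice of mutability, freq[j] * M[j][i] equals λ * A[j][i], so the toy model is time-reversible whenever `A` is symmetric.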

  32. Mutation Probability Matrix (transposed), entries multiplied by 10,000

  33. • PAM1 mutation probability matrix • PAM2 mutation probability matrix? • Mutations that happen over twice the evolutionary period of PAM1

  34. PAM Matrix: Assumptions

  35. In two PAM1 periods: • {AR} = {AA and AR} or {AN and NR} or {AD and DR} or … or {AV and VR}

  36. Entries in a PAM2 Mutation Probability Matrix
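
The two-period decomposition of slide 35 can be written as a sum over the intermediate residue k. This assumes M_xy denotes the probability that residue x becomes y in one PAM1 period; under that convention the PAM2 matrix is the matrix product M·M.

```latex
M^{(2)}_{AR} = \sum_{k} M_{Ak} \, M_{kR}
            = M_{AA} M_{AR} + M_{AN} M_{NR} + \cdots + M_{AV} M_{VR},
\qquad M^{(2)} = M \cdot M
```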

  37. PAM-k Mutation Prob. Matrix
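
Under the PAM assumption that every period uses the same PAM1 probabilities, independently of earlier periods (a Markov chain), the PAM-k matrix is the k-th power of the PAM1 matrix. A minimal pure-Python sketch using repeated squaring; the 2 x 2 matrix `TOY_M1` is hypothetical, with rows as "from" residues.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_k(M, k):
    """PAM-k mutation probability matrix: M (PAM1) raised to the k-th power."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = M
    while k > 0:
        if k & 1:                      # include this power of two
            result = mat_mul(result, base)
        base = mat_mul(base, base)     # M, M^2, M^4, M^8, ...
        k >>= 1
    return result


TOY_M1 = [[0.99, 0.01],
          [0.02, 0.98]]               # hypothetical PAM1 matrix
print(pam_k(TOY_M1, 2)[0][1])          # 0.99*0.01 + 0.01*0.98
```

Repeated squaring needs only about log2(k) matrix products, so PAM250 takes a handful of multiplications instead of 249.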

  38. PAM-k log-likelihood matrix
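
If residue i has abundance p_i and becomes j with probability (M^k)_ij after k periods, the pair (i, j) has probability p_i (M^k)_ij under descent and p_i p_j by chance, so the log-likelihood ratio reduces to log((M^k)_ij / p_j). The sketch scales by 10 log10, a scale often used for PAM tables; the scale, the function name, and the toy matrix are assumptions for illustration.

```python
import math


def pam_log_odds(Mk, p, scale=10):
    """Integer log-odds matrix S[i][j] = round(scale * log10(Mk[i][j] / p[j])).

    Mk[i][j]: probability that residue i becomes j after k PAM periods (assumed > 0).
    p[j]:     abundance of residue j.
    For a reversible model (p[i]*Mk[i][j] == p[j]*Mk[j][i]) the result is symmetric.
    """
    n = len(Mk)
    return [[round(scale * math.log10(Mk[i][j] / p[j])) for j in range(n)]
            for i in range(n)]


TOY_MK = [[0.8, 0.2],
          [0.2, 0.8]]     # hypothetical PAM-k matrix
TOY_P = [0.5, 0.5]        # hypothetical abundances
print(pam_log_odds(TOY_MK, TOY_P))  # → [[2, -4], [-4, 2]]
```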

  39. PAM-250

  40. PAM60 corresponds to about 60% similarity, PAM80 to about 50%, PAM120 to about 40% • The PAM250 matrix gives better-scoring alignments than lower-numbered PAM matrices for proteins of 14 to 27% similarity

  41. PAM Matrices: Reference • Atlas of Protein Sequence and Structure, Suppl 3, 1978, M.O. Dayhoff. ed. National Biomedical Research Foundation, 1

  42. Choice of Scoring Matrix

  43. Comparing Scoring Matrices • PAM: based on extrapolation from a short evolutionary period; tracks evolutionary origins; homologous sequences during evolution • BLOSUM: based on a range of evolutionary periods; conserved blocks; finds conserved domains

  44. Sources of Error in PAM
