Reporting Protein Identifications from MS/MS Results

Brian C. Searle, Proteome Software Inc., Portland, Oregon, USA. Brian.Searle@ProteomeSoftware.com. Creative Commons Attribution

Outline • Assigning Proteins from Peptide IDs • Correcting for One-Hit-Wonders • Protein False Discovery Rates? • Correcting for Shared Peptides • Publication Standards

Just to Review: forward (F) matches are possibly correct; reverse (R) matches are clearly wrong. Elias JE, Gygi SP. Nat Methods. 2007 Mar;4(3):207-14.


Just to Review: ?

…Well, Maybe

[Diagram: peptides AEPTIR, IDVCIVLLQHK, and NTGDR all map to one Protein]

Peptide probabilities: AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25%. Protein probability: ??%

FDRs for Whole Datasets vs Individual Peptides • Cumulative FDRs only estimate the validity of a data set • Probabilities (or instantaneous FDRs) estimate the validity of a peptide of interest
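To make the "whole data set" point concrete, here is a minimal sketch of a cumulative FDR estimate at a score threshold. It assumes a separate decoy search of the same size as the target search, so that the number of decoy matches above the threshold estimates the number of incorrect target matches. The function name and the scores are hypothetical, chosen only for illustration.

```python
def cumulative_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR of ALL target matches scoring at or above threshold.

    Assumption: decoy matches above the threshold estimate the number of
    incorrect target matches above the same threshold.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


# Hypothetical scores, for illustration only.
target_scores = [4.2, 3.9, 3.1, 2.8, 2.5, 1.9, 1.2, 0.8, 0.3, -0.4]
decoy_scores = [1.5, 0.6, 0.1, -0.7, -1.3]

print(cumulative_fdr(target_scores, decoy_scores, threshold=1.0))  # 1 decoy / 7 targets
```

The result describes the accepted set as a whole; it says nothing about whether the match scoring 1.2 is more or less trustworthy than the one scoring 4.2.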

One Possible Approach • Instantaneous False Discovery Rate Many Others: • PeptideProphet (TPP, Scaffold) • Percolator • Spectral Energies • RAId De Novo


Just to Review: [Figure: matches grouped into score bins, from "-2 to -1" up to "4 to 5"]

Histogram of Decoy Matches [Figure: # of Matches vs. Ion Score – Identity Score; series: “2x Decoy” and “Correct”]

Curve Fit Distributions [Figure: # of Matches vs. Ion Score – Identity Score; fitted curves: “2x Decoy” and “Correct”] Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res. 2008 Jan;7(1):286-92.

Instantaneous FDR Method [Figure: # of Matches vs. Ion Score – Identity Score; fitted curves: “2x Decoy” and “Correct”] Choi H, Ghosh D, Nesvizhskii AI. J Proteome Res. 2008 Jan;7(1):286-92.
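A rough sketch of the instantaneous FDR idea: instead of counting everything above a threshold, compare decoy and target counts within each score bin. This version uses raw histogram counts rather than the curve-fit distributions of Choi et al., and it assumes a separate, equal-size decoy search so that decoy counts in a bin estimate the incorrect target matches in that bin. Names, bin width, and scores are hypothetical.

```python
import math
from collections import Counter


def instantaneous_fdr(target_scores, decoy_scores, bin_width=1.0):
    """Return {bin lower edge: estimated fraction of incorrect target matches}.

    Simplification: raw bin counts, not fitted curves.
    """
    def bin_of(score):
        return math.floor(score / bin_width) * bin_width

    target_counts = Counter(bin_of(s) for s in target_scores)
    decoy_counts = Counter(bin_of(s) for s in decoy_scores)
    return {
        edge: min(1.0, decoy_counts.get(edge, 0) / count)
        for edge, count in sorted(target_counts.items())
    }


# Hypothetical scores, for illustration only.
target_scores = [4.2, 3.9, 3.1, 2.8, 2.5, 1.9, 1.2, 0.8, 0.3, -0.4]
decoy_scores = [1.5, 0.6, 0.1, -0.7, -1.3]

local_fdr = instantaneous_fdr(target_scores, decoy_scores)
for edge, fdr in local_fdr.items():
    # A peptide probability can be read as 1 - instantaneous FDR.
    print(f"bin starting at {edge:+.0f}: FDR {fdr:.2f}, probability {1 - fdr:.2f}")
```

With so few matches the per-bin estimates are very noisy, which is one motivation for fitting smooth curves to the two distributions before taking the ratio.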

Peptide probabilities: AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25%. Protein probability: ??%

Chance each peptide is wrong: AEPTIR (15%), IDVCIVLLQHK (35%), NTGDR (75%). Chance the protein is wrong: (??%)

Chance all three peptides are wrong: 0.15 * 0.35 * 0.75 = 0.04, so the chance the protein is wrong is (4%)

Peptide probabilities: AEPTIR 85%, IDVCIVLLQHK 65%, NTGDR 25%. Protein probability: 1 - 0.04 = 96% Feng J, Naiman DQ, Cooper B. Anal Chem. 2007 May 15;79(10):3901-11.
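The arithmetic on this slide (0.15 * 0.35 * 0.75 = 0.04) can be written as a small function. This is only a sketch: it assumes the peptide identifications are independent, which is what multiplying the error chances implies. The function name is hypothetical.

```python
import math


def protein_probability(peptide_probabilities):
    """Probability that at least one peptide is correct.

    Assumption: peptide identifications are independent, so the chance that
    every peptide is wrong is the product of the individual error chances.
    """
    chance_all_wrong = math.prod(1.0 - p for p in peptide_probabilities)
    return 1.0 - chance_all_wrong


peptides = {"AEPTIR": 0.85, "IDVCIVLLQHK": 0.65, "NTGDR": 0.25}
probability = protein_probability(peptides.values())
print(f"{probability:.2f}")  # 0.96
```

Note that even the 25% peptide raises the protein probability a little, and that a single 85% peptide on its own would simply give an 85% protein.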

If only it were so easy!

[Diagram: Peptide 1 through Peptide 10] 80% Peptides

[Diagram: Peptide 1 through Peptide 10; Correct: Protein A, Protein B] 80% Peptides

[Diagram: Peptide 1 through Peptide 10; Correct: Protein A, Protein B; Incorrect: Protein C, Protein D] 80% Peptides, 50% Proteins
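One way to reproduce the slide's numbers (80% of peptides, 50% of proteins) is sketched below. The exact peptide-to-protein grouping is an assumption made for illustration: correct peptides are taken to pile up on the two real proteins, while each incorrect peptide lands on its own protein.

```python
# (peptide, protein, peptide_is_correct) -- the grouping is hypothetical.
matches = [
    ("Peptide 1", "Protein A", True),
    ("Peptide 2", "Protein A", True),
    ("Peptide 3", "Protein A", True),
    ("Peptide 4", "Protein A", True),
    ("Peptide 5", "Protein B", True),
    ("Peptide 6", "Protein B", True),
    ("Peptide 7", "Protein B", True),
    ("Peptide 8", "Protein B", True),
    ("Peptide 9", "Protein C", False),
    ("Peptide 10", "Protein D", False),
]


def fraction_correct_peptides(matches):
    return sum(1 for _, _, ok in matches if ok) / len(matches)


def fraction_correct_proteins(matches):
    # A protein counts as correct if at least one of its peptides is correct.
    protein_ok = {}
    for _, protein, ok in matches:
        protein_ok[protein] = protein_ok.get(protein, False) or ok
    return sum(protein_ok.values()) / len(protein_ok)


print(fraction_correct_peptides(matches))  # 0.8
print(fraction_correct_proteins(matches))  # 0.5
```

The same 20% peptide error rate becomes a 50% protein error rate because wrong peptides rarely agree with each other, so each one creates its own one-hit protein.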

One hit wonders are dubious at best

Outline • Assigning Proteins from Peptide IDs • Correcting for One-Hit-Wonders • Protein False Discovery Rates? • Correcting for Shared Peptides • Publication Standards

[Figure: Actual Probability vs. Computed Probability] Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

[Figure: Actual Probability vs. Computed Probability, with regions marked UNDER estimation and OVER estimation] Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

What if we could score one-hit-wonderness? Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

Combining different peptides • Quantify as a score: If different peptides agree: Good! If peptides are one-hit-wonders: Bad! • Peptide agreement score: the NSP score for peptide (k) is the sum of the other agreeing peptides (not k) Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658
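A minimal sketch of the peptide agreement (NSP) score, reading "the sum of other agreeing peptides" as the sum of the probabilities of the other peptides assigned to the same protein. That reading, and the function name, are assumptions made for illustration.

```python
def nsp_scores(peptide_probabilities):
    """NSP score for each peptide k: sum of the probabilities of all
    OTHER peptides (not k) assigned to the same protein."""
    total = sum(peptide_probabilities.values())
    return {
        peptide: total - probability
        for peptide, probability in peptide_probabilities.items()
    }


multi_hit = {"AEPTIR": 0.85, "IDVCIVLLQHK": 0.65, "NTGDR": 0.25}
one_hit = {"AEPTIR": 0.85}

print(nsp_scores(multi_hit))  # AEPTIR is supported by 0.65 + 0.25
print(nsp_scores(one_hit))    # a one-hit wonder has no support at all
```

A one-hit wonder always scores 0, so the NSP score gives a direct, numeric handle on "one-hit-wonderness".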

ProteinProphet Distributions [Figure: distributions for One-hit Wonders and Multi-hit Proteins]

ProteinProphet Distributions • multi-hit proteins (increase prob) • in between (keep same) • one hit wonders (decrease prob)
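The increase / keep / decrease behavior can be sketched as a Bayes update of a peptide probability by how typical its NSP score is for correct versus incorrect peptides. The likelihood ratios below are invented purely for illustration; ProteinProphet estimates its distributions from the data, and this is not its actual implementation.

```python
def adjust_for_nsp(probability, likelihood_ratio):
    """Bayes update of a peptide probability.

    likelihood_ratio = P(NSP bin | correct) / P(NSP bin | incorrect)
    """
    weighted = probability * likelihood_ratio
    return weighted / (weighted + (1.0 - probability))


# Hypothetical likelihood ratios for three NSP ranges (illustration only).
HYPOTHETICAL_RATIOS = {
    "one hit wonder": 0.3,      # more typical of incorrect peptides
    "in between": 1.0,          # uninformative
    "multi-hit protein": 5.0,   # more typical of correct peptides
}

for label, ratio in HYPOTHETICAL_RATIOS.items():
    print(f"{label}: 0.80 -> {adjust_for_nsp(0.80, ratio):.2f}")
```

A ratio of exactly 1 leaves the probability untouched, which corresponds to the "in between (keep same)" region.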

[Figure: Actual Probability vs. Computed Probability, with regions marked UNDER estimation and OVER estimation] Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

[Figure: Actual Probability vs. Computed Probability; curves: with NSP and without NSP] Nesvizhskii, A. I.; Keller, A. et al. Anal. Chem. 75, 4646-4658

Brian, I hate math. What do I do?

Option 1: Throw Out One-Hit-Wonders Advantages: Easy, works! Disadvantages: Loss of sensitivity!

Option 2: Use Multiple Filters Filter 1 - Protein Mode • ≥2 peptides/protein • moderate spectrum threshold Filter 2 - Peptide Mode • 1 peptide/protein • high spectrum threshold

Option 2: Use Multiple Filters Advantages: More sensitive! Disadvantages: Pretty arbitrary!
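The two-filter idea can be sketched as a single acceptance rule. The thresholds (0.80 and 0.95) are hypothetical placeholders, which is exactly the "pretty arbitrary" part; the function and parameter names are also made up for illustration.

```python
def passes_filters(peptide_probabilities,
                   protein_mode_threshold=0.80,   # hypothetical "moderate" threshold
                   peptide_mode_threshold=0.95):  # hypothetical "high" threshold
    """Accept a protein if it passes either filter."""
    # Filter 1 - Protein Mode: at least 2 peptides above a moderate threshold.
    moderate = [p for p in peptide_probabilities if p >= protein_mode_threshold]
    if len(moderate) >= 2:
        return True
    # Filter 2 - Peptide Mode: a single peptide above a high threshold.
    return any(p >= peptide_mode_threshold for p in peptide_probabilities)


print(passes_filters([0.85, 0.82]))  # True: two moderate peptides
print(passes_filters([0.97]))        # True: one very good peptide
print(passes_filters([0.85]))        # False: a one-hit wonder that is only moderate
```

Dropping Filter 2 entirely gives Option 1 (throw out all one-hit wonders), at the cost of losing proteins like the second example.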

Option 3:

Outline • Assigning Proteins from Peptide IDs • Correcting for One-Hit-Wonders • Protein False Discovery Rates? • Correcting for Shared Peptides • Publication Standards

Protein FDRs only accurate with >100 Proteins [Figure: Uncertainty in Protein FDR; Error in FDR Estimation vs. Number of Confidently IDed Proteins, with 1% marked]
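A back-of-the-envelope sketch of why small protein lists give shaky FDRs. It assumes the number of false (decoy) proteins behaves like a Poisson count, so its standard deviation is the square root of its expected value. This counting model is an assumption made for illustration and is not necessarily how the slide's curve was produced.

```python
import math


def fdr_standard_error(n_proteins, fdr):
    """Rough standard error of an FDR estimate.

    Assumption: the count of false proteins is Poisson distributed, so its
    standard deviation is sqrt(expected count); dividing by the list size
    turns that into an uncertainty on the FDR itself.
    """
    expected_false = n_proteins * fdr
    return math.sqrt(expected_false) / n_proteins


for n in (10, 100, 1000, 10000):
    print(f"{n:>5} proteins at 1% FDR: +/- {fdr_standard_error(n, 0.01):.2%}")
```

Under this model a 100-protein list at a nominal 1% FDR expects a single false protein, so the estimate can easily be off by as much as its own value; the uncertainty only shrinks with the square root of the list size.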

Histogram of Decoy PROTEIN Matches [Figure: # Protein Identifications vs. Protein Score; series: “2x Decoy” and “Correct”]
