1 / 1

Georgetown University

Georgetown University. Protein Inference by Generalized Protein Parsimony reduces False Positive Proteins in Bottom-Up Workflows. Nathan J. Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University. a ). b ). Introduction.

april
Télécharger la présentation

Georgetown University

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Georgetown University Protein Inference by Generalized Protein Parsimony reduces False Positive Proteins in Bottom-Up Workflows Nathan J. Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University a) b) Introduction Protein inference tools are poorly designed for FDR filtered peptide identifications: • True peptide identifications clusteron relatively few true proteins, and • False peptide identifications are spreadacross many different proteins, magnifyingthe number of false positive proteins. • Boostingthe number of peptide identificationsat fixed FDR increasesthe number of false positive proteins. • Successful protein inference: • must ignorea significant proportion of peptide identifications. • must ensureinferred proteins are supported by atleast two unique peptides. Figure 1: Inferred proteins for a) Sigma49 and b) Yeast. References Spectra and Peptide Identifications Generalized Protein Parsimony Traditional Protein Parsimony • Zhang, B.;  Chambers, M. C.;  Tabb, D. L. 2007. • Nesvizhskii, A. I.;  Keller, A.;  Kolker, E.;  Aebersold, R. 2003 • Dominated and equivalent proteinscan be quickly and easily eliminated. • Unique peptides force proteins into the solution. • Unresolved protein-peptide bipartite-graph can be decomposed into components. • Many components are trivial. • Greedy solution is optimal for most components. • Branch-and-bound easily finds optimal for all. • Peptides weighted by the number of spectra (peptide identifications) represented. • Constrain the minimum number of unique peptides per protein. • Minimize proteins covering a fixed proportion of the peptide identifications (c.f. FDR), or • Maximize covered peptide identifications, subject to protein constraint(s). • Greedy solution not necessarily feasible! • Branch-and-bound readily finds optimal. • Sigma49 – 32,691 LTQ MS/MS spectra of 49 human protein standards1; IPI Human • Yeast – 162,420 LTQ MS/MS spectra from a yeast cell lysate1; Saccharomyces Genome Database. • X!Tandem (no refinement), filter at 1% FDR • FDR estimation using reversed target database • 1HW Elim. – 1-hit wonders eliminated before parsimony analysis1. • Comparison with ProteinProphet2applied to FDR filtered peptide identifications PP – Protein Prophet prob. > 0; PP* – Protein Protein Prophet prob. ≥ (1-FDR) & # unique stripped peptides ≥ 2. Figure 2: FDR threshold vs. uncovered ids (Yeast). a) b) Figure 3: Large connected components in protein-peptide bipartite-graphs: a) Sigma49, b) Yeast. Rows: proteins, Columns: peptides. Observed peptides in RED. Conclusions • Inferred proteins should be supported by at least two unique peptides. • Inferred proteins from FDR-filtered peptide-identifications should leave some peptides uncovered – especially one-hit-wonders. • Branch-and-boundsolves generalizations of the protein parsimony problem optimally.

More Related