
Protein Identification by Database Searching

John Cottrell, Matrix Science. Three ways to use mass spectrometry data for protein identification. Peptide mass fingerprint: a set of peptide molecular masses from an enzyme digest of a protein.

Presentation Transcript


  1. Protein Identification by Database Searching. John Cottrell, Matrix Science

  2. Three ways to use mass spectrometry data for protein identification
     • Peptide Mass Fingerprint: a set of peptide molecular masses from an enzyme digest of a protein

  5. PMF Servers on the Web
     • ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
     • Bupid: http://zlab.bu.edu/Amemee/
     • Mascot: http://www.matrixscience.com/search_form_select.html
     • MassSearch: http://www.cbrg.ethz.ch/services/MassSearch_new
     • MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
     • PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
     • Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe
     • Mowse, PeptideSearch, Protocall, Aldente, XProteo

  6. Search
     • Parameters
       • database
       • taxonomy
       • enzyme
       • missed cleavages
       • fixed modifications
       • variable modifications
       • protein MW
       • estimated mass measurement error
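
As a rough illustration of how several of these parameters (enzyme, missed cleavages, mass measurement error) interact in a peptide mass fingerprint search, the sketch below digests a sequence in silico, computes monoisotopic peptide masses, and counts how many observed masses fall within tolerance. The function names, the toy sequence, and the simplified trypsin rule (cleave after K or R unless the next residue is P) are assumptions made for illustration; observed values are assumed to be neutral masses. Real search engines score matches far more carefully.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the residue sum of a free peptide


def tryptic_digest(sequence, missed_cleavages=1):
    """Return peptides from a simplified trypsin rule (hypothetical helper)."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    peptides = []
    # Join up to `missed_cleavages` adjacent pieces to model incomplete digestion.
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, tolerance_da=0.1, missed_cleavages=1):
    """Count observed masses that match any theoretical peptide mass."""
    theoretical = [peptide_mass(p)
                   for p in tryptic_digest(sequence, missed_cleavages)]
    return sum(1 for m in observed
               if any(abs(m - t) <= tolerance_da for t in theoretical))


if __name__ == "__main__":
    print(tryptic_digest("AKRPGK", missed_cleavages=1))
    print(count_matches([217.19, 500.0], "AKRPGK"))
```

Allowing more missed cleavages or a wider tolerance increases the number of theoretical masses that can match by chance, which is why these settings are best kept as tight as the data justify.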

  8. Henzel, W. J., Watanabe, C., Stults, J. T., JASMS 2003, 14, 931-942.

  9. Peptide Mass Fingerprint
     • Fast, simple analysis
     • High sensitivity
     • Need database of protein sequences
       • not ESTs or genomic DNA
     • Sequence must be present in database
       • or close homolog
     • Not good for mixtures
       • especially a minor component

  10. [Figure: backbone of a protonated four-residue peptide (side chains R1 to R4) with the fragment ion series labelled at each cleavage: a1-a3, b1-b3, c1-c3 containing the N-terminus and x3-x1, y3-y1, z3-z1 containing the C-terminus]
      • Roepstorff, P. and Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.

  11. Three ways to use mass spectrometry data for protein identification
      • Peptide Mass Fingerprint: a set of peptide molecular masses from an enzyme digest of a protein
      • Sequence Query: mass values combined with amino acid sequence or composition data

  12. Mann, M. and Wilm, M., Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66 4390-9 (1994).

  13. 1489.430 tag(650.213,GWSV,1079.335)
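
The query on this slide can be read as a peptide mass value followed by a tag: the residues GWSV lying between fragment masses 650.213 and 1079.335. A minimal sketch of that reading is given below: it parses the string and compares the gap between the two fragment masses with the sum of the residue masses in the tag. The regular expression and function names are assumptions for illustration and handle only a plain, unambiguous tag.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Hypothetical pattern for "<mass> tag(<low>,<SEQUENCE>,<high>)".
TAG_PATTERN = re.compile(
    r"\s*([\d.]+)\s+tag\(([\d.]+),([A-Z]+),([\d.]+)\)\s*"
)


def parse_tag(query):
    """Split a tag query into (peptide mass, low mass, tag, high mass)."""
    match = TAG_PATTERN.fullmatch(query)
    if match is None:
        raise ValueError("not a simple tag query: %r" % query)
    mass, low, tag, high = match.groups()
    return float(mass), float(low), tag, float(high)


def tag_mass_error(query):
    """Observed gap between the flanking masses minus the tag's residue sum."""
    _, low, tag, high = parse_tag(query)
    expected = sum(RESIDUE_MASS[aa] for aa in tag)
    return (high - low) - expected


if __name__ == "__main__":
    print(round(tag_mass_error("1489.430 tag(650.213,GWSV,1079.335)"), 3))
```

For this query the gap is 429.122 Da against a residue sum of about 429.201 Da, so the tag is self-consistent only if the fragment tolerance is at least roughly 0.08 Da.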

  14. Sequence Tag Servers on the Web
      • Mascot: http://www.matrixscience.com/search_form_select.html
      • MS-Seq (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
      • MultiIdent (TagIdent, etc.): http://www.expasy.org/tools/multiident/
      • PeptideSearch, Spider

  17. Sequence Tag
      • Rapid search times
        • Essentially a filter
      • Error tolerant
        • Match peptide with unknown modification or SNP
      • Requires interpretation of spectrum
        • Usually manual, hence not high throughput
      • Tag has to be called correctly
        • Although ambiguity is OK
        • 2060.78 tag(977.4,[Q|K][Q|K][Q|K]EE,1619.7)
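
To make the ambiguity point concrete, the sketch below expands a tag written with the bracketed alternatives shown on this slide into every sequence it could stand for. The function name and the tokenising regular expression are assumptions for illustration; how a given search engine handles ambiguity internally is not described in the slides.

```python
import re
from itertools import product


def expand_tag(tag):
    """List every plain sequence covered by a tag such as "[Q|K][Q|K]EE"."""
    # Each token is either a bracketed set of alternatives or a single residue.
    tokens = re.findall(r"\[([A-Z|]+)\]|([A-Z])", tag)
    choices = [alt.split("|") if alt else [single] for alt, single in tokens]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    candidates = expand_tag("[Q|K][Q|K][Q|K]EE")
    print(len(candidates), candidates[0], candidates[-1])
```

Three two-way ambiguities give eight candidate sequences, so an ambiguous tag remains a useful filter as long as the number of alternatives stays small.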

  18. Three ways to use mass spectrometry data for protein identification
      • Peptide Mass Fingerprint: a set of peptide molecular masses from an enzyme digest of a protein
      • Sequence Query: mass values combined with amino acid sequence or composition data
      • MS/MS Ions Search: uninterpreted MS/MS data from a single peptide or from a complete LC-MS/MS run

  19. SEQUEST
      • Eng, J. K., McCormack, A. L. and Yates, J. R., 3rd., An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5 976-89 (1994)

  20. MS/MS Ions Search Servers on the Web

  23. MS/MS Ions Search
      • Easily automated for high throughput
      • Can get matches from marginal data
      • Can be slow
        • No enzyme
        • Many variable modifications
        • Large database
        • Large dataset
      • MS/MS is peptide identification
        • Proteins by inference

  24. Search Parameters

  25. Search Parameters
      • Sequence Database

  26. Search Parameters
      • Sequence Database
        • Swiss-Prot (~500,000 entries)
          • High quality, non-redundant
        • NCBInr, UniRef100 (~19,000,000 entries)
          • Comprehensive, non-identical
        • EST databases (>400,000,000 entries)
          • Very large and very redundant
        • Sequences from a single genome
          • A consensus sequence
          • Peptides are lost at exon-intron boundaries
      (Entry counts are from mid-2012)

  27. Search Parameters
      • Taxonomy
        • Swiss-Prot 2010_08
          Mammalia (mammals) = 65104
            Primates = 26940
              Homo sapiens (human) = 20292
              Other primates = 6648
            Rodentia (Rodents) = 25473
              Mus. = 16358
                Mus musculus (house mouse) = 16307
              Rattus = 7533
              Other rodentia = 1582
            Other mammalia = 12691

  28. Search Parameters
      • Mass Tolerances
        • Most search engines support separate mass tolerances for precursors and fragments
        • May allow fixed units (Da, mmu) or proportional (ppm, %)
        • Some search engines can correct for selection of 13C peak
        • Unless search engine performs some type of re-calibration, need to provide conservative estimate of mass accuracy, not precision
        • This doesn’t have to be a guessing game. Run a standard, then look at the error graphs for strong matches
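
The difference between fixed and proportional tolerance units is easiest to see with numbers. The sketch below converts a tolerance in any of the four units named on this slide into daltons at a given mass and then tests a match. The function names are hypothetical, and the choice to scale proportional tolerances by the theoretical mass is an assumption.

```python
def tolerance_in_da(mass, value, unit):
    """Convert a tolerance to Da at the given mass."""
    if unit == "Da":
        return value
    if unit == "mmu":          # milli mass unit: 0.001 Da
        return value / 1000.0
    if unit == "ppm":          # parts per million of the mass
        return mass * value / 1e6
    if unit == "%":            # percent of the mass
        return mass * value / 100.0
    raise ValueError("unknown tolerance unit: %r" % unit)


def within_tolerance(observed, theoretical, value, unit):
    """True if the observed mass lies inside the tolerance window."""
    window = tolerance_in_da(theoretical, value, unit)
    return abs(observed - theoretical) <= window


if __name__ == "__main__":
    for mass in (500.0, 1000.0, 2000.0):
        print(mass, tolerance_in_da(mass, 10, "ppm"))
    print(within_tolerance(1000.005, 1000.0, 10, "ppm"))
```

A 10 ppm window is 0.01 Da at mass 1000 and 0.02 Da at mass 2000, whereas a fixed window stays the same width across the mass range.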

  29. Search Parameters
      • Enzyme can be
        • Fully specific
        • Non-specific (“no enzyme”)
      • Some search engines support
        • Limited number of missed cleavage points
        • Semi-specific enzymes
        • Enzyme mixtures

  30. Search Parameters
      • Common peak list formats
        • DTA (Sequest)
        • PKL (Masslynx)
        • MGF (Mascot)
        • mzData (.XML)
        • mzML (.mzML)
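
Of the formats listed, MGF is plain text and simple enough to read by hand: each spectrum sits between BEGIN IONS and END IONS, with KEY=value header lines followed by peak lines. The sketch below is a deliberately minimal reader under those assumptions (two-column peaks, no handling of search-level parameters outside the ion blocks); the function name and the example spectrum values are invented for illustration.

```python
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=745.22
CHARGE=2+
175.119 120.0
262.151 85.5
END IONS
"""


def read_mgf(text):
    """Parse MGF text into a list of {"params": {...}, "peaks": [...]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)   # header line, e.g. PEPMASS=...
                current["params"][key] = value
            else:
                fields = line.split()             # peak line: m/z intensity
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


if __name__ == "__main__":
    for spectrum in read_mgf(EXAMPLE_MGF):
        print(spectrum["params"]["PEPMASS"], len(spectrum["peaks"]))
```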

  31. Search Parameters
      • Modifications
        • Fixed / static / quantitative modifications cost nothing
        • Variable / differential / non-quantitative modifications are very expensive
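
One way to see why variable modifications are expensive: a fixed modification simply changes the mass of a residue, so each peptide still has one candidate mass, while a variable modification may or may not be present at each eligible site, so the number of candidate forms doubles with every site. The sketch below enumerates those forms for variable oxidation of methionine. The function name is hypothetical, and the example considers a single modification type only.

```python
from itertools import combinations

OXIDATION = 15.994915  # monoisotopic mass shift of one added oxygen (Da)


def variable_mod_forms(peptide, residue="M", shift=OXIDATION):
    """Return (modified positions, total mass shift) for every possible form."""
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    forms = []
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            forms.append((chosen, k * shift))
    return forms


if __name__ == "__main__":
    for peptide in ("AAK", "MAK", "MAMK", "MAMKM"):
        print(peptide, len(variable_mod_forms(peptide)))
```

A peptide with three methionines already has eight forms to test, and selecting several variable modifications at once multiplies these counts together.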

  32. Search Parameters
      • Modifications
        • Common artefacts

  33. Site Analysis

  34. Site Analysis

  35. Site Analysis

  36. Site Analysis

  37. Multi-pass Searches
      • Implemented under a variety of names
        • X!Tandem: Model refinement
        • Mascot: Error tolerant search
        • Spectrum Mill: Search saved hits, homology mode, unassigned single mass gap
        • Phenyx: 2-rounds
        • Paragon: Thorough ID, fraglet-taglet

  38. Scoring
      [Figure: curves labelled Total matches, Incorrect matches, Correct matches; axis labelled Score]

  39. Scoring: Receiver Operating Characteristic

  40. Sensitivity & Specificity

  41. Sensitivity & Specificity
      • Search a “decoy” database
        • Decoy entries can be reversed or shuffled or randomised versions of target entries
        • Decoy entries can be separate database or concatenated to target entries
      • Gives a clear estimate of false discovery rate
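
A minimal sketch of the decoy idea follows: build reversed decoy entries from the target sequences, then estimate the false discovery rate at a score threshold as the number of decoy matches divided by the number of target matches. The `DECOY_` prefix, the function names, and the example scores are invented for illustration, and decoy/target is only one of several conventions in use for the estimate.

```python
def reversed_decoys(targets):
    """Make one reversed decoy entry per target entry (name -> sequence)."""
    return {"DECOY_" + name: seq[::-1] for name, seq in targets.items()}


def estimate_fdr(matches, threshold):
    """Estimate FDR from (score, is_decoy) pairs at or above a threshold."""
    target_hits = sum(1 for score, is_decoy in matches
                      if score >= threshold and not is_decoy)
    decoy_hits = sum(1 for score, is_decoy in matches
                     if score >= threshold and is_decoy)
    if target_hits == 0:
        return 0.0            # nothing accepted, so nothing to be wrong about
    return decoy_hits / target_hits


if __name__ == "__main__":
    print(reversed_decoys({"P1": "MKWVTF"}))
    # Invented example scores: (score, matched a decoy entry?)
    matches = [(80, False), (65, False), (50, False), (42, True),
               (40, False), (35, False), (31, True), (20, True)]
    for threshold in (30, 45):
        print(threshold, estimate_fdr(matches, threshold))
```

Raising the threshold trades sensitivity for specificity: fewer target matches are accepted, but a smaller share of them is expected to be wrong.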

  42. Sensitivity & Specificity
      [Figure: curves labelled Total matches, Incorrect matches, Correct matches; axis labelled Score]

  43. Sensitivity & Specificity

  44. Protein Inference
      General approach is to create a minimal list of proteins: the “principle of parsimony” or “Occam’s razor”
      • Protein A: Peptide 1, Peptide 2, Peptide 3
      • Protein B: Peptide 1, Peptide 3
      • Protein C: Peptide 2
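
The example on this slide can be worked through with a small sketch. It uses a greedy set cover: repeatedly pick the protein that explains the most peptides not yet accounted for. This is an assumption about how a minimal list might be built, not a description of any particular search engine, and a greedy cover is not guaranteed to be the smallest possible list in every case.

```python
def minimal_protein_list(protein_peptides):
    """Greedy cover: choose proteins until every peptide is explained."""
    uncovered = set().union(*protein_peptides.values())
    chosen = []
    while uncovered:
        # Sorting the names first makes ties resolve alphabetically.
        best = max(sorted(protein_peptides),
                   key=lambda name: len(protein_peptides[name] & uncovered))
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen


if __name__ == "__main__":
    evidence = {
        "Protein A": {"Peptide 1", "Peptide 2", "Peptide 3"},
        "Protein B": {"Peptide 1", "Peptide 3"},
        "Protein C": {"Peptide 2"},
    }
    print(minimal_protein_list(evidence))
```

For the slide's data, Protein A alone accounts for all three peptides, so Proteins B and C add no independent evidence and are left off the minimal list.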

  45. Further Reading: Exercises: http://www.ms-ms.com/exercises/exercises.html
