
Data Analysis 1

Data Analysis 1. Brett S. Phinney, Ph.D. Protein identification: fundamentals; how we identify proteins by MS and MS/MS (tandem mass spectrometry); what a search engine is; search engine examples, database based and non-database based; LC/MS/MS; real examples.


Presentation Transcript


  1. Data Analysis 1 • Brett S. Phinney, Ph.D.

  2. Protein Identification • Fundamentals • How do we ID proteins • MS • MS/MS (tandem Mass spectrometry) • What is a search engine • Search Engine Examples • Database based • Non Database Based • LC/MS/MS • Real Examples

  3. How do we ID proteins • Two main ways • Finger Printing • Determine m/z of the Peptide ions only (MS) • Product Ion Scanning • Determine the m/z of the peptide ions (parent ions) • Fragment peptide ions • Determine m/z of Fragments (Product Ions)

  4. Finger Printing • Take the protein of interest • 1D or 2D gel spot • HPLC fraction • Digest with a specific protease • Trypsin • Analyze the peptides with a mass spectrometer • Usually MALDI, but it can be any type as long as the mass measurement is accurate.

  5. Why Trypsin • Very specific: cleaves after R and K, except when the next residue is P (RP, KP) • Very active • Can be bought in a very pure form • Promega, Sigma, Princeton Scientific • Resistant to digesting itself • K and R residues in an average protein are optimally spaced • Peptides are usually 600-3000 Da • For electrospray, tryptic peptides are usually doubly charged because of the C-terminal basic amino acids K and R
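
The cleavage rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular search engine: the function name `tryptic_digest`, the `missed_cleavages` parameter and the test sequence are all made up for the example.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P; return the peptides."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal peptide without K/R
    # Join neighbouring pieces to model incomplete digestion.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


# Invented test sequence; note that the R followed by P is not cleaved.
print(tryptic_digest("MKWVTFRPLLKAGR"))
print(tryptic_digest("MKWVTFRPLLKAGR", missed_cleavages=1))
```

Allowing one missed cleavage roughly doubles the peptide list, which is why search engines ask for this number up front.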

  6. Finger Printing • [Figure: a protein drawn from NH2 to COOH with its K and R cleavage sites marked; the digest yields peptides of 300.12 Da, 312.56 Da, 418.56 Da, 551.52 Da, 718.23 Da and 407.39 Da]

  7. Finger Printing • Hopefully each unique protein will give a unique set of peptide m/z values after the digest

  8. Finger Print Limitations • You need a mass spectrometer capable of reasonably accurate masses • 50 ppm • MALDI with delayed extraction (DE) and a reflector • Genome must be fairly small • Yeast or smaller for good results • The ABI 4700 TOF/TOF is usually run so that it acquires a fingerprint plus an additional 4-8 MS/MS spectra per spot on the top ions

  9. Finger Print Advantages • Usually can give you better coverage • Very Fast • Easy

  10. Finger Printing example

  11. Product Ion Scanning • Digest the protein with trypsin • Determine the m/z of a peptide ion • MALDI, ESI • Isolate the peptide ion from any other ions • Fragment the peptide ion • Determine the mass of the fragments • Obtain AA sequence data from the fragments

  12. Peptide Fragmentation • Roepstorff nomenclature for possible peptide fragments • b ions (b1, b2, b3) are counted from the N-terminus (H2N) • y ions (y3, y2, y1) are counted from the C-terminus (COOH) • [Figure: backbone of a four-residue peptide, side chains R1 to R4, with the b and y cleavage sites marked]

  13. Peptide Fragmentation • Roepstorff nomenclature for possible peptide fragments • [Figure: same diagram as slide 12]

  14. Fragmenting a Peptide • S-P-A-F-D-S-I-M-A-E-T-L-K, (M + H) = 1410.6 • Delta mass is the step from the previous row, in Da

      b-ion (+)   b fragment      y fragment      y-ion (+)   Delta mass
         88.1     S               PAFDSIMAETLK     1323.6
        185.2     SP              AFDSIMAETLK      1226.4         97
        256.3     SPA             FDSIMAETLK       1155.4         71
        403.5     SPAF            DSIMAETLK        1008.2        147
        518.5     SPAFD           SIMAETLK          893.1        115
        605.6     SPAFDS          IMAETLK           806.0         87
        718.8     SPAFDSI         MAETLK            692.8        113
        850.0     SPAFDSIM        AETLK             561.7        131
        921.1     SPAFDSIMA       ETLK              490.6         71
       1050.2     SPAFDSIMAE      TLK               361.5        129
       1151.3     SPAFDSIMAET     LK                260.4        101
       1264.4     SPAFDSIMAETL    K                 147.2        113

  15. [Figure: annotated spectrum with peaks labeled b2, y10, y9, b3, y8, y11, y7 and b4]

  16. [Figure: the y10 and y11 peaks are 71 Da apart]

  17. Monoisotopic mass: the sum of the masses of the lightest isotope of each element present • Average mass: the abundance-weighted sum of all the isotopes of all the elements present • Residue masses in u:

      Amino acid      3-letter  1-letter   Average    Monoisotopic
      Glycine         Gly       G          57.0519     57.02146
      Alanine         Ala       A          71.0788     71.03711
      Serine          Ser       S          87.0782     87.03203
      Proline         Pro       P          97.1167     97.05276
      Valine          Val       V          99.1326     99.06841
      Threonine       Thr       T         101.1051    101.04768
      Cysteine        Cys       C         103.1388    103.00919
      Leucine         Leu       L         113.1594    113.08406
      Isoleucine      Ile       I         113.1594    113.08406
      Asparagine      Asn       N         114.1038    114.04293
      Aspartic acid   Asp       D         115.0886    115.02694
      Glutamine       Gln       Q         128.1307    128.05858
      Lysine          Lys       K         128.1741    128.09496
      Glutamic acid   Glu       E         129.1155    129.04259
      Methionine      Met       M         131.1926    131.04049
      Histidine       His       H         137.1411    137.05891
      Phenylalanine   Phe       F         147.1766    147.06841
      Arginine        Arg       R         156.1875    156.10111
      Tyrosine        Tyr       Y         163.1760    163.06333
      Tryptophan      Trp       W         186.2132    186.07931
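
As a rough check on how the residue masses turn into a fragment ladder, the sketch below recomputes singly charged b and y ions for the peptide SPAFDSIMAETLK from the fragmentation slide. It assumes average residue masses, a proton of 1.00728 u and average water of 18.01528 u. The helper name `by_ladder` is made up for the example.

```python
# Average residue masses (u) for the residues that occur in SPAFDSIMAETLK.
AVG = {"S": 87.0782, "P": 97.1167, "A": 71.0788, "F": 147.1766,
       "D": 115.0886, "I": 113.1594, "M": 131.1926, "E": 129.1155,
       "T": 101.1051, "L": 113.1594, "K": 128.1741}
PROTON = 1.00728   # mass of a proton, u
WATER = 18.01528   # average mass of H2O, u


def by_ladder(peptide, masses=AVG):
    """Return (b fragment, b m/z, y fragment, y m/z) for every backbone break."""
    rows = []
    for i in range(1, len(peptide)):
        b_frag, y_frag = peptide[:i], peptide[i:]
        b = sum(masses[a] for a in b_frag) + PROTON
        # The y fragment keeps the C-terminal OH and gains an H, hence + water.
        y = sum(masses[a] for a in y_frag) + WATER + PROTON
        rows.append((b_frag, round(b, 1), y_frag, round(y, 1)))
    return rows


for b_frag, b, y_frag, y in by_ladder("SPAFDSIMAETLK"):
    print(f"{b:8.1f}  {b_frag:<13}{y_frag:<13}{y:8.1f}")
```

Neighbouring entries in either ladder differ by exactly one residue mass, which is what makes it possible to read a sequence off the peak spacings.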

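  18. [Figure: the 71 Da gap between y10 and y11 corresponds to alanine (A)]

  19. [Figure: the same spectrum (b2, y10, y9, b3, y8, y11, y7, b4) with residues S, E, D and A assigned from the peak spacings]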

  20. Problems • You have two ladders of overlapping masses • Cannot tell which is a b ion and which is a y ion • Incomplete fragmentation • Chemical noise • The mass accuracy of some instruments is not good enough to determine R groups • Glutamine and lysine differ by 0.036 u • Phenylalanine and oxidized methionine differ by 0.033 u • Gly + Val = 156.090 u and Arg = 156.101 u
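
The near-isobaric pairs on this slide can be checked numerically. The sketch below uses monoisotopic residue masses and a deliberately crude criterion, assumed only for illustration: two candidates are treated as resolvable when their mass difference exceeds the tolerance window at the fragment m/z. The function names and the choice of m/z 1000 are hypothetical.

```python
# Monoisotopic residue masses (u) for the residues involved.
MONO = {"G": 57.02146, "V": 99.06841, "Q": 128.05858, "K": 128.09496,
        "M": 131.04049, "F": 147.06841, "R": 156.10111}
OXYGEN = 15.99491  # monoisotopic mass of O, added when methionine is oxidized

pairs = {
    "Q vs K": (MONO["Q"], MONO["K"]),
    "F vs oxidized M": (MONO["F"], MONO["M"] + OXYGEN),
    "G+V vs R": (MONO["G"] + MONO["V"], MONO["R"]),
}


def tolerance_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute window in Da."""
    return mz * ppm / 1e6


def resolvable(delta, mz, ppm):
    """Crude criterion (an assumption): the difference must exceed the window."""
    return delta > tolerance_da(mz, ppm)


for name, (m1, m2) in pairs.items():
    delta = abs(m1 - m2)
    print(name, round(delta, 3),
          "50 ppm:", resolvable(delta, 1000.0, 50.0),
          "5 ppm:", resolvable(delta, 1000.0, 5.0))
```

Under this criterion none of the three pairs is resolved at 50 ppm for a fragment near m/z 1000, while all three are at 5 ppm.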

  21. Big Problem • It can take a very long time to "sequence" a "good" product ion spectrum without a computer • 30 minutes if you're good • 1-2 days to never if you are not • One experiment can generate 10,000 MS/MS spectra

  22. Computer Programs • Called search engines • Some search a database • Some search within the MS/MS spectra

  23. Search Engines • 4 general types • Automated de novo sequencing • PEAKS • Lutefisk XP • Peptide sequence tags • GutenTag • Cross correlation • SEQUEST • Probability based (seem to be in fashion now) • Mascot (sort of) • X!tandem 2 • OMSSA • PROTE_PROBE

  24. Database search engine? • What is a database? • Why do we need a database?

  25. Database • A text file containing a collection of sequences (not really a database) • Nucleic acid (NA) • Amino acid (AA) • Many different sources • SwissProt • NCBI NR • IPI databases • Custom (anything else you can think of)

  26. Why do we need a database • If we do not want to interpret the spectra by hand or use automated de novo sequencing • We can do pattern matching

  27. Pattern matching • Not sequencing • Use the protein sequences in the database • Do an in silico digest • Calculate the m/z of the in silico peptides • Match the "pattern" of masses from the mass spectrometer to the in silico "pattern" • Score the result
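
The pattern-matching steps can be sketched end to end for the fingerprint case. This is a toy illustration under stated assumptions: the two database entries `protA` and `protB` are invented sequences, the score is a plain count of matched masses within a ppm tolerance, and real engines such as Mascot use probability-based scoring instead.

```python
# Monoisotopic residue masses (u).
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
        "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
        "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
        "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
        "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931}
WATER = 18.01056   # monoisotopic H2O, u
PROTON = 1.00728   # proton, u


def digest(seq):
    """In silico tryptic digest: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and seq[i + 1:i + 2] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh(peptide):
    """Singly protonated monoisotopic mass, [M+H]+."""
    return sum(MONO[a] for a in peptide) + WATER + PROTON


def score(observed, seq, ppm=50.0):
    """Count observed masses that match any theoretical peptide within ppm."""
    theoretical = [mh(p) for p in digest(seq)]
    return sum(any(abs(o - t) <= t * ppm / 1e6 for t in theoretical)
               for o in observed)


def search(observed, database, ppm=50.0):
    """Score every database entry and return (name, score), best first."""
    hits = {name: score(observed, seq, ppm) for name, seq in database.items()}
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)


# Invented two-entry "database" and a peak list simulated from protA.
database = {"protA": "MKWVTFISLLLLFSSAYSRGVFRR",
            "protB": "GSHMEELLKAVDPRTTQWER"}
observed = [mh(p) for p in digest(database["protA"])]
ranked = search(observed, database)
print(ranked)
```

The same loop extends to MS/MS searching by replacing the peptide masses with predicted fragment masses for each candidate peptide.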

  28. MS/MS Search engines • Also called non-interpreted product ion spectra scanning • Because you are not interpreting it by hand

  29. First step • Mascot, Tandem and SEQUEST work this way

  30. All search engines put an error window around the parent ion mass • All theoretical parent ions within this mass range will be selected to compare MS/MS spectra • FTICR mass spectrometer: (+/-) 2 ppm • Ion-trap mass spectrometer: (+/-) a few Da

  31. Effect of Accuracy on Protein Identification • [Table: mass error in ppm vs. number of peptides within the mass error, non-redundant database] • Sequence = HLVDEPQNLIK • [M+2H]2+ (each H counted as a proton, i.e. H minus an electron) = 653.36170 • NR sequences = 1,798,171 • Search was limited to trypsin specificity with 1 missed cleavage and no PTMs • Used the Q-match value in the Mascot .dat file for the peptide numbers • 0.5 ppm at 1000 Da = 0.0005 Da • Mass of an electron = 0.000549 Da
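
The numbers on this slide can be reproduced with a short calculation. The sketch assumes monoisotopic residue masses, monoisotopic water of 18.01056 u and a hydrogen atom of 1.007825 u. The function names `mz` and `ppm_to_da` are made up for the example.

```python
# Monoisotopic residue masses (u) for the residues in HLVDEPQNLIK.
MONO = {"H": 137.05891, "L": 113.08406, "V": 99.06841, "D": 115.02694,
        "E": 129.04259, "P": 97.05276, "Q": 128.05858, "N": 114.04293,
        "I": 113.08406, "K": 128.09496}
WATER = 18.01056        # monoisotopic H2O, u
HYDROGEN = 1.007825     # hydrogen atom, u
ELECTRON = 0.000549     # electron, u
PROTON = HYDROGEN - ELECTRON


def mz(peptide, charge, charge_carrier=PROTON):
    """m/z of a peptide carrying `charge` copies of the charge carrier."""
    neutral = sum(MONO[a] for a in peptide) + WATER
    return (neutral + charge * charge_carrier) / charge


def ppm_to_da(mass, ppm):
    """Absolute width in Da of a relative error given in ppm."""
    return mass * ppm / 1e6


exact = mz("HLVDEPQNLIK", 2)
sloppy = mz("HLVDEPQNLIK", 2, charge_carrier=HYDROGEN)  # electron ignored
print(round(exact, 4))
print(round((sloppy - exact) / exact * 1e6, 2), "ppm error from the electron")
print(ppm_to_da(1000.0, 0.5), "Da window for 0.5 ppm at 1000 Da")
```

Ignoring the electron shifts this doubly charged ion by more than the 0.5 ppm window quoted on the slide, which is presumably why the electron mass is listed there.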

  32. All search engines put an error window around the parent ion mass • All theoretical parent ions in the database within this mass range will be selected to compare MS/MS spectra • Comparing MS/MS spectra is the most time-consuming part • Ion-trap = 60,000 • FTICR = 3,000
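
The effect of the window width on the number of candidates can be illustrated with a sorted list of theoretical parent masses and a binary search. The mass list below is synthetic, one entry every 0.01 Da, and is only there to make the counts easy to follow. The function name `candidates` and the parent mass are hypothetical.

```python
from bisect import bisect_left, bisect_right


def candidates(sorted_masses, parent_mass, tolerance_da):
    """Return the theoretical masses within +/- tolerance_da of the parent."""
    lo = bisect_left(sorted_masses, parent_mass - tolerance_da)
    hi = bisect_right(sorted_masses, parent_mass + tolerance_da)
    return sorted_masses[lo:hi]


# Synthetic "database": one theoretical peptide mass every 0.01 Da.
masses = [1300 + i / 100 for i in range(1000)]
parent = 1305.001

wide = candidates(masses, parent, 2.0)                   # ion trap, +/- 2 Da
narrow = candidates(masses, parent, parent * 2.0 / 1e6)  # FTICR, +/- 2 ppm
print(len(wide), len(narrow))
```

Every candidate that survives this filter still has to be compared against the MS/MS spectrum, so a narrower window directly cuts the most expensive step.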

  33. Low Accuracy • NR database, approx. 3 million protein sequences • 180 thousand tryptic peptide sequences (within error) • Match! • Time = 15 seconds

  34. LC/MS/MS • [Figure: plumbing diagram with the labels 1 mm x 5 cm, 75 µm x 10 cm, 8 µm tip, blunt frit, 100 µm x 2 cm trap]

  35. Nano-spray Source • New Objective PicoView

  36. Real Example • LC/MS/MS • 2 µg B. cereus trypsin digest • 4 hour gradient

  37. Total Ion Chromatogram • Spectra at 190.11 minutes • Let's look at m/z 953.03

  38. MS/MS Spectra

  39. Sequence Match X!tandem 2

  40. Mascot and X!tandem • Main Differences • Mascot can search using Taxonomy rules • X!tandem is free and faster • X!tandem can search using Point mutations automatically for every protein • Mascot can search using Point Mutations and unexpected modifications manually per protein • Mascot can search EST and NA databases • X!tandem has built in false positive statistics • Both work very well

  41. Mascot Results • SPRG 2006 • http://tinyurl.com/opjzs

  42. X!tandem results

  43. X!Tandem Workflow • Quickly identify proteins from tryptic peptides • Create a database containing the identified proteins only • Extensively search for modified and non-enzymatic peptides on the identified proteins only
