1 / 102

Protein Feature Identification

Protein Feature Identification. David Wishart Depts. Computing & Biological Science University of Alberta david.wishart@ualberta.ca. Proteins. Exhibit far more sequence and chemical complexity than DNA or RNA

olympe
Télécharger la présentation

Protein Feature Identification

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Protein Feature Identification David Wishart Depts. Computing & Biological Science University of Alberta david.wishart@ualberta.ca

  2. Proteins • Exhibit far more sequence and chemical complexity than DNA or RNA • Properties and structure are defined by the sequence and side chains of their constituent amino acids • The “engines” of life • >95% of all drugs target proteins • Favorite topic of post-genomic era

  3. The Post-genomic Challenge • How to rapidly identify a protein? • How to rapidly purify a protein? • How to identify post-trans modification? • How to find information about function? • How to find information about activity? • How to find information about location? • How to find information about structure? Answer: Look at Protein Features

  4. Protein Features ACEDFHIKNMF SDQWWIPANMC ASDFDPQWERE LIQNMDKQERT QATRPQDS... Sequence View Structure View

  5. Different Types of Features • Composition Features • Mass, pI, Absorptivity, Rg, Volume • Sequence Features • Active sites, Binding Sites, Targeting, Location, Property Profiles, 2o structure • Structure Features • Supersecondary Structure, Global Fold, ASA, Volume

  6. Where To Go http://www.expasy.org/

  7. Amino Acids (Review)

  8. Glycine and Proline H C C HN COOH H2N COOH H H P G

  9. Aliphatic Amino Acids CH3 CH3 CH3 CH3 V I C C H2N COOH H2N COOH H H CH3 CH3 CH3 A L C H2N COOH C H2N COOH H H

  10. Aromatic Amino Acids N N N W H C C H2N COOH H2N COOH OH H H Y F C C H2N COOH H2N COOH H H

  11. Charged Amino Acids H - COO + N D NH3 R C NH H2N COOH + C H NH3 H2N COOH - COO H E K C C H2N COOH H2N COOH H H

  12. Polar Amino Acids CH3 OH CONH2 N T C C H2N COOH H2N COOH H H CONH2 OH Q S C C H2N COOH H2N COOH H H

  13. Sulfo-Amino Acids CH3 S SH C C H2N COOH H2N COOH H H M C

  14. Compositional Features • Molecular Weight • Amino Acid Frequency • Isoelectric Point • UV Absorptivity • Solubility, Size, Shape • Radius of Gyration • Free Energy of Folding

  15. Molecular Weight

  16. Molecular Weight • Useful for SDS PAGE and 2D gel analysis • Useful for deciding on SEC matrix • Useful for deciding on MWC for dialysis • Essential in synthetic peptide analysis • Essential in peptide sequencing (classical or mass-spectrometry based) • Essential in proteomics and high throughput protein characterization

  17. Molecular Weight • Crude MW calculation: MW = 110 X Numres • Exact MW calculation: MW = SAAi x MWi • Remember to add 1 water (18.01 amu) after adding all res. • Note isotopic weights • Corrections for CHO, PO4, Acetyl, CONH2

  18. Amino Acid versus Residue R R C C N CO H2N COOH H H H Amino Acid Residue

  19. Protein Identification via MW • MOWSE • http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse • CombSearch • http://ca.expasy.org/tools/CombSearch/ • Mascot • http://www.matrixscience.com/search_form_select.html • AACompSim/AACompIdent • http://ca.expasy.org/tools/

  20. Molecular Weight & Proteomics 2-D Gel QTOF Mass Spectrometry

  21. Amino Acid Frequency • Deviations greater than 2X average indicate something of interest • High K or R indicates possible nucleoprotein • High C’s indicate stable but hard-to-fold protein • High G, P, Q, or N says lack of stable structure

  22. Isoelectric Point (pI) • The pH at which a protein has a net charge=0 • Q = S Ni/(1 + 10pH-pKi) Transcendental equation

  23. Isoelectric Point • Calculation is only approximate (+/- 1 pH) • Does not include 3o structure interactions • Can be used in developing purification protocols via ion exchange chromatography • Can be used in estimating spot location for isoelectric focusing gels • Can be used to decide on best pH to store or analyze protein

  24. UV Spectroscopy

  25. UV Absorptivity • UV (Ultraviolet light) has a wavelength of 200 to 400 nm • Most proteins and peptides (and all nucleic acids) absorb UV light quite strongly • UV spectroscopy is the most common form of spectroscopy performed today • UV spectra can be used to identify or classify some proteins or protein classes

  26. C C H2N H2N COOH COOH H H UV Absorptivity • OD280 = (5690 x #W + 1280 x #Y)/MW x Conc. • Conc. = OD280 x MW/(5690 X #W + 1280 x #Y) OH N

  27. Hydrophobicity • Indicates Solubility • Indicates Stability • Indicates Location (membrane or cytoplasm) • Indicates Globularity or tendency to form spherical structure

  28. Average Hydrophobicity AH = S AAi x Hi Hydrophobic Ratio RH = S H(-)/S H(+) Hydrophobic % Ratio RHP = %philic/%phobic Linear Charge Density LIND = (K+R+D+E+H+2)/# Solubility SOL = RH + LIND - 0.05AH Average AH = 2.5 + 2.5 Insol > 0.1 Unstrc < -6 Average RH = 1.2 + 0.4 Insol < 0.8 Unstrc > 1.9 Average RHP = 0.9 + 0.2 Insol < 0.7 Unstrc > 1.4 Average LIND = 0.25 Insol < 0.2 Unstrc > 0.4 Average SOL = 1.6 + 0.5 Insol < 1.1 Unstrc > 2.5 Hydrophobicity

  29. Protein Dimensions • Radius and Radius of Gyration • Molecular and Partial Specific Volume • Accessible Surface Area • Provides a size estimate of a protein • Used in analytical techniques such as neutron or X-ray scattering, analytical ultracentrifugation, light scattering

  30. Radius & Radius of Gyration • RAD = 3.875 x NUMRES 0.333 (Folded) • RADG = 0.41 x (110 x NUMRES) 0.5 (Unfolded) RadiusRadius of Gyration

  31. Partial Specific Volume • Measured in mL/g • Inverse measure of protein density (0.70-75) • Depends on protein’s composition and compactness • Measured via sedimentation analysis • PSV = S PSi x Wi

  32. Packing Volume Loose Packing Dense Packing Protein Proteins are Densely Packed

  33. Packing Volume (VP) • Determined via X-ray or NMR structure • “True” measure of volume occupied by protein • Approximate Value VP = 1.245 x MW • Exact Value VP = S AAi x Vi

  34. Different Types of Features • Composition Features • Mass, pI, Absorptivity, Rg, Volume • Sequence Features • Active sites, Binding Sites, Targeting, Location, Property Profiles, 2o structure • Structure Features • Supersecondary Structure, Global Fold, ASA, Volume

  35. Sequence Features AHGQSDFILDEADGMMKSTVPN… HGFDSAAVLDEADHILQWERTY… GGGNDEYIVDEADSVIASDFGH… *[LIVM][LIVM]DEAD*[LIVM][LIVM]* (EIF 4A ATP DEPENDENT HELICASE)

  36. Probability & Seq. Features • Expectation value (e) is the expected number of hits for a given sequence pattern or motif • e = N x f1 x f2 x f3 x .... fk • N is the number of residues in DB (108) • fi is the frequency of a given amino acid(s)

  37. Example #1 ACIDS e = 108*0.088*0.021*0.054*0.059*0.065 e = 38.3 #Found in OWL database = 14

  38. Example #2 A*ACI[DEN]S e = 108*0.088*1.000*0.088*0.021*0.054 *{0.059 + 0.059 + 0.046}*0.065 e = 9.4 #Found in OWL database = 9

  39. Minimum Pattern Lengths e = 108*0.088 = 0.17 min = 8 e = 108*0.057 = 0.08 min = 7 e = 108*0.036 = 0.07 min = 6 f = 0.08 f = 0.05 f = 0.03

  40. How Long Should a Sequence Motif or Sequence Block Be? • How many matching segments of length “l” could be found in comparing a query of length M to a DB of N ? • Answer: n(l) = M x N x fl • Assume f = 0.05, M = 300, N = 100,000,000

  41. Rule of Thumb Make your protein sequence motifs at least 8residues long

  42. Sites that Support Pattern Queries • OWL Database • http://bioinf.man.ac.uk/dbbrowser/OWL/ • PIR Website • http://pir.georgetown.edu/pirwww/search/patmatch.html • SCNPSITE at EXPASY • http://ca.expasy.org/tools/scanprosite/ • FPAT (Regular Expression Query) • http://stateslab.bioinformatics.med.umich.edu/service/fpat/

  43. Regular Expressions • C[ACG]T - Matches CAT, CCT and CGT only • C . T - Matches CAT, CaT, C1T, CXT, not CT • CA?T - Matches CT or CAT only • C+T - Matches CT, CCT, CCCT, CCCCT… • C(HE)?A[TP] - Matches CHEAT, CAT, CHEAP, CAP • S[A-I,L-Q,T-Z]?LK[A-I,L-Q,T-Z]?A - Matches S*LK*A

  44. PROSITE Pattern Expressions C - [ACG] - T - Matches CAT, CCT and CGT only C - X -T - Matches CAT, CCT, CDT, CET, etc. C - {A} -T - Matches every CXT except CAT C - (1,3) - T - Matches CT, CCT, CCCT C - A(2) - [TP]- Matches CAAT, CAAP [LIV] - [VIC] - X(2) - G - [DENQ] - X - [LIVFM] (2) -G

  45. Sequence Feature Databases • PROSITE - http://ca.expasy.org/prosite/ • BLOCKS - http://www.blocks.fhcrc.org/ • DOMO - http://www.infobiogen.fr/services/domo/ • PFAM - http://pfam.wustl.edu • PRINTS - http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/ • SEQSITE - PepTool

  46. C C C H2N H2N H2N COOH COOH COOH H H H Phosphorylation Sites pY pT pS PO4 CH3 PO4 PO4

  47. Phosphorylation Sites

  48. Glycosylation

  49. Glycosylation Sites

  50. Signaling

More Related