Protein Sequence Motifs

Protein Sequence Motifs. Aalt-Jan van Dijk, Plant Research International, Wageningen UR; Biometris, Wageningen UR; aaltjan.vandijk@wur.nl. Plant Bioinformatics. Integrated analysis of omics datasets • Transcriptomics • Alternative splicing • EST analysis • Proteomics

Presentation Transcript


  1. Protein Sequence Motifs • Aalt-Jan van Dijk • Plant Research International, Wageningen UR • Biometris, Wageningen UR • aaltjan.vandijk@wur.nl

  2. Plant Bioinformatics • Integrated analysis of omics datasets • Transcriptomics • Alternative splicing • EST analysis • Proteomics • Data (pre-)processing pipelining • Alternative splicing • Protein interaction networks • Metabolomics • Database development • Data (pre-)processing pipelining • Metabolite and pathway identification • Systems biology • Network modelling (bottom-up) • Protein interaction networks • Genomics • Next Generation Sequencing • Genome assembly & annotation • (Comparative) genome analysis • SNP analysis, marker development • Technology • Computational infrastructure • Database development • Web-based analysis tools • Software development • Workflow management systems • Machine learning

  3. My research • Protein complex structures • Protein-protein docking • Correlated mutations • Interaction site prediction/analysis • Protein-protein interactions • Protein-DNA interactions • Motif search • Enzyme active sites

  4. Overview • Protein Motif Searching • Hydrophobicity & Transmembrane Domains • Protein Interactions • Sequence-motifs to predict interaction sites • Secondary Structure Prediction

  5. Protein Motif Searching

  6. What is a motif? • A motif is a description of a particular element of a protein that contains a specific sequence pattern • Motifs are identified by • 3D structural alignment • Multiple sequence alignment • Pattern searching programs

  8. Protein Motif Searching • Strict consensus pattern • use only strictly conserved residues
      C--QASCDGIPLKMNDC
      C---VTCEGLPMRMDQC
      CERTLGCQPMPVH---C
      C     C   P     C
      CxxxxxCxxxPxxxxxC

  10. Protein Motif Searching • Strict consensus pattern • use only strictly conserved residues • But what about: • variable residues? • gaps?
      C--QASCDGIPLKMNDC
      C---VTCEGLPMRMDQC
      CERTLGCQPMPVH---C
      C     C   P     C
      CxxxxxCxxxPxxxxxC

  11. Protein Motif Searching • Strict consensus patterns contain • no alternative residues • no flexible regions • no mismatches • no gaps
      C--QASCDGIPLKMNDC
      C---VTCEGLPMRMDQC
      CERTLGCQPMPVH---C
      CxxxxxCxxxPxxxxxC
      C     C   P     C
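
The strict consensus on the slides above can be derived mechanically from the alignment. The following is a minimal sketch, assuming that a column counts as conserved only when every sequence carries the same residue and none has a gap; the function name `strict_consensus` and its parameters are illustrative, not taken from the presentation.

```python
def strict_consensus(alignment, gap="-", wildcard="x"):
    """Keep a residue only where all aligned sequences agree; use a wildcard elsewhere."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    pattern = []
    for column in zip(*alignment):  # iterate over alignment columns
        residues = set(column)
        if len(residues) == 1 and gap not in residues:
            pattern.append(column[0])  # strictly conserved position
        else:
            pattern.append(wildcard)   # variable residue or gap
    return "".join(pattern)


# The three aligned sequences shown on the slides.
alignment = [
    "C--QASCDGIPLKMNDC",
    "C---VTCEGLPMRMDQC",
    "CERTLGCQPMPVH---C",
]
print(strict_consensus(alignment))  # CxxxxxCxxxPxxxxxC
```

The sketch makes the limitation on the slide concrete: a column such as G/G/P is reduced to a wildcard, so the information that glycine dominates there is lost.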

  12. Protein Motif Searching • Most motifs are defined as regular expressions • Motifs can contain • alternative residues • flexible regions
      Pattern:   C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C
      Match:       CXXXCXGXPXXXXXC
                   |   | | |     |
      Sequence:  FGCAKLCAGFPLRRLPCFYG

  13. The PROSITE Syntax • A-[BC]-X-D(2,5)-{EFG}-H • A: A • [BC]: B or C • X: anything • D(2,5): 2-5 D’s • {EFG}: not E, F, or G • H: H
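
Because the PROSITE elements listed above map one-to-one onto regular-expression constructs, a pattern can be translated and searched with a general-purpose regex engine. The sketch below is an assumption-laden illustration, not PROSITE's own scanner: `prosite_to_regex` is a hypothetical helper that handles only the elements shown on the slides (single residues, `x`/`X`, `[...]`, `{...}` and repeat counts) and ignores other features of the full syntax such as sequence-end anchors.

```python
import re

# One pattern element: [ABC], {ABC}, a residue letter or x, plus an optional (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            part = "."                        # any residue
        elif core.startswith("{"):
            part = "[^" + core[1:-1] + "]"    # anything except the listed residues
        else:
            part = core                       # residue or [alternatives]
        if low is not None:
            part += "{" + low + ("," + high if high else "") + "}"
        parts.append(part)
    return "".join(parts)


print(prosite_to_regex("A-[BC]-X-D(2,5)-{EFG}-H"))  # A[BC].D{2,5}[^EFG]H

# The flexible pattern and example sequence from the regular-expression slide.
hit = re.search(prosite_to_regex("C-x(2,5)-C-x-[GP]-x-P-x(2,5)-C"),
                "FGCAKLCAGFPLRRLPCFYG")
print(hit.start() + 1, hit.group())  # 3 CAKLCAGFPLRRLPC
```

The reported position is 1-based, so the match starts at the third residue of the example sequence, after the leading FG.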

  14. PROSITE entries • Mandatory motifs characterise a protein (super-)family
      ID SUBTILASE_ASP; PATTERN.
      DE Serine proteases, subtilase family, aspartic acid active site.
      PA [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].
      ID SUBTILASE_HIS; PATTERN.
      DE Serine proteases, subtilase family, histidine active site.
      PA H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].
      ID SUBTILASE_SER; PATTERN.
      DE Serine proteases, subtilase family, serine active site.
      PA G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

  15. Exercise • Find the three subtilase motifs in PROSITE (prosite.expasy.org) • Compare the lists of proteins in which the motifs occur. What does this tell you? • Similarly, compare protein structures in which the motifs occur • Have a look at the “sequence logo”

  16. Protein Motif Searching • Some motifs match frequently in proteins, yet the feature they describe may not actually be present, for example • Post-translational modification sites
      ID ASN_GLYCOSYLATION; PATTERN.
      DE N-glycosylation site.
      PA N-{P}-[ST]-{P}.
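
To see how often such a short pattern matches, the N-glycosylation pattern can be scanned over a sequence. This is a minimal sketch: the regular expression is a hand translation of N-{P}-[ST]-{P}, the helper name `find_nglyc_sites` is hypothetical, and the input sequence is made up for illustration. A lookahead is used so that candidate sites which overlap each other are all reported.

```python
import re

# N-{P}-[ST]-{P} translated by hand; the zero-width lookahead lets the scan
# restart at every position, so overlapping candidate sites are not skipped.
NGLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_nglyc_sites(sequence):
    """Return (1-based position of the Asn, matched tetrapeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in NGLYC.finditer(sequence)]


# Invented test sequence, not a real protein.
sites = find_nglyc_sites("MNGSANPTNNTSQ")
print(sites)  # [(2, 'NGSA'), (9, 'NNTS'), (10, 'NTSQ')]
```

A hit here only marks a sequence that fits the pattern; whether the asparagine is actually glycosylated depends on context that the pattern does not capture.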

  17. Exercise • Use a glycosylation site predictor such as http://www.cbs.dtu.dk/services/NetNGlyc/ • Input: your favorite set of sequences • Do you observe that some N-{P}-[ST] sites are likely to be glycosylated and others not?

  18. Profiles • Many motifs cannot be easily defined using simple patterns • Such motifs can be defined using profiles • A profile is constructed from a multiple sequence alignment. For each position, each amino acid is given a score depending on how likely it is to occur

  19. Calculating a Profile • For each alignment position: take the (weighted) average of the appropriate rows from the scoring matrix • An (extremely simple) example:
      seq_01 AAAAAAAAAAW
      seq_02 AAAAAAAAAWW
      seq_03 AAAAAAAAWWW
      seq_04 AAAAAAAWWWW
      seq_05 AAAAAAWWWWW
      seq_06 AAAAAWWWWWW
      seq_07 AAAAWWWWWWW
      seq_08 AAAWWWWWWWW
      seq_09 AAWWWWWWWWW
      seq_10 AWWWWWWWWWW

  20. Excerpt from the EBLOSUM62 matrix:
          A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
      A   4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
      W  -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
      Profile rows for three alignment columns (prophecy (EMBOSS), using Henikoff profile type, and BLOSUM62 matrix):
                   A     C     D     E     F     G     H     I     K     L     M     N     P     Q     R     S     T     V     W     Y
      10A:       4.0   0.0  -2.0  -1.0  -2.0   0.0  -2.0  -1.0  -1.0  -1.0  -1.0  -2.0  -1.0  -1.0  -1.0   1.0   0.0   0.0  -3.0  -2.0
      5A+5W:     1.0  -2.0  -6.0  -4.0  -1.0  -2.0  -4.0  -4.0  -4.0  -3.0  -2.0  -6.0  -5.0  -3.0  -4.0  -2.0  -2.0  -3.0   8.0   0.0
      10W:      -3.0  -2.0  -4.0  -3.0   1.0  -2.0  -2.0  -3.0  -3.0  -2.0  -1.0  -4.0  -4.0  -2.0  -3.0  -3.0  -2.0  -3.0  11.0   2.0
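
The averaging idea can be sketched for the toy alignment of A's and W's. This example assumes a plain unweighted mean of the BLOSUM62 rows of the residues in each column; it is not a reimplementation of prophecy. The slide's 5A+5W row equals the sum of the A and W rows (for example 4 + (-3) = 1.0 for A), which is twice the plain mean, so prophecy evidently applies its own weighting or scaling. The names `column_profile` and `BLOSUM62_ROWS` are illustrative.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # column order of the matrix excerpt

# Only the two BLOSUM62 rows needed for the toy alignment (values from the slide).
BLOSUM62_ROWS = {
    "A": [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],
    "W": [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11, 2, -3],
}


def column_profile(column):
    """Unweighted mean of the matrix rows of all residues in one alignment column."""
    totals = [0.0] * len(AMINO_ACIDS)
    for residue in column:
        row = BLOSUM62_ROWS[residue]
        totals = [t + s for t, s in zip(totals, row)]
    return {aa: t / len(column) for aa, t in zip(AMINO_ACIDS, totals)}


# seq_01 .. seq_10 of the toy alignment: 11 columns, A's on the left, W's on the right.
alignment = ["A" * (11 - k) + "W" * k for k in range(1, 11)]
profile = [column_profile(col) for col in zip(*alignment)]

print(profile[0]["A"], profile[0]["W"])    # first column, 10 A:      4.0 -3.0
print(profile[5]["A"], profile[5]["W"])    # sixth column, 5 A + 5 W: 0.5 4.0
print(profile[10]["A"], profile[10]["W"])  # last column, 10 W:      -3.0 11.0
```

Scoring a query residue against a column is then a dictionary lookup, and a candidate segment is scored by summing the lookups over all profile positions.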

  21. Pattern Searching • Short linear motifs: e.g. http://dilimot.russelllab.org/ • Profiles: MEME, http://meme.sdsc.edu/meme/cgi-bin/meme.cgi

  22. Exercise • Use a number of sequences which contain the PROSITE subtilase motif and find motifs in those sequences with MEME

  23. Hydropathy Plot • Prediction of hydrophobic and hydrophilic regions in a protein

  24. Partition Coefficients • Hydrophobic: oil • Hydrophilic: water

  25. Hydrophobicity/Hydrophilicity Values (rows run from hydrophilic at the top to hydrophobic at the bottom)
         Fauchere & Pliska  Kyte & Doolittle  Hopp & Woods  Eisenberg
      R              -1.37             -4.50          3.00      -2.53
      K              -1.35             -3.90          3.00      -1.50
      D              -1.05             -3.50          3.00      -0.90
      Q              -0.78             -3.50          0.20      -0.85
      N              -0.85             -3.50          0.20      -0.78
      E              -0.87             -3.50          3.00      -0.74
      H              -0.40             -3.20         -0.50      -0.40
      S              -0.18             -0.80          0.30      -0.18
      T              -0.05             -0.70         -0.40      -0.05
      P               0.12             -1.60          0.00       0.12
      Y               0.26             -1.30         -2.30       0.26
      C               0.29              2.50         -1.00       0.29
      G               0.48             -0.40          0.00       0.48
      A               0.62              1.80         -0.50       0.62
      M               0.64              1.90         -1.30       0.64
      W               0.81             -0.90         -3.40       0.81
      L               1.06              3.80         -1.80       1.06
      V               1.08              4.20         -1.50       1.08
      F               1.19              2.80         -2.50       1.19
      I               1.38              4.50         -1.80       1.38

  26. Hydrophobicity Plot • Sum amino acid hydrophobicity values in a given window • Plot the value in the middle of the window • Shift the window one position

  27. Sliding Window Approach • Calculate property for first sub-sequence • Use the result (plot/print/store) • Move to next residue position, and repeat
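
The sliding-window steps above can be sketched as follows. This is a minimal illustration that sums the Kyte & Doolittle values from the table in the presentation over an odd-sized window and reports each sum at the window's middle position; the function name `hydropathy_profile` and the default window of 9 are assumptions made for the example.

```python
# Kyte & Doolittle hydropathy values as listed in the presentation's table.
KYTE_DOOLITTLE = {
    "R": -4.5, "K": -3.9, "D": -3.5, "Q": -3.5, "N": -3.5, "E": -3.5, "H": -3.2,
    "S": -0.8, "T": -0.7, "P": -1.6, "Y": -1.3, "C": 2.5, "G": -0.4, "A": 1.8,
    "M": 1.9, "W": -0.9, "L": 3.8, "V": 4.2, "F": 2.8, "I": 4.5,
}


def hydropathy_profile(sequence, window=9, scale=KYTE_DOOLITTLE):
    """Return (1-based middle position, summed hydropathy) for every full window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]              # current sub-sequence
        total = sum(scale[residue] for residue in segment)    # sum over the window
        profile.append((start + half + 1, total))             # value for the middle residue
    return profile                                            # loop shifts the window by one


# Short invented sequence: a hydrophobic stretch flanked by charged residues.
for position, value in hydropathy_profile("KKDLLIVALLEER", window=5):
    print(position, round(value, 1))
```

Residues that are missing from the scale, such as the Z in the presentation's example sequence, raise a `KeyError` in this sketch. Dividing each sum by the window size gives an average, which makes plots with different window sizes easier to compare.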

  28. Hydrophobicity Plot MEZCALTASTESVERYNICE

  35. Transmembrane Regions • Rotation is 100 degrees per amino acid • Climb is 1.5 Angstrom per amino acid residue

  36. Transmembrane Regions • The membrane is about 30 Angstrom thick • So we need approx. 30/1.5 = 20 amino acids to span the membrane
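
The helix arithmetic on these two slides can be written out as a small calculation. This is only a sketch using the slide's figures (1.5 Angstrom rise and 100 degrees rotation per residue, 30 Angstrom membrane); the function names are made up for the example.

```python
RISE_PER_RESIDUE = 1.5        # Angstrom climbed per residue (from the slide)
ROTATION_PER_RESIDUE = 100.0  # degrees rotated per residue (from the slide)


def residues_to_span(thickness):
    """Number of helical residues needed to cross a layer of the given thickness (Angstrom)."""
    return thickness / RISE_PER_RESIDUE


def helix_turns(n_residues):
    """Number of full 360-degree turns made by n_residues in the helix."""
    return n_residues * ROTATION_PER_RESIDUE / 360.0


n = residues_to_span(30.0)
print(n)                             # 20.0 residues to span the membrane
print(360.0 / ROTATION_PER_RESIDUE)  # 3.6 residues per turn
print(round(helix_turns(n), 2))      # 5.56 turns across the membrane
```

A segment of about 20 residues is the reason a hydropathy window of similar length is a natural choice when looking for membrane-spanning regions.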

  37. Adapting the window size to the size of the membrane spanning segment makes the picture easier to interpret

  38. [Figure: hydrophobicity plots of the same protein with window = 1, window = 9, window = 19 and window = 121]

  39. Protein Interactions

  40. Protein Interactions • Obligatory: hemoglobin

  41. Protein Interactions • Obligatory: hemoglobin • Transient: mitochondrial Cu transporters

  42. Experimental approaches (1) • Yeast two-hybrid (Y2H)

  43. Experimental approaches (2) • Affinity purification + mass spectrometry (AP-MS)

  44. Interaction Databases • STRING http://string.embl.de/

  45. Interaction Databases

  46. Interaction Databases • STRING http://string.embl.de/ • HPRD http://www.hprd.org/

  47. Interaction Databases

  48. Interaction Databases • STRING http://string.embl.de/ • HPRD http://www.hprd.org/ • InteroPorc http://biodev.extra.cea.fr/interoporc/Default.aspx • Many others, e.g. see http://nar.oxfordjournals.org./content/39/suppl_1.toc

  49. Yeast protein interaction network
