1 / 31

Prediction of protein disorder

Prediction of protein disorder. Zsuzsanna Doszt á nyi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry Eotvos Lorand University, Budapest, Hungary dosztanyi@ceaser.elte.hu. Protein Structure/Function Paradigm.

pascal
Télécharger la présentation

Prediction of protein disorder

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Prediction of protein disorder Zsuzsanna Dosztányi MTA-ELTE Momentum Bioinformatics Group Department of Biochemistry Eotvos Lorand University, Budapest, Hungarydosztanyi@ceaser.elte.hu

  2. Protein Structure/Function Paradigm Dominant view: 3D structure is a prerequisite for protein function

  3. But…. • Heat stability • Protease sensitivity • Failed attempts to crystallize • Lack of NMR signals • Increased molecular volume • “Freaky” sequences …

  4. IDPs • Intrinsically disordered proteins/regions (IDPs/IDRs) • Do not adopt a well-defined structure in isolation under native-like conditions • Highly flexible ensembles • Functional proteins

  5. p53 tumor suppressor transactivation regulation tetramerization DNA-binding DBD TD RD TAD Disordered region Disordered region Wells et al. PNAS 2008; 105: 5762

  6. Bioinformatics of protein disorder • Part 1 Prediction of protein disorder • Databases • Prediction of protein disorder • Part 2 Biology of disordered proteins • Prediction of functional regions within IDPs

  7. Datasets • Ordered proteins in the PDB • over 100000 structures • few 1000s folds • Some structures in the PDB classify as disordered! only adopt a well-defined structure in complex in crystals, with cofactors, proteins, … • Disorder in the PDB • Missing electron density regions from the PDB • NMR structures with large structural variations • Less than 10% of all positions • Usually short (<10 residues), often at the termini

  8. Disprot www.disprot.org Current release: 6.02Release date: 05/24/2013Number of proteins: 694Number of disordered regions: 1539 Experimentally verified disordered proteins collected from literature (X-ray, NMR, CD, proteolysis, SAXS, heat stability, gel filtration, …)

  9. Additional databases • Combining experiments and predictions • Genome level annotations • MobiDB: http://mobidb.bio.unipd.it • D2P2: http://d2p2.pro • IDEAL: http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL

  10. Sequence properties of disordered proteins • Amino acid compositional bias • High proportion of polar and charged amino acids (Gln, Ser, Pro, Glu, Lys) • Low proportion of bulky, hydrophobhic amino acids (Val, Leu, Ile, Met, Phe, Trp, Tyr) • Low sequence complexity • Signature sequences identifying disordered proteins Protein disorder is encoded in the amino acid sequence

  11. Amino acid compositions He et al. Cell Res. 2009; 19: 929

  12. Prediction methods for protein disorder Over 50 methods • Based on amino acid propensity scales or on simplified biophysical models • GlobPlot, FoldIndex, FoldUnfold, IUPred, UCON • Machine learning approaches • PONDR VL-XT, VL3, VSL2; Disopred; POODLE S and L ; DisEMBL; DisPSSMP; PrDOS, DisPro, OnD-CRF, POODLE-W, RONN

  13. 1.Amino acid propensity scale GlobPlot Compare the tendency of amino acids: • to be in coil (irregular) structure. • to be in regular secondary structure elements Linding (2003) NAR 31, 3701

  14. GlobPlot From position specific predictions Where are the ordered domains? Longer disordered segments? Noise vs. real data

  15. downhill regions correspond to putative domains (GlobDom) up-hill regions correspond to predicted protein disorder GlobPlot: http://globplot.embl.de/

  16. Globular proteins Large entropy penalty Large number of inter-residue contacts

  17. If a residue cannot form enough favorable interactions within its sequential environment, it will not adopt a well defined structure it will be disordered 2. Physical principles IUPred • Based on an energy estimation method • Parameters calculated from statistics of globular proteins • No training on disordered proteins Dosztanyi (2005) JMB 347, 827

  18. Decide the probability of the residue being disordered based on this A – 10% C – 0% D – 12 % E – 10 % F – 2 % etc… Estimate the interaction energy between the residue and its environment Amino acid composition of environ-ment: IUPred • The algorithm: …PSVEPPLSQETFSDL WKLLPENNVLSPLPSQAMDDLMLSP D DIEQWFTEDPGPDEAPRMPEAAPRVA PAPAAPTPAA... Based only on the composition of environment of D’s we try to predict if it is in a disordered region or not:

  19. 3. Machine learning approaches INPUT OUTPUT . A T V Q L S M I W Q ST R . D O

  20. DISOPRED2: …..AMDDLMLSPDDIEQWFTED….. SVM with linear kernel F(inp) Assign label: D or O O D Ward (2004) JMB 337, 635

  21. DISOPRED2 Cutoff value!

  22. PONDR VSL2 Differences in short and long disorder • amino acid composition • methods trained on one type of dataset tested on other dataset resulted in lower efficiencies PONDR VSL2: separate predictors for short and long disorder combined length independent predictions Peng (2006) BMC Bioinformatics 7, 208

  23. 4. Metaservers: Disorder prediction methods Meta-predictor PONDR VLXT PONDR VL3 PONDR VSL2 Sequence Prediction ANN IUPred FoldIndex TopIDP Xue et al. Biochem Biophys Acta. 2010; 180: 996

  24. Disordered regions and secondary structure • Coil is an ordered, irregular structural element • Disordered proteins usually do not contain stable secondary structural elements • (e.g. by CD) • They can contain transient secondary structure elements • (by NMR) • Pure random coil never occurs • Use secondary structure predictions methods for disordered proteins with extreme caution • Long segments without predicted secondary structure may indicate proteins disorder (NORsnet)

  25. Accuracy • True positive: Disordered residues are predicted as disordered • False positive: Ordered residues predicted as disordered • True negative: Ordered residues predicted as ordered • False negative: Disordered residues predicted as ordered 75-90%

  26. Prediction of protein disorder • Disordered residues can be predicted from the amino acid sequence • ~ 80% at the residue level • Methods can be specific to certain type of disorder • accordingly, accuracies vary depending on datasets • Predictions are based on binary classification of disorder

  27. Heterogeneity in protein disorder Transient structures Flexible loop RC-like Compact

  28. Modularity in proteins • Many proteins contains multiple domains • Composed of ordered and disordered segments • Average length of a PDB chain is < 300 • Average length of a human proteins ~ 500 • Average length of cancer-related proteins > 900 • Structural properties of full length proteins …

  29. Practical

More Related