Lecture 13: Structural Aspects of Protein-Protein Interactions. PHAR 201/Bioinformatics I. Philip E. Bourne, Department of Pharmacology, UCSD. Slides modified from Jo-Lan Chung.
Agenda • Understand the importance of studying protein-protein interactions at the structural level • Classify the various types of interactions • Look in detail at one structure-based method for predicting protein-protein interactions
LINK Still seems the best review
Types of protein-protein interactions (PPI) • Obligate PPI: the protomers are not found as stable structures on their own in vivo • Non-obligate PPI • Examples: non-obligate homodimer, sperm lysin; obligate homodimer, P22 Arc repressor (DNA-binding); obligate heterodimer, human cathepsin D (PDB 1LYB); non-obligate heterodimer, RhoA and RhoGAP signaling complex
Types of protein-protein interactions (PPI) • Obligate PPI (usually permanent): the protomers are not found as stable structures on their own in vivo. Example: obligate heterodimer, human cathepsin D • Non-obligate PPI, permanent (many enzyme-inhibitor complexes): dissociation constant Kd = [A][B] / [AB] of 10^-7 to 10^-13 M. Example: non-obligate permanent heterodimer, thrombin and rhodniin inhibitor • Non-obligate PPI, transient, weak (electron transport complexes): Kd mM-µM. Example: non-obligate transient homodimer, sperm lysin (interaction is broken and formed continuously) • Non-obligate PPI, transient, intermediate (antibody-antigen, TCR-MHC-peptide, signal transduction PPI): Kd µM-nM • Non-obligate PPI, transient, strong (requires a molecular trigger to shift the oligomeric equilibrium): Kd nM-fM. Example: bovine G protein dissociates into Gα and Gβγ subunits upon GTP binding, but forms a stable trimer upon GDP binding
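To make the dissociation constant on this slide concrete, here is a minimal sketch of how Kd = [A][B] / [AB] translates into the fraction of A that is bound. It assumes simple 1:1 binding with partner B in excess, so that free [B] is approximately the total [B]; the function name `fraction_bound` is hypothetical and is not part of the lecture.

```python
def fraction_bound(free_b_molar: float, kd_molar: float) -> float:
    """Fraction of A present as the AB complex for 1:1 binding.

    From Kd = [A][B] / [AB]:
        [AB] / ([A] + [AB]) = [B] / (Kd + [B])
    Assumption: B is in excess, so free [B] is close to total [B].
    """
    if free_b_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_b_molar / (kd_molar + free_b_molar)


# Half of A is bound when the free partner concentration equals Kd.
half = fraction_bound(1e-9, 1e-9)

# With 1 µM partner, a 1 nM complex is almost fully formed,
# while a 1 mM complex is almost fully dissociated.
tight = fraction_bound(1e-6, 1e-9)
loose = fraction_bound(1e-6, 1e-3)
```

The same partner concentration therefore gives very different occupancy depending on Kd, which is why the slide uses Kd to separate permanent from transient complexes.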
One Approach Using Structural Bioinformatics: Exploiting Sequence and Structure Homologs to Identify Protein-Protein Binding Sites. Work of Jo-Lan Chung (former graduate student, Chemistry/Biochemistry), co-mentored by Wei Wang. Proteins: Structure, Function, and Bioinformatics 2006 62:630-640
Current Situation • We have a relatively small number of protein complexes • We have a large number of predominantly apo form structures being determined by structural genomics and functionally driven structure determination • Exploiting the information obtained from these apo structures to identify functional sites, such as protein-protein binding sites, is an important question.
General Properties of Protein-Protein Binding Sites • More hydrophobic than the rest of the protein surface. • Relatively flat (except for enzyme-substrate binding sites). • The binding free energy is not distributed equally across the protein interface: a small subset of interface residues forms energy hot spots, enriched in tyrosine, tryptophan, and arginine. • Structurally conserved residues, especially polar residues, correspond to the energy hot spots.
Previous Methods to Identify Protein-Protein Binding Sites • Sequence conservation e.g. Consurf • Docking • Threading and homology modeling • Evolutionary tracing • Correlated mutations • Properties of patches • Hydrophobicity • Neural networks and support vector machines
Characteristics of Previous Methods • The above mentioned methods use evolutionary information from sequence alignments, and/or residue properties, and/or the geometric information from structures. • It is difficult to judge the predictive powers of these various methods due to the differences of learning sets, the different definitions of binding sites and the lack of benchmarks.
Structurally Conserved Surface Residues? • None of the above methods considers the surface residues that are spatially conserved among a protein's structure homologs. • These residues are reported to correspond to the energy hot spots on protein interfaces and can be derived from multiple structure alignments.
Approach • Survey the structurally conserved surface residues in the non-redundant chains of hetero complexes from the Protein Data Bank (PDB). • Incorporate the structurally conserved surface residues with the information derived from sequence alignments and single structures to identify protein-protein binding sites.
Dataset • All non-redundant hetero-complexes were collected from the PDB (<30% sequence identity). • A pair of chains was selected if the solvent-accessible surface area (ASA) decreased by at least 450 Å² upon binding. • Pairs containing chains that belong to SCOP class ≥ 8 (classes 9 and 10 did not exist) or that are shorter than 80 amino acids were discarded.
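The selection rules on this slide can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the `Chain` and `ChainPair` records and their field names are hypothetical, and the ASA reduction is assumed here to be the total over both chains (free ASA of each chain minus the ASA of the complex); the slide does not say whether the 450 Å² threshold is per chain or per pair.

```python
from dataclasses import dataclass

MIN_ASA_REDUCTION = 450.0  # Å^2, threshold from the slide
MIN_LENGTH = 80            # amino acids, from the slide
MAX_SCOP_CLASS = 7         # chains in SCOP class >= 8 are discarded


@dataclass
class Chain:               # hypothetical record, for illustration only
    length: int
    scop_class: int
    asa_free: float        # ASA of the isolated chain, in Å^2


@dataclass
class ChainPair:           # hypothetical record, for illustration only
    a: Chain
    b: Chain
    asa_complex: float     # ASA of the two chains in complex, in Å^2


def asa_reduction(pair: ChainPair) -> float:
    """ASA lost on binding (assumed: summed over both chains)."""
    return pair.a.asa_free + pair.b.asa_free - pair.asa_complex


def keep_pair(pair: ChainPair) -> bool:
    """Apply the ASA, SCOP class, and length filters from the slide."""
    if asa_reduction(pair) < MIN_ASA_REDUCTION:
        return False
    return all(
        c.length >= MIN_LENGTH and c.scop_class <= MAX_SCOP_CLASS
        for c in (pair.a, pair.b)
    )


good = ChainPair(Chain(120, 1, 6000.0), Chain(95, 4, 5000.0), 10400.0)
small_interface = ChainPair(Chain(120, 1, 6000.0), Chain(95, 4, 5000.0), 10700.0)
short_chain = ChainPair(Chain(60, 1, 6000.0), Chain(95, 4, 5000.0), 10400.0)
```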
Dataset • Structure homologs were retrieved at the SCOP family level from the ASTRAL database at the 40% sequence purge level. • Each chain was aligned with its homologs using CE-MC. • A chain was selected if at least 4 members of its alignment were aligned together over 60% of its length with a Z-score > 4. • Chains with fewer than 20 interfacial contacts or with constant B-factors were discarded. The final dataset was composed of 274 non-redundant chains of hetero-complexes, each accompanied by a structure alignment with at least 4 members.
Derive the Structurally Conserved Residues • The structural conservation scores were derived from the multiple structural alignments. • Each position in the alignment has a structural conservation score, which represents the conservation in 3D space. • A position has a high conservation score if the aligned residues are spatially conserved.
The Structural Conservation Score • Raw structural conservation score [equation not recovered from the slide], built from a pairwise term that takes one value if a is not a gap and b is not a gap, and another value otherwise [pairwise term not recovered], where N is the total number of aligned structures, s_i(x) is the amino acid at position x in the i-th structure of the alignment, M is a modified PET substitution matrix calculated by Valdar et al., and d is the distance between Cα atoms.
The Structural Conservation Score • The B-factors determined by X-ray crystallographic experiments indicate the degree of mobility and disorder of an atom in a protein structure. • Raw structural conservation scores were weighted by the normalized B-factors (B_norm,i) to account for structural flexibility [weighting and normalization equations not recovered from the slide].
Does the Structural Conservation Score Discriminate Interface from Non-interface Residues? 60,834 Residues from 274 Chains
Incorporating the Structural Conservation Scores to Predict the Interface Residues A surface residue ↓ Sequence profile + ASA + structural conservation score in a window of 13 residues (the residue to be predicted and its 12 spatially nearest surface residues) ↓ Support vector machine classifier trained to distinguish surface interface residues from surface non-interface residues ↓ Interface or non-interface residue? ↓ Clustering of sites
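The 13-residue window in this flow is spatial rather than sequential, which a short sketch may make clearer. The code below only illustrates how such a window could be assembled; the function names, the use of one representative coordinate per surface residue, and the flat concatenation of per-residue features are assumptions, and the SVM itself is not shown.

```python
import math

WINDOW_NEIGHBORS = 12  # the slide uses the residue itself plus 12 neighbors


def spatial_window(index, coords, k=WINDOW_NEIGHBORS):
    """Return the residue index followed by its k nearest surface residues.

    coords holds one (x, y, z) point per surface residue (assumption:
    a single representative atom per residue).
    """
    others = [j for j in range(len(coords)) if j != index]
    others.sort(key=lambda j: math.dist(coords[index], coords[j]))
    return [index] + others[:k]


def window_features(index, coords, per_residue_features, k=WINDOW_NEIGHBORS):
    """Concatenate the feature vectors of all residues in the window.

    per_residue_features[j] stands in for that residue's sequence profile,
    ASA, and structural conservation score (hypothetical layout).
    """
    features = []
    for j in spatial_window(index, coords, k):
        features.extend(per_residue_features[j])
    return features


# Toy example: 20 surface residues spaced 1 Å apart along a line.
toy_coords = [(float(i), 0.0, 0.0) for i in range(20)]
toy_features = [[i] for i in range(20)]
window = spatial_window(0, toy_coords)
vector = window_features(0, toy_coords, toy_features)
```

If a protein has fewer than 13 surface residues, this sketch simply returns a shorter window; a real pipeline would need padding or a different rule.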
The Performances of the Predictors Predictor 1: Sequence profile + ASA. Predictor 2: Sequence profile + ASA + structural conservation score
Clustering of Results • Clustering was used to remove isolated interface residues and to include non-interface residues surrounded by several interface residues. • Any 2 residues were clustered if the distance between their Cβ atoms was less than 6.5 Å. • Clusters with only 1 or 2 residues were removed. • A non-interface residue was converted to an interface residue if at least 3 interface residues were located within 6 Å of its Cβ atom.
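The clustering rules above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes single-linkage clustering, assumes the small clusters are removed before the fill-in step, and takes one Cβ coordinate per residue from the caller (glycine has no Cβ, so a substitute such as Cα would have to be supplied). The function and parameter names are hypothetical.

```python
import math


def cluster_predictions(cb_coords, predicted,
                        link_cutoff=6.5, min_size=3,
                        fill_cutoff=6.0, min_neighbors=3):
    """Post-process predicted interface residues.

    cb_coords: dict mapping residue id -> (x, y, z) of its C-beta atom.
    predicted: set of residue ids predicted as interface.
    Returns the set of interface residues after clustering.
    """
    # Step 1: single-linkage clusters of residues closer than link_cutoff.
    unvisited = set(predicted)
    kept = set()
    while unvisited:
        seed = unvisited.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            current = stack.pop()
            near = {r for r in unvisited
                    if math.dist(cb_coords[current], cb_coords[r]) < link_cutoff}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        # Step 2: drop clusters with only 1 or 2 residues.
        if len(cluster) >= min_size:
            kept |= cluster

    # Step 3: convert non-interface residues that have at least
    # min_neighbors kept interface residues within fill_cutoff.
    filled = set(kept)
    for residue, xyz in cb_coords.items():
        if residue in kept:
            continue
        neighbors = sum(1 for q in kept
                        if math.dist(xyz, cb_coords[q]) <= fill_cutoff)
        if neighbors >= min_neighbors:
            filled.add(residue)
    return filled


# Toy example: residues 1-3 form a chain 5 Å apart, 4 is isolated,
# 5 sits among 1-3 but was not predicted, 6 is far from everything.
toy = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (10.0, 0.0, 0.0),
       4: (100.0, 0.0, 0.0), 5: (5.0, 3.0, 0.0), 6: (50.0, 0.0, 0.0)}
result = cluster_predictions(toy, {1, 2, 3, 4})
```

In the toy example, residue 4 is dropped as an isolated prediction and residue 5 is added because three interface residues lie within 6 Å of it.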
The Performances of the Predictors Precise prediction: at least 70% of the interface residues were identified. Correct prediction: at least 50% of the interface residues were identified. Partial prediction: some, but less than 50%, of the interface residues were identified. Wrong prediction: no interface residues were identified.
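These four categories amount to thresholds on the fraction of true interface residues recovered. The sketch below returns the most specific label that applies; note that by the definitions above a precise prediction also satisfies the "correct" threshold. The function name and the use of residue-id sets are assumptions made for illustration.

```python
def classify_prediction(true_interface, predicted):
    """Label a predicted binding site by the fraction of true
    interface residues it recovers (thresholds from the slide)."""
    if not true_interface:
        raise ValueError("true_interface must not be empty")
    fraction = len(true_interface & predicted) / len(true_interface)
    if fraction >= 0.7:
        return "precise"   # also counts as "correct" on the slide
    if fraction >= 0.5:
        return "correct"
    if fraction > 0:
        return "partial"
    return "wrong"


interface = set(range(10))
labels = [
    classify_prediction(interface, set(range(7))),  # 7 of 10 recovered
    classify_prediction(interface, set(range(5))),  # 5 of 10 recovered
    classify_prediction(interface, {0}),            # 1 of 10 recovered
    classify_prediction(interface, {99}),           # none recovered
]
```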
The Predicted Binding Sites (example 1) Protein: domain 1 of the human coxsackie and adenovirus receptor (CAR D1) - yellow • Mediates adenovirus and coxsackievirus B infection. • CAR is an integral membrane protein expressed in a broad range of human and murine cell types. CAR D1 is one of its two extracellular domains. Binding partner: knob domain of adenovirus serotype 12 (Ad12) - blue (Figure panels: Predictor 2, Predictor 1)
The Predicted Binding Sites (example 1) • Each CAR D1 binds at the interface between two adjacent Ad12 knob domains. • Consistent with the observation that most neutralizing antibodies to knob are directed against the trimer rather than the monomer.
The Predicted Binding Sites (example 2) • Protein: adrenodoxin (Adx) • In mitochondria of the adrenal cortex, the steroid hydroxylating system requires the transfer of electrons from the membrane-attached flavoprotein adrenodoxin reductase (AR) via the soluble Adx to the membrane-integrated cytochrome P450 of the CYP11 family. • Binding partner: adrenodoxin reductase (AR) - blue (Figure panels: Predictor 1, Predictor 2)
The Predicted Binding Sites (example 3) • Protein : fibroblast growth factor receptor 2 (FGFR2) Ser252Trp Mutant (Gray) • Apert syndrome (AS) is caused by substitution of one of two adjacent residues, Ser252Trp or Pro253Arg (green). • Binding partner: fibroblast growth factor (FGF2) (blue) Predictor 1 Predictor 2
The Predicted Binding Sites (example 4) Protein: nitrogenase molybdenum-iron protein of Clostridium pasteurianum, β subunit Binding partner: nitrogenase molybdenum-iron protein of Clostridium pasteurianum, α subunit (Figure panels: Predictor 2, Predictor 1)
Is the Method Sensitive to the Multiple Structure Alignment Algorithm? Hemoglobin chain Left: automatic multiple structure alignments (CE-MC) Right: expert-curated multiple structure alignments (HOMSTRAD)
Is the Method Sensitive to the Multiple Structure Alignment Algorithm? Murine T-cell receptor variable domain Left: automatic multiple structure alignments (CE-MC) Right: expert-curated multiple structure alignments (HOMSTRAD)
Is the Method Sensitive to the Multiple Structure Alignment Algorithm? Adrenodoxin Left: automatic multiple structure alignments (CE-MC) Right: expert-curated multiple structure alignments (HOMSTRAD)
What Affects the Performance of Predictor 2? • Poor structure alignment, e.g., more than 30% of the residues of the methane monooxygenase hydroxylase subunit were in gapped positions of its structure alignment. • Binding residues located in flexible loops, e.g., the deteriorated prediction of the interfaces on VP1 of P1/Mahoney poliovirus may be caused by its long and flexible loops contacting the other two coat proteins, VP2 and VP3.
Conclusions 1. Analysis of the structurally conserved surface residues • The conserved residues were measured by structural conservation scores derived from multiple structure alignments, followed by weighting with the B-factors. • Although derived from alignments of single structures, these structurally conserved residues did differentiate the protein interfaces from the rest of the surface.
Conclusions 2. Incorporating the structural conservation score improved the prediction significantly. • Before clustering, the recall increased by about 10% at a precision of 50% and by about 18% at a precision of 40%. • After clustering, at a precision of about 50%, the number of correctly predicted binding sites increased by close to 16% and the number of precisely predicted binding sites increased by close to 13%. • 53.0% of the binding sites were precisely predicted, 75.9% were correctly predicted, and 20.4% were partially predicted.
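The figures above are quoted in terms of precision and recall. As a reminder of what those quantities measure at the residue level, here is a minimal sketch using the standard definitions; the function name and the set-based inputs are assumptions, and the slides do not show how the authors computed their values.

```python
def precision_recall(true_interface, predicted):
    """Residue-level precision and recall.

    precision = true positives / all predicted interface residues
    recall    = true positives / all true interface residues
    """
    true_positives = len(true_interface & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_interface) if true_interface else 0.0
    return precision, recall


# Two of four predictions are right, and two of four true residues are found.
example = precision_recall({1, 2, 3, 4}, {3, 4, 5, 6})
```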
Conclusions 3. This study was an initial trial that exploits multiple structure alignments on a large scale for the prediction of functional regions. 4. A more suitable scoring function or weighting function could be developed in the future. 5. This method can be used to guide experiments, such as site-specific mutagenesis or combined with docking procedures to limit the search space.
Acknowledgments Dr. Wei Wang Dr. Chittibabu Guda Dr. BVB Reddy All the members in Bourne group Dr. Philip E. Bourne This work was supported by the Molecular Biophysics Training Grants at UCSD