Stefan M. Larson¹, Ariel Di Nardo², Alan R. Davidson²,³

Bioinformatics and the protein folding problem: sequence analysis and structure comparison of the SH3 domain

Stefan M. Larson¹, Ariel Di Nardo², Alan R. Davidson²,³

¹Biophysics Program, Department of Structural Biology, Stanford University
²Department of Biochemistry, University of Toronto
³Department of Molecular and Medical Genetics, University of Toronto

[Figure: Sequence → Structure → Behaviour (thermostability, binding affinity, in vivo function, dimerization, ...), studied through sequence analysis of an SH3 alignment, structure comparison, and experimental studies.]

The SH3 domain is an ideal model system:
• 266 unique sequences
• 18 solved structures
• well-behaved
• well-characterized
• simple fold
[Figure: Fyn tyrosine kinase SH3 domain]

Aims of the study
1. Assemble a complete and accurate alignment of all available SH3 sequences.
2. Analyze residue frequencies and conservation patterns in the sequence alignment to quantify sequence variation in the SH3 fold.
3. Develop an algorithm for covariation analysis that detects meaningful residue interactions within the SH3 domain.
4. Interpret conservation and covariation patterns to identify residues and interactions critical for the stability and function of the SH3 domain.
5. Align and rigorously compare all available SH3 structures to quantify structural variation in the SH3 domain.
6. Compare the results of sequence alignment analysis and structure comparison to provide insight into the sequence-structure relationship.

Automated, iterative alignment protocol
1. Search NRDB with PSI-BLAST using the target sequence to collect homologous "hits".
2. Pull out the SH3 domains from the hits to give a crude set of new target homologue sequences.
3. Align with ClustalW and adjust gaps manually (crude alignment → refined alignment).
4. Remove sequences with <18% identity and redundant sequences.
5. Iterate until no new hits are found, giving a complete, non-redundant domain alignment.
[Figure labels: target sequence, 2° structure, crude alignment, refined alignment]
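The filtering step can be illustrated with a small sketch. This is not the authors' code. It assumes the sequences are already aligned as equal-length strings with '-' for gaps. It also assumes that identity is measured against a single reference (target) sequence and that "redundant" means 100% identical to a sequence already kept. None of these details is stated on the poster, and the function names are hypothetical.

```python
# Hypothetical helpers sketching the identity/redundancy filter step.
# Assumptions: sequences are already aligned (equal length, '-' = gap),
# identity is measured against one reference (target) sequence, and
# "redundant" means 100% identical to a sequence already kept.


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def filter_alignment(seqs, reference, min_identity=18.0, redundancy_cutoff=100.0):
    """Drop sequences below min_identity to the reference, then drop any
    sequence at or above redundancy_cutoff identity to one already kept."""
    kept = []
    for s in seqs:
        if percent_identity(s, reference) < min_identity:
            continue  # too distant from the target family
        if any(percent_identity(s, k) >= redundancy_cutoff for k in kept):
            continue  # redundant with a sequence already in the set
        kept.append(s)
    return kept


if __name__ == "__main__":
    ref = "ALYDYEA"
    seqs = ["ALYDYEA", "ALYDYEA", "ALFDYEA", "GGGGGGG"]
    print(filter_alignment(seqs, ref))  # ['ALYDYEA', 'ALFDYEA']
```

In the full protocol, the kept sequences would become new search targets, and the search-align-filter cycle repeats until no new hits appear.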

Top 20 conserved residues (Fyn SH3 numbering; lower entropy indicates stronger conservation)

Fyn SH3 residue | Entropy | Role
P51 | 1.23 | peptide-binding
W36 | 1.29 | peptide-binding
G48 | 1.44 | structural
G23 | 2.42 | structural
A6 | 2.49 | hydrophobic core
L18 | 2.68 | hydrophobic core
E24 | 3.14 | buried H-bond to S41
F20 | 3.26 | hydrophobic core
Y10 | 3.53 | peptide-binding
A39 | 4.05 | hydrophobic core
I50 | 4.07 | hydrophobic core
I28 | 4.14 | hydrophobic core
D9 | 4.30 | ?
Y54 | 4.67 | peptide-binding
F26 | 4.75 | hydrophobic core
W37 | 4.92 | hydrophobic core
V55 | 4.93 | hydrophobic core
L7 | 5.24 | ?
Y8 | 5.33 | peptide-binding
G45 | 5.41 | structural
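Positional entropy measures how variable one alignment column is. The idea can be sketched as plain Shannon entropy in bits. This minimal sketch assumes an unweighted alignment of equal-length strings and ignores gaps. It is not the poster's exact measure: several values in the table exceed log2 20 ≈ 4.32, the maximum for 20 amino acids in bits (e.g. 5.41 for G45). The poster therefore evidently uses a different scaling or weighting, and this sketch will not reproduce those numbers. Function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of one alignment column; gaps are ignored."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    probs = [k / n for k in Counter(residues).values()]
    return sum(-p * math.log2(p) for p in probs)


def positional_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy("".join(seq[i] for seq in alignment))
            for i in range(len(alignment[0]))]


if __name__ == "__main__":
    aln = ["PY", "PF", "PW", "PH"]  # toy alignment
    print(positional_entropies(aln))  # [0.0, 2.0]
```

A fully conserved column (all P) scores 0. A column with four different residues in equal proportion scores log2 4 = 2 bits.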

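Covariation analysis
[Figure: observed vs. expected residue frequencies. Single-position frequencies: A39 26%, I50 26%, G39 35%, F50 37%. If positions 39 and 50 varied independently, the expected frequency of a residue pair would be the product of the single frequencies (e.g. 26% × 26% ≈ 7% for A39/I50). A pair covaries when its observed frequency departs from this expectation.]
Statistical techniques used:
• χ² analysis: chi-square p-value significance levels; phi association coefficient
• Information theory: Shannon entropy; mutual information
• Sequence bias reduction: Henikoff weighting; sub-alignment diversity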
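Two of the listed techniques can be combined in a minimal sketch. It is not the authors' algorithm. The first part computes Henikoff position-based sequence weights, which down-weight over-represented sequences; the sketch treats gaps as an ordinary symbol, which is an assumption. The second part computes a weighted phi coefficient for "residue x at column i" versus "residue y at column j". For a 2×2 table, χ² = Nφ², and the 1-degree-of-freedom p-value is erfc(√(χ²/2)). Using the raw sequence count as N is an assumption. Column indices are 0-based alignment columns, not Fyn residue numbers. Mutual information and the sub-alignment diversity criterion are not shown. Function names are hypothetical.

```python
import math
from collections import Counter


def henikoff_weights(alignment):
    """Position-based sequence weights (Henikoff & Henikoff, 1994), summing to 1.
    Each column gives a sequence 1 / (r * k): r = distinct symbols in the column,
    k = sequences sharing this sequence's symbol. Gaps count as a symbol here."""
    weights = [0.0] * len(alignment)
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        counts = Counter(column)
        r = len(counts)
        for s, res in enumerate(column):
            weights[s] += 1.0 / (r * counts[res])
    total = sum(weights)
    return [w / total for w in weights]


def phi_coefficient(alignment, i, x, j, y, weights=None):
    """Weighted phi for 'residue x at column i' vs 'residue y at column j'."""
    if weights is None:
        weights = [1.0 / len(alignment)] * len(alignment)
    a = b = c = d = 0.0  # 2x2 table: both, x only, y only, neither
    for seq, w in zip(alignment, weights):
        has_x, has_y = seq[i] == x, seq[j] == y
        if has_x and has_y:
            a += w
        elif has_x:
            b += w
        elif has_y:
            c += w
        else:
            d += w
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def chi_square_p(phi, n):
    """For a 2x2 table chi^2 = n * phi^2 (1 degree of freedom, no continuity
    correction); its upper-tail p-value is erfc(sqrt(chi^2 / 2))."""
    chi2 = n * phi ** 2
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    aln = ["AG", "AG", "IF", "IF"]  # toy alignment: columns 0 and 1 covary
    w = henikoff_weights(aln)
    phi = phi_coefficient(aln, 0, "A", 1, "G", w)
    print(phi, chi_square_p(phi, len(aln)))
```

A positive phi means the two residues co-occur more often than expected. A negative phi, as for some pairs in the covariation table, means they tend to exclude each other.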

Top 20 covarying pairs

ResX | ResY | Phi | %ID | Seqs
G39 G | F50 F | 0.690 | 0.33 | 100
I26 I | G39 G | 0.615 | 0.33 | 92
G48 G | P51 P | 0.529 | 0.28 | 257
I26 I | F50 F | 0.483 | 0.35 | 70
G39 -G | F50 V | -0.472 | 0.30 | 68
G39 A | F50 I | 0.451 | 0.38 | 48
V4 A | E57 L | 0.442 | 0.36 | 20
N53 N | V55 V | 0.442 | 0.34 | 117
E30 E | K38 R | 0.437 | 0.32 | 23
I26 L | G39 A | 0.429 | 0.34 | 46
R5 V | E17 D | 0.410 | 0.39 | 36
F20 -F | I26 V | -0.397 | 0.30 | 48
F50 V | V55 L | 0.392 | 0.31 | 27
G39 V | F50 V | 0.379 | 0.33 | 24
F20 L | I26 V | 0.362 | 0.32 | 29
Y8 Y | N53 N | 0.359 | 0.34 | 112
G39 A | I58 R | 0.358 | 0.36 | 11
Y8 F | E57 E | 0.357 | 0.36 | 24
I26 L | F50 I | 0.353 | 0.35 | 38
E30 D | S32 S | 0.350 | 0.39 | 16

• 33/93 covariations are between hydrophobic core residues
• five hydrophobic core positions (F20, I26, A39, F50, V55) participate in 53/93 covariations
• five functional residues (Y8, E17, G35, L49, S52) show high covariation
• covariation triplets were also detected among core residues

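Covariation predicts stable mutants (Tm in °C)

Type | Covar 1 | Covar 2 | Covar 3 | Phi | Cov1 Tm | Cov2 Tm | Cov3 Tm | Comb Tm
Simple covariation | F20L | F26V | | 0.3619 | 68.3 | 63.5 | | 76.4
Multiple covariation | F26I | A39G | | 0.6155 | 68.6 | 68.2 | | 66.6
Multiple covariation | F26I | I50F | | 0.4834 | 68.6 | 45.4 | | 46.0
Multiple covariation | A39G | I50F | | 0.6896 | 68.2 | 45.4 | | 55.2
Multiple covariation | F26I | A39G | I50F | 0.5962 | 68.6 | 68.2 | 45.4 | 74.5
Negative covariation | A6V | F20F | | -0.3421 | 38.6 | 69.1 (F20L) | | 55.1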

Successful contact prediction
[Figure: contact map (SH3 position vs. SH3 position) marking residue pairs < 8 Å apart and covarying pairs.]
27/32 (84%) covarying pairs are < 8 Å apart.
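The 84% figure is the fraction of covarying pairs whose residues lie within 8 Å (27/32 = 0.84375). The poster does not say which atoms define the distance. This sketch assumes one representative coordinate per residue (for example Cα), which is an assumption. The function name and the toy coordinates are hypothetical.

```python
import math


def contact_fraction(pairs, coords, cutoff=8.0):
    """Count residue pairs whose representative atoms are closer than cutoff (Å).
    pairs: (i, j) residue numbers; coords: residue number -> (x, y, z)."""
    close = sum(1 for i, j in pairs if math.dist(coords[i], coords[j]) < cutoff)
    return close, len(pairs), close / len(pairs)


if __name__ == "__main__":
    # Toy coordinates, not taken from any SH3 structure.
    coords = {20: (0.0, 0.0, 0.0), 26: (3.0, 4.0, 0.0), 50: (10.0, 0.0, 0.0)}
    print(contact_fraction([(20, 26), (20, 50)], coords))  # (1, 2, 0.5)
```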

Structural variation in SH3 domains

18 SH3 structures were aligned.

Structure | PDB file | Average RMSD (Å) | Average identity (%)
Sem5-C | 1sem | 0.81 | 31.9
Lck | 1lck | 0.82 | 28.1
Csk | 1csk | 0.85 | 27.8
Hck | 1ad5 | 0.86 | 32.4
Crk-N | 1cka | 0.88 | 29.9
Eps8 | 1aoj | 0.88 | 25.2
Abl | 1abo | 0.90 | 27.2
Spectrin | 1shg | 0.91 | 28.4
Amphiphysin | 1bb9 | 0.91 | 25.5
53BP2 | 1ycs | 0.92 | 26.8
Fyn | 1shf | 0.93 | 34.1
Src | 1fmk | 0.96 | 31.6
Grb2-N | 1gbq | 0.98 | 31.7
PI3-kinase | 1pht | 0.98 | 26.3
Grb2-C | 1gfc | 1.11 | 31.2
Nebulin | 1neb | 1.23 | 29.5
Btk | 1awx | 1.25 | 30.8
PLCγ | 1hsq | 1.47 | 27.9

[Figure: Hck and Amphiphysin SH3 structures compared; RMSD 0.9 Å, sequence identity 26%.]
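The poster does not say how the per-structure averages were formed. This minimal sketch assumes each value is the mean over pairwise comparisons with the other 17 structures. It also assumes the coordinates are already superposed and residue-matched by the structural alignment. The fitting step itself (for example a Kabsch superposition) is not shown. Function names and toy values are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two residue-matched coordinate lists that are ALREADY
    superposed; no fitting (e.g. Kabsch) is performed here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def average_to_others(matrix):
    """Row means of a square pairwise matrix, excluding the diagonal (self) entry."""
    n = len(matrix)
    return [sum(row[k] for k in range(n) if k != r) / (n - 1)
            for r, row in enumerate(matrix)]


if __name__ == "__main__":
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)]))  # 0.0
    print(average_to_others([[0, 1, 2], [1, 0, 3], [2, 3, 0]]))  # [1.5, 2.0, 2.5]
```

The same `average_to_others` helper would apply to a pairwise sequence-identity matrix, which gives the "Average identity" column under the same assumption.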

Conservation of the structural core
[Figure: positional RMSD (Å) plotted along the sequence GKYVRALYDYEAREDDELSFKKGDIITVLEKSDDGWWKGRLNDTGREGLFPSNYVEEIDS, with regions A to E and the RT-src, distal and n-src loops labelled.]
• little structural variation in the β-sheets
• secondary structure assignment is very consistent
• the large RT-src loop is surprisingly constant
• regions with RMSD < 2 Å define the structural core

Residue-by-residue structural variation
[Figure: positional RMSD (Å) vs. positional sequence entropy]
• hydrophobic core residues are structurally well conserved (RMSD < 1 Å)
• ligand-binding residues are structurally well conserved (RMSD < 1 Å)
• no correlation between sequence conservation and structural conservation
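The poster reports no correlation between positional sequence entropy and positional RMSD but does not name the statistic used. This minimal sketch uses Pearson's r, which is an assumption. A value of r near 0 indicates no linear relationship. The inputs in the usage example are made-up numbers, and r is undefined if either list is constant (the function would raise ZeroDivisionError).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    entropy = [1.2, 4.1, 9.0, 13.6]  # made-up positional entropies
    rmsd = [0.4, 0.9, 0.5, 0.6]      # made-up positional RMSDs (Å)
    print(round(pearson_r(entropy, rmsd), 3))
```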

Function of SH3 domains

Structural conservation of ligand-binding residues
[Figure: residue burial (%) along the sequence GKYVRALYDYEAREDDELSFKKGDIITVLEKSDDGWWKGRLNDTGREGLFPSNYVEEIDS, without and with bound ligand, with standard deviations.]
• some residues are consistently buried by the ligand (i.e. they contact the ligand)
• other residues contact the ligand less consistently from domain to domain
• seven residues show an increase in burial on ligand binding whose standard deviation is smaller than the mean increase: Y8, Y10, G35, W36, P51, N53, Y54
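The selection criterion can be sketched under one reading of the slide. In this reading, a residue counts as a consistent ligand contact when its burial increases on ligand binding and the standard deviation of that increase across domains is smaller than the mean increase. This interpretation is an assumption. So is the input format: per-residue lists of % burial, one value per domain, in the same domain order with and without ligand. The function name and toy values are hypothetical.

```python
from statistics import mean, stdev


def consistent_contacts(burial_free, burial_bound):
    """Residues whose % burial increases on ligand binding with a standard
    deviation (across domains) smaller than the mean increase.
    Both arguments: residue -> list of % burial, one value per domain, same order."""
    selected = []
    for res, free in burial_free.items():
        increases = [bound - f for f, bound in zip(free, burial_bound[res])]
        m = mean(increases)
        if m > 0 and stdev(increases) < m:
            selected.append(res)
    return selected


if __name__ == "__main__":
    # Toy values for three domains, not measured data.
    free = {"Y8": [10, 12, 11], "R13": [20, 30, 25]}
    bound = {"Y8": [60, 63, 61], "R13": [20, 70, 26]}
    print(consistent_contacts(free, bound))  # ['Y8']
```

In the toy data, R13 is buried by the ligand in only one of three domains, so its increase varies more than its mean and it is not selected.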

Sequence conservation of ligand-binding residues

Residue | Entropy
Consistent ligand contacts:
Y8 | 5.3
Y10 | 3.5
G35 | 7.3
W36 | 4.9
P51 | 1.2
N53 | 6.5
Y54 | 4.7
Less consistent ligand contacts:
R13 | 12.6
E14 | 13.6
D15 | 9.9
E16 | 8.9
L49 | 8.6

Conclusions
1. Important sequence-structure relationships in the SH3 domain are subtle, and they are missed by studying only a single sequence and/or structure.
2. Covariation data were used to make accurate predictions of stabilizing mutations and residue contacts.
3. Residues that make structurally conserved ligand contacts are more conserved in sequence than residues that contact the ligand less consistently. This may be a source of binding specificity.
4. Bioinformatics was successfully used to gain valuable data on an already very well-characterized system.

Acknowledgements
Davidson Lab: Dr. Alan Davidson, Ariel Di Nardo, Julian Northey, Arianna Rath
Supervisory Committee: Dr. Richard Collins, Dr. Chris Hogue

References
Larson SM, Di Nardo AA, Davidson AR. (2000) "Analysis of covariation in an SH3 domain sequence alignment: applications in tertiary contact prediction and the design of compensating hydrophobic core substitutions." Journal of Molecular Biology 303(3): 443-456.
Larson SM, Davidson AR. (2000) "A comprehensive analysis of the sequences and structures comprising the SH3 domain." Protein Science (in press).
Plaxco KW, Larson S, Ruczinski I, Riddle DS, Thayer EC, Buchwitz B, Davidson AR, Baker D. (2000) "Evolutionary conservation in protein folding kinetics." Journal of Molecular Biology 298(2): 303-312.
http://www.stanford.edu/~smlarson
