Advancements in Protein Design: State-of-the-Art Techniques and Computational Strategies

Bioinformatics 2 -- lecture 20 • Protein design -- the state of the art

Protein folding / protein design: folding goes from sequence to structure; design goes from structure to sequence.

Sequence space maps to structure space as many-to-one: a family of sequences shares one fold.

Short history of protein design
• Site-directed mutagenesis, minimization (J. Wells, 1980s-90s)
• Coiled coils, helix bundles (DeGrado, 1980s-90s)
• Binary patterning (Hecht, 1990s)
• Extreme protein stabilization (Mayo, 1990s)
• Binding pocket design (Hellinga, 2000)
• New fold design (Kuhlman & Baker, 2002-4)
• Protein-protein interface design (Gray & Baker, 2004)
• Open source protein design algorithm EGAD (Pokala, 2005)
• Novel enzyme by design (Baker, 2008)
Experimental approaches:
• in vitro evolution
• phage display
Computational approaches:
• Dead-End Elimination
• binary patterning

Rational design: minimizing proteins. ANP = atrial natriuretic peptide (Bing Li et al., Science, 1995)

Rational design: binary patterning in proteins (Kamtekar et al., Science, 1993)

Computational protein design using Dead-End Elimination
• Select positions for mutating.
• Select allowed amino acids at those positions.
• For the selected amino acids, try all sidechain orientations (rotamers).
• Choose the set of rotamers that gives the lowest "energy".

Sidechain rotamers
Sidechain conformations fall into three classes called rotational isomers, or rotamers.
[Figure: a random sampling of phenylalanine sidechains, with the backbone superimposed]

Sidechain rotamers
1-4 interactions differ greatly in energy depending on the moieties involved.
[Figure: the three rotamers drawn with atoms N, C=O and H on CA and CG, H and H on CB, labeled "m" = -60° gauche, "t" = 180° anti/trans, "p" = +60° gauche]

Rotamer stability is dependent on the backbone φ,ψ angles
[Figure: W sidechain shown lying over Thr backbone]
Rotamers of W*:
rotamer   χ1     χ2     P | φ=-140, ψ=160   P | φ=-60, ψ=-40
p-90      +60    -90    0.372               0.079
p90       +60    +90    0.238               0.005
t-105     180    -105   0.033               0.251
t90       180    90     0.021               0.268
m0        -65    5      0.038               0.124
m95       -65    95     0.183               0.203
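As a small illustration of how backbone-dependent probabilities might be used, the sketch below stores the tryptophan rotamer values listed above and picks the most probable rotamer for each backbone conformation. The dictionary layout, the function name `most_probable`, and the shorthand labels "beta" (φ=-140, ψ=160) and "alpha" (φ=-60, ψ=-40) are assumptions made for this example, not part of either rotamer library.

```python
# Tryptophan rotamer values copied from the list above.
# Each entry: rotamer name -> (chi1, chi2, P_beta, P_alpha), where
#   P_beta  = P(rotamer | phi=-140, psi=160)   (label "beta" is our shorthand)
#   P_alpha = P(rotamer | phi=-60,  psi=-40)   (label "alpha" is our shorthand)
TRP_ROTAMERS = {
    "p-90":  (60, -90, 0.372, 0.079),
    "p90":   (60, 90, 0.238, 0.005),
    "t-105": (180, -105, 0.033, 0.251),
    "t90":   (180, 90, 0.021, 0.268),
    "m0":    (-65, 5, 0.038, 0.124),
    "m95":   (-65, 95, 0.183, 0.203),
}

def most_probable(backbone):
    """Return the name of the most probable rotamer for 'beta' or 'alpha'."""
    column = {"beta": 2, "alpha": 3}[backbone]
    return max(TRP_ROTAMERS, key=lambda name: TRP_ROTAMERS[name][column])

print(most_probable("beta"))   # p-90
print(most_probable("alpha"))  # t90
```

The preferred rotamer changes with the backbone, which is why a backbone-dependent library can rank sidechain conformations differently at different positions of the same protein.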

Rotamer Libraries
Rotamer libraries have been compiled by clustering the sidechains of each amino acid over the whole database. Each cluster is a representative conformation (or rotamer), and is represented in the library by the best sidechain angles (chi angles), the "centroid" angles, for that cluster.
Two commonly used rotamer libraries:
• Jane & David Richardson*: http://kinemage.biochem.duke.edu/databases/rotamer.php
• Roland Dunbrack: http://dunbrack.fccc.edu/bbdep/index.php
*The rotamers of W on the previous page are from the Richardson library.

Sidechain prediction
Given the sequence and only the backbone atom coordinates, accurately model the positions of the sidechains.
[Figure: fine lines = true structure; thick lines = sidechain predictions using the method of Desmet et al.]
Desmet et al., Nature v.356, pp.339-342 (1992)

Theoretical complexity of sequence design
Estimated number of sidechain rotamers: $R = 193$
Typical small protein length: $L = 100$ residues
Sequence complexity: $20^{100} = 1.3 \times 10^{130}$
Rotamer complexity: $193^{100} = 3.6 \times 10^{228}$
Complexity of DEE algorithm: $O(R^2 L^2) \approx 3.7 \times 10^{8}$
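The arithmetic above can be checked directly, since Python integers have arbitrary precision. This is only a sanity check of the orders of magnitude; the helper name `order_of_magnitude` is made up for the example.

```python
R = 193   # estimated number of sidechain rotamers (from the slide)
L = 100   # residues in a typical small protein (from the slide)

sequence_space = 20 ** L         # every amino acid at every position
rotamer_space = R ** L           # every rotamer at every position
dee_cost = R ** 2 * L ** 2       # the O(R^2 L^2) estimate for DEE

def order_of_magnitude(n):
    """Power of ten of a positive integer, i.e. floor(log10(n))."""
    return len(str(n)) - 1

print(order_of_magnitude(sequence_space))  # 130
print(order_of_magnitude(rotamer_space))   # 228
print(dee_cost)                            # 372490000
```

The gap between 10^228 and roughly 10^8 is the whole argument for DEE: exhaustive enumeration is impossible, while pairwise comparisons of rotamers are cheap.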

Dead end elimination theorem
• Each residue is numbered ($i$ or $j$) and each residue has a set of rotamers ($r$, $s$ or $t$). So, the notation $i_r$ means "choose rotamer $r$ for position $i$".
• The total energy is the sum of three components, fixed-fixed ($E_{template}$), fixed-movable ($E(i_r)$) and movable-movable ($E(i_r, j_s)$):
$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_j E(i_r, j_s)$
where $r$ and $s$ are any choice of rotamers.
NOTE: $E_{global} \ge E_{GMEC}$ for any choice of rotamers.
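To make the energy sum above concrete, here is a minimal sketch that evaluates it for one rotamer assignment. The data layout (nested dictionaries, each pair of positions stored once with the lower position first) and all energy values are invented for illustration; a real design program would compute them from a force field.

```python
def global_energy(assignment, e_template, e_self, e_pair):
    """Template energy + self energies + pair energies for one assignment.

    assignment: position -> chosen rotamer
    e_self[i][r]: fixed-movable energy of rotamer r at position i
    e_pair[(i, j)][(r, s)]: movable-movable energy, stored once with i < j
    (counting each pair once is an assumption of this sketch)
    """
    positions = sorted(assignment)
    total = e_template
    for i in positions:
        total += e_self[i][assignment[i]]
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            total += e_pair[(i, j)][(assignment[i], assignment[j])]
    return total

# Toy energies for two positions with rotamers "a" and "b" (made-up numbers).
E_SELF = {1: {"a": -1.0, "b": 1.0}, 2: {"a": -2.0, "b": 2.0}}
E_PAIR = {(1, 2): {("a", "a"): 5.0, ("a", "b"): 0.0,
                   ("b", "a"): 0.0, ("b", "b"): 1.0}}

print(global_energy({1: "a", 2: "a"}, 0.0, E_SELF, E_PAIR))  # 2.0
print(global_energy({1: "b", 2: "a"}, 0.0, E_SELF, E_PAIR))  # -1.0
```

In this toy case the assignment with the best self energies (a, a) is not the global minimum, because its pair term is unfavorable; that is why the pair terms cannot be ignored when choosing rotamers.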

Dead end elimination theorem
• If $i_g$ is in the GMEC and $i_t$ is not, then we can separate the terms that contain $i_g$ or $i_t$ and re-write the inequality.
$E_{GMEC} = E_{template} + E(i_g) + \sum_j E(i_g, j_g) + \sum_j E(j_g) + \sum_j \sum_k E(j_g, k_g)$
...is less than...
$E_{notGMEC} = E_{template} + E(i_t) + \sum_j E(i_t, j_g) + \sum_j E(j_g) + \sum_j \sum_k E(j_g, k_g)$
Canceling the terms that appear on both sides, we get:
$E(i_t) + \sum_j E(i_t, j_g) > E(i_g) + \sum_j E(i_g, j_g)$
So, if we find two rotamers $i_r$ and $i_t$, and:
$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$
then $i_r$ cannot possibly be in the GMEC: whatever rotamers the other positions take, replacing $i_r$ with $i_t$ lowers the energy.

Dead end elimination theorem
$E(i_r) + \sum_j \min_s E(i_r, j_s) > E(i_t) + \sum_j \max_s E(i_t, j_s)$
The DEE theorem can be translated into plain English as follows: if the "worst case scenario" for $t$ is better than the "best case scenario" for $r$, then you always choose $t$.
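The elimination criterion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited work: the function names, the dictionary layout and the toy energies are all invented here, and the brute-force search is included only to check that pruning never discards the true minimum.

```python
from itertools import product

def pair_energy(e_pair, i, r, j, s):
    """Look up E(i_r, j_s); each pair of positions is stored once, lower first."""
    if i < j:
        return e_pair[(i, j)][(r, s)]
    return e_pair[(j, i)][(s, r)]

def dee_prune(e_self, e_pair):
    """Remove rotamers r whose best case is worse than some t's worst case."""
    alive = {i: list(rotamers) for i, rotamers in e_self.items()}
    changed = True
    while changed:
        changed = False
        for i in alive:
            others = [j for j in alive if j != i]
            for r in list(alive[i]):
                for t in alive[i]:
                    if t == r:
                        continue
                    best_r = e_self[i][r] + sum(
                        min(pair_energy(e_pair, i, r, j, s) for s in alive[j])
                        for j in others)
                    worst_t = e_self[i][t] + sum(
                        max(pair_energy(e_pair, i, t, j, s) for s in alive[j])
                        for j in others)
                    if best_r > worst_t:      # r is dead-ending
                        alive[i].remove(r)
                        changed = True
                        break
    return alive

def brute_force_gmec(e_self, e_pair):
    """Enumerate every combination (only feasible for toy problems)."""
    positions = sorted(e_self)

    def energy(choice):
        total = sum(e_self[i][r] for i, r in zip(positions, choice))
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                total += pair_energy(e_pair, positions[a], choice[a],
                                     positions[b], choice[b])
        return total

    return min(product(*(e_self[i] for i in positions)), key=energy)

# Toy problem: three positions, two rotamers each (made-up energies).
E_SELF = {0: {"a": 0, "b": 4}, 1: {"a": 1, "b": 0}, 2: {"a": 0, "b": 0}}
E_PAIR = {
    (0, 1): {("a", "a"): 0, ("a", "b"): 1, ("b", "a"): 1, ("b", "b"): 0},
    (0, 2): {("a", "a"): -1, ("a", "b"): 0, ("b", "a"): 0, ("b", "b"): -1},
    (1, 2): {("a", "a"): 0, ("a", "b"): 2, ("b", "a"): -2, ("b", "b"): 0},
}

print(dee_prune(E_SELF, E_PAIR))         # {0: ['a'], 1: ['b'], 2: ['a']}
print(brute_force_gmec(E_SELF, E_PAIR))  # ('a', 'b', 'a')
```

In this toy case pruning converges to one rotamer per position and agrees with exhaustive enumeration. Note that rotamer "a" at position 1 only becomes eliminable after rotamers at the other positions have been removed, which is why the loop repeats until nothing changes.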

DEE algorithm
$E_{global} = E_{template} + \sum_i E(i_r) + \sum_i \sum_j E(i_r, j_s)$
[Figure: worked example showing the self energies $E(r_1)$ and pair energies $E(r_1, r_2)$ as matrices for three residues, each with rotamers a, b, c]
Find two columns (rotamers) within the same residue, where one is always better than the other. Eliminate the rotamer that can always be beaten. (Repeat until only 1 rotamer per residue remains.)

Sequence design using DEE
[Figure: energy matrices $E(r_1)$, $E(r_2)$ and $E(r_1, r_2)$ in which the rotamers at position 3 belong to either Asp or Leu]
"Rotamers" within the DEE framework can have different atoms, i.e. they can be different amino acids. Using DEE, we choose the best set of rotamers. Now we have the sequence of the lowest energy structure. In the example, we have D or L at position 3.
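The point that a rotamer can carry an amino acid identity can be shown with a tiny example. The sketch below labels each rotamer as an (amino acid, conformer) pair, finds the lowest-energy combination, and reads the sequence off the result. For brevity it uses exhaustive search rather than DEE, and the positions, the Val rotamers at position 2, and all energies are invented for illustration; only the Asp/Leu choice at position 3 echoes the slide.

```python
from itertools import product

# Self energies: position -> {(amino acid, conformer): energy}. Made-up values.
E_SELF = {
    2: {("Val", "a"): 0.0, ("Val", "b"): 1.0},
    3: {("Asp", "a"): 2.0, ("Asp", "b"): 3.0,
        ("Leu", "a"): -1.0, ("Leu", "b"): 0.0},
}

# Pair energies between positions 2 and 3; pairs not listed are taken as 0.
E_PAIR_23 = {
    (("Val", "a"), ("Leu", "a")): 5.0,    # a steric clash in this toy model
    (("Val", "a"), ("Leu", "b")): -1.0,   # a favorable contact
}

def energy(r2, r3):
    """Total energy of choosing rotamer r2 at position 2 and r3 at position 3."""
    return E_SELF[2][r2] + E_SELF[3][r3] + E_PAIR_23.get((r2, r3), 0.0)

best = min(product(E_SELF[2], E_SELF[3]), key=lambda choice: energy(*choice))
sequence = [amino_acid for amino_acid, _conformer in best]

print(best)      # (('Val', 'a'), ('Leu', 'b'))
print(sequence)  # ['Val', 'Leu']
```

Choosing the rotamers and choosing the sequence are the same optimization here: once the lowest-energy set of rotamers is known, the amino acid at each position is simply the label attached to the winning rotamer.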

some amazing accomplishments in protein design
Proteins can be made super-stable
[Figure: folded fraction versus [GdnHCl] for the natural and designed sequences, with 4M and 8M marked on the axis]
Malakauskas SM and Mayo SL (1998) "Design, Structure, and Stability of a Hyperthermophilic Protein Variant." Nature Struct. Biol., 5, p.470.

some amazing accomplishments in protein design Computationally re-designed proteins are consistently stable Dantas et al., J. Mol. Biol. (2003) 332, 449–460

some amazing accomplishments in protein design
Distinct conformational states can be stabilized.
αMβ2 integrin I domain in 2 conformations: 2 crystal structures are known. They differ in the highlighted region. Shimaoka et al. designed sequences for each form, open and closed. The two designs were shown to have different physiological properties.
PDB codes: 1an5, 1n9z
Shimaoka, M., Shifman, J. M., Takagi, J., Mayo, S. L., Springer, T. A. (2000) "Computational design of an integrin I domain stabilized in the high affinity conformation." Nature Struc. Biol. 7(8), 674-678.

some amazing accomplishments in protein design
New binding sites can be designed
The protein used to bind arabinose; now it binds serotonin.
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

Recent success of sequence design Re-designing a binding site

DEE with alternative sequences and ligands
[Figure: DEE energy matrices extended with a ligand "position" L (ligand conformers) alongside the residue rotamers; position 3 is Asp or Leu]
Each alternative ligand position is another "rotamer".

An appropriate binding site was found
The native ligand (arabinose) is approximately the same size as the targeted ligand (serotonin).
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

A space was carved out for the ligand All sidechains in the binding site were truncated to alanines, and a space was defined (yellow) for the new ligand. Lots of possible ligand orientations were made. Ligand orientations were treated like rotamers in DEE! Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

Sidechains were chosen using DEE
Ligand position, sidechain identity and sidechain rotamer are all chosen simultaneously.
Looger, L. L., Dwyer, M. A., Smith, J. J. & Hellinga, H. W. Nature 423, 185–190 (2003).

H-bonding is key Successful designs have all H-bond donors and acceptors satisfied.

some amazing accomplishments in protein design
New folds can be designed
New proteins can be designed that have never been seen before. The designs are accurate (compare the red and blue structures in the figure) and they are highly stable.
Kuhlman et al. Science, v.302(5649), 1364-1368 (2003)

Designing an enzyme
An enzyme must bind the transition state more tightly than the ground states. The high-resolution crystal structure closely matches the computational design.
"Kemp elimination catalysts by computational enzyme design" Röthlisberger, Khersonsky, ... David Baker. Nature 19 March 2008

In-class exercise: redesign an active site
• Make cocaine! (Use Builder. See diagram)
• Bind it to a G-protein coupled receptor (adrenergic receptor), PDB code 1BAK
• Avoid collisions with backbone. Ignore sidechains. Bury it deep for maximum affinity.
• Use rotamer explorer to modify the sequence of the binding site.
• Two considerations:
  • The shape of the site must fit the ligand.
  • The protein must still fold. (Changes should be to similar sidechains if possible.)
