Protein Structure: protein folding

1 Protein Structure: protein folding Park, Jong Hwa MRC-DUNN Hills Road Cambridge CB2 2XY England Bioinformatics in Biosophy Next: 02/06/2001

An Unusual Fold • Alpha helix is in the middle of beta sheets

From DNA To Protein In Cell

Different types of protein folds

Types of protein folds

Most proteins => Common folds From Alex Finkelstein’s talk

unusual fold: No Alpha no Beta

Beta Inside

Few seq: Rare fold, Many seq: Common Fold

Physical Selection of Folds (NOT evolutionary selection)

What is protein folding • Protein folding refers to the process of protein primary structures folding into tertiary structures to produce stable and functioning biological molecules. http://www.people.virginia.edu/~rjh9u/protfold.html

Experimental background for Bioinformatics • Understanding experimental techniques and knowledge on protein folding is essential for making bioinformatic algorithms for folding. • Theories of folding and implementation of them are possible and quite acceptable. • It is possible to compute 3D folds using computer for small peptides.

Why protein folding problem? 1. Protein folding is one of the best known challenges humans have  good research target 2. The physical rules of folding are not known. 3. The computational complexity is extremely high. 4. There are many diseases which are linked to protein folding. 1. Alzheimer, BSE (mad cow disease), Parkinson, etc 5. Without knowing how proteins fold, it is not possible to design proteins fast and accurately.

Proteins fold back! • In 1961, Christian Anfinsen showed that the proteins actually tie themselves: If proteins become unfolded, they fold back into proper shape of their own accord; no shaper or folder is needed. • This means most of the information for protein folding is encoded in the primary structure of proteins (sequences) ! • It is quite unbelievable!

Levinthal’s paradox • The fact that many naturally-occurring proteins fold reliably and quickly to their native state despite the astronomical number of possible configurations has come to be known as Levinthal's Paradox. C. Levinthal, in Mössbauer Spectroscopy in Biological Systems, Proceedings of a Meeting Held at Allerton House, Monticello, Illinois , edited by J. T. P. DeBrunner and E. Munck, pp. 22-24, University of Illinois Press, Illinois (1969). http://brian.ch.cam.ac.uk/~mark/levinthal/levinthal.html

Astronomical possible conformations? • there would be 3N possible configurations in our theoretical protein (N = length of protein). • In nature, proteins apparently do not sample all of these possible configurations since they fold in a few seconds, and even postulating a minimum time for going from one conformation to another, the proteins would have time to try on the order of 108different conformations at most before reaching their final state. • From Levinthal

Longer than the age of universe? • Levinthal estimated the number of configurations a typical protein could adopt (3N where N is the number of amino acids) and noted that even if protein configurations could be sampled at a rate of, say, 1013 per second, it would take longer than the age of the universe to find the native structure if this sampling was simply random. • The result of this simple calculation markedly contradicts the actual properties of proteins, which can generally find the native structure from an unfolded state on time scales of the order of seconds or less. This well-known contradiction has come to be known as Levinthal's paradox.

NP-Hard? • The most relevant class is that of NP-hard problems, for which there is no known algorithm that is guaranteed to find the solution within polynomial time, effectively rendering NP-hard problems intractable for large sizes. • It has been shown that finding the global minimum of a protein is NP-hard.

Native fold is not the most stable http://online.itp.ucsb.edu/online/infobio01/finkelstein/oh/34.html

Guided Folding and nucleation • We feel that protein folding is speeded and guided by the rapid formation of local interactions which then determine the further folding of the peptide. • This suggests local amino acid sequences which form stable interactions and serve as nucleation points in the folding process.

Nucleus of folding

Folding Time and protein size

Proteins have Folding Pathways • As proteins do not search for all the possible conformations for folding, • there must be one or MORE folding pathways to have the final stable folds.

Many different paths Folding paths can have different shapes. It could be narrow and specific path or some kind of funnel. The paths can be complicaated and reversible.

Multiple pathways on a protein-folding energy landscape • Goldbeck, R.A., Thomas, Y.G., Chen, E., Esquerra, R.M., and Kliger, D.S. (1999) Multiple pathways on a protein-folding energy landscape: kinetic evidence. Proc Natl Acad Sci U S A 96: 2782-7.The funnel landscape model predicts that protein folding proceeds through multiple kinetic pathways. Abstract: Experimental evidence is presented for more than one such pathway in the folding dynamics of a globular protein, cytochrome c. After photodissociation of CO from the partially denatured ferrous protein, fast time-resolved CD spectroscopy shows a submillisecond folding process that is complete in approximately 10(-6) s, concomitant with heme binding of a methionine residue. Kinetic modeling of time-resolved magnetic circular dichroism data further provides strong evidence that a 50-microseconds heme-histidine binding process proceeds in parallel with the faster pathway, implying that Met and His binding occur in different conformational ensembles of the protein, i.e., along respective ultrafast (microseconds) and fast (milliseconds) folding pathways. This kinetic heterogeneity appears to be intrinsic to the diffusional nature of early folding dynamics on the energy landscape, as opposed to the late-time heterogeneity associated with nonnative heme ligation and proline isomers in cytochrome c. Evidence from BPTI

Pathway means some intermediates The folding pathway naturally indicate some intermediate states for protein folding. • Pre-folded or misfolded state • Some Nucleus • Intermediates such a molten globule or partially folded state • Natural fold state (folded)

Intermediates and Molten Globules Molten globule is not just misfolded structure.

Native versus Molten golubles

Free energy barrier

Native folds and energy difference

Correct Knotting for the final Str. • From Alexy Finkelstein’s lecture

Is folding hierarchical? • Baldwin, R.L., and Rose, G.D. (1999) Is protein folding hierarchic? II. Folding intermediates and transition states. Trends Biochem Sci 24: 77-83. The folding reactions of some small proteins show clear evidence of a hierarchic process, whereas others, lacking detectable intermediates, do not. Evidence from folding intermediates and transition states suggests that folding begins locally, and that the formation of native secondary structure precedes the formation of tertiary interactions, not the reverse. Some computational experiment results support a hierarchic model of protein folding.

Dynamic nature of folding Protein folding is dynamic, so the intermediates, misfolded and folded structures exist at the same time until all the species become natually folded. There is a continuous exchange of species during the processes. Dynamic nature  Studying Unfolding of proteins is necessary

Network of unfolding pathways

Typically accepted stages of folding • Forming short segments of secondary structure forming folding nuclei • Interaction of nuclei to form domains • Interaction of domains to form the molten globule • Rearrangement of molten globule to native conformation of a monomer • Interaction of monomers to form multimers in multisubunit proteins. • Further conformational adjustments.

Wait! Here comes the chaperone • We found some proteins which help other proteins fold correctly. • Folding became complicated by chaperones

Computational Folding

The essence of protein folding problem in Bioinformatics • We have learnt physical and experimental backgrounds of folding. • For Bioinformatists, the folding problem is on HOW to map and predict all the longrange residue-residue interactions. • Secondary structure prediction is easy. Putting secondary structures to tertiary structure is NOT. • Solving protein folding problem can be said as : “Ab Initio Structure Prediction”

Theory of protein structure. There are two theoretical approaches to the protein folding problem. The first approach is to consider proteins merely as long polymer chains whose energies must be minimized by searching in the space of torsion angles or by using simplified latticemodels.

All atoms model • Parameters are obtained through high-level quantum mechanical calculations on short peptide fragments • Such an approach has several advantages. It assures the generality and allows further refinement upon the availability of more accurate quantum mechanical methods and upon the need for such an improvement. • Detailed models require both a large number of particles, typically more than 10,000 • A small time step of one to two femtoseconds (10–15 seconds), direct simulation of the folding processes, which take place on a microsecond or larger time scale, has been difficult.

An alternative way of looking at the problem is to consider proteins as systems of regular secondary structures. In terms of secondary structure, the protein folding process can be represented as a sequence of the following events: • formation of a -helices and b -sheets by the hydrophobically collapsed peptide chain, • assembly of the regular secondary structure elements into the protein core, and • joining of nonregular loops and the less stable "peripheral" helices and b -strands to the core and the association of independently formed domains.

Theoretical protein folding A theory of protein self-organization must reproduce all these events to finally calculate three-dimensional structures of proteins. The development of this theory requires, first of all, • a design of new, specialized force field to calculate free energies of a-helices and b-sheets relative to the coil (the secondary structures and the coil are represented by large ensembles of conformers) , instead of enthalpies of individual conformers given by molecular mechanics force fields. • This theory requires also a design of alternative global energy optimization strategies, such as calculation of the lowest energy partition of the peptide chain into different secondary and supersecondary structures, instead of searching in the space of torsion angles or lattice simulations.

Protein Structure Modelling: Comparative modelling In 4 years, humans will have almost all protein domain structures in nature. Instead of solving the difficult protein folding problem, we can model most of protein structures by comparative modelling Methods.

The steps of modelling Sequence  homologous Structure  Alignment  Superimposition  adjustment of atomic coordinates (such as by satisfaction of spatial restraints ) Optimization  validation  Iteration of the process.

Modelling process

Modeller by Andrej Sali • First, many distance and dihedral angle restraints on the target sequence are calculated from its alignment with template 3D structures. The form of these restraints was obtained from a statistical analysis of the relationships between many pairs of homologous structures. This analysis relied on a database of 105 family alignments that included 416 proteins with known 3D structure [ali & Overington, 1994]. By scanning the database, tables quantifying various correlations were obtained, such as the correlations between two equivalent - distances, or between equivalent mainchain dihedral angles from two related proteins.

These relationships were expressed as conditional probability density functions (pdf's) and can be used directly as spatial restraints. For example, probabilities for different values of the mainchain dihedral angles are calculated from the type of a residue considered, from mainchain conformation of an equivalent residue, and from sequence similarity between the two proteins. Another example is the pdf for a certain - distance given equivalent distances in two related protein structures. An important feature of the method is that the spatial restraints are obtained empirically, from a database of protein structure alignments.

Protein Structure: protein folding