INTRODUCTION The aim of computational structural biology is to

INTRODUCTION The aim of computational structural biology is to understand (and predict) the structure and function of biological macromolecules, such as proteins and nucleic acids based on their microscopic (atomic) interactions. These are thermodynamic systems, which are also affected by environmental factors such as temperature, pressure, and solvent conditions. Classical thermodynamics is a general methodology that enables one to derive relations among macroscopic properties such as volume (V), pressure (P), temperature (T), energy (E) and entropy (S) without the need to consider the atomic properties of the system, e.g., the equation of state, PV=NRT.

In statistical mechanics, on the other hand, the macroscopicthermodynamic behavior is obtained from the specific microscopic description of a system (atomic interactions, atomic masses, etc.). Therefore, statistical mechanics also provides information that is beyond the reach of classical thermodynamics - what is the native structure of a protein ? It is impossible to know the exact dynamic behavior of a large system (i.e., the atomic coordinates at time t). Appearance of a 3D configuration is only known with a certain probability.Thus, statistical mechanics is a probabilistic theory, and computational methods, such as Monte Carlo and molecular dynamics are based on these probabilistic properties. Part of the course will be devoted to basic probability theory.

This is a short course. Therefore, the theory of statistical mechanics will not be derived rigorously. The emphasis will be on solving problems in statistical mechanics within the framework of the canonical ensemble, treating polymer systems and proteins. The probabilistic nature of the theory will be emphasized as reflected in computer simulation techniques.

Mechanical Systems are Deterministic Refreshing some basic physics Examples of forcesF (F=|F|): 1) Stretching a spring by a distance x: F=-kx, Hook’s Law k- spring constant. 2)Gravitation force:F=kMm/r2 - mandMmasses with distance r; k - constant. On earth(R,M large), g=kM/R2 F=mg 3)Coulomb law: F=kq1q2/r2 q1,q2 charges.

Newton’s second law:F=ma - aacceleration Mechanical workW: if a constant force is applied along distanced,W=Fd(F=|F|).More general, W=!F..dx. Potential energy: If mass m is raised to height, h negative work is done, W = –mgh and the mass gains potential energy,Ep= -W = +mgh - the ability to do mechanical work: when m falls dawn, Ep is converted into kinetic energy, Ek = mv2/2, where v2/2=gh (at floor). A spring stretched byd: Ep= -W = k!xdx = kd2/2 In a closed system the total energy, Et = Ep+Ekis constantbut Ep/Ekcan change; e.g., oscillation of a mass hung on a spring and distorted from its equilibrium position.

The dynamics of a mechanical macroscopic system in principle is deterministic in the sense that if the forces are known, and the positions and velocities of the masses at time t=0 are known as well, their values at time t can in principle be determined by solving Newton’s equations. Simple examples: harmonic oscillator (a spring), a trajectory of a projectile, movement of spaceship, etc. In some cases the solution is difficult and requires strong computers.

Stability - a system of interacting masses (by some forces) tends to arrange itself in the lowest potential energy structure (which might be degenerate) also called the ground state. The system will stay in the ground state if the kinetic energy is very small - this situation defines maximumorder. The larger the kinetic energy the larger is the disorder - in the sense that at each moment a different arrangement of the masses will occur (no ground state any more). Still, in principle, the trajectories of the masses can be calculated. Two argon atoms at rest positioned at the lowest energy distance e -interacting through Lennard-Jones potential. Microscopic system. e

Thermodynamic systems and Statistical Mechanics A typical system is described below; examined from a microscopic point of view – non-rigorous treatment TR R C TC A systemCof N molecules in a constant volume V is in thermal contact with a large reservoir (R) (also called heat bath) with a well defined temperature TR. At equilibrium (after a long time) energy is still exchanged between R and Cbut the average kinetic (and potential) energy of a molecule of C is constant, leading to Tc=TR.

However, in contrast to a macroscopic mechanical system, there is no way to know the trajectories of the particles that are changed constantly due to the energy exchange between C&R and quantum mechanics limitations. Relating kinetic energy to temperature, at low T,Ek is low, the effect of the molecular forces significant - the system arrange itself in a low potential energy state – relatively high order. At high T,Ek is high and dominant, Ep is high –high disorder that includes the uncertainty related to trajectories. Therefore, a thermodynamic system at equilibrium cannot be characterized by the positions & velocities of its 1023particles but only by the averagevalues of several macroscopic parameters such as P, T, E (internal energy) and entropy, S.

For example, in measuring T the thermometer feels the average effect of many molecular configurations of the tested system and for long measurement all the microscopic states are realized and affect the result of T. Hence to obtain the average values of macroscopic parameters from microscopic considerations a probability density P(xN,vN) should be assigned to each system state (xN,vN) where (xN,vN) = (x1,y1,z1,x2,y2,z2, ….xN,yN,zN,vx1,vy1vz1…. vxN,vyNvzN) thus assuming that all states contribute.

Then a macroscopic parameterMis a statistical average, <M> = !P(xN,vN) M(xN,vN)d(xNvN). The entropy for a discrete and continuous system is (kB is the Boltzmann constant), S = <S> = -kBSPilnPi and S= -kB!P(xNvN) lnP(xN,vN) d(xNvN)+ + ln{const.[dimension (xNvN)]} Notice, Pis a probability density with dimension 1/(xNvN) . The constant is added to make S independent of(xNvN).

The problem – how to determine P. In thermodynamics an N,V,T system is described by the Helmholtz free energy A, A(T,V,N)=E –TS, which from the second law of thermodynamics should be minimum for a given set of constraints. We shall determine P by minimizing the statistical free energy with respect to P.

A can be expressed as the average A = <A(P)>= !P(X)[E(X)+kBTlnP(X)]dX +const. X=(xN,vN). We derive A with respect to P and equate to zero; the const. is omitted. A’=![E(X)+kBTlnP(X) + kBT P(X)/P(X)]dX=0 E(X)+kBT[lnP(X) + 1]=0 lnP(X) = - [E(X) + kBT]/kBT= = - [E(X)/kBT] +1 P(X) =const.exp[-E(X)/kBT] The normalization is Q=!exp[-E(X)/kBT)]

i.e., PB(X)=exp[-E(X)/kBT]/Q PB –the Boltzmann probability (density). Q – the canonical partition function. The intgrand defining A isE(X)+kBTlnPB(X) Substituting PBand taking the ln gives, E(X)+kBT[- E(X)/kBT –lnQ]= -kBTlnQ -kBTlnQ is constant for any X and can be taken out of the integral of A. Thus, A=E-TS= - kBTlnQ

The relation A= - kBTlnQis very important. It enables calculating A by Q that is based on the details of the system. In classical thermodynamics all the quantities are obtained as derivatives of A. Hence, with statistical mechanicsall these quantities can be obtained by taking derivatives of lnQ. Also, the probabilistic character of statistical mechanics enables calculating averages even without calculating Q. These averages can be even geometrical properties that are beyond the reach of thermodynamics, such as the end-to-end distance of a polymer, the radius of gyration of a protein and many other geometrical or dynamic properties. This is in particular useful in computer simulation.

(1) Probability and Statistics, M.R. Spiegel, Schaum’s Outline Series, McGRAW-Hill ISBN 0-07-060220-4. (2) An Introduction to Statistical Thermodynamics, T.L. Hill, Dover, ISBN 0-486-652- 42-4. (Also Eddison-Wesley). (3) Statistical Mechanics, R. Kubo. North Holland (ISBN 0-7204-0090-2) and Elsevier (0-444-10637-5). (4)Statistical Mechanics of Chain Molecules. P. J. Flory. Hanser (ISBN 3-446- 15205-9) and Oxford (ISBN 0-19-520756-4). (5)Phase Transitions and Critical Phenomena. H.E Stanley (Oxford). 6) Introduction to Modern Statistical Mechanics. David Chandler (Oxford). ISBN 0-19-504277-8.

Lecture 2: Calculation of the partition function Q TR Systems at equilibrium N particles with velocities vN and coordinates xNare moving in a container in contact with a reservoir of temperature T. We haveseen last time thattheHelmholtz free energy, A is A=E-TS= - kBTlnQ where Q=!exp[-E(xN,vN)/kBT) d(xNvN)]

E(xN,vN)= Ek(vN) + Ep(xN) E(xN,vN) is the Hamiltonian of the system. If the forces do not depend on the velocities (most cases) Ekis independent of Ep and the integrations can be separated. Also, the integrations over the velocities of different particles are independent. Moreover, the integrations over the components vx ,vy ,and vzare independent. Therefore, we treat first the integration over vx (denoted v) of a single particle. To recall: the linear momentum vector p=mv;therefore, for one component: Ek= mv2/2 = p2/2m

A useful integral (from table): Therefore, our integral is because

The integral over the 3N components of the momentum (velocities) is the following product: Q is (h is the Planck constant – see Hill p.74; =VN), The problem is to calculate the configurational integral

The origin of the division by N factorial,N!=12  3…  N is that each configuration of the N particles in positions x1 , x2, x3,….,xNcan be obtained N! times by exchanging the particles in these positions (introduced by Gibbs). For example, for 3 particles there are 3!=6 possible permutations. sites Particles 1,2,3 12 3 3 21 3 12 1 3 2 2 1 3 2 3 1

Stirlingapproximate formula:ln N!N ln (N/e) The Helmholtz free energy A(N,V,T) is: The velocities (momenta) part is completely solved; it contains m and h - beyond classical thermodynamics! The problem of statistical mechanics is thus to solve the integral. For an ideal gas, E = 0 (no interactions) hence =VN trivial!!

Thermodynamic derivatives of A of an ideal gas (properties ~N or V are called extensive) Pressure: Intensive variable ~N/V Internal energy: Extensive variable ~N E is the average kinetic energy, proportional to T, independent ofV. Each degree of freedom contributes (1/2)kBT. If the forces (interactions) do not depend on the velocities, T is determined by the kinetic energy only (see first lecture).

Specific heat CV is independent of T and V. The entropy is: S increases with increasing T, V and N- extensive variable. S is not defined at T=0 – should be 0 according the third low of thermodynamics; the ideal gas picture holds only for highT. Both S and E increase with T.

In the case of a real gas E(xN)0 and the problem is to calculate the configurational partition function denoted Z, where the momentum part is ignored, Z= !exp[- E(xN)/kBT]dxN Notice: WhileQis dimensionless,Zhas the dimension ofxN. Also, Z= !(E)exp[- E/kBT] dE (E) – the density of states around E; (E)dE – the volume in configurational space with energy betweenEand E+ dE. For a discrete system (n(Ei) is the degeneracy of E) Z= exp [- Ei/kBT] = n(Ei) exp [- Ei/kBT]

The contribution of Z to the thermodynamic functions is obtained from derivatives of the configurational Helmholtz free energy, F F=-kBTln Z Calculating Z for realistic models of fluids, polymers, proteins, etc. by analytical techniques is unfeasible.Powerful numerical methods such as Monte Carlo and molecular dynamics are very successful. A simple example: Nclassical harmonic oscillators are at equilibrium with a heat bath of temperature T - a good model for a crystal at high temperature (the Einstein model), where each atom feels the forces exerted by its neighbor atoms and can approximately be treated as an independent oscillator that does not interact with the neighbor ones.

Therefore, QN=qN, where q is the partition function of a single oscillator. Moreover, the components (x, y, z), are independent as well; therefore, one can calculate qx and obtain QN=qx3N. The energy of a macroscopic oscillator (e.g., a mass hung on a spring) is determined by its amplitude (the stretching distance). The amplitude of a microscopic oscillator is caused by the energy provided by the heat bath. This energy changes all the time and the amplitude changes as well but has an average value that increases as T increases. Unlike a macroscopic mechanical oscillator, the position of the mass is unknown as a function of t. We only knowPB(x).

The kinetic and potential energy (Hamiltonian) of an oscillator are p2/2m+fx2/2 f is the force constant and q=qkqpwhere qk was calculated before;  is the frequency of the oscillator.

CV = kB The average energy of one component (e.g., x direction) of an oscillator is twice as that of an ideal gas– effect of interaction. For N 3D oscillators, E=3NkBT – extensive (~N) The entropy is (also extensive), S = E/T- A/T=3NkB(1+ln [kBT/h]) E and S increase with T. In mechanics the total energy of an oscillator is constant,fd2/2 wheredis the amplitude of the motion and at timetthe position of the mass is known exactly.

In statistical mechanics a classical oscillator changes its positions due to the random energy delivered by the heat bath. The amplitude is not constant, but the average energy is proportional to T. The positions of the mass are only known with their Boltzmann probability. When T increases the energy increases meaning that the average amplitude increases and the position of the mass is less defined; therefore, the entropy is enhanced. Notice: A classical oscillator is a valid system only at highT. At low T one has to use the quantum mechanical oscillator.

Lecture 3 Thus far we have obtained the macroscopic thermodynamic quantities from a microscopic picture by calculating the partition function Q and taking (thermodynamic)derivatives of the free energy –kBTln Q. We have discussed two simple examples, ideal gas and classical oscillators. We have not yet discussed the probabilistic significance of statistical mechanics. To do that, we shall first devote two lectures to basic probability theory.

Experimental probability Rolling a die ntimes. What is the chance to get an odd number? n 10 50 100 400 1000 10,000 …. m 7 29 46 207 504 5,036 .… Relativefrequency f(n)=m/n 0.7 0.58 0.46 0.517 0.5040 0.5036 …. f(n) →P = 0.5; P = experimental probability

In many problems there is an interest inPandotherproperties. To predict them, it is useful to define for each problem an idealizedmodel - a probability space. Sample space Elementary event: Tossing a coin – A happened,or B happened. Rolling a die-1, 2, 3, 4, 5,or6happened. Event: Any combination of elementary events. An even number happened (2,4,6); a number larger than 3 happened (4,5,6).

The empty event: impossible event –  (2 < a number < 3). The certain event – Ω (Coin: A or B happened). Complementary event -Ā=Ω-A(1,2,3,4,5,6) -(2,4,6) = (1,3,5). Union – (1,2,3)  (2,3,5) = (1,2,3,5). Intersection– (1,2,4)  (2,5,6) = (2)

A BAB =  AB = intersection red +green AB = AandBAB = whole

Elementary probability space · The sample space consists of a finite number n of points ( (elementary events) B. · Every partial set is an event. · A probability P(A) is defined for each event A. · The probability of an elementary event B is P(B)=1/n. ·P(A)=m/n; m- # of points in event A.

Properties of P · 0 P(A)  1 ·P(AB) P(A) + P(B) ·i P(Ai) = 1 (Ai, elementary events) Examples: a symmetric coin; an exact die. However, in the experimental world a die is not exact and its rolling is not random; thus, the probabilities of the elementary events are not equal. On the other hand, the probability space constitutes an ideal model with equal P’s. Comment: In general, elementary events can have different probabilities (e.g., a non-symmetric coin).

Example: A box contains 20 marbles, 9white, 11black. A marble is drawn at random. What is the probability that it is white? Elementary event (EE): selection of one of the 20 marbles. Probability of EE: 1/20 The event A– a white marble was chosen contains 9EE, P=9/20 This consideration involves the ideal probability space; the real world significance: P is a result of many experiments,P=f(n), n.

More complicated examples: What is the number of ways to arrange r different balls in n cells? Every cell can contain any number of balls. Each ball can be put in n cells  ##of # ways= nn n….n=nr nr= the number of words of length r (with repetitions) based on n letters. A,B,C  AA, BB, AB, BA, CC, CA, AC, BC, CB = 32 = 9

Permutations: # of samples of r objects out of n objects without repetitions (the order is considered) (0!=1): (n)r= n(n-1)…(n-r+1) = 12 ... (n-r)(n-r+1)…n= n! 12 …(n-r) (n-r)! (3)2  (1,2), (1,3), (2,1), (2,3), (3,1), (3,2) (1,3)  (3,1) # of r (2) letter words from n (3) letters:AB, BA, CA, AC, BC, CB = 6

Problem: Find the probability that r people (r 365) selected at random will have different birthdays? Sample space: all arrangements of r objects (people) in 365 cells (days) – their number = 365rp(EE) = 1/365r Event A: not two birthdays fall on the same day- # of points in A = 365364 ……. (365-r+1)=(n)r P(A) = (n)r= 365! 365r(365-r)!365r

Combinations # of ways to select rout of n objects if the order is not considered. # of combinations =# of permutations / r! n=3; r=2 Permutations/2 (1,2) (2,1) /2 (1,3) (3,1) /2 (2,3) (3,2) /2

Problem:In how many ways can n objects be divided into k groups ofr1, r2,…..rk; rk= nwithout considering the order in each group but considering the order between the groups? Problem:How many events are defined in a sample space of n elementary events? Binomial coefficient

Problem: 23 chess players are divided into 3 groups of 8, 8, and 7 players. What is the probability that players A, B, and C are in the same group (event A)? EE – an arrangement of the players in 3 groups. # EE = 23!/(8!8!7!) If A, B, and C in the first group the # of arrangs. 20!/(5!8!7!) etc. 

Problem: What is the number of ways to arrange n objects that r1 ,r2 ,r3 ,…rkof themare identical, ri= n? A permutation of the n objects can be obtained inr1 !r2 !…rn !times  the n! permutations should be divided by this factor 6 6 24 5551112222 5115251222

Problem: 52cards are divided among 4 players. What is the probability that every player will have a king? EE – a possible division of the cards to 4 groups of 13. #EE52!/(13!)4 Ifevery player has a king, only 48 cards remained to be distributed into 4 groups of 12 # of EE(A) = 48!/(12!)4 P(A)=[48!/(12!)4]/[52!/(13!)4]

Product Spaces So far the probability space modeled a singleexperiment –tossing a coin, rolling a die, etc. In the case of n experiments we define a product space: Coin; two EE: 0 ,1 P=1/2 1 2 ……. n 1 ………. n EE: (1,0,0,1,…,1); (0,1,1,1,…,0); (…….); # (EE)=2n If the experiments are independent,P(EE)=(1/2)n

Die:EE are: 1, 2 , 3 , 4 , 5 ,6 1…………..n 1…………...n EE= (1,5,2,4….,3,6); (2,3,2,5,….,1,1); (4,…….)…; #(EE)=6n In the cases of independent experiments, P(EE) = (1/6)n

Problem: 15 dice are rolled. Find the probability to obtain three times the numbers 1,2, and 3 and twice, 4, 5, and 6? EE: all possible outcomes of 15 experiments, #(EE)= 615 #(A): according to formula on p. 45: 15!/[(3!)3(2!)3]  Also: 1 can be chosen in (15 14 13)/3!ways. 2 in (12 11 10))/3!ways etc.

Dependent and independent events Event A is independent of B if P(A) is not affected if B occurred. P(A/B)=P(A) P(A/B) – conditional probability. For example, die: Independent: P(even)=1/2; P(even/square)=1/2 [ P({2,4,6}/{1,4})=1/2] Dependent: P(2)=1/6; P(2/even)=1/3 P(even/odd)=0, while P(even)=1/2 (disjoint)

INTRODUCTION The aim of computational structural biology is to