550 likes | 664 Vues
Parameterisation of a custom amino-acid, PCA. Contents. Force-fields and how they work What kind of interactions do we need to parameterise? Quantum Mechanical calculations Where force-fields obtain basic information CHARMM parameterisation process The goal of developers => follow same goal
 
                
                E N D
Contents • Force-fields and how they work • What kind of interactions do we need to parameterise? • Quantum Mechanical calculations • Where force-fields obtain basic information • CHARMM parameterisation process • The goal of developers => follow same goal • Worked example of pyroglutamic acid
What this talk will cover • It will: • Teach the necessary background • Advise starting points for parameterisation, by analogy with existing compounds • Follow the process for a CHARMM style strategy • pyroglutamic acid under CHARMM27 • We will be following a published process from developers • It will not: • Teach you how to create a force-field from scratch • Full parameterisation is a long process, with many potential pit-falls • This applies to new atoms and geometries not seen in force-fields • If your molecule falls under this category, it’s best to ask the experts whether someone else has done it NB: read Vanommeasleagh's home page and tutorials: http://dogmans.umaryland.edu/~kenno/#CGenFF
1: Force fields • All force-fields have a purpose • There are several force-fields, each with their own goals (e.g. following developer’s research interest) • They are thus not fully compatible • All force-fields are made from simple components • They are not really black-boxes • Each terms can be fully understood • Therefore, researchers like us can make force-fields • I will describe CHARMM-ffs from above points
Purpose of force-fields • Force-fields are used to replicate chemical or biological environments. • This means they should produce reasonable results compared to experimental data • Force-fields must also be derived from quantum mechanical (QM) simulations for molecular accuracy • i.e. FFs must approximate behaviour of electrons and atomic bonds • FFs must choose basic functions with which to do this • FF-developers must choose what kind of environment or interaction they will replicate • Thus, FFs must make choices • AMBER targets proteins conformations • OPLS-AA targets organic liquid environments • CHARMM targets hydrous solvent interactions and energies • .˙. Many force-fields can be used for task X, but some will perform better.
Purpose of force-fields Due to these differences: • Force-fields have an optimal range of performances, and limits • E.g. GROMOS works particularly well for studies of lipid behaviour • Results may be surprising if ffs used in exotic conditions • Force-fields should not be mixed with each other • The detailed mechanics are different Analogy: TIP3 water do not melt or boil at expected temperatures – they were never meant to!
CHARMM-CGenFF force field • Basis: Generalforcefield for all organic molecules • Match MM-properties to QM calculations (MP2/6-31G*) • Minimum necessary work, to allow parameters to be shared across many different molecules • Can use for, e.g. drug-binding studies in MD • Overall process for parametrisation • Charges <= Dipole moments, interaction with TIP3 water • Bond and angle values <= QM Equilibrium geometry • Bond and angle constraints <= Vibrational modes • Dihedral constraints <= Potential energy scans • Van der Waals <= Heat of solvation, molecular volume
CHARMM force field for biomolecules • Basis: Energetically accurate set of parameters • Replicate observed experimental quantities specific to proteins, nucleic-acids, etc. • i.e. Aim is for CHARMM proteins/DNA which act exactly like their counterparts in real life • Overall process for parametrisation • Fit parameters like CGenFF. • Modify to fit with relevant experimental observations. This means heat of solvation, solvent/solute interactions, etc. • Operating conditions: This force-field is optimised for room temperature, liquid phase experiments • Similar process: CGenFF can combine with CHARMM
Comparison: AMBER’s protein FF • Basis: To replicate protein behaviour by modelling amino acid conformations accurately • (subtle difference to CHARMM) • Process: • Point charges fit to higher QM calculations (B3LYP/cc-pVTZ/HF/6-31G**) at dielectric constant = 4 • Bonds and angles matched to X-ray crystals and vibrational spectra • Torsions fit to reproduce peptide conformations and phi-psi energy surfaces. • Performs similar to CHARMM by different means • Complete amino acids are parameterised, where as CHARMM uses fragments with known experimental data • More dependent on QM and biochemical observations, less on strict chemical data
Inner-working of CHARMM • CHARMM and CGenFF (from Duan e.t al., 2003) : • Harmonic bonds • Harmonic angles(modified) • Harmonicimpropers • Cosine dihedrals
Inner-working of CHARMM Compositions Observations Intra-molecular terms are very simple in nature They must be fitted to approximate real covalently bonded molecules QM-calculations form the basis of these target data During parameterisation, we won’t usually need to modify vdW terms • Basic interactions between bonded atoms are harmonic potentials • This reduces computation cost by using simple functions and eliminating electrons from the equation • Electrostatics and vdW forces are preserved • They are essential to molecular systems!
QM calculations: Gaussian • CHARMM developers use Gaussian to produce target data • Process documented for Gaussian • However, one can use other QM programs • I’ll explain what calculations are required and how to do them Using Gaussian with GaussView
Gaussian: Scripting • Textual interface provides all input necessary • One file for one simulation • GaussView (GUI) is provided to assist process • The QM-level for parameterisation is MP2/6-31G* • Enough to describe common interactions • Probably not sufficient for certain organo-metal interactions
Gaussian: Scripting Conditions and general purpose of this simulation Title Coordinates: different representations possible Detailed commands, follow-on simulations, etc.
QM calculations required: 1: Equilibrium geometry 2: Vibrational spectra 3: Dihedral energy surfaces NB: CHARMM website provides a fully worked tutorial
CHARMM parametrisation process • Priorities of different parameters • The energetics needs to be replicated ultimately to < 1 kcal/mol • Some parameters have wide-spread effects, other fill in important details • Flowchart • Parameterisation is iterative, and will take time • Illustrate process with pyroglutamic acid
CHARMM parametrisation process • The priority of different parameters: • Charges • Equilibrium bond and angle values • Bonds and angles force constants • Torsions • This order will permeate throughout parameterisation • Each set of parameters depends on everything above • Hence, refinement of parameters follows: • Create/modify data • Refine, check results • repeat.
Flowchart and notations • The rest of this talk follows like so: • 1 – Prepare entries and coordinates • 2 – Optimise charges • 3 – Optimise bonds and angles • 3.1 – equilibrium values by QM geometry • 3.2 – force constants by molecular vibrations • 4 – Optimise dihedrals • 4.1 – Generate Potential Energy Scans • 4.2 – Dihedral constants by matching and chemcial knowledge • 5 – Validation
Individual Processes Set initial topologyand geometry Set initial geometry Calculate QM vib. spec.in CHARMM QM-minim. geometrywith water molecules QM-minim. geometry Modify bond/angleforce const. Set charges and vdW Modify bonds and angles Calculate MM vib. spec.in CHARMM MM-minim. geometrywith water molecules MM-minim. geometry Do theymatch? Do theymatch? Do theymatch? bond/angleforce const. complete charge and vdWfits complete bond and anglefits complete ...etc.
Validation First parametrisation • After the last fit to dihedrals is complete, all outputs need to be verified • Starting again from partial charges and water interaction simulations... • Any changes likely propagate itself downwards through flowchart • Main reason why parameterisation is time-consuming Do charges Do bonds and angles Do dihedrals Validate all data all complete
Worked example: • Pyroglutamic acid • As its name suggests, an amino acid • N-terminal only, cyclisation of glumatic acid or enzymatic activity • Used in protein-ligand simulation with scorpion toxins.
1.1: prepare topology (idea) • Observe existing molecules in the database • What chemical groups are your molecules? • Cut and paste sections together, borrowing charges and groupings
1.1: prepare topology (details) • Adapt from existing residues. • PRO as base residue, referred to GLN and backbone values. • Consider its use and clashes with existing parameters • Right now, backbone angles and torsion parameters will be used (bad idea for cyclic molecule) • Change atom-typings to allow modification of important atom bonds and angles • NB: Creating new atom-types are not preferred, as it will be more difficult for future work
1.1: atom-typings • From existing ff: • neutral Ns in CHARMM • NH2 is primary amine • NH1 is secondary amine • N is tertiary amine • carboxyl Cs in CHARMM • CC/CD are proteins • CE1/CE2 are elementary alkanes • Use NH1/CC typing, no need to modify O and H. • Topology done! Return to ICs later.
1.1: prepare parameters • Work out required bonds, angles, dihedrals • running CHARMM can help, as it stops with a warning when bonds and angles are missing • Borrow known bond and angle values • PRO provides most of existing • CGenFF has same philosophy in making bonded parameters • 2PDO is a very similar molecule in CGenFF. Use its values for dihedrals as an initial guess. • TIP: Establish what you *need* to parameterise now, and change only them in future edits.
1.2: Create molecule for Gaussian • The starting geometry will affect results • We grab some experimental coordinates from, e.g. DrugBank or PDB • These coordinates can also be created with some chemical softwares • NB: CHARMM uses IC tables which it will use to generate the residues when coordinates are not given • Either paste by analogy or transfer from minimised geometry
1.2: QM Minimisation • Gaussian outputs in formats offering more accuracy than pdb • Outputs all bond, angle and dihedral data • file conversions may be necessary for visualisation
1.3: Create IC table CHARMM manual entry(!) • Each entry in the IC table (see below) lists 4 connected atoms, I, J, K and L,; for a normal IC table entry, the I-J bond length, R(IJ); the I-J-K bond angle, T(IJK); the dihedral angle I-J-K-L, PHI; the bond angle T(JKL); and the K-L bond length, R(KL) are listed. Improper dihedral angles, which are used to keep sp2 atoms planar and sp3 atoms in a tetrahedral geometry, are marked with a star. The center atom of an improper dihedral angle is marked with a star. For an improper dihedral angle entry, the I-K bond length, R(IK); the I-K-J bond angle, T(IKJ); the dihedral angle I-J-K-L , PHI; the J-K-L bond angle T(JKL); and the K-L bond length, R(KL) are listed. The atom entry "-99" indicates an undefined atom. • CHARMM begins with a seed of three atoms, then defines the rest of the molecule with these three • Protein conventions follow backbone N-CA-C • When you create an IC table, remember to base your first entries on these and ‘grow’ the molecule from there
1.3: IC table notes • CHARMM does not need every possible entry in a given IC table • Only the dihedrals are necessary • The command “IC PARAM” will fill in bonds/angles using existing parameters • Will save you a lot of time • Number of entries: about 1 less than the total number of atoms.
2: Optimise charges • In CHARMM ff, protein charges are not parameterised by QM calculations alone. • Consistent behaviour with other amino-acids means that PCA should obey similar charges, rather than QM data per-se • Obtained charges from analogy are “good enough” when compared with QM data • No unique chemical groups such that charge optimisation is required • This step is thus skipped for PCA • I will show you a mocked example
2.1: Optimise charges • Begin with charges from Gaussian/analogy • (can read from output)
2.1: Optimise charges • Run water interaction simulation and also check molecule dipole • i.e. imitate H-bonding interaction with charges • Modify charges to fit both data • NB: MM dipole needs to overestimate QM dipole by 30-50%, since QM data is in vacuum
3: Optimise bonds and angles • Begin by setting equilibrium values to QM or crystal values • Compare results and modify equilibrium values • Using an IC table by analogy gives the wrong ring conformation to CHARMM • I constructed an IC to start PCA near the other minimum. CHARMM then finds the equatorial conformer • It is important to check that your conformation agrees after dihedras are fitted
3.1: Bond/angle equilibrium values • CHARMM developers used 0.2 Å and 3° as the upper limit • Can usually do much better • Developers seek to use same parameters to describe many ligands, we do not need to
3.1: Bond/angle force constants • Method: Comparison of QM and MM vibrational spectra • Analogous residues are good starting points for force constants • However, perfect agreement is impossible • This is due to differences between MM and QM minima, and mixing with torsion parameters • May need to modify again during validation • A “general” agreement (about 10%) is good enough when all parameterisation is finished
3.1: Vibrational spectra • Vibrations can be collected into components • change in dipole determines IR absorption • Components are defined by motions of collective parts • bond stretches and angle bending • rocking, scissoring, wagging motions • Tip: revise IR and Raman spectroscopy dihedrals, out-of-plane motion N-C stretch C=O stretch C-H stretch N-H,O-H stretch
3.1: Vibrational notation • In CHARMM, you need to convert motions of atoms into bonds, angles and dihedrals (internal coordinates, IC) • Forms basis set of all the degrees of freedoms • # Vibrations = DoG = 3N-6 • CHARMM then uses these to fit vibrational spetra • Then you need to convert these ICs into vibrational modes • Read Pulayet. al. (1979) NB: I wrote a relatively simple tutorial on the CHARMM forums to run users through the Pulay conversion for water and propane.
MM QM
3.1: Bond/angle force constants • Visualising these vibrations with GaussView and others will help you identify important vibrations
NB: Remember there is a distinction between fitting to QM calculations, and fitting to experimental spectra
4: Optimise dihedrals • Potential Energy Scans involve: fixing a dihedral at discrete points, minimising the rest of the geometry, and calculating absolute energy. • Determines conformational preferences of residues, especially important for packing
4: Optimise dihedrals • PCA with initial dihedrals favour the opposite ring conformer (top) • One will need to work out which parameter affect the rotations you need • After a series of fits and rationales for given parameters, a closer agreement can be obtained (bottom)
Somerationales for the PCA case: • Fit only dihedrals that contains re-typed atoms • This includes NH1 and CC • Carboxyl backbone prefers equatorial over axial orientation w.r.t. ring • Fit a single 1-fold or 2-fold dihedral to CD-N-CA-C to express this • Keep all the 2PDO dihedrals as is, except where necessary • So, planar amide retains 2-fold only due to symmetry • Ring-dihedrals contain only 3-fold • Modify numbers to produce correct energy surface • The rest is heuristic searching
Pointers for a good fit: • Single parameters of 1-fold,2-fold, 3-fold, 6-fold. • Accuracy to about 0.2 kcal/mol • Attention to barrier heights and relative minima positions
Using multiple dihedral parameters and no restrictions on phase, one can achieve a fit like this. • The energy surface is very close to QM. However, the parameters used are arguably unphysical