1 / 76

Outline

QM/MM Modelling Lecture 2 Implementation Tutorial Practical QM/MM Calculations with the ChemShell Software. Outline. Implementation Issues what factors affect the design Three QM/MM software approaches QM added as an extension to forcefield MM environment added to a small-molecule treatment

nishan
Télécharger la présentation

Outline

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. QM/MM ModellingLecture 2ImplementationTutorialPractical QM/MM Calculations with the ChemShell Software

  2. Outline • Implementation Issues • what factors affect the design • Three QM/MM software approaches • QM added as an extension to forcefield • MM environment added to a small-molecule treatment • Modular scheme with a range of QM and MM methods • ChemShell • Concepts and sample modules • Approaches to periodic systems • High Performance Computing • Approach within QUASI • Parallel computing techniques within GAMESS-UK, DL_POLY etc • Tutorial Session • Simple Tcl & ChemShell scripts, using the PyMOL GUI

  3. Implementation Issues • Most implementations are based on existing QM and MM packages • Implementers face a basic conflict between • modularity • keeping programs separate • minimal modifications to allow easy upgrades • Performance and Functionality • supporting more complex interactions • minimising overhead, recalculation, file access etc

  4. Software Approaches • QM/MM Implementations can • QM added as an extension to forcefield • CHARMM/GAMESS-UK • MM environment added to a small-molecule treatment • GAMESS-UK/AMBER, Gaussian/AMBER (Manchester) • Modular scheme with a range of QM and MM methods • specialised optimisation algorithms • e.g. ChemShell

  5. QM/MM Software Approach (i) • Specialised for a classical modelling approach, by integrating QM code into MM package • CHARMM + GAMESS(US), MNDO (Harvard & NIH) • AMBER + Gaussian (UCSF, Manchester) • QM/Pot, GULP + TURBOMOLE (Berlin) • CHARMM + GAMESS (UK) • Daresbury & Bernie Brooks, Eric Billings (NIH) • Similar to existing ab-initio interfaces; CHARMM side follows coupling to GAMESS(US) (Milan Hodoscek) • Support for Gaussian delocalised point charges • Advantages • Good MD capabilities, model building etc • Disadvantages • Restricted to certain classes of systems by forcefield choice

  6. QM/MM Software Approach (ii) • Extend QM code by adding MM environment as a perturbation • QM code is “in control” • MM environment will typically be relaxed for each QM structure (hidden coordinates) • e.g. IMOMM, Gaussian/AMBER (Manchester) • Advantages • Familiarity for quantum chemists • Tools for small molecule manipulation (internal coordinates, transition state search etc) are available • Disadvantages • Relatively poor tools for management of conformational search of QM/MM surface, QM/MM dynamics etc

  7. QM/MM Software Approach (iii) • Modular approach covering a range of MM and QM codes • e.g. ChemShell • Advantages • Flexibility of applications areas • Ability to choose the best QM code for the specific task • Ease of update of component codes • Disadvantages • Software Complexity • range of forcefield types • wide variation in QM and MM program design • Close integration needed for performance (e.g. HPC), but weak coupling simplifies maintenance

  8. Script design goals • Keep all control input together in one place • Avoid proliferation of small scripts and control files • Permit flexible user scripting • Support heirarchical command structure (modules can invoke each other) • Relationship between modules should be clear from the script • The same module should be able to appear more than once (with different control arguments) • Re-entrant invocation should be possible for some modules (e.g. optimisers) • Support both loosely-coupled and efficient execution • Provide option to keep data objects accessed by different modules in memory

  9. ChemShell • Extended Tcl Interpreter • Scripting capability • Interfaces to a range of QM and MM codes including • GAMESS-UK • DL_POLY • MNDO97 • TURBOMOLE • CHARMM • GULP • Gaussian94 • Implementation of QM/MM coupling schemes • link atom placement, forces etc • boundary charge corrections

  10. ChemShell Architecture - Languages • An extended Tcl interpeter, written in.. • Tcl • Control scripts • Interfaces to 3rd party executables • GUI construction (Tk and itcl) • Extensions • C • Tcl command implementations • Object management (fragment, matrix, field, graph) • Tcl and C APIs • I/O • Open GL graphics • Fortran77 • QM and MM codes: GAMESS-UK, GULP, MNDO97, DL_POLY

  11. Core ChemShell Tcl interpreter, with code to support chemistry data: Optimiser and dynamics drivers QM/MM Coupling schemes Utilities Graphics GAMESS-UK (ab-initio, DFT) MNDO (semi-empirical) DL_POLY (MM) GULP (Shell model, defects) Features Single executable possible Parallel implementations External Modules CHARMM TURBOMOLE Gaussian94 MOPAC AMBER CADPAC Features Interfaces written in Tcl No changes to 3rd party codes ChemShell Architecture

  12. ChemShell control files are Tcl Scripts Usually we use a .chm suffix ChemShell commands have some additional structure, usually they take the following form command arg1=value1 arg2=value2 Arguments can serve many functions mm_defs=dl_poly.ff Identify a data file to use coords=c Use object c as the source of the structure use_pairlist=yes Provide a Boolean flag (yes/no, 1/0,, on/off) list_option=full Provide a keyword setting theory=gamess Indicate which compute module to use Sometimes command arg1=value1 arg2=value2 data ChemShell basics

  13. ChemShell object types fragment - molecular structure creation: c_create, load_pdb …. Universal!! zmatrix z_create, newopt, z_surface matrix creation: create_matrix, energy and gradient evaluators, dynamics field creation: cluster_potential etc, graphical display, charge fits GUI only 3dgraph ChemShell Object types

  14. Between calculations, and sometimes between commands in a script, objects are stored as files. Usually there is no suffix, objects are distinguished internally by the block structure. ChemShell Object Representations % cat c block = fragment records = 0 block = title records = 1 phosphine block = coordinates records = 34 p 4.45165900000000e+00 0.00000000000000e+00 -8.17756491786826e-16 c 6.18573550000000e+00 -2.30082107458395e+00 1.93061811508830e+00 c 8.21288557680633e+00 -3.57856465464377e+00 9.30188790875432e-01 c 9.49481331797844e+00 -5.31433733714784e+00 2.37468849115612e+00 c 8.74959098234423e+00 -5.77236643959209e+00 4.81961751564967e+00 …….... • Multi block objects are initiated by an empty block (e.g. fragment) • Unrecognised blocks are silently ignored

  15. Object Caching • During a run objects can be cached in memory, the command to request this is the name of the object (similar to a declaration in a compiled language) # fragment c c_create coords=c { h 0 0 0 h 0 0 1 } list_molecule coords=c delete_object c # No file is created here • Confusion of objects with files can lead to confusion!!

  16. Object Input and Output • If you access an object from a disk, ChemShell will always update the disk copy when it has finished (there is no easy way of telling if a command or procedure has changed it). • Usually this is harmless (e.g. output formats are precise enough), but unrecognised data in the input will not be present in the output, take a copy if you need to keep it. • E.g. if a GAMESS-UK punchfile contains a fragment object and a single data field (e.g. the potential) you can use it as both a fragment object and a field object % rungamess test % cp test.pun my_structure % cp test.pun my_field % chemsh …….

  17. Energy Gradient Evaluators • Many modules are designed to work with a variety of methods to compute the energy and gradient. The procedure relies on • the interfaces to the codes being consistent, each comprises a set of callable functions e.g. • initialisation • energy, gradient • kill • update • the particular set of functions being requested by a command option, usually theory= • Example evaluators (depends on locally available codes) • gamess, turbomole • dl_poly, charmm • mopac, mndo • hybrid • You can write your own in Tcl

  18. Module options, using the : The : syntax is used to pass control options to sub-module. e.g. when running the optimiser, to set the options for the module computing the energy and gradient. {} can be used if there is more than one argument to pass on. Nested structures are possible using Tcl lists newopt function=zopt : { theory=gamess : { basis=sto3g } zmatrix=z } gamess arguments Command zopt arguments newopt arguments

  19. Object Oriented Interfaces • Purpose • To allow a script to manage multiple instances of the same data structure • Support for recursive/reentrant algorithms • Examples • PE surface drivers • newopt, dynamics • Coordinate transformation routines • copt, zopt, dlcopt • QM and MM interfaces • dl_poly, mopac • Syntax • Modelled on Tk, itcl, and other object-oriented Tcl extensions mopac mop1 coords=water.c energy=water.e hamiltonian=am1 mop1 energy # additional invocations mop1 delete

  20. Using multiple objects - A QM/MM optimisation scheme newopt1 Object types newopt: optimiser object zopt: Z-matrix/cartesian transformation dl_poly: MM energy and forces hopt: Custom written target function (150 lines of Tcl) Instances zopt1: reduced (active site) coordinates zopt2: environment coordinates newopt1: master optimisation, QM/MM hamiltonian, in a reduced coordinate space newopt2: relax the environment using a pure MM energy expression hopt newopt2 zopt2 dl_poly2 zopt1 hybrid dl_poly1 mopac

  21. HPC Exploitation • Adapt Tcl interpreter to run in parallel, master-slave architecture • one node reads input, all nodes execute commands • in-core storage of data objects • replicated: default • distributed: specific matrix objects • Use of parallel third party codes (e.g. TURBOMOLE) • Direct coupling to parallel codes: • GA-based • GAMESS-UK • MPI-based • MNDO, GULP, DL_POLY

  22. GAMESS-UK Parallel Implementation • Replicated data scheme • store P, F, S etc on every node • minimal communications (load balancing, global sum) • up to ca. 2000 basis functions • Message-passing version (MPI, TCGMSG) • SCF and DFT • Suitable for < 32 processors • Global Array version: • Parallel functionality • SCF, DFT, MP2, SCF Hessian • Parallel algorithms • GAs for in-core storage of transformed integrals (to vvoo) and MP2 amplitudes • parallel linear algebra (PEIGS, DIIS, MXM etc) • GA-mapped ATMOL file system

  23. Number of Nodes Thrombin QM/MM Benchmark QM region (69 Atoms) is modelled by SCF 6-31G calculation (401 GTOs), total system size for the classical calculation is 16,659 atoms. The MM calculation was performed using all non-bonded interactions (i.e. without a pairlist), and the QM/MM interaction was cut off at 15 angstroms using a neutral group scheme (leading to the inclusion in the QM calculation of 1507 classical centres).

  24. Scaling for Thrombin + Inhibitor (NAPAP) + Water Number of Nodes QM/MM Thrombin Benchmark QM/MM calculation on thrombin pocket: QM region (69 Atoms) is modelled by SCF 6-31G calculation (401 GTOs), total system size for the classical calculation is 16,659 atoms. QM/MM interaction was cut off at 15 angstroms using a neutral group scheme (leading to the inclusion in the QM calculation of 1507 classical centres).

  25. Workshop Session

  26. Workshop Preparation • ChemShell overview • Introduction to Tcl • Script basics • Modules overview • creating Input data objects • dl_poly • gamess • QM/MM Methods • hybrid • QM/MM models available • Input examples • ChemShell GUI

  27. ChemShell • A Tcl interpreter for Computational Chemistry • Interfaces • ab-initio (GAMESS-UK, Gaussian, CADPAC, TURBOMOLE, MOLPRO, NWChem etc) • semi-emprical (MOPAC, MNDO) • MM codes (DL_POLY, CHARMM, GULP) • optimisation, dynamics (based on DL_POLY routines) • utilities (clusters, charge fitting etc) • coupled QM/MM methods • Choice of QM and MM codes • A variety of QM/MM coupling schemes • electrostatic, polarised, connection atom, Gaussian blur .. • QUASI project developments and applications e.g. Organometallics, Enzymes, Oxides, Zeolites

  28. ChemShell • Extended Tcl Interpreter • Scripting capability • Interfaces to a range of QM and MM codes including • GAMESS-UK • DL_POLY • MNDO97 • TURBOMOLE • CHARMM • GULP • Gaussian94 • Implementation of QM/MM coupling schemes • link atom placement, forces etc • boundary charge corrections

  29. ChemShell Architecture - Languages • An extended Tcl interpeter, written in.. • Tcl • Control scripts • Interfaces to 3rd party executables • GUI construction (Tk and itcl) • Extensions • C • Tcl command implementations • Object management (fragment, matrix, field, graph) • Tcl and C APIs • I/O • Open GL graphics • Fortran77 • QM and MM codes: GAMESS-UK, GULP, MNDO97, DL_POLY

  30. Core ChemShell Tcl interpreter, with code to support chemistry data: Optimiser and dynamics drivers QM/MM Coupling schemes Utilities Graphics GAMESS-UK (ab-initio, DFT) MNDO (semi-empirical) DL_POLY (MM) GULP (Shell model, defects) Features Single executable possible Parallel implementations External Modules CHARMM TURBOMOLE Gaussian94 MOPAC AMBER CADPAC Features Interfaces written in Tcl No changes to 3rd party codes ChemShell Architecture

  31. ChemShell control files are Tcl Scripts Usually we use a .chm suffix ChemShell commands have some additional structure, usually they take the following form command arg1=value1 arg2=value2 Arguments can serve many functions mm_defs=dl_poly.ff Identify a data file to use coords=c Use object c as the source of the structure use_pairlist=yes Provide a Boolean flag (yes/no, 1/0,, on/off) list_option=full Provide a keyword setting theory=gamess Indicate which compute module to use Sometimes command arg1=value1 arg2=value2 data ChemShell basics

  32. Variable Assignment (all variables are strings) set a 1 Variable use [ set a ] $a Command result substitution [ <Tcl command> ] Numerical expressions set a [ expr 2 * $b ] Ouput to stdout puts stdout “this is an output string” Very Simple Tcl (i)

  33. Lists - often passed to ChemShell commands as arguments set a { 1 2 3 } set a “1 2 3” set a [ list 1 2 3 ] $ is evaluated within [ list .. ] and “ “ but not { } [ list … ] construct is best for building nested lists using variables Arrays - not used in ChemShell arguments, useful for user scripts Associative - can be indexed using any string set a(1) 1 set a(fred) x parray a Very Simple Tcl (ii)

  34. Very Simple Tcl (iii) • Continuation lines: • can escape the newline tclsh % set a “this \ is \ a single variable” this is a single variable tclsh % • { } will incorporate newlines into the list tclsh % set a {this is a single variable} this is a single variable tclsh %

  35. Very Simple Tcl (iv) • Procedures • Sometimes needed to pass to ChemShell commands to provide an action proc my_procedure { my_arg1 my_arg2 args } { puts stdout “my_procedure” return “the result” } • Files set fp [ open my.dat w ] puts $fp “set x $x” close $fp ….. source my.dat

  36. ChemShell object types fragment - molecular structure creation: c_create, load_pdb …. Universal!! zmatrix z_create, newopt, z_surface matrix creation: create_matrix, energy and gradient evaluators, dynamics field creation: cluster_potential etc, graphical display, charge fits GUI only 3dgraph ChemShell Object types

  37. Between calculations, and sometimes between commands in a script, objects are stored as files. Usually there is no suffix, objects are distinguished internally by the block structure. ChemShell Object Representations % cat c block = fragment records = 0 block = title records = 1 phosphine block = coordinates records = 34 p 4.45165900000000e+00 0.00000000000000e+00 -8.17756491786826e-16 c 6.18573550000000e+00 -2.30082107458395e+00 1.93061811508830e+00 c 8.21288557680633e+00 -3.57856465464377e+00 9.30188790875432e-01 c 9.49481331797844e+00 -5.31433733714784e+00 2.37468849115612e+00 c 8.74959098234423e+00 -5.77236643959209e+00 4.81961751564967e+00 …….... • Multi block objects are initiated by an empty block (e.g. fragment) • Unrecognised blocks are silently ignored

  38. Object Caching • During a run objects can be cached in memory, the command to request this is the name of the object (similar to a declaration in a compiled language) # fragment c c_create coords=c { h 0 0 0 h 0 0 1 } list_molecule coords=c delete_object c # No file is created here • Confusion of objects with files can lead to confusion!!

  39. Object Input and Output • If you access an object from a disk, ChemShell will always update the disk copy when it has finished (there is no easy way of telling if a command or procedure has changed it). • Usually this is harmless (e.g. output formats are precise enough), but unrecognised data in the input will not be present in the output, take a copy if you need to keep it. • E.g. if a GAMESS-UK punchfile contains a fragment object and a single data field (e.g. the potential) you can use it as both a fragment object and a field object % rungamess test % cp test.pun my_structure % cp test.pun my_field % chemsh …….

  40. Energy Gradient Evaluators • Many modules are designed to work with a variety of methods to compute the energy and gradient. The procedure relies on • the interfaces to the codes being consistent, each comprises a set of callable functions e.g. • initialisation • energy, gradient • kill • update • the particular set of functions being requested by a command option, usually theory= • Example evaluators (depends on locally available codes) • gamess, turbomole • dl_poly, charmm • mopac, mndo • hybrid • You can write your own in Tcl

  41. Module options, using the : The : syntax is used to pass control options to sub-module. e.g. when running the optimiser, to set the options for the module computing the energy and gradient. {} can be used if there is more than one argument to pass on. Nested structures are possible using Tcl lists newopt function=zopt : { theory=gamess : { basis=sto3g } zmatrix=z } gamess arguments Command zopt arguments newopt arguments

  42. z_create zmatrix=z { zmatrix angstrom c x 1 1.0 n 1 cn 2 ang f 1 cf 2 ang 3 phi variables cn 1.135319 cf 1.287016 phi 180. constants ang 90. end } z_list zmatrix=z set p [ z_prepare_input zmatrix=z ] puts stdout $p z_create provides input processor for the z-matrix object z_list can be used to display the object in a readable form z_prepare_input provides the reverse transformation if you need something to edit z_to_c provides the cartesian representation Loading Objects - Z-matrices

  43. Can include some atoms specified using cartesian coordinates Can use symbols for atom-path values (i1,i2,i3) Can append % to symbols to make them unique (e.g. o%1) Can create/destroy and set variables using Tcl commands Additional Z-matrix features (i) z_create zmatrix=z1 { zmatrix o%1 o%2 o%1 3. h%1 o%1 1.8 o%2 121.0 h%2 o%2 1.85 o%1 122.0 h%1 29.0 end } z_var zmatrix=z1 result=z2 control= "release all" z_substitute zmatrix=z2 values= {r2=3.0 r3=2.0} z_list zmatrix=z2

  44. Combine cartesian and internal definitions z_create zmatrix=z1 { coordinates ... zmatrix .... end } Additional Z-matrix features (ii) • c_to_z will create a fully cartesian z-matrix

  45. Loading Data Object - Coordinates c_create coords=h2o.c { title water dimer coordinates au o 0.0000000000 0.0000000000 0.0000000000 h 0.0000000000 -1.4207748912 1.0737442022 h 0.0000000000 1.4207748912 1.0737442022 o -4.7459987607 0.0000000000 -2.7401036621 h -3.1217528345 0.0000000000 -2.0097934033 h -4.4867611522 0.0000000000 -4.5020127872 } • No symbols allowed • Can also use read_xyz, read_pdb, read_xtl

  46. # c_create coords=mgo.c { space_group 1 cell_constants angstrom 4.211200 4.211200 4.211200 90.00 90.00 90.00 coordinates Mg 0.10000000 0.00000000 0.00000000 O 0.50000000 0.50000000 -0.50000000 Mg 0.00000000 0.50000000 -0.50000000 O 0.50000000 1.00000000 -1.00000000 Mg 0.50000000 0.00000000 -0.50000000 O 1.00000000 0.50000000 -1.00000000 Mg 0.50000000 0.50000000 0.00000000 O 1.00000000 1.00000000 -0.50000000 } list_molecule coords=c set p [ c_prepare_input coords=c ] Periodic Systems (i) • Crystallographic cell constants can be provided, along with fractional coordinates

  47. # c_create coords=d { title primitive unit cell of diamond coordinates au c 0.8425347285 0.8425347285 0.8425347285 c -0.8425347285 -0.8425347285 -0.8425347285 cell au 0.00000 3.37014 3.37014 3.37014 0.00000 3.37014 3.37014 3.37014 0.00000 } extend_fragment coords=d cell_indices= { -2 2 -2 2 -2 2 } result=d2 set_cell coords=d cell= { 0.00000 3.37014 3.37014 3.37014 0.00000 3.37014 3.37014 3.37014 0.00000 } Periodic Systems (ii) • Alternatively, input the cell explicitly, in c_create or attach to the structure later

  48. QM Code Interfaces • Provides access to third party codes • GAMESS-UK • MOPAC • MNDO • TURBOMOLE • Gaussian98 • Standardised interfaces • argument structure • hamiltonian (includes functional) • charge, mult, scftype • basis (internal library or keywords) • accuracy • direct • symmetry • maxcyc...

  49. GAMESS-UK Interface • Can be built in two ways • Interface calls GAMESS-UK and the job is executed using rungamess (so you may need to have some environment varables set) • parallel execution can be requested even if ChemShell is running serially • GAMESS-UK is built as part of ChemShell • mainly intended for parallel machines energy coords=c \ theory=gamess : { basis=dzp hamiltonian=b3lyp } \ energy=e • Notes • The jobname is gamess1 unless specified • Some code-specific options dumpfile= specify dumpfile routing getq = load vectors from foreign dumpfile

  50. GAMESS-UK example - basis library basisspec has the structure { { basis1 atoms-spec1} {basis2 atomspec2} ….. } Assignment proceeds left to right using pattern matching for atom labels * is a wild card This example gives sto-3g for all atoms except o Library can be extended in the Tcl script (see examples/gamess/explicitbas.chm) ECPs are used where appropriate for the basis energy coords=c \ theory=gamess : { basisspec = { { sto-3g *} {dzp o} } } \ energy=e

More Related