Problems and approaches in computational chemistry
Chemical descriptors and molecular graphs
Alessandra Roncaglioni - IRFMN
aroncaglioni@marionegri.it
Outline
• Descriptors definition
• Structure Descriptors
• Descriptors classification (bi- or tri-dimensional)
• Pros & Cons
• Overview of common descriptor classes (mainly 2D)
• Applications
• Software resources
• Further reading
Problems and approaches in computational chemistry – 21 April 2008 – DEI – Milano
Introduction
• Molecular descriptors are numerical values that characterize properties of molecules
• Examples:
  • Physicochemical properties (empirical)
  • Values from algorithms, such as 2D fingerprints
• Descriptors vary in the complexity of the encoded information and in computation time
Theoretical descriptors
“A molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment”
Source: www.moleculardescriptors.eu
Desirable descriptor characteristics
• Invariance with respect to the labelling and numbering of the atoms in the molecule
• Invariance with respect to roto-translation of the molecule
• An unambiguous, computable definition
• Values in a suitable numerical range
• Allows structural interpretation
• No trivial correlation with other molecular descriptors
• Gradual change in value with gradual changes in the molecular structure
• Widely applicable
• Preferably, allows reversible decoding (back from the descriptor value to the structure)
From chemical compounds to descriptors
CAS RN: 145131-25-5
Name: N-(2,6-Bis(1-methylethyl)phenyl)-N'-((1-(1-methyl-1H-indol-3-yl)cyclohexyl)methyl)urea
SMILES: CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4
Descriptors classification
Depending on the structural dimensionality (listed in order of increasing complexity):
• Up to 2D (0D-2D): derived from the atomic composition and connectivity of molecules
• 3D: encode energetic and spatial information
• Molecular interaction fields (MIF): encode electrostatic and steric variation
2D Descriptors (I)
• Many groups, accounting for different characteristics
• May require explicit hydrogens (check the file format)
• Fast to calculate (almost all expert systems rely on 2D descriptors)
• More reproducible (they do not require a 3D structure)
but ...
• May focus on local contributions, neglecting intramolecular interactions
• Ignore conformational flexibility
2D Descriptors (II)
but ...
• Ignore stereo configuration
• Not invariant to tautomerism
3D Descriptors (I)
• Invariant to roto-translational changes
• Require a conformational search (sampling)
• Followed by QM/MM optimization (minimization)
3D Descriptors (II)
• More complete and realistic description of relevant molecular characteristics
• Can discriminate among isomers and provide hints for selecting the most stable tautomer
but ...
• Computationally more demanding
• Involve stochastic steps: the result is non-deterministic
• Results depend on the QM/MM theory used for the optimization
• Reference structure: the minimum-energy conformation in vacuum is not necessarily the bioactive one
MIF (I)
• Requires 3D conformations aligned in Euclidean space
• Relates variation in the field to variation in the activity (3D-QSAR)
[Figure: data matrix with one row per molecule (Mol 1 … Mol n) and one column per grid-point field value, steric (St1 … Stm) and electrostatic (El1 … Elm)]
MIF (II)
Probes:
N3+   sp3 Amine NH3 cation
N2+   sp3 Amine NH2 cation
N2:   sp3 NH2 with lone pair
N2=   sp2 Amine NH2 cation
N2    Neutral flat NH2, e.g. amide
N1+   sp3 Amine NH cation
N1:   sp3 NH with lone pair
N1=   sp2 Amine NH cation
N1    Neutral flat NH, e.g. amide
NH=   sp2 NH with lone pair
N1#   sp NH with one hydrogen
N:    sp3 N with lone pair
N:=   sp2 N with lone pair
N:#   sp N with lone pair
N-:   Anionic tetrazole N
NM3   Trimethyl-ammonium cation
O     sp2 carbonyl oxygen
O::   sp2 Carboxy oxygen atom
O-    sp2 phenolate oxygen
O=    O of SO4 or sulfonamide
OH    Phenol or carboxy OH
O1    Alkyl hydroxy OH group
OC2   Ether oxygen
OES   sp3 ester oxygen atom
ON    Oxygen of nitro group
OS    O of sulfone / sulfoxide
OH2   Water
OFU   Furan oxygen atom
C3    Methyl CH3 group
C1=   sp2 CH aromatic or vinyl
...
BOTH  The amphipathic probe
DRY   The hydrophobic probe
Contour map: Green = steric +; Yellow = steric -; Red = charge -; Blue = charge +
Energy terms:
• Steric interaction (van der Waals energy calculated by a Lennard-Jones function)
• Electrostatic interaction (calculated by a Coulomb-type function)
• ...
• Hydrogen bonding energy
• Solvation energy
MIF (III)
• More biologically plausible (receptor interactions)
• Identifies areas responsible for the variation of the activity
but …
• Very sensitive to conformation selection and to the chosen alignment
• Requires a proper selection of force fields
• Large number of grid-point contributions
• QSAR modelling complexity
Types of descriptors
• Constitutional descriptors
• Topological descriptors (topological indexes, connectivity indexes, information contents)
• Atom centred fragments
• Functional groups
• Fingerprints
• Electrostatic descriptors(*) (charge descriptors)
• Geometric descriptors*
• Physico-chemical properties
• Quantum-chemical descriptors*
• Thermodynamic descriptors(*)
• Pharmacophores
• WHIM & GETAWAY*
• BCUT (or Burden eigenvalues)
• Autocorrelation descriptors
• EVA descriptors*
* 3D descriptors
Constitutional descriptors
• The simplest and most commonly used descriptors
• Reflect the molecular composition of a compound without any information about its molecular geometry
• Examples:
  • Molecular weight
  • Count of atoms and bonds
  • Count of rings
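The counts listed above can be sketched directly from a SMILES string. The example below is a minimal illustration, not a general SMILES parser: it assumes an organic-subset SMILES with no bracket atoms, no charges and only single-digit ring closures, and the function name `constitutional_descriptors` is hypothetical. The input is the SMILES shown on the "From chemical compounds to descriptors" slide.

```python
import re
from collections import Counter

# SMILES copied from the "From chemical compounds to descriptors" slide.
SMILES = "CC(C)C1=CC=CC(C(C)C)=C1NC(=O)NCC2(CCCCC2)C3=CN(C)C4=C3C=CC=C4"

# Organic-subset atoms only; two-letter symbols are tried first.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def constitutional_descriptors(smiles):
    """Count heavy atoms, explicit double bonds and ring closures.

    Assumption: no bracket atoms, no '%nn' ring closures, no charges.
    """
    atoms = Counter(sym.capitalize() for sym in ATOM_PATTERN.findall(smiles))
    ring_digits = sum(ch.isdigit() for ch in smiles)
    return {
        "heavy_atoms": sum(atoms.values()),
        "atoms_by_element": dict(atoms),
        "double_bonds": smiles.count("="),
        "rings": ring_digits // 2,  # each ring closure uses its digit twice
    }


print(constitutional_descriptors(SMILES))
```

Molecular weight is left out on purpose: it needs the implicit hydrogen count, which this simple tokenizer does not derive.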
Molecular graph
• A molecular graph or chemical graph is a representation of the structural formula of a chemical compound in terms of graph theory.
• It is a very convenient and natural way of representing the relationships between objects: objects are represented by vertices and the relationships between them by edges.
[Figure: example graph with a vertex and an edge labelled]
Topological descriptors
• Calculated from the 2D graph of the molecule, on the basis of connection tables or closely related formats
  • e.g. the distance matrix: an N x N table showing the distance (in bonds) between each pair of atoms
• Obtained by operations on the distance matrices; their values are independent of vertex numbering or labelling (graph invariants)
• Characterize structures according to size, degree of branching, overall shape, symmetry and cyclicity
Connection table
Atom  Element  H count  Neighbours: atom (bond order)
1     O        1        2 (1)
2     C        0        1 (1), 3 (2), 4 (1)
3     O        0        2 (2)
4     C        1        2 (1), 5 (1), 6 (1)
5     N        2        4 (1)
6     C        2        4 (1), 7 (1)
7     C        0        6 (1), 8 (2), 12 (1)
8     C        1        7 (2), 9 (1)
9     C        1        8 (1), 10 (2)
10    C        0        9 (2), 11 (1), 13 (1)
11    C        1        10 (1), 12 (2)
12    C        1        11 (2), 7 (1)
13    O        1        10 (1)
Distance matrix
Wiener index
• Counts the number of bonds between pairs of atoms and sums the distances between all pairs
• Add up all the off-diagonal elements of the distance matrix and divide by 2 (because the matrix is symmetrical)
W = 268
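As a worked check of the two steps above, the sketch below builds the distance matrix by breadth-first search and then halves the sum of its elements. The bond list is transcribed from the connection table slide (13 heavy atoms, which reads as the heavy-atom skeleton of tyrosine); the function names are hypothetical, and the code assumes a connected graph.

```python
from collections import deque

# Bonds transcribed from the connection table slide (atom indices 1-13).
BONDS = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (6, 7), (7, 8),
         (8, 9), (9, 10), (10, 11), (11, 12), (12, 7), (10, 13)]


def distance_matrix(n_atoms, bonds):
    """Topological distances (in bonds) via breadth-first search from each atom."""
    neighbours = {i: [] for i in range(1, n_atoms + 1)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    matrix = []
    for start in range(1, n_atoms + 1):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbours[atom]:
                if nxt not in dist:
                    dist[nxt] = dist[atom] + 1
                    queue.append(nxt)
        # Assumption: the graph is connected, so every atom was reached.
        matrix.append([dist[j] for j in range(1, n_atoms + 1)])
    return matrix


def wiener_index(matrix):
    """Half the sum of the matrix elements (the diagonal is all zeros)."""
    return sum(sum(row) for row in matrix) // 2


D = distance_matrix(13, BONDS)
print(wiener_index(D))  # 268
```

The result agrees with the W = 268 quoted on the slide, which suggests that value refers to the molecule in the connection table.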
Molecular connectivity indexes
• A whole series of indexes, developed by Kier & Hall in the late ‘70s, following earlier work by Randić
• Identify all possible subgraphs of different sizes in the molecule
• The size of the subgraph determines the order of the index:
  • 0-bond subgraph gives a zero order index
  • 1-bond subgraph gives a 1st order index
  • 2-bond subgraph gives a 2nd order index
  • 3-bond subgraph gives a 3rd order index
  • ...
Randić index
• Calculated from the H-depleted molecular graph, where each vertex is weighted by the vertex degree, i.e. the number of connected non-hydrogen atoms
• Example (graph with 7 vertices and 6 edges):
  • Valence (degree) at the vertices: 1, 1, 1, 1, 2, 3, 3
  • Bond values, as products of the valences of the two bonded vertices: 3, 3, 3, 9, 6, 2
  • Edge terms, as the reciprocal of the square root of the bond values: 0.577, 0.577, 0.577, 0.333, 0.408, 0.707
  • Randić index = sum of edge terms = 3.179
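The recipe above (vertex degrees, bond products, reciprocal square roots, sum) can be sketched as follows. The edge list is an assumption: it is one H-depleted graph whose degrees and bond values match the numbers on the slide (the carbon skeleton of 2,3-dimethylpentane), with a hypothetical atom numbering.

```python
from math import sqrt

# Assumed graph: 7 vertices with degrees 1, 1, 1, 1, 2, 3, 3, chosen to
# reproduce the bond values 3, 3, 3, 9, 6, 2 shown in the slide example.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]


def randic_index(edges):
    """Sum of 1/sqrt(deg(i) * deg(j)) over all edges (i, j)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / sqrt(degree[a] * degree[b]) for a, b in edges)


print(round(randic_index(EDGES), 3))  # 3.181
```

The small difference from the 3.179 on the slide comes from rounding: the slide adds edge terms already rounded to three decimals, while the code sums the unrounded terms.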
Kier & Hall indexes
• Chi indexes introduce valence values to encode sigma, pi and lone-pair electrons
• δi and δj (i ≠ j) = values of the atomic connectivity
• The atomic connectivity δi is calculated from:
  • Zi = total number of electrons in the i-th atom
  • Zi^v = number of valence electrons
  • Hi = number of H atoms attached to the i-th atom
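The formula itself did not survive in the slide text. The sketch below assumes the usual Kier-Hall valence connectivity, δi^v = (Zi^v − Hi) / (Zi − Zi^v − 1), which uses exactly the three quantities defined above; the function name is hypothetical.

```python
def valence_delta(z_total, z_valence, n_hydrogens):
    """Valence connectivity of one atom: (Zv - H) / (Z - Zv - 1).

    Assumed form of the Kier-Hall definition; the slide lists the
    variables but not the equation.
    """
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)


# A few atoms as (Z, Zv, H):
print(valence_delta(6, 4, 3))   # methyl carbon
print(valence_delta(8, 6, 1))   # hydroxyl oxygen
print(valence_delta(8, 6, 0))   # carbonyl oxygen
print(valence_delta(17, 7, 0))  # chlorine
```

For second-row atoms the denominator equals 1, so the value reduces to the number of valence electrons not used for bonds to hydrogen; the denominator only matters for heavier atoms such as chlorine.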
Kier Shape Indexes
• Characterize aspects of molecular shape
• Compare the molecule with the “extreme shapes” possible for that number of atoms
• Based on the number of atoms (N) and the number of m-bond paths ($^mP$; $^1P$ is the number of bonds) in the graph:
  • $^1\kappa = N (N-1)^2 / (^1P)^2$
  • $^2\kappa = (N-1) (N-2)^2 / (^2P)^2$
  • $^3\kappa = (N-1) (N-3)^2 / (^3P)^2$ (if N is odd)
  • $^3\kappa = (N-3) (N-2)^2 / (^3P)^2$ (if N is even)
• Alpha-modified kappa indexes can be generated by taking into account the sizes of atoms relative to a C sp3 atom
• A molecular flexibility index is derived from these: $\Phi = {}^1\kappa_\alpha \, {}^2\kappa_\alpha / N$
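A minimal sketch of the three kappa formulas, assuming that the path count for order m is the number of simple paths of m bonds in the H-depleted graph. The alpha modification and the flexibility index are left out, the function names are hypothetical, and returning `None` when a path count is zero is a choice made here, not part of the definition.

```python
def count_paths(neighbours, length):
    """Count simple paths with `length` bonds (each path counted once)."""
    def extend(path, remaining):
        if remaining == 0:
            return 1
        return sum(extend(path + [nxt], remaining - 1)
                   for nxt in neighbours[path[-1]] if nxt not in path)
    # Every path is found once from each of its two ends.
    return sum(extend([start], length) for start in neighbours) // 2


def kappa_indexes(edges):
    """Return (kappa1, kappa2, kappa3) for an H-depleted graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    n = len(neighbours)
    p1, p2, p3 = (count_paths(neighbours, k) for k in (1, 2, 3))

    def ratio(numerator, p):
        return numerator / p ** 2 if p else None  # undefined without paths

    kappa1 = ratio(n * (n - 1) ** 2, p1)
    kappa2 = ratio((n - 1) * (n - 2) ** 2, p2)
    if n % 2:  # N is odd
        kappa3 = ratio((n - 1) * (n - 3) ** 2, p3)
    else:      # N is even
        kappa3 = ratio((n - 3) * (n - 2) ** 2, p3)
    return kappa1, kappa2, kappa3


# Linear five-atom chain (n-pentane skeleton)
print(kappa_indexes([(1, 2), (2, 3), (3, 4), (4, 5)]))  # (5.0, 4.0, 4.0)
```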
Information content indexes
• Defined on the basis of Shannon information theory
  • ni = number of atoms in the i-th class
  • n = total number of atoms in the molecule
• Classes are determined by the coordination sphere taken into account, leading to indexes of different order k
• Other information content indexes:
  • SIC: structural IC
  • CIC: complementary IC
  • BIC: bonding IC (q = number of edges)
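The equations are missing from the slide text. The sketch below assumes the standard Shannon form, IC = −Σ (ni/n) log2(ni/n), and the commonly used normalizations SIC = IC / log2(n) and CIC = log2(n) − IC; treat these as assumptions rather than the slide's exact notation. The example uses order 0, where the atom classes are simply the element symbols.

```python
from collections import Counter
from math import log2


def information_content(class_labels):
    """Shannon entropy (bits per atom) of the atom-class distribution."""
    n = len(class_labels)
    counts = Counter(class_labels)
    return -sum((n_i / n) * log2(n_i / n) for n_i in counts.values())


# Order 0: classes are the elements. Ethanol, C2H6O, hydrogens included.
ethanol = ["C"] * 2 + ["H"] * 6 + ["O"]
ic0 = information_content(ethanol)
sic0 = ic0 / log2(len(ethanol))   # structural IC (assumed normalization)
cic0 = log2(len(ethanol)) - ic0   # complementary IC (assumed definition)
print(ic0, sic0, cic0)
```

Higher orders would use the same function with finer class labels, for example a label built from each atom and its neighbours within k bonds.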
Considerations about topological descriptors
• Frequently used, easily calculated
• It is often difficult to disclose the chemical meaning of higher-order indexes
• Topological indexes effectively encode the same information as fingerprint fragments
  • in a less obvious way
  • but can be processed numerically
Atom centred fragments & functional groups
• Number of specific atom types in a molecule, calculated by knowing the molecular composition and atom connectivities
• Number of specific functional groups in a molecule, calculated by knowing the molecular composition and atom connectivities
2D Fingerprints
• Two types:
  • One based on a fragment dictionary
    • Each bit position corresponds to a specific substructure fragment
    • Fragments that occur infrequently may be more useful
  • Another based on hashed methods
    • Not dependent on a pre-defined dictionary
    • Any fragment can be encoded
• Originally designed for substructure searching, not for molecular descriptors
Fragment dictionaries
000101000101000100000000011010100110101000000101000000001000
000101000101000100000000011010100110101000000001000000001000
Pharmacophores
• Used in drug design
• Based on atoms or substructures thought to be relevant for receptor binding: specification of the spatial arrangement of a small number of atoms or functional groups
• Typically include H-bond donors and acceptors, charged centers, aromatic ring centers and hydrophobic centers
• With the model in hand, search databases for molecules that fit this spatial environment
• Might be 3D
Creating a Pharmacophore
Physico-chemical Properties
• Covered in more detail in the QSPR lesson
• The key descriptor, widely used in QSAR, is hydrophobicity
  • LogP: the logarithm of the partition coefficient between n-octanol and water
  • LogD: LogP corrected for the dissociated fraction of the compound
  • Experimentally assessed with the shake-flask method or reversed-phase HPLC
• It is often useful to be able to calculate a physico-chemical property for a compound from its structure
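The LogD correction can be illustrated for the simplest case. The sketch below assumes a monoprotic acid or base and that only the neutral form partitions into n-octanol, which gives logD = logP − log10(1 + 10^(pH − pKa)) for an acid (sign of the exponent reversed for a base). The function name and the example values are hypothetical.

```python
from math import log10


def log_d(log_p, pka, ph, acid=True):
    """logD of a monoprotic compound.

    Assumption: only the neutral species partitions into the organic phase.
    """
    exponent = ph - pka if acid else pka - ph
    return log_p - log10(1.0 + 10.0 ** exponent)


# Hypothetical acid with logP = 3.0 and pKa = 4.5
for ph in (2.0, 4.5, 7.4):
    print(ph, round(log_d(3.0, 4.5, ph), 2))
```

Well below the pKa the acid is almost entirely neutral and logD approaches logP; at pH = pKa half of the compound is dissociated and logD is lower by log10(2).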
LogP calculation
• Many methods have been proposed for calculating a good estimate for LogP
• Fragment-based methods (ClogP)
  • pioneered by Corwin Hansch and Al Leo (Pomona College)
  • identify large fragments, whose contribution to the logP value is known from their occurrence in other compounds with measured logP
  • large “training set” of compounds with accurately measured logP (the “Starlist”)
  • works very well if the test compound has the right fragments
  • problems arise if the test compound contains fragments that are “missing” from the training set
LogP calculation
• Atom-based methods (AlogP, XlogP, SlogP)
  • pioneered by Gordon Crippen (Univ. Michigan)
  • based on identifying a series of “atom types” in the molecule
    • essentially, small atom-centred fragments
    • usually 60-200 such fragments are involved
  • each atom type is assigned a numerical value
  • logP is obtained by adding the values for the atom types present in the test molecule
  • atom-type values are obtained by regression analysis, based on a set of compounds with measured logP
  • sometimes some extra correction factors are used too
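The additive step of an atom-based method can be sketched as below. The atom-type names and contribution values are placeholders invented for illustration only; they are not parameters from AlogP, XlogP, SlogP or any published scheme, and the typing of the atoms is assumed to have been done already.

```python
# Placeholder contributions, NOT published parameters. Real schemes use
# on the order of 60-200 atom types fitted by regression to measured logP.
HYPOTHETICAL_CONTRIBUTIONS = {
    "C.aliphatic": 0.30,
    "C.aromatic": 0.25,
    "O.hydroxyl": -0.40,
    "N.amine": -0.60,
}


def additive_log_p(atom_types, contributions, correction=0.0):
    """Sum the contribution of every typed atom, plus an optional correction."""
    missing = set(atom_types) - set(contributions)
    if missing:
        # A test molecule may contain types absent from the training set.
        raise KeyError(f"no contribution for atom types: {sorted(missing)}")
    return sum(contributions[t] for t in atom_types) + correction


# Heavy atoms of a two-carbon alcohol, typed by hand for this example
print(additive_log_p(["C.aliphatic", "C.aliphatic", "O.hydroxyl"],
                     HYPOTHETICAL_CONTRIBUTIONS))
```

Raising an error for an unknown atom type makes the coverage limit explicit instead of silently returning a partial sum.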
Summary
Rognan D., British Journal of Pharmacology (2007) 152, 38–52
Quantitative Structure-Activity Relationships
• Tomorrow …
• Lessons 4 & 5
Chemoinformatics
• Molecular database management
• Reverse engineering
• Chemical similarity assessment
Molecular similarity
• The descriptors of a molecule can be considered a vector of attributes (properties).
• The attributes may be real numbers (continuous variables) or binary in nature (binary variables).

Coefficient                      Range, binary variables   Range, continuous variables
Tanimoto similarity coefficient  0 to 1                    -0.333 to +1
Hodgkin index                    0 to 1                    -1 to +1
Euclidean distance               0 to N                    0 to ∞

For binary variables: a = number of bits on for A; b = number of bits on for B; c = number of bits on for A AND B
For continuous variables: X are vectors
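The binary forms of the three coefficients can be sketched with the a, b, c counts defined above. The equations are not present in the slide text, so the forms used here are assumptions based on the usual definitions: Tanimoto = c / (a + b − c), Hodgkin = 2c / (a + b), Euclidean distance = sqrt(a + b − 2c). The two bit strings are the ones shown on the "Fragment dictionaries" slide; the function names are hypothetical.

```python
from math import sqrt

# Bit strings copied from the "Fragment dictionaries" slide.
FP_A = "000101000101000100000000011010100110101000000101000000001000"
FP_B = "000101000101000100000000011010100110101000000001000000001000"


def bit_counts(fp_a, fp_b):
    """Return a, b, c: bits on in A, bits on in B, bits on in both."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = fp_a.count("1")
    b = fp_b.count("1")
    c = sum(x == y == "1" for x, y in zip(fp_a, fp_b))
    return a, b, c


def tanimoto(fp_a, fp_b):
    a, b, c = bit_counts(fp_a, fp_b)
    return c / (a + b - c)  # undefined if both fingerprints are all zeros


def hodgkin(fp_a, fp_b):
    a, b, c = bit_counts(fp_a, fp_b)
    return 2 * c / (a + b)


def euclidean(fp_a, fp_b):
    a, b, c = bit_counts(fp_a, fp_b)
    return sqrt(a + b - 2 * c)


print(bit_counts(FP_A, FP_B))  # (16, 15, 15)
print(tanimoto(FP_A, FP_B))    # 0.9375
```

The two fingerprints differ in a single bit, so the Euclidean distance is 1 and both similarity coefficients are close to their upper bound.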
Drug design
• High-throughput virtual screening
Software resources
• Database of calculated descriptors
  • MOLE db: http://michem.disat.unimib.it/mole_db/
• Commercial software
  • CODESSA, Dragon, MDL, TSAR, ....
• Free software
  • Virtual Computational Chemistry Laboratory: www.vvclab.org
  • MODEL - Molecular Descriptor Lab: http://jing.cz3.nus.edu.sg/cgi-bin/model/model.cgi
• Open source software/libraries
  • Chemistry Development Kit (CDK): http://almost.cubic.uni-koeln.de/cdk/cdk_top
  • Linux4Chemistry: http://www.redbrick.dcu.ie/~noel/linux4chemistry/
Further reading
• Web
  • www.moleculardescriptors.eu
• Book
  • Todeschini, R. and Consonni, V. (2000). Handbook of Molecular Descriptors. Wiley-VCH.
• Papers
  • Estrada, E., Molina, E. and Perdomo-López, I. (2001). Can 3D Structural Parameters Be Predicted from 2D (Topological) Molecular Descriptors? J. Chem. Inf. Comput. Sci., 41, 1015-1021.
  • Katritzky, A.R. and Gordeeva, E.V. (1993). Traditional Topological Indices vs Electronic, Geometrical, and Combined Molecular Descriptors in QSAR/QSPR Research. J. Chem. Inf. Comput. Sci., 33, 835-857.
  • Randić, M. (1990). The Nature of the Chemical Structure. J. Math. Chem., 4, 157-184.
  • Tetko, I.V. (2003). The WWW as a Tool to Obtain Molecular Parameters. Mini Reviews in Medicinal Chemistry, 3, 809-820.
Concluding remarks
• Depending on the application, define the preferred level of complexity for the chemical description
• Avoid using meaningless numbers: all descriptor types have advantages and limitations, but easily interpretable descriptors might be preferred
• Examples
Tautomers (I)
Tautomers (II)
[Figure panels: predicted values for the logBCF model; lipophilicity descriptor variation]
3D descriptors variability (I)
[Figure: LUMO energy; panels: intra-laboratory, inter-laboratory (PM3), inter-laboratory (AM1)]