10 likes | 155 Vues
QSAR MODELLING AND PREDICTION OF PHENOL TOXICITY. Paola Gramatica, Elena Bonfanti, Manuela Pavan and Federica Consolaro QSAR Research Unit, Department of Structural and Functional Biology, University of Insubria, Varese, Italy. E-mail: paola.gramatica@unimi.it
E N D
QSAR MODELLING AND PREDICTION OF PHENOL TOXICITY Paola Gramatica, Elena Bonfanti, Manuela Pavan and Federica Consolaro QSAR Research Unit, Department of Structural and Functional Biology, University of Insubria, Varese, Italy. E-mail: paola.gramatica@unimi.it Web: http://fisio.varbio2.unimi.it/dbsf/home.html INTRODUCTION Phenols are chemicals widespread in the environment and widely used as precursors for many products. It is well known that phenols exert effects on human health at concentrations commonly encountered in the environment. For this reason, the toxicity of these compounds has been extensively studied on different end points, but obviously data are not available for all phenols and organisms. Thus, reliable estimation methods are required. QSAR studies are useful for a simple and fast prediction of such data DATA SET The compounds used in this work are the 109 phenols described by Schultz [2]. Toxicity data, available only for 103 chemicals, are expressed in mM/l and in logarithmic scale as log of the inverse of the IGC50 (percent inhibitory growth concentration) on Tetrahymena pyriformis strain. Three phenols (2-aminophenol, cathecol and 4-nitrophenol) that have been shown as outliers by several models, have been excluded from the data set. [2] T.W.Schultz et all. Quantitative structure-activity relationships for the Tetrahymena piryformis population growth end-point: a mechanism of action approach. Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental chemistry and toxicology, 241-262 (1990). MOLECULAR DESCRIPTORS The molecular structures of the studied compounds have been described by using several molecular descriptors, calculated by a software developed by R.Todeschini (tode@disat.unimib.it; http://www.disat.unimib.it/chm) Sum of atomic properties descriptors (6) Count descriptors (45) Empirical descriptors (2) Information indices (16) [1 ]R.Todeschini and P.Gramatica, 3D-modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of the WHIM descriptors, Quant.Struct.-Act.Relat., 16 (1997) 113-119. Autocorrelation descriptors (252) Directional WHIM descriptors (66) [1] Non directional WHIM descriptors (33) [1] Topological descriptors (58) Topographic descriptors (7) Geometric descriptors (170) Quanto-chemicals descriptors (6) CHEMOMETRIC METHODS Several chemometric analyses were applied to the compounds (represented by molecular descriptors) for the selection of an optimal training set for the QSAR models. The analyses performed are: Principal Component Analysis (PCA): this analysis was used to calculate just a few components from a large number of variables. These components allow the highlighting of the distribution of the compounds according to their structure; only the significant components were used in Cluster Analysis and Kohonen Maps to avoid the redundancy of the information. Hierarchical Cluster Analysis: hierarchical clustering was performed using the significant components of the molecular descriptors as variables. Different distance metrics (Euclidean and Manhattan) and different linkages (Complete, average, etc.) were used and compared to find the best way to cluster these compounds. Kohonen Maps: this is an additional way that allows the mapping of similar compounds by using the so-called “self-organised topological feature maps”, which are maps that preserve the topology of a multidimensional representation within the new two-dimensional representation. The position of the compounds in the cells of this map shows the similarity level of the structure of the studied phenols. The centroids of each cell have been selected as the most representative compounds in order to create a training set constituted of the more different phenols. Selection of training set training set test set REGRESSION MODELS The selection of the best subset variables for modelling toxicity was done by a Genetic Algorithm (GA-VSS) approach, where the response is obtained by ordinary least square regression (OLS). All the calculations have been performed by using the leave-one-out (LOO) and leave-more-out (LMO) procedures and the scrambling of the responses for the validation of the models. THE NUMBERED COMPOUNDS ARE OUTLIERS CONCLUSION The present investigation confirms that the toxic response of phenols in the Tetrahymena system can be modelled by a logKow- dependent QSAR. The models developed starting from a wide set of various molecular descriptors identify the hydrophobicity as the single most important variable, as the logKow alone gives a good enough prediction model with a Q2(LOO)= 72.14; other structural parameters, such as electronic and connectivity ones play a role of secondary but useful relevance, at least for this set of compounds. Moreover this study demonstrates that theoretical molecular descriptors are an effective and useful alternative of LogKow. The internal and external validation procedures have confirmed the high predictive capability of the models developed.