[Figure: Dendrogram of hierarchical cluster analysis. Euclidean distance, complete linkage. Variables = first 10 structural principal components. Axis: similarity (0 to 100). Cluster labels: Chlorinated aliphatics (9), Phen.-Triaz. (10), Chloroaliphatic compounds (7), Organo-phosphates (12), Benzene derivatives (2), DDT - PCBs (11), PAH (15).]

CONGENERIC COMPOUNDS (NITROBENZENES)
nOH is the number of OH groups, Sp is the sum of polarizabilities and Ds is the 3D-WHIM descriptor accounting for the global electrotopological distribution.

RANKING OF “EEC PRIORITY LIST 1” CHEMICALS FOR STRUCTURAL SIMILARITY AND MODELLING OF ALGAL TOXICITY
D 12
P. Gramatica¹, H. Walter² and R. Altenburger²
¹QSAR Research Unit - DBSF - University of Insubria - VARESE - ITALY
²UFZ Centre for Environmental Research - LEIPZIG - GERMANY
e-mail: paola.gramatica@uninsubria.it  Web: http://fisio.dipbsf.uninsubria.it/qsar

INTRODUCTION
Environmental exposure situations are often characterized by a multitude of heterogeneous chemicals with different mechanisms of action and types of effect. The EEC Priority List 1 (Council Directive 76/464/EEC) consists of heterogeneous environmental chemicals with mostly unknown or unspecific modes of action. It was therefore used to select components for mixture experiments in the EEC PREDICT (Prediction and Assessment of the Aquatic Toxicity of Mixtures of Chemicals) project. A list of 202 compounds was studied for structural similarity. The aims were to identify the most representative and most dissimilar chemicals and to find an objective method of grouping them on the basis of their structural features.

STRUCTURAL DESCRIPTION OF COMPOUNDS
Molecular descriptors represent the way the chemical information contained in the molecular structure is transformed and coded. Among the theoretical descriptors, the best known are those obtained simply from the molecular formula: molecular weight and count descriptors (1D-descriptors, i.e.
counting of bonds, of atoms of different kinds, presence or count of functional groups and fragments, etc.). Graph-invariant descriptors (2D-descriptors, including both topological and information indices) are obtained from the molecular topology. WHIM molecular descriptors [1] contain information about the whole 3D molecular structure in terms of size, symmetry and atom distribution. The WHIM indices are calculated from the (x,y,z)-coordinates of a three-dimensional structure of the molecule, usually a minimum-energy conformation. This yields 37 non-directional (or global) and 66 directional WHIM descriptors. In total, a set of about two hundred molecular descriptors was calculated [2].

[1] Todeschini R. and Gramatica P., Quant. Struct.-Act. Relat. 1997, 16, 113-119.
[2] Todeschini R. and Consonni V., DRAGON - Software for the calculation of molecular descriptors, Talete srl, Milan (Italy), 2000. Download: http://www.disat.unimib.it/chm.

CHEMOMETRIC METHODS
Several chemometric analyses were applied to the compounds, each represented by its molecular descriptors. The aim was to group the most similar compounds according to a multivariate structural approach and, ultimately, to highlight the structurally most dissimilar ones. The analyses performed are:

Hierarchical Cluster Analysis: hierarchical clustering was performed to find clusters of the studied compounds in the high-dimensional descriptor space, using the molecular descriptors as variables. Different distance metrics (Euclidean, Manhattan, Pearson) and different linkages (complete, average, single, etc.) were compared to find the best way to cluster these compounds.

Principal Component Analysis (PCA): this analysis was used to condense the large number of variables into a few principal components. These components show how the compounds are distributed according to structure and reveal the similarity between compounds assigned to the same cluster.
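Complete linkage, one of the linkages compared above, measures the distance between two clusters as the largest distance between any member of one and any member of the other. The following is a minimal plain-Python sketch of that agglomerative procedure with Euclidean distance. It is not the software used in this study. The function names are hypothetical, and so are the toy coordinates, which stand in for per-compound principal component scores.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points):
    """Agglomerative clustering with complete linkage (naive search).

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where clusters are frozensets of point indices and height is the
    largest pairwise distance between members of the two merged clusters.
    """
    n = len(points)
    # Precompute all pairwise Euclidean distances once (keys have i < j).
    dist = {(i, j): euclidean(points[i], points[j])
            for i, j in combinations(range(n), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between the farthest members.
            link = max(d(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        link, a, b = best
        merges.append((clusters[a], clusters[b], link))
        merged = clusters[a] | clusters[b]
        # combinations() guarantees a < b, so delete b first to keep a valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges


def cut(merges, n_points, n_clusters):
    """Replay the merge history until n_clusters groups remain."""
    groups = [frozenset([i]) for i in range(n_points)]
    for a, b, _ in merges:
        if len(groups) <= n_clusters:
            break
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups


if __name__ == "__main__":
    # Hypothetical toy scores on two principal components for four compounds.
    toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    history = complete_linkage(toy)
    for a, b, height in history:
        print(sorted(a), sorted(b), round(height, 3))
    print(cut(history, len(toy), 2))  # [frozenset({0, 1}), frozenset({2, 3})]
```

Cutting the tree at a chosen number of groups is one way to read cluster membership off a dendrogram. The exhaustive pair search is roughly cubic in the number of compounds, which is tolerable for a few hundred compounds. Library routines such as SciPy's hierarchical clustering functions are the usual choice in practice.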
Kohonen Maps: this is an additional way of mapping similar compounds, using so-called “self-organizing topological feature maps”. These maps preserve the topology of a multidimensional representation within a toroidal two-dimensional representation. The position of the compounds on this map shows the level of structural similarity among the EEC List 1 compounds.

The chemicals selected as the structurally most dissimilar compounds are:

N.  Substance               Chemical class
1   atrazine                Triazine
2   biphenyl                Aromatic
3   chloral hydrate         Chlorinated aliphatics
4   2,4,5-trichlorophenol   Benzene derivative
5   fluoranthene            PAH
6   lindane                 HCH
7   naphthalene             PAH
8   parathion               Organophosphate
9   phoxim                  Organophosphate
10  tributyltin chloride    Organotin
11  triphenyltin chloride   Organotin

REGRESSION MODELS
QSAR models were developed by Ordinary Least Squares (OLS) regression. The best subset of variables for modelling the algal toxicity of the studied compounds was selected with a Genetic Algorithm (GA-VSS) approach. All models were validated by leave-one-out (LOO) and leave-more-out (LMO) cross-validation and by scrambling of the responses.

HETEROGENEOUS + CONGENERIC COMPOUNDS
HETEROGENEOUS COMPOUNDS
(R² and Q² values in %)
R² = 93.9   Q²LOO = 91.8   Q²LMO = 87.5   SDEP = 0.342   SDEC = 0.296
R² = 78     Q²LOO = 62.1   Q²LMO = 61.7   SDEP = 0.751   SDEC = 0.573
R² = 77     Q²LOO = 69.7   Q²LMO = 69.7   SDEP = 0.709   SDEC = 0.619

nO is the number of O atoms, IDDM is the mean information content on the distance degree magnitude, and E1e is a directional 3D-WHIM descriptor of atomic distribution weighted by electronegativity.
nO is the number of O atoms and IDE is the mean information content on the distance equality.

CONCLUSIONS
The chemometric analyses applied here proved very useful for ranking the studied chemicals according to their structural similarity or dissimilarity.
In the modelling of structurally heterogeneous compounds with unknown modes of action, the QSAR models obtained were not very satisfactory. Specific parameters, such as directional WHIMs, can describe particular molecular features relevant to a specific mode of action. Such parameters always play a relevant role in QSAR models for congeneric chemicals. As heterogeneity increases, so does the role of structural and topological descriptors, which account for general molecular features not related to a specific mode of action.

This work was supported by the Environment & Climate programme of the European Commission, Contract EV4-CT96-0319 (PREDICT) and Contract EVK1-CT99-00012 (BEAM).