
Institute for Advanced Studies in Basic Sciences – Zanjan


Presentation Transcript


  1. Institute for Advanced Studies in Basic Sciences – Zanjan Kohonen Artificial Neural Networks in Analytical Chemistry Mahdi Vasighi

  2. Contents • Introduction to Artificial Neural Network (ANN) • Self Organizing Map ANN • Kohonen ANN • Applications

  3. Introduction An artificial neural network (ANN) is a mathematical model based on biological neural networks. In more practical terms, neural networks are non-linear statistical data-modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

  4. The basic types of goals or problems in analytical chemistry for which ANNs can be used are the following: • Selection of samples from a large quantity of existing ones for further handling. • Classification of an unknown sample into one of several pre-defined (known in advance) classes. • Clustering of objects, i.e., finding the inner structure of the measurement space to which the samples belong. • Making models for predicting the behavior or effects of unknown samples in a quantitative manner.

  5. The first thing to be aware of when considering employing ANNs is the nature of the problem we are trying to solve: supervised or unsupervised.

  6. Supervised Learning A supervised problem means that the chemist already has at hand a set of experiments with known outcomes (targets) for specific inputs. In these networks, the structure consists of an interconnected group of artificial neurons, and information is processed using a connectionist approach to computation.

  7. Unsupervised Learning An unsupervised problem means that one deals with a set of experimental data which have no specific associated answers (or supplemental information) attached. In unsupervised problems (like clustering) it is not necessary to know in advance to which cluster or group the training objects Xs belong. The network automatically adapts itself in such a way that similar input objects are associated with topologically close neurons in the ANN.

  8. Kohonen Artificial Neural Networks The Kohonen ANN offers a considerably different approach to ANNs. The main reason is that the Kohonen ANN is a ‘self-organizing’ system, capable of solving unsupervised rather than supervised problems. The Kohonen network is probably the closest of all artificial neural network architectures and learning schemes to the biological neural network.

  9. As a rule, the Kohonen type of net is based on a single layer of neurons arranged in a two-dimensional plane having a well-defined topology. A defined topology means that each neuron has a defined number of neurons as nearest neighbors, second-nearest neighbors, etc.
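The "defined topology" above can be sketched numerically: on a square-topology map, the neighbor order of every neuron relative to a chosen neuron is its ring (Chebyshev) distance on the grid. A minimal sketch with a toy 5×5 grid (the grid size and centre are our assumptions, not from the slides):

```python
import numpy as np

# Toy 5x5 square-topology map; the neuron at `centre` has ring-0
# distance to itself, ring 1 to its nearest neighbors, ring 2 to the
# second-nearest ring, etc.
rows, cols = 5, 5
centre = (2, 2)

ring = np.zeros((rows, cols), dtype=int)
for r in range(rows):
    for c in range(cols):
        ring[r, c] = max(abs(r - centre[0]), abs(c - centre[1]))

print(ring)  # each cell shows its neighbor order relative to the centre
```

Each entry of `ring` is that neuron's neighbor order; the 8 neurons around the centre are ring 1, the 16 around those are ring 2.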

  10. The neighborhood of a neuron is usually arranged either in squares or in hexagons. In the Kohonen conception of neural networks, signal similarity is related to the spatial (topological) relation among neurons in the network.

  11. Competitive Learning The Kohonen learning concept tries to map the input so that similar signals excite neurons that are very close together.

  12. 1st step: An m-dimensional object Xs enters the network, and only one neuron from those in the output layer is selected. After the input occurs, the network selects the winner “c” (central neuron) according to some criterion: “c” is the neuron having either the largest output in the entire network or the weight vector most similar to the input.
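The first step can be sketched with the Euclidean-distance criterion (the same winning-neuron criterion the application slides use). The map size, dimensionality, and random toy values below are our assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((5, 5, 3))   # toy 5x5 map, m = 3 weights per neuron
x_s = rng.random(3)               # m-dimensional input object Xs

# Winner c: the neuron whose weight vector is closest to the input
dist = np.linalg.norm(weights - x_s, axis=2)        # one distance per neuron
c = np.unravel_index(np.argmin(dist), dist.shape)   # grid position of winner
```

`c` is the (row, column) position of the central neuron on the map; only this neuron "fires" for the object `x_s`.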

  13. 2nd step: After finding the neuron c, its weight vector is corrected to make its response closer to the input. 3rd step: The weights of neighboring neurons must be corrected as well. These corrections are usually scaled down depending on the distance from c. Besides decreasing with increasing distance from c, the correction also decreases with each iteration step (learning rate).
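Steps 2 and 3 can be sketched as one correction function, assuming the triangular neighborhood scaling shown on the next slide (the function name and the exact scaling formula are our assumptions):

```python
import numpy as np

def correct_weights(weights, x, c, a, d_max):
    """Pull the winner c toward input x; correct neighbors too,
    scaled down linearly (triangular function) with ring distance
    from c, out to a maximum ring d_max. `a` is the learning rate."""
    rows, cols, _ = weights.shape
    for r in range(rows):
        for q in range(cols):
            d = max(abs(r - c[0]), abs(q - c[1]))    # ring distance to winner
            if d <= d_max:
                scale = a * (1.0 - d / (d_max + 1))  # triangular neighborhood
                weights[r, q] += scale * (x - weights[r, q])
    return weights
```

With `d_max = 0` only the winner moves; larger `d_max` drags whole neighborhoods along, which is what makes topologically close neurons end up with similar weights.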

  14. [Figure: triangular and Mexican-hat neighborhood functions, each showing the correction scale amax falling off with distance d from the winner dc; a 1×i input vector Xs mapped onto a 5×5 network.]

  15. 4th step: After the corrections have been made, the weights should be normalized to a constant value, usually 1. 5th step: The next object Xs is input and the process is repeated. After all objects have been input once, one epoch is completed.
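The five steps can be put together in a minimal end-to-end training loop. For brevity this sketch corrects only the winner (step 3's neighborhood correction is omitted); the toy data, map size, and epoch count are our assumptions, while a_max = 0.9 and a_min = 0.1 follow the worked example on the next slides:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((20, 3))                 # 20 training objects, m = 3 variables
W = rng.random((5, 5, 3))
W /= np.linalg.norm(W, axis=2, keepdims=True)   # start with unit weight vectors

n_epochs, a_max, a_min = 10, 0.9, 0.1
for epoch in range(n_epochs):
    # learning rate shrinks linearly from a_max to a_min over the epochs
    a = a_max - (a_max - a_min) * epoch / (n_epochs - 1)
    for x in X:                         # one full pass over X = one epoch
        d = np.linalg.norm(W - x, axis=2)
        c = np.unravel_index(np.argmin(d), d.shape)   # step 1: winner
        W[c] += a * (x - W[c])          # step 2: correct the winner
        W[c] /= np.linalg.norm(W[c])    # step 4: renormalize to length 1
```

After training, every weight vector still has length 1, and objects that are similar excite the same or nearby neurons.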

  16. [Figure: an input vector presented to a 4×4×2 network, with the winning neuron highlighted in the output.]

  17. [Figure: worked correction example with amax = 0.9, amin = 0.1, t = 1 (first epoch) and a linearly decreasing neighbor function; around the winner, corrections are scaled by 1×0.9, 0.8×0.9, 0.6×0.9 and 0.4×0.9 with increasing distance d.]

  18. [Figure: top map of a trained KANN, with neurons labeled by the classes a–e of the objects Xs that excite them.] After the training process is accomplished, the complete set of training vectors is run through the KANN once more. In this last run, the labeling of the neurons excited by each input vector is entered into a table called the top map.
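This labeling run can be sketched directly: each training object is sent through the trained net once more and its known class label is written on the neuron it excites. The toy weights, objects, and labels below are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.random((4, 4, 3))                   # stands in for trained weights
X = rng.random((6, 3))                      # the training objects
labels = ["a", "b", "c", "d", "e", "e"]     # known class of each object

top_map = np.full((4, 4), "", dtype=object)
for x, lab in zip(X, labels):
    d = np.linalg.norm(W - x, axis=2)
    c = np.unravel_index(np.argmin(d), d.shape)
    top_map[c] = lab                        # label the excited neuron
```

If the net has self-organized well, objects of the same class land on the same or adjacent neurons, so the top map shows contiguous class regions.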

  19. [Figure: top map of a trained KANN labeled H/L, alongside the weight maps, one level per variable.] The number of weights in each neuron is equal to the dimension m of the input vector. Hence, each level of weights handles the data of one specific variable.

  20. [Figure: a Kohonen map wrapped into a toroidal topology, showing the 3rd layer of neighbor neurons around a winner.]
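On a toroidal map, opposite edges are joined, so the ring distance wraps around and no neuron sits on a border. A minimal sketch (the function name is ours):

```python
def toroidal_ring_distance(p, q, rows, cols):
    """Ring (Chebyshev) distance between neurons p and q on a map
    whose opposite edges are joined into a torus."""
    dr = min(abs(p[0] - q[0]), rows - abs(p[0] - q[0]))  # wrap vertically
    dc = min(abs(p[1] - q[1]), cols - abs(p[1] - q[1]))  # wrap horizontally
    return max(dr, dc)

# On a 5x5 torus, opposite corners are direct neighbors:
toroidal_ring_distance((0, 0), (4, 4), 5, 5)   # 1 ring apart
```

This is why the application slides use toroidal topologies: every neuron has a full set of neighbors, avoiding edge effects during training.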

  21. Analytical Applications • Classification and reaction monitoring: Linking Databases of Chemical Reactions to NMR Data: an Exploration of 1H NMR-Based Reaction Classification, Anal. Chem. 2007, 79, 854-862. • Classification of photochemical and metabolic reactions by Kohonen self-organizing maps is demonstrated. • Changes in the 1H NMR spectrum of a mixture are interpreted in terms of the chemical reactions taking place. • The difference between the 1H NMR spectra of the products and the reactants was introduced as a reaction descriptor (input vector) to the Kohonen self-organizing map.

  22. Dataset: photochemical cycloadditions, partitioned into a training set of 147 reactions and a test set of 42 reactions, all manually classified into seven classes. The 1H NMR spectra were simulated from the molecular structures by SPINUS. • Input variables: reaction descriptors derived from 1H NMR spectra. • Topology: toroidal, 13×13 and 15×15 for photochemical reactions and 29×29 for metabolic reactions. • Neighbor scaling function: linearly decreasing triangular, with a learning rate of 0.1 to 0 over 50-100 epochs. • Winning-neuron selection criterion: Euclidean distance.

  23. [Figure: toroidal top map of a 14×14 Kohonen self-organizing map, colored by class of photochemical reaction.] After the predictive models for the classification of chemical reactions were established on the basis of simulated NMR data, their applicability to reaction data from mixed sources (experimental and simulated) was evaluated.

  24. A second dataset: 911 metabolic reactions catalyzed by transferases, classified into eight subclasses according to the Enzyme Commission (E.C.) system. In the resulting surface of such a SOM, each neuron is colored according to the Enzyme Commission subclass of the reactions activating it, that is, the second digit of the EC number.

  25. For photochemical reactions, correct classifications by SOMs were achieved for 94-99% of the training set and 81-88% of the test set. For metabolic reactions, SOMs gave 94-96% correct predictions on the training set, while the test set was predicted with 66-67% accuracy by individual SOMs.

  26. Analytical Applications • QSAR & QSTR: Kohonen Artificial Neural Network and Counter Propagation Neural Network in Molecular Structure-Toxicity Studies, Current Computer-Aided Drug Design, 2005, 1, 73-78. [Figure: for n molecules, an n×m table of m descriptors per molecule and an n×1 activity vector.] A general problem in QSAR modeling is the selection of the most relevant descriptors.

  27. [Figure: the n×m data table (n molecules, m descriptors) input to the network either as n×1 columns or as m×1 rows.] • Descriptor clustering • Calibration and test set selection

  28. References • Chemom. Intell. Lab. Syst. 38 (1997) 1-23 • Neural Networks for Chemists: An Introduction, VCH, Weinheim • Anal. Chem. 2007, 79, 854-862 • Current Computer-Aided Drug Design 2005, 1, 73-78 • Acta Chimica Slovenica 1994, 327-352

  29. Thanks
