
Neural Networks Part 3




  1. Neural Networks Part 3. Dan Simon, Cleveland State University

  2. Outline • Sugeno RBF Neurofuzzy Networks • Cardiology Application • Hopfield Networks • Kohonen Self Organizing Maps • Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

  3. Sugeno RBF. Sugeno fuzzy system; p fuzzy rules, scalar output. Defuzzified output (centroid defuzzification): $z(x) = \sum_{i=1}^{p} w_i(x)\, z_i(x) \,/\, \sum_{i=1}^{p} w_i(x)$, where the summation is over all p fuzzy rules and $w_i$ = firing strength of the i-th rule (Chapter 4). Suppose we use product inference. Then $w_i(x) = \prod_{j=1}^{m} \mu_{ij}(x_j)$. [Figure: membership grades $\mu_{i1}(x_1)$ and $\mu_{i2}(x_2)$ multiplied to give the firing strength $w_i$.]

  4. Suppose the outputs are singletons (zero-order Sugeno system). Then $z_i(x) = z_i$ and $z(x) = \sum_{i=1}^{p} w_i(x)\, z_i \,/\, \sum_{i=1}^{p} w_i(x)$.

  5. Suppose the input MFs are Gaussian: $\mu_{ij}(x_j) = \exp\!\left[-(x_j - c_{ij})^2 / (2\sigma_{ij}^2)\right]$. Then $w_i(x) = \exp\!\left[-\sum_{j=1}^{m} (x_j - c_{ij})^2 / (2\sigma_{ij}^2)\right]$. Recall the RBF network: $y = \sum_i w_i\, f(x, c_i) = \sum_i w_i\, \phi(\|x - c_i\|)$, where $\phi(\cdot)$ is a basis function and $\{c_i\}$ are the RBF centers.
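To make the equivalence concrete, here is a minimal NumPy sketch of the zero-order Sugeno system evaluated as a normalized RBF network with Gaussian MFs and product inference; the parameter values and names (c, sig, z) are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sugeno_rbf(x, c, sig, z):
    """Zero-order Sugeno system with Gaussian MFs = normalized RBF network.

    x   : input vector, shape (m,)
    c   : MF centers c_ij, shape (p, m)   -- one row per rule
    sig : MF widths sigma_ij, shape (p, m)
    z   : singleton rule outputs z_i, shape (p,)
    """
    # Product inference: w_i = prod_j exp(-(x_j - c_ij)^2 / (2 sig_ij^2))
    w = np.exp(-np.sum((x - c)**2 / (2.0 * sig**2), axis=1))
    # Centroid defuzzification: weighted average of the singletons
    return np.dot(w, z) / np.sum(w)

# Example: p = 4 rules, m = 2 inputs (dimensions chosen for illustration)
rng = np.random.default_rng(0)
c = rng.uniform(-1, 1, (4, 2))
sig = np.full((4, 2), 0.5)
z = rng.uniform(-1, 1, 4)
print(sugeno_rbf(np.array([0.3, -0.2]), c, sig, z))
```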

  6. [Figure: network with inputs $x_1, \ldots, x_m$, hidden units computing $m_1(x), \ldots, m_p(x)$, weights $w_1, \ldots, w_p$, and output y.] We started with a Sugeno fuzzy system and ended up with an RBF network that has input-dependent weights $w_i$. This is a neuro-fuzzy network.

  7. Parameters: $c_{ik}$ and $\sigma_{ik}$ ($p \times m$ each) and $z_i$ ($p$ values), a total of $p(2m+1)$ adjustable parameters, where m = input dimension and p = number of hidden neurons. Train with gradient descent or BBO. Chen and Linkens example: $y = x_2 \sin(x_1) + x_1 \cos(x_2)$. NeuroFuzzy.zip / BBO.m, p = 4.
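As a quick sanity check of the parameter count, the Chen and Linkens benchmark surface and the $p(2m+1)$ formula for the p = 4, m = 2 case on the slide can be written down directly (a sketch, not from the original slides):

```python
import numpy as np

def chen_linkens(x1, x2):
    # Benchmark surface from Chen and Linkens (2001)
    return x2 * np.sin(x1) + x1 * np.cos(x2)

p, m = 4, 2                  # rules (hidden neurons) and input dimension
n_params = p * (2 * m + 1)   # c_ik and sigma_ik (p*m each) plus z_i (p)
print(n_params)              # 20 adjustable parameters
```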

  8. [Figure: target surface vs. neurofuzzy approximation.] 6,000 BBO generations, RMS error = 0.6. We can also use gradient descent training.

  9. Neurofuzzy Diagnosis of Heart Disease • Cardiovascular disease is the leading cause of death in the western world • Over 800,000 deaths per year in the United States • One in five Americans has cardiovascular disease • Cardiomyopathy: weakening of the heart muscle • Could be inherited or acquired (unknown cause) • Biochemical considerations indicate that cardiomyopathy will affect the P wave of an ECG

  10. Neurofuzzy Diagnosis of Heart Disease • Cardiologists tell us that primary indicators include: • P wave duration • P wave amplitude • P wave energy • P wave inflection point • This gives us a neurofuzzy system with four inputs.

  11. Neurofuzzy Diagnosis of Heart Disease • ECG data collection • Data collected for 24 hours • Average P wave data calculated each minute • Duration • Inflection • Energy • Amplitude • 37 cardiomyopathy patients, 18 control patients

  12. Neurofuzzy Diagnosis of Heart Disease. [Figure:] Normalized P wave features with 1-σ bars. The data is complex due to its time-varying nature.

  13. Neurofuzzy Diagnosis of Heart Disease BBO training error and correct classification rate (CCR) percent as a function of the number of middle layer neurons p. What about statistical significance?

  14. Neurofuzzy Diagnosis of Heart Disease Training error and correct classification rate (CCR) percent for different mutation rates using BBO (p = 3).

  15. Neurofuzzy Diagnosis of Heart Disease Typical BBO training and test results

  16. Neurofuzzy Diagnosis of Heart Disease. Success varies from one patient to the next. Does demographic information need to be included in the classifier? [Figure: percent correct vs. patient number.]

  17. The Discrete Hopfield Net • John Hopfield, molecular biologist, 1982 • Proc. of the National Academy of Sciences • Autoassociative network: recall a stored pattern similar to the input pattern • Number of neurons = pattern dimension • Fully connected network except wii = 0 • Symmetric connections: wik = wki • Stability proof

  18. The Discrete Hopfield Net • The neuron signals comprise an output pattern. • The neuron signals are initially set equal to some input pattern. • The network converges to the nearest stored pattern. • Example: Store [1, 0, 1], [1, 1, 0], and [0, 0, 1] • Input [0.9, 0.4, 0.6] • Network converges to [1, 0, 1]

  19. Store P binary patterns, each with n dimensions: $s(p) = [s_1(p), \ldots, s_n(p)]$, $p = 1, \ldots, P$, with the Hebbian weights $w_{ik} = \sum_{p=1}^{P} [2 s_i(p) - 1][2 s_k(p) - 1]$ for $i \neq k$ (and $w_{ii} = 0$). Suppose the neuron signals are given by $y = [s_1(q), \ldots, s_n(q)]$. When these signals are updated by the network, neuron i receives the net input $\sum_k w_{ik} s_k(q) = \sum_{p=1}^{P} [2 s_i(p) - 1] \Big[ \sum_k [2 s_k(p) - 1]\, s_k(q) \Big]$.

  20. Recall $s_i \in \{0, 1\}$. Therefore, the average value of the term in brackets is 0, unless q = p, in which case the average value is n/2. Therefore, we adjust the neuron signals one neuron at a time: $y_i \leftarrow 1$ if $\sum_k w_{ik} y_k > \theta_i$, $y_i \leftarrow 0$ if $\sum_k w_{ik} y_k < \theta_i$, and $y_i$ unchanged otherwise, where $\theta_i$ = threshold. This results in s(p) being a stable network pattern. (We still have not proven convergence.)
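A minimal sketch of the storage rule and asynchronous update above, assuming the Fausett-style formulation the slides follow (binary {0, 1} patterns, bipolar Hebbian weights, zero diagonal); function names are mine:

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian storage of binary patterns: W = sum_p (2s-1)(2s-1)^T, zero diagonal."""
    S = 2 * np.asarray(patterns) - 1          # map {0,1} to bipolar {-1,+1}
    W = S.T @ S
    np.fill_diagonal(W, 0)                    # w_ii = 0
    return W

def hopfield_update(W, y, theta, order=None):
    """One asynchronous sweep: update one neuron at a time in the given order."""
    y = y.copy()
    if order is None:
        order = range(len(y))
    for i in order:
        net = W[i] @ y
        if net > theta[i]:
            y[i] = 1
        elif net < theta[i]:
            y[i] = 0
        # net == theta[i]: leave y[i] unchanged
    return y
```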

  21. Binary Hopfield Net Example: two patterns, p = 1 and p = 2, so P = 2. $s(1) = [1, 0, 1, 0]$, $s(2) = [1, 1, 1, 1]$. The storage rule gives $W = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix}$.

  22. Input y = [1, 0, 1, 1] – close to s(2) = [1, 1, 1, 1]. Threshold $\theta_i = 1$. Updating one neuron at a time, the network converges to s(2).
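Reusing the two helpers from the slide-20 sketch, this example can be run directly; with a sequential update order (neurons 1, 2, 3, 4) the network settles on s(2) as claimed (other update orders can settle on a different stored pattern):

```python
import numpy as np

patterns = [[1, 0, 1, 0], [1, 1, 1, 1]]   # s(1) and s(2)
W = hopfield_weights(patterns)             # from the previous sketch
# W = [[0 0 2 0]
#      [0 0 0 2]
#      [2 0 0 0]
#      [0 2 0 0]]

theta = np.ones(4)                         # threshold theta_i = 1
y = np.array([1, 0, 1, 1])                 # noisy version of s(2)
while True:
    y_new = hopfield_update(W, y, theta)   # sequential sweep
    if np.array_equal(y_new, y):
        break                              # fixed point reached
    y = y_new
print(y)                                   # [1 1 1 1] = s(2)
```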

  23. Recall $s(1) = [1, 0, 1, 0]$, $s(2) = [1, 1, 1, 1]$. Storage capacity: $P = 0.15n$ (experimental), or $P = n / (2 \log_2 n)$. Is s(1) stable? Is s(2) stable? Are any other patterns stable?

  24. Hopfield Net Stability: Consider the "energy" function $E = -\frac{1}{2} \sum_i \sum_k w_{ik} y_i y_k + \sum_i \theta_i y_i$. Is E bounded? How does E change when $y_i$ changes?

  25. Recall our activation function: if $y_i = 1$, it will decrease if $\sum_k w_{ik} y_k < \theta_i$; this gives a negative change in E (see prev. page). If $y_i = 0$, it will increase if $\sum_k w_{ik} y_k > \theta_i$; this also gives a negative change in E (see prev. page). We have a bounded E which never increases, so E is a Lyapunov function.
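The Lyapunov property can be checked numerically on the same example: with E(y) defined as on slide 24, E never increases during an asynchronous sweep (a sanity check, not a proof; reuses W, theta, and hopfield_update from the sketches above):

```python
import numpy as np

def energy(W, y, theta):
    # E = -1/2 * sum_ik w_ik y_i y_k + sum_i theta_i y_i
    return -0.5 * (y @ W @ y) + theta @ y

y = np.array([1, 0, 1, 1])
for i in range(4):                         # one sweep, one neuron at a time
    y = hopfield_update(W, y, theta, order=[i])
    print(energy(W, y, theta))             # prints 1.0, 0.0, 0.0, 0.0 -- non-increasing
```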

  26. Control Applications of Hopfield Nets • If we have optimal control trajectories, and noise drives us away from the optimal trajectory, the Hopfield net can find the closest optimal trajectory • Transform a linear-quadratic optimal control performance index into the form of the Hopfield network energy function. Use the Hopfield network dynamics to minimize the energy.

  27. Kohonen Self Organizing Map. Clustering; associative memory – given a set of input vectors {x}, find a mapping from the input vectors onto a grid of models {m} (cluster centers). Nearby models are similar. Visualize vector distributions. Teuvo Kohonen, engineer, Finland, 1982. Unsupervised learning.

  28. All the weights from the n input dimensions to a given point in output space correspond to a cluster point. Note: the inputs are not multiplied by the weights, unless the inputs are normalized – then $\max_k [x \cdot w(k)]$ gives the cluster point that is closest to x, because the dot product of x and w(k) is the cosine of the angle between them.

  29. Kohonen SOM Learning Algorithm. We are given a set of input vectors {x}, each of dimension n. Choose the maximum number of clusters m. Random weight initialization $\{w_{ik}\}$, $i \in [1, n]$, $k \in [1, m]$; note $w_{ik}$ is the weight from $x_i$ to cluster unit k. Iterate over each input training sample x: find k such that $D(k) \leq D(k')$ for all $k'$, where $D(k) = \sum_i (x_i - w_{ik})^2$. Scalar form: $w_{ik} \leftarrow w_{ik} + \alpha (x_i - w_{ik})$ for $i \in [1, n]$; n-dimensional vector form: $w_k \leftarrow w_k + \alpha (x - w_k)$. $\alpha$ = some function that decreases with the distance between x and $w_k$, and decreases with time (number of iterations). This update equation moves the $w_k$ vector closer to x.
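A minimal sketch of this learning loop, using the pure winner-take-all special case (constant α within an epoch, no neighborhood function), which is all the worked example on the next slides uses; names are illustrative:

```python
import numpy as np

def som_train(X, m, alpha, epochs):
    """Winner-take-all Kohonen clustering as in the slide's worked example.

    X     : training vectors, shape (N, n)
    m     : maximum number of cluster units
    alpha : callable, learning rate as a function of epoch t
    (The general algorithm also shrinks alpha with the distance between
    x and w_k, i.e., a neighborhood function; this sketch omits that.)
    """
    n = X.shape[1]
    rng = np.random.default_rng(1)
    w = rng.uniform(0, 1, (n, m))       # w[i, k] = weight from x_i to unit k
    for t in range(epochs):
        a = alpha(t)
        for x in X:
            D = np.sum((w - x[:, None])**2, axis=0)  # D(k) for each unit
            k = np.argmin(D)                         # winning unit
            w[:, k] += a * (x - w[:, k])             # move winner toward x
    return w

# Usage with the example on the next slide:
# w = som_train(X, m=2, alpha=lambda t: 0.6 * 0.95**t, epochs=100)
```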

  30. Kohonen SOM Example. Cluster [1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]. Maximum number of clusters m = 2. $\alpha(t) = 0.6 (0.95)^t$, where t = iteration number (coarse clustering to start, fine-tuning later). Random initialization (column k = cluster unit k): $w = \begin{bmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \\ 0.5 & 0.7 \\ 0.9 & 0.3 \end{bmatrix}$. First vector: D(1) = 1.86, D(2) = 0.98, so $w_2 \leftarrow w_2 + 0.6(x - w_2) = [0.92, 0.76, 0.28, 0.12]^T$. Second vector: D(1) = 0.66, D(2) = 2.28, so $w_1 \leftarrow w_1 + 0.6(x - w_1) = [0.08, 0.24, 0.20, 0.96]^T$.

  31. Third vector: D(1) = 1.87, D(2) = 0.68, so $w_2 \leftarrow w_2 + 0.6(x - w_2) = [0.97, 0.30, 0.11, 0.05]^T$. Fourth vector: D(1) = 0.71, D(2) = 2.72, so $w_1 \leftarrow w_1 + 0.6(x - w_1) = [0.03, 0.10, 0.68, 0.98]^T$. This is the end of the first iteration (epoch). Kohonen.m. Adjust $\alpha$ for the next iteration. Each cluster point (weight column) is converging to about the average of the two sample inputs that are closest to it.
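The first epoch of this example can be replayed directly; the initial weight matrix is the one reconstructed on slide 30 (it is uniquely determined by the printed D values and updates), and the printed distances and updated columns match the values on slides 30 and 31:

```python
import numpy as np

X = np.array([[1, 1, 0, 0], [0, 0, 0, 1],
              [1, 0, 0, 0], [0, 0, 1, 1]], dtype=float)
w = np.array([[0.2, 0.8],     # initial weights; column k = cluster unit k
              [0.6, 0.4],
              [0.5, 0.7],
              [0.9, 0.3]])
alpha = 0.6                   # alpha(0) for the first epoch
for x in X:
    D = np.sum((w - x[:, None])**2, axis=0)   # D(1), D(2)
    k = np.argmin(D)                          # winning cluster unit
    w[:, k] += alpha * (x - w[:, k])          # move winner toward x
    print(np.round(D, 2), np.round(w[:, k], 2))
# First line: [1.86 0.98] [0.92 0.76 0.28 0.12], and so on per the slides.
```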

  32. Control Applications of Kohonen Networks • Fault accommodation • Suppose we have a family of controllers, one controller for each fault condition • When a fault occurs, classify it in the correct fault class to choose the controller • This idea can also apply to operating modes, reference input types, user intent, etc. • Missing sensor data – the Kohonen network can fill in the most likely values of missing sensor data

  33. Adaptive Neuro-Fuzzy Inference Systems • Originally called adaptive network-based fuzzy inference systems • Roger Jang, 1993 (Zadeh’s student)

  34. Figure 12.1(b) in Jang's book: two-input, single-output ANFIS. Layer 1: fuzzy system; outputs = membership grades. Layer 2: product. Layer 3: normalization. Layer 4: Sugeno fuzzy system. Layer 5: sum.

  35. Layer 1 outputs: $\mu_{A_1}(x)$, $\mu_{A_2}(x)$, $\mu_{B_1}(y)$, $\mu_{B_2}(y)$. Layer 2 outputs: $w_1 = \mu_{A_1}(x)\,\mu_{B_1}(y)$, $w_2 = \mu_{A_2}(x)\,\mu_{B_2}(y)$ (or any other T-norm in place of the product). Layer 3 outputs: $\bar{w}_i = w_i / (w_1 + w_2)$. Layer 4 outputs: $\bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$. Layer 5 output: $f = \bar{w}_1 f_1 + \bar{w}_2 f_2$.
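A sketch of the five-layer forward pass for this two-input, two-rule ANFIS with Gaussian MFs; the membership-function and consequent parameter values below are illustrative assumptions, not Jang's:

```python
import numpy as np

def gauss(v, c, s):
    return np.exp(-(v - c)**2 / (2 * s**2))

def anfis_forward(x, y, prem, conseq):
    """Two-input, two-rule ANFIS forward pass (layers 1-5).

    prem   : dict of Gaussian (center, width) pairs for A1, A2, B1, B2
    conseq : [[p1, q1, r1], [p2, q2, r2]] first-order Sugeno consequents
    """
    # Layer 1: membership grades
    A = [gauss(x, *prem['A1']), gauss(x, *prem['A2'])]
    B = [gauss(y, *prem['B1']), gauss(y, *prem['B2'])]
    # Layer 2: firing strengths via product T-norm
    w = np.array([A[0] * B[0], A[1] * B[1]])
    # Layer 3: normalization
    wbar = w / w.sum()
    # Layer 4: first-order consequents f_i = p_i x + q_i y + r_i
    f = np.array([p * x + q * y + r for p, q, r in conseq])
    # Layer 5: sum of weighted consequents
    return np.dot(wbar, f)

prem = {'A1': (0.0, 1.0), 'A2': (2.0, 1.0), 'B1': (0.0, 1.0), 'B2': (2.0, 1.0)}
conseq = [[1.0, 1.0, 0.0], [0.5, -1.0, 1.0]]   # illustrative values
print(anfis_forward(0.5, 1.5, prem, conseq))
```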

  36. So ANFIS is a Sugeno fuzzy system. • Neural network architecture • It can be trained with neural network methods (e.g., backpropagation). • Consequent parameters = $p_i$, $q_i$, and $r_i$. • Output is linear with respect to these parameters. • We can optimize with respect to the consequent parameters using least squares (see the sketch below). • This is called the forward pass. • A 1st-order Sugeno system with n inputs and m fuzzy Sugeno partitions per input has $m^n$ rules, each with n + 1 consequent parameters: $(n+1)\,m^n$ linear parameters.
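Because the output is linear in $(p_i, q_i, r_i)$, the forward pass is an ordinary least-squares problem. A minimal sketch for the two-input, two-rule case, assuming the normalized firing strengths have already been computed per sample (function and variable names are mine, not Jang's):

```python
import numpy as np

def consequent_lsq(XY, targets, wbar_all):
    """Least-squares fit of consequent parameters with premise parameters fixed.

    For each sample, the ANFIS output is linear in (p_i, q_i, r_i):
        f = sum_i wbar_i * (p_i x + q_i y + r_i),
    so stacking rows [wbar_i*x, wbar_i*y, wbar_i] gives a system A @ theta = t.

    XY       : input samples, shape (N, 2)
    targets  : desired outputs, shape (N,)
    wbar_all : normalized firing strengths per sample, shape (N, 2)
    """
    rows = []
    for (x, y), wbar in zip(XY, wbar_all):
        row = []
        for wb in wbar:
            row += [wb * x, wb * y, wb]
        rows.append(row)
    A = np.array(rows)                            # shape (N, 6) for 2 rules
    theta, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return theta.reshape(-1, 3)                   # row i = [p_i, q_i, r_i]
```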

  37. Premise parameters = parameters of fuzzy sets A1, A2, B1, B2, etc. • ANFIS output is nonlinear with respect to these parameters. • Gradient descent can be used to optimize the output with respect to these parameters. • This is called the backward pass. • A premise fuzzy system with n inputs, q fuzzy partitions per input, and k parameters per MF has $kqn$ nonlinear parameters.
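A minimal sketch of one backward-pass step on the premise parameters; for brevity it approximates the gradient by central differences rather than backpropagating through the five layers, which an actual ANFIS implementation would do:

```python
import numpy as np

def premise_step(loss, prem_vec, lr=0.01, eps=1e-6):
    """One gradient-descent step on the premise (MF) parameters.

    loss     : callable mapping a flat premise-parameter vector to scalar error
    prem_vec : current MF centers/widths flattened into one array
    """
    g = np.zeros_like(prem_vec)
    for j in range(len(prem_vec)):        # central-difference gradient estimate
        e = np.zeros_like(prem_vec)
        e[j] = eps
        g[j] = (loss(prem_vec + e) - loss(prem_vec - e)) / (2 * eps)
    return prem_vec - lr * g              # descend on the premise parameters
```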

  38. References
• M. Chen and D. Linkens, "A systematic neuro-fuzzy modelling framework with application to material property prediction," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 31, no. 5, pp. 781-790, 2001
• M. Ovreiu and D. Simon, "Biogeography-based optimization of neuro-fuzzy system parameters for diagnosis of cardiac disease," Genetic and Evolutionary Computation Conference, Portland, Oregon, pp. 1235-1242, July 2010
• J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982
• P. Simpson, Artificial Neural Systems, Pergamon Press, 1990
• L. Fausett, Fundamentals of Neural Networks, Prentice Hall, 1994
• www.scholarpedia.org/article/Kohonen_network
• J.-S. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1997
