
Self Organization: Hebbian Learning

Presentation Transcript


  1. Self Organization: Hebbian Learning
  CS/CMPE 537 – Neural Networks (Sp 2004/2005) – Asim Karim @ LUMS

  2. Introduction
  • So far, we have studied neural networks that learn from their environment in a supervised manner
  • Neural networks can also learn in an unsupervised manner; this is known as self-organized learning
  • Self-organized learning discovers significant features or patterns in the input data through general rules that operate locally
  • Self-organizing networks typically consist of two layers with feedforward connections and elements that facilitate 'local' learning

  3. Self-Organization
  • "Global order can arise from local interactions" – Turing (1952)
  • The input signal produces activity patterns in the network, and these patterns in turn modify the weights (a feedback loop)
  • Principles of self-organization:
    • Modifications in weights tend to self-amplify
    • Limited resources lead to competition and the selection of the most active synapses, at the expense of less active ones
    • Modifications in weights tend to cooperate
    • Order and structure in the activation patterns represent redundant information

  4. Hebbian Learning
  • A self-organizing learning principle was proposed by Hebb in 1949 in the context of biological neurons
  • Hebb's principle: when a neuron repeatedly excites another neuron, the threshold of the latter neuron is decreased, or the synaptic weight between the neurons is increased, in effect making it more likely that the first neuron will cause the second to fire
  • Hebbian learning rule: Δw_ji = η y_j x_i
  • No desired or target signal is required in the Hebbian rule, hence it is unsupervised learning
  • The update rule is local to the weight
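As a concrete illustration of the rule above, here is a minimal sketch of one Hebbian update for a single linear neuron in Python/NumPy (the function name, numbers, and learning rate are illustrative choices, not from the slides):

```python
import numpy as np

def hebbian_step(w, x, eta=0.01):
    """One plain Hebbian update for a single linear neuron y = w^T x."""
    y = w @ x                  # post-synaptic activity
    return w + eta * y * x     # delta w_i = eta * y * x_i for every input i

# Example with made-up numbers
w = np.array([0.1, -0.2, 0.05])
x = np.array([1.0, 0.5, -1.0])
print(hebbian_step(w, x))
```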

  5. Hebbian Update
  • Consider the update of a single weight w (x and y are the pre- and post-synaptic activities):
    w(n + 1) = w(n) + η x(n) y(n)
  • For a linear activation function:
    w(n + 1) = w(n)[1 + η x²(n)]
  • The weight grows without bound: if the initial weight is negative it grows more negative; if it is positive it grows more positive
  • Hebbian learning is therefore intrinsically unstable, unlike error-correction learning with the backpropagation algorithm
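A small numeric sketch of this instability for a single weight (the constant input and the learning rate are arbitrary values chosen for the demonstration):

```python
# Single weight, linear activation: w(n+1) = w(n) * (1 + eta * x^2)
eta, x = 0.1, 1.0
for w0 in (0.01, -0.01):
    w = w0
    for n in range(100):
        y = w * x              # linear activation
        w = w + eta * x * y    # plain Hebbian update
    print(w0, "->", w)         # magnitude grows without bound, sign is preserved
```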

  6. Geometric Interpretation of Hebbian Learning
  • Consider a single linear neuron with p inputs:
    y = w^T x = x^T w  and  Δw = η[x_1 y  x_2 y  …  x_p y]^T
  • The dot product can be written as y = |w||x| cos(α), where α is the angle between the vectors x and w
  • If α is zero (x and w are 'close'), y is large; if α is 90° (x and w are 'far'), y is zero
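A quick numeric sketch of this geometric reading (the fixed weight vector and the chosen angles are arbitrary):

```python
import numpy as np

w = np.array([1.0, 0.0])                             # fixed weight vector
for alpha_deg in (0, 30, 60, 90):
    alpha = np.deg2rad(alpha_deg)
    x = np.array([np.cos(alpha), np.sin(alpha)])     # unit input at angle alpha to w
    print(alpha_deg, "deg -> y =", round(float(w @ x), 3))
# The output falls from 1.0 at 0 deg to 0.0 at 90 deg: y measures how aligned x is with w.
```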

  7. Similarity Measure
  • A network trained with Hebbian learning creates a similarity measure (the inner product) in its input space according to the information contained in the weights
  • The weights capture (memorize) the information in the data during training
  • During operation, when the weights are fixed, a large output y signifies that the present input is 'similar' to the inputs x that created the weights during training
  • Similarity measures:
    • Hamming distance
    • Correlation

  8. Hebbian Learning as Correlation Learning
  • Hebbian learning (pattern-by-pattern mode):
    Δw(n) = η x(n) y(n) = η x(n) x^T(n) w(n)
  • In batch mode:
    Δw = η [Σ_{n=1}^{N} x(n) x^T(n)] w(0)
  • The term Σ_{n=1}^{N} x(n) x^T(n) is a sample approximation of the autocorrelation of the input data
  • Thus Hebbian learning can be thought of as learning the autocorrelation of the input space
  • Correlation is a well-known operation in signal processing and statistics; in particular, it completely describes signals with Gaussian distributions
  • Applications in signal processing
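A sketch verifying that the batch Hebbian update equals the sample autocorrelation matrix applied to the initial weight vector (the random data, sizes, and learning rate are assumptions made for the check):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # N = 500 samples of a p = 3 dimensional input
w0 = rng.normal(size=3)                # initial weights
eta = 0.01

# Batch Hebbian update: sum over n of eta * x(n) * y(n), with y(n) = x(n)^T w0
delta_w = eta * sum(x * (x @ w0) for x in X)

# The same quantity via the sample autocorrelation matrix R = sum_n x(n) x(n)^T
R = X.T @ X
print(np.allclose(delta_w, eta * R @ w0))   # True
```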

  9. Oja's Rule
  • The simple Hebbian rule causes the weights to increase (or decrease) without bound
  • The weights can be normalized to unit length:
    w_ji(n + 1) = [w_ji(n) + η x_i(n) y_j(n)] / √(Σ_i [w_ji(n) + η x_i(n) y_j(n)]²)
  • This normalization constrains the weight vector at each neuron to have unit norm
  • Oja approximated the normalization (for small η) as:
    w_ji(n + 1) = w_ji(n) + η y_j(n)[x_i(n) – y_j(n) w_ji(n)]
  • This is Oja's rule, a normalized form of the Hebbian rule
  • It contains a 'forgetting term' that prevents the weights from growing without bound
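A minimal sketch of Oja's rule for a single linear neuron (the function name, data layout, and hyperparameters are my own choices). On zero-mean data the returned weight vector points, up to sign, along the direction of maximum variance, as the following slides make precise:

```python
import numpy as np

def train_oja(X, eta=0.005, epochs=50, seed=0):
    """Train one linear neuron with Oja's rule on zero-mean data X of shape (N, p)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)                 # start from a random unit vector
    for _ in range(epochs):
        for x in X:
            y = w @ x                      # linear output
            w += eta * y * (x - y * w)     # Hebbian term minus the forgetting term
    return w                               # stays close to unit length
```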

  10. Oja's Rule – Geometric Interpretation
  • The simple Hebbian rule finds the weight vector pointing in the direction of largest variance in the input data; however, the magnitude of the weight vector grows without bound
  • Oja's rule has a similar interpretation: the normalization only changes the magnitude, while the direction of the weight vector is the same
  • The magnitude is equal to one
  • Oja's rule converges asymptotically, unlike the Hebbian rule, which is unstable

  11. (figure)

  12. The Maximum Eigenfilter
  • A linear neuron trained with Oja's rule produces a weight vector that is the principal eigenvector of the input autocorrelation matrix, and the variance of its output equals the largest eigenvalue
  • In other words, the neuron solves the eigenvalue problem:
    R e_1 = λ_1 e_1
    • R = autocorrelation matrix of the input data
    • e_1 = the eigenvector with the largest eigenvalue, which corresponds to the weight vector w obtained by Oja's rule
    • λ_1 = the largest eigenvalue, which equals the variance of the network's output
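A sketch checking this claim numerically against NumPy's eigendecomposition (the synthetic data, step size, and epoch count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Zero-mean 2-D data, stretched along the first coordinate axis
X = rng.normal(size=(1000, 2)) * np.array([3.0, 1.0])

# Single linear neuron trained with Oja's rule
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(20):
    for x in X:
        y = w @ x
        w += 0.001 * y * (x - y * w)

# Principal eigenvector / eigenvalue of the autocorrelation matrix R
R = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
e1, lam1 = eigvecs[:, -1], eigvals[-1]

print(abs(w @ e1))                  # close to 1: w equals e1 up to sign
print(np.mean((X @ w) ** 2), lam1)  # output variance is close to lambda_1
```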

  13. Principal Component Analysis (1)
  • Oja's rule, applied to a single neuron, creates a principal component of the input space in the form of the weight vector
  • How can we find the other directions in the input space with significant variance?
  • In statistics, PCA is used to obtain the significant components of data in the form of orthogonal principal axes
  • PCA is also known as the Karhunen–Loève (K-L) transform in signal processing
  • It was first proposed in 1901, with later developments in the 1930s, 1940s and 1960s
  • A Hebbian network with Oja's rule can perform PCA

  14. Principal Component Analysis (2)
  • Consider a set of vectors x with zero mean and unit variance. There exists an orthogonal transformation y = Q^T x such that the covariance matrix of y, Λ = E[y y^T], is diagonal:
    Λ_ij = λ_i if i = j, and Λ_ij = 0 otherwise
  • λ_1 > λ_2 > … > λ_p are the eigenvalues of the covariance matrix of x, C = E[x x^T]
  • The columns of Q are the corresponding eigenvectors
  • The components of y are the principal components; the first has the maximum variance, and each subsequent component has the maximum remaining variance while being uncorrelated with the preceding ones
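A compact sketch of this classical (matrix-based) PCA in NumPy, using the same symbols as above (the synthetic data is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
X -= X.mean(axis=0)                        # enforce zero mean

C = X.T @ X / len(X)                       # covariance matrix C = E[x x^T]
eigvals, Q = np.linalg.eigh(C)             # columns of Q are eigenvectors of C
order = np.argsort(eigvals)[::-1]          # sort so that lambda_1 > lambda_2 > ...
eigvals, Q = eigvals[order], Q[:, order]

Y = X @ Q                                  # y = Q^T x, applied to every sample
Lambda = Y.T @ Y / len(Y)                  # approximately diag(lambda_1, ..., lambda_p)
print(np.round(Lambda, 2))
```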

  15. PCA – Example (figure)

  16. Hebbian Network for PCA (figure)

  17. Hebbian Network for PCA
  • Procedure:
    • Use Oja's rule to find the principal component
    • Project the data onto the subspace orthogonal to the principal component
    • Use Oja's rule on the projected data to find the next major component
    • Repeat the above for m <= p components (m = number of desired components; p = input space dimensionality)
  • How do we project onto the orthogonal subspace?
    • Deflation method: subtract the principal component from the input
    • Oja's rule can be modified to perform this operation; the result is Sanger's rule

  18. Sanger's Rule
  • Sanger's rule is a modification of Oja's rule that implements the deflation method for PCA
  • Classical PCA involves matrix operations; Sanger's rule implements PCA iteratively in a neural network
  • Consider p inputs and m outputs, where m < p:
    y_j(n) = Σ_{i=1}^{p} w_ji(n) x_i(n),  j = 1, …, m
  • The update (Sanger's rule) is:
    Δw_ji(n) = η [y_j(n) x_i(n) – y_j(n) Σ_{k=1}^{j} w_ki(n) y_k(n)]
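A minimal sketch of Sanger's rule (the shape conventions, names, and hyperparameters are my own choices). W has one row per output neuron, and after training row j matches, up to sign, the j-th principal direction of the data:

```python
import numpy as np

def train_sanger(X, m, eta=0.001, epochs=50, seed=0):
    """Sanger's rule on zero-mean data X of shape (N, p); returns W of shape (m, p)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(m, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x                                  # y_j = sum_i w_ji x_i
            for j in range(m):
                # Deflation: remove the parts of x already explained by outputs 1..j
                residual = x - W[: j + 1].T @ y[: j + 1]
                W[j] += eta * y[j] * residual          # Sanger update for row j
    return W
```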

  19. PCA for Feature Extraction
  • PCA is the optimal linear feature extractor: no other linear system provides features that give a better reconstruction
  • PCA may or may not be the best preprocessing for pattern classification or recognition; classification requires good discrimination, which PCA might not provide
  • Feature extraction: transform the p-dimensional input space into an m-dimensional space (m < p) such that the m dimensions capture the information with minimal loss
  • The reconstruction error e is given by:
    e² = Σ_{i=m+1}^{p} λ_i
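A sketch that checks this error formula on synthetic data (eigendecomposition stands in for the trained network; the data and the choice of m are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
X -= X.mean(axis=0)

C = X.T @ X / len(X)
eigvals, Q = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, Q = eigvals[order], Q[:, order]

m = 2                                        # keep the first m principal components
X_hat = (X @ Q[:, :m]) @ Q[:, :m].T          # project onto m axes, then reconstruct
e2 = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(e2, eigvals[m:].sum())                 # both close to the sum of discarded eigenvalues
```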

  20. PCA for Data Compression
  • PCA identifies an orthogonal coordinate system for the input data such that the variance of the projection on the principal axis is largest, followed by the next major axis, and so on
  • By discarding some of the minor components, PCA can be used for data compression: a p-dimensional input is encoded in an m < p dimensional space
  • The weights are computed by Sanger's rule on typical inputs
  • The de-compressor (receiver) must know the weights of the network to reconstruct the original signal:
    x' = W^T y
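A sketch of the compression/decompression step x' = W^T y. For brevity W is built here from the top-m eigenvectors; in the network it would come from Sanger's rule instead, and all sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 8)) * np.linspace(3.0, 0.1, 8)
X -= X.mean(axis=0)

C = X.T @ X / len(X)
eigvals, Q = np.linalg.eigh(C)
W = Q[:, np.argsort(eigvals)[::-1][:3]].T    # shape (m, p) with m = 3

x = X[0]
y = W @ x              # sender transmits 3 numbers instead of 8
x_rec = W.T @ y        # receiver reconstructs with x' = W^T y
print(np.round(x, 2))
print(np.round(x_rec, 2))
```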

  21. PCA for Classification (1)
  • Can PCA enhance classification?
  • In general, no. PCA is good for reconstruction, not for feature discrimination or classification

  22. PCA for Classification (2) (figure)

  23. PCA – Some Remarks
  • Practical uses of PCA:
    • Data compression
    • Cluster analysis
    • Feature extraction
    • Preprocessing for classification/recognition (e.g. preprocessing for MLP training)
  • Biological basis: it is unlikely that the processing performed by biological neurons in, say, perception involves PCA alone; more complex feature extraction processes are involved

  24. Anti-Hebbian Learning
  • The Hebbian rule can be modified by flipping its sign:
    Δw_ji(n) = –η x_i(n) y_j(n)
  • The anti-Hebbian rule finds the direction in the input space that has the minimum variance; in other words, it is the complement of the Hebbian rule
  • Anti-Hebbian learning performs de-correlation: it de-correlates the output from the input
  • The Hebbian rule is unstable because it tries to maximize the variance; the anti-Hebbian rule, on the other hand, is stable and converges
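On its own, the anti-Hebbian update shrinks the weights toward zero; the sketch below adds an explicit renormalization after each pass (my own addition, analogous to the normalization that motivates Oja's rule) so that the surviving direction, the minimum-variance one, becomes visible. The data and step size are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
# Zero-mean 2-D data: large variance along the first axis, small along the second
X = rng.normal(size=(1000, 2)) * np.array([3.0, 0.5])

w = rng.normal(size=2)
w /= np.linalg.norm(w)
eta = 0.001
for _ in range(50):
    for x in X:
        y = w @ x
        w -= eta * y * x           # anti-Hebbian step: shrinks w fastest along high-variance directions
    w /= np.linalg.norm(w)         # keep only the direction

print(np.round(w, 3))              # close to (0, +/-1): the minimum-variance direction
```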
