
Bioinspired Computing Lecture 13


Presentation Transcript


  1. Bioinspired Computing, Lecture 13: Associative Memories with Artificial Neural Networks. Netta Cohen

  2. Last time: recurrent neural nets. Biologically realistic architecture. Dynamic interactive behaviour. Natural learning protocols. Today: attractor neural nets. Biologically-inspired associative memories. Also… steps away from the biologically realistic model. Unsupervised learning. Applications.

  3. Recurrent Nets: Pros & Cons. Pros: biologically-realistic architecture/performance; complex self-sustained activity; distributed representations; dynamic interactions with the environment; powerful computation; noise tolerance; graceful degradation. Cons: hard to formalise in information-processing terms; hard to visualise activity; hard to train, with no guarantee of convergence; no guaranteed solution. Attractor neural nets are a special case of recurrent nets.

  4. Associative Memory The imprinting and recollection of memories is an important component of what we do & how we process information. If we were to model these processes, here are a few conditions we might want our model to satisfy: • Store and reliably recall multiple independent memories. • Given only partial input, recall complete information, or • Given noisy input, recall noise-free prototype information. • Learn new memories in a biologically realistic manner. • Recall memories fast enough (before the next input is received). • Once recalled, maintain attention or memory long enough (for information processing & transmission elsewhere in the brain).

  5. Attractor Neural Nets [Figure: cross-section of an energy landscape in state space.] In RNNs, the state of the system is dictated both by internal dynamics & the system's response to inputs from the environment. In some cases, trajectories in state space can be guaranteed to lead to one of several stable states (fixed points, cycles or generic attractors).

  6. Attractor Neural Nets (cont.) Dynamical systems such as RNNs could serve as models of associative memory if it were possible to encode each memory in a specific stable state or attractor. In 1982, John Hopfield realised that by imposing a couple of restrictions on the architecture of the nets, he could guarantee the existence of attractors, such that every initial condition would necessarily evolve to a stable solution, where it would stay. This is tantamount to the requirement that the above picture be described in terms of an energy landscape.

  7. Attractor Neural Nets: Architecture [Figure: two nodes i and j connected by weights wij and wji.] • All connections are symmetric: wij = wji. • No self-connectedness: wii = 0. "Gerard Toulouse has called Hopfield's use of symmetric connections a 'clever step backwards from biological realism'. The cleverness arises from the existence of an energy function."* The existence of an energy function provides us with: • A formalism of the process of memory storage and recall • A tool to visualise the activity (both learning and recall) • A straightforward way to train the net • Once trained, a guaranteed solution (recall of the correct memory). * Hertz, Krogh & Palmer, Introduction to the Theory of Neural Computation (1990).

  8. How Does It Work? Nodes are modelled by conventional binary MP neurons. Each neuron serves both as an input and an output unit. (There are no hidden units.) States are given by the pattern of activity of the neurons (e.g. 101 for a network with three neurons). The number of neurons sets the maximum length for a bit-string of memory. Different patterns can be simultaneously stored in the network; the number of independent patterns that can be remembered is less than or equal to the number of nodes. Memory recall corresponds to a trajectory taking the system from some initial state (input) to the local energy minimum (closest association). Each step along the (recall) trajectory results in the same or a lower energy. Since energy is bounded from below, a solution is guaranteed for every input.
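The recall dynamics just described can be sketched in a few lines of Python. This is a toy illustration, not the lecture's example: the 3-neuron weight matrix and the strict "fire only if input > 0" tie-break are assumptions made here for the sketch.

```python
# Minimal sketch of Hopfield-style recall with binary MP neurons
# (0/1 states, threshold 0, synchronous updates).
# The 3-neuron weight matrix is an illustrative assumption.

def recall(w, state, max_steps=10):
    """Synchronously update all neurons until the state stops changing."""
    for _ in range(max_steps):
        new_state = [1 if sum(w[i][j] * state[j] for j in range(len(state))) > 0 else 0
                     for i in range(len(state))]
        if new_state == state:       # fixed point reached: a local energy minimum
            return new_state
        state = new_state
    return state

# Symmetric weights with zero diagonal (the two Hopfield restrictions).
w = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

print(recall(w, [1, 0, 0]))   # -> [1, 1, 1]: a partial cue completes a stored pattern
```

Each sweep can only keep or lower the network energy, so the loop is guaranteed to settle.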

  9. A working example [Figure: a 5×5 symmetric weight matrix over neurons 1-5, with threshold = 0, and the state trajectory from t = 0 to t = 4.] Each neuron i computes ∑j xj wij and fires if the sum exceeds the threshold. Starting from the input (t = 0): 0 1 0 0 0, the state evolves under repeated updates until it settles on the stable pattern 1 1 1 1 0, which persists unchanged at t = 4. Exercise: repeat this example with an initial input of [ 0 1 0 1 0 ].

  10. More general examples Stability: The stable pattern reached in the working example represents a fixed point in the dynamics. While stable solutions are guaranteed, not all stable solutions are fixed point solutions. State update rule: This example used a “synchronous updating” method. Asynchronous (sequential or random) updating methods can also be implemented.
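The asynchronous alternative mentioned above can be sketched as follows. The weight matrix and the tie-breaking choice (input of exactly 0 maps to state 0) are illustrative assumptions; notably, with the same toy matrix used earlier, the same cue can settle into a different fixed point than under synchronous updating, showing that the update scheme matters.

```python
# Sketch of asynchronous (sequential) updating for a binary Hopfield net:
# neurons are updated one at a time, each seeing the others' latest values.
# Weight matrix and tie-break rule are illustrative assumptions.

def recall_async(w, state, sweeps=10):
    state = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):      # fixed sequential order
            new_xi = 1 if sum(w[i][j] * state[j] for j in range(len(state))) > 0 else 0
            if new_xi != state[i]:
                state[i] = new_xi        # the change takes effect immediately
                changed = True
        if not changed:                  # a full sweep with no flips: fixed point
            return state
    return state

w = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

print(recall_async(w, [1, 0, 0]))   # -> [0, 0, 0]: a different basin than synchronous recall
```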

  11. Trajectories in the energy landscape Where does energy come in? The formalism we need to answer this question comes from the physics of spin glasses, and requires slight modifications to our notation: 1) neuron coding [0 1] → [-1 1]; 2) the threshold now becomes a "sign" function: x = 1 if input > 0, x = -1 if input < 0, x = old x if input = 0; 3) asynchronous update.

  12. From energies to MP neurons Define the energy of node i as Ei = -xi · inputi, where the input to node i is the weighted sum over all other neurons: inputi = ∑j xj wij. This is called the mean field approximation: the "magnetic field" at each node (each spin) corresponds to a weighted average over all the fields generated by all other spins. When a specific node senses this field, it wants to align with the mean field, thus reducing its energy. This is the update rule: xi ← sign(inputi). We have just re-discovered that the MP update rule exactly corresponds to magnetic field alignment in spin glasses!
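The energy bookkeeping above can be checked numerically. A minimal sketch, assuming a small random symmetric net (the size, weights and step count are arbitrary choices): with symmetric weights, zero diagonal and asynchronous sign updates, each flip can only lower (or keep) the total energy E = -(1/2) ∑ij wij xi xj.

```python
# Verify that asynchronous sign updates never increase the energy
# of a symmetric, zero-diagonal network in +/-1 spin coding.
import random

def total_energy(w, x):
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def async_sign_step(w, x, i):
    field = sum(w[i][j] * x[j] for j in range(len(x)))
    if field > 0:
        x[i] = 1
    elif field < 0:
        x[i] = -1                        # field == 0: keep the old state
    return x

random.seed(0)
n = 6
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = random.uniform(-1, 1)   # symmetric, zero diagonal

x = [random.choice([-1, 1]) for _ in range(n)]
energies = [total_energy(w, x)]
for step in range(20):
    async_sign_step(w, x, random.randrange(n))
    energies.append(total_energy(w, x))

# Energy is monotonically non-increasing along the trajectory.
assert all(e2 <= e1 + 1e-9 for e1, e2 in zip(energies, energies[1:]))
```

Each flip aligns a spin with its local field, so its energy contribution cannot rise; since the total energy is bounded below, the dynamics must settle.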

  13. Attractor Neural Nets The restrictions imposed on the recurrent net are now: • All connections are symmetric: wij = wji. • No self-connectedness: wii = 0. "Gerard Toulouse has called Hopfield's use of symmetric connections a 'clever step backward from biological realism'. The cleverness arises from the existence of an energy function."* The existence of an energy function provides us with: • A formalism of the process of memory storage and recall • A tool to visualise the activity (both learning and recall) • A straightforward way to train the net • Once trained, a guaranteed solution (recall of the correct memory).

  14. Training the Net We need to find the set of weights that encode a single pattern p of length N bits as a minimum-energy solution. The minimum energy is obtained when the output of node i exactly matches the inputs to that node. Plug in a guess for w: wij = pi pj / N (with wii = 0), so that inputi = ∑j wij pj = pi (N - 1)/N, which reproduces pi, as required.
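This single-pattern weight guess can be verified directly in the ±1 coding introduced earlier. A minimal sketch (the 8-bit pattern is an arbitrary illustrative choice): with wij = pi pj / N and wii = 0, the stored pattern is a fixed point of the sign-update rule.

```python
# Verify that the single-pattern Hebbian guess makes p a fixed point.
N = 8
p = [1, 1, -1, 1, -1, -1, 1, -1]        # an arbitrary example pattern

w = [[(p[i] * p[j] / N if i != j else 0.0) for j in range(N)] for i in range(N)]

def sign_update(w, x):
    """One synchronous sweep of the sign rule (keep old state on a tie)."""
    out = []
    for i in range(len(x)):
        field = sum(w[i][j] * x[j] for j in range(len(x)))
        out.append(1 if field > 0 else -1 if field < 0 else x[i])
    return out

assert sign_update(w, p) == p           # p sits at an energy minimum
```

Each field works out to pi (N - 1)/N, whose sign is pi, confirming the algebra on the slide. (By symmetry, the inverted pattern -p is a fixed point too.)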

  15. Training the Net (cont.) This weight assignment is remarkably reminiscent of Hebbian learning: if two nodes are spiking at the same time, then the weight connecting them is strengthened. Here, anti-correlated nodes result in negative (inhibitory) weights. For gradual learning, the weights are built up incrementally over repeated presentations: only patterns introduced repeatedly will result in the formation of new memories; noise will be ignored. Now, generalising for M memories or patterns gives the Generalised Hebb Rule: wij = (1/N) ∑μ piμ pjμ.
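The generalised Hebb rule can be sketched for several patterns at once. The two 8-bit patterns below are made-up orthogonal examples (an assumption for this sketch): each stored pattern is then a fixed point, and a one-bit-corrupted cue is repaired in a single synchronous sweep.

```python
# Generalised Hebb rule: w_ij = (1/N) * sum over patterns of p_i * p_j,
# with w_ii = 0. The two stored patterns are illustrative and orthogonal.
N = 8
patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]

w = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(N):
        if i != j:
            w[i][j] = sum(p[i] * p[j] for p in patterns) / N

def sign_update(w, x):
    """One synchronous sweep of the sign rule (keep old state on a tie)."""
    out = []
    for i in range(len(x)):
        field = sum(w[i][j] * x[j] for j in range(len(x)))
        out.append(1 if field > 0 else -1 if field < 0 else x[i])
    return out

for p in patterns:
    assert sign_update(w, p) == p       # every stored pattern is stable

noisy = list(patterns[0])
noisy[0] = -noisy[0]                    # corrupt one bit of the first memory
assert sign_update(w, noisy) == patterns[0]   # noise-free prototype recalled
```

This is exactly the associative-memory behaviour listed on slide 4: noisy input in, stored prototype out.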

  16. Does it work? This applet demonstrates: the distributed representation; the ability to perform powerful computation; high storage capacity (7 100-bit patterns in 100 neurons); high fidelity and noise tolerance; graceful degradation (for more memories). Eliminated features: dynamic inputs in training; intrinsic background activity. http://www.cs.tcd.ie/Padraig.Cunningham/applets/Hopfield/Hopfield.htm (originally from http://suhep.phy.syr.edu/courses/modules/MM/SIM/Hopfield/, no longer available).

  17. Storage Capacity How many memories can be stored in the network? To store M memories, each of length N bits, in a network of N neurons, we first ask how many stable patterns can be reached. In 1987, McEliece et al. derived an upper limit for the number of memories that can be stored accurately: M = N/(2 log N). E.g. for N = 100 neurons, M = 11 distinct memories, each 100 bits long, can be faithfully stored and recalled. To write out these 11 distinct memories would take 1100 bits! In general, the coding efficiency of the network can be summarised as 2 log N neurons per pattern (each N bits long). This enormous capacity is paid for by a potentially lengthy recall process. McEliece et al. (1987) IEEE Trans. Inf. Theor. IT-33: 461-482.
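The capacity bound quoted above is quick to evaluate. A small sketch, assuming the natural logarithm in M = N/(2 log N), which reproduces the slide's figure of roughly 11 memories for N = 100:

```python
# McEliece et al. capacity bound, M = N / (2 ln N), for the slide's example.
import math

def capacity(n):
    return n / (2 * math.log(n))

print(round(capacity(100)))   # -> 11 (the exact value is about 10.86)
```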

  18. Applications Hopfield nets have obvious applications for any problem that can be posed in terms of optimisation, in the sense of maximising or minimising some function that can be likened to an energy function. The distance matching problem: given points, match pairs of shortest length. The travelling salesman problem: given points, find the shortest path.

  19. What about the brain (pros)? Hopfield nets maintain some very attractive features from recurrent net architectures. However, the imposition of symmetric weights was a conscious move away from biological realism and toward engineering-like reliability. In contrast, Hopfield nets seem more biologically realistic in disallowing self-connected neurons. Hebbian-like learning is also a great appeal of Hopfield nets, capturing several important principles: (1) unsupervised learning; (2) natural synaptic plasticity; (3) no necessary distinction between training & testing; (4) robustness to details of the training procedure.

  20. What about the brain (cons)? While we now have dynamics in training and in recall, we might still ask: are these dynamics realistic in the brain? 1) In the memory recall stage, we consider inputs one at a time, waiting for the association to be made before proceeding to the next pattern. Is this how the brain works? 2) The aspiration of every Hopfield net is to arrive at a stable solution. Is this a realistic representation of association, or of cognition in general, in the brain? In other words, do we represent solutions to problems by relaxing to stable states of activity, or does the brain represent solutions according to very different, dynamic paradigms that handle continuous inputs and actively resist falling into the abyss of equilibrium?

  21. What about the brain? (cont.) How memories are implanted in our brains remains an exciting research question. While Hopfield nets no longer participate in this discourse, their formative role in shaping our intuition about associative memories remains admirable.

  22. Next time… • Final lecture about neural networks (for the time being). Reading • John Hopfield (1982) "Neural networks and physical systems with emergent collective computational abilities", Proc. Nat. Acad. Sci. USA 79: 2554-2558. • A highly accessible introduction to the subject, including both non-technical and technical approaches, can be found at: • www.shef.ac.uk/psychology/gurney/notes/contents.html • Some food for thought: a popular article on CNN: "Researchers: It's easy to plant false memories", CNN.com, Feb 16, 2003.
