The Principle of Presence: Enhancing Lifelong Learning in Neural Networks

The Principle of Presence: A Heuristic for Growing Knowledge Structured Neural Networks Laurent Orseau, INSA/IRISA, Rennes, France

Neural Networks • Efficient at learning single problems • Fully connected • Convergence in W^3 • Lifelong learning: • Specific cases can be important • More knowledge, more weights • Catastrophic forgetting -> Full connectivity not suitable -> Need locality

How can people learn so fast? • Focus, attention • Raw table storing? • Frog and • Car and • Running woman • With generalization

What do people memorize? (1) • 1 memory: a set of « things » • Things are made of other, simpler things • Thing=concept • Basic concept=perceptual event

What do people memorize? (2) • Remember only what is present in mind at the time of memorization: • What is seen • What is heard • What is thought • Etc.

What do people memorize? (3) • Not what is not in mind! • Too many concepts are known • What is present: • Few things • Probably important • What is absent: • Many things • Probably irrelevant • Good but not always true -> heuristic

Presence in everyday life • Easy to see what is present, harder to tell what is missing • Infants lose attention to balls that have just disappeared • The number zero was invented long after the other digits • Etc.

The principle of presence • Memorization = create a new concept from the active concepts only • Independent of the number of known concepts • Few active concepts -> few variables -> fast generalization
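A minimal sketch of the memorization step, assuming concepts are plain string names and the memory is a dictionary. The function `memorize` and the concept names are hypothetical and only illustrate the idea: the new concept is linked to the currently active concepts alone, so the work does not grow with the number of known concepts.

```python
def memorize(memory, active, name):
    """Create concept `name` as a conjunction of the currently active concepts.

    Only `active` is read, so the cost is proportional to the number of
    active concepts, not to len(memory).
    """
    memory[name] = frozenset(active)
    return memory[name]


# Hypothetical memory holding many known concepts, none of them active now.
memory = {f"known_{i}": frozenset() for i in range(10_000)}

# Only three concepts are present in mind at memorization time.
new_concept = memorize(memory, {"frog", "green", "pond"}, "frog_scene")
print(sorted(new_concept))
```

The 10,000 inactive concepts are never visited, which is the heuristic's claim in miniature.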

Implications • A concept can be active or inactive. • Activity must reflect importance, be rare ~ event (programming) • New concept = conjunction of active ones • Concepts must be re-usable (lifelong): • Re-use = create a link from this concept • 2 independent concepts = 2 units -> More symbolic than MLP: a neuron can represent too many things

Implementation: NN • Nonlinearity • Graph properties: local or global connectivity • Weights: • Smooth on-line generalization • Resistant to noise • But more symbolic: • Inactivity: piecewise continuous activation function • Knowledge not too distributed • Concepts not overlapping too much

First implementation • Inputs: basic events • Output: target concept • No macro-concept: -> 3-layer • Neuron = conjunction, unless explicit (supervised learning), -> DNF • Output weights simulate priority

Locality in learning • Only one neuron modified at a time: • Nearest = most activated • If target concept not activated when it should: • Generalize the nearest connected neuron • Add a neuron for that specific case • If target active, but not enough or too much: • Generalize the most activating neuron
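The case analysis above can be sketched as a small selection routine. This is an illustration under assumptions, not the author's implementation: the names `weighted_input` and `plan_update` are hypothetical, "nearest" is taken to mean the largest weighted sum over active inputs, and output weights are ignored.

```python
def weighted_input(weights, active):
    """Sum of the weights coming from currently active inputs."""
    return sum(w for name, w in weights.items() if name in active)


def plan_update(neurons, active, target_output, desired_output):
    """Return (neuron_to_generalize, add_new_neuron) for one example.

    neurons: dict mapping neuron name -> {input name: weight}.
    At most one existing neuron is ever selected for modification.
    """
    # Nearest neuron = most activated one among those touching an active input.
    connected = {n: weighted_input(w, active) for n, w in neurons.items()
                 if any(name in active for name in w)}
    nearest = max(connected, key=connected.get) if connected else None

    if desired_output > 0 and target_output == 0:
        # Target silent when it should fire: generalize the nearest neuron
        # and add a neuron for this specific case.
        return nearest, True
    if target_output != desired_output:
        # Target active but too weak or too strong: adjust one neuron only.
        return nearest, False
    return None, False  # prediction already correct: nothing is modified


neurons = {"N1": {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}}
decision = plan_update(neurons, {"A", "B", "D"}, target_output=0.0, desired_output=1.0)
print(decision)
```

Returning a decision rather than mutating the network keeps the locality property easy to check: no call ever names more than one existing neuron.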

Learning: example (0) • Must learn AB. • Examples: ABC, ABD, ABE, but not AB. • [Figure: inputs A, B, C, D, E, …; target AB, which already exists]

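Learning: example (1) • ABC: N1 active when A, B and C all active • [Figure: inputs A, B, C, D, E; neuron N1 with three input weights of 1/3, label 2/3 and output weight 1 to target AB; activation-function sketch with axis marks 0, 1-1/Ns, 1 and the labels Disjunction and Conjunction]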
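A hedged sketch of the conjunction neuron on this slide. It assumes that Ns is the neuron's number of inputs, that each weight starts at 1/Ns, and that the activation is piecewise linear: zero up to the threshold 1-1/Ns, then rising linearly to 1. The slide gives only the labels, so the exact activation shape and the function names are assumptions.

```python
from fractions import Fraction


def conjunction_neuron(inputs):
    """Return equal weights 1/Ns and threshold 1 - 1/Ns for Ns inputs."""
    ns = len(inputs)
    weights = {name: Fraction(1, ns) for name in inputs}
    threshold = 1 - Fraction(1, ns)
    return weights, threshold


def activation(weights, threshold, active):
    """Assumed piecewise-linear activation: 0 up to the threshold, then linear up to 1."""
    total = sum(w for name, w in weights.items() if name in active)
    if total <= threshold:
        return Fraction(0)
    return min(Fraction(1), (total - threshold) / (1 - threshold))


weights, threshold = conjunction_neuron(["A", "B", "C"])
full = activation(weights, threshold, {"A", "B", "C"})
partial = activation(weights, threshold, {"A", "B"})
print(threshold, full, partial)  # 2/3 1 0
```

With three inputs the threshold is 2/3, so N1 is fully active for ABC and silent for AB alone, which is why the neuron behaves as a conjunction before any generalization.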

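Learning: example (2) • ABD: [Figure: N1 (label 2/3, output weight 1 to AB) with its three weight labels changing from 1/3 to >1/3, >1/3 and <1/3; neuron N2 with three input weights of 1/3, label 2/3 and output weight 1]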

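Learning: example (3) • ABE: N1 slightly active for AB • [Figure: N1 (label 2/3, output weight 1 to AB) with its weight labels changing from >1/3, >1/3, <1/3 to >>1/3, >>1/3, <<1/3; N2 with three input weights of 1/3, label 2/3 and output weight 1]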

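Learning: example (4) • Final: N1 has generalized, active for AB • Useless neuron: deleted by criterion • [Figure: N1 with weight labels 1/2, 1/2 and 0, label 2/3 and output weight 1 to AB; N2 with three input weights of 1/3, label 2/3 and output weight 1]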
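The weight drift in this example can be replayed with a short sketch. The slides do not state the update rule, so the one below is an assumption chosen to reproduce the trend in the figures: each weight of N1 moves a fixed fraction of the way toward an equal share over the inputs that are currently active. The name `generalize` and the rate of 1/2 are hypothetical.

```python
from fractions import Fraction


def generalize(weights, active, rate=Fraction(1, 2)):
    """Assumed rule: move each weight toward an equal share over the active inputs.

    Weights on active inputs grow, weights on inactive inputs shrink,
    and the total stays equal to 1.
    """
    shared = [name for name in weights if name in active]
    return {
        name: w + rate * ((Fraction(1, len(shared)) if name in active else 0) - w)
        for name, w in weights.items()
    }


n1 = {name: Fraction(1, 3) for name in "ABC"}  # N1 as created for example ABC
for example in ("ABD", "ABE"):
    n1 = generalize(n1, set(example))
    print(example, {k: str(v) for k, v in n1.items()})

threshold = Fraction(2, 3)  # 1 - 1/Ns with Ns = 3
print(n1["A"] + n1["B"] > threshold)  # True: N1 now responds to AB alone
```

Under this assumed rule the weights from A and B rise above 1/3 while the weight from C falls below it, and repeated presentations approach 1/2, 1/2 and 0, the values labelled on the final slide.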

NETtalk task • TDNN: 120 neurons, 25,200 connections, 90% • Presence: 753 neurons, 6,024 connections, 74% • Then learns by heart • If input activity is reversed -> catastrophic! • Many cognitive tasks heavily biased toward the principle of presence?

Advantages w.r.t. NNs • As many inputs as wanted, only active ones are used • Lifelong learning: • Large scale networks • Learns specific cases and generalizes, both quickly • Can lower weights without wrong prediction -> imitation

But… • Few data, limiting the number of neurons: not as good as backprop • Creates many neurons (but can be deleted) • No negative weights

Work in progress • Negative case, must stay rare • Inhibitory links • Re-use of concepts • Macro-concepts: each concept can become an input
