
Memory

Computational Intelligence. Memory. Based on a course taught by Prof. Randall O'Reilly University of Colorado and Prof. Włodzisław Duch Uniwersytet Mikołaja Kopernika. Janusz A. Starzyk. Memory is any persistent effect of experience. General remarks.



Presentation Transcript


  1. Computational Intelligence Memory Based on a course taught by Prof. Randall O'Reilly University of Colorado and Prof. Włodzisław Duch Uniwersytet Mikołaja Kopernika Janusz A. Starzyk

  2. General remarks. Memory is any persistent effect of experience. Memory seems uniform, but in reality it is highly differentiated: spatial, visual, auditory, recognition, declarative, semantic, procedural, explicit, implicit … Here we examine mechanisms, so the primary division is: • Synaptic memory (physical changes in synapses): long-term, and requiring activation to have any influence on functioning. • Dynamic memory: active, temporary activations that affect current functioning. • Long-term priming, based on synaptic memory that is open to fast modification; semantic and procedural memory are the result of slow processes. • Short-term priming, based on active memory.

  3. Memory types (diagram). Labels recoverable from the figure: STM (short-term memory), working memory, LTM (long-term memory); declarative (events, facts); nondeclarative (manual skills, conditioning, priming); associated structures: parietal cortex, prefrontal cortex, limbic system, motor, emotional, nuclei, cerebellum, neocortex.

  4. 3 regions. PC – posterior parietal cortex and motor cortex: distributed representations, spatial memory, long-term priming, associations, deductions, schemas. FC – prefrontal cortex: isolated representations, control of disruption, working memory. HC – hippocampal formation: episodic memory, spatial memory, declarative memory, sparse representations, good image separation. • Slow learning of statistically relevant relationships => procedural and semantic memory (cortical); fast learning => episodic memory (HC). • Retaining active information while simultaneously accepting new information, e.g. multiplying 12*6 in your head, requires the FC.

  5. Slow/rapid learning. A neuron learns a conditional probability, the correlation between the desired activity and the input signals; the optimal value of 0.7 is reached reliably only with a small learning constant of 0.005. • Every experience is a small fragment of uncertain, potentially useful knowledge about the world => a stable image of the world requires slow learning, and integration leads to forgetting individual events. • Relevant new information is learned after a single exposure. • Lesions of the hippocampal formation cause anterograde amnesia. • The neuromodulatory system reaches a stability/plasticity compromise.
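The effect of the learning constant can be illustrated with a minimal sketch. It assumes a simple running-average update, w += lrate * (x - w), applied to an input that is active with probability 0.7; the function name, the starting weight, the number of steps and the comparison value lrate = 0.5 are illustrative assumptions, not parameters of the course simulator.

```python
import random
import statistics

def track_probability(lrate, p=0.7, steps=5000, seed=0):
    """Return the weight trajectory of w += lrate * (x - w) for Bernoulli(p) inputs."""
    rng = random.Random(seed)
    w = 0.5  # arbitrary starting weight (assumption)
    history = []
    for _ in range(steps):
        x = 1.0 if rng.random() < p else 0.0  # input is active with probability p
        w += lrate * (x - w)                  # move the weight a small step toward x
        history.append(w)
    return history

slow = track_probability(lrate=0.005)  # small constant from the slide
fast = track_probability(lrate=0.5)    # large constant, chosen for contrast

# Compare the last 1000 steps, after both runs have settled.
slow_tail, fast_tail = slow[-1000:], fast[-1000:]
print("slow: mean %.3f, spread %.3f" % (statistics.mean(slow_tail), statistics.pstdev(slow_tail)))
print("fast: mean %.3f, spread %.3f" % (statistics.mean(fast_tail), statistics.pstdev(fast_tail)))
```

With the small constant the weight hovers close to 0.7; with the large one it keeps jumping with each new event, which is the stability/plasticity trade-off described on this slide.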

  6. Complementary learning systems

  7. Active memory and priming. Distributed, overlapping representations in the PC can record information about the world efficiently, but not very precisely, and they blur with the passage of time. FC – prefrontal cortex: stores isolated representations, which increases memory stability. Priming effects are evident in people with a damaged hippocampus, so cortical priming in the PC is possible. We distinguish several forms of priming by: • duration (short-term, long-term), • type of information (visual, lexical), • similarity (repetition, semantic).

  8. Priming. Standard task, stem completion: after reading a list of words we are given a stem and must add the ending, e.g. rea---. If "reaction" was on the list earlier, it is usually chosen. The interval can be about an hour, so active memory can't be responsible for this. Homophones: read, reed. Completion: "It was found that the ...eel is on the ...", where the last word is "orange", "wagon", "shoe" or "table", is heard as "peel is on the orange", "wheel is on the wagon", "heel is on the shoe", "meal is on the table".

  9. Priming model. Project wt_priming.proj, Chapter 9 (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Wt_Priming). View Events: the first 3 have the same input images but different output images; in total 13 pairs x 2 outputs = 26 combinations, IA - IB. Note: we are not yet learning the AB-AC lists, just the effect of learning.

  10. Exploring the model. View TrainLog and evaluate the result: the similarity of the output image is summarized as a yellow line; the name of the most similar event is measured by sm_nm = binary errors in the names of the closest events, the part of the result not very similar to the given one: A  B. In blue, both_err = 1 only if the output is not one of the two acceptable output images. Noise helps to break through impasses, but it also slightly destabilizes already-learned images.

  11. Further tests. Test_logs: first we check whether there are any tendencies, and then whether we can teach the network to change its preference after the presentation of IA and then IB. Set wt_update = Test; Test does one epoch; check Trial1_TextLog: ev_nm is either IA or IB, and sm_nm is either 0 or 1, randomly. In Epoch1_TextLog we can see that the output is always one of the two results, in sum 13/26, or half the time: there is no tendency. We check whether one exposure changes anything: wt_update => On_Line (learning after every event), Run Test; the frequency increases significantly, to 18 and then 25 times. Conclusions: error reduction alone gives mixed outputs A and B; a network without kWTA won't learn this task. The parietal cortex can be responsible for long-term priming.

  12. AB-AC learning. People are able to learn two lists of word pairs, A-B and then A-C, e.g. window-mind, bike-trash, ... and then window-train, bike-cloud, without major interference, doing well on tests for both AB and AC. Networks with only error correction forget catastrophically! Interference results from using the same units and weights to learn different associations. It is necessary to use different units, or to learn with context.
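Catastrophic forgetting under pure error correction can be reproduced in a few lines. The sketch below is not the ab_ac_interference.proj model: it assumes a single-layer linear network trained with the delta rule, three hypothetical cue words, one-hot codes and two context units, all chosen only to keep the example small.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def forward(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def train(weights, pairs, lrate=0.2, epochs=200):
    """Online delta rule: w += lrate * (target - output) * input."""
    for _ in range(epochs):
        for x, target in pairs:
            out = forward(weights, x)
            for j, row in enumerate(weights):
                err = target[j] - out[j]
                for i in range(len(row)):
                    row[i] += lrate * err * x[i]

def accuracy(weights, pairs):
    """Fraction of cues whose most active output unit is the target response."""
    hits = 0
    for x, target in pairs:
        out = forward(weights, x)
        hits += out.index(max(out)) == target.index(max(target))
    return hits / len(pairs)

N = 3  # number of cue words (A items); illustrative assumption
# Input  = one-hot cue + 2 context units (list AB or list AC).
# Output = one-hot over the 2*N possible responses (B items, then C items).
ab = [(one_hot(i, N) + [1.0, 0.0], one_hot(i, 2 * N)) for i in range(N)]
ac = [(one_hot(i, N) + [0.0, 1.0], one_hot(N + i, 2 * N)) for i in range(N)]

weights = [[0.0] * (N + 2) for _ in range(2 * N)]
train(weights, ab)                     # learn list AB first
ab_before = accuracy(weights, ab)
train(weights, ac)                     # then learn list AC with the same cues
ab_after = accuracy(weights, ab)
ac_after = accuracy(weights, ac)
print(ab_before, ab_after, ac_after)
```

Because the same cue weights carry both associations, training on AC overwrites what was learned for AB even though a context input is present; in this toy setting the context units alone are too weak to protect the first list.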

  13. AB-AC model. Project ab_ac_interference.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_AB-AC_List_Learning). View Events_AB, Events_AC. Output: either B or C; the context differentiates. Replication of catastrophic interference: View Train_graph_log, red = errors, yellow = tests for AB. The test shows that after learning AC the network forgets AB: many units in the hidden layer take part in learning both lists.

  14. AB-AC model. hid_kwta 12=>4 to decrease the number of active units. Test: no change. Increase the variance of the initial weights: wt_var 0.25=>0.4. Stronger influence of context: fm_context 1=>1.5. Hebbian learning: hebb 0.01=>0.05. Decrease the learning rate: lrate => 0.1, Batch. Nothing here clearly helps, but the catastrophes become less likely... Two learning systems are clearly necessary, a fast one and a slow one: cortex and hippocampus.
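What a change such as hid_kwta 12=>4 does to a layer can be sketched as follows. This is a simplified, hard k-winners-take-all that returns a binary pattern; the simulator's own kWTA computes an inhibition level rather than binarizing activity, so the function and the example net inputs here are assumptions for illustration only.

```python
def kwta(net_inputs, k):
    """Return a binary pattern in which exactly the k most excited units are active."""
    ranked = sorted(range(len(net_inputs)), key=lambda i: net_inputs[i], reverse=True)
    winners = set(ranked[:k])
    return [1 if i in winners else 0 for i in range(len(net_inputs))]

hidden_input = [0.1, 0.9, 0.5, 0.7, 0.3, 0.8]  # hypothetical net inputs of six hidden units
print(kwta(hidden_input, 4))  # less sparse code: four winners
print(kwta(hidden_input, 2))  # sparser code: two winners
```

Lowering k leaves fewer units active per pattern, so two different associations are less likely to recruit the same hidden units.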

  15. Hippocampus. Anatomy and connections of the structures of the hippocampal formation: signals arrive from uni- and multimodal association areas through the entorhinal cortex (EC).

  16. More anatomy. Hippocampus = king of the cortex. Bidirectional connections with the entorhinal cortex: olfactory bulb, cingulate cortex, superior temporal gyrus (STG), insula, orbitofrontal cortex.

  17. More anatomy. Sparse activation: representations in CA3 and CA1 are focused on specific stimuli, while in the subiculum and the entorhinal cortex they are strongly distributed.

  18. Hippocampal formation. The model contains the structures: dentate gyrus (DG), areas CA1 and CA3, entorhinal cortex (EC). Pct Act = % of activation.

  19. Separation and conjunction of images. • The hippocampus rapidly associates various representations of the cortex. • It creates episodic memory. • It completes activations recreated from memory and separates them into clearly distinct meanings. • Sparse encoding eases the separation of meanings. CA1 separates by conjunction of images (representations). It is also able to recreate the original activation from the EC through reversible connections.

  20. Model of the hippocampus. Project hip.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Hippocampus). Input signals enter through the entorhinal cortex (EC_in) and go to the dentate gyrus (DG) and area CA3; DG also influences CA3, where received signals can be completed through associations. CA3 has strong internal connections. CA1 has more distributed sparse representations => EC_out. EC: 144 elements = 4*36, 1 of 4 active. DG: 625 elements. CA3: 240 elements. CA1: 384 elements = 12 columns * 32 elements.
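Why sparse layers make separation easier can be shown with a short calculation. For two independent random patterns with k of n units active, the expected number of shared active units is k*k/n. The EC figures below come from the slide; the activity level used for DG (6 of 625 units, about 1%) is an assumption, since the percentages from the original figure are not preserved in this transcript.

```python
def chance_overlap(n_units, n_active):
    """Expected number of shared active units between two independent random
    patterns, each with n_active of n_units active."""
    return n_active * n_active / n_units

# EC from the slide: 144 units, 1 of 4 active -> 36 active units.
ec_active = 36
ec_shared = chance_overlap(144, ec_active)

# DG has 625 units on the slide; the ~1% activity level is an assumption.
dg_active = 6
dg_shared = chance_overlap(625, dg_active)

print(ec_shared / ec_active)  # fraction of an EC pattern shared by chance
print(dg_shared / dg_active)  # fraction of a DG pattern shared by chance
```

Under these assumptions a quarter of an EC pattern overlaps with any other pattern by chance, while a DG pattern shares only about one percent of its units, which is what lets the hippocampus keep similar episodes apart.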

  21. Exploration of the hippocampal model. Learning of AB – AC associations without interference. Autoassociation: EC_in = EC_out, reversible transformations. BuildNet; View_Train_Trial_Log will show the statistics. The input includes information about the input and output images and the list. StepTrain: units chosen in the previous step have white outlines. Partial overlap of images in EC_in, DG, CA3, CA1. Training epoch: 10 list elements + 3 test sets: AB, AC, new. View Test_Logs => text and graph log. train_updt = no_updt to the test log; Run will do 3 epochs; the results are in Text_log: 70% remembered from the AB list and 100% from the AC list. Set test_updt = no_updt and the network will finish 3 training/test epochs more rapidly. Test analysis: test_updt = Cycle_updt, Clear Trial1_1_Text_log, StepTest: we see only A + context, and we see how the image completes.

  22. Further exploration. Targ in Network shows which image was learned, act  targ. In TextLog, stim_er_on = proportion of units erroneously activated in EC_out, stim_er_off = proportion erroneously not activated in EC_out. In Trial_1_GraphLog we can see these two numbers after every test: for known images they are small (correct memories), for new ones they are large, on ~0.5 and off ~0.8; the network rarely fails. To move to list AC we set Test_updt = Trial_updt (or no_updt) and StepTest until epc_ctrl changes to 1 in text_log. These are events for list AC: the network does not recognize them (rmbr=0) because it hasn't learned them yet. Train_Epcs=5, train_env=Train_AC, Run and check the results.

  23. Summary. The hippocampal model can rapidly and sequentially learn the associations AB – AC without excessive interference. For this it was sufficient to use the Hebbian contrast rule, CPCA, and the correct architecture. Interference results from using the same units; in CA3, identical images (representations) learned in another context become separated. Separation of images doesn't allow associations, inferences based on similarity, or efficient encoding of multidimensional information. The conjunction of images happens in CA1. This suggests a complementary role of the hippocampus, supplementing the slow learning mechanisms of the cortex. The hippocampus can remember episodes that help in spatial orientation, and can create conjunctive representations connecting different stimuli together more quickly than the cortex.

  24. Memory. Memory is not uniform: weights (long-term, require activation) vs activations (short-term, already active, can influence processing). Based on weights: the cortex shows priming but suffers from catastrophic interference; the hippocampus can learn fast without interference, using sparse distributed representations of images. Based on activation: the cortex shows priming but isn't good for short-term memory. Cooperation of activation-based and weight-based memory. Video: short-term memory in chimpanzees – 30 sec. Comparison with students – 30 sec.

  25. Active short-term memory. Short-term priming: attention and influence on reaction speed. Apart from duration, the memory content and the effects of similarity are like those of long-term priming. Project act_priming.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Act_Priming). Stem completion or homophones, but without learning, only the influence of what remains after the last activation. The network has learned the series IA-IB. The test has a series of images and results A and B: we show it A at the output and the network responds A; now we show the image for B, but only the – phase is turned on (no learning), and the network's result is sometimes A, sometimes B. LoadNet, View TestLogs, Test. The correlation with previous results A and B depends on the speed at which activation fades; check the effect of act_decay 1 => 0, tendency to leaving a. Analyze the influence on the results in test_log.

  26. Active maintenance. Project act_maint.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Active_Maintenance): active maintenance of information in working memory despite interference; quickly accessible, doesn't require synaptic changes. Recurrence is necessary: an attractor network with a large basin of attraction, resistant to noise. Video – remembering with delay – 30 sec. The processes that analyse environmental data don't require such networks, because they are driven by incoming information. Activation should be diverse, enabling associations and inferences; while we have external signals this suffices, e.g. if we note the results of intermediate operations on paper. Without external activation, we have to rely on actively maintained representations in working memory, which has serious limits (Miller's famous 7±2, and even 4±2 for complex objects). First a model without attractors, which requires external signals; then distributed representations with shallow attractors, not very resistant to noise; finally deep but localised attractors, which preclude associations.
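The need for recurrence can be illustrated with a single rate-coded unit that feeds back onto itself. This is a toy sketch, not the act_maint.proj network: the update rule, the threshold of 0.5 and the two recurrent weights are assumptions chosen so that one setting gives a stable "on" state and the other does not.

```python
def run_unit(rec_weight, inputs, threshold=0.5):
    """Rate-coded unit: act(t+1) = clip(rec_weight * act(t) + input(t) - threshold, 0, 1)."""
    act, trace = 0.0, []
    for ext in inputs:
        act = min(1.0, max(0.0, rec_weight * act + ext - threshold))
        trace.append(act)
    return trace

stimulus = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # input present for 2 steps, then removed
weak = run_unit(0.5, stimulus)    # weak self-excitation: activity dies with the input
strong = run_unit(2.0, stimulus)  # strong self-excitation: activity is maintained
print(weak)
print(strong)
```

With strong recurrence the unit has two stable states (off and on) and stays on after the stimulus is gone, which is the attractor behaviour required for active maintenance; with weak recurrence the activity is only a reflection of the current input.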

  27. Maintenance model. Project act_maint.proj: 3 objects, 3 elements (features). r.wt, View Grid_log, Run: while there is input, activation is maintained, but after the input is removed it disperses (the network blurs...). Check the influence of wt_mean = 0.5, wt_var = 0.1, 0.25, 0.4. Net_Type Higher_order: we add combinations of feature pairs. Defaults, Run; add noise_var = 0.01 and the network forgets...

  28. Isolated representations. Default to return to the initial parameters. network = IsolatedNet: no connections between hidden units, but there is recurrence, so activation doesn't fade. Noise = 0.01 doesn't interfere, but with 0.02 maintenance sometimes breaks down. Is it worth learning to focus in spite of noise? A different task: does stimulus S(t) = S(t+2)? Parameters: input_data = MaintUpdateEnv, network Isolated, noise 0.01. Init, Run: there are two inputs, Input 1 and 2; wt_scale 1=>2 changes the strength of local connections. The network can be switched from fast updating to long-lasting maintenance. How can this be done automatically? Dopamine and dynamic regulation of reward in the PFC.

  29. Working memory. The prefrontal cortex plays the central role in maintaining active working memory and has the desired properties: isolated, self-activating attractor networks with extensive basins. Neuroanatomy, PFC connections and microcolumns => a specialized area for active memory. • A. PR – spatial. • B. PR - spatial, self-ordered tasks. • C. PR - spatial, object and verbal, self-ordered tasks and analytical thinking. • D. PR - objects, analytical thinking. Typical experiments require delayed choice and show the differences between PC and IT, which have only temporary stimulus representations, and PFC, which maintains them longer.

  30. Role of dopamine. Blocking dopamine has a negative influence on working memory, and enhancing it has a positive influence. Dopamine (DA) arrives from the VTA (ventral tegmental area). DA strengthens internal activations, regulating access to working memory. VTA displays such increased activity. TD – temporal difference learning in RL. The basal ganglia can also regulate PFC activity.

  31. Working memory. Project pfc_maint_updt.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_PFC_Maint_Updt). A dynamic "gate" (AC) is added to a recurrent network, with learning based on temporal differences (TD). Inputs: A, B, C, D; Ignore, Store, Recall decide what to do with them. PFC is working memory; AC = adaptive critic, a reward system (dopamine) controlling the updating of information in the PFC; the hidden layer represents the parietal cortex; hidden 2 maps to the output (frontal cortex). AC learns to predict the next reward, modulating the strength of internal PFC connections.
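How a critic can learn to predict an upcoming reward from temporal differences is sketched below with tabular TD(0). It assumes a hypothetical three-step chain with a reward of 1 on the final step, a discount factor of 0.9 and a learning rate of 0.1; none of these values, nor the function name, come from pfc_maint_updt.proj.

```python
def td_value_learning(n_states=3, gamma=0.9, alpha=0.1, episodes=500):
    """TD(0) on a chain s0 -> s1 -> ... -> end, with reward 1 on the last step."""
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            last = s == n_states - 1
            reward = 1.0 if last else 0.0
            next_value = 0.0 if last else values[s + 1]
            delta = reward + gamma * next_value - values[s]  # TD error (the "dopamine" signal)
            values[s] += alpha * delta
    return values

values = td_value_learning()
print([round(v, 3) for v in values])
```

The prediction first becomes accurate for the state just before the reward and then propagates backwards to earlier states, so that earlier events come to signal the reward in advance.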

  32. PFC model. r.wt: one-to-one connections between the input, the hidden layers and the PFC. AC has connections with the hidden layer and the PFC, but the reverse connections AC => PFC serve only to modulate. Act, Step: we observe phases – and +; at first the activation of PFC and AC is zero; there are two + steps, the first to change PFC weights and the second to set the correct signal propagation. When signal R (Recall) appears, the network will not act correctly at first, and the reward in AC is 0. At first the network doesn't know what's going on, learning only on Store, Ignore (hidden layer 2), but sometimes noise in the PFC will cause the correct result and a reward to appear. View Epoch_log, observe the change in the weights of unit AC, r.wt. The weights S => AC should increase and the error will decrease; the yellow line is the number of incorrect predictions of AC. View Grid_log, Clear, act, Step. Store introduces data to the PFC, but Ignore doesn't. After Recall, the PFC is zeroed.
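The behaviour that the trained network should reach (Store loads the PFC, Ignore leaves it untouched, Recall reads it out and clears it) can be written as a small state machine. This sketch hard-codes the gating policy that the model has to learn from reward; the function name and the event encoding are assumptions made for the example.

```python
def run_gate(events):
    """events: list of (control, item); control is 'Store', 'Ignore' or 'Recall'."""
    pfc = None      # content currently maintained in working memory
    outputs = []
    for control, item in events:
        if control == "Store":
            pfc = item              # gate open: the new input replaces the PFC content
            outputs.append(None)
        elif control == "Ignore":
            outputs.append(None)    # gate closed: the PFC content is protected
        elif control == "Recall":
            outputs.append(pfc)     # report what was maintained
            pfc = None              # the PFC is zeroed after Recall
        else:
            raise ValueError("unknown control signal: %r" % control)
    return outputs, pfc

outputs, final_pfc = run_gate(
    [("Store", "A"), ("Ignore", "B"), ("Ignore", "C"), ("Recall", None)]
)
print(outputs, final_pfc)
```

The stored item A survives the two ignored inputs and is reported on Recall, after which working memory is empty again.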

  33. A-not-B. Interactions between active and synaptic memory: the weights have already changed but active memory is in a different state; which wins? These interactions are visible in the developing brains of children at ~8 months (Piaget 1954); experiments have also been done on animals. A toy (food) is hidden in box A, and after a short delay the child (animal) can retrieve it from there. After several repetitions in A, the toy is hidden in box B; the children keep looking in A. Active memory doesn't work in children as efficiently as synaptic memory; lesions in the area of the prefrontal cortex cause similar effects in adult and infant rhesus monkeys. Children make fewer errors when looking toward the place where the toy was hidden than when reaching for it. There are many interesting variants of this type of experiment, and explanations on different levels.

  34. Project A-not-B. Decision-making process model: we know that information about place and about objects is processed separately, so this information is given on input: place A, B, C; toy T1 or T2; and cover C1 or C2. Synaptic memory is realized with standard CPCA Hebbian learning, and active memory as bidirectional connections between representations in the hidden layer. Output layers: decisions about the direction of looking and reaching. The direction of looking is activated during every experience; reaching is activated less often, only after the whole set-up is moved toward the child, so these connections learn more weakly. Initial tendency: agreement of looking and reaching on A (weight 0.7). All inputs are connected with hidden neurons, weight 0.3. Project a_not_b.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_A_Not_B).

  35. Experiment 1. rect_ws = 0.3 sets the strength of recurrent activations in the hidden layer (working memory); changing this parameter simulates a child's development. View Events: 3 types of events, initial showing 4x, then A 2x, then B 1x. An event has 4 temporal segments: 1) start, pretrial: boxes covered; 2) presentation: toy hidden in A; 3) expectation: toy in A; 4) choice: possible reaching. Only visible elements are active. View Grid_log; Run performs the entire experiment and turns off the display. ViewPre shows on Grid_log that A is activated; ViewA shows the A tests, after learning; ViewB shows the B tests: the network makes an error.

  36. Further experiments. Activation in the hidden layer flows toward the representation associated with A. rect_ws 0.3 => 0.75 for a more mature child. Run, ViewB. Although synaptic memory didn't change, more efficient working memory enables the correct action. Try rect_ws = 0.47 and 0.50. What happens? There is no activity: hesitation? The results depend on the length of the delay; with a shorter delay there are fewer errors. Delay 3=>1. Run the tests for rect_ws = 0.47 and 0.50. What happens with a very young child? rect_ws = 0.15, delay = 3: weak recurrence, weak learning for A.

  37. Other types of memory. The traditional approach to memory assumes functional, cognitive, monolithic, canonical representations in memory. Modeling shows that many interacting systems are responsible for memory, with different characteristics, variable representations and types of information. Recognition memory: was an element of the list seen earlier? A "recognition" signal is enough; recall is not necessary. A hippocampus model is also useful here: it allows recall, but this is more than is needed; in recognition memory the central role seems to be played by the perirhinal cortex. Cued recall: completion of missing information. Free recall: effects of position on the list (best at the beginning and the end), as well as grouping (chunking) of information.

  38. Learning categories. Categorization in psychology: many theories. Classic experiments: Shepard et al. (1961), Nosofsky et al. (1994). Problems of increasing complexity: division into categories C1, C2; 3 binary properties: color (black/white), size (small/large), shape (,). Type I: one property defines the category. Type II: two properties, XOR, e.g. Cat A: (black, large) or (white, small), any shape. Type III-V: one property + increasingly more exceptions. Type VI: no rule, enumeration. Difficulty and speed of learning: Type I < II < III ~ IV ~ V < VI.
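The first two problem types can be written out as explicit rules over the eight stimuli. In this sketch the choice of color as the defining property for Type I is an assumption, and "shape1" and "shape2" are placeholders, because the two shape symbols are not preserved in the transcript.

```python
from itertools import product

def type_one(color, size, shape):
    """Type I: a single property (here color, as an assumption) defines category A."""
    return color == "black"

def type_two(color, size, shape):
    """Type II (XOR): (black, large) or (white, small), any shape."""
    return (color == "black") == (size == "large")

# All 2 x 2 x 2 = 8 stimuli; the shape names are placeholders.
stimuli = list(product(["black", "white"], ["small", "large"], ["shape1", "shape2"]))

for rule in (type_one, type_two):
    members = [s for s in stimuli if rule(*s)]
    print(rule.__name__, len(members), members)
```

Both rules put four of the eight stimuli in category A, but Type II cannot be decided from any single property, which is consistent with it being harder to learn than Type I.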

  39. Canonical dynamics. What happens in the brain while learning category definitions based on examples? Complex neurodynamics <=> the simplest (canonical) dynamics. For all logical rules, we can write corresponding equations. For type II problems, or XOR: (equation and feature-space figure not preserved in the transcript).

  40. Against majority. List: diseases C or R, symptoms PC, PR, I. Disease C is associated with symptoms (PC, I), disease R with (PR, I); C occurs 3 times more often than R. (PC, I) => C, PC => C, I => C. Predictions "against majority" (Medin, Edelson 1988): although PC + I + PR => C (60%), PC + PR => R (60%). Neurodynamic basins of attraction? PDF in areas {C, R, I, PC, PR}. Psychological interpretation (Kruschke 1996): PR is significant because it is a distinguishing symptom, although PC is more common. Activation PR + PC more often leads to result R because the gradient in direction R is greater.

  41. Learning (comparison table; column headings: Point of view, Neurodynamics, Psychology).

  42. Testing (comparison table; column headings: Point of view, Neurodynamics, Psychology).

  43. Summary • Knowledge formed in memory is built, dynamic, continuous, emergent. • Behavior and inhibition of knowledge are the result of dynamic information processing rather than of interaction structures set at the top. • Recognition is based on the ability to differentiate earlier-learned activations from new, unknown activations. • The hippocampus ensures high-quality recognition with a high threshold guaranteeing association of earlier-learned activations. • Priming contributes to the slow building of invariant representations. • Two learning mechanisms: • based on connection weights, • based on neuron activation.

  44. Summary • The cortex helps recognition by priming. • The cortex leads to unstimulated associations. • The cortex is responsible for working memory, cooperating with the hippocampus. • Sequences of grouped representations are stored in long-term memory. • Memory based on activation requires combining rapid updating with stable representations. • The hippocampus uses sparse distributed representations for fast learning without mixing ideas. • Priming memory can be long-term (based on weights) or short-term (based on activation).
