
A Bayesian Approach to the Reading Process: From Networks to Human Data



Presentation Transcript


  1. A Bayesian Approach to the Reading Process: From Networks to Human Data. David A. Medler, Center for the Neural Basis of Cognition, Carnegie Mellon University

  2. Bayesian Connections • The Bayesian Approach to Cognitive Neuroscience • How do we represent the world? • Bayesian Connectionist Framework • Bayesian Generative Networks • Learning letters • How does context affect learning? • Empirical and Simulation Results • Symmetric Diffusion Networks • The Ambiguity Advantage/Disadvantage • Closing Remarks

  3. Representing the World [diagram: external data P(D) mapped to internal hypotheses P(H)] • Problem: how do we form meaningful internal representations, P(H), given our observations of the external world, P(D)?

  4. Bayesian Theory • For a given hypothesis, H, and observed data, D, the posterior probability of H given D is computed as: P(H | D) = P(D | H) P(H) / P(D), where • P(H) = prior probability of the hypothesis, H • P(D) = probability of the data, D • P(D | H) = probability of the data, D, given the hypothesis, H

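As a concrete illustration, here is a minimal numeric example of this update in Python; the two hypotheses and all probability values are invented for the example and are not from the talk.

```python
# Bayes' rule on a toy problem: two hypotheses about which letter
# produced an observed feature. All numbers here are invented.
prior = {"E": 0.6, "F": 0.4}          # P(H): prior over hypotheses
likelihood = {"E": 0.9, "F": 0.3}     # P(D | H): probability of the
                                      # observed feature under each letter

# P(D): total probability of the data, marginalizing over hypotheses.
p_data = sum(likelihood[h] * prior[h] for h in prior)

# P(H | D) = P(D | H) * P(H) / P(D)
posterior = {h: likelihood[h] * prior[h] / p_data for h in prior}
print(posterior)  # {'E': 0.818..., 'F': 0.181...}
```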

  6–8. Bayesian Connectionism [diagram, built up over three slides: Representation Layer P(H), Mediating Layer, Surface Layer P(D)]
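A minimal sketch of the top-down generative pass this architecture suggests. The slides show only the three layers; the layer sizes, sigmoid units, and random weights below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes; the slides do not specify them.
n_rep, n_med, n_surf = 8, 12, 16

# Top-down generative weights, randomly initialized for illustration.
W_rep_med = rng.normal(0.0, 0.5, size=(n_med, n_rep))
W_med_surf = rng.normal(0.0, 0.5, size=(n_surf, n_med))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One generative pass: a sparse hypothesis state P(H) at the
# representation layer drives the mediating layer, which in turn
# generates expected data P(D) at the surface layer.
h = rng.binomial(1, 0.2, size=n_rep)   # sampled representation state
med = sigmoid(W_rep_med @ h)           # mediating-layer activity
surf = sigmoid(W_med_surf @ med)       # generated surface pattern
```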

  9. It was 20 years ago today... An Interactive Activation Model of Context Effects in Letter Perception James L. McClelland & David E. Rumelhart (1981; 1982) • Word superiority effect • words > pseudowords > nonwords • The model accounted for the time course of perceptual identification.

  10–11. Interactive Activation Model [diagram: Word Level, Letter Level, Feature Level]

  12. 20 Years Later... • The Interactive Activation (IA) Model has been influential. • Many positives, but also 20 years of criticisms. • Internal representations are hard-coded: the Interactive Activation Model does not learn!

  13. Bayesian Connections • The Bayesian Approach to Cognitive Neuroscience • How do we represent the world? • Bayesian Connectionist Framework • Bayesian Generative Networks • Learning letters • How does context affect learning? • Empirical and Simulation Results • Symmetric Diffusion Networks • The Ambiguity Advantage/Disadvantage • Closing Remarks

  14. Bayesian Generative Networks • Our initial work is an expansion of the Bayesian Generative Network framework of Lewicki & Sejnowski (1997). • It is an unsupervised learning paradigm for multi-layered architectures. • We simplified the network equations, added sparse-coding constraints, and included a "supervised" component.

  15–16. Bayesian Generative Networks [diagram: Representation Layer P(H), Mediating Layer, Surface Layer P(D)]

  17. Sparse Coding Constraints • We modified the basic framework to include "sparse coding" constraints. • These act as a Bayesian prior that constrains the types of representations learned. • Sparse coding encourages the network to represent any given input pattern with relatively few active units (see the sketch below).
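The talk does not give the exact form of the sparse-coding prior; a simple common version penalizes representation-layer activity whose mean deviates from a low target rate. A hedged sketch:

```python
import numpy as np

def sparse_penalty(activations, target_rate=0.1, weight=0.1):
    """One simple sparse-coding prior (the talk's exact form is not
    given): penalize mean activation that deviates from a low target
    rate, so each input is represented by relatively few active units."""
    rate = np.mean(activations)
    return weight * (rate - target_rate) ** 2

# During training this term would be added to the network's objective:
#   loss = reconstruction_error + sparse_penalty(representation_activity)
```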

  18. Step 1: Learning the Alphabet • First stage of the IA model is the mapping between features and letters. • We use the Rumelhart & Siple (1974) character features.
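In this scheme, each character reduces to a binary vector over the 16 line segments. The sketch below shows the encoding; the segment indices are hypothetical, since the actual Rumelhart & Siple segment numbering is not reproduced here.

```python
import numpy as np

N_SEGMENTS = 16  # the Rumelhart & Siple font has 16 line segments

def make_character(active_segments):
    """Encode a character as a 16-dimensional binary feature vector,
    with 1s marking the line segments that are drawn."""
    v = np.zeros(N_SEGMENTS)
    v[list(active_segments)] = 1.0
    return v

# A made-up character lighting six of the sixteen segments
# (the segment numbering here is illustrative only).
example = make_character({0, 1, 4, 8, 12, 13})
```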

  19. Network Learning • 16 surface units (corresponding to 16 line segments) • 30 representation units • Trained for 50 epochs (evaluated at 1, 10, 25 & 50) • Evaluated: • Generative capability of the network • Internal representations formed

  20. Generating the Alphabet

  21–22. Interpreting Weight Structure [figures]

  23–26. Network Weights [figures: weight visualizations for representation units 1–30 at epochs 1, 10, 25, and 50, with and without sparse coding]

  27. What We Have Learned • In the unsupervised framework, the Bayesian Generative Network is able to learn the alphabet. • The representations are not necessarily the same as the IA model's: • distributed (not localist) • redundant (features are coded several times) • Having learned the letters, can we now learn words?

  28. Step 2: Learning Words • The second stage of the IA model is the mapping from letters to words. • We are interested in how the Bayesian framework accounts for the development of contextual regularities (i.e., letters within words). • We also look at participants' learning of context.

  29. Experimental Motivation [figure: Reicher-Wheeler trial sequence: fixation (+), a briefly presented string (KQZW, READ, or GLUR), then a forced choice between letter probes such as -E-- vs -O--, --Z- vs --S-, and ---R vs ---P] • Our motivation for the current experiments is the word-superiority effect. • Specifically, we draw inspiration from the Reicher-Wheeler paradigm.

  30. The Task • The current set of studies was designed to simulate how the word-superiority effect may develop. Specifically, we were interested in: • the learning of novel, letter-like stimuli • whether stimuli were learned in parts or wholes • the effects of context on learning • Consequently, we created an artificial environment in which we tightly controlled context.

  31. Experimental Design: Training [figure: two character families, A and B, each defined over three positions (p1, p2, p3), with characters a–f and g–l and probe options o1 and o2] • The Reicher-Wheeler task is based on the discrimination between two characters. • We wanted a similar task in which context would interact with a character pair.

  32. Experimental Design: Testing [figure: example character triplets for each condition] • 16 base stimuli; the task is to detect a change. Testing used 288 stimuli: • 96 Familiar stimuli: all three characters from one trained family (AAA or BBB) • 96 Crossed stimuli: one character drawn from the other trained family (BAA or ABB) • 96 Novel stimuli: one never-trained character (CAA or CBB)
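A hedged reconstruction of the condition logic in Python; the character names and random sampling are placeholders (the actual stimuli were presumably counterbalanced rather than sampled at random).

```python
import random

# Families of trained (A, B) and never-trained (C) characters;
# the character names here are placeholders.
FAMILIES = {
    "A": ["a", "b", "c", "d", "e", "f"],
    "B": ["g", "h", "i", "j", "k", "l"],
    "C": ["m", "n", "o", "p", "q", "r"],  # novel characters
}

# Triplet patterns defining the three test conditions.
CONDITIONS = {
    "familiar": ["AAA", "BBB"],  # all characters from one trained family
    "crossed":  ["BAA", "ABB"],  # one character crosses families
    "novel":    ["CAA", "CBB"],  # one never-trained character
}

def sample_stimulus(pattern):
    """Draw one character per position from the family named by pattern."""
    return tuple(random.choice(FAMILIES[fam]) for fam in pattern)

# 96 stimuli per condition, as in the experiment (2 patterns x 48 draws).
test_set = {cond: [sample_stimulus(pat) for pat in patterns * 48]
            for cond, patterns in CONDITIONS.items()}
```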

  33. Stimuli [figure: the character sets for families A and B over positions p1–p3, with probe options o1 and o2] • Characters were constructed from the Rumelhart & Siple features. Each character had six line segments, with the following constraints: • characters were continuous • no two segments formed a straight line • no character was a mirror image or rotation of another

  34. Initial Simulations [diagram: network with a representation layer P(H) of 16 units, a mediating layer of 18 units, and a surface layer P(D) of 48 units spanning Characters 1–3]

  35. Initial Simulations [same network diagram] • Performance was measured by computing a "differentiation value" based on the difference between the generated surface-layer representation (Gi) and the target representation (Ti).
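The slide's differentiation formula was shown as an image and is not reproduced in the transcript; one plausible form, assuming a mean squared difference between generated and target surface values, would be:

```python
import numpy as np

def differentiation_value(generated, target):
    """A plausible 'differentiation value' (the slide's exact formula
    is not available): mean squared difference between the generated
    surface pattern G_i and the target pattern T_i."""
    g, t = np.asarray(generated, float), np.asarray(target, float)
    return np.mean((g - t) ** 2)
```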

  36. Initial Simulation Results

  37. Simulation Conclusions • Regardless of the network architecture, all simulations showed a (slight) difference between the familiar and crossed stimuli. • No simulation performed well on the novel stimuli in comparison to the other stimuli. • These results are somewhat counter to what we expected. • Is the model broken? • How do participants perform on this task?

  38. Stimulus Presentation [figure: trial timing with intervals of 50 ms, 500 ms, 250 ms, 200 ms, 250 ms, and 200 ms]

  39. Stimulus Presentation

  40. Data Analysis

                        Detect Change?
                        "Yes"           "No"
    Stimuli Differ      Hit             Miss
    Stimuli Same        False Alarm     Correct Rejection

  • Each participant's reaction time and proportion of "hits" and "correct rejections" were recorded. • To correct for potential response biases, the scores were converted to d' scores using: d' = z(Hit) + z(CR), where z is the inverse of the standard normal CDF (equivalently, d' = z(Hit) - z(FA)).
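A short Python sketch of this bias correction, using the inverse normal CDF from scipy; extreme rates (0 or 1) would need the usual correction before transforming.

```python
from scipy.stats import norm

def d_prime(hit_rate, cr_rate):
    """d' from hit and correct-rejection rates: the sum of their
    inverse-normal (z) transforms, equivalent to z(Hit) - z(FA)."""
    return norm.ppf(hit_rate) + norm.ppf(cr_rate)

print(d_prime(0.9, 0.8))  # ~2.12 for 90% hits, 80% correct rejections
```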

  41. Experiment 1: One Novel • 4 Participants, 10 days each • 1440 trials per day: • 288 test trials intermixed with 1152 training trials. • Three conditions: • Familiar (AAA or BBB) • Crossed (BAA or ABB) • Novel (CAA or CBB)

  42. d’ Scores

  43. Reporting Changes

  44. Reaction Times

  45. Experiment Conclusions • Although there is a context effect, it is not as large as we expected, nor as stable. • There are no significant differences in reaction times for any of the conditions. • Participants do not perform well in the Novel condition; this is due to a tendency to respond "Change" to all novel stimuli.

  46. Re-Simulation of Task • The network was trained on the same data set that the participants were trained on. • The network learned on all training/testing trials. • We wanted a similar measure of network performance, and used a variant of the Kullback-Leibler divergence measure (the standard form is sketched below).
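The specific variant used is not reproduced in the transcript; for reference, the standard Kullback-Leibler divergence between a target distribution and a generated one looks like this:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Standard KL divergence D(p || q) between two distributions;
    the talk used an unspecified variant of this measure."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```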

  47. Simulation: Difference Measure

  48. Simulation: Report Change?

  49. Internal Representations, Training "Day": 1 [figure: weight visualizations for representation units 1–18] • If we look at the internal representations formed by the network, we get an idea of why it behaves as it does...

  50. Internal Representations, Training "Day": 6 [figure: weight visualizations for representation units 1–18]
