
On-Line Learning of Predictive Compositional Hierarchies by Hebbian Chunking



Presentation Transcript


  1. On-Line Learning of Predictive Compositional Hierarchies by Hebbian Chunking • Karl Pfleger • this research: Computer Science Department, Stanford University • kpfleger@cs.stanford.edu • ksl.stanford.edu/~kpfleger • now: Google • kpfleger@google.com

  2. Abstract • this talk is about compositional models, not compositionality • a compositional hierarchy (CH) is a part-whole hierarchy • predictive CHs are sensitive to statistical properties of the environment and can predict unseen data • this talk discusses on-line learning of such structures • goal: scale automatically from low-level data to higher-level representations • approach: identify frequent patterns in the data, enabling the future (hierarchical) discovery of even larger patterns

  3. Outline • Predictive compositional hierarchies • Learning compositional hierarchies • Boltzmann machine embodiment • Hebbian chunking • Key learning adjectives: unsupervised, on-line, data-driven, bottom-up, cumulative • again: not about how to represent transient compositions in one shot (e.g., sentence recognition or production), but about which compositions, demonstrated strongly by the environment, are worth learning

  4. Context and Knowledge • context allows lateral inference (fill in the blank, resolve the ambiguity, predict the future/past) • this requires knowledge of common patterns

  5. An Example Compositional Hierarchy • multiple edges can link the same parent and child
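
A minimal Python sketch of such a hierarchy (names and structure are illustrative, not the thesis's actual data structures): each chunk lists its parts in order, and the same child may appear under a parent more than once, i.e., several part-whole edges between the same pair of nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One node in a compositional hierarchy: a pattern and its parts."""
    name: str
    parts: list = field(default_factory=list)  # ordered child Chunks; repeats allowed

    def expand(self) -> str:
        """Recursively expand the chunk down to primitive symbols."""
        if not self.parts:          # a primitive has no parts
            return self.name
        return "".join(p.expand() for p in self.parts)

# Primitives and chunks; note 'o' appears twice under "oo":
# multiple part-whole edges may link the same parent and child.
n, o = Chunk("n"), Chunk("o")
oo = Chunk("oo", [o, o])
noon = Chunk("noon", [n, oo, n])
print(noon.expand())  # -> "noon"
```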

  6. Neural Nets and Graphical Models • symmetric recurrent neural networks and graphical models can make general predictions as required • CHs can be encoded into network structure • activation flows along part-whole links • prior work did not learn weights or structure (e.g., IAM)
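
A toy sketch of IAM-style activation flowing along part-whole links (the weights and update rule here are assumptions for illustration, not the original model's parameters): evidence for letters excites the words that contain them, and word activity feeds support back down to constituent letters.

```python
import numpy as np

letters = ["t", "h", "e", "a"]
words = ["the", "tea", "hat"]

# Part-whole links: W[w][l] = 1 if letter l is a part of word w (toy weights).
W = np.array([[1.0 if l in w else 0.0 for l in letters] for w in words])

letter_act = np.array([0.9, 0.8, 0.3, 0.0])  # bottom-up evidence: strong 't', 'h'
word_act = np.zeros(len(words))

for _ in range(20):                      # settle by passing activation both ways
    word_in = W @ letter_act             # letters excite the words they compose
    word_act += 0.1 * (word_in - word_act)
    letter_in = W.T @ word_act           # words feed support back to their letters
    letter_act += 0.1 * (letter_in - letter_act)
    np.clip(word_act, 0, 1, out=word_act)
    np.clip(letter_act, 0, 1, out=letter_act)

# "the" accumulates the most support; its letters gain top-down support too.
print(dict(zip(words, word_act.round(2))))
```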

  7. Structure Learning Strategy • identify frequently occurring patterns of primitives or previously identified patterns, bottom-up • two problems to solve: • how to embody a CH in a predictive representation • how to incrementally grow the CH with new data
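
A hedged skeleton of that bottom-up strategy (the threshold, pass count, and greedy re-tokenization are assumptions, not the actual method): count co-occurrences of currently known chunks, promote frequent combinations to new chunks, and let those chunks participate in still larger patterns on later passes. The trigger actually used in this work, Hebbian weights in a Boltzmann machine, is sketched under slide 11.

```python
from collections import Counter

def learn_chunks(stream, threshold=50, passes=3):
    """Bottom-up, cumulative chunking sketch: frequent pairs of known
    chunks are promoted to new chunks, which can then pair up again."""
    chunks = set(stream)                      # start from the primitives
    tokens = list(stream)
    for _ in range(passes):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        new = {a + b for (a, b), c in pair_counts.items() if c >= threshold}
        if not new:
            break
        chunks |= new
        # re-tokenize greedily so new chunks act as single units next pass
        out, i = [], 0
        while i < len(tokens):
            pair = tokens[i] + tokens[i + 1] if i + 1 < len(tokens) else None
            if pair in new:
                out.append(pair); i += 2
            else:
                out.append(tokens[i]); i += 1
        tokens = out
    return chunks

text = "the cat sat on the mat " * 200
print(sorted(learn_chunks(text, threshold=150), key=len)[-5:])  # largest chunks found
```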

  8. Weight Sharing • we need chunks of different sizes (a hierarchy) • duplicated structure with shared weights ensures the ability to recognize small patterns at any position
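
A small sketch of the weight-sharing idea (the one-hot encoding and template scoring are illustrative assumptions): a single set of chunk weights is applied at every input position, so a short pattern is recognized wherever it occurs.

```python
import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz "

def one_hot(s):
    """Encode a string as a (len, |alphabet|) one-hot matrix."""
    m = np.zeros((len(s), len(alphabet)))
    for i, ch in enumerate(s):
        m[i, alphabet.index(ch)] = 1.0
    return m

chunk = "th"
w = one_hot(chunk)                     # one shared weight block for the "th" detector

def detect(text):
    """Apply the same shared weights at every position (weight sharing)."""
    x = one_hot(text)
    return [float((x[i:i + len(chunk)] * w).sum())
            for i in range(len(text) - len(chunk) + 1)]

print(detect("the cat thinks"))   # peaks of 2.0 wherever "th" occurs
```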

  9. A Hybrid Model • Interactive Activation Model of context effects in letter perception (McClelland & Rumelhart) • SEQUITUR (Nevill-Manning): greedy CH learner (chunk as soon as any sequence repeats) • SEQUITUR-IAM: direct encoding of the SEQUITUR-induced CH structure in an IA-style network • interesting result: with no weight tuning it reproduces many IAM phenomena • hierarchy helps: subword chunks provide an added explanation of pronounceable non-word behavior
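
A rough sketch of SEQUITUR's greedy rule (heavily simplified; the real algorithm maintains a grammar with digram-uniqueness and rule-utility constraints): as soon as any adjacent pair of symbols repeats, it is replaced everywhere by a new chunk symbol, and the chunks themselves can then pair up.

```python
def sequitur_like(seq):
    """Greedy chunking: replace a digram with a new symbol as soon
    as it occurs twice (a simplification of SEQUITUR's digram rule)."""
    rules = {}                               # chunk symbol -> the pair it stands for
    seq = list(seq)
    changed = True
    while changed:
        changed = False
        seen, i = {}, 0
        while i < len(seq) - 1:
            pair = (seq[i], seq[i + 1])
            if pair in seen and seen[pair] < i - 1:    # non-overlapping repeat
                name = f"<{pair[0]}{pair[1]}>"
                rules[name] = pair
                # rewrite every occurrence of the digram with the new chunk
                out, j = [], 0
                while j < len(seq):
                    if j < len(seq) - 1 and (seq[j], seq[j + 1]) == pair:
                        out.append(name); j += 2
                    else:
                        out.append(seq[j]); j += 1
                seq = out
                changed = True
                break
            seen.setdefault(pair, i)
            i += 1
    return seq, rules

compressed, rules = sequitur_like("abcabcabc")
print(compressed, rules)   # hierarchical chunks built from repeated digrams
```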

  10. Categorical Boltzmann Machines • Boltzmann machines have nice probabilistic semantics, but there has been no prior work embedding CHs in BMs • need to generalize binary variables to categorical ones • groups (or pools) of nodes represent a variable, with one node per value (instead of one node per variable) • connect two pools by connecting all of their nodes pairwise
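
A hedged sketch of the categorical generalization (toy sizes and random weights, not the actual model): each variable becomes a pool with one node per value, exactly one node per pool is active, connecting two pools means a full weight matrix between their nodes, and updating one pool given another is a softmax over that pool's nodes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two categorical variables, each a pool of nodes (one node per value).
values_a = ["t", "h", "e"]          # pool A: 3 possible values
values_b = ["t", "h", "e"]          # pool B: 3 possible values

# Connecting two pools = a full pairwise weight matrix between their nodes.
W = rng.normal(size=(len(values_a), len(values_b)))

def sample_pool_b(a_index, temperature=1.0):
    """Gibbs-style update of pool B given pool A: a softmax over the
    weights into B's nodes (exactly one node per pool is on)."""
    logits = W[a_index] / temperature
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(values_b), p=p), p

idx, p = sample_pool_b(values_a.index("t"))
print("P(B | A='t') =", dict(zip(values_b, p.round(2))), "sampled:", values_b[idx])
```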

  11. Hebb-Rule Based Chunking • add new chunks by growing new hidden nodes • triggered by correlations indicated by large Hebbian weights • promote correlations to first-class entities • first use of Hebbian dynamics for chunking
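
A minimal sketch of this trigger (the learning rate, threshold, and pairwise form are assumptions, not the actual parameters of this work): Hebbian weights track co-activation of known units, and when a weight grows large the correlated pair is promoted to a new hidden chunk node.

```python
from collections import defaultdict
from itertools import combinations

class HebbianChunker:
    """Sketch: grow a new chunk node when a Hebbian weight between two
    currently-known units exceeds a threshold (assumed values)."""

    def __init__(self, lr=0.05, threshold=0.8):
        self.w = defaultdict(float)     # Hebbian weight per pair of units
        self.chunks = set()             # hidden chunk nodes created so far
        self.lr, self.threshold = lr, threshold

    def observe(self, active_units):
        """One training example: a set of co-active units."""
        for a, b in combinations(sorted(active_units), 2):
            # Hebb rule: strengthen the link between co-active units.
            self.w[(a, b)] += self.lr * (1.0 - self.w[(a, b)])
            if self.w[(a, b)] > self.threshold and (a, b) not in self.chunks:
                self.chunks.add((a, b))  # promote the correlation to a chunk node

chunker = HebbianChunker()
for i in range(40):
    chunker.observe({"t", "h"})              # frequent pair crosses the threshold
    if i % 10 == 0:
        chunker.observe({"q", "z"})          # rare pair stays below it
print(chunker.chunks)                        # -> {('h', 't')}: a new 'th'-like chunk
```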

  12. Example Learned Chunks • successfully learns a hierarchy of frequent chunks • 2-chunks: he, th, in, nd, an, er, ng, of, ed, hi, ou, ve, st, ly, on, re, as, wa, ll, ha, be, it, co, wi, ur, sh, ow, me, gh, ma, om, wh, by, ut, ch, is, to, ck, fo, ak, ul, at, ac, av, ab, yo, pr, li, br, up, po, im, or, ex, us, ic, ev, un • 3-chunks: the, ing, and, was, her, ver, you, his, hat, wit, for, man, com, oul, hav, ugh, oug, ved, abr, con, red, all, she, eve, vin, uld, ery, hic, ich • improves prediction accuracy with more data • first system to do bottom-up on-line CH learning with statistical sensitivity for arbitrary prediction patterns

  13. Compositional Hierarchies Related Work (in addition to the areas likely represented at this symposium...) • prespecified CHs: HEARSAY-II, IAM (McClelland & Rumelhart ’81) • non-predictive: SEQUITUR (Nevill-Manning ’96), MK10 (Wolff ’75) • SCFG induction: Stolcke & Omohundro ’94, Langley & Stromsten ’00 • hierarchical RL: Ring ’95, Drescher ’93, Andre ’98, Sun & Sessions ’00 • segmentation: Olivier ’68, Redlich ’93, Saffran et al ’96, Brent ’99, Venkataraman ’01, Cohen ’01 • hierarchical HMMs: Fine et al ’98, Murphy ’01 • layered reps. in parameterized nets: Lewicki & Sejnowski ’97, Hinton • speedup learning: Soar, EBL • misc: de Marken, Geman & Potter, Lempel-Ziv compression

  14. Hierarchical Sparse n-grams • part of a larger program on learning compositional hierarchies (Pfleger ’02) • n-grams are non-connectionist, but simpler for direct study of frequent pattern/chunk selection (picking needles from exponential haystack) • hierarchical sparse n-grams: multiwidth n-grams + frequent itemset pruning + on-line learning (Pfleger ’04)
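
A rough sketch of the sparsity idea (the pruning rule shown is the standard Apriori/frequent-itemset criterion, assumed here as an approximation of the actual method): counts are kept for an n-gram only if its (n-1)-gram constituents are themselves frequent, so the tables stay small as the width grows.

```python
from collections import Counter

def hierarchical_sparse_ngrams(text, max_n=4, min_count=20):
    """Sketch: grow n-gram tables width by width, counting an n-gram
    only if both of its (n-1)-gram parts survived the threshold."""
    tables = {1: Counter(text)}
    for n in range(2, max_n + 1):
        frequent = {g for g, c in tables[n - 1].items() if c >= min_count}
        counts = Counter()
        for i in range(len(text) - n + 1):
            g = text[i:i + n]
            if g[:-1] in frequent and g[1:] in frequent:   # Apriori-style prune
                counts[g] += 1
        tables[n] = counts
    return tables

tables = hierarchical_sparse_ngrams("the thin thing that thinks " * 100)
print(tables[3].most_common(5))    # frequent 3-grams such as 'thi', 'hin', 'th '
```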

  15. Expressiveness vs. Learning Tradeoff • CHs are the simplest representation capable of embodying unbounded complexity • HHMMs sit in the middle; learning their structure is hard (as it is for SEQUITUR extensions) • work on connectionist symbol processing and sparse distributed representations also has taxonomic aspects; compositional structure learning is largely unaddressed there; lessons here

  16. Conclusion • constructive node creation rule for Boltzmann machines that specifically builds compositionally hierarchical representations of common patterns • extension of early non-learning CH work • frequent patterns have myriad uses in cognition (memory, communication, learning, associative thought) • future: • combine compositional and taxonomic learning • embed CH learning in other compositional reps (esp. distributed) • fold in goals/reward/transfer to help direct chunking
