Distributed Representations

Distributed Representations Psych 85-419/719 Feb 8, 2001

Distributed vs. Localist Representations
• With localist representations, a given object is represented as a single unit.
• With distributed representations, a given object is represented by a pattern over multiple units.
• Typically, there is sharing of units between objects.
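The contrast can be sketched in a few lines of Python. The three objects and four feature units below are a toy example, and the feature assignments (for instance, that only the dog is "noisy") are assumptions made for illustration only.

```python
# Toy vocabulary and feature units; the feature assignments are assumptions.
OBJECTS = ["dog", "cat", "tree"]
FEATURES = ["furry", "legs", "noisy", "leaves"]

def localist(obj):
    """Localist code: one unit per object, exactly one unit active."""
    return [1 if o == obj else 0 for o in OBJECTS]

# Distributed code: each object is a pattern over shared feature units,
# listed in the same order as FEATURES.
DISTRIBUTED = {
    "dog":  [1, 1, 1, 0],
    "cat":  [1, 1, 0, 0],
    "tree": [0, 0, 0, 1],
}

def overlap(a, b):
    """Number of units that are active in both patterns."""
    return sum(x & y for x, y in zip(a, b))

# Localist patterns for different objects never share an active unit.
print(overlap(localist("dog"), localist("cat")))
# Distributed patterns can share units (here: furry and legs).
print(overlap(DISTRIBUTED["dog"], DISTRIBUTED["cat"]))
```

The shared units are what later allow knowledge about one object to carry over to similar objects.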

Whether A Representation is Localist or Distributed Depends On What Level We’re Looking At
[Figure: network with units labeled dog, cat, tree, has, is, furry, legs, noisy, leaves]
• Distributed at sentence level, localist at word level

Same Applies At Different Levels of Analysis
• Can represent a word as a set of letter units
  • Distributed at the word level
  • Localist at the letter level
• Can represent a letter as a set of features
  • Distributed at the letter level
  • Localist at the feature level
• Do we always have to have some localist level?

Efficiency
• Encoding n objects using a localist representation requires n units
• Encoding n objects using a distributed representation generally requires fewer
  • At least log2(n) units (if binary)
  • In practice, depends on the scheme you use
    • Chomsky & Halle’s features: 25 features to encode 35 phonemes
    • Ladefoged’s: 15 features for 35 phonemes
    • Hare et al.: 11 features for 35 phonemes
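As a rough check on the log2(n) bound, the sketch below computes the smallest number of binary units that can give n objects distinct codes and compares it with the feature counts listed above. The helper name `min_binary_units` is hypothetical.

```python
def min_binary_units(n_objects):
    """Smallest k such that 2**k >= n_objects, i.e. ceil(log2(n))."""
    # Integer arithmetic avoids floating-point rounding in log2.
    return (n_objects - 1).bit_length()

n_phonemes = 35
print("lower bound:", min_binary_units(n_phonemes))
print("localist:", n_phonemes)

# Feature counts as given in the slide above.
schemes = [("Chomsky & Halle", 25), ("Ladefoged", 15), ("Hare et al.", 11)]
for name, n_features in schemes:
    print(name, n_features)
```

All three feature schemes sit between the theoretical minimum and the localist count: they spend extra units to make the code meaningful rather than merely compact.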

Tradeoff of Size and Speed
• Time to train a network is roughly (# of examples) × (# of weights) × (# of iterations through the training set that we need)
• Localist representations tend to require fewer training iterations
• … but generally require more units
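The arithmetic behind this tradeoff can be sketched as follows. Every number here (layer sizes, example count, iteration counts) is invented purely to show the calculation and is not a measurement of any real network.

```python
def training_cost(n_examples, n_weights, n_iterations):
    """Rough cost: one weight update per weight, per example, per iteration."""
    return n_examples * n_weights * n_iterations

# Hypothetical single-layer networks feeding 10 output units.
n_outputs = 10
localist_weights = 35 * n_outputs      # 35 input units, one per object
distributed_weights = 11 * n_outputs   # 11 input feature units

# Hypothetical iteration counts: fewer for localist, more for distributed.
localist_cost = training_cost(1000, localist_weights, 50)
distributed_cost = training_cost(1000, distributed_weights, 200)

print(localist_cost, distributed_cost)
```

Which scheme comes out cheaper depends entirely on how the reduction in weights compares with the increase in iterations.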

Tradeoff of Speed and Scalability
• With localist representations, must allocate a new unit for each new object we want to learn about.
• With distributed representations, existing units (and weights) can be reused.
• Learning a new item involves adjusting weights.

Tradeoff of Speed and Robustness
• With localist units, when a unit is damaged, the object it represents is lost
  • Response to damage is not gradual, nor spread across classes of items
• Humans don’t exhibit this property
  • Brain damage tends to result in gradual degradation, spread across sets of related things (knowledge of animals, tools, etc.)

Choice of Representation Can Influence Speed of Learning
• Ex: past tense of English verbs.
  • Can end with a /d/ sound (blamed), a /t/ sound (baked), or /Id/ (painted)
  • Cued by the preceding sound
• Representing with localist phonemes: rule is complex (b, g, m, n, v … -> /d/; k, f, s … -> /t/, etc.)
• But all the sounds that take /d/ are voiced; the ones that take /t/ are unvoiced
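Once voicing is available as a feature, the rule collapses to a couple of lines. The sketch below uses a small hand-picked set of stem-final phonemes; the table and function names are hypothetical, and the handling of stems ending in /t/ or /d/ (which take /Id/, as in painted) is standard English phonology rather than something stated on the slide.

```python
# Voicing of a few stem-final phonemes (illustrative subset only).
VOICED = {
    "b": True, "g": True, "m": True, "n": True, "v": True,
    "k": False, "f": False, "s": False, "p": False,
}

def past_tense_suffix(final_phoneme):
    """Pick the past-tense ending from the stem's final phoneme."""
    if final_phoneme in ("t", "d"):
        return "Id"                      # painted, needed
    # A single voicing feature decides between the other two endings.
    return "d" if VOICED[final_phoneme] else "t"

print(past_tense_suffix("m"))  # blame -> blamed
print(past_tense_suffix("k"))  # bake  -> baked
print(past_tense_suffix("t"))  # paint -> painted
```

With localist phonemes the same behavior needs one entry per phoneme; with a voicing unit it needs one test.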

Choice of Representation Can Influence Quality of Generalization
• Suppose a child had never heard a verb ending in the /CH/ phoneme
  • (like butch: “Rosie O’Donnell out-butched Tom Selleck in the interview”)
• Would not be able to form the correct past tense if using localist phonemes
• But: if the representation of phonemes was distributed and encoded voicing, the correct generalization could be made.
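A minimal sketch of this argument: a localist learner stores one entry per phoneme it has seen, while a learner that sees voicing stores one entry per voicing value. The training pairs and the voicing table are illustrative assumptions, and /CH/ is deliberately left out of training.

```python
# (stem-final phoneme, past-tense ending) pairs seen during "training".
TRAIN = [("b", "d"), ("g", "d"), ("m", "d"), ("k", "t"), ("f", "t"), ("s", "t")]

# Voicing feature for each phoneme, including the never-seen /CH/.
VOICED = {"b": 1, "g": 1, "m": 1, "k": 0, "f": 0, "s": 0, "CH": 0}

# Localist learner: memorizes an ending for each phoneme identity.
localist_table = dict(TRAIN)

# Distributed learner: memorizes an ending for each voicing value.
voicing_table = {VOICED[phoneme]: ending for phoneme, ending in TRAIN}

def localist_predict(phoneme):
    return localist_table.get(phoneme)   # None when the phoneme is new

def distributed_predict(phoneme):
    return voicing_table[VOICED[phoneme]]

print(localist_predict("CH"))      # nothing to go on
print(distributed_predict("CH"))   # falls back on the voicing unit
```

The distributed learner gets out-butched right because /CH/ shares the unvoiced feature with k, f, and s, all of which it has seen.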

An Example of Choices Made When Designing a Representation
• Observation: in English, the rule for pronouncing the vowel is often keyed by the rhyme (part of word from the vowel on)
• Take words with an oo for the vowel:
  • cook, took, look, book, shook
  • fool, tool, drool, cool, school
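Splitting a word into onset and rhyme is easy to sketch for simple spellings. The version below works on letters rather than sounds and treats only a, e, i, o, u as vowels, both of which are simplifying assumptions; the function name is hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # orthographic simplification; ignores y

def split_onset_rhyme(word):
    """Return (onset, rhyme), where the rhyme starts at the first vowel."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel found

words = ["cook", "took", "look", "book", "shook",
         "fool", "tool", "drool", "cool", "school"]

# Group words by rhyme: each group shares one pronunciation of "oo".
by_rhyme = defaultdict(list)
for w in words:
    onset, rhyme = split_onset_rhyme(w)
    by_rhyme[rhyme].append(w)

for rhyme, group in by_rhyme.items():
    print(rhyme, group)
```

Grouping by rhyme separates the two pronunciations of oo cleanly for this word list.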

We Could Use That...
[Figure: diagram with onset units b, c, t and rhyme units at, ake]

... But That Has Problems
• Sometimes people generalize over other units of analysis:
  • Ex: wh is usually pronounced as in what, where, why
  • But: who- is often pronounced with an h sound (who, whom, whole, whose, whore)
• How do you pronounce the name of the fish market Wholey’s in Pittsburgh?

... And Other Problems
• Sometimes you encounter rhymes you haven’t seen before (e.g., Dolph)
• … or onsets you haven’t seen (e.g., phlange)

One Plan...
• Represent things at the lowest level you think might be necessary
• Allow the system to learn to extract higher-level regularities
• Requires a learning rule that can extract such regularities

Another Plan...
• Represent objects at multiple levels of abstraction
  • Ex: Coding letters, but also onsets and rhymes
• Requires a learning rule that can attend to different sources of information, without one dominating.

Distributed Representations on Input and Output
CAT   +animal, +feline, +meows
CATS  +animal, +feline, +meows, +plural
CUP   +container, +beverage
CUPS  +container, +beverage, +plural
WUG   +pointy_head, ...
WUGS  +pointy_head, …, +plural

The “Starvation” Problem
• If an input unit is never active during training, what happens to the weights projecting from that unit?
• Need to choose a representational scheme such that units don’t starve
• Using distributed representations helps, but doesn’t guarantee we totally solve the problem
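One way to see what happens to those weights is to run a few updates. The sketch below assumes a single linear output unit trained with the delta rule, which is a choice made here for illustration and is not named in the slides. Because each weight change is proportional to the input activation, a unit that is always 0 never moves its weight.

```python
def delta_rule_step(weights, inputs, target, lr=0.1):
    """One delta-rule update: dw_i = lr * (target - output) * input_i."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]

initial = [0.5, -0.3, 0.8]
weights = list(initial)

# Toy training set in which the third input unit is never active.
training_set = [([1, 0, 0], 1.0), ([1, 1, 0], 0.0)]

for _ in range(20):
    for inputs, target in training_set:
        weights = delta_rule_step(weights, inputs, target)

print(initial)
print(weights)  # the third weight still holds its initial value
```

The starved weight keeps whatever arbitrary value it started with, so the first time that unit does become active its contribution to the output is essentially noise.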

The Binding Problem
[Figure: units labeled blue, green, square, circle]

A Bad Solution
• Conjunctive Coding: allocate a unit for each possible combination.
  • Way too expensive. Explodes exponentially with the number of objects and features you have to represent.
  • Also doesn’t support decent generalization
  • Leads to starvation
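The cost of conjunctive coding can be counted directly. The sketch below enumerates one unit per combination of values and compares that count with keeping a separate unit for each value; the color/shape dimensions follow the binding example, and the helper names are hypothetical.

```python
from itertools import product

def conjunctive_units(dimensions):
    """One unit per combination, taking one value from each dimension."""
    return list(product(*dimensions.values()))

dims = {"color": ["blue", "green"], "shape": ["square", "circle"]}
units = conjunctive_units(dims)
print(units)

def unit_counts(n_dimensions, values_per_dimension):
    """(conjunctive count, separate-feature count) for a uniform scheme."""
    conjunctive = values_per_dimension ** n_dimensions
    separate = values_per_dimension * n_dimensions
    return conjunctive, separate

# The conjunctive count grows exponentially, the separate count linearly.
for k in (2, 5, 10):
    print(k, unit_counts(k, 2))
```

Most of those conjunctive units would also never be active in any realistic training set, which is where the starvation comes from.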

A Different Solution
• Coarse coding: Have different units sensitive to properties in different regions of space
• Human vision is a bit like this:
  • Early on in the visual stream, units are sensitive to features (oriented line segments, for example) in small receptive fields
  • Higher up, the size of receptive fields gets larger
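A minimal one-dimensional sketch of coarse coding: each unit responds whenever a position falls inside its receptive field, and neighboring fields overlap. The centers, the radius, and the all-or-none response are assumptions chosen to keep the example small.

```python
def coarse_code(position, centers, radius):
    """Activate every unit whose receptive field contains the position."""
    return [1 if abs(position - c) <= radius else 0 for c in centers]

centers = [0, 2, 4, 6, 8]   # hypothetical receptive-field centers
radius = 1.5                # fields overlap because radius > spacing / 2

print(coarse_code(2, centers, radius))  # inside one field only
print(coarse_code(3, centers, radius))  # inside two overlapping fields
print(coarse_code(4, centers, radius))
```

Although each field is three units wide, the pattern of which units are active distinguishes positions 2, 3, and 4, so the population is more precise than any single unit. Making the fields larger adds overlap but blurs the response of each unit, which is the resolution tradeoff.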

Potential Problems
• How to know how big to make receptive fields?
• Harder to encode a lot of features; resolution/accuracy tradeoff
• Generalization?

Another Possibility: Temporal Binding
[Figure: units labeled blue, green, square, circle]

Using Slots
• Try to assign roles to objects by using different collections of units for each role
• Problem: generalizing across slots
  • phone, sphere, phrase
[Figure: letter slots labeled 1st Ltr, 2nd Ltr, 3rd Ltr]
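The slot problem shows up as soon as the three example words are coded by letter position. In the sketch below, each active unit is a (slot, letter) pair; the function names and the six-slot limit are hypothetical.

```python
def slot_code(word, n_slots=6):
    """Set of active units, one (slot index, letter) unit per position."""
    return {(i, letter) for i, letter in enumerate(word[:n_slots])}

def shared_units(word_a, word_b):
    """Units that are active for both words."""
    return slot_code(word_a) & slot_code(word_b)

# "ph" sits in slots 0-1 for both words, so the units are shared.
print(sorted(shared_units("phone", "phrase")))

# In "sphere" the same "ph" sits in slots 1-2, so nothing is shared.
print(sorted(shared_units("phone", "sphere")))
```

Whatever the network learns about ph in the first two slots does not transfer to ph one slot later, because as far as the code is concerned those are unrelated units.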

Traditional View of Semantics
[Figure: concept hierarchy with nodes labeled thing, action, physical, social, living, tool, mammal, bird, dog, cat, hammer, saw, carpentry]

Hierarchical Knowledge in Distributed Reps
• We could instantiate this hierarchy directly
• ... Or, allow much wider connectivity. Let the system learn relationships between concepts
• ... Or, do it implicitly (see example from text)

Discovered vs. Stipulated Representations
• Instead of deciding what our representations should be to learn a given task, we could learn what representations work best for that task
[Figure: network with layers of n, <n, and n units]

Ex: Plaut & Kello Phonology Model
[Figure: model with components labeled spelling, meaning, “phonology”, sound, articulation]

For Next Time
• Read PDP2, Ch 18, “On Learning the Past Tense of English Verbs”
• Skim handout: Pinker & Prince 1988
