
Distributed Representations

Distributed Representations. Psych 85-419/719 Feb 8, 2001. Distributed vs. Localist Representations. With localist representations, a given object is represented as a single unit. With distributed representations, a given object is represented by a pattern over multiple units


Presentation Transcript


  1. Distributed Representations Psych 85-419/719 Feb 8, 2001

  2. Distributed vs. Localist Representations • With localist representations, a given object is represented by a single unit. • With distributed representations, a given object is represented by a pattern over multiple units. • Typically, units are shared between objects.
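
The contrast on this slide can be sketched in a few lines of Python. This is only an illustration: the feature inventory and the assignment of features to dog, cat, and tree are assumptions made for the example, not something the lecture specifies.

```python
# Hypothetical feature inventory and object definitions (illustrative only).
FEATURES = ["furry", "has_legs", "noisy", "has_leaves"]
OBJECTS = {
    "dog": {"furry", "has_legs", "noisy"},
    "cat": {"furry", "has_legs"},
    "tree": {"has_leaves"},
}

def localist(name):
    """One unit per object: exactly one unit is on."""
    return [1 if n == name else 0 for n in sorted(OBJECTS)]

def distributed(name):
    """One unit per feature: an object is a pattern over several units."""
    return [1 if f in OBJECTS[name] else 0 for f in FEATURES]

def overlap(a, b):
    """Number of units that are active in both patterns."""
    return sum(x & y for x, y in zip(a, b))

print(overlap(localist("dog"), localist("cat")))        # no shared units
print(overlap(distributed("dog"), distributed("cat")))  # shared feature units
```

In the localist code, dog and cat have nothing in common; in the distributed code, they share the units for the features they have in common.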

  3. Whether A Representation is Localist or Distributed Depends On What Level We’re Looking At • [Figure: units for the words dog, cat, tree, has, is, furry, noisy, legs, leaves] • Distributed at sentence level, localist at word level

  4. Same Applies At Different Levels of Analysis • Can represent a word as a set of letter units • Distributed at the word level • Localist at the letter level • Can represent a letter as a set of features • Distributed at the letter level • Localist at the feature level • Do we always have to have some localist level?

  5. Efficiency • Encoding n objects using a localist representation requires n units • Encoding n objects using a distributed representation generally requires fewer • At least log2(n) units (if binary) • In practice, depends on the scheme you use • Chomsky & Halle’s features: 25 features to encode 35 phonemes • Ladefoged’s: 15 features for 35 phonemes • Hare et al: 11 features for 35 phonemes
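
As a rough check on the numbers in this slide, the sketch below computes the binary lower bound for 35 phonemes and lists it next to the unit counts quoted above. The function name is made up for this example.

```python
def min_binary_units(n_objects):
    """Smallest number of binary units that gives each of n_objects
    (n_objects >= 1) its own pattern, i.e. ceil(log2(n_objects))."""
    return (n_objects - 1).bit_length()

N_PHONEMES = 35
# Unit counts taken from the slide; "localist" is one unit per phoneme.
SCHEMES = {"localist": 35, "Chomsky & Halle": 25, "Ladefoged": 15, "Hare et al.": 11}

for name, units in SCHEMES.items():
    print(f"{name}: {units} units")
print("binary lower bound:", min_binary_units(N_PHONEMES))
```

All three feature schemes sit well above the lower bound, which is consistent with the point that the count in practice depends on the scheme rather than on the theoretical minimum.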

  6. Tradeoff of Size and Speed • Time to train a network is (# of examples) × (# of weights) × (# of iterations through the training set needed) • Localist representations tend to require fewer training iterations • … but generally require more units

  7. Tradeoff of Speed and Scalability • With localist representations, must allocate a new unit for each new object we want to learn about. • With distributed units, existing units (and weights) can be used. • Learning a new item involves adjusting weights.
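
The training-time product on this slide can be written out directly. In the sketch below, the input sizes 35 and 11 come from the phoneme example on the Efficiency slide; the hidden-layer size, the number of examples, and the epoch counts are hypothetical values chosen only to show how the tradeoff is computed.

```python
def training_cost(n_examples, n_weights, n_epochs):
    """Cost estimate from the slide: examples x weights x passes through the set."""
    return n_examples * n_weights * n_epochs

HIDDEN = 20        # hypothetical hidden-layer size
EXAMPLES = 1000    # hypothetical training-set size

# Localist input: more units (so more weights), assumed fewer epochs.
localist_cost = training_cost(EXAMPLES, 35 * HIDDEN, n_epochs=30)
# Distributed input: fewer units, assumed more epochs.
distributed_cost = training_cost(EXAMPLES, 11 * HIDDEN, n_epochs=60)

print(localist_cost, distributed_cost)
```

Which side wins depends entirely on the assumed epoch counts, which is the tradeoff the slide describes.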

  8. Tradeoff of Speed and Robustness • With localist units, when a unit is damaged, the object it represents is lost • Response to damage is not gradual, nor spread across classes of items • Humans don’t exhibit this property • Brain damage tends to result in gradual degradation, and across sets of related things (knowledge of animals, tools, etc.)

  9. Choice of Representation Can Influence Speed of Learning • Ex: past tense of English verbs. • Can end with a /d/ sound (blamed), a /t/ sound (baked), or /Id/ (painted) • Cued by the preceding sound • Representing with localist phonemes: rule is complex (d, g, m, n, v … -> /d/; k, f, s … -> /t/, etc.) • But all the /d/ items are voiced; the /t/ ones are unvoiced

  10. Choice of Representation Can Influence Quality of Generalization • Suppose a child had never heard a verb ending in the /CH/ phoneme • (like butch: “Rosie O’Donnell out-butched Tom Selleck in the interview”) • Would not be able to form the correct past tense if using localist phonemes • But: if the representation was distributed, and phonemes encoded voicing, the correct generalization could be made.
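
A toy version of the past-tense argument in the two slides above, assuming a learner that simply memorizes which ending goes with each encoded input. The learner, the phoneme symbols, and the voicing table are assumptions for illustration; the /Id/ case is left out to keep the sketch short.

```python
# Training pairs: (final phoneme of the stem, past-tense ending).
# /CH/ is deliberately absent from training.
TRAINING = [("m", "d"), ("n", "d"), ("v", "d"), ("g", "d"),
            ("k", "t"), ("f", "t"), ("s", "t")]

# Assumed voicing feature for each phoneme, including the unseen /CH/.
VOICED = {"m": True, "n": True, "v": True, "g": True,
          "k": False, "f": False, "s": False, "CH": False}

def learn(pairs, encode):
    """Memorize the ending associated with each encoded input."""
    table = {}
    for phoneme, ending in pairs:
        table[encode(phoneme)] = ending
    return table

# Localist: each phoneme is its own code, so the table has one entry per phoneme.
localist_table = learn(TRAINING, lambda p: p)
# Distributed: the code is the voicing feature, so the table has two entries.
feature_table = learn(TRAINING, lambda p: VOICED[p])

print(localist_table.get("CH"))         # no entry for the unseen phoneme
print(feature_table.get(VOICED["CH"]))  # generalizes through voicing
```

The localist learner needs seven separate entries and still has nothing to say about /CH/; the voicing-based learner needs two entries and covers the new phoneme.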

  11. An Example of Choices Made When Designing a Representation • Observation: in English, the rule for pronouncing the vowel is often keyed by the rhyme (part of word from the vowel on) • Take words with an oo for the vowel: • cook, took, look, book, shook • fool, tool, drool, cool, school

  12. We Could Use That... • [Figure: the letters b, c, t combined with the rhymes at and ake]

  13. .. But That Has Problems • Sometimes people generalize over other units of analysis: • Ex: wh is usually pronounced as in what, where, why • But: who- is often pronounced with an h sound (who, whom, whole, whose, whore) • How do you pronounce the name of the fish market Wholey’s in Pittsburgh?
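
The onset/rhyme split described on this slide can be approximated on spelling alone. This is a simplification: the sketch assumes the rhyme starts at the first vowel letter, which works for the word lists above but is not a general account of English syllables.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def split_onset_rhyme(word):
    """Split a one-syllable spelling at its first vowel letter."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel letter found

WORDS = ["cook", "took", "look", "book", "shook",
         "fool", "tool", "drool", "cool", "school"]

by_rhyme = defaultdict(list)
for w in WORDS:
    onset, rhyme = split_onset_rhyme(w)
    by_rhyme[rhyme].append(w)

for rhyme, words in by_rhyme.items():
    print(rhyme, words)
```

Grouping by rhyme separates the -ook words from the -ool words, which is the regularity a rhyme-based representation would expose.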

  14. .. And Other Problems • Sometimes you encounter rhymes you haven’t seen before (e.g., Dolph) • … or onsets you haven’t seen (e.g., phlange)

  15. One Plan... • Represent things at the lowest level you think might be necessary • Allow system to learn to extract higher level regularities • Requires a learning rule that can extract such regularities

  16. Another Plan... • Represent objects at multiple levels of abstraction • Ex: Coding letters, but also onsets and rhymes • Requires a learning rule that can attend to different sources of information, without one dominating.

  17. Distributed Representations on Input and Output • CAT: +animal, +feline, +meows • CATS: +animal, +feline, +meows, +plural • CUP: +container, +beverage • CUPS: +container, +beverage, +plural • WUG: +pointy_head, ... • WUGS: +pointy_head, …, +plural

  18. The “Starvation” Problem • If an input unit is never active during training, what happens to the weights projecting from that unit? • Need to choose a representational scheme such that units don’t starve • Using distributed representations helps, but doesn’t guarantee we totally solve the problem
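
To see what happens to a starved unit, the sketch below trains a single linear output unit with the delta rule. The delta rule is an assumption here (the slide does not name a learning rule), and the patterns are made up so that the third input unit is never active.

```python
def delta_rule_step(weights, inputs, target, lr=0.1):
    """One delta-rule update: each weight changes by lr * error * its input."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]

INITIAL = [0.5, 0.5, 0.5]
# Hypothetical training patterns: the third input unit is always 0.
PATTERNS = [([1, 0, 0], 1.0), ([0, 1, 0], 0.0), ([1, 1, 0], 1.0)]

weights = list(INITIAL)
for _ in range(50):
    for inputs, target in PATTERNS:
        weights = delta_rule_step(weights, inputs, target)

print(weights)
```

Because every update is multiplied by the input activation, the weight from the never-active unit stays at its initial value no matter how long training runs.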

  19. The Binding Problem • [Figure: units for blue, green, square, circle]

  20. A Bad Solution • Conjunctive Coding: allocate a unit for each possible combination. • Way too expensive. Explodes exponentially with number of objects and features you have to represent. • Also doesn’t support decent generalization • Leads to starvation
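
The cost of conjunctive coding can be counted directly. The sketch below compares one unit per combination against one unit per feature value; the function names and the larger example dimensions are assumptions for illustration.

```python
import math

def conjunctive_units(values_per_dimension):
    """One unit for every combination of values: the product of the sizes."""
    return math.prod(values_per_dimension)

def separate_units(values_per_dimension):
    """One unit per value in each dimension: the sum of the sizes."""
    return sum(values_per_dimension)

# Two colors x two shapes, as in the binding-problem slide.
print(conjunctive_units([2, 2]), separate_units([2, 2]))
# Hypothetical larger case: three dimensions with ten values each.
print(conjunctive_units([10, 10, 10]), separate_units([10, 10, 10]))
```

With few dimensions the two counts are close, but each added dimension multiplies the conjunctive count while only adding to the separate count. Most of those conjunction units would rarely or never be active, which is the link to starvation.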

  21. A Different Solution • Coarse coding: Have different units sensitive to properties in different regions of space • Human vision is a bit like this: • Early on in visual stream, units are sensitive to features (oriented line segments, for example) in small receptive fields • Higher up, the size of receptive fields gets larger

  22. Potential Problems • How to know how big to make receptive fields? • Harder to encode a lot of features; resolution/accuracy tradeoff • Generalization?
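
A one-dimensional sketch of coarse coding and the receptive-field-size question raised above. The unit centers and the radius are arbitrary values chosen for the example.

```python
def coarse_code(x, centers, radius):
    """A unit is active when x falls inside its receptive field."""
    return [1 if abs(x - c) <= radius else 0 for c in centers]

CENTERS = [0, 2, 4, 6, 8]  # hypothetical receptive-field centers
RADIUS = 2                 # hypothetical receptive-field radius

print(coarse_code(3.0, CENTERS, RADIUS))
print(coarse_code(3.5, CENTERS, RADIUS))
print(coarse_code(4.5, CENTERS, RADIUS))
```

Positions 3.0 and 3.5 activate the same pair of units and so cannot be told apart, while 4.5 activates a different pair. Changing the radius changes how many units overlap at each position, which moves this resolution limit.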

  23. Another Possibility: Temporal Binding • [Figure: units for blue, green, square, circle]

  24. Using Slots • Try to assign roles to objects by using different collections of units for each role • Problem: generalizing across slots • phone, sphere, phrase • [Figure: slots labeled 1st letter, 2nd letter, 3rd letter]

  25. Traditional View of Semantics • [Figure: hierarchy over the nodes thing, action, physical, social, living, tool, mammal, bird, dog, cat, hammer, saw, carpentry]
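
The generalization problem with slots shows up as soon as the three example words are coded position by position. The encoding below, one (position, letter) unit per slot, is a minimal sketch of the idea rather than the scheme used in any particular model.

```python
def slot_code(word):
    """Active units as (slot number, letter) pairs, with slots counted from 1."""
    return {(i, ch) for i, ch in enumerate(word, start=1)}

phone, sphere, phrase = (slot_code(w) for w in ("phone", "sphere", "phrase"))

print(sorted(phone & phrase))  # 'ph' sits in slots 1-2 in both words
print(sorted(phone & sphere))  # 'ph' sits in slots 2-3 in sphere
```

Whatever the system learns about ph in slots 1 and 2 from phone and phrase involves different units from the ph in slots 2 and 3 of sphere, so the knowledge does not carry over.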

  26. Hierarchical Knowledge in Distributed Reps • We could instantiate this hierarchy directly • .. Or, allow much wider connectivity. Let system learn relationships between concepts • .. Or, do it implicitly (see example from text)

  27. Discovered vs. Stipulated Representations • Instead of deciding what our representations should be to learn a given task, we could learn what representations work best for that task • [Figure: layers of n, <n, and n units]

  28. Ex: Plaut & Kello Phonology Model • [Figure: model with components labeled spelling, meaning, “phonology”, sound, articulation]

  29. For Next Time • Read PDP2, Ch 18, “On Learning the Past Tense of English Verbs” • Skim handout: Pinker & Prince 1988
