
Artificial Dreams Lecture Series in Cognitive Science






Presentation Transcript


  1. Artificial Dreams: Lecture Series in Cognitive Science. Danial Qaurooni-Fard, Cognitive Science Group, Amirkabir University of Technology, Winter 2009.

  2. Outline

  3. Language Learning: Connectionist AI A Survey of Connectionist Models of Language Learning

  4. Connectionism • Approach: Connectionism • Main theme: Neural-inspired networks • Example systems (Language Learning): • English past tense (Plunkett and Marchman 1991) • NETtalk (Sejnowski and Rosenberg 1987) • SRNs (Elman 1990) • LSA (Landauer, Foltz and Laham 1998) • Aphasia model (Dell 1997)

  5. Language and AI • Language in Microworlds • Language as a Canned Product • Language as a set of Subtasks

  6. 1st: Language in Microworlds • Limiting language to specific domains. • 1967: STUDENT • Solved simple algebraic problems. • “What’s 5 plus 4?” • 1972: SHRDLU • Simulation of a robotic hand that worked with colored geometrical objects. • “Find a block which is taller than the one that you are holding and put it into a box.” • 1997: Jupiter • Weather forecasting system. • “Is Boston cloudy today?”

  7. 2nd: Language as a Canned Product • Engage in “natural” conversation with a limited vocabulary that seems unlimited! • 1965: Eliza • Psychiatrist. • 2002: Claire • “Virtual service representative” for a telephone company. • “Let me get someone to help you!” • 1960s-Present: Translation • Inception: 1957 launch of Sputnik. • “The spirit is willing but the flesh is weak.” • 1960s: “The vodka is good but the meat is rotten.” • 2003: “The spirit is ready, but the meat is weak.”

  8. 3rd: Language as a Set of Subtasks • Break the problem into a set of subtasks like • speech processing • text reading • grammar acquisition … and the trainee will pick up patterns. • Roughly the tack of connectionists. • Why study language and connectionism? • Connectionism has fared well • But maybe such tasks as language and reasoning cannot be accomplished by associative methods alone • So maybe connectionists are unlikely to match the performance of classical models at explaining these higher-level cognitive abilities.

  9. Connectionism • 1960s: Rule-and-symbol AI: CYC • 1980s: PDP (Parallel Distributed Processing) • Neural networks: At least distantly inspired by the architecture of the brain. • Abstracted away: • Multiplicity of types of neurons and synapses • Use of temporal properties • Connectivity constraints • Move the vocabularies of the various sciences of the mind closer together.

  10. Connectionism • Text-to-phoneme conversions: • DECtalk vs. NETtalk • Neither “understood” anything! • Connectionist models: Both a boon and a burden: • “Good at Frisbee, bad at logic” • Boon: motor control, face recognition, reading handwritten zip codes! • Burden: sequential reasoning, long-term planning, logic. • Substitutes pattern recognition for classical reasoning. • Humans ARE usually better at Frisbee than at logic.

  11. Simple 3 Layer Network

  12. A Connectionist at Work • Case of past tenses of English verbs. • Regular formation: stem + “ed” • Irregulars: • No change: hit >> hit • Vowel change: ring >> rang • Arbitrary: go >> went • Overregularization in children (“go” + “ed” >> “goed”) • U-shaped learning profile. • Nativists: rules and associative memory. • Language Acquisition Paradox • Universal Grammar. • Connectionists: (Plunkett and Marchman 1991) Mimic the U-shaped learning curve.

  13. Plunkett and Marchman (1991) • Standard feedforward network • Maps a phonological representation of the stem to a phonological representation of the past tense • Initially: 10 regulars & 10 irregulars • Total: 500 stems, 90% regular • Final model successfully learned the 500 verbs in the training set • Architecture (from the slide figure): 20 phonological input units → 30 hidden units → 20 phonological output units
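A minimal sketch of a feedforward network with the dimensions this slide reports (20 → 30 → 20). The sigmoid activation, squared-error loss, learning rate, and random training pattern are assumptions for illustration; the slide does not specify Plunkett and Marchman's actual training details.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weight matrices: input -> hidden and hidden -> output (20-30-20).
W1 = rng.normal(scale=0.1, size=(20, 30))
W2 = rng.normal(scale=0.1, size=(30, 20))

def forward(x):
    h = sigmoid(x @ W1)   # hidden representation
    y = sigmoid(h @ W2)   # predicted past-tense phonology
    return h, y

# One backpropagation step on a single (stem, past-tense) pattern.
x = rng.integers(0, 2, size=20).astype(float)  # stem phonology (bit vector)
t = rng.integers(0, 2, size=20).astype(float)  # target past-tense phonology

h, y = forward(x)
delta_out = (y - t) * y * (1 - y)              # output-layer error signal
delta_hid = (delta_out @ W2.T) * h * (1 - h)   # backpropagated to hidden layer
lr = 0.5
W2 -= lr * np.outer(h, delta_out)
W1 -= lr * np.outer(x, delta_hid)

h2, y2 = forward(x)
# The squared error on this pattern should not increase after the update.
assert ((y2 - t) ** 2).sum() <= ((y - t) ** 2).sum()
```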

  14. What P&M Had To Do • Decide on a manner of breaking the domain into objects, features… – in this case, into verb stems, suffixes, and inflections; • Decide on the encoding and presentation of the above to the network; • Design the architecture – that is, the number of layers, nodes, etc.; • Decide on the activation rule, the output function, the learning regimen, and so on; • Select an appropriate corpus of data – in this case, an effective combination of regular and irregular verbs; • Carefully control the order and frequency of the presentation of the verbs to the network; • Train the network on a sample of five hundred verbs; • Decide on the number of times (“epochs”) a set of input data should be presented to the network.

  15. Connectionism: Intuitions • Architecture-over-function • Decoupling • Learning • Pre-wiring • Neural Reductionism • Adequacy

  16. Basic Intuitions A 1st Pass on Connectionism

  17. Connectionism: Basic Intuitions • Architecture over function • Mimic human brains • Cognition is architecture-dependent • Architecture is primary and function is secondary. • U-shaped learning profile.

  18. Connectionism: Basic Intuitions • Decoupling: Connected inside, disconnected from outside • Still representational, but of a distributed, implicit kind. • Eventually, certain groups of neurons behave as if they encode certain features. • Emergent Behavior • Constraint-satisfaction rather than goal-achievement • “Soft” constraints can be satisfied due to large degrees of freedom. • Survive “attacks”, “lesions” and “decay”.

  19. Connectionism: Basic Intuitions • Learning • Useful distinctions have to be made. • Inputs are not passive! • The flat surface of a rock provides different “affordances” • “climbability” for a deer • “sittability” for a hiking human • “steppability” for the same human crossing a creek • “the primitive units [categories] are not input, nor are they built in as primitives.”

  20. Learning • Learning is generalization. • “The capability for generalization in human beings crucially involves the ability to reperceive and rearrange things in novel ways.” • Flexibility requires a hierarchy. • Connectionists try to break up the homogeneity, add new layers and specialized modules. • 1990s: Add context layers.

  21. Recurrent Networks A New Approach

  22. Elman’s Recurrent Networks • Most linguistic behavior happens in time. • Classic connectionist models receive input all at once (e.g. Plunkett and Marchman’s past-tense learning model). • Recurrent networks take the internal state and copy it to the input, creating “memory”. • SRN: |input| = |original input| + |hidden layer|
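One time step of a simple recurrent network (SRN) can be sketched as follows: the previous hidden state is copied into a "context" vector and concatenated with the current input, so the effective input size is |original input| + |hidden layer|. The layer sizes (5 input, 20 hidden), tanh activation, and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_hid = 5, 20
W_in = rng.normal(scale=0.1, size=(n_in + n_hid, n_hid))  # (input + context) -> hidden
W_out = rng.normal(scale=0.1, size=(n_hid, n_in))         # hidden -> prediction

def srn_step(x, context):
    """Consume one input vector; return a prediction and the new context."""
    z = np.concatenate([x, context])  # input plus copied hidden state
    h = np.tanh(z @ W_in)             # new hidden state
    y = h @ W_out                     # prediction of the next input
    return y, h                       # h becomes the next step's context

context = np.zeros(n_hid)             # empty "memory" at the start of a sequence
sequence = rng.integers(0, 2, size=(3, n_in)).astype(float)
for x in sequence:
    y, context = srn_step(x, context)
```

The copy-back of `h` into `context` is what gives the network its memory of earlier inputs, distinguishing it from the feedforward past-tense model.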

  23. SRN

  24. Dissecting Language • Segmentation problem: How do children discover the atoms of language? • Words, morphemes, phonemes… • These atoms are NOT given. • Distinctions are often murky and unclear. • Dissecting language is a metalinguistic task.

  25. Elman: Discovering “Word” • “Word”: chunks or clusters of letters. • Network structure: SRN with 5 input, 20 hidden, 5 output and 20 context units. • Input: bit vector of length 5. • Training: 200 sentences were generated and concatenated to form 1,270 words, or roughly 4,963 letters. • “Many years ago a boy and a girl lived together” • Output: bit vector of length 5.

  26. Discovering “Word” • After 10 epochs the network started to make predictions. • More errors at word boundaries than within words. • Cannot possibly be mere co-occurrence statistics? • Still not a model of word acquisition. • A cue to the boundaries that define the units which must be learned.

  27. Discovering “Lexical Classes” • Elman next considered the problem of discovering nouns, verbs, etc. • Network structure: 31 input, 150 hidden, 150 context and 31 output units. • Input: 31-bit vectors representing 29 words. • 10,000 two- and three-word sentences were generated. • Output: a 31-bit vector. • Different types of verbs and nouns were used: transitive/intransitive, perception/sensation, human/animate/inanimate…

  28. Discovering “Lexical Classes” • After 6 epochs the output achieved the desired level of accuracy. • Using hierarchical clustering analysis revealed the following clusters: • 1st level: Words denoting humans vs. animals. • 2nd level: Words denoting animate vs. inanimate objects. • 3rd level: Words denoting nouns vs. verbs. • …

  29. Pre-wiring A 2nd Pass on Connectionism

  30. Connectionism: Basic Intuitions • Pre-wiring: Advanced Tinkering • Experiment with embedded sentence structures. (Hierarchy again!) • “Boys who chase dogs see girls.” • Task: Predict the next word. • Tinkering with the input: Incremental increase in input complexity. • Tinkering with the network: Incremental increase in network complexity. • Allow the network to go through maturational changes. • In this case: Increase memory capacity.

  31. Starting Small: Less Is More • Constrain the “solution space” • The learner deals with limited variance. • e.g. variance in number, grammatical category, verb type, etc. • “The girl who the dogs that I chased down the block frightened ran away.” • This makes further learning easier. (Or even possible?)

  32. SRN

  33. Hidden Unit Space (3 of 70 dimensions) • The “learned” network partitions the state space such that certain spatial dimensions signal • Differences between nouns and verbs • Singular vs. plural • Depth of embedding • …

  34. Starting Large • Rohde and Plaut (1999) • Reported opposite conclusions (!) with a similar task, input, and architecture to Elman’s. • Starting large: They employed “a more naturalistic language…through the addition of semantic constraints.” • Co-occurrence of certain verbs and nouns • Transitive verbs only act on certain objects. • No gradual increase either in complexity or capacity. • Starting small or large?

  35. How To Evaluate? • Rohde and Plaut: • “there was significant advantage for starting with the full language.” • “we do not yet fully understand what led Elman to succeed in these simulations when we failed.” • Elman’s networks were not allowed enough training time. • Elman’s chosen learning parameters resulted in poor performance. • Given appropriate training parameters, an SRN can effectively learn without external preparation.

  36. Nativist vs. Statistical Approaches • Frequency of occurrence or occurrence per se? • Chomsky’s competence/performance distinction. • Connectionists erase the distinction and lose the evaluation criteria with it! • Are connectionist models too idealized and abstract in terms of meaning and context? • Elman’s reply: Case of “man” and “zog”. • Starting Small/Large hypothesis vs. LAD

  37. 2 Approaches to Neural Networks • Symbolic Approximation (Implementational Connectionism): • Symbolic theories are roughly to connectionist models what classical mechanics is to quantum mechanics. • The former is a high-level compression of the latter. • Statistical Inference Machines: • Language as a bag of words: LSA.

  38. Latent Semantic Analysis • Landauer, Foltz and Laham: 1998. • LSA provides “a method for determining the similarity of meaning of words and passages by analysis of large text corpora”. • The meaning of a word is “a kind of average of the meaning of all the passages in which it appears,” and the meaning of a passage is “a kind of average of the meaning of all the words it contains”.
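The averaging idea can be sketched directly: a passage vector is taken to be the mean of its words' vectors. The 3-dimensional word vectors below are invented for illustration and are not real LSA output, which would come from analyzing a large corpus.

```python
import numpy as np

# Made-up word vectors standing in for real LSA-derived ones.
word_vecs = {
    "physician": np.array([0.9, 0.1, 0.0]),
    "patient":   np.array([0.8, 0.2, 0.1]),
    "frisbee":   np.array([0.0, 0.1, 0.9]),
}

def passage_vector(words):
    """A passage's meaning as the average of its words' vectors."""
    return np.mean([word_vecs[w] for w in words], axis=0)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

medical = passage_vector(["physician", "patient"])
# The medical passage lands closer to "physician" than to "frisbee".
assert cosine(medical, word_vecs["physician"]) > cosine(medical, word_vecs["frisbee"])
```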

  39. LSA Processing Steps: words × documents → word-by-context matrix → rank lowering → reconstruction matrix (‘concepts’)

  40. LSA: Rank Lowering • The low-rank approximation may be preferred because the original matrix • May be too large to compute • Is presumed noisy • Is overly sparse. • Thus it mitigates • Polysemy: Components of polysemous words are added to the components of words that share the meaning. • Synonymy: Expected to merge the dimensions of words associated with similar meanings.

  41. LSA: A Bag of Words A word-by-context matrix

  42. LSA: A Bag of Words Reconstruction matrix

  43. LSA: A Bag of Words • Based on constructing matrices that contain information about the correlations among words and passages. • The dot product of two term (row) vectors gives the correlation between the two terms; the matrix product XXᵀ contains all the term-term correlations. • Enter vectorial semantics.
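The term-term correlation idea can be sketched with a toy word-by-context matrix: rows are terms, columns are passages, and entries are occurrence counts. The matrix values are invented for illustration.

```python
import numpy as np

# Toy word-by-context matrix X (terms x passages); counts are made up.
X = np.array([
    [2.0, 1.0, 0.0],   # "physician"
    [1.0, 2.0, 0.0],   # "patient"
    [0.0, 0.0, 3.0],   # "frisbee"
])

# The dot product of two term (row) vectors measures how strongly the two
# terms co-occur across passages; X @ X.T collects every such term-term
# product in a single matrix.
term_term = X @ X.T
assert term_term[0, 1] == X[0] @ X[1]      # "physician" . "patient"
assert term_term[0, 1] > term_term[0, 2]   # closer to "patient" than to "frisbee"
```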

  44. LSA: SVD • Singular Value Decomposition (SVD): Assume that there exists a decomposition X = UΣVᵀ such that U (whose columns are the left singular vectors) and V (the right singular vectors) are orthogonal, and Σ is a diagonal matrix of singular values.

  45. LSA: SVD • When the k largest singular values and their left and right singular vectors are selected, a rank-k approximation to X is achieved: Xₖ = UₖΣₖVₖᵀ. This approximation • has minimal reconstruction error among rank-k matrices and • creates a concept space.
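The rank-lowering step can be sketched with numpy's SVD: keep the k largest singular values and the matching singular vectors. The toy matrix is invented; real LSA matrices are large and sparse.

```python
import numpy as np

# Invented word-by-context matrix (4 terms x 3 passages).
X = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 1.0, 0.0],
])

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(s) @ Vt

k = 2
X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]              # rank-k reconstruction

# The Frobenius error of the rank-k approximation equals the norm of the
# discarded singular values (here just s[2]); by the Eckart-Young theorem
# this is the smallest error any rank-k matrix can achieve.
assert np.isclose(np.linalg.norm(X - X_k), s[2])
```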

  46. LSA Processing Steps (recap): words × documents → word-by-context matrix → reconstruction matrix (‘concepts’)

  47. LSA: Applications • Semantic clustering: “physician”, “bedside” and “patient”. • Finding similar documents across languages. • Finding relations between terms. • Trivia: LSA scored 60% on a multiple-choice psychology comprehension test after educating itself.

  48. Of Nuns, Sex and Content • LSA might be regarded as • A tool for text analysis. • A model for the acquisition and representation of knowledge. • Worries about LSA: • Word order? • Context? • Landauer and Dumais (1997): “One might consider LSA’s maximal knowledge of the world to be analogous to a well-read nun’s knowledge of sex, a level of knowledge often deemed a sufficient basis for advising the young.”

  49. Adequacy A 3rd Pass on Connectionism

  50. Adequacy • Levelt (1989) recognizes 3 components of language production: • Conceptualization: a “message”, a nonverbal representation, is formed. • Formulation: takes the “message” and turns it into linguistic form. • Articulation: movement of the articulatory organs for producing sounds.
