

  1. Introduction to Language Acquisition Theory Janet Dean Fodor St. Petersburg July 2013 Class 5. The Subset Principle: Essential but impossible?

  2. Today: How far to generalize? • If there is no reliable source of negative data for correcting overgeneralization errors, they must be prevented in advance. • UG cannot do this alone. Not all languages have the same breadth of generalizations (e.g. Verb-Second / aux-inversion). The learner must choose how far to generalize, from among the UG-compatible (parametric) alternatives. • Additional sources of innate guidance = strategies of the learning mechanism: Uniqueness Principle (Class 3); Subset Principle: “Guess conservatively” (Class 1) • CRUCIAL QUESTION: Can the Subset Principle (SP) help learners to find the right balance between conservative (safe) learning and rapid progress toward the target grammar?

  3. Generalize how far? which way? • Children don’t have unlimited memory resources. This makes wise grammar choice more difficult. • A standard assumption: No storage of raw input → Child cannot contemplate a huge stored collection of examples and look for patterns (as linguists do). • Incremental learning = retain or change grammar hypothesis after each input sentence. Memory-free! • So, on hearing a novel sentence, a child must adopt it and some other sentences into her language. (Children don’t acquire only sentences they’ve heard.) But which other sentences? • What size and shape is the generalization a learner formulates, based on that individual input sentence?

  4. LM strategies must also be innate • Even with UG, a learner encountering a novel sentence has to choose from among many possible generalizations. • Even in the P&P framework, the number of possible grammars is in the millions or billions (2^30 ≈ a billion). • The choices that the learning mechanism (LM) makes are evidently not random. All children exposed to a target language arrive at more or less the same grammar (as far as we can tell). • Conclusion: Guessing strategies must also be genetically given. (Perhaps these correspond to linguistic markedness.) • Traditional term for this: an evaluation metric. An important component of the learning mechanism (Chomsky, 1965). It prioritizes grammar hypotheses. • RESEARCH GOAL: Specify the evaluation metric. Is it specific to language, or domain-general (e.g., simplicity)?

  5. Evaluation metric: Err in the direction of under-generalization • If LM undergeneralizes based on one input, the grammar can be broadened later as more examples are heard. • And even if not, the consequences are not disastrous: as an adult, the learner might lack some means of expression in the language. (Would anyone notice?) E.g., someone might not acquire the English subjunctive. • By contrast, if LM generalizes too broadly from an input sentence, with insufficient negative data to correct it later, it becomes a permanent error. Very noticeable if it occurs! • As an adult, s/he would utter incorrect sentences, e.g. *Went she home? I met the boy *(who) loves Mary. In L1 acquisition, this happens rarely, if ever.

  6. Overgeneralization rarely occurs • Children rarely overgeneralize syntactic patterns (despite informal impressions; Snyder 2011). • There are reports in the literature (e.g. Bowerman: *Brush me my hair), but remarkably few (considering how noticeable/cute these errors are). And most are lexical: a wrong subcategorization for the verb brush. • More research needed: Search the CHILDES database to establish the frequency and type of overgeneralization errors evidenced in children’s spontaneous speech. What proportion of them are pure syntax? E.g., How good a dancer is he? * How good dancers are they? • In morphology, there are many overgeneralizations: *foots, *runned. But these can be driven out later by correct forms (Uniqueness Principle, Class 3).

  7. SP’s job is to fend off overgeneralization • SP informal statement: “…the learner must guess the smallest possible language compatible with the input at each step”. (Clark 1992) • JDF: Good thought, but imprecise. If 2 input-compatible languages intersect, LM may freely guess either.  • But it’s in the right spirit: A ‘small’ guess is always safer. It guarantees that if the guess was wrong, there will be a positive datum (trigger) to put it right later. • However: Unexpectedly, we will see that hugging the input tightly with small guesses can impede learning! • A paradox that we must solve: SP is essential for learning without negative evidence. But it impedes acquisition of valid linguistic generalizations.

  8. SP: Adopt a ‘smallest superset’ of the input • On hearing s, the learner may hypothesize either L1 or L2, but not L3. • A ‘smallest superset’ of some set S of sentences is a language that contains S and has no proper subset (among the UG-permitted languages) that also contains S. Both L1 and L2 are smallest supersets of sentence s. [Diagram: sentence s lies in both L1 and L2; L3 is a larger language also containing s. Assume these are the only languages permitted by UG which contain s.]

  9. SP: Adopt a ‘smallest superset’ of the input • On hearing s’, the learner must hypothesize L2, not L3. • Hypothesize L3 only on hearing s”. L3 is now the smallest superset of the current input, s”. [Diagram: s lies in L1 and L2, s’ lies in L2 but not L1, and s” lies only in L3. Assume these are the only languages permitted by UG which contain s, s’ and/or s”.]
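  To make the ‘smallest superset’ criterion of the last two slides concrete, here is a minimal Python sketch (my illustration, not part of the original slides), with the candidate UG-permitted languages modeled as finite sentence sets. The sentence "a" is an invented extra member of L1 that keeps L1 and L2 from standing in a subset relation, mirroring the diagram.

```python
# Toy model of slides 8-9: candidate languages as finite sentence sets.
# "a" is a hypothetical extra sentence that lets L1 and L2 overlap
# without either being a subset of the other.
CANDIDATES = {
    "L1": {"s", "a"},
    "L2": {"s", "s'"},
    "L3": {"s", "s'", "s''", "a"},
}

def smallest_supersets(observed, candidates):
    """Languages that contain every observed sentence and have no proper
    subset (among the candidates) that also contains them all."""
    compatible = {n: lang for n, lang in candidates.items() if observed <= lang}
    return [n for n, lang in compatible.items()
            if not any(other < lang for m, other in compatible.items() if m != n)]

print(smallest_supersets({"s"}, CANDIDATES))     # ['L1', 'L2'] -- either may be guessed
print(smallest_supersets({"s'"}, CANDIDATES))    # ['L2']       -- must not jump to L3
print(smallest_supersets({"s''"}, CANDIDATES))   # ['L3']       -- only now is L3 smallest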

  10. SP works badly in the older TG framework • In terms of transformational rules: SP would favor the maximum context restrictions on any transformation: CORRECT: Invert aux in the context of a fronted negative. Not once did I think of my own safety. WRONG: Invert aux in the context of a fronted element. *At once did I think of my own safety. • SP favors maximum feature specifications for what fronts. CORRECT: Invert [+AUX, +V, -N]. WRONG: Invert [+V, -N]. *Not once thought I of my own safety. • In other words: In a rule-based framework, SP requires the learner to prefer the most complex rules! Simplicity yields generality. Not a plausible evaluation metric.

  11. Subset Principle in a parametric model • Good: P&P theory doesn’t suffer from that dangerous relation between simplicity (good) and over-generality (incurable). Because all P-values are equally simple. • What is required: If parameter triggering is to be automatic & effortless (as the theory claims), LM must be able to tell without effort when a subset-superset choice presents itself, and which of the competing grammars yields the subset. • Is it sufficient for each parameter to have a default value? Simple Defaults Model. (Manzini & Wexler 1987) • That would make it easy to obey SP! If both values of a parameter P are compatible with the input, adopt the default. • But no. Natural language examples suggest the parameters must also be priority-ordered with respect to each other. (See below: the Ordered Defaults Model)

  12. First, let’s check: Do s-s relations actually occur in the natural language domain? • Yes. Any parameter with values optional vs. obligatory. • Also, every historical process of either addition or loss (but not both at once!) creates an s-s relation. Imagine: Italian with split-DP (as in Latin). Adj_i Verb Noun_i • Actual example: Expansion of the 's genitive since late Middle English, without loss of the of-genitive. • But not just s-s relations between whole languages. Must avoid superset errors parameter by parameter. • E.g. If target is English, don’t adopt long-distance anaphora – even if that language as a whole isn’t a superset of English (e.g., learner has no passives yet). • Why? Because there’s no link between LDA and Passive that could trigger a retreat on LDA.

  13. Binding theory parameters (Manzini & Wexler, 1987) • Binding principles: Anaphors must be locally bound. Pronouns must be non-locally bound. What counts as local? • 5 degrees of locality. An anaphor must be bound in the minimal domain that includes it & a governor for it, & has: a. A subject → more local (fewer anaphors, more pronouns) b. An Infl c. A tense d. A ‘referential’ tense e. A ‘root’ tense → less local (more anaphors, fewer pronouns) • Creates nested subset languages (a 5-valued parameter!). Proof that s-s relations exist in natural languages. • Other M&W assumptions: The Independence Principle (“the subset relations that are determined by the values of a parameter hold no matter what the values of the other parameters are.”) • M&W assumed the Simple Defaults Model.
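  A hypothetical sketch of how SP applies to this 5-valued parameter, assuming (as the nesting above implies) that, for anaphors, each less local value licenses a superset of binding configurations. The value names are just labels for illustration.

```python
# M&W's governing-category parameter, most local (a) to least local (e).
# For anaphors, each later value licenses a superset of binding configurations.
LOCALITY = ["a: subject", "b: Infl", "c: tense", "d: referential tense", "e: root tense"]

def sp_value_for_anaphor(compatible_values):
    """Under SP, adopt the most local value still compatible with the input,
    i.e. the earliest one in the nesting order."""
    for value in LOCALITY:
        if value in compatible_values:
            return value

# If the input so far is compatible with values c, d and e (e.g. the child has
# heard an anaphor bound outside its minimal Infl domain), SP dictates c:
print(sp_value_for_anaphor({"c: tense", "d: referential tense", "e: root tense"}))
```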

  14. Do children actually apply SP? • There are cases in which it is claimed that children do not obey SP: they overgeneralize and then retreat. • Chien & Wexler (1990) found that children apply Principle A reliably, at least by age 6; but they make Principle B errors. Mama bear washed her. Compatible w. picture of self-washing. • Is Principle B not innate? Does it mature late? If it’s innate and accessible early, this is a clear violation of SP. • HOWEVER: C&W also found that the children did not make mistakes when the antecedent was quantified: Every bear washes her. • Explanation: Principle B blocks co-indexation of a pronoun with a local antecedent. But only a pragmatic principle prevents contra-indexed expressions from having the same referent. It is this pragmatic principle that the children don’t yet know. They do well with the quantified examples because that’s not a matter of coreference at all.

  15. Do children apply SP? Another example. • Déprez & Pierce (1993) claim that children sometimes overgenerate (= SP violation), and later retreat. • At approx. 1½ to 3 years, learners of English have optional raising of the subject from its VP-internal base position. No my play my puppet. Neg Subj Verb… (subj is low) He no bite you. Subj Neg Verb… (subj has raised) • “…the errors described here as the result of failure to raise the subject out of the verb phrase are attributed to a valid grammatical option to assign nominative Case to that position” • How do these learners retreat later? Not clearly addressed. • “earlier stages may be stages in which the grammar has yet to ‘stabilize’ on unique settings of various parameters” • So: Is this a real SP-violation? Or just vacillation between two values (maybe due to confusion in analyzing the input – “no”)?

  16. SP: summary so far • Subset-superset relations do hold among natural languages. • Not all are attributable to a subset/superset choice within a single parameter. (See examples, next slide.) • A Subset Principle that excludes global subset relations between parameters is more complex for a learner to impose. • Nevertheless, child learners in fact commit few or no superset errors. Some apparent violations of SP are explicable in other terms (e.g., pragmatic immaturity; input confusion). • So our next question: How do they do it?

  17. How can LM know what’s a superset to avoid? • Simple Defaults Model: Easy to read off s-s relations. Prefer 0 over 1. (0 = default; 1 = marked value) • E.g. Grammar 01101 licenses languages that have proper subsets licensed by grammars 00101, 01001, 01100, and their proper subsets: 00001, 00100, 01000 and 00000. • So avoid 01101 if any of these others fits the input. • But in the natural language domain, there are s-s relations that exceed the bounds of a single parameter. • Ex-1: WH-fronting (subset) versus Scrambling that includes scrambling of wh-items (superset; WH may be initial or not). A choice for the learner to make: English versus Japanese. • Ex-2: Optional topic (subset) versus obligatory topic that is optionally null (superset; obligatory XPs can be missing). A choice for the learner to make: English versus German.
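  Under this encoding the within-parameter subset relations can be read off mechanically. A small Python sketch (my illustration; the bit-string encoding is the slide’s, the code is not) lists, for a given grammar, all the grammars obtained by switching some of its marked values back to the default, which are exactly the subset grammars the slide enumerates for 01101.

```python
from itertools import combinations

def sub_grammars(g):
    """All grammars obtained from g by resetting at least one of its marked
    values (1) back to the default (0); under the Simple Defaults Model these
    license proper subsets of g's language."""
    marked = [i for i, bit in enumerate(g) if bit == "1"]
    subs = []
    for r in range(len(marked)):          # keep 0 .. len(marked)-1 marked bits
        for keep in combinations(marked, r):
            subs.append("".join("1" if i in keep else "0" for i in range(len(g))))
    return subs

print(sub_grammars("01101"))
# ['00000', '01000', '00100', '00001', '01100', '01001', '00101']
```

  As the slide notes, this easy bookkeeping breaks down for subset relations that span parameters (Ex-1 and Ex-2).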

  18. Ordered Defaults Model (Fodor & Sakas) • Child hears: Which apples shall we buy? Decision: Which parameter to set to its marked value? • SP requires the prioritization (0 = default; 1 = marked value): [Diagram: from the current grammar Wh-movt 0 / Scrambling 0, prefer Wh-movt 1 / Scrambling 0 over Wh-movt 0 / Scrambling 1.] • How could this crucial prioritization be economically mentally represented by learners? • One way: Innately order the parameters such that parameters earlier in the ordering get set to their marked values before parameters later in the ordering. Cf. Gold’s ‘enumeration’! • Priority: 10000 > 01000 > 00100 etc. And for two marked values: 11000 > 10100 > 10010 etc.
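  A sketch of this enumeration, assuming (as the slide’s examples suggest, though it does not say so explicitly) that grammars with fewer marked values always take priority over grammars with more. This is an illustration of the ordering idea, not the Fodor & Sakas implementation.

```python
from itertools import combinations

def grammars_in_priority_order(n):
    """Grammars as bit strings (0 = default, 1 = marked), ordered so that
    fewer marked values come first, and among grammars with the same number
    of marked values, those marking earlier parameters come first."""
    order = []
    for k in range(n + 1):                       # number of marked values
        for positions in combinations(range(n), k):
            order.append("".join("1" if i in positions else "0" for i in range(n)))
    return order

print(grammars_in_priority_order(5)[:8])
# ['00000', '10000', '01000', '00100', '00010', '00001', '11000', '10100']
```

  The learner would then adopt the first grammar in this list compatible with the current input, much like testing hypotheses along Gold’s enumeration.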

  19. But: Where does the parameter ordering come from? • Learning strategies cannot be learned! Must be innate. • So: Learners couldn’t learn this parameter ordering that enforces SP. So what is its source? • Could it be innate, supplied by UG? But HOW? (a) Could evolution have crafted a precise mental map of the s-s relationships between all possible natural languages? (b) Or could the LM calculate these complex subset-superset relations on-line? Surely a serious computational overload! • Neither is plausible! Perhaps instead, the ‘black hole’ explanation. There is an innate mental ordering but it’s completely arbitrary, not shaped by evolution. The learnable languages (which are the only ones linguists know about) are the ones that just happen to precede their supersets.

  20. SP is necessary, but can be harmful too • SP can guide learners to make safe guesses where a choice must be made in generalizing from a specific input. It can prevent learners from overshooting the target language. • But: it interacts badly with incremental learning. • For an incremental learner, SP may cause learning failures due to undershooting the target language. • When setting a parameter to accommodate a new sentence, SP requires LM to give up much of what it had acquired from previous input. (Example below.) We call this retrenchment. • Why retrench? Because the combination of a previously hypothesized language plus a new hypothesis might be a superset of the target, violating SP. (See diagram.) Must clean up when abandoning a falsified grammar.

  21. SP demands maximum retrenchment! • Learner first heard s, and happened to guess Lcurrent. • Learner now hears t, so realizes that Lcurrent is wrong. • Learner doesn’t know which parts of Lcurrent are wrong. • So to be safe, all of it must be discarded, except what input sentence t entails in conjunction with UG. [Diagram: everything in Lcurrent outside what t plus UG entails is striped.] All the striped sentences must be ditched.

  22. Clark’s example (Class 2): retrenchment needed • Without retrenchment, one mistake can lead to another. • Clark (1989) showed a sequence of steps that can end in a superset error, although no one step violates SP. (1) John believes them(ACC) to have left. • Is this exceptional case marking (ECM)? Or structural case assignment (SCA)? (2) They believe themselves to be smart. • Is the anaphor in the binding domain of they? (yes, if ECM) Or is this long-distance anaphora (LDA)? (yes, if SCA)

  23. Without retrenchment → errors • Suppose the target is English (ECM; no LDA). • Dangerous parameter-setting sequence: Guess SCA for sentence (1). Wrong, but no problem. Then guess LDA for sentence (2). Still OK so far! • Later, learner hears an unambiguous trigger for ECM: (3) Sue is believed by John to have left. • On adopting ECM, learner must give up SCA. • And must also give up LDA (which was based on SCA). Otherwise, would mis-generate: (4) * He wants us to like himself.

  24. SP demands massive retrenchment! • Retrenchment must eliminate all sentences except those that LM knows are valid. • In incremental learning (usually assumed), LM cannot consult previously heard sentences. Only the current one! • In the P&P framework: SP requires giving up all marked parameter values except those that LM knows are correct. • Unless LM learns only from fully unambiguous triggers (unrealistic?), it can’t be sure that any previously set parameters were set correctly. So give up all that aren’t entailed by the current input – set them back to their defaults. • In real life, retrenchment would be extreme. Child hears “It’s bedtime.” Must discard all knowledge previously acquired except what is entailed by that sentence plus UG.
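  A hypothetical sketch of what this retrenchment step amounts to for a bit-string grammar (the parameter indices and names here are invented for illustration):

```python
def retrench(current_grammar, entailed_marked):
    """SP-mandated retrenchment: a parameter keeps (or acquires) its marked
    value only if the current sentence plus UG entails it; every other
    parameter reverts to its default."""
    return [1 if i in entailed_marked else 0 for i in range(len(current_grammar))]

# Toy example with invented parameter indices: the learner had marked
# parameters 1, 2 and 4 (say topicalization, wh-movement, aux-inversion),
# but "It's bedtime." entails none of them.
grammar = [0, 1, 1, 0, 1]
print(retrench(grammar, set()))   # [0, 0, 0, 0, 0] -- all prior settings given up
print(retrench(grammar, {2}))     # [0, 0, 1, 0, 0] -- only an entailed value survives
```

  Notice that the previous grammar contributes nothing here except its length: under strict SP plus incrementality the resulting grammar depends only on the current sentence, which is exactly the problem these slides are pointing at.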

  25. SP demands excessive retrenchment • Child hears “It’s bedtime” and discards topicalization, wh-movement, aux-inversion, etc. • Could re-learn them. But retrenchment applies again and again. No knowledge could be retained for long. • Without SP → overgeneration. With SP → massive undergeneration, if incremental. • The target grammar could be acquired only if all its parameters were set by a single input sentence! • This disastrous interaction between SP and incremental learning wasn’t noticed prior to Fodor & Sakas (2005). • Most SP discussions implicitly assume without comment that LM has unlimited access to prior input.

  26. How could SP be reconciled with incrementality? • How can retrenchment be tamed? By adding memory, of one kind or another. • Memory for previous inputs? Too many? But maybe OK to store only inputs that triggered a parameter change. Problem: Now LM has to fit parameter values to a collection of accumulated input sentences. Could that be done by triggering?? • Instead: Memory for previously disconfirmed grammars? Then SP demands only that LM adopt a ‘smallest language’ which fits the current input sentence and which hasn’t yet been disconfirmed. The ‘smallest’ would get larger and larger as learning proceeds → less retrenching. Problem: How to mentally represent which grammars have been disconfirmed? A huge list? A point on an enumeration? – but Pinker’s objection!

  27. A lattice representing all s-s relations (Fodor, Sakas & Hoskey 2007) • This (as shown) is less than 1% of the total lattice for the 3,072 languages in the CoLAG domain. But only 7-deep. • LM must select a grammar from the lowest level. • When a grammar is disconfirmed, delete it. The lattice shrinks from the bottom up → larger languages. [Diagram: a portion of the lattice, with supersets toward the top and subsets toward the bottom.]
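  A rough sketch of the lattice model’s selection-and-deletion dynamics, with toy languages as finite sentence sets (an illustration only, not Fodor, Sakas & Hoskey’s implementation):

```python
# The learner picks a minimal ("lowest-level") grammar fitting the input among
# those not yet disconfirmed; disconfirmed grammars are deleted, so the
# minimal choices grow larger as learning proceeds.
languages = {"L1": {"s"}, "L2": {"s", "s'"}, "L3": {"s", "s'", "s''"}}
live = {"L1", "L2", "L3"}          # not-yet-disconfirmed grammars

def pick_minimal(sentence):
    """A live grammar whose language contains the sentence and which has no
    live proper subset that also contains it."""
    fits = [g for g in live if sentence in languages[g]]
    for g in fits:
        if not any(languages[h] < languages[g] for h in fits if h != g):
            return g

print(pick_minimal("s"))           # 'L1'
live.discard("L1")                 # suppose L1 is later disconfirmed (e.g. by hearing s')
print(pick_minimal("s"))           # 'L2' -- the minimal choice is now a larger language
```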

  28. The lattice model would work, but…. • But, a very ad hoc solution. Why would this huge data structure be part of a child’s innate endowment? • Worse still: It codes relations between E-languages, whereas we would like to assume that the human mind stores only I-language facts. • The only alternative to adding memory is: Trust only unambiguous input. • SP does not demand any retrenchment if LM knows that all the marked parameter values it has adopted so far are correct – because they were adopted in response to unambiguous triggers. • We’ll explore this solution next time (Class 6).

  29. Writing assignment • A young child acquires the syntax of Russian in 5 or 6 years. Many linguists have worked for many decades (centuries!) to discover the syntactic properties of Russian. • Are children smarter than linguists? If not, what else might explain this discrepancy? • In what relevant respects do linguists have stronger resources than children? In what respects do children have stronger resources? • 2 or 3 pages, please, to hand in at Monday’s class.
