
Presentation Transcript


  1. The Subset Principle: Consequences and Conspiracies Mayfest 2006, University of Maryland William Gregory Sakas, City University of New York (CUNY)

  2. Joint work with Janet Dean Fodor and Arthur Hoskey CUNY-CoLAG: CUNY Computational Language Acquisition Group http://www.colag.cs.hunter.cuny.edu Fodor, J. D. & Sakas, W. G. (2005). The Subset Principle in Syntax: Costs of Compliance. Journal of Linguistics 41, 513-569.

  3. Overview for today: • Brief introduction to Gold-style learnability • The Subset Principle and Conspiracies • Innate enumeration: A possible way to thwart the conspiracies

  4. [Diagram: a state space whose nodes are grammars G0 ... G6, including Gtarg] • Syntax acquisition can be viewed as a state-space search • nodes represent grammars, including a start state and a target state • arcs represent a possible change from one hypothesized grammar to another

  5. sL(G0) sL(G1) sL(G2) sL(G3) sL(Gtarg) sL(G0) sL(G1) sL(G2) sL(G3) Gtarg G1 G2 G3 G0 Gold’s grammar enumeration learner (1967) where s is a sentence drawn from the input sample being encountered by the learner

  6. sL(G0) sL(G1) sL(G2) sL(G3) sL(Gtarg) sL(G0) sL(G1) sL(G2) sL(G3) Gtarg G1 G2 G3 G0 Gold’s grammar enumeration learner (con’t) • the learner is error-driven • error-driven learners converge on the targetin the limit (no oracle required) • no limit on how far along the enumeration the learner can go on a single sentence • in Gold’s original work, s was a set of accumulated sentences • subsets precede supersets in the enumeration

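Below is a minimal runnable sketch of this error-driven, identification-by-enumeration procedure over a toy domain of finite languages; the languages, sentences, and function names are illustrative assumptions, not part of the talk.

```python
# A minimal sketch of Gold-style, error-driven identification by enumeration
# over a toy domain of finite languages. The enumeration is ordered so that
# subsets precede supersets. Toy data and names are illustrative only.

LANGUAGES = [
    {"walked"},                               # G0
    {"walked", "she walked"},                 # G1
    {"walked", "she walked", "walked she"},   # G2 (target in this toy run)
]

def enumeration_learner(text, languages):
    """Keep the current hypothesis until an input sentence falls outside it,
    then advance along the enumeration until a compatible grammar is found."""
    i = 0  # index of the current hypothesis in the enumeration
    for s in text:
        while s not in languages[i]:   # error-driven: change only on failure
            i += 1                     # may move several steps on one sentence
    return i                           # no oracle: the learner never "knows" it is done

text = ["walked", "she walked", "walked she", "she walked"]
print("converged on G%d" % enumeration_learner(text, LANGUAGES))  # -> converged on G2
```
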
  7. Definitions Learnability: Under what conditions is learning possible? Feasibility: Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work? A domain of grammars H is learnable iff ∃ a learner such that ∀ G ∈ H and ∀ (fair) texts generable by G, the learner converges on G.

  8. An early learnability result (Gold, 1967) Exposed to input strings of an arbitrary target language Ltarg = L(G) where G ∈ H, it is impossible to guarantee that a learner can converge on G if H is any class in the Chomsky hierarchy.

  9. Angluin's Theorem (1980) A class of grammars H is learnable iff for every language Li = L(Gi), Gi ∈ H, there exists a finite subset D of Li such that no other language L(G), G ∈ H, includes D and is included in Li. [Diagram: D ⊆ L(G) ⊂ L(Gi); if such an intermediate language can be generated by a grammar in H, H is not learnable!]

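For intuition only, here is a brute-force check of the telltale condition over a toy domain of finite languages; this is my illustration, and Angluin's theorem is stated for arbitrary indexed families, which this finite-set representation cannot capture.

```python
# Brute-force check of Angluin's (1980) telltale condition over a toy domain of
# finite languages: every Li must have a finite subset D such that no other
# language in the domain includes D while being properly included in Li.
# Caveat: any finite language is its own telltale, so failures of the condition
# arise only with infinite languages, which this toy representation cannot hold.
from itertools import chain, combinations

def subsets(lang):
    """All subsets of a finite language (candidate telltale sets D)."""
    lang = list(lang)
    return (set(c) for c in chain.from_iterable(
        combinations(lang, r) for r in range(len(lang) + 1)))

def has_telltale(Li, domain):
    return any(
        not any(D <= L and L < Li for L in domain if L != Li)
        for D in subsets(Li))

def angluin_learnable(domain):
    return all(has_telltale(Li, domain) for Li in domain)

print(angluin_learnable([{"a"}, {"a", "b"}, {"a", "b", "c"}]))  # -> True
```
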
  10. Relevant assumptions (for today) about the Learning Mechanism (LM): • Incremental – no memory for either past hypotheses or past input sentences • receives only ‘positive’ inputs from the linguistic environment (the target language) • error-driven

  11. The Subset Principle (SP) Avoid an overgeneralization hazard: LM adopting a language hypothesis that is overly general (OverGen-L), from which there is no retreat. OverGen-L = a language containing all the sentences of the target and then some; i.e. a superset of the target language. A learning problem noticed early on by Gold (1967).

  12. [Diagram: example languages and their sentences. Safe-L: "Walked." Targ-L: "She walked.", "Walked she." OverGen-L: "Eated.", "She eated."]

  13. An interpretation adopted by many linguists, put forth by Berwick (1985) and by Manzini and Wexler (1987), who coined the term Subset Principle (SP): LM must never hypothesize a language which is a proper superset of another language that is equally compatible with the available data. Given a choice, don't choose the superset hypothesis! Note that simply avoiding a superset of the target is an impossible strategy, since LM has no knowledge of what the target is! Faced with a choice, LM must always pick the subset.

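A tiny sketch of this reading of SP over toy finite languages: among the hypotheses compatible with the data seen so far, rule out any that is a proper superset of another compatible hypothesis. The sets and names are my illustrations, not the paper's.

```python
# Toy illustration of the Berwick / Manzini & Wexler reading of SP: never pick
# a hypothesis that is a proper superset of another language equally
# compatible with the available data. Sets and names are illustrative only.

def sp_sanctioned(data, domain):
    """Return the hypotheses SP allows given the data observed so far."""
    compatible = [L for L in domain if data <= L]           # fits the data
    return [L for L in compatible
            if not any(other < L for other in compatible)]  # not a superset of a rival

domain = [{"walked"},
          {"walked", "she walked"},
          {"walked", "she walked", "walked she"}]
print(sp_sanctioned({"walked"}, domain))
# -> [{'walked'}] : faced with a choice, LM must pick the subset
```
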
  14. Can SP be obeyed under our Incremental assumption? Well, it depends on how one interprets "available data". Many interesting problems come to light when "standard", psycholinguistically attractive memory constraints (e.g. Incrementality) are applied to an LM working under SP.

  15. A Safe Incremental SP Interpretation (SP-d' in the paper) Inc-SP: When LM's current language is incompatible with a new input sentence s, LM should hypothesize a UG-compatible language which is a smallest language containing s. A smallest language is one that contains s and has no proper subset that also contains s. Notes: 1) There may be more than one smallest language containing s. If so, any of them is safe for LM to choose. 2) Our use of "smallest" does not indicate any relation of set size (cardinality). A smallest language might be the "largest" language in the domain!

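The sketch below gives a concrete toy rendering of "a smallest language containing s" over finite languages: one with no proper subset in the domain that also contains s. The domain and names are illustrative assumptions.

```python
# Toy rendering of Inc-SP's "smallest language containing s": a language that
# contains s and has no proper subset in the domain that also contains s.
# Note this is not about cardinality. Domain and names are illustrative only.

def smallest_languages_containing(s, domain):
    candidates = [L for L in domain if s in L]
    return [L for L in candidates
            if not any(other < L for other in candidates)]

domain = [{"a"}, {"a", "b"}, {"c", "d", "e", "f"}, {"a", "b", "c", "d", "e", "f"}]
print(smallest_languages_containing("c", domain))
# -> [{'c', 'd', 'e', 'f'}] : a "smallest" language need not be small in size
```
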
  16. A Safe Incremental definition of SP (cont'd) Inc-SP safely protects against an OverGen-L, but it leads to problematic "retrenchment": previous facts that were correctly learned may have to be abandoned if the input does not exhibit them; each and every sentence is essentially the first sentence ever encountered (Janet's focus at a talk here last semester).

  17. Learnability under Inc-SP A domain is not learnable unless every potential target, targ-L, contains a subset-free-trigger. Subset-free-trigger (sft): a sentence t that is an element of targ-L but is not in any proper subset of targ-L (i.e., targ-L is the smallest language containing t). sft's are not necessarily unambiguous triggers, so a single encounter does not instantly identify targ-L. (Note that sft's function similarly to an Angluin telltale set of cardinality 1.)

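A toy sketch of the subset-free-trigger test: t is an sft for targ-L iff targ-L is a smallest language containing t. The domain and names are my illustrations.

```python
# Toy sketch of subset-free triggers: t is an sft for targ_L iff no proper
# subset of targ_L in the domain also contains t, i.e. targ_L is a smallest
# language containing t. Domain and names are illustrative only.

def subset_free_triggers(targ_L, domain):
    return {t for t in targ_L
            if not any(L < targ_L and t in L for L in domain)}

domain = [{"a"}, {"a", "b"}]
print(subset_free_triggers({"a", "b"}, domain))
# -> {'b'} : "a" is not an sft because the subset {"a"} also licenses it
```
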
  18. Even a finite domain of languages might be unlearnable if some potential target does not contain an sft. Example: Targ-L = Li ∪ Lj. Targ-L does not have a subset-free-trigger, since by construction Li and Lj cover all the sentences of Targ-L and both are smaller languages than Targ-L. [Diagram: Li and Lj jointly covering Targ-L] Inc-SP forces LM to chronically oscillate between the two smaller languages, never reaching the target language.

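Here is a concrete toy instance of this conspiracy, simulating an Inc-SP learner: Li and Lj jointly cover Targ-L, so Targ-L has no sft and the learner flips between the two smaller languages forever. All sets and names are illustrative.

```python
# Toy simulation of the conspiracy: Li and Lj jointly cover targ_L, so targ_L
# has no subset-free trigger and an Inc-SP learner oscillates between the two
# smaller languages, never hypothesizing targ_L. Illustrative sets only.

Li, Lj = {"a"}, {"b"}
targ_L = Li | Lj
domain = [Li, Lj, targ_L]

def inc_sp_step(current, s, domain):
    if s in current:                       # error-driven: no error, no change
        return current
    candidates = [L for L in domain if s in L]
    return next(L for L in candidates      # a smallest language containing s
                if not any(other < L for other in candidates))

hyp = Li
for s in ["a", "b", "a", "b"]:             # a fair text for targ_L
    hyp = inc_sp_step(hyp, s, domain)
    print(sorted(hyp))                     # flips between ['a'] and ['b']; never ['a', 'b']
```
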
  19. Also, though necessary, sft's are not sufficient to ensure acquisition. [Diagram: three languages Li, Lj and Targ-L with no subset relations among them] All the sentences of Targ-L are sft's, but all three languages stand on equal footing in terms of Inc-SP: it appears to Inc-SP that there are no supersets in the domain, so Inc-SP has nothing to say. If LM utilizes a metric which evaluates Li and Lj as "closer" to each other than either is to Targ-L, LM is forced to chronically oscillate between them, never reaching the target language.

  20. Summary so far: Inc-SP is a safe interpretation of an Incremental version of SP. It is necessary for a learner that obeys Inc-SP to make use of subset-free-triggers to assure convergence on the target.

  21. Problems with Inc-SP and sft's: 1) Previous facts are unlearned: no batch processing allowed, no gradualism; when an sft triggers the target, it might do so directly from a very different language. 2) sft's are not sufficient to attain targ-L, since some superset conspiracies are effectively hidden from Inc-SP. 3) Linguistically meaningful domains may lack sft's.

  22. SP is basically inconsistent with incremental learning. Solution: Give LM memory for past hypotheses. One way might be to borrow an important notion from the formal learning community and try to give it psycho-computational reality: give LM an "innate" enumeration (listing) that posits subset languages in the domain before their supersets. An enumeration offers a solution to all three problems on the previous slide. How?

  23. Identification by Enumeration learners were never taken up by the (psycho)linguistics community. Why not? The notion of innateness is certainly not new! For Gold and many other folk since, LM is endowed not only with an enumeration but also with the ability to "skip ahead" in the enumeration to the next language hypothesis compatible with X (X = the data encountered; could be 1 or more sentences). The motivation for this was to determine what a powerful learner can't learn, to set a strong learnability bound. Very reasonable! But from the viewpoint of developmental or psychocomputational modeling, we need to explain how LM can exploit the enumeration.

  24. A psychocomputational parser/learner that could, plausibly, exploit an enumeration. What would be required of an Enumeration-LM? A possible strategy: get a sentence; if it is consistent with the currently hypothesized language, do nothing (since error-driven); otherwise, try the next hypothesis in the enumeration; if still inconsistent, try the next; if still inconsistent, try the next; etc. This works, but it requires excessive computation: possibly thousands/millions/billions of grammars to check (see Pinker 1979).

  25. A psychocomputational parser/learner that could, plausibly, exploit an enumeration. Another strategy: if a sentence is inconsistent with the currently hypothesized language, generate all parses, order the parses by the grammars that license them, and "jump" to the grammar that appears first in the enumeration. Psychologically infeasible: thousands, even millions, of parses per (ambiguous) sentence.

  26. But we have a model: the Structural Triggers Learner (Fodor, 1998; Sakas & Fodor, 2001), which can fold all the grammars into one supergrammar and can decode the generating grammar serially from each parse tree.

  27. [Diagram: the Parser takes a Sentence, the Current grammar, and Parameter-value treelets + UG principles, and outputs a Sentence structure] • STL Serial Algorithm: At a choice point, go to the pool of parameter treelets, take what is needed (perhaps several possibilities) and fold the chosen treelets into the current grammar.

  28. A psychocomputational parser/learner that could, plausibly, exploit an enumeration. Another strategy, which might work out: parse serially with the supergrammar; if there is no choice point (an unambiguous trigger), jump to the target and stop; otherwise, complete the parse with the parameter-value treelets from the supergrammar and jump to the earliest grammar in the enumeration that is indicated by the parse. This is a "resource-palatable" model, but there is a danger of overshooting in the enumeration into a superset of the target, because the parse does not yield perfect information :-(

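A highly schematic sketch of this "decode then jump" step, under the strong assumption that a supergrammar parse of s can be decoded into the set of enumeration indices of grammars that license it; parse_with_supergrammar is a hypothetical stand-in for the STL's serial parser, not an actual STL or CoLAG API.

```python
# Schematic sketch of "parse with the supergrammar, then jump to the earliest
# grammar in the enumeration indicated by the parse". parse_with_supergrammar
# is a hypothetical stand-in for the STL's serial parser; the toy version below
# simply reports every grammar whose language contains s.

def jump(current, s, languages, parse_with_supergrammar):
    if s in languages[current]:
        return current                        # error-driven: no error, no change
    indicated = parse_with_supergrammar(s)    # enumeration indices indicated by the parse
    return min(indicated)                     # earliest grammar in the enumeration

languages = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
toy_parse = lambda s: {i for i, L in enumerate(languages) if s in L}
print(jump(0, "b", languages, toy_parse))     # -> 1
# If the parse is ambiguous and under-informative, min(indicated) can still
# overshoot into a superset of the target.
```
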
  29. A psychocomputational parser/learner that could, plausibly, exploit an enumeration. One hope is that parsing heuristics or strategies could be ranked in terms of subset-superset relations: parse serially with the supergrammar; if there is no choice point (an unambiguous trigger), jump to the target and stop; otherwise, at each choice point, pick the choice that will eventually yield a subset grammar rather than a superset; then jump to the earliest grammar in the enumeration that is indicated by the (now full) parse.

  30. Summary: • Gold-style learnability offers bounds on what can and can't be learned; it introduces the fatal consequences of overgeneralization. • Inc-SP is (as far as we know) the only existing safe definition of SP under incrementality assumptions, but it needs subset-free-triggers. • An innate enumeration together with a decoding learner can possibly be used to implement a psychologically plausible model of SP; it will help LM avoid learning failures and help to stave off excessive retrenchment. • But still not "the" answer. Suggestions welcome!

  31. Thank you.
