The Vexed Problem of Choice: Reflections on Experimental Design and Statistics with Corpora

That vexed problem of choice: reflections on experimental design and statistics with corpora. ICAME 33, Leuven, 30 May-3 June 2012. Sean Wallis, Jill Bowie and Bas Aarts, Survey of English Usage, University College London. {s.wallis, j.bowie, b.aarts}@ucl.ac.uk

Outline • Introduction • Definitions • Refining baselines and the ratio principle • Surveying ‘absolute’ and ‘relative’ variation • Potential sources of interaction • Employing alternation analysis • Objections • Conclusions

Introduction • Research questions are really about choice • If speakers had no choice about the words or constructions they used, language would be invariant! • Lab experiments • Press button A or button B • Corpus • Speakers may choose construction A or B • But they can only actually choose one, A, at each point • We have to infer the other type, B, counterfactually • Identifying alternates is often non-trivial

Mutual substitution • Mutual substitution A ↔ B • Given a corpus, identify all events of Type A that alternate with events of Type B, such that A is mutually replaceable by B without altering the meaning of the text. • Replacement • B replaces A if B increases, and vice versa • p(A) + p(B) + ... = 1 • Freedom to vary • p(X) ∈ [0, 1] • Ideal: eliminate invariant Type C terms

Mutual substitution • Mutual substitution A ↔ B • Pronoun who/whom • A = whom • B = who (objective) • But whom is limited to objective case • C = who (subjective) • We therefore limit alternation to Objects • If whom is used ‘incorrectly’ as a Subject, it has an additional constraint (social disfavour)

True rate of alternation • True rate of alternation • If A ↔ B • p(A | {A, B}) = F(A) / (F(A) + F(B)) • Proportion (fraction) of all cases that are Type A • we use p(A) as a shorthand for p(A | {A, B}) if the baseline {A, B} is stated • Contingency tables

| IV | DV: A | DV: B | Total | probability |
|---|---|---|---|---|
| condition 1 | f1(A) | f1(B) | f1(A)+f1(B) | p1(A) |
| condition 2 | f2(A) | f2(B) | f2(A)+f2(B) | p2(A) |
| Total | F(A) | F(B) | F(A)+F(B) | p(A) |
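The contingency table above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `alternation_rate` and the frequencies in `table` are hypothetical, chosen only to show how p(A | {A, B}) is computed per condition and overall.

```python
def alternation_rate(f_a: int, f_b: int) -> float:
    """Return p(A | {A, B}) = F(A) / (F(A) + F(B))."""
    total = f_a + f_b
    if total == 0:
        raise ValueError("F(A) + F(B) must be greater than zero")
    return f_a / total


# Hypothetical frequencies (f(A), f(B)) for each value of the independent variable.
table = {
    "condition 1": (30, 70),
    "condition 2": (45, 55),
}

# Row probabilities p1(A), p2(A).
rates = {cond: alternation_rate(f_a, f_b) for cond, (f_a, f_b) in table.items()}

# Column totals F(A), F(B) and the overall rate p(A).
F_A = sum(f_a for f_a, _ in table.values())
F_B = sum(f_b for _, f_b in table.values())
overall = alternation_rate(F_A, F_B)

for cond, p in rates.items():
    print(f"{cond}: p(A) = {p:.3f}")
print(f"Total: p(A) = {overall:.3f}")
```

Only cases of Type A and Type B enter the denominator, so the rate is free to range over the whole interval from 0 to 1.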

True rate of alternation • Shall/will alternation over time in DCPSE (Aarts et al., forthcoming)

True rate of alternation • Shall/(will+’ll) alternation over time in DCPSE (Aarts et al., forthcoming)

True rate of alternation • Logistic ‘S’ curve assumes freedom to vary • p(X) ∈ [0, 1] • as do Wilson confidence intervals • [Figure: logistic curves of p (0 to 1) against time t for shall/(will+’ll) and shall/’ll]
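The slide names Wilson confidence intervals without giving a formula. The sketch below uses the standard Wilson score interval for a binomial proportion; the function name, the default z value (two-tailed 95%) and the example figures p = 0.9, n = 20 are assumptions for illustration. It also computes the upper bound of the simple normal ("Wald") approximation for comparison.

```python
from math import sqrt


def wilson_interval(p: float, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for an observed proportion p out of n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical observation: 18 of 20 alternating cases are Type A.
p, n = 0.9, 20
lo, hi = wilson_interval(p, n)

# Normal approximation upper bound, for comparison only.
wald_hi = p + 1.959964 * sqrt(p * (1 - p) / n)

print(f"Wilson: ({lo:.4f}, {hi:.4f})")
print(f"Wald upper bound: {wald_hi:.4f}")
```

With these figures the normal approximation overshoots 1, whereas the Wilson interval stays inside [0, 1], which is the sense in which it respects freedom to vary.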

Refining baselines • Over-general baselines • conflate opportunity and use • ‘normalisation’ per million words • implies that every word other than A is Type B! • is this plausible? • ‘Art’ of experimental design • refine baseline by narrowing dataset • reduce and eliminate non-alternating Type C cases • optionally: subdivide where different constraints apply • different baselines test different hypotheses • cf. shall / will / ’ll

Refining baselines • Tensed VPs per million words, DCPSE • Total: constant over time • Diachronic variation: within text categories • Synchronic variation: between text categories (Bowie et al., forthcoming)

The ratio principle • Simple algebra • any sequence of ratios can be reduced to the ratio of the first and last term: F(modal)/F(tVP) × F(tVP)/F(word) = F(modal)/F(word) • we saw that the ratio tVP:word varies synchronically and diachronically in DCPSE • we can eliminate this variation by simply focusing on modal:tVP • use tensed VPs as baseline for modals • this baseline is not a strict alternation set • we have not eliminated all Type C terms
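A small numerical sketch of the ratio principle follows. All counts are hypothetical (they are not DCPSE figures), and exact fractions are used so that the identity holds without rounding. The two invented texts have the same modal:tVP ratio but different tVP:word ratios, so a per-word baseline reports a difference that the tVP baseline does not.

```python
from fractions import Fraction

# Hypothetical frequencies for two texts: words, tensed VPs, modals.
texts = {
    "text 1": {"word": 100_000, "tVP": 10_000, "modal": 1_000},
    "text 2": {"word": 100_000, "tVP": 14_000, "modal": 1_400},
}

results = {}
for name, f in texts.items():
    modal_per_tvp = Fraction(f["modal"], f["tVP"])
    tvp_per_word = Fraction(f["tVP"], f["word"])
    modal_per_word = Fraction(f["modal"], f["word"])
    # The chain of ratios collapses to the ratio of first and last term.
    assert modal_per_tvp * tvp_per_word == modal_per_word
    results[name] = (modal_per_tvp, tvp_per_word, modal_per_word)
    print(f"{name}: modal:tVP = {float(modal_per_tvp):.3f}, "
          f"tVP:word = {float(tvp_per_word):.3f}, "
          f"modal:word = {float(modal_per_word):.4f}")
```

In this invented case the per-word rate of modals rises by 40% between the texts purely because the density of tensed VPs rises; measured against tensed VPs, nothing has changed.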

‘Absolute’ and ‘relative’ variation • Changes in core modals (will, would, may, might, must, shall, should, can, could) over time in DCPSE • Left axis: absolute change as a proportion of tensed VPs, p(modal | tVP) • Right axis: relative change as a proportion of set of modals, p(modal | modal tVP) (Bowie et al., forthcoming)

Employing alternation analysis • Simple grammatical interaction • Independent and dependent variables are grammatical • mutual substitution concerns the dependent variable • Numerous examples in Nelson et al. 2002 • e.g. clause table: mood × transitivity • not alternation, but survey: could be refined

| IV (mood) | DV: montr | DV: ditr | Total |
|---|---|---|---|
| exclamative | CL(montr, exclam) | CL(ditr, exclam) | CL(exclam) |
| interrogative | CL(montr, inter) | CL(ditr, inter) | CL(inter) |
| … | … | … | … |
| Total | CL(montr) | CL(ditr) | CL |

Employing alternation analysis • Repeating choices: to add or not to add • e.g. repeated decisions to add an attributive AJP to specify a NP head: the tall white ship • A = add AJP • B = don’t add AJP (and stop) • Sequential analysis: examine p(A | {A, B}) at each step • [Figure: p (0.00 to 0.25) plotted against steps 0 to 4] • Conclusion: decision to add an AJP becomes successively more difficult (Wallis, forthcoming)
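The sequential analysis can be sketched as follows. The counts are invented for illustration and are not the figures behind the slide; `at_least[k]` is assumed to hold the number of NPs with at least k attributive AJPs, so that at step k the speakers who add another AJP (Type A) number `at_least[k + 1]` and the opportunities {A, B} number `at_least[k]`.

```python
def step_probabilities(at_least: list[int]) -> list[float]:
    """Return p(A | {A, B}) at each step, where A = add a further AJP."""
    probs = []
    for opportunities, added in zip(at_least, at_least[1:]):
        if opportunities == 0:
            break  # nobody reached this step, so there is no choice to measure
        probs.append(added / opportunities)
    return probs


# Hypothetical counts of NPs with at least 0, 1, 2 and 3 attributive AJPs.
at_least = [10_000, 2_000, 300, 30]

probs = step_probabilities(at_least)
for step, p in enumerate(probs):
    print(f"step {step}: p(add AJP) = {p:.3f}")
```

The baseline shrinks at every step because only NPs that already contain k AJPs offer the opportunity to add one more, which is what makes each step a separate alternation experiment.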

Employing alternation analysis • Grammatically diverse alternates • Biber and Gray (forthcoming) investigate evidence for increasing nominalisation • A = nouns that have been derived from verb forms • This paper reports an analysis of Tucker’s central prediction system model and an empirical comparison of it with two competing models. [1965, Acad-NS] • B = verbs that could be nominalised • Could just use clauses as baseline • But this is little better than words • Better option is to enumerate types • analysis / analyse • prediction / predict • comparison / compare • Examine cases: is alternation possible?

Objections • If this is such a good idea, why isn’t everybody doing it? • Three main objections are made: • alternates are not reliably identifiable • baselines are arbitrarily chosen by the researcher • different constraints apply to different terms (no such thing as free variation)

Alternates are not reliably identifiable? • Identifying alternates can be difficult • phrasal vs. Latinate verbs • Strategies: • enumerate cases from the bottom up • find Type B cases for each Type A

| Alternate for put up | n | Example |
|---|---|---|
| tolerate | 4 | put up with it [S1A-037 #1] |
| ?position | 3 | put your feet up [S1A-032 #21] |
| build, make | 3 | shacks put up without any planning [S2B-022 #118] |
| display, project | 2 | put up two… trees [on the screen] [S1B-002 #157] |
| sell | 2 | put the plant up for sale [W2C-015 #8] |
| propose | 2 | put [a motion] up [S1B-077 #127] |
| increase | 1 | put up the poll tax [W2C-009 #3] |
| accommodate | 1 | we could put up the children [S1A-073 #197] |
| finance | 1 | put up the money [W2F-007 #36] |

Alternates are not reliably identifiable? • Strategies: • enumerate cases from the bottom up • find Type B cases for each Type A • refine baseline from the top down • start with verbs, eliminate non-alternating Type Cs • Copular verbs • Clitics • Stative verbs • are dynamic verbs the upper bound for alternation with phrasal verbs? • combine strategies: • identify stative verbs lexically • a few verbs are stative and dynamic • check in situ

Baselines are arbitrary? • Is there such a thing as an ‘objective’ baseline? • No, but optimum baselines identify where speakers have a real choice: Type A vs. Type B • Baselines are a control • Experimental hypothesis: • the ratio of Type A to the baseline is constant over values of independent variable • Baseline cited as part of experimental reporting • Indeed we can experiment with baselines • e.g. does the present perfect correlate more with past-referring or present-referring VPs?

Comparing baselines • Does the present perfect correlate more with past-referring or present-referring VPs? • Present perfect correlates more with present-referring VPs (Bowie et al., forthcoming)

| present | present perf | present non-perf | Total |
|---|---|---|---|
| LLC | 2,696 | 33,131 | 35,827 |
| ICE-GB | 2,488 | 32,114 | 34,602 |
| Total | 5,184 | 65,245 | 70,429 |

d% = -4.45 ± 5.13%, φ’ = 0.0227, χ² = 2.68 (ns)

| past | present perf | other TPM VPs | Total |
|---|---|---|---|
| LLC | 2,696 | 18,201 | 20,897 |
| ICE-GB | 2,488 | 14,293 | 16,781 |
| Total | 5,184 | 32,494 | 37,678 |

d% = +14.92 ± 5.47%, φ’ = 0.0694, χ² = 25.06 (s)
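The d% figures can be recomputed from the frequencies in the two tables. The sketch below assumes that d% is the percentage difference of the ICE-GB rate relative to the LLC rate; that reading is an assumption, although it does reproduce the values reported on the slide. The function names are hypothetical, and the confidence intervals, φ’ and χ² scores are not recomputed here.

```python
def rate(f_a: int, f_b: int) -> float:
    """p(A | {A, B}) = F(A) / (F(A) + F(B))."""
    return f_a / (f_a + f_b)


def percentage_difference(p1: float, p2: float) -> float:
    """Change from p1 to p2 as a percentage of p1 (assumed definition of d%)."""
    return 100 * (p2 - p1) / p1


# (present perfect, rest of baseline) per corpus, copied from the tables above.
baselines = {
    "present-referring": {"LLC": (2696, 33131), "ICE-GB": (2488, 32114)},
    "past-referring": {"LLC": (2696, 18201), "ICE-GB": (2488, 14293)},
}

results = {}
for name, corpora in baselines.items():
    p_llc = rate(*corpora["LLC"])
    p_ice = rate(*corpora["ICE-GB"])
    results[name] = percentage_difference(p_llc, p_ice)
    print(f"{name}: p(LLC) = {p_llc:.4f}, p(ICE-GB) = {p_ice:.4f}, "
          f"d% = {results[name]:+.2f}")
```

The same present perfect frequencies yield a small fall against one baseline and a sizeable rise against the other, which is why the choice of baseline has to be reported with the result.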

Different constraints apply in each case? • Speakers’ choices are influenced by multiple pressures • to talk about a single ‘choice’ is misleading • there is no such thing as free variation • We are not attempting to infer “the reason” for a particular speaker decision • we are attempting to identify statistically sound • patterns • correlations • trends • across many speakers

Different constraints apply in each case? • Does one or more of these multiple constraints represent a systematic bias on the true rate? • Yes = try to identify it experimentally • No = ‘noise’ • Can focus on subset of cases to restrict different influences • e.g. limit shall / will by modal semantics • This objection is misplaced: • freedom to vary means • grammatical and semantic possibility (potential) • not that choices are free from influence

A competitive ecology? • Not everything is a binary choice • but the same principles apply • [Figure: two panels showing proportions p (0% to 100%) across the 1920s, 1960s and 2000s. Meanings of THINK: ‘cogitate’, ‘intend’, quotative, interpretive. Complementation patterns of HOPE: hoping that / Ø, hoping to, hoping for] (Levin, forthcoming)

Conclusions • Researchers need to pay attention to questions of choice and baselines • This does not mean that an observed change is due to a single source • Minimum condition: baseline is a control • statistics evaluate difference from this control • is it a good control? • Alternation studies: baseline is opportunity for making choice under investigation • Word-based baselines should only really be used for comparison with other studies • we should not make statements about choice unless we investigate that question

Conclusions • ‘Alternation’ can be interpreted • strictly • all Type As and Type Bs identified and cases checked • generously • small number of Type Cs permitted • Alternation is semantically bounded but grammatical analysis helps identify cases! • We may try different experimental designs, modifying baselines and subsets • many more novel experiments are possible • experimental assumptions should always be clearly reported

References
ACLW: Aarts, B., J. Close, G. Leech and S.A. Wallis (eds.). forthcoming. The Verb Phrase in English: Investigating recent language change with corpora. Cambridge: CUP. Preview at www.ucl.ac.uk/english-usage/projects/verb-phrase/book.
• Aarts, B., J. Close and S.A. Wallis. forthcoming. Choices over time: methodological issues in investigating current change. ACLW Chapter 2.
• Biber, D. and B. Gray. forthcoming. Nominalizing the verb phrase in academic science writing. ACLW Chapter 5.
• Bowie, J., S.A. Wallis and B. Aarts. forthcoming. The perfect in spoken English. ACLW Chapter 13.
• Levin, M. forthcoming. The progressive verb in modern American English. ACLW Chapter 8.
• Nelson, G., S.A. Wallis and B. Aarts. 2002. Exploring Natural Language. Amsterdam: John Benjamins.
• Wallis, S.A. forthcoming. Capturing linguistic interaction in a grammar: a method for empirically evaluating the grammar of a parsed corpus.

Statistical postscript • Type Cs make statistical tests less sensitive • What happens to confidence intervals as we add to F(A) + F(B) = 100 alternating cases? • Tests assume freedom to vary (F(A) + F(B) = N) • Including Type Cs makes statistical tests conservative • [Figure: ‘eN/100’ (0 to 0.25) plotted against N (100 to 10,000) for F(A) = 5, 20, 40, 60, 80 and 95]
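One way to see the effect described in the postscript is to hold F(A) fixed and pad the baseline with non-alternating Type C cases. The sketch below is an illustration under stated assumptions rather than a reconstruction of the slide's figure: it uses the standard Wilson score interval, takes F(A) = 20 out of 100 true alternating cases as a hypothetical starting point, and expresses the interval width in numbers of cases (width of the interval on p multiplied by N) so that widths are comparable across baselines.

```python
from math import sqrt


def wilson_interval(f: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for the proportion f / n."""
    p = f / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def interval_width_in_cases(f_a: int, n: int) -> float:
    """Width of the Wilson interval rescaled from proportions to case counts."""
    lo, hi = wilson_interval(f_a, n)
    return (hi - lo) * n


f_a = 20  # hypothetical F(A); N = 100 means no Type C cases are included
widths = {n: interval_width_in_cases(f_a, n) for n in (100, 1_000, 10_000)}

for n, w in widths.items():
    print(f"N = {n:>6}: interval spans about {w:.1f} cases")
```

The interval around the same 20 observations grows as Type C cases inflate N, so a real difference in F(A) is less likely to be detected: the test errs on the conservative side.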
