Thomas Hoffmann (University of Regensburg)

Thomas Hoffmann (University of Regensburg). Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses. Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, University of Tübingen, 02.02.-04.02.2006.


Presentation Transcript


  1. Thomas Hoffmann (University of Regensburg): Corpus and Experimental Data as Corroborating Evidence: The Case of Preposition Placement in English Relative Clauses. Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives, University of Tübingen, 02.02.-04.02.2006

  2. 1. Introduction: Corpus vs. Introspection
     “We do not need to use intuition in justifying our grammars, and as scientists, we must not use intuition in this way.” (Sampson 2001: 135)
     “You don’t take a corpus, you ask questions. […] You can take as many texts as you like, you can take tape recordings, but you’ll never get the answer.” (Chomsky in Aarts 2000: 5-6)
     → Which type of data are we left with then?

  3. 1. Introduction: Corpus vs. Introspection
     “A corpus and an introspection-based approach to linguistics […] can be gainfully viewed as being complementary.” (McEnery and Wilson 1996: 16)
     → corpus and introspection data = corroborating evidence
     → case study: P placement in English relative clauses

  4. 1. Introduction: What to Expect
     • corpora vs. introspection?
     • categorical corpus data (ICE-GB corpus)
     • Magnitude Estimation experiment
     • variable corpus data (ICE-GB corpus)
     • conclusion

  5. 2. Corpora and Introspection
     Arguments against corpus data:
     • “performance” problem
     • “negative data” problem
     • “homogeneity” problem
     → “only use introspection”

  6. 2. Corpora and Introspection
     Arguments against corpus data (“no corpus”):
     • “performance” problem; yet: performance is a result of competence; modern corpora are representative
     • “negative data” problem; yet: only additional (different) data needed
     • “homogeneity” problem; yet: empirical claim that needs to be investigated
     → use corpora + additional data type

  7. 2. Corpora and Introspection
     Arguments against introspection data:
     • “unnatural data” problem
     • “irrefutable data” problem
     • “illusion” problem
     • “stability” problem
     → “only use corpora”

  8. 2. Corpora and Introspection
     Arguments against introspection data (“no introspection”):
     • “unnatural data” problem; yet: only additional (context) data needed
     • “irrefutable data” problem; yet: depends only on collection method
     • “illusion” problem; yet: only additional (natural) data needed
     • “stability” problem; yet: empirical claim that needs to be investigated
     → use corpora + additional data type

  9. 2. Corpora and Introspection
     Corpora and introspection are corroborating evidence:
     • corpus (= weaknesses of introspection data): natural language; contextual factors; unexpected patterns
     • introspection (= weaknesses of corpus data): rare phenomena; negative data; ungrammaticality

  10. 3. Case Study: Preposition Placement
      I want a data source ...
      (1) a. which I can rely on [stranded preposition]
          b. on which I can rely [pied-piped preposition]
      driving question: data source for empirical analysis of (1a,b)?

  11. 4. Empirical Study I: Corpus Data
      • Corpus used: International Corpus of English ICE-GB (Nelson et al. 2002) (educated Present-day BE, written & spoken)
      • Analysis tool: GOLDVARB computer programme (logistic regression; Robinson et al. 2001)
      → relative influence of various contextual factors (weights: < 0.5 = inhibiting factors; > 0.5 = favouring factors)
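The reading of factor weights given on slide 11 can be sketched in a few lines. This is only an illustration, not GOLDVARB itself: the function names `describe_weight` and `log_odds` are hypothetical, and the sketch assumes the usual Varbrul-style interpretation in which a weight w corresponds to a log-odds contribution of ln(w / (1 - w)). The two sample weights are the restrictiveness weights reported on slide 32.

```python
import math


def describe_weight(weight: float) -> str:
    """Classify a factor weight using the 0.5 threshold from slide 11."""
    if not 0.0 < weight < 1.0:
        raise ValueError("a factor weight must lie strictly between 0 and 1")
    if weight > 0.5:
        return "favouring"
    if weight < 0.5:
        return "inhibiting"
    return "neutral"


def log_odds(weight: float) -> float:
    """Assumed Varbrul-style conversion of a weight to a log-odds contribution."""
    return math.log(weight / (1.0 - weight))


# Sample weights for pied piping taken from slide 32 (restrictiveness factor).
sample_weights = {"restrictive RC": 0.592, "nonrestrictive RC": 0.248}

for factor, weight in sample_weights.items():
    print(f"{factor}: weight={weight}, {describe_weight(weight)}, "
          f"log-odds={log_odds(weight):+.3f}")
```

On the log-odds scale a weight of 0.5 maps to zero, which is why 0.5 serves as the dividing line between favouring and inhibiting factors.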

  12. 4. Empirical Study I: Corpus Data I
      Pstrand/pied-piped tokens tested for:
      • finiteness
      • restrictiveness
      • relativizer
      • XP contained in (V / N, e.g. entrance to sth. / Adj, e.g. afraid of sth.)
      • level of formality
      • X-PP relationship (Vprepositional, PPLoc_Adjunct, PPMan_Adjunct …)
      except 2: all factors discussed in literature before, but not w.r.t. interdependence (e.g. Bergh & Seppänen 2000; Trotta 2000)

  13. 4.1 Categorical corpus data
      raw ICE-GB P-placement data: 1074 finite relative clauses
      • 659 (61.4%) tokens: pied-piped
      • 415 (38.6%) tokens: stranded
      as expected: many categorical effects
      → accidental vs. systematic gaps?
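As a quick arithmetic check, the percentages on slide 13 follow directly from the raw token counts. The helper name `share` below is hypothetical; the counts are the ones given on the slide.

```python
# Raw ICE-GB counts from slide 13.
TOTAL_FINITE_RCS = 1074
PIED_PIPED = 659
STRANDED = 415


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two categories together account for every token in the sample.
assert PIED_PIPED + STRANDED == TOTAL_FINITE_RCS

print(f"pied-piped: {share(PIED_PIPED, TOTAL_FINITE_RCS)}%")
print(f"stranded:   {share(STRANDED, TOTAL_FINITE_RCS)}%")
```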

  14. 4.2 Categorical corpus data: that/Ø ≠ WH-relatives
      • relativizer: all that/Ø-tokens in ICE-GB stranded
        176 that + Pstranded tokens; unattested: (2) a data source on that I can rely
        177 Ø + Pstranded tokens; unattested: (3) a data source on Ø I can rely
      → ICE-GB result: expected
      → implications: (2) = (3)? / that ≠ WH-

  15. 4.3 Categorical corpus data: Constraints on Pstrand
      2. X-PP relationship:
      Literature (e.g. Bergh & Seppänen 2000; Trotta 2000): Pstranding favoured with complement PPs, disfavoured with adjunct PPs
      ICE-GB data: Pstranding restricted to PPs which add thematic information to predicates/events

  16. 4.3 Categorical corpus data: Constraints on Pstrand
      2. X-PP relationship: categorical effect of WH-PPAdjunct tokens:
      a) just P+WH / no that/Ø+P in ICE-GB: manner, degree, frequency & respect PPs, e.g.:
      (4) a. the ways in which the satire is achieved <ICE-GB:S1B-014 #5:1:A>
          b. the ways which/that/Ø the satire is achieved in

  17. 4.3 Categorical corpus data: Constraints on Pstrand
      2. X-PP relationship: categorical effect of WH-PPAdjunct tokens:
      b) just P+WH / but that/Ø+P in ICE-GB: subcat. PPs (put sth. in/into/under) & locative, affected locative, direction PP adjuncts
      (5) a. … the world that I was working in and studying in <ICE-GB:S1A-001 #35:1B>
          b. … the world in which I was working and studying

  18. 4.3 Categorical corpus data: Constraints on Pstrand
      Claim: comparison of WH- vs. that/Ø shows: P can only be stranded if the PP adds thematic information to predicates/events
      • manner & degree adjuncts: compare events “to other possible events of V-ing” (Ernst 2002: 59)
      • frequency & respect adjuncts: have scope over temporal information (frequency) and truth value of entire clause (respect)
      → don’t add thematic participant
      → Pstrand with these: systematic gap

  19. 4.3 Categorical corpus data: Constraints on Pstrand
      Claim: comparison of WH- vs. that/Ø shows: P can only be stranded if the PP adds thematic information to predicates/events
      • subcat. PPs & locative, affected locative, direction PP adjuncts:
      → add thematic participant
      → WH+P with these: accidental gap

  20. 4.3 Categorical corpus data: Constraints on Pstrand
      Claim: comparison of WH- vs. that/Ø shows: P can only be stranded if the PP adds thematic information to predicates/events
      Comparison of WH- vs. that/Ø: good evidence, but still “negative data” problem
      • further corroborating evidence needed
      • Introspection: Magnitude Estimation study

  21. 5. Empirical Study II: Magnitude Estimation
      • relative judgements (reference sentence)
      • informal, restrictive RCs tested for:
        P-PLACEMENT (Pstrand, Ppied-piped)
        RELATIVIZER (WH-, that-, Ø-)
        X-PP (VPrep, PPTemp/Loc_Adjunct, PPManner/Degree_Adjunct)
      • tokens counterbalanced: 6 material groups à 18 tokens + 36 fillers = 54 tokens
      • tokens randomized (Web-Exp software)
      • N = 36 BE native speakers (sex: 18 m, 18 f / age: 17-64)
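The token counts on slide 21 follow from crossing the three factors: 2 × 3 × 3 = 18 experimental conditions, plus 36 fillers, gives 54 tokens. The sketch below enumerates the conditions and builds one randomized presentation list. It is an illustration only: the function names, the tuple representation of an item, and the use of Python's `random` module are assumptions, and it does not reproduce the Web-Exp software or the actual assignment of lexical material to the 6 material groups.

```python
import random
from itertools import product

# Factor levels as listed on slide 21.
P_PLACEMENT = ["Pstrand", "Ppied-piped"]
RELATIVIZER = ["WH-", "that-", "Ø-"]
X_PP = ["VPrep", "PPTemp/Loc_Adjunct", "PPManner/Degree_Adjunct"]

N_FILLERS = 36  # number of fillers per list, from slide 21


def build_conditions():
    """Cross the three factors to obtain every experimental condition."""
    return list(product(P_PLACEMENT, RELATIVIZER, X_PP))


def build_presentation_list(seed=None):
    """Combine one token per condition with the fillers and shuffle the result.

    Hypothetical representation: each item is a (kind, payload) tuple.
    """
    rng = random.Random(seed)
    items = [("target", condition) for condition in build_conditions()]
    items += [("filler", index) for index in range(N_FILLERS)]
    rng.shuffle(items)
    return items


conditions = build_conditions()
presentation = build_presentation_list(seed=2006)
print(len(conditions), "conditions;", len(presentation), "tokens per list")
```

Passing a fixed `seed` makes a given order reproducible; omitting it gives a fresh randomization on each call.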

  22. 5. Empirical Study II: Magnitude Estimation
      18 filler sentences: ungrammatical
      a. That’s a tape I sent them that done I’ve myself (word order violation; original source: <ICE-GB:S1A-033 074>)
      b. There was lots of activity that goes on there (subject contact clause; original source: <ICE-GB:S1A-004 #067>)
      c. There are so many people who needs physiotherapy (subject-verb agreement error; original source: <ICE-GB:S1A-003 #027>)

  23. 5. Empirical Study II: Magnitude Estimation
      ANOVA: significant effects
      • P-PLACEMENT: F(1,33) = 4.536, p < 0.05
      • RELATIVIZER: F(2,66) = 17.149, p < 0.001
      • P-PLACEMENT*X-PP: F(2,66) = 9.740, p < 0.001
      • P-PLACEMENT*RELATIVIZER: F(2,66) = 4.217, p < 0.02

  24. 5. Empirical Study II: Magnitude Estimation
      ANOVA: not significant
      • AGE: F(1,33) = 2.760, p > 0.10
      • GENDER: F(1,33) = 1.495, p > 0.20
      → indicates: homogeneity of subjects

  25. 5. Empirical Study II: Magnitude Estimation
      Post-hoc Tukey test: P-Place*Relativizer
      • Ppied-piped: WH- >> that [p < 0.001]; WH- >> Ø [p < 0.001]; that > Ø [p < 0.010]
      • Pstrand: no difference: WH- = that = Ø [p >> 0.100]

  26. 5. Empirical Study II: Magnitude Estimation
      Post-hoc Tukey test: P-Place*X-PP
      • Ppied-piped: PPMan/Deg > VPrep [p < 0.010]; PPMan/Deg = PPTemp/Loc [p = 0.100]; VPrep = PPTemp/Loc [p > 0.100]
      • Pstrand: VPrep > PPTemp/Loc > PPMan/Deg [p < 0.001]

  27. Fig. 1: Magnitude estimation result for P + relativizer
      P+WH >> P+that > P+Ø

  28. Fig. 2: Magnitude estimation result for P + relativizer compared with fillers
      P+that & P+Ø = ungrammatical fillers
      → violation of “hard constraint” (Sorace & Keller 2005)

  29. Fig. 3: Magnitude estimation result for relativizer + P
      WH + P = that + P = Ø + P
      VPrep > PPTemp/Loc > PPMan/Deg

  30. Fig. 3: Magnitude estimation result for relativizer + P
      VPrep > PPTemp/Loc > PPMan/Deg >> ungrammatical filler
      → violation of “soft constraint” (Sorace & Keller 2005)

  31. 6. Corroborating Evidence
      Corroborating evidence:
      • corpus: man/deg PPs: no Pstranded (not even with that/Ø) → semantic constraint on Pstranded
      • experiment: man/deg PPs worst environment for Pstranded; yet: better than ungrammatical fillers (soft constraint violation)

  32. 7. Empirical Study III: Corpus Data II
      Constraints on variable corpus data (354 finite WH-tokens):
      Goldvarb identified 3 independent factors (Log likelihood = -88.437, Significance = 0.004; Fit: X-square(27) = 27.977, accepted, p = 0.2040):
      1. level of formality (as expected)
      2. type of PP contained in (as expected)
      3. restrictiveness (unexpected): restrictive RCs favour pied piping (weight: 0.592); nonrestrictive RCs clearly inhibit pied piping (i.e. favour stranding; weight: 0.248)

  33. 7. Empirical Study III: Corpus Data II
      (6) And uhm he left me there with this packet of Durex which I hadn't got a clue what to do **[with]** to be totally honest <ICE-GB:S1B-049 #167:1:B>
      reasons for restrictiveness effect:
      1. weaker semantic ties of non-restrictive clause with antecedent (pause/comma)
      2. pied-piped P receives connective function
      → functionalisation of preposition placement in WH-relative clauses

  34. 8. Conclusion
      corpus and introspection data = corroborating evidence:
      • corpora: frequency/context effects (e.g. level of formality); unexpected patterns (e.g. restrictiveness); categorical data → require further investigation
      • introspection: differentiation of accidental gaps (WH+P with PPTemp/Loc) and systematic gaps (X+P with PPMan/Deg); detection of degrees of ungrammaticality

  35. 9. References
      Aarts, B. 2000. “Corpus linguistics, Chomsky and Fuzzy Tree Fragments”. In: C. Mair & M. Hundt, eds. Corpus Linguistics and Linguistic Theory. Amsterdam and Atlanta, GA: Rodopi, 5-13.
      Bard, E.G. et al. 1996. “Magnitude Estimation of Linguistic Acceptability”. Language 72:32-68.
      Bergh, G. & A. Seppänen. 2000. “Preposition stranding with wh-relatives: A historical survey”. English Language and Linguistics 4:295-316.
      Cowart, W. 1997. Experimental Syntax: Applying Objective Methods to Sentence Judgements. Thousand Oaks: Sage.
      Huddleston, R. et al. 2002. “Relative constructions and unbounded dependencies”. In: R. Huddleston & G.K. Pullum, eds. The Cambridge Grammar of the English Language. Cambridge: Cambridge University Press, 1031-1096.
      Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
      Levine, R. & I.A. Sag. 2003. “WH-Nonmovement”. <http://www-csli.stanford.edu/~sag>, 04.07.2004.

  36. 9. References
      McEnery, T. & A. Wilson. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
      Nelson, G. et al. 2002. Exploring Natural Language: Working with the British Component of the International Corpus of English. Amsterdam, Philadelphia: Benjamins.
      Penke, M. & A. Rosenbach. 2004. “What counts as evidence in linguistics? An introduction”. Studies in Language 28,3:480-526.
      Pesetsky, D. 1998. “Some principles of sentence production”. In: P. Barbosa et al., eds. Is the Best Good Enough? Optimality and Competition in Syntax. Cambridge, MA: MIT Press, 337-383.
      Pickering, M. & G. Barry. 1991. “Sentence processing without empty categories”. Language and Cognitive Processes 6:229-259.
      Quirk, R. et al. 1985. A Comprehensive Grammar of the English Language. London: Longman.
      Robinson, J. et al. 2001. “GOLDVARB 2001: A Multivariate Analysis Application for Windows”. <http://www.york.ac.uk/depts/lang/webstuff/goldvarb/manualOct2001>

  37. 9. References
      Sag, I.A. 1997. “English relative constructions”. Journal of Linguistics 33:431-484.
      Sampson, G. 2001. Empirical Linguistics. London, New York: Continuum.
      Schütze, C.T. 1996. The Empirical Base of Linguistics: Grammaticality Judgements and Linguistic Methodology. Chicago: University of Chicago Press.
      Sorace, A. & F. Keller. 2005. “Gradience in linguistic data”. Lingua 115,11:1497-1525.
      Trotta, J. 2000. Wh-clauses in English: Aspects of Theory and Description. Amsterdam and Atlanta, GA: Rodopi.
      Van der Auwera, J. 1985. “Relative that – a centennial dispute”. Journal of Linguistics 21:149-179.
