
Text-based typology



Presentation Transcript


  1. Text-based typology Corpora, corpora of elicited texts and parallel corpora (based on STUF 2007) МД

  2. Pros as compared to questionnaires • Contextualization of examples • Naturalistic discourse • Intralinguistic variation • Potentially, makes up for grammar gaps

  3. Frog stories • (Mercer Mayer)

  4. Pear stories • W. Chafe et al. • A six-minute film shot at UC Berkeley in 1975 • Widely used in cross-linguistic research • referential density project

  5. Referential density (Bickel 2003) • Relative frequency of overt NPs: Via Nichols 2014
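
The measure on this slide can be illustrated with a minimal sketch. It assumes that referential density is computed as the share of available argument positions that are filled by an overt NP. The function name and all counts below are hypothetical and are not taken from Bickel 2003 or Nichols 2014.

```python
def referential_density(overt_nps: int, argument_slots: int) -> float:
    """Share of available argument positions realized as overt NPs.

    Assumption: both counts come from a manually annotated narration.
    """
    if argument_slots <= 0:
        raise ValueError("argument_slots must be positive")
    if not 0 <= overt_nps <= argument_slots:
        raise ValueError("overt_nps must lie between 0 and argument_slots")
    return overt_nps / argument_slots


# Hypothetical counts (overt NPs, argument positions) for two narrations.
narrations = {"narration_A": (42, 60), "narration_B": (18, 60)}

densities = {
    name: referential_density(overt, slots)
    for name, (overt, slots) in narrations.items()
}

for name, value in densities.items():
    print(f"{name}: {value:.2f}")
```

Normalizing by argument positions rather than by sentence count keeps narrations of different length comparable.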

  6. Cons of elicited corpora • Not directly comparable • events may be focused on or omitted • mostly quantitative results • Require massive linguistic effort • limited data for each language Any alternative? • Parallel corpora

  7. Massively parallel texts • Harry Potter • Including subtitles (76, 21) • Biblical translations • Pater Noster in 1,300 languages, 400 full texts, 1,000 gospels • Marxist texts • State and Revolution: 71 translations in 36 languages • Legal databases: • Proceedings of the European Parliament • Universal Declaration of Human Rights (329) • UNESCO online database of literary translations (1.5 million items) • Andersen, Le Petit Prince, … Cysouw and Wälchli 2007

  8. Comparability (easy counts) • Parallel corpora: • roughly comparable number of sentences (from 1,528 to 1,663 for Petit Prince) • Elicited texts: • pear stories in the same language vary from 29 to 119 sentences (Bickel 2003 via Nichols 2014) • ‘Free’ corpora: • not applicable…

  9. Comparability (methodology) • Comparison by intension • definition of a phenomenon • browsing grammars • Comparison by extension • linguistic structures used for expressing a contextualized situation • truly functional Wälchli 2007

  10. Extensional typology in parallel corpora • data we work with may be linguistically different but semantically identical • cf. much looser identity in elicited texts • rather, they are “defined as a selection of places in the parallel texts” • they may reflect linguistic variation • at points where one language uses the same construction, another language uses several
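
Comparison by extension can be sketched as a tally over aligned places. The sketch below assumes that each selected place in the parallel text has already been annotated with the construction each language uses there. The language names, construction labels and data are invented for illustration only.

```python
from collections import Counter

# Hypothetical annotation: one dict per aligned place in the parallel text,
# mapping a language to the construction it uses at that place.
aligned_places = [
    {"lang_A": "with", "lang_B": "instrumental"},
    {"lang_A": "with", "lang_B": "comitative"},
    {"lang_A": "with", "lang_B": "instrumental"},
]


def constructions_per_language(places):
    """Count, for every language, how often each construction occurs."""
    counts = {}
    for place in places:
        for lang, construction in place.items():
            counts.setdefault(lang, Counter())[construction] += 1
    return counts


tally = constructions_per_language(aligned_places)
for lang, counter in tally.items():
    print(lang, dict(counter))
```

In this toy data lang_A uses one construction at all three places while lang_B splits them between two, which is the kind of variation the last bullet describes.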

  11. Parallel corpora support conventional typology • Newmeyer against Stassen • Classical Greek, Latin and Tibetan have the ‘exceed’ type comparative - contra Stassen 1985 • Wälchli supports Stassen • A study of parallel corpora does not show the ‘exceed’ but the ‘separative’ construction • Parallel corpora reflect dominant patterns – exactly where the typology’s primary interests lie • But they also numerically reflect variation or competition between dominant patterns, rather than providing a yes-or-no typology

  12. Case studies, among others: • Wälchli 2005: co-compounds • Auwera et al. 2004: epistemic possibility in Slavic • Wälchli 2006: ‘again’ • Wälchli 2001: motion events • Wälchli & Zúñiga 2006: motion events ‘again’ • Stolz 2004: total reduplication • Stolz et al. 2005: comitatives and instrumentals • Stolz et al.: absolute possessives

  13. Stolz 2003, 2004 Le Petit Prince - quantitative ‘avec’-cline Total-reduplication-cline Does this require parallel corpora?

  14. Stolz 2003, 2004 Le Petit Prince – qualitative? Puis il s’épongea le front avec un mouchoir à carreaux rouges. Then he mopped his forehead with a handkerchief decorated with red squares. Zatim obrise čelo rupčičem s crvenim kvadratima. Wells with a rusty pulley – ornative or a separate category?

  15. Pitfalls: data analysis • Easier than raw texts • we know what was intended and where to look • still, as with any grammatical analysis by a non-expert, subject to mistakes • Alignment issues Anyway, the same as or easier than with elicited texts Wälchli 2007

  16. Pitfalls: sample bias Europe overrepresented, convenience sampling: • Europe > IE > other families • In his study of comitatives, Stolz ended up with an areal rather than a sample-based study

  17. Pitfalls: style/variant choice • Standard language bias • Better include texts reporting speech • ‘Hagiolect’ effects • ‘The sinners will-Evid not enter the heaven’ • Style incomparability • Bible translations are stylistically diverse • Purism Wälchli 2007

  18. Pitfalls: translation bias Wälchli 2007 “Incommensurability” of linguistic structures: some languages think differently… • Australian languages prefer the absolute over the relative frame of reference • In Australian Gospels, occurrences of the absolute frame of reference (AFR) are found but are significantly less frequent than in natural discourse from this area • “Inert” construction – a construction that tends to be imported from the source language

  19. Case study: MVC in ‘bring’ and ‘run’ events Bible-based, Bernhard Wälchli Multi-verb construction: clauses that contain more than one lexical verb BRING and RUN events may be described as MVC or “solitarizing” verbs

  20. BRING and RUN events (Wälchli) Examples:
      Minnin ti-bouay la ban mouin. (Haitian Creole)
      lead little-boy def give I
      Ač-i-ne Man pat-ăm-a il-se kil-ĕr. (Chuvash)
      child-ps3-dat/acc I.gen to-poss1sg-dat take-conv come-imp2pl
      ‘… bring him unto me.’ (solitarizing)
      Data usually unavailable from grammars…

  21. BRING and RUN events (Wälchli) Bible-based, Bernhard Wälchli Multi-verb construction: clauses that contain more than one lexical verb BRING and RUN events may be described as MVC or “solitarizing” verbs Is there any correlation between the choice of either construction for encoding the two events?

  22. BRING and RUN events (Wälchli)

  23. BRING and RUN events (Wälchli) • 165 languages (Eurasia over-represented) • 18 BRING events, six RUN events • Correlation between MVC in BRING and RUN is highly significant (Fisher’s test)
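
The kind of test named on this slide can be sketched as a two-sided Fisher exact test on a 2×2 table that cross-classifies languages by whether they use MVCs for BRING and for RUN. The function below is a plain standard-library implementation written for this illustration. The counts are invented and are not Wälchli's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = MVC / no MVC in BRING,
# columns = MVC / no MVC in RUN.
p_value = fisher_exact_two_sided(30, 10, 12, 40)
print(f"p = {p_value:.3g}")
```

An exact test is a natural choice when some cells of the table are small, since it does not rely on a large-sample approximation.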

  24. BRING and RUN events (Wälchli) • Is a language consistently MVC vs. solitarizing? • Surely not – then, is this a typological parameter at all?

  25. BRING and RUN events (Wälchli) • But: the distribution is bimodal

  26. BRING and RUN events (Wälchli) • If we only consider LOW and HIGH, fewer (14) languages are inconsistent

  27. Case study: demonstratives Potter-based, Federica da Milano 2007 • Distance-oriented systems • this near – that far • Person-oriented systems • this with us – that far from us • Is this a real distinction, or are these two subtypes of something more general?

  28. Demonstratives (da Milano) • 48 stimuli (da Milano 2005) • Also include reciprocal orientation of the interlocutors: face to face, face to back, side by side • 83 occurrences of deictic demonstratives in “… and the Chamber of Secrets” • this with us – that far from us

  29. Demonstratives (da Milano) ‘Tie that round the bars,’ said Fred, throwing the end of a rope to Harry. ‘Przywiąż to do kraty’, powiedział Fred, rzucając Harry’emu koniec liny.

  30. Demonstratives (da Milano) One-term systems: French – cela, ça (ceci not used) German – der/die/das (dieser, jener not used)

  31. Demonstratives (da Milano) Two-term systems: Unmarked vs. proximal – Scandinavian, English, Northern Italian Unmarked vs. distal – Polish, Russian, Czech, Hungarian, Modern Greek Dyad-oriented – Catalan

  32. Demonstratives (da Milano) Three-term systems: proximal, medial, distal Dual-anchored – medial (close to addressee or medium distance) Spanish (este~ese~aquel) Basque (hau~hori~hura) Addressee-anchored – medial is close to addressee only – not verified on HP Portuguese (este~esse~aquele) Also Sardinian and Tuscan

  33. Demonstratives (da Milano) da Milano then proceeds to build a similar typology for adverbs; her conclusions are as follows: • The map of adverbs is by and large isomorphic to the map of pronouns • Levinson 2004 “perhaps one can hazard the generalizations that speaker-centered degrees of distance are usually (more) fully represented in the adverbs than the pronominals” confirmed • “It has turned out to be fruitful to use parallel texts as a control test of data obtained through the questionnaire. The results from the parallel texts mainly confirmed the prior typological generalizations.”

  34. ‘Free’ corpora! • No translations – no risk of inert categories, closer to naturalistic • Massive amounts of texts • Usually – literary • Vast playground for quantitative analysis

  35. ‘Free’ corpora! Examples: • Combinatorial statistics for property words • Lexical typology by LexTyp • Comparative occurrences • May be useful – cf. temperature domain

  36. Comparison: texts in typology • Free corpora: • No ‘meaning identity’, shift towards intensional typology • Massive collections: almost all kinds of phenomena • But a shift towards intensional typology • Natural discourse • Elicited texts: • Weak ‘meaning’ identity • Massive effort for transcription, poor collections • Only frequent phenomena • Natural discourse (with provisos) • Parallel corpora: • Strong ‘meaning’ identity • Natural written discourse (with provisos)

  37. Summary (obvious): • Corpora have their limitations and cannot replace conventional methods – but can go hand in hand with them
