
Corpus Stylistics


nayef

Presentation Transcript


  1. Corpus Stylistics Outline: • Background and introduction to current work • Methodology in Corpus Stylistics • Applications of Corpus Stylistics • References

  2. Corpus Stylistics Background: What is Corpus Stylistics? • The statistical study of style, i.e. study of the relative frequency of elements in a text • Augustus de Morgan, 1851: suggested that disputes about the authenticity of some of the writings of St Paul might be settled by measuring the length of the words used in the various Epistles • T.C. Mendenhall, 1887: analysis of several authors’ frequency distributions of word-length

  3. Corpus Stylistics • Corpus: a body or collection of linguistic data for use in research • Since the early 1960s: interest in computer corpora or machine readable corpora • Statements about the relative frequency of various linguistic items in a corpus have become very accurate

  4. Corpus Stylistics • Some uses of statistical analysis of style through corpora: • Education, e.g. EFL textbook writing • Establishment of authorship, e.g. of unascribed manuscripts • Interpretive stylistics, e.g. study of the writer’s ideology and point of view

  5. Corpus Stylistics Methodology • Simple things may characterise different styles, especially when used comparatively: • average sentence length • average word length • type:token ratio (vocabulary richness), where number of types = number of different words and number of tokens = total number of words • vocabulary growth (homogeneity of text): number of new types in the 1st, 2nd, …, nth 1000 words; in a rich, varied text this number keeps climbing steadily
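
The simple measures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the presentation: the function names, the naive regular-expression tokenizer and sentence splitter, and the sample text are all assumptions made for the example, and real corpus work would need more careful tokenization.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (naive assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def average_sentence_length(text):
    """Mean number of tokens per sentence."""
    return len(tokenize(text)) / len(split_sentences(text))

def average_word_length(text):
    """Mean number of characters per token."""
    tokens = tokenize(text)
    return sum(len(t) for t in tokens) / len(tokens)

def type_token_ratio(text):
    """Number of different words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens)

def vocabulary_growth(text, chunk_size=1000):
    """Number of previously unseen types in each successive chunk of tokens."""
    tokens = tokenize(text)
    seen = set()
    growth = []
    for start in range(0, len(tokens), chunk_size):
        chunk = set(tokens[start:start + chunk_size])
        growth.append(len(chunk - seen))
        seen |= chunk
    return growth

# Invented sample text, for illustration only.
sample = "The cat sat on the mat. The dog sat on the log."
print(average_sentence_length(sample))
print(average_word_length(sample))
print(type_token_ratio(sample))
print(vocabulary_growth(sample, chunk_size=6))
```

Because the type:token ratio falls as texts get longer, these figures are most meaningful when compared across samples of similar size. The functions assume non-empty input.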

  6. Corpus Stylistics Methodology (cont’d) • More complex analyses can give a more interesting picture • specific syntactic structures • degree of modification in NPs • types of verbs (e.g. verbs of persuasion, speech verbs, action verbs, descriptive verbs) • distribution of pronouns (1st/2nd/3rd person) • etc … (anything you can think of!) • Quite sophisticated mathematical techniques can give an overall picture • e.g. factor analysis: identifies from a (big) range of variables which ones best identify/characterise differences

  7. Corpus Stylistics Methodology (cont’d) Multidimensional analysis • Collect a huge range of measures of a wide variety • some simple word counts • syntactic features • classes and subclasses of N, V, Adj, Adv • Factor analysis • choose a range of features to measure, see which ones are correlated
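
The "see which ones are correlated" step can be illustrated with a plain correlation matrix, which is the usual input to a factor analysis. The sketch below stops at that first step and does not perform factor analysis itself. The feature names and the per-text rates are invented for illustration and do not come from any real corpus.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(features):
    """Map each pair of feature names to the correlation of their values."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b])
            for a in names for b in names}

# Hypothetical rates per 1,000 words in four texts (invented numbers).
features = {
    "passives": [12.0, 15.0, 3.0, 2.0],
    "conjuncts": [6.0, 8.0, 1.0, 0.5],
    "first_person_pronouns": [2.0, 1.0, 30.0, 35.0],
}

matrix = correlation_matrix(features)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} ~ {b}: {r:+.2f}")
```

In this toy data, passives and conjuncts rise and fall together while first-person pronouns move the opposite way. Co-occurrence patterns of that kind are what a factor analysis would group into a single dimension.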

  8. ~150 features in all

  9. Corpus Stylistics Methodology (cont’d) • Example: work based on corpora trying to quantify and characterise genre and register differences • Work pioneered by Douglas Biber* • Biber used statistical measures to identify stylistic factors that co-occurred, and could therefore be definitional of text types and genres • E.g. conjuncts like “therefore” and “nevertheless”, used together with the passive, indicate a more formal style • *D. Biber, S. Conrad & R. Reppen, Corpus Linguistics: Investigating Language Structure and Use, Ch 5: the study of discourse characteristics

  10. Corpus Stylistics Methodology (cont’d) • Corpora useful not only for counting frequencies of features, but also: • Concordancing • Lists occurrences of word in context • Identify syntactic use of word • Identify range of meanings • Identify relative frequency of different uses/meanings • Collocation • What words occur together? • Compare distribution of close synonyms

  11. Corpus Stylistics Methodology (cont’d) Vocabulary in context • “Concordance”, also known as KWIC list (key word in context) • Allows us to see the (immediate) environment in which a word appears • Listings can be customised to show what you want more clearly, e.g. • sorted according to next or previous word • showing more or less context
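
A KWIC listing of the kind described on this slide can be sketched as follows. The function names `kwic` and `format_kwic`, the `width` and `sort_by` parameters, and the sample sentence are all invented for this illustration. Dedicated concordancers offer far more options.

```python
import re

def kwic(text, keyword, width=3, sort_by=None):
    """Return (left context, keyword, right context) for every hit.

    width   -- number of words of context shown on each side
    sort_by -- None (text order), "next" or "previous" word
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    if sort_by == "next":
        hits.sort(key=lambda h: h[2][:1])
    elif sort_by == "previous":
        hits.sort(key=lambda h: h[0][-1:])
    return hits

def format_kwic(hits):
    """Align the keyword in a column, as in a printed concordance."""
    return "\n".join(
        f"{' '.join(left):>30}  [{key}]  {' '.join(right)}"
        for left, key, right in hits
    )

# Invented sample text, for illustration only.
sample = "We waste time. You save time and spend time wisely."
print(format_kwic(kwic(sample, "time", width=2, sort_by="next")))
```

Changing `width` shows more or less context, and `sort_by` groups lines that share the same neighbouring word, which makes recurring patterns easier to spot.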

  12. Corpus Stylistics Methodology (cont’d) Collocation • Term coined by J R Firth (1957) to characterise (part of) his theory of meaning • “You shall judge a word by the company it keeps” • “The occurrence of two or more words within a short space of each other in a text” (Sinclair 1991) • “The relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey 1991)

  13. Style and Corpora Methodology (cont’d) Collocation, text type and style – example: • Distinguish between general and more usual collocations vs. technical and more personal ones • e.g. in a general corpus time collocates with save, spend, waste, fritter away, … • but in a corpus of sports reports time collocates with half, full, extra, injury, first, second, third, …
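
The comparison of collocates across text types can be sketched by counting the words that fall within a short span of a node word, following Sinclair's "short space" definition. The function name `collocates`, the `window` parameter and the two miniature "corpora" are invented for illustration. Raw counts like these do not test whether a pairing is more frequent than chance (Hoey's criterion), which would need an association measure on top.

```python
import re
from collections import Counter

def collocates(text, node, window=2):
    """Count words occurring within `window` words either side of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(context)
    return counts

# Invented miniature "corpora", for illustration only.
general = "Do not waste time. Save time and spend time well."
sports = "They scored in injury time. It went to extra time after full time."

print(collocates(general, "time", window=1).most_common())
print(collocates(sports, "time", window=1).most_common())
```

Even at this toy scale the two lists differ in the way the slide describes: save, spend and waste appear next to time in the general text, while injury, extra and full appear in the sports text.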

  14. Style and Corpora Applications Stylometry • An attempt to capture the essence of the style of a particular author by reference to a variety of quantitative criteria, usually lexical, called discriminators. • Study of frequently occurring features (word/sentence length; choice and frequency of words; vocabulary richness) • The ideal situation for authorship studies is • when there are large amounts of undisputed text, or • few contenders for the authorship of the disputed text(s).

  15. Style and Corpora Applications (cont’d) Author attribution Establishing the author of an unascribed manuscript: • Build corpora • A - works definitely by author A • B - works definitely by author B • C - works of disputed authorship, but probably written by A or B • Then select discriminants and associated measures • When the technique has been shown to discriminate effectively between A and B, then try it on C (M. Oakes: ‘Computational Stylometry’, in Handbook of Corpus Linguistics)
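
The three-step procedure above (build A, B and C; choose discriminants; validate on A and B before trying C) can be sketched as follows. This is one simple possibility, a nearest-centroid comparison of function-word frequencies, and is not presented as the method described by Oakes. The word list `FUNCTION_WORDS`, all function names and every text in the example are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical discriminators: a handful of function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "upon", "while"]

def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each discriminator word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in words]

def centroid(profiles):
    """Average profile of an author's known works."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def distance(p, q):
    """Manhattan distance between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def attribute(text, centroid_a, centroid_b):
    """Assign the text to whichever author's centroid is closer."""
    p = profile(text)
    return "A" if distance(p, centroid_a) <= distance(p, centroid_b) else "B"

def accuracy(held_out, centroid_a, centroid_b):
    """Share of held-out (text, true author) pairs attributed correctly."""
    correct = sum(attribute(t, centroid_a, centroid_b) == label
                  for t, label in held_out)
    return correct / len(held_out)

# Invented toy corpora; real studies need large amounts of undisputed text.
corpus_a = ["upon the hill and upon the road", "the king sat upon the throne"]
corpus_b = ["while the rain fell and while we slept", "we sang while the band played"]
held_out = [("he stood upon the bridge", "A"),
            ("she read while the kettle boiled", "B")]
disputed = "they waited upon the shore"

centroid_a = centroid([profile(t) for t in corpus_a])
centroid_b = centroid([profile(t) for t in corpus_b])

# Step 1: check that the measure separates A from B on known texts.
print("held-out accuracy:", accuracy(held_out, centroid_a, centroid_b))
# Step 2: only then apply it to the disputed text C.
print("disputed text attributed to:", attribute(disputed, centroid_a, centroid_b))
```

The validation step matters: if the held-out accuracy is poor, the chosen discriminants do not separate the two authors and any attribution of C would be meaningless.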

  16. Style and Corpora Applications (cont’d) Language Learning • Frequency - in particular, word frequency - had a role in language learning in the days before electronic corpora existed. • The 'corpus revolution' made frequency information about language use available on an unprecedented scale • Frequency dictionaries and frequency-based grammatical information are becoming more and more available, and new sources of frequency information from the Web are being tapped • Various kinds of knowledge found in present-day language textbooks (grammatical, collocational, semantic) are increasingly frequency-based. • In general, corpora represent real usage of language • In addition, “more frequent” can equal “more important” in many aspects of language learning

  17. Style and Corpora Applications (cont’d) Interpretive stylistics • Programmes like WordSmith Tools and other Windows-based applications allow researchers to derive a list of keywords (words which occur significantly more often than expected in texts when compared to a reference corpus). • Keywords are a powerful and quick means of analysis, and they have been used to examine discourses relating to specific social and cultural issues, and the ideology behind authors / texts • See e.g. work by P. Baker on gender and sexual identity
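
Keyword extraction against a reference corpus can be sketched with the log-likelihood statistic, one commonly used keyness measure. This sketch does not reproduce how WordSmith Tools or any other package works internally. The function names, the `min_ll` parameter and the toy texts are assumptions made for the example. The default threshold of 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.

```python
import re
from math import log
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a, b -- frequency of the word in the study and reference corpus
    c, d -- total number of tokens in the study and reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * log(a / expected_study)
    if b > 0:
        score += b * log(b / expected_ref)
    return 2 * score

def keywords(study_text, reference_text, min_ll=3.84):
    """Words relatively more frequent in the study text, ranked by keyness."""
    study = Counter(re.findall(r"\w+", study_text.lower()))
    ref = Counter(re.findall(r"\w+", reference_text.lower()))
    c, d = sum(study.values()), sum(ref.values())
    result = []
    for word, a in study.items():
        b = ref[word]  # Counter returns 0 for unseen words
        if a / c > b / d:  # keep only positively key words
            score = log_likelihood(a, b, c, d)
            if score >= min_ll:
                result.append((word, score))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

# Invented toy texts, for illustration only.
study = "goal " * 20 + "the " * 20
reference = "the " * 20 + "of " * 20
print(keywords(study, reference))
```

Words that are equally frequent in both corpora score zero, so common function words drop out and the list is left with vocabulary that distinguishes the study text.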

  18. Reading • Leech, G. Language and Literature: Style and Foregrounding (Longman, 2008), ch. 11 • Leech, G. and Short, M. Style in Fiction (Routledge, 2007), ch. 2 and 3 • Semino, E. and Short, M. Corpus Stylistics: Speech, writing and thought presentation in a corpus of English writing (Routledge, 2004) • Short, M. Exploring the language of poems, plays, and prose (Longman, 1996), ch. 11
