This project investigates the structural nature of texts through quantitative typology. By examining various forms of text and analyzing their components — such as word length, sentence structure, and genre — we aim to reconstruct norms and standards that define what a text is. Our methodology includes both qualitative and quantitative analyses, applied to data from diverse text types. We explore issues of data homogeneity and define basic analytical units while offering insights into text classification, including distinguishing between different functional styles and text sorts.
Peter Grzybek & Ernst Stadlober Quantitative Text Typology http://www-gewi.uni-graz.at/quanta http://quanta.uni-graz.at Austrian Research Fund Project #15485
Let's suppose there is … A Universe of Texts
Is the Universe Structured ? Or Can We Structure it ? How Can the Text Universe Be Structured?
Corpus Analysis vs. Text Analysis • (Re-)Construction • of a norm • of a standard • of „language“ Text As a Homogeneous Entity „Text Mixture“ Self-regulating System („Quasi Text“) Complete Text
What is a Text ? • Complete novel, composed of books ? • Complete book of a novel, consisting of several chapters ? • Individual chapters ? • Dialogical vs. narrative sequences within a text ? • Two Major Problems: • Data Homogeneity • Definition of Basic Analytical Units
Both problems are relevant for quantitative approaches WHY QUANTITATIVE APPROACHES ? • ASSUMPTION: • If a ‚text‘ is governed by synergetic processes, these processes can and must be quantitatively described. • The descriptive models obtained for each ‚text‘ can be compared to each other, possibly resulting in one or more general model(s). • Thus, a quantitative typology of texts can be obtained.
WHY WORD LENGTH ? Synergetics In a Nutshell – Frequencies and Dependencies Word Length: Graphemes, Phonemes, Syllables, Morphemes,…
TYPES OF TEXT TYPOLOGIES • I. Qualitative • II. Quantitative-Qualitative • Tabula Rasa Principle (Clustering Methods) • A-priori A-posteriori Principle (Discrimination Methods)
Structuring the Text Universe (Ia): Text Sorts
Structuring the Text Universe (Ib): Functional Styles
In a qualitative approach, the text universe is structured with regard to external (pragmatic) factors („with reference to the world“) • general communicative functions of language (functional styles) • specific situational functions (text sorts)
Top-Down Bottom-Up
Bottom-Up Top-Down First and Second Order Cross Comparisons
Intended Emphasis on Letters • ‚Letter‘ as a Prototype of Language • Located between Oral and Written Communication • Result of One Homogeneous Process of Text Generation
A Small World of Texts Word Length Frequencies (in %) of Four Texts Literary Prose Text (#256) Versified Poetic Text (#359) Journalistic Comment (#324) Private Letter (#1)
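As a rough illustration of how a word length frequency table (in %) like the one on this slide could be produced, here is a minimal sketch. It assumes word length is measured in letters (graphemes) and that a word is any maximal run of letters; the slides also mention phonemes, syllables and morphemes as possible units, which this sketch does not cover. The function name `word_length_frequencies` and the sample sentence are invented for illustration.

```python
import re
from collections import Counter


def word_length_frequencies(text):
    """Return {word length in letters: relative frequency in %}."""
    # Assumption of this sketch: a word is a maximal run of letters,
    # so digits, punctuation and underscores act as separators.
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


# Invented sample sentence, not taken from the project's corpus.
sample = "Dear Anna, thank you for the long letter."
freqs = word_length_frequencies(sample)
for length, pct in freqs.items():
    print(f"{length} letters: {pct:5.1f} %")
```

Counting syllables instead of letters would need a language-specific syllabification rule, which is why letters are used here.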
Post-Hoc-Tests (Text Sorts) Groups without significant differences form „homogeneous subgroups“ • Homogeneous subgroups do exist • All four letter types in different subgroups!
Post-Hoc Analyses: Homogeneous Subgroups Discriminant Analyses: Cases are attributed to groups on the basis of specific predictor variables. The variables are submitted to linear transformations in order to arrive at an optimal discrimination of the individual cases.
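The idea of a linear transformation that separates groups can be sketched with a two-class Fisher linear discriminant on two predictor variables. This is a generic textbook construction, not necessarily the procedure or software the authors used. The feature values, the group labels and the function names (`fit_lda`, `predict`) are all invented for illustration.

```python
def mean_vec(rows):
    """Component-wise mean of a list of 2-dimensional points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mu):
    """2x2 scatter matrix of the points around their mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fit_lda(group_a, group_b):
    """Return (w, threshold) of a two-class Fisher discriminant."""
    mu_a, mu_b = mean_vec(group_a), mean_vec(group_b)
    sa, sb = scatter(group_a, mu_a), scatter(group_b, mu_b)
    # Pooled within-group scatter and its inverse (2x2 closed form).
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    # Discriminant direction w = Sw^-1 (mu_a - mu_b).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut point: projection of the midpoint between the two group means.
    mid = [(mu_a[0] + mu_b[0]) / 2, (mu_a[1] + mu_b[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def predict(w, threshold, x, label_a="private", label_b="public"):
    """Attribute a case to a group by its score on the discriminant."""
    score = w[0] * x[0] + w[1] * x[1]
    return label_a if score > threshold else label_b


# Invented toy cases: two predictor values per text.
private = [(1.8, 0.30), (1.9, 0.32), (1.7, 0.28), (2.0, 0.31)]
public = [(2.4, 0.25), (2.5, 0.22), (2.3, 0.26), (2.6, 0.24)]

w, threshold = fit_lda(private, public)
cases = [(x, "private") for x in private] + [(x, "public") for x in public]
hits = sum(1 for x, label in cases if predict(w, threshold, x) == label)
rate = 100.0 * hits / len(cases)
print(f"correctly classified: {rate:.2f} %")
```

The percentage printed at the end is the same kind of figure as the classification rates quoted on the following slides, though here it is computed on the training cases of a toy data set only.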
Discriminant Analysis: Eight Text Sorts Discrimination variables: m1, m2, v, p1 (56.30 %)
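The slides name the discrimination variables without defining them. One plausible reading, assumed here and not confirmed by the source, is that m1 is the mean word length, m2 the variance (second central moment), v the coefficient of variation, and p1 and p2 the proportions of words of length 1 and 2. Under that assumption, the variables could be computed for a single text as in the sketch below; the function name `length_features` and the input values are invented.

```python
import math


def length_features(lengths):
    """Summary statistics of a list of word lengths (assumed definitions)."""
    n = len(lengths)
    m1 = sum(lengths) / n                          # mean word length
    m2 = sum((x - m1) ** 2 for x in lengths) / n   # second central moment
    v = math.sqrt(m2) / m1                         # coefficient of variation
    p1 = sum(1 for x in lengths if x == 1) / n     # share of length-1 words
    p2 = sum(1 for x in lengths if x == 2) / n     # share of length-2 words
    return {"m1": m1, "m2": m2, "v": v, "p1": p1, "p2": p2}


# Invented word lengths for one hypothetical text.
toy_lengths = [1, 2, 2, 3, 1, 2, 4, 1]
features = length_features(toy_lengths)
print(features)
```

Each text then becomes one case with a handful of numeric predictors, which is the input format a discriminant analysis expects.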
Discriminant Analysis: Four Letter Types (n=213) {Private L.} {Ep. Novel} {Readers‘ L.} {Open L.} Discrimination variables: m1, v 70.40 %
Discriminant Analysis: Three Letter Types (n=213) {Private L., Ep. Novel} {Readers‘ L.} {Open L.} Discrimination variables: m1, p2 86.90 % Distinction of Literary Letters Irrelevant ?
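Why merging two groups tends to raise the share of correctly classified cases can be illustrated with a confusion table: once two groups are pooled, cases that were confused between them count as correct. The sketch below uses an entirely invented table, not the project's results, and it only relabels an existing classification. The rates on the slides presumably come from refitting the discriminant analysis on the merged groups, which this sketch does not do. The names `accuracy` and `merge_groups` are hypothetical.

```python
def accuracy(confusion):
    """Percentage of cases on the diagonal of confusion[true][predicted]."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[group].get(group, 0) for group in confusion)
    return 100.0 * correct / total


def merge_groups(confusion, mapping):
    """Relabel groups via mapping (old name -> new name) and pool counts."""
    merged = {}
    for true, row in confusion.items():
        t = mapping.get(true, true)
        for pred, n in row.items():
            p = mapping.get(pred, pred)
            merged.setdefault(t, {}).setdefault(p, 0)
            merged[t][p] += n
    return merged


# Invented counts: rows are true groups, columns are predicted groups.
confusion = {
    "private":  {"private": 30, "ep_novel": 15, "readers": 3, "open": 2},
    "ep_novel": {"private": 12, "ep_novel": 33, "readers": 3, "open": 2},
    "readers":  {"private": 2, "ep_novel": 2, "readers": 40, "open": 6},
    "open":     {"private": 1, "ep_novel": 2, "readers": 7, "open": 40},
}

before = accuracy(confusion)
pooled = merge_groups(confusion, {"private": "private_or_novel",
                                  "ep_novel": "private_or_novel"})
after = accuracy(pooled)
print(f"four groups: {before:.2f} %, three groups: {after:.2f} %")
```

A higher rate after merging therefore suggests that the merged groups were hard to tell apart on the chosen variables, which is the question the slide raises about literary letters.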
Discriminant Analysis: Private vs. Public Letters (n=213) {Private L., Ep. Novel}, {Readers‘ & Open L.} Discrimination variables: m1, p2 92.00 % Distinction of Private vs. Public Styles ?
Discriminant Analysis: Private vs. Public Texts (n=248) {Private L., Ep. Novel}, {Readers‘ & Open L., Comments} Discrimination variables: m1, p2 91.10 % Public vs. Private Styles ?
Discriminant Analysis: Private/Oral vs. Public/Written Texts (n=290) {Private L., Ep. Novel, Drama}, {Readers‘ & Open L., Comments} Discrimination variables: m1, p2 92.40 % Oral vs. Written Styles ?
Discriminant Analysis: Three Text Types (n=330) {Private / Oral} {Public / Written} {Verse} Discrimination variables: m1, p2, v 91.20 % Towards a New Typology ?
Discriminant Analysis: Four Text Types (n=398) {Private / Oral} {Public / Written} {Prose} {Verse} Discrimination variables: m1, p2, v 79.90 %
Discriminant Analysis: Three Text Types (n=398) {Private / Oral} {Public / Written / Prose} {Verse} Discrimination variables: m1, p2, v 92.70 %