Historical Linguistics • Language history • drift: change by internal development • contact: change by external borrowing • Possible relations among languages • family tree: • similarity due to separate development from common ancestor • diffusion of traits • similarity due to borrowing in period of contact • or, no provable relationship • Tasks of historical linguistics • inference of historical connections • reconstruction of “proto” languages
Colonial Philology • Thomas Jefferson corresponded with many sources to obtain word lists in Indian languages • Examined and compared the results of Peter the Great’s Siberian expeditions • Benjamin Franklin also collected Indian word lists
How many ages have elapsed since the English, Dutch, the Germans, the Swiss, the Norwegians, Danes and Swedes have separated from their common stock? Yet how many more must elapse before the proofs of their common origin, which exist in their several languages, will disappear? It is to be lamented then … that we have suffered so many of the Indian tribes already to extinguish, without our having previously collected and deposited in the records of literature, the general rudiments at least of the languages they spoke. Were vocabularies formed of all the languages spoken in North and South America, preserving their appellations of the most common objects in nature, of those which must be present to every nation barbarous or civilised, with the inflections of their nouns and verbs, their principles of regimen and concord, and these deposited in all the public libraries, it would furnish opportunities to those skilled in the languages of the old world to compare them with these, now or at a future time, and hence to construct the best evidence of the derivation of this part of the human race. Thomas Jefferson, Notes on the State of Virginia. [Written 1781-82].
Benjamin Barton sees a pattern By a careful inspection of the vocabularies, the reader will find no difficulty in discovering that in Asia the languages of the … tribes of the Delaware-stock may be all traced to ONE COMMON SOURCE. Nor do I limit this observation to the languages of the American tribes just mentioned… HITHERTO, WE HAVE NOT DISCOVERED IN AMERICA… ANY TWO, OR MORE LANGUAGES BETWEEN WHICH WE ARE INCAPABLE OF DETECTING AFFINITIES (AND THOSE VERY OFTEN STRIKING) EITHER IN AMERICAN, OR IN THE OLD WORLD. New Views of the Origin of the Tribes and Nations of America Benjamin Smith Barton M.D., Professor of Materia Medica, Natural History and Botany, in the University of Pennsylvania (1798)
Barton’s hypothesis: My inquiries seem to render it probable, that all the languages of the countries of America may … be traced to one or two great stocks…
Jefferson disagreed: …imperfect as is our knowledge of the tongues spoken in America, it suffices to discover the following remarkable fact. Arranging them under the radical ones to which they may be palpably traced, and doing the same by those of the red men of Asia, there will be found probably twenty in America, for one in Asia, of those radical languages, so called because, if they were ever the same, they have lost all resemblance to one another. A separation into dialects may be the work of a few ages only, but for two dialects to recede from one another till they have lost all vestiges of their common origin, must require an immense course of time; perhaps not less than many people give to the age of the earth. A greater number of those radical changes of language having taken place among the red men of America, proves them of greater antiquity than those of Asia. Notes on the State of Virginia [Written 1781-82]
Later, though, J. considered a sociolinguistic explanation… Having heard that some Indians considered it dishonorable to use any language but their own, he suggested that when a part of a tribe separated itself, the seceded group might refuse to use the original language and invent their own. “Perhaps this hypothesis presents less difficulty than that of so many radically distinct languages preserved by such handfuls of men from an antiquity so remote that no data we possess will enable us to calculate it.” [Ms. notes circa 1800]
Jefferson’s plans • By 1801, he had collected vocabularies for dozens of indigenous languages • and began to arrange the collection for publication “lest by some accident it might be lost” • He put off publication in 1803 • due to the opportunity to include the results of the Lewis & Clark expedition
The sad end of J.’s linguistic career • His linguistic papers were packed in a large trunk and shipped back to Monticello in 1809 with his other effects • The trunk was stolen during the trip up the James River • The disappointed thief dumped the contents in the river • Only a few items floated to shore and were recovered
Jefferson to Barton (1809), sent with Lewis’ vocabulary of Pani: It is a specimen of the condition of the little that was recovered. I am the more concerned at this accident, as of the two hundred and fifty words of my vocabularies, and the one hundred and thirty words of the great Russian vocabularies … seventy three were common to both, and would have furnished materials… from which something might have resulted. Perhaps I may make another attempt to collect, although I am too old to expect to make much progress in it.
Sir William (“Oriental”) Jones • Lawyer appointed in 1783 to superintend British jurisprudence in India • Founded the Asiatic Society in Calcutta “for Inquiring into the History, Civil and Natural, the Antiquities, Arts, Sciences, and Literature, of Asia” • Learned Sanskrit because “the laws of the natives must be preserved inviolate; but the learning and vigilance of the English judge must be a check upon the native interpreters”
One of the early European “orientalists” • Cross-cultural pioneers? • Agents of colonial domination?
Historical Context • The British in India • piecemeal conquest 1750-1900 • began with trade concessions in Calcutta and Bombay • expanded one principality at a time • mixture of direct and indirect rule • many Indian institutions left in place • rule mainly administered and enforced by Indians • until 1850s, administration was in the hands of the East India Company rather than the British Crown
Jones learns Sanskrit (1783-1786) • Sanskrit • Language of Hindu holy texts (1000 BC) • Formalized by grammarians c. 600 BC • Preserved to the present day as a language of religion and learning • No Brahman would teach a foreigner • Jones hired a vaidya (doctor) as tutor while the Brahmanic scholars were away on a religious retreat
Jones’ Third Discourse (1786) • Anniversary addresses to the Asiatic Society • First Discourse: purposes and procedures of the Society • Second Discourse: a detailed research program • Third Discourse: on the nations of Asia The five principal nations, who have in different ages divided among themselves, as a kind of inheritance, the vast continent of Asia, with the many islands depending on it, are the Indians, the Chinese, the Tartars, the Arabs, and the Persians; who they severally were, whence and when they came, where they now are settled, and what advantage a more perfect knowledge of them all may bring to our European world, will be shown, I trust, in five distinct essays; the last of which will demonstrate the connexion or diversity between them, and solve the great problem, whether they had any common origin, and whether that origin was the same, which we generally ascribe to them.
The Indo-European Hypothesis The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek; more copious than the Latin, and more exquisitely refined than either, yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists; there is a similar reason, though not quite so forcible, for supposing that both the Gothick and the Celtick, though blended with a very different idiom, had the same origin with the Sanskrit, and the old Persian might be added to the same family.
Jones’ American connection • Jones was a radical Whig and an early political supporter of the American Revolution • Met Benjamin Franklin at the Royal Society in 1771 • Visited Franklin in Paris in 1779, 1780, and 1782 • To explore compromise peace plans • To deal with a client’s property claims in Virginia • To obtain a pass for travel to America • considered emigration to Charleston or Philadelphia! • Many weeks of political and philosophical conversations • Indirect communication with Jefferson • Relations to the Virginia manuscript?
Jones’ methods • Analyst must be “perfectly acquainted” with the languages compared • Meanings of proposed cognates must be nearly identical • Vowels should not be disregarded • No metathesis or unexplained consonant insertions • Transliterations must be systematic and careful • Use basic vocabulary, not exotic words more likely to be borrowed
Remember Barton By a careful inspection of the vocabularies, the reader will find no difficulty in discovering that in Asia the languages of the … tribes of the Delaware-stock may be all traced to ONE COMMON SOURCE. Nor do I limit this observation to the languages of the American tribes just mentioned… HITHERTO, WE HAVE NOT DISCOVERED IN AMERICA… ANY TWO, OR MORE LANGUAGES BETWEEN WHICH WE ARE INCAPABLE OF DETECTING AFFINITIES (AND THOSE VERY OFTEN STRIKING) EITHER IN AMERICAN, OR IN THE OLD WORLD. New Views of the Origin of the Tribes and Nations of America Benjamin Smith Barton M.D., Professor of Materia Medica, Natural History and Botany, in the University of Pennsylvania (1798)
…imperfect as is our knowledge of the tongues spoken in America, it suffices to discover the following remarkable fact. Arranging them under the radical ones to which they may be palpably traced, and doing the same by those of the red men of Asia, there will be found probably twenty in America, for one in Asia, of those radical languages, so called because, if they were ever the same, they have lost all resemblance to one another. A separation into dialects may be the work of a few ages only, but for two dialects to recede from one another till they have lost all vestiges of their common origin, must require an immense course of time; perhaps not less than many people give to the age of the earth. A greater number of those radical changes of language having taken place among the red men of America, proves them of greater antiquity than those of Asia. Thomas Jefferson again: Notes on the State of Virginia, 1787
The controversy continues • (Like Barton) Joseph Greenberg (1987): • All American languages in three groups: • Eskimo-Aleut • Na-Dene • Amerind • (Like Jefferson) Other scholars: • The Amerind category is a fiction • There are • ~60 unrelated families in N. America • ~19 unrelated families in C. America • ~80 unrelated families in S. America
Different methods • Mass comparison • Cognate ratios (lexicostatistics) • Glottochronology • Typological features • e.g. classifier systems • Comparative reconstruction • Determination of systematic sound laws • Lexical and morphological reconstruction
“Laws” of sound change • Meaning change is usually sporadic • Sound change is usually systematic, e.g. • t/d deletion (best, past, lost, etc.) • short a raising (camera, man, vanish, etc.) • “Neogrammarian hypothesis” (1870s): • All sound change is systematic • Apparent exceptions: analysis is incomplete • Article of faith with scholars known as “the young grammarians”
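“Systematic” here means the change is conditioned by phonological environment, not by individual words. The sketch below is a toy, orthographic stand-in for the t/d deletion example, assuming a rule “delete word-final t or d after a consonant letter”; real t/d deletion is phonetic and variable, and the rule and the name `apply_rule` are illustrative assumptions.

```python
import re

# Toy orthographic rule: drop a word-final t or d when the preceding letter
# is a consonant (any non-vowel letter). Every word in the environment is affected.
VOWEL_LETTERS = "aeiou"
TD_DELETION = re.compile(rf"(?<=[^{VOWEL_LETTERS}])[td]$")

def apply_rule(word: str) -> str:
    """Apply the rule to any word that meets the environment, whatever it means."""
    return TD_DELETION.sub("", word)

words = ["best", "past", "lost", "hand", "cat", "bed"]
print({w: apply_rule(w) for w in words})
```

Words like “cat” and “bed” are not exceptions: they simply fall outside the environment (a vowel precedes the final stop), which is the Neogrammarian way of explaining apparent irregularity.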
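Grimm’s Law • Jakob Grimm (1822) • Gradation of consonant manner: • bh dh gh -> b d g • b d g -> p t k • p t k -> f th h • Examples (Latin or Sanskrit : English): • p t k -> f th h: pater : father, tres : three, canis : hound • b d g -> p t k: labium : lip, duo : two, ager : acre • bh dh gh -> b d g: bhratar : brother, dha : do, vah : wagon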
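The three sub-rules only give the right result if they are read as one simultaneous shift; run as separate passes in the wrong order, they feed each other and merge the series. A minimal sketch, assuming hand-segmented input where “th” stands for θ; the function names are hypothetical.

```python
# The three sub-rules of Grimm's Law as listed on the slide ("th" stands for θ).
RULE_VOICELESS = {"p": "f", "t": "th", "k": "h"}   # voiceless stops -> fricatives
RULE_VOICED = {"b": "p", "d": "t", "g": "k"}       # voiced stops -> voiceless stops
RULE_ASPIRATE = {"bh": "b", "dh": "d", "gh": "g"}  # voiced aspirates -> voiced stops
GRIMM = {**RULE_VOICELESS, **RULE_VOICED, **RULE_ASPIRATE}

def apply_simultaneous(segments):
    """Rewrite each segment once, looking only at its original value."""
    return [GRIMM.get(s, s) for s in segments]

def apply_sequential(segments, rules):
    """Apply each sub-rule as a separate pass over the output of the previous one."""
    out = list(segments)
    for rule in rules:
        out = [rule.get(s, s) for s in out]
    return out

stops = ["p", "b", "bh"]
print(apply_simultaneous(stops))                                              # ['f', 'p', 'b']
print(apply_sequential(stops, [RULE_VOICELESS, RULE_VOICED, RULE_ASPIRATE]))  # ['f', 'p', 'b']
print(apply_sequential(stops, [RULE_ASPIRATE, RULE_VOICED, RULE_VOICELESS]))  # ['f', 'f', 'f']
```

The last line shows why the ordering matters: bh becomes b, then p, then f, so three distinct PIE series would collapse into one, which is not what the Germanic cognates show.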
Verner’s Law • Karl Adolf Verner (1875) • Fixes “gaps” in Grimm’s Law: • voiceless fricatives became voiced when the preceding vowel was unaccented • also applies to sounds not produced by Grimm’s Law (e.g. s -> z) • from PIE to Gothic in four algorithmic steps: • PIE pətér -> (Grimm’s Law) fəthér -> (vowel changes) fathár -> (Verner’s Law) fadár -> (accent shift) fádar
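The four steps can be run as ordered rewrite functions. This is a toy sketch, assuming hand-segmented input (“ə” for schwa, “th” for θ, accent marked on the vowel); the vowel rule is simplified only to reproduce the slide’s example, and all function names are hypothetical.

```python
# Toy derivation of Gothic "fadar" from PIE *pəter, following the slide's four steps.
GRIMM = {"p": "f", "t": "th", "k": "h", "b": "p", "d": "t", "g": "k",
         "bh": "b", "dh": "d", "gh": "g"}
PLAIN = {"á": "a", "é": "e", "ó": "o"}             # accented vowel -> unaccented
ACCENT = {v: k for k, v in PLAIN.items()}          # unaccented vowel -> accented
VOWELS = {"a", "e", "o", "ə"} | set(PLAIN)
VERNER = {"f": "b", "th": "d", "h": "g", "s": "z"}  # voiced fricatives, written with stop letters

def grimms_law(segs):
    return [GRIMM.get(s, s) for s in segs]

def vowel_changes(segs):
    # Simplified to match the slide's example only: ə and e both surface as a.
    table = {"ə": "a", "e": "a", "é": "á"}
    return [table.get(s, s) for s in segs]

def verners_law(segs):
    out = list(segs)
    for i, s in enumerate(segs):
        if s not in VERNER:
            continue
        prev_vowels = [v for v in segs[:i] if v in VOWELS]
        # Voice the fricative only if the nearest preceding vowel is unaccented.
        if prev_vowels and prev_vowels[-1] not in PLAIN:
            out[i] = VERNER[s]
    return out

def accent_shift(segs):
    out = [PLAIN.get(s, s) for s in segs]            # remove the old accent
    first = next(i for i, s in enumerate(out) if s in VOWELS)
    out[first] = ACCENT.get(out[first], out[first])  # stress the first vowel
    return out

def derive(segs):
    """Return the form after each of the four steps, in order."""
    forms = []
    for step in (grimms_law, vowel_changes, verners_law, accent_shift):
        segs = step(segs)
        forms.append("".join(segs))
    return forms

print(derive(["p", "ə", "t", "é", "r"]))
```

Feeding a toy form with root accent, such as `["bh", "r", "á", "t", "e", "r"]`, leaves the th unvoiced (“bráthar”, vowel length ignored), which is the father/brother contrast (Gothic fadar vs. brōþar) that Verner’s Law explains.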
More on sound change • Well attested in recent history • e.g. the English Great Vowel Shift • Can study sound change in progress today • Tends to produce tree-like histories • operates on the system as a whole • isn’t easily borrowed across languages
Problems with comparative reconstruction • Requires detailed knowledge of languages involved • Must be enough cognates for patterns to emerge • and layers of borrowing to be identified and discarded • Maximum time depth of 5-10K years • (Jefferson was right)
Cognate percentages • Catherine the Great’s method • make a list of appellations of the most common objects in nature, of those which must be present to every nation barbarous or civilised • Standard lists devised by Morris Swadesh around 1950 • For each pair of languages, estimate the proportion of cognate words • Raw result is a table of percentages • like a table of trip distances
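A minimal sketch of the per-pair computation, assuming cognate judgments have already been made by hand and encoded as one class label per meaning (same label = judged cognate). The languages, meanings, and labels below are invented for illustration.

```python
# Hypothetical cognate-class labels for a (truncated) basic-vocabulary list.
LANGS = {
    "LangA": {"water": 1, "fire": 1, "eye": 1, "two": 1},
    "LangB": {"water": 1, "fire": 2, "eye": 1, "two": 1},
    "LangC": {"water": 2, "fire": 2, "eye": 1, "two": 3},
}

def cognate_percentage(a, b):
    """Percentage of meanings attested in both lists whose words share a cognate class."""
    shared = [m for m in a if m in b]
    if not shared:
        raise ValueError("no meanings in common")
    hits = sum(a[m] == b[m] for m in shared)
    return 100.0 * hits / len(shared)

def percentage_table(langs):
    """Lower-triangle table of percentages, like a table of trip distances."""
    names = sorted(langs)
    return {(x, y): cognate_percentage(langs[x], langs[y])
            for i, x in enumerate(names) for y in names[i + 1:]}

for pair, pct in percentage_table(LANGS).items():
    print(pair, pct)
```

Restricting to meanings attested in both lists keeps gaps in one list from being counted as non-cognate.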
Example: Central Yambasa languages (Cameroon)
Gunu [two lists]
  82 Elip
  85  90 Mmala [two lists]
  78  90  89 Yangben [two lists]
  77  81  81  88 Baca [two lists]
  66  72  72  77  78 Mbule [two lists]
  58  63  64  66  70  69 Bati
  42  41  42  42  42  46  45 Hijuk [two lists]
  39  38  41  38  37  40  41  88 Basaa
Questions about lexicostatistics • “Genetic descent” vs. borrowing • borrowing creates non-tree structures • Variability of rate of change • Swadesh: 14% loss per millennium • Expected rate of false cognates • How to combine with other evidence • Inference of tree structure • from cognate percentages • from detailed account of shared traits
Historical inference from linguistic and genetic data • Potentially “…the best evidence of the derivation of … the human race” (Thomas Jefferson) • BUT • Inferences are complex • methods and results from several disciplines • Intellectual stakes are high • Work has often been careless • sometimes spectacularly so • dangers of overinterpretation and “scientism”
General methodological problems • Not all graphs are trees • “treeness” tests often left out • “treeness” hypothesis can often be rejected • Tree inference may be underdetermined • Branching structure • Root choice • Rates of change may not be constant • for different markers • across time • Gene trees (and language trees) may not be population trees • Biology and language are complicated • simplifying assumptions are sometimes perniciously mistaken
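One standard “treeness” test is the four-point condition: distances fit an additive tree exactly only if, for every quartet, the two largest of the three pairwise-sum combinations are equal. A minimal sketch, assuming distances are stored as a symmetric dict of dicts; converting cognate percentages with −ln(p) (the helper `cognate_distance`, an assumption here) turns a multiplicative retention model into additive path lengths. The small matrices are invented.

```python
from itertools import combinations
import math

def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): d}."""
    d = {}
    for (a, b), v in pairs.items():
        d.setdefault(a, {})[b] = v
        d.setdefault(b, {})[a] = v
    return d

def cognate_distance(percent):
    """-ln(p): products of retention proportions become sums of distances."""
    return -math.log(percent / 100)

def four_point_violation(d):
    """Largest gap between the two largest quartet sums; 0 means tree-like."""
    worst = 0.0
    for i, j, k, l in combinations(sorted(d), 4):
        s = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        worst = max(worst, s[2] - s[1])
    return worst

# Distances generated from an invented tree ((a,b),(c,d)): exactly additive.
tree = symmetric({("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 9,
                  ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11})
print(four_point_violation(tree))
```

With real data the violation is never exactly zero, so a test also needs some notion of how large a violation is tolerable; that is the part often left out.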
Trees vs. Clines (etc.) • A tree structure represents the results of a sequence of splits in population (or language) • no further influences among separate branches • if rates of change are constant, distances should be quantized • Within an interbreeding (intercommunicating) population, distances reflect the amount of gene flow (transmission of linguistic traits) • should correlate strongly with accessibility • e.g. geographical distance in the simplest case
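If rates of change are constant, tree distances depend only on the depth of the common ancestor, so for any three languages the two largest distances are equal (the ultrametric or “three-point” condition) and distances cluster into a few values. A minimal sketch under that assumption, with invented distance matrices:

```python
from itertools import combinations

def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): d}."""
    d = {}
    for (a, b), v in pairs.items():
        d.setdefault(a, {})[b] = v
        d.setdefault(b, {})[a] = v
    return d

def three_point_violation(d):
    """Largest gap between the two largest distances in any triple; 0 = clock-like."""
    worst = 0.0
    for i, j, k in combinations(sorted(d), 3):
        _, mid, top = sorted([d[i][j], d[i][k], d[j][k]])
        worst = max(worst, top - mid)
    return worst

# Invented clock-like tree: a and b split at depth 1, c joins at depth 3.
clock = symmetric({("a", "b"): 2, ("a", "c"): 6, ("b", "c"): 6})
# Invented tree-like but non-clock distances (unequal branch rates).
no_clock = symmetric({("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 9,
                      ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11})
print(three_point_violation(clock), three_point_violation(no_clock))
```

Distances shaped by ongoing contact instead tend to track accessibility, so neither condition need hold.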
The… procedures outlined here provide a rigorous method for inferring whether the geographical pattern of variation is consistent with an historical split (fragmentation) or no split (recurrent gene flow) using criteria that are completely explicit. For example, in analyzing the mtDNA of tiger salamanders, a clear split into eastern and western lineages was detected for mtDNA. Using the same explicit criteria, there was no split among any human populations. Quite the contrary, the present analysis documents recurrent and continual genetic interchange among all Old World human populations throughout the entire time period marked by mtDNA. Accordingly, estimating a date for a 'split' of Africans from non-Africans based on evidence from mtDNA is certainly allowed by many computer programs, but the results are meaningless because a date is being assigned to an 'event' that never occurred. Templeton (1997)
Methods for tree inference (“phylogeny”) • Two general approaches • clustering (easier but cruder) • generate and evaluate alternative trees • Distance-based methods • based on matrix of distances/similarities • Parsimony • based on set of partly-shared characters or traits http://evolution.genetics.washington.edu/phylip/software.html documents 193 different phylogeny packages
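As an example of the clustering approach, here is a minimal average-linkage (UPGMA-style) sketch on the Vanuatu cognate percentages from Guy (1994). It merges the pair of clusters with the highest average similarity, which is equivalent to merging the lowest average distance if distance = 100 − percentage. The function name and cluster labels are assumptions for illustration.

```python
from itertools import combinations

LANGS = ["Toga", "Mosina", "Peterara", "Nduindui", "Sakao", "Malo", "Fortsenal", "Raga"]
ROWS = [[], [64], [64, 58], [57, 51, 65], [29, 28, 34, 32],
        [51, 45, 55, 52, 40], [39, 39, 45, 41, 43, 50], [52, 48, 57, 60, 31, 48, 45]]
SIM = {frozenset((LANGS[i], LANGS[j])): ROWS[i][j]
       for i in range(len(LANGS)) for j in range(i)}

def upgma_merges(names, sim):
    """Return cluster labels in the order they are formed."""
    clusters = {n: [n] for n in names}  # label -> member languages

    def average(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        label = f"({a},{b})"
        clusters[label] = clusters.pop(a) + clusters.pop(b)
        merges.append(label)
    return merges

for m in upgma_merges(LANGS, SIM):
    print(m)
```

On these data the first merge is Peterara with Nduindui (65%), whereas Guy’s fitted tree places Peterara with Toga and Mosina; cruder clustering and explicit tree evaluation need not agree.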
Cognate percentages for 8 Vanuatu languages
Toga
  64 Mosina
  64  58 Peterara
  57  51  65 Nduindui
  29  28  34  32 Sakao
  51  45  55  52  40 Malo
  39  39  45  41  43  50 Fortsenal
  52  48  57  60  31  48  45 Raga
Data from Guy (1994)
Reconstruction Algorithm (Guy 1994) “A message is input at the root of a tree-shaped transmission network, whence it is transmitted to the terminal nodes. As they travel, copies of the original message are affected by errors consisting in randomly selected segments of the message being replaced by other segments randomly drawn from a pool of possible segments (the "alphabet" of the message). The problem is: from the garbled versions of the original message collected at the terminal nodes, reconstruct the network and the history of the transmission of the message.” • “Additive-distance” tree with weights on branches rather than on nodes: weights multiply along a path, so their negative logarithms add • doesn’t assume a constant rate of change…
Explanatory force of the model • For $n$ languages, the set of pairwise distances grows as $n(n-1)/2$ • The set of binary-tree branch labels grows as $2n-2$ • For 8 languages: we predict 28 numbers (the inter-language cognate proportions) with 14 numbers (the binary tree branch proportions)
Inferred tree
Toga      -830-----:-919-----:-972-----:-947-----:
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'
Mosina/Toga: .77*.83 = .6391 (really 64%)
Peterara/Mosina: .829*.919*.77 = .5866 (really 58%)
Peterara/Toga: .829*.919*.830 = .6323 (really 64%)
from Guy (1994)
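The hand calculations above generalize to every pair: the predicted shared proportion is the product of branch proportions along the path between two languages. A minimal sketch, assuming the tree is stored as child -> (parent, branch proportion); the internal node labels A, B, C, N, S, M and root R are hypothetical names, not Guy’s. Observed values are the Vanuatu cognate percentages.

```python
# Guy's (1994) fitted tree as child -> (parent, branch proportion).
PARENT = {
    "Toga": ("A", 0.830), "Mosina": ("A", 0.770),
    "A": ("B", 0.919), "Peterara": ("B", 0.829),
    "B": ("C", 0.972),
    "Nduindui": ("N", 0.795), "Raga": ("N", 0.755),
    "N": ("C", 0.949), "C": ("R", 0.947),
    "Sakao": ("S", 0.567), "Fortsenal": ("S", 0.759),
    "S": ("M", 0.883), "Malo": ("M", 0.772), "M": ("R", 0.895),
}

LANGS = ["Toga", "Mosina", "Peterara", "Nduindui", "Sakao", "Malo", "Fortsenal", "Raga"]
ROWS = [[], [64], [64, 58], [57, 51, 65], [29, 28, 34, 32],
        [51, 45, 55, 52, 40], [39, 39, 45, 41, 43, 50], [52, 48, 57, 60, 31, 48, 45]]
OBSERVED = {(LANGS[i], LANGS[j]): ROWS[i][j] for i in range(len(LANGS)) for j in range(i)}

def products_to_root(leaf):
    """Map each ancestor of `leaf` to the product of branch proportions up to it."""
    out, prod, node = {leaf: 1.0}, 1.0, leaf
    while node in PARENT:
        parent, w = PARENT[node]
        prod *= w
        out[parent] = prod
        node = parent
    return out

def predicted(a, b):
    """Product of branch proportions on the path between languages a and b."""
    up_a = products_to_root(a)
    prod, node = 1.0, b
    while node not in up_a:  # climb from b until reaching an ancestor of a
        node, w = PARENT[node]
        prod *= w
    return up_a[node] * prod

def residual(a, b):
    """True minus predicted cognate percentage."""
    return OBSERVED[(a, b)] - 100 * predicted(a, b)

for i, a in enumerate(LANGS):
    print(" ".join(f"{round(residual(a, b)):4d}" for b in LANGS[:i]), a)
```

The printed residual triangle can be compared directly with the table of true minus predicted percentages; 14 branch proportions account for all 28 observed values.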
True minus predicted cognate percentages
Toga
   0 Mosina
   1  -1 Peterara
   1  -1   4 Nduindui
  -2  -1   0   0 Sakao
   2   0   2   3   1 Malo
  -3   0  -1  -2   0  -2 Fortsenal
  -1  -1  -1   0   1   1   4 Raga
The model fits very well: every residual is within ±4 percentage points!
Where’s the root? Isn’t it obvious?
Toga      -830-----:-919-----:-972-----:-947-----:--Protolanguage
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'
Oops: other options
protolanguage
Toga      -830-----:-919-----:-972-----:-947-----:
Mosina    -770-----'         |         |         |
Peterara  -----829-----------'         |         |
Nduindui  -----795-----------:-949-----'         |
Raga      -----755-----------'                   |
Sakao     -----567-----------:-883-----:-895-----'
Fortsenal -----759-----------'         |
Malo      ----------772----------------'
And some more…
protolanguage
Toga     -830-:-919-:-972-:-947-:-895-:-883-:-567- Sakao
Mosina   -770-'     |     |           |     `-759- Fortsenal
Peterara -----829---'     |           `---772----- Malo
Nduindui -----795---:-949-'
Raga     -----755---'
In the absence of other constraints, the root can be placed anywhere in the tree without changing the model’s fit!
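Why the root is free: predictions depend only on products along leaf-to-leaf paths, and placing a root on an edge just splits its weight w into two factors whose product is still w. A minimal sketch on an invented 4-leaf tree (all weights hypothetical), assuming the root splits w as w**share and w**(1 − share):

```python
from itertools import combinations

# Invented unrooted 4-leaf tree; weights are branch retention proportions.
EDGES = [("A", "x", 0.90), ("B", "x", 0.80), ("x", "y", 0.95),
         ("C", "y", 0.85), ("D", "y", 0.70)]
LEAVES = ["A", "B", "C", "D"]

def path_product(edges, start, goal):
    """Product of edge weights along the unique path between two nodes of a tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    stack = [(start, None, 1.0)]
    while stack:
        node, came_from, prod = stack.pop()
        if node == goal:
            return prod
        for nxt, w in adj[node]:
            if nxt != came_from:  # no cycles in a tree, so this prevents backtracking
                stack.append((nxt, node, prod * w))
    raise ValueError("no path")

def root_on_edge(edges, u, v, share=0.5):
    """Insert a 'root' node on edge (u, v), splitting weight w into two factors."""
    out = []
    for a, b, w in edges:
        if {a, b} == {u, v}:
            out += [(a, "root", w ** share), (b, "root", w ** (1 - share))]
        else:
            out.append((a, b, w))
    return out

def predictions(edges):
    return {(a, b): path_product(edges, a, b) for a, b in combinations(LEAVES, 2)}

base = predictions(EDGES)
for rooted in (root_on_edge(EDGES, "x", "y"), root_on_edge(EDGES, "C", "y", 0.3)):
    p = predictions(rooted)
    print(max(abs(p[k] - base[k]) for k in base))  # floating-point noise only
```

Fixing the root therefore needs information outside the distance table itself, such as an outgroup or a rate assumption.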
Possible “other constraints” • Historical evidence • about earlier forms • about structure of relationships among contemporary forms • “outgroup” • Constraints on rate of change • linguistic (or genetic) “clock”
A universal constant for glottochronology? Thirteen sets of data, presented in partial justification of these assumptions, serve as a basis for calculating a universal constant to express the average rate of retention k of the basic-root morphemes: k = 0.8048 ± 0.0176 per millennium, with a confidence limit of 90%. Lees (1953)
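How such a constant turns a cognate percentage into a date: if each of two daughter languages independently keeps a basic word with probability k per millennium, they are expected to share a proportion C = k^(2t) after t millennia, so t = ln C / (2 ln k). A minimal sketch under exactly those assumptions (no borrowing, no false cognates, constant rate); the function name is hypothetical.

```python
import math

LEES_K = 0.8048  # Lees (1953): retention per millennium, ± 0.0176

def divergence_millennia(shared, retention=LEES_K):
    """Time since the split, in millennia, assuming shared = retention ** (2 * t)."""
    if not 0 < shared <= 1:
        raise ValueError("shared proportion must be in (0, 1]")
    return math.log(shared) / (2 * math.log(retention))

# Gunu and Elip share 82% in the Central Yambasa table; vary k across Lees' interval.
for k in (LEES_K - 0.0176, LEES_K, LEES_K + 0.0176):
    print(f"k={k:.4f}: about {1000 * divergence_millennia(0.82, k):.0f} years")
```

With k = 0.8048 this gives about 457 years, and the spread across the quoted interval shows how sensitive the date is to the “universal” constant; Swadesh’s 14% loss per millennium (k = 0.86) would give a different date for the same data.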
Some more retentive languages (rates per 1000 years) Bergsland & Vogt (1962)