Citation form: Picking a form that the user can find.
What is a citation form? • The citation form is the form of the word or phrase that begins the entry. • A dictionary that is alphabetized is sorted by the citation forms. • So the citation forms are the basis for the organization of the dictionary.
Consider… • When you create an entry for ‘sing’: sing v. to make a series of prolonged sounds with a pleasing pattern of pitch and timing. • …you are actually making an entry for ‘singing’, ‘sings’, ‘sang’, and ‘sung’ as well. • The entry is not really for ‘sing’. It is for all the inflected forms of ‘sing’. • So where will the user look for ‘sung’? • Where will he look for the root of ‘unsung’, as in ‘unsung hero’?
The citation form is merely a representative. • A citation form is a form chosen to represent all the other forms of the lexeme. • In a stem-based dictionary the citation form represents all the inflected forms of the stem. • In a root-based dictionary the citation form represents all the derivatives and all their inflected forms.
Rules for choosing a citation form. • Form closest to the basic stem. • Form not “distorted” by morphophonemics. • Form regarded as the “natural” citation form. • Easy to understand in isolation. • Represents the basic meaning of the lexical item. • Form which occurs most frequently. • Form which is the best starting point. • Form from which the most subentries are derived.
Citation form for English. • We pick the basic stem: ‘cat’, not ‘cats’; ‘leak’, not ‘leaks’, ‘leaking’, or ‘leaked’. • These citation forms fit the criteria perfectly. • They represent the basic stem. • They are not distorted by morphophonemics. • They are regarded as the natural citation form. • They are easily understood in isolation. • They occur most frequently. • They are the best starting point from which to derive the other forms. • They are the forms from which subentries are derived (catty, leak-proof).
So what’s the big deal? • Many languages are not as easy as English. • Even English has irregular forms and patterns that do not fit the rules.
What’s the problem? • Can the user find what he’s looking for? • Choosing a citation form is not so much a matter of linguistic theory or morphological analysis, as it is a matter of the psychology of the user. • Before you publish, test the users. • Where will the user look to find the entries that you have worked so hard to produce? • If the user can’t find it, you have failed.
It isn’t as easy as it seems. • English is not a difficult language when it comes to picking a citation form. • We have no inflectional prefixes. • We have relatively few irregular forms like ‘man/men’ and ‘have/had’. • So sorting from the beginning of the word works quite well for us.
But what about…? • Languages with lots of prefixes? mwihi, beehi, miihi, liihi, meehi, kiihi, viihi, lwihi, keehi adj. small. [Lugungu] • Bantu languages have inflectional prefixes on almost every word. Adjectives have prefixes that concord with the class of the noun they modify. • So which form should be chosen as the citation form? You could choose any of them. None of them is basic. • Or should you choose the unaffixed root? -ihi adj. small. • Would the user know to look under -ihi to find ‘lwihi’?
But what about…? • Languages with morphophonemic variation? mwihi, beehi, miihi, liihi, meehi, kiihi, viihi, lwihi, keehi adj. small. [Lugungu] • Many languages have complex morphophonemics that obscure the form of the root. • If you choose the unaffixed root as the citation form… -ihi adj. small. • …would the user know to look under -ihi to find ‘beehi’ (ba- + ihi > be-ehi)?
Use full words. • It is very tempting to use unaffixed stems and roots. • But most users will not recognize a stem or root if it does not occur in isolation as a natural word. • Most English speakers would not recognize ‘nomin’ as a word, even though it is the root of ‘nominal’. • Most speakers of Lugungu would not recognize ‘on’ as a word, even though it is the root of ‘kwona’. The form ‘on’ never occurs by itself. No Lugungu word ends in a consonant, so it doesn’t even look like a word. • Just because you as a linguist can parse inflected forms doesn’t mean the average user can. • If you really must alphabetize by the root, consider using an affixed form as the citation form and putting the affixes in italics: kwona v. To see… • Or follow the citation form with a natural word: -on- kwona v. To see…
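The advice to alphabetize by root while still showing a full word can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` type, its `root` field, and the `sort_by_root` helper are hypothetical names, and the roots are taken from the segmentations shown in these slides (kw-on-a, ku-bon-a, and so on). The user sees the full word; the root is used only as the sort key.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    citation: str  # full word shown to the user, e.g. "kwona"
    root: str      # hypothetical field used only for ordering, e.g. "on"


def sort_by_root(entries):
    """Order entries by root; ties fall back to the citation form."""
    return sorted(entries, key=lambda e: (e.root, e.citation))


entries = [
    Entry("kwona", "on"),
    Entry("kutemba", "temb"),
    Entry("kulya", "ly"),
    Entry("kubona", "bon"),
    Entry("kugenya", "geny"),
]

for entry in sort_by_root(entries):
    # Print the natural word first, with the root it is filed under.
    print(f"{entry.citation}  (filed under -{entry.root}-)")
```

Sorting the same words by their full spelling would put ‘kwona’ last, after ‘kutemba’; sorting by root files it between ‘kulya’ and ‘kutemba’. Whether users can follow that order is exactly what needs testing.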
Use a form close to the basic stem. • Pick a form with few affixes. • The stem should be easily recognizable: kubona (ku-bon-a), kugenya (ku-geny-a), kulya (ku-ly-a), kutemba (ku-temb-a).
If a word has morphophonemic variants, use the underlying form. • Once you have done your morphological analysis, you should have a good idea which morphophonemic variants are underlying and which are derived. • track: track-s, track-ing, track-ed • stop: stop-s, stopp-ing, stopp-ed • leave: leave-s, leav-ing, lef-t • run: run-s, runn-ing, r-a-n
Use the “natural” form. • If you ask for a translation equivalent for a word, what form of the vernacular word is given? “What is your word for ‘forgive’?” • If the people discuss a word, what form do they use to refer to it? “We’ve been discussing the meaning of ‘grace’.”
It should be easy to understand. • patri, as in patri-ot, patri-cide • ‘Patriot’ is easy to understand, and ‘patricide’ is easy to understand, but ‘patri’ is not.
Use a word that is basic to the meaning. • be • am ‘be + present + first singular’ • was ‘be + past + third singular’ • were ‘be + past + plural’ • But sometimes the natural citation form will be the past tense rather than the present, or the third singular rather than the first singular.
Use the form from which other forms can be derived. • run, runner, but ‘also ran’ • be, has-been • go, goes, goner
Sorting • If your language has many prefixes, but few or no suffixes, consider sorting from the end of the word. Who says words have to be sorted from the beginning? • Sort by the alphabet, not by phoneme. Just because ‘ch’ in English is a different phoneme from ‘c’ doesn’t mean that we put ‘chat’ after ‘cut’. • Test these decisions to see what the people want and what works best.
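As a rough illustration of sorting from the end of the word, the Python sketch below sorts on the reversed spelling, so words that share an ending land next to each other whatever their prefix. The helper name `sort_from_end` is hypothetical, and the word list simply reuses Lugungu forms quoted in these slides. The last line shows the second point: ordinary letter-by-letter sorting already puts ‘chat’ before ‘cut’.

```python
def sort_from_end(words):
    """Sort words by their spelling read right to left."""
    return sorted(words, key=lambda w: w[::-1])


words = ["mwihi", "kwona", "beehi", "kulya", "liihi"]

# Forms ending in -ihi / -ehi cluster together despite different prefixes.
print(sort_from_end(words))

# Sorting by the alphabet, not by phoneme: 'ch' is treated as 'c' + 'h'.
print(sorted(["cut", "chat", "cat"]))
```

A real dictionary would also need a collation order for any digraphs or special characters in the orthography; this sketch uses plain code-point order only.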
The good, the bad, and the ugly. (easy languages, difficult languages, impossible languages) • Many languages have little or no affixation. For these languages it is easy to pick a citation form. • Other languages have some affixes, but they are regular, the stem can occur by itself, and people can easily find the stem. • Other languages have many affixes, the stem cannot occur by itself, and people have difficulty finding the stem. • Other languages have many affixes, the stem cannot occur by itself, the stem is often obscured by morphophonemics, and people find it impossible to identify the stem.
Highly inflected languages. • If your language has many affixes, complex morphophonemics, and people find it difficult or impossible to find the stem, you have three possible solutions. • Produce a stem-based dictionary with lots of minor entries. • Publish in electronic format. Include a search function with a built-in parser, or produce a finder list of all possible inflected forms. Use links so that the user can click on the citation form and jump to the entry: had see have. • Organize the dictionary by semantics.
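A finder list of this kind can be sketched as a simple lookup table from every inflected form to its citation form. The Python below is a minimal sketch, not a description of any particular dictionary software: the names `INFLECTED_FORMS`, `build_finder_list`, and `find_entry` are hypothetical, and the data is limited to English forms mentioned in these slides, so the paradigms are deliberately incomplete.

```python
# Citation form -> inflected forms (only those quoted in the slides).
INFLECTED_FORMS = {
    "sing": ["sings", "singing", "sang", "sung"],
    "have": ["had"],
    "leak": ["leaks", "leaking", "leaked"],
}


def build_finder_list(forms_by_citation):
    """Invert the table so every form points at its citation form."""
    finder = {}
    for citation, forms in forms_by_citation.items():
        finder[citation] = citation  # the citation form finds itself
        for form in forms:
            finder[form] = citation
    return finder


def find_entry(word, finder):
    """Return the citation form to jump to, or None if the form is unknown."""
    return finder.get(word.lower())


finder = build_finder_list(INFLECTED_FORMS)
print(find_entry("sung", finder))    # the entry the user should be sent to
print(find_entry("unsung", finder))  # derived forms need their own rows
```

The sketch assumes each form belongs to exactly one entry. Where an inflected form is shared by two lexemes, the table would need to map a form to a list of citation forms instead.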
Tradition. • Many language families have developed dictionary traditions. • Semitic dictionaries organize by the tri-consonantal root. But any learner of Hebrew can testify to the difficulty of finding anything in a Hebrew dictionary. • Bantu dictionaries sort by the singular prefix of nouns, the root of verbs (but enter the infinitive as the form), and the root of adjectives (but without a prefix). This tradition seems designed to make things difficult for the user (and the lexicographer who has to figure out how to sort the thing properly). • Investigate dictionaries within your language family to see what solutions others have devised.