
The role of the Computer in Learner Lexicography


owen-roy




Presentation Transcript


  1. The role of the Computer in Learner Lexicography • early 1970s • ALD3 (1974): the computer was used to turn the dictionary into a printed book, but not for data-gathering or editorial work • 1990s: large-scale corpora and machine-readable dictionaries • The British National Corpus: 117m. words • COBUILD (Collins Birmingham University International Language Database) → the Bank of English: 320m. words

  2. From machine-readable to corpus-based dictionaries • ALD3 (1974): a machine-readable dictionary (MRD) • LDOCE (1978): information categories were ‘flagged’, so that the dictionary became a lexical database from which information could be extracted • COBUILD 1 (1987): the computer was used for data-gathering, entry preparation and compilation (selection of senses and examples)

  3. Computer corpora • Computers sped up the process of gathering large bodies of authentic examples • Computer corpora are collections of texts stored in machine-readable form • Texts can be captured from electronic sources (newspapers, documents), scanned with an optical character reader, or keyed into a PC

  4. The COBUILD corpus • John Sinclair at Birmingham University • The corpus was based on carefully measured samples of a range of varieties and discourse types (a balanced corpus) • The texts were meant to be relevant to international users • Spoken and written, non-technical, current, standard British English • Funded by Collins publishers → Collins Cobuild English Dictionary

  5. The balanced corpus • contains a measured sample of a range of varieties and discourse types • coverage of standard, core vocabulary • representation of subject areas of current interest • no systematic coverage of scientific and technical varieties

  6. concordancing • data should be quickly retrievable • KWIC (Key Word In Context) • concordance software/tool: e.g. TextSTAT • terminology: concordancer, a concordance, a concordance string, the node (node word), a concordance (on-screen) display • ‘raw’ data: searches retrieve spelling forms only • lemmatizer: a tool designed to gather together the inflected forms of the same lexeme
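As an illustration of the KWIC display described above, here is a minimal concordancer sketch. It is not how TextSTAT works internally; the function name `kwic` and the fixed character width of the context are assumptions made for the example. It matches raw spelling forms only, so a search for *give* does not retrieve *gives* or *gave*.

```python
import re

def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Return one KWIC line per occurrence of `node` in `text`.

    Matching is on raw spelling forms (whole word, case-insensitive), so a
    search for 'give' does not retrieve 'gives' or 'gave'.
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Fixed-width left and right context, with line breaks flattened
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so every node word sits in the same column
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

sample = "I give up. She gives a talk. They give in and give way."
for line in kwic(sample, "give", width=20):
    print(line)
```

Aligning the node word in a single column is what lets the lexicographer scan the left and right contexts for recurring patterns.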

  7. tools • lemmatizer (give, gives, gave, giving) • grammatical tagging program: each item is given a grammatical category label, so that data can be extracted by grammatical class (e.g. hard as an adjective vs. hard as an adverb) • a ‘parsed’ corpus is syntactically annotated (because parsing a text is complex, only parsed sub-corpora, or ‘treebanks’, are available)
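The two tools above can be sketched as follows. This is a minimal illustration, not a real lemmatizer or tagger: the lemma table is a hypothetical miniature that covers only *give*, and the tag labels (`ADJ`, `ADV`, etc.) are simplified, hypothetical labels rather than the tagset of any particular corpus. The point is that lemmatization gathers inflected forms together, and tagging lets you separate *hard* (adjective) from *hard* (adverb).

```python
from collections import Counter

# Hypothetical miniature lemma table. A real lemmatizer combines morphological
# rules with an exception list for irregular forms such as 'gave' and 'given'.
LEMMAS = {"give": "give", "gives": "give", "gave": "give",
          "giving": "give", "given": "give"}

def lemmatize(token: str) -> str:
    """Map an inflected form to its lemma; unknown forms come back lowercased."""
    return LEMMAS.get(token.lower(), token.lower())

def forms_of(lemma: str, tokens: list[str]) -> Counter:
    """Gather together every inflected form in `tokens` that belongs to `lemma`."""
    return Counter(t.lower() for t in tokens if lemmatize(t) == lemma)

def count_with_tag(tagged: list[tuple[str, str]], word: str, tag: str) -> int:
    """Count occurrences of `word` that carry the grammatical label `tag`."""
    return sum(1 for w, t in tagged if w.lower() == word and t == tag)

tokens = "She gave him a book and gives advice while giving nothing away".split()
print(forms_of("give", tokens))

# Simplified, hypothetical tag labels attached to each token
tagged = [("a", "DET"), ("hard", "ADJ"), ("task", "NOUN"),
          ("she", "PRON"), ("works", "VERB"), ("hard", "ADV")]
print(count_with_tag(tagged, "hard", "ADJ"), count_with_tag(tagged, "hard", "ADV"))
```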

  8. the lexicographic workstation • resources available to the lexicographer for dictionary-making: • a lexical database (LDB) with structured and formalized information at entry level and between entries (cross-references) • a concordanced corpus • archives • pre-existing dictionaries in machine-readable form
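A lexical database with "structured and formalized information at entry level and between entries" might be modelled roughly as below. The field names (`headword`, `pos`, `senses`, `examples`, `xrefs`) and the sample definitions are hypothetical, and keying entries by headword alone ignores homographs; the sketch only shows how cross-references between entries can be checked mechanically.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """One lexical-database record; field names are illustrative only."""
    headword: str
    pos: str
    senses: list[str]
    examples: list[str] = field(default_factory=list)
    xrefs: list[str] = field(default_factory=list)  # headwords of related entries

def dangling_xrefs(ldb: dict[str, Entry]) -> list[tuple[str, str]]:
    """Return (headword, target) pairs whose cross-reference has no entry."""
    return [(e.headword, x) for e in ldb.values() for x in e.xrefs if x not in ldb]

# Illustrative entries with made-up definitions
ldb = {
    "remind": Entry("remind", "verb", ["to make someone remember something"],
                    xrefs=["remember"]),
    "remember": Entry("remember", "verb", ["to keep something in mind"],
                      xrefs=["remind", "recall"]),
}
print(dangling_xrefs(ldb))
```

Here the cross-reference from *remember* to *recall* is reported because no *recall* entry exists yet.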

  9. Impact of corpus linguistics on EFL lexicography • huge impact • importance of frequency of occurrence in a corpus for inclusion or non-inclusion in a learner’s dictionary • as a consequence, priority is given to core vocabulary and heavy-duty words (cf. Palmer and Hornby in the 1930s)
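Frequency-based selection of headwords can be sketched like this. It is a simplification under stated assumptions: it counts raw word forms rather than lemmas, uses a crude regular-expression tokenizer, and the `top_n` cutoff is arbitrary. Real selection would also weigh range across texts and editorial judgement. `per_million` shows the usual normalization that makes counts from corpora of different sizes comparable.

```python
import re
from collections import Counter

def core_vocabulary(texts: list[str], top_n: int) -> list[tuple[str, int]]:
    """Rank word forms by raw corpus frequency and keep the top_n as candidates."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)

def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count so corpora of different sizes can be compared."""
    return count / corpus_size * 1_000_000

print(core_vocabulary(["the cat and the dog", "the dog barks"], 2))
```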

  10. J. Sinclair, Corpus, Concordance, Collocation (1991) • importance of large-scale corpora for the retrieval of linguistic information • context as the chief determinant of meaning • open-choice vs. idiom principle • illustrative examples

  11. the open-choice principle and the idiom principle • How does meaning arise from text? • Open-choice principle: the combination of words in a text is governed only by grammaticalness • Idiom principle: the combination of words is determined by the existence of semi-preconstructed phrases that constitute single choices (e.g. I see) • Palmer had come to the same conclusion 60 years before
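One crude way to look for corpus evidence of the idiom principle is to count word sequences that recur more often than a chosen threshold. This is a swapped-in illustration, not Sinclair's method: recurrent n-grams are only candidates for semi-preconstructed phrases, and the `n` and `min_freq` values are arbitrary assumptions.

```python
from collections import Counter

def recurrent_ngrams(tokens: list[str], n: int,
                     min_freq: int) -> list[tuple[tuple[str, ...], int]]:
    """Return n-word sequences occurring at least min_freq times, most frequent first."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(g, c) for g, c in grams.most_common() if c >= min_freq]

tokens = "i see what you mean i see".split()
print(recurrent_ngrams(tokens, 2, 2))
```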

  12. Authentic and made-up examples • Hornby strongly supported invented examples because they can be better shaped to meet learners’ needs • Sinclair strongly supported authentic examples because they better illustrate usage and guide composition (encoding): they both explain the meaning and serve as models for speaking and writing • Authentic examples must in any case be adapted and adjusted to the physical limits of the dictionary

  13. Problems with authentic examples • they often reveal their full meaning with reference to a wider context • they may contain words that are difficult to understand or more difficult than the item being defined • e.g. “The children hadn’t been well, cooped up in a London flat she had procured at short notice”

  14. Usability of invented/edited examples • Palmer (GEW) had introduced the ‘listing’ of alternative words or phrases (e.g. a historic spot/event/speech) • Hornby had used ‘simplification’ in ISED: the reduction of a predicate or phrase pattern to a structural minimum (e.g. to repay kindness, to shiver with cold) • a full-sentence example may provide superfluous details in the name of authenticity

  15. activities • composed • allowed • dark (attributive/predicative) • company • book/reserve • twitter • remind/remember • Do you mind…? • adjective patterns • leave n. (collocations) • rain (collocations) • approximately, about, roughly • research n./v. • nearly/almost • even though/even if • whole • look/see • false friends (eventually)
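For the collocation activities (leave n., rain), collocates are usually counted within a small window around the node word. The sketch below works on a local plain-text sample, not on the online corpus interfaces; the `collocates` function and its default span of two words are assumptions. For simplicity the window crosses sentence boundaries, and no statistical association measure is applied.

```python
import re
from collections import Counter

def collocates(text: str, node: str, span: int = 2) -> Counter:
    """Count words occurring within `span` tokens to the left or right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

print(collocates("Heavy rain fell. The heavy rain stopped.", "rain", span=1))
```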

  16. Notes on the final paper • NB (for students who did not attend the course) • The activities suggested in the previous slide are just ideas for the final paper. A skeleton sample paper is provided in the file composed_activity (“The syntactic pattern of the lexical item ‘composed’”) • A corpus search can be done using the British National Corpus (or the COCA or the TIME corpus), which can be accessed at http://corpus.byu.edu/bnc/
