Corpora for all
Adam Kilgarriff
Lexical Computing Ltd
Lexicography MasterClass Ltd
Universities of Leeds and Sussex

Overview
• What is a corpus?
• History
  • Humanities
  • Linguistics
  • Language teaching
• Corpora in the classroom
  • Scaring the students
  • Alternative strategies

What is a corpus?
• A collection of texts
  • When used for linguistic/literary research
• Growth in last two decades
  • Computer power
  • Text available electronically
  • Tools

History
• Bible studies
• Literary criticism
  • Shakespeare concordance: 200 years ago
• Dictionary-making
  • Samuel Johnson (1754)
  • Oxford English Dictionary

History
• Psychology
  • How do children learn language?
• Education
  • Teaching to read
  • Thorndike and Lorge, 1940s
    • Word lists for teaching
• Brown corpus, 1960s, 1m words
  • First modern corpus

History in Linguistics
• Pre-Chomsky
• Chomsky (from 1957)
  • Competence and performance
  • Corpora out of fashion
• More recently
  • Computational linguistics/NLP
    • Often corpus-based
  • Web corpora, Google

In English Language Teaching
• For vocabulary selection
  • West's General Service List, 1953
  • Main reference until BNC (1994)
• To find how the language really is
  • Textbook language often wrong
  • John Sinclair, Birmingham
• Learner corpora

Direct and indirect
• Indirect
  • Vocab lists
  • Dictionaries
    • COBUILD (Collins, Birmingham Univ) 1980
    • Oxford, Longman
    • British National Corpus (1994)
      • 100m words: enormous for its time
    • Now: all leading dictionaries use corpora
  • Textbooks

Corpora in the classroom
• Direct use
• Tim Johns
  • Data-driven learning
• Students explore concordances
  • Discover language facts for themselves
  • Real language
  • Test hypotheses
• If they learn like this, they will remember
• Since 1994: TALC conferences

“Condensed reading”
• Best vocabulary learning
  • Extensive reading
• How to focus?
  • Reinforce vocab items
• Classroom exercise
  • Cobb (1999)
  • Students need 2500 new words in a year
  • Pretend they are corpus lexicographers
  • Each week, work out meaning of 200 new items

Is C-in-the-C successful?
• After twenty years
  • Minority interest
  • Advanced level (university) only
  • Most teachers haven't heard of it
• Compare
  • Indirect corpus use
  • Other parts of linguistics
• Why?

Do they meet student needs?
• Dictionary is much easier
• Concordances slow and arduous
  • Distractions, confusions
• Motivation
  • Not sexy
  • “I want to learn English, not Corpus Linguistics”

... scaring the students
• Concordances are hard to read
  • No context
  • Incomplete sentences
  • Complex structures
  • Difficult vocab
  • Junk (every corpus has it)
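
To make the "no context, incomplete sentences" point concrete, here is a minimal sketch of a keyword-in-context (KWIC) concordance. The function name `kwic`, the `width` parameter, and the sample text are all hypothetical, chosen for illustration; real concordancers work on tokenised corpora rather than raw strings.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit, with `width` characters of context on each side."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Context is cut at a fixed character count, so sentences are
        # usually chopped mid-word: this is what learners have to read.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text, for illustration only.
sample = ("The corpus was built from newspaper text. Students can search a corpus "
          "for any word. A large corpus shows how the language is really used.")

for line in kwic(sample, "corpus", width=25):
    print(line)
```

Each output line centres the keyword and truncates whatever surrounds it, which is why the lines rarely read as complete sentences.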

Reading concordances
• Best done quickly
  • Read many lines
  • Filter
  • Find patterns
  • A second per line
    • Also for filtering
• Learners
  • Not possible

New strategies
• Between corpus and dictionary
• GDEX
  • Find good example sentences
• Automatic Collocations Dictionary
• Motivation

Corpus and dictionary
• Dictionary
  • High quality but limited in size
  • Might not have what you need
• Corpus
  • Vast
  • For when the dictionary does not tell you enough

Corpus and dictionary
• Corpus
  • Unfamiliar
  • Difficult
• Dictionary
  • Familiar
  • High quality
  • Sometimes even loved
• Disguise corpus as dictionary
  • Word sketch

In dictionaries:
• Users appreciate examples
• Paper: space constraints
• Electronic: no space constraints
  • Give lots of examples
• Constraint
  • Cost of selection, editing
• GDEX: good example finder

What makes a good example?
• Readable
  • EFL users
• Informative
  • Typical, for the collocation
  • Context helps user understand target word/phrase

GDEX
• Get concordance
• For each sentence
  • Score it
• Sort
• Show best ones
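
The four steps above can be sketched as a small ranking function. This is only an illustration of the get/score/sort/show pipeline, not the GDEX implementation: `best_examples`, `tokens`, and the placeholder scorer `length_score` (which simply prefers sentences of about 15 words) are hypothetical names and assumptions, and the sentence list stands in for a real corpus.

```python
import re

def tokens(sentence):
    """Lower-cased word tokens (a crude stand-in for real tokenisation)."""
    return re.findall(r"[a-z']+", sentence.lower())

def best_examples(sentences, target, score, top_n=3):
    # 1. Get concordance: every sentence containing the target word.
    hits = [s for s in sentences if target.lower() in tokens(s)]
    # 2-3. Score each sentence and sort, best first.
    ranked = sorted(hits, key=score, reverse=True)
    # 4. Show best ones.
    return ranked[:top_n]

def length_score(sentence, ideal=15):
    """Placeholder scorer (an assumption): closer to `ideal` words is better."""
    return -abs(len(tokens(sentence)) - ideal)

# Hypothetical mini-corpus, for illustration only.
sentences = [
    "Corpus.",
    "A corpus is a collection of texts used for linguistic or literary research.",
    "The dictionary is familiar.",
    "Every corpus has junk.",
]

for s in best_examples(sentences, "corpus", length_score, top_n=2):
    print(s)
```

Passing the scorer in as an argument keeps the pipeline separate from the heuristics, so different scoring rules can be tried without touching the ranking code.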

GDEX heuristics
• Sentence length (10-26 words)
• Mostly common words: good
  • Rare words: bad
• Sentences
  • Start with capital, end with one of .!?
  • No [, ], <, >, http, \
• Penalise:
  • Other punctuation, numbers
  • More than 2 or 3 capitals
• Typicality: third collocate is a plus
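
As a rough sketch, the heuristics listed above could be combined into one scoring function along these lines. Everything numeric here is an assumption: the slide gives the rules but not their weights, the tiny `COMMON` word list stands in for real corpus frequencies, and the "third collocate" rule is read as "the sentence also contains another known collocate". The name `example_score` is hypothetical.

```python
import re

# Hypothetical "common words" list; a real system would use corpus frequencies.
COMMON = set("the a an of to in is are was it that for on with as by at this be have "
             "students language words word read find learn use new good many".split())

# Sequences the slide rules out entirely.
BANNED = ("[", "]", "<", ">", "http", "\\")

def example_score(sentence, common=COMMON, collocates=()):
    """Score a candidate example; higher is better. All weights are illustrative."""
    sentence = sentence.strip()
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    # Must look like a sentence: capital at the start, one of . ! ? at the end.
    if not sentence[0].isupper() or sentence[-1] not in ".!?":
        return 0.0
    if any(b in sentence for b in BANNED):
        return 0.0
    score = 1.0
    # Sentence length: 10-26 words preferred.
    if not 10 <= len(words) <= 26:
        score -= 0.4
    # Rare words: penalise the share of words outside the common list.
    rare_share = sum(1 for w in words if w.lower() not in common) / len(words)
    score -= 0.3 * rare_share
    # Other punctuation and numbers: small penalty per character.
    score -= 0.05 * len(re.findall(r'[;:()"\d]', sentence))
    # More than 3 capitals: likely names, acronyms or headlines.
    if sum(1 for c in sentence if c.isupper()) > 3:
        score -= 0.2
    # Typicality: bonus if a further collocate appears in the sentence.
    lowered = {w.lower() for w in words}
    if any(c.lower() in lowered for c in collocates):
        score += 0.1
    return max(score, 0.0)

good = "Many students find that it is good to read and learn new words."
short = "Students read."
print(example_score(good) > example_score(short))
```

The hard rules return zero immediately, while the softer ones subtract from a starting score, so a sentence can survive one weakness but not several.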

GDEX: Models for use
• More examples for dictionaries
  • With manual checking
    • Original project: MEDAL
  • Without
  • Some-some
• Corpus query tool
  • Sort concordances, best first
  • Option in Sketch Engine
• Automatic collocations dictionary
  • http://forbetterenglish.com

Motivation
• What if ...
  • Student's favourite topic
    • Not: the family
    • But: hip hop; manga; gaming
  • Student owns corpus
• Demo

Summary
• Long history
  • Word frequency lists
• Concordances scare students because
  • Too hard to read
• Corpus valuable where dictionary does not tell you enough
• Corpus and dictionary
  • Points in between

Sketch Engine
• English, Chinese, other languages
• In use at OUP, Macmillan, CUP, Collins
  • Many universities
• Word sketches
• Instant web corpora
  • WebBootCaT
• Free trial
• Flyer

Thank you
http://www.sketchengine.co.uk