Corpora, Language Technology and Maltese
Learn about the evolution of corpus research, from pre-computer methods to collocation statistics and word sketches. Discover how to deal with the challenges of large corpora effectively. Explore the applications of corpus linguistics in language studies.
Presentation Transcript
Corpora, Language Technology and Maltese. Adam Kilgarriff, Lexical Computing Ltd; Lexicography MasterClass Ltd; University of Sussex
How do you find out about a language? • Native speakers • Dictionaries and Grammars • Corpus Kilgarriff, Lexical Computing
Four ages of corpus research
Age 1: • Pre-computer • Oxford English Dictionary: 20 million index cards
Age 2: KWIC Concordances • From 1980 • Computerised
Age 2: KWIC Concordance
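A KWIC (Key Word In Context) concordance lines up every hit of a search word in a central column, with its left and right context on either side. The following is a minimal sketch in Python, assuming a plain-text corpus and a fixed character window; the function name `kwic` and its `width` parameter are illustrative only, not taken from COBUILD or any other tool.

```python
import re


def kwic(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one Key Word In Context line per whole-word hit of `keyword`."""
    # Whole-word, case-insensitive match for the search word
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-justify the left context so every hit starts in the same column
        line = f"{left.rjust(width)} {m.group(0)} {right.ljust(width)}"
        # Flatten line breaks so each concordance line stays on one row
        lines.append(line.replace("\n", " "))
    return lines


if __name__ == "__main__":
    sample = "They went to the party. The party was late. A third party joined the party."
    for row in kwic(sample, "party", width=15):
        print(row)
```

Real concordancers work over tokenised, indexed corpora and let the user sort lines by left or right context; this character-based version only shows the characteristic layout.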
Age 2: KWIC Concordances • From 1980 • Computerised • The COBUILD project was the innovator • The coloured-pens method
The coloured-pens method (senses of party): 1 political association • 2 social event • 3 group of people • 4 person in an agreement/dispute • 5 to be party to something...
Age 2: limitations. As corpora get bigger, there is too much data • 50 lines for a word: read them all • 500 lines: you could read them all, but it takes a long time • 5000 lines: no
Age 3: Collocation statistics • Problem: too much data; how do we summarise it? • Solution: a list of the words occurring in the neighbourhood of the headword, with their frequencies • Sorted by salience
Collocation listing: right collocates of save (more than 5 hits)
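The Age 3 approach (count the words near a headword, then sort them by salience) can be sketched as follows. This is an illustration under stated assumptions: the corpus is a pre-tokenised list of lowercase words, "right collocates" means words within `span` tokens to the right of the headword, and salience is measured with logDice, one common association score. The slides do not say which statistic was used, and `right_collocates`, `span` and `more_than` are hypothetical names.

```python
import math
from collections import Counter


def right_collocates(tokens, headword, span=3, more_than=5):
    """Words found up to `span` tokens to the right of `headword`, kept if
    they co-occur more than `more_than` times, sorted by logDice salience."""
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok == headword:
            # Count every token in the right-hand window of this hit
            for coll in tokens[i + 1:i + 1 + span]:
                pair_freq[coll] += 1

    f_head = word_freq[headword]
    results = []
    for coll, f_pair in pair_freq.items():
        if f_pair <= more_than:  # mirrors the slide's ">5 hits" cut-off
            continue
        # Assumed salience score: logDice = 14 + log2(2 * f(head, coll) / (f(head) + f(coll)))
        score = 14 + math.log2(2 * f_pair / (f_head + word_freq[coll]))
        results.append((coll, f_pair, round(score, 2)))
    # Most salient collocates first
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    tokens = ("save money " * 8 + "save time " * 6 + "save face " * 2 + "spend money " * 3).split()
    for coll, freq, score in right_collocates(tokens, "save", span=1):
        print(coll, freq, score)
```

logDice is used here because it depends only on the pair and the two word frequencies, not on corpus size, so scores stay comparable across corpora; any other association measure could be swapped in at the marked line.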
Age 4: The word sketch • A corpus-derived, one-page summary of a word’s grammatical and collocational behaviour
Age 4: The word sketch • Large, well-balanced corpus • Parse to find subjects, objects, heads, modifiers, etc. • One list for each grammatical relation • Statistics to sort each list
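The word-sketch recipe above (parse, one list per grammatical relation, statistics to sort each list) can be sketched as follows. The parser is assumed, not implemented: the input is a list of hypothetical (headword, relation, collocate) triples of the kind a dependency parser might produce, and the logDice-style score is an assumption standing in for whatever statistic a production system such as the Sketch Engine actually uses. The function name `word_sketch` is illustrative.

```python
import math
from collections import Counter, defaultdict


def word_sketch(triples, headword, top_n=5):
    """Toy word sketch: one list per grammatical relation for `headword`,
    each list sorted by a logDice-style salience score.

    `triples` are (head, relation, collocate) tuples, assumed to come from
    a parser that has already been run over the corpus."""
    triple_freq = Counter(triples)
    # f(w, R, *): how often each head occurs in each relation
    head_rel_freq = Counter((h, r) for h, r, _ in triples)
    # f(*, *, c): how often each collocate occurs in any relation
    coll_freq = Counter(c for _, _, c in triples)

    sketch = defaultdict(list)
    for (h, rel, coll), f in triple_freq.items():
        if h != headword:
            continue
        # Assumed score: 14 + log2(2 * f(w,R,c) / (f(w,R,*) + f(*,*,c)))
        score = 14 + math.log2(2 * f / (head_rel_freq[(h, rel)] + coll_freq[coll]))
        sketch[rel].append((coll, f, round(score, 2)))
    # Sort each relation's list by salience, highest first, and truncate
    return {rel: sorted(items, key=lambda x: x[2], reverse=True)[:top_n]
            for rel, items in sketch.items()}


if __name__ == "__main__":
    triples = ([("save", "object", "money")] * 5 + [("save", "object", "life")] * 3
               + [("save", "subject", "government")] * 2 + [("spend", "object", "money")] * 4)
    for rel, items in word_sketch(triples, "save").items():
        print(rel, items)
```

Keeping the relation in the key is what turns a plain collocation list into a sketch: money as an object of save and money as a subject of save would land in separate lists.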
Macmillan English Dictionary for Advanced Learners, ed. Rundell, 2002
Developer: Pavel Rychly, Brno • Users: OUP, Chambers, CUP • Universities, for teaching and research • ELT textbook authors • Demo: http://www.sketchengine.co.uk/ • Self-registration for a free account • Paper: Kilgarriff & Rychly (2004), Proc. Euralex, Lorient, France