Without data, nothing

Presentation Transcript


  1. Without data, nothing • Adam Kilgarriff • Lexical Computing Ltd • University of Leeds

  2. Generative Lexicon • Account of non-standard uses of words • So: we need a dataset Kilgarriff: Without Data, Nothing

  3. Method • Sample of words • Sample of corpus instances for each • Choose a dictionary • Sense-tag • Identify mismatches to dict senses • For each • Does it fit the GL model?
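The method on slide 3 can be pictured as a small pipeline: sample words, sample corpus lines, tag them against a dictionary, then pull out the lines no dictionary sense fits. The sketch below is a hypothetical illustration, not the study's tooling: the `Instance` type, the function names and the sense labels are invented, and because the tagging and the GL judgement were done by people, they appear only as input data and a caller-supplied predicate.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instance:
    word: str                    # headword, e.g. "onion"
    text: str                    # the corpus line
    sense: Optional[str] = None  # agreed dictionary sense id; None if no sense fits


def sample_words(vocabulary: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw a reproducible random sample of k headwords."""
    return random.Random(seed).sample(vocabulary, k)


def sample_instances(lines: list[str], k: int, seed: int = 0) -> list[str]:
    """Draw up to k corpus lines for one word (all of them if fewer exist)."""
    return random.Random(seed).sample(lines, min(k, len(lines)))


def find_mismatches(instances: list[Instance],
                    inventory: dict[str, set[str]]) -> list[Instance]:
    """Instances whose tag is missing or is not a sense the dictionary lists."""
    return [i for i in instances
            if i.sense is None or i.sense not in inventory.get(i.word, set())]


def gl_candidates(mismatches: list[Instance],
                  fits_gl: Callable[[Instance], bool]) -> list[Instance]:
    """Keep the mismatches that a human judge says fit the GL model."""
    return [i for i in mismatches if fits_gl(i)]
```

Keeping the GL check as a predicate reflects that, in the talk, this step is an analyst's judgement rather than an algorithm.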

  4. Resources • Words (random sample) • modest, disability, steering, seize, sack (v), sack (n), onion, rabbit, handbag • Corpus instances • between 82 and 718 for each word • Total: 2276 • Dictionary: HECTOR • OUP/Xerox project in corpus lexicography

  5. Tagging • Three professional lexicographers • Assign sense to each corpus instance • For this exercise • If anything other than 3-way agreement • Re-examine • 390 of 2276 cases (17%)
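The re-examination rule on slide 5 is simple to state in code: an instance goes back for review unless all three lexicographers chose the same sense. This is a minimal sketch; the function names are hypothetical and the only real figures used are the slide's 390 of 2276.

```python
def needs_review(tags: tuple[str, ...]) -> bool:
    """True unless every lexicographer assigned the same sense."""
    return len(set(tags)) != 1


def review_share(all_tags: list[tuple[str, ...]]) -> tuple[int, float]:
    """Number and proportion of instances sent back for re-examination."""
    flagged = sum(needs_review(t) for t in all_tags)
    return flagged, flagged / len(all_tags)


# The slide's own figures: 390 of 2276 instances lacked 3-way agreement.
print(f"{390 / 2276:.1%}")  # 17.1%
```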

  6. modest • Any two dictionaries divide up space differently • HECTOR: 9 • CIDE: 3 • LDOCE: 4 • COBUILD: 5 • tagger agreement – less than half • Messy but no GL-like cases

  7. What is language? Kilgarriff, Global WordNet

  8. steering • 2 senses • Activity: his steering was careless • Mechanism: they overhauled the steering • 16 re-examined, most underspecified • it has the Peugeot’s steering feel • One more complex case • After nearly fifty years [as a bus driver] Mr. Hannis stepped down from behind the steering wheel

  9. onion • Two senses: plant and food • 34 cases re-examined • 10 bridged divide • Plant the sets two inches apart to produce a good yield of medium-sized onions • Others – medicine, decorative feature, dye, cliché of Frenchness • It’s not all frogs legs and strings of onions in the South of France

  10. sack (n) • 2 x sack race • One metaphor • Santa Claus Ridley pulled another doubtful gift from his sack • Ridley: British politician

  11. sack (v) • And Labour MP, Mr. Bruce George, has called for the firm to be sacked from duty at Prince Andrew’s £5 million home at Sunningwell Park near Windsor • Non-standard because end-employment needs PERSON as direct object. • Candidate for GL treatment
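The reason slide 11 gives for calling the *sack* example non-standard is a selectional restriction: the end-employment sense expects a PERSON as direct object, and "the firm" is not one. The sketch below checks that restriction against a tiny semantic-type lexicon. The lexicon, the sense label `end_employment` and the type names other than PERSON are invented for illustration; a real system would need a proper semantic-type resource.

```python
# Hypothetical semantic-type lexicon (illustration only).
SEMANTIC_TYPE = {"manager": "PERSON", "employee": "PERSON", "firm": "ORGANIZATION"}

# Expected direct-object type per (verb, sense); PERSON comes from the slide.
EXPECTED_OBJECT = {("sack", "end_employment"): "PERSON"}


def is_standard(verb: str, sense: str, object_head: str) -> bool:
    """True if the object's semantic type matches what the sense expects.

    Unknown verbs, senses or object heads return False, so they are
    treated as candidates for a closer look rather than silently accepted.
    """
    expected = EXPECTED_OBJECT.get((verb, sense))
    actual = SEMANTIC_TYPE.get(object_head)
    return expected is not None and actual == expected
```

Under these assumptions, `is_standard("sack", "end_employment", "firm")` is `False`, flagging the slide's example for the kind of GL-style reinterpretation the talk discusses.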

  12. handbag • She moved from handbags through gifts to the flower shop • [handbag department in department store] • Candidate for GL treatment

  13. Results • 2276 corpus instances • 390 re-examined • 41 non-standard uses • 2 potentially accounted for by GL • Conclusion: GL will never account for a large share of non-standard word use
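The conclusion on slide 13 follows from the funnel of counts, which plain arithmetic on the slide's own numbers makes explicit. Nothing here is new data; the variable names are just labels.

```python
total, reexamined, nonstandard, gl = 2276, 390, 41, 2

print(f"re-examined: {reexamined / total:.1%} of instances")    # 17.1%
print(f"non-standard: {nonstandard / total:.1%} of instances")  # 1.8%
print(f"GL-explicable: {gl / nonstandard:.1%} of non-standard uses, "
      f"{gl / total:.2%} of all instances")                     # 4.9%, 0.09%
```

Even among the non-standard uses, GL covers fewer than one in twenty.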

  14. What is language?

  15. What is language? • In our heads

  16. What is language? • In our heads • In texts and sound signals

  17. What is language? • In our heads • In texts and sound signals • Both

  18. Methodology • Study language in our heads • Introspection • Semantic analysis • Experiments with human subjects • “rationalist” (Leibniz, Chomsky) • Problems: coverage, arbitrariness

  19. Methodology • Study text • “empiricist” (Locke, Hume) • Physics: forces, matter • Chemistry: chemicals, bonds • Language: text, speech signals

  20. Empiricist linguistics • A new way to find out about language • 20 years of rapid ascent • Computers • Corpora • bigger and bigger data sets available • Language technology tools • lemmatizers, POS-taggers, parsers, machine learning for pattern finding

  21. Preliminaries over • What is a word sense

  22. Preliminaries over • What is a word sense • (my PhD in 5 slides)

  23. Preliminaries over • What is a word sense • (my PhD in 5 slides) • Where do you find them?

  24. Preliminaries over • What is a word sense • (my PhD in 5 slides) • Where do you find them? • Dictionaries!

  25. The lexicographers • They create them • Methods • Introspection • Other dictionaries • Corpus • Atkins, Hanks

  26. What is a word sense (1) • SFIP • Sufficiently frequent, insufficiently predictable • (a glass of) whisky • x (a glass of) tequila
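SFIP is stated informally on slide 26. One way to make it concrete, offered here only as an assumption and not as Kilgarriff's own formulation, is to give a reading its own sense when its frequency clears one threshold and its rule-predictability stays under another. Both thresholds, the 0-to-1 predictability scale and the numbers in the usage lines are invented for illustration and are not corpus counts.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    lemma: str
    gloss: str
    freq_per_million: float  # how often this reading occurs (hypothetical scale)
    predictability: float    # 0 = idiosyncratic, 1 = fully rule-predictable


def sfip(r: Reading, min_freq: float = 1.0, max_predictability: float = 0.8) -> bool:
    """Sufficiently frequent and insufficiently predictable: list as a sense."""
    return r.freq_per_million >= min_freq and r.predictability < max_predictability


# Invented toy numbers, chosen only to mirror the slide's whisky/tequila contrast.
whisky = Reading("whisky", "a glass of whisky", freq_per_million=2.0, predictability=0.6)
tequila = Reading("tequila", "a glass of tequila", freq_per_million=0.1, predictability=0.6)
print(sfip(whisky), sfip(tequila))  # True False
```

With equal predictability, frequency alone separates the two readings in this toy setup, which is one way to read the slide's contrast.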

  27. What is a word sense (2) homonymy analogy polysemy rules phraseology

  28. What is a word sense (3) • A cluster • Of instances of use • Operationalised as: corpus lines • Clustered by lexicographers

  29. What is a word sense (3)

  30. What is a word sense (3)

  31. What is a word sense (3)

  32. What is a word sense (3)

  33. What is a word sense (3) • A cluster • Of instances of use • Operationalised as: corpus lines • Clustered by lexicographers • Makes sense of • Overlapping senses • Different dictionaries, different senses • Lumping and splitting
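If a sense is a cluster of corpus lines, lumping and splitting become a question of how tight the clusters must be. Lexicographers cluster by hand; the sketch below swaps in an automatic stand-in (Jaccard overlap of context words plus single-link grouping via union-find) purely to show how one set of lines yields more or fewer senses as the similarity threshold changes. The context-word sets are invented.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two context-word sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(lines: list[set[str]], threshold: float) -> list[list[int]]:
    """Single-link clustering: lines join if any pair meets the threshold."""
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(lines)), 2):
        if jaccard(lines[i], lines[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(lines)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Invented context words for five "onion" lines.
lines = [{"plant", "sets", "yield", "grow"}, {"plant", "grow", "garden"},
         {"fry", "chop", "garlic"}, {"chop", "garlic", "soup"},
         {"plant", "chop", "garlic"}]
print(cluster(lines, 0.6))   # splitter: [[0], [1], [2], [3], [4]]
print(cluster(lines, 0.4))   # [[0, 1], [2, 3, 4]]
print(cluster(lines, 0.15))  # lumper: [[0, 1, 2, 3, 4]]
```

Line 4 shares a word with the plant lines and two with the food lines, so it is the one that merges everything once the threshold drops low enough.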

  34. Theory • Hanks • Norms and exploitations • Task of lexicographer • Record the norms • Speakers may always exploit norms to say something new

  35. Boring question • Homonymy or polysemy • We all know it’s a cline • Interesting question • Norm or exploitation

  36. metaphor • “see” meaning “understand”: Norm • “I travelled the path / From life towards art / Desire the horse / Depression the cart” (Leonard Cohen): Exploitation

  37. How do they do it? • honeymoon

  43. The Sketch Engine • Corpus query tool • Used for making dictionaries at • OUP, CUP, Collins, Macmillan, Le Robert, Cornelsen, Elhuyar Foundation • Also Universities • Linguistic research • Teaching • Linguistics, also languages
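The most basic view a corpus query tool offers is the keyword-in-context (KWIC) concordance: the "corpus lines" that lexicographers cluster into senses. The sketch below is a generic KWIC over plain text and is not the Sketch Engine's query language or API. It has no lemmatiser, so the caller lists the word forms to match; that limitation is exactly what the lemmatizers on slide 20 remove.

```python
import re


def kwic(text: str, forms: set[str], width: int = 4) -> list[tuple[str, str, str]]:
    """Return (left context, hit, right context) for each token in `forms`."""
    tokens = re.findall(r"\w+|[^\w\s]", text)  # words and single punctuation marks
    wanted = {f.lower() for f in forms}
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in wanted:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


sentence = "She moved from handbags through gifts to the flower shop"
for left, hit, right in kwic(sentence, {"handbag", "handbags"}):
    print(f"{left:>25} [{hit}] {right}")
```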

  44. 60 languages covered

  46. Individual licences (£4.99/month) • University site licences • Free trial – self-register

  47. Build instant corpora from the web • WebBootCaT • Install your corpora • Compare corpora • http://www.sketchengine.co.uk
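WebBootCaT builds on BootCaT, whose core idea, as generally described, is to send random combinations of a few seed terms to a search engine and collect the pages returned. The sketch covers only the query-generation step; it calls no search API, and the tuple size of 3, the function name and the parameters are assumptions rather than WebBootCaT's actual settings.

```python
import random
from itertools import combinations


def seed_queries(seeds: list[str], tuple_size: int = 3,
                 n_queries: int = 10, rng_seed: int = 0) -> list[str]:
    """Pick distinct random tuples of seed terms to use as web search queries."""
    # All tuples are enumerated, which is fine for the short seed lists used here.
    all_tuples = list(combinations(sorted(set(seeds)), tuple_size))
    rng = random.Random(rng_seed)
    chosen = rng.sample(all_tuples, min(n_queries, len(all_tuples)))
    return [" ".join(t) for t in chosen]


print(seed_queries(["onion", "garlic", "shallot", "leek", "chive"], n_queries=4))
```

Combining several seeds per query tends to favour pages about the seeds' shared topic over pages that merely mention one term.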

  48. Thank you homonymy analogy polysemy rules phraseology