
MultiWord Expressions in NLP

Jan Odijk, LOT Summerschool Utrecht, June 2004.



  1. MultiWord Expressions in NLP Jan Odijk LOT Summerschool Utrecht, June 2004

  2. Overview • NLP • MWEs • MWEs in NLP • MWE Types • Treatment of MWEs in selected frameworks • MWEs and the lexicon

  3. Overview • NLP • MWEs • MWEs in NLP • MWE Types • Treatment of MWEs in selected frameworks • MWEs and the lexicon

  4. Natural Language Processing • Automatic processing of natural language • Generation: Semantic Representation → String • Analysis: String → Semantic Representation • Example applications • Machine Translation (MT) • Information Retrieval (IR) • Cross-language Information Retrieval (CLIR) • Question-Answering

  5. Natural Language Processing • Based on Grammars • (Popular) frameworks • Feature structure based • Head-driven Phrase Structure Grammar (HPSG) • Lexical-Functional Grammar (LFG) • Tree-based • Tree-Adjoining Grammar (TAG) • M-Grammar • Based on grammar components or dedicated modules • Decompounding • PoS-tagging • Chunking • Named Entity Recognition • Name/Address grammars • Date / Amount grammars

  6. Natural Language Processing • Based on Statistics • No explicit grammar • Statistics • Derived from (annotated) training corpus • Tested with test corpus • Applied to new corpora • Combinations of grammar and statistics

  7. NLP Grammar • Defines <form, meaning> pairs and structural descriptions at various levels • Components • Semantics • Syntax • Morphology • Orthography (Phonology)

  8. NLP Grammar • Semantics • Defines the meaning of an utterance • usually synchronized with syntax (compositionality) • HPSG: CONTENTS attribute • M-Grammar: in-tandem build up • Synchronous TAG: in-tandem build-up with derivation trees • LFG: in tandem with f-structure

  9. NLP Grammar • Syntax • Defines the syntactic structure of an utterance • Object types: Trees, DAGs • Features: attribute-value pairs • Value: atomic or structured

  10. NLP Grammar • Syntax • Often surface syntax and deep syntax (not necessarily on a separate level) • HPSG: surface tree v. DAG • M-Grammar: surface trees v. derivation trees • LFG: c-structure v. f-structure • TAG: derived tree v. derivation tree • Alpino: surface tree v. dependency tree

  11. NLP Grammar • Morphology • Relates (word structure, string) • Word-internal structure build-up usually in the syntactic component • Usually a rule system (intensional definition) • Simple Inflection: sometimes list of triples <base form, morph prop, word form> (extensional definition)
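A minimal sketch of the extensional option mentioned on this slide: simple inflection stored as a list of <base form, morph prop, word form> triples. The entries and the function names `generate` and `analyse` are illustrative assumptions, not taken from any particular system.

```python
# Hypothetical extensional morphology: a plain list of
# <base form, morphological property, word form> triples.
TRIPLES = [
    ("plaat", "sg", "plaat"),
    ("plaat", "pl", "platen"),
    ("varken", "sg", "varken"),
    ("varken", "pl", "varkens"),
]


def generate(base, prop):
    """Generation direction: (base form, property) -> matching word forms."""
    return [w for b, p, w in TRIPLES if b == base and p == prop]


def analyse(word):
    """Analysis direction: word form -> all (base form, property) readings."""
    return [(b, p) for b, p, w in TRIPLES if w == word]
```

The same table serves both directions, which is what makes the extensional definition attractive for simple inflection; a rule system (the intensional definition) would replace the table with productive rules.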

  12. NLP Grammar • Orthography • Relates ([String], String) • [he, said, :, “, come, in, !, ”] • He said: “come in!” • Usually trivial in generation • Easy in analysis (tokenization) for many languages • Sometimes split (erop, opgebeld) • Very problematic for Chinese, Japanese, etc.
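The analysis direction of the orthography relation (String to [String]) can be sketched for space-separated scripts with a single regular expression. This is only an illustration under that assumption: it does not split forms such as erop, and it would not work for Chinese or Japanese, where word boundaries are not marked by separators.

```python
import re


def tokenize(text):
    """Split a string into word tokens and single punctuation tokens.

    Assumes a script in which words are separated by whitespace or
    punctuation; each punctuation character becomes its own token.
    """
    return re.findall(r"\w+|[^\w\s]", text)
```

For example, `tokenize('He said: "come in!"')` yields the word tokens and each punctuation mark as a separate element, mirroring the list on the slide (apart from lower-casing, which is left out here).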

  13. Overview • NLP • MWEs • MWEs in NLP • MWE Types • Treatment of MWEs in selected frameworks • MWEs and the lexicon

  14. What are MWEs?

  15. What are MWEs? • sequence of words that has lexical, orthographic, phonological, morphological, syntactic, semantic, pragmatic or translational properties not predictable from the individual components or their normal mode of combination

  16. What are MWEs? • sequence of • Not necessarily contiguous in a concrete utterance • ...omdat hij de plaat wilde poetsen • Not necessarily always in the same order in each utterance • Hij poetste gisteren de plaat • words • Ambiguity between type and token (intentional) • Inflected word form v. lemma • Ambiguity between • Character sequences separated from other character sequences by spaces and other separators (Narrow interpretation) • Abstract lexical units of the grammar (Broad interpretation)

  17. What are MWEs? • that has properties not predictable from the individual components and their normal mode of combination

  18. What are MWEs? • Lexical • De plaat poetsen • Een poging wagen / doen / *maken • Dat varkentje eens wassen • Zware / *sterke shag • Scherpe kritiek • Perdre la tête / la boule / *la cervelle • Se creuser la tête / *la boule / la cervelle

  19. What are MWEs? • Orthographic • viz. • Bijv. • www.uilots.nl • i.v.m. • Yahoo! • Groen! • Aujourd’hui (v. l’homme) • ‘s (avonds/morgens/middags)

  20. What are MWEs? • Phonological • Over de rooie/*rode (gaan/zijn/raken) • Om de dooie/*dode donder niet • Op zijn dooie akkertje/gemak • Op zijn dooie eentje • De kwaaie/*kwade Piet toegespeeld krijgen • Je niet in de kouwe/*koude kleren gaan zitten • Een gouwe ouwe • (but geen rode/rooie cent/duit (hebben))

  21. What are MWEs? • Morphological • Ten gevolge van • Ter wereld • Van goeden huize • Zonder aanzien des persoons • Het lood*(je) leggen • Dat varken*(tje) wassen • De *raap is / rapen zijn gaar

  22. What are MWEs? • Syntactic • Ten gevolge van • In opdracht van (no article) • Iemand een oor aannaaien • Rekening houden met (obligatorily indefinite) • Het bijvoeglijk(*e) naamwoord (v. een groot/grote man)

  23. What are MWEs? • Semantic • De plaat poetsen • Dat varkentje wassen • Een bok schieten • Een flater slaan

  24. What are MWEs? • Pragmatic • Ladies and Gentlemen • Ik heb gezegd. • Eet smakelijk! (Bon appétit!, Enjoy!) • Sincerely yours

  25. What are MWEs? • Translational properties • Laten zien (F. montrer, E. show) • Witte wijn (P. vinho verde) • Nuclear power plant (D. atoomcentrale, G. Kernkraftwerk) • Space probe (F. sonde spatiale) • Iemand iets laten weten • inform someone of something

  26. Overview • NLP • MWEs • MWEs in NLP • MWE Types • Treatment of MWEs in selected frameworks • MWEs and the lexicon

  27. MWEs in NLP • MWEs occur very often in natural language • Esp. in languages with little compounding • Especially in specialized domains • Multi-word terminology

  28. MWEs in NLP • MT • Improves parsing and translation of the MWEs • Also improves parsing hence translation of the sentence containing the MWEs (Nivre & Nilsson LREC 2004) • CLIR • Nuclear power plant • Kern- macht plant • Kern- Macht Pflanz • v. atoomcentrale / Kernkraftwerk
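The CLIR point on this slide can be made concrete with a toy lookup. The sketch below contrasts word-by-word translation with a lexicon that also lists the MWE as a unit; the two dictionaries and the function name `translate` are hypothetical and contain only the slide's own example.

```python
# Hypothetical toy lexicons (English -> Dutch), limited to the slide's example.
WORD_DICT = {"nuclear": "kern", "power": "macht", "plant": "plant"}
MWE_DICT = {("nuclear", "power", "plant"): "atoomcentrale"}


def translate(tokens, mwe_dict=MWE_DICT):
    """Translate tokens, preferring the longest MWE entry at each position."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two words.
        for n in range(len(tokens) - i, 1, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in mwe_dict:
                out.append(mwe_dict[key])
                i += n
                break
        else:
            # No MWE starts here: fall back to single-word lookup,
            # leaving unknown words untranslated.
            out.append(WORD_DICT.get(tokens[i].lower(), tokens[i]))
            i += 1
    return out
```

Calling `translate` with an empty `mwe_dict` reproduces the word-by-word result (kern, macht, plant), which is useless as a retrieval query; with the MWE entry the query term becomes atoomcentrale.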

  29. MWEs in NLP • Problems MWEs pose for NLP • How are MWEs to be dealt with in the grammar of an NLP system? • What lexical representation of MWEs is required for this? • How can we obtain lexicons containing MWEs with such lexical representations

  30. Overview • NLP • MWEs • MWEs in NLP • MWE Types • Treatment of MWEs in selected frameworks • MWEs and the lexicon

  31. Types of MWEs (I) • Fixed • Semi-flexible • Flexible

  32. Fixed MWEs • Fixed MWEs • Words of the MWE in a fixed order • No variation in lexical item choice • Always contiguous (no other elements in between) • No inflectional processes except at the edges

  33. Fixed MWEs • Fixed MWEs • ad hoc, stante pede, ter plaatse • Hong Kong, Kuala Lumpur, New York, San Francisco • credit card, travel agency, real estate agency • NOT • in plaats van (cf. in plaats daarvan) (‘instead of’) • carta telefonica (cf. carte telefoniche) • de plaat poetsen (‘polish the plate’, ‘bolt’)

  34. Semi-Flexible MWEs • Semi-Flexible MWEs • MWEs with fixed order of elements • That are impenetrable for other words • Parts can be inflected

  35. Semi-Flexible MWEs • Examples: • Chambre des représentants • House of representatives • Patatas fritas • French fries • Mise au point automatique • Autofocus • Calculateur analogique • Analogue computer

  36. Semi-Flexible MWEs • Examples: • Cité plus haut • Above-stated • Résistant aux acides • Acid-proof • Malade en altitude • Airsick

  37. Flexible MWEs • Flexible MWEs • Allow or require inflection in multiple parts, and • Allow permutations of subphrases, or • Allow intrusion by other phrases, or • Have controlled variation (bound pronouns)

  38. Flexible MWEs • de plaat poetsen (‘bolt’) • Hij heeft gisteren de plaat gepoetst • …omdat hij de plaat wilde poetsen • Hij poetste gisteren de plaat • to lose one’s temper • He lost his temper • She lost her temper

  39. Treatment • Fixed MWEs • No inflection: Relate single string to sequence of strings (in Orthography) • ([ad_hoc] , [ad, hoc]) • Lexical entry for ad_hoc • With inflection: Relate single stem to sequence of stems in Morphology • ([real, estate, agency, Plur] -> [real_estate_agency, Plur]) • Lexical entry for real_estate_agency
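A minimal sketch of this treatment, assuming the input has already been analysed into (stem, features) pairs. A contiguous sequence of stems that matches a fixed MWE is replaced by one joined stem, and only the final element may carry inflection, in line with "no inflectional processes except at the edges". The lexicon `FIXED_MWES`, the pair representation and the name `merge_fixed` are illustrative assumptions.

```python
# Hypothetical lexicon of fixed MWEs, each stored as a tuple of stems.
FIXED_MWES = {("ad", "hoc"), ("real", "estate", "agency")}


def merge_fixed(items):
    """Merge fixed MWEs in a list of (stem, features) pairs.

    features is a string such as "" (uninflected) or "Plur".
    """
    out, i = [], 0
    while i < len(items):
        for mwe in sorted(FIXED_MWES, key=len, reverse=True):
            window = items[i:i + len(mwe)]
            stems = tuple(stem for stem, _ in window)
            # Non-final parts must be uninflected; the edge keeps its features.
            if stems == mwe and all(f == "" for _, f in window[:-1]):
                out.append(("_".join(mwe), window[-1][1]))
                i += len(mwe)
                break
        else:
            out.append(items[i])
            i += 1
    return out
```

After merging, the grammar only needs an ordinary lexical entry for `ad_hoc` or `real_estate_agency`, which is the point of the slide.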

  40. Treatment • Semi-flexible MWEs • Require local syntax • Chunking may be enough

  41. Treatment • Flexible MWEs • Require sophisticated syntax
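To see why string-level treatment is not enough here, the sketch below compares contiguous matching on word forms with a crude match on lemmas that ignores order and intervening material. The lemma lists are supplied by hand, and the lemma-bag test is only an assumption-laden approximation: it cannot tell idiomatic from literal uses and is no substitute for the syntactic analysis the slide calls for.

```python
def contiguous_match(tokens, mwe):
    """True if the MWE occurs as an unbroken sequence of word forms."""
    n = len(mwe)
    return any(tuple(tokens[i:i + n]) == tuple(mwe)
               for i in range(len(tokens) - n + 1))


def lemma_bag_match(lemmas, mwe_lemmas):
    """True if every component lemma occurs somewhere in the clause."""
    return all(lemma in lemmas for lemma in mwe_lemmas)


MWE = ("de", "plaat", "poetsen")

# "Hij poetste gisteren de plaat" (hand-lemmatised, an assumption)
tokens_1 = ["hij", "poetste", "gisteren", "de", "plaat"]
lemmas_1 = ["hij", "poetsen", "gisteren", "de", "plaat"]

# "...omdat hij de plaat wilde poetsen"
tokens_2 = ["omdat", "hij", "de", "plaat", "wilde", "poetsen"]
lemmas_2 = ["omdat", "hij", "de", "plaat", "willen", "poetsen"]
```

Both example clauses fail the contiguous test (inflection and verb-second order in the first, an intervening verb in the second) but pass the lemma-level test.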

  42. Types of MWEs (II) • Verb-particle combinations (English, German, Dutch, Hungarian) • Ik sloeg hem over • I looked the passage up

  43. Types of MWEs (II) • Verb + prepositional complement • I looked after her • Hij heeft altijd van haar gehouden

  44. Types of MWEs (II) • Circumpositions (Dutch, German) • Op iemand af / ?toe / *heen • Auf jemanden *ab / zu • Over de brug heen / *af / *toe

  45. Types of MWEs (II) • Lexical item (from open or closed class) • + closed class lexical item • Finite (actually small) list • Limited variety of predictable syntactic structures • Dealt with by almost any grammar-based NLP system

  46. Types of MWEs (II) • Multiword Names • Examples • Fifth Avenue • Koning Leopold III-laan • Krimpen aan de IJssel • Koninklijke Nederlandse Philips N.V.

  47. Types of MWEs (II) • Multiword Names • Issues • Keys – variation • (Koning) Leopold III-laan • Fifth (Avenue) • ((Calle) Roberto) González • Many different ones, continuously new ones • Very important for correct parsing and translation • Minister Kohl → Minister Cabbage
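The key-variation notation on this slide, where parentheses mark optional parts and may be nested, can be expanded mechanically into the set of surface variants a lexicon or recogniser would have to accept. The function names `expand` and `_parse` are hypothetical, and the sketch assumes balanced parentheses.

```python
def _parse(s, i):
    """Collect variants of s from position i up to a ')' or the end."""
    results = [""]
    while i < len(s) and s[i] != ")":
        if s[i] == "(":
            inner, i = _parse(s, i + 1)
            i += 1  # skip the closing ")"
            # An optional part is either present (each inner variant) or absent.
            results = [r + x for r in results for x in inner + [""]]
        else:
            results = [r + s[i] for r in results]
            i += 1
    return results, i


def expand(pattern):
    """Return all variants of a pattern in which (...) marks an optional part."""
    variants, _ = _parse(pattern, 0)
    # Normalise whitespace left behind by omitted parts and drop duplicates.
    return sorted({" ".join(v.split()) for v in variants})
```

For instance, `expand("((Calle) Roberto) González")` gives three variants: the full name, the name without Calle, and González alone.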

  48. Types of MWEs (II) • Compounds (in English) • Examples • Real estate agency • Nuclear power plant • Blue cheese • Private eye • High school

  49. Types of MWEs (II) • Idioms • No or unpredictable meaning of the components • Fixed (or very limited) lexical item selection • Opaque • Kick the bucket • De plaat poetsen • Casser sa pipe

  50. Types of MWEs (II) • Idioms • Semi-transparent • ‘een bok schieten’ • Bok (male goat) = blunder • Schieten (shoot) = make • ‘dat varkentje wassen’ • Varkentje (little pig) = problem • Wassen (wash) = address, take care of
