1 / 63

CPSC 503 Computational Linguistics

CPSC 503 Computational Linguistics. Lecture 8 Giuseppe Carenini. Knowledge-Formalisms Map ( next ~two lectures ). State Machines (and prob. versions) (Finite State Automata , Finite State Transducers, Markov Models) Neural Language Models, Neural Sequence Modeling. Morphology.

dvivian
Télécharger la présentation

CPSC 503 Computational Linguistics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CPSC 503Computational Linguistics Lecture 8 Giuseppe Carenini CPSC503 Winter 2019

  2. Knowledge-Formalisms Map (next ~two lectures) State Machines (and prob. versions) (Finite State Automata, Finite State Transducers, Markov Models) Neural Language Models, Neural Sequence Modeling Morphology Logical formalisms (First-Order Logics, Prob. Logics) Syntax Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) Semantics Pragmatics Discourse and Dialogue AI planners (MDP Markov Decision Processes, Reinforcement learning) CPSC503 Winter 2019

  3. CPSC503 Winter 2019

  4. Today Jan 30 • Start Syntax • English Syntax • Context-Free Grammar for English • Rules • Trees • Problems • Intro Parsing CPSC503 Winter 2019

  5. Syntax Def. The study of how sentences are formed by grouping and ordering words Example: Ming and Sue prefer morning flights * Ming Sue flights morning and prefer Groups behave as single unitwrt Substitution, Movement, Coordination (they, it, do so) morning flights are preferred… …and… CPSC503 Winter 2019

  6. Syntax: Useful tasks • Why should you care? • Grammar checkers • Basis for semantic interpretation • Question answering • Information extraction • Summarization • Machine translation • Features for Machine Learning solutions to …. CPSC503 Winter 2019

  7. Key Constituents – with heads(English) (Specifier) X (Complement) • (Det) N (PP) • (Qual) V (NP) • (Deg) P (NP) • (Deg) A (PP) • (NP) (I) (VP) • Noun Phrases • Verb Phrases • Prepositional Phrases • Adjective Phrases • Sentences Some simplespecifiers Category Typical function Examples Determiner specifier of N the, a, this, no.. Qualifier specifier of V never, often.. Degree word specifier of A or P very, almost.. Complements? CPSC503 Winter 2019

  8. Sample Complements… more on handout For Verbs CPSC503 Winter 2019

  9. More complements CPSC503 Winter 2019

  10. Key Constituents: Examples • (Det) N (PP) • the cat on the table • (Qual) V (NP) • never eat a cat • (Deg) P (NP) • almost in the net • (Deg) A (PP) • very happy about it • Noun Phrases • Verb Phrases • Prepositional Phrases • Adjective Phrases CPSC503 Winter 2019

  11. Start-symbol Context Free Grammar (Example) • S -> NP VP • NP -> Det NOMINAL • NOMINAL -> Noun • VP -> Verb • Det -> a • Noun -> flight • Verb -> left • Non-terminal • Terminal CPSC503 Winter 2019

  12. CFG more complex Example • Grammar with example phrases • Lexicon CPSC503 Winter 2019

  13. CFGs • Define a Formal Language (un/grammatical sentences) • Generative Formalism • Generate strings in the language • Reject strings not in the language • Impose structures (trees) on strings in the language CPSC503 Winter 2019

  14. Context Free Grammar (CFG) • 4-tuple (non-term., term., productions, start) • (N, , P, S) • P is a set of rules A; AN, (N)* CPSC503 Winter 2019

  15. CFG: Formal Definitions • 4-tuple (non-term., term., productions, start) • (N, , P, S) • P is a set of rules A; AN, (N)* • A derivation is the process of rewriting 1into  m (both strings in (N)*) by applying a sequence of rules: 1*  m • L G = W|w* and S * w CPSC503 Winter 2019

  16. Nominal Nominal flight Derivations as Trees CPSC503 Winter 2019

  17. I prefer a morning flight Nominal Nominal Parser flight CFG Parsing • It is completely analogous to running a finite-state transducer with its tapes • It’s just more powerful • Chpt. 11 CPSC503 Winter 2019

  18. Other Options • Regular languages (FSA) A xB or A x • Too weak (e.g., cannot deal with recursion in a general way – no center-embedding) • CFGs A  (also produce more understandable and “useful” structure) • Context-sensitive A ; ≠ • Can be computationally intractable • Turing equiv. ; ≠ • Too powerful / Computationally intractable CPSC503 Winter 2019

  19. Common Sentence-Types • Declaratives: A plane left S -> NP VP • Imperatives: Leave! S -> VP • Yes-No Questions: Did the plane leave? S -> Aux NP VP • WH Questions: Which flights serve breakfast? S -> WH NP VP When did the plane leave? S -> WH Aux NP VP CPSC503 Winter 2019

  20. NP: more details • NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom • e.g., all the other cheap cars NP -> Specifiers N Complements • Nom -> Nom PP (PP) (PP) • e.g., reservation on BA456 from NY to YVR Nom -> Nom GerundVP e.g., flight arriving on Monday Nom -> Nom RelClause Nom RelClause ->(who | that) VP e.g., flight that arrives in the evening CPSC503 Winter 2019

  21. Conjunctive Constructions • S -> S and S • John went to NY and Mary followed him • NP -> NP and NP • John went to NY and Boston • VP -> VP and VP • John went to NY and visited MOMA • … • In fact the right rule for English is • X -> X and X CPSC503 Winter 2019

  22. Journal of the American Medical Informatics Association, 2005, Improved Identification of Noun Phrases in Clinical Radiology Reports …. CPSC503 Winter 2019

  23. Problems with CFGs • Agreement • Subcategorization CPSC503 Winter 2019

  24. Agreement • In English, • Determiners and nouns have to agree in number • Subjects and verbs have to agree in person and number • Many languages have agreement systems that are far more complex than this (e.g., gender). CPSC503 Winter 2019

  25. This dog Those dogs This dog eats You have it Those dogs eat *This dogs *Those dog *This dog eat *You has it *Those dogs eats Agreement CPSC503 Winter 2019

  26. S -> NP VP NP -> Det Nom VP -> V NP … SgS -> SgNP SgVP PlS -> PlNp PlVP SgNP -> SgDet SgNom PlNP -> PlDet PlNom PlVP -> PlV NP SgVP3p ->SgV3p NP … Possible CFG Solution OLD Grammar NEW Grammar Sg = singular Pl = plural CPSC503 Winter 2019

  27. CFG Solution for Agreement • It works and stays within the power of CFGs • But it doesn’t scale all that well (explosion in the number of rules) CPSC503 Winter 2019

  28. Subcategorization • Def. It expresses constraints that a predicate (verb here) places on the number and type of its arguments(see complements table) • *John sneezed the book • *I prefer United has a flight • *Give with a flight CPSC503 Winter 2019

  29. Subcategorization • Sneeze: John sneezed • Find: Please find [a flight to NY]NP • Give: Give [me]NP[a cheaper fare]NP • Help: Can you help [me]NP[with a flight]PP • Prefer: I prefer [to leave earlier]TO-VP • Told: I was told [United has a flight]S • … CPSC503 Winter 2019

  30. So? • So the various rules for VPs overgenerate. • They allow strings containing verbs and arguments that don’t go together • For example: • VP -> V NP therefore Sneezed the book • VP -> V S therefore go she will go there CPSC503 Winter 2019

  31. Possible CFG Solution OLD Grammar NEW Grammar • VP -> V • VP -> V NP • VP -> V NP PP • … • VP -> IntransV • VP -> TransV NP • VP -> TransPPto NP PPto • … • TransPPto -> hand,give,.. This solution has the same problem as the one for agreement CPSC503 Winter 2019

  32. CFG for NLP: summary • CFGs cover most syntactic structure in English. • But there are problems (over-generation) • That can be dealt with adequately, although not elegantly, by staying within the CFG framework. • Numerous alternative approaches have been developed that all share the common theme of making better use of the lexicon. • Lexical-Functional Grammar (LFG), • Head-Driven Phrase Structure Grammar (HPSG), • Tree-Adjoining Grammar (TAG), • Combinatory Categorical Grammar (CCG) (Sect. 10.6.1 J&M 3rd Ed). Possible Pedagogical Project CPSC503 Winter 2019

  33. Today Jan 30 • Start Syntax • English Syntax • Context-Free Grammar for English • Rules • Trees • Problems • Intro (CFG) Parsing CPSC503 Winter 2019

  34. Nominal Nominal flight Parsing with CFGs • Valid parse trees • Sequence of words Assign valid trees: covers all and only the elements of the input and has an S at the top I prefer a morning flight Parser CFG CPSC503 Winter 2019

  35. CFG Parsing as Search • Search space of possible parse trees • S -> NP VP • S -> Aux NP VP • NP -> Det Noun • VP -> Verb • Det -> a • Noun -> flight • Verb -> left, arrive • Aux -> do, does Parsing: find all trees that cover all and only the words in the input • defines CPSC503 Winter 2019

  36. Nominal Nominal flight Constraints on Search • Sequence of words • Valid parse trees Search Strategies: • Top-down or goal-directed • Bottom-up or data-directed I prefer a morning flight Parser CFG (search space) CPSC503 Winter 2019

  37. Input: flight Top-Down Parsing • Since we’re trying to find trees rooted with an S (Sentences) start with the rules that give us an S. • Then work your way down from there to the words. CPSC503 Winter 2019

  38. …….. • …….. • …….. Next step: Top Down Space • When POS categories are reached, reject trees whose leaves fail to match all words in the input CPSC503 Winter 2019

  39. flight flight flight Bottom-Up Parsing • Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way. • Then work your way up from there. CPSC503 Winter 2019

  40. …….. • …….. • …….. flight flight flight flight flight flight flight Two more steps: Bottom-Up Space CPSC503 Winter 2019

  41. Top-Down vs. Bottom-Up • Top-down • Only searches for trees that can be answers • But suggests trees that may not be consistent with the words • Bottom-up • Only forms trees consistent with the words • Suggest trees that may make no sense globally CPSC503 Winter 2019

  42. So Combine Them • Top-down: control strategy to generate trees • Bottom-up: to filter out inappropriate parses • Top-down Control strategy: • Depth vs. Breadth first • Which node to try to expand next • Which grammar rule to use to expand a node • (left-most) • (textual order) CPSC503 Winter 2019

  43. Top-Down, Depth-First, Left-to-Right Search Sample sentence: “Does this flight include a meal?” CPSC503 Winter 2019

  44. Example “Does this flight include a meal?” CPSC503 Winter 2019

  45. Example “Does this flight include a meal?” flight flight CPSC503 Winter 2019

  46. Example “Does this flight include a meal?” flight flight CPSC503 Winter 2019

  47. Problems with TD-BU • Ambiguity • Repeated Parsing • SOLUTION: • (once again dynamic programming!) CPSC503 Winter 2019

  48. (1) Structural Ambiguity Three basic kinds: Attachment/Coordination/NP-bracketing “I shot an elephant in my pajamas” CPSC503 Winter 2019

  49. Structural Ambiguity (Ex. 1) VP -> V NP ; NP -> NP PP VP -> V NP PP “I shot an elephant in my pajamas” CPSC503 Winter 2019

  50. Structural Ambiguity (Ex.2) “I saw Mary passing by cs2” “I saw Mary passing by cs2” (ROOT (S (NP (PRP I)) (VP (VBD saw) (S (NP (NNP Mary)) (VP (VBG passing) (PP (IN by) (NP (NNP cs2))))))) (ROOT (S (NP (PRP I)) (VP (VBD saw) (NP (NNP Mary)) (S (VP (VBG passing) (PP (IN by) (NP (NNP cs2))))))) CPSC503 Winter 2019

More Related