
Natural Language Processing Syntactic Parsing




Presentation Transcript


  1. Natural Language Processing Syntactic Parsing Meeting 13, Oct 11, 2012 Rodney Nielsen Most of these slides were adapted from James Martin

  2. Subcategorization • Many valid VP rules • But not valid for all verbs • Subcategorize verbs by sets of VP rules • Variation on transitive/intransitive • Grammars may have 100s of classes

  3. Subcategorization • Sneeze: John sneezed • Find: Please find [a flight to NY]NP • Give: Give [me]NP [a cheaper fare]NP • Help: Can you help [me]NP [with a flight]PP • Prefer: I prefer [to leave earlier]TO-VP • Told: I was told [United has a flight]S • …

  4. Programming Analogy • Verbs = methods • Subcat frames specify the number, position and type of arguments • Like formal parameters to a method
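The method analogy above can be sketched in code. This is a minimal illustration, not a real lexicon: the frame labels (NP, PP, TO-VP, S) come from the examples on the previous slide, while the dictionary and function names are assumptions for the sketch.

```python
# Subcategorization frames as "method signatures": each verb lists the
# sequences of complement categories it accepts (illustrative only).
SUBCAT_FRAMES = {
    "sneeze": [[]],                  # intransitive: no complements
    "find":   [["NP"]],              # find [a flight to NY]NP
    "give":   [["NP", "NP"]],        # give [me]NP [a cheaper fare]NP
    "help":   [["NP", "PP"]],        # help [me]NP [with a flight]PP
    "prefer": [["TO-VP"]],           # prefer [to leave earlier]TO-VP
    "tell":   [["S"]],               # told [United has a flight]S
}

def licenses(verb, complements):
    """Check whether a sequence of complement categories matches one of
    the verb's subcategorization frames, like type-checking arguments."""
    return list(complements) in SUBCAT_FRAMES.get(verb, [])
```

Here `licenses("give", ["NP", "NP"])` holds while `licenses("sneeze", ["NP"])` does not, ruling out *John sneezed the book* just as a type checker rejects a call with too many arguments.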

  5. Subcategorization • *John sneezed the book • *I prefer United has a flight • *Give with a flight • As with agreement phenomena, we need a way to formally express these facts

  6. Treebanks • Treebanks • corpora of sentence parse trees • These are generally created • First automatically parse • Then manually correct • Detailed annotation guidelines • POS tagset • Grammar • Instructions for individual grammatical constructions

  7. Penn Treebank • The Penn Treebank is a widely used treebank. • Its best-known part is the Wall Street Journal section. • ~1M words from the 1987-1989 Wall Street Journal.

  8. Lexically Decorated Tree • Head Finding

  9. Head Finding - Noun Phrases

  10. Dependency Grammars • CFG-style phrase-structure grammars • Focus on constituents • Dependency grammar • Tree • Nodes = words • Links = dependency relations • Relations may be typed (labeled), or not.

  11. Dependency Parse They hid the letter on the shelf
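One typed dependency analysis of the slide's sentence can be written down as head-dependent triples. The relation labels loosely follow Universal Dependencies naming conventions, and attaching the PP "on the shelf" to "hid" (where the hiding happened) is only one of the two possible readings; both choices are assumptions for the sketch.

```python
# A dependency parse as (head_index, dependent_index, relation) triples:
# nodes are the words, links are the (typed) dependency relations.
tokens = ["They", "hid", "the", "letter", "on", "the", "shelf"]
deps = [
    (1, 0, "nsubj"),   # hid   <- They
    (1, 3, "obj"),     # hid   <- letter
    (3, 2, "det"),     # letter <- the
    (1, 6, "obl"),     # hid   <- shelf  (PP attached to the verb)
    (6, 4, "case"),    # shelf <- on
    (6, 5, "det"),     # shelf <- the
]

# Tree check: every word except the root "hid" has exactly one head.
heads = {dep: head for head, dep, _ in deps}
assert set(heads) == set(range(len(tokens))) - {1}
```

Swapping the `obl` link to attach "shelf" under "letter" instead would encode the other reading, in which the letter was already on the shelf.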

  12. Treebank and Head-Finding Uses • Critical to develop statistical parsers • Chapter 14 • Valuable to Corpus Linguistics • Investigating empirical details of constructions

  13. Summary • CFGs model syntax • Parsers are often critical components of applications • Constituency: key phenomena easily captured with CFG rules • Agreement and subcategorization pose significant problems • Treebanks: corpora of sentence parse trees

  14. Today 10/11/2012 Syntactic Parsing • CKY

  15. Automatic Syntactic Parse

  16. CFG Parsing • Assigning proper trees • Trees that exactly cover the input • Not necessarily the correct tree

  17. For Now • Assume… • Words are in a buffer • No POS tags • Ignore morphology • Words are known • No out of vocabulary (OOV) terms • Poor assumptions for a real application

  18. Top-Down Search • Start with a rule mapping to S (sentences) • Progress down from there to the words

  19. Top Down Space

  20. Bottom-Up Parsing • Or … • Start with trees rooted at the words • Progress up to larger trees

  21. Bottom-Up Search

  22. Bottom-Up Search

  23. Bottom-Up Search

  24. Bottom-Up Search

  25. Bottom-Up Search

  26. Top-Down versus Bottom-Up • Top-down • Proper, feasible trees • But potentially inconsistent with the words • Bottom-up • Consistent with the words • But trees might not make sense globally

  27. Search Strategy • How to search space and make choices • Node to expand next? • Grammar rule used for expansion • Backtracking • Make a choice • If it works, continue • If not, back up and make a different choice
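The make-a-choice, back-up-on-failure strategy above can be sketched as a toy top-down parser. The grammar, sentence, and function names are illustrative assumptions; abandoning a failed expansion and trying the next rule plays the role of backing up.

```python
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Pro"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]], "N": [["letter"]],
    "Pro": [["they"]], "V": [["hid"]],
}

def parse(symbol, words, i):
    """Yield every position j such that `symbol` derives words[i:j],
    trying each rule for the symbol in order (depth-first search)."""
    for rhs in GRAMMAR.get(symbol, []):
        if rhs and rhs[0] not in GRAMMAR:         # terminal rule
            if i < len(words) and words[i] == rhs[0]:
                yield i + 1
            continue
        positions = [i]
        for child in rhs:                          # expand left to right
            positions = [j for p in positions
                           for j in parse(child, words, p)]
        yield from positions                       # empty list = dead end

words = "they hid the letter".split()
accepted = len(words) in set(parse("S", words, 0))
```

A parse succeeds when some choice of rules lets S span the whole input; every dead end (e.g. trying NP -> Det N on "they") simply yields nothing, and the search moves on to the next rule.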

  28. Problems • Even with the best filtering, backtracking methods are doomed because of two inter-related problems • Ambiguity • Shared subproblems

  29. Ambiguity

  30. Shared Sub-Problems • No matter what kind of search • Don’t want to redo work already done • Naïve backtracking leads to duplicated work

  31. Shared Sub-Problems • Consider: a flight from Indianapolis to Houston on TWA

  32. Shared Sub-Problems • Assume a top-down parse making choices among the various Nominal rules • In particular, between these two • Nominal -> Noun • Nominal -> Nominal PP • Statically choosing the rules in this order leads to the following bad behavior...

  33. Shared Sub-Problems

  34. Shared Sub-Problems

  35. Shared Sub-Problems

  36. Shared Sub-Problems

  37. Dynamic Programming • Dynamic Programming search • Fill tables with partial results • Avoid repeating work • Solve exponential problems in nearly polynomial time • Efficiently store ambiguous structures with shared sub-parts • Bottom-up approach • CKY • Top-down approach • Earley

  38. CKY Parsing • Limit grammar to epsilon-free binary rules • Consider the rule A → B C • If there is an A somewhere in the input generated by this rule, then there must be a B followed by a C in the input • If A spans [i, j), there must be a k s.t. i < k < j • I.e., B splits from C someplace after i and before j

  39. Problem • What if your grammar isn’t binary? • E.g., the Penn TreeBank • Convert it to binary • Any CFG can be rewritten into Chomsky-Normal Form automatically • What does this mean? • Resulting grammar accepts (and rejects) the same set of strings • But the derivations (trees) are binary
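The binarization step of that conversion can be sketched directly. This handles only rules whose right-hand side is too long; full CNF conversion also eliminates unit and epsilon rules and separates terminals, and the `X|<...>` naming for the new nonterminals is just an assumed convention.

```python
def binarize(lhs, rhs):
    """Rewrite lhs -> rhs (len(rhs) > 2) into a chain of binary rules by
    introducing new nonterminals; the same strings are derived, but every
    tree node now has at most two children."""
    rules = []
    while len(rhs) > 2:
        new = f"{lhs}|<{'-'.join(rhs[1:])}>"       # fresh nonterminal
        rules.append((lhs, [rhs[0], new]))
        lhs, rhs = new, rhs[1:]
    rules.append((lhs, list(rhs)))
    return rules
```

For example, a ternary rule like VP -> V NP PP becomes the pair VP -> V VP|&lt;NP-PP&gt; and VP|&lt;NP-PP&gt; -> NP PP; undoing the chain recovers the original flat tree.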

  40. Sample L1 Grammar

  41. CNF Conversion

  42. CKY • Build a table so A spanning [i, j) in the input is placed in cell [i, j] in the table • A non-terminal spanning the entire string is in [0, n] • Parts of A must span [i, k] and [k, j], for some k

  43. CKY • Given A B C • Look for B in [i,k] and C in [k,j]. • I.e., if there is an A spanning i,jAND • A B C THEN • There must be a B in [i,k] and a C in [k,j] for some k such that i<k<j

  44. CKY • Fill the table by looping over the cell values [i, j] in a systematic way • For each cell, loop over the appropriate k values to search for things to add
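Those two loops can be sketched as a minimal CKY recognizer for a CNF grammar. The toy lexicon and rules are illustrative assumptions; the fill order matches the description above, with each binary rule A -> B C looked up via B in [i, k] and C in [k, j].

```python
from collections import defaultdict

# Toy CNF grammar: lexicon maps words to preterminals, RULES are A -> B C.
LEXICON = {"they": {"NP"}, "hid": {"V"}, "the": {"Det"}, "letter": {"N"}}
RULES = [("S", "NP", "VP"), ("VP", "V", "NP"), ("NP", "Det", "N")]

def cky(words):
    """Return True iff S can span the whole input. Cell [i, j] holds
    every nonterminal that derives words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                 # columns left to right
        table[j - 1, j] = set(LEXICON.get(words[j - 1], ()))
        for i in range(j - 2, -1, -1):        # rows bottom to top
            for k in range(i + 1, j):         # all split points
                for a, b, c in RULES:
                    if b in table[i, k] and c in table[k, j]:
                        table[i, j].add(a)
    return "S" in table[0, n]
```

With three nested loops over i, j, and k, the recognizer does O(n³) cell/split combinations, each scanning the rule set, which is one way to answer the complexity question posed two slides below.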

  45. CKY Table

  46. CKY Algorithm What’s the complexity of this?

  47. CKY • Fills the table one column at a time, from left to right, bottom to top • When filling a cell, the parts needed are already in the table (to the left and below) • It's somewhat natural in that it processes the input left to right, a word at a time • Known as online processing

  48. Example

  49. Example

  50. Example • Filling col 5 == processing word 5 (Houston) • j is 5. • i goes from 3 to 0 (3,2,1,0)
