
Statistical methods in NLP



  1. Statistical methods in NLP Diana Trandabat 2013-2014

  2. CKY Parsing • Cocke-Kasami-Younger parsing algorithm: • (Relatively) efficient bottom-up parsing algorithm based on tabulating substring parses to avoid repeated work • Approach: • Use a Chomsky Normal Form grammar • Build an (n+1) x (n+1) matrix to store subtrees • Upper triangular portion • Incrementally build parse spanning whole input string

  3. Reminder • A CNF grammar is a Context-Free Grammar in which: • Every rule LHS is a non-terminal • Every rule RHS consists of either a single terminal or two non-terminals. • Examples: • A → B C • NP → Nominal PP • A → a • Noun → man • But not: • NP → the Nominal • S → VP

  4. Reminder • Any CFG can be re-written in CNF, without any loss of expressiveness. • That is, for any CFG, there is a corresponding CNF grammar which accepts exactly the same set of strings as the original CFG.
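For example (a standard construction, not spelled out on the slide): the non-CNF rule NP → the Nominal can be replaced by NP → Det_the Nominal plus Det_the → the; a ternary rule such as VP → V NP PP can be binarized into VP → V @VP_V and @VP_V → NP PP; and a unit rule such as S → VP is eliminated by adding S → α for every rule VP → α. The new symbols (Det_the, @VP_V) are illustrative names only.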

  5. Dynamic Programming in CKY • Key idea: • For a parse spanning substring [i,j], there exists some k such that there are parses spanning [i,k] and [k,j] • We can construct parses for the whole sentence by building up from these stored partial parses • So, • To have a rule A → B C in [i,j], • We must have B in [i,k] and C in [k,j], for some i < k < j • The CNF grammar forces this for all j > i+1
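As a concrete instance (using the toy grammar introduced on slide 11 below): to place S in [0,5] for "the flight includes a meal" via S → NP VP, we need NP in [0,2] and VP in [2,5], i.e. the split point is k = 2.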

  6. CKY • Given an input string S of length n, • Build an (n+1) x (n+1) table • Indexes correspond to inter-word positions • E.g., 0 Book 1 That 2 Flight 3 • Cells [i,j] contain the sets of non-terminals of ALL constituents spanning i,j • [j-1,j] contains pre-terminals • If [0,n] contains the start symbol, the input is recognized

  7. Recognising strings with CKY • Example input: The flight includes a meal. • The CKY algorithm proceeds by: • Splitting the input into words and indexing each position: (0) the (1) flight (2) includes (3) a (4) meal (5) • Setting up a table: for a sentence of length n, we need (n+1) rows and (n+1) columns • Traversing the input sentence left-to-right, using the table to store constituents and their spans

  8. The table • Rule: Det → the places Det in cell [0,1] for "the" • (table columns: the, flight, includes, a, meal)

  9. The table • Rule 1: Det → the • Rule 2: N → flight • Det fills [0,1] for "the"; N fills [1,2] for "flight" • (table columns: the, flight, includes, a, meal)

  10. The table • Rule 1: Det → the • Rule 2: N → flight • Rule 3: NP → Det N • [0,1] for "the", [1,2] for "flight", [0,2] for "the flight" • (table columns: the, flight, includes, a, meal)

  11. A CNF CFG for CKY • S → NP VP • NP → Det N • VP → V NP • V → includes • Det → the • Det → a • N → meal • N → flight
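As a sketch of how this grammar might be held in code (Python is assumed here; the dictionary names BINARY_RULES and LEXICAL_RULES are illustrative, not from the slides):

# CNF grammar from slide 11, split into binary and lexical rules.
# Each right-hand side maps to the set of left-hand sides that can produce it.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL_RULES = {
    "includes": {"V"},
    "the": {"Det"},
    "a": {"Det"},
    "meal": {"N"},
    "flight": {"N"},
}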

  12. CKY algorithm: two components
  Lexical step:
    for j from 1 to length(string) do:
      let w be the word in position j
      find all rules of the form X → w
      put X in table[j-1, j]
  Syntactic step:
    for i = j-2 down to 0 do:
      for k = i+1 to j-1 do:
        for each rule of the form A → B C do:
          if B is in table[i,k] and C is in table[k,j] then add A to table[i,j]

  13. CKY algorithm: two components
  We actually interleave the lexical and syntactic steps:
    for j from 1 to length(string) do:
      let w be the word in position j
      find all rules of the form X → w
      put X in table[j-1, j]
      for i = j-2 down to 0 do:
        for k = i+1 to j-1 do:
          for each rule of the form A → B C do:
            if B is in table[i,k] and C is in table[k,j] then add A to table[i,j]
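A minimal Python sketch of the interleaved recogniser, assuming the BINARY_RULES and LEXICAL_RULES dictionaries shown after slide 11 (function and variable names are illustrative):

def cky_recognise(words, lexical_rules, binary_rules, start="S"):
    n = len(words)
    # table[i][j] holds the set of non-terminals spanning words i..j-1
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        # lexical step: fill cell [j-1, j] with the pre-terminals for word j
        table[j - 1][j] |= lexical_rules.get(words[j - 1], set())
        # syntactic step: combine previously built constituents
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for (b, c), heads in binary_rules.items():
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j] |= heads
    return start in table[0][n]

# Example: cky_recognise("the flight includes a meal".split(), LEXICAL_RULES, BINARY_RULES)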

  14. CKY: lexical step (j = 1) • Lexical lookup • Matches Det → the • The flight includes a meal.

  15. CKY: lexical step (j = 2) • Lexical lookup • Matches N → flight • The flight includes a meal.

  16. CKY: syntactic step (j = 2) • Syntactic lookup: • look backwards and see if there is any rule that will cover what we’ve done so far. • The flight includes a meal.

  17. CKY: lexical step (j = 3) • Lexical lookup • Matches V → includes • The flight includes a meal.

  18. CKY: syntactic step (j = 3) • Syntactic lookup • There are no rules in our grammar that will cover Det, NP, V • The flight includes a meal.

  19. CKY: lexical step (j = 4) • Lexical lookup • Matches Det → a • The flight includes a meal.

  20. CKY: lexical step (j = 5) • Lexical lookup • Matches N → meal • The flight includes a meal.

  21. CKY: syntactic step (j = 5) • Syntactic lookup • We find that we have NP → Det N • The flight includes a meal.

  22. CKY: syntactic step (j = 5) • Syntactic lookup • We find that we have VP → V NP • The flight includes a meal.

  23. CKY: syntactic step (j = 5) • Syntactic lookup • We find that we have S → NP VP • The flight includes a meal.

  24. From recognition to parsing • The procedure so far will recognise a string as a legal sentence in English. • But we'd like to get a parse tree back! • Solution: • We can work our way back through the table and collect all the partial solutions into one parse tree. • Cells will need to be augmented with "backpointers", i.e. with pointers to the cells that the current cell covers.
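A sketch of how the recogniser above can be extended with backpointers to recover one parse tree (Python; build_tree and the cell layout are illustrative choices, not prescribed by the slides):

def cky_parse(words, lexical_rules, binary_rules, start="S"):
    n = len(words)
    # each cell maps a non-terminal to a backpointer:
    # either the word it covers, or (k, B, C) giving the split point and children
    table = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        for x in lexical_rules.get(words[j - 1], set()):
            table[j - 1][j][x] = words[j - 1]
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for (b, c), heads in binary_rules.items():
                    if b in table[i][k] and c in table[k][j]:
                        for a in heads:
                            table[i][j].setdefault(a, (k, b, c))

    def build_tree(label, i, j):
        back = table[i][j][label]
        if isinstance(back, str):      # pre-terminal cell: points directly at a word
            return (label, back)
        k, b, c = back                 # internal node: follow both backpointers
        return (label, build_tree(b, i, k), build_tree(c, k, j))

    return build_tree(start, 0, n) if start in table[0][n] else None

This version keeps only the first backpointer found per cell, so it returns a single tree; storing a list of backpointers per cell instead would enumerate all parses.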

  25. From recognition to parsing

  26. From recognition to parsing NB: This algorithm always fills the top “triangle” of the table!

  27. What about ambiguity? • The algorithm does not assume that there is only one parse tree for a sentence. • (Our simple grammar did not admit of any ambiguity, but this isn’t realistic of course). • There is nothing to stop it returning several parse trees. • If there are multiple local solutions, then more than one non-terminal will be stored in a cell of the table.

  28. Exercise • Apply the CKY algorithm to the following sentence: Astronomers saw stars with ears. • given the following grammar:
  S → NP VP 1.0
  PP → P NP 1.0
  VP → V NP 0.7
  VP → VP PP 0.3
  P → with 1.0
  V → saw 1.0
  NP → NP PP 0.4
  NP → astronomers 0.2
  NP → ears 0.18
  NP → saw 0.04
  NP → stars 0.18

  29. Exercise

  30. Exercise • Now run the CKY algorithm considering also the probabilities of the rules. • The probability of a constituent A in cell [i,j], built with rule A → B C and split point k, is P(A → B C) × P(B in [i,k]) × P(C in [k,j])
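A probabilistic (Viterbi-style) CKY sketch in Python for the exercise grammar above; the code structure and names are illustrative, and only the rule probabilities come from the slide:

from collections import defaultdict

BINARY = [                  # (A, B, C, P(A -> B C))
    ("S",  "NP", "VP", 1.0),
    ("NP", "NP", "PP", 0.4),
    ("PP", "P",  "NP", 1.0),
    ("VP", "V",  "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
]
LEXICON = {                 # word -> [(A, P(A -> word)), ...]
    "astronomers": [("NP", 0.2)],
    "ears":        [("NP", 0.18)],
    "saw":         [("NP", 0.04), ("V", 1.0)],
    "stars":       [("NP", 0.18)],
    "with":        [("P", 1.0)],
}

def pcky(words, start="S"):
    n = len(words)
    # best[i][j][A] = probability of the best parse of words i..j-1 rooted in A
    best = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        for a, p in LEXICON.get(words[j - 1], []):
            best[j - 1][j][a] = p
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for a, b, c, p in BINARY:
                    if best[i][k][b] > 0 and best[k][j][c] > 0:
                        cand = p * best[i][k][b] * best[k][j][c]
                        if cand > best[i][j][a]:
                            best[i][j][a] = cand
    return best[0][n][start]

# pcky("astronomers saw stars with ears".split()) returns the best parse probability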

  31. CKY Discussions • Running time: • O(n³), where n is the length of the input string • Inner loop grows as the square of the number of non-terminals • Expressiveness: • As implemented, requires CNF • Weakly equivalent to the original grammar • Doesn't capture the full original structure • Back-conversion? • Can do binarization, terminal conversion • Unit non-terminals require a change in CKY

  32. Parsing Efficiently • With arbitrary grammars • Earley algorithm • Top-down search • Dynamic programming • Tabulated partial solutions • Some bottom-up constraints

  33. Interesting Probabilities • Example sentence (word positions 1-7): The(1) gunman(2) sprayed(3) the(4) building(5) with(6) bullets(7) • Inside probability: what is the probability of having an NP at positions (4,5) such that it will derive "the building"? • Outside probability: what is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"?

  34. Interesting Probabilities • Random variables to be considered • The non-terminal being expanded. E.g., NP • The word-span covered by the non-terminal. E.g., (4,5) refers to words “the building” • While calculating probabilities, consider: • The rule to be used for expansion : E.g., NP  DT NN • The probabilities associated with the RHS non-terminals : E.g., DT subtree’s inside/outside probabilities & NN subtree’s inside/outside probabilities .
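As a worked instance (using the rule probabilities listed on slide 42 below): for the NP spanning (4,5), i.e. "the building", expanded by NP → DT NN, the inside probability is P(NP → DT NN) × P(DT → the) × P(NN → building) = 0.5 × 1.0 × 0.5 = 0.25.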

  35. Outside Probabilities • α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all words outside w_p..w_q • Outside probability: α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

  36. Inside Probabilities • β_j(p,q): the probability of generating the words w_p..w_q starting with the non-terminal N^j_{pq} • Inside probability: β_j(p,q) = P(w_{pq} | N^j_{pq}, G)

  37. Outside & Inside Probabilities • (Diagram: parse tree rooted in N^1 over "The gunman sprayed the building with bullets", word positions 1-7, with an NP spanning "the building" at (4,5))

  38. Inside probabilities β_j(p,q) • Base case: β_j(k,k) = P(N^j → w_k) • The base case is used for rules which derive the words or terminals directly • E.g., suppose N^j = NN is being considered and NN → building is one of the rules, with probability 0.5; then β_NN(5,5) = P(NN → building) = 0.5

  39. Induction Step • Induction step: β_j(p,q) = Σ_{r,s} Σ_{d=p}^{q-1} P(N^j → N^r N^s) · β_r(p,d) · β_s(d+1,q) • Consider different splits of the words, indicated by d: e.g., "the huge building" can be split at different points • Consider different non-terminals to be used in the rule: NP → DT NN, NP → DT NNS are available options • Sum over all of these • (Diagram: N^j expanding to N^r over w_p..w_d and N^s over w_{d+1}..w_q)

  40. The Bottom-Up Approach • The idea of induction • Consider "the gunman" • Base cases: apply the unary rules DT → the (prob = 1.0) and NN → gunman (prob = 0.5) • Induction: prob that an NP covers these 2 words = P(NP → DT NN) × P(DT deriving the word "the") × P(NN deriving the word "gunman") = 0.5 × 1.0 × 0.5 = 0.25 • (Tree: NP_{0.5} → DT_{1.0} "The", NN_{0.5} "gunman")

  41. Parse Triangle • A parse triangle is constructed for calculating β_j(p,q) • Probability of a sentence using β_j(p,q): P(w_{1m} | G) = β_1(1,m)

  42. Example PCFG Rules & Probabilities • S → NP VP 1.0 • NP → DT NN 0.5 • NP → NNS 0.3 • NP → NP PP 0.2 • PP → P NP 1.0 • VP → VP PP 0.6 • VP → VBD NP 0.4 • DT → the 1.0 • NN → gunman 0.5 • NN → building 0.5 • VBD → sprayed 1.0 • NNS → bullets 1.0 • P → with 1.0

  43. Parse Triangle • Fill the diagonal cells with the base-case values β_j(k,k) = P(N^j → w_k)

  44. Parse Triangle • Calculate the remaining cells β_j(p,q) using the induction formula
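A self-contained Python sketch of the inside-probability computation over the PCFG of slide 42 (names are illustrative; the unary rule NP → NNS, which is not in CNF, is applied once right after the lexical step, which suffices for this grammar):

from collections import defaultdict

PCFG_BINARY = [             # (A, B, C, P(A -> B C))
    ("S",  "NP",  "VP",  1.0),
    ("NP", "DT",  "NN",  0.5),
    ("NP", "NP",  "PP",  0.2),
    ("PP", "P",   "NP",  1.0),
    ("VP", "VP",  "PP",  0.6),
    ("VP", "VBD", "NP",  0.4),
]
PCFG_UNARY = [("NP", "NNS", 0.3)]
PCFG_LEXICON = {
    "the": [("DT", 1.0)], "gunman": [("NN", 0.5)], "building": [("NN", 0.5)],
    "sprayed": [("VBD", 1.0)], "bullets": [("NNS", 1.0)], "with": [("P", 1.0)],
}

def inside(words):
    n = len(words)
    beta = defaultdict(float)   # beta[(A, p, q)] with 1-based word positions
    for k, w in enumerate(words, start=1):
        for a, prob in PCFG_LEXICON.get(w, []):
            beta[(a, k, k)] += prob                     # base case: beta_j(k,k) = P(N_j -> w_k)
        for a, b, prob in PCFG_UNARY:
            beta[(a, k, k)] += prob * beta[(b, k, k)]   # NP -> NNS over a single word
    for span in range(2, n + 1):
        for p in range(1, n - span + 2):
            q = p + span - 1
            for a, b, c, prob in PCFG_BINARY:
                for d in range(p, q):                   # sum over split points d
                    beta[(a, p, q)] += prob * beta[(b, p, d)] * beta[(c, d + 1, q)]
    return beta[("S", 1, n)]    # probability of the sentence

# inside("the gunman sprayed the building with bullets".split())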

  45. Example Parse t1 • The gunman sprayed the building with bullets. • The rule used at the top of the VP here is VP → VP PP: • (S_1.0 (NP_0.5 (DT_1.0 The) (NN_0.5 gunman)) (VP_0.6 (VP_0.4 (VBD_1.0 sprayed) (NP_0.5 (DT_1.0 the) (NN_0.5 building))) (PP_1.0 (P_1.0 with) (NP_0.3 (NNS_1.0 bullets)))))

  46. Another Parse t2 • The gunman sprayed the building with bullets. • The rule used here is VP → VBD NP: • (S_1.0 (NP_0.5 (DT_1.0 The) (NN_0.5 gunman)) (VP_0.4 (VBD_1.0 sprayed) (NP_0.2 (NP_0.5 (DT_1.0 the) (NN_0.5 building)) (PP_1.0 (P_1.0 with) (NP_0.3 (NNS_1.0 bullets))))))

  47. Parse Triangle

  48. Different Parses • Consider • Different splitting points: e.g., 5th and 3rd position • Using different rules for VP expansion: e.g., VP → VP PP, VP → VBD NP • Different parses for the VP "sprayed the building with bullets" can be constructed this way.

  49. Outside Probabilities α_j(p,q) • Base case: α_1(1,m) = 1, and α_j(1,m) = 0 for j ≠ 1 • Inductive step for calculating α_j(p,q): sum, over f, g and e, the probability of reaching a parent N^f whose other child N^g covers the words adjacent to w_p..w_q: α_j(p,q) = Σ_{f,g} Σ_{e=q+1}^{m} α_f(p,e) P(N^f → N^j N^g) β_g(q+1,e) + Σ_{f,g} Σ_{e=1}^{p-1} α_f(e,q) P(N^f → N^g N^j) β_g(e,p-1) • (Diagram: N^1 dominating N^f_{pe}, which expands to N^j_{pq} and N^g_{(q+1)e})

  50. Probability of a Sentence • The joint probability of a sentence w_{1m} and of there being a constituent spanning words w_p to w_q is given by: P(w_{1m}, N_{pq} | G) = Σ_j α_j(p,q) β_j(p,q) • E.g., for the NP spanning positions (4,5) in "The gunman sprayed the building with bullets"
