Dynamic Programming in Parsing: CKY and Earley Algorithms

Sentence Parsing • Parsing 3: Dynamic Programming

Acknowledgement
• Lecture based on
  • Jurafsky and Martin Ch. 13 (2nd Edition)
  • J & M Lecture Notes
Speech and Language Processing - Jurafsky and Martin

Avoiding Repeated Work
• Dynamic Programming
• CKY Parsing
• Earley Algorithm
• Chart Parsing

Dynamic Programming
• DP search methods fill tables with partial results and thereby
  • Avoid doing avoidable repeated work
  • Solve exponential problems in polynomial time (well, no, not really)
  • Efficiently store ambiguous structures with shared sub-parts.
• We’ll cover two approaches that roughly correspond to bottom-up and top-down approaches.
  • CKY (after authors Cocke, Kasami and Younger)
  • Earley (often referred to as chart parsing, because it uses a data structure called a chart)

CKY Parsing
• First we’ll limit our grammar to epsilon-free, binary rules (more later)
• Key intuition: consider the rule A -> B C
  • If there is an A somewhere in the input then there must be a B followed by a C in the input.
  • If the A spans from i to j in the input then there must be some k such that i < k < j
  • i.e. the B splits from the C someplace.
• This intuition plays a role in both CKY and Earley methods.

Problem
• What if your grammar isn’t binary?
  • As in the case of the TreeBank grammar?
• Convert it to binary… any arbitrary CFG can be rewritten into Chomsky Normal Form automatically.
• What does this mean?
  • The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
  • But the resulting derivations (trees) are different.

Problem
• More specifically, we want our rules to be of the form
  A -> B C
  or
  A -> w
• That is, rules can expand to either two non-terminals or a single terminal.

Binarization Intuition
• Eliminate chains of unit productions.
• Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules.
• So… S -> A B C turns into S -> X C and X -> A B, where X is a symbol that doesn’t occur anywhere else in the grammar.
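
The second step above (splitting long rules) can be sketched in a few lines of Python. This is only an illustration, not the textbook's conversion procedure: the function name `binarize`, the (lhs, rhs) tuple representation, and the fresh symbol names X1, X2, ... are all assumptions made for this example, and unit productions and mixed terminal/non-terminal rules are not handled.

```python
from itertools import count

def binarize(rules):
    """Split every rule with more than two RHS symbols into binary rules.

    `rules` is a list of (lhs, rhs_tuple) pairs. The fresh symbols X1, X2, ...
    are hypothetical names assumed not to occur elsewhere in the grammar.
    """
    fresh = count(1)
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new_sym = f"X{next(fresh)}"
            # New rule X -> first two symbols; the original rule now starts with X.
            out.append((new_sym, rhs[:2]))
            rhs = (new_sym,) + rhs[2:]
        out.append((lhs, rhs))
    return out

# S -> A B C becomes X1 -> A B and S -> X1 C
print(binarize([("S", ("A", "B", "C"))]))
```

Rules that are already binary or shorter pass through unchanged, so the function can be applied to a whole grammar at once.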

Sample L1 Grammar

CNF Conversion

CKY
• Build a table so that an A spanning from i to j in the input is placed in cell [i,j] in the table.
• So a non-terminal spanning an entire string will sit in cell [0, n]
  • Hopefully an S
• If we build the table bottom-up, we’ll know that the parts of the A must go from i to k and from k to j, for some k.

CKY
• Meaning that for a rule like A -> B C we should look for a B in [i,k] and a C in [k,j].
• In other words, if we think there might be an A spanning i,j in the input… AND A -> B C is a rule in the grammar THEN
• There must be a B in [i,k] and a C in [k,j] for some i < k < j

CKY
• So to fill the table, loop over the cell [i,j] values in some systematic way
  • What constraint should we put on that systematic search?
  • For each cell, loop over the appropriate k values to search for things to add.

CKY Algorithm
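
A minimal Python sketch of a CKY recogniser along the lines described in the preceding slides follows. It assumes a grammar already in Chomsky Normal Form; the function name `cky_recognize`, the rule representation, and the tiny grammar for "book that flight" are illustrative assumptions rather than the slide's original pseudocode.

```python
from collections import defaultdict

def cky_recognize(words, binary_rules, lexical_rules, start="S"):
    """Return (accepted, table) for a grammar in Chomsky Normal Form.

    binary_rules:  list of (A, (B, C)) for rules A -> B C
    lexical_rules: list of (A, word)   for rules A -> word
    table[(i, j)] holds the non-terminals spanning words[i:j].
    """
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                 # one column per word, left to right
        for lhs, word in lexical_rules:
            if word == words[j - 1]:
                table[(j - 1, j)].add(lhs)
        for i in range(j - 2, -1, -1):        # bottom to top within the column
            for k in range(i + 1, j):         # split points with i < k < j
                for lhs, (b, c) in binary_rules:
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(lhs)
    return start in table[(0, n)], table

# Hypothetical CNF grammar fragment, for illustration only.
BINARY = [("S", ("Verb", "NP")), ("NP", ("Det", "Nominal"))]
LEXICAL = [("Verb", "book"), ("Det", "that"), ("Nominal", "flight")]

accepted, table = cky_recognize("book that flight".split(), BINARY, LEXICAL)
print(accepted, sorted(table[(1, 3)]))
```

The three nested loops over j, i and k are what give the cubic behaviour in sentence length (times a factor for the grammar size).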

CKY Parsing
• Is that really a parser?
• No, it’s a recogniser. But the parse tree can in principle be recovered from the table, provided that each time a left-hand-side symbol A is put into the table via a rule A -> B C, a pointer is kept to the right-hand-side instances of B and C from which it was derived.
• The parse tree is then reconstructed by following the pointers from the top item.
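
The pointer idea can be sketched as follows. This is an assumption-laden illustration: `cky_parse`, the `back` dictionary, and the toy grammar are names invented here, and the sketch keeps only the first derivation found for each (span, symbol) pair, so it returns one tree even when the input is ambiguous.

```python
def cky_parse(words, binary_rules, lexical_rules, start="S"):
    """CKY with back-pointers; returns one parse tree as nested tuples, or None."""
    n = len(words)
    # back[(i, j, A)] is the word itself for A -> word,
    # or (k, B, C) for A -> B C with B over [i,k] and C over [k,j].
    back = {}
    for j in range(1, n + 1):
        for lhs, word in lexical_rules:
            if word == words[j - 1]:
                back[(j - 1, j, lhs)] = word
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for lhs, (b, c) in binary_rules:
                    if (i, k, b) in back and (k, j, c) in back:
                        # Keep only the first derivation (a simplification).
                        back.setdefault((i, j, lhs), (k, b, c))

    def build(i, j, sym):
        entry = back[(i, j, sym)]
        if isinstance(entry, str):            # lexical rule: leaf node
            return (sym, entry)
        k, b, c = entry                       # follow both pointers
        return (sym, build(i, k, b), build(k, j, c))

    return build(0, n, start) if (0, n, start) in back else None

# Hypothetical CNF grammar fragment, for illustration only.
BINARY = [("S", ("Verb", "NP")), ("NP", ("Det", "Nominal"))]
LEXICAL = [("Verb", "book"), ("Det", "that"), ("Nominal", "flight")]

print(cky_parse("book that flight".split(), BINARY, LEXICAL))
```

Storing a list of (k, B, C) triples per entry instead of a single one would preserve every parse in a shared, packed form.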

Note
• We arranged the loops to fill the table a column at a time, from left to right, bottom to top.
  • This assures us that whenever we’re filling a cell, the parts needed to fill it are already in the table (to the left and below)
  • It’s somewhat natural in that it processes the input left to right one word at a time
  • Known as online
• Other ways of filling the table are possible

Example

Example: Filling column 5

Example

CKY Notes
• Since it’s bottom-up, CKY populates the table with a lot of phantom constituents.
  • Segments that by themselves are constituents but cannot really occur in the context in which they are being suggested.
• To avoid this we can switch to a top-down control strategy
• In addition we can add some kind of filtering that blocks constituents where they cannot happen in a final analysis.

Earley Parsing
• Allows arbitrary CFGs (not just binary ones).
  • This requires dotted rules
• Some top-down control
  • Uses a prediction operation
• Parallel top-down search
• Fills a table in a single sweep over the input
  • Table is length N+1; N is number of words
  • Table entries consist of dotted rules

Earley Parsing
• Dynamic Programming: the solution involves filling in a table of solutions to subproblems.
• Parallel top-down search
• Worst-case complexity is O(N^3) in the length N of the sentence.
• The table is sometimes called a chart, and Earley parsing is also called chart parsing.
• The chart contains N+1 entries, one for each position between words: 0 book 1 that 2 flight 3

The Chart
• Each table entry contains a list of states
• Together, the states in an entry represent all partial parses that have been reached so far at that point in the sentence.
• States are represented using dotted rules containing information about
  • Rule/subtree: which rule has been used
  • Progress: the dot indicates how much of the rule's RHS has been recognised.
  • Position: the text segment to which this parse applies

Examples of Dotted Rules
• Initial S rule (incomplete): S -> . VP, [0,0]
• Partially recognised NP (incomplete): NP -> Det . Nominal, [1,2]
• Fully recognised VP (complete): VP -> V NP ., [0,3]
• These states can also be represented graphically on the chart
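
One possible way to represent such a state in code is sketched below. The class name `State` and its field names are assumptions for illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A dotted rule plus the input span it covers (hypothetical layout)."""
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand-side symbols
    dot: int      # number of RHS symbols recognised so far
    start: int    # position where the constituent begins
    end: int      # position the dot has reached in the input

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol immediately after the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs[:self.dot]) + ["."] + list(self.rhs[self.dot:])
        return f"{self.lhs} -> {' '.join(symbols)}, [{self.start},{self.end}]"

print(State("S", ("VP",), 0, 0, 0))              # S -> . VP, [0,0]
print(State("NP", ("Det", "Nominal"), 1, 1, 2))  # NP -> Det . Nominal, [1,2]
print(State("VP", ("V", "NP"), 2, 0, 3))         # VP -> V NP ., [0,3]
```

Making the dataclass frozen keeps states hashable, which is convenient for checking whether a state is already in a chart entry.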

The Chart

Earley Algorithm
• Main Algorithm: proceeds through each text position, applying one of the three operators below to each state.
• Predictor: creates "initial states" (i.e. states whose RHS is completely unparsed).
• Scanner: checks the current input word when the next category to be recognised is a pre-terminal.
• Completer: when a state is "complete" (nothing after the dot), advances all states to the left that are looking for the associated category.

Earley Main Algorithm

Earley Sub Functions

Earley Algorithm – Sub Functions
Predictor(A -> alpha . B beta, [i,j])
  for each B -> gamma in Grammar(B)
    enqueue((B -> . gamma, [j,j]), chart[j])
Scanner(A -> alpha . B beta, [i,j])
  if B in PartOfSpeech(word[j]) then
    enqueue((B -> word[j] . , [j,j+1]), chart[j+1])
Completer(B -> gamma . , [j,k])
  for each (A -> alpha . B beta, [i,j]) in chart[j]
    enqueue((A -> alpha B . beta, [i,k]), chart[k])
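
The three operators can be combined into a small runnable recogniser. The following Python sketch is one possible reading of the algorithm, not the slide's own code: the dummy start symbol GAMMA, the tuple layout of a state, the lexicon, and the function name are assumptions, and the grammar is assumed to be epsilon-free.

```python
# Phrase-structure rules; the lexicon below is a hypothetical addition.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("Proper-Noun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Return True if `words` can be derived from `start` (epsilon-free grammar)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # N+1 entries, one per position

    def enqueue(state, position):
        if state not in chart[position]:      # never add the same state twice
            chart[position].append(state)

    # A state is (lhs, rhs, dot, i, j); GAMMA is a dummy start symbol.
    enqueue(("GAMMA", (start,), 0, 0, 0), 0)
    for pos in range(n + 1):
        idx = 0
        while idx < len(chart[pos]):          # chart[pos] may grow as we go
            lhs, rhs, dot, i, j = chart[pos][idx]
            idx += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:            # Predictor: expand a non-terminal
                    for gamma in grammar[nxt]:
                        enqueue((nxt, gamma, 0, j, j), j)
                elif j < n and nxt in lexicon.get(words[j], ()):
                    # Scanner: the next word can have the expected part of speech
                    enqueue((nxt, (words[j],), 1, j, j + 1), j + 1)
            else:                             # Completer: advance waiting states
                for a, a_rhs, a_dot, a_i, _ in list(chart[i]):
                    if a_dot < len(a_rhs) and a_rhs[a_dot] == lhs:
                        enqueue((a, a_rhs, a_dot + 1, a_i, j), j)
    return ("GAMMA", (start,), 1, 0, n) in chart[n]

print(earley_recognize("book that flight".split()))
```

Because a complete state over [i,j] only ever advances states stored in chart[i], the Completer never has to look anywhere except the entry where the constituent began.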

Grammar
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nominal
Nominal -> Noun
Nominal -> Noun Nominal
NP -> Proper-Noun
VP -> Verb
VP -> Verb NP

Retrieving Trees
• To turn the recogniser into a parser, the representation of each state must also include information about the completed states that generated its constituents

Chart[3] ↑ Extra Field

Back to Ambiguity
• Did we solve it?

Ambiguity
• No…
• Both CKY and Earley will result in multiple S structures for the [0,N] table entry.
• They both efficiently store the sub-parts that are shared between multiple parses.
• And they obviously avoid re-deriving those sub-parts.
• But neither can tell us which one is right.
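
To make the point concrete, the CKY table can be used to count how many distinct trees cover the whole input without enumerating them. The sketch below is illustrative only: `count_parses` and the deliberately ambiguous toy grammar (Nominal -> Nominal Nominal over a noun compound) are assumptions, not material from the slides.

```python
from collections import defaultdict

def count_parses(words, binary_rules, lexical_rules, start):
    """Count distinct parse trees for a CNF grammar using a CKY-style table."""
    n = len(words)
    counts = defaultdict(int)   # counts[(i, j, A)] = number of trees for A over [i,j]
    for j in range(1, n + 1):
        for lhs, word in lexical_rules:
            if word == words[j - 1]:
                counts[(j - 1, j, lhs)] += 1
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for lhs, (b, c) in binary_rules:
                    # Each pairing of a B-tree with a C-tree yields a new A-tree.
                    counts[(i, j, lhs)] += counts[(i, k, b)] * counts[(k, j, c)]
    return counts[(0, n, start)]

# Hypothetical ambiguous grammar: any bracketing of a noun compound is allowed.
words = "morning flight schedule change".split()
BINARY = [("Nominal", ("Nominal", "Nominal"))]
LEXICAL = [("Nominal", w) for w in words]

print(count_parses(words[:3], BINARY, LEXICAL, "Nominal"))  # 2
print(count_parses(words, BINARY, LEXICAL, "Nominal"))      # 5
```

The table stays polynomial in size while the number of trees it encodes grows much faster, and nothing in the table says which of those trees is the intended one.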
