
Natural Language Processing



Presentation Transcript


  1. Natural Language Processing Lecture 17—10/29/2013 Jim Martin

  2. Today • Finish Statistical CFG Parsing • Dependency parsing • Dependency trees • Basic transition-based parsing • Machine learning Speech and Language Processing - Jurafsky and Martin

  3. Simple Probability Model • A derivation (tree) consists of the bag of grammar rules that are in the tree • The probability of a tree is the product of the probabilities of the rules in the derivation. Speech and Language Processing - Jurafsky and Martin
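In symbols (my notation, not on the slide), this is the standard PCFG assumption: the probability of a parse is the product, over every rule application in the derivation, of that rule's probability given its left-hand side:

$$P(T, S) = \prod_{A \rightarrow \beta \,\in\, T} P(A \rightarrow \beta \mid A)$$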

  4. Improved Approaches • The simple model's independence assumptions are too strong: a rule's probability ignores both the surrounding tree and the words involved • There are two approaches to overcoming these shortcomings • Rewrite the grammar to better capture the dependencies among rules • Integrate lexical dependencies into the model • And come up with the independence assumptions needed to make that work Speech and Language Processing - Jurafsky and Martin

  5. Solution 2: Lexicalized Grammars • Lexicalize the grammars with heads • Compute the rule probabilities on these lexicalized rules • Run Prob CKY as before Speech and Language Processing - Jurafsky and Martin

  6. Dumped Example Speech and Language Processing - Jurafsky and Martin

  7. Declare Independence • When stuck, exploit independence and collect the statistics you can… • There are a large number of ways to do this... • Let’s consider one generative story: given a rule we’ll • Generate the head • Generate the stuff to the left of the head • Generate the stuff to the right of the head Speech and Language Processing - Jurafsky and Martin

  8. Example • That is, the probability of a lexicalized rule is estimated using that generative story; a sketch of the decomposition follows Speech and Language Processing - Jurafsky and Martin
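The rule and formula on this slide were in an image that did not come through. A hedged reconstruction, following the generative story just described (generate the head child, then the material to its left, then the material to its right, each side terminated by STOP), for a rule like VP(dumped) → VBD(dumped) NP(sacks) PP(into); the exact conditioning used in the lecture may differ:

$$\begin{aligned}
P\bigl(VP(\textit{dumped}) \rightarrow VBD(\textit{dumped})\ NP(\textit{sacks})\ PP(\textit{into})\bigr) \approx{}& P_H(VBD \mid VP, \textit{dumped})\\
\times{}& P_L(\mathrm{STOP} \mid VP, VBD, \textit{dumped})\\
\times{}& P_R(NP(\textit{sacks}) \mid VP, VBD, \textit{dumped})\\
\times{}& P_R(PP(\textit{into}) \mid VP, VBD, \textit{dumped})\\
\times{}& P_R(\mathrm{STOP} \mid VP, VBD, \textit{dumped})
\end{aligned}$$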

  9. Dependency Parse • Sentence: ROOT I booked a morning flight • Relations (head, dependent): (booked, I), (booked, flight), (flight, a), (flight, morning)

  10. Tree Constraints • Words can only have one head • One incoming arc • Every word has to have a head • Result is a tree • There’s a path from the root to each word • There’s only one path from the root to any word • These are the formal constraints on dependency trees. For any given sentence there will be lots of such trees, most of which are nonsense. Speech and Language Processing - Jurafsky and Martin
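A minimal sketch (not from the slides) of checking these constraints over a set of (head, dependent) arcs; representing words as plain strings and adding an explicit (root, booked) arc are my assumptions:

```python
def is_valid_tree(words, arcs, root="root"):
    """Check the formal dependency-tree constraints: every word has exactly
    one head, and every word reaches the root by following head links
    (no cycles), so there is exactly one path from the root to each word."""
    heads = {}
    for head, dep in arcs:
        if dep in heads:              # a word with two heads is not allowed
            return False
        heads[dep] = head
    for w in words:
        if w not in heads:            # every word must have a head
            return False
        seen, node = set(), w
        while node != root:           # follow head links up toward the root
            if node in seen:          # a cycle means no path to the root
                return False
            seen.add(node)
            node = heads[node]
    return True

# The running example; the (root, booked) arc is added so the main verb has a head
words = ["I", "booked", "a", "morning", "flight"]
arcs = [("root", "booked"), ("booked", "I"), ("booked", "flight"),
        ("flight", "a"), ("flight", "morning")]
print(is_valid_tree(words, arcs))     # True
```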

  11. Dependency Grammar • The linguistic constraints underlying “correct trees” are usually called a dependency grammar • Which may or may not correspond to an explicit formal generative grammar of the kind we’ve been using • The parsing technique discussed today doesn’t use an explicitly represented grammar Speech and Language Processing - Jurafsky and Martin

  12. Transition-Based Parsing • Transition-based parsing is a greedy word-by-word approach to parsing • A single dependency tree is built up an arc at a time as we move left to right through a sentence • No backtracking • A classifier is used to make decisions as we move through the sentence Speech and Language Processing - Jurafsky and Martin

  13. Dependency Parse I booked a morning flight.

  14. Transition-Based Parsing • We can (again) view this as a search space through a set of states for a state that contains what we want • In the standard notation a state consists of three elements • A stack representing partially processed words • A list containing the remaining words to be processed • A set containing the relations discovered so far Speech and Language Processing - Jurafsky and Martin

  15. States • So the start state looks like • [[root], [sentence], ()] • A valid final state looks like • [[root], [], (R)] • Where R is the set of relations we’ve discovered; the empty [] represents the fact that all the words in the sentence have been accounted for Speech and Language Processing - Jurafsky and Martin

  16. Example • Here’s our example • Start • [[root], [I booked a morning flight], ()] • End • [[root], [], ((booked, I) (booked, flight) (flight, a) (flight, morning))] Speech and Language Processing - Jurafsky and Martin
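The same start and end states in a minimal Python encoding (the representation is my assumption; the top of the stack is the last list element):

```python
# A state is a triple (stack, words, relations):
#   stack     - partially processed words; the top of the stack is the last element
#   words     - the remaining input words, left to right
#   relations - the set of (head, dependent) arcs discovered so far
start = (["root"], ["I", "booked", "a", "morning", "flight"], set())

final = (["root"], [],
         {("booked", "I"), ("booked", "flight"),
          ("flight", "a"), ("flight", "morning")})
```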

  17. Parsing • The parsing problem is how to get from the start state to the final state • To begin, we’ll define a set of three basic operators that take a state and produce a new state • Left • Right • Shift Speech and Language Processing - Jurafsky and Martin

  18. Shift • Shift takes the next word to be processed, pushes it onto the stack, and removes it from the word list • So a shift for our example at the start looks like this: [[root], [I booked a morning flight], ()] → [[root, I], [booked a morning flight], ()] Speech and Language Processing - Jurafsky and Martin

  19. Left • The Left operator • Adds relation (a, b) to the set of relations where • a is the first word on the word list • b is the word at the top of the stack • Pops the stack • So for our current state: [[root, I], [booked a morning flight], ()] → [[root], [booked a morning flight], (booked, I)] Speech and Language Processing - Jurafsky and Martin

  20. Right • The Right operator • Adds (b, a) to the set of relations • Where b and a are the same as before: a is the first word in the remainder list, and b is the top of the stack • Removes the first word from the remainder list • Pops the stack and places the popped item back at the front of the remaining word list Speech and Language Processing - Jurafsky and Martin
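A sketch of the three operators over the (stack, words, relations) state, following the definitions on the last three slides; the in-place list manipulation is my choice, not the lecture's:

```python
def shift(state):
    """Push the next word from the word list onto the stack."""
    stack, words, relations = state
    stack.append(words.pop(0))

def left(state):
    """Add (a, b): a = first word on the word list (head),
    b = top of the stack (dependent). Then pop the stack."""
    stack, words, relations = state
    relations.add((words[0], stack.pop()))

def right(state):
    """Add (b, a): b = top of the stack (head), a = first word on the
    word list (dependent). Remove a from the list, pop b, and put b
    back at the front of the word list."""
    stack, words, relations = state
    a = words.pop(0)
    b = stack.pop()
    relations.add((b, a))
    words.insert(0, b)

# The first two steps of the running example
state = (["root"], ["I", "booked", "a", "morning", "flight"], set())
shift(state)   # [[root, I], [booked a morning flight], ()]
left(state)    # [[root], [booked a morning flight], {(booked, I)}]
```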

  21. Example Speech and Language Processing - Jurafsky and Martin

  22. Example Speech and Language Processing - Jurafsky and Martin

  23. Example Speech and Language Processing - Jurafsky and Martin

  24. Example Speech and Language Processing - Jurafsky and Martin

  25. Example Speech and Language Processing - Jurafsky and Martin

  26. Example Speech and Language Processing - Jurafsky and Martin

  27. Example Speech and Language Processing - Jurafsky and Martin

  28. Example Speech and Language Processing - Jurafsky and Martin

  29. Example Speech and Language Processing - Jurafsky and Martin

  30. Example Speech and Language Processing - Jurafsky and Martin

  31. Example Speech and Language Processing - Jurafsky and Martin

  32. Example Speech and Language Processing - Jurafsky and Martin

  33. Example Speech and Language Processing - Jurafsky and Martin

  34. Example Speech and Language Processing - Jurafsky and Martin

  35. Two Problems • First, we really want labeled relations • That is, we want things like subject, direct object, indirect object, etc. as relations • Second, how did we know which operator (L, R, S) to invoke at each step along the way? • Since we’re not backtracking, one wrong step and we won’t get the tree we want • How do we even know what tree we want? • Well, we could add backtracking... Speech and Language Processing - Jurafsky and Martin

  36. Grammatical Relations • Well, to handle this we can just add new transitions • Essentially replace Left and Right with {Left, Right} × {all the relations of interest}, as sketched below • Note that this isn’t going to make the second problem any easier to deal with Speech and Language Processing - Jurafsky and Martin
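A small sketch of that cross product (the label inventory here is illustrative, not the lecture's exact set):

```python
RELATIONS = ["subj", "obj", "iobj", "det", "nmod"]    # illustrative label set

TRANSITIONS = (["shift"]
               + ["left_" + r for r in RELATIONS]
               + ["right_" + r for r in RELATIONS])
# ['shift', 'left_subj', 'left_obj', ..., 'right_nmod']
```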

  37. Example Speech and Language Processing - Jurafsky and Martin

  38. Example Speech and Language Processing - Jurafsky and Martin

  39. Example Speech and Language Processing - Jurafsky and Martin

  40. Example Speech and Language Processing - Jurafsky and Martin

  41. Example Speech and Language Processing - Jurafsky and Martin

  42. Making Choices • Method 1 • Use a set of rules that choose an operator based on features of the current state • As in, if the word at the top of the stack is “I” and the rest of the stack is just “root” and the word at the front of the word list is “booked”, then invoke Left_Subj Speech and Language Processing - Jurafsky and Martin

  43. Making Choices • Method 1 • Use a set of rules that choose an operator based on features of the current state • As in, if there’s a pronoun at the top of the stack and the rest of the stack is just root and there’s a verb at the front of the word list, then invoke Left_Subj Speech and Language Processing - Jurafsky and Martin

  44. Making Choices • Method 2 • Use supervised machine learning (ML) to train a classifier to choose among the available operators • Based on features derived from the states • Then use that classifier to make the right choices Speech and Language Processing - Jurafsky and Martin
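A sketch of how such a classifier slots into the greedy parsing loop; the features function, the classifier object and its predict interface, and the transitions table are all assumptions for illustration:

```python
def parse(sentence, classifier, features, transitions):
    """Greedy transition-based parsing: at each state, ask the classifier
    for a transition name, apply it, and never backtrack."""
    state = (["root"], list(sentence), set())
    stack, words, relations = state
    while words:                                      # until the word list is empty
        name = classifier.predict(features(state))    # e.g. 'shift', 'left_subj', ...
        transitions[name](state)                      # apply the chosen operator
    return relations
```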

  45. Example Speech and Language Processing - Jurafsky and Martin

  46. Three Problems • To apply ML in situations like this we have three problems • Discovering features that are useful indicators of what to do in any situation • Characteristics of the state we’re in • Acquiring the necessary training data • Treebanks • Training Speech and Language Processing - Jurafsky and Martin

  47. Three Problems: Features • Features are typically described along two dimensions in this style of parsing • Position in the state (aka configuration) • Position in the stack, position in the word list, location in the partial tree • Attributes of particular locations or attributes of tuples of locations • Part of speech of the top of the stack, POS of the third word in the remainder list, lemmas, last three letters • Head word of a word, number of relations already attached to a word, does the word already have a SUBJ relation, etc. Speech and Language Processing - Jurafsky and Martin
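A minimal sketch of features along those two dimensions, assuming each word is a dict with 'form' and 'pos' fields (the feature set used in practice is much richer):

```python
def features(state):
    """Each feature names a position in the configuration (top of the stack,
    front of the word list, the partial tree) plus an attribute of it."""
    stack, words, relations = state
    feats = {}
    # assumes root is a dummy word such as {"form": "root", "pos": "ROOT"}
    if stack:
        feats["stack0.form"] = stack[-1]["form"]
        feats["stack0.pos"] = stack[-1]["pos"]
    if words:
        feats["words0.form"] = words[0]["form"]
        feats["words0.pos"] = words[0]["pos"]
    if stack and words:                               # a tuple-of-locations feature
        feats["stack0.pos+words0.pos"] = stack[-1]["pos"] + "+" + words[0]["pos"]
    feats["n.relations"] = len(relations)             # attribute of the partial tree
    return feats
```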

  48. Three Problems: Data • Training data • Get a treebank • Directly as a dependency treebank • Or derived from a phrase-structure treebank Speech and Language Processing - Jurafsky and Martin

  49. Three Problems: Training • This is tricky • Our treebanks associate sentences with their corresponding trees • We need parser states paired with their corresponding correct operators (never going to get this directly) • But we do know the correct trees • So.... Speech and Language Processing - Jurafsky and Martin

  50. Three Problems: Training • We’ll parse with our standard algorithm, asking an oracle which operator to use at any given time • The oracle has access to the correct tree for the sentence. At each stage it chooses, as in a case statement: • Left if the resulting relation is in the correct tree • Right if the resulting relation is in the correct tree AND all the other outgoing relations of the word being attached (the dependent) are already in the relation list • Otherwise, Shift Speech and Language Processing - Jurafsky and Martin
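That policy as a sketch, assuming gold is the set of (head, dependent) arcs of the correct tree (the names and the string transition labels are mine):

```python
def oracle(state, gold):
    """Training-time choice of operator, made by peeking at the correct tree."""
    stack, words, relations = state
    if stack and words:
        b, a = stack[-1], words[0]            # top of stack, front of word list
        if (a, b) in gold:                    # Left would add a correct arc
            return "left"
        if (b, a) in gold:                    # Right would add a correct arc...
            deps_of_a = {(h, d) for (h, d) in gold if h == a}
            if deps_of_a <= relations:        # ...and a already has all its dependents
                return "right"
    return "shift"
```

Pairing each state visited during this simulated parse with the oracle's choice yields exactly the state/operator training examples the classifier needs.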
