Prepositional Phrase Attachment

Prepositional Phrase Attachment Chris Brew Ohio State University

Prepositional Phrase Attachment • Hindle and Rooth: partial parser to get statistics • Collins and Brooks: backed-off estimation from treebank data, plus an attachment decision • Merlo, Crocker and Berthouzoz: disambiguating multiple PPs • Ratnaparkhi: entirely unsupervised

The problem

Hindle and Rooth • Whittemore, Ferrara and Brunner: structural heuristics (Kimball’s Right Association, Frazier’s Minimal Attachment) account for only 55% of behaviour; lexical preferences do much better • H and R note that the preferences for this experiment were provided by human judgement, and ask how to get a good list of lexical preferences automatically

Discovering Lexical Association in text • Church’s part-of-speech analyser • Hindle’s Fidditch partial parser • 13 million words of AP news wire

Fidditch • [Figure: Fidditch partial parse tree for the sentence fragment “the radical changes in export and customs regulations evidently are aimed at remedying an extreme shortage of consumer goods in the Soviet Union”]

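Extract information about words
ID | Verb    | Noun       | Prep | Syntax
a  |         | change     | in   | -V
b  |         | regulation |      |
c  | aim     | PRO-+      | at   |
d  | remedy  | shortage   | of   |
e  |         | good       | in   |
f  |         | DART-PNP   |      |
g  | assuage | citizen    |      |
h  |         | scarcity   | of   |
i  |         | item       | as   |
j  |         | wiper      |      |
k  |         | VING       |      |
l  |         | VING       |      |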

What the table means • Noun column has the head noun of the noun phrase (or various special cases) • Verb column has the head verb if the noun phrase was its object • Prep column has the following preposition • Syntax column has -V if there is no preceding verb

Counting attachments • Parser isn’t reliable, so use a decision procedure to assign nouns and verbs to noun-attach (na) and verb-attach (va)

No preposition • add a count for <noun,NULL> or <verb,NULL>

Sure Verb Attach 1: • if the noun phrase head is a pronoun, add a count for <verb,prep>

Sure Verb Attach 2: • if the verb is passivized, verb attach unless the preposition is “by”

Sure Noun Attach • if no verb is available, then noun attach

Ambiguous Attach 1: • if the lexical association (LA) score is > 2.0, verb attach; if it is < -2.0, noun attach • use the statistics gathered so far to calculate the score • repeat until stable • row d of the table (remedy, shortage, of) is marked as a possible case

Ambiguous Attach 2: • share the counts between noun and verb • row d of the table (remedy, shortage, of) is marked as a possible case

Unsure Attach: • attach to noun by default
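The counting rules on the preceding slides can be read as a cascade. The sketch below is one possible reading in Python, not Hindle and Rooth's implementation. The row keys (`verb`, `noun`, `prep`, `pronoun`, `passive`), the key layout of the counter, and the `la_score` callback are all assumptions made for illustration; the slides do not define how the LA score is computed, so it is passed in by the caller. The sketch makes a single pass (no "repeat until stable"), counts both heads in the no-preposition case, and does not model the final noun default, since the slides do not say which cases reach it.

```python
from collections import Counter


def count_attachments(rows, la_score=None, threshold=2.0):
    """One pass of the counting cascade over extracted rows.

    rows: dicts with hypothetical keys 'verb', 'noun', 'prep',
          'pronoun' (bool) and 'passive' (bool); missing keys mean "absent".
    la_score: optional callable (verb, noun, prep, counts) -> float, where a
              positive value favours verb attachment (assumed convention).
    Returns a Counter keyed by ('v' or 'n', head, prep).
    """
    counts = Counter()
    for row in rows:
        verb, noun, prep = row.get("verb"), row.get("noun"), row.get("prep")
        if prep is None:
            # No preposition: count each head that is present with NULL.
            if noun:
                counts[("n", noun, None)] += 1
            if verb:
                counts[("v", verb, None)] += 1
        elif verb and row.get("pronoun"):
            counts[("v", verb, prep)] += 1  # sure verb attach 1
        elif verb and row.get("passive") and prep != "by":
            counts[("v", verb, prep)] += 1  # sure verb attach 2
        elif not verb:
            counts[("n", noun, prep)] += 1  # sure noun attach
        else:
            score = la_score(verb, noun, prep, counts) if la_score else 0.0
            if score > threshold:
                counts[("v", verb, prep)] += 1  # ambiguous attach 1
            elif score < -threshold:
                counts[("n", noun, prep)] += 1
            else:
                # Ambiguous attach 2: share the count between the two heads.
                counts[("v", verb, prep)] += 0.5
                counts[("n", noun, prep)] += 0.5
    return counts
```

With no scoring function supplied, every ambiguous row such as (remedy, shortage, of) is shared half and half between the verb and the noun.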

Smooth the estimates • using typical association rates of prepositions with the whole classes of nouns and verbs • P(p|n) = (|n,p| + P(na|p)) / (|n| + 1), where P(na|p) is |any noun,p| / |any noun|, and similarly for verbs • Laplace’s M-estimate again
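A small Python sketch of the smoothed estimate may make the formula concrete. It assumes counts are held in two `collections.Counter` objects with hypothetical names `pair_counts` (keyed by `(head, prep)`) and `head_counts` (keyed by head); the same function serves for nouns or verbs depending on which counts are passed in.

```python
from collections import Counter


def smoothed_prob(head, prep, pair_counts, head_counts):
    """(|head,prep| + class rate) / (|head| + 1).

    The class rate is |any head,prep| / |any head|, i.e. how often the
    preposition follows any head of this class (any noun, or any verb).
    """
    total_heads = sum(head_counts.values())
    with_prep = sum(c for (_, p), c in pair_counts.items() if p == prep)
    class_rate = with_prep / total_heads if total_heads else 0.0
    return (pair_counts[(head, prep)] + class_rate) / (head_counts[head] + 1)


# Toy counts, invented for illustration only.
noun_pairs = Counter({("shortage", "of"): 3, ("scarcity", "of"): 1})
noun_heads = Counter({"shortage": 4, "scarcity": 2, "wiper": 2})
```

A head that was never seen gets exactly the class rate, and the more often a head has been seen, the less the class rate matters.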

Performance • ~ 80% correct • can get better precision by accepting lower recall (useful for exploratory text analysis) • “good enough to be added to a parser like Fidditch”

Backed-off estimation • Collins and Brooks • use N2 as well as N1 • [Figure: the two alternative attachment trees, with leaves labelled V, N2, P, N1]

Use treebank data • similar approaches • Ratnaparkhi, Reynar and Roukos • Brill and Resnik • difficult to compare results with Hindle and Rooth, because the corpora used are different (but raw scores around 80% in both cases)

The data • 20801 training and 3097 test examples • about 95% of the quadruples in the test data had not been seen in the training set • compare H&R: 200,000 triples

Take this method and apply to PP data • Start with full quadruples (v, n1, p, n2) • Four possible triples to back off to • Six possible pairs to back off to • Restrict attention to those containing P, which leaves three triples and three pairs
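The back-off levels can be enumerated mechanically. The following Python sketch lists every sub-tuple of the quadruple and keeps only those that contain the preposition; the field names and the function name are chosen here for illustration.

```python
from itertools import combinations

FIELDS = ("v", "n1", "p", "n2")


def backoff_levels(fields=FIELDS, must_have="p"):
    """Map tuple size -> sub-tuples of `fields` that contain `must_have`."""
    return {
        size: [c for c in combinations(fields, size) if must_have in c]
        for size in (4, 3, 2, 1)
    }


levels = backoff_levels()
# levels[3] -> [('v', 'n1', 'p'), ('v', 'p', 'n2'), ('n1', 'p', 'n2')]
# levels[2] -> [('v', 'p'), ('n1', 'p'), ('p', 'n2')]
```

Of the four possible triples and six possible pairs, three of each survive the restriction.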

How to combine counts from triples and pairs

$$p_{\text{triple}}(1 \mid v, n_1, p, n_2) \approx \frac{p(1, v, n_1, p) + p(1, v, p, n_2) + p(1, n_1, p, n_2)}{p(v, n_1, p) + p(v, p, n_2) + p(n_1, p, n_2)}$$

$$p_{\text{pair}}(1 \mid v, n_1, p, n_2) \approx \frac{p(1, v, p) + p(1, p, n_2) + p(1, n_1, p)}{p(v, p) + p(p, n_2) + p(n_1, p)}$$

• other combinations tried; this formula is better than simple averaging for this task

What was “enough data”? • In this task it turns out that using a threshold of 0 for the denominator is best. If there is even one instance of the quadruple, trust it. • For n-grams, it was better to ignore low counts • reason for this is not obvious, but in such situations trying things is essential.
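A minimal Python sketch of backed-off estimation with a threshold of 0 on the denominator follows. It is an illustration under stated assumptions rather than Collins and Brooks's code: training examples are assumed to be `(attach, v, n1, p, n2)` tuples with `attach = 1` for noun attachment and `0` for verb attachment, and the fall-through default of 1.0 (noun attachment) when even the preposition is unseen is an assumption of this sketch. The function names and toy data are invented.

```python
from collections import Counter
from itertools import combinations

# Index tuples over (v, n1, p, n2); index 2 is the preposition.
LEVELS = [[c for c in combinations(range(4), k) if 2 in c] for k in (4, 3, 2, 1)]


def train(examples):
    """Count every preposition-bearing sub-tuple, overall and for noun attach."""
    total, noun = Counter(), Counter()
    for attach, *quad in examples:
        for level in LEVELS:
            for idx in level:
                key = (idx, tuple(quad[i] for i in idx))
                total[key] += 1
                noun[key] += attach
    return total, noun


def p_noun(quad, total, noun, default=1.0):
    """Estimate P(noun attach) from the most specific level with any data."""
    for level in LEVELS:
        keys = [(idx, tuple(quad[i] for i in idx)) for idx in level]
        denom = sum(total[k] for k in keys)
        if denom > 0:  # threshold of 0: trust even a single instance
            return sum(noun[k] for k in keys) / denom
    return default


toy = [
    (1, "eat", "pizza", "with", "anchovies"),
    (0, "eat", "pizza", "with", "fork"),
    (0, "eat", "soup", "with", "spoon"),
]
total, noun = train(toy)
```

A decision rule would then attach to the noun when the estimate is at least 0.5 and to the verb otherwise.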

Results • 84.1% correct without morphological analysis, 84.5% with • Quadruples are more accurate than triples, which are in turn more accurate than doubles, etc. • But only 148 quadruples in the test data, vs 764 triples, 1965 doubles, 216 singles

Comparison with Hindle and Rooth • We have 1924 test cases where H&R would have made a decision • The backoff method using just the |v,p| and |n1,p| counts (86.5%) outscores H&R style (82.1%).

Extra experiments • Setting the threshold to 5 reduces performance to 81.6% • Tuples that contain the preposition are the most effective.

Attaching Multiple PPs • Merlo, Crocker, Berthouzoz • For a single PP there are two structures, for 2 PPs there are 5, for 3 PPs 14 • so the problem is harder, a dumb algorithm will do poorly • Generalization of Collins/Brooks
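The counts 2, 5 and 14 coincide with consecutive Catalan numbers, which is what one gets when attachments are not allowed to cross. The Python sketch below enumerates the structures under that assumption; the site numbering (0 for the verb, 1 for the object noun, k + 1 for the noun inside the k-th PP) is a convention chosen here, not taken from the slides.

```python
from math import comb


def attachment_structures(n_pps):
    """Enumerate non-crossing attachments for V NP PP_1 ... PP_n.

    Each structure is a tuple giving the attachment site chosen by each PP.
    """
    results = []

    def extend(open_sites, chosen):
        if len(chosen) == n_pps:
            results.append(tuple(chosen))
            return
        new_site = len(chosen) + 2  # noun introduced by the PP being attached
        for pos, site in enumerate(open_sites):
            # Attaching to `site` closes every site that was opened after it.
            extend(open_sites[: pos + 1] + [new_site], chosen + [site])

    extend([0, 1], [])
    return results


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)
```

For two PPs the function returns five tuples, for example `(0, 0)` for both PPs attached to the verb and `(1, 2)` for the second PP attached inside the first.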

Five structures for V NP PP PP • Structure 1 (535 instances): The agency said it will [keep]v [the debt]np [under review]pp [for possible downgrade]pp • Structure 2 (1160 instances): Penney will [extend]v [[its involvement]np [with the service]pp]np [for at least five years]pp • Structure 3 (1394 instances): [address]v [[budget limits]np [on [credit allocations [for the Federal Housing agency]pp]np]pp]np • Structure 4 (1055 instances): [abandon] [the everyday pricing approach] [in the face of [the poor results]] • Structure 5 (539 instances): [answering] [questions [from members of Parliament]] [after his announcement]

Algorithm • Model of PP1 as Collins and Brooks, but excluding p2 • Model of 2PPs is back off over sextuples (i,v,n1,p1,n2,p2) until we get to tuples that don’t have p1, or that don’t have p2 • then Competitive Back off

Competitive Back off • Do standard back off for PP1 using v, n1, p1 • Do standard back off for PP2 using v, n2, p2 • Do back off for PP2 using n1 instead of n2 (i.e., v, n1, p2) • Combine these results using a simple procedure, with a tiebreak where they conflict.

Results • 1 PP (2 structures): 84.3%, baseline 61.2% (choose most frequent) • 2 PPs (5 structures): 69.6%, baseline 29.8% (choose most frequent) • 3 PPs (14 structures): 43.6%, baseline 18.5% (choose most frequent)
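A "choose most frequent" baseline is easy to compute once the class frequencies are known. As a hedged check in Python: if the numbers shown next to the five two-PP structures earlier (535, 1160, 1394, 1055, 539) are read as instance counts, which is an assumption, the most frequent structure accounts for about 29.8% of the cases, matching the two-PP baseline reported here.

```python
def most_frequent_baseline(class_counts):
    """Return (most frequent label, accuracy of always predicting it)."""
    total = sum(class_counts.values())
    label, count = max(class_counts.items(), key=lambda kv: kv[1])
    return label, count / total


# Numbers taken from the five-structure slides; treating them as
# instance counts is an assumption of this sketch.
two_pp_counts = {1: 535, 2: 1160, 3: 1394, 4: 1055, 5: 539}
label, accuracy = most_frequent_baseline(two_pp_counts)
```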

Results: take-home messages • Devise a baseline • Measure performance • Pick tasks where beating the baseline is impressive and useful

Ratnaparkhi (Coling 98) • 970K unannotated sentences of WSJ • tagger, simple chunker • heuristic extraction of unambiguous cases

Heuristic extraction • (v,p,n2) if • p is a real preposition (not “of”) • v is the first verb that occurs < K words to the left of p • v is not a form of the verb “to be” • no noun occurs between v and p • n2 is the first noun < K words to the right of p • no verb occurs between p and n2

Heuristic extraction 2 • (n,p,n2) if • p is a real preposition (not “of”) • n is the first noun that occurs < K words to the left of p • no verb occurs < K words to the left of p • n2 is the first noun < K words to the right of p • no verb occurs between p and n2
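The two extraction heuristics can be sketched in Python over a tagged sentence. This is an illustrative reading, not Ratnaparkhi's code: the input is assumed to be a list of `(word, tag)` pairs with hypothetical coarse tags `'N'`, `'V'` and `'P'`, `k` is treated as a window size, the list of forms of "to be" is ad hoc, and the noun rule is read as requiring that no verb occur in the window to the left of the preposition.

```python
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def extract_tuples(tagged, k=6):
    """Return ('v', v, p, n2) and ('n', n, p, n2) tuples for unambiguous cases."""
    out = []
    for i, (p, tag) in enumerate(tagged):
        if tag != "P" or p.lower() == "of":
            continue
        # n2: first noun in the window right of p, with no verb before it.
        n2 = None
        for word, t in tagged[i + 1 : i + 1 + k]:
            if t == "V":
                break
            if t == "N":
                n2 = word
                break
        if n2 is None:
            continue
        left = tagged[max(0, i - k) : i][::-1]  # nearest word first
        verbs = [j for j, (_, t) in enumerate(left) if t == "V"]
        nouns = [j for j, (_, t) in enumerate(left) if t == "N"]
        if verbs and (not nouns or nouns[0] > verbs[0]):
            # Nearest verb with no noun between it and p.
            v = left[verbs[0]][0]
            if v.lower() not in BE_FORMS:
                out.append(("v", v, p, n2))
        elif nouns and not verbs:
            # Nearest noun, and no verb anywhere in the left window.
            out.append(("n", left[nouns[0]][0], p, n2))
    return out
```

A verb followed by a noun and then a preposition is exactly the ambiguous configuration, so the function returns nothing for it.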

Accuracy of extraction • Noisy data (c. 69% correct) • But abundant

Evaluation • 81.91% with a back off technique • 81.85% with interpolation like H&R • Baseline for this data 70.39%

Portability • Moved to Spanish and got similar performance • H&R would have had to port Fidditch to Spanish

Where to get more information • Charniak ch 8. • Hindle and Rooth CL 19(1) pp 103-120, 1993 • Manning and Schütze, section 8.3 • Original papers
