
Non-compositional Dependencies and Parse Error Detection


Presentation Transcript


  1. Non-compositional Dependencies and Parse Error Detection Colin Cherry Chris Pinchak Ajit Singh

  2. Introduction • What is a non-compositional phrase (NCP)? • Why do we want to find them? • Why are they hard to detect? • Example • Mary exploded a bombshell … • Mary astonished …

  3. Non-compositional Phrase Detection • Components: Corpus, Collocation Database, Thesaurus • Output: Set of non-compositional dependency triples {(H R M)}
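
As a concrete reference for the filters that follow, here is a minimal Python sketch of how the (H R M) dependency triples and their corpus counts might be represented. The Triple type, the collocation_db name, and the frequencies are illustrative assumptions, not part of the original system.

    from collections import Counter, namedtuple

    # A dependency triple (H R M): head word, grammatical relation, modifier word.
    # Relation labels follow the notation on these slides (e.g. "V:obj:N").
    Triple = namedtuple("Triple", ["head", "rel", "mod"])

    # Hypothetical in-memory stand-in for the collocation database:
    # corpus frequencies of each triple (numbers are invented).
    collocation_db = Counter({
        Triple("explode", "V:obj:N", "bombshell"): 27,
        Triple("field", "N:mod:A", "level"): 350,
        Triple("angel", "N:subj:N", "hell"): 1,
    })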

  4. Problems with parse errors • Hell’s Angels → (angel N:subj:N hell) • Honky Tonk Moon – Randy Travis → (Moon V:obj:N Randy Travis) • Parse errors often labeled non-compositional (incorrectly) • Reinforcing errors if we add them to an idiom list • Makes idiom database unnecessarily large

  5. Methods for Filtering Parse Errors • The Basic Idea • Simple Filter • Hand-tuned Nemesis Lists • Nemesis Classes

  6. The Idea • Every triple contains a word pair and a relationship • Word pairs can participate in multiple relationships • Example: ‘level’ & ‘playing field’ • That really leveled the playing field • (level V:obj:N field) • Now we’re on a level playing field • (field N:mod:A level)

  7. The Idea continued... • Dekang’s software allows us to list all possible ways in which two words can be connected • Some relationships are easily confused • Example: • Hell’s Angels (The angels of hell) • Hell’s Angels (Hell is angels) • Can this help us automatically detect errors?
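
A small sketch of this idea, assuming the parser output is available as plain (head, relation, modifier) tuples: group the corpus triples by unordered word pair and count how often each relation links that pair. The helper name and the example counts are invented for illustration.

    from collections import defaultdict

    def relation_counts(triples):
        # Map each unordered word pair to a tally of the relations linking it.
        counts = defaultdict(lambda: defaultdict(int))
        for head, rel, mod in triples:
            counts[frozenset((head, mod))][rel] += 1
        return counts

    corpus_triples = [
        ("angel", "N:gen:N", "hell"),   # "Hell's Angels" parsed as a genitive
        ("angel", "N:gen:N", "hell"),
        ("angel", "N:subj:N", "hell"),  # the same pair mis-parsed as "Hell is angels"
    ]
    print(dict(relation_counts(corpus_triples)[frozenset(("angel", "hell"))]))
    # {'N:gen:N': 2, 'N:subj:N': 1}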

  8. Simple Filter • Strong (wrong) assumption: • Any two words can be used in exactly one relationship • That relationship will occur most frequently • The filter: • If the relationship present in the dependency isn’t the most frequently used relationship for the two words, label it a parse error
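
A rough sketch of the simple filter under that assumption: look up every relation observed for the word pair and flag the candidate when its relation is not the most frequent one. The lookup table and counts are invented; a real run would read them from the collocation database.

    def simple_filter(candidate, pair_relation_counts):
        head, rel, mod = candidate
        counts = pair_relation_counts.get(frozenset((head, mod)), {})
        if not counts:
            return False  # unseen pair: nothing to compare against, keep the triple
        most_frequent = max(counts, key=counts.get)
        return rel != most_frequent  # True = label as a parse error

    pair_relation_counts = {
        frozenset(("angel", "hell")): {"N:gen:N": 12, "N:subj:N": 1},
    }
    print(simple_filter(("angel", "N:subj:N", "hell"), pair_relation_counts))  # True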

  9. Nemesis Lists • Much weaker assumption: • For any relationship, there is a closed list of other relationships that it can be confused with • If something on this closed list occurs more frequently, label it as a parse error • List for N:subj:N → {N:gen:N} • If we see “Hell is angels”, it might really have been “Hell’s angels”
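
The same idea with nemesis lists, as a sketch: each relation carries a hand-built list of relations it is commonly confused with, and a candidate is labelled a parse error only when one of its nemeses is more frequent for that word pair. Only the N:subj:N → {N:gen:N} entry from this slide is filled in; the counts are illustrative.

    NEMESES = {
        "N:subj:N": ["N:gen:N"],  # "Hell is angels" is likely a mis-parse of "Hell's Angels"
    }

    def nemesis_filter(candidate, pair_relation_counts):
        head, rel, mod = candidate
        counts = pair_relation_counts.get(frozenset((head, mod)), {})
        own_count = counts.get(rel, 0)
        # Filter only if a known nemesis relation outnumbers the candidate's relation.
        return any(counts.get(n, 0) > own_count for n in NEMESES.get(rel, []))

    pair_relation_counts = {
        frozenset(("angel", "hell")): {"N:gen:N": 12, "N:subj:N": 1},
    }
    print(nemesis_filter(("angel", "N:subj:N", "hell"), pair_relation_counts))  # True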

  10. Nemesis Classes • Problems with lists: • They take too long to design by hand • Prone to human error / bias • Replace lists with broad classes, where all members of one class share the same nemeses (another class) • We used two classes, one of which had no nemesis (was never filtered)

  11. Our Classes • Class 1 – “Normal Relationships” • Subj, obj, modifier, noun-noun • Class 2 – “Preposition-like Relationships” • Of, for, to, in, under, over (anything not in 1) • Class 1 nemeses = Class 1 • They interfere with each other • Class 2 nemeses = {} • They’re passed automatically
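
A sketch of the class-based variant: a Class 1 relation is checked against every other Class 1 relation seen for the pair, while anything outside Class 1 (the preposition-like Class 2) is passed through untouched. The relation labels in CLASS_1 are assumed spellings of the subj/obj/modifier/noun-noun relations, the "N:of:N" label is hypothetical, and the counts are invented.

    CLASS_1 = {"N:subj:N", "V:obj:N", "N:mod:A", "N:nn:N"}  # assumed labels for the four Class 1 relations

    def class_filter(candidate, pair_relation_counts):
        head, rel, mod = candidate
        if rel not in CLASS_1:
            return False  # Class 2 (preposition-like): never filtered
        counts = pair_relation_counts.get(frozenset((head, mod)), {})
        own_count = counts.get(rel, 0)
        # Any other Class 1 relation that is more frequent for this pair acts as a nemesis.
        return any(c > own_count for r, c in counts.items() if r in CLASS_1 and r != rel)

    pair_relation_counts = {
        frozenset(("field", "level")): {"N:mod:A": 350, "V:obj:N": 41},
    }
    print(class_filter(("level", "V:obj:N", "field"), pair_relation_counts))  # True: outnumbered by N:mod:A
    print(class_filter(("angel", "N:of:N", "hell"), pair_relation_counts))    # False: preposition-like, passed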

  12. Experimental Design • 100 random idioms, as identified by Dekang’s method • Only one run, as 100 idioms take approx. 1.5 hours of manual parse examination • Selected after designing the filters • Metrics • Error Ratio = retained parse errors / total retained • Precision = filtered parse errors / total filtered • Recall = filtered parse errors / total parse errors
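
The three metrics written out as plain functions, for reference. The only number taken from the slides is the 4/13 best-case recall mentioned on slide 14; the remaining counts would come from the manual inspection of the 100 sampled triples.

    def error_ratio(retained_parse_errors, total_retained):
        return retained_parse_errors / total_retained

    def precision(filtered_parse_errors, total_filtered):
        return filtered_parse_errors / total_filtered

    def recall(filtered_parse_errors, total_parse_errors):
        return filtered_parse_errors / total_parse_errors

    print(round(recall(4, 13), 2))  # 0.31, the best recall reported on slide 14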

  13. Performance Comparison

  14. Future Work • The whole approach seems very limited in terms of recall • The best we could do was 4/13 errors found! • Need more extensive rules: • Confirming relationships • Syria is Lebanon's main power broker, ... • Syria, Lebanon's primary power broker, … • Might allow us to make other rules tougher

  15. More Future Work • Need a way to cut down on manual effort • Labeling is a pain, but manually selecting rules is worse • Machine learning methods: • Decision trees? • What about co-training?

  16. Conclusions • Nemesis lists provide a high-precision, low-recall method • Classes provide better recall at the cost of precision • A user who needed to avoid parse errors could choose the method most appropriate to his or her task
