
Number or Nuance: Factors Affecting Reliable Word Sense Annotation


Presentation Transcript


  1. Number or Nuance: Factors Affecting Reliable Word Sense Annotation
  Susan Windisch Brown, Travis Rood, and Martha Palmer
  University of Colorado at Boulder

  2. Annotators in their little nests agree; And ‘tis a shameful sight, When taggers on one project Fall out, and chide, and fight. —[adapted from] Isaac Watts

  3. Automatic word sense disambiguation
  • Lexical ambiguity is a significant problem in natural language processing (NLP) applications (Agirre & Edmonds, 2006)
    • Text summarization
    • Question answering
  • WSD systems might help
    • Several studies show benefits for NLP tasks (Sanderson, 2000; Stokoe, 2003; Carpuat and Wu, 2007; Chan, Ng and Chiang, 2007)
    • But only with higher system accuracy (90%+)

  4. Annotation reliability affects system accuracy

  5. Senses for the verb control

  6. Possible factors affecting the reliability of word sense annotation
  • Fine-grained senses result in many senses per word, creating a heavy cognitive load on annotators and making accurate, consistent tagging difficult
  • Fine-grained senses are not distinct enough for annotators to discriminate between reliably

  7. Requirements to compare fine-grained and coarse-grained annotation
  • Annotation of the same words on the same corpus instances
  • Sense inventories differing only in sense granularity
  • Previous work: Ng et al. (1999); Edmonds & Cotton (2001); Navigli et al. (2007)

  8. 3 experiments
  • 40 verbs
  • Number of senses: 2–26
  • Sense granularity: WordNet vs. OntoNotes
  • Exp. 1: confirm the difference in reliability between fine- and coarse-grained annotation; vary both granularity and number of senses
  • Exp. 2: hold granularity constant; vary the number of senses
  • Exp. 3: hold the number of senses constant; vary granularity

  9. Experiment 1
  • Compare a fine-grained sense inventory (WordNet) to a coarse-grained one (OntoNotes)
  • 70 instances of each verb from the OntoNotes (ON) corpus
  • Annotated with WordNet (WN) senses by multiple pairs of annotators
  • Annotated with ON senses by multiple pairs of annotators
  • Compare the ON inter-tagger agreement scores (ITAs) to the WN ITAs
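The slides do not define how ITA is computed; assuming it is the usual pairwise observed agreement, i.e. the share of instances that two annotators label with the same sense, a minimal sketch (all labels and numbers illustrative) looks like this:

```python
from typing import List

def pairwise_ita(tags_a: List[str], tags_b: List[str]) -> float:
    """Observed inter-tagger agreement: the fraction of instances
    that two annotators label with the same sense."""
    if len(tags_a) != len(tags_b):
        raise ValueError("annotators must tag the same instances")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)

# Hypothetical sense labels for five instances of one verb
annotator_1 = ["WN-1", "WN-3", "WN-1", "WN-7", "WN-1"]
annotator_2 = ["WN-1", "WN-4", "WN-1", "WN-7", "WN-1"]
print(pairwise_ita(annotator_1, annotator_2))  # 0.8
```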

  10. Results

  11. Results
  • Coarse-grained ON annotations had higher ITAs than fine-grained WN annotations
  • Number of senses: no significant effect (t(79) = -1.28, p = .206)
  • Sense nuance: a significant effect (t(79) = 10.39, p < .0001)
  • With the number of senses held constant, coarse-grained annotation is 16.2 percentage points higher than fine-grained
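The slide reports t statistics with 79 degrees of freedom but does not show the underlying test setup. Purely as an illustration of this kind of comparison, a paired test over hypothetical per-verb ITA scores under the two inventories could be run with scipy (the numbers below are invented, not the study's data):

```python
from scipy import stats

# Hypothetical per-verb ITA scores under each inventory (illustrative only;
# the real study used 40 verbs, and its exact statistical design is not on the slide).
on_itas = [0.91, 0.88, 0.95, 0.84, 0.90]   # coarse-grained OntoNotes
wn_itas = [0.72, 0.70, 0.83, 0.65, 0.76]   # fine-grained WordNet

# Paired t-test: the same verbs and instances tagged under both inventories
t_stat, p_value = stats.ttest_rel(on_itas, wn_itas)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```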

  12. Experiment 2: Number of senses
  • Hold sense granularity constant; vary the number of senses
  • Two pairs of annotators, both using fine-grained WN senses
  • The first pair uses the full set of WN senses for a word
  • The second pair uses a restricted set, on instances known in advance to fit one of those senses

  13. OntoNotes grouped senses and their member WN senses
  • OntoNotes grouped sense A: WN senses 1, 2, 4, 5, 6, 11, 12
  • OntoNotes grouped sense B: WN senses 3, 7, 8, 13, 14
  • OntoNotes grouped sense C: WN senses 9, 10

  14. "Then I just bought plywood, drew the pieces on it and cut them out." Full set of WN senses Restricted set of WN senses • 1. ---------------- • 2. ---------------- • 3. ---------------- • 4. ---------------- • 5. ---------------- • 6. ---------------- • 7. ---------------- • 8. ---------------- • 9. ---------------- • 10. ---------------- • 11. ---------------- • 12. ---------------- • 13. ---------------- • 14. ---------------- • 3. ---------------- • 7. ---------------- • 8. ---------------- • 13. ---------------- • 14. ----------------

  15. Results

  16. Experiment 3
  • Number of senses controlled; vary sense granularity
  • Compare the ITAs for the ON tagging with those for the restricted-set WN tagging

  17. Results

  18. Conclusion
  • Number of senses annotators must choose between: never a significant factor
  • Granularity of the senses: a significant factor, with fine-grained senses leading to lower ITAs
  • The poor reliability of fine-grained word sense annotation cannot be improved simply by reducing the cognitive load on annotators
  • Annotators cannot reliably discriminate between nuanced sense distinctions

  19. Acknowledgements We gratefully acknowledge the efforts of all of the annotators and the support of National Science Foundation grants NSF-0415923 (Word Sense Disambiguation), CISE-CRI-0551615 (Towards a Comprehensive Linguistic Annotation), and CISE-CRI-0709167, as well as a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, via a subcontract from BBN, Inc.

  20. Restricted-set annotation
  • Use the adjudicated ON data to determine the ON sense of each instance
  • Use instances from Experiment 1 that were labeled with one selected ON sense (35 instances)
  • Each restricted-set annotator saw only the WN senses that were clustered to form the appropriate ON sense
  • Compare to the full-set annotation for those instances
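A minimal sketch of the restricted-set selection described above, assuming each instance carries its adjudicated ON sense (the field names and helper function are hypothetical, not from the paper):

```python
from typing import Dict, List

def build_restricted_task(
    instances: List[dict],            # e.g. {"id": 17, "on_sense": "B", "text": "..."}
    target_on_sense: str,             # the one selected OntoNotes sense
    on_to_wn: Dict[str, List[int]],   # grouping from slide 13
) -> List[dict]:
    """Keep only instances whose adjudicated ON sense is the target, and
    attach just the WN senses that were clustered to form that ON sense."""
    menu = on_to_wn[target_on_sense]
    return [
        {"id": inst["id"], "wn_choices": menu}
        for inst in instances
        if inst["on_sense"] == target_on_sense
    ]
```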
