This paper evaluates Waspbench, a tool developed to aid lexicographers by incorporating word sense disambiguation (WSD). Lexicography plays a critical role in natural language processing (NLP), especially for machine translation (MT), where manual production of translation rules can be costly. The study examines the input/output dynamics of Waspbench, including the creation of word experts and the evaluation of MT performance using human subjects. Results indicate that WSD can enhance MT quality, and the paper suggests future directions for expanding Waspbench's capabilities.
Evaluating the Waspbench: A Lexicography Tool Incorporating Word Sense Disambiguation
Rob Koeling, Adam Kilgarriff, David Tugwell, Roger Evans
ITRI, University of Brighton
Credits: UK EPSRC grant WASPS, M34971
Word senses: nowhere truer
• Lexicography
  • the second hardest part
• NLP
  • Word sense disambiguation (WSD)
  • SENSEVAL-1 (1998): 77%, Hector senses
  • SENSEVAL-2 (2001): 64%, WordNet senses
• Machine Translation
  • Main cost is lexicography
Synergy
• The WASPBENCH
Inputs and outputs
• Inputs
  • Corpus (processed)
  • Lexicographic expertise
Inputs and outputs
• Outputs
  • Analysis of meaning/translation repertoire
  • Implemented: word expert
    • Can disambiguate
    • A "disambiguating dictionary"
Inputs and outputs
• MT needs rules of the form: in context C, S => T
• A major determinant of MT quality
• Manual production is expensive
  • Eng oil => Fr huile or pétrole?
  • SYSTRAN: 400 rules
• Waspbench output: thousands of rules
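The "in context C, S => T" rule shape above can be sketched in a few lines of code. This is a minimal illustration using the slide's "oil" example; the trigger words and the rule representation are assumptions for illustration, not the Waspbench's actual format.

```python
# Illustrative "in context C, S => T" translation rules.
# The trigger-word sets below are invented for the sketch.
RULES = {
    "oil": [
        # (context words that trigger the rule, target translation)
        ({"olive", "cooking", "vegetable"}, "huile"),
        ({"crude", "barrel", "drilling"}, "pétrole"),
    ],
}

def translate(source_word, context_words, default="huile"):
    """Pick a target for source_word from the first rule whose context matches."""
    for trigger, target in RULES.get(source_word, []):
        if trigger & set(context_words):
            return target
    return default

print(translate("oil", ["a", "barrel", "of", "crude", "oil"]))   # pétrole
print(translate("oil", ["olive", "oil", "for", "cooking"]))      # huile
```

A hand-built system like SYSTRAN encodes a few hundred such rules per language pair; the point of the Waspbench is to produce thousands of them semi-automatically.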
Evaluation hard
• Three communities
• No precedents
• The art and craft of lexicography
• MT personpower budgets
Five threads
• as WSD: SENSEVAL
• for lexicography: MED
• expert reports
• quantitative experiments with human subjects
  • India: within-group consistency
  • Leeds: comparison with commercial MT
Words
• Mid-frequency: 1,500–20,000 instances in the BNC
• At least two clearly distinct meanings
  • checked with reference to translations into French/German/Dutch
• 33 words: 16 nouns, 10 verbs, 7 adjectives
• Around 40 test instances per word
Human subjects
• Translation studies students, University of Leeds (thanks: Tony Hartley)
• Native/near-native in English and their other language
• Twelve people, working with:
  • Chinese (4), French (3), German (2), Italian (1), Japanese (2)
  • (no MT system for Japanese)
• Circa four days' work:
  • introduction/training
  • two days to create word experts
  • two days to evaluate output
Method
• Human1 creates word experts (average 30 mins/word)
• Computer uses the word experts to disambiguate test instances
• MT system (Babelfish via AltaVista) translates the same test instances
• Human2 evaluates computer and MT performance on each instance:
  • good / bad / unsure / preferred / alternative
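Human2's per-instance labels have to be aggregated into a comparable figure for the word expert and the MT system. A minimal sketch of one way to do that is below; which labels count as correct, and the treatment of "unsure", are assumptions for illustration, not the evaluation's actual scoring scheme.

```python
from collections import Counter

def accuracy(judgements, correct_labels=("good", "preferred")):
    """Fraction of instances judged acceptable, ignoring 'unsure' instances.

    Assumed mapping (illustrative): 'good' and 'preferred' count as
    correct; 'bad' and 'alternative' as incorrect; 'unsure' is dropped.
    """
    counts = Counter(judgements)
    scored = sum(n for label, n in counts.items() if label != "unsure")
    correct = sum(counts[label] for label in correct_labels)
    return correct / scored if scored else 0.0

labels = ["good", "good", "bad", "preferred", "unsure", "alternative"]
print(accuracy(labels))  # 3 correct out of 5 scored -> 0.6
```

Running the same function over the word-expert output and the Babelfish output for the same test instances gives the head-to-head comparison the experiment needs.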
Observations
• Grad-student users, 4-hour training
• 30 mins per (not-too-complex) word
• 'Fuzzy' words intrinsically harder
• No great inter-subject disparities (it's the words that vary, not the people)
Conclusion
• WSD can improve MT (using a tool like WASPS)
Future work
• Multiwords
• n > 2
• Thesaurus
• Other source languages
• New corpora, bigger corpora
• The web