Improving Machine Learning Approaches to Coreference Resolution
Improving Machine Learning Approaches to Coreference Resolution
Vincent Ng and Claire Cardie, Cornell University, ACL 2002
Slides prepared by Ralph Grishman
Goal
Improve on Soon et al. by:
• better preprocessing (chunking, names, …)
• better search procedure for the antecedent
• better selection of positive examples
• more features
• more features
• more features ...
Better search for antecedent
• Soon et al. use a decision tree as a binary classifier and take the nearest antecedent classified as +ve
• Ng & Cardie use the same sort of classifier, but count the +ve and -ve training examples at each leaf and use those counts to compute a probability
• Ng & Cardie then take the highest-ranking antecedent (if its probability > 0.5)
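A minimal Python sketch of the two selection strategies (not the authors' code; `classify` is a hypothetical callback standing in for the decision tree, assumed to return the leaf's fraction of positive training examples):

```python
def closest_first(anaphor, candidates, classify):
    """Soon et al.: scan candidates right-to-left (nearest first)
    and return the first one the binary classifier accepts."""
    for cand in reversed(candidates):      # candidates in document order
        if classify(anaphor, cand) > 0.5:
            return cand
    return None                            # no antecedent: new entity

def best_first(anaphor, candidates, classify):
    """Ng & Cardie: score every preceding candidate with the
    leaf-based probability and return the highest-scoring one,
    provided it clears the 0.5 threshold."""
    if not candidates:
        return None
    scored = [(classify(anaphor, cand), cand) for cand in candidates]
    prob, best = max(scored, key=lambda pc: pc[0])
    return best if prob > 0.5 else None
```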
Better choice of positive examples
• Soon et al. always use the most recent antecedent
• Ng & Cardie: if the anaphor is not a pronoun, they use the most recent antecedent that is also not a pronoun
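A sketch of that training-pair rule, assuming a minimal hypothetical `Mention` record with an `is_pronoun` flag (the slides do not specify the paper's data structures):

```python
from dataclasses import dataclass

@dataclass
class Mention:                  # hypothetical minimal mention record
    text: str
    is_pronoun: bool

def positive_example_antecedent(anaphor, earlier_mentions):
    """Pick the antecedent used as the positive training example.
    Pronouns keep the Soon et al. rule (nearest antecedent in the
    chain); non-pronominal anaphors skip pronominal antecedents and
    take the most recent non-pronoun."""
    for antecedent in reversed(earlier_mentions):  # most recent first
        if anaphor.is_pronoun or not antecedent.is_pronoun:
            return antecedent
    return None                                    # no usable antecedent
```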
More features #1
• Soon et al. have a single ‘same string’ feature
• Ng & Cardie split this into 3 separate features, for pronouns, nominals, and names
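A hedged sketch of the three-way split; the feature names PRO_STR / PN_STR / WORDS_STR follow the paper, while the mention-type flags and the determiner stripping are simplifications for illustration:

```python
DETERMINERS = {"a", "an", "the"}

def _strip_determiners(text):
    """Drop leading articles before comparing strings."""
    words = text.lower().split()
    while words and words[0] in DETERMINERS:
        words.pop(0)
    return " ".join(words)

def string_match_features(np1, np2):
    """One type-specific string-match feature per mention type.
    np1/np2 are assumed to carry .text, .is_pronoun, .is_name."""
    same = _strip_determiners(np1.text) == _strip_determiners(np2.text)
    return {
        "PRO_STR":   same and np1.is_pronoun and np2.is_pronoun,
        "PN_STR":    same and np1.is_name and np2.is_name,
        "WORDS_STR": same and not (np1.is_pronoun or np1.is_name
                                   or np2.is_pronoun or np2.is_name),
    }
```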
More features
Added 41 more features:
• lexical
• grammatical
• semantic
Lexical features (examples)
• non-empty overlap between the words of the two NPs
• the prenominal modifiers of one NP are a subset of the prenominal modifiers of the other
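A sketch of these two lexical tests over plain token lists (feature names here are illustrative, not the paper's):

```python
def lexical_features(np1_words, np2_words, np1_mods, np2_mods):
    """WORD_OVERLAP: the NPs share at least one word.
    MOD_SUBSET: one NP's prenominal modifiers are a subset of
    the other's."""
    overlap = bool(set(np1_words) & set(np2_words))
    m1, m2 = set(np1_mods), set(np2_mods)
    # Note: an NP with no prenominal modifiers trivially passes the
    # subset test; a stricter variant might require both sets non-empty.
    subset = m1 <= m2 or m2 <= m1
    return {"WORD_OVERLAP": overlap, "MOD_SUBSET": subset}
```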
Grammatical features (examples)
• the NPs form a predicate nominal construction
• one NP spans the other
• NP1 is a quoted string
• one of the NPs is a title
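Two of these tests, sketched under the assumption that each NP carries a `(start, end)` token span and its surface string:

```python
def grammatical_features(np1, np2):
    """SPAN: one NP's token span contains the other's.
    QUOTED: NP1 is enclosed in quotation marks.
    Feature names are illustrative."""
    def contains(a, b):                 # does span a contain span b?
        return a[0] <= b[0] and b[1] <= a[1]
    span = contains(np1.span, np2.span) or contains(np2.span, np1.span)
    quoted = (np1.text.startswith(('"', "\u201c"))
              and np1.text.endswith(('"', "\u201d")))
    return {"SPAN": span, "QUOTED": quoted}
```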
Semantic features (examples)
For nominals with different heads:
• a direct or indirect hypernym relation in WordNet
• the distance of the hypernym relation
• the sense number for the hypernym relation
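These can be approximated with NLTK's WordNet interface; the paper predates NLTK, so this is a reconstruction, and it makes the simplifying assumption that only the first noun sense of each head is used:

```python
from nltk.corpus import wordnet as wn   # requires the WordNet corpus

def semantic_features(head1, head2):
    """HYPERNYM: one head's synset is a (direct or indirect)
    hypernym of the other's. HYPERNYM_DIST: path distance between
    the two synsets when such a relation holds."""
    syns1 = wn.synsets(head1, pos=wn.NOUN)
    syns2 = wn.synsets(head2, pos=wn.NOUN)
    if not syns1 or not syns2:
        return {"HYPERNYM": False, "HYPERNYM_DIST": None}
    s1, s2 = syns1[0], syns2[0]         # first sense only: simplification
    ancestors1 = set(s1.closure(lambda s: s.hypernyms()))
    ancestors2 = set(s2.closure(lambda s: s.hypernyms()))
    related = s2 in ancestors1 or s1 in ancestors2
    dist = s1.shortest_path_distance(s2) if related else None
    return {"HYPERNYM": related, "HYPERNYM_DIST": dist}
```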
Selecting features
• The full feature set yielded very low precision on nominal anaphors
• overfitting: too many features for too little data
• So they (manually) eliminated many features that led to low precision (on the training data)
• note: no ‘development set’ separate from the training and test sets
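One way to mechanize what they did by hand is greedy backward elimination on training-set precision; `evaluate_precision` below is a hypothetical callback that retrains the classifier on a feature subset and returns its precision on nominal anaphors:

```python
def prune_features(all_features, evaluate_precision):
    """Greedy backward elimination driven by TRAINING-set precision.
    Ng & Cardie pruned manually; this loop is a sketch of the idea.
    As the slide warns, selecting on training data without a
    development set risks overfitting the selection itself."""
    kept = list(all_features)
    best = evaluate_precision(kept)
    improved = True
    while improved:
        improved = False
        for f in list(kept):
            trial = [g for g in kept if g != f]
            score = evaluate_precision(trial)
            if score > best:            # dropping f helps precision
                kept, best, improved = trial, score, True
                break                   # restart the scan after a removal
    return kept
```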