Learning Dutch noun phrase coreference resolution. Véronique Hoste and Walter Daelemans CNTS Language Technology group University of Antwerp CLIN 2004. Outline. Definition Data set Cross-validation: The effect of optimization The effect of skewedness Results on the test set
Learning Dutch noun phrase coreference resolution Véronique Hoste and Walter Daelemans CNTS Language Technology group University of Antwerp CLIN 2004
Outline
• Definition
• Data set
• Cross-validation:
  • The effect of optimization
  • The effect of skewedness
• Results on the test set
• Error analysis
• Conclusion
Definition (Hirst, 1981)
Anaphora is the device of making in discourse an abbreviated reference to some entity in the expectation that the perceiver will be able to disabbreviate the reference and thereby determine the identity of the entity.
• ANAPHOR: the abbreviated reference
• ANTECEDENT or REFERENT: the entity referred to
• RESOLUTION: determining the identity of that entity
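Example (KNACK-2002)
Zacarias Moussaoui, de eerste persoon die door het Amerikaanse gerecht aangeklaagd is voor de terreuraanvallen van 11 september, pleit onschuldig bij zijn eerste verschijning voor de rechtbank. De Fransman van Marokkaanse afkomst wordt ervan verdacht de ‘twintigste vliegtuigkaper’ te zijn die door omstandigheden (hij zat in een Amerikaanse cel) niet aan de kapingen kon deelnemen.
(English: Zacarias Moussaoui, the first person to be indicted by the American justice system for the terrorist attacks of 11 September, pleads not guilty at his first court appearance. The Frenchman of Moroccan descent is suspected of being the ‘twentieth hijacker’ who, due to circumstances (he was in an American cell), could not take part in the hijackings.)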
KNACK-2002
• New corpus annotated with coreferential relations between noun phrases
• Existing corpora for Dutch are small and only contain anaphoric relations for pronouns (op den Akker et al., 2002; Bouma, 2003)
• Articles from KNACK, a Flemish weekly magazine covering national and international current affairs
• 267 annotated texts, ca. 12,500 annotated NPs
• Experiments: random selection of 50 texts (25 for training, 25 for testing)
Which anaphora?
• Annotation: adaptation of the MUC guidelines (http://cnts.uia.ac.be/~hoste/manual_dutch.ps)
• Identity, bound, ISA (identity of sense) and modality relations, as opposed to part-whole relations such as: “If the gas tank is empty, you should refuel the car.”
• Between NPs:
  • Personal, possessive and demonstrative pronouns
  • Non-lexicalized reflexive pronouns
  • Names and named entities
  • Definite NPs
Approaches
• The field is still highly knowledge-based (constraints and preferences; centering and focusing theory), e.g. Lappin & Leass (1994), Baldwin (1996), Poesio et al. (2004)
• Recently: machine learning (C4.5, Ripper, maximum entropy), in which coreference resolution is defined as a classification task
E.g. De Verenigde Staten probeerden van [Pakistan en India] de belofte af te dwingen dat [ze] geen kernwapens zouden inzetten. (English: The United States tried to extract from [Pakistan and India] the promise that [they] would not use nuclear weapons.)
  [ze] - [de belofte]: not coreferential
  [ze] - [Pakistan en India]: coreferential
  [ze] - [De Verenigde Staten]: not coreferential
Preprocessing
Free text → Tokenization → POS tagging → NP chunking → NER → Relation finding → Instance construction
Positive and negative instances
• Per NP type (pronouns / proper nouns / common nouns)
• Positive: combination of the anaphor with each preceding element in the coreference chain
• Negative: combination of the anaphor with each preceding NP which is not part of the coreference chain (search scope: <= 20 sentences)
• Highly skewed class distribution:
  • positive: 6,457 instances
  • negative: 95,919 instances
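The instance construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `NP` class, the `build_instances` function and the rule that an NP only counts as an anaphor when an earlier NP shares its chain are assumptions made for the example. The 20-sentence scope is applied to negative pairs only, following the wording of the slide.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NP:
    text: str
    sent: int             # index of the sentence containing the NP
    chain: Optional[int]  # coreference chain id; None if the NP is in no chain


def build_instances(nps: List[NP], max_sent_dist: int = 20) -> List[Tuple[str, str, bool]]:
    """Pair each anaphor with its preceding NPs and label each pair."""
    instances = []
    for j, anaphor in enumerate(nps):
        preceding = nps[:j]
        # Assumption: an NP is an anaphor only if an earlier NP shares its chain.
        if anaphor.chain is None or all(c.chain != anaphor.chain for c in preceding):
            continue
        for cand in preceding:
            if cand.chain == anaphor.chain:
                instances.append((cand.text, anaphor.text, True))    # positive
            elif anaphor.sent - cand.sent <= max_sent_dist:
                instances.append((cand.text, anaphor.text, False))   # negative
    return instances


# NPs of the Pakistan/India example sentence, in textual order.
nps = [
    NP("De Verenigde Staten", 0, None),
    NP("Pakistan en India", 0, 1),
    NP("de belofte", 0, None),
    NP("ze", 0, 1),
]
pairs = build_instances(nps)
```

For the single anaphor `ze`, this yields one positive pair and two negative pairs, which is the same imbalance, at small scale, as the skewed class distribution reported above.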
Information sources
• Positional features (e.g. dist_sent, dist_NP)
• Local context features
• Morphological and lexical features (e.g. i/j/ij-pron, j_demon, j_def, i/j/ij-proper, num_agree)
• Syntactic features (e.g. i/j/ij_SBJ/OBJ/PREDC, appositive)
• String-matching features (comp_match, part_match, alias, same_head)
• Semantic features (synonym, hyperonym, same_NE, (linguistic) gender of antecedent and anaphor)
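As a rough illustration of the string-matching features, the sketch below computes two of them for a pair of NPs. The feature names `comp_match` and `part_match` come from the slide; their exact definitions are not given there, so the ones used here (identical token sequences, and at least one shared token, after dropping Dutch articles) are assumptions, as is the small `DETERMINERS` stoplist.

```python
# Assumed stoplist of Dutch articles, ignored when comparing NPs.
DETERMINERS = {"de", "het", "een"}


def tokens(np_text):
    """Lowercase the NP, split on whitespace and drop articles."""
    return [t for t in np_text.lower().split() if t not in DETERMINERS]


def string_features(antecedent, anaphor):
    """Return hypothetical versions of two string-matching features."""
    a, b = tokens(antecedent), tokens(anaphor)
    return {
        "comp_match": a == b,                # complete match (assumed definition)
        "part_match": bool(set(a) & set(b)), # partial match: any shared token
    }


feats = string_features("Zacarias Moussaoui", "De moeder van Moussaoui")
```

Under these assumed definitions, `part_match` fires for "Zacarias Moussaoui" and "De moeder van Moussaoui" although the two NPs refer to different people, which shows how a surface feature of this kind can mislead a classifier.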
Algorithms compared
• Ripper (Cohen, 1995)
  • Rule induction
  • Algorithm parameters: different class ordering principles; negative conditions or not; loss ratio values; cover parameter values
• TiMBL
  • Memory-based learning
  • Algorithm parameters: IB1, IGTree; overlap, MVDM; 5 feature weighting methods; 4 distance weighting methods; different values of k
Two-step procedure
• First step: cross-validation
  • Application of TiMBL and Ripper on the training set; 10-fold CV
  • Extensive feature selection and parameter optimization using a genetic algorithm
  • Undersampling of the negative class
  • Evaluation: accuracy, precision, recall, F-beta
• Second step: testing
  • Training of TiMBL and Ripper on the training set; testing on the test set
  • Reconstruction of coreference chains
  • Evaluation using MUC scoring software
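The evaluation measures can be illustrated with a short sketch. The function name `prf` is hypothetical; the F-beta formula is the standard one, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The example uses the class counts reported earlier (6,457 positive, 95,919 negative) to show why accuracy alone is uninformative on such skewed data: a classifier that labels every pair as negative is right about 93.7% of the time while finding no coreferential pair at all.

```python
def prf(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Class distribution from the slides.
n_pos, n_neg = 6457, 95919

# An "always negative" classifier: no true or false positives, all positives missed.
accuracy = n_neg / (n_pos + n_neg)
precision, recall, f1 = prf(tp=0, fp=0, fn=n_pos)
```

This is one motivation for reporting precision, recall and F-beta on the positive class, and for undersampling the negative class during training.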
GA optimization
[Figure: genetic algorithm loop. Initial population → population of candidate solutions → evaluation based on fitness → selection → generate new population using crossover and mutation; output: best individual. Each individual encodes the features (values: 0, 1, 2) and the algorithm parameters (feature weighting: 0, 1, 2, 3, 4; neighbour weighting: 0, 1, 2, 3; k). Example individual: 0 1 0 1 2 0 2 1 0 2 0 0 2 1 0 2 2 0 3 2 2.0288721872]
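The loop in the figure can be sketched as a small genetic algorithm over integer-valued individuals. Everything below is an illustrative assumption rather than the setup used in the experiments: the number of genes, truncation selection, one-point crossover, the mutation rate, elitism, and above all `toy_fitness`, which stands in for what would in practice be a cross-validated score of the classifier trained with the encoded features and parameters. The real-valued gene for k is left out for brevity.

```python
import random

# Six feature genes (values 0, 1, 2), then feature weighting (0-4)
# and neighbour weighting (0-3). Gene counts are illustrative only.
GENE_RANGES = [3] * 6 + [5, 4]


def random_individual(rng):
    return [rng.randrange(n) for n in GENE_RANGES]


def crossover(a, b, rng):
    point = rng.randrange(1, len(a))   # one-point crossover
    return a[:point] + b[point:]


def mutate(ind, rng, rate=0.1):
    return [rng.randrange(n) if rng.random() < rate else g
            for g, n in zip(ind, GENE_RANGES)]


def evolve(fitness, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]            # truncation selection
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [ranked[0]] + children          # elitism: keep the best so far
    return max(population, key=fitness)


# Toy fitness: number of genes agreeing with an arbitrary target setting.
TARGET = [1, 0, 2, 1, 0, 2, 3, 2]


def toy_fitness(ind):
    return sum(g == t for g, t in zip(ind, TARGET))


best = evolve(toy_fitness)
```

Because the best individual is carried over unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.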
Cross-validation results
[Table: default settings vs. GA optimization]
Testing
• Application of the optimized classifiers on the held-out test set
• Antecedent selection: 1 antecedent per anaphor; some basic heuristics select the most likely antecedent among the positive instances
• New evaluation procedure using the MUC scoring software: evaluation of the equivalence classes (transitive closure of a coreference chain)
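Once each anaphor has been linked to one antecedent, the equivalence classes mentioned above follow from the transitive closure of those links. The sketch below computes them with a union-find structure; the function name `chains_from_links` and the mention identifiers `m1` to `m5` are hypothetical, and this is not the MUC scoring software itself, only the chain reconstruction step that precedes scoring.

```python
def chains_from_links(links):
    """Group mentions into chains from (anaphor, antecedent) links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for anaphor, antecedent in links:
        root_a, root_b = find(anaphor), find(antecedent)
        if root_a != root_b:
            parent[root_a] = root_b        # merge the two classes

    groups = {}
    for mention in list(parent):
        groups.setdefault(find(mention), set()).add(mention)
    return sorted(sorted(group) for group in groups.values())


# m2 -> m1 and m3 -> m2 form one chain; m5 -> m4 forms another.
links = [("m2", "m1"), ("m3", "m2"), ("m5", "m4")]
chains = chains_from_links(links)
```

Here `m3` ends up in the same chain as `m1` even though the two were never linked directly, which is what the transitive closure provides.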
2 baselines
• Baseline I: link every NP to its immediately preceding NP
• Baseline II: application of simple rules, viz. (i) select the closest NP with the same gender and number, (ii) select the closest antecedent which matches the anaphor
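Baseline I and rule (i) of Baseline II can be sketched as follows. The `Mention` class, the function names and the hand-assigned gender and number values (with `"-"` as a placeholder gender for plural NPs) are assumptions for illustration; how these attributes were actually obtained is not described on the slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Mention:
    text: str
    gender: str   # hand-assigned placeholder value
    number: str   # "sg" or "pl"


def baseline_one(mentions: List[Mention]) -> List[Tuple[int, int]]:
    """Link every NP to its immediately preceding NP."""
    return [(i, i - 1) for i in range(1, len(mentions))]


def baseline_two_agreement(mentions: List[Mention]) -> List[Tuple[int, int]]:
    """Link every NP to the closest preceding NP with the same gender and number."""
    links = []
    for i, anaphor in enumerate(mentions):
        for j in range(i - 1, -1, -1):     # scan backwards, closest first
            cand = mentions[j]
            if cand.gender == anaphor.gender and cand.number == anaphor.number:
                links.append((i, j))
                break
    return links


mentions = [
    Mention("De Verenigde Staten", "-", "pl"),
    Mention("Pakistan en India", "-", "pl"),
    Mention("de belofte", "f", "sg"),
    Mention("ze", "-", "pl"),
]
links_one = baseline_one(mentions)
links_two = baseline_two_agreement(mentions)
```

With these placeholder attributes, Baseline I links `ze` to `de belofte`, whereas the agreement rule skips the singular NP and links `ze` to `Pakistan en India`.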
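Error analysis
• POS tagging / chunking errors and inconsistencies:
  • De moeder van Moussaoui gaf een persconferentie waarin ze om een eerlijk proces vroeg. (English: Moussaoui's mother gave a press conference in which she asked for a fair trial.)
  • In de opiniepeilingen liggen Jospin en Chirac zij aan zij. (English: In the opinion polls, Jospin and Chirac are side by side.)
  • Zacarias Moussaoui, de eerste persoon (…) De moeder van Moussaoui (…) (English: Zacarias Moussaoui, the first person (…) Moussaoui's mother (…))
• Low informativeness of some feature vectors, e.g. linguistic gender vs. real gender, erroneous apposition recognition:
  • Zij stelden dat het moeilijk zou zijn om de studie te dupliceren. Waarmee werd gezegd dat ze niet wetenschappelijk verantwoord was uitgevoerd. (English: They stated that it would be difficult to replicate the study. By which it was meant that it had not been carried out in a scientifically sound manner.)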
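Error analysis (ctd.)
• Limited synonym recognition:
  • Donderdag gaven Stevaert en Picque elkaar de schuld voor het disfunctioneren van twee onbemande camera’s. Picque - bevoegd voor de erkenning van de flitspalen (…) (English: On Thursday, Stevaert and Picque blamed each other for the malfunctioning of two unmanned cameras. Picque, responsible for the approval of the speed cameras (…))
• No recognition of hyponyms:
  • Zacarias Moussaoui is aangeklaagd voor de terreuraanvallen van 11 september. Hij kon door omstandigheden niet aan de kapingen deelnemen. (English: Zacarias Moussaoui has been indicted for the terrorist attacks of 11 September. Due to circumstances, he could not take part in the hijackings.)
• … no world knowledge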
Conclusion
• First system for Dutch noun phrase coreference resolution
• Approach works for English (results among state of the art)
• Substantial room for improvement
• Future work:
  • restart preprocessing
  • use web for synonym, hyp(er)onym and collocation search