Anaphor Resolution in Norwegian
Gordana Ilic Holen
Institut for lingvistiske fag, Det historisk-filosofiske fakultet, Universitetet i Oslo
g.i.holen@hfstud.uio.no
Some technical data
• Hovedfagsoppgave (Norwegian graduate thesis; incl. obligatory courses, a 4-semester project)
• Aim: to build a system for resolving pronominal anaphors in Norwegian
• Mentor: Janne Bondi Johannessen
• Implementation in Lisp (CLOS)
• To be finished by Christmas 2003
Fefor
Where did it start?
• Martin Hassel, 2000
  • Built an AR system for the Swedish pronouns han/honom/hans and hon/henne/hennes
• Differences
  • Planning to cover more pronouns
  • A different theoretical background
The Top List
• Han/ham/hans and hun/henne/hennes
  • Among the most used; not ambiguous
• Seg and selv
  • Syntactic solutions
• Den
  • Ambiguous with the determinative den (gule bilen), 'the yellow car'
The Top Wish List
• De
  • Ambiguous with the determinative de (gule bilene), 'the yellow cars'
  • Problems delimiting the antecedent
• Det
  • Problems in deciding whether det is pronominal
    • det (gule huset), 'the yellow house'
    • det (regner), 'it is raining'
Approach
• To be based on Mitkov's Anaphora Resolution System, MARS (Mitkov 1996, 1998)
• And partially on the Resolution of Anaphora Procedure, RAP (Lappin & Leass 1994)
Why MARS and RAP
• Both made for English
• MARS: intuitive, fully automated
• RAP: high precision
• Flexible
MARS
• No parsing
• The AR module uses a list of preferences called antecedent indicators
  • Boosting
  • Impeding
• Fully automatic, not very high precision (60-61%)
MARS: The algorithm
• The text is POS-tagged.
• NPs are extracted by an NP extractor.
• NPs that precede the anaphor (within a two-sentence scope) are located.
• Gender and number constraints are applied.
• Antecedent indicators are applied to the candidates that agree with the anaphor in gender and number. Each indicator assigns a score of 2, 1, 0 or -1.
• The NP with the highest total score is proposed as the antecedent.
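The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mitkov's implementation: the `NP` record, its tag values, the reading of "two-sentence scope" as a distance of at most two sentences, and the tie-break in favour of the most recent candidate are all hypothetical choices made for the example. Tagging, NP extraction and indicator scoring are assumed to have happened already.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NP:
    """A noun phrase (or pronoun) with the features the filter needs.

    All field names and tag values are hypothetical, chosen for this sketch.
    """
    text: str
    sentence: int   # index of the sentence containing the phrase
    position: int   # running token offset, used only for ordering
    gender: str     # e.g. "masc", "fem", "neut"
    number: str     # "sg" or "pl"
    score: int = 0  # sum of antecedent-indicator scores, computed elsewhere


def resolve(anaphor: NP, nps: List[NP], scope: int = 2) -> Optional[NP]:
    """Return the best-scoring preceding NP that agrees with the anaphor."""
    candidates = [
        cand for cand in nps
        if cand.position < anaphor.position              # must precede the anaphor
        and anaphor.sentence - cand.sentence <= scope    # assumed reading of the scope
        and cand.gender == anaphor.gender                # gender constraint
        and cand.number == anaphor.number                # number constraint
    ]
    if not candidates:
        return None
    # Highest score wins; ties go to the most recent candidate (an assumption).
    return max(candidates, key=lambda cand: (cand.score, cand.position))


# "Per møtte Kari. Hun smilte." with made-up indicator scores.
nps = [
    NP("Per", sentence=0, position=0, gender="masc", number="sg", score=2),
    NP("Kari", sentence=0, position=2, gender="fem", number="sg", score=1),
]
hun = NP("hun", sentence=1, position=3, gender="fem", number="sg")
antecedent = resolve(hun, nps)
```

Here `Per` has the higher score but is removed by the gender constraint, so `Kari` is proposed.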
MARS: Antecedent indicators (boosting)
• First noun phrases +1
• Indicating verbs +1
• Lexical reiteration +2 / +1
• Section heading preference +1
• Collocation match +2
• Immediate reference +2
• Sequential instructions +2
• Term preference +2
MARS: Antecedent indicators (impeding)
• Indefiniteness -1
• Prepositional NPs -1
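As a rough illustration of how the boosting and impeding indicators combine, the sketch below sums the weights from the slides for whichever indicators fire on a candidate. The dictionary keys are hypothetical labels invented for this example. The slides give lexical reiteration as +2 / +1 without stating the condition for each level, so the two levels appear here as separate, assumed keys. Detecting when an indicator fires is not shown.

```python
# Weights copied from the slides; key names are hypothetical labels.
BOOSTING = {
    "first_np": 1,
    "indicating_verb": 1,
    "lexical_reiteration_high": 2,  # the "+2" level (condition not given on the slide)
    "lexical_reiteration_low": 1,   # the "+1" level (condition not given on the slide)
    "section_heading": 1,
    "collocation_match": 2,
    "immediate_reference": 2,
    "sequential_instructions": 2,
    "term_preference": 2,
}
IMPEDING = {
    "indefinite": -1,
    "prepositional_np": -1,
}
WEIGHTS = {**BOOSTING, **IMPEDING}


def indicator_score(fired):
    """Sum the weights of the indicators that fired for one candidate."""
    return sum(WEIGHTS[name] for name in set(fired))


# A sentence-initial NP that matches a collocation but is indefinite: 1 + 2 - 1.
example_score = indicator_score(["first_np", "collocation_match", "indefinite"])
```

Keeping the weights in a plain table makes it easy to experiment with different values when adjusting the indicators for another language.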
RAP
• A high-precision system (86% of anaphors correctly resolved)
• Originally based on parsed text, but a version without parsing exists (Kennedy and Boguraev, 1996)
• The AR module: salience weighting
RAP: Salience weighting
• Salience factors:
  • Sentence recency 100
  • Subject emphasis 80
  • Head noun emphasis 80
  • Existential emphasis 70
  • Accusative emphasis 50
  • Non-adverbial emphasis 50
  • Indirect object and oblique complement emphasis 40
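A candidate's salience can be illustrated as the sum of the weights of the factors that apply to it. The sketch below is a simplification under stated assumptions: the factor keys and the `most_salient` helper are hypothetical names, the factors are assumed to be identified already, and any change in salience as the text moves on to later sentences is left out.

```python
# Weights copied from the slide; key names are hypothetical labels.
SALIENCE = {
    "sentence_recency": 100,
    "subject": 80,
    "head_noun": 80,
    "existential": 70,
    "accusative": 50,
    "non_adverbial": 50,
    "indirect_object_or_oblique": 40,
}


def salience(factors):
    """Sum the weights of the salience factors that apply to one candidate."""
    return sum(SALIENCE[name] for name in set(factors))


def most_salient(candidates):
    """Pick the candidate with the highest salience.

    `candidates` maps a candidate's text to the factor names that apply to it.
    """
    return max(candidates, key=lambda text: salience(candidates[text]))


# "Kari leste boka." (Kari read the book): subject vs. direct object.
candidates = {
    "Kari": ["sentence_recency", "subject", "head_noun", "non_adverbial"],
    "boka": ["sentence_recency", "accusative", "head_noun", "non_adverbial"],
}
best = most_salient(candidates)
```

With these weights the subject scores 310 and the direct object 280, so the subject is preferred.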
Modifications
• Both systems exist in versions with and without parsing, so the question of whether to parse is left open.
• Start by using the Oslo Corpus for training and adjustment:
  • Experiment with the antecedent indicators and adjust them for Norwegian
  • Try to combine them with RAP's salience factors
Open for suggestions
g.i.holen@hfstud.uio.no