Dealing with Italian Temporal Expressions: the ITA-Chronos System
Matteo Negri, Fondazione Bruno Kessler - IRST, Trento - Italy, negri@itc.it
EVALITA 2007 - Evaluation of NLP Tools for Italian, Rome - Italy, September 10, 2007
Outline
• Chronos: a multilingual system for the recognition and normalization of temporal expressions (TEs)
• System description
• Some examples
• Results at EVALITA 2007
Chronos
• Multilingual (Italian/English) tool for TE recognition and normalization according to the TIMEX2 standard
• Approach
  • Rule-based system
  • ENG-Chronos: 1500 rules
  • ITA-Chronos: 981 rules
  • Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization
• ENG-Chronos participated in TERN-04 with good results on the “Recognition+Normalization” task
  • Ranked 2nd, with a 76% TERN-Value (best system: 78%)
ITA-Chronos: System Architecture
(Architecture diagram: processing flow from Plain Text to Tagged Text)
• Input: Plain Text
• Preprocessing: Tokenization, POS Tagging, Multiwords Recognition
• Detection and Bracketing: Basic Tagging Rules (Detection), Composition Rules (Bracketing)
• Information Gathering: Tagging Rules for SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext; the result is the Intermediate Annotation
• Anchors Selection
• Normalization: Dates Normalization, Attributes Normalization
• Output: Tagged Text
STEP1: Preprocessing
• The first phase performs:
  • Tokenization
  • POS tagging
  • Multiword recognition
• The preprocessed text is then passed to the TE detection phase, where around 400 tagging rules are in charge of finding all the TEs it contains.
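As a rough illustration of the multiword step, the sketch below merges known multiword units into single tokens by greedy longest match. The lexicon `MULTIWORDS`, the regex tokenizer and the `_` joining convention are hypothetical: the slides do not say how ITA-Chronos tokenizes text or which multiwords it recognizes, and POS tagging is left out entirely.

```python
import re

# Hypothetical multiword lexicon (lowercased word tuples); not ITA-Chronos's real one.
MULTIWORDS = {
    ("fine", "settimana"),        # "weekend"
    ("a", "partire", "da"),       # "starting from"
    ("nel", "frattempo"),         # "in the meantime"
}
MAX_LEN = max(len(mw) for mw in MULTIWORDS)


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: runs of word characters (incl. accented letters) or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def join_multiwords(tokens: list[str]) -> list[str]:
    """Greedily replace the longest known multiword starting at each position with one token."""
    lowered = [t.lower() for t in tokens]
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to length 2.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in MULTIWORDS:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no multiword starts here: keep the single token
            out.append(tokens[i])
            i += 1
    return out
```

For instance, on “Ci vediamo il fine settimana, a partire da sabato.” the output contains `fine_settimana` and `a_partire_da` as single tokens, so later rules could treat “fine settimana” (weekend) as one trigger.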
STEP2: Detection
• Markable expressions are detected by looking for lexical triggers in the input text
  • e.g. “anno” (year), “oggi” (today), “Venerdì” (Friday), “Natale” (Christmas), “quotidianamente” (daily), “10/09/2007”, “1982”
• Basic Tagging Rules
  • Regular expressions checking for word senses, parts of speech, symbols, or words satisfying specific predicates, e.g.:
    • “E” = preposition
    • “N” = numeral
    • TimeUnit-p, satisfied by “secondo”, “minuto”, “ora”, “giorno”, “settimana”, “mese”, etc.
(Figure: a tagging rule matching “Fra tre giorni”, i.e. “in three days”)
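A basic tagging rule can be pictured as a sequence of token-level tests. The sketch below encodes the “Fra tre giorni” rule as preposition (“E”) + numeral (“N”) + a word satisfying TimeUnit-p. It is a simplified, fixed-length stand-in for the regular expressions described on the slide; the `(word, tag)` token format, the `TIME_UNITS` list (which adds plural forms), and the tags “S” and “V” used in the example are assumptions, not ITA-Chronos's actual rule syntax or tagset.

```python
from typing import Callable

Token = tuple[str, str]              # (word, POS tag)
Predicate = Callable[[Token], bool]

# Hypothetical lexicon for TimeUnit-p, singular and plural forms.
# Note: "ora" is ambiguous ("hour" / "now"); a real rule would need word-sense checks.
TIME_UNITS = {
    "secondo", "secondi", "minuto", "minuti", "ora", "ore", "giorno", "giorni",
    "settimana", "settimane", "mese", "mesi", "anno", "anni",
}


def pos_is(tag: str) -> Predicate:
    """Predicate satisfied by tokens carrying the given POS tag."""
    return lambda tok: tok[1] == tag


def time_unit_p(tok: Token) -> bool:
    """TimeUnit-p: satisfied by time-unit words such as "giorno" or "settimana"."""
    return tok[0].lower() in TIME_UNITS


# Hypothetical rule: preposition + numeral + time unit, e.g. "Fra tre giorni".
RULE_E_N_UNIT: list[Predicate] = [pos_is("E"), pos_is("N"), time_unit_p]


def apply_rule(rule: list[Predicate], tokens: list[Token]) -> list[tuple[int, int]]:
    """Return (start, end) token spans, end exclusive, where each predicate matches in order."""
    n = len(rule)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if all(pred(tok) for pred, tok in zip(rule, tokens[i:i + n]))
    ]
```

On `[("Fra", "E"), ("tre", "N"), ("giorni", "S"), ("partiamo", "V")]` the rule yields the single span `(0, 3)`, i.e. “Fra tre giorni”.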
STEP3: Bracketing
• Considers the context surrounding the detected triggers
  • e.g. “inizio” (beginning), “fine” (end), “prima” (before), “dopo” (after), “fa” (ago), “successivo” (following), “precedente” (previous), “durante” (during), “circa” (about), “almeno” (at least), “3”, “sesto” (sixth)
• Composition rules
  • Handle conflicts between multiple possible taggings (e.g. when a recognized TE contains, overlaps, or is adjacent to one or more other detected TEs)
(Figure: composition rule for handling inclusions; the partial TEs “Tutta la notte”, “la notte”, “la notte di sabato” and “sabato” are combined into the single TE “Tutta la notte di sabato”, i.e. “all Saturday night”)
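The inclusion case in the figure can be approximated by an interval-merging step: candidate TE spans (token offsets, end exclusive) that share at least one token are fused into their union. This is a minimal sketch of one possible composition rule, not the system's actual rule set; ITA-Chronos may treat overlap and adjacency differently, and here merely adjacent spans are deliberately left separate.

```python
def compose(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge candidate TE spans that nest or overlap; adjacent spans stay separate."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:      # shares at least one token with the last span
            last_start, last_end = merged[-1]
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged


tokens = "Tutta la notte di sabato".split()
# "Tutta la notte", "la notte", "la notte di sabato", "sabato"
candidates = [(0, 3), (1, 3), (1, 5), (4, 5)]
result = compose(candidates)
```

Here `result` is `[(0, 5)]`, which covers the whole expression “Tutta la notte di sabato”.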
STEP4: Information Gathering
• Goal: mine the information relevant for normalization
• Triggers and their context are used to assign values to
  • TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
  • TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)
• This is done by running separate sets of specialized tagging rules
• The gathered information is stored in the Intermediate Annotation, which is the input to the normalization component
Information Gathering: Example
• TIMEX2 attributes
  • MOD: “più di” (more than), “circa” (about), “oltre” (over) …
  • SET: “ogni” (every), “tutti” (all) …
  • ANCHOR_DIR: “prima” (before), “durante” (during), “dopo” (after) …
• TEMPORARY attributes
  • type: [T-ABS | T-REL] (absolute vs. relative TE)
  • t-cat: [second, minute, hour, day, …] (granularity)
  • op: [=, +, -]
  • quant: [n ≥ 0]
  • heur: [CR-DATE | PR-DATE] (anchoring heuristic)
Worked example: the detected TE “oltre tre anni dopo” (“more than three years later”) receives
• MOD = MORE_THAN (trigger “oltre”)
• ANCHOR_DIR = ENDING (trigger “dopo”)
• type = T-REL
• t-cat = YEAR (“anni”)
• op = + (“dopo”)
• quant = 3 (“tre”)
• heur = PR-DATE
Intermediate Annotation: Example (document adige20041007_id413938)
• Plain Text: “…Così il 31 Luglio del 2002, quindi oltre tre anni dopo l’incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo…” (“…So on 31 July 2002, thus more than three years after the accident, the young man was admitted to hospital again and underwent an operation that would prove decisive…”)
• After Detection and Bracketing, the Intermediate Annotation reads:
…quindi <TIMEX2 MOD="MORE_THAN" ANCHOR_DIR="ENDING" type="T-REL" t-cat="YEAR" op="+" quant="3" heur="PR-DATE">oltre tre anni dopo</TIMEX2> l’incidente…
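The information-gathering step that produces this annotation can be sketched as trigger lookups over the words of a detected TE. The trigger tables, the `gather` and `annotate` helpers, and the `op`/`heur` values assigned per trigger (e.g. “fa”, “ago”, mapped to `-` and CR-DATE) are hypothetical; they cover only enough triggers to reproduce the “oltre tre anni dopo” example, whereas the real system uses separate sets of tagging rules.

```python
# Hypothetical trigger tables: small illustrative subsets, not ITA-Chronos's rules.
MOD = {"oltre": "MORE_THAN", "circa": "APPROX"}
ANCHOR_DIR = {"dopo": "ENDING"}
T_CAT = {"anno": "YEAR", "anni": "YEAR", "mese": "MONTH", "mesi": "MONTH",
         "giorno": "DAY", "giorni": "DAY"}
NUMBERS = {"un": 1, "uno": 1, "due": 2, "tre": 3, "quattro": 4, "cinque": 5}
# Relational triggers -> (op, heur); assumed values for illustration.
REL = {"dopo": ("+", "PR-DATE"), "prima": ("-", "PR-DATE"), "fa": ("-", "CR-DATE")}

# Attribute order used when rendering the intermediate annotation.
ORDER = ["MOD", "ANCHOR_DIR", "type", "t-cat", "op", "quant", "heur"]


def gather(te: str) -> dict[str, str]:
    """Collect TIMEX2 and temporary attributes from the words of a detected TE."""
    attrs: dict[str, str] = {}
    for word in te.lower().split():
        if word in MOD:
            attrs["MOD"] = MOD[word]
        if word in ANCHOR_DIR:
            attrs["ANCHOR_DIR"] = ANCHOR_DIR[word]
        if word in T_CAT:
            attrs["t-cat"] = T_CAT[word]
        if word in NUMBERS:
            attrs["quant"] = str(NUMBERS[word])
        elif word.isdigit():
            attrs["quant"] = word
        if word in REL:
            attrs["op"], attrs["heur"] = REL[word]
    # A TE with an arithmetic operator needs an anchor, so it is relative.
    attrs["type"] = "T-REL" if "op" in attrs else "T-ABS"
    return attrs


def annotate(te: str) -> str:
    """Render the TE as an intermediate TIMEX2 annotation."""
    attrs = gather(te)
    rendered = " ".join(f'{k}="{attrs[k]}"' for k in ORDER if k in attrs)
    return f"<TIMEX2 {rendered}>{te}</TIMEX2>"
```

Calling `annotate("oltre tre anni dopo")` returns the same attribute set as the slide's annotation.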
STEP5: Anchors Selection
• Goal: connect each detected T-REL to an appropriate anchor date
• While the meaning of a T-ABS (“13 Marzo 2005”) is context-independent, a T-REL (“tre anni dopo”) can only be interpreted with respect to a reference TE
• The “heur” attribute is used for this purpose, selecting one of two heuristics:
  • CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the document, or induced from the document’s name, e.g. “adige20041007_…”)
  • PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity, i.e. a “t-cat” with at least the same degree of specificity
  • Example: a T-REL with t-cat = “month” can be anchored to a TE with t-cat “month”, “week” or “day”, but not “century”
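The two anchoring heuristics can be sketched as follows: CR-DATE extracts an eight-digit YYYYMMDD date from the document name, and PR-DATE picks the nearest TE whose t-cat is at least as specific as the T-REL's. The `TE` record, the granularity ordering in `GRANULARITY`, and the tie-break in favour of the preceding TE are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed granularity scale, from coarsest to most specific.
GRANULARITY = ["CENTURY", "DECADE", "YEAR", "MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND"]
RANK = {g: i for i, g in enumerate(GRANULARITY)}


@dataclass
class TE:
    text: str
    position: int               # token offset in the document
    t_cat: str                  # one of GRANULARITY
    val: Optional[str] = None   # normalized value, if already known


def creation_date_from_name(doc_name: str) -> Optional[str]:
    """CR-DATE: induce the creation date from a name such as 'adige20041007_...'."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    return f"{m.group(1)}-{m.group(2)}-{m.group(3)}" if m else None


def pr_date_anchor(target: TE, candidates: list[TE]) -> Optional[TE]:
    """PR-DATE: nearest other TE whose t-cat is at least as specific as the target's."""
    compatible = [c for c in candidates
                  if c is not target and RANK[c.t_cat] >= RANK[target.t_cat]]
    if not compatible:
        return None
    # Sort key: distance first; on ties, False (preceding TE) sorts before True.
    return min(compatible, key=lambda c: (abs(c.position - target.position),
                                          c.position > target.position))
```

With the example sentence, the day-level TE “31 Luglio del 2002” is compatible with the year-level T-REL “oltre tre anni dopo” and becomes its anchor.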
STEP6: Dates Normalization
• Goal: fill the VAL attribute of each detected TE
  • T-ABSs: regular expressions based on their surface form (e.g. “1990s” → “199”)
  • T-RELs: rewriting rules that consider
    • the anchor (e.g. “2002”)
    • the operator (“OP”) to be applied (e.g. “+”)
    • the quantity (“QUANT”) to be added or subtracted (e.g. “3”)
  • Example: “tre anni dopo”: “2002” “+” “3” → “2005”
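The two normalization paths can be sketched like this. The patterns (a “1990s”-style decade and the Italian dd/mm/yyyy format of a trigger such as “10/09/2007”) and the handling of only the YEAR and DAY categories are illustrative assumptions; the slides do not list the actual rewriting rules.

```python
import re
from datetime import date, timedelta
from typing import Optional


def normalize_t_abs(text: str) -> Optional[str]:
    """T-ABS: map a surface form to a TIMEX2 VAL (two sample patterns only)."""
    m = re.fullmatch(r"(\d{3})0s", text)                     # decade: "1990s" -> "199"
    if m:
        return m.group(1)
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)   # Italian dd/mm/yyyy
    if m:
        day, month, year = (int(g) for g in m.groups())
        return date(year, month, day).isoformat()
    return None


def normalize_t_rel(anchor_val: str, op: str, quant: int, t_cat: str) -> str:
    """T-REL: apply OP and QUANT to the anchor's value at the TE's granularity."""
    sign = {"+": 1, "-": -1, "=": 0}[op]
    if t_cat == "YEAR":
        return f"{int(anchor_val[:4]) + sign * quant:04d}"
    if t_cat == "DAY":
        anchor = date.fromisoformat(anchor_val[:10])
        return (anchor + timedelta(days=sign * quant)).isoformat()
    raise NotImplementedError(f"t-cat {t_cat!r} is not covered by this sketch")
```

`normalize_t_rel("2002", "+", 3, "YEAR")` reproduces the slide's “2005”; day arithmetic is delegated to `datetime` so month and year boundaries are handled correctly.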
ITA-Chronos at EVALITA 2007
• Results over the EVALITA-07 test set (computation time 27’15’’, ~50 words/sec)
• Higher scores on the MOD and SET attributes
  • These are activated by triggers that are easy to identify
• Lower scores on ANCHOR_VAL and ANCHOR_DIR
  • These require the analysis of a larger context, e.g. including verb tense
Web Demo: http://www.qallme.itc.it/server/chronos/italian