SUTime
SUTime JavaNLP time annotations
What does SUTime do?
• Similar to GUTime
• Recognizes time expressions using patterns
  • Deterministic, based on regular expression patterns
  • Greedy (picks longest sequence of tokens that may represent a time expression)
• Normalizes time expressions
  • Annotations follow TimeML TIMEX3 standard
    • http://www.timeml.org/site/publications/timeMLdocs/timeml_1.2.1.html#timex3
    • XSD: http://www.timeml.org/timeMLdocs/TimeML.xsd
  • Extensions for time expressions that are not supported by TIMEX3 standard
• Resolves relative times with respect to reference date
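To make "deterministic" and "greedy" concrete, here is a minimal sketch using only `java.util.regex`. The class name `GreedyTimeMatcher` and its three toy patterns are assumptions made for illustration; they are not SUTime's rules, and SUTime matches over tokens rather than raw characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyTimeMatcher {
    private static final String WEEKDAY =
        "(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)";

    // Toy patterns (hypothetical, far smaller than a real rule set).
    private static final List<Pattern> PATTERNS = List.of(
        Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b"),
        Pattern.compile("\\b" + WEEKDAY + "\\b"),
        Pattern.compile("\\b" + WEEKDAY + "\\s+(?:morning|afternoon|evening|night)\\b"));

    /** Scans left to right; at each step keeps the earliest match, and the longest among ties. */
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int bestStart = -1;
            int bestEnd = -1;
            for (Pattern pattern : PATTERNS) {
                Matcher m = pattern.matcher(text);
                if (!m.find(pos)) {
                    continue;
                }
                boolean earlier = bestStart == -1 || m.start() < bestStart;
                boolean longer = m.start() == bestStart && m.end() > bestEnd;
                if (earlier || longer) {
                    bestStart = m.start();
                    bestEnd = m.end();
                }
            }
            if (bestStart == -1) {
                break; // no pattern matches in the rest of the text
            }
            found.add(text.substring(bestStart, bestEnd));
            pos = bestEnd; // continue after the consumed expression
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("Call me Friday morning, not on 2011-08-11."));
    }
}
```

In this sketch "Friday morning" is returned as one expression, because the longer pattern wins over the bare weekday pattern that starts at the same position.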
SUTime Time Representation
• Main temporal types
  • Time: An instance in time (2011-08-11), can be partially specified (Friday), with limited granularity
  • Duration: A length of time (3 days)
  • Range: Time interval with start and end points
  • Set: A set of temporals
    • Periodic sets: Every Friday
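As a rough analogy, the four temporal types can be mimicked with the standard `java.time` classes. This is only a sketch: SUTime has its own temporal classes, and the class name `TemporalTypesSketch` and its helpers are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Period;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

class TemporalTypesSketch {
    /** Range: the end point is the start point plus a duration. */
    static LocalDate rangeEnd(LocalDate start, Period length) {
        return start.plus(length);
    }

    /** Periodic set ("every Friday"): the first count Fridays on or after from. */
    static List<LocalDate> everyFriday(LocalDate from, int count) {
        List<LocalDate> dates = new ArrayList<>();
        LocalDate next = from.with(TemporalAdjusters.nextOrSame(DayOfWeek.FRIDAY));
        for (int i = 0; i < count; i++) {
            dates.add(next);
            next = next.plusWeeks(1);
        }
        return dates;
    }

    public static void main(String[] args) {
        LocalDate time = LocalDate.parse("2011-08-11"); // Time: a fully specified date
        Period duration = Period.ofDays(3);             // Duration: prints as P3D
        System.out.println(time + " + " + duration + " = " + rangeEnd(time, duration));
        System.out.println(everyFriday(LocalDate.parse("2011-08-01"), 3));
    }
}
```

A periodic set is unbounded in principle, so the sketch only enumerates its first few members from a chosen starting date.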
SUTime Representation
• Time
  • Standard date and times (in years, months, days, day of week, hours, minutes, seconds, milliseconds)
  • Common times: Seasons (e.g. winter), Time of day (e.g. morning), Weekend
  • Partial times (June => XXXX-06)
  • Relative time (last week)
• Duration
  • Exact durations (specified in milliseconds or in fields)
  • Inexact durations (a few years => PXY)
  • Duration ranges (2 to 3 months => P2M/P3M)
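The duration values above follow ISO 8601 notation, so exact ones can be read with `java.time.Period`. The sketch below assumes a range is written as two durations separated by `/` and that an `X` marks an unspecified amount, as in the examples on this slide; the class `DurationValues` and its methods are hypothetical helpers, not part of SUTime.

```java
import java.time.Period;

class DurationValues {
    /** Inexact durations such as PXY carry an X in place of a number. */
    static boolean isInexact(String value) {
        return value.indexOf('X') >= 0;
    }

    /** Splits a duration range such as P2M/P3M into its lower and upper bound. */
    static Period[] parseRange(String value) {
        String[] ends = value.split("/");
        if (ends.length != 2) {
            throw new IllegalArgumentException("Not a duration range: " + value);
        }
        return new Period[] { Period.parse(ends[0]), Period.parse(ends[1]) };
    }

    public static void main(String[] args) {
        System.out.println(Period.parse("P3D").getDays());  // exact duration: 3 days
        Period[] range = parseRange("P2M/P3M");              // 2 to 3 months
        System.out.println(range[0].getMonths() + " to " + range[1].getMonths() + " months");
        System.out.println(isInexact("PXY"));                // "a few years"
    }
}
```

Note that `Period.parse` rejects inexact values like `PXY`, so a caller would need to check `isInexact` first.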
SUTime Limitations
• Holidays are not supported
• Support for ranges is poor
  • from 3 to 4 p.m is identified as 15:57:00
  • 12-13 March 2011 (12-13 is ignored)
• Resolving relative expressions with respect to the given reference date can be problematic
• Handling of ambiguous phrases is poor
  • Some common words (e.g. spring/fall) are always identified as a temporal expression
• Patterns are language (English) specific
• …
SUTime Usage
• TimeAnnotator
  • TimeAnnotator timeAnnotator = new TimeAnnotator("sutime", properties);
  • Properties: specify SUTime options (prefixed by "sutime.")
• Pipeline
  • TimeAnnotator should come after the tokenizer, sentence splitter, and POS tagger
  • Optional (also placed before TimeAnnotator): NER or NumberAnnotator/QuantifiableEntityNormalizingAnnotator
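The "sutime." prefix means that each option is looked up under the annotator name passed to the constructor. The sketch below shows that scoping with plain `java.util.Properties`, so it runs without CoreNLP. The helper `getBool` is hypothetical, and the option names `markTimeRanges` and `includeRange` are given as I recall them from the CoreNLP documentation; check them against the version in use.

```java
import java.util.Properties;

class SUTimeOptionsSketch {
    /** Reads a boolean option scoped by the annotator name, e.g. "sutime" + "." + "markTimeRanges". */
    static boolean getBool(Properties props, String name, String option, boolean defaultValue) {
        String raw = props.getProperty(name + "." + option);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.includeRange", "true");
        // With CoreNLP on the classpath, props would be handed to the annotator:
        //   new TimeAnnotator("sutime", props)
        System.out.println(getBool(props, "sutime", "markTimeRanges", false));
        System.out.println(getBool(props, "sutime", "someUnsetOption", false)); // falls back to the default
    }
}
```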
SUTime input annotations
• DocDateAnnotation (String)
  • If present, the string is interpreted as a date/time and used as the reference document date against which other temporal expressions are resolved
• SentencesAnnotation (List<CoreMap>)
  • If present, time expressions are extracted from each sentence, and each sentence is annotated individually
• TokensAnnotation (List<CoreLabel>)
  • Required either at the entire annotation level or at the per-sentence level
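To illustrate what resolving against a document date involves, the following sketch resolves two relative expressions with `java.time`. The class `RelativeTimeSketch` and its policies (ISO weeks, calendar months) are assumptions for illustration; SUTime's own resolution rules may differ.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.IsoFields;
import java.util.Locale;

class RelativeTimeSketch {
    /** "last week" as an ISO week value, relative to the document date. */
    static String lastWeek(LocalDate docDate) {
        LocalDate d = docDate.minusWeeks(1);
        return String.format(Locale.ROOT, "%04d-W%02d",
            d.get(IsoFields.WEEK_BASED_YEAR),
            d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
    }

    /** "next month" as a year-month value, relative to the document date. */
    static String nextMonth(LocalDate docDate) {
        return YearMonth.from(docDate).plusMonths(1).toString();
    }

    public static void main(String[] args) {
        // The document date arrives as a string and is parsed into a date first.
        LocalDate docDate = LocalDate.parse("2011-08-01");
        System.out.println(lastWeek(docDate));  // 2011-W30
        System.out.println(nextMonth(docDate)); // 2011-09
    }
}
```

Without a document date there is no anchor for these computations, which is why relative expressions can only be fully resolved when a reference date is available.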
SUTime output annotations
• TimexAnnotations (List<CoreMap>)
  • List of time expressions (each a CoreMap)
  • On the entire annotation and also for each sentence
• Time annotations (for each time expression/CoreMap)
SUTime output annotations
• Standard annotations (for each time expression)
Note: Indices are 0-based, and always relative to the original annotation. Begin indices are inclusive, end indices are exclusive.
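The index convention in the note is the same one used by `String.substring` and `List.subList` in Java, so offsets can be applied directly to the original text or token list. The sketch below is illustrative only: the class `SpanIndexSketch`, the sample sentence, and the offsets are assumptions, not SUTime output.

```java
import java.util.List;

class SpanIndexSketch {
    /** Tokens covered by a span: tokenBegin is inclusive, tokenEnd is exclusive. */
    static List<String> tokensInSpan(List<String> tokens, int tokenBegin, int tokenEnd) {
        return tokens.subList(tokenBegin, tokenEnd);
    }

    /** Text covered by a span: charBegin is inclusive, charEnd is exclusive. */
    static String textInSpan(String text, int charBegin, int charEnd) {
        return text.substring(charBegin, charEnd);
    }

    public static void main(String[] args) {
        String text = "Meet me next Friday at noon";
        List<String> tokens = List.of("Meet", "me", "next", "Friday", "at", "noon");
        // Offsets are 0-based and refer to the original text and token list.
        System.out.println(tokensInSpan(tokens, 2, 4)); // [next, Friday]
        System.out.println(textInSpan(text, 8, 19));    // next Friday
    }
}
```

With an exclusive end index, the length of a span is simply end minus begin.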
Examples (GUTime unsupported) Reference Date is 2011-08-01
Examples (SUTime unsupported) Reference Date is 2011-08-01