Identification of Temporal Phrases in Natural Language

by Robert W. Lyon


  1. Identification of Temporal Phrases in Natural Language by Robert W. Lyon

  2. Motivation • A large collection of legal documents • Motions on September 1, 1997

  3. September 1, 1997 • 1 September 1997 • 1st of September, 1997 • 1 September ’97 • 01/09/1997 • 1997 September 1 • 1st of Sept ’97 • 09/1997 • 1st of Sept 1997 • 1st • after Sept. 1, 1997 • 1 Sept 1997 • after the 1st of September, 1997 • 1 Sept ’97 • after the 1st of Sept., 1997 • the first of Sept. • Sept. 1 • the first of Sept., 1997 • 09/01 • 1st of Sept • 01/09 • 1 Sep 1997 • Sept • the month of Sept • after Sep. 1, 1997 • 1 Sep ’97 • after the 1st of Sep., 1997 • the year of 1997 • Sep • 1st of Sep 1997 • the 1st of Sept. • 1st of Sep ’97 • the first of Sep., 1997 • Sep. 1 • the first of Sep. • 1st of Sep • September 1st, 1997 • Sept. 1, 1997 • Sep. 1, ’97 • September first • September 1997 • 09/01/1997 • 1st of September 1997 • after September 1, 1997 • … and many others

  4. Motivation Motions on September 1, 1997 through October 17, 1997

  5. Question • What do we need in order to search on time?

  6. Time Search Application • Identification • Model • Index • Search

  7. Identification • Structures • Tools • Corpus • Process

  8. Structures • Atoms • Components • Patterns • Groupings

  9. Atoms One variation of a logical part of a temporal phrase Example: • 1 … 31 • 1st … 31st • First … Thirty-first

  10. Components A group of interchangeable atoms Example: • September 1 • September 1st • September First

  11. Patterns A specific ordering of components Example: • September 1, 1997 mdy • 1 September 1997 dmy

  12. Groupings A related set of patterns Example: • The Calendar Reference Grouping • The Landmark Reference Grouping
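The four levels on slides 8 to 12 can be sketched as nested regular-expression fragments. This is only an illustration in Python rather than the Perl used in the talk; the names `alt`, `PATTERNS`, `CALENDAR` and `identify` are hypothetical, and the atom lists are deliberately truncated.

```python
import re

# Atoms: one written variation of a logical part (here, the day of the month).
DAY_CARDINAL = r"[0-3]?\d"
DAY_ORDINAL = r"[0-3]?\d(?:st|nd|rd|th|d)"
DAY_WORD = r"(?:First|Second|Third)"  # truncated list, for illustration only

def alt(*parts):
    """Join alternatives into one non-capturing group."""
    return "(?:" + "|".join(parts) + ")"

# Components: groups of interchangeable atoms.
# Longer alternatives come first so "1st" is not cut short at "1".
DAY = alt(DAY_WORD, DAY_ORDINAL, DAY_CARDINAL)
MONTH = alt("September", r"Sept\.?", r"Sep\.?")
YEAR = r"[1-2]\d\d\d"

# Patterns: a specific ordering of components.
PATTERNS = {
    "mdy": MONTH + r"\s" + DAY + r",?\s" + YEAR,
    "dmy": DAY + r"\s" + MONTH + r"\s" + YEAR,
}

# Grouping: a related set of patterns, tried in order.
CALENDAR = [(name, re.compile(rx)) for name, rx in PATTERNS.items()]

def identify(text):
    """Return (pattern name, matched phrase) for the first pattern that matches."""
    for name, rx in CALENDAR:
        m = rx.search(text)
        if m:
            return name, m.group(0)
    return None

print(identify("Motions on September 1, 1997"))
print(identify("filed 1 September 1997"))
```

Adding a new written form of the day (say, another ordinal word) then means touching one atom, and every pattern that uses the `DAY` component picks it up.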

  13. Perl Implementation • Regular Expressions • Perl Constructions
      # Ordinal Day Atom
      $atom_day_ordinal = $the_prefix . "[0-3]?\\d(?:st|nd|rd|th|d)" . $non_eos_period;
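For readers who want to try the ordinal day atom, here is a rough Python equivalent. The slide does not show how `$the_prefix` and `$non_eos_period` are defined; the two definitions below are assumptions, inferred from the fragments that surround the same ordinal expression on the "Without Structures" slide. `find_ordinal_days` is a hypothetical helper.

```python
import re

# Assumed definitions (not shown on the slide): inferred from the expanded
# expression on the "Without Structures" slide.
THE_PREFIX = r"(?:the\s)?"                 # optional leading "the "
NON_EOS_PERIOD = r"(?:\.(?! *[\r\n\f]))?"  # optional period, skipped when only
                                           # spaces and a line break follow it

# Ordinal day atom, mirroring the Perl string concatenation on the slide.
ATOM_DAY_ORDINAL = THE_PREFIX + r"[0-3]?\d(?:st|nd|rd|th|d)" + NON_EOS_PERIOD

def find_ordinal_days(text):
    """Return every substring matched by the ordinal day atom."""
    return re.findall(ATOM_DAY_ORDINAL, text)

print(find_ordinal_days("on the 1st. of Sept"))   # period kept: text continues
print(find_ordinal_days("due the 3rd.\nNext"))    # period dropped: line ends
```

The negative lookahead is what keeps a period that ends a line out of the match, while an abbreviation-style period in mid-line stays attached to the phrase.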

  14. Without Structures (?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:(?:(?:January|February|March|April|May|June|July|August|September|October|November|December|(?:Febr|Sept|Octob)\.?|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?)(?:(?:,?\s)|(?:[-/])))|(?:(?:[0-1]?\d)(?:[-/])))(?:(?:[0-3]?\d)(?:\.(?! *[\r\n\f]))?|(?:the\s)?[0-3]?\d(?:st|nd|rd|th|d)(?:\.(?! *[\r\n\f]))?|(?:the\s)?(?:First|Second|Third|Fourth|Fifth|Sixth|Seventh|Eighth|Ninth|Tenth|Eleventh|Twelfth|Thirteenth|Fourteenth|Fifteenth|Sixteenth|Seventeenth|Eighteenth|Nineteenth|Twentieth|Twenty-first|Twenty-second|Twenty-third|Twenty-fourth|Twenty-fifth|Twenty-eight|Twenty-ninth|Thirtieth|Thirty-first|FIRST|SECOND|THIRD|FOURTH|FIFTH|SIXTH|SEVENTH|EIGHTH|NINTH|TENTH|ELEVENTH|TWELFTH|THIRTEENTH|FOURTEENTH|FIFTEENTH|SIXTEENTH|SEVENTEENTH|EIGHTEENTH|NINETEENTH|TWENTIETH|TWENTY-FIRST|TWENTY-SECOND|TWENTY-THIRD|TWENTY-FOURTH|TWENTY-FIFTH|TWENTY-EIGHT|TWENTY-NINTH|THIRTIETH|THIRTY-FIRST|first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth|eleventh|twelfth|thirteenth|fourteenth|fifteenth|sixteenth|seventeenth|eighteenth|nineteenth|twentieth|twenty-first|twenty-second|twenty-third|twenty-fourth|twenty-fifth|twenty-eight|twenty-ninth|thirtieth|thirty-first))(?:(?:,?\s)|(?:[-/]))(?:(?:'?[1-9]?\d?\d?\d)(?:/\d)?(?:\.(?! *[\r\n\f]))?)(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:(?:[0-3]?\d)(?:\.(?! *[\r\n\f]))?|(?:the\s)?[0-3]?\d(?:st|nd|rd|th|d)(?:\.(?! 
*[\r\n\f]))?|(?:the\s)?(?:First|Second|Third|Fourth|Fifth|Sixth|Seventh|Eighth|Ninth|Tenth|Eleventh|Twelfth|Thirteenth|Fourteenth|Fifteenth|Sixteenth|Seventeenth|Eighteenth|Nineteenth|Twentieth|Twenty-first|Twenty-second|Twenty-third|Twenty-fourth|Twenty-fifth|Twenty-eight|Twenty-ninth|Thirtieth|Thirty-first|FIRST|SECOND|THIRD|FOURTH|FIFTH|SIXTH|SEVENTH|EIGHTH|NINTH|TENTH|ELEVENTH|TWELFTH|THIRTEENTH|FOURTEENTH|FIFTEENTH|SIXTEENTH|SEVENTEENTH|EIGHTEENTH|NINETEENTH|TWENTIETH|TWENTY-FIRST|TWENTY-SECOND|TWENTY-THIRD|TWENTY-FOURTH|TWENTY-FIFTH|TWENTY-EIGHT|TWENTY-NINTH|THIRTIETH|THIRTY-FIRST|first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth|eleventh|twelfth|thirteenth|fourteenth|fifteenth|sixteenth|seventeenth|eighteenth|nineteenth|twentieth|twenty-first|twenty-second|twenty-third|twenty-fourth|twenty-fifth|twenty-eight|twenty-ninth|thirtieth|thirty-first))(?:(?:(?:(?:,?\s)|(?:[-/]))(?:day\s)?(?:(?:of)\s)?(?:January|February|March|April|May|June|July|August|September|October|November|December|(?:Febr|Sept|Octob)\.?|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?))|(?:(?:[-/])(?:[0-1]?\d)))(?:(?:,?\s)|(?:[-/]))(?:(?:'?[1-9]?\d?\d?\d)(?:/\d)?(?:\.(?! *[\r\n\f]))?)(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:(?:(?:January|February|March|April|May|June|July|August|September|October|November|December|(?:Febr|Sept|Octob)\.?|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?)(?:(?:,?\s)|(?:[-/])))|(?:(?:[0-1]?\d)(?:[-/])))(?:(?:[0-3]?\d)(?:\.(?! *[\r\n\f]))?|(?:the\s)?[0-3]?\d(?:st|nd|rd|th|d)(?:\.(?! 
*[\r\n\f]))?|(?:the\s)?(?:First|Second|Third|Fourth|Fifth|Sixth|Seventh|Eighth|Ninth|Tenth|Eleventh|Twelfth|Thirteenth|Fourteenth|Fifteenth|Sixteenth|Seventeenth|Eighteenth|Nineteenth|Twentieth|Twenty-first|Twenty-second|Twenty-third|Twenty-fourth|Twenty-fifth|Twenty-eight|Twenty-ninth|Thirtieth|Thirty-first|FIRST|SECOND|THIRD|FOURTH|FIFTH|SIXTH|SEVENTH|EIGHTH|NINTH|TENTH|ELEVENTH|TWELFTH|THIRTEENTH|FOURTEENTH|FIFTEENTH|SIXTEENTH|SEVENTEENTH|EIGHTEENTH|NINETEENTH|TWENTIETH|TWENTY-FIRST|TWENTY-SECOND|TWENTY-THIRD|TWENTY-FOURTH|TWENTY-FIFTH|TWENTY-EIGHT|TWENTY-NINTH|THIRTIETH|THIRTY-FIRST|first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth|eleventh|twelfth|thirteenth|fourteenth|fifteenth|sixteenth|seventeenth|eighteenth|nineteenth|twentieth|twenty-first|twenty-second|twenty-third|twenty-fourth|twenty-fifth|twenty-eight|twenty-ninth|thirtieth|thirty-first))(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:January|February|March|April|May|June|July|August|September|October|November|December|(?:Febr|Sept|Octob)\.?|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?)(?:(?:,?\s)|(?:[-/]))(?:(?:'?[1-9]?\d?\d?\d)(?:/\d)?(?:\.(?! *[\r\n\f]))?)(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:(?:[0-3]?\d)(?:\.(?! *[\r\n\f]))?|(?:the\s)?[0-3]?\d(?:st|nd|rd|th|d)(?:\.(?! 
*[\r\n\f]))?|(?:the\s)?(?:First|Second|Third|Fourth|Fifth|Sixth|Seventh|Eighth|Ninth|Tenth|Eleventh|Twelfth|Thirteenth|Fourteenth|Fifteenth|Sixteenth|Seventeenth|Eighteenth|Nineteenth|Twentieth|Twenty-first|Twenty-second|Twenty-third|Twenty-fourth|Twenty-fifth|Twenty-eight|Twenty-ninth|Thirtieth|Thirty-first|FIRST|SECOND|THIRD|FOURTH|FIFTH|SIXTH|SEVENTH|EIGHTH|NINTH|TENTH|ELEVENTH|TWELFTH|THIRTEENTH|FOURTEENTH|FIFTEENTH|SIXTEENTH|SEVENTEENTH|EIGHTEENTH|NINETEENTH|TWENTIETH|TWENTY-FIRST|TWENTY-SECOND|TWENTY-THIRD|TWENTY-FOURTH|TWENTY-FIFTH|TWENTY-EIGHT|TWENTY-NINTH|THIRTIETH|THIRTY-FIRST|first|second|third|fourth|fifth|sixth|seventh|eighth|ninth|tenth|eleventh|twelfth|thirteenth|fourteenth|fifteenth|sixteenth|seventeenth|eighteenth|nineteenth|twentieth|twenty-first|twenty-second|twenty-third|twenty-fourth|twenty-fifth|twenty-eight|twenty-ninth|thirtieth|thirty-first))(?:(?:(?:(?:,?\s)|(?:[-/]))(?:day\s)?(?:(?:of)\s)?(?:January|February|March|April|May|June|July|August|September|October|November|December|(?:Febr|Sept|Octob)\.?|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?))|(?:(?:[-/])(?:[0-1]?\d)))(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:the\syear\s)?(?:(?:[1-2]\d\d\d)(?:/\d)?(?:\.(?! 
*[\r\n\f]))?)(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:the\s(?:month|beginning|middle|last|latter\spart)\sof\s)?(?:January|February|March|April|May|June|July|August|September|October|November|December|(?:Febr|Sept|Octob)\.?|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.?)(?:(?:In|About|Before|After|Around|On|Since|During|Of|Until|Till|From|To|For|IN|ABOUT|BEFORE|AFTER|AROUND|ON|SINCE|DURING|OF|UNTIL|TILL|FROM|TO|FOR|in|about|before|after|around|on|since|during|of|until|till|from|to|for)\s)?(?:(?:the\s)?[0-3]?\d(?:st|nd|rd|th|d)(?:\.(?! *[\r\n\f]))?)

  15. Patterns

  16. Tools • 11 Perl Tools • apply • check • compare • convert • dump • edit • expand • filter • remove • split • stats

  17. Corpus • Electronic Documents • 1.81 MB • Copies • Original Corpus • Tagged Corpus

  18. Original Corpus He died Aug. 17. 1757, leaving my mother a widow who lived till 1776, with 6 daughters and 2 sons, myself the elder.

  19. Tags /#phrase#type#/ • phrase: temporal phrase • type: structure pattern

  20. Tagged Corpus He died /#Aug. 17. 1757#cmdy#/, leaving my mother a widow who lived /#till 1776#cy#/, with 6 daughters and 2 sons, myself the elder.
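The tag syntax is simple enough to read back with a single regular expression. The following Python sketch is an illustration only, not one of the eleven Perl tools; `TAG`, `extract_tags` and `strip_tags` are hypothetical names, and it assumes that neither the phrase nor the type contains a `#`.

```python
import re

# /#phrase#type#/ : phrase is the temporal phrase, type names the pattern.
# Assumption: neither field contains "#".
TAG = re.compile(r"/#([^#]+)#([^#]+)#/")

def extract_tags(tagged):
    """List (phrase, type) pairs in order of appearance."""
    return TAG.findall(tagged)

def strip_tags(tagged):
    """Recover the untagged text by keeping only each phrase."""
    return TAG.sub(lambda m: m.group(1), tagged)

tagged = ("He died /#Aug. 17. 1757#cmdy#/, leaving my mother a widow who "
          "lived /#till 1776#cy#/, with 6 daughters and 2 sons, myself the elder.")
print(extract_tags(tagged))
print(strip_tags(tagged))
```

Stripping the tags from the tagged sentence gives back the sentence shown on the Original Corpus slide, so the two copies of the corpus can be checked against each other.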

  21. Process Run • Dump • Apply • Compare • Analyze

  22. Dump • Input: Tagged Corpus • Output: S1

  23. Temporal Phrase Sets #^type^phrase • #: line number • type: structure pattern • phrase: temporal phrase

  24. Apply • Input: Original Corpus, Structures • Output: S2

  25. Compare • Input: S1, S2 • Output: S3, S4, S5

  26. S3, S4, S5 • S3 - True Positives (Correct!): in S1 and S2 • S4 - False Positives (Wrong!): in S2, but not S1 • S5 - Negatives (Oops!): in S1, but not S2
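The compare step amounts to three set operations over phrase-set records. The Python sketch below assumes each record is one `line^type^phrase` line, as on the Temporal Phrase Sets slide; `load_phrase_set` and `compare` are hypothetical names, the line numbers in the example are made up, and using plain sets means that an identical phrase repeated on the same line is counted once.

```python
def load_phrase_set(lines):
    """Parse 'line^type^phrase' records into a set of (line, type, phrase)."""
    records = set()
    for line in lines:
        number, ptype, phrase = line.rstrip("\n").split("^", 2)
        records.add((int(number), ptype, phrase))
    return records

def compare(s1, s2):
    """Return S3 (true positives), S4 (false positives), S5 (negatives)."""
    s3 = s1 & s2   # in S1 and S2
    s4 = s2 - s1   # in S2, but not S1
    s5 = s1 - s2   # in S1, but not S2
    return s3, s4, s5

# Hypothetical records: S1 from the tagged corpus, S2 from applying the structures.
s1 = load_phrase_set(["1^cmdy^Aug. 17. 1757", "1^cy^till 1776"])
s2 = load_phrase_set(["1^cmdy^Aug. 17. 1757", "1^cy^1776"])
s3, s4, s5 = compare(s1, s2)
print(s3, s4, s5)
```

In this made-up example a structure that finds "1776" but misses the leading "till" produces one false positive and one negative at the same time.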

  27. Analyze • S3, S4, S5 • Stats • Filter • Expand • Results • Structures

  28. Run 4 • Filtered: cy • 24 / 50 False Positives: Preposition • Optional Preposition Prefix • Accuracy: +8.16% (+3.69%)
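The fix named on this slide, an optional preposition prefix, can be sketched as follows in Python. The preposition list and the three capitalizations are taken from the long expression on the "Without Structures" slide; the word-boundary anchors (`\b`) and the helper `with_case_variants` are additions made for this illustration.

```python
import re

PREPOSITIONS = ["in", "about", "before", "after", "around", "on", "since",
                "during", "of", "until", "till", "from", "to", "for"]

def with_case_variants(words):
    """Expand each word to Title, UPPER and lower case."""
    return ([w.capitalize() for w in words]
            + [w.upper() for w in words]
            + list(words))

# Optional preposition followed by one whitespace character.
PREPOSITION_PREFIX = r"(?:(?:" + "|".join(with_case_variants(PREPOSITIONS)) + r")\s)?"
YEAR = r"[1-2]\d\d\d"

# \b is an assumption added here so the prefix cannot start inside a word.
BARE_YEAR = re.compile(r"\b" + YEAR + r"\b")
PREFIXED_YEAR = re.compile(r"\b" + PREPOSITION_PREFIX + YEAR + r"\b")

text = "who lived till 1776, with 6 daughters"
print(BARE_YEAR.search(text).group(0))
print(PREFIXED_YEAR.search(text).group(0))
```

Because the prefix is optional, a year with no preposition in front of it is still matched, while "till 1776" now lines up with the phrase as it is tagged in the corpus.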

  29. Accuracy = True Positives * 100 / (True Positives + False Positives + Negatives)

  30. Precision = True Positives * 100 / (True Positives + False Positives)

  31. Recall = True Positives * 100 / (True Positives + Negatives)
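A minimal Python sketch of the three measures, assuming accuracy divides by all three sets (true positives, false positives, negatives) while precision and recall use the conventional denominators. That reading agrees with the later Accuracy slide, where a recall of 100% goes together with a narrowed accuracy equal to the precision. The counts in the example are invented.

```python
def accuracy(tp, fp, neg):
    """Percentage of true positives among all phrases found or missed."""
    return tp * 100 / (tp + fp + neg)

def precision(tp, fp):
    """Percentage of identified phrases that are correct."""
    return tp * 100 / (tp + fp)

def recall(tp, neg):
    """Percentage of tagged phrases that were identified."""
    return tp * 100 / (tp + neg)

# Hypothetical counts: |S3| = 90, |S4| = 6, |S5| = 4.
print(accuracy(90, 6, 4), precision(90, 6), recall(90, 4))
```

When no phrase is missed (negatives = 0), recall is 100 and accuracy reduces to precision.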

  32. Start Knowledge • Cardinal and Ordinal Days • Months and Abbreviations • One to Four Digit Years • Eight Patterns • Punctuation

  33. Runs

  34. Added Knowledge • Prepositions • Capitalization • Descriptors • Punctuation • Numeric Formats • Word Formats • Narrowing • Abbreviations

  35. Accuracy • Accuracy: 86.28% • Remove: cd, cm, cy • Narrowed Accuracy: 96.38% • Precision: 96.38% • Recall: 100%

  36. Error Examples • July 30. 31. • 10-12 • Oct. 3 and 17, 1868

  37. Testing Corpus • Sealed Portion of Corpus • Accuracy: 78.21% • Narrowed Accuracy: 84.00%

  38. Time Search Application • Identification • Model • Index • Search

  39. Model September 1, 1997 September 1997 after September 1, 1997 around September 1, 1997

  40. Index & Search • September 1, 1997 • all variations (e.g. “1 Sept, 1997”) • September, 1997 • all variations in range (e.g. “Sept. 5, 97”) • September 1, 1997 • sub-range of “September 1997”
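The talk leaves the model, index and search stages as future work, so the following is only one possible reading: a Python sketch that assumes every identified phrase is normalized to a closed interval of calendar dates. All function names are hypothetical, and fuzzy phrases such as "around September 1, 1997" are not modeled.

```python
import calendar
from datetime import date, timedelta

def day(y, m, d):
    """A single day: the interval [d, d]."""
    return (date(y, m, d), date(y, m, d))

def month(y, m):
    """A whole month: first day through last day."""
    last = calendar.monthrange(y, m)[1]
    return (date(y, m, 1), date(y, m, last))

def after(y, m, d):
    """'after <date>': from the following day onward (open end as date.max)."""
    return (date(y, m, d) + timedelta(days=1), date.max)

def within(inner, outer):
    """True when inner is a sub-range of outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True when the two intervals share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]

# "Sept. 5, 97" falls in the range of "September 1997".
print(within(day(1997, 9, 5), month(1997, 9)))
# A search for "September 1, 1997 through October 17, 1997".
query = (date(1997, 9, 1), date(1997, 10, 17))
print(overlaps(query, after(1997, 9, 1)))
```

Under this assumption every written variation of the same date normalizes to the same interval, and a search for a month or a date range becomes an interval comparison.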

  41. Conclusion • Structures • Tagged Corpus ( 1.81 MB ) • 11 Tools • Accuracy: 86.28% ( 96.38% ) • Future Work: Model, Index, Search