
Exploiting Timelines to Enhance Multi-document Summarization

Exploiting Timelines to Enhance Multi-document Summarization. Jun-Ping Ng, Yan Chen, Min-Yen Kan, Zhoujun Li. DSO National Laboratories, National University of Singapore, Beihang University. Outline: Overview, Approach, Experiments and Results, Discussion.


Presentation Transcript


  1. Exploiting Timelines to Enhance Multi-document Summarization • Jun-Ping Ng, Yan Chen, Min-Yen Kan, Zhoujun Li • DSO National Laboratories, National University of Singapore, Beihang University

  2. Outline • Overview • Approach • Experiments and Results • Discussion

  3. Overview

  4. Multi-document Summarization

  5. Extractive Summarization • Find the most salient sentences in the source collection • Top-k sentences are extracted to compose the final summary
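
The top-k extraction step described on this slide can be sketched as follows. This is a minimal illustration, not the SWING implementation: the function name `extract_summary`, the precomputed `scores` list, and the choice to restore document order are all assumptions made for the example.

```python
def extract_summary(sentences, scores, k):
    """Return the k highest-scoring sentences (hypothetical helper).

    `scores[i]` is assumed to be a precomputed salience score for
    `sentences[i]`; how it is computed is outside this sketch.
    """
    # Rank sentence indices by descending score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    # Assumption: present the chosen sentences in their original order.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]
```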

  6. Two Storms • A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years. • More than 100,000 coastal villagers have been evacuated before the cyclone made landfall. • The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP

  7. Two Storms • A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years. • More than 100,000 coastal villagers have been evacuated before the cyclone made landfall. • The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP

  8. Timeline

  9. Approach

  10. Merging Timelines Into Summarization

  11. Temporal Processing • Based on TimeML (Pustejovsky et al., 2003) • Basic temporal units: events and timexes (temporal expressions) • Three steps: • Event-timex temporal relation classification • Event-event temporal relation classification • Timex normalization • Merge the results to obtain timelines
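
As a rough illustration of the final merge step, the sketch below groups events that have already been anchored to a normalized date into ordered time spans. It is a simplification under stated assumptions: the function `build_timeline` and the (event, ISO date) input format are hypothetical, and real TimeML relations (before, after, overlap, and so on) are richer than this same-date grouping.

```python
from collections import defaultdict


def build_timeline(event_dates):
    """Group events by normalized date and order the groups chronologically.

    `event_dates` is assumed to be an iterable of (event, iso_date) pairs,
    i.e. the output of relation classification and timex normalization.
    ISO-style date strings sort chronologically as plain strings.
    """
    spans = defaultdict(list)
    for event, date in event_dates:
        spans[date].append(event)
    # Each item is (date, [events in that time span]).
    return sorted(spans.items())
```

For the "Two Storms" example, events anchored to 1991 would land in an earlier time span than those anchored to the Thursday of the current cyclone, which is what lets the two storms be told apart.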

  12. Timelines

  13. Summarization --- SWING • https://github.com/WING-NUS/SWING

  14. Sentence Scoring • Time span importance • Contextual time span importance • Sentence temporal coverage density

  15. Defining Timeline Features

  16. Time Span Importance (TSI) • Time spans which contain many events are more salient • Sentences which reference events in these time spans are thus better candidates for a summary

  17. Scoring TSI
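
The scoring formula on this slide is not part of the transcript. One plausible reading of the TSI idea is sketched below: each event is credited with the size of its time span, normalized by the largest span, and averaged over the sentence's events. The function name, the input mappings, and the normalization are assumptions, not the authors' definition.

```python
from collections import Counter


def time_span_importance(sentence_events, event_span):
    """Average relative size of the time spans a sentence's events fall in.

    `sentence_events`: event ids mentioned in the sentence (hypothetical).
    `event_span`: maps every event id on the timeline to its time span id.
    """
    if not sentence_events:
        return 0.0
    # Number of events in each time span of the timeline.
    span_sizes = Counter(event_span.values())
    largest = max(span_sizes.values(), default=1)
    total = sum(span_sizes[event_span[e]] for e in sentence_events)
    # Assumed normalization: 1.0 means every event sits in the busiest span.
    return total / (largest * len(sentence_events))
```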

  18. Contextual Time Span Importance (CTSI) • Time spans near to “important” time spans may also be important

  19. Scoring CTSI
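
The CTSI formula is likewise missing from the transcript. A minimal sketch of the idea, assuming importance spreads to neighbouring time spans with a weight that decays with distance, might look like this. The `window` and `decay` parameters and the function name are illustrative assumptions.

```python
def contextual_span_importance(span_counts, index, window=1, decay=0.5):
    """Importance a time span inherits from its neighbours on the timeline.

    `span_counts[i]` is the number of events in the i-th time span, in
    chronological order. Neighbours up to `window` positions away contribute
    their event counts, weighted by `decay ** distance` (assumed scheme).
    """
    total = 0.0
    for offset in range(1, window + 1):
        weight = decay ** offset
        for j in (index - offset, index + offset):
            if 0 <= j < len(span_counts):
                total += weight * span_counts[j]
    return total
```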

  20. Sentence Temporal Coverage Density (TCD) • The number of sentences in a summary is limited • Favour sentences which • contain more events • cover a wide variety of time spans

  21. Scoring TCD
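
The TCD formula is not included in the transcript either. The sketch below takes one simple reading, the fraction of the timeline's time spans that a sentence's events touch. This is an assumption for illustration and may differ from the authors' definition.

```python
def temporal_coverage_density(sentence_events, event_span):
    """Fraction of the timeline's time spans covered by a sentence.

    `sentence_events`: event ids mentioned in the sentence (hypothetical).
    `event_span`: maps every event id on the timeline to its time span id.
    """
    all_spans = set(event_span.values())
    if not all_spans:
        return 0.0
    # Distinct time spans touched by this sentence's events.
    covered = {event_span[e] for e in sentence_events}
    return len(covered) / len(all_spans)
```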

  22. Sentence Re-ordering • SWING makes use of the Maximal Marginal Relevance (MMR) algorithm to identify redundancies in selected sentences • MMR is heavily biased towards lexical and surface similarities

  23. Beyond Lexical Penalties An official in Barisal, 120 kilometres south of Dhaka, spoke of severe destruction as the 500 kilometre-wide mass of cloud passed overhead. “Many trees have been uprooted and houses and schools blown away,” Mostofa Kamal, a district relief and rehabilitation officer, told AFP by telephone. “Mud huts have been damaged and the roofs of several houses blown off,” said the state’s relief minister, Mortaza Hossain.

  24. TimeMMR • Novel dimension to redundancy detection • Beyond lexical similarities, identify sentences which contain substantial time span overlaps • Candidate sentences which share many time spans with selected sentences are penalised
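
The time span penalty can be sketched in the style of MMR as below. This is not the authors' formulation: the Jaccard overlap, the `penalty` weight, and both function names are assumptions chosen to show how shared time spans could lower a candidate's score.

```python
def time_overlap(spans_a, spans_b):
    """Jaccard overlap between two sets of time span ids (assumed measure)."""
    a, b = set(spans_a), set(spans_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def time_mmr_score(relevance, candidate_spans, selected_spans, penalty=0.5):
    """Relevance minus a penalty for the worst time span overlap.

    `selected_spans` is a list with one set of time span ids per sentence
    already chosen for the summary.
    """
    worst = max(
        (time_overlap(candidate_spans, s) for s in selected_spans),
        default=0.0,
    )
    return relevance - penalty * worst
```

In the Barisal example, the two quoted damage reports share few words but would share a time span, so a score of this kind would penalise picking both.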

  25. Experiments and Results

  26. Results • TAC-2010 data set used to train the regression model • TAC-2011 data set used to test • Using timelines leads to better summaries!

  27. Overcoming Errors • Timelines contain errors • Errors from underlying temporal processing systems • Simplifying assumptions made in timeline construction • Lack of consistency checking and validation

  28. Reliability Filtering • Identify timelines which potentially contain more errors • Exclude these when performing summarization

  29. Length as a Metric • Use the length of a timeline as a gauge of its “accuracy” • Drop timelines which are shorter than the average length, computed over the whole input document collection
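
The length-based filter can be sketched as follows, assuming each timeline is a list and its length is its number of entries (the slide does not say what unit of length is used). The function name is hypothetical.

```python
def filter_reliable(timelines):
    """Keep timelines at least as long as the collection's average length.

    `timelines` is assumed to hold one list per document set; timelines
    shorter than the average are dropped, as the slide describes.
    """
    if not timelines:
        return []
    average = sum(len(t) for t in timelines) / len(timelines)
    return [t for t in timelines if len(t) >= average]
```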

  30. Results • Experiments repeated with reliability filtering • Significant improvement obtained • After filtering, timelines are used in 21 out of 44 document sets

  31. Discussion

  32. Text Example • Timelines Used • SWING

  33. Future Work • Study the use of alternative evaluation metrics, especially for TimeMMR • Look at better metrics for reliability filtering • Expand the scope of the timelines that are used for more flexibility

  34. Conclusion • Time is useful for summarization! • Sentence Scoring • Derive features from a timeline • Combine features with a supervised learning summarization framework • Sentence Re-ordering • Use overlapping time spans to identify redundancies

  35. Thank you!
