Exploiting Timelines to Enhance Multi-document Summarization Jun-Ping Ng, Yan Chen, Min-Yen Kan, Zhoujun Li DSO National Laboratories National University of Singapore Beihang University
Outline • Overview • Approach • Experiments and Results • Discussion
Extractive Summarization • Find the most salient sentences in the source collection • The top-k sentences are extracted to compose the final summary
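The top-k selection step can be sketched in a few lines. This is a minimal sketch, assuming sentence scores have already been produced by some upstream scorer; the helper name `extract_top_k` is hypothetical, and the sketch omits the summary length limit and any redundancy handling.

```python
def extract_top_k(sentences: list[str], scores: list[float], k: int) -> list[str]:
    """Return the k highest-scoring sentences, best first (hypothetical helper)."""
    # sorted() is stable even with reverse=True, so equally scored
    # sentences keep their original source order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


summary = extract_top_k(
    ["A cyclone hit the coast.", "Villagers were evacuated.", "Officials spoke."],
    [0.9, 0.7, 0.2],
    k=2,
)
print(summary)  # ['A cyclone hit the coast.', 'Villagers were evacuated.']
```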
Two Storms • A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years. • More than 100,000 coastal villagers have been evacuated before the cyclone made landfall. • The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP
Temporal Processing • Based on TimeML (Pustejovsky et al., 2003) • Basic temporal units: events and timexes (temporal expressions) • Three steps • Event-timex temporal relation classification • Event-event temporal relation classification • Timex normalization • Merge the results to obtain timelines
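The final merge step can be illustrated with a toy sketch. It assumes the three steps have already produced normalized timex values, event-timex anchoring links, and event-event links labelled SIMULTANEOUS; the input format, the restriction to that single event-event relation type, the illustrative normalized values, and the name `build_timeline` are all assumptions, not the authors' actual timeline construction.

```python
from collections import defaultdict

# Illustrative inputs, assumed to come from the three processing steps.
# Timex normalization: timex id -> normalized value (ISO 8601 style).
timex_values = {"t1": "2007-11-15", "t2": "1991"}
# Event-timex links: events anchored inside a timex.
event_timex = [("e1", "t1"), ("e4", "t2")]
# Event-event links: pairs of events classified as SIMULTANEOUS.
event_event = [("e1", "e2"), ("e2", "e3")]


def build_timeline(timex_values, event_timex, event_event):
    """Group events into time spans keyed by normalized timex value (hypothetical)."""
    span_of = {event: timex_values[timex] for event, timex in event_timex}
    # Propagate anchors across SIMULTANEOUS links until no event gains a span.
    changed = True
    while changed:
        changed = False
        for a, b in event_event:
            for src, dst in ((a, b), (b, a)):
                if src in span_of and dst not in span_of:
                    span_of[dst] = span_of[src]
                    changed = True
    timeline = defaultdict(list)
    for event, span in span_of.items():
        timeline[span].append(event)
    # Order spans chronologically; plain string order suffices for these values.
    return dict(sorted(timeline.items()))


print(build_timeline(timex_values, event_timex, event_event))
# {'1991': ['e4'], '2007-11-15': ['e1', 'e2', 'e3']}
```

Events with no anchor and no SIMULTANEOUS path to an anchored event are simply left off the timeline in this sketch.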
Summarization: SWING • https://github.com/WING-NUS/SWING
Sentence Scoring • Time span importance • Contextual time span importance • Sentence temporal coverage density
Time Span Importance (TSI) • Time spans which contain many events are more salient • Sentences which reference events in these time spans are thus better candidates for a summary
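The slide gives no formula, so the sketch below assumes one simple reading: a time span's importance is the fraction of all timeline events that fall in it, and a sentence's score is the mean importance of the time spans of the events it references. Both definitions and the function names are hypothetical.

```python
def time_span_importance(timeline: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical TSI: share of all timeline events that fall in each time span."""
    total = sum(len(events) for events in timeline.values())
    if total == 0:
        return {span: 0.0 for span in timeline}
    return {span: len(events) / total for span, events in timeline.items()}


def sentence_tsi(event_spans: list[str], tsi: dict[str, float]) -> float:
    """Mean TSI over the time spans of the events a sentence references."""
    if not event_spans:
        return 0.0
    return sum(tsi.get(span, 0.0) for span in event_spans) / len(event_spans)


timeline = {"1991": ["e4"], "2007-11-15": ["e1", "e2", "e3"]}
tsi = time_span_importance(timeline)
print(tsi)  # {'1991': 0.25, '2007-11-15': 0.75}
print(sentence_tsi(["2007-11-15", "2007-11-15"], tsi))  # 0.75
print(sentence_tsi(["1991"], tsi))  # 0.25
```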
Contextual Time Span Importance (CTSI) • Time spans close to “important” time spans may also be important
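One way to make "near to important time spans" concrete is to let each span borrow the TSI of its neighbours on the timeline, discounted by distance. The decay factor, the window size, and measuring distance in timeline positions rather than calendar time are all assumptions of this sketch, not the authors' definition.

```python
def contextual_tsi(spans: list[str], tsi: dict[str, float],
                   decay: float = 0.5, window: int = 2) -> dict[str, float]:
    """Hypothetical CTSI: TSI of neighbouring spans, discounted by distance.

    `spans` must be in chronological order; distance is counted in
    positions on the timeline, not in calendar time.
    """
    ctsi = {}
    for i, span in enumerate(spans):
        score = 0.0
        for j in range(max(0, i - window), min(len(spans), i + window + 1)):
            if j != i:
                score += tsi[spans[j]] * decay ** abs(i - j)
        ctsi[span] = score
    return ctsi


spans = ["d1", "d2", "d3"]
tsi = {"d1": 0.25, "d2": 0.25, "d3": 0.5}
print(contextual_tsi(spans, tsi))  # {'d1': 0.25, 'd2': 0.375, 'd3': 0.1875}
```

Here `d2` gets the highest contextual score because it sits right next to `d3`, the span with the most events.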
Sentence Temporal Coverage Density (TCD) • The number of sentences in a summary is limited • Favour sentences which • contain more events • cover a wide variety of time spans
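A "density" can be read as coverage per word, so that short sentences packing many events across many time spans score highest. The sketch below assumes TCD = (events + distinct time spans) / tokens; this formula, the function name, and the event annotations in the example are hypothetical.

```python
def temporal_coverage_density(tokens: list[str], event_spans: list[str]) -> float:
    """Hypothetical TCD: events plus distinct time spans, per token of the sentence.

    `event_spans` lists the time span of each event mentioned in the sentence,
    so repeated spans add to the event count but not to the span variety.
    """
    if not tokens:
        return 0.0
    return (len(event_spans) + len(set(event_spans))) / len(tokens)


# Illustrative event annotations: s1 has events in two different time spans.
s1 = "the storm matched one in 1991 that killed thousands".split()
s2 = "officials said the storm was the worst in years".split()
print(temporal_coverage_density(s1, ["d1", "1991"]))  # about 0.44
print(temporal_coverage_density(s2, ["d1"]))  # about 0.22
```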
Sentence Re-ordering • SWING makes use of the Maximal Marginal Relevance (MMR) algorithm to identify redundancy among selected sentences • MMR is heavily biased towards lexical and surface similarities
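The standard MMR criterion greedily picks the candidate that maximises lam * relevance minus (1 - lam) * (its highest similarity to any already selected sentence). The sketch below is a generic MMR, assuming precomputed relevance scores and bag-of-words cosine similarity; SWING's own similarity measure and weighting may differ, and `lam` is an illustrative value.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences: list[str], relevance: list[float], k: int, lam: float = 0.7) -> list[str]:
    """Greedy MMR: trade off relevance against redundancy with selected sentences."""
    selected: list[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            redundancy = max((cosine(sentences[i], sentences[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


sentences = ["the cyclone hit the coast", "the cyclone hit the coast hard", "villagers were evacuated"]
print(mmr_select(sentences, [0.9, 0.85, 0.6], k=2))
# ['the cyclone hit the coast', 'villagers were evacuated']
```

Because redundancy is measured only on shared words, two sentences describing the same events in different vocabulary would not be penalised, which is the lexical bias noted on the slide.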
Beyond Lexical Penalties An official in Barisal, 120 kilometres south of Dhaka, spoke of severe destruction as the 500 kilometre-wide mass of cloud passed overhead. “Many trees have been uprooted and houses and schools blown away,” Mostofa Kamal, a district relief and rehabilitation officer, told AFP by telephone. “Mud huts have been damaged and the roofs of several houses blown off,” said the state’s relief minister, Mortaza Hossain.
TimeMMR • A novel dimension for redundancy detection • Beyond lexical similarities, identify sentences which contain substantial time span overlaps • Candidate sentences which share many time spans with already selected sentences are penalised
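The slides do not give the TimeMMR formula. The sketch below assumes one simple form: the usual MMR score plus an extra penalty, weighted by a hypothetical parameter `mu`, for the largest time-span overlap (Jaccard) with any already selected sentence. Word-set Jaccard stands in for lexical similarity; the function names, weights, and time-span annotations are illustrative assumptions.

```python
def jaccard(a, b) -> float:
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def time_mmr_select(tokens, spans, relevance, k, lam=0.7, mu=0.3):
    """Hypothetical TimeMMR-style selection; returns indices of chosen sentences.

    tokens[i]: words of sentence i; spans[i]: set of time spans its events fall in.
    mu weights the time-span-overlap penalty (mu=0 reduces to plain MMR).
    """
    selected = []
    candidates = list(range(len(tokens)))
    while candidates and len(selected) < k:
        def score(i):
            lexical = max((jaccard(tokens[i], tokens[j]) for j in selected), default=0.0)
            temporal = max((jaccard(spans[i], spans[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * lexical - mu * temporal
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


tokens = [
    "many trees have been uprooted".split(),
    "mud huts have been damaged".split(),
    "the 1991 storm killed an estimated 138,000 people".split(),
]
spans = [{"d1"}, {"d1"}, {"1991"}]  # first two describe the same time span
relevance = [0.9, 0.8, 0.6]
print(time_mmr_select(tokens, spans, relevance, k=2))          # [0, 2]
print(time_mmr_select(tokens, spans, relevance, k=2, mu=0.0))  # [0, 1]
```

The sentences about uprooted trees and damaged huts share little vocabulary, so with `mu=0.0` both are kept; the temporal penalty instead favours the sentence about a different time span.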
Results • TAC-2010 data set used to train the regression model • TAC-2011 data set used for testing • Using timelines leads to better summaries!
Overcoming Errors • Timelines contain errors • Errors from underlying temporal processing systems • Simplifying assumptions made in timeline construction • Lack of consistency checking and validation
Reliability Filtering • Identify timelines which potentially contain more errors • Exclude these when performing summarization
Length as a Metric • Use the length of a timeline as a gauge of its “accuracy” • Do not use timelines shorter than the average length, computed over the whole input document collection
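The length filter can be sketched as follows, assuming one timeline per document set and taking a timeline's length to be its number of events (an assumption; the slide does not say how length is measured). The function name and example data are hypothetical.

```python
def reliable_document_sets(timelines: dict[str, dict[str, list[str]]]) -> set[str]:
    """Keep document sets whose timeline is at least the collection-wide average length.

    Hypothetical: a timeline's length is taken to be its number of events.
    """
    if not timelines:
        return set()
    lengths = {ds: sum(len(events) for events in tl.values()) for ds, tl in timelines.items()}
    average = sum(lengths.values()) / len(lengths)
    return {ds for ds, n in lengths.items() if n >= average}


timelines = {
    "D1": {"d1": ["e1", "e2"], "d2": ["e3"]},
    "D2": {"d1": ["e1"]},
    "D3": {"d1": ["e1", "e2"], "d2": ["e3", "e4"]},
}
print(sorted(reliable_document_sets(timelines)))  # ['D1', 'D3']
```

Document sets outside the returned set would then be summarized without timeline features.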
Results • Experiments repeated with reliability filtering • A significant improvement is obtained • After filtering, timelines are used in 21 of the 44 document sets
Text Example • Timelines Used • SWING
Future Work • Study the use of alternative evaluation metrics, especially for TimeMMR • Explore better metrics for reliability filtering • Expand the scope of the timelines used, for more flexibility
Conclusion • Time is useful for summarization! • Sentence Scoring • Derive features from a timeline • Combine the features within a supervised learning summarization framework • Sentence Re-ordering • Use overlapping time spans to identify redundancies