
Presentation Transcript


  1. A Compositional Context Sensitive Multi-document Summarizer: Exploring the Factors That Influence Summarization Ani Nenkova, Stanford University Lucy Vanderwende, Microsoft Research Kathleen McKeown, Columbia University SIGIR 06

  2. Why is summarization important?

  3. Summarizing multiple documents • Several related news articles (Pan Am bombing, Libya suspects, Gadhafi trial) must be condensed into one short summary, e.g. "Libya refuses to surrender two Pan Am bombing suspects to the UK and USA"

  4. Introduction • Most current automatic summarization systems rely on sentence extraction • Common approaches for identifying important sentences to include in the summary: • training a binary classifier • training a Markov model • directly assigning weights to sentences • But the question of which components and features of automatic summarizers contribute most to their performance has largely remained unanswered

  5. Introduction (cont’d) • This paper examines several design decisions and their impact on the performance of generic multi-document summarizers of news • Content word frequency • content words such as nouns, verbs and adjectives serve as surrogates for the atomic units of meaning in text • Choice of composition function • it must estimate the importance of larger text units, typically sentences • Context sensitivity • the notion of importance is not static: it depends on what has already been said in the summary

  6. Frequency in Human Summaries • Performance evaluation and human agreement • different people choose different content for their summaries • the degree of overlap between input documents and human summaries varies • We investigate the association between content that appears frequently in the input and the likelihood that it will be selected by a human summarizer

  7. Frequency in Human Summaries (cont’d) Content word frequency and importance • DUC (Document Understanding Conference) 2003 • 30 test sets • Input documents, four human abstracts, automatic summaries • Each set: 10 documents, 100-word summaries • The counts for frequency in the input were taken over the concatenation of the documents (All) in the input set

  8. Frequency in Human Summaries (cont’d) Content word frequency and importance: Words frequent in the input appear in human summaries • Are content words that are very frequent in the input likely to appear in at least one of the human summaries? • Exclude stop words • Use only nouns, verbs and adjectives • Result: the high-frequency words from the input are very likely to appear in the human models • For the automatic summarizer, the trend to include more frequent words is preserved
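
As a rough illustration of that check, here is a minimal sketch (hypothetical helper names, a toy stop-word list, and no part-of-speech filtering, which the study does apply): it ranks content words by frequency over the concatenated input documents and measures how many of the top-N words appear in at least one human summary.

```python
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "was", "for"}  # toy list

def content_words(text):
    # Lowercase word tokens minus stop words; the paper further restricts
    # content words to nouns, verbs and adjectives, which needs a POS tagger.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def top_input_words(input_docs, n=20):
    # N most frequent content words over the concatenated input documents.
    counts = Counter()
    for doc in input_docs:
        counts.update(content_words(doc))
    return [w for w, _ in counts.most_common(n)]

def fraction_in_summaries(input_docs, human_summaries, n=20):
    # Fraction of the top-n input words that appear in at least one human summary.
    top = top_input_words(input_docs, n)
    summary_words = set()
    for s in human_summaries:
        summary_words.update(content_words(s))
    return sum(1 for w in top if w in summary_words) / len(top)
```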

  9. Frequency in Human Summaries (cont’d) Content word frequency and importance: Humans agree on words that are frequent in the input • The words that human summarizers agreed to use in their summaries include the high-frequency ones • In the 30 sets of DUC 2003 data, the state-of-the-art machine summary contained 69% of the words appearing in all 4 human models and 46% of the words that appeared in 3 models

  10. Frequency in Human Summaries (cont’d) Content word frequency and importance: Formalizing frequency: the multinomial model • The findings from the previous sections suggest that frequency in the input is strongly indicative of whether a word will be used in a human summary • Likelihood of a summary: L(sum; p(wi)) = [N! / (N1! · … · Nr!)] · ∏i p(wi)^Ni • N is the number of words in the summary • r is the number of unique words in the summary • Ni is the number of times word wi appears in the summary • p(wi) is the probability of wi appearing in the summary, estimated from the input documents

  11. Frequency in Human Summaries (cont’d) Content word frequency and importance: Formalizing frequency: the multinomial model (cont’d) • log L(sum; p(wi)) = log N! − Σi log Ni! + Σi Ni · log p(wi) • The log likelihood of summaries produced by human summarizers was overall higher than that of summaries produced by systems
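
A minimal sketch of that log-likelihood computation, assuming p(wi) is supplied as a dictionary of word probabilities estimated from the input documents; the tiny probability floor for words unseen in the input is an added assumption, not something the slides specify.

```python
import math
from collections import Counter

def summary_log_likelihood(summary_words, input_word_probs):
    # log L(sum; p) under the multinomial model:
    #   log N! - sum_i log Ni! + sum_i Ni * log p(wi)
    counts = Counter(summary_words)
    n_total = sum(counts.values())
    log_l = math.lgamma(n_total + 1)            # log N!
    for word, n_i in counts.items():
        log_l -= math.lgamma(n_i + 1)           # - log Ni!
        p = input_word_probs.get(word, 1e-12)   # floor for unseen words (assumption)
        log_l += n_i * math.log(p)              # + Ni * log p(wi)
    return log_l
```

Higher values indicate summaries whose word distribution is closer to that of the input, which is the sense in which the human summaries scored above the automatic ones.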

  12. Frequency in Human Summaries (cont’d) Frequency of semantic content • We established that high-frequency content words in the input are very likely to be used in human summaries • But this does not mean that the same facts are covered • A better granularity for such an investigation is the semantic content unit, an atomic fact expressed in a text • Pyramid approach for evaluation • in the annotation procedure, the content units are manually annotated, and expressions with the same meaning are linked together • 11 sets of DUC 2004

  13. Frequency in Human Summaries (cont’d) Frequency of semantic content • As in the study for words, we looked at the N most frequent content units in the inputs and calculated the percentage of these that appeared in any of the human summaries • Of the 5 most frequent content units, 96% appeared in a human summary across the 11 sets • For the top 8 and top 12 content units, the percentages were 92% and 85%, respectively

  14. Composition Functions • Sentences are the usual unit of extraction in summarization • How should the frequencies of words be combined to obtain an estimate of the importance of sentences? • We can define a family of summarizers, SUMCF, where CF is the combination function

  15. Composition Functions (cont’d) • Context-sensitive frequency-based summarizer pipeline: input text → sentence splitter → term probabilities estimated from term frequency (over verbs, nouns, adjectives and numbers) → a CF value computed for each sentence • The specific choice of CF has a huge impact on the performance of the summarizer; not all frequency-based summarizers perform well
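
The SUMCF family can be sketched as below; the function names are hypothetical and, for simplicity, all tokens are treated as content words rather than only the verbs, nouns, adjectives and numbers used in the pipeline above.

```python
import math
from collections import Counter

def word_probabilities(sentences):
    # p(w) = relative frequency of w over all input sentences
    # (sentences are lists of already-tokenized words).
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_score(sentence, probs, cf="average"):
    # Combine the word probabilities of a sentence with a composition function CF.
    ps = [probs.get(w, 0.0) for w in sentence]
    if not ps:
        return 0.0
    if cf == "sum":
        return sum(ps)              # tends to favor sentences with more words
    if cf == "product":
        return math.prod(ps)        # tends to favor shorter sentences
    return sum(ps) / len(ps)        # "average": a compromise between the two
```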

  16. Context Adjustment • Using frequency alone to determine summary content in multi-document summarization will result in a repetitive summary • The pipeline above is extended with one step: in each round, select the sentence with the maximum score, then reassign the probabilities of the words it contains

  17. Context Adjustment (cont’d) • It gives the summarizer sensitivity to context • We also allow words with initially low probability to have higher impact on the choice of subsequent sentences • In terms of content units, the inclusion of the same unit twice in the same summary is rather improbable
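
Continuing the sketch above (reusing word_probabilities and sentence_score), the loop below adds the context adjustment: after the highest-scoring sentence is selected, the probabilities of the words it contains are down-weighted so that repeated content scores lower in later rounds. Squaring the probability is one common update in frequency-based systems and is shown here as an assumption; the slides only say the probabilities are reassigned.

```python
def summarize(sentences, word_limit=100, cf="average"):
    # Greedy, context-sensitive sentence selection.
    probs = word_probabilities(sentences)
    remaining = list(sentences)
    summary, length = [], 0
    while remaining and length < word_limit:
        best = max(remaining, key=lambda s: sentence_score(s, probs, cf))
        remaining.remove(best)
        summary.append(best)
        length += len(best)
        for w in best:                  # context adjustment: down-weight used words
            probs[w] = probs[w] ** 2
    return summary
```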

  18. Evaluation Results • Document Understanding Conference (DUC) • 50 test sets • We used the data from the 2003 DUC conference for development and the data from the 2004 DUC as test data • The choice of combination function CF has a significant impact on summarizer performance • CF=Product favors shorter sentences • CF=Sum favors sentences with more words • CF=Average is a compromise

  19. Evaluation Results (cont’d) • All summaries were truncated to 100 words for the evaluation • [Table: comparison with an HMM-based system; the SUMCF summarizers are unsupervised; 3 vs. 13 repeated content units; significantly worse]

  20. Evaluation Results (cont’d) • Machine Translation and Summarization Evaluation 2005 • a workshop at ACL • only 10 of the test sets were used

  21. Conclusions (1/2) • The analysis using the DUC datasets shows that frequency has a powerful impact on the performance of summarization systems, provided that a good composition function is used • Results show that SUMAvr yields a system that performs comparably to other state-of-the-art systems and outperforms many of the participating systems • More importantly, repetition in the summary decreases significantly

  22. Conclusions (2/2) • These results suggest that the more complex combinations of features used by state-of-the-art systems today may not be necessary • Composition plays an important role in performance, but it remains an unknown for most state-of-the-art systems, which often do not report the composition function they use

  23. Comments

  24. Pyramid score for evaluation • For a new summary with n content units, the score is the total weight of the content units it contains, divided by the maximum total weight attainable by an ideal summary with n content units
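
A minimal sketch of that computation, as the Pyramid method is usually described (Nenkova and Passonneau, 2004); the function and argument names are hypothetical.

```python
def pyramid_score(summary_unit_weights, pyramid_weights):
    # Weight of the content units found in the summary, divided by the best
    # total weight an ideal summary with the same number of units could reach.
    n = len(summary_unit_weights)
    observed = sum(summary_unit_weights)
    ideal = sum(sorted(pyramid_weights, reverse=True)[:n])
    return observed / ideal if ideal else 0.0
```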
