
Automatic Text Summarization Introduction and Research Problems


Presentation Transcript


  1. Automatic Text Summarization: Introduction and Research Problems

  2. Talk Outline • Why automatic text summarization? • What is automatic text summarization? • Classification of methodologies. • Multiple Document Summarization. • Evaluation. • Research Problems. • Demo.

  3. Why Summarization? • Information overload. • The problem: • 4 billion URLs indexed by Google • 200 TB of data on the Web [Lyman and Varian '03] • Possible approaches: • information retrieval • document clustering • information extraction • visualization • question answering • text summarization

  4.–8. Summarization Examples [five slides of example summaries; the screenshots are not preserved in the transcript]

  9. What’s Summarization for? • Need to access textual information • indexing • search • Decision making process • Should I read the document? • Summary as surrogate • I don’t need to read the document! • Crossing the language barrier • Should I ask for a translation of the document?

  10. What’s Summarization for? • User-oriented summaries (“slanted”) • E-mail me summaries of the news I like • Summaries on hand-held devices • Spoken summaries • Government analysts • Need profiles of persons and organizations • Scientists and academics • Need summaries of the state of the art • Students • Need summaries for tomorrow’s exams

  11. What is Summarization? • Definition: A brief but accurate representation of the contents of a document.

  12. Types of Summaries • Purpose • Indicative, informative, and critical summaries • Form • Extracts (representative paragraphs/sentences/phrases) (majority of approaches) • Abstracts: “a concise summary of the central subject matter of a document” [Paice '90] • Dimensions • Single-document vs. multi-document • Context • Query-specific vs. query-independent • Monolingual vs. multilingual

  13. Types of Summaries Genres: • headlines • outlines • minutes • biographies • abridgments • sound bites • movie summaries • chronologies, etc.

  14. People Involved • Author of the document • Expert in a field • Professional abstractor • An expert in abstract writing

  15. Stages in Summarization • Three stages (typically) • content identification • conceptual organization • realization

  16. Stages in Automatic Summarization [Pipeline diagram: Document → Preprocessing → Document representation → Summary representation → Summary generation → Summary]

  17. Stages in Automatic Summarization Preprocessing: • morphological analysis or stemming • found => find + -ed • morphological features (tense, person, aspect, etc.) • parsing • syntactic tree • total or partial (chunking) • symbolic vs. statistical • rhetorical parsing
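
As a concrete illustration of this preprocessing step, here is a minimal Python sketch using NLTK (one common toolkit; the slides do not prescribe a tool, and the WordNet data must be downloaded separately):

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer  # requires nltk.download("wordnet")

stem = PorterStemmer().stem
lemmatize = WordNetLemmatizer().lemmatize

print(stem("running"))              # 'run'  -- rule-based suffix stripping
print(lemmatize("found", pos="v"))  # 'find' -- dictionary lookup handles irregular forms
```

Note the difference: a stemmer only strips suffixes, so recovering "find" from "found" needs dictionary-based morphological analysis.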

  18. Stages in Automatic Summarization Document representation: • Set of linguistic components (paragraph, sentences, word, …). • Boolean representation. • Term vector space. Summary representation: • Subset of sentences. • Transformed sentences. • Extraction Templates.
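
A minimal sketch of the bag-of-words / term-vector document representation named above; the stopword list is illustrative:

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}  # illustrative list

def term_vectors(sentences):
    """Represent each sentence as a term-frequency (bag-of-words) vector."""
    return [Counter(w for w in s.lower().split() if w not in STOPWORDS)
            for s in sentences]
```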

  19. Summarization by Extraction • Easy to implement and robust • Select the most relevant sentences… • How do we discover what type of linguistic/semantic information contributes to the notion of relevance? • How should extracts be evaluated?

  20. Methodologies for Automatic Summarization Classification of Methods: • Traditional Methods • Term, word, phrase frequencies. • Corpus-based Approaches • Combination of statistical features. • Learning to extract. • Exploiting Discourse Structures • e.g., WordNet, RST • Knowledge-Rich Approaches • For particular domains

  21. Classical Methods Keyword method (Luhn '58): • Very first work in automated summarization • Computes measures of word significance • Words: • stemming • bag of words • [Figure: word frequency vs. rank; the "resolving power of significant words" peaks in the mid-frequency range, between upper and lower frequency cut-offs]
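
A toy rendition of Luhn's idea in Python: treat mid-frequency words as significant and score sentences by their density of such words. The cut-off parameters are illustrative, and Luhn's original method scores clusters of significant words within a sentence rather than the sentence as a whole:

```python
from collections import Counter

def luhn_scores(sentences, stopwords, min_count=2, skip_top=5):
    """Score sentences by their density of 'significant' (mid-frequency) words.
    min_count / skip_top are illustrative low- and high-frequency cut-offs."""
    freq = Counter(w for s in sentences
                     for w in s.lower().split() if w not in stopwords)
    too_common = {w for w, _ in freq.most_common(skip_top)}
    significant = {w for w, c in freq.items()
                   if c >= min_count and w not in too_common}
    return [sum(w in significant for w in s.lower().split()) / max(1, len(s.split()))
            for s in sentences]
```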

  22.–23. Classical Methods Keyword method (Luhn '58) [two figure slides illustrating the method; images not preserved in the transcript]

  24. Classical Methods Position/Location method (Edmundson '69): • Important sentences occur in specific positions • “lead-based” summary (Brandow '95) • Inverse of position in document works well for news • Important information occurs in specific sections of the document (introduction/conclusion)

  25. Classical Methods Position/Location method (Edmundson '69): • Extra points for sentences in specific sections • Make a list of important sections • LIST = “introduction”, “method”, “conclusion”, “results”, ... • Position evidence (Baxendale '58) • First/last sentences in a paragraph are topical • Give extra points to positions: initial | middle | final

  26. Classical Methods Position/Location method (Edmundson '69): • Position depends on the type of text! • “Optimum Position Policy” (Lin & Hovy '97): a method to learn the “positions” which contain relevant information • OPP = { (p1, s2), (p2, s1), (p1, s1), ... } • pi = paragraph number; si = sentence number • The learning method uses documents + abstracts + keywords provided by authors
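
A minimal sketch combining the position heuristics above: inverse of position (lead bias) plus a bonus for paragraph-initial sentences. The weights are illustrative; the OPP approach would instead learn the rewarding positions from a corpus:

```python
def position_scores(sentences, paragraph_starts=frozenset({0})):
    """Lead-based scoring (inverse of position) plus an illustrative
    bonus for paragraph-initial sentences (Baxendale '58)."""
    return [1.0 / (i + 1) + (0.5 if i in paragraph_starts else 0.0)
            for i in range(len(sentences))]
```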

  27. Classical Methods Title method (Edmundson '69): • Hypothesis: the title of a document indicates its content. Therefore, words in the title help to find relevant content. • Create a list of title words, remove “stop words”
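
A minimal sketch of title-word scoring as just described:

```python
def title_scores(sentences, title, stopwords=frozenset()):
    """Score each sentence by how many (non-stop) title words it contains."""
    title_words = {w for w in title.lower().split() if w not in stopwords}
    return [len(title_words & set(s.lower().split())) for s in sentences]
```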

  28. Classical Methods Cue method (Edmundson '69, Paice '81): • Important sentences contain cue words: ‘This paper presents…’ or ‘Results show…’ • Some words are considered bonus, others stigma • bonus: comparatives, superlatives, conclusive expressions, etc. • stigma: negatives, pronouns, etc. • Paice implemented a dictionary of <cue, weight> pairs • Grammar for indicative expressions: In + skip(0) + this + skip(2) + paper + skip(0) + we + ... • Cue words can be learned (Teufel '98)
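
A minimal sketch of cue scoring with a hypothetical <cue, weight> dictionary in the style of Paice; the words and weights below are illustrative, not taken from the slides:

```python
# Hypothetical <cue, weight> dictionary: positive weights for bonus words,
# negative weights for stigma words.
CUES = {"presents": 2.0, "show": 2.0, "significant": 1.0,
        "hardly": -1.0, "impossible": -1.0, "they": -0.5}

def cue_score(sentence):
    """Sum the cue weights of the words in a sentence."""
    return sum(CUES.get(w, 0.0) for w in sentence.lower().split())
```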

  29. Classical Methods Experimental Combination of Features (Edmundson '69): • Linear combination of four features: score(s) = a1·C + a2·K + a3·T + a4·L (C = cue, K = keyword, T = title, L = location) • First the weights a1…a4 are adjusted using manually labeled training data. • Test all possible combinations of features. • Produce summaries. • Evaluate the resulting summaries.
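
The combination itself is a one-liner; the weight search below is a sketch with a hypothetical evaluate callback standing in for Edmundson's comparison of the produced summaries against the manually labeled data:

```python
from itertools import product

def combined_score(C, K, T, L, w):
    """Edmundson-style linear combination of Cue, Key, Title, Location scores."""
    return w[0]*C + w[1]*K + w[2]*T + w[3]*L

def best_weights(evaluate, grid=(0.0, 0.5, 1.0, 2.0)):
    """Exhaustive search over a small illustrative weight grid; `evaluate`
    is a hypothetical callback scoring the summaries produced with a given
    weight vector against manually labeled training data."""
    return max(product(grid, repeat=4), key=evaluate)
```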

  30. Classical Methods Experimental Combination of Features (Edmundson '69): Results obtained • Best system: • cue + title + position • Individual features: • position is best, then • cue • title • keyword

  31. Methodologies for Automatic Summarization Classification of Methods: • Traditional Methods • Term, word, phrase frequencies. • Corpus-based Approaches • Combination of statistical features. • Learning to extract. • Exploiting Discourse Structures • e.g., WordNet, RST • Knowledge-Rich Approaches • For particular domains

  32. Corpus-based Methods Learning to extract: [figure slide; image not preserved in the transcript]

  33. Corpus-based Methods Learning to extract (Kupiec et al. '95): • Extracts of roughly 20% of the original text • Feature set: • sentence length (|S| > 5) • fixed phrases (26 manually chosen) • paragraph (sentence position in paragraph) • thematic words • uppercase words (not common acronyms) • Target label (binary): whether the sentence is included in the manual extract • Corpus: 188 document + summary pairs from scientific journals

  34. Corpus-based Methods Learning to extract (Kupiec et al. '95): • Uses a Bayesian classifier: P(s ∈ S | F1, …, Fk) = P(F1, …, Fk | s ∈ S) · P(s ∈ S) / P(F1, …, Fk) • Assuming statistical independence of the features: P(s ∈ S | F1, …, Fk) = [Πj P(Fj | s ∈ S)] · P(s ∈ S) / Πj P(Fj)
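
In code, the classifier reduces to summing a log-probability ratio per feature; the probability tables would be estimated by counting over the training pairs. A sketch under that assumption, not Kupiec et al.'s implementation:

```python
import math

def extract_worthiness(feature_values, p_f_given_in, p_f, prior_in):
    """Log of P(s in S | F1..Fk) up to the constant denominator, under the
    feature-independence assumption. p_f_given_in and p_f map each observed
    feature value to probabilities counted over the training corpus."""
    score = math.log(prior_in)
    for f in feature_values:
        score += math.log(p_f_given_in[f]) - math.log(p_f[f])
    return score
```

Sentences are then ranked by this score and the top ones kept until the target extract length is reached.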

  35. Problems with Extraction Using statistical, (key)word-based, learned classifiers for sentence extraction has limitations: • Lack of cohesion

  36. Problems with Extraction Using statistical, (key)word-based, learned classifiers for sentence extraction has limitations: • Lack of coherence

  37. Problems with Extraction Some solutions: • Rules for the identification of anaphora • Corpus-based heuristics • Aggregation techniques • IF a sentence contains an anaphor THEN include the preceding sentences • Anaphora resolution is more appropriate, but • programs for anaphora resolution are far from perfect • BLAB project (Johnson & Paice '93) • Selection (indicator), rejection, and aggregation rules • Reported success: abstract > aggregation > extract.
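
A minimal sketch of the aggregation rule; the anaphor list is illustrative, and real systems such as BLAB use much richer selection and rejection rules:

```python
ANAPHORS = {"he", "she", "it", "they", "this", "these", "such"}  # illustrative list

def aggregate(selected, sentences):
    """If an extracted sentence starts with an anaphor, pull in the
    preceding sentence as well, to reduce dangling references."""
    out = set(selected)
    for i in selected:
        tokens = sentences[i].lower().split()
        if i > 0 and tokens and tokens[0] in ANAPHORS:
            out.add(i - 1)
    return sorted(out)
```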

  38. Methodologies for Automatic Summarization Classification of Methods: • Traditional Methods • Term, word, phrase frequencies. • Corpus-based Approaches • Combination of statistical features. • Learning to extract. • Exploiting Discourse Structures • e.g., WordNet, RST • Knowledge-Rich Approaches • For particular domains

  39. Exploiting Discourse Structures • Lexical Chain: • A word sequence in a text where the words are related by one of a set of lexical relations (listed on the next slide). • Use: • ambiguity resolution • identification of discourse structure

  40. Exploiting Discourse Structures • WordNet – lexical database • synonymy • dog, canine • hypernymy • dog, animal • antonymy • dog, cat • meronymy (part/whole) • dog, leg

  41. Exploiting Discourse Structures Extract by Lexical Chains (Barzilay & Elhadad '97; Silber & McCoy '02) • A chain C represents a “concept” in WordNet • Financial institution “bank” • Place to sit down in the park “bank” • Sloping land “bank” • A chain is a list of words; the order of the words is that of their occurrence in the text • A noun N is inserted in C if N is related to C

  42. Exploiting Discourse Structures Extract by Lexical Chains (Barzilay & Elhadad '97; Silber & McCoy '02) Mr. Kenny is the person that invented the anesthetic machine which uses micro-computers to control the rate at which an anesthetic is pumped into the blood. Such machines are nothing new. But his device uses two micro-computers to achieve much closer monitoring of the pump feeding the anesthetic into the patient.

  43. Exploiting Discourse Structures Extract by Lexical Chains (Barzilay & Elhadad '97; Silber & McCoy '02) • Compute the contribution of N to C as follows: • If the last element of C is M, identify the relation of N with M • If C is empty, consider the relation to be “repetition” • Compute the distance between N and M in number of sentences (1 if N is the first word of the chain) • The contribution of N is looked up in a table with entries given by type of relation and distance, e.g., hypernymy & distance = 3 ⇒ contribution = 0.5. • How to determine the table entries?

  44. Exploiting Discourse Structures Extract by Lexical Chains (Barzilay & Elhadad '97; Silber & McCoy '02) • After inserting all nouns in chains there is a second step: • For each noun, identify the chain to which it contributes most; delete it from the other chains and adjust the weights • Select sentences that belong to, or are covered by, “strong chains”

  45. Exploiting Discourse Structures Extract by Lexical Chains (Barzilay & Elhadad '97; Silber & McCoy '02) • Strong chain: • weight(C) > thr • thr = average(weight(Cs)) + 2*std(weight(Cs)) • Sentence selection: • H1: select the first sentence that contains a member of a strong chain • H2: select the first sentence that contains a “representative” member of the chain • H3: identify a text segment where the chain is highly dense (density is the proportion of words in the segment that belong to the chain)
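
A heavily simplified sketch of chain building and strong-chain selection using NLTK's WordNet interface. It tests only repetition, shared synsets, and direct hypernym links, and weights a chain by its length; the actual method uses the relation-and-distance contribution table and disambiguates word senses:

```python
from statistics import mean, pstdev
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

def related(a, b):
    """Crude relatedness test: identical nouns, a shared synset,
    or a direct hypernym link between synsets."""
    sa = set(wn.synsets(a, pos=wn.NOUN))
    sb = set(wn.synsets(b, pos=wn.NOUN))
    if a == b or sa & sb:
        return True
    return bool({h for s in sa for h in s.hypernyms()} & sb or
                {h for s in sb for h in s.hypernyms()} & sa)

def build_chains(nouns):
    """Insert each noun (in text order) into the first chain whose
    last member it is related to, else start a new chain."""
    chains = []
    for n in nouns:
        for c in chains:
            if related(n, c[-1]):
                c.append(n)
                break
        else:
            chains.append([n])
    return chains

def strong_chains(chains):
    """Keep chains whose weight exceeds mean + 2 * std (simplification:
    weight = chain length)."""
    w = [len(c) for c in chains]
    thr = mean(w) + 2 * pstdev(w)
    return [c for c, wi in zip(chains, w) if wi > thr]
```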

  46. Exploiting Discourse Structures IR technique (Salton et al. '97) • Vector space model • Similarity metric • Construct a graph of paragraphs; the strength of a link is the similarity metric. • Use a threshold to decide which paragraphs count as similar.

  47. Exploiting Discourse Structures IR technique (Salton et al. '97) • Identify regions where paragraphs are well connected. • Paragraph selection heuristics: • bushy path • depth-first path • segmented bushy path • Co-selection evaluation: • optimistic, pessimistic, union, intersection
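
A minimal sketch of the bushy-path heuristic: build the paragraph graph with cosine similarity over raw term counts and keep the best-connected ("bushiest") paragraphs in text order. The threshold and k values are illustrative:

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def bushy_path(paragraphs, k=3, threshold=0.2):
    """Select the k paragraphs with the most links above the similarity
    threshold, returned in their original text order."""
    vecs = [Counter(p.lower().split()) for p in paragraphs]
    degree = [sum(cosine(u, v) > threshold for j, v in enumerate(vecs) if j != i)
              for i, u in enumerate(vecs)]
    top = sorted(sorted(range(len(vecs)), key=lambda i: -degree[i])[:k])
    return [paragraphs[i] for i in top]
```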

  48. Exploiting Discourse Structures Rhetorical Analysis: • Rhetorical Structure Theory (RST) • Mann & Thompson '88 • Descriptive theory of text organization • Relations between two text spans • Nucleus & satellite (hypotactic) • Nucleus & nucleus (paratactic)

  49. Exploiting Discourse Structures Rhetorical Analysis: • Relations can be marked in the syntax: • John went to sleep because he was tired. • Mary went to the cinema and Julie went to the theatre. • The RST authors say that markers are not necessary to identify a relation. • However, all RST analyzers rely on markers: • “however”, “therefore”, “and”, “as a consequence”, etc.

  50. Exploiting Discourse Structures Rhetorical Analysis: • (A) Smart cards are becoming more attractive • (B) as the price of micro-computing power and storage continues to drop. • (C) They have two main advantages over magnetic strip cards. • (D) First, they can carry 10 or even 100 times as much information • (E) and hold it much more robustly. • (F) Second, they can execute complex tasks in conjunction with a terminal.
