
Query Chain Focused Summarization






Presentation Transcript


  1. Query Chain Focused Summarization Tal Baumel, Rafi Cohen, Michael Elhadad Jan 2014

  2. Generic Summarization • Generic Extractive Multi-doc Summarization: • Given a set of documents Di • Identify a set of sentences Sj s.t. • |Sj| < L • The “central information” in Di is captured by Sj • Sj does not contain redundant information • Representative methods: • KLSum • LexRank • Key concepts: Centrality, Redundancy
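KLSum, one of the representative methods named above, greedily adds the sentence that keeps the summary's word distribution closest (in KL divergence) to the source distribution. A minimal sketch, assuming whitespace tokenization; the function names and smoothing constant are illustrative, not the original implementation:

```python
from collections import Counter
import math

def distribution(tokens):
    # Unigram probability distribution over a token list
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_divergence(p, q, vocab, eps=1e-9):
    # KL(P || Q) over a shared vocabulary, smoothed to avoid log(0)
    return sum(p.get(w, eps) * math.log(p.get(w, eps) / q.get(w, eps))
               for w in vocab)

def klsum(sentences, max_words):
    # Greedily add the sentence that minimizes
    # KL(document distribution || summary distribution)
    doc_tokens = [w for s in sentences for w in s.split()]
    doc_dist = distribution(doc_tokens)
    vocab = set(doc_tokens)
    summary, summary_tokens = [], []
    while True:
        best, best_kl = None, float("inf")
        for s in sentences:
            if s in summary:
                continue
            cand = summary_tokens + s.split()
            if len(cand) > max_words:
                continue  # respect the length budget L
            kl = kl_divergence(doc_dist, distribution(cand), vocab)
            if kl < best_kl:
                best, best_kl = s, kl
        if best is None:
            break
        summary.append(best)
        summary_tokens += best.split()
    return summary
```

Because the objective is computed over the whole candidate summary, redundant sentences add little probability mass and are naturally disfavored.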

  3. Update Summarization • Given a set of documents split as A = ai / B = bj defined as background / new sets • Select a set of sentences Sk s.t. • |Sk| < L • Sk captures central information in B • Sk does not repeat information conveyed by A • Key concepts: centrality, redundancy, novelty
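A simple baseline for the novelty constraint filters candidate sentences from B by their similarity to the background A. A sketch using bag-of-words cosine similarity; the 0.5 threshold and helper names are illustrative assumptions:

```python
from collections import Counter
import math

def cosine(a, b):
    # Bag-of-words cosine similarity between two sentences
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def update_summary(background, new_sents, max_sents, novelty_threshold=0.5):
    # Keep sentences from B that are not too similar to anything in A
    # (novelty) or to already-selected sentences (redundancy).
    selected = []
    for s in new_sents:
        if len(selected) >= max_sents:
            break
        if any(cosine(s, a) > novelty_threshold for a in background + selected):
            continue
        selected.append(s)
    return selected
```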

  4. Query-Focused Summarization • Given a set of documents Di and a query Q • Select a set of sentences Sj s.t.: • |Sj| < L • Sj captures information in Di relevant to Q • Sj does not contain redundant information • Key concepts: relevance, redundancy

  5. Query-Chain Focused Summarization • We define a new task to clarify the distinctions among key concepts: • Relevance • Novelty • Contrast • Similarity • Redundancy • The task is also useful for Exploratory Search

  6. QCFS Task • Given a set of topic-related documents Di and a chain of queries qj • Output a chain of summaries {Sjk} s.t.: • |Sjk| < L • Sjk is relevant to qj • Sjk does not contain information in Slk for l < j
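The chain constraint above can be sketched as a loop over queries that tracks which sentences earlier summaries already used. This toy version scores relevance by query-word overlap, a stand-in for a real relevance model:

```python
def qcfs(sentences, queries, max_sents_per_query):
    # For each query qj in the chain, pick relevant sentences (word
    # overlap with the query) that no earlier summary Slk (l < j) used.
    used = set()
    summaries = []
    for q in queries:
        q_words = set(q.lower().split())
        scored = sorted(
            (s for s in sentences if s not in used),
            key=lambda s: len(q_words & set(s.lower().split())),
            reverse=True,
        )
        summary = [s for s in scored
                   if q_words & set(s.lower().split())][:max_sents_per_query]
        used.update(summary)
        summaries.append(summary)
    return summaries
```

Note how the `used` set encodes "Sjk does not contain information in Slk for l < j" at the (crude) granularity of whole sentences.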

  7. Query Chains • Query Chains are observed in query logs: • PubMed search log mining • Extract query chains (length 3) of same session / with related terms (manually) • Query Chains evolution may correspond to: • Zoom in (asthma → atopic dermatitis) • Query reformulation (respiratory problem → pneumonia) • Focus Change (asthma → cancer)

  8. Query Chains vs. Novelty Detection TREC Novelty Detection Task (2005) • Task 1: Given a set of documents for the topic, identify all relevant and novel sentences. • Task 2: Given the relevant sentences in all documents, identify all novel sentences. • Task 3: Given the relevant and novel sentences in the first 5 docs only, find the relevant and novel sentences in the remaining docs. • Task 4: Given the relevant sentences from all documents and the novel sentences from the first 5 docs, find the novel sentences in the remaining docs.

  9. Novelty Detection Task • Create 50 topics: • Compose topic (textual description) • Select 25 relevant docs from News collection • Sort docs chronologically • Mark relevant sentences • Among relevant sentences, mark novel ones (not covered in previous relevant sentences). • 28 “events” topics / 22 “opinion” topics

  10. TREC Novelty – Dataset Analysis • Select parts of documents (not full docs). • Figures below are given as events / opinion topics: • Relevant rate: 25% / 15% • Consecutive sentences: 85% / 65% • Relevant agreement: 68% / 50% • Novelty rate: 38% / 42% • Novelty agreement: 45% / 29%

  11. TREC Novelty Methods • Relevance = Similarity to Topic. • Novelty = Dissimilarity to past sentences. • Methods: • Tf-idf and Okapi with threshold for retrieval • Topic expansion • Sentence expansion • Named entities as features • Coreference resolution • Named entity normalization (entity linking) • Results: • High recall / Low precision • Almost no distinction between relevant and novel
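The relevance-then-novelty pipeline described above reduces to two similarity thresholds. A toy sketch using bag-of-words cosine; the thresholds are illustrative, and real track systems used tf-idf/Okapi weighting plus the expansions listed:

```python
from collections import Counter
import math

def cosine(a, b):
    # Bag-of-words cosine similarity between two sentences
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevant_and_novel(sentences, topic, rel_threshold=0.1, nov_threshold=0.8):
    # Relevance: similar enough to the topic description.
    # Novelty: dissimilar to every previously accepted novel sentence.
    relevant, novel = [], []
    for s in sentences:
        if cosine(s, topic) < rel_threshold:
            continue
        relevant.append(s)
        if all(cosine(s, prev) < nov_threshold for prev in novel):
            novel.append(s)
    return relevant, novel
```

The weakness reported above ("almost no distinction relevant / novel") shows up directly here: both decisions rest on the same surface similarity.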

  12. QCFS and Contrast • QCFS is different from Query Focus: • When generating S2 – must take S1 into account. • QCFS is different from Update: • Split A/B is not observed. • QCFS is different from Novelty Detection: • Chronology is not relevant • Key concepts: • Query Relevance • Query Distinctiveness (how qi+1 contrasts with qi)

  13. Contrastive IR • CWS: A Comparative Web Search System, Sun et al., WWW 2006 • Given 2 queries q1 and q2 • Rank a set of “contrastive pairs” (p1, p2) where p1 and p2 are snippets of relevant docs. • Method: • Retrieve relevant snippets SR1 = {p1i} and SR2 = {p2j} • Score a R(p1, q1) + b R(p2, q2) + c T(p1, p2, q1, q2) • T(p1, p2, q1, q2) = x Sim(url1, url2) + (1-x) Sim(p1\q1, p2\q2) • Greedy ranking of pairs: • Rank all pairs (p1, p2) by score – take the top pair • Remove its p1 and p2 from all remaining pairs – iterate. • Cluster pairs into comparative clusters • Extract terms from comparative clusters.
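The pair scoring and greedy ranking in CWS can be sketched as below. Jaccard word overlap stands in for the paper's similarity functions, and the URL-similarity term of T is omitted (x = 0); weights and names are illustrative assumptions:

```python
def overlap(a, b):
    # Jaccard word overlap as a stand-in similarity function
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def pair_score(p1, p2, q1, q2, a=1.0, b=1.0, c=1.0, x=0.0):
    # a R(p1,q1) + b R(p2,q2) + c T(p1,p2,q1,q2), where T compares the
    # snippets with their query words removed (URL similarity omitted).
    rel1 = overlap(p1, q1)
    rel2 = overlap(p2, q2)
    strip1 = " ".join(w for w in p1.split() if w not in q1.split())
    strip2 = " ".join(w for w in p2.split() if w not in q2.split())
    comp = (1 - x) * overlap(strip1, strip2)
    return a * rel1 + b * rel2 + c * comp

def greedy_pairs(snips1, snips2, q1, q2, k):
    # Rank all (p1, p2) pairs, take the top, remove its snippets
    # from the pool, and iterate.
    pairs = [(p1, p2) for p1 in snips1 for p2 in snips2]
    chosen = []
    while pairs and len(chosen) < k:
        best = max(pairs, key=lambda pr: pair_score(pr[0], pr[1], q1, q2))
        chosen.append(best)
        pairs = [pr for pr in pairs if pr[0] != best[0] and pr[1] != best[1]]
    return chosen
```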

  14. Document Clustering • A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results, Kummamuru et al., WWW 2004 • Desirable properties of clustering: • Coverage • Compactness • Sibling distinctiveness • Reach time • Incremental algorithm: • Decide on width n of tree (# children / node) • Nodes are represented by “concepts” (terms) • Rank concepts by score and add them under current node • Score(Sak, cj) = a ScoreC(Sak-1, cj) + b ScoreD(Sak-1, cj) • ScoreC = document coverage • ScoreD = sibling distinctiveness
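The concept score combining coverage (ScoreC) and sibling distinctiveness (ScoreD) can be sketched as follows. The co-occurrence-based distinctiveness here is an illustrative stand-in for the paper's exact ScoreD, and the weights and names are assumptions:

```python
def concept_score(concept, docs, sibling_concepts, a=1.0, b=1.0):
    # ScoreC: fraction of docs containing the concept term
    coverage = sum(1 for d in docs if concept in d.split()) / len(docs)

    def cooccur(c1, c2):
        # Jaccard co-occurrence of two concept terms over the doc set
        both = sum(1 for d in docs if c1 in d.split() and c2 in d.split())
        either = sum(1 for d in docs if c1 in d.split() or c2 in d.split())
        return both / either if either else 0.0

    # ScoreD stand-in: 1 minus the worst overlap with any sibling concept
    distinct = 1.0 - max((cooccur(concept, s) for s in sibling_concepts),
                         default=0.0)
    return a * coverage + b * distinct
```

Ranking candidate terms by this score and keeping the top n under each node yields the incremental, width-bounded tree described above.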
