From bursty patterns to bursty facts: The effectiveness of temporal text mining for news

From bursty patterns to bursty facts: The effectiveness of temporal text mining for news. Ilija Subašić & Bettina Berendt, K.U. Leuven, Belgium

Agenda • The first problem: temporal text mining • Solution methods • The second problem: evaluating the solution methods • Our approach: cross-evaluation framework • Case study: evaluation of 3 methods

Temporal text mining (TTM): The problem • What happened? • What were the important new developments in a time period?

TTM: keyword representation methods [Kleinberg, 2002] • "bursty" ~ more frequent in 1985-94 than over the whole analysed period
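The informal reading of "bursty" on this slide can be illustrated with a simple frequency ratio. This is a minimal sketch, not Kleinberg's method (which models bursts with a state automaton over the stream); the function names and the toy documents are hypothetical.

```python
from collections import Counter

def relative_freq(docs):
    """Relative frequency of each token over a list of tokenised documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}

def burst_ratios(period_docs, all_docs):
    """Ratio of a word's relative frequency in the period to its relative
    frequency in the whole corpus; values above 1 suggest a burst."""
    period = relative_freq(period_docs)
    overall = relative_freq(all_docs)
    return {tok: period[tok] / overall[tok] for tok in period if tok in overall}

# Toy data (hypothetical): the first two documents form the period of interest.
all_docs = [["court", "trial"], ["court", "verdict"],
            ["court", "weather"], ["weather", "rain"]]
period_docs = all_docs[:2]
ratios = burst_ratios(period_docs, all_docs)
```

Words that occur only outside the period get no ratio; a real implementation would also need smoothing for rare words.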

TTM: group representation methods [Mei & Zhai, 2005]

TTM: combo representation methods: STORIES, a graphical summary [Subašić & Berendt, 2008+]

Demo at ECML/PKDD • STORIES: story tracking and exploration

Evaluations so far • Standardized tasks and competitions • DUC update task & ROUGE framework: summarization • TREC Novelty track (2002-04): novel-sentence retrieval • Disadvantages: • Number of documents too small: 10 for DUC and 25 per topic for TREC • Output is textual → not possible to compare all TTM methods • TTM: evaluations limited to the respective method and corpora

Cross-evaluation: Our approach (1) • Pipeline: patterns → queries → retrieved sentences • Query generation: generic / method-specific (top bursty elements & combinations) • Sentence retrieval: query-likelihood retrieval (QL) • Example: the pattern "shave hair britney_spears" retrieves "Last Friday, pop star Britney shaved her head, parting with her long hair." • More: https://sites.google.com/site/subasicilija/ttm-evaluation
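Query-likelihood retrieval ranks each sentence by the probability that its language model generates the query. The sketch below assumes a unigram model with Dirichlet smoothing and a smoothing weight `mu`; the slides do not say which smoothing the authors used, and the function names and toy sentences are hypothetical.

```python
import math
from collections import Counter

def ql_score(query, sentence, coll_counts, coll_len, mu=10.0):
    """Log-likelihood of the query under a Dirichlet-smoothed unigram
    model of the sentence (smoothing choice is an assumption)."""
    counts = Counter(sentence)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len
        p = (counts[term] + mu * p_coll) / (len(sentence) + mu)
        if p == 0.0:
            return float("-inf")  # term occurs nowhere in the collection
        score += math.log(p)
    return score

def rank_sentences(query, sentences, mu=10.0):
    """Indices of the sentences, best query-likelihood score first."""
    if not sentences:
        return []
    coll_counts = Counter(tok for s in sentences for tok in s)
    coll_len = sum(coll_counts.values())
    scored = [(ql_score(query, s, coll_counts, coll_len, mu), i)
              for i, s in enumerate(sentences)]
    return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]

# Toy data (hypothetical), already tokenised and lower-cased.
sentences = [["britney", "shave", "head", "hair"],
             ["court", "trial", "verdict"],
             ["britney", "sing", "concert"]]
query = ["shave", "hair", "britney"]
ranking = rank_sentences(query, sentences)
```

Here the sentence containing all three query terms ranks first, followed by the one sharing only "britney".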

Our approach (2): Sentences' "precision/recall" • Pipeline: patterns → queries → retrieved sentences, compared against ground-truth sentences • Query generation: generic / method-specific (top bursty elements & combinations) • Sentence retrieval: query-likelihood retrieval (QL) • IR-style evaluation: ROUGE-2, ROUGE-SU4, aggregate measures, Friedman test and Tukey's multiple comparison test
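To make the ROUGE-based comparison concrete, here is a simplified sketch of ROUGE-2 recall as clipped bigram overlap between one retrieved sentence and one ground-truth sentence. It omits the stemming, stopword options and multi-reference handling of the official ROUGE toolkit, and the example sentences are hypothetical.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, reference):
    """Share of reference bigrams that also occur in the candidate,
    with counts clipped to the candidate's counts."""
    ref = bigrams(reference)
    if not ref:
        return 0.0
    cand = bigrams(candidate)
    overlap = sum(min(c, cand[bg]) for bg, c in ref.items())
    return overlap / sum(ref.values())

# Toy example (hypothetical sentences).
reference = "britney shaved her head".split()
candidate = "pop star britney shaved her long hair".split()
score = rouge2_recall(candidate, reference)
```

Two of the three reference bigrams ("britney shaved", "shaved her") appear in the candidate, so the score is 2/3.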

Our approach (3): "Recall-oriented" aggregate measure maxMR • For each time period t: take the best fit (ROUGE) between the retrieved sentences of t and the ground-truth sentences of t • Normalize by the maximum possible best fit, computed over all sentences of t
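One plausible reading of maxMR for a single time period is sketched below. The slide does not give the formula, so the averaging over ground-truth sentences is an assumption, and a unigram-recall function stands in for ROUGE; all names and the toy data are hypothetical.

```python
def unigram_recall(candidate, reference):
    """Stand-in for ROUGE: share of distinct reference tokens in the candidate."""
    ref = set(reference)
    return len(ref & set(candidate)) / len(ref) if ref else 0.0

def best_fit(pool, truth, score=unigram_recall):
    """Highest score any sentence in the pool reaches against one truth sentence."""
    return max((score(s, truth) for s in pool), default=0.0)

def max_mr(retrieved, ground_truth, all_sentences, score=unigram_recall):
    """Best fit of the retrieved sentences, normalised by the best fit
    achievable with any sentence of the period, averaged over the
    ground-truth sentences (the averaging is an assumption)."""
    ratios = []
    for truth in ground_truth:
        ceiling = best_fit(all_sentences, truth, score)
        if ceiling > 0:
            ratios.append(best_fit(retrieved, truth, score) / ceiling)
    return sum(ratios) / len(ratios) if ratios else 0.0

# Toy period (hypothetical): one ground-truth sentence, one retrieved sentence.
all_sentences = [["britney", "shaved", "her", "head"],
                 ["britney", "cut", "hair"],
                 ["fans", "react"]]
retrieved = [["britney", "cut", "hair"]]
ground_truth = [["britney", "shaved", "head"]]
value = max_mr(retrieved, ground_truth, all_sentences)
```

The normalisation keeps a method from being penalised when no sentence in the period matches a ground-truth sentence well.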

Our approach (4): "Precision-oriented" aggregate measure maxMP • Retrieved sentences of method I vs. retrieved sentences of method II, against the same ground-truth sentences → method II has a better chance of good matches → scale maxMR by the retrieved sentences

Case study experiment: Data & settings • Corpus 1: Crime case • 21 weeks, 306 documents, 31 ground-truth sentences • Corpus 2: Celebrity reporting • 8 weeks, 3000 documents, 19 ground-truth sentences • Corpora available at https://sites.google.com/site/subasicilija/ttm-evaluation • M1: a keyword representation method: Kleinberg's bursty words • M2: a group representation method: Mei & Zhai's temporal text mining • M3: a combo representation method: STORIES

Results: Top group method comparison (figure panels: maxMR, maxMP)

Results: Query generation comparison (figure panels: maxMP, maxMR)

Summary • First cross-methods evaluation framework for Temporal Text Mining methods with different patterns • Experimental investigation of 3 TTM types • Results: • different methods – different strengths and weaknesses • M3/named entities: most robust method over settings • M3 variants > M1, M2 in “precision-oriented” measures • specific query generation improves “precision-oriented” results, especially for M1 and M2 • corpus dependence

Future work • Standardized, bigger, more varied datasets • Establish a baseline (ROUGE originally for longer text sequences) • Explore possible sources of bias for/against specific methods • User studies (in progress)

STORIES: graphical summary, textual summary, documents
