Bursty Event Detection from Text Streams for Disaster Management
2012-04-17
Sungjun Lee, Sangjin Lee, Kwanho Kim, and Jonghun Park
firstname.lastname@example.org
Information Management Lab., Dept. of Industrial Engineering, Seoul National University
Introduction
• Identify disaster-related bursty events from multiple text streams.
• Characterize bursty terms in terms of skewness, consistency, periodicity, and variation.
• Score each term to determine whether or not it is a bursty term.
[Figure: Real-world states (normal, disaster) produce observations in Stream 1, Stream 2, …, Stream K. Normal terms: nice, happy, good, fine, …; disaster-related terms: catastrophe, bad, die, nightmare, …]
Motivation example
• The distribution of the frequency of terms observed in the AP news stream on Feb. 27, 2010 and Mar. 1, 2010.
On Feb. 27, 2010, an earthquake hit Chile.
On Mar. 1, 2010, the trial of a Bosnian politician, Radovan Karadzic, began.
Skewness feature
• A bursty term appears intensively during the specific time period in which the corresponding event occurs.
[Equation: skewness score]
[Figure: The change of the term frequency distribution of “tsunami”; two panels, each plotting probability against term frequency during L days]
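
The skewness score itself is not reproduced in the text above. As a rough sketch only, one could measure how lopsided a term's daily frequencies are over a window of L days with the standard moment coefficient of skewness. The function name `skewness` and the sample counts are illustrative assumptions, not the authors' definition.

```python
def skewness(daily_counts):
    """Moment coefficient of skewness (third standardized moment).

    Assumption: this standard statistic stands in for the slide's
    skewness score, whose exact formula is not available here.
    """
    n = len(daily_counts)
    if n == 0:
        raise ValueError("need at least one day of counts")
    mean = sum(daily_counts) / n
    m2 = sum((x - mean) ** 2 for x in daily_counts) / n  # variance
    m3 = sum((x - mean) ** 3 for x in daily_counts) / n  # third central moment
    if m2 == 0:
        return 0.0  # a constant series has no skew
    return m3 / m2 ** 1.5


# Made-up daily counts over L = 10 days, for illustration only.
flat = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4]      # steady, everyday term
spiky = [0, 1, 0, 0, 1, 0, 0, 25, 30, 2]   # term that erupts on two days

print("flat :", skewness(flat))
print("spiky:", skewness(spiky))
```

A term whose occurrences are packed into a few days yields a large positive value, while a term used evenly stays near zero.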
Consistency feature
• The frequency of a bursty term soars across multiple streams.
[Figure: The change of the term appearance of “tsunami” across Stream 1, Stream 2, …, Stream K, marking articles containing and not containing “tsunami”; example streams include a Twitterer focusing on tsunami research and a Twitterer focusing on travel]
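
One hedged way to make "soars across multiple streams" concrete is to count the share of streams in which the number of articles containing the term has risen sharply against a baseline window. The function `consistency`, the `rise_factor` threshold, and the counts below are all hypothetical and are not taken from the slides.

```python
def consistency(baseline, current, rise_factor=2.0):
    """Fraction of streams in which the term's article count has soared.

    baseline, current: per-stream counts of articles containing the term
    in an earlier window and in the current window (same stream order).
    A stream counts as "soaring" when current >= rise_factor * max(baseline, 1);
    the max(..., 1) guards against zero baselines.
    Assumption: this rule is an illustration, not the slide's formula.
    """
    if not baseline or len(baseline) != len(current):
        raise ValueError("need matching, non-empty per-stream counts")
    soaring = sum(
        1 for b, c in zip(baseline, current) if c >= rise_factor * max(b, 1)
    )
    return soaring / len(baseline)


# Made-up counts for K = 3 streams.
widespread = consistency(baseline=[1, 0, 2], current=[9, 4, 7])
single_stream = consistency(baseline=[5, 0, 0], current=[20, 0, 0])

print("widespread   :", widespread)
print("single stream:", single_stream)
```

A term that rises in every stream scores 1, while a term that rises only in a specialist stream scores close to 1/K.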
Periodicity feature
• Periodic terms are less likely to be bursty terms.
• Penalize terms exhibiting periodicity.
[Figure: Periodicity of “Sunday” (period=6.8966) and periodicity of “earthquake” (period=3.4843)]
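
The slides do not say how the period of a term is estimated. A minimal sketch, assuming a plain discrete Fourier transform periodogram over the term's daily frequencies, is shown below: the dominant period is the series length divided by the strongest frequency index, and the share of spectral power at that frequency can serve as a periodicity strength to penalize. The names `periodogram` and `dominant_period` and the two series are illustrative assumptions.

```python
import cmath
import math


def periodogram(series):
    """Power at DFT frequencies k = 1 .. n//2 of the mean-centred series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        powers[k] = abs(coeff) ** 2
    return powers


def dominant_period(series):
    """Return (period in samples, share of power at the strongest frequency)."""
    powers = periodogram(series)
    total = sum(powers.values())
    if total == 0:
        return None, 0.0  # constant series: no periodicity at all
    k = max(powers, key=powers.get)
    return len(series) / k, powers[k] / total


# Idealized weekly rhythm over 28 days versus a single one-day spike.
weekly = [5 + 4 * math.cos(2 * math.pi * t / 7) for t in range(28)]
spike = [30 if t == 20 else 1 for t in range(28)]

weekly_period, weekly_share = dominant_period(weekly)
_, spike_share = dominant_period(spike)

print("weekly:", weekly_period, weekly_share)
print("spike share:", spike_share)
```

The weekly series concentrates almost all of its power at one frequency, so a penalty such as `1 - share` (again an assumption) would suppress it, whereas the single spike spreads its power evenly over all frequencies and is barely penalized.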
Variation feature
• Cope with different writing styles among streams.
• Reduce the possibility of identifying a term as bursty when it has a high frequency only in a specific stream.
[Figure: The change of the term appearance of “AP” with a fixed signature “AP news” across Stream 1, Stream 2, …, Stream K, marking articles containing and not containing “AP”; annotation: AP news starts to publish articles]
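
As a sketch of how such stream-specific terms might be flagged, the coefficient of variation of a term's per-stream usage rate is one candidate: it is large when a single stream accounts for nearly all occurrences. This choice, the function name, and the rates below are assumptions made for illustration, not the authors' variation score.

```python
from math import sqrt


def coefficient_of_variation(rates):
    """Population standard deviation divided by the mean of per-stream rates.

    rates: for each stream, the fraction of its articles containing the term.
    Assumption: a high value is read as "concentrated in few streams".
    """
    n = len(rates)
    if n == 0:
        raise ValueError("need at least one stream")
    mean = sum(rates) / n
    if mean == 0:
        return 0.0  # term never appears anywhere
    variance = sum((r - mean) ** 2 for r in rates) / n
    return sqrt(variance) / mean


# Made-up rates for four streams.
signature_term = [0.95, 0.01, 0.02, 0.00]  # e.g. a source's own signature
shared_term = [0.30, 0.25, 0.35, 0.28]     # used similarly everywhere

print("signature term:", coefficient_of_variation(signature_term))
print("shared term   :", coefficient_of_variation(shared_term))
```

Under this reading, a term like a fixed source signature would receive a high variation value and could be down-weighted accordingly.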
Putting them all together to measure burstiness
• Combine the four feature scores, which are based on different rationales and scales.
• The final term weighting scheme, burst, is defined as follows:
[Equation: definition of burst in terms of the four feature scores]
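
The definition of burst is not reproduced in the text above, so the following only illustrates the general idea of bringing scores with different scales onto a common range before combining them. It assumes min-max rescaling across candidate terms and a simple product in which skewness and consistency reward a term while periodicity and variation penalize it. The functions `min_max` and `combine` and every number are hypothetical.

```python
def min_max(scores):
    """Rescale a {term: score} mapping to [0, 1]; constant input maps to 0.5."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 0.5 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def combine(skew, consistency, periodicity, variation):
    """Hypothetical combination: reward * reward * (1 - penalty) * (1 - penalty).

    Assumption: this product form is NOT the slide's burst formula.
    """
    s, c = min_max(skew), min_max(consistency)
    p, v = min_max(periodicity), min_max(variation)
    return {t: s[t] * c[t] * (1 - p[t]) * (1 - v[t]) for t in skew}


# Made-up feature scores for three candidate terms.
skew = {"tsunami": 1.5, "Sunday": 0.2, "AP": 0.1}
consistency = {"tsunami": 1.0, "Sunday": 0.8, "AP": 0.2}
periodicity = {"tsunami": 0.1, "Sunday": 0.9, "AP": 0.3}
variation = {"tsunami": 0.1, "Sunday": 0.2, "AP": 1.6}

burst = combine(skew, consistency, periodicity, variation)
for term in sorted(burst, key=burst.get, reverse=True):
    print(term, round(burst[term], 3))
```

With these toy numbers the periodic term and the stream-specific term are pushed to the bottom of the ranking, which is the qualitative behaviour the four features are meant to produce.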
Experiment setting
• Six news channels were collected.
  • Sources: CNN, AP, Reuters, Times Online, Wall Street Journal, New York Times
  • Category: World news
  • Period: 1 Oct. 2009 – 15 Mar. 2010
  • Source type: RSS feed
[Figure: Data collection: data channels, Google Reader API, Google Reader repository, experiment DB]
Experiment results
• Comparison of bursty term detection results with the methods proposed by Whitney et al. (2009), Fung et al. (2005), Chen et al. (2007), and He et al. (2005).
• Bold terms: bursty terms assumed to be correct.
• Underlined terms: topical terms.
• Starred terms: general terms.
Experiment results
• Comparison of the performance of retrieving documents related to bursty events.
Further work
[Figure: Bursty terms as the union of “statistically sufficient” conditions: MI, CV, chi-square, Chernoff divergence, skewness, self-similarity, KL divergence]
Conclusion
• Focus on identifying bursty terms to detect disaster-related bursty events.
• Bursty terms can help people react properly in decision-critical situations.
• Bursty terms can be characterized from four perspectives: skewness, consistency, periodicity, and variation.
• A final scoring function to detect bursty terms is proposed.
• The experiment results showed that the proposed approach is more effective at detecting bursty terms than the existing alternatives.