Understanding and Predicting Human Behavior using Propagation

Learn how to understand and predict human behavior through the study of propagation. Explore topics such as flu trends, cyber security, viral marketing, and more.


Presentation Transcript


  1. Understanding and Predicting Human Behavior using Propagation: From Flu-trends to Cyber-Security • B. Aditya Prakash, Computer Science, Virginia Tech • Keynote Talk, BEAMS Workshop, ICDM, Nov 14, 2015

  2. Thanks! • Reza Zafarani • Huan Liu Prakash 2015

  3. Networks are everywhere! • Facebook Network [2010] • Gene Regulatory Network [Decourty 2008] • Human Disease Network [Barabasi 2007] • The Internet [2005]

  4. Dynamical Processes over networks are also everywhere!

  5. Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ...

  6. Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] • SI Model • CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts • Diseases over contact networks

  7. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] • Problem: Given k units of disinfectant, whom to immunize?

  8. Why do we care? (1: Epidemiology) • CURRENT PRACTICE vs. OUR METHOD: ~6x fewer! [US-MEDICARE NETWORK 2005] • Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

  9. Why do we care? (2: Online Diffusion) • > 800m users, ~$1B revenue [WSJ 2010] • ~100m active users • > 50m users

  10. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks • Social Media Marketing • Celebrity to followers: “Buy Versace™!”

  11. Why do we care? (3: To change the world?) • Dynamical Processes over networks • Social networks and Collaborative Action

  12. High Impact – Multiple Settings • epidemic out-breaks • products/viruses • transmit s/w patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?

  13. Research Theme • DATA: Large real-world networks & processes • ANALYSIS: Understanding • POLICY/ACTION: Managing/Utilizing

  14. Research Theme – Public Health • DATA: Modeling # patient transfers • ANALYSIS: Will an epidemic happen? • POLICY/ACTION: How to control out-breaks?

  15. Research Theme – Social Media • DATA: Modeling Tweets spreading • ANALYSIS: # cascades in future? • POLICY/ACTION: How to market better?

  16. In this talk • Q1: How to predict Flu-trends better? • Q2: How does information evolve over time? • DATA: Large real-world networks & processes

  17. In this talk • Q3: How do malware attacks evolve over time? • DATA: Large real-world networks & processes

  18. Outline • Motivation • Part 1: Learning Models (Empirical Studies) • Q1: How to predict Flu-trends better? • Q2: How does information evolve over time? • Q3: How do malware attacks evolve over time? • Conclusion

  19. Surveillance [Chen et al. ICDM 2014] • How to estimate and predict flu trends? • Surveillance Report • Hospital record • Lab survey • Population survey

  20. GFT & Twitter • Estimate flu trends using online electronic sources • Example tweets: • “So cold today, I’m catching cold.” • “I have headache, sore throat, I can’t go to school today.” • “My nose is totally congested, I have a hard time understanding what I’m saying.”

  21. Observation 1: States • There are different states in an infection cycle. • SEIR model: 1. Susceptible 2. Exposed 3. Infected 4. Recovered
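
The four SEIR states can be illustrated with a minimal simulation. This is a sketch of the textbook SEIR compartment model, not code from the talk; the rate names `beta`, `sigma`, `gamma`, the step size `dt`, and all numeric values are assumptions chosen for illustration.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, steps, dt=0.1):
    """Euler integration of the textbook SEIR equations.

    All compartments are fractions of the population.
    beta: infection rate, sigma: rate E -> I, gamma: rate I -> R
    (hypothetical names and values, not taken from the slides).
    """
    s, e, i, r = s0, e0, i0, r0
    history = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i * dt    # S -> E
        new_infected = sigma * e * dt      # E -> I
        new_recovered = gamma * i * dt     # I -> R
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


trajectory = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                           s0=0.99, e0=0.0, i0=0.01, r0=0.0, steps=1000)
print(trajectory[-1])
```

Each step only moves mass between neighbouring compartments, so the four fractions always sum to the initial total.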

  22. Observation 2: Epidemiology & Social Media Gap • Infection cases drop exponentially in epidemiology (Hethcote 2000) • Keyword mentions drop in a power-law pattern in social media (Matsubara 2012)
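
The gap between the two decay shapes is easy to see numerically. The sketch below compares an exponential decay with a power-law decay; the rate `0.5` and the exponent `1.5` are assumed values for illustration only, not fitted parameters from either cited paper.

```python
import math


def exponential_decay(t, rate=0.5):
    """exp(-rate * t): the shape of case counts in classic epidemic models."""
    return math.exp(-rate * t)


def power_law_decay(t, exponent=1.5):
    """t ** -exponent: the heavy-tailed shape reported for keyword mentions."""
    return t ** (-exponent)


for t in (1, 10, 100):
    print(t, exponential_decay(t), power_law_decay(t))
```

For large `t` the power law stays orders of magnitude above the exponential, which is why keyword counts linger long after case counts have vanished.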

  23. HFSTM Model • Hidden Flu-State from Tweet Model (HFSTM) • Each word ($w$) in a tweet ($O_i$) can be generated by: • A background topic • Non-flu related topics • State related topics [Figure: model diagram; labels: latent state, initial prob., transit. prob., transit. switch, binary non-flu related switch, binary background switch, word distribution]

  24. HFSTM Model • Generating tweets • Generate the state for a tweet • Generate the topic for a word • State: [S,E,I] • Topic: [Background, Non-flu, State] • S: This restaurant is really good • E: The movie was good but it was freezing • I: I think I have flu

  25. Inference • EM-based algorithm: HFSTM-FIT • E-step: • $A_t(i) = P(O_1, O_2, \ldots, O_t, S_t = i)$ • $B_t(i) = P(O_{t+1}, \ldots, O_{T_u} \mid S_t = i)$ • $\gamma_t(i) = P(S_t = i \mid O_u)$ • M-step: • Other parameters such as state transition probabilities, topic distributions, etc. • Parameters learned:
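
The three E-step quantities have the form of the forward, backward, and posterior terms of a standard hidden Markov model. The sketch below computes them for a plain HMM; it is not HFSTM-FIT itself, which additionally handles topics and switch variables. The function name, argument layout, and toy numbers are assumptions.

```python
def forward_backward(init, trans, emit_probs):
    """Forward-backward pass for a plain HMM (no numerical scaling).

    init[i]          = P(S_1 = i)
    trans[i][j]      = P(S_{t+1} = j | S_t = i)
    emit_probs[t][i] = P(O_t | S_t = i) for the observed sequence
    """
    T, K = len(emit_probs), len(init)
    # Forward: A[t][i] = P(O_1..O_t, S_t = i)
    A = [[0.0] * K for _ in range(T)]
    for i in range(K):
        A[0][i] = init[i] * emit_probs[0][i]
    for t in range(1, T):
        for j in range(K):
            A[t][j] = emit_probs[t][j] * sum(
                A[t - 1][i] * trans[i][j] for i in range(K))
    # Backward: B[t][i] = P(O_{t+1}..O_T | S_t = i)
    B = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            B[t][i] = sum(trans[i][j] * emit_probs[t + 1][j] * B[t + 1][j]
                          for j in range(K))
    likelihood = sum(A[T - 1])
    # Posterior: gamma[t][i] = P(S_t = i | all observations)
    gamma = [[A[t][i] * B[t][i] / likelihood for i in range(K)]
             for t in range(T)]
    return A, B, gamma


# Toy example with three states (S, E, I) and three tweets; numbers are made up.
init = [0.6, 0.3, 0.1]
trans = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.1, 0.7]]
emit_probs = [[0.5, 0.3, 0.1], [0.2, 0.4, 0.3], [0.1, 0.2, 0.6]]
A, B, gamma = forward_backward(init, trans, emit_probs)
for t, row in enumerate(gamma):
    print(t, [round(g, 3) for g in row])
```

For long tweet sequences a practical version would rescale `A` and `B` at each step to avoid underflow.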

  26. A possible issue with HFSTM • Suffers from large, noisy vocabulary. • Semi-supervision for improvement • Introduce weak supervision into HFSTM.

  27. HFSTM-A • HFSTM-A(spect) • Introduce an aspect variable y, expressing our belief on whether a word is flu-related or not. • The value of y biases the switch variables s.t. flu-related words are more likely to be explained by state topics. • When the aspect value (y) is introduced, the switching probabilities are updated accordingly.

  28. Vocabulary & Dataset • Vocabulary (230 words): • Flu-related keyword list by Chakraborty SDM 2014 • Extra state-related keyword list • Dataset (34,000 tweets): • Identify infected users and collect their tweets • Train on data from Jun 20, 2013 to Aug 06, 2013 • Test on two time periods: • Dec 01, 2012 to July 08, 2013 • Nov 10, 2013 to Jan 26, 2014

  29. Learned word distributions • The most probable words learned in each state • S: Probably healthy • E: Having symptoms • I: Definitely sick

  30. Learned state transition • Transition probabilities • Transition in real tweets • Learned by HFSTM: Not directly flu-related, yet correctly identified

  31. Flu trend fitting • Ground-truth: • The Pan American Health Organization (PAHO) • Algorithms: • Baseline: • Count the number of keywords weekly as features, and regress to the ground-truth curve. • Google Flu Trends: • Take the Google Flu Trends data as input, and regress to the PAHO curve. • HFSTM: • Distinguish the different states of keywords, and use only the number of keywords in the I state. Again regress to PAHO.

  32. Flu trend fitting • Linear regression to the case count reported by PAHO (the ground-truth)
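
The regression step can be sketched as ordinary least squares with a single feature (the weekly count of I-state keywords) against the reported case counts. The numbers below are synthetic, constructed to lie exactly on a line, and are not data from the talk; the function and variable names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic weekly values (made up): cases = 2 * count + 5 exactly.
weekly_i_state_counts = [12, 30, 55, 41, 20]
reported_cases = [29, 65, 115, 87, 45]

slope, intercept = fit_line(weekly_i_state_counts, reported_cases)
predicted = [slope * c + intercept for c in weekly_i_state_counts]
print(slope, intercept)
```

The same fit applies to the baseline and Google Flu Trends inputs; only the feature on the x-axis changes.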

  33. HFSTM-A • Results are qualitatively similar to HFSTM when the vocabulary is 10 times larger.

  34. Outline • Motivation • Part 1: Learning Models (Empirical Studies) • Q1: How to predict Flu-trends better? • Q2: How does information evolve over time? • Q3: How do malware attacks evolve over time? • Conclusion

  35. Google Search Volume • e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date [Figure labels: (1) First spike, (2) Release date, (3) Two weeks before release; unknown regions marked “?”]

  36. Patterns [Figure: axes X and Y]

  37. Patterns • More Data [Figure: axes X and Y]

  38. Patterns • Anomaly? [Figure: axes X and Y]

  39. Patterns • Anomaly? • Extrapolation [Figure: axes X and Y]

  40. Patterns • Anomaly • Imputation • Extrapolation [Figure: axes X and Y]

  41. Patterns • Anomaly • Imputation • Compression • Extrapolation

  42. Rise and fall patterns in social media • Meme (# of mentions in blogs) • Short phrases sourced from U.S. politics in 2008 • “you can put lipstick on a pig” • “yes we can”

  43. Rise and fall patterns in social media • Can we find a unifying model, which includes these patterns? • four classes on YouTube [Crane et al. ’08] • six classes on Meme [Yang et al. ’11]

  44. Rise and fall patterns in social media • Answer: YES! • We can represent all patterns by a single model • In Matsubara, Sakurai, Prakash+ SIGKDD 2012

  45. Main idea - SpikeM • 1. Un-informed bloggers (uninformed about rumor), time $n = 0$ • 2. External shock at time $n_b$ (e.g., breaking news), time $n = n_b$ • 3. Infection (word-of-mouth), time $n = n_b + 1$ • Infectiveness of a blog-post at age $n$: • $\beta$: strength of infection (quality of news) • Decay function (how infective a blog posting is): power law
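
The three steps can be sketched as a small simulation. The slide's equations did not survive extraction, so the update rule below follows the published SpikeM-base formulation as best recalled and should be checked against the SIGKDD 2012 paper: the number of newly informed bloggers is the uninformed count times the summed infectiveness of earlier posts and the shock, with infectiveness decaying as a power law with exponent -1.5. All names and parameter values are assumptions.

```python
def spikem_base(N, beta, n_b, S_b, eps, steps):
    """Sketch of a SpikeM-style rise-and-fall process (no periodicity).

    N: total bloggers, beta: strength of infection,
    n_b: time of the external shock, S_b: size of the shock,
    eps: background noise, steps: number of time ticks to simulate.
    Returns delta_B, where delta_B[n] is the count newly informed at time n.
    """
    U = float(N)                      # un-informed bloggers
    delta_B = [0.0] * (steps + 1)
    for n in range(steps):
        pressure = 0.0
        for t in range(n_b, n + 1):   # empty before the shock
            shock = S_b if t == n_b else 0.0
            age = n + 1 - t
            # infectiveness of a post at this age: beta * age^(-1.5)
            pressure += (delta_B[t] + shock) * beta * age ** -1.5
        new = min(U * pressure + eps, U)  # cannot inform more than remain
        delta_B[n + 1] = new
        U -= new
    return delta_B


counts = spikem_base(N=1000, beta=0.001, n_b=5, S_b=10, eps=0.0, steps=50)
peak = max(range(len(counts)), key=counts.__getitem__)
print(peak, round(counts[peak], 1))
```

Nothing happens before the shock, activity rises by word-of-mouth once it arrives, and it falls again as the pool of un-informed bloggers is exhausted.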

  46. Power-law decay: -1.5 slope • J. G. Oliveira et al. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005) (also in Leskovec, McGlohon+, SDM 2007)

  47. Details: SpikeM with periodicity • Full equation of SpikeM • Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly) • 12pm: peak activity • 3am: low activity [Figure: activity vs. time n]
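
The equation itself was an image and is missing from this transcript. As a hedged reconstruction from the published SpikeM formulation (to be verified against the SIGKDD 2012 paper), the periodic model can be written as follows, where $\Delta B(n)$ is the number of newly informed bloggers at time $n$, $U(n)$ the number still un-informed, $S(t)$ the external shock (nonzero only at $t = n_b$), $\epsilon$ background noise, and $P_a$, $P_p$, $P_s$ the strength, period, and phase shift of the periodicity:

```latex
\begin{align}
\Delta B(n+1) &= p(n+1)\cdot\Big[\,U(n)\sum_{t=n_b}^{n}\big(\Delta B(t)+S(t)\big)\,f(n+1-t)+\epsilon\,\Big] \\
U(n+1) &= U(n)-\Delta B(n+1) \\
f(\tau) &= \beta\,\tau^{-1.5} \\
p(n) &= 1-\tfrac{1}{2}\,P_a\Big(\sin\!\Big(\tfrac{2\pi}{P_p}\,(n+P_s)\Big)+1\Big)
\end{align}
```

With $P_a = 0$ the factor $p(n)$ equals 1 and the model reduces to the non-periodic case.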

  48. Tail-part forecasts • SpikeM can capture the tail part

  49. “What-if” forecasting • e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date [Figure labels: (1) First spike, (2) Release date, (3) Two weeks before release; unknown regions marked “?”]

  50. “What-if” forecasting • SpikeM can forecast not only the tail part, but also the rise part! • SpikeM can forecast upcoming spikes [Figure labels: (1) First spike, (2) Release date, (3) Two weeks before release]
