B. Aditya Prakash Computer Science Virginia Tech.

Understanding, Predicting and Managing Behaviors using Propagation: From Flu-trends to Cyber-Security. Fidelis Cybersecurity, Sept 26, 2016.

Presentation Transcript


  1. Understanding, Predicting and Managing Behaviors using Propagation: From Flu-trends to Cyber-Security B. Aditya Prakash Computer Science Virginia Tech. Fidelis Cybersecurity, Sept 26, 2016

  2. Thanks! • Abhishek Sharma Prakash 2016

  3. Networks are everywhere! • Facebook Network [2010] • Gene Regulatory Network [Decourty 2008] • Human Disease Network [Barabasi 2007] • The Internet [2005]

  4. Dynamical Processes over networks are also everywhere!

  5. Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ...

  6. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Diseases over contact networks (SI Model) • CDC data [AJPH 2007]: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts
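The SI (susceptible-infected) model on this slide can be sketched as a discrete-time simulation: at each step, every infected node independently infects each susceptible neighbour with probability beta, and nobody ever recovers. This is a minimal illustration under our own assumptions (dictionary adjacency lists, synchronous updates, the hypothetical function name `simulate_si`), not the code behind the CDC visualization.

```python
import random


def simulate_si(adj, seeds, beta, steps, rng=None):
    """Discrete-time SI simulation on an undirected contact network (illustrative sketch).

    adj   -- dict mapping each node to a list of its neighbours
    seeds -- initially infected nodes
    beta  -- per-contact, per-step transmission probability
    Returns the infected-set size after each step, including step 0.
    """
    rng = rng or random.Random(0)
    infected = set(seeds)
    sizes = [len(infected)]
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                # each infected-susceptible contact transmits independently
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly  # SI: no recovery, so the infected set only grows
        sizes.append(len(infected))
    return sizes


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(simulate_si(path, seeds=[0], beta=1.0, steps=4))  # [1, 2, 3, 4, 5]
```

With beta = 1 the infection simply advances one hop per step along the path; with smaller beta the curve is random but, because SI has no recovery, never decreases.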

  7. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] • Problem: Given k units of disinfectant, whom to immunize?
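The slide states the allocation problem but not the method used to solve it. As a hedged illustration only, the sketch below implements a common greedy heuristic rather than the talk's algorithm: repeatedly immunize (remove) the node whose removal most lowers the largest eigenvalue of the adjacency matrix, a quantity that governs the epidemic threshold of many spreading models. The function names and the shifted power iteration are our own assumptions.

```python
import math


def leading_eigenvalue(adj, removed=frozenset(), iters=100):
    """Largest adjacency eigenvalue of the graph without `removed` nodes.

    Power iteration on (A + I): the +I shift keeps bipartite graphs (stars,
    paths) from oscillating; the Rayleigh quotient x^T A x gives the estimate.
    """
    nodes = [u for u in adj if u not in removed]
    if not nodes:
        return 0.0
    x = {u: 1.0 / math.sqrt(len(nodes)) for u in nodes}
    for _ in range(iters):
        y = {u: x[u] + sum(x[v] for v in adj[u] if v not in removed) for u in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: val / norm for u, val in y.items()}
    # Rayleigh quotient with a unit-norm x
    return sum(x[u] * sum(x[v] for v in adj[u] if v not in removed) for u in nodes)


def greedy_immunize(adj, k):
    """Hypothetical greedy allocation of k immunization units."""
    removed = set()
    for _ in range(k):
        candidates = [u for u in adj if u not in removed]
        if not candidates:
            break
        # pick the node whose removal leaves the smallest leading eigenvalue
        best = min(candidates,
                   key=lambda u: leading_eigenvalue(adj, frozenset(removed | {u})))
        removed.add(best)
    return removed


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(round(leading_eigenvalue(star), 4))  # 2.0
print(greedy_immunize(star, k=1))          # {0}
```

On the star graph the greedy choice is the hub: removing it leaves no edges (eigenvalue 0), while removing a leaf only lowers the eigenvalue from 2 to about 1.73. Each greedy step re-runs power iteration once per candidate, so this sketch is only practical for small graphs.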

  8. Why do we care? (1: Epidemiology) • Current practice vs. our method [US-MEDICARE NETWORK 2005]: ~6x fewer! • Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

  9. Why do we care? (2: Online Diffusion) • > 800m users, ~$1B revenue [WSJ 2010] • ~100m active users • > 50m users

  10. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks • Social Media Marketing: a celebrity posts “Buy Versace™!” to their followers

  11. Why do we care? (3: To change the world?) • Dynamical Processes over networks • Social networks and Collaborative Action

  12. High Impact – Multiple Settings • Epidemic outbreaks • Products/viruses • Transmitting s/w patches • Q. How to squash rumors faster? • Q. How do opinions spread? • Q. How to market better?

  13. Research Theme • DATA: Large real-world networks & processes • ANALYSIS: Understanding • POLICY/ACTION: Managing/Utilizing

  14. Research Theme – Public Health • DATA: # patient transfers • Modeling • ANALYSIS: Will an epidemic happen? • POLICY/ACTION: How to control outbreaks?

  15. Research Theme – Social Media • DATA: Tweets spreading • Modeling • ANALYSIS: # cascades in future? • POLICY/ACTION: How to market better?

  16. In this talk • Q1: How to predict Flu-trends better? • Q2: How does information evolve over time? • DATA: Large real-world networks & processes

  17. In this talk • Q3: How do malware attacks evolve over time? • DATA: Large real-world networks & processes

  18. Outline • Motivation • Part 1: Learning Models (Empirical Studies) • Part 2: Policy and Action (Algorithms) • Conclusion

  19. Part 1 • Part 1: Learning Models (Empirical Studies) • Q1: How to predict Flu-trends better? • Q2: How does information evolve over time? • Q3: How do malware attacks evolve over time?

  20. Surveillance [Chen et al. ICDM 2014] • How to estimate and predict flu trends? • Sources: surveillance reports, hospital records, lab surveys, population surveys

  21. GFT & Twitter • Estimate flu trends using online electronic sources (GFT: Google Flu Trends) • Example tweets: “So cold today, I’m catching cold.” “I have headache, sore throat, I can’t go to school today.” “My nose is totally congested, I have a hard time understanding what I’m saying.”

  22. Observation 1: States • There are different states in an infection cycle. • SEIR model: 1. Susceptible 2. Exposed 3. Infected 4. Recovered
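To make the four states concrete, here is the textbook SEIR compartmental model in discrete time. It is a hedged sketch and not part of HFSTM: the rate names beta (transmission), sigma (E to I, one over the incubation period) and gamma (recovery), the one-day forward-Euler step, and the function name `seir` are our assumptions.

```python
def seir(n_total, i0, beta, sigma, gamma, days):
    """Forward-Euler SEIR with one-day steps; returns (S, E, I, R) per day."""
    s, e, i, r = float(n_total - i0), 0.0, float(i0), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        new_exposed = beta * s * i / n_total  # S -> E: contact with an infected person
        new_infected = sigma * e              # E -> I: end of the incubation period
        new_recovered = gamma * i             # I -> R: recovery
        s -= new_exposed
        e += new_exposed - new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


h = seir(1000, 10, beta=0.5, sigma=0.2, gamma=0.1, days=100)
peak_day = max(range(len(h)), key=lambda d: h[d][2])
print("infected peak on day", peak_day)
```

Every flow leaves one compartment and enters the next, so S + E + I + R stays equal to the population size; the "E" compartment is what separates SEIR from the simpler SI model.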

  23. Observation 2: Epidemiology vs. Social Media Gap • Infection cases drop exponentially in epidemiology (Hethcote 2000) • Keyword mentions drop following a power law in social media (Matsubara 2012)
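The gap between the two observations is easiest to see on log scales: an exponential decay c0·e^(-rate·n) is a straight line on a semi-log plot, while a power law c0·n^(-alpha) is a straight line on a log-log plot and has a much heavier tail. The sketch below checks both slopes numerically; rate = 0.2 is arbitrary, and alpha = 1.5 is chosen only to match the -1.5 slope cited later in the talk.

```python
import math


def exponential_decay(n, c0, rate):
    """Epidemiological pattern: c(n) = c0 * exp(-rate * n)."""
    return c0 * math.exp(-rate * n)


def power_law_decay(n, c0, alpha):
    """Social-media pattern: c(n) = c0 * n ** (-alpha), for n >= 1."""
    return c0 * n ** (-alpha)


def slope(xs, ys):
    """Slope of the straight line through two points."""
    return (ys[1] - ys[0]) / (xs[1] - xs[0])


n1, n2 = 10, 100
# exponential decay: straight line of log(y) against n (semi-log plot)
semi_log = slope([n1, n2], [math.log(exponential_decay(n, 1000, 0.2)) for n in (n1, n2)])
# power law: straight line of log(y) against log(n) (log-log plot)
log_log = slope([math.log(n1), math.log(n2)],
                [math.log(power_law_decay(n, 1000, 1.5)) for n in (n1, n2)])
print(round(semi_log, 6), round(log_log, 6))  # -0.2 -1.5
```

At n = 100 the exponential curve has fallen to about 2e-6 while the power law is still at 1.0, which is why a model built for exponentially decaying case counts misfits the long tail of keyword mentions.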

  24. Details: HFSTM Model • Hidden Flu-State from Tweet Model (HFSTM) • Each word (w) in a tweet (O_i) can be generated by: • A background topic • Non-flu related topics • State related topics • Model variables: latent state, initial probability, transition switch, transition probability, binary non-flu related switch, binary background switch, word distribution

  25. HFSTM Model • Generating tweets: first generate the state for a tweet, then generate the topic for each word • State: [S, E, I] • Topic: [Background, Non-flu, State] • Examples: S: “This restaurant is really good” E: “The movie was good but was it freezing” I: “I think I have flu”
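The generative story on this slide can be sketched as follows, under heavy simplifying assumptions of our own: a single non-flu topic instead of several, no transition switch (every tweet after the first draws its state from the transition matrix), a fixed tweet length, and toy probabilities. All names (`generate_user_tweets`, the keys of `params`) are hypothetical and not taken from the HFSTM paper.

```python
import random


def sample(rng, dist):
    """Draw one outcome from a {outcome: probability} dict."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def generate_user_tweets(params, n_tweets, words_per_tweet, rng):
    """Simplified HFSTM-style sampler: returns [(state, [(word, topic), ...]), ...]."""
    tweets = []
    state = sample(rng, params["initial"])
    for t in range(n_tweets):
        if t > 0:
            # 1) generate the state for this tweet (Markov chain over S, E, I)
            state = sample(rng, params["transition"][state])
        words = []
        for _ in range(words_per_tweet):
            # 2) generate the topic for each word via the two binary switches
            if rng.random() < params["p_background"]:
                topic, word = "background", sample(rng, params["background"])
            elif rng.random() < params["p_nonflu"]:  # conditional on "not background"
                topic, word = "non-flu", sample(rng, params["nonflu"])
            else:
                topic, word = state, sample(rng, params["state_words"][state])
            words.append((word, topic))
        tweets.append((state, words))
    return tweets


toy_params = {  # toy numbers for illustration only
    "initial": {"S": 0.8, "E": 0.15, "I": 0.05},
    "transition": {"S": {"S": 0.8, "E": 0.2},
                   "E": {"E": 0.5, "I": 0.4, "S": 0.1},
                   "I": {"I": 0.6, "S": 0.4}},
    "p_background": 0.3,
    "p_nonflu": 0.3,
    "background": {"the": 0.5, "was": 0.5},
    "nonflu": {"movie": 0.5, "restaurant": 0.5},
    "state_words": {"S": {"good": 1.0},
                    "E": {"freezing": 0.6, "headache": 0.4},
                    "I": {"flu": 0.5, "fever": 0.5}},
}
for state, words in generate_user_tweets(toy_params, 3, 4, random.Random(42)):
    print(state, [w for w, _ in words])
```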

  26. Details: Inference • EM-based algorithm: HFSTM-FIT • E-step (O_u is the tweet sequence of user u, of length T_u): • $A_t(i) = P(O_1, O_2, \ldots, O_t, S_t = i)$ • $B_t(i) = P(O_{t+1}, \ldots, O_{T_u} \mid S_t = i)$ • $\gamma_t(i) = P(S_t = i \mid O_u)$ • M-step: • Other parameters, such as state transition probabilities, topic distributions, etc. • Parameters learned:
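The E-step quantities above are the standard forward-backward recursions for a hidden Markov model over a user's tweet sequence. The sketch below computes them for a generic HMM; `emission[t][i]` stands in for P(O_t | S_t = i), which in HFSTM would come from the word-level mixture of background, non-flu and state topics. The function name, the toy numbers, and the absence of scaling (long sequences would need scaled or log-space recursions to avoid underflow) are our assumptions.

```python
def forward_backward(initial, transition, emission):
    """initial[i] = P(S_1 = i); transition[i][j] = P(S_{t+1} = j | S_t = i);
    emission[t][i] = P(O_t | S_t = i).  Returns (A, B, gamma, likelihood)."""
    T, K = len(emission), len(initial)
    A = [[0.0] * K for _ in range(T)]  # A[t][i] = P(O_1..O_t, S_t = i)
    B = [[0.0] * K for _ in range(T)]  # B[t][i] = P(O_{t+1}..O_T | S_t = i)
    for i in range(K):
        A[0][i] = initial[i] * emission[0][i]
    for t in range(1, T):
        for j in range(K):
            A[t][j] = emission[t][j] * sum(A[t - 1][i] * transition[i][j] for i in range(K))
    B[T - 1] = [1.0] * K
    for t in range(T - 2, -1, -1):
        for i in range(K):
            B[t][i] = sum(transition[i][j] * emission[t + 1][j] * B[t + 1][j] for j in range(K))
    likelihood = sum(A[T - 1])  # P(O_u)
    # gamma[t][i] = P(S_t = i | O_u): the posterior used in the M-step
    gamma = [[A[t][i] * B[t][i] / likelihood for i in range(K)] for t in range(T)]
    return A, B, gamma, likelihood


states = ["S", "E", "I"]
initial = [0.8, 0.15, 0.05]
transition = [[0.8, 0.2, 0.0],
              [0.1, 0.5, 0.4],
              [0.4, 0.0, 0.6]]
emission = [[0.5, 0.2, 0.1],   # toy per-tweet likelihoods under S, E, I
            [0.1, 0.4, 0.2],
            [0.05, 0.2, 0.6]]
_, _, gamma, likelihood = forward_backward(initial, transition, emission)
for t, row in enumerate(gamma):
    print(t, states[max(range(3), key=row.__getitem__)], [round(p, 3) for p in row])
```

Each row of gamma sums to 1, because the sum over i of A_t(i)·B_t(i) equals the sequence likelihood at every t.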

  27. A possible issue with HFSTM • Suffers from a large, noisy vocabulary. • Semi-supervision for improvement • Introduce weak supervision into HFSTM.

  28. HFSTM-A • HFSTM-A(spect) • Introduce an aspect variable y, expressing our belief about whether a word is flu-related or not. • The value of y biases the switch variables so that flu-related words are more likely to be explained by state topics. When the aspect value (y) is introduced, the switching probabilities are updated accordingly.

  29. Vocabulary & Dataset • Vocabulary (230 words): • Flu-related keyword list by Chakraborty SDM 2014 • Extra state-related keyword list • Dataset (34,000 tweets): • Identify infected users and collect their tweets • Train on data from Jun 20, 2013 – Aug 06, 2013 • Test on two time periods: • Dec 01, 2012 – July 08, 2013 • Nov 10, 2013 – Jan 26, 2014

  30. Learned word distributions • The most probable words learned in each state • S: Probably healthy • E: Having symptoms • I: Definitely sick

  31. Learned state transitions • Transition probabilities • Transitions in real tweets • Learned by HFSTM: not directly flu-related, yet correctly identified

  32. Flu trend fitting • Ground truth: • Case counts from the Pan American Health Organization (PAHO) • Algorithms: • Baseline: • Count the number of keywords per week as features, and regress onto the ground-truth curve. • Google Flu Trends: • Take the Google Flu Trends data as input, and regress onto the PAHO curve. • HFSTM: • Distinguish the different states of keywords, and use only the number of keywords in the I state. Again regress onto PAHO.

  33. Flu trend fitting • Linear regression onto the case counts reported by PAHO (the ground truth)
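A minimal sketch of the HFSTM-style pipeline on slides 32 and 33: count the flu keywords decoded as I-state per week, then fit a one-feature least-squares line to the ground-truth case counts. The mention list and the `paho_cases` values are made-up toy numbers, not PAHO data, and the function names are hypothetical.

```python
from collections import Counter


def weekly_state_counts(mentions, state="I"):
    """mentions: (week, decoded_state) pairs, one per flu-keyword occurrence.
    Returns the sorted weeks and, for each week, the number of mentions in `state`."""
    weeks = sorted({week for week, _ in mentions})
    counts = Counter(week for week, s in mentions if s == state)
    return weeks, [counts[w] for w in weeks]


def fit_line(x, y):
    """Ordinary least squares for y ~ a * x + b with a single feature."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


mentions = [(1, "I"), (1, "S"), (2, "I"), (2, "I"), (2, "E"), (3, "I"), (3, "I"), (3, "I")]
weeks, x = weekly_state_counts(mentions)  # only I-state mentions are counted
paho_cases = [12.0, 21.0, 33.0]           # hypothetical ground truth, illustration only
a, b = fit_line(x, paho_cases)
print(weeks, x, [round(a * xi + b, 2) for xi in x])
```

The baseline on slide 32 would differ only in counting every keyword mention regardless of its decoded state.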

  34. HFSTM-A • Results are qualitatively similar to those of HFSTM, even when the vocabulary is 10 times larger.

  35. Part 1 • Part 1: Learning Models (Empirical Studies) • Q1: How to predict Flu-trends better? • Q2: How does information evolve over time? • Q3: How do malware attacks evolve over time?

  36. Google Search Volume • E.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume before each release date, forecast the volume after release (? ?) • Figure labels: (1) First spike, (2) Release date, (3) Two weeks before release

  37. Patterns [plot: Y vs. X]

  38. Patterns [plot: Y vs. X] • More data

  39. Patterns [plot: Y vs. X] • Anomaly?

  40. Patterns [plot: Y vs. X] • Anomaly? • Extrapolation

  41. Patterns [plot: Y vs. X] • Anomaly • Imputation • Extrapolation

  42. Patterns • Anomaly • Imputation • Compression • Extrapolation

  43. Rise and fall patterns in social media • Memes (# of mentions in blogs): short phrases sourced from U.S. politics in 2008, e.g., “you can put lipstick on a pig”, “yes we can”

  44. Rise and fall patterns in social media • Can we find a unifying model that includes all these patterns? • Four classes on YouTube [Crane et al. ’08] • Six classes on memes [Yang et al. ’11]

  45. Rise and fall patterns in social media • Answer: YES! • We can represent all patterns with a single model [Matsubara, Sakurai, Prakash+ SIGKDD 2012]

  46. Main idea: SpikeM • 1. Uninformed bloggers (uninformed about the rumor) • 2. External shock at time n_b (e.g., breaking news) • 3. Infection (word-of-mouth) with strength β • Figure: time n = 0, time n = n_b, time n = n_b + 1 • Infectiveness of a blog post at age n: • Strength of infection (quality of news) • Decay function (how infective a blog post remains over time): a power law
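The ingredients above combine into the SpikeM recurrence. The equations below follow our reading of Matsubara, Sakurai, Prakash+ (SIGKDD 2012) and should be checked against the paper: N is the total number of bloggers, U(n) the uninformed bloggers at time n, ΔB(n) the bloggers newly informed at time n, S_b the strength of the external shock at n_b, β the strength of infection, and ε a background noise term. The decay function uses the -1.5 power-law exponent discussed next.

```latex
\begin{aligned}
\Delta B(n+1) &= U(n) \sum_{t=n_b}^{n} \bigl(\Delta B(t) + S(t)\bigr)\, f(n+1-t) + \varepsilon \\
U(n+1) &= U(n) - \Delta B(n+1), \qquad U(0) = N \\
f(n) &= \beta\, n^{-1.5} \\
S(t) &= \begin{cases} S_b & t = n_b \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
```

Each past post (plus the shock itself) keeps "infecting" the remaining uninformed bloggers, but with an influence that fades as a power law of its age.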

  47. Power-law decay, -1.5 slope [J. G. Oliveira et al. Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005)] (also in Leskovec, McGlohon+, SDM 2007)

  48. Details: SpikeM with periodicity • Full equation of SpikeM with periodicity • Bloggers change their activity over time (e.g., daily, weekly, yearly) • Figure: activity over time n, with peak activity at 12pm and low activity at 3am
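The SpikeM recurrence with its periodic activity factor can be simulated directly. This is a hedged sketch: the factor p(n) = 1 − ½·P_a·(sin(2π/P_p·(n + P_s)) + 1), with amplitude P_a, period P_p and phase shift P_s, multiplies each step's new infections and follows our reading of the SpikeM paper; the parameter names are ours, and the `min` clamp is our own safeguard rather than part of the model.

```python
import math


def spikem(n_total, n_b, s_b, beta, steps, eps=0.0,
           p_a=0.0, p_p=24.0, p_s=0.0, alpha=1.5):
    """Simulate SpikeM; returns d_b[n], the bloggers newly informed at step n."""
    def f(age):  # infectiveness of a post of the given age: beta * age^(-1.5)
        return beta * age ** (-alpha)

    def p(n):    # periodic activity factor, always in [1 - p_a, 1]
        return 1.0 - 0.5 * p_a * (math.sin(2.0 * math.pi / p_p * (n + p_s)) + 1.0)

    def shock(t):  # external shock of strength s_b at time n_b only
        return s_b if t == n_b else 0.0

    d_b = [0.0] * steps
    uninformed = float(n_total)  # U(0) = N
    for n in range(steps - 1):
        # posts written at times n_b..n (an empty sum before the shock)
        pressure = sum((d_b[t] + shock(t)) * f(n + 1 - t) for t in range(n_b, n + 1))
        new = p(n + 1) * (uninformed * pressure + eps)
        new = min(new, uninformed)  # our safeguard: never exceed the remaining pool
        d_b[n + 1] = new
        uninformed -= new  # U(n+1) = U(n) - dB(n+1)
    return d_b


d = spikem(1000, n_b=5, s_b=100.0, beta=0.0002, steps=60)
d_periodic = spikem(1000, n_b=5, s_b=100.0, beta=0.0002, steps=60, p_a=0.5, p_p=24.0)
print([round(v, 1) for v in d[:12]])
```

With p_a = 0 the factor is 1 and the base recurrence is recovered: nothing happens until the shock, the response peaks at n_b + 1, and the tail then decays slowly because of the power-law kernel. A non-zero p_a superimposes daily-style oscillations on that tail.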

  49. Tail-part forecasts • SpikeM can capture the tail part

  50. “What-if” forecasting • E.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume before each release date, forecast the volume after release (? ?) • Figure labels: (1) First spike, (2) Release date, (3) Two weeks before release
