
Understanding and Managing Cascades on Large Graphs

This seminar explores the dynamics of cascades on large graphs and the implications for various domains like epidemiology, viral marketing, and cybersecurity. Topics include policy and action algorithms, learning models, and edge manipulation techniques.

jcorona




Presentation Transcript


  1. Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012

  2. Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005] Prakash 2012

  3. Dynamical Processes over networks are also everywhere!

  4. Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • Localized effects: riots…

  5. Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks

  6. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize?

  7. Why do we care? (1: Epidemiology) ~6x fewer! [US-MEDICARE NETWORK 2005] CURRENT PRACTICE vs. OUR METHOD Hospital-acquired infections took 99K+ lives, cost $5B+ (all per year)

  8. Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users

  9. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks Buy Versace™! Followers Celebrity Social Media Marketing

  10. Why do we care? (3: To change the world?) • Dynamical Processes over networks Social networks and Collaborative Action

  11. High Impact – Multiple Settings epidemic out-breaks Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? products/viruses transmit s/w patches

  12. Research Theme ANALYSIS Understanding POLICY/ ACTION Managing DATA Large real-world networks & processes

  13. Research Theme – Public Health ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control out-breaks? DATA Modeling # patient transfers

  14. Research Theme – Social Media ANALYSIS # cascades in future? POLICY/ ACTION How to market better? DATA Modeling Tweets spreading

  15. In this talk Q1: How to immunize and control out-breaks better? Q2: How to find culprits of epidemics? POLICY/ ACTION Managing

  16. In this lecture Q3: What do cascades look like? Q4: How does activity evolve over time? DATA Large real-world networks & processes

  17. Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion

  18. Part 1: Algorithms • Q1: Whom to immunize? • Q2: How to detect culprits?

  19. Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos “Gelling, and Melting, Large Graphs by Edge Manipulation” in ACM CIKM 2012 (Best Paper Award) [Thanks to Hanghang Tong for some slides!]

  20. An Example: Flu/Virus Propagation [Figure legend: Sick, Healthy, Contact] 1: Sneeze to neighbors 2: Some neighbors → Sick 3: Try to recover Q: How to guide propagation by optimizing link structure? - Q1: Understand tipping point → existing work - Q2: Minimize the propagation - Q3: Maximize the propagation → This paper
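The three steps on this slide (sneeze to neighbors, some neighbors get sick, try to recover) describe a susceptible-infected-susceptible style process. A minimal simulation sketch follows; the function name `simulate_sis`, the per-step probabilities `beta` and `delta`, and the adjacency-dict input are assumptions made for illustration and do not come from the slides.

```python
import random

def simulate_sis(adj, beta, delta, seeds, steps, rng):
    """Run a discrete-time SIS-style process; return infected counts per step.

    adj   : dict mapping each node to a list of its contacts (assumed input format)
    beta  : chance that a sick node infects a given healthy contact in one step
    delta : chance that a sick node recovers in one step
    seeds : initially sick nodes
    rng   : a random.Random instance, passed in so runs are reproducible
    """
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        # Steps 1 and 2: every sick node "sneezes" on each healthy contact.
        newly_sick = set()
        for node in infected:
            for nbr in adj[node]:
                if nbr not in infected and rng.random() < beta:
                    newly_sick.add(nbr)
        # Step 3: every currently sick node tries to recover.
        recovered = {node for node in infected if rng.random() < delta}
        infected = (infected - recovered) | newly_sick
        history.append(len(infected))
    return history

if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(simulate_sis(path, beta=0.5, delta=0.2, seeds=[0], steps=10,
                       rng=random.Random(0)))
```

Changing which edges exist in `adj` changes how far the infection spreads, which is the lever the edge-manipulation questions Q2 and Q3 are about.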

  21. Vulnerability measure λ [ICDM 2011, PKDD 2010] λ is the epidemic threshold “Safe” “Vulnerable” “Deadly” Increasing λ Increasing vulnerability
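In the cited line of work, λ is computed as the largest eigenvalue of the graph's adjacency matrix. The sketch below estimates it with power iteration; the function names, the dense list-of-lists input, and the use of the SIS ratio beta/delta in `effective_strength` are assumptions for illustration, not the authors' code.

```python
def leading_eigenvalue(adj_matrix, iterations=200):
    """Estimate the largest eigenvalue of a non-negative adjacency matrix.

    Power iteration is run on A + I rather than A, so that bipartite graphs
    (whose spectrum is symmetric around zero) do not make the estimate
    oscillate; the shift of 1 is subtracted again at the end.
    """
    n = len(adj_matrix)
    if n == 0:
        return 0.0
    x = [1.0] * n
    shifted = 1.0
    for _ in range(iterations):
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n))
             for i in range(n)]
        shifted = max(y)            # converges to lambda + 1
        x = [val / shifted for val in y]
    return shifted - 1.0

def effective_strength(lam, beta, delta):
    """Assumed SIS-style score: values below 1 suggest the outbreak dies out."""
    return lam * beta / delta

if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    lam = leading_eigenvalue(triangle)
    print(lam, effective_strength(lam, beta=0.1, delta=0.5))
```

A larger λ means the same virus strength is more likely to sit above the tipping point, which matches the "increasing λ, increasing vulnerability" axis on the slide.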

  22. Minimizing Propagation: Edge Deletion • Given: a graph A, virus prop model and budget k; • Find: delete k ‘best’ edges from A to minimize λ Bad Good

  23. Q: How to find k best edges to delete efficiently? Score each edge by (left eigen-score of source) × (right eigen-score of target)
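One way to read this slide: score each edge by the left eigen-score of its source times the right eigen-score of its target, then delete the k highest-scoring edges. The sketch below follows that reading. The helper names, the edge-list input, and the shifted power iteration used to get the eigenvectors are assumptions, not the paper's implementation.

```python
def leading_vector(n, edges, transpose=False, iterations=200):
    """Leading eigenvector of A (or of A transposed) by power iteration on A + I.

    edges is a list of directed (source, target) pairs; an undirected graph
    lists each edge in both directions.
    """
    x = [1.0] * n
    for _ in range(iterations):
        y = list(x)                      # the "+ I" shift
        for src, dst in edges:
            if transpose:
                y[dst] += x[src]         # (A^T x)[dst] += x[src]
            else:
                y[src] += x[dst]         # (A x)[src] += x[dst]
        top = max(y)
        x = [val / top for val in y]
    return x

def best_edges_to_delete(n, edges, k):
    """Return the k edges with the largest u[source] * v[target] score."""
    u = leading_vector(n, edges, transpose=True)    # left eigen-scores
    v = leading_vector(n, edges, transpose=False)   # right eigen-scores
    ranked = sorted(edges, key=lambda e: u[e[0]] * v[e[1]], reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 0.
    undirected = [(0, 1), (0, 2), (1, 2), (0, 3)]
    edges = undirected + [(b, a) for a, b in undirected]
    print(best_edges_to_delete(4, edges, 4))
```

In the small example the pendant edge scores lowest, so the budget goes to edges inside the dense triangle, where removing a link lowers λ the most.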

  24. Minimizing Propagation: Evaluations [Plot: Log (Infected Ratio) vs. Time Ticks, with “Our Method” and the “better” direction marked] Data set: Oregon Autonomous System Graph (14K nodes, 61K edges)

  25. Discussions: Node Deletion vs. Edge Deletion • Observations: • Node or Edge Deletion → λ decreases • Edges of A = Nodes of its line graph L(A) [Figure: Original Graph A, Line Graph L(A)] • Questions: • Is Edge Deletion on A = Node Deletion on L(A)? • Which strategy is better (when both are feasible)?

  26. Discussions: Node Deletion vs. Edge Deletion • Q: Is Edge Deletion on A = Node Deletion on L(A)? • A: Yes! • But Node Deletion itself is not easy: Theorem (Line Graph Spectrum): relates the eigenvalue of A to the eigenvalue of L(A). Theorem (Hardness of Node Deletion): finding the optimal k-node immunization is NP-hard.
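The correspondence between A and L(A) can be made concrete with a small construction. The sketch below builds the line graph of an undirected graph: each edge of A becomes a node of L(A), and two such nodes are linked when the original edges share an endpoint. The undirected setting and the function name are assumptions for illustration; the paper may work with a directed variant.

```python
from itertools import combinations

def line_graph(edges):
    """Build L(A) for an undirected graph A given as a list of distinct edges.

    Returns a dict mapping each edge of A (as a sorted tuple) to the set of
    edges of A that share an endpoint with it.
    """
    nodes = [tuple(sorted(e)) for e in edges]
    lg = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):
        if set(e) & set(f):          # the two edges touch at some node of A
            lg[e].add(f)
            lg[f].add(e)
    return lg

if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]
    for node, nbrs in line_graph(path).items():
        print(node, sorted(nbrs))
```

Removing the node `(1, 2)` from this line graph is the same operation as deleting the edge between 1 and 2 in the original path.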

  27. Discussions: Node Deletion vs. Edge Deletion • Q: Which strategy is better (when both feasible)? • A: Edge Deletion > Node Deletion (better) Green: Node Deletion (e.g., shutdown a twitter account) Red: Edge Deletion (e.g., un-friend two users)

  28. Maximizing Propagation: Edge Addition • Given: a graph A, virus prop model and budget k; • Find: add k ‘best’ new edges into A. • By 1st order perturbation, we have $\lambda_S - \lambda \approx \mathrm{Gv}(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$, where $u(i_e)$ is the left eigen-score of the source and $v(j_e)$ is the right eigen-score of the target • So, we are done → but this needs $O(n^2 - m)$ complexity [Figure: Low Gv vs. High Gv]

  29. Maximizing Propagation: Edge Addition $\lambda_S - \lambda \approx \mathrm{Gv}(S) = c \sum_{e \in S} u(i_e)\, v(j_e)$ • Q: How to find k new edges w/ highest Gv(S)? • A: Modified Fagin’s algorithm: #1: Sorting Sources by u #2: Sorting Targets by v #3: Search space [Figure: both axes marked k and k+d; marked cells are existing edges] • Time Complexity: $O(m + nt + kt^2)$, t = max(k, d)
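The idea of sorting sources by u and targets by v and then searching only a small corner of the candidate space can be sketched as follows. This is a simplified pruning sketch, not the modified Fagin's algorithm from the paper: the window size `k + d + 1`, the meaning of `d` as a bound on node degree, and the function name are all assumptions.

```python
def best_edges_to_add(u, v, existing, k, d):
    """Pick k absent edges (i, j), with i != j, that maximize u[i] * v[j].

    u, v     : non-negative left and right eigen-scores, indexed by node
    existing : set of (source, target) pairs already present in the graph
    d        : assumed upper bound on any node's in-degree and out-degree
    """
    n = len(u)
    # Hypothetical window: k slots, plus d for existing edges, plus 1 for
    # the self-loop that is excluded below.
    window = min(n, k + d + 1)
    sources = sorted(range(n), key=lambda i: u[i], reverse=True)[:window]
    targets = sorted(range(n), key=lambda j: v[j], reverse=True)[:window]
    candidates = [
        (u[i] * v[j], i, j)
        for i in sources
        for j in targets
        if i != j and (i, j) not in existing
    ]
    candidates.sort(reverse=True)
    return [(i, j) for _, i, j in candidates[:k]]

if __name__ == "__main__":
    u = [0.9, 0.8, 0.5, 0.3, 0.2, 0.1]
    v = [0.7, 0.9, 0.6, 0.2, 0.1, 0.05]
    print(best_edges_to_add(u, v, existing={(0, 1), (1, 0)}, k=2, d=1))
```

Only `window * window` pairs are scored instead of all n² − m absent pairs, which is the saving the slide's search-space picture points at.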

  30. Maximizing Propagation: Evaluation Log (Infected Ratio) Our Method (better) Time Ticks

  31. Fractional Immunization of Networks B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos Under Submission

  32. Previously: Full Static Immunization Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal). k = 2 ? ?

  33. Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect

  34. Now: Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect

  35. Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect

  36. Fractional Asymmetric Immunization Drug-resistant Bacteria (like XDR-TB) Another Hospital Hospital

  37. Fractional Asymmetric Immunization = f Drug-resistant Bacteria (like XDR-TB) Another Hospital Hospital

  38. Fractional Asymmetric Immunization Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved? Another Hospital Hospital

  39. Our Algorithm “SMART-ALLOC” ~6x fewer! [US-MEDICARE NETWORK 2005] • Each circle is a hospital, ~3000 hospitals • More than 30,000 patients transferred CURRENT PRACTICE SMART-ALLOC

  40. Running Time Wall-Clock Time > 1 week ≈ > 30,000x speed-up! Lower is better 14 secs Simulations SMART-ALLOC

  41. Lower is better Experiments SECOND-LIFE PENN-NETWORK ~5 x ~2.5 x K = 200 K = 2000

  42. Part 1: Algorithms • Q1: Whom to immunize? • Q2: How to detect culprits?

  43. B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ in ICDM 2012, Brussels Prakash and Faloutsos 2012

  44. Culprits: Problem definition 2-d grid ‘+’ -> infected Who started it?

  45. Culprits: Problem definition 2-d grid ‘+’ -> infected Who started it? Prior work: [Lappas et al. 2010, Shah et al. 2011]

  46. Culprits: Exoneration

  47. Culprits: Exoneration

  48. Who are the culprits • Two-part solution • use MDL for number of seeds • for a given number: • exoneration = centrality + penalty • Running time = • linear! (in edges and nodes)

  49. Modeling using MDL • Minimum Description Length Principle == Induction by compression • Related to Bayesian approaches • MDL = Model + Data • Model • Scoring the seed-set: Number of possible |S|-sized sets + Encoding integer |S|
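The model half of the MDL score names two ingredients: the cost of encoding the integer |S| and the cost of picking one of the possible |S|-sized node sets. A hedged sketch of that cost in bits is below. It assumes Rissanen's universal code for integers and a uniform choice among all subsets of that size; the function names are hypothetical, and the data half (the propagation ripples) is not covered.

```python
from math import comb, log2

def universal_integer_code_length(z):
    """Bits needed for an integer z >= 1 under Rissanen's universal code.

    L_N(z) = log2(c0) + log2(z) + log2(log2(z)) + ... (positive terms only),
    with the normalizing constant c0 of about 2.865064.
    """
    if z < 1:
        raise ValueError("z must be a positive integer")
    bits = log2(2.865064)
    term = log2(z)
    while term > 0:
        bits += term
        term = log2(term)
    return bits

def seed_set_model_cost(num_nodes, num_seeds):
    """Assumed model cost: encode |S|, then which |S| of the nodes are seeds."""
    return (universal_integer_code_length(num_seeds)
            + log2(comb(num_nodes, num_seeds)))

if __name__ == "__main__":
    for s in (1, 2, 3):
        print(s, round(seed_set_model_cost(100, s), 3))
```

Because the model cost grows with |S|, an extra seed is only worth adding when it shortens the description of the observed infections by more than it costs, which is how MDL picks the number of culprits.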

  50. Modeling using MDL • Data: Propagation Ripples Infected Snapshot Original Graph Ripple R1 Ripple R2
