This seminar explores the dynamics of cascades on large graphs and the implications for various domains like epidemiology, viral marketing, and cybersecurity. Topics include policy and action algorithms, learning models, and edge manipulation techniques.
Understanding and Managing Cascades on Large Graphs B. Aditya Prakash Computer Science Virginia Tech. CS Seminar 11/30/2012
Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005] Prakash 2012
Dynamical Processes over networks are also everywhere! Prakash 2012
Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • Localized effects: riots…
Why do we care? (1: Epidemiology) • Dynamical Processes over networks [AJPH 2007] CDC data: Visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts Diseases over contact networks Prakash 2012
Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize? Prakash 2012
Why do we care? (1: Epidemiology) ~6x fewer! [US-MEDICARE NETWORK 2005] CURRENT PRACTICE OUR METHOD Hospital-acquired inf. took 99K+ lives, cost $5B+ (all per year) Prakash 2012
Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users Prakash 2012
Why do we care? (2: Online Diffusion) • Dynamical Processes over networks Buy Versace™! Followers Celebrity Social Media Marketing Prakash 2012
Why do we care? (4: To change the world?) • Dynamical Processes over networks Social networks and Collaborative Action Prakash 2012
High Impact – Multiple Settings epidemic out-breaks Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better? products/viruses transmit s/w patches Prakash 2012
Research Theme ANALYSIS Understanding POLICY/ ACTION Managing DATA Large real-world networks & processes Prakash 2012
Research Theme – Public Health ANALYSIS Will an epidemic happen? POLICY/ ACTION How to control out-breaks? DATA Modeling # patient transfers Prakash 2012
Research Theme – Social Media ANALYSIS # cascades in future? POLICY/ ACTION How to market better? DATA Modeling Tweets spreading Prakash 2012
In this talk Q1: How to immunize and control out-breaks better? Q2: How to find culprits of epidemics? POLICY/ ACTION Managing Prakash 2012
In this talk Q3: What do cascades look like? Q4: How does activity evolve over time? DATA Large real-world networks & processes
Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion Prakash 2012
Part 1: Algorithms • Q1: Whom to immunize? • Q2: How to detect culprits? Prakash 2012
Hanghang Tong, B. Aditya Prakash, Tina Eliassi-Rad, Michalis Faloutsos, Christos Faloutsos “Gelling, and Melting, Large Graphs by Edge Manipulation” in ACM CIKM 2012 (Best Paper Award) [Thanks to Hanghang Tong for some slides!]
An Example: Flu/Virus Propagation [Figure legend: Sick, Healthy, Contact] 1: Sneeze to neighbors 2: Some neighbors become sick 3: Try to recover Q: How to guide propagation by optimizing the link structure? - Q1: Understand the tipping point (existing work) - Q2: Minimize the propagation (this paper) - Q3: Maximize the propagation (this paper)
Vulnerability measure λ [ICDM 2011, PKDD 2010] λ, the largest eigenvalue of the adjacency matrix, determines the epidemic threshold “Safe” “Vulnerable” “Deadly” Increasing λ ⇒ increasing vulnerability
Minimizing Propagation: Edge Deletion • Given: a graph A, virus prop model and budget k; • Find: delete k ‘best’ edges from A to minimize λ Bad Good
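Q: How to find k best edges to delete efficiently? A: Score each edge e = (i_e, j_e) by u(i_e) v(j_e), the left eigen-score of its source times the right eigen-score of its target, and delete the k highest-scoring edges.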
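The eigen-score ranking on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names `leading_eigs` and `best_edges_to_delete` and the toy graph are assumptions made here, and a dense eigen-solver stands in for whatever sparse method a large graph would need.

```python
import numpy as np

def leading_eigs(A):
    """Leading eigenvalue of A with its left (u) and right (v) eigenvectors."""
    vals, vecs = np.linalg.eig(A)
    top = np.argmax(vals.real)
    lam = float(vals[top].real)
    v = np.abs(vecs[:, top].real)                       # right: A v = lam v
    vals_t, vecs_t = np.linalg.eig(A.T)
    u = np.abs(vecs_t[:, np.argmax(vals_t.real)].real)  # left: u A = lam u
    return lam, u, v

def best_edges_to_delete(A, k):
    """Return the k existing edges (i, j) with the largest u[i] * v[j]."""
    _, u, v = leading_eigs(A)
    src, dst = np.nonzero(A)            # every existing directed edge
    scores = u[src] * v[dst]            # left score of source * right score of target
    keep = np.argsort(-scores)[:k]
    return [(int(src[e]), int(dst[e])) for e in keep]

# Toy graph (hypothetical): a 4-clique on nodes 0-3 with a path 3-4-5 attached.
# Each undirected edge is stored in both directions.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

lam_before, _, _ = leading_eigs(A)
chosen = best_edges_to_delete(A, k=2)
B = A.copy()
for i, j in chosen:
    B[i, j] = 0.0                       # delete the selected directed edges
lam_after, _, _ = leading_eigs(B)
print(chosen, lam_before, lam_after)
```

On this toy graph the selected edges fall inside the clique, where the eigen-scores are largest, and λ drops after deletion.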
Minimizing Propagation: Evaluations [Plot: Log (Infected Ratio) vs. Time Ticks, including Our Method, with the better direction marked] Data set: Oregon Autonomous System Graph (14K nodes, 61K edges)
Discussions: Node Deletion vs. Edge Deletion • Observations: • Node or edge deletion ⇒ λ decreases • Edges of A = nodes of its line graph L(A) [Figure: original graph A and its line graph L(A)] • Questions: • Is edge deletion on A = node deletion on L(A)? • Which strategy is better (when both are feasible)?
Discussions: Node Deletion vs. Edge Deletion • Q: Is Edge Deletion on A = Node Deletion on L(A)? • A: Yes! • But Node Deletion itself is not easy: Theorem (Line Graph Spectrum): relates the eigenvalues of A to the eigenvalues of L(A). Theorem (Hardness of Node Deletion): finding the optimal k-node immunization is NP-hard.
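The correspondence between edge deletion on A and node deletion on L(A) can be checked numerically on a small example. The sketch below assumes the directed line-graph construction (edge e points to edge f when the target of e is the source of f); the slides do not spell out the construction, so treat that choice, the function names, and the toy graph as assumptions. Under this construction A and L(A) share their nonzero eigenvalues, which follows from the standard fact that matrix products XY and YX have the same nonzero spectrum.

```python
import numpy as np

def directed_line_graph(A):
    """Adjacency matrix of the directed line graph L(A), plus A's edge list."""
    edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(A))]
    m = len(edges)
    L = np.zeros((m, m))
    for a, (_, tgt) in enumerate(edges):
        for b, (src, _) in enumerate(edges):
            if tgt == src:              # edge a ends where edge b starts
                L[a, b] = 1.0
    return L, edges

def spectral_radius(M):
    """Largest eigenvalue magnitude of M."""
    return float(np.abs(np.linalg.eigvals(M)).max())

# Toy graph (hypothetical): triangle 0-1-2 with a pendant node 3 attached to 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L, edges = directed_line_graph(A)

# Delete edge number e from A, and separately delete node number e from L(A).
e = 3
A_minus = A.copy()
A_minus[edges[e]] = 0.0
L_of_A_minus, _ = directed_line_graph(A_minus)
L_minus_node = np.delete(np.delete(L, e, axis=0), e, axis=1)

print(spectral_radius(A), spectral_radius(L))
print(np.array_equal(L_of_A_minus, L_minus_node))
```

The second check shows the sense in which the two operations coincide: removing one edge from A yields exactly the line graph obtained by removing the matching node from L(A).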
Discussions: Node Deletion vs. Edge Deletion • Q: Which strategy is better (when both are feasible)? • A: Edge Deletion > Node Deletion [Plot legend: green = Node Deletion (e.g., shut down a Twitter account); red = Edge Deletion (e.g., un-friend two users)]
Maximizing Propagation: Edge Addition • Given: a graph A, virus prop model and budget k; • Find: add k ‘best’ new edges into A. • By 1st-order perturbation, we have λ_S − λ ≈ Gv(S) = c Σ_{e∈S} u(i_e) v(j_e), where u(i_e) is the left eigen-score of the source and v(j_e) is the right eigen-score of the target of edge e • So, are we done? Not yet: scoring every candidate new edge needs O(n² − m) complexity [Figure: low Gv vs. high Gv]
Maximizing Propagation: Edge Addition λ_S − λ ≈ Gv(S) = c Σ_{e∈S} u(i_e) v(j_e) • Q: How to find k new edges with the highest Gv(S)? • A: Modified Fagin’s algorithm #1: Sort sources by u #2: Sort targets by v #3: Search space: the top (k+d) sources × the top (k+d) targets, skipping existing edges Time complexity: O(m + nt + kt²), t = max(k, d)
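The idea of sorting by eigen-score and searching only a small block of top sources and top targets can be illustrated as follows. This is a simplified restricted search, not the modified Fagin's algorithm from the slide, and it does not reproduce the stated time complexity. It assumes that d is the maximum node degree (the slides do not define d), uses one extra candidate per side to allow for the excluded self-pair, and relies on nonnegative eigen-scores. All names and the toy graph are hypothetical.

```python
import numpy as np

def top_k_new_edges(A, u, v, k):
    """Pick k non-existing edges (i, j), i != j, with the largest u[i] * v[j]."""
    n = A.shape[0]
    d = int(max(A.sum(axis=0).max(), A.sum(axis=1).max()))  # assumed: max degree
    t = min(n, k + d + 1)               # +1 allows for the excluded self-pair
    top_src = np.argsort(-u)[:t]        # sources sorted by left eigen-score
    top_dst = np.argsort(-v)[:t]        # targets sorted by right eigen-score
    cands = [(float(u[i] * v[j]), int(i), int(j))
             for i in top_src for j in top_dst
             if i != j and A[i, j] == 0]  # skip self-pairs and existing edges
    cands.sort(reverse=True)
    return [(i, j) for _, i, j in cands[:k]]

# Toy graph (hypothetical): triangle 0-1-2 with a path 2-3-4-5-6 attached.
A = np.zeros((7, 7))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (5, 6)]:
    A[i, j] = A[j, i] = 1.0

# The graph is undirected, so left and right eigen-scores coincide.
vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
u = v = np.abs(vecs[:, -1])
k = 2
picked = top_k_new_edges(A, u, v, k)
print(picked)
```

With nonnegative scores, any candidate outside the block is matched by at least k missing edges inside it with equal or higher score, so the restricted search returns the same total score as scanning all O(n² − m) candidates.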
Maximizing Propagation: Evaluation [Plot: Log (Infected Ratio) vs. Time Ticks, including Our Method, with the better direction marked]
Fractional Immunization of Networks B. Aditya Prakash, Lada Adamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos Under Submission
Previously: Full Static Immunization Given: a graph A, virus prop. model and budget k; Find: k ‘best’ nodes for immunization (removal). k = 2 ? ? Prakash 2012
Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect Prakash 2012
Now: Fractional Asymmetric Immunization # antidotes = 3 Fractional Effect [ f(x) = ] Asymmetric Effect Prakash 2012
Fractional Asymmetric Immunization Drug-resistant Bacteria (like XDR-TB) Another Hospital Hospital Prakash 2012
Fractional Asymmetric Immunization = f Drug-resistant Bacteria (like XDR-TB) Another Hospital Hospital Prakash 2012
Fractional Asymmetric Immunization Problem: Given k units of disinfectant, how to distribute them to maximize hospitals saved? Another Hospital Hospital Prakash 2012
Our Algorithm “SMART-ALLOC” ~6x fewer! [US-MEDICARE NETWORK 2005] • Each circle is a hospital, ~3000 hospitals • More than 30,000 patients transferred CURRENT PRACTICE SMART-ALLOC Prakash 2012
Running Time [Bar chart of wall-clock time, lower is better: Simulations > 1 week; SMART-ALLOC 14 secs] > 30,000x speed-up!
Lower is better Experiments SECOND-LIFE PENN-NETWORK ~5 x ~2.5 x K = 200 K = 2000 Prakash 2012
Part 1: Algorithms • Q1: Whom to immunize? • Q2: How to detect culprits?
B. Aditya Prakash, Jilles Vreeken, Christos Faloutsos ‘Detecting Culprits in Epidemics: Who and How many?’ in ICDM 2012, Brussels Prakash and Faloutsos 2012
Culprits: Problem definition 2-d grid ‘+’ -> infected Who started it? Prakash and Faloutsos 2012
Culprits: Problem definition 2-d grid ‘+’ -> infected Who started it? Prior work: [Lappas et al. 2010, Shah et al. 2011] Prakash and Faloutsos 2012
Culprits: Exoneration Prakash and Faloutsos 2012
Who are the culprits? • Two-part solution • use MDL for the number of seeds • for a given number: • exoneration = centrality + penalty • Running time = linear! (in edges and nodes)
Modeling using MDL • Minimum Description Length Principle == Induction by compression • Related to Bayesian approaches • MDL = Model + Data • Model • Scoring the seed-set: number of possible |S|-sized sets + encoding the integer |S|
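The two model-cost terms named on this slide can be turned into a small bit-count calculation. The sketch below is one plausible reading, not the paper's exact formulation: it assumes Rissanen's universal code for the integer |S| and log2 of the binomial coefficient for choosing which |S| of the N nodes are seeds. The function names are hypothetical.

```python
from math import comb, log2

def universal_int_bits(z):
    """Bits to encode an integer z >= 1 with Rissanen's universal code (assumed)."""
    if z < 1:
        raise ValueError("z must be a positive integer")
    bits = log2(2.865064)               # normalizing constant of the code
    term = log2(z)
    while term > 0:                     # log z + log log z + ... (positive terms)
        bits += term
        term = log2(term)
    return bits

def seed_set_model_bits(n_nodes, n_seeds):
    """Model cost: encode |S|, then identify which |S|-sized subset it is."""
    return universal_int_bits(n_seeds) + log2(comb(n_nodes, n_seeds))

# Example: cost of declaring 1, 2, or 3 seeds in a hypothetical 100-node graph.
costs = [seed_set_model_bits(100, s) for s in (1, 2, 3)]
print(costs)
```

Because the model cost grows with the number of seeds, adding a seed only pays off under MDL when it shortens the description of the observed infections by more than it adds here.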
Modeling using MDL • Data: Propagation Ripples Infected Snapshot Original Graph Ripple R1 Ripple R2