This study explores the critical dynamics of worms and viruses in network infrastructures, such as the infamous Code Red and SQL Slammer worm incidents, which significantly disrupted parts of the Internet. It delves into the complex modeling of worm propagation, virus virulence, and cascading failures within the network structures, particularly focusing on deterministic and stochastic models. The research aims to identify factors that affect virulence and the influence of network topology on epidemic behaviors, providing insight into better modeling and potential countermeasures for future outbreaks.
Worms, Viruses, and Cascading Failures in Networks • D. Towsley, U. Massachusetts • Collaborators: W. Gong, C. Zou (UMass); A. Ganesh, L. Massoulié (Microsoft)
Internet as enabler of terrific apps • … but also of malicious behavior • worms, viruses • Internet as a complex system • critical DNS, BGP infrastructures
Worms and failures • Code Red worm • more than 360,000 infected in less than one day • disrupted parts of BGP infrastructure • SQL Slammer • less than 15 minutes to infect 75,000 hosts • congested parts of Internet • BGP errors in one network → cascade of faults in BGP in another network
Goals • what are appropriate models? • deterministic • stochastic • what makes worm/virus/failure virulent? • how does topology affect virulence?
Outline • worms, deterministic models • cascading failures, stochastic models • summary
Worm spreading behavior • scan for vulnerable hosts • sequential, random, topological • uniform, local preference • virulence sensitive to • scanning strategy • host speed, bandwidth • protocol • …
Worm spreading model • address space, size W • N vulnerable hosts • scan rate (per host), h
Simple worm spreading model • I(t): number of infected hosts at time t • Epidemic model: dI/dt = (h/W) I(t) [N − I(t)], with initial condition I(0)
Code Red: model • measurements from two Class A networks (D. Goldsmith, K. Eichman); plot of scan rate vs. time • scan rate ∝ I(t) • epidemic model matches the increasing part of the observed Code Red data (Staniford) • What about the decrease? • human countermeasures • congestion • (Zou et al., 2002)
Assumptions • classic epidemic model • ignore countermeasures • ignore congestion • Code Red parameters • h = 358/min • N = 360,000 • uniform scan, W = 2^32 • I(0) = 10 • hundreds of minutes to spread
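The "hundreds of minutes" figure can be checked with a minimal sketch. It assumes the classic logistic epidemic equation dI/dt = (h/W) I (N − I), whose closed-form solution is standard; the function names `infected` and `time_to_fraction` are illustrative and not taken from the slides.

```python
import math

def infected(t_min, h=358.0, n_vuln=360_000, w=2**32, i0=10):
    """Closed-form solution I(t) of dI/dt = (h/W) * I * (N - I)."""
    r = h * n_vuln / w                      # early exponential growth rate, per minute
    growth = i0 * math.exp(r * t_min)
    return n_vuln * growth / (n_vuln - i0 + growth)

def time_to_fraction(frac, h=358.0, n_vuln=360_000, w=2**32, i0=10):
    """Minutes until I(t) reaches frac * N (inverse of the closed form above)."""
    r = h * n_vuln / w
    return math.log(frac * (n_vuln - i0) / ((1.0 - frac) * i0)) / r

if __name__ == "__main__":
    for frac in (0.1, 0.5, 0.9):
        print(f"{frac:.0%} infected after {time_to_fraction(frac):.0f} min")
```

With the Code Red parameters, the growth rate hN/W is about 0.03 per minute, so half of the vulnerable population is reached after roughly 350 minutes.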
Worm virulence • increase h • increase I(0) • decrease W • smarter scanning
The perfect worm • perfect worm • scan vulnerable nodes exactly once • flash worm (Staniford,…) • uniform scan of vulnerable nodes (W = N)
Perfect Code Red worm • I(0) = 10 • h = 358/min • N = 360,000 • all hosts infected within 2 sec. • add 2 sec. infection delay -> six-fold slowdown • random scan almost perfect!
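A rough check of the 2-second figure, as a sketch under two assumptions that are not stated on the slide: a perfect worm wastes no scans, so I(t) = I(0) exp(h t) until all N hosts are infected, and a uniform scan over the vulnerable hosts only (W = N) follows the logistic model with growth rate h. Both function names are illustrative.

```python
import math

def perfect_worm_seconds(h_per_min=358.0, n_vuln=360_000, i0=10):
    """Seconds for pure exponential growth I(t) = I0 * exp(h t) to reach N."""
    return 60.0 * math.log(n_vuln / i0) / h_per_min

def uniform_scan_seconds(frac, h_per_min=358.0, n_vuln=360_000, i0=10):
    """Seconds for the logistic model with W = N to reach frac * N infected."""
    ratio = frac * (n_vuln - i0) / ((1.0 - frac) * i0)
    return 60.0 * math.log(ratio) / h_per_min

if __name__ == "__main__":
    print(f"perfect worm, all hosts: {perfect_worm_seconds():.2f} s")
    print(f"uniform scan with W = N, 99% of hosts: {uniform_scan_seconds(0.99):.2f} s")
```

Under these assumptions both times are a few seconds, in line with the remark that random scanning of the vulnerable set is almost perfect.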
Hitlist, routing worms • hitlist worm • increases I(0) • routing worm • decreases W • BGP table information: W = 0.29 × 2^32 • 29% of IP address space
Hitlist, routing worms • Code Red style worm • h = 358/min • N = 360,000 • hitlist, I(0) = 10,000 • routing worm as effective as hitlist worm • hitlist/routing worm extremely virulent
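To compare the three variants numerically, the sketch below computes the time to infect half of the vulnerable hosts. It assumes the logistic model dI/dt = (h/W) I (N − I); the helper `minutes_to_half` and the scenario labels are illustrative.

```python
import math

def minutes_to_half(h, n_vuln, w, i0):
    """Minutes until half of the vulnerable hosts are infected (logistic model)."""
    r = h * n_vuln / w
    return math.log((n_vuln - i0) / i0) / r

N, H, W = 360_000, 358.0, 2**32
scenarios = {
    "Code Red (uniform scan)": dict(w=W, i0=10),
    "hitlist, I(0) = 10,000": dict(w=W, i0=10_000),
    "routing, W = 0.29 * 2^32": dict(w=0.29 * W, i0=10),
}
results = {name: minutes_to_half(H, N, **kw) for name, kw in scenarios.items()}

if __name__ == "__main__":
    for name, minutes in results.items():
        print(f"{name}: {minutes:.0f} min")
```

In this model, shrinking W scales every time by the same factor (0.29 here), while a hitlist only removes the early exponential phase, which is why the two end up comparably effective for these parameters.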
Local preference worm • K subnetworks • p: probability of scanning the local subnet • (1 − p): probability of scanning outside the local subnet
Local preference worm • N_k: number of vulnerable hosts in subnet k • I_k(t): number of infected hosts in subnet k • fits the epidemic model for interacting groups: a set of coupled ODEs
Local preference worm • K = 116 • N_k = 360,000/K • I_1(0) = 10; I_k(0) = 0, k > 1 • h = 358/min • provides some of the locality of a routing worm
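The coupled ODEs are not spelled out here, so the sketch below assumes one plausible form: each subnet covers W/K addresses, and remote scans are spread uniformly over the other K − 1 subnets, giving dI_k/dt = (N_k − I_k) [p h K/W · I_k + (1 − p) h K/((K − 1) W) · Σ_{j≠k} I_j]. The function name, the forward-Euler integrator, and the step size are illustrative choices.

```python
def local_preference(p, k=116, n_total=360_000, h=358.0, w=2**32,
                     i0=10.0, t_end=600.0, dt=0.1):
    """Forward-Euler integration of the assumed coupled ODEs.

    Returns the total number of infected hosts at time t_end (minutes).
    """
    n_k = n_total / k                           # vulnerable hosts per subnet
    local = p * h * k / w                       # hit rate per local address
    remote = (1 - p) * h * k / ((k - 1) * w)    # hit rate per remote address
    infected = [0.0] * k
    infected[0] = i0                            # outbreak starts in subnet 1
    for _ in range(int(round(t_end / dt))):
        total = sum(infected)
        infected = [
            i + dt * (n_k - i) * (local * i + remote * (total - i))
            for i in infected
        ]
    return sum(infected)

if __name__ == "__main__":
    for p in (1 / 116, 0.5, 0.9):
        print(f"p = {p:.3f}: {local_preference(p):.0f} infected after 600 min")
```

With p = 1/K the local and remote rates coincide at h/W and the system reduces to the uniform-scan model, which gives a simple sanity check on the equations.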
Questions • topological worms • sequential scan • bandwidth constraints
topology? • failure recovery?
Topology and fast/slow recovery • model description • general network topologies • conditions for fast-slow recovery • specific network topologies • complete graphs (BGP routers) • hypercubes (peer-to-peer networks) • power-law graphs (Internet AS graph; E-mail address book graph)
Susceptible-Infective-Susceptible (SIS) epidemic model • also known as the contact process; see [Liggett] • topology: undirected, finite, connected graph G = (V, E) • X_v = 1 if node v is down (infected); X_v = 0 if node v is up (healthy)
Model • {X_v}, v ∈ V: Markov process on {0,1}^V with jump rates: • X_v → 1 with rate β Σ_{w∼v} X_w • X_v → 0 with rate δ • unique absorbing state at 0 • all other states communicate, and 0 is reachable
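The process can be simulated directly with a standard Gillespie-style loop. This is a minimal sketch that assumes an infection rate beta per infected neighbour and a recovery rate delta per infected node; the symbols, the function names, and the ring helper are illustrative and not taken from the slides.

```python
import random

def simulate_sis(adj, beta, delta, infected0, t_max, rng=None):
    """Simulate the contact process on the graph adj (dict: node -> neighbours).

    Returns (T, absorbed): the die-out time, or (t_max, False) if the
    epidemic is still alive at t_max. Assumes delta > 0.
    """
    rng = rng or random.Random()
    x = {v: 0 for v in adj}
    for v in infected0:
        x[v] = 1
    t = 0.0
    while True:
        # Jump rate of each node: recovery if infected, infection if healthy.
        rates = {
            v: (delta if x[v] else beta * sum(x[w] for w in adj[v]))
            for v in adj
        }
        total = sum(rates.values())
        if total == 0.0:              # all nodes healthy: absorbing state
            return t, True
        t += rng.expovariate(total)   # exponential waiting time to the next jump
        if t >= t_max:
            return t_max, False
        # Pick the node that jumps, with probability proportional to its rate.
        v = rng.choices(list(rates), weights=list(rates.values()))[0]
        x[v] ^= 1

def ring(n):
    """Ring with n nodes, as an adjacency dict."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

if __name__ == "__main__":
    t, absorbed = simulate_sis(ring(50), 0.3, 1.0, range(50), 1e4, random.Random(1))
    print(f"absorbed={absorbed} at t={t:.2f}")
```

Recomputing all rates at every jump costs O(|V| + |E|) per event, which keeps the sketch short; an incremental update would be the natural optimisation for large graphs.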
Time to absorption • system eventually recovers • how long does this take? • T = time to hit 0 (from a given initial condition) • how does E[T] depend on β, δ, G?
Example • G = line segment or ring with n nodes • Fix δ = 1 • Theorem (Durrett and Liu): there is a critical β_c > 0 such that • if β < β_c, then E[T] = O(log n) • if β > β_c, then log E[T] ≈ n^a • signature of the phase transition in the infinite 1-D lattice
Fast recovery, spectral radius • ρ: spectral radius of the graph adjacency matrix A; n = |V| • Then P(X(t) ≠ 0) ≤ c n^(1/2) exp([βρ − δ]t) • Hence, when βρ < δ, the survival time T satisfies E[T] ≤ [log(n) + 1]/[δ − βρ]
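To evaluate the fast-recovery bound on a concrete graph, the sketch below estimates the spectral radius by power iteration and then computes [log(n) + 1]/(δ − βρ). It assumes β is the infection rate and δ the recovery rate; the function names and the iteration count are illustrative.

```python
import math

def spectral_radius(adj, iters=2000):
    """Estimate the largest adjacency eigenvalue by power iteration on A + I.

    The shift by I avoids oscillation on bipartite graphs (ring, star,
    hypercube), where -rho is also an eigenvalue of A.
    """
    x = {v: 1.0 for v in adj}
    lam = 0.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        lam = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / lam for v, val in y.items()}
    return lam - 1.0

def fast_recovery_bound(adj, beta, delta):
    """Upper bound on E[T]; returns None when beta * rho >= delta."""
    gap = delta - beta * spectral_radius(adj)
    return (math.log(len(adj)) + 1.0) / gap if gap > 0 else None

if __name__ == "__main__":
    complete = {v: [w for w in range(10) if w != v] for v in range(10)}
    print(spectral_radius(complete), fast_recovery_bound(complete, 0.05, 1.0))
```

A return value of None only means the spectral-radius condition fails; it does not by itself imply slow recovery.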
Coupling proof • Consider a “branching random walk”, i.e. a Markov process {Y_v}, v ∈ V: • Y_v → Y_v + 1 with rate β Σ_{w∼v} Y_w = β (AY)_v • Y_v → Y_v − 1 with rate δ Y_v • Can couple the processes so that, for all t, X(t) ≤ Y(t).
Branching random walk bound • By “linearity” of Y, dE[Y(t)]/dt = (βA − δI) E[Y(t)], so E[Y(t)] = exp((βA − δI)t) Y(0) • Use P(X(t) ≠ 0) ≤ Σ_{v∈V} E[Y_v(t)]
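Slow recovery • Graph isoperimetric constant: η_m = min over S ⊂ V with 0 < |S| ≤ m of E(S, V∖S)/|S| • E(S, V∖S), the number of edges leaving S, is the “perimeter”; |S| is the “area”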
Slow die-out and isoperimetric constant • Suppose that for some m ≤ n/2, r := β η_m/δ > 1 • Then, with positive probability, epidemics survive for time at least r^m/[2m] • Hence, if m = n^a, the survival time T satisfies log(E[T]) = Ω(n^a)
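For small graphs the isoperimetric constant can be computed by brute force. The sketch assumes the usual definition, the minimum over nonempty sets S with |S| ≤ m of (number of edges leaving S)/|S|, and the bound r^m/(2m) with r = β η_m/δ; the function names are illustrative, and the enumeration is exponential in m, so it only suits toy examples.

```python
from itertools import combinations

def isoperimetric_constant(adj, m):
    """Brute-force eta_m: min over 0 < |S| <= m of (edges leaving S) / |S|."""
    nodes = list(adj)
    best = float("inf")
    for size in range(1, m + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            boundary = sum(1 for v in s for w in adj[v] if w not in s)
            best = min(best, boundary / size)
    return best

def survival_lower_bound(adj, m, beta, delta):
    """r**m / (2m) with r = beta * eta_m / delta; None unless r > 1."""
    r = beta * isoperimetric_constant(adj, m) / delta
    return r ** m / (2 * m) if r > 1 else None

if __name__ == "__main__":
    complete8 = {v: [w for w in range(8) if w != v] for v in range(8)}
    cube4 = {v: [v ^ (1 << b) for b in range(4)] for v in range(16)}
    print(isoperimetric_constant(complete8, 3))   # complete graph: n - m
    print(isoperimetric_constant(cube4, 4))       # hypercube, m = 2^k: d - k
```

The two printed values can be compared against the closed forms n − m for the complete graph and d − k for the hypercube with m = 2^k.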
Coupling proof • Let |X| = Σ_v X_v • Then |X| dominates a process Z on {0,…,m} with transition rates: z → z + 1 at rate β η_m z, z → z − 1 at rate δ z • Then study the absorption time for Z
Complete graph • Here, ρ = n − 1, η_m = n − m • By picking m = n^a, a < 1, thresholds: • fast recovery if β/δ < 1/(n − 1) • slow recovery if β/δ > 1/(n − n^a)
Hypercube {0,1}^d • Here, d = log_2(n) and ρ = d • For m = 2^k, k < d, η_m = d − k • Hence, for k = a·d, thresholds: • fast recovery if β/δ < 1/d • slow recovery if β/δ > 1/[d(1 − a)]
Erdős-Rényi random graph • edge between each pair of nodes present with probability p_n, independently of the others • dense: d_n := n p_n = Ω(log n) • then ρ ~ η_m ~ d_n with high probability
Star network • spectral radius: ρ = n^(1/2) • isoperimetric constant: η_m = 1 for all m < n/2 • general results not useful • Specialized analysis yields: • for an arbitrary constant c > 0, if β/δ < c/n^(1/2), fast recovery, E[T] = O(log(n)) • if β/δ > n^(a − 1/2), for a > 0, slow recovery, log(E[T]) = Ω(n^a)
Power-law random graph • Power-law graph with exponent γ: number of degree-k vertices ∝ k^(−γ) • E.g. Internet AS graph with γ = 2.1 • Expected degree PLRG [Chung et al.]: • expected degrees w_1 > ··· > w_n: edge (i, j) present w.p. w_i w_j / Σ_k w_k • particular choice: w_i = c_1 (i + c_2)^(−1/(γ − 1))
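A small generator for the expected-degree construction, as a sketch: edge (i, j) is included independently with probability w_i w_j / Σ_k w_k, with weights w_i = c_1 (i + c_2)^(−1/(γ − 1)), where γ denotes the power-law exponent. Capping the probability at 1 and omitting self-loops are simplifications made here, and the constants in the usage example are arbitrary.

```python
import random

def power_law_weights(n, gamma, c1, c2):
    """Expected degrees w_i = c1 * (i + c2) ** (-1 / (gamma - 1)), i = 1..n."""
    return [c1 * (i + c2) ** (-1.0 / (gamma - 1.0)) for i in range(1, n + 1)]

def expected_degree_graph(weights, rng=None):
    """Adjacency dict with edge (i, j), i < j, present w.p. min(1, w_i w_j / sum w)."""
    rng = rng or random.Random()
    total = sum(weights)
    n = len(weights)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, weights[i] * weights[j] / total):
                adj[i].append(j)
                adj[j].append(i)
    return adj

if __name__ == "__main__":
    w = power_law_weights(200, gamma=2.1, c1=20.0, c2=5.0)   # arbitrary constants
    g = expected_degree_graph(w, random.Random(0))
    print("max degree:", max(len(nbrs) for nbrs in g.values()))
```

The double loop is O(n^2), which is fine for a few thousand nodes; larger graphs would need a smarter sampling scheme.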
Power-law random graph (2) • Spectral radius of PLRG [Chung et al., 03] • Denote by m the maximum expected degree (m = w_1), and by d the average of the expected degrees. Then:
PLRG, γ > 2.5 • Epidemics on the full graph live longer than on a sub-graph • Look at the star induced by node 1: slow die-out for β/δ > m^(a − 1/2) • Compare to the spectral radius condition: fast die-out for β/δ < m^(−1/2) • The two thresholds differ by m^a; same gap as for the star
PLRG, 2 < γ < 2.5 • Consider the top N nodes, for suitable N: Erdős-Rényi core, with isoperimetric constant given in terms of F(γ) • Gap between the spectral-radius and isoperimetric thresholds: constant factor, F(γ)
Open problems • gap between upper and lower bounds in • sparse ER graphs • power law random graphs for γ < 2.5 • spectral radius bound tight in examples, always true? • conditioned on slow recovery, how many nodes are down at intermediate times? • extensions to other graphs and to SIR epidemics
Observations • neither parameter tight • gap for topologies with diverse degrees • spectral radius “seems” to be right • nothing between log n and exp(n)?
Hitlist, routing worms • hitlist worm • increase I(0) • routing worm • decrease W • BGP table information: W = 0.29 × 2^32 • 29% of IP address space • /8 aggregation: W = 0.45 × 2^32 • 116 out of 256 possible 8-bit prefixes
The appearance of phase transitions • N = 200, ks = 1, kl = 0.01 • Mean time to absorption drops from 10^47 to about 0 within a few states
Accuracy of fluid model • population: 360,000 • scan rate h ~ N(358/min, 100^2) (normal distribution) • scanning space: 2^32 • I(0) = 1 • 100 simulations
Accuracy of fluid model • population: 360,000 • scan rate h ~ N(358/min, 100^2) (normal distribution) • scanning space: 2^32 • I(0) = 10 • 100 simulations