Analyzing the Witty Worm: Forensic Insights and Detection Strategies

ICSI Work on Detection/Defense • Vern Paxson, Nicholas Weaver, et al. • September 20, 2005

Overview • Forensic analysis of “Witty” • Internet “Situational Awareness” • Scan detection • Detecting “Triggers” • Preliminary: signature white-listing • Students: Abhishek Kumar (Georgia Tech), Vinod Yegneswaran (UWisc), Jaeyeon Jung (MIT), Juan Caballero (CMU), Jayanthkumar Kannan (UCB), Christian Kreibich (Cambridge)

Forensic Analysis of Witty • March 2004 (flaw announced previous day) • Single UDP packet - stateless spreading • Exploited flaw in the passive analysis of Internet Security Systems products • Payload: slowly corrupt random disk blocks • Telescope data from UCSD/CAIDA /8 • Also UWisc /8, sampled 1-in-10

Witty Abstract Pseudo-code
1. Seed the PRNG using system time.
2. Send 20,000 copies of self to randomly selected destinations.
3. Open physical disk chosen randomly between 0 .. 7.
4. If success:
5.   Overwrite a randomly chosen block on this disk.
6.   Goto line 1.
7. Else:
8.   Goto line 2.

More Detailed Pseudo-code
srand(seed) { X ← seed }
rand() { X ← X*214013 + 2531011; return X }
main()
1. srand(get_tick_count());
2. for(i=0; i<20,000; i++)
3.   dest_ip ← rand()[0..15] || rand()[0..15]
4.   dest_port ← rand()[0..15]
5.   packetsize ← 768 + rand()[0..8]
6.   packetcontents ← top-of-stack
7.   sendto()
8. if(open_physical_disk(rand()[13..15]))
9.   write(rand()[0..14] || 0x4e20)
10.  goto 1
11. else goto 2
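
The per-packet part of the pseudo-code can be sketched in Python as follows. This is a minimal illustration, not the worm's code: the names `WittyPrng` and `next_packet` are hypothetical, and it assumes that bit 0 in the slide's `[0..15]` notation is the most significant bit, so `rand()[0..15]` is the top 16 bits and `rand()[0..8]` the top 9 bits of the 32-bit state.

```python
A, C, MASK = 214013, 2531011, 0xFFFFFFFF  # constants from the slide, 32-bit state


class WittyPrng:
    """Linear congruential generator X <- X*A + C (mod 2^32)."""

    def __init__(self, seed):
        self.x = seed & MASK

    def rand(self):
        self.x = (self.x * A + C) & MASK
        return self.x


def next_packet(prng):
    """Consume 4 rand() calls and return (dest_ip, dest_port, payload_size)."""
    ip_hi = prng.rand() >> 16          # top 16 bits of first output
    ip_lo = prng.rand() >> 16          # top 16 bits of second output
    dest_ip = (ip_hi << 16) | ip_lo
    dest_port = prng.rand() >> 16      # top 16 bits of third output
    size = 768 + (prng.rand() >> 23)   # top 9 bits: 768..1279 bytes
    return dest_ip, dest_port, size


if __name__ == "__main__":
    prng = WittyPrng(123456)           # stand-in for get_tick_count()
    for _ in range(3):
        ip, port, size = next_packet(prng)
        print(f"{ip >> 24}.{(ip >> 16) & 255}.{(ip >> 8) & 255}.{ip & 255}:{port} size={size}")
```

Because every field is a fixed slice of consecutive PRNG outputs, each packet reveals most of the generator state that produced it.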

Witty Becomes Deterministic • Given the top 16 bits of consecutive outputs of the linear congruential pseudo-random number generator, can brute-force the possible bottom 16 bits to recover the full pseudo-random state • Keys to the kingdom: infectee operation effectively becomes deterministic (except for pesky reseeding) with packets carrying an implicit sequence number • So, for example, we can compute each infectee’s local access bandwidth even in the presence of heavy packet loss (since Windows’ sendto() call is blocking) • Just based on sequence number of packets seen @ telescope and the amount of data sent between them
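
The brute-force step can be sketched as below. It is an illustration under stated assumptions, not the authors' tooling: `step` and `recover_state` are hypothetical names, and the destination IP and port of one observed packet are taken to be the top 16 bits of three consecutive PRNG outputs, as in the pseudo-code.

```python
A, C, MASK = 214013, 2531011, 0xFFFFFFFF


def step(x):
    """One LCG update: X <- X*A + C (mod 2^32)."""
    return (x * A + C) & MASK


def recover_state(dest_ip, dest_port):
    """Return every full 32-bit state consistent with one observed packet.

    The returned values are candidates for the PRNG output whose top 16 bits
    became the high half of dest_ip.
    """
    ip_hi, ip_lo = dest_ip >> 16, dest_ip & 0xFFFF
    candidates = []
    for low in range(1 << 16):               # guess the hidden bottom 16 bits
        x1 = (ip_hi << 16) | low
        x2 = step(x1)
        if x2 >> 16 != ip_lo:                # must reproduce low half of dest_ip
            continue
        if step(x2) >> 16 == dest_port:      # and the destination port
            candidates.append(x1)
    return candidates
```

The two checks together test 32 observed bits against 2^16 guesses, so in practice a single candidate is expected to survive.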

Inferred Access Bandwidth of Individual Witty Infectees

Precise Bandwidth Estimation vs. Rates Measured by Telescope
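
A rough sketch of the bandwidth computation: once the PRNG state behind two observed packets is known, the packets sent between them can be replayed and their sizes summed, regardless of how many were lost before reaching the telescope. The function names are hypothetical, both packets are assumed to come from the same batch of 20,000, and only UDP payload bytes are counted (header overhead is ignored).

```python
A, C, MASK = 214013, 2531011, 0xFFFFFFFF


def step(x):
    """One LCG update: X <- X*A + C (mod 2^32)."""
    return (x * A + C) & MASK


def bytes_between(x_first, x_second, max_packets=1_000_000):
    """Replay packets from state x_first until state x_second is reached.

    Both arguments are the PRNG state just before a packet's first rand()
    call. Returns (packets_sent, payload_bytes), or None if not reached.
    """
    x, total = x_first, 0
    for sent in range(max_packets + 1):
        if x == x_second:
            return sent, total
        x = step(step(step(x)))        # dest_ip (2 calls) and dest_port
        x = step(x)                    # packet size call
        total += 768 + (x >> 23)       # top 9 bits select the size
    return None


def bandwidth_bps(x_first, t_first, x_second, t_second):
    """Payload bits per second between two telescope observations."""
    result = bytes_between(x_first, x_second)
    if result is None or t_second <= t_first:
        return None
    return result[1] * 8 / (t_second - t_first)
```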

Annotations on the detailed pseudo-code • 4 calls to rand() per loop iteration (two for dest_ip, one for dest_port, one for packetsize) • Plus one more every 20,000 packets, if disk open fails • Or complete reseeding if the disk open succeeds

Witty Infectee Reseeding Events • Recall every 20,000 packets, Witty burns a random number picking a disk to open & trash. For packets with state Xi and Xj: • If from the same batch of 20,000 then • j - i = 0 mod 4 • If from separate but adjacent batches, for which Witty did not reseed, then • j - i = 1 mod 4 (but which of the 100s/1000s of intervening packets marked the phase shift?) • If from batches across which Witty reseeded, then no apparent relationship. • Lets us find the phase of Witty reseeding events …
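
The phase test can be sketched as follows, assuming the PRNG state behind each observed packet has already been recovered. The names `step` and `phase` are hypothetical, and the search limit is an arbitrary illustrative value.

```python
A, C, MASK = 214013, 2531011, 0xFFFFFFFF


def step(x):
    """One LCG update: X <- X*A + C (mod 2^32)."""
    return (x * A + C) & MASK


def phase(x_i, x_j, limit=200_000):
    """Return (j - i) mod 4 if x_j is reachable from x_i within `limit` steps.

    0    -> consistent with the same batch of 20,000 packets
    1    -> consistent with adjacent batches and one failed disk open
    None -> not reachable: no apparent relationship (e.g. a reseed between)
    """
    x = x_i
    for distance in range(limit + 1):
        if x == x_j:
            return distance % 4
        x = step(x)
    return None
```

A phase of 2 or 3 would, under the same reasoning, indicate several non-reseeding batch boundaries between the two packets.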

Finding Each Infectee’s Random Seed • Given the phase of reseeding events … • … plus the fact that Witty uses uptime (in msec) for its entropy … • thus its seeds increase linearly with time … • plus some computational geometry … • We can extract each infectee’s random seed • I.e. we know its uptime • And, by observing times it didn’t reseed, how many disks it has attached

Uptime of 750 Witty Infectees

Disk Drives Per Witty Infectee

Given Exact Values of Seeds Used for Reseeding … • More generally, we know every packet each infectee sent • Can compare this to when new infectees show up • i.e. Who-Infected-Whom

Infection Attempts That Were Too Early, Too Late, or Just Right Infector/Infectee Signature

Witty is Incomplete • Recall that an LCG PRNG generates a complete orbit over a permutation of 0..2^32-1. • But: Witty author didn’t use all 32 bits of a single PRNG value • dest_ip ← (X_i)[0..15] || (X_{i+1})[0..15] • This does not generate a complete orbit! • Misses 10% of the address space • Visits 10% of the addresses (exactly) twice • So, were 10% of the potential infectees protected?
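
Enumerating the real 2^32-state orbit is slow in pure Python, so the sketch below shows the effect on a scaled-down toy: the same multiplier and increment reduced to a 16-bit state, with each "address" built from the top 8 bits of two consecutive outputs. The function name is hypothetical, and the toy is not expected to reproduce the 10% figures; it only shows how to count addresses that are missed or visited more than once.

```python
from collections import Counter

A, C = 214013, 2531011


def coverage_stats(bits=16):
    """Count how often each address is produced over one full LCG period."""
    half = bits // 2
    mask = (1 << bits) - 1
    counts = Counter()
    x = 0
    # Reduced mod 2^bits the LCG stays full-period (C is odd, A = 1 mod 4),
    # so 2^bits steps visit every state exactly once.
    for _ in range(1 << bits):
        nxt = (x * A + C) & mask
        addr = ((x >> half) << half) | (nxt >> half)   # top halves of X_i, X_{i+1}
        counts[addr] += 1
        x = nxt
    once = sum(1 for c in counts.values() if c == 1)
    multiple = sum(1 for c in counts.values() if c > 1)
    missed = (1 << bits) - len(counts)
    return {"missed": missed, "once": once, "multiple": multiple,
            "total_probes": sum(counts.values())}


if __name__ == "__main__":
    print(coverage_stats(16))
```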

Time When Infectees Seen At Telescope • Doubly-scanned infectees infected faster • Unscanned infectees still get infected! • In fact, some are infected extremely quickly!

How Do Unscanned Infectees Become Infected? • Multihomed host infected via another address • DHCP or NAT aliasing • But what about the extra-quick ones? • Either they were passively infected and had a large cross-section • Or they were known in advance to the attacker

Uptime of 750 Witty Infectees • Part of a group of 135 infectees from same /16

Time When Infectees Seen At Telescope • Most also belong to that /16

Witty Started With A “Hit List” • Initial infectees exhibit super-exponential growth ⇒ they weren’t found by random scanning • (And can in fact show large-scale passive infection unlikely) • Prevalent /16 = U.S. military base • Attacker knew of ISS security software installation at military site ⇒ ISS insider (or ex-insider) • Fits with very rapid development of worm after public vulnerability disclosure

Are All The Worms In Fact Executing Witty? • Answer: No. • One “infectee” probes addresses not on the orbit, each of the form A.B.A.B rather than A.B.C.D. • Each probe contains Witty contagion, but lacks randomized payload size. • Shows up very near beginning of trace. • Patient Zero - machine attacker used to launch Witty. (Really, Patient Negative One.) • European retail ISP. • Communicated to law enforcement.

Implications of Witty Forensics • Provided a degree of worm attribution • (truth be told, doesn’t require the full analysis) • Powerful demonstration of opportunistic measurement and exploiting structure • Very labor intensive • A one-trick pony?

Internet “Situational Awareness” • Separate from ICSI honeyfarm, at LBL we operate a 2,560-address honeynet w/ honeyd responders • Basic question: how do we tell when it sees something new … • … and interesting • Idea: • Characterize “background radiation” in abstract terms • Remove any matches, consider remainder “new” … • … except first run for a few months to converge on full set of abstractions

Internet “Situational Awareness”, con’t • It doesn’t work. • There is constant churn in what arrives that’s new • Though often with very minor variations • In principle removable, but need better meta-abstractions for doing so • Basic question #2: What can we say about an “event” seen by the honeynet? • Is it a worm, a botnet, a misconfiguration? • If a botnet, could it be more than one? Is the scanning coordinated? How large a region is the scan targeting?

Internet “Situational Awareness”, con’t • It doesn’t work ... Yet. • Significant noise problems • Significant modalities & variations • Calibration difficulties

Scan Detection • TRW (Threshold Random Walk) very effective at detecting random scanners … • … at least, at a site’s border • (we now have some enterprise traces to evaluate) • What about non-random scanning worms? • Topological, meta-server • Idea: detect anomalously high fan-out rate • But with what detection threshold? Too low and busy hosts trigger false positives. Too high and worm can fly under the radar.
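
For reference, the core of TRW can be sketched as a sequential test over connection outcomes. The function name and the parameter values here (success probabilities 0.8 for benign hosts and 0.2 for scanners, target false-positive rate 0.01 and detection rate 0.99) are illustrative assumptions, not settings taken from these slides.

```python
import math


def trw(outcomes, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Threshold random walk over first-contact connection outcomes.

    outcomes: iterable of booleans, True if the connection attempt succeeded.
    theta0 / theta1: P(success) for a benign host / for a scanner (assumed).
    Returns (verdict, number_of_observations_used).
    """
    upper = math.log(beta / alpha)               # cross it: declare scanner
    lower = math.log((1 - beta) / (1 - alpha))   # cross it: declare benign
    llr, seen = 0.0, 0
    for success in outcomes:
        seen += 1
        if success:
            llr += math.log(theta1 / theta0)              # walk toward benign
        else:
            llr += math.log((1 - theta1) / (1 - theta0))  # walk toward scanner
        if llr >= upper:
            return "scanner", seen
        if llr <= lower:
            return "benign", seen
    return "pending", seen
```

With these assumed parameters, a host whose first four contacts all fail is flagged as a scanner, which is why a worm that rarely fails (topological, meta-server) is hard for TRW to catch.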

Applying Sequential Hypothesis Testing to Rate-based Detection • Idea: per-host, learn its past rate of contacting new hosts • This becomes its Bayesian prior for non-infection • Hypothesize higher rate for infected hosts • As new contacts made, apply SHT to decision between infection/non-infection • Benefits: • No single fixed detection threshold • Host’s behavior somewhat integrated over multiple time scales by updates to SHT

RBS (Rate-Based Seq. Hyp. Testing) • Math based on Poisson arrivals for hosts contacting new destinations (not too bad an assumption) • Evaluated on partial enterprise traces • Proxies for topological scanners: internal security scanner, web crawlers, printer manager, service monitor • Prior for benign fan-out rate: 3.8 Hz • Preliminary: works fairly well, ≈ 1 FP/hr • Also assess hybrid, RBS+TRW • But: • FP high enough to make automatic response problematic • Topological worm can still spread very fast @ 3.8 Hz if avoids TRW’s failure detection
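
One way to picture the rate-based test is a sequential likelihood-ratio test on the gaps between a host's contacts to new destinations, with exponential gaps under the Poisson assumption. This is a simplified sketch, not the authors' formulation: the function name, the error targets, and the infected rate of ten times the benign rate are hypothetical.

```python
import math


def rbs(gaps, rate_benign, rate_infected, alpha=0.01, beta=0.99):
    """Sequential test on inter-contact gaps (seconds) to new destinations.

    rate_benign: learned per-host rate of contacting new hosts (Hz).
    rate_infected: hypothesized higher rate for an infected host (Hz).
    Returns (verdict, number_of_gaps_used).
    """
    upper = math.log(beta / alpha)
    lower = math.log((1 - beta) / (1 - alpha))
    llr, seen = 0.0, 0
    for gap in gaps:
        seen += 1
        # log of the exponential density ratio f_infected(gap) / f_benign(gap)
        llr += math.log(rate_infected / rate_benign) - (rate_infected - rate_benign) * gap
        if llr >= upper:
            return "infected", seen
        if llr <= lower:
            return "benign", seen
    return "pending", seen


if __name__ == "__main__":
    # 3.8 Hz benign prior from the slide; 38 Hz infected rate is an assumption.
    print(rbs([0.01] * 10, 3.8, 38.0))
    print(rbs([1.0] * 10, 3.8, 38.0))
```

Short gaps push the walk up and long gaps pull it down, so the decision depends on the accumulated evidence rather than on one fixed fan-out threshold.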

DNS-Based Scan Detection • Previous work: watch DNS traffic to detect random-address scanners because not preceded by name lookup • Idea (preliminary): for non-random scanning worms, use a site’s DNS server to gain insight into what can’t otherwise be seen • The hope: even if scanning activity occurs within an unmonitored subnet, for topological worms will still often be preceded by DNS lookup that is seen at DNS server • Assessed on traces from LBL’s name servers • Problem: there are a lot of hosts with significant DNS fan-out (also, surprisingly, a lot of failure to cache previous answers)

DNS-Based Scan Detection, con’t • Another idea: analyze DNS lookups to spot potential contact graphs • I.e., A looks up B which then looks up C which looks up D • Somewhat more promising, but: • Needs to work on short chains, since trouble likely grows exponentially with chain-length • Trace evaluation finds clusters of hosts that frequently look each other up. Need to distinguish these from true contact graphs (by training? by a “tell”?)
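
The chain idea can be sketched as a single pass over a time-ordered lookup log. The function name, the 60-second window, and the input format are assumptions for illustration; in particular the sketch presumes each looked-up name has already been mapped to the host that owns it.

```python
def find_chains(lookups, min_hosts=3, window=60.0):
    """Find lookup chains such as A looks up B, then B looks up C.

    lookups: iterable of (timestamp, client, target), sorted by timestamp.
    Returns every chain (tuple of hosts) with at least min_hosts members.
    """
    longest = {}   # host -> (longest chain ending at host, time it was looked up)
    found = []
    for t, client, target in lookups:
        prior = longest.get(client)
        if prior and t - prior[1] <= window and target not in prior[0]:
            chain = prior[0] + (target,)     # extend the chain that reached client
        else:
            chain = (client, target)         # start a new chain
        current = longest.get(target)
        if current is None or len(chain) > len(current[0]):
            longest[target] = (chain, t)
        if len(chain) >= min_hosts:
            found.append(chain)
    return found


if __name__ == "__main__":
    log = [(0, "A", "B"), (5, "B", "C"), (9, "C", "D"), (500, "X", "Y")]
    print(find_chains(log))
```

Clusters of hosts that routinely look each other up would also produce chains here, which is the false-positive problem the slide raises.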

Detecting “Triggers” • Observation: many forms of successful attack/abuse manifest as incoming traffic to a host H that triggers H to initiate/receive connections it otherwise wouldn’t: • “Phone home” signal on successful exploits • Also done by opening up a new port that’s probed by attackware to determine success • Incoming worm traffic triggers outgoing scanning • Incoming email/IRC triggers outgoing email/IRC • Idea: such triggers manifest as apparently unrelated connections occurring closer in time than should happen just due to chance

Detecting “Triggers”, con’t • Mathematical framework assumes that application sessions are well-modeled as a Poisson process. • Compute probability that two independent Poisson processes would occur as close together as observed. If low, flag as anomalous. • Requires recognizing known session structure, e.g., FTP user connection + FTP data connections … + optional ident connection. Or: SMTP in to known server (again w/ optional ident) that leads to SMTP out as it forwards it. • We codified 39 of these
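
The slide does not give the formula, so the following is only one plausible reading of the test: if a host's sessions of some type arrive as a Poisson process with a known rate, the chance that one starts within a given gap after an unrelated event is 1 - exp(-rate * gap). The function names and the 0.01 threshold are hypothetical.

```python
import math


def chance_probability(rate_per_sec, gap_sec):
    """P(an independent Poisson arrival falls within gap_sec of an event)."""
    return 1.0 - math.exp(-rate_per_sec * gap_sec)


def is_suspicious(rate_per_sec, gap_sec, threshold=0.01):
    """Flag a pair of sessions that are closer together than chance suggests."""
    return chance_probability(rate_per_sec, gap_sec) < threshold


if __name__ == "__main__":
    hourly = 1 / 3600                     # host starts this session type once an hour
    print(is_suspicious(hourly, 2.0))     # outbound session 2 s after inbound one
    print(is_suspicious(hourly, 1800.0))  # outbound session 30 min later
```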

Detecting “Triggers”, con’t • This works! … in terms of finding “hidden causality”, i.e., connections that are related even though not part of one of the recognized sessions. • This doesn’t work! …in terms of assuming that such hidden causality reflects abuse. • Instead, it nearly always means we’ve found a new type of (benign) application session. • Prevalence could be skewed by degree to which LBL’s traffic includes a very diverse set of applications. • We got the FP rate down to a few dozen per day; not good enough. Serves as good anomaly signal but not actionable. • We’re now thinking about recasting in terms of automatically discovering session structure.

Signature White-listing • Problem: when automatically distilling signatures (e.g., from honeypot traffic), how do we ensure that the signature doesn’t reflect benign/common protocol elements? • E.g., USER-AGENT: Mozilla/4.0 (compatible; MSIE 6.0b; Windows 98) • Idea: run signature distillation over large corpus of mostly benign traffic, identify frequently occurring protocol elements for white-listing • Status: basic algorithms developed, preliminary test on HTTP traces promising … • … with key questions being how will it scale to sufficiently large datasets … • … and will this suffice to construct a complete enough list?
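
A toy version of the white-listing idea, using fixed-length byte n-grams as a stand-in for whatever signature distillation is actually used. The function names, the n-gram length, and the frequency cutoff are assumptions for illustration only.

```python
from collections import Counter


def build_whitelist(benign_payloads, n=8, min_fraction=0.5):
    """Collect n-grams that occur in at least min_fraction of benign payloads."""
    doc_freq = Counter()
    total = 0
    for payload in benign_payloads:
        total += 1
        # a set, so each n-gram counts once per payload
        doc_freq.update({payload[i:i + n] for i in range(len(payload) - n + 1)})
    return {gram for gram, count in doc_freq.items() if count / total >= min_fraction}


def is_benign_signature(signature, whitelist, n=8):
    """True if every n-gram of the candidate signature is white-listed."""
    grams = [signature[i:i + n] for i in range(len(signature) - n + 1)]
    return bool(grams) and all(gram in whitelist for gram in grams)


if __name__ == "__main__":
    corpus = [
        b"GET /a HTTP/1.1\r\nUser-Agent: Mozilla/4.0\r\n",
        b"GET /b HTTP/1.1\r\nUser-Agent: Mozilla/4.0\r\n",
        b"POST /c HTTP/1.1\r\nUser-Agent: Mozilla/4.0\r\n",
    ]
    wl = build_whitelist(corpus)
    print(is_benign_signature(b"Mozilla/4.0", wl))   # common protocol element
    print(is_benign_signature(b"\x90" * 12, wl))     # not seen in benign traffic
```

Storing every n-gram of a large corpus in memory is exactly the scaling question the slide raises; a real system would need a more compact structure.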

(Additional Slides Re Witty Analysis)
