Sapphire Worm

Sapphire Worm -Sandeep Pinnamaneni

Introduction • The Sapphire Worm was the fastest computer worm in history. As it began spreading throughout the Internet, it doubled in size every 8.5 seconds. • It infected more than 90 percent of vulnerable hosts within 10 minutes. The worm (also called Slammer) began to infect hosts slightly before 05:30 UTC on Saturday, January 25. Sapphire exploited a buffer overflow vulnerability in computers on the Internet running Microsoft's SQL Server or MSDE 2000 (Microsoft SQL Server Desktop Engine). This weakness in an underlying indexing service was discovered in July 2002. • The worm infected at least 75,000 hosts, perhaps considerably more, and caused network outages and such unforeseen consequences as canceled airline flights, interference with elections, and ATM failures.

Spread of Sapphire Worm Source: http://www.cs.berkeley.edu/~nweaver/sapphire/

Spread of Sapphire Worm Figure: The geographic spread of Sapphire in the 30 minutes after release. Source: http://www.cs.berkeley.edu/~nweaver/sapphire/

Propagation Speed of Sapphire Worm • Propagation speed was Sapphire's novel feature: in the first minute, the infected population doubled in size every 8.5 (±1) seconds. The worm achieved its full scanning rate (over 55 million scans per second) after approximately three minutes, after which the rate of growth slowed down somewhat because significant portions of the network did not have enough bandwidth to allow it to operate unhindered. • Most vulnerable machines were infected within 10-minutes of the worm's release. By comparison, it was two orders magnitude faster than the Code Red worm, which infected over 359,000 hosts on July 19th, 2001. In comparison, the Code Red worm population had a leisurely doubling time of about 37 minutes.

Propagation Speed of Sapphire Worm • While Sapphire did not contain a malicious payload, it caused considerable harm simply by overloading networks and taking database servers out of operation. Many individual sites lost connectivity as their access bandwidth was saturated by local copies of the worm and there were several reports of Internet backbone disruption. • It is important to realize that if the worm had carried a malicious payload, had attacked a more widespread vulnerability, or had targeted a more popular service, the effects would likely have been far more severe.

Sapphire: A Random Scanning Worm • Sapphire's spreading strategy is based on random scanning -- it selects IP addresses at random to infect, eventually finding all susceptible hosts. Random scanning worms initially spread exponentially rapidly, but the rapid infection of new hosts becomes less effective as the worm spends more effort retrying addresses that are either already infected or immune. • Thus as with the Code Red worm of 2001, the proportion of infected hosts follows a classic logistic form of initially exponential growth in a finite system. We refer to this as the random constant spread (RCS) model.

Random Scanning Worm Code Red was a typical scanning worm. This graph shows Code Red's probe rate during its re-emergence on August 1, 2001 as seen by the Chemical Abstract Service, matched against the RCS model of worm behavior. Source: http://www.cs.berkeley.edu/~nweaver/sapphire/

Why Sapphire Was So Fast? • Sapphire spread nearly two orders of magnitude faster than Code Red, yet it probably infected fewer machines. Both worms used the same basic strategy of scanning to find vulnerable machines and then transferring the exploitive payload; they differed in their scanning constraints. While Code Red was latency limited, Sapphire was bandwidth-limited, allowing it to scan as fast as the compromised computer could transmit packets or the network could deliver them. • Sapphire contains a simple, fast scanner in a small worm with a total size of only 376 bytes. With the requisite headers, the payload becomes a single 404-byte UDP packet. This can be contrasted with the 4kb size of Code Red, or the 60kb size of Nimda.

Why Sapphire Was So Fast? • Previous scanning worms, such as Code Red, spread via many threads, each invoking connect() to probe random addresses. Thus each thread's scanning rate was limited by network latency, the time required to transmit a TCP-SYN packet and wait for a response or timeout. • The Sapphire worm's scanning technique was so aggressive that it quickly interfered with its own growth. Consequently, the contribution to the rate of growth from later infections was diminished since these instances were forced to compete with existing infections for scarce bandwidth. Thus Sapphire achieved its maximum Internet-wide scanning rate within minutes.

Sapphire's Pseudo Random Number Generator • Sapphire uses a linear congruent, or power residue, pseudo random number generation (PRNG) algorithm. These algorithms are of the form: x' = (x * a + b) mod m, where x' is the new pseudo-random number to be generated, x is the last pseudo-random number generated, m represents the range of the result, Both a and b are carefully chosen constants. • The initial value of the generator is typically "seeded" with a source of higher quality random numbers to ensure that the precise sequence is not identical between runs.

Sapphire's Pseudo Random Number Generator The writers of Sapphire intended to use a linear congruent parameterization popularized by Microsoft, x' = (x * 214013 + 2531011) mod 2^32. However, they made two mistakes in its implementation. • First, they substituted their own value for the increment: the hex number 0xffd9613c. This value is equivalent to -2531012 when interpreted as a 2's complement decimal, so it seems likely that their representation was in error, and possibly they intended to use the SUB instruction to compensate, but mistakenly used ADD instead. The end result is that the increment is always even.

Sapphire's Pseudo Random Number Generator • Second mistake was to misuse the OR instruction, instead of XOR, to clear a key register -- leaving the register's previous contents intact. As a result, the increment is accidentally XORed with the contents of a pointer contained in SqlSort's Import Address Table (IAT). An interesting feature of this PRNG is that it makes it difficult for the Internet community to assemble a list of the compromised Internet addresses. With earlier worms, it was sufficient to just collect a list of all addresses that probed into a large network. With Sapphire, one would need to monitor networks in every cycle of the random number generator for each salt value to have confidence of good coverage.

Measurements of Sapphire's Spread and Operator Response Figure: The early progress of Sapphire, as measured at the University of Wisconsin Advanced Internet Lab (WAIL) Tarpit. Source: http://www.cs.berkeley.edu/~nweaver/sapphire/

Measurements of Sapphire's Spread and Operator Response Figure: The response to Sapphire over the 12 hours after release, measured in several locations. Source: http://www.cs.berkeley.edu/~nweaver/sapphire/

Measurements of Sapphire's Spread and Operator Response They recorded all distinct infected IP addresses seen by our monitors in the first 30 minutes of worm spread. They noticed 74,856 distinct IP addresses, spread across a wide range of domains and geographic locations. Source: http://www.cs.berkeley.edu/~nweaver/sapphire/

Solution – Filter Algorithm • The idea is to implement a filter on the network stack that uses a series of timeouts to restrict the rate of connections to new hosts such that most normal traffic is unaffected. • Any traffic which attempts connections at a higher rate is delayed. The aim of the filter is to limit the rate of connections to new hosts.

Filter Algorithm The overall system is split into two parts that run in parallel: • a system to determine whether requests for connections are to new hosts; • and a system based on a series of timeouts that limits the rate of those connections. Source: http://www.acsac.org/2002/papers/97.pdf

Filter Algorithm tdelay = d(rattack - rallowed)t0 Where, d is time between timeouts t0 is the time that the connection was placed on the queue. If the attack rate is a lot greater than the allowed rate, then the delay queue will grow at roughly the attack rate, and the delay to the individual connections will grow as the queue length grows. Malicious code that has a high attack rate will thus be severely delayed.

Implications and Conclusions • One implication of this advance is that smaller susceptible populations are now vulnerable to attack. Formerly, small populations (<20,000 machines or less on the Internet) were not viewed as particularly vulnerable to worms, as the probability of finding a susceptible machine in any given scan is quite low. • However, a worm which can infect a population of 75,000 hosts in 10 minutes can similarly infect a population of 20,000 hosts in under an hour. Thus, exploits for less popular software present a viable breeding ground for new worms.

Implications and Conclusions • Though very simple, Sapphire represents a significant milestone in the evolution of computer worms. Although it did not contain a destructive payload, Sapphire spread worldwide in roughly 10 minutes causing significant disruption of financial, transportation, and government institutions. • It clearly demonstrates that fast worms are not just a theoretical threat, but a reality -- one that should be considered a standard tool in the arsenal of an attacker.

References • http://www.cs.berkeley.edu/~nweaver/sapphire/ • http://www.acsac.org/2002/papers/97.pdf • http://www.caida.org/outreach/papers/2002/codered/codered.pdf • http://www.icir.org/vern/papers/cdc-usenix-sec02/

Vandana Gunupudi CSCE 6581 October 11, 2005 Signature Generation for Worms

Agenda • Introduction • Signature Generation Methods • Autograph • Polygraph • Conclusion

Current Worm Detection • Internet Quarantine Methods • Destination port blocking • Infected source host IP blocking • Content-based blocking • Identify the payload content string that worm uses in its infection attempts and filter all flows whose payloads contain that content string

Observations • Worm • String in the payload content unique to a particular worm • Some part of the binary of a worm is invariant • Spreading dynamics of a worm significantly different from genuine applications • This behavior manifests itself as a large number of failed connection attempts • Polymorphic worms • Can change their binary representation during infection process • Encrypt code with a different, random key each time

How does this work? • General Signature Derivation Process • Report of anomalies in traffic from people • Capture the worm trace • Security experts perform manual analysis • Signature of the worm is generated Need to automate the process • Autograph[1] • Automatically generate signatures of previously unknown Internet worms (Focus on TCP worms) • as accurately/quickly as possible • Does not work for polymorphic worms • Polygraph[2] • Find invariant content elsewhere in the worm if binary representation changes

Requirements • Automation: Minimal manual intervention • Signature quality: Sensitive & specific • Sensitive • Should match all worms (have low false negative rate and high true positive rate) • Specific • Should match onlyworms (low false positive rate) • Timeliness: Early detection • Application neutrality • Broad applicability

Autograph • Runs on Linux/BSD Unix • Sits at the edge of a network, listens to traffic • How does it identify worms? • A worm wants to spread; so numerous copies present in the traffic • Each copy of the worm has the same content • Automatically generates signatures for worms in the traffic (works only for TCP) • Signature is a 3-tuple [IP-protocol, dest-port, byteseq] • Match network flows (possibly requiring flow reassembly) against signatures. • match when byteseq within the payload of a flow using the IP-protocol destined for dest-port

Autograph • Flow Classifier • Content-based Signature Generation • Tattler – component to share information among distributed Autograph monitors

Flow Classifier • Use a Port Scanner detection technique • Classify all flows from port-scanning sources as suspicious (simple and efficient) • Autograph can adopt any anomaly detection technique that classifies worm flows as suspicious • Autograph reassembles all TCP flows in the suspicious flow pool • Everyr minutes, Autograph considers initiating signature generation (experimentally determined r = 10 min) • It does so when the suspicious flow pool contains more than a threshold number of flows θ for a single destination port

Signature Generation • Now the goal is to select the most frequently occurring byte sequenceacross the flows in the suspicious flow pool as worm signature • How?: Content-based Payload Partitioning (COPP) • Divide each suspicious flow into smaller content blocks • Count the number of suspicious flows in which each content block occurs • Rank content blocks from most to least prevalent • One or more content block matches w% of flows in suspicious flow pool • Efficient method

Signature Generation • F- set of all suspicious flows • C – set of all content blocks produced by COPP • Choose most prevalent content block as signature

Performance Evaluation • Metrics • Sensitivity = # of true positives / total # of worm flows false negatives • Efficiency = # of true positives / total # of positives false positives • w:what % of suspicious flows in suspicious flow pool match signature • Trace • Contains 24-hour HTTP traffic; 1396 flows • Includes Nimda, CodeRed flows

Signature Quality • Larger block sizes (64 and 128 bytes) generate more specific signatures • As w increases, sensitivity (fewer false negatives) of signatures increases. • A range of w = 90-94.8%, (content block matches w% of flows in pool) produces a good signature

Limitations of Autograph • False positives for unspecific content block • Works only for TCP-based worms • Works only for scanning worms • Does not work for hit-list scanning worms • Does not work for polymorphic worms • Strong assumption: content of worms does not vary across replicated copies

Polymorphic Worms • Changes its appearance with every instance • Polymorphic worms minimize invariant content • Encrypted payload • Obfuscated decryption routine • Byte sequences of different instances look different • Code of worm remains same

Invariant content • Apache-Knacker Exploit header • GET request containing multiple Host headers • Server concatenates multiple Host headers into one, causing a buffer overflow • Invariant protocol framing strings: GET, HTTP/1.1, Host

Approach • Previous approaches use a single, contiguous byte sequence common among suspicious flows • there are multiple invariant substrings • Protocol framing • Value used to overwrite return address • Parts of poorly obfuscated code • combine the substrings : usemultiple disjoint byte strings • Longest substring • “HTTP/1.1”: 93% false positive rate • Most specific substring • “\xff\xbf”: .008% false positive rate (10 / 125,301)

Signature Classes • Conjunction Signature • Signature is a set of tokens (a contiguous byte sequence) • Flow matches signature iff it contains all tokens in the signature • Generation time is linear in total byte size of suspicious pool( well-known string processing algorithm [Hui1992]) • Generated signature: • “GET” and “HTTP/1.1” and “\r\nHost:” and “\r\nHost:” and “\xff\xbf”: .0024% false positive rate (3 / 125,301) • Token Subsequence Signature • Signature is an ordered set of tokens • Flow matches iff it contains all the tokens in signature, in the given order • Generated signature: • GET.*HTTP/1.1.*\r\nHost:.*\r\nHost:.*\xff\xbf: .0008% false positive rate (1 / 125,301) • Use dynamic programming to find longest common token subsequence (lcseq) between 2 samples in O(n2) time [SmithWaterman1981]

Experiments • How many worm samples do we need? • Too few samples  signature is too specific false negatives • Experimental setup • Using a 15 day port 80 trace from lab perimeter • Innocuous pool: First 5 days (45,111 streams) • Suspicious Pool: • Using Apache exploit described earlier • BIND-TSIG exploit: DNS based • Signature evaluation: • False positives:Last 10 days (125,301 streams) • False negatives: 1000 generated worm samples

Results • Correct signature generated 100% of the time when pool size > 2 • Correct signature generated 0% of time when pool size = 2

Conclusion • Stopping spread of novel worms requires early generation of signatures • Autograph: automated signature detection system • Automated suspicious flow selection→ Automated content prevalence analysis • COPP: robustness against payload variability • Distributed monitoring: faster signature generation • Autograph finds sensitive & specific signatures early in real network traces • Does not work for polymorphic worms

Conclusion • Polygraph works for polymorphic worms • Content variability is limited by nature of the software vulnerability • Use multiple, disjoint strings that are invariant across copies of a worm • Accurate signatures can be automatically generated for polymorphic worms • Demonstrated low false positives with real exploits, on real traffic traces

References • H. A. Kim and B. Karp, “Autograph: Toward Automated Distributed Worm Signature Detection, USENIX Security Symposium, 2004 • J.Newsome, B.Karp, D.Song, “Polygraph: automatically generating signatures for polymorphic worms”, Proc. of IEEE Privacy and Security, 2005, pp.226-241.

Data Mining Worm Detection “Detecting Mass-Mailing Worm Infected Hosts by Mining DNS Traffic Data” Authors: Keisuke Ishibashi and Masahiro Ishino Presented by: Terry Griffin

Data Mining Worm Detection Background DNS www.yahoo.com 198.126.44.221 198.126.44.221 RR = <name, TTL, class, type, data> name = www.acm.org TTL = time to live class = IN (internet request) type = “A” resource records are for domain names “MX” resource records are for mail exchange server query = client attempt to resolve an ip address using <name,type> (paper defined)

Data Mining Worm Detection Background Mass Mailing Worms and DNS • Mass mail worms propagate by sending virus emails • many time with self contained SMTP servers • Before they send the mail, • MX query is sent to local DNS server • Find appropriate server for mail address • Self contained SMTP servers keep worm away from ISP’s mail servers (therefore they’re worthless to monitor)

Data Mining Worm Detection Background Data Set • Obtained from largest ISP in Japan • Logged from 15:00 to 17:00 on March 7 , 2005 • There were 31,278,205 queries • Unique hosts = 221,782 • Unique query contents = 1,302,204 REF REF

< name, type, class, TTL, data > Data Mining Worm Detection Background Data Set • Baseline signature derived from a flaw in many Mailing Worms • A null byte was inserted in the wrong place. /0

Sapphire Worm

Sapphire Worm

Presentation Transcript

WORM FARMING

Sapphire

Slammer Worm

Worm

Worm

Sapphire

Sapphire

WORM DRIVE

guinea worm!!

Blaster Worm

Worm Gear

sapphire

Worm Structure

Sapphire/Slammer Worm

Worm gear

Meningeal Worm

The Spread of the Sapphire/Slammer Worm

Worm- Base

The Worm

Sapphire