Network Payload-based Anomaly Detection and Content-based Alert Correlation

Network Payload-based Anomaly Detection and Content-based Alert Correlation Ke Wang Thesis Defense Aug. 14th, 2006 Department of Computer Science Columbia University

Why do we need payload-based anomaly detection? • Attacks carried over otherwise normal connections may deliver bad (anomalous) content indicative of a new exploit • Slow and stealthy, or targeted/hitlist worms do not display “loud and obvious” scanning or propagation behavior detectable via flow statistics • This sensor augments other sensors and enriches the view of the network

Conjecture and Goal • Detect Zero-Day Exploits via Content Analysis • Worms – propagation detectable via flow statistics (except perhaps slow worms) • Targeted Attacks (sophisticated, stealthy, no “loud and obvious” propagation) • True Zero-day will manifest as “never before seen data” delivered to an application or server • Learn “typical/normal” data, detect abnormal data • Generate signature immediately to stop further propagation • No need to wait until “payload prevalence” (a sufficient number of repeated occurrences of the same content) • Develop sensors that are accurate, efficient, scalable, with resiliency to mimicry attacks

Contributions • Demonstrate the usefulness of analyzing network payload for anomaly detection • PAYL: 1-gram modeling • Anagram: higher order n-gram modeling • Randomized modeling/testing that can help thwart mimicry attacks • Ingress/egress payload correlation to capture a worm’s initial propagation attempt • Efficient privacy-preserving payload correlation across sites, and automatic signature generation

Contributions • Demonstrate the usefulness of analyzing network payload for anomaly detection • PAYL: 1-gram modeling • Statistical, semantics/language-independent, efficient • Incremental learning • Clustering for space saving • Multi-centroids fine grained modeling • Anagram: higher order n-gram modeling • Randomized modeling/testing that can help thwart mimicry attacks • Ingress/egress payload correlation to capture a worm’s initial propagation attempt • Efficient privacy-preserving payload correlation across sites

Motivation of PAYL • Traffic to different ports has very different payload distributions • Within one port, packets with different lengths also have different payload distributions • Furthermore, worm/virus payloads usually differ markedly from the normal distributions • Previous work: • Attack signature: Snort, Bro • First few bytes of a packet: NATE, PHAD, ALAD • Service-specific IDS [CKrugel02]: coarse modeling, 256 ASCII characters in 6 groups.

Example byte distributions for different ports ssh Mail Web

Example byte distribution for different payload lengths of port 80 on the same host server

CR II distribution versus a normal distribution

How to model “normal” content: 1-gram Centroid The average relative frequency of each byte, and the standard deviation of the frequency of each byte, for payload length 185 of port 80
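A minimal sketch of how such a 1-gram centroid could be computed, assuming the training payloads for one (port, length) bin are already collected in memory. The function names are illustrative and not taken from the PAYL implementation, and the population standard deviation is an assumption.

```python
import math

def byte_frequencies(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload) or 1  # avoid division by zero on an empty payload
    return [c / total for c in counts]

def build_centroid(payloads: list[bytes]) -> tuple[list[float], list[float]]:
    """Per-byte mean and (population) standard deviation of the frequencies."""
    freqs = [byte_frequencies(p) for p in payloads]
    n = len(freqs)
    mean = [sum(f[i] for f in freqs) / n for i in range(256)]
    std = [
        math.sqrt(sum((f[i] - mean[i]) ** 2 for f in freqs) / n)
        for i in range(256)
    ]
    return mean, std
```

The batch form is used here only for readability; an incremental update of the mean and variance would fit the epoch-based training described on the PAYL slides.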

PAYL operation • Learning phase • Models are computed from packet stream incrementally conditioned on port/service and length of packet • Hands-free epoch-based training • Fine-grained multi-centroids modeling • Clustering: merge two neighbouring centroids if their Manhattan distance is smaller than threshold • Save space, remove redundancy, linear time computation • Improve the modeling accuracy for those length bins with few training data (sparseness) • Self-calibration phase • Sampled training data sets an initial threshold setting • Detection phase • Packets are compared against models using simplified Mahalanobis distance
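The detection and clustering steps above can be sketched as follows. This is an illustration under stated assumptions: the simplified Mahalanobis distance is taken to be the sum of per-byte absolute deviations divided by the standard deviation plus a smoothing factor `alpha`, and the merge rule (a single left-to-right pass that averages means weighted by sample count) is one plausible reading of "merge two neighbouring centroids". The names `alpha`, `merge_neighbours` and the default values are hypothetical.

```python
def simplified_mahalanobis(freq, mean, std, alpha=0.001):
    """Sum of per-byte deviations scaled by (std + alpha); alpha avoids division by zero."""
    return sum(abs(x - m) / (s + alpha) for x, m, s in zip(freq, mean, std))

def manhattan(a, b):
    """Manhattan (L1) distance between two frequency vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def merge_neighbours(centroids, threshold):
    """Merge adjacent length-bin centroids whose means are closer than threshold.

    centroids: list of (mean_vector, sample_count), ordered by payload length.
    Only the means are merged in this sketch; standard deviations are omitted.
    """
    if not centroids:
        return []
    merged = [centroids[0]]
    for mean, n in centroids[1:]:
        prev_mean, prev_n = merged[-1]
        if manhattan(prev_mean, mean) < threshold:
            total = prev_n + n
            combined = [(p * prev_n + m * n) / total for p, m in zip(prev_mean, mean)]
            merged[-1] = (combined, total)
        else:
            merged.append((mean, n))
    return merged
```

A packet would then be flagged when its distance to the centroid for its port and length bin exceeds the threshold chosen during self-calibration.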

Performance comparison: single centroid vs. multi-centroids Test Worms: CR, CRII, WebDAV, and nsiislog.dll buffer overflow vulnerability (MS03-022) At 0.1% false positive rate: 5.8 alerts/h for EX, 6 alerts/h for W, 8 alerts/h for W1

PAYL Summary • Models: length conditioned character frequency distribution (1-gram) and standard deviation of normal traffic • Testing: Mahalanobis distance of the test packet against the model • Pro: • Simple, fast, memory efficient • Con: • Cannot capture attacks displaying normal byte distribution • Easily fooled by mimicry attacks with proper padding

Example: phpBB forum attack GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://81.174.26.111/cmd.gif?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:.128.59.16.26.User‑Agent:.Mozilla/4.0.(compatible;.MSIE.6.0;.Windows.NT.5.1;).. • Relatively normal byte distribution, so PAYL misses it • Abnormal sequence of commands for exploitation • The attack invariants • The subsequence of new, distinct byte values should be “malicious” • What we need: capture order dependence of byte sequences: higher order n-gram modeling

Contributions • Demonstrate the usefulness of analyzing network payload for anomaly detection • PAYL: 1-gram modeling • Anagram: higher order n-gram modeling • Binary-based modeling • Bloom filter for space efficiency • Semi-supervised learning • Privacy-preserving payload alert for correlation • Randomized modeling/testing that can help thwart mimicry attacks • Ingress/egress payload correlation to capture a worm’s initial propagation attempt • Efficient privacy-preserving payload correlation across sites

Overview of Anagram • Binary-based higher order n-gram modeling • Models all the distinct n-grams appearing in the normal training data • During test, compute the percentage of never-seen distinct n-grams out of the total n-grams in a packet: Score = N_new / T, where N_new is the number of never-seen n-grams and T is the total number of n-grams in the packet • Semi-supervised learning • Normal traffic is modeled • Prior known malicious traffic is modeled: Snort Rules, captured malcode • Model is space-efficient by using Bloom filters • Previous work • Foreign system call sequences [Forrest96] • Trie-based n-gram storage and comparison for network anomaly detection [Rieck06]
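A minimal sketch of the Anagram idea described above: store the n-grams of normal traffic in a Bloom filter, then score a packet by the fraction of its n-grams that were never seen. The Bloom filter here is a toy built on `hashlib.sha256`; its size, the number of hashes, the default n = 5 and all function names are assumptions for illustration. Every unseen n-gram occurrence is counted; counting only distinct unseen n-grams would be a small variation. The semi-supervised "known bad" model is left out.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, a small chance of false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def ngrams(payload: bytes, n: int) -> list[bytes]:
    """All overlapping n-grams of the payload, in order."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]

def train(model: BloomFilter, payloads, n=5) -> None:
    """Insert every n-gram of the (assumed normal) training payloads."""
    for payload in payloads:
        for gram in ngrams(payload, n):
            model.add(gram)

def anagram_score(model: BloomFilter, payload: bytes, n=5) -> float:
    """Fraction of the packet's n-grams that are absent from the model."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in model)
    return unseen / len(grams)
```

A packet made entirely of n-grams seen in training scores 0; a packet of entirely unfamiliar content scores close to 1.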

False positive rate (with 100% detection rate) with different training time and n of n-grams Normal traffic: real web traffic collected from two CUCS web servers Test worms: CR, CRII, WebDAV, Mirela, phpBB forum attack, nsiislog.dll buffer overflow (MS03-022) • Low false positive rate per packet (better per flow) • No significant gain after 4 days’ training • Higher order n-grams need longer training time to build a good model • 3-grams are not long enough to distinguish malicious byte sequences from normal ones

The false positive rate (with 100% detection rate) for different n-grams, under both normal and semi-supervised training – per packet rate

Mimicry attacks • Attackers can mimic normal traffic and hide the exploit inside “the sled” to evade the sensor. • Example: polymorphic mimicry worm developed by [OK05] targeting PAYL, which uses encoding and traffic blending to simulate a normal profile.

Contributions • Demonstrate the usefulness of analyzing network payload for anomaly detection • Randomized modeling/testing that can help thwart mimicry attacks • Ingress/egress payload correlation to capture a worm’s initial propagation attempt • Efficient privacy-preserving payload correlation across sites

Randomization against mimicry attacks • The general idea of payload-based mimicry attacks is to craft small pieces of exploit code with a large amount of “normal” padding so that the whole packet looks normal. • If we randomly choose the payload portion for modeling/testing, the attacker would not know precisely which byte positions it may have to pad to appear normal; harder to hide the exploit code! • This is a general technique that can be used for both PAYL and Anagram, or any other payload anomaly detector. • For Anagram, additional randomization: keep the n-gram size a secret!

Randomized Modeling • Separate the whole packet randomly into several (possibly interleaved) substrings or subsequences: S1, S2, ..., SN, and build one model for each of them • Test packet’s payload is divided accordingly

Shortcomings: • Models from sub-partitions may be similar • Higher memory consumption, no real model diversity • The testing partitioning needs to be the same as the training partitioning • Less flexibility • Need to retrain in order to change partitions
Figure: the top plot is the model built from the whole packet, and the bottom two are the models built from two random sub-partitions.

Randomized Testing • Simpler strategy that does not incur substantial overhead • Build one model for whole packet, randomize tested portions • Separate the whole packet randomly into several (possibly interleaved) partitions: S1, S2, ..., SN • Score each randomly chosen partition separately • Use the maximum score over the partitions: Score = max Score(Si)
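The randomized testing strategy above can be sketched as follows, assuming a detector that exposes a scoring function over raw bytes. The byte-level random assignment shown here suits a 1-gram model such as PAYL; for Anagram the results slide mentions a chunked random mask instead, which is not shown. `non_printable_fraction` is a hypothetical stand-in scorer, not part of either sensor.

```python
import random

def random_partitions(length: int, num_parts: int, rng: random.Random):
    """Assign each byte position to one of num_parts interleaved partitions."""
    parts = [[] for _ in range(num_parts)]
    for pos in range(length):
        parts[rng.randrange(num_parts)].append(pos)
    return [p for p in parts if p]  # drop partitions that received no bytes

def randomized_score(payload: bytes, score_fn, num_parts=2, rng=None) -> float:
    """Score each random partition separately and return the maximum."""
    rng = rng or random.Random()
    best = 0.0
    for positions in random_partitions(len(payload), num_parts, rng):
        sub = bytes(payload[i] for i in positions)
        best = max(best, score_fn(sub))
    return best

def non_printable_fraction(data: bytes) -> float:
    """Hypothetical stand-in scorer: share of bytes outside printable ASCII."""
    if not data:
        return 0.0
    return sum(1 for b in data if b < 32 or b > 126) / len(data)
```

Because the partition mask is drawn fresh for each test, an attacker cannot know in advance which bytes will be scored together, so padding placed to balance the whole packet may not balance every partition.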

PAYL Test: on the mimicry attack designed by [OK05] targeting it, 20 fold randomized testing

Anagram Test: average false positive rate and standard deviation with 100% detection rate, chunked random mask, 10 fold randomized testing Normal training Semi-supervised training

Contributions • Demonstrate the usefulness of analyzing network payload for anomaly detection • Randomized modeling/testing that can help thwart mimicry attacks • Ingress/egress payload correlation to capture a worm’s initial propagation attempt • Detect slow or stealthy worms • Immediate signature generation • Efficient privacy-preserving payload correlation across sites.

Ingress/egress correlation to detect worm’s propagation • Observation • Self-propagating worms will start attacking other machines (by sending at least the exploit portion of its content) shortly after a host is infected • The attacked destination port will be the same since it’s exploiting the same vulnerability • An approach to stop the worm’s very first propagation attempt • If we detect anomalous egress packets to port i very similar to those anomalous ingress packets to port i, there is a high probability that a worm has started its propagation • Advantage: • Can detect slow or stealthy worms which won’t show probe behavior and thus avoid probe detectors
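A minimal sketch of the ingress/egress check described above, assuming the anomaly sensor already produces alerts as (port, payload) pairs. The class name, the bounded per-port buffer and the default threshold are hypothetical; the similarity function is passed in so that any of the metrics from the deck could be plugged in.

```python
from collections import defaultdict, deque

class IngressEgressCorrelator:
    """Keep recent anomalous ingress payloads per port; compare egress alerts to them."""

    def __init__(self, similarity, threshold=0.5, buffer_size=100):
        self.similarity = similarity      # function (bytes, bytes) -> score in [0, 1]
        self.threshold = threshold        # assumed cut-off, not taken from the slides
        self.ingress = defaultdict(lambda: deque(maxlen=buffer_size))

    def on_ingress_alert(self, port: int, payload: bytes) -> None:
        """Remember an anomalous payload that arrived on the given port."""
        self.ingress[port].append(payload)

    def on_egress_alert(self, port: int, payload: bytes):
        """Return the matching ingress payload if one is similar enough, else None."""
        for seen in self.ingress.get(port, ()):
            if self.similarity(seen, payload) >= self.threshold:
                return seen
        return None

def string_equality(a: bytes, b: bytes) -> float:
    """Simplest metric: 1 for identical payloads, 0 otherwise."""
    return 1.0 if a == b else 0.0
```

A non-None result on an egress alert corresponds to the "high probability that a worm has started its propagation" case, and the matched content is the natural starting point for a signature.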

Similarity metrics to compare the payloads of two or more anomalous packet alerts
Metric | Data used | Handle fragment | Similarity score [0, 1] | Detect metamorphic
String equality (SE) | Raw data | No | 1 for equal, 0 otherwise | No
Longest common substring (LCS) | Raw data | Yes | 2*C/(L1+L2) | No
Longest common subsequence (LCSeq) | Raw data | Yes | 2*C/(L1+L2) | Some
Experiment result
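The LCS and LCSeq scores in the table can be sketched with standard dynamic programming. The reading assumed here is that C is the length of the common substring or subsequence and L1, L2 are the lengths of the two payloads; the function names are illustrative.

```python
def lcs_length(a: bytes, b: bytes) -> int:
    """Length of the longest common substring (contiguous run)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def lcseq_length(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence (gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]

def similarity(common: int, a: bytes, b: bytes) -> float:
    """Normalized score 2*C/(L1+L2) in [0, 1]."""
    total = len(a) + len(b)
    return 2 * common / total if total else 0.0
```

Both routines take time proportional to L1 times L2, which is acceptable for a handful of alert payloads but would be costly if applied to all traffic.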

|d0|$@|0 ff|5|d0|$@|0|h|d0| @|0|j|1|j|0|U|ff| 5|d8|$@|0 e8 19 0 0 0 c3 ff|%`0@|0 ff|%d0@|0 ff|%h0@|0 ff|%p0@|0 ff|%t0@|0 ff|%x0@|0 ff|%| 0@|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0|\EXP LORER.EXE|0 0 0|SOFTWARE\Microsoft\Windows NT \CurrentVersion\Winlogon|0 0 0|SFCDisable|0 0 9d ff ff ff|SYSTEM\CurrentControlSet\Service s\W3SVC\Parameters\Virtual Roots|0 0 0 0|/Scr ipts|0 0 0 0|/MSADC|0 0|/C|0 0|/D|0 0|c:\,,21 7|0 0 0 0|d:\,,217|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc … LCS signature generation: Code Red II

Previous Work • Worm signature generation: Autograph, Earlybird, Honeycomb, Polygraph, Hamsa • Detecting frequently occurring payload substrings or tokens from suspicious IP, which still depends on the scanning behavior • Detection occurs some time after the worm propagation • Cannot detect slow and stealthy worms

Contributions • Demonstrate the usefulness of analyzing network payload for anomaly detection • Randomized modeling/testing that can help thwart mimicry attacks • Ingress/egress payload correlation to capture a worm’s initial propagation attempt • Efficient privacy-preserving payload correlation across sites • Robust and privacy-preserving means of representing content-based alerts. • Automatic signature generation.

Cross-site payload alert correlation • Each site has a distinct content flow • Diversity via content (not system or software) • Find global, common “invariants in content”. • If multiple sites see the same/similar content alerts, it’s highly likely to be a true worm/targeted outbreak • Separate TP’s from FP’s! The False False Positive Problem • Reduces false positives by creating white lists of those alerts that cannot be correlated • Higher standard to prevent mimicry attack • Exploit writers/attackers have to learn the distinct content traffic patterns of many different sites • Need to be privacy-preserving

Related Research • DNAD/Worminator (slow/IP) sharing • Domino alert sharing • The DShield.org model for content sharing and querying • Could also serve as a “trap” to detect attacker watermarking behavior • PeerPressure, Privacy-Preserving friends troubleshooting network

Correlation techniques • Baseline • “Raw” suspect content string-based correlation: String equality (SE), longest common substring (LCS), longest common subsequence (LCSeq), edit distance (ED) • Frequency-modeled 1-gram correlation • Frequency distribution: Manhattan distance • Z-String: supports SE, LCS, LCSeq, ED • Binary-modeled n-gram correlation • N-gram signature, Bloom filter n-gram “signature”
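A minimal sketch of the binary-modeled n-gram correlation listed above: one site publishes only a Bloom filter of the n-grams in its suspect content, and another site measures how many of its own alert's n-grams fall inside that filter. The filter is represented as a Python integer used as a bit array; the hash construction, filter size and n = 5 are assumptions for illustration, not the parameters used in the thesis.

```python
import hashlib

def _bit_positions(gram: bytes, size_bits: int, num_hashes: int):
    """Bit positions for one n-gram, derived from salted SHA-256 digests."""
    for i in range(num_hashes):
        digest = hashlib.sha256(bytes([i]) + gram).digest()
        yield int.from_bytes(digest[:8], "big") % size_bits

def ngram_set(content: bytes, n: int) -> set:
    """Distinct n-grams of the content."""
    return {content[i:i + n] for i in range(len(content) - n + 1)}

def bloom_signature(content: bytes, n=5, size_bits=4096, num_hashes=3) -> int:
    """Bloom filter (as an int bit array) holding the content's distinct n-grams."""
    bits = 0
    for gram in ngram_set(content, n):
        for pos in _bit_positions(gram, size_bits, num_hashes):
            bits |= 1 << pos
    return bits

def match_fraction(remote_signature: int, local_content: bytes,
                   n=5, size_bits=4096, num_hashes=3) -> float:
    """Share of the local alert's distinct n-grams that appear in the remote filter."""
    grams = ngram_set(local_content, n)
    if not grams:
        return 0.0
    hits = sum(
        1 for gram in grams
        if all((remote_signature >> pos) & 1
               for pos in _bit_positions(gram, size_bits, num_hashes))
    )
    return hits / len(grams)
```

Only the bit array crosses site boundaries, so the raw payload is never exchanged; both sites must agree on n, the filter size and the hash functions beforehand.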

Example suspect content • Original content: “This is a bot command string” (256 bits) • Frequency distribution: the most frequent character is a space (ASCII code 32); size ≈ 8160 bits • List of 3-grams in the original string: Thi, his, is□, s□i, □is, s□a, □a□, a□b, □bo, bot, ot□, t□c, □co, com, omm, mma, man, and, nd□, d□s, □st, str, tri, rin, ing. A box (□) represents a space; the n-gram is□ appears twice in the original alert. 25 n-grams take approximately 600 bits • Z-String: □isamnotTbcdghr. The space (box) is the most frequent character; non-appearing characters are removed. 15 characters = 120 bits • Bloom filter of the above n-grams: 0000011010101101001101100110101101010…01010011101010101111000. If three hash values are used, a minimum optimal size would be ~ 150 bits.
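The Z-String and 3-gram representations of this example can be reproduced with a short sketch. The tie-breaking rule (equal frequencies ordered by ascending byte value) is an assumption; it happens to match the Z-String shown on the slide.

```python
from collections import Counter

def z_string(content: bytes) -> bytes:
    """Distinct bytes ordered from most to least frequent; absent bytes are dropped."""
    counts = Counter(content)
    ordered = sorted(counts, key=lambda b: (-counts[b], b))  # ties: ascending byte value
    return bytes(ordered)

def distinct_ngrams(content: bytes, n: int = 3) -> set:
    """Set of distinct n-grams in the content."""
    return {content[i:i + n] for i in range(len(content) - n + 1)}

alert = b"This is a bot command string"
print(z_string(alert))               # b' isamnotTbcdghr'
print(len(distinct_ngrams(alert)))   # 25
```

The Z-String keeps only the rank order of the byte values, which is why it is far more compact than the full frequency distribution while still supporting string comparisons such as LCS or edit distance.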

Real traffic evaluation • Goal: measure performance in identifying true alerts from false positives • Ideal: true positives have very high similarity scores, while false positives have very low scores • Mix the collection of attacks into two hours of traffic from www and www1 • Multiple, differently-fragmented instances of Code Red and Code Red II to simulate a real worm attack • Mixed sets are run through PAYL and Anagram, with alerting threshold reduced so that 100% of attacks are detected, but with possibly higher FP rates String evaluation

Real traffic evaluation (II) Range of scores across multiple instances of the same worm (CR or CRII) Range of scores across instances of different worms (CR vs. CRII), e.g., polymorphism False positive score range; blue bar represents the 99.9th percentile; white represents the maximum score Methods are, from 1 to 8: Raw-LCS, Raw-LCSeq, Raw-ED, Freq-MD, ZStr-LCS, ZStr-LCSeq, ZStr-ED, N-grams with n=5.

Real traffic evaluation (III) • Correlation of identical (non-polymorphic) attacks works accurately for all techniques • Non-fragmented attacks score near 1 • Z-Strings (MD, LCseq, ED) and n-grams handle fragmentation well • Polymorphism is hard to detect; only Raw-LCSeq and n-grams score well • Overall, n-grams are particularly effective at eliminating false positives, and Bloom filters enable privacy preservation

Signature Generation • Each class of techniques can generate its own signature • Raw packets: Exchange LCS/LCSeq • Not privacy-preserving • Byte frequency/Z-Strings • Given the frequency distribution, Z-Strings generated by ordering from most to least frequent and dropping the least frequent • N-grams • Robust to reordering or fragmentation • If position information is available, can “flatten” into a deployable string signature

Signature/Query generation (II)
Original CRII packet (first 300 bytes): GET./default.ida?XXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%u cbd3%u7801%u9090%u6858%ucbd3%u7801%u 9090%u6858%ucbd3%u7801%u9090%u9090%u 8190%u00c3%u0003%u8b00%u531b%u53ff%u 0078%u0000%u0
Byte frequency distribution (plot)
Z-String (first 20 bytes, ASCII values): 88 0 255 117 48 85 116 101 106 232 100 133 80 254 1 69 137 56 51
Flattened 5-grams (first 172 bytes; “*” implies wildcard): * /def*ult.ida?XXXX*XXXX%u9090% u6858%ucbd3%u7801%u9090%u6858%u cbd3%u7801%u9090%u6858%ucbd3%u7 801%u9090%u9090%u8190%u00c3%u00 03%u8b00%u531b%u53ff%u0078%u000 0%u00=a HT*: 3379

Accuracy of the signatures The cumulative frequency of the signature match scores computed by matching normal traffic against different worm signatures. The closer to the y-axis, the more accurate. The six curves represent the following, in order from the left to the right: 1) n-grams signature, 2) Z-string signature comparing using LCS, 3) LCSeq of raw signature, 4) Z-string signature using LCSeq, 5) LCSeq of raw signature, 6) byte-frequency signature.

Signature for polymorphic worm • Our approaches work poorly since they are based on payload similarity • Will there be enough invariants for an accurate signature? • Slammer: first byte “0x04” • CLET shellcode 2: “\0xff\0xff\0xff” and “\0xeb\0x31”. • Proposed alternative: “generalized signature” specifying the higher-level pattern of an attack, instead of raw payload based. • “0xeb 0x31”B {92 bytes, entropy: E, “0xff 0xff 0xff”B}

Conclusions • Network payload-based PAYL and Anagram can detect zero-day attacks with high accuracy and low false positives • Randomization helps thwart mimicry attacks • Ingress/egress correlation detects a worm’s initial propagation and generates an accurate worm signature • Good at detecting slow/stealthy worms • Privacy-preserving payload alert correlation across sites can identify true anomalies and reduce false positives • Accurate signature generation

Accomplishments • Major papers: • Anagram: A Content Anomaly Detector Resistant to Mimicry Attack, K. Wang, J. Parekh, S. Stolfo, RAID, Sept 2006. • Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection, J. Parekh, K. Wang, S. Stolfo, SIGCOMM LSAD Workshop, Sept, 2006. • Anomalous Payload-based Worm Detection and Signature Generation, K. Wang, G. Cretu, S. Stolfo, RAID, Sept 2005. • FLIPS: Hybrid Adaptive Intrusion Prevention, M. Locasto, K. Wang, A. Keromytis, S. Stolfo, RAID, Sept. 2005. • Anomalous Payload-based Network Intrusion Detection, K. Wang, S. Stolfo, RAID, Sept 2004. • Software implementation (licensed by Columbia): • PAYL sensor • Anagram sensor

Future Work • Further Evaluation – including • measures/features of high-entropy partitions • Optimization problem: model parameter settings (n-gram size, thresholds, etc.), random mask generation • Real deployment of multiple-site correlation • Shadow server architecture implementation and testing • Pushing into the host: integration with instrumented application software

Thank you! • Q/A ?
