
Benchmarking Anomaly-based Detection Systems


Presentation Transcript


  1. Benchmarking Anomaly-based Detection Systems Ashish Gupta Network Security May 2004

  2. Overview • The Motivation for this paper • Waldo example • The approach • Structure in data • Generating the data and anomalies • Injecting anomalies • Results • Training and Testing: the method • Scoring • Presentation • The ROC curves: somewhat obvious

  3. Motivation Does anomaly detection depend on the regularity/randomness of the data?

  4. Where’s Waldo!

  5. Where’s Waldo!

  6. Where’s Waldo!

  7. The aim • Hypothesis: differences in data regularity affect anomaly detection • Different environments → different regularity • Regularity: is the data highly redundant, or random? • Example of the environment’s effect: 010101010101010101010101 vs. 0100011000101000100100101

  8. Consequences • One IDS: different false alarm rates in different environments • Do we need a custom system/training for each environment? • Temporal effects: regularity may vary over time

  9. Structure in data Measuring randomness

  10. Measuring randomness 010101010101010101010101 vs. 0100011000101000100100101 • Randomness → relative entropy • Sequential dependence → conditional relative entropy (see the definitions below)
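The slide names the measures without defining them; these are the standard information-theoretic definitions (a LaTeX sketch — the paper's regularity index builds on conditional relative entropy, but any normalization it applies is not shown here). For an observed distribution p measured against a reference distribution q over the same alphabet:

    D(p \,\|\, q) \;=\; \sum_{x} p(x)\,\log \frac{p(x)}{q(x)}

and, capturing sequential dependence between adjacent symbols, the conditional relative entropy:

    D\bigl(p(y \mid x) \,\|\, q(y \mid x)\bigr) \;=\; \sum_{x} p(x) \sum_{y} p(y \mid x)\,\log \frac{p(y \mid x)}{q(y \mid x)}

A perfectly periodic string such as 0101… has near-zero conditional entropy (each symbol is determined by its predecessor), while a random string does not.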

  11. The benchmark datasets • Three types: • Training data (the background data) • Anomalies • Testing data (background + anomalies) • Generating the sequences: 5 sets, each set → 11 files of increasing regularity • Each set → a different alphabet size • Alphabet size decides complexity (generation sketch below)
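The paper does not publish its generator, so the following Python sketch is purely illustrative of the setup the slide describes: one alphabet per set, eleven files per set, regularity swept from fully random to fully periodic.

    import random

    def generate_sequence(alphabet, length, regularity, seed=0):
        """Illustrative background-data generator (not the paper's).

        regularity = 0.0 -> i.i.d. uniform-random symbols
        regularity = 1.0 -> a perfectly periodic cycle over the alphabet
        Intermediate values mix the two, giving increasing structure.
        """
        rng = random.Random(seed)
        out = []
        for i in range(length):
            if rng.random() < regularity:
                out.append(alphabet[i % len(alphabet)])  # periodic component
            else:
                out.append(rng.choice(alphabet))         # random component
        return "".join(out)

    # One "set": a fixed alphabet, 11 files of increasing regularity (0.0 .. 1.0).
    files = [generate_sequence("AB", 10_000, r / 10) for r in range(11)]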

  12. Anomaly generation • What’s a surprise? Something that differs from its expected probability • Types: • Juxtapositional: unexpected arrangements of familiar data, e.g. 001001001001001001111 • Temporal: unexpected periodicities • Other types?

  13. Types in this paper (see the classifier sketch below) • Foreign symbol: AAABABBBABABCBBABABBA (C never appears in training) • Foreign n-gram: AAABABAABAABAAABBBBA (known symbols, unseen arrangement) • Rare n-gram: AABBBABBBABBBABBBABBBABBAA (seen in training, but very infrequently)
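A small Python sketch of how these three categories could be told apart against the training data (a hypothetical helper for illustration; the paper defines the types, not this code):

    from collections import Counter

    def classify_ngram(ngram, training, rare_threshold=1e-4):
        """Classify an n-gram relative to a training sequence."""
        n = len(ngram)
        alphabet = set(training)
        counts = Counter(training[i:i + n] for i in range(len(training) - n + 1))
        total = sum(counts.values())
        if any(sym not in alphabet for sym in ngram):
            return "foreign symbol"   # uses a symbol outside the training alphabet
        if ngram not in counts:
            return "foreign n-gram"   # known symbols, never-seen arrangement
        if counts[ngram] / total < rare_threshold:
            return "rare n-gram"      # seen, but far below expectation
        return "normal"

    print(classify_ngram("ABC", "ABABABAB"))  # -> 'foreign symbol' (C is untrained)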

  14. Injecting anomalies • Make sure anomalies form no more than 0.24% of the test data (injection sketch below)
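A sketch of injection under that cap (illustrative only; the positions, counts, and labeling scheme are assumptions, not the paper's procedure):

    import random

    def inject_anomalies(background, anomalies, max_rate=0.0024, seed=0):
        """Splice anomalous n-grams into background data, keeping the
        fraction of anomalous symbols at or below max_rate (0.24%)."""
        rng = random.Random(seed)
        budget = int(len(background) * max_rate)   # anomalous symbols allowed
        data, labels, used = list(background), [0] * len(background), 0
        for a in anomalies:
            if used + len(a) > budget:
                break
            pos = rng.randrange(len(data) - len(a))
            data[pos:pos + len(a)] = list(a)
            labels[pos:pos + len(a)] = [1] * len(a)
            used += len(a)
        return "".join(data), labels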

  15. The experiments The Hypothesis is true

  16. The hypothesis • The nature of the “normal” background noise affects signal detection • The anomaly detector: detects anomalous subsequences • Learning phase → an n-gram probability table • Unexpected event → anomaly! • The anomaly threshold decides the level of surprise (detector sketch below)
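A runnable sketch of a detector in this family, assuming a sliding-window n-gram table (the surprise score here is illustrative, not the paper's exact rule):

    from collections import Counter

    class NgramDetector:
        """Learn n-gram frequencies from training data; at test time,
        flag any window whose surprise exceeds the threshold."""

        def __init__(self, n=3):
            self.n = n
            self.counts = Counter()
            self.total = 0

        def train(self, data):
            for i in range(len(data) - self.n + 1):
                self.counts[data[i:i + self.n]] += 1
                self.total += 1

        def surprise(self, window):
            # 0 = frequently seen in training, 1 = never seen.
            return 1.0 - self.counts[window] / self.total

        def detect(self, data, threshold):
            return [i for i in range(len(data) - self.n + 1)
                    if self.surprise(data[i:i + self.n]) > threshold]

    det = NgramDetector(n=3)
    det.train("ABABABABABABABAB")
    print(det.detect("ABABAACAB", threshold=0.99))  # -> [3, 4, 5, 6]: the unseen
                                                    # windows around 'AAC' are flagged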

  17. Example of anomaly detection AAC → ANOMALY!

  18. Scoring • Event outcomes: hits, misses, false alarms • The threshold decides the level of surprise: 0 → completely unsurprising, 1 → astonishing • Needs to be calibrated (scoring sketch below)
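A sketch of the scoring, assuming per-position ground-truth labels such as those produced by the inject_anomalies helper above (the pairing of alarms to events is simplified):

    def score_alarms(alarms, labels):
        """Count hits, misses, and false alarms from flagged positions
        against ground-truth anomaly labels."""
        alarms = set(alarms)
        hits = sum(1 for i, lab in enumerate(labels) if lab and i in alarms)
        misses = sum(1 for i, lab in enumerate(labels) if lab and i not in alarms)
        false_alarms = sum(1 for i, lab in enumerate(labels)
                           if not lab and i in alarms)
        return hits, misses, false_alarms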

  19. Presentation of results • Two aspects: % correct detections and % false detections • The detector operates through a range of sensitivities • Higher sensitivity → more hits, but also more false alarms • Need the right sensitivity (ROC sweep below)
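Sweeping the threshold produces one (% false detections, % correct detections) point per sensitivity, which is exactly an ROC curve; a sketch reusing the hypothetical helpers above:

    def roc_points(detector, test_data, labels, thresholds):
        """One ROC point per threshold:
        (% false detections, % correct detections)."""
        n_anom = sum(labels)
        n_norm = len(labels) - n_anom
        points = []
        for t in thresholds:
            hits, _, fas = score_alarms(detector.detect(test_data, t), labels)
            points.append((100 * fas / n_norm, 100 * hits / n_anom))
        return points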

  20. Interpretation • Nothing overlaps → regularity affects detection!

  21. What does this mean? • Detection metrics are data dependent • Cannot say: “My XYZ product will flag 75% of anomalies with a 10% false alarm rate!” • Sir, are you sure?

  22. Real-world data • Regularity index of system calls for different users

  23. Is this surprising? • What about network traffic?

  24. Conclusions • Data structure drives anomaly detection effectiveness • Evaluation is data dependent

  25. Conclusions • Change in regularity → a different system, or changed parameters

  26. Quirks? • Assumes rather naïve detection systems • “Simple retraining will not suffice” • An intelligent detection system could take regularity into account • What really is an anomaly? • If the data is highly irregular, won’t randomness itself produce some anomalies? • Anomaly is a relative term • Here, anomalies are generated independently of the background data
