Unsupervised Intrusion Detection Using Clustering Approach

Unsupervised Intrusion Detection Using Clustering Approach • Muhammet Kabukçu, Sefa Kılıç, Ferhat Kutlu, Teoman Toraman

Outline • Introduction • Using Clustering for Intrusion Detection • Methodology • Overall Summary • Conclusion • References

Introduction • Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents. • Incidents are violations or imminent threats of violation of: * computer security policies, * acceptable use policies, * standard security practices.

Introduction • An intrusion detection system (IDS) is software that automates the intrusion detection process. • IDSs primarily focus on identifying possible incidents and detecting when an attacker has successfully compromised a system by exploiting a vulnerability in the system.

Introduction

Signature-Based Detection • A signature is a pattern that corresponds to a known threat (e.g. a telnet attempt with a username of "root", which is a violation of an organization's security policy). • Signature-based detection is the process of comparing signatures against observed events to identify possible incidents. Advantage: Very effective at detecting known threats. Disadvantage: Ineffective at detecting previously unknown threats.
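A minimal sketch of signature matching, using the slide's telnet example. The `Event` fields and the `SIGNATURES` set are hypothetical names chosen for illustration; a real IDS matches far richer patterns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical observed event: the service used and the login name tried."""
    service: str
    username: str


# Hypothetical signature set: (service, username) pairs that are known violations.
SIGNATURES = {("telnet", "root")}


def matches_signature(event: Event) -> bool:
    """Return True if the observed event matches a known signature."""
    return (event.service, event.username) in SIGNATURES
```

An event that is not in the signature set passes unnoticed, which is exactly the disadvantage listed above for previously unknown threats.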

Anomaly-Based Detection • The process of comparing definitions of what activity is considered normal against observed events to identify significant deviations. • Capable of detecting previously unknown threats. • Uses host or network-specific profiles.

Detection by Stateful Protocol Analysis • The process of comparing predetermined profiles of generally accepted definitions of benign protocol activity for each protocol state against observed events to identify deviations. • Relies on vendor-developed universal profiles that specify how particular protocols should and should not be used.

Using Clustering for Intrusion Detection • Methods other than signature-based detection use data mining and machine learning algorithms to train on labeled network data. • For training data, there are two major paradigms: misuse detection and anomaly detection. Which one to use?

Using Clustering for Intrusion Detection - Misuse Detection - • In misuse detection, machine learning algorithms are used with labeled data. • Network data is classified using features extracted from labeled network traffic. • Detection models are retrained on new data that includes new types of attacks.

Using Clustering for Intrusion Detection - Anomaly Detection - • In anomaly detection, models are built by training on normal data, and deviations from the normal model are then searched for. • Generating purely normal data is very difficult and costly in practice. • It is very hard to guarantee that there are no attacks during the time the traffic is collected from the network.

Using Clustering for Intrusion Detection • Misuse detection or anomaly detection? • Use a mechanism that detects intrusions by training on unlabeled data. • Find the intrusions buried within that data.

Using Clustering for Intrusion Detection • Pipeline: a set of unlabeled data is fed to the unsupervised anomaly detection algorithm, which outputs the detected intrusion clusters; connections are then compared with the detected clusters to detect malicious attacks. • Assumptions for the unsupervised anomaly detection algorithm: * Intrusions are rare with respect to normal network traffic. * Intrusions are different from normal network traffic. • As a result: * Intrusions will appear as outliers in the data.

Using Clustering for Intrusion Detection • The unsupervised anomaly detection algorithm groups the unlabeled data instances into clusters using a simple distance-based metric.

Using Clustering for Intrusion Detection • Once the data is clustered, all instances that appear in small clusters are labeled as anomalies because: * Normal instances should form large clusters compared to intrusions. * Malicious intrusions and normal instances are qualitatively different, so they do not fall into the same cluster. [Figure: an intrusion cluster and a normal cluster]

Methodology • Description of the dataset • Metric & Normalization • Clustering Algorithm * Portnoy et al. * Y-means Algorithm • Labeling Clusters • Intrusion Detection

Description of the dataset • KDD Cup 1999 Data • Main attack categories: * DOS: Denial of service (e.g. SYN flood) * R2L: Unauthorized access from a remote machine (e.g. guessing a password) * U2R: Unauthorized access to local superuser (root) privileges (e.g. various buffer overflow attacks) * Probing: Surveillance and other probing (e.g. port scanning) • In total, 24 attack types in the training data; 14 additional ones in the test data.

Metric & Normalization • Euclidean Metric (for distance computation) • Feature Normalization (to eliminate the difference in the scale of features)
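The two steps on this slide can be sketched as follows. The slide does not say which normalization is used, so this sketch assumes z-score standardization (subtract the feature mean, divide by the feature standard deviation); the sample records are made up for illustration.

```python
import math
from statistics import mean, pstdev


def normalize(rows):
    """Standardize each feature column to zero mean and unit standard deviation.

    Assumption: z-score normalization; the slides only say "feature normalization".
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical connection records: (duration in seconds, bytes sent).
rows = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
normalized = normalize(rows)
```

Without normalization the byte counts would dominate every distance; after it, both features contribute on the same scale.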

Clustering Algorithm (Portnoy et al.) • Start with an empty set of clusters. • For each instance Xi in the training set, measure its distance to each existing cluster (d1, d2, d3 in the figure). • The smallest distance, d1, is selected. • If d1 < W (a predefined threshold value), Xi is assigned to that cluster. • Otherwise, a new cluster is created and Xi is assigned to it.

Clustering Algorithm (Portnoy et al.) • Advantage: No need to know the number of clusters in advance. • Disadvantage: W must be known in advance, and a poorly chosen W may cause some instances to be labeled incorrectly. • However…
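The single-pass procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes each cluster is represented by the first instance assigned to it, and the dictionary keys are hypothetical.

```python
import math


def fixed_width_clustering(instances, width):
    """Join the nearest cluster if it is closer than `width`, else start a new one."""
    clusters = []  # each cluster: {"center": [...], "members": [...]}
    for x in instances:
        nearest, nearest_dist = None, math.inf
        for cluster in clusters:
            d = math.dist(x, cluster["center"])
            if d < nearest_dist:
                nearest, nearest_dist = cluster, d
        if nearest is not None and nearest_dist < width:
            nearest["members"].append(x)
        else:
            # Assumption: the cluster center stays fixed at its first instance.
            clusters.append({"center": list(x), "members": [x]})
    return clusters
```

The number of clusters is never specified; it falls out of the data and the choice of `width` (W on the slide).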

Clustering Algorithm (Y-means Algorithm) • 3 main parts: • assigning instances to k clusters • splitting clusters • merging clusters

Clustering Algorithm (Y-means Algorithm) 1. Assigning instances to k clusters • k: number of clusters; n: number of instances in the dataset; 1 < k < n. • Each instance is assigned to one of the k clusters, and the cluster centroids are then redefined.
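The assign-and-redefine loop of this first part can be sketched as a plain k-means iteration. The function names are illustrative, and keeping an empty cluster's centroid in place is a simplification made for this sketch.

```python
import math


def assign(instances, centroids):
    """Return, for each instance, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
        for x in instances
    ]


def recompute(instances, labels, centroids):
    """Move each centroid to the mean of its members; keep it in place if it has none."""
    updated = []
    for j, old in enumerate(centroids):
        members = [x for x, label in zip(instances, labels) if label == j]
        if members:
            updated.append([sum(col) / len(members) for col in zip(*members)])
        else:
            updated.append(list(old))
    return updated


def k_means(instances, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    labels = assign(instances, centroids)
    for _ in range(max_iter):
        new_centroids = recompute(instances, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(instances, centroids)
    return centroids, labels
```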

Clustering Algorithm (Y-means Algorithm) 2. Splitting clusters • t (normal threshold) = 2.32σ, where σ is the standard deviation. • The region within distance t of a cluster's centroid is its confident area. • di is the distance of instance Xi from the centroid: if di > t, Xi is an outlier. • New clusters are created first from the farthest outliers.
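A minimal sketch of the outlier test used for splitting. The slide does not say how σ is computed, so this sketch assumes it is the root-mean-square distance of a cluster's members from its centroid.

```python
import math


def cluster_sigma(members, centroid):
    """Assumed definition of sigma: RMS distance of members from the centroid."""
    return math.sqrt(
        sum(math.dist(x, centroid) ** 2 for x in members) / len(members)
    )


def find_outliers(members, centroid, factor=2.32):
    """Return members farther than t = factor * sigma, farthest first."""
    t = factor * cluster_sigma(members, centroid)
    outliers = [x for x in members if math.dist(x, centroid) > t]
    return sorted(outliers, key=lambda x: math.dist(x, centroid), reverse=True)
```

Returning the outliers farthest first matches the order in which the slide says new clusters are created.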

Clustering Algorithm (Y-means Algorithm) 3. Merging clusters • If an instance Xi is in the confident area of two clusters, these clusters are merged back together.
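The merge condition can be sketched as a check over the members of two clusters. This sketch takes the confident area to be the region within 2.32σ of the centroid, following the splitting threshold; the dictionary keys are hypothetical.

```python
import math


def in_confident_area(x, centroid, sigma, factor=2.32):
    """True if x lies within factor * sigma of the centroid."""
    return math.dist(x, centroid) <= factor * sigma


def should_merge(cluster_a, cluster_b):
    """True if any member of either cluster lies in both confident areas.

    Clusters are dicts with hypothetical keys "centroid", "sigma" and "members".
    """
    return any(
        in_confident_area(x, cluster_a["centroid"], cluster_a["sigma"])
        and in_confident_area(x, cluster_b["centroid"], cluster_b["sigma"])
        for x in cluster_a["members"] + cluster_b["members"]
    )
```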

Labeling Clusters • Our first assumption: number of normal instances >> number of intrusions. • Label instances in large clusters as normal. • Label instances in small clusters as intrusions. • Label clusters as normal, starting from the largest, until 99% of the data is labeled normal; label the rest as intrusions. [Figure: a normal cluster and an intrusion cluster]
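The labeling rule can be sketched as a pass over the clusters from largest to smallest. Whether the cluster that crosses the 99% mark counts as normal is not stated on the slide; this sketch assumes it does.

```python
def label_clusters(cluster_sizes, normal_fraction=0.99):
    """Label clusters 'normal' from largest to smallest until the target share is covered."""
    total = sum(cluster_sizes)
    order = sorted(
        range(len(cluster_sizes)), key=lambda i: cluster_sizes[i], reverse=True
    )
    labels = [None] * len(cluster_sizes)
    covered = 0
    for i in order:
        if covered < normal_fraction * total:
            # Assumption: the cluster that crosses the threshold is still normal.
            labels[i] = "normal"
            covered += cluster_sizes[i]
        else:
            labels[i] = "intrusion"
    return labels
```

For example, with cluster sizes 5, 900, 95 and 3, the two largest clusters cover more than 99% of the 1003 instances, so the clusters of size 5 and 3 are labeled as intrusions.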

Intrusion Detection For test instance x, • Measure the distance to each cluster. • Select the nearest cluster C. • If C is normal cluster, label x as normal, • Otherwise label x as intrusion.
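The detection step above can be sketched as a nearest-centroid lookup. It assumes each cluster is summarized by a centroid and that the test instance has been normalized the same way as the training data; the argument names are illustrative.

```python
import math


def classify(x, centroids, cluster_labels):
    """Return the label ('normal' or 'intrusion') of the cluster nearest to x."""
    nearest = min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
    return cluster_labels[nearest]
```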

Overall Summary • IDS & IDS Technologies • Using Clustering for Intrusion Detection • Methodology * Description of the dataset * Metric & Normalization * Clustering Algorithm * Labeling Clusters * Intrusion Detection • Conclusion * Unsupervised clustering is chosen. * KDD Cup 1999 data is used. * The Y-means algorithm is used to build the ID system.

References [1] KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html. [2] Y. Guan and A. A. Ghorbani. Y-means: A clustering method for intrusion detection. In Proceedings of Canadian Conference on Electrical and Computer Engineering, pages 1083-1086, 2003. [3] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001. [4] K. Scarfone and P. Mell. Guide to intrusion detection and prevention systems (IDPS), 2007.

Questions?
