Single Pass Anomaly Detection In Network
Deepak Garg, 05329015, KReSIT, MTech II, IIT Bombay
Guide: Prof. Om P. Damani
• Anomaly Detection
  • Refers to network behaviour that deviates from normal network behaviour
  • Studies typical normal use and detects abnormal usage
  • Nature of the attack is unknown
• Misuse Detection
  • Uses previously known attacks and flags matching patterns
  • Nature of the attack is known
Motivation and Our Approach
• To prevent an attack, one needs to detect it as soon as possible.
• Anomaly detection is a real-time problem, so an anomaly must be detected in a single pass over the data.
• The algorithm should be adaptive.
• We use a clustering algorithm for anomaly detection, in which similar data points are grouped together into clusters.
• Our algorithm combines three single-pass algorithms: ADWICE-BOX, GenIc and Tree-Based. These algorithms differ in efficiency, and our aim is to combine their results to achieve a higher efficiency than any of them achieves individually.
GenIc
• Select input parameters M, K, N.
• Initialize M clusters, each with weight Wi = 1.
• Perform incremental clustering on each chunk.
• Check for cluster survival.
• Calculate the final clusters.
• Change the cluster-killing approach.
• Efficiency is 81.2%.
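The incremental clustering step can be illustrated with a minimal sketch. It assumes Euclidean distance and a weighted running-mean update of the nearest center. The function names `init_weights` and `assign_point` are hypothetical, and the survival check is left out because the slide does not state the rule used.

```python
import math

def init_weights(m):
    """Every one of the M initial clusters starts with weight Wi = 1."""
    return [1] * m

def assign_point(centers, weights, x):
    """Assign x to its nearest center and update that center in place.

    Assumption: the center moves to the weighted mean of its old
    position and the new point, and its weight grows by one.
    Returns the index of the updated center.
    """
    i = min(range(len(centers)), key=lambda j: math.dist(centers[j], x))
    w = weights[i]
    centers[i] = [(w * c + xi) / (w + 1) for c, xi in zip(centers[i], x)]
    weights[i] = w + 1
    return i

# Usage: two centers, one incoming point.
centers = [[0.0, 0.0], [10.0, 10.0]]
weights = init_weights(len(centers))
idx = assign_point(centers, weights, [2.0, 0.0])
```

Each point is touched once and only the M centers and weights are kept, which is what makes the update usable in a single pass.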
Tree-Based
• Efficiency is 84.2%.
ADWICE-BOX
• Input parameters:
  • Total number of clusters M.
  • Initial threshold T.
  • Threshold step for updating T.
  • Leaf size LS.
• ADWICE uses BIRCH as the clustering algorithm for learning. BIRCH suffers from several problems:
  • Cluster structure.
  • Threshold calculation.
  • Distance-based calculation.
• CF = (n, Ls, ss, min, max).
• Efficiency is 92.1%.
• It is memory efficient: ADWICE stores the CFs in main memory instead of all training data points.
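A cluster feature of the form CF = (n, Ls, ss, min, max) can be maintained incrementally, which is why the raw training points need not be stored. The sketch below is one possible reading, not the authors' implementation: it assumes Ls is the per-dimension linear sum, ss is a scalar sum of squared values, and min/max are per-dimension bounds forming a box. The class name `ClusterFeature` and the `in_box` test are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ClusterFeature:
    n: int = 0                                # number of points absorbed
    ls: list = field(default_factory=list)    # linear sum per dimension
    ss: float = 0.0                           # sum of squared values (assumed scalar)
    lo: list = field(default_factory=list)    # per-dimension minimum
    hi: list = field(default_factory=list)    # per-dimension maximum

    def add(self, x):
        """Absorb one point without storing it."""
        if self.n == 0:
            self.ls, self.lo, self.hi = list(x), list(x), list(x)
        else:
            self.ls = [a + b for a, b in zip(self.ls, x)]
            self.lo = [min(a, b) for a, b in zip(self.lo, x)]
            self.hi = [max(a, b) for a, b in zip(self.hi, x)]
        self.ss += sum(v * v for v in x)
        self.n += 1

    def centroid(self):
        """Centroid recovered from the summary alone."""
        return [v / self.n for v in self.ls]

    def in_box(self, x):
        """True if x lies inside the cluster's min/max box."""
        return all(l <= v <= h for l, v, h in zip(self.lo, x, self.hi))

# Usage: summarize two points, then query the summary.
cf = ClusterFeature()
cf.add([1, 2])
cf.add([3, 6])
```

Memory use depends on the number of clusters and dimensions, not on the number of training points.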
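Cont.
• FP Rate = FP / (FP + TN)
• Detection Rate = TP / (TP + FN)
• Here TP, FP, TN and FN are the counts of true positives, false positives, true negatives and false negatives.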
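The two rates follow directly from confusion-matrix counts. A small sketch, with hypothetical function names and made-up counts used only to show the arithmetic:

```python
def fp_rate(fp, tn):
    """False positive rate: fraction of normal traffic flagged as anomalous."""
    return fp / (fp + tn)

def detection_rate(tp, fn):
    """Detection rate: fraction of attacks that were flagged."""
    return tp / (tp + fn)

# Illustrative counts only, not results from the slides.
fpr = fp_rate(fp=5, tn=95)
dr = detection_rate(tp=90, fn=10)
```

Both functions assume a non-zero denominator, that is, at least one normal and one attack record in the evaluation set.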
Conclusion and Future Work
• The efficiency of the GenIc algorithm is found to be 81.2%, that of the Tree-Based algorithm 84.1%, and that of the ADWICE-BOX algorithm 92.1%. Our algorithm, which combines the results of all three, exhibits an efficiency of 88.1%.
• Make the algorithm independent of the maximum number of clusters.
• Increase the total efficiency of the combined algorithms.
• We will also concentrate on learning from the input data set so that the result does not depend on the order of the input data.
• We also worked on decreasing the false positive rate by introducing additional statistical parameters for the clusters.
References
• Kalle Burbeck and Simin Nadjm-Tehrani. ADWICE: Anomaly detection with real-time incremental clustering. In Seongtaek Chee and Choonsik Park, editors, Lecture Notes in Computer Science, pages 407-424. Springer Berlin / Heidelberg, January 2005.
• Philip K. Chan, Matthew V. Mahoney, and Muhammad H. Arshad. A machine learning approach to anomaly detection. Technical Report, Department of Computer Sciences, Florida Institute of Technology, March 2003.
• Chetan Gupta and Robert L. Grossman. GenIc: A single pass generalized incremental algorithm for clustering. SIAM, 2004.
• Matthew V. Mahoney. Network traffic anomaly detection based on packet header. ACM, 2003.
Packet Header Anomaly Detection
• Checks the fields of the packet header for anomalies.
  • Link Layer
  • Network Layer
  • Transport Layer
• The model detects novel attacks.
• Split large fields and merge small fields.
• During training, record each value of every field.
• Problems
  • Requires excessive memory.
  • Insufficient training data leads to overfitting.
• Solutions
  • PHAD-H1000
  • PHAD-C32, PHAD-C1000
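The training idea of recording each observed field value can be sketched as follows. This is a simplified illustration only: it flags any value not seen in training and does not reproduce PHAD's scoring or the H1000/C32/C1000 variants. The field names `ttl` and `dport` and both function names are hypothetical.

```python
from collections import defaultdict

def train(packets):
    """Record every value seen for each header field during training."""
    seen = defaultdict(set)
    for pkt in packets:
        for name, value in pkt.items():
            seen[name].add(value)
    return seen

def anomalous_fields(seen, pkt):
    """Return the fields of pkt whose value never occurred in training."""
    return [name for name, value in pkt.items()
            if value not in seen.get(name, set())]

# Usage: headers are represented as plain dicts of field name to value.
model = train([{"ttl": 64, "dport": 80}, {"ttl": 128, "dport": 80}])
flagged = anomalous_fields(model, {"ttl": 64, "dport": 31337})
```

Storing one set entry per distinct value also shows where the memory problem noted above comes from: wide fields can take a very large number of values.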