Single Pass Anomaly Detection In Network
Presentation Transcript

  1. Single Pass Anomaly Detection In Network
  Deepak Garg (05329015), KReSIT, MTech II, IIT Bombay. Guided by: Prof. Om P. Damani
  • Anomaly Detection
    • Refers to network behaviour that deviates from normal network behaviour
    • Studies typical normal use and detects abnormal usage
    • Nature of the attack is unknown
  • Misuse Detection
    • Uses previously known attacks and flags matching patterns
    • Nature of the attack is known

  2. Motivation and Our Approach
  • To prevent an attack, one needs to detect it as soon as possible.
  • Anomaly detection is a real-time problem, so we need to detect an anomaly in a single pass.
  • The algorithm should be adaptive.
  • We use a clustering algorithm for anomaly detection, where similar data points group together into clusters.
  • Our algorithm combines three single-pass algorithms: ADWICE-BOX, GenIc, and Tree-based. The efficiencies of these algorithms differ, and our aim is to combine their results to achieve a higher efficiency than any of them achieves individually.
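The slide says the three detectors' results are combined, but not how. A minimal sketch of one plausible combination scheme, majority voting, is below; the detector verdicts are placeholders, not the authors' actual combination rule.

```python
# Hypothetical sketch: combining the verdicts of three single-pass
# detectors (GenIc, Tree-based, ADWICE-BOX) by majority vote.
# The voting rule is our assumption; the slides do not specify it.

def majority_vote(verdicts):
    """Flag a point as anomalous if most detectors agree it is."""
    return sum(verdicts) > len(verdicts) / 2

# Example: Tree-based and ADWICE-BOX flag the point, GenIc does not.
verdicts = [False, True, True]   # [GenIc, Tree-based, ADWICE-BOX]
print(majority_vote(verdicts))   # True
```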

  3. GenIc
  • Select input parameters M, K, N.
  • Initialize M clusters with weights Wi = 1.
  • Perform incremental clustering on each chunk.
  • Check for cluster survival.
  • Calculate the final clusters.
  • Change the cluster-killing approach.
  • Efficiency is 81.2%.
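The steps above can be sketched in code. This is a hedged reconstruction of a GenIc-style loop on 1-D points; the details marked in the docstring are our assumptions, since the slide only names the steps.

```python
import random

def genic(stream, M=4, N=100):
    """Sketch of GenIc-style single-pass clustering on 1-D points.

    Assumed details (not spelled out on the slide): nearest-centre
    assignment, weighted-mean centre updates, and a survival test at
    each chunk boundary where cluster k survives with probability
    proportional to its weight weights[k].
    """
    centers, weights = [], []
    for i, x in enumerate(stream):
        if len(centers) < M:                 # seed the first M clusters, w = 1
            centers.append(float(x))
            weights.append(1)
            continue
        j = min(range(M), key=lambda k: abs(x - centers[k]))
        weights[j] += 1                      # incremental update of the winner
        centers[j] += (x - centers[j]) / weights[j]
        if (i + 1) % N == 0:                 # end of chunk: survival check
            for k in range(M):
                if random.random() > weights[k] / N:
                    centers[k] = float(x)    # kill and reseed the cluster
                    weights[k] = 1
    return centers, weights
```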

  4. Tree-Based
  • Efficiency is 84.2%.

  5. ADWICE-BOX
  • Input parameters:
    • Total number of clusters M.
    • Initial threshold T.
    • Threshold step for updating T.
    • Leaf size LS.
  • ADWICE uses BIRCH as the clustering algorithm for learning. BIRCH suffers from several problems:
    • Cluster structure.
    • Threshold calculation.
    • Distance-based calculation.
  • CF = (n, LS, SS, min, max).
  • Efficiency is 92.1%.
  • ADWICE is memory efficient: it stores CFs in main memory instead of all training data points.
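The cluster feature CF = (n, LS, SS, min, max) can be sketched as a small class: BIRCH's (n, LS, SS) summary extended with per-dimension min/max bounds, so a membership test becomes a cheap bounding-box check. The field names and the `contains` test are our illustration, not the authors' code.

```python
# Sketch of the extended cluster feature CF = (n, LS, SS, min, max):
# a summary that absorbs points without storing them, which is why
# ADWICE can keep CFs in main memory instead of the raw training data.

class CF:
    def __init__(self, point):
        self.n = 1                              # number of points summarized
        self.ls = list(point)                   # linear sum per dimension
        self.ss = [x * x for x in point]        # square sum per dimension
        self.lo = list(point)                   # per-dimension minimum
        self.hi = list(point)                   # per-dimension maximum

    def add(self, point):
        """Absorb a point into the summary without storing it."""
        self.n += 1
        for d, x in enumerate(point):
            self.ls[d] += x
            self.ss[d] += x * x
            self.lo[d] = min(self.lo[d], x)
            self.hi[d] = max(self.hi[d], x)

    def contains(self, point):
        """BOX test: is the point inside the cluster's bounding box?"""
        return all(self.lo[d] <= x <= self.hi[d]
                   for d, x in enumerate(point))

cf = CF((1.0, 2.0))
cf.add((3.0, 1.0))
print(cf.contains((2.0, 1.5)))   # True: inside the box [1,3] x [1,2]
```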

  6. Cont.
  • FP Rate = FP / (FP + TN)
  • Detection Rate = TP / (TP + FN)
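The two rates above are straightforward to compute from confusion-matrix counts; the counts in the example are hypothetical, for illustration only.

```python
# The two evaluation metrics from the slide, computed from
# confusion-matrix counts (TP, FP, TN, FN).

def fp_rate(fp, tn):
    """Fraction of normal traffic wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn)

def detection_rate(tp, fn):
    """Fraction of attacks caught: TP / (TP + FN)."""
    return tp / (tp + fn)

# Hypothetical counts: 5 false alarms out of 100 normal points,
# 88 attacks detected out of 100.
print(fp_rate(5, 95))           # 0.05
print(detection_rate(88, 12))   # 0.88
```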

  7. Conclusion and Future Work
  • The efficiency of the GenIc algorithm is found to be 81.2%, of the Tree-Based algorithm 84.1%, and of the ADWICE-BOX algorithm 92.1%. Our algorithm, which combines the results of all three, exhibits an efficiency of 88.1%.
  • Make the algorithm independent of the maximum number of clusters.
  • Increase the total efficiency of the combined algorithms.
  • We will also concentrate on learning from the input data set, so that the result does not depend on the order of the input data.
  • We also worked on decreasing the false positive rate by introducing additional statistical parameters for the clusters.

  8. References
  • Kalle Burbeck and Simin Nadjm-Tehrani. ADWICE: Anomaly detection with real-time incremental clustering. In Choonsik Park and Seongtaek Chee, editors, Lecture Notes in Computer Science, pages 407–424. Springer Berlin/Heidelberg, January 2005.
  • Philip K. Chan, Matthew V. Mahoney, and Muhammad H. Arshad. A machine learning approach to anomaly detection. Technical report, Department of Computer Sciences, Florida Institute of Technology, March 2003.
  • Chetan Gupta and Robert L. Grossman. GenIc: A single pass generalized incremental algorithm for clustering. SIAM, 2004.
  • Matthew V. Mahoney. Network traffic anomaly detection based on packet header. ACM, 2003.

  9. Packet Header Anomaly Detection
  • Checks anomalous fields of the packet header:
    • Link layer
    • Network layer
    • Transport layer
  • The model detects novel attacks.
  • Split large fields and merge small fields.
  • During training, record each value of the fields.
  • Problems:
    • Requires excessive memory.
    • Insufficient training data results in overfitting.
  • To solve:
    • PHAD-H1000
    • PHAD-C32, PHAD-C1000
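The training step above ("record each value of the fields") can be sketched as follows. This is a simplified PHAD-style model: our simplification drops the time-since-last-anomaly weighting that the real PHAD applies, and the field names in the example are made up for illustration.

```python
# Simplified PHAD-style sketch: during training, record the set of
# values seen for each header field; at detection time, a field whose
# value was never seen contributes an anomaly score of n / r, where
# n is the number of training packets and r the number of distinct
# values observed for that field. (The real PHAD additionally weights
# each field by the time since its last anomaly, omitted here.)

class PHADModel:
    def __init__(self):
        self.seen = {}    # field name -> set of observed values
        self.n = 0        # number of training packets

    def train(self, packet):
        self.n += 1
        for field, value in packet.items():
            self.seen.setdefault(field, set()).add(value)

    def score(self, packet):
        s = 0.0
        for field, value in packet.items():
            values = self.seen.get(field)
            if values and value not in values:
                s += self.n / len(values)   # anomaly weight n / r
        return s

model = PHADModel()
model.train({"ttl": 64, "proto": 6})
model.train({"ttl": 128, "proto": 6})
print(model.score({"ttl": 255, "proto": 6}))   # 1.0 (ttl unseen: n/r = 2/2)
```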