
Volume Anomaly Detection

CS 386M. Volume Anomaly Detection. December 3, 2009. Joohyun Kim, Sooel Son. Papers covered: Diagnosing Network-Wide Traffic Anomalies (SIGCOMM 2004); Spatio-Temporal Compressive Sensing and Internet Traffic Matrices (SIGCOMM 2009).


Presentation Transcript


  1. CS 386M • Volume Anomaly Detection • December 3, 2009 • Joohyun Kim, Sooel Son • Diagnosing Network-Wide Traffic Anomalies (SIGCOMM 2004) • Spatio-Temporal Compressive Sensing and Internet Traffic Matrices (SIGCOMM 2009)

  2. The purpose of our project • Compare two state-of-the-art anomaly detection methods • Subspace method (PCA) • Spatio-temporal compressive sensing (SRMF) • Perform subspace method on real TMs for fair quantitative and qualitative analysis • Check performance of SRMF on real TMs

  3. Motivation: Network Anomalies • An anomaly is a pattern in the data that does not conform to expected behavior • Also referred to as outliers, exceptions, peculiarities, surprises, etc. • Relatively infrequent, but cause significant negative consequences when they occur

  4. Volume Anomaly • Volume anomaly: a sudden positive or negative change in an Origin-Destination (OD) flow • Challenges • Getting a correct OD flow matrix is resource intensive. • There are many OD flows (e.g. 11 routers give 11 × 11 = 121 OD flows; over 1009 time intervals that is a 1009 × 121 OD matrix) • Manual inspection is not desirable.

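  5. Traffic matrices • Traffic Matrix (TM) • Gives traffic volumes between origins and destinations • Link load matrix: • T × M matrix: T time intervals, M links; each entry is the traffic volume on a single link in one interval • OD flow matrix: • T × M matrix: T time intervals, M OD flows; each entry is the traffic volume of a single origin-destination router pair in one interval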

  6. Dataset • Abilene TM • http://abilene.internet2.edu/observatory/data-collections.html • One of the most standard datasets for anomaly detection task • 11 routing points, over 1009 time intervals • Provides both OD flow and link load TMs • Comes with ground truth anomalies

  7. PCA (Principal Component Analysis) • A technique that derives the principal vectors, the directions that capture the maximum variance of the measured data. • Most data patterns are captured by the top K principal vectors.
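
As a minimal sketch of the idea on slide 7, the principal vectors can be obtained from the singular value decomposition of the mean-centered data. The function name `principal_components` and the synthetic 200 × 6 matrix are assumptions made for illustration; they are not taken from the slides or from the Abilene dataset.

```python
import numpy as np

def principal_components(Y):
    """Return principal vectors (as columns) and the variance along each,
    for a T x M matrix Y whose rows are time intervals."""
    Yc = Y - Y.mean(axis=0)  # center every column at zero mean
    # Rows of Vt are the principal vectors, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    variances = s ** 2 / (Y.shape[0] - 1)
    return Vt.T, variances

# Hypothetical data: 6 "links" driven by 2 shared patterns plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
Y = latent @ mixing + 0.01 * rng.normal(size=(200, 6))

axes, variances = principal_components(Y)
explained = variances / variances.sum()
print(np.round(explained, 3))  # the first two entries carry almost all variance
```

In this synthetic case the top two principal vectors capture nearly all of the variance, which mirrors the claim that a small K is enough to describe most data patterns.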

  8. Subspace method • An approach to separate normal traffic from anomalous traffic. • Normal subspace: space spanned by the first K principal vectors (in the paper, K = 4) • Anomalous subspace: space spanned by the remaining principal vectors • Intuition: a volume anomaly is an abrupt change, so it should project mostly onto the anomalous subspace.
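
The separation into normal and anomalous subspaces can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `subspace_spe`, the synthetic data, the choice K = 2, and the use of a simple arg-max instead of a statistical threshold are all hypothetical.

```python
import numpy as np

def subspace_spe(Y, k):
    """Squared prediction error (SPE) per time interval: the energy of each
    row of Y that falls outside the span of the top-k principal vectors."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    P = Vt[:k].T                      # M x k basis of the normal subspace
    residual = Yc - Yc @ P @ P.T      # component in the anomalous subspace
    return np.sum(residual ** 2, axis=1)

# Hypothetical link-load matrix: 300 intervals, 10 links, 2 dominant patterns.
rng = np.random.default_rng(1)
T, M, k = 300, 10, 2
normal_traffic = 10.0 * (rng.normal(size=(T, k)) @ rng.normal(size=(k, M)))
Y = normal_traffic + 0.1 * rng.normal(size=(T, M))
Y[123, 4] += 5.0                      # inject a volume spike on one link

spe = subspace_spe(Y, k)
suspect = int(np.argmax(spe))         # interval with the largest residual energy
print(suspect)
```

The injected spike is small compared with the normal traffic patterns, yet it stands out in the residual because the normal subspace absorbs the dominant variation.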

  9. The procedure for getting anomalies [Figure: PCA applied to the link-load matrix (1009 time intervals × 41 links, one column per link in the network); dimensions shown: 41 × 4 and 1009 × 41]

  10. Experimental results – Ground Truth (1) Accuracy: 5/6

  11. Experimental results – Ground Truth (2) Accuracy: 83% • Accuracy: 25% • The PCA method is not sensitive to small-volume anomalies

  12. Experimental results – Injected Anomalies (1) Average link load: 1.5 × 10^8 bytes

  13. Experimental results – Injected Anomalies (2)

  14. Injected anomalies on IP flow OD Table Accuracy: 5/37

  15. Injected anomalies on the number of packets OD Table Accuracy: 5/33

  16. Spatio-Temporal Compressive Sensing on Anomaly Detection • PCA is only aware of spatial information • Each column of TM (time series) is considered independently • Incorporating temporal information could produce better results • Sparsity Regularized Matrix Factorization (SRMF) • Involves both spatial and temporal properties of the underlying TM

  17. Compressive Sensing • Exploit the low-rank nature of TMs • $X_{n \times m} \approx L_{n \times r} R_{m \times r}^T$ ($r \ll n, m$) • Basic approach: find $X = LR^T$ s.t. $A(LR^T) = B$ • $(m+n) \cdot r$ unknowns (instead of $m \cdot n$) • Solution: Sparsity Regularized SVD (SRSVD) • minimize $\|LR^T - X\|^2$ (fitting error) $+ \lambda(\|L\|^2 + \|R\|^2)$ (regularization) • Similar to SVD but can handle missing values and indirect measurements
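
To make the regularized low-rank fit concrete, here is a minimal sketch that minimizes the fitting error over observed entries only, plus the regularization term, by alternating least squares. It is an assumption-laden illustration: the function name `masked_als`, the boolean `mask` of observed entries, the parameter values, and the synthetic rank-2 matrix are all hypothetical, and indirect measurements through a general operator A are not handled.

```python
import numpy as np

def masked_als(X, mask, r, lam=0.01, iters=100, seed=0):
    """Fit X ~ L @ R.T on entries where mask is True, with penalty
    lam * (|L|^2 + |R|^2), by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    L = rng.normal(size=(n, r))
    R = rng.normal(size=(m, r))
    ridge = lam * np.eye(r)
    for _ in range(iters):
        for i in range(n):            # update row i of L with R held fixed
            Ro = R[mask[i]]
            L[i] = np.linalg.solve(Ro.T @ Ro + ridge, Ro.T @ X[i, mask[i]])
        for j in range(m):            # update row j of R with L held fixed
            Lo = L[mask[:, j]]
            R[j] = np.linalg.solve(Lo.T @ Lo + ridge, Lo.T @ X[mask[:, j], j])
    return L, R

# Hypothetical rank-2 matrix with about 30% of the entries missing.
rng = np.random.default_rng(2)
X_true = rng.normal(size=(30, 2)) @ rng.normal(size=(20, 2)).T
mask = rng.random(X_true.shape) < 0.7
X_obs = np.where(mask, X_true, 0.0)   # unobserved entries are never read

L, R = masked_als(X_obs, mask, r=2)
est = L @ R.T
rel_err = np.linalg.norm((est - X_true)[~mask]) / np.linalg.norm(X_true[~mask])
print(rel_err)                        # relative error on the missing entries
```

With r = 2 the model has (30 + 20) · 2 = 100 unknowns instead of 30 · 20 = 600, which is why the missing entries can be inferred from the observed ones.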

  18. Sparsity Regularized Matrix Factorization • Motivation • The theoretical conditions for compressive sensing to perform well may not hold on real-world TMs • Sparsity Regularized Matrix Factorization • minimize $\|LR^T - X\|^2$ (fitting error) $+ \lambda(\|L\|^2 + \|R\|^2)$ (regularization) $+ \|S(LR^T)\|^2$ (spatial constraint) $+ \|(LR^T)T^T\|^2$ (temporal constraint) • S and T capture spatio-temporal properties of TMs • Can be solved efficiently via alternating least-squares
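
The four terms of the objective can be evaluated directly, which helps show what the temporal constraint rewards. The sketch below is illustrative only and does not solve the minimization. It assumes rows are flows and columns are time intervals (matching where S and T sit in the formula), uses a first-difference matrix as one plausible choice of temporal constraint, and sets the spatial matrix to zero; the names `first_difference`, `srmf_objective`, and `Tm` are hypothetical.

```python
import numpy as np

def first_difference(m):
    """(m-1) x m matrix D with (D @ v)[t] = v[t+1] - v[t]."""
    D = np.zeros((m - 1, m))
    idx = np.arange(m - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def srmf_objective(L, R, X, S, Tm, lam):
    """Fitting error + regularization + spatial term + temporal term."""
    est = L @ R.T
    fit = np.sum((est - X) ** 2)
    reg = lam * (np.sum(L ** 2) + np.sum(R ** 2))
    spatial = np.sum((S @ est) ** 2)
    temporal = np.sum((est @ Tm.T) ** 2)
    return fit + reg + spatial + temporal

rng = np.random.default_rng(3)
n, m, r = 4, 50, 2
t = np.arange(m)
R_smooth = np.column_stack([np.sin(2 * np.pi * t / m), np.cos(2 * np.pi * t / m)])
L = rng.normal(size=(n, r))
S = np.zeros((n, n))                  # assumption: no spatial penalty here
Tm = first_difference(m)              # named Tm to avoid clashing with T = time

# Same factors, but one version has its time order scrambled.
obj_smooth = srmf_objective(L, R_smooth, L @ R_smooth.T, S, Tm, lam=0.1)
R_shuffled = R_smooth[rng.permutation(m)]
obj_shuffled = srmf_objective(L, R_shuffled, L @ R_shuffled.T, S, Tm, lam=0.1)
print(obj_smooth < obj_shuffled)
```

Both versions fit their own data exactly and have the same regularization cost, so the difference comes entirely from the temporal term, which penalizes estimates that change abruptly between adjacent intervals.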

  19. Method • Project the input TM onto a low-dimensional, spatially and temporally smooth subspace ($LR^T$) → estimate of normal traffic • Signal anomalies from the difference (input real TM - estimated TM) • Use standard thresholding to pick the 15 most significant differences
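
The last step, picking the most significant differences between the input TM and the estimate, can be sketched as below. The function name `top_k_anomalies` and the toy 5 × 6 matrices are assumptions for illustration; ranking by absolute difference is one simple reading of "standard thresholding".

```python
import numpy as np

def top_k_anomalies(X, X_hat, k=15):
    """Return the (row, column) positions of the k largest absolute
    differences between the input TM X and the estimated TM X_hat."""
    diff = np.abs(X - X_hat)
    flat = np.argsort(diff, axis=None)[::-1][:k]   # largest differences first
    rows, cols = np.unravel_index(flat, diff.shape)
    return list(zip(rows.tolist(), cols.tolist()))

# Toy example: the estimate is all zeros and the input has three spikes.
X_hat = np.zeros((5, 6))
X = X_hat.copy()
X[1, 2] = 9.0
X[4, 0] = -7.0    # negative changes count as anomalies too
X[3, 5] = 5.0

hits = top_k_anomalies(X, X_hat, k=3)
print(hits)       # [(1, 2), (4, 0), (3, 5)]
```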

  20. Experimental Results – Ground Truth (1) Accuracy: 4/6

  21. Experimental Results – Ground Truth (2) Accuracy: 67% • Accuracy: 30% • The SRMF method shows comparable performance

  22. Experimental Results – Injected Anomalies (1)

  23. Experimental Results – Injected Anomalies (2) Average link load: 1.5 × 10^8 bytes

  24. Discussion • Both approaches exploit the low-rank nature of real TMs • Use a low-rank approximation of the input TM as normal traffic • Assume anomalous traffic to be the difference between input traffic and estimated normal traffic • Both approaches are comparable at detecting ground truth anomalies • SRMF is better at detecting injected anomalies

  25. Difficulties • Not enough available datasets for further comparison. • Hard to pick a threshold that performs well in general • Highly data-dependent

  26. Thank You!

  27. Two Approaches Considered • Subspace method • Purely spatial method • Shows state-of-the-art performance • Spatio-temporal compressive sensing • Considers both spatial and temporal correlations • Previously tested on synthetic data only

  28. Approaches • Point solutions • Detect outliers from clustering • Rule-based classification • Detection from single-timeseries traffic • Subspace method • Spatio-temporal compressive sensing

  29. Intrusion Detection • Intrusion detection is what you do after prevention has failed • Detect an attack in progress • Network traffic patterns, suspicious system calls, etc. • Types of intrusion detection systems • Host-based intrusion detection system • Network-based intrusion detection system • Examine packet payloads or headers • Examine the traffic pattern
