440 likes | 552 Vues
Network Anomography focuses on detecting and addressing volume anomalies in networks through analyzing link traffic measurements. Challenges include under-constraint and changes in traffic patterns. Strategies involve early and late inverse methods, temporal and spatial analysis for anomaly extraction. Fourier analysis is utilized to extract high-frequency components representing sudden changes in traffic behavior.
E N D
Network Anomography Yin Zhang, Zihui Ge, Albert Greenberg, Matthew RoughanInternet Measurement Conference 2005 Berkeley, CA, USA Presented by Huizhong Sun Some slides borrow from Yin Zhang
Network Anomaly Detection • Is the network experiencing unusual conditions? • Call these conditions anomalies • Anomalies can often indicate network problems • DDoS attack, network worms, flash crowds, misconfigurations , vendor implementation bugs, … • Need rapid detection and diagnosis • Want to fix the problem quickly • Questions of interest • Detection • Is there an unusual event? • Identification • What’s the best explanation? • Quantification • How serious is the problem?
B C A Network Anomography • What we want • Volume anomalies [Lakhina04]Significant changes in an Origin-Destination flow, i.e., traffic matrix element • Detect Volume anomalies • Identify which O-D pair
Network Anomography • Challenge • It is difficult to measure traffic matrix directly • The anomalies detection problem is somewhat more complex and difficult • First, anomaly detection is performed on a series of measurements over a period of time, rather than from a single snapshot. • In addition to changes in the traffic, the solution must build in the ability to deal with changes in routing. • What we have • Link traffic measurements Simple Network Management Protocol (SNMP) data on individual link loads is available almost ubiquitously. • Network Anomography • Infer volume anomalies from link traffic measurements
An Illustration Courtesy: Anukool Lakhina [Lakhina04]
Only measure at links 1 route 3 link 1 router 2 route 2 link 2 route 1 3 link 3 Mathematical Formulation Problem: Infer changes in TM elements (xt) given link measurements (bt)
Only measure at links 1 route 3 link 1 router 2 route 2 link 2 route 1 3 link 3 Mathematical Formulation bt = At xt (t=1,…,T) Typically massively under-constrained!
Only measure at links 1 route 3 link 1 router 2 route 2 link 2 route 1 3 link 3 Static Network Anomography B = AX Time-invariantAt (= A), B=[b1…bT], X=[x1…xT]
Anomography Strategies • Early Inverse • Inversion • Infer OD flows X by solving bt=Axt • Anomaly extraction • Extract volume anomalies X from inferred X Drawback: errors in step 1 may contaminate step 2 • Late Inverse • Anomaly extraction • Extract link traffic anomalies B from B • Inversion • Infer volume anomalies X by solving bt=Axt Idea: defer “lossy” inference to the last step
Extracting Link Anomalies B • Temporal Anomography: • Fourier / wavelet analysis • Link anomalies = the high frequency components • ARIMA modeling • Diff • EWMA (Exponentially Weighted Moving Average) is ARIMA(0, 1, 1) • Holt-Winters is ARIMA(0, 2, 2) • Temporal PCA • PCA = Principal Component Analysis • Project columns onto principal link column vectors • Spatial Anomography: • Spatial PCA [Lakhina04] • Project rows onto principal link row vectors
Extracting Link Anomalies B • Fourier analysis • Fourier analysis decompose a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies and phases. • The sum of these sinusoids can exactly match the original waveform. • The idea of using the Fourier analysis to extract anomalous link traffic is to filter out the low frequency components. • In general, low frequency components capture the daily and weekly traffic patterns, while high frequency components represent the sudden changes in traffic behavior.
Extracting Link Anomalies B • Fourier analysis • For a discrete-time signal x0, x1, . . . , xN-1, the Discrete Fourier Transform (DFT) is defined by • where fn is a complex number that captures the amplitude and phase of the signal at the n-th frequency • Lower n corresponds to a lower frequency component, with f0 being the DC component, • fn with n close to N/2 corresponding to high frequencies
Extracting Link Anomalies B • Fourier analysis • The Inverse Discrete Fourier Transform (IDFT) is used to reconstruct the signal in the time domain by • An efficient way to implement the DFT and IDFT is the Fast Fourier Transform (FFT). • The computational complexity of the FFT is O(N log(N)).
Extracting Link Anomalies B • FFT based anomography. • 1. Transform link traffic B into the frequency domain: F = FFT(B): apply the FFT on each row of B. (a row corresponds to the time series of traffic data on one link.) • 2. Remove low frequency components: i.e. set Fi = 0, for i ∈[1, c] ∪ [N-c, N], where c is a cut-off frequency. • (For example, using 10-minute aggregated link traffic data of one week duration, and c = 10N/60, corresponding to a frequency of one cycle per hour.) • 3. Transform back into the time domain: i.e. we take B = IFFT(F). The result is the high frequency components in the traffic data, which we will use as anomalous link traffic
Extracting Link Anomalies B • Wavelet analysis • 1. Use wavelets to decompose B into different frequency levels: W = WAVEDEC(B), by applying a multi-level 1-D wavelet decomposition on each row of B. • 2. Then remove low- and mid-frequency components in W by setting all coefficients at frequency levels higher than wc to 0. Here wc is a cut-off frequency level. • 3. Reconstruct the signal: B = WAVEREC(W’). The result is the high-frequency components in the traffic data.
Extracting Link Anomalies B • ARIMA Modeling -- Box-Jenkins methodology, or AutoRegressive Integrated Moving Average (ARIMA) • A class of linear time-series forecasting techniques that capture the linear dependency of the future values on the past. • It has been extensively used for anomaly detection in univariate time series. • To get back to anomaly detection, we simply identify the forecast errors as anomalous link traffic. • Traffic behavior that cannot be well captured by the model is considered anomalous.
Extracting Link Anomalies B • ARIMA(p, d, q) model includes three parameters: • The autoregressive parameter (p), • The number of differencing passes (d), • The moving average parameter (q). • Some model used for detecting anomalies in time-series, • for example, the Exponentially Weighted Moving Average (EWMA) is ARIMA(0, 1, 1); Holt-Winters is ARIMA(0, 2, 2).
Extracting Link Anomalies B • ARIMA(p, d, q) model includes three parameters: • the autoregressive parameter (p), • the number of differencing passes (d), • the moving average parameter (q). where zk is obtained by differencing the original time series d times (when d ≥ 1) or by subtracting the mean from the original time series (when d = 0), ek is the forecast error at time k, φi (i = 1, ..., p) and θj (j = 1, ..., q) are the autoregression and moving average coefficients, respectively.
Extracting Link Anomalies B
Diagnosing Network-Wide Traffic Anomalies Anukool Lakhina, Mark Crovella, Christophe Diot “Diagnosing Network-Wide Traffic Anomalies” SIGCOMM’04,
Extracting Link Anomalies B • Spatial Anomography: Spatial PCA [Lakhina04] • 1. Identify the first axis that the link traffic data have the greatest degree of variance along the first axis • 2. Identify the second axis that the link traffic data have the second greatest degree of variance along the second one, and so on so forth:
Extracting Link Anomalies B • Spatial Anomography: Spatial PCA [Lakhina04] • 3. Divide the link traffic space into the normal subspace and the anomalous subspace • by examining the projection of the time series of link traffic data on each principal axis in order. • As soon as a projection is found that contains a 3σ deviation from the mean, that principal axis and all subsequent axes are assigned to the anomalous subspace. • All previous principal axis are assigned to the normal subspace.
Data Collected Abilene Sprint-Europe
Low Intrinsic Dimensionality of Link Traffic • Studied via Principal • Component Analysis • Key result: • Normal traffic is well approximated as occupying a low dimensional subspace • Reasons: • Links share OD flows • Set of OD flows also low • dimensional
The Subspace Method • An approach to separate normal from anomalous traffic • Normal Subspace, : space spanned by the first k principal components • Anomalous Subspace, : space spanned by the remaining principal components • Then, decompose traffic on all links by projecting onto and to obtain: Residual trafficvector Traffic vector of all links at a particular point in time Normal trafficvector
y A Geometric Illustration In general, anomalous traffic results in a large value of Traffic on Link 2 Traffic on Link 1
Detection • Capture size of vector using squared prediction error (SPE): Result due to [Jackson and Mudholkar, 1979] Traffic on Link 2 Traffic on Link 1
Value of over time (all traffic) Detection Illustration Value of over time(SPE) SPE at anomaly time points clearly stand out
Extracting Link Anomalies B Temporal PCA • PCA = Principal Component Analysis • Similar with Spatial PCA • Project columns onto principal link column vectors
Solving bt = Axt • Temporal Anomography: B = AX • Now if we know B, how to solve the abnormal traffic O-D pairs X? • (1) Pseudoinverse solution • (2) Sparsity maximization
Solving bt = A xt • Pseudoinverse: xt = pinv(A) bt • Shortest minimal L2-norm solution • Solve xt subject to |bt – A xt|2 is minimal
Solving bt = A xt • Maximize sparsity • In practice, we expect only a few anomalies at any one time, so x typically has only a small number of large values. • Hence it is natural to proceed by maximizing the sparsityof x, i.e., solving the following l0 norm minimization problem:
Performance Evaluation • Fix one anomaly extraction method • Compare “real” and “inferred” anomalies • “real” anomalies: directly from OD flow data • “inferred” anomalies: from link data • Order them by size • Compare the size • How many of the top N do we find • Gives detection rate: | top N”real” top Ninferred | / N
Performance Evaluation: Anomography • Hard to compare performance • Lack ground-truth: what is an anomaly? • So compare events from different methods • Compute top M “benchmark” anomalies • Apply an anomaly extraction method directly on OD flow data • Compute top N “inferred” anomalies • Apply another anomography method on link data • Report min(M,N) - | top Mbenchmarktop Ninferred | • M < N “false negatives” # big “benchmark” anomalies not considered big by anomography • M > N “false positives” # big “inferred” anomalies not considered big by benchmark method • Choose M, N similar to numbers of anomalies a provider is willing to investigate, e.g. 30-50 per week
Anomography: “False Negatives” • Diff/EWMA/H.-W./ARIMA/Fourier/Wavelet all largely consistent • PCA methods not consistent (even with each other) • - PCA cannot detect anomalies in the “normal” subspace • - PCA insensitive to reordering of [b1…bT] cannot utilize all temporal info • Spatial methods (e.g. spatial PCA) are not self-consistent
Anomography: “False Positives” • Diff/EWMA/H.-W./ARIMA/Fourier/Wavelet all largely consistent • PCA methods not consistent (even with each other) • - PCA cannot detect anomalies in the “normal” subspace • - PCA insensitive to reordering of [b1…bT] cannot utilize all temporal info • Spatial methods (e.g. spatial PCA) are not self-consistent
Conclusions • Anomography = Anomalies + Tomography • Find anomalies in {xt} given bt=Atxt (t=1,…,T) • Contributions • A general framework for anomography methods • Decouple anomaly extraction and inference components • A number of novel algorithms • Taking advantage of the range of choices for anomaly extraction and inference components • Choosing between spatial vs. temporal approaches • Extensive evaluation on real traffic data • 6-month Abilene and 1-month Tier-1 ISP • The method of choice: ARIMA + Sparsity-L1
Extracting Link Anomalies B • Temporal Anomography: B = BT • Fourier / wavelet analysis • Link anomalies = the high frequency components • ARIMA modeling • Diff: ft = bt-1 bt = bt – ft • EWMA: ft = (1-) ft-1 + bt-1 bt = bt – ft • Temporal PCA • PCA = Principal Component Analysis • Project columns onto principal link column vectors • Spatial Anomography: B = TB • Spatial PCA [Lakhina04] • Project rows onto principal link row vectors