Yi Qiao Jason Skicewicz Peter A. Dinda Prescience Laboratory Department of Computer Science

An Empirical Study of the Multiscale Predictability of Network Traffic Yi Qiao Jason Skicewicz Peter A. Dinda Prescience Laboratory Department of Computer Science Northwestern University Evanston, IL 60201

Talk in a Nutshell In-depth trace-based study of predictability of link bandwidth at different resolutions • Binning and wavelet approximations • Generalizations very difficult to make • Aggregation often helps • Predictability does not monotonically increase with decreasing resolution • Predictability largely independent of mechanism • Simple models sufficient

Outline • Motivation and Related Work • MTTA • Traces • Binning Approximations and Wavelet Approximations • Results • Conclusions

Background • Why study predictability of network traffic? • Adaptive applications • Congestion control • Admission control • Network management • Eventual goal • Providing application-level network traffic queries to adaptive applications • Fine-grain app, e.g., immersive audio • Coarse-grain app, e.g., scientific app on grids

Message Transfer Time Advisor (MTTA) • Target API: (conf_lower, conf_upper, conf_expected) = MTTA::PredictTransferTime(src_ip_address, dest_ip_address, message_size, transport_protocol, conf_level); • Example query from application to MTTA: time for transferring a 10 MB message, confidence level = 0.95? • Example answer from MTTA: expected transfer time is 50 seconds, confidence interval is [45.9, 54.1] seconds • Our contributions here • Predicting aggregate background traffic • Dealing with a wide range of time resolutions
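The shape of such a query can be sketched in Python. This is only an illustration under stated assumptions, not the MTTA itself: the function name `predict_transfer_time`, its parameters, the normal model for predicted bandwidth, and all input numbers are hypothetical. The sketch only shows how a bandwidth prediction with an uncertainty estimate could be turned into the (lower, upper, expected) triple that the target API returns.

```python
from statistics import NormalDist


def predict_transfer_time(message_size, bw_mean, bw_std, conf_level):
    """Hypothetical stand-in for an MTTA-style query (not the real MTTA).

    message_size is in bytes; bw_mean and bw_std describe the predicted
    available bandwidth in bytes per second (assumed roughly normal).
    Returns (conf_lower, conf_upper, conf_expected) in seconds.
    """
    if not 0.0 < conf_level < 1.0:
        raise ValueError("conf_level must be strictly between 0 and 1")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    bw_low = bw_mean - z * bw_std
    bw_high = bw_mean + z * bw_std
    if bw_low <= 0.0:
        raise ValueError("bandwidth interval reaches zero; time is unbounded")
    # More bandwidth means less time, so the bounds swap places.
    return (message_size / bw_high, message_size / bw_low, message_size / bw_mean)


lower, upper, expected = predict_transfer_time(
    message_size=10_000_000,  # a 10 MB message, as in the slide's query
    bw_mean=200_000.0,        # made-up bandwidth prediction (bytes/s)
    bw_std=8_000.0,           # made-up prediction uncertainty (bytes/s)
    conf_level=0.95,
)
print(f"expected {expected:.1f} s, interval [{lower:.1f}, {upper:.1f}] s")
```

Because time is the reciprocal of bandwidth, the interval produced this way is not symmetric around the expected time.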

Our Approach [Figure: block diagram with components Sensor, Predictor, Resolution Selection, MTTA, and App; labeled signals: high-resolution bandwidth signal, high-resolution prediction, low-resolution prediction, application query, query answer]

Multiresolution Views of Resource Signals • Two Different Approaches • Binning • Commonly used by existing network measurement tools • Wavelets • N-level streaming wavelet transform yielding detail signals and approximation signals • Wavelet domain enables many useful analyses

Questions For This Study • What is the nature of predictability of network resource signals? • How does predictability depend on resolution? • What predictive models should be used? • What are the implications for the MTTA?

Tools And Data • RPS: Resource Prediction System Toolkit for Distributed Systems (publicly available from us) • Tsunami: Wavelet Toolkit for Distributed Systems (publicly available from us) • NLANR Trace Archive (publicly accessible) • Internet Traffic Archive (publicly accessible)

Relevant Previous Work • Groschwitz, et al, ARIMA models to predict long-term NSFNET traffic growth • Basu, et al, Modeling of FDDI, Ethernet LAN, and NSFNET entry/exit point traffic • Leland, et al, Self-similarity of Ethernet traffic • Wolski, et al, Network Weather Service • Sang and Li: Multi-step prediction of network traffic using ARMA and MMPP • Both aggregation and smoothing increase predictability • Our finding: predictability often does not increase monotonically with smoothing

Trace Classification and Analysis • Large number and high variety of traces • Repeated the analysis for a wide range of resolutions [Figure: diagram with panels labeled Time-series, ACF, Histogram, PSD, Classification Scheme, and Conclusions] • Y. Qiao and P. Dinda, Network Traffic Analysis, Classification, and Prediction, Technical Report NWU-CS-02-11, Department of Computer Science, Northwestern University, January 2003

Traces
Name       Number of Raw Traces   Number of Classes   Number Studied   Duration      Range of Resolutions
NLANR      180                    12                  39               90 s          1, 2, 4, …, 1024 ms
AUCKLAND   34                     8                   34               1 d           0.125, 0.25, …, 1024 s
BC         4                      N/A                 4                1 h, 1 d      7.8125 ms to 16 s
Totals     218                    N/A                 77               90 s to 1 d   1 ms to 1024 s

Binning Approximations • Methodology • Commonly used by existing network measurement tools • Averages the signal over non-overlapping bins whose sizes are powers of two [Figure: the same signal binned at 1 s, 8 s, 128 s, and 1024 s, labeled "Increasing Bin Sizes"]
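A minimal sketch of the binning approximation, assuming the signal is a plain list of evenly spaced bandwidth samples. The function names and the sample values are illustrative and do not come from RPS or from the traces.

```python
def bin_average(signal, bin_size):
    """Average a signal over consecutive, non-overlapping bins.

    Trailing samples that do not fill a whole bin are dropped.
    """
    n_bins = len(signal) // bin_size
    return [
        sum(signal[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


def multiresolution_views(signal, max_doublings):
    """Return {bin_size: binned signal} for bin sizes 1, 2, 4, ..."""
    return {2 ** k: bin_average(signal, 2 ** k) for k in range(max_doublings + 1)}


samples = [3.0, 5.0, 4.0, 8.0, 1.0, 3.0, 6.0, 2.0]  # made-up bandwidth samples
views = multiresolution_views(samples, max_doublings=3)
for size, view in views.items():
    print(size, view)
```

Each doubling of the bin size halves the number of points, so a coarse view covers the same time span with far fewer samples.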

Wavelet Approximations • Parameterized by a wavelet basis function • Equivalent to the binning approach when using the Haar wavelet • Methodology • N-level streaming wavelet transform • The D8 wavelet was used for our study [Figure: approximation signals at Level 0, Level 1, and Level 2, labeled "Increasing Approximation Level"]
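The Haar equivalence can be checked with a short sketch. It uses the Haar wavelet only, not the D8 wavelet used in the study, and it is a batch transform rather than a streaming one. Here `levels` simply counts Haar steps, which is this sketch's own convention and may not match the level numbering in the figure.

```python
import math


def haar_approximation(signal, levels):
    """Orthonormal Haar approximation coefficients after `levels` steps."""
    approx = list(signal)
    for _ in range(levels):
        # Each step combines neighbouring pairs; an unpaired tail is dropped.
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2.0)
            for i in range(0, len(approx) - 1, 2)
        ]
    return approx


def bin_average(signal, bin_size):
    """Average a signal over consecutive, non-overlapping bins."""
    n_bins = len(signal) // bin_size
    return [
        sum(signal[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


samples = [3.0, 5.0, 4.0, 8.0, 1.0, 3.0, 6.0, 2.0]  # made-up samples
levels = 2
approx = haar_approximation(samples, levels)
# Each Haar step scales a pairwise mean by sqrt(2), so undo that factor.
rescaled = [a / 2 ** (levels / 2) for a in approx]
binned = bin_average(samples, 2 ** levels)
print(rescaled, binned)
```

After rescaling, the Haar approximation at a given number of steps matches the bin averages for the corresponding power-of-two bin size. Other basis functions, such as D8, produce smoother approximations that do not reduce to plain averaging.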

Binning Prediction Methodology [Figure: block diagram with a binning component and a prediction component]

Wavelet Prediction Methodology [Figure: block diagram with a wavelet component and a prediction component]

One-step Ahead Predictions • Lower resolution => longer interval into the future [Figure: timeline starting at "now", showing a one-step-ahead prediction at high resolution and a one-step-ahead prediction at low resolution]

Predictability Ratio • Predictability ratio = variance of error signal / variance of resource signal • Fraction of the “surprise” in the signal left after prediction • The smaller the ratio, the better the predictability • Example: resource signal = [1 4 10 9], prediction = [2 3 9 10], error signal = [1 -1 -1 1], predictability ratio = 1.33/18 ≈ 0.074
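The ratio in the slide's example can be reproduced with a few lines of Python. The sketch assumes sample variance (dividing by n - 1), which is what matches the 1.33 and 18 in the example. The function name is illustrative.

```python
from statistics import variance


def predictability_ratio(signal, predictions):
    """Variance of the prediction error over variance of the signal."""
    errors = [p - x for p, x in zip(predictions, signal)]
    return variance(errors) / variance(signal)


signal = [1.0, 4.0, 10.0, 9.0]
predictions = [2.0, 3.0, 9.0, 10.0]
ratio = predictability_ratio(signal, predictions)
print(round(ratio, 4))  # about 0.074
```

A ratio near 0 means the predictor removed almost all of the signal's variance. A ratio of 1 or more means prediction did no better than the signal's own variance.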

Wide Range of Prediction Models • Simple Models • MEAN – long term mean of signal • LAST – last observed value as prediction • BM(32) – average over a history window of optimal size • Box-Jenkins Models • AR(8), AR(32) – pure autoregressive • MA(8) – pure moving average • ARMA(4,4) – autoregressive moving average • ARIMA(4,1,4), ARIMA(4,2,4) – integrated ARMA • Long-range dependence model • ARFIMA(4,-1,4) – “Fractionally integrated” ARMA • Nonlinear model • MANAGED AR(32) – TAR variant
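The three simple models can be sketched as one-step-ahead predictors over a history list. This is an illustration, not the RPS implementation. In particular, the window selection in `predict_bm` is one plausible reading of "history window of optimal size" and is an assumption, and the sample history is made up.

```python
def predict_last(history):
    """LAST: the most recent observation is the prediction."""
    return history[-1]


def predict_mean(history):
    """MEAN: the long-term mean of everything seen so far."""
    return sum(history) / len(history)


def window_mean(history, window):
    """Mean of the most recent `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def predict_bm(history, max_window):
    """BM(max_window): windowed mean with a window chosen from the history.

    Assumption: the window is the one in 1..max_window with the smallest
    one-step-ahead squared error over the history seen so far.
    """
    best_window, best_error = 1, float("inf")
    for window in range(1, max_window + 1):
        error = sum(
            (history[t] - window_mean(history[:t], window)) ** 2
            for t in range(1, len(history))
        )
        if error < best_error:
            best_window, best_error = window, error
    return window_mean(history, best_window)


history = [4.0, 6.0, 5.0, 7.0, 6.0, 8.0]  # made-up bandwidth samples
print(predict_last(history), predict_mean(history), predict_bm(history, 4))
```

All three need only a few arithmetic operations per prediction, which matters if a prediction system has to run online at fine resolutions.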

Binning Study on NLANR Traces • Generally unpredictable • Predictability worse at coarser granularities [Figure: log-scale plot with curves labeled LAST, BM(32), and "With AR Comp"]

Binning Study On BC Traces • Weak predictability • Predictability not always monotonically increasing with smoothing [Figure: curves labeled LAST, MA(8), and "With AR Comp"]

Results for AUCKLAND Traces • General predictability of traces • How predictability changes with different resolutions • Relative performance of different predictive models 3 different behaviors for binning study, and 4 different behaviors for wavelet study

AUCKLAND Behavior 1 - Binning • 14 of 34 traces • Predictability converges to a high level with increasing bin size • Commensurate with conclusions from earlier papers [Figure: curves labeled LAST, BM(8), MA(8), and "With AR Comp"]

AUCKLAND Behavior 1 - Wavelet • 7 of the 34 traces • Generally shows a monotonic relationship with approximation level, except for outliers • Relatively uncommon behavior [Figure: curves labeled LAST, MA(8), and "With AR Comp"]

AUCKLAND Behavior 2 - Binning • 15 of 34 traces • Presence of a sweet spot: an optimal bin size that maximizes predictability • Contradicts earlier work [Figure: curves labeled MA(8), LAST, BM(8), and "With AR Comp", annotated "Sweet Spot" and "Max Predictability"]
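Looking for such a sweet spot amounts to sweeping the bin size and picking the one with the smallest predictability ratio. The sketch below does this with the LAST predictor on a synthetic signal. The signal, the function names, and the `min_points` cutoff are assumptions for illustration and say nothing about where the sweet spot falls in the AUCKLAND traces.

```python
import random
from statistics import variance


def bin_average(signal, bin_size):
    """Average a signal over consecutive, non-overlapping bins."""
    n_bins = len(signal) // bin_size
    return [
        sum(signal[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


def last_ratio(signal):
    """Predictability ratio of the LAST predictor on one signal."""
    actual = signal[1:]
    # LAST predicts each value with the one before it.
    errors = [prev - cur for prev, cur in zip(signal, actual)]
    return variance(errors) / variance(actual)


def sweep_bin_sizes(signal, max_doublings, min_points=4):
    """Return {bin_size: ratio} for bin sizes 1, 2, 4, ..."""
    ratios = {}
    for k in range(max_doublings + 1):
        binned = bin_average(signal, 2 ** k)
        if len(binned) < min_points:
            break  # too few points left to estimate a variance
        ratios[2 ** k] = last_ratio(binned)
    return ratios


random.seed(0)
# Synthetic signal: a slow square wave plus Gaussian noise.
signal = [10.0 + 3.0 * ((t // 16) % 2) + random.gauss(0.0, 1.0) for t in range(1024)]
ratios = sweep_bin_sizes(signal, max_doublings=6)
best_bin = min(ratios, key=ratios.get)
print(best_bin, ratios)
```

In an online system the sweep would have to be repeated as traffic changes, since the best bin size for one period need not hold for the next.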

AUCKLAND Behavior 2 - Wavelet • 13 of the 34 AUCKLAND traces • A sweet spot at a particular approximation scale maximizes predictability • Contradicts earlier work [Figure: curves labeled MA(8), LAST, and "With AR Comp", annotated "Sweet Spot" and "Max Predictability"]

AUCKLAND Behavior 3 - Binning • Uncommon, 5 of 34 traces • Multiple peaks and valleys at different bin sizes • Predictability not as strong as the earlier two classes [Figure: curves labeled MA(8), LAST, BM(8), and "With AR Comp"]

AUCKLAND Behavior 3 - Wavelet • 11 of the 34 traces • Non-monotonic relationship between approximation scale and predictability • Predictability weaker than behaviors 1 and 2 [Figure: curves labeled MA(8), LAST, and "With AR Comp"]

AUCKLAND Behavior 4 - Wavelet • 3 of the 34 traces • Predictability ratio plateaus, then the signal becomes more predictable at the coarsest resolutions • Behavior did not occur in the binning study [Figure: curves labeled LAST, MA(8), and "With AR Comp"]

Conclusions In-depth trace-based study of predictability of link bandwidth at different resolutions • Binning and wavelet approximations • Generalizations very difficult to make • Aggregation often helps • Predictability does not monotonically increase with decreasing resolution • Predictability largely independent of mechanism • Simple models sufficient

Implications for Message Transfer Time Advisor (MTTA) • Online multiscale prediction system to support MTTA is feasible • Likely to be more accurate for WAN traffic • Often a natural time scale for prediction • Adaptation likely best here • Prediction system must itself adapt to changing network behavior

Current and Future Work • Wide-area TCP throughput characterization and prediction • D. Lu, Y. Qiao, P. Dinda, and F. Bustamante, Characterizing and Predicting TCP Throughput on the Wide Area Network, Technical Report NWU-CS-04-34, Department of Computer Science, Northwestern University, April 2004 • Wide-area parallel TCP throughput modeling and prediction • D. Lu, Y. Qiao, P. Dinda, and F. Bustamante, Modeling and Taming Parallel TCP on the Wide Area Network, Technical Report NWU-CS-04-35, May 2004 • Tsunami Wavelet Toolkit • J. Skicewicz and P. Dinda, Tsunami: A Wavelet Toolkit for Distributed Systems, Technical Report NWU-CS-03-16, Department of Computer Science, Northwestern University, November 2003

For More Information • Prescience Lab • http://plab.cs.northwestern.edu • Tsunami and RPS Available for Download • http://rps.cs.northwestern.edu • Contact • yqiao@cs.northwestern.edu
