Huai Huang Dept. of Electronic Engineering Queen Mary, University of London

Analysis of self-similar Traffic Using Multiplexer & Demultiplexer Loaded with Heterogeneous ON/OFF Sources Huai Huang Dept. of Electronic Engineering Queen Mary, University of London

Overview • Background Knowledge • Motivation & Model description • Results & Analysis • Achievements of the Research • Questions from the Research

Background Knowledge • Traditional Poisson-based models for voice and early data networks (before the early 1990s) • Packet arrivals: call arrivals (Poisson) • Exponential holding times Traditional network traffic models, most of which assume Markovian characteristics, were used extensively for simulating and controlling networks before the early 1990s; in many cases they proved adequate and practical for evaluating network performance.

Background Knowledge • Big Bang from 1993 • “On the Self-Similar Nature of Ethernet Traffic”, Will E. Leland, Walter Willinger, Daniel V. Wilson, Murad S. Taqqu • Extract from the abstract: “We demonstrate that Ethernet local area network (LAN) traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal behavior, that such behavior has serious implications for the design, control, and analysis of high-speed…” • Evidence of self-similarity and long-range dependence in network traffic • Burstiness on multiple time scales • Highly variable traffic • Heavy-tailed distributions of file sizes and corresponding transmission times That changed everything.

Background Knowledge • Self-Similarity • Let $X = (X_k : k > 0)$ be a stationary process representing the amount of data transmitted in consecutive short time periods. • Let $X_k^{(m)} = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i$, where $m \ge 1$, denote the $m$-aggregated process. • $X$ is self-similar with Hurst parameter $H$ if $X$ and $m^{1-H} X^{(m)}$ have the same variance and autocorrelation. • Long-Range Dependence (LRD) • Autocorrelation $r(k) \sim k^{-\beta}$ as $k \to \infty$, with $0 < \beta < 1$, which means the autocorrelation decays as a power law rather than exponentially. • $H = 1 - \beta/2$, so a self-similar process shows long-range dependence if $0.5 < H < 1$. • Heavy-Tailed Distribution • The distribution of a random variable $X$ is said to be heavy-tailed if $P\{X > x\} \sim x^{-\alpha}$ as $x \to \infty$, with $0 < \alpha < 2$. • If $\alpha \le 1$, the distribution has an infinite mean. If $\alpha \le 2$, the distribution has an infinite variance.
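The definition of the aggregated process suggests a simple way to estimate H from a trace. The sketch below is an illustration, not the method used in this research: it relies on the fact that the self-similarity condition implies Var(X^(m)) = m^(2H-2) Var(X), so H can be read from the slope of log Var(X^(m)) against log m (a variance-time fit). The function names and the block sizes are assumptions chosen for the example.

```python
import math
import random


def aggregate(x, m):
    """Return the m-aggregated series X^(m): means of consecutive blocks of length m."""
    n_blocks = len(x) // m
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def hurst_variance_time(x, block_sizes):
    """Estimate H from the least-squares slope of log Var(X^(m)) versus log m.

    Var(X^(m)) ~ m^(2H - 2) for a self-similar process, so H = 1 + slope / 2.
    """
    pts = [(math.log(m), math.log(variance(aggregate(x, m)))) for m in block_sizes]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    slope = (sum((px - mx) * (py - my) for px, py in pts)
             / sum((px - mx) ** 2 for px, _ in pts))
    return 1.0 + slope / 2.0


# Independent samples have no long-range dependence, so the estimate
# should land near H = 0.5.
rng = random.Random(1)
iid = [rng.expovariate(1.0) for _ in range(20000)]
h_iid = hurst_variance_time(iid, [1, 2, 4, 8, 16, 32, 64])
print(round(h_iid, 2))
```

An estimate clearly above 0.5 on a measured trace would point toward long-range dependence, although a variance-time fit is only a rough diagnostic.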

Background knowledge

Self-Similar Traffic vs. Poisson Traffic

LRD vs. SRD • LRD traffic streams are highly correlated at every time scale. • SRD traffic streams have negative-exponentially distributed inter-arrival times.

Heavy-tailed vs. Exponential • The PDF of the Pareto (heavy-tailed) distribution decays slowly as the batch size increases. In a log-log plot it decays linearly, so very large batch sizes still occur. • The PDF of the exponential distribution, by contrast, decays very fast as the batch size increases.
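The contrast between the two tails can be checked numerically. The following sketch draws Pareto and exponential samples with the same mean and compares the fraction of samples above a few thresholds. The parameter values (alpha = 1.5, minimum value 1) and the helper names are assumptions made for illustration only.

```python
import random


def pareto_sample(rng, alpha, x_min=1.0):
    """Inverse-transform sample with P{X > x} = (x_min / x)**alpha for x >= x_min."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids division by zero
    return x_min / u ** (1.0 / alpha)


def empirical_ccdf(samples, x):
    """Fraction of samples strictly larger than x."""
    return sum(1 for s in samples if s > x) / len(samples)


rng = random.Random(7)
n = 100000
alpha = 1.5
pareto = [pareto_sample(rng, alpha) for _ in range(n)]
mean_pareto = alpha / (alpha - 1.0)  # mean of a Pareto with x_min = 1 and alpha > 1
expo = [rng.expovariate(1.0 / mean_pareto) for _ in range(n)]

# Same mean, very different tails: the exponential tail vanishes quickly,
# while the Pareto tail still holds a visible fraction of the samples.
for x in (10, 30, 100):
    print(x, empirical_ccdf(pareto, x), empirical_ccdf(expo, x))
```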

Background Knowledge The multiplexer is a key element of modern high-speed networks: statistical multiplexing of different sources makes efficient use of network resources and increases network utilization considerably. Multiplexers loaded with heterogeneous sources have been modeled to evaluate the performance of the aggregate traffic, and these studies have produced many useful results.

Motivation & Model description However, most of these studies considered only the multiplexed traffic and did not investigate the statistical features of the individual traffic flows after they are separated by the demultiplexer. These features are interesting and valuable to study.

ON/OFF Model for traffic generation • We choose the ON/OFF model because it is practical and popular for network traffic modeling and matches the two states of real network activity well: active and silent. • We use two methods to generate the ON/OFF input traffic: Pareto and exponential distributions, and chaotic maps.
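A minimal sketch of the first generation method (Pareto and exponential sojourn times) is given below. It assumes a slotted model in which an ON source emits one packet per timeslot; the mean sojourn time, the Pareto shape alpha = 1.5, and the function names are illustrative assumptions, not the parameters used in this research. Generation by chaotic maps is not shown.

```python
import math
import random


def sojourn(rng, kind, mean=10.0, alpha=1.5):
    """Draw one sojourn time in whole timeslots (at least 1).

    kind '0' -> exponential, kind '1' -> Pareto (heavy-tailed).
    Both have the same mean before rounding up to whole slots.
    """
    if kind == "0":
        t = rng.expovariate(1.0 / mean)
    elif kind == "1":
        x_min = mean * (alpha - 1.0) / alpha  # gives E[X] = mean when alpha > 1
        t = x_min / (1.0 - rng.random()) ** (1.0 / alpha)
    else:
        raise ValueError("kind must be '0' or '1'")
    return max(1, math.ceil(t))


def on_off_source(rng, on_kind, off_kind, n_slots):
    """Return a list of 0/1 flags, one per timeslot (1 = ON, one packet emitted)."""
    slots = []
    state = 1  # assumption: every source starts in the ON state
    while len(slots) < n_slots:
        kind = on_kind if state == 1 else off_kind
        # Truncate the last period so the trace has exactly n_slots entries.
        duration = min(sojourn(rng, kind), n_slots - len(slots))
        slots.extend([state] * duration)
        state = 1 - state
    return slots


rng = random.Random(42)
trace = on_off_source(rng, on_kind="1", off_kind="0", n_slots=50)
print("".join(str(s) for s in trace))
```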

Traffic Pattern for input sources

Results & Analysis (1 million timeslots) • Taking case ‘0110’ as an example, the simulation gives the statistical features of the ON and OFF periods for both the inputs and the outputs. • The figures show that the outputs share the same attributes as the inputs: the input is ‘0110’, and the output is ‘0110’ too. • The simulation also gives the statistics of the buffer state and the delay time.

Simulation results in brief (1 million timeslots) • A tick or cross in the column ‘Unclear about the Output’ indicates whether or not the output needs further investigation. • A tick or cross in the column ‘Big Delays for the packets through the server’ indicates whether or not packets experience large delays, that is, whether or not a control algorithm needs to be applied at the server to reduce them. • In the table, ‘0’ means the sojourn time of the traffic is exponentially distributed; ‘1’ means it is Pareto distributed; ‘2’ means the statistical feature is neither ‘0’ nor ‘1’, or that we are unsure what it is.
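The case labels can be enumerated mechanically. The sketch below rests on an assumption inferred from the cases discussed in this talk rather than stated on the slides: each label is read as four digits in the order source 1 ON, source 1 OFF, source 2 ON, source 2 OFF, and swapping the two sources does not create a new case, which leaves ten combinations. The names `SOURCE_TYPES`, `LABEL` and `describe` are hypothetical.

```python
from itertools import combinations_with_replacement

# Assumed digit order per source: ON digit first, then OFF digit.
SOURCE_TYPES = ["00", "01", "10", "11"]
LABEL = {"0": "exponential", "1": "Pareto", "2": "neither / unclear"}

# Unordered pairs of source types: swapping the sources gives the same case.
cases = [a + b for a, b in combinations_with_replacement(SOURCE_TYPES, 2)]


def describe(case):
    """Map a four-digit label such as '0110' to a readable description."""
    names = ["source 1 ON", "source 1 OFF", "source 2 ON", "source 2 OFF"]
    return {name: LABEL[digit] for name, digit in zip(names, case)}


print(len(cases), cases)
print(describe("0110"))
```

Under this reading, output labels such as ‘2001’ can be decoded with the same `describe` helper, since ‘2’ is included in the label table.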

The traffic pattern ‘2’ in the results • We highlight the ‘2’ and use log-log and linear-log plots at different scales to show the differences between ‘0’, ‘1’ and ‘2’. • In the log-log plots, the highlighted ‘2’ looks like an exponential distribution, and it has no sojourn time larger than 100 timeslots. • In the linear-log plot, exponentially distributed traffic looks like a straight line, but the highlighted ‘2’ bends outward just like the Pareto distribution.
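The visual test described here (straight line in a linear-log plot for exponential tails, straight line in a log-log plot for Pareto tails) can be approximated numerically. The sketch below is one possible diagnostic, not the procedure used in this research: it takes a few tail quantiles of a sample and measures, with the R² of a least-squares line, how straight the tail looks on each pair of axes. The chosen tail probabilities and all function names are assumptions.

```python
import math
import random

TAIL_PROBS = [0.5, 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001]


def tail_quantiles(samples):
    """Return (x, p) pairs such that roughly a fraction p of the samples exceeds x."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(ordered[int(n * (1.0 - p))], p) for p in TAIL_PROBS]


def r_squared(xs, ys):
    """Squared correlation coefficient: 1.0 means the points lie on a straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def straightness(samples):
    """Return (R^2 on linear-log axes, R^2 on log-log axes) for the tail."""
    pts = tail_quantiles(samples)
    log_p = [math.log(p) for _, p in pts]
    lin_log = r_squared([x for x, _ in pts], log_p)
    log_log = r_squared([math.log(x) for x, _ in pts], log_p)
    return lin_log, log_log


rng = random.Random(11)
expo = [rng.expovariate(0.1) for _ in range(100000)]
pareto = [1.0 / (1.0 - rng.random()) ** (1.0 / 1.5) for _ in range(100000)]

exp_lin, exp_log = straightness(expo)
par_lin, par_log = straightness(pareto)
print("exponential:", round(exp_lin, 3), round(exp_log, 3))
print("Pareto:     ", round(par_lin, 3), round(par_log, 3))
```

A sample that is straight on neither pair of axes would be a candidate for the ‘2’ label, though a visual check of the plots remains the more reliable test.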

Results & Analysis (100 million timeslots) • Through simulations on the order of 100 million timeslots, we clear up the ambiguous situations. • Although ‘2’ still appears in the final results, here it is no longer unclear: it is another kind of traffic that behaves like neither Pareto nor exponential.

Results & Analysis (100 million timeslots) • As the graphs show, the ‘2’ is almost exponentially distributed until its probability falls to 0.0001; beyond that, it looks like a straight line, as for a Pareto-distributed input source, but with a much smaller tail.

Results & Analysis (100 million timeslots) • This is the final result for the outputs of the MUX/DEMUX network.

Results & Analysis • Validation of the simulation results • The multiplexed outputs: do they agree with results obtained by other researchers? • The queue analysis: same question. • Using different parameters for the simulation • Highlight some interesting cases for further investigation. • Find more subtle interactions between the traffic sources, especially regarding the heavy-tailed sojourn times of the traffic.

Multiplexed & Demultiplexed outputs

Queue Analysis of the simulation • We find that if the ON period is Pareto distributed in any of the input sources, the probability density function (PDF) of the queue length decays like a straight line; otherwise, it decays exponentially fast.
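A queue experiment of this kind can be sketched as follows. The code is an illustrative model built on stated assumptions, not the simulator used in this research: time is slotted, each ON source adds one packet per slot, the multiplexer is a FIFO queue that serves one packet per slot, and the demultiplexer separates departures by source. The mean ON and OFF times, alpha = 1.5, and all function names are hypothetical.

```python
import math
import random
from collections import Counter, deque


def sojourn(rng, kind, mean, alpha=1.5):
    """Sojourn time in whole slots: '0' exponential, '1' Pareto with the same mean."""
    if kind == "0":
        t = rng.expovariate(1.0 / mean)
    else:
        x_min = mean * (alpha - 1.0) / alpha
        t = x_min / (1.0 - rng.random()) ** (1.0 / alpha)
    return max(1, math.ceil(t))


def on_off_source(rng, on_kind, off_kind, n_slots, mean_on=5.0, mean_off=15.0):
    """0/1 flag per slot; an ON slot carries one packet."""
    slots, state = [], 1
    while len(slots) < n_slots:
        mean, kind = (mean_on, on_kind) if state else (mean_off, off_kind)
        duration = min(sojourn(rng, kind, mean), n_slots - len(slots))
        slots.extend([state] * duration)
        state = 1 - state
    return slots


def mux_demux(inputs):
    """FIFO multiplexer serving one packet per slot, followed by a demultiplexer.

    Returns (outputs, queue_lengths): outputs[i][t] is 1 when a packet of
    source i leaves the server in slot t; queue_lengths[t] is the backlog
    left after service in slot t.
    """
    n_slots = len(inputs[0])
    queue = deque()
    outputs = [[0] * n_slots for _ in inputs]
    queue_lengths = []
    for t in range(n_slots):
        for src, stream in enumerate(inputs):
            if stream[t]:
                queue.append(src)  # tag each packet with its source
        if queue:
            outputs[queue.popleft()][t] = 1
        queue_lengths.append(len(queue))
    return outputs, queue_lengths


def run_lengths(stream, value):
    """Lengths of maximal runs of `value` in a 0/1 stream (ON or OFF periods)."""
    runs, count = [], 0
    for s in stream:
        if s == value:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return runs


rng = random.Random(3)
n = 100000
case = "0010"  # assumed digit order: ON1, OFF1, ON2, OFF2
src1 = on_off_source(rng, case[0], case[1], n)
src2 = on_off_source(rng, case[2], case[3], n)
outputs, qlen = mux_demux([src1, src2])

hist = Counter(qlen)
for q in (1, 10, 100):
    print(q, sum(c for length, c in hist.items() if length > q) / n)
print("longest output ON periods:", max(run_lengths(outputs[0], 1)),
      max(run_lengths(outputs[1], 1)))
```

The run lengths of `outputs[0]` and `outputs[1]` can then be compared with those of the inputs to see whether the ON and OFF period distributions survive the queue.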

Simulation using different parameters We choose cases 0101, 0110, 0111, 1011 and 1111 for study and reach the following rough conclusions: • The MUX/DEMUX network does not change the heavy-tailed distribution of the OFF period very much. • The MUX/DEMUX network tends to change the heavy-tailed distribution of the ON period a lot. • If traffic with heavy-tailed ON sojourn times is multiplexed with traffic with exponential ON sojourn times, the heavy-tailed ON output is usually less bursty than the original traffic. • If traffic with heavy-tailed ON sojourn times is multiplexed with other traffic with heavy-tailed ON sojourn times, the lighter one usually remains almost the same, while the heavier one becomes less bursty than the original traffic and in some cases changes from a heavy-tailed distribution to an exponential distribution. • As an exception to the previous point, in case 1010 both heavy-tailed ON sojourn times change from a heavy-tailed distribution to an exponential distribution.

ACF Analysis of the Simulation • We are interested not only in the tail distribution of the traffic but also in its LRD and SRD attributes. • We use the autocorrelation function (ACF) to measure the LRD or SRD attributes of the traffic, and we divide the ten cases into two groups: • Group 1: The outputs share the same pattern as the inputs. • Group 2: The outputs differ from the inputs.
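For reference, the sample autocorrelation function of a 0/1 traffic trace can be computed as in the sketch below. This is the standard biased estimator, shown as an assumption about how the ACF could be measured rather than as the exact procedure used in this research; the function name is hypothetical.

```python
import random


def autocorrelation(x, max_lag):
    """Sample ACF r(0..max_lag) using the biased estimator (divide by n)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    acf = []
    for k in range(max_lag + 1):
        cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
        acf.append(cov / var)
    return acf


# An uncorrelated 0/1 stream: the ACF should be 1 at lag 0 and near 0 elsewhere.
rng = random.Random(5)
noise = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
acf_noise = autocorrelation(noise, 5)
print([round(r, 3) for r in acf_noise])
```

Plotting r(k) against k on log-log axes then separates the two behaviours: a power-law (LRD) ACF appears as a straight line, while an SRD ACF drops off much faster.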

ACF Analysis of the Simulation (Group 1) For the cases in the first group, the correlation structure of the outputs remains the same as that of the inputs, just as the distribution of the ON/OFF sojourn times does. The example figure shows case 0001.

ACF Analysis of the Simulation (Group 2) • For case 0010 (output 0000), output two is changed into correlated traffic by the queue, while output one shares the same pattern as input one. • For case 0011 (output 2001), the result is very similar to case 0010.

ACF Analysis of the Simulation (Group 2) For case 0111 (output 2111 or 1111), both outputs share the same pattern as the inputs.

ACF Analysis of the Simulation (Group 2) • For case 1010 (output 0000), there is clearly strong correlation within the sources. Another interesting phenomenon in this case is that the autocorrelation function of the outputs oscillates from the beginning, appears as two separate lines on the log-log scale, and finally converges to one line. • For case 1011 (output 1001), the result is very similar to case 1010.

Achievements of the Research • We have obtained detailed and accurate results for all 10 cases for two kinds of traffic source models: first, traffic sources generated by Pareto and exponential distributions; second, traffic sources generated by chaotic maps. • We analyzed the subtle interactions between the traffic sources by using different parameters and reached a conclusion. • We find that some new traffic sources do not have a heavy-tailed distribution but still possess the LRD correlation structure. As far as we know, these sources cannot be modeled with chaotic maps or random processes.

Thank you !
