
Does Internet media traffic really follow the Zipf-like distribution?

Lei Guo (1), Enhua Tan (1), Songqing Chen (2), Zhen Xiao (3), and Xiaodong Zhang (1)





(1) Ohio State University, (2) George Mason University, (3) IBM Research

[Figure: y^c plotted against log i, a straight line with slope -a and intercept b]
Parameters c and a have a significant impact on caching performance!

Internet media traffic: Zipf-like or not?
It is commonly agreed that Web traffic follows the Zipf-like distribution. However, existing studies of media traffic are largely workload-specific because of the variety of media delivery systems and the diversity of media content, and the access patterns they report often differ from, or even contradict, one another.

Physical explanations of media access patterns
Unlike Web objects, media objects have large file sizes and long lifespans, so a single-parameter Zipf-like distribution cannot characterize their access patterns well. In the stretched exponential model, parameter c captures the effect of media file sizes, and parameter a captures the non-stationary effect of media access aging.

Physical meaning of parameter c
[Figure legend: Web media systems, VoD media systems, P2P media systems, live streaming and IPTV systems]
In general, regardless of the techniques and systems used for media delivery, the greater the median file size of a workload, the greater the stretch factor c of the stretched exponential model fitted to its reference rank distribution.

Implications for media caching
Temporal locality comes from two sources: request concentration and request correlation. Over short periods such as one week, object popularity is almost stationary, so locality comes mainly from concentration.

Client-server model is not efficient
Assume N objects, each occupying one unit of storage, and a cache that can hold only a fraction of them. The left figure shows the optimal hit ratios of the SE and Zipf workloads. When concentration dominates locality, caching a media (SE) workload is far less efficient than caching a Web (Zipf-like) workload.
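To make the client-server caching comparison concrete, here is a minimal Python sketch of the optimal hit ratio when popularity is stationary: cache the k most popular unit-size objects and count the share of requests they receive. It assumes the SE rank form y_i^c = -a ln i + b with a hypothetical normalization b = a ln N + 1 (so the least popular object gets exactly one reference), and a Zipf-like workload y_i proportional to i^(-alpha). All parameter values are illustrative only, not fitted to the poster's workloads, and which model comes out ahead at a given cache size depends on them.

```python
import math


def se_counts(n, c, a):
    """Reference counts y_i, i = 1..n, under the SE rank model y_i^c = -a*ln(i) + b.

    Hypothetical normalization (an assumption): b = a*ln(n) + 1, so the
    least popular object (rank n) receives exactly one reference.
    """
    b = a * math.log(n) + 1.0
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n + 1)]


def zipf_counts(n, alpha):
    """Reference counts proportional to i^(-alpha) (Zipf-like)."""
    return [i ** (-alpha) for i in range(1, n + 1)]


def optimal_hit_ratio(counts, k):
    """Hit ratio of a cache holding the k most popular unit-size objects.

    With stationary popularity (no aging, no request correlation), keeping
    the top-k objects is the best any cache of k units can do.
    """
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    n = 10_000
    se = se_counts(n, c=0.3, a=2.0)      # illustrative parameters only
    zipf = zipf_counts(n, alpha=0.8)     # illustrative parameters only
    for frac in (0.01, 0.05, 0.10, 0.20):
        k = int(frac * n)
        print(f"cache={frac:.0%}  SE={optimal_hit_ratio(se, k):.3f}  "
              f"Zipf={optimal_hit_ratio(zipf, k):.3f}")
```

This static top-k bound ignores request correlation, which the poster notes becomes useful only with long-term caching.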
Conclusions of earlier measurement studies:
Zipf-like: USITS’01, IMW’02, INFOCOM’04, EUROSYS’06
Not Zipf-like: MMCN’00, NOSSDAV’02, SOSP’03, IMC’04
The Zipf-like observations reported in these measurements are either very rough (for example, only the head or the tail of the distribution curve follows the Zipf law) or may include extraneous traffic, such as streaming media ads, that does not reflect real user access patterns. A general model of Internet media access patterns is highly desirable for Internet traffic engineering and is critical for designing, benchmarking, and evaluating Internet media distribution systems.

Long-term caching
Request concentration in SE workloads decreases as parameter c grows and increases as parameter a grows. Because c stays constant over time while a increases, the request concentration of media workloads increases with time. Long-term caching can exploit this higher concentration, but only with a huge amount of storage. With long-term caching, request correlation can also be exploited further, because object popularity decays. However, a significant performance improvement may take months to years and a huge amount of storage. Given scalable storage and the huge amount of media content that potential users already hold, a P2P-based caching system appears attractive.

Physical meaning of parameter a
At a coarse time granularity, media systems often have a constant object birth rate and a constant media request rate. Consider the average number of references per object in the system, that is, the number of requests divided by the number of accessed objects.
[Figure panels: P2P media, Web media, live media, VoD media (entertainment workloads); client-side workloads and a server workload; curves: number of requests, number of accessed objects]
This average keeps increasing, but its increase slows down after a long time. In a short time period, most accessed objects are old.
[Figure legend: new objects, born after t = 0; old objects, born before t = 0]
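The claim that request concentration in SE workloads falls as c grows and rises as a grows can be checked numerically. The minimal Python sketch below assumes the rank form y_i^c = -a ln i + b with the normalization b held fixed at a hypothetical value, and uses the share of requests that go to the top 10% of objects as a stand-in measure of concentration; the poster does not say which measure it uses.

```python
import math


def se_counts(n, c, a, b):
    """Reference counts for ranks 1..n under the SE rank model y_i^c = -a*ln(i) + b.

    Requires b > a*ln(n) so that every object gets a positive count.
    """
    if b <= a * math.log(n):
        raise ValueError("need b > a*ln(n) for positive counts")
    return [(b - a * math.log(i)) ** (1.0 / c) for i in range(1, n + 1)]


def top_share(counts, frac):
    """Fraction of all requests that go to the top `frac` of objects.

    Used here as a simple (assumed) measure of request concentration.
    """
    ranked = sorted(counts, reverse=True)
    k = max(1, int(frac * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    n, b = 10_000, 20.0  # hypothetical workload size and fixed normalization
    for c in (0.2, 0.4, 0.6):
        for a in (1.0, 1.5, 2.0):
            share = top_share(se_counts(n, c, a, b), 0.10)
            print(f"c={c:.1f} a={a:.1f}  top-10% share={share:.3f}")
```

With b fixed, a larger a widens the gap between the counts of popular and unpopular objects, and a smaller c raises every count ratio to a larger power 1/c, so both push more requests toward the head of the ranking.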
The stretched exponential of Internet media traffic
Analyzing a wide variety of media workloads collected from different kinds of Internet media systems, we find that the reference ranks of media objects follow the stretched exponential (SE) distribution:

$$ y_i^{c} = -a \log i + b, \qquad 1 \le i \le N $$

where
i: rank of a media object
y_i: number of references to the object of rank i
N: number of objects
c: stretch factor
a: negative of the slope of y^c plotted against log i
b: normalization factor
The SE model is accepted by the chi-square test, while the Zipf model is rejected.

Reference rank distribution of media objects is non-stationary
[Figure: workload of a server]
For media systems with a constant object birth rate, a constant request rate, and a constant median file size, the stretch factor c is a time-invariant constant. Parameter a increases gradually with time but tends to converge to a constant.

Conclusion
Internet media access patterns follow the stretched exponential distribution. Media caching under a client-server model is far less effective than Web content caching. The stretched exponential distribution lays an analytical foundation for building peer-to-peer caching systems that deliver the rapidly growing volume of Internet media content.
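One way to estimate the SE parameters from a measured workload follows directly from the rank form: for a fixed c, y_i^c is linear in log i, so an ordinary least-squares line through the points (ln i, y_i^c) gives -a as its slope and b as its intercept. The minimal Python sketch below scans a grid of candidate c values and keeps the one with the highest R^2. This grid-search heuristic, the grid range, the natural logarithm, and the synthetic parameters are assumptions for illustration; the poster does not state its fitting procedure, and its chi-square goodness-of-fit test is not reproduced here.

```python
import math


def ols(xs, ys):
    """Ordinary least squares fit y = m*x + q; returns (m, q, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    m = sxy / sxx
    q = my - m * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return m, q, r2


def fit_se(counts, c_grid=None):
    """Fit y_i^c = -a*ln(i) + b to per-object reference counts.

    Heuristic (an assumption): for each candidate c, regress y^c on ln(i)
    and keep the c whose straight-line fit has the highest R^2.
    Returns (c, a, b, r_squared).
    """
    ranked = sorted(counts, reverse=True)           # rank 1 = most referenced
    xs = [math.log(i) for i in range(1, len(ranked) + 1)]
    if c_grid is None:
        c_grid = [k / 100 for k in range(5, 101)]   # c in 0.05 .. 1.00
    best = None
    for c in c_grid:
        ys = [y ** c for y in ranked]
        slope, intercept, r2 = ols(xs, ys)
        if best is None or r2 > best[3]:
            best = (c, -slope, intercept, r2)       # a is minus the slope
    return best


if __name__ == "__main__":
    # Synthetic SE workload with hypothetical parameters c=0.35, a=2.0, b=16.0.
    n = 2000
    counts = [(16.0 - 2.0 * math.log(i)) ** (1 / 0.35) for i in range(1, n + 1)]
    print(fit_se(counts))
```

Real traces yield integer counts with sampling noise, so the recovered parameters will only be approximate; the synthetic data here just shows that the procedure recovers known values.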
