
Research on Web caching (UvA)

Research on Web caching (UvA). Web Cache Modeling & cache replacement strategy. Adam Belloum, Computer Architecture & Parallel Systems group, University of Amsterdam, adam@science.uva.nl. Outline: summary.


Presentation Transcript


  1. Research on Web caching (UvA): Web Cache Modeling & cache replacement strategy. Adam Belloum, Computer Architecture & Parallel Systems group, University of Amsterdam, adam@science.uva.nl

  2. Outline: summary. This course discusses some aspects of web cache modeling and introduces the web cache model developed at the UvA. It also investigates the first research issue, the cache replacement policy. A number of well-known methods are compared using the cache simulator on real workloads. More details can be found in our publications listed here: • A. Belloum, A.J.H. Peddemors and L.O. Hertzberger, “JERA: A Scalable Web Server.” In Proc. of the PDPDA'98, pages 576-581, 1998. • A. Belloum and L.O. Hertzberger, “Dealing with One-Timer-documents in Web Caching.” In Proc. of the EUROMICRO'98, pages 544-550, 1998. • A. Belloum and L.O. Hertzberger, “Replacement Strategies dedicated to Web Caching.” In Proc. of the ISIC/CIRA/ISAS'98, pages 577-584, 1998.

  3. Outline: Issues addressed in this course • Describe the Web cache model • Discuss some Cache replacement strategies • Define the metrics and the simulation settings • Compare different cache replacement strategies.

  4. Web Cache Model

  5. Why do we need to make models? • Easier than building a prototype • Models allow us to focus on a specific problem and discard the unwanted side effects • Faster and lighter • More flexible than a prototype

  6. What do we have to simulate? • Cache replacement policy • Cache coherence • Traffic of the incoming requests • Object lifetime • Distribution of the incoming traffic

  7. The Web Cache Simulator Architecture [Figure labels: Cache Trace Generator, Access Log File, Requests, First Level Cache, Object removed, Miss, Second Level Cache]

  8. The simulated cached objects [Figure: objects exchanged through communication buffers] Each object carries: • Object identifier • Object size • Entry time • Last modification time • Time to live • Cache priority
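The object attributes named on this slide could be held in a small record. The sketch below is only an illustration: the field names, the units, and the `is_expired` helper (which assumes the time to live is counted from the entry time) are assumptions, not the simulator's actual data structure.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    """Hypothetical record for one simulated cached object."""
    object_id: str        # e.g. the requested URL
    size: int             # object size in bytes
    entry_time: float     # time the object entered the cache (seconds)
    last_modified: float  # last modification time at the origin (seconds)
    ttl: float            # time to live (seconds)
    priority: float = 0.0  # rank used by the replacement policy

    def is_expired(self, now: float) -> bool:
        # Assumption: the TTL is measured from the moment of caching.
        return now - self.entry_time > self.ttl
```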

  9. Cache Replacement strategy

  10. When is the cache replacement policy needed? [Figure: external requests fill an initially empty cache with objects until little free memory space remains] • Which objects have to be removed first? • Traditional strategies: LFU, LRU, FIFO • New strategies based on: size, TTL, TTR

  11. When is the cache replacement policy needed? • The cache replacement strategy is closely related to the size of the cache • The idea of cache replacement is to remove some objects when memory space is needed, BUT • This process should not have an impact on the performance of the system • In other words, remove the useless cached objects • The question is: “What is a useless object?”

  12. Some reasons why an object should be removed (reason: criterion) • Not frequently referenced object: the number of references • Not recently referenced object: the time of the last reference • Not up-to-date document: the state of the document compared to its origin • Not likely to be referenced in the near future: the state of the document compared to its origin

  13. Some reasons why an object should be removed (reason: criterion) • An object that occupies too much space: the size of the object • An old object: the time the object has been cached • An object that is likely to change in the near future: the state of the document compared to its origin • An object that can easily be retrieved if needed in the future: geographical distribution, the available bandwidth

  14. Which factor will lead to the best performance? • There is no simple answer to this question; it depends on: • The Internet traffic (the flow of requests/responses crossing the Web cache, i.e. the “workload”) • The target: reduce the traffic over the Internet backbones; reduce the delay to retrieve the requested object; increase the chance of forwarding an up-to-date object; etc.

  15. Least Frequently Requested (LFU) • Keeps the number of references to each document • Removes the object with the lowest reference count: a one-timer object (reference count = 1) or, if there is none, the object with the lowest reference count
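A minimal sketch of the LFU rule described here, assuming a byte-bounded cache and breaking ties in favour of evicting the object that was cached first. The class and method names are hypothetical; this is not the UvA simulator's code.

```python
class LFUCache:
    """Byte-bounded cache that evicts the object with the lowest reference count."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.sizes = {}  # key -> size in bytes
        self.refs = {}   # key -> reference count

    def request(self, key, size):
        """Return True on a hit; on a miss, cache the object and return False."""
        if key in self.refs:
            self.refs[key] += 1
            return True
        if size > self.capacity:
            return False  # too large to be cached at all
        while self.used + size > self.capacity:
            # min() scans in insertion order, so ties go to the oldest object.
            victim = min(self.refs, key=self.refs.get)
            self.used -= self.sizes.pop(victim)
            del self.refs[victim]
        self.sizes[key] = size
        self.refs[key] = 1
        self.used += size
        return False
```

With a 100-byte cache and 40-byte objects, the request sequence a, b, a, c, d evicts b and then c: each newcomer starts with a reference count of 1 and is the first candidate for removal once the cache is full.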

  16. Least Frequently Requested: cache replacement using the LFU strategy [Figure labels: incoming request; object with the maximum reference count; least frequently requested objects; next object to be removed]

  17. Least Frequently Requested • Side effects • A newly cached object is likely to be removed, since it starts with the lowest number of requests • Objects that build up a very high reference count will never be removed • Does not optimize the number of removed documents • Variants • LFU_Aging (combines the number of references and the time) • LFU* (store only if the object is requested more than once)

  18. Another method to improve LFU (cache partitioning, UvA work) • One partition is dedicated to the newly cached objects and uses another factor for sorting • Within this partition, newly requested objects compete with each other, not with the whole cache • The advantage of the partition is that it keeps a certain number of newly requested objects, which allows the content of the cache to be refreshed.

  19. Cache partitioning [Figure: a frequently requested object partition (?) and a one-timer object partition (LRU), with positions 1 and 2 marked in each] • Top of the cache (contains the last object to be removed) • Top of the removal list (contains the next object to be removed) • When a one-timer object (OTO) is referenced twice, it is moved to the frequently requested partition • When the cache is full, the last object is pushed to the removal partition
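The partitioning idea can be sketched as two ordered maps. The sketch makes several assumptions that the slides do not state: capacities are counted in objects rather than bytes, both partitions are kept in LRU order, and an object pushed out of the frequently requested partition falls back into the one-timer partition, which acts as the removal list. All names are hypothetical.

```python
from collections import OrderedDict


class PartitionedCache:
    """Two-partition cache: one-timer objects compete only with each other."""

    def __init__(self, one_timer_slots, frequent_slots):
        if one_timer_slots < 1 or frequent_slots < 1:
            raise ValueError("each partition needs at least one slot")
        self.one_timer_slots = one_timer_slots
        self.frequent_slots = frequent_slots
        self.one_timers = OrderedDict()  # oldest first; front is removed next
        self.frequent = OrderedDict()    # oldest first

    def request(self, key):
        """Return True on a hit, False on a miss."""
        if key in self.frequent:
            self.frequent.move_to_end(key)
            return True
        if key in self.one_timers:
            # Second reference: promote out of the one-timer partition.
            del self.one_timers[key]
            self._promote(key)
            return True
        self._insert_one_timer(key)
        return False

    def _promote(self, key):
        if len(self.frequent) >= self.frequent_slots:
            # Assumption: the displaced object moves to the removal partition.
            demoted, _ = self.frequent.popitem(last=False)
            self._insert_one_timer(demoted)
        self.frequent[key] = None

    def _insert_one_timer(self, key):
        while len(self.one_timers) >= self.one_timer_slots:
            self.one_timers.popitem(last=False)  # evicted from the cache
        self.one_timers[key] = None
```

In this sketch a burst of one-timer requests can only displace other one-timers, so an object that has already been referenced twice survives the burst.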

  20. Least Recently Requested (LRU) • The objects are sorted in the cache according to the last time they were requested (newly requested objects are the last to be removed) • It allows the cache to be purged of formerly popular objects
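A minimal LRU sketch built on `collections.OrderedDict`, assuming a byte-bounded cache. The class name and interface are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-bounded cache that evicts the least recently requested object."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.objects = OrderedDict()  # key -> size, least recently used first

    def request(self, key, size):
        """Return True on a hit; on a miss, cache the object and return False."""
        if key in self.objects:
            self.objects.move_to_end(key)  # now the last to be removed
            return True
        if size > self.capacity:
            return False  # too large to be cached at all
        while self.used + size > self.capacity:
            _, victim_size = self.objects.popitem(last=False)
            self.used -= victim_size
        self.objects[key] = size
        self.used += size
        return False
```

Unlike a reference count, the ordering here forgets past popularity: an object that was requested many times long ago drifts to the front and is removed like any other.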

  21. Least Recently Requested: cache replacement using the LRU strategy [Figure labels: incoming request; newly cached object; least recently used object; next object to be removed]

  22. Least Recently Requested • Side effects • A one-timer object takes longer to pass through the cache than with LFU • Does not optimize the number of removed documents

  23. Cache partitioning • Two partitions of the cache • Reduces the transit time of the one-timer documents • Could be used with most of the existing cache replacement strategies

  24. Size based • Objects are sorted according to their size • Some strategies remove the smallest objects first • Others remove the largest objects first • Optimizes the number of objects to be removed
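The effect of the sort order on the number of removed objects can be seen with a small helper. This is a sketch under the assumption that the policy simply walks the size-sorted list until enough bytes are freed; the function name and its parameters are hypothetical.

```python
def pick_victims(sizes, bytes_needed, largest_first=True):
    """Return the keys to remove, in order, to free at least bytes_needed.

    sizes maps each cached object's key to its size in bytes.
    """
    order = sorted(sizes, key=sizes.get, reverse=largest_first)
    victims, freed = [], 0
    for key in order:
        if freed >= bytes_needed:
            break
        victims.append(key)
        freed += sizes[key]
    return victims
```

For sizes of 10, 50, 5 and 20 bytes and 30 bytes to free, the largest-first order removes a single object while the smallest-first order removes three.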

  25. New methods for cache replacement • Objects are stored according to a mathematical model that combines a number of factors and assigns a weight to each factor: • The choice of the mathematical model (cost function) • The choice of the parameters (ttl, ttr, size, nref, etc.) • The choice of the weights assigned to the parameters
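As an illustration of the three choices listed on this slide, the sketch below uses the simplest possible model, a linear weighted sum, with made-up weights. It is not the Bolot-Hoschka function nor the HYB cost function; the parameter names and weight values are assumptions chosen only to show the mechanics.

```python
def removal_score(params, weights):
    """Linear cost function: higher score means 'remove sooner' (assumption)."""
    return sum(weights[name] * params[name] for name in weights)


def next_victim(objects, weights):
    """Return the key of the object with the highest removal score."""
    return max(objects, key=lambda key: removal_score(objects[key], weights))


# Hypothetical weights: large objects are penalised, while frequently
# referenced objects and objects with a long time to live are favoured.
EXAMPLE_WEIGHTS = {"size": 0.01, "nref": -1.0, "ttl": -0.05}
```

Changing the model, the parameter set or the weights changes which object is removed next, which is why the three choices have to be made together.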

  26. Weighted function (Bolot-Hoschka, INRIA, France) • Objects are stored according to the following optimisation function:

  27. The HYB algorithm (Abrams et al., Virginia Tech, USA) • Objects are stored according to the following cost function:

  28. Nearest Neighbor Classifier (our group, UvA) • NNC relies on the strong relationship between similarity and distance • It relies on a number of well-known methods: • The Nearest Neighbor Classifier (NNC), widely used in the pattern recognition and classification domain • Principal Component Analysis (PCA), to classify the workload components

  29. Nearest Neighbor Classifier (our group, UvA) • Cache replacement can be described as a classification problem: • We define two classes: one containing the objects to be removed and one containing the objects to be kept • Each class is represented by a unique descriptor • Worst Cacheable Object (WCO) • Best Cacheable Object (BCO) • Define a distance metric • The removal policy is then simply to remove the documents that are closest to the WCO
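The classification view can be sketched with two descriptor vectors and a distance. Everything specific here is an assumption: the Euclidean metric, the two normalised features (relative size, relative reference count), and the function names. The slides leave the actual metric and parameters open.

```python
import math


def classify(features, wco, bco):
    """Assign an object to the class of the nearer descriptor."""
    if math.dist(features, wco) < math.dist(features, bco):
        return "remove"
    return "keep"


def closest_to_wco(objects, wco, count=1):
    """Return the keys of the `count` objects nearest to the WCO descriptor."""
    ranked = sorted(objects, key=lambda key: math.dist(objects[key], wco))
    return ranked[:count]


# Hypothetical descriptors over (relative size, relative reference count).
WCO = (1.0, 0.0)  # very large, never re-referenced
BCO = (0.0, 1.0)  # very small, heavily referenced
```

With these descriptors, an object at (0.9, 0.1) is classified as "remove" and would be the first victim, while one at (0.2, 0.8) is kept.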

  30. Nearest Neighbor Classifier (our group, UvA) [Figure: observations plotted against Parameter 1, Parameter 2 and Parameter 3, with the descriptor of class A and the descriptor of class B; x: observation assigned to class A, +: observation assigned to class B]

  31. Nearest Neighbor Classifier (our group, UvA) • This method has 3 parameters that can be tuned: • The WCO & BCO • The distance • The parameters (using PCA)

  32. Nearest Neighbor Classifier (our group, UvA) • We have a number of factors that can be used to select the next object to be removed, and we want to know which are the most important • PCA:

  33. Nearest Neighbor Classifier(our group, UvA) • Workload • Wins • 3000 elements

  34. Workload • Wins • 12000 elements

  35. Workload • Wins • 12000 elements [Figure: eigenvalues against number of requests; series: * Address, o Size, x Last ref, + Number of ref.]

  36. Setup the simulation: workload • The workloads • UvA workload (UvA proxy cache server) • External requests to local objects (this workload has a strong locality of reference) • NLANR workload (cache.nlar.net) • Requests for objects located on remote servers

  37. Setup the simulation: workload • Other characteristics of a workload • Duration • Number of bytes transferred • Number of requests

  38. Setup the simulation: metrics • The metrics • SHR (HR): System Hit Ratio • Tells us how successful the cache management is • Indicates the delay of retrieving objects • SBHR (BHR): System Byte Hit Ratio • Tells us about the bandwidth usage
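Both metrics can be computed from a replayed trace. The sketch uses the usual definitions: hit ratio = hits / requests, and byte hit ratio = bytes served from the cache / bytes requested. The trace format, a list of `(size_in_bytes, was_hit)` pairs, is an assumption made for this example.

```python
def hit_ratios(trace):
    """Return (hit ratio, byte hit ratio) for (size_in_bytes, was_hit) pairs."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in trace:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    hr = hits / requests if requests else 0.0
    bhr = hit_bytes / total_bytes if total_bytes else 0.0
    return hr, bhr
```

A trace with two 100-byte hits and two misses of 300 and 500 bytes gives a hit ratio of 0.5 but a byte hit ratio of only 0.2, which is why the two metrics are reported separately.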

  39. Setup the simulation: Assumptions • Objects are assumed to be static • The cache has two levels; the second is used as a victim cache of the first level • The size of the first-level cache is fixed at 64 MB
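The two-level arrangement with a victim cache can be sketched as follows. The sketch assumes LRU ordering in both levels, byte capacities, and that an object found in the second level is moved back to the first; none of these details is stated on the slide, and the names are hypothetical.

```python
from collections import OrderedDict


class TwoLevelCache:
    """First-level cache whose evicted objects fall into a victim cache."""

    def __init__(self, l1_bytes, l2_bytes):
        self.caps = (l1_bytes, l2_bytes)
        self.levels = (OrderedDict(), OrderedDict())  # key -> size, oldest first
        self.used = [0, 0]

    def _put(self, level, key, size):
        """Insert into a level and return the (key, size) pairs pushed out."""
        if size > self.caps[level]:
            return [(key, size)]  # does not fit: the object bypasses this level
        evicted = []
        while self.used[level] + size > self.caps[level]:
            old_key, old_size = self.levels[level].popitem(last=False)
            self.used[level] -= old_size
            evicted.append((old_key, old_size))
        self.levels[level][key] = size
        self.used[level] += size
        return evicted

    def request(self, key, size):
        """Return 'L1', 'L2' or 'miss' for one request."""
        l1, l2 = self.levels
        if key in l1:
            l1.move_to_end(key)
            return "L1"
        outcome = "miss"
        if key in l2:
            self.used[1] -= l2.pop(key)  # moved back to the first level
            outcome = "L2"
        for victim, victim_size in self._put(0, key, size):
            self._put(1, victim, victim_size)  # second-level evictions are dropped
        return outcome
```

A setup matching the slide would be `TwoLevelCache(64 * 2**20, second_level_bytes)`, where `second_level_bytes` is the parameter varied in the experiments (an assumption about how the cache size axis is swept).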

  40. Simulation results: WINS workload [Figure: two panels, BHR against cache size (MB) and DHR against cache size (MB), for cache sizes from 0 to 600 MB; curves: LRU, LFU, SIZE, Weighted-method, NNC; workload: WINS, static documents]

  41. Simulation results: NLANR workload [Figure: two panels, BHR against cache size (MB) and DHR against cache size (MB), for cache sizes from 0 to 600 MB; curves: LRU, LFU, SIZE, Weighted-method, NNC; workload: NLAR, static documents]

  42. Simulation results (Analysis) • Under the same conditions (same workload, same cache size, and same assumptions about the cached objects) • The replacement policies lead to the same cache performance • With a relatively small cache size (600 MB), the different cache replacement strategies show no difference in performance

  43. Simulation results (Analysis) • However, this cannot be the final result, because we did not take object coherency into consideration (one replacement strategy may lead to many more faulty hits than another)

  44. Conclusions • If documents are considered static, the hit ratios seem to be equivalent for all the replacement policies in relatively small cache configurations • This is not a final result, since it does not consider cache coherence.

  45. Simulation results (Cache partition) [Figure: two panels with a horizontal axis from 0 to 600; curves: Part 1.1, Part 1.2, Part 1.3]
