
Intelligent Bayesian Network-Based Approaches for Web Proxy Caching



Presentation Transcript


  1. Intelligent Bayesian Network-Based Approaches for Web Proxy Caching. Prepared by: Waleed Ali Ahmed & Siti Mariyam Shamsuddin, Soft Computing Research Group, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 Johor, Malaysia. waleedalodini@gmail.com, mariyam@utm.my

  2. Outline • Introduction • Related Works • The Proposed Intelligent Web Proxy Caching Approaches • Implementation and Performance Evaluation • Conclusion and Future works

  3. Introduction

  4. Background • Web caching is one of the most successful and well-known strategies for improving the performance of Web-based systems: it keeps Web objects that are likely to be used in the near future in a location closer to the user. • Why? • To decrease latencies • To reduce Web server loads • To reduce bandwidth usage

  5. Web proxy caching • In Web proxy caching, popular Web objects that are likely to be revisited in the near future are stored on the proxy server, which sits between users and Web sites, reducing the response time of user requests and saving network bandwidth.

  6. Why Web proxy caching? • Proxy servers play a key role between users and Web sites: they can reduce response time and save network bandwidth. • It is the most common caching strategy: proxy caching is widely utilized by network administrators, technology providers, and businesses to reduce user delays and to alleviate Internet congestion (Kaya et al., 2009; Kumar, 2009; Kumar and Norris, 2008).

  7. Problem Statement • Since the space apportioned to the cache is limited, it must be utilized judiciously (Romano and ElAarag, 2011). • The most common Web caching methods are not efficient enough and may suffer from the cache pollution problem (Cobb and ElAarag, 2008; Koskela et al., 2003): • Reduction of the effective cache size • Low hit ratio • Wasted bandwidth • Overload on the origin server • So far, determining which Web objects are ideal to keep because they will be re-visited remains a major challenge.

  8. Intelligent Web Proxy Caching • Motivations for using machine learning in Web caching: • Availability of Web access logs and trace files (histories of accesses) that can be considered complete prior knowledge of future accesses. • The need for an efficient and adaptive scheme, since the Web environment changes and updates rapidly and continuously. • Recent studies have proposed utilizing ANN in Web proxy caching, although ANN training may consume a long time and require extra computational overhead. • More significantly, the integration of intelligent techniques in Web cache replacement is still under research.

  9. The suggested solutions • We present new intelligent approaches that rely on the capability of a Bayesian Network (BN) to learn from Web proxy log files and predict the classes of objects, i.e., whether they will be re-visited or not. • More significantly, the trained BN classifier is incorporated effectively with traditional Web proxy caching algorithms to present novel intelligent Web proxy caching approaches.

  10. Bayesian Network (BN) • A Bayesian network is one of the most popular machine learning models; it relies on probability estimation to find the class of an observed pattern. • Rationale: • – A Bayesian network (BN) is a directed acyclic graph over which a probability distribution is defined. Each node in the graph represents a random variable or event, while the arcs or edges between the nodes represent associations or causal relationships.

  11. Bayesian Network (BN) • The probabilistic dependency is maintained by the conditional probability table (CPT) attached to the corresponding event. • In classification tasks: • – the classification decision is computed simply as c* = argmax_c P(c) · P(x | c), where P(x | c) is the probability of finding the pattern x in class c and P(c) is the prior probability of class c.
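As an illustration of this decision rule, here is a minimal sketch assuming a naive Bayes factorization of P(x | c) over discretized features; the function names and the toy feature values are illustrative assumptions, not the classifier actually used in the study (which is WEKA's BayesNet).

```python
import math
from collections import defaultdict

def train_counts(patterns, labels):
    """Count class frequencies, per-class feature-value frequencies and value domains."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(int)   # (class, feature index, value) -> count
    domains = defaultdict(set)       # feature index -> set of observed values
    for x, c in zip(patterns, labels):
        class_counts[c] += 1
        for i, v in enumerate(x):
            feat_counts[(c, i, v)] += 1
            domains[i].add(v)
    return class_counts, feat_counts, domains

def classify(x, class_counts, feat_counts, domains):
    """Return argmax_c P(c) * prod_i P(x_i | c), in log space with add-one smoothing."""
    total = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, v in enumerate(x):
            score += math.log((feat_counts[(c, i, v)] + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy usage: two illustrative discretized features per pattern; class 1 = "will be revisited".
X = [(0, 2), (0, 1), (2, 0), (1, 0)]
y = [1, 1, 0, 0]
model = train_counts(X, y)
print(classify((0, 2), *model))   # -> 1
```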

  12. Why BN in Web caching? Bayesian networks are popular supervised learning algorithms that have gained great popularity in the medical field and in other applications such as military applications, forecasting, control, modelling for human understanding, cognitive science, statistics, and philosophy. Hence, Bayesian networks can be utilized to produce promising solutions for Web proxy caching.

  13. Related Works

  14. Intelligent Web caching • The conventional Web caching methods are not efficient enough (Cobb and ElAarag, 2008; Koskela et al., 2003). • Therefore, several researchers have proposed incorporating intelligent solutions to cope with the Web caching problem. • According to Chen (2008), intelligent approaches are more efficient and more adaptive to the Web caching environment than other approaches.

  15. Related works on intelligent web caching

  16. Summary of intelligent web caching • From the previous studies, we can observe two approaches in intelligent Web caching: • An intelligent technique is employed in Web caching individually. • An intelligent technique is employed together with the LRU algorithm. • Both approaches may predict which Web objects can be re-accessed; however: • They do not take into account the cost and size of the predicted objects in the cache replacement decision. • Some important features are ignored. • The training process requires a long time and extra computational overhead.

  17. Proposed Approach VS Existing Approaches

  18. The Proposed Intelligent Web Proxy Caching Approaches

  19. The operational framework for the proposed approach

  20. A framework for the proposed approach • The framework consists of two functional components: • Offline component: it runs only during the proxy server's idle (leisure) periods and is responsible for training the BN classifier. • Online component: the intelligent caching strategies are executed in this part.

  21. Online Component • In the online component, the intelligent caching strategies are applied to manage the proxy cache content. • We propose intelligent Web proxy caching approaches that depend on integrating BN with traditional Web caching algorithms to provide more effective caching policies: • Bayesian Network-Greedy-Dual-Size approach (BN-GDS): the BN classifier is integrated with GDS to improve its performance in terms of byte hit ratio. • Bayesian Network-Least-Recently-Used approach (BN-LRU): the BN classifier is combined with LRU to form a new algorithm called BN-LRU. • Bayesian Network-Dynamic Aging approach (BN-DA): the BN classifier is combined with dynamic aging (DA) to form a new algorithm called BN-DA.

  22. 1- The intelligent BN-GDS approach • The Greedy-Dual-Size (GDS) caching algorithm was proposed by Cao and Irani (1997). The algorithm assigns a key value K(p) = L + C(p)/S(p) to each object p in the cache, and the object with the lowest key value is replaced, where C(p) is the cost to bring object p into the cache, S(p) is the object size, and L is an inflation factor that starts at 0 and is updated to the key value of the last replaced object. If an object is accessed again, its key value is recomputed using the new L value.
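A minimal sketch of the GDS policy described above, assuming an in-memory cache keyed by URL; a real proxy would use a priority queue rather than a linear scan, and the class and method names are illustrative.

```python
class GDSCache:
    """Minimal Greedy-Dual-Size sketch: key(p) = L + C(p)/S(p); evict the lowest key."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0                  # inflation factor, starts at 0
        self.entries = {}             # url -> (key, size, cost)

    def access(self, url, size, cost):
        if url in self.entries:
            _, size, cost = self.entries[url]                        # reuse stored metadata
            self.entries[url] = (self.L + cost / size, size, cost)   # refresh key with current L
            return True                                              # cache hit
        if size > self.capacity:
            return False                                             # larger than the whole cache: bypass
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda u: self.entries[u][0])
            self.L = self.entries[victim][0]                         # L rises to the evicted key
            self.used -= self.entries[victim][1]
            del self.entries[victim]
        self.entries[url] = (self.L + cost / size, size, cost)
        self.used += size
        return False                                                 # cache miss; object admitted
```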

  23. 1- The intelligent BN-GDS approach • Cherkasova (1998) enhanced the GDS algorithm by incorporating a frequency count F(p), giving the Greedy-Dual-Size-Frequency (GDSF) algorithm with key value K(p) = L + F(p) · C(p)/S(p), where F(p) is the access count of object p. • One advantage of the GDSF policy is that it performs well in terms of hit ratio; however, its byte hit ratio is too low. • Therefore, the BN classifier is integrated with GDS to improve performance in terms of byte hit ratio, yielding BN-GDS.

  24. 1- The intelligent BN-GDS approach • In the proposed BN-GDS, GDS is enhanced by incorporating into the key value the accumulated scores, i.e., the probabilities of revisiting object g according to the BN classifier, as in Eq. • This means that the key value of object g is determined not just by its past access frequency, but also by the class predicted from the six factors. The rationale behind the proposed BN-GDS approach is that we can raise the priority of those cached objects that may be revisited in the near future according to the BN classifier, even if they are not accessed frequently enough.
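The BN-GDS key equation itself is not reproduced in this transcript, so the sketch below is only one plausible reading, assuming the accumulated BN score W(g) is added to the GDSF frequency term; the function names and the exact combination are assumptions, not the authors' verbatim formula.

```python
# Accumulated BN scores per object; p_revisit is the probability the trained BN
# classifier assigns to the "will be revisited" class (names are illustrative).
scores = {}

def bn_gds_key(L, freq, bn_score, cost, size):
    """Hypothetical BN-GDS key: the GDSF frequency term is boosted by the
    accumulated revisit probability W(g) produced by the BN classifier."""
    return L + (freq + bn_score) * cost / size

def on_request(g, p_revisit, L, freq, cost, size):
    scores[g] = scores.get(g, 0.0) + p_revisit   # accumulate the classifier's score
    return bn_gds_key(L, freq, scores[g], cost, size)
```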

  25. 2- The intelligent BN-LRU approach • The LRU policy is the most common proxy caching policy; however, it suffers from cold-cache pollution. In other words, in LRU a new object is inserted at the top of the cache stack; if the object is not requested again, it takes some time to move down to the bottom of the stack before it is removed. • To reduce cache pollution in LRU, the BN classifier is combined with LRU to form a new algorithm called BN-LRU.

  26. 2- The intelligent BN-LRU approach • The proposed BN-LRU works as follows: when a Web object g is requested by a user, the BN classifier predicts the class of that object, i.e., whether it will be revisited again or not. • If object g is classified by the BN as an object that will be re-visited, it is placed at the top of the cache stack. • Otherwise, object g is placed in the middle of the cache stack. • Hence, BN-LRU can efficiently remove unwanted objects early to make space for new Web objects.
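A minimal sketch of this placement rule, assuming a fixed-size cache counted in objects and a predict_revisit callable standing in for the trained BN classifier; names are illustrative.

```python
class BNLRUCache:
    """Sketch of the BN-LRU placement rule: predicted-revisit objects go to the top
    of the LRU stack, others are inserted at the middle so they age out sooner."""

    def __init__(self, capacity_objects, predict_revisit):
        self.capacity = capacity_objects
        self.predict_revisit = predict_revisit   # callable: object features -> True/False
        self.stack = []                          # index 0 = top of the LRU stack, end = bottom

    def access(self, url, features):
        hit = url in self.stack
        if hit:
            self.stack.remove(url)
        elif len(self.stack) >= self.capacity:
            self.stack.pop()                     # evict the object at the bottom of the stack
        if self.predict_revisit(features):
            self.stack.insert(0, url)            # predicted revisit: top of the stack
        else:
            self.stack.insert(len(self.stack) // 2, url)   # otherwise: middle of the stack
        return hit
```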

  27. 2- The intelligent BN-LRU approach

  28. 3- The intelligent BN-DA approach • In addition to frequency, several factors can contribute to predicting whether an object will be revisited in the future. • The proposed BN-DA approach combines the most significant factors, relying on the Bayesian network (BN) classifier to predict the probability that a Web object will be re-visited later. • In the proposed BN-DA approach, when a user visits Web object g, the trained BN classifier predicts the probability that g belongs to the class of objects that may be revisited. The probabilities of g are then accumulated as scores used in the cache replacement decision.
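The BN-DA key equation is not shown in this transcript; the sketch below is one plausible reading in which the key is the dynamic aging (inflation) factor plus the accumulated BN probability. The function name, signature, and exact key form are assumptions.

```python
def bn_da_step(cache, L, g, p_revisit, capacity):
    """Hypothetical BN-DA step: key(g) = L + W(g), where W(g) is the accumulated
    BN revisit probability and L is the dynamic aging (inflation) factor.
    'cache' maps object id -> (key, W); returns the possibly updated L."""
    _, w = cache.get(g, (0.0, 0.0))
    w += p_revisit                               # accumulate the classifier's probability
    cache[g] = (L + w, w)
    while len(cache) > capacity:
        victim = min(cache, key=lambda u: cache[u][0])
        L = cache[victim][0]                     # dynamic aging: L rises to the evicted key
        del cache[victim]
    return L
```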

  29. Implementation and Performance Evaluation

  30. 1-Data collection • We obtained proxy log files of Web objects requested on several proxy servers of the IRCache network, located around the United States, covering fifteen days (NLANR, 2010). • In this study, the proxy log files of 21st August 2010 were used in the training phase, while the proxy log files of the following days were used in the simulation and implementation phase.

  31. 2-Data Pre-processing • Data preprocessing involves removing irrelevant requests from the log files, since some of the log entries are invalid or irrelevant. • The trace preparation is carried out as follows: • Parsing: identifying the boundaries between successive fields and records in the log file. • Filtering: eliminating irrelevant entries, such as uncacheable requests and entries with unsuccessful HTTP status codes. • Finalizing: removing unnecessary fields. Moreover, each unique URL is converted to a unique integer identifier to reduce simulation time.

  32. 2-Data Pre-processing The final format of our data consists of URL ID, timestamp, elapsed time, size and type of web object
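A hedged sketch of these three steps, assuming Squid-style access-log lines; the field layout, the GET/status-200 filter, and the object-type heuristic are assumptions rather than the authors' exact rules.

```python
def preprocess(lines):
    """Parsing / filtering / finalizing over Squid-style log lines of the form:
    timestamp elapsed client code/status bytes method URL ..."""
    url_ids, dataset = {}, []
    for line in lines:
        fields = line.split()                    # parsing: split the record into fields
        if len(fields) < 7:
            continue                             # skip malformed records
        timestamp, elapsed, _, code_status, size, method, url = fields[:7]
        if method != "GET" or not code_status.endswith("/200"):
            continue                             # filtering: uncacheable or unsuccessful requests
        name = url.rstrip("/").rsplit("/", 1)[-1]
        obj_type = name.rsplit(".", 1)[-1].lower() if "." in name else "other"
        url_id = url_ids.setdefault(url, len(url_ids) + 1)   # finalizing: URL -> integer ID
        dataset.append((url_id, float(timestamp), int(elapsed), int(size), obj_type))
    return dataset
```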

  33. 3-Training Phase • The training pattern takes the format:

  34. 3-Training Phase • Preparation of the dataset for Web object classification

  35. 3-Training Phase • Each proxy dataset is then divided randomly into training data (70%) and testing data (30%). • Subsequently, the dataset is discretized using the MDL method suggested by Fayyad & Irani (1993), with the default setup in WEKA. • Finally, the Bayesian network (BN) is trained using WEKA as well. In WEKA, the BN algorithm is available in the Java class "weka.classifiers.bayes.BayesNet". The default parameter values and settings predefined in WEKA are used in BN training.
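The actual training is done in WEKA (MDL discretization followed by weka.classifiers.bayes.BayesNet with default settings); purely as an illustration, a plain-Python sketch of the random 70/30 split and a simple test-accuracy check could look like this (the MDL discretization step is not reproduced).

```python
import random

def split_train_test(dataset, train_fraction=0.7, seed=0):
    """Random split of (pattern, label) pairs into 70% training and 30% testing data."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

def accuracy(classify_fn, test_data):
    """Fraction of test patterns whose predicted class matches the true label."""
    correct = sum(1 for x, c in test_data if classify_fn(x) == c)
    return correct / len(test_data) if test_data else 0.0
```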

  36. 4-Performance Evaluation • We modified the WebTraff simulator (Markatchev and Williamson, 2002) to support our proposed proxy caching approaches. • The trained classifiers are integrated with the WebTraff simulator to simulate the proposed intelligent Web proxy caching approaches.

  37. 4-Performance Evaluation • Two common measures are used to analyze efficiency: • Hit Ratio (HR): the fraction of requests that are served from the cache out of all requests. • Byte Hit Ratio (BHR): the fraction of bytes served from the cache out of the total bytes requested.
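For reference, the two measures can be computed over a simulated request stream as follows; the (hit, size) tuple layout is an illustrative assumption.

```python
def hit_ratio(requests):
    """HR: fraction of requests served from the cache.
    'requests' is a sequence of (hit: bool, size_in_bytes: int) pairs."""
    hits = sum(1 for hit, _ in requests if hit)
    return hits / len(requests) if requests else 0.0

def byte_hit_ratio(requests):
    """BHR: fraction of the total requested bytes that were served from the cache."""
    total_bytes = sum(size for _, size in requests)
    hit_bytes = sum(size for hit, size in requests if hit)
    return hit_bytes / total_bytes if total_bytes else 0.0
```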

  38. 4-Performance Evaluation • Analysis of IRcache traces

  39. 4-Performance Evaluation • Impact of cache size on HR for different proxy datasets [figure: (a) BO2 HR, (b) NY HR]

  40. 4-Performance Evaluation • In terms of Hit Ratio (HR): • BN-GDS achieves the best HR among all algorithms, while LRU achieves the worst HR. • BN-GDS and BN-LRU improve the HR of GDS and LRU, respectively. • Although the HR of BN-DA is worse than the HR of GDS and GDSF, it is better than the HR of NNPCR-2, BN-LRU, and LRU.

  41. 4-Performance Evaluation • Impact of cache size on BHR for different proxy datasets [figure: (a) BO2, (b) NY]

  42. 4-Performance Evaluation • In terms of Byte Hit Ratio (BHR): • BN-LRU and BN-DA achieve the best BHR among all algorithms, while GDS and GDSF attain the worst BHR. • The BHR of LRU is better than the BHR of BN-GDS, GDS, and GDSF. • BN-GDS significantly improves the BHR of GDS and GDSF. • BN-LRU and BN-DA have better BHR compared with LRU and NNPCR-2.

  43. Conclusion • This study has proposed three intelligent Web proxy caching approaches, called BN-GDS, BN-LRU and BN-DA, for improving the performance of conventional Web proxy caching algorithms. • The BN classifier learns from Web proxy log files to predict the classes of objects, i.e., whether they will be re-visited or not. • The trained classifier is integrated effectively with conventional Web proxy caching to provide more effective proxy caching policies. • The simulation results reveal that BN-GDS achieved the best HR, a better BHR than GDS and GDSF, and an acceptable BHR compared to BN-LRU and BN-DA, which achieved the best BHR. This means BN-GDS was able to strike a better balance between HR and BHR than the other algorithms. On the other hand, BN-LRU and BN-DA achieved the best BHR among all algorithms, and better HR compared to LRU and NNPCR-2.

  44. Future works • In the future: • Other intelligent classifiers can be utilized to improve the performance of traditional Web caching policies. • Clustering algorithms can be used to enhance the performance of Web caching policies.

  45. References • Kaya, C.C., Zhang, G., Tan, Y., & Mookerjee, V.S. 2009. An admission-control technique for delay reduction in proxy caching. Decision Support Systems, 46, 594-603. • Kumar, C. 2009. Performance evaluation for implementations of a network of proxy caches. Decision Support Systems, 46, 492-500. • Kumar, C., & Norris, J.B. 2008. A new approach for a proxy-level web caching mechanism. Decision Support Systems, 46, 52-60. • Romano, S., & ElAarag, H. 2011. A neural network proxy cache replacement strategy and its implementation in the Squid proxy server. Neural Computing & Applications, 20, 59-78. • Cobb, J., & ElAarag, H. 2008. Web proxy cache replacement scheme based on back-propagation neural network. Journal of Systems and Software, 81, 1539-1558. • Koskela, T., Heikkonen, J., & Kaski, K. 2003. Web cache optimization with nonlinear model using object features. Computer Networks, 43, 805-817. • Chen, H.T. 2008. Pre-fetching and Re-fetching in Web Caching Systems: Algorithms and Simulation. Trent University, Peterborough, Ontario, Canada. • Cao, P., & Irani, S. 1997. Cost-Aware WWW Proxy Caching Algorithms. In Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, Monterey, CA. • Cherkasova, L. 1998. Improving WWW Proxies Performance with Greedy-Dual-Size-Frequency Caching Policy. HP Technical Report, Palo Alto.

  46. References • NLANR. 2010. National Lab of Applied Network Research (NLANR). Sanitized access logs: available at http://www.ircache.net/. • Fayyad, U.M., & Irani, K.B. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), pp. 1022-1027. • Markatchev, N., & Williamson, C. 2002. WebTraff: A GUI for Web Proxy Cache Workload Modeling and Analysis. In Proceedings of the 10th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, p. 356.

  47. Thank you
