Dynamic Web Content

Caching Dynamic Web Content: Designing and Analyzing an Aspect-Oriented Solution
Sara Bouchenak – INRIA, France
Alan Cox – Rice University, Houston
Steven Dropsho – EPFL, Lausanne
Sumit Mittal – IBM Research, India
Willy Zwaenepoel – EPFL, Lausanne

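Figure: three-tier architecture. A client sends an HTTP request over the Internet to the web server (web tier), which passes it to the application server (business tier); the application server sends SQL requests to the database server (database tier) and receives SQL results, and the HTTP response returns to the client. The diagram also labels a cache.

Dynamic Web Content
• Motivation for caching
  • Dynamic content represents a large portion of web requests
    • e.g., stock quotes, bidding/buying status on an auction site, best-sellers on a bookstore
  • Generating it places a heavy burden on application servers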

Caching Dynamic Web Content
• Dynamic content is not easy to cache
  • Consistency must be ensured: cached entries must be invalidated on updates
    • Write requests can modify data used by read requests
  • Caching logic must be inserted at different points in the application
    • Entry and exit of requests, access to the underlying database
    • Correlation between requests and their database accesses
  • Most solutions rely on “manually” understanding complex application logic

Our Contributions
• Design of a cache, “AutoWebCache”, that
  • Ensures consistency of cached documents
  • Inserts caching logic transparently to the application
    • Makes use of aspect-oriented programming
• Analysis of the cache
  • Transparency of injecting caching logic
  • Improvement in response time for test-bed applications

Figure: AutoWebCache, a web page cache at the application server. Request information and database accesses (SQL requests and results) flow into the caching logic, which performs cache checks, cache inserts, and invalidations; the client, Internet, web server, and database server are as in the three-tier architecture.

Dynamic Web Caching – Solution Approach
• Consistency
  • Correlation between read and write requests
• Transparency
  • Capture information flow

Outline
• Design of AutoWebCache
  • Maintaining cache consistency
    • Determine relationship between reads and updates
  • Cache structure
• Aspectizing web caching
  • Transparent insertion of caching logic
• Evaluation
  • Analysis of effectiveness, transparency
• Conclusion

Maintaining Cache Consistency – Read Requests
• Responses to read-only requests are cached
• The read SQL queries are recorded with the cache entry

Index: URI (readHandlerName + readHandlerArgs) | Cached web page | Associated read queries
URI1 | WebPage1 | { Read Query 11, Read Query 12, … }
URI2 | WebPage2 | { Read Query 21, Read Query 22, … }
… | … | …
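A minimal sketch of this read-side bookkeeping in Java, assuming an in-memory map keyed by URI and read queries kept as plain SQL strings. The names `PageCache`, `CacheEntry`, `get`, `add`, and `readQueriesOf` are hypothetical; they only mirror the roles of the `Cache.get` and `Cache.add` calls used when aspectizing read requests, not AutoWebCache's actual API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache entry: the generated page plus the read SQL queries
// that were executed while the page was being produced.
record CacheEntry(String page, List<String> readQueries) {}

class PageCache {
    // Index: URI built from the read handler's name and arguments.
    private final Map<String, CacheEntry> entries = new HashMap<>();

    static String uri(String readHandlerName, String readHandlerArgs) {
        return readHandlerName + "?" + readHandlerArgs;
    }

    // Cache check: returns the cached page, or null on a miss.
    String get(String uri) {
        CacheEntry e = entries.get(uri);
        return e == null ? null : e.page();
    }

    // Cache insert after a miss: store the page with its read queries.
    void add(String uri, String page, List<String> readQueries) {
        entries.put(uri, new CacheEntry(page, List.copyOf(readQueries)));
    }

    // Read queries recorded for a URI (empty if nothing is cached).
    List<String> readQueriesOf(String uri) {
        CacheEntry e = entries.get(uri);
        return e == null ? List.of() : e.readQueries();
    }
}
```

On a miss, the handler runs normally and `add` stores the page together with the queries it executed; a later write request consults `readQueriesOf` to decide whether the entry is stale.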

Maintaining Cache Consistency – Write Requests
Figure: write set (WS) vs. read set (RS). Disjoint sets: no invalidation. Overlapping sets: invalidation.
• The result is not cached
• The write SQL queries are recorded
• The write SQL queries are intersected with the read queries of cached pages
• Invalidate if the intersection is non-empty

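Invalidating Cache Entries
Figure: a write query is compared with the associated read queries of each cache entry; entries whose read queries intersect it are removed.

Index: URI (readHandlerName + readHandlerArgs) | Cached web page | Associated read queries
URI1 | WebPage1 | { Read Query 11, Read Query 12, … }
URI2 | WebPage2 | { Read Query 21, Read Query 22, … }
URI3 | WebPage3 | { Read Query 31, Read Query 32, … }
… | … | …
URIn | … | …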
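The removal step can be sketched as a scan over the cached entries that drops every page with at least one read query intersecting the write query. This sketch assumes the intersection test is supplied as a `BiPredicate` (in AutoWebCache that decision belongs to the query analysis engine); `InvalidatingCache` and `onWrite` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

class InvalidatingCache {
    // URI -> read queries recorded for the page cached at that URI.
    private final Map<String, List<String>> readQueriesByUri = new HashMap<>();
    private final Map<String, String> pageByUri = new HashMap<>();
    // Returns true if (writeQuery, readQuery) may touch the same data.
    private final BiPredicate<String, String> intersects;

    InvalidatingCache(BiPredicate<String, String> intersects) {
        this.intersects = intersects;
    }

    void add(String uri, String page, List<String> readQueries) {
        pageByUri.put(uri, page);
        readQueriesByUri.put(uri, List.copyOf(readQueries));
    }

    boolean contains(String uri) {
        return pageByUri.containsKey(uri);
    }

    // Called for each write query of a write request; returns removed URIs.
    List<String> onWrite(String writeQuery) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : readQueriesByUri.entrySet()) {
            for (String readQuery : e.getValue()) {
                if (intersects.test(writeQuery, readQuery)) {
                    removed.add(e.getKey());
                    break; // one intersecting read query is enough
                }
            }
        }
        // Remove after the scan so the map is not modified while iterating.
        for (String uri : removed) {
            readQueriesByUri.remove(uri);
            pageByUri.remove(uri);
        }
        return removed;
    }
}
```

A linear scan is the simplest choice; indexing entries by read-query template avoids visiting every entry on each write.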

Query Analysis Engine
• Determines the intersection between SQL queries
• Three levels of granularity for the intersection
  • Column based
  • Value based
  • Extra query based
• Balances precision against complexity

Column-Based Intersection
• Invalidate if a column read is also updated (Column_Read ∩ Column_Updated ≠ ∅)

Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9

Read query: SELECT T.a FROM T WHERE T.b = 8
• UPDATE T SET T.c = 7 WHERE T.b = 10 → Ok
• UPDATE T SET T.a = 12 WHERE T.b = 10 → Invalidate
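Column-based intersection reduces each query to the set of columns it touches. A minimal sketch, assuming the queries have already been parsed into qualified column names such as `T.a` (parsing is not shown). One assumption goes beyond the slide: the read set also contains the columns in the read query's WHERE clause, since updating a predicate column can change which rows the cached query returns. `ColumnBased` is a hypothetical name.

```java
import java.util.Collections;
import java.util.Set;

class ColumnBased {
    // Invalidate when some updated column is also read by the cached query.
    // Assumption: readColumns = selected columns plus WHERE-clause columns.
    static boolean invalidates(Set<String> readColumns, Set<String> updatedColumns) {
        return !Collections.disjoint(readColumns, updatedColumns);
    }
}
```

For the slide's read query the read set is {T.a, T.b}: the update of T.c leaves the entry in place, while the update of T.a invalidates it, even though that update touches a different row.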

Value-Based Intersection
• Invalidate if the rows read overlap the rows updated (Rows_Read ∩ Rows_Updated ≠ ∅)

Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9

Read query: SELECT T.a FROM T WHERE T.b = 8
• UPDATE T SET T.a = 7 WHERE T.b = 10 → Ok (would invalidate with column-based)
• UPDATE T SET T.a = 12 WHERE T.b = 8 → Invalidate
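Value-based intersection can be sketched for the restricted case shown on the slide: single-table queries whose WHERE clause is one equality, `column = value`. The `Query` record and the `ValueBased` class are hypothetical. When both queries filter on the same column the values decide; when they filter on different columns, or the update writes the read's filter column, the sketch cannot tell which rows are affected and conservatively invalidates.

```java
import java.util.Collections;
import java.util.Objects;
import java.util.Set;

// Hypothetical parsed form: table, columns selected (read) or set (write),
// and a single equality predicate "predColumn = predValue".
record Query(String table, Set<String> columns, String predColumn, Object predValue) {}

class ValueBased {
    static boolean invalidates(Query read, Query write) {
        if (!read.table().equals(write.table())) return false;
        // Updating the read's filter column can move rows into or out of
        // the cached result, so treat it as a conflict.
        if (write.columns().contains(read.predColumn())) return true;
        // Column-based filter: no shared column means no conflict.
        if (Collections.disjoint(read.columns(), write.columns())) return false;
        // Same filter column: the row sets overlap only if the values match.
        if (read.predColumn().equals(write.predColumn())) {
            return Objects.equals(read.predValue(), write.predValue());
        }
        // Different filter columns: the touched rows are unknown here.
        return true;
    }
}
```

With the read query modeled as table T, columns {T.a}, filter T.b = 8, the update filtered on T.b = 10 is kept (Ok) and the one filtered on T.b = 8 invalidates. An update filtered on T.c falls through to the last case, which is exactly where an extra query can help.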

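Extra-Query-Based Intersection
• Generate an extra query to find the missing values

Table T (?? marks the T.b value that the update does not reveal):
a | b | c
5 | 8 | 7
1 | 10 (??) | 9

Read query: SELECT T.a FROM T WHERE T.b = 8
Write query: UPDATE T SET T.a = 3 WHERE T.c = 9 → would invalidate with value-based
Extra query: SELECT T.b FROM T WHERE T.c = 9 → returns 10, not 8 → Ok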
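The extra query can be sketched for the slide's case, where value-based analysis cannot decide: the read filters on `b = 8`, the update on `c = 9`. Before the update runs, the cache asks for the `b` values of the rows the update will touch (`SELECT T.b FROM T WHERE T.c = 9`) and invalidates only if one of them is 8. In this sketch an in-memory list of rows stands in for the database, and it assumes the update does not itself modify the filter columns; `ExtraQuery` and `valuesOf` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

class ExtraQuery {
    // Stand-in for the database table T: each row maps column -> value.
    private final List<Map<String, Object>> rows;

    ExtraQuery(List<Map<String, Object>> rows) {
        this.rows = rows;
    }

    // Emulates "SELECT <wanted> FROM T WHERE <filterCol> = <filterVal>".
    List<Object> valuesOf(String wanted, String filterCol, Object filterVal) {
        return rows.stream()
                .filter(r -> Objects.equals(r.get(filterCol), filterVal))
                .map(r -> r.get(wanted))
                .toList();
    }

    // Read filters on readCol = readVal; write filters on writeCol = writeVal.
    // The extra query (run before the update) returns the readCol values of
    // the rows the update will touch; invalidate only if readVal is among them.
    boolean invalidates(String readCol, Object readVal, String writeCol, Object writeVal) {
        return valuesOf(readCol, writeCol, writeVal).stream()
                .anyMatch(v -> Objects.equals(v, readVal));
    }
}
```

For the slide's table the extra query returns [10], so the cached page for `b = 8` is kept. The price is one additional database round trip for each write that the value-based check cannot settle.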

Figure: AutoWebCache at the application server. Request information and database accesses flow into the caching logic, which performs cache checks, cache inserts, and invalidations on the web page cache.

Dynamic Web Caching – Solution Approach
• Transparency
  • Capture information flow

Aspect-Oriented Programming (AOP)
• Modularize cross-cutting concerns – aspects
  • e.g., logging, billing, exception handling
• Works on three principles
  • Capture the execution points of interest – pointcuts (1)
    • Method calls, exception points, read/write accesses
  • Determine what to do at these pointcuts – advice (2)
    • Encode cross-cutting logic (before / after / around)
  • Bind pointcuts and advice together – weaving (3)
    • AspectJ compiler for Java
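Plain Java has no pointcuts or weaving, but a JDK dynamic proxy (`java.lang.reflect.Proxy`) can imitate one piece of the model: around advice on every call made through an interface. This is a substitute technique, not AspectJ: the proxied interface plays the role of the pointcut, the `InvocationHandler` is the advice, and creating the proxy stands in for weaving. The `Handler` interface and the logging advice are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical request-handler interface standing in for a servlet.
interface Handler {
    String handle(String uri);
}

class AroundAdvice {
    // Wraps target so that every interface call runs "before" logic,
    // proceeds to the real method, then runs "after" logic.
    static Handler weave(Handler target, List<String> log) {
        InvocationHandler advice = (proxy, method, args) -> {
            log.add("before " + method.getName());       // before part
            Object result = method.invoke(target, args);  // like proceed(...)
            log.add("after " + method.getName());        // after part
            return result;
        };
        return (Handler) Proxy.newProxyInstance(
                Handler.class.getClassLoader(),
                new Class<?>[] {Handler.class},
                advice);
    }
}
```

Unlike AspectJ, a proxy only intercepts calls made through the reference it returns; it cannot capture field accesses or calls made inside the target.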

Insertion of Caching Logic
Figure: the original web application, the caching library, and the weaving rules are fed to aspect weaving (AspectJ), which produces the cache-enabled version of the web application.

Aspectizing Read Requests
Original code of a read-only request handler:
    // Execute SQL queries
    …
    SQL query 1
    SQL query 2
    …
    // Generate a web document
    webDoc = …
    // Return the web document
    …
Caching logic inserted around it:
• Capture main, request entry: cache check
    String cachedDoc = Cache.get(uri, inputInfo);
    if (cachedDoc != null) return cachedDoc; // Cache hit
• Capture SQL queries: collect SQL query info (dependency info)
• Capture main, request exit: cache insert
    Cache.add(webDoc, uri, inputInfo, dependencyInfo); // Cache miss

Aspectizing Write Requests
Original code of a write request handler:
    // Execute SQL queries
    …
    SQL query 1
    SQL query 2
    …
    …
    // Return
Caching logic inserted around it:
• Capture SQL queries: collect SQL query info (invalidation info)
• Capture main, request exit: cache invalidation
    // Cache consistency
    Cache.remove(invalidationInfo);

Capturing the Servlet’s main Method
    // Pointcut for Servlets’ main method
    pointcut servletMainMethodExecution(...) :
        execution(void HttpServlet+.doGet(HttpServletRequest, HttpServletResponse)) ||
        execution(void HttpServlet+.doPost(HttpServletRequest, HttpServletResponse));
• The pointcut captures the entry and exit points of web request handlers
  • Cache checks and inserts for read requests
  • Invalidations for update requests

Weaving Rules for Cache Checks and Inserts
    // Advice for read-only requests
    around(...) : servletMainMethodExecution(...) {
        // Pre-processing: Cache check
        String cachedDoc;
        cachedDoc = ... call Cache.get of AutoWebCache
        if (cachedDoc != null) { ... return cachedDoc }
        // Normal execution of the request
        proceed(...);
        // Post-processing: Cache insert
        ... call Cache.add of AutoWebCache
    }
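The control flow of this around advice can be sketched in plain Java as a wrapper around the request handler: check the cache, proceed on a miss, then insert the page along with the queries recorded during execution. This is an illustration, not the AspectJ advice; the handler is modeled (as an assumption) as a `BiFunction` that appends each SQL query it runs to a list it is given, and `CachingReadWrapper` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class CachingReadWrapper {
    record Entry(String page, List<String> readQueries) {}

    private final Map<String, Entry> cache = new HashMap<>();
    int handlerRuns = 0; // how often the real handler executed

    // handler: (uri, queryLog) -> page; it appends each SQL query it runs.
    String serve(String uri, BiFunction<String, List<String>, String> handler) {
        // Pre-processing: cache check.
        Entry cached = cache.get(uri);
        if (cached != null) return cached.page(); // cache hit
        // Normal execution of the request (the "proceed" step).
        List<String> queries = new ArrayList<>();
        String page = handler.apply(uri, queries);
        handlerRuns++;
        // Post-processing: cache insert with the dependency info.
        cache.put(uri, new Entry(page, List.copyOf(queries)));
        return page;
    }
}
```

The second identical request is answered from the cache without running the handler again.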

Weaving Rules for Cache Invalidations
    // Advice for write requests
    after(...) : servletMainMethodExecution(...) {
        // Cache invalidation
        ... call Cache.remove of AutoWebCache
    }
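The after advice for write requests can be sketched the same way: run the handler, then remove every cached entry whose recorded read queries intersect one of the write queries it issued. The sketch assumes entries map a URI to its read queries and that a caller-supplied `BiPredicate` performs the intersection test; `CachingWriteWrapper` is a hypothetical name. The invalidation sits in a `finally` block because AspectJ's plain `after()` advice runs whether the join point returns normally or throws.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.Consumer;

class CachingWriteWrapper {
    // URI -> read queries recorded when the page was cached.
    final Map<String, List<String>> entries = new HashMap<>();
    // (writeQuery, readQuery) -> may they touch the same data?
    private final BiPredicate<String, String> intersects;

    CachingWriteWrapper(BiPredicate<String, String> intersects) {
        this.intersects = intersects;
    }

    // handler appends each write query it executes; its result is not cached.
    void serve(Consumer<List<String>> handler) {
        List<String> writeQueries = new ArrayList<>();
        try {
            handler.accept(writeQueries); // normal execution
        } finally {
            // After advice: drop every entry that some write query intersects.
            entries.values().removeIf(reads -> reads.stream().anyMatch(
                    r -> writeQueries.stream().anyMatch(w -> intersects.test(w, r))));
        }
    }
}
```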

Weaving Rules for Collecting Consistency Information
    // Pointcut for SQL query calls
    pointcut sqlQueryCall() :
        call(ResultSet PreparedStatement.executeQuery()) ||
        call(int PreparedStatement.executeUpdate());
    // Advice for SQL query calls
    after() : sqlQueryCall() {
        ... collect consistency info ...
    }
• After each SQL query, note
  • The query template
  • The query instance values
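What this advice records per query can be sketched as a per-request collector of query instances: the template (the `PreparedStatement` SQL string with `?` placeholders) and the values bound to it. How the advice obtains the bound values is not shown on the slide, so here they are passed in directly; the sketch also assumes one thread per request, hence the `ThreadLocal`. `ConsistencyInfo`, `QueryInstance`, `note`, and `drain` are hypothetical names, and no JDBC calls are made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// One executed query: its template plus the instance values bound to it.
record QueryInstance(String template, List<Object> values) {}

class ConsistencyInfo {
    // One list per request thread (assumes a request runs on one thread).
    private static final ThreadLocal<List<QueryInstance>> CURRENT =
            ThreadLocal.withInitial(ArrayList::new);

    // Would be called from the after advice on executeQuery/executeUpdate.
    static void note(String template, Object... values) {
        CURRENT.get().add(new QueryInstance(template, Arrays.asList(values)));
    }

    // Called at request exit: hand the collected info to the cache and reset.
    static List<QueryInstance> drain() {
        List<QueryInstance> collected = List.copyOf(CURRENT.get());
        CURRENT.get().clear();
        return collected;
    }
}
```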

Transparency of AutoWebCache
• Ability to capture the information flow
  • Entry and exit points of request handlers
    • e.g., doGet() and doPost() APIs for Java Servlets
  • Modifications to the underlying data sets
    • e.g., JDBC calls for SQL requests
• Multiple sources of dynamic behavior
  • Currently, only dynamic behavior coming from SQL queries is handled
  • Standard interfaces are needed for all sources

Hidden State Problem
Request execution:
    …
    Number number = getRandom();
    Image img = getImage(number);
    displayImage(img);
    …
• The request does not contain all the information used to create the response
• Occurs when the application uses random numbers, timers, etc.
• Subsequent identical requests yield different responses
• It is the developer's duty to declare such requests non-cacheable
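Declaring a request non-cacheable can be sketched as a set of handler names that bypass the cache entirely. The mechanism and the names (`NonCacheable`, `RandomImage`) are assumptions; the slide only says the declaration is the developer's duty.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class NonCacheable {
    private final Set<String> nonCacheable; // declared by the developer
    private final Map<String, String> cache = new HashMap<>();

    NonCacheable(Set<String> nonCacheable) {
        this.nonCacheable = nonCacheable;
    }

    String serve(String handlerName, String args, Function<String, String> handler) {
        // Hidden state (random numbers, timers): always execute, never cache.
        if (nonCacheable.contains(handlerName)) return handler.apply(args);
        // Otherwise cache by URI (handler name plus arguments).
        return cache.computeIfAbsent(handlerName + "?" + args, k -> handler.apply(args));
    }
}
```

A handler like the random-image example above would be listed in the set, so every request runs it afresh.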

Use of Application Semantics
• Aspect orientation relies on code syntax
  • It cannot capture semantic concepts
• In the TPC-W application
  • Best Seller requests allow dirty reads for up to 30 s
  • This conforms to specification clauses 3.1.4.1 and 6.3.3.1
• Application semantics can be used to improve performance
  • The best-seller cache entry time-out is set to 30 s
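A time-out like the 30-second best-seller rule can be sketched as an entry that remembers its insertion time and counts as a miss once it is older than its time-to-live. The clock is injected as a `LongSupplier` (an assumption that makes the rule testable without sleeping); `TtlCache` is a hypothetical name, and the sketch covers only the expiry rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class TtlCache {
    private record Timed(String page, long insertedAtMillis) {}

    private final Map<String, Timed> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // current time in milliseconds

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void add(String uri, String page) {
        entries.put(uri, new Timed(page, clock.getAsLong()));
    }

    // Returns null on a miss or when the entry has outlived its time-to-live.
    String get(String uri) {
        Timed t = entries.get(uri);
        if (t == null) return null;
        if (clock.getAsLong() - t.insertedAtMillis() >= ttlMillis) {
            entries.remove(uri); // older than the allowed staleness window
            return null;
        }
        return t.page();
    }
}
```

With a real clock this would be constructed as `new TtlCache(30_000, System::currentTimeMillis)`.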

Outline
• Design of AutoWebCache
  • Maintaining cache consistency
    • Determine relationship between reads and updates
  • Cache structure
• Aspectizing web caching
  • Transparent insertion of caching logic
• Evaluation
  • Analysis of effectiveness
• Conclusion

Evaluation Environment
• RUBiS
  • Auction site modeled on eBay
  • Browsing items, bidding, leaving comments, etc.
  • Large number of requests that can be satisfied quickly
• TPC-W
  • Models an online bookstore
  • Listing new products, best-sellers, shopping cart, etc.
  • Small number of requests that are database-intensive
• Client emulator
  • A client browser emulator generates the requests
  • Average think time and session time conform to the TPC-W v1.8 specification
  • Cache warmed for 15 min; statistics gathered over 30 min

Figure: Response Time for RUBiS – Bidding Mix. Response time (ms, 0 to 140) vs. number of clients (0 to 1000); series: No cache, AutoWebCache.

Figure: Relative Benefits for Different Requests in RUBiS. Percent of requests (0 to 25) per request type, split into hits and misses; request types: Put Bid, Put Cmt, Buy Now, About Me, View Bids, View Item, View User, Search Rgn, Search Cat, Browse Cat, Browse Rgn.

Figure: Response Time for TPC-W – Shopping Mix. Response time (ms, log scale, 1 to 10000) vs. number of clients (50 to 400); series: No cache, AutoWebCache, Optimization for Semantics.

Figure: Relative Benefits for Different Requests in TPC-W. Percent of requests (0 to 25) per request type, split into hits, hits based on application semantics, and misses; request types: best sellers, order display, order inquiry, new products, product detail, admin request, search request, execute search, home interaction.

Implementation of AutoWebCache

Conclusion
• AutoWebCache: a cache that
  • Ensures consistency of cached documents
    • Query analysis
  • Inserts caching logic transparently to the application
    • Makes use of aspect-oriented programming
• Transparency of AutoWebCache depends on
  • Well-defined, standard interfaces for information flow
  • Presence of hidden states
  • Use of application semantics

Questions / Comments / Suggestions!

Thank You!

SQL Query Structure
SELECT T.a FROM T WHERE T.b=10
  • Column(s) selected: T.a; table concerned: T; predicate condition: T.b=10
UPDATE T SET T.c WHERE 20 < T.d < 35
  • Table concerned: T; column(s) updated: T.c; predicate condition: 20 < T.d < 35

Figure: Response Time for RUBiS – Bidding Mix. Response time (ms, 0 to 140) vs. number of clients (0 to 1000); series: No cache, AC column based, AC value based, AC extra query, Hand-coded.

Figure: Response Time for TPC-W – Shopping Mix. Response time (ms, log scale, 1 to 10000) vs. number of clients (0 to 450); series: No cache, AC column based, AC value based, AC extra query, Hand-coded.

Cache Structure in AutoWebCache
Index: SQL string (read query template) | <value vector, URI> pairs
ReadQueryTemplate1 | <instance values1a, URI1>, <instance values1b, URI41>, <instance values1c, URI57>
ReadQueryTemplate2 | <instance values2a, URI7>
ReadQueryTemplate3 | <instance values3a, URI12>
… | …

Index: URI (readHandlerName + readHandlerArgs) | Cached web page
URI1 | WebPage1
URI2 | WebPage2
… | …

• If a write query invalidates ReadQueryTemplate1 with instance values1a, the corresponding entry (URI1) is removed
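The two indexes can be sketched as a pair of maps: read-query template to its `<value vector, URI>` pairs, and URI to cached page. An invalidation then only touches the pairs under the affected template. The sketch assumes the query analysis has already decided which template and value vector a write query conflicts with; `TwoIndexCache`, `Use`, and `invalidate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoIndexCache {
    record Use(List<Object> values, String uri) {}

    // Index 1: read-query template (SQL string) -> <value vector, URI> pairs.
    private final Map<String, List<Use>> byTemplate = new HashMap<>();
    // Index 2: URI -> cached web page.
    private final Map<String, String> pages = new HashMap<>();

    void add(String uri, String page, String template, List<Object> values) {
        pages.put(uri, page);
        byTemplate.computeIfAbsent(template, t -> new ArrayList<>())
                  .add(new Use(List.copyOf(values), uri));
    }

    String get(String uri) {
        return pages.get(uri);
    }

    // A write query was found to invalidate `template` for `values`:
    // remove the pages of exactly those uses. Pairs for the same URI under
    // other templates stay behind; they can only cause extra invalidations.
    void invalidate(String template, List<Object> values) {
        List<Use> uses = byTemplate.get(template);
        if (uses == null) return;
        for (Use u : uses) {
            if (u.values().equals(values)) pages.remove(u.uri());
        }
        uses.removeIf(u -> u.values().equals(values));
    }
}
```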

Evaluation
• Analysis of AutoWebCache
  • Effect on the performance of applications
  • Relation of application semantics to cache efficiency
  • Relative benefit of caching for different read-only requests
  • Usefulness of AOP techniques in implementing the caching system

Figure: Breakdown of Response Times for Requests in RUBiS. Response time (ms, 0 to 350) per request type: overall average response time and the extra time for a miss (on top of the overall response time); request types: Put Bid, Put Cmt, Buy Now, View Item, About Me, View Bids, View User, Search Cat, Browse Cat, Search Rgn, Browse Rgn.

Figure: Breakdown of Response Times for Requests in TPC-W. Response time (ms, 0 to 350) per request type: overall average response time and the extra time for a miss (on top of the overall response time); request types: best sellers, order inquiry, order display, product detail, new products, admin request, execute search, search request, home interaction.

Key Aspect-Oriented Programming Concepts
• “Join points” identify executable points in the system
  • Method calls, read and write accesses, invocations
• “Pointcuts” capture sets of join points
• “Advice” specifies the actions to perform at pointcuts
  • Before or after the captured join points execute
  • Encodes the cross-cutting logic

Conclusion
• Dynamic content is not easy to cache
  • Consistency: cached entries must be invalidated as a result of updates
    • AutoWebCache: query analysis
  • Caching logic is inserted at different points in the application
    • Entry and exit of requests, access to the underlying database
  • Most solutions rely on understanding complex application logic
    • AutoWebCache: transparent insertion of caching logic using AOP
• Transparency is affected by
  • Well-defined, standard interfaces for information flow
  • Presence of hidden states
  • Use of application semantics

Web Caching versus Query Caching
• The two are complementary
• Web caching is useful when the application server is the bottleneck
• Documents can be cached nearer to the client, in a distributed fashion
• Web page caching can make use of application semantics (e.g., best sellers in TPC-W)
